ALLaVA
Vision-Language Model Dataset
A collection of datasets and models designed to support the training of lite vision-language models.
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
246 stars
11 watching
8 forks
Language: Python
last commit: 5 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
freedomintelligence/longllava | A system for scaling large language models to process and understand visual information from multiple images efficiently. | 179 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
evolvinglmms-lab/longva | A model for long-context transfer from language to vision, built on a deep learning framework. | 334 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
freedomintelligence/mllm-bench | A benchmark for evaluating and comparing the performance of multimodal large language models on various tasks | 55 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,861 |
maluuba/geneva_datasets | Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques | 37 |
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 85 |
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications. | 43 |
freedomintelligence/huatuogpt | A large language model for medical consultations that combines distilled and real-world data to improve doctor-patient interactions | 1,076 |
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 704 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 528 |