ALLaVA
Vision-Language Model Dataset
A collection of datasets and models designed to support the training of lite vision-language models.
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
249 stars
11 watching
9 forks
Language: Python
Last commit: 7 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
freedomintelligence/longllava | A system for scaling large language models to process and understand visual information from multiple images efficiently. | 183 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
evolvinglmms-lab/longva | An open-source project that transfers long-context capability from language models to vision tasks. | 347 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
freedomintelligence/mllm-bench | A benchmark that evaluates and compares multimodal large language models on various tasks | 56 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,923 |
maluuba/geneva_datasets | Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques | 37 |
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513 |
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 88 |
shizhediao/davinci | A unified modal learning framework for generative vision-language models | 43 |
freedomintelligence/huatuogpt | A large language model for medical consultations that combines distilled and real-world data to improve doctor-patient interactions | 1,093 |
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 544 |