ALLaVA
Vision-Language Model Dataset
A collection of datasets and models designed to support the training of lite vision-language models.
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
249 stars
11 watching
9 forks
Language: Python
last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| A system for scaling large language models to process and understand visual information from multiple images efficiently. | 183 |
| A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| An open-source project that enables the transfer of language understanding to vision capabilities through long context processing. | 347 |
| A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,923 |
| Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques | 37 |
| An open-source implementation of a vision-language instructed large language model | 513 |
| An annotated preference dataset and training framework for improving large vision language models. | 88 |
| | An implementation of a unified modal learning framework for generative vision-language models | 43 |
| | A large language model for medical consultations that combines distilled and real-world data to improve doctor-patient interactions | 1,093 |
| A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
| A unified framework for training large language models to understand and generate visual content | 544 |