ALLaVA

Vision-Language Model Dataset

A collection of datasets and models designed to support the training of lite vision-language models.

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

GitHub

249 stars

11 watching

9 forks

Language: Python

Last commit: about 2 years ago

Related projects:

Repository	Description	Stars
freedomintelligence/longllava	A system for scaling large language models to process and understand visual information from multiple images efficiently	183
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities	1,299
evolvinglmms-lab/longva	An open-source project that enables the transfer of language understanding to vision capabilities through long context processing	347
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
freedomintelligence/mllm-bench	A benchmark for evaluating and comparing the performance of multimodal large language models on various tasks	56
wisconsinaivision/vip-llava	A system designed to enable large multimodal models to understand arbitrary visual prompts	302
dvlab-research/lisa	A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge	1,923
maluuba/geneva_datasets	Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques	37
luogen1996/lavin	An open-source implementation of a vision-language instructed large language model	513
vlf-silkie/vlfeedback	An annotated preference dataset and training framework for improving large vision language models	88
shizhediao/davinci	An implementation of a unified modal learning framework for generative vision-language models	43
freedomintelligence/huatuogpt	A large language model for medical consultations that combines distilled and real-world data to improve doctor-patient interactions	1,093
llava-vl/llava-plus-codebase	A platform for training and deploying large language and vision models that can use tools to perform tasks	717
jy0205/lavit	A unified framework for training large language models to understand and generate visual content	544