ALLaVA

Vision-Language Model Dataset

A collection of datasets and models designed to support the training of lite vision-language models.

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

GitHub

249 stars
11 watching
9 forks
Language: Python
Last commit: 7 months ago

Related projects:

Repository | Description | Stars
freedomintelligence/longllava | A system for scaling large language models to process and understand visual information from multiple images efficiently | 183
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,299
evolvinglmms-lab/longva | An open-source project that enables the transfer of language understanding to vision capabilities through long context processing | 347
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,923
maluuba/geneva_datasets | Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques | 37
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models | 88
shizhediao/davinci | Implementing a unified modal learning framework for generative vision-language models | 43
freedomintelligence/huatuogpt | Developing a large language model for medical consultations by combining distilled and real-world data to improve doctor-patient interactions | 1,093
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 544