ALLaVA

Vision-Language Model Dataset

A collection of datasets and models designed to support the training of lite vision-language models.

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

GitHub

246 stars
11 watching
8 forks
Language: Python
Last commit: 5 months ago

Related projects:

Repository | Description | Stars
freedomintelligence/longllava | A system for scaling large language models to process and understand visual information from multiple images efficiently. | 179
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities. | 230
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298
evolvinglmms-lab/longva | This project provides a model for long context transfer from language to vision using a deep learning framework. | 334
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications. | 2,077
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts. | 294
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,861
maluuba/geneva_datasets | Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques. | 37
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model. | 508
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models. | 85
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications. | 43
freedomintelligence/huatuogpt | Developing a large language model for medical consultations by combining distilled and real-world data to improve doctor-patient interactions. | 1,076
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks. | 704
jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528