360VL

Image understanding model

A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities.

32 stars

0 watching

2 forks

Language: Python

last commit: about 1 year ago

Related projects:

Repository	Description	Stars
dvlab-research/llama-vid	An image-based language model that uses large language models to generate visual and text features from videos	748
jiyt17/ida-vlm	An open-source project that aims to improve large vision-language models by integrating identity-aware capabilities and utilizing visual instruction tuning data for movie understanding	26
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
wisconsinaivision/vip-llava	A system designed to enable large multimodal models to understand arbitrary visual prompts	302
opengvlab/visionllm	A large language model designed to process and generate visual information	956
alibaba/conv-llava	This project presents an optimization technique for large-scale image models to reduce computational requirements while maintaining performance.	106
ucas-haoranwei/vary	An implementation of a vision vocabulary model for large language models to improve document understanding and recognition capabilities	1,831
isekai-portal/link-context-learning	An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts.	91
dvlab-research/lisa	A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge.	1,923
xverse-ai/xverse-moe-a36b	Develops and publishes large multilingual language models with advanced mixing-of-experts architecture.	37
boheumd/ma-lmm	This project develops an AI model for long-term video understanding	254
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
openbmb/viscpm	A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages	1,098
deepcs233/visual-cot	A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts.	162