360VL

Image understanding model

A large multimodal model built on the Llama3 language model and designed to improve image understanding.

GitHub

30 stars
0 watching
2 forks
Language: Python
Last commit: 6 months ago

Related projects:

Repository | Description | Stars
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733
jiyt17/ida-vlm | An identity-aware large vision-language model for understanding complex visual narratives such as movies | 25
lxtgh/omg-seg | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300
nvlabs/prismer | A deep learning framework for training multimodal models with vision and language capabilities | 1,298
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294
opengvlab/visionllm | A large language model designed to process and generate visual information | 915
alibaba/conv-llava | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 104
ucas-haoranwei/vary | An implementation of a vision vocabulary model for large language models to improve document understanding and recognition capabilities | 1,817
isekai-portal/link-context-learning | An implementation of a multimodal learning approach to improve language models' ability to recognize unseen images and understand novel concepts | 89
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,861
xverse-ai/xverse-moe-a36b | Large multilingual language models with a mixture-of-experts (MoE) architecture | 36
boheumd/ma-lmm | An AI model for long-term video understanding | 244
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,089
deepcs233/visual-cot | A multimodal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134