360VL
Image understanding model
A large multi-modal model built on the Llama 3 language model, designed to improve image-understanding capabilities.
30 stars
0 watching
2 forks
Language: Python
last commit: 6 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
dvlab-research/llama-vid | A video-oriented vision-language model that uses large language models to extract visual and text features from videos | 733 |
jiyt17/ida-vlm | Proposes an identity-aware large vision-language model for understanding complex visual narratives such as movies | 25 |
lxtgh/omg-seg | An end-to-end model that handles multiple visual perception and reasoning tasks with a single encoder, decoder, and large language model | 1,300 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298 |
wisconsinaivision/vip-llava | A system that enables large multimodal models to understand arbitrary visual prompts | 294 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
alibaba/conv-llava | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 104 |
ucas-haoranwei/vary | A vision-vocabulary model for large language models that improves document understanding and recognition | 1,817 |
isekai-portal/link-context-learning | A multimodal learning approach that improves language models' ability to recognize unseen images and understand novel concepts | 89 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images from complex queries and world knowledge | 1,861 |
xverse-ai/xverse-moe-a36b | Large multilingual language models with a mixture-of-experts architecture | 36 |
boheumd/ma-lmm | An AI model for long-term video understanding | 244 |
deepseek-ai/deepseek-vl | A multimodal AI model for real-world vision-language understanding applications | 2,077 |
openbmb/viscpm | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages | 1,089 |
deepcs233/visual-cot | A multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134 |