VisionLLM

Visual decoder

A large language model designed to process and generate visual information

VisionLLM Series

956 stars

45 watching

29 forks

Language: Python

last commit: 10 months ago

generalist-model, large-language-models, object-detection


arxiv.org/abs/2305.11175

Related projects:

Repository	Description	Stars
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs	270
visual-openllm/visual-openllm	An interactive tool that connects multiple visual models and an LLM to facilitate text-based conversations.	1,213
dvlab-research/lisa	A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge.	1,923
opengvlab/all-seeing	A research project that develops tools and models for understanding visual data in the open world, enabling applications such as image-text retrieval and relation comprehension.	466
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
dvlab-research/llama-vid	A vision-language model that extends large language models to images and videos, generating visual and text features from video frames	748
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
luogen1996/lavin	An open-source implementation of a vision-language instructed large language model	513
evolvinglmms-lab/longva	An open-source project that transfers long-context capabilities from language models to vision, enabling understanding of long video inputs.	347
opengvlab/controlllm	An open-source framework for augmenting large language models with tools by searching on graphs to solve complex real-world tasks.	187
vhellendoorn/code-lms	A guide to using pre-trained large language models in source code analysis and generation	1,789
360cvgroup/360vl	A large multimodal model built on the Llama 3 language model, designed to improve image understanding.	32
gordonhu608/mqt-llava	A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens.	101
openbmb/viscpm	A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages	1,098
ailab-cvc/seed	An implementation of a multimodal language model with capabilities for comprehension and generation	585