Qwen-VL

Large vision language model

A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

GitHub

5k stars

49 watching

392 forks

Language: Python

last commit: 12 months ago

large-language-modelsvision-language-model

Related projects:

Repository	Description	Stars
qwenlm/qwen2-vl	A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text.	3,613
qwenlm/qwen	This repository provides large language models and chat capabilities based on pre-trained Chinese models.	14,797
qwenlm/qwen2.5	A large language model series with various sizes and variants for text generation and understanding.	10,959
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
internlm/internlm-xcomposer	A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition	2,616
qwenlm/qwen-audio	A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages	1,515
sgl-project/sglang	A fast serving framework for large language models and vision language models.	6,551
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
openbmb/minicpm-v	A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs.	12,870
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
google-research/big_vision	Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.	2,439
thudm/cogvlm	Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems.	6,182
llava-vl/llava-next	Develops large multimodal models for various computer vision tasks including image and video analysis	3,099