Qwen-VL

Large vision language model

A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks

The official repository of Qwen-VL (通义千问-VL), the chat and pretrained large vision language model proposed by Alibaba Cloud.

GitHub

5k stars
49 watching
392 forks
Language: Python
last commit: 5 months ago
Topics: large-language-models, vision-language-model

Related projects:

Repository | Description | Stars
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,613
qwenlm/qwen | Large language models with chat capabilities, built on pretrained Chinese models | 14,797
qwenlm/qwen2.5 | A large language model series in various sizes and variants for text generation and understanding | 10,959
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683
internlm/internlm-xcomposer | A comprehensive multimodal system for long-term streaming video and audio interactions, with capabilities including text-image comprehension and composition | 2,616
qwenlm/qwen-audio | A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages | 1,515
sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,490
openbmb/minicpm-v | A multimodal language model that understands image, video, and text inputs and generates high-quality text outputs | 12,870
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775
google-research/big_vision | Supports large-scale vision model training on GPUs or Google Cloud TPUs using scalable input pipelines | 2,439
thudm/cogvlm | A state-of-the-art visual language model with applications in image understanding and dialogue systems | 6,182
llava-vl/llava-next | Large multimodal models for various computer vision tasks, including image and video analysis | 3,099