Qwen2-VL

Multimodal LM

A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text.

Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

GitHub

4k stars

30 watching

224 forks

Language: Python

Last commit: over 1 year ago

Related projects:

Repository	Description	Stars
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
qwenlm/qwen2.5	A large language model series with various sizes and variants for text generation and understanding.	10,959
qwenlm/qwen	This repository provides large language models and chat capabilities based on pre-trained Chinese models.	14,797
qwenlm/qwen-audio	A multimodal audio language model developed by Alibaba Cloud that supports various tasks and languages	1,515
internlm/internlm-xcomposer	A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition	2,616
sgl-project/sglang	A fast serving framework for large language models and vision language models.	6,551
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
qwenlm/qwen2-audio	An audio-language model that can analyze or respond to speech instructions based on audio input	1,306
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
eleutherai/lm-evaluation-harness	Provides a unified framework to test generative language models on various evaluation tasks.	7,200
llava-vl/llava-next	Develops large multimodal models for various computer vision tasks including image and video analysis	3,099
wang-bin/qtav	A multimedia framework that provides an easy-to-use API for building video players across multiple platforms.	4,001
pku-yuangroup/video-llava	A large vision-language model that learns a unified visual representation for understanding both images and videos.	3,071