mPLUG-Owl

Visual AI model

Develops multimodal large language models that understand image and video content and generate human-like text responses

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

GitHub

2k stars

30 watching

177 forks

Language: Python

last commit: about 1 year ago

alpaca, chatbot, chatgpt, damo, dialogue, gpt, gpt4, gpt4-api, huggingface, instruction-tuning, large-language-models, llama, mplug, mplug-owl, multimodal, pretraining, pytorch, transformer, video, visual-recognition

www.modelscope.cn/studios/damo/mPLUG-Owl

Related projects:

Repository	Description	Stars
x-plug/mplug-halowl	Evaluates and mitigates hallucinations in multimodal large language models	82
pku-yuangroup/video-llava	A large vision-language model that understands both images and videos through a unified visual representation.	3,071
fuxiaoliu/mmc	Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models.	87
mooler0410/llmspracticalguide	A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP	9,551
microsoft/promptbench	A unified framework for evaluating large language models' performance and robustness in various scenarios.	2,487
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
x-plug/mplug-docowl	A large language model designed to understand documents without OCR, focusing on document structure and content analysis.	1,958
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
internlm/internlm-xcomposer	A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition	2,616
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
yuliang-liu/monkey	An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage.	1,849
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
qwenlm/qwen2-vl	A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text.	3,613
pleisto/yuren-baichuan-7b	A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks	73
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775