mPLUG-Owl

Visual AI model

A family of multimodal large language models that understand images and video and generate human-like language about visual content

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

GitHub

2k stars
30 watching
177 forks
Language: Python
Last commit: about 2 months ago
Topics: alpaca, chatbot, chatgpt, damo, dialogue, gpt, gpt4, gpt4-api, huggingface, instruction-tuning, large-language-models, llama, mplug, mplug-owl, multimodal, pretraining, pytorch, transformer, video, visual-recognition
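
The repository is tagged pytorch, huggingface, and transformer. Below is a minimal inference sketch, assuming the checkpoint id `MAGAer13/mplug-owl-llama-7b` and assuming the checkpoint can be driven through the generic Hugging Face `image-to-text` pipeline with remote code enabled; the mPLUG-Owl repository documents the actual model classes and prompt format.

```python
# Minimal sketch of multimodal inference via the Hugging Face transformers pipeline.
# The checkpoint id and pipeline compatibility are assumptions; see the mPLUG-Owl
# repository for the supported loading code and prompting conventions.
from transformers import pipeline

captioner = pipeline(
    task="image-to-text",
    model="MAGAer13/mplug-owl-llama-7b",  # assumed checkpoint id
    trust_remote_code=True,               # mPLUG-Owl ships custom model code
)

# Pass an image path (or PIL image) and print the generated description.
print(captioner("example.jpg"))
```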

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 82 |
| pku-yuangroup/video-llava | A unified vision-language model that aligns image and video representations for understanding by a single large language model | 3,071 |
| fuxiaoliu/mmc | A large-scale dataset and benchmark for training multimodal chart-understanding models with large language models | 87 |
| mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,551 |
| microsoft/promptbench | A unified framework for evaluating the performance and robustness of large language models across scenarios | 2,487 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| x-plug/mplug-docowl | A large language model for OCR-free document understanding, focusing on document structure and content analysis | 1,958 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229 |
| internlm/internlm-xcomposer | A multimodal system for long-term streaming video and audio interaction, including text-image comprehension and composition | 2,616 |
| vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,490 |
| yuliang-liu/monkey | An end-to-end image captioning system built on large multimodal models, with tools for training, inference, and demos | 1,849 |
| qwenlm/qwen-vl | A large vision-language model with improved image reasoning and text recognition, suited to a range of multimodal tasks | 5,179 |
| qwenlm/qwen2-vl | A multimodal large language model series from the Qwen team for understanding images, videos, and text | 3,613 |
| pleisto/yuren-baichuan-7b | A multimodal large language model that integrates language and visual capabilities and can be fine-tuned for various tasks | 73 |
| opengvlab/llama-adapter | An implementation of an efficient method for fine-tuning language models to follow instructions | 5,775 |