mPLUG-Owl

Visual AI model

A family of multi-modal large language models that understand images and video and generate human-like text responses

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

GitHub

2k stars
30 watching
176 forks
Language: Python
last commit: about 1 month ago
Topics: alpaca, chatbot, chatgpt, damo, dialogue, gpt, gpt4, gpt4-api, huggingface, instruction-tuning, large-language-models, llama, mplug, mplug-owl, multimodal, pretraining, pytorch, transformer, video, visual-recognition

Related projects:

| Repository | Description | Stars |
|---|---|---|
| x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 79 |
| pku-yuangroup/video-llava | Enables large language models to perform visual reasoning over images and videos simultaneously by learning united visual representations before projection | 2,990 |
| fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart-understanding models using large language models | 84 |
| mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,489 |
| microsoft/promptbench | A unified framework for evaluating the performance and robustness of large language models in various scenarios | 2,462 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
| x-plug/mplug-docowl | A large language model designed to understand documents without OCR, focusing on document structure and content analysis | 1,563 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211 |
| internlm/internlm-xcomposer | A large vision-language model that understands and generates text from visual inputs, with support for long-context input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue | 2,521 |
| vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,422 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825 |
| qwenlm/qwen-vl | A large vision-language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045 |
| qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,093 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural-language and visual capabilities, with fine-tuning for various tasks | 72 |
| opengvlab/llama-adapter | An implementation of a method for efficiently and accurately fine-tuning language models to follow instructions | 5,754 |