mPLUG-Owl
Visual AI model
A family of multimodal large language models that understand images and videos and generate human-like text responses
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
2k stars
30 watching
177 forks
Language: Python
last commit: about 2 months ago
Topics: alpaca, chatbot, chatgpt, damo, dialogue, gpt, gpt4, gpt4-api, huggingface, instruction-tuning, large-language-models, llama, mplug, mplug-owl, multimodal, pretraining, pytorch, transformer, video, visual-recognition
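The tag list above advertises Hugging Face, PyTorch, and transformer support, so a rough sense of how a checkpoint from this family might be queried can be sketched with the generic transformers auto-classes. Everything repository-specific below is an assumption rather than something verified against the repo: the checkpoint id, the `<image>` prompt template, and whether the published weights load through `AutoProcessor`/`AutoModelForCausalLM` with `trust_remote_code=True` at all (the project may instead expect its own `mplug_owl` package and inference scripts).

```python
# Minimal sketch of querying an mPLUG-Owl-style checkpoint via Hugging Face transformers.
# The checkpoint id, the prompt template, and the use of the generic auto-classes are
# assumptions, not taken from the repository's documented inference path.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "MAGAer13/mplug-owl-llama-7b"  # assumed Hugging Face checkpoint id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
model.eval()

image = Image.open("example.jpg")                           # any local test image
prompt = "Human: <image>\nHuman: Describe the image.\nAI:"  # assumed prompt format

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```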
Related projects:
Repository | Description | Stars |
---|---|---|
x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 82 |
pku-yuangroup/video-llava | A multimodal large language model that unifies image and video understanding by aligning visual representations before projection into the language model. | 3,071 |
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 87 |
mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,551 |
microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,487 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
x-plug/mplug-docowl | A large language model designed to understand documents without OCR, focusing on document structure and content analysis. | 1,958 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,229 |
internlm/internlm-xcomposer | A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition | 2,616 |
vision-cair/minigpt-4 | Enables vision-language understanding by aligning a visual encoder with a large language model fine-tuned on visual data. | 25,490 |
yuliang-liu/monkey | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,613 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |