mPLUG-Owl
Visual AI model
A family of multimodal large language models that understand images and video and generate human-like text responses about them
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
2k stars
30 watching
176 forks
Language: Python
last commit: about 1 month ago
Topics: alpaca, chatbot, chatgpt, damo, dialogue, gpt, gpt4, gpt4-api, huggingface, instruction-tuning, large-language-models, llama, mplug, mplug-owl, multimodal, pretraining, pytorch, transformer, video, visual-recognition
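The repository is tagged with huggingface, pytorch, and transformer, so a typical interaction follows the usual image-plus-prompt pattern sketched below. This is only a minimal sketch: the checkpoint ID `MAGAer13/mplug-owl-llama-7b`, the prompt template, and loading through the generic `AutoProcessor`/`AutoModelForCausalLM` classes with `trust_remote_code` are assumptions rather than the project's documented API; the repo's README lists the exact model and processor classes it ships.

```python
# Minimal sketch: image + text prompt -> generated answer with a HuggingFace-hosted
# multimodal checkpoint. The checkpoint ID, prompt format, and Auto* compatibility
# are assumptions, not taken from the mPLUG-Owl README.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

ckpt = "MAGAer13/mplug-owl-llama-7b"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, trust_remote_code=True
)

image = Image.open("example.jpg")
prompt = "Human: <image>\nHuman: What is shown in this picture?\nAI:"  # assumed template

# Pack the image and prompt into model inputs, then decode the generated tokens.
inputs = processor(text=prompt, images=[image], return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```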
Related projects:
Repository | Description | Stars |
---|---|---|
x-plug/mplug-halowl | Evaluates and mitigates hallucinations in multimodal large language models | 79 |
pku-yuangroup/video-llava | Enables large language models to reason over images and videos simultaneously by learning a united visual representation before projection. | 2,990 |
fuxiaoliu/mmc | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 84 |
mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,489 |
microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,462 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
x-plug/mplug-docowl | A large language model designed to understand documents without OCR, focusing on document structure and content analysis. | 1,563 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211 |
internlm/internlm-xcomposer | A large vision language model that can understand and generate text from visual inputs, with capabilities for long-contextual input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue. | 2,521 |
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs. | 1,825 |
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045 |
qwenlm/qwen2-vl | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,093 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities and can be fine-tuned for various tasks | 72 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |