mPLUG-Owl
Visual AI model
Develops multimodal large language models that understand images and video and generate human-like text about them
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
2k stars
30 watching
177 forks
Language: Python
Last commit: 11 months ago
Topics: alpaca, chatbot, chatgpt, damo, dialogue, gpt, gpt4, gpt4-api, huggingface, instruction-tuning, large-language-models, llama, mplug, mplug-owl, multimodal, pretraining, pytorch, transformer, video, visual-recognition
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Evaluates and mitigates hallucinations in multimodal large language models | 82 |
| | A deep learning framework for generating videos from text inputs and visual features. | 3,071 |
| | Develops a large-scale dataset and benchmark for training multimodal chart understanding models using large language models. | 87 |
| | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,551 |
| | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,487 |
| | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| | A large language model designed to understand documents without OCR, focusing on document structure and content analysis. | 1,958 |
| | An open-source framework for training large language models with vision capabilities. | 3,229 |
| | A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition | 2,616 |
| | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,490 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
| | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
| | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text. | 3,613 |
| | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
| | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |