MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
A large vision-language model that uses a mixture-of-experts architecture to improve performance on multi-modal learning tasks.
2k stars
24 watching
126 forks
Language: Python
last commit: 3 months ago
Topics: large-vision-language-model, mixture-of-experts, moe, multi-modal
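A mixture-of-experts layer replaces a dense feed-forward block with several expert networks and a learned router that activates only the top-k experts per token. The sketch below is a minimal, illustrative top-k routing layer in NumPy; it is not MoE-LLaVA's actual implementation, and all function and variable names here are hypothetical.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Illustrative top-k mixture-of-experts layer (hypothetical sketch,
    not MoE-LLaVA's code).

    x:              (tokens, dim) input token embeddings
    expert_weights: list of (dim, dim) matrices, one per expert
    router_weights: (dim, num_experts) router projection
    """
    logits = x @ router_weights                        # (tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]            # indices of top-k experts
        gate = probs[t, top] / probs[t, top].sum()     # renormalized gate weights
        for g, e in zip(gate, top):
            out[t] += g * (x[t] @ expert_weights[e])   # weighted expert outputs
    return out

# Tiny demo with random weights
rng = np.random.default_rng(0)
tokens, dim, num_experts = 4, 8, 4
x = rng.standard_normal((tokens, dim))
experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
router = rng.standard_normal((dim, num_experts))
y = moe_layer(x, experts, router)
print(y.shape)  # (4, 8)
```

Because only top_k experts run per token, total parameters grow with the number of experts while per-token compute stays close to a single dense layer.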
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Developed by XVERSE Technology Inc. as a multilingual large language model with a mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding | 36 |
| | A method for scaling multimodal large language models by combining multiple experts and fine-tuning them together | 136 |
| | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models | 314 |
| | A high-performance mixture-of-experts model with innovative training techniques for language processing tasks | 126 |
| | Debiasing techniques that minimize hallucinations in large vision-language models | 75 |
| | Develops and publishes large multilingual language models with a mixture-of-experts architecture | 37 |
| | Extends pretrained models to handle multiple modalities by aligning language and video representations | 751 |
| | Trains and deploys large language models on computer vision tasks using region-of-interest inputs | 517 |
| | A large language model with improved efficiency and performance compared to similar models | 1,024 |
| | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 73 |
| | Develops and releases Mixtral-based models for natural language processing, with a focus on Chinese text generation and understanding | 589 |
| | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 106 |
| | A vision-language model that uses a query transformer to encode images as visual tokens and allows a flexible choice of the number of visual tokens | 101 |
| | A high-performance language model designed to excel at natural language understanding, mathematical computation, and code generation | 182 |
| | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |