MoE-LLaVA

Mixture-of-Experts architecture

A sparse Mixture-of-Experts neural network architecture for multi-modal learning with large vision-language models

Mixture-of-Experts for Large Vision-Language Models

GitHub

2k stars
24 watching
127 forks
Language: Python
last commit: 6 months ago
Topics: large-vision-language-model, mixture-of-experts, moe, multi-modal
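The repositories listed here center on mixture-of-experts (MoE) layers, whose core idea is a gating network that routes each input to a small subset of experts. As a rough illustration (not code from MoE-LLaVA or any listed project), a minimal sketch of sparse top-k gating in plain Python, with hypothetical inputs:

```python
import math

def top_k_gate(logits, k=2):
    """Sparse top-k gating: softmax over the k highest-scoring experts,
    zero weight for the rest. A simplified sketch of MoE routing; real
    implementations add load-balancing losses and batched tensor ops."""
    # Indices of the k experts with the largest gate logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts.
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return [exps.get(i, 0.0) / z for i in range(len(logits))]

# Hypothetical gate logits for 4 experts; only the top 2 get nonzero weight.
weights = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
```

The routed output is then a weighted sum of the selected experts' outputs, so compute scales with k rather than with the total number of experts.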

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| xverse-ai/xverse-moe-a4.2b | Multilingual large language model developed by XVERSE Technology Inc., built on a mixture-of-experts architecture and fine-tuned for tasks such as conversation, question answering, and natural language understanding | 36 |
| shi-labs/cumo | A method for scaling multimodal large language models by combining multiple experts and fine-tuning them together | 134 |
| byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models | 311 |
| skyworkai/skywork-moe | A high-performance mixture-of-experts model with innovative training techniques for language processing tasks | 126 |
| yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large vision-language models | 71 |
| xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture | 36 |
| pku-yuangroup/languagebind | Extends pretrained models to multiple modalities by aligning language and video representations | 723 |
| jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
| deepseek-ai/deepseek-moe | A large language model with improved efficiency and performance compared to similar models | 1,006 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
| ymcui/chinese-mixtral | Develops and releases Mixtral-based models for natural language processing, focused on Chinese text generation and understanding | 584 |
| alibaba/conv-llava | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 104 |
| gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens, allowing a flexible choice of the number of visual tokens | 97 |
| ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel at natural language understanding, mathematical computation, and code generation | 180 |
| llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 704 |