DeepSeek-MoE

Efficient LLM

A mixture-of-experts large language model designed to match the performance of comparably sized dense models at a fraction of the compute

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

GitHub

1k stars
16 watching
53 forks
Language: Python
Last commit: 11 months ago
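
The linked paper's central technique is a mixture-of-experts (MoE) feed-forward layer in which a learned router sends each token to a small subset of experts, so only a fraction of the parameters is active per token. The snippet below is a minimal, generic top-k MoE sketch in PyTorch, not DeepSeekMoE's actual implementation; the class name `MoELayer` and all hyperparameters are illustrative. DeepSeekMoE additionally segments experts more finely and isolates some as always-active shared experts, which this sketch omits.

```python
# Minimal, generic top-k mixture-of-experts layer (illustrative sketch only;
# names and hyperparameters are hypothetical, not DeepSeekMoE's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # each token keeps its top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(4, 64)                              # 4 tokens with d_model = 64
print(MoELayer(d_model=64, d_hidden=256)(tokens).shape)  # torch.Size([4, 64])
```

Because only `top_k` of the `n_experts` blocks run per token, the per-token compute stays close to a dense layer of width `d_hidden` while the total parameter count grows with the number of experts.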

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| deepseek-ai/deepseek-llm | A large language model trained on a massive dataset for various applications | 1,512 |
| deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| google-deepmind/recurrentgemma | An implementation of a fast and efficient language model architecture | 613 |
| deepseek-ai/deepseek-coder-v2 | A code intelligence model designed to generate and complete code in various programming languages | 2,322 |
| luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models across multiple languages and formats | 93 |
| ai-hypercomputer/maxtext | A high-performance LLM written in Python/JAX for training and inference on Google Cloud TPUs and GPUs | 1,557 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models for source code analysis and generation | 1,786 |
| elanmart/psmm | An implementation of a neural network model for character-level language modeling | 50 |
| pku-yuangroup/moe-llava | A large vision-language model that uses a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |
| horseee/deepcache | A paradigm that accelerates diffusion models by cheaply reusing and updating high-level features | 818 |
| mpaepper/llm_agents | Builds agents controlled by large language models (LLMs) that perform tasks with tool-based components | 940 |
| aiplanethub/beyondllm | An open-source toolkit for building and evaluating large language models | 267 |
| darshandeshpande/jax-models | A collection of deep learning models and utilities in JAX/Flax for research purposes | 151 |
| xverse-ai/xverse-moe-a4.2b | A multilingual mixture-of-experts large language model from XVERSE Technology Inc., fine-tuned for tasks such as conversation, question answering, and natural language understanding | 36 |