DeepSeek-MoE

Efficient LLM

A mixture-of-experts large language model designed to match the performance of comparably sized dense models at a fraction of the compute

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

GitHub

1k stars
16 watching
53 forks
Language: Python
Last commit: 11 months ago
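
The linked paper's central technique is a mixture-of-experts (MoE) feed-forward layer in which a learned router sends each token to a small subset of experts, so only a fraction of the parameters is active per token. The snippet below is a minimal, generic top-k MoE sketch in PyTorch, not DeepSeekMoE's actual implementation; the class name `MoELayer` and all hyperparameters are illustrative. DeepSeekMoE additionally segments experts more finely and isolates some as always-active shared experts, which this sketch omits.

```python
# Minimal, generic top-k mixture-of-experts layer (illustrative sketch only;
# names and hyperparameters are hypothetical, not DeepSeekMoE's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # each token keeps its top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(4, 64)                              # 4 tokens with d_model = 64
print(MoELayer(d_model=64, d_hidden=256)(tokens).shape)  # torch.Size([4, 64])
```

Because only `top_k` of the `n_experts` blocks run per token, the per-token compute stays close to a dense layer of width `d_hidden` while the total parameter count grows with the number of experts.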

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| deepseek-ai/deepseek-llm | A large language model trained on a massive dataset for various applications | 1,512 |
| deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| google-deepmind/recurrentgemma | An implementation of a fast and efficient language model architecture | 613 |
| deepseek-ai/deepseek-coder-v2 | A code intelligence model designed to generate and complete code in various programming languages | 2,322 |
| luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models across multiple languages and formats | 93 |
| ai-hypercomputer/maxtext | A high-performance LLM written in Python/JAX for training and inference on Google Cloud TPUs and GPUs | 1,557 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models for source code analysis and generation | 1,786 |
| elanmart/psmm | An implementation of a neural network model for character-level language modeling | 50 |
| pku-yuangroup/moe-llava | A large vision-language model that uses a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |
| horseee/deepcache | A paradigm that accelerates diffusion models by cheaply reusing and updating high-level features | 818 |
| mpaepper/llm_agents | Builds agents controlled by large language models (LLMs) that perform tasks with tool-based components | 940 |
| aiplanethub/beyondllm | An open-source toolkit for building and evaluating large language models | 267 |
| darshandeshpande/jax-models | A collection of deep learning models and utilities in JAX/Flax for research purposes | 151 |
| xverse-ai/xverse-moe-a4.2b | A multilingual mixture-of-experts large language model from XVERSE Technology Inc., fine-tuned for tasks such as conversation, question answering, and natural language understanding | 36 |