DeepSeek-MoE
Efficient LLM
A mixture-of-experts language model that matches the performance of comparable dense models while using substantially less computation (a minimal MoE sketch follows the table below)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
1k stars
16 watching
53 forks
Language: Python
last commit: 11 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
deepseek-ai/deepseek-llm | A large language model trained from scratch on a 2-trillion-token English and Chinese corpus for a range of applications | 1,512 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
google-deepmind/recurrentgemma | An implementation of RecurrentGemma, a fast and efficient recurrent language model architecture based on Griffin | 613 |
deepseek-ai/deepseek-coder-v2 | A code intelligence model designed to generate and complete code in various programming languages | 2,322 |
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513 |
damo-nlp-sg/m3exam | A multilingual, multimodal, multilevel benchmark for evaluating large language models | 93 |
ai-hypercomputer/maxtext | A high-performance, scalable LLM codebase written in Python/JAX for training and inference on Google Cloud TPUs and GPUs | 1,557 |
vhellendoorn/code-lms | A guide to using pre-trained large language models of source code for analysis and generation | 1,786 |
elanmart/psmm | An implementation of a neural network model for character-level language modeling | 50 |
pku-yuangroup/moe-llava | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |
horseee/deepcache | A training-free approach that accelerates diffusion models by caching and reusing high-level features across denoising steps | 818 |
mpaepper/llm_agents | Builds agents controlled by large language models (LLMs) that use tools to complete tasks | 940 |
aiplanethub/beyondllm | An open-source toolkit for building and evaluating applications powered by large language models | 267 |
darshandeshpande/jax-models | A collection of deep learning models and utilities in JAX/Flax for research purposes | 151 |
xverse-ai/xverse-moe-a4.2b | A multilingual large language model from XVERSE Technology Inc., built on a mixture-of-experts architecture and fine-tuned for tasks such as conversation, question answering, and natural language understanding | 36 |
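For orientation, the DeepSeekMoE paper named above centers on fine-grained expert segmentation plus shared-expert isolation. The following is a minimal PyTorch sketch of that general pattern, with hypothetical module names and sizes: a few shared experts process every token, while a learned gate routes each token to its top-k routed experts. It is an illustrative sketch, not DeepSeek's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward expert (hypothetical sizes for illustration)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoELayer(nn.Module):
    """Sketch of an MoE layer: shared experts are always active, routed
    experts are selected per token by a top-k softmax gate."""

    def __init__(self, d_model=64, d_hidden=128, n_shared=2, n_routed=8, top_k=2):
        super().__init__()
        self.shared = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_shared))
        self.routed = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_routed))
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten (batch, seq, d_model) into a list of tokens.
        tokens = x.reshape(-1, x.shape[-1])
        # Shared experts process every token unconditionally.
        out = sum(e(tokens) for e in self.shared)
        # Gate scores; keep only the top-k routed experts per token.
        scores = F.softmax(self.gate(tokens), dim=-1)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]
            w = topk_scores[:, slot].unsqueeze(-1)
            # Dispatch each token to its selected expert (a loop for clarity;
            # real systems batch tokens per expert instead).
            for e_id in idx.unique():
                mask = idx == e_id
                out[mask] += w[mask] * self.routed[int(e_id)](tokens[mask])
        return out.reshape(x.shape)


if __name__ == "__main__":
    layer = MoELayer()
    y = layer(torch.randn(2, 5, 64))
    print(y.shape)  # torch.Size([2, 5, 64])
```

Production MoE systems replace the per-expert Python loop with batched dispatch kernels and typically add a load-balancing loss so tokens spread evenly across experts.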