DeepSeek-MoE
Efficient LLM
A Mixture-of-Experts (MoE) language model designed for stronger expert specialization, improving efficiency and performance over comparable models
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
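The idea named in the title is a mixture-of-experts design that pushes experts to specialize, typically by combining a few always-on shared experts with many small routed experts selected per token. The sketch below only illustrates that general pattern; it is not the repository's actual code, and the PyTorch framing, module names, and sizes are assumptions made for the example.

```python
# Illustrative sketch only (assumed PyTorch, hypothetical sizes): a
# DeepSeekMoE-style layer with always-on shared experts plus many small
# routed experts chosen per token by top-k gating.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One small feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoELayer(nn.Module):
    """Shared experts handle every token; routed experts are picked per token."""
    def __init__(self, d_model=64, d_hidden=128, n_shared=1, n_routed=8, top_k=2):
        super().__init__()
        self.shared = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_shared))
        self.routed = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_routed))
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); shared experts are applied to every token.
        out = torch.zeros_like(x)
        for expert in self.shared:
            out = out + expert(x)

        # The router scores every routed expert, keeps the top-k per token, and
        # renormalizes the kept probabilities as mixing weights.
        scores = F.softmax(self.gate(x), dim=-1)        # (num_tokens, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (num_tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        for k in range(self.top_k):
            for expert_id in idx[:, k].unique().tolist():
                mask = idx[:, k] == expert_id
                out[mask] = out[mask] + weights[mask, k].unsqueeze(-1) * self.routed[expert_id](x[mask])
        return out


if __name__ == "__main__":
    layer = MoELayer()
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

In this pattern the shared experts capture knowledge common to all tokens, while the router spreads the remaining capacity across many small experts so each one can specialize.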
1k stars
16 watching
53 forks
Language: Python
last commit: about 1 year ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | A large language model trained on a massive dataset for various applications | 1,512 |
| | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| | An implementation of a fast and efficient language model architecture | 613 |
| | A code intelligence model designed to generate and complete code in various programming languages | 2,322 |
| | An open-source implementation of a vision-language instructed large language model | 513 |
| | A benchmark for evaluating large language models in multiple languages and formats | 93 |
| | A high-performance LLM written in Python/JAX for training and inference on Google Cloud TPUs and GPUs | 1,557 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | An implementation of a neural network model for character-level language modeling | 50 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks | 2,023 |
| | A novel paradigm that accelerates diffusion models by cheaply reusing and updating high-level features | 818 |
| | Builds agents controlled by large language models (LLMs) to perform tasks with tool-based components | 940 |
| | An open-source toolkit for building and evaluating large language models | 267 |
| | A collection of deep learning models and utilities in JAX/Flax for research purposes | 151 |
| | A multilingual large language model developed by XVERSE Technology Inc., using a mixture-of-experts architecture and fine-tuned for tasks such as conversation, question answering, and natural language understanding | 36 |