deepsparse
Inference runtime
A sparsity-aware deep learning inference runtime that exploits pruned and quantized models to accelerate neural network inference on CPUs.
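The key idea behind a sparsity-aware runtime is that pruned (zeroed) weights need never be stored or multiplied. A minimal sketch of that principle, not DeepSparse's actual kernels, using a CSR (compressed sparse row) matrix-vector product in plain Python:

```python
def to_csr(rows):
    """Convert a dense row-major matrix (list of lists) to CSR triplets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, w in enumerate(row):
            if w != 0.0:          # pruned weights are simply never stored
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def sparse_matvec(csr, x):
    """Compute y = W @ x touching only the stored nonzero weights."""
    values, col_idx, row_ptr = csr
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A toy "layer" where two thirds of the weights have been pruned away:
W = [[0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
print(sparse_matvec(to_csr(W), [1.0, 1.0, 1.0]))  # [2.0, 3.0]
```

At 90%+ sparsity, the inner loop runs over a small fraction of the original multiply-accumulates, which is where the CPU speedup comes from.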
3k stars
57 watching
173 forks
Language: Python
last commit: 4 months ago
Linked from 1 awesome list
Tags: computer-vision, cpus, deepsparse, inference, llm-inference, machinelearning, nlp, object-detection, onnx, performance, pretrained-models, pruning, quantization, sparsification
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| neuralmagic/sparseml | Enables the creation of smaller neural network models through efficient pruning and quantization techniques | 2,071 |
| intel/neural-compressor | Tools and techniques for optimizing large language models across frameworks and hardware platforms | 2,226 |
| microsoft/deepspeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective | 35,463 |
| mit-han-lab/llm-awq | A tool for efficient and accurate weight quantization in large language models | 2,517 |
| labmlai/annotated_deep_learning_paper_implementations | Implementations of various deep learning algorithms and techniques with accompanying documentation | 56,215 |
| confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
| ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks | 11,189 |
| oxford-cs-deepnlp-2017/lectures | Lecture slides and course materials for an advanced natural language processing course | 15,683 |
| deepseek-ai/deepseek-v2 | A mixture-of-experts language model with strong performance and efficient inference | 3,590 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs | 9,192 |
| google-deepmind/deepmind-research | Implementations and illustrative code accompanying DeepMind research publications | 13,250 |
| microsoft/lightgbm | A high-performance gradient boosting framework for machine learning tasks | 16,694 |
| modeltc/lightllm | A lightweight, scalable, high-speed LLM inference and serving framework | 2,609 |
| optimalscale/lmflow | A toolkit for finetuning large language models with efficient inference capabilities | 8,273 |
| lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259 |
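Several projects above (sparseml, neural-compressor, llm-awq) revolve around pruning and quantization. A toy illustration of magnitude pruning, the simplest such technique, which zeroes the smallest-magnitude weights to produce the sparsity a runtime like deepsparse can exploit; this is a hedged sketch, not any project's actual API:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a flat weight list.

    `sparsity` is the target fraction of weights to remove, e.g. 0.5
    prunes half of them.
    """
    n_prune = int(len(weights) * sparsity)
    # Rank indices by absolute weight value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

print(magnitude_prune([0.9, -0.1, 0.4, -0.05], 0.5))  # [0.9, 0.0, 0.4, 0.0]
```

Real toolkits apply this per layer, gradually during training, and pair it with quantization-aware finetuning to recover accuracy.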