FlexLLMGen

Batch processor

Generates large language model outputs in high-throughput mode on single GPUs

Runs large language models on a single GPU for throughput-oriented scenarios.

Archived

GitHub

9k stars
112 watching
553 forks
Language: Python
Last commit: about 2 months ago
deep-learning, gpt-3, high-throughput, large-language-models, machine-learning, offloading, opt

Related projects:

Repository | Description | Stars
sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 8,011
sgl-project/sglang | A fast serving framework for large language models and vision language models. | 6,551
huggingface/text-generation-inference | A toolkit for deploying and serving Large Language Models (LLMs) for high-performance text generation | 9,456
brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,487
lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446
modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability. | 2,691
google/big-bench | A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks. | 2,899
optimalscale/lmflow | A toolkit for fine-tuning and inferring large machine learning models | 8,312
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data. | 7,672
microsoft/flaml | Automates machine learning workflows and optimizes model performance using large language models and efficient algorithms | 3,968
aksnzhy/xlearn | A high-performance machine learning package with linear models and factorization machines. | 3,087
qwenlm/qwen2.5 | A large language model series with various sizes and variants for text generation and understanding. | 10,959
dair-ai/ml-papers-explained | An explanation of key concepts and advancements in the field of Machine Learning | 7,352
mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 40,053
young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428