skypilot

AI/batch workload manager

A framework for running AI and batch workloads on any infrastructure, offering unified execution, cost savings, and high GPU availability.

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

GitHub

7k stars
70 watching
529 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists

Topics: cloud-computing, cloud-management, cost-management, cost-optimization, data-science, deep-learning, distributed-training, finops, gpu, hyperparameter-tuning, job-queue, job-scheduler, llm-serving, llm-training, machine-learning, ml-infrastructure, ml-platform, multicloud, spot-instances, tpu

Related projects:

Repository | Description | Stars
lightning-ai/lit-llama | An implementation of a large language model using the nanoGPT architecture | 6,013
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775
hiyouga/llama-factory | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732
lyogavin/airllm | Optimizes large language model inference on limited GPU resources | 5,446
nomic-ai/gpt4all | An open-source Python client for running large language models (LLMs) locally on any device | 71,176
meta-llama/llama-recipes | Provides tools and examples for fine-tuning the Meta Llama model and building applications with it | 15,578
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683
scisharp/llamasharp | An efficient C#/.NET library for running large language models (LLMs) on local devices | 2,750
sgl-project/sglang | A fast serving framework for large language models and vision language models | 6,551
llava-vl/llava-next | Develops large multimodal models for various computer vision tasks including image and video analysis | 3,099
sjtu-ipads/powerinfer | An efficient large language model inference engine leveraging consumer-grade GPUs on PCs | 8,011
optimalscale/lmflow | A toolkit for fine-tuning and inferring large machine learning models | 8,312
ploomber/ploomber | A platform for building and deploying data pipelines using Python, with features for caching, automation, and modularization | 3,530
meta-llama/llama-stack | Provides pre-packaged building blocks for generative AI applications with standardized APIs and service-oriented design | 5,164