skypilot

AI/batch workload manager

A framework for running AI and batch workloads on any infrastructure, offering unified execution, cost savings, and high GPU availability.

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

GitHub

7k stars
70 watching
510 forks
Language: Python
last commit: 6 days ago
Linked from 2 awesome lists

cloud-computing, cloud-management, cost-management, cost-optimization, data-science, deep-learning, distributed-training, finops, gpu, hyperparameter-tuning, job-queue, job-scheduler, llm-serving, llm-training, machine-learning, ml-infrastructure, ml-platform, multicloud, spot-instances, tpu

Related projects:

Repository | Description | Stars
lightning-ai/lit-llama | An implementation of a large language model using the nanoGPT architecture | 5,993
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754
hiyouga/llama-factory | A unified platform for fine-tuning multiple large language models with various training approaches and methods | 34,436
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,722
lyogavin/airllm | A Python library that optimizes inference memory usage for large language models on limited GPU resources | 5,259
nomic-ai/gpt4all | An open-source Python client for running Large Language Models (LLMs) locally on any device | 70,694
meta-llama/llama-recipes | Tools and examples for fine-tuning the Meta Llama model and building applications with it | 15,288
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,359
scisharp/llamasharp | A C#/.NET library to efficiently run Large Language Models (LLMs) on local devices | 2,703
sgl-project/sglang | A framework for serving large language models and vision models with an efficient runtime and a flexible interface | 6,082
llava-vl/llava-next | Large multimodal models for various computer vision tasks, including image and video analysis | 2,872
sjtu-ipads/powerinfer | An efficient Large Language Model inference engine leveraging consumer-grade GPUs on PCs | 7,964
optimalscale/lmflow | A toolkit for fine-tuning large language models and providing efficient inference capabilities | 8,273
ploomber/ploomber | A platform for building and deploying data pipelines using Python, with features for caching, automation, and modularization | 3,513
meta-llama/llama-stack | A set of standardized APIs and tools to build generative AI applications | 4,591