foundation-model-benchmarking-tool

AI benchmarking tool

A tool for benchmarking performance and accuracy of generative AI models on various AWS platforms

Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance types and serving stack options.

GitHub

210 stars

8 watching

32 forks

Language: Jupyter Notebook

last commit: 8 months ago

Linked from 1 awesome list

bedrock, benchmark, benchmarking, evaluation-metrics, foundation-models, g5, g6, g6e, generative-ai, inferentia, llama2, llama3, p4d, p5, sagemaker, trainium

aws-samples.github.io/foundation-model-benchmarking-tool/

Backlinks from these awesome lists:

ethicalml/awesome-production-machine-learning

Related projects:

Repository	Description	Stars
cloud-cv/evalai	A platform for comparing and evaluating AI and machine learning algorithms at scale	1,779
aifeg/benchlmm	An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models	84
google-research/weatherbench2	A benchmark framework for evaluating and comparing data-driven global weather models.	469
sparks-baird/matbench-genmetrics	Provides standardized benchmarks for evaluating the quality of generative models for crystal structures.	34
johnsnowlabs/langtest	A tool for testing and evaluating large language models with a focus on AI safety and model assessment.	506
benchflow/benchflow	Automated performance testing and analysis tool for distributed systems	25
bluss/bencher	A Rust benchmarking library that supports running and filtering benchmarks.	85
apexai/performance_test	An open-source project providing benchmarks and tools to measure the performance of AI and machine learning systems.	65
allenai/reward-bench	A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning.	459
farama-foundation/metaworld	A collection of robotic environment benchmarks for evaluating meta-reinforcement learning and multi-task learning algorithms.	1,290
logicalparadox/matcha	A tool for designing and running benchmarking experiments in JavaScript to measure the performance of code	563
microsoft/private-benchmarking	A platform for private benchmarking of machine learning models with different trust levels.	7
laion-ai/clip_benchmark	Evaluates and compares the performance of various CLIP-like models on different tasks and datasets.	632
mlcommons/inference	Measures the performance of deep learning models in various deployment scenarios.	1,256
bencheeorg/benchee	A tool for benchmarking Elixir code and comparing performance statistics	1,422