foundation-model-benchmarking-tool

AI benchmarking tool

A tool for benchmarking performance and accuracy of generative AI models on various AWS platforms

Foundation model benchmarking tool: run any model on any AWS platform and benchmark its performance across instance types and serving stack options.
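As a rough illustration of the workflow, here is a minimal sketch of driving the tool from a notebook cell. It assumes the package is installable from PyPI as `fmbench` and accepts a YAML benchmark configuration via a `--config-file` flag, as described in the repository; the config file path used here is hypothetical.

```python
# Minimal sketch, not the repository's own example.
# Assumptions: PyPI package name "fmbench"; CLI of the form
# "fmbench --config-file <path>"; the config path below is hypothetical.
import subprocess
import sys

# Install the benchmarking tool (assumed package name).
subprocess.run([sys.executable, "-m", "pip", "install", "fmbench"], check=True)

# Run a benchmark whose config pins the model, the instance type
# (e.g. g5 / p4d / p5 / Inferentia / Trainium) and the serving stack to compare.
subprocess.run(
    ["fmbench", "--config-file", "configs/llama3-8b-g5.yml"],  # hypothetical path
    check=True,
)
```

The config file is where the comparison is expressed: each run targets one combination of model, instance type, and serving stack, and repeated runs with different configs produce the performance comparison the description refers to.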

GitHub

210 stars
8 watching
32 forks
Language: Jupyter Notebook
Last commit: about 1 month ago
Linked from 1 awesome list

Topics: bedrock, benchmark, benchmarking, evaluation-metrics, foundation-models, g5, g6, g6e, generative-ai, inferentia, llama2, llama3, p4d, p5, sagemaker, trainium

Related projects:

| Repository | Description | Stars |
|---|---|---|
| cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 84 |
| google-research/weatherbench2 | A benchmark framework for evaluating and comparing data-driven global weather models | 469 |
| sparks-baird/matbench-genmetrics | Provides standardized benchmarks for evaluating the quality of generative models for crystal structures | 34 |
| johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment | 506 |
| benchflow/benchflow | Automated performance testing and analysis tool for distributed systems | 25 |
| bluss/bencher | A Rust benchmarking library that supports running and filtering benchmarks | 85 |
| apexai/performance_test | An open-source project providing benchmarks and tools to measure the performance of AI and machine learning systems | 65 |
| allenai/reward-bench | A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning | 459 |
| farama-foundation/metaworld | A collection of robotic environment benchmarks for evaluating meta-reinforcement learning and multi-task learning algorithms | 1,290 |
| logicalparadox/matcha | A tool for designing and running benchmarking experiments in JavaScript to measure the performance of code | 563 |
| microsoft/private-benchmarking | A platform for private benchmarking of machine learning models with different trust levels | 7 |
| laion-ai/clip_benchmark | Evaluates and compares the performance of various CLIP-like models on different tasks and datasets | 632 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios | 1,256 |
| bencheeorg/benchee | A tool for benchmarking Elixir code and comparing performance statistics | 1,422 |