foundation-model-benchmarking-tool

AI model benchmarker

A tool for benchmarking and evaluating generative AI models on various AWS platforms

Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance types and serving stack options.
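To illustrate the kind of measurement such a tool automates, here is a minimal sketch of benchmarking an inference endpoint for latency percentiles and throughput. The `invoke_model` function is a hypothetical stand-in for a real endpoint call (e.g. a SageMaker `invoke_endpoint` request); it is not FMBench's actual API.

```python
import time
import statistics

def invoke_model(payload: str) -> str:
    """Placeholder for a real endpoint call (e.g. SageMaker invoke_endpoint)."""
    time.sleep(0.001)  # simulate inference latency
    return payload.upper()

def benchmark(invoke, payloads, runs=50):
    """Time each request and summarize latency percentiles and throughput."""
    latencies = []
    for i in range(runs):
        start = time.perf_counter()
        invoke(payloads[i % len(payloads)])
        latencies.append(time.perf_counter() - start)
    return {
        "p50_ms": statistics.median(latencies) * 1000,
        # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile
        "p95_ms": statistics.quantiles(latencies, n=20)[18] * 1000,
        "throughput_rps": runs / sum(latencies),
    }

if __name__ == "__main__":
    print(benchmark(invoke_model, ["hello world"]))
```

Running the same harness against different instance types or serving stacks and comparing the resulting percentiles is the core comparison this category of tool performs.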

GitHub

203 stars
8 watching
31 forks
Language: Jupyter Notebook
last commit: 6 days ago
Linked from 1 awesome list

Tags: bedrock, benchmark, benchmarking, evaluation-metrics, foundation-models, g5, g6, g6e, generative-ai, inferentia, llama2, llama3, p4d, p5, sagemaker, trainium

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale. | 1,774 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models. | 82 |
| google-research/weatherbench2 | A benchmark framework for evaluating and comparing data-driven global weather models. | 455 |
| sparks-baird/matbench-genmetrics | Standardized benchmarks for evaluating the quality of generative models for crystal structures. | 34 |
| johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment. | 506 |
| benchflow/benchflow | An automated performance testing and analysis tool for distributed systems. | 25 |
| bluss/bencher | A Rust benchmarking library that supports running and filtering benchmarks. | 85 |
| apexai/performance_test | An open-source project providing benchmarks and tools to measure the performance of AI and machine learning systems. | 64 |
| allenai/reward-bench | A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning. | 442 |
| farama-foundation/metaworld | A collection of robotic environment benchmarks for evaluating meta-reinforcement learning and multi-task learning algorithms. | 1,282 |
| logicalparadox/matcha | A tool for designing and running benchmarking experiments in JavaScript to measure code performance. | 563 |
| microsoft/private-benchmarking | A platform for private benchmarking of machine learning models under different trust levels. | 6 |
| laion-ai/clip_benchmark | Evaluates and compares the performance of various CLIP-like models on different tasks and datasets. | 625 |
| mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios. | 1,243 |
| bencheeorg/benchee | A tool for benchmarking Elixir code and comparing performance statistics. | 1,416 |