foundation-model-benchmarking-tool
AI model benchmarker
A tool for benchmarking and evaluating generative AI models on various AWS platforms
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance types and serving stack options.
203 stars
8 watching
31 forks
Language: Jupyter Notebook
Last commit: 6 days ago
Linked from 1 awesome list
Topics: bedrock, benchmark, benchmarking, evaluation-metrics, foundation-models, g5, g6, g6e, generative-ai, inferentia, llama2, llama3, p4d, p5, sagemaker, trainium
Related projects:
Repository | Description | Stars |
---|---|---|
cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,774 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 82 |
google-research/weatherbench2 | A benchmark framework for evaluating and comparing data-driven global weather models. | 455 |
sparks-baird/matbench-genmetrics | Provides standardized benchmarks for evaluating the quality of generative models for crystal structures. | 34 |
johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment. | 506 |
benchflow/benchflow | Automated performance testing and analysis tool for distributed systems | 25 |
bluss/bencher | A Rust benchmarking library that supports running and filtering benchmarks. | 85 |
apexai/performance_test | An open-source project providing benchmarks and tools to measure the performance of AI and machine learning systems. | 64 |
allenai/reward-bench | A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning. | 442 |
farama-foundation/metaworld | A collection of robotic environment benchmarks for evaluating meta-reinforcement learning and multi-task learning algorithms. | 1,282 |
logicalparadox/matcha | A tool for designing and running benchmarking experiments in JavaScript to measure the performance of code | 563 |
microsoft/private-benchmarking | A platform for private benchmarking of machine learning models with different trust levels. | 6 |
laion-ai/clip_benchmark | Evaluates and compares the performance of various CLIP-like models on different tasks and datasets. | 625 |
mlcommons/inference | Measures the performance of deep learning models in various deployment scenarios. | 1,243 |
bencheeorg/benchee | A tool for benchmarking Elixir code and comparing performance statistics | 1,416 |