EvalAI
Benchmarking tool
A platform for comparing and evaluating AI and machine learning algorithms at scale
Evaluating the state of the art in AI
2k stars
54 watching
799 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
ai, ai-challenges, angular7, angularjs, artificial-intelligence, challenge, django, docker, evalai, evaluation, leaderboard, machine-learning, python, reproducibility, reproducible-research
Related projects:
Repository | Description | Stars |
---|---|---|
catboost/benchmarks | Comparative benchmarks of various machine learning algorithms | 169 |
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 322 |
aws-samples/foundation-model-benchmarking-tool | A tool for benchmarking performance and accuracy of generative AI models on various AWS platforms | 210 |
alco/benchfella | Tools for comparing and benchmarking small code snippets | 514 |
ethicalml/xai | An eXplainability toolbox for machine learning that enables data analysis and model evaluation to mitigate biases and improve performance | 1,135 |
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 21 |
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models | 85 |
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 31 |
bencheeorg/benchee | A tool for benchmarking Elixir code and comparing performance statistics | 1,422 |
bailool/doyouevenlearn | A comprehensive resource guide to stay updated on AI, ML, DL, and CV advancements | 1,039 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 84 |
vlall/swift-brain | A collection of algorithms and data structures for artificial intelligence and machine learning in Swift | 335 |
openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
jvalegre/robert | Automated machine learning protocols for cheminformatics using Python | 39 |
vchitect/vbench | A benchmark suite for evaluating the performance of video generative models | 643 |