EvalAI

Benchmarking tool

A platform for comparing and evaluating AI and machine learning algorithms at scale

Evaluating the state of the art in AI

GitHub

2k stars
54 watching
799 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list

Topics: ai, ai-challenges, angular7, angularjs, artificial-intelligence, challenge, django, docker, evalai, evaluation, leaderboard, machine-learning, python, reproducibility, reproducible-research

Related projects:

Repository | Description | Stars
catboost/benchmarks | Comparative benchmarks of various machine learning algorithms | 169
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 322
aws-samples/foundation-model-benchmarking-tool | A tool for benchmarking performance and accuracy of generative AI models on various AWS platforms | 210
alco/benchfella | Tools for comparing and benchmarking small code snippets | 514
ethicalml/xai | An eXplainability toolbox for machine learning that enables data analysis and model evaluation to mitigate biases and improve performance | 1,135
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 21
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models | 85
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 31
bencheeorg/benchee | A tool for benchmarking Elixir code and comparing performance statistics | 1,422
bailool/doyouevenlearn | A comprehensive resource guide to stay updated on AI, ML, DL, and CV advancements | 1,039
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 84
vlall/swift-brain | A collection of algorithms and data structures for artificial intelligence and machine learning in Swift | 335
openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059
jvalegre/robert | Automated machine learning protocols for cheminformatics using Python | 39
vchitect/vbench | A benchmark suite for evaluating the performance of video generative models | 643