EvalAI

Benchmarking tool

A platform for comparing and evaluating AI and machine learning algorithms at scale

Evaluating the state of the art in AI

GitHub

2k stars

54 watching

799 forks

Language: Python

Last commit: about 1 year ago

Linked from 1 awesome list

ai, ai-challenges, angular7, angularjs, artificial-intelligence, challenge, django, docker, evalai, evaluation, leaderboard, machine-learning, python, reproducibility, reproducible-research

eval.ai

Backlinks from these awesome lists:

ethicalml/awesome-production-machine-learning

Related projects:

Repository	Description	Stars
catboost/benchmarks	Comparative benchmarks of various machine learning algorithms	169
ailab-cvc/seed-bench	A benchmark for evaluating large language models' ability to process multimodal input	322
aws-samples/foundation-model-benchmarking-tool	A tool for benchmarking performance and accuracy of generative AI models on various AWS platforms	210
alco/benchfella	Tools for comparing and benchmarking small code snippets	514
ethicalml/xai	An eXplainability toolbox for machine learning that enables data analysis and model evaluation to mitigate biases and improve performance	1,135
mshukor/evalign-icl	Evaluating and improving large multimodal models through in-context learning	21
princeton-nlp/charxiv	An evaluation suite for assessing chart understanding in multimodal large language models	85
ys-zong/vl-icl	A benchmarking suite for multimodal in-context learning models	31
bencheeorg/benchee	A tool for benchmarking Elixir code and comparing performance statistics	1,422
bailool/doyouevenlearn	A comprehensive resource guide to stay updated on AI, ML, DL, and CV advancements	1,039
aifeg/benchlmm	An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models	84
vlall/swift-brain	A collection of algorithms and data structures for artificial intelligence and machine learning in Swift	335
openai/simple-evals	Evaluates language models using standardized benchmarks and prompting techniques	2,059
jvalegre/robert	Automated machine learning protocols for cheminformatics using Python	39
vchitect/vbench	A benchmark suite for evaluating the performance of video generative models	643