EvalAI
Benchmarking tool
A platform for comparing and evaluating AI and machine learning algorithms at scale
Evaluating the state of the art in AI
2k stars
54 watching
786 forks
Language: Python
Last commit: 2 months ago
Linked from 1 awesome list
Tags: ai, ai-challenges, angular7, angularjs, artificial-intelligence, challenge, django, docker, evalai, evaluation, leaderboard, machine-learning, python, reproducibility, reproducible-research
Related projects:
Repository | Description | Stars |
---|---|---|
catboost/benchmarks | Comparative benchmarks of various machine learning algorithms | 169 |
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
aws-samples/foundation-model-benchmarking-tool | A tool for benchmarking and evaluating generative AI models on various AWS platforms | 196 |
alco/benchfella | Tools for comparing and benchmarking small code snippets | 516 |
ethicalml/xai | An eXplainability toolbox for machine learning that enables data analysis and model evaluation to mitigate biases and improve performance | 1,125 |
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 20 |
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models | 75 |
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28 |
bencheeorg/benchee | A tool for benchmarking Elixir code and comparing performance statistics | 1,417 |
bailool/doyouevenlearn | A comprehensive resource guide to stay updated on AI, ML, DL, and CV advancements | 1,038 |
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 83 |
vlall/swift-brain | A collection of algorithms and data structures for artificial intelligence and machine learning in Swift | 335 |
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
jvalegre/robert | Automated machine learning protocols for cheminformatics using Python | 38 |
vchitect/vbench | A tool for evaluating and benchmarking video generative models in computer vision and artificial intelligence | 576 |