Bingo

Model evaluation tool

An analysis project investigating the limitations of visual language models in understanding and processing images, with a focus on bias and interference challenges.

GitHub

53 stars
3 watching
1 fork
last commit: 8 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mikegu721/xiezhibenchmark | An evaluation suite assessing language models' performance on multiple-choice questions. | 91 |
| zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks. | 96 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks. | 781 |
| open-compass/mmbench | A collection of benchmarks for evaluating the multi-modal understanding capability of large vision-language models. | 163 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models. | 1,433 |
| cluebenchmark/supercluelyb | A benchmarking platform for evaluating Chinese general-purpose models through anonymous, randomized battles. | 141 |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |
| agrigpts/agrigpts | Develops agricultural large language models to support research and practical applications in agriculture. | 22 |
| yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 267 |
| cgnorthcutt/cleanlab | A tool for detecting and correcting label errors in machine learning datasets. | 57 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models. | 8 |
| applieddatasciencepartners/xgboostexplainer | Provides tools to understand and interpret the decisions made by XGBoost models. | 252 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |