Bingo

Model evaluation tool

An analysis project investigating the limitations of visual language models in understanding and processing images, with a focus on bias and interference challenges.

GitHub

53 stars
3 watching
1 fork
last commit: 10 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| mikegu721/xiezhibenchmark | An evaluation suite for assessing language models' performance on multiple-choice questions | 93 |
| zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks | 97 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| open-compass/mmbench | A collection of benchmarks for evaluating the multi-modal understanding capability of large vision-language models | 168 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models | 1,450 |
| cluebenchmark/supercluelyb | A benchmarking platform for evaluating Chinese general-purpose models through anonymous, random battles | 143 |
| open-compass/vlmevalkit | An evaluation toolkit for large vision-language models | 1,514 |
| felixgithub2017/mmcu | Measures large language models' understanding across massive multitask Chinese datasets | 87 |
| agrigpts/agrigpts | Develops large language models for agricultural applications to improve crop yields and support rural development | 22 |
| yuweihao/mm-vet | Evaluates the capabilities of large multimodal models on a diverse set of tasks and metrics | 274 |
| cgnorthcutt/cleanlab | A data-centric AI tool for detecting label errors and improving dataset quality in machine learning | 57 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models | 8 |
| applieddatasciencepartners/xgboostexplainer | Provides tools to understand and interpret the decisions made by XGBoost models | 253 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 296 |