Bingo

Model evaluation tool

An analysis project investigating the limitations of visual language models in understanding and processing images, with a focus on bias and interference challenges.

GitHub

53 stars
3 watching
1 fork
last commit: 8 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mikegu721/xiezhibenchmark | An evaluation suite assessing language models' performance on multiple-choice questions. | 91 |
| zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks. | 96 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks. | 781 |
| open-compass/mmbench | A collection of benchmarks for evaluating the multi-modal understanding capability of large vision-language models. | 163 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models. | 1,433 |
| cluebenchmark/supercluelyb | A benchmarking platform for evaluating Chinese general-purpose models through anonymous, randomized battles. | 141 |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |
| agrigpts/agrigpts | Develops agricultural large language models to support research and practical applications in agriculture. | 22 |
| yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 267 |
| cgnorthcutt/cleanlab | A tool for detecting and correcting label errors in machine learning datasets. | 57 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models. | 8 |
| applieddatasciencepartners/xgboostexplainer | Provides tools to understand and interpret the decisions made by XGBoost models. | 252 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |