Bingo
Model evaluation tool
An analysis project investigating the limitations of visual language models in understanding and processing images, with a focus on potential bias and interference challenges.
53 stars
3 watching
1 fork
last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance on multiple-choice questions. | 91 |
| zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks. | 96 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
| open-compass/mmbench | A collection of benchmarks to evaluate the multi-modal understanding capability of large vision-language models. | 163 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models. | 1,433 |
| cluebenchmark/supercluelyb | A benchmarking platform for evaluating Chinese general-purpose models through anonymous, random battles. | 141 |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |
| agrigpts/agrigpts | Develops agricultural large language models to support research and practical applications in agriculture. | 22 |
| yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 267 |
| cgnorthcutt/cleanlab | A tool for finding label errors in datasets and improving the quality of data used to train machine learning models. | 57 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models. | 8 |
| applieddatasciencepartners/xgboostexplainer | Provides tools to understand and interpret the decisions made by XGBoost models in machine learning. | 252 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |