Bingo
Model evaluation tool
An analysis project investigating the limitations of visual language models in understanding and processing images, with a focus on bias and interference challenges.
53 stars
3 watching
1 fork
last commit: 10 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
mikegu721/xiezhibenchmark | An evaluation suite to assess language models' performance on multiple-choice questions | 93 |
zzhanghub/eval-co-sod | An evaluation tool for co-saliency detection tasks | 97 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
open-compass/mmbench | A collection of benchmarks to evaluate the multimodal understanding capability of large vision-language models | 168 |
mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models | 1,450 |
cluebenchmark/supercluelyb | A benchmarking platform for evaluating Chinese general-purpose models through anonymous, random battles | 143 |
open-compass/vlmevalkit | An evaluation toolkit for large vision-language models | 1,514 |
felixgithub2017/mmcu | Measures large language models' understanding across massive multitask Chinese datasets | 87 |
agrigpts/agrigpts | Large language models for agricultural applications, aimed at improving crop yields and supporting rural development | 22 |
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274 |
cgnorthcutt/cleanlab | A tool for detecting and fixing label errors to improve the quality of machine learning datasets | 57 |
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models | 8 |
applieddatasciencepartners/xgboostexplainer | Provides tools to understand and interpret the decisions made by XGBoost models in machine learning | 253 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 296 |