VEGA

Multimodal evaluation framework

Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.
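
To make the task concrete, here is a minimal Python sketch of what an interleaved image-text evaluation item and its prompt assembly might look like. The InterleavedSample fields, the build_prompt helper, and the <image> placeholder token are illustrative assumptions for this sketch, not VEGA's actual schema or API.

    # Sketch of an interleaved image-text evaluation sample.
    # Field names and the placeholder token are assumptions, not VEGA's real format.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class InterleavedSample:
        """One evaluation item: text segments interleaved with image references."""
        segments: List[str]      # text chunks, in reading order
        image_paths: List[str]   # images referenced between text chunks
        question: str            # question about the interleaved context
        answer: str              # gold answer used for scoring

    def build_prompt(sample: InterleavedSample, image_token: str = "<image>") -> str:
        """Interleave text segments with image placeholder tokens, then append the question."""
        parts = []
        for i, seg in enumerate(sample.segments):
            parts.append(seg)
            if i < len(sample.image_paths):
                parts.append(image_token)  # model-specific placeholder for the i-th image
        parts.append(f"Question: {sample.question}")
        return "\n".join(parts)

    if __name__ == "__main__":
        sample = InterleavedSample(
            segments=["Figure 1 shows the training curve.", "Figure 2 shows the ablation results."],
            image_paths=["fig1.png", "fig2.png"],
            question="Which figure reports the ablation study?",
            answer="Figure 2",
        )
        print(build_prompt(sample))

The point of the sketch is only that each item mixes several text passages with several images, and a model must reason over the whole interleaved context rather than a single image-question pair.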

GitHub

33 stars
1 watching
2 forks
Language: Python
Last commit: 6 months ago

Related projects:

Repository | Description | Stars
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 296
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274
yuliang-liu/monkey | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15
yuxie11/r2d2 | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese | 157
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences | 78
shizhediao/davinci | Implements a unified modal learning framework for generative vision-language models | 43
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,299
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 143
kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478
meituan-automl/mobilevlm | An implementation of a vision language model designed for mobile devices, using a lightweight downsample projector and pre-trained language models | 1,076
pku-yuangroup/languagebind | Extends pretrained models to multiple modalities by aligning language and video representations | 751
haozhezhao/mic | Develops a multimodal vision-language model that helps machines understand complex relationships between instructions and images across tasks | 337
yuqifan1117/hallucidoctor | Provides tools and frameworks to mitigate hallucinatory toxicity in visual instruction data, allowing researchers to fine-tune MLLM models on specific datasets | 41
penghao-wu/vstar | PyTorch implementation of a guided visual search mechanism for multimodal LLMs | 541