VEGA
Multimodal evaluation framework
Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.
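The task centers on interleaved image-text inputs, i.e. evaluation items that mix text segments with one or more images in a single context. The sketch below is a rough illustration only, not VEGA's actual data format or API: the class names, fields, and toy scoring rule are all hypothetical, and simply show what such an interleaved sample might look like in Python.

```python
# Hypothetical sketch only; VEGA's real dataset schema and evaluation code may differ.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ImageRef:
    path: str  # local path or URL of an image in the interleaved context


@dataclass
class InterleavedSample:
    # Content alternates freely between text segments and image references.
    content: List[Union[str, ImageRef]]
    question: str
    answer: str  # gold answer used for scoring


sample = InterleavedSample(
    content=[
        "Figure 1 shows the model architecture.",
        ImageRef("figures/architecture.png"),
        "Figure 2 reports the ablation results.",
        ImageRef("figures/ablation.png"),
    ],
    question="Which figure reports the ablation results?",
    answer="Figure 2",
)


def exact_match(prediction: str, gold: str) -> bool:
    """Toy scoring rule; real benchmarks typically use more robust metrics."""
    return prediction.strip().lower() == gold.strip().lower()
```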
33 stars
1 watching
2 forks
Language: Python
Last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 296 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 274 |
| | An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage. | 1,849 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese. | 157 |
| | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 78 |
| | An implementation of a unified modal learning framework for generative vision-language models. | 43 |
| | A deep learning framework for training multimodal models with vision and language capabilities. | 1,299 |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| | A framework for grounding language models to images and handling multimodal inputs and outputs. | 478 |
| | An implementation of a vision-language model designed for mobile devices, using a lightweight downsample projector and pre-trained language models. | 1,076 |
| | Extends pretrained models to handle multiple modalities by aligning language and video representations. | 751 |
| | Develops a multimodal vision-language model that enables machines to understand complex relationships between instructions and images across various tasks. | 337 |
| | Provides tools and frameworks to mitigate hallucinatory toxicity in visual instruction data, allowing researchers to fine-tune MLLMs on specific datasets. | 41 |
| | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 541 |