VEGA
Multimodal evaluation framework
Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.
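The task centers on interleaved image-text inputs, i.e. evaluation items that mix text segments with one or more images in a single context. The sketch below is a rough illustration only, not VEGA's actual data format or API: the class names, fields, and toy scoring rule are all hypothetical, and simply show what such an interleaved sample might look like in Python.

```python
# Hypothetical sketch only; VEGA's real dataset schema and evaluation code may differ.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ImageRef:
    path: str  # local path or URL of an image in the interleaved context


@dataclass
class InterleavedSample:
    # Content alternates freely between text segments and image references.
    content: List[Union[str, ImageRef]]
    question: str
    answer: str  # gold answer used for scoring


sample = InterleavedSample(
    content=[
        "Figure 1 shows the model architecture.",
        ImageRef("figures/architecture.png"),
        "Figure 2 reports the ablation results.",
        ImageRef("figures/ablation.png"),
    ],
    question="Which figure reports the ablation results?",
    answer="Figure 2",
)


def exact_match(prediction: str, gold: str) -> bool:
    """Toy scoring rule; real benchmarks typically use more robust metrics."""
    return prediction.strip().lower() == gold.strip().lower()
```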
33 stars
1 watching
2 forks
Language: Python
Last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 296 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics. | 274 |
| | An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage. | 1,849 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese. | 157 |
| | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 78 |
| | An implementation of a unified modal learning framework for generative vision-language models. | 43 |
| | A deep learning framework for training multimodal models with vision and language capabilities. | 1,299 |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| | A framework for grounding language models to images and handling multimodal inputs and outputs. | 478 |
| | An implementation of a vision-language model designed for mobile devices, using a lightweight downsample projector and pre-trained language models. | 1,076 |
| | Extends pretrained models to handle multiple modalities by aligning language and video representations. | 751 |
| | Develops a multimodal vision-language model that enables machines to understand complex relationships between instructions and images across various tasks. | 337 |
| | Provides tools and frameworks to mitigate hallucinatory toxicity in visual instruction data, allowing researchers to fine-tune MLLMs on specific datasets. | 41 |
| | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 541 |