OmniBench
Multimodal benchmarking
Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.
A project for tri-modal LLM benchmarking and instruction tuning.
14 stars
0 watching
1 fork
Language: Python
last commit: 16 days ago
Related projects:
Repository | Description | Stars |
---|---|---|
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |
yuliang-liu/multimodalocr | An evaluation benchmark for OCR capabilities in large multimodal models. | 471 |
multimodal-art-projection/map-neo | A large language model designed for research and application in natural language processing tasks. | 877 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72 |
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations | 723 |
qcri/llmebench | A benchmarking framework for large language models | 80 |
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,089 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
subho406/omninet | An implementation of a unified architecture for multi-modal multi-task learning using PyTorch. | 512 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
uw-madison-lee-lab/cobsat | Provides a benchmarking framework and dataset for evaluating the performance of large language models in text-to-image tasks | 28 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs. | 1,825 |
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 288 |