Video-MME
Video analysis benchmark
An evaluation framework that provides a comprehensive benchmark of multi-modal large language models' capabilities in video analysis.
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
406 stars
5 watching
12 forks
last commit: 5 months ago
Topics: large-language-models, large-vision-language-models, mme, multimodal-large-language-models, video, video-mme
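Video-MME poses multiple-choice questions over short, medium, and long videos, so evaluating a model largely reduces to parsing its chosen option and computing accuracy per duration bucket. The sketch below is a minimal illustration of that scoring step, assuming a hypothetical JSON results file with `duration`, `answer`, and `prediction` fields; it is not the repository's official evaluation script.

```python
# Minimal sketch of multiple-choice accuracy scoring for a Video-MME-style
# evaluation. The field names ("duration", "answer", "prediction") and the
# results-file name are assumptions, not the repository's official schema.
import json
import re
from collections import defaultdict

OPTION_PATTERN = re.compile(r"\b([A-D])\b")

def extract_choice(text: str) -> str | None:
    """Pull the first option letter (A-D) out of a free-form model response."""
    match = OPTION_PATTERN.search(text.strip().upper())
    return match.group(1) if match else None

def score(results: list[dict]) -> dict[str, float]:
    """Return accuracy per duration bucket plus an overall score."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in results:
        bucket = item.get("duration", "all")  # e.g. "short", "medium", "long"
        total[bucket] += 1
        if extract_choice(item["prediction"]) == item["answer"].strip().upper():
            correct[bucket] += 1
    accuracy = {bucket: correct[bucket] / total[bucket] for bucket in total}
    accuracy["overall"] = sum(correct.values()) / max(sum(total.values()), 1)
    return accuracy

if __name__ == "__main__":
    # Hypothetical results file: a JSON list of per-question records.
    with open("videomme_predictions.json") as f:
        print(score(json.load(f)))
```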
Related projects:
| Repository | Description | Stars |
|---|---|---|
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 83 |
| pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117 |
| rese1f/moviechat | A deep learning model designed to efficiently process and analyze long videos using large language models | 525 |
| tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 288 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a massive multi-task dataset | 87 |
| boheumd/ma-lmm | A memory-augmented large multimodal model for long-term video understanding | 244 |
| yfzhang114/mme-realworld | A benchmark dataset for evaluating multimodal large language models in realistic, high-resolution real-world scenarios | 78 |
| pku-yuangroup/chronomagic-bench | A benchmark and dataset for evaluating text-to-video generation models' ability to generate coherent and varied metamorphic time-lapse videos | 186 |
| huaizhengzhang/awsome-deep-learning-for-video-analysis | A collection of resources and tools for video analysis using deep learning and multi-modal learning techniques | 763 |
| cmmmu-benchmark/cmmmu | An evaluation benchmark and dataset for multimodal question answering models | 46 |
| mltframework/mlt | A multimedia framework for video editing, providing tools and libraries for audio and video processing | 1,506 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models across multiple languages and formats | 92 |
| gabeur/mmt | A cross-modal architecture for video retrieval that combines multiple types of video and text features | 258 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline | 22 |
| laomao0/bin | Software to interpolate blurry video frames and enhance image quality | 210 |