Video-MME

Video analysis benchmark

An evaluation benchmark for multi-modal large language models (MLLMs) in video analysis, designed to assess their capabilities comprehensively.

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

GitHub

406 stars
5 watching
12 forks
last commit: 5 months ago
large-language-models, large-vision-language-models, mme, multimodal-large-language-models, video, video-mme

Related projects:

Repository | Description | Stars
aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 83
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117
rese1f/moviechat | A deep learning model designed to efficiently process and analyze long videos using large language models | 525
tsb0601/mmvp | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks | 288
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset | 87
boheumd/ma-lmm | An AI model for long-term video understanding | 244
yfzhang114/mme-realworld | A benchmark dataset designed to evaluate the performance of multimodal large language models in realistic, high-resolution real-world scenarios | 78
pku-yuangroup/chronomagic-bench | A benchmark and dataset for evaluating text-to-video generation models' ability to generate coherent and varied metamorphic time-lapse videos | 186
huaizhengzhang/awsome-deep-learning-for-video-analysis | A collection of resources and tools for video analysis using deep learning and multi-modal learning techniques | 763
cmmmu-benchmark/cmmmu | An evaluation benchmark and dataset for multimodal question answering models | 46
mltframework/mlt | A multimedia framework designed for video editing, providing tools and libraries for audio and video processing | 1,506
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92
gabeur/mmt | A cross-modal architecture for video retrieval that combines multiple types of features from videos and text | 258
chenllliang/mmevalpro | A benchmarking framework for evaluating large multimodal models, providing rigorous metrics and an efficient evaluation pipeline | 22
laomao0/bin | Software to interpolate blurry video frames and enhance image quality | 210