MovieChat
Video understanding optimizer
Develops a method for long video understanding that reduces memory usage by condensing dense frame tokens into a sparse memory
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
550 stars
10 watching
41 forks
Language: Python
Last commit: 3 months ago
Topics: computer-vision, dataset, large-language-models, llama, long-video-understanding, multimodal-large-language-models
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths. | 314 |
| | Evaluates and benchmarks large language models' video understanding capabilities | 121 |
| | Comprehensive benchmark for evaluating multi-modal large language models on video analysis tasks | 422 |
| | Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text | 259 |
| | An artificial intelligence system designed to understand and describe long-form video content | 329 |
| | An image-based language model that uses large language models to generate visual and text features from videos | 748 |
| | A native video editing library for React Native that provides tools for trimming, compressing, and processing videos on mobile devices. | 1,257 |
| | A collection of resources and tools for video analysis using deep learning and multi-modal learning techniques. | 767 |
| | This project develops an AI model for long-term video understanding | 254 |
| | A utility for analyzing videos based on objects and scenes within them | 358 |
| | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 245 |
| | A collection of video analysis methods and datasets for research and development | 533 |
| | This project explores question-answering in movies using various machine learning approaches. | 80 |
| | A CLI tool for retrieving and comparing movie information | 162 |
| | An implementation of an end-to-end learning framework for video object detection using feature aggregation along motion paths | 724 |