Momentor
Video LLM
A video Large Language Model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling.
58 stars
6 watching
2 forks
Language: Python
last commit: 3 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A PyTorch-based Video LLM designed to understand and reason about video moments in terms of time boundaries. | 231 |
| | This project develops an AI model for long-term video understanding | 254 |
| | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing. | 957 |
| | A polyglot large language model designed to address limitations in current LLM research and provide better multilingual instruction-following capability. | 77 |
| | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| | A tool to evaluate video language models' ability to understand and describe video content | 91 |
| | Developing a Large Language Model capable of processing 3D representations as inputs | 979 |
| | An image-based language model that uses large language models to generate visual and text features from videos | 748 |
| | An API that provides a unified interface to multiple large language models for chat fine-tuning | 79 |
| | An open-source implementation of a vision-language instructed large language model | 513 |
| | A large language model designed to understand and generate instructions with accompanying visual content | 360 |
| | A framework that enables large language models to process and understand multimodal inputs from various sources such as images and speech. | 308 |
| | A tutorial project for exploring large language models and their applications in natural language processing tasks. | 1,593 |
| | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| | A deep learning project that provides a video-text retrieval model and tools for training and evaluating it on the MSR-VTT dataset | 154 |