Momentor
Video LLM
A video Large Language Model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling.
54 stars
8 watching
2 forks
Language: Python
last commit: 5 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
huangb23/vtimellm | A PyTorch-based Video LLM designed to understand and reason about video moments in terms of time boundaries. | 225 |
boheumd/ma-lmm | An AI model for long-term video understanding | 244 |
damo-nlp-sg/videollama2 | An audio-visual language model designed to understand and generate video content | 871 |
damo-nlp-mt/polylm | A polyglot large language model designed to address limitations in current LLM research and provide better multilingual instruction-following capability. | 76 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
llyx97/tempcompass | A tool to evaluate video language models' ability to understand and describe video content | 84 |
umass-foundation-model/3d-llm | Developing a Large Language Model capable of processing 3D representations as inputs | 961 |
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
victordibia/llmx | An API that provides a unified interface to multiple large language models for chat fine-tuning | 79 |
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |
dcdmllm/cheetah | A large language model designed to understand and generate instructions with accompanying visual content | 356 |
phellonchen/x-llm | A framework that enables large language models to process and understand multimodal inputs from various sources such as images and speech. | 306 |
internlm/tutorial | A comprehensive tutorial project offering in-depth training and practice on advanced language model technologies | 1,530 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
danieljf24/dual_encoding | A deep learning project that provides a video-text retrieval model and tools for training and evaluating it on the MSR-VTT dataset | 155 |