Momentor

Video LLM

A video large language model (Video LLM) designed for fine-grained comprehension and temporal localization in videos, with a custom Temporal Perception Module for improved temporal modeling.

GitHub

58 stars

6 watching

2 forks

Language: Python

last commit: over 1 year ago

Related projects:

Repository	Description	Stars
huangb23/vtimellm	A PyTorch-based Video LLM designed to understand and reason about video moments in terms of time boundaries.	231
boheumd/ma-lmm	An AI model for long-term video understanding.	254
damo-nlp-sg/videollama2	An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing.	957
damo-nlp-mt/polylm	A polyglot large language model designed to address limitations in current LLM research and provide better multilingual instruction-following capability.	77
lyuchenyang/macaw-llm	A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation	1,568
llyx97/tempcompass	A tool to evaluate video language models' ability to understand and describe video content	91
umass-foundation-model/3d-llm	Developing a Large Language Model capable of processing 3D representations as inputs	979
dvlab-research/llama-vid	An image-based language model that uses large language models to generate visual and text features from videos	748
victordibia/llmx	An API that provides a unified interface to multiple chat fine-tuned large language models	79
luogen1996/lavin	An open-source implementation of a vision-language instructed large language model	513
dcdmllm/cheetah	A large language model designed to understand and generate instructions with accompanying visual content	360
phellonchen/x-llm	A framework that enables large language models to process and understand multimodal inputs from various sources such as images and speech.	308
internlm/tutorial	A tutorial project for exploring large language models and their applications in natural language processing tasks.	1,593
mbzuai-oryx/groundinglmm	An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations	797
danieljf24/dual_encoding	A deep learning project that provides a video-text retrieval model and tools for training and evaluating it on the MSR-VTT dataset	154