MA-LMM

Video understanding model

This project develops an AI model for long-term video understanding.

(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding


244 stars
4 watching
27 forks
Language: Python
Last commit: 4 months ago
Topics: llm, video-understanding

Related projects:

Repository | Description | Stars
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92
dcdmllm/momentor | A video large language model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling | 54
lyuchenyang/macaw-llm | A multimodal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550
evolvinglmms-lab/longva | A model for long-context transfer from language to vision | 334
huangb23/vtimellm | A PyTorch-based video LLM designed to understand and reason about video moments in terms of their time boundaries | 225
alpha-vllm/wemix-llm | A LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17
ailab-cvc/seed | An implementation of a multimodal language model with both comprehension and generation capabilities | 576
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269
360cvgroup/360vl | A large multimodal model built on the Llama 3 language model, designed to improve image understanding | 30
evolvinglmms-lab/lmms-eval | Tools and an evaluation suite for large multimodal models | 2,058
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733
jiyt17/ida-vlm | An identity-aware large vision-language model for understanding complex visual narratives such as movies | 25
ldmt-muri/morpholm | Language models that incorporate morphological knowledge to improve their understanding of linguistic structures and relationships | 3
llyx97/tempcompass | A tool for evaluating how well video language models understand and describe video content | 84