MA-LMM

Video understanding model

This project develops a memory-augmented large multimodal model for long-term video understanding.

(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

GitHub

254 stars

4 watching

27 forks

Language: Python

last commit: over 1 year ago

llm, video-understanding

boheumd.github.io/MA-LMM/

Related projects:

Repository	Description	Stars
luogen1996/lavin	An open-source implementation of a vision-language instructed large language model	513
damo-nlp-sg/m3exam	A benchmark for evaluating large language models in multiple languages and formats	93
dcdmllm/momentor	A video Large Language Model designed for fine-grained comprehension and localization in videos with a custom Temporal Perception Module for improved temporal modeling	58
lyuchenyang/macaw-llm	A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation	1,568
evolvinglmms-lab/longva	An open-source project that enables the transfer of language understanding to vision capabilities through long context processing	347
huangb23/vtimellm	A PyTorch-based Video LLM designed to understand and reason about video moments in terms of time boundaries.	231
alpha-vllm/wemix-llm	A LLaMA-based multimodal language model with various instruction-following and multimodal variants	17
ailab-cvc/seed	An implementation of a multimodal language model with capabilities for comprehension and generation	585
mlpc-ucsd/bliva	A multimodal LLM designed to handle text-rich visual questions	270
360cvgroup/360vl	A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities	32
evolvinglmms-lab/lmms-eval	Tools and evaluation framework for accelerating the development of large multimodal models by providing an efficient way to assess their performance	2,164
dvlab-research/llama-vid	An image-based language model that uses large language models to generate visual and text features from videos	748
jiyt17/ida-vlm	An open-source project that aims to improve large vision-language models by integrating identity-aware capabilities and utilizing visual instruction tuning data for movie understanding	26
ldmt-muri/morpholm	Language models that incorporate morphological knowledge to improve their understanding of linguistic structures and relationships	3
llyx97/tempcompass	A tool to evaluate video language models' ability to understand and describe video content	91