MA-LMM
Video understanding model
This project develops an AI model for long-term video understanding
(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
244 stars
4 watching
27 forks
Language: Python
last commit: 4 months ago
Topics: llm, video-understanding
Related projects:
Repository | Description | Stars |
---|---|---|
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
dcdmllm/momentor | A video large language model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling | 54 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
evolvinglmms-lab/longva | A model that transfers long-context capability from language to vision | 334 |
huangb23/vtimellm | A PyTorch-based Video LLM designed to understand and reason about video moments in terms of time boundaries. | 225 |
alpha-vllm/wemix-llm | A LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17 |
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 269 |
360cvgroup/360vl | A large multimodal model built on the Llama 3 language model, designed to improve image understanding | 30 |
evolvinglmms-lab/lmms-eval | Tools and evaluation suite for large multimodal models | 2,058 |
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
jiyt17/ida-vlm | An identity-aware large vision-language model for understanding complex visual narratives such as movies | 25 |
ldmt-muri/morpholm | This project develops language models that incorporate morphological knowledge to improve their understanding of linguistic structures and relationships. | 3 |
llyx97/tempcompass | A tool to evaluate video language models' ability to understand and describe video content | 84 |