MA-LMM
Video understanding model
This project develops an AI model for long-term video understanding
(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
254 stars
4 watching
27 forks
Language: Python
Last commit: 6 months ago
Topics: llm, video-understanding
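Per the paper title, MA-LMM's central idea is a fixed-size memory bank that lets the model ingest long videos without an ever-growing token count. A minimal sketch of one such compression scheme, which merges the most similar adjacent features whenever the bank overflows (names and details here are illustrative, not the authors' implementation):

```python
import numpy as np

def compress_memory_bank(bank: list, max_len: int) -> list:
    """Keep the memory bank within max_len entries by repeatedly
    averaging the pair of adjacent features with the highest cosine
    similarity. An illustrative memory-bank-compression sketch in the
    spirit of MA-LMM, not the official code."""
    bank = [np.asarray(f, dtype=float) for f in bank]
    while len(bank) > max_len:
        # Cosine similarity between each pair of adjacent entries.
        sims = [
            float(np.dot(bank[i], bank[i + 1])
                  / (np.linalg.norm(bank[i]) * np.linalg.norm(bank[i + 1]) + 1e-8))
            for i in range(len(bank) - 1)
        ]
        i = int(np.argmax(sims))  # most redundant adjacent pair
        merged = (bank[i] + bank[i + 1]) / 2.0
        bank = bank[:i] + [merged] + bank[i + 2:]
    return bank

# Example: stream 8 per-frame features into a bank capped at 4 entries.
rng = np.random.default_rng(0)
bank = []
for _ in range(8):
    bank.append(rng.standard_normal(16))      # new frame feature arrives
    bank = compress_memory_bank(bank, max_len=4)
```

Merging adjacent (rather than arbitrary) pairs preserves the temporal ordering of the stored features, which matters when the language model attends over the bank as a timeline.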
Related projects:
| Repository | Description | Stars |
|---|---|---|
| luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93 |
| dcdmllm/momentor | A video large language model for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling | 58 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| evolvinglmms-lab/longva | An open-source project that transfers language understanding to vision capabilities through long-context processing | 347 |
| huangb23/vtimellm | A PyTorch-based video LLM designed to understand and reason about video moments in terms of time boundaries | 231 |
| alpha-vllm/wemix-llm | A LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
| mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 270 |
| 360cvgroup/360vl | A large multi-modal model built on the Llama3 language model, designed to improve image understanding | 32 |
| evolvinglmms-lab/lmms-eval | An evaluation framework and toolset that accelerates the development of large multimodal models by providing an efficient way to assess their performance | 2,164 |
| dvlab-research/llama-vid | A video and image language model that compresses each frame into a small number of tokens, enabling large language models to process long videos | 748 |
| jiyt17/ida-vlm | An open-source project that improves large vision-language models with identity-aware capabilities and visual instruction-tuning data for movie understanding | 26 |
| ldmt-muri/morpholm | Language models that incorporate morphological knowledge to improve their understanding of linguistic structures and relationships | 3 |
| llyx97/tempcompass | A tool for evaluating video language models' ability to understand and describe video content | 91 |