IDA-VLM

Movie Understanding Model

A project that proposes and develops an identity-aware large vision-language model (IDA-VLM) for understanding complex visual narratives such as movies.

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

25 stars
4 watching
0 forks
Language: Python
Last commit: about 1 month ago

Related projects:

Repository | Description | Stars
360cvgroup/360vl | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities | 30
boheumd/ma-lmm | This project develops an AI model for long-term video understanding | 244
meituan-automl/mobilevlm | An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models | 1,039
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782
opengvlab/visionllm | A large language model designed to process and generate visual information | 915
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733
aidc-ai/ovis | An architecture designed to align visual and textual embeddings in multimodal learning | 536
mayer79/flashlight | A toolset for understanding and interpreting complex machine learning models | 22
lackel/agla | Improving large vision-language models to accurately describe images without generating fictional objects | 15
ieit-yuan/yuan2.0-m32 | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 180
evolvinglmms-lab/longva | This project provides a model for long context transfer from language to vision using a deep learning framework | 334
pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 287
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 82
elanmart/psmm | An implementation of a neural network model for character-level language modeling | 50