IDA-VLM
Movie Understanding Model
A project that proposes and develops an identity-aware large vision-language model to understand complex visual narratives like movies.
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
25 stars
4 watching
0 forks
Language: Python
last commit: about 1 month ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| 360cvgroup/360vl | A large multimodal model built on the Llama3 language model, designed to improve image understanding. | 30 |
| boheumd/ma-lmm | A memory-augmented large multimodal model for long-term video understanding. | 244 |
| meituan-automl/mobilevlm | A vision-language model designed for mobile devices, combining a lightweight downsample projector with pre-trained language models. | 1,039 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models for source-code analysis and generation. | 1,782 |
| opengvlab/visionllm | A large language model designed to process and generate visual information. | 915 |
| deepseek-ai/deepseek-vl | A multimodal model for real-world vision-language understanding applications. | 2,077 |
| dvlab-research/llama-vid | A video language model that uses large language models to extract visual and text features from videos. | 733 |
| aidc-ai/ovis | An architecture for aligning visual and textual embeddings in multimodal learning. | 517 |
| mayer79/flashlight | A toolset for understanding and interpreting complex machine learning models. | 22 |
| lackel/agla | A method for improving large vision-language models so they describe images without hallucinating objects. | 15 |
| ieit-yuan/yuan2.0-m32 | A high-performance language model targeting natural language understanding, mathematical computation, and code generation. | 180 |
| evolvinglmms-lab/longva | A model for long-context transfer from language to vision. | 334 |
| pjlab-adg/gpt4v-ad-exploration | An autonomous-driving study exploring a visual-language model's ability to understand complex driving scenes and make decisions. | 287 |
| aifeg/benchlmm | An open-source benchmark for evaluating the cross-style visual capability of large multimodal models. | 83 |
| elanmart/psmm | An implementation of a neural network model for character-level language modeling. | 50 |