IDA-VLM

Identity-aware vision-language model

An open-source project that aims to improve large vision-language models for movie understanding by adding identity-aware capabilities and training on visual instruction tuning data.

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

GitHub

26 stars

4 watching

0 forks

Language: Python

Last commit: over 1 year ago

Related projects:

Repository	Description	Stars
360cvgroup/360vl	A large multi-modal model built on the Llama3 language model, designed to improve image understanding capabilities	32
boheumd/ma-lmm	This project develops an AI model for long-term video understanding	254
meituan-automl/mobilevlm	An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models	1,076
vhellendoorn/code-lms	A guide to using pre-trained large language models in source code analysis and generation	1,789
opengvlab/visionllm	A large language model designed to process and generate visual information	956
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
dvlab-research/llama-vid	A vision-language model that builds on large language models to process visual and text features from images and videos	748
aidc-ai/ovis	An MLLM architecture designed to align visual and textual embeddings through structural alignment	575
mayer79/flashlight	A toolset for understanding and interpreting complex machine learning models	22
lackel/agla	Improves the ability of large vision-language models to describe images accurately by combining global and local attention mechanisms	18
ieit-yuan/yuan2.0-m32	A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation	182
evolvinglmms-lab/longva	An open-source project that transfers long-context capability from language models to vision	347
pjlab-adg/gpt4v-ad-exploration	An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions	288
aifeg/benchlmm	An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models	84
elanmart/psmm	An implementation of a neural network model for character-level language modeling	50