IDA-VLM
Identity-aware vision-language model
An open-source project that improves large vision-language models by integrating identity-aware capabilities and building visual instruction tuning data for movie understanding
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
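The key ingredient is instruction-tuning data that grounds questions in specific character identities via reference images. Below is a minimal sketch of what one such ID-aware training record might look like; every field name, path, and the overall schema are illustrative assumptions, not the repository's actual data format.

```python
# Hypothetical example of an ID-aware instruction-tuning record.
# The schema, field names, and file paths are assumptions made for
# illustration; they do not reflect IDA-VLM's real data layout.
sample = {
    "video": "movies/clip_0421.mp4",       # movie clip to reason over
    "identities": {                        # reference images that bind a
        "Alice": "refs/alice_face.jpg",    # character name to a visual
        "Bob": "refs/bob_face.jpg",        # identity
    },
    "conversation": [
        {
            "role": "user",
            # The question refers to characters by name, so the model must
            # match the reference faces to the people appearing in the clip.
            "text": "What does Alice hand to Bob in this scene?",
        },
        {"role": "assistant", "text": "Alice hands Bob a folded letter."},
    ],
}
```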
26 stars
4 watching
0 forks
Language: Python
last commit: about 2 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| 360cvgroup/360vl | A large multi-modal model built on the Llama3 language model, designed to improve image understanding capabilities | 32 |
| boheumd/ma-lmm | A memory-augmented large multimodal model for long-term video understanding | 254 |
| meituan-automl/mobilevlm | A vision-language model designed for mobile devices, combining a lightweight downsample projector with pre-trained language models | 1,076 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models for source code analysis and generation | 1,789 |
| opengvlab/visionllm | A large language model designed to process and generate visual information | 956 |
| deepseek-ai/deepseek-vl | A multimodal AI model built for real-world vision-language understanding applications | 2,145 |
| dvlab-research/llama-vid | A video language model that compresses each frame into a few tokens so that large language models can process long videos | 748 |
| aidc-ai/ovis | An MLLM architecture that structurally aligns visual and textual embeddings | 575 |
| mayer79/flashlight | A toolset for understanding and interpreting complex machine learning models | 22 |
| lackel/agla | Improves large vision-language models' ability to describe images accurately by combining global and local attention | 18 |
| ieit-yuan/yuan2.0-m32 | A high-performance language model designed for natural language understanding, mathematical computation, and code generation | 182 |
| evolvinglmms-lab/longva | An open-source project that transfers long-context capability from language models to vision, enabling long video understanding | 347 |
| pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring a visual-language model's ability to understand complex driving scenes and make decisions | 288 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capabilities of large multimodal models | 84 |
| elanmart/psmm | An implementation of a neural network model for character-level language modeling | 50 |