IDA-VLM
Identity-aware video model
An open-source project that aims to improve large vision-language models for movie understanding by integrating identity-aware capabilities and using visual instruction tuning data.
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
26 stars
4 watching
0 forks
Language: Python
Last commit: 11 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities. | 32 |
| | This project develops an AI model for long-term video understanding | 254 |
| | An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models. | 1,076 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | A large language model designed to process and generate visual information | 956 |
| | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| | An image-based language model that uses large language models to generate visual and text features from videos | 748 |
| | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
| | A toolset for understanding and interpreting complex machine learning models | 22 |
| | Improves large vision-language models' ability to accurately describe images by combining global and local attention mechanisms. | 18 |
| | A high-performance language model designed to excel in tasks like natural language understanding, mathematical computation, and code generation | 182 |
| | An open-source project that enables the transfer of language understanding to vision capabilities through long context processing. | 347 |
| | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 288 |
| | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 84 |
| | An implementation of a neural network model for character-level language modeling. | 50 |