MobileVLM
Vision Language Model
A vision language model for mobile devices, built from a lightweight downsample projector and pre-trained language models.
Strong and Open Vision Language Assistant for Mobile Devices
1k stars
21 watching
69 forks
Language: Python
last commit: 10 months ago
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An AI-powered image detection system with language-based reasoning capabilities | 78 |
| | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| | Implementing a unified modal learning framework for generative vision-language models | 43 |
| | This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations. | 124 |
| | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| | An annotated preference dataset and training framework for improving large vision language models. | 88 |
| | An open-source project that enables the transfer of language understanding to vision capabilities through long context processing. | 347 |
| | A large language model designed to process and generate visual information | 956 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
| | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
| | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
| | Develops a multimodal Chinese language model with visual capabilities | 429 |