MobileVLM

Vision Language Model

A vision language model designed for mobile devices, built from pre-trained language models and a lightweight downsample projector.

Strong and Open Vision Language Assistant for Mobile Devices

GitHub

Stars: 1k
Watching: 21
Forks: 69
Language: Python
Last commit: 9 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| meituan-automl/lenna | An AI-powered image detection system with language-based reasoning capabilities | 78 |
| nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,299 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| shizhediao/davinci | Implements a unified-modal learning framework for generative vision-language models | 43 |
| jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 124 |
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models | 88 |
| evolvinglmms-lab/longva | An open-source project that transfers language understanding to vision capabilities through long-context processing | 347 |
| opengvlab/visionllm | A large language model designed to process and generate visual information | 956 |
| yuliang-liu/monkey | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849 |
| baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, fine-tuned for various tasks | 73 |
| airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 429 |