MobileVLM

Vision Language Model

An implementation of a vision language model for mobile devices that pairs a lightweight downsample projector with pre-trained language models.

Strong and Open Vision Language Assistant for Mobile Devices

GitHub

1k stars
21 watching
66 forks
Language: Python
Last commit: 7 months ago

Related projects:

Repository | Description | Stars
meituan-automl/lenna | An AI-powered image detection system with language-based reasoning capabilities | 78
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications | 43
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550
vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models | 85
evolvinglmms-lab/longva | Provides a model for long context transfer from language to vision using a deep learning framework | 334
opengvlab/visionllm | A large language model designed to process and generate visual information | 915
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs | 1,825
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424