MobileVLM

Vision Language Model

An implementation of a vision language model for mobile devices that connects visual features to pre-trained language models through a lightweight downsample projector.
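The "lightweight downsample projector" idea can be illustrated with a short sketch. This is a simplified, hypothetical stand-in, not the repository's implementation: the real projector uses learned convolutional layers. The sketch only shows the two jobs such a projector performs. It shrinks the grid of visual tokens, here with plain 2 x 2 average pooling, and it maps each remaining token to the language model's width with a linear layer. The function names `downsample_tokens` and `project`, the `weight` matrix, and the 24 x 24 grid are assumptions. That grid size matches what a ViT-L/14 encoder produces at 336 x 336 input, and pooling it cuts 576 tokens to 144, a 75% reduction.

```python
from typing import List

Vector = List[float]


# Hypothetical helper: not part of the MobileVLM codebase.
def downsample_tokens(tokens: List[Vector], grid: int, stride: int = 2) -> List[Vector]:
    """Average-pool a row-major grid x grid map of token vectors with a stride x stride window."""
    if len(tokens) != grid * grid:
        raise ValueError("expected grid * grid tokens")
    if grid % stride != 0:
        raise ValueError("grid must be divisible by stride")
    dim = len(tokens[0])
    out_grid = grid // stride
    window = stride * stride
    pooled: List[Vector] = []
    for r in range(out_grid):
        for c in range(out_grid):
            acc = [0.0] * dim
            # Sum the stride x stride neighbourhood covered by this output token.
            for dr in range(stride):
                for dc in range(stride):
                    tok = tokens[(r * stride + dr) * grid + (c * stride + dc)]
                    for i in range(dim):
                        acc[i] += tok[i]
            pooled.append([v / window for v in acc])
    return pooled


# Hypothetical helper: a plain linear map from vision width to language-model width.
def project(tokens: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Compute out[j] = sum_i tok[i] * weight[i][j] for every token."""
    out_dim = len(weight[0])
    return [
        [sum(tok[i] * weight[i][j] for i in range(len(tok))) for j in range(out_dim)]
        for tok in tokens
    ]


# Assumed sizes: a 24 x 24 patch grid (576 tokens) pooled 2 x 2 leaves 144 tokens.
visual_tokens = [[float(k)] for k in range(24 * 24)]
reduced = downsample_tokens(visual_tokens, grid=24)
print(len(visual_tokens), "->", len(reduced))  # 576 -> 144
```

Fewer visual tokens mean a shorter input sequence for the language model, and that reduction is where a downsampling projector saves compute on a mobile device.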

Strong and Open Vision Language Assistant for Mobile Devices


1k stars

21 watching

69 forks

Language: Python

Last commit: over 2 years ago

Related projects:

Repository	Description	Stars
meituan-automl/lenna	An AI-powered image detection system with language-based reasoning capabilities	78
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities	1,299
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
shizhediao/davinci	Implements a unified modal learning framework for generative vision-language models	43
jiutian-vl/jiutian-lion	Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations	124
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs	33
lyuchenyang/macaw-llm	A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation	1,568
vlf-silkie/vlfeedback	An annotated preference dataset and training framework for improving large vision language models	88
evolvinglmms-lab/longva	An open-source project that transfers long-context understanding from language models to vision	347
opengvlab/visionllm	A large language model designed to process and generate visual information	956
yuliang-liu/monkey	An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage.	1,849
baai-wudao/brivl	Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications	279
pleisto/yuren-baichuan-7b	A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks	73
airaria/visual-chinese-llama-alpaca	Develops a multimodal Chinese language model with visual capabilities	429