MobileVLM
Vision Language Model
An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models.
Strong and Open Vision Language Assistant for Mobile Devices
1k stars
21 watching
66 forks
Language: Python
Last commit: 7 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| meituan-automl/lenna | An AI-powered image detection system with language-based reasoning capabilities | 78 |
| nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
| shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications | 43 |
| jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121 |
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
| vlf-silkie/vlfeedback | An annotated preference dataset and training framework for improving large vision language models | 85 |
| evolvinglmms-lab/longva | Provides a model for long context transfer from language to vision using a deep learning framework | 334 |
| opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs | 1,825 |
| baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72 |
| airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424 |