EVE
Vision-Language Model
A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
230 stars
8 watching
3 forks
Language: Python
last commit: about 2 months ago
Tags: clip, encoder-free-vlm, instruction-following, large-language-models, llm, mllm, multimodal-large-language-models, vision-language-models, vlm
Related projects:
Repository | Description | Stars |
---|---|---|
baai-wudao/brivl | Pre-trains a multilingual model to bridge vision and language modalities for various downstream applications | 279 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 539 |
byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models | 311 |
jayleicn/clipbert | An efficient framework for end-to-end learning on image-text and video-text tasks | 704 |
byungkwanlee/collavo | A PyTorch implementation of an enhanced vision-language model | 93 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
baaivision/emu | A multimodal generative model framework | 1,659 |
freedomintelligence/allava | A collection of datasets and models designed to support the training of lite vision-language models. | 246 |
shizhediao/davinci | An implementation of generative vision-language models that can be fine-tuned for various multimodal applications. | 43 |
paganpasta/eqxvision | A package of pre-trained computer vision models for image classification and segmentation. | 102 |
awni/speech | A PyTorch implementation of end-to-end speech recognition models. | 754 |
vishaal27/sus-x | Proposes a method for adapting large-scale vision-language models to new tasks with minimal resources and no fine-tuning. | 94 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |