Vary
Document comprehension model
An implementation of a scaled-up vision vocabulary for large vision-language models, aimed at improving document understanding and recognition
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
2k stars
54 watching
159 forks
Language: Python
Last commit: 16 days ago
Related projects:
Repository | Description | Stars |
---|---|---|
360cvgroup/360vl | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities. | 32 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
sicara/tf-explain | A library providing interpretability methods for TensorFlow 2.x models | 1,019 |
interpretml/dice | Provides counterfactual explanations for machine learning models to facilitate interpretability and understanding. | 1,373 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |
akosiorek/attend_infer_repeat | An implementation of Attend, Infer, Repeat, a method for fast scene understanding using generative models. | 82 |
jalammar/ecco | An interactive visualization library for exploring and understanding transformer-based language models | 1,986 |
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 314 |
shizhediao/davinci | Implements a unified modal learning framework for generative vision-language models | 43 |
msracver/fcis | An implementation of a fully convolutional instance-aware semantic segmentation framework using CUDA. | 1,567 |
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models. | 85 |
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274 |
tca19/dict2vec | A framework to learn word embeddings using lexical dictionaries | 115 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |