Vary
Document comprehension model
An implementation of a vision vocabulary model for large language models, intended to improve their document understanding and recognition capabilities.
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
2k stars
54 watching
158 forks
Language: Python
Last commit: about 2 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
360cvgroup/360vl | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities. | 30 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
sicara/tf-explain | A library providing interpretability methods for TensorFlow 2.x models | 1,018 |
interpretml/dice | Provides counterfactual explanations for machine learning models to facilitate interpretability and understanding. | 1,364 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |
akosiorek/attend_infer_repeat | An implementation of Attend, Infer, Repeat, a method for fast scene understanding using generative models. | 82 |
jalammar/ecco | An interactive visualization library for exploring and understanding transformer-based language models | 1,985 |
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311 |
shizhediao/davinci | An implementation of generative vision-language models for multimodal learning tasks that can be fine-tuned for various applications. | 43 |
msracver/fcis | An implementation of a deep learning framework for instance-aware semantic segmentation | 1,566 |
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models. | 75 |
yuweihao/mm-vet | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 267 |
tca19/dict2vec | A framework to learn word embeddings using lexical dictionaries | 115 |
baaivision/eve | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 230 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |