Vary
Document comprehension model
An implementation of a vision vocabulary model that improves the document understanding and recognition capabilities of large language models.
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
2k stars
54 watching
159 forks
Language: Python
last commit: 3 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities. | 32 |
| | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
| | A library providing interpretability methods for TensorFlow 2.x models | 1,019 |
| | Provides counterfactual explanations for machine learning models to facilitate interpretability and understanding. | 1,373 |
| | Develops a PyTorch implementation of an enhanced vision language model | 93 |
| | An implementation of Attend, Infer, Repeat, a method for fast scene understanding using generative models. | 82 |
| | An interactive visualization library for exploring and understanding transformer-based language models | 1,986 |
| | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 314 |
| | Implementing a unified modal learning framework for generative vision-language models | 43 |
| | An implementation of a fully convolutional instance-aware semantic segmentation framework using CUDA. | 1,567 |
| | An evaluation suite for assessing chart understanding in multimodal large language models. | 85 |
| | Evaluates the capabilities of large multimodal models using a set of diverse tasks and metrics | 274 |
| | A framework to learn word embeddings using lexical dictionaries | 115 |
| | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |