Pink
Visual Text Understanding
This project enables multi-modal language models to understand and generate text about visual content using referential comprehension.
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
79 stars
4 watching
5 forks
Language: Python
last commit: 8 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Develops a multimodal Chinese language model with visual capabilities | 429 |
| | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| | A benchmarking suite for multimodal in-context learning models | 31 |
| | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
| | Extending pretraining models to handle multiple modalities by aligning language and video representations | 751 |
| | This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations. | 124 |
| | An interactive control system for text generation in multi-modal language models | 135 |
| | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
| | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| | Enhances language models by incorporating human-like eyes to improve visual comprehension and interaction with the external world | 48 |
| | PyTorch implementation of guided visual search mechanism for multimodal LLMs | 541 |
| | A collection of R visualization tools and utilities for creating color palettes, themes, and visualizing statistical models. | 63 |