Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Visual Text Understanding
This project enables multi-modal large language models to understand and generate text about visual content using referential comprehension.
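To make "referential comprehension" concrete: it means grounding language in specific image regions, commonly by encoding bounding-box coordinates directly into the text prompt. The sketch below is purely illustrative and is not Pink's actual API; the helper name, coordinate format, and prompt wording are assumptions about one common convention (normalized `[x1,y1,x2,y2]` boxes).

```python
# Hypothetical sketch, not Pink's real interface: region-level prompting
# encodes a pixel-space bounding box as normalized coordinate text so a
# multimodal LLM can be asked about one specific region of an image.

def region_to_tokens(box, width, height):
    """Encode a pixel-space (x1, y1, x2, y2) box as normalized text,
    a common convention for region-level prompts in multimodal LLMs."""
    x1, y1, x2, y2 = box
    return "[{:.2f},{:.2f},{:.2f},{:.2f}]".format(
        x1 / width, y1 / height, x2 / width, y2 / height)

# A referential question pairs an instruction with the encoded region:
prompt = ("What is the object in the region "
          + region_to_tokens((130, 48, 260, 210), width=640, height=480)
          + " doing?")
print(prompt)
```

The normalized form keeps the prompt independent of image resolution, which is why many region-aware models adopt some variant of it.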
76 stars
4 watching
5 forks
Language: Python
last commit: 5 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300 |
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification, with visualization tools | 336 |
yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825 |
pku-yuangroup/languagebind | Extends pretrained models to multiple modalities by aligning language and video representations | 723 |
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121 |
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
yunxinli/lingcloud | An approach to enhancing large language models by incorporating visual information through human-like eyes | 48 |
penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs | 527 |
m-clark/visibly | A collection of R visualization tools and utilities for creating color palettes, themes, and visualizing statistical models | 62 |