Pink

Visual Text Understanding

This project helps multi-modal language models understand and generate text about visual content through referential comprehension, that is, relating text to specific regions of an image.

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

GitHub

79 stars
4 watching
5 forks
Language: Python
last commit: 7 months ago

Related projects:

Repository | Description | Stars
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 429
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,336
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 31
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336
yuliang-liu/monkey | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849
pku-yuangroup/languagebind | Extends pretrained models to handle multiple modalities by aligning language and video representations | 751
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 124
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 135
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 143
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
yunxinli/lingcloud | Enhances language models with human-like eyes to improve visual comprehension and interaction with the external world | 48
penghao-wu/vstar | PyTorch implementation of a guided visual search mechanism for multimodal LLMs | 541
m-clark/visibly | A collection of R visualization tools and utilities for creating color palettes, themes, and visualizing statistical models | 63