Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Visual Text Understanding
This project enables multi-modal large language models to understand and generate text about visual content using referential comprehension.
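To make "referential comprehension" concrete: it means grounding language in specific image regions, commonly by encoding bounding-box coordinates directly into the text prompt. The sketch below is purely illustrative and is not Pink's actual API; the helper name, coordinate format, and prompt wording are assumptions about one common convention (normalized `[x1,y1,x2,y2]` boxes).

```python
# Hypothetical sketch, not Pink's real interface: region-level prompting
# encodes a pixel-space bounding box as normalized coordinate text so a
# multimodal LLM can be asked about one specific region of an image.

def region_to_tokens(box, width, height):
    """Encode a pixel-space (x1, y1, x2, y2) box as normalized text,
    a common convention for region-level prompts in multimodal LLMs."""
    x1, y1, x2, y2 = box
    return "[{:.2f},{:.2f},{:.2f},{:.2f}]".format(
        x1 / width, y1 / height, x2 / width, y2 / height)

# A referential question pairs an instruction with the encoded region:
prompt = ("What is the object in the region "
          + region_to_tokens((130, 48, 260, 210), width=640, height=480)
          + " doing?")
print(prompt)
```

The normalized form keeps the prompt independent of image resolution, which is why many region-aware models adopt some variant of it.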
76 stars
4 watching
5 forks
Language: Python
last commit: 5 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300 |
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification, with visualization tools | 336 |
yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825 |
pku-yuangroup/languagebind | Extends pretrained models to multiple modalities by aligning language and video representations | 723 |
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121 |
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137 |
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
yunxinli/lingcloud | An approach to enhancing large language models by incorporating visual information through human-like eyes | 48 |
penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs | 527 |
m-clark/visibly | A collection of R visualization tools and utilities for creating color palettes, themes, and visualizing statistical models | 62 |