Pink

Visual Text Understanding

This project enables multi-modal language models to understand and generate text about visual content using referential comprehension.

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

GitHub

76 stars
4 watching
5 forks
Language: Python
Last commit: 5 months ago

Related projects:

Repository | Description | Stars
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs | 1,825
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations | 723
jiutian-vl/jiutian-lion | This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks | 230
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
yunxinli/lingcloud | An approach to enhance large language models by incorporating visual information using human-like eyes | 48
penghao-wu/vstar | PyTorch implementation of guided visual search mechanism for multimodal LLMs | 527
m-clark/visibly | A collection of R visualization tools and utilities for creating color palettes, themes, and visualizing statistical models | 62