JiuTian-LION
Visual Knowledge Model
This project integrates dual-level visual knowledge into multimodal large language models to improve their visual understanding and reduce hallucinations.
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
124 stars · 13 watching · 6 forks
Language: Jupyter Notebook
Last commit: 6 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| liaoning97/revo-lion | A comprehensive dataset and evaluation framework for vision-language instruction tuning models | 11 |
| yunxinli/lingcloud | Enhances language models by incorporating human-like eyes to improve visual comprehension and interaction with the external world | 48 |
| byungkwanlee/collavo | A PyTorch implementation of an enhanced vision-language model | 93 |
| yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large vision-language models | 75 |
| yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| meituan-automl/mobilevlm | A vision-language model designed for mobile devices, using a lightweight downsample projector and pre-trained language models | 1,076 |
| deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 31 |
| yuliang-liu/monkey | An end-to-end image captioning system built on large multimodal models, with tools for training, inference, and demos | 1,849 |
| pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 73 |
| brucesherwood/vpython-jupyter | An integration of VPython with Jupyter Notebook for interactive 3D visualization and simulation in scientific computing | 64 |
| sy-xuan/pink | Enables multimodal language models to understand and generate text about visual content via referential comprehension | 79 |
| jiasenlu/vilbert_beta | A pre-trained model and toolset for vision-and-language tasks based on the ViLBERT architecture | 473 |
| yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability | 136 |