JiuTian-LION

Visual Knowledge Model

This project integrates dual-level visual knowledge into a multimodal large language model to improve its visual understanding and reduce hallucinations.

[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

GitHub

121 stars
13 watching
5 forks
Language: Jupyter Notebook
Last commit: 4 months ago

Related projects:

Repository | Description | Stars
liaoning97/revo-lion | A comprehensive dataset and evaluation framework for Vision-Language Instruction Tuning models | 11
yunxinli/lingcloud | An approach to enhance large language models by incorporating visual information using human-like eyes | 48
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 71
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
meituan-automl/mobilevlm | An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models | 1,039
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs | 1,825
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72
brucesherwood/vpython-jupyter | An integration of VPython with Jupyter Notebook for interactive 3D visualization and simulation in scientific computing | 64
sy-xuan/pink | This project enables multi-modal language models to understand and generate text about visual content using referential comprehension | 76
jiasenlu/vilbert_beta | A pre-trained model and toolset for performing vision-and-language tasks using a specific neural network architecture | 474
yiyangzhou/lure | Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability | 134