JiuTian-LION
Visual Knowledge Model
This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations.
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
121 stars
13 watching
5 forks
Language: Jupyter Notebook
last commit: 4 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
liaoning97/revo-lion | A comprehensive dataset and evaluation framework for Vision-Language Instruction Tuning models | 11 |
yunxinli/lingcloud | An approach to enhance large language models by incorporating visual information using human-like eyes | 48 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |
yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models | 71 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
meituan-automl/mobilevlm | An implementation of a vision language model designed for mobile devices, utilizing a lightweight downsample projector and pre-trained language models | 1,039 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs | 1,825 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 72 |
brucesherwood/vpython-jupyter | An integration of VPython with Jupyter Notebook for interactive 3D visualization and simulation in scientific computing | 64 |
sy-xuan/pink | Enables multi-modal language models to understand and generate text about visual content using referential comprehension | 76 |
jiasenlu/vilbert_beta | A pre-trained model and toolset for performing vision-and-language tasks using a specific neural network architecture | 474 |
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability | 134 |