JiuTian-LION
Visual Knowledge Model
This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations.
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
124 stars
13 watching
6 forks
Language: Jupyter Notebook
last commit: 7 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | A comprehensive dataset and evaluation framework for Vision-Language Instruction Tuning models | 11 |
| | Enhances language models by incorporating human-like eyes to improve visual comprehension and interaction with the external world | 48 |
| | A PyTorch implementation of an enhanced vision-language model | 93 |
| | Debiasing techniques to minimize hallucinations in large vision-language models | 75 |
| | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| | A vision-language model designed for mobile devices, using a lightweight downsample projector and pre-trained language models | 1,076 |
| | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| | A benchmarking suite for multimodal in-context learning models | 31 |
| | An end-to-end image captioning system built on large multimodal models, with tools for training, inference, and demos | 1,849 |
| | A multimodal large language model that integrates natural-language and visual capabilities, with fine-tuning for various tasks | 73 |
| | An integration of VPython with Jupyter Notebook for interactive 3D visualization and simulation in scientific computing | 64 |
| | Enables multimodal language models to understand and generate text about visual content using referential comprehension | 79 |
| | A pre-trained model and toolset for vision-and-language tasks using a specific neural network architecture | 473 |
| | Analyzing and mitigating object hallucination in large vision-language models to improve their accuracy and reliability | 136 |