Chat-UniVi

Visual unification framework

A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data.

[CVPR 2024 Highlight 🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
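
To make the idea of a "unified visual representation" concrete, here is a minimal conceptual sketch, not the Chat-UniVi implementation: an image is treated as a one-frame video, and patch embeddings from any number of frames are merged into a fixed set of tokens that a language model could consume. The function name `merge_visual_tokens`, the tensor shapes, and the plain k-means merging step are all illustrative assumptions.

import numpy as np

def merge_visual_tokens(frame_tokens: np.ndarray, num_merged: int, iters: int = 10) -> np.ndarray:
    """Merge per-frame patch tokens into `num_merged` unified tokens via simple k-means.

    frame_tokens: (num_frames, num_patches, dim) patch embeddings from a vision encoder.
    Returns: (num_merged, dim) merged visual tokens, the same format for images and videos.
    """
    # Flatten frames and patches into one pool of tokens (hypothetical merging strategy).
    tokens = frame_tokens.reshape(-1, frame_tokens.shape[-1])
    rng = np.random.default_rng(0)
    centers = tokens[rng.choice(len(tokens), num_merged, replace=False)]
    for _ in range(iters):
        # Assign each token to its nearest center, then recompute centers as cluster means.
        dists = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for k in range(num_merged):
            members = tokens[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers

# An image is just a one-frame "video", so both inputs share the same interface:
image_tokens = np.random.randn(1, 256, 64)   # 1 frame, 256 patches, 64-dim (toy sizes)
video_tokens = np.random.randn(8, 256, 64)   # 8 frames
image_repr = merge_visual_tokens(image_tokens, num_merged=64)
video_repr = merge_visual_tokens(video_tokens, num_merged=64)
print(image_repr.shape, video_repr.shape)    # both (64, 64): one token format for the LLM
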

GitHub

847 stars
7 watching
43 forks
Language: Python
last commit: about 1 month ago
Topics: image-understanding, large-language-models, video-understanding, vision-language-model

Related projects:

Repository | Description | Stars
pku-yuangroup/languagebind | Extends pretrained models to handle multiple modalities by aligning language and video representations | 723
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117
byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models | 311
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 528
nvlabs/prismer | A deep learning framework for training multimodal models with vision and language capabilities | 1,298
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294
pzzhang/vinvl | Improves visual representations in vision-language models by developing an object detection model for richer visual object and concept representations | 350
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33
pku-yuangroup/moe-llava | Develops a neural network architecture for multimodal learning with large vision-language models | 1,980
hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models | 32
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications | 43
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs | 1,825
penghao-wu/vstar | PyTorch implementation of a guided visual search mechanism for multimodal LLMs | 527
mingyuliutw/unit | An unsupervised deep learning framework for translating images between different modalities | 1,988