Chat-UniVi
Visual unification framework
A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data.
[CVPR 2024 Highlight] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
847 stars
7 watching
43 forks
Language: Python
last commit: about 1 month ago
Topics: image-understanding, large-language-models, video-understanding, vision-language-model
Related projects:
Repository | Description | Stars |
---|---|---|
pku-yuangroup/languagebind | Extends pretrained models to multiple modalities by aligning language and video representations | 723 |
pku-yuangroup/video-bench | Evaluates and benchmarks the video understanding capabilities of large language models | 117 |
byungkwanlee/moai | Improves performance on vision-language tasks by integrating computer vision capabilities into large language models | 311 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 528 |
nvlabs/prismer | A deep learning framework for training multimodal models with vision and language capabilities | 1,298 |
wisconsinaivision/vip-llava | A system that enables large multimodal models to understand arbitrary visual prompts | 294 |
pzzhang/vinvl | Improves visual representations in vision-language models with an object detection model that provides richer visual object and concept representations | 350 |
zhourax/vega | A multimodal task and dataset for assessing vision-language models' ability to handle interleaved image-text inputs | 33 |
pku-yuangroup/moe-llava | A mixture-of-experts architecture for multimodal learning with large vision-language models | 1,980 |
hxyou/idealgpt | A framework for iteratively decomposing vision and language reasoning via large language models | 32 |
shizhediao/davinci | An implementation of generative vision-language models that can be fine-tuned for various multimodal learning tasks | 43 |
jiutian-vl/jiutian-lion | Integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations | 121 |
yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs | 1,825 |
penghao-wu/vstar | PyTorch implementation of a guided visual search mechanism for multimodal LLMs | 527 |
mingyuliutw/unit | An unsupervised deep learning framework for translating images between different modalities | 1,988 |