LaVIT

Visual understanding and generation framework

A unified framework for training large language models to understand and generate visual content

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

GitHub

528 stars
14 watching
29 forks
Language: Jupyter Notebook
Last commit: about 2 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| lavi-lab/visual-table | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge | 14 |
| gicentre/litvis | An approach to designing and building visualizations through literate programming with Elm, Markdown, and Vega | 379 |
| luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |
| nvlabs/relvit | A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations | 64 |
| jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
| pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data | 847 |
| nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
| opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
| hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models | 32 |
| jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model | 349 |
| yitzchak/ngl-clj | A software framework that provides a widget model approach to create interactive visualizations in Common Lisp for Jupyter notebooks | 2 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,300 |
| vega/vega | A declarative format for creating interactive visualization designs | 11,250 |
| x2fd/lvis-instruct4v | A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset | 131 |