LaVIT

Visual understanding and generation framework

A unified framework for training large language models to understand and generate visual content

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

GitHub: 544 stars, 14 watching, 30 forks

Language: Jupyter Notebook

Last commit: about 1 year ago

Related projects:

Repository	Description	Stars
lavi-lab/visual-table	A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge.	14
gicentre/litvis	An approach to designing and building visualizations through literate programming with Elm, Markdown, and Vega.	382
luogen1996/lavin	An open-source implementation of a vision-language instructed large language model.	513
nvlabs/relvit	A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations.	64
jshilong/gpt4roi	A project for training and deploying large language models on computer vision tasks using region-of-interest inputs.	517
pku-yuangroup/chat-univi	A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data.	895
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
vpgtrans/vpgtrans	A method for transferring visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs.	270
opengvlab/visionllm	A large language model designed to process and generate visual information.	956
hxyou/idealgpt	A deep learning framework for iteratively decomposing vision and language reasoning via large language models.	32
jiasenlu/hiecoattenvqa	A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model.	349
yitzchak/ngl-clj	A software framework that provides a widget model approach to create interactive visualizations in Common Lisp for Jupyter notebooks.	2
lxtgh/omg-seg	An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
vega/vega	A declarative format for creating interactive visualization designs.	11,276
x2fd/lvis-instruct4v	A dataset of fine-grained visual instructions generated by prompting a large language model with images from the LVIS dataset.	131