LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
A unified framework for training large language models to understand and generate visual content.
528 stars
14 watching
29 forks
Language: Jupyter Notebook
last commit: about 2 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
lavi-lab/visual-table | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14 |
gicentre/litvis | An approach to designing and building visualizations through literate programming with Elm, Markdown, and Vega. | 379 |
luogen1996/lavin | An open-source implementation of a vision-language-instructed large language model. | 508 |
nvlabs/relvit | A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations. | 64 |
jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs. | 506 |
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data. | 847 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 269 |
opengvlab/visionllm | A large language model designed to process and generate visual information. | 915 |
hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a pretrained image model. | 349 |
yitzchak/ngl-clj | A software framework that provides a widget-based approach to creating interactive visualizations in Common Lisp for Jupyter notebooks. | 2 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
vega/vega | A declarative format for creating interactive visualization designs. | 11,250 |
x2fd/lvis-instruct4v | A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset. | 131 |