 LaVIT
 Visual understanding and generation framework
 A unified framework for training large language models to understand and generate visual content
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
544 stars, 14 watching, 30 forks
 
Language: Jupyter Notebook 
Last commit: about 1 year ago

Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14 | 
|  | An approach to designing and building visualizations through literate programming with Elm, Markdown, and Vega. | 382 | 
|  | An open-source implementation of a vision-language instructed large language model | 513 | 
|  | A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations. | 64 | 
|  | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 | 
|  | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data. | 895 | 
|  | A deep learning framework for training multimodal models with vision and language capabilities. | 1,299 | 
|  | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 | 
|  | A large language model designed to process and generate visual information | 956 | 
|  | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 | 
|  | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 | 
|  | A software framework that provides a widget model approach to create interactive visualizations in Common Lisp for Jupyter notebooks. | 2 | 
|  | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 | 
|  | A declarative format for creating interactive visualization designs | 11,276 | 
|  | A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset | 131 |