LaVIT
Visual understanding and generation framework
A unified framework for training large language models to understand and generate visual content
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
544 stars
14 watching
30 forks
Language: Jupyter Notebook
last commit: 5 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14 |
| | An approach to designing and building visualizations through literate programming with Elm, Markdown, and Vega. | 382 |
| | An open-source implementation of a vision-language instructed large language model | 513 |
| | A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations. | 64 |
| | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 |
| | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data. | 895 |
| | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| | A large language model designed to process and generate visual information | 956 |
| | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |
| | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
| | A software framework that provides a widget model approach to create interactive visualizations in Common Lisp for Jupyter notebooks. | 2 |
| | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| | A declarative format for creating interactive visualization designs | 11,276 |
| | A dataset of fine-grained visual instructions generated by prompting a large language model with images from another dataset | 131 |