LLMGA
Image editing assistant
An implementation of a multimodal generation assistant using large language models and various image editing techniques.
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant' (ECCV 2024 Oral).
461 stars
13 watching
29 forks
Language: Python
Last commit: 3 months ago
Tags: aigc, image-design-assistant, image-editing, image-generation, large-language-model, llm, mllm, multi-modal
Related projects:
Repository | Description | Stars |
---|---|---|
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,861 |
llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 351 |
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
rksm/org-ai | A minor mode in Emacs for integrating generative AI models into text editing, with support for speech input/output and image generation. | 692 |
noelyahan/mergi | A Go library and command-line tool for manipulating images | 233 |
hjprint/university_project_vmd_majorization | A MATLAB implementation of a variable-model data fusion algorithm for removing noise from images and generating denoised images | 77 |
microsoft/llava-med | A research project aimed at building large language and vision models for biomedical applications with capabilities comparable to GPT-4. | 1,556 |
google-research/xmcgan_image_generation | This implementation enables text-to-image generation by leveraging cross-modal contrastive learning. | 98 |
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132 |
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 539 |
open3da/ll3da | An interactive system for understanding and interacting with 3D environments using natural language. | 248 |
alibaba/conv-llava | This project presents an optimization technique for large-scale image models to reduce computational requirements while maintaining performance. | 104 |
opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |