LLMGA

Image editing assistant

An implementation of a multimodal generation assistant using large language models and various image editing techniques.

This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant' (ECCV 2024, Oral).

461 stars
13 watching
29 forks
Language: Python
Last commit: 3 months ago
Tags: aigc, image-design-assistant, image-editing, image-generation, large-language-model, llm, mllm, multi-modal

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
| dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,861 |
| llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 351 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
| rksm/org-ai | A minor mode in Emacs for integrating generative AI models into text editing, with support for speech input/output and image generation | 692 |
| noelyahan/mergi | A Go library and command-line tool for manipulating images | 233 |
| hjprint/university_project_vmd_majorization | A MATLAB implementation of a variable-model data fusion algorithm for removing noise from images and generating denoised images | 77 |
| microsoft/llava-med | A research project aimed at building large language and vision models for biomedical applications with capabilities comparable to GPT-4 | 1,556 |
| google-research/xmcgan_image_generation | An implementation of text-to-image generation that leverages cross-modal contrastive learning | 98 |
| dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132 |
| nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 539 |
| open3da/ll3da | An interactive system for understanding and interacting with 3D environments using natural language | 248 |
| alibaba/conv-llava | An optimization technique for large-scale image models that reduces computational requirements while maintaining performance | 104 |
| opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |