Accountable-Textual-Visual-Chat

Instruction rejection model

Develops accountability in image generation models by teaching them to reject human instructions

The official repository for "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation."
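
A minimal sketch of the instruction-rejection idea, assuming a PyTorch-style setup: a binary head over fused image and instruction embeddings decides whether to execute an edit or reject the instruction. The `RejectionHead` class, the concatenation-based fusion, and the feature dimensions are illustrative assumptions, not the repository's implementation (whose code is listed below as Shell).

```python
# Hypothetical sketch, not the repository's code: a binary
# "execute vs. reject" head over fused image/instruction embeddings.
import torch
import torch.nn as nn


class RejectionHead(nn.Module):
    """Assumed design: concatenate image and text features,
    then classify into {0: execute, 1: reject}."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 2),  # logits over {execute, reject}
        )

    def forward(self, image_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        # Fuse the two modalities by simple concatenation.
        return self.classifier(torch.cat([image_feat, text_feat], dim=-1))


if __name__ == "__main__":
    head = RejectionHead()
    img = torch.randn(1, 512)  # stand-in for an image encoder's output
    txt = torch.randn(1, 512)  # stand-in for an instruction encoder's output
    decision = head(img, txt).argmax(dim=-1).item()
    print("reject" if decision == 1 else "execute")
```

In practice such a decision would gate the generator: only instructions judged feasible proceed to image re-creation, which is the sense of accountability described above.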

GitHub: 7 stars · 1 watching · 2 forks · Language: Shell · Last commit: over 1 year ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,433 |
| jnhwkim/nips-mrn-vqa | A neural network that answers visual questions by combining question and image features in a residual learning framework | 39 |
| rucaibox/comvint | Creates synthetic visual reasoning instructions to improve large language models on image-related tasks | 18 |
| aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages | 30 |
| aidc-ai/ovis | An architecture designed to align visual and textual embeddings in multimodal learning | 517 |
| opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 90 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
| jiasenlu/adaptiveattention | Adaptive attention mechanism for image captioning using visual sentinels | 334 |
| eric-xw/arel | Adversarial reward learning for generating human-like visual stories from image sequences | 137 |
| ethanyanjiali/minchatgpt | Demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models such as GPT-2 | 213 |
| gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and a normalized CNN | 376 |
| visionlearninggroup/ask_attend_and_answer | A deep learning model that answers questions about visual scenes using spatial attention and question guidance | 25 |
| ucsc-vlaa/sight-beyond-text | Official implementation of a paper on multi-modal training for improving language models' truthfulness and ethics | 19 |
| wisconsinaivision/vip-llava | Enables large multimodal models to understand arbitrary visual prompts | 294 |
| vision-cair/chatcaptioner | Automatically generates descriptive text from images and videos based on user input | 452 |