Accountable-Textual-Visual-Chat

Instruction rejection model

Builds accountability into image re-creation models by training them to reject human instructions

The official repository for "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation".

7 stars
1 watching
2 forks
Language: Shell
Last commit: over 1 year ago

Related projects:

Repository | Description | Stars
peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,438
jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39
rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 34
aidc-ai/ovis | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 91
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270
jiasenlu/adaptiveattention | Adaptive attention mechanism for image captioning using visual sentinels | 335
eric-xw/arel | This codebase provides an implementation of a novel adversarial reward learning algorithm for generating human-like visual stories from image sequences. | 136
ethanyanjiali/minchatgpt | This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2. | 214
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377
visionlearninggroup/ask_attend_and_answer | Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance | 25
ucsc-vlaa/sight-beyond-text | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302
vision-cair/chatcaptioner | Enables automatic generation of descriptive text from images and videos based on user input. | 457