Accountable-Textual-Visual-Chat

Instruction rejection model

Makes image generation models accountable by training them to reject human instructions

The official repository for "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation".

GitHub: 7 stars, 1 watching, 2 forks

Language: Shell

Last commit: about 2 years ago

Related projects:

Repository	Description	Stars
peteanderson80/bottom-up-attention	Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks	1,438
jnhwkim/nips-mrn-vqa	This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework.	39
rucaibox/comvint	Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks	18
aidc-ai/parrot	A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages.	34
aidc-ai/ovis	An MLLM architecture designed to align visual and textual embeddings through structural alignment	575
opendatalab/vigc	Autonomously generates high-quality image-text instruction fine-tuning datasets	91
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs	270
jiasenlu/adaptiveattention	Adaptive attention mechanism for image captioning using visual sentinels	335
eric-xw/arel	This codebase provides an implementation of a novel adversarial reward learning algorithm for generating human-like visual stories from image sequences.	136
ethanyanjiali/minchatgpt	This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2.	214
gt-vision-lab/vqa_lstm_cnn	A Visual Question Answering model using a deeper LSTM and normalized CNN architecture.	377
visionlearninggroup/ask_attend_and_answer	Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance	25
ucsc-vlaa/sight-beyond-text	An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models	19
wisconsinaivision/vip-llava	A system designed to enable large multimodal models to understand arbitrary visual prompts	302
vision-cair/chatcaptioner	Enables automatic generation of descriptive text from images and videos based on user input.	457