Accountable-Textual-Visual-Chat
Instruction rejection model
Develops accountability in image generation models by learning to reject human instructions
The official repository for the paper "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation".
7 stars
1 watching
2 forks
Language: Shell
Last commit: over 1 year ago

Related projects:
Repository | Description | Stars |
---|---|---|
peteanderson80/bottom-up-attention | Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,438 |
jnhwkim/nips-mrn-vqa | A neural network model that answers visual questions by combining question and image features in a residual learning framework | 39 |
rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 34 |
aidc-ai/ovis | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets | 91 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
jiasenlu/adaptiveattention | Adaptive attention mechanism for image captioning using visual sentinels | 335 |
eric-xw/arel | An implementation of an adversarial reward learning algorithm for generating human-like visual stories from image sequences | 136 |
ethanyanjiali/minchatgpt | Demonstrates how reinforcement learning from human feedback (RLHF) improves small language models such as GPT-2 | 214 |
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377 |
visionlearninggroup/ask_attend_and_answer | Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance | 25 |
ucsc-vlaa/sight-beyond-text | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
vision-cair/chatcaptioner | Enables automatic generation of descriptive text from images and videos based on user input. | 457 |