Accountable-Textual-Visual-Chat
Instruction rejection model
Builds accountability into image generation models by training them to reject human instructions
The official repository for the paper "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation."
7 stars
1 watching
2 forks
Language: Shell
Last commit: over 1 year ago

Related projects:
| Description | Stars |
|---|---:|
| Trains a bottom-up attention model using Faster R-CNN and Visual Genome annotations for image captioning and VQA tasks | 1,438 |
| This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
| Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18 |
| A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 34 |
| An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
| Autonomously generates high-quality image-text instruction fine-tuning datasets | 91 |
| Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| Adaptive attention mechanism for image captioning using visual sentinels | 335 |
| This codebase provides an implementation of a novel adversarial reward learning algorithm for generating human-like visual stories from image sequences. | 136 |
| This project demonstrates the effectiveness of reinforcement learning from human feedback (RLHF) in improving small language models like GPT-2. | 214 |
| A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 377 |
| Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance | 25 |
| An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19 |
| A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| Enables automatic generation of descriptive text from images and videos based on user input. | 457 |