RelViT

Visual reasoning tool

A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations.

[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning


64 stars
6 watching
3 forks
Language: Python
Last commit: about 2 years ago
Tags: hico-det, iclr2022, pytorch, visual-reasoning, vqa

Related projects:

Repository | Description | Stars
nvlabs/bongard-hoi | A benchmarking tool and software framework for evaluating few-shot visual reasoning capabilities in computer vision models. | 64
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298
davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348
lavi-lab/visual-table | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 528
rowanz/r2c | An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning. | 466
nexusapoorvacus/deepvariationstructuredrl | An implementation of reinforcement learning for visual relationship and attribute detection using PyTorch. | 63
rucaibox/comvint | Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks | 18
gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97
nv-tlabs/steal | Develops a method to create high-quality training data from noisy labels in semantic segmentation tasks. | 478
reedscot/cvpr2016 | A system for learning deep representations of fine-grained visual descriptions from images | 334
jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39
rlhf-v/rlhf-v | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 233
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349
0xstepit/flow.nvim | A customizable, high-contrast Nvim color scheme designed to enhance coding focus and productivity | 189