RelViT

Visual reasoning tool

A PyTorch framework for visual relational reasoning that guides a Vision Transformer (ViT) with concepts and the semantic relations between them.

[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

GitHub: 64 stars, 6 watching, 3 forks

Language: Python

Last commit: almost 4 years ago

Topics: hico-det, iclr2022, pytorch, visual-reasoning, vqa
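The description above says RelViT is guided by concepts. The sketch below is a minimal, hypothetical illustration of one way concept guidance can enter contrastive training, assuming a concept-keyed feature memory: features are stored in per-concept FIFO queues, and a stored feature that shares a concept with the current image is sampled as the positive. `ConceptFeatureDictionary`, `sample_positive`, `info_nce_loss`, and the concept labels are hypothetical names, not the repository's API, and a generic InfoNCE loss is swapped in purely for illustration; RelViT's actual training objectives may differ.

```python
import math
import random
from collections import deque


class ConceptFeatureDictionary:
    """Hypothetical concept-keyed memory (not RelViT's API).

    Each concept label maps to a FIFO queue holding its most recent features.
    """

    def __init__(self, queue_size=4):
        self.queue_size = queue_size
        self.queues = {}

    def add(self, concept, feature):
        # Create the queue on first use; deque(maxlen=...) evicts the oldest entry when full.
        queue = self.queues.setdefault(concept, deque(maxlen=self.queue_size))
        queue.append(list(feature))

    def sample_positive(self, concepts, rng):
        # Keep only concepts from the current image that already have stored features.
        candidates = [c for c in concepts if self.queues.get(c)]
        if not candidates:
            return None
        # A stored feature sharing a concept with the image serves as the positive.
        concept = rng.choice(candidates)
        return rng.choice(list(self.queues[concept]))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    # Generic InfoNCE (an assumption here): -log softmax of the positive's similarity.
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    rng = random.Random(0)
    memory = ConceptFeatureDictionary(queue_size=2)
    # Hypothetical HOI-style concept labels and toy 2-D features.
    memory.add("ride_horse", [1.0, 0.0])
    memory.add("ride_horse", [0.9, 0.1])
    memory.add("hold_cup", [0.0, 1.0])

    anchor = [1.0, 0.05]
    positive = memory.sample_positive(["ride_horse"], rng)
    negatives = list(memory.queues["hold_cup"])
    print("positive:", positive)
    print("loss:", info_nce_loss(anchor, positive, negatives))
```

In this sketch, sampling the positive by shared concept, rather than only from augmented views of the same image, is what makes the objective concept-guided; the queue size and temperature are arbitrary illustrative values.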

Related projects:

Repository	Description	Stars
nvlabs/bongard-hoi	A benchmarking tool and software framework for evaluating few-shot visual reasoning capabilities in computer vision models.	64
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
davidmascharka/tbd-nets	An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks.	348
lavi-lab/visual-table	A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge.	14
jy0205/lavit	A unified framework for training large language models to understand and generate visual content.	544
rowanz/r2c	An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning.	466
nexusapoorvacus/deepvariationstructuredrl	An implementation of reinforcement learning for visual relationship and attribute detection using PyTorch.	63
rucaibox/comvint	Creates synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks.	18
gordonhu608/mqt-llava	A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens.	101
nv-tlabs/steal	Develops a method to create high-quality training data from noisy labels in semantic segmentation tasks.	478
reedscot/cvpr2016	A system for learning deep representations of fine-grained visual descriptions from images.	336
jnhwkim/nips-mrn-vqa	This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework.	39
rlhf-v/rlhf-v	Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy.	245
jiasenlu/hiecoattenvqa	A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model.	349
0xstepit/flow.nvim	A customizable, high-contrast Nvim color scheme designed to enhance coding focus and productivity	195