RelViT
Visual reasoning tool
A deep learning framework designed to improve visual reasoning capabilities by utilizing concepts and semantic relations.
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
64 stars
6 watching
3 forks
Language: Python
last commit: about 2 years ago
Tags: hico-det, iclr2022, pytorch, visual-reasoning, vqa
Related projects:
Repository | Description | Stars |
---|---|---|
nvlabs/bongard-hoi | A benchmarking tool and software framework for evaluating few-shot visual reasoning capabilities in computer vision models. | 64 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
lavi-lab/visual-table | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528 |
rowanz/r2c | An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning. | 466 |
nexusapoorvacus/deepvariationstructuredrl | An implementation of reinforcement learning for visual relationship and attribute detection using PyTorch. | 63 |
rucaibox/comvint | A project that creates synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks. | 18 |
gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97 |
nv-tlabs/steal | Develops a method to create high-quality training data from noisy labels in semantic segmentation tasks. | 478 |
reedscot/cvpr2016 | A system for learning deep representations of fine-grained visual descriptions from images. | 334 |
jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
rlhf-v/rlhf-v | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 233 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
0xstepit/flow.nvim | A customizable, high-contrast Nvim color scheme designed to enhance coding focus and productivity | 189 |