imageqa-san
Image QA model
This project provides code for training image question answering models using stacked attention networks and convolutional neural networks.
Code for "Stacked Attention Networks for Image Question Answering"
107 stars
8 watching
52 forks
Language: Python
Last commit: almost 8 years ago

Related projects:
Repository | Description | Stars |
---|---|---|
jnhwkim/nips-mrn-vqa | This project presents a neural network model designed to answer visual questions by combining question and image features in a residual learning framework. | 39 |
zhengpeng7/birefnet | An implementation of a deep learning-based image segmentation model for high-resolution images | 1,319 |
xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 174 |
gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture. | 376 |
zhegan27/semantic_compositional_nets | A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks | 70 |
hszhao/pspnet | A PyTorch implementation of a deep learning model for semantic image segmentation | 1,593 |
tencentarc-qq/qa-clip | Provides Chinese language models with high performance for image-text retrieval and classification tasks. | 48 |
jiasenlu/hiecoattenvqa | A framework for training Hierarchical Co-Attention models for Visual Question Answering using preprocessed data and a specific image model. | 349 |
juntang-zhuang/laddernet | A deep learning implementation of a multi-path network architecture for medical image segmentation | 139 |
visionlearninggroup/ask_attend_and_answer | Develops a deep learning model to answer questions about visual scenes based on spatial attention and question guidance | 25 |
cadene/vqa.pytorch | A PyTorch implementation of visual question answering with multimodal representation learning | 716 |
masaiahhan/correlationqa | An investigation into the relationship between misleading images and hallucinations in large language models | 8 |
zsdonghao/text-to-image | A TensorFlow implementation of generating images from text descriptions using a Generative Adversarial Network (GAN) architecture | 599 |
ocampor/image-quality | Library providing a set of tools and algorithms for evaluating the quality of digital images | 401 |
allenai/document-qa | Tools and codebase for training neural question answering models on multiple paragraphs of text data | 434 |