awesome-image-captioning

A curated list of resources on image captioning and related areas. :-)

GitHub

1k stars
40 watching
184 forks
last commit: over 1 year ago
Linked from 1 awesome list


Awesome Image Captioning / Change Log

May 25: An up-to-date paper list about vision-and-language pre-training is available here (56 stars, last commit about 2 years ago).

Awesome Image Captioning / Papers / Survey

A Comprehensive Survey of Deep Learning for Image Captioning Hossain M et al,

Awesome Image Captioning / Papers / Before 2015

I2t: Image parsing to text description Yao B Z et al,
Im2Text: Describing Images Using 1 Million Captioned Photographs Ordonez V et al,
Deep Captioning with Multimodal Recurrent Neural Networks Mao J et al,

Awesome Image Captioning / Papers / 2015

Show and Tell: A Neural Image Caption Generator Vinyals O et al,
Deep Visual-Semantic Alignments for Generating Image Descriptions Karpathy A et al,
Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation Chen X et al,
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Donahue J et al,
Guiding the Long-Short Term Memory Model for Image Caption Generation Jia X et al,
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images Mao J et al,
Expressing an Image Stream with a Sequence of Natural Sentences Park C C et al,
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Xu K et al,
Order-Embeddings of Images and Language Vendrov I et al,
Generating Images from Captions with Attention Mansimov E et al,
Learning FRAME Models Using CNN Filters for Knowledge Visualization Lu Y et al,
Aligning where to see and what to tell: image caption with region-based attention and scene factorization Jin J et al,

Awesome Image Captioning / Papers / 2016

Image captioning with semantic attention You Q et al,
DenseCap: Fully Convolutional Localization Networks for Dense Captioning Johnson J et al,
What value do explicit high level concepts have in vision to language problems? Wu Q et al,
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data Hendricks L A et al,
SPICE: Semantic Propositional Image Caption Evaluation Anderson P et al,
Image Captioning with Deep Bidirectional LSTMs Wang C et al,
Multimodal Pivots for Image Caption Translation Hitschler J et al,
Image Caption Generation with Text-Conditional Semantic Attention Zhou L et al,
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams Fan C et al,
Learning to generalize to new compositions in image understanding Atzmon Y et al,
Generating captions without looking beyond objects Heuer H et al,
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning Chen W et al,
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering Liu H et al,
Recurrent Highway Networks with Language CNN for Image Captioning Gu J et al,

Awesome Image Captioning / Papers / 2017

Captioning Images with Diverse Objects Venugopalan S et al,
Top-down Visual Saliency Guided by Captions Ramanishka V et al,
Self-Critical Sequence Training for Image Captioning Rennie S J et al,
Dense Captioning with Joint Inference and Visual Context Yang L et al,
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition Wang Y et al,
A Hierarchical Approach for Generating Descriptive Image Paragraphs Krause J et al,
Deep Reinforcement Learning-based Image Captioning with Embedding Reward Ren Z et al,
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects Yao T et al,
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning Lu J et al,
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks Park C C et al,
SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning Chen L et al,
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning Sun Q et al,
Areas of Attention for Image Captioning Pedersoli M et al,
Boosting Image Captioning with Attributes Yao T et al,
An Empirical Study of Language CNN for Image Captioning Gu J et al,
Improved Image Captioning via Policy Gradient Optimization of SPIDEr Liu S et al,
Towards Diverse and Natural Image Descriptions via a Conditional GAN Dai B et al,
Paying Attention to Descriptions Generated by Image Captioning Models Tavakoli H R et al,
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner Chen T H et al,
Image Caption with Global-Local Attention Li L et al,
Reference Based LSTM for Image Captioning Chen M et al,
Attention Correctness in Neural Image Captioning Liu C et al,
Text-guided Attention Model for Image Captioning Mun J et al,
Contrastive Learning for Image Captioning Dai B et al,
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge Vinyals O et al,
MAT: A Multimodal Attentive Translator for Image Captioning Liu C et al,
Actor-Critic Sequence Training for Image Captioning Zhang L et al,
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? Tanti M et al,
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning Xian Y et al,
Phrase-based Image Captioning with Hierarchical LSTM Model Tan Y H et al,
Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning Chen H et al,

Awesome Image Captioning / Papers / 2018

Neural Baby Talk Lu J et al,
Convolutional Image Captioning Aneja J et al,
Learning to Evaluate Image Captioning Cui Y et al,
Discriminability Objective for Training Descriptive Captions Luo R et al,
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text Mathews A et al,
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Anderson P et al,
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints Chen F et al,
Unpaired Image Captioning by Language Pivoting Gu J et al,
Recurrent Fusion Network for Image Captioning Jiang W et al,
Exploring Visual Relationship for Image Captioning Yao T et al,
Rethinking the Form of Latent States in Image Captioning Dai B et al,
Boosted Attention: Leveraging Human Attention for Image Captioning Chen S et al,
"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention Chen T et al,
Learning to Guide Decoding for Image Captioning Jiang W et al,
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning Gu J et al,
Temporal-difference Learning with Sampling Baseline for Image Captioning Chen H et al,
Partially-Supervised Image Captioning Anderson P et al,
A Neural Compositional Paradigm for Image Captioning Dai B et al,
Defoiling Foiled Image Captions Madhyastha P et al,
Punny Captions: Witty Wordplay in Image Descriptions Chandrasekaran A et al,
Object Counts! Bringing Explicit Detections Back into Image Captioning Wang J et al,
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning Sharma P et al,
Attacking visual language grounding with adversarial examples: A case study on neural image captioning Chen H et al,
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions Liu et al,
Improved Image Captioning with Adversarial Semantic Alignment Melnyk I et al,
Improving Image Captioning with Conditional Generative Adversarial Nets Chen C et al,
CNN+CNN: Convolutional Decoders for Image Captioning Wang Q et al,
Diverse and Controllable Image Captioning with Part-of-Speech Guidance Deshpande A et al,

Awesome Image Captioning / Papers / 2019

Unsupervised Image Captioning Feng Y et al,
Engaging Image Captioning Via Personality Shuster K et al,
Pointing Novel Objects in Image Captioning Li Y et al,
Auto-Encoding Scene Graphs for Image Captioning Yang X et al,
Context and Attribute Grounded Dense Captioning Yin G et al,
Look Back and Predict Forward in Image Captioning Qin Y et al,
Self-critical n-step Training for Image Captioning Gao J et al,
Intention Oriented Image Captions with Guiding Objects Zheng Y et al,
Describing like humans: on diversity in image captioning Wang Q et al,
Adversarial Semantic Alignment for Improved Image Captions Dognin P et al,
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text Gao L et al,
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech Deshpande A et al,
Good News, Everyone! Context driven entity-aware captioning for news images Biten A F et al,
CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection Zhang L et al, (50 stars, last commit over 4 years ago)
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning Kim D et al,
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions Cornia M et al,
Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables Xu Y et al,
Meta Learning for Image Captioning Li N et al,
Learning Object Context for Dense Captioning Li X et al,
Hierarchical Attention Network for Image Captioning Wang W et al,
Deliberate Residual based Attention Network for Image Captioning Gao L et al,
Improving Image Captioning with Conditional Generative Adversarial Nets Chen C et al,
Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding Song L et al,
Dense Procedure Captioning in Narrated Instructional Videos Shi B et al,
Informative Image Captioning with External Sources of Information Zhao S et al,
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning Fan Z et al,
Image Captioning with Unseen Objects Demirel et al,
Look and Modify: Modification Networks for Image Captioning Sammani et al,
Show, Infer and Tell: Contextual Inference for Creative Captioning Khare et al,
SC-RANK: Improving Convolutional Image Captioning with Self-Critical Learning and Ranking Metric-based Reward Yan et al,
Hierarchy Parsing for Image Captioning Yao T et al,
Entangled Transformer for Image Captioning Li G et al,
Attention on Attention for Image Captioning Huang L et al,
Reflective Decoding Network for Image Captioning Ke L et al,
Learning to Collocate Neural Modules for Image Captioning Yang X et al,
Image Captioning: Transforming Objects into Words Herdade S et al,
Adaptively Aligned Image Captioning via Adaptive Attention Time Huang L et al,
Variational Structured Semantic Inference for Diverse Image Captioning Chen F et al,
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations Liu F et al,
Image Captioning with Compositional Neural Module Networks Tian J et al,
Exploring and Distilling Cross-Modal Information for Image Captioning Liu F et al,
Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization Wang H et al,
Hornet: a hierarchical offshoot recurrent network for improving person re-ID via image captioning Yan S et al,
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach Kim D J et al,
TIGEr: Text-to-Image Grounding for Image Caption Evaluation Jiang M et al,
REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning Jiang M et al,
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering Changpinyo S et al,
Compositional Generalization in Image Captioning Nikolaus M et al,

Awesome Image Captioning / Papers / 2020

MemCap: Memorizing Style Knowledge for Image Captioning Zhao et al,
Unified Vision-Language Pre-Training for Image Captioning and VQA Zhou L et al,
Show, Recall, and Tell: Image Captioning with Recall Mechanism Wang L et al,
Reinforcing an Image Caption Generator using Off-line Human Feedback Seo P H et al,
Interactive Dual Generative Adversarial Networks for Image Captioning Liu et al,
Feature Deformation Meta-Networks in Image Captioning of Novel Objects Cao et al,
Joint Commonsense and Relation Reasoning for Image and Video Captioning Hou et al,
Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption Zhang et al,
Normalized and Geometry-Aware Self-Attention Network for Image Captioning Guo L et al,
Object Relational Graph with Teacher-Recommended Learning for Video Captioning Zhang Z et al,
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs Chen S et al,
X-Linear Attention Networks for Image Captioning Pan et al,
Improving Image Captioning with Better Use of Captions Shi Z et al,
Cross-modal Coherence Modeling for Caption Generation Alikhani M et al,
Improving Image Captioning Evaluation by Considering Inter References Variance Yi Y et al,
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning Lei J et al,
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA Kim H et al,
Length-Controllable Image Captioning Deng C et al,
Captioning Images Taken by People Who Are Blind Gurari D et al,
Towards Unique and Informative Captioning of Images Wang Z et al,
Learning Visual Representations with Caption Annotations Sariyildiz M et al,
Comprehensive Image Captioning via Scene Graph Decomposition Zhong Y et al,
SODA: Story Oriented Dense Video Captioning Evaluation Framework Fujita S et al,
TextCaps: a Dataset for Image Captioning with Reading Comprehension Sidorov O et al,
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets Wang J et al,
Learning to Generate Grounded Visual Captions without Localization Supervision Ma C et al,
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards Yang X et al,
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos Chen S et al,
CapWAP: Image Captioning with a Purpose Fisch A et al,
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers Cho J et al,
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning Fang Z et al,
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements Li Y et al,
Diverse Image Captioning with Context-Object Split Latent Spaces Mahajan S et al,
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning Del Chiaro R et al,

Awesome Image Captioning / Dataset

nocaps
MS COCO
Flickr 8k
Flickr 30k
AI Challenger
Visual Genome
SBU Captioned Photo Dataset
IAPR TC-12

Awesome Image Captioning / Image Captioning Challenge

Microsoft COCO Image Captioning
Google AI Blog: Conceptual Captions

Awesome Image Captioning / Popular Implementations

ruotianluo/self-critical.pytorch (993 stars, last commit about 1 year ago)
ruotianluo/ImageCaptioning.pytorch (1,436 stars, last commit about 1 year ago)
jiasenlu/NeuralBabyTalk (523 stars, last commit over 5 years ago)
tensorflow/models/im2txt (76,987 stars, last commit 2 days ago)
DeepRNN/image_captioning (785 stars, last commit over 2 years ago)
jcjohnson/densecap (1,580 stars, last commit about 6 years ago)
karpathy/neuraltalk2 (5,500 stars, last commit almost 7 years ago)
jiasenlu/AdaptiveAttention (334 stars, last commit almost 7 years ago)
emansim/text2image (592 stars, last commit over 7 years ago)
apple2373/chainer-caption (64 stars, last commit over 5 years ago)
peteanderson80/bottom-up-attention (1,426 stars, last commit over 1 year ago)
