Awesome Image Captioning / Change Log
May 25: An up-to-date paper list about vision-and-language pre-training is available here.
Awesome Image Captioning / Papers / Survey
A Comprehensive Survey of Deep Learning for Image Captioning | Hossain M et al.
Awesome Image Captioning / Papers / Before 2015
I2T: Image Parsing to Text Description | Yao B Z et al.
Im2Text: Describing Images Using 1 Million Captioned Photographs | Ordonez V et al.
Deep Captioning with Multimodal Recurrent Neural Networks | Mao J et al.
Awesome Image Captioning / Papers / 2015
Show and Tell: A Neural Image Caption Generator | Vinyals O et al.
Deep Visual-Semantic Alignments for Generating Image Descriptions | Karpathy A et al.
Mind's Eye: A Recurrent Visual Representation for Image Caption Generation | Chen X et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Donahue J et al.
Guiding the Long-Short Term Memory Model for Image Caption Generation | Jia X et al.
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images | Mao J et al.
Expressing an Image Stream with a Sequence of Natural Sentences | Park C C et al.
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention | Xu K et al.
Order-Embeddings of Images and Language | Vendrov I et al.
Generating Images from Captions with Attention | Mansimov E et al.
Learning FRAME Models Using CNN Filters for Knowledge Visualization | Lu Y et al.
Aligning where to see and what to tell: image caption with region-based attention and scene factorization | Jin J et al.
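Several of the 2015 entries above, beginning with Show, Attend and Tell (Xu K et al.), are built on soft visual attention: at each decoding step the model forms a weighted average of CNN region features. A minimal NumPy sketch of that weighting step (the scoring function, shapes, and random inputs here are illustrative assumptions, not any listed paper's exact formulation):

```python
import numpy as np

def soft_attention(features, hidden, w):
    """Weight CNN region features by relevance to the decoder state.

    features: (num_regions, feat_dim) one CNN feature vector per image region
    hidden:   (feat_dim,) current decoder hidden state
    w:        (feat_dim,) learned scoring vector (random here, for illustration)
    """
    # score each region by compatibility with the hidden state
    scores = features @ (w * hidden)              # (num_regions,)
    # softmax turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # context vector: attention-weighted average of region features
    context = weights @ features                  # (feat_dim,)
    return weights, context

rng = np.random.default_rng(0)
feats = rng.normal(size=(14 * 14, 512))           # e.g. a 14x14 conv feature map
alpha, ctx = soft_attention(feats, rng.normal(size=512), rng.normal(size=512))
```

In the papers above, the context vector would then feed the LSTM decoder alongside the previous word's embedding.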
Awesome Image Captioning / Papers / 2016
Image captioning with semantic attention | You Q et al.
DenseCap: Fully Convolutional Localization Networks for Dense Captioning | Johnson J et al.
What value do explicit high level concepts have in vision to language problems? | Wu Q et al.
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data | Hendricks L A et al.
SPICE: Semantic Propositional Image Caption Evaluation | Anderson P et al.
Image Captioning with Deep Bidirectional LSTMs | Wang C et al.
Multimodal Pivots for Image Caption Translation | Hitschler J et al.
Image Caption Generation with Text-Conditional Semantic Attention | Zhou L et al.
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams | Fan C et al.
Learning to generalize to new compositions in image understanding | Atzmon Y et al.
Generating captions without looking beyond objects | Heuer H et al.
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning | Chen W et al.
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering | Liu H et al.
Recurrent Highway Networks with Language CNN for Image Captioning | Gu J et al.
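The SPICE entry above (Anderson P et al.) argues for evaluating captions beyond n-gram overlap. For contrast, here is a minimal sketch of the clipped n-gram precision that underlies BLEU-style metrics (single reference, no brevity penalty; a deliberate simplification, not any listed paper's metric):

```python
from collections import Counter

def clipped_ngram_precision(candidate, reference, n=1):
    """BLEU-style clipped n-gram precision against a single reference.

    Each candidate n-gram counts as correct at most as many times as it
    appears in the reference: the clipping that stops a degenerate caption
    like "the the the" from scoring perfectly against "the cat".
    """
    ngrams = lambda toks: Counter(zip(*(toks[i:] for i in range(n))))
    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

# 4 of 6 candidate unigrams appear in the reference
p1 = clipped_ngram_precision("a dog runs on the grass", "a dog is running on grass")
```

Metrics like SPICE instead parse both captions into semantic tuples (objects, attributes, relations) and score tuple overlap, which this word-level counting cannot capture.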
Awesome Image Captioning / Papers / 2017
Captioning Images with Diverse Objects | Venugopalan S et al.
Top-down Visual Saliency Guided by Captions | Ramanishka V et al.
Self-Critical Sequence Training for Image Captioning | Rennie S J et al.
Dense Captioning with Joint Inference and Visual Context | Yang L et al.
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition | Wang Y et al.
A Hierarchical Approach for Generating Descriptive Image Paragraphs | Krause J et al.
Deep Reinforcement Learning-based Image Captioning with Embedding Reward | Ren Z et al.
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects | Yao T et al.
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning | Lu J et al.
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks | Park C C et al.
SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning | Chen L et al.
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning | Sun Q et al.
Areas of Attention for Image Captioning | Pedersoli M et al.
Boosting Image Captioning with Attributes | Yao T et al.
An Empirical Study of Language CNN for Image Captioning | Gu J et al.
Improved Image Captioning via Policy Gradient Optimization of SPIDEr | Liu S et al.
Towards Diverse and Natural Image Descriptions via a Conditional GAN | Dai B et al.
Paying Attention to Descriptions Generated by Image Captioning Models | Tavakoli H R et al.
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner | Chen T H et al.
Image Caption with Global-Local Attention | Li L et al.
Reference Based LSTM for Image Captioning | Chen M et al.
Attention Correctness in Neural Image Captioning | Liu C et al.
Text-guided Attention Model for Image Captioning | Mun J et al.
Contrastive Learning for Image Captioning | Dai B et al.
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge | Vinyals O et al.
MAT: A Multimodal Attentive Translator for Image Captioning | Liu C et al.
Actor-Critic Sequence Training for Image Captioning | Zhang L et al.
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? | Tanti M et al.
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning | Xian Y et al.
Phrase-based Image Captioning with Hierarchical LSTM Model | Tan Y H et al.
Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning | Chen H et al.
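The Self-Critical Sequence Training entry above casts captioning as REINFORCE with the reward of the model's own greedy decode as the baseline, which removes the need for a learned value function. A toy sketch of the resulting per-caption loss (the reward values below are made up; in practice they would be sentence-level CIDEr scores):

```python
import numpy as np

def scst_loss(token_log_probs, sampled_reward, greedy_reward):
    """Self-critical REINFORCE loss for one sampled caption.

    loss = -(r_sample - r_greedy) * sum_t log p(w_t)
    Minimizing this pushes up the probability of a sampled caption that
    beats the greedy decode, and pushes down one that falls short.
    """
    advantage = sampled_reward - greedy_reward
    return -advantage * float(np.sum(token_log_probs))

# per-token log-probabilities of one sampled caption (toy numbers)
log_probs = np.log([0.4, 0.3, 0.5])
loss_good = scst_loss(log_probs, sampled_reward=1.2, greedy_reward=0.9)  # sample beat greedy
loss_bad = scst_loss(log_probs, sampled_reward=0.6, greedy_reward=0.9)   # sample fell short
```

Because the baseline is just a second decode of the same model, the method needs no extra parameters, which is a large part of why it became the standard fine-tuning recipe in the later entries of this list.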
Awesome Image Captioning / Papers / 2018
Neural Baby Talk | Lu J et al.
Convolutional Image Captioning | Aneja J et al.
Learning to Evaluate Image Captioning | Cui Y et al.
Discriminability Objective for Training Descriptive Captions | Luo R et al.
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text | Mathews A et al.
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | Anderson P et al.
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints | Chen F et al.
Unpaired Image Captioning by Language Pivoting | Gu J et al.
Recurrent Fusion Network for Image Captioning | Jiang W et al.
Exploring Visual Relationship for Image Captioning | Yao T et al.
Rethinking the Form of Latent States in Image Captioning | Dai B et al.
Boosted Attention: Leveraging Human Attention for Image Captioning | Chen S et al.
"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention | Chen T et al.
Learning to Guide Decoding for Image Captioning | Jiang W et al.
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning | Gu J et al.
Temporal-difference Learning with Sampling Baseline for Image Captioning | Chen H et al.
Partially-Supervised Image Captioning | Anderson P et al.
A Neural Compositional Paradigm for Image Captioning | Dai B et al.
Defoiling Foiled Image Captions | Wang J et al.
Punny Captions: Witty Wordplay in Image Descriptions | Chandrasekaran A et al.
Object Counts! Bringing Explicit Detections Back into Image Captioning | Aneja J et al.
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning | Sharma P et al.
Attacking visual language grounding with adversarial examples: A case study on neural image captioning | Chen H et al.
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions | Liu et al.
Improved Image Captioning with Adversarial Semantic Alignment | Melnyk I et al.
Improving Image Captioning with Conditional Generative Adversarial Nets | Chen C et al.
CNN+CNN: Convolutional Decoders for Image Captioning | Wang Q et al.
Diverse and Controllable Image Captioning with Part-of-Speech Guidance | Deshpande A et al.
Awesome Image Captioning / Papers / 2019
Unsupervised Image Captioning | Yang F et al.
Engaging Image Captioning Via Personality | Shuster K et al.
Pointing Novel Objects in Image Captioning | Li Y et al.
Auto-Encoding Scene Graphs for Image Captioning | Yang X et al.
Context and Attribute Grounded Dense Captioning | Yin G et al.
Look Back and Predict Forward in Image Captioning | Qin Y et al.
Self-critical n-step Training for Image Captioning | Gao J et al.
Intention Oriented Image Captions with Guiding Objects | Zheng Y et al.
Describing like humans: on diversity in image captioning | Wang Q et al.
Adversarial Semantic Alignment for Improved Image Captions | Dognin P et al.
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text | Gao L et al.
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | Deshpande A et al.
Good News, Everyone! Context driven entity-aware captioning for news images | Biten A F et al.
CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection | Zhang L et al.
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning | Kim D et al.
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions | Cornia M et al.
Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables | Xu Y et al.
Meta Learning for Image Captioning | Li N et al.
Learning Object Context for Dense Captioning | Li X et al.
Hierarchical Attention Network for Image Captioning | Wang W et al.
Deliberate Residual based Attention Network for Image Captioning | Gao L et al.
Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding | Song L et al.
Dense Procedure Captioning in Narrated Instructional Videos | Shi B et al.
Informative Image Captioning with External Sources of Information | Zhao S et al.
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning | Fan Z et al.
Image Captioning with Unseen Objects | Demirel et al.
Look and Modify: Modification Networks for Image Captioning | Sammani et al.
Show, Infer and Tell: Contextual Inference for Creative Captioning | Khare et al.
SC-RANK: Improving Convolutional Image Captioning with Self-Critical Learning and Ranking Metric-based Reward | Yan et al.
Hierarchy Parsing for Image Captioning | Yao T et al.
Entangled Transformer for Image Captioning | Li G et al.
Attention on Attention for Image Captioning | Huang L et al.
Reflective Decoding Network for Image Captioning | Ke L et al.
Learning to Collocate Neural Modules for Image Captioning | Yang X et al.
Image Captioning: Transforming Objects into Words | Herdade S et al.
Adaptively Aligned Image Captioning via Adaptive Attention Time | Huang L et al.
Variational Structured Semantic Inference for Diverse Image Captioning | Chen F et al.
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations | Liu F et al.
Image Captioning with Compositional Neural Module Networks | Tian J et al.
Exploring and Distilling Cross-Modal Information for Image Captioning | Liu F et al.
Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization | Wang H et al.
Hornet: a hierarchical offshoot recurrent network for improving person re-ID via image captioning | Yan S et al.
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach | Kim D J et al.
TIGEr: Text-to-Image Grounding for Image Caption Evaluation | Jiang M et al.
REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning | Jiang M et al.
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering | Changpinyo S et al.
Compositional Generalization in Image Captioning | Nikolaus M et al.
Awesome Image Captioning / Papers / 2020
MemCap: Memorizing Style Knowledge for Image Captioning | Zhao et al.
Unified Vision-Language Pre-Training for Image Captioning and VQA | Zhou L et al.
Show, Recall, and Tell: Image Captioning with Recall Mechanism | Wang L et al.
Reinforcing an Image Caption Generator using Off-line Human Feedback | Seo P H et al.
Interactive Dual Generative Adversarial Networks for Image Captioning | Liu et al.
Feature Deformation Meta-Networks in Image Captioning of Novel Objects | Cao et al.
Joint Commonsense and Relation Reasoning for Image and Video Captioning | Hou et al.
Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption | Zhang et al.
Normalized and Geometry-Aware Self-Attention Network for Image Captioning | Guo L et al.
Object Relational Graph with Teacher-Recommended Learning for Video Captioning | Zhang Z et al.
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs | Chen S et al.
X-Linear Attention Networks for Image Captioning | Pan et al.
Improving Image Captioning with Better Use of Caption | Shi Z et al.
Cross-modal Coherence Modeling for Caption Generation | Alikhani M et al.
Improving Image Captioning Evaluation by Considering Inter References Variance | Yi Y et al.
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | Lei J et al.
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA | Kim H et al.
Length-Controllable Image Captioning | Deng C et al.
Captioning Images Taken by People Who Are Blind | Gurari D et al.
Towards Unique and Informative Captioning of Images | Wang Z et al.
Learning Visual Representations with Caption Annotations | Sariyildiz M et al.
Comprehensive Image Captioning via Scene Graph Decomposition | Zhong Y et al.
SODA: Story Oriented Dense Video Captioning Evaluation Framework | Fujita S et al.
TextCaps: a Dataset for Image Captioning with Reading Comprehension | Sidorov O et al.
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets | Wang J et al.
Learning to Generate Grounded Visual Captions without Localization Supervision | Ma C et al.
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards | Yang X et al.
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos | Chen S et al.
CapWAP: Image Captioning with a Purpose | Fisch A et al.
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers | Cho J et al.
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning | Fang Z et al.
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements | Li Y et al.
Diverse Image Captioning with Context-Object Split Latent Spaces | Mahajan S et al.
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning | Chiaro R et al.
Awesome Image Captioning / Dataset
nocaps
MS COCO
Flickr 8k
Flickr 30k
AI Challenger
Visual Genome
SBU Captioned Photo Dataset
IAPR TC-12
Awesome Image Captioning / Image Captioning Challenge
Microsoft COCO Image Captioning
Google AI Blog: Conceptual Captions
Awesome Image Captioning / Popular Implementations / PyTorch
ruotianluo/self-critical.pytorch (997 stars)
ruotianluo/ImageCaptioning.pytorch (1,451 stars)
jiasenlu/NeuralBabyTalk (524 stars)
Awesome Image Captioning / Popular Implementations / TensorFlow
tensorflow/models/im2txt (77,177 stars)
DeepRNN/image_captioning (786 stars)
Awesome Image Captioning / Popular Implementations / Torch
jcjohnson/densecap (1,584 stars)
karpathy/neuraltalk2 (5,511 stars)
jiasenlu/AdaptiveAttention (334 stars)
Awesome Image Captioning / Popular Implementations / Others
emansim/text2image (592 stars)
apple2373/chainer-caption (64 stars)
peteanderson80/bottom-up-attention (1,433 stars)