DenseVideoCaptioning

Video captioning model

An implementation of a dense video captioning model with attention-based fusion and context gating.

Official TensorFlow implementation of the paper "Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning" (CVPR 2018), with code, model, and prediction results.

GitHub

148 stars
6 watching
50 forks
Language: Python
Last commit: over 5 years ago
dense-video-captioning

Related projects:

Repository | Description | Stars
jamespark3922/adv-inf | A method for generating and evaluating video captions using adversarial inference, trained on large datasets of text and multimedia features | 34
cshizhe/asg2cap | An image caption generation model that uses abstract scene graphs for fine-grained control over the generated captions | 200
xiadingz/video-caption.pytorch | A PyTorch implementation of video captioning, combining deep learning and computer vision techniques | 401
zhegan27/semantic_compositional_nets | A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks | 70
yiwuzhong/sub-gc | A PyTorch implementation of image captioning models via scene graph decomposition | 96
shangwei5/vidue | A deep learning model that jointly performs video frame interpolation and deblurring with unknown exposure time | 66
jayleicn/clipbert | An efficient framework for end-to-end learning on image-text and video-text tasks | 704
chapternewscu/image-captioning-with-semantic-attention | A deep learning model for generating image captions with semantic attention | 51
jcjohnson/densecap | A deep learning framework for generating natural language descriptions of images by detecting objects and their attributes | 1,584
pku-yuangroup/video-bench | A benchmark that evaluates large language models' video understanding capabilities | 117
kdexd/virtex | A pretraining approach that uses semantically dense captions to learn visual representations and improve image understanding tasks | 557
deeprnn/image_captioning | An implementation that generates captions from images using a neural network model with visual attention | 786
pku-yuangroup/chronomagic-bench | A benchmark and dataset for evaluating text-to-video generation models' ability to generate coherent and varied metamorphic time-lapse videos | 186
codeslake/pvdnet | An open-source implementation of a deep learning model for video deblurring and motion estimation | 114
zhengpeng7/birefnet | An implementation of a deep learning-based image segmentation model for high-resolution images | 1,319