DenseVideoCaptioning

Video captioning model

An implementation of a dense video captioning model that uses attention-based fusion and context gating.

Official TensorFlow implementation of the paper "Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning" (CVPR 2018), with code, model, and prediction results.

GitHub

149 stars

6 watching

50 forks

Language: Python

last commit: about 6 years ago

dense-video-captioning

Related projects:

Repository	Description	Stars
jamespark3922/adv-inf	A method for generating and evaluating video captions using adversarial inference, trained on large datasets of text and multimedia features.	34
cshizhe/asg2cap	An image caption generation model that uses abstract scene graphs for fine-grained control over the generated captions	200
xiadingz/video-caption.pytorch	PyTorch implementation of video captioning, combining deep learning and computer vision techniques.	402
zhegan27/semantic_compositional_nets	A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks	70
yiwuzhong/sub-gc	A PyTorch implementation of image captioning models via scene graph decomposition.	96
shangwei5/vidue	A deep learning model that jointly performs video frame interpolation and deblurring with unknown exposure time	69
jayleicn/clipbert	An efficient framework for end-to-end learning on image-text and video-text tasks	709
chapternewscu/image-captioning-with-semantic-attention	A deep learning model for generating image captions with semantic attention	51
jcjohnson/densecap	A deep learning framework for generating natural language descriptions of images by detecting objects and their attributes	1,584
pku-yuangroup/video-bench	Evaluates and benchmarks large language models' video understanding capabilities	121
kdexd/virtex	A pretraining approach that uses semantically dense captions to learn visual representations and improve image understanding tasks.	556
deeprnn/image_captioning	An implementation of a neural network model with visual attention for generating captions from images.	790
pku-yuangroup/chronomagic-bench	Provides a benchmarking framework for evaluating the quality of text-to-video generation models	191
codeslake/pvdnet	An open-source implementation of a deep learning model for video deblurring and motion estimation.	114
zhengpeng7/birefnet	An open-source implementation of an image segmentation model that combines background removal and object detection capabilities.	1,484