awesome-action-recognition

Action Recognition Resources

A curated collection of resources and research papers on action recognition and video understanding techniques.

A curated list of action recognition and related area resources

GitHub

4k stars
207 watching
724 forks
last commit: over 1 year ago
Linked from 3 awesome lists

action-classification, action-detection, action-recognition, activity-recognition, activity-understanding, awesome, awesome-list, object-recognition, pose-estimation, video-processing, video-recognition, video-understanding

Awesome Action Recognition: / Action Recognition and Video Understanding / Summary posts

Deep Learning for Videos: A 2018 Guide to Action Recognition Summary of major landmark action recognition research papers till 2018
Literature Survey: Human Action Recognition Brief human action recognition literature survey of work published between 2014 and 2019

Awesome Action Recognition: / Action Recognition and Video Understanding / Video Representation

Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition J. Choi et al., NeurIPS2019
SlowFast Networks for Video Recognition C. Feichtenhofer et al., ICCV2019
Large-scale weakly-supervised pre-training for video action recognition D. Ghadiyaram et al., arXiv2019
Video Classification with Channel-Separated Convolutional Networks D. Tran et al., arXiv2019
DistInit: Learning Video Representations without a Single Labeled Video R. Girdhar et al., arXiv2019
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition B. Korbar et al., arXiv2019
Video Action Transformer Network R. Girdhar et al., CVPR2019
Learning Correspondence from the Cycle-consistency of Time X. Wang et al., CVPR2019
Representation Flow for Action Recognition AJ. Piergiovanni and M. S. Ryoo, CVPR2019
Collaborative Spatiotemporal Feature Learning for Video Action Recognition C. Li et al., CVPR2019
Learning Video Representations from Correspondence Proposals X. Liu et al., CVPR2019
Timeception for Complex Action Recognition N. Hussein et al., CVPR2019
The Visual Centrifuge: Model-Free Layered Video Representations J.-B. Alayrac et al., CVPR2019
Long-Term Feature Banks for Detailed Video Understanding C.-Y. Wu et al., CVPR2019
Temporal Relational Reasoning in Videos B. Zhou et al., ECCV2018
Action Recognition Zoo 244 over 5 years ago - Code for popular action recognition models, written in PyTorch and verified on the Something-Something dataset
Videos as Space-Time Region Graphs X. Wang and A. Gupta, ECCV2018
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? K. Hara et al., CVPR2019
A Closer Look at Spatiotemporal Convolutions for Action Recognition D. Tran et al., CVPR2018
Attend and Interact: Higher-Order Object Interactions for Video Understanding CY. Ma et al., CVPR 2018
Non-Local Neural Networks X. Wang et al., CVPR2018
Rethinking Spatiotemporal Feature Learning For Video Understanding S. Xie et al., arXiv2017
ConvNet Architecture Search for Spatiotemporal Feature Learning D. Tran et al, arXiv2017. Note: Aka Res3D. In the repository, C3D-v1.1 is the Res3D implementation
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks Z. Qiu et al, ICCV2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset J. Carreira et al, CVPR2017
Learning Spatiotemporal Features with 3D Convolutional Networks D. Tran et al, ICCV2015. Note: Aka C3D. Note that the official Caffe code does not support a Python wrapper
Deep Temporal Linear Encoding Networks A. Diba et al, CVPR2017
Temporal Convolutional Networks: A Unified Approach to Action Segmentation and Detection C. Lea et al, CVPR 2017
Long-term Temporal Convolutions G. Varol et al, TPAMI2017
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition L. Wang et al, arXiv 2016
Convolutional Two-Stream Network Fusion for Video Action Recognition C. Feichtenhofer et al, CVPR2016
Two-Stream Convolutional Networks for Action Recognition in Videos K. Simonyan and A. Zisserman, NIPS2014
Temporal Recurrent Networks for Online Action Detection M. Xu et al, ICCV2019
Long Short-Term Transformer for Online Action Detection M. Xu et al, NeurIPS2021
[3D ResNet PyTorch] 3,912 almost 4 years ago
[PyTorch Video Research] 533 over 5 years ago
[M-PACT: Michigan Platform for Activity Classification in Tensorflow] 107 over 5 years ago
[Inflated models on PyTorch] 148 over 3 years ago
[I3D models transferred from Tensorflow to PyTorch] 529 6 months ago
[A Two Stream Baseline on Kinetics dataset] 42 almost 6 years ago
[MMAction] 1,864 over 2 years ago
[MMAction2] 4,315 4 months ago
[PySlowFast] 6,652 7 days ago
[Decord] 1,906 5 months ago Efficient video reader for python
[I3D models converted from Tensorflow to Core ML] 24 over 4 years ago
[Extract frame and optical-flow from videos, #docker] 133 over 2 years ago
[NVIDIA-DALI, video loading pipelines]
[NVIDIA optical-flow SDK]

Awesome Action Recognition: / Action Recognition and Video Understanding / Action Classification

Guided Weak Supervision for Action Recognition with Scarce Data to Assess Skills of Children with Autism P. Pandey et al, AAAI 2020
Neural Graph Matching Networks for Fewshot 3D Action Recognition M. Guo et al., ECCV2018
Temporal 3D ConvNets using Temporal Transition Layer A. Diba et al., CVPRW2018
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification A. Diba et al., arXiv2017
Attentional Pooling for Action Recognition R. Girdhar and D. Ramanan, NIPS2017
Fully Context-Aware Video Prediction Byeon et al, arXiv2017
Hidden Two-Stream Convolutional Networks for Action Recognition Y. Zhu et al, arXiv2017
Dynamic Image Networks for Action Recognition H. Bilen et al, CVPR2016
Long-term Recurrent Convolutional Networks for Visual Recognition and Description J. Donahue et al, CVPR2015
Describing Videos by Exploiting Temporal Structure L. Yao et al, ICCV2015. Note: From the same group as the RCN paper "Delving Deeper into Convolutional Networks for Learning Video Representations"
Two-Stream SR-CNNs for Action Recognition in Videos L. Wang et al, BMVC2016
Real-time Action Recognition with Enhanced Motion Vector CNNs B. Zhang et al, CVPR2016
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors L. Wang et al, CVPR2015

Awesome Action Recognition: / Action Recognition and Video Understanding / Skeleton-Based Action Classification

Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition M. Li et al., CVPR2019
An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition C. Si et al., CVPR2019
View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition P. Zhang et al., TPAMI2019
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition S. Yan et al., AAAI2018
Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition Y. Tang et al., CVPR2018
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation C. Li et al., IJCAI2018
Part-based Graph Convolutional Network for Action Recognition K. Thakkar et al., BMVC2018

Awesome Action Recognition: / Action Recognition and Video Understanding / Temporal Action Detection

Rethinking the Faster R-CNN Architecture for Temporal Action Localization Yu-Wei Chao et al., CVPR2018
Weakly Supervised Action Localization by Sparse Temporal Pooling Network Phuc Nguyen et al., CVPR 2018
Temporal Deformable Residual Networks for Action Segmentation in Videos P. Lei and S. Todorovic, CVPR2018
End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos Shyamal Buch et al., BMVC 2017
Cascaded Boundary Regression for Temporal Action Detection Jiyang Gao et al., BMVC 2017
Temporal Tessellation: A Unified Approach for Video Analysis Kaufman et al., ICCV2017
Temporal Action Detection with Structured Segment Networks Y. Zhao et al., ICCV2017
Temporal Context Network for Activity Localization in Videos X. Dai et al., ICCV2017
Detecting the Moment of Completion: Temporal Models for Localising Action Completion F. Heidarivincheh et al., arXiv2017
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos Z. Shou et al, CVPR2017
SST: Single-Stream Temporal Action Proposals S. Buch et al, CVPR2017
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection H. Xu et al, arXiv2017
DAPs: Deep Action Proposals for Action Understanding V. Escorcia et al, ECCV2016
Online Action Detection using Joint Classification-Regression Recurrent Neural Networks Y. Li et al, ECCV2016. Note: RGB-D Action Detection
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs Z. Shou et al, CVPR2016. Note: Aka S-CNN
Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos F. Heilbron et al, CVPR2016. Note: Depends on , aka SparseProp
Actionness Estimation Using Hybrid Fully Convolutional Networks L. Wang et al, CVPR2016. Note: The code is not a complete version. It contains only a demo, not training
Learning Activity Progression in LSTMs for Activity Detection and Early Detection S. Ma et al, CVPR2016
End-to-end Learning of Action Detection from Frame Glimpses in Videos S. Yeung et al, CVPR2016. Note: This method uses reinforcement learning
Fast Action Proposals for Human Action Detection and Search G. Yu and J. Yuan, CVPR2015. Note: code for FAP is NOT available online. Note: Aka FAP
Bag-of-fragments: Selecting and encoding video fragments for event detection and recounting P. Mettes et al, ICMR2015
Action localization in videos through context walk K. Soomro et al, ICCV2015

Awesome Action Recognition: / Action Recognition and Video Understanding / Spatio-Temporal Action Detection

A Better Baseline for AVA R. Girdhar et al., ActivityNet Workshop, CVPR2018
Real-Time End-to-End Action Detection with Two-Stream Networks A. El-Nouby and G. Taylor, arXiv2018
Human Action Localization with Sparse Spatial Supervision P. Weinzaepfel et al., arXiv2017
Unsupervised Action Discovery and Localization in Videos K. Soomro and M. Shah, ICCV2017
Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions P. Mettes and C. G. M. Snoek, ICCV2017
Action Tubelet Detector for Spatio-Temporal Action Localization V. Kalogeiton et al, ICCV2017
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos et al, ICCV2017
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection M. Zolfaghari et al, ICCV2017
TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal H. Zhu et al., ICCV2017
Online Real time Multiple Spatiotemporal Action Localisation and Prediction et al, ICCV2017
AMTnet: Action-Micro-Tube regression by end-to-end trainable deep architecture S. Saha et al, ICCV2017
Am I Done? Predicting Action Progress in Videos F. Becattini et al, BMVC2017
Generic Tubelet Proposals for Action Localization J. He et al, arXiv2017
Incremental Tube Construction for Human Action Detection H. S. Behl et al, arXiv2017
Multi-region two-stream R-CNN for action detection and C. Schmid. ECCV2016
Spot On: Action Localization from Pointly-Supervised Proposals P. Mettes et al, ECCV2016
Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos S. Saha et al, BMVC2016
Learning to track for spatio-temporal action localization P. Weinzaepfel et al. ICCV2015
Action detection by implicit intentional motion clustering W. Chen and J. Corso, ICCV2015
Finding Action Tubes G. Gkioxari and J. Malik CVPR2015
APT: Action localization proposals from dense trajectories J. Gemert et al, BMVC2015
Spatio-Temporal Object Detection Proposals D. Oneata et al, ECCV2014
Action localization with tubelets from motion M. Jain et al, CVPR2014
Spatiotemporal deformable part models for action detection et al, CVPR2013
Action localization in videos through context walk K. Soomro et al, ICCV2015
Fast Action Proposals for Human Action Detection and Search G. Yu and J. Yuan, CVPR2015. Note: code for FAP is NOT available online. Note: Aka FAP

Awesome Action Recognition: / Action Recognition and Video Understanding / Ego-Centric Action Recognition

Actor and Observer: Joint Modeling of First and Third-Person Videos G. Sigurdsson et al., CVPR2018

Awesome Action Recognition: / Action Recognition and Video Understanding / Miscellaneous

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment P. Parmar and B. T. Morris, CVPR2019
PathTrack: Fast Trajectory Annotation with Path Supervision S. Manen et al., ICCV2017
CortexNet: a Generic Network Family for Robust Visual Temporal Representations A. Canziani and E. Culurciello - arXiv2017
Slicing Convolutional Neural Network for Crowd Video Understanding J. Shao et al, CVPR2016
Two-Stream (RGB and Flow) pretrained model weights 26 about 8 years ago

Awesome Action Recognition: / Action Recognition and Video Understanding / Action Recognition Datasets

Video Dataset Overview from Antoine Miech
HACS
Moments in Time
AVA , , for missing videos
Kinetics
OOPS A dataset of unintentional action
COIN A large-scale dataset for comprehensive instructional video analysis
YouTube-8M
YouTube-BB
DALY Daily Action Localization in Youtube videos. Note: Weakly supervised action detection dataset. Annotations consist of the start and end time of each action, with one bounding box per action per video
20BN-JESTER
ActivityNet Note: They provide a download script and evaluation code
Charades
Charades-Ego - First person and third person video aligned dataset
EPIC-Kitchens - First person videos recorded in kitchens. Note: They provide download scripts and a Python library
Sports-1M Large scale action recognition dataset
THUMOS14 Note: It overlaps with dataset
THUMOS15 Note: It overlaps with dataset
HOLLYWOOD2
UCF-101 , , and , and . And there are also some pre-computed spatiotemporal action detection
UCF-50
UCF-Sports , note: the train/test split link in the official website is broken. Instead, you can download it from
HMDB
J-HMDB
LIRIS-HARL
KTH
MSR Action Note: It overlaps with dataset
Sports Videos in the Wild
NTU RGB+D 759 almost 3 years ago
Mixamo Mocap Dataset
UWA3D Multiview Activity II Dataset
Northwestern-UCLA Dataset
SYSU 3D Human-Object Interaction Dataset
MEVA (Multiview Extended Video with Activities) Dataset

Awesome Action Recognition: / Action Recognition and Video Understanding / Video Annotation

Efficiently scaling up crowdsourced video annotation C. Vondrick et al., IJCV2013
The Design and Implementation of ViPER D. Mihalcik and D. Doermann, Technical report
VoTT: Visual Object Tagging Tool 4,320 almost 3 years ago. Modern app to annotate objects in videos and images. It facilitates the development of an end-to-end machine learning pipeline encompassing the annotation/export/import of assets. It can run as a native app or via the web
VIA: VGG Image Annotator. Simple, standalone manual annotation web app for image, audio and video. It runs in the web browser and does not require any installation or setup

Awesome Action Recognition: / Object Recognition / Object Detection

Deformable Convolutional Networks J. Dai et al., ICCV2017
Detectron 26,276 about 1 year ago Open Source Object Detection Framework from Facebook AI Research. Includes Mask R-CNN, FPN, etc. Caffe2 implementation
Mask R-CNN K. He et al - State-of-the-art object detection/instance segmentation algorithm
Faster R-CNN S. Ren et al, NIPS2015 - State-of-the-art object detector
YOLO J. Redmon et al, CVPR2016 - Fast object detector
YOLO9000 J. Redmon and A. Farhadi, CVPR2017 - State-of-the-art object detector which can detect 9000 objects in realtime
SSD W. Liu et al, ECCV2016 - State-of-the-art object detector with realtime processing speed
RetinaNet Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár, Facebook AI Research (FAIR), ICCV2017 - State-of-the-art object detector with realtime processing speed

Awesome Action Recognition: / Object Recognition / Video Object Detection

[code] 553 over 6 years ago [Detect to Track and Track to Detect] - C. Feichtenhofer et al., ICCV2017
[code] 723 about 3 years ago [Flow-Guided Feature Aggregation for Video Object Detection] - X. Zhu et al., ICCV2017. Aka FGFA

Awesome Action Recognition: / Object Recognition / Video Object Detection Datasets

ImageNet VID
YouTube-8M
YouTube-BB

Awesome Action Recognition: / Pose Estimation / Pose Estimation

AlphaPose 8,065 7 months ago PyTorch based realtime and accurate pose estimation and tracking tool from SJTU
Detect-and-Track: Efficient Pose Estimation in Videos R. Girdhar et al., arXiv2017
OpenPose Library 31,387 4 months ago Caffe based realtime pose estimation library from CMU
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields Z. Cao et al, CVPR2017. depends on the - Earlier version of OpenPose from CMU
DensePose Dense pose human estimation in the wild implemented in the Detectron framework
MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network M. Kocabas et al, ECCV2018
DeepLabCut: markerless pose estimation of user-defined body parts with deep learning A. Mathis et al, Nature Neuroscience 2018

Awesome Action Recognition: / Competitions / Competitions

ActEV (Activities in Extended Video) Activity detection in security camera videos. Runs through 2021. Hosted by NIST
