Awesome Computer Vision Models / Classification models
| 'One weird trick for parallelizing convolutional neural networks' | | | AlexNet ( ) |
| 'Very Deep Convolutional Networks for Large-Scale Image Recognition' | | | VGG-16 ( ) |
| 'Deep Residual Learning for Image Recognition' | | | ResNet-10 ( ) |
| 'Deep Residual Learning for Image Recognition' | | | ResNet-18 ( ) |
| 'Deep Residual Learning for Image Recognition' | | | ResNet-34 ( ) |
| 'Deep Residual Learning for Image Recognition' | | | ResNet-50 ( ) |
| 'Rethinking the Inception Architecture for Computer Vision' | | | InceptionV3 ( ) |
| 'Identity Mappings in Deep Residual Networks' | | | PreResNet-18 ( ) |
| 'Identity Mappings in Deep Residual Networks' | | | PreResNet-34 ( ) |
| 'Identity Mappings in Deep Residual Networks' | | | PreResNet-50 ( ) |
| 'Densely Connected Convolutional Networks' | | | DenseNet-121 ( ) |
| 'Densely Connected Convolutional Networks' | | | DenseNet-161 ( ) |
| 'Deep Pyramidal Residual Networks' | | | PyramidNet-101 ( ) |
| 'Aggregated Residual Transformations for Deep Neural Networks' | | | ResNeXt-14(32x4d) ( ) |
| 'Aggregated Residual Transformations for Deep Neural Networks' | | | ResNeXt-26(32x4d) ( ) |
| 'Wide Residual Networks' | | | WRN-50-2 ( ) |
| 'Xception: Deep Learning with Depthwise Separable Convolutions' | | | Xception ( ) |
| 'Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' | | | InceptionV4 ( ) |
| 'Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' | | | InceptionResNetV2 ( ) |
| 'PolyNet: A Pursuit of Structural Diversity in Very Deep Networks' | | | PolyNet ( ) |
| 'Darknet: Open source neural networks in C' | 25,894 | over 1 year ago | DarkNet Ref ( ) |
| 'Darknet: Open source neural networks in C' | 25,894 | over 1 year ago | DarkNet Tiny ( ) |
| 'Darknet: Open source neural networks in C' | 25,894 | over 1 year ago | DarkNet 53 ( ) |
| 'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' | | | SqueezeResNet1.1 ( ) |
| 'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' | | | SqueezeNet1.1 ( ) |
| 'Residual Attention Network for Image Classification' | | | ResAttNet-92 ( ) |
| 'CondenseNet: An Efficient DenseNet using Learned Group Convolutions' | | | CondenseNet (G=C=8) ( ) |
| 'Dual Path Networks' | | | DPN-68 ( ) |
| 'ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices' | | | ShuffleNet x1.0 (g=1) ( ) |
| 'DiracNets: Training Very Deep Neural Networks Without Skip-Connections' | | | DiracNetV2-18 ( ) |
| 'DiracNets: Training Very Deep Neural Networks Without Skip-Connections' | | | DiracNetV2-34 ( ) |
| 'Squeeze-and-Excitation Networks' | | | SENet-16 ( ) |
| 'Squeeze-and-Excitation Networks' | | | SENet-154 ( ) |
| 'MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications' | | | MobileNet ( ) |
| 'Learning Transferable Architectures for Scalable Image Recognition' | | | NASNet-A 4@1056 ( ) |
| 'Learning Transferable Architectures for Scalable Image Recognition' | | | NASNet-A 6@4032 ( ) |
| 'Deep Layer Aggregation' | | | DLA-34 ( ) |
| 'Attention Inspiring Receptive-Fields Network for Learning Invariant Representations' | | | AirNet50-1x64d (r=2) ( ) |
| 'BAM: Bottleneck Attention Module' | | | BAM-ResNet-50 ( ) |
| 'CBAM: Convolutional Block Attention Module' | | | CBAM-ResNet-50 ( ) |
| 'SqueezeNext: Hardware-Aware Neural Network Design' | | | 1.0-SqNxt-23v5 ( ) |
| 'SqueezeNext: Hardware-Aware Neural Network Design' | | | 1.5-SqNxt-23v5 ( ) |
| 'SqueezeNext: Hardware-Aware Neural Network Design' | | | 2.0-SqNxt-23v5 ( ) |
| 'ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design' | | | ShuffleNetV2 ( ) |
| 'Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications' | | | 456-MENet-24×1(g=3) ( ) |
| 'FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy' | | | FD-MobileNet ( ) |
| 'MobileNetV2: Inverted Residuals and Linear Bottlenecks' | | | MobileNetV2 ( ) |
| 'IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks' | | | IGCV3 ( ) |
| 'DARTS: Differentiable Architecture Search' | | | DARTS ( ) |
| 'Progressive Neural Architecture Search' | | | PNASNet-5 ( ) |
| 'Regularized Evolution for Image Classifier Architecture Search' | | | AmoebaNet-C ( ) |
| 'MnasNet: Platform-Aware Neural Architecture Search for Mobile' | | | MnasNet ( ) |
| 'Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net' | | | IBN-Net50-a ( ) |
| 'Large Margin Deep Networks for Classification' | | | MarginNet ( ) |
| 'A^2-Nets: Double Attention Networks' | | | A^2 Net ( ) |
| 'FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction' | | | FishNeXt-150 ( ) |
| 'ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness' | | | Shape-ResNet ( ) |
| 'Greedy Layerwise Learning Can Scale to ImageNet' | | | SimCNN(k=3 train) ( ) |
| 'Selective Kernel Networks' | | | SKNet-50 ( ) |
| 'SRM : A Style-based Recalibration Module for Convolutional Neural Networks' | | | SRM-ResNet-50 ( ) |
| 'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' | | | EfficientNet-B0 ( ) |
| 'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' | | | EfficientNet-B7b ( ) |
| 'ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware' | | | ProxylessNAS ( ) |
| 'MixNet: Mixed Depthwise Convolutional Kernels' | | | MixNet-L ( ) |
| 'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' | | | ECA-Net50 ( ) |
| 'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' | | | ECA-Net101 ( ) |
| 'ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks' | | | ACNet-Densenet121 ( ) |
| 'LIP: Local Importance-based Pooling' | | | LIP-ResNet-50 ( ) |
| 'LIP: Local Importance-based Pooling' | | | LIP-ResNet-101 ( ) |
| 'LIP: Local Importance-based Pooling' | | | LIP-DenseNet-BC-121 ( ) |
| 'MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' | | | MuffNet_1.0 ( ) |
| 'MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' | | | MuffNet_1.5 ( ) |
| 'Making Convolutional Networks Shift-Invariant Again' | | | ResNet-34-Bin-5 ( ) |
| 'Making Convolutional Networks Shift-Invariant Again' | | | ResNet-50-Bin-5 ( ) |
| 'Making Convolutional Networks Shift-Invariant Again' | | | MobileNetV2-Bin-5 ( ) |
| 'Fixing the train-test resolution discrepancy' | | | FixRes ResNeXt101 WSL ( ) |
| 'Self-training with Noisy Student improves ImageNet classification' | | | Noisy Student*(L2) ( ) |
| 'TResNet: High Performance GPU-Dedicated Architecture' | | | TResNet-M ( ) |
| 'DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search' | | | DA-NAS-C ( ) |
| 'ResNeSt: Split-Attention Networks' | | | ResNeSt-50 ( ) |
| 'ResNeSt: Split-Attention Networks' | | | ResNeSt-101 ( ) |
| 'Funnel Activation for Visual Recognition' | | | ResNet-50-FReLU ( ) |
| 'Funnel Activation for Visual Recognition' | | | ResNet-101-FReLU ( ) |
| 'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | ResNet-50-MEALv2 ( ) |
| 'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | ResNet-50-MEALv2 + CutMix ( ) |
| 'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | MobileNet V3-Large-MEALv2 ( ) |
| 'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | EfficientNet-B0-MEALv2 ( ) |
| 'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' | | | T2T-ViT-7 ( ) |
| 'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' | | | T2T-ViT-14 ( ) |
| 'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' | | | T2T-ViT-19 ( ) |
| 'High-Performance Large-Scale Image Recognition Without Normalization' | | | NFNet-F0 ( ) |
| 'High-Performance Large-Scale Image Recognition Without Normalization' | | | NFNet-F1 ( ) |
| 'High-Performance Large-Scale Image Recognition Without Normalization' | | | NFNet-F6+SAM ( ) |
| 'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-S ( ) |
| 'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-M ( ) |
| 'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-L ( ) |
| 'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-S (21k) ( ) |
| 'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-M (21k) ( ) |
| 'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-L (21k) ( ) |
Awesome Computer Vision Models / Segmentation models
| 'U-Net: Convolutional Networks for Biomedical Image Segmentation' | | | U-Net ( ) |
| 'Learning Deconvolution Network for Semantic Segmentation' | | | DeconvNet ( ) |
| 'ParseNet: Looking Wider to See Better' | | | ParseNet ( ) |
| 'Efficient piecewise training of deep structured models for semantic segmentation' | | | Piecewise ( ) |
| 'SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation' | | | SegNet ( ) |
| 'Fully Convolutional Networks for Semantic Segmentation' | | | FCN ( ) |
| 'ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation' | | | ENet ( ) |
| 'Multi-Scale Context Aggregation by Dilated Convolutions' | | | DilatedNet ( ) |
| 'PixelNet: Towards a General Pixel-Level Architecture' | | | PixelNet ( ) |
| 'RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation' | | | RefineNet ( ) |
| 'Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation' | | | LRR ( ) |
| 'Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes' | | | FRRN ( ) |
| 'MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving' | | | MultiNet ( ) |
| 'DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs' | | | DeepLab ( ) |
| 'LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation' | | | LinkNet ( ) |
| 'The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation' | | | Tiramisu ( ) |
| 'ICNet for Real-Time Semantic Segmentation on High-Resolution Images' | | | ICNet ( ) |
| 'Efficient ConvNet for Real-time Semantic Segmentation' | | | ERFNet ( ) |
| 'Pyramid Scene Parsing Network' | | | PSPNet ( ) |
| 'Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network' | | | GCN ( ) |
| 'Segmentation-Aware Convolutional Networks Using Local Attention Masks' | | | Segaware ( ) |
| 'Pixel Deconvolutional Networks' | | | PixelDCN ( ) |
| 'Rethinking Atrous Convolution for Semantic Image Segmentation' | | | DeepLabv3 ( ) |
| 'Understanding Convolution for Semantic Segmentation' | | | DUC, HDC ( ) |
| 'ShuffleSeg: Real-time Semantic Segmentation Network' | | | ShuffleSeg ( ) |
| 'Learning to Adapt Structured Output Space for Semantic Segmentation' | | | AdaptSegNet ( ) |
| 'Understanding Convolution for Semantic Segmentation' | | | TuSimple-DUC ( ) |
| 'Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation' | | | R2U-Net ( ) |
| 'Attention U-Net: Learning Where to Look for the Pancreas' | | | Attention U-Net ( ) |
| 'Dual Attention Network for Scene Segmentation' | | | DANet ( ) |
| 'Context Encoding for Semantic Segmentation' | | | ENCNet ( ) |
| 'ShelfNet for Real-time Semantic Segmentation' | | | ShelfNet ( ) |
| 'LadderNet: Multi-path networks based on U-Net for medical image segmentation' | | | LadderNet ( ) |
| 'Concentrated-Comprehensive Convolutions for lightweight semantic segmentation' | | | CCC-ERFnet ( ) |
| 'DifNet: Semantic Segmentation by Diffusion Networks' | | | DifNet-101 ( ) |
| 'BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation' | | | BiSeNet(Res18) ( ) |
| 'ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation' | | | ESPNet ( ) |
| 'Semantic Image Synthesis with Spatially-Adaptive Normalization' | | | SPADE ( ) |
| 'Seamless Scene Segmentation' | | | SeamlessSeg ( ) |
| 'Expectation-Maximization Attention Networks for Semantic Segmentation' | | | EMANet ( ) |
Awesome Computer Vision Models / Detection models
| 'Rich feature hierarchies for accurate object detection and semantic segmentation' | | | R-CNN ( ) |
| 'OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks' | | | OverFeat ( ) |
| 'Scalable Object Detection using Deep Neural Networks' | | | MultiBox ( ) |
| 'Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition' | | | SPP-Net ( ) |
| 'Object detection via a multi-region & semantic segmentation-aware CNN model' | | | MR-CNN ( ) |
| 'AttentionNet: Aggregating Weak Directions for Accurate Object Detection' | | | AttentionNet ( ) |
| 'Fast R-CNN' | | | Fast R-CNN ( ) |
| 'Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks' | | | Faster R-CNN ( ) |
| 'You Only Look Once: Unified, Real-Time Object Detection' | | | YOLO v1 ( ) |
| 'G-CNN: an Iterative Grid Based Object Detector' | | | G-CNN ( ) |
| 'Adaptive Object Detection Using Adjacency and Zoom Prediction' | | | AZNet ( ) |
| 'Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks' | | | ION ( ) |
| 'HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection' | | | HyperNet ( ) |
| 'Training Region-based Object Detectors with Online Hard Example Mining' | | | OHEM ( ) |
| 'A MultiPath Network for Object Detection' | | | MPN ( ) |
| 'SSD: Single Shot MultiBox Detector' | | | SSD ( ) |
| 'Crafting GBD-Net for Object Detection' | | | GBDNet ( ) |
| 'Contextual Priming and Feedback for Faster R-CNN' | | | CPF ( ) |
| 'A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection' | | | MS-CNN ( ) |
| 'R-FCN: Object Detection via Region-based Fully Convolutional Networks' | | | R-FCN ( ) |
| 'PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection' | | | PVANET ( ) |
| 'DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection' | | | DeepID-Net ( ) |
| 'Object Detection Networks on Convolutional Feature Maps' | | | NoC ( ) |
| 'DSSD : Deconvolutional Single Shot Detector' | | | DSSD ( ) |
| 'Beyond Skip Connections: Top-Down Modulation for Object Detection' | | | TDM ( ) |
| 'Feature Pyramid Networks for Object Detection' | | | FPN ( ) |
| 'YOLO9000: Better, Faster, Stronger' | | | YOLO v2 ( ) |
| 'RON: Reverse Connection with Objectness Prior Networks for Object Detection' | | | RON ( ) |
| 'Deformable Convolutional Networks' | | | DCN ( ) |
| 'DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling' | | | DeNet ( ) |
| 'CoupleNet: Coupling Global Structure with Local Parts for Object Detection' | | | CoupleNet ( ) |
| 'Focal Loss for Dense Object Detection' | | | RetinaNet ( ) |
| 'Mask R-CNN' | | | Mask R-CNN ( ) |
| 'DSOD: Learning Deeply Supervised Object Detectors from Scratch' | | | DSOD ( ) |
| 'Spatial Memory for Context Reasoning in Object Detection' | | | SMN ( ) |
| 'YOLOv3: An Incremental Improvement' | | | YOLO v3 ( ) |
| 'Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships' | | | SIN ( ) |
| 'Scale-Transferrable Object Detection' | | | STDN ( ) |
| 'Single-Shot Refinement Neural Network for Object Detection' | | | RefineDet ( ) |
| 'MegDet: A Large Mini-Batch Object Detector' | | | MegDet ( ) |
| 'Receptive Field Block Net for Accurate and Fast Object Detection' | | | RFBNet ( ) |
| 'CornerNet: Detecting Objects as Paired Keypoints' | | | CornerNet ( ) |
| 'Libra R-CNN: Towards Balanced Learning for Object Detection' | | | LibraRetinaNet ( ) |
| 'YOLACT: Real-time Instance Segmentation' | | | YOLACT-700 ( ) |
| 'DetNAS: Backbone Search for Object Detection' | | | DetNASNet(3.8) ( ) |
| 'YOLOv4: Optimal Speed and Accuracy of Object Detection' | | | YOLOv4 ( ) |
| 'SOLO: Segmenting Objects by Locations' | | | SOLO ( ) |
| 'SOLO: Segmenting Objects by Locations' | | | D-SOLO ( ) |
| 'Scale Normalized Image Pyramids with AutoFocus for Object Detection' | | | SNIPER ( ) |
| 'Scale Normalized Image Pyramids with AutoFocus for Object Detection' | | | AutoFocus ( ) |