Awesome Computer Vision Models / Classification models |
'One weird trick for parallelizing convolutional neural networks' | | | AlexNet ( ) |
'Very Deep Convolutional Networks for Large-Scale Image Recognition' | | | VGG-16 ( ) |
'Deep Residual Learning for Image Recognition' | | | ResNet-10 ( ) |
'Deep Residual Learning for Image Recognition' | | | ResNet-18 ( ) |
'Deep Residual Learning for Image Recognition' | | | ResNet-34 ( ) |
'Deep Residual Learning for Image Recognition' | | | ResNet-50 ( ) |
'Rethinking the Inception Architecture for Computer Vision' | | | InceptionV3 ( ) |
'Identity Mappings in Deep Residual Networks' | | | PreResNet-18 ( ) |
'Identity Mappings in Deep Residual Networks' | | | PreResNet-34 ( ) |
'Identity Mappings in Deep Residual Networks' | | | PreResNet-50 ( ) |
'Densely Connected Convolutional Networks' | | | DenseNet-121 ( ) |
'Densely Connected Convolutional Networks' | | | DenseNet-161 ( ) |
'Deep Pyramidal Residual Networks' | | | PyramidNet-101 ( ) |
'Aggregated Residual Transformations for Deep Neural Networks' | | | ResNeXt-14(32x4d) ( ) |
'Aggregated Residual Transformations for Deep Neural Networks' | | | ResNeXt-26(32x4d) ( ) |
'Wide Residual Networks' | | | WRN-50-2 ( ) |
'Xception: Deep Learning with Depthwise Separable Convolutions' | | | Xception ( ) |
'Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' | | | InceptionV4 ( ) |
'Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' | | | InceptionResNetV2 ( ) |
'PolyNet: A Pursuit of Structural Diversity in Very Deep Networks' | | | PolyNet ( ) |
'Darknet: Open source neural networks in C' | 25,894 | 9 months ago | DarkNet Ref ( ) |
'Darknet: Open source neural networks in C' | 25,894 | 9 months ago | DarkNet Tiny ( ) |
'Darknet: Open source neural networks in C' | 25,894 | 9 months ago | DarkNet 53 ( ) |
'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' | | | SqueezeResNet1.1 ( ) |
'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' | | | SqueezeNet1.1 ( ) |
'Residual Attention Network for Image Classification' | | | ResAttNet-92 ( ) |
'CondenseNet: An Efficient DenseNet using Learned Group Convolutions' | | | CondenseNet (G=C=8) ( ) |
'Dual Path Networks' | | | DPN-68 ( ) |
'ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices' | | | ShuffleNet x1.0 (g=1) ( ) |
'DiracNets: Training Very Deep Neural Networks Without Skip-Connections' | | | DiracNetV2-18 ( ) |
'DiracNets: Training Very Deep Neural Networks Without Skip-Connections' | | | DiracNetV2-34 ( ) |
'Squeeze-and-Excitation Networks' | | | SENet-16 ( ) |
'Squeeze-and-Excitation Networks' | | | SENet-154 ( ) |
'MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications' | | | MobileNet ( ) |
'Learning Transferable Architectures for Scalable Image Recognition' | | | NASNet-A 4@1056 ( ) |
'Learning Transferable Architectures for Scalable Image Recognition' | | | NASNet-A 6@4032 ( ) |
'Deep Layer Aggregation' | | | DLA-34 ( ) |
'Attention Inspiring Receptive-Fields Network for Learning Invariant Representations' | | | AirNet50-1x64d (r=2) ( ) |
'BAM: Bottleneck Attention Module' | | | BAM-ResNet-50 ( ) |
'CBAM: Convolutional Block Attention Module' | | | CBAM-ResNet-50 ( ) |
'SqueezeNext: Hardware-Aware Neural Network Design' | | | 1.0-SqNxt-23v5 ( ) |
'SqueezeNext: Hardware-Aware Neural Network Design' | | | 1.5-SqNxt-23v5 ( ) |
'SqueezeNext: Hardware-Aware Neural Network Design' | | | 2.0-SqNxt-23v5 ( ) |
'ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design' | | | ShuffleNetV2 ( ) |
'Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications' | | | 456-MENet-24×1(g=3) ( ) |
'FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy' | | | FD-MobileNet ( ) |
'MobileNetV2: Inverted Residuals and Linear Bottlenecks' | | | MobileNetV2 ( ) |
'IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks' | | | IGCV3 ( ) |
'DARTS: Differentiable Architecture Search' | | | DARTS ( ) |
'Progressive Neural Architecture Search' | | | PNASNet-5 ( ) |
'Regularized Evolution for Image Classifier Architecture Search' | | | AmoebaNet-C ( ) |
'MnasNet: Platform-Aware Neural Architecture Search for Mobile' | | | MnasNet ( ) |
'Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net' | | | IBN-Net50-a ( ) |
'Large Margin Deep Networks for Classification' | | | MarginNet ( ) |
'A^2-Nets: Double Attention Networks' | | | A^2 Net ( ) |
'FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction' | | | FishNeXt-150 ( ) |
'ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness' | | | Shape-ResNet ( ) |
'Greedy Layerwise Learning Can Scale to ImageNet' | | | SimCNN(k=3 train) ( ) |
'Selective Kernel Networks' | | | SKNet-50 ( ) |
'SRM: A Style-based Recalibration Module for Convolutional Neural Networks' | | | SRM-ResNet-50 ( ) |
'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' | | | EfficientNet-B0 ( ) |
'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' | | | EfficientNet-B7b ( ) |
'ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware' | | | ProxylessNAS ( ) |
'MixNet: Mixed Depthwise Convolutional Kernels' | | | MixNet-L ( ) |
'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' | | | ECA-Net50 ( ) |
'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' | | | ECA-Net101 ( ) |
'ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks' | | | ACNet-Densenet121 ( ) |
'LIP: Local Importance-based Pooling' | | | LIP-ResNet-50 ( ) |
'LIP: Local Importance-based Pooling' | | | LIP-ResNet-101 ( ) |
'LIP: Local Importance-based Pooling' | | | LIP-DenseNet-BC-121 ( ) |
'MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' | | | MuffNet_1.0 ( ) |
'MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' | | | MuffNet_1.5 ( ) |
'Making Convolutional Networks Shift-Invariant Again' | | | ResNet-34-Bin-5 ( ) |
'Making Convolutional Networks Shift-Invariant Again' | | | ResNet-50-Bin-5 ( ) |
'Making Convolutional Networks Shift-Invariant Again' | | | MobileNetV2-Bin-5 ( ) |
'Fixing the train-test resolution discrepancy' | | | FixRes ResNeXt101 WSL ( ) |
'Self-training with Noisy Student improves ImageNet classification' | | | Noisy Student*(L2) ( ) |
'TResNet: High Performance GPU-Dedicated Architecture' | | | TResNet-M ( ) |
'DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search' | | | DA-NAS-C ( ) |
'ResNeSt: Split-Attention Networks' | | | ResNeSt-50 ( ) |
'ResNeSt: Split-Attention Networks' | | | ResNeSt-101 ( ) |
'Funnel Activation for Visual Recognition' | | | ResNet-50-FReLU ( ) |
'Funnel Activation for Visual Recognition' | | | ResNet-101-FReLU ( ) |
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | ResNet-50-MEALv2 ( ) |
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | ResNet-50-MEALv2 + CutMix ( ) |
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | MobileNet V3-Large-MEALv2 ( ) |
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' | | | EfficientNet-B0-MEALv2 ( ) |
'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' | | | T2T-ViT-7 ( ) |
'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' | | | T2T-ViT-14 ( ) |
'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' | | | T2T-ViT-19 ( ) |
'High-Performance Large-Scale Image Recognition Without Normalization' | | | NFNet-F0 ( ) |
'High-Performance Large-Scale Image Recognition Without Normalization' | | | NFNet-F1 ( ) |
'High-Performance Large-Scale Image Recognition Without Normalization' | | | NFNet-F6+SAM ( ) |
'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-S ( ) |
'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-M ( ) |
'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-L ( ) |
'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-S (21k) ( ) |
'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-M (21k) ( ) |
'EfficientNetV2: Smaller Models and Faster Training' | | | EfficientNetV2-L (21k) ( ) |
Awesome Computer Vision Models / Segmentation models |
'U-Net: Convolutional Networks for Biomedical Image Segmentation' | | | U-Net ( ) |
'Learning Deconvolution Network for Semantic Segmentation' | | | DeconvNet ( ) |
'ParseNet: Looking Wider to See Better' | | | ParseNet ( ) |
'Efficient piecewise training of deep structured models for semantic segmentation' | | | Piecewise ( ) |
'SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation' | | | SegNet ( ) |
'Fully Convolutional Networks for Semantic Segmentation' | | | FCN ( ) |
'ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation' | | | ENet ( ) |
'Multi-Scale Context Aggregation by Dilated Convolutions' | | | DilatedNet ( ) |
'PixelNet: Towards a General Pixel-Level Architecture' | | | PixelNet ( ) |
'RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation' | | | RefineNet ( ) |
'Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation' | | | LRR ( ) |
'Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes' | | | FRRN ( ) |
'MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving' | | | MultiNet ( ) |
'DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs' | | | DeepLab ( ) |
'LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation' | | | LinkNet ( ) |
'The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation' | | | Tiramisu ( ) |
'ICNet for Real-Time Semantic Segmentation on High-Resolution Images' | | | ICNet ( ) |
'Efficient ConvNet for Real-time Semantic Segmentation' | | | ERFNet ( ) |
'Pyramid Scene Parsing Network' | | | PSPNet ( ) |
'Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network' | | | GCN ( ) |
'Segmentation-Aware Convolutional Networks Using Local Attention Masks' | | | Segaware ( ) |
'Pixel Deconvolutional Networks' | | | PixelDCN ( ) |
'Rethinking Atrous Convolution for Semantic Image Segmentation' | | | DeepLabv3 ( ) |
'Understanding Convolution for Semantic Segmentation' | | | DUC, HDC ( ) |
'ShuffleSeg: Real-time Semantic Segmentation Network' | | | ShuffleSeg ( ) |
'Learning to Adapt Structured Output Space for Semantic Segmentation' | | | AdaptSegNet ( ) |
'Understanding Convolution for Semantic Segmentation' | | | TuSimple-DUC ( ) |
'Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation' | | | R2U-Net ( ) |
'Attention U-Net: Learning Where to Look for the Pancreas' | | | Attention U-Net ( ) |
'Dual Attention Network for Scene Segmentation' | | | DANet ( ) |
'Context Encoding for Semantic Segmentation' | | | ENCNet ( ) |
'ShelfNet for Real-time Semantic Segmentation' | | | ShelfNet ( ) |
'LadderNet: Multi-Path Networks Based on U-Net for Medical Image Segmentation' | | | LadderNet ( ) |
'Concentrated-Comprehensive Convolutions for lightweight semantic segmentation' | | | CCC-ERFnet ( ) |
'DifNet: Semantic Segmentation by Diffusion Networks' | | | DifNet-101 ( ) |
'BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation' | | | BiSeNet(Res18) ( ) |
'ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation' | | | ESPNet ( ) |
'Semantic Image Synthesis with Spatially-Adaptive Normalization' | | | SPADE ( ) |
'Seamless Scene Segmentation' | | | SeamlessSeg ( ) |
'Expectation-Maximization Attention Networks for Semantic Segmentation' | | | EMANet ( ) |
Awesome Computer Vision Models / Detection models |
'Rich feature hierarchies for accurate object detection and semantic segmentation' | | | R-CNN ( ) |
'OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks' | | | OverFeat ( ) |
'Scalable Object Detection using Deep Neural Networks' | | | MultiBox ( ) |
'Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition' | | | SPP-Net ( ) |
'Object detection via a multi-region & semantic segmentation-aware CNN model' | | | MR-CNN ( ) |
'AttentionNet: Aggregating Weak Directions for Accurate Object Detection' | | | AttentionNet ( ) |
'Fast R-CNN' | | | Fast R-CNN ( ) |
'Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks' | | | Faster R-CNN ( ) |
'You Only Look Once: Unified, Real-Time Object Detection' | | | YOLO v1 ( ) |
'G-CNN: an Iterative Grid Based Object Detector' | | | G-CNN ( ) |
'Adaptive Object Detection Using Adjacency and Zoom Prediction' | | | AZNet ( ) |
'Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks' | | | ION ( ) |
'HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection' | | | HyperNet ( ) |
'Training Region-based Object Detectors with Online Hard Example Mining' | | | OHEM ( ) |
'A MultiPath Network for Object Detection' | | | MPN ( ) |
'SSD: Single Shot MultiBox Detector' | | | SSD ( ) |
'Crafting GBD-Net for Object Detection' | | | GBDNet ( ) |
'Contextual Priming and Feedback for Faster R-CNN' | | | CPF ( ) |
'A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection' | | | MS-CNN ( ) |
'R-FCN: Object Detection via Region-based Fully Convolutional Networks' | | | R-FCN ( ) |
'PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection' | | | PVANET ( ) |
'DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection' | | | DeepID-Net ( ) |
'Object Detection Networks on Convolutional Feature Maps' | | | NoC ( ) |
'DSSD: Deconvolutional Single Shot Detector' | | | DSSD ( ) |
'Beyond Skip Connections: Top-Down Modulation for Object Detection' | | | TDM ( ) |
'Feature Pyramid Networks for Object Detection' | | | FPN ( ) |
'YOLO9000: Better, Faster, Stronger' | | | YOLO v2 ( ) |
'RON: Reverse Connection with Objectness Prior Networks for Object Detection' | | | RON ( ) |
'Deformable Convolutional Networks' | | | DCN ( ) |
'DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling' | | | DeNet ( ) |
'CoupleNet: Coupling Global Structure with Local Parts for Object Detection' | | | CoupleNet ( ) |
'Focal Loss for Dense Object Detection' | | | RetinaNet ( ) |
'Mask R-CNN' | | | Mask R-CNN ( ) |
'DSOD: Learning Deeply Supervised Object Detectors from Scratch' | | | DSOD ( ) |
'Spatial Memory for Context Reasoning in Object Detection' | | | SMN ( ) |
'YOLOv3: An Incremental Improvement' | | | YOLO v3 ( ) |
'Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships' | | | SIN ( ) |
'Scale-Transferrable Object Detection' | | | STDN ( ) |
'Single-Shot Refinement Neural Network for Object Detection' | | | RefineDet ( ) |
'MegDet: A Large Mini-Batch Object Detector' | | | MegDet ( ) |
'Receptive Field Block Net for Accurate and Fast Object Detection' | | | RFBNet ( ) |
'CornerNet: Detecting Objects as Paired Keypoints' | | | CornerNet ( ) |
'Libra R-CNN: Towards Balanced Learning for Object Detection' | | | LibraRetinaNet ( ) |
'YOLACT: Real-time Instance Segmentation' | | | YOLACT-700 ( ) |
'DetNAS: Backbone Search for Object Detection' | | | DetNASNet(3.8) ( ) |
'YOLOv4: Optimal Speed and Accuracy of Object Detection' | | | YOLOv4 ( ) |
'SOLO: Segmenting Objects by Locations' | | | SOLO ( ) |
'SOLO: Segmenting Objects by Locations' | | | D-SOLO ( ) |
'Scale Normalized Image Pyramids with AutoFocus for Object Detection' | | | SNIPER ( ) |
'Scale Normalized Image Pyramids with AutoFocus for Object Detection' | | | AutoFocus ( ) |