awesome-computer-vision-models

Computer Vision Models

A curated list of popular computer vision models with their performance metrics

A list of popular deep learning models related to classification, segmentation and detection problems

GitHub: 515 stars, 26 watching, 94 forks, last commit about 5 years ago, linked from 4 awesome lists

awesome, awesome-list, awesome-lists, computer-vision, computer-vision-algorithms, deep-learning, densenet, diracnet, efficientnet, image-classification, machine-learning, machine-learning-algorithms, machine-learning-models, mixnet, nasnet, object-detection, proxylessnas, resnet, semantic-segmentation, sknet

Awesome Computer Vision Models / Classification models
'One weird trick for parallelizing convolutional neural networks'			AlexNet ( )
'Very Deep Convolutional Networks for Large-Scale Image Recognition'			VGG-16 ( )
'Deep Residual Learning for Image Recognition'			ResNet-10 ( )
'Deep Residual Learning for Image Recognition'			ResNet-18 ( )
'Deep Residual Learning for Image Recognition'			ResNet-34 ( )
'Deep Residual Learning for Image Recognition'			ResNet-50 ( )
'Rethinking the Inception Architecture for Computer Vision'			InceptionV3 ( )
'Identity Mappings in Deep Residual Networks'			PreResNet-18 ( )
'Identity Mappings in Deep Residual Networks'			PreResNet-34 ( )
'Identity Mappings in Deep Residual Networks'			PreResNet-50 ( )
'Densely Connected Convolutional Networks'			DenseNet-121 ( )
'Densely Connected Convolutional Networks'			DenseNet-161 ( )
'Deep Pyramidal Residual Networks'			PyramidNet-101 ( )
'Aggregated Residual Transformations for Deep Neural Networks'			ResNeXt-14(32x4d) ( )
'Aggregated Residual Transformations for Deep Neural Networks'			ResNeXt-26(32x4d) ( )
'Wide Residual Networks'			WRN-50-2 ( )
'Xception: Deep Learning with Depthwise Separable Convolutions'			Xception ( )
'Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning'			InceptionV4 ( )
'Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning'			InceptionResNetV2 ( )
'PolyNet: A Pursuit of Structural Diversity in Very Deep Networks'			PolyNet ( )
'Darknet: Open source neural networks in C'	25,894	about 2 years ago	DarkNet Ref ( )
'Darknet: Open source neural networks in C'	25,894	about 2 years ago	DarkNet Tiny ( )
'Darknet: Open source neural networks in C'	25,894	about 2 years ago	DarkNet 53 ( )
'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size'			SqueezeResNet1.1 ( )
'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size'			SqueezeNet1.1 ( )
'Residual Attention Network for Image Classification'			ResAttNet-92 ( )
'CondenseNet: An Efficient DenseNet using Learned Group Convolutions'			CondenseNet (G=C=8) ( )
'Dual Path Networks'			DPN-68 ( )
'ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices'			ShuffleNet x1.0 (g=1) ( )
'DiracNets: Training Very Deep Neural Networks Without Skip-Connections'			DiracNetV2-18 ( )
'DiracNets: Training Very Deep Neural Networks Without Skip-Connections'			DiracNetV2-34 ( )
'Squeeze-and-Excitation Networks'			SENet-16 ( )
'Squeeze-and-Excitation Networks'			SENet-154 ( )
'MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications'			MobileNet ( )
'Learning Transferable Architectures for Scalable Image Recognition'			NASNet-A 4@1056 ( )
'Learning Transferable Architectures for Scalable Image Recognition'			NASNet-A 6@4032 ( )
'Deep Layer Aggregation'			DLA-34 ( )
'Attention Inspiring Receptive-Fields Network for Learning Invariant Representations'			AirNet50-1x64d (r=2) ( )
'BAM: Bottleneck Attention Module'			BAM-ResNet-50 ( )
'CBAM: Convolutional Block Attention Module'			CBAM-ResNet-50 ( )
'SqueezeNext: Hardware-Aware Neural Network Design'			1.0-SqNxt-23v5 ( )
'SqueezeNext: Hardware-Aware Neural Network Design'			1.5-SqNxt-23v5 ( )
'SqueezeNext: Hardware-Aware Neural Network Design'			2.0-SqNxt-23v5 ( )
'ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design'			ShuffleNetV2 ( )
'Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications'			456-MENet-24×1(g=3) ( )
'FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy'			FD-MobileNet ( )
'MobileNetV2: Inverted Residuals and Linear Bottlenecks'			MobileNetV2 ( )
'IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks'			IGCV3 ( )
'DARTS: Differentiable Architecture Search'			DARTS ( )
'Progressive Neural Architecture Search'			PNASNet-5 ( )
'Regularized Evolution for Image Classifier Architecture Search'			AmoebaNet-C ( )
'MnasNet: Platform-Aware Neural Architecture Search for Mobile'			MnasNet ( )
'Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net'			IBN-Net50-a ( )
'Large Margin Deep Networks for Classification'			MarginNet ( )
'A^2-Nets: Double Attention Networks'			A^2 Net ( )
'FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction'			FishNeXt-150 ( )
'ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness'			Shape-ResNet ( )
'Greedy Layerwise Learning Can Scale to ImageNet'			SimCNN(k=3 train) ( )
'Selective Kernel Networks'			SKNet-50 ( )
'SRM : A Style-based Recalibration Module for Convolutional Neural Networks'			SRM-ResNet-50 ( )
'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'			EfficientNet-B0 ( )
'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'			EfficientNet-B7b ( )
'ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware'			ProxylessNAS ( )
'MixNet: Mixed Depthwise Convolutional Kernels'			MixNet-L ( )
'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'			ECA-Net50 ( )
'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'			ECA-Net101 ( )
'ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks'			ACNet-Densenet121 ( )
'LIP: Local Importance-based Pooling'			LIP-ResNet-50 ( )
'LIP: Local Importance-based Pooling'			LIP-ResNet-101 ( )
'LIP: Local Importance-based Pooling'			LIP-DenseNet-BC-121 ( )
'MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning'			MuffNet_1.0 ( )
'MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning'			MuffNet_1.5 ( )
'Making Convolutional Networks Shift-Invariant Again'			ResNet-34-Bin-5 ( )
'Making Convolutional Networks Shift-Invariant Again'			ResNet-50-Bin-5 ( )
'Making Convolutional Networks Shift-Invariant Again'			MobileNetV2-Bin-5 ( )
'Fixing the train-test resolution discrepancy'			FixRes ResNeXt101 WSL ( )
'Self-training with Noisy Student improves ImageNet classification'			Noisy Student*(L2) ( )
'TResNet: High Performance GPU-Dedicated Architecture'			TResNet-M ( )
'DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search'			DA-NAS-C ( )
'ResNeSt: Split-Attention Networks'			ResNeSt-50 ( )
'ResNeSt: Split-Attention Networks'			ResNeSt-101 ( )
'Funnel Activation for Visual Recognition'			ResNet-50-FReLU ( )
'Funnel Activation for Visual Recognition'			ResNet-101-FReLU ( )
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'			ResNet-50-MEALv2 ( )
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'			ResNet-50-MEALv2 + CutMix ( )
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'			MobileNet V3-Large-MEALv2 ( )
'MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'			EfficientNet-B0-MEALv2 ( )
'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'			T2T-ViT-7 ( )
'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'			T2T-ViT-14 ( )
'Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'			T2T-ViT-19 ( )
'High-Performance Large-Scale Image Recognition Without Normalization'			NFNet-F0 ( )
'High-Performance Large-Scale Image Recognition Without Normalization'			NFNet-F1 ( )
'High-Performance Large-Scale Image Recognition Without Normalization'			NFNet-F6+SAM ( )
'EfficientNetV2: Smaller Models and Faster Training'			EfficientNetV2-S ( )
'EfficientNetV2: Smaller Models and Faster Training'			EfficientNetV2-M ( )
'EfficientNetV2: Smaller Models and Faster Training'			EfficientNetV2-L ( )
'EfficientNetV2: Smaller Models and Faster Training'			EfficientNetV2-S (21k) ( )
'EfficientNetV2: Smaller Models and Faster Training'			EfficientNetV2-M (21k) ( )
'EfficientNetV2: Smaller Models and Faster Training'			EfficientNetV2-L (21k) ( )
Awesome Computer Vision Models / Segmentation models
'U-Net: Convolutional Networks for Biomedical Image Segmentation'			U-Net ( )
'Learning Deconvolution Network for Semantic Segmentation'			DeconvNet ( )
'ParseNet: Looking Wider to See Better'			ParseNet ( )
'Efficient piecewise training of deep structured models for semantic segmentation'			Piecewise ( )
'SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation'			SegNet ( )
'Fully Convolutional Networks for Semantic Segmentation'			FCN ( )
'ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation'			ENet ( )
'Multi-Scale Context Aggregation by Dilated Convolutions'			DilatedNet ( )
'PixelNet: Towards a General Pixel-Level Architecture'			PixelNet ( )
'RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation'			RefineNet ( )
'Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation'			LRR ( )
'Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes'			FRRN ( )
'MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving'			MultiNet ( )
'DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs'			DeepLab ( )
'LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation'			LinkNet ( )
'The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation'			Tiramisu ( )
'ICNet for Real-Time Semantic Segmentation on High-Resolution Images'			ICNet ( )
'Efficient ConvNet for Real-time Semantic Segmentation'			ERFNet ( )
'Pyramid Scene Parsing Network'			PSPNet ( )
'Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network'			GCN ( )
'Segmentation-Aware Convolutional Networks Using Local Attention Masks'			Segaware ( )
'Pixel Deconvolutional Networks'			PixelDCN ( )
'Rethinking Atrous Convolution for Semantic Image Segmentation'			DeepLabv3 ( )
'Understanding Convolution for Semantic Segmentation'			DUC, HDC ( )
'ShuffleSeg: Real-time Semantic Segmentation Network'			ShuffleSeg ( )
'Learning to Adapt Structured Output Space for Semantic Segmentation'			AdaptSegNet ( )
'Understanding Convolution for Semantic Segmentation'			TuSimple-DUC ( )
'Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation'			R2U-Net ( )
'Attention U-Net: Learning Where to Look for the Pancreas'			Attention U-Net ( )
'Dual Attention Network for Scene Segmentation'			DANet ( )
'Context Encoding for Semantic Segmentation'			ENCNet ( )
'ShelfNet for Real-time Semantic Segmentation'			ShelfNet ( )
'LadderNet: Multi-path Networks Based on U-Net for Medical Image Segmentation'			LadderNet ( )
'Concentrated-Comprehensive Convolutions for lightweight semantic segmentation'			CCC-ERFnet ( )
'DifNet: Semantic Segmentation by Diffusion Networks'			DifNet-101 ( )
'BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation'			BiSeNet(Res18) ( )
'ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation'			ESPNet ( )
'Semantic Image Synthesis with Spatially-Adaptive Normalization'			SPADE ( )
'Seamless Scene Segmentation'			SeamlessSeg ( )
'Expectation-Maximization Attention Networks for Semantic Segmentation'			EMANet ( )
Awesome Computer Vision Models / Detection models
'Rich feature hierarchies for accurate object detection and semantic segmentation'			R-CNN ( )
'OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks'			OverFeat ( )
'Scalable Object Detection using Deep Neural Networks'			MultiBox ( )
'Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition'			SPP-Net ( )
'Object detection via a multi-region & semantic segmentation-aware CNN model'			MR-CNN ( )
'AttentionNet: Aggregating Weak Directions for Accurate Object Detection'			AttentionNet ( )
'Fast R-CNN'			Fast R-CNN ( )
'Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks'			Faster R-CNN ( )
'You Only Look Once: Unified, Real-Time Object Detection'			YOLO v1 ( )
'G-CNN: an Iterative Grid Based Object Detector'			G-CNN ( )
'Adaptive Object Detection Using Adjacency and Zoom Prediction'			AZNet ( )
'Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks'			ION ( )
'HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection'			HyperNet ( )
'Training Region-based Object Detectors with Online Hard Example Mining'			OHEM ( )
'A MultiPath Network for Object Detection'			MPN ( )
'SSD: Single Shot MultiBox Detector'			SSD ( )
'Crafting GBD-Net for Object Detection'			GBDNet ( )
'Contextual Priming and Feedback for Faster R-CNN'			CPF ( )
'A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection'			MS-CNN ( )
'R-FCN: Object Detection via Region-based Fully Convolutional Networks'			R-FCN ( )
'PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection'			PVANET ( )
'DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection'			DeepID-Net ( )
'Object Detection Networks on Convolutional Feature Maps'			NoC ( )
'DSSD : Deconvolutional Single Shot Detector'			DSSD ( )
'Beyond Skip Connections: Top-Down Modulation for Object Detection'			TDM ( )
'Feature Pyramid Networks for Object Detection'			FPN ( )
'YOLO9000: Better, Faster, Stronger'			YOLO v2 ( )
'RON: Reverse Connection with Objectness Prior Networks for Object Detection'			RON ( )
'Deformable Convolutional Networks'			DCN ( )
'DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling'			DeNet ( )
'CoupleNet: Coupling Global Structure with Local Parts for Object Detection'			CoupleNet ( )
'Focal Loss for Dense Object Detection'			RetinaNet ( )
'Mask R-CNN'			Mask R-CNN ( )
'DSOD: Learning Deeply Supervised Object Detectors from Scratch'			DSOD ( )
'Spatial Memory for Context Reasoning in Object Detection'			SMN ( )
'YOLOv3: An Incremental Improvement'			YOLO v3 ( )
'Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships'			SIN ( )
'Scale-Transferrable Object Detection'			STDN ( )
'Single-Shot Refinement Neural Network for Object Detection'			RefineDet ( )
'MegDet: A Large Mini-Batch Object Detector'			MegDet ( )
'Receptive Field Block Net for Accurate and Fast Object Detection'			RFBNet ( )
'CornerNet: Detecting Objects as Paired Keypoints'			CornerNet ( )
'Libra R-CNN: Towards Balanced Learning for Object Detection'			LibraRetinaNet ( )
'YOLACT: Real-time Instance Segmentation'			YOLACT-700 ( )
'DetNAS: Backbone Search for Object Detection'			DetNASNet(3.8) ( )
'YOLOv4: Optimal Speed and Accuracy of Object Detection'			YOLOv4 ( )
'SOLO: Segmenting Objects by Locations'			SOLO ( )
'SOLO: Segmenting Objects by Locations'			D-SOLO ( )
'Scale Normalized Image Pyramids with AutoFocus for Object Detection'			SNIPER ( )
'Scale Normalized Image Pyramids with AutoFocus for Object Detection'			AutoFocus ( )
