Summary: Bag of Tricks for Image Classification with Convolutional Neural Networks
Authors: Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li, arXiv:1812.01187v2, December 2018
Introduction

Since the introduction of AlexNet in 2012, deep convolutional neural networks have become the dominant approach for image classification. For example, the top-1 validation accuracy on ImageNet has been raised from 62.5% (AlexNet) to 82.7% (NASNet-A). Training procedure refinements, including changes in loss functions, data preprocessing, and optimization methods, also played a major role in this progress. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code.

We examine a collection of training procedure and model architecture refinements that improve model accuracy but barely change computational complexity. Many of them are minor “tricks”, like modifying the stride size of a particular convolution layer or adjusting the learning rate schedule. Our empirical evaluation shows that several tricks lead to significant accuracy improvements and that combining them boosts model accuracy further. ResNet, trained with our “tricks”, is able to outperform newer and improved architectures trained with a standard pipeline. We further show that models trained with our tricks bring better transfer learning performance in other application domains such as object detection and semantic segmentation.

We first set up a baseline training procedure in Section 2 and then discuss several tricks. The template for training a neural network with mini-batch stochastic gradient descent is shown in Algorithm 1. In each iteration, we randomly sample b images to compute the gradients and then update the network parameters. All functions and hyper-parameters in Algorithm 1 can be implemented in many different ways.
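Algorithm 1 itself is not reproduced in this summary, so the following is only a minimal sketch of the mini-batch SGD template described above: sample b examples, average their gradients, update the parameters. The toy linear-regression task, the function names (`sgd_train`, `squared_error_grad`), and all hyper-parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import random

def sgd_train(data, init_params, grad_fn, lr=0.1, batch_size=4, epochs=100, seed=0):
    """Generic mini-batch SGD loop (illustrative, not the paper's Algorithm 1)."""
    rng = random.Random(seed)
    params = list(init_params)
    for _ in range(epochs):
        for _ in range(len(data) // batch_size):
            # Randomly sample b examples (without replacement within the batch).
            batch = rng.sample(data, batch_size)
            # Average the per-example gradients over the batch.
            grads = [0.0] * len(params)
            for example in batch:
                for i, g in enumerate(grad_fn(params, example)):
                    grads[i] += g / batch_size
            # Gradient descent step on every parameter.
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

def squared_error_grad(params, example):
    """Gradient of (w*x + b - y)^2 with respect to (w, b)."""
    w, b = params
    x, y = example
    err = (w * x + b) - y
    return [2.0 * err * x, 2.0 * err]

# Hypothetical noise-free toy data drawn from y = 3x + 1.
toy_data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_train(toy_data, [0.0, 0.0], squared_error_grad)
```

The gradient function, learning rate, and batch size are all passed in as arguments, which mirrors the point that each piece of the template can be implemented in many different ways.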
For the same number of epochs, training with a large batch size results in a model with degraded validation accuracy compared to models trained with smaller batch sizes.

One can vary the number of residual blocks in each stage to obtain different ResNet models, such as ResNet-50 and ResNet-152, where the number represents the number of convolutional layers in the network.

During training, we add a distillation loss to penalize the difference between the softmax outputs of the teacher model and the learner model.

We can observe that a base model with a higher validation accuracy consistently leads to a higher mAP for Faster-RCNN.

Semantic Segmentation

Semantic segmentation predicts the category of every pixel in the input images. A potential explanation for the phenomenon is that semantic segmentation makes its predictions at the pixel level.

In addition, these improved pre-trained models show strong advantages in transfer learning, improving both object detection and semantic segmentation.
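The distillation loss mentioned above penalizes the difference between the teacher's and the learner's softmax outputs. This summary does not give its exact form, so the sketch below assumes a common formulation: cross-entropy against the true label plus a temperature-softened cross-entropy against the teacher, weighted by T². The temperature value, the T² weighting, and the function names are assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

def distillation_loss(student_logits, teacher_logits, true_label, temperature=2.0):
    """Hard-label loss plus T^2-weighted soft-label loss (assumed form)."""
    one_hot = [1.0 if k == true_label else 0.0 for k in range(len(student_logits))]
    # Standard classification loss against the ground-truth label.
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    # Penalize disagreement between softened teacher and student outputs.
    soft_loss = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return hard_loss + temperature ** 2 * soft_loss

# Hypothetical logits for a 3-class problem with true label 0.
teacher = [2.0, 0.5, -1.0]
loss_agree = distillation_loss([2.0, 0.5, -1.0], teacher, true_label=0)
loss_disagree = distillation_loss([-1.0, 0.5, 2.0], teacher, true_label=0)
```

A student whose logits match the teacher's receives a smaller loss than one whose ranking is reversed, which is the behavior the penalty is meant to encourage.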