Mixup of Feature Maps in a Hidden Layer for Training of Convolutional Neural Network
The deep Convolutional Neural Network (CNN) has become very popular as a fundamental technique for image classification and object recognition. To improve recognition accuracy on more complex tasks, deeper networks have been introduced. However, the recognition accuracy of a trained deep CNN decreases drastically for samples that lie outside the regions covered by the training samples. To improve the generalization ability for such samples, Krizhevsky et al. proposed generating additional samples by applying transformations to the existing samples, making the training set richer. This method is known as data augmentation. Zhang et al. introduced a data augmentation method called mixup, which achieves state-of-the-art performance on various datasets. Mixup generates new samples by mixing two different training samples; the mixing of the two images is implemented with simple image morphing. In this paper, we propose to apply mixup to the feature maps in a hidden layer. To implement mixup in the hidden layer, we use the Siamese network or the triplet network architecture to mix the feature maps. The experimental comparison shows that mixup of the feature maps obtained from the first convolution layer is more effective than the original image mixup.
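The difference between image mixup and feature-map mixup can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `mixup` and `conv_layer`, the single 3x3 kernel standing in for the first convolution layer, the toy 8x8 inputs, and the choice of drawing the mixing coefficient from a Beta(1, 1) distribution (as in the original mixup formulation) are all assumptions made for illustration. The two branches share the same weights, which is the Siamese-style arrangement the abstract refers to.

```python
import numpy as np

def mixup(a, b, y_a, y_b, lam):
    """Convex combination of two arrays (images or feature maps) and their one-hot labels."""
    mixed = lam * a + (1.0 - lam) * b
    mixed_y = lam * y_a + (1.0 - lam) * y_b
    return mixed, mixed_y

def conv_layer(x, kernel):
    """Hypothetical stand-in for a first convolution layer: valid 2-D correlation plus ReLU."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x1, x2 = rng.random((8, 8)), rng.random((8, 8))   # two toy "images"
y1, y2 = np.eye(3)[0], np.eye(3)[2]               # one-hot labels for 3 classes
kernel = rng.standard_normal((3, 3))              # shared weights (assumed)
lam = rng.beta(1.0, 1.0)                          # mixing coefficient (assumed Beta(1, 1))

# Image mixup: mix the inputs first, then apply the layer.
x_mix, y_mix = mixup(x1, x2, y1, y2, lam)
feat_from_image_mix = conv_layer(x_mix, kernel)

# Feature-map mixup: pass both inputs through the same layer, then mix the outputs.
f1, f2 = conv_layer(x1, kernel), conv_layer(x2, kernel)
f_mix, y_mix_hidden = mixup(f1, f2, y1, y2, lam)
```

Because of the ReLU nonlinearity, `feat_from_image_mix` and `f_mix` generally differ even though the mixed labels are identical, so the two schemes present different training signals to the layers that follow.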
This work was partly supported by JSPS KAKENHI Grant Number 16K00239.
- 1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
- 2. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv:1312.6199 (2014)
- 3. Simard, P.Y., LeCun, Y.A., Denker, J.S., Victorri, B.: Transformation invariance in pattern recognition — tangent distance and tangent propagation. In: Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 1524, pp. 239–274. Springer, Heidelberg (1998). https://doi.org/10.1007/3-540-49430-8_13
- 4. Kurita, T., Asoh, H., Umeyama, S., Akaho, S., Hosomi, A.: A structural learning by adding independent noises to hidden units. In: Proceedings of IEEE International Conference on Neural Networks, pp. 275–278 (1994)
- 6. Inayoshi, H., Kurita, T.: Improved generalization by adding both auto-association and hidden-layer noise to neural-network-based-classifiers. In: IEEE Workshop on Machine Learning for Signal Processing, pp. 141–146 (2005)
- 7. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. In: Proceedings of 2018 International Conference on Learning Representations (ICLR 2018) (2018)
- 8. Tokozume, Y., Ushiku, Y., Harada, T.: Learning from between-class examples for deep sound recognition. In: Proceedings of 2018 International Conference on Learning Representations (ICLR 2018) (2018)
- 9. Tokozume, Y., Ushiku, Y., Harada, T.: Between-class learning for image classification. In: Proceedings of 2018 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2018) (2018)
- 10. Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a Siamese time delay neural network. In: Advances in Neural Information Processing Systems, vol. 6 (1993)
- 11. Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 539–546 (2005)
- 12. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 2, pp. 1735–1742 (2006)