Abstract
Data augmentation has become a standard step for improving the predictive power and robustness of convolutional neural networks: new samples depicting different deformations are generated synthetically from the existing ones. This step has traditionally been applied only at the training stage. In this work, however, we study the use of data augmentation at classification time. That is, the test sample is augmented following the same procedure used for training, and the decision is made by an ensemble prediction over all the resulting samples. We present comprehensive experimentation with several datasets and ensemble decision rules, considering a rather generic data augmentation procedure. Our results show that this step improves on the original classification, even when the room for improvement is limited.
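The idea of augmenting the test sample and combining the predictions can be illustrated with a minimal sketch. It is not the authors' implementation: the augmentation (random circular shifts), the two combination rules (mean of class probabilities and majority vote), and the names `augment`, `tta_predict`, and `toy_predict_proba` are all assumptions made for illustration, and the toy classifier stands in for a trained network.

```python
import numpy as np


def augment(image, rng, max_shift=2):
    """Return a randomly shifted copy of a 2-D image.

    Assumption: circular shifts stand in for the generic augmentation
    procedure; the paper's actual transformations are not reproduced here.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))


def tta_predict(predict_proba, image, n_aug=10, rule="mean", seed=0):
    """Classify `image` from an ensemble over augmented copies of it.

    `predict_proba` is a hypothetical callable mapping one image to a
    1-D array of class probabilities (for example, a wrapped CNN).
    """
    rng = np.random.default_rng(seed)
    # The original test sample is kept alongside its augmented versions.
    samples = [image] + [augment(image, rng) for _ in range(n_aug)]
    probs = np.stack([predict_proba(s) for s in samples])  # (n_aug + 1, n_classes)

    if rule == "mean":
        # Average the class probabilities, then pick the best class.
        return int(np.argmax(probs.mean(axis=0)))
    if rule == "vote":
        # Each sample votes for its own most probable class.
        votes = np.bincount(probs.argmax(axis=1), minlength=probs.shape[1])
        return int(np.argmax(votes))
    raise ValueError(f"unknown rule: {rule!r}")


def toy_predict_proba(image):
    """Toy two-class 'network': class 1 probability is the mean intensity."""
    p = float(np.clip(image.mean(), 0.0, 1.0))
    return np.array([1.0 - p, p])


if __name__ == "__main__":
    bright = np.full((8, 8), 0.8)
    print(tta_predict(toy_predict_proba, bright, rule="mean"))
    print(tta_predict(toy_predict_proba, bright, rule="vote"))
```

In practice `predict_proba` would wrap the trained network's forward pass, and the augmentation would match whatever procedure was used during training.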
Funding
First author thanks the support from the Spanish Ministerio de Ciencia, Innovación y Universidades through Juan de la Cierva-Formación Grant (Ref. FJCI-2016-27873).
Ethics declarations
Conflict of interest
Authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Calvo-Zaragoza, J., Rico-Juan, J.R. & Gallego, AJ. Ensemble classification from deep predictions with test data augmentation. Soft Comput 24, 1423–1433 (2020). https://doi.org/10.1007/s00500-019-03976-7
DOI: https://doi.org/10.1007/s00500-019-03976-7