Abstract
Supervised learning techniques require labeled examples, which can be time-consuming to obtain. In particular, deep learning approaches, where all the feature extraction stages are learned within the artificial neural network, require a large number of labeled examples to train the model. Various data augmentation techniques can be applied to overcome this issue by taking advantage of known variations that have no impact on the label of an example. Typical solutions in computer vision and in document analysis and recognition are based on geometric transformations (e.g., shift and rotation) and random elastic deformations of the original training examples. In this paper, we consider Generative Adversarial Networks (GANs), a technique that creates novel artificial examples without requiring prior knowledge of the variability that exists across examples. When a training dataset contains few labeled examples described in a high-dimensional space, the classifier may generalize poorly. Therefore, we aim to enrich databases of images or signals and improve classifier performance by designing a GAN that creates artificial images. While adding more images through a GAN can help, the extent to which it helps is unknown, and performance may degrade if too many artificial images are added. The approach is tested on four datasets of handwritten digits (Latin, Bangla, Devanagari, and Oriya). The accuracy on each dataset shows that adding GAN-generated images to the training dataset improves accuracy. However, the results suggest that adding too many GAN-generated images deteriorates performance.
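The trade-off described above, where some GAN-generated images help but too many hurt, can be explored by controlling how many synthetic examples are mixed into the real training set. The following is a minimal sketch of that mixing step only, not the authors' implementation. The function name `augment_with_synthetic` and the `ratio` parameter (synthetic examples per real example) are assumptions introduced for illustration, and the synthetic examples are assumed to come from an already trained GAN.

```python
import random
from typing import List, Sequence, Tuple

# One training example: (flattened image pixels, digit label).
Example = Tuple[Sequence[float], int]


def augment_with_synthetic(
    real: List[Example],
    synthetic: List[Example],
    ratio: float,
    seed: int = 0,
) -> List[Example]:
    """Return the real examples plus a random subset of synthetic ones.

    `ratio` is a hypothetical knob: the number of synthetic examples
    to add per real example. The abstract does not prescribe a value.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    # Never request more synthetic examples than are available.
    n_extra = min(len(synthetic), int(ratio * len(real)))
    extra = rng.sample(synthetic, n_extra)
    mixed = list(real) + extra
    rng.shuffle(mixed)  # avoid ordering real before synthetic
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for real and GAN-generated digit images.
    real = [([0.0] * 4, i % 10) for i in range(20)]
    synthetic = [([1.0] * 4, i % 10) for i in range(100)]
    for ratio in (0.0, 0.5, 1.0, 5.0):
        train = augment_with_synthetic(real, synthetic, ratio)
        print(ratio, len(train))
```

Sweeping `ratio` and measuring accuracy on a held-out set of real images is one way to locate the point at which additional synthetic images stop helping.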
About this article
Cite this article
Jha, G., Cecotti, H. Data augmentation for handwritten digit recognition using generative adversarial networks. Multimed Tools Appl 79, 35055–35068 (2020). https://doi.org/10.1007/s11042-020-08883-w
DOI: https://doi.org/10.1007/s11042-020-08883-w