Abstract
An effective text recognition system requires document images to be carefully pre-processed. Several conventional image processing techniques have already been applied to pre-process Devanagari document images using handcrafted features. In contrast, deep learning methods learn the features automatically. In this paper, we propose a transfer learning (TL)-based multi-task deep learning (MTL) architecture for pre-processing Devanagari document images. The MTL approach allows an input image to be pre-processed for three tasks simultaneously, viz. binarization, shirorekha removal, and noise reduction. TL, in turn, transfers features already learned by a pre-trained network to the target network and thereby mitigates the scarcity of training data. Each branch of the proposed TL-MTL architecture is implemented as a convolutional encoder–decoder model. The architecture is further optimized using Taguchi's method, with the network's hyper-parameters as the control factors. The results are compared with those of conventional pre-processing methods widely used on document images. The comparison shows that the proposed optimized architecture outperforms the traditional image processing methods and performs very well on the dataset of Devanagari document images.
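The abstract does not state how the three branches are trained together. A common multi-task choice, assumed here, is to minimise a weighted sum of per-task losses so that one backward pass updates all branches from the same input image. The sketch below treats each branch output as a flattened list of per-pixel foreground probabilities and uses binary cross-entropy; the task keys, the equal default weights, and the choice of BCE are all illustrative assumptions, not the authors' published configuration.

```python
import math

# Hypothetical task keys, one per branch of the multi-task network.
TASKS = ("binarization", "shirorekha_removal", "noise_reduction")


def bce(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between probabilities and a 0/1 mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def multitask_loss(preds, targets, weights=None):
    """Weighted sum of per-task losses (assumed joint-training objective).

    preds / targets: dict mapping each task key to a flattened pixel list.
    Returns (total_loss, per_task_losses).
    """
    weights = weights or {t: 1.0 for t in TASKS}  # equal weights by default
    per_task = {t: bce(preds[t], targets[t]) for t in TASKS}
    total = sum(weights[t] * per_task[t] for t in TASKS)
    return total, per_task
```

Changing the weights shifts how strongly each branch's error drives the shared parameters; equal weights are only a neutral starting point.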
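The abstract names Taguchi's method but not the control factors, levels, orthogonal array, or response. As a minimal sketch, assume four hypothetical hyper-parameters (for example learning rate, batch size, number of filters, and dropout rate), each at three levels, laid out in the standard L9(3^4) orthogonal array, with a "larger is better" response such as a validation score. The code computes the signal-to-noise (S/N) ratio of each run, averages it per factor level (the main effects), and selects the level with the highest mean S/N for each factor. The factor names and scores are made up for illustration.

```python
import math

# Standard Taguchi L9(3^4) orthogonal array: 9 runs x 4 factors, levels 0..2.
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]

# Hypothetical control factors; the paper's actual factors are not given here.
FACTORS = ("learning_rate", "batch_size", "n_filters", "dropout")


def sn_larger_is_better(values):
    """S/N ratio in dB for a larger-is-better response: -10*log10(mean(1/y^2))."""
    return -10.0 * math.log10(sum(1.0 / v ** 2 for v in values) / len(values))


def best_levels(responses, array=L9, n_levels=3):
    """Return (best level per factor, mean S/N per factor level)."""
    sn = [sn_larger_is_better(r) for r in responses]
    effects = []
    for f in range(len(array[0])):
        means = []
        for lvl in range(n_levels):
            vals = [sn[i] for i, row in enumerate(array) if row[f] == lvl]
            means.append(sum(vals) / len(vals))
        effects.append(means)
    best = [max(range(n_levels), key=lambda lvl: m[lvl]) for m in effects]
    return best, effects


if __name__ == "__main__":
    # Made-up validation scores, two repeats per L9 run.
    scores = [[0.80, 0.82], [0.84, 0.85], [0.83, 0.83],
              [0.86, 0.87], [0.88, 0.87], [0.85, 0.84],
              [0.82, 0.83], [0.86, 0.85], [0.84, 0.86]]
    best, _ = best_levels(scores)
    for name, lvl in zip(FACTORS, best):
        print(f"{name}: level {lvl}")
```

Main-effects analysis of this kind assumes interactions between factors are negligible, which is why a confirmation run at the selected levels is normally carried out afterwards.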
Acknowledgements
The authors are grateful to the editor and the reviewers for their thorough review, valuable comments, and positive suggestions. This study did not receive any grants from funding agencies in the public, commercial, or not-for-profit sectors.
About this article
Cite this article
Akhter, S.S.M.N., Rege, P.P. Multi-task learning for pre-processing of printed Devanagari document images with hyper-parameter optimization of the deep architecture using Taguchi method. Sādhanā 46, 145 (2021). https://doi.org/10.1007/s12046-021-01664-7
DOI: https://doi.org/10.1007/s12046-021-01664-7