Multi-task learning for pre-processing of printed Devanagari document images with hyper-parameter optimization of the deep architecture using Taguchi method

Akhter, Shaheera Saba Mohd Naseem; Rege, Priti P

doi:10.1007/s12046-021-01664-7

Published: 26 July 2021

Volume 46, article number 145, (2021)

Sādhanā

Abstract

An excellent text recognition system requires finely pre-processed document images. Several conventional image processing techniques that rely on handcrafted features have already been applied to pre-process Devanagari document images. In contrast, a deep learning approach learns the features automatically. In this paper, we propose a transfer learning (TL)-based multi-task deep learning (MTL) architecture for pre-processing Devanagari document images. The MTL approach lets us process an input image for three pre-processing tasks simultaneously: binarization, shirorekha removal, and noise reduction. TL, in turn, transfers features already learned by a pre-trained network to the proposed network and mitigates the problem of dataset scarcity. Each branch of the proposed TL-MTL architecture is implemented as a convolutional encoder–decoder model. The architecture is further optimized using Taguchi’s method, with different network hyper-parameters as the control factors. The results are compared with those of conventional pre-processing methods that are widely used on document images. The comparison shows that the proposed optimized architecture outperforms the traditional image processing methods and achieves excellent performance on the dataset of Devanagari document images.
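The Taguchi step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard L9 orthogonal array (four factors at three levels) and the larger-the-better signal-to-noise ratio; the factor names, their levels, and the `mock_score` function are hypothetical placeholders, since the abstract does not list the actual control factors or report per-run results. In practice `mock_score` would be replaced by training the network with the given settings and returning a quality measure.

```python
import math
from itertools import combinations

# Standard L9(3^4) orthogonal array: 9 runs cover 4 factors at 3 levels each
# (a full grid would need 3**4 = 81 runs). Entries are level indices 0..2.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# HYPOTHETICAL control factors and levels (not taken from the paper).
FACTORS = {
    "learning_rate": (1e-2, 1e-3, 1e-4),
    "batch_size": (8, 16, 32),
    "num_filters": (16, 32, 64),
    "kernel_size": (3, 5, 7),
}

# SYNTHETIC per-level penalties that stand in for real training runs.
_PENALTY = {
    "learning_rate": (0.10, 0.00, 0.05),
    "batch_size": (0.00, 0.02, 0.04),
    "num_filters": (0.06, 0.03, 0.00),
    "kernel_size": (0.01, 0.00, 0.03),
}


def mock_score(levels):
    """Placeholder for 'train with these levels, return a score in (0, 1)'."""
    return 0.95 - sum(_PENALTY[name][lvl] for name, lvl in zip(FACTORS, levels))


def sn_larger_is_better(values):
    """Taguchi signal-to-noise ratio in dB for a larger-the-better response."""
    return -10.0 * math.log10(sum(1.0 / v ** 2 for v in values) / len(values))


def is_orthogonal(array, n_levels=3):
    """Check that every pair of columns contains each level pair."""
    for a, b in combinations(range(len(array[0])), 2):
        if len({(row[a], row[b]) for row in array}) != n_levels ** 2:
            return False
    return True


def taguchi_best_levels(array, score_fn):
    """Pick, per factor, the level with the highest mean S/N ratio."""
    sn = [sn_larger_is_better([score_fn(row)]) for row in array]
    best = {}
    for col, name in enumerate(FACTORS):
        means = []
        for lvl in range(3):
            # S/N values of the runs in which this factor sat at level lvl.
            runs = [s for row, s in zip(array, sn) if row[col] == lvl]
            means.append(sum(runs) / len(runs))
        best[name] = FACTORS[name][means.index(max(means))]
    return best


best_config = taguchi_best_levels(L9, mock_score)
print(best_config)
```

Because each level of a factor is paired evenly with the levels of every other factor, the per-level mean isolates that factor's main effect, and the selected combination need not be one of the nine runs actually executed. A confirmation run with the selected settings is the usual follow-up.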

Acknowledgements

The authors are grateful to the editor and the reviewers for their thorough review, valued comments, and positive suggestions. This study received no grants from funding agencies in the public, commercial, or non-profit sectors.

Author information

Authors and Affiliations

College of Engineering, Pune, Maharashtra, India
Shaheera Saba Mohd Naseem Akhter & Priti P Rege

Corresponding author

Correspondence to Shaheera Saba Mohd Naseem Akhter.

About this article

Cite this article

Akhter, S.S.M.N., Rege, P.P. Multi-task learning for pre-processing of printed Devanagari document images with hyper-parameter optimization of the deep architecture using Taguchi method. Sādhanā 46, 145 (2021). https://doi.org/10.1007/s12046-021-01664-7

Received: 05 April 2020
Revised: 29 March 2021
Accepted: 21 May 2021
Published: 26 July 2021
DOI: https://doi.org/10.1007/s12046-021-01664-7
