
Multi-task learning for pre-processing of printed Devanagari document images with hyper-parameter optimization of the deep architecture using Taguchi method

Sādhanā

Abstract

Accurate text recognition requires well pre-processed document images. Several conventional image processing techniques for pre-processing Devanagari document images rely on handcrafted features; deep learning, in contrast, learns the features automatically. In this paper, we propose a transfer learning (TL)-based multi-task deep learning (MTL) architecture for pre-processing Devanagari document images. The MTL approach performs three pre-processing tasks on an input image simultaneously: binarization, shirorekha removal, and noise reduction. TL transfers features already learned by a pre-trained network to the new network, which helps cope with the scarcity of training data. Each branch of the proposed TL-MTL architecture is a convolutional encoder–decoder model. The architecture is further optimized using Taguchi’s method, with the network’s hyper-parameters as the control factors. The results are compared with those of conventional pre-processing methods widely used on document images, and show that the optimized architecture outperforms these traditional image processing methods and performs very well on a dataset of Devanagari document images.
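As a rough illustration of how a Taguchi hyper-parameter search works, the sketch below uses a standard L9(3^4) orthogonal array: nine training runs cover four three-level control factors (a full factorial design would need 81), and the best level of each factor is chosen from its mean "larger-the-better" signal-to-noise (S/N) ratio. The factor names and levels, the number of replicates, and `toy_evaluate` (a synthetic stand-in for training the network and scoring it on validation data) are all hypothetical assumptions for this example, not the authors' actual control factors, array, or quality metric.

```python
import math

# Standard Taguchi L9(3^4) orthogonal array; levels are written 0, 1, 2.
# Every pair of columns contains each of the 9 level combinations exactly once.
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]

# Hypothetical control factors and levels (illustrative, not the paper's).
FACTORS = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [4, 8, 16],
    "encoder_depth": [3, 4, 5],
    "base_filters": [16, 32, 64],
}


def sn_larger_is_better(scores):
    """Taguchi 'larger-the-better' S/N ratio in dB: -10*log10(mean(1/y^2))."""
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in scores) / len(scores))


def taguchi_search(evaluate, factors, array=L9, replicates=2):
    """Run one experiment per array row, then pick each factor's level
    with the highest mean S/N ratio (main-effects analysis)."""
    names = list(factors)
    sn_per_run = []
    for row in array:
        config = {name: factors[name][lvl] for name, lvl in zip(names, row)}
        scores = [evaluate(config, r) for r in range(replicates)]
        sn_per_run.append(sn_larger_is_better(scores))

    best = {}
    for col, name in enumerate(names):
        mean_sn = []
        for lvl in range(len(factors[name])):
            vals = [sn for row, sn in zip(array, sn_per_run) if row[col] == lvl]
            mean_sn.append(sum(vals) / len(vals))
        best_lvl = max(range(len(mean_sn)), key=lambda i: mean_sn[i])
        best[name] = factors[name][best_lvl]
    return best, sn_per_run


# Hypothetical stand-in for "train the network with `config` (seed = replicate)
# and return a validation score in (0, 1]". Its preferred levels are arbitrary.
PREFERRED = {"learning_rate": 1e-3, "batch_size": 8,
             "encoder_depth": 4, "base_filters": 32}


def toy_evaluate(config, replicate):
    score = 0.95 * (1.0 - 0.01 * replicate)  # small run-to-run variation
    for name, value in config.items():
        score *= 1.0 if value == PREFERRED[name] else 0.9
    return score


if __name__ == "__main__":
    best, sn = taguchi_search(toy_evaluate, FACTORS)
    print(best)  # recovers PREFERRED using 9 runs instead of 81
```

Because each level of one factor appears equally often with every level of the other factors, the per-level S/N means isolate each factor's main effect. The trade-off is that this saturated design does not estimate interactions between hyper-parameters.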

Acknowledgements

The authors are grateful to the editor and the reviewers for their thorough review, valuable comments, and constructive suggestions. This study did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Correspondence to Shaheera Saba Mohd Naseem Akhter.

About this article

Cite this article

Akhter, S.S.M.N., Rege, P.P. Multi-task learning for pre-processing of printed Devanagari document images with hyper-parameter optimization of the deep architecture using Taguchi method. Sādhanā 46, 145 (2021). https://doi.org/10.1007/s12046-021-01664-7

  • DOI: https://doi.org/10.1007/s12046-021-01664-7
