DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Detecting and recognizing multi-oriented text in natural scene images remains a challenging problem in computer vision. Segmentation at the word or character level is a vital step that affects the end-to-end performance of a scene text recognition system. Considerable work exists on segmenting words or characters from complex document images and handwritten images for various non-Indian scripts. In this paper, we present a deep learning-based architecture named DELIGHT-Net, derived from the general UNet architecture, to segment text at the word level from natural scene images. The method is proposed mainly to segment Devanagari, Gurumukhi, and English scene words from complete images collected from day-to-day life. To this end, we introduce a new dataset, National Institute of Technology Jalandhar-Word Segmentation (NITJ-WS), which has around 2200 text blocks extracted from 1500 natural images containing unilingual, bilingual, and trilingual text. We benchmark the dataset with the proposed model and two state-of-the-art models, UNet and ResUNet. Statistical and visual results, evaluated using different evaluation parameters, indicate the efficiency of the proposed model. Some possible future directions are also recommended in the manuscript. We hope that our work is a stepping stone for academicians in the field of natural scene text recognition.
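
The abstract does not name the evaluation parameters it uses. As an illustration only, the sketch below computes two overlap scores that are widely used for binary segmentation masks, intersection over union (IoU) and the Dice coefficient, on a toy word mask. The function name `overlap_scores` and the sample masks are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def overlap_scores(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) for two equally sized binary masks."""
    if len(pred) != len(truth) or any(
        len(p) != len(t) for p, t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    inter = pred_sum = truth_sum = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += p & t      # pixel marked as text in both masks
            pred_sum += p       # text pixels in the prediction
            truth_sum += t      # text pixels in the ground truth

    union = pred_sum + truth_sum - inter
    # Convention chosen here (an assumption): two empty masks match perfectly.
    if union == 0:
        return 1.0, 1.0

    iou = inter / union
    dice = 2 * inter / (pred_sum + truth_sum)
    return iou, dice


# Toy 2x4 masks, purely illustrative.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0]]

iou, dice = overlap_scores(pred, truth)
print(f"IoU={iou:.3f} Dice={dice:.3f}")  # prints "IoU=0.600 Dice=0.750"
```

Here three pixels overlap, each mask has four text pixels, and the union has five, which gives IoU = 3/5 and Dice = 6/8. Both scores range from 0 (no overlap) to 1 (identical masks).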

Data Availability

The data that support the findings of this study are available on request from the corresponding author, Shilpa Mahajan.

Author information

Corresponding author

Correspondence to Shilpa Mahajan.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahajan, S., Rani, R. & Trehan, K. DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images. Int J Multimed Info Retr 12, 29 (2023). https://doi.org/10.1007/s13735-023-00293-6

  • DOI: https://doi.org/10.1007/s13735-023-00293-6
