DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images

Mahajan, Shilpa; Rani, Rajneesh; Trehan, Karan

doi:10.1007/s13735-023-00293-6

Regular Paper
Published: 24 August 2023

Volume 12, article number 29 (2023)

International Journal of Multimedia Information Retrieval

Shilpa Mahajan¹,
Rajneesh Rani¹ &
Karan Trehan¹

Abstract

Detecting and recognizing multi-oriented text in natural scene images remains challenging for the computer vision community. Segmentation at either the word level or the character level is a vital step that affects the end-to-end performance of a scene text recognition system. Many academicians and researchers have worked on segmenting words or characters from complex document images and handwritten images for various non-Indian scripts. In this paper, we present a deep learning-based architecture named DELIGHT-Net, derived from the general UNet architecture, to segment text at the word level from natural scene images. The method is mainly proposed to segment Devanagari, Gurumukhi, and English scene words from complete images collected from day-to-day life. To achieve this, we introduce a new dataset, National Institute of Technology Jalandhar-Word Segmentation (NITJ-WS), which has around 2200 text blocks extracted from 1500 natural images containing unilingual, bilingual, and trilingual text. We benchmark the dataset with the proposed model and two state-of-the-art models, UNet and ResUNet. Statistical and visual results are evaluated using different evaluation parameters and indicate the efficiency of the proposed model. Some possible future directions are also recommended in the manuscript. We hope that our work is a stepping stone for academicians in the field of natural scene text recognition.
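The abstract does not name the evaluation parameters it uses. As a rough illustration only, the sketch below computes two overlap measures that are commonly reported for binary segmentation masks, intersection over union (IoU) and the Dice coefficient. The choice of these two metrics, the function names `iou` and `dice`, and the nested-list mask format are all assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence, Tuple

# A mask is a 2-D grid of 0/1 values: 1 marks a pixel labelled as text.
Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted positives, true positives in truth)."""
    if len(pred) != len(truth) or any(
        len(p) != len(t) for p, t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    inter = pred_sum = truth_sum = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            p, t = int(bool(p)), int(bool(t))
            inter += p & t
            pred_sum += p
            truth_sum += t
    return inter, pred_sum, truth_sum


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union; defined as 1.0 when both masks are empty."""
    inter, pred_sum, truth_sum = _counts(pred, truth)
    union = pred_sum + truth_sum - inter
    return 1.0 if union == 0 else inter / union


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient; defined as 1.0 when both masks are empty."""
    inter, pred_sum, truth_sum = _counts(pred, truth)
    total = pred_sum + truth_sum
    return 1.0 if total == 0 else 2 * inter / total


# Hypothetical 2x3 masks, for illustration only (not data from the paper).
predicted = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
print(iou(predicted, ground_truth), dice(predicted, ground_truth))
```

Both measures range from 0 (no overlap) to 1 (identical masks). Dice weights the intersection twice, so it is never lower than IoU for the same pair of masks.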

Data Availability

The data that support the findings of this study are available on request from the corresponding author, Shilpa Mahajan.

Author information

Authors and Affiliations

Dr. BR Ambedkar National Institute of Technology, Jalandhar, Punjab, 144011, India
Shilpa Mahajan, Rajneesh Rani & Karan Trehan

Corresponding author

Correspondence to Shilpa Mahajan.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahajan, S., Rani, R. & Trehan, K. DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images. Int J Multimed Info Retr 12, 29 (2023). https://doi.org/10.1007/s13735-023-00293-6

Received: 22 March 2023
Revised: 29 May 2023
Accepted: 10 July 2023
Published: 24 August 2023
DOI: https://doi.org/10.1007/s13735-023-00293-6
