A Performance Comparison of Segmentation Techniques for the Urdu Text

Mahmood, Atif; Tiwari, Amod Kumar; Singh, Sanjay Kumar

doi:10.1007/978-981-15-2369-4_12

Conference paper
First Online: 03 March 2020

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Segmentation is the process of splitting the contents of a document image into its constituent parts (i.e., lines and words). It is an indispensable first step in most common language-processing applications, for example document structure extraction, content reconstruction, optical character recognition, falsification detection, security, and graphology. This paper presents a quantitative performance comparison of three text line segmentation techniques (the projection method, the smearing method, and the edge information-based method) for typewritten Urdu Nastaleeq text. The evaluation uses precision and recall metrics over standard data samples gathered from magazines, poetry books, and newspapers. The strengths and weaknesses of each algorithm are analyzed, and the smearing method is found to outperform the other two.
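
As an illustration only, the projection method and the precision and recall metrics named in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `min_ink` threshold, the toy binary page, and the exact-match scoring rule are all hypothetical, since the abstract does not specify them.

```python
def segment_lines(image, min_ink=1):
    """Split a binary image (1 = ink, 0 = background) into text lines.

    Returns (start_row, end_row) pairs with end_row exclusive.
    min_ink is an assumed threshold, not a value from the paper.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def precision_recall(detected, ground_truth):
    """Score detected lines against ground truth.

    Assumption: a detected line is correct only if its row range
    matches a ground-truth line exactly.
    """
    correct = len(set(detected) & set(ground_truth))
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy 6x5 page containing two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
lines = segment_lines(page)
scores = precision_recall(lines, [(1, 3), (4, 5)])
```

A projection profile of this kind separates lines only where at least one row between them falls below the ink threshold, so the sketch says nothing about how the compared methods behave on touching or overlapping lines.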

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Rajkiya Engineering College, Sonbhadra, Uttar Pradesh, India
Atif Mahmood & Amod Kumar Tiwari
Department of Computer Science, R.S.M.T., U.P College, Varanasi, Uttar Pradesh, India
Sanjay Kumar Singh

Editor information

Editors and Affiliations

Department of Electrical Engineering, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India
V. K. Giri
Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur, India
Nishchal K. Verma
Department of Electrical Engineering, Rajkiya Engineering College Sonbhadra, Churk, India
R. K. Patel
Department of Electrical Engineering, Rajkiya Engineering College Sonbhadra, Churk, India
V. P. Singh

Cite this paper

Mahmood, A., Tiwari, A.K., Singh, S.K. (2020). A Performance Comparison of Segmentation Techniques for the Urdu Text. In: Giri, V., Verma, N., Patel, R., Singh, V. (eds) Computing Algorithms with Applications in Engineering. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-2369-4_12

DOI: https://doi.org/10.1007/978-981-15-2369-4_12
Published: 03 March 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2368-7
Online ISBN: 978-981-15-2369-4
eBook Packages: Intelligent Technologies and Robotics (R0)