Abstract
Segmentation is the process of splitting the contents of an image into its subparts (i.e., lines and words). For common language-processing applications, such as document structure extraction, content reconstruction, optical character recognition, falsification detection, security, and graphology, segmentation is an indispensable first step. This paper presents a quantitative performance comparison of three text line segmentation techniques (the projection method, the smearing method, and the edge information-based method) on type-written Urdu Nastaleeq text. The evaluation uses precision and recall metrics over standard data samples gathered from magazines, poetry books, and newspapers. In the course of the evaluation, the strengths and weaknesses of the algorithms are analyzed, and the smearing method is found to outperform the other two.
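As a rough illustration of the projection method and of precision/recall scoring, the following minimal sketch treats a binarized page as a list of pixel rows (1 = ink, 0 = background) and cuts text lines wherever the horizontal projection profile drops below a threshold. It is not the authors' implementation: the function names, the `min_ink` threshold, and the exact-match criterion used for scoring are assumptions made for this example.

```python
from typing import List, Tuple

Line = Tuple[int, int]  # (start_row, end_row_exclusive)


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Line]:
    """Split a binary image into text lines using a horizontal projection profile.

    `min_ink` (hypothetical parameter) is the number of ink pixels a row
    needs in order to count as part of a text line.
    """
    # Horizontal projection: number of ink pixels in each row.
    profile = [sum(row) for row in image]
    lines: List[Line] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a text line ends at a gap
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def precision_recall(detected: List[Line], truth: List[Line]) -> Tuple[float, float]:
    """Score detected lines against ground truth.

    Assumption: a detected line is correct only if its row span matches
    a ground-truth span exactly; real evaluations usually allow a tolerance.
    """
    matched = len(set(detected) & set(truth))
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
        [0, 0, 0],
    ]
    found = segment_lines(page)
    print(found)
    print(precision_recall(found, [(1, 3), (4, 6)]))
```

A plain profile cut of this kind assumes that consecutive text lines are separated by at least one nearly empty pixel row; where neighbouring lines overlap vertically, the profile has no gap and the lines merge into one.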
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Mahmood, A., Tiwari, A.K., Singh, S.K. (2020). A Performance Comparison of Segmentation Techniques for the Urdu Text. In: Giri, V., Verma, N., Patel, R., Singh, V. (eds) Computing Algorithms with Applications in Engineering. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-2369-4_12
DOI: https://doi.org/10.1007/978-981-15-2369-4_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2368-7
Online ISBN: 978-981-15-2369-4
eBook Packages: Intelligent Technologies and Robotics (R0)