
A Performance Comparison of Segmentation Techniques for the Urdu Text

  • Conference paper

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Segmentation is the process of splitting the contents of a document image into their constituent parts (e.g., lines and words). It is an indispensable first step in common language-processing applications such as document structure extraction, text reconstruction, optical character recognition (OCR), forgery detection, security, and graphology. This paper quantitatively compares the performance of three text-line segmentation techniques, namely the projection method, the smearing method, and an edge-information-based method, on typewritten Urdu Nastaleeq text. The evaluation uses precision and recall on a collected set of standard samples taken from magazines, poetry books, and newspapers. The strengths and weaknesses of each algorithm are analyzed, and the smearing method is found to outperform the other two.
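The abstract names the projection and smearing methods and the precision/recall metrics without giving their details. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a binary page image stored as a list of rows (1 = ink). The projection method marks a row as text when its ink count exceeds a threshold. The smearing method is approximated by horizontal run-length smearing (RLSA) followed by keeping only rows whose longest smeared run reaches `min_run`. The parameters `threshold`, `max_gap`, `min_run`, and `min_iou`, the helper names, and the greedy IoU-based matching used for precision and recall are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed: binary page image, rows of pixels, 1 = ink
Band = Tuple[int, int]   # inclusive (first_row, last_row) of one text line


def bands_from_rows(is_text: List[bool]) -> List[Band]:
    """Group consecutive rows flagged as text into (start, end) bands."""
    bands: List[Band] = []
    start: Optional[int] = None
    for y, flag in enumerate(is_text):
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(is_text) - 1))
    return bands


def projection_lines(img: Image, threshold: int = 0) -> List[Band]:
    """Projection method: a row is text if its ink-pixel count exceeds threshold."""
    return bands_from_rows([sum(row) > threshold for row in img])


def smear_rows(img: Image, max_gap: int) -> Image:
    """Horizontal RLSA: fill background gaps of <= max_gap between two ink pixels."""
    out = [row[:] for row in img]
    for row in out:
        last_ink: Optional[int] = None
        for x, v in enumerate(row):
            if v:
                if last_ink is not None and 0 < x - last_ink - 1 <= max_gap:
                    for k in range(last_ink + 1, x):
                        row[k] = 1
                last_ink = x
    return out


def longest_run(row: List[int]) -> int:
    """Length of the longest run of consecutive ink pixels."""
    best = cur = 0
    for v in row:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best


def smearing_lines(img: Image, max_gap: int = 3, min_run: int = 4) -> List[Band]:
    """Smear, then keep rows with a run >= min_run, so isolated dots are ignored."""
    smeared = smear_rows(img, max_gap)
    return bands_from_rows([longest_run(row) >= min_run for row in smeared])


def band_iou(a: Band, b: Band) -> float:
    """Intersection over union of two inclusive row ranges."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    return inter / (max(a[1], b[1]) - min(a[0], b[0]) + 1)


def precision_recall(detected: List[Band], truth: List[Band],
                     min_iou: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching of detections to ground-truth lines by IoU."""
    matched = set()
    for d in detected:
        for i, t in enumerate(truth):
            if i not in matched and band_iou(d, t) >= min_iou:
                matched.add(i)
                break
    precision = len(matched) / len(detected) if detected else 0.0
    recall = len(matched) / len(truth) if truth else 0.0
    return precision, recall


# Toy page: text lines in rows 1-2 and row 6, plus a lone "dot" pixel in row 4.
IMG: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
TRUTH: List[Band] = [(1, 2), (6, 6)]

print(projection_lines(IMG))  # [(1, 2), (4, 4), (6, 6)]
print(smearing_lines(IMG))    # [(1, 2), (6, 6)]
print(precision_recall(projection_lines(IMG), TRUTH))  # (0.6666666666666666, 1.0)
print(precision_recall(smearing_lines(IMG), TRUTH))    # (1.0, 1.0)
```

On this contrived image, the isolated pixel in row 4 (standing in for a detached dot, which is frequent in Urdu script) yields a spurious band under projection and lowers precision, while the smearing sketch discards it because its run is too short. The toy case only illustrates the mechanics of the two approaches and the metrics; it is not a reproduction of the paper's experiments.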

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mahmood, A., Tiwari, A.K., Singh, S.K. (2020). A Performance Comparison of Segmentation Techniques for the Urdu Text. In: Giri, V., Verma, N., Patel, R., Singh, V. (eds) Computing Algorithms with Applications in Engineering. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-2369-4_12