A New Large Urdu Database for Off-Line Handwriting Recognition

  • Malik Waqas Sagheer
  • Chun Lei He
  • Nicola Nobile
  • Ching Y. Suen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5716)

Abstract

A new large Urdu handwriting database was designed at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI). It includes isolated digits, numeral strings with and without decimal points, five special symbols, 44 isolated characters, 57 Urdu words (mostly finance-related), and Urdu dates written in different patterns. It is the first database for Urdu off-line handwriting recognition. Its samples were written by a large number of native Urdu speakers from different regions of the world. The database is provided in three formats: true color, gray level and binary. Experiments on Urdu digit recognition have been conducted, achieving an accuracy of 98.61%. The methods used for image pre-processing, gradient feature extraction and SVM classification are described, and a detailed error analysis of the recognition results is presented.
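To make the pre-processing and gradient-feature steps named above more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes dark ink on a light background, binarizes with Otsu's threshold, applies a 3×3 Sobel operator, quantizes each gradient to the nearest of 8 directions (a simplification of the usual parallelogram decomposition), and sums magnitudes over a hypothetical 4×4 grid of zones. The function names `otsu_threshold`, `gradient_features` and `extract_features`, and the zone and direction counts, are illustrative assumptions. Size normalization and SVM training are omitted.

```python
import math

def otsu_threshold(gray):
    """Return the gray level t (0..255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0              # pixel count and level sum for levels <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break                # every pixel is in class 0; no split remains
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def sobel(img):
    """3x3 Sobel gradients; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            p = img
            gx[y][x] = ((p[y-1][x+1] + 2 * p[y][x+1] + p[y+1][x+1])
                        - (p[y-1][x-1] + 2 * p[y][x-1] + p[y+1][x-1]))
            gy[y][x] = ((p[y+1][x-1] + 2 * p[y+1][x] + p[y+1][x+1])
                        - (p[y-1][x-1] + 2 * p[y-1][x] + p[y-1][x+1]))
    return gx, gy

def gradient_features(img, zones=4, directions=8):
    """Sum gradient magnitude per (zone, quantized direction) cell."""
    h, w = len(img), len(img[0])
    gx, gy = sobel(img)
    step = 2 * math.pi / directions
    feats = [0.0] * (zones * zones * directions)
    for y in range(h):
        for x in range(w):
            mag = math.hypot(gx[y][x], gy[y][x])
            if mag == 0:
                continue
            angle = math.atan2(gy[y][x], gx[y][x]) % (2 * math.pi)
            d = int(round(angle / step)) % directions   # nearest direction
            zy, zx = y * zones // h, x * zones // w     # zone indices
            feats[(zy * zones + zx) * directions + d] += mag
    return feats

def extract_features(gray, zones=4):
    """Binarize (ink = 1.0 where level <= Otsu threshold), then extract features."""
    t = otsu_threshold(gray)
    ink = [[1.0 if v <= t else 0.0 for v in row] for row in gray]
    return gradient_features(ink, zones=zones)

# Usage: a synthetic 16x16 image with a dark vertical stroke at columns 7-8.
img = [[20 if x in (7, 8) else 230 for x in range(16)] for _ in range(16)]
features = extract_features(img)
```

In a full system, vectors like `features` would typically be normalized and passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`). That step is not shown here, and the abstract does not specify the kernel or parameters used.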

Keywords

Urdu OCR, Off-line Handwriting Recognition, Handwriting Segmentation, Urdu Digit Recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Malik Waqas Sagheer (1)
  • Chun Lei He (1)
  • Nicola Nobile (1)
  • Ching Y. Suen (1)
  1. CENPARMI (Centre for Pattern Recognition and Machine Intelligence), Computer Science and Software Engineering Department, Concordia University, Montreal, Canada
