A New Large Urdu Database for Off-Line Handwriting Recognition
A new large Urdu handwriting database was designed at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI). It includes isolated digits, numeral strings with and without decimal points, five special symbols, 44 isolated characters, 57 Urdu words (mostly finance-related), and Urdu dates in different patterns. It is the first database for Urdu off-line handwriting recognition, and it involves contributions from a large number of native Urdu speakers from different regions of the world. The database is available in three formats: true color, gray level, and binary. Experiments on Urdu digit recognition have been conducted, achieving an accuracy of 98.61%. Methodologies for image pre-processing, gradient feature extraction, and classification using SVM are described, and a detailed error analysis of the recognition results is presented.
Keywords: Urdu OCR, Off-line Handwriting Recognition, Handwriting Segmentation, Urdu Digit Recognition