Language Resources and Evaluation, Volume 41, Issue 2, pp 117–128

A novel approach for ranking spelling error corrections for Urdu

Abstract

This paper presents a scheme for ranking spelling error corrections for Urdu. Conventional spell-checking techniques do not provide an explicit ranking mechanism: ranking is either implicit in the correction algorithm, or corrections are not ranked at all. The research presented here shows that, for Urdu, phonetic similarity between the corrections and the erroneous word can serve as a useful parameter for ranking the corrections. Combined with Shapex, a new technique that ranks by the visual similarity of characters, this gives an improvement of 23% in the accuracy of the one-best match compared with ranking on the basis of word frequencies only.
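The idea of ranking candidate corrections by phonetic similarity can be illustrated with a minimal sketch. It rests on several assumptions: it uses the classic English Soundex coding as a stand-in, not the Urdu phonetic codes developed in the paper; it does not attempt Shapex, whose details the abstract does not give; and the function names, sample words, and frequency counts are hypothetical.

```python
# Classic (English) Soundex digit classes, used only as a stand-in for an
# Urdu-specific phonetic coding.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character classic Soundex code of an ASCII word."""
    word = "".join(ch for ch in word.upper() if ch.isascii() and ch.isalpha())
    if not word:
        return ""
    code = word[0]
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated digit is kept
    return (code + "000")[:4]


def rank_corrections(error, candidates, freq):
    """Order candidates: phonetic matches first, then by word frequency."""
    target = soundex(error)

    def key(candidate):
        mismatch = soundex(candidate) != target
        return (mismatch, -freq.get(candidate, 0), candidate)

    return sorted(candidates, key=key)


# Hypothetical frequencies: ranking by frequency alone would put
# "relieve" first.
freq = {"relieve": 90, "receive": 40, "revive": 10}
ranked = rank_corrections("recieve", ["relieve", "revive", "receive"], freq)
print(ranked)
```

In this sketch, word frequency only breaks ties among candidates with the same phonetic status, so a candidate that sounds like the erroneous word can outrank a more frequent one that does not.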

Keywords

Correction ranking; Soundex; Shapex; Spelling error correction; Urdu

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. Center for Research in Urdu Language Processing, National University of Computer and Emerging Sciences, Lahore, Pakistan
