A novel approach for ranking spelling error corrections for Urdu
First Online: 26 September 2007
Received: 19 August 2006
Accepted: 02 June 2007
DOI: 10.1007/s10579-007-9028-6
Cite this article as: Naseem, T. & Hussain, S. Lang Resources & Evaluation (2007) 41: 117. doi:10.1007/s10579-007-9028-6
Abstract
This paper presents a scheme for ranking spelling error corrections for Urdu. Conventional spell-checking techniques do not provide an explicit ranking mechanism: ranking is either implicit in the correction algorithm, or corrections are not ranked at all. The research presented here shows that, for Urdu, phonetic similarity between the corrections and the erroneous word can serve as a useful parameter for ranking the corrections. Combined with a new technique, Shapex, which ranks corrections by the visual similarity of characters, this gives an improvement of 23% in the accuracy of the one-best match compared with ranking on the basis of word frequencies only.
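The idea of ranking corrections by phonetic similarity can be illustrated with a minimal sketch. It does not reproduce the paper's method: the abstract gives neither the Urdu Soundex mapping nor the Shapex character classes, so the sketch falls back on the classic English Soundex code. The function names `soundex` and `rank_corrections`, the candidate words, and their frequencies are all hypothetical and chosen only for illustration.

```python
# Classic English Soundex digit classes (NOT the Urdu mapping used in the paper).
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character classic Soundex code of an English word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and unmapped letters) code to ""
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]


def rank_corrections(misspelled, candidates):
    """Rank candidate corrections (word -> frequency, hypothetical data).

    Candidates whose Soundex code matches the misspelled word come first;
    ties are broken by descending word frequency.
    """
    target = soundex(misspelled)
    return sorted(candidates,
                  key=lambda w: (soundex(w) != target, -candidates[w]))


# Hypothetical candidate list with made-up frequencies.
candidates = {"relieve": 120, "receive": 90, "recite": 60}
print(rank_corrections("recieve", candidates))
# → ['receive', 'relieve', 'recite']
```

With frequency alone, "relieve" would be ranked first in this toy example; adding the phonetic key moves "receive" to the top because it shares the code R210 with the misspelling.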
Keywords: Correction ranking, Soundex, Shapex, Spelling error correction, Urdu
Copyright information
© Springer Science+Business Media B.V. 2007