Transliteration Based Text Input Methods for Telugu
Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. Yet no standardized input method for Telugu is in widespread use. Since the majority of users of Telugu typing tools on computers are familiar with English, we propose a transliteration-based text input method in which users type Telugu using Roman script. We show that a simple edit-distance-based approach can yield a lightweight system with good efficiency as a text input method. We tested the approach on three datasets: general data, countries and places, and person names. The approach worked considerably well on all three datasets and holds promise as an efficient text input method.
Keywords: Telugu, text input methods, transliteration, Levenshtein edit distance
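As an illustration of the general idea only, the following is a minimal sketch of edit-distance-based candidate ranking. It is not the system described in the paper: the abstract does not say which strings are compared or how candidates are generated. The sketch assumes a small lexicon that maps one Roman spelling to each Telugu word, and it ranks entries by Levenshtein distance to the typed Roman string. The names `LEXICON` and `suggest` and the three sample entries are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein edit distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical toy lexicon: one assumed Roman spelling per Telugu word.
LEXICON = {
    "telugu": "తెలుగు",
    "amma": "అమ్మ",
    "nanna": "నాన్న",
}


def suggest(typed: str, lexicon: dict = LEXICON, limit: int = 3) -> list:
    """Return up to `limit` Telugu words, closest Roman spelling first."""
    typed = typed.lower()
    ranked = sorted(
        lexicon.items(),
        # Sort by distance; break ties alphabetically for a stable order.
        key=lambda item: (levenshtein(typed, item[0]), item[0]),
    )
    return [telugu for _, telugu in ranked[:limit]]


if __name__ == "__main__":
    print(suggest("telgu"))  # a misspelled Roman input
```

Ranking by distance rather than requiring an exact match lets the input method tolerate the spelling variation that arises when users romanize Telugu words in different ways.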