Transliteration Based Text Input Methods for Telugu

  • V. B. Sowmya
  • Vasudeva Varma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5459)

Abstract

Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. Despite this widespread use, there is no standardized input method for Telugu. Since the majority of users of Telugu typing tools on computers are familiar with English, we propose a transliteration-based text input method in which users type Telugu using Roman script. We show that a simple edit-distance-based approach can give a lightweight system with good efficiency for a text input method. We tested the approach on three datasets: general data, countries and places, and person names. The approach worked considerably well on all three datasets and holds promise as an efficient text input method.
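
The abstract does not spell out the matching procedure, so the following is only a minimal sketch of how an edit-distance lookup for Roman-script input might work, not the authors' implementation. It assumes a hypothetical lexicon (`LEXICON`) that maps one Roman spelling to each Telugu word, and a hypothetical helper `suggest` that ranks lexicon entries by plain Levenshtein distance to what the user typed.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical toy lexicon (illustration only): Roman spelling -> Telugu word.
LEXICON = {
    "telugu": "తెలుగు",
    "bharatadesam": "భారతదేశం",
    "hyderabad": "హైదరాబాద్",
}


def suggest(typed: str, lexicon=LEXICON, max_results: int = 3) -> list:
    """Return Telugu words whose Roman spelling is closest to the typed text."""
    typed = typed.lower()
    ranked = sorted(
        lexicon.items(),
        key=lambda item: (levenshtein(typed, item[0]), item[0]),
    )
    return [telugu for _, telugu in ranked[:max_results]]
```

With this sketch, a misspelled input such as `suggest("telgu")` ranks the entry for "telugu" first, because a single insertion separates the two spellings. A real system would need a far larger lexicon and probably a faster candidate search than this linear scan.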

Keywords

Telugu, Text input methods, Transliteration, Levenshtein, Edit-distance

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • V. B. Sowmya (1)
  • Vasudeva Varma (1)
  1. International Institute of Information Technology, Language Technologies Research Center, Hyderabad, India
