English to Persian Transliteration

  • Sarvnaz Karimi
  • Andrew Turpin
  • Falk Scholer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4209)


Persian is an Indo-European language written using Arabic script, and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English (that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary) is an unstudied problem. In this paper we make three novel contributions. First, we present performance comparisons of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English vowels. This trait makes transliteration of Persian particularly difficult for phonetic-based methods. The new model outperforms the existing grapheme-based methods on Persian, exhibiting a 24% relative increase in transliteration accuracy measured using the top-5 criterion.
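The abstract relies on two ideas that a short sketch may make concrete: splitting an English word into runs of vowels and consonants, and scoring a system by top-5 accuracy and relative increase. The Python below is illustrative only and is not the authors' implementation. The function names, the vowel set (with 'y' treated as a consonant), and the reading of "top-5" as "an accepted transliteration appears among the first five ranked candidates" are all assumptions made for this example.

```python
from itertools import groupby

# Assumption: a simple five-letter vowel set; 'y' is treated as a consonant.
VOWELS = set("aeiou")


def cv_runs(word):
    """Split a word into maximal runs of vowels ("V") and consonants ("C")."""
    return [
        ("V" if is_vowel else "C", "".join(chars))
        for is_vowel, chars in groupby(word.lower(), key=lambda ch: ch in VOWELS)
    ]


def top_k_accuracy(ranked_outputs, references, k=5):
    """Fraction of source words with an accepted target in the top k candidates.

    ranked_outputs: one list of candidates per source word, best first.
    references: one set of accepted transliterations per source word.
    """
    hits = sum(
        1
        for candidates, accepted in zip(ranked_outputs, references)
        if any(candidate in accepted for candidate in candidates[:k])
    )
    return hits / len(references)


def relative_increase(baseline, improved):
    """Relative (not absolute) change of `improved` over `baseline`."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # A run of English vowels such as "uee" is the kind of unit that the
    # abstract says is often shortened or omitted in Persian.
    print(cv_runs("queen"))

    # Toy placeholder data, not taken from the paper's corpus.
    outputs = [["t1", "t2", "t3"], ["u1", "u2"]]
    accepted = [{"t3"}, {"u9"}]
    print(top_k_accuracy(outputs, accepted, k=5))
    print(relative_increase(0.50, 0.62))
```

Under this reading, a 24% relative increase means the new accuracy is 1.24 times the baseline accuracy, not 24 percentage points higher.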


Keywords: Target Word, Training Corpus, Baseline System, Source Symbol, Source Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sarvnaz Karimi
    • 1
  • Andrew Turpin
    • 1
  • Falk Scholer
    • 1
  1. School of Computer Science and Information Technology, RMIT University, GPO, Melbourne, Australia
