Improving Back-Transliteration by Combining Information Sources

  • Slaven Bilac
  • Hozumi Tanaka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


Transliterating words and names from one language to another is a frequent and highly productive phenomenon. Transliteration is information-losing, since important distinctions are not preserved in the process. Hence, automatically converting transliterated words back into their original form is a real challenge. However, due to its wide applicability in machine translation (MT) and cross-language information retrieval (CLIR), it is a computationally interesting problem. Previously proposed back-transliteration methods are based on either phoneme modeling or grapheme modeling across languages. In this paper, we propose a new method that combines the two models in order to improve the back-transliteration of words transliterated into Japanese. Our experiments show that the resulting system outperforms single-model systems.
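
The abstract does not say how the phoneme-based and grapheme-based models are combined. As a rough illustration only, the sketch below assumes one common choice, linear interpolation of the two models' candidate scores. The function name `combine_scores`, the `weight` parameter, the `floor` back-off value, and all the scores are hypothetical and are not taken from the paper.

```python
def combine_scores(phoneme_scores, grapheme_scores, weight=0.5, floor=1e-9):
    """Rank candidates by a linear interpolation of two score tables.

    ASSUMPTION: linear interpolation is used here only as an example of
    combining two information sources; it is not claimed to be the
    authors' method. Candidates missing from one table receive a small
    `floor` score instead of being discarded.
    """
    candidates = set(phoneme_scores) | set(grapheme_scores)
    combined = {}
    for cand in candidates:
        p = phoneme_scores.get(cand, floor)   # phoneme-model score
        g = grapheme_scores.get(cand, floor)  # grapheme-model score
        combined[cand] = weight * p + (1.0 - weight) * g
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for English candidates of a single katakana input.
phoneme_scores = {"graph": 0.50, "graf": 0.30, "gruff": 0.20}
grapheme_scores = {"graph": 0.40, "grab": 0.35, "graf": 0.25}

ranked = combine_scores(phoneme_scores, grapheme_scores, weight=0.5)
for candidate, score in ranked:
    print(f"{candidate}\t{score:.3f}")
```

With these toy numbers, a candidate supported by both models ranks above a candidate that only one model proposes, which is the intuition behind combining the two sources.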


Machine Translation, English Word, Input String, English Alphabet, Electronic Dictionary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Slaven Bilac (1)
  • Hozumi Tanaka (1)
  1. Department of Computer Science, Tokyo Institute of Technology, Tokyo
