Phoneme-Based Transliteration of Foreign Names for OOV Problem

  • Wei Gao
  • Kam-Fai Wong
  • Wai Lam
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


A proper noun dictionary is never complete, rendering name translation from English to Chinese ineffective. One way to solve this problem is not to rely on a dictionary alone but to translate automatically according to pronunciation similarity, i.e. to map the phonemes comprising an English name to the phonetic representation of the corresponding Chinese name. This process is called transliteration. We present a statistical transliteration method and describe an efficient algorithm for aligning phoneme chunks. Unlike rule-based approaches, our method is data-driven. In contrast to source-channel based statistical approaches, we adopt a direct transliteration model, i.e. the direction of probabilistic estimation conforms to the transliteration direction. We demonstrate performance comparable to that of a source-channel based system.
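
The contrast between the source-channel and direct models can be written out as a short sketch. The symbols are assumptions introduced here for illustration and are not taken from the paper: E stands for the phoneme sequence of the English name and C for the phonetic representation of the Chinese name. The first line is the standard noisy-channel decomposition via Bayes' rule; the second is one plausible reading of "the direction of probabilistic estimation conforms to the transliteration direction".

```latex
% Source-channel formulation: Bayes' rule reverses the direction of estimation,
% so the model estimates P(E | C) together with a prior P(C) over Chinese outputs.
\hat{C} = \arg\max_{C} P(C \mid E) = \arg\max_{C} P(E \mid C)\, P(C)

% Direct formulation: the conditional P(C | E) is estimated in the same
% direction as transliteration itself (English to Chinese).
\hat{C} = \arg\max_{C} P(C \mid E)
```

In the first form the denominator P(E) is dropped because it does not depend on C. How the direct conditional is actually parameterized and estimated is not stated in the abstract.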


Keywords: Machine Translation; Edit Distance; Statistical Machine Translation; Close Test; Bilingual Dictionary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Wei Gao (1)
  • Kam-Fai Wong (1)
  • Wai Lam (1)
  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
