A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora

  • Jong-Hoon Oh
  • Key-Sun Choi
  • Hitoshi Isahara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)


Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, little work has addressed a framework that can accommodate several models at the same time. Moreover, little attention has been paid to validating acquired transliteration pairs against up-to-date corpora, such as web documents. To address these problems, we propose a hybrid model for transliteration pair acquisition, concentrating on a framework for combining several acquisition models. Experiments showed that our hybrid model was more effective than any individual transliteration pair acquisition model alone.


Hybrid Model, English Word, Recall Rate, Parallel Corpus, Foreign Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jong-Hoon Oh (1, 2)
  • Key-Sun Choi (2)
  • Hitoshi Isahara (1)
  1. Computational Linguistics Group, NICT, Kyoto, Japan
  2. Computer Science Division, EECS, KAIST, Daejeon, Republic of Korea
