Extracting English-Korean Transliteration Pairs from Web Corpora

  • Jong-Hoon Oh
  • Hitoshi Isahara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


Transliteration pair acquisition has received significant attention as a technique for constructing up-to-date transliteration lexicons and for supporting machine translation and cross-language information retrieval. Previous studies on transliteration pair acquisition focused only on phonetic similarity models and seldom considered how transliterations are actually used in texts. Moreover, previous web-based validation models considered only one-way validation (validation from the viewpoint of the source term) rather than joint validation between a source term and a target term. To address these problems, we propose a novel transliteration pair acquisition model that extracts transliteration pairs from the Web and validates them by combining phonetic similarity and joint web-validation models. Experiments demonstrated that our transliteration pair acquisition model was effective.
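The abstract gives no formulas, so the following is only an illustrative sketch of the contrast it draws between one-way and joint web validation, not the authors' model. The `PageCounts` type, the Dice-style joint score, and the product used to combine it with a phonetic similarity value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PageCounts:
    """Hypothetical web page counts for one candidate pair (assumed inputs)."""
    source: int  # pages containing the English (source) term
    target: int  # pages containing the Korean (target) term
    both: int    # pages containing both terms


def one_way_validation(c: PageCounts) -> float:
    """Validate from the source term's viewpoint only: share of source pages
    that also contain the target term."""
    return c.both / c.source if c.source else 0.0


def joint_validation(c: PageCounts) -> float:
    """Validate from both viewpoints at once. A Dice coefficient is used here
    as an assumed stand-in for a joint measure."""
    total = c.source + c.target
    return 2 * c.both / total if total else 0.0


def combined_score(phonetic_similarity: float, c: PageCounts) -> float:
    """Combine a phonetic similarity in [0, 1] with the joint web score.
    The product is an assumption; the paper may combine them differently."""
    return phonetic_similarity * joint_validation(c)


if __name__ == "__main__":
    # A rare source term paired with a very common target term: the one-way
    # score is high, while the joint score penalises the mismatch.
    counts = PageCounts(source=100, target=10_000, both=90)
    print(one_way_validation(counts))
    print(joint_validation(counts))
    print(combined_score(0.8, counts))
```

In this sketch, a candidate whose target term is far more frequent than its source term can pass one-way validation yet score poorly under the joint measure, which is the kind of case joint validation is meant to catch.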


English Term, Target Term, Phonetic Similarity, Bilingual Corpus, Candidate Extraction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jong-Hoon Oh (1)
  • Hitoshi Isahara (1)
  1. Computational Linguistics Group, National Institute of Information and Communications Technology (NICT), Kyoto, Japan
