
Improving Machine Transliteration Performance by Using Multiple Transliteration Models

  • Jong-Hoon Oh
  • Key-Sun Choi
  • Hitoshi Isahara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)

Abstract

Machine transliteration has received significant attention as a supporting tool for machine translation and cross-language information retrieval. During the last decade, four kinds of transliteration model have been studied: the grapheme-based model, the phoneme-based model, the hybrid model, and the correspondence-based model. These models are classified by the information source used for transliteration or the unit to be transliterated: source graphemes, source phonemes, both source graphemes and source phonemes, and the correspondence between source graphemes and phonemes, respectively. Although each transliteration model has shown relatively good performance, a single model is limited in handling complex transliteration behaviors. To address this problem, we combined different transliteration models with a “generating transliterations followed by their validation” strategy. The strategy makes it possible to consider complex transliteration behaviors using the strengths of each model and to improve transliteration performance by validating the generated transliterations. Our method validates transliterations in two ways: web-based validation and transliteration model-based validation. Experiments showed that our method outperforms both the individual transliteration models and previous work.
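The “generating transliterations followed by their validation” strategy can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two toy models, the placeholder candidates (`cand_a`, `cand_b`, `cand_c`), the use of a fixed table of counts as a stand-in for web frequencies, and the equal-weight average used to combine the two validation scores.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps a source word to scored candidates.
Model = Callable[[str], Dict[str, float]]


def generate_candidates(word: str, models: List[Model]) -> Dict[str, float]:
    """Pool candidates from every model, keeping each candidate's best score."""
    pooled: Dict[str, float] = {}
    for model in models:
        for cand, score in model(word).items():
            pooled[cand] = max(pooled.get(cand, 0.0), score)
    return pooled


def validate(pooled: Dict[str, float],
             web_counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank candidates by an assumed equal-weight mix of two validation scores.

    model score: the best score any model assigned to the candidate
    web score:   the candidate's share of the (stand-in) web counts
    """
    total = sum(web_counts.get(c, 0) for c in pooled)
    ranked = []
    for cand, model_score in pooled.items():
        web_score = web_counts.get(cand, 0) / total if total else 0.0
        ranked.append((cand, 0.5 * (model_score + web_score)))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy stand-ins for a grapheme-based and a phoneme-based model (assumptions).
def toy_grapheme_model(word: str) -> Dict[str, float]:
    return {"cand_a": 0.6, "cand_b": 0.4}


def toy_phoneme_model(word: str) -> Dict[str, float]:
    return {"cand_b": 0.5, "cand_c": 0.5}


# Stand-in for web frequencies of each candidate (assumed values).
toy_web_counts = {"cand_a": 10, "cand_b": 80, "cand_c": 10}

pooled = generate_candidates("source_word", [toy_grapheme_model, toy_phoneme_model])
ranked = validate(pooled, toy_web_counts)
best = ranked[0][0]
```

In this toy setup neither model alone ranks `cand_b` first, but the pooled candidate set plus the validation step promotes it, which is the kind of effect the combined strategy is meant to achieve.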

Keywords

English Word, Target Language, Maximum Entropy Model, Source Word, Target Grapheme
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jong-Hoon Oh (1)
  • Key-Sun Choi (2)
  • Hitoshi Isahara (1)
  1. Computational Linguistics Group, National Institute of Information and Communications Technology (NICT), Kyoto, Japan
  2. Computer Science Division, EECS, KAIST, Daejeon, Republic of Korea
