A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora
Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, little work has addressed a framework that can accommodate several such models at the same time. Moreover, little attention has been paid to validating acquired transliteration pairs against up-to-date corpora, such as web documents. To address these problems, we propose a hybrid model for transliteration pair acquisition, concentrating on a framework that combines several acquisition models. Experiments showed that the hybrid model was more effective than any of the individual transliteration pair acquisition models alone.
Keywords: Hybrid Model, English Word, Recall Rate, Parallel Corpus, Foreign Word