Extracting English-Korean Transliteration Pairs from Web Corpora
Transliteration pair acquisition has received significant attention as a technique for constructing up-to-date transliteration lexicons and for supporting machine translation and cross-language information retrieval. Previous studies on transliteration pair acquisition focused only on phonetic similarity models and seldom considered how transliterations are actually used in texts. Moreover, previous web-based validation models performed only one-way validation (validation from the viewpoint of the source term) rather than joint validation between the source term and the target term. To address these problems, we propose a novel transliteration pair acquisition model that extracts transliteration pairs from the Web and validates them by combining a phonetic similarity model with a joint web-validation model. Experiments demonstrated that our transliteration pair acquisition model was effective.
Keywords: English Term; Target Term; Phonetic Similarity; Bilingual Corpus; Candidate Extraction