Abstract
Traditional methods for extracting Named Entity (NE) translation equivalents usually rely on large-scale parallel or comparable corpora. Bilingual corpus resources are relatively scarce, however, and this limits how practical such methods are. Exploiting features shared by Chinese and Japanese, we propose a method that automatically extracts Chinese-Japanese NE translation equivalents from monolingual corpora using inductive learning. The method first uses a Chinese Hanzi and Japanese Kanji comparison table to calculate the similarity between Japanese and Chinese NE instances. It then applies inductive learning to obtain partial NE translation rules by extracting the differences between highly similar Chinese and Japanese NE instances. Finally, a feedback process refreshes the Chinese-Japanese NE similarities and the translation rule sets. Experimental results show that the proposed method is simple and efficient, and that it removes the traditional methods' dependence on bilingual resources.
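The pipeline described in the abstract (table-based similarity, rule extraction from the differences between similar NE pairs, and a feedback loop) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny, hypothetical Kanji-to-Hanzi table (`KANJI_TO_HANZI`). Python's `difflib.SequenceMatcher.ratio()` stands in for the paper's similarity measure. The "replace" blocks from `get_opcodes()` act as a simplified stand-in for inductive learning. The function names, the toy NE lists, and the 0.7 threshold are all hypothetical.

```python
from collections.abc import Collection
from difflib import SequenceMatcher

# Hypothetical excerpt of a Kanji -> simplified Hanzi comparison table.
# Characters absent from the table (e.g. 京, 大, 学) are assumed identical in both scripts.
KANJI_TO_HANZI = {"東": "东", "駅": "驿", "経": "经", "済": "济", "広": "广", "島": "岛"}

Rule = tuple[str, str]  # (Japanese segment, Chinese segment)


def to_hanzi(ja: str, table: dict[str, str]) -> str:
    """Map each Kanji to its Chinese counterpart, one character to one character."""
    return "".join(table.get(ch, ch) for ch in ja)


def similarity(ja: str, zh: str, table: dict[str, str], rules: Collection[Rule] = ()) -> float:
    """Similarity in [0, 1]: apply learned rules, map Kanji to Hanzi, then compare."""
    for src, tgt in sorted(rules, key=lambda r: (-len(r[0]), r)):  # longest rules first
        ja = ja.replace(src, tgt)
    return SequenceMatcher(None, to_hanzi(ja, table), zh).ratio()


def extract_rules(ja: str, zh: str, table: dict[str, str]) -> set[Rule]:
    """Turn the differing segments of a similar NE pair into partial translation rules.

    to_hanzi is one-to-one per character, so opcode indices on the mapped
    string also index the original Japanese string.
    """
    ops = SequenceMatcher(None, to_hanzi(ja, table), zh).get_opcodes()
    return {(ja[i1:i2], zh[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag == "replace"}


def match(ja_nes: list[str], zh_nes: list[str], table: dict[str, str],
          rules: Collection[Rule], threshold: float) -> list[Rule]:
    """Pair each Japanese NE with its most similar Chinese NE, if above the threshold."""
    pairs = []
    for ja in ja_nes:
        best = max(zh_nes, key=lambda zh: similarity(ja, zh, table, rules))
        if similarity(ja, best, table, rules) >= threshold:
            pairs.append((ja, best))
    return pairs


def induce(ja_nes: list[str], zh_nes: list[str], table: dict[str, str],
           threshold: float = 0.7, max_iter: int = 5) -> tuple[list[Rule], set[Rule]]:
    """Feedback loop: matched pairs yield rules, and rules refresh the similarities."""
    rules: set[Rule] = set()
    for _ in range(max_iter):
        pairs = match(ja_nes, zh_nes, table, rules, threshold)
        new = set().union(*(extract_rules(ja, zh, table) for ja, zh in pairs))
        if new <= rules:  # nothing new learned, so stop refreshing
            break
        rules |= new
    return match(ja_nes, zh_nes, table, rules, threshold), rules


if __name__ == "__main__":
    ja_nes = ["東京駅", "名古屋駅", "経済大学"]  # toy Japanese NE list
    zh_nes = ["东京站", "名古屋站", "经济大学"]  # toy Chinese NE list
    pairs, rules = induce(ja_nes, zh_nes, KANJI_TO_HANZI)
    print(rules)  # {('駅', '站')}
    print(pairs)  # [('東京駅', '东京站'), ('名古屋駅', '名古屋站'), ('経済大学', '经济大学')]
```

In this toy run, 東京駅 first scores only 4/6 against 东京站 because the table maps 駅 to 驿 rather than 站, so it falls below the assumed threshold. The pair 名古屋駅/名古屋站 scores 0.75 and yields the rule 駅 → 站. On the next pass the refreshed score for 東京駅 reaches 1.0, which shows how feedback can recover pairs that character mapping alone misses.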
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ru, K., Xu, J., Zhang, Y., Wu, P. (2013). A Method to Construct Chinese-Japanese Named Entity Translation Equivalents Using Monolingual Corpora. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_16
DOI: https://doi.org/10.1007/978-3-642-41644-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science, Computer Science (R0)