A Method to Construct Chinese-Japanese Named Entity Translation Equivalents Using Monolingual Corpora

Ru, Kuang; Xu, Jinan; Zhang, Yujie; Wu, Peihao

doi:10.1007/978-3-642-41644-6_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 400)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Traditional methods for extracting named entity (NE) translation equivalents usually rely on large-scale parallel or comparable corpora, so their practical use is limited by the relative scarcity of bilingual corpus resources. Exploiting features shared by Chinese and Japanese, we propose a method that automatically extracts Chinese-Japanese NE translation equivalents from monolingual corpora using inductive learning. The method first uses a Chinese Hanzi and Japanese Kanji comparison table to calculate the similarity between Japanese and Chinese NE instances. It then applies inductive learning to obtain partial translation rules for NEs by extracting the differences between highly similar Chinese and Japanese NE instances. Finally, a feedback process updates the Chinese-Japanese NE similarities and the translation rule sets. Experimental results show that the proposed method is simple and efficient, and that it overcomes the dependence of traditional methods on bilingual resources.
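
The first two steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-entry mapping table, the function names, and the use of Python's difflib.SequenceMatcher as the similarity measure are all assumptions made for illustration, since the abstract does not give the actual formula or table contents. The feedback step is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical toy excerpt of a Kanji -> Hanzi comparison table.
# Every value is a single character, so string positions are preserved.
KANJI_TO_HANZI = {"東": "东", "銀": "银", "業": "业"}


def normalize(ja_ne: str) -> str:
    """Map each Japanese Kanji to its Chinese Hanzi counterpart if known."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in ja_ne)


def similarity(ja_ne: str, zh_ne: str) -> float:
    """Character-level similarity in [0, 1] after Kanji -> Hanzi mapping.

    Assumption: SequenceMatcher.ratio() stands in for the paper's measure.
    """
    return SequenceMatcher(None, normalize(ja_ne), zh_ne).ratio()


def extract_rules(ja_ne: str, zh_ne: str) -> list[tuple[str, str]]:
    """Return (Japanese part, Chinese part) pairs where the two NEs differ.

    The differing segments of a highly similar pair are treated as
    candidate partial translation rules.
    """
    matcher = SequenceMatcher(None, normalize(ja_ne), zh_ne)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Indices into the normalized string are valid for ja_ne too,
            # because normalization is one character to one character.
            rules.append((ja_ne[i1:i2], zh_ne[j1:j2]))
    return rules


if __name__ == "__main__":
    for ja, zh in [("東京大学", "东京大学"), ("東京駅", "东京站")]:
        print(ja, zh, round(similarity(ja, zh), 3), extract_rules(ja, zh))
```

In this sketch, a pair that is identical after mapping yields no rules, while a pair that differs in one segment yields that segment as a candidate rule. In a fuller pipeline, only pairs whose similarity exceeds some threshold would be passed to rule extraction, and the learned rules would then be fed back to rescore the remaining pairs.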

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Kuang Ru, Jinan Xu, Yujie Zhang & Peihao Wu

Editor information

Editors and Affiliations

Soochow University, 1 Shizi Street, 215006, Suzhou, China
Guodong Zhou
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Juanzi Li
Institute of Computer Science & Technology, Peking University, 100871, Beijing, China
Dongyan Zhao & Yansong Feng

About this paper

Cite this paper

Ru, K., Xu, J., Zhang, Y., Wu, P. (2013). A Method to Construct Chinese-Japanese Named Entity Translation Equivalents Using Monolingual Corpora. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_16

DOI: https://doi.org/10.1007/978-3-642-41644-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science, Computer Science (R0)
