A Method to Construct Chinese-Japanese Named Entity Translation Equivalents Using Monolingual Corpora

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 400)

Abstract

Traditional methods for extracting named entity (NE) translation equivalents usually rely on large-scale parallel or comparable corpora, so their practical use is constrained by the relative scarcity of bilingual corpus resources. Drawing on features shared by Chinese and Japanese, we propose a method that automatically extracts Chinese-Japanese NE translation equivalents from monolingual corpora using inductive learning. The method uses a Chinese Hanzi and Japanese Kanji comparison table to calculate the similarity between Japanese and Chinese NE instances. Inductive learning then derives partial translation rules for NEs by extracting the differences between highly similar Chinese and Japanese NE instances. Finally, a feedback process updates the Chinese-Japanese NE similarities and the translation rule sets. Experimental results show that the proposed method is simple and efficient, and that it overcomes the dependence of traditional methods on bilingual resources.
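
The first two steps described in the abstract (similarity via a Hanzi-Kanji comparison table, then rule extraction from the differences between similar instances) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two-entry `KANJI_TO_HANZI` table, the function names, the example names, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure. The abstract does not give the paper's actual similarity formula or rule representation, and the feedback step is not modeled.

```python
from difflib import SequenceMatcher

# Hypothetical excerpt of a Kanji-to-Hanzi comparison table; a real table
# would cover thousands of characters.
KANJI_TO_HANZI = {"東": "东", "電": "电"}


def normalize(ja_name: str) -> str:
    """Map each Japanese Kanji to its simplified Hanzi counterpart, if listed."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in ja_name)


def similarity(ja_name: str, zh_name: str) -> float:
    """Character-level similarity in [0, 1] after Kanji-to-Hanzi mapping.

    Assumption: difflib's ratio stands in for the paper's similarity measure.
    """
    return SequenceMatcher(None, normalize(ja_name), zh_name).ratio()


def partial_rules(ja_name: str, zh_name: str) -> list:
    """Return (Japanese, Chinese) segment pairs where the two names differ."""
    matcher = SequenceMatcher(None, normalize(ja_name), zh_name)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" span marks a differing segment on both sides; the
        # mapping is one character to one character, so the indices into the
        # normalized string are also valid for the original Japanese name.
        if tag == "replace":
            rules.append((ja_name[i1:i2], zh_name[j1:j2]))
    return rules
```

With the hypothetical pair `"東京電力株式会社"` and `"东京电力公司"`, the shared prefix matches after mapping and `partial_rules` returns `[("株式会社", "公司")]`, a candidate partial translation rule of the kind the abstract describes.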

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ru, K., Xu, J., Zhang, Y., Wu, P. (2013). A Method to Construct Chinese-Japanese Named Entity Translation Equivalents Using Monolingual Corpora. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_16

  • DOI: https://doi.org/10.1007/978-3-642-41644-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41643-9

  • Online ISBN: 978-3-642-41644-6

  • eBook Packages: Computer Science, Computer Science (R0)
