PQAC-WN: constructing a wordnet for Pre-Qin ancient Chinese

Abstract

The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing, and wordnets are now being built for many languages worldwide. In this paper, we construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to support the processing of PQAC semantic information. Most recently constructed wordnets have been built either manually by experts or automatically from resources from which translation pairs between English and the target language can be extracted. The former method is time-consuming, and the latter cannot be applied to PQAC because the required language resources are lacking. We therefore propose a method based on word definitions in a monolingual dictionary. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based word sense disambiguation (WSD). Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and a PWN synset. We find that 66% of PQAC senses can be shared with English, and a further 14% are language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85%.
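The pipeline summarized in the abstract (kernel words, graph-based WSD, then one chosen sense guiding the mapping) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy English kernel words, the synset-style identifiers, the Jaccard gloss overlap used as the similarity measure, and the rule of taking the highest-scoring sense as the "optimal" one.

```python
from itertools import combinations

# Hypothetical toy inventory: kernel word -> {candidate sense id: gloss tokens}.
# The identifiers only imitate PWN-style names; they are not looked up anywhere.
CANDIDATES = {
    "horse": {
        "horse.n.01": {"animal", "hoofed", "riding"},
        "horse.n.02": {"gymnastic", "apparatus"},
    },
    "ride": {
        "ride.v.01": {"travel", "animal", "riding", "sit"},
        "ride.v.02": {"tease", "mock"},
    },
    "field": {
        "field.n.01": {"land", "animal", "grazing"},
        "field.n.02": {"physics", "force"},
    },
}


def similarity(gloss_a, gloss_b):
    """Jaccard overlap of gloss tokens (a stand-in for a real similarity measure)."""
    union = gloss_a | gloss_b
    return len(gloss_a & gloss_b) / len(union) if union else 0.0


def disambiguate(candidates):
    """Score every candidate sense by weighted degree; keep the best one per word."""
    scores = {sense: 0.0 for senses in candidates.values() for sense in senses}
    # Edges connect senses of *different* kernel words only.
    for (_, senses_a), (_, senses_b) in combinations(candidates.items(), 2):
        for id_a, gloss_a in senses_a.items():
            for id_b, gloss_b in senses_b.items():
                weight = similarity(gloss_a, gloss_b)
                scores[id_a] += weight
                scores[id_b] += weight
    best = {
        word: max(senses, key=lambda sense: scores[sense])
        for word, senses in candidates.items()
    }
    return best, scores


def choose_mapping(candidates):
    """Pick one sense among the disambiguated kernel-word senses (assumed rule: top score)."""
    best, scores = disambiguate(candidates)
    return max(best.values(), key=lambda sense: scores[sense])


if __name__ == "__main__":
    best_senses, _ = disambiguate(CANDIDATES)
    print(best_senses)
    print(choose_mapping(CANDIDATES))
```

In this sketch the chosen sense identifier would then serve as the target PWN synset for the dictionary sense being mapped. A real system would replace the toy glosses with an actual sense inventory and a stronger similarity measure.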

Notes

  1. The 25 books are Zuo Zhuan (左传), Guanzi (管子), Hanfeizi (韩非子), Lv Shi Chun Qiu (吕氏春秋), Liji (礼记), Mohism (墨子), Xunzi (荀子), Guo Yu (国语), Yili (仪礼), Zhuangzi (庄子), The Rites of Zhou (周礼), Gongyang Zhuan (公羊传), Guliang Zhuan (谷梁传), Yanzi Chun Qiu (晏子春秋), Mengzi (孟子), Book of Poetry (诗经), Shang Shu (尚书), Book of Changes (周易), Shang Jun Shu (商君书), The Analects (论语), Chu Ci (楚辞), The Art of War (孙子兵法), Taoism (道德经), Wuzi (吴子) and Xiao Jing (孝经).

  2. URL: http://langsphere.com/. License: CC BY-NC-SA 4.0.

Acknowledgments

We are grateful to the reviewers for their comments. This work is an interim result of projects supported by the National Social Science Foundation of China (10&ZD117, 12&ZD177) and the Ministry of Education of China (16YJC740034).

Author information

Corresponding author

Correspondence to Bin Li.

About this article

Cite this article

Zhang, Y., Li, B., Dai, X. et al. PQAC-WN: constructing a wordnet for Pre-Qin ancient Chinese. Lang Resources & Evaluation 51, 525–545 (2017). https://doi.org/10.1007/s10579-016-9366-3

Keywords

  • Definition-to-synset mapping
  • Pre-Qin ancient Chinese
  • Graph-based WSD
  • Global wordnet