Language Resources and Evaluation, Volume 51, Issue 2, pp 525–545

PQAC-WN: constructing a wordnet for Pre-Qin ancient Chinese

  • Yingjie Zhang
  • Bin Li
  • Xinyu Dai
  • Shujian Huang
  • Jiajun Chen
Project Notes

Abstract

The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing, and wordnets are now being built for many languages worldwide. In this paper, we construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to support semantic processing of PQAC. Most recently constructed wordnets have been built either manually by experts or automatically from resources that yield translation pairs between English and the target language. The manual approach is time-consuming, and the automatic approach cannot be applied to PQAC because such language resources are lacking. We therefore propose a method based on word definitions in a monolingual dictionary. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based Word Sense Disambiguation (WSD). Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and a PWN synset. We find that 66 % of PQAC senses can be shared with English, and a further 14 % are language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85 %.
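The graph-based disambiguation step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sense inventory `SENSES`, the English example words, the Jaccard gloss overlap used as the edge weight, and the weighted-degree scoring are all assumptions chosen only to show the general idea of picking, for each kernel word, the sense that is best connected to the senses of the other kernel words.

```python
from itertools import combinations

# Hypothetical toy sense inventory (not from the paper): each kernel word
# maps to its candidate senses, and each sense has a small bag of gloss words.
SENSES = {
    "bank": {
        "bank#river": {"river", "water", "shore", "land"},
        "bank#money": {"money", "deposit", "institution"},
    },
    "water": {
        "water#liquid": {"liquid", "river", "drink", "water"},
        "water#verb": {"pour", "plant", "garden"},
    },
    "shore": {
        "shore#land": {"land", "water", "edge", "river"},
    },
}


def similarity(gloss_a, gloss_b):
    """Jaccard overlap of two gloss bags (an assumed stand-in edge weight)."""
    union = gloss_a | gloss_b
    return len(gloss_a & gloss_b) / len(union) if union else 0.0


def disambiguate(kernel_words, senses=SENSES):
    """Choose, for each kernel word, the sense with the highest weighted degree."""
    # Nodes of the graph: every candidate sense of every kernel word.
    score = {s: 0.0 for w in kernel_words for s in senses[w]}
    # Edges: only between senses of different words, weighted by similarity.
    for w1, w2 in combinations(kernel_words, 2):
        for s1, g1 in senses[w1].items():
            for s2, g2 in senses[w2].items():
                weight = similarity(g1, g2)
                score[s1] += weight
                score[s2] += weight
    return {w: max(senses[w], key=lambda s: score[s]) for w in kernel_words}


chosen = disambiguate(["bank", "water", "shore"])
```

In this toy case the money sense of "bank" shares no gloss words with the other kernel words, so its score stays at zero and the river sense is selected. A real system would replace the gloss overlap with a similarity measure suited to the dictionary at hand.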

Keywords

Definition-to-synset mapping, Pre-Qin ancient Chinese, Graph-based WSD, Global wordnet

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Yingjie Zhang (1)
  • Bin Li (2)
  • Xinyu Dai (1)
  • Shujian Huang (1)
  • Jiajun Chen (1)

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. School of Chinese Language and Literature, Nanjing Normal University, Nanjing, China
