PQAC-WN: constructing a wordnet for Pre-Qin ancient Chinese
Abstract
The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing, and many wordnets are now under construction for languages worldwide. In this paper, we construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to support semantic information processing of PQAC. Most recently constructed wordnets have been built either manually by experts or automatically from resources from which translation pairs between English and the target language can be extracted. The former method is time-consuming, and the latter cannot be applied to PQAC owing to a lack of such language resources. We therefore propose a method based on word definitions in a monolingual dictionary. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based word sense disambiguation (WSD). Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and a PWN synset. In this research, 66% of PQAC senses can be shared with English, and another 14% are language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85%.
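The abstract does not say how kernel words are extracted, which resource supplies their candidate senses, how sense similarity is measured, or how the "optimal" sense is selected. The following is therefore only a minimal sketch of the kernel-word, graph-based WSD, and mapping steps under stated assumptions: the kernel words and their candidate senses are given as input; the sense labels (`horse/animal`, `chariot/xiangqi`, and so on), the similarity table `TOY_SIM`, and the helpers `toy_sim`, `disambiguate`, and `map_definition_to_synset` are all hypothetical; disambiguation uses weighted degree centrality, one of the simplest centrality measures used in unsupervised graph-based WSD; and the optional `min_score` cut-off, which routes weakly supported senses to a new synset, is an illustrative assumption rather than the authors' procedure.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Vertex = Tuple[str, str]  # (kernel word, candidate sense id)


def disambiguate(
    candidates: Dict[str, List[str]],
    sim: Callable[[str, str], float],
) -> Tuple[Dict[str, str], Dict[Vertex, float]]:
    """Pick one sense per kernel word by weighted degree centrality."""
    # One vertex per (kernel word, candidate sense).
    vertices = [(w, s) for w, senses in candidates.items() for s in senses]
    scores = {v: 0.0 for v in vertices}
    for (w1, s1), (w2, s2) in combinations(vertices, 2):
        if w1 == w2:
            continue  # senses of the same word compete, so no edge joins them
        weight = sim(s1, s2)
        scores[(w1, s1)] += weight
        scores[(w2, s2)] += weight
    # Each kernel word keeps its best-connected sense (the first one wins ties).
    chosen = {
        w: max(senses, key=lambda s: scores[(w, s)])
        for w, senses in candidates.items()
    }
    return chosen, scores


def map_definition_to_synset(
    candidates: Dict[str, List[str]],
    sim: Callable[[str, str], float],
    min_score: float = 0.0,
) -> Optional[str]:
    """Return the sense id that guides the mapping, or None for a new synset."""
    chosen, scores = disambiguate(candidates, sim)
    # Assumption: the "optimal" sense is the disambiguated kernel-word sense
    # with the highest centrality score.
    best_word = max(chosen, key=lambda w: scores[(w, chosen[w])])
    best_score = scores[(best_word, chosen[best_word])]
    # Assumption: a weakly supported mapping is treated as language-specific,
    # i.e. the PQAC sense would be added to the wordnet as a new synset.
    return chosen[best_word] if best_score >= min_score else None


# Hypothetical similarity scores between hypothetical sense labels.
TOY_SIM = {
    frozenset({"horse/animal", "chariot/vehicle"}): 0.6,
    frozenset({"horse/animal", "harness/yoke"}): 0.5,
    frozenset({"chariot/vehicle", "harness/yoke"}): 0.4,
    frozenset({"horse/xiangqi", "chariot/xiangqi"}): 0.9,
}


def toy_sim(a: str, b: str) -> float:
    """Symmetric lookup; unrelated sense pairs get similarity 0."""
    return TOY_SIM.get(frozenset({a, b}), 0.0)


# Hypothetical kernel words of one definition, with their candidate senses.
KERNEL_SENSES = {
    "馬": ["horse/animal", "horse/xiangqi"],
    "車": ["chariot/vehicle", "chariot/xiangqi"],
    "駕": ["harness/yoke"],
}

if __name__ == "__main__":
    print(disambiguate(KERNEL_SENSES, toy_sim)[0])
    print(map_definition_to_synset(KERNEL_SENSES, toy_sim))  # → horse/animal
```

With these toy scores, the two xiangqi (Chinese chess) readings reinforce each other (0.9 each) but lose to the vehicle reading, so 馬 resolves to `horse/animal` (score 1.1) and 車 to `chariot/vehicle` (1.0); `horse/animal` then has the highest centrality and guides the mapping. Weighted degree was chosen only for brevity; any other graph centrality (for example PageRank) could be substituted in `disambiguate` without changing the mapping step.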
Keywords
Definition-to-synset mapping; Pre-Qin ancient Chinese; Graph-based WSD; Global wordnet
Acknowledgments
We are grateful to the reviewers for their comments. This work is a staged outcome of projects supported by the National Social Science Foundation of China (10&ZD117, 12&ZD177) and the Ministry of Education of China (16YJC740034).