Mandarin Relata: A Dataset of Word Relations and Their Semantic Types

  • Hongchao Liu
  • Chu-Ren Huang
  • Ren-kui Hou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10709)

Abstract

Training and evaluating distributional semantic models requires language datasets that are both rich in word-level descriptors and readily intuitive to human judgment. This paper introduces a dataset for Mandarin Chinese built by combining word relation pairs from two distinct sources: corpus extraction and human elicitation. Our results show that although corpus extraction yielded more word relation pairs, human-elicited semantic neighbors were almost twice as likely to agree with human raters. The methods described here produced 4091 word relation pairs that span hypernymy, hyponymy, synonymy, antonymy, and meronymy, together with semantic type information. To date, this is the largest collection of human-rated word relation pairs in Mandarin Chinese.

Keywords

DSM, Dataset, Word relation, Semantic types

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. CBS, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
  2. School of Chinese Language and Literature, Ludong University, Yantai, China
