Using a Chinese Lexicon to Learn Sense Embeddings and Measure Semantic Similarity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11221)

Abstract

Word embeddings are widely used to model words in Natural Language Processing (NLP) tasks, including semantic similarity measurement. However, they cannot capture polysemy, because a polysemous word is represented by a single vector. A natural remedy is to learn a separate embedding vector for each sense of a word. We present a novel approach that learns sense embeddings from a Chinese lexicon. Every sense is represented by a vector composed of the semantic contributions made by the senses that explain it. To make full use of the lexicon's advantages and address its drawbacks, we perform representation expansion to make sparse embedding vectors dense, and we disambiguate polysemous words in glosses by semantic contribution allocation. An intuitive noise-filtering step yields noticeable improvements in both dimensionality reduction and semantic similarity measurement. We run experiments on a translated version of the Miller-Charles dataset and report state-of-the-art performance on semantic similarity measurement. We also apply our approach to SemEval-2012 Task 4: Evaluating Chinese Word Similarity, which uses a translated version of WordSim-353 as the standard dataset, where it also noticeably outperforms conventional approaches.
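The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical toy lexicon (`GLOSSES`), an equal split of contribution among the senses in a gloss, a single damped expansion round (the `decay` parameter is invented for illustration), and word similarity taken as the maximum cosine over sense pairs. Gloss disambiguation and noise filtering are not modeled.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon: each sense id maps to the sense ids used in its gloss.
GLOSSES = {
    "car#1": ["vehicle#1", "wheel#1"],
    "automobile#1": ["vehicle#1", "engine#1"],
    "bank#1": ["money#1", "institution#1"],
    "bank#2": ["river#1", "land#1"],
    "vehicle#1": ["transport#1"],
}


def base_vector(sense):
    """Sparse vector: every explaining sense gets an equal share (an assumption)."""
    explainers = GLOSSES.get(sense, [])
    if not explainers:
        return {}
    share = 1.0 / len(explainers)
    vec = defaultdict(float)
    for explainer in explainers:
        vec[explainer] += share
    return dict(vec)


def expand(sense, decay=0.5):
    """One expansion round: add damped contributions from the explainers' own glosses."""
    base = base_vector(sense)
    vec = defaultdict(float, base)
    for explainer, weight in base.items():
        for inner, inner_weight in base_vector(explainer).items():
            vec[inner] += decay * weight * inner_weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(senses_a, senses_b):
    """Similarity of two words as the best match over all of their sense pairs."""
    return max(cosine(expand(a), expand(b)) for a in senses_a for b in senses_b)


if __name__ == "__main__":
    print(word_similarity(["car#1"], ["automobile#1"]))
    print(word_similarity(["bank#1", "bank#2"], ["car#1"]))
```

In this toy setup, the expansion step gives "car#1" and "automobile#1" a shared "transport#1" dimension that neither gloss mentions directly, which is the sense in which expansion makes sparse vectors denser.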

Corresponding author

Correspondence to Yuquan Chen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhen, Z., Chen, Y. (2018). Using a Chinese Lexicon to Learn Sense Embeddings and Measure Semantic Similarity. In: Sun, M., Liu, T., Wang, X., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2018. Lecture Notes in Computer Science, vol 11221. Springer, Cham. https://doi.org/10.1007/978-3-030-01716-3_17

  • DOI: https://doi.org/10.1007/978-3-030-01716-3_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01715-6

  • Online ISBN: 978-3-030-01716-3

  • eBook Packages: Computer Science, Computer Science (R0)
