Using a Chinese Lexicon to Learn Sense Embeddings and Measure Semantic Similarity

Zhen, Zhuo; Chen, Yuquan

doi:10.1007/978-3-030-01716-3_17

Using a Chinese Lexicon to Learn Sense Embeddings and Measure Semantic Similarity

Zhuo Zhen¹⁸ &
Yuquan Chen¹⁸

Conference paper
First Online: 07 October 2018

1380 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11221))

Abstract

Word embeddings have recently been widely used to model words in Natural Language Processing (NLP) tasks including semantic similarity measurement. However, word embeddings are not able to capture polysemy, because a polysemous word is represented by a single vector. To address this problem, learning multiple embedding vectors for different senses of a word is necessary and intuitive. We present a novel approach based on a Chinese lexicon to learn sense embeddings. Every sense is represented by a vector that consists of semantic contributions made by senses explaining it. To make full use of the lexicon’s advantages and address its drawbacks, we perform representation expansion to make sparse embedding vectors dense and disambiguate in gloss polysemous words by semantic contribution allocation. Thanks to the use of an intuitive way of noise filtering, we achieve noticeable improvement both in dimensionality reduction and semantic similarity measurement. We perform experiments on a translated version of Miller-Charles dataset and report state-of-the-art performance on semantic similarity measurement. We also apply our approach to SemEval-2012 Task4: Evaluating Chinese Word Similarity, which uses a translated version of wordsim353 as the standard dataset, and our approach also noticeably outperforms conventional approaches.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3(Feb), 1137–1155 (2003)
MATH Google Scholar
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Google Scholar
Reisinger, J., Mooney, R.J.: Multi-prototype vector-space models of word meaning. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 109–117. Association for Computational Linguistics (2010)
Google Scholar
Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol. 1, pp. 873–882. Association for Computational Linguistics (2012)
Google Scholar
Tian, F., et al.: A probabilistic model for learning multi-prototype word embeddings. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 151–160 (2014)
Google Scholar
Pilehvar, M.T., Collier, N.: De-conflated semantic representations. arXiv preprint arXiv:1608.01961 (2016)
Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: Nasari: a novel approach to a semantically-aware representation of items. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 567–577 (2015)
Google Scholar
Miller, G.A.: Wordnet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Article Google Scholar
Iacobacci, I., Pilehvar, M.T., Navigli, R.: Sensembed: learning sense embeddings for word and relational similarity. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), vol. 1, pp. 95–105 (2015)
Google Scholar
Zhou, H., Jia, C., Yang, Y., Ning, S., Lin, Y., Huang, D.: Combining large-scale unlabeled corpus and lexicon for Chinese polysemous word similarity computation. In: Wen, J., Nie, J., Ruan, T., Liu, Y., Qian, T. (eds.) CCIR 2017. LNCS, vol. 10390, pp. 198–210. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68699-8_16
Chapter Google Scholar
Šuster, S., Titov, I., van Noord, G.: Bilingual learning of multi-sense embeddings with discrete autoencoders. arXiv preprint arXiv:1603.09128 (2016)
Guo, J., Che, W., Wang, H., Liu, T.: Learning sense-specific word embeddings by exploiting bilingual resources. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 497–507 (2014)
Google Scholar
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval, vol. 463. ACM Press, New York (1999)
Google Scholar
Zhao, J., Liu, H., Lu, R.: Attribute-base computing of word similarity. In: The 11th China Conference on Machine Learning (2008)
Google Scholar
Lv, S., Ding, S.: Chinese Modern Dictionary. The Commercial Press, Beijing (2005)
Google Scholar
Miller, G.A., Charles, W.G.: Contextual correlates of semantic similarity. Lang. Cogn. Process. 6(1), 1–28 (1991)
Article MathSciNet Google Scholar
Evgeniy, L.F., Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G.: Placing search in context: the concept revisited. ACM Trans. Inf. Syst. 20(1), 116–131 (2002)
Article Google Scholar
Liu, Q.: Word similarity computing based on hownet. Comput. Linguist. Chin. Lang. Process. 7(2), 59–76 (2002)
Google Scholar
Chen, H.H., Lin, M.S., Wei, Y.C.: Novel association measures using web search with double checking. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 1009–1016. Association for Computational Linguistics (2006)
Google Scholar
Liu, H., Zhao, J., Lu, R.: Computing semantic similarities based on machine-readable dictionaries. In: IEEE International Workshop on Semantic Computing and Systems, WSCS 2008, pp. 8–14. IEEE (2008)
Google Scholar
Jin, P., Wu, Y.: Semeval-2012 task 4: evaluating Chinese word similarity. In: Joint Conference on Lexical and Computational Semantics, pp. 374–377 (2012)
Google Scholar
Chen, X., Liu, Z., Sun, M.: A unified model for word sense representation and disambiguation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1025–1035 (2014)
Google Scholar
Qiu, L., Tu, K., Yu, Y.: Context-dependent sense embedding. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 183–191 (2016)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, People’s Republic of China
Zhuo Zhen & Yuquan Chen

Authors

Zhuo Zhen
View author publications
You can also search for this author in PubMed Google Scholar
Yuquan Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yuquan Chen .

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Harbin Institute of Technology, Harbin, China
Ting Liu
Beijing University of Posts and Telecommunications, Beijing, China
Xiaojie Wang
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhen, Z., Chen, Y. (2018). Using a Chinese Lexicon to Learn Sense Embeddings and Measure Semantic Similarity. In: Sun, M., Liu, T., Wang, X., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2018 2018. Lecture Notes in Computer Science(), vol 11221. Springer, Cham. https://doi.org/10.1007/978-3-030-01716-3_17

Download citation

DOI: https://doi.org/10.1007/978-3-030-01716-3_17
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01715-6
Online ISBN: 978-3-030-01716-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics