Word Similarity Fails in Multiple Sense Word Embedding

  • Yong Shi
  • Yuanchun Zheng
  • Kun Guo
  • Wei Li
  • Luyao Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10861)


Word representation is a foundational research problem in natural language processing and, compared with other fields such as image and speech processing, remains full of challenges. It embeds words into a dense, low-dimensional vector space and can learn syntax and semantics at the same time. However, this representation assigns only a single vector to each word, whether or not the word is polysemous. To solve this problem, multiple sense language models add sense information in order to learn several alternative vectors for each word. Yet word similarity, the most popular evaluation method for single sense language models, did not achieve the same performance in the multiple sense setting, because word similarity computed with cosine distance does not match the annotated similarity scores. In this paper, we analyzed similarity algorithms and found an obvious gap between cosine distance and the benchmark datasets: the negative interval of the cosine space has no counterpart in the space of manual scores, and cosine similarity does not cover the semantic relatedness contained in the datasets. Based on this analysis, we proposed a new similarity method based on mean square error, and the experiments showed that the new evaluation algorithm provides a better method for evaluating word vector similarity.
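The abstract contrasts cosine-based word similarity with human-annotated scores. The sketch below is illustrative only and is not the authors' proposed method. It shows a common evaluation pipeline under stated assumptions: cosine similarity between sense vectors; two standard ways of collapsing multiple senses into one word-level score (AvgSim, the mean over all sense pairs, and MaxSim, the best-matching pair); Spearman correlation against human ratings; and a plain mean squared error after linearly mapping cosine values from [-1, 1] onto an assumed 0 to 10 rating scale. The helper names are hypothetical, the sense vectors, word pairs and ratings are invented toy data, and the MSE is only a stand-in that makes the scale mismatch visible, since the abstract does not give the exact form of the paper's MSE-based measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors; the result lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def avg_sim(senses_u, senses_v):
    """AvgSim: mean cosine over every pair of senses of the two words."""
    sims = [cosine(s, t) for s in senses_u for t in senses_v]
    return sum(sims) / len(sims)


def max_sim(senses_u, senses_v):
    """MaxSim: cosine of the closest pair of senses."""
    return max(cosine(s, t) for s in senses_u for t in senses_v)


def ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks, so the scale is ignored."""
    return pearson(ranks(x), ranks(y))


def rescale(cos, lo=0.0, hi=10.0):
    """Linearly map a cosine in [-1, 1] onto an assumed [lo, hi] rating scale."""
    return lo + (cos + 1.0) / 2.0 * (hi - lo)


def mse(pred, gold):
    """Mean squared error between predicted and human scores."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


if __name__ == "__main__":
    # Toy 2-d sense inventories, invented purely for illustration.
    senses = {
        "bank": [[1.0, 0.0], [0.0, 1.0]],  # two senses, e.g. "finance" and "river"
        "money": [[0.9, 0.1]],
        "river": [[0.1, 0.9]],
        "cat": [[-0.7, 0.7]],
    }
    # Hypothetical human ratings on an assumed 0-10 scale.
    pairs = [("bank", "money", 8.5), ("bank", "river", 7.0), ("money", "cat", 0.5)]
    gold = [score for _, _, score in pairs]
    for name, agg in (("AvgSim", avg_sim), ("MaxSim", max_sim)):
        pred = [rescale(agg(senses[a], senses[b])) for a, b, _ in pairs]
        print(f"{name}: rho={spearman(pred, gold):.3f}  mse={mse(pred, gold):.3f}")
```

Because Spearman's rho depends only on ranks, it is blind to the fact that cosine values can be negative while the ratings cannot; an error measure such as MSE, by contrast, compares the scores on the rating scale itself, which makes the range mismatch described in the abstract directly measurable.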



This work is supported by the National Natural Science Foundation of China No. 91546201, No. 71331005 and No. 71501175, Shandong Independent Innovation and Achievement Transformation Special Fund of China (2014ZZCX03302), and the Open Project of Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Yong Shi (1, 3, 4, 5)
  • Yuanchun Zheng (2, 3, 4)
  • Kun Guo (3, 4, 5)
  • Wei Li (3, 4, 5)
  • Luyao Zhu (3, 4, 5)

  1. College of Information Science and Technology, University of Nebraska at Omaha, Omaha, USA
  2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  3. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
  4. Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, China
  5. School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China
