
Embeddings Evaluation Using a Novel Measure of Semantic Similarity


Lexical taxonomies and distributional representations are widely used in NLP applications, including the measurement of semantic similarity. Recently, several scholars have proposed new approaches to combining these resources into unified representations that preserve both distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words using the Hierarchical Semantic Similarity (HSS). HSS is a new measure we previously developed, and we show that it outperforms previous measures on several benchmark tasks. Then, we train several embedding models on a text corpus and select the best one. This is the model that maximizes the correlation between the HSS and the cosine similarity of the word pairs that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other benchmark measures of semantic similarity. We use the vectors of the four selected embeddings as features for machine learning models that perform several NLP tasks. The performance on those tasks constitutes an extrinsic evaluation of the criterion used to select the best embedding (i.e. the adopted semantic similarity measure). Experimental results show two things. First, HSS outperforms state-of-the-art measures of semantic similarity in taxonomies on a benchmark intrinsic evaluation. Second, the embedding selected through TaxoVec clearly outperforms the embeddings selected by the competing measures on benchmark NLP tasks. We implemented the HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at
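The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TaxoVec or TaxoSS implementation:

- The abstract does not say which correlation coefficient is used, so Spearman's rank correlation is assumed here.
- The helper names (`select_embedding`, `average_ranks`, etc.) are hypothetical.
- The similarity values and two-dimensional vectors are made-up stand-ins for real HSS scores and trained embedding models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_embedding(taxo_sim, models):
    """Pick the model whose cosine similarities best track taxo_sim (hypothetical helper).

    taxo_sim: {(word1, word2): taxonomic similarity score}
    models:   {model_name: {word: vector}}
    Returns (best_model_name, {model_name: correlation}).
    """
    scores = {}
    for name, vectors in models.items():
        gold, pred = [], []
        for (w1, w2), sim in taxo_sim.items():
            # Keep only pairs covered by both the taxonomy and the model vocabulary.
            if w1 in vectors and w2 in vectors:
                gold.append(sim)
                pred.append(cosine(vectors[w1], vectors[w2]))
        if len(gold) >= 2:
            scores[name] = spearman(gold, pred)
    if not scores:
        raise ValueError("no model covers at least two taxonomy pairs")
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: the similarity values are invented, not real HSS scores.
taxo_sim = {
    ("dog", "cat"): 0.8,
    ("car", "truck"): 0.9,
    ("dog", "car"): 0.2,
    ("cat", "truck"): 0.1,
    ("dog", "wolf"): 0.7,  # "wolf" is missing from both models, so it is skipped
}
models = {
    "model_a": {"dog": [0.866, 0.5], "cat": [1.0, 0.0],
                "car": [0.174, 0.985], "truck": [0.0, 1.0]},
    "model_b": {"dog": [1.0, 0.0], "cat": [0.0, 1.0],
                "car": [0.985, 0.174], "truck": [0.342, 0.940]},
}

best, scores = select_embedding(taxo_sim, models)
print(best, scores)
```

With these toy inputs, `model_a` is selected (rho of about 1.0 versus about -0.6 for `model_b`). Its cosine similarities rank the four covered pairs in the same order as the taxonomic scores. In practice, the vectors would come from the trained embedding models and the scores from HSS. Running the same function with scores from a competing measure mirrors how the selection is repeated for the extrinsic evaluation.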


Fig. 1
Fig. 2
Fig. 3



  2. Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation, similarly to t-SNE, but also for general non-linear dimension reduction.




Author information


Corresponding author

Correspondence to Lorenzo Malandri.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Conflict of Interest

All authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Giabelli, A., Malandri, L., Mercorio, F. et al. Embeddings Evaluation Using a Novel Measure of Semantic Similarity. Cogn Comput 14, 749–763 (2022).
