Semantic Measures: How Similar? How Related?
There are two main types of semantic measures (SM): similarity and relatedness. There are also two main types of datasets, those intended for similarity evaluations and those intended for relatedness evaluations. Although the two notions are clearly distinct, they are close enough to generate some misconceptions.
Is there confusion between similarity and relatedness within the semantic measures community, among both the designers of SMs and the creators of benchmarks? This is the question that the research presented in this paper tries to answer. We surveyed both SMs and datasets and performed a cross evaluation of those measures and datasets. The results show that measures differ in their consistency with datasets of the same type. This research enabled us not only to conclude that there is indeed some confusion but also to pinpoint the SMs and benchmarks that are less consistent with their intended type.
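The cross evaluation mentioned above can be pictured as scoring every measure against every dataset and tabulating the agreement. The following is only a minimal sketch under stated assumptions: the abstract does not say which agreement statistic was used, so Spearman rank correlation is assumed here, and the names `ranks`, `spearman`, and `cross_evaluate` are hypothetical helpers rather than anything from the paper.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cross_evaluate(
    measures: Dict[str, Callable[[str, str], float]],
    datasets: Dict[str, Dict[Pair, float]],
) -> Dict[Tuple[str, str], float]:
    """Correlate each measure's scores with each dataset's gold ratings."""
    table: Dict[Tuple[str, str], float] = {}
    for measure_name, measure in measures.items():
        for dataset_name, gold in datasets.items():
            pairs = list(gold)
            predicted = [measure(a, b) for a, b in pairs]
            expected = [gold[pair] for pair in pairs]
            table[(measure_name, dataset_name)] = spearman(predicted, expected)
    return table
```

Under this reading, a measure designed for similarity would be expected to correlate better with similarity datasets than with relatedness datasets, and a cell of the table that breaks this pattern would point to a measure or benchmark that is less consistent with its intended type.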
Keywords: Semantic similarity, Semantic relatedness, Semantic measures, Linked data
This work is partially financed by the ERDF European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme and by the FCT within project POCI-01-0145-FEDER-006961 and project “NORTE-01-0145-FEDER-000020” financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement and through the European Regional Development Fund (ERDF).