Measurement of Semantic Similarity: A Concept Hierarchy Based Approach

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 44)

Abstract

Resolving semantic heterogeneity is a major issue in many fields, including natural language processing, search engine development, document clustering, geospatial information retrieval, and knowledge discovery. Semantic heterogeneity is often regarded as an obstacle to full interoperability among diverse datasets. An appropriate metric is essential for understanding the extent of similarity between concepts. The proposed approach is based on a concept hierarchy built from WordNet, a semantic lexical database. A metric is also proposed to quantify the extent of similarity between a pair of concepts. The work is compared with existing methodologies on the Miller-Charles benchmark dataset using three correlation coefficients (Pearson’s, Spearman’s, and Kendall’s tau rank correlation coefficients). The proposed approach is found to yield better results than most existing techniques.
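
The evaluation described above compares computed similarity scores against human ratings using three correlation coefficients. The following is a minimal sketch of how such a comparison could be carried out; it is not the authors' code. The function names and the two score lists are hypothetical, and the values are toy numbers rather than the Miller-Charles ratings. Kendall's tau is computed here in its simplest form (tau-a), which assumes no tied values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)


# Hypothetical toy values for five word pairs, NOT the Miller-Charles data.
human_ratings = [3.9, 3.1, 1.2, 0.4, 2.5]
computed_scores = [0.95, 0.80, 0.30, 0.35, 0.60]

print("Pearson :", pearson(human_ratings, computed_scores))
print("Spearman:", spearman(human_ratings, computed_scores))
print("Kendall :", kendall_tau(human_ratings, computed_scores))
```

For these toy lists, one of the ten pairs is ordered differently by the two score lists, so Kendall's tau-a is 0.8 and Spearman's coefficient is approximately 0.9. The rank-based coefficients depend only on the ordering of the scores, whereas Pearson's coefficient is also sensitive to how linearly the computed scores track the ratings.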

Keywords

Semantic heterogeneity, Concept hierarchy, WordNet, Correlation coefficient

Copyright information

© Springer India 2016

Authors and Affiliations

  1. School of Information Technology, Indian Institute of Technology, Kharagpur, India
