Automatic Computation of Semantic Proximity Using Taxonomic Knowledge

  • Cai-Nicolas Ziegler
  • Kai Simon
  • Georg Lausen
Part of the Studies in Computational Intelligence book series (SCI, volume 406)


Taxonomic measures of semantic proximity allow us to compute the relatedness of two concepts. These metrics are versatile instruments required for diverse applications, e.g., the Semantic Web, linguistics, and text mining. However, most approaches are geared towards hand-crafted taxonomic dictionaries such as WordNet, which cover only a limited fraction of real-world concepts. More specific concepts, and particularly instances of concepts (e.g., names of artists, locations, and brand names), are not covered.
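To make the idea of a taxonomic proximity metric concrete, here is a minimal sketch of one classic measure of this kind, Lin's information-theoretic similarity, computed on a tiny hand-built taxonomy. The taxonomy, the concept counts, and the function names are hypothetical illustrations. They are not WordNet data and not the metric proposed in this paper. The sketch also shows the coverage problem described above: a term missing from the taxonomy, such as an artist's name, gets no score at all.

```python
import math

# Hypothetical toy taxonomy: child -> parent (the root "entity" has no entry).
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Hypothetical corpus counts for each concept (occurrences of the concept itself).
COUNTS = {"entity": 0, "animal": 10, "artifact": 10, "dog": 30, "cat": 30, "car": 20}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def probability(concept):
    """p(c): share of all counts that fall on c or on any of its descendants."""
    total = sum(COUNTS.values())
    subtree = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return subtree / total


def information_content(concept):
    """IC(c) = -log p(c); the root has probability 1 and therefore IC 0."""
    return -math.log(probability(concept))


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(lcs) / (IC(a) + IC(b)).

    Returns None when either term is absent from the taxonomy, which is the
    coverage gap for instances such as artist or brand names.
    """
    if a not in COUNTS or b not in COUNTS:
        return None
    ancestors_of_b = set(ancestors(b))
    # Lowest common subsumer: first ancestor of a that also subsumes b.
    lcs = next(c for c in ancestors(a) if c in ancestors_of_b)
    denominator = information_content(a) + information_content(b)
    if denominator == 0:
        return 1.0  # both terms are the root itself
    return 2 * information_content(lcs) / denominator
```

With these made-up counts, `lin_similarity("dog", "cat")` is positive because both concepts fall under "animal", `lin_similarity("dog", "car")` is 0 because their only common subsumer is the root, and `lin_similarity("dog", "Madonna")` returns `None` because the name is not in the taxonomy.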

The contributions of this paper are twofold. First, we introduce a framework based on Google and the Open Directory Project (ODP), enabling us to derive the semantic proximity between arbitrary concepts and instances. Second, we introduce a new taxonomy-driven proximity metric tailored for our framework. Studies with human subjects corroborate our hypothesis that our new metric outperforms benchmark semantic proximity metrics and comes close to human judgement.
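The general framework idea, representing an arbitrary term by the taxonomy categories attached to its search results and then comparing those representations, can be sketched as follows. This is a hedged illustration under several assumptions: the search step is replaced by a hard-coded dictionary of invented ODP-style category paths (no Google or ODP API is called), scores are propagated to ancestor categories with an assumed decay factor, and profiles are compared with plain cosine similarity. It is not the authors' new metric, whose details the abstract does not give; all names here are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for a directory search: term -> ODP-style category
# paths of its top-ranked results. These paths are invented for illustration;
# a real system would obtain them from a search service.
FAKE_RESULTS = {
    "Miles Davis": ["Arts/Music/Styles/Jazz/Artists", "Arts/Music/Styles/Jazz"],
    "John Coltrane": ["Arts/Music/Styles/Jazz/Artists", "Arts/Music/Instruments/Saxophone"],
    "Porsche": ["Recreation/Autos/Makes_and_Models", "Business/Automotive"],
}


def profile(term, decay=0.5):
    """Build a sparse category profile for a term.

    Each result category receives score 1; every ancestor category receives
    the score of its child multiplied by `decay` (an assumed parameter).
    """
    scores = defaultdict(float)
    for path in FAKE_RESULTS.get(term, []):
        parts = path.split("/")
        weight = 1.0
        # Walk from the leaf category up to the top-level category.
        for depth in range(len(parts), 0, -1):
            scores["/".join(parts[:depth])] += weight
            weight *= decay
    return scores


def cosine(p, q):
    """Cosine similarity of two sparse profile vectors (0.0 if either is empty)."""
    dot = sum(value * q.get(category, 0.0) for category, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def proximity(a, b):
    """Semantic proximity of two terms as the cosine of their category profiles."""
    return cosine(profile(a), profile(b))
```

With these made-up inputs, the two jazz musicians share the Arts/Music/Styles/Jazz branch and get a positive score, whereas "Miles Davis" and "Porsche" share no category and score 0.0. Propagating scores upward is what lets results filed under different but neighbouring categories (here, Jazz and Saxophone) still overlap higher in the hierarchy.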


Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. DBIS, Institut für Informatik, Universität Freiburg, Freiburg i. Br., Germany
