A Web-Based Novel Term Similarity Framework for Ontology Learning

  • Seokkyung Chung
  • Jongeun Jun
  • Dennis McLeod
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4275)


Given that pairwise similarity computations are essential in ontology learning and data mining, we propose a similarity framework based on a conventional Web search engine. Using a Web search engine offers two main benefits. First, we can obtain the freshest content for each term, which reflects up-to-date knowledge about that term. This is particularly useful for dynamic ontology management, because ontologies must evolve over time as new concepts and terms appear. Second, compared with approaches that use a fixed collection of crawled Web documents as a corpus, our method is less sensitive to data sparseness, because the search engine gives access to as much content as possible. At the core of our methodology, we present two measures for similarity computation: a mutual-information-based metric and a feature-based metric. Moreover, we show how the proposed metrics can be used to modify existing ontologies. Finally, we compare the extracted similarity relations with WordNet-based semantic similarity. Experimental results show that our method can extract topical relations between terms that are not present in conventional concept-based ontologies.
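The abstract names two similarity measures but does not give their formulas. As a rough illustration only, the sketch below assumes the mutual-information metric is pointwise mutual information (PMI) estimated from search-engine hit counts, and that the feature-based metric is cosine similarity between bag-of-words vectors built from result snippets. The function names, the hit-count inputs, and the index size `total_pages` are hypothetical; no search-engine API is called, and this is not presented as the authors' exact method.

```python
import math
from collections import Counter


def pmi_similarity(hits_a: int, hits_b: int, hits_ab: int, total_pages: int) -> float:
    """Pointwise mutual information estimated from (hypothetical) hit counts.

    hits_a, hits_b: number of pages matching each term alone
    hits_ab: number of pages matching both terms together
    total_pages: assumed size of the search engine's index
    """
    if min(hits_a, hits_b, hits_ab) == 0:
        # No co-occurrence evidence; return 0 instead of -infinity.
        return 0.0
    p_a = hits_a / total_pages
    p_b = hits_b / total_pages
    p_ab = hits_ab / total_pages
    # log2 of how much more often the terms co-occur than under independence.
    return math.log2(p_ab / (p_a * p_b))


def feature_vector(snippets: list[str]) -> Counter:
    """Bag-of-words term counts over the result snippets returned for one term."""
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(snippet.lower().split())
    return counts


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Made-up hit counts, for illustration only.
    print(pmi_similarity(1000, 2000, 500, 1_000_000))
    # Made-up snippets for two terms.
    print(cosine_similarity(feature_vector(["jaguar car speed"]),
                            feature_vector(["car speed engine"])))
```

PMI rewards term pairs that co-occur more often than chance would predict. The cosine measure instead compares the contexts in which each term appears, so two terms can score as similar even if they rarely appear on the same page.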


Keywords: Search Engine, Mutual Information, Association Rule, Semantic Similarity, Query Expansion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Seokkyung Chung (1)
  • Jongeun Jun (2)
  • Dennis McLeod (2)

  1. Yahoo! Inc., Santa Clara, USA
  2. Department of Computer Science, University of Southern California, Los Angeles, USA
