
KnoE: A Web Mining Tool to Validate Previously Discovered Semantic Correspondences


The problem of matching schemas or ontologies consists of finding corresponding entities in two or more knowledge models that belong to the same domain but have been developed separately. Many techniques and tools now address this problem; however, the complex nature of matching means that existing solutions are not fully satisfactory in real situations. The Google Similarity Distance has appeared recently. Its purpose is to mine knowledge from the Web using the Google search engine in order to semantically compare text expressions. Our work consists of developing a software application that validates the results discovered by schema and ontology matching tools, following the philosophy behind this distance. Moreover, we are interested in applying this similarity distance not only with Google but also with other popular search engines. The results reveal three main facts. Firstly, some web search engines can help us to validate semantic correspondences satisfactorily. Secondly, there are significant differences among the web search engines. Thirdly, the best results are obtained when using combinations of the web search engines that we have studied.
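To make the idea behind the similarity distance concrete, the following minimal sketch computes the normalized distance defined by Cilibrasi and Vitányi from search-engine hit counts. It is not the KnoE implementation: the function name and its parameters are hypothetical, no search engine is actually queried, and the hit counts are the "horse"/"rider" figures commonly quoted from the original paper, used here only as illustration.

```python
import math


def normalized_search_distance(hits_x, hits_y, hits_xy, index_size):
    """Normalized Google Distance computed from raw hit counts.

    hits_x, hits_y : pages returned for each term alone
    hits_xy        : pages returned for both terms together
    index_size     : assumed number of pages indexed by the engine
    All four values are assumptions supplied by the caller.
    """
    if hits_x <= 0 or hits_y <= 0:
        raise ValueError("each term needs at least one hit")
    if hits_xy <= 0:
        # Terms that never co-occur are treated as infinitely distant.
        return math.inf
    log_x, log_y = math.log(hits_x), math.log(hits_y)
    log_xy = math.log(hits_xy)
    numerator = max(log_x, log_y) - log_xy
    denominator = math.log(index_size) - min(log_x, log_y)
    return numerator / denominator


# Illustrative counts for "horse", "rider", and both terms together.
horse_rider = normalized_search_distance(
    46_700_000, 12_200_000, 2_630_000, 8_058_044_651
)
print(round(horse_rider, 3))  # → 0.443
```

Values close to 0 suggest that the two expressions tend to appear on the same pages, so a correspondence proposed by a matching tool could be accepted when its distance falls below a chosen threshold. Because the counts come from whichever engine is queried, the same function could in principle be fed with counts from different search engines and the results compared or combined.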




Author information


Corresponding author

Correspondence to Jorge Martinez-Gil.

Additional information

This work was supported by the Spanish Ministry of Innovation and Science through the project REALIDAD: Gestion, Analisis y Explotacion Eficiente de Datos Vinculados, under Grant No. TIN2011-25840.

About this article

Cite this article

Martinez-Gil, J., Aldana-Montes, J.F. KnoE: A Web Mining Tool to Validate Previously Discovered Semantic Correspondences. J. Comput. Sci. Technol. 27, 1222–1232 (2012).


  • database integration
  • data and knowledge engineering
  • similarity distance