
Learning to match ontologies on the Semantic Web

The VLDB Journal

Abstract.

On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Finding such mappings manually is tedious, error-prone, and clearly infeasible at Web scale. Hence the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to find such mappings. Given two ontologies, GLUE finds, for each concept in one ontology, the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures and show that GLUE can work with all of them. Another key feature of GLUE is that it uses multiple learning strategies, each of which effectively exploits a different type of information, either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains and show that GLUE proposes highly accurate semantic mappings. Finally, we extend GLUE to find complex mappings between ontologies and describe experiments that show the promise of the approach.
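
To make "probabilistic definitions of similarity measures" concrete, here is a minimal sketch that assumes the Jaccard coefficient as one such measure, written entirely in terms of the joint distribution of concept membership: Jaccard-sim(A, B) = P(A ∩ B) / P(A ∪ B) = P(A, B) / (P(A, B) + P(A, ¬B) + P(¬A, B)). It further assumes that, for a shared set of instances, we already know (or a classifier has predicted) whether each instance belongs to concept A and to concept B, and it estimates the four joint probabilities by simple counting. The names JointDistribution, estimate_joint, and jaccard_sim are hypothetical illustrations, not GLUE's actual code or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointDistribution:
    """Joint distribution of membership in concepts A and B (hypothetical helper type)."""
    p_ab: float           # P(A, B)
    p_a_not_b: float      # P(A, not B)
    p_not_a_b: float      # P(not A, B)
    p_not_a_not_b: float  # P(not A, not B)


def estimate_joint(in_a: list[bool], in_b: list[bool]) -> JointDistribution:
    """Estimate the joint distribution by counting instances.

    in_a[i] and in_b[i] say whether instance i is (assumed or predicted to be)
    a member of concept A and of concept B, respectively.
    """
    if len(in_a) != len(in_b) or not in_a:
        raise ValueError("need two non-empty membership lists of equal length")
    n = len(in_a)
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for a, b in zip(in_a, in_b):
        counts[(a, b)] += 1
    return JointDistribution(
        p_ab=counts[(True, True)] / n,
        p_a_not_b=counts[(True, False)] / n,
        p_not_a_b=counts[(False, True)] / n,
        p_not_a_not_b=counts[(False, False)] / n,
    )


def jaccard_sim(d: JointDistribution) -> float:
    """P(A and B) / P(A or B); returns 0.0 when neither concept has any instances."""
    union = d.p_ab + d.p_a_not_b + d.p_not_a_b
    return d.p_ab / union if union > 0 else 0.0


if __name__ == "__main__":
    # Four instances: one in both concepts, one only in A, one only in B, one in neither.
    d = estimate_joint([True, True, False, False], [True, False, True, False])
    print(jaccard_sim(d))
```

Because the measure depends only on the four joint probabilities, any component that can estimate those probabilities can feed it, and switching to a different probabilistically defined similarity measure changes only the final function.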



Author information


Correspondence to AnHai Doan.

Additional information

Received: 16 December 2002, Accepted: 16 April 2003, Published online: 17 September 2003

Edited by: B.V. Atluri, A. Joshi, and Y. Yesha


About this article

Cite this article

Doan, A., Madhavan, J., Dhamankar, R. et al. Learning to match ontologies on the Semantic Web. The VLDB Journal 12, 303–319 (2003). https://doi.org/10.1007/s00778-003-0104-2



  • DOI: https://doi.org/10.1007/s00778-003-0104-2
