The VLDB Journal, Volume 12, Issue 4, pp 303–319

Learning to match ontologies on the Semantic Web

  • AnHai Doan
  • Jayant Madhavan
  • Robin Dhamankar
  • Pedro Domingos
  • Alon Halevy

Abstract.

On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible on the Web scale. Hence the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to find such mappings. Given two ontologies, for each concept in one ontology GLUE finds the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures and show that GLUE can work with all of them. Another key feature of GLUE is that it uses multiple learning strategies, each of which exploits well a different type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains and show that GLUE proposes highly accurate semantic mappings. Finally, we extend GLUE to find complex mappings between ontologies and describe experiments that show the promise of the approach.
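
As a rough illustration of the matching task stated above (for each concept in one ontology, find the most similar concept in the other), the following minimal sketch uses the Jaccard coefficient as one example of a similarity measure with a probabilistic reading, P(A and B) / P(A or B). It is an assumption-laden toy, not the GLUE implementation: it assumes both ontologies are described over a shared pool of instance identifiers, which sidesteps the learning step the abstract describes, and the names `jaccard`, `best_matches`, and the sample concepts are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|, a set-based estimate of P(A and B) / P(A or B)."""
    union = a | b
    if not union:
        # Two empty concepts share no evidence; treat them as dissimilar.
        return 0.0
    return len(a & b) / len(union)


def best_matches(
    source: Dict[str, Set[str]], target: Dict[str, Set[str]]
) -> Dict[str, Tuple[str, float]]:
    """For each source concept, return the most similar target concept and its score."""
    result = {}
    for name, instances in source.items():
        best = max(target, key=lambda t: jaccard(instances, target[t]))
        result[name] = (best, jaccard(instances, target[best]))
    return result


# Hypothetical toy data: concept name -> identifiers of instances belonging to it.
source = {"Faculty": {"i1", "i2", "i3"}, "Course": {"i4", "i5"}}
target = {"AcademicStaff": {"i1", "i2", "i3", "i6"}, "Subject": {"i4", "i5"}}

matches = best_matches(source, target)
for concept, (match, score) in matches.items():
    print(f"{concept} -> {match} (similarity {score:.2f})")
```

In practice the instances of two ontologies are usually disjoint, so the set overlaps used here would have to be estimated, for example by classifiers trained on one ontology's instances and applied to the other's; the sketch only shows the final similarity and selection step.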

Keywords:

Semantic Web, Ontology matching, Machine learning, Relaxation labeling

Copyright information

© Springer-Verlag Berlin/Heidelberg 2003

Authors and Affiliations

  • AnHai Doan (1)
  • Jayant Madhavan (2)
  • Robin Dhamankar (1)
  • Pedro Domingos (2)
  • Alon Halevy (2)
  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
  2. Department of Computer Science and Engineering, University of Washington, Seattle, USA
