Word-Sense Disambiguation for Ontology Mapping: Concept Disambiguation using Virtual Documents and Information Retrieval Techniques

  • Original Article
Journal on Data Semantics

Abstract

Ontology mapping is a crucial task for facilitating information exchange and data integration. A mapping system can use a variety of similarity measures to determine concept correspondences. This paper proposes the integration of word-sense disambiguation techniques into lexical similarity measures. We propose a disambiguation methodology which entails the creation of virtual documents from concept and sense definitions, including their neighbourhoods. The terms in each document are weighted according to their origin within their respective ontology. The similarities between the concept document and the sense documents are then used to disambiguate the concept meanings. First, we evaluate to what extent the proposed disambiguation method can improve the performance of a lexical similarity metric. We observe that the disambiguation method improves the performance of each tested lexical similarity metric. Next, we demonstrate the potential of a mapping system utilizing the proposed approach through a comparison with contemporary ontology mapping systems. We observe high performance on a real-world data set. Finally, we evaluate how the application of several term-weighting techniques on the virtual documents can affect the quality of the generated alignments. Here, we observe that weighting terms according to their ontology origin leads to the highest performance.
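
The general idea of the disambiguation step can be illustrated with a minimal sketch. This is not the authors' implementation: the section names (label, definition, neighbours), the weight values, the whitespace tokenizer, the use of cosine similarity and the toy "bank" data are all assumptions made for illustration only. The sketch builds a weighted bag-of-words virtual document for a concept and for each candidate sense, then selects the sense whose document is most similar to the concept document.

```python
import math
from collections import Counter

# Hypothetical weights per term origin; the values are illustrative only.
ORIGIN_WEIGHTS = {"label": 2.0, "definition": 1.0, "neighbours": 0.5}


def virtual_document(sections, weights=ORIGIN_WEIGHTS):
    """Build a weighted bag of words from text grouped by origin.

    sections maps an origin name (e.g. "label") to a text string.
    Each token contributes the weight of the origin it came from.
    """
    doc = Counter()
    for origin, text in sections.items():
        for token in text.lower().split():
            doc[token] += weights.get(origin, 1.0)
    return doc


def cosine(a, b):
    """Cosine similarity between two weighted term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(concept_doc, sense_docs):
    """Return the sense identifier whose document is most similar."""
    return max(sense_docs, key=lambda s: cosine(concept_doc, sense_docs[s]))


# Toy data: an ontology concept labelled "bank" and two candidate senses.
concept = virtual_document({
    "label": "bank",
    "definition": "financial institution that accepts deposits",
    "neighbours": "account loan money",
})
senses = {
    "bank.financial": virtual_document({
        "label": "bank",
        "definition": "a financial institution that accepts deposits and lends money",
        "neighbours": "depository institution",
    }),
    "bank.river": virtual_document({
        "label": "bank",
        "definition": "sloping land beside a body of water",
        "neighbours": "slope river",
    }),
}

best_sense = disambiguate(concept, senses)
print(best_sense)
```

On this toy data the financial sense shares far more weighted terms with the concept document than the river sense does, so it is the one selected. A lexical similarity measure could then be restricted to the selected sense rather than all senses of the label.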

Author information

Corresponding author

Correspondence to Frederik C. Schadd.

About this article

Cite this article

Schadd, F.C., Roos, N. Word-Sense Disambiguation for Ontology Mapping: Concept Disambiguation using Virtual Documents and Information Retrieval Techniques. J Data Semant 4, 167–186 (2015). https://doi.org/10.1007/s13740-014-0045-5

  • DOI: https://doi.org/10.1007/s13740-014-0045-5
