Discovery of Probabilistic Mappings between Taxonomies: Principles and Experiments

  • Rémi Tournaire
  • Jean-Marc Petit
  • Marie-Christine Rousset
  • Alexandre Termier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6720)

Abstract

In this paper, we investigate a principled approach for defining and discovering probabilistic mappings between two taxonomies. First, we compare two ways of modeling probabilistic mappings that are compatible with the logical constraints declared in each taxonomy. Then we describe a generate-and-test algorithm that minimizes the number of calls to the probability estimator when determining the mappings whose probability exceeds a given threshold. Finally, we provide an experimental analysis of this approach.
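
The generate-and-test idea can be illustrated with a minimal sketch. It is not the algorithm of the paper, only one plausible way to save estimator calls, and it rests on assumptions that the abstract does not state: a mapping (a, b) is read as "class a of the first taxonomy is subsumed by class b of the second", the probability of (a, b) never exceeds that of (a, b') when b' is a superclass of b, and the estimator is a toy relative frequency over shared instances. The names `discover`, `ancestors`, `toy_estimate`, and the example data are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: each class maps to its direct superclasses.
Taxonomy = Dict[str, List[str]]


def ancestors(tax: Taxonomy, cls: str) -> Set[str]:
    """All strict superclasses of cls, following parent links transitively."""
    seen: Set[str] = set()
    stack = list(tax.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(tax.get(parent, []))
    return seen


def discover(
    sources: List[str],
    target_tax: Taxonomy,
    estimate: Callable[[str, str], float],
    threshold: float,
) -> Tuple[Dict[Tuple[str, str], float], int]:
    """Return mappings (a, b) whose estimate reaches the threshold,
    together with the number of estimator calls that were needed."""
    # A class has strictly more ancestors than each of its ancestors,
    # so this order visits every parent before its children.
    order = sorted(target_tax, key=lambda c: len(ancestors(target_tax, c)))
    kept: Dict[Tuple[str, str], float] = {}
    calls = 0
    for a in sources:
        rejected: Set[str] = set()
        for b in order:
            # Assumed monotonicity: if (a, parent) is below the threshold,
            # (a, b) is too, so the estimator call is skipped.
            if any(parent in rejected for parent in target_tax[b]):
                rejected.add(b)
                continue
            calls += 1
            prob = estimate(a, b)
            if prob >= threshold:
                kept[(a, b)] = prob
            else:
                rejected.add(b)
    return kept, calls


# Toy data (invented for illustration): instance sets of each class.
target_tax: Taxonomy = {
    "Thing": [], "Music": ["Thing"], "Film": ["Thing"],
    "Jazz": ["Music"], "Rock": ["Music"], "Western": ["Film"],
}
ext: Dict[str, Set[int]] = {
    "BeBop": {1, 2, 3, 4},
    "Thing": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "Music": {1, 2, 3, 5, 6, 7},
    "Film": {8, 9},
    "Jazz": {1, 2, 3, 5},
    "Rock": {6, 7},
    "Western": {9},
}


def toy_estimate(a: str, b: str) -> float:
    """Fraction of the instances of a that are also instances of b."""
    return len(ext[a] & ext[b]) / len(ext[a])


mappings, calls = discover(["BeBop"], target_tax, toy_estimate, 0.7)
```

On this toy input, the candidate ("BeBop", "Western") is never tested because ("BeBop", "Film") already falls below the threshold, so five estimator calls cover six candidates. The saving grows with the depth of the target taxonomy, which is the kind of economy the abstract refers to.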

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Rémi Tournaire
    • 1
  • Jean-Marc Petit
    • 2
  • Marie-Christine Rousset
    • 1
  • Alexandre Termier
    • 1
  1. UJF / Grenoble INP / UPMF / CNRS, LIG UMR 5217, Université de Grenoble, St-Martin d’Hères Cedex, France
  2. CNRS INSA-Lyon, LIRIS UMR 5205, Université de Lyon, Villeurbanne Cedex, France