Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing

  • Songyun Duan
  • Achille Fokoue
  • Oktie Hassanzadeh
  • Anastasios Kementsietsidis
  • Kavitha Srinivas
  • Michael J. Ward
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7649)


In this paper, we describe a mechanism for ontology alignment using instance-based matching of types (or classes). Instance-based matching is known to be a useful technique for matching ontologies that use different names and different structures. A key problem in instance matching of types, however, is scaling the matching algorithm to (a) handle types with a large number of instances, and (b) efficiently match a large number of type pairs. We propose the use of state-of-the-art locality-sensitive hashing (LSH) techniques to vastly improve the scalability of instance matching across multiple types. We show the feasibility of our approach with DBpedia and Freebase, two different type systems with hundreds and thousands of types, respectively. We describe how these techniques can be used to estimate containment or equivalence relations between two type systems, and we compare two different LSH techniques for computing instance similarity.
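The abstract does not name the two LSH techniques it compares, so the sketch below uses MinHash with banding, one common LSH scheme for set similarity. It is an illustration under stated assumptions, not the authors' implementation. Each type is treated as the set of its instance identifiers. MinHash signatures estimate the Jaccard similarity between two types. Containment |A ∩ B| / |A| is then derived from that estimate together with the exact set sizes. Banding keeps the number of compared type pairs small. The hash family (a·x + b) mod (2^61 − 1), the 256-hash signature length, the band/row split, and all type names and instance IDs are assumptions made for this example.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Assumption: Mersenne prime 2^61 - 1 as the modulus of the universal hash family.
PRIME = (1 << 61) - 1


def base_hash(instance_id: str) -> int:
    """Stable 64-bit hash of an instance identifier (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(instance_id.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_hashes: int, seed: int = 42) -> list[tuple[int, int]]:
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(instances, params) -> list[int]:
    """For each hash function, keep the minimum hash value over the type's instances."""
    hashed = [base_hash(x) for x in instances]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]


def estimate_jaccard(sig1, sig2) -> float:
    """The fraction of agreeing signature positions estimates |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def estimate_containment(sig_a, sig_b, size_a: int, size_b: int) -> float:
    """Estimate |A ∩ B| / |A| from the Jaccard estimate and the exact set sizes.

    From J = I / (|A| + |B| - I) it follows that I = J * (|A| + |B|) / (1 + J).
    """
    j = estimate_jaccard(sig_a, sig_b)
    intersection = j * (size_a + size_b) / (1 + j)
    return intersection / size_a


def lsh_candidates(signatures: dict, bands: int, rows: int) -> set:
    """Banding: types whose signatures agree on every row of some band become candidate pairs."""
    buckets = defaultdict(set)
    for type_name, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(type_name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    params = make_hash_params(256)
    # Hypothetical types with synthetic instance IDs; "src:Person" is a subset of "tgt:Person".
    types = {
        "src:Person": {f"e{i}" for i in range(500)},
        "tgt:Person": {f"e{i}" for i in range(1000)},
        "tgt:Film": {f"f{i}" for i in range(300)},
    }
    sigs = {t: minhash_signature(inst, params) for t, inst in types.items()}
    # Only candidate pairs produced by banding are scored, instead of every pair of types.
    for a, b in sorted(lsh_candidates(sigs, bands=64, rows=4)):
        c = estimate_containment(sigs[a], sigs[b], len(types[a]), len(types[b]))
        print(a, "contained in", b, "~", round(c, 2))
```

The band/row split sets the similarity threshold at which a pair is likely to become a candidate. More rows per band make the filter stricter, and more bands make it more permissive. A high estimated containment in one direction suggests a subtype relation. High containment in both directions suggests equivalence.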


Keywords: Ontology Alignment · Schema Matching · Linked Data · Semantic Web


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Songyun Duan (1)
  • Achille Fokoue (1)
  • Oktie Hassanzadeh (1)
  • Anastasios Kementsietsidis (1)
  • Kavitha Srinivas (1)
  • Michael J. Ward (1)
  1. IBM T.J. Watson Research, Hawthorne, USA