A Clustering-Based Approach for Large-Scale Ontology Matching

  • Alsayed Algergawy
  • Sabine Massmann
  • Erhard Rahm
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6909)


Schema and ontology matching have attracted a great deal of interest among researchers. Despite the advances achieved, matching large schemas and ontologies remains a real challenge because it is a time-consuming and memory-intensive process. We therefore propose a scalable, clustering-based matching approach that breaks a large matching problem up into smaller matching problems. In particular, we first introduce a structure-based clustering approach that partitions each schema graph into a set of disjoint subgraphs (clusters). Then, we propose a new measure that efficiently determines similar clusters between each pair of cluster sets, yielding a set of small matching tasks. Finally, we adopt the matching prototype COMA++ to solve the individual matching tasks and combine their results. The experimental analysis indicates that the proposed method yields encouraging and significant improvements.
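
The three steps of the abstract (partition, pair up similar clusters, match each pair and merge) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the clustering simply takes each child subtree of the root as one cluster, cluster similarity is a Jaccard score over name tokens, and the per-task matcher compares element names. These are simplified stand-ins, not the paper's structure-based clustering, its cluster similarity measure, or COMA++. The function names and the purchase-order schemas are hypothetical.

```python
import re

def tokens(name):
    """Split a camelCase element name into lowercase tokens."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*", name)}

def subtree(tree, node):
    """Collect every node in the subtree rooted at `node`."""
    nodes = {node}
    for child in tree.get(node, []):
        nodes |= subtree(tree, child)
    return nodes

def partition(tree, root):
    """Stand-in clustering: each child subtree of the root is one cluster."""
    return [subtree(tree, child) for child in tree.get(root, [])]

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_tokens(cluster):
    """Union of the name tokens of all elements in a cluster."""
    return set().union(*(tokens(n) for n in cluster))

def similar_clusters(clusters1, clusters2, threshold):
    """Keep only cluster pairs whose token overlap reaches the threshold."""
    return [(c1, c2) for c1 in clusters1 for c2 in clusters2
            if jaccard(cluster_tokens(c1), cluster_tokens(c2)) >= threshold]

def match_task(c1, c2, threshold):
    """Stand-in matcher for one small task: compare element names only."""
    return {(a, b) for a in c1 for b in c2
            if jaccard(tokens(a), tokens(b)) >= threshold}

def match(tree1, root1, tree2, root2, cluster_thr=0.3, elem_thr=0.3):
    """Partition both schemas, match similar cluster pairs, merge results."""
    pairs = similar_clusters(partition(tree1, root1),
                             partition(tree2, root2), cluster_thr)
    result = set()
    for c1, c2 in pairs:
        result |= match_task(c1, c2, elem_thr)
    return result

# Two small hypothetical purchase-order schemas (node -> children).
order = {
    "Order": ["shipTo", "item"],
    "shipTo": ["shipStreet", "shipCity"],
    "item": ["itemName", "itemPrice"],
}
purchase = {
    "Purchase": ["delivery", "product"],
    "delivery": ["deliveryStreet", "deliveryCity"],
    "product": ["productName", "productPrice"],
}

correspondences = match(order, "Order", purchase, "Purchase")
for source, target in sorted(correspondences):
    print(source, "<->", target)
```

In this sketch only two of the four possible cluster pairs pass the similarity filter, so element comparisons are run inside those two small tasks rather than across the full cross product of both schemas, which is the source of the savings the abstract describes.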


Match Task; Cluster Document; Schema Graph; Ontology Match; Purchase Order
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Alsayed Algergawy (1)
  • Sabine Massmann (1)
  • Erhard Rahm (1)
  1. Department of Computer Science, University of Leipzig, Germany
