Arabian Journal for Science and Engineering, Volume 44, Issue 4, pp 3117–3135

A Framework for Efficient Matching of Large-Scale Metadata Models

  • Seham Moawed
  • Alsayed Algergawy (Email author)
  • Amany Sarhan
  • Ali Eldosouky
Research Article - Computer Engineering and Computer Science

Abstract

Despite the success achieved in the area of metadata model matching, large-scale matching still fails to preserve high match quality and efficiency at the same time. To deal with these challenges, we introduce a generic matching framework, called MetMat, to identify and discover corresponding entities across XML schemas and/or ontologies (metadata models). In particular, the proposed framework is based on a parallelized clustering-based matching approach, which first splits the original matching task into smaller independent tasks. These independent tasks are then carried out in parallel, exploiting desktop platforms equipped with parallelism-enabled multi-core processors. To this end, we develop three different parallel strategies: inter-, intra-, and hybrid-matching strategies. To obtain high match quality, a set of matchers is exploited. The proposed framework is validated through an extensive set of experiments over small and large data sets. We also compare the MetMat framework to top matching tools that participated in the OAEI (Ontology Alignment Evaluation Initiative) (http://oaei.ontologymatching.org/) over the last three years. The results show that the MetMat framework with the intra-parallel matching strategy outperforms the other matching strategies in terms of processing time while preserving the same quality. Moreover, the tool ranks well among the tools that participated in the OAEI over the last three years.
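
The split-then-match-in-parallel idea described in the abstract can be illustrated with a minimal Python sketch. It is not the MetMat implementation: the cluster contents, the function names (`name_similarity`, `match_task`, `parallel_match`), the single string matcher, and the 0.8 threshold are all hypothetical choices made for illustration. The sketch assumes the clustering step has already produced pairs of similar clusters, and it does not try to distinguish the inter-, intra-, and hybrid-matching strategies.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a matcher: case-insensitive string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_task(task, threshold=0.8):
    """Match one (source cluster, target cluster) pair independently of all others."""
    source_cluster, target_cluster = task
    matches = []
    for s, t in product(source_cluster, target_cluster):
        score = name_similarity(s, t)
        if score >= threshold:  # assumed cut-off, not taken from the paper
            matches.append((s, t, round(score, 2)))
    return matches


def parallel_match(tasks, workers=4):
    """Run the independent matching tasks concurrently and merge their results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(match_task, tasks))
    return [m for result in partial_results for m in result]


# Hypothetical output of a clustering step: pairs of similar clusters,
# each pair forming one small, independent matching task.
tasks = [
    (["authorName", "title"], ["author_name", "bookTitle"]),
    (["price", "currency"], ["cost", "Currency"]),
]

if __name__ == "__main__":
    for source, target, score in parallel_match(tasks):
        print(source, "<->", target, score)
```

Because no task reads or writes data belonging to another task, the results can be merged without any coordination. In CPython, threads do not execute CPU-bound Python code in parallel, so `ProcessPoolExecutor` from the same module would be the closer analogue to the multi-core execution the abstract describes.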

Keywords

Metadata model matching; Large-scale matching; Partitioning-based matching; Hierarchical clustering methods

Notes

Acknowledgements

The work of A. Algergawy has been funded by the Deutsche Forschungsgemeinschaft (DFG) as part of the CRC 1076 AquaDiva.

Copyright information

© King Fahd University of Petroleum & Minerals 2018

Authors and Affiliations

  • Seham Moawed (3)
  • Alsayed Algergawy (1, Email author)
  • Amany Sarhan (2)
  • Ali Eldosouky (3)

  1. Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University of Jena, Jena, Germany
  2. Department of Computer Engineering, Tanta University, Tanta, Egypt
  3. Department of Computer Engineering, Mansoura University, Mansoura, Egypt
