The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies

  • Amir Laadhar
  • Faiza Ghozzi
  • Imen Megdiche
  • Franck Ravat
  • Olivier Teste
  • Faiez Gargouri
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 353)


Matching learning combines ontology matching with machine learning techniques, a strategy that has gained increasing attention in recent years. However, state-of-the-art approaches that implement matching learning are not well suited to imbalanced training sets. In this paper, we address the problem of imbalanced training sets and their impact on the performance of matching learning in the context of aligning biomedical ontologies. Our approach is applied to local matching learning, a technique that divides a large ontology matching task into a set of distinct local sub-matching tasks. Each local matching task relies on a local classifier built from its own balanced local training set, and the local classifiers then discover the alignments of their respective sub-matching tasks. To validate our approach, we propose an experimental study that analyzes the impact of applying conventional resampling techniques on the quality of local matching learning.
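As a rough illustration of what balancing a local training set can look like, the sketch below applies random undersampling, one conventional resampling technique, to each sub-matching task separately. It is not the authors' implementation: the function name `undersample`, the task names, and the two-value similarity feature vectors are all hypothetical and chosen only for the example.

```python
import random
from collections import Counter


def undersample(features, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(balanced)
    return balanced


# Hypothetical local training sets, one per sub-matching task.
# Each sample is a made-up similarity vector for a candidate entity pair;
# label 1 means "match", label 0 means "non-match" (the majority class).
local_sets = {
    "task_A": (
        [(0.9, 0.8), (0.85, 0.7), (0.2, 0.1), (0.3, 0.2),
         (0.1, 0.4), (0.25, 0.15), (0.05, 0.3), (0.4, 0.1)],
        [1, 1, 0, 0, 0, 0, 0, 0],
    ),
    "task_B": (
        [(0.95, 0.9), (0.1, 0.2), (0.3, 0.3), (0.2, 0.05), (0.15, 0.35)],
        [1, 0, 0, 0, 0],
    ),
}

# Balance each local training set independently of the others.
balanced_sets = {name: undersample(X, y) for name, (X, y) in local_sets.items()}
class_counts = {
    name: Counter(label for _, label in pairs)
    for name, pairs in balanced_sets.items()
}

for name, counts in class_counts.items():
    print(name, dict(counts))
```

After resampling, each task keeps as many non-matches as it has matches (two of each for `task_A`, one of each for `task_B`), so a local classifier trained on it no longer sees a skewed class distribution. Oversampling the minority class would be the alternative when discarding samples loses too much information.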


  • Imbalanced training data
  • Machine learning
  • Ontology matching
  • Semantic web



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Amir Laadhar (1)
  • Faiza Ghozzi (2)
  • Imen Megdiche (1)
  • Franck Ravat (1)
  • Olivier Teste (1)
  • Faiez Gargouri (2)

  1. Institut de Recherche en Informatique de Toulouse, Toulouse, France
  2. MIRACL, Sfax University, Sfax, Tunisia
