Induction of Rules Based on Similarity Relations for Imbalance Datasets. A Case of Study

  • Yaima Filiberto
  • Mabel Frias
  • Rafael Larrua
  • Rafael Bello
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 657)


In this paper, the IRBASIR-Imb algorithm (Induction of Rules Based on Similarity Relations for Imbalance datasets) is applied to a classical task in Civil Engineering: predicting whether structural failure depends on the connector (canals) or on the concrete capacity of the connectors. Because the method relies on similarity relations, it can be applied to mixed data (features with discrete or real domains). The experimental results show satisfactory performance of IRBASIR-Imb in comparison with other algorithms such as C4.5.
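To illustrate how a similarity relation can cover mixed data, the following is a minimal sketch, not the definition used by IRBASIR-Imb. It assumes a weighted average of per-feature similarities: exact match for discrete features and a range-normalized distance for real-valued ones. The function names, weights, threshold, and sample values are all hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def feature_similarity(x: Value, y: Value, value_range: Optional[float]) -> float:
    """Similarity of two values of one feature, in [0, 1].

    value_range is None for a discrete feature (assumed convention);
    otherwise it is the width of the real-valued feature's domain.
    """
    if value_range is None:
        # Discrete feature: similar only when the values are identical.
        return 1.0 if x == y else 0.0
    if value_range == 0:
        # Degenerate domain: every value is the same.
        return 1.0
    # Real-valued feature: 1 minus the range-normalized distance.
    return 1.0 - abs(float(x) - float(y)) / value_range


def similarity(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
    weights: Sequence[float],
) -> float:
    """Weighted average of the per-feature similarities (assumed form)."""
    total = sum(weights)
    weighted = sum(
        w * feature_similarity(x, y, r)
        for x, y, r, w in zip(a, b, ranges, weights)
    )
    return weighted / total


def are_similar(a, b, ranges, weights, threshold: float = 0.85) -> bool:
    """Two objects are related when their similarity reaches the threshold."""
    return similarity(a, b, ranges, weights) >= threshold


# Hypothetical objects with one real-valued and one discrete feature.
obj_a = (10.0, "type_a")
obj_b = (12.0, "type_a")
ranges = (20.0, None)   # the first feature spans a domain of width 20
weights = (0.5, 0.5)    # illustrative weights, not learned values

print(similarity(obj_a, obj_b, ranges, weights))
print(are_similar(obj_a, obj_b, ranges, weights))
```

In this sketch the threshold and the feature weights control how coarse the relation is. How they are chosen is outside the scope of the example.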


Classification rules; Similarity relations; Imbalanced data sets



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Yaima Filiberto (1)
  • Mabel Frias (1, corresponding author)
  • Rafael Larrua (1)
  • Rafael Bello (2)

  1. Department of Computer Sciences, Universidad de Camagüey, Camagüey, Cuba
  2. Department of Computer Sciences, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
