Induction of Rules Based on Similarity Relations for Imbalance Datasets. A Case of Study

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 657)

Abstract

In this paper, the IRBASIR-Imb algorithm (Induction of Rules Based on Similarity Relations for Imbalance datasets) is applied to a classical task in Civil Engineering: predicting whether structural failure depends on the connector (canals) or on the concrete capacity of connectors. Because the method is built on similarity relations, it can be applied to mixed data, that is, features with discrete or real domains. The experimental results show that IRBASIR-Imb performs satisfactorily in comparison with other algorithms such as C4.5.
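
The abstract does not say how the similarity relation is built, so the following is only a minimal sketch of one common way to compare objects described by mixed data. It assumes exact matching for discrete features, range-normalized distance for real features, and a weighted average with a threshold. The names `feature_similarity`, `similarity`, `are_similar`, the weights, and the threshold value are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def feature_similarity(a: Value, b: Value, value_range: Optional[float]) -> float:
    """Similarity of two values of one feature, in [0, 1].

    value_range is None for a discrete feature (assumption: exact match),
    otherwise it is the width of the real-valued feature's domain.
    """
    if value_range is None:
        return 1.0 if a == b else 0.0
    if value_range == 0:
        return 1.0  # a constant feature cannot distinguish objects
    return max(0.0, 1.0 - abs(float(a) - float(b)) / value_range)


def similarity(x: Sequence[Value], y: Sequence[Value],
               ranges: Sequence[Optional[float]],
               weights: Sequence[float]) -> float:
    """Weighted average of the per-feature similarities (illustrative choice)."""
    total = sum(weights)
    return sum(
        w * feature_similarity(a, b, r)
        for a, b, r, w in zip(x, y, ranges, weights)
    ) / total


def are_similar(x: Sequence[Value], y: Sequence[Value],
                ranges: Sequence[Optional[float]],
                weights: Sequence[float],
                threshold: float = 0.85) -> bool:
    """Two objects are related when their similarity reaches the threshold."""
    return similarity(x, y, ranges, weights) >= threshold


if __name__ == "__main__":
    # Hypothetical objects: (a real-valued measurement, a discrete label).
    ranges = (20.0, None)
    weights = (1.0, 1.0)
    x = (10.0, "A")
    y = (12.0, "A")
    z = (10.0, "B")
    print(similarity(x, y, ranges, weights), are_similar(x, y, ranges, weights))
    print(similarity(x, z, ranges, weights), are_similar(x, z, ranges, weights))
```

A relation of this kind is reflexive and symmetric but not necessarily transitive, which is why it groups objects into overlapping similarity classes rather than the partitions produced by an equivalence relation.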

Corresponding author

Correspondence to Mabel Frias.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Filiberto, Y., Frias, M., Larrua, R., Bello, R. (2016). Induction of Rules Based on Similarity Relations for Imbalance Datasets. A Case of Study. In: Figueroa-García, J., López-Santana, E., Ferro-Escobar, R. (eds) Applied Computer Sciences in Engineering. WEA 2016. Communications in Computer and Information Science, vol 657. Springer, Cham. https://doi.org/10.1007/978-3-319-50880-1_6

  • DOI: https://doi.org/10.1007/978-3-319-50880-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50879-5

  • Online ISBN: 978-3-319-50880-1

  • eBook Packages: Computer Science (R0)
