Induction of Rules Based on Similarity Relations for Imbalance Datasets. A Case of Study

Filiberto, Yaima; Frias, Mabel; Larrua, Rafael; Bello, Rafael

doi:10.1007/978-3-319-50880-1_6

Conference paper
First Online: 03 January 2017

Part of the book series: Communications in Computer and Information Science (CCIS, volume 657)

Abstract

This paper evaluates the IRBASIR-Imb algorithm (Induction of Rules Based on Similarity Relations for Imbalance datasets) on a classical task in Civil Engineering: predicting whether structural failure depends on the connector (canals) or on the concrete capacity of the connectors. Because the method relies on similarity relations, it can be applied to mixed data, that is, features with discrete or real-valued domains. The experimental results show that IRBASIR-Imb performs satisfactorily in comparison with other algorithms such as C4.5.
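
To illustrate how a similarity relation can cover mixed data, here is a minimal sketch. It is not the definition used by IRBASIR-Imb, which the abstract does not give. It assumes a weighted average of per-feature comparisons (exact match for discrete values, range-normalized distance for real values) and a fixed threshold. The function names, weights, ranges, threshold, and sample objects are all hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def feature_similarity(a: Value, b: Value, value_range: Optional[float]) -> float:
    """Compare two values of one feature (assumed comparison functions)."""
    # Discrete feature, or no usable range: exact-match comparison.
    if isinstance(a, str) or isinstance(b, str) or not value_range:
        return 1.0 if a == b else 0.0
    # Real-valued feature: 1 minus the distance normalized by the feature range.
    return 1.0 - abs(a - b) / value_range


def similarity(x: Sequence[Value], y: Sequence[Value],
               weights: Sequence[float],
               ranges: Sequence[Optional[float]]) -> float:
    """Weighted average of per-feature similarities, in [0, 1]."""
    weighted = sum(w * feature_similarity(a, b, r)
                   for a, b, w, r in zip(x, y, weights, ranges))
    return weighted / sum(weights)


def are_similar(x, y, weights, ranges, threshold: float = 0.85) -> bool:
    """Two objects are related when their similarity reaches the threshold."""
    return similarity(x, y, weights, ranges) >= threshold


# Hypothetical objects: one real-valued feature and one discrete feature.
x = (10.0, "A")
y = (12.0, "A")
z = (12.0, "B")
weights = (0.5, 0.5)    # assumed feature weights
ranges = (20.0, None)   # assumed range of the real feature; None marks discrete

print(are_similar(x, y, weights, ranges))
print(are_similar(x, z, weights, ranges))
```

The point of the sketch is that real and discrete features contribute to a single score, so objects can be grouped without first discretizing the real-valued features.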

Author information

Authors and Affiliations

Department of Computer Sciences, Universidad de Camagüey, Camagüey, Cuba
Yaima Filiberto, Mabel Frias & Rafael Larrua
Department of Computer Sciences, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
Rafael Bello

Corresponding author

Correspondence to Mabel Frias.

Editor information

Editors and Affiliations

Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Juan Carlos Figueroa-García
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Eduyn Ramiro López-Santana
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Roberto Ferro-Escobar

About this paper

Cite this paper

Filiberto, Y., Frias, M., Larrua, R., Bello, R. (2016). Induction of Rules Based on Similarity Relations for Imbalance Datasets. A Case of Study. In: Figueroa-García, J., López-Santana, E., Ferro-Escobar, R. (eds) Applied Computer Sciences in Engineering. WEA 2016. Communications in Computer and Information Science, vol 657. Springer, Cham. https://doi.org/10.1007/978-3-319-50880-1_6

DOI: https://doi.org/10.1007/978-3-319-50880-1_6
Published: 03 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50879-5
Online ISBN: 978-3-319-50880-1
eBook Packages: Computer Science, Computer Science (R0)
