Simulating Complexity Measures on Imbalanced Datasets

Barella, Victor H.; Garcia, Luís P. F.; de Carvalho, André C. P. L. F.

doi:10.1007/978-3-030-61380-8_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12320)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

Classification tasks on imbalanced datasets are not challenging on their own: classification models perform poorly on the minority class when the datasets present other difficulties, such as class overlap and complex decision borders. Data complexity measures can identify such difficulties, which helps in dealing with imbalanced datasets. They capture information about data overlap, neighborhood structure, and linearity. Although these measures were recently decomposed by class to deal with imbalanced datasets, their high computational cost prevents their use in time-restricted applications, such as recommendation systems, and on high-dimensional datasets. In this paper, we use a Meta-Learning approach to estimate the decomposed data complexity measures. We show that the simulated measures assess the difficulty of the dataset after applying preprocessing techniques to different sample sizes. We also show that this approach is significantly faster than computing the original measures, with a statistically similar estimation error for both classes.
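
To make the idea of a class-decomposed neighborhood measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes the measure in question is N3, the leave-one-out error rate of a 1-nearest-neighbor classifier, and reports that error separately for each class. The function name `n3_by_class` and the toy data are hypothetical. The all-pairs distance search also illustrates why computing such measures directly becomes expensive as the dataset grows.

```python
import math


def n3_by_class(points, labels):
    """Leave-one-out 1-NN error rate, reported separately for each class.

    Hypothetical helper for illustration only; assumes Euclidean distance.
    """
    errors = {c: 0 for c in set(labels)}
    counts = {c: 0 for c in set(labels)}
    for i, (p, y) in enumerate(zip(points, labels)):
        # Nearest neighbor of p among all other points:
        # O(n) work per point, hence O(n^2) distance computations overall.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        counts[y] += 1
        if labels[nearest] != y:
            errors[y] += 1
    return {c: errors[c] / counts[c] for c in counts}


# Toy imbalanced dataset: six majority points on a grid, and two minority
# points, one inside the majority region and one far from everything.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
minority = [(1.0, 0.4), (5.0, 5.0)]
points = majority + minority
labels = ["maj"] * len(majority) + ["min"] * len(minority)

scores = n3_by_class(points, labels)
print(scores)
```

On this toy data the minority class has an error of 1.0 and the majority class an error of 1/3, whereas a single pooled value (4 errors over 8 points, 0.5) would hide that difference. A meta-learning approach like the one described above would train a model to predict such per-class values from cheaper dataset descriptors instead of running the quadratic search.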

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES). The authors would like to thank CNPq and FAPESP (grant 2015/01382-0). They also thank CeMEAI-FAPESP (grant 2013/07375-0) for the computational resources.

Author information

Authors and Affiliations

Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
Victor H. Barella & André C. P. L. F. de Carvalho
Department of Computer Science, University of Brasília, Brasília, Brazil
Luís P. F. Garcia

Corresponding author

Correspondence to Victor H. Barella.

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, Brazil
Ricardo Cerri
Federal University of ABC, Santo Andre, Brazil
Ronaldo C. Prati

About this paper

Cite this paper

Barella, V.H., Garcia, L.P.F., de Carvalho, A.C.P.L.F. (2020). Simulating Complexity Measures on Imbalanced Datasets. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_34

DOI: https://doi.org/10.1007/978-3-030-61380-8_34
Published: 13 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8
eBook Packages: Computer Science, Computer Science (R0)
