A Dynamic Decision-Making Method Based on Ensemble Methods for Complex Unbalanced Data

  • Dong Chen
  • Xiao-Jun Wang
  • Bin Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11881)

Abstract

Class imbalance has been shown to seriously degrade the precision of many standard learning algorithms. Numerous methods have been proposed to address it, for example the distance-based balancing ensemble method, which learns from an unbalanced dataset by converting it into multiple balanced subsets on which sub-classifiers are built. However, class imbalance is usually accompanied by other data-complexity problems such as class overlap, small disjuncts, and noisy instances. Algorithms designed for the primary class-imbalance problem alone cannot handle these complex-data problems at the same time, and some even exacerbate class overlap and small disjuncts while attempting to do so. To address this, this study proposes a dynamic ensemble selection decision-making (DESD) method. DESD first applies a random-splitting technique repeatedly to divide the dataset into multiple balanced subsets that contain no or few class-overlap and small-disjunct problems. Classifiers are then built on these subsets to form the candidate classifier pool. To select the most appropriate classifiers from the pool for each query instance, a weighting mechanism highlights the competence of classifiers that are more effective at classifying minority instances in the local region where the query instance is located. Experiments on 15 standard datasets from public repositories demonstrate the effectiveness of DESD: its precision outperforms that of the compared ensemble methods.
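The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the balanced-subset construction (all minority instances plus an equal-size random sample of the majority), the local-region size `k`, the double weight for correct minority predictions, and the nearest-centroid sub-classifiers are all illustrative assumptions.

```python
import random
from collections import Counter

def dist(a, b):
    # Euclidean distance between two feature vectors
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

class NearestCentroid:
    """Tiny stand-in sub-classifier: predicts the class with the closest centroid."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(c) / len(c) for c in zip(*pts)]
        return self
    def predict(self, x):
        return min(self.centroids, key=lambda l: dist(x, self.centroids[l]))

def build_pool(X, y, n_subsets=10, seed=0):
    # Stage 1 (assumed reading): each balanced subset keeps all minority
    # instances and an equal-size random sample of majority instances;
    # one sub-classifier is trained per subset.
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, l in enumerate(y) if l == minority]
    maj_idx = [i for i, l in enumerate(y) if l != minority]
    pool = []
    for _ in range(n_subsets):
        chosen = min_idx + rng.sample(maj_idx, len(min_idx))
        pool.append(NearestCentroid().fit([X[i] for i in chosen],
                                          [y[i] for i in chosen]))
    return pool, minority

def desd_predict(pool, X_val, y_val, query, minority, k=5, n_select=3):
    # Stage 2: score every classifier on the query's local region (its k
    # nearest validation instances), counting a correct minority prediction
    # double (an illustrative weighting), then let the top scorers vote.
    region = sorted(range(len(X_val)), key=lambda i: dist(query, X_val[i]))[:k]
    scores = [sum((2.0 if y_val[i] == minority else 1.0)
                  for i in region if clf.predict(X_val[i]) == y_val[i])
              for clf in pool]
    top = sorted(range(len(pool)), key=lambda j: -scores[j])[:n_select]
    votes = Counter(pool[j].predict(query) for j in top)
    return votes.most_common(1)[0][0]

# toy imbalanced 2-D dataset: 10 majority (class 0) vs. 3 minority (class 1);
# for brevity the training set doubles as the validation set here
X = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5),
     (0.3, 0.3), (-0.3, 0.2), (0.2, -0.4), (0.4, 0.1), (-0.2, -0.3),
     (3.0, 3.0), (3.2, 2.8), (2.8, 3.1)]
y = [0] * 10 + [1] * 3
pool, minority = build_pool(X, y)
```

Because every subset is balanced, no individual sub-classifier is biased toward the majority class, and the local weighting then favors the pool members that handle minority instances well near the query point.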

Keywords

Dynamic ensemble selection · Unbalanced dataset · Classification

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61702070, 61751203, 61772100, 61672121, 61572093, 61802040), the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT_15R07), the Program for Liaoning Innovative Research Team in University (No. LT2015002), and the Basic Research Program of the Key Lab in Liaoning Province Educational Department (No. LZ2015004).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, China
  2. School of Management Science and Engineering, Dongbei University of Finance and Economics (DUFE), Dalian, China