A Swarm Intelligence Approach in Undersampling Majority Class

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9882)

Abstract

Over the years, machine learning has faced the issue of imbalanced datasets. This occurs when the number of instances in one class significantly outnumbers the instances in the other class. This study investigates a new approach to balancing a dataset using a swarm intelligence technique, Stochastic Diffusion Search (SDS), to undersample the majority class of a direct marketing dataset. This novel application of the swarm intelligence algorithm demonstrates promising results, suggesting that the majority class can be undersampled by removing redundant data whilst protecting the useful data in the dataset. The paper details the behaviour of the proposed algorithm on this problem and contrasts its results against other techniques.
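The paper's own algorithm is not reproduced on this page, but the Python sketch below illustrates how a standard SDS test-and-diffuse loop could, in principle, be applied to majority-class undersampling. The function name sds_undersample, the single-feature test criterion (an agent counts its hypothesised instance as "useful" when it lies closer to a random minority-class point than to a random majority-class point on one randomly chosen feature), and the decision to retain the most-visited instances are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal, illustrative SDS-style undersampling sketch (assumptions flagged below).
import numpy as np

def sds_undersample(X_maj, X_min, n_agents=500, n_iters=200, seed=0):
    """Select a balanced subset of the majority class via an SDS-style search."""
    rng = np.random.default_rng(seed)
    n_maj, n_feat = X_maj.shape

    # Initialisation: each agent hypothesises one majority-class instance.
    hyp = rng.integers(0, n_maj, size=n_agents)
    votes = np.zeros(n_maj)  # how often each instance hosted an active agent

    for _ in range(n_iters):
        # Test phase: partial evaluation on a single randomly chosen feature.
        # ASSUMED test, not the paper's: an agent is active when its instance
        # lies closer to a random minority point than to a random majority
        # point on that feature (a proxy for boundary-like, useful data).
        f = rng.integers(0, n_feat, size=n_agents)
        x = X_maj[hyp, f]
        min_ref = X_min[rng.integers(0, len(X_min), size=n_agents), f]
        maj_ref = X_maj[rng.integers(0, n_maj, size=n_agents), f]
        active = np.abs(x - min_ref) < np.abs(x - maj_ref)
        np.add.at(votes, hyp[active], 1)

        # Diffusion phase: each inactive agent polls a random agent; if that
        # agent is active it copies its hypothesis, otherwise it re-seeds.
        partner = rng.integers(0, n_agents, size=n_agents)
        copy = ~active & active[partner]
        reset = ~active & ~active[partner]
        hyp[copy] = hyp[partner[copy]]
        hyp[reset] = rng.integers(0, n_maj, size=reset.sum())

    # Retain the majority instances the swarm clustered on (treated here as
    # the useful ones) until both classes have equal size; drop the rest.
    keep = np.argsort(votes)[::-1][: len(X_min)]
    return X_maj[keep]
```

Given feature matrices X_majority and X_minority, calling sds_undersample(X_majority, X_minority) would return a subset of the majority class with as many rows as the minority class, which can then be combined with the minority class before training a classifier such as an SVM.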

Keywords

Swarm intelligence · Class imbalance · Stochastic diffusion search · SVM

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

1. Department of Computing, Goldsmiths, University of London, London, UK