SMOTE Approach to Imbalanced Dataset in Logistic Regression Analysis

  • Amirah Hazwani Abdul Rahim (email author)
  • Nurazlina Abdul Rashid
  • Asmahani Nayan
  • Abd-Razak Ahmad
Conference paper


Logistic regression is a classification model commonly used in bankruptcy studies. The classifier works well when the data are balanced. However, imbalanced data sets are found in almost all bankruptcy studies. The most common way of dealing with an imbalanced data set is to select and match samples from the bankrupt and non-bankrupt groups. Both the imbalance itself and the approach taken to deal with it can undermine an otherwise good predictive model. The objective of this study is to improve the classification accuracy of a logit model when the data are heavily skewed toward one class. The approach taken is SMOTE sampling. The study used SMEs categorized under accommodation and food service activities and the hotel sector. There are 14 explanatory variables. The results of this study confirmed that the AUC and sensitivity values of the SMOTE Logistic Regression (SLR) model are higher than those of a logit model.
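As a rough illustration of the oversampling step, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE (synthetic minority over-sampling technique). It is a minimal sketch and not the authors' implementation: the function name `smote`, the toy two-feature data, the neighbour count `k=3`, and the choice to oversample until the classes are equal in size are all assumptions made for the example, since the abstract does not state these settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration: each new point lies on the line
    segment between a randomly chosen minority sample and one of its k
    nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 50 majority points and 5 minority points.
majority = [(float(x), float(y)) for x in range(10) for y in range(5)]
minority = [(20.0, 20.0), (21.0, 20.5), (20.5, 22.0), (22.0, 21.0), (21.5, 21.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3, seed=42)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

A logit model would then be fitted to the majority samples together with `balanced_minority`, and evaluated on held-out data that has not been oversampled, so that AUC and sensitivity reflect the original class distribution.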


Imbalanced data, SMOTE sampling, Logistic regression



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Amirah Hazwani Abdul Rahim (1), email author
  • Nurazlina Abdul Rashid (1)
  • Asmahani Nayan (1)
  • Abd-Razak Ahmad (1)
  1. Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Kedah, Malaysia
