Continuous Variable Binning Algorithm to Maximize Information Value Using Genetic Algorithm

  • Nattawut Vejkanchana
  • Pramote Kucharoen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1051)

Abstract

Binning (bucketing or discretization) is a commonly used data pre-processing technique for continuous predictive variables in machine learning. Guidelines for good binning can be treated as constraints, while certain statistics should also be optimized. We therefore view binning as a constrained optimization problem. This paper presents a novel supervised binning algorithm for binary classification problems based on a genetic algorithm, named GAbin, and demonstrates its use on a well-known dataset. It is inspired by the way humans bin continuous variables. To bin a variable, we first choose an output shape (e.g., monotonic, or best bins in the middle). Second, we define constraints (e.g., a minimum number of samples in each bin). Finally, we try to maximize key statistics that assess the quality of the output bins. The algorithm automates these steps. Its results follow the user-desired shape and satisfy the constraints. The experimental results show that GAbin is competitive with other binning algorithms. Moreover, GAbin maximizes information value while satisfying user-desired constraints such as monotonicity or output shape controls.
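As a rough illustration of the objective described in the abstract, the sketch below computes the information value (IV) of a binned variable using the standard credit-scoring definition, IV = Σ (p_good − p_bad) · ln(p_good / p_bad), and subtracts a penalty for constraint violations. This is not the GAbin implementation: the function names, the 0.5 smoothing term, and the penalty scheme are assumptions made for illustration only. A genetic algorithm could use a score of this kind as a fitness function over candidate cut points.

```python
import math


def information_value(values, labels, cut_points):
    """Return (iv, woe_per_bin, count_per_bin) for sorted cut points.

    Bin i holds values v with cut_points[i-1] <= v < cut_points[i].
    labels: 1 = event ("bad"), 0 = non-event ("good").
    """
    n_bins = len(cut_points) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for v, y in zip(values, labels):
        idx = sum(1 for c in cut_points if v >= c)  # index of the bin containing v
        if y == 1:
            bad[idx] += 1
        else:
            good[idx] += 1

    total_good, total_bad = sum(good), sum(bad)
    iv, woe = 0.0, []
    for g, b in zip(good, bad):
        # Assumption: add 0.5 to each count so empty cells do not give log(0).
        p_good = (g + 0.5) / (total_good + 0.5 * n_bins)
        p_bad = (b + 0.5) / (total_bad + 0.5 * n_bins)
        w = math.log(p_good / p_bad)  # weight of evidence of this bin
        woe.append(w)
        iv += (p_good - p_bad) * w
    counts = [g + b for g, b in zip(good, bad)]
    return iv, woe, counts


def fitness(values, labels, cut_points, min_samples=2, monotonic=True, penalty=1.0):
    """Hypothetical fitness: IV minus a fixed penalty per violated constraint."""
    iv, woe, counts = information_value(values, labels, sorted(cut_points))
    violations = sum(1 for c in counts if c < min_samples)
    if monotonic:
        diffs = [b - a for a, b in zip(woe, woe[1:])]
        increasing = all(d >= 0 for d in diffs)
        decreasing = all(d <= 0 for d in diffs)
        if not (increasing or decreasing):
            violations += 1
    return iv - penalty * violations


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [0, 0, 0, 1, 0, 1, 1, 1]
    print(fitness(xs, ys, [4.5]))  # balanced split, no violations
    print(fitness(xs, ys, [1.5]))  # first bin has one sample, so it is penalized
```

A penalty term is only one way to handle constraints in an evolutionary search; repairing or rejecting infeasible candidates are alternatives, and the abstract does not say which approach GAbin takes.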

Keywords

Binning · Genetic algorithm · Data pre-processing · Information value · Constrained optimization

Notes

Acknowledgments

This research was partially supported by Taskworld Inc.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Institute of Development Administration, Bangkok, Thailand
