Predicting direct marketing response in banking: comparison of class imbalance methods

  • Empirical article
Service Business

Abstract

Customer response is a central topic in direct marketing. This study proposes a data mining response model based on random forests to help define target customers for banking campaigns. Class imbalance is a typical problem in telemarketing and can degrade the performance of data mining techniques. The study also contributes to the literature by exploring the use of class imbalance methods in the banking context. The performance of an undersampling method (the EasyEnsemble algorithm) is compared with that of an oversampling method (the Synthetic Minority Oversampling Technique, SMOTE) in order to determine the most appropriate specification. The importance of the features included in the response model is also explored. In particular, discriminative performance was enhanced by the inclusion of demographic information, contact details and socio-economic features. Random forests combined with the undersampling algorithm achieved very high predictive performance, outperforming the other techniques explored.
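
The two resampling ideas compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, the toy data are invented, and the function names `smote_like` and `balanced_subsets` are hypothetical. It shows only the sampling step. The full EasyEnsemble algorithm also trains a classifier on each balanced subset and combines their outputs, which is omitted here.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by interpolating between a
    sampled point and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def balanced_subsets(majority, minority, n_subsets, rng=None):
    """EasyEnsemble-style sampling step: several random majority subsets,
    each the size of the minority class, paired with all minority points."""
    if rng is None:
        rng = random.Random(0)
    return [(rng.sample(majority, len(minority)), list(minority))
            for _ in range(n_subsets)]


# Toy data (invented): 20 non-responders and 4 responders, two features each.
rng = random.Random(42)
non_responders = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
responders = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(4)]

# Oversampling: grow the minority class until it matches the majority.
oversampled = responders + smote_like(responders, n_new=16, rng=rng)

# Undersampling: five balanced training sets, one per ensemble member.
subsets = balanced_subsets(non_responders, responders, n_subsets=5, rng=rng)
```

Oversampling keeps every majority record and adds interpolated minority records, whereas the undersampling route discards majority records within each subset but recovers coverage by drawing several subsets.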

Author information

Corresponding author

Correspondence to Vera L. Miguéis.

Appendix

See Tables 3 and 4.

Table 3 Dataset variables
Table 4 Variable importance measure

About this article

Cite this article

Miguéis, V.L., Camanho, A.S. & Borges, J. Predicting direct marketing response in banking: comparison of class imbalance methods. Serv Bus 11, 831–849 (2017). https://doi.org/10.1007/s11628-016-0332-3

  • DOI: https://doi.org/10.1007/s11628-016-0332-3
