Predicting direct marketing response in banking: comparison of class imbalance methods

Miguéis, Vera L.; Camanho, Ana S.; Borges, José

doi:10.1007/s11628-016-0332-3

Empirical article
Published: 02 January 2017

Volume 11, pages 831–849, (2017)
Service Business

Abstract

Customer response is a central concern in direct marketing. This study proposes a data mining response model based on random forests to help define the target customers of banking campaigns. Class imbalance is a typical problem in telemarketing and can degrade the performance of data mining techniques. The study also contributes to the literature by exploring the use of class imbalance methods in the banking context. The performance of an undersampling method (the EasyEnsemble algorithm) is compared with that of an oversampling method (the Synthetic Minority Oversampling Technique, SMOTE) to determine the most appropriate specification. The importance of the features included in the response model is also explored. In particular, discriminative performance was enhanced by the inclusion of demographic information, contact details and socio-economic features. Random forests combined with the undersampling algorithm achieved very high predictive performance, outperforming the other techniques explored.
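To make the contrast between the two class imbalance strategies concrete, the following is a minimal sketch of the resampling steps only, written in plain Python. It is not the authors' implementation: the function names `balanced_subsets` and `smote_like`, the toy data and all parameter values are assumptions made for illustration. The first function mimics the sampling idea behind EasyEnsemble (several random majority-class subsets, each paired with the full minority class, on which separate learners would then be trained); the second mimics the interpolation idea behind SMOTE (synthetic minority points placed between a minority point and one of its nearest minority neighbours).

```python
import random

def balanced_subsets(majority, minority, n_subsets, seed=0):
    """EasyEnsemble-style sampling (sketch): build several balanced training
    sets, each made of a random majority sample plus all minority rows."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        # Sample without replacement as many majority rows as minority rows.
        sampled = rng.sample(majority, len(minority))
        subsets.append(sampled + list(minority))
    return subsets

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (sketch): each synthetic row lies on the
    segment between a minority row and one of its k nearest minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy data (invented): 90 non-responders and 10 responders, two features each.
majority = [(float(i), float(i % 7)) for i in range(90)]
minority = [(100.0 + i, float(i % 3)) for i in range(10)]

subsets = balanced_subsets(majority, minority, n_subsets=4)
synthetic = smote_like(minority, n_new=80)

print(len(subsets), [len(s) for s in subsets])  # four balanced sets of 20 rows
print(len(minority) + len(synthetic))           # minority class grown to 90 rows
```

In this sketch the undersampling route discards majority rows within each subset but recovers information by using several subsets, whereas the oversampling route keeps every original row and adds interpolated ones. A classifier such as a random forest would be trained on each balanced subset, or on the oversampled data, as a separate step not shown here.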

Author information

Authors and Affiliations

Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
Vera L. Miguéis, Ana S. Camanho & José Borges

Corresponding author

Correspondence to Vera L. Miguéis.

Appendix

See Tables 3 and 4.

Table 3 Dataset variables

Table 4 Variable importance measure

About this article

Cite this article

Miguéis, V.L., Camanho, A.S. & Borges, J. Predicting direct marketing response in banking: comparison of class imbalance methods. Serv Bus 11, 831–849 (2017). https://doi.org/10.1007/s11628-016-0332-3

Received: 13 May 2016
Accepted: 20 December 2016
Published: 02 January 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s11628-016-0332-3
