Abstract
Many sales and service companies need to reach relevant customers when launching new products, services, or updated versions of existing products. In doing so, they need to target their existing customers, whose behavior indicates how products can best be sold to them. This paper presents a comparative study of machine learning techniques applied to the problem of customer purchasing behavior prediction. Experiments use supervised classification techniques, namely logistic regression, decision tree, k-nearest neighbors (KNN), Naïve Bayes, support vector machine (SVM), random forest, stochastic gradient descent (SGD), artificial neural network (ANN), AdaBoost, XGBoost, and a dummy classifier, as well as hybrid algorithms built by stacking, such as SvmAda, RfAda, and KnnSgd. Models are evaluated using cross-validation, and the confusion matrix and ROC curve are used to assess the accuracy of each model. The best classifier is the hybrid stacking ensemble KnnSgd, with an accuracy of 92.42%. KnnSgd gives the highest accuracy when the maximum number of features is used, because the errors of the KNN and SGD base models are minimized by the KNN applied as the final stage.
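As an illustration of the stacking idea described in the abstract, the following minimal sketch combines KNN and SGD base learners with a KNN final estimator and scores the ensemble with cross-validation. It is not the authors' implementation: the use of scikit-learn, the synthetic data from `make_classification`, the feature scaling step, the fold count, and all hyperparameters (such as `n_neighbors=5`) are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the paper's dataset is not used here.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Base learners: KNN and a linear model trained with SGD.
# Scaling is added because both are sensitive to feature magnitudes (assumption).
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("sgd", make_pipeline(StandardScaler(), SGDClassifier(random_state=0))),
]

# Stacked ensemble: a final KNN is trained on the base learners'
# out-of-fold outputs (5 internal folds).
knn_sgd = StackingClassifier(
    estimators=base_learners,
    final_estimator=KNeighborsClassifier(n_neighbors=5),
    cv=5,
)

# 5-fold cross-validated accuracy of the whole stacked model.
scores = cross_val_score(knn_sgd, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"Mean cross-validated accuracy: {mean_accuracy:.4f}")
```

The accuracy printed here reflects only the synthetic data and the assumed settings, so it is not comparable to the 92.42% reported in the abstract.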
Cite this article
Chaubey, G., Gavhane, P.R., Bisen, D. et al. Customer purchasing behavior prediction using machine learning classification techniques. J Ambient Intell Human Comput 14, 16133–16157 (2023). https://doi.org/10.1007/s12652-022-03837-6
DOI: https://doi.org/10.1007/s12652-022-03837-6