Customer purchasing behavior prediction using machine learning classification techniques

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Many sales and service companies need to reach relevant customers when launching new products, services, or updated versions of existing products. In doing so, they need to target their existing customers, whose behavior gives companies information about how to sell products. This paper presents a comparative study of machine learning techniques applied to the problem of customer purchasing behavior prediction. Experiments use supervised classification techniques such as logistic regression, decision tree, k-nearest neighbors (KNN), Naïve Bayes, support vector machine (SVM), random forest, stochastic gradient descent (SGD), artificial neural network (ANN), AdaBoost, XGBoost, and a dummy classifier, as well as hybrid algorithms that use stacking, such as SvmAda, RfAda, and KnnSgd. Models are evaluated using cross-validation. Furthermore, the confusion matrix and ROC curve are used to assess the accuracy of each model. The best classifier is the hybrid classifier built with the ensemble stacking technique (KnnSgd), with an accuracy of 92.42%. KnnSgd gives the highest accuracy with the maximum number of features because the errors of KNN and SGD are minimized by the KNN at the end.
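
The stacking setup described in the abstract (KNN and SGD as base learners, with a KNN at the end) might be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the synthetic data from `make_classification`, the hyperparameters, the 5-fold splits, and all variable names are assumptions, and the scores it produces are unrelated to the 92.42% reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): the paper's dataset is not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Base learners: KNN and a linear classifier trained with SGD.
# Hyperparameters are illustrative assumptions.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("sgd", SGDClassifier(random_state=0)),
]

# A KNN final estimator is fitted on the base learners' out-of-fold outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=KNeighborsClassifier(n_neighbors=5),
    cv=5,
)

# Cross-validated accuracy of the stacked model.
accuracies = cross_val_score(stack, X, y, cv=5, scoring="accuracy")
mean_accuracy = accuracies.mean()

# Out-of-fold class probabilities, used for the confusion matrix and ROC AUC.
proba = cross_val_predict(stack, X, y, cv=5, method="predict_proba")
predictions = proba.argmax(axis=1)  # class labels are 0 and 1 here
cm = confusion_matrix(y, predictions)
auc = roc_auc_score(y, proba[:, 1])

print(f"mean CV accuracy: {mean_accuracy:.3f}")
print("confusion matrix:")
print(cm)
print(f"ROC AUC: {auc:.3f}")
```

Fitting the final estimator on out-of-fold predictions (the `cv=5` argument of `StackingClassifier`) keeps the meta-learner from seeing base-model outputs on data those models were trained on, which is the usual motivation for stacking over simply chaining models.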

Author information

Corresponding author

Correspondence to Gyanendra Chaubey.

About this article

Cite this article

Chaubey, G., Gavhane, P.R., Bisen, D. et al. Customer purchasing behavior prediction using machine learning classification techniques. J Ambient Intell Human Comput 14, 16133–16157 (2023). https://doi.org/10.1007/s12652-022-03837-6

  • DOI: https://doi.org/10.1007/s12652-022-03837-6
