
TSUNAMI - an explainable PPM approach for customer churn prediction in evolving retail data environments

  • Research
  • Journal of Intelligent Information Systems

Abstract

Retail companies have a strong interest in continuously monitoring customers' purchase traces, so that they can identify weak customers and take the actions needed to improve customer satisfaction and protect revenue. In this paper, we formulate customer churn prediction as a Predictive Process Monitoring (PPM) problem to be addressed under the possibly dynamic conditions of evolving retail data environments. To this aim, we propose TSUNAMI, a PPM approach for monitoring customer loyalty in the retail sector. It processes online the stream of sale receipts produced by the customers of a retail company and learns a deep neural model for the early detection of customer purchase traces that are likely to result in future churn. In addition, the proposed approach integrates a mechanism that detects concept drifts in customer purchase traces and adapts the deep neural model to them. Finally, to make customer purchase monitoring decisions explainable to potential stakeholders, we analyse the Shapley values of decisions to explain which characteristics of the customer purchase traces are most relevant for separating churners from non-churners and how these characteristics may have changed over time. Experiments with two benchmark retail data sets explore the effectiveness of the proposed approach.
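To give a concrete picture of what detecting a concept drift on a stream can look like, the following is a minimal sketch under stated assumptions; it is not the mechanism used in TSUNAMI. It monitors a stream of 0/1 prediction errors and flags a drift when the mean error of the most recent window differs from that of a reference window by more than a Hoeffding-style threshold. The class name `TwoWindowDriftDetector`, the helper `first_drift_index`, the fixed window size and the value of `delta` are all hypothetical choices made for illustration. A fixed-size two-window test is a deliberate simplification of adaptive windowing, in which the window length itself is adjusted.

```python
import math
from collections import deque


class TwoWindowDriftDetector:
    """Illustrative detector (hypothetical, not the paper's mechanism).

    Compares the mean of a fixed reference window with the mean of the
    most recent window of equal size. Values are assumed to lie in [0, 1].
    """

    def __init__(self, window_size=50, delta=0.002):
        self.window_size = window_size
        self.reference = deque(maxlen=window_size)
        self.recent = deque(maxlen=window_size)
        # Hoeffding bound on the gap between two means of n samples each,
        # holding with probability at least 1 - delta when there is no drift.
        self.epsilon = math.sqrt(2.0 * math.log(4.0 / delta) / window_size)

    def update(self, error):
        """Consume one observation; return True if a drift is flagged."""
        if len(self.reference) < self.window_size:
            self.reference.append(error)  # still filling the reference window
            return False
        self.recent.append(error)  # sliding window over the latest values
        if len(self.recent) < self.window_size:
            return False
        n = self.window_size
        gap = abs(sum(self.recent) / n - sum(self.reference) / n)
        if gap > self.epsilon:
            # Drift: the recent window becomes the new reference. A caller
            # would adapt the predictive model at this point.
            self.reference = deque(self.recent, maxlen=n)
            self.recent.clear()
            return True
        return False


def first_drift_index(errors, **kwargs):
    """Return the index of the first flagged drift, or None if none occurs."""
    detector = TwoWindowDriftDetector(**kwargs)
    for i, error in enumerate(errors):
        if detector.update(error):
            return i
    return None


# Synthetic error streams (assumed data, for illustration only).
stable = [0] * 300                # the model keeps predicting correctly
shifted = [0] * 100 + [1] * 100   # the model starts failing at index 100

print("stable stream:", first_drift_index(stable))
print("shifted stream:", first_drift_index(shifted))
```

On the stable stream no drift is flagged, while on the shifted stream the drift is flagged some observations after the change point, once enough errors have accumulated in the recent window. A smaller window reacts faster but raises the threshold, which is the trade-off that adaptive windowing is designed to manage automatically.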

Data, Material, and/or Code Availability

The code that supports the findings of this study and the data extracted for training the classification algorithms are available at https://github.com/vinspdb/TSUNAMI.

Notes

  1. Any accuracy metric can be used in this place.

  2. As an additional constraint, let us consider that retail data are commonly recorded for 18 months, hence serialized values of \(\textbf{T}\) older than 18 months can also be removed from the disk.

  3. https://archive.ics.uci.edu/dataset/502/online+retail+ii

  4. https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce

  5. The source code is available online at https://github.com/vinspdb/TSUNAMI

  6. https://riverml.xyz/dev/api/drift/ADWIN/

  7. https://shap-lrjball.readthedocs.io/en/latest/index.html

Acknowledgements

The work of Vincenzo Pasquadibisceglie was supported by the project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI, under the NRRP MUR program funded by the NextGenerationEU. The work of Annalisa Appice, Donato Malerba and Giuseppe Ieva was in partial fulfilment of the research objectives of the Research Contract “LUTECH DIGITALE 4.0: Progetto di Tecniche di Machine Learning predittivo per la piattaforma di loyalty Management” within the project “LUTECH DIGITALE 4.0”. We thank the reviewers for the useful suggestions they provided to improve the manuscript.

Author information

Contributions

V. P.: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Writing - review & editing. A. A.: Conceptualization, Methodology, Validation, Investigation, Writing - original draft, Writing - review & editing, Supervision, Project administration. G. I.: Conceptualization, Investigation, Writing - review & editing. D. M.: Conceptualization, Methodology, Writing - original draft, Writing - review & editing.

Corresponding author

Correspondence to Vincenzo Pasquadibisceglie.

Ethics declarations

Ethics Approval

We declare that this submission follows the policies as outlined in the Guide for Authors. The current research involves no Human Participants and/or Animals.

Conflict of Interests

The authors declare that they have no conflict of interest.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pasquadibisceglie, V., Appice, A., Ieva, G. et al. TSUNAMI - an explainable PPM approach for customer churn prediction in evolving retail data environments. J Intell Inf Syst (2023). https://doi.org/10.1007/s10844-023-00838-5

  • DOI: https://doi.org/10.1007/s10844-023-00838-5
