Abstract
Repeat buyer prediction is crucial for e-commerce companies seeking to improve their customer service and product sales. In business practice, knowing which factors or rules drive repeat purchases is as important as the predictions themselves. A model that is both interpretable and highly predictive is therefore required. Many classifiers, such as the multilayer perceptron, offer strong predictive ability but lack interpretability. Tree-based models are interpretable, but their predictive performance is usually weaker. Based on these observations, we design an approach that uses model distillation and heterogeneous classifier fusion to balance the predictive and interpretable performance of a decision tree. Specifically, we first train multiple heterogeneous classifiers and integrate them through several combination operators. The resulting classifier combination then serves as the teacher model, and the soft targets it produces guide the training of the decision tree. We use a real-world repeat buyer prediction dataset and construct features from three aspects: users, merchants, and user–merchant pairs. Experimental results show that both the accuracy and the AUC of the decision tree improve, and we provide model interpretations from three aspects.
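The abstract outlines a three-step pipeline: fuse heterogeneous classifiers, treat the fusion as a teacher, and train a decision tree on the teacher's soft targets. The plain-Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the combination operators are the standard fixed rules (mean, median, max, min, product) applied to each classifier's probability of the positive class (repeat buyer), and that the decision tree is a squared-error regression tree fitted directly to the teacher's probabilities. The helpers `combine`, `fit_tree`, and `predict`, and the toy data, are hypothetical.

```python
from math import prod
from statistics import mean, median

# Assumption: the paper's exact operators are not given here; these are the
# common fixed (non-trainable) combination rules.
RULES = {"mean": mean, "median": median, "max": max, "min": min, "product": prod}


def combine(prob_lists, operator="mean"):
    """Fuse base classifiers into a teacher (hypothetical helper).

    prob_lists[k][i] is classifier k's P(repeat buyer) for sample i.
    Returns one soft target per sample: the rule's support for class 1,
    normalised against its support for class 0.
    """
    rule = RULES[operator]
    soft = []
    for ps in zip(*prob_lists):  # ps: every classifier's P(class 1) for one sample
        s1 = rule(ps)
        s0 = rule([1.0 - p for p in ps])
        soft.append(s1 / (s1 + s0) if s1 + s0 > 0 else 0.5)
    return soft


def _sse(values):
    """Sum of squared errors around the mean (the CART regression criterion)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, max_depth=3, min_leaf=2, depth=0):
    """Greedy regression tree fitted to soft targets y.

    Assumption: distillation is done by minimising squared error between
    the tree's leaf values and the teacher's probabilities.
    """
    value = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return {"leaf": value}
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None or best[0] >= _sse(y):  # no split reduces the error
        return {"leaf": value}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left],
                         max_depth, min_leaf, depth + 1),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right],
                          max_depth, min_leaf, depth + 1),
    }


def predict(tree, row):
    """Follow the splits to a leaf; the leaf value is the distilled P(repeat buyer)."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Toy data (made up): one feature per sample and three base classifiers' outputs.
X = [[0], [1], [2], [3], [4], [5]]
base_probs = [
    [0.1, 0.2, 0.2, 0.8, 0.9, 0.9],  # classifier A
    [0.2, 0.1, 0.3, 0.7, 0.8, 0.9],  # classifier B
    [0.0, 0.3, 0.1, 0.9, 0.7, 0.9],  # classifier C
]
teacher = combine(base_probs, "mean")
student = fit_tree(X, teacher, max_depth=1)
print(student["feature"], student["threshold"], round(predict(student, [4]), 3))
```

Run as-is, this prints `0 2 0.833`: the student learns a single readable rule (feature 0 above 2 implies a high repeat-purchase probability). Thresholding `predict` at 0.5 would give a hard label, and `max_depth` trades fidelity to the teacher against the size, and hence readability, of the tree.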
Acknowledgements
This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 71620107002, 61502360 and 71821001).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Shen, Y., Xu, X. & Cao, J. Reconciling predictive and interpretable performance in repeat buyer prediction via model distillation and heterogeneous classifiers fusion. Neural Comput & Applic 32, 9495–9508 (2020). https://doi.org/10.1007/s00521-019-04462-9