Comparing unsupervised probabilistic machine learning methods for market basket analysis

Abstract

We compare several unsupervised probabilistic machine learning methods for market basket analysis, namely binary factor analysis, two topic models (latent Dirichlet allocation and the correlated topic model), the restricted Boltzmann machine and the deep belief net. After an overview of previous applications of unsupervised probabilistic machine learning methods to market basket analysis, we briefly present the methods we investigate and outline their estimation. Performance is measured by tenfold cross-validated log likelihood values. Binary factor analysis vastly outperforms topic models. The restricted Boltzmann machine attains a similar performance advantage over binary factor analysis. Overall, a deep belief net with 45 variables in the first and 15 variables in the second hidden layer turns out to be the best model. We also compare the investigated machine learning methods with respect to ease of interpretation and runtimes. In addition, we show how to interpret the relationships between hidden variables and observed category purchases. To demonstrate managerial implications, we estimate the effect of promoting each category on both the purchase probabilities of other product categories and the relative increase of basket size. Finally, we indicate several possibilities to extend restricted Boltzmann machines and deep belief nets for market basket analysis.
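The evaluation criterion named in the abstract, tenfold cross-validated log likelihood, can be illustrated with a minimal Python sketch. It assumes an independent Bernoulli baseline with Laplace smoothing, which is not one of the models compared in the article, and it uses synthetic basket data. All function names are hypothetical.

```python
import numpy as np

def bernoulli_log_likelihood(baskets, probs):
    """Log likelihood of binary baskets under independent category purchases."""
    return float(np.sum(baskets * np.log(probs) + (1 - baskets) * np.log(1 - probs)))

def cross_validated_log_likelihood(baskets, n_folds=10, seed=0):
    """Average held-out log likelihood per basket over n_folds folds."""
    rng = np.random.default_rng(seed)
    # Shuffle basket indices and split them into n_folds disjoint folds.
    folds = np.array_split(rng.permutation(len(baskets)), n_folds)
    total = 0.0
    for held_out in folds:
        train = np.delete(baskets, held_out, axis=0)
        # Assumed baseline: smoothed purchase frequency of each category.
        probs = (train.sum(axis=0) + 1.0) / (len(train) + 2.0)
        total += bernoulli_log_likelihood(baskets[held_out], probs)
    return total / len(baskets)

# Synthetic data for illustration only: 200 baskets over 6 categories.
rng = np.random.default_rng(1)
baskets = (rng.random((200, 6)) < 0.3).astype(int)
cv_ll = cross_validated_log_likelihood(baskets)
print(f"cross-validated log likelihood per basket: {cv_ll:.3f}")
```

Any other model could be plugged in by replacing the two lines that fit `probs` and score the held-out fold. A higher (less negative) value indicates better out-of-sample fit.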

Author information

Corresponding author

Correspondence to Harald Hruschka.

Appendices

Appendix 1: Topic models

Table 10 shows the topic proportions of the investigated LDA and CTM, each with two topics.

Table 10 Topic proportions of categories

Appendix 2: Binary factor analysis

Table 11 contains the ten highest information values for the one-factor BFA model.

Table 11 Information values for the one-factor BFA

Appendix 3: Restricted Boltzmann machine

Tables 12, 13 and 14 list the hidden variables of the selected RBM (DBN), sorted by the sum of absolute marginal effects in descending order. For each hidden variable, we show the five categories with the highest information values, separately for positive and negative weights \(W_{jk}\).

Table 12 Sum of absolute marginal effects and information values for the RBM (part 1)
Table 13 Sum of absolute marginal effects and information values for the RBM (part 2)
Table 14 Sum of absolute marginal effects and information values for the RBM (part 3)
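The layout of these tables can be sketched in Python as follows. This is only an illustration of the sorting and selection steps described above. It assumes that the weights, marginal effects and information values are already available as arrays of shape (categories, hidden variables), and it does not show how the article computes them. The function name and the example numbers are hypothetical.

```python
import numpy as np

def rank_hidden_variables(weights, effects, information, categories, top=5):
    """Sort hidden variables by summed absolute marginal effect (descending)
    and list the top categories by information value for each weight sign."""
    sum_abs = np.abs(effects).sum(axis=0)
    order = np.argsort(-sum_abs, kind="stable")
    report = []
    for k in order:
        # Categories ordered by information value for hidden variable k.
        by_info = np.argsort(-information[:, k], kind="stable")
        positive = [categories[j] for j in by_info if weights[j, k] > 0][:top]
        negative = [categories[j] for j in by_info if weights[j, k] < 0][:top]
        report.append({
            "hidden": int(k),
            "sum_abs_effect": float(sum_abs[k]),
            "positive": positive,
            "negative": negative,
        })
    return report

# Made-up inputs: 4 categories, 2 hidden variables.
categories = ["milk", "bread", "beer", "wine"]
weights = np.array([[1.2, -0.4], [0.8, 0.3], [-0.5, 0.9], [-1.1, -0.2]])
effects = np.array([[0.10, -0.02], [0.05, 0.01], [-0.03, 0.20], [-0.08, -0.30]])
information = np.abs(weights)  # placeholder values, not the article's measure

report = rank_hidden_variables(weights, effects, information, categories)
for row in report:
    print(row)
```

The same function would apply to the deep belief net tables by passing the corresponding weight, effect and information arrays.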

Appendix 4: Deep belief net

Tables 15, 16 and 17 list the hidden variables of the selected DBN, sorted by the sum of absolute marginal effects in descending order. For each hidden variable, we show the five categories with the highest information values, separately for positive and negative weights \(W_{3lj}\).

Table 15 Sum of absolute marginal effects and information values for the DBN (part 1)
Table 16 Sum of absolute marginal effects and information values for the DBN (part 2)
Table 17 Sum of absolute marginal effects and information values for the DBN (part 3)

Cite this article

Hruschka, H. Comparing unsupervised probabilistic machine learning methods for market basket analysis. Rev Manag Sci 15, 497–527 (2021). https://doi.org/10.1007/s11846-019-00349-0

Keywords

  • Machine learning
  • Market basket analysis
  • Factor analysis
  • Topic models
  • Restricted Boltzmann machine
  • Deep learning

JEL Classification

  • M31
  • L81
  • D12
  • C45
  • C89