
Comparing unsupervised probabilistic machine learning methods for market basket analysis

Original Paper, Review of Managerial Science

Abstract

We compare several unsupervised probabilistic machine learning methods for market basket analysis, namely binary factor analysis, two topic models (latent Dirichlet allocation and the correlated topic model), the restricted Boltzmann machine and the deep belief net. After an overview of previous applications of unsupervised probabilistic machine learning methods to market basket analysis, we briefly present the investigated methods and outline their estimation. Performance is measured by tenfold cross-validated log likelihood values. Binary factor analysis vastly outperforms the topic models. The restricted Boltzmann machine attains a similar performance advantage over binary factor analysis. Overall, a deep belief net with 45 variables in the first and 15 variables in the second hidden layer turns out to be the best model. We also compare the investigated machine learning methods with respect to ease of interpretation and runtimes. In addition, we show how to interpret the relationships between hidden variables and observed category purchases. To demonstrate managerial implications, we estimate the effect of promoting each category on the purchase probabilities of the other product categories and on the relative increase of basket size. Finally, we indicate several possibilities to extend restricted Boltzmann machines and deep belief nets for market basket analysis.
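As a purely illustrative sketch of the evaluation protocol described above (not the paper's implementation), the snippet below computes tenfold cross-validated log likelihood values on a hypothetical binary basket matrix, using scikit-learn's LatentDirichletAllocation as a stand-in topic model; the data generator, seeds and candidate topic numbers are assumptions.

```python
# Minimal sketch: tenfold cross-validated log likelihood for a topic model
# fitted to a binary market basket matrix (households x categories).
# Illustrative only; the paper's own estimation code is not reproduced here.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Hypothetical data: 1000 baskets over 60 product categories (0/1 purchases).
baskets = (rng.random((1000, 60)) < 0.15).astype(int)

def cv_loglik(n_topics, X, n_splits=10):
    """Average held-out log likelihood (variational bound) across folds."""
    scores = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=1).split(X):
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=1)
        lda.fit(X[train_idx])
        # score() returns an approximate log likelihood of the held-out baskets.
        scores.append(lda.score(X[test_idx]))
    return float(np.mean(scores))

for k in (2, 5, 10):
    print(k, cv_loglik(k, baskets))
```

The same cross-validation loop can wrap any of the compared models, as long as the model provides a (possibly approximate) held-out log likelihood.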



Author information

Correspondence to Harald Hruschka.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Topic models

Table 10 shows the topic proportions of the investigated LDA and CTM, each with two topics; a brief illustration of how such proportions can be read off a fitted model follows the table.

Table 10 Topic proportions of categories
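As a hedged illustration only (Table 10 gives the actual values), the topic-category distributions of a two-topic LDA can be extracted roughly as follows; the basket matrix and category names are hypothetical, and scikit-learn stands in for the implementations used in the paper.

```python
# Sketch: per-topic category distributions from a two-topic LDA fit.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
baskets = (rng.random((1000, 60)) < 0.15).astype(int)        # hypothetical 0/1 purchases
category_names = [f"category_{j}" for j in range(baskets.shape[1])]

lda = LatentDirichletAllocation(n_components=2, random_state=1).fit(baskets)
# components_ holds topic-category weights; normalizing each row gives the
# share of a topic's probability mass assigned to each category.
topic_category = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
for t, row in enumerate(topic_category):
    top = np.argsort(row)[::-1][:5]
    print(f"topic {t}:", [(category_names[j], round(float(row[j]), 3)) for j in top])
```

Normalizing the columns instead would give, for each category, its proportions across the two topics.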

Appendix 2: Binary factor analysis

Table 11 contains the ten highest information values for the one-factor BFA model; an illustrative sketch of one common information measure follows the table.

Table 11 Information values for the one-factor BFA
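The exact definition of the information values is given in the main text; as a rough stand-in only, the sketch below ranks categories by the standard item information of a one-factor two-parameter IRT model, \(I_j(\theta) = a_j^2 P_j(\theta)\,(1-P_j(\theta))\) with \(P_j(\theta) = \sigma(a_j\theta + b_j)\), evaluated at \(\theta = 0\); the loadings, intercepts and category names are hypothetical placeholders for estimated BFA parameters.

```python
# Rough sketch (not necessarily the paper's definition): rank categories of a
# one-factor binary factor analysis / 2PL model by item information at theta = 0.
import numpy as np

a = np.array([1.8, 0.4, 1.1, 2.3, 0.7])         # hypothetical loadings (discriminations)
b = np.array([-1.0, 0.2, -0.5, -1.5, 0.3])      # hypothetical intercepts
names = ["cat_A", "cat_B", "cat_C", "cat_D", "cat_E"]  # hypothetical categories

def item_information(a, b, theta=0.0):
    p = 1.0 / (1.0 + np.exp(-(a * theta + b)))  # purchase probability at theta
    return a ** 2 * p * (1.0 - p)               # 2PL item information

info = item_information(a, b)
for j in np.argsort(info)[::-1]:
    print(f"{names[j]}: information = {info[j]:.3f}")
```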

Appendix 3: Restricted Boltzmann machine

Tables 12, 13 and 14 list the hidden variables of the selected RBM (DBN) sorted by the sum of absolute marginal effects in descending order. For each hidden variable we show the five categories with the highest information values, separately for positive and negative weights \(W_{jk}\); a rough sketch of this kind of weight inspection follows the tables.

Table 12 Sum of absolute marginal effects and information values for the RBM (part 1)
Table 13 Sum of absolute marginal effects and information values for the RBM (part 2)
Table 14 Sum of absolute marginal effects and information values for the RBM (part 3)
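A hedged sketch of the kind of inspection summarized in Tables 12, 13 and 14: train a Bernoulli RBM on the binary basket matrix and, for each hidden variable, list the categories with the largest positive and largest negative weights \(W_{jk}\). scikit-learn's BernoulliRBM and the random data are stand-ins; the paper's marginal effects and information values are not reproduced.

```python
# Sketch: inspect RBM weights per hidden variable (cf. Tables 12-14).
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
baskets = (rng.random((1000, 60)) < 0.15).astype(int)        # hypothetical 0/1 purchases
category_names = [f"category_{j}" for j in range(baskets.shape[1])]

rbm = BernoulliRBM(n_components=45, learning_rate=0.05, n_iter=30, random_state=1)
rbm.fit(baskets)

# components_ has shape (n_hidden, n_categories); each row holds the weights
# W_jk connecting one hidden variable to all observed category purchases.
for j, w in enumerate(rbm.components_[:3]):                  # first three hidden units
    pos = np.argsort(w)[::-1][:5]                            # strongest positive weights
    neg = np.argsort(w)[:5]                                   # strongest negative weights
    print(f"hidden {j}: + {[category_names[k] for k in pos]}"
          f"  - {[category_names[k] for k in neg]}")
```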

Appendix 4: Deep belief net

Tables 15, 16 and 17 list the hidden variables of the selected DBN sorted by the sum of absolute marginal effects in descending order. For each hidden variable we show the five categories with the highest information values, separately for positive and negative weights \(W_{3lj}\); a sketch of greedy layer-wise pretraining of such a net follows the tables.

Table 15 Sum of absolute marginal effects and information values for the DBN (part 1)
Table 16 Sum of absolute marginal effects and information values for the DBN (part 2)
Table 17 Sum of absolute marginal effects and information values for the DBN (part 3)
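The best model reported in the paper is a deep belief net with 45 hidden variables in the first and 15 in the second hidden layer. The following minimal sketch shows greedy layer-wise pretraining of such a 45-15 architecture by stacking two Bernoulli RBMs; scikit-learn and the random data are stand-ins for the paper's implementation.

```python
# Sketch: greedy layer-wise pretraining of a 45-15 deep belief net
# from two stacked Bernoulli RBMs (stand-in for the paper's DBN code).
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
baskets = (rng.random((1000, 60)) < 0.15).astype(int)   # hypothetical 0/1 purchases

# First layer: RBM with 45 hidden variables trained on observed purchases.
rbm1 = BernoulliRBM(n_components=45, learning_rate=0.05, n_iter=30, random_state=1)
h1 = rbm1.fit_transform(baskets)    # P(h1 = 1 | v), input to the second layer

# Second layer: RBM with 15 hidden variables trained on first-layer activations.
rbm2 = BernoulliRBM(n_components=15, learning_rate=0.05, n_iter=30, random_state=1)
h2 = rbm2.fit_transform(h1)

print(h1.shape, h2.shape)           # (1000, 45) and (1000, 15)
```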

About this article

Cite this article

Hruschka, H. Comparing unsupervised probabilistic machine learning methods for market basket analysis. Rev Manag Sci 15, 497–527 (2021). https://doi.org/10.1007/s11846-019-00349-0

