Challenges and opportunities in high-dimensional choice data analyses

Marketing Letters

Abstract

Modern businesses routinely capture data on millions of observations across subjects, brand SKUs, time periods, predictor variables, and store locations, thereby generating massive high-dimensional datasets. For example, Netflix has choice data on billions of movies selected, user ratings, and geodemographic characteristics. Similar datasets emerge in retailing with the potential use of RFID, online auctions (e.g., eBay), social networking sites (e.g., MySpace), product reviews (e.g., Epinions), customer relationship marketing, internet commerce, and mobile marketing. We envision massive databases as four-way VAST matrix arrays of Variables × Alternatives × Subjects × Time where at least one dimension is very large. Predictive choice modeling of such massive databases poses novel computational and modeling issues, and the failure of academic research to address them will result in a disconnect from marketing practice and an impoverishment of marketing theory. To address these issues, we identify and discuss the challenges and opportunities for both practicing and academic marketers. Thus, we offer an impetus for advancing research in this nascent area and fostering collaboration across scientific disciplines to improve the practice of marketing in information-rich environments.
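
To make the four-way VAST structure concrete, the following is a minimal sketch of one way such an array might be held in memory, assuming that most cells are unobserved and only observed cells are stored. The class name `VastArray`, the dimension sizes, and the example values are hypothetical illustrations and are not taken from the article.

```python
from collections import namedtuple

# Hypothetical dimension sizes; in a VAST array at least one is very large.
Shape = namedtuple("Shape", ["variables", "alternatives", "subjects", "time"])


class VastArray:
    """Sparse four-way array indexed as (variable, alternative, subject, time)."""

    def __init__(self, shape):
        self.shape = shape
        self.cells = {}  # only observed cells are stored

    def set(self, v, a, s, t, value):
        index = (v, a, s, t)
        # Reject indices that fall outside the declared dimensions.
        if not all(0 <= i < n for i, n in zip(index, self.shape)):
            raise IndexError(f"index {index} outside shape {tuple(self.shape)}")
        self.cells[index] = value

    def get(self, v, a, s, t, default=0.0):
        # Unobserved cells fall back to the default value.
        return self.cells.get((v, a, s, t), default)

    def density(self):
        """Fraction of all cells that hold an observation."""
        total = 1
        for n in self.shape:
            total *= n
        return len(self.cells) / total

    def subject_slice(self, s):
        """Observed cells for one subject, keyed by (variable, alternative, time)."""
        return {(v, a, t): x
                for (v, a, s2, t), x in self.cells.items() if s2 == s}


# Illustrative (assumed) sizes: 3 variables, 10,000 alternatives,
# 1,000,000 subjects, 52 time periods.
shape = Shape(variables=3, alternatives=10_000, subjects=1_000_000, time=52)
data = VastArray(shape)
data.set(0, 42, 7, 3, 4.5)  # variable 0 might be a rating
data.set(1, 42, 7, 3, 1.0)  # variable 1 might be a choice indicator
data.set(0, 99, 8, 5, 2.0)
```

A dense array of the assumed size would hold about 1.56 trillion cells, which is why this sketch stores only the observed ones. Calling `data.subject_slice(7)` returns the two cells recorded for subject 7, and `data.density()` shows how small the observed fraction is.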

Author information

Corresponding author

Correspondence to Michel Wedel.

Additional information

Prasad Naik and Michel Wedel are co-chairs.

About this article

Cite this article

Naik, P., Wedel, M., Bacon, L. et al. Challenges and opportunities in high-dimensional choice data analyses. Mark Lett 19, 201–213 (2008). https://doi.org/10.1007/s11002-008-9036-3

  • DOI: https://doi.org/10.1007/s11002-008-9036-3