Challenges and opportunities in high-dimensional choice data analyses

Naik, Prasad; Wedel, Michel; Bacon, Lynd; Bodapati, Anand; Bradlow, Eric; Kamakura, Wagner; Kreulen, Jeffrey; Lenk, Peter; Madigan, David M.; Montgomery, Alan

doi:10.1007/s11002-008-9036-3

Published: 31 May 2008

Volume 19, pages 201–213, (2008)

Marketing Letters

Abstract

Modern businesses routinely capture data on millions of observations across subjects, brand SKUs, time periods, predictor variables, and store locations, thereby generating massive high-dimensional datasets. For example, Netflix has choice data on billions of movies selected, user ratings, and geodemographic characteristics. Similar datasets emerge in retailing with the potential use of RFIDs, online auctions (e.g., eBay), social networking sites (e.g., mySpace), product reviews (e.g., ePinion), customer relationship marketing, internet commerce, and mobile marketing. We envision massive databases as four-way VAST matrix arrays of Variables × Alternatives × Subjects × Time, where at least one dimension is very large. Predictive choice modeling of such massive databases poses novel computational and modeling issues, and the failure of academic research to address them will result in a disconnect from marketing practice and an impoverishment of marketing theory. To address these issues, we identify and discuss the challenges and opportunities for both practicing and academic marketers. Thus, we offer an impetus for advancing research in this nascent area and fostering collaboration across scientific disciplines to improve the practice of marketing in information-rich environments.
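
The four-way VAST array described in the abstract can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: the class name `VastArray`, its methods, and the choice of a dictionary-backed sparse layout are hypothetical and do not come from the article. The sketch stores only observed cells, since a dense array with one very large dimension would rarely fit in memory.

```python
class VastArray:
    """Hypothetical sparse four-way array: Variables x Alternatives x Subjects x Time."""

    def __init__(self, n_variables, n_alternatives, n_subjects, n_periods):
        self.shape = (n_variables, n_alternatives, n_subjects, n_periods)
        self._cells = {}  # only observed cells are stored, keyed by (v, a, s, t)

    def _check(self, index):
        # Reject indices that fall outside the declared dimensions.
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(f"index {index} outside shape {self.shape}")

    def set(self, v, a, s, t, value):
        self._check((v, a, s, t))
        self._cells[(v, a, s, t)] = value

    def get(self, v, a, s, t, default=0.0):
        # Unobserved cells return the default instead of occupying memory.
        self._check((v, a, s, t))
        return self._cells.get((v, a, s, t), default)

    def subject_slice(self, s):
        """Return all observed (variable, alternative, time) cells for one subject."""
        return {(v, a, t): x
                for (v, a, subj, t), x in self._cells.items() if subj == s}

    def density(self):
        """Fraction of cells that are actually observed."""
        total = 1
        for n in self.shape:
            total *= n
        return len(self._cells) / total


# Illustrative sizes only: 3 variables, 1,000 alternatives,
# 1,000,000 subjects, 52 weekly periods.
vast = VastArray(3, 1000, 1_000_000, 52)
vast.set(0, 17, 42, 5, 1.0)  # subject 42 chose alternative 17 in week 5
print(vast.get(0, 17, 42, 5), vast.density())
```

With these illustrative sizes the full array would hold 156 billion cells, while the sketch keeps a single entry, which is the practical reason a sparse representation is assumed here.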

Author information

Authors and Affiliations

University of California, Davis, Davis, CA, USA
Prasad Naik
University of Maryland, College Park, MD, USA
Michel Wedel
Polimetrix Inc., Palo Alto, CA, USA
Lynd Bacon
University of California, Los Angeles, Los Angeles, CA, USA
Anand Bodapati
University of Pennsylvania, Philadelphia, PA, USA
Eric Bradlow
Duke University, Durham, NC, USA
Wagner Kamakura
IBM Almaden Research Center, San Jose, CA, USA
Jeffrey Kreulen
University of Michigan, Ann Arbor, MI, USA
Peter Lenk
Columbia University, New York, NY, USA
David M. Madigan
Carnegie Mellon University, Pittsburgh, PA, USA
Alan Montgomery

Corresponding author

Correspondence to Michel Wedel.

Additional information

Prasad Naik and Michel Wedel are co-chairs.

About this article

Cite this article

Naik, P., Wedel, M., Bacon, L. et al. Challenges and opportunities in high-dimensional choice data analyses. Mark Lett 19, 201–213 (2008). https://doi.org/10.1007/s11002-008-9036-3

Issue Date: December 2008