Machine Learning and Big Data

Part of the International Series in Quantitative Marketing book series (ISQM)


The last 10 years saw a remarkable increase in the data available to marketers. What was considered big data 5 years ago (see, e.g., Vol. I, Sect. 3.5.6) was based on hundreds of thousands of observations (e.g., Reimer et al. 2014). Today data scientists consider this an average sample size. Consumer handscan panels and regular brand tracking allowed us to build models that combine consumer actions with mindset metrics (e.g., Hanssens et al. 2014; Van Heerde et al. 2008). And of course, clickstream data inspired a brave new world of modeling (prospective) customers’ online decision journeys (Bucklin and Sismeiro 2003; Pauwels and Van Ewijk 2014). Within the last two years, social media interaction data have increasingly entered marketing research (see, e.g., Borah and Tellis 2016; Ilhan et al. 2016). Through this source, marketers gain access to customer- and user-generated content. Service and fast-moving consumer goods brands, as well as entertainment brands, in particular enjoy a high volume of interactions across different social media channels (see, e.g., Henning-Thurau et al. 2014), and this volume is linked with the company’s future sales performance. Social media thus again boost the amount of data available to marketers. Instead of hundreds of thousands of observations, researchers are now very likely to encounter millions of comments, likes, and shares even in short observation periods. Combining these new data with existing sources will provide new opportunities and further develop marketing research (Sudhir 2016). Verhoef et al. (2016) provide many examples of how big data (analytics) can be used to create value for customers and firms.


  1. Anderson, C.R.: A Machine Learning Approach to Web Personalization. University of Washington Press, Washington, DC (2002)
  2. Bennett, K.P., Wu, D., Auslender, L.: On support vector decision trees for database marketing. Neural Netw. 2, 904–909 (1999)
  3. Biggs, D., De Ville, B., Suen, E.: A method of choosing multi-way partitions for classification and decision trees. J. Appl. Stat. 18, 49–62 (1991)
  4. Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97, 245–271 (1997)
  5. Borah, A., Tellis, G.J.: Halo (spillover) effects in social media: do product recalls of one brand hurt or help rival brands? J. Mark. Res. 53, 143–160 (2016)
  6. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory—COLT, pp. 144–146 (1992)
  7. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
  8. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA (1984)
  9. Buckinx, W., Moons, E., Van den Poel, D., Wets, G.: Customer-adapted coupon targeting using feature selection. Expert Syst. Appl. 26, 509–518 (2004)
  10. Bucklin, R.E., Sismeiro, C.: A model of web site browsing behavior estimated on clickstream data. J. Mark. Res. 40, 249–267 (2003)
  11. Chan, P.K., Stolfo, S.J.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. KDD. 98, 164–168 (1998)
  12. Chen, Y., Pavlov, D., Canny, J.F.: Large-scale behavioral targeting. In: Proceedings of the 15th International Conference on Knowledge Discovery and Data Mining ACM (2009)
  13. Cheng, J., Greiner, R.: Comparing Bayesian network classifiers. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101–108 (1999)
  14. Chevalier, J.A., Mayzlin, D.: The effect of word of mouth on sales: online book reviews. J. Mark. Res. 43, 345–354 (2006)
  15. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
  16. Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. 14, 326–334 (1965)
  17. Cruz, J.A., Wishart, D.S.: Applications of machine learning in cancer prediction and prognosis. Cancer Informat. 2, 105–117 (2006)
  18. Dietterich, T.: Overfitting and undercomputing in machine learning. ACM Comput. Surv. 27, 326–327 (1995)
  19. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000)
  20. Elith, J., Leathwick, J.R., Hastie, T.: A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813 (2008)
  21. Esposito, F., Malerba, D., Semeraro, G., Kay, J.: A comparative analysis of methods for pruning decision trees. IEEE. 19, 476–491 (1997)
  22. Fayyad, U.M.: Data mining and knowledge discovery–making sense out of data. Intell. Syst. Appl. 11, 20–25 (1996)
  23. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics. 7, 179–188 (1936)
  24. Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.: Pulse: mining customer opinions from free text. In: A.F. Famili, J.N. Kok, J.M. Pena, A. Siebes, A. Feelders (eds.): Advances in Intelligent Data Analysis VI. Springer Berlin 121–132 (2005)
  25. Gascuel, O.: On the optimization principle in phylogenetic analysis and the minimum-evolution criterion. Mol. Biol. Evol. 17, 401–405 (2000)
  26. Guido, G., Prete, M.I., Miraglia, S., De Mare, I.: Targeting direct marketing campaigns by neural networks. J. Mark. Manag. 27, 992–1006 (2011)
  27. Hanssens, D.M., Pauwels, K.H., Srinivasan, S., Vanhuele, M., Yildirim, G.: Consumer attitude metrics for guiding marketing mix decisions. Mark. Sci. 33, 534–550 (2014)
  28. Hastie, T., Tibshirani, R., Friedman, J.H.: Boosting and Additive Trees. The Elements of Statistical Learning (2nd ed.). Springer, New York (2009)
  29. Henning-Thurau, T., Wiertz, C., Feldhaus, F.: Does twitter matter? The impact of microblogging word of mouth on consumers’ adoption of new movies. J. Acad. Mark. Sci. 43, 375–394 (2014)
  30. Hinton, G.E.: Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007)
  31. Ho, T.K.: A data complexity analysis of comparative advantages of decision forest constructors. Pattern. Anal. Applic. 5, 102–112 (2002)
  32. Homburg, C., Ehm, L., Artz, M.: Measuring and managing consumer sentiment in an online community environment. J. Mark. Res. 52, 629–641 (2015)
  33. Ilhan, E., Pauwels, K.H., Kübler, R.V.: Dancing with the enemy: broadened understanding of engagement in rival brand dyads, MSI Working Paper Series (2016)
  34. Jaworska, J., Sydow, M.: Behavioral targeting in on-line advertising: An empirical study. In: Web Information Systems Engineering-WISE 2008, pp. 62–76. Springer, Berlin (2008)
  35. Kass, G.V.: An exploratory technique for investigating large quantities of categorical data. Appl. Stat. 28, 119–127 (1980)
  36. Kim, Y., Street, W.N.: An intelligent system for customer targeting: a data mining approach. Decis. Support. Syst. 37, 215–228 (2004)
  37. Kim, Y., Street, W.N., Russell, G.J., Menczer, F.: Customer targeting: a neural network approach guided by genetic algorithms. Manag. Sci. 51, 264–276 (2005)
  38. Kübler, R.V., Colicev, A., Pauwels, K.H.: User generated content as a predictor for brand equity. In: Proceedings of the Informs Marketing Science Conference (2016)
  39. Kuhn, R., De Mori, R.: The application of semantic classification trees to natural language understanding. Trans. Pattern Anal. Mach. Intell. 17, 449–460 (1995)
  40. Levandowsky, M., Winter, D.: Distance between sets. Nature. 234, 34–35 (1971)
  41. Li, T., Liu, N., Yan, J., Wang, G., Bai, F., Chen, Z.: A Markov chain model for integrating behavioral targeting into contextual advertising. In: Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising, pp. 1–9 (2009)
  42. Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. Internet Comput. 7, 76–80 (2003)
  43. Liu, K., Tang, L.: Large-scale behavioral targeting with a social twist. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1815–1824 (2011)
  44. Loh, W.Y., Shih, Y.S.: Split selection methods for classification trees. Stat. Sin. 117, 815–840 (1997)
  45. Marlin, B.: Collaborative filtering: a machine learning perspective. Dissertation University of Toronto (2004)
  46. Martin, J.H., Jurafsky, D.: Speech and Language Processing. Prentice-Hall, Pearson, GA (2000)
  47. Mayer-Schönberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray, London (2013)
  48. Melville, P., Mooney, R.J., Nagarajan, R.: Content-boosted collaborative filtering for improved recommendations. AAAI/IAAI. 23, 187–192 (2002)
  49. Montgomery, D.C., Peck, E.A.: Introduction to Linear Regression Analysis. Springer, Berlin (1992)
  50. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)
  51. Nielsen, M.A.: Neural Network and Deep Learning. Determination Press (2015)
  52. Osuna, E., Freund, R., Girosi, F.: Training support vector machines: an application to face detection. In: Proceedings of Computer Vision and Pattern Recognition Conference, pp. 130–136 (1997)
  53. Pandey, S., Aly, M., Bagherjeiran, A., Hatch, A., Ciccolo, P., Ratnaparkhi, A., Zinkevich, M.: Learning to target: what works for behavioral targeting. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1805–1814 (2011)
  54. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2, 1–135 (2008)
  55. Pauwels, K.H., Van Ewijk, B.: Do online behavior tracking or attitude survey metrics drive brand sales? An integrative model of attitudes and actions on the consumer boulevard. Mark. Sci. Inst. Rep. 4, 13–118 (2014)
  56. Perlich, C., Dalessandro, B., Hook, R., Stitelman, O., Raeder, T., Provost, F.: Bid optimizing and inventory scoring in targeted online advertising. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 804–812 (2012)
  57. Provost, F., Fawcett, T.: Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O'Reilly Media, Quinlan, TX (2013)
  58. Quinlan, J.R.: Simplifying decision trees. Int. J. Man-Mach. Stud. 27, 221–234 (1987)
  59. Reimer, K., Rutz, O.J., Pauwels, K.H.: How online consumer segments differ in long-term marketing effectiveness. J. Interact. Mark. 28, 271–284 (2014)
  60. Reutterer, T., Mild, A., Natter, M., Taudes, A.: A dynamic segmentation approach for targeting and customizing direct marketing campaigns. J. Interact. Mark. 20, 43–57 (2006)
  61. Ritschard, G.: CHAID and earlier supervised tree methods, accessed online June 12, 2016 at (2010)
  62. Rokach, L., Maimon, O.: Data mining with decision trees: theory and applications. World Scientific (2007)
  63. Schlangenstein, M.: UPS crunches data to make routes more efficient, save gas, Bloomberg, accessed online, June 7, 2016 at (2013)
  64. Schweidel, M., Moe, W.: Listening in on social media: a joint model of sentiment and venue format choice. J. Mark. Res. 51, 387–399 (2014)
  65. Shannon, C.E.: A note on the concept of entropy. Bell Syst. Tech. J. 27, 379–423 (1948)
  66. Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Gordon, J.: Empirical study of machine learning based approach for opinion mining in tweets. In: Famili, A.F., Kok, J.N., Pena, J.M., Siebes, A., Feelders, A. (eds.) Advances in Intelligent Data Analysis VI, pp. 1–14. Springer, Berlin (2012)
  67. Skiera, B., Abou Nabout, N.: Practice prize paper-PROSAD: a bidding decision support system for profit optimizing search engine advertising. Mark. Sci. 32, 213–220 (2013)
  68. Skurichina, M.: Bagging, boosting and the random subspace method for linear classifiers. Pattern. Anal. Applic. 5, 121–135 (2002)
  69. Sudhir, K.: The exploration-exploitation tradeoff and efficiency in knowledge production. Mark. Sci. 52, 1–14 (2016)
  70. Tang, J., Liu, N., Yan, J., Shen, Y., Guo, S., Gao, B., Zhang, M.: Learning to rank audience for behavioral targeting in display ads. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 605–610 (2011)
  71. Van Heerde, H.J., Gijsbrechts, E., Pauwels, K.H.: Winners and losers in a major price war. J. Mark. Res. 45, 499–518 (2008)
  72. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
  73. Verhoef, P.C., Kooge, E., Walk, N.: Creating Value with Big Data Analytics. Routledge, New York, NY (2016)
  74. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
  75. Wang, C., Raina, R., Fong, D., Zhou, D., Han, J., Badros, G.: Learning relevance from heterogeneous social network and its application in online targeting. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 655–664 (2011)
  76. Wasserman, L.: Statistics and Machine Learning, accessed online June 5, 2016 at (2012)
  77. Wedel, M., Kamakura, W.A.: Market Segmentation: Conceptual and Methodological Foundations. Kluwer Academic, Boston, MA (2000)
  78. Wu, C.H., Kao, S.C., Su, Y.Y., Wu, C.C.: Targeting customers via discovery knowledge for the insurance industry. Expert Syst. Appl. 29, 291–299 (2005)
  79. Xia, G.E., Jin, W.D.: Model of customer churn prediction on support vector machine. Syst. Eng. Theory Prac. 28, 71–77 (2008)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Marketing, Özyeğin University, Istanbul, Turkey
  2. Department of Marketing, University of Groningen, Groningen, The Netherlands
  3. Department of Marketing, Northeastern University, Boston, USA
