Abstract
The last 10 years saw a remarkable increase in the data available to marketers. What was considered big data 5 years ago (see, e.g., Vol. I, Sect. 3.5.6) was based on hundreds of thousands of observations (e.g., Reimer et al. 2014). Today, data scientists consider this an average sample size. Consumer handscan panels and regular brand tracking have allowed us to build models that combine consumer actions with mindset metrics (e.g., Hanssens et al. 2014; Van Heerde et al. 2008). And of course, clickstream data inspired a brave new world of modeling (prospective) customers’ online decision journeys (Bucklin and Sismeiro 2003; Pauwels and Van Ewijk 2014). Within the last two years, social media interaction data have increasingly entered marketing research (see, e.g., Borah and Tellis 2016; Ilhan et al. 2016). Through this source, marketers get access to customer- and user-generated content. Service, fast-moving consumer goods, and entertainment brands in particular enjoy a high volume of interactions across different social media channels (see, e.g., Hennig-Thurau et al. 2014), and this volume is linked to the company’s future sales performance. Social media thus again boost the amount of data available to marketers. Instead of facing hundreds of thousands of observations, researchers are now very likely to encounter millions of comments, likes, and shares even in short observation periods. Combining these new data with existing sources will provide new opportunities and further develop marketing research (Sudhir 2016). Verhoef et al. (2016) provide many examples of how big data (analytics) can be used to create value for customers and firms.
Notes
1. See also Chap. 17.
2. Some tree-structured models use different splitting criteria, such as Gini impurity (see, e.g., Kuhn and De Mori 1995) or variance reduction (see, e.g., Gascuel 2000). Despite formal differences, the main concept remains the same: improve within-group homogeneity by comparing purity before and after the split.
3. Compare the split of the data into an estimation sample (training set) and a validation sample (hold-out set): Vol. I, Sect. 5.7.
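The idea in Note 2, comparing purity before and after a split, can be illustrated with a minimal sketch. It assumes binary class labels and uses Gini impurity as the purity measure. The function names and the toy data are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(parent, left, right):
    """Impurity before the split minus the size-weighted impurity after it."""
    n = len(parent)
    after = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - after


# Hypothetical toy data: 1 = customer responded, 0 = did not respond.
parent = [1, 1, 1, 0, 0, 0]

# A split that separates the classes perfectly.
pure_split = impurity_reduction(parent, [1, 1, 1], [0, 0, 0])

# A split that leaves both child groups mixed.
mixed_split = impurity_reduction(parent, [1, 0, 1], [0, 1, 0])
```

A tree-growing procedure of this kind would prefer the candidate split with the larger reduction, here `pure_split`, because its child groups are more homogeneous than the parent group.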
References
Anderson, C.R.: A Machine Learning Approach to Web Personalization. University of Washington Press, Washington, DC (2002)
Bennett, K.P., Wu, D., Auslender, L.: On support vector decision trees for database marketing. Neural Netw. 2, 904–909 (1999)
Biggs, D., De Ville, B., Suen, E.: A method of choosing multi-way partitions for classification and decision trees. J. Appl. Stat. 18, 49–62 (1991)
Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97, 245–271 (1997)
Borah, A., Tellis, G.J.: Halo (spillover) effects in social media: do product recalls of one brand hurt or help rival brands? J. Mark. Res. 53, 143–160 (2016)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory—COLT, pp. 144–146 (1992)
Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA (1984)
Buckinx, W., Moons, E., Van den Poel, D., Wets, G.: Customer-adapted coupon targeting using feature selection. Expert Syst. Appl. 26, 509–518 (2004)
Bucklin, R.E., Sismeiro, C.: A model of web site browsing behavior estimated on clickstream data. J. Mark. Res. 40, 249–267 (2003)
Chan, P.K., Stolfo, S.J.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. KDD. 98, 164–168 (1998)
Chen, Y., Pavlov, D., Canny, J.F.: Large-scale behavioral targeting. In: Proceedings of the 15th International Conference on Knowledge Discovery and Data Mining. ACM (2009)
Cheng, J., Greiner, R.: Comparing Bayesian network classifiers. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101–108 (1999)
Chevalier, J.A., Mayzlin, D.: The effect of word of mouth on sales: online book reviews. J. Mark. Res. 43, 345–354 (2006)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. 14, 326–334 (1965)
Cruz, J.A., Wishart, D.S.: Applications of machine learning in cancer prediction and prognosis. Cancer Informat. 2, 105–117 (2006)
Dietterich, T.: Overfitting and undercomputing in machine learning. ACM Comput. Surv. 27, 326–327 (1995)
Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000)
Elith, J., Leathwick, J.R., Hastie, T.: A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813 (2008)
Esposito, F., Malerba, D., Semeraro, G., Kay, J.: A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19, 476–491 (1997)
Fayyad, U.M.: Data mining and knowledge discovery–making sense out of data. Intell. Syst. Appl. 11, 20–25 (1996)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics. 7, 179–188 (1936)
Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.: Pulse: mining customer opinions from free text. In: Famili, A.F., Kok, J.N., Pena, J.M., Siebes, A., Feelders, A. (eds.) Advances in Intelligent Data Analysis VI, pp. 121–132. Springer, Berlin (2005)
Gascuel, O.: On the optimization principle in phylogenetic analysis and the minimum-evolution criterion. Mol. Biol. Evol. 17, 401–405 (2000)
Guido, G., Prete, M.I., Miraglia, S., De Mare, I.: Targeting direct marketing campaigns by neural networks. J. Mark. Manag. 27, 992–1006 (2011)
Hanssens, D.M., Pauwels, K.H., Srinivasan, S., Vanhuele, M., Yildirim, G.: Consumer attitude metrics for guiding marketing mix decisions. Mark. Sci. 33, 534–550 (2014)
Hastie, T., Tibshirani, R., Friedman, J.H.: Boosting and Additive Trees. The Elements of Statistical Learning (2nd ed.). Springer, New York (2009)
Hennig-Thurau, T., Wiertz, C., Feldhaus, F.: Does Twitter matter? The impact of microblogging word of mouth on consumers’ adoption of new movies. J. Acad. Mark. Sci. 43, 375–394 (2014)
Hinton, G.E.: Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007)
Ho, T.K.: A data complexity analysis of comparative advantages of decision forest constructors. Pattern. Anal. Applic. 5, 102–112 (2002)
Homburg, C., Ehm, L., Artz, M.: Measuring and managing consumer sentiment in an online community environment. J. Mark. Res. 52, 629–641 (2015)
Ilhan, E., Pauwels, K.H., Kübler, R.V.: Dancing with the enemy: broadened understanding of engagement in rival brand dyads. MSI Working Paper Series (2016)
Jaworska, J., Sydow, M.: Behavioral targeting in on-line advertising: An empirical study. In: Web Information Systems Engineering-WISE 2008, pp. 62–76. Springer, Berlin (2008)
Kass, G.V.: An exploratory technique for investigating large quantities of categorical data. Appl. Stat. 28, 119–127 (1980)
Kim, Y., Street, W.N.: An intelligent system for customer targeting: a data mining approach. Decis. Support. Syst. 37, 215–228 (2004)
Kim, Y., Street, W.N., Russell, G.J., Menczer, F.: Customer targeting: a neural network approach guided by genetic algorithms. Manag. Sci. 51, 264–276 (2005)
Kübler, R.V., Colicev, A., Pauwels, K.H.: User generated content as a predictor for brand equity. In: Proceedings of the Informs Marketing Science Conference (2016)
Kuhn, R., De Mori, R.: The application of semantic classification trees to natural language understanding. IEEE Trans. Pattern Anal. Mach. Intell. 17, 449–460 (1995)
Levandowsky, M., Winter, D.: Distance between sets. Nature. 234, 34–35 (1971)
Li, T., Liu, N., Yan, J., Wang, G., Bai, F., Chen, Z.: A Markov chain model for integrating behavioral targeting into contextual advertising. In: Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising, pp. 1–9 (2009)
Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. Internet Comput. 7, 76–80 (2003)
Liu, K., Tang, L.: Large-scale behavioral targeting with a social twist. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1815–1824 (2011)
Loh, W.Y., Shih, Y.S.: Split selection methods for classification trees. Stat. Sin. 7, 815–840 (1997)
Marlin, B.: Collaborative filtering: a machine learning perspective. Dissertation University of Toronto (2004)
Martin, J.H., Jurafsky, D.: Speech and Language Processing. Prentice-Hall, Pearson, GA (2000)
Mayer-Schönberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray, London (2013)
Melville, P., Mooney, R.J., Nagarajan, R.: Content-boosted collaborative filtering for improved recommendations. AAAI/IAAI. 23, 187–192 (2002)
Montgomery, D.C., Peck, E.A.: Introduction to Linear Regression Analysis. Springer, Berlin (1992)
Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)
Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)
Osuna, E., Freund, R., Girosi, F.: Training support vector machines: an application to face detection. In: Proceedings of Computer Vision and Pattern Recognition Conference, pp. 130–136 (1997)
Pandey, S., Aly, M., Bagherjeiran, A., Hatch, A., Ciccolo, P., Ratnaparkhi, A., Zinkevich, M.: Learning to target: what works for behavioral targeting. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1805–1814 (2011)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2, 1–135 (2008)
Pauwels, K.H., Van Ewijk, B.: Do online behavior tracking or attitude survey metrics drive brand sales? An integrative model of attitudes and actions on the consumer boulevard. Mark. Sci. Inst. Rep. 4, 13–118 (2014)
Perlich, C., Dalessandro, B., Hook, R., Stitelman, O., Raeder, T., Provost, F.: Bid optimizing and inventory scoring in targeted online advertising. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 804–812 (2012)
Provost, F., Fawcett, T.: Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O'Reilly Media, Sebastopol, CA (2013)
Quinlan, J.R.: Simplifying decision trees. Int. J. Man-Mach. Stud. 27, 221–234 (1987)
Reimer, K., Rutz, O.J., Pauwels, K.H.: How online consumer segments differ in long-term marketing effectiveness. J. Interact. Mark. 28, 271–284 (2014)
Reutterer, T., Mild, A., Natter, M., Taudes, A.: A dynamic segmentation approach for targeting and customizing direct marketing campaigns. J. Interact. Mark. 20, 43–57 (2006)
Ritschard, G.: CHAID and earlier supervised tree methods, accessed online June 12, 2016 at http://www.unige.ch/ses/dsec/repec/files/2010_02.pdf (2010)
Rokach, L., Maimon, O.: Data mining with decision trees: theory and applications. World Scientific (2007)
Schlangenstein, M.: UPS crunches data to make routes more efficient, save gas, Bloomberg, accessed online, June 7, 2016 at http://www.bloomberg.com/news/articles/2013-10-30/ups-uses-big-data-to-make-routes-more-efficient-save-gas (2013)
Schweidel, M., Moe, W.: Listening in on social media: a joint model of sentiment and venue format choice. J. Mark. Res. 51, 387–399 (2014)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)
Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Gordon, J.: Empirical study of machine learning based approach for opinion mining in tweets. In: Famili, A.F., Kok, J.N., Pena, J.M., Siebes, A., Feelders, A. (eds.) Advances in Intelligent Data Analysis VI, pp. 1–14. Springer, Berlin (2012)
Skiera, B., Abou Nabout, N.: Practice prize paper-PROSAD: a bidding decision support system for profit optimizing search engine advertising. Mark. Sci. 32, 213–220 (2013)
Skurichina, M.: Bagging, boosting and the random subspace method for linear classifiers. Pattern. Anal. Applic. 5, 121–135 (2002)
Sudhir, K.: The exploration-exploitation tradeoff and efficiency in knowledge production. Mark. Sci. 52, 1–14 (2016)
Tang, J., Liu, N., Yan, J., Shen, Y., Guo, S., Gao, B., Zhang, M.: Learning to rank audience for behavioral targeting in display ads. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 605–610 (2011)
Van Heerde, H.J., Gijsbrechts, E., Pauwels, K.H.: Winners and losers in a major price war. J. Mark. Res. 45, 499–518 (2008)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Verhoef, P.C., Kooge, E., Walk, N.: Creating Value with Big Data Analytics. Routledge, New York, NY (2016)
Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
Wang, C., Raina, R., Fong, D., Zhou, D., Han, J., Badros, G.: Learning relevance from heterogeneous social network and its application in online targeting. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 655–664 (2011)
Wasserman, L.: Statistics and Machine Learning, accessed online June 5, 2016 at https://normaldeviate.wordpress.com/2012/06/12/statistics-versus-machine-learning-52 (2012)
Wedel, M., Kamakura, W.A.: Market Segmentation: Conceptual and Methodological Foundations. Kluwer Academic, Boston, MA (2000)
Wu, C.H., Kao, S.C., Su, Y.Y., Wu, C.C.: Targeting customers via discovery knowledge for the insurance industry. Expert Syst. Appl. 29, 291–299 (2005)
Xia, G.E., Jin, W.D.: Model of customer churn prediction on support vector machine. Syst. Eng. Theory Prac. 28, 71–77 (2008)
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Kübler, R.V., Wieringa, J.E., Pauwels, K.H. (2017). Machine Learning and Big Data. In: Leeflang, P., Wieringa, J., Bijmolt, T., Pauwels, K. (eds) Advanced Methods for Modeling Markets. International Series in Quantitative Marketing. Springer, Cham. https://doi.org/10.1007/978-3-319-53469-5_19
DOI: https://doi.org/10.1007/978-3-319-53469-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53467-1
Online ISBN: 978-3-319-53469-5
eBook Packages: Business and Management (R0)