Machine Learning and Big Data

Kübler, Raoul V.; Wieringa, Jaap E.; Pauwels, Koen H.

doi:10.1007/978-3-319-53469-5_19

Part of the book series: International Series in Quantitative Marketing (ISQM)

Abstract

The last 10 years saw a remarkable increase in the data available to marketers. What was considered big data 5 years ago (see e.g., Vol. I, Sect. 3.5.6) was based on hundreds of thousands of observations (e.g., Reimer et al. 2014). Today, data scientists consider this an average sample size. Consumer handscan panels and regular brand tracking allowed us to build models combining consumer actions with mindset metrics (e.g., Hanssens et al. 2014; Van Heerde et al. 2008). And of course, clickstream data inspired a brave new world of modeling (prospective) customers’ online decision journeys (Bucklin and Sismeiro 2003; Pauwels and Van Ewijk 2014). Within the last two years, social media interaction data have increasingly entered marketing research (see e.g., Borah and Tellis 2016; Ilhan et al. 2016). Through this source, marketers get access to customer- and user-generated content. Service and fast-moving consumer goods brands, as well as entertainment brands, in particular enjoy a high volume of interactions in different social media channels (see e.g., Henning-Thurau et al. 2014), and this volume is linked with the company’s future sales performance. Social media again boosts the amount of data available to marketers. Instead of facing hundreds of thousands of observations, researchers are now very likely to encounter millions of comments, likes and shares even in short observation periods. Combining these new data with existing sources will provide new opportunities and further develop marketing research (Sudhir 2016). Verhoef et al. (2016) provide many examples of how big data (analytics) can be used to create value for customers and firms.

Notes

1. See also Chap. 17.
2. Some tree-structured models use different splitting criteria, such as Gini impurity (see e.g., Kuhn and De Mori 1995) or variance reduction (see e.g., Gascuel 2000). Despite formal differences, the main concept remains the same: improve within-group homogeneity by comparing purity before and after the split.
3. Compare the split of the data into an estimation sample (training set) and a validation sample (hold-out set): Vol. I, Sect. 5.7.
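
The purity comparison described in the note on splitting criteria can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names `gini_impurity` and `impurity_decrease` and the toy "buy"/"no" labels are assumptions made purely for illustration. The sketch computes the Gini impurity of a parent node and the size-weighted impurity of its two child nodes, and reports the difference as the gain in homogeneity.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent, left, right):
    """Impurity before the split minus the size-weighted impurity after it."""
    n = len(parent)
    after = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - after


# Hypothetical toy data: six customers, three buyers and three non-buyers.
parent = ["buy", "buy", "buy", "no", "no", "no"]
# A candidate split that separates the two classes perfectly.
left = ["buy", "buy", "buy"]
right = ["no", "no", "no"]

print(gini_impurity(parent))                   # impurity before the split
print(impurity_decrease(parent, left, right))  # gain from the split
```

A tree-growing procedure would evaluate such a gain for every candidate split and keep the one with the largest value; replacing `gini_impurity` with an entropy or variance function changes the criterion but not this before-and-after logic.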

References

Anderson, C.R.: A Machine Learning Approach to Web Personalization. University of Washington Press, Washington, DC (2002)
Bennett, K.P., Wu, D., Auslender, L.: On support vector decision trees for database marketing. Neural Netw. 2, 904–909 (1999)
Biggs, D., De Ville, B., Suen, E.: A method of choosing multi-way partitions for classification and decision trees. J. Appl. Stat. 18, 49–62 (1991)
Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97, 245–271 (1997)
Borah, A., Tellis, G.J.: Halo (spillover) effects in social media: do product recalls of one brand hurt or help rival brands? J. Mark. Res. 53, 143–160 (2016)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory—COLT, pp. 144–146 (1992)
Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA (1984)
Buckinx, W., Moons, E., Van den Poel, D., Wets, G.: Customer-adapted coupon targeting using feature selection. Expert Syst. Appl. 26, 509–518 (2004)
Bucklin, R.E., Sismeiro, C.: A model of web site browsing behavior estimated on clickstream data. J. Mark. Res. 40, 249–267 (2003)
Chan, P.K., Stolfo, S.J.: Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. KDD. 98, 164–168 (1998)
Chen, Y., Pavlov, D., Canny, J.F.: Large-scale behavioral targeting. In: Proceedings of the 15th International Conference on Knowledge Discovery and Data Mining. ACM (2009)
Cheng, J., Greiner, R.: Comparing Bayesian network classifiers. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101–108 (1999)
Chevalier, J.A., Mayzlin, D.: The effect of word of mouth on sales: online book reviews. J. Mark. Res. 43, 345–354 (2006)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. 14, 326–334 (1965)
Cruz, J.A., Wishart, D.S.: Applications of machine learning in cancer prediction and prognosis. Cancer Informat. 2, 105–117 (2006)
Dietterich, T.: Overfitting and undercomputing in machine learning. ACM Comput. Surv. 27, 326–327 (1995)
Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000)
Elith, J., Leathwick, J.R., Hastie, T.: A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813 (2008)
Esposito, F., Malerba, D., Semeraro, G., Kay, J.: A comparative analysis of methods for pruning decision trees. IEEE. 19, 476–491 (1997)
Fayyad, U.M.: Data mining and knowledge discovery–making sense out of data. Intell. Syst. Appl. 11, 20–25 (1996)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics. 7, 179–188 (1936)
Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.: Pulse: mining customer opinions from free text. In: Famili, A.F., Kok, J.N., Pena, J.M., Siebes, A., Feelders, A. (eds.) Advances in Intelligent Data Analysis VI, pp. 121–132. Springer, Berlin (2005)
Gascuel, O.: On the optimization principle in phylogenetic analysis and the minimum-evolution criterion. Mol. Biol. Evol. 17, 401–405 (2000)
Guido, G., Prete, M.I., Miraglia, S., De Mare, I.: Targeting direct marketing campaigns by neural networks. J. Mark. Manag. 27, 992–1006 (2011)
Hanssens, D.M., Pauwels, K.H., Srinivasan, S., Vanhuele, M., Yildirim, G.: Consumer attitude metrics for guiding marketing mix decisions. Mark. Sci. 33, 534–550 (2014)
Hastie, T., Tibshirani, R., Friedman, J.H.: Boosting and Additive Trees. The Elements of Statistical Learning (2nd ed.). Springer, New York (2009)
Henning-Thurau, T., Wiertz, C., Feldhaus, F.: Does Twitter matter? The impact of microblogging word of mouth on consumers’ adoption of new movies. J. Acad. Mark. Sci. 43, 375–394 (2014)
Hinton, G.E.: Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007)
Ho, T.K.: A data complexity analysis of comparative advantages of decision forest constructors. Pattern. Anal. Applic. 5, 102–112 (2002)
Homburg, C., Ehm, L., Artz, M.: Measuring and managing consumer sentiment in an online community environment. J. Mark. Res. 52, 629–641 (2015)
Ilhan, E., Pauwels, K.H., Kübler, R.V.: Dancing with the enemy: broadened understanding of engagement in rival brand dyads. MSI Working Paper Series (2016)
Jaworska, J., Sydow, M.: Behavioral targeting in on-line advertising: An empirical study. In: Web Information Systems Engineering-WISE 2008, pp. 62–76. Springer, Berlin (2008)
Kass, G.V.: An exploratory technique for investigating large quantities of categorical data. Appl. Stat. 28, 119–127 (1980)
Kim, Y., Street, W.N.: An intelligent system for customer targeting: a data mining approach. Decis. Support. Syst. 37, 215–228 (2004)
Kim, Y., Street, W.N., Russell, G.J., Menczer, F.: Customer targeting: a neural network approach guided by genetic algorithms. Manag. Sci. 51, 264–276 (2005)
Kübler, R.V., Colicev, A., Pauwels, K.H.: User generated content as a predictor for brand equity. In: Proceedings of the Informs Marketing Science Conference (2016)
Kuhn, R., De Mori, R.: The application of semantic classification trees to natural language understanding. Trans. Pattern Anal. Mach. Intell. 17, 449–460 (1995)
Levandowsky, M., Winter, D.: Distance between sets. Nature. 234, 34–35 (1971)
Li, T., Liu, N., Yan, J., Wang, G., Bai, F., Chen, Z.: A Markov chain model for integrating behavioral targeting into contextual advertising. In: Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising, pp. 1–9 (2009)
Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. Internet Comput. 7, 76–80 (2003)
Liu, K., Tang, L.: Large-scale behavioral targeting with a social twist. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1815–1824 (2011)
Loh, W.Y., Shih, Y.S.: Split selection methods for classification trees. Stat. Sin. 7, 815–840 (1997)
Marlin, B.: Collaborative filtering: a machine learning perspective. Dissertation University of Toronto (2004)
Martin, J.H., Jurafsky, D.: Speech and Language Processing. Prentice-Hall, Pearson, GA (2000)
Mayer-Schönberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray, London (2013)
Melville, P., Mooney, R.J., Nagarajan, R.: Content-boosted collaborative filtering for improved recommendations. AAAI/IAAI. 23, 187–192 (2002)
Montgomery, D.C., Peck, E.A.: Introduction to Linear Regression Analysis. Springer, Berlin (1992)
Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)
Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)
Osuna, E., Freund, R., Girosi, F.: Training support vector machines: an application to face detection. In: Proceedings of Computer Vision and Pattern Recognition Conference, pp. 130–136 (1997)
Pandey, S., Aly, M., Bagherjeiran, A., Hatch, A., Ciccolo, P., Ratnaparkhi, A., Zinkevich, M.: Learning to target: what works for behavioral targeting. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1805–1814 (2011)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2, 1–135 (2008)
Pauwels, K.H., Van Ewijk, B.: Do online behavior tracking or attitude survey metrics drive brand sales? An integrative model of attitudes and actions on the consumer boulevard. Mark. Sci. Inst. Rep. 4, 13–118 (2014)
Perlich, C., Dalessandro, B., Hook, R., Stitelman, O., Raeder, T., Provost, F.: Bid optimizing and inventory scoring in targeted online advertising. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 804–812 (2012)
Provost, F., Fawcett, T.: Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O'Reilly Media, Sebastopol, CA (2013)
Quinlan, J.R.: Simplifying decision trees. Int. J. Man-Mach. Stud. 27, 221–234 (1987)
Reimer, K., Rutz, O.J., Pauwels, K.H.: How online consumer segments differ in long-term marketing effectiveness. J. Interact. Mark. 28, 271–284 (2014)
Reutterer, T., Mild, A., Natter, M., Taudes, A.: A dynamic segmentation approach for targeting and customizing direct marketing campaigns. J. Interact. Mark. 20, 43–57 (2006)
Ritschard, G.: CHAID and earlier supervised tree methods, accessed online June 12, 2016 at http://www.unige.ch/ses/dsec/repec/files/2010_02.pdf (2010)
Rokach, L., Maimon, O.: Data mining with decision trees: theory and applications. World Scientific (2007)
Schlangenstein, M.: UPS crunches data to make routes more efficient, save gas, Bloomberg, accessed online, June 7, 2016 at http://www.bloomberg.com/news/articles/2013-10-30/ups-uses-big-data-to-make-routes-more-efficient-save-gas (2013)
Schweidel, M., Moe, W.: Listening in on social media: a joint model of sentiment and venue format choice. J. Mark. Res. 51, 387–399 (2014)
Shannon, C.E.: A note on the concept of entropy. Bell Syst. Tech. J. 27, 379–423 (1948)
Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Gordon, J.: Empirical study of machine learning based approach for opinion mining in tweets. In: Famili, A.F., Kok, J.N., Pena, J.M., Siebes, A., Feelders, A. (eds.) Advances in Intelligent Data Analysis VI, pp. 1–14. Springer, Berlin (2012)
Skiera, B., Abou Nabout, N.: Practice prize paper-PROSAD: a bidding decision support system for profit optimizing search engine advertising. Mark. Sci. 32, 213–220 (2013)
Skurichina, M.: Bagging, boosting and the random subspace method for linear classifiers. Pattern. Anal. Applic. 5, 121–135 (2002)
Sudhir, K.: The exploration-exploitation tradeoff and efficiency in knowledge production. Mark. Sci. 52, 1–14 (2016)
Tang, J., Liu, N., Yan, J., Shen, Y., Guo, S., Gao, B., Zhang, M.: Learning to rank audience for behavioral targeting in display ads. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 605–610 (2011)
Van Heerde, H.J., Gijsbrechts, E., Pauwels, K.H.: Winners and losers in a major price war. J. Mark. Res. 45, 499–518 (2008)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Verhoef, P.C., Kooge, E., Walk, N.: Creating Value with Big Data Analytics. Routledge, New York, NY (2016)
Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
Wang, C., Raina, R., Fong, D., Zhou, D., Han, J., Badros, G.: Learning relevance from heterogeneous social network and its application in online targeting. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 655–664 (2011)
Wasserman, L.: Statistics and Machine Learning, accessed online June 5, 2016 at https://normaldeviate.wordpress.com/2012/06/12/statistics-versus-machine-learning-52 (2012)
Wedel, M., Kamakura, W.A.: Market Segmentation: Conceptual and Methodological Foundations. Kluwer Academic, Boston, MA (2000)
Wu, C.H., Kao, S.C., Su, Y.Y., Wu, C.C.: Targeting customers via discovery knowledge for the insurance industry. Expert Syst. Appl. 29, 291–299 (2005)
Xia, G.E., Jin, W.D.: Model of customer churn prediction on support vector machine. Syst. Eng. Theory Prac. 28, 71–77 (2008)

Author information

Authors and Affiliations

Department of Marketing, Özyeğin University, Istanbul, Turkey
Raoul V. Kübler
Department of Marketing, University of Groningen, Groningen, The Netherlands
Jaap E. Wieringa
Department of Marketing, Northeastern University, Boston, USA
Koen H. Pauwels

Corresponding author

Correspondence to Raoul V. Kübler.

Editor information

Editors and Affiliations

Department of Marketing, University of Groningen, Groningen, The Netherlands
Peter S. H. Leeflang
Department of Marketing, University of Groningen, Groningen, The Netherlands
Jaap E. Wieringa
Department of Marketing, University of Groningen, Groningen, The Netherlands
Tammo H.A. Bijmolt
Department of Marketing, D’Amore-McKim School of Business, Northeastern University, Boston, USA
Koen H. Pauwels

About this chapter

Cite this chapter

Kübler, R.V., Wieringa, J.E., Pauwels, K.H. (2017). Machine Learning and Big Data. In: Leeflang, P., Wieringa, J., Bijmolt, T., Pauwels, K. (eds) Advanced Methods for Modeling Markets. International Series in Quantitative Marketing. Springer, Cham. https://doi.org/10.1007/978-3-319-53469-5_19

DOI: https://doi.org/10.1007/978-3-319-53469-5_19
Published: 30 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53467-1
Online ISBN: 978-3-319-53469-5
eBook Packages: Business and Management, Business and Management (R0)
