Abstract
Recent years have seen a substantial development of quantitative methods, mostly led by the computer science community with the goal of building better machine learning applications, mainly focused on predictive modeling. However, economic, management, and technology forecasting research has so far been hesitant to adopt predictive modeling techniques and workflows. In this paper, we introduce a machine learning (ML) approach to quantitative analysis geared towards optimizing predictive performance, contrasting it with standard practices of inferential statistics, which focus on producing good parameter estimates. We discuss the potential synergies between the two fields against the backdrop of this, at first glance, target incompatibility. We discuss fundamental concepts of predictive modeling, such as out-of-sample model validation, variable and model selection, generalization, and hyperparameter tuning procedures, providing a hands-on introduction to predictive modeling for a quantitative social science audience while demystifying computer science jargon. Using the example of high-quality patent identification, we guide the reader through various model classes and procedures for data preprocessing, modeling, and validation. We start with more familiar, easy-to-interpret model classes (logit and elastic nets), continue with less familiar non-parametric approaches (classification trees and random forests), and finally present artificial neural network architectures, first a simple feed-forward network and then a deep autoencoder geared towards anomaly detection. Rather than limiting ourselves to introducing standard ML techniques, we also present state-of-the-art yet approachable techniques from artificial neural networks and deep learning for predicting rare phenomena of interest.
Notes
- 1.
Often, the challenge in adapting ML techniques to social science problems can be attributed to two issues: (1) technical lock-ins and (2) mental lock-ins, against the backdrop of paradigmatic contrasts between research traditions. For instance, many ML techniques are initially demonstrated on a collection of standard datasets with specific properties that are well known in the ML and computer science communities. For an applied statistician, particularly in the social sciences, however, the classification of Netflix movie ratings or the reconstruction of handwritten digits from the MNIST dataset may appear remote or trivial. We address these two problems by contrasting ML techniques with inferential statistics approaches, while using the non-trivial example of patent quality prediction, which should be easy to comprehend for scholars working in social science disciplines such as economics.
- 2.
We here blatantly draw on stereotypical workflows inherent to the econometrics and ML disciplines. We apologize for offending anyone who does not fit neatly into one of these categories.
- 3.
At the point where our \(R^2\) exceeds a threshold somewhere around 0.1, we commonly stop worrying about it.
- 4.
As the name already suggests, this simply expresses by how much our prediction is, on average, off: \({RMSE} ={\sqrt{\frac{\sum _{i=1}^{n}({\hat{y}}_{i}-y_{i})^{2}}{n}}}.\)
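The formula above can be sketched in a few lines of Python; the numbers here are purely illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the average size of the prediction error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

# Predictions off by 1, 2, and 2 units: RMSE = sqrt((1 + 4 + 4) / 3)
print(rmse([10, 20, 30], [11, 22, 28]))  # 1.7320...
```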
- 5.
Interestingly, many techniques associated with identification strategies popular among econometricians, such as the use of instrumental variables, endogenous selection models, fixed and random effects panel regressions, or vector autoregressions, are little known in the ML community.
- 6.
Such k-fold cross-validations can be conveniently performed in R with the caret package and in Python with scikit-learn.
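A minimal scikit-learn sketch of the k-fold idea, using synthetic data standing in for the patent data (not the authors' actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic classification data as a stand-in for the real problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold cross-validation: the model is fitted on four folds and
# evaluated on the held-out fifth, rotating through all five folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average out-of-sample accuracy
```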
- 7.
However, one instantly recognizes the similarity to the nowadays common practice among econometricians of bootstrapping standard errors by computing them over different subsets of data. The difference is that econometricians commonly use this procedure (i) to obtain more robust parameter estimates rather than to evaluate the model's overall goodness-of-fit, and (ii) compute them on subsets of the same data the model was fitted on.
- 8.
- 9.
Bootstrapping is a technique most applied econometricians are well acquainted with, yet used for a slightly different purpose. In econometrics, bootstrapping represents a powerful way to circumvent problems arising from selection bias and other sampling issues, where the regression on several subsamples is used to adjust the standard errors of the estimates.
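The econometric use of the bootstrap described above can be sketched as follows; the data-generating process (true slope of 2) is a hypothetical illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)  # hypothetical DGP with true slope 2

# Bootstrap: refit the slope on data resampled with replacement,
# then use the spread of the estimates as its standard error.
slopes = []
for _ in range(1000):
    idx = rng.integers(0, len(x), size=len(x))
    slopes.append(np.polyfit(x[idx], y[idx], 1)[0])

boot_se = np.std(slopes)
print(boot_se)  # bootstrapped standard error of the slope
```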
- 10.
This is often not the case for typical ML problems, drawing from large numbers of observations and/or a large set of variables. Here, distributed or cloud-based workflows become necessary. We discuss the arising challenges elsewhere (e.g., Hain and Jurowetzki 2020).
- 11.
For a recent and exhaustive review of patent quality measures, including all measures used in this exercise, consider Squicciarini et al. (2013).
- 12.
While the described process appears rather tedious when done by hand, specialized ML packages such as caret in R provide efficient workflows to automate the creation of folds as well as hyperparameter grid search.
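The same grid-search-over-folds workflow exists in scikit-learn; a minimal sketch on synthetic data (the grid values are illustrative, not the ones used in the paper):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Grid search: each hyperparameter combination is scored by 5-fold
# cross-validation, and the best one is refitted on the full data.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```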
- 13.
For an exhaustive overview on model and variable selection algorithms consider Castle et al. (2009).
- 14.
For an exhaustive discussion on the use of LASSO, consider Belloni et al. (2014). Elastic nets are integrated, among others, in the R package glmnet and in Python's scikit-learn.
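To illustrate how the l1 component of the elastic net performs variable selection, a minimal scikit-learn sketch on synthetic data (the data and penalty settings are hypothetical):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# 50 candidate regressors, only 5 of which actually matter.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

# The elastic net mixes the LASSO (l1) and ridge (l2) penalties;
# l1_ratio=0.5 weights them equally. The l1 part can shrink
# irrelevant coefficients exactly to zero, i.e. deselect them.
model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))
print(n_selected)  # number of features with non-zero coefficients
```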
- 15.
There are many packages dealing with different implementations of regression trees in common data science environments, such as rpart, tree, and party for R, and again the machine learning all-rounder scikit-learn in Python. For a more exhaustive introduction to CART models, consider Strobl et al. (2009).
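A minimal CART sketch in scikit-learn, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# A classification tree: recursive binary splits on single variables.
# Capping the depth restrains tree growth and guards against overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=2).fit(X_tr, y_tr)
print(tree.score(X_te, y_te))  # out-of-sample accuracy
```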
- 16.
Indeed, it is worth mentioning here that many model tuning techniques are based on the idea that adding randomness to the prediction process—somewhat counter-intuitively—increases the robustness and out-of-sample prediction performance of the model.
- 17.
Just to give an example, Mullainathan and Spiess (2017) demonstrate how a LASSO might select very different features in every fold.
- 18.
It has to be stressed that even though neural networks are indeed inspired by the most basic concept of how a brain works, they are by no means mysterious artificial brains. The analogy goes only as far as the abstraction of a number of neurons interconnected in some architecture. Each neuron is represented by a sigmoid function (somewhat like a logistic regression) that decides, based on the inputs it receives, whether it should be activated and send a signal to connected neurons, which might in turn trigger their activation. That said, calling a neural network an artificial brain is somewhat like calling a paper plane an artificial bird.
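The abstraction described above fits in a few lines; the weights and inputs here are arbitrary illustration values:

```python
import numpy as np

def sigmoid(z):
    """Squashes any input into (0, 1), like the logistic link function."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs, then activation."""
    return sigmoid(np.dot(inputs, weights) + bias)

# Strong positive evidence pushes the activation towards 1.
print(neuron(np.array([1.0, 2.0]), np.array([0.5, 1.5]), -1.0))  # ~0.924
```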
- 19.
For the sake of simplicity, we will not distinguish here between the simple perceptron model, sigmoid neurons, or the more recently common rectified linear neurons (Glorot et al. 2011).
- 20.
This complex algorithm simultaneously adjusts all weights in the network, considering each neuron's individual contribution to the error.
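For a single sigmoid neuron the idea reduces to one chain-rule gradient step, sketched below (learning rate, data, and target are illustrative; full backpropagation applies the same logic layer by layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron, one training example: target 1.0, squared-error loss.
x, target = np.array([1.0, 0.5]), 1.0
w, b, lr = np.array([0.1, -0.2]), 0.0, 0.5

for _ in range(200):
    a = sigmoid(np.dot(w, x) + b)          # forward pass
    # Chain rule: dLoss/dw = (a - target) * a * (1 - a) * x
    delta = (a - target) * a * (1.0 - a)
    w -= lr * delta * x                    # each weight moves in proportion
    b -= lr * delta                        # to its contribution to the error

print(sigmoid(np.dot(w, x) + b))  # close to the target after training
```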
- 21.
- 22.
Variational autoencoders are a slightly more modern and interesting take on this class of models, and they also performed well in our experiments. Following the KISS principle, we decided to use the more traditional and simpler autoencoder architecture, which is easier to explain and performed almost equally well.
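The core anomaly-detection logic of an autoencoder can be sketched even with a shallow scikit-learn network (a toy stand-in for the deep architecture in the paper; the data are synthetic):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# "Normal" observations lie close to a 2-D pattern in 10-D space.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
normal = latent @ mixing + 0.1 * rng.normal(size=(500, 10))

# An autoencoder is trained to reproduce its own input through a
# narrow bottleneck, forcing it to learn a compressed representation.
ae = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
ae.fit(normal, normal)

def reconstruction_error(X):
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

# Observations far from the learned pattern reconstruct badly,
# so a high reconstruction error flags an anomaly.
anomaly = rng.normal(size=(10, 10)) * 3.0
print(reconstruction_error(normal).mean(), reconstruction_error(anomaly).mean())
```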
References
Aguinis, H., Pierce, C.A., Bosco, F.A., Muslin, I.S.: First decade of organizational research methods: trends in design, measurement, and data-analysis topics. Organ. Res. Methods 12(1), 69–112 (2009)
Ahuja, G., Lampert, C.: Entrepreneurship in the large corporation: a longitudinal study of how established firms create breakthrough inventions. Strateg. Manag. J. 22(6–7), 521–543 (2001)
An, J., Cho, S.: Variational autoencoder based anomaly detection using reconstruction probability. Rep, SNU Data Mining Center, Tech (2015)
Andrews, R.J., Fazio, C., Guzman, J., Stern, S.: The startup cartography project: a map of entrepreneurial quality and quantity in the United States across time and location. MIT Working Paper (2017)
Athey, S., Imbens, G.W.: The state of applied econometrics: causality and policy evaluation. J. Econ. Perspect. 31(2), 3–32 (2017)
Basberg, B.L.: Patents and the measurement of technological change: a survey of the literature. Res. Policy 16(2–4), 131–141 (1987)
Belloni, A., Chernozhukov, V., Hansen, C.: Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 81(2), 608–650 (2014)
Castle, J.L., Qin, X., Reed, W.R., et al.: How to pick the best regression equation: A review and comparison of model selection algorithms. Working Paper No. 13/2009, Department of Economics and Finance, University of Canterbury (2009)
Einav, L., Levin, J.: The data revolution and economic analysis. Innov. Policy Econ. 14(1), 1–24 (2014)
Einav, L., Levin, J.: Economics in the age of big data. Science 346(6210), 1243089 (2014)
Ernst, H.: Patent applications and subsequent changes of performance: evidence from time-series cross-section analyses on the firm level. Res. Policy 30(1), 143–157 (2001)
Fazio, C., Guzman, J., Murray, F., and Stern, S.: A new view of the skew: quantitative assessment of the quality of American entrepreneurship. MIT Innovation Initiative Paper (2016)
Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. Ann. Appl. Stat. 916–954 (2008)
Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E.L., Fei-Fei, L.: Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States. Proc. Natl. Acad. Sci. 201700035 (2017)
George, G., Osinga, E.C., Lavie, D., Scott, B.A.: From the editors: big data and data science methods for management research. Acad. Manag. J. 59(5), 1493–1507 (2016)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Glorot, X., Bordes, A., and Bengio, Y.: Domain adaptation for large-scale sentiment classification: A deep learning approach. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 513–520 (2011)
Guzman, J., Stern, S.: Where is silicon valley? Science 347(6222), 606–609 (2015)
Guzman, J., Stern, S.: Nowcasting and placecasting entrepreneurial quality and performance. In: Haltiwanger, J., Hurst, E., Miranda, J., Schoar, A. (eds.) Measuring Entrepreneurial Businesses: Current Knowledge and Challenges, Chapter 2. University of Chicago Press (2017)
Hagedoorn, J., Schakenraad, J.: A comparison of private and subsidized R&D partnerships in the European information technology industry. JCMS J. Common Market Stud. 31(3), 373–390 (1993)
Hain, D.S., Jurowetzki, R.: The potentials of machine learning and big data in entrepreneurship research-the liaison of econometrics and data science. In: Cowling, M., Saridakis, G. (eds.) Handbook of Quantitative Research Methods in Entrepreneurship. Edward Elgar Publishing (2020)
Hain, D.S., Jurowetzki, R., Buchmann, T., Wolf, P.: A text-embedding-based approach to measuring patent-to-patent technological similarity. Technol. Forecast. Soc. Change 177, 121559 (2022). https://doi.org/10.1016/j.techfore.2022.121559
Hall, B.H., Harhoff, D.: Recent research on the economics of patents. Annu. Rev. Econ. 4(1), 541–565 (2012)
Harhoff, D., Scherer, F.M., Vopel, K.: Citations, family size, opposition and the value of patent rights. Res. Policy 32(8), 1343–1363 (2003)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hirschey, M., Richardson, V.J.: Are scientific indicators of patent quality useful to investors? J. Empir. Financ. 11(1), 91–107 (2004)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
Lerner, J.: The importance of patent scope: an empirical analysis. RAND J. Econ. 319–333 (1994)
McAfee, A., Brynjolfsson, E., Davenport, T.H., et al.: Big data: the management revolution. Harv. Bus. Rev. 90(10), 60–68 (2012)
McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943)
Mullainathan, S., Spiess, J.: Machine learning: an applied econometric approach. J. Econ. Perspect. 31(2), 87–106 (2017)
Narin, F., Hamilton, K.S., Olivastro, D.: The increasing linkage between US technology and public science. Res. Policy 26(3), 317–330 (1997). https://doi.org/10.1016/S0048-7333(97)00013-9
Ng, A.Y.: Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the Twenty-first International Conference on Machine Learning, pp. 78. ACM (2004)
Perlich, C., Provost, F., Simonoff, J.S.: Tree induction vs. logistic regression: a learning-curve analysis. J. Mach. Learn. Res. 4(Jun), 211–255 (2003)
Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., Ljung, L.: Kernel methods in system identification, machine learning and function estimation: a survey. Automatica 50(3), 657–682 (2014)
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144. ACM (2016)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)
Sakurada, M., Yairi, T.: Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, pp. 4. ACM (2014)
Sedhain, S., Menon, A.K., Sanner, S., Xie, L.: Autorec: Autoencoders meet collaborative filtering. In: Proceedings of the 24th International Conference on World Wide Web, pp. 111–112. ACM (2015)
Shane, S.: Technological opportunities and new firm creation. Manag. Sci. 47(2), 205–220 (2001)
Shyu, M.-L., Chen, S.-C., Sarinnapakorn, K., Chang, L.: A novel anomaly detection scheme based on principal component classifier. Technical report, Miami University Coral Gables FL Department of Electrical and Computer Engineering (2003)
Squicciarini, M., Dernis, H., Criscuolo, C.: Measuring patent quality: indicators of technological and economic value (2013)
Strobl, C., Malley, J., Tutz, G.: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods 14(4), 323 (2009)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
Taleb, N.: The black swan: The impact of the highly improbable. Random House Trade Paperbacks (2010)
Therneau, T.M., Atkinson, E.J., et al.: An introduction to recursive partitioning using the rpart routines (1997)
Tian, F., Gao, B., Cui, Q., Chen, E., Liu, T.-Y.: Learning deep representations for graph clustering. In: AAAI, pp. 1293–1299 (2014)
Trajtenberg, M., Henderson, R., Jaffe, A.: University versus corporate patents: a window on the basicness of invention. Econ. Innov. New Technol. 5(1), 19–50 (1997)
van der Vegt, G.S., Essens, P., Wahlström, M., George, G.: Managing risk and resilience. Acad. Manag. J. 58(4), 971–980 (2015)
Varian, H.R.: Big data: New tricks for econometrics. J. Econ. Perspect. 28(2), 3–27 (2014)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
Wainwright, M.: Structured regularizers for high-dimensional problems: statistical and computational issues. Ann. Rev. Stat. Appl. 1, 233–253 (2014)
Wang, Y.: A multinomial logistic regression modeling approach for anomaly intrusion detection. Comput. Secur. 24(8), 662–674 (2005)
Zhou, M., Lang, S.-D.: Mining frequency content of network traffic for intrusion detection. In: Proceedings of the IASTED International Conference on Communication, Network, and Information Security, pp. 101–107 (2003)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hain, D., Jurowetzki, R. (2022). Introduction to Rare-Event Predictive Modeling for Inferential Statisticians—A Hands-On Application in the Prediction of Breakthrough Patents. In: Ngoc Thach, N., Kreinovich, V., Ha, D.T., Trung, N.D. (eds) Financial Econometrics: Bayesian Analysis, Quantum Uncertainty, and Related Topics. ECONVN 2022. Studies in Systems, Decision and Control, vol 427. Springer, Cham. https://doi.org/10.1007/978-3-030-98689-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98688-9
Online ISBN: 978-3-030-98689-6
eBook Packages: Intelligent Technologies and Robotics