Comparing Standard Regression Modeling to Ensemble Modeling: How Data Mining Software Can Improve Economists’ Predictions
Economists’ wariness of data mining may be misplaced, even in cases where economic theory provides a well-specified model for estimation. We discuss how new data mining/ensemble modeling software, such as the program TreeNet, can be used to create predictive models. We then show that, for a standard labor economics problem, the estimation of wage equations, TreeNet achieves lower prediction error than standard OLS regression. Ensemble modeling also resists the tendency to overfit data. We conclude by considering additional types of economic problems that are well suited to data mining techniques.
Keywords: data mining; ensemble modeling
JEL Classifications: C14; C51; J31
The authors thank session participants at the May 2013 Eastern Economics Association meetings and an anonymous referee for helpful suggestions, and Wesleyan University for research support.
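The comparison described in the abstract, OLS against a boosted tree ensemble judged by out-of-sample prediction error, can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: TreeNet is proprietary software, so the sketch substitutes a hand-written gradient boosting routine for squared-error loss built from one-split trees (stumps), without the random subsampling used in stochastic gradient boosting. The data are simulated, and every coefficient in `simulate`, the choice of a Mincer-style OLS specification, and the settings `n_trees` and `lr` are assumptions made only for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Simulate a hypothetical wage data set (all coefficients are made up)."""
    school = rng.integers(8, 21, size=n).astype(float)   # years of schooling
    exper = rng.uniform(0, 40, size=n)                    # years of experience
    log_wage = (1.0 + 0.08 * school + 0.04 * exper - 0.0007 * exper**2
                + 0.25 * (school >= 16)                   # assumed degree premium
                + rng.normal(0, 0.3, size=n))
    return np.column_stack([school, exper]), log_wage


def design(X):
    """Mincer-style design matrix: constant, schooling, experience, experience^2."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 1] ** 2])


def fit_stump(X, r):
    """Find the single split (feature, threshold) that minimizes squared error on r."""
    n, p = X.shape
    best_gain, best = -np.inf, None
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        left_sum = np.cumsum(rs)[:-1]          # residual sum left of each candidate split
        right_sum = rs.sum() - left_sum
        n_left = np.arange(1, n)
        n_right = n - n_left
        # Maximizing this quantity is equivalent to minimizing the split's SSE.
        gain = left_sum**2 / n_left + right_sum**2 / n_right
        valid = xs[:-1] < xs[1:]               # only split between distinct values
        if not valid.any():
            continue
        gain = np.where(valid, gain, -np.inf)
        k = int(np.argmax(gain))
        if gain[k] > best_gain:
            best_gain = gain[k]
            best = (j, (xs[k] + xs[k + 1]) / 2,
                    left_sum[k] / n_left[k], right_sum[k] / n_right[k])
    return best


def boost(X, y, n_trees=200, lr=0.1):
    """Gradient boosting for squared error: each stump is fit to current residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):
        j, thr, left_val, right_val = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= thr, left_val, right_val)
        stumps.append((j, thr, left_val, right_val))
    return f0, lr, stumps


def boost_predict(model, X):
    """Sum the shrunken stump predictions on top of the initial mean."""
    f0, lr, stumps = model
    pred = np.full(len(X), f0)
    for j, thr, left_val, right_val in stumps:
        pred += lr * np.where(X[:, j] <= thr, left_val, right_val)
    return pred


rng = np.random.default_rng(0)
X_train, y_train = simulate(2000, rng)
X_test, y_test = simulate(2000, rng)

# OLS fit on the training sample, evaluated on the held-out sample.
beta, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
ols_mse = np.mean((y_test - design(X_test) @ beta) ** 2)

# Boosted stumps fit on the same training sample.
model = boost(X_train, y_train)
gbm_mse = np.mean((y_test - boost_predict(model, X_test)) ** 2)

print(f"OLS test MSE:            {ols_mse:.4f}")
print(f"Boosted stumps test MSE: {gbm_mse:.4f}")
```

Both models are scored on a held-out sample, so the reported errors measure prediction rather than in-sample fit. Which method comes out ahead depends on the simulated data-generating process and the tuning choices, so the output of this sketch says nothing about the results reported in the article.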
- Bowles, Mike. 2014. Ensemble Packages in R. Revolutions Blog (April 8), Revolution Analytics, http://blog.revolutionanalytics.com/2014/04/ensemble-packages-in-r.html.
- Friedman, Jerome H. 1999a. Stochastic Gradient Boosting. Technical Report, Dept. of Statistics, Stanford University.
- Friedman, Jerome H. 1999b. Greedy Function Approximation: A Gradient Boosting Machine. Technical Report, Dept. of Statistics, Stanford University.
- Jacobsen, Joyce P. 2007. The Economics of Gender, 3rd ed., Malden, Mass.: Blackwell.
- Munnell, Alicia H., Geoffrey M. B. Tootell, Lynne E. Browne, and James McEneaney. 1996. Mortgage Lending in Boston: Interpreting HMDA Data. American Economic Review, 86 (1): 25–53.
- Salford Systems. 2001–05. TreeNet: An Exclusive Implementation of Jerome Friedman’s MART Methodology: Robust Multi-tree Technology for Data Mining, Predictive Modeling and Data Processing, http://www.salford-systems.com/.
- Schonlau, Matthias. 2005. Boosted Regression (Boosting): An Introductory Tutorial and a Stata Plugin. The Stata Journal, 5 (3): 330–354.