Eastern Economic Journal, Volume 42, Issue 3, pp 387–398

Comparing Standard Regression Modeling to Ensemble Modeling: How Data Mining Software Can Improve Economists’ Predictions

  • Joyce P Jacobsen
  • Laurence M Levin
  • Zachary Tausanovitch


Economists’ wariness of data mining may be misplaced, even in cases where economic theory provides a well-specified model for estimation. We discuss how new data mining/ensemble modeling software, such as the program TreeNet, can be used to create predictive models. We then show that for a standard labor economics problem, the estimation of wage equations, TreeNet outperforms standard OLS regression in terms of lower prediction error. Ensemble modeling resists the tendency to overfit data. We conclude by considering additional types of economic problems that are well-suited to the use of data mining techniques.
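The comparison described in the abstract rests on out-of-sample prediction error: fit each model on one sample, then score it on a holdout sample it never saw. The following is a minimal Python sketch of that workflow. It is not the authors' code, data, or TreeNet itself. It assumes a hypothetical, invented data-generating process for log wages (schooling with a made-up jump at 16 years, plus a concave experience profile) and uses a Mincer-style OLS specification. As a stand-in for TreeNet it uses a bare-bones stochastic gradient boosting of one-split regression trees (stumps) under squared-error loss, in the spirit of Friedman (1999a, 1999b). All function names and parameter values (`n_trees=200`, `learning_rate=0.1`, `subsample=0.5`) are illustrative assumptions.

```python
import random


def make_data(n, seed):
    # HYPOTHETICAL data-generating process, invented for illustration only.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        educ = rng.randint(8, 20)
        exper = rng.uniform(0.0, 40.0)
        log_wage = (1.0 + 0.08 * educ + 0.04 * exper - 0.0008 * exper ** 2
                    + (0.3 if educ >= 16 else 0.0)  # made-up degree jump
                    + rng.gauss(0.0, 0.2))
        rows.append(((educ, exper), log_wage))
    return rows


def solve(a, b):
    # Gaussian elimination with partial pivoting for the small normal equations.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][k] * beta[k] for k in range(r + 1, n))) / m[r][r]
    return beta


def ols_features(x):
    # Mincer-style design: constant, schooling, experience, experience squared.
    educ, exper = x
    e = exper / 10.0  # rescaled to keep the normal equations well conditioned
    return [1.0, educ, e, e * e]


def fit_ols(rows):
    feats = [ols_features(x) for x, _ in rows]
    k = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, (_, y) in zip(feats, rows)) for i in range(k)]
    beta = solve(xtx, xty)
    return lambda x: sum(b * v for b, v in zip(beta, ols_features(x)))


def fit_stump(xs, resid, idx):
    # One-split regression tree on the rows in idx, chosen to minimize squared error.
    n, total, best = len(idx), sum(resid[i] for i in idx), None
    for j in range(len(xs[0])):
        order = sorted(idx, key=lambda i: xs[i][j])
        left = 0.0
        for pos in range(n - 1):
            left += resid[order[pos]]
            lo, hi = xs[order[pos]][j], xs[order[pos + 1]][j]
            if lo == hi:
                continue  # cannot split between tied values
            nl = pos + 1
            gain = left ** 2 / nl + (total - left) ** 2 / (n - nl)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2.0, left / nl, (total - left) / (n - nl))
    _, feat, thr, left_val, right_val = best
    return lambda x: left_val if x[feat] <= thr else right_val


def fit_boosting(rows, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    # Stochastic gradient boosting with squared-error loss: each stump is fitted
    # to the current residuals on a random subsample, then shrunk by learning_rate.
    rng = random.Random(seed)
    xs, ys = [x for x, _ in rows], [y for _, y in rows]
    base = sum(ys) / len(ys)
    pred, stumps = [base] * len(ys), []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the loss (y - f)^2 / 2.
        resid = [y - p for y, p in zip(ys, pred)]
        idx = rng.sample(range(len(rows)), int(subsample * len(rows)))
        stump = fit_stump(xs, resid, idx)
        stumps.append(stump)
        pred = [p + learning_rate * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, rows):
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)


train = make_data(1000, seed=1)
test = make_data(1000, seed=2)  # holdout sample, never used in fitting
ols_mse = mse(fit_ols(train), test)
boost_mse = mse(fit_boosting(train), test)
print(f"holdout MSE  OLS: {ols_mse:.4f}  boosting: {boost_mse:.4f}")
```

The shrinkage factor and the random half-sample drawn at each round are the two devices that keep each added tree from chasing noise, which is one concrete sense in which a boosted ensemble can resist overfitting; the holdout MSE, not the in-sample fit, is the number to compare. Because the OLS design here omits the invented degree jump, it is misspecified by construction, so the sketch illustrates the mechanics of the comparison rather than reproducing the paper's results.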


Keywords

data mining, ensemble modeling

JEL Classifications

C14 C51 J31 



The authors thank session participants at the May 2013 Eastern Economic Association meetings and an anonymous referee for helpful suggestions, and Wesleyan University for research support.


  1. Belloni, Alexandre, Victor Chernozhukov, and Christian Hansen. 2014. High-dimensional Methods and Inference on Structural and Treatment Effects. Journal of Economic Perspectives, 28 (2): 29–50.
  2. Blinder, Alan. 1973. Wage Discrimination: Reduced Form and Structural Estimates. Journal of Human Resources, 8 (4): 436–455.
  3. Bowles, Mike. 2014. Ensemble Packages in R. Revolutions Blog (April 8), Revolution Analytics, http://blog.revolutionanalytics.com/2014/04/ensemble-packages-in-r.html.
  4. Cook, Thomas D. 2014. “Big Data” in Research on Social Policy. Journal of Policy Analysis and Management, 33 (2): 544–547.
  5. Friedman, Jerome H. 1999a. Stochastic Gradient Boosting. Technical Report, Dept. of Statistics, Stanford University.
  6. Friedman, Jerome H. 1999b. Greedy Function Approximation: A Gradient Boosting Machine. Technical Report, Dept. of Statistics, Stanford University.
  7. Jacobsen, Joyce P. 2007. The Economics of Gender, 3rd ed., Malden, Mass.: Blackwell.
  8. Munnell, Alicia H., Geoffrey M. B. Tootell, Lynne E. Browne, and James McEneaney. 1996. Mortgage Lending in Boston: Interpreting HMDA Data. American Economic Review, 86 (1): 25–53.
  9. Oaxaca, Ronald. 1973. Male-female Wage Differentials in Urban Labor Markets. International Economic Review, 14 (3): 693–709.
  10. Salford Systems. 2001–05. TreeNet: An Exclusive Implementation of Jerome Friedman’s MART Methodology: Robust Multi-tree Technology for Data Mining, Predictive Modeling and Data Processing, http://www.salford-systems.com/.
  11. Schonlau, Matthias. 2005. Boosted Regression (Boosting): An Introductory Tutorial and a Stata Plugin. The Stata Journal, 5 (3): 330–354.
  12. Stock, James H. 2010. The Other Transformation in Econometric Practice: Robust Tools for Inference. Journal of Economic Perspectives, 24 (2): 83–94.
  13. Varian, Hal R. 2014. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives, 28 (2): 3–28.
  14. Wichard, Jorg D., and Maciej Ogorzalek. 2007. Time Series Prediction with Ensemble Models Applied to the CATS Benchmark. Neurocomputing, 70 (13–15): 2371–2378.
  15. Willis, Robert J. 1986. Wage Determinants: A Survey and Reinterpretation of Human Capital Earnings Functions. Handbook of Labor Economics, 1: 525–602.

Copyright information

© Eastern Economic Association 2015

Authors and Affiliations

  • Joyce P Jacobsen (1)
  • Laurence M Levin (2)
  • Zachary Tausanovitch (3)

  1. Economics Department, Public Affairs Center, Wesleyan University, Middletown, USA
  2. VISA Inc.
  3. Network for Teaching Entrepreneurship, New York
