Abstract
The era of big data brings both opportunities and challenges for developing new statistical methods and models to evaluate social programs, economic policies, and interventions. This paper provides a comprehensive review of recent advances in statistical methodologies and models for program evaluation with high-dimensional data. In particular, it addresses four kinds of methods for making valid statistical inferences about treatment effects in high dimensions. The first is doubly robust estimation, which models the outcome regression and propensity score functions simultaneously. The second is the covariate balancing method for constructing treatment effect estimators. The third is the sufficient dimension reduction approach to causal inference. The last is the use of machine learning procedures, directly or indirectly, to make statistical inferences about treatment effects. Some of these methods and models are closely related to the de-biased Lasso type methods for high-dimensional regression models in the statistical literature. Finally, some future research topics are discussed.
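To make the idea of doubly robust estimation concrete, the following is a minimal sketch of one common doubly robust form, the augmented inverse probability weighting (AIPW) estimator of the average treatment effect. It is an illustration under stated assumptions, not the procedure of any paper reviewed here: the function names (`fit_logistic`, `aipw_ate`, `simulate`), the simulated design, and the propensity clipping bounds are all hypothetical, and the nuisance functions are fitted by plain logistic and least-squares regression in low dimensions. In a high-dimensional setting these two fits would typically be replaced by regularized estimators such as the Lasso.

```python
import numpy as np


def fit_logistic(Z, t, n_iter=25):
    """Logistic regression by Newton-Raphson; Z must include an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        w = p * (1.0 - p)
        grad = Z.T @ (t - p)
        hess = Z.T @ (Z * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def aipw_ate(X, t, y):
    """AIPW estimate of the average treatment effect and its standard error.

    Combines an outcome regression for each arm with a propensity score model.
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])

    # Propensity score e(x) = P(T = 1 | X = x), clipped away from 0 and 1
    # (the clipping bounds are an illustrative assumption).
    e = 1.0 / (1.0 + np.exp(-Z @ fit_logistic(Z, t)))
    e = np.clip(e, 0.01, 0.99)

    # Outcome regressions m1(x), m0(x), fitted separately on each arm.
    b1, *_ = np.linalg.lstsq(Z[t == 1], y[t == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Z[t == 0], y[t == 0], rcond=None)
    m1, m0 = Z @ b1, Z @ b0

    # AIPW score: regression contrast plus inverse-probability-weighted residuals.
    psi = m1 - m0 + t * (y - m1) / e - (1.0 - t) * (y - m0) / (1.0 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)


def simulate(n, seed=0):
    """Hypothetical data-generating process with a true treatment effect of 2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    e = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    t = rng.binomial(1, e).astype(float)
    y = 2.0 * t + X[:, 0] + X[:, 1] + rng.normal(size=n)
    return X, t, y


if __name__ == "__main__":
    X, t, y = simulate(4000)
    ate, se = aipw_ate(X, t, y)
    print(f"AIPW estimate: {ate:.3f} (standard error {se:.3f})")
```

The estimator is called doubly robust because it remains consistent if either the outcome regressions or the propensity score model is correctly specified, which is why both are fitted in the same procedure.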
Acknowledgement
The authors thank the editor and two anonymous referees for their helpful and constructive comments, which have improved the presentation of this article.
Funding
Supported by the National Natural Science Foundation of China (71631004, 72033008), the National Science Foundation for Distinguished Young Scholars (71625001), and the Science Foundation of the Ministry of Education of China (19YJA910003).
Rights and permissions
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhan, Mf., Cai, Zw., Fang, Y. et al. Recent advances in statistical methodologies in evaluating program for high-dimensional data. Appl. Math. J. Chin. Univ. 37, 131–146 (2022). https://doi.org/10.1007/s11766-022-4489-3
DOI: https://doi.org/10.1007/s11766-022-4489-3