Abstract
This chapter introduces advanced correlational statistical techniques that are used when OLS regression assumptions are violated. It discusses and demonstrates how graphs and formal tests can detect violations of OLS assumptions in time-series and cross-sectional data. It also introduces advanced statistical techniques that address these violations.
Notes
1. The DF-GLS unit root test uses generalized least squares (GLS) regression to de-trend the data.
2. According to Hoechle (2007), an AR process can be approximated by an MA process.
3. The ARMAX model is an extension of the Box–Jenkins autoregressive integrated moving average (ARIMA) model that includes exogenous variables. This particular example does not include a moving average (MA) term. The decision to include an MA term is based on examining the autocorrelation function of the dependent variable. Although not shown, including an MA term in this example does not change the results. For more information on the ARIMA model, see Box and Jenkins (1970).
4.
5. For more information on inverse roots, see the Stata Time-Series Reference Manual, Release 16, and Hamilton (1994).
6. For more information on unit root tests for panel data, see the Stata Longitudinal-Data/Panel-Data Reference Manual, Release 16.
7. For information on the other tests, see Herwartz et al. (2018).
8. For more information on the use of these tests, see De Hoyos and Sarafidis (2006).
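Notes 1, 2, 3, and 5 refer to standard results that can be stated compactly. The following is a sketch in textbook notation rather than the chapter's own derivation: the symbols y_t, z_t, x_t, μ_t, ρ, ρ_1, ρ_2, ε_t, q, and c̄ are introduced here for illustration, and the DF-GLS constants are those given by Elliott, Rothenberg, and Stock (1996). It shows GLS detrending in the DF-GLS test, the ARMAX specification as a regression with AR(1) and AR(2) disturbances, the MA(∞) form of a stationary AR(1) process, and the inverse-root stability condition that estat aroots checks.

```latex
% Note 1: GLS detrending in the DF-GLS test, deterministic terms z_t = (1, t)'
\bar{\alpha} = 1 + \frac{\bar{c}}{T}, \qquad
\bar{c} = -13.5 \ \text{(constant and trend)}, \quad \bar{c} = -7 \ \text{(constant only)}

\tilde{y}_1 = y_1, \quad \tilde{y}_t = y_t - \bar{\alpha}\, y_{t-1}, \qquad
\tilde{z}_1 = z_1, \quad \tilde{z}_t = z_t - \bar{\alpha}\, z_{t-1} \quad (t = 2, \dots, T)

\hat{\beta} = \arg\min_{\beta} \sum_{t=1}^{T} \left( \tilde{y}_t - \tilde{z}_t'\beta \right)^2, \qquad
y_t^{d} = y_t - z_t'\hat{\beta}

% ADF-type regression on the detrended series (no deterministic terms)
\Delta y_t^{d} = \phi\, y_{t-1}^{d} + \sum_{j=1}^{k} \gamma_j\, \Delta y_{t-j}^{d} + \varepsilon_t, \qquad
H_0\colon \phi = 0 \ \text{(unit root)}

% Note 3: ARMAX as a regression with AR(1) and AR(2) disturbances and no MA term,
% e.g., y_t = D1.lnenpub2yr and x_t = (D1.lntupub2yr, D1.lnunemprate)'
y_t = x_t'\beta + \mu_t, \qquad
\mu_t = \rho_1 \mu_{t-1} + \rho_2 \mu_{t-2} + \varepsilon_t

% Note 2: a stationary AR(1) process has an MA(infinity) representation,
% so a finite MA(q) approximates it because rho^j shrinks toward zero
\mu_t = \rho\, \mu_{t-1} + \varepsilon_t
      = \sum_{j=0}^{\infty} \rho^{j} \varepsilon_{t-j}
      \approx \sum_{j=0}^{q} \rho^{j} \varepsilon_{t-j}, \qquad |\rho| < 1

% Note 5: stability of the AR(2) disturbance; estat aroots plots the inverse roots lambda
1 - \rho_1 z - \rho_2 z^2 = 0
\;\Longleftrightarrow\;
\lambda^2 - \rho_1 \lambda - \rho_2 = 0 \quad (\lambda = 1/z),
\qquad \text{stable} \iff |\lambda| < 1 \ \text{for both inverse roots}
```

Stating stability in terms of inverse roots means that all of them must plot inside the unit circle, which is how the estat aroots graph is read.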
References
Ansley, C. F., & Kohn, R. (1985). Estimation, Filtering, and Smoothing in State Space Models with Incompletely Specified Initial Conditions. The Annals of Statistics, 13(4), 1286–1316. https://doi.org/10.1214/aos/1176349739
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. The Review of Economic Studies, 58(2), 277–297. https://doi.org/10.2307/2297968
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis; forecasting and control. Holden-Day. http://www.gbv.de/dms/hbz/toc/ht000495926.pdf
Breusch, T. S. (1978). Testing for autocorrelation in dynamic linear models. Australian Economic Papers, 17(31), 334–355.
Cumby, R. E., & Huizinga, J. (1992). Testing the Autocorrelation Structure of Disturbances in Ordinary Least Squares and Instrumental Variables Regressions. Econometrica, 60(1), 185–195.
Davidson, R., & MacKinnon, J. G. (1993). Estimation and Inference in Econometrics (1st ed.). Oxford University Press.
De Hoyos, R. E., & Sarafidis, V. (2006). Testing for cross-sectional dependence in panel-data models. The Stata Journal, 6(4), 482–496.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366a), 427–431.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4), 549–560.
Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression: I. Biometrika, 37(3/4), 409–428.
Eberhardt, M. (2011). XTCD: Stata module to investigate Variable/Residual Cross-Section Dependence. https://econpapers.repec.org/software/bocbocode/s457237.htm
Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient Tests for an Autoregressive Unit Root. Econometrica, 64(4), 813–836.
Frees, E. W. (1995). Assessing cross-sectional correlation in panel data. Journal of Econometrics, 69(2), 393–414.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200), 675–701.
Godfrey, L. G. (1978). Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica, 46(6), 1293–1301.
Hamilton, J. D. (1994). Time Series Analysis (1st ed.). Princeton University Press.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Herwartz, H., & Siedenburg, F. (2008). Homogenous panel unit root tests under cross sectional dependence: Finite sample modifications and the wild bootstrap. Computational Statistics & Data Analysis, 53(1), 137–150. https://doi.org/10.1016/j.csda.2008.07.008
Herwartz, H., Maxand, S., Raters, F. H., & Walle, Y. M. (2018). Panel unit-root tests for heteroskedastic panels. Stata Journal, 18(1), 184–196.
Hoechle, D. (2007). Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence. The Stata Journal: Promoting Communications on Statistics and Stata, 7(3), 281–312. https://doi.org/10.1177/1536867X0700700301
Hoechle, D. (2018). XTSCC: Stata module to calculate robust standard errors for panels with cross-sectional dependence. In Statistical Software Components. Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s456787.html
Pesaran, M. H. (2004). General diagnostic tests for cross section dependence in panels.
Pesaran, M. H. (2015). Testing weak cross-sectional dependence in large panels. Econometric Reviews, 34(6–10), 1089–1117.
Toutkoushian, R. K., & Paulsen, M. B. (2016). Economics of Higher Education: Background, Concepts, and Applications (1st ed.). Springer.
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.
Wursten, J. (2017). XTCDF: Stata module to perform Pesaran’s CD-test for cross-sectional dependence in panel context. In Statistical Software Components. Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s458385.html
8.11 Appendix
*Chapter 8 Stata syntax

*Change working directory and open dataset
cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 8\Stata files"
use "Time series - Enrollment & Tuition & fees at 2 yr public HEIs.dta", clear

*set the data to a time series
tsset year

*create Fig. 8.2.1.1. Enrollment, Tuition, and Unemployment, Changes Over Time (1970 to 2017)
twoway (line lnenpub2yr year, lcolor(black) lpattern(solid)) ///
    (line lntupub2yr year, lcolor(black) lpattern(dash)) ///
    (line lnunemprate year, lcolor(black) lpattern(dot)), ///
    xlabel(1970 (6) 2017, labsize(small)) ytitle(Logs) ///
    title("Trends in Enrollment in 2 YR, Tuition at 2 YR, and Unemployment Rates" "1970 to 2017", size(medium))

*conduct the DF-GLS tests
dfgls lnenpub2yr
dfgls lntupub2yr
dfgls lnunemprate

*create Fig. 8.2.1.2. Enrollment, Tuition, and Unemployment, First-Differenced (1971 to 2017); take note of options
twoway (line D1.lnenpub2yr year, lcolor(black) lpattern(solid)) ///
    (line D1.lntupub2yr year, lcolor(black) lpattern(dash)) ///
    (line D1.lnunemprate year, lcolor(black) lpattern(dot)), ///
    xlabel(1971 (5) 2017, labsize(small)) ytitle(Change in Logs) ///
    title("First-Differenced Enrollment in 2 YR, Tuition at 2 YR, and Unemployment" "1971 to 2017", size(small))

*regress the first-differenced log of enrollment on the first-differenced logs of tuition and unemployment
reg D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate

*create an autocorrelation function or correlogram of the residuals from the regression model
racplot

*generate the residuals from the model
predict residuals, resid

*create a graph of the partial autocorrelations of the residuals
pac residuals, yw

*Durbin-Watson (DW) test
estat dwatson

*Durbin's alternative test
quietly: reg D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate, rob
estat durbinalt, force

*time-series regression model with an AR term calibrated via the Prais-Winsten (P-W) estimator
prais D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate, rob

*generate residuals from the P-W regression
predict residuals_PW, resid

*use the Cumby-Huizinga (C-H) general test of the residuals
ssc install actest
actest residuals_PW, lag(4) q0 rob

*estimate an ARMAX model with first-order (AR1) and second-order (AR2) autoregressive terms
arima D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate, ar(1 2) vce(robust)

*examine the residuals from the ARMAX model for any remaining autocorrelation and conduct a final test
predict residuals_ARMX12, resid
actest residuals_ARMX12, lag(4) q0 rob

*fit an ARMAX model to the data in levels rather than in first differences, using the diffuse option
*and suppressing the iteration log (nolog)
arima lnenpub2yr lntupub2yr lnunemprate, ar(1 2) rob diffuse nolog

*use the C-H test to detect any remaining autocorrelation
predict residuals_nsARMA12dn, resid
actest residuals_nsARMA12dn, q0 rob lag(4)

*To avoid "reverse causality", regress enrollment on at least a one-year lag of tuition.
*Include the lag operator (L1) in a re-calibrated ARMAX model and use data through 2017.
arima lnenpub2yr L1.lntupub2yr lnunemprate, ar(1 2) rob diffuse nolog

*Fit an ARIMA model to the same data using slightly different Stata syntax, where arima(2 0 0)
*indicates that the model should include first-order (AR1) and second-order (AR2) autoregressive
*terms, no (0) differencing, and no (0) moving average (MA) term.
arima lnenpub2yr L1.lntupub2yr lnunemprate, arima(2 0 0) rob nolog diffuse

*check the stability of the ARMAX model
estat aroots, dlabel

*Examples of Autocorrelation Tests - Panel Data
*open a panel dataset
use "Balanced panel data - state.dta", clear

*test for autocorrelation in the panel data
xtserial lnnetuit lnstapr lnfte lnpc_income, output

*Panel-Data Regression Models with AR terms
*fixed-effects model with an AR term
xtregar lnnetuit lnstapr lnfte lnpc_income, fe

*panel unit root tests (PURTs); install xtpurt (in Stata, type "search xtpurt, all",
*click on "st0519", and install) or type:
net install st0519, from(http://www.stata-journal.com/software/sj18-1) replace
xtpurt lnnetuit
xtpurt lnstapr
xtpurt lnfte
xtpurt lnpc_income

*use first-differenced variables in the final regression, a fixed- or random-effects model
*with an AR1 disturbance term (random effects shown here)
qui xtregar D1.lnnetuit D1.lnstapr D1.lnfte D1.lnpc_income, re

*generate residuals from the model
predict ar_residuals_re, ue

*conduct the C-H general autocorrelation test of the residuals
actest ar_residuals_re, lags(10) q0 robust

*Tests to Detect Cross-Sectional Dependence - Unobserved Common Factors
*install the Stata user-written routine xtcsd
ssc install xtcsd

*use the unbalanced panel dataset
use "Unbalanced panel data - institutional.dta", clear

*get a sense of the distribution of observations per unit (i.e., institution) in the panel dataset
xtdes

*run the fixed-effects regression model using the within regression estimator (xtreg with the fe option)
qui: xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe

*run the Pesaran test and the Friedman test
xtcsd, pesaran
xtcsd, friedman

*The Frees test is also conducted.
xtcsd, frees

*Download the most recent version of xtcd (Eberhardt 2011).
ssc install xtcd, replace

*Then run the test on variables of interest from the same panel dataset.
xtcd lneg lntuition lnftfac lnptfac

*Using the variables included in the fixed-effects model above, employ a random-effects
*regression model and apply the test to the residuals.
qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, re
predict ue_residuals_re, ue

*install the xtcd2 routine
ssc install xtcd2, replace

*check for weak cross-sectional dependence
qui: xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe
xtcd2

*use xtcdf (Wursten 2017) for a much faster estimation of the Pesaran cross-sectional
*dependence test and to obtain additional statistics
ssc install xtcdf, replace
qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe
predict ue_residuals_fe, ue
xtcdf lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac ue_residuals_fe

*Panel Regression Models That Take Cross-Sectional Dependence into Account
*install the routine by Hoechle (2018) that estimates regression models with
*Driscoll and Kraay (D-K) standard errors in Stata
ssc install xtscc, replace

*run a fixed-effects regression model with D-K standard errors, allowing for autocorrelation up to lag 2
xtscc lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe lag(2)

*check for cross-sectional dependence in the residuals of the regression, including year fixed effects
qui xtscc lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac i.endyear, fe lag(2)
predict xtscc_residuals_fe2y, resid
xtcdf xtscc_residuals_fe2y

*Compare the estimated coefficients of interest to policy analysts or researchers by running and
*storing the results of three regression models: (1) a fixed-effects model without year fixed
*effects; (2) a fixed-effects model with year fixed effects; and (3) a fixed-effects model with
*year fixed effects and D-K standard errors.
eststo: qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe
eststo: qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac i.endyear, fe
eststo: qui xtscc lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac i.endyear, fe lag(2)

*Use the esttab command (with the label, beta[(fmt)], and keep options, as well as the estout
*varwidth option) to create a table of the stored regression results that compares the
*estimated beta coefficients of the variables of interest across the three models
esttab, label keep(lnstatea lntuition lntotfteiarep lnftfac lnptfac) varwidth(30) beta(%8.3f)

*end
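Several commands in the syntax above report statistics with simple closed forms. The following is a sketch in textbook notation, not the chapter's own presentation; the symbols e_t, ρ, ρ̂, x_t, y_t, N, T, T_ij, and ρ̂_ij are introduced here for illustration. It gives the Durbin–Watson statistic reported by estat dwatson, the Prais–Winsten transformation that prais applies for AR(1) disturbances, and Pesaran's (2004) CD statistic computed by xtcsd, pesaran and by xtcdf.

```latex
% Durbin-Watson statistic from OLS residuals e_t (estat dwatson);
% rho-hat is the first-order autocorrelation of the residuals
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2\,(1 - \hat{\rho})

% Prais-Winsten transformation for AR(1) disturbances with coefficient rho (prais);
% OLS is then applied to the transformed data, which keeps the first observation
y_1^{*} = \sqrt{1 - \rho^2}\, y_1, \quad x_1^{*} = \sqrt{1 - \rho^2}\, x_1, \qquad
y_t^{*} = y_t - \rho\, y_{t-1}, \quad x_t^{*} = x_t - \rho\, x_{t-1} \quad (t = 2, \dots, T)

% Pesaran CD statistic: rho_ij is the correlation between the residuals of units i and j
% over their T_ij common periods (T_ij = T in a balanced panel); CD ~ N(0,1) under H0
CD = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sqrt{T_{ij}}\; \hat{\rho}_{ij}
```

In a balanced panel the CD expression reduces to sqrt(2T/(N(N-1))) times the sum of the pairwise correlations. Values of d near 2 indicate little first-order autocorrelation in the residuals.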
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Titus, M. (2021). Advanced Statistical Techniques: I. In: Higher Education Policy Analysis Using Quantitative Techniques. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-60831-6_8
DOI: https://doi.org/10.1007/978-3-030-60831-6_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60830-9
Online ISBN: 978-3-030-60831-6
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)