Abstract

This chapter introduces advanced correlational statistical techniques for use when the assumptions of OLS regression are violated. It discusses and demonstrates how graphs and formal tests can detect violations of OLS assumptions in time series and cross-sectional data. It also presents advanced statistical techniques that address those violations.

Notes

  1. The DF-GLS unit root test uses generalized least squares (GLS) regression to de-trend the data.

  2. According to Hoechle (2007), an AR process can be approximated by an MA process.

  3. The ARMAX model is an extension of the Box–Jenkins autoregressive integrated moving average (ARIMA) model that adds exogenous variables. This particular example does not include a moving average (MA) term. The decision to include an MA term is made by inspecting the autocorrelation function of the dependent variable. Although not shown, including the MA term in this example does not change the results. For more information on the ARIMA model, see Box and Jenkins (1970).

  4. For more information on the diffuse option, see the Stata Time-Series Reference Manual, Release 16, Ansley and Kohn (1985), and Harvey (1989).

  5. For more information on inverse roots, see the Stata Time-Series Reference Manual, Release 16, and Hamilton (1994).

  6. For more information on unit root tests for panel data, see the Stata Longitudinal-Data/Panel-Data Reference Manual, Release 16.

  7. For information on the other tests, see Herwartz et al. (2018).

  8. For more information on the use of these tests, see De Hoyos and Sarafidis (2006).
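Notes 1 and 3 can be made concrete with two short sketches. The first writes out the GLS de-trending step of the DF-GLS test in the formulation of Elliott, Rothenberg, and Stock (1996); the second writes out the ARMAX model with AR(1) and AR(2) terms for the first-differenced enrollment regression in the appendix. The symbols (a, z_t, δ, φ, γ_j, β, ρ, μ_t, ε_t) are notation assumed here for illustration and are not taken from the chapter, and software implementations may differ in details such as lag selection.

```latex
% Note 1: DF-GLS de-trending (Elliott, Rothenberg, and Stock 1996).
% z_t holds the deterministic terms: z_t = (1, t)' with a trend, z_t = 1 without.
% T is the number of observations; k is the number of lagged differences.
\begin{align*}
a &= 1 + \bar{c}/T, \qquad \bar{c} = -13.5 \ (\text{constant and trend}), \quad \bar{c} = -7 \ (\text{constant only}) \\
\tilde{y}_1 &= y_1, \qquad \tilde{y}_t = y_t - a\, y_{t-1} \quad (t = 2, \dots, T) \\
\tilde{z}_1 &= z_1, \qquad \tilde{z}_t = z_t - a\, z_{t-1} \quad (t = 2, \dots, T) \\
\hat{\delta} &= \text{OLS coefficients from regressing } \tilde{y}_t \text{ on } \tilde{z}_t \\
y^{d}_t &= y_t - z_t'\hat{\delta} \\
\Delta y^{d}_t &= \phi\, y^{d}_{t-1} + \sum_{j=1}^{k} \gamma_j\, \Delta y^{d}_{t-j} + e_t, \qquad H_0\colon \phi = 0
\end{align*}

% Note 3: ARMAX with AR(1) and AR(2) disturbance terms and no MA term.
% y = lnenpub2yr, x_1 = lntupub2yr, x_2 = lnunemprate (appendix variable names).
\begin{align*}
\Delta y_t &= \beta_0 + \beta_1\, \Delta x_{1t} + \beta_2\, \Delta x_{2t} + \mu_t \\
\mu_t &= \rho_1\, \mu_{t-1} + \rho_2\, \mu_{t-2} + \varepsilon_t
\end{align*}
```

Adding an MA(1) term would append θ₁ε_{t−1} to the disturbance equation, which is the extension that note 3 reports leaves the results unchanged.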

References

  • Ansley, C. F., & Kohn, R. (1985). Estimation, Filtering, and Smoothing in State Space Models with Incompletely Specified Initial Conditions. The Annals of Statistics, 13(4), 1286–1316. https://doi.org/10.1214/aos/1176349739

  • Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. The Review of Economic Studies, 58(2), 277–297. https://doi.org/10.2307/2297968

  • Box, G. E. P., & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day. http://www.gbv.de/dms/hbz/toc/ht000495926.pdf

  • Breusch, T. S. (1978). Testing for autocorrelation in dynamic linear models. Australian Economic Papers, 17(31), 334–355.

  • Cumby, R. E., & Huizinga, J. (1992). Testing the Autocorrelation Structure of Disturbances in Ordinary Least Squares and Instrumental Variables Regressions. Econometrica, 60(1), 185–195.

  • Davidson, R., & MacKinnon, J. G. (1993). Estimation and Inference in Econometrics (1st ed.). Oxford University Press.

  • De Hoyos, R. E., & Sarafidis, V. (2006). Testing for cross-sectional dependence in panel-data models. The Stata Journal, 6(4), 482–496.

  • Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366a), 427–431.

  • Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4), 549–560.

  • Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression: I. Biometrika, 37(3/4), 409–428.

  • Eberhardt, M. (2011). XTCD: Stata module to investigate Variable/Residual Cross-Section Dependence. https://econpapers.repec.org/software/bocbocode/s457237.htm

  • Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient Tests for an Autoregressive Unit Root. Econometrica, 64(4), 813–836.

  • Frees, E. W. (1995). Assessing cross-sectional correlation in panel data. Journal of Econometrics, 69(2), 393–414.

  • Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200), 675–701.

  • Godfrey, L. G. (1978). Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica, 46(6), 1293–1301.

  • Hamilton, J. D. (1994). Time Series Analysis (1st ed.). Princeton University Press.

  • Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

  • Herwartz, H., & Siedenburg, F. (2008). Homogenous panel unit root tests under cross sectional dependence: Finite sample modifications and the wild bootstrap. Computational Statistics & Data Analysis, 53(1), 137–150. https://doi.org/10.1016/j.csda.2008.07.008

  • Herwartz, H., Maxand, S., Raters, F. H., & Walle, Y. M. (2018). Panel unit-root tests for heteroskedastic panels. The Stata Journal, 18(1), 184–196.

  • Hoechle, D. (2007). Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence. The Stata Journal, 7(3), 281–312. https://doi.org/10.1177/1536867X0700700301

  • Hoechle, D. (2018). XTSCC: Stata module to calculate robust standard errors for panels with cross-sectional dependence. In Statistical Software Components. Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s456787.html

  • Pesaran, M. H. (2004). General diagnostic tests for cross section dependence in panels.

  • Pesaran, M. H. (2015). Testing weak cross-sectional dependence in large panels. Econometric Reviews, 34(6–10), 1089–1117.

  • Toutkoushian, R. K., & Paulsen, M. B. (2016). Economics of Higher Education: Background, Concepts, and Applications (1st ed.). Springer.

  • Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.

  • Wursten, J. (2017). XTCDF: Stata module to perform Pesaran’s CD-test for cross-sectional dependence in panel context. In Statistical Software Components. Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s458385.html

8.11 Appendix

*Chapter 8 Stata syntax

*Change working directory and open dataset
cd "C:\Users\Marvin\Dropbox\Manuscripts\Book\Chapter 8\Stata files"
use "Time series - Enrollment & Tuition & fees at 2 yr public HEIs.dta", clear

*set the data to a time series
tsset year

*create Fig. 8.2.1.1. Enrollment, Tuition, and Unemployment, Changes Over
*Time (1970 to 2017)
twoway (line lnenpub2yr year, lcolor(black) lpattern(solid)) ///
    (line lntupub2yr year, lcolor(black) lpattern(dash)) ///
    (line lnunemprate year, lcolor(black) lpattern(dot)), ///
    xlabel(1970 (6) 2017, labsize(small)) ytitle(Logs) ///
    title("Trends in Enrollment in 2 YR, Tuition at 2 YR, and Unemployment Rates" ///
    "1970 to 2017", size(medium))

*conduct the DF-GLS tests
dfgls lnenpub2yr
dfgls lntupub2yr
dfgls lnunemprate

*create Fig. 8.2.1.2. Enrollment, Tuition, and Unemployment,
*First-Differenced (1971 to 2017), take note of options
twoway (line D1.lnenpub2yr year, lcolor(black) lpattern(solid)) ///
    (line D1.lntupub2yr year, lcolor(black) lpattern(dash)) ///
    (line D1.lnunemprate year, lcolor(black) lpattern(dot)), ///
    xlabel(1971 (5) 2017, labsize(small)) ytitle(Change in Logs) ///
    title("First-Differenced Enrollment in 2 YR, Tuition at 2 YR, and Unemployment" ///
    "1971 to 2017", size(small))

*regress the first-differenced log of enrollment on the first-differenced log of tuition
*and unemployment
reg D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate

*create an autocorrelation function or correlogram of the residuals from the regression model
racplot

*generate the residuals from the model
predict residuals, resid

*create a graph of partial autocorrelations
pac residuals, yw

*DW test
estat dwatson

*alternative DW test
quietly: reg D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate, rob
estat durbinalt, force

*time series regression model with an AR term calibrated via the Prais-Winsten (P-W)
*estimator
prais D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate, rob

*generate residuals from the P-W regression
predict residuals_PW, resid

*use the Cumby-Huizinga (C-H) general test of the residuals
ssc install actest
actest residuals_PW, lag(4) q0 rob

*estimate an ARMAX model with first-order (AR1) and second-order (AR2)
*autoregressive terms
arima D1.lnenpub2yr D1.lntupub2yr D1.lnunemprate, ar(1 2 ) vce(robust)

*examine the residuals from the ARMAX model to see if there is any autocorrelation and
*conduct a final test
predict residuals_ARMX12, resid
actest residuals_ARMX12 , lag(4) q0 rob

*fit an ARMAX model to the data levels rather than their first-differences using the
*diffuse option, suppressing the iteration log (nolog)
arima lnenpub2yr lntupub2yr lnunemprate, ar(1 2 ) rob diffuse nolog

*C-H test is used to detect any remaining autocorrelation
predict residuals_nsARMA12dn, resid
actest residuals_nsARMA12dn, q0 rob lag(4)

*To avoid "reverse causality", regress enrollment on at least a one year lag of tuition.
*Include the lag operator (L1) in a re-calibrated ARMAX model and use data through 2017.
arima lnenpub2yr L1.lntupub2yr lnunemprate, ar(1 2 ) rob diff nolog

*Fit an ARIMA model to the same data, using slightly different Stata syntax where
*the arima (2 0 0) indicates the model should include a first-order (AR1) and
*second-order (AR2) autoregressive term, no (0) differencing, and no (0)
*moving average (MA) term.
arima lnenpub2yr L1.lntupub2yr lnunemprate, arima(2 0 0) rob nolog diffuse

*check the stability of the ARMAX model
estat aroots, dlabel

*Examples of Autocorrelation Tests - Panel Data
*open a panel dataset
use "Balanced panel data - state.dta", clear

*test for autocorrelation in the panel data
xtserial lnnetuit lnstapr lnfte lnpc_income, output

*Panel-Data Regression Models with AR terms
*fixed effects model with AR term
xtregar lnnetuit lnstapr lnfte lnpc_income, fe

*panel unit root tests (PURTs); install xtpurt (to install in Stata,
*type "search xtpurt, all", click on "st0519" and install) or type:
net install st0519, replace
xtpurt lnnetuit
xtpurt lnstapr
xtpurt lnfte
xtpurt lnpc_income

*use first-differenced variables in our final fixed- or random-effects regression
*model with an AR1 disturbance term
qui xtregar D1.lnnetuit D1.lnstapr D1.lnfte D1.lnpc_income, re

*generate residuals from the model
predict ar_residuals_re, ue

*conduct the C-H autocorrelation general test of the residuals
actest ar_residuals_re, lags(10) q0 robust

*Tests to Detect Cross-Sectional Dependence - Unobserved Common Factors
*install the Stata user-written routine, xtcsd
ssc install xtcsd

*use unbalanced panel dataset
use "Unbalanced panel data - institutional.dta", clear

*get a sense of the distribution of observations per unit (i.e., institution) in the
*panel dataset
xtdes

*run our fixed-effects regression model using the within regression
*estimator (xtreg, with the fe option)
qui: xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac , fe

*run the Pesaran test and the Friedman test.
xtcsd, pesaran
xtcsd, friedman

*The Frees test is also conducted.
xtcsd, frees

*We download the most recent version of xtcd (Eberhardt 2011).
ssc install xtcd, replace

*Then we run the test on variables of interest from the same panel dataset.
xtcd lneg lntuition lnftfac lnptfac

*Using the variables that we included in a fixed-effects model above, we
*employ a random-effects regression model and apply the test to the residuals.
qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, re
predict ue_residuals_re, ue

*install the xtcd2 routine
ssc install xtcd2, replace

*check for weak cross-sectional dependence
qui: xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe
xtcd2

*use xtcdf (Wursten 2017) to allow for a much faster estimation of the
*Pesaran cross-sectional dependence test and provide additional statistics
ssc install xtcdf, replace
qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe
predict ue_residuals_fe, ue
xtcdf lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac ue_residuals_fe

*Panel Regression Models That Take Cross-Sectional Dependence into Account
*install the routine by Hoechle (2018) for regression models with
*Driscoll and Kraay (D-K) standard errors for use in Stata
ssc install xtscc, replace

*run fixed-effects regression model with D-K standard errors and 2 lags of the AR term
xtscc lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe lag(2)

*check for cross-sectional dependence in the residuals of the regression, including
*year fixed-effects
qui xtscc lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac i.endyear, fe lag(2)
predict xtscc_residuals_fe2y, resid
xtcdf xtscc_residuals_fe2y

*Compare the estimated coefficients of interest to policy analysts or
*researchers, by running and storing the results of three
*regression models: (1) a fixed-effects model without year fixed-effects;
*(2) a fixed-effects model with year fixed-effects; and (3) a fixed-effects
*model with year fixed-effects and D-K standard errors.
eststo: qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac, fe
eststo: qui xtreg lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac i.endyear, fe
eststo: qui xtscc lneg lnstatea lntuition lntotfteiarep lnftfac lnptfac i.endyear, fe lag(2)

*Use esttab command (with the label, p[(fmt)], and keep options as well as the
*Estout varwidth option) to create a table of the stored regression results to
*compare the estimated beta coefficients of variables of interest
*across the three models
esttab, label keep(lnstatea lntuition lntotfteiarep lnftfac lnptfac) varwidth(30) beta(%8.3f)

*end

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Titus, M. (2021). Advanced Statistical Techniques: I. In: Higher Education Policy Analysis Using Quantitative Techniques. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-60831-6_8
