# Moderation Analysis Using a Two-Level Regression Model


## Abstract

Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.


## Notes

1. Another implicit assumption in Equations (1) to (5) is that $$x_{i}$$ and $$u_{i}$$ do not contain measurement errors. We will discuss measurement errors in predictors in the concluding section.

2. With the LS estimates of γ and the population values of σ as starting values, convergence is declared when the change in every parameter between two consecutive iterations is smaller than 0.0001, within a maximum of 300 iterations. As we shall see, nonconvergence happens mostly with smaller sample sizes together with stochastic predictors and/or nonnormally distributed errors.

3. The web folder http://www3.nd.edu/~kyuan/moderation/ also contains a SAS IML program (NML.sas), which performs essentially the same function as the R package.

## References

• Aguinis, H. (2004). Regression analysis for categorical moderators. New York: Guilford.

• Aguinis, H., Petersen, S.A., & Pierce, C.A. (1999). Appraisal of the homogeneity of error variance assumption and alternatives for multiple regression for estimating moderated effects of categorical variables. Organizational Research Methods, 2, 315–339.

• Aiken, L.S., & West, S.G. (1991). Multiple regression: testing and interpreting interactions. Thousand Oaks: Sage.

• Baron, R.M., & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.

• Bast, J., & Reitsma, P. (1998). Analyzing the development of individual differences in terms of Matthew effects in reading: results from a Dutch longitudinal study. Developmental Psychology, 34, 1373–1399.

• Carroll, R.J., & Ruppert, D. (1988). Transformation and weighting in regression. New York: Chapman & Hall/CRC.

• Casella, G., & Berger, R.L. (2002). Statistical inference (2nd ed.). Pacific Grove: Duxbury Press.

• Chaplin, W.F. (2007). Moderator and mediator models in personality research: a basic introduction. In R.W. Robins, R.C. Fraley, & R.F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 602–632). New York: Guilford.

• Cohen, J. (1978). Partialed products are interactions; partialed powers are curve components. Psychological Bulletin, 85, 858–866.

• Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics & Data Analysis, 45, 215–233.

• Darlington, R.B. (1990). Regression and linear models. New York: McGraw-Hill.

• Davidson, R., & MacKinnon, J.G. (1993). Estimation and inference in econometrics. Oxford: Oxford University Press.

• Davison, M.L., Kwak, N., Seo, Y.S., & Choi, J. (2002). Using hierarchical linear models to examine moderator effects: person-by-organization interactions. Organizational Research Methods, 5, 231–254.

• Dent, W.T., & Hildreth, C. (1977). Maximum likelihood estimation in random coefficient models. Journal of the American Statistical Association, 72, 69–72.

• DeShon, R.P., & Alexander, R.A. (1996). Alternative procedures for testing regression slope homogeneity when group error variances are unequal. Psychological Methods, 1, 261–277.

• Dretzke, B.J., Levin, J.R., & Serlin, R.C. (1982). Testing for regression homogeneity under variance heterogeneity. Psychological Bulletin, 91, 376–383.

• Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

• Fisicaro, S.A., & Tisak, J. (1994). A theoretical note on the stochastics of moderated multiple regression. Educational and Psychological Measurement, 54, 32–41.

• Froehlich, B.R. (1973). Some estimators for a random coefficient regression model. Journal of the American Statistical Association, 68, 329–335.

• Hayes, A.F., & Cai, L. (2007). Using heteroscedasticity-consistent standard error estimators in OLS regression: an introduction and software implementation. Behavior Research Methods, 39, 709–722.

• Hildreth, C., & Houck, J. (1968). Some estimators for a linear model with random coefficients. Journal of the American Statistical Association, 63, 584–595.

• Hinkley, D.V. (1977). Jackknifing in unbalanced situations. Technometrics, 19, 285–292.

• Holmbeck, G.N. (1997). Toward terminological, conceptual and statistical clarity in the study of mediators and moderators: examples from the child-clinical and pediatric psychology literatures. Journal of Consulting and Clinical Psychology, 65, 599–610.

• Kenny, D., & Judd, C.M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210.

• Littell, R., Milliken, G., Stroup, W., Wolfinger, R., & Schabenberger, O. (2006). SAS for mixed models (2nd ed.). Cary: SAS Institute.

• Long, J.S., & Ervin, L.H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. American Statistician, 54, 217–224.

• MacKinnon, J.G., & White, H. (1985). Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29, 305–325.

• Marsh, H.W., Wen, Z., & Hau, K.-T. (2004). Structural equation models of latent interactions: evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.

• Nelson, E.A., & Dannefer, D. (1992). Aged heterogeneity: fact or fiction? The fate of diversity in gerontological research. The Gerontologist, 32, 17–23.

• Newsom, J.T., Prigerson, H.G., Schulz, R., & Reynolds, C.F. (2003). Investigating moderator hypotheses in aging research: statistical, methodological, and conceptual difficulties with comparing separate regressions. The International Journal of Aging & Human Development, 57, 119–150.

• Ng, M., & Wilcox, R.R. (2010). Comparing the regression slopes of independent groups. British Journal of Mathematical & Statistical Psychology, 63, 319–340.

• Overton, R.C. (2001). Moderated multiple regression for interactions involving categorical variables: a statistical control for heterogeneous variance across two groups. Psychological Methods, 6, 218–233.

• Ping, R.A. (1996). Latent variable interaction and quadratic effect estimation: a two-step technique using structural equation analysis. Psychological Bulletin, 119, 166–175.

• Preacher, K.J., & Merkle, E.C. (2012). The problem of model selection uncertainty in structural equation modeling. Psychological Methods, 17, 1–14.

• Shieh, G. (2009). Detection of interactions between a dichotomous moderator and a continuous predictor in moderated multiple regression with heterogeneous error variance. Behavior Research Methods, 41, 61–74.

• Singh, B., Nagar, A.L., Choudhry, N.K., & Raj, B. (1976). On the estimation of structural change: a generalization of the random coefficients regression model. International Economic Review, 17, 340–361.

• Tang, W., Yu, Q., Crits-Christoph, P., & Tu, X.M. (2009). A new analytic framework for moderation analysis–moving beyond analytic interactions. Journal of Data Science, 7, 313–329.

• Weisberg, S. (1980). Applied linear regression. New York: Wiley.

• White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838.

• Wooldridge, J.M. (2010). Econometric analysis of cross section and panel data (2nd ed.). Cambridge: MIT Press.

• Wu, C.F.J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics, 14, 1261–1295.

• Yuan, K.-H., & Bentler, P.M. (1997). Improving parameter tests in covariance structure analysis. Computational Statistics & Data Analysis, 26, 177–198.

• Yuan, K.-H., & Bentler, P.M. (2010). Finite normal mixture SEM analysis by fitting multiple conventional SEM models. Sociological Methodology, 40, 191–245.

## Acknowledgement

The research of Ke-Hai Yuan was partially supported by a grant from the National Natural Science Foundation of China (31271116).

## Author information


### Corresponding author

Correspondence to Ke-Hai Yuan.

## Appendices

### Appendix A. Iteratively Reweighted Least Squares (IRLS) Algorithm for NMLEs

This Appendix contains the development of the IRLS algorithm that maximizes the likelihood function l(θ) in (8). Setting the partial derivatives of l(θ) with respect to γ and σ to zero yields the normal estimating equations

$$\sum_{i=1}^n \frac{1}{\tau_i^2}\bigl(y_i-\mathbf {c}_i'\boldsymbol{\gamma}\bigr)\mathbf {c}_i=\mathbf {0},$$
(A.1)

and

$$\sum_{i=1}^n \frac{1}{2\tau_i^4}\bigl[\bigl(y_i-\mathbf {c}_i' \boldsymbol{\gamma}\bigr)^2-{\bf h}_i' \boldsymbol{\sigma}\bigr]{\bf h}_i=\mathbf {0}.$$
(A.2)

We need to solve (A.1) and (A.2) for NMLEs of γ and σ. If the $$\tau_{i}^{2}$$s are known, then the solution to (A.1) is

$$\hat{\boldsymbol{\gamma}}=\Biggl(\sum _{i=1}^n\frac{1}{\tau _i^2}\mathbf {c}_i \mathbf {c}_i'\Biggr)^{-1} \Biggl(\sum _{i=1}^n\frac{1}{\tau_i^2}\mathbf {c}_iy_i \Biggr),$$
(A.3)

and the solution to (A.2) is

$$\hat{\boldsymbol{\sigma}}=\Biggl(\sum _{i=1}^n \frac{1}{\tau_i^4}{\bf h}_i{ \bf h}_i'\Biggr)^{-1}\Biggl[ \sum _{i=1}^n \frac{1}{\tau_i^4}{\bf h}_i \bigl(y_i-\mathbf {c}_i'\hat{\boldsymbol{\gamma}}\bigr)^2\Biggr].$$
(A.4)

Although the $$\tau_{i}^{2}$$s are unknown in practice, they can be estimated as $${\bf h}_{i}'\hat{\boldsymbol{\sigma}}$$ once a $$\hat{\boldsymbol{\sigma}}$$ is available. Thus, Equations (A.1) and (A.2) can be solved by the following IRLS algorithm:

1. (S1) With an initial value of σ, obtain each $$\tau_{i}^{2}={\bf h}_{i}'\boldsymbol{\sigma}$$, i=1,2,…,n.

2. (S2) Obtain $$\hat{\boldsymbol{\gamma}}$$ by (A.3) and $$\hat{\boldsymbol{\sigma}}$$ by (A.4).

3. (S3) Replace the current σ by the $$\hat{\boldsymbol{\sigma}}$$ from (S2) and go back to (S1).

4. (S4) Repeat (S1) to (S3) until $$\hat{\boldsymbol{\gamma}}$$ and $$\hat{\boldsymbol{\sigma}}$$ stabilize.

The converged solutions are the NMLEs of γ and σ.
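As a concrete illustration, steps (S1) to (S4) can be sketched in Python/NumPy. This is an illustrative translation of Equations (A.3) and (A.4), not the authors' R package; the simulated model, parameter values, and function name below are all made up for the demonstration:

```python
import numpy as np

def irls_nml(y, C, H, tol=1e-4, max_iter=300):
    """IRLS for the NML estimating equations (A.1)-(A.2).

    y : (n,) responses; C : (n, q) design for the mean (rows c_i');
    H : (n, r) design for the variance structure (rows h_i').
    """
    # LS start for gamma; regress squared residuals on H for a sigma start
    gamma = np.linalg.lstsq(C, y, rcond=None)[0]
    sigma = np.linalg.lstsq(H, (y - C @ gamma) ** 2, rcond=None)[0]
    for _ in range(max_iter):
        tau2 = np.maximum(H @ sigma, 1e-8)   # (S1), guarded to stay positive
        w = 1.0 / tau2
        # (A.3): weighted LS for gamma with weights 1/tau_i^2
        gamma_new = np.linalg.solve((C * w[:, None]).T @ C, C.T @ (w * y))
        # (A.4): weighted LS of squared residuals on h_i with weights 1/tau_i^4
        r2 = (y - C @ gamma_new) ** 2
        sigma_new = np.linalg.solve((H * (w**2)[:, None]).T @ H, H.T @ (w**2 * r2))
        done = max(np.max(np.abs(gamma_new - gamma)),
                   np.max(np.abs(sigma_new - sigma))) < tol
        gamma, sigma = gamma_new, sigma_new
        if done:
            break
    return gamma, sigma

# Demo on the simple model (4): y = beta_i0 + beta_i1 * x, with both
# coefficients moderated by u (parameter values are made up).
rng = np.random.default_rng(0)
n = 4000
x, u = rng.standard_normal(n), rng.standard_normal(n)
eps0 = 0.5 * rng.standard_normal(n)   # sigma_00 = 0.25
eps1 = 0.4 * rng.standard_normal(n)   # sigma_11 = 0.16
e = rng.standard_normal(n)            # sigma_e^2 = 1
y = (1.0 + 0.5 * u + eps0) + (2.0 + 0.3 * u + eps1) * x + e

C = np.column_stack([np.ones(n), u, x, x * u])  # gamma_00, gamma_01, gamma_10, gamma_11
H = np.column_stack([x**2, 2 * x, np.ones(n)])  # sigma_11, sigma_01, sigma_00 + sigma_e^2
gamma_hat, sigma_hat = irls_nml(y, C, H)
```

Note that the last column of H carries the confounded sum $$\sigma_{00}+\sigma_{e}^{2}$$, mirroring the identifiability remark made for Equation (B.5).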

### Appendix B. Two-Level Regression with p Predictors and m Moderators

This Appendix contains the details of extending the simple two-level regression model in (4) to cases with p predictors and m moderators. With p predictors, the counterpart of the first regression equation in (4) is

$$y_i=\beta_{i0}+\beta_{i1}x_{i1}+ \beta_{i2}x_{i2}+\cdots+\beta _{ip}x_{ip}+e_i= \mathbf {c}'_i\boldsymbol {\beta }_i+e_i, \quad i=1,2,\ldots,n,$$
(B.1)

where $$\mathbf {c}_{i}'=(1,x_{i1},x_{i2},\ldots,x_{ip})$$ and $$\boldsymbol{\beta}_{i}=(\beta_{i0},\beta_{i1},\beta_{i2},\ldots,\beta_{ip})'$$. The counterparts of the second and third regression equations in (4) are

$$\beta_{ij}=\gamma_{j0}+\gamma_{j1}u_{i1}+ \gamma_{j2}u_{i2}+\cdots +\gamma _{jm}u_{im}+ \varepsilon_{ij},\quad j=0,1,2,\ldots, p;\ i=1,2,\ldots ,n.$$
(B.2)

Let $$\mathbf{u}_{i}=(1,u_{i1},u_{i2},\ldots,u_{im})'$$, $$\boldsymbol{\varepsilon}_{i}=(\varepsilon_{i0},\varepsilon_{i1},\varepsilon_{i2},\ldots,\varepsilon_{ip})'$$, and

$$\boldsymbol {\Gamma }=\left ( \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} \gamma_{00}&\gamma_{01}&\gamma_{02}&\cdots&\gamma_{0m}\\ \gamma_{10}&\gamma_{11}&\gamma_{12}&\cdots&\gamma_{1m}\\ \gamma_{20}&\gamma_{21}&\gamma_{22}&\cdots&\gamma_{2m}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \gamma_{p0}&\gamma_{p1}&\gamma_{p2}&\cdots&\gamma_{pm} \end{array} \right ).$$

We can rewrite (B.2) in matrix form as

$$\boldsymbol {\beta }_i=\boldsymbol {\Gamma }\mathbf {u}_i+ \boldsymbol {\varepsilon }_i,\quad i=1,2,\ldots, n.$$
(B.3)

The counterpart of Equation (5) is obtained by putting (B.3) into (B.1),

$$y_i=\mathbf {c}_i'\boldsymbol {\Gamma }\mathbf {u}_i+\delta_i,$$
(B.4)

where $$\delta_{i}=\mathbf {c}_{i}'\boldsymbol {\varepsilon }_{i}+e_{i}$$. Let

$$\boldsymbol {\Sigma }=\mathrm {Var}(\boldsymbol {\varepsilon }_i)= \left ( \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} \sigma_{00}&\sigma_{01}&\sigma_{02}&\cdots&\sigma_{0p}\\ \sigma_{10}&\sigma_{11}&\sigma_{12}&\cdots&\sigma_{1p}\\ \sigma_{20}&\sigma_{21}&\sigma_{22}&\cdots&\sigma_{2p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \sigma_{p0}&\sigma_{p1}&\sigma_{p2}&\cdots&\sigma_{pp} \end{array} \right ).$$

Then

$$\tau_i^2=\mathrm {Var}(\delta_i)= \mathbf {c}_i'\boldsymbol {\Sigma }\mathbf {c}_i+\sigma_e^2.$$
(B.5)

As in the single-predictor case, the parameters $$\sigma_{00}$$ and $$\sigma_{e}^{2}$$ are not distinguishable in (B.5) because the data do not have a nested structure.
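Writing out (B.4) term by term shows why the mean structure of the two-level model coincides with that of an MMR containing all product terms: with $$x_{i0}=u_{i0}=1$$,

$$y_i=\mathbf{c}_i'\boldsymbol{\Gamma}\mathbf{u}_i+\delta_i=\sum_{j=0}^{p}\sum_{k=0}^{m}\gamma_{jk}x_{ij}u_{ik}+\delta_i,$$

so the two analyses differ only in the variance structure: LS with the MMR model treats $$\mathrm{Var}(\delta_i)$$ as a constant $$\sigma^{2}$$, whereas (B.5) lets it depend on $$\mathbf{c}_i$$.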

As the counterpart of (16),

$$R_j^2=\frac{\hat {\boldsymbol {\gamma }}_j'\mathbf {S}_{uu}\hat {\boldsymbol {\gamma }}_j}{\hat {\boldsymbol {\gamma }}_j'\mathbf {S}_{uu}\hat {\boldsymbol {\gamma }}_j+\hat{\sigma}_j^2}$$

is the estimate of the percentage of variance of $$\beta_{ij}$$ accounted for by the m moderators, where $$\hat{\boldsymbol{\gamma}}_{j}=(\hat{\gamma}_{j1},\hat{\gamma}_{j2},\ldots, \hat{\gamma}_{jm})'$$, j=1,2,…,p; and $$\mathbf{S}_{uu}$$ is the sample covariance matrix of $$\mathbf{u}_{i}=(u_{i1},u_{i2},\ldots,u_{im})'$$, i=1,2,…,n.
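A minimal numerical sketch of this $$R_j^2$$ formula (illustrative Python; the function name and the values plugged in for $$\hat{\boldsymbol{\gamma}}_{j}$$, $$\mathbf{S}_{uu}$$, and $$\hat{\sigma}_{j}^{2}$$ are made up):

```python
import numpy as np

def r2_moderators(gamma_j, S_uu, sigma_jj):
    """R_j^2: proportion of Var(beta_ij) accounted for by the m moderators.

    gamma_j  : slopes (gamma_j1, ..., gamma_jm); the intercept gamma_j0 is excluded
    S_uu     : m x m sample covariance matrix of (u_i1, ..., u_im)
    sigma_jj : estimated residual variance of beta_ij, i.e. Var(eps_ij)
    """
    explained = float(gamma_j @ S_uu @ gamma_j)
    return explained / (explained + sigma_jj)

# One moderator with unit variance, slope 0.3, residual variance 0.16:
# explained = 0.09, so R^2 = 0.09 / (0.09 + 0.16) = 0.36
r2 = r2_moderators(np.array([0.3]), np.array([[1.0]]), 0.16)
```

In this toy case the single moderator accounts for 36% of the variance of the coefficient, which is the kind of summary the two-level model provides and the MMR analysis does not.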

### Appendix C. R Package

This Appendix introduces an R package to perform NML estimation of the two-level regression model. The package can be downloaded at http://www3.nd.edu/~kyuan/moderation/NML.R. A simulated data set with 6 variables (y,x 1,x 2,u 1,u 2,u 3) and 500 cases is used to illustrate the use of the package, and the data set can be downloaded at http://www3.nd.edu/~kyuan/moderation/simudata.dat. Both of these files are saved in the folder d:/moderation/ in this illustration with names NML.R and simudata.dat, respectively.

The code for running the package and its utilities is documented in Appendix D. The first three lines of the code change the working directory, load the package into the R Console, and read the data, respectively. Lines 4 to 10 of Appendix D identify the number of cases, the dependent variable (y), possible level-1 predictors (x1, x2), and possible level-2 predictors or moderators (u1, u2, u3). In this package, we regard the level-1 and level-2 intercepts as the regression coefficients corresponding to the predictors $$x_{i0}=1$$ and $$u_{i0}=1$$, respectively. This is done in lines 11 and 12, where the labels or column names identify the parameter estimates corresponding to the intercepts.

Lines 16 to 20 specify the level-1 and level-2 regression models. In the example, the level-1 model specified by L1=cbind(x0,x1,x2) has three coefficients, $$\beta_{i0}$$, $$\beta_{i1}$$, and $$\beta_{i2}$$, corresponding to $$x_{i0}=1$$, $$x_{i1}$$, and $$x_{i2}$$, respectively. Lines 17 to 19 specify the level-2 models corresponding to each of the level-1 coefficients: L20=cbind(u0,u1,u2) assigns three predictors to $$\beta_{i0}$$ ($$u_{i0}=1$$, $$u_{i1}$$, and $$u_{i2}$$); L21=cbind(u0,u2,u3) assigns three predictors to $$\beta_{i1}$$ ($$u_{i0}=1$$, $$u_{i2}$$, and $$u_{i3}$$); and L22=cbind(u0,u1,u2,u3) assigns four predictors to $$\beta_{i2}$$ ($$u_{i0}=1$$, $$u_{i1}$$, $$u_{i2}$$, and $$u_{i3}$$). The 20th line in Appendix D puts all the level-2 predictors together to pass to the package. Notice that the specification of level-2 predictors must correspond to the level-1 predictors. For example, if you decide to leave out the intercept in level-1, then the specification of level-1 and level-2 predictors becomes

L1=cbind(x1,x2);#level-1 predictors;

L21=cbind(u0,u2,u3);#level-2 predictors for beta_i1;

L22=cbind(u0,u1,u2,u3);#level-2 predictors for beta_i2;

L2=list(L21,L22); #all level-2 predictors;

Lines 24 to 26 in Appendix D set up an H matrix corresponding to $${\bf h}_{i}$$ in Equation (7), which contains the predictors for the variance parameters in σ. Line 24 requests that six variance parameters be estimated, in the order $$\sigma_{11}=\mathrm{Var}(\varepsilon_{i1})$$, $$\sigma_{22}=\mathrm{Var}(\varepsilon_{i2})$$, $$\sigma_{12}=\mathrm{Cov}(\varepsilon_{i1},\varepsilon_{i2})$$, $$\sigma_{01}=\mathrm{Cov}(\varepsilon_{i0},\varepsilon_{i1})$$, $$\sigma_{02}=\mathrm{Cov}(\varepsilon_{i0},\varepsilon_{i2})$$, and $$\sigma_{0e}^{2}=\mathrm {Var}(\varepsilon_{i0})+\mathrm {Var}(e)$$. If one chooses not to let the prediction errors at level-2 covary, then line 24 needs to be set as H_mat=cbind(x1*x1, x2*x2, 1), which typically results in smaller SEs for the variance estimates. Lines 25 and 26 label the variance estimates so that they are correctly identified in the output. For example, if the level-2 prediction errors are not allowed to covary, then line 25 should be set as H_name=cbind("x1x1", "x2x2", "x0e"). In particular, the labels x1x1 and x2x2 are used to identify the proper variance estimates for calculating R-squares in the package. We do not encourage users to change the notation used in labeling.

Line 28 runs the package, and the output of running Appendix D is in Appendix E. In addition to the results of NML for the two-level model, the default output also contains the results of the LS analysis for the corresponding MMR model, where xjuk corresponds to the level-1 coefficient of xj predicted by the level-2 predictor uk; e.g., x0u0 corresponds to the intercept $$\gamma_{00}$$, the level-1 coefficient of x0 predicted by the level-2 predictor u0. For the LS estimates, the default output contains SEls, SEsw0, SEsw3, SEsw4, and the corresponding z-scores. The results corresponding to SEsw0 are not in Appendix E because there is not enough horizontal space. We choose to output these four sets of SEs because SEls is the default for LS analysis; SEsw0 is the most commonly used consistent SE in software; SEsw3 performs best with smaller sample sizes according to Long and Ervin (2000); and SEsw4 is most reliable in the presence of high-leverage observations (Cribari-Neto 2004).
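For readers who want sandwich-type SEs outside the package, a minimal Python sketch of the HC0 and HC3 estimators (the generic forms from White (1980) and Long and Ervin (2000), not the package's internal code; the function and variable names are ours, and the demo data are simulated):

```python
import numpy as np

def ls_with_hc_se(X, y, kind="hc3"):
    """LS estimates with heteroscedasticity-consistent (sandwich) SEs.

    HC0 weights the meat by e_i^2; HC3 weights it by e_i^2 / (1 - h_ii)^2,
    where h_ii are the leverages.
    """
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ (X.T @ y)
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, bread, X)   # leverages h_ii
    w = e**2 if kind == "hc0" else e**2 / (1.0 - h) ** 2
    meat = (X * w[:, None]).T @ X
    cov = bread @ meat @ bread                  # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# Demo: with homoscedastic errors the HC SEs track the classical ones
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.standard_normal(n)
beta, se_hc3 = ls_with_hc_se(X, y, kind="hc3")
```

Under heteroscedasticity the HC weights are what distinguish SEsw0 from SEsw3; SEsw4 further adjusts the exponent on the leverages, which we do not sketch here.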

The labels for the results of NML are the same as those for the results of LS. The default output of NML also contains SEsw0 and the corresponding z-scores, which are not included in Appendix E due to space limitations.


Yuan, KH., Cheng, Y. & Maxwell, S. Moderation Analysis Using a Two-Level Regression Model. Psychometrika 79, 701–732 (2014). https://doi.org/10.1007/s11336-013-9357-x