## Abstract

When studying the association between an exposure and an outcome, it is common to use regression models to adjust for measured confounders. The most common models in epidemiologic research are logistic regression and Cox regression, which estimate conditional (on the confounders) odds ratios and hazard ratios. Once the model has been fitted, one can use regression standardization to estimate marginal measures of association. If the measured confounders are sufficient for confounding control, the marginal association measures can be interpreted as population causal effects. In this paper we describe a new R package, stdReg, that carries out regression standardization with generalized linear models (e.g. logistic regression) and Cox regression models. We illustrate the package with several examples, using real data that are publicly available.
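The core idea of regression standardization can be shown in a few lines. This minimal Python sketch assumes a logistic model has already been fitted. The coefficient values, the confounder data and the helper `standardized_risk` are hypothetical illustrations, not part of stdReg. The sketch sets everyone's exposure to a fixed level, averages the model's predicted risks over the observed confounder values, and contrasts the two averages.

```python
import math

def expit(u):
    """Inverse of the logit link (eta^{-1} for logistic regression)."""
    return 1.0 / (1.0 + math.exp(-u))

# Hypothetical fitted coefficients for logit P(Y=1 | X, Z) = b0 + bx*X + bz*Z.
# Illustrative values only, not estimates from any real data set.
b0, bx, bz = -1.0, 0.8, 0.5

# Hypothetical observed confounder values Z_i, one per subject.
z = [0.0, 1.0, 2.0, 1.0, 0.0, 3.0]

def standardized_risk(x, z, b0, bx, bz):
    """Average predicted risk if every subject's exposure were set to x,
    averaging over the observed confounder values (regression standardization)."""
    return sum(expit(b0 + bx * x + bz * zi) for zi in z) / len(z)

theta0 = standardized_risk(0, z, b0, bx, bz)
theta1 = standardized_risk(1, z, b0, bx, bz)

# Marginal measures of association built from the two standardized risks.
risk_difference = theta1 - theta0
risk_ratio = theta1 / theta0
marginal_or = (theta1 / (1 - theta1)) / (theta0 / (1 - theta0))

print(f"theta(0)={theta0:.3f} theta(1)={theta1:.3f}")
print(f"RD={risk_difference:.3f} RR={risk_ratio:.3f} OR={marginal_or:.3f}")
```

Here the marginal odds ratio is about 2.11. The conditional odds ratio is exp(0.8), about 2.23. The two differ even though the model has no interaction, which is why the marginal and conditional measures are distinct targets.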

## References

Rothman K, Greenland S, Lash T. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.

Gail M, Byar D. Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biom J. 1986;28(5):587–99.

Sjölander AAF. stdReg: Regression Standardization. R package version 0.1. 2016.

Dahlqwist E, Sjölander AAF. Model-based estimation of confounder-adjusted attributable fractions. R package version 0.1. 2015.

Stefanski L, Boos D. The calculus of M-estimation. Am Stat. 2002;56(1):29–38.

Breslow N, Day N. Statistical methods in cancer research. The analysis of case–control studies, vol. 1. Lyon: IARC/WHO; 1980.

van der Laan M. Estimation based on case–control designs with known prevalence probability. Int J Biostat. 2008;4(1):a17.

De Jong U, Breslow N, Hong G, Ewe J, Sridharan M, Shanmugaratnam K. Aetiological factors in oesophageal cancer in Singapore Chinese. Int J Cancer. 1974;13(3):291–303.

Sjölander A, Vansteelandt S, Humphreys K. A principal stratification approach to assess the differences in prognosis between cancers caused by hormone replacement therapy and by other factors. Int J Biostat. 2010;6(1):a20.

Breslow N. Discussion of the paper by D. R. Cox. J R Stat Soc B. 1972;34(2):216–7.

Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–73.

Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003;14(6):680–6.

Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Progr Biomed. 2004;75(1):45–9.

Robins J. Robust estimation in sequentially ignorable missing data and causal inference models. Proc Am Stat Assoc. 2000;1999:6–10.

Bai X, Tsiatis A, O’Brien S. Doubly-robust estimators of treatment-specific survival distributions in observational studies with stratified sampling. Biometrics. 2013;69(4):830–9.

Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9):1393–512.

## Appendix 1: Asymptotic distribution for standardized measures

For generalized linear models, let \(x_0\) and \(x_1\) be fixed constants. Let \(\psi =g\{\theta (x_0),\theta (x_1)\}\) be a function of \(\theta (x_0)\) and \(\theta (x_1)\), e.g. \(\theta (x_1)-\theta (x_0)\). Define \(\nu =\{\beta ,\theta (x_0),\theta (x_1),\psi \}\). The estimator \(\hat{\nu }=[\hat{\beta },\hat{\theta }(x_0),\hat{\theta }(x_1),g\{\hat{\theta } (x_0),\hat{\theta } (x_1)\}]\) is an M-estimator [5] that solves the estimating equation

\[
\sum _{i=1}^{n}U_{\nu ,i}(\hat{\nu })=0,\qquad U_{\nu ,i}(\nu )=\begin{pmatrix}U_{\beta ,i}(\beta )\\ U_{\theta (x_0),i}\{\beta ,\theta (x_0)\}\\ U_{\theta (x_1),i}\{\beta ,\theta (x_1)\}\\ U_{\psi ,i}\{\theta (x_0),\theta (x_1),\psi \}\end{pmatrix},
\]

where \(U_{\beta ,i}(\beta )\) is the contribution to the maximum likelihood score function from subject *i*, \(U_{\theta (x),i}\{\beta ,\theta (x)\}=\eta ^{-1}\{h(X=x,Z_i;\beta )\}-\theta (x)\) for \(x=x_1\) and \(x=x_0\), and \(U_{\psi ,i}\{\theta (x_0),\theta (x_1),\psi \}=g\{\theta (x_0),\theta (x_1)\}-\psi\).
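The stacked estimating equation can be solved jointly for all components of \(\nu\). This is a hypothetical pure-Python sketch under several assumptions made for illustration. The model is logistic, \(h(X,Z;\beta )=\beta _0+\beta _xX+\beta _zZ\), and \(g\) is the difference \(\theta (x_1)-\theta (x_0)\). The toy data, the helper names, and the Newton-Raphson solver with a finite-difference Jacobian are not stdReg's implementation. The sum of the \(\theta (x_0)\) rows vanishes exactly when \(\hat{\theta }(x_0)\) equals the average of the predicted risks at \(X=x_0\).

```python
import math

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

# Hypothetical toy data: exposure X, confounder Z, binary outcome Y.
X = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
Z = [0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
Y = [0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0]
x0, x1 = 0, 1

def U_i(nu, x, z, y):
    """Contribution of one subject to the stacked estimating function,
    nu = (b0, bx, bz, theta(x0), theta(x1), psi)."""
    b0, bx, bz, t0, t1, psi = nu
    resid = y - expit(b0 + bx * x + bz * z)
    return [resid, resid * x, resid * z,          # logistic score U_beta
            expit(b0 + bx * x0 + bz * z) - t0,    # U_theta(x0)
            expit(b0 + bx * x1 + bz * z) - t1,    # U_theta(x1)
            (t1 - t0) - psi]                      # U_psi with g = difference

def total_U(nu):
    """Sum over subjects of U_i(nu); the estimator solves total_U(nu) = 0."""
    return [sum(col) for col in zip(*(U_i(nu, x, z, y) for x, z, y in zip(X, Z, Y)))]

def jacobian(f, nu, h=1e-6):
    """Central-difference Jacobian J[r][k] = d f_r / d nu_k."""
    m = len(nu)
    J = [[0.0] * m for _ in range(m)]
    for k in range(m):
        up, dn = list(nu), list(nu)
        up[k] += h
        dn[k] -= h
        for r, (a, b) in enumerate(zip(f(up), f(dn))):
            J[r][k] = (a - b) / (2 * h)
    return J

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v

# Newton-Raphson on the whole stacked system, starting from zero.
nu = [0.0] * 6
for _ in range(50):
    step = solve(jacobian(total_U, nu), [-u for u in total_U(nu)])
    nu = [a + d for a, d in zip(nu, step)]
    if max(abs(d) for d in step) < 1e-10:
        break

b0, bx, bz, theta0, theta1, psi = nu
print(f"theta(x0)={theta0:.4f} theta(x1)={theta1:.4f} psi={psi:.4f}")
```

Solving the rows jointly gives the same point estimates as fitting the model first and then averaging predictions. The stacked form matters for the variance, because it lets the uncertainty in \(\hat{\beta }\) propagate into \(\hat{\theta }\) and \(\hat{\psi }\).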

For Cox regression models, let \(x_0\), \(x_1\) and *t* be fixed constants. Let \(\psi =g\{\theta (t,x_0),\theta (t,x_1)\}\) be a function of \(\theta (t,x_0)\) and \(\theta (t,x_1)\), e.g. \(\theta (t,x_1)-\theta (t,x_0)\). Define \(\nu =\{\beta ,{\varLambda }_0(t),\theta (t,x_0),\theta (t,x_1),\psi \}\). The estimator \(\hat{\nu }=[\hat{\beta },\hat{{\varLambda }}_0(t),\hat{\theta }(t,x_0),\hat{\theta }(t,x_1),g\{\hat{\theta } (t,x_0),\hat{\theta } (t,x_1)\}]\) is an M-estimator [5] that solves the estimating equation

\[
\sum _{i=1}^{n}U_{\nu ,i}(\hat{\nu })=0,\qquad U_{\nu ,i}(\nu )=\begin{pmatrix}U_{\beta ,i}(\beta )\\ U_{{\varLambda }_0(t),i}\{\beta ,{\varLambda }_0(t)\}\\ U_{\theta (t,x_0),i}\{\beta ,{\varLambda }_0(t),\theta (t,x_0)\}\\ U_{\theta (t,x_1),i}\{\beta ,{\varLambda }_0(t),\theta (t,x_1)\}\\ U_{\psi ,i}\{\theta (t,x_0),\theta (t,x_1),\psi \}\end{pmatrix},
\]

where \(U_{\beta ,i}(\beta )\) is the contribution to the Cox partial likelihood score function from subject *i*, \(U_{{\varLambda }_0(t),i}\{\beta ,{\varLambda }_0(t)\}\) is the contribution to the estimating function for Breslow’s estimator of the cumulative baseline hazard from subject *i*, \(U_{\theta (t,x),i}\{\beta ,{\varLambda }_0(t),\theta (t,x)\}=\text {exp}[-{\varLambda }_0(t)\text {exp}\{h(X=x,Z_i;\beta )\}]-\theta (t,x)\) for \(x=x_1\) and \(x=x_0\), and \(U_{\psi ,i}\{\theta (t,x_0),\theta (t,x_1),\psi \}=g\{\theta (t,x_0),\theta (t,x_1)\}-\psi\).
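This hypothetical Python sketch shows how \(\theta (t,x)\) is formed from Breslow's estimator. It assumes that \(\hat{\beta }\) has already been obtained by maximizing the Cox partial likelihood, which is not done here, and that \(h(X,Z;\beta )=\beta _xX+\beta _zZ\). The data, the coefficient values and the function names are illustrative assumptions.

```python
import math

# Hypothetical data: follow-up time T, event indicator D, exposure X, confounder Z.
T = [2.0, 3.0, 3.5, 5.0, 6.0, 7.5, 8.0, 9.0]
D = [1, 1, 0, 1, 1, 0, 1, 0]
X = [1, 0, 1, 1, 0, 0, 1, 0]
Z = [0.5, 1.0, 0.0, 1.5, 0.2, 0.8, 1.1, 0.3]

# Hypothetical coefficients, assumed to come from a Cox partial-likelihood fit.
bx, bz = 0.7, 0.4

def lin_pred(x, z):
    """h(X=x, Z=z; beta) with no intercept, as in the Cox model."""
    return bx * x + bz * z

def breslow_cumhaz(t):
    """Breslow estimator of Lambda_0(t): sum over events at or before t of
    1 / (sum of exp(h) over subjects still at risk at that event time)."""
    total = 0.0
    for ti, di in zip(T, D):
        if di == 1 and ti <= t:
            risk_set = sum(math.exp(lin_pred(xj, zj))
                           for tj, xj, zj in zip(T, X, Z) if tj >= ti)
            total += 1.0 / risk_set
    return total

def standardized_survival(t, x):
    """theta(t, x): average over subjects of exp[-Lambda_0(t) exp{h(X=x, Z_i)}]."""
    lam = breslow_cumhaz(t)
    return sum(math.exp(-lam * math.exp(lin_pred(x, zi))) for zi in Z) / len(Z)

t = 6.0
s0, s1 = standardized_survival(t, 0), standardized_survival(t, 1)
print(f"theta({t}, 0)={s0:.3f} theta({t}, 1)={s1:.3f} difference={s1 - s0:.3f}")
```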

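For both generalized linear models and Cox regression models, it now follows from standard theory for M-estimators [5] that \(n^{1/2}(\hat{\nu }-\nu )\) is asymptotically normal with mean 0 and variance given by the ‘sandwich formula’

\[
E\left\{\frac{\partial U_{\nu ,i}(\nu )}{\partial \nu ^{T}}\right\}^{-1}\text{var}\{U_{\nu ,i}(\nu )\}\left[E\left\{\frac{\partial U_{\nu ,i}(\nu )}{\partial \nu ^{T}}\right\}^{-1}\right]^{T}. \tag{5}
\]

A consistent estimate of the variance in (5) is obtained by replacing \(\nu\) with \(\hat{\nu }\) and the population moments by their sample counterparts. Dividing this estimate by \(n\) gives an estimate of the variance of \(\hat{\nu }\).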
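A hypothetical pure-Python sketch of the sandwich estimator follows. It assumes a linear (identity-link) model so that the output can be sanity-checked. Without an exposure–confounder interaction, \(\psi =\theta (x_1)-\theta (x_0)\) equals the exposure coefficient, so the two estimated variances should agree. Derivatives are taken by central finite differences. The data and helper names are assumptions for illustration.

```python
import math

# Hypothetical toy data: exposure X, confounder Z, continuous outcome Y.
X = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
Z = [1.2, 0.4, 2.0, 1.5, 0.3, 0.9, 2.2, 1.1, 0.7, 1.6]
Y = [2.1, 3.0, 2.9, 4.2, 2.5, 1.8, 4.9, 2.2, 3.1, 2.6]
x0, x1 = 0, 1
n = len(Y)

def U_i(nu, x, z, y):
    """Stacked estimating function for a linear model (identity link),
    nu = (b0, bx, bz, theta(x0), theta(x1), psi) with psi = theta(x1) - theta(x0)."""
    b0, bx, bz, t0, t1, psi = nu
    r = y - (b0 + bx * x + bz * z)
    return [r, r * x, r * z,
            (b0 + bx * x0 + bz * z) - t0,
            (b0 + bx * x1 + bz * z) - t1,
            (t1 - t0) - psi]

def mean_U(nu):
    """Sample mean of U_i(nu) over subjects."""
    return [sum(col) / n for col in zip(*(U_i(nu, x, z, y) for x, z, y in zip(X, Z, Y)))]

def jacobian(f, nu, h=1e-5):
    """Central-difference Jacobian J[r][k] = d f_r / d nu_k."""
    m = len(nu)
    J = [[0.0] * m for _ in range(m)]
    for k in range(m):
        up, dn = list(nu), list(nu)
        up[k] += h
        dn[k] -= h
        for r, (a, b) in enumerate(zip(f(up), f(dn))):
            J[r][k] = (a - b) / (2 * h)
    return J

def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [float(i == j) for j in range(m)] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[m:] for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Solve mean_U(nu) = 0 by Newton-Raphson (the system is linear here).
nu = [0.0] * 6
for _ in range(3):
    Ainv = inverse(jacobian(mean_U, nu))
    g = mean_U(nu)
    nu = [nu[r] - sum(Ainv[r][k] * g[k] for k in range(6)) for r in range(6)]

# Formula (5) with nu replaced by nu_hat and moments by sample moments.
Ainv = inverse(jacobian(mean_U, nu))      # inverse of the estimate of E{dU/dnu^T}
rows = [U_i(nu, x, z, y) for x, z, y in zip(X, Z, Y)]
# var{U}: mean outer product (U averages to zero at nu_hat, so no centering).
B = [[sum(u[r] * u[k] for u in rows) / n for k in range(6)] for r in range(6)]
AinvT = [list(col) for col in zip(*Ainv)]
V = [[v / n for v in row] for row in matmul(matmul(Ainv, B), AinvT)]  # var(nu_hat)

psi_hat, se_psi = nu[5], math.sqrt(V[5][5])
print(f"psi = {psi_hat:.3f}, SE = {se_psi:.3f}")
```

The same function structure works unchanged for any stacked estimating function, including the nonlinear logistic and Cox cases. Only `U_i` and the Newton iteration count would need to change.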

The sandwich formula assumes that \(U_{\nu ,i}(\nu )\) and \(U_{\nu ,i^{\prime}}(\nu )\) are independent for \(i\ne i^{\prime}\). When data are clustered, as in the example in the section ‘Standardization with generalized linear models’, we may instead define \(U_{\nu ,i}(\nu )=\sum _{j=1}^{n_i}U_{\nu ,ij}(\nu )\). Here \(U_{\nu ,ij}(\nu )\) is the contribution to the estimating equation from subject *j* in cluster *i*, and \(n_i\) is the number of subjects in cluster *i*. Provided that the clusters are independent, \(U_{\nu ,i}(\nu )\) and \(U_{\nu ,i^{\prime}}(\nu )\) are then independent for \(i\ne i^{\prime}\). The sandwich formula therefore still applies, with \(n\) now denoting the number of clusters.
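This hypothetical Python sketch shows the clustered version of the middle ("meat") term \(\text{var}\{U_{\nu ,i}(\nu )\}\) in (5). Per-subject contributions \(U_{\nu ,ij}(\hat{\nu })\) are first summed within clusters, and the variance is then estimated from the cluster totals. The contribution values and cluster labels are made up for illustration. In a full computation, the derivative term and the factor \(n\) would likewise be taken over clusters.

```python
def cluster_totals(contribs, cluster_ids):
    """U_i(nu) = sum_j U_ij(nu): add up the per-subject vectors within each cluster."""
    totals = {}
    for u, c in zip(contribs, cluster_ids):
        acc = totals.setdefault(c, [0.0] * len(u))
        for k, v in enumerate(u):
            acc[k] += v
    return list(totals.values())

def meat(vectors):
    """Sample analogue of var{U_i(nu)}: mean outer product over independent units
    (the mean of U_i is zero at nu_hat, so no centering is needed)."""
    m, p = len(vectors), len(vectors[0])
    return [[sum(u[r] * u[k] for u in vectors) / m for k in range(p)] for r in range(p)]

# Hypothetical per-subject contributions U_ij(nu_hat) for a two-dimensional nu;
# each column sums to zero, as it does at the solution of the estimating equation.
contribs = [[0.5, -0.2], [-0.1, 0.3], [0.4, 0.1], [-0.6, -0.4], [0.2, 0.5], [-0.4, -0.3]]
clusters = ["a", "a", "b", "b", "c", "c"]

B_subjects = meat(contribs)                              # ignores the clustering
B_clusters = meat(cluster_totals(contribs, clusters))    # independent clusters as units
print(B_subjects)
print(B_clusters)
```

When every subject forms its own cluster, the two estimates coincide. This is the unclustered case of the sandwich formula.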

## About this article

### Cite this article

Sjölander, A. Regression standardization with the R package stdReg. *Eur J Epidemiol* **31**, 563–574 (2016). https://doi.org/10.1007/s10654-016-0157-3
