Abstract
In this paper, we develop robust joint estimation of the mean and covariance for the regression model of longitudinal data within the framework of generalized estimating equations (GEE). The proposed approach integrates robust methods with joint mean–covariance regression modeling. Robust generalized estimating equations using bounded scores and leverage-based weights are employed for both the mean and the covariance to achieve robustness against outliers. The resulting estimators are shown to be consistent and asymptotically normally distributed. Simulation studies are conducted to investigate the effectiveness of the proposed method. As expected, the robust method outperforms its non-robust version under contamination. Finally, we illustrate the method by analyzing a hormone data set. By downweighting the potential outliers, the proposed method not only shifts the estimates in the mean model, but also shrinks the range of the innovation variances, leading to a more reliable estimate of the covariance matrix.
References
Cantoni, E. (2004). A robust approach to longitudinal data analysis. The Canadian Journal of Statistics, 32, 169–180.
Croux, C., Gijbels, I., Prosdocimi, I. (2012). Robust estimation of mean and dispersion functions in extended generalized additive models. Biometrics, 68, 31–44.
Daniels, M., Zhao, Y. (2003). Modelling the random effects covariance matrix in longitudinal data. Statistics in Medicine, 22, 1631–1647.
Fan, J., Huang, T., Li, R. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function. Journal of the American Statistical Association, 102, 632–641.
Fan, J., Wu, Y. (2008). Semiparametric estimation of covariance matrices for longitudinal data. Journal of the American Statistical Association, 103, 1520–1533.
Fan, J., Zhang, J. T. (2000). Two-step estimation of functional linear models with application to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 303–322.
Fung, W. K., Zhu, Z. Y., He, X. (2002). Influence diagnostics and outlier tests for semiparametric mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 565–579.
He, X., Fung, W. K., Zhu, Z. Y. (2005). Robust estimation in generalized partial linear models for clustered data. Journal of the American Statistical Association, 100, 1176–1184.
Leng, C., Zhang, W., Pan, J. (2010). Semiparametric mean-covariance regression analysis for longitudinal data. Journal of the American Statistical Association, 105, 181–193.
Levina, E., Rothman, A. J., Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. Annals of Applied Statistics, 2, 245–263.
Liang, K. Y., Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Mao, J., Zhu, Z. Y., Fung, W. K. (2011). Joint estimation of mean-covariance model for longitudinal data with basis function approximations. Computational Statistics and Data Analysis, 55, 983–992.
McCullagh, P. (1983). Quasi-likelihood functions. Annals of Statistics, 11, 59–67.
Pan, J., Mackenzie, G. (2003). On modelling mean-covariance structures in longitudinal studies. Biometrika, 90, 239–244.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika, 86, 677–690.
Pourahmadi, M. (2000). Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika, 87, 425–435.
Qin, G. Y., Zhu, Z. Y. (2007). Robust estimation in generalized semiparametric mixed models for longitudinal data. Journal of Multivariate Analysis, 98, 1658–1683.
Qin, G. Y., Zhu, Z. Y., Fung, W. K. (2009). Robust estimation of covariance parameters in partial linear model for longitudinal data. Journal of Statistical Planning and Inference, 139, 558–570.
Qu, A., Lindsay, B., Li, B. (2000). Improving generalised estimating equations using quadratic inference functions. Biometrika, 87, 823–836.
Sinha, S. (2004). Robust analysis of generalized linear mixed models. Journal of the American Statistical Association, 99, 451–460.
Wang, N. (2003). Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika, 90, 29–42.
Wang, N., Carroll, R. J., Lin, X. (2005a). Efficient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association, 100, 147–157.
Wang, Y. G., Lin, X., Zhu, M. (2005b). Robust estimation functions and bias correction for longitudinal data analysis. Biometrics, 61, 684–691.
Wu, W., Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90, 831–844.
Ye, H., Pan, J. (2006). Modelling covariance structures in generalized estimating equations for longitudinal data. Biometrika, 93, 927–941.
Zhang, D. W., Lin, X. H., Raz, J., Sowers, M. F. (1998). Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association, 93, 710–719.
Acknowledgments
The authors are grateful to the reviewers, the Associate Editor, and the Co-Editor for their insightful comments and suggestions which have improved the manuscript significantly.
Appendix
1.1 Proofs
Regularity conditions:
- A1. We assume that the dimensions \(p,\,q\) and \(d\) of the covariates \(x_{ij}\), \(z_{ij}\) and \(z_{ijk}\) are fixed and that \(\{n_i\}\) is a bounded sequence of positive integers. The first four moments of \(y_{ij}\) exist.
- A2. The parameter space \(\varTheta \) of \((\beta ^{\prime },\gamma ^{\prime },\lambda ^{\prime })^{\prime }\) is a compact subset of \(R^{p+q+d}\), and the true parameter value \((\beta ^{\prime }_{0},\gamma ^{\prime }_{0},\lambda ^{\prime }_{0})^{\prime }\) is in the interior of \(\varTheta \).
- A3. The covariates \(z_{ijk}\) and \(z_{ij}\) and the matrices \(W_{i}^{-1}\) are all bounded, meaning that all of their elements are bounded. The function \(\dot{g}^{-1}(\cdot )\) has bounded second derivatives.
Proof of Theorem 1
For illustration we only give the proof that \(\hat{\beta }_m\rightarrow \beta _0\) almost surely; the proofs for \(\hat{\gamma }_m\) and \(\hat{\lambda }_m\) are similar. According to McCullagh (1983), we have
On the other hand, the expectation and variance matrix of \(U_{1i}=X_i^{\prime }\varDelta _i(V^{\beta }_i)^{-1}h^{\beta }_i(\mu _i(\beta ))\) at \(\beta =\beta _0\) are given by \(E_0(U_{1i})=0\) and
where \(G_i^{0}=\text{diag}\{\dot{g}^{-1}(x_{i1}^{\prime }\beta _0),\dots , \dot{g}^{-1}(x^{\prime }_{in_i}\beta _0)\}\) is an \(n_i\times n_i\) diagonal matrix.
Since \(V_i^{\beta }=A_i^{-1/2}\varSigma _i\) and \(\varSigma _i^{-1}=\varPhi ^{\prime }_iD^{-1}_i\varPhi _i\), the variance can be further written as \(\text{var}_0(U_{1i})=(G^0_iX_i^{\prime }X_i)^{\prime }\varPhi _i(D_i^{-1}A_i^{-1/2})\varPhi ^{\prime }_i\varGamma ^{\beta }_i(G^0_iX_i^{\prime }X_i)\). Condition A3 above implies that there exists a constant \(\kappa _0\) such that \(\text{var}_0(U_{1i})\le \kappa _0 1_{p\times p}\) for all \(i\) and all \(\theta \in \varTheta \), where \(1_{p\times p}\) is the \(p\times p\) matrix with all elements being 1, meaning that all elements of \(\text{var}_0(U_{1i})\) are bounded by \(\kappa _0\). Thus \(\sum ^{\infty }_{i=1}\text{var}_0(U_{1i})/i^2<\infty \). By Kolmogorov's strong law of large numbers we know that
almost surely as \(m\rightarrow \infty \). In the same manner it can be shown that
is a bounded matrix. This leads to \(\hat{\beta }_m-\beta _0\rightarrow 0 \) almost surely as \(m\rightarrow \infty \). The proof is complete. \(\square \)
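The identity \(\varSigma _i^{-1}=\varPhi ^{\prime }_iD^{-1}_i\varPhi _i\) used above is Pourahmadi's (1999) modified Cholesky decomposition, with \(\varPhi _i\) unit lower triangular and \(D_i\) the diagonal matrix of innovation variances. The factorization can be checked numerically as follows (an illustrative sketch via NumPy's standard Cholesky routine, not the paper's code):

```python
import numpy as np

def modified_cholesky(Sigma):
    """Pourahmadi's modified Cholesky: Sigma^{-1} = Phi' D^{-1} Phi,
    with Phi unit lower triangular and D diagonal (innovation variances)."""
    L = np.linalg.cholesky(Sigma)   # Sigma = L L', L lower triangular
    d = np.diag(L)
    T = L / d                       # unit lower triangular: Sigma = T diag(d^2) T'
    Phi = np.linalg.inv(T)          # hence Sigma^{-1} = Phi' D^{-1} Phi
    D = np.diag(d ** 2)
    return Phi, D
```

The below-diagonal entries of \(\varPhi _i\) are (up to sign) the generalized autoregressive coefficients, and the diagonal of \(D_i\) holds the innovation variances; both are unconstrained, which is what makes the covariance parameters amenable to regression modeling.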
Proof of Theorem 2
First we introduce some notation. Define
Next we prove the following Lemma:
Lemma
Under conditions (A1)–(A3),
Define
By conditions (A1)–(A3), \(\Psi (\xi )\) and \(U_1\) have the same root for \(\xi \), and the solution of \(\Psi \) is \(\tilde{\xi }\). Following the proof of Theorem 1 in He et al. (2005), we immediately obtain that
where \(L\) is a sufficiently large number. By Brouwer’s fixed-point theorem, (11) is verified. We can prove (12) and (13) similarly. \(\square \)
By the Lemma, we only need to show the asymptotic normality of \((\tilde{\xi }^{\prime },\ \tilde{\eta }^{\prime },\ \tilde{\zeta }^{\prime })^{\prime }/\sqrt{m}\). This is equivalent to the asymptotic normality of \((\tilde{U}_{1}^{\prime },\ \tilde{U}_{2}^{\prime },\ \tilde{U}_{3}^{\prime })/\sqrt{m}\). Note that conditions (A1)–(A3) imply that
for any \(\varsigma \in R^{p+K},\ \omega \in R^{q}\, \text{ and}\, \phi \in R^{d+K^{\prime }}\), where \(\kappa \) is a constant independent of \(i\).
Furthermore, we have
Hence the asymptotic normality of \((\tilde{U}_{1}^{\prime },\ \tilde{U}_{2}^{\prime },\ \tilde{U}_{3}^{\prime })/\sqrt{m}\) follows from the multivariate Liapounov central limit theorem. Therefore,
The proof of Theorem 2 is complete. \(\square \)
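For the reader's convenience, the Liapounov condition invoked above can be stated as follows; this is the standard scalar form for independent but non-identically distributed summands, not quoted from the paper:

```latex
% Liapounov CLT (scalar form): U_1, U_2, ... independent, E U_i = 0,
% s_m^2 = sum_{i<=m} var(U_i).  If the moment condition below holds for
% some delta > 0, then s_m^{-1} sum_{i<=m} U_i converges in distribution
% to N(0,1).  The multivariate case follows by the Cramer-Wold device.
\[
  \lim_{m\to\infty}\frac{1}{s_m^{2+\delta}}\sum_{i=1}^{m}
  \mathrm{E}\,|U_i|^{2+\delta}=0
  \;\Longrightarrow\;
  \frac{1}{s_m}\sum_{i=1}^{m}U_i \xrightarrow{\ d\ } N(0,1),
  \qquad s_m^{2}=\sum_{i=1}^{m}\mathrm{var}(U_i).
\]
```

Condition (A1) (existence of the first four moments of \(y_{ij}\)) is what supplies the \((2+\delta )\)-th moment bound here.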
Zheng, X., Fung, W.K. & Zhu, Z. Robust estimation in joint mean–covariance regression model for longitudinal data. Ann Inst Stat Math 65, 617–638 (2013). https://doi.org/10.1007/s10463-012-0383-8