# Robust estimation in joint mean–covariance regression model for longitudinal data

Zheng, X., Fung, W. K. & Zhu, Z. Ann Inst Stat Math (2013) 65: 617. doi: 10.1007/s10463-012-0383-8

## Abstract

In this paper, we develop robust estimation of the mean and covariance jointly for the regression model of longitudinal data within the framework of generalized estimating equations (GEE). The proposed approach integrates robust methodology with joint mean–covariance regression modeling. Robust generalized estimating equations using bounded scores and leverage-based weights are employed for both the mean and the covariance to achieve robustness against outliers. The resulting estimators are shown to be consistent and asymptotically normally distributed. Simulation studies investigate the effectiveness of the proposed method; as expected, the robust method outperforms its non-robust version under contamination. Finally, we illustrate the method by analyzing a hormone data set. By downweighing the potential outliers, the proposed method not only shifts the estimates in the mean model, but also shrinks the range of the innovation variance, leading to a more reliable estimate of the covariance matrix.

### Keywords

Covariance matrix · Generalized estimating equation · Longitudinal data · Modified Cholesky decomposition · Robustness

## 1 Introduction

Longitudinal data are often characterized by the dependence of repeated observations over time within the same subject: observations within the same subject are prone to be correlated. In a marginal model, the within-subject correlation among repeated measurements is typically not of primary interest, but it must be taken into account for proper inference. In fact, ignoring the within-subject correlation can result in an inefficient estimator of a regression model; see Qu et al. (2000) and Wang (2003). A well-modeled covariance can decrease the bias of mean estimation for longitudinal data with missing values (Daniels and Zhao 2003) and maintain a reliable estimate of the covariance matrix even when the mean model is misspecified (Pan and Mackenzie 2003). In some cases, the correlation structure plays a role as important as the mean structure, which suggests that estimation of the covariance matrix is crucial in longitudinal studies.

To overcome these challenges, a substantial recent literature considers the mean and covariance matrix simultaneously. Pourahmadi (1999, 2000) developed a parametric joint mean and covariance model in the GEE framework via the modified Cholesky decomposition, but that method does not handle irregularly observed measurements. Pan and Mackenzie (2003) exploited a reparameterisation of the marginal covariance matrix to extend Pourahmadi's work to irregular cases. Wu and Pourahmadi (2003) proposed non-parametric smoothing to regularize the estimation of the covariance matrix, using the two-step estimation of Fan and Zhang (2000). To relax the parametric assumption on the mean and covariance structure, Fan et al. (2007) and Fan and Wu (2008) proposed semiparametric models; however, they only considered normal or nearly normal data. For longitudinal data analysis, Liang and Zeger (1986) introduced the generalized estimating equations technique, which Ye and Pan (2006) developed further to estimate both mean and covariance parameters. To relax the parametric assumption posed in Ye and Pan (2006), Leng et al. (2010) proposed a semiparametric model for the mean and covariance structure.

The GEE approach enjoys some built-in robustness since no specification of the full likelihood is required. However, in a longitudinal data set, one outlier at the subject level may generate a set of outliers in the sample due to repeated measurements. Robust methods are therefore needed, because estimating equations are highly sensitive to outliers in the sample. Robust regression methods have been developed separately for mean parameter estimation and for covariance matrix estimation. An incomplete list of recent works on robust GEE methods includes Cantoni (2004), He et al. (2005), Wang et al. (2005a), Qin and Zhu (2007), Qin et al. (2009) and Croux et al. (2012).

However, as far as we know, there is little discussion of robust estimation in the joint mean and covariance model. Croux et al. (2012) considered robustification of the mean and covariance, setting up estimating equations for both the mean and the dispersion parameter; their approach is constrained, however, by an inflexible covariance structure determined by two parameters. In this article, following Ye and Pan (2006) and He et al. (2005), we establish a set of robust generalized estimating equations for a parametric joint mean and covariance regression model for longitudinal data. Robust generalized estimating equations using bounded scores and leverage-based weights are developed for the mean and covariance to achieve robustness against outliers. The advantage of the proposed joint model lies in modeling the covariance matrix with a moderate number of parameters, rather than assuming a fixed structure or introducing an unstructured covariance matrix that suffers from the curse of dimensionality.

Similar to He et al. (2005), Mallows-type weights are used to downweigh the effect of leverage points, while a bounded score function of the Pearson residuals reduces the effect of outliers in the response. Mallows-type weights have also been used by Qin and Zhu (2007) in the generalized semiparametric mixed model and by Qin et al. (2009) in the generalized partial linear mixed model for longitudinal data analysis. The resulting estimators of the regression coefficients in both the mean and the covariance are shown to be consistent and asymptotically normally distributed. In the simulation studies, on the one hand, the robust method in the joint model yields better estimators of both mean and covariance parameters under contamination; on the other hand, we find that robustification in both the mean and the covariance matrix estimation is necessary, since the fully robust method performs far better than the method that is robust only in the mean. In the hormone data analysis, the main advantage of the robust method lies in successfully detecting both subject-level and observation-level potential outliers, which results in a more reliable estimation of both the mean and the covariance.

The rest of the article is organized as follows. In Sect. 2, we formulate the robust joint mean and covariance model, introduce the estimation methods and establish the theoretical properties. Simulation studies are presented in Sect. 3. Finally, we analyze a hormone data set to illustrate the proposed method in Sect. 4.

## 2 Methodology

### 2.1 Joint mean–covariance model

Suppose that we have a sample of \(m\) subjects. Let \(y_i=(y_{i1},\dots ,y_{in_i})^{\prime }\) be the \(n_i\) repeated measurements at time point \(t_i=(t_{i1},\dots ,t_{in_i})^{\prime }\) of the \(i\)th subject. Let \(E(y_i)=\mu _i=(\mu _{i1},\dots ,\mu _{in_i})^{\prime }\) and \(Cov(y_i)=\Sigma _i\) be the \(n_i\times 1\) mean vector and \(n_i\times n_i\) covariance matrix of \(y_i\), respectively.
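The covariance model relies on the modified Cholesky decomposition \(T_i\Sigma _iT_i^{\prime }=D_i\) (as in Pourahmadi 1999; it reappears in Study 1 of Sect. 3), with \(T_i\) unit lower triangular and \(D_i\) diagonal. A minimal numerical sketch, assuming only that \(\Sigma _i\) is positive definite, recovers the factors from the ordinary Cholesky factor:

```python
import numpy as np

def modified_cholesky(sigma):
    """Decompose a covariance matrix so that T @ sigma @ T.T = D,
    where T is unit lower-triangular (its below-diagonal entries are
    the negatives of the generalized autoregressive parameters) and
    D is the diagonal matrix of innovation variances."""
    C = np.linalg.cholesky(sigma)                 # sigma = C @ C.T, C lower-triangular
    T = np.diag(np.diag(C)) @ np.linalg.inv(C)    # unit lower-triangular
    D = np.diag(np.diag(C) ** 2)                  # innovation variances
    return T, D

# Illustrative AR(1)-type covariance
rho, n = 0.5, 4
sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
T, D = modified_cholesky(sigma)
```

For this AR(1) example, the only nonzero below-diagonal entries of \(T\) sit on the first subdiagonal, reflecting the one-step autoregressive structure.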

### 2.2 Robust generalized estimating equations for the mean

He et al. (2005) noted that the GEE approach has some built-in robustness since it requires no specification of the full likelihood. However, estimating equations are highly sensitive to outliers in the sample, and in longitudinal studies an outlier in a subject-level measurement can generate multiple outliers due to repeated measurements. Sinha (2004) also emphasized that a small proportion of the data may come from an arbitrary distribution rather than the assumed one; such deviations from the underlying distribution can produce outliers or influential observations. A robust method is therefore desirable to mitigate the effect of outliers and obtain bounded influence functions.

### 2.3 Robust estimating equations for joint mean and covariance model

Furthermore, \(V^\beta _{i}=A_{i}^{-1/2}\Sigma _{i}\), where \(A_{i}\) is the diagonal matrix formed from the diagonal elements of \(\Sigma _{i}\); \(V^{\gamma }_{i}=D_i^{1/2}\); and \(V^{\lambda }_{i}=\widetilde{A}_i^{-1/2}\widetilde{\Sigma }_i\), where \(\widetilde{A}_{i}\) is the diagonal matrix formed from the diagonal elements of \(\widetilde{\Sigma }_{i}\). Similar to Ye and Pan (2006), the sandwich working covariance structure \(\widetilde{\Sigma }_i=B_i^{1/2}R_i(\delta )B_i^{1/2}\) can be used to model the true \(\widetilde{\Sigma }_i=Var(\epsilon ^2_i)\), with \(B_i=2\text{ diag}\{ \sigma ^4_{i1},\dots ,\sigma ^4_{in_i}\}\) and \(R_i(\delta )\) mimicking the correlation between \(\epsilon ^2_{ij}\) and \(\epsilon ^2_{ik}\) through a new parameter \(\delta \). Typical structures for \(R_{i}(\delta )\) include compound symmetry and AR(1). Although no particular suggestion on how to choose the structure or the value of \(\delta \) was provided in Ye and Pan (2006), the parameter \(\delta \) has little effect on the estimation in practice, as confirmed in the simulation study reported later; we will also show that the working correlation structure has little effect on the estimates. In fact, \(r_{i}\) in \(U_{2}(\gamma )\) and \(\varepsilon _{i}^{2}\) in \(U_{3}(\lambda )\) play a role similar to that of \(y_{i}\) in \(U_{1}(\beta )\), and they can be viewed as working responses. Hence the ideas behind Eqs. (6) and (7) agree with that behind Eq. (5), which underlines the importance of estimating the covariance matrix.
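The sandwich working covariance \(\widetilde{\Sigma }_i=B_i^{1/2}R_i(\delta )B_i^{1/2}\) with \(B_i=2\text{ diag}\{\sigma ^4_{i1},\dots ,\sigma ^4_{in_i}\}\) can be sketched directly; the two typical structures for \(R_i(\delta )\) mentioned above are implemented below:

```python
import numpy as np

def corr_ar1(n, delta):
    """AR(1) working correlation: R[j, k] = delta ** |j - k|."""
    idx = np.arange(n)
    return delta ** np.abs(np.subtract.outer(idx, idx))

def corr_cs(n, delta):
    """Compound-symmetry working correlation: delta off the diagonal."""
    return np.full((n, n), delta) + (1.0 - delta) * np.eye(n)

def working_sigma_tilde(sigma2, delta, structure="ar1"):
    """Sandwich working covariance for the squared residuals:
    Sigma_tilde = B^(1/2) R(delta) B^(1/2), with B = 2 * diag(sigma^4)."""
    n = len(sigma2)
    R = corr_ar1(n, delta) if structure == "ar1" else corr_cs(n, delta)
    b_half = np.sqrt(2.0) * np.asarray(sigma2)   # square root of 2 * sigma^4
    return R * np.outer(b_half, b_half)

sig2 = np.array([1.0, 1.5, 2.0])                 # illustrative variances
St = working_sigma_tilde(sig2, delta=0.2)
```

The diagonal of the result is \(2\sigma ^4_{ij}\) regardless of the structure chosen, consistent with the observation that \(\delta \) mainly affects the off-diagonal entries.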

### 2.4 Huber’s score function \(\psi \) and weights \(w_{ij}\)

At the core of the estimating equations, \(\psi ^{\beta }(\mu _i)=\psi (A_i^{-1/2}(y_i-\mu _i)),\,\psi ^{\gamma }(\hat{r}_i)= \psi (D^{-1/2}_i(r_i-\hat{r}_i))\) and \(\psi ^{\lambda }(\sigma ^2_i)=\psi (\widetilde{A}_i^{-1/2}(\epsilon ^2_i-\sigma ^2_i))\). The function \(\psi (\cdot )\) is chosen to limit the influence of outliers in the response variable; a common choice is Huber’s score function \(\psi _{c}(x)=\min \{ c,\max \{ -c,x\} \}\) for some constant \(c\), normally chosen between 1 and 2.

Huber’s score function is the most widely used bounded score, truncating large Pearson residuals symmetrically, which ensures the asymptotic normality of the estimator. The tuning constant \(c\) balances robustness against asymptotic efficiency. In practice, \(c=1.345,\,c=1.5\) or \(c=2\) can be used depending on the seriousness of the contamination in a data set. We ran simulations with different values of \(c\) and found that the choice of \(c\) is not critical to obtaining a good robust estimate. In this article, we use \(c=2\) in our implementation, which suffices to demonstrate the efficiency gains from adopting robust estimating equations.
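A minimal implementation of Huber’s score \(\psi _{c}(x)=\min \{ c,\max \{ -c,x\} \}\) is a clipping operation:

```python
import numpy as np

def huber_psi(x, c=2.0):
    """Huber's bounded score function: the identity on [-c, c],
    truncated at +/- c outside that interval."""
    return np.clip(x, -c, c)

# Large standardized residuals are truncated; moderate ones pass through.
r = np.array([-5.0, -1.0, 0.3, 1.8, 7.0])
psi_r = huber_psi(r)   # the +/-5 and +7 entries are clipped to +/-2
```

Setting \(c\) large recovers \(\psi (x)=x\), i.e. the non-robust estimating equations.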

To ensure Fisher consistency, we use \(C^{\beta }_{i}(\mu _{i})=E[\psi (A_{i}^{-1/2}(y_{i}-\mu _{i}))],\, C^{\gamma }_i(\hat{r}_i)=E[\psi (D^{-1/2}_i(r_i-\hat{r}_i))]\) and \(C^{\lambda }_i(\sigma ^2_i)=E[\psi (\widetilde{A}^{-1/2}_i(\epsilon ^2_i-\sigma ^2_i))]\). Under the assumption that the \(y_i\) are normally distributed, the three expectations depend only on the choice of the constant \(c\) in Huber’s score function. In what follows, we use \(C^{\beta }_i=0,\,C^{\gamma }_i=0\) and \(C^{\lambda }_i=-0.05\), calculated under the normality assumption.
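These correction constants can be checked numerically under normality. The Monte Carlo sketch below assumes the standardization \((\epsilon ^2-\sigma ^2)/\sqrt{2\sigma ^4}=(\chi ^2_1-1)/\sqrt{2}\) for the \(\lambda \) equation; it reproduces \(C^{\beta }=0\) by symmetry and a small negative \(C^{\lambda }\) (the exact value, \(-0.05\) in the paper, depends on the standardization matrix \(\widetilde{A}_i\) actually used):

```python
import numpy as np

def huber_psi(x, c=2.0):
    """Huber's bounded score psi_c."""
    return np.clip(x, -c, c)

rng = np.random.default_rng(0)
z = rng.standard_normal(2_000_000)

# Mean equation: the standardized residual is N(0,1), so symmetry gives C_beta ~ 0.
c_beta = huber_psi(z).mean()

# Innovation-variance equation: (chi2_1 - 1)/sqrt(2) is right-skewed, so the
# symmetric truncation leaves a small negative mean.
c_lambda = huber_psi((z ** 2 - 1.0) / np.sqrt(2.0)).mean()
```

Skipping these constants would bias the \(\lambda \) equation even under clean normal data, which is why the correction matters most for the innovation variances.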

In general, unless the true distribution is correctly specified, the expectations \(C_i\) used in the estimating equations to ensure Fisher consistency are not available. Qin and Zhu (2007) discussed the difficulty of calculating \(C_i\): it is easy for binary data, since \(y_{ij}\) only takes the values 0 and 1, but hard for data from other distributions, as the expectation involves intractable integrals. In such situations, numerical integration or approximation is required to obtain \(C_i\). Wang et al. (2005b) provided a bias correction method for robust estimating functions.

The matrices \(W_i^{\beta }=\text{ diag}(w_{i1}^{\beta },\ldots ,w_{in_{i}}^{\beta }) ,\,W_i^{\gamma }=\text{ diag}(w_{i1}^{\gamma },\ldots ,w_{in_{i}}^{\gamma })\) and \(W_i^{\lambda }=\text{ diag}(w_{i1}^{\lambda },\ldots ,w_{in_{i}}^{\lambda })\) are diagonal weighting matrices. Their diagonal entries \(w_{ij}\) can assign a different weight to each observation, rather than a single common weight to all observations from one subject.
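One common recipe for covariate-based Mallows-type weights, shown here as an illustrative choice and not necessarily the paper's exact form, downweights observations whose covariates lie far from the bulk in Mahalanobis distance; the cutoff `b0` and exponent `nu` are hypothetical tuning parameters:

```python
import numpy as np

def mallows_weights(X, b0=1.0, nu=1.0):
    """Illustrative Mallows-type weights w_j = min{1, (b0 / d_j^2)^(nu/2)},
    where d_j^2 is the Mahalanobis distance of covariate row j from a
    robust (median) center. Rows near the bulk get weight 1; leverage
    points get weights below 1."""
    X = np.atleast_2d(X).astype(float)
    diff = X - np.median(X, axis=0)                              # robust location
    S = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-8 * np.eye(X.shape[1])
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)  # Mahalanobis^2
    return np.minimum(1.0, (b0 / np.maximum(d2, 1e-12)) ** (nu / 2.0))

X = np.array([[0.10], [0.20], [0.15], [5.00]])   # last row is a leverage point
w = mallows_weights(X)
```

Because the weights depend only on the covariates, they bound the influence of leverage points regardless of the response values, complementing the score function's truncation of residuals.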

The three proposed robust estimating equations, set within the GEE framework and requiring no normality assumption, extend the method of Ye and Pan (2006): they resist contamination and downweigh potentially influential points. The modified Cholesky decomposition guarantees the positive definiteness of the covariance matrix; a different variance–correlation decomposition is implemented in Fan et al. (2007). Furthermore, the dimension of the parameter space of the covariance matrix is substantially reduced, which allows us to model the generalized autoregressive parameters and innovation variances jointly with the mean. Most importantly, we apply Mallows-type robust estimation to the mean and covariance jointly, which enjoys more thorough robustness than the single robust estimating equation of He et al. (2005).

### 2.5 Estimators of parameters

A quasi-Fisher scoring algorithm is applied to solve for \(\beta ,\,\gamma \) and \(\lambda \) iteratively. First we choose starting values for \(\beta ,\,\gamma \) and \(\lambda \). If we choose the special case of working independence \(R_{i}=I\), which corresponds to the convenient starting values \(\gamma ^{(0)}=0\) and \(\lambda ^{(0)}=0\), then (5) no longer depends on \(\gamma \) and \(\lambda \). Hence the initial estimate \(\beta ^{(0)}\) of \(\beta \) is taken as the solution to (5) in this special case, i.e. the robust GEE estimator under the working-independence covariance structure.

- Step 1:
Select an initial value \((\beta ^{(0)^{\prime }},\gamma ^{(0)^{\prime }},\lambda ^{(0)^{\prime }})^{\prime }\) and use model (3) to form \(\varPhi _{i}^{(0)}\) and \(D_{i}^{(0)}\). Then \(\varSigma _{i}^{(0)}\), the starting value of \(\varSigma _{i}\), is obtained.

- Step 2:
Use the weighted least squares estimators (8)–(10) to calculate the updates \(\beta ^{(1)},\,\gamma ^{(1)}\) and \(\lambda ^{(1)}\) of \(\beta ,\,\gamma \) and \(\lambda \), respectively.

- Step 3:
Replace \(\beta ^{(0)},\,\gamma ^{(0)}\) and \(\lambda ^{(0)}\) with the estimators \(\beta ^{(1)},\,\gamma ^{(1)}\) and \(\lambda ^{(1)}\), and repeat Steps 2–3 until convergence.
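The three steps above amount to a cyclic fixed-point iteration. A skeleton of the loop follows, with placeholder update callables standing in for the solutions of (8)–(10), which are not reproduced here:

```python
import numpy as np

def fisher_scoring_loop(update_beta, update_gamma, update_lam,
                        beta0, gamma0, lam0, tol=1e-6, max_iter=100):
    """Skeleton of the quasi-Fisher scoring iteration: cycle through the
    three parameter updates until the full parameter vector stabilizes.
    The update_* callables are placeholders for solving (8)-(10)."""
    beta, gamma, lam = map(np.asarray, (beta0, gamma0, lam0))
    for _ in range(max_iter):
        new_beta = update_beta(beta, gamma, lam)
        new_gamma = update_gamma(new_beta, gamma, lam)
        new_lam = update_lam(new_beta, new_gamma, lam)
        delta = max(np.max(np.abs(new_beta - beta)),
                    np.max(np.abs(new_gamma - gamma)),
                    np.max(np.abs(new_lam - lam)))
        beta, gamma, lam = new_beta, new_gamma, new_lam
        if delta < tol:          # all three blocks have stabilized
            break
    return beta, gamma, lam

# Toy contraction updates, purely to exercise the loop structure
b, g, l = fisher_scoring_loop(
    lambda b, g, l: 0.5 * b + 1.0,    # fixed point at 2
    lambda b, g, l: 0.5 * g + 0.5,    # fixed point at 1
    lambda b, g, l: 0.5 * l - 0.5,    # fixed point at -1
    np.zeros(1), np.zeros(1), np.zeros(1))
```

Each update uses the freshest values of the previously updated blocks, mirroring Steps 2–3 above.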

In simulations, the proposed robust method works well under various contaminations. When the sample size is moderate, the convergence difficulty of the non-robust algorithm lies mostly in the non-convergence of \(\hat{ \lambda }_m\), especially under serious contamination.

The robust method is expected to outperform the non-robust method substantially under serious contamination. However, since the non-robust method has difficulty producing reliable results under heavy contamination, we can only compare the two under mild contamination in the simulation studies.

### 2.6 Asymptotic properties and hypothesis testing

Following Ye and Pan (2006), we can obtain the following theorems.

**Theorem 1**

Suppose the generalized estimating equations have a unique root \(\hat{\theta }_m=(\hat{\beta }^{\prime }_m,\ \hat{\gamma }^{\prime }_m,\ \hat{\lambda }^{\prime }_m)^{\prime }\). Under the mild regularity conditions stated in the Appendix, this estimator is strongly consistent for the true value \(\theta _0=(\beta ^{\prime }_0,\ \gamma ^{\prime }_0,\ \lambda ^{\prime }_0)^{\prime }\); that is, \(\hat{\theta }_m\rightarrow \theta _0\) almost surely as \(m\rightarrow \infty \).

**Theorem 2**

The proofs are given in the Appendix.

Note that when the responses \(y_i\) are normally distributed, we have \(v^{kl}=0\ (k\ne l)\) and the asymptotic covariance matrix in Theorem 2 reduces to \(\{ \mathrm{diag}(v^{11},\ v^{22},\ v^{33})\}^{-1}\).

For hypothesis testing, within the framework of generalized estimating equations, the quasi-score test based on the derivative of the generalized estimating equations may be constructed. See Ye and Pan (2006) for details.

## 3 Simulation study

In this section, simulations including contaminated cases are conducted to assess the performance of the proposed robust method. Four estimation methods are considered: NR refers to the non-robust method of Ye and Pan (2006). HR\(_{m}\) denotes the half-robust method on the mean; that is, only the robust estimating equation (5) for the mean is adopted. In contrast, HR\(_{c}\) stands for the half-robust method on the covariance matrix only. The R (robust) method is our proposed method, which includes all three robust estimating equations. Note that the non-robust estimators of \(\beta ,\,\gamma \) and \(\lambda \) are defined through the same equations except that \(\psi (x)=x\) and \(W_i = I_i\), where \(I_i\) are \(n_i \times n_i\) identity matrices.

*Study 1*. The following Gaussian linear model is used:

The error term \((e_{i1},\dots ,e_{in_i})\) is generated from a multivariate normal distribution with mean 0 and covariance \(\varSigma _i\) satisfying \(T_i\varSigma _iT^{\prime }_i=D_i\), where \(T_i\) and \(D_i\) are described in Sect. 2.1 with \(z_{ijk}=(1,\ (t_{ij}-t_{ik}))^{\prime }\) and \(z_{ij}=x_{ij}\). Two specifications are considered: Case (1) \(\gamma =(0.2,\ 0.3)^{\prime },\,\lambda =(-0.5,\ 0.2)^{\prime }\) and Case (2) \(\gamma =(0.2,\ 0)^{\prime },\,\lambda =(-0.5,\ 0.2)^{\prime }\). The difference between these two cases lies in the choice of \(\gamma _2\).

Similar to the sampling scheme in Fan et al. (2007), the observation times are regularly scheduled but may be missing in practice; missingness at random is considered. More precisely, each subject has a set of scheduled time points \(\{ 0,1,\dots ,12\}\), in which each element (except time 0) has a \(20~\%\) probability of being missing. A uniform \([0,1]\) random variable is then added to each non-missing scheduled time. This results in irregular (off-grid) observation times \(t_{ij}\) per individual, which are then transformed onto the interval \([0,1]\).
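This sampling scheme can be sketched as follows; the code reflects one reading of the description (the U[0,1] jitter is applied to every retained time including time 0, and rescaling divides by 13), so those two conventions are assumptions:

```python
import numpy as np

def sample_times(rng, p_miss=0.2, horizon=12):
    """Scheduled times {0,...,12}; each nonzero time is missing with
    probability 0.2; a U[0,1] jitter moves retained times off the grid;
    the result is rescaled onto [0, 1]."""
    sched = np.arange(horizon + 1, dtype=float)
    keep = np.r_[True, rng.random(horizon) > p_miss]   # time 0 is never missing
    t = sched[keep] + rng.random(int(keep.sum()))      # off-grid jitter
    return t / (horizon + 1.0)                         # map onto [0, 1]

rng = np.random.default_rng(1)
t = sample_times(rng)
```

Since the scheduled times are one unit apart and the jitter is below one, the jitter never reorders the observations.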

Three contamination schemes are considered:

- C1: randomly choose \(2~\%\) of \(x_{ij}\) to be \(x_{ij}-3\);
- C2: randomly choose \(2~\%\) of \(y_{ij}\) to be \(y_{ij}+6\);
- C3: randomly choose \(2~\%\) of \(x_{ij}\) to be \(x_{ij}-3\) and \(2~\%\) of \(y_{ij}\) to be \(y_{ij}+6\).

Simulation results of bias and MSE for \(\beta ,\,\gamma \) and \(\lambda \) in Study 1 (Case 1):

| Method | NC Bias | NC MSE | C1 Bias | C1 MSE | C2 Bias | C2 MSE | C3 Bias | C3 MSE |
|---|---|---|---|---|---|---|---|---|
| \(\beta _0=0.5\) | | | | | | | | |
| NR | 0.001 | 0.0004 | 0.059 | 0.0044 | 0.117 | 0.0159 | 0.174 | 0.0332 |
| HR\(_m\) | 0.000 | 0.0004 | 0.038 | 0.0022 | 0.045 | 0.0035 | 0.096 | 0.0115 |
| HR\(_c\) | 0.000 | 0.0004 | 0.056 | 0.0041 | 0.117 | 0.0159 | 0.170 | 0.0318 |
| R | 0.000 | 0.0004 | 0.029 | 0.0017 | 0.029 | 0.0020 | 0.067 | 0.0066 |
| \(\beta _1=1\) | | | | | | | | |
| NR | 0.000 | 0.0001 | -0.041 | 0.0019 | 0.001 | 0.0003 | -0.042 | 0.0021 |
| HR\(_m\) | 0.000 | 0.0001 | -0.030 | 0.0011 | 0.001 | 0.0002 | -0.036 | 0.0016 |
| HR\(_c\) | 0.000 | 0.0001 | -0.045 | 0.0022 | 0.001 | 0.0004 | -0.043 | 0.0023 |
| R | 0.000 | 0.0001 | -0.032 | 0.0012 | 0.001 | 0.0002 | -0.035 | 0.0015 |
| \(\gamma _1=0.2\) | | | | | | | | |
| NR | 0.001 | 0.0003 | 0.026 | 0.0014 | 0.057 | 0.0049 | 0.059 | 0.0050 |
| HR\(_m\) | 0.001 | 0.0003 | 0.026 | 0.0014 | 0.058 | 0.0051 | 0.061 | 0.0052 |
| HR\(_c\) | 0.001 | 0.0003 | 0.025 | 0.0013 | 0.053 | 0.0046 | 0.057 | 0.0048 |
| R | 0.001 | 0.0003 | 0.025 | 0.0013 | 0.054 | 0.0047 | 0.059 | 0.0051 |
| \(\gamma _2=0.3\) | | | | | | | | |
| NR | -0.033 | 0.0033 | -0.102 | 0.0177 | -0.270 | 0.0881 | -0.295 | 0.1010 |
| HR\(_m\) | -0.036 | 0.0033 | -0.104 | 0.0180 | -0.278 | 0.0926 | -0.303 | 0.1056 |
| HR\(_c\) | -0.037 | 0.0033 | -0.097 | 0.0163 | -0.250 | 0.0796 | -0.284 | 0.0954 |
| R | -0.041 | 0.0032 | -0.100 | 0.0169 | -0.257 | 0.0840 | -0.298 | 0.1038 |
| \(\lambda _1=-0.5\) | | | | | | | | |
| NR | 0.000 | 0.0016 | 0.388 | 0.1532 | 0.958 | 0.9198 | 1.101 | 1.213 |
| HR\(_m\) | 0.000 | 0.0016 | 0.390 | 0.1541 | 0.961 | 0.9256 | 1.103 | 1.219 |
| HR\(_c\) | -0.002 | 0.0025 | 0.186 | 0.0371 | 0.364 | 0.1369 | 0.546 | 0.302 |
| R | -0.002 | 0.0025 | 0.181 | 0.0352 | 0.363 | 0.1360 | 0.548 | 0.305 |
| \(\lambda _2=0.2\) | | | | | | | | |
| NR | 0.003 | 0.0000 | -0.184 | 0.0346 | -0.116 | 0.0166 | -0.193 | 0.0389 |
| HR\(_m\) | 0.002 | 0.0004 | -0.186 | 0.0352 | -0.169 | 0.0167 | -0.193 | 0.0391 |
| HR\(_c\) | 0.004 | 0.0005 | -0.084 | 0.0078 | -0.039 | 0.0024 | -0.126 | 0.0172 |
| R | 0.004 | 0.0005 | -0.081 | 0.0072 | -0.036 | 0.0023 | -0.125 | 0.0169 |

First we examine the performance of the estimators of \(\beta \). For \(\beta _0\), in C1, C2 and C3 the half-robust method on the mean outperforms both the non-robust method and the half-robust method on the covariance, while the robust method performs better still and is therefore the best of the four. The pattern differs for \(\beta _1\): the robust method R and the half-robust method HR\(_m\) perform similarly, and both have much smaller MSEs than the non-robust method and the other half-robust method HR\(_c\), but no great difference can be detected between R and HR\(_m\) themselves.

Next we turn to the estimation of the parameters in the covariance matrix. All four methods show little difference in estimating \(\gamma \), because the corresponding covariates for \(\gamma \) contain only \(t\), which carries no contamination. This indicates, on the one hand, that the proposed robust method performs equally well when there is no contamination affecting \(\gamma \); on the other hand, the robust method offers no advantage in the absence of contamination. As for \(\lambda \), while the half-robust method for the mean performs as poorly as the non-robust method (in both bias and MSE), the robust estimators show a clear advantage, even slightly better than the half-robust method for the covariance. For both \(\lambda _1\) and \(\lambda _2\), the robust estimators have about half the bias and one quarter of the MSE of the non-robust and mean half-robust methods. Overall, the robust method compares favorably with the non-robust and half-robust methods in estimating the covariance matrix, and the better covariance estimate in turn yields a better estimate of the mean parameter \(\beta _0\). In summary, the proposed robust method generally outperforms both the half-robust and the non-robust methods under the various contaminations in the simulation study.

Mean squared errors for estimates using different \(\delta \)’s in Study 1 (\(\times 100\)):

| Parameter | Method | \(\delta =0\) | \(\delta =0.2\) | \(\delta =0.5\) | \(\delta =0.8\) |
|---|---|---|---|---|---|
| NC | | | | | |
| \(\beta _0\) | NR | 0.034 | 0.034 | 0.034 | 0.034 |
| | R | 0.036 | 0.037 | 0.037 | 0.037 |
| \(\beta _1\) | NR | 0.009 | 0.009 | 0.009 | 0.009 |
| | R | 0.012 | 0.012 | 0.012 | 0.012 |
| \(\gamma _1\) | NR | 0.036 | 0.036 | 0.036 | 0.036 |
| | R | 0.037 | 0.037 | 0.037 | 0.037 |
| \(\gamma _2\) | NR | 0.333 | 0.333 | 0.334 | 0.335 |
| | R | 0.338 | 0.338 | 0.339 | 0.340 |
| \(\lambda _1\) | NR | 0.169 | 0.168 | 0.183 | 0.306 |
| | R | 0.270 | 0.270 | 0.293 | 0.456 |
| \(\lambda _2\) | NR | 0.044 | 0.050 | 0.064 | 0.076 |
| | R | 0.052 | 0.057 | 0.075 | 0.089 |
| C3 | | | | | |
| \(\beta _0\) | NR | 3.14 | 3.14 | 3.14 | 3.16 |
| | R | 0.57 | 0.57 | 0.57 | 0.58 |
| \(\beta _1\) | NR | 0.21 | 0.21 | 0.21 | 0.21 |
| | R | 0.15 | 0.15 | 0.15 | 0.15 |
| \(\gamma _1\) | NR | 0.46 | 0.46 | 0.47 | 0.47 |
| | R | 0.46 | 0.46 | 0.46 | 0.46 |
| \(\gamma _2\) | NR | 9.71 | 9.71 | 9.79 | 9.87 |
| | R | 9.60 | 9.61 | 9.63 | 9.70 |
| \(\lambda _1\) | NR | 122 | 122 | 124 | 128 |
| | R | 29.7 | 29.9 | 30.9 | 33.1 |
| \(\lambda _2\) | NR | 3.68 | 3.64 | 3.62 | 3.61 |
| | R | 1.66 | 1.61 | 1.58 | 1.59 |

Entropy loss and quadratic loss in estimating \(\varSigma \) in Study 1:

| Method | NC | C1 | C2 | C3 |
|---|---|---|---|---|
| Entropy loss | | | | |
| NR | 0.05 | 0.81 | 2.51 | 3.40 |
| HR\(_m\) | 0.05 | 1.09 | 2.54 | 3.76 |
| HR\(_c\) | 0.06 | 0.27 | 0.77 | 1.38 |
| R | 0.06 | 0.29 | 0.76 | 1.41 |
| Quadratic loss | | | | |
| NR | 0.26 | 5.32 | 25.1 | 29.9 |
| HR\(_m\) | 0.26 | 6.16 | 25.1 | 31.1 |
| HR\(_c\) | 0.42 | 1.52 | 5.52 | 10.2 |
| R | 0.42 | 1.55 | 5.50 | 10.2 |

It should be noted that the results in Table 3 are obtained from 200 replications with successful convergence in estimation. The robust method converged successfully in all situations, whereas the non-robust and half-robust methods failed to converge in a few percent of the simulations. The robust method is thus recommended, since the non-convergence problem should not be neglected. Moreover, the heavier the contamination, the poorer the convergence of the non-robust and half-robust methods; this is one of the reasons why we only compare the four methods under relatively mild contaminations.

*Study 2*. This study compares the proposed robust method with the non-robust method when the data come from non-normal distributions. The half-robust methods are omitted because they are outperformed by the robust method. The setting is similar to that of Study 1, except that the error terms \((e_{i1},\dots ,e_{in_i})\) are drawn from (a) a multivariate \(t\)-distribution with 3 degrees of freedom and covariance matrix \(\varSigma _i\), or (b) a mixture of multivariate normals, with 30 % coming from \(N(-0.7\times \mu _{mn}, \varSigma _i)\) and the other 70 % from \(N(0.3\times \mu _{mn}, \varSigma _i)\), where \(\mu _{mn}\) will be specified later. Note that the error terms drawn from (b) have an asymmetric distribution. Only the case of no contamination and the contamination case C3 are considered.
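The two error distributions can be generated as sketched below. For the multivariate \(t_3\), the scale matrix is \(\varSigma _i(\nu -2)/\nu \) so that the covariance equals \(\varSigma _i\); the normal mixture has mean zero because \(0.3\times (-0.7)+0.7\times 0.3=0\) while remaining asymmetric:

```python
import numpy as np

def mvt_errors(rng, sigma, df=3, size=1):
    """Multivariate t_df errors with covariance matrix sigma: a Gaussian
    with scale sigma*(df-2)/df divided by sqrt(chi2_df / df)."""
    scale = sigma * (df - 2) / df
    z = rng.multivariate_normal(np.zeros(len(sigma)), scale, size=size)
    g = rng.chisquare(df, size=size) / df
    return z / np.sqrt(g)[:, None]

def mixnormal_errors(rng, sigma, mu, size=1):
    """Asymmetric zero-mean mixture: with probability 0.3 shift by -0.7*mu,
    otherwise by 0.3*mu, then add N(0, sigma) noise."""
    n = len(sigma)
    pick = rng.random(size) < 0.3
    shift = np.where(pick[:, None], -0.7, 0.3) * mu
    return shift + rng.multivariate_normal(np.zeros(n), sigma, size=size)

rng = np.random.default_rng(0)
S = np.eye(2)
e = mvt_errors(rng, S, size=1000)
m = mixnormal_errors(rng, S, np.array([1.0, 1.0]), size=200000)
```

Scaling \(\mu _{mn}\) (the MN0.5, MN1, MN2 columns in the tables below correspond to different magnitudes) controls how asymmetric the mixture is.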

Mean squared errors for \(\beta _0\) and \(\beta _1\) in Study 2 (\(\times 100\)):

| Parameter | Method | Normal | t(3) | MN0.5 | MN1 | MN2 |
|---|---|---|---|---|---|---|
| NC (no contamination) | | | | | | |
| \(\beta _0\) | NR | 0.035 | 0.036 | 0.087 | 0.277 | 1.008 |
| | R | 0.036 | 0.040 | 0.094 | 0.300 | 1.181 |
| \(\beta _1\) | NR | 0.009 | 0.010 | 0.010 | 0.013 | 0.015 |
| | R | 0.012 | 0.012 | 0.013 | 0.014 | 0.016 |
| C3 (contamination 3) | | | | | | |
| \(\beta _0\) | NR | 3.33 | 3.33 | 3.43 | 3.68 | 4.50 |
| | R | 0.66 | 0.66 | 0.76 | 1.09 | 2.21 |
| \(\beta _1\) | NR | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 |
| | R | 0.15 | 0.15 | 0.13 | 0.14 | 0.15 |

Entropy loss and quadratic loss in estimating \(\varSigma \) in Study 2:

| Loss | Method | Normal | t(3) | MN0.5 | MN1 | MN2 |
|---|---|---|---|---|---|---|
| NC (no contamination) | | | | | | |
| Entropy loss | NR | 0.035 | 0.039 | 0.265 | 1.351 | 3.476 |
| | R | 0.044 | 0.049 | 0.251 | 1.316 | 3.413 |
| Quadratic loss | NR | 0.216 | 0.216 | 1.002 | 3.452 | 7.976 |
| | R | 0.359 | 0.365 | 0.660 | 2.374 | 5.730 |
| C3 (contamination 3) | | | | | | |
| Entropy loss | NR | 5.229 | 5.220 | 5.490 | 6.053 | 7.320 |
| | R | 2.178 | 2.164 | 2.523 | 3.357 | 5.067 |
| Quadratic loss | NR | 43.43 | 42.44 | 42.51 | 41.57 | 40.01 |
| | R | 13.70 | 13.61 | 14.42 | 15.31 | 17.51 |

Furthermore, it is interesting that, under asymmetric error distributions, the robust covariance matrix estimator attains even smaller entropy (and quadratic) losses than the non-robust estimator in the absence of contamination (Table 5). This supports the view that the robust method cultivates a better estimate of the covariance matrix, which can be seriously affected by outliers, non-normal errors or misspecification of the underlying distribution. In summary, Study 2 demonstrates that the proposed robust method accommodates the effect of outliers and improves the efficiency of parameter estimation under non-normal or asymmetric distributions.

## 4 Real data analysis

Regression coefficient estimates and standard deviations (in parentheses) for AGE and BMI of the hormone data:

| | GEE | NR | R |
|---|---|---|---|
| Intercept | \(0.87_{(0.13)}\) | \(0.80_{(0.14)}\) | |
| AGE | \(2.05_{(1.96)}\) | \(2.48_{(2.16)}\) | \(3.95_{(2.05)}\) |
| BMI | \(-1.72_{(2.26)}\) | \(-1.71_{(2.92)}\) | \(-0.42_{(3.49)}\) |

The weights \(w_{ij}\) in our robust method are calculated from \(p_{ij}=(\text{ AGE}_i, \text{ BMI}_i)\). The most heavily downweighted points come from subject 18 (the cluster of cases 244–263), with \(w_{ij}=0.459\); a closer inspection of the data shows that subject 18 has an extremely high BMI of 38. To look further into robustness, we consider the standardized residual \(s_{ij}\), the \(j\)th component of \(\hat{\varSigma }_i^{-1/2}(y_i-\hat{\mu }_i)\). Case 10 is the most extreme point, with \(s_{ij}=-4.58\). The progesterone level of the 10th observation of subject 1 (case 10) is 2.46, very different from its neighboring observations 9 and 11, measured one day before and one day after, whose progesterone levels are 12.8 and 13.4 respectively. In fact, the other 13 observations on subject 1 range from 8.5 to 13.4, and this observation is the lowest progesterone level in the whole data set. We therefore conclude that case 10 is a clear outlier within subject 1, which is consistent with Fung et al. (2002). When the sample size is moderate, a subject-level potential outlier can have substantial influence on estimation and inference. Subject 24 is a potential outlier: the mean of its standardized residuals is 2.66, with \(s_{ij}\) of cases 337–346 ranging from 2.09 to 3.80. Subject 24 turns out to be a young woman with a very low BMI and the highest average progesterone level. In summary, the robust method substantially downweighs the effect of both subject-level and observation-level outliers. This is the main reason that the robust method shifts the estimated coefficients in the mean model relative to its non-robust version, and we believe the robust estimates are more reliable. Nevertheless, the large standard errors of the estimates suggest that a much larger sample is needed for any concrete finding.
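The standardized residuals used above can be computed via a symmetric inverse square root of \(\hat{\varSigma }_i\); the numbers below are illustrative (mimicking the dip of case 10), not the actual hormone data:

```python
import numpy as np

def standardized_residuals(y, mu, sigma):
    """Standardized residual vector Sigma^(-1/2) (y - mu), computed via
    the symmetric inverse square root from an eigendecomposition."""
    vals, vecs = np.linalg.eigh(sigma)
    inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return inv_half @ (y - mu)

y = np.array([12.8, 2.46, 13.4])      # a sharp dip like the one described above
mu = np.full(3, 11.0)                  # hypothetical fitted means
sigma = 4.0 * np.eye(3) + 1.0          # toy covariance: diagonal 5, off-diagonal 1
s = standardized_residuals(y, mu, sigma)
```

The middle residual stands out far below \(-2\), which is the kind of signal used to flag case 10 as an observation-level outlier.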

## 5 Discussion

In this paper, we propose a simultaneous robust model for the mean and covariance matrix of longitudinal data. The proposed method has the following advantages and properties: (i) the robust covariance model guarantees positive definiteness through a covariance decomposition with a proper statistical interpretation; (ii) it controls the influence of outliers in the mean and covariance models simultaneously, which cultivates a more reliable estimation of the joint mean and covariance model; (iii) the robust algorithm has a much greater chance of converging than the non-robust algorithm. The robust estimating equations proposed here should advance the development of joint mean and covariance models for longitudinal data.

A limitation of the proposed method is that the model may include redundant covariates. If we have no prior knowledge of the covariance structure, then we are prone to include all the time and mean associated variables. Redundant covariates may bring in outliers and also increase the computational burden. Robust estimating equations that can serve the goal of both estimating and penalizing the models with too many covariates are under development.

## Acknowledgments

The authors are grateful to the reviewers, the Associate Editor, and the Co-Editor for their insightful comments and suggestions which have improved the manuscript significantly.