Throughout the Appendix “CMT” and “CLT” denote Continuous Mapping Theorem and Central Limit Theorem, respectively.
Proof of Theorem 1.
The consistency of any of the three estimators follows by the identification condition GEE1(i) and the uniform convergence of \(\hat{Q}_{\hat{W}}\left (\theta,\hat{h}\right )\), which follows by GEE2(i)–(ii). The asymptotic normality of \(\hat{\theta }_{W}\) and \(\hat{\theta }_{\hat{W}}\) follows by a standard mean value expansion of the first-order conditions
$$\displaystyle\begin{array}{rcl} 0& =\hat{ G}\left (\hat{\theta },\hat{h}\right )^{{\prime}}W\hat{g}\left (\hat{\theta },\hat{h}\right ),& {}\\ 0& =\hat{ G}\left (\hat{\theta },\hat{h}\right )^{{\prime}}\hat{W}\hat{g}\left (\hat{\theta },\hat{h}\right ),& {}\\ \end{array}$$
which hold with probability approaching 1 by GEE1(ii). We consider
\(\hat{\theta }_{\hat{W}}\) and note that
$$\displaystyle{0 =\hat{ G}\left (\theta _{{\ast}},\hat{h}\right )^{{\prime}}\hat{W}n^{1/2}\hat{g}\left (\theta _{ {\ast}},\hat{h}\right ) +\hat{ G}\left (\hat{\theta },\hat{h}\right )^{{\prime}}\hat{W}\hat{G}\left (\overline{\theta },\hat{h}\right )n^{1/2}\left (\hat{\theta }-\theta _{ {\ast}}\right ),}$$
hence by GEE2(i)–(ii), GEE3(iii)–(iv), CMT and some algebra it follows that
$$\displaystyle\begin{array}{rcl} & n^{1/2}\left (\hat{\theta }_{\hat{W}} -\theta _{{\ast}}\right ) = -\left (I - P\left (\theta _{{\ast}},h_{0}\right )^{-1}\left (\mu _{{\ast}}^{{\prime}}W \otimes I_{k}\right )E\left [\partial \mathit{vec}\left (G\left (\theta _{{\ast}},h_{0}\right )\right )/\partial \theta ^{{\prime}}\right ]\right )^{-1}& \\ & \left \{E\left [G\left (\theta _{{\ast}},h_{0}\right )^{{\prime}}\right ]WE\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]\right \}^{-1}n^{1/2}\left \{\hat{G}\left (\theta _{{\ast}},h_{0}\right )^{{\prime}}W\left (\hat{g}\left (\theta _{{\ast}},h_{0}\right ) -\mu _{{\ast}}\right )\right. & \\ & +\left.\overline{G}\left (\theta _{{\ast}},h_{0}\right )^{{\prime}}W\mu _{{\ast}} + E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]^{{\prime}}\left (\hat{W} - W\right )\mu _{{\ast}}\right \} + o_{p}\left (1\right ), &{}\end{array}$$
(5.5)
where
\(P\left (\theta,h\right ) = \left \{E\left [G\left (\theta,h\right )^{{\prime}}\right ]WE\left [G\left (\theta,h\right )\right ]\right \}\) and
\(\overline{G}\left (\theta,h\right ) =\hat{ G}\left (\theta,h\right ) - E\left [G\left (\theta,h\right )\right ]\). Note that
$$\displaystyle\begin{array}{rcl} & \left (I - P\left (\theta _{{\ast}},h_{0}\right )^{-1}\left (\mu _{{\ast}}^{{\prime}}W \otimes I_{k}\right )E\left [\partial \mathit{vec}\left (G\left (\theta _{{\ast}},h_{0}\right )\right )/\partial \theta ^{{\prime}}\right ]\right )^{-1}\left \{E\left [G\left (\theta _{{\ast}},h_{0}\right )^{{\prime}}\right ]WE\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]\right \}^{-1}& {}\\ & = K\left (\theta _{{\ast}},h_{0},W\right )^{-1}, & {}\\ \end{array}$$
and that, by the CLT,
$$\displaystyle{ n^{1/2}\left [\begin{array}{c} \hat{g}\left (\theta _{{\ast}},h_{0}\right ) -\mu _{{\ast}} \\ \overline{G}\left (\theta _{{\ast}},h_{0}\right )^{{\prime}}W\mu _{{\ast}} \\ \left (\hat{W} - W\right )\mu _{{\ast}} \end{array} \right ]\mathop{ \rightarrow }\limits^{ d}N\left (0,\left [\begin{array}{ccc} \varXi _{11}\left (\theta _{{\ast}},h_{0},W\right ) & \varXi _{12}\left (\theta _{{\ast}},h_{0},W\right ) &\varXi _{13}\left (\theta _{{\ast}},h_{0},W\right ) \\ \varXi _{12}\left (\theta _{{\ast}},h_{0},W\right )^{{\prime}}&\varXi _{22}\left (\theta _{{\ast}},h_{0},W\right ) &\varXi _{23}\left (\theta _{{\ast}},h_{0},W\right ) \\ \varXi _{13}\left (\theta _{{\ast}},h_{0},W\right )^{{\prime}}&\varXi _{23}\left (\theta _{{\ast}},h_{0},W\right )^{{\prime}}&\varXi _{33}\left (\theta _{{\ast}},h_{0},W\right ) \end{array} \right ]\right ), }$$
(5.6)
so that the conclusion follows by CMT and some algebra. For \(\hat{\theta }_{W}\) the conclusion follows on noting that \(\varXi _{j3}\left (\theta _{{\ast}},h_{0},W\right )\left (j = 1,2\right )\) and \(\varSigma _{W}\) are all 0. Finally we consider the iterated semiparametric GEE estimator \(\hat{\theta }_{p}\). We first consider the second-step estimator \(\hat{\theta }_{2}\) based on either \(\hat{\theta }_{W}\) or \(\hat{\theta }_{\hat{W}}\) as the preliminary (first-step) consistent estimator. Assume that the weight matrix \(\hat{W}\) is given by \(\hat{\varXi }_{11}\left (\hat{\theta }_{W},\hat{h},W\right )^{-1}\) or \(\hat{\varXi }_{11}\left (\hat{\theta }_{\hat{W}},\hat{h},\hat{W}\right )^{-1}\); the same argument as that used to obtain (5.5) shows that \(n^{1/2}\left (\hat{\theta }_{2} -\theta _{{\ast}}\right )\) has the same influence function as that given in (5.5) (with \(\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\) replacing \(W\)), except for the last term, which in this case requires a further expansion. First note that
$$\displaystyle\begin{array}{rcl} & & n^{1/2}\left (\hat{\varXi }_{ 11}\left (\hat{\theta }_{\times },\hat{h},W\right )^{-1} -\varXi _{ 11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )\mu _{ {\ast}} {}\\ & & = n^{1/2}\left (\mu _{ {\ast}}^{{\prime}}\otimes \hat{\varXi }_{ 11}\left (\hat{\theta }_{\times },\hat{h},W\right )^{-1}\right )\left (\varXi _{ 11}\left (\theta _{{\ast}},h_{0},W\right )^{-1} \otimes I_{l}\right ) {}\\ & & \quad \times \mathit{vec}\left (\hat{\varXi }_{11}\left (\hat{\theta }_{\times },\hat{h},W\right ) -\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )\right ), {}\\ \end{array}$$
and by GEE(ii) (with
\(\hat{W} =\hat{\varXi } _{11}\left (\hat{\theta }_{\times },\hat{h},W\right )^{-1}\)), the triangle inequality and a mean value expansion we have
$$\displaystyle\begin{array}{rcl} & \left (\mu _{{\ast}}^{{\prime}}\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1} \otimes \varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )n^{1/2}\Bigg\{\mathit{vec}\left (\hat{\varXi }_{11}\left (\theta _{{\ast}},\hat{h},W\right ) -\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )\right )& \\ & \quad + \frac{\partial \mathit{vec}\hat{\varXi }_{11}\left (\overline{\theta }_{\times },\hat{h},W\right )} {\partial \theta ^{{\prime}}} \left (\hat{\theta }_{\times }-\theta _{{\ast}}\right )\Bigg\} + o_{p}\left (1\right ), &{}\end{array}$$
(5.7)
which shows that the asymptotic distribution of
\(n^{1/2}\left (\hat{\theta }_{2} -\theta _{{\ast}}\right )\) crucially depends also on that of
\(n^{1/2}\left (\hat{\theta }_{\times }-\theta _{{\ast}}\right )\), as we now illustrate for
\(\hat{\theta }_{\times } =\hat{\theta } _{I}\). Let
$$\displaystyle\begin{array}{rcl} L\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )& =\mu _{ {\ast}}^{{\prime}}\varXi _{ 11}\left (\theta _{{\ast}},h_{0},W\right )^{-1} \otimes \varXi _{ 11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}& {}\\ & \quad \times \left [I,E\left (\frac{\partial \mathit{vec}\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )} {\partial \theta ^{{\prime}}} \right )\right ] & {}\\ \end{array}$$
and
$$\displaystyle{\hat{S}\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right ),\theta _{I}\right ) = \left [\begin{array}{c} \mathit{vec}\left (\hat{\varXi }_{11}\left (\hat{\theta }_{I},\hat{h},W\right ) -\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )\right ) \\ K\left (\theta _{{\ast}},h_{0},I\right )^{-1}\left \{G\left (\theta _{ {\ast}},h_{0}\right )^{{\prime}}\left (\hat{g}\left (\theta _{ {\ast}},h_{0}\right ) -\mu _{{\ast}}\right ) + \overline{G}\left (\theta _{{\ast}},h_{0}\right )^{{\prime}}\mu _{ {\ast}}\right \} \end{array} \right ]}$$
so that
$$\displaystyle\begin{array}{rcl} n^{1/2}\left (\hat{\varXi }_{ 11}\left (\hat{\theta }_{I},\hat{h},W\right )^{-1}-\varXi _{ 11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )\mu _{ {\ast}}& = L\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right ) & {}\\ & \quad \times n^{1/2}\hat{S}\!\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right ),\theta _{I}\!\right ).& {}\\ \end{array}$$
Then
$$\displaystyle\begin{array}{rcl} & \varXi _{13}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right ) = n\mathit{Cov}\left \{\hat{g}\left (\theta _{{\ast}},h_{0}\right ) -\mu _{{\ast}},\right. & {}\\ & \left.\hat{S}\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right ),\theta _{I}\right )^{{\prime}}L\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )^{{\prime}}\right \}, & {}\\ & \varXi _{23}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right ) = n\mathit{Cov}\left \{\left (\hat{G}\left (\theta _{{\ast}},h_{0}\right ) - E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]\right )^{{\prime}}\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\mu _{{\ast}},\right. & {}\\ & \left.\hat{S}\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right ),\theta _{I}\right )^{{\prime}}L\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )^{{\prime}}\right \}, & {}\\ & \varXi _{33}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right ) = n\mathit{Var}\left \{L\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )\hat{S}\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right ),\theta _{I}\right )\right \},& {}\\ \end{array}$$
and
$$\displaystyle\begin{array}{rcl} n^{1/2}\left (\hat{\theta }_{ 2} -\theta _{{\ast}}\right )& \mathop{\rightarrow }\limits^{ d}N\left (0,K\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )^{-1}\varPsi _{ \left (2\right )}\left (\theta _{{\ast}},h_{0},W,\theta _{I}\right )\right.& {}\\ & \left.K\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\right )^{-1}\right )^{-1}\right ), & {}\\ \end{array}$$
where
$$\displaystyle\begin{array}{rcl} & \varPsi _{\left (2\right )}\left (\theta _{{\ast}},h_{0},W,\theta _{I}\right ) =\varPsi _{\left (0\right )}\left (\theta _{{\ast}},h_{0},\varXi _{11}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )^{-1}\right ) & {}\\ & \quad + E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]^{{\prime}}\varXi _{33}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ] & {}\\ & \quad + E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]^{{\prime}}\varXi _{11}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )^{-1}\varXi _{13}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ] & {}\\ & \quad + E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]^{{\prime}}\varXi _{13}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )^{{\prime}}\varXi _{11}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )^{-1}E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]& {}\\ & \quad +\varXi _{23}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right )E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ] + E\left [G\left (\theta _{{\ast}},h_{0}\right )\right ]^{{\prime}}\varXi _{23}\left (\theta _{{\ast}},h_{0},W\left (\theta _{I}\right )\right ).& {}\\ \end{array}$$
The asymptotic distribution of the \(\left (p + 1\right )\)th iterated GEE estimator can be computed by a recursive argument.
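The recursion just described, solving the weighted first-order conditions and then re-estimating \(\hat{\varXi }_{11}^{-1}\) at the new estimate, can be sketched numerically. The following Python fragment is only an illustration under a hypothetical linear moment model \(g_{i}\left (\theta \right ) = z_{i}\left (y_{i} - x_{i}^{{\prime}}\theta \right )\) with the nonparametric component \(\hat{h}\) suppressed; all variable names are invented for the sketch and do not come from the text.

```python
import numpy as np

# Illustrative iterated GEE/GMM sketch for a hypothetical linear moment model
# g_i(theta) = z_i * (y_i - x_i' theta); the nonparametric h is suppressed.
rng = np.random.default_rng(0)
n, k, l = 500, 2, 3
Z = rng.normal(size=(n, l))                    # l instruments
X = Z[:, :k] + 0.3 * rng.normal(size=(n, k))   # k regressors correlated with Z
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + rng.normal(size=n)

def solve_step(W):
    """Solve G' W g(theta) = 0; for the linear model g(theta) = gy + G theta."""
    G = -(Z.T @ X) / n                         # sample Jacobian of g
    gy = (Z.T @ y) / n
    return np.linalg.solve(G.T @ W @ G, -(G.T @ W @ gy))

def weight(theta):
    """Re-estimate Xi_11 = Var(g_i) at theta and return its inverse."""
    u = y - X @ theta
    g = Z * u[:, None]                         # n x l moment contributions
    return np.linalg.inv((g.T @ g) / n)

theta = solve_step(np.eye(l))                  # first step: identity weight (theta_I)
for _ in range(3):                             # second and further iterates
    theta = solve_step(weight(theta))
```

Each pass reproduces the structure of the proof: a first-order-condition solve at the current weight, followed by an update of the estimated optimal weight matrix, so that the influence function of each iterate depends on that of the previous one.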
Proof of Theorem 2.
The consistency of \(\hat{\theta }\) follows as in the proof of Theorem 1 by CMD1(i) and CMD2(i)–(ii). The first-order conditions for \(\hat{\theta }\) are
$$\displaystyle{M\left (\hat{\theta },\hat{h}\right )^{{\prime}}\hat{\varOmega }\left (\hat{\theta }_{ p},\hat{h}\right )^{-1}\left (\hat{\pi }-m\left (\hat{\theta },\hat{h}\right )\right ) = 0,}$$
which hold with probability approaching 1 by CMD1(ii). A standard mean value expansion of \(m\left (\hat{\theta },\hat{h}\right )\) about \(\theta _{0}\) then gives
$$\displaystyle{ 0 = M\left (\hat{\theta },\hat{h}\right )^{{\prime}}\hat{\varOmega }\left (\hat{\theta }_{p},\hat{h}\right )^{-1}n^{1/2}\left (\hat{\pi }-m\left (\theta _{ 0},\hat{h}\right ) - M\left (\overline{\theta },\hat{h}\right )\left (\hat{\theta }-\theta _{0}\right )\right ) + o_{p}\left (1\right ) }$$
(5.8)
and, by CMD3(i)–(ii), CMD1(i), CMD(iv) and the CLT,
$$\displaystyle{n^{1/2}\left (\hat{\pi }-m\left (\theta _{ 0},\hat{h}\right )\right )\mathop{ \rightarrow }\limits^{ d}N\left (0,\varOmega \left (\theta _{0},h_{0}\right )\right ).}$$
The conclusion follows by CMD3(iii)–(iv), (5.8), and CMT.
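The minimum-distance step of this proof can likewise be sketched numerically. The fragment below is only an illustration: it assumes a hypothetical linear link \(m\left (\theta \right ) = R\theta \), treats the reduced-form estimate \(\hat{\pi }\) and its variance \(\varOmega \) as given, and again suppresses the nonparametric component \(\hat{h}\); the names \(R\), \(\hat{\pi }\) and \(\varOmega \) are stand-ins for \(M\), the reduced-form estimator and its asymptotic variance.

```python
import numpy as np

# Minimum-distance sketch: minimize (pi_hat - R theta)' Omega^{-1} (pi_hat - R theta)
# under a hypothetical linear link m(theta) = R theta; h is suppressed.
rng = np.random.default_rng(1)
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                     # Jacobian M of the linear link
theta0 = np.array([0.7, -0.2])
Omega = np.array([[0.05, 0.01, 0.00],
                  [0.01, 0.04, 0.00],
                  [0.00, 0.00, 0.06]])
n = 400
# simulate a reduced-form estimate pi_hat ~ N(R theta0, Omega / n)
pi_hat = R @ theta0 + np.linalg.cholesky(Omega / n) @ rng.normal(size=3)

W = np.linalg.inv(Omega)
# first-order conditions M' Omega^{-1} (pi_hat - m(theta)) = 0
# give theta_hat = (R' W R)^{-1} R' W pi_hat in closed form
theta_hat = np.linalg.solve(R.T @ W @ R, R.T @ W @ pi_hat)
avar = np.linalg.inv(R.T @ W @ R)              # avar of n^{1/2}(theta_hat - theta0)
```

With the efficient weight \(\varOmega ^{-1}\), the sandwich collapses to \(\left (M^{{\prime}}\varOmega ^{-1}M\right )^{-1}\), which is what the closed-form variance above computes.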