Semiparametric regression based on quadratic inference function for multivariate failure time data with auxiliary information

This paper develops a statistical inference procedure for multivariate failure time data when the primary covariate can be measured only on a subset of the full cohort but auxiliary information is available for all subjects. To improve efficiency, we use the quadratic inference function approach to incorporate the intra-cluster correlation and a kernel smoothing technique to further utilize the auxiliary information. The proposed method is more efficient than methods that ignore the intra-cluster correlation and the auxiliary information, and it is easy to implement. In addition, we develop a chi-squared test for hypotheses about the hazard ratio parameters. We evaluate the finite-sample performance of the proposed procedure via extensive simulation studies and illustrate the approach by analyzing a real data set from a study of left ventricular dysfunction.


Introduction
This paper develops an improved inference procedure for multivariate failure time data with auxiliary information. Large cohort studies often involve thousands of subjects and, especially when the outcomes are failure times, may last many years. The primary covariate can often be measured only for a random subset of the study cohort because of technical difficulties or financial limitations. On the other hand, auxiliary information that is less precise but highly correlated with the primary exposure can be collected cheaply for all cohort members. The auxiliary information could be a mismeasured surrogate of the true covariate, or any covariate that is informative about it. An example comes from the left ventricular dysfunction (SOLVD 1991) prevention study, which assessed the effects of risk factors on the (possibly censored) times to heart failure and to the first myocardial infarction. One of the most important risk factors is the patient's ejection fraction (EF), which can be measured precisely with a standardized radionuclide technique, but at high cost. EF was therefore measured only on a randomly chosen subset of the cohort, while a less precise but cheaper measurement of EF was ascertained for all patients using a nonstandardized technique. Because each patient could experience both heart failure and a first myocardial infarction, statistical methods that handle multivariate failure time data with covariate measurement error are required.
Proper use of auxiliary information has been shown to improve the efficiency of survival estimates for multivariate failure time data. For example, Hu and Lin (2004) proposed a corrected estimating function under the assumption that the measurement error is symmetrically distributed. Liu et al. (2009) and Liu et al. (2010) developed estimated pseudo-partial likelihood methods for multivariate failure time data with discrete and continuous auxiliary variables, respectively. Liu et al. (2012) and Fan and Wang (2009) studied this problem under the assumption that subjects within a cluster share a common baseline hazard. These studies are all based on the marginal hazards model; the intra-cluster correlation, however, is ignored in the estimation procedure and only adjusted for in the inference step through a robust sandwich variance estimate. Ignoring the intra-cluster correlation results in some loss of efficiency.
Some authors have proposed incorporating the correlation explicitly into the estimating equations to improve estimation efficiency for multivariate failure time data with fully observed covariates. For example, Cai and Prentice (1995, 1997) added a weight matrix, based on the inverse of the correlation matrix of the marginal martingales, to the partial likelihood score equation. Simulation studies have shown that their approach is more efficient than the one using an independence structure when the cluster size is small. However, their method is computationally intensive when the cluster size is large, because it involves estimating a very high-dimensional weighting correlation matrix. To overcome this shortcoming, Xue et al. (2010) developed a different approach by applying the method of quadratic inference functions (QIF). Their method avoids explicitly estimating the correlation parameters and is easy to implement, especially when the cluster size is large. We note that both methods assume the covariates are fully observed and therefore cannot be applied directly to the SOLVD data.
Motivated by the advantages of the QIF method, we extend it to the analysis of multivariate failure time data with auxiliary information. Here we assume that the auxiliary covariate is continuous. We propose an estimated QIF method and study the asymptotic properties of the proposed estimator. The proposed method inherits the merits of the QIF method: it avoids estimating nuisance correlation parameters and is computationally easy to implement. Under certain regularity conditions, we establish the asymptotic normality of the resulting estimator. Simulation studies show that the proposed method improves estimation efficiency compared with methods that ignore the dependence structure, such as that of Liu et al. (2010). In addition, we study the problem of hypothesis testing and propose a test statistic that has a chi-squared limiting distribution under the null hypothesis.
The rest of the article is organized as follows. In Sect. 2, we introduce the model and describe the proposed estimation procedure. In Sect. 3, the large-sample properties of the proposed estimator are presented. In Sect. 4, a chi-squared test is developed for hypothesis testing. In Sect. 5, the finite-sample performance of the proposed procedures is assessed through extensive simulation studies. We illustrate the proposed method through the analysis of a real data set from the SOLVD study in Sect. 6. Some concluding remarks are given in Sect. 7, and the technical proofs are provided in the "Appendix".

Preliminaries
Suppose that the whole cohort consists of n independent clusters, and each cluster contains K correlated failure types. Let (i, k) denote the kth (k = 1, ..., K) subject in the ith (i = 1, ..., n) cluster. Let T_ik and C_ik be the potential failure time and censoring time for subject (i, k). With censoring, one observes T̃_ik = min(T_ik, C_ik) and the failure indicator Δ_ik = I(T_ik ≤ C_ik), where I(·) is the indicator function. Let Z_ik(t) be a p-vector of possibly time-dependent covariates.
For subject (i, k), the hazard function λ_ik(t; Z_ik(t)) takes the following form:

λ_ik(t; Z_ik(t)) = λ_0k(t) exp{β^T Z_ik(t)}, (2.1)

where β is a p-vector of unknown regression parameters and λ_0k(t) is an unspecified marginal baseline hazard function pertaining to the kth failure type. Note that model (2.1) includes as a special case the failure-type-specific model (Wei et al. 1989; Greene and Cai 2004), which allows a different covariate effect for each k. This can be seen by defining the augmented covariate Z*_ik(t) = (0^T, ..., Z_ik(t)^T, ..., 0^T)^T, whose kth block equals the original covariate vector, together with the stacked parameter β = (β_1^T, ..., β_K^T)^T. For simplicity, we write λ_ik(t; Z_ik(t)) as λ_ik(t) in the following.
Let Λ_0k(t) = ∫_0^t λ_0k(u)du be the marginal cumulative baseline hazard function for the kth failure type. Let N_ik(t) = Δ_ik I(T̃_ik ≤ t) and Y_ik(t) = I(T̃_ik ≥ t) be the observed counting process and the at-risk indicator process. For convenience, write the relative risk function as r_ik(t; β) = exp{β^T Z_ik(t)}, and let

M_ik(t) = N_ik(t) − ∫_0^t Y_ik(u) r_ik(u; β_0) dΛ_0k(u)

be the marginal martingale process, where β_0 is the true parameter. Given β, Λ_0k(t) can be estimated consistently by the following Breslow-type estimator (Breslow 1972):

Λ̂_0k(t; β) = ∫_0^t Σ_{i=1}^n dN_ik(u) / Σ_{i=1}^n Y_ik(u) r_ik(u; β). (2.2)

Given β_0, it follows that M_ik(t) can be estimated by

M̂_ik(t) = N_ik(t) − ∫_0^t Y_ik(u) r_ik(u; β_0) dΛ̂_0k(u; β_0).

To improve estimation efficiency, Cai and Prentice (1995) added a proper weight matrix to the pseudo-partial likelihood equation and proposed to estimate β by solving the following equation:

Σ_{i=1}^n ∫_0^τ {∂r_i(t; β)/∂β}^T W_i(t)^{−1} dM̂_i(t; β) = 0, (2.3)

where r_i(t; β) = (r_i1(t; β), ..., r_iK(t; β))^T, M̂_i = (M̂_i1, ..., M̂_iK)^T, r^(j)(t; β) denotes the jth derivative of r(t; β) with respect to β, and the weight matrix W_i(t) is based on the correlation matrix of the marginal martingales. The weight matrix captures the intra-cluster correlation and is important for improving estimation efficiency. However, when the cluster size K is large, estimating the weight matrix is computationally expensive. To overcome this shortcoming, Xue et al. (2010) proposed a QIF method based on the following generalized estimating equation:

Σ_{i=1}^n ∫_0^τ {∂r_i(t; β)/∂β}^T Λ_i(t; β)^{−1/2} R_i(α)^{−1} Λ_i(t; β)^{−1/2} dM̂_i(t; β) = 0, (2.4)

where Λ_i(t; β) = diag{λ_i1(t), ..., λ_iK(t)} and R_i(α) is the working correlation matrix whose common structure is specified by a vector of nuisance correlation parameters α. The inverse of the working correlation is approximated by a linear combination of several pre-specified symmetric basis matrices, namely

R_i(α)^{−1} ≈ Σ_{l=1}^L a_l B_l, (2.5)

where B_1, ..., B_L are known basis matrices and a_1, ..., a_L are unknown coefficients. Substituting (2.5) into (2.4) leads to a linear combination of the elements of the extended score vector

G_n(β) = (1/n) Σ_{i=1}^n g_i(β), g_i(β) = ( ∫_0^τ {∂r_i/∂β}^T Λ_i^{−1/2} B_1 Λ_i^{−1/2} dM̂_i ; ... ; ∫_0^τ {∂r_i/∂β}^T Λ_i^{−1/2} B_L Λ_i^{−1/2} dM̂_i ).

As there are more equations than unknown parameters, Xue et al. (2010) proposed to estimate β by minimizing the following QIF:

Q_n(β) = n G_n(β)^T W_n(β)^{−1} G_n(β), W_n(β) = (1/n) Σ_{i=1}^n g_i(β) g_i(β)^T. (2.6)

We denote the minimizer by β_Q in the following.
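To make the construction concrete, the following is a minimal, self-contained sketch of a QIF objective in a toy Gaussian working model: B1 (identity) and B2 (off-diagonal ones) span the inverse of an exchangeable working correlation, each cluster contributes an extended score, and the objective is the quadratic form in (2.6). The Gaussian working residuals stand in for the survival model's martingale increments; all names are illustrative, not the paper's code.

```python
import numpy as np

def qif_objective(beta, X, y, K):
    """Toy QIF: exchangeable inverse working correlation spanned by
    B1 = I and B2 = off-diagonal ones; Gaussian working residuals
    stand in for the survival model's martingale increments."""
    n = len(y) // K
    B1 = np.eye(K)
    B2 = np.ones((K, K)) - np.eye(K)
    g = []
    for i in range(n):
        Xi = X[i * K:(i + 1) * K]              # K x p design, cluster i
        ri = y[i * K:(i + 1) * K] - Xi @ beta  # working residuals
        # extended score: one p-block per basis matrix
        g.append(np.concatenate([Xi.T @ B1 @ ri, Xi.T @ B2 @ ri]))
    g = np.asarray(g)
    gbar = g.mean(axis=0)                      # G_n(beta)
    Wn = g.T @ g / n                           # W_n(beta)
    return n * gbar @ np.linalg.solve(Wn, gbar)   # quadratic form, as in (2.6)
```

Minimizing this objective over beta (by Newton-Raphson or any generic optimizer) gives the QIF estimator; note that no correlation parameter α is ever estimated.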
In the implementation of the QIF, an additional issue is that the diagonal matrix Λ_i(t; β) involves the unknown baseline hazard function λ_0k(t). Xue et al. (2010) suggested the kernel-smoothed estimator

λ̂_0k(t; β) = (1/ν_k) Σ_{u ≤ τ} κ((t − u)/ν_k) ΔΛ̂_0k(u; β), (2.7)

where κ(·) is the Epanechnikov kernel function, ν_k is a rule-of-thumb bandwidth, the sum is over the jump points of the Breslow estimator, and ΔΛ̂_0k(t; β) = Λ̂_0k(t; β) − Λ̂_0k(t−; β), with Λ̂_0k(t; β) the Breslow estimator given in (2.2).
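The smoothing step in (2.7) can be sketched as follows: the jumps of a cumulative-hazard estimator are convolved with a rescaled Epanechnikov kernel. The `jump_times` and `jump_sizes` inputs are hypothetical stand-ins for the event times and Breslow increments.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def smooth_hazard(t, jump_times, jump_sizes, bandwidth):
    """Kernel-smoothed hazard at time t: the increments of a
    cumulative-hazard estimator (hypothetical Breslow jumps here)
    convolved with a rescaled Epanechnikov kernel, as in (2.7)."""
    u = (t - jump_times) / bandwidth
    return float(np.sum(epanechnikov(u) * jump_sizes) / bandwidth)
```

For a constant true hazard, the smoother should return roughly that constant at interior time points, since the kernel integrates to one.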

Estimated QIF for marginal hazards model
Consider the situation where the primary covariate can be ascertained only in a validation set. Let Z_ik(t) consist of two parts, X_ik(t) and Z̃_ik(t), where X_ik(t) is the primary covariate, observed only in the validation set, and Z̃_ik(t) is the vector of the remaining covariates, measured precisely for the full cohort. Accordingly, write the true parameter as β_0 = (β_10^T, β_20^T)^T, with β_10 and β_20 pertaining to X_ik(t) and Z̃_ik(t), respectively. Let A(t) denote a time-dependent auxiliary variable for the primary covariate X(t); A(·) can be measured for all cohort members. Suppose that A provides no additional information to the model given X and Z̃, i.e., λ(t; X(t), Z̃(t), A(t)) ≡ λ(t; X(t), Z̃(t)).
Use η_ik = 1 or 0 to indicate whether subject (i, k) is in the validation set. Let V_k = {i : η_ik = 1} and V̄_k = {i : η_ik = 0} denote the kth marginal validation and non-validation sets, respectively. The observed data are then (T̃_ik, Δ_ik, X_ik(·), Z̃_ik(·), A_ik(·)) for i ∈ V_k and (T̃_ik, Δ_ik, Z̃_ik(·), A_ik(·)) for i ∈ V̄_k. According to Liu et al. (2009), when subject (i, k) is in the non-validation set, the hazard function given the observed data can be written as

λ_ik(t; Z̃_ik(t), A*_ik(t)) = λ_0k(t) exp{β_2^T Z̃_ik(t)} E[exp{β_1^T X_ik(t)} | T_ik ≥ t, A*_ik(t)],

where A*_ik(t) denotes all the available auxiliary information, which may include the auxiliary covariate A_ik(t) and part of Z̃_ik(t). Therefore, the induced relative risk function is

ψ_ik(t; β) = exp{β_2^T Z̃_ik(t)} E[exp{β_1^T X_ik(t)} | T_ik ≥ t, A*_ik(t)].

If the conditional density of X_ik, written f(x | T_ik ≥ t, A*_ik), were known up to a parameter θ, then (β, θ) could be estimated by replacing the relative risk function in equations (2.3) or (2.4) with the induced risk function. However, misspecification of such a parameterization may lead to biased estimates. Instead, we estimate ψ_ik(t; β) empirically and plug in the corresponding estimate.
In this paper, we consider the frequently encountered case where both the primary covariate X_ik(t) and the auxiliary variable A*_ik(t) are one-dimensional. The unknown part of the induced relative risk function in the non-validation set is estimated by the kernel smoothing method

Ê[exp{β_1 X_ik(t)} | T_ik ≥ t, A*_ik(t)] = Σ_{j∈V_k} Y_jk(t) K((A*_jk(t) − A*_ik(t))/μ_k) exp{β_1 X_jk(t)} / Σ_{j∈V_k} Y_jk(t) K((A*_jk(t) − A*_ik(t))/μ_k), (2.8)

where K(·) is a kernel function and μ_k is the bandwidth. Imputation of the relative risk by interpolation is used when the denominator is 0. Therefore, the estimate of the relative risk is

r̃_ik(t; β) = η_ik r_ik(t; β) + (1 − η_ik) ψ̂_ik(t; β),

where ψ̂_ik(t; β) is the induced relative risk with the conditional expectation replaced by (2.8). Replacing r_ik(t; β) by r̃_ik(t; β) in the notation of Sect. 2.1, we obtain estimated versions of R_i, Λ_0k, λ_0k, Λ_i, M_i, g_i and G_n; to differentiate, we write them as R̃_i, Λ̃_0k, λ̃_0k, Λ̃_i, M̃_i, g̃_i and G̃_n. This yields the estimated QIF

Q̃_n(β) = n G̃_n(β)^T W̃_n(β)^{−1} G̃_n(β).

To reduce the computational burden, we approximate the first- and second-order derivatives of Q̃_n(β) as in Qu et al. (2000).
Then, the Newton-Raphson algorithm can be applied using these approximations.
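The kernel smoother in (2.8) is a Nadaraya-Watson-type weighted average of exp(β_1 X) over validation subjects still at risk, with weights from a kernel in the auxiliary variable. A minimal sketch, with illustrative variable names (the nearest-neighbour fallback for an empty window is only indicated):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def induced_risk(a, t, beta1, A_val, X_val, obs_times, bw):
    """Nadaraya-Watson-type estimate of E[exp(beta1 * X) | at risk at t, A = a],
    averaging over validation subjects still at risk at time t.
    `obs_times` plays the role of the observed times defining Y_jk(t)."""
    at_risk = (obs_times >= t).astype(float)        # Y_jk(t)
    w = epanechnikov((A_val - a) / bw) * at_risk    # kernel weights
    if w.sum() == 0.0:
        return np.nan  # empty window: fall back to nearest-neighbour interpolation
    return float(np.sum(w * np.exp(beta1 * X_val)) / w.sum())
```

When X and A are strongly associated, the estimate tracks exp(β_1 X) locally; with β_1 = 0 it returns exactly 1, as the weights cancel.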

Asymptotic properties
In this section, we present the asymptotic properties of the proposed estimated QIF estimator β_Q and provide a standard error formula for it.
Let n k denote the number of subjects in V k and assume n k /n → ρ k as n → ∞, where ρ k represents the probability of subject (i, k) being sampled into the kth marginal validation set. Under the conditions listed in "Appendix", we demonstrate the asymptotic behavior of β Q in the following theorems.
The asymptotic covariance Σ_Q(β_0) can be consistently estimated by Σ̂_Q(β_Q), where S_k^d (d = 0, ..., 5) and ζ_ik(t; β, B) are defined in the "Appendix".

Remark 1 As a special case of the estimated QIF estimator, the estimator using the independence working correlation, denoted β_I, coincides with the EPPL estimator of Liu et al. (2010); however, different expressions for the variance matrix Σ_I(β_0) of the asymptotic distribution of √n(β_I − β_0) and for its estimator Σ̂_I(β_I) are provided under conditions (C1)-(C8) in the "Appendix". Comparing the corresponding expressions of Σ_Q(β_0) and Σ_I(β_0), we can obtain that the proposed estimator is asymptotically at least as efficient as β_I.

Inference on hazard ratio parameters
The QIF is built on an objective function, which provides a natural way to make inference about the hazard ratio parameter β. Suppose that β is partitioned into γ and δ, where γ is the vector of hazard ratio parameters of interest, of dimension p_0, and δ is the vector of nuisance parameters, of dimension p − p_0. As a special case, we also allow p_0 = p, with β = γ and δ absent.
To test H_0: γ = γ_0 versus H_1: γ ≠ γ_0, we propose the test statistic

T = Q̃_n(γ_0, δ*) − Q̃_n(γ̂, δ̂), (4.1)

where δ* = argmin_δ Q̃_n(γ_0, δ) and (γ̂, δ̂) minimizes Q̃_n(γ, δ) over both components. The values of Q̃_n(γ_0, δ*) and Q̃_n(γ̂, δ̂) measure how well the model fits the data under H_0 and H_1, respectively. Under H_0, the difference between them should be very small, whereas under H_1, Q̃_n(γ_0, δ*) should be systematically larger than Q̃_n(γ̂, δ̂).
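The test can be illustrated end to end in the toy Gaussian QIF setting (the survival QIF differs only in how the extended scores are built): minimize the QIF with γ fixed at γ_0, minimize it freely, and take the difference. Grid-search minimization is used purely to keep the sketch dependency-free; names and settings are illustrative.

```python
import numpy as np

def qif(beta, X, y, K):
    """Toy QIF (cf. (2.6)): exchangeable-inverse basis B1 = I, B2 = off-diagonal
    ones, with Gaussian working residuals in place of martingale increments."""
    n = len(y) // K
    bases = (np.eye(K), np.ones((K, K)) - np.eye(K))
    g = np.array([np.concatenate([X[i*K:(i+1)*K].T @ B @ (y[i*K:(i+1)*K] - X[i*K:(i+1)*K] @ beta)
                                  for B in bases]) for i in range(n)])
    gbar, Wn = g.mean(axis=0), g.T @ g / n
    return n * gbar @ np.linalg.solve(Wn, gbar)

def qif_test_stat(X, y, K, gamma0, grid):
    """T = min_delta Q(gamma0, delta) - min_{gamma,delta} Q(gamma, delta),
    minimised by a crude grid search purely for illustration."""
    q_restricted = min(qif(np.array([gamma0, d]), X, y, K) for d in grid)
    q_full = min(qif(np.array([g, d]), X, y, K) for g in grid for d in grid)
    return q_restricted - q_full
```

Under H_0 the statistic is small (it is nonnegative by construction, since the restricted search set is contained in the full one), and asymptotically it is compared to a chi-squared(p_0) reference.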

Theorem 2 Suppose conditions (C1)-(C9) in the "Appendix" are satisfied. Under H_0, the test statistic T asymptotically follows a chi-squared distribution with p_0 degrees of freedom.
Comment To prove Theorem 2, we decompose the statistic into a telescoping sum of bracketed differences, where Q_n is defined as in (2.6) and δ* = argmin_δ Q̃_n(γ_0, δ). From the proof of Theorem 1, the first two brackets are o_p(1). In addition, from Theorem 1 above and Theorem 1 of Xue et al. (2010), both the estimated QIF estimator and the QIF estimator of β_0 are consistent, so the third and fourth brackets are also o_p(1). Furthermore, the last bracket converges in distribution to a chi-squared random variable with p_0 degrees of freedom; the proof of this last step is similar to that of Theorem 1 in Qu et al. (2000).

Simulation studies
In this section, we conduct simulation studies to evaluate the finite-sample behavior of the proposed method. We first evaluate the performance of the proposed estimator in Sect. 5.1 and then that of the inference method in Sect. 5.2.

Performance of estimated QIF estimator
We compare the proposed estimator with the QIF estimator of Xue et al. (2010) based only on the validation set and with the EPPL estimator (β_I) of Liu et al. (2010), which utilizes the auxiliary information but does not consider the intra-cluster correlation in the estimation of β. The proposed estimator takes both the intra-cluster correlation and the auxiliary information into account. The covariates (X_i1, X_i2, ..., X_iK), which are observed only in the validation sets in real studies, are generated independently from the uniform distribution U(0, 1). The covariates (Z_i1, Z_i2, ..., Z_iK) are independent binary covariates taking the value one with probability 0.5. The multivariate failure times T_i1, T_i2, ..., T_iK are generated from the multivariate Clayton (Clayton and Cuzick 1985) model with joint survival function

S(t_1, ..., t_K) = { Σ_{k=1}^K S_k(t_k)^{−1/θ} − (K − 1) }^{−θ},

where S_k is the marginal survival function implied by model (2.1), β_k, which may vary with the failure type, is the corresponding regression parameter, and θ is the dependence parameter; a larger value of θ represents weaker dependence between the failure times. We set θ = 0.25, 0.5 or 2, representing varying degrees of correlation between the generated failure times, and the baseline hazard function λ_0k = 1. The simulated failure times (t_1, ..., t_K) are generated sequentially using the algorithm described in Cai and Shen (2000), where (u_1, ..., u_K) are generated from the uniform distribution on (0, 1). Censoring times are generated from U(0, c), where c is a constant selected to achieve a specified censoring rate.
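Under the parametrization above (larger θ meaning weaker dependence; copula parameter α = 1/θ), correlated failure times with exponential marginals can be generated by sequential conditional inversion, in the spirit of the Cai and Shen (2000) algorithm. This sketch omits covariates and uses illustrative names.

```python
import numpy as np

def clayton_failure_times(n, K, theta, rate, rng):
    """Draw n clusters of K correlated failure times with exponential(rate)
    marginals from the Clayton model with dependence parameter theta
    (larger theta = weaker dependence; copula parameter alpha = 1/theta).
    Sequential conditional inversion: each V[:, k] is drawn from the
    conditional Clayton copula given the previous components."""
    alpha = 1.0 / theta
    U = rng.uniform(size=(n, K))
    V = np.empty((n, K))
    V[:, 0] = U[:, 0]
    for k in range(1, K):
        s = np.sum(V[:, :k] ** (-alpha), axis=1)           # sum_j v_j^{-alpha}
        w = U[:, k] ** (-alpha / (1.0 + k * alpha)) - 1.0  # conditional inversion
        V[:, k] = ((s - (k - 1)) * w + 1.0) ** (-1.0 / alpha)
    return -np.log(V) / rate   # invert the exponential survivor S(t) = exp(-rate*t)
```

Each column of V is marginally uniform, so inverting any marginal survival function in the last line yields the desired marginals; censoring from U(0, c) can then be applied on top.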
Since the true correlation structure of the Clayton model is exchangeable, the working correlation in (2.9) is taken to be exchangeable; the corresponding estimated QIF estimator is denoted β_QE. We also calculated an estimated QIF estimator β_QA using a misspecified AR(1) working correlation. The corresponding estimators of the QIF method based only on the validation set are denoted β_QEV and β_QAV, respectively.
To estimate the induced relative risk function and the baseline hazard function, we apply the Epanechnikov kernel function in (2.8) and (2.7) with bandwidths μ_k = 2 σ̂(A_Vk) n_k^{−1/3} and ν_k = 1.06 σ̂(T_k) n^{−1/5}, respectively, where σ̂(·) is the sample standard deviation function and A_Vk is the part of the auxiliary covariate A_k in the kth marginal validation set. We use nearest-neighbor interpolation to estimate the induced relative risk function when the denominator in (2.8) is 0. In addition, it is worth noting that λ̂_0k(t) may be 0 at some locations because the Epanechnikov kernel has bounded support, which could make the diagonal matrix Λ̃_i(t; β) singular. If this happens, we replace λ̂_0k(t) with the average of the values at the non-zero locations.
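The two rule-of-thumb bandwidths can be computed as below; taking n to be the length of the observed-time vector for ν_k is an assumption of this sketch, and the inputs are illustrative.

```python
import numpy as np

def bandwidths(A_val, T_obs, n_k):
    """Rule-of-thumb bandwidths from the simulations:
    mu_k = 2 * sd(A on the validation set) * n_k^{-1/3} (induced-risk smoother),
    nu_k = 1.06 * sd(observed times) * n^{-1/5}       (baseline-hazard smoother).
    Taking n = len(T_obs) for nu_k is an assumption of this sketch."""
    mu_k = 2.0 * np.std(A_val, ddof=1) * n_k ** (-1.0 / 3.0)
    nu_k = 1.06 * np.std(T_obs, ddof=1) * len(T_obs) ** (-1.0 / 5.0)
    return mu_k, nu_k
```

Both bandwidths shrink as the sample grows, at the rates n^(-1/3) and n^(-1/5) that balance bias and variance for the two smoothing problems.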
The auxiliary covariate A_k is generated from X_k via A_k = X_k + ε_k, where ε_k follows a normal distribution N(0, σ²); the positive parameter σ controls the strength of association between A_k and X_k. Each simulation is repeated 1000 times.

Simulation study (1)
In the first simulation, we set the true parameter β = (β^(1), β^(2))^T = (0.693, −0.2)^T, the validation proportion ρ_k = 0.5, and the association parameter σ = 0.1. The number of independent clusters is n = 200, with K = 4 or 8 failure types in each cluster. Tables 1, 2 and 3 report the simulation results for the estimates of β for each method under censoring rates of 10%, 40% and 80%. The sample mean and sample standard deviation of the 1000 estimates, the average of the estimated standard errors and the coverage rate of the 95% confidence intervals for the true parameter are listed in the Est, SD, SE and CR columns, respectively. RE, the ratio of the empirical variance of β_I to that of β_QE or β_QA, is the estimated relative efficiency of the estimated QIF estimators relative to β_I. We summarize the results as follows:
(i) The estimates from all methods are approximately unbiased. Moreover, the estimated asymptotic standard errors are approximately equal to the empirical standard deviations, and the corresponding 95% confidence intervals based on the estimated standard errors provide reasonable coverage rates. This suggests that the estimates of the asymptotic standard errors work well for all methods.
(ii) In each scenario considered, the estimator β_I, which uses the auxiliary information, is more efficient than the estimators β_QEV and β_QAV, which use the validation set only. However, β_I loses efficiency as the intra-cluster correlation becomes stronger.
(iii) As K increases, the empirical standard deviations (SD) of all estimators decrease, as expected, because of the increase in the total amount of data.
(iv) As θ decreases, the efficiency gain of the estimated QIF estimators relative to β_I increases. In Table 1, the estimated QIF estimators are more efficient than the other estimators for all combinations of θ and K; the same trend is seen in Table 2.
In Table 3, however, the estimated QIF estimators are less efficient than β_I in several cases (although the REs are very close to 1), owing to the weakening of correlation when the censoring rate is 80%. Furthermore, as expected, β_QE with the correct working correlation is always more efficient than β_QA with the misspecified working correlation.
(v) The validation proportion for the incomplete covariate affects the values of RE, especially for the first parameter. For example, with K = 4 and 10% censoring, the REs of β_QE relative to β_I for β^(1) decrease from (3.34, 2.26, 1.21) to (2.85, 2.03, 1.18) when the validation proportion decreases from 0.5 to 0.3 (results not shown). However, when we increased n, both the CRs and the REs increased.

Simulation study (2)
In practical studies, one may be interested in the failure-type-specific model, which allows the regression parameters to vary with the failure type. We simulate K = 2 failure types in each cluster, with true parameter (β_1, β_2)^T = (0.5, −0.262)^T. Since the cluster size is 2, we only need to consider the exchangeable working correlation structure. We consider three settings of n and censoring rate (CE): (n, CE) = (300, 10%), (300, 40%) and (700, 80%). The simulation results are shown in Table 4; from this table, we observe results similar to those in Simulation Study (1).

Performance of inference method
We also conduct simulation studies to assess the performance of the proposed chi-squared test. The data are generated from the same model as in Simulation Study (1), with β = (β^(1), β^(2))^T = (0.693, −0.2)^T, n = 200, K = 4, θ = 0.25, ρ_k = 0.5, σ = 0.1, and a censoring rate of 10%. First, we are interested in testing H_0: β^(1) = 0.693 versus H_1: β^(1) ≠ 0.693. Since the dimension of β^(1) is 1, the test statistic T in (4.1) asymptotically follows χ²_1, where β^(2) is calculated by minimizing Q̃_n(0.693, β^(2)) with the exchangeable or AR(1) working correlation. When β^(1)* = 0.693, i.e., when the alternative hypothesis collapses into the null hypothesis, the empirical powers are 0.051 and 0.061 for the exchangeable and AR(1) working correlations, respectively, showing that the proposed chi-squared test attains the correct level. Figure 2 plots the power functions of the chi-squared test for the estimated QIF method and the QIF method with the two working correlations, and for the EPPL method. The power functions decrease rapidly as β^(1)* approaches the true value (0.693), but the power function for the exchangeable working correlation is always larger than that for the AR(1) working correlation; thus the test with the correct working correlation is more powerful than the one with the misspecified working correlation. Nonetheless, the powers of the chi-squared test for either of the estimated QIF methods are larger than those of the other two methods, and the power of the EPPL method, which utilizes the auxiliary information, is larger than that of the QIF method based on the validation set only. Similarly, we also consider the test of H_0: β^(2) = −0.2 and observe similar results.

Table 1 Simulation results for common effect size across failure types: β_I is the EPPL estimator using the independence working structure; β_QE and β_QA are the estimators of the proposed estimated QIF method with exchangeable and AR(1) working correlations, respectively
Table 2 Simulation results for common effect size across failure types (notation as in Table 1)
Table 3 Simulation results for common effect size across failure types (notation as in Table 1)
Table 4 Simulation results for varying effect size across failure types

Analysis of SOLVD data
We apply the proposed method to the left ventricular dysfunction (SOLVD 1991) study in this section. The SOLVD study was a randomized, double-masked, placebo-controlled trial conducted between 1986 and 1991. The trial had a three-year recruitment period and a two-year follow-up. The basic inclusion criteria for the prevention trial were: age between 21 and 80 years, inclusive; no overt symptoms of congestive heart failure; and left ventricular EF less than 35%. EF is a number between 0 and 100 that measures the efficiency of the heart in ejecting blood. A total of 4228 patients with asymptomatic left ventricular dysfunction were randomly assigned to receive either enalapril or placebo at one of the 83 hospitals linked to 23 centers in the United States, Canada, and Belgium. Liu et al. (2009) and Liu et al. (2010) analyzed these data without considering the intra-cluster correlation.
The primary clinical issues of interest are the effects of covariates on the risk of heart failure and on the first nonfatal myocardial infarction (MI) after adjusting for confounding variables. The covariates of interest are the ejection fraction, the patient's gender (SEX), coded 1 for male and 0 for female, treatment (TRT), coded 1 for enalapril and 0 for placebo, and the patient's age (AGE), measured in years. In the SOLVD study, the covariates SEX, TRT and AGE were recorded for almost all of the patients, but only 108 of the 4228 patients had their ejection fraction accurately measured using a standardized radionuclide technique (LVEF). A related nonstandardized measure (EF) was ascertained for all patients. Therefore, the nonstandardized measure (EF) serves as a surrogate for the standardized measure (LVEF) in this case.
In terms of the notation of the previous sections, k denotes the failure type, with k = 1 for heart failure and k = 2 for nonfatal MI, and i denotes the patient, i = 1, ..., 4228. Let β_k = (β_1k, β_2k^T)^T be the unknown regression coefficients; we fit the failure-type-specific marginal hazards model

λ_ik(t) = λ_0k(t) exp{β_1k LVEF_i + β_2k^T (TRT_i, SEX_i, AGE_i)^T}

to the SOLVD data. Since the primary covariate LVEF is continuous and severely incomplete, we estimate the induced relative risk function using the validation set. Table 5 presents the data analysis results for two methods: the proposed estimated QIF method and the EPPL method, which ignores the intra-cluster correlation between the failure times. The parameter estimates from the two methods are close, but the proposed estimated QIF method has smaller standard errors. To test whether the covariates have significant effects on the times to heart failure and nonfatal MI, we calculate p-values from both the two-sided Z-test and the chi-squared test for the two methods. The results indicate that, at the 0.05 significance level, all of the covariates are statistically significant for heart failure by the proposed method, while SEX is not significant by the EPPL method. By both methods, only TRT is significant for the risk of nonfatal MI. By the proposed method, the risk of heart failure decreases by 3.92% (95% CI [1.83%, 5.97%]) with each 1% increase in LVEF, the risk increases by 2.63% (95% CI [1.43%, 3.85%]) per year increase in age, males have about a 29.46% (95% CI [7.19%, 46.39%]) lower risk of heart failure than females, and enalapril reduces the risk by 33.77% (95% CI [20.52%, 44.80%]).
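The percent-change statements above are transformations of the log-hazard-ratio scale: for an estimate b with standard error s, the percent change in risk is 100(e^b − 1), with confidence limits from e^(b ± 1.96s). A small helper, with illustrative inputs only (the SOLVD coefficients themselves are not reproduced here):

```python
import math

def percent_change(b, se, z=1.96):
    """Convert a log-hazard-ratio estimate b (standard error se) into the
    percent change in risk, 100 * (exp(b) - 1), with its confidence
    interval. Negative values are percent decreases in risk."""
    to_pct = lambda x: 100.0 * (math.exp(x) - 1.0)
    return to_pct(b), (to_pct(b - z * se), to_pct(b + z * se))
```

For instance, a coefficient of log(0.9608) per unit of a covariate corresponds to a 3.92% decrease in risk per unit increase.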
To illustrate the prediction of survival probabilities, Fig. 3 shows the estimated survival curves of heart failure and nonfatal MI for a 69-year-old male patient with an LVEF of 28% (the median LVEF) receiving enalapril. The survival curves from the two methods are very close, but the pointwise confidence intervals from the proposed method are narrower than those from the EPPL method.

Concluding remarks
Fig. 3 Survival curves by the proposed method (bold curves) and the EPPL method (thin curves) for heart failure and nonfatal myocardial infarction of a subject with covariates LVEF = 28%, TRT = 1, SEX = 1 and AGE = 69, along with the corresponding 95% pointwise confidence intervals (dotted curves)

We proposed an estimated QIF approach for multivariate failure time data when the primary covariate is ascertained only on a subset of the full cohort but auxiliary information is available for the full cohort. Our approach allows the censoring times of different failures for the same subject to differ. It is worth noting that in practice the censoring times for different failures of a subject are usually the same; this can be treated as a special case of our general setup, and the proposed method still applies.
In this article, we consider the situation where the auxiliary variable is continuous. The method is based on the kernel smoothing technique and is therefore nonparametric with respect to the association between the missing covariate and its auxiliary. The QIF method has the advantage of incorporating the intra-cluster correlation into the estimation procedure. Compared with existing methods (e.g., Liu et al. 2010) that do not account for the intra-cluster correlation, the proposed procedure improves estimation efficiency without requiring specification of the correlation structure. Another advantage of the proposed method is that it is easy to implement. In this work, we consider the case of a low-dimensional continuous auxiliary covariate; further research is needed for the high-dimensional case.
Clearly, ∂Ũ(β, B)/∂β^T is continuous. For any constant matrix B, the first term on the right-hand side of (7.3) is a local square-integrable martingale; hence, by the Lenglart inequality, we can show that it converges to zero in probability, uniformly for β ∈ B. Let Γ(β, B) denote the uniform convergence limit of the second term; we can show that ∂Ũ(β, B)/∂β^T → Γ(β, B). When β = β_0, by the Lenglart inequality, we can show the corresponding convergence with ζ_ik(t; β, B) = Σ_{m=1}^K R_im(t; β) φ_imk(t; β, B) − e_k(t; β, B). From the arguments in Zhou and Wang (2000), it can be shown that the remainder is negligible. Note that F_ik(β) = 0 if i ∈ V_k. Then we have a decomposition in which M_ik(t) is a mean-zero martingale and E(F_ik(β_0)) = 0; then, by the strong law of large numbers, Ũ(β_0, B) converges to zero with probability 1, and thus (ii) is satisfied. Similarly to Xue et al. (2010), by condition (C8), we can show that O(β, B_j, B_j) converges in probability to ω(β, B_j, B_j), and that n W̃_n(β) converges in probability, uniformly for β ∈ B; hence (iii) and (iv) are satisfied.

Asymptotic normality
Define D_i(B) = Σ_{k=1}^K ∫_0^τ ζ_ik(t; β_0, B) dM_ik(t). From the previous statements, for any constant matrix B, √n Ũ(β_0, B) is asymptotically equivalent to n^{−1/2} Σ_{i=1}^n D_i(B), which is a sum of independent p-vector random variables with mean zero and variance var(D_i(B)). By condition (C8) and the multivariate central limit theorem, we can show that √n Ũ(β_0, B) converges in distribution to a normal random vector with mean zero and covariance matrix ω(β_0, B, B), and √n G̃_n(β_0) converges in distribution to a normal random vector, denoted X, with mean zero and covariance matrix W_0 as defined in (A.2).