Abstract
Current status data occur in many fields including demographical, epidemiological, financial, medical, and sociological studies. We consider the regression analysis of current status data with latent variables. The proposed model consists of a factor analytic model for characterizing latent variables through their multiple surrogates and an additive hazard model for examining potential covariate effects on the hazards of interest in the presence of current status data. We develop a borrow-strength estimation procedure that incorporates the expectation–maximization algorithm and correlated estimating equations. The consistency and asymptotic normality of the proposed estimators are established. A simulation study is conducted to evaluate the finite sample performance of the proposed method. A real-life study on the chronic kidney disease of type 2 diabetic patients is presented.
References
Amemiya Y, Fuller WA, Pantula SG (1987) The asymptotic distributions of some estimators for a factor analysis model. J Multivar Anal 22(1):51–64
Andersen PK, Borgan O, Gill RD, Keiding N (1992) Statistical models based on counting processes. Springer Series in Statistics. Springer
Anderson TW, Amemiya Y (1988) The asymptotic normal distribution of estimators in factor analysis under general conditions. Ann Statist 16(2):759–771
Bentler PM, Wu EJC (2002) EQS6 for Windows user guide. Multivariate Software, Inc., Encino, CA
Bollen KA (1989) Structural equations with latent variables. Wiley, New York
Diao G, Yuan A (2019) A class of semiparametric cure models with current status data. Lifetime Data Anal 25(1):26–51
Du M, Hu T, Sun J (2019) Semiparametric probit model for informative current status data. Stat Med 38(12):2219–2227
Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42(4):845–854
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
Gaede P, Lund-Andersen H, Parving HH, Pedersen O (2008) Effect of a multifactorial intervention on mortality in type 2 diabetes. New England J Med 358:580–591
He H, Cai J, Song XY, Sun LQ (2017) Analysis of proportional mean residual life model with latent variables. Stat Med 36(5):813–826
He HJ, Pan D, Song XY, Sun LQ (2019) Additive mean residual life model with latent variables under right censoring. Stat Sinica 29(1):47–66
Huang J (1996) Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 24(2):540–568
Jöreskog KG, Sörbom D (1996) LISREL 8: structural equation modeling with the SIMPLIS command language. Scientific Software International, Lincolnwood, IL
Lee SY, Song XY (2004) Maximum likelihood analysis of a general latent variable model with hierarchically mixed data. Biometrics 60(3):624–636
Lee SY (2007) Structural equation modeling: a Bayesian approach. Wiley, New York
Lin DY, Oakes D, Ying Z (1998) Additive hazards regression with current status data. Biometrika 85(2):289–298
Muthén LK, Muthén BO (1998–2007) Mplus user's guide, 5th edn. Muthén and Muthén, Los Angeles, CA
Ma L, Hu T, Sun J (2015) Sieve maximum likelihood regression analysis of dependent current status data. Biometrika 102(3):731–738
Pan D, He H, Song XY, Sun LQ (2015) Regression analysis of additive hazards model with latent variables. J Am Stat Ass 110(511):1148–1159
Rossini AJ, Tsiatis AA (1996) A semiparametric proportional odds regression model for the analysis of current status data. J Am Stat Ass 91(434):713–721
Shi JQ, Lee SY (2000) Latent variable models with mixed continuous and polytomous data. J R Stat Soc Ser B 62(1):77–87
Skevington SM, O’Connell MLA (2004) The World Health Organization’s WHOQOL-BREF quality of life assessment: psychometric properties and results of the international field trial - A report from the WHOQOL group. Qual Life Res 13(2):299–310
Song XY, Lee SY (2012) Basic and advanced Bayesian structural equation modeling: with applications in the medical and behavioral sciences. Wiley, London
Song XY, Lee SY, Ma RW, So WY, Cai JH, Tam C, Lam V, Ying W, Ng MCY, Chan JCN (2009) Phenotype genotype interactions on renal function in type 2 diabetes: an analysis using structural equation modelling. Diabetologia 52(8):1543–1553
Sun J (1999) A nonparametric test for current status data with unequal censoring. J R Stat Soc Ser B 61(1):243–250
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
Wang CJ, Li Q, Song XY, Dong XG (2019) Bayesian adaptive lasso for additive hazard regression with current status data. Stat Med 38(20):3703–3718
Wang XQ, Wu HT, Feng XN, Song XY (2019) Bayesian two-level model for repeated partially ordered responses: application to adolescent smoking behavior analysis. Sociol Methods Res, to appear
Zhao S, Hu T, Ma L, Sun J (2015) Regression analysis of informative current status data with the additive hazards model. Lifetime Data Anal 21(2):241–258
Acknowledgements
The research of Chunjie Wang was supported by the National Natural Science Foundation of China (NSFC) (Grant No. 11671054). The research of Xinyuan Song was supported by the Research Grant Council of the HKSAR (GRF Grant Nos. 14301918 and 14302519) and by direct grants of the Chinese University of Hong Kong.
Appendix
Let \(\alpha _0\) and \(\theta _0\) be the true values of \(\alpha \) and \(\theta \), respectively. Recall that \(\varPi (\theta )=B \varPhi B^{T} + \varPsi \). The regularity conditions (C1)–(C4) that are required in the proof of Theorem 1 are as follows:
(C1): The matrix \(\varPi (\theta _0)\) is positive definite; all partial derivatives of the first three orders of \(\varPi (\theta _0)\) with respect to the elements of \(\theta _0\) are continuous and bounded in a neighborhood of \(\theta _0\); and the first derivative of \(\varPi (\theta _0)\), \({\dot{\varPi }}(\theta _0)\), is of full rank in a neighborhood of \(\theta _0\).
(C2): The function \(\lambda _0(t)\) is continuous and integrable on \([0,\tau ]\), where \(\tau \) is a prespecified positive constant such that \(P(C_{i}\ge \tau ) > 0\).
(C3): There exists a positive constant \(a\) such that \(P(C_i\ge \tau , T_i \ge \tau \mid Z_i) > a\) with probability 1, and \(Z_i\), \(i=1,\ldots ,n\), are bounded almost surely.
(C4): The limit \(A\) of the matrix \({{\hat{A}}}\) is positive definite.
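Condition (C1) can be checked numerically for any candidate parameter value. The following is a minimal sketch, not part of the original analysis: it forms \(\varPi (\theta )=B \varPhi B^{T} + \varPsi \) and tests positive definiteness through a Cholesky factorization. The function names and the numerical values of B, Phi, and Psi (four surrogates loading on two latent factors) are illustrative assumptions.

```python
import numpy as np

def implied_covariance(B, Phi, Psi):
    """Model-implied covariance Pi(theta) = B Phi B^T + Psi."""
    return B @ Phi @ B.T + Psi

def is_positive_definite(M):
    """Return True when the symmetric matrix M admits a Cholesky factor."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

# Illustrative values (assumptions, not taken from the paper).
B = np.array([[1.0, 0.0],
              [0.8, 0.0],
              [0.0, 1.0],
              [0.0, 0.6]])            # factor loadings
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])          # covariance of the latent variables
Psi = np.diag([0.5, 0.4, 0.5, 0.3])   # covariance of the measurement errors

Pi = implied_covariance(B, Phi, Psi)
pd_ok = is_positive_definite(Pi)

# A symmetric matrix with a negative eigenvalue fails the check.
not_pd = is_positive_definite(np.array([[1.0, 2.0],
                                        [2.0, 1.0]]))
```

Whenever \(\varPsi \) is positive definite and \(\varPhi \) is positive semidefinite, \(\varPi (\theta )\) is positive definite, so the check above mainly guards against inadmissible estimates such as negative error variances.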
We first introduce the following notation:
In addition, let \( Q_d={Q}_d( \alpha _0, \theta _0)\) and \({{\hat{Q}}}_d={Q}_d({{\hat{\alpha }}},{{\hat{\theta }}})\), \(d=1, \ldots , 5\).
where \(\otimes \) denotes the Kronecker product of two matrices, \(\mathrm{vec}\) denotes the operation that converts a matrix into a column vector by stacking the rows sequentially, \({\dot{\varPi }} (\theta )= \partial {(\mathrm{vec}\,\varPi (\theta ))^T}/\partial \theta ,\) \({\dot{\varGamma }}_j({\theta })\ (j=1,\ldots ,p)\) denote the derivatives of the \(j\)th column of \(\varGamma (\theta )\) with respect to \(\theta ^T\), and \({\dot{D}}_r({\theta }) \ (r=1,\ldots ,q)\) denote the derivatives of the \(r\)th column of \({D}({\theta })\) with respect to \(\theta ^T\).
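As an illustration of this notation only, the sketch below pairs the row-stacking vec operator described above with the Kronecker product. Under row stacking, the standard identity reads vec(L X R) = (L ⊗ R^T) vec(X). The matrices L, X, and R are arbitrary placeholders and are not quantities from the paper.

```python
import numpy as np

def vec_rows(M):
    """Stack the rows of M sequentially into one vector (row-major order)."""
    return np.asarray(M).reshape(-1)

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 3))   # placeholder left factor
X = rng.normal(size=(3, 4))   # placeholder middle matrix
R = rng.normal(size=(4, 2))   # placeholder right factor

# Row-stacking version of the vec/Kronecker identity.
lhs = vec_rows(L @ X @ R)
rhs = np.kron(L, R.T) @ vec_rows(X)
identity_holds = np.allclose(lhs, rhs)
```

With the more common column-stacking convention, the same identity would use R^T ⊗ L instead, so the order of the Kronecker factors depends on the stacking rule.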
Let \( {{{\hat{U}}}_i} = { ({{\hat{U}}}_{i1}^T, {{\hat{U}}}_{i2}^T)^T}\), \({{\hat{\varSigma }}}=\frac{1}{n}\sum \limits _{i = 1}^n {{\hat{U}}}_i^{ \otimes 2}\) and
In the above, \(d{{{\hat{M}}}^c_i}(t) = dN_i^c(t) - {Y_i}(t)\exp \big \{- {{\hat{\beta }}} ^TZ_i^*(t) - {{\hat{\gamma }}}^T \varGamma ({{\hat{\theta }}}){V_i}t \big \}d{{{\hat{H}}}_0}(t)\) and \({{{\hat{H}}}_0}(t) = \sum \limits _{i = 1}^n \int _0^t \frac{{dN_i^c(s)}}{{\sum \nolimits _{j = 1}^n {{Y_j}(s)\exp ( - {{\hat{\beta }}}^TZ_j^*(t) - {{\hat{\gamma }}} ^T \varGamma ({{\hat{\theta }}}){V_j}t- \frac{1}{2}{{{\hat{\gamma }}} ^T}D({{\hat{\theta }}} ){t^2} {{\hat{\gamma }}})} }}\).
We are now ready to prove the consistency and asymptotic normality of \({{\hat{\alpha }}}\). Consistency follows from two facts. First, the estimator \({{\hat{\theta }}}\) and its functions \(\varGamma ({{\hat{\theta }}})=({{\hat{B}}}^{T}{{\hat{\varPsi }}}^{-1}{{\hat{B}}})^{-1}{{\hat{B}}}^{T}{{\hat{\varPsi }}}^{-1}\) and \(D({{\hat{\theta }}})=({{\hat{B}}}^T {{\hat{\varPsi }}}^{-1}{{\hat{B}}})^{-1}\) from the CFA model are consistent. The consistency of \({{\hat{\theta }}}\) is well established in the literature (e.g., Amemiya et al. 1987; Anderson and Amemiya 1988; Lee 2007), and the consistency of \(\varGamma ({{\hat{\theta }}})\) and \(D({{\hat{\theta }}})\) can be obtained in a similar manner to Pan et al. (2015). Second, the working corrected estimating equations \(U_1(\alpha ; {\hat{\theta }}) = 0\) and \(U_2(\alpha ; {\hat{\theta }}) = 0\) can be written as a sum of n independent and identically distributed zero-mean random vectors plus asymptotically negligible error terms.
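The two matrices \(\varGamma (\theta )\) and \(D(\theta )\) are straightforward to compute from the loadings and error covariance. The sketch below is an illustration under assumptions rather than the authors' code: the function name and the numerical values of B and Psi are hypothetical, and \(\varPsi \) is taken to be symmetric. It also checks two algebraic consequences of the definitions, \(\varGamma B = I\) and \(\varGamma \varPsi \varGamma ^T = D\).

```python
import numpy as np

def gamma_and_d(B, Psi):
    """Return (Gamma, D) with D = (B^T Psi^{-1} B)^{-1} and
    Gamma = D B^T Psi^{-1}; Psi is assumed symmetric and invertible."""
    Psi_inv_B = np.linalg.solve(Psi, B)   # Psi^{-1} B without forming Psi^{-1}
    D = np.linalg.inv(B.T @ Psi_inv_B)    # (B^T Psi^{-1} B)^{-1}
    Gamma = D @ Psi_inv_B.T               # equals D B^T Psi^{-1} for symmetric Psi
    return Gamma, D

# Illustrative values (assumptions, not taken from the paper).
B = np.array([[1.0, 0.0],
              [0.8, 0.0],
              [0.0, 1.0],
              [0.0, 0.6]])
Psi = np.diag([0.5, 0.4, 0.5, 0.3])

Gamma, D = gamma_and_d(B, Psi)
gamma_b_is_identity = np.allclose(Gamma @ B, np.eye(B.shape[1]))
error_cov_is_d = np.allclose(Gamma @ Psi @ Gamma.T, D)
```

If one assumes the usual measurement equation \(V_i = B\xi _i + \epsilon _i\) with \(\mathrm{Cov}(\epsilon _i)=\varPsi \), these two identities say that \(\varGamma V_i\) equals \(\xi _i\) plus an error term with covariance \(D\), which is one way to read the role of \(D({{\hat{\theta }}})\) in the correction term \(\frac{1}{2}{{\hat{\gamma }}}^T D({{\hat{\theta }}}) t^2 {{\hat{\gamma }}}\).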
Hence, based on the lemma of Pan et al. (2015), we have
where \(K(\theta )\), \(R_i(\theta )\), and \(P_i(\theta )\) are defined by (A.10), (A.11), and (A.12), respectively.
Under the AH model (2), we redefine a zero-mean stochastic process as follows: for \(i = 1, \ldots ,n\),
Denote \({{\bar{Z}}}(t,\alpha _0,\theta _0)={{\bar{Z}}}(t)\). We can obtain the following:
where
By the Taylor expansion, we have
and
Based on (A.14) and (A.15), equation (A.16) can be rewritten as
Similarly, equation (A.17) can be rewritten as
Thus, we obtain
and
where
and
Let \({U_i} = {(U_{i1}^T,U_{i2}^T)^T}\). Then, it follows from (A.20) and (A.21) that
which is a sum of independently and identically distributed zero-mean random vectors plus an asymptotically negligible term. The law of large numbers and the multivariate central limit theorem show that \(\frac{1}{n}U({\alpha _0};{{\hat{\theta }}} ) \rightarrow 0\) in probability and \(\frac{1}{{\sqrt{n} }}U({\alpha _0};{{\hat{\theta }}} )\) converges in distribution to a normal random vector with mean zero and covariance matrix \(\varSigma = E(U_i^{ \otimes 2})\). Note that
and \({{\hat{A}}} \rightarrow A\) in probability by the consistency of \(\varGamma ({{\hat{\theta }}})\) and \(D({{\hat{\theta }}})\). Then, based on (A.23), \({{\hat{\alpha }}}\) converges in probability to \(\alpha _0\), and \(\sqrt{n}({{\hat{\alpha }}}- \alpha _0)\) is asymptotically normal with mean zero and covariance matrix \({A^{ - 1}}\varSigma {A^{ - T}}\), which can be consistently estimated by \({{{{\hat{A}}}}^{ - 1}}{{\hat{\varSigma }}} {{{{\hat{A}}}}^{ - T}}.\)
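The sandwich form of this variance estimator can be sketched in a few lines. The code below is a minimal illustration, assuming that \({{\hat{A}}}\) and the per-subject terms \({{\hat{U}}}_i\) have already been computed and that \(a^{\otimes 2}\) denotes \(aa^T\), as is customary in this literature. The function names and the synthetic inputs are hypothetical placeholders, not output from the proposed procedure.

```python
import numpy as np

def sandwich_covariance(A_hat, U_hat):
    """Estimate A^{-1} Sigma A^{-T} with Sigma_hat = (1/n) sum_i U_i U_i^T.

    A_hat : (k, k) array.
    U_hat : (n, k) array whose i-th row is the estimated term U_i.
    """
    n = U_hat.shape[0]
    Sigma_hat = U_hat.T @ U_hat / n    # average of the outer products U_i U_i^T
    A_inv = np.linalg.inv(A_hat)
    return A_inv @ Sigma_hat @ A_inv.T

def standard_errors(A_hat, U_hat):
    """Standard errors of alpha_hat: sqrt(diag(sandwich) / n)."""
    n = U_hat.shape[0]
    return np.sqrt(np.diag(sandwich_covariance(A_hat, U_hat)) / n)

# Synthetic placeholders standing in for quantities from a fitted model.
rng = np.random.default_rng(1)
U_hat = rng.normal(size=(500, 3))
A_hat = np.array([[2.0, 0.1, 0.0],
                  [0.0, 1.5, 0.2],
                  [0.0, 0.0, 1.0]])

cov = sandwich_covariance(A_hat, U_hat)
se = standard_errors(A_hat, U_hat)
```

The division by n in `standard_errors` reflects that the limiting covariance refers to \(\sqrt{n}({{\hat{\alpha }}}- \alpha _0)\) rather than to \({{\hat{\alpha }}}\) itself.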
Cite this article
Wang, C., Zhao, B., Luo, L. et al. Regression analysis of current status data with latent variables. Lifetime Data Anal 27, 413–436 (2021). https://doi.org/10.1007/s10985-021-09521-9
DOI: https://doi.org/10.1007/s10985-021-09521-9