Abstract
Nonparametric estimation and inference of conditional distribution functions with longitudinal data have important applications in biomedical studies. We propose in this paper an estimation approach based on time-varying parametric models. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model, but the parameters are smooth functions of time. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Asymptotic properties, including the asymptotic biases, variances and mean squared errors, are derived for the local polynomial smoothed estimators. The applicability of our two-step estimation method is demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through a simulation study.
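The two-step idea described above can be sketched in a few lines. This is a minimal illustration under assumptions that are not from the paper: a Gaussian family at each time point, no covariate x, an Epanechnikov kernel, local linear smoothing, and simulated data. All function and variable names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at y."""
    return 0.5 * (1.0 + erf((y - mu) / (sigma * sqrt(2.0))))

def raw_cdf_estimates(samples_by_time, y):
    """Step 1: fit the (assumed) Gaussian model by maximum likelihood at
    each time point and plug the estimates into the CDF at y."""
    raw = []
    for obs in samples_by_time:
        mu_hat = float(np.mean(obs))
        sigma_hat = float(np.std(obs))  # ML estimate (divides by n)
        raw.append(normal_cdf(y, mu_hat, sigma_hat))
    return np.array(raw)

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear_smooth(t_grid, raw, t0, h):
    """Step 2: weighted least-squares line through the raw estimates,
    evaluated at t0 (the fitted intercept)."""
    d = t_grid - t0
    w = epanechnikov(d / h)
    X = np.column_stack([np.ones_like(d), d])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ raw)
    return float(beta[0])

# Simulated example (assumption): the mean drifts linearly in time.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 21)
true_mu = lambda t: 100.0 + 10.0 * t
samples_by_time = [rng.normal(true_mu(t), 5.0, size=200) for t in t_grid]

y = 105.0
raw = raw_cdf_estimates(samples_by_time, y)
F_hat = local_linear_smooth(t_grid, raw, t0=0.5, h=0.2)
F_true = normal_cdf(y, true_mu(0.5), 5.0)
print(F_hat, F_true)
```

The smoothing step borrows strength from neighbouring time points, so the smoothed value at a given time is typically less variable than the raw estimate at that time alone.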
References
Chiou JM, Müller HG, Wang JL (2004) Functional response models. Stat Sin 14:675–693
Chowdhury M, Wu C, Modarres R (2017) Local Box–Cox transformation on time-varying parametric models for smoothing estimation of conditional CDF with longitudinal data. J Stat Comput Simul 87(15):2900–2914
Daniels SR, McMahon RP, Obarzanek E, Waclawiw MA, Similo SL, Biro FM, Schreiber GB, Kimm SYS, Morrison JA, Barton BA (1998) Longitudinal correlates of change in blood pressure in adolescent girls. Hypertension 31:97–103
Diggle PJ, Heagerty P, Liang KY, Zeger SL (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford
Fan J, Zhang JT (2000) Two-step estimation of functional linear models with applications to longitudinal data. J R Stat Soc Ser B 62:303–322
Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G (eds) (2009) Longitudinal data analysis. Chapman and Hall/CRC, Boca Raton
Hall P, Wolff RCL, Yao Q (1999) Methods for estimating a conditional distribution function. J Am Stat Assoc 94:154–163
Hart TD, Wehrly TE (1993) Kernel regression estimation using repeated measurements data. J Am Stat Assoc 81:1080–1088
Hoover DR, Rice JA, Wu CO, Yang LP (1998) Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85:809–822
Hu Z, Wang N, Carroll RJ (2004) Profile-kernel versus backfitting in the partially linear model for longitudinal/clustered data. Biometrika 91:251–262
James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87:587–602
Lin X, Carroll R (2001) Semiparametric regression for clustered data. Biometrika 88:1179–1185
Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York
National Heart, Lung, and Blood Institute Growth and Health Research Group (NGHSRG) (1992) Obesity and cardiovascular disease risk factors in black and white girls: the NHLBI Growth and Health Study. Am J Public Health 82:1613–1620
National High Blood Pressure Education Program Working Group on High Blood Pressure in Children and Adolescents (NHBPEP Working Group) (2004) The fourth report on the diagnosis, evaluation, and treatment of high blood pressure in children and adolescents. Pediatrics 114:555–576
Obarzanek E, Wu CO, Cutler JA, Kavey RW, Pearson RW, Daniels SR (2010) Prevalence and incidence of hypertension in adolescent girls. J Pediatr 157(3):461–467
Qu A, Li R (2006) Nonparametric modeling and inference function for longitudinal data. Biometrics 62:379–391
Sentürk D, Müller HG (2006) Inference for covariate adjusted regression via varying coefficient models. Ann Stat 34:654–679
Thompson DR, Obarzanek E, Franko DL, Barton BA, Morrison J, Biro FM, Daniels SR, Striegel-Moore RH (2007) Childhood overweight and cardiovascular disease risk factors: the National Heart, Lung, and Blood Institute Growth and Health Study. J Pediatr 150:18–25
van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Wu CO, Tian X (2013a) Nonparametric estimation of conditional distribution functions and rank-tracking probabilities with longitudinal data. J Stat Theory Pract 7:1–26
Wu CO, Tian X (2013b) Nonparametric estimation of conditional distribution functions and rank-tracking probabilities with time-varying transformation models in longitudinal studies. J Am Stat Assoc 108(503):971–982
Wu CO, Tian X, Yu J (2010) Nonparametric estimation for time-varying transformation models with longitudinal data. J Nonparametr Stat 22:133–147
Zhou L, Huang JZ, Carroll R (2008) Joint modelling of paired sparse functional data using principal components. Biometrika 95:601–619
Acknowledgements
We would like to thank Drs. Eric Leifer and Minjung Kwak for helpful comments and suggestions. The National Growth and Health Study was supported by Contracts #NO1-HC-55023-26 and Grants #U01-HL48941-44 from the National Heart, Lung and Blood Institute.
Appendix: Proof of theoretical results
A.1 Useful approximation for the equivalent kernels
The following approximations for the equivalent kernel function \(W_{q,p+1} (t_j, t; h)\) are used in computing the asymptotic bias and variance of \(F_{t, \widehat{\theta }(t|x)}^{(q)} \big [ y(t)|x \big ]\):
where \(K_{q,p+1}(t)\), \(B_{p+1}(K)\) and V(K) are defined in Theorem 1. Proofs of Eqs. (A.1)–(A.4) are given in Fan and Zhang (2000, Appendix A, Lemmas 1 and 2).
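As a numerical companion to the approximations cited above, the following sketch builds equivalent kernel weights from the standard local polynomial least-squares formula and checks the exact discrete moment identities that such weights satisfy. The function name, the Epanechnikov kernel, the time grid, and the reading of p as the polynomial degree are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np
from math import factorial

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def equivalent_kernel_weights(t_grid, t0, h, p, q):
    """Weights W_j such that sum_j W_j * Y_j is the degree-p local
    polynomial estimate of the q-th derivative at t0."""
    d = t_grid - t0
    w = epanechnikov(d / h) / h                 # kernel weights K_h(t_j - t0)
    C = np.vander(d, N=p + 1, increasing=True)  # columns 1, d, ..., d^p
    CtW = C.T * w
    S = CtW @ C                                 # symmetric (p+1) x (p+1)
    e_q = np.zeros(p + 1)
    e_q[q] = 1.0
    # q! * e_q^T S^{-1} C^T W, using the symmetry of S
    return factorial(q) * (np.linalg.solve(S, e_q) @ CtW)

t_grid = np.linspace(0.0, 1.0, 101)
t0, h, p, q = 0.5, 0.2, 2, 1
W = equivalent_kernel_weights(t_grid, t0, h, p, q)

# Discrete moments sum_j W_j (t_j - t0)^k for k = 0, ..., p.
d = t_grid - t0
moments = [float(W @ d**k) for k in range(p + 1)]
print(moments)
```

For k = 0, ..., p the k-th discrete moment equals q! when k = q and 0 otherwise, which is why polynomial trends up to degree p are reproduced without bias and the leading bias comes from the (p+1)-th term of the Taylor expansion.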
A.2 Proof of Theorem 1
First, note that \(\widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ]=F_{t_j, \widetilde{\theta }(t_j|x)} \big [ y(t_{j})|x \big ]\). Using Eq. (6), the bias of \(\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]\) is
where
It then follows from (15) and (A.1) that
where the second equality holds because, by Assumption A2, \(\lim _{n \rightarrow \infty } (n_j/n)\) is bounded between 0 and 1, and \(\sum _{j=1}^{J} \big | q! \big [Jh^{q+1}g(t) \big ]^{-1}K_{q,p+1}\big [(t_j-t)/h \big ] \big |\) is bounded. By the Taylor expansion of \(F_{t_j, \theta (t_j)} \big [ y(t_{j})|x \big ]\) and Eqs. (A.2) and (A.3), we have
By Assumption A1 and the asymptotic expressions (A.7) and (A.8), \({{\mathcal {W}}}_2\) dominates \({{\mathcal {W}}}_1\). The asymptotic expression for the bias of \(\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]\) follows from (A.6) to (A.8).
Let \(\mu _{F}(t_j)= E\big \{ \widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t) \big | x \big ] \big \}\). Then, by Eq. (6),
where, by (15), (A.4) and Assumption A2 with \(c_j=c\),
and, by (14), Assumptions A2 and A4, Eq. (A.1) and \(\lim _{n \rightarrow \infty } n_{jk}/n=c_{jk}\), there is a constant \(C_1>0\) such that, when n is sufficiently large,
The bounded support of \(K(\cdot )\) implies that, for any t, \(K_{q, p+1}\big [(t_j-t)/h \big ] K_{q, p+1}\big [(t_k-t)/h \big ]=0\) for any \(j \ne k\) such that \(\big | t_j -t_k \big | > ah\) for some constant \(a>0\).
We now consider the following three situations:
(i) If \(\big | t_j - t_k \big | \le \delta \), by Assumption A2, \(Cov\big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ], \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ] \big \}=0\), so that \({{\mathcal {W}}}_4=0\), and, by (A.9), \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3\).
(ii) If \(\big | t_j - t_k \big | > \delta \ge ah\), then \(K \big [(t_j-t)/h \big ] K \big [(t_k-t)/h \big ] =0\), so that, by (A.11), \({{\mathcal {W}}}_4=0\), and it still follows from (A.9) that \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3\).
(iii) If \(\delta < ah\) and \(\delta \le \big | t_j - t_k \big | \le ah\), since \(K_{q, p+1}(s)\) and \(\rho _F (t_j, t_k)\) are bounded, there is \(C_2>0\), so that, by (A.11) and \(\sum _{j =1}^{J} \sum _{k: \delta \le |t_k-t_j| \le ah} r^{-1}(n_j, n_k, n_{jk}) = o(Jh)\),
$$\begin{aligned} {{\mathcal {W}}}_4&\le \frac{C_2}{J^2 h^{2q+2} g^2(t)} \Big \{ \sum _{j =1}^{J} \sum _{k: \delta \le |t_k-t_j| \le ah} r^{-1}(n_j, n_k, n_{jk}) \Big \} \big [ 1+ o_p(1) \big ] \\&= o_p \big [ \big ( nJh^{2q+1} \big )^{-1} \big ]. \end{aligned} \tag{A.12}$$
Then, by (A.9) and (A.12), \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3 \big [1+ o_p(1) \big ]\). Since, for all three situations (i), (ii) and (iii), \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3 \big [1+ o_p(1) \big ]\), the asymptotic expression for the variance of \(\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]\) follows from (A.9)-(A.11).
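The role of the bounded kernel support in situation (ii) can be checked numerically. The sketch below is illustrative only and rests on assumptions not taken from the paper: an Epanechnikov kernel on [-1, 1] (so the support length is a = 2), an equally spaced time grid, and a hypothetical helper name. It computes the largest kernel product over pairs of time points that are more than ah apart.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def max_product_beyond(t_grid, t0_values, h, gap):
    """Largest K((t_j - t)/h) * K((t_k - t)/h) over all evaluation points t
    in t0_values and all pairs (j, k) with |t_j - t_k| > gap."""
    tj, tk = np.meshgrid(t_grid, t_grid, indexing="ij")
    far = np.abs(tj - tk) > gap
    worst = 0.0
    for t0 in t0_values:
        k = epanechnikov((t_grid - t0) / h)
        prod = np.outer(k, k)          # prod[j, k] = K_j * K_k
        worst = max(worst, float(prod[far].max()))
    return worst

t_grid = np.linspace(0.0, 1.0, 51)
h = 0.1
a = 2.0  # length of the kernel support [-1, 1] (assumption)
worst_far = max_product_beyond(t_grid, np.linspace(0.0, 1.0, 201), h, a * h)
print(worst_far)
```

Two time points can both receive positive kernel weight at the same t only if they lie within one support length of each other, so pairs farther apart than ah contribute nothing to the covariance term.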
Chowdhury, M., Wu, C. & Modarres, R. Nonparametric estimation of conditional distribution functions with longitudinal data and time-varying parametric models. Metrika 81, 61–83 (2018). https://doi.org/10.1007/s00184-017-0634-z
DOI: https://doi.org/10.1007/s00184-017-0634-z