Nonparametric estimation of conditional distribution functions with longitudinal data and time-varying parametric models

Metrika

Abstract

Nonparametric estimation and inference for conditional distribution functions with longitudinal data have important applications in biomedical studies. We propose in this paper an estimation approach based on time-varying parametric models. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model, but the parameters are smooth functions of time. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Asymptotic properties, including the asymptotic biases, variances and mean squared errors, are derived for the local polynomial smoothed estimators. Applicability of our two-step estimation method is demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through a simulation study.
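The two-step idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a normal working model at each time point, no covariate, independent samples across time points, an Epanechnikov kernel and local linear smoothing, and all function names (`raw_cdf_estimates`, `local_linear_smooth`) and simulation settings are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def raw_cdf_estimates(samples_by_time, y):
    """Step 1: at each time point, fit N(mu, sigma^2) by maximum likelihood and plug in."""
    raw = []
    for obs in samples_by_time:
        mu_hat = float(np.mean(obs))
        sigma_hat = float(np.std(obs))  # ML estimate (divides by n)
        raw.append(normal_cdf((y - mu_hat) / sigma_hat))
    return np.array(raw)

def local_linear_smooth(t_grid, raw, t, h):
    """Step 2: local linear smoothing of the raw estimates (Epanechnikov kernel)."""
    d = t_grid - t
    k = np.where(np.abs(d / h) <= 1.0, 0.75 * (1.0 - (d / h) ** 2), 0.0)
    X = np.column_stack([np.ones_like(d), d])
    beta = np.linalg.solve((X.T * k) @ X, (X.T * k) @ raw)
    return float(beta[0])  # intercept = smoothed value at t

# Hypothetical simulation: outcome ~ N(100 + 10 t, 8^2) at 21 time points.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 21)
samples = [rng.normal(loc=100.0 + 10.0 * t, scale=8.0, size=200) for t in t_grid]

raw = raw_cdf_estimates(samples, y=110.0)
smoothed = local_linear_smooth(t_grid, raw, t=0.5, h=0.2)
truth = normal_cdf((110.0 - 105.0) / 8.0)  # true P(Y <= 110) at t = 0.5
```

In this sketch the smoothed value borrows strength from neighbouring time points, so it is typically closer to `truth` than the single raw estimate at t = 0.5.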

References

  • Chiou JM, Müller HG, Wang JL (2004) Functional response models. Stat Sin 14:675–693

  • Chowdhury M, Wu C, Modarres R (2017) Local Box–Cox transformation on time-varying parametric models for smoothing estimation of conditional CDF with longitudinal data. J Stat Comput Simul 87(15):2900–2914

  • Daniels SR, McMahon RP, Obarzanek E, Waclawiw MA, Similo SL, Biro FM, Schreiber GB, Kimm SYS, Morrison JA, Barton BA (1998) Longitudinal correlates of change in blood pressure in adolescent girls. Hypertension 31:97–103

  • Diggle PJ, Heagerty P, Liang KY, Zeger SL (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford

  • Fan J, Zhang JT (2000) Two-step estimation of functional linear models with applications to longitudinal data. J R Stat Soc Ser B 62:303–322

  • Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G (eds) (2009) Longitudinal data analysis. Chapman and Hall/CRC, Boca Raton

  • Hall P, Wolff RCL, Yao Q (1999) Methods for estimating a conditional distribution function. J Am Stat Assoc 94:154–163

  • Hart JD, Wehrly TE (1986) Kernel regression estimation using repeated measurements data. J Am Stat Assoc 81:1080–1088

  • Hoover DR, Rice JA, Wu CO, Yang LP (1998) Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85:809–822

  • Hu Z, Wang N, Carroll RJ (2004) Profile-kernel versus backfitting in the partially linear model for longitudinal/clustered data. Biometrika 91:251–262

  • James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87:587–602

  • Lin X, Carroll R (2001) Semiparametric regression for clustered data. Biometrika 88:1179–1185

  • Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York

  • National Heart, Lung, and Blood Institute Growth and Health Research Group (NGHSRG) (1992) Obesity and cardiovascular disease risk factors in black and white girls: the NHLBI Growth and Health Study. Am J Public Health 82:1613–1620

  • National High Blood Pressure Education Program Working Group on High Blood Pressure in Children and Adolescents (NHBPEP Working Group) (2004) The fourth report on the diagnosis, evaluation, and treatment of high blood pressure in children and adolescents. Pediatrics 114:555–576

  • Obarzanek E, Wu CO, Cutler JA, Kavey RW, Pearson RW, Daniels SR (2010) Prevalence and incidence of hypertension in adolescent girls. J Pediatr 157(3):461–467

  • Qu A, Li R (2006) Nonparametric modeling and inference function for longitudinal data. Biometrics 62:379–391

  • Sentürk D, Müller HG (2006) Inference for covariate adjusted regression via varying coefficient models. Ann Stat 34:654–679

  • Thompson DR, Obarzanek E, Franko DL, Barton BA, Morrison J, Biro FM, Daniels SR, Striegel-Moore RH (2007) Childhood overweight and cardiovascular disease risk factors: the National Heart, Lung, and Blood Institute Growth and Health Study. J Pediatr 150:18–25

  • van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

  • Wu CO, Tian X (2013a) Nonparametric estimation of conditional distribution functions and rank-tracking probabilities with longitudinal data. J Stat Theory Pract 7:1–26

  • Wu CO, Tian X (2013b) Nonparametric estimation of conditional distribution functions and rank-tracking probabilities with time-varying transformation models in longitudinal studies. J Am Stat Assoc 108(503):971–982

  • Wu CO, Tian X, Yu J (2010) Nonparametric estimation for time-varying transformation models with longitudinal data. J Nonparametr Stat 22:133–147

  • Zhou L, Huang JZ, Carroll R (2008) Joint modelling of paired sparse functional data using principal components. Biometrika 95:601–619

Acknowledgements

We would like to thank Drs. Eric Leifer and Minjung Kwak for helpful comments and suggestions. The National Growth and Health Study was supported by Contracts #NO1-HC-55023-26 and Grants #U01-HL48941-44 from the National Heart, Lung and Blood Institute.

Author information

Correspondence to Mohammed Chowdhury.

Appendix: Proof of theoretical results

A.1 Useful approximations for the equivalent kernels

The following approximations for the equivalent kernel function \(W_{q,p+1} (t_j, t; h)\) are used in computing the asymptotic bias and variance of \(F_{t, \widehat{\theta }(t|x)}^{(q)} \big [ y(t)|x \big ]\):

$$\begin{aligned}&W_{q,p+1}(t_{j},t; h)=\frac{q!}{Jh^{q+1}g(t)}K_{q,p+1}\Big (\frac{t_{j}-t}{h} \Big ) \big [1+o_{p}(1) \big ], \quad j=1, \ldots ,J; \end{aligned}$$
(A.1)
$$\begin{aligned}&\sum _{j=1}^{J}W_{q,p+1}(t_{j},t; h) \,(t_{j}-t)^{k}=q!\,1_{[k=q]},\quad k=0, \ldots , p; \end{aligned}$$
(A.2)
$$\begin{aligned}&\sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \,(t_{j}-t)^{p+1} =\,q!h^{p-q+1} B_{p+1} \big ( K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ]; \end{aligned}$$
(A.3)
$$\begin{aligned}&\sum _{j=1}^{J} W_{q,p+1}^{2}(t_{j},t; h)=\frac{(q!)^2}{Jh^{2q+1}g(t)} V \big (K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ], \end{aligned}$$
(A.4)

where \(K_{q,p+1}(t)\), \(B_{p+1}(K)\) and \(V(K)\) are defined in Theorem 1. Proofs of Eqs. (A.1)–(A.4) are given in Fan and Zhang (2000, Appendix A, Lemmas 1 and 2).
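As a numerical illustration of the moment identity (A.2), the sketch below builds equivalent weights from an ordinary weighted least-squares local polynomial fit and checks the sums for the powers k = 0, 1, 2. It is only a sketch: the Epanechnikov kernel, the equally spaced grid and the function name `equivalent_weights` are assumptions made here, not taken from the paper.

```python
import math
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def equivalent_weights(t_grid, t, h, p=1, q=0, kernel=epanechnikov):
    """Weights w_j such that sum_j w_j * Y_j is the local polynomial
    (degree p) estimate of the q-th derivative at t."""
    d = np.asarray(t_grid, dtype=float) - t
    X = np.vander(d, N=p + 1, increasing=True)  # columns (t_j - t)^k, k = 0..p
    k = kernel(d / h) / h
    XtK = X.T * k                               # X^T diag(k), shape (p+1, J)
    A = np.linalg.solve(XtK @ X, XtK)           # (X^T K X)^{-1} X^T K
    return math.factorial(q) * A[q]

t_grid = np.linspace(0.0, 1.0, 101)
w = equivalent_weights(t_grid, t=0.4, h=0.15, p=2, q=1)

# Moment sums for k = 0, 1, 2; with q = 1 only the k = 1 sum should be q! = 1.
moments = [float(np.sum(w * (t_grid - 0.4) ** k)) for k in range(3)]
```

For weights built this way the identity holds exactly (up to floating-point error) by the least-squares normal equations, whereas (A.1), (A.3) and (A.4) are asymptotic approximations.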

A.2 Proof of Theorem 1

First, note that \(\widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ]=F_{t_j, \widetilde{\theta }(t_j|x)} \big [ y(t_{j})|x \big ]\). Using Eq. (6), the bias of \(\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]\) is

$$\begin{aligned} E \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \} - F_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] = {\mathcal {W}}_1 + {{\mathcal {W}}}_2, \end{aligned}$$
(A.5)

where

$$\begin{aligned} \left\{ \begin{array}{lll} {{\mathcal {W}}}_1 &{}=&{} \sum \nolimits _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \big \{ E \widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ] - F_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ] \big \}, \\ {{\mathcal {W}}}_2 &{}=&{} \sum \nolimits _{j=1}^{J} W_{q,p+1}(t_{j},t; h) F_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ] - F_{t, \theta (t|x)}^{(q)} \big [ y(t)|x \big ] . \end{array} \right. \end{aligned}$$
(A.6)

It then follows from (15) and (A.1) that

$$\begin{aligned} {{\mathcal {W}}}_1 = \sum _{j=1}^{J} \frac{q!}{Jh^{q+1}g(t)}K_{q,p+1}\Big (\frac{t_{j}-t}{h} \Big ) \big [1+o_{p}(1) \big ] o_p \big ( n_j^{-1/2} \big ) = o_p \big ( n^{-1/2} \big ), \end{aligned}$$
(A.7)

where the second equality holds because, by Assumption A2, \(\lim _{n \rightarrow \infty } (n_j/n)\) is bounded between 0 and 1, and \(\sum _{j=1}^{J} \big | q! \big [Jh^{q+1}g(t) \big ]^{-1}K_{q,p+1}\big [(t_j-t)/h \big ] \big |\) is bounded. By the Taylor expansion of \(F_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ]\) around \(t\) and Eqs. (A.2) and (A.3), we have

$$\begin{aligned} {{\mathcal {W}}}_2= & {} \sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \Big \{ \sum _{k=0}^{p+1} \Big [ F_{t, \theta (t|x)}^{(k)} \big [ y(t)|x \big ] \frac{(t_j-t)^k}{k!} \Big ] \nonumber \\&+ o_p \big [(t_j-t)^{p+1} \big ] \Big \} - F_{t, \theta (t|x)}^{(q)} \big [ y(t)|x \big ] \nonumber \\= & {} \frac{q!h^{p-q+1}}{(p+1)!} F_{t, \theta (t|x)}^{(p+1)} \big [ y(t)|x \big ] B_{p+1} \big ( K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ]. \end{aligned}$$
(A.8)

By Assumption A1 and the asymptotic expressions (A.7) and (A.8), \({{\mathcal {W}}}_2\) dominates \({{\mathcal {W}}}_1\), so the asymptotic expression for the bias of \(\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]\) follows from (A.6)–(A.8).
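The step between the two lines of (A.8) can be spelled out as follows. This is a sketch that assumes the moment identity (A.2) is applied to each Taylor term of order \(k = 0, \ldots , p\) and the approximation (A.3) to the term of order \(p+1\), with \(q \le p\):

$$\begin{aligned} \sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \sum _{k=0}^{p} F_{t, \theta (t|x)}^{(k)} \big [ y(t)|x \big ] \frac{(t_j-t)^k}{k!}&= \sum _{k=0}^{p} \frac{q!\,1_{[k=q]}}{k!} F_{t, \theta (t|x)}^{(k)} \big [ y(t)|x \big ] = F_{t, \theta (t|x)}^{(q)} \big [ y(t)|x \big ], \\ \sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \frac{(t_j-t)^{p+1}}{(p+1)!} F_{t, \theta (t|x)}^{(p+1)} \big [ y(t)|x \big ]&= \frac{q!h^{p-q+1}}{(p+1)!} F_{t, \theta (t|x)}^{(p+1)} \big [ y(t)|x \big ] B_{p+1} \big ( K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ]. \end{aligned}$$

The first line cancels the subtracted term \(F_{t, \theta (t|x)}^{(q)} \big [ y(t)|x \big ]\) in \({{\mathcal {W}}}_2\), so the second line is the leading term of the bias.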

Let \(\mu _{F}(t_j)= E\big \{ \widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_j) \big | x \big ] \big \}\). Then, by Eq. (6),

$$\begin{aligned} Var \Big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \Big \}= & {} E\Big \{ \sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \Big [ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ] -\mu _{F}(t_j) \Big ] \Big \}^{2} \nonumber \\= & {} {{\mathcal {W}}}_3 + {{\mathcal {W}}}_4, \end{aligned}$$
(A.9)

where, by (15), (A.4) and Assumption A2 with \(c_j=c\),

$$\begin{aligned} {{\mathcal {W}}}_3= & {} \sum _{j=1}^{J} W_{q,p+1}^{2}(t_{j},t; h) E \Big \{ \Big [ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ]-\mu _{F}(t_j) \Big ]^2 \Big \} \nonumber \\= & {} \sum _{j=1}^{J}W_{q,p+1}^{2}(t_{j},t; h) Var \Big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ] \Big \} \nonumber \\= & {} \frac{(q!)^2}{cnJh^{2q+1}g(t)} V \big (K_{q,p+1} \big ) \nonumber \\&\times F'_{t, \theta (t|x)}\big [y(t) \big |x \big ]^T I^{-1}\big [ \theta (t|x) \big ] F'_{t, \theta (t|x)}\big [y(t) \big |x \big ] \big [1+ o_{p}(1) \big ] \end{aligned}$$
(A.10)

and, by (14), Assumptions A2 and A4, Eq. (A.1) and \(\lim _{n \rightarrow \infty } n_{jk}/n=c_{jk}\), there is a constant \(C_1>0\) such that, when n is sufficiently large,

$$\begin{aligned} {{\mathcal {W}}}_4= & {} \sum _{j\ne k}^{J} \Big \{ W_{q,p+1}(t_{j},t; h) W_{q,p+1}(t_{k},t; h) \nonumber \\&\times E \Big \{ \big [ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ] -\mu _{F}(t_j) \big ] \big [ \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ]-\mu _{F}(t_k) \big ] \Big \} \Big \} \nonumber \\\le & {} \sum _{j\ne k}^{J}\Big \{ \Big | W_{q,p+1}(t_{j},t; h) \, W_{q,p+1}(t_{k},t; h) \Big | \nonumber \\&\times \Big | Cov\big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ], \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ] \big \} \Big |\Big \} \nonumber \\\le & {} \frac{C_1}{J^2 h^{2q+2} g^2(t)} \sum _{j \ne k}^{J} \Big | K_{q,p+1}\Big ( \frac{t_j-t}{h} \Big ) \, K_{q,p+1} \Big ( \frac{t_k-t}{h} \Big ) \, \Big [ \frac{\rho _F (t_j, t_k |x)}{r(n_j, n_k, n_{jk})} \Big ] \Big |.\nonumber \\ \end{aligned}$$
(A.11)

The bounded support of \(K(\cdot )\) implies that, for any t, \(K_{q, p+1}\big [(t_j-t)/h \big ] \, K_{q, p+1}\big [(t_k-t)/h \big ]=0\) for any \(j \ne k\) such that \(\big | t_j -t_k \big | > ah\) for some constant \(a>0\).

We now consider the following three situations:

  (i) If \(\big | t_j - t_k \big | \le \delta \), by Assumption A2, \(Cov\big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ], \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ] \big \}=0\), so that \({{\mathcal {W}}}_4=0\), and, by (A.9), \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3\).

  (ii) If \(\big | t_j - t_k \big | > \delta \ge ah\), then \(K \big [(t_j-t)/h \big ] K \big [(t_k-t)/h \big ] =0\), so that, by (A.11), \({{\mathcal {W}}}_4=0\), and it still follows from (A.9) that \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3\).

  (iii) If \(\delta < ah\) and \(\delta \le \big | t_j - t_k \big | \le ah\), since \(K_{q, p+1}(s)\) and \(\rho _F (t_j, t_k|x)\) are bounded, there is a constant \(C_2>0\) such that, by (A.11) and \(\sum _{j =1}^{J} \sum _{k: \delta \le |t_k-t_j| \le ah} r^{-1}(n_j, n_k, n_{jk}) = o(Jh)\),

    $$\begin{aligned} {{\mathcal {W}}}_4\le & {} \frac{C_2}{J^2 h^{2q+2} g^2(t)} \Big \{ \sum _{j =1}^{J} \sum _{k: \delta \le |t_k-t_j| \le ah} r^{-1}(n_j, n_k, n_{jk}) \Big \} \big [ 1+ o_p(1) \big ] \nonumber \\= & {} o_p \big [ \big ( nJh^{2q+1} \big )^{-1} \big ]. \end{aligned}$$
    (A.12)

Then, by (A.9) and (A.12), \(Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3 \big [1+ o_p(1) \big ]\). Since this expression holds in all three situations (i), (ii) and (iii), the asymptotic expression for the variance of \(\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]\) follows from (A.9)–(A.11).
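The decomposition of the variance into a diagonal part \({{\mathcal {W}}}_3\) and a cross-covariance part \({{\mathcal {W}}}_4\) can be checked numerically. The sketch below illustrates situation (ii) only, under assumptions made here for illustration: plain normalised Epanechnikov weights with support \(|t_j - t| \le h\) (so that \(a = 2\)), and a hypothetical covariance matrix in which raw estimators are correlated only when their time points are more than \(2h\) apart.

```python
import numpy as np

t_grid = np.linspace(0.0, 1.0, 41)
t, h = 0.5, 0.1

# Kernel-type weights supported on |t_j - t| <= h, normalised to sum to 1.
u = (t_grid - t) / h
w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
w = w / w.sum()

# Hypothetical covariance of the raw estimators: variance 1e-3 on the
# diagonal, weak correlation 0.01 only between time points more than 2h apart.
gap = np.abs(t_grid[:, None] - t_grid[None, :])
corr = np.eye(t_grid.size) + 0.01 * (gap > 2.0 * h)
cov = 1e-3 * corr

total_var = float(w @ cov @ w)                       # Var of sum_j w_j * X_j
w3 = float(np.sum(w**2 * np.diag(cov)))              # diagonal part
w4 = float(w @ (cov - np.diag(np.diag(cov))) @ w)    # cross-covariance part
```

Because any two time points that both receive non-zero weight are at most \(2h\) apart, every correlated pair has a zero weight product, so `w4` vanishes and `total_var` equals `w3`.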

Cite this article

Chowdhury, M., Wu, C. & Modarres, R. Nonparametric estimation of conditional distribution functions with longitudinal data and time-varying parametric models. Metrika 81, 61–83 (2018). https://doi.org/10.1007/s00184-017-0634-z

  • DOI: https://doi.org/10.1007/s00184-017-0634-z
