Nonparametric estimation of conditional distribution functions with longitudinal data and time-varying parametric models

Chowdhury, Mohammed; Wu, Colin; Modarres, Reza

doi:10.1007/s00184-017-0634-z

Published: 09 November 2017

Volume 81, pages 61–83, (2018)

Metrika

Mohammed Chowdhury¹,
Colin Wu² &
Reza Modarres³


Abstract

Nonparametric estimation and inferences of conditional distribution functions with longitudinal data have important applications in biomedical studies. We propose in this paper an estimation approach based on time-varying parametric models. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model, but the parameters are smooth functions of time. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Asymptotic properties, including the asymptotic biases, variances and mean squared errors, are derived for the local polynomial smoothed estimators. Applicability of our two-step estimation method is demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through simulation study.

References

Chiou JM, Müller HG, Wang JL (2004) Functional response models. Stat Sin 14:675–693
Chowdhury M, Wu C, Modarres R (2017) Local Box–Cox transformation on time-varying parametric models for smoothing estimation of conditional CDF with longitudinal data. J Stat Comput Simul 87(15):2900–2914
Daniels SR, McMahon RP, Obarzanek E, Waclawiw MA, Similo SL, Biro FM, Schreiber GB, Kimm SYS, Morrison JA, Barton BA (1998) Longitudinal correlates of change in blood pressure in adolescent girls. Hypertension 31:97–103
Diggle PJ, Heagerty P, Liang KY, Zeger SL (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford
Fan J, Zhang JT (2000) Two-step estimation of functional linear models with applications to longitudinal data. J R Stat Soc Ser B 62:303–322
Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G (eds) (2009) Longitudinal data analysis. Chapman and Hall/CRC, Boca Raton
Hall P, Wolff RCL, Yao Q (1999) Methods for estimating a conditional distribution function. J Am Stat Assoc 94:154–163
Hart TD, Wehrly TE (1993) Kernel regression estimation using repeated measurements data. J Am Stat Assoc 81:1080–1088
Hoover DR, Rice JA, Wu CO, Yang LP (1998) Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85:809–822
Hu Z, Wang N, Carroll RJ (2004) Profile-kernel versus backfitting in the partially linear model for longitudinal/clustered data. Biometrika 91:251–262
James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87:587–602
Lin X, Carroll R (2001) Semiparametric regression for clustered data. Biometrika 88:1179–1185
Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York
National Heart, Lung, and Blood Institute Growth and Health Research Group (NGHSRG) (1992) Obesity and cardiovascular disease risk factors in black and white girls: the NHLBI Growth and Health Study. Am J Public Health 82:1613–1620
National High Blood Pressure Education Program Working Group on High Blood Pressure in Children and Adolescents (NHBPEP Working Group) (2004) The fourth report on the diagnosis, evaluation, and treatment of high blood pressure in children and adolescents. Pediatrics 114:555–576
Obarzanek E, Wu CO, Cutler JA, Kavey RW, Pearson RW, Daniels SR (2010) Prevalence and incidence of hypertension in adolescent girls. J Pediatr 157(3):461–467
Qu A, Li R (2006) Nonparametric modeling and inference function for longitudinal data. Biometrics 62:379–391
Sentürk D, Müller HG (2006) Inference for covariate adjusted regression via varying coefficient models. Ann Stat 34:654–679
Thompson DR, Obarzanek E, Franko DL, Barton BA, Morrison J, Biro FM, Daniels SR, Striegel-Moore RH (2007) Childhood overweight and cardiovascular disease risk factors: the National Heart, Lung, and Blood Institute Growth and Health Study. J Pediatr 150:18–25
van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Wu CO, Tian X (2013a) Nonparametric estimation of conditional distribution functions and rank-tracking probabilities with longitudinal data. J Stat Theory Pract 7:1–26
Wu CO, Tian X (2013b) Nonparametric estimation of conditional distribution functions and rank-tracking probabilities with time-varying transformation models in longitudinal studies. J Am Stat Assoc 108(503):971–982
Wu CO, Tian X, Yu J (2010) Nonparametric estimation for time-varying transformation models with longitudinal data. J Nonparametr Stat 22:133–147
Zhou L, Huang JZ, Carroll R (2008) Joint modelling of paired sparse functional data using principal components. Biometrika 95:601–619

Acknowledgements

We would like to thank Drs. Eric Leifer and Minjung Kwak for helpful comments and suggestions. The National Growth and Health Study was supported by Contracts #NO1-HC-55023-26 and Grants #U01-HL48941-44 from the National Heart, Lung and Blood Institute.

Author information

Authors and Affiliations

Department of Statistics and Analytical Sciences, KSU, Kennesaw, GA, USA
Mohammed Chowdhury
Office of Biostatistics Research, NHLBI, NIH, Bethesda, MD, USA
Colin Wu
Department of Statistics, GWU, Washington, DC, USA
Reza Modarres

Corresponding author

Correspondence to Mohammed Chowdhury.

Appendix: Proof of theoretical results

A.1 Useful approximations for the equivalent kernels

The following approximations for the equivalent kernel function $W_{q,p+1} (t_j, t; h)$ are used in computing the asymptotic bias and variance of $F_{t, \widehat{\theta }(t|x)}^{(q)} \big [ y(t)|x \big ]$:

$$\begin{aligned}&W_{q,p+1}(t_{j},t; h)=\frac{q!}{Jh^{q+1}g(t)}K_{q,p+1}\Big (\frac{t_{j}-t}{h} \Big ) \big [1+o_{p}(1) \big ], \quad j=1, \ldots ,J; \end{aligned}$$

(A.1)

$$\begin{aligned}&\sum _{j=1}^{J}W_{q,p+1}(t_{j},t; h) \,(t_{j}-t)^{k}=q!\,1_{[k=q]},\quad k=0, \ldots , p; \end{aligned}$$

(A.2)

$$\begin{aligned}&\sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \,(t_{j}-t)^{p+1}\nonumber \\&\quad =\,q!h^{p-q+1} B_{p+1} \big ( K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ]; \end{aligned}$$

(A.3)

$$\begin{aligned}&\sum _{j=1}^{J} W_{q,p+1}^{2}(t_{j},t; h)=\frac{(q!)^2}{Jh^{2q+1}g(t)} V \big (K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ], \end{aligned}$$

(A.4)

where $K_{q,p+1}(t)$, $B_{p+1}(K)$ and $V(K)$ are defined in Theorem 1. Proofs of Eqs. (A.1)–(A.4) are given in Fan and Zhang (2000, Appendix A, Lemmas 1 and 2).
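The moment identity (A.2) and the variance approximation (A.4) can be checked numerically. The sketch below is illustrative only: it assumes the usual weighted least squares definition of the local polynomial equivalent kernel weights (the paper's own definition is not reproduced in this appendix), an Epanechnikov kernel, and an equally spaced design on $[0,1]$ so that $g(t)=1$. The function names are hypothetical.

```python
import math
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1] (an assumed choice of K)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def equivalent_kernel_weights(t_grid, t, h, p, q):
    """Weights W_{q,p+1}(t_j, t; h) of a local polynomial fit of order p
    for the q-th derivative at t, under the standard WLS definition:
    q! * e_{q+1}^T (C^T K C)^{-1} C^T K, with C_{jk} = (t_j - t)^k."""
    d = np.asarray(t_grid, dtype=float) - t
    C = np.vander(d, N=p + 1, increasing=True)   # columns d^0, ..., d^p
    k = epanechnikov(d / h) / h                  # kernel weights K_h(t_j - t)
    CtK = C.T * k                                # (p+1) x J
    S = CtK @ C                                  # (p+1) x (p+1)
    return math.factorial(q) * np.linalg.solve(S, CtK)[q]

if __name__ == "__main__":
    J, t, h = 201, 0.5, 0.2
    t_grid = np.linspace(0.0, 1.0, J)

    # (A.2): the k-th moment of the weights equals q! when k = q, else 0.
    w = equivalent_kernel_weights(t_grid, t, h, p=2, q=1)
    for k in range(3):
        print("moment", k, float(np.sum(w * (t_grid - t) ** k)))

    # (A.4) for p = 1, q = 0: sum of squared weights vs V(K) / (J h g(t)),
    # where V(K) = 0.6 is the integral of K^2 for the Epanechnikov kernel.
    w0 = equivalent_kernel_weights(t_grid, t, h, p=1, q=0)
    print("sum W^2:", float(np.sum(w0 ** 2)), "approx:", 0.6 / (J * h))
```

The moment identity holds exactly (up to rounding) for any design, while the comparison with (A.4) is only approximate and improves as $J$ grows and $h$ shrinks.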

A.2 Proof of Theorem 1

First, note that $\widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ]=F_{t_j, \widetilde{\theta }(t_j|x)} \big [ y(t_{j})|x \big ]$. Using Eq. (6), the bias of $\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]$ is

$$\begin{aligned} E \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \} - F_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] = {\mathcal {W}}_1 + {{\mathcal {W}}}_2, \end{aligned}$$

(A.5)

where

$$\begin{aligned} \left\{ \begin{array}{lll} {{\mathcal {W}}}_1 &{}=&{} \sum \nolimits _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \big \{ E \widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ] - F_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ] \big \}, \\ {{\mathcal {W}}}_2 &{}=&{} \sum \nolimits _{j=1}^{J} W_{q,p+1}(t_{j},t; h) F_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ] - F_{t, \theta (t|x)}^{(q)} \big [ y(t)|x \big ] . \end{array} \right. \end{aligned}$$

(A.6)

It then follows from (15) and (A.1) that

$$\begin{aligned} {{\mathcal {W}}}_1 = \sum _{j=1}^{J} \frac{q!}{Jh^{q+1}g(t)}K_{q,p+1}\Big (\frac{t_{j}-t}{h} \Big ) \big [1+o_{p}(1) \big ] o_p \big ( n_j^{-1/2} \big ) = o_p \big ( n^{-1/2} \big ), \end{aligned}$$

(A.7)

where the second equality holds because, by Assumption A2, $\lim _{n \rightarrow \infty } (n_j/n)$ is bounded between 0 and 1, and $\sum _{j=1}^{J} \big | q! \big [Jh^{q+1}g(t) \big ]^{-1}K_{q,p+1}\big [(t_j-t)/h \big ] \big |$ is bounded. By the Taylor expansion of $F_{t_j, \theta (t_j|x)} \big [ y(t_{j})|x \big ]$ about $t$ and Eqs. (A.2) and (A.3), we have

$$\begin{aligned} {{\mathcal {W}}}_2= & {} \sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \Big \{ \sum _{k=0}^{p+1} \Big [ F_{t, \theta (t|x)}^{(k)} \big [ y(t)|x \big ] \frac{(t_j-t)^k}{k!} \Big ] \nonumber \\&+ o_p \big [(t_j-t)^{p+1} \big ] \Big \} - F_{t, \theta (t|x)}^{(q)} \big [ y(t)|x \big ] \nonumber \\= & {} \frac{q!h^{p-q+1}}{(p+1)!} F_{t, \theta (t|x)}^{(p+1)} \big [ y(t)|x \big ] B_{p+1} \big ( K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ]. \end{aligned}$$

(A.8)

By Assumption A1 and the asymptotic expressions (A.7) and (A.8), ${{\mathcal {W}}}_2$ dominates ${{\mathcal {W}}}_1$, so the asymptotic expression for the bias of $\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]$ follows from (A.6) to (A.8).
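The passage from the Taylor expansion to the last line of (A.8) can be spelled out as follows. This is a sketch that uses only (A.2) and (A.3), with the shorthand $F^{(k)}$ for $F_{t, \theta (t|x)}^{(k)} \big [ y(t)|x \big ]$ and $W_j$ for $W_{q,p+1}(t_{j},t; h)$ (both abbreviations are introduced here for illustration and are not the paper's notation):

$$\begin{aligned} \sum _{j=1}^{J} W_j \sum _{k=0}^{p} F^{(k)} \frac{(t_j-t)^k}{k!}&= \sum _{k=0}^{p} \frac{F^{(k)}}{k!} \, q! \, 1_{[k=q]} = F^{(q)}, \\ \sum _{j=1}^{J} W_j \, F^{(p+1)} \frac{(t_j-t)^{p+1}}{(p+1)!}&= \frac{q! \, h^{p-q+1}}{(p+1)!} \, F^{(p+1)} B_{p+1} \big ( K_{q,p+1} \big ) \big [1+ o_{p}(1) \big ]. \end{aligned}$$

The first line applies (A.2) term by term and cancels the $-F^{(q)}$ term in ${{\mathcal {W}}}_2$. The second line applies (A.3) to the $k=p+1$ term, whose Taylor coefficient carries the factor $1/(p+1)!$, and gives the leading bias term; the remainder is of smaller order than $h^{p-q+1}$.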

Let $\mu _{F}(t_j)= E\big \{ \widetilde{F}_{t_j, \theta (t_j|x)} \big [ y(t_j) \big | x \big ] \big \}$. Then, by Eq. (6),

$$\begin{aligned} Var \Big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \Big \}= & {} E\Big \{ \sum _{j=1}^{J} W_{q,p+1}(t_{j},t; h) \Big [ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ] -\mu _{F}(t_j) \Big ] \Big \}^{2} \nonumber \\= & {} {{\mathcal {W}}}_3 + {{\mathcal {W}}}_4, \end{aligned}$$

(A.9)

where, by (15), (A.4) and Assumption A2 with $c_j=c$,

$$\begin{aligned} {{\mathcal {W}}}_3= & {} \sum _{j=1}^{J} W_{q,p+1}^{2}(t_{j},t; h) E \Big \{ \Big [ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ]-\mu _{F}(t_j) \Big ]^2 \Big \} \nonumber \\= & {} \sum _{j=1}^{J}W_{q,p+1}^{2}(t_{j},t; h) Var \Big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ] \Big \}\nonumber \\= & {} \frac{(q!)^2}{cnJh^{2q+1}g(t)} V \big (K_{q,p+1} \big ) \nonumber \\&\times F'_{t, \theta (t|x)}\big [y(t) \big |x \big ]^T I^{-1}\big [ \theta (t|x) \big ] F'_{t, \theta (t|x)}\big [y(t) \big |x \big ] \big [1+ o_{p}(1) \big ] \end{aligned}$$

(A.10)

and, by (14), Assumptions A2 and A4, Eq. (A.1) and $\lim _{n \rightarrow \infty } n_{jk}/n=c_{jk}$, there is a constant $C_1>0$ such that, when $n$ is sufficiently large,

$$\begin{aligned} {{\mathcal {W}}}_4= & {} \sum _{j\ne k}^{J} \Big \{ W_{q,p+1}(t_{j},t; h) W_{q,p+1}(t_{k},t; h) \nonumber \\&\times E \Big \{ \big [ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ] -\mu _{F}(t_j) \big ] \big [ \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ]-\mu _{F}(t_k) \big ] \Big \} \Big \} \nonumber \\\le & {} \sum _{j\ne k}^{J}\Big \{ \Big | W_{q,p+1}(t_{j},t; h) \, W_{q,p+1}(t_{k},t; h) \Big | \nonumber \\&\times \Big | Cov\big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ], \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ] \big \} \Big |\Big \} \nonumber \\\le & {} \frac{C_1}{J^2 h^{2q+2} g^2(t)} \sum _{j \ne k}^{J} \Big | K_{q,p+1}\Big ( \frac{t_j-t}{h} \Big ) \, K_{q,p+1} \Big ( \frac{t_k-t}{h} \Big ) \, \Big [ \frac{\rho _F (t_j, t_k |x)}{r(n_j, n_k, n_{jk})} \Big ] \Big |.\nonumber \\ \end{aligned}$$

(A.11)

The bounded support of $K(\cdot )$ implies that, for any $t$, $K_{q, p+1}\big [(t_j-t)/h \big ] \, K_{q, p+1}\big [(t_k-t)/h \big ]=0$ for any $j \ne k$ such that $\big | t_j -t_k \big | > ah$ for some constant $a>0$.

We now consider the following three situations:

(i) If $\big | t_j - t_k \big | \le \delta $, then, by Assumption A2, $Cov\big \{ \widetilde{F}_{t_j, \theta (t_{j}|x)} \big [ y(t_{j}) \big | x \big ], \widetilde{F}_{t_k, \theta (t_{k}|x)} \big [ y(t_{k}) \big | x \big ] \big \}=0$, so that ${{\mathcal {W}}}_4=0$, and, by (A.9), $Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3$.
(ii) If $\big | t_j - t_k \big | > \delta \ge ah$, then $K \big [(t_j-t)/h \big ] K \big [(t_k-t)/h \big ] =0$, so that, by (A.11), ${{\mathcal {W}}}_4=0$, and it still follows from (A.9) that $Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3$.
(iii) If $\delta < ah$ and $\delta \le \big | t_j - t_k \big | \le ah$, then, since $K_{q, p+1}(s)$ and $\rho _F (t_j, t_k|x)$ are bounded, there is a constant $C_2>0$ such that, by (A.11) and $\sum _{j =1}^{J} \sum _{k: \delta \le |t_k-t_j| \le ah} r^{-1}(n_j, n_k, n_{jk}) = o(Jh)$,
$$\begin{aligned} {{\mathcal {W}}}_4\le & {} \frac{C_2}{J^2 h^{2q+2} g^2(t)} \Big \{ \sum _{j =1}^{J} \sum _{k: \delta \le |t_k-t_j| \le ah} r^{-1}(n_j, n_k, n_{jk}) \Big \} \big [ 1+ o_p(1) \big ] \nonumber \\= & {} o_p \big [ \big ( nJh^{2q+1} \big )^{-1} \big ]. \end{aligned}$$
(A.12)

Then, by (A.9) and (A.12), $Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3 \big [1+ o_p(1) \big ]$. Hence, in all three situations (i), (ii) and (iii), $Var \big \{ \widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ] \big \}={{\mathcal {W}}}_3 \big [1+ o_p(1) \big ]$, and the asymptotic expression for the variance of $\widehat{F}_{t, \theta (t|x)}^{(q)} \big [ y(t) \big | x \big ]$ follows from (A.9)–(A.12).
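To make the objects in the proof concrete, the following is a minimal sketch of a two-step estimator of the kind described in the abstract. It is not the authors' implementation and rests on several simplifying assumptions: covariates $x$ are ignored, the working model at each time point is normal, the threshold $y$ is fixed over time, and the samples at different time points are simulated independently, so the cross-time covariance term ${{\mathcal {W}}}_4$ is zero by construction. All function names and simulation settings are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def raw_cdf_estimates(samples_by_time, y):
    """Step 1: at each t_j, fit N(mu_j, sigma_j^2) by maximum likelihood
    and plug the estimates into the normal CDF evaluated at y."""
    raw = []
    for s in samples_by_time:
        mu_hat, sd_hat = np.mean(s), np.std(s)  # ML estimates (ddof=0)
        raw.append(normal_cdf((y - mu_hat) / sd_hat))
    return np.array(raw)

def local_linear_smooth(t_grid, raw, t, h):
    """Step 2: local linear smoothing (p=1, q=0) of the raw estimates at t,
    using an Epanechnikov kernel and weighted least squares."""
    d = t_grid - t
    k = 0.75 * np.clip(1.0 - (d / h) ** 2, 0.0, None)
    X = np.column_stack([np.ones_like(d), d])
    sw = np.sqrt(k)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], raw * sw, rcond=None)
    return float(beta[0])  # intercept = smoothed CDF estimate at t

# Simulated example with smoothly varying mean and standard deviation.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 50)

def mu(t):
    return 100.0 + 10.0 * t

def sd(t):
    return 8.0 + 2.0 * t

samples = [rng.normal(mu(t), sd(t), size=200) for t in t_grid]
y0, t0, h = 110.0, 0.5, 0.15
raw = raw_cdf_estimates(samples, y0)
smoothed = local_linear_smooth(t_grid, raw, t0, h)
truth = normal_cdf((y0 - mu(t0)) / sd(t0))

if __name__ == "__main__":
    print("smoothed estimate:", smoothed, "true value:", truth)
```

In this setting the smoothed estimate pools information from neighbouring time points, which is why its variance scales with $\sum _j W_{q,p+1}^2(t_j,t;h)$ as in (A.10) rather than with the variance of a single raw estimate.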

About this article

Cite this article

Chowdhury, M., Wu, C. & Modarres, R. Nonparametric estimation of conditional distribution functions with longitudinal data and time-varying parametric models. Metrika 81, 61–83 (2018). https://doi.org/10.1007/s00184-017-0634-z

Received: 27 February 2017
Published: 09 November 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s00184-017-0634-z
