Functional Linear Partial Quantile Regression with Guaranteed Convergence for Neuroimaging Data Analysis


Abstract

Functional data such as curves and surfaces have become increasingly common with modern technological advancements. The use of functional predictors remains challenging due to their inherent infinite dimensionality. The common practice is to project functional data into a finite-dimensional space. The popular partial least squares method has been well studied for the functional linear model (Delaigle and Hall in Ann Stat 40(1):322–352, 2012). As an alternative, quantile regression provides a robust and more comprehensive picture of the conditional distribution of a response when it is non-normal, heavy-tailed, or contaminated by outliers. While partial quantile regression (PQR) has been proposed (Yu et al. in Neurocomputing 195:74–87, 2016), no theoretical guarantees were provided due to the iterative nature of the algorithm and the non-smoothness of the quantile loss function. To address these issues, we propose an alternative PQR formulation with guaranteed convergence. This novel formulation motivates new theories and allows us to establish asymptotic properties. Numerical studies on a benchmark dataset show the superiority of our new approach. We also apply our novel method to a functional magnetic resonance imaging dataset to predict attention deficit hyperactivity disorder and to a diffusion tensor imaging dataset to predict Alzheimer’s disease.


References

  1. Delaigle A, Hall P (2012) Methodology and theory for partial least squares applied to functional data. Ann Stat 40(1):322–352

  2. Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46(1):33–50

  3. Yu D, Kong L, Mizera I (2016) Partial functional linear quantile regression for neuroimaging data analysis. Neurocomputing 195:74–87

  4. Kato K (2012) Estimation in functional linear quantile regression. Ann Stat 40(6):3108–3136

  5. Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

  6. James GM, Wang J, Zhu J (2009) Functional linear regression that’s interpretable. Ann Stat 37(5):2083–2108

  7. Cardot H, Crambes C, Sarda P (2005) Quantile regression when the covariates are functions. Nonparametr Stat 17(7):841–856

  8. Sun Y (2005) Semiparametric efficient estimation of partially linear quantile regression models. Ann Econ Financ 6(1):105

  9. Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13(3):571–592

  10. Zhao Y, Ogden RT, Reiss PT (2012) Wavelet-based lasso in functional linear regression. J Comput Gr Stat 21(3):600–617

  11. Lu Y, Du J, Sun Z (2014) Functional partially linear quantile regression model. Metrika 77(2):317–332

  12. Tang Q, Cheng L (2014) Partial functional linear quantile regression. Sci China Math 57(12):2589–2608

  13. Hall P, Horowitz JL (2007) Methodology and convergence rates for functional linear regression. Ann Stat 35(1):70–91

  14. Lee ER, Park BU (2012) Sparse estimation in functional linear regression. J Multivar Anal 105(1):1–17

  15. Wold H (1975) Soft modeling by latent variables: the nonlinear iterative partial least squares approach. In: Perspectives in probability and statistics, papers in honour of M.S. Bartlett, pp 520–540

  16. Helland IS (1990) Partial least squares regression and statistical models. Scand J Stat 17(2):97–114

  17. Frank LE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35(2):109–135

  18. Nguyen DV, Rocke DM (2004) On partial least squares dimension reduction for microarray-based classification: a simulation study. Comput Stat Data Anal 46(3):407–425

  19. Abdi H (2010) Partial least squares regression and projection on latent structure regression (PLS regression). Wiley Interdiscip Rev Comput Stat 2(1):97–106

  20. Preda C, Saporta G (2005) PLS regression on a stochastic process. Comput Stat Data Anal 48(1):149–158

  21. Zhao Y, Chen H, Ogden RT (2015) Wavelet-based weighted lasso and screening approaches in functional linear regression. J Comput Gr Stat 24(3):655–675

  22. Wu Y, Liu Y (2009) Variable selection in quantile regression. Stat Sin 19(2):801

  23. Zheng S (2011) Gradient descent algorithms for quantile regression with smooth approximation. Int J Mach Learn Cybern 2(3):191–207

  24. Muggeo VM, Sciandra M, Augugliaro L (2012) Quantile regression via iterative least squares computations. J Stat Comput Simul 82(11):1557–1569

  25. Chen C (2012) A finite smoothing algorithm for quantile regression. J Comput Gr Stat 16(1):136–164

  26. Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Ann Stat 37(1):35–72

  27. Tu W, Liu P, Zhao J, Liu Y, Kong L, Li G, Jiang B, Tian G, Yao H (2019) M-estimation in low-rank matrix factorization: a general framework. In: 2019 IEEE international conference on data mining (ICDM), pp 568–577

  28. Zhu R, Niu D, Kong L, Li Z (2017) Expectile matrix factorization for skewed data analysis. In: Thirty-first AAAI conference on artificial intelligence

  29. Van der Vaart AW (2000) Asymptotic statistics, vol 3. Cambridge University Press, Cambridge

  30. Hjort NL, Pollard D (2011) Asymptotics for minimisers of convex processes. arXiv preprint arXiv:1107.3806

  31. Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Progr 103(1):127–152

  32. Li X, Xu D, Zhou H, Li L (2018) Tucker tensor regression and neuroimaging analysis. Stat Biosci 10:520–545

  33. Zhou H, Li L, Zhu H (2013) Tensor regression with applications in neuroimaging data analysis. J Am Stat Assoc 108(502):540–552

  34. Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Probab Lett 54(4):437–447

  35. Sánchez B, Lachos H, Labra V (2013) Likelihood-based inference for quantile regression using the asymmetric Laplace distribution. J Stat Comput Simul 81:1565–1578

  36. Rothenberg TJ (1971) Identification in parametric models. Econometrica 39(3):577–591

  37. De Leeuw J (1994) Block-relaxation algorithms in statistics. In: Bock H, Lenski W, Richter MM (eds) Information systems and data analysis. Studies in classification, data analysis, and knowledge organization. Springer, New York, pp 308–324

  38. Yu D, Zhang L, Mizera I, Jiang B, Kong L (2019) Sparse wavelet estimation in quantile regression with multiple functional predictors. Comput Stat Data Anal 136:12–29

  39. Wang Y, Kong L, Jiang B, Zhou X, Yu S, Zhang L, Heo G (2019) Wavelet-based lasso in functional linear quantile regression. J Stat Comput Simul 89(6):1111–1130

  40. Golubitsky M, Guillemin V (2012) Stable mappings and their singularities, vol 14. Springer, New York

  41. van der Vaart AW, Wellner JA (2000) Weak convergence. Springer, New York


Acknowledgements

Linglong Kong, Ivan Mizera, and Bei Jiang acknowledge funding support for this research from the Natural Sciences and Engineering Research Council of Canada (NSERC). Dengdeng Yu acknowledges funding support for this research from a start-up grant from the University of Texas at Arlington. Linglong Kong and Dengdeng Yu further acknowledge funding support from the Canadian Statistical Sciences Institute (CANSSI).


Corresponding author

Correspondence to Linglong Kong.

Appendix 1 Proofs and Verification


This appendix contains the detailed proofs omitted from the main paper.

1.1 Verification of Assumptions of Theorem 1

Let us first check assumption (i). For a given fixed N, \(\rho _{\tau \nu _N}\) is differentiable, so \(l_N(\varvec{\upsilon })\) is continuous. As \(||\varvec{\upsilon }||\rightarrow \infty\), that is, as \(||(\alpha ,\varvec{\beta })||\rightarrow \infty\) or \(||{\textbf{C}}||\rightarrow \infty\), the function \(l_N\) approaches \(-\infty\). Therefore, \(l_N\) is coercive.

To verify assumption (ii), observe that \(y_i - \alpha - {\textbf{x}}_i^T \varvec{\beta }- {\textbf{z}}_{i}^T {\textbf{C}}_{} {\textbf{1}}_K\) is an affine function of \({\textbf{C}}\). Since \(- \rho _\tau\) and its approximation \(-\rho _{\tau \nu _N}\) are strictly concave, both \(l_N(\varvec{\upsilon })\) and \(l(\varvec{\upsilon })\) are strictly concave in \({\textbf{C}}\). Since they are also strictly concave in \(\alpha\), we obtain strict concavity of \(l_N(\varvec{\upsilon })\) and \(l(\varvec{\upsilon })\) in \(\varvec{\upsilon }\).

Lastly, assumption (iii) ensures that a locally optimal point is isolated. One important property of isolated stationary points is that, if the Hessian matrix at a stationary point is nonsingular, then the stationary point is isolated [40]. Alternatively, we can impose Condition A1 of [5], which requires the distribution functions \(F_i\) to be absolutely continuous with continuous densities \(f_i\) uniformly bounded away from 0 and \(\infty\) at the points \(\xi _i(\tau ) = F_i^{-1}(\tau )\), where \(F_i\) is the conditional distribution function of \(y_i\) given \({\textbf{z}}_{i}\). Lemma 2 of [30] states that, if a sequence of convex functions \(l_N(\varvec{\upsilon })\) defined on an open convex set S in \({\mathbb {R}}^{Kd+p+1}\) converges pointwise to l, then \(\sup _{\varvec{\upsilon }\in {\textbf{K}}}|l_N(\varvec{\upsilon })-l(\varvec{\upsilon })|\) approaches zero for each compact subset \({\textbf{K}}\) of S. As long as the maximum is attained at a unique interior point of S, the maximizer of \(l_N\) approaches the maximizer of l.
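The verification above uses only that the smooth surrogate \(\rho _{\tau \nu _N}\) is differentiable, has bounded derivative, and converges to the check loss \(\rho _\tau\). The paper's exact smoothing is not reproduced here; the following minimal Python sketch, which assumes the log-exponential smoothing of [23] purely for illustration, checks these properties numerically.

```python
import numpy as np

def check_loss(u, tau):
    # Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (u < 0))

def smoothed_check_loss(u, tau, nu):
    # Illustrative smooth surrogate (log-exponential form, cf. [23]);
    # the paper's rho_{tau nu_N} may be defined differently.
    return tau * u + nu * np.logaddexp(0.0, -u / nu)

tau = 0.3
u = np.linspace(-5.0, 5.0, 2001)
for nu in (1.0, 0.1, 0.01):
    gap = np.max(np.abs(smoothed_check_loss(u, tau, nu) - check_loss(u, tau)))
    print(f"nu = {nu:5.2f}: sup|rho_tau_nu - rho_tau| = {gap:.4f}")  # shrinks like nu*log(2)

# The derivative tau - 1/(1 + exp(u/nu)) lies in [tau - 1, tau], so this surrogate
# is smooth and convex with |rho'_{tau nu}| <= max(tau, 1 - tau).
```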

1.2 Proof of Theorem 3

This result can be verified using Theorem 1 of [36], reproduced below as Lemma 6. The regularity assumptions of that lemma are satisfied by the current model since (1) the parameter space S is open, (2) the density \(p(y,{\textbf{z}}\mid {\textbf{C}})\) is proper for all \({\textbf{C}} \in S\), (3) the support of the density \(p(y,{\textbf{z}}\mid {\textbf{C}})\) is the same for all \({\textbf{C}} \in S\), (4) the log density \(l_N({\textbf{C}}\mid y,{\textbf{z}}) = \ln p(y,{\textbf{z}}\mid {\textbf{C}})\) is continuously differentiable, and (5) the information matrix \(I_N({\textbf{C}})\) is continuous in \({\textbf{C}}\) by Theorem 2. Then, by Lemma 6, \({\textbf{C}}\) is locally identifiable if and only if \(I_N({\textbf{C}})\) is nonsingular.

Lemma 6

([36], Theorem 1) Let \(\theta _0\) be a regular point of the information matrix \(I(\theta )\). Then \(\theta _0\) is locally identifiable if and only if \(I(\theta _0)\) is nonsingular.
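Lemma 6 reduces local identifiability to nonsingularity of the information matrix. The sketch below is a minimal numerical illustration of that check, assuming the same illustrative smoothing as above and hypothetical toy data (it is not the paper's estimation code): it approximates \(I_N({\textbf{C}})\) by the average outer product of per-observation scores and inspects its smallest eigenvalue.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function

def scores(y, Z, C, tau, nu):
    # Per-observation score of the working log density -rho_{tau nu}(y - z^T C):
    # d/dC = rho'_{tau nu}(y - z^T C) * z, with rho'(u) = tau - expit(-u / nu).
    r = y - Z @ C
    return (tau - expit(-r / nu))[:, None] * Z          # shape (n, d)

def empirical_information(y, Z, C, tau=0.5, nu=0.05):
    S = scores(y, Z, C, tau, nu)
    return S.T @ S / len(y)                             # approximates I_N(C)

# Hypothetical toy data for illustration only.
rng = np.random.default_rng(0)
n, d = 500, 4
Z = rng.normal(size=(n, d))
C_true = np.array([1.0, -0.5, 0.0, 2.0])
y = Z @ C_true + rng.standard_t(df=3, size=n)

I_hat = empirical_information(y, Z, C_true)
print("smallest eigenvalue of I_N(C):", np.linalg.eigvalsh(I_hat).min())
# A strictly positive smallest eigenvalue signals a nonsingular information
# matrix and hence local identifiability in the sense of Lemma 6.
```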

1.3 Proof of Theorem 4

Proof

For simplicity, we omit \(\alpha\) and \(\varvec{\beta }\); the conclusions generalize easily to the case that includes them. We want to establish the consistency of the estimated factor matrix \(\hat{{\textbf{C}}}_n\). The following well-known theorem is our main tool for establishing consistency.

Lemma 7

([29], Theorem 5.7) Let \(M_n\) be random functions and let M be a fixed function of \(\theta\) such that for every \(\epsilon > 0\)

$$\begin{aligned} \sup \limits _{\theta \in \nu } \left| M_n(\theta ) - M(\theta )\right| &\rightarrow 0 \quad \text {in probability},\\ \sup \limits _{\theta : d(\theta ,\theta _0) \ge \epsilon } M(\theta ) &< M(\theta _0). \end{aligned}$$

Then any sequence of estimators \({\hat{\theta }}_n\) with \(M_n({\hat{\theta }}_n) \ge M_n(\theta _0) - o_P(1)\) converges in probability to \(\theta _0\).

To apply Lemma 7 in our setting, we take the nonrandom function M to be \({\textbf{C}} \mapsto {\mathbb {P}}_{{\textbf{C}}_0} \left[ l_N (Y,Z\mid {\textbf{C}})\right]\) and the sequence of random functions to be \(M_n: {\textbf{C}} \mapsto \frac{1}{n} \sum _{i=1}^n l_N (y_i,z_i\mid {\textbf{C}}) = {\mathbb {P}}_n\, l_N(\cdot \mid {\textbf{C}})\), where \({\mathbb {P}}_n\) denotes the empirical measure under \({\textbf{C}}_0\). Then \(M_n\) converges to M a.s. by the strong law of large numbers. The second condition requires that \({\textbf{C}}_0\) is a well-separated maximum of M; this is guaranteed by the (global) identifiability of \({\textbf{C}}_0\) and the information inequality. The first condition, uniform convergence, is most easily verified by Glivenko–Cantelli theory [29].
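To make the role of \(M_n\) concrete, the short sketch below maximizes the empirical criterion for growing n and watches the estimate approach the truth, as Lemma 7 predicts. It again uses the illustrative log-exponential smoothing and synthetic data; none of the names or settings here come from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def neg_Mn(C, y, Z, tau=0.5, nu=0.05):
    # Negative empirical criterion -M_n(C): the average smoothed check loss
    # (illustrative surrogate, not the paper's exact rho_{tau nu_N}).
    r = y - Z @ C
    return np.mean(tau * r + nu * np.logaddexp(0.0, -r / nu))

rng = np.random.default_rng(1)
d = 4
C0 = np.array([1.0, -0.5, 0.0, 2.0])                    # hypothetical truth
for n in (100, 1000, 10000):
    Z = rng.normal(size=(n, d))
    # Symmetric t errors: the conditional median is Z @ C0, so tau = 0.5 targets C0.
    y = Z @ C0 + rng.standard_t(df=3, size=n)
    C_hat = minimize(neg_Mn, np.zeros(d), args=(y, Z), method="BFGS").x
    print(f"n = {n:6d}: ||C_hat - C0|| = {np.linalg.norm(C_hat - C0):.3f}")
```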

The density is \(p_{{\textbf{C}}} (y \mid {\textbf{z}}) = const \cdot \exp \left[ - \rho _{\tau \nu _N} \left( y-\eta ({\textbf{C}},{\textbf{z}})\right) \right]\), where \(\eta ({\textbf{C}},{\textbf{z}}) = \langle {\textbf{C}}, {\textbf{z}}\rangle\). Take \(m_{{\textbf{C}}}=\ln \left[ (p_{{\textbf{C}}}+p_{{\textbf{C}}_0})/2\right]\). First we show that \({\textbf{C}}_0\) is a well-separated maximum of the function \(M({\textbf{C}}):={\mathbb {P}}_{{\textbf{C}}_0} m_{{\textbf{C}}}\). The global identifiability of \({\textbf{C}}_0\) and the information inequality guarantee that \({\textbf{C}}_0\) is the unique maximizer of M. To show that it is a well-separated maximum, we need to verify that \(M({\textbf{C}}_k) \rightarrow M({\textbf{C}}_0)\) implies \({\textbf{C}}_k \rightarrow {\textbf{C}}_0\).

Suppose \(M({\textbf{C}}_k) \rightarrow M({\textbf{C}}_0)\); then \(\langle {\textbf{C}}_k, {\textbf{Z}} \rangle \rightarrow \langle {\textbf{C}}_0, {\textbf{Z}} \rangle\) in probability. If the \({\textbf{C}}_k\) are bounded, then \({\textbf{E}} \left[ \langle {\textbf{C}}_k-{\textbf{C}}_0, {\textbf{Z}} \rangle ^2\right] \rightarrow 0\) and \({\textbf{C}}_k \rightarrow {\textbf{C}}_0\) by the nonsingularity of \({\textbf{E}}\left[ (\textrm{vec}{\textbf{Z}})(\textrm{vec}{\textbf{Z}})^T\right]\). On the other hand, the \({\textbf{C}}_k\) cannot escape to infinity: if they did, then \(\langle {\textbf{C}}_k, {\textbf{Z}} \rangle / \left| {\textbf{C}}_k \right| \rightarrow 0\) in probability, which in turn implies that \({\textbf{C}}_k / \left| {\textbf{C}}_k\right| \rightarrow {\textbf{0}}\), a contradiction since \({\textbf{C}}_k / \left| {\textbf{C}}_k\right|\) always has unit norm.

For the uniform convergence, note that the class of functions \(\{ \langle {\textbf{C}},{\textbf{Z}}\rangle , {\textbf{C}} \in S \}\) forms a VC class: it is contained in a finite-dimensional vector space of polynomials of degree 1, so the VC vector-space argument applies [41, 2.6.15]. This implies that \(\{ \eta (\langle {\textbf{C}},{\textbf{Z}}\rangle ), {\textbf{C}} \in S \}\) is a VC class since \(\eta\) is a monotone function [41, 2.6.18].

Now \(m_{{\textbf{C}}}\) is Lipschitz in \(\eta\) since

$$\begin{aligned} \frac{\partial m_{{\textbf{C}}}}{\partial \eta } &= \frac{const \cdot \exp \left[ - \rho _{\tau \nu _N} \left( y-\eta \right) \right] \cdot \rho ^\prime _{\tau \nu _N} \left( y-\eta \right) }{const \cdot \exp \left[ - \rho _{\tau \nu _N} \left( y-\eta \right) \right] + const \cdot \exp \left[ - \rho _{\tau \nu _N} \left( y-\eta _0\right) \right] }\\ &= \frac{ \rho ^\prime _{\tau \nu _N} \left( y-\eta \right) }{1+ \exp \left[ \rho _{\tau \nu _N} \left( y-\eta \right) - \rho _{\tau \nu _N} \left( y-\eta _0\right) \right] } \le \sup \left| \rho ^\prime _{\tau \nu _N} (\cdot ) \right| = const. \end{aligned}$$

The last equality holds since \(\rho _{\tau \nu _N}(u) \rightarrow \rho _\tau (u)\) as \(N \rightarrow \infty\), which also implies that \(\rho ^\prime _{\tau \nu _N}(u) \rightarrow \rho ^\prime _{\tau }(u)\) as \(N \rightarrow \infty\) except at \(u=0\), and we know that \(\rho ^\prime _{\tau \nu _N}(0)=0\). Similarly, we can show that \(m_{{\textbf{C}}}\) is Lipschitz in \(\eta _0\). A Lipschitz composition of a Donsker class is still a Donsker class [29, 19.20]. Therefore, \(\left\{ {\textbf{C}} \mapsto m_{\textbf{C}} \right\}\) is a bounded Donsker class with the trivial envelope function 1. A Donsker class is certainly a Glivenko–Cantelli class. Finally, the Glivenko–Cantelli theorem establishes the uniform convergence condition required by Lemma 7.
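For concreteness, under the illustrative log-exponential smoothing used in the numerical sketches above (an assumption; the paper’s \(\rho _{\tau \nu _N}\) may be defined differently), the boundedness of the derivative can be made explicit:

$$\begin{aligned} \rho _{\tau \nu }(u) = \tau u + \nu \ln \left( 1+e^{-u/\nu }\right) , \qquad \rho ^\prime _{\tau \nu }(u) = \tau - \frac{1}{1+e^{u/\nu }} \in [\tau -1,\tau ], \end{aligned}$$

so that \(\sup _u \left| \rho ^\prime _{\tau \nu }(u)\right| \le \max (\tau , 1-\tau )\), a finite constant as required.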

When the parameter is restricted to a compact set, \(\eta (\langle {\textbf{C}}, {\textbf{Z}}\rangle )\) is confined to a bounded interval and \(l_N\) is Lipschitz on that interval. It follows that \(\{ l_N({\textbf{C}}) = l_N \circ \eta \circ \langle {\textbf{C}}, {\textbf{Z}}\rangle , {\textbf{C}} \in S\}\) is a Donsker class, as composition with a monotone or Lipschitz function preserves the Donsker property. Therefore, the Glivenko–Cantelli theorem establishes the uniform convergence. Compactness of the parameter space implies that \({\textbf{C}}_0\) is a well-separated maximum if it is the unique maximizer of \(M({\textbf{C}}) = {\mathbb {P}}_{{\textbf{C}}_0} m_{{\textbf{C}}}\) [29, Exercise 5.27]. Uniqueness is guaranteed by the information inequality whenever \({\textbf{C}}_0\) is identifiable. This establishes the consistency for quantile regression. \(\square\)

Lemma 8

The tensor quantile linear regression model (3) is quadratic mean differentiable (q.m.d.).

Proof

By a well-known result [29, Lemma 7.6], it suffices to verify that the density is continuously differentiable in the parameter for \(\mu\)-almost all x and that the Fisher information matrix exists and is continuous. The derivative of the log density is

$$\begin{aligned} \nabla l_N({\textbf{C}}) = - \sum _{i=1}^{n} \rho ^\prime _{\tau \nu _N}\left( \eta _i({\textbf{C}}) \right) \cdot \nabla \eta _i({\textbf{C}}), \end{aligned}$$

which is well defined and continuous by Proposition 2. The same proposition shows that the information matrix exists and is continuous. Therefore, the tensor quantile linear regression model is q.m.d. \(\square\)

1.4 Proof of Theorem 5

Proof

The following result relates asymptotic normality to densities satisfying q.m.d.

Lemma 9

([29], Theorem 5.39) Suppose the model is differentiable in quadratic mean at an inner point \(\theta _0\) of \(\nu \subset {\textbf{R}}^k\). Furthermore, suppose that there exists a measurable function \({\dot{l}}\) with \({\textbf{P}}_{\theta _0} {\dot{l}}^2 < \infty\) such that, for every \(\theta _1\) and \(\theta _2\) in a neighborhood of \(\theta _0\),

$$\begin{aligned} \left| \ln p_{\theta _1}(x) - \ln p_{\theta _2} (x)\right| \le {\dot{l}} (x) \left| \theta _1 -\theta _2 \right| . \end{aligned}$$

If the Fisher information matrix \(I_{\theta _0}\) is nonsingular and \({\hat{\theta }}_n\) is consistent, then

$$\begin{aligned} \sqrt{n} ({\hat{\theta }}_n-\theta _0) = I^{-1}_{\theta _0} \frac{1}{\sqrt{n}} \sum _{i=1}^{n} {\dot{l}}_{\theta _0} (X_i) + o_{P_{\theta _0}}(1). \end{aligned}$$

In particular, the sequence \(\sqrt{n} ({\hat{\theta }}_n -\theta _0)\) is asymptotically normal with mean zero and covariance matrix \(I^{-1}_{\theta _0}\).

Lemma 8 shows that the tensor quantile linear regression model is q.m.d. By Theorem 2 and the chain rule, the score function

$$\begin{aligned} {\dot{l}}_N({\textbf{C}}) = - \sum _{i=1}^{n} \rho ^\prime _{\tau \nu _N}\left( \eta _i({\textbf{C}}) \right) \cdot \nabla \eta _i({\textbf{C}}) \end{aligned}$$

is uniformly bounded in y and \({\textbf{x}}\) and continuous in \({\textbf{C}}\) for every y and \({\textbf{x}}\), with \({\textbf{C}}\) ranging over a compact subset of \(S_0\). For a sufficiently small neighborhood U of \(S_0\), \(\sup _U \left| {\dot{l}}_{N}({\textbf{C}}) \right|\) is square-integrable. Thus, the local Lipschitz condition is satisfied and Lemma 9 applies. \(\square\)
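As an informal complement to Lemma 9, the following sketch repeats the estimation many times and inspects the Monte Carlo distribution of \(\sqrt{n}(\hat{{\textbf{C}}}_n-{\textbf{C}}_0)\), which should look approximately centred and symmetric. It uses the same illustrative smoothing and synthetic data as the earlier sketches; all settings are hypothetical and the exact limiting covariance is not computed.

```python
import numpy as np
from scipy.optimize import minimize

def neg_lN(C, y, Z, tau=0.5, nu=0.05):
    # Negative smoothed objective (illustrative surrogate, as in earlier sketches).
    r = y - Z @ C
    return np.mean(tau * r + nu * np.logaddexp(0.0, -r / nu))

rng = np.random.default_rng(2)
n, d, B = 400, 3, 300
C0 = np.array([1.0, -0.5, 2.0])                          # hypothetical truth
draws = np.empty((B, d))
for b in range(B):
    Z = rng.normal(size=(n, d))
    y = Z @ C0 + rng.standard_t(df=3, size=n)             # symmetric errors, tau = 0.5
    draws[b] = np.sqrt(n) * (minimize(neg_lN, np.zeros(d), args=(y, Z)).x - C0)

q = np.quantile(draws[:, 0], [0.025, 0.5, 0.975])
print("first coordinate of sqrt(n)(C_hat - C0):")
print("  Monte Carlo mean:", round(draws[:, 0].mean(), 3))
print("  2.5% / 50% / 97.5% quantiles:", np.round(q, 3))
# A mean near zero and roughly symmetric quantiles are consistent with the
# root-n Gaussian limit asserted by Lemma 9.
```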


Cite this article

Yu, D., Pietrosanu, M., Mizera, I. et al. Functional Linear Partial Quantile Regression with Guaranteed Convergence for Neuroimaging Data Analysis. Stat Biosci (2024). https://doi.org/10.1007/s12561-023-09412-7
