
On principal components regression with Hilbertian predictors

Annals of the Institute of Statistical Mathematics

Abstract

We demonstrate that, in a regression setting with a Hilbertian predictor, a response variable is more likely to be more highly correlated with the leading principal components of the predictor than with trailing ones. This is despite the extraction procedure being unsupervised. Our results are established under the conditional independence model, which includes linear regression and single-index models as special cases, with some assumptions on the regression vector. These results are a generalisation of earlier work which showed that this phenomenon holds for predictors which are real random vectors. A simulation study is used to quantify the phenomenon.

References

  • Arnold, B. C., Brockett, P. L. (1992). On distributions whose component ratios are Cauchy. American Statistician, 46(1), 25–26.

  • Artemiou, A., Li, B. (2009). On principal components regression: A statistical explanation of a natural phenomenon. Statistica Sinica, 19, 1557–1565.

  • Artemiou, A., Li, B. (2013). Predictive power of principal components for single-index model and sufficient dimension reduction. Journal of Multivariate Analysis, 119, 176–184.

  • Cook, R. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1), 1–26.

  • Cox, D. R. (1968). Notes on some aspects of regression analysis. Journal of the Royal Statistical Society Series A (General), 131(3), 265–279.

  • Dauxois, J., Ferré, L., Yao, A.-F. (2001). Un modèle semi-paramétrique pour variables aléatoires hilbertiennes. Comptes Rendus de l'Académie des Sciences, 333(1), 947–952.

  • Ferré, L., Yao, A. F. (2003). Functional sliced inverse regression analysis. Statistics, 37(6), 475–488.

  • Hall, P., Yang, Y. J. (2010). Ordering and selecting components in multivariate or functional data linear prediction. Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(1), 93–110.

  • Hsing, T., Eubank, R. (2015). Theoretical foundations of functional data analysis, with an introduction to linear operators (1st ed.). West Sussex: Wiley.

  • Kingman, J. F. C. (1972). On random sequences with spherical symmetry. Biometrika, 59(2), 492.

  • Li, B. (2007). Comment: Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1), 32–35.

  • Li, B. (2018). Sufficient dimension reduction: Methods and applications with R (1st ed.). Boca Raton: CRC Press.

  • Li, B., Song, J. (2017). Nonlinear sufficient dimension reduction for functional data. The Annals of Statistics, 45(3), 1059–1095.

  • Li, Y. (2007). A note on Hilbertian elliptically contoured distributions. Unpublished manuscript, Department of Statistics, University of Georgia.

  • Ni, L. (2011). Principal component regression revisited. Statistica Sinica, 21, 741–747.

  • Pinelis, I., Molzon, R. (2016). Optimal-order bounds on the rate of convergence to normality in the multivariate delta method. Electronic Journal of Statistics, 10(1), 1001–1063.

  • Ramsay, J., Silverman, B. W. (1997). Functional data analysis (1st ed.). New York: Springer.


Acknowledgements

We would like to thank the Editor, Associate Editor and two reviewers for their constructive comments and suggestions which helped improve an earlier version of the manuscript.

Author information

Corresponding author

Correspondence to Ben Jones.

Appendices

Appendix A: Essential definitions

For the benefit of the reader, we present here some fundamental definitions in functional data analysis. These definitions can be found in Hsing and Eubank (2015), along with a deeper exposition of the field. We first define random variables and nuclear operators in and on a Hilbert space, respectively. Although our interest lies in the case where the random variables are random functions, the definitions are given for the more general setting of Hilbertian random variables. We note that this more abstract framework includes function spaces where the functions need not be univariate, so this paper applies to, say, predictors which are random fields. This work is therefore relevant to a number of fields, including fMRI data analysis, spatial statistics, image processing and speech recognition.

Definition 2

Let \({{\left( \varOmega , \mathfrak {F}, {{\mathbb {P}}}\right) }}\) be a probability space and \({{\left( {{{\mathcal {H}}}},\ {{\mathcal {B}}}{\left( {{{\mathcal {H}}}}\right) }\right) }}\) be a measurable space, where \({{\mathcal {H}}}\) is a Hilbert space and \({{\mathcal {B}}}{\left( {{\mathcal {H}}}\right) }\) is its associated Borel \(\sigma \)-field. A measurable function \({{{X}}: {{{\left( \varOmega , \mathfrak {F}, {{\mathbb {P}}}\right) }}} \rightarrow {{{\left( {{{\mathcal {H}}}},\ {{\mathcal {B}}}{\left( {{{\mathcal {H}}}}\right) }\right) }}}}\) is called an \({{\mathcal {H}}}\)-valued random variable. We also say that \({X}\) is a Hilbertian random variable.

Definition 3

Let \({{\mathcal {H}}}\) be a Hilbert space. A compact operator \({{L}: {{{\mathcal {H}}}} \rightarrow {{{\mathcal {H}}}}}\), that is, one which is the operator-norm limit of a sequence of finite-rank operators, is said to be a nuclear operator if the sum of its eigenvalues is finite.

Remark 4

The class of nuclear operators on a Hilbert space contains the class of all operators which have finitely many nonzero eigenvalues.
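
As a minimal numerical sketch (in Python with NumPy; the kernel choice and grid size are only for illustration), one can discretise the covariance operator of standard Brownian motion on \([0,1]\), whose kernel is \(\min (s,t)\), and check that its eigenvalues are summable, so that this covariance operator is nuclear:

```python
import numpy as np

# Midpoint discretisation of the Brownian-motion covariance kernel K(s, t) = min(s, t) on [0, 1].
n = 500
t = (np.arange(1, n + 1) - 0.5) / n
K = np.minimum.outer(t, t)

# Dividing by n approximates the integral operator; its eigenvalue sum approximates the trace.
eigvals = np.sort(np.linalg.eigvalsh(K / n))[::-1]
print("eigenvalue sum:", eigvals.sum())        # close to the trace, integral of min(t, t) over [0, 1] = 1/2
print("leading eigenvalues:", eigvals[:3])     # close to 1 / ((k - 1/2) * pi)^2 for k = 1, 2, 3
```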

The expectation of a Hilbertian random variable is defined in terms of the Bochner integral—the construction is given in Hsing and Eubank (2015) and is similar to that for the Lebesgue integral so we will not present it here. For our purposes, it is enough to note that for a Hilbertian random variable a, the expectation \({{{\mathbb {E}}}{\left( {a}\right) }}\) is unique, an element of the space \({{\mathcal {H}}}\), and satisfies

$$\begin{aligned} \forall b \in {{\mathcal {H}}}, \ {{{\mathbb {E}}}{\left( {{{\left\langle {b}, {a} \right\rangle _{{{\mathcal {H}}}}}}}\right) }} = {{\left\langle {b}, {{{{\mathbb {E}}}{\left( {a}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}. \end{aligned}$$
(5)

Remark 5

Observe that the expectation on the left-hand side is the expectation of a real random variable, whereas the expectation on the right-hand side is the expectation of an \({{\mathcal {H}}}\)-valued random variable.
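
A minimal numerical sketch of property (5), assuming the random functions are represented by their values on a regular grid so that the inner product on \({{\mathcal {H}}}\) is approximated by a weighted Euclidean one (the mean function, the grid and the seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_samples = 200, 10000
t = np.linspace(0, 1, n_grid)
w = 1.0 / n_grid                                   # quadrature weight: <f, h>_H ~ sum(f * h) * w

# Random functions a with mean function t^2 plus a rough Brownian-like fluctuation; b is a fixed function.
a = t**2 + np.cumsum(rng.standard_normal((n_samples, n_grid)), axis=1) * np.sqrt(w)
b = np.sin(2 * np.pi * t)

lhs = np.mean(a @ b * w)                           # E <b, a>_H, estimated by the sample mean
rhs = (a.mean(axis=0) @ b) * w                     # <b, E(a)>_H, with E(a) estimated pointwise
print(lhs, rhs)                                    # agree up to floating point: (5) is a linearity property
```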

We will also require a generalisation of the notion of variance for a Hilbertian random variable, but first we define a tensor product operation.

Definition 4

Let \(x_{1}, x_{2}\) be elements of Hilbert spaces \({{\mathcal {H}}}_{1}\) and \({{\mathcal {H}}}_{2}\), respectively. The tensor product operator \({{{\left( x_{1} \otimes _{1} x_{2}\right) }}: {{{\mathcal {H}}}_{1}} \rightarrow {{{\mathcal {H}}}_{2}}}\) is defined by

$$\begin{aligned} {\left( x_{1} \otimes _{1} x_{2}\right) }y = {\left\langle {x_{1}}, {y} \right\rangle _{{{\mathcal {H}}}_{1}}}x_{2} \end{aligned}$$

for \(y \in {{\mathcal {H}}}_{1}\). If \({{\mathcal {H}}}_{1} = {{\mathcal {H}}}_{2}\), we use \(\otimes \) instead of \(\otimes _{1}\).

In the case where \({{\mathcal {H}}}_{1} = {{\mathcal {H}}}_{2} = {{\mathbb {R}}}^{p}\), we have that \({{x_{1}} \otimes {x_{2}}} = x_{2} {{x_{1}}^{{\mathrm{T}}}}\) so the usual covariance matrix can be written as

$$\begin{aligned} {{{\mathbb {E}}}{\left( {{{{{\left( {{{\mathbf {X}}}} - {{{\mathbb {E}}}{\left( {{{{\mathbf {X}}}}}\right) }}\right) }}} \otimes {{{\left( {{{\mathbf {X}}}} - {{{\mathbb {E}}}{\left( {{{{\mathbf {X}}}}}\right) }}\right) }}}}}\right) }}. \end{aligned}$$

This notation will also be used for a covariance operator, but with \({X}\) being an \({{\mathcal {H}}}\)-valued random variable. We note that all covariance operators on a Hilbert space \({{\mathcal {H}}}\) are compact, nonnegative definite, and self-adjoint. Proofs of these facts can be found in Pinelis and Molzon (2016).
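
A minimal finite-dimensional sketch, taking \({{\mathcal {H}}}_{1} = {{\mathcal {H}}}_{2} = {{\mathbb {R}}}^{p}\) with the Euclidean inner product (dimensions and sample size chosen only for illustration), confirms both identities above:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
x1, x2, y = rng.standard_normal((3, p))

# (x1 tensor x2) y = <x1, y> x2, which in R^p is the matrix x2 x1^T applied to y.
print(np.allclose(np.dot(x1, y) * x2, np.outer(x2, x1) @ y))        # True

# The covariance operator E[(X - EX) tensor (X - EX)] as an average of centred tensor products.
X = rng.standard_normal((2000, p)) @ rng.standard_normal((p, p))
Xc = X - X.mean(axis=0)
cov_op = sum(np.outer(x, x) for x in Xc) / len(Xc)
print(np.allclose(cov_op, np.cov(X.T, bias=True)))                  # matches the usual covariance matrix
```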

Assuming that the covariance operator of some predictor \({X}\) is nuclear gives meaning to the phrase “PCA captures most of the variability in the data” for the infinite-dimensional setting. This is because it supplies a notion of how much variance there is in total.
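
As a minimal illustration, assuming Brownian-like functional data observed on a regular grid (a simulation design of our own, chosen only to make the point), the eigenvalues of the discretised covariance operator have a finite sum, so the fraction of total variance captured by the leading principal components is well defined:

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_grid = 400, 100

# Simulate rough Brownian-like functional observations on a regular grid of [0, 1].
X = np.cumsum(rng.standard_normal((n_obs, n_grid)), axis=1) / np.sqrt(n_grid)

# Functional PCA via the discretised covariance operator; 1/n_grid plays the role of the quadrature weight.
Xc = X - X.mean(axis=0)
cov = (Xc.T @ Xc) / (n_obs * n_grid)
lam = np.sort(np.linalg.eigvalsh(cov))[::-1]

# A finite eigenvalue sum (nuclear covariance) is what makes these fractions meaningful.
print(np.cumsum(lam[:5]) / lam.sum())              # the leading components carry most of the variance
```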

The notion of a spherical distribution was central to the work of Artemiou and Li (2013). In the case of data in an infinite-dimensional space, this notion cannot be generalised, as explained below, but the idea of an elliptical distribution can. We will thus make use of this concept instead. The following definition is given by Li (2007).

Definition 5

A Hilbertian random variable A, in a Hilbert space \({{\mathcal {H}}}\), has an elliptically symmetric distribution if the characteristic function of \(A - {{{\mathbb {E}}}{\left( {A}\right) }}\) has the following form:

$$\begin{aligned} \psi {\left( f\right) } = \varphi {\left( {{\left\langle {f}, {\varPsi f} \right\rangle _{{{\mathcal {H}}}}}}\right) } \end{aligned}$$

for all \(f \in {{\mathcal {H}}}\), where \(\varPsi \) is a self-adjoint, nonnegative definite, nuclear operator on \({{\mathcal {H}}}\), and \(\varphi \) is a univariate function.

We note that, in the infinite-dimensional Hilbert space setting, \(\varPsi \) in Definition 5 cannot be the identity operator, as the identity is noncompact and thus not nuclear. It can be shown that \(\varPsi \) is, up to multiplication by a constant, the covariance operator (when it exists) of the Hilbertian random variable; the requirement that the sum of the eigenvalues of \(\varPsi \) is finite is then equivalent to the sum of the variances of the principal components of A being finite. We conclude that we cannot extend the notion of a spherically symmetric distribution to the entirety of an infinite-dimensional space. Note that we can have sphericity in a finite-dimensional subspace.
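
A minimal sketch of one way to simulate an elliptically, but not spherically, distributed Hilbertian random variable: assuming \(\varPsi \) is taken to be the discretised Brownian-motion covariance and the scale is an independent positive random variable, a scale mixture of a centred Gaussian process has a characteristic function of exactly the form in Definition 5.

```python
import numpy as np

rng = np.random.default_rng(3)
n_grid = 200
t = (np.arange(1, n_grid + 1) - 0.5) / n_grid
Psi = np.minimum.outer(t, t) / n_grid                  # discretised nuclear covariance (Brownian kernel)
L = np.linalg.cholesky(Psi + 1e-12 * np.eye(n_grid))   # small jitter only for numerical stability

def elliptical_sample(n):
    """Scale mixture of a centred Gaussian process: elliptical with operator Psi, but not spherical."""
    Z = rng.standard_normal((n, n_grid)) @ L.T         # Gaussian paths with covariance Psi
    R = np.sqrt(rng.chisquare(df=5, size=n) / 5)       # independent positive scale factor with E(R^2) = 1
    return Z * R[:, None]

A = elliptical_sample(1000)
print(A.shape, np.allclose(np.cov(A.T, bias=True), Psi, atol=0.05))   # sample covariance is close to Psi
```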

Appendix B: Proofs

Lemma 1: Define \(\varPhi \) as an operator on \({{\mathcal {H}}}\) by \(\varPhi {\left( x\right) } = {{\left\langle {{g}}, {x} \right\rangle _{{{\mathcal {H}}}}}}\). This operator takes a fixed \(x \in {{\mathcal {H}}}\) and returns a real random variable, so it is a random operator. By the Riesz representation theorem, this random operator can be identified with a random element of the dual space \({{{{{\mathcal {H}}}}^{*}}}\) (that is, the space of all continuous linear functionals from \({{\mathcal {H}}}\) into the base field), so there is a unique random adjoint operator \({{{\varPhi }^{*}}}\) such that for all fixed \(x \in {{\mathcal {H}}}\) and \(y \in {{\mathbb {R}}}\), \(\varPhi {\left( x\right) }y = {{\left\langle {x}, {{{{\varPhi }^{*}}}{\left( y\right) }} \right\rangle _{{{\mathcal {H}}}}}}\). It is easy to see that for any fixed \(y \in {{\mathbb {R}}}\), \({{{\varPhi }^{*}}}{\left( y\right) } = y{g}\). We show that, almost surely, \({{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{\varPhi {{\left( X\right) }}, {{g}, {{\varvec{\Gamma }}}}}\right) }}\) is orthogonal to \({{\mathrm {Span}}{\left( {{{\varvec{\Gamma }}}{g}}\right) }}^{\perp }\). For convenience, let \(T\) be the tuple \({\left( \varPhi {{\left( X\right) }}, {{g}, {{\varvec{\Gamma }}}}\right) }\). Let \(x \in {{\mathrm {Span}}{\left( {{{\varvec{\Gamma }}}{g}}\right) }}^{\perp }\), which is a random variable in \({{\mathcal {H}}}\). Then we have the following:

$$\begin{aligned} \forall z \in {{\mathrm {Span}}{\left( {{{\varvec{\Gamma }}}{g}}\right) }}, {{\left\langle {x}, {z} \right\rangle _{{{\mathcal {H}}}}}} = 0 \implies \forall y \in {{\mathbb {R}}}, {{\left\langle {x}, {y {{\varvec{\Gamma }}}{g}} \right\rangle _{{{\mathcal {H}}}}}} = 0 \end{aligned}$$

which implies that for any fixed \(y \in {{\mathbb {R}}}\)

$$\begin{aligned} {{\left\langle {x}, {y {{\varvec{\Gamma }}}{g}} \right\rangle _{{{\mathcal {H}}}}}} = {{\left\langle {x}, {{{\varvec{\Gamma }}}{\left( y{g}\right) }} \right\rangle _{{{\mathcal {H}}}}}} = {{\left\langle {{{\varvec{\Gamma }}}x}, {y{g}} \right\rangle _{{{\mathcal {H}}}}}} = {{\left\langle {{{\varvec{\Gamma }}}x}, {{{{\varPhi }^{*}}}{\left( y\right) }} \right\rangle _{{{\mathcal {H}}}}}} = \varPhi {\left( {{{\varvec{\Gamma }}}x}\right) }y = 0 \end{aligned}$$

where the first and second equalities follow from the linearity and self-adjointness of \({{\varvec{\Gamma }}}\). The above now implies that \(\varPhi {\left( {{\varvec{\Gamma }}}x\right) } = 0\) and therefore \({{\varvec{\Gamma }}}x \in {{\mathrm {Ker}}{\left( {\varPhi }\right) }}\).

Consider now \({{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}^{2}}\right) }}\). Showing this to be 0 gives the result: it is the expectation of a squared random variable, so it vanishes only if \({{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}} = 0\) almost surely.

$$\begin{aligned} {{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}^{2}}\right) }}&= {{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}} {{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}}\right) }}\\&= {{{\mathbb {E}}}{\left( { {{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }} {{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}}\right) }}\\&= {{{\mathbb {E}}}{\left( { {{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}} {{\left\langle {x}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}}\right) }}\\&= {{{\mathbb {E}}}{\left( { {{{\mathbb {E}}}{\left( { {{\left\langle {x}, {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}}\right) }}\\&= {{{\mathbb {E}}}{\left( { {{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}}\right) }}\\ \end{aligned}$$

where the second equality follows from Eq. 5; the third and fourth equalities follow by moving the second inner product into the expectation; the fifth equality uses Eq. 5 again. Now by Assumption 4, there is a real constant A such that \({{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }} = A\varPhi {{\left( X\right) }}\).

Therefore,

$$\begin{aligned} {{{\mathbb {E}}}{\left( { {{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}}\right) }}&= {{{\mathbb {E}}}{\left( {{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {A\varPhi {{\left( X\right) }}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}}\right) }} \\&= A{{{\mathbb {E}}}{\left( {{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {\varPhi {{\left( X\right) }}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }}}\right) }} \\&= A{{{\mathbb {E}}}{\left( {{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }}}\right) }}\\&= A{{{\mathbb {E}}}{\left( {{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {{\varvec{\Gamma }}}}}\right) }}}\right) }}\\&= A{{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{{{\varvec{\Gamma }}}}\right) }} \\&= A{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}{X}}{\left. \vert \right. }{{{\varvec{\Gamma }}}}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}\\&= A{{\left\langle {x}, {{{\varvec{\Gamma }}}{g}} \right\rangle _{{{\mathcal {H}}}}}} = A {{\left\langle {{{\varvec{\Gamma }}}x}, {g} \right\rangle _{{{\mathcal {H}}}}}} = 0. \end{aligned}$$

\(\square \)
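
A minimal finite-dimensional sanity check of Lemma 1 (our own sketch, not part of the proof), assuming \({X}\) is Gaussian with covariance \({{\varvec{\Gamma }}}\) so that Assumption 4 holds and the conditional expectation given \(\varPhi {{\left( X\right) }}\) is linear:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 6
A = rng.standard_normal((p, p))
Gamma = A @ A.T                                   # covariance of X (held fixed here for simplicity)
g = rng.standard_normal(p)                        # a fixed direction playing the role of g

# For Gaussian X ~ N(0, Gamma), E(X | <g, X>) = Gamma g <g, X> / <g, Gamma g>.
X = rng.multivariate_normal(np.zeros(p), Gamma, size=200000)
s = X @ g
slope = (X * s[:, None]).mean(axis=0) / np.mean(s**2)               # least-squares slope of X on <g, X>
print(np.allclose(slope, Gamma @ g / (g @ Gamma @ g), atol=0.02))   # True up to Monte Carlo error

# Lemma 1: the conditional mean lies in Span(Gamma g), so its component orthogonal to Gamma g is ~ 0.
direction = Gamma @ g
v = rng.standard_normal(p)
v = v - (v @ direction) / (direction @ direction) * direction       # v is orthogonal to Gamma g
print(abs(v @ slope))                                               # close to zero
```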

Lemma 2: S is an element of \({l}^{2}\) because \({{\mathcal {H}}}\) and \({l}^{2}\) are isometrically isomorphic, and by the same reasoning S is elliptically distributed. Now let \({{P}: {{l}^{2}} \rightarrow {{{\mathbb {R}}}^{n}}}\) be the operator which truncates a sequence at the \({{n}^{\mathrm {th}}}\) term. This operator is compact and therefore bounded, so by Theorem 4 of Li (2007), the vector \(T_{n}\) is elliptically distributed. \(\square \)

Theorem 5: From the definition of correlation:

$$\begin{aligned} {{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} = \frac{{{\mathrm {Cov}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}{{{\mathrm {Var}}{\left( {{Y}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}{{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}. \end{aligned}$$
(6)

Now, recall that conditional expectation is a self-adjoint operator in the covariance inner product. That is, for any random variables \(U_{1}, U_{2}, U_{3}\), we have

$$\begin{aligned} {{\mathrm {Cov}}{\left( {{{{\mathbb {E}}}{\left( {U_{1}}{\left. \vert \right. }{U_{2}}\right) }}},{U_{3}}\right) }} = {{\mathrm {Cov}}{\left( {U_{1}},{{{{\mathbb {E}}}{\left( {U_{3}}{\left. \vert \right. }{U_{2}}\right) }}}\right) }}. \end{aligned}$$
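
A minimal Monte Carlo sketch of this identity, assuming \(U_{1}, U_{2}, U_{3}\) are jointly Gaussian with mean zero so that both conditional expectations are the usual linear projections (the covariance matrix below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
S = np.array([[2.0, 0.7, 0.3],
              [0.7, 1.5, 0.5],
              [0.3, 0.5, 1.0]])                    # covariance of (U1, U2, U3)
U1, U2, U3 = rng.multivariate_normal(np.zeros(3), S, size=500000).T

# For zero-mean Gaussians, E(U1 | U2) = (Cov(U1, U2) / Var(U2)) U2, and similarly for E(U3 | U2).
E_U1_given_U2 = (S[0, 1] / S[1, 1]) * U2
E_U3_given_U2 = (S[2, 1] / S[1, 1]) * U2

lhs = np.cov(E_U1_given_U2, U3)[0, 1]              # Cov(E(U1 | U2), U3)
rhs = np.cov(U1, E_U3_given_U2)[0, 1]              # Cov(U1, E(U3 | U2))
print(lhs, rhs)                                    # both close to S[0, 1] * S[1, 2] / S[1, 1]
```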

Consider:

$$\begin{aligned} {{\mathrm {Cov}}{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}&= {{\mathrm {Cov}}{\left( {{Y}}, {{{{\mathbb {E}}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {X}, {{\varvec{\Gamma }}}}}\right) }}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} \nonumber \\&= {{\mathrm {Cov}}{\left( {{{{\mathbb {E}}}{\left( {{Y}}{\left. \vert \right. }{{{g}, {X}, {{\varvec{\Gamma }}}}}\right) }}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} \nonumber \\&= {{\mathrm {Cov}}{\left( {{{{\mathbb {E}}}{\left( {{Y}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} \nonumber \\&= {{\mathrm {Cov}}{\left( {{Y}}, {{{{\mathbb {E}}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} \end{aligned}$$
(7)

where the third equality follows as \({{{Y}} {}{\,\perp \!\perp \,}{{X}} {\left. \vert \right. }{{\left( {{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}\right) }}}\). As Assumption 4 holds, there is a real constant \({\alpha }_{i}\) such that \({{{\mathbb {E}}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }} = {\alpha }_{i}{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}\), and similarly for j. Thus, Eq. 7 becomes:

$$\begin{aligned} {{\mathrm {Cov}}{\left( {{Y}}, {{\alpha }_{i}{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} = {\alpha }_{i}{{\mathrm {Cov}}{\left( {{Y}}, {{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}. \end{aligned}$$

Substituting this into Eq. 6, we find that

$$\begin{aligned} {{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} = \frac{{\alpha }_{i}^{2}{{\mathrm {Cov}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}{{{\mathrm {Var}}{\left( {{Y}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}{{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}. \end{aligned}$$

Thus,

$$\begin{aligned} \frac{{{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}{{{\mathrm {Corr}}^{2}\!{\left( {{Y}}, { {{{{\left\langle {{{\phi _{j}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{g}, {{\varvec{\Gamma }}}}\right) }}} = \frac{{\alpha }_{i}^{2}{{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{j}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{g}, {{\varvec{\Gamma }}}}\right) }}}{{\alpha }_{j}^{2}{{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}. \end{aligned}$$

As \({g}{}{\,\perp \!\perp \,}{\left( {{X}, {{\varvec{\Gamma }}}}\right) }\), \({{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}= {{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{\varvec{\Gamma }}}}\right) }} = {{\lambda _{i}}}\) and similarly for j. Thus,

$$\begin{aligned} \frac{{{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}{{{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{j}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}} = \frac{{\alpha }_{i}^{2}{{\lambda _{j}}}}{{\alpha }_{j}^{2}{{\lambda _{i}}}}. \end{aligned}$$
(8)

Now look back at Eq. 7. By Eq. 5, we see that

$$\begin{aligned}&{{\mathrm {Cov}}{\left( {{Y}}, {{{{\mathbb {E}}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }} \\&\quad ={{\mathrm {Cov}}{\left( {{Y}}, {{{\left\langle {{{\phi _{i}}}}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}. \end{aligned}$$

By Lemma 1, \({{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }} = c{{\varvec{\Gamma }}}{g}\) for some constant c. Hence,

$$\begin{aligned} {{{\mathbb {E}}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }} = {\alpha }_{i} {{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}= {{\left\langle {{{\phi _{i}}}}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }}} \right\rangle _{{{\mathcal {H}}}}}} = c{{\left\langle {{{\phi _{i}}}}, {{{\varvec{\Gamma }}}{g}} \right\rangle _{{{\mathcal {H}}}}}}. \end{aligned}$$

Now we have

$$\begin{aligned} c{{\left\langle {{{\phi _{i}}}}, {{{\varvec{\Gamma }}}{g}} \right\rangle _{{{\mathcal {H}}}}}} = c{{\left\langle {{{\varvec{\Gamma }}}{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}} = c{{\lambda _{i}}}{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}. \end{aligned}$$

Consequently,

$$\begin{aligned} {\alpha }_{i} = \frac{c{{\lambda _{i}}}{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}} \end{aligned}$$

and similarly for \({\alpha }_{j}\). So Eq. 8 can be rewritten as

$$\begin{aligned} \frac{{{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}}{{{\mathrm {Corr}}^{2}\!{\left( {{Y}}, {{{{{\left\langle {{{\phi _{j}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}} = \frac{{{\lambda _{i}}}{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}^{2}}{{{\lambda _{j}}}{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}^{2}}. \end{aligned}$$

Now by Assumption 5, \({{\left\{ {{{{\left\langle {{{\phi _{k}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}\right\} }}_{k \in {{{{\mathbb {N}}}\cap \left[ 1, {n}\right] }}}\) is spherically symmetric for any n. Therefore, by Theorem 1 of Arnold and Brockett (1992), \(\frac{{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}{{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}\) has a standard Cauchy distribution. Thus, since a standard Cauchy random variable lies in \(\left( -a, a\right) \) with probability \(\frac{2}{\pi }\arctan {\left( a\right) }\) for any \(a > 0\),

$$\begin{aligned} {{{\mathbb {P}}}{\left( {{\rho _{i}{\left( {{{g}, {{\varvec{\Gamma }}}}}\right) }} > {\rho _{j}{\left( {{{g}, {{\varvec{\Gamma }}}}}\right) }}}\right) }} = {{{\mathbb {P}}}{\left( {-\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}< \frac{{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}{{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}} < \sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}}\right) }} = \frac{2}{\pi }\arctan {\left( \sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}\right) }. \end{aligned}$$

\(\square \)
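
A minimal Monte Carlo check of this conclusion, assuming independent standard Gaussian coordinates \({{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}\) and \({{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}\) (a spherically symmetric special case of Assumption 5) and illustrative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(6)
lam_i, lam_j = 3.0, 1.0                      # eigenvalues attached to the leading and trailing components
gi, gj = rng.standard_normal((2, 500_000))   # spherically symmetric coordinates <phi_i, g>, <phi_j, g>

# From the proof, Corr^2_i / Corr^2_j = lam_i <phi_i, g>^2 / (lam_j <phi_j, g>^2), so component i is
# the more highly correlated one exactly when this ratio exceeds one.
freq = np.mean(lam_i * gi**2 > lam_j * gj**2)
theory = (2 / np.pi) * np.arctan(np.sqrt(lam_i / lam_j))
print(freq, theory)                          # both close to 2/3 for these eigenvalues
```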

Theorem 6: The proof is similar to that of Theorem 5 up to the point where we have shown that:

$$\begin{aligned} {{{\mathbb {P}}}{\left( {{\rho _{i}{\left( {{{g}, {{\varvec{\Gamma }}}}}\right) }} > {\rho _{j}{\left( {{{g}, {{\varvec{\Gamma }}}}}\right) }}}\right) }} = {{{\mathbb {P}}}{\left( {-\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}< \frac{{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}{{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}} < \sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}}\right) }}. \end{aligned}$$

Now as \({g}\) has an elliptical distribution, we apply Lemma 2 and Theorem 2 of Arnold and Brockett (1992) to find that \({{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}/{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}\) has a general Cauchy distribution with scale parameter \({\gamma _{ij}}\) and location \({\kappa _{ij}}\). Thus,

$$\begin{aligned}&{{{\mathbb {P}}}{\left( {{\rho _{i}{\left( {{{g}, {{\varvec{\Gamma }}}}}\right) }} > {\rho _{j}{\left( {{{g}, {{\varvec{\Gamma }}}}}\right) }}}\right) }} \\&\quad = \frac{1}{\pi } \arctan {\left( {\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}\right) } + \frac{1}{2} - \frac{1}{\pi }\arctan {\left( {\frac{-\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}\right) } -\frac{1}{2} \\&\quad = \frac{1}{\pi } {\left( \arctan {\left( {\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}\right) } - \arctan {\left( {\frac{-\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}\right) }\right) }. \end{aligned}$$

Using \(\arctan {\left( -x\right) } = - \arctan {\left( x\right) } \), we have that the above is equal to:

$$\begin{aligned} \frac{1}{\pi } {\left( \arctan {\left( {\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}\right) } + \arctan {\left( {\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} + {\kappa _{ij}}}{{\gamma _{ij}}}}\right) }\right) }. \end{aligned}$$

Using the identity \(\arctan {\left( u\right) } + \arctan {\left( v\right) } = \arctan {\left( \frac{u+v}{1-uv}\right) }\), valid provided \(uv \ne 1\) and with the result taken modulo \(\pi \), we have that the above probability is now:

$$\begin{aligned} \frac{1}{\pi } \arctan {\left( \frac{{\left( {\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}+ {\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} + {\kappa _{ij}}}{{\gamma _{ij}}}}\right) }}{1-{\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} - {\kappa _{ij}}}{{\gamma _{ij}}}}{\frac{\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}} + {\kappa _{ij}}}{{\gamma _{ij}}}}}\right) } =&\frac{1}{\pi }\arctan {\left( \frac{{\frac{2\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}}{{\gamma _{ij}}}}}{{1-{\left( \frac{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}} - {\kappa _{ij}}^{2}}{{\gamma _{ij}}^{2}}\right) }}}\right) }\\&= \frac{1}{\pi }\arctan {\left( \frac{{2{\gamma _{ij}}\sqrt{\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}}}}}{{{\gamma _{ij}}^{2}-\frac{{{\lambda _{i}}}}{{{\lambda _{j}}}} + {\kappa _{ij}}^{2}}}\right) }. \end{aligned}$$

We see that the numerator is equal to \({d_{ij, {1}}}\) and the denominator is equal to \({d_{ij, {2}}}\). Then, using \(\arctan {\left( x\right) } = 2 \arctan {\left( \frac{x}{1+\sqrt{1 + x^{2}}}\right) }\), we can rewrite the above and simplify to obtain:

$$\begin{aligned} \frac{2}{\pi }\arctan {\left( \frac{{d_{ij, {1}}}}{{d_{ij, {2}}} + \sqrt{{d_{ij, {1}}}^{2} + {d_{ij, {2}}}^{2}}}\right) }. \end{aligned}$$

\(\square \)
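
A minimal numerical cross-check of the algebra above, with illustrative values for \({{\lambda _{i}}}, {{\lambda _{j}}}, {\kappa _{ij}}\) and \({\gamma _{ij}}\), comparing the closed form with the probability obtained directly from the Cauchy distribution function:

```python
import numpy as np

lam_i, lam_j = 3.0, 1.0
kappa, gamma = 0.4, 1.3                          # location and scale of the general Cauchy ratio
a = np.sqrt(lam_i / lam_j)

# Probability computed directly from the Cauchy distribution function with location kappa and scale gamma.
direct = (np.arctan((a - kappa) / gamma) - np.arctan((-a - kappa) / gamma)) / np.pi

# Closed form from the proof, with d1 the numerator and d2 the denominator identified above.
d1 = 2 * gamma * a
d2 = gamma**2 - lam_i / lam_j + kappa**2
closed = (2 / np.pi) * np.arctan(d1 / (d2 + np.sqrt(d1**2 + d2**2)))
print(direct, closed)                            # agree to floating-point precision
```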

About this article

Cite this article

Jones, B., Artemiou, A. On principal components regression with Hilbertian predictors. Annals of the Institute of Statistical Mathematics, 72, 627–644 (2020). https://doi.org/10.1007/s10463-018-0702-9