Abstract
We demonstrate that, in a regression setting with a Hilbertian predictor, a response variable is more likely to be more highly correlated with the leading principal components of the predictor than with trailing ones. This is despite the extraction procedure being unsupervised. Our results are established under the conditional independence model, which includes linear regression and single-index models as special cases, with some assumptions on the regression vector. These results are a generalisation of earlier work which showed that this phenomenon holds for predictors which are real random vectors. A simulation study is used to quantify the phenomenon.
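To make the claim concrete, here is a minimal simulation sketch (hypothetical parameters, not the simulation study reported in the paper): a finite-dimensional predictor with a decaying eigenvalue spectrum, a random regression vector, and a comparison of the response's absolute correlation with the first and last principal components.

```python
import numpy as np

# Minimal sketch: predictor with decaying eigenvalue spectrum, random
# regression vector, linear model with additive noise. The coordinates of X
# are already its principal components here, since the covariance is diagonal.
rng = np.random.default_rng(0)
p, n, reps = 10, 500, 200
eigvals = 1.0 / np.arange(1, p + 1) ** 2

lead_wins = 0
for _ in range(reps):
    X = rng.standard_normal((n, p)) * np.sqrt(eigvals)
    beta = rng.standard_normal(p)          # random regression vector
    y = X @ beta + rng.standard_normal(n)  # response with noise
    c_first = abs(np.corrcoef(X[:, 0], y)[0, 1])  # leading PC
    c_last = abs(np.corrcoef(X[:, -1], y)[0, 1])  # trailing PC
    lead_wins += c_first > c_last

print(lead_wins / reps)  # fraction of runs where the leading PC wins
```

In runs of this sketch the leading component wins in the large majority of replications, matching the qualitative claim; the specific dimensions and spectrum above are illustrative choices only.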
References
Arnold, B. C., Brockett, P. L. (1992). On distributions whose component ratios are Cauchy. American Statistician, 46(1), 25–26.
Artemiou, A., Li, B. (2009). On principal components regression: A statistical explanation of a natural phenomenon. Statistica Sinica, 19, 1557–1565.
Artemiou, A., Li, B. (2013). Predictive power of principal components for single-index model and sufficient dimension reduction. Journal of Multivariate Analysis, 119, 176–184.
Cook, R. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1), 1–26.
Cox, D. R. (1968). Notes on some aspects of regression analysis. Journal of the Royal Statistical Society Series A (General), 131(3), 265–279.
Dauxois, J., Ferré, L., Yao, A.-F. (2001). Un modèle semi-paramétrique pour variables aléatoires hilbertiennes. Comptes Rendus de l’Académie des Sciences, 333(1), 947–952.
Ferré, L., Yao, A. F. (2003). Functional sliced inverse regression analysis. Statistics, 37(6), 475–488.
Hall, P., Yang, Y. J. (2010). Ordering and selecting components in multivariate or functional data linear prediction. Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(1), 93–110.
Hsing, T., Eubank, R. (2015). Theoretical foundations of functional data analysis, with an introduction to linear operators. 1st ed. West Sussex: Wiley.
Kingman, J. F. C. (1972). On random sequences with spherical symmetry. Biometrika, 59(2), 492.
Li, B. (2007). Comment: Fisher lecture—Dimension reduction in regression. Statistical Science, 22(1), 32–35.
Li, B. (2018). Sufficient dimension reduction: Methods and applications with R. 1st ed. Boca Raton: CRC Press.
Li, B., Song, J. (2017). Nonlinear sufficient dimension reduction for functional data. The Annals of Statistics, 45(3), 1059–1095.
Li, Y. (2007). A note on Hilbertian elliptically contoured distributions. Unpublished manuscript, Department of Statistics, University of Georgia.
Ni, L. (2011). Principal component regression revisited. Statistica Sinica, 21, 741–747.
Pinelis, I., Molzon, R. (2016). Optimal-order bounds on the rate of convergence to normality in the multivariate delta method. Electronic Journal of Statistics, 10(1), 1001–1063.
Ramsay, J., Silverman, B. W. (1997). Functional data analysis. 1st ed. New York: Springer.
Acknowledgements
We would like to thank the Editor, Associate Editor and two reviewers for their constructive comments and suggestions which helped improve an earlier version of the manuscript.
Appendices
Appendix A: Essential definitions
For the benefit of the reader, we present here some fundamental definitions in functional data analysis. These definitions can be found in Hsing and Eubank (2015) along with a deeper exposition of the field. We first define random variables and nuclear operators in and on a Hilbert space, respectively. Although our interest lies in the case where the random variables are random functions, the definitions are given for the more general setting of Hilbertian random variables. We note that this more abstract framework includes function spaces where the functions need not be univariate so this paper applies to, say, predictors which are random fields. This work therefore is relevant to a number of fields including FMRI data analysis, spatial statistics, image processing and speech recognition.
Definition 2
Let \({{\left( \varOmega , \mathfrak {F}, {{\mathbb {P}}}\right) }}\) be a probability space and \({{\left( {{{\mathcal {H}}}},\ {{\mathcal {B}}}{\left( {{{\mathcal {H}}}}\right) }\right) }}\) be a measurable space, where \({{\mathcal {H}}}\) is a Hilbert space and \({{\mathcal {B}}}{\left( {{\mathcal {H}}}\right) }\) is its associated Borel \(\sigma \)-field. A measurable function \({{{X}}: {{{\left( \varOmega , \mathfrak {F}, {{\mathbb {P}}}\right) }}} \rightarrow {{{\left( {{{\mathcal {H}}}},\ {{\mathcal {B}}}{\left( {{{\mathcal {H}}}}\right) }\right) }}}}\) is called an \({{\mathcal {H}}}\)-valued random variable. We also say that \({X}\) is a Hilbertian random variable.
Definition 3
Let \({{\mathcal {H}}}\) be a Hilbert space. A compact operator \({{L}: {{{\mathcal {H}}}} \rightarrow {{{\mathcal {H}}}}}\), that is, one which is the operator-norm limit of a sequence of finite-rank operators, is said to be a nuclear operator if the sum of its eigenvalues is finite.
Remark 4
The class of nuclear operators on a Hilbert space contains the class of all operators which have finitely many nonzero eigenvalues.
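A standard example (not taken from the paper) of a nuclear operator with infinitely many nonzero eigenvalues is the covariance operator of standard Brownian motion on \(L^{2}\left[ 0,1\right] \):

```latex
\[
(L f)(t) = \int_0^1 \min(s, t)\, f(s)\, \mathrm{d}s,
\qquad
\lambda_k = \frac{1}{(k - \tfrac{1}{2})^2 \pi^2},
\qquad
\sum_{k=1}^{\infty} \lambda_k = \int_0^1 t \,\mathrm{d}t = \tfrac{1}{2} < \infty ,
\]
```

so \(L\) is nuclear even though no finite-rank truncation of it is exact.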
The expectation of a Hilbertian random variable is defined in terms of the Bochner integral; the construction is given in Hsing and Eubank (2015) and is similar to that of the Lebesgue integral, so we do not present it here. For our purposes, it is enough to note that for a Hilbertian random variable a, the expectation \({{{\mathbb {E}}}{\left( {a}\right) }}\) is unique, an element of the space \({{\mathcal {H}}}\), and satisfies
\[ {{\mathbb {E}}}{\left( {\left\langle a, x \right\rangle _{{{\mathcal {H}}}}}\right) } = {\left\langle {{\mathbb {E}}}{\left( a\right) }, x \right\rangle _{{{\mathcal {H}}}}} \quad \text {for all } x \in {{\mathcal {H}}}. \]
Remark 5
Observe that the expectation on the left-hand side is the expectation of a real random variable, whereas the expectation on the right side is the expectation of an \({{\mathcal {H}}}\)-valued random variable.
We will also require a generalisation of the notion of variance for a Hilbertian random variable, but first we define a tensor product operation.
Definition 4
Let \(x_{1}, x_{2}\) be elements of Hilbert spaces \({{\mathcal {H}}}_{1}\) and \({{\mathcal {H}}}_{2}\), respectively. The tensor product operator \({{{\left( x_{1} \otimes _{1} x_{2}\right) }}: {{{\mathcal {H}}}_{1}} \rightarrow {{{\mathcal {H}}}_{2}}}\) is defined by
\[ {\left( x_{1} \otimes _{1} x_{2}\right) }{\left( y\right) } = {\left\langle x_{1}, y \right\rangle _{{{\mathcal {H}}}_{1}}}\, x_{2} \]
for \(y \in {{\mathcal {H}}}_{1}\). If \({{\mathcal {H}}}_{1} = {{\mathcal {H}}}_{2}\), we use \(\otimes \) instead of \(\otimes _{1}\).
In the case where \({{\mathcal {H}}}_{1} = {{\mathcal {H}}}_{2} = {{\mathbb {R}}}^{p}\), we have that \({{x_{1}} \otimes {x_{2}}} = x_{2} {{x_{1}}^{{\mathrm{T}}}}\), so the usual covariance matrix can be written as
\[ {{\mathrm {Var}}{\left( X\right) }} = {{\mathbb {E}}}{\left[ {\left( X - {{\mathbb {E}}}{\left( X\right) }\right) } \otimes {\left( X - {{\mathbb {E}}}{\left( X\right) }\right) }\right] }. \]
This notation will also be used for a covariance operator, but with \({X}\) being an \({{\mathcal {H}}}\)-valued random variable. We note that all covariance operators on a Hilbert space \({{\mathcal {H}}}\) are compact, nonnegative definite, and self-adjoint. Proofs of these facts can be found in Pinelis and Molzon (2016).
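The finite-dimensional identification \({{x_{1}} \otimes {x_{2}}} = x_{2}{{x_{1}}^{{\mathrm{T}}}}\) can be checked directly; a minimal numpy sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2, y = rng.standard_normal((3, 4))  # three arbitrary vectors in R^4

T = np.outer(x2, x1)        # the matrix x2 x1^T, identified with x1 (x) x2
lhs = T @ y                 # (x1 (x) x2)(y)
rhs = np.dot(x1, y) * x2    # <x1, y> x2
print(np.allclose(lhs, rhs))  # prints True
```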
Assuming that the covariance operator of some predictor \({X}\) is nuclear gives meaning to the phrase “PCA captures most of the variability in the data” for the infinite-dimensional setting. This is because it supplies a notion of how much variance there is in total.
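Concretely, writing \(\lambda _{1} \ge \lambda _{2} \ge \cdots \) for the eigenvalues of the covariance operator, nuclearity is exactly what makes the usual "proportion of variance explained" by the first m principal components well defined:

```latex
\[
\frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{\infty} \lambda_k} \in [0, 1],
\qquad
\sum_{k=1}^{\infty} \lambda_k < \infty .
\]
```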
The notion of a spherical distribution was central to the work of Artemiou and Li (2013). For data in an infinite-dimensional space, this notion cannot be generalised, as explained below, but the idea of an elliptical distribution can; we will therefore make use of the latter. The following definition is given by Li (2007).
Definition 5
A Hilbertian random variable A, in a Hilbert space \({{\mathcal {H}}}\), has an elliptically symmetric distribution if the characteristic function of \(A - {{{\mathbb {E}}}{\left( {A}\right) }}\) has the following form:
\[ {{\mathbb {E}}}{\left[ \exp {\left( i{\left\langle f, A - {{\mathbb {E}}}{\left( A\right) } \right\rangle _{{{\mathcal {H}}}}}\right) }\right] } = \varphi {\left( {\left\langle f, \varPsi f \right\rangle _{{{\mathcal {H}}}}}\right) } \]
for all \(f \in {{\mathcal {H}}}\), where \(\varPsi \) is a self-adjoint, nonnegative definite, nuclear operator on \({{\mathcal {H}}}\), and \(\varphi \) is a univariate function.
We note that, in the infinite-dimensional Hilbert space setting, \(\varPsi \) in Definition 5 cannot be the identity operator, since the identity is noncompact and thus not nuclear. It can be shown that \(\varPsi \) is, up to multiplication by a constant, the covariance operator (when it exists) of the Hilbertian random variable; the requirement that the eigenvalues of \(\varPsi \) have a finite sum is then equivalent to the requirement that the variances of the principal components of A have a finite sum. We conclude that the notion of a spherically symmetric distribution cannot be extended to the entirety of an infinite-dimensional space, although sphericity can still hold on a finite-dimensional subspace.
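For example (a standard fact, not specific to this paper), a Gaussian random variable A in \({{\mathcal {H}}}\) with covariance operator \(\varSigma \) is elliptically symmetric in the sense of Definition 5:

```latex
\[
\mathbb{E}\!\left[
  \exp\!\left( i \langle f, A - \mathbb{E}(A) \rangle_{\mathcal{H}} \right)
\right]
= \exp\!\left( -\tfrac{1}{2} \langle f, \varSigma f \rangle_{\mathcal{H}} \right),
\]
```

so here \(\varPsi = \varSigma \) and \(\varphi {\left( t\right) } = e^{-t/2}\).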
Appendix B: Proofs
Lemma 1: Define \(\varPhi \) as an operator on \({{\mathcal {H}}}\) by \(\varPhi {\left( x\right) } = {{\left\langle {{g}}, {x} \right\rangle _{{{\mathcal {H}}}}}}\). This operator takes a fixed \(x \in {{\mathcal {H}}}\) and returns a real random variable, so it is a random operator. By the Riesz representation theorem, this random operator can be identified with a random element of the dual space \({{{{{\mathcal {H}}}}^{*}}}\) (that is, the space of all continuous linear functionals from \({{\mathcal {H}}}\) into the base field), so there is a unique random adjoint operator \({{{\varPhi }^{*}}}\) such that for all fixed \(x \in {{\mathcal {H}}}\) and \(y \in {{\mathbb {R}}}\), \(\varPhi {\left( x\right) }y = {{\left\langle {x}, {{{{\varPhi }^{*}}}{\left( y\right) }} \right\rangle _{{{\mathcal {H}}}}}}\). It is easy to see that, for any fixed \(y \in {{\mathbb {R}}}\), \({{{\varPhi }^{*}}}{\left( y\right) } = y{g}\). We show that, almost surely, \({{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{\varPhi {{\left( X\right) }}, {{g}, {{\varvec{\Gamma }}}}}\right) }}\) is orthogonal to \({{\mathrm {Span}}{\left( {{{\varvec{\Gamma }}}{g}}\right) }}^{\perp }\). For convenience, let T be the tuple \({\left( \varPhi {{\left( X\right) }}, {{g}, {{\varvec{\Gamma }}}}\right) }\). Let \(x \in {{\mathrm {Span}}{\left( {{{\varvec{\Gamma }}}{g}}\right) }}^{\perp }\), which is a random variable in \({{\mathcal {H}}}\); then we have the following:
which implies that for any fixed \(y \in {{\mathbb {R}}}\)
where the first and second equalities follow from the linearity and self-adjointness of \({{\varvec{\Gamma }}}\). The above now implies that \(\varPhi {\left( {{\varvec{\Gamma }}}x\right) } = 0\) and therefore \({{\varvec{\Gamma }}}x \in {{\mathrm {Ker}}{\left( {\varPhi }\right) }}\).
Consider now \({{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{T}\right) }}} \right\rangle _{{{\mathcal {H}}}}}}^{2}}\right) }}\). Showing this to be 0 gives the result, as it is the expectation of a squared random variable.
where the second equality follows from Eq. 5; the third and fourth equalities follow by moving the second inner product into the expectation; the fifth equality uses Eq. 5 again. Now by Assumption 4, there is a real constant A such that \({{{\mathbb {E}}}{\left( {{{\left\langle {x}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}{\left. \vert \right. }{T}\right) }} = A\varPhi {{\left( X\right) }}\).
Therefore,
\(\square \)
Lemma 2: S is an element of \({l}^{2}\) because \({{\mathcal {H}}}\) and \({l}^{2}\) are isometrically isomorphic, and by the same reasoning S is elliptically distributed. Now let \({{P}: {{l}^{2}} \rightarrow {{{\mathbb {R}}}^{n}}}\) be the operator which truncates a sequence at the \({{n}^{\mathrm {th}}}\) term. This operator is compact and therefore bounded, so by Theorem 4 of Li (2007), the vector \(T_{n}\) is elliptically distributed. \(\square \)
Theorem 5: From the definition of correlation:
Now, recall that conditional expectation is a self-adjoint operator in the covariance inner product. That is, for any real random variables \(U_{1}, U_{2}, U_{3}\), we have
\[ {{\mathrm {Cov}}{\left( {{\mathbb {E}}}{\left( U_{1} \mid U_{3}\right) }, U_{2}\right) }} = {{\mathrm {Cov}}{\left( U_{1}, {{\mathbb {E}}}{\left( U_{2} \mid U_{3}\right) }\right) }}. \]
Consider:
where the third equality follows as \({{{Y}} {}{\,\perp \!\perp \,}{{X}} {\left. \vert \right. }{{\left( {{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}\right) }}}\). As Assumption 4 holds, there is a real constant \({\alpha }_{i}\) such that \({{{\mathbb {E}}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }} = {\alpha }_{i}{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}\), and similarly for j. Thus, Eq. 7 becomes:
Substituting this into Eq. 6, we find that
Thus,
As \({g}{}{\,\perp \!\perp \,}{\left( {{X}, {{\varvec{\Gamma }}}}\right) }\), \({{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{g}, {{\varvec{\Gamma }}}}}\right) }}= {{\mathrm {Var}}{\left( {{{{{\left\langle {{{\phi _{i}}}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}}{\left. \vert \right. }{{{\varvec{\Gamma }}}}\right) }} = {{\lambda _{i}}}\) and similarly for j. Thus,
Now look back at Eq. 7. By Eq. 5, we see that
By Lemma 1, \({{{\mathbb {E}}}{\left( {{X}}{\left. \vert \right. }{{{{{{\left\langle {{g}}, {{X}} \right\rangle _{{{\mathcal {H}}}}}}}}, {g}, {{\varvec{\Gamma }}}}}\right) }} = c{{\varvec{\Gamma }}}{g}\) for some constant c. Hence,
Now we have
Consequently,
and similarly for \({\alpha }_{j}\). So Eq. 8 can be rewritten as
Now by Assumption 5, \({{\left\{ {{{{\left\langle {{{\phi _{k}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}\right\} }}_{k \in {{{{\mathbb {N}}}\cap \left[ 1, {n}\right] }}}\) is spherically symmetric for any n. Therefore, by Theorem 1 of Arnold and Brockett (1992), \(\frac{{{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}{{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}}\) has a standard Cauchy distribution. Thus,
\(\square \)
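The Cauchy-ratio step in the proof above can be checked numerically. The sketch below (illustrative only; it uses a Gaussian pair, which is spherically symmetric) compares the empirical CDF of a coordinate ratio with the standard Cauchy CDF \(\tfrac{1}{2} + \arctan {\left( x\right) }/\pi \):

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal((100_000, 2))  # spherically symmetric (Gaussian) pairs
ratio = z[:, 0] / z[:, 1]              # should be standard Cauchy

# Compare empirical and theoretical CDFs at a few points.
grid = np.array([-2.0, 0.0, 1.0, 5.0])
empirical = np.array([np.mean(ratio <= x) for x in grid])
cauchy_cdf = 0.5 + np.arctan(grid) / np.pi
max_err = np.max(np.abs(empirical - cauchy_cdf))
print(max_err)  # small deviation, within Monte Carlo error
```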
Theorem 6: The proof is similar to that of Theorem 5 up to the point where we have shown that:
Now as \({g}\) has an elliptical distribution, we apply Lemma 2 and Theorem 2 of Arnold and Brockett (1992) to find that \({{{\left\langle {{{\phi _{i}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}/{{{\left\langle {{{\phi _{j}}}}, {{g}} \right\rangle _{{{\mathcal {H}}}}}}}\) has a general Cauchy distribution with scale parameter \({\gamma _{ij}}\) and location \({\kappa _{ij}}\). Thus,
Using \(\arctan {\left( -x\right) } = - \arctan {\left( x\right) } \), we have that the above is equal to:
Using \(\arctan {\left( u\right) } + \arctan {\left( v\right) } = \arctan {\left( \frac{u+v}{1-uv}\right) }\), which holds modulo \(\pi \) provided \(uv \ne 1\), the above probability becomes:
We see that the numerator is equal to \({d_{ij, {1}}}\) and the denominator equal to \({d_{ij, {2}}}\). Then, using \(\arctan {\left( x\right) } = 2 \arctan {\left( \frac{x}{1+\sqrt{1 + x^{2}}}\right) }\), we can rewrite the above and simplify to obtain:
\(\square \)
Jones, B., Artemiou, A. On principal components regression with Hilbertian predictors. Ann Inst Stat Math 72, 627–644 (2020). https://doi.org/10.1007/s10463-018-0702-9