Optimal classification of Gaussian processes in homo- and heteroscedastic settings

Abstract

A procedure to derive optimal discrimination rules is formulated for binary functional classification problems in which the instances available for induction are characterized by random trajectories sampled from different Gaussian processes, depending on the class label. Specifically, these optimal rules are derived as the asymptotic form of the quadratic discriminant for the discretely monitored trajectories in the limit in which the set of monitoring points becomes dense in the interval on which the processes are defined. The main goal of this work is to provide a detailed analysis of such optimal rules in the dense monitoring limit, with a particular focus on elucidating the mechanisms by which near-perfect classification arises. In the general case, the quadratic discriminant includes terms that are singular in this limit. If these singularities do not cancel out, one obtains near-perfect classification: the classification error tends to zero asymptotically as the sample size grows. This singular limit is a consequence of the orthogonality (mutual singularity) of the probability measures associated with the stochastic processes from which the trajectories are sampled. As a further novel result of this analysis, we formulate rules to determine whether two Gaussian processes are equivalent or mutually singular (orthogonal).

References

  1. Baíllo, A., Cuevas, A., Cuesta-Albertos, J.A.: Supervised classification for a family of Gaussian functional models. Scand. J. Stat. 38(3), 480–498 (2011)

  2. Baker, C.T.H.: The Numerical Treatment of Integral Equations. Clarendon, Oxford (1977)

  3. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, Boston (2004)

  4. Berrendero, J.R., Cárcamo, J.: Linear components of quadratic classifiers. Adv. Data Anal. Classif. 13(2), 347–377 (2019)

  5. Berrendero, J.R., Bueno-Larraz, B., Cuevas, A.: On Mahalanobis distance in functional settings (2018a). arXiv:1803.06550

  6. Berrendero, J.R., Cuevas, A., Torrecilla, J.L.: On the use of reproducing kernel Hilbert spaces in functional classification. J. Am. Stat. Assoc. 113(523), 1210–1218 (2018b)

  7. Bollerslev, T., Chou, R., Kroner, K.F.: ARCH modeling in finance: a review of the theory and empirical evidence. J. Econom. 52(1–2), 5–59 (1992)

  8. Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1(2), 223–236 (2001)

  9. Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)

  10. Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint (Cambridge Monographs on Applied & Computational Mathematics). Cambridge University Press, New York (2007)

  11. Cuesta-Albertos, J.A., Dutta, S.: On perfect classification for Gaussian processes (2016). arXiv:1602.04941

  12. Cuevas, A.: A partial overview of the theory of statistics with functional data. J. Stat. Plan. Inference 147, 1–23 (2014)

  13. Dai, X., Müller, H.G., Yao, F.: Optimal Bayes classifiers for functional data and density ratios. Biometrika 104(3), 545–560 (2017)

  14. Delaigle, A., Hall, P.: Defining probability density for a distribution of random functions. Ann. Stat. 38(2), 1171–1193 (2010)

  15. Delaigle, A., Hall, P.: Achieving near perfect classification for functional data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 74(2), 267–286 (2012)

  16. Delaigle, A., Hall, P.: Classification using censored functional data. J. Am. Stat. Assoc. 108(504), 1269–1283 (2013)

  17. Epifanio, I., Ventura-Campos, N.: Hippocampal shape analysis in Alzheimer’s disease using functional data analysis. Stat. Med. 33(5), 867–880 (2014)

  18. Fama, E.F.: The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)

  19. Feldman, J.: Equivalence and perpendicularity of Gaussian processes. Pac. J. Math. 8(4), 699–708 (1958)

  20. Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, Secaucus (2006)

  21. Galeano, P., Joseph, E., Lillo, R.E.: The Mahalanobis distance for functional data with applications to classification. Technometrics 57(2), 281–291 (2015). https://doi.org/10.1080/00401706.2014.902774

  22. Hájek, J.: A property of \(J\)-divergences of marginal probability distributions. Czechoslov. Math. J. 8(3), 460–463 (1958)

  23. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, Berlin (2009)

  24. Hubert, M., Rousseeuw, P., Segaert, P.: Multivariate and functional classification using depth and distance. Adv. Data Anal. Classif. 11(3), 445–466 (2017)

  25. Kailath, T.: Some results on singular detection. Inf. Control 9(2), 130–152 (1966)

  26. Kailath, T.: RKHS approach to detection and estimation problems-I: deterministic signals in Gaussian noise. IEEE Trans. Inf. Theory 17(5), 530–549 (1971)

  27. Kuelbs, J.: Gaussian measures on a Banach space. J. Funct. Anal. 5(3), 354–367 (1970)

  28. Leng, X., Müller, H.G.: Classification using functional data analysis for temporal gene expression data. Bioinformatics 22(1), 68–76 (2006)

  29. Lukić, M.N., Beder, J.H.: Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Am. Math. Soc. 353(10), 3945–3969 (2001)

  30. Manton, J.H., Amblard, P.O.: A primer on reproducing kernel Hilbert spaces. Found. Trends Signal Process. 8(1–2), 1–126 (2015)

  31. Marks, S., Dunn, O.J.: Discriminant functions when covariance matrices are unequal. J. Am. Stat. Assoc. 69(346), 555–559 (1974)

  32. Martin-Barragan, B., Lillo, R., Romo, J.: Interpretable support vector machines for functional data. Eur. J. Oper. Res. 232(1), 146–155 (2014)

  33. Müller, H.G.: Peter Hall, functional data analysis and random objects. Ann. Stat. 44(5), 1867–1887 (2016)

  34. Osborne, M.F.M.: Brownian motion in the stock market. Oper. Res. 7(2), 145–173 (1959)

  35. Parzen, E.: Statistical inference on time series by Hilbert space methods. Technical report 23, Statistics Department, Stanford University (1959)

  36. Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961a)

  37. Parzen, E.: Regression analysis of continuous parameter time series. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, University of California Press, Berkeley, pp. 469–489 (1961b)

  38. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  39. Ramos-Carreño, C., Suárez, A., Torrecilla, J.L., Carbajo Berrocal, M., Marcos Manchón, P., Pérez Manso, P., Hernando Bernabé, A.: scikit-fda: functional data analysis in Python (2019). https://doi.org/10.5281/zenodo.3468127

  40. Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer Series in Statistics, 2nd edn. Springer, Berlin (2005)

  41. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, London (2005)

  42. Rincón, M., Ruiz-Medina, M.D.: Wavelet-RKHS-based functional statistical classification. Adv. Data Anal. Classif. 6(3), 201–217 (2012)

  43. Rossi, F., Villa, N.: Support vector machine for functional data classification. Neurocomputing 69(7), 730–742 (2006). New Issues in Neurocomputing: 13th European Symposium on Artificial Neural Networks

  44. Sato, H.: On the equivalence of Gaussian measures. J. Math. Soc. Jpn. 19(2), 159–172 (1967)

  45. Shepp, L.A.: Radon–Nikodym derivatives of Gaussian measures. Ann. Math. Stat. 37(2), 321–354 (1966)

  46. Song, J.J., Deng, W., Lee, H.J., Kwon, D.: Optimal classification for time-course gene expression data using functional data analysis. Comput. Biol. Chem. 32(6), 426–432 (2008)

  47. Spence, A.: On the convergence of the Nyström method for the integral equation eigenvalue problem. Numer. Math. 25(1), 57–66 (1975)

  48. Varberg, D.E.: On equivalence of Gaussian measures. Pac. J. Math. 11(2), 751–762 (1961)

  49. Wahl, P.W., Kronmal, R.A.: Discriminant functions when covariances are unequal and sample sizes are moderate. Biometrics 33(3), 479–484 (1977)

  50. Wang, J.L., Chiou, J.M., Müller, H.G.: Functional data analysis. Ann. Rev. Stat. Appl. 3(1), 257–295 (2016)

  51. Zhu, H., Brown, P.J., Morris, J.S.: Robust classification of functional and quantitative image data using functional mixed models. Biometrics 68(4), 1260–1268 (2012)

Acknowledgements

The research has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness—State Research Agency, Projects MTM2016-78751-P and TIN2016-76406-P (AEI/FEDER, UE), and Comunidad Autónoma de Madrid, Project S2017/BMD-3688. The authors gratefully acknowledge the use of the computational facilities at the Centro de Computación Científica (CCC) at the Universidad Autónoma de Madrid (UAM).

Author information

Corresponding author

Correspondence to José L. Torrecilla.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Discrete monitoring

In the derivations carried out, the processes \(X\) are monitored at a set of appropriately chosen discrete times \(\left\{ t_n \right\} _{n=1}^N \subset {\mathcal {I}}\). The integrals that appear (e.g., in the definitions of the inner products) are then approximated by Riemann sums

$$\begin{aligned} \int _{t \in {\mathcal {I}}} h(t) \mathrm{d}t \approx \frac{1}{N} \sum _{n=1}^N h(t_n). \end{aligned}$$
(93)

For functions that are continuous in \({\mathcal {I}}\), these Riemann sums converge to the corresponding definite integrals in the limit of dense monitoring

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{n=1}^N h(t_n) = \int _{t \in {\mathcal {I}}} h(t) \mathrm{d}t \quad \forall h \in {\mathcal {C}}\left( {\mathcal {I}}\right) . \end{aligned}$$
(94)
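As a concrete illustration, the following minimal Python sketch checks this convergence numerically; the choice \({\mathcal {I}} = [0,1]\) and the midpoint monitoring grid are assumptions made for the example only.

```python
import numpy as np

# Riemann-sum approximation of Eq. (93), assuming I = [0, 1] and
# midpoint monitoring times t_n = (n - 1/2) / N (illustrative choices).
def riemann_sum(h, N):
    t = (np.arange(1, N + 1) - 0.5) / N   # N equally spaced monitoring points
    return np.mean(h(t))                  # (1/N) * sum of h(t_n)

h = np.cos  # any function continuous on [0, 1]
for N in (10, 100, 1000):
    print(N, riemann_sum(h, N))  # converges to sin(1) = 0.841470...
```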

Let \(K_0\) and \(K_1\) be symmetric, strictly positive kernels that are continuous in \({\mathcal {I}}\), and assume that the corresponding RKHSs are infinite-dimensional. In the discretized representation, the kernel functions \(\left\{ K_i(s,t); \, s,t \in {\mathcal {I}} \right\} _{i=0}^1\) are approximated by \({\mathbf {K}}_i\), the corresponding \(N \times N\) Gram matrices, whose elements are

$$\begin{aligned} \left( {\mathbf {K}}_i\right) _{mn} = K_i(t_n,t_m), \quad n,m = 1, 2,\ldots , N, \end{aligned}$$
(95)

for \(i = 0,1\). Let \(\left\{ \nu _{ij} \right\} _{j=1}^N\) be the (positive) eigenvalues of the matrix \({\mathbf {K}}_i\), sorted in decreasing order. Theorem 3.4 of Baker (1977) can be used to show that, in the limit of dense monitoring, for each fixed \(j\),

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\nu _{ij}}{N} = \lambda _{ij}, \quad j = 1,2, \ldots , \end{aligned}$$
(96)

where \(\lambda _{i1} \ge \lambda _{i2} \ge \cdots > 0\) are the eigenvalues, in decreasing order, of \({\mathcal {K}}_i\), the covariance operator associated with the kernel \(K_i\).
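This convergence can be verified numerically for a kernel whose spectrum is known in closed form. The sketch below uses the Brownian-motion kernel \(K(s,t) = \min (s,t)\) on \([0,1]\), whose covariance operator has eigenvalues \(\lambda _j = \left( (j-1/2)\pi \right) ^{-2}\); the kernel and the grid are illustrative choices, not taken from the experiments in the paper.

```python
import numpy as np

# Nystrom-type eigenvalue convergence (Eq. (96)) for the Brownian-motion
# kernel K(s,t) = min(s,t) on [0,1], whose operator eigenvalues are
# lambda_j = 1 / ((j - 1/2)**2 * pi**2).  Illustrative sketch.
N = 1000
t = (np.arange(1, N + 1) - 0.5) / N        # dense monitoring grid
K = np.minimum.outer(t, t)                 # Gram matrix (K)_{mn} = min(t_n, t_m)
nu = np.linalg.eigvalsh(K)[::-1]           # eigenvalues, in decreasing order

j = np.arange(1, 6)
lam = 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)  # exact operator eigenvalues
print(nu[:5] / N)  # Nystrom approximations nu_j / N
print(lam)         # approx. [0.4053, 0.0450, 0.0162, 0.0083, 0.0050]
```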

Therefore, the suitably rescaled spectrum of the Gram matrix \({\mathbf {K}}_i\) converges to the spectrum of the covariance operator \({\mathcal {K}}_i\). In particular, the ratio of the determinants of the Gram matrices,

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\left| {\mathbf {K}}_1\right| }{\left| {\mathbf {K}}_0\right| } = \lim _{N \rightarrow \infty } \prod _{j=1}^N \frac{\nu _{1j}}{\nu _{0j}} = \lim _{N \rightarrow \infty } \prod _{j = 1}^N \frac{\lambda _{1j}}{\lambda _{0j}} \equiv \frac{\left| {\mathcal {K}}_1\right| }{\left| {\mathcal {K}}_0\right| }, \end{aligned}$$
(97)

can be used to define the ratio \( \frac{\left| {\mathcal {K}}_1\right| }{\left| {\mathcal {K}}_0\right| } \) when the corresponding Gaussian processes are equivalent (\({\mathbb {P}}_0 \sim {\mathbb {P}}_1\)), in which case the limit exists (is finite) and is different from zero.
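The dichotomy can be made concrete with a hypothetical pair of covariances \(K_1 = \sigma ^2 K_0\) with \(\sigma ^2 \ne 1\): every eigenvalue ratio equals \(\sigma ^2\), so the product in Eq. (97) behaves like \((\sigma ^2)^N\), which diverges (or vanishes) as \(N \rightarrow \infty \), and the two zero-mean Gaussian measures are mutually singular. A minimal sketch of this divergence, with the illustrative value \(\sigma ^2 = 1.1\):

```python
import numpy as np

# Log of the determinant ratio in Eq. (97) for K_1 = sigma2 * K_0.
# All eigenvalue ratios equal sigma2, so the log-ratio grows like
# N * log(sigma2): the measures are mutually singular (illustrative sketch).
def log_det_ratio(K0, K1):
    nu0 = np.linalg.eigvalsh(K0)       # eigenvalues of each Gram matrix
    nu1 = np.linalg.eigvalsh(K1)
    return np.sum(np.log(nu1) - np.log(nu0))

for N in (50, 100, 200):
    t = (np.arange(1, N + 1) - 0.5) / N
    K0 = np.minimum.outer(t, t)        # Brownian-motion Gram matrix
    K1 = 1.1 * K0                      # same kernel, variance rescaled
    print(N, log_det_ratio(K0, K1))    # grows like N * log(1.1)
```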

B Setup for the experiment with financial data

The setup of the experiment is as follows. Let \(\left\{ S_i(t_0), S_i(t_1),\ldots , S_i(t_L) \right\} \) be the time series of market prices for stock \(i\), monitored at the equally spaced instants

$$\begin{aligned} t_n = t_0 + n \varDelta T; \ n = 0,1,\ldots , L, \end{aligned}$$

where \( L = M (N_B + 1) - 1\). In the data analyzed, \(\varDelta T\) is one day; the quantity \(S_i(t_n)\) is therefore the closing price of the corresponding stock on the \(n\)th day of the period considered.

The time series is broken up into \(M\) segments of length \(N_B +1\), with \(N_B = 2^B\) for some integer \(B\):

$$\begin{aligned} \left\{ S_i\left( t_0^{[m]}\right) , S_i\left( t_1^{[m]}\right) , \ldots , S_i\left( t_{N_{B}}^{[m]}\right) \right\} _{m=1}^M, \end{aligned}$$

where \(t_n^{[m]} = t_{n+ (m-1)(N_B+1)}\), with \(n = 0, 1, \ldots , N_B\), and \(m = 1,2,\ldots , M\), so that the segments are consecutive and non-overlapping and the final instant is \(t_L\). These \(M\) time series of \(N_B+1\) prices are then transformed into the corresponding time series of log-returns

$$\begin{aligned} \left\{ X_i\left( t_0^{[m]}\right) , X_i\left( t_1^{[m]}\right) , \ldots , X_i\left( t_{N_B}^{[m]}\right) \right\} _{m=1}^M, \end{aligned}$$
(98)

where

$$\begin{aligned} X_i\left( t_n^{[m]}\right) = \log \frac{S_i\left( t_{n}^{[m]}\right) }{S_i\left( t_{0}^{[m]}\right) },\quad n = 0,1,\ldots , N_B. \end{aligned}$$
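The segmentation and log-return transform above can be summarized in a short sketch; the geometric-random-walk price series and the values of \(B\) and \(M\) below are illustrative assumptions, not the data used in the experiments.

```python
import numpy as np

# Sketch of the segmentation and log-return transform of Eq. (98).
# The price series is synthetic; B and M are illustrative values.
rng = np.random.default_rng(0)
B, M = 5, 10
N_B = 2 ** B                        # each segment has N_B + 1 prices
L = M * (N_B + 1) - 1
S = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, L + 1)))  # daily closes

# Consecutive, non-overlapping segments of N_B + 1 prices each.
segments = S.reshape(M, N_B + 1)
X = np.log(segments / segments[:, :1])  # X[m, n] = log(S(t_n^[m]) / S(t_0^[m]))
print(X.shape)                          # (M, N_B + 1); X[:, 0] is all zeros
```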

The goal is to discriminate between different stocks on the basis of the corresponding time series of log-returns. In particular, we will analyze how the accuracy of the predictions depends on the monitoring frequency. To this end, discrimination is based on \(N_b+1\) subsampled values within each segment

$$\begin{aligned} \left\{ X_i\left( t_{0}^{[m]}\right) , X_i\left( t_{ n_b}^{[m]}\right) , X_i\left( t_{2 n_b}^{[m]}\right) , \ldots , X_i\left( t_{N_b n_b}^{[m]}\right) \right\} , \end{aligned}$$

where \(N_b = 2^b\), and \(n_b = 2^{B-b}\) with \(b = 0,1,\ldots ,B\). As an illustration, for \(b = 0\), only two inputs in each time series are used for discrimination

$$\begin{aligned} \left\{ X_i\left( t_{0}^{[m]}\right) , X_i\left( t_{N_B}^{[m]}\right) \right\} . \end{aligned}$$

For \(b = B\) (so that \(n_b = 1\)), the complete time series given by Eq. (98) is used as input to the different classifiers. The higher the monitoring frequency, the closer the problem is to a genuinely functional one.
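Continuing the previous sketch, the subsampling for each resolution level \(b\) amounts to slicing every segment with stride \(n_b\):

```python
# Subsampling sketch: keep N_b + 1 = 2**b + 1 equally spaced values per
# segment (X and B come from the previous sketch after Eq. (98)).
for b in range(B + 1):
    n_b = 2 ** (B - b)      # subsampling stride
    X_b = X[:, ::n_b]       # columns 0, n_b, 2*n_b, ..., N_B
    print(b, X_b.shape)     # (M, 2**b + 1)
```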

Cite this article

Torrecilla, J.L., Ramos-Carreño, C., Sánchez-Montañés, M. et al. Optimal classification of Gaussian processes in homo- and heteroscedastic settings. Stat Comput 30, 1091–1111 (2020). https://doi.org/10.1007/s11222-020-09937-7

Keywords

  • Functional data analysis
  • Optimal classification
  • Gaussian processes
  • Reproducing kernel Hilbert spaces
  • Near-perfect classification