Fast DD-classification of functional data

Abstract

A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional space. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the DD-plot, which is a subset of the unit square. This transformation yields a new notion of depth for functional data. Three alternative depth functions are employed for this, as well as two rules for the final classification in \([0,1]^2\). The resulting classifier has to be cross-validated over a small range of parameters only, which is restricted by a Vapnik–Chervonenkis bound. The entire methodology does not involve smoothing techniques, is completely nonparametric and makes it possible to achieve Bayes optimality under standard distributional settings. It is robust, efficiently computable, and has been implemented in an R environment. Applicability of the new approach is demonstrated by simulations as well as by a benchmark study.


References

  • Baíllo A, Cuevas A (2008) Supervised functional classification: a theoretical remark and some comparisons. arXiv:0806.2831v1 [stat.ML]

  • Biau G, Bunea F, Wegkamp MH (2005) Functional classification in Hilbert spaces. IEEE Trans Inf Theory 51:2163–2172

  • Cambanis S (1973) On some continuity and differentiability properties of paths of Gaussian processes. J Multivar Anal 3:420–434

  • Carey JR, Liedo P, Müller H-G, Wang J-L, Chiou J-M (1998) Relationship of age patterns of fecundity to mortality, longevity, and lifetime reproduction in a large cohort of Mediterranean fruit fly females. J Gerontol 53A:B245–B251

  • Chakraborty A, Chaudhuri P (2014) On data depth in infinite dimensional spaces. Ann Inst Stat Math 66:303–324

  • Cuesta-Albertos JA, Febrero-Bande M, Oviedo de la Fuente M (2015) The DD\(^G\)-classifier in the functional setting. arXiv:1501.00372 [stat.ME]

  • Cuesta-Albertos JA, Nieto-Reyes A (2008) The random Tukey depth. Comput Stat Data Anal 52:4979–4988

  • Cuesta-Albertos JA, Nieto-Reyes A (2010) Functional classification and the random Tukey depth. Practical issues. In: Borgelt C, Rodríguez GG, Trutschnig W, Lubiano MA, Angeles Gil M, Grzegorzewski P, Hryniewicz O (eds) Combining soft computing and statistical methods in data analysis. Springer, Berlin/Heidelberg, pp 123–130

  • Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22:481–496

  • Delaigle A, Hall P (2012) Achieving near-perfect classification for functional data. J R Stat Soc Ser B 74:267–286

  • Delaigle A, Hall P, Bathia N (2012) Componentwise classification and clustering of functional data. Biometrika 99:299–313

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York

  • Dutta S, Ghosh AK (2012a) On robust classification using projection depth. Ann Inst Stat Math 64:657–676

  • Dutta S, Ghosh AK (2012b) On classification based on \(L_p\) depth with an adaptive choice of \(p\). Technical Report Number R5/2011, Statistics and Mathematics Unit. Indian Statistical Institute, Kolkata

  • Ferraty F, Hall P, Vieu P (2010) Most-predictive design points for functional data predictors. Biometrika 97:807–824

  • Ferraty F, Vieu P (2003) Curves discrimination: a nonparametric functional approach. Comput Stat Data Anal 44:161–173

  • Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York

  • Ferré L, Villa N (2006) Multi-layer perceptron with functional inputs: an inverse regression approach. Scand J Stat 33:807–823

  • Fraiman R, Muniz G (2001) Trimmed means for functional data. TEST 10:419–440

  • Ghosh AK, Chaudhuri P (2005) On maximum depth and related classifiers. Scand J Stat 32:327–350

  • Hall P, Poskitt D, Presnell B (2001) A functional data-analytic approach to signal discrimination. Technometrics 43:1–9

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30

  • Huang D-S, Zheng C-H (2006) Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22:1855–1862

  • James G, Hastie T (2001) Functional linear discriminant analysis for irregularly sampled curves. J R Stat Soc Ser B 63:533–550

  • Kuelbs J, Zinn J (2013) Concerns with functional depth. Lat Am J Probab Math Stat 10:831–855

  • Lange T, Mosler K, Mozharovskyi P (2014a) Fast nonparametric classification based on data depth. Stat Pap 55:49–69

  • Lange T, Mosler K, Mozharovskyi P (2014b). \(DD\alpha \)-classification of asymmetric and fat-tailed data. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data analysis, machine learning and knowledge discovery. Springer, Berlin, pp 71–78

  • Leng XY, Müller H-G (2006) Classification using functional data analysis for temporal gene expression data. Bioinformatics 22:68–76

  • Li J, Cuesta-Albertos JA, Liu RY (2012) \(DD\)-classifier: nonparametric classification procedure based on \(DD\)-plot. J Am Stat Assoc 107:737–753

  • Liu X, Zuo Y (2014) Computing projection depth and its associated estimators. Stat Comput 24:51–63

  • López-Pintado S, Romo J (2006) Depth-based classification for functional data. In: Liu R, Serfling R, Souvaine D (eds) Data depth: robust multivariate analysis. American Mathematical Society, Computational Geometry and Applications, pp 103–120

  • Mahalanobis P (1936) On the generalized distance in statistics. Proc Natl Acad India 12:49–55

  • Mosler K, Polyakova Y (2012) General notions of depth for functional data. arXiv:1208.1981v1 [stat.ME]

  • Mozharovskyi P, Mosler K, Lange T (2015) Classifying real-world data with the \(DD\alpha \)-procedure. Adv Data Anal Classif 9:287–314

  • Müller H-G, Stadtmüller U (2005) Generalized functional linear models. Ann Stat 33:774–805

  • Nagy S, Gijbels I, Hlubinka D (2015) Weak convergence of discretely observed functional data with applications. J Multivar Anal. doi:10.1016/j.jmva.2015.06.006

  • Ramsay JO, Silverman BW (2005) Functional data analysis. Springer series in statistics, 2nd edn. Springer, Berlin

  • Rossi F, Villa N (2006) Support vector machine for functional data classification. Neurocomputing 69:730–742

  • Serfling R (2002) A depth function and a scale curve based on spatial quantiles. In: Dodge Y (ed) Statistics and data analysis based on L\(_1\)-norm and related methods. Birkhäuser, pp 25–38

  • Sguera C, Galeano P, Lillo RE (2014) Spatial depth-based classification for functional data. TEST 23:725–750

  • Tian ST, James G (2013) Interpretable dimensionality reduction for classifying functional data. Comput Stat Data Anal 57:282–296

  • Tuddenham R, Snyder M (1954) Physical growth of California boys and girls from birth to eighteen years. University of California Press, Berkeley

  • Vapnik VN, Chervonenkis AYa (1974) Teorija raspoznavanija obrazov (statisticheskie problemy obuchenija) (The theory of pattern recognition (statistical learning problems), in Russian). Nauka, Moscow

  • Vardi Y, Zhang CH (2000) The multivariate \(L_1\)-median and associated data depth. Proc Natl Acad Sci USA 97:1423–1426

  • Vasil’ev VI, Lange T (1998) The duality principle in learning for pattern recognition (in Russian). Kibern i Vytschislit’elnaya Tech 121:7–16

  • Vencálek O (2011) Weighted data depth and depth based discrimination. Doctoral thesis. Charles University, Prague

  • Wang XH, Ray S, Mallick BK (2007) Bayesian curve classification using wavelets. J Am Stat Assoc 102:962–973

  • Zuo YJ, Serfling R (2000) General notions of statistical depth function. Ann Stat 28:461–482

Acknowledgments

We thank Dominik Liebl for his critical comments on an earlier version of the manuscript, as well as Ondrej Vencalek and Aurore Delaigle for their helpful remarks. The reading and suggestions of two referees are also gratefully acknowledged.

Author information

Corresponding author

Correspondence to Pavlo Mozharovskyi.

Appendices

Appendix 1: Implementation details

In calculating the depths, \(\mu _Y\) and \(\Sigma _Y\) for the Mahalanobis depth have been determined by the usual moment estimates, and similarly \(\Sigma _Y\) for the spatial depth. The projection depth has been approximated by drawing 1000 directions from the uniform distribution on the unit sphere. Clearly, the number of directions needed for a satisfactory approximation depends on the dimension of the space. Observe that for higher-dimensional problems 1000 directions are not enough, which becomes apparent from the analysis of Model 2 in Sect. 7.2, where the chosen location-slope spaces have dimension eight and higher; see also Tables 4 and 8 in Appendix 2. On the other hand, calculating the projection depth is costly even in one dimension: computing 1000 directions to approximate the projection depth takes substantially more time than computing the exact Mahalanobis or spatial depths (see Tables 2 and 14 in Appendix 2).
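
The following minimal sketch in base R illustrates this approximation (it is not the ddalpha implementation; the function name and the use of the raw, unscaled MAD are our assumptions):

```r
# Sketch: approximate the projection depth of point x w.r.t. sample X
# by maximizing the outlyingness over ndir random directions.
proj_depth_approx <- function(x, X, ndir = 1000) {
  d   <- ncol(X)
  out <- 0
  for (i in seq_len(ndir)) {
    u <- rnorm(d)
    u <- u / sqrt(sum(u^2))              # direction uniform on the unit sphere
    z <- as.vector(X %*% u)              # project the sample onto u
    o <- abs(sum(x * u) - median(z)) /
      mad(z, constant = 1)               # univariate outlyingness (raw MAD)
    out <- max(out, o)
  }
  1 / (1 + out)                          # depth = 1 / (1 + outlyingness)
}
```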

LDA and QDA are used with classical moment estimates, and priors are estimated by the class proportions in the training set. The kNN-classifier is applied to location-slope data in its affine invariant form, based on the covariance matrix of the pooled classes. To save computation time, its parameter k is determined by leave-one-out cross-validation over a reduced range, viz. \(k\in \{1, \dots , \max \{\min \{10(m+n)^{1/d}+1,m+n-1\},2\}\}\). The \(\alpha \)-procedure separating the DD-plot uses polynomial space extensions of maximum degree three; the degree is selected by cross-validation. To keep the training speed of the depth-based kNN-classifier comparable with that of the \(DD\alpha \)-classifier, we also determine its k by leave-one-out cross-validation on a reduced range, \(k\in \{1, \dots , \max \{\min \{10\sqrt{m+n}+1,(m+n)/2\},2\}\}\).
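
For quick reference, the upper ends of the two reduced search ranges can be written as follows (a sketch; m and n are the class sizes, d the dimension of the location-slope space, and the rounding to an integer is our assumption):

```r
# Upper ends of the reduced search ranges for k (sketch, rounding assumed).
k_max_knn <- function(m, n, d)           # affine invariant kNN
  max(min(floor(10 * (m + n)^(1 / d)) + 1, m + n - 1), 2)

k_max_ddknn <- function(m, n)            # depth-based kNN on the DD-plot
  max(min(floor(10 * sqrt(m + n)) + 1, (m + n) %/% 2), 2)
```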

Due to linear interpolation, the levels are integrated as piecewise-linear functions and the derivatives as piecewise-constant ones. If the dimension of the location-slope space is too large (in particular for inverting the covariance matrix, as can be the case in Model 2), PCA is used to reduce the dimension. Then \(\epsilon _{max}\) is estimated and all further computations are performed in the subspace of principal components having positive loadings.
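
To fix ideas, a sketch of the corresponding transform for a single curve follows: levels are averaged as integrals of the piecewise-linear interpolant, and slopes as chord slopes of the interpolant. The function name and the use of equal-length subintervals for L, S ≥ 1 are our assumptions:

```r
# Sketch: map one discretized curve (t, y) to the (L + S)-dimensional
# location-slope space by averaging levels and slopes over subintervals.
location_slope <- function(t, y, L, S) {
  f <- approxfun(t, y)                   # piecewise-linear interpolant
  avg_level <- function(a, b)            # mean level on [a, b]
    integrate(f, a, b)$value / (b - a)
  avg_slope <- function(a, b)            # mean derivative = chord slope
    (f(b) - f(a)) / (b - a)
  brL <- seq(min(t), max(t), length.out = L + 1)
  brS <- seq(min(t), max(t), length.out = S + 1)
  c(mapply(avg_level, brL[-(L + 1)], brL[-1]),
    mapply(avg_slope, brS[-(S + 1)], brS[-1]))
}
```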

To construct the location-slope space, first all pairs \((L,S)\) satisfying \(2\le L+S\le M/2\) are considered. (\(M/2\) amounts to 26 for the synthetic and to 16 for the real data sets.) For each \((L,S)\) the data are transformed to \(\mathbb {R}^{L+S}\), and the Vapnik–Chervonenkis bound \(\epsilon _{max}\) is calculated. Then the five pairs having the smallest \(\epsilon _{max}\) are selected. Tied values of \(\epsilon _{max}\) are taken into account as well, so that on average slightly more than five pairs are selected; see the growth data in Table 2 and both synthetic models in Table 14 of Appendix 2. Finally, among these the best \((L,S)\)-pair is chosen by means of cross-validation; a sketch of this selection step is given below. Note that the goal of this cross-validation is not to actually choose a best location-slope dimension but rather to discard obviously misleading \((L,S)\)-pairs, which may nevertheless yield relatively small values of \(\epsilon _{max}\). This is seen from Figs. 4 and 5. When determining an optimal \((L,S)\)-pair by crossLS, the same set of \((L,S)\)-pairs is considered as with VCcrossLS.
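
The selection step can be sketched as follows. We assume the classical Vapnik bound on the true risk of linear rules in \(\mathbb {R}^d\) (VC dimension \(h = d + 1\)); the exact form of \(\epsilon _{max}\) used in the paper may differ:

```r
# Sketch: a classical VC bound for a linear rule in R^d, and the selection
# of the five (L, S) pairs with the smallest bound (ties included).
eps_max <- function(emp_risk, d, N, eta = 0.05) {
  h <- d + 1                             # VC dimension of linear rules in R^d
  emp_risk + sqrt((h * (log(2 * N / h) + 1) - log(eta / 4)) / N)
}

select_pairs <- function(pairs, eps) {   # pairs: data.frame(L, S); eps: bound per row
  cutoff <- sort(eps)[min(5, length(eps))]
  pairs[eps <= cutoff, ]                 # ties may give more than five pairs
}
```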

In implementing the componentwise method of finite-dimensional space synthesis (crossDHB) we have followed Delaigle et al. (2012) with slight modifications, combining their original approach with the sequential approach of Ferraty et al. (2010). Initially, a grid of equally spaced (distance \(\Delta t\)) discretization points is built. Then a sequence of finite-dimensional spaces is synthesized by adding points of the grid step by step. We start with all pairs of discretization points that are at least \(2\Delta t\) apart. [Note that Delaigle et al. (2012) start with single points instead of pairs.] The best of them is chosen by cross-validation. Then features are added step by step: in each step, the point having the best discrimination power (again in the sense of cross-validation) when added to the already constructed set is chosen as the new feature; see the sketch below. The resulting set of points is used to construct a neighborhood of combinations to be considered further. As a neighborhood we use twenty \(2\Delta t\)-distanced points in the second step and ten in the third; from the fourth step on, only the sequential approach is applied.
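
A minimal sketch of this greedy step follows; cv_error is a hypothetical helper returning the cross-validated error of the finite-dimensional classifier built on the given grid points:

```r
# Sketch of one step of the componentwise (crossDHB) forward search.
# cv_error(points) is a hypothetical helper: cross-validated error of the
# classifier that uses the curve values at 'points' as features.
forward_step <- function(selected, grid, cv_error) {
  cand <- setdiff(grid, selected)
  errs <- vapply(cand, function(p) cv_error(c(selected, p)), numeric(1))
  c(selected, cand[which.min(errs)])     # add the most discriminating point
}
```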

All our cross-validations are tenfold, except the leave-one-out cross-validations determining k for the two kNN-classifiers. Of course, partitioning the sample into only ten parts may put our approach at a disadvantage against a more comprehensive leave-one-out cross-validation. We have chosen it to keep the computation times of the crossDHB approach (Delaigle et al. 2012) within practical limits and also to make the comparison of approaches equitable throughout our study. A generic sketch of such a tenfold loop is given below.
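
For concreteness, a tenfold cross-validation loop of the kind used throughout may be sketched as follows (train_fn and predict_fn are placeholders for any of the classifiers above, not part of the ddalpha API):

```r
# Sketch: tenfold cross-validated misclassification rate.
cv10_error <- function(X, y, train_fn, predict_fn) {
  folds <- sample(rep(1:10, length.out = nrow(X)))  # random fold labels
  mean(vapply(1:10, function(f) {
    fit  <- train_fn(X[folds != f, , drop = FALSE], y[folds != f])
    pred <- predict_fn(fit, X[folds == f, , drop = FALSE])
    mean(pred != y[folds == f])                     # error on held-out fold
  }, numeric(1)))
}
```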

The calculations have been implemented in an R environment, based on the R-package "ddalpha" (Mozharovskyi et al. 2015), with speed-critical parts written in C++. The R code implementing our methodology, as well as that performing the experiments, can be obtained from the authors upon request. In all experiments, one kernel of a Core i7-2600 processor (3.4 GHz) with sufficient physical memory has been used. Regarding the methodology of Delaigle et al. (2012), our implementation thus differs from their original one and, due to its module-based structure, may result in larger computation times. For this reason we report the number of cross-validations performed; see Tables 2 and 14 of Appendix 2. The comparison appears to be fair, as we always use tenfold cross-validation together with an identical set of classification rules in the finite-dimensional spaces.

Appendix 2: Additional tables

See Tables 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14.

Table 3 Frequency (in %) of selected location-slope dimensions using the Vapnik–Chervonenkis bound; Model 1
Table 4 Frequency (in %) of selected location-slope dimensions using the Vapnik–Chervonenkis bound; Model 2
Table 5 Frequency (in %) of location-slope dimensions chosen using the Vapnik–Chervonenkis bound; growth data
Table 6 Frequency (in %) of location-slope dimensions chosen using the Vapnik–Chervonenkis bound; medflies data
Table 7 Frequency (in %) of selected location-slope dimensions using cross-validation; Model 1
Table 8 Frequency (in %) of selected location-slope dimensions using cross-validation; Model 2
Table 9 Frequency (in %) of location-slope dimensions chosen using cross-validation; growth data
Table 10 Frequency (in %) of location-slope dimensions chosen using cross-validation; medflies data
Table 11 Frequency (in %) of selected dimensions using componentwise method; Model 1
Table 12 Frequency (in %) of selected dimensions using componentwise method; Model 2
Table 13 Frequency (in %) of selected dimensions using componentwise method; growth data
Table 14 Average (median for componentwise classification = crossDHB) training and classification (in parentheses) times (in s), and numbers of cross-validations performed (in square brackets), over 100 tries

Cite this article

Mosler, K., Mozharovskyi, P. Fast DD-classification of functional data. Stat Papers 58, 1055–1089 (2017). https://doi.org/10.1007/s00362-015-0738-3
