Abstract
We consider the classification of sparse functional data, which arise frequently in longitudinal studies and other scientific experiments. To utilize the information from not only the functional trajectories but also the observed class labels, we propose a probability-enhanced method, realized through the weighted support vector machine by virtue of its Fisher consistency, to estimate the effective dimension reduction space. Since only a few measurements are available for some, or even all, individuals, a cumulative slicing approach is suggested to borrow information across individuals. We justify the validity of the probability-based effective dimension reduction space and provide a straightforward implementation that yields a low-dimensional projection space ready for standard classifiers. The empirical performance is illustrated through simulated and real examples, particularly in contrast to classification results based on the prominent functional principal component analysis.
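To make the weighted support vector machine idea concrete, the following is a minimal sketch, not the authors' functional-data implementation. By Fisher consistency, the population minimizer of the \(\pi\)-weighted hinge loss classifies a point as \(+1\) exactly when \(p(x)>\pi\), so sweeping \(\pi\) over a grid and recording where the predicted label flips brackets \(p(x)\). The toy bivariate predictors below stand in for the low-dimensional scores (e.g., FPCA scores) that would represent the functional trajectories; all names and tuning choices here are illustrative assumptions.

```python
# Sketch: estimating p(x) = P(Y = 1 | x) via a family of weighted SVMs.
# Hypothetical toy setup, not the paper's sparse-functional implementation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))   # true P(Y = 1 | x)
y = np.where(rng.uniform(size=n) < p_true, 1, -1)

def wsvm_probability(X, y, x_new, pis=np.linspace(0.05, 0.95, 19)):
    """Estimate P(Y=1|x_new) by the largest pi at which the
    pi-weighted SVM still predicts +1 (Fisher consistency)."""
    est = np.zeros(x_new.shape[0])
    for pi in pis:
        # weight (1 - pi) on the positive class, pi on the negative class
        w = np.where(y == 1, 1.0 - pi, pi)
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(x_new)
        # p(x) exceeds this pi wherever the prediction is still +1
        est = np.where(pred == 1, pi, est)
    return est + 0.025  # midpoint of the bracketing grid step

x_grid = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
p_hat = wsvm_probability(X, y, x_grid)
print(p_hat)  # increasing in the first coordinate
```

In the paper's setting these estimated probabilities, rather than the raw labels, drive the cumulative-slicing estimate of the effective dimension reduction space; the sketch only illustrates the probability-enhancement step.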
References
Aizerman MA, Braverman EA, Rozonoer L (1964) Theoretical foundations of the potential function method in pattern recognition learning. Autom Remote Control 25:821–837
Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404
Berkey CS, Laird NM, Valadian I, Gardner J (1991) Modelling adolescent blood pressure patterns and their prediction of adult pressures. Biometrics 47(3):1005–1018
Besse P, Ramsay JO (1986) Principal components analysis of sampled functions. Psychometrika 51(2):285–311
Biau G, Bunea F, Wegkamp MH (2005) Functional classification in Hilbert spaces. IEEE Trans Inf Theory 51:2163–2172
Boente G, Fraiman R (2000) Kernel-based functional principal components. Stat Probab Lett 48(4):335–345
Bongiorno EG, Salinelli E, Goia A, Vieu P (eds) (2014) Contributions in infinite-dimensional statistics and related topics. Società Editrice Esculapio, Bologna
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory (COLT’92). ACM, New York, pp 144–152
Bredensteiner EJ, Bennett KP (1999) Multicategory classification by support vector machines. Comput Optim Appl 12:53–79
Cai TT, Hall P (2006) Prediction in functional linear regression. Ann Stat 34(5):2159–2179
Cai TT, Yuan M (2012) Minimax and adaptive prediction for functional linear regression. J Am Stat Assoc 107:1201–1216
Cardot H, Ferraty F, Mas A, Sarda P (2003a) Testing hypotheses in the functional linear model. Scand J Stat Theory Appl 30(1):241–255
Cardot H, Ferraty F, Sarda P (2003b) Spline estimators for the functional linear model. Stat Sin 13(3):571–591
Castro PE, Lawton WH, Sylvestre EA (1986) Principal modes of variation for processes with continuous sample curves. Technometrics 28:329–337
Chang CC, Chien LJ, Lee YJ (2011) A novel framework for multi-class classification via ternary smooth support vector machine. Pattern Recognit 44:1235–1244
Chen D, Hall P, Müller HG (2011) Single and multiple index functional regression models with nonparametric link. Ann Stat 39:1720–1747
Chiaromonte F, Cook R, Li B (2002) Sufficient dimension reduction in regressions with categorical predictors. Ann Stat 30:475–497
Cook RD (1998) Regression graphics: ideas for studying regressions through graphics. Wiley, New York
Cook RD, Weisberg S (1991) Comment on sliced inverse regression for dimension reduction. J Am Stat Assoc 86:328–332
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Cover TM (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput 14:326–334
Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
Cuevas A, Febrero M, Fraiman R (2002) Linear functional regression: the case of fixed design and functional response. Can J Stat 30(2):285–300
Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22:481–496
Delaigle A, Hall P (2012) Achieving near perfect classification for functional data. J R Stat Soc Ser B 74:267–286
Duan N, Li KC (1991) Slicing regression: a link-free regression method. Ann Stat 19:505–530
Escabias M, Aguilera AM, Valderrama MJ (2004) Principal component estimation of functional logistic regression: discussion of two different approaches. J Nonparametric Stat 16(3–4):365–384
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman and Hall, London
Faraway JJ (1997) Regression analysis for a functional response. Technometrics 39(3):254–261
Ferraty F, Vieu P (2003) Curves discrimination: a nonparametric functional approach. Comput Stat Data Anal 44:161–173
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferré L, Yao AF (2003) Functional sliced inverse regression analysis. Statistics 37:475–488
Ferré L, Yao AF (2005) Smoothed functional inverse regression. Stat Sin 15:665–683
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
Gasser T, Kneip A (1995) Searching for structure in curve samples. J Am Stat Assoc 90:1179–1188
Gervini D, Gasser T (2004) Self-modeling warping functions. J R Stat Soc Ser B (Stat Methodol) 66:959–971
Gervini D, Gasser T (2005) Nonparametric maximum likelihood estimation of the structural mean of a sample of curves. Biometrika 92(4):801–820
Guo W (2002) Functional mixed effects models. Biometrics 58:121–128
Hall P, Horowitz JL (2007) Methodology and convergence rates for functional linear regression. Ann Stat 35:70–91
Hall P, Hosseini-Nasab M (2006) On properties of functional principal components analysis. J R Stat Soc Ser B (Stat Methodol) 68(1):109–126
Hall P, Müller HG, Wang JL (2006) Properties of principal component methods for functional and longitudinal data analysis. Ann Stat 34(3):1493–1517
He G, Müller HG, Wang JL (2003) Functional canonical analysis for square integrable stochastic processes. J Multivar Anal 85(1):54–77
He X, Wang Z, Jin C, Zheng Y, Xue X (2012) A simplified multi-class support vector machine with reduced dual optimization. Pattern Recognit Lett 33:71–82
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
James GM (2002) Generalized linear models with functional predictors. J R Stat Soc Ser B (Stat Methodol) 64(3):411–432
James GM, Hastie TJ (2001) Functional linear discriminant analysis for irregularly sampled curves. J R Stat Soc Ser B 63:533–550
James GM, Silverman BW (2005) Functional additive model estimation. J Am Stat Assoc 100:565–576
James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87(3):587–602
Jank W, Shmueli G (2006) Functional data analysis in electronic commerce research. Stat Sci 21(2):155–166
Jiang CR, Yu W, Wang JL (2014) Inverse regression for longitudinal data. Ann Stat 42(2):563–591
Jones MC, Rice JA (1992) Displaying the important features of large collections of similar curves. Am Stat 46:140–145
Kimeldorf G, Wahba G (1971) Some results on Tchebycheffian spline functions. J Math Anal Appl 33:82–95
Kirkpatrick M, Heckman N (1989) A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J Math Biol 27(4):429–450
Kneip A, Ramsay JO (2008) Combining registration and fitting for functional models. J Am Stat Assoc 103(483):1155–1165
Kneip A, Utikal KJ (2001) Inference for density families using functional principal component analysis. J Am Stat Assoc 96(454):519–542 (with comments and a rejoinder by the authors)
Lee Y, Lin Y, Wahba G (2004) Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J Am Stat Assoc 99:67–81
Lei E, Yao F, Heckman N, Meyer K (2014) Functional data model for genetically related individuals with application to cow growth. J Comput Graph Stat. doi:10.1080/10618600.2014.948180
Leng X, Müller HG (2006) Classification using functional data analysis for temporal gene expression data. Bioinformatics 22:68–76
Li B, Wang S (2007) On directional regression for dimension reduction. J Am Stat Assoc 102:997–1008
Li KC (1991) Sliced inverse regression for dimension reduction. J Am Stat Assoc 86:316–342
Li KC (1992) On principal Hessian directions for data visualization and dimension reduction: another application of Stein's lemma. J Am Stat Assoc 87:1025–1039
Li Y, Hsing T (2010) Deciding the dimension of effective dimension reduction space for functional and high-dimensional data. Ann Stat 38:3028–3062
Lin X, Carroll RJ (2000) Nonparametric function estimation for cluster data when the predictor is measured without/with error. J Am Stat Assoc 95:520–534
Lin Y, Lee Y, Wahba G (2002) Support vector machines for classification in nonstandard situations. Mach Learn 46:191–202
Liu Y, Shen X (2006) Multicategory \(\psi \)-learning. J Am Stat Assoc 101:500–509
Liu Y, Yuan M (2011) Reinforced multicategory support vector machines. J Comput Graph Stat 20:901–919
Morris JS, Carroll RJ (2006) Wavelet-based functional mixed models. J R Stat Soc Ser B (Stat Methodol) 68(2):179–199
Morris JS, Vannucci M, Brown PJ, Carroll RJ (2003) Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis. J Am Stat Assoc 98(463):573–597 (with comments and a rejoinder by the authors)
Müller HG (2005) Functional modelling and classification of longitudinal data. Scand J Stat Theory Appl 32:223–240
Müller HG (2008) Functional modeling of longitudinal data. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G (eds) Longitudinal data analysis (handbooks of modern statistical methods). Chapman & Hall/CRC, New York, pp 223–252
Müller HG, Stadtmüller U (2005) Generalized functional linear models. Ann Stat 33(2):774–805
Müller HG, Chiou JM, Leng X (2008) Inferring gene expression dynamics via functional regression analysis. BMC Bioinform 9:60
Ramsay J, Silverman B (2002) Applied functional data analysis. Springer series in statistics. Springer, New York
Ramsay JO, Li X (1998) Curve registration. J R Stat Soc Ser B (Stat Methodol) 60(2):351–363
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Ramsay JO, Hooker G, Campbell D, Cao J (2007) Parameter estimation for differential equations: a generalized smoothing approach (with discussion). J R Stat Soc Ser B (Stat Methodol) 69(5):741–796
Rao CR (1958) Some statistical methods for comparison of growth curves. Biometrics 14(1):1–17
Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B 53:233–243
Rice JA (2004) Functional and longitudinal data analysis: perspectives on smoothing. Stat Sin 14:631–647
Rice JA, Wu CO (2001) Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57(1):253–259
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–407
Rosenblatt F (1962) Principles of neurodynamics. Spartan, New York
Shi M, Weiss RE, Taylor JMG (1996) An analysis of paediatric CD4 counts for acquired immune deficiency syndrome using flexible random curves. J R Stat Soc Ser C (Appl Stat) 45:151–163
Shin SJ, Wu Y, Zhang HH, Liu Y (2014) Probability-enhanced sufficient dimension reduction for binary classification. Biometrics 70:546–555
Silverman BW (1996) Smoothed functional principal components analysis by choice of norm. Ann Stat 24(1):1–24
Tuddenham R, Snyder M (1954) Physical growth of California boys and girls from birth to age 18. Univ Calif Publ Child Dev 1:183–364
Vapnik V (1998) Statistical learning theory. Wiley, New York
Vapnik V, Lerner A (1963) Pattern recognition using generalized portrait method. Autom Remote Control 24:774–780
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Wahba G (1990) Spline models for observational data. In: CBMS-NSF regional conference series in applied mathematics, vol 59. SIAM, Philadelphia
Wang J, Shen X, Liu Y (2008) Probability estimation for large-margin classifier. Biometrika 95:149–167
Wang L, Shen X (2006) Multicategory support vector machines, feature selection and solution path. Stat Sin 16:617–634
Wang L, Shen X (2007) On \(l_1\)-norm multiclass support vector machines: methodology and theory. J Am Stat Assoc 102:583–594
Weston J, Watkins C (1999) Support vector machines for multiclass pattern recognition. In: European symposium on artificial neural networks, pp 219–224
Wu Y, Liu Y (2007) Robust truncated-hinge-loss support vector machines. J Am Stat Assoc 102:974–983
Wu Y, Liu Y (2013) Functional robust support vector machines for sparse and irregular longitudinal data. J Comput Graph Stat 22:379–395
Xia Y, Tong H, Li W, Zhu LX (2002) An adaptive estimation of dimension reduction space. J R Stat Soc Ser B (Stat Methodol) 64(3):363–410
Yao F, Lee TCM (2006) Penalized spline models for functional principal component analysis. J R Stat Soc Ser B (Stat Methodol) 68(1):3–25
Yao F, Müller HG, Wang JL (2005a) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100(470):577–590
Yao F, Müller HG, Wang JL (2005b) Functional linear regression analysis for longitudinal data. Ann Stat 33(6):2873–2903
Yao F, Lei E, Wu Y (2015) Effective dimension reduction for sparse functional data. Biometrika. doi:10.1093/biomet/asv006
Yuan M, Cai TT (2010) A reproducing kernel Hilbert space approach to functional linear regression. Ann Stat 38(6):3412–3444
Zhao X, Marron JS, Wells MT (2004) The functional data analysis view of longitudinal data. Stat Sin 14(3):789–808
Zhou L, Huang JZ, Martinez JW, Maity A, Baladandayuthapani V, Carroll RJ (2010) Reduced rank mixed effects models for spatially correlated hierarchical functional data. J Am Stat Assoc 105:390–400
Zhu L, Zhu L, Feng Z (2010) Dimension reduction in regressions through cumulative slicing estimation. J Am Stat Assoc 105:1455–1466
Additional information
This invited paper is discussed in comments available at: doi:10.1007/s11749-015-0471-1; doi:10.1007/s11749-015-0472-0; doi:10.1007/s11749-015-0473-z; doi:10.1007/s11749-015-0474-y; doi:10.1007/s11749-015-0475-x; doi:10.1007/s11749-015-0476-9; doi:10.1007/s11749-015-0477-8.
Appendix: Technical details
Proof of Proposition 1
The proof is similar to that of Lemma 1 in Shin et al. (2014). We first show \(S_{p(X)|X}\subseteq S_{Y|X}\), which is equivalent to showing that, for any \(\{\beta _k\}_{k=1}^K\) such that \(X\perp p(X)\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \), we have \(X\perp Y\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \). Recall that \(Y\{p(x),\varepsilon ^{*}\}\) is 1 if \(\varepsilon ^{*}\le p(x)\) and \(-1\) otherwise. As a consequence, \(X\perp Y\, |\, p(X)\) and \(X\perp Y\,|\,\{p(X), \langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \}\). Since \(X\perp p(X)\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \), we obtain \(X\perp Y \,|\, \langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \) owing to Proposition 4.6 of Cook (1998).
Showing \(S_{Y|X}\subseteq S_{p(X)|X}\) is equivalent to showing that \(Y\perp X\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \Rightarrow X\perp p(X)\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \) for any \(\{\beta _k\}_{k=1}^K\). Since \(Y\perp X \,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \), we have \(E(Y|X)=E(Y|\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle )\) and \(p(X)=E\{(Y+1)/2\,|\,X\}=E(Y|\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle )/2+1/2\). Hence \(X\perp p(X)\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \). \(\square \)
Proof of Theorem 1
It suffices to show that if \(h\,\bot \, \text{ span }({\varSigma }\beta _1,\ldots , {\varSigma }\beta _K)\), then \(\langle h,m(\cdot ,\pi )\rangle =0\). Since \(X\perp p(X)\,|\,\langle \beta _1, X\rangle , \ldots , \langle \beta _K, X\rangle \),
Thus, it is enough to show that \(E(\langle h, X\rangle |\langle \beta _1,X\rangle ,\ldots ,\langle \beta _K,X\rangle )=0\) with probability 1, which is implied by \(E\{E^2(\langle h, X\rangle |\langle \beta _1,X\rangle ,\ldots ,\langle \beta _K,X\rangle )\}=0\). Invoking the linearity condition in Assumption 2 and \(E(\langle \beta _k, X\rangle \langle h, X\rangle )=\langle h, {\varSigma }\beta _k\rangle \), for some constants \(c_0, \ldots , c_K\),
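The displayed equation that concludes the argument is missing from this version of the text; a reconstruction consistent with the surrounding derivation, assuming the linearity condition \(E(\langle h, X\rangle |\langle \beta _1,X\rangle ,\ldots ,\langle \beta _K,X\rangle )=c_0+\sum _{k=1}^K c_k\langle \beta _k,X\rangle \) and a centered predictor \(E(X)=0\), runs as follows.

```latex
% Sketch of the omitted display (reconstructed under the stated assumptions)
\begin{align*}
E\bigl\{E^2(\langle h, X\rangle \mid \langle \beta_1,X\rangle,\ldots,\langle \beta_K,X\rangle)\bigr\}
&= E\Bigl\{\Bigl(c_0+\textstyle\sum_{k=1}^K c_k\langle \beta_k,X\rangle\Bigr)
   \,E(\langle h, X\rangle \mid \langle \beta_1,X\rangle,\ldots,\langle \beta_K,X\rangle)\Bigr\}\\
&= c_0\,E(\langle h, X\rangle)
   +\sum_{k=1}^K c_k\,E(\langle \beta_k,X\rangle\langle h, X\rangle)\\
&= \sum_{k=1}^K c_k\,\langle h, \varSigma\beta_k\rangle = 0,
\end{align*}
```

where the second equality uses iterated expectations and the last uses \(E(\langle h, X\rangle )=0\) together with \(h\,\bot \, \text{ span }({\varSigma }\beta _1,\ldots , {\varSigma }\beta _K)\).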
\(\square \)
Yao, F., Wu, Y. & Zou, J. Probability-enhanced effective dimension reduction for classifying sparse functional data. TEST 25, 1–22 (2016). https://doi.org/10.1007/s11749-015-0470-2
Keywords
- Classification
- Cumulative slicing
- Effective dimension reduction
- Sparse functional data
- Weighted support vector machine