Abstract
In many statistical applications, data are curves measured as functions of a continuous parameter such as time. Despite their functional nature, and because they are observed at discrete time points, these types of data are usually analyzed with multivariate statistical methods that do not take into account the high correlation between observations of a single curve at nearby time points. Functional data analysis methodologies have been developed to solve these types of problems. In order to predict the class membership (multi-category response variable) associated with an observed curve (functional data), a functional generalized logit model is proposed. Baseline-category logit formulations are considered, and their estimation is based on basis expansions of the sample curves of the functional predictor and of the functional parameters. Functional principal component analysis is used to obtain an accurate estimation of the functional parameters and to classify sample curves into the categories of the response variable. The performance of the proposed methodology is studied through an experimental study with simulated and real data.
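As a sketch of the kind of formulation summarized above, a baseline-category logit model with a functional predictor can be written as follows. The notation is assumed here for illustration and does not appear in the abstract: $S$ response categories with the last one taken as baseline, sample curves $x_i(t)$ on an interval $T$, basis functions $\phi_j$, and basis-coefficient vectors $a_i$ and $\beta_s$.

```latex
% Baseline-category logits with a functional predictor (assumed notation)
\[
  \log\frac{\pi_s(x_i)}{\pi_S(x_i)}
    = \alpha_s + \int_T x_i(t)\,\beta_s(t)\,dt,
  \qquad s = 1,\dots,S-1,
\]
% where \pi_s(x_i) = P(Y = s \mid x_i).
% Expanding curves and parameter functions in the same basis,
\[
  x_i(t) = \sum_{j=1}^{p} a_{ij}\,\phi_j(t),
  \qquad
  \beta_s(t) = \sum_{k=1}^{p} \beta_{sk}\,\phi_k(t),
\]
% turns each functional logit into a multivariate one:
\[
  \log\frac{\pi_s(x_i)}{\pi_S(x_i)}
    = \alpha_s + a_i^{\top}\,\Psi\,\beta_s,
  \qquad
  \Psi_{jk} = \int_T \phi_j(t)\,\phi_k(t)\,dt .
\]
```

Under these assumptions the columns of the design matrix $A\Psi$ (with rows $a_i^{\top}\Psi$) tend to be highly correlated, which is one reason to replace them with a small number of functional principal component scores before estimating the logit parameters.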
Additional information
This research was supported by Projects MTM2010-20502 from Dirección General de Investigación del MEC, Spain, and FQM-08068 from Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía, Spain. We thank the referees for their advice.
About this article
Cite this article
Escabias, M., Aguilera, A.M. & Aguilera-Morillo, M.C. Functional PCA and Base-Line Logit Models. J Classif 31, 296–324 (2014). https://doi.org/10.1007/s00357-014-9162-y
DOI: https://doi.org/10.1007/s00357-014-9162-y