Interpretable sparse SIR for functional data

Abstract

We propose a semiparametric framework based on sliced inverse regression (SIR) to address the issue of variable selection in functional regression. SIR is an effective dimension reduction method which computes a linear projection of the predictors onto a low-dimensional space, without loss of information on the regression. To deal with the high dimensionality of the predictors, we consider penalized versions of SIR: ridge and sparse. We extend the variable selection approaches developed for multidimensional SIR to select intervals that form a partition of the domain of the functional predictors. Selecting entire intervals rather than isolated evaluation points improves the interpretability of the estimated coefficients in the functional framework. A fully automated iterative procedure is proposed to find the critical (interpretable) intervals. The approach is shown to be efficient on simulated and real data. The method is implemented in the R package SISIR available on CRAN at https://cran.r-project.org/package=SISIR.
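
For readers unfamiliar with SIR, the core estimator can be sketched in a few lines of base R: slice the response, compute the within-slice means of the (centered) predictors, and take the leading eigenvectors of \(\widehat{\varSigma}^{-1}\widehat{\varGamma}\), where \(\widehat{\varGamma}\) is the between-slice covariance. The sketch below covers only standard, non-penalized SIR (the function name sir_basic and its arguments are ours); the ridge/sparse penalties and the interval search of the proposed method are handled by the SISIR package and are not reproduced here.

    # Minimal sketch of standard (non-penalized) SIR, for illustration only.
    # X: n x p matrix of discretized functional predictors, y: numeric response,
    # H: number of slices, d: dimension of the projection space.
    sir_basic <- function(X, y, H = 10, d = 2) {
      n <- nrow(X)
      Xc <- scale(X, center = TRUE, scale = FALSE)          # center the predictors
      Sigma <- crossprod(Xc) / n                            # covariance of X
      slices <- cut(y, quantile(y, probs = seq(0, 1, length.out = H + 1)),
                    include.lowest = TRUE, labels = FALSE)  # slice the response
      means <- t(sapply(split(seq_len(n), slices),          # within-slice means
                        function(idx) colMeans(Xc[idx, , drop = FALSE])))
      p_h <- as.numeric(table(slices)) / n                  # slice frequencies
      Gamma <- crossprod(means * sqrt(p_h))                 # between-slice covariance
      eig <- eigen(solve(Sigma, Gamma))                     # leading eigenvectors of
      Re(eig$vectors[, seq_len(d), drop = FALSE])           # Sigma^{-1} Gamma
    }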


Acknowledgements

The authors thank the two anonymous referees for relevant remarks and constructive comments on a previous version of the paper.

Author information

Correspondence to Nathalie Villa-Vialaneix.

Appendices

Equivalent expressions for \(R^2(d)\)

In this section, we show that \(R^2(d) = \frac{1}{2}\mathbb{E}\left\|\varPi_d - \widehat{\varPi}_d\right\|^2_F\). We have

$$\begin{aligned} \frac{1}{2}\left\|\varPi_d - \widehat{\varPi}_d\right\|^2_F &= \frac{1}{2}\,\mathrm{Tr}\left[\left(\varPi_d - \widehat{\varPi}_d\right)\left(\varPi_d - \widehat{\varPi}_d\right)^\top\right] \\ &= \frac{1}{2}\,\mathrm{Tr}\left[\varPi_d \varPi_d\right] - \mathrm{Tr}\left[\varPi_d \widehat{\varPi}_d\right] + \frac{1}{2}\,\mathrm{Tr}\left[\widehat{\varPi}_d \widehat{\varPi}_d\right]. \end{aligned}$$

Since the squared norm of an M-orthogonal projector onto a space of dimension d is equal to d, we thus have

$$\frac{1}{2}\left\|\varPi_d - \widehat{\varPi}_d\right\|^2_F = d - \mathrm{Tr}\left[\varPi_d \widehat{\varPi}_d\right],$$

which concludes the proof.
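
This identity can be checked numerically. The short R sketch below is restricted, for simplicity, to the Euclidean case \(M = \mathbb{I}_p\), where \(\varPi_d\) and \(\widehat{\varPi}_d\) are ordinary orthogonal projectors onto two random d-dimensional subspaces (the helper proj is ours).

    # Numerical check of (1/2) ||P - Phat||_F^2 = d - Tr(P Phat), Euclidean case.
    set.seed(1)
    p <- 20; d <- 3
    proj <- function(A) tcrossprod(qr.Q(qr(A)))   # orthogonal projector onto span(A)
    P    <- proj(matrix(rnorm(p * d), p, d))      # plays the role of Pi_d
    Phat <- proj(matrix(rnorm(p * d), p, d))      # plays the role of its estimate
    lhs <- 0.5 * sum((P - Phat)^2)                # (1/2) squared Frobenius norm
    rhs <- d - sum(diag(P %*% Phat))              # d - Tr(Pi_d Pi_d-hat)
    all.equal(lhs, rhs)                           # TRUE up to numerical precision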


Joint choice of the parameters \(\mu _2\) and d

Notations:

  • \(\mathcal{L}_l\) denotes the observations in fold number l and \(\overline{\mathcal{L}_l}\) the remaining observations;

  • \(\hat{A}(\mathcal{L}, \mu_2, d)\) and \(\hat{C}(\mathcal{L}, \mu_2, d)\) are the minimizers of the ridge regression problem restricted to observations \(i \in \mathcal{L}\). Note that, for \(d_1 < d_2\), \(\hat{A}(\mathcal{L}, \mu_2, d_1)\) consists of the first \(d_1\) columns of \(\hat{A}(\mathcal{L}, \mu_2, d_2)\) (and similarly for \(\hat{C}(\mathcal{L}, \mu_2, d)\));

  • \(\hat{p}_h^\mathcal{L}\), \(\overline{X}_h^\mathcal{L}\), \(\overline{X}^\mathcal{L}\) and \(\widehat{\varSigma}^\mathcal{L}\) are, respectively, the slice frequencies, the conditional means of X given the slices, the mean of X and the covariance of X, computed from observations \(i \in \mathcal{L}\);

  • \(\widehat{\varPi}_{d,\mu_2}^{\mathcal{L}}\) is the \((\widehat{\varSigma}^{\mathcal{L}}+\mu_2\mathbb{I}_p)\)-orthogonal projector onto the space spanned by the first d columns of \(\hat{A}(\mathcal{L},\mu_2,d_0)\), and \(\widehat{\varPi}_{d,\mu_2}\) denotes \(\widehat{\varPi}_{d,\mu_2}^{\mathcal{L}}\) for \(\mathcal{L} = \{1,\ldots,n\}\) (an illustrative computation of these quantities is sketched after this list).
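
To make this notation concrete, the R sketch below computes, for a given subset \(\mathcal{L}\) of observations, the slice frequencies, the conditional means, the covariance and an M-orthogonal projector with \(M = \widehat{\varSigma}^{\mathcal{L}} + \mu_2 \mathbb{I}_p\). As a stand-in for the first d columns of \(\hat{A}(\mathcal{L}, \mu_2, d_0)\), it uses the leading eigenvectors of \((\widehat{\varSigma}^{\mathcal{L}} + \mu_2 \mathbb{I}_p)^{-1}\widehat{\varGamma}^{\mathcal{L}}\), a usual ridge-regularized SIR estimate; the ridge regression problem that actually defines \(\hat{A}\) and \(\hat{C}\), as well as the cross-validation criterion built from \(\widehat{\varPi}_{d,\mu_2}^{\mathcal{L}_l}\), are not reproduced here. The function fold_quantities is ours, for illustration only.

    # Sketch of the per-fold quantities, for a subset L of observation indices.
    # slices: integer vector giving the slice of each observation.
    fold_quantities <- function(X, slices, L, mu2, d) {
      XL <- X[L, , drop = FALSE]
      sL <- slices[L]
      n <- nrow(XL); p <- ncol(XL)
      p_h    <- as.numeric(table(sL)) / n                       # slice frequencies
      Xbar   <- colMeans(XL)                                    # mean of X on L
      Xbar_h <- t(sapply(split(seq_len(n), sL),                 # conditional means
                         function(idx) colMeans(XL[idx, , drop = FALSE])))
      Sigma  <- cov(XL) * (n - 1) / n                           # covariance of X on L
      Gamma  <- crossprod(sweep(Xbar_h, 2, Xbar) * sqrt(p_h))   # between-slice covariance
      M      <- Sigma + mu2 * diag(p)                           # regularized metric
      A      <- Re(eigen(solve(M, Gamma))$vectors[, seq_len(d), drop = FALSE])
      # (Sigma-hat + mu2 I)-orthogonal projector onto the span of the d directions
      Pi     <- A %*% solve(crossprod(A, M %*% A), crossprod(A, M))
      list(p_h = p_h, Xbar_h = Xbar_h, Xbar = Xbar, Sigma = Sigma, Pi = Pi)
    }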


Cite this article

Picheny, V., Servien, R. & Villa-Vialaneix, N. Interpretable sparse SIR for functional data. Stat Comput 29, 255–267 (2019). https://doi.org/10.1007/s11222-018-9806-6


Keywords

  • Functional regression
  • SIR
  • Lasso
  • Ridge regression
  • Interval selection