A Functional Approach to Variable Selection in Spectrometric Problems

  • Fabrice Rossi
  • Damien François
  • Vincent Wertz
  • Michel Verleysen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4131)

Abstract

In spectrometric problems, objects are characterized by high-resolution spectra that correspond to hundreds to thousands of variables. In this context, even fast variable selection methods incur a high computational load. However, spectra are generally smooth and can therefore be accurately approximated by splines. In this paper, we propose to use a B-spline expansion as a pre-processing step before variable selection, in which the original variables are replaced by the coefficients of the B-spline expansions. Using a simple leave-one-out procedure, the optimal number of B-spline coefficients can be found efficiently. As there are generally an order of magnitude fewer coefficients than original spectral variables, selecting optimal coefficients is faster than selecting variables. Moreover, a B-spline coefficient depends only on a limited range of the original variables, which preserves the interpretability of the selected variables. We demonstrate the benefits of the proposed method on real-world data.
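
The pre-processing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform clamped knots, the cubic degree, the synthetic spectra, and the function names (`bspline_basis`, `loo_error`, `select_n_coef`) are all assumptions made for illustration. The sketch also assumes that the leave-one-out error is taken over the sampling points of each spectrum and computed with the standard hat-matrix shortcut for linear least squares, which avoids refitting once per left-out point.

```python
import numpy as np

def bspline_basis(x, n_coef, degree=3):
    """Design matrix (len(x), n_coef) of B-splines on uniform clamped knots.

    Built with the Cox-de Boor recursion; requires n_coef >= degree + 1.
    """
    lo, hi = x.min(), x.max()
    breaks = np.linspace(lo, hi, n_coef - degree + 1)
    knots = np.concatenate([[lo] * degree, breaks, [hi] * degree])
    n0 = len(knots) - 1
    # Degree-0 splines: indicators of the half-open knot intervals.
    B = np.zeros((len(x), n0))
    for j in range(n0):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Assign the right endpoint to the last non-empty interval.
    B[x == hi, n0 - degree - 1] = 1.0
    for d in range(1, degree + 1):
        Bn = np.zeros((len(x), n0 - d))
        for j in range(n0 - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                Bn[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B

def loo_error(x, spectra, n_coef, degree=3):
    """Mean squared leave-one-out error over sampling points and spectra.

    spectra has shape (n_spectra, len(x)). For a least-squares fit, the
    leave-one-out residual at point i equals residual_i / (1 - h_ii),
    where h_ii is the i-th diagonal entry of the hat matrix.
    """
    B = bspline_basis(x, n_coef, degree)
    coef, *_ = np.linalg.lstsq(B, spectra.T, rcond=None)
    resid = spectra.T - B @ coef
    Q, _ = np.linalg.qr(B)          # reduced QR: hat matrix is Q @ Q.T
    h = np.sum(Q ** 2, axis=1)      # leverages h_ii
    return float(np.mean((resid / (1.0 - h)[:, None]) ** 2))

def select_n_coef(x, spectra, candidates, degree=3):
    """Return the candidate number of coefficients with the lowest LOO error."""
    errors = {k: loo_error(x, spectra, k, degree) for k in candidates}
    return min(errors, key=errors.get), errors

# Synthetic smooth "spectra" (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
centers = rng.uniform(0.2, 0.8, size=(20, 1))
spectra = np.exp(-((x - centers) ** 2) / 0.01) + 0.01 * rng.standard_normal((20, 200))

best, errors = select_n_coef(x, spectra, range(5, 41, 5))
B = bspline_basis(x, best)
# The B-spline coefficients replace the 200 original variables of each spectrum.
coefficients = np.linalg.lstsq(B, spectra.T, rcond=None)[0].T
print(best, coefficients.shape)
```

Each column of `B` is non-zero only on a few adjacent knot intervals, which is the locality property the abstract relies on for interpretability: a selected coefficient can be mapped back to a contiguous range of wavelengths.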

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Fabrice Rossi (1)
  • Damien François (2)
  • Vincent Wertz (2)
  • Michel Verleysen (3)

  1. Projet AxIS, INRIA, Le Chesnay, France
  2. Université catholique de Louvain, Machine Learning Group, CESAME, Louvain-la-Neuve, Belgium
  3. Machine Learning Group, DICE, Université catholique de Louvain, Louvain-la-Neuve, Belgium
