Abstract
Extensive spectral data are often collected on multiple samples with the goal of predicting one or more properties of each sample. For example, measurements can be made at hundreds of wavelengths alongside more expensive assay values. The predictor variables are often highly correlated, and only small sections of the spectrum are expected to be pertinent to the measured analytes. There is therefore a need to simplify or compress the predictors, both to save data storage and possibly to de-noise the data before building predictive models. Our idea is to use a factorial design (a two-step framework) to explore two wavelet transformations, Haar wavelets and Daubechies wavelets, with progressively better approximation to the raw data curves, in combination with several statistical prediction methods, including stepwise regression, principal component regression (PCR), ridge regression and partial least squares (PLS) regression. We study prediction quality using Haar-Step, Haar-PCR, Haar-PLS, Haar-Ridge, Daubechies-Step, Daubechies-PCR, Daubechies-PLS and Daubechies-Ridge. Often PLS and stepwise regression predict substance concentrations equally well; in such situations, the simplest method should be preferred. From our studies, we conclude that the type of wavelet is unimportant, that the number of wavelets should be large enough to capture most of the variability in the waveforms, and that the choice of statistical method depends on the analyte.
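The first step of the two-step framework, compressing each curve with a wavelet approximation before any regression is fitted, can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic spectrum, the function names (`haar_approximation`, `reconstruct`, `rmse`) and the coefficient counts are assumptions made for illustration, only the Haar case is shown, and the regression step is left out. The sketch computes orthonormal Haar approximation coefficients at several resolutions, reports how the reconstruction error shrinks as more coefficients are kept, and enumerates the eight wavelet-by-method combinations named in the abstract.

```python
import math
from itertools import product


def haar_approximation(signal, levels):
    """Return Haar approximation (scaling) coefficients after `levels`
    rounds of orthonormal pairwise averaging; each round halves the length."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


def reconstruct(coeffs, levels):
    """Invert the transform with all detail coefficients set to zero,
    giving a piecewise-constant approximation of the original curve."""
    curve = list(coeffs)
    for _ in range(levels):
        curve = [c / math.sqrt(2) for c in curve for _copy in range(2)]
    return curve


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical spectrum: one absorption-like peak on a sloping baseline,
# sampled at 64 points (an assumption, not data from the paper).
spectrum = [math.exp(-((k - 40) / 6.0) ** 2) + 0.002 * k for k in range(64)]

# Progressively better approximations: 2, 4, 8, 16, 32 coefficients.
errors = {}
for levels in (5, 4, 3, 2, 1):
    coeffs = haar_approximation(spectrum, levels)
    errors[len(coeffs)] = rmse(spectrum, reconstruct(coeffs, levels))

# Factorial design: every wavelet crossed with every prediction method.
wavelets = ["Haar", "Daubechies"]
methods = ["Step", "PCR", "PLS", "Ridge"]
combos = [f"{w}-{m}" for w, m in product(wavelets, methods)]

if __name__ == "__main__":
    for n_coeffs, err in errors.items():
        print(f"{n_coeffs:2d} coefficients: RMSE = {err:.4f}")
    print(combos)
```

In a full analysis, the retained coefficients for each sample would replace the raw wavelength readings as predictors in the chosen regression method. Because the Haar approximation spaces are nested, keeping more coefficients can never increase the reconstruction error.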
Acknowledgments
We acknowledge the support of the National Center for Theoretical Sciences (South), Taiwan.
Conflict of interest
The authors declare no competing financial interest.
About this article
Cite this article
Chen, SC., Hayden, D.M. & Young, S.S. The wavelet transforms and statistical models for near infrared spectra analysis. J Math Chem 53, 551–572 (2015). https://doi.org/10.1007/s10910-014-0434-x