# Partial Least Squares Models and Their Formulations, Diagnostics and Applications to Spectroscopy

## Abstract

Partial least squares (PLS) models are a multivariate technique developed to address multicollinearity and/or high dimensionality among the explanatory variables of multiple linear models. PLS models have been extensively applied under the assumption of normality, but this assumption is not always fulfilled. For example, normality is violated if the response variable has an asymmetric distribution or is bounded to an interval. In this work, we present a collection of PLS models together with their formulations, diagnostics and applications. The formulations are based on different symmetric, asymmetric and bounded distributions, such as the normal, beta and Birnbaum-Saunders distributions. The diagnostics are based on residuals and on the Cook and Mahalanobis distances. The applications use real-world spectroscopy data.
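To make the ingredients named above concrete, the following is a minimal sketch of a normal-theory PLS fit with a single response, followed by Mahalanobis distances on the PLS scores and classical Cook distances for the regression of the response on those scores. It is an illustration under stated assumptions, not the chapter's implementation: the data are simulated rather than real spectra, the function names `pls1_fit`, `mahalanobis_sq` and `cook_distance` are hypothetical, and the Cook distance used is the standard linear-regression form, which may differ from the versions derived for the beta and Birnbaum-Saunders formulations.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS by successive deflation of X.

    Returns coefficients for the centered predictors and the score matrix T.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    W = np.zeros((p, n_components))   # weight vectors
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    T = np.zeros((n, n_components))   # scores (latent components)
    Xk = Xc.copy()
    for a in range(n_components):
        w = Xk.T @ yc                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p_a = Xk.T @ t / tt
        q[a] = yc @ t / tt
        Xk = Xk - np.outer(t, p_a)    # deflate X before the next component
        W[:, a], P[:, a], T[:, a] = w, p_a, t
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, T

def mahalanobis_sq(T):
    """Squared Mahalanobis distance of each row of T from the score centroid."""
    S = np.atleast_2d(np.cov(T, rowvar=False))
    Tc = T - T.mean(axis=0)
    return np.einsum("ij,jk,ik->i", Tc, np.linalg.inv(S), Tc)

def cook_distance(T, y):
    """Classical Cook distance for the least-squares regression of y on the scores."""
    n = len(y)
    Z = np.column_stack([np.ones(n), T])
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # hat matrix
    h = np.diag(H)                          # leverages
    fitted = H @ y
    resid = y - fitted
    k = Z.shape[1]
    s2 = resid @ resid / (n - k)
    return resid**2 * h / (k * s2 * (1 - h) ** 2), h, fitted

# Simulated data (an assumption): 40 samples, 60 highly collinear predictors
# driven by two latent factors, loosely mimicking spectral data.
rng = np.random.default_rng(0)
n, p, A = 40, 60, 2
L = rng.normal(size=(n, 2))
X = L @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
y = L[:, 0] - 0.5 * L[:, 1] + 0.05 * rng.normal(size=n)

beta, T = pls1_fit(X, y, A)
d2 = mahalanobis_sq(T)
cook, lev, fitted = cook_distance(T, y)

print("largest Mahalanobis distance at case", int(np.argmax(d2)))
print("largest Cook distance at case", int(np.argmax(cook)))
```

Because ordinary least squares breaks down when the predictors outnumber the samples, the sketch regresses on a small number of mutually orthogonal scores instead; cases with a large Mahalanobis distance are unusual in the score space, while cases with a large Cook distance have a strong effect on the fitted coefficients.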

## Keywords

Cook distance, Linear models, Mahalanobis distance, NIR spectra data, Principal component analysis, Quantile residuals, R software

## Notes

### Acknowledgement

The authors thank the editors and reviewers for their comments on this manuscript. This research work was partially supported by FONDECYT 1160868 grant from the Chilean government.
