Abstract
Partial least squares (PLS) models are a multivariate technique developed to address multicollinearity and/or high dimensionality of the explanatory variables in multiple linear models. PLS models have been extensively applied under the assumption of normality, but this assumption is not always fulfilled. For example, normality is violated if the response variable has an asymmetric distribution or is bounded within an interval. In this work, we present a collection of PLS models together with their formulations, diagnostics and applications. The formulations are based on symmetric, asymmetric and bounded distributions, such as the normal, beta and Birnbaum-Saunders distributions. The diagnostics are based on residuals and on the Cook and Mahalanobis distances. Applications are illustrated with real-world spectroscopy data.
Keywords
- Cook distance
- Linear models
- Mahalanobis distance
- NIR spectra data
- Principal component analysis
- Quantile residuals
- R software
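The abstract presents PLS as a remedy for multicollinearity and lists Cook and Mahalanobis distances among the diagnostics. The following Python sketch (NumPy only) illustrates those ideas in the classical normal-response setting only; the beta and Birnbaum-Saunders formulations require likelihood-based fitting and are not covered. It fits a single-response PLS model with the NIPALS algorithm, computes squared Mahalanobis distances of the component scores, and approximates Cook's distance by an ordinary least-squares fit of the response on the scores. The function names and the synthetic collinear data are hypothetical; this is not the chapter's R implementation.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS1 (single response) via NIPALS on mean-centred data.

    Returns x_mean, y_mean, the score matrix T (n x A) and the coefficient
    vector beta such that y_hat = y_mean + (X - x_mean) @ beta.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean      # deflated at each step
    n, p = X.shape
    W = np.zeros((p, n_components))      # weight vectors
    P = np.zeros((p, n_components))      # X loadings
    q = np.zeros(n_components)           # y loadings
    T = np.zeros((n, n_components))      # scores (latent components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # direction of maximal covariance with y
        t = Xk @ w
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, P[:, a])   # deflate X and y
        yk = yk - q[a] * t
        W[:, a], T[:, a] = w, t
    beta = W @ np.linalg.solve(P.T @ W, q)  # beta = W (P'W)^{-1} q
    return x_mean, y_mean, T, beta

def mahalanobis_scores(T):
    """Squared Mahalanobis distance of each row of T from the score mean."""
    Tc = T - T.mean(axis=0)
    S = np.atleast_2d(np.cov(Tc, rowvar=False))
    return np.einsum("ij,ij->i", Tc @ np.linalg.inv(S), Tc)

def cook_distance_scores(T, y):
    """Leverage and Cook's distance from an OLS fit of y on [1, T].

    Approximation (assumed here): the PLS scores are treated as fixed
    regressors, ignoring that they were themselves estimated from y.
    """
    n = T.shape[0]
    Z = np.column_stack([np.ones(n), T])
    k = Z.shape[1]
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q ** 2, axis=1)           # diagonal of the hat matrix
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b                        # residuals
    s2 = e @ e / (n - k)
    cook = e ** 2 / (k * s2) * h / (1.0 - h) ** 2
    return h, cook

# Synthetic example: six highly collinear predictors from two latent factors.
rng = np.random.default_rng(0)
n = 50
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(n, 6))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

x_mean, y_mean, T, beta = pls1_nipals(X, y, n_components=2)
d2 = mahalanobis_scores(T)
h, cook = cook_distance_scores(T, y)
print("rows with the largest Cook distances:", np.argsort(cook)[-3:])
```

Working on the scores rather than on the original predictors keeps every matrix inversion small and well conditioned, which is the practical benefit of PLS when the predictors are nearly collinear. When the number of components equals the number of (linearly independent) predictors, the PLS coefficients coincide with ordinary least squares.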
Acknowledgement
The authors thank the editors and reviewers for their comments on this manuscript. This research work was partially supported by FONDECYT 1160868 grant from the Chilean government.
Appendix: Reflectance Spectroscopy
Reflectance spectroscopy has been used since the beginning of the 20th century, mainly by chemists, to identify certain compounds and minerals. Since 1970, however, advances in electronics and optics have given this technique for detecting and analyzing compounds and mineral groups a prominent place in the investigation and exploration of mineral resources.
Spectroscopy is based on the behavior of electromagnetic waves that are emitted, absorbed or reflected by a solid, liquid or gaseous body. All matter exposed to radiation (such as a beam of light) reflects part of the incident energy and absorbs part of it. Figure 12 shows a beam of light striking a body: one part of the beam is reflected, and the other part propagates into the body, where it is absorbed or transmitted. In both cases, the radiation takes the form of electromagnetic waves that can be measured and analyzed. Figure 13 presents the electromagnetic spectrum divided by type and wavelength into gamma rays, X-rays, ultraviolet rays, the visible range and infrared rays, among others.
The absorption and reflection of energy by a molecule depend on its chemical and physical characteristics, such as the distribution of its atoms, its electrical composition and its physical properties. The spectroscopy commonly used to analyze minerals in rocks employs waves in the visible range (350 nm to 780 nm) and in the near-infrared (NIR) range (780 nm to 2500 nm). Absorption and reflection in these ranges are due to vibrational and rotational motions of the atoms in each irradiated molecule. Thus, if the radiation frequency equals the natural vibration frequency of a given molecule, the molecule absorbs the radiation and the amplitude of its vibration changes. In this way, each molecule has a characteristic spectrum. A spectrum can be defined as a two-dimensional continuous graph whose horizontal axis represents the wavelength to which the matter is exposed and whose vertical axis represents the percentage (or proportion) of reflectance. The main features to examine in any spectrum to determine the presence or absence of compounds are its absorption features. Their shape and depth vary across wavelengths depending on the chemical composition of the analyzed sample, indicating the presence of compounds such as \(\text {OH}\), \(\text {H}_2\text {O}\), \(\text {NH}_4\) and \(\text {CO}_3\), among others. These absorption features can be sharp, double, triple, simple or open, among others. Figure 14 presents the main absorption ranges found in the analysis of spectra, and Fig. 15 shows some typical spectra of different minerals, illustrating how the shapes and wavelength positions of their absorption features differ.
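The notions of spectral range, radiation frequency and absorption depth described above can be made concrete with a short Python sketch. It encodes the visible and NIR ranges quoted in the text, converts a wavelength into frequency (the quantity compared with molecular vibration frequencies) and into wavenumber, and measures the depth of an absorption feature on a synthetic spectrum. The depth measure, which divides the reflectance by a straight line drawn between two shoulder wavelengths, is a common simplification assumed here and is not a procedure described in this chapter; the spectrum and the 2200 nm feature position are invented for illustration.

```python
import numpy as np

# Wavelength ranges quoted in the text, in nanometres.
VISIBLE_NM = (350.0, 780.0)
NIR_NM = (780.0, 2500.0)
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def spectral_region(wavelength_nm):
    """Return 'visible', 'NIR' or 'other'; 780 nm is assigned to the NIR."""
    if VISIBLE_NM[0] <= wavelength_nm < VISIBLE_NM[1]:
        return "visible"
    if NIR_NM[0] <= wavelength_nm <= NIR_NM[1]:
        return "NIR"
    return "other"

def frequency_hz(wavelength_nm):
    """Frequency nu = c / lambda of radiation with the given wavelength."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9)

def wavenumber_cm1(wavelength_nm):
    """Wavenumber in cm^-1: 1e7 / lambda[nm]."""
    return 1e7 / wavelength_nm

def band_depth(wl, refl, left_nm, right_nm):
    """Depth of an absorption feature between two shoulder wavelengths.

    Assumed simplification: the continuum is a straight line joining the
    reflectance at the two shoulders. The reflectance is divided by it and
    the depth is 1 minus the lowest continuum-removed value.
    Returns (depth, wavelength of that lowest value).
    """
    mask = (wl >= left_nm) & (wl <= right_nm)
    w, r = wl[mask], refl[mask]
    continuum = np.interp(w, [w[0], w[-1]], [r[0], r[-1]])
    ratio = r / continuum                 # continuum-removed reflectance
    i = np.argmin(ratio)
    return 1.0 - ratio[i], w[i]

# Synthetic spectrum: sloping background with one Gaussian absorption feature.
wl = np.arange(350.0, 2501.0, 1.0)
background = 0.5 + 1e-4 * (wl - 350.0)
refl = background * (1.0 - 0.3 * np.exp(-((wl - 2200.0) / 20.0) ** 2))

depth, position = band_depth(wl, refl, 2100.0, 2300.0)
print(f"depth={depth:.3f} at {position:.0f} nm ({spectral_region(position)})")
```

Dividing by the continuum, rather than subtracting it, makes the depth a proportion of the local reflectance level, so features on bright and dark backgrounds can be compared.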
In practice, it is difficult to find pure samples of a particular compound, since rocks usually contain a mixture of several minerals. With reflectance spectroscopy, such mixtures can be detected through the presence of absorption ranges in the spectrum that are typical of particular minerals. Figure 15 displays an example of the association of dickite and alunite through their spectra, showing that the features of both minerals are well defined and combined in the sample.
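The idea that a mixed sample shows the absorption features of each of its minerals can be illustrated with a Python sketch. It assumes a linear (areal) mixing model, in which the mixture spectrum is a weighted average of the pure-mineral spectra; real mixtures can behave nonlinearly. The two endmember spectra, their feature positions (1400 nm and 2200 nm), the flat baseline and the depth threshold are all hypothetical and do not reproduce the dickite and alunite spectra of Fig. 15.

```python
import numpy as np

wl = np.arange(1000.0, 2501.0, 1.0)  # synthetic wavelength grid in nm
BASELINE = 0.6                        # hypothetical flat reflectance level

def endmember(center_nm, depth, width_nm=15.0):
    """Hypothetical pure-mineral spectrum: flat baseline with one Gaussian absorption."""
    return BASELINE * (1.0 - depth * np.exp(-((wl - center_nm) / width_nm) ** 2))

def absorption_minima(refl, min_depth=0.05):
    """Wavelengths of local reflectance minima lying more than min_depth below the baseline."""
    inner = refl[1:-1]
    is_min = (inner < refl[:-2]) & (inner <= refl[2:])
    deep_enough = (1.0 - inner / BASELINE) > min_depth  # ignores rounding-level wiggles
    return wl[1:-1][is_min & deep_enough]

# Two hypothetical minerals with features at different wavelengths.
mineral_a = endmember(1400.0, 0.3)
mineral_b = endmember(2200.0, 0.3)

# Linear mixing assumption: the mixture is a weighted average of the endmembers.
fraction_a = 0.5
mixture = fraction_a * mineral_a + (1.0 - fraction_a) * mineral_b

print("features in mineral A:", absorption_minima(mineral_a))
print("features in the mixture:", absorption_minima(mixture))
```

With equal fractions, each synthetic feature survives in the mixture at half its original depth, so both positions are recovered; lowering `fraction_a` makes the first feature shallower until it falls below `min_depth`.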
Clean and accurate spectroscopic measurements lead to useful and reliable results. For this reason, the whole process of sampling and analysis must be carried out correctly. For a correct interpretation of the spectra, the following considerations must be taken into account:
- Humidity: Water, like all chemical compounds, has well-defined spectral characteristics that can mask or weaken the absorption features of other minerals, producing inappropriate readings of the spectrum and, consequently, an imprecise interpretation of the sample under analysis. For this reason, it is important to consider the humidity of the rock when analyzing minerals that do not contain water in their molecular structure.
- Irregular surface: The surface of the mineral to be analyzed should be as regular (flat) as possible to avoid deformed spectra, a phenomenon known as “noise”. Very porous or fractured rocks are examples of samples that may present this problem.
- Color: Because the method measures the waves reflected by the minerals, the presence of certain dark minerals, such as tourmaline, can alter the levels of light absorption, hiding relevant information about other compounds of interest in the sample and generating noise in the spectrum. Similarly, translucent minerals, such as gypsum, alter the levels of reflection in the measurement.
Some advantages of the NIR spectroscopy method are the following:
- It is a non-destructive and non-invasive technique.
- Solid, liquid and gaseous samples can be analyzed.
- Sample preparation is practically unnecessary.
- The analysis is fast.
- It has a very low cost.
- No solvents are needed, so it does not generate waste.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Huerta, M., Leiva, V., Marchant, C., Rodríguez, M. (2020). Partial Least Squares Models and Their Formulations, Diagnostics and Applications to Spectroscopy. In: Xu, J., Ahmed, S., Cooke, F., Duca, G. (eds) Proceedings of the Thirteenth International Conference on Management Science and Engineering Management. ICMSEM 2019. Advances in Intelligent Systems and Computing, vol 1001. Springer, Cham. https://doi.org/10.1007/978-3-030-21248-3_35
DOI: https://doi.org/10.1007/978-3-030-21248-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21247-6
Online ISBN: 978-3-030-21248-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)