
Partial Least Squares Models and Their Formulations, Diagnostics and Applications to Spectroscopy

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1001)

Abstract

Partial least squares (PLS) models are multivariate techniques developed to address multicollinearity and/or high dimensionality among the explanatory variables of multiple linear models. PLS models have been extensively applied assuming normality, but this assumption is not always fulfilled. For example, if the response variable has an asymmetric distribution or is bounded on an interval, normality is violated. In this work, we present a collection of PLS models and their formulations, diagnostics and applications. Formulations are based on different symmetric, asymmetric and bounded distributions, such as the normal, beta and Birnbaum-Saunders distributions. Diagnostics are based on residuals and on the Cook and Mahalanobis distances. Applications are provided using real-world spectroscopy data.
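
As a rough illustration of how PLS copes with collinear, high-dimensional explanatory variables, the following Python sketch fits a single-response PLS regression with the classical NIPALS algorithm and then computes squared Mahalanobis distances of the PLS scores as a simple diagnostic. It is a minimal sketch under stated assumptions, not the formulation used in this chapter: the data are simulated, the response is treated as normal, and the names `pls1_nipals` and `mahalanobis_sq` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS regression (PLS1) with the NIPALS algorithm.

    Hypothetical helper for illustration. Returns the column means of X, the
    mean of y, the coefficients for the centered predictors and the scores T.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centered working copies
    n, p = X.shape
    W = np.zeros((p, n_components))          # weight vectors
    P = np.zeros((p, n_components))          # X loadings
    q = np.zeros(n_components)               # y loadings
    T = np.zeros((n, n_components))          # scores (latent components)
    for a in range(n_components):
        w = Xk.T @ yk                        # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # score vector for this component
        tt = t @ t
        p_a = Xk.T @ t / tt
        q_a = yk @ t / tt
        Xk = Xk - np.outer(t, p_a)           # deflate X
        yk = yk - q_a * t                    # deflate y
        W[:, a], P[:, a], q[a], T[:, a] = w, p_a, q_a, t
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients on the centered X scale
    return x_mean, y_mean, beta, T


def mahalanobis_sq(T):
    """Squared Mahalanobis distance of each row of T from the centroid of T."""
    centered = T - T.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)


# Simulated data (an assumption): 100 collinear predictors driven by 2 latent
# variables, with fewer observations (40) than predictors.
rng = np.random.default_rng(0)
n, p = 40, 100
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.01 * rng.normal(size=(n, p))
y = latent @ np.array([1.5, -2.0]) + 0.05 * rng.normal(size=n)

x_mean, y_mean, beta, T = pls1_nipals(X, y, n_components=2)
y_hat = y_mean + (X - x_mean) @ beta
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
d2 = mahalanobis_sq(T)

print(f"R^2 with 2 components: {r2:.4f}")
print("Largest squared Mahalanobis distance:", d2.max())
```

Ordinary least squares is not uniquely defined here because there are more predictors than observations, whereas the regression on two orthogonal score vectors is. Observations with large squared distances in the score space would be candidates for closer inspection; the cutoff is left to the analyst.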

Keywords

  • Cook distance
  • Linear models
  • Mahalanobis distance
  • NIR spectra data
  • Principal component analysis
  • Quantile residuals
  • R software


Acknowledgement

The authors thank the editors and reviewers for their comments on this manuscript. This research work was partially supported by FONDECYT 1160868 grant from the Chilean government.

Corresponding author

Correspondence to Víctor Leiva.

Appendix: Reflectance Spectroscopy

Reflectance spectroscopy is a technique that has been used since the beginning of the 20th century, mainly by chemists, to identify certain compounds and minerals. Since 1970, however, advances in electronics and optics have given this technique for detecting and analyzing certain compounds and mineral groups a privileged place in research on and exploration of mineral resources.

Spectroscopy is a technique based on the behavior of electromagnetic waves that are emitted, absorbed or reflected by a solid, liquid or gaseous body. All matter that is subject to radiation (such as a beam of light) reflects and absorbs energy. Figure 12 shows the behavior of a beam of light upon striking a given body: one part of the beam is reflected, and the other part propagates within the body, where it is absorbed or transmitted. Both cases are manifested in the form of electromagnetic waves that can be measured and analyzed. Figure 13 presents the electromagnetic spectrum divided by type and wavelength, such as gamma rays, X-rays, ultraviolet rays, visible zone and infrared rays, among others.

Fig. 12. Behavior of a beam of light on a body.

Fig. 13. Electromagnetic spectrum.

Fig. 14. Main absorption traits of an electromagnetic spectrum.

The absorption and reflection of energy by a molecule are due to its chemical and physical characteristics, such as the distribution of its atoms, its electrical composition and its physical properties. The spectroscopy often used in the analysis of minerals in rocks employs waves in the visible zone (between 350 nm and 780 nm) and in the NIR zone (between 780 nm and 2500 nm). The absorption and reflection of waves in these ranges are due to the vibration and rotation movements of the atoms of each molecule subject to radiation. Thus, if the radiation frequency equals the natural vibration frequency of a given molecule, the molecule absorbs the radiation and the amplitude of its vibration changes. In this way, each different molecule has a different spectrum. We can define a spectrum as a two-dimensional continuous graph whose horizontal axis represents the wavelength to which matter is subjected and whose vertical axis represents the percentage (or proportion) of reflectance. One of the main features that must be considered in any spectrum to identify the presence or absence of compounds is its absorption traits. These change in shape and depth across wavelengths depending on the chemical composition of the analyzed sample, signaling the presence of certain compounds such as \(\text{OH}\), \(\text{H}_2\text{O}\), \(\text{NH}_4\), \(\text{CO}_3\), among others. Absorption traits can be sharp, double or triple, simple or open, among others. Figure 14 presents the main absorption traits that can be found in the analysis of spectra. For example, Fig. 15 shows some typical spectra of different minerals, illustrating the different shapes and wavelength positions of their absorption traits.
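
The definition of a spectrum given above can be made concrete with a short Python sketch that stores a spectrum as paired arrays of wavelength and reflectance, classifies a wavelength into the visible or NIR zone using the limits stated in the text, and locates the deepest absorption trait as the point of minimum reflectance. The spectrum is synthetic: the flat 80% baseline and the single Gaussian dip at 1400 nm are assumptions chosen only for illustration, and `classify_band` and `deepest_trait` are hypothetical helper names.

```python
import numpy as np

VISIBLE_NM = (350.0, 780.0)   # visible zone, limits as stated in the text
NIR_NM = (780.0, 2500.0)      # NIR zone, limits as stated in the text


def classify_band(wavelength_nm):
    """Return the zone of a wavelength; 780 nm is assigned to NIR by convention here."""
    if VISIBLE_NM[0] <= wavelength_nm < VISIBLE_NM[1]:
        return "visible"
    if NIR_NM[0] <= wavelength_nm <= NIR_NM[1]:
        return "NIR"
    return "outside"


def deepest_trait(wavelengths_nm, reflectance):
    """Return (wavelength, reflectance) at the minimum of the spectrum."""
    i = int(np.argmin(reflectance))
    return wavelengths_nm[i], reflectance[i]


# Synthetic spectrum (an assumption): horizontal axis in nm, vertical axis as a
# proportion of reflectance, with one hypothetical absorption dip at 1400 nm.
wavelengths = np.arange(350.0, 2501.0, 1.0)
reflectance = 0.80 - 0.30 * np.exp(-0.5 * ((wavelengths - 1400.0) / 20.0) ** 2)

position, value = deepest_trait(wavelengths, reflectance)
print("Deepest trait at", position, "nm in zone", classify_band(position))
print("Reflectance at the trait:", value)
```

Real spectra usually have a sloping background, so a plain minimum search like this one would only be a starting point for locating absorption traits.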

Fig. 15. Example of spectra of certain minerals (top) and representation of (a) dickite (100%), (b) association of alunite-dickite and (c) alunite (100%) spectra (bottom).

It is worth mentioning that, in practice, it is difficult to find pure samples of a particular compound, since rocks usually contain a mixture of several minerals. With the reflectance spectroscopy method, it is possible to detect these combinations of minerals through the presence in the spectrum of different absorption traits that are typical of certain minerals. Figure 15 displays an example of the association of dickite and alunite through their spectra, in which the features of both minerals are well defined and combined in the sample.
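
The idea that a mixed sample shows the absorption traits of each of its components can be sketched in Python with two synthetic spectra. Everything numeric here is an assumption made for illustration: the components are generic ("A" and "B", not dickite or alunite), their dips are placed at hypothetical wavelengths, and the mixture is taken to be a simple 50/50 linear combination, which real rocks need not follow. The helpers `gaussian_dip` and `local_minima` are hypothetical names.

```python
import numpy as np


def gaussian_dip(wavelengths_nm, center_nm, depth, width_nm):
    """Gaussian-shaped loss of reflectance centered at center_nm."""
    return depth * np.exp(-0.5 * ((wavelengths_nm - center_nm) / width_nm) ** 2)


def local_minima(wavelengths_nm, reflectance):
    """Wavelengths of interior points strictly lower than both neighbors."""
    r = np.asarray(reflectance)
    is_min = (r[1:-1] < r[:-2]) & (r[1:-1] < r[2:])
    return wavelengths_nm[1:-1][is_min]


# Synthetic pure spectra over the NIR zone (hypothetical dip positions).
wavelengths = np.arange(780.0, 2501.0, 1.0)
spectrum_a = 0.80 - gaussian_dip(wavelengths, 1400.0, 0.30, 20.0)
spectrum_b = 0.80 - gaussian_dip(wavelengths, 2200.0, 0.25, 20.0)

# Assumed linear 50/50 mixture of the two components.
mixture = 0.5 * spectrum_a + 0.5 * spectrum_b

traits = local_minima(wavelengths, mixture)
print("Absorption traits found in the mixture (nm):", traits)
```

In this idealized setting the mixture keeps one trait per component, each shallower than in the pure spectrum, which mirrors the combined features described for the association in Fig. 15.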

It is important to mention that obtaining clean and accurate spectroscopy measurements leads to useful and reliable results. For this reason, the whole process of sampling and analysis must be carried out correctly. For a correct interpretation of the spectra, the following considerations must be taken into account:

  • Humidity: Water, like all chemical compounds, has well-defined spectral characteristics that can hide or dilute the absorption traits of other minerals, generating inappropriate readings of the spectrum and consequently an imprecise interpretation of the sample under analysis. For this reason, it is important to consider the humidity of the rock for those minerals that do not contain water in their molecular structure.

  • Irregular surface: It is important that the surface of the mineral to be analyzed is as regular (flat) as possible to avoid deformed spectra. This deformation is known as “noise”. Samples that may present this problem include very porous or fractured rocks.

  • Color: Because the spectroscopy method is based on the measurement of waves reflected by the minerals, the presence of certain dark minerals such as tourmaline can alter the levels of light absorption, hiding relevant data on other compounds of interest in the samples and generating noise in the spectrum. The same happens when the minerals are translucent, such as gypsum, which alters the levels of reflection in the measurement.

Some advantages of the NIR spectroscopy method are the following:

  • It is a non-destructive and non-invasive technique.

  • Solid, liquid and gaseous samples can be analyzed.

  • Practically no sample preparation is required.

  • The analysis is fast.

  • It has a very low cost.

  • There is no need to use solvents, so it does not generate waste.

© 2020 Springer Nature Switzerland AG

Cite this paper

Huerta, M., Leiva, V., Marchant, C., Rodríguez, M. (2020). Partial Least Squares Models and Their Formulations, Diagnostics and Applications to Spectroscopy. In: Xu, J., Ahmed, S., Cooke, F., Duca, G. (eds) Proceedings of the Thirteenth International Conference on Management Science and Engineering Management. ICMSEM 2019. Advances in Intelligent Systems and Computing, vol 1001. Springer, Cham. https://doi.org/10.1007/978-3-030-21248-3_35
