
A partial least squares solution to the problem of multicollinearity when predicting the high temperature properties of 1Cr–1Mo–0.25V steel using parametric models

Journal of Materials Science

Abstract

There has recently been renewed interest in assessing the predictive accuracy of existing parametric models of creep properties, with the newly developed Wilshire methodology being largely responsible for this revival. Without exception, these studies have used multiple linear regression analysis (MLRA) to estimate the unknown parameters of the models, but such a technique is not suited to data sets where the predictor variables are all highly correlated (a situation termed multicollinearity). Unfortunately, because all existing long-term creep data sets incorporate accelerated tests, multicollinearity will be an issue (when temperature is held high, stress is always set low, yielding a negative correlation). This article quantifies the severity of this potential problem in terms of its effect on predictive accuracy and suggests a neat solution to the problem in the form of partial least squares analysis (PLSA). When applied to 1Cr–1Mo–0.25V steel, it was found that when using MLRA nearly all the predictor variables in various parametric models appeared to be statistically insignificant, despite these variables accounting for over 90% of the variation in log times to failure. More importantly, the same linear relationship appeared to exist between the first PLS component and the log time to failure at both short and long times to failure, and this enabled more accurate extrapolations of the time to failure than when the models were estimated using MLRA.



Author information


Corresponding author

Correspondence to Mark Evans.

Appendix

For the sake of generality, suppose there is a sample of size n from which to estimate a linear relationship between Y and the explanatory variables \( X_{1}, X_{2}, \ldots, X_{m} \). In the context of the creep data set used in this article, Y would be the log of the minimum creep rate, the log time to failure or some other transformed creep property, \( X_{1} \) would be the stress, \( X_{2} \) would be 1/RT, and the other X's would be transformations and/or combinations of these two explanatory variables.

For i = 1, …, n, the ith datum in the sample is denoted by \( \{ x_{1}(i), \ldots, x_{m}(i), y(i) \} \). Also, the vectors of observed values of Y and \( X_{j} \) are denoted by \( \mathbf{y} \) and \( \mathbf{x}_{j} \), so \( \mathbf{y} = \{ y(1), \ldots, y(n) \}' \) and, for j = 1, …, m, \( \mathbf{x}_{j} = \{ x_{j}(1), \ldots, x_{j}(n) \}' \). Denote their sample means by \( \bar{y} = \sum_{i} y(i)/n \) and \( \bar{x}_{j} = \sum_{i} x_{j}(i)/n \). To simplify notation, Y and the \( X_{j} \) are centred to give variables \( U_{1} \) and \( V_{1j} \), where \( U_{1} = Y - \bar{y} \) and, for j = 1, …, m, \( V_{1j} = X_{j} - \bar{x}_{j} \). The sample means of \( U_{1} \) and \( V_{1j} \) are 0, and their data values are denoted by \( \mathbf{u}_{1} = \mathbf{y} - \bar{y} \cdot \mathbf{1} \) and \( \mathbf{v}_{1j} = \mathbf{x}_{j} - \bar{x}_{j} \cdot \mathbf{1} \), where \( \mathbf{1} \) is the n-dimensional unit vector \( \{ 1, \ldots, 1 \}' \). It is also possible to standardise Y and the \( X_{j} \) to give variables \( U_{1}^{*} \) and \( V_{1j}^{*} \), where \( U_{1}^{*} = U_{1}/S_{Y} \) and \( V_{1j}^{*} = V_{1j}/S_{X_{j}} \), and where \( S_{Y} \) and \( S_{X_{j}} \) are the standard deviations of Y and \( X_{j} \), respectively.

The correlation matrix for all the explanatory variables is then given by \( \mathbf{R} = \tfrac{1}{n-1} \mathbf{v}_{1}^{*\prime} \mathbf{v}_{1}^{*} \), where \( \mathbf{v}_{1}^{*} \) is the n × m matrix whose jth column is \( \mathbf{v}_{1j}^{*} \). The principal components are obtained from the spectral decomposition \( \mathbf{R} = \mathbf{H} \boldsymbol{\Delta} \mathbf{H}' \), where \( \boldsymbol{\Delta} = \mathrm{diag}\{ \lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{m} \} \) contains the eigenvalues and \( \mathbf{H} = ( \mathbf{h}_{1}, \ldots, \mathbf{h}_{m} ) \) the corresponding eigenvectors. Essentially, these eigenvectors contain the loadings shown in Eq. 11a of the main text. The m principal components are then given by

$$ z_{i} = {\mathbf{v}}_{1}^{\ast} {\mathbf{h}}_{i} $$
(15)
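As a concrete illustration, the following is a minimal NumPy sketch of this construction, assuming X is an n × m array holding the predictor values (stress, 1/RT and their transforms); the function and variable names are illustrative, not from the article.

```python
import numpy as np

def principal_components(X):
    """Standardise X, form the correlation matrix and return the
    principal component scores of Eq. 15 (illustrative sketch)."""
    n, m = X.shape
    v1_star = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardised predictors
    R = v1_star.T @ v1_star / (n - 1)                       # correlation matrix R
    lam, H = np.linalg.eigh(R)                              # spectral decomposition R = H Δ H'
    order = np.argsort(lam)[::-1]                           # enforce λ1 ≥ λ2 ≥ ... ≥ λm
    lam, H = lam[order], H[:, order]
    Z = v1_star @ H                                         # component scores, Eq. 15
    return Z, lam, H
```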

The partial least squares components can be derived in a similar way using a singular value decomposition of \( \mathbf{v}_{1} \) (within this approach, PLS stands for Projections to Latent Structures). Alternatively, the components can be determined sequentially, and it is this approach that is best summarised by the name partial least squares. The first component, \( T_{1} \), is intended to be useful for predicting \( U_{1} \) and is constructed as a linear combination of the \( V_{1j} \)'s. During its construction, sample correlations between the \( V_{1j} \)'s are ignored. To obtain \( T_{1} \), \( U_{1} \) is first regressed against \( V_{11} \), then against \( V_{12} \), and so on for each \( V_{1j} \) in turn. Sample means are 0, so for j = 1, …, m the resulting least squares regression equations are

$$ U_{1} = b_{1j} V_{1j} + \eta_{1} \quad \text{with} \quad b_{1j} = \mathbf{v}_{1j}'\mathbf{u}_{1} / \left( \mathbf{v}_{1j}'\mathbf{v}_{1j} \right) $$
(16)

where \( \eta_{1} \) is a random error term. Given values of the \( V_{1j} \) for a further item, each of the m equations in Eq. 16 provides an estimate of \( U_{1} \). To reconcile these estimates, whilst ignoring interrelationships between the \( V_{1j} \), a simple average, \( \sum_{j} b_{1j} V_{1j} / m \), or, more generally, a weighted average can be used

$$ T_{1} = \sum\limits_{j = 1}^{m} {w_{1j} b_{1j} V_{1j} } $$
(17)

In the true spirit of PLS, these weights will be inversely proportional to the variances of the \( b_{1j} \)'s, namely \( w_{1j} = (n-1)\,\mathrm{var}(V_{1j}) \), where \( \mathrm{var}(V_{1j}) \) stands for the variance of \( V_{1j} \). An obvious alternative weighting policy is to set each \( w_{1j} \) equal to 1/m, so that each predictor of \( U_{1} \) is given equal weight. This seems a natural choice and is also in the spirit of PLS, which aims to spread the load amongst the X variables in making predictions. The latter weighting scheme is used in this research article.
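Under this equal-weight scheme, the construction of \( T_{1} \) reduces to a few lines of NumPy. In the sketch below, u1 is assumed to be the centred response vector and V1 the n × m matrix of centred predictors (illustrative names, not from the article).

```python
import numpy as np

def first_pls_component(u1, V1):
    """Eqs. 16-17 with equal weights w_1j = 1/m (illustrative sketch)."""
    m = V1.shape[1]
    b1 = V1.T @ u1 / (V1 ** 2).sum(axis=0)  # b_1j = v'_1j u_1 / (v'_1j v_1j)
    t1 = V1 @ b1 / m                        # T_1 = Σ_j (1/m) b_1j V_1j
    return t1, b1
```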

The procedure extends iteratively in a natural way to give components \( T_{2}, \ldots, T_{p} \), where each component is determined from the residuals of regressions on the preceding component, with residual variability in Y being related to residual information in the X's. Specifically, suppose that \( T_{i} \) (i ≥ 1) has just been constructed from variables \( U_{i} \) and \( V_{ij} \) (j = 1, …, m), and let \( T_{i} \), \( U_{i} \) and the \( V_{ij} \) have sample values \( \mathbf{t}_{i} \), \( \mathbf{u}_{i} \) and \( \mathbf{v}_{ij} \). From their construction, it is easily seen that their sample means are all 0. To obtain \( T_{i+1} \), first the \( V_{(i+1)j} \)'s and \( U_{i+1} \) are determined. For j = 1, …, m, \( V_{ij} \) is regressed against \( T_{i} \), giving \( \mathbf{t}_{i}'\mathbf{v}_{ij} / (\mathbf{t}_{i}'\mathbf{t}_{i}) \) as the regression coefficient, and \( V_{(i+1)j} \) is defined by

$$ V_{(i+1)j} = V_{ij} - \left\{ \mathbf{t}_{i}'\mathbf{v}_{ij} / (\mathbf{t}_{i}'\mathbf{t}_{i}) \right\} T_{i} $$
(18)

Its sample values, \( \mathbf{v}_{(i+1)j} \), are the residuals from the regression. Similarly, \( U_{i+1} \) is defined by \( U_{i+1} = U_{i} - \{ \mathbf{t}_{i}'\mathbf{u}_{i} / (\mathbf{t}_{i}'\mathbf{t}_{i}) \} T_{i} \), and its sample values, \( \mathbf{u}_{i+1} \), are the residuals from the regression of \( U_{i} \) on \( T_{i} \).

The “residual variability” in Y is \( U_{i+1} \) and the “residual information” in \( X_{j} \) is \( V_{(i+1)j} \), so the next stage is to regress \( U_{i+1} \) against each \( V_{(i+1)j} \) in turn. The jth regression yields \( b_{(i+1)j} V_{(i+1)j} \) as a predictor of \( U_{i+1} \), where

$$ b_{(i+1)j} = \mathbf{v}_{(i+1)j}'\mathbf{u}_{i+1} / \left( \mathbf{v}_{(i+1)j}'\mathbf{v}_{(i+1)j} \right) $$
(19)

Forming a linear combination of these predictors, as in Eq. 17, gives the next component

$$ T_{i + 1} = \sum\limits_{j = 1}^{m} {w_{(i + 1)j} b_{(i + 1)j} V_{(i + 1)j} } \quad {\text{with}}\quad w_{{\left( {i + 1} \right)j}} = \left( {n - 1} \right){\text{var}}\left( {V_{{\left( {i + 1} \right)j}} } \right) $$
(20)
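Putting Eqs. 18–20 together, the whole sequential construction can be sketched as a single loop. The version below uses the equal-weight scheme (w = 1/m) adopted in this article rather than the variance-proportional weights of Eq. 20; the names pls_components, y, X and p are illustrative assumptions.

```python
import numpy as np

def pls_components(y, X, p):
    """Sequentially construct p PLS components T_1, ..., T_p (Eqs. 16-20),
    deflating the response and the predictors after each component."""
    n, m = X.shape
    u = y - y.mean()            # U_1: centred response
    V = X - X.mean(axis=0)      # V_1j: centred predictors
    T = np.empty((n, p))
    for i in range(p):
        b = V.T @ u / (V ** 2).sum(axis=0)   # Eq. 19: per-predictor slopes
        t = V @ b / m                        # component scores with w = 1/m
        T[:, i] = t
        tt = t @ t
        V = V - np.outer(t, t @ V / tt)      # Eq. 18: deflate each V_ij on T_i
        u = u - t * (t @ u) / tt             # deflate U_i on T_i
    return T
```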

The PLS regression equation can then take various forms. It may be a simple linear regression of the form

$$ Y = \beta_{0} + \beta_{1} T_{1} + \beta_{2} T_{2} + \cdots + \beta_{p} T_{p} + \varepsilon $$
(21)

where each component \( T_{k} \) is a linear combination of the \( X_{j} \), the sample correlation between any pair of components is 0, and p < m. Alternatively, if scatter plots of Y against the \( T_{i} \) trace out curves rather than lines, non-linear regressions could be used. For example

$$ \begin{aligned} Y &= \beta_{0} + \beta_{1} T_{1}^{2} + \beta_{2} T_{2}^{2} + \cdots + \beta_{p} T_{p}^{2} + \varepsilon \quad \text{or} \\ \ln(Y) &= \beta_{0} + \beta_{1} \ln(T_{1}) + \beta_{2} \ln(T_{2}) + \cdots + \beta_{p} \ln(T_{p}) + \varepsilon \end{aligned} $$

Finally, if the functional relationship is unclear, multi-layer perceptron neural networks can be used, where the \( T_{i} \) components form the inputs to the network and Y is the output.
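Because the components are mutually uncorrelated, the linear form in Eq. 21 can be estimated by ordinary least squares without any multicollinearity difficulties. A minimal sketch, assuming T is the score matrix returned by the pls_components sketch above:

```python
import numpy as np

def fit_pls_regression(y, T):
    """Ordinary least squares fit of Eq. 21: Y on an intercept and the components."""
    A = np.column_stack([np.ones(len(y)), T])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta  # beta[0] is the intercept; beta[1:] multiply T_1, ..., T_p
```

Calling pls_components followed by fit_pls_regression would then reproduce, under these illustrative names, the fitting sequence described in this appendix.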

Cite this article

Evans, M. A partial least squares solution to the problem of multicollinearity when predicting the high temperature properties of 1Cr–1Mo–0.25V steel using parametric models. J Mater Sci 47, 2712–2724 (2012). https://doi.org/10.1007/s10853-011-6097-0
