In the previous section, results from different methods for various scenarios were compared to each other, within the context of their location on the energy dissipation spectrum. In this section, the agreement between various models is further discussed. This is done in two steps:
- General comparison of damage quantification results for all contact cases, i.e. the similarity of the casesets identified as damaging by each model and the correlation of the respective values.
- Parametric influence on the degree of agreement.
General comparison
The percentages of nonzero damage cases for each model, Pi, are plotted in Fig. 7a. For a given model ‘i’ and a given number of contact cases studied ‘N0’, they are calculated as
$$ P_{i} = \left( {\frac{{|\{ \Delta D_{{{\text{Norm}}}} > 0\} |}}{{N_{0} }}} \right)_{i} , $$
(1)
The percentage of damaging cases increases in the order surface fatigue index < WLRM < KTH < Wedge model. As already noted in the previous section, this reflects a general trend in which more complex models assign damage to more cases, albeit often in very small increments. The WLRM captures about 78% of the cases as damaging. The remaining 22% (ΔDNorm < 0) are not all ‘non-damaging’ in a strict sense, since some of these points occur at high energy dissipation values due to flange contact. There, wear dominates over RCF, so these cases are still classified as ‘non-damaging’ with respect to RCF.
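As a minimal sketch of Eq. (1), the share of damaging cases for one model can be computed from an array of normalized damage increments. The function name `damaging_fraction` and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np

def damaging_fraction(delta_d_norm):
    """Return P_i: the share of contact cases with a positive damage increment."""
    values = np.asarray(delta_d_norm, dtype=float)
    # Count cases with Delta D_Norm > 0 and divide by the number of cases N0.
    return np.count_nonzero(values > 0) / values.size

# Hypothetical increments for five contact cases (illustrative only).
example = [0.0, 1.2e-6, -3.0e-7, 4.5e-5, 2.0e-8]
p_example = damaging_fraction(example)  # three of the five values are positive
```

Zero and negative increments are both excluded from the count, in line with the strict inequality in Eq. (1).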
Similarity of damaging casesets
For an overall comparison, it is necessary to establish whether the models agree in qualitatively identifying the same set of contact cases as ‘damaging’. This is done by examining the proportion of damaging contact cases that overlap between model combinations. The similarity percentage of the casesets of two models ‘i’ and ‘j’ is given by the Jaccard similarity coefficient ‘Si-j’, plotted in Fig. 7b, as
$$ S_{i{\text -}j} \leftrightarrow S_{j{\text -}i} = \frac{{\left| {\{ {\Delta }D_{{{\text{Norm}}}} > 0\}_{i} \cap \{ {\Delta }D_{{{\text{Norm}}}} > 0\}_{j} } \right|}}{{\left| {\{ {\Delta }D_{{{\text{Norm}}}} > 0\}_{i} \cup \{ {\Delta }D_{{{\text{Norm}}}} > 0\}_{j} } \right|}} , $$
(2)
where the numerator is the number of cases identified as damaging by both models and the denominator is the total number of cases identified as damaging by either model. As an example, SWedge-KTH = 85% indicates that, amongst all unique contact cases classified as damaging by the Wedge and KTH models combined, 85% are classified as damaging by both. The remaining 15% of cases were classified as damaging by only one of the models, i.e. the models’ assessments disagree. This index does not indicate whether the damage increment values obtained from the two models for a given case are equal, but rather the ‘similarity’ of the casesets qualitatively classified as ‘damaging’. In other words, the ‘similarity’ between two models measures how far they agree in assessing a given contact case as ‘damaging’. This is an important measure since models giving similar Pi do not necessarily agree on the same set of contact cases, which can make them very ‘dissimilar’ in their assessment despite similar percentages of nonzero damage cases.
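The similarity in Eq. (2) can be sketched as follows, assuming each model's damaging caseset is available as a collection of contact-case identifiers. The function name and the identifiers are hypothetical.

```python
def jaccard_similarity(damaging_i, damaging_j):
    """Jaccard coefficient of two sets of contact cases flagged as damaging."""
    set_i, set_j = set(damaging_i), set(damaging_j)
    union = set_i | set_j
    if not union:
        return float("nan")  # undefined when neither model flags any case
    return len(set_i & set_j) / len(union)

# Hypothetical case identifiers (illustrative only).
cases_a = {1, 2, 3, 4}
cases_b = {3, 4, 5}
s_ab = jaccard_similarity(cases_a, cases_b)  # 2 shared cases / 5 in the union

# Two models that each flag the same number of cases can still be dissimilar.
s_disjoint = jaccard_similarity({1, 2, 3}, {4, 5, 6})
```

The second call illustrates the point made above: equal Pi values do not imply overlapping casesets.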
The highest similarity is between the Wedge and KTH models amongst the combinations of four models. At the same time, the lowest similarity is seen between the WLRM and the surface fatigue index model. Model combinations involving either the WLRM or the surface fatigue index model generally exhibited lower similarity. This is broadly due to the higher number of contact cases identified as ‘damaging’ by the more detailed ‘local’ models (Wedge and KTH).
Correlation of damage increment values
Next, the joint variability of the results from two models is examined using correlation analysis in Fig. 7c. A higher correlation means that higher values from one model mainly correspond to higher values from the other, and likewise for lower values. For this, the nDlog results from different models are checked for cross-correlation. This is given by the Pearson correlation coefficient (ρ) for models ‘i’ and ‘j’ as
$$ \rho_{i{\text -}j} \leftrightarrow \rho_{j{\text -}i} = \frac{{{\text{cov}}\left( {(nD_{\text{log}} )_{i} ,(nD_{\text{log}} )_{j} } \right)}}{{\sigma_{{(nD_{\text{log}} )_{i} }} \sigma_{{(nD_{\text{log}} )_{j} }} }} , $$
(3)
where ‘cov’ denotes the covariance and σ denotes the standard deviation. The value varies between −1 and 1: a negative value implies an inverse relationship, 0 implies no linear correlation, and a value of 1 indicates a perfect positive linear relationship.
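A minimal sketch of Eq. (3) is given below. It assumes that the nDlog values of the two models are supplied as paired arrays, with the same contact cases in the same order; the function name and sample values are hypothetical.

```python
import numpy as np

def pearson(nd_log_i, nd_log_j):
    """Pearson correlation coefficient of paired nDlog values from two models."""
    x = np.asarray(nd_log_i, dtype=float)
    y = np.asarray(nd_log_j, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    # cov(x, y) / (sigma_x * sigma_y); the normalisation factors cancel.
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical paired values (illustrative only).
x = np.array([-8.0, -7.0, -6.5, -5.0])
rho_linear = pearson(x, 2.0 * x + 1.0)   # perfectly linear, increasing
rho_inverse = pearson(x, -x)             # perfectly linear, decreasing
```

The same value is returned by `np.corrcoef(x, y)[0, 1]`. Note that a constant offset between two models, as in `2.0 * x + 1.0`, does not lower the correlation.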
Higher correlation is observed for the Wedge-KTH combination than for all other model combinations. This implies that the relative trend of results w.r.t varying contact cases is more similar between these models. At the same time, both Wedge-WLRM and KTH-WLRM combinations are poorly correlated. Keeping similarity analysis conducted in the previous subsection in mind, even though Wedge-WLRM and KTH-WLRM combinations might be reasonably ‘similar’ in qualitatively identifying cases as damaging, the results are not quantitatively correlated (i.e. do not agree w.r.t the trend). Amongst the model combinations studied, higher correlation in results is noticed for models lying next to each other in the damage complexity axis depicted in Fig. 1.
Parametric influence
While overall trends were discussed in the previous subsection, the micro-trends pertaining particularly to the influence of curve radius and traction are discussed in this subsection. Only the overlapping damaging cases across models ({ΔDNorm > 0}i ∩ {ΔDNorm > 0}j) are considered. For this, the difference in logarithmic normalized damage increments ‘ΔnDlog’ from two models ‘i’ and ‘j’ is given by
$$ \left( {{\Delta }nD_{{{\text{log}}}} } \right)_{ij} = (nD_{{{\text{log}}}} )_{i} - (nD_{{{\text{log}}}} )_{j} , $$
(4)
and the corresponding spread of this difference is visualized using mean and standard deviations given by ‘μ(ΔnDlog)ij’ and ‘σ(ΔnDlog)ij’, respectively. The analysis of ΔnDlog values helps to visualize the factor by which the damage increments from both models (ΔDi and ΔDj) differ.
This is visualized in Fig. 8 for the results from the KTH and Wedge models. In the left plot (Fig. 8a), the normalized damage is plotted for each case, with the x- and y-axes indicating nDlog for the two models being compared. Each point describes a unique contact case, marked w.r.t curve radius (colour) and traction (symbol). The diagonal green line in the middle is the ‘equivalence’ line: the closer a point lies to this line, the closer the nDlog values from the two models are. The equivalence between the two models is analysed further in the right plot, where representative Gaussian distributions are plotted using μ(ΔnDlog)ij and σ(ΔnDlog)ij. High equivalence between the models is achieved when both μ(ΔnDlog)ij and σ(ΔnDlog)ij are close to 0.
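The offset and spread of Eq. (4) can be sketched as below for the overlapping damaging cases of two models. Two assumptions are made that the text does not state: the standard deviation is taken as the population value, and, for the interpretation only, nDlog is treated as a base-10 logarithm. The function name and sample values are hypothetical.

```python
import numpy as np

def offset_and_spread(nd_log_i, nd_log_j):
    """Mean and standard deviation of (nDlog)_i - (nDlog)_j over paired cases."""
    diff = np.asarray(nd_log_i, dtype=float) - np.asarray(nd_log_j, dtype=float)
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    return float(diff.mean()), float(diff.std())

# Hypothetical nDlog values for four overlapping cases (illustrative only).
nd_i = [-6.0, -5.5, -5.0, -4.5]
nd_j = [-6.5, -5.5, -5.5, -5.5]
mu, sigma = offset_and_spread(nd_i, nd_j)
```

If nDlog were a base-10 logarithm, an offset of 1 would correspond to damage increments that differ by a factor of 10.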
In Fig. 8, the distribution curves of ΔnDlog between the KTH and Wedge models (ΔnDlog-KTH-Wedge) are plotted w.r.t curve radius and traction. The distributions of contact cases at R = 600 m (indicated by black markers) are close to the equivalence line, implying high equivalence between the models. For the smaller curve radius (R = 300 m), represented by the red curves, the distributions are offset towards the right (i.e. μ(ΔnDlog)ij shifts to the right), implying that the KTH model assigns higher damage increment values than the Wedge model. For the larger curve radius (R = 1100 m), represented by the blue curves, the distributions are offset towards the left, implying that the Wedge model assigns higher damage increment values than the KTH model. The offsets due to traction are much smaller than those due to curve radius.
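The grouping by curve radius can be sketched as follows. The record layout `(radius, nDlog_i, nDlog_j)`, the function name and all values are hypothetical. The numbers are chosen only to mimic the qualitative pattern described above and are not results from the study.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def grouped_offsets(records):
    """Per-radius (mean, population std) of nDlog_i - nDlog_j."""
    groups = defaultdict(list)
    for radius, nd_i, nd_j in records:
        groups[radius].append(nd_i - nd_j)
    return {radius: (fmean(d), pstdev(d)) for radius, d in groups.items()}

# Hypothetical overlapping cases: (radius in m, nDlog of model i, nDlog of model j).
records = [
    (300, -4.0, -5.0), (300, -4.5, -5.5),    # model i higher
    (600, -5.0, -5.0), (600, -5.5, -5.5),    # models equivalent
    (1100, -7.0, -6.0), (1100, -6.5, -5.5),  # model j higher
]
per_radius = grouped_offsets(records)

# Pooled statistics over all radii, ignoring the grouping.
pooled = [nd_i - nd_j for _, nd_i, nd_j in records]
pooled_mu, pooled_sigma = fmean(pooled), pstdev(pooled)
```

In this constructed example, offsets of opposite sign at the smallest and largest radius cancel in the pooled mean while still widening the pooled spread, which is why per-parameter statistics can tell a different story from the pooled ones.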
Overall, the combinations involving the WLRM show a higher offset μ(ΔnDlog)i-WLRM, as the Wedge-WLRM combination in Fig. 9 indicates. The offset for the cases at R = 300 m, indicated by the red distributions, is particularly high (~6), indicating that the WLRM tends to predict damage increments several orders of magnitude higher than the Wedge model. Similar observations hold for the WLRM-KTH and WLRM-surface fatigue index combinations. The distribution plots for the other four model combinations have also been analysed and are given in the appendix.
The summary of the parametric influence amongst all six model combinations is characterized by the mean and standard deviations (μ(ΔnDlog)ij and σ(ΔnDlog)ij) as visualized for two model combinations in Fig. 8 and Fig. 9. These plots help to visualize the influence of individual parameters while comparing two different RCF quantification models. As indicated in the figures, these plots need to be read within the context of similarity values as described in Sect. 5.1.1 since a higher similarity indicates that the distributions are constructed with a higher number of (overlapping) contact cases.
Overall evaluation
General statistics from Sect. 5.1 and parametric analyses from Sect. 5.2 are consolidated and presented together for all six model combinations in Table 3. The values are colour-coded w.r.t the degree of agreement in the combinations, with green indicating a high degree of agreement and red indicating the opposite. The ‘degree of agreement’ indicates the relative closeness of the results from the models studied. It is associated with higher similarity and correlation values. It also corresponds to high equivalence, in the form of a low offset μ(ΔnDlog)ij and a low spread σ(ΔnDlog)ij, in the comparison of results between the two models.
Table 3 Overall and parameter-wise evaluation of RCF damage quantification results in (ΔnDlog)ij for various model combinations ‘i-j’ (green–agreement; red–disagreement)
Amongst the six combinations studied, the Wedge-KTH combination shows the best overall agreement. The similarity and correlation are the highest in the combination, implying a general agreement in both identifying the damaging cases and the relative trends in the damage increment values. Even though the overall offset μT(ΔnDlog)ij is low, the overall spread σT(ΔnDlog)ij is still high as we detect a strong radius-dependent mean offset in Fig. 8, visible in Table 3. However, if the influence of traction on the degree of agreement between KTH and Wedge models is considered, there is high agreement between the models, indicated by the low μ(ΔnDlog)ij and σ(ΔnDlog)ij values for all tractive scenarios. This implies that if the strong radius-dependent mean offset is accounted for, the damage functions for both models such as the ones plotted in Fig. 5 will coincide to a large extent, making the model predictions equivalent. This emphasizes the need to study parametric influence in addition to the overall comparison.
Following the Wedge-KTH combination, the next best combination is the Wedge-FI (fatigue index) combination, which has lower similarity and correlation than the former. This is followed by the KTH-FI combination. Even though these two combinations show slightly better values in the parametric influence section of Table 3, the overlapping cases (indicated by the similarity percentage) are considerably fewer than for the KTH-Wedge combination.
The lowest overall agreement was found for the Wedge-WLRM combination. Even though it has a rather high similarity percentage (i.e. overlapping cases classified as damaging), the results are poorly correlated. Also, it is characterized by high overall offset μT(ΔnDlog)ij and spread σT(ΔnDlog)ij. Even when the parametric influences are individually isolated and studied, there is still a very low degree of agreement with high μ(ΔnDlog)ij and σ(ΔnDlog)ij values regardless of the radius or tractive scenario. The KTH-WLRM combination follows next with a similar trend.
One of the main reasons why the KTH and Wedge models have higher similarity and correlation is the inclusion of the load distribution step, indicated in Fig. 3. This step is lacking in the other two models, leading for instance to lower correlation values in the KTH-WLRM and Wedge-WLRM combinations. Another reason is the way RCF is modelled in the approaches. For instance, the surface fatigue approach considers only surface-initiated RCF cracks for quantification, while the WLRM also considers the effect of wear in reducing cracks, leading to very low agreement between the models at higher energy dissipation values (see Fig. 5). The framework provides a way to address the effects of various aspects of RCF modelling when comparing results as described.
On evaluating the overall agreement amongst the six combinations, models placed next to each other on the damage complexity axis described in Fig. 1 were more likely to show higher agreement, understandably so given the greater commonality in their approaches. This does not guarantee equal results from both models, as the strong parametric influence of radius shows for the KTH-Wedge combination. Nevertheless, there is relative agreement in the similarity of the damaging casesets and in the trends formed by the corresponding models.