1 Introduction

With a broad interpretation of endurance as the ability of a mechanical system to resist in-service degradation and to preserve its initial technical-condition parameters during operation, it becomes clear that this property determines the chosen strategy of equipment maintenance and repair. The components of technological machines (drive unit, actuating mechanism, tools, base support structures) differ in durability, so they are serviced according to different approaches, which together form a mixed strategy. In contemporary industrial production, the Technical Condition Maintenance (TCM) strategy has become widespread because it allows the working life of equipment to be used more fully. The TCM strategy shifts the emphasis from repair measures to diagnostic operations. For most components of technological equipment, the operations that identify the technical condition are well known and well practiced. For Base Structures (BS: housings, frames, racks, bodies), however, diagnosing the technical condition is an unusual task. The reason is that BS have mostly been treated as maintenance-free components with unlimited working life.

Nevertheless, a technological machine has a certain established working life. Since BS are not intended to be replaced during use, it can be assumed that they determine the actual working life of the entire machine. The relevance of this work stems from the global tendency to increase the working life of technological equipment in all industries. Extending the working life reduces not only the specific costs of maintaining the equipment but also the cost of production. The possibility of extending the working life is determined by the technical condition of the BS. In general, determining the technical condition during a working-life extension differs little from the same procedure during a routine inspection. The difference is that the object has exhausted its safe working life and usually needs repair combined with modernization.

The aim of the work was to trace the relationship between the durability of equipment and its maintenance strategy. BS provide the most telling example. They have a long service life, during which they accumulate damage that needs to be diagnosed before decisions on their maintenance and repair can be made. The tasks of the research included adapting risk analysis methods to the assessment of technical condition and developing an algorithm for forecasting the remaining useful life of BS after long-term operation.

For BS, it is recommended to use a risk-based maintenance (RBM) strategy, which is a version of the TCM strategy. The breakdown of a BS is essentially equivalent to the breakdown of the entire machine. That is, the damage caused by the breakdown of a BS is the most significant among the structural elements of the machine. In the early stages of use, such losses are offset by the low chance of breakdown. At later stages, when the question of life extension comes up, the chance of breakdown increases. The risk therefore also increases and has to be controlled.

There is a well-known RBM strategy that uses the resource safety index βP as a diagnostic indicator [1]. Its scope of application is limited: it covers the period from the beginning of equipment use, when the reliability P → 1, until the reliability falls to P = 0.95–0.91. After long-term operation, BS have crossed this threshold and their reliability is significantly lower. Therefore, the diagnostic factors for RBM need to be improved.

The report is structured as follows. Sect. 2 provides an overview of modern trends in the RBM methodology, as well as the problems that exist in diagnosing the technical condition of the BS of industrial equipment. Sect. 3 is devoted to the development of RBM models and algorithms for industrial equipment. Sect. 4 shows how the proposed models are put into practice for diagnosing the technical condition of pipe rolling unit beds after long-term operation.

2 Current issues of industrial equipment risk assessment

2.1 Features of RBM application in industry

Having originated in the mid-twentieth century as a means of assessing the impact of man-made disasters, risk analysis methods evolved into the theoretical basis of RBM in the early twenty-first century. Such a strategy, which actively involves predictive practices, makes it possible to maintain the required safety while reducing the expense of maintaining technical facilities. The history of hazardous events at technological installations shows that many accidents occurred due to ineffective maintenance planning strategies [2]. RBM is widely used nowadays, from static structures such as bridges [3, 4] to dynamic objects such as airplanes [5, 6] and railway transport [7, 8]. At the same time, modern technical diagnostic tools and information technologies are used to an extent consistent with smart maintenance and the principles of Industry 4.0 [9,10,11,12].

Much attention is paid to collecting data on the state of systems in service, building Big Data repositories, and analyzing them with cloud technologies [2, 9, 10, 12]. Their use facilitates individualized maintenance based on machine learning systems. The importance of Big Data stems from a somewhat paradoxical situation: critical elements of mechanical systems fail only rarely, so increased reliability makes risk analysis more difficult [5]. Therefore, it is necessary to consider breakdown histories and accident scenarios from facilities of the same type located at different production sites. Pooling them effectively gives specialists a joint study of mechanical systems in operation. This situation drives the shift from classical reliability methods to structural reliability methods, and from mathematical-statistical methods to probabilistic-physical methods of breakdown prediction.

As is well known, risk is interpreted as a combination of the probability and the severity of a breakdown. However, at the initial stages of RBM development, risk was associated exclusively with the probability of breakdown [5]. Nowadays, there is a move away from this simplistic approach. The consequences of a breakdown can be expressed through its severity or criticality. Usually, criticality is calculated from failure mode and effects analysis (FMEA) matrices as the product of the severity, occurrence, and detection (SOD) ranks, which gives the risk priority number. Sometimes a risk mitigation capability is added to this [6]. In this respect, such a proposal corresponds to the phenomenon of risk aversion in Farmer's model. The number of factors considered in risk analysis is increasing. For railway bridges, for example, criticality is ranked by 30 parameters [4].
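The SOD product mentioned above can be illustrated with a minimal sketch. The failure modes, their ranks, and the 1–10 rank scale are hypothetical values chosen only for illustration; they are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA matrix (all values here are illustrative)."""
    name: str
    severity: int    # S rank
    occurrence: int  # O rank
    detection: int   # D rank (higher = harder to detect)

    def rpn(self) -> int:
        # Assumption: ranks lie on a 1-10 scale, a common FMEA convention.
        for rank in (self.severity, self.occurrence, self.detection):
            if not 1 <= rank <= 10:
                raise ValueError("ranks are assumed to lie on a 1-10 scale")
        # Risk priority number = S * O * D
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes of a machine, for illustration only.
modes = [
    FailureMode("loose anchor bolt", severity=5, occurrence=4, detection=2),
    FailureMode("weld crack in housing", severity=9, occurrence=3, detection=6),
]

# Rank failure modes from the highest to the lowest risk priority number.
ranked = sorted(modes, key=FailureMode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN = {mode.rpn()}")
```

In this made-up example the hard-to-detect weld crack in the housing ranks first even though it occurs less often, which is the kind of prioritization the SOD product is meant to provide.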

In RBM, universal optimization models based on the criterion of minimum unit costs are used for planning. There is a tendency to increase the number of phases of the facility's condition and the number of risk levels appearing in the models [7, 8]. Along with the cost of the facility's stay in the operation, inspection, and maintenance phases, the cost of the manufacturing phase or of overhaul with modernization is introduced. In this way, the entire life cycle of a technical system is covered. This raises the hierarchical level of the beneficiary: savings in manufacturing can lead to losses in operation and, conversely, additional costs during modernization are "recouped" in the form of increased safety. The benefit or economic effect is obtained at the level of the network or equipment stock [3]. This approach can lead to unexpected conclusions. For example, a high potential risk encourages bridges to be constructed of expensive corrosion-resistant beams instead of carbon steel beams. Designing projects with a low-risk threshold is unprofitable where the costs of maintaining an adequate level of reliability and safety are high.

The cost of technical systems is also increased by equipping them with performance monitoring tools. Nevertheless, such costs are accepted because monitoring systems prevent accidents and help reduce the current costs of equipment maintenance [9]. The presence of monitoring systems simplifies the task of continuing the facility's operation after it has reached its designated resource. This procedure is complex and includes models for risk prediction, crack growth, instrumental damage diagnosis, and operating cost determination [5].

2.2 The challenges of prolonging the operation of base structures

The problem of assessing the possibility of extending the service life of equipment lies in the difficulty of determining the remaining useful life of BS. The reasons for this are as follows.

1. When BS were designed, their service life was not calculated. It was believed that reliability was ensured, as a rule, by large static strength reserves. At the same time, the stress state was assessed mainly by conservative means. The facilities that now require life extension have been in operation for 25–50 years, and when their BS were created, refined computational tools such as the finite element method were not available. Therefore, the original assessment is one-sided, and there are objective grounds to reconsider the possibility of further service.

As practice shows, when the service life of a BS is extended, it is necessary to move from the concept of safety margins to the concept of individual life. The reason is that after long-term operation the metal can no longer provide the standard safety margins, yet the residual life can still amount to tens of thousands of hours.

2. During the long-term operation of base structures, due attention was not paid to their loading conditions. Experts do not always appreciate that the BS elements of technological equipment suffer from fatigue damage. At the present stage of engineering development, fatigue resistance models are applied to objects whose safety was previously considered in purely static terms. These objects include bridges, buildings, pipelines, and supporting structures of industrial equipment [5]. Therefore, when reassessing the technical condition of BS elements, it is necessary to reconstruct the operating conditions and loading processes. It is important to determine the amount of work performed on the equipment.

3. The weight and size of base structures are much greater than those of other elements of the mechanical system. For fatigue resistance modeling, this requires considering a number of factors. First, there is the scale (size) effect, which can be accounted for within a local strain approach. Second, the complexity of the BS shape, together with their length, dictates the joining methods used in manufacturing (welding, riveting, etc.). Such technologies provoke the appearance of defects. Therefore, a certain portion of the BS service life is spent on the development of crack-like defects. Moreover, quite often the existing cracks can be arrested before the structure is brought to complete destruction.

3.1. A consequence of the complexity of the shape is the third characteristic factor, which is associated with high stress gradients. That is, some BS zones are much more highly stressed than others. This results in a low metal utilization rate in the structure, which also encourages the prolongation of BS operation. Closely related to the same causes is the fourth characteristic factor that affects the resource model: a complex stress state that leads to multi-axial metal fatigue. Quite often, several sources of vibration are mounted on the base structures, which can contribute to non-proportional loading.

3.2. There is difficulty in determining the signs of BS destruction due to the difficulty of accessing hazardous areas without disassembling the object.

4. After a long-term operation, the mechanical properties of the metal degrade, which is caused not only by operating time but also by long-term storage. The degree of degradation of certain property indicators is a diagnostic parameter of the technical condition of BS. In addition, this phenomenon should be considered when assessing the remaining service life.

5. Metal degradation is not the only consequence of long-term operation. The model of unlimited durability at stresses below the endurance limit ceases to apply, because the metal enters the zone of gigacycle, or very high cycle, fatigue. For example, on the automatic mill of the 350 pipe-rolling unit, the elements of the housing received 1.4 × 10⁸ load cycles during 80 years of operation. Therefore, the fatigue model should be adjusted for this phenomenon when extending the service life.

6. When assessing the reliability and safety of BS, the problem of multi-site damage, associated with multiple sources of damage, emerges [6]. It is necessary to move from "many points" damage indicators to a single complex indicator of the technical condition of the entire structure. The problem is solved by combining the reliability indicators of the individual BS elements that are subject to such damage.

7. For BS, there is a problem with assigning a type of limit state. It may seem that there are no questions here: as long as the BS retains its load-bearing capacity, it is in serviceable condition. The load-bearing capacity is usually lost only when the structure is completely destroyed. But this state is not easy to reach for a BS because there are redundant elements, additional connections, supports, etc. They make the structure statically indeterminate, so the rate of crack growth is difficult to predict. Often, short cracks appear that change the stiffness of the BS and equalize the stresses. In this case, the crack propagation rate decreases, which is typical for statically indeterminate structures. The loss of bearing capacity is a sequence of stages of breakdown development, each of which is described by its own model. The total service life is represented as the sum of the durability before cracking and the survivability at the different stages of fracture. In this respect, the situation coincides with the concept of phased service life, which is the basis of safety theory [13].

8. The impossibility of a clear interpretation of the BS limit state entails the difficulty of assigning time periods for the operation of the equipment as a whole. In turn, this makes it impossible to use formalized algorithms to determine the optimal maintenance strategy, for example, by the criteria of minimizing equipment maintenance costs.

9. There is uncertainty in extending conclusions about the technical condition of BS obtained from the study of a single sample to the entire fleet of similar equipment units. If the service life of the entire stock in operation has to be extended, special forecasting algorithms are needed that take into account the additional uncertainty of operating conditions.

3 Models and algorithms for RBM

3.1 Risk as a complex diagnostic parameter

The standard definition of risk i is the product of the frequency of accidents f and the losses caused by them (severity) S: i = f∙S. On this principle, the risk is measured in terms of losses per unit of time (loss intensity). Using the Bayesian interpretation of probability as a measure of the reliability of the result, the risk can be represented as: i = Q∙S, where Q is the probability of breakdown (accident). That is, the risk is a specific indicator that determines the absolute (total) damage I during the operation time t: I = i∙t.

Since the intensity of losses, or the severity of breakdowns, is a vague concept that is difficult to quantify, only the probability of breakdown is often used to assess risk [5]. Thus, in risk analysis, it is customary to separate its material and frequency components. It is more productive to use a dimensionless risk characteristic in the form of an odds ratio:

$$ \rho \, = \,Q\,/\,P\, = \,Q\,/\,\left( {1\, - \,Q} \right) $$
(1)

where P is the probability of breakdown-free operation during the service life.

Then the operation safety will be:

$$ R\, = \,1\, - \,\rho $$
(2)

Thus, in this aspect, safety represents the condition of the equipment, in which the risks of its operation do not exceed acceptable (permissible) levels [13].

The following conclusions can be drawn from the above expressions. The deterministic calculations used to assess strength, which correspond to the probability P = 0.5, are useless for assessing safety since they do not provide it: in this case, the risk is ρ = 1 and the safety is R = 0. At realistically achievable levels of reliability, where the probability of breakdown does not exceed Q = 0.05, the safety level R practically becomes equal to the probability of breakdown-free operation P. In other words, when Q < 0.05, the risk is practically equal to the probability of breakdown: ρ ≈ Q.
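The two limiting cases discussed above can be checked numerically. The following is a minimal sketch of Eqs. (1) and (2); the function names are illustrative and the sample values of Q are chosen only to reproduce the cases mentioned in the text.

```python
def odds_risk(q: float) -> float:
    """Dimensionless risk rho = Q / (1 - Q), Eq. (1)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("Q must lie in [0, 1)")
    return q / (1.0 - q)


def safety(q: float) -> float:
    """Operation safety R = 1 - rho, Eq. (2)."""
    return 1.0 - odds_risk(q)


for q in (0.5, 0.05, 0.01):
    p = 1.0 - q
    print(f"Q = {q:4.2f}  P = {p:4.2f}  rho = {odds_risk(q):.4f}  R = {safety(q):.4f}")
```

At Q = 0.5 the sketch gives ρ = 1 and R = 0, while at Q = 0.05 the values of ρ and Q, and of R and P, differ by less than 0.003.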

Given the level of damage from the breakdown of the entire technical system SΣ, denoting the significance of the breakdown of the i-th element under the influence of the k-th degradation process as criticality or significance uik = Sik/SΣ, we obtain the generalized risk of the system:

$$ i_{\Sigma } \, = \,S_{\Sigma } \, \cdot \,\sum Q_{ik} \, \cdot \,u_{ik} $$
(3)

Therefore, risks within the same object can be compared using a dimensionless expression written as a sum, which can be called a dimensionless risk:

$$ \rho_{\Sigma } \, = \,\sum \rho_{ik} \, = \,\sum Q_{ik} \,u_{ik} $$
(4)

On this basis, risk can be interpreted as the product of the probability of breakdown and its significance, which is confirmed by expression (3).
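A minimal numerical sketch of Eqs. (3) and (4) is given below. The element names, breakdown probabilities, and loss values are hypothetical and serve only to show how the significance u_ik weights the contribution of each element.

```python
def system_risk(elements, total_loss):
    """Return (rho_sigma, i_sigma) for a list of (Q_ik, S_ik) pairs.

    rho_sigma = sum(Q_ik * u_ik) with u_ik = S_ik / S_sigma   (Eq. 4)
    i_sigma   = S_sigma * rho_sigma                           (Eq. 3)
    """
    rho_sigma = sum(q * (s / total_loss) for q, s in elements)
    return rho_sigma, total_loss * rho_sigma


# Hypothetical data: (probability of breakdown, loss in arbitrary cost units)
S_SIGMA = 1000.0                # loss from breakdown of the whole machine
elements = [
    (0.02, 1000.0),             # base structure: rare, but u = 1
    (0.10, 50.0),               # drive unit: u = 0.05
    (0.30, 5.0),                # tool: frequent, but u = 0.005
]

rho, i = system_risk(elements, S_SIGMA)
print(f"rho_sigma = {rho:.4f}, i_sigma = {i:.2f}")
```

With these made-up numbers the base structure contributes 0.02 of the total 0.0265, despite having the lowest breakdown probability, because its significance is equal to one.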

Risk, being a measure of damage intensity, can be a diagnostic parameter that determines the moment of recovery or the period of operation. This is evident from its connection with the breakdown curve z-t (Fig. 1), which is a monotonic function with increasing intensity [14]. If i = const, the operation process is steady: a constant risk corresponds to linear growth of maintenance costs and losses over time. A sharp increase in the slope of the loss function I-t, when i2 > i1 (Fig. 1), indicates the need for repair actions.

Fig. 1

The scheme of the formation of the kinetics of changes of risks during operation t (I quadrant) according to the breakdown curve z(t) (II quadrant) through the Farmer curves (III quadrant) and the total losses of the SΣ system (IV quadrant)

The algorithm scheme proposed in [13], which confirms the possibility of using risk as a diagnostic parameter, to some extent contradicts Farmer's curves. These are sometimes called equal-risk curves, which may give the impression that the risk of a system is a constant property. This is not the case, as the improved algorithm scheme shows (Fig. 1). Here, the starting point is the experimental breakdown curve z(t), whose intensity, the breakdown rate f, increases with time. As the operating time grows from t1 to t2, the number of breakdowns increases disproportionately from z1 to z2, which indicates an increase in the breakdown rate from f1 to f2. This is the basis for changing the parameters of Farmer's curve; in particular, the value of F0 increases. Each type of breakdown j is characterized by losses Sj (mostly unchanged) and frequency fj. Together, they form the total losses SΣ, which are determined by Farmer's curve. This means that from time t1 to t2, the losses increase from SΣ1 to SΣ2. As a result, the risk, as the intensity of total losses, increases. When it exceeds the critical value, operation must be stopped. A peculiarity of the algorithm is its a posteriori nature. The need to update the parameters of the equal-risk curves after a significant period of time (for example, annually) makes this algorithm effective for determining the optimal service life but unsuitable for planning current restoration measures.

The criterion of the limit state according to the concept of acceptable risk is summarized as follows:

$$ \rho_{j} \, \le \,\left[ \rho \right]\quad {\text{or}}\quad R_{j} \, \ge \,\left[ R \right] $$
(5)

where ρj and Rj are the current levels of risk and safety, and [ρ] and [R] are their limit values. The limit state is reached when the current levels become equal to the limit values.

It is possible to objectively establish the level of acceptable risk [ρ] using a comprehensive diagram of performance indicators that combines diagrams of losses, risks, product and resource costs [13]. An increase in downtime losses leads to a decrease in the level [ρ], which necessitates a decrease in the current risk ρj. In such situations, the requirements for controlling mechanical systems become more stringent, which leads to an increase in inspection costs.

3.2 Optimum period of recovery measures to minimize maintenance and repair costs

After its appearance at the end of the twentieth century, the RBM strategy was focused on minimizing the risk of operating high-cost and dangerous objects. The periodicity of restoration measures was fixed and was determined on the basis of the risk matrix [15]. This is, in fact, a kind of preventive maintenance strategy. Only the variability of the composition of repair and restoration operations brings this type of RBM strategy closer to the TCM strategy [16]. Subsequently, as the cost of inspection operations increased and the concept of risk aversion was introduced into the RBM strategy, models of the periodicity of restoration measures based on the criterion of minimum specific costs for maintenance and repair came into use [17,18,19,20]. In them, the risk indicator varies from 0 to 1, but the lack of clear recommendations for assigning it somewhat restrains the use of these models.

In maintenance theory, a certain analog of risk is the intensity of operating costs (specific costs) c for maintenance and repair, measured in the same units as the risk i itself [14]. In fact, the risk is the unplanned part of the maintenance costs attributed to the guaranteed maintenance time. However, the risk takes into account not only the restoration costs Cp but also production losses, including social ones. The restoration costs Cp correspond to the preventive maintenance (PM) strategy. With corrective maintenance (CM), the cost of post-accident repair Cc increases in accordance with the risk. Then the cost intensity for CM will be:

$$ c_{c} \, = \,c_{p} \, + \,i $$
(6)

In the most common and simplest case, the state of a technical system is described by two phases: preventive maintenance, where the costs are Cp, and corrective maintenance, with costs Cc, which also reflect the consequences of emergency breakdowns. In this model, the PM phase corresponds to the regular (technologically planned) maintenance schedule and, if necessary, includes the costs of diagnosing the facility. Therefore, the probability of being in this phase, Pp(tj), is the probability of system uptime P(t). In contrast, the probability of being in the corrective repair phase, Pc(tj), corresponds to the probability of breakdown, i.e. Pc(tj) = 1 − P(t). Then the cost function will look like this:

$$ C\,\left( t \right)\, = \,C_{p} \,\left( t \right)\, + \,C_{c} \,\left( t \right)\, = \,P\,\left( t \right)\,C_{p} \, + \,\left[ {1\, - \,P\left( t \right)} \right]\,C_{c} $$
(7)

If we use the relative measurements of cost in the form cr = Cc /Cp > 1, then the above formula can be transformed as:

$$ C\,\left( t \right)\, = \,C_{p} \,\left[ {P\,\left( t \right)\, + \,\left[ {1\, - \,P\,\left( t \right)} \right]\,c_{r} } \right] $$
(8)

In this case, the planned repair costs Cp play the role of scale, and the cost change schedule will be characterized by the function C(t)/Cp, which acquires the features of universality.

To obtain the cost intensity function required for optimization, this expression should be divided by the average time between breakdowns in the interval from 0 to the maximum set operating time T0, which for this task is chosen as the overhaul (inspection) interval δ:

$$ c\,\left( t \right)\, = \,c_{p} \,\left( t \right)\, + \,c_{c} \,\left( t \right)\, = \,\frac{{P\,\left( t \right)\,C_{p} \, + \,\left[ {1\, - \,P\,\left( t \right)} \right]\,C_{c} }}{{\int_{0}^{\delta } {P\,\left( t \right)\,dt} }} $$
(9)

In many cases, it is permissible to approximate the integral in the denominator by its upper limit (δ or ΔT).

This equation can be transformed as:

$$ c\,\left( t \right)\, = \,\frac{{\left( {P\,\left( t \right)\, + \,\left[ {1\, - \,P\,\left( t \right)} \right]\,c_{r} } \right)\,C_{p} }}{{\delta_{P} }} $$
(10)

where δP is the guaranteed durability for the probability P at the selected inspection interval δ.

The coefficient b can be taken as a unit cost indicator, which will be:

$$ b\, = \,\frac{c\,\left( t \right)}{{C_{p} }}\, = \,\frac{{P\,\left( t \right)\, + \,Q\,\left( t \right)\,c_{r} }}{{\delta_{P} }} $$
(11)

where Q(t) is the breakdown function.

The indicator b has the dimension of time⁻¹. The smaller it is, the lower the repair costs. The optimal recovery interval δopt corresponds to the minimum of the curve b(δ) = c(t)/Cp (Fig. 2). It should also be noted that the parameter cr characterizes the risk: the higher it is, the greater the risk of operation.

Fig. 2

The overall tendency to change the position of the optimal recovery period δopt (arrow) with an increase in the risk indicator cr

The connection of the cr parameter with risk follows from Eq. (6). Then we have:

$$ c_{r} \, = \,1\, + \,\frac{i}{{c_{p} }} $$
(12)

The last term is the well-known coefficient Ac = i/cp, which characterizes the increase in the cost of post-accident repairs compared to planned repairs. It takes into account losses from additional downtime, losses from the accident itself, increased costs of emergency production of destroyed elements, and lost profits. The Ac coefficient has a simple relationship with the cr parameter: Ac = cr − 1. Taking this into account:

$$ i\, = \,c_{p} \,\left( {c_{r} \, - \,1} \right) $$
(13)

Here, the intensity of planned expenditures cp plays the role of scale. Since Ac > 1, the minimum value of the parameter is cr = 2. If we interpret the dimensionless risk ρΣ as the probability of system breakdown, then, taking into account Eqs. (3, 4), we have the system risk:

$$ i\, = \,S_{\sum } \rho_{\sum } \, = \,c_{p} \,A_{c} $$
(14)

As the parameter cr increases, the optimal recovery period δopt decreases (Fig. 2). At low risks (cr < 10), the recovery period has little effect on costs and a corrective (CM) strategy can be used, whereas with increasing risk the optimal period δopt decreases several-fold. This in itself entails an increase in unit costs. In addition, the indicator b increases, which further increases the cost intensity. At high risks (cr → 100), the position of the minimum of the function b(δ) tends to zero, i.e. δopt → 0. In such conditions, the cost minimization criterion loses its effectiveness, and it is worth using complex indicators of the technical condition. Therefore, it can be stated that controlling the operation risk is an effective means of increasing the efficiency of mechanical systems maintenance.
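The dependence of δopt on cr can be reproduced with a short numerical sketch of Eq. (11). The reliability function P(t) is not specified at this point, so a Weibull law with shape 3 and unit scale is assumed here purely for illustration (an increasing breakdown rate is needed for b(δ) to have an interior minimum). The denominator is taken as the integral from Eq. (9), evaluated by the trapezoidal rule. The function names and grid parameters are likewise assumptions.

```python
import math


def weibull_reliability(t, scale=1.0, shape=3.0):
    """Assumed reliability law P(t); not taken from the source."""
    return math.exp(-((t / scale) ** shape))


def unit_cost_curve(c_r, t_max=2.0, steps=2000, reliability=weibull_reliability):
    """Return (deltas, b) with b = [P + (1 - P) * c_r] / integral_0^delta P dt."""
    h = t_max / steps
    deltas, b_values = [], []
    integral = 0.0
    p_prev = reliability(0.0)
    for k in range(1, steps + 1):
        t = k * h
        p = reliability(t)
        integral += 0.5 * (p_prev + p) * h   # cumulative trapezoidal rule
        p_prev = p
        deltas.append(t)
        b_values.append((p + (1.0 - p) * c_r) / integral)
    return deltas, b_values


def optimal_interval(c_r, **kwargs):
    """Grid search for the interval delta_opt that minimizes b(delta)."""
    deltas, b_values = unit_cost_curve(c_r, **kwargs)
    k = min(range(len(b_values)), key=b_values.__getitem__)
    return deltas[k], b_values[k]


def risk_intensity(c_p, c_r):
    """Risk i = c_p * (c_r - 1), Eq. (13)."""
    return c_p * (c_r - 1.0)


for c_r in (2.0, 10.0, 100.0):
    delta_opt, b_min = optimal_interval(c_r)
    print(f"c_r = {c_r:5.0f}  delta_opt = {delta_opt:.3f}  b_min = {b_min:.2f}")
```

Under these assumptions, the optimal interval shrinks and the minimum unit cost grows as cr increases, which agrees qualitatively with the trend described for Fig. 2. The numerical values depend entirely on the assumed P(t).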

3.3 Indicators for determining the technical condition of base structures

TCM strategies require a thorough selection of diagnostic indicators that closely correlate with the actual technical condition. Strategies focused on reliability and risk control (reliability-centered maintenance, risk-based maintenance) belong to the TCM class and use complex diagnostic parameters. Among the main requirements for them is sensitivity to operating time at all stages of operation.

In this respect, the probability of failure-free operation P or the probability of breakdown (failure) Q = 1 − P is widely used as a comprehensive diagnostic parameter. The reason for this choice, as mentioned above, is that the criticality of a breakdown has a vague interpretation [5]. However, these indicators, Q and P, are less sensitive to the operating time t than the dimensionless risk indicator ρ (Fig. 3). This can be seen from the slope coefficient that multiplies the argument a in the linear equations Q(a) and ρ(a) (Table 1). The value a is a relative resource and thus characterizes the operating time. For the risk functions ρ(a), this coefficient is 2–5 times larger than in the breakdown equations Q(a).

Fig. 3

Failure functions Q (blue lines) and risk functions ρ (red lines) versus the relative operating time λt for one element (z = 1, lines 1–4) and for mechanical systems of 10 elements (z = 10, lines 5–8) at breakdowns of sudden (Q0, ρ0, lines 1, 2, 5, 6) and gradual (Qgr, ρgr, lines 3, 4, 7, 8) types

Table 1 Linear approximations of breakdown and risk functions

In this analysis, the failure or breakdown function was calculated according to the exponential law Q0 = 1-exp(-a), which is considered to be the main one in classical reliability. It is effective for sudden type breakdowns. Sometimes they include fatigue breakdowns. However, if the structure is maintained and its condition is diagnosed, fatigue damage is transferred to the category of gradual damage, in accordance with its nature. For gradual breakdowns, their probability decreases according to the dependence:

$$ Q_{gr} \, = \,Q_{0} \,\left( {1\, - \,e^{ - a} } \right) $$
(15)

From the defined values of Q0 and Qgr, ρ0 and ρgr were calculated according to Eq. (1). The effect of the number of system elements z was also analyzed. For this purpose, we considered a system of 10 elements with breakdown rates λ of the same order. This system was used in the authors' previous studies [1]. The reliability of the system was determined from the reliability of the individual elements according to the multiplication rule. Increasing the number of elements raises the breakdown and risk functions beyond what is observed experimentally. Such inadequacy creates the illusion that full risk sets in after a relatively small relative operating time (at a > 0.25). In this case, the question arises of assigning a level of acceptable risk different from [ρ] = 1.
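The comparison described above can be sketched numerically. The code below combines the exponential law Q0 = 1 − exp(−a), Eq. (15), the multiplication rule, and Eq. (1). It assumes, for simplicity, that all z elements are identical and share the same relative operating time a; this simplification and the function names are not taken from the source.

```python
import math


def q_sudden(a):
    """Sudden breakdowns: exponential law Q0 = 1 - exp(-a)."""
    return 1.0 - math.exp(-a)


def q_gradual(a):
    """Gradual breakdowns: Q_gr = Q0 * (1 - exp(-a)), Eq. (15)."""
    return q_sudden(a) * (1.0 - math.exp(-a))


def q_system(q_element, z):
    """Series system of z identical elements (multiplication rule for P)."""
    return 1.0 - (1.0 - q_element) ** z


def odds_risk(q):
    """Dimensionless risk rho = Q / (1 - Q), Eq. (1)."""
    return q / (1.0 - q)


print(" a     z  type      Q        rho")
for a in (0.05, 0.10, 0.25):
    for z in (1, 10):
        for label, q_fn in (("sudden", q_sudden), ("gradual", q_gradual)):
            q = q_system(q_fn(a), z)
            print(f"{a:4.2f}  {z:2d}  {label:8s}  {q:7.4f}  {odds_risk(q):8.4f}")
```

For every row, ρ exceeds Q, and the gap widens as a or z grows. For a single element with sudden breakdowns, ρ reaches 1 at a = ln 2, whereas for ten such elements it is already well above 1 at a = 0.25.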

It can be said that the risk function ρ(a) is more sensitive to the operating time in all situations than the breakdown function Q(a). Therefore, the ρ(a) function is better suited as a diagnostic parameter. In certain cases, it has an advantage over the resource safety index βP. The latter is based on the approximate equality ρ ≈ Q, which holds if the reliability P > 0.9. In other situations, the βP method retains the disadvantages inherent in the breakdown function. It should be noted that treating the dimensionless risk as the odds ratio of Eq. (1) simplifies the determination of the limit state of the diagnostic parameter, which is usually equal to [ρ] = 1.

Since base structures are diagnosed mainly when their resource is almost exhausted, i.e., a → 1 and P ≪ 0.9, the authors used the risk function method for them. This is made possible by the fact that the base structures considered below are treated as a single element. That is, the safety of the studied mills is assessed by the state of the metal in the single most cyclically loaded location. When the BS is considered as a multi-element system (z ≫ 1), the ρ(a) method leads to overly conservative decisions. In this case, it is worth using the βP method, which works well in such situations.

Finally, it is worth noting that using the exponential reliability law for BS is not correct, although experts do so quite often. The reliability function of classical reliability is derived from recorded breakdowns of system elements during operation (PΣ(t), Fig. 4), when the resource exhaustion is small (a → 0). When the exponential function is extrapolated to the zone of high operating time (a → 1), the reliability is underestimated relative to that actually observed. For BS, it is generally not realistic to obtain an experimental reliability function. Therefore, we move from classical reliability to structural reliability. Applying the "physics of failure" principle yields the durability (lifetime) distribution function TΣP, whose inverse is the structural reliability function, (TΣP)⁻¹ = Pst(t); this function is extrapolated to the zone of low operating time and high reliability levels (Fig. 4). In this case, the reliability function Pst(t) changes only slightly for most of the operating period. That is, it is insensitive to operating time. To "get around" this drawback, the reliability index method [21] and the safety index method, which is universal in monitoring the technical condition [1, 13], were developed.

Fig. 4. Formation of functions: (a) system reliability PΣ(t) based on the facts of failures of various types (points) after operating time tΣj; (b) distribution of system durability TΣP according to the predicted LDF of the TiP elements, as well as their relationship with the Farmer curve f(S)

The difference between the two approaches to reliability assessment is clearly illustrated by the formation of reliability functions using the Farmer’s curve (Fig. 4). In the classical approach, the reliability function PΣ(t) is a posteriori since it is built on the basis of breakdowns of various types during testing or operation of the entire system. The time to failure tΣj has a relatively short period, which means that the consequences and risks are insignificant. In this zone, the actual parameters of the reliability function are determined, after which it is extrapolated to the high-risk breakdown zone (dashed line PΣ(t), Fig. 4). In the structural approach, the median durability \(\overline{{T_{i} }}\) of high-risk elements and low-breakdown rate f is used as initial data. After combining the lifetime distribution functions (LDF) of TiP, the LDF of the TΣP system is obtained. It is the inverse of the reliability function and, unlike it, has an a priori character. This reliability function is extrapolated to the low-risk failure zone (dashed TΣP, Fig. 4).

The system reliability function PΣ(t), obtained using the classical approach, gives a conservative result in the zone of high operating time, which makes it impossible to use it for life extension operations. The TΣP function is suitable for such operations. However, it is insensitive to wear over most of the service life. Therefore, this criterion is inconvenient as a diagnostic parameter. Given these features, experts have come to the conclusion that it is advisable to use risk indicators to assess the technical condition.

3.4 Risk function model for base structures

Long-term safe operation of a mechanical system can be achieved by implementing the concept of phased assignment of guaranteed service life. The total service life of the structure TΣ is represented as the sum of the guaranteed durability before the appearance of a crack Tf and the periods Tgi of crack development of a certain geometry up to the critical size: TΣ = Tf + Tg1 + Tg2 + … + Tgi (Figs. 5, 6). The safe stage of operation is limited by the guaranteed durability before the appearance of a crack Tp when the reliability is P < 0.95. At this stage, the technical condition is controlled by the resource safety index, the critical value of which is βP = 0 (green zone, Fig. 5). The initial value of the safety index is βP > 0 and corresponds to the logarithm of the guaranteed durability. During operation, the value of βP decreases linearly. After the guaranteed service life is exhausted, reliable operation of the system remains possible in the vast majority of situations, but the technical condition is now controlled by the dimensionless risk indicator ρ. The zone of acceptable risk, into which the safety zone passes, lies between ρ → 0 and the critical value ρ = 1, which is reached when the reliability becomes P = 0.5 and the durability equals the median T0 (yellow zone, Fig. 5). Further operation of structures is accompanied by the appearance of cracks that cannot always be detected by diagnostic tools. This stage is characterized by the growth of the crack to a size that can be confidently detected by diagnostic tools. In this case, the durability Tg1 corresponds to the full risk ρ1 = 1 (red zone, Fig. 5). Operation in this zone can be regarded as a resource reserve. The purpose of diagnostics during the service life extension procedure is to find out in which zone the equipment is located and what the parameters of the service life exhaustion function are. For this purpose, the safety function βP(T) and the risk function ρ(T) are used. The intensity of resource depletion changes at each stage.
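The staged scheme can be written down as a small decision rule. The sketch below is only an illustration under stated assumptions: the limit state is taken as [ρ] = 1, the zone names and the functions `total_service_life` and `condition_zone` are hypothetical, and the numerical values in the usage part are invented for demonstration and are not taken from the studied mills.

```python
from typing import Sequence


def total_service_life(t_f: float, growth_periods: Sequence[float]) -> float:
    """T_sum = T_f + T_g1 + T_g2 + ... + T_gi (staged service life)."""
    return t_f + sum(growth_periods)


def condition_zone(beta_p: float, rho: float, rho_limit: float = 1.0) -> str:
    """Classify the technical condition by the currently governing criterion.

    Assumed rule (a reading of the zones in Fig. 5, not the authors' code):
      - safety index beta_P > 0            -> safe zone (green)
      - beta_P exhausted, rho < rho_limit  -> acceptable risk zone (yellow)
      - rho >= rho_limit                   -> resource reserve zone (red)
    """
    if beta_p > 0.0:
        return "safe"
    if rho < rho_limit:
        return "acceptable risk"
    return "reserve"


if __name__ == "__main__":
    # Hypothetical durations in years, for illustration only.
    print(total_service_life(40.0, [5.0, 2.0]))
    print(condition_zone(beta_p=0.3, rho=0.05))
    print(condition_zone(beta_p=0.0, rho=0.60))
    print(condition_zone(beta_p=0.0, rho=1.00))
```

Keeping the limit state as a parameter makes it straightforward to reuse the same rule if a different critical value of the risk indicator is adopted.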

Fig. 5. Changes in the application of risk criteria depending on the stage of technical condition of the mechanical system

Fig. 6. The risk function ρ(T) at the stage of crack initiation T and at the stages of crack growth from a circular defect Tg1 to an edge defect Tg2

The loss of load-bearing capacity of base structures, according to the principles of assessing their residual life, is a multi-stage process. The service life of the studied mills was formed in accordance with this principle: the total service life is represented as the sum of the durability before cracking and the survivability at the different stages of fracture. In this respect, the approach coincides with the concept of staged service life, which is the basis of safety theory.

The following model was adopted. Stage 1 is the formation of a crack or circular defect during the service life T. This period is determined by the fatigue life model in the form of a fatigue curve. Stage 2 is the development of a crack-like circular defect from the threshold value to the size at which the crack can be identified by means of technical diagnostics (checking, Fig. 6). This period Tg1 is predicted by models in the form of life curves (Fig. 6).

After the position, size, and shape of the crack front are identified, the parameters of the survivability curves and the risk function change. The full risk ρ = 1 is then reached over the period Tg2, during which the surface crack transforms into an edge crack (Fig. 6).

4 Investigation of the technical condition and risk assessment of the further operation of the pipe rolling mill housings

The concepts proposed above were put into practice when deciding on the further operation of the housings of the 350-pipe rolling mill. The diagnostics were carried out on the housings of the piercing mill and the automatic mill, which had been in operation for 80 years. During this time, about 16 million tons of pipes of various sizes were produced. The decline in pipe quality initiated a production audit, which led to the need to determine the technical condition of the base structures of the main machines of the pipe rolling line.

A set of experimental studies was conducted beforehand, in particular monitoring of the spatial position of the automatic mill stand during rolling, diagnosis of metal damage by the magnetic memory and ultrasonic methods, and strain gauging of cyclic stresses. Based on three-dimensional models, the stresses in the mills were investigated by the finite element method (Figs. 7, 8).

Fig. 7. 3D model of the automatic mill housing and the stress field arising in it during the rolling of a 324 × 7.2 pipe from steel 20 in the right gauge

Fig. 8. 3D model of the housing of the piercing mill and the stress field arising in it during the rolling of a 324 × 7.2 stainless steel pipe with a 3 mm gap between the cover and the housing

According to the results of the instrumental inspection of the machine housing, it was found that the most vulnerable places are the inner surfaces of the racks, which are difficult to control. The operational load can be represented as a combination of the main and additional cyclic processes. The main process occurs at the rolling rate, and the additional process at the roll rotation rate. The parameters of these processes were found [22].

Based on the study of the stress–strain state of the housings and the actual production of pipes by assortment, a two-level loading block was formed for each of them. It has been found that the endurance limit of the housing elements σaR has a negligible effect on their final durability NΣ if the value of σaR does not exceed 115–125 MPa. A further increase in the endurance limit leads to a significant (by an order of magnitude or more) increase in durability. Therefore, it is beneficial to strengthen the metal of the housing in dangerous areas.

Given the impossibility of obtaining samples for mechanical testing from existing mills, models of fatigue and fracture resistance were developed for them. Deformation criteria that are effective under conditions of uncertainty were used. The operational metal embrittlement typical of rolling equipment was considered. As a result, a model of fatigue resistance of the damaged material was obtained. Its effectiveness is confirmed by the satisfactory convergence of the predicted durability and the frequency of crack reappearance in dangerous areas of the housings.

Risk functions are most adequately described by second-degree polynomials. However, for the first stage (crack initiation), the risk function can be approximated by a linear relationship (Fig. 9):

$$ \text{automatic mill housing}\quad \rho = 0.082 + 0.0196\,T $$
(16)
$$ \text{piercing mill housing}\quad \rho = 0.0069 + 0.0134\,T $$
(17)

Fig. 9. Risk functions (risk plot) of the piercing mill housings (risk1) and automatic mill housings (risk2) at the crack initiation stage

From these dependencies, it follows that the safe operating conditions for the piercing mill housing ended 75 years after its start, and the same conditions for the automatic mill housing ended after 51 years. This is due to the higher cycle load of this mill. Compared to the piercing mill, 5 times more cycles are observed during the rolling of one pipe in the automatic mill.

For the stage in which a spherical defect develops from the threshold size to the critical size, the risk functions are (Fig. 10):

$$ \text{automatic mill housing}\quad \rho = 0.002\,T_{g} + 0.0375\,T_{g}^{2} $$
(18)
$$ \text{piercing mill housing}\quad \rho = 0.07\,T_{g} + 0.7\,T_{g}^{2} $$
(19)

Fig. 10. The risk function for the growth of a spherical defect in the housings of the piercing mill (risk1) and the automatic mill (risk2) under the influence of the main load block

With a 100-fold increase in the threshold defect, the durability of the housing element is Tg = 5.2 years for the piercing mill and Tg = 1.2 years for the automatic mill.

At the first stage, it is more convenient to use a linear risk function, especially since it has a theoretical basis. The risk function has 2 fixed points: ρ = 0 at T = 0, ρ = 1 at T = \(\overline{{T_{i} }}\). The latter value is determined from the service life distribution function as corresponding to the median value. Therefore, the risk function at the stage of crack formation is defined through the breakdown rate parameter λ:

$$ \rho = \lambda T = \frac{T}{\overline{T}} $$
(20)

In this formulation, the risk is directly proportional to the operating time and coincides with the damage in the life cycle interpretation. The basic reliability equation is also expressed through the parameter λ, but with an exponential function: P = exp(−λT). In this form, the reliability function is inconvenient as a diagnostic parameter, since it is insensitive to operating time over a long period.
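The two expressions that share the parameter λ can be evaluated side by side. The following minimal sketch assumes λ = 1/T̄ as in Eq. (20); the function names and the median durability of 50 years used in the demonstration are hypothetical and serve only as an illustration.

```python
import math


def linear_risk(t: float, t_median: float) -> float:
    """Risk at the crack initiation stage, Eq. (20): rho = lambda * T = T / T_median."""
    return t / t_median


def exponential_reliability(t: float, t_median: float) -> float:
    """Exponential reliability law with the same rate: P = exp(-lambda * T)."""
    return math.exp(-t / t_median)


if __name__ == "__main__":
    t_median = 50.0  # hypothetical median durability, years
    for t in (0.0, 10.0, 20.0, 30.0, 40.0, 50.0):
        rho = linear_risk(t, t_median)
        p = exponential_reliability(t, t_median)
        print(f"T = {t:5.1f}   rho = {rho:.2f}   P = {p:.3f}")
```

By construction, the linear risk passes through the two fixed points ρ = 0 at T = 0 and ρ = 1 at T = T̄, so the remaining margin to the limit state can be read directly as 1 − ρ.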

In contrast, the risk function ρ(T), as well as the resource safety index β(T), are sensitive to operating time. The risk function is a more immediate means of assessing the technical condition than the safety index, which requires the establishment of breakdown criticality. Therefore, the dependence ρ(T) is applied when the breakdown criticality is the same. The risk function increases with operating time, whereas the safety function decreases. Neither method has the property of accumulation: the diagnostic parameter (risk or safety) accumulated at the previous stage of damage does not pass to the next stage. Therefore, each subsequent stage should be considered as a reserve of carrying capacity.

Comparing the risk ρ(T) and safety β(T) methods, it should be noted that the latter assesses the technical condition of the structure more fully, since it can combine the effects of damaging processes of different nature, intensity, and criticality. At the same time, the β(T) method is more conservative, since operation should be stopped once the guaranteed service life, which is shorter than the average one, is exhausted. This contributes to maintenance cost overruns. Therefore, individual forecasting with a phased reassignment of the resource is the optimal strategy.

From the behavior of the risk function, a conclusion can be drawn about the inspection model. The fact that ρ(T) is linear at the stage of crack initiation and becomes concave at the stage of crack propagation justifies an inspection model with successively shorter intervals [23, 24]. Such a sequential model assumes that each subsequent inspection interval δj+1 is shorter than the previous one δj, where j is the inspection number. Taking as the criterion a constant risk increment Δρ = (ρj+1 − ρj) = const over the interval between inspections δj+1 = (tj+1 − tj), Eqs. (18, 19) and Fig. 10 indeed give δj+1 < δj < δj−1. This model is relevant for objects with a long operating time, when significant damage has accumulated.
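The constant-increment criterion can be checked numerically for a quadratic risk function. The sketch below is an illustration, not the authors' algorithm: it assumes a risk function of the form ρ(T) = aT + bT² with b > 0, splits the range from 0 to the limit state [ρ] = 1 into equal risk increments, and solves for the inspection times in closed form. The function names and the choice of four inspections are hypothetical; the coefficients in the usage part are those printed in Eq. (18).

```python
import math
from typing import List


def inspection_times(a: float, b: float, n_inspections: int,
                     rho_limit: float = 1.0) -> List[float]:
    """Times T_j at which rho(T) = a*T + b*T**2 reaches j * delta_rho.

    delta_rho = rho_limit / n_inspections is the constant risk increment
    between consecutive inspections. Requires b > 0.
    """
    if b <= 0.0 or n_inspections < 1:
        raise ValueError("need b > 0 and at least one inspection")
    delta_rho = rho_limit / n_inspections
    times = []
    for j in range(1, n_inspections + 1):
        target = j * delta_rho
        # Positive root of b*T**2 + a*T - target = 0.
        t = (-a + math.sqrt(a * a + 4.0 * b * target)) / (2.0 * b)
        times.append(t)
    return times


def inspection_intervals(times: List[float]) -> List[float]:
    """Intervals delta_j = T_j - T_(j-1), with T_0 = 0."""
    return [t1 - t0 for t0, t1 in zip([0.0] + times, times)]


if __name__ == "__main__":
    # Coefficients as printed in Eq. (18); four inspections is an arbitrary choice.
    times = inspection_times(a=0.002, b=0.0375, n_inspections=4)
    for t, d in zip(times, inspection_intervals(times)):
        print(f"inspection at T_g = {t:.2f} years, interval = {d:.2f} years")
```

For any such upward-curving risk function, each equal risk increment is covered in less time than the previous one, so the computed intervals shrink from one inspection to the next.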

One sign of deterioration in the technical condition of rolling mill stands is a breakdown in the system that fixes and secures the stands. This leads to a loss of spatial orientation of the stand, which is in itself unacceptable, as it increases product rejects. In addition, the resulting gaps increase the stresses in certain unfavorable areas of the stands, although the nominal stresses in most of the housing volume may remain at the same level or even decrease. In unfavorable areas (for example, corner welds), defects develop intensively. Gaps also increase dynamic loads, and the combination of these effects reduces reliability. Therefore, modernization of the housing fixation system and its elements is a universal repair measure for extending the service life of the housings.

Reliable fixation of the cover on the housing of the piercing mill stand of the pipe rolling unit can be achieved by installing cover fixation devices in the form of blocks built into the cover with retractable screws whose conical ends engage conical cups fixed in the housing [25]. A hydraulic fixation device is also proposed for convenient maintenance [26]. The study of the stress–strain state of the housing and the cover of the working housing of the piercing mill 350 shows that reliably fastening the cover helps to reduce local stresses when pipes are made from hard-to-deform metals. The maximum stresses at the points of their localization are 140 MPa on the housing and approximately 109 MPa on the cover, respectively. That is, the reduction is 6–12%.

For the housing of the automatic rolling mill, the housing mounting unit was modernized by installing compensation plates under the supports to eliminate gaps. The technology for mounting the fastening unit and tightening the foundation (anchor) bolts was developed.

5 Conclusions and future perspective

It is proved that the dimensionless risk indicator ρ in the form of odds-ratio is more suitable as a diagnostic parameter than the probability of breakdown, which is usually used. Therefore, the risk function is used to assess the technical condition of the mills, which, unlike the reliability function, is more sensitive to the operating time. The process of loss of bearing capacity of the housings is represented by several stages, at which the risk increase occurs with different intensities. For the first time, it was found that the risk function of the housings at the stage of crack initiation can be represented by a linear dependence directly proportional to the operating time.

It is shown how the behavior of the risk function affects the choice of the inspection model. If in the early stages of operation the linearity of the risk function regulates the equality of inter-inspection intervals, then in the final stages it is worth switching to a sequential inspection model.

An algorithm for determining the optimal period of restoration measures according to the criterion of minimizing the intensity of costs, where the risk indicator is a parameter, has been developed. With its growth, the recovery period decreases. When the risk reaches a certain level, the optimal period remains virtually unchanged. In such conditions, the minimum cost criterion loses its effectiveness and it is worth using complex indicators of technical condition. In this capacity, the concept of changing the application of risk criteria depending on the stage of the technical condition of the mechanical system has been developed for base structures.

The risk function has a convex shape close to linear for the period of crack initiation and appearance, and a concave shape for the period of defect development. This confirms the conclusion that the rate of reliability depletion increases with the development of cracks.

At first glance, the connection between the equipment maintenance policy and the life of its base structures (BS) is not obvious. After all, BS are most often not maintained but are repaired only after damage has been detected visually. The studies carried out revealed a significant influence of the condition of the fixing and connection units of the housings on the forces acting in the housings themselves. In particular, for an automatic mill, such components are the foundation bolts, and for a piercing mill, they are the top cover fasteners. Therefore, during inspections, it is necessary to diagnose the operability of these components. Thus, it can be stated that optimal maintenance indirectly contributes to the long-term operation of the BS.

The authors know about a dozen indicators that characterize the risk of mechanical systems. A suitable solution algorithm has been developed for each model. Such a number of them somewhat disorients the repair personnel and restrains the effectiveness of the RBM strategy. Therefore, the next stage of the development of this direction should be the systematization of RBM models and the development of recommendations for their application in a certain production situation.