Evaluation of guidance provided by international standards on metrics and timelines for run-life estimation of oil and gas equipment

Run-life is a concept used in the oil and gas industry to express time to failure for running equipment. When estimating this as part of reliability engineering activities, different metrics and time periods are considered. One metric is the traditional ‘mean time to failure’ (MTTF), but alternatives such as ‘average run-time’ or ‘average run-life’ can also be considered. For calculating these metrics, different time periods can be used. For example, when estimating the MTTF of well completion equipment, operating times or running times are normally used. However, the periods can also include idle time, where the item is technically available, but associated parts of the production facility might not be. To promote consistency across the industry in how the metrics are interpreted and what is included when calculating them, ISO 14224 (2016) and IEC 60050-192 (2015) in tandem provide guidance to ensure quality in reliability data collection and analysis. While MTTF is defined and a theoretical basis is given, guidance on when to use the different timelines for the estimation is sparse. Neither ‘run-time’ nor ‘run-life’ is explicitly defined in these standards. They provide no guidance on how to interpret and use the metrics ‘average run-time’ and ‘average run-life’, despite these sometimes replacing the MTTF in reliability analyses. In this article, we discuss the variation in metrics and associated timeline definitions. A main purpose is to identify potential improvements in the international standards and suggest how to achieve appropriate guidance for consistent interpretation and use of time-to-failure metrics in the oil and gas industry. An additional purpose is to clarify whether all these metrics are really needed. There is particular confusion around ‘run-time’, which some interpret as a reliability metric and some as an item’s cumulative running time.
One suggestion is that the standards focus more on ‘running time’, by adding a formal definition, and clarify how it compares with operating time and run-time, and when to use it. We also suggest introducing ‘running time to failure’ and ‘operating time to failure’, which would be consistent with existing terminology, while clarifying the timeline being referred to. We use examples from well drilling and completion systems, to show the reliability implications for modeling and calculations.


Introduction
For the estimation of up, operating or running time to failure of equipment used in oil and gas operations (so-called run-life), there are several reliability metrics that could be considered. Run-life is used as an umbrella term to cover the relevant metrics. For an overview, Skoczylas et al. (2018) describe different run-life metrics for the estimation of artificial lift systems' reliability. The selection of the metric is related to the type of operation, the system considered and its functions, as well as the intended use for the metric. A wide range of use is found within reliability engineering activities, for the purpose of monitoring, controlling, and managing equipment performance: for example, as input to offshore risk or safety management. For this information to add value, a fundamental criterion is the consistent definition and use of the metrics.
In particular, the interpretation should be unambiguous, in the sense that reliability engineers should be performing calculations and making assessments on the same basis. Then, it is very important to understand the limitations of the analysis techniques and metrics used (Alhanati 2008). For that to happen, it must be clear which metrics are appropriate to use, and which timeline information should be included in the calculations. Sheldon et al. (2010) claim that it is important to use several run-life metrics, to achieve a good understanding of the underlying system reliability.
When dealing with downhole well completion equipment (i.e., equipment below the wellhead level, from the tubing hanger at the top to the equipment at the bottom of the well; see ISO 14224: 2016), the reliability considerations are linked to both cost and environmental issues. There might also be safety concerns, e.g., those related to the 'downhole safety valve' component. Another class of equipment falling under the category 'downhole well completion', which has been given significant attention in the literature lately, is the electrical submersible pump (ESP), used for artificial lift systems (Lastra 2017; Rubiano et al. 2015; Singh and Pateriya 2019). When addressing the time to failure of these, the concept 'run-life' is often used. 'Run-life' expresses how long an item can run before it fails (loses the ability to perform a required function), then requiring maintenance actions, i.e., being pulled, repaired, and/or replaced. A traditional metric used to represent this concept is the mean time to failure (MTTF). To cover a variety of purposes, besides insights into replacement and repair frequency and costs, other metrics such as 'average run-time' and 'average run-life' are sometimes introduced. Skoczylas et al. (2018) identify that there are different run-life measures that could be applied, e.g., for tracking the performance of artificial lift systems in oil and gas. One of these is the average run-time with variations. Based on the literature addressing 'run-life' analysis, it is evident that the distinctions between the metrics are unclear.
An objective of this article is to clarify these distinctions, in line with the guidance provided by two international standards especially: ISO 14224 (2016) and IEC 60050-192 (2015; frequently referred to as the International Electrotechnical Vocabulary, or IEV-192). The standards in tandem provide the main guidance on how to ensure quality in reliability data collection and analysis. The IEV-192 primarily provides definitions and is not a guidance document as such, but the statistical estimations can be derived from the definitions. Both documents include definitions of key concepts and clarify the type of data that is required for the calculation/estimation of different reliability metrics and parameters. A full overview of terms defined in ISO documents is available from the online browsing platform [ISO OBP (2022)]. In addition, the definitions given in IEV-192 (2015) can also be identified in the Online Electrotechnical Vocabulary, as "IEV Online" [Electropedia (2022)], produced by the IEC.
As one should expect, MTTF is already captured by the two standards, as well as definitions of relevant timelines for the time to failure estimation, including running time. However, as the standards capture neither 'average run-time' nor 'average run-life', there is a call to assess arguments for leaving them out and to discuss whether initiatives should be taken to ensure that one or both of them are considered for inclusion in the next revisions. There is also a need to clarify how these alternative metrics compare to MTTF.
Note the distinction being made here between 'running time' and 'run-time': According to the definitions above, run-time is broader than running time. When recording the run-time, the focus is on the duration "in service" or "in use". The distinction between the terms should be explicit, particularly in the ISO 14224 standard.
In this article, the definition and understanding of different run-life metrics and associated timelines are reviewed based on literature and the relevant international standards. Through the review, we discuss the quality of guidance on run-life metrics provided by international standards applied to oil and gas reliability, availability, maintainability, and safety (RAMS) analysis, or engineering purposes. As part of this review, cross-references are made to the associated ISO technical report ISO/TR 12489 (2013), which supports reliability calculations for safety systems. The standards have a strong and influential position in the industry and offer some clarity on how to use the metrics in a consistent way. They also offer quality assurance from an international community within the reliability engineering and technology area, related to the rationale for using relevant metrics and how to use them for run-life estimations, e.g., how to select the appropriate metric and which time period to include in the calculations or estimations.
The remainder of the article is structured as follows. Section 2 presents an overview of how the run-life metrics in focus are defined and described in the literature and in international standards, including the definition and meaning of MTTF. Next, Sect. 3 presents the timeline definitions given by the standards and discusses which time periods to use for run-life calculations. Then, in Sect. 4, we discuss the variety of metrics and their meaning and relevance, including a discussion on how to strengthen the guidance provided particularly by ISO 14224 (2016) and IEV-192 (2015). We use examples from offshore well drilling and completion systems to show the reliability implications. Finally, Sect. 5 gives some concluding remarks, in which we summarize key arguments and give some recommendations for oil and gas applications and for future standardization work.

Average run-time definition
Let us start with 'run-time' (RT), the basis for the calculation of 'average run-time' (RT AVG). RT is a metric perhaps most recognized from programming (software) applications, to capture the time or duration for running scripts, codes, or for running simulations. Such use is common in information technology applications, where the international standard ISO/IEC 2382 (2015), guiding information technology vocabulary, defines RT (i.e., 'execution time' or 'run duration') as "any instant at which the execution of a particular program takes place". However, RT use is also common in non-programming applications, where the term expresses how long an item has been in service or in use. The RT information is sometimes extended to a reliability context, where a statistical average of the sample available is calculated and then used as an estimate of run-life. An application area is the analysis of artificial lift and ESP systems (see, e.g., Al-Aslawi et al. 2010; Skoczylas et al. 2018; Vandevier 2010). RT then indicates the period during which a specific item was in use, from the time when it was started (e.g., a pump motor started), until it was stopped for whatever reason, and up to the point in time where the data were collected. The rules for when to stop the data collection could vary but should reflect the relevant "in use" time for the items and applications considered. For example, Gleirscher (2017) considers run-time for automated vehicles. For an oil and gas application, Quinn et al. (2014) consider the run-time of vapor recovery units (VRU), a type of equipment for flare gas utilization, and use the concept to indicate how long the equipment is running before stopping for some reason.
Again, the stop does not have to be due to failure, but RT can also be interpreted strictly as how long a specific item will be running before failure. This interpretation has a more reliability-oriented focus and is relevant for a wide range of safety critical systems, e.g., as addressed in Eastwood et al. (2013).
The RT AVG is simply the average of several run-times. This usually applies to a population of items and not to only one item. However, it is not clear whether only one time period (e.g., the first one) or several periods for the same item should be included in the calculations. In the literature, there is reference to a metric called 'cumulative runtime'; see, e.g., Munro et al. (2016), Bailey et al. (2014), and Bybee (2008), highlighting or specifying the point that several periods of running time are added up. Included in the data set, then, could be both periods where the item(s) fail(s) in the end and failure-free periods. Hence, the data set could represent a mix of failure and no-failure censoring. Such a mix is in line with Skoczylas et al.'s (2018) interpretation of RT, where it is used as a metric of the cumulative running time for a specific item (e.g., a component or system), regardless of whether it has failed. The authors (ibid.) also made a distinction between 'actual runtime', defined as "the time during which the system is actually running", and 'duration', defined as "the time from when the system is first started to when it fails or last stopped", which could include periods in which the system is idle.
For an item, note the distinction between:
• the cumulative running/operating/up time until the item fails or is stopped for any reason ('cumulative RT');
• the cumulative running time until failure;
• the lapsed up time until the item fails or is stopped for any reason (duration).
By dividing the cumulative running, operating or up time (i.e., the summation of these time periods) for a set of items by the number of items, we then get the RT AVG for these items. Depending on the purpose, one might choose to focus on items that have failed or that are currently running; e.g., Skoczylas et al. (2018) point to the average RT for running systems (i.e., items currently active) and the average RT for items that have failed, or have been pulled for whatever reason, during a time window.
Although not always explicit (see, e.g., Diaz Sierra et al. 2014), Skoczylas et al.'s (2018) definition of average RT for running systems is basically the same as Sawaryn's (2000) definition of 'instantaneous runtime' (sic). This is calculated as "the total runtime of all running units divided by the number of running units", where the total number includes all running times recorded for both those installed initially and those installed later during workovers, up to time t. This time t is then the instant determining the cumulation of running times. See Sawaryn (2000) for how it can be linked to renewal theory. However, we find the label "instantaneous" to appear somewhat confusing, as the calculation seems to rather capture the 'average' value and not an 'instant' value.
The concept of 'instantaneous run-time' is also given other interpretations. For analysis of PCP systems, Sheldon et al. (2010) describe 'instantaneous run-time' as: "the average run-time of all systems still running or pulled within a one-month window". Here, 1 month is the moving time window reflecting the 'instantaneous' aspect, although 'instantaneous' normally points to a particular infinitesimal interval of time [t, t + dt] in traditional failure rate theory (see, e.g., Kapur and Pecht 2014). Nevertheless, an 'average runtime of pulled systems' is also calculated using the RT of all systems pulled within a longer moving window, e.g., 1 year. Refer to Sheldon et al. (2010) for an example comparing these metrics with the MTTF.
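The two moving-window averages described above can be sketched as follows. This is a minimal illustration, not the method of Sheldon et al. (2010): the record layout `(start_day, pull_day)`, the function names, and the simplifying assumption that each system runs continuously from start to pull (no idle time) are all hypothetical choices made here.

```python
def run_times_in_window(systems, t, window, include_running):
    """Collect run-times for systems pulled within (t - window, t].

    systems: list of (start_day, pull_day) tuples; pull_day is None if the
             system has not been pulled (hypothetical record layout).
    Assumes continuous running between start and pull (no idle time).
    """
    selected = []
    for start_day, pull_day in systems:
        if start_day > t:
            continue  # not yet installed at time t
        if pull_day is None or pull_day > t:
            if include_running:
                selected.append(t - start_day)  # still running at time t
        elif pull_day > t - window:
            selected.append(pull_day - start_day)  # pulled inside the window
    return selected


def average(values):
    if not values:
        raise ValueError("no systems fall inside the window")
    return sum(values) / len(values)


# Illustrative data only (days since field start).
systems = [(0, 350), (10, None), (100, 395), (200, 390), (300, None)]
t = 400

# Systems still running or pulled within a one-month (30-day) window.
instantaneous_rt = average(run_times_in_window(systems, t, 30, include_running=True))

# Systems pulled within a one-year (365-day) window; running systems excluded.
pulled_rt = average(run_times_in_window(systems, t, 365, include_running=False))
```

The two results differ because the populations differ: the first mixes running and recently pulled systems, while the second contains pulled systems only.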
Returning to the RT AVG , Rubiano et al. (2015) focus specifically on the artificial lift systems "currently" running in the wells. RT is then the cumulative time that an item still in operation has been running. This is one of the ways to define the relevant population, according to Skoczylas et al. (2018). The RT AVG is defined by Rubiano et al. as: "the average value of run-time of all artificial lift systems in a specific field currently in operation", thus limiting the population to only the active items. All down time periods, e.g., the period between installation and start-up, are then discounted.
By focusing on the running time of items in general, as a main parameter, the metric could be relevant to a wide range of equipment. Particularly, when dealing with equipment having rotating functions, such as pumps in general, it could be informative to have a specific metric on the average running time to failure.
Note that the use of 'average' in this context refers to the statistical average (arithmetic mean) of the time data collected from the population of items, i.e., the cumulative time collected, while 'mean' refers to the true value for the type of items considered, assuming an infinite population. The RT AVG for in-use time registered in the time interval [0, t] can then be expressed as

$$\mathrm{RT}_{\mathrm{AVG}}(t) = \frac{1}{n} \sum_{i=1}^{n} r_i(t) \qquad (1)$$

where $n$ is the number of items, and $r_i(t)$ is the cumulative running (or operating or up) time for item $i$.
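As a small illustration of how the choice of population changes the result, the sketch below computes the arithmetic mean of the cumulative times for all items, for running items only, and for failed items only. The `Item` class, its field names, the status labels, and the data are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Item:
    """One installed item; fields are illustrative assumptions."""
    cumulative_time: float  # r_i(t): cumulative running (or operating or up) time, days
    status: str             # "running", "failed", or "pulled" (stopped without failure)


def rt_avg(items, statuses=None):
    """Average run-time over the items whose status is in `statuses` (all if None)."""
    selected = [it.cumulative_time for it in items
                if statuses is None or it.status in statuses]
    if not selected:
        raise ValueError("no items in the selected population")
    return mean(selected)


items = [
    Item(400.0, "running"),
    Item(120.0, "failed"),
    Item(250.0, "failed"),
    Item(310.0, "running"),
    Item(90.0, "pulled"),
]

rt_avg_all = rt_avg(items)                   # whole population
rt_avg_running = rt_avg(items, {"running"})  # items currently active
rt_avg_failed = rt_avg(items, {"failed"})    # items with a recorded failure
```

With these made-up numbers the three averages are 234, 355, and 185 days, which shows why the population must be stated whenever an RT AVG value is reported.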
In principle, as the RT AVG expresses information about in-use time, it could be used as input for the estimation of MTTF, together with input on the number of failures in the observation period. As such, there is a relationship to the MTTF. For constant failure rates (or 1/MTTF), the maximum-likelihood estimator includes both the failed and non-failed items, which is also the population considered for the RT AVG calculations. Note that RT AVG estimates may be quite different, depending on whether running and/or failed systems are included in the population, and whether stand-by and/or idle time is included or not in the RT of an individual item. These in turn may be quite different from the estimated MTTF for the population. For example, when considering the reliability of so-called progressing cavity pumping (PCP) systems, Karthik et al. (2014) use data collected from 2 years in operation; they then compare the average run-time of 300 days for all systems, and 258 days for pulled (failed) systems, with a much higher MTTF of 1651 days. There are different ways to illustrate the difference (e.g., chronograms), but the point is that the selection of the RT metric might influence reliability management. However, there are also examples from literature where the RT AVG and MTTF concepts are mixed; for example, in relation to ESP reliability, Mubarak et al. (2003) formulate: "… it is shown that 'Mean time to failures' (Average runtime) significantly increased…" No clear distinction is then made between the metrics.
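The gap between the averages and the MTTF estimate can be reproduced with a few lines of code. The sketch below assumes a constant failure rate, for which the maximum-likelihood estimate of MTTF is the total time accumulated by all items (failed or not) divided by the number of failures. The function name and the data are hypothetical and are not taken from Karthik et al. (2014).

```python
def mttf_exponential_mle(times, failed):
    """MLE of MTTF under a constant failure rate with right-censored data.

    times:  accumulated time for each item (failed and non-failed alike)
    failed: matching booleans, True if the item's period ended in failure
    """
    n_failures = sum(1 for f in failed if f)
    if n_failures == 0:
        raise ValueError("at least one failure is needed for a finite estimate")
    return sum(times) / n_failures


# Illustrative data only: six items, two of which have failed.
times = [400.0, 120.0, 250.0, 310.0, 90.0, 500.0]
failed = [False, True, True, False, False, False]

rt_avg_all = sum(times) / len(times)
failed_times = [t for t, f in zip(times, failed) if f]
rt_avg_failed = sum(failed_times) / len(failed_times)
mttf_hat = mttf_exponential_mle(times, failed)
```

Both quantities use the same total time in the numerator; the average divides by the number of items, while the MTTF estimate divides by the number of failures. When most items have not failed, the MTTF estimate is therefore much larger than either average.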
For the situation where only failed systems are considered for the RT AVG , the approximation of the mean is made by censoring on items with failure, i.e., by ignoring items surviving time t. This gives a lower and less accurate estimate (i.e., an underestimation of the mean RT). Pflueger (2011) points to this second interpretation as a common way to calculate the RT AVG , meaning that there could be discrepancies if the definition and population are not clarified.
Furthermore, when estimating the RT AVG, typically, there are several smaller stops or pauses included. Quinn et al. (2014) point to uncertainty related to some items (e.g., for VRU components), as during normal operation, these frequently take brief pauses, stop and restart. These effects are typically ignored in the reliability modeling and analysis: "The algorithm used to extract run-times from the temperature profile often missed these short pauses" (Quinn et al. 2014). Although this challenge can be addressed to some extent by the use of smart technology, missing the pauses leads, in contrast to censoring on failures, to an overestimation of the actual RT. Here, RT refers to the time from the item being put into operation to the time when operation is stopped at time t. Uncertainty, then, is about how well the estimate derived from the modeling captures the actual time, which can be challenged by the sample size and the data collection quality.

Average run-life definition
As for RT, we find the metric 'average run-life' (RL AVG), used to express running capabilities for different types of equipment, to be particularly prevalent in the literature addressing artificial lift systems and ESPs. However, several papers fail to define this metric, assuming that the reader is aware of its meaning, which we see as a main problem for its interpretation and understanding.
The wording 'run-life' is sometimes used to express and cover the situation of the running time of items with recorded failures, i.e., the RT for failed items. For example, Al-Sadah (2014) uses 'run-life' to express the time of different ESPs running to failure, as opposed to the time they were in the well. As another example, when discussing the reliability of PCPs for a 2-year period, Karthik et al. (2014) refer to the period from start to stop of running, due either to failure or to the end of operations, as the 'run-life' time. This shows that the censoring criteria are not consistent, as different time periods could be included in the definition, meaning that there are different ways to interpret the metrics. 'Run-life' could capture not only the time during which the item is actually running in its life (the cumulated running time of the item) but also the period during which the item is in service being functional (running or not).
The interpretation of RL AVG , given the above understanding of RL, is then similar to the 'RT AVG for failed items'. It refers to a statistical average based on the RL sample considered. Again, the distinction is not always clear between the RL AVG observed (based on an average of failed systems) and the expectation for the (true) mean RL, which needs to include the whole population and consider running items and items that were stopped but did not fail. Mogollon et al. (2018) provide an example of this, by stretching the observed average RL toward a mean (expected) RL, characterizing the RL AVG as a "lagging tracking method", and indicating a focus on items with registered failure. They also claim that MTBF (Mean time between failures) is a more appropriate metric, as it considers items still running, i.e., being more "forward". A similar argumentation is made in Camilleri and Macdonald (2010), who claim that the mean time between pulls (MTBP) is more appropriate, based on its focus on items still running and 'running time'.
The focus on only the failed items is also captured in the definition given in Rubiano et al. (2015), where, by corresponding only to the artificial lift systems (items) pulled out from the well, the run-life is distinct from the run-time. The RL AVG is: "the average value of run-life of all artificial lift systems in a specific field currently pulled out or failed." For RT AVG and RL AVG, two disjoint datasets are considered for the calculations as, according to the definitions in Rubiano et al., RT and RT AVG capture strictly the active items. In that way, the authors make a clear distinction between the two metrics, although the definitions remain somewhat inconsistent with the typical RT definition (see Sect. 2.1). By covering only pulled (failed) items, such an interpretation also makes RL AVG quite a poor estimator for the mean RL.
Al-Jazzaf et al. (2019) point to a method, or an adjusted RL metric, called the 'Dynamic Average Equipment Run Life' (DAERL). The starting point is the average equipment RL (ERL), which is calculated as the "total exposure time of all systems either currently operating or earlier failure with respect to the total number of systems". Systems can be interpreted as items, in this context. A problem might be that the metric fails to properly capture the current condition, as an average is made over the whole population, meaning also that new items installed, typically, will bring down the average. To compensate, first, only the failed items are considered for the average run-life calculation. Then, a list of the remaining items, some currently running, is screened for those having higher run lives than the initial average, and an average of this selection is calculated. Then, finally, the average of the two is calculated as the DAERL value. The idea is that this adjustment will provide a more realistic and useful estimate. Refer to Al-Jazzaf et al. (2019) for further details and examples. There could also be other 'adjusted' RL metrics, and it might be a practical way to handle the mix of failed and non-failed items in the population, but we struggle to see the basis for it.
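The three steps described above can be sketched as follows. This is only one reading of the verbal description given here, not the implementation of Al-Jazzaf et al. (2019); the function name, the argument layout, and the fallback used when no remaining item exceeds the initial average are assumptions made for this example.

```python
def daerl(failed_run_lives, other_run_lives):
    """Sketch of a DAERL-style calculation (one interpretation, see lead-in).

    failed_run_lives: run lives of items with a recorded failure
    other_run_lives:  run lives of the remaining items (some still running)
    """
    if not failed_run_lives:
        raise ValueError("at least one failed item is required")

    # Step 1: average run-life of the failed items only.
    avg_failed = sum(failed_run_lives) / len(failed_run_lives)

    # Step 2: screen the remaining items for run lives above that average.
    above = [rl for rl in other_run_lives if rl > avg_failed]
    if not above:
        # Assumption: with nothing to adjust by, fall back to the step-1 average.
        return avg_failed
    avg_above = sum(above) / len(above)

    # Step 3: average of the two averages.
    return (avg_failed + avg_above) / 2


# Illustrative data only (days).
failed = [100.0, 200.0, 300.0]        # average 200
others = [50.0, 250.0, 350.0, 150.0]  # 250 and 350 exceed 200, average 300
daerl_value = daerl(failed, others)
```

Because only the longer-lived non-failed items enter step 2, the result can never fall below the failed-item average, which illustrates why the statistical basis for the adjustment is hard to pin down.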

MTTF definition
The MTTF metric, i.e., the 'mean time to failure', is widely applied and well covered in reliability engineering textbooks, as well as in international standards and technical reports, particularly in ISO 14224 (2016) but also in IEV-192 (2015) and ISO/TR 12489 (2013). Thus, we will not provide an extensive discussion here. Refer to standard reliability engineering textbooks for relationship to failure rate and the use or applicability of various failure distributions, e.g., Kapur and Pecht (2014) and Tobias and Trindade (2012).
Nevertheless, the understanding of this metric plays a key role in the discussion in this article, as it expresses statistical expectation, while both the RT AVG and RL AVG represent statistical averages. However, despite the MTTF metric having long traditions, the use of the new metrics indicates there could be perceived gaps in ISO 14224:2016, regarding the run-life estimation and interpretation related to the time periods used for its estimation. The MTTF metric is defined in ISO 14224 (2016) as: MTTF = the expected time before the item fails, which can be expressed mathematically, with reference to the failure density function f(t), as

$$\mathrm{MTTF} = \int_{0}^{\infty} t \, f(t) \, \mathrm{d}t \qquad (2)$$

The failure density function gives the probability per unit of time that a failure occurs at time t, for an item that starts operating at time 0 (conditioning on the item having operated up to time t would instead give the failure rate). This can include both new items and items repaired considered 'as good as new'. We will not go into discussions around the conceptual description of 'expectation', and refer to standard statistics or reliability theory textbooks for further details.
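As a short worked example of the expectation, consider the special case of a constant failure rate $\lambda$, i.e., an exponential time to failure. This case is chosen here only for illustration; the definition itself does not assume it. Integration by parts shows that the MTTF is then the inverse of the failure rate:

```latex
\begin{align*}
f(t) &= \lambda e^{-\lambda t}, \qquad t \ge 0, \\
\mathrm{MTTF} &= \int_{0}^{\infty} t \, \lambda e^{-\lambda t} \, \mathrm{d}t
  = \Bigl[ -t \, e^{-\lambda t} \Bigr]_{0}^{\infty}
    + \int_{0}^{\infty} e^{-\lambda t} \, \mathrm{d}t \\
  &= 0 + \frac{1}{\lambda} = \frac{1}{\lambda}.
\end{align*}
```

For other distributions, such as the Weibull, the integral gives a different expression, so the inverse relationship should not be applied without checking the constant failure rate assumption.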
The definition above is the same as that given in ISO 20815 (2018) on production assurance and reliability management, and in ISO/TR 12489 (2013) on reliability assessment for safety systems. However, there are also others. A definition search on the online browsing platform [ISO OBP (2022)] shows, besides the one mentioned, that there are three alternative 'MTTF' definitions that could be considered from ISO documents; see Table 1. One of these expresses basically the same notion as the one above, i.e., the "expectation of the time to failure" instead of the "…time before the item fails".
Mathematically, when assuming an exponential distribution of failures, MTTF is often conveniently estimated as the inverse of the estimated failure rate. This is according to ISO 14224 (2016; item C.3.2.1 of the standard), and both are estimated based on the number of failures observed and the "aggregated time in service, measured either as surveillance time or operating time", leaving it somewhat open regarding which time periods, exactly, should be considered in this "aggregated time in service":

$$\widehat{\mathrm{MTTF}} = \frac{1}{\hat{\lambda}} = \frac{\tau}{n_F} \qquad (3)$$

where $\hat{\lambda}$ is the estimated failure rate, $n_F$ is the number of failures observed, and $\tau$ is the aggregated time in service. Refer to Sect. 3 for an overview of the different time periods presented in the ISO 14224 standard. The other standards we reviewed are of little use in providing further guidance. In IEV-192 (2015), the time period referred to in the definition of MTTF is strictly the time interval for which the item is in an 'operating state' (i.e., the operating time), where the duration of the operating time could be expressed in a variety of ways, depending on which units are appropriate to the situation, e.g., calendar time or number of cycles. It is also suggested in ISO 3977-9 (1999) that 'operating time', or 'running time', be used, representing the number of hours in service. This makes it important to clarify what exactly is the difference between the 'operating time' and the 'service time'. It calls for a discussion on whether it is operating time or rather the up time or service time that is to be considered.
Instead of MTTF, Mogollon et al. (2018) use the MTBF to estimate the run-life. The expression used for this calculation (i.e., survival fraction calculated from exp[-time/MTBF]) nevertheless corresponds to the MTTF expression in (3), when the maintenance time is ignored and the failures are assumed to be exponentially distributed. However, in Mogollon et al. (ibid.), it seems that only items still running are considered, which is not common practice or in line with the above-mentioned international standards for use of the term MTBF. We also find the opposite in the literature, which further indicates the inconsistency in run-life estimation practice: Komova et al. (2013) claim that MTBF of ESP: "…is calculated as average run days for all ESPs that already failed". Hence, we see that reliability terms continue to be misused in oil and gas reliability analysis and management, with run-life metrics like MTTF and MTBF not always estimated in ways that are consistent with reliability theory literature and standards.
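For reference, the survival fraction under the exponential assumption can be computed as below. This is a generic sketch of the textbook expression, not the calculation of Mogollon et al. (2018); the function name and the numerical values are hypothetical.

```python
import math


def survival_fraction(time, mttf):
    """Fraction of items expected to survive beyond `time`.

    Assumes exponentially distributed times to failure, so that
    R(t) = exp(-t / MTTF). `time` and `mttf` must use the same unit.
    """
    if mttf <= 0:
        raise ValueError("mttf must be positive")
    if time < 0:
        raise ValueError("time must be non-negative")
    return math.exp(-time / mttf)


# Illustrative value only: an MTTF of 1000 days.
mttf_days = 1000.0
at_start = survival_fraction(0.0, mttf_days)       # all items survive at t = 0
at_mttf = survival_fraction(mttf_days, mttf_days)  # exp(-1), about 37 %
```

Under this assumption, only about 37 % of the items are expected to survive up to the MTTF, so the MTTF should not be read as a typical or guaranteed run-life.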
In the following sections, we focus on the question of which time periods should be used in the estimation of the run-life metrics, and how these should be interpreted, mainly using the case of downhole artificial lift equipment as an example. We also discuss further the distinctions between 'up time' and 'operating time', and between 'operating time' and 'running time', guided by ISO 14224 (2016), as these distinctions are essential for interpreting the metrics. We also comment on the extent to which the guidance provided by the current version of ISO 14224 is adequate, despite missing out on several metrics applied in the oil and gas industry. It is a question of both what to include and how to use these metrics and associated timeline definitions for reliability parameter calculations, particularly the MTTF, the RT AVG, and the RL AVG, but also, to some extent, the failure rate.

Categorizing time before failure
To understand the metrics and how to use them, it is essential to clarify the associated timeline definitions. As a starting point, timeline issues are mainly covered in subsection 8.3 of ISO 14224 (2016). This subsection of the international standard includes an overview, which breaks down the overall 'surveillance time period' into 'up time' and 'down time', and in which the 'up time' is further broken into two main elements: the 'operating time' and the 'non-operating time'. As ISO 14224 (2016) defines the 'non-operating time' as strictly part of 'up time', it follows that it cannot be considered as part of any down time. This is, however, not the situation in IEV-192 (2015), where the non-operating time can be part of both up and down time, the meaning of 'non-operating time' being that the item is in a "state of not performing any required function".
In Table 2, we identify 'running time' as a specific 'up time' period and the main part of the 'operating time'. The same table shows that the 'non-operating time' includes three parts, including the 'externally disabled time'. However, when presenting 'non-operating time' in subsection 8.3, ISO 14224 (2016) ignores the 'externally disabled time'; it is not in any of the categories, which is why it is placed inside a set of brackets in the table below. This is in some way consistent with the timeline presentation in ISO/TR 12489 (2013). However, in a note to entry of the definition of 'idle state' (in item 3.38 of the international standard), it is clearly stated that the 'non-operating time' comprises the 'idle time', the 'stand-by time', and also the 'externally disabled time'.
Furthermore, the text of item 8.3 in the ISO 14224 (2016) standard indicates that "when equipment is in an idle state or in hot stand-by, being ready for operation when started, it is considered to be operating (or 'in-service')". For the idle state, this appears inconsistent with the information presented in Table 4 of the standard, where idle time is, instead, included as non-operating time; it also raises the question of whether, according to the standard, 'in-service' and 'operating' should mean the same thing.
To provide guidance related to the evaluation of (downhole) equipment reliability, Skoczylas et al. (2018) suggest that it is best to use the 'actual run-time' (defined as "the time during which the system is actually running") or 'duration' (defined as "the time from when the system is first started to when it fails or last stopped"), "if there is little downtime…". Skoczylas et al. explicitly describe 'running systems' as 'operating systems', indicating that they understand these two terms to be interchangeable. Their wording is then, however, not entirely consistent with the terminology defined by ISO 14224 (2016). In the terminology of the standard, their suggestion would translate into using 'operating time' or 'up time' if the 'idle' or 'cold stand-by' time is short.

Up time versus operating time
To distinguish between up time and operating time, let us start with the specific definitions given in ISO 14224 (2016):
• up time = time interval during which an item is in an up state
• up state (available state) = state of being able to perform as required (adopted from IEV-192:2015)
• operating time = time interval during which an item is in an operating state (adopted from IEV-192:2015)
• operating state = state of performing as required.
We also add the following related definition given in IEV-192 (2015): operating time to failure = operating time accumulated from the first use, or from restoration, until failure.
Based on the definitions above, we understand the distinction between the two as being that 'up time' refers to all the time in which the equipment has the ability to perform as required, while 'operating time' refers only to the time in which it is actually performing as required. Accordingly, 'up time' includes not only 'operating time' but also the 'non-operating time', where the equipment is idle, in cold stand-by or externally disabled (i.e., able to perform as required but not actually performing), as per Table 2.
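As an illustration of this interpretation, the following minimal sketch sums recorded state durations into running, operating, non-operating, and up time. The state labels, the function name, and the example records are hypothetical; the grouping assumes the categorization described for Table 2 (start-up, run-down, and hot stand-by counted as operating time; idle, cold stand-by, and externally disabled counted as non-operating time).

```python
# Hypothetical state labels, grouped as in the interpretation of Table 2 above.
OPERATING_STATES = {"running", "start-up", "run-down", "hot stand-by"}
NON_OPERATING_STATES = {"idle", "cold stand-by", "externally disabled"}


def summarize_timeline(records):
    """Sum hours per time category from (state, hours) records."""
    totals = {"running": 0.0, "operating": 0.0, "non_operating": 0.0, "down": 0.0}
    for state, hours in records:
        if state in OPERATING_STATES:
            totals["operating"] += hours
            if state == "running":
                totals["running"] += hours  # running time is a subset of operating time
        elif state in NON_OPERATING_STATES:
            totals["non_operating"] += hours
        elif state == "down":
            totals["down"] += hours  # down time is never part of up time
        else:
            raise ValueError(f"unknown state: {state!r}")
    # Up time covers all time in which the item is able to perform as required.
    totals["up"] = totals["operating"] + totals["non_operating"]
    return totals


# Illustrative records only (hours), not field data.
example_records = [
    ("idle", 2000.0),
    ("start-up", 10.0),
    ("running", 5000.0),
    ("externally disabled", 300.0),
    ("running", 1500.0),
    ("down", 400.0),
]
timeline = summarize_timeline(example_records)
print(timeline)
```

With these illustrative records, the running time (6500 h) is slightly smaller than the operating time (6510 h), which in turn is considerably smaller than the up time (8810 h), so the choice of timeline directly changes any time-to-failure estimate built on it.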
However, this interpretation may be challenged, based on notes to entry attached to the definition of 'operating time' in ISO 14224 (2016). The second note to entry specifies that the operating time "includes actual operation of the equipment or the equipment being available for performing its required function". Furthermore, a fourth note to entry indicates that "it could start from the time of installation, time of commissioning or time of start of service". This makes it difficult to distinguish this time concept from the meaning of 'up time'.
To give an example of this: Skoczylas et al. (2018) mention that there are cases "in which the system is installed in the well much earlier than it is first started, and the operator wishes to consider the effect of the time in which the system is idle in the well on the expected time to failure". We have this situation when the downhole artificial lift systems are installed in the well much earlier than the host facilities are ready to receive the well production or, in offshore operations, in situations where the host platform may not be operational for long periods of time due to weather, forcing the downhole systems to be idle (i.e., in a non-operating up state during non-required time) for such periods. This is also the situation when there are so-called 'dual systems' installed (see, e.g., Horn et al. 2003; Popov 2001), to provide redundancy and continue to operate the well with a second system after the first system fails. Many of these systems that are idle for a period may be found to have failed when tested periodically, or when first activated after a long time in a non-operating state, therefore failing 'on-demand'. We distinguish between the failures revealed (detected) by tests or first activation, and the failures caused by tests or first activation, including the so-called 'maintenance-induced failures'.

Running time
To capture the essence of the RT AVG and the RL AVG metrics in particular, it is key to understand the timeline concepts 'running time', 'operating time', and 'up time', which represent the variety of time periods in focus for these two metrics. ISO 14224 (2016) gives no formal definition of 'running time', but describes it in the normative part of the standard as: "the active operational time for the equipment". The word 'active' is not defined, but can be interpreted as the equipment being in use, performing its main function. Thus, implicitly, running time should be considered as a subset of the operating time. Practically speaking, this is the time when the equipment is 'in work' and not only technically available, meaning also that it should not cover hot stand-by time or start-up or run-down times, which are other, disjoint subsets, according to ISO 14224 (2016) and ISO/TR 12489 (2013). Hot stand-by indicates a condition where the equipment can immediately be brought into active operation when needed. For example, for redundant systems when the primary item fails to perform its required function, the hot stand-by item can be immediately activated.
Hence, in contrast to IEV-192 (2015), where the concept of running time is not at all defined or described, ISO 14224 (2016) outlines 'running time' as a key part of the operating time, applicable to, e.g., rotating machinery.
Based on the categorization in Table 2, it should be sufficiently clear that running time is different from operating time, and that it does not include either run-down, start-up, or hot stand-by time. Adding a formal definition should be a simple task.

On the need for the average run-time and average run-life metrics
Both the RT AVG and the RL AVG cover the spectrum of 'running time', 'operating time' and 'up time', as the relevant timelines in focus. For both, we have identified several possible definitions, and it may be argued that the two concepts have the same meaning, if the focus is on the population of failed items. Typically, this is not the situation, as the RT normally captures the full population of items, both those still running and those that have failed. Then, the RL AVG can be seen as a special case of RT AVG, where only the failed items are included, i.e., 'RT AVG of failed items'. Sometimes, the RT AVG captures only items currently running. In any case, when using the metrics, one should be clear on the population considered. The main issue is that the focus is different: RT focuses on cumulative time in use, while RL focuses on the cumulated time to failure for an item. As such, one could argue a need for both. Rubiano et al. (2015) have suggested a sample selection, based on censoring, and in that way have succeeded in making a somewhat reasonable distinction: separating the metrics based on whether the items have failed or not. However, both of the averages produce a poor approximation of the expected (running, operating or up) time to failure for the item. A good approximation requires having available a large population covering sufficient time, and one should in general avoid interpreting the RT AVG or RL AVG as expectation estimates for comparison with the MTTF, as also indicated in Skoczylas et al. (2018). Particularly for RL AVG, it is tempting to link it to other reliability metrics such as MTTF, when both have a focus on time to failure.
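The difference between the populations can be made concrete with a small numerical sketch. The sample below is hypothetical, and the third quantity (total accumulated time divided by the number of failures) is the usual estimate of the MTTF under an assumed constant failure rate with right-censored items; it is included only for comparison and is not a definition taken from the standards.

```python
from statistics import mean

# Hypothetical sample: (accumulated running hours, has the item failed?).
items = [
    (1000.0, True),
    (2500.0, True),
    (4000.0, False),  # still running (censored)
    (6000.0, False),  # still running (censored)
    (8000.0, False),  # still running (censored)
]

all_times = [hours for hours, _ in items]
failed_times = [hours for hours, failed in items if failed]

# Average run-time over the full population (running and failed items).
rt_avg = mean(all_times)

# Average run-life over the failed items only.
rl_avg = mean(failed_times)

# Total time on test divided by the number of failures
# (assumes a constant failure rate; for comparison only).
mttf_estimate = sum(all_times) / len(failed_times)

print(rt_avg, rl_avg, mttf_estimate)
```

In this sample, the average over failed items only (1750 h) and the average over all items (4300 h) are both well below the constant-failure-rate estimate (10750 h), which illustrates why neither average should be read as an expectation estimate when many items are still running.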
This raises the question of whether the information provided by the RL AVG actually adds any value when it typically produces an underestimation of the expected 'run-time' to failure. In a reliability context, it is common to define metrics in terms of averages and link them to the (true) means and expected values. Referring to the expectation of the 'run-time' to failure implicitly raises the issue of how it compares to the traditional MTTF metric and when to use each metric. If the main distinction between the two metrics is the size of the population considered, it would perhaps be better to label them as means, and estimates of these means, for the time period considered, probably ensuring higher specificity and consistency. It will at least make it clearer what we are dealing with. However, that strategy requires a consistent definition of the mean, as well as a clarification of which timeline definition is considered. We will address this further in Sect. 4.2.
A key argument against the introduction of the two average metrics mentioned above is the apparent variety of definitions, i.e., there is a lack of specificity in both of them regarding which time periods to include and how to censor the data. This would to a large extent be avoided if the distinction was formalized by including these definitions into future revisions of international reliability standards and technical reports. Clarification could be given on which time periods are appropriate for equipment data collection and analysis purposes, and how to interpret them and the relationship to the MTTF, to facilitate consistent use of the metrics. Adding at least a note on this could be useful in international standards guiding the use of these metrics.

How to achieve MTTF consistency
In the previous section, we reviewed time periods relevant for the RT AVG and RL AVG calculations. For both, the matter of which ones are appropriate relates to whether the information is useful. This usefulness is also influenced by the population size. If the averages are to be compared with mean values or MTTF, there are reasons to criticize the population (i.e., the failure or survival censoring). However, RT or RL, or averages of these, could still provide meaningful information, as long as we are clear on what is captured, do it consistently, and understand the distinction between the average and the mean RT (or RL). The challenge is perhaps more the consistency in MTTF estimation, despite this metric having been around for some time. It is not so much the definition being the problem, but rather what time periods to include in the "time before the item fails" part.
The key is to understand the MTTF and identify why the current guidance could lead to inconsistent run-life estimation practice. The notes to entry of the MTTF definition given in ISO 14224 (2016; item 3.62) serve as a good starting point.
The first note to entry refers to ISO/TR 12489 (2013), which provides a more theoretical foundation. In this technical report, it is noted specifically that the MTTF should not be mixed with the design lifetime of the equipment. This is a problematic link in other standards and also in the literature. For example, ISO/TR 19972-1 (2009) defines MTTF as the "mean lifetime of a component that has not been repaired since its production, based on a statistical mean, using times to failure as the definition of failure". Cui and Li (2007) refer to MTTF as the expected lifetime of the components considered. Okaro and Tao (2016) refer to MTTF as the "mean of the distribution of a product's life calculated by dividing the total operating time accumulated by a defined group of devices within a given period of time by the total number of failures in that time period". This corresponds to the information provided by maximum-likelihood estimation, and mixes estimation and true value, i.e., the mean. ISO/TR 12489 (2013) states that sometimes it may be more understandable to rather express lifetime using the unreliability, e.g., the probability of failure during the design life. This places the focus on the failure probability rather than on the survival probability. We return to this below.
The second note to entry in ISO 14224 points to the definition of MTTF given in IEV-192 (2015) as the "expectation of the operating time to failure". This implies that MTTF refers only to operating time and not the full spectrum of time periods that comprise the up time (see Sect. 3).
The third note to entry in ISO 14224 refers to Annex C, where further guidance is provided with regard to the interpretation and calculation of reliability parameters. In this annex (in the section on the mathematics of availability), it is indicated that MTTF should be estimated using the actual up times observed in the field, which is not at all in line with the definition in IEV-192 (2015), as per the above paragraph. However, later in Annex C (the part on the mathematics of failure rate), the 'time to fail' (TTF) is described as "the duration of functioning observed in the field", and it is further stated that, in practice, the sum of TTFs is often replaced by the total operational time of the units investigated. This is now fully in line with the definition of MTTF in IEV-192 (2015). Note also that the text in Annex C of ISO 14224 specifically warns that assuming a constant failure rate in situations where wear-out failures are present for components or parts may result in underestimating (for low operating times) or overestimating the equipment reliability (for high operating times). The examples later in this section (in Sect. 4.3) give some insights into this.
The timeline issues addressed above show that the 'time' period in the definition of MTTF is not sufficiently clear, which challenges the MTTF's applicability and value and could be difficult to deal with. One option, as an acceptable way forward, may be to define and use more than one version of the MTTF. A time period index could then be used to distinguish between the different versions, for example expressing them as MTTF*, where * = U for up time/failure, * = O for operating time/failure, and * = N for non-operating time/failure:
• MTTF U, or simply MTTF for this version only, as the 'mean time to failure', using all up time (and therefore both operating time and non-operating time), for the aggregate time in-service, and considering all failures.
• MTTF O, as the 'mean operating time to operating failure', using just the operating time and considering only failures observed when the system is operating.
• MTTF N, as the 'mean non-operating time to non-operating failure', using just the non-operating time and considering only failures observed when the system is not operating (and still in an up state).
This maintains the identity of the MTTF abbreviation, while also indicating the specificity of the time periods considered. Note that MTTF O would then exclude non-operating failures occurring over the observation period, and MTTF N would exclude operating failures occurring over the observation period. The mean operating time to non-operating failures, as well as the mean non-operating time to operating failures, could also be defined, but they do not seem very useful and are not considered in this article.
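A minimal sketch of how the three indexed versions could be estimated from field records is given below. It assumes that each version is estimated as the relevant accumulated time divided by the relevant number of failures (i.e., constant operating and non-operating failure rates); the record layout, the helper name, and the numbers are hypothetical.

```python
def mean_time_to_failure(total_hours, failure_count):
    """Accumulated time divided by failures; None when no failure was observed."""
    if failure_count == 0:
        return None
    return total_hours / failure_count


# Hypothetical records: (operating hours, non-operating up hours, state at failure).
# The state at failure is None for items that have not failed (censored).
items = [
    (5000.0, 500.0, "operating"),
    (3000.0, 2000.0, "non-operating"),
    (8000.0, 1000.0, None),
    (4000.0, 1500.0, "operating"),
]

operating_hours = sum(op for op, _, _ in items)
non_operating_hours = sum(non_op for _, non_op, _ in items)
up_hours = operating_hours + non_operating_hours

operating_failures = sum(1 for _, _, mode in items if mode == "operating")
non_operating_failures = sum(1 for _, _, mode in items if mode == "non-operating")
all_failures = operating_failures + non_operating_failures

# MTTF U: all up time, all failures.
mttf_u = mean_time_to_failure(up_hours, all_failures)
# MTTF O: operating time only, operating failures only.
mttf_o = mean_time_to_failure(operating_hours, operating_failures)
# MTTF N: non-operating time only, non-operating failures only.
mttf_n = mean_time_to_failure(non_operating_hours, non_operating_failures)

print(mttf_u, mttf_o, mttf_n)
```

For this illustrative data set, the three versions differ markedly (about 8333 h, 10000 h, and 5000 h, respectively), even though they are computed from the same records, which is why an explicit index is useful when reporting the result.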
For downhole equipment, it should be noted that, in cases where there are extensive non-operating periods, and failures in both operating and non-operating periods, an assumption of a constant failure rate is unlikely to be valid, not only because such equipment is exposed to different operating conditions when in different states but also because what happens when the equipment is in one state affects its reliability when in the other state, meaning that the expectations should reflect the failure distribution of the item. However, it may be valid separately for operating and non-operating failures: this is an assumption needed to estimate constant operating and non-operating failure rates.
To perhaps further complicate the picture, when considering entire systems, we could have items in multiple states. For example, some items may be in an operating state (performing as required), while others may be idle (i.e., able to perform but not required at the time) and therefore in a non-operating state.
To be clear, we believe that in no circumstances should a time period following a failure, while the equipment is in a down state and waiting for repair or replacement, be included in the estimation of MTTF.
For situations where the analysis is to focus only on running time, this can be handled by defining yet another version of MTTF, focusing on this aspect:
• MTTF R, as the 'mean running time to running failure', using just the running time and considering only failures observed when the system was running.
This is a far more appropriate and unambiguous term, compared with the use of MTTF, assuming, for example, only running times to be relevant. Such a measure would also reduce the need for a variety of MTTF-similar measures such as those discussed above, and it would be comparable with the relevant RL AVG, when this is based on a significant population.
The traditional MTTF, with no index, would then refer to only one of the time periods considered for the calculations, i.e., the up time.
In the same way, the RT AVG could be specified using an index to clarify which time period is applied: RT* AVG, using the same set of notations as above. For specificity and consistency, the run-life metric could be presented in a similar way: RL* AVG.

MTTF estimation misconceptions
Again, as indicated by ISO/TR 12489 (2013), MTTF should not be associated with the lifetime of the equipment. MTTF is a statistical parameter. For fully repaired items (as-good-as-new), the MTTF equals the expectation of the up time or running or operating times. However, this has nothing to do with the life duration or a period with zero failures for a given item. Furthermore, it does not provide appropriate information regarding the overall life (run-life) of an item that is not fully repaired (i.e., which is allowed to wear out). Figure 1 shows the typical behavior of an aging component, where the failure rate, λ(t), of the item is constant during its useful life and is quickly increasing due to wear-out beyond this useful life period. During the useful life, the item MTTF is equal to 11.4 years (the failure rate is equal to 1.0 × 10⁻⁵ per hour), but the item lifetime is only about 5.7 years (50,000 h). If a set of such items is not repaired and is allowed to operate into the wear-out period, the average time to failure will have nothing to do with the initial constant failure rate (as the field feedback will include the wear-out periods).
In the oil and gas industry, the field feedback often comes from repaired items. Repairable items can be repaired before they are allowed to wear out and might be assumed to be 'as-good-as-new' after such repairs. Thus, if the failure rate is more or less constant during the useful life, this is thanks to the maintenance that compensates for wear-out. When the maintenance is no longer able to compensate for the wear-out of a given item, the item is normally replaced by a new one; otherwise, it fails quickly. In the example, the item would be replaced after about 38,000 h (4.3 years), i.e., after less than half of its MTTF in the useful life.
A similar example is obtained by considering a set of 40-year-old people, whose 'failure' rate is about 1.25 × 10⁻³ failures/year. This is equivalent to an MTTF of about 800 years, about 10 times the life duration, which is around 83 years in Norway. In fact, beyond 40 years, the failure rate of human beings increases quickly, and this explains these results. Based on this, it should be apparent that the MTTF should be seen as distinct from the lifetime of the item considered.
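The arithmetic behind the two examples above can be checked with a short sketch. It assumes a constant failure rate, so that the MTTF is the reciprocal of the rate, and 8760 hours per year; the function and variable names are illustrative only.

```python
HOURS_PER_YEAR = 8760.0


def mttf_from_constant_rate(failure_rate):
    """MTTF for a constant failure rate (exponential model): 1 / rate."""
    if failure_rate <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate


# Aging component: rate of 1.0e-5 per hour during the useful life.
component_mttf_years = mttf_from_constant_rate(1.0e-5) / HOURS_PER_YEAR
useful_life_years = 50_000.0 / HOURS_PER_YEAR

# 40-year-old people: rate of 1.25e-3 per year, life duration around 83 years.
human_mttf_years = mttf_from_constant_rate(1.25e-3)
mttf_to_life_ratio = human_mttf_years / 83.0

print(round(component_mttf_years, 1), round(useful_life_years, 1))
print(round(human_mttf_years), round(mttf_to_life_ratio, 1))
```

In both cases, the MTTF derived from the rate in the constant-rate period is much longer than the observed lifetime, because the rate does not stay constant once wear-out (or aging) sets in.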
For a situation with a constant failure rate (assuming an exponential probability distribution), the probability of observing a failure in a period equal to the MTTF is equal to the unreliability: $F(\mathrm{MTTF}) = 1 - e^{-\lambda \cdot \mathrm{MTTF}} = 1 - e^{-1} \approx 0.63$. This means that there is a probability of 63% (i.e., more than an even chance) that the item fails before the end of the period given by the MTTF. It is then far from realistic to consider the MTTF as a period free of failures.
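The same point can be illustrated numerically. The sketch below evaluates the unreliability of an exponentially distributed time to failure at a few fractions of the MTTF; the MTTF value of 100,000 h corresponds to the constant failure rate used in the earlier component example, and the chosen fractions are arbitrary.

```python
import math


def unreliability(t, mttf):
    """F(t) = 1 - exp(-t / MTTF) for an exponentially distributed time to failure."""
    return 1.0 - math.exp(-t / mttf)


mttf_hours = 100_000.0  # reciprocal of a failure rate of 1.0e-5 per hour
fractions = (0.1, 0.5, 1.0)

# Probability of failure before the given fraction of the MTTF has elapsed.
probabilities = {
    fraction: unreliability(fraction * mttf_hours, mttf_hours)
    for fraction in fractions
}

for fraction, probability in probabilities.items():
    print(f"t = {fraction:.1f} x MTTF: F(t) = {probability:.3f}")
```

Even at one tenth of the MTTF, the probability of failure is close to 10%, and at half of the MTTF it is close to 40%, so the MTTF gives no indication of a failure-free period.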
In summary, we find MTTF to be a poor estimator of the lifetime or RL, and also a poor estimator of a period with zero failures, and it should clearly not be used for these purposes.

Implications for data collection
The definitions of the different metrics determine which data should be collected: accurate information on up time, operating time, and running time is clearly required for the calculation of the run-life metrics discussed above. This means that it is not sufficient to just collect data on the dates on which an item is installed or removed from a well or a processing plant; at a very minimum, it is also necessary to collect data on the dates on which the item started to operate and on which it failed, i.e., when it was no longer performing its required functions and needed to be replaced.
There are many situations in the oil and gas industry in which there is a long period of time between the item failure and the item replacement. This may be the case, for instance, in offshore wells, where more time is needed to mobilize a workover rig and replace a downhole completion component. It may also be the case where, due to circumstances such as oil prices, production quotas, and production facility limits, the operator has little economic incentive to restore the well production. In all these cases, the mean time to restoration (MTTRes) may be relatively high, and metrics just based on the dates the items were installed or pulled from a well will not be good estimators of the item expected run-life or time to failure. Likewise, there will be significant differences between MTTF and the mean elapsed time between failures (METBF), and also between average failure rate and average workover/pull rate.
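The effect of relying only on installation and pull dates can be sketched as follows. The record fields, the helper function, and all dates are hypothetical, and the sketch makes the simplifying assumption that each item operates continuously from its start date until it fails.

```python
from datetime import date, timedelta
from statistics import mean


def make_record(installed, idle_days, days_to_failure, wait_days):
    """Build a hypothetical item history from an installation date and durations."""
    started = installed + timedelta(days=idle_days)
    failed = started + timedelta(days=days_to_failure)
    pulled = failed + timedelta(days=wait_days)
    return {"installed": installed, "started": started, "failed": failed, "pulled": pulled}


# Illustrative histories only: one item idle before start-up and left in the
# well for a long time after failure, one started at once and pulled quickly.
records = [
    make_record(date(2015, 1, 1), idle_days=60, days_to_failure=700, wait_days=180),
    make_record(date(2016, 6, 1), idle_days=0, days_to_failure=400, wait_days=30),
]

# Metric based only on installation and pull dates.
mean_days_in_well = mean((r["pulled"] - r["installed"]).days for r in records)

# Metric based on start and failure dates (time to failure while in operation).
mean_days_to_failure = mean((r["failed"] - r["started"]).days for r in records)

# Average time between failure and removal from the well.
mean_days_failed_in_well = mean((r["pulled"] - r["failed"]).days for r in records)

print(mean_days_in_well, mean_days_to_failure, mean_days_failed_in_well)
```

Here, the average based on installation and pull dates (685 days) overstates the average time to failure in operation (550 days), since it also counts the idle time before start-up and the time spent failed in the well.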
ISO 14224 (2016), as a main oil and gas reference for equipment reliability data collection, gives formal definitions of several 'mean time' metrics, and it also provides guidance related to the calculations of these metrics (in Clause C.5). Explicit guidance on the data required to support these calculations is not included in this international standard (nor in IEV-192) and could have been added to make clearer to the data collector the type of data needed for the run-life estimations. This should be considered for the next revision of ISO 14224.
In the international standard (ibid.), some guidance on timeline issues is given, but without specifying which timelines to collect specifically for running equipment. This is pointed out here as a matter of data relevancy. It depends then on the need of the users, but also the amount of resources that is or will be allocated for the data collection activity. Unless the activity is automated in some way, the recording of start and stop times for running equipment can be quite onerous. Besides, there is an issue of accuracy related to the recording of these times that might challenge the data quality and usefulness.
Several ongoing Joint Industry Projects covering collection of reliability data for running equipment and reliability estimation have allowed the industry to accumulate relevant experience and knowledge about the related implications. However, although it might be of interest to study practical cases related to collection of running time data and associated reliability estimation (pros and cons), with reference to these projects, such a study is outside the scope of this article.

Recommendations and concluding remarks
Based on the discussions in this paper, and despite the already quite extensive existing guidance, some further clarification could be provided in ISO 14224 (2016), to achieve consistent use of reliability parameters and terminology related to time periods and associated reliability parameters, within the petroleum, petrochemical, and natural gas industries. This includes clarifications with regard to how to classify different time periods and which time periods to include when estimating important reliability parameters, particularly the MTTF and the failure rate. Defining additional reliability parameters, or extensions such as the 'mean operating time to operating failure', in this paper denoted MTTF O, may be necessary to clarify exactly what time periods and types of failures are included in these estimates. This represents a simple way to separate variants from a basic 'mean time to failure' concept based on up times for the aggregated time in-service, where all failures are considered. The discussion around the use and interpretation of the term 'MTTF' identifies that, even though the term is largely familiar, it conceals challenging issues as a result of the somewhat inaccurate definition.
In summary, MTTF is a generic term, which has to be adapted to any specific situation. The need for the maintenance load estimation purpose or a quick reliability calculation (knowing the overall calendar time to failure) is not the same as for accurate reliability calculations (knowing the specific time to a specific failure, to estimate a specific failure rate to use in reliability models). It is important that analysts are aware of this and know how to differentiate between the various MTTFs discussed above (e.g., MTTF U, MTTF O, MTTF R), to use the right ones to estimate the right failure rates, according to the studies that they are actually performing.
Finally, from a reliability data collection and statistical estimation perspective, there are as many versions of MTTF as states where failures can occur. To avoid the current confusion, it is suggested that the different variants are expressed using the same acronym but combined with a time period index, identifying what is actually being considered. This will reduce ambiguity and enhance clarity regarding the states (or time periods) one is referring to.
In the same way, when referring to average RT or RL, it must be clear which type of population is considered. A similar index system is suggested for these metrics: RT* AVG and RL* AVG, respectively, where * refers to the time period considered. This allows for higher consistency and transparency when calculating and applying these metrics.
An objective of this paper has been to identify gaps in the current guidance provided by ISO 14224 (2016) and IEV-192 (2015). Based on what is identified, we recommend some adjustments in the next revisions, to achieve a more consistent interpretation and use of the reliability parameters addressed. Some inconsistencies have been pointed out, and they should be addressed. Regarding the lack of coverage of 'running time', a simple starting point would be to add a formal definition of the concept as "the active operational time for the equipment", along with an MTTF version focusing on the running time, i.e., MTTF R, in addition to the MTTF O and MTTF N, along with some guidance on when and how it is appropriate to apply such measures within reliability engineering. Furthermore, some clarification on the meaning and use of the RT and RL concepts, as well as a note on their relationship to MTTF, should be included in the standards. The suggested indexing system allows for a clear distinction between the time periods considered for calculations of the metrics and would add specificity to the definitions outlined in ISO 14224 (2016) and IEV-192 (2015).