
Dynamic fault tree analysis of dynamic positioning system using Monte Carlo approach

  • A. S. Cheliyan
  • S. K. Bhattacharyya
Research Article

Abstract

A Dynamic Positioning System (DPS) is a critical component of marine vessels and floating platforms used in offshore operations. It is a computer-controlled system that automatically maintains a ship’s position and heading using thrusters fitted to the ship. The reliability of a DPS can be assessed through conventional fault tree analysis. However, complex behavior such as sequence-dependent failure, redundancy management and priority among failure events cannot be captured by conventional fault tree analysis. The Dynamic Fault Tree (DFT) addresses these shortcomings by defining additional dynamic gates. In this paper, a Monte Carlo simulation approach is adopted for the analysis of a DFT that models a DPS. The analysis focuses on computing the Loss of Position (LOP) (or unavailability) of the vessel during operation, the standard measures of mean time to failure and mean time to repair, and the reliability of the DPS.

Keywords

Dynamic positioning system; Dynamic fault tree; Monte Carlo simulation; Loss of position; Reliability; Unavailability

Introduction

Oil and gas exploration, drilling and production in offshore areas are performed by floating structures or marine vessels. These floating systems use a Dynamic Positioning System (DPS) for station keeping so that exploration, drilling, and production operations can proceed. A DPS consists of hull-mounted controllable thrusters. It is always in the running mode during drilling and production operations so that a particular position of the system is maintained. The failure of the DPS leads to Loss of Position (LOP) of the floating system which, in addition to the suspension of operations, may put the environment, human lives, and assets at risk. The offshore operating environment is very often extreme and hostile, and accidents during drilling operations have proved fatal for health, safety, and environment (Hauff 2014; Pedersen 2015). The motivation for Quantitative Risk Assessment (QRA) studies of the DPS is to ensure safer operations of floating systems. As the complexity of the DPS increases, its reliability becomes important (Sørensen 2011).

An important objective of QRA studies is the prediction of the reliability of a system under operation. Fault Tree Analysis (FTA) is an established technique for predicting the reliability of a system. Dynamic Fault Tree Analysis (DFTA), on the other hand, extends the scope of the conventional fault tree (Lee et al. 1985; Vincoli 2014) by incorporating time-dependent behavior such as repairs and dynamic dependencies (Dugan et al. 1992; Boudali et al. 2007; Rao et al. 2009). In this paper, a Dynamic Fault Tree (DFT) approach has been applied to a ‘floating system-DPS’ combination that is designed to maintain the position of the floating system.

The DFT is generally solved by using the Markov model approach, but in this approach, the state space becomes too large as the number of gates increases (Rao et al. 2009; Zhang and Chan 2012; Chiacchio et al. 2011). The Monte Carlo approach has been applied to DFTs of high-reliability systems such as relay protection systems, the phasor measurement unit of a smart grid (Zhang and Chan 2012), and the electrical power supply system of a nuclear power plant and its reactor regulation system (Rao et al. 2009).

The DFT in this paper employs three dynamic gates to model the system, namely, PAND, FDEP and SPARE gates (see Fig. 1). The PAND (Priority AND) gate is one where the output is in a failed state if its inputs (A and B) are in failed states and the sequence of failure is from left (A) to right (B). The FDEP (Functional Dependency) gate is one where the event (A) connected to this gate fails whenever there is a triggering event (T) to the gate. The SPARE gate fails only if both input events (primary A and spare B) fail. In addition, if the primary event fails, the spare event takes over until the primary event is restored to working condition.
Fig. 1

PAND, SPARE and FDEP gates

The system ‘unavailability’ over time is an important measure of the robustness (and hence reliability) of a system and in a DPS supported floating system, the system ‘unavailability’ is essentially the Loss of Position (LOP) of the system. The system’s LOP (i.e. unavailability) has been computed in the present work, and furthermore, the system measures such as Mean Time To Failure (MTTF), Mean Time To Repair (MTTR) and reliability are computed. The methodology adopted in DFTA is Monte Carlo simulation which is based on random generation of failure and repair times for the components involved in the DPS integrated floating system.

System definition, data, and its DFT

The DPS fitted to a floating offshore system usually consists of three important units, (a) Thrusters and Control Unit (TCU), (b) Sensing and Control Unit (SCU) and (c) Mooring Line Unit (MLU) as explained in ABS (2013), Spouge (2004) and Desai (2015). The system interconnectivity, i.e. its block diagram, is shown in Fig. 2. The TCU consists of all the components and systems necessary to supply the DPS with directed thrust force. It consists of thrusters with drive units, thruster control electronics, and manual thruster controls with all associated cables and cable routing. The control system consists of a computer system, position reference system and display system. Both the thruster and control units are complementary to each other for maintaining the position of the floating system. The SCU is responsible for changing the control from TCU to MLU. The mooring lines are deployed if the SCU identifies that there is a failure of TCU. It also controls the MLU. Therefore a Loss of Position (LOP) occurs in the following cases:
  (i) If the SCU fails followed by failure of TCU.
  (ii) If both TCU and MLU fail.
  (iii) If the TCU fails followed by failure of SCU.

Fig. 2

Block diagram of the DPS

The DFT of the floating system with DPS (Fig. 2) is shown in Fig. 3. The top event (or TE) of this DFT is the LOP event. In DFT parlance, LOP signifies ‘unavailability’ of the system. There are four gates in this DFT model, namely, OR, PAND (Priority AND), FDEP (Functional DEPendency) and SPARE gates. The OR gate is a conventional fault tree (FT) gate whose output event occurs if any one of the input events occurs. The other three gates (PAND, FDEP, and SPARE) are specific to DFT. The PAND gate was first introduced by Fussel et al. (1976), the FDEP gate by Dugan et al. (1992) and the SPARE gate by Coppit (2003). The conventions used to represent these three gates are shown in Fig. 1.
Fig. 3

DFT of a floating system with DPS

The PAND gate in Fig. 3 has SCU and TCU as input events with SCU as the priority event (located left of the non-priority event TCU by convention). It implies that the gate output (see Fig. 1) occurs if both SCU (A) and TCU (B) fail and if SCU fails before or at the same time as TCU. This covers case (i) listed above.

The SPARE gate in Fig. 3 has TCU and MLU as input events. Of these two events, TCU (A) is the primary event and MLU (B) is a spare (or backup) event. The output of the gate occurs when both the primary event (TCU) and the spare (or backup) event (MLU) fail. Therefore, the LOP scenarios handled by this gate are those of case (ii) and case (iii) listed above.

The FDEP gate in Fig. 3 has SCU as input, and the output connects to MLU. This gate has a trigger input event, which is SCU (T) in the present DFT, and a dependent event, which is MLU (A). It implies that when the trigger event occurs, the dependent basic event is forced to occur. In other words, if SCU (trigger event) fails, MLU also fails. This gate therefore handles the situation where the failure of SCU forces the failure of MLU, which, together with the SPARE gate, is how case (iii) listed above leads to LOP.
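The gate logic described above can be illustrated with a minimal sketch that ignores repair and works only with first-failure times. The wiring (an OR gate over the PAND and SPARE outputs, with FDEP modifying MLU) is an assumption inferred from the text, since Fig. 3 is not reproduced here, and all function names are hypothetical.

```python
import math

INF = math.inf  # "never fails" within the period considered


def pand(t_a, t_b):
    """Priority-AND: output fails when B fails, provided A failed no later than B."""
    return t_b if t_a <= t_b else INF


def spare(t_primary, t_spare):
    """SPARE as described in the text: output fails once both inputs have failed."""
    return max(t_primary, t_spare)


def fdep(t_trigger, t_dependent):
    """FDEP: the dependent event is forced to fail when the trigger fails."""
    return min(t_trigger, t_dependent)


def lop_time(t_tcu, t_scu, t_mlu):
    """Time of Loss of Position for one set of first-failure times (no repair)."""
    t_mlu_eff = fdep(t_scu, t_mlu)            # SCU failure forces MLU failure
    return min(pand(t_scu, t_tcu),            # case (i): SCU, then TCU
               spare(t_tcu, t_mlu_eff))       # cases (ii) and (iii)


print(lop_time(t_tcu=5.0, t_scu=2.0, t_mlu=INF))   # case (i)
print(lop_time(t_tcu=3.0, t_scu=INF, t_mlu=7.0))   # case (ii)
print(lop_time(t_tcu=2.0, t_scu=6.0, t_mlu=INF))   # case (iii)
print(lop_time(t_tcu=4.0, t_scu=INF, t_mlu=INF))   # TCU alone: no LOP
```

In this simplified setting the top event occurs at the later of the TCU failure and the (possibly forced) MLU failure; the repairable case treated in the paper needs full failure and repair timelines instead of single times.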

Some of the major contributors to the LOP are the loss of reference due to a failure of position sensors, failure of thrusters and failure of the electrical distribution system powering the control unit. The standard failure metrics are ‘Mean Time to Failure’ (MTTF) and ‘Mean Time to Repair’ (MTTR). MTTF represents the length of time that the system is expected to operate until it fails and MTTR represents the length of time required to repair the system so that full functionality is restored. The values of MTTF and MTTR for the basic events (TCU, SCU and MLU failures) of the DFT are obtained from literature and summarised in Table 1. These values of MTTF and MTTR are assumed to be at a ‘constant failure rate’ (CFR). As a consequence, the MTTF and MTTR values of the LOP will also be at CFR. More realistic failure distribution data are not available for the system considered. The task of DFT analysis is to calculate MTTF and MTTR of the overall system (i.e. LOP of DPS) as well as estimate the LOP and system reliability. The values in Table 1 are taken from Phillips et al. (1997) for TCU and SCU, and from Drori (2015) for MLU.
Table 1

MTTF and MTTR data of basic events

Basic event    MTTF (yr)    MTTR (h)
TCU            1.0959       100
SCU            2.2831       100
MLU            40.05        100

Monte Carlo approach to DFT

A real-life process can be mathematically modeled by using statistics in a Monte Carlo analysis. In reliability analysis, the failure process of the basic events has a particular distribution (usually assumed exponential). The time to failure and the time to repair for every basic event are sampled randomly from the cumulative distribution functions of the failure and repair processes of that event. Once the time to failure and the time to repair for the basic events are obtained, they are combined according to the logic of the gate to which the events are connected to obtain the failure and repair timeline of the next event. This process is continued across all higher level gates to compute the failure and repair timeline of the Top Event (TE). This constitutes a single iteration of the DFT. Monte Carlo simulation is based upon a large number of realizations of the random process. This results in a generalized random walk through the system which is similar to a real process. Monte Carlo simulation allows one to find asymptotically convergent estimates of all numerical characteristics of the random process under consideration. Additionally, the Monte Carlo simulation approach has the major advantage that it can model a variety of probability distribution functions for a given basic event, for example Weibull or lognormal (Chiacchio et al. 2011).

DFT analysis using Monte Carlo approach

The failure rate (λ) and the repair rate (μ) of a basic event are given by
$$ \lambda =\frac{1}{\mathrm{MTTF}};\kern0.48em \mu =\frac{1}{\mathrm{MTTR}} $$
(1)
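As an illustration of Eq. (1), the sketch below converts the Table 1 entries into hourly failure and repair rates. The conversion factor of 8760 h per year (a 365-day year) and the names used are assumptions made for this example only.

```python
HOURS_PER_YEAR = 8760.0  # assumed 365-day year

# Table 1: basic event -> (MTTF in years, MTTR in hours)
TABLE_1 = {
    "TCU": (1.0959, 100.0),
    "SCU": (2.2831, 100.0),
    "MLU": (40.05, 100.0),
}


def rates_per_hour(mttf_years, mttr_hours):
    """Return (failure rate, repair rate) in 1/h, following Eq. (1)."""
    mttf_hours = mttf_years * HOURS_PER_YEAR
    return 1.0 / mttf_hours, 1.0 / mttr_hours


for name, (mttf_yr, mttr_h) in TABLE_1.items():
    lam, mu = rates_per_hour(mttf_yr, mttr_h)
    print(f"{name}: lambda = {lam:.3e} /h, mu = {mu:.3e} /h")
```

Keeping both rates in the same unit (per hour here) avoids mixing the year-based MTTF values with the hour-based MTTR values later in the simulation.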
The failure time distribution function \( f_F(t) \), where t is the time to failure (TTF) of the event, is typically assumed to be an exponential distribution:
$$ {f}_F(t)=\lambda {e}^{-\lambda t} $$
(2)
so that the corresponding cumulative distribution function (CDF) \( F_F(t) \) is given by
$$ {F}_F(t)=\int_0^t{f}_F\left(\tau \right)\; d\tau =1-{e}^{-\lambda t} $$
(3)
In Monte Carlo simulation, a value of \( F_F(t) \) between 0 and 1 is generated randomly in every iteration, and the corresponding TTF (t) is obtained by inverting Eq. (3):
$$ t=\frac{1}{\lambda}\ln \left\{\frac{1}{1-{F}_F(t)}\right\} $$
(4)
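Eq. (4) is the standard inverse-transform rule for an exponential variable. A minimal sketch follows, assuming Python's built-in `random` module as the uniform generator; the seed, sample count and the 9600 h mean (close to the TCU entry of Table 1) are illustrative choices, not values taken from the paper's implementation.

```python
import math
import random


def sample_exponential(rate, rng):
    """Invert the exponential CDF: t = (1/rate) * ln(1 / (1 - F)), as in Eq. (4)."""
    f = rng.random()                 # uniform in [0, 1), so 1 - f is never zero
    return math.log(1.0 / (1.0 - f)) / rate


rng = random.Random(12345)           # fixed seed, chosen arbitrarily for repeatability
mttf_hours = 9600.0                  # illustrative mean time to failure
samples = [sample_exponential(1.0 / mttf_hours, rng) for _ in range(200_000)]
mean_ttf = sum(samples) / len(samples)
print(f"sample mean TTF = {mean_ttf:.0f} h (target {mttf_hours:.0f} h)")
```

With enough samples the sample mean should approach the MTTF, which is a quick sanity check on the sampler before it is used inside the DFT simulation.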
The repair time distribution \( f_R(t) \), where t is the time to repair (TTR), is also assumed to be exponential, like that of the TTF (see Eq. (2)), and the corresponding CDF is denoted \( F_R(t) \). The expressions are:
$$ {f}_R(t)=\mu {e}^{-\mu t} $$
(5)
$$ {F}_R(t)=1-{e}^{-\mu t} $$
(6)
$$ t=\frac{1}{\mu}\ln \left\{\frac{1}{1-{F}_R(t)}\right\} $$
(7)

For all basic events, i.e. TCU, SCU and MLU, one has MTTF > MTTR. The largest MTTF among these three events is that of MLU, which is about 40 yr. Therefore, every event is likely to fail at least once if the simulation (or mission) time \( T_S \ge 40 \) yr \( \approx 3.5 \times 10^{5} \) h. In Monte Carlo simulation, where the system MTTF is obtained by averaging many failures, the simulation time must be long enough to admit many event failures. For example, if about 1000 failures are deemed sufficient, the simulation time will be 1000 times \( 3.5 \times 10^{5} \) h, or \( T_S \approx 3.5 \times 10^{8} \) h. The convergence of the system parameters (e.g. MTTF, MTTR) must be established to select a reasonable value of \( T_S \).
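As a rough check on these orders of magnitude, and assuming the exponential model of Eq. (3) together with a 365-day year (8760 h), the probability that a basic event has failed at least once by a time equal to its own MTTF, and the expected number of MLU failure-repair cycles in the longer run, can be estimated as
$$ P\left(\mathrm{TTF}\le {T}_S\right)=1-{e}^{-{T}_S/\mathrm{MTTF}};\kern0.75em {T}_S=\mathrm{MTTF}\Rightarrow 1-{e}^{-1}\approx 0.63 $$
$$ \frac{3.5\times {10}^8\ \mathrm{h}}{\mathrm{MTTF}+\mathrm{MTTR}}\approx \frac{3.5\times {10}^8}{40.05\times 8760+100}\approx 1.0\times {10}^3 $$
Under these assumptions, a single 40 yr window gives MLU only about a 63% chance of failing once, whereas TCU and SCU are expected to fail many times in the same window; it is the longer run that yields about 1000 MLU cycles.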

For a basic event, the timing diagram is shown in Fig. 4. In this diagram, starting with t = 0, the successive \( \mathrm{TTF}_j \) (j = 1 to n) and \( \mathrm{TTR}_j \) (j = 1 to n-1) are marked such that
$$ {T}_S\approx \sum \limits_{j=1}^n{\mathrm{TTF}}_j+\sum \limits_{j=1}^{n-1}{\mathrm{TTR}}_j\kern0.24em $$
(8)
where n is the number of failures in \( T_S \). The downtime (\( t_{d,j} \)) and uptime (\( t_{u,j} \)) instants are marked in this diagram such that
$$ {\mathrm{TTF}}_1={t}_{d,1};\kern0.5em {\mathrm{TTF}}_j={t}_{d,j}-{t}_{u,j-1};\kern0.5em {\mathrm{TTR}}_{j-1}={t}_{u,j-1}-{t}_{d,j-1}\kern0.5em \left(j=2,3,...,n\right) $$
(9)
$$ {t}_{d,1}={\mathrm{TTF}}_1;\kern0.75em {t}_{d,j}=\sum \limits_{m=1}^j{\mathrm{TTF}}_m+\sum \limits_{m=1}^{j-1}{\mathrm{TTR}}_m;\kern0.75em {t}_{u,j-1}=\sum \limits_{m=1}^{j-1}{\mathrm{TTF}}_m+\sum \limits_{m=1}^{j-1}{\mathrm{TTR}}_m\ \left(j=2,3,...,n\right) $$
(10)
Fig. 4

The timing diagram for a basic event

Based on Eqs. (9) and (10), one has
$$ {T}_S={t}_{d,n} $$
(11)
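A minimal sketch of how the timeline of Eqs. (8) to (11) might be generated for one basic event is given below. The function names, the seed and the stopping rule (stop at the last failure that falls within the simulation time, so that the record holds n failures and n-1 repairs) are assumptions made for illustration.

```python
import math
import random


def sample_exponential(mean, rng):
    """Inverse-transform sample of an exponential time with the given mean."""
    return -mean * math.log(1.0 - rng.random())


def basic_event_timeline(mttf, mttr, t_s, rng):
    """Return (t_d, t_u): failure instants and repair-completion instants.

    Failures and repairs alternate from t = 0, following Eq. (10).
    """
    t_d, t_u = [], []
    t = sample_exponential(mttf, rng)          # TTF_1
    while t <= t_s:
        t_d.append(t)                          # failure instant t_{d,j}
        t += sample_exponential(mttr, rng)     # add TTR_j
        t_u.append(t)                          # repair-completion instant t_{u,j}
        t += sample_exponential(mttf, rng)     # add TTF_{j+1}
    # Keep n failures and n-1 repairs so the record ends at t_{d,n}, as in Eq. (11).
    return t_d, t_u[:-1]


rng = random.Random(7)                         # arbitrary seed for repeatability
t_d, t_u = basic_event_timeline(mttf=9600.0, mttr=100.0, t_s=1.0e6, rng=rng)
print(len(t_d), "failures and", len(t_u), "completed repairs")
```

The two returned lists correspond to the downtime and uptime vectors used in the algorithm that follows.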
The timing diagrams of the gate outputs of PAND, FDEP and SPARE gates (see Fig. 3) based on the input timing diagrams of the gate are shown in Fig. 5.
Fig. 5

Input and output timing diagrams for PAND, SPARE and FDEP gates (a) Gate output for PAND (b) Gate output for SPARE (c) FDEP gate (input A modified to A′)
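One possible way to express these gate outputs in code is to treat each event's timeline as a list of down intervals (failure instant, repair instant). The sketch below is an interpretation, not a reproduction of Fig. 5: it assumes that FDEP makes the dependent event down whenever it or its trigger is down, that SPARE is down while both inputs are down, and that PAND is down from the moment B fails while A is already down until either is repaired. All names are hypothetical.

```python
def union(xs, ys):
    """Intervals during which X or Y is down (overlapping intervals merged)."""
    merged = []
    for start, end in sorted(xs + ys):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(xs, ys):
    """Intervals during which both X and Y are down (assumed SPARE output)."""
    out = []
    for a_s, a_e in xs:
        for b_s, b_e in ys:
            lo, hi = max(a_s, b_s), min(a_e, b_e)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)


def pand(a, b):
    """Assumed PAND output: B fails while A is already down (A failed first)."""
    out = []
    for a_s, a_e in a:
        for b_s, b_e in b:
            if a_s <= b_s < a_e:
                out.append((b_s, min(a_e, b_e)))
    return sorted(out)


def dps_lop_intervals(tcu, scu, mlu):
    """Down intervals of the top event for the DFT wiring described in the text."""
    mlu_eff = union(mlu, scu)                  # FDEP: SCU down forces MLU down
    return union(pand(scu, tcu), intersection(tcu, mlu_eff))


print(dps_lop_intervals(tcu=[(10, 20)], scu=[(5, 12)], mlu=[]))    # case (i)
print(dps_lop_intervals(tcu=[(10, 20)], scu=[], mlu=[(0, 14)]))    # case (ii)
print(dps_lop_intervals(tcu=[(10, 20)], scu=[(15, 18)], mlu=[]))   # case (iii)
```

The nested loops are quadratic in the number of intervals, which is acceptable for a sketch; a sweep over sorted intervals would be preferable for the very long simulation times used in the paper.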

The objective of DFTA is to obtain MTTF and MTTR of the top event (TE) which is LOP. The algorithm is presented below.

  1. Let the iteration number be k. Begin with k = 1.
  2. Consider the basic event \( X_i \) (i = 1 for TCU, 2 for SCU and 3 for MLU). Start with i = 1.
    (a) Generate two random numbers between 0 and 1; these represent \( F_F(t) \) and \( F_R(t) \), since \( 0 \le F_F(t), F_R(t) \le 1 \). Find \( \mathrm{TTF}_1 \) (= t) from Eq. (4) and \( \mathrm{TTR}_1 \) (= t) from Eq. (7).
    (b) Repeat step (a) to get \( \mathrm{TTF}_2 \) and \( \mathrm{TTR}_2 \), \( \mathrm{TTF}_3 \) and \( \mathrm{TTR}_3 \), and so on up to \( \mathrm{TTF}_n \) and \( \mathrm{TTR}_{n-1} \), until Eq. (8) is satisfied.
    (c) Obtain the downtimes and uptimes from Eq. (10). Thus we have the vectors

\( {t}_{d,j}^i={\left\langle {t}_{d,1},{t}_{d,2},...,{t}_{d,n}\right\rangle}^T\ \mathrm{and}\ {t}_{u,j}^i={\left\langle {t}_{u,1},{t}_{u,2},...,{t}_{u,n-1}\right\rangle}^T \)

for basic event \( X_i \).
  3. Repeat step 2 for the remaining (i = 2 and 3) basic events.
  4. From the uptime and downtime vectors of the basic events, construct the output uptime and downtime vectors based on gate type as shown in Fig. 5 and, following the DFT connectivity, obtain the uptime and downtime vectors for the top event (TE). If these vectors are denoted
$$ {t}_{d,j}^{TE}={\left\langle {t}_{d,1},{t}_{d,2},...,{t}_{d,n}\right\rangle}^T\ \mathrm{and}\ {t}_{u,j}^{TE}={\left\langle {t}_{u,1},{t}_{u,2},...,{t}_{u,n-1}\right\rangle}^T $$
compute the vectors of \( \mathrm{TTF}_j \) and \( \mathrm{TTR}_j \) using Eq. (9) for the top event. Then, using the basic definition, compute MTTF and MTTR of the top event at the k-th iteration as
$$ {\mathrm{MTTF}}^{\left(\mathrm{TE},k\right)}=\frac{1}{n}\sum \limits_{j=1}^n{\mathrm{TTF}}_j;\kern0.5em {\mathrm{MTTR}}^{\left(\mathrm{TE},k\right)}=\frac{1}{n-1}\sum \limits_{j=1}^{n-1}{\mathrm{TTR}}_j $$
(12)
The system unavailability (i.e. LOP), denoted U, is by definition computed as
$$ {\mathrm{LOP}}^{(k)}={U}^{(k)}=\frac{1}{T_S}\sum \limits_{j=1}^{n-1}\left({t}_{u,j}-{t}_{d,j}\right) $$
(13)
  5. Check convergence of MTTF and MTTR of TE in Eq. (12) at iteration k by comparing them with those at iteration k-1 to any reasonable degree of accuracy. In the present calculations, the accuracy of MTTF is approximately 100 h, and that of MTTR is about 0.0001 h.

If not converged, set k = k + 1 and repeat steps 2 to 5. If converged, set N = k and obtain the system parameters as
$$ {\displaystyle \begin{array}{l}{\mathrm{MTTF}}^{\left(\mathrm{TE}\right)}=\frac{1}{N}\sum \limits_{k=1}^N{\mathrm{MTTF}}^{\left(\mathrm{TE},k\right)}\;\\ {}{\mathrm{MTTR}}^{\left(\mathrm{TE}\right)}=\frac{1}{N}\sum \limits_{k=1}^N{\mathrm{MTTR}}^{\left(\mathrm{TE},k\right)}\\ {}{U}^{\left(\mathrm{TE}\right)}=\frac{1}{N}\sum \limits_{k=1}^N{U}^{(k)}\end{array}} $$
(14)
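The per-iteration quantities of Eqs. (9), (12) and (13) can be sketched as follows for a top-event timeline with n failures and n-1 completed repairs. The function name and the small hand-made timeline are illustrative assumptions; the averaging over iterations in Eq. (14) is then a plain arithmetic mean of these per-iteration values.

```python
def top_event_metrics(t_d, t_u, t_s):
    """Return (MTTF, MTTR, U) for one iteration.

    t_d holds the n failure instants and t_u the n-1 repair-completion
    instants of the top event; n must be at least 2.
    """
    n = len(t_d)
    # Eq. (9): time to each failure, measured from the previous repair.
    ttf = [t_d[0]] + [t_d[j] - t_u[j - 1] for j in range(1, n)]
    # Eq. (9): duration of each completed repair.
    ttr = [t_u[j] - t_d[j] for j in range(n - 1)]
    mttf = sum(ttf) / n                 # Eq. (12)
    mttr = sum(ttr) / (n - 1)           # Eq. (12)
    unavailability = sum(ttr) / t_s     # Eq. (13): fraction of time spent down
    return mttf, mttr, unavailability


# Hand-made timeline: failures at 100, 260, 450 h; repairs finish at 110, 280 h.
mttf, mttr, u = top_event_metrics([100.0, 260.0, 450.0], [110.0, 280.0], t_s=450.0)
print(mttf, mttr, u)
```

For this toy timeline the times to failure are 100, 150 and 170 h and the repair times are 10 and 20 h, so the averages can be checked by hand.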
An exact definition of unavailability (U) is
$$ {U}^{\left(\mathrm{TE}\right)}=\frac{{\mathrm{MTTR}}^{\left(\mathrm{TE}\right)}}{{\mathrm{MTTF}}^{\left(\mathrm{TE}\right)}+{\mathrm{MTTR}}^{\left(\mathrm{TE}\right)}} $$
(15)
The system failure rate \( \lambda_S \) and the system reliability R at the end of time t are given by
$$ {\lambda}_S=1/{\mathrm{MTTF}}^{\left(\mathrm{TE}\right)};R(t)={e}^{-{\lambda}_St} $$
(16)
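Eqs. (15) and (16) can be checked numerically against the converged values reported later in Table 2 (MTTF(TE) = 200.2 yr, MTTR(TE) = 50 h). The short sketch below does this, assuming a 365-day year for the unit conversion; the variable names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760.0          # assumed 365-day year

mttf_te_years = 200.2            # converged MTTF(TE) reported in Table 2
mttr_te_hours = 50.0             # converged MTTR(TE) reported in Table 2

mttf_te_hours = mttf_te_years * HOURS_PER_YEAR
unavailability = mttr_te_hours / (mttf_te_hours + mttr_te_hours)   # Eq. (15)
lop_hours_per_year = unavailability * HOURS_PER_YEAR


def reliability(t_years):
    """Eq. (16) with the system failure rate 1 / MTTF(TE)."""
    return math.exp(-t_years / mttf_te_years)


print(f"LOP = {lop_hours_per_year:.2f} h/yr")
for t in (1, 10, 20, 100):
    print(f"R({t} yr) = {100 * reliability(t):.2f} %")
```

Under these assumptions the computed LOP and reliabilities agree with the reported 0.25 h/yr, 99.5%, 95.13%, 90.5% and 60.68% to the quoted precision.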

Results and discussion

The convergence of the system MTTF as a function of simulation time \( T_S \) is shown in Fig. 6, that of MTTR in Fig. 7 and that of LOP in Fig. 8. It is clear that for convergence, one needs to adopt a \( T_S \) value of at least \( 10^{8} \) h. All calculations are performed with \( T_S = 10^{9} \) h. Fig. 9 shows the convergence of the system MTTF with iterations, showing that about 8000 iterations are sufficient (N ≈ 8000) for converged results. The number of failures (n, see Eq. (8)) for various values of \( T_S \) is shown in Fig. 10, showing, as expected, a linear relation for ‘sufficiently large’ values of \( T_S \).
Fig. 6

Convergence of system MTTF with simulation time TS

Fig. 7

Convergence of system MTTR with simulation time TS

Fig. 8

Convergence of LOP (unavailability) with simulation time TS

Fig. 9

Convergence of MTTF with iterations (\( T_S = 10^{9} \) h)

Fig. 10

Number of failures (n) for various simulation times (TS)

The converged system parameters obtained from DFTA are summarized in Table 2. It may be noted that the system MTTF (200.2 yr) is much higher than the highest component MTTF (that of MLU, 40 yr), showing strong redundancy in the system and making it a robust one. The system MTTR is 50 h and the LOP is 0.25 h/yr. The reliability of the system is computed as 99.5% at the end of 1 yr, 95.13% at the end of 10 yr, 90.5% at the end of 20 yr (typical design life) and 60.68% at the end of 100 yr. The variation of reliability with the operation time of the system is shown in Fig. 11.
Table 2

Results of Monte Carlo Simulation of DFT (Simulation time \( T_S = 10^{9} \) h, No. of iterations N = 10000)

Parameter                  Value
LOP                        0.25 h/yr
MTTF(TE)                   200.2 yr
MTTR(TE)                   50 h
Reliability after 1 yr     99.5%
Reliability after 10 yr    95.13%

Fig. 11

System Reliability

Table 1 shows that \( \mathrm{MTTF}_3 \) (i.e. MTTF of MLU) is overwhelmingly larger (≈ 40 yr) than \( \mathrm{MTTF}_1 \) (≈ 1.1 yr) and \( \mathrm{MTTF}_2 \) (≈ 2.3 yr). Arguably, it can be subject to a large uncertainty as well. Thus, to establish the dependence of the system MTTF (i.e. MTTF(TE)) on \( \mathrm{MTTF}_3 \), DFT analyses have been carried out for three more values of \( \mathrm{MTTF}_3 \), namely 10 yr, 20 yr and 30 yr. The results are summarised in Table 3, and the variation of the system MTTF as a function of the MTTF of MLU is shown in Fig. 12.
Table 3

System parameters as functions of MTTF of MLU (Simulation time \( T_S = 10^{9} \) h, No. of iterations N = 10000)

MTTF (MLU) (yr)    MTTF(TE) (yr)    MTTR(TE) (h)    Reliability after 10 yr (%)    LOP (h/yr)
40                 200.2055         50.0004         95.13                          0.2502
30                 193.6095         49.9875         94.97                          0.2586
20                 181.5404         50.0093         94.64                          0.2759
10                 153.1152         50.0039         93.68                          0.327
Fig. 12

System MTTF (MTTF(TE)) as a function of MTTF of MLU (MTTF3)

Conclusion

The dynamic positioning system is one of the most crucial subsystems in a floating offshore platform because the LOP of the platform can lead to either a catastrophic event or serious economic loss due to the suspension of operation. This paper proposes DFTA as a method to compute the system unavailability (LOP) and key system parameters such as MTTF, MTTR, and reliability. This approach models a ‘real’ system better because it can account for the timing of failure, the redundancies available in the system and the functional dependencies of the events of the fault tree, which cannot be considered in conventional FTA. The Monte Carlo approach to DFTA is indeed very powerful and flexible because it can easily accommodate any failure distribution. It also overcomes many problems of analytical approaches such as Markov models, Bayesian belief networks etc. where the complexity of the problem is directly related to the number of events in the fault tree.

The extension and future scope of this work lie in building DFT models for each of the units involved in the DPS and interlinking them. Furthermore, this approach can take into account the uncertainties present in the failure rate data, which will aid practical engineering decisions. Such DFT studies can also help in system design by identifying the weakest link of the modeled system and introducing appropriate redundancies in the system to prevent failure.

References

  1. ABS (2013) Guide for dynamic positioning systems
  2. Boudali H, Crouzen P, Stoelinga M (2007) Dynamic fault tree analysis using input/output interactive Markov chains, DSN '07 proceedings of the 37th annual IEEE/IFIP international conference on dependable systems and networks, 708–717
  3. Chiacchio F, Compagno L, D'Urso D, Manno G, Trapani N (2011) Dynamic fault trees resolution: a conscious trade-off between analytical and simulative approaches. Reliab Eng Syst Saf 96(11):1515–1526
  4. Coppit D (2003) Engineering modeling and analysis: sound methods and effective tools. PhD thesis, The University of Virginia
  5. Desai N (2015) Dynamic positioning: method for disaster prevention and risk management. Procedia Earth Planet Sci 11:216–223
  6. Drori G (2015) Underlying causes of mooring lines failures across the industry, BP Systems. http://mcedd.com/wp-content/uploads/2014/04/00_Guy-Drori-BP.pdf
  7. Dugan JB, Bavuso SJ, Boyd MA (1992) Dynamic fault tree models for fault-tolerant computer systems. IEEE Trans Reliab 41(3):363–377
  8. Fussel J, Aber E, Rahl R (1976) On the quantitative analysis of priority-and failure logic. IEEE Trans Reliab R25(5):324–326
  9. Hauff KS (2014) Analysis of loss of position incidents for dynamically operated vessels, Master Thesis, Norwegian University of Science and Technology
  10. Lee WS, Grosh DL, Tillman FA, Lie CH (1985) Fault tree analysis, methods, and applications: a review. IEEE Trans Reliab 34(3):194–203
  11. Pedersen RN (2015) QRA techniques on dynamic positioning systems during drilling operations in the Arctic: with emphasis on the dynamic positioning operator, Master’s thesis, The Arctic University of Norway
  12. Phillips D, Stanbery R, Weisinger D (1997) Marine Technology Society. https://dynamic-positioning.com/proceedings/dp1997/249_reliability_shatto_phillips.pdf
  13. Rao KD, Gopika V, Rao VVSS, Kushwaha HS, Verma AK, Srividya A (2009) Dynamic fault tree analysis using Monte Carlo simulation in probabilistic safety assessment. Reliab Eng Syst Saf 94(4):872–883
  14. Sørensen AJ (2011) A survey of dynamic positioning control systems. Annu Rev Control 35(1):123–136
  15. Spouge J (2004) Review of methods for demonstrating redundancy in dynamic positioning systems for the offshore industry, DNV consulting for HSE, ISBN 0 7176 2814 0
  16. Vincoli JW (2014) Fault tree analysis. Basic guide to system safety, Third Edition
  17. Zhang P, Chan KW (2012) Reliability evaluation of phasor measurement unit using Monte Carlo dynamic fault tree method. IEEE Transactions on Smart Grid 3(3):1235–1243

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Ocean Engineering, Indian Institute of Technology Madras, India
