Abstract
Dynamic fault trees (DFTs) are widely adopted in industry to assess the dependability of safety-critical equipment. Since many systems are too large to be studied numerically, DFT dependability is often analysed using Monte Carlo simulation. A bottleneck here is that many simulation samples are required in the case of rare events, e.g. in highly reliable systems where components seldom fail. Rare event simulation (RES) provides techniques to reduce the number of samples in the case of rare events. We present a RES technique based on importance splitting to study failures in highly reliable DFTs. Whereas RES usually requires meta-information from an expert, our method is fully automatic: by exploiting the fault tree structure, we extract the so-called importance function. We handle DFTs with Markovian and non-Markovian failure and repair distributions (for which no numerical methods exist) and show the efficiency of our approach on several case studies.
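To make the sampling bottleneck concrete, the following minimal sketch estimates how many crude Monte Carlo samples are needed to pin down a failure probability p to a given relative precision. It is not taken from the paper: the function name `crude_mc_samples`, the 95% confidence level (z = 1.96) and the 10% relative error are illustrative assumptions, and the formula is the standard normal-approximation confidence interval for a Bernoulli proportion.

```python
import math


def crude_mc_samples(p: float, rel_err: float, z: float = 1.96) -> int:
    """Samples needed so that the normal-approximation confidence
    interval for p has half-width rel_err * p.

    Hypothetical helper for illustration only; z = 1.96 corresponds
    to a 95% confidence level.
    """
    if not (0.0 < p < 1.0):
        raise ValueError("p must lie strictly between 0 and 1")
    if rel_err <= 0.0:
        raise ValueError("rel_err must be positive")
    # Each sample is a Bernoulli indicator with variance p * (1 - p),
    # so the half-width after n samples is z * sqrt(p * (1 - p) / n).
    # Setting this equal to rel_err * p and solving for n gives:
    return math.ceil(z * z * (1.0 - p) / (p * rel_err * rel_err))


if __name__ == "__main__":
    # The required sample count grows roughly like 1 / p.
    for p in (1e-3, 1e-6, 1e-9):
        n = crude_mc_samples(p, rel_err=0.1)
        print(f"p = {p:.0e}: about {n:.2e} samples")
```

Under these assumptions the sample count scales inversely with p, which is why rare failures quickly become infeasible to estimate by plain simulation and motivates techniques such as importance splitting.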
This work was partially funded by NWO, NS, and ProRail project 15474 (SEQUOIA), ERC grant 695614 (POWVER), EU project 102112 (SUCCESS), ANPCyT PICT-2017-3894 (RAFTSys), and SeCyT project 33620180100354CB (ARES).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Budde, C.E., Biagi, M., Monti, R.E., D’Argenio, P.R., Stoelinga, M. (2020). Rare Event Simulation for Non-Markovian Repairable Fault Trees. In: Biere, A., Parker, D. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2020. Lecture Notes in Computer Science, vol 12078. Springer, Cham. https://doi.org/10.1007/978-3-030-45190-5_26
DOI: https://doi.org/10.1007/978-3-030-45190-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45189-9
Online ISBN: 978-3-030-45190-5
eBook Packages: Computer Science, Computer Science (R0)