
How to certify machine learning based safety-critical systems? A systematic literature review

Published in: Automated Software Engineering

Abstract

Context

Machine Learning (ML) has been at the heart of many innovations in recent years. However, including it in so-called “safety-critical” systems such as automotive or aeronautical ones has proven very challenging, since the paradigm shift that ML brings completely changes traditional certification approaches.

Objective

This paper aims to elucidate challenges related to the certification of ML-based safety-critical systems, as well as the solutions that are proposed in the literature to tackle them, answering the question “How to Certify Machine Learning Based Safety-critical Systems?”.

Method

We conducted a Systematic Literature Review (SLR) of research papers published between 2015 and 2020 that cover topics related to the certification of ML systems. In total, we identified 217 papers addressing what are considered the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provide summaries of the extracted papers.

Results

The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and types of ML models. They also emphasized the need to strengthen connections between academia and industry to deepen the study of the domain. Finally, they illustrated the necessity of building bridges between the above-mentioned main pillars, which are for now mostly studied in isolation.

Conclusion

We highlight current efforts deployed to enable the certification of ML-based software systems and discuss some future research directions.


Notes

  1. https://www.forbes.com/sites/louiscolumbus/2020/01/19/roundup-of-machine-learning-forecasts-and-market-estimates-2020/

  2. https://www.cbc.ca/news/business/uber-self-driving-car-2018-fatal-crash-software-flaws-1.5349581

  3. https://webstore.iec.ch/publication/6007

  4. https://www.iso.org/standard/68383.html

  5. https://my.rtca.org/NC__Product?id=a1B36000001IcmqEAC

  6. https://www.easa.europa.eu/newsroom-and-events/news/easa-releases-consultation-its-first-usable-guidance-level-1-machine

  7. https://www.iso.org/committee/6794475/x/catalogue/

  8. https://www.iso.org/standard/68305.html?browse=tc

  9. https://www.iso.org/standard/77608.html?browse=tc

  10. https://www.iso.org/standard/77609.html?browse=tc

  11. https://www.iso.org/standard/81283.html?browse=tc

  12. https://www.faa.gov/aircraft/air_cert/design_approvals/air_software/media/TC_Overarching

  13. https://www.deel.ai

  14. https://scholar.google.com

  15. https://www.engineeringvillage.com

  16. https://webofknowledge.com

  17. https://www.sciencedirect.com

  18. https://www.scopus.com

  19. https://dl.acm.org

  20. https://ieeexplore.ieee.org

  21. Harzing, A.W. (2007) Publish or Perish, available from https://harzing.com/resources/publish-or-perish

  22. https://endnote.com

  23. https://github.com/FlowSs/How-to-Certify-Machine-Learning-BasedSafety-critical-Systems-A-Systematic-Literature-Review

  24. It is worth noting that the method can outperform other defenses based on adversarial training, such as Rusak et al. (2020) (\(L_\infty\) constraint and adversarial noise with Stylized ImageNet training), when considering a wide range of attack constraints and common image corruptions.
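For illustration only, here is a minimal sketch of generating \(L_\infty\)-bounded adversarial examples, the kind of inner step used by adversarial-training defenses (in the spirit of Goodfellow et al. 2014 and Madry et al. 2017, both cited in the references below). It is not the specific defense of Rusak et al. (2020), and the model, loss, and budget values are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def linf_adversarial_examples(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Toy PGD-style perturbation search under an L-infinity budget (||delta||_inf <= eps).

    Generic sketch for adversarial training, not the defense of Rusak et al. (2020);
    `model` is assumed to be any differentiable classifier returning logits.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Ascend the loss, then project the perturbation back into the L_inf ball.
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

# Adversarial training would then minimize the loss on these perturbed inputs, e.g.:
#   adv_loss = F.cross_entropy(model(linf_adversarial_examples(model, x, y)), y)
```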

  25. Author’s remark: A new improvement of GLOD, FOOD, was released earlier this year. Following our methodology, we kept only the GLOD reference that our search extracted, but we invite readers to check the new instalment of the method: https://arxiv.org/abs/2008.06856

  26. They argue that the background part is why OOD inputs can be misinterpreted. Indeed, they observed both that many OOD inputs can have background components similar to those of in-distribution data and that the background term can dominate the semantic term in the likelihood computation. By adding noise, they essentially mask the semantic term so that they can train a model specifically on background components. This could explain why models such as PixelCNN can fail at OOD detection.
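As a worked sketch of this likelihood-ratio argument (following Ren et al. 2019, cited in the references below; the decomposition into background and semantic components is that paper’s modeling assumption, not a result of this review):

```latex
% Assumed decomposition of an input x into a background part x_B and a semantic part x_S:
\log p_\theta(x) = \log p_\theta(x_B) + \log p_\theta(x_S)
% A background model p_{\theta_0} is trained on noise-perturbed inputs, so the likelihood
% ratio cancels the (possibly dominant) background term and serves as the OOD score:
\mathrm{LLR}(x) = \log p_\theta(x) - \log p_{\theta_0}(x)
```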

  27. Author’s remark: The original ADP paper posted on arXiv in 2019 has been improved and re-uploaded in 2020. In this review, we kept the 2019 reference, which was recovered by our methodology, but we invite readers to check the 2020 paper: https://arxiv.org/abs/1912.01108

  28. NNV benefits from parallel computing, which makes it faster than Reluplex (Katz et al. 2017) and other existing DNN verification frameworks.

  29. https://www.deel.ai

  30. https://github.com/FlowSs/How-to-Certify-Machine-Learning-BasedSafety-critical-Systems-A-Systematic-Literature-Review

References

  • Arcaini, P., Bombarda, A., Bonfanti, S., Gargantini, A.: Dealing with robustness of convolutional neural networks for image classification. In: 2020 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 7–14 (2020) https://doi.org/10.1109/AITEST49225.2020.00009

  • Abreu, S.: Automated architecture design for deep neural networks (2019). ArXiv preprint arXiv:1908.10714

  • Agostinelli, F., Hocquet, G., Singh, S., Baldi, P.: From reinforcement learning to deep reinforcement learning: an overview. In: Braverman Readings in Machine Learning. Key Ideas From Inception to Current State, pp. 298–328. Springer, Berlin (2018)


  • Alagöz, I., Herpel, T., German, R.: A selection method for black box regression testing with a statistically defined quality level. In: 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST), pp. 114–125 (2017). https://doi.org/10.1109/ICST.2017.18

  • Amarasinghe, K., Manic, M.: Explaining what a neural network has learned: toward transparent classification. In: 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, pp. 1–6 (2019)

  • Ameyaw, D.A., Deng, Q., Söffker, D.: Probability of detection (pod)-based metric for evaluation of classifiers used in driving behavior prediction. In: Annual Conference of the PHM Society, vol 11 (2019)

  • Amini, A., Schwarting, W., Soleimany, A., Rus, D.: Deep evidential regression (2019). ArXiv preprint arXiv:1910.02600

  • Amit, G., Levy, M., Rosenberg, I., Shabtai, A., Elovici, Y.: Glod: Gaussian likelihood out of distribution detector (2020). ArXiv preprint arXiv:2008.06856

  • Anderson, BG., Ma, Z., Li, J., Sojoudi, S.: Tightened convex relaxations for neural network robustness certification. In: 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, pp. 2190–2197 (2020)

  • Aravantinos, V., Diehl, F.: Traceability of deep neural networks (2019). ArXiv preprint arXiv:1812.06744

  • Arnab, A., Miksik, O., Torr, PH.: On the robustness of semantic segmentation models to adversarial attacks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 888–897 (2018) https://doi.org/10.1109/CVPR.2018.00099

  • Arrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., et al.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020)


  • Aslansefat, K., Sorokos, I., Whiting, D., Kolagari, R.T., Papadopoulos, Y.: Safeml: Safety monitoring of machine learning classifiers through statistical difference measure (2020). ArXiv preprint arXiv:2005.13166

  • Ayers, EW., Eiras, F., Hawasly, M., Whiteside, I.: Parot: a practical framework for robust deep neural network training. In: NASA Formal Methods Symposium. Springer, Berlin. pp. 63–84 (2020)

  • Bacci, E., Parker, D.: Probabilistic guarantees for safe deep reinforcement learning. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer, Berlin. pp. 231–248 (2020)

  • Baheri, A., Nageshrao, S., Tseng, H.E., Kolmanovsky, I., Girard, A., Filev, D.: Deep reinforcement learning with enhanced safety for autonomous highway driving. In: 2020 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 1550–1555 (2019)

  • Bakhti, Y., Fezza, S.A., Hamidouche, W., Déforges, O.: DDSA: a defense against adversarial attacks using deep denoising sparse autoencoder. IEEE Access 7, 160397–160407 (2019)


  • Baluta, T., Shen, S., Shinde, S., Meel, KS., Saxena, P.: Quantitative verification of neural networks and its security applications. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1249–1264 (2019)

  • Bar, A., Huger, F., Schlicht, P., Fingscheidt, T.: On the robustness of redundant teacher-student frameworks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1380–1388 (2019)

  • Bar, A., Klingner, M., Varghese, S., Huger, F., Schlicht, P., Fingscheidt, T.: Robust semantic segmentation by redundant networks with a layer-specific loss contribution and majority vote. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 332–333 (2020)

  • Ben Braiek, H., Khomh, F.: Deepevolution: A search-based testing approach for deep neural networks. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 454–458 (2019) https://doi.org/10.1109/ICSME.2019.00078

  • Berkenkamp, F., Turchetta, M., Schoellig, AP., Krause, A.: Safe model-based reinforcement learning with stability guarantees. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), pp. 908–919 (2017)

  • Bernhard, J., Gieselmann, R., Esterle, K., Knol, A.: Experience-based heuristic search: Robust motion planning with deep q-learning. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 3175–3182 (2018)

  • Biondi, A., Nesti, F., Cicero, G., Casini, D., Buttazzo, G.: A safe, secure, and predictable software architecture for deep learning in safety-critical systems. IEEE Embed. Syst. Lett. 12(3), 78–82 (2020). https://doi.org/10.1109/LES.2019.2953253


  • Bragg, J., Habli, I.: What is acceptably safe for reinforcement learning? In: International Conference on Computer Safety, Reliability, and Security. Springer, Berlin. pp. 418–430 (2018)

  • Bunel, R., Lu, J., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: Branch and bound for piecewise linear neural network verification. J. Mach. Learn. Res. 21(42), 1–39 (2020)

  • Burton, S., Gauerhof, L., Sethy, B.B., Habli, I., Hawkins, R.: Confidence arguments for evidence of performance in machine learning for highly automated driving functions. In: Romanovsky, A., Troubitsyna, E., Gashi, I., Schoitsch, E., Bitsch, F. (eds.) Computer Safety, Reliability, and Security, pp. 365–377. Springer, Berlin (2019)


  • Cardelli, L., Kwiatkowska, M., Laurenti, L., Patane, A.: Robustness guarantees for bayesian inference with gaussian processes. Proc. AAAI Conf. Artif. Intell. 33, 7759–7768 (2019)


  • Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (2017). https://doi.org/10.1109/SP.2017.49

  • Castelvecchi, D.: Can we open the black box of AI? Nat News 538, 20–23 (2016)


  • Chakrabarty, A., Quirynen, R., Danielson, C., Gao, W.: Approximate dynamic programming for linear systems with state and input constraints. In: 2019 18th European Control Conference (ECC), IEEE, pp. 524–529 (2019)

  • Chen, TY., Cheung, SC., Yiu, SM.: Metamorphic testing: a new approach for generating next test cases (2020a). ArXiv preprint arXiv:2002.12543

  • Chen, Z., Narayanan, N., Fang, B., Li, G., Pattabiraman, K., DeBardeleben, N.: Tensorfi: A flexible fault injection framework for tensorflow applications. In: 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), pp. 426–435 (2020b). https://doi.org/10.1109/ISSRE5003.2020.00047

  • Cheng, C.H.: Safety-aware hardening of 3d object detection neural network systems (2020). ArXiv preprint arXiv:2003.11242

  • Cheng, C.H., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: International Symposium on Automated Technology for Verification and Analysis. Springer, Berlin. pp. 251–268, (2017)

  • Cheng, C.H., Huang, C.H., Nührenberg, G.: nn-dependability-kit: Engineering neural networks for safety-critical autonomous driving systems (2019a). ArXiv preprint arXiv:1811.06746

  • Cheng, C., Nührenberg, G., Yasuoka, H.: Runtime monitoring neuron activation patterns. In: 2019 Design, Automation Test in Europe Conference Exhibition (DATE), pp. 300–303 (2019b). https://doi.org/10.23919/DATE.2019.8714971

  • Cheng, R., Orosz, G., Murray, R.M., Burdick, J.W.: End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 3387–3395 (2019c)


  • Cofer, D., Amundson, I., Sattigeri, R., Passi, A., Boggs, C., Smith, E., Gilham, L., Byun, T., Rayadurgam, S.: Run-time assurance for learning-based aircraft taxiing. In: 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), pp. 1–9 (2020). https://doi.org/10.1109/DASC50938.2020.9256581

  • Colangelo, F., Neri, A., Battisti, F.: Countering adversarial examples by means of steganographic attacks. In: 2019 8th European Workshop on Visual Information Processing (EUVIP), pp. 193–198 (2019). https://doi.org/10.1109/EUVIP47703.2019.8946254

  • Cosentino, J., Zaiter, F., Pei, D., Zhu, J.: The search for sparse, robust neural networks (2019). ArXiv preprint arXiv:1912.02386

  • Croce, F., Hein, M.: Provable robustness against all adversarial \(l_p\)-perturbations for \(p \ge 1\) (2019). ArXiv preprint arXiv:1905.11213

  • Croce, F., Andriushchenko, M., Hein, M.: Provable robustness of relu networks via maximization of linear regions. In: the 22nd International Conference on Artificial Intelligence and Statistics, PMLR, pp. 2057–2066 (2019)

  • Daniels, Z.A., Metaxas, D.: Scenarionet: An interpretable data-driven model for scene understanding. In: IJCAI Workshop on Explainable Artificial Intelligence (XAI) 2018 (2018)

  • Dapello, J., Marques, T., Schrimpf, M., Geiger, F., Cox, D.D., DiCarlo, J.J.: Simulating a primary visual cortex at the front of CNNs improves robustness to image perturbations (2020). bioRxiv https://doi.org/10.1101/2020.06.16.154542

  • Dean, S., Matni, N., Recht, B., Ye, V.: Robust guarantees for perception-based control. In: Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR, vol 120, 350–360 (2020)

  • Delseny, H., Gabreau, C., Gauffriau, A., Beaudouin, B., Ponsolle, L., Alecu, L., Bonnin, H., Beltran, B., Duchel, D., Ginestet, J.B., Hervieu, A., Martinez, G., Pasquet, S., Delmas, K., Pagetti, C., Gabriel, J.M., Chapdelaine, C., Picard, S., Damour, M., Cappi, C., Gardès, L., Grancey, F.D., Jenn, E., Lefevre, B., Flandin, G., Gerchinovitz, S., Mamalet, F., Albore, A.: White paper machine learning in certified systems (2021). ArXiv preprint arXiv:2103.10529

  • Demir, S., Eniser, H.F., Sen, A.: Deepsmartfuzzer: Reward guided test generation for deep learning (2019). ArXiv preprint arXiv:1911.10621


  • Deshmukh, J.V., Kapinski, JP., Yamaguchi, T., Prokhorov, D.: Learning deep neural network controllers for dynamical systems with safety guarantees. In: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), IEEE, pp. 1–7 (2019)

  • Dey, S., Dasgupta, P., Gangopadhyay, B.: Safety augmentation in decision trees. In: AISafety@ IJCAI (2020)

  • Dreossi, T., Ghosh, S., Sangiovanni-Vincentelli, A., Seshia, S.A.: Systematic testing of convolutional neural networks for autonomous driving (2017). ArXiv preprint arXiv:1708.03309

  • Duddu, V., Rao, DV., Balas, VE.: Adversarial fault tolerant training for deep neural networks (2019). ArXiv preprint arXiv:1907.03103

  • Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep neural networks (2017). ArXiv preprint arXiv:1709.09130

  • Dybå, T., Dingsøyr, T.: Empirical studies of agile software development: a systematic review. Inf. Softw. Technol. 50(9), 833–859 (2008). https://doi.org/10.1016/j.infsof.2008.01.006


  • Eniser, H.F., Gerasimou, S., Sen, A.: Deepfault: fault localization for deep neural networks. In: Hähnle, R., van der Aalst, W. (eds.) Fundamental Approaches to Software Engineering, pp. 171–191. Springer, Cham (2019)


  • Everett, M., Lütjens, B., How, J.P.: Certifiable robustness to adversarial state uncertainty in deep reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst (2021). https://doi.org/10.1109/TNNLS.2021.3056046


  • Fan, D.D., Nguyen, J., Thakker, R., Alatur, N., Agha-mohammadi, A.A., Theodorou, E.A.: Bayesian learning-based adaptive control for safety critical systems. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4093–4099 (2020). https://doi.org/10.1109/ICRA40945.2020.9196709

  • Feng, D., Rosenbaum, L., Glaeser, C., Timm, F., Dietmayer, K.: Can we trust you? on calibration of a probabilistic object detector for autonomous driving (2019). ArXiv preprint arXiv:1909.12358

  • Feng, Y., Shi, Q., Gao, X., Wan, J., Fang, C., Chen, Z.: Deepgini: Prioritizing massive tests to enhance the robustness of deep neural networks. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Association for Computing Machinery, New York, NY, USA, ISSTA 2020, pp. 177-188 (2020). https://doi.org/10.1145/3395363.3397357

  • Fisac, J.F., Akametalu, A.K., Zeilinger, M.N., Kaynama, S., Gillula, J., Tomlin, C.J.: A general safety framework for learning-based control in uncertain robotic systems. IEEE Trans. Autom. Control 64(7), 2737–2752 (2019). https://doi.org/10.1109/TAC.2018.2876389


  • François-Lavet, V., Henderson, P., Islam, R., Bellemare, M.G., Pineau, J.: An Introduction to Deep Reinforcement Learning. Found. Trends Mach. Learn. 11(3–4), 219–354 (2018)


  • Fremont, D.J., Chiu, J., Margineantu, D.D., Osipychev, D., Seshia, S.A.: Formal analysis and redesign of a neural network-based aircraft taxiing system with verifai. In: International Conference on Computer Aided Verification. Springer, Berlin. pp. 122–134 (2020)

  • Fujino, H., Kobayashi, N., Shirasaka, S.: Safety assurance case description method for systems incorporating off-operational machine learning and safety device. INCOSE Int. Symp. 29(S1), 152–164 (2019)


  • Fulton, N., Platzer, A.: Verifiably safe off-model reinforcement learning. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin. pp. 413–430 (2019)

  • Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, JMLR.org, ICML’16, pp. 1050-1059 (2016)

  • Gambi, A., Mueller, M., Fraser, G.: Automatically testing self-driving cars with search-based procedural content generation. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, New York, NY, USA, ISSTA 2019, pp. 318-328 (2019)

  • Gandhi, D., Pinto, L., Gupta, A.: Learning to fly by crashing. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 3948–3955 (2017)

  • Gauerhof, L., Munk, P., Burton, S.: Structuring validation targets of a machine learning function applied to automated driving. In: Gallina, B., Skavhaug, A., Bitsch, F. (eds.) Computer Safety, Reliability, and Security, pp. 45–58. Springer, Berlin (2018)


  • Gauerhof, L., Hawkins, R., Picardi, C., Paterson, C., Hagiwara, Y., Habli, I.: Assuring the safety of machine learning for pedestrian detection at crossings. In: Casimiro, A., Ortmeier, F., Bitsch, F., Ferreira, P. (eds.) Computer Safety, Reliability, and Security, pp. 197–212. Springer, Cham (2020)


  • Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: Ai2: Safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP), IEEE, pp. 3–18 (2018)

  • Ghosh, S., Berkenkamp, F., Ranade, G., Qadeer, S., Kapoor, A.: Verifying controllers against adversarial examples with bayesian optimization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 7306–7313 (2018a)

  • Ghosh, S., Jha, S., Tiwari, A., Lincoln, P., Zhu, X.: Model, data and reward repair: Trusted machine learning for markov decision processes. In: 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 194–199 (2018b)

  • Gladisch, C., Heinzemann, C., Herrmann, M., Woehrle, M.: Leveraging combinatorial testing for safety-critical computer vision datasets. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1314–1321 (2020)

  • Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2014). ArXiv preprint arXiv:1412.6572

  • Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, New York (2016)


  • Goodman, B., Flaxman, S.: European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine 38(3), 50–57 (2017)


  • Göpfert, J.P., Hammer, B., Wersing, H.: Mitigating concept drift via rejection. In: International Conference on Artificial Neural Networks. Springer, Berlin. pp. 456–467 (2018)

  • Gopinath, D., Taly, A., Converse, H., Pasareanu, C.S.: Finding invariants in deep neural networks (2019). ArXiv preprint arXiv:1904.13215v1

  • Gopinath, D., Katz, G., Păsăreanu, C.S., Barrett, C.: Deepsafe: A data-driven approach for assessing robustness of neural networks. In: Lahiri, S.K., Wang, C. (eds.) Automated Technology for Verification and Analysis, pp. 3–19. Springer, Cham (2018)


  • Grefenstette, E., Stanforth, R., O’Donoghue, B., Uesato, J., Swirszcz, G., Kohli, P.: Strength in numbers: Trading-off robustness and computation via adversarially-trained ensembles. CoRR abs/1811.09300 (2018). arXiv:1811.09300

  • Gros, T.P., Hermanns, H., Hoffmann, J., Klauck, M., Steinmetz, M.: Deep statistical model checking. In: International Conference on Formal Techniques for Distributed Objects, Components, and Systems. Springer, Berlin. pp. 96–114 (2020b)

  • Gros, S., Zanon, M., Bemporad, A.: Safe reinforcement learning via projection on a safe set: How to achieve optimality? (2020a). ArXiv preprint arXiv:2004.00915

  • Gschossmann, A., Jobst, S., Mottok, J., Bierl, R.: A measure of confidence of artificial neural network classifiers. In: ARCS Workshop 2019; 32nd International Conference on Architecture of Computing Systems, pp. 1–5 (2019)

  • Gu, X., Easwaran, A.: Towards safe machine learning for cps: infer uncertainty from training data. In: Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, pp. 249–258 (2019)

  • Gualo, F., Rodriguez, M., Verdugo, J., Caballero, I., Piattini, M.: Data quality certification using ISO/IEC 25012: Industrial experiences. J. Syst. Softw. 176, 110938 (2021)


  • Guidotti, D., Leofante, F., Castellini, C., Tacchella, A.: Repairing learned controllers with convex optimization: a case study. In: International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Springer, Berlin. pp. 364–373 (2019a)

  • Guidotti, R., Monreale, A., Giannotti, F., Pedreschi, D., Ruggieri, S., Turini, F.: Factual and counterfactual explanations for black box decision making. IEEE Intell. Syst. 34(6), 14–23 (2019)


  • Guo, W., Mu, D., Xu, J., Su, P., Wang, G., Xing, X.: Lemna: Explaining deep learning based security applications. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 364–379 (2018b)

  • Guo, J., Jiang, Y., Zhao, Y., Chen, Q., Sun, J.: DLFuzz: differential fuzzing testing of deep learning systems. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2018a)

  • Hart, P., Rychly, L., Knoll, A.: Lane-merging using policy-based reinforcement learning and post-optimization. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), IEEE, 3176–3181 (2019)

  • Hasanbeig, M., Kroening, D., Abate, A.: Towards verifiable and safe model-free reinforcement learning. In: CEUR Workshop Proceedings, CEUR Workshop Proceedings (2020)

  • Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)

  • Hein, M., Andriushchenko, M.: Formal guarantees on the robustness of a classifier against adversarial manipulation. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30 (2017). https://proceedings.neurips.cc/paper/2017/file/e077e1a544eec4f0307cf5c3c721d944-Paper.pdf

  • Heinzmann, L., Shafaei, S., Osman, M.H., Segler, C., Knoll, A.: A framework for safety violation identification and assessment in autonomous driving. In: AISafety@IJCAI (2019)

  • Hendrycks, D., Basart, S., Mazeika, M., Mostajabi, M., Steinhardt, J., Song, D.: Scaling out-of-distribution detection for real-world settings (2020). ArXiv preprint arXiv:1911.11132

  • Hendrycks, D., Carlini, N., Schulman, J., Steinhardt, J.: Unsolved problems in ml safety (2021). arXiv:2109.13916

  • Hendrycks, D., Dietterich, T.G.: Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=HJz6tiCqYm

  • Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net (2017). https://openreview.net/forum?id=Hkg4TI9xl

  • Henne, M., Schwaiger, A., Roscher, K., Weiss, G.: Benchmarking uncertainty estimation methods for deep learning with safety-related metrics. In: SafeAI@ AAAI, pp. 83–90 (2020)

  • Henriksson, J., Berger, C., Borg, M., Tornberg, L., Englund, C., Sathyamoorthy, S.R., Ursing, S.: Towards structured evaluation of deep neural network supervisors. In: 2019 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 27–34 (2019a)

  • Henriksson, J., Berger, C., Borg, M., Tornberg, L., Sathyamoorthy, S.R., Englund, C.: Performance analysis of out-of-distribution detection on various trained neural networks. In: 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 113–120 (2019b). https://doi.org/10.1109/SEAA.2019.00026

  • Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: International conference on computer aided verification. Springer, Berlin. pp. 3–29 (2017)

  • Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., Yi, X.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020). https://doi.org/10.1016/j.cosrev.2020.100270


  • Ignatiev, A., Pereira, F., Narodytska, N., Marques-Silva, J.: A sat-based approach to learn explainable decision sets. In: International Joint Conference on Automated Reasoning. Springer, Berlin. pp. 627–645 (2018)

  • Inouye, D.I., Leqi, L., Kim, J.S., Aragam, B., Ravikumar, P.: Diagnostic curves for black box models (2019). ArXiv preprint arXiv:1912.01108v1

  • Isele, D., Nakhaei, A., Fujimura, K.: Safe reinforcement learning on autonomous vehicles. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 1–6 (2018)

  • ISO (2018) ISO 26262: Road vehicles – Functional safety. International Organization for Standardization (ISO), Geneva, Switzerland

  • ISO (2019) ISO/PAS 21448: Road vehicles – Safety of the intended functionality. International Organization for Standardization (ISO), Geneva

  • Jain, D., Anumasa, S., Srijith, P.: Decision making under uncertainty with convolutional deep gaussian processes. In: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pp. 143–151 (2020)

  • Jeddi, A., Shafiee, M.J., Karg, M., Scharfenberger, C., Wong, A.: Learn2perturb: An end-to-end feature perturbation learning to improve adversarial robustness. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1238–1247 (2020). https://doi.org/10.1109/CVPR42600.2020.00132

  • Jin, W., Ma, Y., Liu, X., Tang, X., Wang, S., Tang, J.: Graph structure learning for robust graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, KDD ’20, pp. 66–74 (2020)

  • Julian, K.D., Kochenderfer, M.J.: Guaranteeing safety for neural network-based aircraft collision avoidance systems. In: 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), IEEE, pp. 1–10 (2019)

  • Julian, K.D., Lee, R., Kochenderfer, M.J.: Validation of image-based neural network controllers through adaptive stress testing. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1–7 (2020). https://doi.org/10.1109/ITSC45102.2020.9294549

  • Julian, K.D., Sharma, S., Jeannin, J.B., Kochenderfer, M.J.: Verifying aircraft collision avoidance neural networks through linear approximations of safe regions (2019). ArXiv preprint arXiv:1903.00762

  • Kandel, A., Moura, S.J.: Safe zero-shot model-based learning and control: a wasserstein distributionally robust approach (2020). ArXiv preprint arXiv:2004.00759

  • Kaprocki, N., Velikić, G., Teslić, N., Krunić, M.: Multiunit automotive perception framework: Synergy between AI and deterministic processing. In: 2019 IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), pp. 257–260 (2019)

  • Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: An efficient smt solver for verifying deep neural networks. In: International Conference on Computer Aided Verification. Springer, Berlin. pp. 97–117 (2017)

  • Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’17, pp. 5580–5590 (2017)

  • Kitchenham, B.: Procedures for performing systematic reviews. Joint Technical Report, Computer Science Department, Keele University (TR/SE-0401) and National ICT Australia Ltd (0400011T1) (2004)

  • Kitchenham, B., Pretorius, R., Budgen, D., Pearl Brereton, O., Turner, M., Niazi, M., Linkman, S.: Systematic literature reviews in software engineering - a tertiary study. Inf. Softw. Technol. 52(8), 792–805 (2010)


  • Kläs, M., Sembach, L.: Uncertainty wrappers for data-driven models. In: International Conference on Computer Safety, Reliability, and Security. Springer, Berlin. pp. 358–364 (2019)

  • Kornecki, A., Zalewski, J.: Software certification for safety-critical systems: A status report. In: 2008 International Multiconference on Computer Science and Information Technology, pp. 665–672 (2008). https://doi.org/10.1109/IMCSIT.2008.4747314

  • Kuppers, F., Kronenberger, J., Shantia, A., Haselhoff, A.: Multivariate confidence calibration for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 326–327 (2020)

  • Kuutti, S., Bowden, R., Joshi, H., de Temple, R., Fallah, S.: Safe deep neural network-driven autonomous vehicles using software safety cages. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A.J., Menezes, R., Allmendinger, R. (eds.) Intelligent Data Engineering and Automated Learning - IDEAL 2019. Lecture Notes in Computer Science, pp. 150–160. Springer, Berlin (2019)


  • Kuwajima, H., Tanaka, M., Okutomi, M.: Improving transparency of deep neural inference process. Progr. Artif. Intell. 8(2), 273–285 (2019)


  • Laidlaw, C., Feizi, S.: Playing it safe: adversarial robustness with an abstain option (2019). ArXiv preprint arXiv:1911.11253

  • Le, M.T., Diehl, F., Brunner, T., Knol, A.: Uncertainty estimation for deep neural object detectors in safety-critical applications. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 3873–3878 (2018)

  • Le, H., Voloshin, C., Yue, Y.: Batch policy learning under constraints. In: International Conference on Machine Learning, PMLR, pp. 3703–3712 (2019)

  • Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., Jana, S.: On the connection between differential privacy and adversarial robustness in machine learning (2018). ArXiv preprint arXiv:1802.03471v1

  • Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’18, pp. 7167–7177 (2018)

  • Lee, K., An, G.N., Zakharov, V., Theodorou, E.A.: Perceptual attention-based predictive control (2019a). ArXiv preprint arXiv:1904.11898

  • Lee, K., Wang, Z., Vlahov, B., Brar, H., Theodorou, E.A.: Ensemble bayesian decision making with redundant deep perceptual control policies. In: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), IEEE, pp. 831–837 (2019b)

  • Levi, D., Gispan, L., Giladi, N., Fetaya, E.: Evaluating and calibrating uncertainty prediction in regression tasks (2019). ArXiv preprint arXiv:1905.11659

  • Li, S., Chen, Y., Peng, Y., Bai, L.: Learning more robust features with adversarial training. ArXiv preprint arXiv:1804.07757 (2018)

  • Li, J., Liu, J., Yang, P., Chen, L., Huang, X., Zhang, L.: Analyzing deep neural networks with symbolic propagation: towards higher precision and faster verification. In: International Static Analysis Symposium. Springer, Berlin. pp. 296–319 (2019a)

  • Li, Y., Liu, Y., Li, M., Tian, Y., Luo, B., Xu, Q.: D2NN: A fine-grained dual modular redundancy framework for deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC’19), ACM, New York, NY, USA, pp. 138-147 (2019b)

  • Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks (2020). ArXiv preprint arXiv:1706.02690

  • Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning (2015). ArXiv preprint arXiv:1509.02971

  • Lin, W., Yang, Z., Chen, X., Zhao, Q., Li, X., Liu, Z., He, J.: Robustness verification of classification deep neural networks via linear programming. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11418–11427 (2019)

  • Liu, M., Liu, S., Su, H., Cao, K., Zhu, J.: Analyzing the noise robustness of deep neural networks. In: 2018 IEEE Conference on Visual Analytics Science and Technology (VAST), IEEE, pp. 60–71, (2018)

  • Liu, L., Saerbeck, M., Dauwels, J.: Affine disentangled gan for interpretable and robust av perception (2019). ArXiv preprint arXiv:1907.05274

  • Liu, J., Shen, Z., Cui, P., Zhou, L., Kuang, K., Li, B., Lin, Y.: Invariant adversarial learning for distributional robustness (2020). ArXiv preprint arXiv:2006.04414

  • Loquercio, A., Segu, M., Scaramuzza, D.: A general framework for uncertainty estimation in deep learning. IEEE Robot. Autom. Lett. 5(2), 3153–3160 (2020). https://doi.org/10.1109/LRA.2020.2974682


  • Lust, J., Condurache, A.P.: Gran: An efficient gradient-norm based detector for adversarial and misclassified examples (2020). ArXiv preprint arXiv:2004.09179

  • Lütjens, B., Everett, M., How, J.P.: Safe reinforcement learning with model uncertainty estimates. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp. 8662–8668 (2019)

  • Lyu, Z., Ko, C.Y., Kong, Z., Wong, N., Lin, D., Daniel, L.: Fastened crown: Tightened neural network robustness certificates. Proc. AAAI Conf. Artif. Intell. 34, 5037–5044 (2020)


  • Ma, L., Juefei-Xu, F., Xue, M., Li, B., Li, L., Liu, Y., Zhao, J.: DeepCT: Tomographic combinatorial testing for deep learning systems. In: 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 614–618 (2019)

  • Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L., Liu, Y., Zhao, J., Wang, Y.: Deepgauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ACM, New York, NY, USA, ASE 2018, pp. 120-131 (2018). https://doi.org/10.1145/3238147.3238202

  • Machida, F.: N-version machine learning models for safety critical systems. In: 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 48–51 (2019)

  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks (2017). ArXiv preprint arXiv:1706.06083

  • Mani, N., Moh, M., Moh, T.S.: Towards robust ensemble defense against adversarial examples attack. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2019a)

  • Mani, S., Sankaran, A., Tamilselvam, S., Sethi, A.: Coverage testing of deep learning models using dataset characterization (2019b). ArXiv preprint arXiv:1911.07309

  • Marvi, Z., Kiumarsi, B.: Safe off-policy reinforcement learning using barrier functions. In: 2020 American Control Conference (ACC), IEEE, pp. 2176–2181 (2020)

  • Meinke, A., Hein, M.: Towards neural networks that provably know when they don’t know (2019). ArXiv preprint arXiv:1909.12180

  • Meyes, R., de Puiseau, C.W., Posada-Moreno, A., Meisen, T.: Under the hood of neural networks: Characterizing learned representations by functional neuron populations and network ablations (2020a). ArXiv preprint arXiv:2004.01254

  • Meyes, R., Schneider, M., Meisen, T.: How do you act? an empirical study to understand behavior of deep reinforcement learning agents (2020b). ArXiv preprint arXiv:2004.03237

  • Michelmore, R., Kwiatkowska, M., Gal, Y.: Evaluating uncertainty quantification in end-to-end autonomous driving control (2018). ArXiv preprint arXiv:1811.06817

  • Mirman, M., Gehr, T., Vechev, M.: Differentiable abstract interpretation for provably robust neural networks. In: International Conference on Machine Learning, PMLR, pp. 3578–3586 (2018)

  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)


  • Moravčík, M., Schmid, M., Burch, N., Lisỳ, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M.: Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337), 508–513 (2017)


  • Müller, S., Hospach, D., Bringmann, O., Gerlach, J., Rosenstiel, W.: Robustness evaluation and improvement for vision-based advanced driver assistance systems. In: 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pp. 2659–2664 (2015)

  • Naseer, M., Minhas, M.F., Khalid, F., Hanif, M.A., Hasan, O., Shafique, M.: Fannet: formal analysis of noise tolerance, training bias and input sensitivity in neural networks. In: 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, pp. 666–669 (2020)

  • Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2018)


  • Nguyen, H.H., Matschek, J., Zieger, T., Savchenko, A., Noroozi, N., Findeisen, R.: Towards nominal stability certification of deep learning-based controllers. In: 2020 American Control Conference (ACC), IEEE, 3886–3891 (2020)

  • Nowak, T., Nowicki, M.R., Ćwian, K., Skrzypczyński, P.: How to improve object detection in a driver assistance system applying explainable deep learning. In: 2019 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 226–231 (2019)

  • O’Brien, M., Goble, W., Hager, G., Bukowski, J.: Dependable neural networks for safety critical tasks. In: International Workshop on Engineering Dependable and Secure Machine Learning Systems. Springer, Berlin. pp. 126–140 (2020)

  • Pan, R.: Static deep neural network analysis for robustness. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, New York, NY, USA, ESEC/FSE 2019, pp. 1238-1240 (2019)

  • Pandian, M.K.S., Dajsuren, Y., Luo, Y., Barosan, I.: Analysis of iso 26262 compliant techniques for the automotive domain. In: MASE@MoDELS (2015)

  • Park, C., Kim, J.M., Ha, S.H., Lee, J.: Sampling-based bayesian inference with gradient uncertainty (2018). ArXiv preprint arXiv:1812.03285

  • Pauli, P., Koch, A., Berberich, J., Kohler, P., Allgöwer, F.: Training robust neural networks using lipschitz bounds. IEEE Control Syst. Lett. 6, 121–126 (2022). https://doi.org/10.1109/LCSYS.2021.3050444


  • Pedreschi, D., Giannotti, F., Guidotti, R., Monreale, A., Pappalardo, L., Ruggieri, S., Turini, F.: Open the black box data-driven explanation of black box decision systems (2018). ArXiv preprint arXiv:1806.09936

  • Pedroza, G., Adedjouma, M.: Safe-by-Design Development Method for Artificial Intelligent Based Systems. In: SEKE 2019 : The 31st International Conference on Software Engineering and Knowledge Engineering, Lisbon, Portugal, pp. 391–397 (2019)

  • Pei, K., Cao, Y., Yang, J., Jana, S.: DeepXplore: Automated whitebox testing of deep learning systems. In: Proceedings of the 26th Symposium on Operating Systems Principles (2017a)

  • Pei, K., Cao, Y., Yang, J., Jana, S.: Towards practical verification of machine learning: The case of computer vision systems (2017b). ArXiv preprint arXiv:1712.01785

  • Peng, W., Ye, Z.S., Chen, N.: Bayesian deep-learning-based health prognostics toward prognostics uncertainty. IEEE Trans. Ind. Electron. 67(3), 2283–2293 (2019)


  • Postels, J., Ferroni, F., Coskun, H., Navab, N., Tombari, F.: Sampling-free epistemic uncertainty estimation using approximated variance propagation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2931–2940 (2019)

  • Rahimi, M., Guo, J.L., Kokaly, S., Chechik, M.: Toward requirements specification for machine-learned components. In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), 241–244 (2019)

  • Rajabli, N., Flammini, F., Nardone, R., Vittorini, V.: Software verification and validation of safe autonomous cars: A systematic literature review. IEEE Access 9, 4797–4819 (2021). https://doi.org/10.1109/ACCESS.2020.3048047


  • Rakin, A.S., He, Z., Fan, D.: Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack (2018). ArXiv preprint arXiv:1811.09310

  • Ramanagopal, M.S., Anderson, C., Vasudevan, R., Johnson-Roberson, M.: Failing to learn: Autonomously identifying perception failures for self-driving cars. IEEE Robot. Autom. Lett. 3(4), 3860–3867 (2018)


  • Reeb, D., Doerr, A., Gerwinn, S., Rakitsch, B.: Learning gaussian processes by minimizing pac-bayesian generalization bounds. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’18, pp. 3341-3351 (2018)

  • Remeli, V., Morapitiye, S., Rövid, A., Szalay, Z.: Towards verifiable specifications for neural networks in autonomous driving. In: 2019 IEEE 19th International Symposium on Computational Intelligence and Informatics and 7th IEEE International Conference on Recent Achievements in Mechatronics, Automation, Computer Sciences and Robotics (CINTI-MACRo), IEEE, pp. 000175–000180 (2019)

  • Ren, H., Chandrasekar, S.K., Murugesan, A.: Using quantifier elimination to enhance the safety assurance of deep neural networks. In: 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), IEEE, pp. 1–8 (2019a)

  • Ren, J., Liu, P.J., Fertig, E., Snoek, J., Poplin, R., DePristo, M.A., Dillon, J.V., Lakshminarayanan, B.: Likelihood Ratios for Out-of-Distribution Detection, pp. 14707–14718. Curran Associates Inc., Red Hook, NY, USA (2019)

  • Ren, K., Zheng, T., Qin, Z., Liu, X.: Adversarial attacks and defenses in deep learning. Engineering 6(3), 346–360 (2020)


  • Revay, M., Wang, R., Manchester, I.R.: A convex parameterization of robust recurrent neural networks. IEEE Control Syst. Lett. 5(4), 1363–1368 (2020)


  • Ribeiro, M.T., Singh, S., Guestrin, C.: “why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144 (2016)

  • Richards, S.M., Berkenkamp, F., Krause, A.: The lyapunov neural network: adaptive stability certification for safe learning of dynamical systems. In: Conference on Robot Learning, PMLR, pp. 466–476 (2018)

  • Rodriguez-Dapena, P.: Software safety certification: a multidomain problem. IEEE Softw. 16(4), 31–38 (1999). https://doi.org/10.1109/52.776946


  • Roh, Y., Heo, G., Whang, S.E.: A survey on data collection for machine learning: A big data - ai integration perspective. IEEE Trans. Knowl. Data Eng. 33(4), 1328–1347 (2021)


  • Ruan, W., Wu, M., Sun, Y., Huang, X., Kroening, D., Kwiatkowska, M.: Global robustness evaluation of deep neural networks with provable guarantees for the hamming distance. In: IJCAI2019 (2019)

  • Rubies-Royo, V., Calandra, R., Stipanovic, D.M., Tomlin, C.: Fast neural network verification via shadow prices (2019). ArXiv preprint arXiv:1902.07247

  • Rudolph, A., Voget, S., Mottok, J.: A consistent safety case argumentation for artificial intelligence in safety related automotive systems. In: 9th European Congress on Embedded Real Time Software and Systems (ERTS 2018), Toulouse, France (2018)

  • Rusak, E., Schott, L., Zimmermann, R., Bitterwolf, J., Bringmann, O., Bethge, M., Brendel, W.: Increasing the robustness of dnns against image corruptions by playing the game of noise (2020). ArXiv preprint arXiv:2001.06057

  • Salay, R., Angus, M., Czarnecki, K.: A safety analysis method for perceptual components in automated driving. In: 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), IEEE, pp. 24–34 (2019)

  • Salay, R., Czarnecki, K.: Using machine learning safely in automotive software: An assessment and adaption of software process requirements in iso 26262 (2018). ArXiv preprint arXiv:1808.01614

  • Scheel, O., Schwarz, L., Navab, N., Tombari, F.: Explicit domain adaptation with loosely coupled samples (2020). ArXiv preprint arXiv:2004.11995

  • Sehwag, V., Bhagoji, A.N., Song, L., Sitawarin, C., Cullina, D., Chiang, M., Mittal, P.: Analyzing the robustness of open-world machine learning. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, ACM, New York, NY, USA, AISec’19, pp. 105-116 (2019)

  • Sehwag, V., Wang, S., Mittal, P., Jana, S.: On pruning adversarially robust neural networks (2020). ArXiv preprint arXiv:2002.10509

  • Sekhon, J., Fleming, C.: Towards improved testing for deep learning. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pp. 85–88 (2019)

  • Sena, L.H., Bessa, I.V., Gadelha, M.R., Cordeiro, L.C., Mota, E.: Incremental bounded model checking of artificial neural networks in cuda. In: 2019 IX Brazilian Symposium on Computing Systems Engineering (SBESC), IEEE, pp. 1–8 (2019)

  • Sheikholeslami, F., Jain, S., Giannakis, G.B.: Minimum uncertainty based detection of adversaries in deep neural networks. In: 2020 Information Theory and Applications Workshop (ITA), IEEE, pp. 1–16 (2020)

  • Shwartz-Ziv, R., Tishby, N.: Opening the black box of deep neural networks via information (2017). ArXiv preprint arXiv:1703.00810

  • Singh, G., Gehr, T., Püschel, M., Vechev, M.: Boosting robustness certification of neural networks. In: International Conference on Learning Representations (2018)

  • Sinha, A., Namkoong, H., Volpi, R., Duchi, J.: Certifying some distributional robustness with principled adversarial training (2017). ArXiv preprint arXiv:1710.10571

  • Smith, M.T., Grosse, K., Backes, M., Alvarez, M.A.: Adversarial vulnerability bounds for gaussian process classification (2019). ArXiv preprint arXiv:1909.08864

  • Sohn, J., Kang, S., Yoo, S.: Search based repair of deep neural networks (2019). ArXiv preprint arXiv:1912.12463

  • Steinhardt, J., Koh, P.W., Liang, P.: Certified defenses for data poisoning attacks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’17, pp. 3520-3532 (2017)

  • Summers, C., Dinneen, M.J.: Improved adversarial robustness via logit regularization methods (2019). ArXiv preprint arXiv:1906.03749

  • Sun, Y., Huang, X., Kroening, D., Sharp, J., Hill, M., Ashmore, R.: DeepConcolic: Testing and debugging deep neural networks. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pp. 111–114 (2019)

  • Sun, Y., Huang, X., Kroening, D., Sharp, J., Hill, M., Ashmore, R.: Structural test coverage criteria for deep neural networks. ACM Trans. Embed. Comput. Syst. 18, 5 (2019)


  • Syriani, E., Luhunu, L., Sahraoui, H.: Systematic mapping study of template-based code generation. Comput. Lang. Syst. Struct. 52, 43–62 (2018)


  • Taha, A., Chen, Y., Misu, T., Shrivastava, A., Davis, L.: Unsupervised data uncertainty learning in visual retrieval systems. CoRR abs/1902.02586 (2019). arXiv:1902.02586

  • Tang, Y.C., Zhang, J., Salakhutdinov, R.: Worst cases policy gradients (2019). ArXiv preprint arXiv:1911.03618

  • Tian, Y., Pei, K., Jana, S., Ray, B.: DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, ACM, New York, NY, USA, ICSE ’18, pp. 303-314 (2018)

  • Tian, Y., Zhong, Z., Ordonez, V., Kaiser, G., Ray, B.: Testing dnn image classifiers for confusion & bias errors. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Association for Computing Machinery, New York, NY, USA, ICSE ’20, pp. 1122-1134 (2020). https://doi.org/10.1145/3377811.3380400

  • Törnblom, J., Nadjm-Tehrani, S.: Formal verification of input-output mappings of tree ensembles. Sci. Comput. Progr. 194, 102450 (2020)


  • Toubeh, M., Tokekar, P.: Risk-aware planning by confidence estimation using deep learning-based perception (2019). ArXiv preprint arXiv:1910.00101

  • Tran, H.D., Musau, P., Lopez, D.M., Yang, X., Nguyen, L.V., Xiang, W., Johnson, T.T.: Parallelizable reachability analysis algorithms for feed-forward neural networks. In: 2019 IEEE/ACM 7th International Conference on Formal Methods in Software Engineering (FormaliSE), IEEE, pp. 51–60 (2019)

  • Tran, H.D., Yang, X., Lopez, D.M., Musau, P., Nguyen, L.V., Xiang, W., Bak, S., Johnson, T.T.: NNV: The neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. In: International Conference on Computer Aided Verification. Springer, Berlin. pp. 3–17 (2020)

  • Tuncali, C.E., Fainekos, G., Ito, H., Kapinski, J.: Simulation-based adversarial test generation for autonomous vehicles with machine learning components. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1555–1562 (2018)

  • Turchetta, M., Berkenkamp, F., Krause, A.: Safe exploration in finite markov decision processes with gaussian processes. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’16, pp. 4312-4320 (2016)

  • Udeshi, S., Jiang, X., Chattopadhyay, S.: Callisto: Entropy-based test generation and data quality assessment for machine learning systems. In: 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), pp. 448–453 (2020)

  • Uesato, J., Kumar, A., Szepesvari, C., Erez, T., Ruderman, A., Anderson, K., Heess, N., Kohli, P. et al.: Rigorous agent evaluation: An adversarial approach to uncover catastrophic failures (2018). ArXiv preprint arXiv:1812.01647

  • Varghese, S., Bayzidi, Y., Bar, A., Kapoor, N., Lahiri, S., Schneider, J.D., Schmidt, N.M., Schlicht, P., Huger, F., Fingscheidt, T.: Unsupervised temporal consistency metric for video segmentation in highly-automated driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 336–337 (2020)

  • Vidot, G., Gabreau, C., Ober, I., Ober, I.: Certification of embedded systems based on machine learning: a survey (2021). arXiv:2106.07221

  • Vijaykeerthy, D., Suri, A., Mehta, S., Kumaraguru, P.: Hardening deep neural networks via adversarial model cascades (2018). arXiv:1802.01448

  • Wabersich, K.P., Zeilinger, M.: Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling. In: Learning for Dynamics and Control, PMLR, pp. 455–464 (2020a)

  • Wabersich, K.P., Zeilinger, M.N.: Performance and safety of bayesian model predictive control: scalable model-based RL with guarantees (2020b). ArXiv preprint arXiv:2006.03483

  • Wabersich, K.P., Hewing, L., Carron, A., Zeilinger, M.N.: Probabilistic model predictive safety certification for learning-based control. IEEE Trans. Autom. Control 2021, 10 (2021)


  • Wagner, J., Kohler, J.M., Gindele, T., Hetzel, L., Wiedemer, J.T., Behnke, S.: Interpretable and fine-grained visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9097–9107 (2019)

  • Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Efficient formal safety analysis of neural networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 31 (2018a). https://proceedings.neurips.cc/paper/2018/file/2ecd2bd94734e5dd392d8678bc64cdab-Paper.pdf

  • Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Formal security analysis of neural networks using symbolic intervals. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1599–1614 (2018b)

  • Wang, T.E., Gu, Y., Mehta, D., Zhao, X., Bernal, E.A.: Towards robust deep neural networks (2018c). arXiv preprint arXiv:1810.11726

  • Wang, W., Wang, A., Tamar, A., Chen, X., Abbeel, P.: Safer classification by synthesis (2018d). arXiv preprint arXiv:1711.08534

  • Wang, Y., Jha, S., Chaudhuri, K.: Analyzing the robustness of nearest neighbors to adversarial examples. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, PMLR, Proceedings of Machine Learning Research, vol. 80, pp. 5133–5142 (2018e). https://proceedings.mlr.press/v80/wang18c.html

  • Wang, J., Gou, L., Zhang, W., Yang, H., Shen, H.W.: DeepVID: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Trans. Vis. Comput. Graph. 25(6), 2168–2180 (2019a)

  • Wang, Y.S., Weng, T.W., Daniel, L.: Verification of neural network control policy under persistent adversarial perturbation (2019b). arXiv preprint arXiv:1908.06353

  • Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

  • Wen, M., Topcu, U.: Constrained cross-entropy method for safe reinforcement learning. IEEE Trans. Autom. Control 66, 7 (2020)

  • Wen, J., Li, S., Lin, Z., Hu, Y., Huang, C.: Systematic literature review of machine learning based software development effort estimation models. Inf. Softw. Technol. 54(1), 41–59 (2012)

  • Weyuker, E.J.: On testing non-testable programs. Comput. J. 25(4), 465–470 (1982)

  • Wicker, M., Huang, X., Kwiatkowska, M.: Feature-guided black-box safety testing of deep neural networks. In: Beyer, D., Huisman, M. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, pp. 408–426. Springer, Cham (2018)

  • Wolschke, C., Kuhn, T., Rombach, D., Liggesmeyer, P.: Observation based creation of minimal test suites for autonomous vehicles. In: 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 294–301 (2017)

  • Wu, M., Wicker, M., Ruan, W., Huang, X., Kwiatkowska, M.: A game-based approximate verification of deep neural networks with provable guarantees. Theoret. Comput. Sci. 807, 298–329 (2020)

  • Xiang, W., Lopez, D.M., Musau, P., Johnson, T.T.: Reachable set estimation and verification for neural network models of nonlinear dynamic systems. In: Safe, Autonomous and Intelligent Vehicles. Springer, Berlin. pp. 123–144 (2019)

  • Xie, X., Ma, L., Juefei-Xu, F., Xue, M., Chen, H., Liu, Y., Zhao, J., Li, B., Yin, J., See, S.: DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, New York, NY, USA, ISSTA 2019, pp. 146–157 (2019)

  • Xu, H., Chen, Z., Wu, W., Jin, Z., Kuo, S., Lyu, M.: NV-DNN: Towards fault-tolerant DNN systems with N-version programming. In: 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 44–47 (2019)

  • Yaghoubi, S., Fainekos, G.: Gray-box adversarial testing for control systems with machine learning components. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, ACM, New York, NY, USA, HSCC ’19, pp. 179–184 (2019)

  • Yan, Y., Pei, Q.: A robust deep-neural-network-based compressed model for mobile device assisted by edge server. IEEE Access 7, 179104–179117 (2019)

  • Yan, M., Wang, L., Fei, A.: ARTDL: Adaptive random testing for deep learning systems. IEEE Access 8, 3055–3064 (2020)

  • Yang, Y., Vamvoudakis, K.G., Modares, H.: Safe reinforcement learning for dynamical games. Int. J. Robust Nonlinear Control 30(9), 3706–3726 (2020)

  • Ye, S., Tan, S.H., Xu, K., Wang, Y., Bao, C., Ma, K.: Brain-inspired reverse adversarial examples (2019). arXiv preprint arXiv:1905.12171

  • Youn, W., Yi, B.J.: Software and hardware certification of safety-critical avionic systems: A comparison study. Comput. Stand. Interfaces 36(6), 889–898 (2014). https://doi.org/10.1016/j.csi.2014.02.005

  • Youn, W.K., Hong, S.B., Oh, K.R., Ahn, O.S.: Software certification of safety-critical avionic systems: DO-178C and its impacts. IEEE Aerospace Electron. Syst. Mag. 30(4), 4–13 (2015)

  • Zhan, W., Li, J., Hu, Y., Tomizuka, M.: Safe and feasible motion generation for autonomous driving via constrained policy net. In: IECON 2017-43rd Annual Conference of the IEEE Industrial Electronics Society, IEEE, pp. 4588–4593 (2017)

  • Zhang, J., Li, J.: Testing and verification of neural-network-based safety-critical control software: a systematic literature review. Inf. Softw. Technol. 123, 106296 (2020). https://doi.org/10.1016/j.infsof.2020.106296

  • Zhang, M., Li, H., Kuang, X., Pang, L., Wu, Z.: Neuron selecting: Defending against adversarial examples in deep neural networks. In: International Conference on Information and Communications Security. Springer, Berlin. pp. 613–629 (2019a)

  • Zhang, P., Dai, Q., Ji, S.: Condition-guided adversarial generative testing for deep learning systems. In: 2019 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 71–77 (2019b)

  • Zhang, J., Cheung, B., Finn, C., Levine, S., Jayaraman, D.: Cautious adaptation for reinforcement learning in safety-critical settings. In: International Conference on Machine Learning, PMLR, pp. 11055–11065 (2020a)

  • Zhang, J.M., Harman, M., Ma, L., Liu, Y.: Machine learning testing: Survey, landscapes and horizons. IEEE Trans. Softw. Eng. 1, 1 (2020b). https://doi.org/10.1109/TSE.2019.2962027

  • Zhao, C., Yang, J., Liang, J., Li, C.: Discover learning behavior patterns to predict certification. In: 2016 11th International Conference on Computer Science & Education (ICCSE), IEEE, pp. 69–73 (2016)

Acknowledgements

We would like to thank the following authors (in no particular order) who kindly provided us with feedback on our review of their work: Mahum Naseer, Hoang-Dung Tran, Jie Ren, David Isele, Jesse Zhang, Michaela Klauck, Guy Katz, Patrick Hart, Guy Amit, Yu Li, Anurag Arnab, Tiago Marques, Taylor T. Johnson, Molly O’Brien, Kimin Lee, Lukas Heinzmann, Björn Lütjens, Brendon G. Anderson, Marta Kwiatkowska, Patricia Pauli, Anna Monreale, Alexander Amini, Joerg Wagner, Adrian Schwaiger, Aman Sinha, Joel Dapello, Kim Peter Wabersich. Many thanks also go to Freddy Lécué from Thales, who provided us with feedback on an early version of this manuscript. They all contributed to improving this SLR.

Author information

Corresponding author

Correspondence to Florian Tambon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This work is supported by the DEEL Project CRDPJ 537462-18 funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Consortium for Research and Innovation in Aerospace in Québec (CRIAQ), together with its industrial partners Thales Canada Inc., Bell Textron Canada Ltd., CAE Inc. and Bombardier Inc.

Supplementary Information

Below is the link to the electronic supplementary material.

Electronic supplementary material 1 (PDF 596 kb)

Appendices

We provide the list of the papers used in each section (Table 3). Note that some papers appear only in the complementary material, so as to give readers detailed information while keeping the main review concise.

Table 3 Paper references for each section. Note that the “Others” and “Explainability/Interpretable Model” categories are not presented in the main text, only in the Complementary Material, in order to keep the paper concise

About this article

Cite this article

Tambon, F., Laberge, G., An, L. et al. How to certify machine learning based safety-critical systems? A systematic literature review. Autom Softw Eng 29, 38 (2022). https://doi.org/10.1007/s10515-022-00337-x
