
How to certify machine learning based safety-critical systems? A systematic literature review

Published in: Automated Software Engineering

Abstract

Context

Machine Learning (ML) has been at the heart of many innovations in recent years. However, including it in so-called “safety-critical” systems such as automotive or aeronautical ones has proven very challenging, since the paradigm shift that ML brings completely changes traditional certification approaches.

Objective

This paper aims to elucidate challenges related to the certification of ML-based safety-critical systems, as well as the solutions that are proposed in the literature to tackle them, answering the question “How to Certify Machine Learning Based Safety-critical Systems?”.

Method

We conducted a Systematic Literature Review (SLR) of research papers published between 2015 and 2020 that cover topics related to the certification of ML systems. In total, we identified 217 papers addressing what are considered the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provide summaries of the extracted papers.

Results

The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and types of ML models. They also emphasized the need to strengthen connections between academia and industry to deepen the study of the domain. Finally, they illustrated the necessity of building bridges between the above-mentioned main pillars, which are for now mostly studied in isolation.

Conclusion

We highlight current efforts deployed to enable the certification of ML-based software systems and discuss some future research directions.


Notes

  1. https://www.forbes.com/sites/louiscolumbus/2020/01/19/roundup-of-machine-learning-forecasts-and-market-estimates-2020/

  2. https://www.cbc.ca/news/business/uber-self-driving-car-2018-fatal-crash-software-flaws-1.5349581

  3. https://webstore.iec.ch/publication/6007

  4. https://www.iso.org/standard/68383.html

  5. https://my.rtca.org/NC__Product?id=a1B36000001IcmqEAC

  6. https://www.easa.europa.eu/newsroom-and-events/news/easa-releases-consultation-its-first-usable-guidance-level-1-machine

  7. https://www.iso.org/committee/6794475/x/catalogue/

  8. https://www.iso.org/standard/68305.html?browse=tc

  9. https://www.iso.org/standard/77608.html?browse=tc

  10. https://www.iso.org/standard/77609.html?browse=tc

  11. https://www.iso.org/standard/81283.html?browse=tc

  12. https://www.faa.gov/aircraft/air_cert/design_approvals/air_software/media/TC_Overarching

  13. https://www.deel.ai

  14. https://scholar.google.com

  15. https://www.engineeringvillage.com

  16. https://webofknowledge.com

  17. https://www.sciencedirect.com

  18. https://www.scopus.com

  19. https://dl.acm.org

  20. https://ieeexplore.ieee.org

  21. Harzing, A.W. (2007) Publish or Perish, available from https://harzing.com/resources/publish-or-perish

  22. https://endnote.com

  23. https://github.com/FlowSs/How-to-Certify-Machine-Learning-BasedSafety-critical-Systems-A-Systematic-Literature-Review

  24. It is worth noting that the method can outperform other defenses based on adversarial training, such as Rusak et al. (2020) (\(L_\infty\) constraint and adversarial noise with Stylized ImageNet training), when considering a wide range of attack constraints and common image corruptions.
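For illustration only, here is a minimal sketch of generating \(L_\infty\)-bounded adversarial examples, the kind of inner step used by adversarial-training defenses (in the spirit of Goodfellow et al. 2014 and Madry et al. 2017, both cited in the references below). It is not the specific defense of Rusak et al. (2020), and the model, loss, and budget values are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def linf_adversarial_examples(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Toy PGD-style perturbation search under an L-infinity budget (||delta||_inf <= eps).

    Generic sketch for adversarial training, not the defense of Rusak et al. (2020);
    `model` is assumed to be any differentiable classifier returning logits.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Ascend the loss, then project the perturbation back into the L_inf ball.
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

# Adversarial training would then minimize the loss on these perturbed inputs, e.g.:
#   adv_loss = F.cross_entropy(model(linf_adversarial_examples(model, x, y)), y)
```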

  25. Author’s remark: A new improvement of GLOD, FOOD, was released earlier this year. Following our methodology, we kept only the GLOD reference that our search extracted, but we invite readers to check the new instalment of the method: https://arxiv.org/abs/2008.06856

  26. They argue that the background part is why OOD inputs can be misinterpreted. Indeed, they observed both that many OOD inputs can have background components similar to those of in-distribution data and that the background term can dominate the semantic term in the likelihood computation. By adding noise, they essentially mask the semantic term so that they can train a model specifically on background components. This could explain why models such as PixelCNN can fail at OOD detection.
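As a worked sketch of this likelihood-ratio argument (following Ren et al. 2019, cited in the references below; the decomposition into background and semantic components is that paper’s modeling assumption, not a result of this review):

```latex
% Assumed decomposition of an input x into a background part x_B and a semantic part x_S:
\log p_\theta(x) = \log p_\theta(x_B) + \log p_\theta(x_S)
% A background model p_{\theta_0} is trained on noise-perturbed inputs, so the likelihood
% ratio cancels the (possibly dominant) background term and serves as the OOD score:
\mathrm{LLR}(x) = \log p_\theta(x) - \log p_{\theta_0}(x)
```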

  27. Author’s remark: The original ADP paper posted on arXiv in 2019 has been improved and re-uploaded in 2020. In this review, we kept the 2019 reference, which was recovered by our methodology, but we invite readers to check the 2020 paper: https://arxiv.org/abs/1912.01108

  28. NNV benefits from parallel computing, which makes it faster than Reluplex (Katz et al. 2017) and other existing DNN verification frameworks.

  29. https://www.deel.ai

  30. https://github.com/FlowSs/How-to-Certify-Machine-Learning-BasedSafety-critical-Systems-A-Systematic-Literature-Review

References

  • Arcaini, P., Bombarda, A., Bonfanti, S., Gargantini, A.: Dealing with robustness of convolutional neural networks for image classification. In: 2020 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 7–14 (2020) https://doi.org/10.1109/AITEST49225.2020.00009

  • Abreu, S.: Automated architecture design for deep neural networks (2019). ArXiv preprint arXiv:1908.10714

  • Agostinelli, F., Hocquet, G., Singh, S., Baldi, P.: From reinforcement learning to deep reinforcement learning: an overview. In: Braverman Readings in Machine Learning. Key Ideas From Inception to Current State, pp. 298–328. Springer, Berlin (2018)


  • Alagöz, I., Herpel, T., German, R.: A selection method for black box regression testing with a statistically defined quality level. In: 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST), pp. 114–125 (2017). https://doi.org/10.1109/ICST.2017.18

  • Amarasinghe, K., Manic, M.: Explaining what a neural network has learned: toward transparent classification. In: 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, pp. 1–6 (2019)

  • Ameyaw, D.A., Deng, Q., Söffker, D.: Probability of detection (pod)-based metric for evaluation of classifiers used in driving behavior prediction. In: Annual Conference of the PHM Society, vol 11 (2019)

  • Amini, A., Schwarting, W., Soleimany, A., Rus, D.: Deep evidential regression (2019). ArXiv preprint arXiv:1910.02600

  • Amit, G., Levy, M., Rosenberg, I., Shabtai, A., Elovici, Y.: Glod: Gaussian likelihood out of distribution detector (2020). ArXiv preprint arXiv:2008.06856

  • Anderson, BG., Ma, Z., Li, J., Sojoudi, S.: Tightened convex relaxations for neural network robustness certification. In: 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, pp. 2190–2197 (2020)

  • Aravantinos, V., Diehl, F.: Traceability of deep neural networks (2019). ArXiv preprint arXiv:1812.06744

  • Arnab, A., Miksik, O., Torr, PH.: On the robustness of semantic segmentation models to adversarial attacks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 888–897 (2018) https://doi.org/10.1109/CVPR.2018.00099

  • Arrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., et al.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020)


  • Aslansefat, K., Sorokos, I., Whiting, D., Kolagari, R.T., Papadopoulos, Y.: Safeml: Safety monitoring of machine learning classifiers through statistical difference measure (2020). ArXiv preprint arXiv:2005.13166

  • Ayers, EW., Eiras, F., Hawasly, M., Whiteside, I.: Parot: a practical framework for robust deep neural network training. In: NASA Formal Methods Symposium. Springer, Berlin. pp. 63–84 (2020)

  • Bacci, E., Parker, D.: Probabilistic guarantees for safe deep reinforcement learning. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer, Berlin. pp. 231–248 (2020)

  • Baheri, A., Nageshrao, S., Tseng, H.E., Kolmanovsky, I., Girard, A., Filev, D.: Deep reinforcement learning with enhanced safety for autonomous highway driving. In: 2020 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 1550–1555 (2019)

  • Bakhti, Y., Fezza, S.A., Hamidouche, W., Déforges, O.: DDSA: a defense against adversarial attacks using deep denoising sparse autoencoder. IEEE Access 7, 160397–160407 (2019)


  • Baluta, T., Shen, S., Shinde, S., Meel, KS., Saxena, P.: Quantitative verification of neural networks and its security applications. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1249–1264 (2019)

  • Bar, A., Huger, F., Schlicht, P., Fingscheidt, T.: On the robustness of redundant teacher-student frameworks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1380–1388 (2019)

  • Bar, A., Klingner, M., Varghese, S., Huger, F., Schlicht, P., Fingscheidt, T.: Robust semantic segmentation by redundant networks with a layer-specific loss contribution and majority vote. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 332–333 (2020)

  • Ben Braiek, H., Khomh, F.: Deepevolution: A search-based testing approach for deep neural networks. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 454–458 (2019) https://doi.org/10.1109/ICSME.2019.00078

  • Berkenkamp, F., Turchetta, M., Schoellig, AP., Krause, A.: Safe model-based reinforcement learning with stability guarantees. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), pp. 908–919 (2017)

  • Bernhard, J., Gieselmann, R., Esterle, K., Knol, A.: Experience-based heuristic search: Robust motion planning with deep q-learning. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 3175–3182 (2018)

  • Biondi, A., Nesti, F., Cicero, G., Casini, D., Buttazzo, G.: A safe, secure, and predictable software architecture for deep learning in safety-critical systems. IEEE Embed. Syst. Lett. 12(3), 78–82 (2020). https://doi.org/10.1109/LES.2019.2953253


  • Bragg, J., Habli, I.: What is acceptably safe for reinforcement learning? In: International Conference on Computer Safety, Reliability, and Security. Springer, Berlin. pp. 418–430 (2018)

  • Bunel, R., Lu, J., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: Branch and bound for piecewise linear neural network verification. J. Mach. Learn. Res. 21(42), 1–39 (2020)

  • Burton, S., Gauerhof, L., Sethy, B.B., Habli, I., Hawkins, R.: Confidence arguments for evidence of performance in machine learning for highly automated driving functions. In: Romanovsky, A., Troubitsyna, E., Gashi, I., Schoitsch, E., Bitsch, F. (eds.) Computer Safety, Reliability, and Security, pp. 365–377. Springer, Berlin (2019)


  • Cardelli, L., Kwiatkowska, M., Laurenti, L., Patane, A.: Robustness guarantees for bayesian inference with gaussian processes. Proc. AAAI Conf. Artif. Intell. 33, 7759–7768 (2019)


  • Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (2017). https://doi.org/10.1109/SP.2017.49

  • Castelvecchi, D.: Can we open the black box of AI? Nat News 538, 20–23 (2016)


  • Chakrabarty, A., Quirynen, R., Danielson, C., Gao, W.: Approximate dynamic programming for linear systems with state and input constraints. In: 2019 18th European Control Conference (ECC), IEEE, pp. 524–529 (2019)

  • Chen, TY., Cheung, SC., Yiu, SM.: Metamorphic testing: a new approach for generating next test cases (2020a). ArXiv preprint arXiv:2002.12543

  • Chen, Z., Narayanan, N., Fang, B., Li, G., Pattabiraman, K., DeBardeleben, N.: Tensorfi: A flexible fault injection framework for tensorflow applications. In: 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), pp. 426–435 (2020b). https://doi.org/10.1109/ISSRE5003.2020.00047

  • Cheng, C.H.: Safety-aware hardening of 3d object detection neural network systems (2020). ArXiv preprint arXiv:2003.11242

  • Cheng, C.H., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: International Symposium on Automated Technology for Verification and Analysis. Springer, Berlin. pp. 251–268, (2017)

  • Cheng, C.H., Huang, C.H., Nührenberg, G.: nn-dependability-kit: Engineering neural networks for safety-critical autonomous driving systems (2019a). ArXiv preprint arXiv:1811.06746

  • Cheng, C., Nührenberg, G., Yasuoka, H.: Runtime monitoring neuron activation patterns. In: 2019 Design, Automation Test in Europe Conference Exhibition (DATE), pp. 300–303 (2019b). https://doi.org/10.23919/DATE.2019.8714971

  • Cheng, R., Orosz, G., Murray, R.M., Burdick, J.W.: End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 3387–3395 (2019c)


  • Cofer, D., Amundson, I., Sattigeri, R., Passi, A., Boggs, C., Smith, E., Gilham, L., Byun, T., Rayadurgam, S.: Run-time assurance for learning-based aircraft taxiing. In: 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), pp. 1–9 (2020). https://doi.org/10.1109/DASC50938.2020.9256581

  • Colangelo, F., Neri, A., Battisti, F.: Countering adversarial examples by means of steganographic attacks. In: 2019 8th European Workshop on Visual Information Processing (EUVIP), pp. 193–198 (2019). https://doi.org/10.1109/EUVIP47703.2019.8946254

  • Cosentino, J., Zaiter, F., Pei, D., Zhu, J.: The search for sparse, robust neural networks (2019). ArXiv preprint arXiv:1912.02386

  • Croce, F., Hein, M.: Provable robustness against all adversarial \(l_p\)-perturbations for \(p \ge 1\) (2019). ArXiv preprint arXiv:1905.11213

  • Croce, F., Andriushchenko, M., Hein, M.: Provable robustness of relu networks via maximization of linear regions. In: the 22nd International Conference on Artificial Intelligence and Statistics, PMLR, pp. 2057–2066 (2019)

  • Daniels, Z.A., Metaxas, D.: Scenarionet: An interpretable data-driven model for scene understanding. In: IJCAI Workshop on Explainable Artificial Intelligence (XAI) 2018 (2018)

  • Dapello, J., Marques, T., Schrimpf, M., Geiger, F., Cox, D.D., DiCarlo, J.J.: Simulating a primary visual cortex at the front of CNNs improves robustness to image perturbations (2020). bioRxiv https://doi.org/10.1101/2020.06.16.154542

  • Dean, S., Matni, N., Recht, B., Ye, V.: Robust guarantees for perception-based control. In: Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR, vol 120, 350–360 (2020)

  • Delseny, H., Gabreau, C., Gauffriau, A., Beaudouin, B., Ponsolle, L., Alecu, L., Bonnin, H., Beltran, B., Duchel, D., Ginestet, J.B., Hervieu, A., Martinez, G., Pasquet, S., Delmas, K., Pagetti, C., Gabriel, J.M., Chapdelaine, C., Picard, S., Damour, M., Cappi, C., Gardès, L., Grancey, F.D., Jenn, E., Lefevre, B., Flandin, G., Gerchinovitz, S., Mamalet, F., Albore, A.: White paper machine learning in certified systems (2021). ArXiv preprint arXiv:2103.10529

  • Demir, S., Eniser, H.F., Sen, A.: Deepsmartfuzzer: Reward guided test generation for deep learning (2019). ArXiv preprint arXiv:1911.10621


  • Deshmukh, J.V., Kapinski, JP., Yamaguchi, T., Prokhorov, D.: Learning deep neural network controllers for dynamical systems with safety guarantees. In: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), IEEE, pp. 1–7 (2019)

  • Dey, S., Dasgupta, P., Gangopadhyay, B.: Safety augmentation in decision trees. In: AISafety@ IJCAI (2020)

  • Dreossi, T., Ghosh, S., Sangiovanni-Vincentelli, A., Seshia, S.A.: Systematic testing of convolutional neural networks for autonomous driving (2017). ArXiv preprint arXiv:1708.03309

  • Duddu, V., Rao, DV., Balas, VE.: Adversarial fault tolerant training for deep neural networks (2019). ArXiv preprint arXiv:1907.03103

  • Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep neural networks (2017). ArXiv preprint arXiv:1709.09130

  • Dybå, T., Dingsøyr, T.: Empirical studies of agile software development: a systematic review. Inf. Softw. Technol. 50(9), 833–859 (2008). https://doi.org/10.1016/j.infsof.2008.01.006


  • Eniser, H.F., Gerasimou, S., Sen, A.: Deepfault: fault localization for deep neural networks. In: Hähnle, R., van der Aalst, W. (eds.) Fundamental Approaches to Software Engineering, pp. 171–191. Springer, Cham (2019)


  • Everett, M., Lütjens, B., How, J.P.: Certifiable robustness to adversarial state uncertainty in deep reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst (2021). https://doi.org/10.1109/TNNLS.2021.3056046


  • Fan, D.D., Nguyen, J., Thakker, R., Alatur, N., Agha-mohammadi, A.A., Theodorou, E.A.: Bayesian learning-based adaptive control for safety critical systems. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4093–4099 (2020). https://doi.org/10.1109/ICRA40945.2020.9196709

  • Feng, D., Rosenbaum, L., Glaeser, C., Timm, F., Dietmayer, K.: Can we trust you? on calibration of a probabilistic object detector for autonomous driving (2019). ArXiv preprint arXiv:1909.12358

  • Feng, Y., Shi, Q., Gao, X., Wan, J., Fang, C., Chen, Z.: Deepgini: Prioritizing massive tests to enhance the robustness of deep neural networks. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Association for Computing Machinery, New York, NY, USA, ISSTA 2020, pp. 177-188 (2020). https://doi.org/10.1145/3395363.3397357

  • Fisac, J.F., Akametalu, A.K., Zeilinger, M.N., Kaynama, S., Gillula, J., Tomlin, C.J.: A general safety framework for learning-based control in uncertain robotic systems. IEEE Trans. Autom. Control 64(7), 2737–2752 (2019). https://doi.org/10.1109/TAC.2018.2876389


  • François-Lavet, V., Henderson, P., Islam, R., Bellemare, M.G., Pineau, J.: An Introduction to Deep Reinforcement Learning. Found. Trends Mach. Learn. 11(3–4), 219–354 (2018)


  • Fremont, D.J., Chiu, J., Margineantu, D.D., Osipychev, D., Seshia, S.A.: Formal analysis and redesign of a neural network-based aircraft taxiing system with verifai. In: International Conference on Computer Aided Verification. Springer, Berlin. pp. 122–134 (2020)

  • Fujino, H., Kobayashi, N., Shirasaka, S.: Safety assurance case description method for systems incorporating off-operational machine learning and safety device. INCOSE Int. Symp. 29(S1), 152–164 (2019)


  • Fulton, N., Platzer, A.: Verifiably safe off-model reinforcement learning. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin. pp. 413–430 (2019)

  • Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, JMLR.org, ICML’16, pp. 1050-1059 (2016)

  • Gambi, A., Mueller, M., Fraser, G.: Automatically testing self-driving cars with search-based procedural content generation. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, New York, NY, USA, ISSTA 2019, pp. 318-328 (2019)

  • Gandhi, D., Pinto, L., Gupta, A.: Learning to fly by crashing. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 3948–3955 (2017)

  • Gauerhof, L., Munk, P., Burton, S.: Structuring validation targets of a machine learning function applied to automated driving. In: Gallina, B., Skavhaug, A., Bitsch, F. (eds.) Computer Safety, Reliability, and Security, pp. 45–58. Springer, Berlin (2018)


  • Gauerhof, L., Hawkins, R., Picardi, C., Paterson, C., Hagiwara, Y., Habli, I.: Assuring the safety of machine learning for pedestrian detection at crossings. In: Casimiro, A., Ortmeier, F., Bitsch, F., Ferreira, P. (eds.) Computer Safety, Reliability, and Security, pp. 197–212. Springer, Cham (2020)


  • Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: Ai2: Safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP), IEEE, pp. 3–18 (2018)

  • Ghosh, S., Berkenkamp, F., Ranade, G., Qadeer, S., Kapoor, A.: Verifying controllers against adversarial examples with bayesian optimization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 7306–7313 (2018a)

  • Ghosh, S., Jha, S., Tiwari, A., Lincoln, P., Zhu, X.: Model, data and reward repair: Trusted machine learning for markov decision processes. In: 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 194–199 (2018b)

  • Gladisch, C., Heinzemann, C., Herrmann, M., Woehrle, M.: Leveraging combinatorial testing for safety-critical computer vision datasets. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1314–1321 (2020)

  • Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2014). ArXiv preprint arXiv:1412.6572

  • Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, New York (2016)


  • Goodman, B., Flaxman, S.: European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine 38(3), 50–57 (2017)


  • Göpfert, J.P., Hammer, B., Wersing, H.: Mitigating concept drift via rejection. In: International Conference on Artificial Neural Networks. Springer, Berlin. pp. 456–467 (2018)

  • Gopinath, D., Taly, A., Converse, H., Pasareanu, C.S.: Finding invariants in deep neural networks (2019). ArXiv preprint arXiv:1904.13215v1

  • Gopinath, D., Katz, G., Păsăreanu, C.S., Barrett, C.: Deepsafe: A data-driven approach for assessing robustness of neural networks. In: Lahiri, S.K., Wang, C. (eds.) Automated Technology for Verification and Analysis, pp. 3–19. Springer, Cham (2018)


  • Grefenstette, E., Stanforth, R., O’Donoghue, B., Uesato, J., Swirszcz, G., Kohli, P.: Strength in numbers: Trading-off robustness and computation via adversarially-trained ensembles. CoRR abs/1811.09300 (2018). arXiv:1811.09300

  • Gros, T.P., Hermanns, H., Hoffmann, J., Klauck, M., Steinmetz, M.: Deep statistical model checking. In: International Conference on Formal Techniques for Distributed Objects, Components, and Systems. Springer, Berlin. pp. 96–114 (2020b)

  • Gros, S., Zanon, M., Bemporad, A.: Safe reinforcement learning via projection on a safe set: How to achieve optimality? (2020a). ArXiv preprint arXiv:2004.00915

  • Gschossmann, A., Jobst, S., Mottok, J., Bierl, R.: A measure of confidence of artificial neural network classifiers. In: ARCS Workshop 2019; 32nd International Conference on Architecture of Computing Systems, pp. 1–5 (2019)

  • Gu, X., Easwaran, A.: Towards safe machine learning for cps: infer uncertainty from training data. In: Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, pp. 249–258 (2019)

  • Gualo, F., Rodriguez, M., Verdugo, J., Caballero, I., Piattini, M.: Data quality certification using ISO/IEC 25012: Industrial experiences. J. Syst. Softw. 176, 110938 (2021)


  • Guidotti, D., Leofante, F., Castellini, C., Tacchella, A.: Repairing learned controllers with convex optimization: a case study. In: International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Springer, Berlin. pp. 364–373 (2019a)

  • Guidotti, R., Monreale, A., Giannotti, F., Pedreschi, D., Ruggieri, S., Turini, F.: Factual and counterfactual explanations for black box decision making. IEEE Intell. Syst. 34(6), 14–23 (2019)


  • Guo, W., Mu, D., Xu, J., Su, P., Wang, G., Xing, X.: Lemna: Explaining deep learning based security applications. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 364–379 (2018b)

  • Guo, J., Jiang, Y., Zhao, Y., Chen, Q., Sun, J.: DLFuzz: differential fuzzing testing of deep learning systems. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2018a)

  • Hart, P., Rychly, L., Knoll, A.: Lane-merging using policy-based reinforcement learning and post-optimization. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), IEEE, 3176–3181 (2019)

  • Hasanbeig, M., Kroening, D., Abate, A.: Towards verifiable and safe model-free reinforcement learning. In: CEUR Workshop Proceedings, CEUR Workshop Proceedings (2020)

  • Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)

  • Hein, M., Andriushchenko, M.: Formal guarantees on the robustness of a classifier against adversarial manipulation. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30 (2017). https://proceedings.neurips.cc/paper/2017/file/e077e1a544eec4f0307cf5c3c721d944-Paper.pdf

  • Heinzmann, L., Shafaei, S., Osman, M.H., Segler, C., Knoll, A.: A framework for safety violation identification and assessment in autonomous driving. In: AISafety@IJCAI (2019)

  • Hendrycks, D., Basart, S., Mazeika, M., Mostajabi, M., Steinhardt, J., Song, D.: Scaling out-of-distribution detection for real-world settings (2020). ArXiv preprint arXiv:1911.11132

  • Hendrycks, D., Carlini, N., Schulman, J., Steinhardt, J.: Unsolved problems in ml safety (2021). arXiv:2109.13916

  • Hendrycks, D., Dietterich, T.G.: Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=HJz6tiCqYm

  • Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net (2017). https://openreview.net/forum?id=Hkg4TI9xl

  • Henne, M., Schwaiger, A., Roscher, K., Weiss, G.: Benchmarking uncertainty estimation methods for deep learning with safety-related metrics. In: SafeAI@ AAAI, pp. 83–90 (2020)

  • Henriksson, J., Berger, C., Borg, M., Tornberg, L., Englund, C., Sathyamoorthy, S.R., Ursing, S.: Towards structured evaluation of deep neural network supervisors. In: 2019 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 27–34 (2019a)

  • Henriksson, J., Berger, C., Borg, M., Tornberg, L., Sathyamoorthy, S.R., Englund, C.: Performance analysis of out-of-distribution detection on various trained neural networks. In: 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 113–120 (2019b). https://doi.org/10.1109/SEAA.2019.00026

  • Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: International conference on computer aided verification. Springer, Berlin. pp. 3–29 (2017)

  • Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., Yi, X.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020). https://doi.org/10.1016/j.cosrev.2020.100270


  • Ignatiev, A., Pereira, F., Narodytska, N., Marques-Silva, J.: A sat-based approach to learn explainable decision sets. In: International Joint Conference on Automated Reasoning. Springer, Berlin. pp. 627–645 (2018)

  • Inouye, D.I., Leqi, L., Kim, J.S., Aragam, B., Ravikumar, P.: Diagnostic curves for black box models (2019). ArXiv preprint arXiv:1912.01108v1

  • Isele, D., Nakhaei, A., Fujimura, K.: Safe reinforcement learning on autonomous vehicles. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 1–6 (2018)

  • ISO (2018) ISO 26262: Road vehicles – Functional safety. International Organization for Standardization (ISO), Geneva, Switzerland

  • ISO (2019) ISO/PAS 21448: Road vehicles – Safety of the intended functionality. International Organization for Standardization (ISO), Geneva

  • Jain, D., Anumasa, S., Srijith, P.: Decision making under uncertainty with convolutional deep gaussian processes. In: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pp. 143–151 (2020)

  • Jeddi, A., Shafiee, M.J., Karg, M., Scharfenberger, C., Wong, A.: Learn2perturb: An end-to-end feature perturbation learning to improve adversarial robustness. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1238–1247 (2020). https://doi.org/10.1109/CVPR42600.2020.00132

  • Jin, W., Ma, Y., Liu, X., Tang, X., Wang, S., Tang, J.: Graph structure learning for robust graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, KDD ’20, pp. 66–74 (2020)

  • Julian, K.D., Kochenderfer, M.J.: Guaranteeing safety for neural network-based aircraft collision avoidance systems. In: 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), IEEE, pp. 1–10 (2019)

  • Julian, K.D., Lee, R., Kochenderfer, M.J.: Validation of image-based neural network controllers through adaptive stress testing. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1–7 (2020). https://doi.org/10.1109/ITSC45102.2020.9294549

  • Julian, K.D., Sharma, S., Jeannin, J.B., Kochenderfer, M.J.: Verifying aircraft collision avoidance neural networks through linear approximations of safe regions (2019). ArXiv preprint arXiv:1903.00762

  • Kandel, A., Moura, S.J.: Safe zero-shot model-based learning and control: a wasserstein distributionally robust approach (2020). ArXiv preprint arXiv:2004.00759

  • Kaprocki, N., Velikić, G., Teslić, N., Krunić, M.: Multiunit automotive perception framework: Synergy between AI and deterministic processing. In: 2019 IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), pp. 257–260 (2019)

  • Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: An efficient smt solver for verifying deep neural networks. In: International Conference on Computer Aided Verification. Springer, Berlin. pp. 97–117 (2017)

  • Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’17, pp. 5580–5590 (2017)

  • Kitchenham, B.: Procedures for performing systematic reviews. Joint Technical Report, Computer Science Department, Keele University (TR/SE-0401) and National ICT Australia Ltd (0400011T1) (2004)

  • Kitchenham, B., Pretorius, R., Budgen, D., Pearl Brereton, O., Turner, M., Niazi, M., Linkman, S.: Systematic literature reviews in software engineering - a tertiary study. Inf. Softw. Technol. 52(8), 792–805 (2010)


  • Kläs, M., Sembach, L.: Uncertainty wrappers for data-driven models. In: International Conference on Computer Safety, Reliability, and Security. Springer, Berlin. pp. 358–364 (2019)

  • Kornecki, A., Zalewski, J.: Software certification for safety-critical systems: A status report. In: 2008 International Multiconference on Computer Science and Information Technology, pp. 665–672 (2008). https://doi.org/10.1109/IMCSIT.2008.4747314

  • Kuppers, F., Kronenberger, J., Shantia, A., Haselhoff, A.: Multivariate confidence calibration for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 326–327 (2020)

  • Kuutti, S., Bowden, R., Joshi, H., de Temple, R., Fallah, S.: Safe deep neural network-driven autonomous vehicles using software safety cages. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A.J., Menezes, R., Allmendinger, R. (eds.) Intelligent Data Engineering and Automated Learning - IDEAL 2019. Lecture Notes in Computer Science, pp. 150–160. Springer, Berlin (2019)


  • Kuwajima, H., Tanaka, M., Okutomi, M.: Improving transparency of deep neural inference process. Progr. Artif. Intell. 8(2), 273–285 (2019)


  • Laidlaw, C., Feizi, S.: Playing it safe: adversarial robustness with an abstain option (2019). ArXiv preprint arXiv:1911.11253

  • Le, M.T., Diehl, F., Brunner, T., Knol, A.: Uncertainty estimation for deep neural object detectors in safety-critical applications. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 3873–3878 (2018)

  • Le, H., Voloshin, C., Yue, Y.: Batch policy learning under constraints. In: International Conference on Machine Learning, PMLR, pp. 3703–3712 (2019)

  • Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., Jana, S.: On the connection between differential privacy and adversarial robustness in machine learning (2018). ArXiv preprint arXiv:1802.03471v1

  • Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’18, pp. 7167–7177 (2018)

  • Lee, K., An, G.N., Zakharov, V., Theodorou, E.A.: Perceptual attention-based predictive control (2019a). ArXiv preprint arXiv:1904.11898

  • Lee, K., Wang, Z., Vlahov, B., Brar, H., Theodorou, E.A.: Ensemble bayesian decision making with redundant deep perceptual control policies. In: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), IEEE, pp. 831–837 (2019b)

  • Levi, D., Gispan, L., Giladi, N., Fetaya, E.: Evaluating and calibrating uncertainty prediction in regression tasks (2019). ArXiv preprint arXiv:1905.11659

  • Li, S., Chen, Y., Peng, Y., Bai, L.: Learning more robust features with adversarial training. ArXiv preprint arXiv:1804.07757 (2018)

  • Li, J., Liu, J., Yang, P., Chen, L., Huang, X., Zhang, L.: Analyzing deep neural networks with symbolic propagation: towards higher precision and faster verification. In: International Static Analysis Symposium. Springer, Berlin. pp. 296–319 (2019a)

  • Li, Y., Liu, Y., Li, M., Tian, Y., Luo, B., Xu, Q.: D2NN: A fine-grained dual modular redundancy framework for deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC’19), ACM, New York, NY, USA, pp. 138-147 (2019b)

  • Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks (2020). ArXiv preprint arXiv:1706.02690

  • Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning (2015). ArXiv preprint arXiv:1509.02971

  • Lin, W., Yang, Z., Chen, X., Zhao, Q., Li, X., Liu, Z., He, J.: Robustness verification of classification deep neural networks via linear programming. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11418–11427 (2019)

  • Liu, M., Liu, S., Su, H., Cao, K., Zhu, J.: Analyzing the noise robustness of deep neural networks. In: 2018 IEEE Conference on Visual Analytics Science and Technology (VAST), IEEE, pp. 60–71, (2018)

  • Liu, L., Saerbeck, M., Dauwels, J.: Affine disentangled gan for interpretable and robust av perception (2019). ArXiv preprint arXiv:1907.05274

  • Liu, J., Shen, Z., Cui, P., Zhou, L., Kuang, K., Li, B., Lin, Y.: Invariant adversarial learning for distributional robustness (2020). ArXiv preprint arXiv:2006.04414

  • Loquercio, A., Segu, M., Scaramuzza, D.: A general framework for uncertainty estimation in deep learning. IEEE Robot. Autom. Lett. 5(2), 3153–3160 (2020). https://doi.org/10.1109/LRA.2020.2974682


  • Lust, J., Condurache, A.P.: Gran: An efficient gradient-norm based detector for adversarial and misclassified examples (2020). ArXiv preprint arXiv:2004.09179

  • Lütjens, B., Everett, M., How, J.P.: Safe reinforcement learning with model uncertainty estimates. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp. 8662–8668 (2019)

  • Lyu, Z., Ko, C.Y., Kong, Z., Wong, N., Lin, D., Daniel, L.: Fastened crown: Tightened neural network robustness certificates. Proc. AAAI Conf. Artif. Intell. 34, 5037–5044 (2020)


  • Ma, L., Juefei-Xu, F., Xue, M., Li, B., Li, L., Liu, Y., Zhao, J.: DeepCT: Tomographic combinatorial testing for deep learning systems. In: 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 614–618 (2019)

  • Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L., Liu, Y., Zhao, J., Wang, Y.: Deepgauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ACM, New York, NY, USA, ASE 2018, pp. 120-131 (2018). https://doi.org/10.1145/3238147.3238202

  • Machida, F.: N-version machine learning models for safety critical systems. In: 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 48–51 (2019)

  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks (2017). ArXiv preprint arXiv:1706.06083

  • Mani, N., Moh, M., Moh, T.S.: Towards robust ensemble defense against adversarial examples attack. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2019a)

  • Mani, S., Sankaran, A., Tamilselvam, S., Sethi, A.: Coverage testing of deep learning models using dataset characterization (2019b). ArXiv preprint arXiv:1911.07309

  • Marvi, Z., Kiumarsi, B.: Safe off-policy reinforcement learning using barrier functions. In: 2020 American Control Conference (ACC), IEEE, pp. 2176–2181 (2020)

  • Meinke, A., Hein, M.: Towards neural networks that provably know when they don’t know (2019). ArXiv preprint arXiv:1909.12180

  • Meyes, R., de Puiseau, C.W., Posada-Moreno, A., Meisen, T.: Under the hood of neural networks: Characterizing learned representations by functional neuron populations and network ablations (2020a). ArXiv preprint arXiv:2004.01254

  • Meyes, R., Schneider, M., Meisen, T.: How do you act? an empirical study to understand behavior of deep reinforcement learning agents (2020b). ArXiv preprint arXiv:2004.03237

  • Michelmore, R., Kwiatkowska, M., Gal, Y.: Evaluating uncertainty quantification in end-to-end autonomous driving control (2018). ArXiv preprint arXiv:1811.06817

  • Mirman, M., Gehr, T., Vechev, M.: Differentiable abstract interpretation for provably robust neural networks. In: International Conference on Machine Learning, PMLR, pp. 3578–3586 (2018)

  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)


  • Moravčík, M., Schmid, M., Burch, N., Lisỳ, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M.: Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337), 508–513 (2017)


  • Müller, S., Hospach, D., Bringmann, O., Gerlach, J., Rosenstiel, W.: Robustness evaluation and improvement for vision-based advanced driver assistance systems. In: 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pp. 2659–2664 (2015)

  • Naseer, M., Minhas, M.F., Khalid, F., Hanif, M.A., Hasan, O., Shafique, M.: Fannet: formal analysis of noise tolerance, training bias and input sensitivity in neural networks. In: 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, pp. 666–669 (2020)

  • Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2018)


  • Nguyen, H.H., Matschek, J., Zieger, T., Savchenko, A., Noroozi, N., Findeisen, R.: Towards nominal stability certification of deep learning-based controllers. In: 2020 American Control Conference (ACC), IEEE, 3886–3891 (2020)

  • Nowak, T., Nowicki, M.R., Ćwian, K., Skrzypczyński, P.: How to improve object detection in a driver assistance system applying explainable deep learning. In: 2019 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 226–231 (2019)

  • O’Brien, M., Goble, W., Hager, G., Bukowski, J.: Dependable neural networks for safety critical tasks. In: International Workshop on Engineering Dependable and Secure Machine Learning Systems. Springer, Berlin. pp. 126–140 (2020)

  • Pan, R.: Static deep neural network analysis for robustness. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, New York, NY, USA, ESEC/FSE 2019, pp. 1238-1240 (2019)

  • Pandian, M.K.S., Dajsuren, Y., Luo, Y., Barosan, I.: Analysis of iso 26262 compliant techniques for the automotive domain. In: MASE@MoDELS (2015)

  • Park, C., Kim, J.M., Ha, S.H., Lee, J.: Sampling-based bayesian inference with gradient uncertainty (2018). ArXiv preprint arXiv:1812.03285

  • Pauli, P., Koch, A., Berberich, J., Kohler, P., Allgöwer, F.: Training robust neural networks using lipschitz bounds. IEEE Control Syst. Lett. 6, 121–126 (2022). https://doi.org/10.1109/LCSYS.2021.3050444


  • Pedreschi, D., Giannotti, F., Guidotti, R., Monreale, A., Pappalardo, L., Ruggieri, S., Turini, F.: Open the black box data-driven explanation of black box decision systems (2018). ArXiv preprint arXiv:1806.09936

  • Pedroza, G., Adedjouma, M.: Safe-by-Design Development Method for Artificial Intelligent Based Systems. In: SEKE 2019 : The 31st International Conference on Software Engineering and Knowledge Engineering, Lisbon, Portugal, pp. 391–397 (2019)

  • Pei, K., Cao, Y., Yang, J., Jana, S.: DeepXplore: Automated whitebox testing of deep learning systems. In: Proceedings of the 26th Symposium on Operating Systems Principles (2017a)

  • Pei, K., Cao, Y., Yang, J., Jana, S.: Towards practical verification of machine learning: The case of computer vision systems (2017b). ArXiv preprint arXiv:1712.01785

  • Peng, W., Ye, Z.S., Chen, N.: Bayesian deep-learning-based health prognostics toward prognostics uncertainty. IEEE Trans. Ind. Electron. 67(3), 2283–2293 (2019)


  • Postels, J., Ferroni, F., Coskun, H., Navab, N., Tombari, F.: Sampling-free epistemic uncertainty estimation using approximated variance propagation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2931–2940 (2019)

  • Rahimi, M., Guo, J.L., Kokaly, S., Chechik, M.: Toward requirements specification for machine-learned components. In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), 241–244 (2019)

  • Rajabli, N., Flammini, F., Nardone, R., Vittorini, V.: Software verification and validation of safe autonomous cars: A systematic literature review. IEEE Access 9, 4797–4819 (2021). https://doi.org/10.1109/ACCESS.2020.3048047


  • Rakin, A.S., He, Z., Fan, D.: Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack (2018). ArXiv preprint arXiv:1811.09310

  • Ramanagopal, M.S., Anderson, C., Vasudevan, R., Johnson-Roberson, M.: Failing to learn: Autonomously identifying perception failures for self-driving cars. IEEE Robot. Autom. Lett. 3(4), 3860–3867 (2018)


  • Reeb, D., Doerr, A., Gerwinn, S., Rakitsch, B.: Learning gaussian processes by minimizing pac-bayesian generalization bounds. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’18, pp. 3341-3351 (2018)

  • Remeli, V., Morapitiye, S., Rövid, A., Szalay, Z.: Towards verifiable specifications for neural networks in autonomous driving. In: 2019 IEEE 19th International Symposium on Computational Intelligence and Informatics and 7th IEEE International Conference on Recent Achievements in Mechatronics, Automation, Computer Sciences and Robotics (CINTI-MACRo), IEEE, pp. 000175–000180 (2019)

  • Ren, H., Chandrasekar, S.K., Murugesan, A.: Using quantifier elimination to enhance the safety assurance of deep neural networks. In: 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), IEEE, pp. 1–8 (2019a)

  • Ren, J., Liu, P.J., Fertig, E., Snoek, J., Poplin, R., DePristo, M.A., Dillon, J.V., Lakshminarayanan, B.: Likelihood Ratios for Out-of-Distribution Detection, pp. 14707–14718. Curran Associates Inc., Red Hook, NY, USA (2019)

  • Ren, K., Zheng, T., Qin, Z., Liu, X.: Adversarial attacks and defenses in deep learning. Engineering 6(3), 346–360 (2020)


  • Revay, M., Wang, R., Manchester, I.R.: A convex parameterization of robust recurrent neural networks. IEEE Control Syst. Lett. 5(4), 1363–1368 (2020)


  • Ribeiro, M.T., Singh, S., Guestrin, C.: “why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144 (2016)

  • Richards, S.M., Berkenkamp, F., Krause, A.: The lyapunov neural network: adaptive stability certification for safe learning of dynamical systems. In: Conference on Robot Learning, PMLR, pp. 466–476 (2018)

  • Rodriguez-Dapena, P.: Software safety certification: a multidomain problem. IEEE Softw. 16(4), 31–38 (1999). https://doi.org/10.1109/52.776946


  • Roh, Y., Heo, G., Whang, S.E.: A survey on data collection for machine learning: A big data - ai integration perspective. IEEE Trans. Knowl. Data Eng. 33(4), 1328–1347 (2021)


  • Ruan, W., Wu, M., Sun, Y., Huang, X., Kroening, D., Kwiatkowska, M.: Global robustness evaluation of deep neural networks with provable guarantees for the hamming distance. In: IJCAI2019 (2019)

  • Rubies-Royo, V., Calandra, R., Stipanovic, D.M., Tomlin, C.: Fast neural network verification via shadow prices (2019). ArXiv preprint arXiv:1902.07247

  • Rudolph, A., Voget, S., Mottok, J.: A consistent safety case argumentation for artificial intelligence in safety related automotive systems. In: 9th European Congress on Embedded Real Time Software and Systems (ERTS 2018), Toulouse, France (2018)

  • Rusak, E., Schott, L., Zimmermann, R., Bitterwolf, J., Bringmann, O., Bethge, M., Brendel, W.: Increasing the robustness of dnns against image corruptions by playing the game of noise (2020). ArXiv preprint arXiv:2001.06057

  • Salay, R., Angus, M., Czarnecki, K.: A safety analysis method for perceptual components in automated driving. In: 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), IEEE, pp. 24–34 (2019)

  • Salay, R., Czarnecki, K.: Using machine learning safely in automotive software: An assessment and adaption of software process requirements in iso 26262 (2018). ArXiv preprint arXiv:1808.01614

  • Scheel, O., Schwarz, L., Navab, N., Tombari, F.: Explicit domain adaptation with loosely coupled samples (2020). ArXiv preprint arXiv:2004.11995

  • Sehwag, V., Bhagoji, A.N., Song, L., Sitawarin, C., Cullina, D., Chiang, M., Mittal, P.: Analyzing the robustness of open-world machine learning. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, ACM, New York, NY, USA, AISec’19, pp. 105-116 (2019)

  • Sehwag, V., Wang, S., Mittal, P., Jana, S.: On pruning adversarially robust neural networks (2020). ArXiv preprint arXiv:2002.10509

  • Sekhon, J., Fleming, C.: Towards improved testing for deep learning. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pp. 85–88 (2019)

  • Sena, L.H., Bessa, I.V., Gadelha, M.R., Cordeiro, L.C., Mota, E.: Incremental bounded model checking of artificial neural networks in cuda. In: 2019 IX Brazilian Symposium on Computing Systems Engineering (SBESC), IEEE, pp. 1–8 (2019)

  • Sheikholeslami, F., Jain, S., Giannakis, G.B.: Minimum uncertainty based detection of adversaries in deep neural networks. In: 2020 Information Theory and Applications Workshop (ITA), IEEE, pp. 1–16 (2020)

  • Shwartz-Ziv, R., Tishby, N.: Opening the black box of deep neural networks via information (2017). ArXiv preprint arXiv:1703.00810

  • Singh, G., Gehr, T., Püschel, M., Vechev, M.: Boosting robustness certification of neural networks. In: International Conference on Learning Representations (2018)

  • Sinha, A., Namkoong, H., Volpi, R., Duchi, J.: Certifying some distributional robustness with principled adversarial training (2017). ArXiv preprint arXiv:1710.10571

  • Smith, M.T., Grosse, K., Backes, M., Alvarez, M.A.: Adversarial vulnerability bounds for gaussian process classification (2019). ArXiv preprint arXiv:1909.08864

  • Sohn, J., Kang, S., Yoo, S.: Search based repair of deep neural networks (2019). ArXiv preprint arXiv:1912.12463

  • Steinhardt, J., Koh, P.W., Liang, P.: Certified defenses for data poisoning attacks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’17, pp. 3520-3532 (2017)

  • Summers, C., Dinneen, M.J.: Improved adversarial robustness via logit regularization methods (2019). ArXiv preprint arXiv:1906.03749

  • Sun, Y., Huang, X., Kroening, D., Sharp, J., Hill, M., Ashmore, R.: DeepConcolic: Testing and debugging deep neural networks. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pp. 111–114 (2019)

  • Sun, Y., Huang, X., Kroening, D., Sharp, J., Hill, M., Ashmore, R.: Structural test coverage criteria for deep neural networks. ACM Trans. Embed. Comput. Syst. 18, 5 (2019)


  • Syriani, E., Luhunu, L., Sahraoui, H.: Systematic mapping study of template-based code generation. Comput. Lang. Syst. Struct. 52, 43–62 (2018)


  • Taha, A., Chen, Y., Misu, T., Shrivastava, A., Davis, L.: Unsupervised data uncertainty learning in visual retrieval systems. CoRR abs/1902.02586 (2019). arXiv:1902.02586

  • Tang, Y.C., Zhang, J., Salakhutdinov, R.: Worst cases policy gradients (2019). ArXiv preprint arXiv:1911.03618

  • Tian, Y., Pei, K., Jana, S., Ray, B.: DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, ACM, New York, NY, USA, ICSE ’18, pp. 303-314 (2018)

  • Tian, Y., Zhong, Z., Ordonez, V., Kaiser, G., Ray, B.: Testing dnn image classifiers for confusion & bias errors. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Association for Computing Machinery, New York, NY, USA, ICSE ’20, pp. 1122-1134 (2020). https://doi.org/10.1145/3377811.3380400

  • Törnblom, J., Nadjm-Tehrani, S.: Formal verification of input-output mappings of tree ensembles. Sci. Comput. Progr. 194, 102450 (2020)


  • Toubeh, M., Tokekar, P.: Risk-aware planning by confidence estimation using deep learning-based perception (2019). ArXiv preprint arXiv:1910.00101

  • Tran, H.D., Musau, P., Lopez, D.M., Yang, X., Nguyen, L.V., Xiang, W., Johnson, T.T.: Parallelizable reachability analysis algorithms for feed-forward neural networks. In: 2019 IEEE/ACM 7th International Conference on Formal Methods in Software Engineering (FormaliSE), IEEE, pp. 51–60 (2019)

  • Tran, H.D., Yang, X., Lopez, D.M., Musau, P., Nguyen, L.V., Xiang, W., Bak, S., Johnson, T.T.: NNV: The neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. In: International Conference on Computer Aided Verification. Springer, Berlin. pp. 3–17 (2020)

  • Tuncali, C.E., Fainekos, G., Ito, H., Kapinski, J.: Simulation-based adversarial test generation for autonomous vehicles with machine learning components. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1555–1562 (2018)

  • Turchetta, M., Berkenkamp, F., Krause, A.: Safe exploration in finite markov decision processes with gaussian processes. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’16, pp. 4312-4320 (2016)

  • Udeshi, S., Jiang, X., Chattopadhyay, S.: Callisto: Entropy-based test generation and data quality assessment for machine learning systems. In: 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), pp. 448–453 (2020)

  • Uesato, J., Kumar, A., Szepesvari, C., Erez, T., Ruderman, A., Anderson, K., Heess, N., Kohli, P. et al.: Rigorous agent evaluation: An adversarial approach to uncover catastrophic failures (2018). ArXiv preprint arXiv:1812.01647

  • Varghese, S., Bayzidi, Y., Bar, A., Kapoor, N., Lahiri, S., Schneider, J.D., Schmidt, N.M., Schlicht, P., Huger, F., Fingscheidt, T.: Unsupervised temporal consistency metric for video segmentation in highly-automated driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 336–337 (2020)

  • Vidot, G., Gabreau, C., Ober, I., Ober, I.: Certification of embedded systems based on machine learning: a survey (2021). arXiv:2106.07221

  • Vijaykeerthy, D., Suri, A., Mehta, S., Kumaraguru, P.: Hardening deep neural networks via adversarial model cascades (2018). arXiv:1802.01448

  • Wabersich, K.P., Zeilinger, M.: Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling. In: Learning for Dynamics and Control, PMLR, pp. 455–464 (2020a)

  • Wabersich, K.P., Zeilinger, M.N.: Performance and safety of bayesian model predictive control: scalable model-based RL with guarantees (2020b). ArXiv preprint arXiv:2006.03483

  • Wabersich, K.P., Hewing, L., Carron, A., Zeilinger, M.N.: Probabilistic model predictive safety certification for learning-based control. IEEE Trans. Autom. Control 2021, 10 (2021)


  • Wagner, J., Kohler, J.M., Gindele, T., Hetzel, L., Wiedemer, J.T., Behnke, S.: Interpretable and fine-grained visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9097–9107 (2019)

  • Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Efficient formal safety analysis of neural networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 31 (2018a). https://proceedings.neurips.cc/paper/2018/file/2ecd2bd94734e5dd392d8678bc64cdab-Paper.pdf

  • Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Formal security analysis of neural networks using symbolic intervals. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1599–1614 (2018b)

  • Wang, T.E., Gu, Y., Mehta, D., Zhao, X., Bernal, E.A.: Towards robust deep neural networks (2018c). arXiv preprint arXiv:1810.11726

  • Wang, W., Wang, A., Tamar, A., Chen, X., Abbeel, P.: Safer classification by synthesis (2018d). arXiv preprint arXiv:1711.08534

  • Wang, Y., Jha, S., Chaudhuri, K.: Analyzing the robustness of nearest neighbors to adversarial examples. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, PMLR, Proceedings of Machine Learning Research, vol. 80, pp. 5133–5142 (2018e). https://proceedings.mlr.press/v80/wang18c.html

  • Wang, J., Gou, L., Zhang, W., Yang, H., Shen, H.W.: DeepVID: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Trans. Vis. Comput. Graph. 25(6), 2168–2180 (2019a)

  • Wang, Y.S., Weng, T.W., Daniel, L.: Verification of neural network control policy under persistent adversarial perturbation (2019b). arXiv preprint arXiv:1908.06353

  • Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

  • Wen, M., Topcu, U.: Constrained cross-entropy method for safe reinforcement learning. IEEE Trans. Autom. Control 66, 7 (2020)

  • Wen, J., Li, S., Lin, Z., Hu, Y., Huang, C.: Systematic literature review of machine learning based software development effort estimation models. Inf. Softw. Technol. 54(1), 41–59 (2012)

  • Weyuker, E.J.: On testing non-testable programs. Comput. J. 25(4), 465–470 (1982)

  • Wicker, M., Huang, X., Kwiatkowska, M.: Feature-guided black-box safety testing of deep neural networks. In: Beyer, D., Huisman, M. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, pp. 408–426. Springer, Cham (2018)

  • Wolschke, C., Kuhn, T., Rombach, D., Liggesmeyer, P.: Observation based creation of minimal test suites for autonomous vehicles. In: 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 294–301 (2017)

  • Wu, M., Wicker, M., Ruan, W., Huang, X., Kwiatkowska, M.: A game-based approximate verification of deep neural networks with provable guarantees. Theoret. Comput. Sci. 807, 298–329 (2020)

  • Xiang, W., Lopez, D.M., Musau, P., Johnson, T.T.: Reachable set estimation and verification for neural network models of nonlinear dynamic systems. In: Safe, Autonomous and Intelligent Vehicles. Springer, Berlin. pp. 123–144 (2019)

  • Xie, X., Ma, L., Juefei-Xu, F., Xue, M., Chen, H., Liu, Y., Zhao, J., Li, B., Yin, J., See, S.: DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, New York, NY, USA, ISSTA 2019, pp. 146–157 (2019)

  • Xu, H., Chen, Z., Wu, W., Jin, Z., Kuo, S., Lyu, M.: NV-DNN: Towards fault-tolerant DNN systems with N-version programming. In: 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 44–47 (2019)

  • Yaghoubi, S., Fainekos, G.: Gray-box adversarial testing for control systems with machine learning components. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, ACM, New York, NY, USA, HSCC ’19, pp. 179–184 (2019)

  • Yan, Y., Pei, Q.: A robust deep-neural-network-based compressed model for mobile device assisted by edge server. IEEE Access 7, 179104–179117 (2019)

  • Yan, M., Wang, L., Fei, A.: ARTDL: Adaptive random testing for deep learning systems. IEEE Access 8, 3055–3064 (2020)

  • Yang, Y., Vamvoudakis, K.G., Modares, H.: Safe reinforcement learning for dynamical games. Int. J. Robust Nonlinear Control 30(9), 3706–3726 (2020)

  • Ye, S., Tan, S.H., Xu, K., Wang, Y., Bao, C., Ma, K.: Brain-inspired reverse adversarial examples (2019). arXiv preprint arXiv:1905.12171

  • Youn, W., Yi, B.J.: Software and hardware certification of safety-critical avionic systems: A comparison study. Comput. Stand. Interfaces 36(6), 889–898 (2014). https://doi.org/10.1016/j.csi.2014.02.005

  • Youn, W.K., Hong, S.B., Oh, K.R., Ahn, O.S.: Software certification of safety-critical avionic systems: DO-178C and its impacts. IEEE Aerospace Electron. Syst. Mag. 30(4), 4–13 (2015)

  • Zhan, W., Li, J., Hu, Y., Tomizuka, M.: Safe and feasible motion generation for autonomous driving via constrained policy net. In: IECON 2017-43rd Annual Conference of the IEEE Industrial Electronics Society, IEEE, pp. 4588–4593 (2017)

  • Zhang, J., Li, J.: Testing and verification of neural-network-based safety-critical control software: a systematic literature review. Inf. Softw. Technol. 123, 106296 (2020). https://doi.org/10.1016/j.infsof.2020.106296

  • Zhang, M., Li, H., Kuang, X., Pang, L., Wu, Z.: Neuron selecting: Defending against adversarial examples in deep neural networks. In: International Conference on Information and Communications Security. Springer, Berlin. pp. 613–629 (2019a)

  • Zhang, P., Dai, Q., Ji, S.: Condition-guided adversarial generative testing for deep learning systems. In: 2019 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 71–77 (2019b)

  • Zhang, J., Cheung, B., Finn, C., Levine, S., Jayaraman, D.: Cautious adaptation for reinforcement learning in safety-critical settings. In: International Conference on Machine Learning, PMLR, pp. 11055–11065 (2020a)

  • Zhang, J.M., Harman, M., Ma, L., Liu, Y.: Machine learning testing: Survey, landscapes and horizons. IEEE Trans. Softw. Eng. 1, 1 (2020b). https://doi.org/10.1109/TSE.2019.2962027

  • Zhao, C., Yang, J., Liang, J., Li, C.: Discover learning behavior patterns to predict certification. In: 2016 11th International Conference on Computer Science & Education (ICCSE), IEEE, pp. 69–73 (2016)

Acknowledgements

We would like to thank the following authors (in no particular order) who kindly provided us with feedback on our review of their work: Mahum Naseer, Hoang-Dung Tran, Jie Ren, David Isele, Jesse Zhang, Michaela Klauck, Guy Katz, Patrick Hart, Guy Amit, Yu Li, Anurag Arnab, Tiago Marques, Taylor T. Johnson, Molly O’Brien, Kimin Lee, Lukas Heinzmann, Björn Lütjens, Brendon G. Anderson, Marta Kwiatkowska, Patricia Pauli, Anna Monreale, Alexander Amini, Joerg Wagner, Adrian Schwaiger, Aman Sinha, Joel Dapello, Kim Peter Wabersich. Many thanks also go to Freddy Lécué from Thales, who provided us with feedback on an early version of this manuscript. They all contributed to improving this SLR.

Author information

Corresponding author

Correspondence to Florian Tambon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This work is supported by the DEEL Project CRDPJ 537462-18 funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Consortium for Research and Innovation in Aerospace in Québec (CRIAQ), together with its industrial partners Thales Canada Inc., Bell Textron Canada Ltd., CAE Inc. and Bombardier Inc.

Supplementary Information

Below is the link to the electronic supplementary material.

Electronic supplementary material 1 (PDF 596 kb)

Appendices

We provide the list of the papers used in each section (Table 3). Note that some papers appear only in the complementary material, so as to give readers detailed information while keeping the main review concise.

Table 3 Paper references for each section. Note that the “Others” and “Explainability/Interpretable Model” categories are not presented in the main text, only in the Complementary Material, in order to keep the paper concise

About this article

Cite this article

Tambon, F., Laberge, G., An, L. et al. How to certify machine learning based safety-critical systems? A systematic literature review. Autom Softw Eng 29, 38 (2022). https://doi.org/10.1007/s10515-022-00337-x
