
An Investigation of Challenges Encountered When Specifying Training Data and Runtime Monitors for Safety Critical ML Applications

  • Conference paper
Requirements Engineering: Foundation for Software Quality (REFSQ 2023)

Abstract

[Context and motivation] The development and operation of critical software that contains machine learning (ML) models requires diligence and established processes. In particular, the training data used during the development of ML models has a major influence on the later behaviour of the system, and runtime monitors are used to provide guarantees for that behaviour. [Question/problem] We see major uncertainty in how to specify training data and runtime monitoring for critical ML models, and thereby the final functionality of the system. In this interview-based study, we investigate the challenges underlying these difficulties. [Principal ideas/results] Based on ten interviews with practitioners who develop ML models for critical applications in the automotive and telecommunication sectors, we identified 17 underlying challenges in 6 challenge groups that relate to the difficulty of specifying training data and runtime monitoring. [Contribution] The article provides a list of the identified underlying challenges related to the difficulties practitioners experience when specifying training data and runtime monitoring for ML models. Furthermore, interconnections between the challenges were found, and based on these connections recommendations are proposed to overcome the root causes of the challenges.
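The runtime monitors mentioned in the abstract can be illustrated with a minimal, hypothetical sketch: a monitor that rejects model outputs whose top-class softmax confidence falls below a threshold, deferring to a fallback. The threshold value, class separation, and `ConfidenceMonitor` name are illustrative assumptions, not a design taken from the paper.

```python
# Illustrative sketch only: a confidence-threshold runtime monitor.
# The paper does not prescribe this design; threshold and inputs are assumed.
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw model scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class ConfidenceMonitor:
    """Flags model outputs whose top-class probability is below a threshold."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.rejections = 0  # count of outputs the monitor refused to pass on

    def check(self, logits):
        """Return (accepted, confidence) for one model output."""
        probs = softmax(logits)
        confidence = max(probs)
        accepted = confidence >= self.threshold
        if not accepted:
            self.rejections += 1
        return accepted, confidence

monitor = ConfidenceMonitor(threshold=0.8)
ok, conf1 = monitor.check([4.0, 0.5, 0.2])    # clearly separated logits
low, conf2 = monitor.check([1.0, 0.9, 0.8])   # ambiguous logits
```

In this sketch the first, well-separated output is accepted while the ambiguous one is rejected; real monitors discussed in the literature (e.g., neuron activation patterns [12], model assertions [28]) use richer signals than raw confidence.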

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 957197.


Notes

  1. Non-governmental organisations, e.g., https://algorithmwatch.org/en/stories/.

  2. We define critical software as software that is safety, privacy, ethically, and/or mission critical, i.e., a failure in the software can cause significant injury or the loss of life, invasion of personal privacy, violation of human rights, and/or significant economic or environmental consequences [31].

  3. The interview guide is available at https://doi.org/10.7910/DVN/WJ8TKY.

  4. The list included functional safety experts, requirement engineers, product owners or function owners, function or model developers, and data engineers.

  5. Very efficient deep learning in the Internet of Things.

References

  1. Abid, A., Farooqi, M., Zou, J.: Persistent anti-muslim bias in large language models. In: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 298–306 (2021)


  2. Ashmore, R., Calinescu, R., Paterson, C.: Assuring the machine learning lifecycle: Desiderata, methods, and challenges. ACM Comput. Surv. 54(5), 1–39 (2021)


  3. Banko, M., Brill, E.: Scaling to very very large corpora for natural language disambiguation. In: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pp. 26–33 (2001)


  4. Barocas, S., Selbst, A.D.: Big data’s disparate impact. Calif. L. Rev. 104, 671 (2016)


  5. Bayram, F., Ahmed, B.S., Kassler, A.: From concept drift to model degradation: An overview on performance-aware drift detectors. Knowl. Based Syst. 108632 (2022)


  6. Bencomo, N., Guo, J.L., Harrison, R., Heyn, H.M., Menzies, T.: The secret to better ai and better software (is requirements engineering). IEEE Softw. 39(1), 105–110 (2021)


  7. Bencomo, N., Whittle, J., Sawyer, P., Finkelstein, A., Letier, E.: Requirements reflection: requirements as runtime entities. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, vol. 2, pp. 199–202 (2010)


  8. Bernhardt, M., Jones, C., Glocker, B.: Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms. Nat. Med. 1–2 (2022)


  9. Blodgett, S.L., Barocas, S., Daumé III, H., Wallach, H.M.: Language (technology) is power: A critical survey of “bias” in NLP. In: ACL (2020)


  10. Borg, M., et al.: Safely entering the deep: A review of verification and validation for machine learning and a challenge elicitation in the automotive industry. J. Automotive Softw. Eng. 1(1), 1–19 (2018)


  11. Breck, E., Cai, S., Nielsen, E., Salib, M., Sculley, D.: The ml test score: A rubric for ml production readiness and technical debt reduction. In: 2017 IEEE International Conference on Big Data, pp. 1123–1132. IEEE (2017)


  12. Cheng, C.H., Nührenberg, G., Yasuoka, H.: Runtime monitoring neuron activation patterns. In: 2019 Design, Automation & Test in Europe Conference & Exhibition, pp. 300–303. IEEE (2019)


  13. Creswell, J.W., Creswell, J.D.: Research design: Qualitative, quantitative, and mixed methods approaches. Sage publications (2017)


  14. Creswell, J.W., Poth, C.N.: Qualitative Inquiry and Research Design: Choosing Among Five Approaches, 4th edn. Sage Publishing (2017)


  15. Fabbrizzi, S., Papadopoulos, S., Ntoutsi, E., Kompatsiaris, I.: A survey on bias in visual datasets. arXiv preprint arXiv:2107.07919 (2021)

  16. Fauri, D., Dos Santos, D.R., Costante, E., den Hartog, J., Etalle, S., Tonetta, S.: From system specification to anomaly detection (and back). In: Proceedings of the 2017 Workshop on Cyber-Physical Systems Security and PrivaCy, pp. 13–24 (2017)


  17. Giese, H., et al.: Living with uncertainty in the age of runtime models. In: Bencomo, N., France, R., Cheng, B.H.C., Aßmann, U. (eds.) Models@run.time. LNCS, vol. 8378, pp. 47–100. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08915-7_3


  18. Ginart, T., Zhang, M.J., Zou, J.: Mldemon: Deployment monitoring for machine learning systems. In: International Conference on Artificial Intelligence and Statistics, pp. 3962–3997. PMLR (2022)


  19. Goodman, B., Flaxman, S.: European union regulations on algorithmic decision-making and a “right to explanation”. AI Mag. 38(3), 50–57 (2017)


  20. Gwilliam, M., Hegde, S., Tinubu, L., Hanson, A.: Rethinking common assumptions to mitigate racial bias in face recognition datasets. In: Proceedings of the IEEE CVF, pp. 4123–4132 (2021)


  21. Habibullah, K.M., Horkoff, J.: Non-functional requirements for machine learning: understanding current use and challenges in industry. In: 2021 IEEE 29th RE Conference, pp. 13–23. IEEE (2021)


  22. Heyn, H.-M., Subbiah, P., Linder, J., Knauss, E., Eriksson, O.: Setting AI in context: a case study on defining the context and operational design domain for automated driving. In: Gervasi, V., Vogelsang, A. (eds.) REFSQ 2022. LNCS, vol. 13216, pp. 199–215. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98464-9_16


  23. Horkoff, J.: Non-functional requirements for machine learning: Challenges and new directions. In: 2019 IEEE 27th RE Conference, pp. 386–391. IEEE (2019)


  24. Humbatova, N., Jahangirova, G., Bavota, G., Riccio, V., Stocco, A., Tonella, P.: Taxonomy of real faults in deep learning systems. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering, pp. 1110–1121 (2020)


  25. Ishikawa, F., Yoshioka, N.: How do engineers perceive difficulties in engineering of machine-learning systems?-questionnaire survey. In: 2019 IEEE/ACM Joint 7th International Workshop on Conducting Empirical Studies in Industry, pp. 2–9. IEEE (2019)


  26. Islam, M.J., Nguyen, G., Pan, R., Rajan, H.: A comprehensive study on deep learning bug characteristics. In: 2019 ACM 27th European Software Engineering Conference, pp. 510–520 (2019)


  27. Jaipuria, N., et al.: Deflating dataset bias using synthetic data augmentation. In: Proceedings of the IEEE CVF, pp. 772–773 (2020)


  28. Kang, D., Raghavan, D., Bailis, P., Zaharia, M.: Model assertions for monitoring and improving ml models. Proc. Mach. Learn. Syst. 2, 481–496 (2020)


  29. Karkkainen, K., Joo, J.: Fairface: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. In: Proceedings of the IEEE CVF, pp. 1548–1558 (2021)


  30. King, N., Horrocks, C., Brooks, J.: Interviews in qualitative research. Sage (2018)


  31. Knight, J.C.: Safety critical systems: challenges and directions. In: 24th International Conference on Software Engineering, pp. 547–550 (2002)


  32. Kreuzberger, D., Kühl, N., Hirschl, S.: Machine learning operations (mlops): Overview, definition, and architecture. arXiv preprint arXiv:2205.02302 (2022)

  33. Liu, A., Tan, Z., Wan, J., Escalera, S., Guo, G., Li, S.Z.: Casia-surf cefa: A benchmark for multi-modal cross-ethnicity face anti-spoofing. In: Proceedings of the IEEE CVF, pp. 1179–1187 (2021)


  34. Liu, H., Eksmo, S., Risberg, J., Hebig, R.: Emerging and changing tasks in the development process for machine learning systems. In: Proceedings of the International Conference on Software and System Processes, pp. 125–134 (2020)


  35. Lwakatare, L.E., Crnkovic, I., Bosch, J.: Devops for ai-challenges in development of ai-enabled applications. In: 2020 International Conference on Software, Telecommunications and Computer Networks, pp. 1–6. IEEE (2020)


  36. Marques, J., Yelisetty, S.: An analysis of software requirements specification characteristics in regulated environments. Int. J. Softw. Eng. Appl. (IJSEA) 10(6), 1–15 (2019)


  37. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6), 1–35 (2021)


  38. Miron, M., Tolan, S., Gómez, E., Castillo, C.: Evaluating causes of algorithmic bias in juvenile criminal recidivism. Artif. Intell. Law 29(2), 111–147 (2021)


  39. Rabiser, R., Schmid, K., Eichelberger, H., Vierhauser, M., Guinea, S., Grünbacher, P.: A domain analysis of resource and requirements monitoring: Towards a comprehensive model of the software monitoring domain. Inf. Softw. Technol. 111, 86–109 (2019)


  40. Rahman, Q.M., Sunderhauf, N., Dayoub, F.: Per-frame map prediction for continuous performance monitoring of object detection during deployment. In: Proceedings of the IEEE CVF, pp. 152–160 (2021)


  41. Roh, Y., Lee, K., Whang, S., Suh, C.: Sample selection for fair and robust training. Adv. Neural. Inf. Process. Syst. 34, 815–827 (2021)


  42. Saldaña, J.: The Coding Manual for Qualitative Researchers, 2nd edn. Sage Publishing (2013)


  43. Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., Aroyo, L.M.: “Everyone wants to do the model work, not the data work”: Data cascades in high-stakes ai. In: 2021 Conference on Human Factors in Computing Systems, pp. 1–15 (2021)


  44. Shao, Z., Yang, J., Ren, S.: Increasing trustworthiness of deep neural networks via accuracy monitoring. arXiv preprint arXiv:2007.01472 (2020)

  45. Slack, M.K., Draugalis, J.R., Jr.: Establishing the internal and external validity of experimental studies. Am. J. Health Syst. Pharm. 58(22), 2173–2181 (2001)


  46. Uchôa, V., Aires, K., Veras, R., Paiva, A., Britto, L.: Data augmentation for face recognition with cnn transfer learning. In: 2020 International Conference on Systems, Signals and Image Processing, pp. 143–148. IEEE (2020)


  47. Uricár, M., Hurych, D., Krizek, P., Yogamani, S.: Challenges in designing datasets and validation for autonomous driving. arXiv preprint arXiv:1901.09270 (2019)

  48. Vierhauser, M., Rabiser, R., Grünbacher, P.: Requirements monitoring frameworks: A systematic review. Inf. Softw. Technol. 80, 89–109 (2016)


  49. Vierhauser, M., Rabiser, R., Grünbacher, P., Danner, C., Wallner, S., Zeisel, H.: A flexible framework for runtime monitoring of system-of-systems architectures. In: 2014 IEEE Conference on Software Architecture, pp. 57–66. IEEE (2014)


  50. Vogelsang, A., Borg, M.: Requirements engineering for machine learning: Perspectives from data scientists. In: 2019 IEEE 27th International Requirements Engineering Conference Workshops, pp. 245–251. IEEE (2019)


  51. Wang, A., et al.: Revise: A tool for measuring and mitigating bias in visual datasets. Int. J. Comput. Vis. 1–21 (2022)


  52. Wang, T., Zhao, J., Yatskar, M., Chang, K.W., Ordonez, V.: Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (October 2019)


  53. Wardat, M., Le, W., Rajan, H.: Deeplocalize: Fault localization for deep neural networks. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering, pp. 251–262. IEEE (2021)


  54. Zhang, X., et al.: Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering, pp. 739–751 (2020)



Author information


Corresponding author

Correspondence to Hans-Martin Heyn.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Heyn, HM., Knauss, E., Malleswaran, I., Dinakaran, S. (2023). An Investigation of Challenges Encountered When Specifying Training Data and Runtime Monitors for Safety Critical ML Applications. In: Ferrari, A., Penzenstadler, B. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2023. Lecture Notes in Computer Science, vol 13975. Springer, Cham. https://doi.org/10.1007/978-3-031-29786-1_14


  • DOI: https://doi.org/10.1007/978-3-031-29786-1_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29785-4

  • Online ISBN: 978-3-031-29786-1

  • eBook Packages: Computer Science, Computer Science (R0)
