Systematic literature review on software quality for AI-based software

Empirical Software Engineering

Abstract

There is widespread demand for Artificial Intelligence (AI) software, specifically Machine Learning (ML) software, which is increasingly popular and is being adopted in many applications we use daily. Quality for AI-based software differs from traditional software quality because AI-based software generally addresses distinct and more complex kinds of problems. With the rapid advance of AI technologies and related techniques, how to build high-quality AI-based software has become a prominent subject. This paper investigates the state of the art in software quality (SQ) for AI-based systems and identifies the quality attributes, applied models, challenges, and practices reported in the literature. We carried out a systematic literature review (SLR) covering 1988 to 2020 to (i) analyze and understand related primary studies and (ii) synthesize limitations and open challenges to drive future research. Our study provides a road map that helps researchers better understand quality challenges, attributes, and practices in the context of software quality for AI-based software. Based on the empirical evidence gathered in this SLR, we suggest that future work on this topic be structured under three categories: Definition/Specification, Design/Evaluation, and Process/Socio-technical.


Author information

Corresponding author

Correspondence to Bahar Gezici.

Ethics declarations

Conflicts of Interests/Competing interests

Please find attached the paper “Systematic Literature Review on Software Quality for AI-based Software” by Bahar Gezici and Ayça Kolukısa Tarhan, which we would like to submit for possible publication in Empirical Software Engineering. We confirm that this work is original, has not been published elsewhere, and is not currently under consideration for publication elsewhere.

For any information concerning this manuscript, please contact me preferably by e-mail at bahargezici@cs.hacettepe.edu.tr. Thank you for your consideration of this manuscript.

Additional information

Communicated by: Paolo Tonella

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 18 Mapping between each primary study ID, e.g. P1, P2, and the reference to the corresponding paper
Table 19 Detailed information per study for RQ 2.2 (addressed challenges of quality) and RQ 4.1 (how these challenges are addressed)
Table 20 Details of bottom-up approach followed in this SLR for relations of metrics and quality attributes used in primary studies

About this article

Cite this article

Gezici, B., Tarhan, A.K. Systematic literature review on software quality for AI-based software. Empir Software Eng 27, 66 (2022). https://doi.org/10.1007/s10664-021-10105-2

  • DOI: https://doi.org/10.1007/s10664-021-10105-2
