Empirical Software Engineering

Volume 22, Issue 4, pp 1634–1683

Assessing the quality of industrial avionics software: an extensive empirical evaluation

  • Ji Wu
  • Shaukat Ali
  • Tao Yue
  • Jie Tian
  • Chao Liu
Experience Report


A real-time operating system for avionics (RTOS4A) provides an operating environment for avionics application software. Because an RTOS4A hosts safety-critical applications, it is very important to demonstrate to its stakeholders that it has reached a satisfactory level of quality. By assessing how quality varied across consecutive releases of an industrial RTOS4A, using test data collected over 17 months, we aim to provide a set of guidelines to 1) improve test effectiveness, and thus the quality, of subsequent RTOS4A releases and 2) assess the quality of other systems from test data in a similar way. We defined a set of research questions and, based on the available test data, a number of variables: release, together with measures of test effort, test effectiveness, complexity, test efficiency, test strength, and failure density. Using these variables to assess quality in terms of the number of failures found in testing, we applied a combination of analyses: trend analysis using two-dimensional graphs, correlation analysis using Spearman's test, and difference analysis using the Wilcoxon rank test. Key results include the following: 1) the number of failures and the failure density decreased in the latest releases, while test coverage was either high or did not decrease from one release to the next; 2) more test effort was spent on modules of greater complexity, and the number of failures in these modules was not high; and 3) in all releases, the test coverage of modules without failures was not lower than that of modules in which failures were uncovered. Based on this evidence, we conclude that the quality of the studied RTOS4A improved in the latest release. In addition, our industrial partner found our guidelines useful, and we believe they can be used to assess the quality of other applications in the future.
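The abstract names Spearman's test and a Wilcoxon rank test but gives no formulas. The following is a minimal sketch of both, written in pure Python so that no statistics package is assumed. Two things here are our assumptions, not the authors' method: we read "Wilcoxon rank test" as the two-sample rank-sum (Mann-Whitney U) form with a normal approximation, and the per-module numbers (complexity, number of test cases, coverage) are hypothetical illustrations, not data from the study.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns U for sample `a` and a two-sided p-value. No tie or continuity
    correction is applied, so treat the p-value as rough for small samples.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # rank sum of `a` minus its minimum
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Hypothetical per-module values, NOT data from the study.
    complexity = [4, 12, 7, 25, 9, 18]      # e.g. cyclomatic complexity
    test_cases = [10, 30, 15, 60, 35, 41]   # test effort as number of test cases
    print(round(spearman_rho(complexity, test_cases), 3))  # 0.943

    # Hypothetical statement coverage of modules with and without failures.
    cov_with_failures = [0.78, 0.85, 0.80, 0.90]
    cov_without_failures = [0.92, 0.88, 0.95, 0.97, 0.91]
    u, p = rank_sum_test(cov_with_failures, cov_without_failures)
    print(u, round(p, 3))
```

A positive rho of this kind would correspond to result 2 (more effort on more complex modules), and a rank-sum comparison of coverage between the two module groups corresponds to result 3. In practice, `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu` would normally be used instead, since they handle tie corrections and exact p-values; the hand-rolled version only makes the rank arithmetic visible.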


Software quality assessment · Real-time operating system · Avionics software · Industrial case study



This research is jointly supported by the Technology Foundation Program (JSZL2014601B008) of the National Defense Technology Industry Ministry and by the State Key Laboratory of the Software Development Environment (SKLSDE-2013ZX-12). This work was also supported by the MBT4CPS project, funded by the Research Council of Norway (grant no. 240013/O70) under the Young Research Talents category of the FRIPRO funding scheme. Tao Yue and Shaukat Ali are also supported by the EU Horizon 2020 project U-Test (grant no. 645463), the MBE-CR project funded by RFF Hovedstaden (grant no. 239063), the Zen-Configurator project funded by the Research Council of Norway (grant no. 240024/F20), and the Certus SFI funded by the Research Council of Norway (grant no. 203461/O30).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Ji Wu (1), corresponding author
  • Shaukat Ali (2)
  • Tao Yue (2, 3)
  • Jie Tian (1)
  • Chao Liu (1)
  1. School of Computer Science and Engineering, Beihang University, Beijing, China
  2. Simula Research Laboratory, Oslo, Norway
  3. University of Oslo, Oslo, Norway
