Empirical Software Engineering, Volume 15, Issue 4, pp 423–454

Support planning and controlling of early quality assurance by combining expert judgment and defect data—a case study

  • Michael Kläs
  • Haruka Nakao
  • Frank Elberzhager
  • Jürgen Münch


Abstract

Planning quality assurance (QA) activities systematically and controlling their execution are challenging tasks for companies that develop software or software-intensive systems. Both require the ability to estimate the effectiveness of the applied QA techniques and the defect content of the checked artifacts. Existing approaches for these purposes require extensive measurement data from historical projects. Because many companies do not collect enough data to apply these approaches (especially for the early project lifecycle), they typically base their QA planning and controlling solely on expert opinion. This article presents a hybrid method that combines commonly available measurement data with context-specific expert knowledge. To evaluate the method’s applicability and usefulness, we conducted a case study in the context of independent verification and validation activities for critical software in the space domain. A hybrid defect content and effectiveness model was developed for the software requirements analysis phase and evaluated with available legacy data. One major result is that the hybrid model provides better estimation accuracy than applicable models based solely on data: the mean magnitude of relative error (MMRE) determined by cross-validation is 29.6%, compared to 76.5% for the most accurate data-based model.
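The accuracy figures above are based on the mean magnitude of relative error, i.e., the mean of |actual − predicted| / actual over all estimated projects. As a minimal sketch (not taken from the paper; the function name and defect counts are purely illustrative), MMRE can be computed as follows:

```python
def mmre(actuals, predictions):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    errors = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)

# Illustrative defect counts (hypothetical, not data from the study):
actual_defects = [100, 50, 80, 40]
predicted_defects = [70, 60, 80, 50]
print(round(mmre(actual_defects, predicted_defects), 3))  # → 0.188
```

In the study, this statistic is computed under cross-validation, i.e., each project's defect count is predicted by a model fitted without that project, so the reported 29.6% reflects estimation accuracy on unseen projects.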


Keywords

Software quality assurance · Quality management · Quality assurance effectiveness · Defect content estimation · Hybrid prediction model



Acknowledgements

We would like to thank the development project staff and the IV&V staff from the JAXA Engineering Digital Innovation Center (JEDI) at the Japanese Aerospace Exploration Agency (JAXA), where we conducted the case study to construct the hybrid prediction model. We would like to thank the staff of JAMSS, who contributed greatly by answering the questionnaires and providing historical experience data. Finally, we would like to thank Adam Trendowicz and Marcus Ciolkowski from Fraunhofer IESE for the initial review of the paper, Sonnhild Namingha for proofreading, and the anonymous reviewers of the International Symposium on Software Reliability Engineering and the Journal of Empirical Software Engineering for their valuable feedback. Parts of this work have been funded by the BMBF SE2006 project TestBalance (grant 01 IS F08 D).



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Michael Kläs (1)
  • Haruka Nakao (2)
  • Frank Elberzhager (1)
  • Jürgen Münch (1)

  1. Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Germany
  2. Safety & Product Assurance Department, Japan Manned Space Systems Corporation, Tsuchiura, Japan
