Software Quality Journal, Volume 22, Issue 1, pp 51–86

Prediction of faults-slip-through in large software projects: an empirical evaluation

  • Wasif Afzal
  • Richard Torkar
  • Robert Feldt
  • Tony Gorschek


A large percentage of the cost of rework can be avoided by finding more faults earlier in a software test process. Determining which software test phases to focus improvement work on is therefore of considerable industrial interest. We evaluate a number of prediction techniques for predicting the number of faults slipping through to the unit, function, integration, and system test phases of a large industrial project. The objective is to quantify the improvement potential in different test phases by striving toward finding faults in the right phase. The results show that a range of techniques are useful in predicting the number of faults slipping through to the four test phases; however, the group of search-based techniques (genetic programming, gene expression programming, artificial immune recognition system, and particle swarm optimization–based artificial neural network) consistently gives better predictions, being represented at all of the test phases. Human predictions are consistently better at two of the four test phases. We conclude that human predictions regarding the number of faults slipping through to the various test phases can be well supported by the use of search-based techniques. A combination of human judgment and an automated search mechanism (such as any of the search-based techniques) has the potential to provide improved prediction results.


Keywords: Prediction, Empirical, Faults-slip-through, Search-based



We are grateful to Prof. Anneliese Andrews, University of Denver, for reading and commenting on the initial concept paper.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Wasif Afzal (1)
  • Richard Torkar (2, 3)
  • Robert Feldt (2, 3)
  • Tony Gorschek (3)

  1. Department of Computer Sciences, Bahria University, Islamabad, Pakistan
  2. Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
  3. School of Computing, Blekinge Institute of Technology, Karlskrona, Sweden
