Predicting Bugs in Large Industrial Software Systems

  • Thomas J. Ostrand
  • Elaine J. Weyuker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7171)


This chapter surveys close to ten years of software fault prediction research performed by our group. We describe our initial motivation, the variables used to make predictions, and our standard model based on Negative Binomial Regression, and we summarize the results of using this model to make predictions for nine large industrial software systems. The systems range in size from hundreds of thousands to millions of lines of code. All have been in the field for multiple years and many releases, and continue to be maintained and enhanced, usually at three-month intervals.

Effectiveness of the fault predictions is assessed using two different metrics. We compare the effectiveness of the standard model to augmented models that include variables related to developer counts, to inter-file calling structure, and to information about specific developers who modified the code.
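
The abstract does not define the two metrics. As a minimal sketch, assuming one of them is the fault-percentile average named in the keywords and the other is the share of actual faults found in the top 20% of files ranked by predicted fault count, they could be computed as follows. The function names, the 20% cutoff, and the sample numbers are illustrative assumptions, not taken from the chapter.

```python
from typing import Sequence


def top_fraction_fault_share(predicted: Sequence[float],
                             actual: Sequence[int],
                             fraction: float = 0.2) -> float:
    """Share of actual faults in the top `fraction` of files by predicted count.

    The 20% default is an assumption made for this illustration.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    if total == 0:
        return 0.0
    # Rank file indices from highest to lowest predicted fault count.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    cutoff = max(1, round(fraction * len(order)))
    return sum(actual[i] for i in order[:cutoff]) / total


def fault_percentile_average(predicted: Sequence[float],
                             actual: Sequence[int]) -> float:
    """Mean, over every cutoff m = 1..K, of the share of faults in the top m files.

    With files sorted by increasing prediction and given ranks 1..K, this
    equals sum(rank * faults) / (total_faults * K). Ties keep input order.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    k = len(predicted)
    if total == 0 or k == 0:
        return 0.0
    order = sorted(range(k), key=lambda i: predicted[i])
    weighted = sum(rank * actual[i] for rank, i in enumerate(order, start=1))
    return weighted / (total * k)


# Hypothetical data: five files with predicted and actual fault counts.
predicted_faults = [0.1, 2.5, 0.7, 4.0, 0.3]
actual_faults = [0, 3, 1, 5, 1]
share = top_fraction_fault_share(predicted_faults, actual_faults)
fpa = fault_percentile_average(predicted_faults, actual_faults)
```

A single cutoff rewards only the files at the very top of the ranking, while the fault-percentile average summarizes the quality of the whole ordering, which may be why two metrics are reported.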

We also evaluate alternate prediction models based on different training algorithms, including Recursive Partitioning, Bayesian Additive Regression Trees, and Random Forests.


software fault prediction; defect prediction; negative binomial model; fault-percentile average; buggy file ratio; calling structure; prediction tool





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Thomas J. Ostrand (1)
  • Elaine J. Weyuker (1)
  1. AT&T Labs - Research, Florham Park, USA
