Regularities in Learning Defect Predictors

  • Burak Turhan
  • Ayse Bener
  • Tim Menzies
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6156)


Collecting large, consistent data sets on real-world software projects from a single source is difficult. In this study, we show that bug reports need not come from local projects in order to learn defect prediction models: imported data from different sites can be made suitable for predicting defects at the local site. Extending our previous work on commercial software, we now explore the open source domain with two versions of an open source anti-virus program (Clam AV) and a subset of bugs in two versions of the GNU gcc compiler, in order to identify the regularities in learning predictors for a different domain. We conclude that there are surprisingly uniform assets of software that can be discovered as simple, repeated patterns in local or imported data using just a handful of examples.
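The cross-site setup described above can be illustrated with a minimal sketch: a predictor is trained only on imported modules and then evaluated on local ones. Everything here is an assumption made for illustration. The abstract does not name a learner, so a k-nearest-neighbour vote is swapped in; the two metrics (lines of code and cyclomatic complexity), the toy rows, and the helper names `log_features`, `predict`, and `detection_and_false_alarm` are hypothetical and are not taken from the study.

```python
import math

# Toy, made-up module metrics: (lines_of_code, cyclomatic_complexity, is_defective).
# These rows are illustrative assumptions, not data from the study.
IMPORTED = [
    (20, 2, 0), (35, 3, 0), (50, 4, 0),
    (400, 25, 1), (650, 40, 1), (300, 18, 1),
]
LOCAL = [
    (25, 2, 0), (45, 5, 0), (500, 30, 1), (350, 22, 1),
]


def log_features(row):
    """Log-transform the two metrics to damp their skew."""
    loc, complexity, _label = row
    return (math.log(loc), math.log(complexity))


def predict(train, row, k=3):
    """Majority vote among the k training rows nearest to `row`."""
    x = log_features(row)
    nearest = sorted(train, key=lambda t: math.dist(log_features(t), x))[:k]
    votes = sum(t[2] for t in nearest)
    return 1 if votes * 2 > k else 0


def detection_and_false_alarm(train, test, k=3):
    """Return (detection rate, false alarm rate) of `train` applied to `test`."""
    tp = fp = fn = tn = 0
    for row in test:
        guess, actual = predict(train, row, k), row[2]
        if actual == 1:
            tp += guess
            fn += 1 - guess
        else:
            fp += guess
            tn += 1 - guess
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Train on imported data only, evaluate on the local site.
    print(detection_and_false_alarm(IMPORTED, LOCAL))
```

Because the toy rows are cleanly separable, the scores this sketch produces say nothing about real projects; the point is only the shape of the experiment, in which no local module is ever used for training.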


Keywords: Defect prediction, Code metrics, Software quality, Cross-company





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Burak Turhan (1)
  • Ayse Bener (2)
  • Tim Menzies (3)
  1. Department of Information Processing Science, University of Oulu, Oulu, Finland
  2. Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
  3. Lane Dept. of CS&EE, West Virginia University, Morgantown, USA
