
Defect prediction from static code features: current results, limitations, new approaches

Published in: Automated Software Engineering (2010)

Abstract

Building quality software is expensive and software quality assurance (QA) budgets are limited. Data miners can learn defect predictors from static code features; these predictors can then be used to direct scarce QA resources, e.g. to focus inspection effort on the parts of the code predicted to be most defective.

Recent results show that better data mining technology is not leading to better defect predictors. We hypothesize that we have reached the limits of the standard learning goal: maximizing “AUC(pd, pf)”, the area under the curve of probability of detection (pd) versus probability of false alarm (pf).
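For concreteness, the short sketch below computes such a curve and its area from a ranked list of modules. It is a minimal illustration, not the paper's evaluation code; the module records, each carrying a hypothetical predicted defect score and a true defect label, are toy data.

```python
# Minimal sketch (not the paper's code): AUC(pd, pf) from ranked modules.
# Each module is assumed to carry a predicted defect score and a true label.

def auc(xs, ys):
    """Trapezoidal area under the curve through the points (xs, ys)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])))

def roc_curve(modules):
    """Sweep a threshold down the ranking; collect (pf, pd) points."""
    ranked = sorted(modules, key=lambda m: m["score"], reverse=True)
    pos = sum(m["defective"] for m in ranked) or 1
    neg = (len(ranked) - pos) or 1
    tp = fp = 0
    pf, pd = [0.0], [0.0]
    for m in ranked:
        tp += m["defective"]
        fp += 1 - m["defective"]
        pf.append(fp / neg)   # probability of false alarm
        pd.append(tp / pos)   # probability of detection
    return pf, pd

modules = [  # toy data, for illustration only
    {"score": 0.9, "defective": 1},
    {"score": 0.7, "defective": 0},
    {"score": 0.6, "defective": 1},
    {"score": 0.2, "defective": 0},
]
pf, pd = roc_curve(modules)
print("AUC(pd, pf) =", round(auc(pf, pd), 3))   # 1.0 is ideal, 0.5 is random
```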

Accordingly, we explore changing the standard goal. Learners that maximize “AUC(effort, pd)” find the smallest set of modules that contain the most errors. WHICH is a meta-learner framework that can be quickly customized to different goals; when customized to AUC(effort, pd), it outperforms all the data mining methods studied here. More importantly, measured in terms of this new goal, certain widely used learners perform much worse than simple manual methods.
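To illustrate the kind of customization meant here, the sketch below ranks candidate rules with a pluggable goal function, so re-targeting from an AUC(pd, pf)-style goal to an AUC(effort, pd)-style goal means swapping one small function. It shows the principle only, under hypothetical rules and toy data; it is not the WHICH implementation.

```python
# Hedged sketch of a goal-parameterized rule selector (not WHICH itself):
# the same candidates and data, scored under two different goals, can
# prefer different rules. All names, rules and data below are hypothetical.

modules = [  # toy data: lines of code, cyclomatic complexity, defective?
    {"loc": 2000, "cc": 20, "defective": 1},
    {"loc":   50, "cc": 15, "defective": 1},
    {"loc":   40, "cc": 12, "defective": 1},
    {"loc":  500, "cc":  5, "defective": 0},
    {"loc":   60, "cc":  3, "defective": 0},
]

candidate_rules = {            # hypothetical "inspect if ..." rules
    "complex modules": lambda m: m["cc"] > 10,
    "small modules":   lambda m: m["loc"] < 100,
}

def stats(rule, data):
    """pd, pf and inspection effort (fraction of all LOC) for one rule."""
    flagged = [m for m in data if rule(m)]
    pos = sum(m["defective"] for m in data) or 1
    neg = (len(data) - pos) or 1
    tp = sum(m["defective"] for m in flagged)
    return {"pd": tp / pos,
            "pf": (len(flagged) - tp) / neg,
            "effort": sum(m["loc"] for m in flagged) / sum(m["loc"] for m in data)}

def roc_goal(s):     # reward detection, punish false alarms: AUC(pd, pf) flavour
    return s["pd"] - s["pf"]

def effort_goal(s):  # reward defects found in a small fraction of the code
    return s["pd"] - s["effort"]

for goal in (roc_goal, effort_goal):
    best = max(candidate_rules,
               key=lambda name: goal(stats(candidate_rules[name], modules)))
    print(goal.__name__, "prefers:", best)
```

With these toy numbers the ROC-style goal prefers the rule that detects every defect regardless of how much code it flags, while the effort-based goal prefers the rule that finds most of the defects in a small fraction of the code, which is why the choice of goal matters.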

Hence, we advise against the indiscriminate use of learners. Learners must be chosen and customized to the goal at hand. With the right architecture (e.g. WHICH), tuning a learner to specific local business goals can be a simple task.



Author information

Correspondence to Tim Menzies.

Additional information

This research was supported by NSF grant CCF-0810879 and the Turkish Scientific Research Council (Tubitak EEEAG 108E014). For an earlier draft, see http://menzies.us/pdf/08bias.pdf.


About this article

Cite this article

Menzies, T., Milton, Z., Turhan, B. et al. Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 17, 375–407 (2010). https://doi.org/10.1007/s10515-010-0069-5
