
To what extent could we detect field defects? An extended empirical study of false negatives in static bug-finding tools

Published in: Automated Software Engineering

Abstract

Software defects can cause significant losses. Static bug-finding tools are designed to detect and remove software defects and are widely believed to be effective. However, do such tools actually help prevent defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects and generated warnings directing programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of five state-of-the-art static bug-finding tools (FindBugs, JLint, PMD, CheckStyle, and JCSC) on hundreds of reported and fixed defects extracted from three open-source programs (Lucene, Rhino, and AspectJ). Our study addresses the question: to what extent could field defects be detected by state-of-the-art static bug-finding tools? Unlike past studies, which are concerned with the number of false positives such tools produce, we address the orthogonal issue of the number of false negatives. We find that although many field defects could be detected by static bug-finding tools, a substantial proportion of defects could not be flagged. We also analyze the types of tool warnings that are more effective in finding field defects and characterize the types of missed defects. Furthermore, we analyze the effectiveness of the tools in finding field defects of various severities, difficulties, and types.
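The measurement described above can be illustrated with a minimal sketch. It assumes one common matching criterion (a defect counts as detected if the tool warned on any line later changed by the bug-fixing commit); the function name, bug IDs, file names, and line numbers are all hypothetical, not data from the study.

```python
# Hedged sketch of a false-negative measurement for a static bug-finding
# tool, assuming a line-intersection detection criterion: a field defect
# counts as "detected" if the tool emitted a warning on any (file, line)
# that the bug-fixing commit changed. All identifiers and data below are
# illustrative placeholders, not the study's actual method or numbers.

def false_negative_rate(defects, warnings):
    """defects: {bug_id: set of (file, line) pairs changed by the fix}
       warnings: set of (file, line) pairs flagged by the tool"""
    # A defect is missed when none of its fixed lines carries a warning.
    missed = sum(1 for lines in defects.values() if not (lines & warnings))
    return missed / len(defects)

defects = {
    "BUG-1": {("Parser.java", 120), ("Parser.java", 121)},
    "BUG-2": {("Lexer.java", 57)},
}
warnings = {("Parser.java", 120), ("Util.java", 9)}

print(false_negative_rate(defects, warnings))  # BUG-2 is missed -> 0.5
```

Under this criterion, a high false-negative rate means the tool's warnings rarely land on the code that field bug fixes actually touch, which is the effect the study quantifies.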



Notes

  1. http://lucene.apache.org/core/

  2. http://www.mozilla.org/rhino/

  3. http://www.eclipse.org/aspectj/


Acknowledgments

We thank the researchers who created and maintain the iBugs repository, which is publicly available at http://www.st.cs.uni-saarland.de/ibugs/. We also greatly appreciate the valuable comments on earlier versions of this paper from the anonymous reviewers and our shepherd, Andreas Zeller.

Author information

Corresponding author

Correspondence to Ferdian Thung.


About this article


Cite this article

Thung, F., Lucia, Lo, D. et al. To what extent could we detect field defects? An extended empirical study of false negatives in static bug-finding tools. Autom Softw Eng 22, 561–602 (2015). https://doi.org/10.1007/s10515-014-0169-8

