Abstract
Software defects can cause substantial losses. Static bug-finding tools are designed to help detect and remove software defects and are believed to be effective. However, do such tools in fact help prevent the defects that actually occur in the field and are reported by users? Had these tools been used, would they have detected these field defects and generated warnings that direct programmers to fix them? To answer these questions, we perform an empirical study of the effectiveness of five state-of-the-art static bug-finding tools (FindBugs, JLint, PMD, CheckStyle, and JCSC) on hundreds of reported and fixed defects extracted from three open source programs (Lucene, Rhino, and AspectJ). Our study addresses the question: To what extent could field defects be detected by state-of-the-art static bug-finding tools? Unlike past studies, which are concerned with the number of false positives produced by such tools, we address the orthogonal issue of the number of false negatives. We find that although many field defects could be detected by static bug-finding tools, a substantial proportion of defects could not be flagged. We also analyze which types of tool warnings are more effective in finding field defects and characterize the types of defects that are missed. Furthermore, we analyze the effectiveness of the tools in finding field defects of various severities, difficulties, and types.
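To make the notion of a false negative concrete, the following is a minimal sketch of how a missed field defect could be counted. It assumes that each defect is represented by the set of (file, line) locations changed by its fix and that a tool warning points at a single line; the names `ToolWarning`, `classify_defect`, and `false_negative_rate`, as well as the "full" / "partial" / "missed" labels, are illustrative assumptions and not the study's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolWarning:
    """One warning from a static bug-finding tool (hypothetical record type)."""
    tool: str
    file: str
    line: int


def classify_defect(faulty_lines, warnings):
    """Label a defect by how many of its faulty lines are flagged.

    faulty_lines: set of (file, line) pairs touched by the defect's fix.
    warnings: iterable of ToolWarning produced on the buggy version.
    """
    flagged = {(w.file, w.line) for w in warnings}
    hit = faulty_lines & flagged
    if not hit:
        return "missed"   # no warning on any faulty line: a false negative
    if hit == faulty_lines:
        return "full"     # every faulty line is flagged
    return "partial"      # only some faulty lines are flagged


def false_negative_rate(defects, warnings):
    """Fraction of defects for which no faulty line received a warning.

    defects: dict mapping a defect id to its set of (file, line) pairs.
    """
    if not defects:
        raise ValueError("at least one defect is required")
    missed = sum(
        1 for lines in defects.values()
        if classify_defect(lines, warnings) == "missed"
    )
    return missed / len(defects)
```

Under this simplification, a warning that lands near a faulty line but not on it does not count as a detection, so the resulting rate depends heavily on how faulty lines are identified.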
Acknowledgments
We thank the researchers who created and maintain the iBugs repository, which is publicly available at http://www.st.cs.uni-saarland.de/ibugs/. We also greatly appreciate the valuable comments of the anonymous reviewers and our shepherd, Andreas Zeller, on earlier versions of this paper.
Cite this article
Thung, F., Lucia, Lo, D. et al. To what extent could we detect field defects? An extended empirical study of false negatives in static bug-finding tools. Autom Softw Eng 22, 561–602 (2015). https://doi.org/10.1007/s10515-014-0169-8
DOI: https://doi.org/10.1007/s10515-014-0169-8