Learning to recognize actionable static code warnings (is intrinsically easy)

Abstract

Static code warning tools often generate warnings that programmers ignore. Such tools can be made more useful via data mining algorithms that select the “actionable” warnings, i.e., the warnings that are usually not ignored. In this paper, we search for actionable warnings in a sample of 31,058 static code warnings from FindBugs, 5,675 of which are actionable. We find that data mining algorithms can find actionable warnings with remarkable ease. Specifically, a range of data mining methods (deep learners, random forests, decision tree learners, and support vector machines) all achieved very good results: recalls and AUC(TNR, TPR) measures usually over 95%, and false alarms usually under 5%. Given that all these learners succeeded so easily, it is appropriate to ask whether there is something about this task that is inherently easy. We report that while our data sets have up to 58 raw features, those features can be approximated by less than two underlying dimensions. For such intrinsically simple data, many different kinds of learners can generate useful models with similar performance. Based on the above, we conclude that learning to recognize actionable static code warnings is easy, across a wide range of learning algorithms, since the underlying data is intrinsically simple. If we had to pick one particular learner for this task, we would suggest linear SVMs (since, at least in our sample, that learner ran relatively quickly and achieved the best median performance), and we would not recommend deep learning (since this data is intrinsically very simple).
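To make the evaluation measures concrete, here is a minimal sketch, using scikit-learn, of training a linear SVM on a labeled warning data set and scoring it by recall, false alarm rate, and AUC. This is not the authors' exact pipeline (for that, see the reproduction package in Note 1 below); the feature matrix X and labels y here are synthetic placeholders standing in for one of the FindBugs data sets.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.metrics import confusion_matrix, roc_auc_score

    # Synthetic stand-in for one FindBugs data set: 58 raw features,
    # binary labels (1 = actionable, 0 = ignored).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 58))
    y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)

    pred = clf.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print("recall      =", tp / (tp + fn))
    print("false alarm =", fp / (fp + tn))
    # ROC AUC; equivalent in area to the AUC(TNR, TPR) measure above.
    print("AUC         =", roc_auc_score(y_te, clf.decision_function(X_te)))

Likewise, the claim that up to 58 raw features collapse to less than two underlying dimensions can be checked with a standard intrinsic dimensionality estimator. The sketch below implements the maximum-likelihood estimator of Levina and Bickel (2005) as an illustration; the paper's own computation may differ in its details, and the helper name mle_intrinsic_dimension is ours.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def mle_intrinsic_dimension(X, k=10):
        """Levina-Bickel MLE of intrinsic dimension, averaged over all
        points. Assumes X has no duplicate rows (a zero neighbor
        distance would produce an infinite log-ratio)."""
        # Query k+1 neighbors because each point's nearest neighbor
        # is itself at distance zero.
        dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        Tk = dists[:, 1:]  # distances to the 1st..k-th true neighbors
        # Per-point estimate: (k - 1) / sum_j log(T_k / T_j), j < k.
        log_ratios = np.log(Tk[:, -1][:, None] / Tk[:, :-1])
        return float(np.mean((k - 1) / log_ratios.sum(axis=1)))

Applied to points that lie near a one- or two-dimensional manifold embedded in a 58-dimensional feature space, this estimator returns values near one or two; that is the sense in which the warning data are called intrinsically simple.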


Notes

  1. https://github.com/XueqiYang/intrinsic_dimension

  2. https://pmd.github.io/latest/index.html

  3. https://checkstyle.sourceforge.io/

  4. http://findbugs.sourceforge.net

  5. https://github.com/XueqiYang/intrinsic_dimension


Acknowledgment

This work was partially funded by NSF award #1703487.

Author information

Corresponding author

Correspondence to Tim Menzies.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by: Andy Zaidman


Cite this article

Yang, X., Chen, J., Yedida, R. et al. Learning to recognize actionable static code warnings (is intrinsically easy). Empir Software Eng 26, 56 (2021). https://doi.org/10.1007/s10664-021-09948-6


Keywords

  • Static code analysis
  • Actionable warnings
  • Deep learning
  • Linear SVM
  • Intrinsic dimensionality