Empirical Software Engineering, Volume 18, Issue 1, pp 25–59

Can traditional fault prediction models be used for vulnerability prediction?



Finding security vulnerabilities requires a different mindset than finding general faults in software—thinking like an attacker. Therefore, security engineers looking to prioritize security inspection and testing efforts may be better served by a prediction model that indicates security vulnerabilities rather than faults. At the same time, faults and vulnerabilities have commonalities that may allow development teams to use traditional fault prediction models and metrics for vulnerability prediction. The goal of our study is to determine whether fault prediction models can be used for vulnerability prediction or if specialized vulnerability prediction models should be developed when both models are built with traditional metrics of complexity, code churn, and fault history. We have performed an empirical study on a widely-used, large open source project, the Mozilla Firefox web browser, where 21% of the source code files have faults and only 3% of the files have vulnerabilities. The fault prediction model and the vulnerability prediction model perform similarly at vulnerability prediction across a wide range of classification thresholds. For example, the fault prediction model provided recall of 83% and precision of 11% at classification threshold 0.6, and the vulnerability prediction model provided recall of 83% and precision of 12% at classification threshold 0.5. Our results suggest that fault prediction models based upon traditional metrics can substitute for specialized vulnerability prediction models. However, both fault prediction and vulnerability prediction models require significant improvement to reduce false positives while providing high recall.
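The recall/precision figures quoted above come from sweeping a classification threshold over a model's per-file output probabilities. The sketch below illustrates that trade-off; it is not the authors' implementation, and the `precision_recall` helper, probabilities, and labels are hypothetical values chosen for demonstration only:

```python
def precision_recall(probs, labels, threshold):
    """Classify each file as positive when its predicted probability
    meets the threshold, then compute precision and recall."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical per-file probabilities and true labels
# (1 = vulnerable file, 0 = neutral file).
probs = [0.9, 0.7, 0.65, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 1, 0, 0, 0]

for t in (0.5, 0.6):
    p, r = precision_recall(probs, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold flags more files, which tends to raise recall at the cost of precision; with only 3% of files vulnerable, even a small false-positive rate yields many falsely flagged files, which is why precision stays low at high recall.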


Keywords: Software metrics · Complexity metrics · Fault prediction · Vulnerability prediction · Open source project · Automated text classification



This work was supported in part by the National Science Foundation Grant No. 0716176 and the CAREER Grant No. 0346903. Any opinions expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also thank the NCSU Software Engineering Realsearch group for their reviews of the initial version of this paper. We appreciate Dr. Robert Bell and Dr. Raffaella Settimi for their advice on statistics. Most of all, we thank the editors and reviewers of the Empirical Software Engineering journal for their thorough reviews and helpful suggestions.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. DePaul University, Chicago, USA
  2. North Carolina State University, Raleigh, USA
