An automatic method for assessing the versions affected by a vulnerability


Vulnerability data sources are used by academics to build models, and by industry and government to assess compliance. Errors in such data sources are therefore not only threats to validity in scientific studies; they might also cause organizations that rely on retro versions of software to lose compliance. In this work, we propose an automated method to determine the code evidence for the presence of vulnerabilities in retro software versions. The method scans the code base of each retro version of software for the code evidence to determine whether that version is vulnerable. It identifies the lines of code that were changed to fix vulnerabilities. If an earlier version contains the lines deleted by the fix, it is highly likely that this version is vulnerable. To show the scalability of the method, we performed large-scale experiments on Chrome and Firefox (spanning 7,236 vulnerable files and approximately 9,800 vulnerabilities) on the National Vulnerability Database (NVD). The elimination of spurious vulnerability claims (e.g. entries in a vulnerability database such as NVD) found by our method may change the conclusions of studies on the prevalence of foundational vulnerabilities.
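The core idea of the abstract (lines deleted by a fix serve as evidence when found in an earlier version) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `deleted_lines` and `is_likely_vulnerable` are hypothetical, the diff is computed with the standard `difflib` module rather than a repository diff, and treating a single surviving evidence line as sufficient is an assumption made here for brevity.

```python
import difflib


def deleted_lines(before_fix: str, after_fix: str) -> set[str]:
    """Return the non-blank lines that the fix deleted or rewrote.

    Hypothetical helper: compares the file just before the fix with
    the fixed file and collects the lines that disappeared.
    """
    before = before_fix.splitlines()
    after = after_fix.splitlines()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    removed: set[str] = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.update(s.strip() for s in before[i1:i2] if s.strip())
    return removed


def is_likely_vulnerable(version_source: str, evidence: set[str]) -> bool:
    """Assumption of this sketch: one surviving evidence line is enough."""
    present = {s.strip() for s in version_source.splitlines()}
    return bool(evidence & present)


# Toy data: three versions of one file, where version 3.0 carries the fix.
pre_fix = "int n = read_len();\nmemcpy(buf, src, n);\nreturn 0;\n"
fixed = "int n = read_len();\nmemcpy(buf, src, min(n, MAX));\nreturn 0;\n"
versions = {"1.0": "return 0;\n", "2.0": pre_fix, "3.0": fixed}

evidence = deleted_lines(pre_fix, fixed)
verdicts = {v: is_likely_vulnerable(src, evidence) for v, src in versions.items()}
print(verdicts)
```

In this toy example, version 1.0 predates the vulnerable line and version 3.0 contains the fix, so only version 2.0 is flagged. A database claim that version 1.0 is affected would be a candidate spurious claim in the sense of the abstract.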


  1. Some states in the US (i.e. Nevada and Minnesota) have adopted PCI DSS as actual law for some businesses operating in these states (Williams and Chuvakin 2012, Chap. 3)


  3. While this is sufficient for non-critical applications, it is possible that we might miss important information that is not a part of the vulnerable code footprint (Please refer to Sections 4.4 and 7 for discussion)

  4. In an SVN repository, the revision immediately preceding r_fix is r_fix − 1. In other repositories such as Mercurial, the immediately preceding revision is determined by running the parent command

  5. This page has been removed, but can be accessed by URL

  6. Unexpected and extremely rare events that can have a major impact but can only be explained in retrospect (Taleb 2010)


  • Agresti A, Franklin CA (2012) Statistics: the art and science of learning from data. Pearson Higher Ed

  • Allodi L, Kotov V, Massacci F (2013) MalwareLab: Experimentation with cybercrime attack tools

  • Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc Y-G (2008) Is it a bug or an enhancement? A text-based approach to classify change requests

  • Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced? Bias in bug-fix datasets. In: Proceedings of the 7th European software engineering conference. ACM, pp 121–130

  • Chowdhury I, Zulkernine M (2011) Using complexity, coupling, and cohesion metrics as early predictors of vulnerabilities. J Syst Archit 57(3):294–313

  • Chromium Developers (2013) Chrome stable releases history. Visited in April 2013

  • Krsul IV (1998) Software vulnerability analysis. PhD thesis. Purdue University

  • Massacci F, Nguyen VH (2014) An empirical methodology to evaluate vulnerability discovery models. IEEE Trans Softw Eng 40(12):1147–1162

  • Massacci F, Nguyen VH (2010) Which is the right source for vulnerabilities studies? An empirical analysis on Mozilla Firefox. In: Proceedings of the international ACM workshop on security measurement and metrics (MetriSec’10)

  • Massacci F, Neuhaus S, Nguyen VH (2011) After-life vulnerabilities: A study on Firefox evolution, its vulnerabilities and fixes. In: Proceedings of the 2011 engineering secure software and systems conference (ESSoS’11)

  • Meneely A, Srinivasan H, Musa A, Rodríguez Tejeda A, Mokary M, Spates B (2013) When a patch goes bad: Exploring the properties of vulnerability-contributing commits

  • Mozilla Security (2011) Missing CVEs in MFSA? Private Communication

  • Needham R (2002) Security and Open Source

  • Neuhaus S, Zimmermann T, Holler C, Zeller A (2007) Predicting vulnerable software components. In: Proceedings of the 14th ACM conference on computer and communications security (CCS’07), pp 529–540

  • Nguyen T, Adams B, Hassan AE (2010) A case study of bias in bug-fix datasets. In: Proceedings of 17th working conference on reverse engineering (WCRE’10)

  • Nguyen VH (2014) Empirical methods for evaluating empirical vulnerability models. PhD thesis. University of Trento

  • Nguyen VH, Massacci F (2013) The (un)reliability of NVD vulnerable versions data: An empirical experiment on Google Chrome vulnerabilities. In: Proceedings of the 8th ACM symposium on information, computer and communications security (ASIACCS’13)

  • NIST (2012) Question on the data source of vulnerable configurations in an NVD entry. Private communication

  • Ozment A (2007) Vulnerability discovery and software security. PhD thesis. University of Cambridge, Cambridge

  • Ozment A, Schechter SE (2006) Milk or wine: Does software security improve with age? In: Proceedings of the 15th USENIX security symposium

  • Quinn SD, Scarfone KA, Barrett M, Johnson CS (2010) SP 800-117. Guide to adopting and using the Security Content Automation Protocol (SCAP) Version 1.0. Tech. rep. National Institute of Standards and Technology

  • Roumani Y, Nwankpa JK, Roumani YF (2015) Time series modeling of vulnerabilities. Comput Secur 51:32–40

  • Shin Y, Meneely A, Williams L, Osborne JA (2011) Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. IEEE Trans Softw Eng 37(6):772–787

  • Sliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: Proceedings of the 2nd international working conference on mining software repositories (MSR’05), pp 24–28

  • Taleb NN (2010) The black swan: The impact of the highly improbable fragility. Vol. 2. Random House

  • Wikipedia (2013) Firefox release history. Visited in April 2013

  • Williams BR, Chuvakin AA (2012) PCI compliance, third edition: Understand and implement effective pci data security standard compliance. Ed. by Dereck Milroy 3rd. Elsevier, Syngress

  • Younan Y (2013) 25 Years of vulnerabilities: 1988–2012. Tech. rep. Sourcefire

  • Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. In: Proceedings of the 3rd international workshop on predictor models in software engineering (PROMISE’07). IEEE Computer Society, pp 9–15


This work has been partly supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements no. 256980 (NESSOS), no. 285223 (SECONOMICS), and no. 317387 (SECENTIS), and by the Italian Project MIUR-PRIN-TENACE.

Author information


Corresponding author

Correspondence to Fabio Massacci.

Additional information

Communicated by: Andreas Zeller

About this article

Cite this article

Nguyen, V.H., Dashevskyi, S. & Massacci, F. An automatic method for assessing the versions affected by a vulnerability. Empir Software Eng 21, 2268–2297 (2016).


  • Software security
  • Empirical validation
  • Vulnerability analysis
  • National vulnerability database (NVD)
  • Browsers