The results of our preliminary study characterize the package-side fixing release, where we find that (i) up to 64.50% of vulnerability fixes are classified as a package patch landing and (ii) up to 85.72% of commits in a release are unrelated to the actual fix. Based on these results, we suspect that lags might occur while the package-side fixing release gets adopted by clients and transitively propagates throughout the dependency network. Hence, we perform an empirical evaluation to explore potential lags in the adoption and propagation of the fix.
Model and Track Lags
To explore potential lags in both adoption and propagation, we model and track the package-side fixing release and client-side fixing release as illustrated in Fig. 5.
Released and Adopted by Version -
We identify lags in the adoption by analyzing the prevalence of patterns between a package-side fixing release and a client-side fixing release, which is similar to technical lag (Zerouali et al. 2018) and based on semantic versioning. The definition of package-side fixing release was explained in Section 2.1, which describes how the package bumped the release version number. Note that pre-releases or special releases are not considered in this study. We then define a new term called a client-side fixing release. A client-side fixing release describes how clients bumped the version of an adopted package from a vulnerable version to a fixing release. There are four kinds of client-side fixing release: (i) client major landing (major number of an adopted package is bumped up), (ii) client minor landing (minor number of an adopted package is bumped up), (iii) client patch landing (patch number of an adopted package is bumped up), and (iv) dependency removal (adopted package is removed from a client dependency list).
Figure 5a shows an example of the two terms defined above. First, we find that the package-side fixing release for package \(\mathbb {P}\) is classified as a package patch landing. This is because of the difference between a fixing release (\(\mathbb {P}_{V1.1.1}\)) and its vulnerable release (\(\mathbb {P}_{V1.1.0}\)). Furthermore, we find that the client-side fixing release for client \(\mathbb {X}\) is a client minor landing. This is because of the difference between the adopted fixing release (\(\mathbb {P}_{V1.1.2}\)) and its previous vulnerable release (\(\mathbb {P}_{V1.0.1}\)).
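The landing categories above can be illustrated with a minimal sketch. It assumes plain MAJOR.MINOR.PATCH version strings (pre-releases are out of scope, as in the study). The function names `parse_version` and `classify_landing` are hypothetical and are not taken from the study's tooling.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (an optional leading 'v'/'V' is dropped)."""
    major, minor, patch = (int(part) for part in version.lstrip("vV").split("."))
    return major, minor, patch


def classify_landing(vulnerable: str, adopted: Optional[str]) -> str:
    """Classify a version change by the first component that differs.

    `adopted` is None when the dependency was removed from the client.
    Downgrades are not treated separately in this sketch.
    """
    if adopted is None:
        return "dependency removal"
    old, new = parse_version(vulnerable), parse_version(adopted)
    if new[0] != old[0]:
        return "major landing"
    if new[1] != old[1]:
        return "minor landing"
    if new[2] != old[2]:
        return "patch landing"
    return "no change"


# Versions taken from the Figure 5a example in the text.
print(classify_landing("1.1.0", "1.1.1"))  # package side: patch landing
print(classify_landing("1.0.1", "1.1.2"))  # client side: minor landing
```

The same comparison serves both sides: applied to a package's vulnerable and fixing releases it yields the package landing, and applied to the versions a client depends on it yields the client landing.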
Propagation Influencing Factors -
We define Hop as the transitive dependency distance between a package-side fixing release and any downstream client that has adopted this fix, i.e., one, two, three, and four or more hops. As shown in Fig. 5a, client \(\mathbb {X}\) is one hop away from package \(\mathbb {P}\). We consider two different factors to model and track lags in the propagation:
1. Lineage Freshness: refers to the freshness of the package-side fixing release as inspired by Cox et al. (2015) and Kula et al. (2018a). Figure 5a shows two types of lineage freshness based on the release branches: Latest Lineage (LL), where the client has adopted any package-side fixing release on the latest branch, and Supported Lineage (SL), where the client has adopted any package-side fixing release not on the latest branch. Our assumption is that a package-side fixing release in the latest lineage is adopted faster than a package-side fixing release in a supported lineage, i.e., it suffers fewer lags. Figure 5b shows that three versions of package \(\mathbb {P}\) (V1.0.2, V1.0.3, V1.1.3) are classified as SL.
2. Vulnerability Severity: refers to the severity of the vulnerability, i.e., H = high, M = medium, L = low, as indicated in the vulnerability report (as shown in Fig. 2a from Section 2). Our assumption is that a package-side fixing release with higher severity is adopted quicker, i.e., it suffers fewer lags.
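As a rough illustration of the lineage freshness factor, the sketch below labels a fixing release as LL or SL. It rests on an assumption that the text does not spell out: a release branch is identified here by its (major, minor) pair, and the latest branch is the highest such pair among all known releases. The function names and the `1.2.0` release are hypothetical.

```python
from typing import Iterable, Tuple


def branch_of(version: str) -> Tuple[int, int]:
    """Return the (major, minor) pair used here as the branch key (an assumption)."""
    major, minor, _patch = (int(part) for part in version.lstrip("vV").split("."))
    return major, minor


def classify_lineage(fixing_release: str, all_releases: Iterable[str]) -> str:
    """Return 'LL' if the fixing release sits on the latest branch, else 'SL'."""
    latest_branch = max(branch_of(release) for release in all_releases)
    return "LL" if branch_of(fixing_release) == latest_branch else "SL"


# Hypothetical release history; only 1.0.2, 1.0.3 and 1.1.3 appear in the text.
releases = ["1.0.2", "1.0.3", "1.1.3", "1.2.0"]
for release in releases:
    print(release, classify_lineage(release, releases))
```

If branches were instead tracked per major version or through repository branch names, only `branch_of` would need to change.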
Empirical Evaluation
The goal of our empirical study is to investigate lags in the adoption and propagation. We use these two research questions to guide our study:
(RQ1) Is the package-side fixing release consistent with the client-side fixing release?
Our motivation for RQ1 is to understand whether developers are keeping up to date with the package-side fixing releases. We define package-side and client-side fixing releases as consistent if the client-side fixing release follows the package-side fixing release. For example, the combination of a client minor landing and a package minor landing is consistent, but the combination of a client major landing and a package patch landing is not consistent. Our key assumption is that the inconsistent combination requires more migration effort than the consistent one, which in turn is likely to create lags.
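To make the notion of consistency concrete, the following sketch tabulates how client landings distribute within each package landing. The input pairs, the short labels, and the function names are hypothetical; they only illustrate the bookkeeping and are not data from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def is_consistent(package_landing: str, client_landing: str) -> bool:
    """A pair is consistent when the client landing matches the package landing."""
    return package_landing == client_landing


def landing_distribution(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Return {package_landing: {client_landing: percentage of clients}}."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for package_landing, client_landing in pairs:
        counts[package_landing][client_landing] += 1
    return {
        package_landing: {
            client_landing: 100.0 * n / sum(row.values())
            for client_landing, n in row.items()
        }
        for package_landing, row in counts.items()
    }


# Hypothetical observations: (package landing, client landing) per client.
observations = [
    ("patch", "patch"),
    ("patch", "minor"),
    ("patch", "minor"),
    ("patch", "removal"),
]
print(landing_distribution(observations))
```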
(RQ2) Do lineage freshness and severity influence lags in the fix propagation?
Our motivation for RQ2 is to identify the existence of lags during a propagation. Concretely, we use our defined measures, i.e., propagation influencing factors, to characterize propagation lags. Our assumption is that a package-side fixing release on the latest lineage with high severity should propagate quickly.
Data Collection -
Our data collection consists of (i) vulnerability reports and (ii) the set of cloned npm package and client git repositories. We use the same 2,373 vulnerability reports as in our preliminary study, which were crawled from snyk.io (Snyk 2015). As inspired by Wittern et al. (2016), we cloned and extracted information on npm packages and clients from public GitHub repositories. In this study, we consider only normal dependencies listed in the package.json file to make sure that the packages are used in the production environment. Hence, other types of dependencies, namely (1) devDependencies, (2) peerDependencies, (3) bundledDependencies, and (4) optionalDependencies, are ignored in this study since they will not be installed in the downstream clients in production or cannot be retrieved directly from the npm registry. To perform the lags analysis, we first filter out reports that do not have a fixing release. We then use the package name and its GitHub link from the reports to automatically match cloned repositories.
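The dependency filter described above can be sketched as follows. This is only an illustration of reading the `dependencies` field of a package.json manifest while ignoring the other dependency types; the function name and the example manifest are hypothetical.

```python
import json
from typing import Dict


def production_dependencies(package_json_text: str) -> Dict[str, str]:
    """Return only the normal `dependencies` of a package.json manifest.

    devDependencies, peerDependencies, bundledDependencies and
    optionalDependencies are deliberately ignored.
    """
    manifest = json.loads(package_json_text)
    dependencies = manifest.get("dependencies", {})
    # Guard against malformed manifests where the field is not an object.
    return dict(dependencies) if isinstance(dependencies, dict) else {}


# Hypothetical manifest of a client repository.
example_manifest = """
{
  "name": "client-x",
  "dependencies": {"example-lib": "^1.0.1"},
  "devDependencies": {"example-test-tool": "^2.0.0"}
}
"""
print(production_dependencies(example_manifest))
```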
As shown in Table 3, our data collection included 2,373 vulnerability reports that were disclosed from April 9, 2009 to August 7, 2020. There are 1,290 reports that have already published the fixing releases, which affect 786 different packages. The statistics of vulnerable packages and reports are presented in the table. For package and client repositories, we collected a repository snapshot from GitHub on August 9, 2020 with 152,074 repositories, 611,468 dependencies, and 1,553,325 releases (Table 4).
Table 3 A summary of the data collection used to populate the dataset to answer RQ1 and RQ2
Table 4 A summary of dataset information for the empirical study to answer RQ1 and RQ2
Approach to Answer RQ1 -
The data processing to answer RQ1 involves the package-side fixing release and client-side fixing release extraction. Similar to PQ1, we first identify the package-side fixing release by comparing a vulnerable release and a fixing release. To track the client-side fixing release, we then extract the direct clients’ version history of the vulnerable packages. A client is deemed vulnerable if its lower-bound dependency falls within the reported upper-bound as listed in a vulnerability report.
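The vulnerability check can be approximated with the sketch below. It makes a simplifying assumption: the lower bound of a client's dependency range is taken to be the first MAJOR.MINOR.PATCH version appearing in the range string (true for forms such as `^1.0.1`, `~1.0.1`, `1.0.1`, or `>=1.0.0 <2.0.0`, but not for `*`, `1.x`, or ranges starting with `<`), and the report is assumed to give an exclusive upper bound such as `<1.1.1`. The function names are hypothetical.

```python
import re
from typing import Tuple

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def first_version(text: str) -> Tuple[int, int, int]:
    """Extract the first MAJOR.MINOR.PATCH triple found in `text`."""
    match = _VERSION.search(text)
    if match is None:
        raise ValueError(f"no MAJOR.MINOR.PATCH version found in {text!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def is_vulnerable(client_range: str, vulnerable_below: str) -> bool:
    """True if the client's lower bound lies below the report's upper bound."""
    return first_version(client_range) < first_version(vulnerable_below)


# A client depending on ^1.0.1 when the report marks versions <1.1.1 as vulnerable.
print(is_vulnerable("^1.0.1", "<1.1.1"))
```

A full treatment would resolve npm range semantics with a dedicated semver library rather than a regular expression.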
To ensure quality, we additionally filter out packages and clients that did not follow semantic versioning, as shown in Table 5. Our key assumption is to keep packages and clients that follow a semantic version release cycle, i.e., packages and clients should have all the update patterns of major landing, minor landing, and patch landing. As a result, 4,000 packages and clients were filtered out from the dataset. As shown in Table 4, our final dataset for RQ1 consists of 410 vulnerability reports that affect 230 vulnerable packages and 5,417 direct clients.
Table 5 A summary of the number of filtered clients grouped by their update pattern in RQ1. There are 4,000 packages and clients that are excluded in RQ1
The analysis to answer RQ1 is the identification of lags in the adoption. We show the frequency distribution of client-side fixing releases in each package-side fixing release. In order to statistically validate our results, we apply Pearson’s chi-squared test (χ2) (Pearson 1900) with the null hypothesis ‘the package-side fixing release and the client-side fixing release are independent’. To show the strength of the association between each package-side fixing release and client-side fixing release combination, we investigate the effect size using Cramér’s V (\(\phi ^{\prime }\)), which is a measure of association between two nominal categories (Cramér 1946). According to Cohen (1988), since the contingency table (Table 6) has 2 degrees of freedom (df*), the effect size is interpreted as follows: (1) \(\phi ^{\prime }\) < 0.07 as Negligible, (2) 0.07 ≤ \(\phi ^{\prime }\) < 0.20 as Small, (3) 0.20 ≤ \(\phi ^{\prime }\) < 0.35 as Medium, or (4) \(\phi ^{\prime }\) ≥ 0.35 as Large. To analyze Cramér’s V, we use the researchpy package (Footnote 11).
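For readers who want to see the arithmetic behind these statistics, the following is a hand-rolled sketch of Pearson's chi-squared statistic and Cramér's V for a contingency table of counts, with the df* = 2 thresholds quoted above. It is not the researchpy implementation used in the study, it does not compute a p-value, and the function names are hypothetical. Tables with an empty row or column are not handled.

```python
import math
from typing import Sequence, Tuple


def chi_squared(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (chi-squared statistic, total count) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, n


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    statistic, n = chi_squared(table)
    df_star = min(len(table), len(table[0])) - 1
    return math.sqrt(statistic / (n * df_star))


def effect_size_label(v: float) -> str:
    """Interpret Cramér's V with the thresholds quoted in the text for df* = 2."""
    if v < 0.07:
        return "Negligible"
    if v < 0.20:
        return "Small"
    if v < 0.35:
        return "Medium"
    return "Large"
```

With a table of four client-side rows and three package-side columns, `min(rows - 1, cols - 1)` evaluates to 2, which matches the df* used for the thresholds.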
Approach to Answer RQ2 -
The data processing to answer RQ2 involves the extraction of propagation influencing factors. There are three steps to track downstream clients and classify lineage freshness and severity. First, we build and traverse a dependency tree for each package-side fixing release using a breadth-first search (BFS) approach. The meta-data is collected from each downstream client and includes: (i) version, (ii) release date, and (iii) dependency list, i.e., exact version and ranged version. We then classify whether or not a client is vulnerable using an approach similar to RQ1. Our method involves removing duplicated clients in the dependency tree, which are caused by the npm tree structure. Second, we classify the lineage freshness of a fixing release by confirming whether it is on the latest branch. Finally, we extract the vulnerability severity from the report. As shown in Table 4, our final dataset for RQ2 consists of 617 vulnerability reports, 344 vulnerable packages with fixing releases, and 416,582 downstream clients.
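The traversal step can be sketched as a standard breadth-first search over a reverse-dependency map, recording the hop distance of each downstream client and visiting each client only once (which also removes duplicates). The map layout, the names, and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def hops_from_package(dependents: Dict[str, List[str]], package: str) -> Dict[str, int]:
    """Return {client: hop distance} for every client reachable from `package`.

    `dependents` maps a package name to the clients that depend on it directly.
    """
    hops: Dict[str, int] = {}
    visited = {package}
    queue = deque([(package, 0)])
    while queue:
        node, distance = queue.popleft()
        for client in dependents.get(node, []):
            if client in visited:  # duplicate or cyclic entry: keep the shortest hop
                continue
            visited.add(client)
            hops[client] = distance + 1
            queue.append((client, distance + 1))
    return hops


def hop_bucket(hop: int) -> str:
    """Group hop distances as one, two, three, and four or more."""
    return str(hop) if hop < 4 else ">=4"


# Hypothetical network: X and Y depend on P, Z depends on both X and Y.
graph = {"P": ["X", "Y"], "X": ["Z"], "Y": ["Z"], "Z": ["P"]}
print(hops_from_package(graph, "P"))  # {'X': 1, 'Y': 1, 'Z': 2}
```

Because BFS visits nodes in order of distance, the first time a client is seen is also its shortest hop distance from the package.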
The analysis to answer RQ2 is the identification of lags in the propagation. We show summary statistics of lags in terms of days, i.e., the mean, the median, the standard deviation, and the frequency distribution, with two influencing factors. In order to statistically validate the differences in the results, we apply the Kruskal-Wallis non-parametric statistical test (Kruskal and Wallis 1952). This is a one-tailed test (Footnote 12). We test the null hypothesis that ‘lags in the latest and supported lineages are the same’. We investigate the effect size using Cliff’s δ, which is a non-parametric effect size measure (Romano et al. 2006). Effect sizes are interpreted as follows: (1) |δ| < 0.147 as Negligible, (2) 0.147 ≤ |δ| < 0.33 as Small, (3) 0.33 ≤ |δ| < 0.474 as Medium, or (4) 0.474 ≤ |δ| as Large. To analyze Cliff’s δ, we use the cliffsDelta package (Footnote 13).
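Cliff's δ itself is simple to state: it is the proportion of pairs in which a value from the first group exceeds a value from the second, minus the proportion in which it is smaller. The sketch below is a direct quadratic-time implementation with the thresholds quoted above. It is not the cliffsDelta package used in the study, and the function names and sample values are hypothetical.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta: float) -> str:
    """Interpret |delta| with the thresholds quoted in the text."""
    d = abs(delta)
    if d < 0.147:
        return "Negligible"
    if d < 0.33:
        return "Small"
    if d < 0.474:
        return "Medium"
    return "Large"


# Hypothetical lag samples in days for two groups of clients.
group_a = [10, 20, 30]
group_b = [40, 50, 60]
delta = cliffs_delta(group_a, group_b)
print(delta, magnitude(delta))  # -1.0 Large
```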
Results to the Empirical Study
(RQ1) Is the package-side fixing release consistent with the client-side fixing release?
Our results are summarized into two findings. First, Table 6 shows evidence that most package-side fixing releases are package patch landings. As shown in the first row of the table, we find that 245 out of 410 fixing releases are package patch landings (highlighted in red). We also find that there are 66 package major landings and 99 package minor landings. This finding complements the result of PQ1.
Table 6 A contingency table showing the frequency distribution of client-side fixing releases for each package-side fixing release
Second, Table 6 shows evidence of a dependency between the package-side fixing release and client-side fixing release variables. However, there is no consistency across the package-side fixing release and the client-side fixing release. As highlighted in the Client patch landing row of Table 6, we find that only 21.28% of clients adopt a package patch landing as a client patch landing. Instead, clients are more likely to have client minor landings, i.e., 36.84% of clients (highlighted in red). In the case of a package major landing, 53.61% of clients remove their dependencies to avoid the vulnerability (highlighted in yellow). Among the clients that still adopt the package major landing, most do so as a client major landing (around 43.18%). The only case that we find consistent is the package minor landing, where 50.40% of clients adopt the fix as a client minor landing (highlighted in green).
For the statistical evaluation, we find that there is an association between the package-side fixing release and the client-side fixing release. Table 7 shows that our null hypothesis that ‘the package-side fixing release and the client-side fixing release are independent’ is rejected (i.e., χ2 = 1,484.48, p-value < 0.001). For the Cramér’s V effect size (\(\phi ^{\prime }\)), we obtain a value of 0.37, which indicates a large level of association.
Table 7 A result of statistical test for RQ1
(RQ2) Do lineage freshness and severity influence lags in the fix propagation?
Our results are summarized into two findings. First, Table 8 shows evidence that lineage freshness influences lags in a propagation. As highlighted in red, we find that LL has more lags than SL in terms of days for every hop, e.g., median of lags for the first hop: 164 days > 89 days.
Table 8 A summary statistic of lags in the propagation (# days) categorized by lineage freshness to show the difference between lags in LL and SL
Second, Table 9 shows evidence that vulnerability severity influences lags in a propagation. As highlighted in green, we find that the high severity fixing release has the fewest lags in every hop, e.g., the first hop: 91 days. We also find that the medium severity fix has the most lags, as highlighted in red, e.g., the first hop: 194 days.
Table 9 A summary statistic of lags in the propagation (# days) categorized by vulnerability severity to show the difference of lags between high, medium, and low severity vulnerability fixes
Table 10 A comparison of lags in the propagation between clients that adopt the latest lineage and supported lineage fixing release, i.e., by the median
For the statistical evaluation, we find that the difference between lags in the latest and supported lineages is significant (p-value < 0.001), but with a negligible to small effect size. Table 10 shows that our null hypothesis that ‘lags in the latest and supported lineages are the same’ is rejected in the following cases: the first hop through the more-than-fourth hop for medium severity, the second hop and the more-than-fourth hop for low severity, and the second hop for high severity.