Technical Lag in Software Compilations: Measuring How Outdated a Software Deployment Is
- Cite this paper as:
- Gonzalez-Barahona J.M., Sherwood P., Robles G., Izquierdo D. (2017) Technical Lag in Software Compilations: Measuring How Outdated a Software Deployment Is. In: Balaguer F., Di Cosmo R., Garrido A., Kon F., Robles G., Zacchiroli S. (eds) Open Source Systems: Towards Robust Practices. OSS 2017. IFIP Advances in Information and Communication Technology, vol 496. Springer, Cham
Large software compilations based on free, open source software (FOSS) packages are the basis for many software systems. When they are deployed in production, specific versions of the packages in the compilation are selected for installation. Over time, those versions become outdated with respect to the upstream software from which they are produced, and from the components available in the compilations as well. The fact that deployed components are outdated is not a problem in itself, but there is a price to pay for not being “as much updated as reasonable”. This includes bug fixes and new features that could, at least potentially, be interesting for the deployed system. Therefore, a balance has to be maintained between “being up-to-date” and “keeping the good old working versions”. This paper proposes a theoretical model (the “technical lag”) for measuring how outdated a system is, with the aim of assisting in the decisions about upgrading in production. The paper explores several ways in which technical lag can be implemented, depending on requirements. As an illustration, it presents as well some specific cases in which the evolution of technical lag is computed.
1 From Upstream to Deployment
Many production systems are deployed as collections of FOSS (free, open source software) components. All of them are based on the software produced by the corresponding FOSS projects. And usually, as time passes, those projects deliver new releases with more functionality, more fixed bugs, and in many cases, more overall stability and performance . We will use the term “upstream project” for referring to the project originally producing a FOSS component. Upstream projects release, from time to time, versions of the FOSS components they produce and maintain. This release may be continuous, each time a change is done to the code, or discrete, at specific points in time, when the project considers it convenient . In fact, many projects release in both ways: they release continuously in their source code management system (one release per commit), but they also offer “official” tagged discrete releases. In any case, we will consider the released component as the “upstream released package”.
But it is unusual that upstream packages are directly deployed in production systems. Instead of that, packages coming from software compilations, usually referred to as “distributions”, are used for deployment. We will refer to the packages released as a part of a software compilation as “distribution packages” (to avoid using “compilation packages”, which could be easily mistaken for “package produced as the result of compiling some software”). Distribution packages are produced by adapting upstream packages to the policies and mechanisms defined by the software compilation. That usually makes the deployment of components easier, more coordinated with other components, and in general more uniform. This adaption usually includes changes to the code, with respect to upstream. For example, Debian packages include certain files with information on how to build (produce a binary version from the source code) and install the package, and may include changes to improve or adapt it to the distribution .
The upstream project produces an upstream package. This will be a new upstream release of the FOSS component. This can be just a commit in a Git repository, or a curated official tagged release.
That new upstream package is used by a software compilation as the basis for a new release of their corresponding distribution package. For producing it, upstream code is used, maybe with some patches applied, and some extra files.
Deployers use a certain release of the distribution package to deploy the FOSS component in production.
A real deployment may include hundreds or thousands of FOSS components, each corresponding to a certain release of the corresponding upstream package.
2 Technical Debt and Technical Lag
Each deployment scenario has different requirements with respect to their “ideal” relationship with upstream. But in all cases, if no updating action is performed, they stay static, “frozen in the past”, while upstream evolves, fixing bugs and adding new functionality. The same happens with software compilations with respect to upstream, if they do not release new updated packages for their components.
Depending on the requirements of the final system, and the resources to maintain it, lags of deployed systems with respect to their software compilations, and to the latest upstream packages, can be larger or shorter. For example, in deployments with a large number of components and high stability requirements, updating even a single new package can be a challenge: the whole system has to be tested, since the updated package could break something, specially if it is a dependency to many other packages . Even if upstream developers and compilation maintainers did their own thoughtful testing, some integration bug could be triggered when deployed. A significant amount of effort has to be devoted to upgrading, and tracking the behavior of the system after the upgrade. Besides, in some cases the new version could break some assumption about how it works, affecting the overall functionality or performance. Therefore, every new version has to be carefully examined before it can be deployed.
As time passes, if deployed components are not upgraded, the system misses more and more new functionality and bug fixes: it is not “as good as it could be”. This situation is akin to the one described as “technical debt” for software development. The metaphor of “technical debt” introduced in 1992, tries to capture the problems caused for not writing the best possible code, but code that could (and should) be improved later on . The difference between code “as it should be” and code “as it is” is a kind of debt for the developing team. If technical debt increases, code becomes more difficult to maintain. A similar concept is “design debt”, which translates the concept to the design of software components .
The concept does not try to capture that deployment is not done “as it should be done”. On the contrary, the system “degrades” just with the passing of time, and not because some code needed to be improved when deployed.
Software development is not really involved, since it only happens upstream, and to a certain extent, in software compilations. Only deployment decisions are considered.
The metaphor of the debt is difficult to understand in this case, since it is not some “debt” being acquired at some spot, which has to be returned later. Our case could be more comparable to a tax, paid for not being updated, in the form of less functionality and more bugs that we could have if updating.
To recognize the differences, we are coining a new term, “technical lag”, which refers to the increasing lag between upstream development and the deployed system if no corrective actions are taken. Deployers need to balance the technical lag their systems acquire as time passes, with the effort and problems caused by upgrading activities.
3 Computing Technical Lag for a Deployment
When measuring technical lag, the first problem is to decide what is the “gold standard” with which to compare. Depending on requirements and needs, the comparison may focus on stability, functionality, performance, or something else.
For example, if there is interest in calculating the technical lag of a Debian-based distribution, with a specific interest in stability, we need to find the standard for stability for Debian-based distributions. One choice could be Debian stable (the Debian release which is currently considered “stable”1). In a different case, a system could be interested in being as much up-to-date as possible with respect to upstream, because they are interested in having as much functionality and bugs fixed as possible. In this case, the standard would be the latest checkout for each upstream package.
Once the gold standard is defined, we still need to find out the function to compute the lag between the component in the standard compilation and the deployed component. For example, if the focus is on security, the lag function could be the number of security issues fixed in the standard which have not been fixed in the deployed system. If the focus is functionality, the function could be the number of features implemented in the standard which have not been implemented in the deployed component. Some other interesting lag functions could be the differences in lines of source code between standard and deployed components, or the number of commits of difference between them, if both cases correspond to upstream checkouts.
Therefore, when defining the technical lag for a system, it is not enough to just define the deployment to consider. The standard to compare (or the requirements of the ideal deployment) and the function to calculate the lag between versions of the component need to be defined as well.
4 Formal Definition of Technical Lag
We define the lag aggregation function, LagAgg, as the function used to aggregate the package lags for a set of components.
the distribution selected as the standard distribution to compare
the function used to calculate the lag for each of the components in the deployment
the aggregation function for the lags of the deployed components
5 Calculating Lag Between Packages
After the formal definition of the concept, this section will illustrate with an example how the lag can be computed for a certain component, how results differ depending on the distribution selected as the gold standard, and how they however make sense from a practical point of view. For simplicity, we will work with packages for which upstream is working openly in a Git repository. This allows us to model upstream as following a continuous release process, with each commit in the master branch of the Git repository being a release.
The selected illustrative cases are the acl and Git packages. In the case of acl, we have found 24 packages in the Debian archive (released from 2005 to 2012), while for Git we have found 192 (from 2005 to 2016). Only since 2010 Debian Git packages correspond to the “current” Git package, the popular source code management system. Before 2010, there were 7 packages which corresponded to GNU Interactive Tools, a set of tools for extending the shell. Therefore, only data since 2010 is really relevant, and we consider 185 Debian Git packages.
To estimate the technical lag of each Debian package, we will assume that it is deployed as such, and compared with the current upstream master HEAD checkout at the time of the study (Oct. 2016). Therefore, following the notation in the previous section: \(d_i\) is each of the Debian packages considered; \(s_i\) is the latest upstream continuous release (defined as the HEAD of the master branch in the upstream Git repository); and LagAgg is summation.
different_lines and different_files: number of different lines or files, including those that are present only in \(d_i\) or \(s_i\).
diff_commits: number of commits, following the master branch of the upstream Git repository, needed to go from the most likely upstream commit corresponding to \(d_i\) to the commit corresponding to \(s_i\).
normal_effort: total normalized effort for the commits identified when computing diff_commits. We define normalized effort (in days) for an author as the number of days with at least one commit between the dates corresponding to two commits in the master branch. We define total normalized effort (in days) as the sum of normalized effort for all the authors active during the period between two commits.
The first two lag functions capture how different is the deployed component is from the component in the standard distribution (in our case, the most recent commit upstream). The last two functions capture how many changes (or, to some extent, effort in changing) were applied to the component in the standard distribution since the upstream release used to build the deployed package.
To provide some context, we computed as well common_lines and common_files, which is the number of lines an files in common between \(D_i\) and \(C_i\) (lines exactly the same). Those are not really Lag functions, since they do not fulfill the lagging condition: both grew larger when \(d_i\) and \(s_i\) were closer.
Figures 3 and 4 are more revealing, because they have into account two common practices in Debian: labeling package releases (in part) with upstream version tags, and releasing slightly modified versions for stable distributions.
The first is observed by the different colors and lines in the charts: all Debian packages corresponding to the same major release have been depicted in the same color, and linked with lines. Now, when we look at the charts for acl in Fig. 3, we see how the step in 2009 corresponds to a change in version (from pink to red), which did a major refactoring of the code. That is clearly appreciated in the functions showing common and different lines. In the case of Git, the transition from GNU Interactive Tools (horizontal line in the left) to the “real” Git is now evident.
The second practice is observed for Git in Fig. 4: the red horizontal lines on the right correspond to new releases of “old” packages, fixing some important bugs, since they are still maintained after a long time for some stable distribution. That helps to explain the spikes we saw in Fig. 2: those \(d_i\) are really “out of order” packages.
In all the figures for the same component, the different functions show similar trends. There are differences, but probably any of them would provide enough information for evaluating if the lag is large enough to justify an update of a deployed package.
6 Discussion and Conclusions
Software compilations for FOSS components are usually complex and large, and decisions about when to upgrade specific deployed packages, or whole deployed distributions, is not easy. The complexity of dependency management [7, 8, 9], or their significant evolution over time  are reasons both to delay upgrading (because of the potential problems), and to consider it (because of the added functionality and improved code). The same way that the complexity in dependencies, or the some parameters of their evolution  can be measured, we are exploring the concept of technical lag to measure their “degradation” over time with respect to some “ideal” gold standard.
Defining this degradation requires identifying the “ideal” packages to deploy (the “gold standard” to compare), and finding distance metrics (lag functions) to compare deployed software with that standard collection. To be useful, these metrics should track characteristics linked to requirements of the deployed system. As it was discussed in the first part of this paper, a system interested in stability may define very different metrics and gold standard than one interested in maximum functionality. In this paper we have just explored one kind of ideal distribution (the latest available upstream code), and two kinds of metrics: those based on differences in source code (in terms of lines or files), and those based on the number of changes (either the number of commits or the normalized effort). However, many other could be explored.
In particular, the exploration of criteria to define “gold standards” for general or specific scenarios seems promising. Complete industries, such as automotive, embedded systems or cloud, could be interested in finding standard collections with which to compare any deployment, in a way that they may decide better when and what to upgrade, given a set of requirements.
The definition of lag functions requires careful exploration as well. Some of them may be difficult, because the needed information may be heterogeneous, and distributed. But some seem feasible: the number of bugs fixed, or security advisories addressed; the number of new features implemented; improvements in performance, etc. (obviously, when there are ways of collecting that information). This makes us think that there is a lot of work to do in this area, and that we have not even collected all the low hanging fruits.
In this paper, we have considered that distribution packages are directly deployed in production, and therefore make no real difference between the packages in a distribution, and those packages when deployed. In the real world, packages may be deployed with some differences with respect to the distribution packages used. For example, some patches could be applied to fix known bugs. However, this does not make the model less general: the patched packages can be modeled as a new distribution, based on the “original” one, and all the above considerations will apply.
As a kind of a conclusion, we propose technical lag as useful concept to deal with large FOSS deployments. As real-world systems are increasingly built by assembling large collections of FOSS components, it is evident the need of techniques for managing their complexity. In some areas, such as dependency management or architectural evolution, research has been producing results for many years. But there is little evidence that may help in the system-wide maintenance procedures, including those relatively easy, such as when and what to upgrade. With this paper we propose a new line of research, trying to provide support practitioners in may fields of the industry.
Although we are focused on FOSS compilations, it is interesting to notice that the concept of technical lag can in theory be extended to non-FOSS components. However, in practical terms that may be difficult, except if source code and other related information needed to estimate lag is present. This can be the case in some special cases, such as when a company deploys systems composed by a mix of FOSS and proprietary components, but it has access to all the needed information for proprietary ones.
Acknowledgments and Reproduction Package
The work of Jesus Gonzalez-Barahona and Gregorio Robles has been funded in part by the Spanish Gov. under SobreVision (TIN2014-59400-R), and by the European Commission, under Seneca, H2020 Program (H2020-MSCA-ITN-2014-642954). The research described in this paper was started thanks to a contract funded by Codethink.
All the code and data needed to reproduce the results in this package is available from a GitHub repository (https://github.com/jgbarah/techlag/) (checkout as of December 2016).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.