1 Introduction

A Software Product Line (SPL) is a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way (Clements and Northrop 2002). Prescription is given in terms of both developing for reuse and developing with reuse. This results in the interplay of two distinct activities: (1) Domain Engineering (DE), where the scope and variability of the system are defined and reusable assets are developed that constitute the SPL platform; and (2) Application Engineering (AE), where products (a.k.a. variants) are derived by selecting and resolving variability, i.e., by variability configuration (Pohl et al. 2005). To obtain the full benefits, AE should be limited to configuration without further modifying the derived products (Krueger 2006). However, this is not always possible. Time-to-market pressure, expedited bug fixes, or product specifics lead development to first happen at the product level, with changes later merged back into the SPL platform. Indeed, product-based evolution (a.k.a. variant-based evolution) is acknowledged as a major strategy to drive the evolution of a product line (Krüger et al. 2020). Krüger et al. observe that this way of working switches the typical order of DE and AE, i.e., developers first customize core assets to product requirements (AE) and next consolidate the product upgrades into the platform (DE). The bottom line is that AE is no longer limited to mere configuration but might also involve development. We refer to this practice as ‘product customization’ as opposed to ‘product configuration’, where AE is limited to selecting the features to be exhibited by the product with no development involved.

Product-based SPL evolution might result in tensions between the quality and reuse effectiveness required by DE, and the time-to-market and customer pressure faced by AE. SPL managers need to analyze whether the effort invested in product customization pays off in terms of better SPL scoping (i.e., deciding on the products, technical areas, and functionalities that a product line should support). Questions might arise about how much effort is dedicated to product customization; which core assets are customized the most; or to what extent core-asset code is being reused in a given product. We refer to this endeavor as Customization Analysis, i.e., understanding the functional increments in adjusting products from the last SPL platform release. Customization analysis can help SPL managers in different ways: (1) identify which product developments should be promoted to the core-asset base, (2) spot overloaded product teams with a heavy customization duty, (3) uncover indecisive or exploratory design where developers’ hesitation is reflected in the volatility of the code, or (4) anticipate merging issues when product developments are merged back into the core-asset base. Being able to quantify the customization activity might help managers make informed decisions about both the SPL’s stability and the rearrangement of the SPL task force.

Despite the importance of these decisions, support for customization analysis is rather limited. We delve into this practice by making a case for the use of alluvial diagrams (a.k.a. Sankey diagrams) as a suitable visualization for customization analysis. Specifically, two research questions are tackled:

  • RQ1 (Problem Space): What are the information needs for customization analysis? How much time is needed to fulfill these information needs?

  • RQ2 (Solution Space): Can alluvial diagrams be useful for supporting the visualization of customization analysis?

We address these questions through a joint effort between academia and industry, specifically, Danfoss Drives, a company division in charge of P400, a product line for the software embedded in frequency converters (Fogdal et al. 2016). On these grounds, we adopt an Action Design Research (ADR) approach (Sein et al. 2011). The objective of action research is to solve, or at least explain, the problems of an analyzed situation (i.e., product customization) by the researchers interacting with the participants (i.e., P400 engineers) in the studied situations (i.e., customization analysis). Action research becomes action design research if the problem is tackled through an artifact (i.e., a new method or a new tool) whose design and evaluation is conducted within the organization (Sein et al. 2011). Accordingly, we develop a Visual Analytics tool for customization analysis. In line with the ADR principles, we distill this experience into general outcomes. By doing so, we aim to contribute to the two previous RQs:

  • RQ1. We characterize customization analysis through a Goal-Question-Metric (GQM) model. Based on Danfoss Drives’ practices, we introduce a set of analysis questions, and rate their importance and the time required to answer them (Section 4).

  • RQ2. We make a case for alluvial diagrams as an effective way to realize the previous GQM model. We flesh out this case through CustomDIFF, a visual analytics tool that uses Git as the SPL code repository, and pure::variants as the variability manager (Section 6).

  • RQ2. We distill general principles from first empirical evidence through an expert evaluation at Danfoss Drives (Section 8).

This article is an extension of a REVE’17 workshop paper (Montalvillo et al. 2017), which we extend in three major aspects. First, we considerably expand the description of the phenomenon at hand: product customization. Second, this work is methodologically sounder since it follows the Design Activity Framework proposed for visualization design, and provides an empirical evaluation. Third, this paper focuses on annotation-based SPLs using pure::variants, whereas the REVE paper tackled component-based SPLs using FeatureHouse. By conducting this research in close collaboration with an industrial partner, we hope to facilitate a valuable transfer into practice. We start by introducing the phenomenon under study: product customization.

2 The Phenomenon: Product Customization

Code development during AE (i.e., product customization) has been documented in distinct scenarios: to meet product deadlines and budgets (Deelstra et al. 2005; Jensen 2007; Schackmann and Lichter 2006), to expedite bug fixes (Fogdal et al. 2016), to respond quickly to unexpected changes in customer needs (Nagamine et al. 2016; Carbon et al. 2008; Iida et al. 2016), to decrease reusable-asset complexity for single-product needs (Deelstra et al. 2005; Kircher and Hofman 2012; Bartholdt and Becker 2011), and, finally, because in the transition to a fully-configured SPL, product specifics might remain in product teams (Kodama et al. 2014; Takebe et al. 2009). Even fully-configurable SPLs might reach a scale and complexity that make maintenance in short time spans infeasible. Hence, when organizations are faced with urgent customer or market requests, product-specific adjustments are realized first in the product to be later propagated to the SPL platform (Deelstra et al. 2005).

Implementation-wise, this is normally realized through a version control system. Here, a master branch keeps the core assets, i.e., those assets which are in the SPL baseline. During product customization, this master is allowed to be branched off to support (urgent) product specifics. This leads to a grow-and-prune branching model (Faust and Verhoef 2003). This model states that, during a time-lapse, quick reaction to changes often requires copying and specializing code in branches (grow), to be later cleaned up by refactoring and merged back into the master (prune). The notion of time-lapse is important since product branches are not intended to live for a long time but are to be periodically merged back into the master. In this way, product needs drive the evolution of the SPL.

Fig. 1
figure 1

WeatherStationSPL branching model: the master branch holds the core assets from where SPL products are branched off

As an example, consider the WeatherStationSPL, an SPL for building web-based applications for weather stations. We borrow this example from the instructional material provided by pure::variants.Footnote 1 Let us suppose that this SPL holds a baseline release Baseline-v1.0 that accounts for seven features, clustered around three parent features,Footnote 2 namely: Sensors, which encompasses AirPressure, Temperature, and WindSpeed; Warnings, which comprises Gale and Heat; and Languages, which is the parent feature for English and German. Let us imagine that some urgent customization needs arise that prevent developers from waiting until the next platform release. This causes Baseline-v1.0 to be branched off into three product branches: PR-NewYork, PR-Paris, and PR-Berlin (see Fig. 1). This unleashes the grow-and-prune process:

  • Grow-Customization. AE adjusts core assets to product specifics, potentially evolving the product through different versions (e.g., PR-Paris-v1.0).

  • Prune-Consolidation. Eventually, DE gets integration requests from AE. Failing to reintegrate these product variants back into the master risks product-line engineering degenerating into clone&own development (Krüger and Berger 2020).

It might happen that a customization is of interest but not yet mature enough to be offered to products other than the one that hosts it. Here, DE developers might integrate the adjustment into the platform using spurious features, i.e., transient features that might be exclusively used for the driving product. This allows AE developers to re-generate this product using traditional configuration mechanisms.Footnote 3 This guarantees that the platform is under control, and application projects may maintain their independence by providing product-specific artifacts as new features. After a modification has been evaluated, a judgment is made as to whether the change should be applied to additional products and so incorporated into the core assets (Fogdal et al. 2016). For Danfoss Drives, this decision is taken by the Change Control Board, which includes domain experts but also the application engineers who were involved in the customization in the first place. The choice is not always easy, as evidenced by the so-called configuration oscillation phenomenon (Faust and Verhoef 2003). Here, engineers can be hesitant about which new assets should be promoted to the platform.

The bottom line is that the Change Control Board regulates the tempo of this grow-and-prune process. In this setting, customization analysis helps the Change Control Board identify whether the customization endeavor matches the plan by correlating where the customization is occurring with where the customization is planned to occur. A lot of customization happening in unplanned areas may be an indicator of problematic code, that is, code that continuously requires patching. In general, customization analysis might help foresee distinct problematic scenarios, namely:

  • product units that are under heavy customer pressure in terms of specifics that need to be accounted for. This might lead to reinforcing these units or to rescheduling customer requests,

  • stable features that are passed (almost) untouched when deployed in distinct products. This showcases mature features. This scenario might point to the possibility of diverting programming resources to other units,

  • unstable features, which might require additional refactoring and testing effort to cope with emerging scenarios arising from customer requests. This might lead to reinforcing the domain engineers in charge.

Tracking these scenarios directly from code is time-consuming and error-prone. Tools are needed that abstract away from the codebase. We tackle this challenge as a joint effort between academia and industry. On these premises, we decided to follow Action Design Research as our research methodology (Sein et al. 2011).

3 A Brief on Action Design Research

Sein et al. define Action Design Research (ADR) as a research method for generating prescriptive design knowledge through building and evaluating IT ensemble artifacts in an organizational setting (Sein et al. 2011, p. 40). A key insight is the role played by the organization (i.e., Danfoss) in driving and shaping the design knowledge that ends up being instantiated in the IT artifact (i.e., the CustomDIFF visualization tool). Hence, the term ensemble artifact denotes the artifact taking its full meaning in conjunction with the context where it displays its utility (i.e., CustomDIFF reflects the practice and brings utility to Danfoss). Therefore, ADR conceives artifact design as a result of a researcher-practitioner collaboration within an organization. Figure 2 reproduces ADR stages and principles (Sein et al. 2011).

Fig. 2
figure 2

The ADR Method: the first three stages form an iterative cycle where tasks are intermingled until distilled into the final learnings at the end of the project (taken from Sein et al. 2011). Arrows stand for influential flows

Problem Formulation. The first stage is triggered by a problem encountered in practice or predicted by researchers. It serves as a catalyst for developing a research strategy. This stage draws on two principles: Practice-Inspired Research and Theory-Ingrained Artifact (Sein et al. 2011). The former emphasizes viewing organizational problems as knowledge-creation opportunities. The second principle highlights that the intervention (e.g., the IT artifact) is to be informed by theories, i.e., existing knowledge that grounds design decisions.

Building, Intervention, and Evaluation. This stage builds upon the problem framing and theoretical premises adopted in stage one. These premises provide a platform for generating the initial design of the IT artifact. From here on, the IT artifact is further shaped by organizational use and subsequent design cycles (Sein et al. 2011). Or, using Sein et al.’s principles: reciprocal shaping (i.e., the IT artifact and the organization feed back into each other: prototypes serve to refine the interpretation of the organizational environment, helping achieve a better fit in subsequent versions), mutually influential roles (i.e., researchers and practitioners bring complementary insights), and authentic and concurrent evaluation (i.e., authenticity is a more crucial element for ADR than controlled conditions, thus assessment should take place within the company and throughout the research).

Reflection and Learning. ADR involves more than merely solving a problem for an organization. To guarantee that contributions to knowledge are made, conscious reflection on the problem framing, the theories adopted, and the emerging IT artifact is critical. This principle is termed guided emergence, where ‘emergence’ captures the notion of unanticipated consequences that arise during the intervention in the organization and to which researchers should be sensitive (Sein et al. 2011).

Formulation of Learning. At this point, we reach an artifact that brings with it some premises about the problem framing and the organization setting (i.e., an ensemble artifact). It represents a solution to a problem. Both can be generalized. Sein et al. suggest three levels for this effort: (1) generalization of the problem instance, (2) generalization of the solution instance, and (3) generalization of mechanisms through design principles. Design principles abstract away from the specific IT implementation into the abstract mechanisms that brought the utility, and underlie the solution.

The rest of the paper is structured along these stages.

4 Problem Formulation

Problem formulation draws on two principles: practice-inspired research and theory-ingrained artifact. The former reflects the premise that IT artifacts are ensembles shaped by the organizational context (Sein et al. 2011). Therefore, it is most important to describe the organization whose practices and characteristics will inform the artifact design. The second principle highlights that ADR does not stop at identifying a problem, but provides an intervention to alleviate the problem. This intervention should be informed by existing theories. This section frames this research’s problem along these two principles.

4.1 Practice-Inspired Research

This research builds upon the fifteen-year experience of Danfoss Drives, a company division in charge of P400, a product line for the software embedded in frequency converters (Fogdal et al. 2016). P400 is a member of the SPLC’s Product Line Hall of Fame.Footnote 4 Table 1 characterizes P400 in terms of three main contextual dimensions: the stakeholders, the complexity of the task (i.e., customization analysis), and the setting that frames the problem (i.e., the technical infrastructure and features of the SPL that might be relevant for the problem at hand).

Table 1 Contextual characterization
Fig. 3
figure 3

Variation Points for Sensors.js at Baseline-v1.0. VP1 applies when either WindSpeed or AirPressure is selected. VP2 applies for Temperature. Notice how VP2 is scoped within VP1

Danfoss follows an Annotation-Based Approach to P400 definition. This implies that variations are supported through pre-compilation directives. A directive states when a code block is to be included in the final product based on the presence or absence of a feature selection at configuration time. In pure::variants, these directives start with an opening directive //PV:IFCOND and end with a closing directive //PV:ENDCOND. Figure 3 shows an example. The snippet illustrates two variation points, i.e., VP-1 and VP-2, which correspond to two ifdef blocks. In the example, VP-1 comprises lines 24 to 49, whereas VP-2 spans lines 30 to 46.
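To illustrate how such variation points can be located mechanically, the following Python sketch scans a file for //PV:IFCOND ... //PV:ENDCOND pairs (the directive pair is the one described above; treating the rest of the directive line as the condition text, as done here, is a simplifying assumption):

import re

IF_RE = re.compile(r"//PV:IFCOND(?P<expr>.*)")
END_RE = re.compile(r"//PV:ENDCOND")

def variation_points(lines):
    """Return (start_line, end_line, condition, nesting_depth) for every
    //PV:IFCOND ... //PV:ENDCOND block found in a source file."""
    stack, vps = [], []
    for number, line in enumerate(lines, start=1):
        opening = IF_RE.search(line)
        if opening:
            stack.append((number, opening.group("expr").strip()))
        elif END_RE.search(line) and stack:
            start, condition = stack.pop()
            vps.append((start, number, condition, len(stack)))
    return sorted(vps)

# For Fig. 3, this would report VP-2 (Temperature) with depth 1, i.e., nested in VP-1.
with open("sensors.js") as source:
    for start, end, condition, depth in variation_points(source.readlines()):
        print(f"lines {start}-{end}: {condition} (depth {depth})")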

Fig. 4
figure 4

Visualizing code churn: diffing sensors.js between the copy at the master branch and the customized copy at the productBerlin branch

Danfoss follows a Product-Based Approach to P400 evolution. This implies that DE and AE co-exist not only at the onset but throughout the SPL life-cycle. We aim at understanding how the DE-AE balance fluctuates throughout this life-cycle. Here, finding out customization endeavors involves looking at the differences between core assets (kept in the master branch) and the namesake assets once customized by the product (kept in the product branches). Differences between branches are traditionally spotted through file diffing: DIFF(C0.file, C1.file). Back to the WeatherStationSPL example, Fig. 4 illustrates the case for sensors.js, using the unified format for diff display (van Rossum 2018). This figure illustrates the customization diffing in terms of LOC (Lines Of Code) being changed. For each change hunk, the outcome indicates: the hunk header (i.e., the affected line ranges together with the heading of the function the change hunk is part of), the added lines (denoted by a plus sign with a greenish background), the deleted lines (denoted by a minus sign with a reddish background), and the context (i.e., the three nearest unchanged lines that precede and follow the change). The latter is especially important since it provides the context in which the change happened. However, sensors.js is just one of the thirty files the WeatherStationSPL encompasses. And these thirty files might potentially be changed by any of the three products. This implies \(30 \times 3\) potential DIFFs. Now, let us move to Danfoss Drives.

Fig. 5
figure 5

Code Churn visualization: (Left) Microsoft’s Azure opts for a chronological plot where different color shades account for lines added, deleted or modified over time; (Right) GitHub reflects code additions (above) and code deletions (below) along a common axis on a weekly basis

Danfoss faces scalability issues when conducting customization analysis for P400. P400 holds over 10,000 core assets and 20 products. Though only a fraction of these assets needs to be upgraded by a limited number of products in the interim between two SPL releases, this number is still too high to be handled through traditional DIFF utilities. The complexity of conducting customization analysis is felt to be proportional to the number of products, features, and LOC affected in the interim between two SPL releases. In addition, this complexity might also be impacted by the tangling and scattering degrees that the SPL exhibits. Table 1 collects some figures about these concerns, following the work of Zhang et al. (2013).

For one-off development, traditional DIFF utilities include Microsoft’s Azure DevOps Server and GitHub’s code frequency graph (see Fig. 5). Here, code churn is aggregated in terms of modules such as files, packages, and so on. Yet, these units of aggregation might not fit the abstractions SPL analysts reason about. Therefore, in adopting traditional DIFF utilities, customization analysis faces two main issues:

  • Scalability. Traditional DIFF utilities might not scale to the myriad of artifacts that SPLs exhibit,

  • Abstraction. Traditional DIFF aggregation is conducted in terms of files, while features are crosscutting concerns that need to be abstracted out of a set of variation points spread across different files.

Fig. 6
figure 6

Time effort to answer distinct customization-analysis questions. IDs link to Table 2

The problem can then be stated in terms of current DIFF tools not being suitable for accounting for the Change Control Board’s information needs. To provide some empirical evidence about this problem, we conducted a survey among Danfoss’ engineers about how long it would take them to answer some customization-analysis questions using traditional DIFF utilities. Figure 6 depicts the results. Though the rationales for the questions are not provided until Section 5, this figure evidences that some questions might require a few hours to be answered. And this is just for one question, let alone a full analysis that might well involve several questions. This is especially so for the holistic perspective and the product perspective (see Fig. 6). Here, answering most of the questions required a few hours. This time effort is what motivates this research in the first place.

4.2 Theory-Ingrained Artifact

The previous section makes the case for current DIFF tools not scaling up to SPL analysis needs. After all, DIFF tools are not designed for the analysis of product-based evolving SPLs but for code peering in one-off development. Therefore, the challenge is not so much about the metric itself (i.e., code churn) but about providing the right abstraction at the scale of SPLs. To this end, we resort to theory on Visual Analytics to inform our intervention.

Visual Analytics is defined as the science of analytical reasoning facilitated by interactive visual interfaces (Cook and Thomas 2005). The basic idea is to visually represent the data to allow humans to directly interact with the information, gain insights, and ultimately make better decisions. Broadly speaking, Visual Analytics helps create a path from data to decision. In this process, visualization plays a twofold role (Reddivari et al. 2014):

  • as an abstraction means, by highlighting certain constructs and relationships while ignoring others;

  • as an interaction medium, by supporting the workflows for decision making.

By introducing Visual Analytics, decision makers can focus their full cognitive and perceptual attention on visualization-enabled analytical reasoning while taking advantage of automatic data processing techniques. The Design Activity Framework is a process model for visualization design (McKenna et al. 2014). Once aware of the problem and equipped with the theory, we can now move to the next stage: Building, Intervention, and Evaluation. Figure 7 depicts the three main cycles that ended up in CustomDIFF, the Visual Analytics tool. Next, we elaborate on each of these cycles.

Fig. 7
figure 7

Evolution of the CustomDIFF project. The Y axis stands for the team members. The X axis stands for the evolution over time along the three main cycles, each outputting a distinct artifact (i.e., the GQM model, the developer perspective, and the CustomDIFF prototype)

5 Building, Intervention, and Evaluation: Cycle 1

So far, we have identified information needs whose fulfillment can hardly be satisfied with current tools. Our hypothesis is that dedicated Visual Analytics can improve this situation. Specifically, we adopt the Design Activity Framework to inform our buildingFootnote 5 (McKenna et al. 2014).

5.1 Building & Intervention

The first step is understand, i.e., grasping the problem domain and the target users. This activity aims at acquiring knowledge about the phenomenon of interest (i.e., customization analysis), domain-specific questions, and the types of measurements needed to appreciate this phenomenon (Reddivari et al. 2014). To this end, we resort to a Goal-Question-Metric (GQM) model (Basili et al. 1994) (see Fig. 8).

Fig. 8
figure 8

The GQM model

Goal. Following Basili et al.’s recommendations (Basili et al. 1994), we state our goal as follows: evaluate the effort trend (issue) in product customization (object) from the Change Control Board perspective (viewpoint). Our goal is to trace how much effort is being put into product customization.

Question. Questions are posed to characterize the way the assessment/achievement of a specific goal is going to be performed based on some characterizing model (Basili et al. 1994). For this model, feature and product emerge as natural constructs. Indeed, SPL releases, product configurations, programming assignments, or work positions are commonly described in terms of features and products. Using the notions of ‘feature’ and ‘product’, we can arrange the questions along three distinct scopes (refer to Table 2 for the full list of questions):

  • the feature-focused scope, e.g., for a given feature, which products are customizing it;

  • the product-focused scope, e.g., for a given product, which features have been customized (if a product has made no change, it is not considered);

  • the holistic scope, e.g., what is the whole customization effort from the last platform release.

Measurement. Once the questions have been developed, we proceed by associating the questions with appropriate metrics. Factors to be considered for this selection include (Basili et al. 1994): the quantity and quality of the existing data, and the maturity of the measurements. On these grounds, we resort to code churn, a well-established metric to assess evolution in one-off development (Ajila and Dumitrescu 2007; Hall and Munson 2000).

Code churn is a popular measure to inform about the rate at which the code evolves (Khoshgoftaar and Szabo 1994). The churn for a file over a specified period is computed as: \([LinesAdded]+[LinesDeleted]+[LinesModified]\) (Faragó et al. 2015). Traditionally, this is achieved by diffing files (Schulze et al. 2016): \(DIFF(aFile_{t1},aFile_{t0})\) displays the code churn for file aFile in the interval [t0,t1]. In one-off development, code churn has been extensively used for defect prediction (Nagappan and Ball 2005), to assess code erosion (Ohlsson et al. 1999), or to detect code volatility (Faragó et al. 2015).
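For illustration, the sketch below approximates per-file churn between two revisions with git diff --numstat (an approximation: Git’s line-based diff reports a modified line as one deletion plus one addition, so churn is computed here as additions plus deletions; the repository path and revision names are placeholders):

import subprocess

def file_churn(repo, rev_from, rev_to):
    """Approximate per-file code churn between two revisions as
    lines added + lines deleted, using git diff --numstat."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--numstat", f"{rev_from}..{rev_to}"],
        capture_output=True, text=True, check=True).stdout
    churn = {}
    for row in out.splitlines():
        added, deleted, path = row.split("\t")
        if added == "-":        # binary files carry no line counts
            continue
        churn[path] = int(added) + int(deleted)
    return churn

# e.g., churn of a product branch w.r.t. the master baseline
print(file_churn(".", "master", "productBerlin"))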

Moving back to SPLs, code churn could also be a valuable metric for customization analysis: \(DIFF(aFile_{t1},aFile_{t0})\) where t0 stands for the time when the product branch is branched off the master, and t1 corresponds to a time before the product branch is merged back into the master. Though the notion of churn might be appropriate, the object of the churn is not. Rather than files, the GQM’s questions are posed in terms of ‘features’ and ‘products’. Mechanisms are needed that move from file-based diffing to higher-level abstractions. Specifically, consider DIFF, a function that returns the code churn for aFile as it is kept in the master branch (i.e., aFile.core) vs. how it has been customized in a product branch (aFile.aProduct). Rather than DIFF(aFile.core, aFile.aProduct), we long for DIFF(aFeature.core, aFeature.aProduct) utilities that abstract out dozens of DIFF(aFile, aFile) for those aFiles that realize aFeature as it is being customized for aProduct. That is, DIFF(aFeature.core, aFeature.aProduct) outputs the code-churn aggregate of customizing aFeature for aProduct, no matter the files aFeature is spread over.

On these grounds, we could define three sorts of aggregates to measure the customization diffing (see Fig. 8), namely: (1) feature-focused DIFF (i.e., DIFF(aFeature, *)), where aFeature stands for the set of aFiles realizing this feature, and ‘*’ indicates no matter the product branch; (2) product-focused DIFF (i.e., DIFF(*, aProduct)), where aProduct stands for the set of aFiles realizing this product, and ‘*’ indicates no matter the feature; and (3) holistic DIFF (i.e., DIFF(*, *)), where ‘*’ stands for all aFiles, no matter the feature or the product. These aggregates compute the sum of the code churn of the underlying files involved.
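As a minimal sketch of these aggregates (the churn records below are hypothetical and assume that per-file churn has already been attributed to (product, feature) pairs, e.g., via the variation points the changed lines fall into):

# Hypothetical churn records: (product, feature, file, churn in LOC)
records = [
    ("productParis",  "AirPressure", "sensors.js", 31),
    ("productBerlin", "AirPressure", "scale.js",    5),
    ("productBerlin", "WindSpeed",   "sensors.js", 12),
]

def diff(feature="*", product="*"):
    """Feature-focused DIFF(aFeature, *), product-focused DIFF(*, aProduct),
    or holistic DIFF(*, *) code-churn aggregate."""
    return sum(churn for p, f, _, churn in records
               if feature in ("*", f) and product in ("*", p))

print(diff(feature="AirPressure"))    # DIFF(AirPressure, *)   -> 36
print(diff(product="productBerlin"))  # DIFF(*, productBerlin) -> 17
print(diff())                         # DIFF(*, *)             -> 48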

5.2 Evaluation

ADR promotes continuous assessment of the intervention (principle 5: authentic and concurrent evaluation) (Sein et al. 2011). The intervention is not only the IT artifact but also the design principles this artifact realizes. This includes the information needs the IT artifact aims to fulfill. This section provides a first validation of the GQM model.

Table 2 Validating the GQM. Rating the importance of information needs using a LIKERT scale from 1 (Not Important) to 5 (Very Important)

Participants. Participants were selected who had at least one year of experience at Danfoss Drives. Among the eight participants that took part in this evaluation, three had 10 years of experience, while the other five accounted for 9, 7, 6, 3, and 1 year of experience, respectively.

Process. A questionnaire was prepared to assess the relevance of the above-mentioned GQM model in terms of the importance given to the different questions raised by it. A face-validity check was conducted, that is, we checked whether the questionnaire seemed to correspond to the GQM model. A first draft of the questionnaire was prepared by the authors and next, the questionnaire was delivered to this paper’s Danfoss Drives author. Based on his comments, some amendments were made to clarify the purpose and adapt the terminology to that of the practitioners. Next, practitioners were requested to indicate the importance given to each question using a LIKERT scale from 1 (Not Important) to 5 (Very Important).

Results. Table 2 shows the results. Some conclusions can be drawn from them:

  • Perspective-wise, both the feature-focused and the product-focused perspectives are rated similarly. The highest rated questions are “for the feature F1, which products are customizing it” (avg. 4) and its sibling, i.e., “for the product P1, which are the features being customized” (avg. 4.12). The lowest rated questions correspond to the fine-grained holistic perspective. This might be due to this information being better captured from either the feature perspective or the product perspective.

  • Aggregation-level-wise, we obtained quite an unexpected result: intermediate aggregates were not prioritized. When pondering analysis needs, participants seem to favor either a general overview of the customization diffing or, instead, being able to dive into the specifics.

The bottom line is that all questions rate above 2.5, with four questions going beyond 3.5 (i.e., above “Moderately Important”). This provides first evidence about the interest in customization analysis. Yet, Visual Analytics does not stop at identifying the right data. Making better decisions also depends on the ability to adequately understand and communicate the measurements to decision-makers. This moves us to the next cycle.

6 Building, Intervention, and Evaluation: Cycle 2

The basic idea is to visually represent the information, allowing humans to interact directly with such information, to gain insight, to draw conclusions, and to ultimately make better decisions. Figure 9 depicts the main ingredients of Visual Analytics tools (Reddivari et al. 2014). For our purposes, the Data is kept in a Git repository. This data corresponds to code files whose versions are arranged w.r.t. the Git version control system model. This code is pre-processed to distill the metric used to assess the customization effort (i.e., code churn). As a result, code churn is obtained and described according to a Model that serves the information needs identified in the GQM analysis. This Model ends up being realized through a database that acts as the back-end for a Web application that supports the Visualization strategy. Main architectural components include: a mining component that extracts data from Git repositories; a database that holds the mined data along with the data Model; and a front-end component that queries the database and displays results using appropriate visualization means.

Fig. 9
figure 9

Visual analytics main interactions (adapted from Reddivari et al. (2014))

This section instantiates this architecture for CustomDIFF, a web-based tool for customization analysis. CustomDIFF uses pure::variants (Pure-Systems 2018) and Git as the variability management tool and version control system, respectively. Three additional resources are made available:

  • an interactive online version of CustomDIFF which the reader is encouraged to access: http://customdiff.onekin.org/,

  • a video describing CustomDIFF (6’): https://vimeo.com/577936099,

  • a Zenodo replication package for the CustomDIFF implementation: https://doi.org/10.5281/zenodo.5728000

6.1 Building & Intervention

6.1.1 The Model

Dimensional Modeling is a data structure technique specifically designed for data storage when used for decision making. This implies a sharp distinction between two sorts of tables: “fact” and “dimension” (see Fig. 10). The “fact” table collects the events of the phenomenon under study. For our purposes, the phenomenon under study is “product customization”. We consider an event of this phenomenon to occur whenever a consecutive deletion/addition of code happens in a file. Each such change gives rise to a fact tuple. Fact properties include: the number of lines added, the number of lines deleted, and the actual code being changed (custom_diff) in a customization episode.

Fig. 10
figure 10

CustomDIFF’s data model

Facts are the finest grain of the customization endeavor. Obtaining a higher-level perspective of the customization endeavor requires these facts to be aggregated along different dimensions: “the what” (i.e., the variation point being affected by the customization), “the where” (i.e., the product in which the customization occurred), “the when” (i.e., the time of the product release), and “the who” (i.e., the developers who conducted the customization).
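A minimal sketch of this star schema as plain Python types (field names are illustrative assumptions; the authoritative schema is the one in Fig. 10):

from dataclasses import dataclass

@dataclass
class CustomizationFact:        # one tuple per customization change
    lines_added: int
    lines_deleted: int
    custom_diff: str            # the actual code being changed
    variation_point_id: int     # "the what"  -> variation-point dimension
    product_id: int             # "the where" -> product dimension
    release_id: int             # "the when"  -> product-release dimension
    developer_id: int           # "the who"   -> developer dimension

@dataclass
class VariationPoint:           # "the what" dimension
    id: int
    feature_expression: str     # e.g., "WindSpeed or AirPressure"
    file: str
    package: str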

Figure 10 displays the database schema. This schema will be instantiated from the codebase of the SPL at hand. The SPL codebase is held in a Git repository where the master branch contains the core-asset baseline while products branch off the master. This moves us to the next sub-section.

6.1.2 The Data

In data warehousing, ETL, which stands for extract, transform and load, is a data integration process whereby data is extracted from data sources (that are not optimized for analytics), and moved to a central host (which is). In our setting, ETL mines the SPL’s Git repository, runs the corresponding diffing, and populates the tables. Specifically, facts (i.e., customization_fact tuples) are obtained by working out a DIFF between the namesake artifacts of the master branch and the product branches. For the DIFF depicted in Fig. 4, two facts would be obtained. Fact #1 would stand for the changes introduced in line 29, whereas Fact #2 would correspond to those changes introduced in lines 34–38. Details about the mining algorithm are provided in the Appendix. A related approach is described by Zhang et al. (2013).
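The Appendix details the actual mining algorithm; the sketch below only conveys the idea (it reuses the variation_points() helper sketched in Section 4.1, attributes each hunk to the innermost enclosing variation point by the hunk’s starting line, and simplifies branch and file handling):

import subprocess

def changed_hunks(repo, product_branch, path):
    """Yield (start_line, hunk_text) for each change hunk of
    DIFF(master/path, product_branch/path), using zero context lines."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "-U0", f"master..{product_branch}", "--", path],
        capture_output=True, text=True, check=True).stdout
    start, hunk = None, []
    for line in out.splitlines():
        if line.startswith("@@"):
            if hunk:
                yield start, "\n".join(hunk)
            # header looks like: @@ -26,3 +26,4 @@ optional heading
            start = int(line.split("+")[1].split(",")[0].split(" ")[0])
            hunk = [line]
        elif hunk:
            hunk.append(line)
    if hunk:
        yield start, "\n".join(hunk)

def customization_facts(repo, product_branch, path):
    """Emit one fact per hunk, tagged with the innermost enclosing variation point."""
    with open(path) as source:                       # product-branch copy of the file
        vps = variation_points(source.readlines())   # helper sketched in Section 4.1
    for start, hunk_text in changed_hunks(repo, product_branch, path):
        enclosing = [cond for s, e, cond, _ in vps if s <= start <= e]
        yield {"product": product_branch, "file": path,
               "feature_expression": enclosing[-1] if enclosing else None,
               "custom_diff": hunk_text}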

6.1.3 The Visualization

This section makes a case for the use of alluvial diagrams (a.k.a. Sankey diagrams) as a suitable visualization for customization analysis, measured in terms of code churn from diffing. Broadly, we advocate conceiving the customization diffing as an energy flow. The metaphor of energy flow is used in domains familiar with the dynamics of mass flow (e.g., energy, capital, transportation). Examples can be found for resource use (Lupton and Allwood 2017), energy flows (Schmidt 2008; Subramanyam et al. 2015) or material flows (Schmidt 2008), to name a few. For our purposes, the flow stands for the customization energy spent on attending products’ specifics. This flow moves from products to features, and from there, it percolates down to the files where these features end up being realized. For flow display, alluvial diagrams are commonly used.

Fig. 11
figure 11

Customization diffing is visualized as flows from products (as customization emitters) to features (as customization sinks). For example: ProductLondon-v1.0 requires customization for three features: WindSpeed, AirPressure & Temperature

We resort to alluvial diagrams to factor out the customization effort. Figure 11 shows the case for the WeatherStationSPL. On the left, the flow emitters: the products where AE spends energy to account for their specifics. On the right, the flow sinks: the features where the energy produced during AE ends up. Arrows connect products with those features being the subject of a customization intervention.

Nodes are the endpoints of the flow. The larger the node area, the larger the flow potential. If the node stands for a product, then the node width corresponds to the effort in customizing this product. If the node stands for a feature, then the node width corresponds to the effort in adjusting this feature, no matter the product. On the other hand, arrows stand for the flow between nodes. If an arrow connects product P to feature F, then the arrow’s width captures the effort involved in adjusting F to P’s demands.
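As an illustration of how such a diagram can be rendered (a sketch using the plotly library with hypothetical churn values; CustomDIFF’s actual front-end is not bound to this library):

import plotly.graph_objects as go

# Hypothetical product-to-feature churn flows (in LOC)
labels = ["productParis", "productBerlin", "AirPressure", "WindSpeed"]
links = dict(
    source=[0, 1, 1],   # indices into labels (products)
    target=[2, 2, 3],   # indices into labels (features)
    value=[31, 5, 12],  # code churn carried by each flow
)

fig = go.Figure(go.Sankey(node=dict(label=labels), link=links))
fig.show()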

Alluvials pursue a prompt answer to the GQM’s questions. Take Fig. 11 as an example. Which features are being customized? AirPressure, Temperature, WindSpeed, Gale and German. Which products are customizing features? productBerlin, productLondon, etc. How much effort (i.e., code churn) has been spent on customizing the feature AirPressure? This is reflected in the node width; the tool also shows the customization diffing (i.e., 36 LOC) on mouse over. How much effort has productParis spent in customizing AirPressure? This is reflected in the arrow width; mouse over to get 31 LOC. This anecdotal evaluation looks promising. Yet, there exists a first stumbling block: scalability.

SPLs can hold hundreds of features that can be combined into thousands of products. Although not every feature/product is involved in each customization cycle, the visualization might become cluttered for a large number of features or products. Filters and grouping are the most common techniques to reduce the number of nodes. Filters limit the flow to either the products or the features that meet the filtering criteria. Grouping permits products/features to be grouped into clusters. Products could be grouped based on product units. A product unit might be in charge of one or several products. Grouping permits this effort to be visualized for the whole unit. Likewise, features might be grouped based on their parent features so that the effort is displayed for the whole set of child features. The grouping of nodes also implies a grouping of the adjacent edges. Figure 12 shows groupings for the case of features. Now the flow concentrates along the WeatherStationSPL’s parent features, i.e., Sensors, Languages, and Warnings.

Fig. 12
figure 12

Grouping. The Feature bar aggregates values by parent features. Likewise, the Product bar could also aggregate values by product units

6.2 Evaluation

To assess the suitability of alluvial diagrams, informal demo sessions were conducted. Interestingly enough, during the demo sessions, two sorts of clusters, similar to those that emerged during the understand evaluation phase, started to surface: coarse-grained (preferred by the Change Control Board) and fine-grained (preferred by developers). At the onset, researchers were mainly concerned about an intervention for correlating where the customization was occurring with where the customization was planned to occur. Here, the stakeholders are SPL managers interested in tracing the balance between DE and AE. Yet, as the interaction with practitioners advanced, a new role started to emerge, i.e., developers, who were concerned not only about the high-level customization representation but also about the code behind it. The rationale rests on the grow-and-prune model. One-off development is more about growing than about pruning. That is, one-off development does not face later consolidation of upgrades into a common platform (i.e., the master). Applications evolve at their own pace. By contrast, product developers at SPLs are well aware that sooner or later their upgrades need to be merged back into the SPL platform. The bottom line is that developers’ concerns include not only customizing but also consolidating the upgrade. The alluvial diagram so far seems to be appropriate to capture the customization diffing but falls short of assessing the consolidation effort. This requires moving down to code, to how features are fleshed out.

7 Building, Intervention, and Evaluation: Cycle 3

This section moves the developer perspective to the forefront. The alluvial diagram so far might be sufficient for managers to evaluate the effort trend in product customization (i.e., the GQM model in Fig. 8). Yet, developers do not stop at the big picture. When it comes to calibrating the consolidation effort, developers might need to go down to how features are both spread across distinct classes (i.e., scattering) and mixed up with other features (i.e., tangling).

7.1 Building & Intervention

CustomDIFF needs to cater for scattering and tangling. Nevertheless, the introduction of additional details might lead to cluttered interfaces. Hence, we should care not only about the visualization as such, but also about the interaction workflows that allow users to smoothly transition between the distinct perspectives. Accordingly, this subsection is structured along these two concerns: fine-grained visualization and workflows.

Fig. 13
figure 13

Scattering visualization. Two additional Package bars are added at each extreme to expand the flow down to files. For example: customization on AirPressure (no matter the product) goes across files scale.js, sensors.js & settings.js

7.1.1 Extending The Visualization

So far, alluvial diagrams stopped at products and features. This alluvial is now extended to account for scattering, tangling and code peering.

Scattering. We resort to extending the scope of the alluvial flow by incorporating the package bar at both ends (see Fig. 13). Details follow:

  • On the right side, the package bar extends the flow from feature to file so that DE developers can have a piecemeal perspective on how their core assets are being adjusted. This permits an estimation of the eventual consolidation effort. For instance, the code churn for file scale.js is three times larger than the one for sensors.js. Accordingly, engineers might now foresee that merging scale.js will most likely demand more time than merging sensors.js.

  • On the left side, the package bar extends the flow from product to file so that product units can assess which other units are working on the same files. This permits promptly spotting overlapping risks. For instance, developers at productBerlin-v1.0 and productSeville-v1.0 might arrange a meeting to prevent redundant efforts. This, in turn, might alleviate the burden at consolidation time by smoothing out their differences in advance.

Tangling. We resort to grouping (see Fig. 14). So far, the Feature bar captures the customization diffing along the features being updated. Nodes might account for parent features that can next be broken down into their child features. We can further break down feature nodes into Feature Sibling (FS) Sets, i.e., sets of features that appear together in at least one ifdef directive, no matter the boolean expression that links them together. Figure 14 shows the case of WindSpeed. This feature is broken down into two FS sets. First, the WindSpeed set, which accounts for modified ifdef blocks with WindSpeed as the only feature in their pre-compilation directive. Second, the AirPressure - WindSpeed set, which groups ifdef blocks where these two features are referred to in their directives, no matter the boolean operator. The flow from productBerlin-v1.0 to AirPressure - WindSpeed depicts the tangled effort made to evolve the AirPressure and WindSpeed features. By zooming into FS sets, developers can gain a first insight into the extent of tangling along with the customization diffing.
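A minimal sketch of how FS sets can be derived and used to aggregate churn (feature names are extracted from the directive condition by simple tokenization, which approximates how expressions are actually parsed; the churn figures are hypothetical):

import re
from collections import defaultdict

FEATURES = {"WindSpeed", "AirPressure", "Temperature", "Gale", "Heat",
            "English", "German"}

def fs_set(condition):
    """Feature Sibling set: the features named in an ifdef directive,
    regardless of the boolean operators that link them."""
    return frozenset(token for token in re.findall(r"\w+", condition)
                     if token in FEATURES)

# Hypothetical (product, directive condition, churn) records
churn_by_fs = defaultdict(int)
for product, condition, churn in [
        ("productBerlin-v1.0", "WindSpeed", 9),
        ("productBerlin-v1.0", "AirPressure or WindSpeed", 14)]:
    churn_by_fs[(product, fs_set(condition))] += churn

for (product, fs), churn in churn_by_fs.items():
    print(product, sorted(fs), churn)   # e.g., ['AirPressure', 'WindSpeed'] 14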

Fig. 14
figure 14

Tangling visualization for feature WindSpeed. Customizations involving #ifdef blocks that include WindSpeed in their directives are broken down into two FS sets: ‘FeatureSibling: WindSpeed’ and ‘FeatureSibling: AirPressure-WindSpeed’

Code Peering. CustomDIFF sticks to the traditional DIFF view for code peering. Yet, some subtle changes are needed. Traditionally, the DIFF context refers to the three nearest unchanged lines that precede and follow the change (see Fig. 4). The context serves as a reference to locate the places of the changed lines. However, this might not be enough for variability-intensive code. Here, the code holds variation points where a pre-compilation directive regulates whether the code block is to be included in the final product. This pre-compilation directive is a main contextual clue to know which features are affected. Yet, these pre-compilation directives sit right at the start of the block, potentially away from where the change has occurred, and hence, these directives might not show up in a traditional DIFF context. Figure 4 illustrates this situation. The change is located at line 29. However, the context (i.e., lines 26, 27, 28) does not include the pre-compilation directive. This deprives engineers of promptly knowing which features are affected. Therefore, feature-minded DIFF utilities should include pre-compilation directives as part of the DIFF context. Figure 15(c) mimics the case of Fig. 4, but now information about the pre-compilation directive is included in the hunk headings. The hunk corresponds to the changes in lines 43 and 50 under the scope of VP-1 (WindSpeed or AirPressure). In the case that VP-1 is nested within another variation point (e.g., VP-2), this is reflected in the hunk’s heading following the pattern ‘<enclosed VP> --> nested into --> <enclosing VP>’. In short, pre-compilation directives should become a main ingredient of the DIFF context for variability-intensive code.
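A short sketch of how such a hunk heading could be composed (reusing the variation_points() helper sketched earlier; the heading pattern follows the one described above):

def vp_heading(vps, changed_line):
    """Build the feature-minded hunk heading from the chain of variation
    points enclosing a changed line, innermost first."""
    chain = [condition for start, end, condition, _ in sorted(vps)
             if start <= changed_line <= end]
    return " --> nested into --> ".join(reversed(chain))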

7.1.2 The Workflows

Based on insights gained during the GQM evaluation, CustomDIFF considers two main perspectives: coarse-grained (preferred by the Change Control Board) and fine-grained (preferred by developers).

The Change Control Board Workflow. Here, stakeholders are interested in correlating where the customization is occurring with where the customization is planned to occur. They aim at differentiating between mature features (no or minimal customization needed) vs. immature features (recurrent customization required). They could start by displaying the alluvial diagram for the selected features at the highest abstraction level (see Fig. 12). Next, they can click on the grouping button to unfold parent features into their child features (see Fig. 11), or, even further, move down to the feature-sibling sets in which the feature participates (see Fig. 14). Notice that alluvials are abstracted from the customized code; hence, only products, features, or variation points that have been customized will appear in the alluvial. Additional details can be obtained by extending the flow to the right/left by introducing the Package bar alongside the Feature/Product bar (see Fig. 13).

One key characteristic of interactive alluvials is the tracing of flows throughout the graph. Users may select a node or edge, and the contributions of all flows are highlighted and moved to the foreground. In this way, interactive alluvials draw the attention of the analyst to the largest flows, the largest consumer, or the main flow deviations or losses.

Fig. 15
figure 15

The Developer workflow. Moving down to code through hypertext navigation along with three HTML pages: (a) alluvial diagram; (b) gateway panel; (c) DIFF View; and (d) back to the alluvial diagram for a related feature

The Developer Workflow. Here, stakeholders are interested in assessing the complexity of integrating product branches back into the master. This might require moving down to code, i.e., from alluvials to DIFF views. This is achieved through a ‘gateway panel’. As an example, consider that we would like to get insights into the effort of merging customizations on WindSpeed back into the master. We start by filtering upon WindSpeed (see Fig. 15a). Four products turn out to be responsible for having customized WindSpeed: productBerlin, productLondon, productParis, and productSeville. By displaying the Package bar, we also note that two files are being affected: scale.js and sensors.js. The visualization so far might help to get a global view, yet it is insufficient for developers who need to foresee the integration difficulties ahead. To this end, both the nodes and the arcs of alluvials are turned into hyperlinks. Click on an arrow, and a new browser tab opens a gateway panel. This panel displays a list of the affected ifdef blocks (Fig. 15b). Click on an ifdef block for the corresponding code to appear in the DIFF view (Fig. 15c). In the example, the DIFF view shows that WindSpeed is tangled with other features (e.g., AirPressure). Hence, consolidating WindSpeed might also impact AirPressure. The developer might now wonder to what extent AirPressure is being customized, which can be analyzed by moving back to an alluvial view. This workflow might then need to move back and forth between the alluvial view and the DIFF view. To this end, the DIFF view is turned into a hypertext, i.e., feature names in the DIFF context (variation points) are turned into URLs. Click on a URL, and a new browser tab shows the alluvial view for the feature at hand (Fig. 15d). In this way, a CustomDIFF session is realized as a succession of browser tabs where alluvial views and DIFF views intermingle.

7.2 Evaluation

According to the GQM in Fig. 8, we aim at evaluating the effort trend in product customization. This is broken down into three perspectives: holistic, feature-centric, and product-centric. These are the information needs. Yet, the added value of visual analytics goes beyond the information needs to care also for the way this information is delivered. Therefore, this section evaluates CustomDIFF from the analytical side (i.e., usefulness) as well as from the visual side (i.e., ease of use). Perceived usefulness is the degree to which a person believes that using a particular system would enhance his or her job performance. On the other hand, perceived ease of use is the degree to which a person believes that using a particular system would be free of effort (Davis 1989). Accordingly, we can state our evaluation goal as:

Assess the ease of use and usefulness of CustomDIFF with respect to conducting customization analysis from the point of view of SPL practitioners in the context of Danfoss Drives.

The question remains how to conduct the evaluation. Sein et al. emphasize that authenticity is a more important ingredient for ADR than controlled settings (Sein et al. 2011). For an authentic evaluation, Sjøberg et al. (2002) introduce three factors: (1) realistic participants, (2) realistic tasks, and (3) a realistic environment. Sjøberg et al. recognize the difficulty of meeting all factors simultaneously, given the incipient nature of the interventions being evaluated. SPLs are no exception, quite the contrary. In our case, it was not possible to use real tasks from Danfoss Drives, but we could tap into engineers from Danfoss Drives to tackle complex-enough tasks upon the WeatherStationSPL. This subsection reports this expert evaluation.

7.2.1 Participants

Our evaluation derives its value from the expertise of the participants in handling product-based SPL evolution. As customization analysis allows for different perspectives (feature vs. product) that might depend on the participants’ role, the selection was balanced between DE and AE. To attain this goal, we had the invaluable collaboration of a person with managerial responsibilities in Danfoss Drives, who encouraged participation through an open call. This call stated that participation was voluntary and that results would be gathered anonymously. No reward was offered except a coffee after the evaluation. Six people were able to participate: 1 product release manager, 3 software developers with a hybrid role in both domain and application engineering tasks, and 2 code reviewers in charge of branch integration. Participants’ average experience at Danfoss Drives was 7 years.

7.2.2 Measurement Tool

Perceived Ease of Use We resorted to a questionnaire based on the Technology Acceptance Model (TAM) (Davis 1989) (see Table 4). Questions are rated on a LIKERT scale from 1 (strongly disagree) to 7 (strongly agree).

Perceived Usefulness We could have resorted again to the TAM questionnaire. However, TAM’s perceived usefulness questionnaire has been criticized for being too general (Hornbæk and Hertzum 2017). Hence, we decided to create our own questionnaire (see Table 5). This questionnaire was checked for reliability. The reliability of any given measurement refers to the extent to which it is a consistent measure of a concept, i.e., the extent to which the questionnaire’s items all reflect the same concern. Cronbach’s alpha is one way of measuring the strength of that consistency. The Cronbach’s alpha for the questionnaire in Table 5 is 0.93. A generally accepted rule is that an \(\alpha\) of 0.6-0.7 indicates an acceptable level of reliability, and 0.8 or greater a very good level (Ursachi et al. 2015).
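For reference, given \(k\) questionnaire items with variances \(\sigma^2_{Y_i}\) and total score variance \(\sigma^2_X\), Cronbach’s alpha is computed as \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{Y_i}}{\sigma^2_X}\right)\).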

7.2.3 Procedure

The procedure was designed with two goals in mind. On the one hand, the tasks presented to participants needed to be meaningful enough to allow them to understand CustomDIFF and its application for visual analytics. On the other hand, given that participants have a busy agenda, it had to be limited in time so as not to disrupt their jobs more than necessary. To this end, the WeatherStationSPL was used as the running example. This SPL is included in the pure::variants instructional material, and hence, participants were already familiar with it. Consequently, participants could focus on customization issues rather than spending time understanding the SPL domain itself.

Table 3 Tasks for the expert evaluation

For the evaluation, three product variants were created: productParis, productBerlin, and productNewYork, each with a set of customizations. Customizations were designed to mimic the complexity that could be found in Danfoss in terms of affected features and products.

Upon this setting, participants were requested to conduct distinct tasks to assess the three perspectives identified in Section 5: the holistic perspective, the feature perspective, and the product perspective (see Table 3). These tasks are deemed representative of the information needs identified in Table 2.

The evaluation was conducted in two sessions. The first session presented CustomDIFF to participants (1h 45’). CustomDIFF’s rationale and operations were introduced with the help of the WeatherStationSPL. The second session was a hands-on experience, where participants explored CustomDIFF on their own (1h 30’). The aforementioned sample cases were introduced. Next, participants were asked to fill in two online questionnaires to assess CustomDIFF’s usefulness and ease of use. Due to agenda constraints, participants were divided into two groups, with 2 and 4 participants each. During the sessions, a researcher observed the participants’ interactions with the tool. Participants raised questions, doubts, and comments that were noted by the researcher.

Table 4 CustomDIFF’s perceived ease of use

7.2.4 Results

Ease of Use (see Table 4) Participants rated CustomDIFF with an average of 5.44. Although results seem to suggest that CustomDIFF is approachable enough, the lowest ranked question was the second one, which seems to suggest that some analysis workflows might not be as direct as expected. In addition, the researcher observing the participants noted a caveat for the interaction with CustomDIFF. Specifically, the difference between node clicking and arrow clicking was not apparent, and some participants expressed confusion. For instance, when looking at the customization effort undertaken in productParis to adjust the Sensors feature, some participants first clicked on the productParis node. Although participants were not expected to write anything concrete, just to explore the customization, direct observation of the participants showed that not all of them selected the Sensors-productParis arrow. Some participants first clicked on productParis and, in a second interaction, determined the adjustment for Sensors.

Table 5 CustomDIFF’s perceived usefulness. LIKERT scale ranges from 1 (strongly disagree) to 7 (strongly agree)

Usefulness (see Table 5) The questionnaire aims at capturing usefulness for conducting the tasks at hand. Since these tasks have first been validated as relevant for customization analysis (through the GQM model), they might serve as a sort of benchmark against which analysis tools can be evaluated. On these grounds, CustomDIFF yields good results, ranging from 6 (U1) to 5.33 (U6). That said, a difference can be appreciated between the questions aimed at the Change Control Board (coarse-grained: U1, U2, U3, U4) and those aimed at application engineers (fine-grained: U5, U6). The latter yield a lower average and a larger standard deviation. This might be due to Change Control Board members appreciating the novelty of alluvial diagrams, while application engineers did not see major differences w.r.t. the traditional DIFF view.

These results seem to suggest that participants found CustomDIFF easy to use (items rated above 5 out of 7) and useful (items rated 5.33 or above) for conducting the tasks in Table 3 and, by extension, for fulfilling the information needs in Table 2.

7.2.5 Threats to Validity

Construct Validity refers to the degree of accuracy with which the variables defined in a study measure the constructs of interest. Here, the constructs are ease of use and usefulness. As for the former, we resort to Davis’ questionnaire, whose validity and reliability have been previously endorsed (Mathieson et al. 2001; Agarwal and Prasad 1998). Riskier is the use of our own questionnaire to assess usefulness. To ensure internal consistency, Cronbach’s alpha was calculated for this questionnaire, resulting in an \(\alpha\) value of 0.93, which indicates high reliability, though the low number of participants needs to be considered.

Internal Validity refers to the extent to which the intervention or independent variable(s) were actually responsible for the effects seen in the dependent variable. Here, the intervention is CustomDIFF. Yet, other factors besides CustomDIFF might influence the results. First, the participants’ background. In this respect, we were especially careful to focus on SPL engineers with at least one year of experience. Second, the questionnaire’s understandability. To reduce its influence, we carefully designed a running example aimed at helping participants contextualize the different questions and at ensuring a common understanding.

External Validity tackles the representativeness of the study and the ability to generalize its conclusions beyond the scope of the study itself. Representativeness can be challenged by the participants or the evaluation tasks. As for the former, we resorted to Danfoss engineers. To account for different perspectives, we aimed at involving practitioners with experience in both DE and AE. As for the tasks, the WeatherStationSPL is a rather small SPL compared to industrial SPLs. It is explicitly designed to help new users understand the concepts of pure::variants and, thus, it is expected to be cleanly engineered (which is, quite safe to say, not the case for industrial SPLs). Yet, putting an incipient tool such as CustomDIFF to the test on industrial code is difficult. Moving to variability-intensive open-source applications could have been an option. This would have brought more realistic tasks, yet at the expense of reducing the realism of the participants and the environment. The latter are, however, pivotal in a decision-making scenario such as customization analysis.

8 Formalization of Learning

In accordance with ADR, the situated learning from the project should be further developed into general solution concepts for a class of field problems (Sein et al. 2011). Sein et al. suggest three levels for this conceptual move:

  • generalization of the problem instance, i.e., to what extent is customization analysis a problem for organizations other than Danfoss Drives;

  • generalization of the solution instance, i.e., to what extent is CustomDIFF a solution to customization analysis; and

  • derivation of design principles, i.e., what sort of design knowledge can be distilled from the CustomDIFF experience that might inform other tool builders.

The rest of this section tackles these questions.

8.1 Generalization of the Problem Instance

We tackle product customization at Danfoss Drives. This section elaborates on generalizing this experience to product-based evolving SPLs. Product customization challenges the traditional vision whereby new requirements are transmitted to domain engineering, features are built on the platform, and the product is then finally created. This conventional vision is increasingly called into question. Indeed, Krüger et al. report that an increasing number of companies and open-source projects add new variations or platform features using feature forks in version control systems (e.g., Git) (Krüger et al. 2020). Here, the platform, kept in the master branch, evolves through re-integrating these forks. If the master branch holds the SPL core assets, and the forks account for transient product enhancements, then we are talking about product-based evolving SPLs.

Given this way of evolving SPLs, the question remains whether customization analysis is also a problem in these organizations. It could be argued that SPLs in an earlier stage of their life-cycle might be the ones subject to a larger customization endeavor, while more mature SPLs can obtain most of their products out of the core assets with minimal customization. That said, Danfoss Drives can be considered a mature SPL, and yet product customization is still very relevant. As usual in SPLs, the rationale might rest on scalability. SPLs handle a large number of products. Hence, perfective maintenance needs to scale up to the requests of not one product’s stakeholders but those of a myriad of products. Timeliness might require the products’ perfective maintenance to proceed in parallel. If this is your organization’s case, chances are you need to track how customization is conducted, i.e., customization analysis.

8.2 Generalization of the Solution Instance

This project develops CustomDIFF as an intervention for tackling customization analysis. We resort to visual analytics, using Git repositories as the data to be mined. This is not new. Distinct authors tackle both mining code repositories (we claim no contribution in this area) and software visualization for SPLs (our main contribution). The question arises of how alluvial diagrams compare w.r.t. other visualizations proposed in an SPL setting. Lopez-Herrejon et al. (2018) conduct a systematic survey where interventions are categorized based on the SPL stage to which the visualization technique applies. We extend this study for the maintenance-and-evolution stage along three dimensions (Novais et al. 2013): (1) the point of view, i.e., the user group that will use the visualization; (2) the object of study, i.e., what is being analyzed; and (3) the purpose and focus, i.e., why the analysis is being done. Table 6 shows the results. Specifically, CustomDIFF can be pigeonholed as helping the Change Control Board (who) conduct customization analysis (what) for assessing SPL scoping (why). We use the visualization means to structure the rest of this subsection.

Table 6 Related work on SPL visualization, along the facets: who, what, why, and visualization means

Trees. Their intuitiveness and the large support from graphical libraries make trees the most popular intervention. Trees naturally convey the notion of hierarchy as well as clustering nodes in terms of family dependencies (e.g., ancestors, siblings, descendants, etc.). These affordances are put into play for different purposes. Kanda et al. aim at helping engineers migrate a set of products, created through clone&own, to an SPL (Kanda et al. 2013). To better identify commonalities and variability, engineers need to understand how products are derived from one another. This can be challenging if no tracing records exist. In these instances, Kanda et al. introduce the Product Evolution Tree visualization, whereby derivation relationships among products are displayed in a way similar to VCS branching (Kanda et al. 2013). If the aim is not migration but testing, de Oliveira et al. introduce the Product Genealogy Tree with the aim of identifying products to be re-tested when a bug is found in a product (de Oliveira et al. 2012). This tree captures three traces: (1) which products were derived from which core assets, (2) which products are created from already derived products, and (3) which products have propagated changes to which products. When a bug is detected in a product, this visualization can help testers identify which other related products also need to be tested.

Tree-Map. This approach arranges data in a hierarchical, tree-structured diagram where the rectangles are sized and arranged from the largest to the smallest. Its main benefit is showing the ratio of each part to the whole. Tenev et al. resort to tree-maps to help domain engineers identify the reuse potential of a number of similar software variants created through clone&own (Tenev et al. 2017). They compute the similarities of the source code of multiple software systems, and visualize the commonalities and variabilities by means of several visualizations, such as bar diagrams, tree-maps, and phylogenetic diagrams. These diagrams provide domain engineers with information about code similarity across a group of cloned software systems at different abstraction levels, i.e., from a single code line through files, folders, and subsystems up to the whole system. Engineers can then identify reuse potential and schedule an SPL migration plan.

Charts. Unlike trees, charts are more convenient when a longitudinal analysis is required. Wnuk et al. introduce the Feature Survival Chart for visualizing scoping change dynamics: the X-axis stands for time while the Y-axis holds the features (Wnuk et al. 2009). In this way, the complete life-cycle of a single feature can be followed by looking at the same Y-axis position over time. This work does not tackle code but the documents that formulate features for an upcoming platform project. A feature, in this case, groups the requirements that constitute a new functional enhancement to the platform. At this stage, features usually contain a description, their market value, and effort estimates. Features are refined into requirements, which are specified, reviewed, and approved. The requirements are written in domain-specific natural language. The final scope is decided and agreed with the development resources. Wnuk et al. tap into these requirement documents to build the Feature Survival Chart, where the introduction/deletion of features is monitored. In this way, the visualization shows the decision process of including or excluding features that are candidates for the next SPL release. The aim is to prevent two sorts of problems: (1) setting too large a scope compared to the available resources, and (2) setting a limited scope too early, missing market opportunities. More recently, Hinterreiter et al. (2020) propose an IDE tool for developing and evolving a clone-and-own SPL. Their approach rests on an ad-hoc variability-aware VCS. Their tool supports a code-diff view that allows engineers to see how a given feature evolved for a given product (i.e., the product perspective). Specifically, Hinterreiter et al. (2020) propose two metrics to reflect how the code evolves: the relative change of the realizing artifact’s size (a sort of code churn), and the change in the feature’s scattering. In contrast, CustomDIFF is based on Git and pure::variants (i.e., annotation-based SPLs), reflecting a more industrial setting. In addition, our visualization aims to account not only for the product perspective, but also for the holistic and feature perspectives.

Both Wnuk et al. and Hinterreiter et al. track SPL evolution, based on requirement documents and the codebase, respectively (Wnuk et al. 2009; Hinterreiter et al. 2020). By contrast, CustomDIFF does not tackle the visualization of the SPL’s evolution. Rather, we focus on how to inform the follow-on SPL release: alluvial diagrams depict the customization endeavor in the interim between SPL releases.
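To make this comparison concrete, the sketch below illustrates how per-feature, per-product churn figures (of the kind CustomDIFF mines from Git) might be rendered as an alluvial diagram. It is a minimal illustration, assuming Plotly’s Sankey trace as the rendering library; the record layout, feature names, and churn values are hypothetical and do not reproduce CustomDIFF’s actual implementation.

    from collections import defaultdict
    import plotly.graph_objects as go

    # Hypothetical per-hunk records mined from the Git diffs between two SPL
    # releases: (product, feature, added lines, deleted lines).
    hunks = [
        ("productParis",  "Sensors",      30, 12),
        ("productBerlin", "Sensors",       8,  2),
        ("productBerlin", "Connectivity",  5,  2),
    ]

    # Feature churn per product: churn = added + deleted lines.
    churn = defaultdict(int)
    for product, feature, added, deleted in hunks:
        churn[(feature, product)] += added + deleted

    # One alluvial flow per (feature, product) pair, sized by its churn.
    features = sorted({f for f, _ in churn})
    products = sorted({p for _, p in churn})
    labels = features + products
    idx = {name: i for i, name in enumerate(labels)}

    fig = go.Figure(go.Sankey(
        node=dict(label=labels, pad=15, thickness=20),
        link=dict(
            source=[idx[f] for f, p in churn],   # feature nodes
            target=[idx[p] for f, p in churn],   # product nodes
            value=[churn[(f, p)] for f, p in churn],
        ),
    ))
    fig.update_layout(title_text="Customization churn: features vs. products")
    fig.write_html("customization_alluvial.html")

In CustomDIFF, such flows can be further expanded to packages and files, down to the raw code; the sketch only covers the top-level feature-to-product view.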

8.3 Derivation of Design Principles

So far, we have looked at CustomDIFF as a whole. Now, we disentangle the distinct mechanisms that, on balance, are responsible for the perceived usefulness and ease of use detailed in Section 7.2. Table 7 outlines the main design principles. Design principles reflect knowledge of both IT and human behavior (Gregor et al. 2020). Accordingly, a design principle should provide cues about the effect (i.e., the Change Control Board activity made possible), the cause (the affordance brought about by CustomDIFF), and the context where this cause can be expected to yield the effect for the target audience (i.e., the Change Control Board and the application engineers).

Table 7 Design principles from the CustomDIFF experience
Table 8 Questionnaire on CustomDIFF’s mechanism

Table 7 outlines the four principles we consider most relevant. The flow-ness and grouping principles collect the benefits brought about by alluvial diagrams, whereas zoom-ness and filtering are common principles in Visual Analytics. At this point, we also consider it worthwhile to collect stakeholders’ opinions about each mechanism in isolation. Participants were the same throughout the research. We limited ourselves to one question per mechanism. Table 8 collects the results. Mechanisms are in general appreciated, with an average above 5 out of 7. Dispersion-wise, the VP-enriched context is the mechanism with the largest dispersion. The rationale might be that this mechanism is of interest only to application engineers, while the Change Control Board might not need to delve into the code.

9 Conclusions

This research started with a phenomenon perceived in practice, i.e., product customization at Danfoss Drives. In this setting, we raised two questions: (1) which are the information needs for customization analysis? and (2) might alluvial diagrams be useful for supporting the visualization of customization analysis? As for the former, we provided some estimates of the cost of answering some of these questions, and developed a GQM model. We proposed adjusting the popular code-churn metric to SPL constructs: feature and product. However, the potentially high number of feature churns and product churns requires appropriate visualization means. This moved us to the second question. Here, we made the case for alluvial diagrams. Proof-of-concept was provided through CustomDIFF, a publicly available Web application that uses pure::variants as the variability manager and Git as the code repository. Proof-of-value was conducted for ease of use and usefulness with Danfoss practitioners as the subjects. Results seem to suggest that alluvial diagrams provide a natural way to describe flow dispersion at different levels of detail. On one hand, the Change Control Board gets the big picture along products and features. On the other hand, developers can expand the flow to packages and files, down to the raw code. In this way, CustomDIFF advances traditional DIFF utilities with SPL specifics. We ended by generalizing this experience to SPLs other than Danfoss Drives. In short, we contribute to the existing literature by identifying information needs and by advocating alluvial diagrams as a feasible and effective way to reflect product customization in SPLs.

Additional empirical studies are needed to assess the value of customization analysis for actionable interventions. Our first evaluations indicate that customization diffing is a factor, but certainly not the only one that intervenes during decision making. In this respect, we plan to supplement Git data with data about products, customers, and developers, and to see what other kinds of analysis these additional sources would permit. In the same vein, metrics other than code churn might be of use. Besides tangling and scattering, more traditional metrics such as cyclomatic complexity and Halstead’s metrics might also be useful. Another interesting follow-on would be integrating CustomDIFF into a DevOps framework. Here, customization endeavors could be continuously tracked, so that actions can be triggered once certain customization thresholds are surpassed. Other scenarios include the use of CustomDIFF by application engineers to peek at what other colleagues are customizing. For instance, a feature enhancement introduced in a given product might be promptly and directly incorporated into other products, without waiting for this enhancement to be promoted to a core asset. This opens up new scenarios for SPL evolution. Here, longitudinal evolution (between core assets and products) might coexist with transversal evolution, where products sharing the same features might decide to incorporate enhancements from other products and, later on, be jointly consolidated. The aim is to find ways to alleviate the tension between the quality and reuse effectiveness required by domain engineering, and the time-to-market and customer pressure faced by application engineering.