1 Introduction

Software Product Line (SPL) engineering has proven to be an efficient and effective strategy for decreasing implementation costs, reducing time to market, and improving the quality of derived products (Denger and Kolb 2006; Northrop et al. 2007). SPLs and Configurable Systems (Alves Pereira et al. 2020) are two approaches used in software engineering to manage and create software with varying levels of customization and flexibility. While both SPLs and configurable systems share the goal of offering flexibility and customization, they differ in their core approach. SPLs primarily emphasize the systematic reuse of components, architectures, and design patterns across a range of related software products. In contrast, configurable systems are single software products designed to be adaptable, enabling users to configure them to meet their unique requirements. We decided to limit the scope of this review to SPLs to keep it focused.

Testing is an essential part of SPL Engineering (SPLE) to identify potential faults (Pohl and Metzger 2006). This activity examines core assets shared among many products, product-specific parts, and the interaction among them (McGregor 2001). Therefore, SPL testing includes activities from the validation of initial requirements to the acceptance testing of a specific product by customers (Da Mota Silveira Neto et al. 2011).

As the adoption of the SPL approach by companies has grown (Weiss 2008), many researchers have made contributions in the SPL testing field to provide efficient and effective approaches that can satisfy specific needs of the industry (e.g., controlling the cost/effort of SPL testing). This resulted in many publications on different aspects of SPL testing. Therefore, analyzing research conducted in this field using well-known empirical methods is required to provide an overview of state-of-the-art testing practices and assess the effectiveness of the proposed approaches. To this end, Systematic Literature Reviews (SLR) and Systematic Mapping Studies (SMS) were conducted on SPL testing, but the most recent one dates back to 2014 (do Carmo Machado et al. 2014). While some recent research has focused on reviewing specific aspects of SPL testing, such as model-based testing of SPLs (Petry et al. 2020), test case prioritization for SPL (Kumar 2016), and combinatorial interaction testing for software product lines (Lopez-Herrejon et al. 2015), there has not been an SLR or SMS since 2014 that provides a comprehensive overview of the current state of SPL testing in a general context. Therefore, there is a need to update existing literature reviews (Mendes et al. 2020) to identify up-to-date evidence and issues that enable further development of the SPL testing field.

This paper presents an SLR to analyze interesting aspects of SPL testing that are formalized as research questions. An SLR is a rigorous and systematic method to identify, evaluate, and interpret all available research relevant to a particular research question, topic area, or phenomenon of interest (Cruzes and Dybå 2011). The specific aspects based on which we analyzed relevant studies are:

  • Characteristics of the studies focused on SPL testing.

  • Test levels executed throughout the SPL lifecycle.

  • Creating test assets by considering commonalities and variabilities.

  • Dealing with configuration-aware software testing.

  • Preserving traceability between test assets and other artifacts.

  • Testing non-functional requirements in an SPL.

  • Controlling cost/effort of SPL testing.

The SLR process was conducted from June 2022 to the end of 2022. While some of the findings derived from this SLR align with the conclusions of previous SLRs, such as the identification of existing gaps in non-functional testing for SPLs and the necessity for more robust and user-friendly testing tools, our review uncovered specific insights and unaddressed gaps in this domain that were not fully explored in prior SLRs. These include:

  1. Variability control, referring to the disciplined management and regulation of feature variations within SPLs, alongside modeling and tracing, presents persistent challenges that require attention throughout the testing process. Variability control involves implementing strategies, such as configuration and change management, to ensure consistency and predictability in the diverse configurations of products derived from the SPL.

  2. Novel approaches are needed for regression test selection, prioritization, and minimization, along with architecture-based regression testing, to effectively manage regression testing in SPLs.

  3. Promoting the adoption of SPL testing practices in industrial settings necessitates addressing practical challenges, such as offering guidance for industry-specific SPL testing and conducting industrial evaluations.

  4. Exploring the details of test levels across the SPL lifecycle and highlighting the consequences of neglecting a particular test level can offer valuable insights for practitioners.

  5. Studies focusing on testing SPLs rarely address traceability explicitly. More efficient methods for modeling and representing traceability relationships are required, particularly in light of feature variability and configuration management.

The remainder of this paper is organized as follows: Sect. 2 provides background information required to understand SPL and SPL testing concepts; Sect. 3 describes how the SLR methodology has been applied; the results of the SLR are reported in Sect. 4; potential threats to the validity of this study and the strategies employed to mitigate them are discussed in Sect. 5; Sect. 6 presents a summary of the research and examines the main findings; Sect. 7 provides a survey of the related research; Sect. 8 presents concluding remarks and further research.

2 Background

This section provides a concise background on the SPL development process, variability management, and testing approaches and levels as a basis for the remainder of this article.

2.1 SPL development process

SPL is a software development paradigm to achieve economies of scale and scope by analyzing product commonalities and variabilities. As this paradigm has specific benefits such as substantial cost savings, reduction of time to market, and high productivity, many organizations, including Philips, Nokia, Cummins, and Hewlett-Packard, have adopted it (Clements and Northrop 2002). In SPL, a set of core assets (e.g., reference architecture and reusable components) is first developed. Specific products are then built by configuring and composing the core assets in a prescribed way with product-specific features to satisfy particular market segments (Clements and Northrop 2002).

The SPL development process/lifecycle can be divided into two distinct phases: Domain Engineering and Application Engineering. According to Czarnecki and Eisenecker (2000, p. 20), Domain Engineering is “the activity of collecting, organizing, and storing experience in building systems or parts of systems in a particular domain in the form of reusable assets, as well as providing an adequate means for reusing these assets when building new systems.” Application Engineering is focused on deriving specific products from the core assets created during Domain Engineering; in this phase, specifics of the products are added to common parts to satisfy the particular needs of a product (Clements and Northrop 2002). Of these two phases, Domain Engineering demands significant resources and time. If not managed effectively, it can lead to the failure of the entire SPL (Pohl et al. 2005, p. 9–10). Three common approaches are employed for constructing an SPL, and each of these approaches directly influences the implementation of Domain Engineering (Apel et al. 2013):

  • Proactive approaches start with a comprehensive and thorough scoping of the domain to anticipate all requirements. Subsequently, all these requirements are implemented as assets, and SPL experts typically carry out this task.

  • Extractive approaches follow an automated process, utilizing a set of existing product variants as input. The SPL is constructed by extracting features from these variants. Features are identified and retrieved through feature location techniques (AL-Msie’deen et al. 2013; Rubin and Chechik 2013).

  • Reactive approaches follow an incremental process. They take as input an existing SPL version (SPL_i) and a set of new requirements for a new product. This process results in the creation of SPL_{i+1}, which can produce the new product.

2.2 Variability Management in SPL

In SPL engineering, variability mechanisms are fundamental for managing diversities across products. These mechanisms, as classified by Apel et al. (2013), include annotative mechanisms, transformative mechanisms (delta-oriented), and feature-oriented mechanisms. Annotative mechanisms involve marking or annotating code to denote variability points, while transformative mechanisms, such as delta-oriented programming, describe changes required to transform one product variant into another. Feature-oriented mechanisms organize variability around features and their interactions. These variability mechanisms can be applied across all stages of the software lifecycle.
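
To make the distinction between these mechanisms more concrete, the following minimal Python sketch (not taken from any primary study; all feature names and functions are hypothetical) contrasts an annotative mechanism, where variation points are marked inline and resolved against a feature selection, with a simple delta-style transformation that derives a new variant from a core product:

```python
# Illustrative only: two variability mechanisms expressed in plain Python.
# All names (FEATURES, checkout, delta_premium) are hypothetical.

# Annotative mechanism: variation points are marked inline and resolved
# against the selected feature set.
FEATURES = {"logging", "premium"}

def checkout(amount: float) -> str:
    msg = f"charged {amount:.2f}"
    if "logging" in FEATURES:          # annotation: only present if 'logging' is selected
        msg += " [logged]"
    if "premium" in FEATURES:          # annotation: premium-only behaviour
        msg += " with loyalty points"
    return msg

# Delta-oriented (transformative) mechanism: a core product plus a named delta
# that modifies behaviour to derive another variant.
def core_product() -> dict:
    return {"checkout": lambda amount: f"charged {amount:.2f}"}

def delta_premium(product: dict) -> dict:
    base = product["checkout"]
    product["checkout"] = lambda amount: base(amount) + " with loyalty points"
    return product

premium_variant = delta_premium(core_product())
print(checkout(10))                      # annotative variant
print(premium_variant["checkout"](10))   # delta-derived variant
```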

A Feature Model is commonly used in Domain Engineering to present different combinations of features. A feature model is a formal representation and graphical notation that describes the variability and relationships among features in an SPL. Feature models typically consist of features (functionalities or characteristics), feature hierarchies (representing parent-child relationships between features), and constraints (rules governing the valid combinations of features) (Pohl et al. 2005). Due to the presence of numerous optional features, the configuration space in feature models may exponentially increase (reaching 2^n possible configurations, where n represents the number of optional features without further constraints) (Chen and Babar 2011). A specific product can be derived once a complete feature configuration is established.
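
As a toy illustration of this configuration space, the following Python sketch (with hypothetical features and a single cross-tree constraint) enumerates the 2^n candidate configurations over the optional features and filters out those violating the constraint:

```python
from itertools import product

# Hypothetical feature model: 3 optional features under a mandatory root,
# with one cross-tree constraint ("ssl" requires "network").
optional = ["network", "ssl", "cache"]
constraints = [lambda cfg: ("ssl" not in cfg) or ("network" in cfg)]

def valid(cfg: frozenset) -> bool:
    return all(c(cfg) for c in constraints)

# Without constraints, the space has 2**len(optional) = 8 candidate configurations.
candidates = [frozenset(f for f, on in zip(optional, bits) if on)
              for bits in product([False, True], repeat=len(optional))]
valid_cfgs = [c for c in candidates if valid(c)]
print(len(candidates), len(valid_cfgs))   # 8 candidates, 6 of which are valid
```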

Although proactive approaches emphasize systematic upfront planning, modeling variabilities with feature and configuration models, and high asset reusability, reactive methods can also use feature models to represent variabilities introduced by new requirements. Configuration files or mechanisms are often used in reactive approaches to specify how variabilities are configured in reaction to new requirements (Ghanam et al. 2010). Furthermore, extractive approaches may employ feature models to represent and visualize variabilities discovered in existing products. Configuration scripts or files may be used to document and manage variabilities found in the codebase (Parra et al. 2012).

2.3 Testing approaches and levels

There exist diverse approaches to software testing, including (Luo 2001; Jorgensen 2013):

  • Manual testing: Testers create and execute test cases manually to evaluate the behavior of a software application or system without using automated testing tools or scripts.

  • Automated Testing: Specialized testing tools and scripts are used to automate the execution of test cases and the verification of software applications or systems.

  • Functional testing: Focuses on verifying software functions according to specified requirements. This approach includes different levels of testing, including:

    • Unit Testing is conducted at the lowest level, focusing on the fundamental unit of software, referred to interchangeably as “unit,” “module,” or “component.”

    • Integration Testing takes place when two or more tested units are integrated into a larger structure. This testing assesses the interactions between components and evaluates the quality of the overall structure when the properties cannot be determined solely from its individual components.

    • System Testing aims to validate the comprehensive quality of the entire system, covering end-to-end functionality. This type of testing typically aligns with the functional and requirement specifications of the system. Additionally, it assesses non-functional quality attributes like reliability, security, and maintainability.

    • Acceptance Testing occurs when the developers deliver the completed system to the customers or users. The primary goal of acceptance testing is to give confidence that the system functions correctly rather than to uncover errors.

  • Non-functional testing: Focuses on evaluating the attributes of a software system that are not directly related to its functional behavior. Instead, non-functional testing assesses the system’s performance, reliability, scalability, security, usability, and other qualities that impact the overall user experience and the system’s ability to meet non-functional requirements.

  • Regression testing: Focuses on verifying that recent changes or updates to a software application have not introduced new defects or negatively affected existing functionality.

  • Model-based testing: Test cases are derived from models representing the software’s expected behavior. Different models can be used to generate test cases systematically, including graphical representations, mathematical models, or formal notations.
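
As a small illustration of the model-based idea, the following Python sketch (the state-machine model and event names are invented for illustration) derives one test sequence per transition of a behavioral model, so that every transition is covered:

```python
from collections import deque

# Hypothetical behavioural model of expected behaviour: (source, event) -> target.
transitions = {
    ("idle", "insert_coin"): "ready",
    ("ready", "select_item"): "dispensing",
    ("dispensing", "take_item"): "idle",
    ("ready", "cancel"): "idle",
}

def path_to(state: str, target: str):
    """Shortest event sequence from `state` to `target` (BFS over the model)."""
    queue, seen = deque([(state, [])]), {state}
    while queue:
        cur, seq = queue.popleft()
        if cur == target:
            return seq
        for (src, event), dst in transitions.items():
            if src == cur and dst not in seen:
                seen.add(dst)
                queue.append((dst, seq + [event]))
    return None

def transition_coverage_tests(start: str = "idle"):
    """One abstract test (event sequence) per transition: reach its source, then fire it."""
    return [path_to(start, src) + [event] for (src, event) in transitions]

for test in transition_coverage_tests():
    print(test)
```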

SPL testing is an essential activity in SPLE to identify potential faults (Pohl and Metzger 2006). Exhaustive testing in SPL is usually infeasible due to a combinatorial explosion in the number of products. Following Tevanlinna et al. (2004), Reuys et al. (2005), and Käköla and Dueñas (2006), there are specific differences between single-system testing and SPL testing:

  1. Testing is a part of both phases: Domain Engineering and Application Engineering. Domain testing is focused on testing domain artifacts (e.g., requirements, features, and source code); however, as domain artifacts include variability, completely testing them during domain testing is impossible. Application testing aims to detect remaining faults in a derived product, mainly caused by unexpected interactions.

  2. Test assets created in Domain Engineering (e.g., test cases, test scenarios, test results, and test data) are reused in Application Engineering to test instantiated products. To this end, test assets should be created with variability in mind; we call such assets variant-rich test assets.
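
To illustrate what a variant-rich test asset might look like in practice, the following Python sketch (features and test names are hypothetical, and this encoding is only one possible realization) attaches a presence condition to each domain-level test case so that Application Engineering can select the test cases that apply to a derived product:

```python
# Illustrative variant-rich test assets: each domain-level test case carries a
# presence condition over features; Application Engineering keeps only the
# test cases whose condition holds for the derived product's configuration.
# Features and test names are hypothetical.

def always(cfg): return True

domain_test_suite = [
    {"name": "test_basic_checkout",  "applies": always},
    {"name": "test_ssl_handshake",   "applies": lambda cfg: "ssl" in cfg},
    {"name": "test_cache_eviction",  "applies": lambda cfg: "cache" in cfg},
    {"name": "test_ssl_over_proxy",  "applies": lambda cfg: {"ssl", "proxy"} <= cfg},
]

def instantiate_for_product(configuration: set):
    """Reuse of domain test assets: select the tests valid for one product."""
    return [t["name"] for t in domain_test_suite if t["applies"](configuration)]

print(instantiate_for_product({"ssl"}))            # basic + ssl handshake
print(instantiate_for_product({"ssl", "proxy"}))   # basic + ssl handshake + ssl over proxy
```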

3 Systematic literature review methodology

To carry out this SLR, we followed the guidelines for performing SLRs in software engineering (Kitchenham and Charters 2007). The steps followed in conducting this SLR are developing a review protocol, conducting the review, analyzing the results, reporting the results, and discussing the findings. The review protocol used in this SLR is explained in the following subsections. The protocol includes the formulation of research questions to achieve the objective (Sect. 3.1), the identification of sources from which to extract the research papers along with the search criteria and principles for selecting the relevant studies (Sect. 3.2), the specification of a set of criteria to assess the quality of each study retained for data extraction (Sect. 3.3), and the development of the template used for extracting data (Sect. 3.4).

3.1 Research questions

As previously stated, this study aims to investigate how the existing approaches deal with testing in SPL. To formulate research questions, we examined topics addressed by previous research on SPL testing (Pérez et al. 2009; Engström and Runeson 2011; Da Mota Silveira Neto et al. 2011; do Carmo Machado et al. 2014). Some of the research questions were completely reused from previous research – i.e., RQ1, RQ2, RQ3, RQ6, and RQ7 – and some of them were formulated by analyzing specific aspects that have not been investigated in detail in previous research – i.e., RQ4 and RQ5.

We reuse these RQs to contrast and compare the newer research contributions with the results of previous SLRs. In addition, we identified two further aspects of interest. First, because testing every potential configuration of an SPL is often impractical, it becomes essential to employ specific approaches for identifying valid and invalid configurations; RQ4 examines the techniques used or proposed to address this issue. Second, maintaining traceability between test assets and other SPL artifacts offers substantial advantages, including enhanced reusability, impact analysis, and change management; consequently, we designed RQ5 to investigate the techniques employed for preserving traceability. Answering these questions entailed a detailed investigation of the identified studies to specify practical and research issues regarding SPL testing; therefore, the results of this study can support both industrial and academic activities. The research questions are as follows:

  • RQ1. How is the research on SPL testing characterized? This question intends to discuss the bibliometrics of the primary studies and the evidence available to adopt the proposed approaches.

  • RQ2. What levels of tests are usually executed throughout the SPL lifecycle (i.e., Domain Engineering and Application Engineering)? There are different levels of tests, and each level is associated with a specific development phase, including unit, integration, system, and acceptance tests (Ammann and Offutt 2008; Jaring et al. 2008). This question aims to specify different test levels usually executed throughout the SPL lifecycle.

  • RQ3. How are test assets created by considering commonalities and variabilities? The large number of variation points and variants in an SPL increases the number of possible testing combinations. Creating test assets for all combinations of functionality is almost impossible in practice; therefore, test assets must be created by considering commonality and variability so that they can be reused as much as possible. Furthermore, an undetected error in common core assets of an SPL can be spread to all instances depending on those assets (Pohl and Metzger 2006); therefore, creating test assets by considering commonalities and variabilities and testing common aspects as early as possible is essential. Answering this question led to investigating how testing approaches handle commonality and variability throughout creating/executing test assets.

  • RQ4. How do SPL approaches deal with configuration-aware software testing? Testing all functionality combinations in an SPL is impossible and unnecessary since some combinations are invalid based on the constraints defined between configuration parameters. This question is intended to specify ways/techniques to detect valid and invalid combinations of configuration parameters.

  • RQ5. How is the traceability between test assets and other artifacts of SPL preserved throughout the SPL lifecycle? The reusability of test assets is essential to manage the complexity of SPL testing; preserving traceability between test assets and requirements/implementation can enhance the reusability of test assets. In this sense, this question is intended to identify specific ways/techniques to achieve traceability between test assets and other artifacts throughout the SPL lifecycle.

  • RQ6. How are Non-Functional Requirements (NFRs) tested in SPL? NFRs such as security, reliability, and performance are very important for SPLs, and ignoring these requirements can lead to negative results (e.g., economic loss) (Nguyen 2009). Therefore, systematically testing NFRs by considering commonalities and variabilities is an important aspect of SPLE. This question is intended to investigate how tests of NFRs are performed in an SPL.

  • RQ7. What mechanisms have been used for controlling the cost/effort of SPL testing? As SPL testing is more expensive than single-system testing, identifying specific techniques to reduce this effort provides the reader with an initial list of techniques, derived from analyzing the selected studies, that can be extended as new publications on SPL testing appear.

3.2 Identification of relevant literature

The process of gathering and selecting primary studies was performed in three stages. In the first stage, we investigated previously published literature reviews on SPL testing (Pérez et al. 2009; Engström and Runeson 2011; Da Mota Silveira Neto et al. 2011; do Carmo Machado et al. 2014) to identify the initial set of papers published up to 2013. In the second stage, we updated the list of papers by searching for new papers published between 2013 and 2022; in this stage, we also performed forward and backward snowballing (Webster and Watson 2002) to identify missing relevant papers. In the third stage, we applied inclusion and exclusion criteria to each potential primary study identified through stages one and two. Each of the three stages is explained in detail in the following subsections. Note that, when selecting primary studies, we chose those that could address at least one of the RQs. For instance, certain studies focusing on SPL verification were included because they could provide insights relevant to questions such as RQ4. An Excel file was created and shared among the authors to document the various steps of the SLR process. This file contains all the details about how we gathered and selected primary studies and how we extracted data from the chosen studies.

3.2.1 Analysis of existing reviews

By searching for existing SLRs or Systematic Mapping Studies (SMSs) on SPL testing, we found four such secondary studies (Pérez et al. 2009; Engström and Runeson 2011; Da Mota Silveira Neto et al. 2011; do Carmo Machado et al. 2014). Engström and Runeson (2011) conducted an SMS to identify useful approaches and needs for future research; in this study, 64 papers published up to 2008 were surveyed. Da Mota Silveira Neto et al. (2011) performed an SMS to investigate state-of-the-art testing practices in SPL testing; this study analyzed a set of 45 publications from 1993 to 2009. Pérez et al. (2009) conducted an SLR to identify experience reports and initiatives carried out in the SPL testing area; in this study, 23 primary studies published up to 2009 were analyzed. do Carmo Machado et al. (2014) conducted an SLR by analyzing 49 studies published up to 2013. As the four studies followed a systematic process to gather and select the primary studies, we are confident that they covered all the primary studies in the SPL testing field published up to 2013.

Using the list of primary studies in the four SLRs/SMSs, a set of 181 potentially relevant papers was identified, shown as stage 1.1 in Fig. 1. By reading the titles and abstracts of the publications, papers that addressed none of the research questions were excluded. Furthermore, duplicated papers, i.e., those included in more than one literature review, were removed. At the end of this stage, 97 studies were selected, shown as stage 1.2 in Fig. 1.

Fig. 1 The process of gathering and selecting primary studies

3.2.2 Gathering recent publications

In the second stage of the search process, we updated the list of primary studies by analyzing papers published between 2013 and 2022 using the following databases: IEEE Xplore, Scopus, ACM DL, Springer, and Wiley online library. To answer the stated research questions, we identified the keywords that had to be used in the search process. Variants of the terms “software product line”, “software product family”, and “software testing” were applied to compose the search query, as follows:

(Software Product Line OR Software Product Lines OR Software Product Family OR Software Product Families) AND (Test OR Testing).

To evaluate the search string, we first performed a limited manual search to see whether the results of that search were among the results obtained by running the search string. The search string was adapted based on the syntax requirements of each data source used. Table 13 in Appendix A shows the forms of search strings applied to different engines and the number of papers extracted from each data source.

We obtained a set of 2,608 papers by running the search string on the search engines, shown as stage 2.1 in Fig. 1. We excluded 161 papers as duplicates since they were retrieved from multiple search engines. Furthermore, by reading the titles and abstracts of the remaining papers, a set of 2,125 papers was identified as irrelevant since they considered testing from a single-system development perspective, not an SPL point of view. At the end of this step, we had 322 papers, shown as stage 2.2 in Fig. 1.

In the next step, we conducted both backward and forward snowballing by examining the reference lists of all the identified papers and exploring the papers that have cited these identified papers, respectively. Following this step, 70 additional papers (20 via backward snowballing and 50 via forward snowballing) were added to the previously identified set of papers, shown as stage 2.3 in Fig. 1. At the end of stage 2, we had a set of 392 new publications, shown in Fig. 1 as stage 2.4.

3.2.3 Primary study selection strategy

By merging the results of the two previous stages, a set of 477 papers was composed, shown as stage 3.1 in Fig. 1. During the merging process, we identified 12 papers as duplicates because the year 2013 was covered both by the SLR conducted by do Carmo Machado et al. (2014) and by the automated search stage. We defined a set of inclusion and exclusion criteria to assess each potential primary study; the criteria are presented in Table 1. These criteria were applied to the titles and abstracts of the identified papers. The first author performed this stage; however, to reduce researcher bias, the results were validated by the second and third authors of this paper.

At this stage, we initially applied the inclusion criteria to select papers meeting all of them. Following this, we applied the exclusion criteria to exclude papers that met one or more of them. We included only papers evaluated via at least one empirical method, including case studies, surveys, experiments, and observational studies (Wohlin et al. 2003; Sjoberg et al. 2007; Zhang et al. 2018). At the end of this stage, a set of 161 papers was selected for full-text reading, depicted in Fig. 1 as stage 3.2. The analysis results of the papers, based on the inclusion and exclusion criteria, are accessible within the replication package.

Table 1 Inclusion and exclusion criteria

3.3 Quality assessment

Quality assessment of candidate studies is recommended to ensure that studies are impartially assessed for quality (Kitchenham et al. 2016). To this end, we used a set of quality criteria to examine the studies, shown in Table 14 in Appendix B. These criteria were reused from the criteria proposed by Dybå and Dingsøyr (2008) and cover four main aspects of quality:

  • Reporting: Quality of reporting of the study’s rationale, aims, and context.

  • Rigor: Has a thorough and appropriate approach been applied to key research methods in the study?

  • Credibility: Are the findings well-presented and meaningful?

  • Relevance: How useful are the findings to the software industry and the research community?

We used a weighting approach to examine the candidate studies, in which two possible answers with their respective scores were given for each question: “Yes” = 1 and “No” = 0. We then assigned a quality assessment score to each study by summing up the scores given to all the questions; the total quality score for each study ranged from 0 (very poor) to 11 (very good). Two of the authors assessed the papers, and any discrepancies were resolved by holding sessions with all the authors.

The first three criteria shown in Table 14 in Appendix B were used as the minimum quality threshold of the review to exclude non-empirical research papers. To this end, if question 1, or both of questions 2 and 3, received a “0” response, we did not continue the quality assessment process, and the paper was excluded. The results of the quality assessment for each paper are available in the replication package. Consequently, 43 papers were excluded, and 118 were selected as primary studies, shown in Fig. 1 as stage 3.3. The list of primary studies is presented in Table 15 in Appendix C.
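
For clarity, the scoring and exclusion rule described above can be summarized in a small Python sketch; the answer vectors used here are fabricated for illustration:

```python
# Sketch of the quality scoring described above: eleven yes/no answers are
# summed (Yes = 1, No = 0), and a paper is dropped before full scoring if
# question 1 is "No", or if both questions 2 and 3 are "No".
# The answer vectors below are made up for illustration.

def quality_score(answers: list[int]) -> int | None:
    """Return the total score (0-11), or None if the minimum threshold is not met."""
    assert len(answers) == 11 and all(a in (0, 1) for a in answers)
    q1, q2, q3 = answers[0], answers[1], answers[2]
    if q1 == 0 or (q2 == 0 and q3 == 0):
        return None                     # excluded as non-empirical
    return sum(answers)

print(quality_score([1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1]))  # 9
print(quality_score([1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]))  # None -> excluded
```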

The analysis of the studies based on the quality assessment criteria is explained in more detail in Appendix E. In summary, concerning Reporting, most of the studies performed well. While the context description could be better in some studies, approximately 82% have clear research objectives, and all studies are based on research. On average, the studies performed reasonably well in terms of Rigor. Researchers justified the research design in almost 62% of the studies with respect to the research goals. In around 60% of the studies, a baseline approach was compared with the proposed approach, with the researchers attempting to show that the selected controls reflect a defined population. Despite these promising findings, 32% of the studies fall short in rigor. Concerning Credibility, around 95% of the studies discuss the results in relation to the research questions and highlight the study’s limitations. Most studies, however, do not adequately consider the relationship between the researcher and participants or justify that the data collection addresses the research problem. Regarding Relevance, about 97% of the studies explicitly discuss SPL testing and how the work contributes to existing knowledge, identify new areas for research, and explain how the results can be used. Nevertheless, practitioner-based guidelines are present in only about 15% of cases, indicating that more practical guidance is needed to strengthen industry adoption of SPL testing.

3.4 Data extraction and analysis

Data were extracted from each of the 118 primary studies during this stage. To this end, we used a predefined extraction form that enabled us to record the full details of the studies and be specific in answering the research questions. The extraction form is shown in Table 2. The first two authors conducted the process of reading the papers and completing the extraction form; after reading each paper, the extracted data were stored in a spreadsheet and shared with all the authors. We followed the content structuring/theme analysis approach of Mayring (2014) to analyze the data. The types of data in the extraction form already provided us with a list of themes and the corresponding extracted data for these themes; this step was deductive. In the next step, we inductively created categories within the themes to summarize them. All the authors held multiple sessions to discuss the intermediate results and resolve any potential discrepancies.

Table 2 Data extraction template

4 Results

In the following sections, the data extracted from the primary studies is used to answer the research questions. An overview of the primary studies is first provided in Sect. 4.1. Then, we answer each RQ via the extracted data.

4.1 Characteristics of the studies (RQ1)

This section discusses the bibliometrics of the primary studies, the evidence available to adopt the proposed approaches, and the results of the evaluations conducted based on the quality assessment criteria.

4.1.1 Bibliometrics

In this section, we analyze annual trends and distribution per venue type of the studies selected.

Annual trend:

The distribution of the primary studies according to publication year is shown in Fig. 2. Among the selected primary studies, none was published before 2003. After 2003, there was at least one paper per year, except for 2004. As seen in Fig. 2, the number of published papers in this field generally increased over time (2003–2019). This indicates that the SPL testing field has attracted the attention of many researchers in recent years. Furthermore, it shows increasing attention to the use of empirical methods to assess the value of proposed approaches, since we only included empirically evaluated studies in our review. Because we excluded some papers based on the quality assessment criteria, no primary study published in 2004 satisfies the minimum quality threshold of the review. Similarly, the number of papers published in some years (e.g., 2013) was actually higher than shown in Fig. 2; however, some of those papers were excluded during the quality assessment. It is also worth mentioning that some studies might not have been indexed by the search engines by the time the search was performed (August 2022), and thus we did not consider them in this review; we have specified these studies in the replication package. For comparison, Fig. 2 also plots the total number of DBLP entries per year as an indicator of the overall growth in the number of publications. As the figure shows, the trend in SPL testing is well above this baseline in several years (2014, 2016, 2017, and 2019); however, it has been decreasing in recent years.

Fig. 2 Distribution of primary studies by year

Distribution per venue:

Most of the primary studies were published in conferences; of the 65 conference papers, 17 (∼ 26%) were published in SPLC, the most representative conference for the SPL engineering area. This indicates that SPLC is an important venue for SPL testing research. In addition, 31% of the studies were published in journals, 7% in symposia, and 5% in workshops.

4.1.2 Analyzing the evidence available to adopt the proposed approaches

As reported in the title or the text of the studies, case studies, experiments, and expert surveys are the specific methods that have been used for evaluating primary studies. Most of the primary studies were evaluated by conducting an experiment (∼ 58%). It is worth mentioning that five studies applied more than one evaluation method, including case study and expert survey (Bucaioni et al. 2022), case study and experiment (Akbari et al. 2017; Fragal et al. 2019), experiment and expert survey (Hervieu et al. 2016), and case study, experiment, and expert survey (Wang et al. 2017). Table 3 shows the primary studies that have used each type of evaluation method.

Table 3 Distribution of primary studies by the type of evaluation method

Although the studies reported that their proposed approaches were evaluated using the mentioned empirical methods, we need to analyze the strength of the evidence available to adopt the proposed approaches. The results of this analysis can help researchers find new topics for empirical studies and practitioners assess the maturity of a proposed approach. Kitchenham and Charters (2007) classified study designs into five levels, based on the evidence hierarchy used in medical research.

Alves et al. (2010) revised the classification to be applicable in their study; the revised classification is fully applicable in our review. The following hierarchy is used in our study (from weakest to strongest):

  1. No evidence.

  2. Evidence obtained from demonstration or working out toy examples.

  3. Evidence obtained from expert opinions or observations.

  4. Evidence obtained from academic studies, e.g., controlled lab experiments.

  5. Evidence obtained from industrial studies, e.g., causal case studies.

  6. Evidence obtained from industrial practice.

Based on the evidence evaluation scheme explained above, the results of the evaluation of how much evidence is available to adopt the proposed approaches are presented in Table 16 in Appendix D. All the studies have been evaluated by at least one kind of evaluation method. Academic studies (Lev4) are the most used evaluation method (60%), where open-source repositories are usually utilized to assess the proposed approaches. Demonstration (Lev2) follows (∼ 17%). Only a small number of studies have been evaluated by using industrial systems or real data sets (∼ 16%) (Industrial studies, Lev5), or by applying the proposed methods in industrial settings and involving industrial professionals (∼ 13%) (Industrial practice, Lev6). This analysis shows an overall low level of evidence in the SPL testing field, which is in line with the results of the SLR conducted by do Carmo Machado et al. (2014).

4.2 Test levels executed throughout the SPL lifecycle (RQ2)

We divided SPL testing according to the two common phases of SPLE: Domain Engineering and Application Engineering. Based on the analysis of the studies, there are two types of testing activities that are performed during Domain Engineering: (1) developing test assets so they can be instantiated in Application Engineering, (2) applying tests to assets produced during Domain Engineering to detect faults in common core assets as soon as possible. By analyzing studies that are focused on the second activity, we identified two levels of tests usually performed in Domain Engineering; distribution of studies based on the test levels is shown in Table 4:

  • Unit testing: Out of 118 studies, three studies focus only on this level of testing (Jaring et al. 2008; Kim et al. 2011, 2012). Jaring et al. (2008) classified test levels based on the binding time of variabilities. Based on this study, unit tests are performed before variant binding; therefore, we included this study in this classification since Application Engineering is the phase in which variabilities are bound to derive a specific product. Kim et al. (2011) and Kim et al. (2012) proposed specific methods in which analysis at the code level is performed to generate test suites for testing common parts of an SPL in Domain Engineering.

  • Integration testing: The execution of integration tests in Domain Engineering is examined in the studies by Reis et al. (2007), Neto et al. (2010), and Akbari et al. (2017). Reis et al. (2007) proposed a model-based, automated technique for integration testing in Domain Engineering. In the proposed technique, integration test case scenarios are generated to support the test of interactions between the components of an integrated sub-system; placeholders are also created for necessary variable parts and all components that are not part of the integrated sub-system. Neto et al. (2010) presented a regression testing approach for SPL architectures to maintain the correctness and reliability of the architecture after modifications; as the main purpose of the approach is to verify the integration among modules and components that compose the SPL architecture, we included this study in this classification. Akbari et al. (2017) proposed a method for prioritized selection and execution of integration test cases in both Domain Engineering and Application Engineering.

Specific testing activities conducted in Application Engineering are: creating product-specific test assets by selecting and instantiating domain test assets, designing additional product-specific tests, and executing tests (Da Mota Silveira Neto et al. 2011). It is worth mentioning that some of the studies focus on reducing the number of products that need to be tested by using specific techniques such as pairwise testing (e.g., Matnei et al. 2016). In addition, some studies focus on product prioritization to enhance the efficiency of SPL testing (e.g., Parejo et al. 2016). Once a set of configurations/products is selected/prioritized for testing, their behavior needs to be tested using a specific mechanism, e.g., executable unit tests (Parejo et al. 2016). Studies that focus only on the first step (selecting/prioritizing configurations) do not usually consider a specific level of test. The testing levels usually performed in Application Engineering, as shown in Table 4, are as follows:

  • Unit testing: Some of the studies considered executing unit tests in Application Engineering (Bürdek et al. 2015; Li et al. 2018; Souto and d’Amorim 2018; Jung et al. 2019, 2020; Lochau et al. 2014). Bürdek et al. (2015) proposed a white-box test-suite derivation mechanism for SPLs, specifically for unit testing, in which test specifications are extended with a presence condition. A presence condition constrains the set of configurations for which a specific test case is valid; this information is used for testing configurations in Application Engineering. Li et al. (2018) investigated test cases generated for one product that are reused for another product of the SPL by applying two categories of structure-based criteria, control-flow and data-flow. Souto and d’Amorim (2018), Jung et al. (2019), and Jung et al. (2020) identify unit test cases to be selected for regression testing.

  • Integration testing: As shown in Table 4, this level of testing has been considered in a larger number of studies (27). Some studies have not explicitly mentioned this level of testing; however, they mention that the untested parts of the framework are tested during Application Engineering (Scheidemann 2006; Al-Dallal and Sorenson 2008; Jaring et al. 2008). Some of the studies consider the selection of integration test cases during Application Engineering (e.g., Jung et al. 2019).

  • System/Acceptance testing: This level of testing has been considered in the largest number of studies (28), as shown in Table 4. In most studies, test models designed throughout Domain Engineering are instantiated to derive specific system test cases (e.g., Olimpiew and Gomaa 2009). Arrieta et al. (2015) split the lifecycle of cyber-physical system product lines into three phases: Domain Engineering, Application Engineering, and Simulation. Execution of system test cases is performed in the simulation phase; however, as we classified the SPL lifecycle into Domain Engineering and Application Engineering, we included this study in this category.

Table 4 Distribution of primary studies by the testing level

4.3 Creating test assets by considering commonalities and variabilities (RQ3)

Creating test assets by considering commonality and variability to enhance their reusability and to reduce the probability of undetected errors in common core assets by testing them as early as possible is essential in SPL testing. Out of 118 papers, 25 primary studies (∼ 21%) provide contributions to handle variability in a range of different manners. We conducted an exploratory analysis to identify shared characteristics among the approaches and subsequently categorized them. We identified three categories of approaches, including model-, specification-, and requirements-based approaches. The distribution of studies based on these categories is shown in Table 5.

  • Model-based approaches: In model-based approaches, a set of techniques is used to design and execute tests for SPLs by leveraging formal or semi-formal models of the SPL’s variability. In the examined studies, the subsequent methods are employed to incorporate variability into test models:

    • Adaptation of UML models or integrating them with the feature model to produce test models including variability: In studies (Reuys et al. 2005, 2006; Reis et al. 2007; Olimpiew and Gomaa 2009), activity diagrams are extended using specific mechanisms (e.g., stereotyping specific elements) to contain variabilities and then used as test models to create domain test case scenarios. Ebert et al. (2019) developed a common platform in Domain Engineering that contains all elements required for producing products. This study uses the SMArDT methodology (Drave et al. 2019) to elaborate each functionality defined in the platform via an extended version of activity diagram; generic test cases are then created for each functionality based on the SMArDT methodology. Reis et al. (2006) propose the ScenTED-PT technique in which the requirements and the architecture of the system are specified by UML models supplemented with performance requirements; then, they create a test model from which performance test case scenarios are derived.

      Lochau et al. (2012a) and Lackner et al. (2014) proposed to use the statechart modeling approach as a basis for capturing commonalities and variabilities of product implementations in an SPL; a 150% statechart model and the feature model are integrated to produce a reusable test model. The 150% statechart model is a model that contains the behavioral specification fragments of every feature without considering constraints between features, and the 100% statechart model is a specific instantiation of the 150% model by considering the dependencies and constraints (Lochau et al. 2012a); a toy sketch of this idea is given after this list.

    • Using/defining different modeling notations to capture variabilities and using them to produce test assets: In this category of model-based approaches, specific modeling notations have been used or defined to create variant-rich test models. Tuglular et al. (2019) introduced Featured Event Sequence Graphs (FESGs) to explicitly capture behavioral variability in SPLs. Gebizli and Sözer (2016) used hierarchical Markov chains to model system usage; as this model captures all possible usage scenarios for a family of systems, it is considered as a reference test model. Bucaioni et al. (2022) define specific metamodels and languages to capture test variabilities, including SPL metamodel (SPLmm), Products metamodel (Pmm), Weaving metamodel (Wmm) to link features and signals in Pmm to those in SPLmm, Test case DSL (TcDSL), and Test Script generation Transformation (TsT). Fragal et al. (2019) use Featured Finite State Machines (FFSMs) to represent the abstract behavior of an SPL; in this study, the HSI method (Luo et al. 1995) has been extended to generate a single configurable test suite for an SPL. Luthmann et al. (2019a) extended the concept of Timed Automata (TA) by feature constraints and configurable parameters to facilitate efficient verification of real-time properties for SPLs. Lochau et al. (2012b), Lachmann et al. (2016), and Lity et al. (2019) apply the principles of delta modeling (Schaefer et al. 2010) to state machine test models to explicitly capture behavioral commonality and variability between product variants and then their test assets. In delta-oriented testing techniques, a product is considered as a base product and delta modules specify changes that should be applied to the base product to produce new ones (Schaefer et al. 2010). Beohar and Mousavi (2016) introduce the concept of Input-Output Featured Transition Systems (IOFTSs); IOFTSs are labeled transition systems with logical constraints on the presence or absence of features and are used as test models. Lochau et al. (2014) introduced delta-oriented architecture test modeling as a means to systematically reuse common component and integration test elements across various system variants. They employed delta-oriented test artifact reuse and regression test planning to facilitate the systematic evolution of variable test elements among incrementally tested versions and/or variants of a software system.

  • Specification-based approaches: In these approaches, specific links are defined between different configurations of an SPL and, therefore, between test cases designed for both shared and variable components of the products. Mishra (2006) uses the process algebraic specification language CSP-CASL (Roggenbach 2006) to formally specify the system; then, enhancement relationships are established between the specifications of products. In this way, test cases generated for the common parts are reused between products, and new test cases are generated for the differences in the specification. Uzuncaova et al. (2010) describe properties of features as first-order logic formulas in Alloy (Jackson 2012); by considering a product as a base, test cases are generated for the base product using Alloy Analyzer. For each new product, the test cases from previous products are reused/refined based on the differences in the specifications.

  • Requirement-based approaches: In these approaches, variability is considered as early as possible so that it can be used to design test cases. In several primary studies, use case modeling is the approach used for representing requirements (Nebut et al. 2006; Araújo et al. 2017; Hajri et al. 2020). Nebut et al. (2006) enhance use cases with parameters and contracts used for presenting variability at the level of requirements; test-related artifacts (e.g., test objectives, test scenarios, and behavioral test patterns) are produced based on the enhanced use cases. Araújo et al. (2017) express use case specifications in a controlled natural language by considering variabilities; the specifications are then used for generating test procedures and their input and output. Hajri et al. (2020) propose to use the Product line Use case modeling Method (PUM) that supports variability modeling in use case diagrams; by using the requirement traceability mechanism, test cases for a new product are generated by reusing/adapting existing test cases or by defining new test cases.

    Kang et al. (2015) propose a method called Systematic Software Product Line Test - Data (SSPLT-D) in which a set of platform test requirements are first defined throughout Domain Engineering and then platform test scenarios, platform test cases, and platform test data are created based on test requirements. Nebut et al. (2003) propose to derive a set of behavioral test patterns from the requirement model and then use them to produce product-specific test cases.
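
To give a rough flavor of the 150% model idea mentioned above, the following Python sketch (with an invented model and feature set; real approaches work on statecharts rather than flat transition lists) guards each transition with a presence condition and projects the model onto a configuration to obtain a product-specific, 100% model:

```python
# Illustration of a "150% model": every transition is guarded by a presence
# condition over features; projecting it onto one configuration yields the
# "100%" model of a single product. Features and transitions are hypothetical.

model_150 = [
    # (source, event, target, presence condition)
    ("off",     "power_on", "idle",      lambda cfg: True),
    ("idle",    "play",     "playing",   lambda cfg: True),
    ("idle",    "record",   "recording", lambda cfg: "recorder" in cfg),
    ("playing", "shuffle",  "playing",   lambda cfg: "shuffle" in cfg),
]

def project(configuration: set):
    """Derive the product-specific (100%) model for one configuration."""
    return [(s, e, t) for (s, e, t, cond) in model_150 if cond(configuration)]

print(project(set()))                     # base player: no record, no shuffle
print(project({"recorder", "shuffle"}))   # full-featured variant
```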

Table 5 Distribution of primary studies to answer RQ3

4.4 Dealing with configuration-aware software testing (RQ4)

Dealing with configuration-aware software testing, i.e., detecting valid and invalid combinations of configuration parameters, is paramount in SPL approaches because testing all combinations of SPL functionalities would be impossible and unnecessary. In our investigation, 41 out of 118 papers (∼ 35%) have addressed this. These papers have employed three methods to distinguish between valid and invalid configurations; distribution of studies based on these methods is shown in Table 6:

  • Using/proposing specific approaches/algorithms/tools to produce valid configurations: Some studies utilize constraint programming, which is used for modeling and solving constraint satisfaction problems, to generate configurations that satisfy all cross-tree constraints imposed by the feature model (Hervieu et al. 2011; Marijan et al. 2013). In the same way, Kim et al. (2013) and Akbari et al. (2017) propose constraint handling approaches to produce valid configurations; for example, Kim et al. (2013) propose an algorithm called SPLat that dynamically prunes irrelevant configurations by handling constraints.

    Using formal methods to check the cross-tree constraints defined in feature models, and thus the relations between features, is another way to find and produce valid configurations (Lackner et al. 2014; Lopez-Herrejon et al. 2014; Beohar and Mousavi 2016; Parejo et al. 2016; Ferrer et al. 2017, 2021; Akimoto et al. 2019; Arrieta et al. 2019; Jakubovski Filho et al. 2019; Luthmann et al. 2019b; Ibias et al. 2022). For example, Lackner et al. (2014) transform a feature model into propositional formulas so that any variable assignment that satisfies the formula is a valid configuration for the product line.

    Several studies suggest the utilization of sampling algorithms and techniques to generate valid configurations (Oster et al. 2010; Lochau et al. 2012a; Patel et al. 2013; Yu et al. 2014; Al-Hajjaji et al. 2016, 2019; Lee and Hwang 2019). Combinatorial Interaction Testing (CIT) is among the commonly used sampling techniques; in CIT, design-time decisions about variability are taken into account to exclude invalid interactions between features (a toy sketch of constraint-aware pairwise sampling is given at the end of this subsection). For example, Oster et al. (2010) and Lochau et al. (2012a) propose a pairwise algorithm in which dependencies and constraints between each pair of features are considered to generate all possible products that cover all valid pairs of features and their potential interactions. Saini et al. (2022) introduced a distance-based method for recognizing invalid configurations. This approach involves an initial phase in which specific CIT algorithms are employed to generate actual configurations. Following that, desired configurations are created, considering the availability of features in the configurations. The approach distinguishes valid from invalid configurations by applying a comparison technique to assess the differences between the actual and desired configurations.

    Additionally, several studies provided tool support for their specific approaches. They used SAT solvers to generate configurations that satisfy the feature model constraints, which, in turn, reduces the configuration space to be tested (Henard et al. 2013, 2014a, b; Galindo et al. 2016; Hervieu et al. 2016; Souto and d’Amorim 2016; Fragal et al. 2019; Luthmann et al. 2019a; Krieter et al. 2020; Xiang et al. 2022). Using or implementing a tool or toolkit to produce valid configurations has been proposed by Ensan et al. (2012), Al-Hajjaji et al. (2016), Arrieta et al. (2016), Al-Hajjaji et al. (2019), and Arrieta et al. (2019). For example, FeatureIDE has been used in the studies by Al-Hajjaji et al. (2016), Arrieta et al. (2016), Al-Hajjaji et al. (2019), and Arrieta et al. (2019); this tool can generate valid configurations both manually and automatically.

  • Runtime analysis: An alternative category of methods employs runtime analysis to differentiate intended from unintended interactions. Rather than relying on pre-established specifications to detect interactions, these approaches examine runtime data to distinguish valid from invalid interactions (Reuys et al. 2006; Lochau et al. 2014; Rocha et al. 2020; Vidal Silva et al. 2020). For example, Rocha et al. (2020) introduced an iterative technique called VarXplorer to inspect interactions as they emerge. When provided with a test case consisting of system inputs, VarXplorer generates a Feature Interaction Graph (FIG), which is a concise representation of all pairwise interactions among features. This FIG offers a visual depiction of the features that interact, the contextual data, and the relationships between features, including cases where one feature suppresses another. By employing an iterative approach to interaction detection, developers and testers can thoroughly analyze the FIG derived from all the test cases within a test suite.

It is worth mentioning that a third group of studies only stated that the feature model is manually analyzed to consider feature dependencies and feature grouping constraints (Olimpiew and Gomaa 2009; Cabral et al. 2010).
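
As a rough, solver-free illustration of constraint-aware sampling, the following Python sketch (features, constraint, and the greedy strategy are illustrative only; the approaches above rely on SAT/CSP solvers and dedicated CIT algorithms) enumerates the valid configurations of a tiny feature model and greedily selects a subset covering every valid pair of feature assignments:

```python
from itertools import combinations, product

# Toy constraint-aware pairwise (2-wise) sampling. The feature model is small
# enough to enumerate exhaustively; the names and constraint are hypothetical.
features = ["network", "ssl", "cache"]
constraints = [lambda cfg: not cfg["ssl"] or cfg["network"]]   # "ssl" requires "network"

valid_cfgs = [dict(zip(features, bits))
              for bits in product([False, True], repeat=len(features))
              if all(c(dict(zip(features, bits))) for c in constraints)]

def pairs(cfg):
    """All pairs of (feature, value) assignments present in a configuration."""
    return set(combinations(sorted(cfg.items()), 2))

# Only pairs achievable by at least one valid configuration need to be covered.
required = set().union(*(pairs(c) for c in valid_cfgs))

sample, covered = [], set()
while covered != required:
    best = max(valid_cfgs, key=lambda c: len(pairs(c) - covered))  # greedy pick
    sample.append(best)
    covered |= pairs(best)

print(f"{len(sample)} of {len(valid_cfgs)} valid configurations cover all valid pairs")
```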

Table 6 Distribution of primary studies to answer RQ4

4.5 Preserving traceability between test assets and other artifacts (RQ5)

One of the essential factors in SPL testing is the preservation of traceability between test assets and other artifacts throughout the SPL lifecycle, as it enhances the reusability of test assets and thus helps manage the complexity of SPL testing. However, only a few papers, 14 out of 118 (∼ 12%), take preserving traceability into account. We categorized these papers according to the type of artifact linked to the test assets; the distribution of studies based on this classification is shown in Table 7:

  • Preserving traceability between requirements and test assets: In the majority of the studies, traceability is established between requirements, often represented using UML models (primarily use cases), and various test assets. These papers have utilized various methods, including the gradual refinement of UML models into test models, direct mapping of requirements to test assets, annotation-based traceability, and the application of specific tools for automated tracing.

    Reuys et al. (2005), Nebut et al. (2006), Reis et al. (2007) and Olimpiew and Gomaa (2009) use UML models to preserve the traceability between requirements and test case scenarios. In the same way, Reuys et al. (2006) enabled the traceability between different artifacts (use cases, use case scenarios, architecture scenarios, and test case scenarios) by refining use case scenarios into test case scenarios.

    The manual definition of links between use cases and system test cases was mentioned by Hajri et al. (2020). Lackner et al. (2014), Gebizli and Sözer (2016) and Wang et al. (2017) created mapping relationships between variabilities modeled via the feature model and the test model to preserve traceability between requirements and test assets. Bucaioni et al. (2022) employed a metamodel to create a link between the product models and the SPL model. In this approach, the shared functionalities of the SPL are represented through a class diagram, and test cases are generated explicitly for these shared functionalities.

    Marijan et al. (2017) propose adding annotations to test assets to specify their relationship with other artifacts: test cases are manually annotated with tags and related to one or more test requirements, and this traceability information is then used to assess the quality of test cases with respect to requirements coverage. A minimal sketch of this tagging idea is given at the end of this subsection.

    In some studies, specific tools are used for automated tracing (Reis et al. 2006; Lochau et al. 2012a). Reis et al. (2006) use a tool named Mercury TestDirector to preserve the traceability between requirements specification, domain performance test case scenarios, and application performance test case scenarios. Lochau et al. (2012a) employed Rhapsody ATG to enable traceability between requirement models and test artifacts in an automated manner.

  • Preserving traceability between configurations and test assets: Mishra (2006) defines enhancement relationships between system specifications (i.e., different configurations of the SPL) and, thereby, between their test cases.

It is also worth mentioning that some studies have emphasized the importance of preserving traceability between test assets and other artifacts, but they provide no mechanism in this regard (Kang et al. 2015; Aduni Sulaiman et al. 2019).
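As a concrete illustration of the annotation-based idea summarized above, the sketch below tags test cases with requirement identifiers and derives a simple requirements-coverage report from those links. The identifiers and data are invented for illustration and do not reproduce the tooling of Marijan et al. (2017).

```python
# Illustrative sketch (identifiers invented): test cases annotated with
# requirement tags, and a coverage report derived from those traceability links.
from collections import defaultdict

# Each test case is tagged with the requirements it exercises.
test_annotations = {
    "TC_login_basic":   {"REQ-01", "REQ-02"},
    "TC_login_variant": {"REQ-02", "REQ-05"},
    "TC_export_report": {"REQ-07"},
}
requirements = {"REQ-01", "REQ-02", "REQ-05", "REQ-06", "REQ-07"}

# Invert the links: which test cases trace back to each requirement?
req_to_tests = defaultdict(set)
for test, reqs in test_annotations.items():
    for req in reqs:
        req_to_tests[req].add(test)

covered = set(req_to_tests)
print(f"Requirements coverage: {len(covered)}/{len(requirements)}")
print("Uncovered requirements:", sorted(requirements - covered))
```

Keeping such links explicit makes it straightforward to spot requirements with no associated tests and to trace a failing test back to the requirements it exercises.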

Table 7 Distribution of primary studies to answer RQ5

4.6 Testing non-functional requirements in SPL (RQ6)

In addition to functional requirements, non-functional requirements (NFRs) should also be tested in an SPL, but only 3 out of 118 studies consider them. These studies address various categories of NFRs, including load testing and performance profiling (Reis et al. 2006), NFRs at the hardware-in-the-loop level (Arrieta et al. 2016), and real-time properties (Luthmann et al. 2019a).

Reis et al. (2006) propose a technique that concentrates on load testing and performance profiling, employing the Object Management Group’s UML Profile (Fomel 2002) to model performance aspects. Testing NFRs as a critical aspect of cyber-physical systems is investigated at the hardware-in-the-loop level by Arrieta et al. (2016); these requirements (e.g., memory and CPU usage) are modeled via the feature model, and their coverage is considered using selected test cases and the simulation process. Luthmann et al. (2019a) present configurable parametric timed automata that extend the expressiveness of featured timed automata to enable efficient family-based verification of real-time properties (e.g., synchronization and execution time behaviors); the proposed modeling formalism aims to represent the behavioral variability of time-critical product lines and to consider minimum/maximum delay coverage.

4.7 Controlling cost/effort of SPL testing (RQ7)

As the cost/effort of SPL testing remains a significant concern within SPLE, numerous studies have proposed various techniques to address this issue. However, the lack of a standardized classification for these techniques has made it challenging to analyze them effectively. One notable exception is the extensive research conducted on product sampling techniques, which has been categorized into specific sub-techniques, including automatic selection, semi-automatic selection, and coverage (Varshosaz et al. 2018). In our analysis, we utilized these established categories to organize the diverse range of techniques proposed in the literature.

While reviewing the papers, we identified other approaches that offer potential solutions to managing the cost and effort associated with SPL testing. These approaches were categorized based on their primary contributions and grouped into distinct categories. Some of the identified approaches focus on the reuse of test assets, either from a core asset base or from previously tested products. Others provide varying degrees of automation, ranging from the implementation or utilization of specialized tools to the automation of specific testing processes, such as specification-based approaches.

Additionally, a subset of studies explored strategies for prioritizing the execution order of SPL configurations or products and the associated test cases. Another category of research aimed to minimize the size of the test suite required for testing a particular product, thereby reducing overall testing effort.

It is important to note that these techniques can often be combined. For example, test prioritization and minimization techniques can be used together with sampling techniques to further optimize the cost and effort associated with SPL testing. Furthermore, the list of techniques may grow as new publications on SPL testing appear. In the rest of this section, the details of these five techniques are provided:

  • Reusing test assets: Based on the analysis of studies, test assets (e.g., test cases and test results) are reused in two ways:

    • Reusing test assets from a core asset base: In some studies, domain test scenarios containing variabilities are created in Domain Engineering; some of these scenarios are reused, and some of them are adapted based on the application requirements (Nebut et al. 2003; Reuys et al. 2005, 2006; Reis et al. 2006). Some other studies are focused on reusing test cases by selecting them from a repository based on the application requirements (Arrieta et al. 2016; Wang et al. 2017; Lima et al. 2020) or by binding variabilities defined in abstract test cases based on specific criteria (e.g., coverage criteria) (Al-Dallal and Sorenson 2008; Olimpiew and Gomaa 2009; Lackner et al. 2014; Bürdek et al. 2015; Kang et al. 2015; Ebert et al. 2019; Fragal et al. 2019; Luthmann et al. 2019a).

    • Reusing test assets between products: In some studies, test assets are reused between products by analyzing differences between the current product and previously tested products (Mishra 2006; Uzuncaova et al. 2010; Neto et al. 2010; Lochau et al. 2012b, 2014; Xu et al. 2013; Lachmann et al. 2015, 2016; Beohar and Mousavi 2016; Fragal et al. 2017; Li et al. 2018; Ebert et al. 2019; Lity et al. 2019; Luthmann et al. 2019a; Tuglular et al. 2019; Hajri et al. 2020). The technique usually used in these studies is the delta-oriented testing technique, based on regression testing principles and delta modeling concepts. By considering delta modules, test cases and test results from previously tested products can be reused and adapted for the new product.

  • Providing a specific level of automation: We found two ways by which the studies provide a particular level of automation:

    • Implementing/using a specific tool(s): In 49 studies, authors claimed that their proposed approach is automatically performed using specific tools. However, the majority of these studies fail to provide any details regarding the specific tools employed for this purpose (e.g., Reis et al. 2006; Olimpiew and Gomaa 2009; Calvagna et al. 2013; Li et al. 2018; Safdar et al. 2021). Table 8 shows that only 19 of these studies have provided online access to their tools. It is worth noting that most of these tools are in the form of research prototypes. Instead of developing a novel tool tailored to their proposed approach, some studies utilize a set of pre-existing tools at various stages of their approach. For instance, in the case of Parejo et al. (2016), the Combinatorial tool and Feature Model Testing System (FMTS), as introduced by Ferreira et al. (2013), were employed to derive pairs and calculate solution fitness, respectively.

    • Using specific techniques that help automate the testing process: Specification-based testing was used in some studies (e.g., Mishra 2006) as an appropriate step in automating the testing process because of its precise nature in describing the desired properties of the system under test using a formal language. Model-based testing is another approach that helps automate the testing process. For example, Bucaioni et al. (2022) introduced a model-based approach in which test scripts are generated from shared SPL features by model transformation.

  • Handling the selection of products to test: Testing all possible combinations of features is almost impossible in terms of resources and execution time (Cohen et al. 2006). Specific approaches have been proposed to determine a minimal set of configurations so that the correctness of the entire family can be inferred from the successful verification of this set. Through our examination of the studies, we have identified diverse techniques for choosing a subset of products. These techniques have been categorized according to the categories for product sampling provided by Varshosaz et al. (2018). The distribution of studies across these techniques is shown in Table 9:

    • Automatic selection: There are two general types of automatic selection techniques: greedy and meta-heuristic search:

      • Greedy: Greedy algorithms (Vazirani 2001) iteratively construct a solution by making locally optimal choices. In the context of SPLs, specific measures (e.g., requirements/feature coverage) are used to determine which configuration is selected as the locally optimal choice in each iteration.

      • Meta-heuristic search: In this category, identifying a subset of products is treated as an optimization problem. Meta-heuristic algorithms target this problem by employing computational search within the configuration space to find an optimal subset of products (Varshosaz et al. 2018). Some studies have applied evolutionary algorithms, random search, and genetic algorithms using an aggregation function of different objectives such as cost, number of products, number of revealed faults, pairwise coverage, and mutation score (e.g., Ensan et al. 2012). Other studies propose multi-objective algorithms (e.g., Matnei et al. 2016). Hyper-heuristics are another category of approaches explored in some studies to solve the product sampling problem (e.g., Strickler et al. 2016). A hyper-heuristic is a methodology that helps automate the configuration of heuristic algorithms and determine low-level heuristics (Jakubovski Filho et al. 2018). To consider user preferences during the selection of products, while exploiting the benefits of hyper-heuristics, a preference-based hyper-heuristic approach has been proposed by Jakubovski Filho et al. (2018); this approach is an example of the algorithms proposed in the field of Preference and Search-Based Software Engineering (PSBSE) (Ferreira et al. 2017b).

    • Semi-automatic selection: In semi-automated selection, various factors are considered, including the desired number of generated products, the allocated sampling time, and the level of coverage, such as coverage of feature interactions. Moreover, the complete sample set or an initial set produced by other sampling techniques may serve as a starting point for the sampling process (Varshosaz et al. 2018). As an example, Reuling et al. (2015) propose a framework for fault-based (re-)generation of configuration samples based on feature-diagram mutation. The underlying rationale for this approach is rooted in the recognition that subsets of products generated by CIT approaches can often contain numerous redundant or less significant feature combinations. Furthermore, these approaches may overlook crucial or error-prone combinations beyond t-wise, primarily due to their black-box nature, which typically lacks consideration of domain-specific knowledge, including the fault history associated with feature combinations. The authors argue that the integration of their proposed approach with pairwise CIT sampling can potentially enhance the efficiency and effectiveness of SPL testing.

    • Coverage: Coverage criteria are frequently employed to ensure the quality of product sampling. One commonly used criterion is the coverage of feature interactions (Varshosaz et al. 2018). CIT techniques focus on the interactions between different features or configuration options, as these interactions often lead to defects in software systems. These techniques are classified as greedy by Cohen et al. (2007) since they select a subset of configurations in which each configuration covers as many uncovered combinations as possible. However, they are categorized separately in some other studies (e.g., Cmyrev and Reissing 2014). We also prefer to separate this category of techniques from greedy algorithms since they are specifically focused on covering feature interactions. The studies that provide details of either a process or an algorithm for CIT are shown in Table 9.

Table 8 The list of tools implemented in the studies (last accessed October 2023)

The most popular kind of CIT is pairwise testing (2-wise), a special case of t-wise coverage; in t-wise testing, configurations are selected in a way that guarantees that all combinations of t features are tested. Kuhn et al. (2004) showed that 80% of bugs can be revealed by investigating interactions between two variables. Furthermore, for problems of large complexity, pairwise testing has proven to be most effective, since finding inconsistencies in a model involving only two features may be easier than investigating all combinations of features at once (do Carmo Machado et al. 2014). However, Steffens et al. (2012) revealed that interactions of three or more features usually occur in the SPL testing field; therefore, considering higher-strength combinations can play an important role in revealing faults. To this end, some studies claimed that their proposed approach for t-wise coverage can work with any value of t (e.g., Krieter et al. 2020). However, high-strength (t > 3) feature interaction can lead to a large number of valid configurations and therefore complicate the problem of t-wise coverage (Qian et al. 2018). Therefore, selecting a specific value for t is usually a trade-off between cost and the effectiveness of revealing faults.
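To illustrate the pairwise idea discussed above, the sketch below greedily selects configurations until every feature-value pair that occurs in some valid configuration is covered. It is a simplified, self-contained illustration of 2-wise sampling over an invented configuration set, not a reimplementation of any cited CIT tool.

```python
# Simplified illustration of greedy pairwise (2-wise) sampling: keep picking the
# valid configuration that covers the most still-uncovered feature-value pairs.
from itertools import combinations

# Toy set of valid configurations (feature -> selected?), e.g. produced by a sampler.
valid_configs = [
    {"GUI": True,  "CLI": False, "Logging": True,  "Encryption": False},
    {"GUI": True,  "CLI": False, "Logging": False, "Encryption": True},
    {"GUI": False, "CLI": True,  "Logging": True,  "Encryption": True},
    {"GUI": False, "CLI": True,  "Logging": False, "Encryption": False},
]

def pairs(config):
    """All 2-wise feature-value combinations exhibited by one configuration."""
    return {frozenset(p) for p in combinations(sorted(config.items()), 2)}

# Pairs that are achievable at all, given the valid configurations.
to_cover = set().union(*(pairs(c) for c in valid_configs))

sample, remaining = [], list(valid_configs)
while to_cover and remaining:
    best = max(remaining, key=lambda c: len(pairs(c) & to_cover))  # greedy step
    sample.append(best)
    to_cover -= pairs(best)
    remaining.remove(best)

print(f"Selected {len(sample)} of {len(valid_configs)} configurations for 2-wise coverage")
```

The same greedy scheme generalizes to higher strengths by enumerating t-tuples instead of pairs, which is where the trade-off between coverage strength and the number of selected configurations becomes visible.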

Table 9 Distribution of primary studies based on the techniques used for product sampling

  • Prioritizing configurations/test cases: Test case prioritization is focused on defining the execution order of test cases in a way that attempts to increase their effectiveness at meeting some performance goals (Li et al. 2007; Catal and Mishra 2012). We found two categories of studies in this regard:

    • Several studies propose approaches for prioritizing the SPL configurations/products to be tested; these approaches are usually used as a complement to product selection/sampling techniques. In some of these studies, one or more objectives are defined for configuration prioritization (e.g., high failure rate and high overall requirement coverage) (Scheidemann 2006; Sánchez et al. 2014; Wang et al. 2014; Galindo et al. 2016; Parejo et al. 2016; Akimoto et al. 2019; Hierons et al. 2020; Pett et al. 2020; Ferrer et al. 2021); the evaluations conducted by Parejo et al. (2016) indicate that multi-objective prioritization typically leads to faster fault detection than mono-objective prioritization. In another category of studies, the similarity between configurations with respect to feature selections is used as a criterion for product prioritization (similarity-based prioritization) (Arrieta et al. 2015; Al-Hajjaji et al. 2017a, 2019). In these approaches, configurations are prioritized based on their dissimilarity: at each step, the configuration with the lowest similarity, in terms of feature selections, to the previously selected configurations is chosen (a minimal sketch of this dissimilarity-driven ordering is given at the end of this subsection). Al-Hajjaji et al. (2017b) propose a delta-oriented product prioritization method, as similarity-based prioritization techniques do not consider all actual differences between products; in this approach, instead of comparing products in terms of selected features, delta-modeling artifacts (Clarke et al. 2010) are used to prioritize products.

    • Some studies focus on prioritizing test cases for products. Lima et al. (2020) propose a learning-based approach to prioritize test cases in the Continuous Integration (CI) cycles of highly configurable systems. Arrieta et al. (2015), Marijan et al. (2017), Markiegi et al. (2017), Arrieta et al. (2019) and Hajri et al. (2020) use specific criteria to prioritize test cases (e.g., fault detection capability, test execution time, or test case appearance frequency). In another category of studies, similarity-based approaches are proposed to prioritize test cases (e.g., Devroey et al. 2017; Lachmann et al. 2015, 2016). As an example, Devroey et al. (2017) propose an algorithm to generate and sort dissimilar tests to achieve good fault finding; to this end, a distance function is calculated based on the actions executed by the test case. Furthermore, to provide good coverage of a large number of products, test cases are also prioritized based on the products that may execute them.

  • Minimizing test suite: This technique is focused on minimizing the test suite size for testing a product, while preserving fault detection capability and testing coverage of the original test suite. Al-Dallal and Sorenson (2008), Stricker et al. (2010), Kim et al. (2012) and Beohar and Mousavi (2016) discuss approaches in which test cases already covered during Domain Engineering or test cases related to common parts that have already been executed in previous products are ignored. Other studies propose specific approaches to reduce redundant test executions for SPL regression testing by pruning tests that are not impacted by changes (Lachmann et al. 2016; Jung et al. 2019, 2020, 2022; Souto and d’Amorim 2018).

    There are also studies focused on improving the test generation process to produce a minimal set of test cases while achieving specific objectives (e.g., coverage and cost/time) (Patel et al. 2013; Wang et al. 2015; Gebizli and Sözer 2016; Akbari et al. 2017; Marijan et al. 2017; Aduni Sulaiman et al. 2019; Markiegi et al. 2019; Rocha et al. 2020). As an example, Akbari et al. (2017) propose a method in which features in the feature model are prioritized based on the domain engineer’s decisions and the constraints that exist between features; integration test cases are then produced by considering the specified priorities. Furthermore, there are approaches that are not directly focused on test suite minimization but help reduce redundant execution of tests for unnecessary configurations (Kim et al. 2013; Souto and d’Amorim 2018). These approaches remove the valid configurations that are unnecessary for the execution of each test.

    The distribution of studies based on the identified techniques is presented in Table 10. As observed, the majority of the studies (∼ 62%) are focused on proposing a specific level of automation. However, many of these studies do not offer details regarding the specific tools utilized for this purpose. The second most researched category of approaches pertains to handling the selection of products to test (∼ 39%). Following this are techniques involving reusing test assets (∼ 25%), prioritizing configurations/test cases (∼ 18%), and minimizing test suite size (∼ 15%).
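As referenced earlier in this section, the sketch below illustrates the dissimilarity-driven ordering that underlies similarity-based product prioritization: at each step, the configuration farthest, in terms of feature selections, from those already ordered is chosen next. The configurations and the Hamming-style distance are invented for illustration and do not reproduce any specific cited approach.

```python
# Illustrative sketch (configurations invented): order products so that each newly
# chosen one is maximally dissimilar, in its feature selections, from those chosen so far.
def distance(a, b):
    """Number of features selected in exactly one of the two configurations."""
    return len(a ^ b)  # symmetric difference of the feature sets

configs = {
    "P1": {"GUI", "Logging"},
    "P2": {"GUI", "Encryption"},
    "P3": {"CLI", "Logging", "Encryption"},
    "P4": {"CLI"},
}

ordered = ["P1"]  # seed with an arbitrary first product
remaining = [p for p in configs if p not in ordered]
while remaining:
    # Pick the product whose minimum distance to the already-ordered ones is largest.
    nxt = max(remaining,
              key=lambda p: min(distance(configs[p], configs[q]) for q in ordered))
    ordered.append(nxt)
    remaining.remove(nxt)

print("Prioritized order:", ordered)
```

Running the products (and their test cases) in such an order aims to exercise as many different feature combinations as possible early, so that faults tied to untested combinations surface sooner.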

Table 10 Distribution of primary studies to answer RQ7

5 Threats to validity

In this section, we discuss the main threats associated with the validation of this study, classified according to the categorization proposed by Ampatzoglou et al. (2019). These particular threats are categorized into three categories: study selection validity, data validity, and research validity.

5.1 Study selection validity

One of the main threats to any secondary study is its inability to guarantee the inclusion of all relevant articles in the field. To mitigate this threat, a meeting involving all researchers was conducted to discuss and refine the search scope and keywords. We then evaluated the validity of the search string by conducting a limited manual search and checking whether its results appeared among the results obtained by running the search string.

To ensure the comprehensive identification of all relevant studies in our search process, we rigorously followed the guidelines provided by Kitchenham and Charters (2007). We conducted a bibliographic search of published literature reviews in the SPL testing field. We updated the list of studies by applying a search string to multiple digital libraries and performed the backward and forward snowballing process. Therefore, we are confident that we have provided good coverage of studies in the SPL testing field.

During the primary study selection process, to minimize potential bias in applying inclusion/exclusion criteria, these criteria were clearly defined and regularly updated in our protocol. The first author applied inclusion and exclusion criteria. However, to reduce the researcher bias, the results of this stage were validated by the second and third authors of this paper.

Regarding quality assessment, we used a set of quality criteria to examine the studies. These criteria were reused from those proposed by Dybå and Dingsøyr (2008). Two researchers participated in the application of quality assessment criteria. We also conducted regular meetings to address and resolve any conflicts that arose during the process effectively.

5.2 Data validity

One of the main threats regarding data validity is data extraction bias. Subjective bias during the data extraction process can lead to inconsistent interpretation of the extracted data by researchers. To mitigate this risk, two researchers collaborated during the data extraction phase, conducting resolution sessions to address any emerging ambiguities. Nevertheless, because certain studies did not provide explicit details on specific aspects of SPL testing, such as test levels, we had to make subjective interpretations based on information scattered throughout these studies.

Subjective bias may also lead to the misclassification of data in response to RQ3–RQ7. Since no predefined categories were available, we adopted an exploratory approach, scrutinizing the extracted data and identifying pertinent categories. To mitigate this potential issue, we introduced a structured data extraction form, conducted quality assessments on the chosen studies, and maintained ongoing discussions to ensure consistency in the data extraction process and category definitions. However, it is essential to acknowledge the potential influence of researcher bias on data extraction and presentation within this study.

5.3 Research validity

Research validity encompasses threats identified at all stages of our SLR.

We extensively searched secondary studies, as detailed in Sect. 3.2. This approach enabled us to identify research gaps, consider the scope and definition of RQs, and gain insights into the current state-of-the-art within the domain of SPL testing.

In our exploration of potential threats to the repeatability of this SLR, we acknowledge the complexity inherent in replicating research. Specifically, there is a concern that other researchers may not repeat the SLR with precisely the same results. To mitigate this threat, we provided the details of the SLR methodology so that other researchers can replicate the study; furthermore, we have made all the data collected during the SLR process available online. However, as subjectivity in analyzing the studies is a major issue in conducting a literature review, we cannot guarantee that other researchers will achieve exactly the same results.

One serious threat to the validity of the SLR is the inability to generalize the study’s results to other scenarios and application domains. We included only the studies empirically evaluated in our analysis to handle this threat. However, as most evaluations do not refer to real-world practice, the results and classifications presented in this study may not fully apply to practical settings. Moreover, our SLR intentionally focused exclusively on SPLs. This deliberate choice was made to answer specific questions tailored for SPL testing. While this focus enhances the depth of our insights into SPL testing practices, it inevitably limits the applicability of our findings to the broader context of configurable systems. The decision not to include configurable systems was strategic, considering the extensive body of literature on configurable system testing, which would have required substantial additional time and effort for comprehensive analysis.

6 Discussion

In this study, we presented a systematic review of testing approaches proposed in the SPLE field. We have investigated seven RQs:

  • RQ1: How is the research on SPL testing characterized?

    The analysis indicates that the SPL testing field has attracted significant attention from researchers in recent years, with an increase in empirically evaluated studies. Although the overall number of publications has grown, recent years have seen a decline. Most primary studies are published in conferences, with case studies, experiments, and expert surveys being the common evaluation methods. However, the strength of evidence supporting the proposed approaches varies, with academic studies (60%) being the most common, followed by demonstrations (17%). Only a small number of studies involve industrial systems or real data sets (16%) or industrial practice (13%), indicating an overall low level of evidence in the field.

  • RQ2. What levels of tests are usually executed throughout the SPL lifecycle (i.e., Domain Engineering and Application Engineering)?

    In Domain Engineering, testing activities include developing test assets for later use and testing assets to detect faults early. In Application Engineering, activities involve creating specific product test assets, designing additional product-specific tests, and executing tests. Some studies focus on reducing the number of products tested or prioritizing products to enhance testing efficiency. The distribution of studies based on test levels shows that in Application Engineering, integration testing and system/acceptance testing are the most commonly reported levels. In contrast, unit testing is less frequently reported in both phases. This indicates a strong focus on higher levels of testing in the SPL testing field, particularly in the Application Engineering phase.

  • RQ3. How are test assets created by considering commonalities and variabilities?

    Creating test assets to address commonality and variability in SPL testing is crucial for enhancing reusability and minimizing faults in core assets. Our analysis categorized these approaches into three groups: model-based, specification-based, and requirement-based.

    Model-based approaches utilize formal or semi-formal models of SPL variability to design and execute tests. Specification-based approaches define specific links between different SPL configurations and test cases. Requirement-based approaches prioritize considering variability early in test case design. The distribution of studies across these categories indicates that model-based techniques are the most commonly used in the examined studies.

  • RQ4. How do SPL approaches deal with configuration-aware software testing?

    Dealing with configuration-aware software testing, particularly in distinguishing valid and invalid combinations of configuration parameters, is crucial in SPL approaches. Testing all possible combinations of SPL functionalities is not only impractical but also unnecessary. The studies have employed three main methods to distinguish between valid and invalid configurations: Using/proposing specific approaches, algorithms, or tools, runtime analysis, and manual analysis. The distribution of studies across these methods indicates that the majority of the studies have either proposed specific methods or algorithms or have utilized already available tools.

  • RQ5. How is the traceability between test assets and other artifacts of SPL preserved throughout the SPL lifecycle?

    Preservation of traceability between test assets and other artifacts is a crucial factor in SPL testing as it enhances the reusability of test assets and manages the complexity of SPL testing. However, only a few papers consider preserving traceability throughout the SPL lifecycle. The papers are categorized based on the types of artifacts associated with test assets, focusing on preserving traceability between requirements and test assets as well as between configurations and test assets. The distribution of primary studies addressing this aspect highlights that most of the studies focus on preserving traceability between requirements and test assets.

  • RQ6. How are Non-Functional Requirements (NFRs) tested in SPL?

    Testing NFRs in SPLs has been rarely examined by researchers, with only three studies addressing this aspect. These studies cover various categories of NFRs, such as load testing, performance profiling, NFRs at the hardware-in-the-loop level, and real-time properties.

  • RQ7. What mechanisms have been used for controlling cost/effort of SPL testing?

    Various techniques have been proposed to manage the cost and effort associated with SPL testing. However, the lack of a standardized classification for these techniques has made their analysis challenging. Notably, research on product sampling techniques has been extensively categorized into sub-techniques such as automatic selection, semi-automatic selection, and coverage. Beyond sampling techniques, other approaches have emerged, categorized based on their primary contributions, including reusing test assets, providing different levels of automation, handling product selection for testing, prioritizing configurations/test cases, and minimizing the test suite size.

    These techniques are often combinable, as seen in the use of test prioritization and minimization techniques alongside sampling techniques to optimize testing cost and effort further. Moreover, the list of techniques continues to evolve with new publications on SPL testing. The distribution of studies reveals that the majority focus on proposing a specific level of automation (∼ 62%). However, many studies lack details on the specific tools used for this purpose. The second most researched category involves handling the selection of products to test (∼ 39%). Additionally, techniques related to reusing test assets (∼ 25%), prioritizing configurations/test cases (∼ 18%), and minimizing test suite size (∼ 15%) are also explored.

We only included studies empirically evaluated in our analysis. In this discussion, we emphasize the maturity of evaluations conducted in these studies, highlight the contributions of the studies in addressing the research questions, present the main findings, and propose research directions to address identified gaps. It is important to note that our SLR intentionally focused exclusively on SPLs. We deliberately excluded the broader context of configurable systems from our analysis to have a clear focus for our article. Therefore, all the findings and research gaps reported in this section are based on our analysis within the SPL testing area. We acknowledge that this might lead to missing synergies with contributions from the broader field of configurable systems. Still, we hope this SLR can be the basis for exploring these aspects in future work.

6.1 Overview of evaluation maturity and studies’ contributions

Proposed approaches have been evaluated using three types of evaluation methods: case studies, experiments, and expert surveys. However, there is variation in the scope and type of SPLs employed in these evaluations. Different types of SPLs have been employed, representing diverse application domains such as embedded systems (e.g., automotive and medical systems), web-based systems, banking systems, and smartphone and vending machine SPLs. We categorized the scope of applications employed in the evaluations into three main groups: industrial systems with real data sets, SPLs sourced from online repositories (e.g., the SPLOT repository) or extracted from existing sources, and the development of a demonstrator. It is important to note that some studies utilized more than one category of applications, for instance, both industrial SPLs and SPLs available online. Approximately 60% of the studies (71 studies) conducted evaluations using SPLs available online or derived from prior research. Around 17% (20 studies) involved the development of a demonstrator for assessing the proposed approach. Only 29% (34 studies) utilized an industrial-scale SPL (Industrial study or Industrial practice) for evaluating their approach. This issue may jeopardize the adoption of the proposed approaches in industry; therefore, proposed approaches for SPL testing need to improve from an evaluation perspective.

Discussing threats to validity is crucial in research since it helps researchers and readers understand the limitations and potential challenges associated with the study. However, an analysis of the included studies reveals that only 32 primary studies (∼ 27%) extensively discussed threats to validity. In approximately 42 studies (∼ 36%), the examination of threats to validity was brief. Notably, 44 studies (∼ 37%) entirely neglected to address this crucial aspect.

Another aspect that is worth analyzing is the distribution of the studies based on their contribution to the research questions. Figure 3 represents the frequencies of studies according to the research questions addressed by them. It should be mentioned that some studies covered more than one topic; therefore, the total amount shown in Fig. 3 exceeds the total number of studies selected for final analysis. As seen in Fig. 3, most studies address the questions RQ7 (Controlling cost/effort of SPL testing) and RQ2 (Test levels in SPL testing). Moreover, there is notable research interest in the area of configuration-aware testing (RQ4), followed by a substantial focus on variability-aware creation of test assets (RQ3). However, some aspects of SPL testing have rarely been considered and, therefore, need new solutions, including RQ5 (Traceability between test assets and other artifacts) and RQ6 (Non-functional testing).

Fig. 3

Distribution of studies by the contribution to the research questions

6.2 Main findings

We analyzed the data based on the content structuring/theme analysis approach of Mayring (2014). Initially, the data extracted from the extraction form provided us with a list of key challenges and sub-themes. In the next step, we inductively created categories within the themes to summarize them (analytical themes). The results of this analysis are shown in Table 11. In the rest of this section, we present various gaps and concerns that necessitate further exploration and attention from both researchers and practitioners:

  • Variability management: Effective variability management in SPLs is crucial, yet it introduces complexities that can pose challenges to testing (Sect. 4.3). One facet that needs further exploration is the challenges associated with variability control. It demands a more in-depth investigation to identify and analyze challenges arising from the diverse features and configurations inherent in SPLs. These challenges encompass the complexities introduced by numerous potential combinations and the possibility of unforeseen interactions among variable elements. While this aspect has been previously examined, the key concern lies in the applicability of the proposed solutions and approaches in real-world scenarios. For example, one of the most investigated solutions involves selecting a subset of products for testing. However, the potential for unseen interactions between features in new products to result in faults raises doubts. Furthermore, many of the proposed approaches have only been evaluated at a proof-of-concept level, necessitating a more in-depth investigation into their suitability for industrial SPL applications.

    Another crucial aspect involves examining variability modeling. This includes an analysis of the current state of variability modeling in SPL testing and an exploration of opportunities to enhance modeling techniques to address testing challenges. While model-based approaches, commonly used to create variant-rich test assets, have shown promise in SPL testing, there is still room for improvement in automating the generation of test cases and ensuring comprehensive coverage based on variability models. Utilizing model-based approaches can automate the process of transforming high-level test assets (e.g., test scenarios) and generating low-level test assets (e.g., test cases and test data).

  • Non-functional testing: Despite the fact that functional testing of SPLs has been extensively investigated, non-functional testing aspects need greater focus and specific methodologies (Sect. 4.6). This particular gap has already been acknowledged in previous literature reviews. Non-functional requirements encompass diverse dimensions, including but not limited to performance, security, usability, and scalability. While some studies have explored aspects such as real-time behaviors and performance, there remains a need for further research to comprehensively address diverse facets within this domain. Moreover, the inherent nature of non-functional requirements significantly shapes testing strategies. Considering their distinct characteristics and evaluation criteria, it is crucial to investigate how distinct testing approaches are essential for various aspects like performance testing, security testing, and usability testing.

    Non-functional testing, particularly in critical areas such as performance and security, poses challenges due to its resource-intensive nature. Investigating the challenges associated with acquiring and allocating resources for thorough non-functional testing throughout the SPL lifecycle is crucial for effective quality assurance.

    The complexities of seamlessly integrating non-functional testing with functional testing necessitate further exploration. Examining how the interplay between these two testing dimensions influences the overall quality assurance process will contribute valuable insights to the field.

  • Tool support: Given the substantial testing effort required for SPLs, the availability of tools specifically designed for SPL testing is crucial (Sect. 4.7). The analysis of the studies with respect to automation provided by the tools indicates that most of the tool implementations are proof-of-concept prototypes developed for validating the proposed approach. Therefore, developing more robust and user-friendly tools can significantly help practitioners in their testing efforts. This particular challenge has previously been discussed in prior literature reviews.

    Some specific areas need further exploration. Evaluating the effectiveness and efficiency of existing SPL testing tools explores capabilities, limitations, and areas for improvement in tools designed for various testing activities within the SPL lifecycle. Analyzing how well testing tools adapt to changes in SPL configurations includes investigating their ability to accommodate evolving feature sets, configurations, and architectural variations, ensuring continued effectiveness. Assessing the user experience and usability of SPL testing tools explores how user-friendly and accessible tools are for practitioners involved in SPL testing, considering factors such as ease of use, learning curve, and user satisfaction.

  • Regression testing: Effectively handling regression testing in SPLs, where modifications to one product can affect others, presents an intricate challenge (Sect. 4.7). Regression test selection/prioritization/minimization and architecture-based regression testing are potential points for future research. Test case selection is focused on choosing a set of relevant test cases to test the modified version of the system, and the aim of test minimization is to remove the redundant/irrelevant test cases from the existing test suit. Test case prioritization aims at ordering and ranking test cases based on specific criteria such as importance and likelihood of failure. All these techniques aim to reduce the cost/effort of SPL testing after applying any change to products or the SPL architecture.

    An important aspect is analyzing how changes and evolutions in the SPL architecture impact regression testing strategies. This investigation includes understanding the challenges of maintaining test suites across evolving SPL configurations and the need for adaptive regression testing approaches.

    Additionally, exploring the benefits and challenges of implementing automated regression testing within the SPL context is crucial. This requires an analysis of efficiency gains, potential pitfalls, and strategies to optimize the effectiveness of automated regression testing in SPL scenarios.

    Moreover, investigating challenges related to maintaining traceability between evolving codebase versions and regression test suites is critical. This requires exploring strategies to preserve traceability links, ensuring that regression testing aligns with the dynamic nature of SPL development.

  • Industrial evaluations: Encouraging the adoption of SPL testing practices in industrial settings requires addressing practical challenges (Sect. 3.3 and 4.1). This includes offering guidance tailored for industry-specific SPL testing and conducting industrial evaluations.

    To enhance the industry adoption of SPL approaches, offering practical insights and recommendations is essential. This involves providing tailored guidance to help organizations navigate the unique challenges and requirements of adopting SPL testing methods in their specific industry domains. Additionally, there is a need to move beyond proof-of-concept evaluations and conduct practical assessments to verify the feasibility, scalability, and effectiveness of proposed SPL testing methods in diverse industrial contexts.

  • Test levels throughout the SPL lifecycle: Exploring the details of a test level throughout the SPL lifecycle and illustrating the challenges associated with neglecting a particular test level would provide valuable insights for practitioners (Sect. 4.2). Two levels of tests are commonly executed throughout Domain Engineering: Unit testing and Integration testing. Although testing common core assets of an SPL is vital to detect faults as soon as possible, a few studies have considered the execution of tests in domain engineering. Therefore, it would be useful to conduct further investigations regarding how to execute a specific level of test in Domain Engineering and the consequences of not performing it. In Application Engineering, three levels of tests are usually executed: Unit testing, Integration testing, and System/acceptance testing. The two last levels have been investigated in most of the studies. It is worth mentioning that Unit testing has been investigated as a level of test in Application Engineering in a few studies published in recent years. In contrast, previous literature reviews have not reported this level of test in Application Engineering (e.g., Pérez et al. 2009). This indicates no consensus on the test levels executed during Domain Engineering and Application Engineering.

    Another aspect that needs further exploration involves examining the influence of variabilities inherent in SPLs on different test levels. This requires understanding how the presence of variable features across products affects test activities, including planning, design, and execution at each testing level. Additionally, there is a need to investigate how test levels adapt to requirements and feature set changes throughout the SPL lifecycle. This requires exploring the challenges and opportunities associated with maintaining effective testing strategies in response to the dynamic nature of evolving product configurations.

  • Preserving the traceability between test assets and development artifacts: Preserving traceability between test assets and development artifacts in SPLs is particularly challenging due to the complex relationships between product variants and the shared assets (Sect. 4.5). Studies that target testing SPLs (very) rarely consider traceability explicitly. Examining the challenges associated with preserving traceability is crucial, especially when dealing with evolving product configurations within the SPL testing environment. Although researchers have proposed certain methods (for example, Reis et al. (2007) preserved the traceability between requirements and test case scenarios using UML models, and Reuys et al. (2006) enabled traceability between artifacts by refining use case scenarios into test case scenarios), there remains a need to investigate more efficient approaches for modeling and representing traceability relationships that consider feature variability and configuration management. Furthermore, exploring the creation of automated tools and techniques for establishing and consistently updating traceability links in response to the evolving nature of SPLs presents an engaging area for future research.

Table 11 The results of the analysis based on qualitative content analysis

To compare findings with previous SLRs, Table 12 presents a summary of the findings from both the current study and prior literature reviews (Pérez et al. 2009; Engström and Runeson 2011; Da Mota Silveira Neto et al. 2011; do Carmo Machado et al. 2014).

Table 12 Comparison of findings between current study and prior literature reviews

7 Related work

This research aims to provide researchers and practitioners with an overview of state-of-the-art testing practices applied to SPL and identify the gaps between required techniques and existing approaches. Accordingly, we conducted an SLR to analyze existing approaches to SPL testing. Therefore, SLRs and SMSs on SPL testing can be considered as works related to this research. To the best of our knowledge, four papers have systematically analyzed approaches focused on SPL testing (Pérez et al. 2009; Engström and Runeson 2011; Da Mota Silveira Neto et al. 2011; do Carmo Machado et al. 2014).

Pérez et al. (2009) conducted an SLR to identify experience reports and initiatives carried out in the SPL testing area. In this work, primary studies were classified into seven categories: unit testing, integration testing, functional testing, SPL architecture testing, embedded system testing, testing process, and testing effort in SPL. The authors then presented a summary of each area. This SLR is similar to our work in that both investigate test levels; however, our work is broader in scope, since we investigated more aspects of SPL testing.

Engström and Runeson (2011) conducted an SMS by analyzing papers published up to 2008. The authors mapped studies into seven categories based on their research focus: Test organization and process, Test management, Testability, System and acceptance testing, Integration testing, Unit testing, and Test automation. They also identified challenges in SPL testing and needs for future research. This SMS has similarities with our work regarding specific SPL aspects investigated, including testing levels and test automation. However, the research questions designed by Engström and Runeson (2011) are more general, focusing on specifying challenges and topics investigated in SPL testing.

Da Mota Silveira Neto et al. (2011) conducted an SMS to investigate state-of-the-art testing practices by analyzing a set of 45 publications dated from 1993 to 2009. Primary studies are mapped into nine categories: Testing strategy, Static and dynamic analysis, Testing levels, Regression testing, Non-functional testing, Commonality and variability testing, Variant binding time, Effort reduction, and Test measurement. Some of the research questions designed by Da Mota Silveira Neto et al. (2011) are similar to the ones investigated in our work (e.g., testing SPLs while considering commonalities and variabilities). However, our work is broader in scope since we analyzed 110 papers published up to 2022. Furthermore, we only included empirically evaluated studies in our review.

do Carmo Machado et al. (2014) conducted an SLR by analyzing 49 studies published up to 2013; this SLR aimed to identify testing strategies that could achieve higher defect detection rates and reduced quality assurance effort. Identifying strategies to handle the selection of products to test has been investigated both in do Carmo Machado et al. (2014) and in our work. Furthermore, similar to our work, the initial set of primary studies in that SLR was identified by investigating previously conducted SLRs or SMSs published up to the year 2009; the authors also included only empirically evaluated studies. However, our work investigates more aspects of SPL testing (e.g., preserving traceability between test assets and other artifacts) and analyzes more studies (110 papers).

Some literature reviews have focused specifically on a single aspect of SPL testing. As an example, Lopez-Herrejon et al. (2015) conducted an SMS to identify techniques that have been applied for combinatorial interaction testing of SPLs. Our work is broader in scope, since we did not limit the studies to a specific technique.

In general, the previous literature reviews and our work complement each other regarding the research questions addressed. Some aspects of SPL testing have not been considered in detail in previous reviews: techniques used for preserving traceability between test assets and other artifacts, techniques employed for identifying valid and invalid configurations, and the different ways to control the cost/effort of SPL testing were not covered to an extent that makes it possible to identify the current status of research and practice from the perspective of those aspects.

8 Conclusions and future work

The goal of SPLE is to improve the effectiveness and efficiency of software development by managing commonalities and variabilities among products. Testing is an essential part of SPLE to achieve the benefits of an SPL. It is focused on detecting potential faults in core assets created during Domain Engineering and in products created during Application Engineering by reusing core assets. This paper presents the results of a systematic literature review of testing in SPLE. The SLR aimed to investigate specific aspects of SPL testing that were formulated as seven research questions, identify gaps, and highlight aspects of SPLE testing that have not yet been fully addressed.

The analysis that we conducted based on 118 studies from 2003 to 2022 has uncovered a range of issues and considerations that researchers and practitioners can work on. It is shown that managing variability in SPL testing is vital but can complicate the testing process. Model-based methods show promise in generating test assets, but there is room for improvement in automating test case creation and ensuring comprehensive coverage. Non-functional testing aspects like performance, security, and usability require more attention and specific methodologies. Having the right tools is important, but most tool implementations are still in the proof-of-concept stage. Regression testing poses a complex challenge, and future research should concentrate on areas like regression test selection, prioritization, minimization, and architecture-based regression testing. Establishing benchmark datasets and standard evaluation criteria for SPL testing methods would simplify comparing and adopting various techniques.

Exploring test levels throughout the SPL lifecycle and illustrating the challenges of neglecting a particular test level would offer valuable insights. Additionally, studies focusing on testing SPLs need to address traceability explicitly. Maintaining traceability between test assets and development artifacts is especially difficult due to the intricate relationships between product variants and shared assets, and therefore requires effective approaches. It is also worth mentioning that, when selecting studies for final analysis, we included only empirically evaluated studies. By analyzing the evaluations conducted in the studies, we noticed that most studies were assessed by applying only one empirical method. Furthermore, most of the assessments undertaken do not refer to real-world practice. This indicates the need to evaluate SPL testing approaches not only in academia but also in industry.

Based on the findings of this SLR, further research in the SPL testing field can focus on the specific areas we identified throughout this research as potential points for future work (e.g., SPL regression testing). Furthermore, empirical assessment of existing techniques for the investigated aspects (e.g., selection of products to test or creating reusable test assets) would help both researchers and practitioners compare those techniques, especially if they are applied to real-world and large-scale scenarios. Finally, this research can be strengthened by examining studies published in the field of testing configurable systems. Such an analysis could investigate how techniques from this broader domain might be applied to SPL testing to address existing deficiencies in this area.