1 Introduction

Knowledge-based configuration is one of the most successful application areas of Artificial Intelligence (Felfernig et al. 2014; Frayman and Mittal 1987; Haag 2014; Hvam et al. 2008; Sabin and Weigel 1998; Salvador and Forza 2007; Stumptner 1997). There exist many application domains ranging from telecommunication systems (Fleischanderl et al. 1998; Stumptner et al. 1994), railway interlocking systems (Falkner and Schreiner 2014), the automotive domain (Sinz et al. 2003; Tiihonen and Anderson 2014; Walter and Küchlin 2014), software product lines (Benavides et al. 2010) to the configuration of services (Tiihonen et al. 2014a).

Configuration technologies must be able to deal with inconsistencies, which can occur in different contexts. First, a configuration knowledge base can be inconsistent, i.e., no solution can be determined. In this context, the task of knowledge engineers is to figure out which constraints are responsible for the unintended behavior of the knowledge base. Bakker et al. (1993) show how to apply model-based diagnosis (Reiter 1987) to determine minimal sets of constraints in a knowledge base that are responsible for a given inconsistency. A variant thereof is documented in Felfernig et al. (2004) where an approach to the automated debugging of knowledge bases with test cases is introduced. Test cases are interpreted as positive or negative examples that describe the intended behavior of a knowledge base. If some positive examples induce conflicts in the configuration knowledge base, some of the constraints in the knowledge base are faulty and have to be adapted or deleted. If some negative examples are accepted (i.e., not rejected) by the configuration knowledge base, further constraints have to be included in order to take these examples into account (in Felfernig et al. (2004) this issue is solved by simply including negative examples in negated form into the configuration knowledge base). A related approach in the area of software product lines is proposed in White et al. (2010). Second, customer requirements can be inconsistent with the underlying knowledge base. Felfernig et al. (2004) also show how to diagnose customer requirements that are inconsistent with a configuration knowledge base. The underlying assumption is that the configuration knowledge base itself is consistent but becomes inconsistent when combined with a set of requirements.

The configuration-related diagnosis approaches mentioned so far are based on conflict-directed hitting set determination, where conflicts have to be calculated in order to derive one or more corresponding diagnoses (Crow and Rushby 1991; Janota et al. 2014; Junker 2004; Reiter 1987; Shah 2011). These approaches often determine diagnoses in a breadth-first search manner, which allows the identification of minimal cardinality diagnoses. The major disadvantage of these approaches is the need to determine minimal conflicts, which is inefficient especially in cases where only the leading diagnoses (the most relevant ones) are sought. Furthermore, in many application domains minimal cardinality diagnoses are not necessarily the preferred ones: Felfernig et al. (2009) show how recommendation technologies (Jannach et al. 2010) can be exploited for guiding the search for preferred (minimal but not necessarily minimal cardinality) diagnoses.

Algorithms based on the idea of anytime diagnosis are useful in scenarios where diagnoses have to be provided in real-time, i.e., within given time limits. Efficient diagnosis and reconfiguration of communication networks is crucial to retain the quality of service, i.e., if some components/nodes in a network fail, corresponding substitutes and extensions have to be determined immediately (Nica et al. 2014; Stumptner and Wotawa 1999). In today’s production scenarios which are characterized by small batch sizes and high product variability, it is increasingly important to develop algorithms that support the efficient reconfiguration of schedules. Such functionalities support the paradigm of smart production, i.e., the flexible and efficient production of highly variant products. Further applications are the diagnosis and repair of robot control software (Steinbauer et al. 2005), sensor networks (Provan and Chen 1999), feature models (Janota et al. 2014; White et al. 2010), the reconfiguration of cars (Walter et al. 2013), and the reconfiguration of buildings (Friedrich et al. 2011). In the diagnosis approach presented in this paper, we assure diagnosis determination within certain time limits by systematically reducing the number of solver calls needed. This specific interpretation of anytime diagnosis requires a trade-off between diagnosis quality (evaluated, e.g., in terms of minimality) and the time needed for diagnosis determination.

Algorithmic approaches to provide efficient solutions for diagnosis problems are manifold. Some approaches focus on improvements of Reiter’s original hitting set directed acyclic graph (HSDAG) (Reiter 1987) in terms of a personalized computation of leading diagnoses (DeKleer 1990) or other extensions that make the basic approach (Reiter 1987) more efficient (Wotawa 2001). Wang et al. (2009) introduce an approach to derive binary decision diagrams (BDDs) (Andersen et al. 2010; Bryant 1992) on the basis of a pre-determined set of conflicts; diagnoses can then be determined by finding paths in the BDD that include given variable settings (e.g., requirements defined by the user). A predefined set of conflicts can also be compiled into a corresponding linear optimization problem (Fijany and Vatan 2004); diagnoses can then be determined by solving the given problem. In knowledge-based recommendation scenarios, diagnoses for user requirements can be pre-compiled in such a way that for a given set of customer requirements, the diagnosis search task can be reduced to querying a relational table (see, for example, Jannach 2006; Schubert and Felfernig 2011). All of the mentioned approaches either extend the approach of Reiter (1987) or improve efficiency by exploiting pre-generated information about conflicts or diagnoses.

An alternative to conflict-directed diagnosis (Reiter 1987) are direct diagnosis algorithms that determine minimal diagnoses without the need to predetermine minimal conflict sets (Felfernig et al. 2012; Shchekotykhin et al. 2014). The FastDiag algorithm (Felfernig et al. 2012) is a divide-and-conquer based algorithm that supports the determination of diagnoses without a preceding conflict detection. Such direct diagnosis approaches are especially useful in situations where not the complete set of diagnoses has to be determined but users are interested in the leading diagnoses, i.e., diagnoses with a high probability of being relevant for the user. Also in the context of SAT solving, algorithms have been developed that allow the efficient determination of diagnoses (also denoted as minimal correction subsets) (Bacchus et al. 2014; Gregoire et al. 2014; Marques-Silva et al. 2013; Mencia et al. 2014). Besides efficiency, the prediction quality of a diagnosis algorithm is a major issue in interactive configuration settings, i.e., those diagnoses have to be identified that are relevant for the user. A corresponding comparison of approaches to determine preferred minimal diagnoses and unsatisfied clauses with minimum total weights is provided in Walter et al. (2016). The authors point out theoretical commonalities and prove the reducibility of both concepts to each other.

In this paper we show how the FastDiag approach can be converted into an anytime diagnosis algorithm (FlexDiag) that allows tradeoffs between diagnosis quality (minimality and accuracy) and performance. We focus on reconfiguration scenarios (Friedrich et al. 2011; Nica et al. 2014; Stumptner and Wotawa 1999; Walter and Küchlin 2014), i.e., we show how FlexDiag can be applied in situations where a given configuration (solution) has to be adapted to conform to a changed set of customer requirements. Our contributions are the following. First, based on previous work on the diagnosis of inconsistent knowledge bases, we show how to solve reconfiguration tasks with direct diagnosis. Second, we make direct diagnosis anytime-aware by including a parametrization that helps to systematically limit the number of consistency checks and thus make diagnosis search more efficient. Finally, we report the results of a FlexDiag-related evaluation conducted on the basis of real-world configuration knowledge bases (feature models and configuration knowledge bases from the automotive industry) and discuss quality properties of related diagnoses not only in terms of minimality but also in terms of accuracy.

The remainder of this paper is organized as follows. In Section 2 we introduce an example configuration knowledge base from the domain of resource allocation. This knowledge base will serve as a working example throughout the paper. Thereafter (Section 3) we introduce a definition of a reconfiguration task. In Section 4 we discuss basic principles of direct diagnosis on the basis of FlexDiag and show how this algorithm can be applied in reconfiguration scenarios. In Section 5 we present the results of an analysis of algorithm performance and the quality of determined diagnoses. A simple example of the application of FlexDiag in production environments is given in Section 6. In Section 7 we discuss issues for future work. With Section 8 we conclude the paper.

2 Example configuration knowledge base

A configuration system determines configurations (solutions) on the basis of a given set of customer requirements (Hotz et al. 2014). In many cases, constraint satisfaction problem (CSP) representations are used for the definition of a configuration task. A configuration task and a corresponding configuration (solution) can be defined as follows:

Definition 1

(Configuration Task and Configuration). A configuration task can be defined as a CSP (V, D, C ∪ R) where V = {v1, v2, ..., vn} is a set of variables, \(D = \bigcup_{v_{i} \in V} \{dom(v_{i})\}\) represents domain definitions, and C = {c1, c2, ..., cm} is a set of constraints (the configuration knowledge base). Additionally, user requirements are represented by a set of constraints R = {r1, r2, ..., rk} where R and C are disjoint. A configuration (solution) for a configuration task is a complete set of assignments (constraints) S = {s1 : v1 = a1, s2 : v2 = a2, ..., sn : vn = an} where ai ∈ dom(vi) which is consistent with C ∪ R.

An example of a configuration task represented as a constraint satisfaction problem (CSP) is the following.

Example (Configuration Task)

In this resource allocation problem example, items (a barrel of fuel, a stack of paper, a pallet of fireworks, a pallet of personal computers, a pallet of computer games, a barrel of oil, a pallet of roof tiles, and a pallet of rain pipes) have to be assigned to three different containers. There are a couple of constraints (ci) to be taken into account, for example, fireworks must not be combined with fuel (c1). Furthermore, there is one requirement (r1) which indicates that the pallet of fireworks has to be assigned to container 1. On the basis of this configuration task definition, a configurator can determine a configuration (solution) S.

  • V = {fuel,paper,fireworks,pc,games,oil,roof,pipes}

  • dom(fuel) = dom(paper) = dom(fireworks) = dom(pc) = dom(games) = dom(oil) = dom(roof) = dom(pipes) = {1,2,3}

  • C = {c1 : fireworks ≠ fuel, c2 : fireworks ≠ paper, c3 : fireworks ≠ oil, c4 : pipes = roof, c5 : paper ≠ fuel}

  • R = {r1 : fireworks = 1}

  • S = {s1 : pc = 3,s2 : games = 1,s3 : paper = 2,s4 : fuel = 3, s5 : fireworks = 1,s6 : oil = 2,s7 : roof = 1,s8 : pipes = 1}
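The example can be checked mechanically. The following Python sketch is an illustration only and not part of the configurator discussed in this paper: it encodes C and R as predicates over a complete assignment and tests whether S is a configuration in the sense of Definition 1. The names VARS, DOMAIN, and is_configuration are assumptions introduced here.

```python
# Illustrative encoding of the resource allocation example (all names are assumptions).
VARS = ["fuel", "paper", "fireworks", "pc", "games", "oil", "roof", "pipes"]
DOMAIN = {1, 2, 3}  # container numbers

# Knowledge base C = {c1, ..., c5}: each constraint is a predicate on an assignment.
C = [
    lambda a: a["fireworks"] != a["fuel"],   # c1
    lambda a: a["fireworks"] != a["paper"],  # c2
    lambda a: a["fireworks"] != a["oil"],    # c3
    lambda a: a["pipes"] == a["roof"],       # c4
    lambda a: a["paper"] != a["fuel"],       # c5
]

# Requirements R = {r1}.
R = [lambda a: a["fireworks"] == 1]

# Configuration S from the example.
S = {"pc": 3, "games": 1, "paper": 2, "fuel": 3,
     "fireworks": 1, "oil": 2, "roof": 1, "pipes": 1}


def is_configuration(assignment):
    """True if the assignment is complete, within the domains, and consistent with C ∪ R."""
    complete = set(assignment) == set(VARS)
    in_domain = all(value in DOMAIN for value in assignment.values())
    return complete and in_domain and all(c(assignment) for c in C + R)


print(is_configuration(S))                 # True
print(is_configuration({**S, "fuel": 1}))  # False: fuel and fireworks share container 1 (c1)
```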

On the basis of the given definition of a configuration task, we now introduce the concept of reconfiguration (see also Friedrich et al. 2011; Nica et al. 2014; Stumptner and Wotawa 1999; Walter and Küchlin 2014).

3 Reconfiguration task

It can be the case that an existing configuration S has to be adapted due to new customer requirements. Examples thereof are changing requirements in production schedules, failing components or overloaded network infrastructures in a mobile phone network, and changes in the internal model of the environment of a robot. In the following we assume that the pallet of paper should be reassigned to container 3 and the personal computer and games pallets should be assigned to the same container. Formally, the set of new requirements is represented by \(R_{\rho} = \{r_{1}^{\prime}: pc = games, r_{2}^{\prime}: paper = 3\}\). In order to determine reconfigurations, we have to calculate a corresponding diagnosis Δ (see Definition 2).

Definition 2

(Diagnosis). A diagnosis Δ (correction subset) is a subset of S = {s1 : v1 = a1, s2 : v2 = a2, ..., sn : vn = an} such that S − Δ ∪ C ∪ Rρ is consistent. Δ is minimal if there does not exist a diagnosis Δ′ with Δ′ ⊂ Δ.

On the basis of the definition of a minimal diagnosis, we can introduce a formal definition of a reconfiguration task.

Definition 3

(Reconfiguration Task and Reconfiguration). A reconfiguration task can be defined as a CSP (V, D, C, S, Rρ) where V is a set of variables, D represents variable domain definitions, C is a set of constraints, S represents an existing configuration, and \(R_{\rho}=\{r_{1}^{\prime}, r_{2}^{\prime}, ..., r_{q}^{\prime}\}\) (Rρ consistent with C) represents a set of reconfiguration requirements. Furthermore, let Δ be a minimal diagnosis for the reconfiguration task. A reconfiguration is a variable assignment SΔ = {s1 : v1 = a1′, s2 : v2 = a2′, ..., sl : vl = al′} where si ∈ Δ, \(a_{i}^{\prime} \neq a_{i}\), and S − Δ ∪ SΔ ∪ C ∪ Rρ is consistent.

If Rρ is inconsistent with C, the new requirements have to be analyzed and changed before a corresponding reconfiguration task can be triggered (Falkner et al. 2011; Felfernig et al. 2009). An example of a reconfiguration task in the context of our configuration knowledge base is the following.

Example (Reconfiguration Task)

In the resource allocation problem, the original customer requirements R are substituted by the requirements \(R_{\rho} = \{r_{1}^{\prime}: pc = games, r_{2}^{\prime}: paper = 3\}\). The resulting reconfiguration task instance is the following.

  • V = {fuel,paper,fireworks,pc,games,oil,roof,pipes}

  • dom(fuel) = dom(paper) = dom(fireworks) = dom(pc) = dom(games) = dom(oil) = dom(roof) = dom(pipes) = {1,2,3}

  • C = {c1 : fireworks ≠ fuel, c2 : fireworks ≠ paper, c3 : fireworks ≠ oil, c4 : pipes = roof, c5 : paper ≠ fuel}

  • S = {s1 : pc = 3,s2 : games = 1,s3 : paper = 2,s4 : fuel = 3, s5 : fireworks = 1,s6 : oil = 2,s7 : roof = 1,s8 : pipes = 1}

  • \(R_{\rho} = \{r_{1}^{\prime}: pc = games, r_{2}^{\prime}: paper = 3\}\)

To solve a reconfiguration task (see Definition 3), conflict-directed diagnosis approaches (Reiter 1987) would determine a set of minimal conflicts and then determine a hitting set that resolves each of the identified conflicts. In this context, a minimal conflict set CS ⊆ S is a minimal set of variable assignments that triggers an inconsistency with C ∪ Rρ, i.e., CS ∪ C ∪ Rρ is inconsistent and there does not exist a conflict set CS′ with CS′ ⊂ CS. In our working example, the minimal conflict sets are CS1 : {s1 : pc = 3, s2 : games = 1}, CS2 : {s3 : paper = 2}, and CS3 : {s4 : fuel = 3}. The corresponding minimal diagnoses are Δ1 : {s1, s3, s4} and Δ2 : {s2, s3, s4}.
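The two minimal diagnoses of the working example can be reproduced by exhaustive search. The sketch below is a brute-force illustration of Definition 2 only; it is neither the conflict-directed procedure of Reiter (1987) nor FlexDiag, and the helper names consistent and minimal_diagnoses are assumptions introduced here. It enumerates candidate subsets Δ of S by increasing size and keeps those for which S − Δ ∪ C ∪ Rρ is consistent and no proper subset is already a diagnosis.

```python
from itertools import combinations, product

VARS = ["fuel", "paper", "fireworks", "pc", "games", "oil", "roof", "pipes"]
DOMAIN = (1, 2, 3)

C = [
    lambda a: a["fireworks"] != a["fuel"],   # c1
    lambda a: a["fireworks"] != a["paper"],  # c2
    lambda a: a["fireworks"] != a["oil"],    # c3
    lambda a: a["pipes"] == a["roof"],       # c4
    lambda a: a["paper"] != a["fuel"],       # c5
]
R_RHO = [
    lambda a: a["pc"] == a["games"],  # r1'
    lambda a: a["paper"] == 3,        # r2'
]
# Existing configuration S: assignment name -> (variable, value).
S = {"s1": ("pc", 3), "s2": ("games", 1), "s3": ("paper", 2), "s4": ("fuel", 3),
     "s5": ("fireworks", 1), "s6": ("oil", 2), "s7": ("roof", 1), "s8": ("pipes", 1)}


def consistent(kept):
    """True if C ∪ R_rho together with the assignments named in `kept` has a solution."""
    fixed = dict(S[name] for name in kept)
    free = [v for v in VARS if v not in fixed]
    for values in product(DOMAIN, repeat=len(free)):
        candidate = {**fixed, **dict(zip(free, values))}
        if all(c(candidate) for c in C + R_RHO):
            return True
    return False


def minimal_diagnoses():
    """Enumerate subsets of S by increasing size; keep the minimal diagnoses."""
    found = []
    for size in range(len(S) + 1):
        for delta in map(set, combinations(S, size)):
            if any(d < delta for d in found):
                continue  # a proper subset is already a diagnosis, so delta is not minimal
            if consistent(set(S) - delta):
                found.append(delta)
    return found


print([sorted(d) for d in minimal_diagnoses()])
# [['s1', 's3', 's4'], ['s2', 's3', 's4']]
```

The exhaustive enumeration is exponential in |S| and is only meant to make the definitions concrete for the eight assignments of the example.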

The elements in a diagnosis indicate which variable assignments have to be adapted such that a reconfiguration can be determined that takes into account the new requirements in Rρ. Consequently, a reconfiguration represents a minimal set of changes to the original configuration (S) such that the new requirements Rρ are taken into account. If we choose Δ1, a reconfiguration SΔ (reassignments for the variable assignments in Δ1) can be determined by a CSP solver call C ∪ Rρ ∪ (S − Δ1). The resulting configuration S′ can be {s1 : pc = 1, s2 : games = 1, s3 : paper = 3, s4 : fuel = 2, s5 : fireworks = 1, s6 : oil = 2, s7 : roof = 1, s8 : pipes = 1}. For a detailed discussion of conflict-based diagnosis we refer to Reiter (1987). In the following we introduce an approach to the determination of minimal reconfigurations which is based on a direct diagnosis algorithm, i.e., diagnoses are determined without the need of determining related minimal conflict sets.
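The solver call C ∪ Rρ ∪ (S − Δ1) can be imitated for the small example with a brute-force search. The following sketch is a stand-in for a real CSP solver under that assumption; the function name reconfigure is hypothetical. It fixes the assignments of S that are not in the chosen diagnosis and searches values for the remaining variables.

```python
from itertools import product

VARS = ["fuel", "paper", "fireworks", "pc", "games", "oil", "roof", "pipes"]
DOMAIN = (1, 2, 3)

C = [
    lambda a: a["fireworks"] != a["fuel"],   # c1
    lambda a: a["fireworks"] != a["paper"],  # c2
    lambda a: a["fireworks"] != a["oil"],    # c3
    lambda a: a["pipes"] == a["roof"],       # c4
    lambda a: a["paper"] != a["fuel"],       # c5
]
R_RHO = [
    lambda a: a["pc"] == a["games"],  # r1'
    lambda a: a["paper"] == 3,        # r2'
]
S = {"s1": ("pc", 3), "s2": ("games", 1), "s3": ("paper", 2), "s4": ("fuel", 3),
     "s5": ("fireworks", 1), "s6": ("oil", 2), "s7": ("roof", 1), "s8": ("pipes", 1)}


def reconfigure(delta):
    """Return one solution of C ∪ R_rho ∪ (S − delta), or None if there is none."""
    fixed = {var: val for name, (var, val) in S.items() if name not in delta}
    free = [v for v in VARS if v not in fixed]
    for values in product(DOMAIN, repeat=len(free)):
        candidate = {**fixed, **dict(zip(free, values))}
        if all(c(candidate) for c in C + R_RHO):
            return candidate
    return None


new_config = reconfigure({"s1", "s3", "s4"})  # diagnosis Δ1
print(new_config["pc"], new_config["paper"], new_config["fuel"])  # 1 3 2
```

For Δ1 the reassignment is unique in this example: pc must follow games, paper is fixed by the new requirement, and fuel can only go to container 2.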

4 Reconfiguration with FlexDiag

In the following discussions, the set AC = C ∪ Rρ ∪ S represents the union of all constraints that restrict the set of possible solutions for a given reconfiguration task. Furthermore, S represents a set of constraints that are considered as candidates for being included in a diagnosis Δ. The idea of FlexDiag (Algorithm 1) is to systematically filter out the constraints that become part of a minimal diagnosis using a divide-and-conquer based approach.

Algorithm 1 FlexDiag (figure not reproduced)

Sketch of Algorithm

In our example reconfiguration task, the original configuration is S = {s1, s2, s3, s4, s5, s6, s7, s8} and the new set of customer requirements is \(R_{\rho} = \{r_{1}^{\prime}, r_{2}^{\prime}\}\). Since S ∪ Rρ ∪ C is inconsistent, we are in need of a minimal diagnosis Δ and a reconfiguration SΔ such that S − Δ ∪ SΔ ∪ Rρ ∪ C is consistent. In the following we will show how FlexDiag (Algorithm 1) can be applied to determine such a minimal diagnosis Δ.

FlexDiag is activated under the assumption that AC = C ∪ Rρ ∪ S is inconsistent, i.e., the consistency of AC is not checked by the algorithm. If AC is inconsistent but AC − S is also inconsistent, FlexDiag will not be able to identify a diagnosis in S; therefore ∅ is returned. Otherwise, a recursive function FlexD is activated which is in charge of determining one minimal diagnosis Δ ⊆ S. In each recursive step, the constraints in S are divided into two different subsets (S1 and S2) in order to figure out whether one of these subsets already includes a diagnosis. If this is the case, the second set need not be inspected for diagnosis elements anymore. If we assume, for example, that S = {s1, s2, s3, s4, s5, s6, s7, s8} is inconsistent, we divide S into the two subsets S1 = {s1, s2, s3, s4} and S2 = {s5, s6, s7, s8}, and S1 is already consistent with C ∪ Rρ, then diagnosis elements are searched in S2 (since S1 is already consistent). The complete related walkthrough is depicted in Figs. 1 and 2.

Fig. 1 FlexDiag walkthrough: determining one minimal diagnosis with m = 1 (Δ = {s1, s3, s4})

Fig. 2 FlexDiag walkthrough: determining a minimal diagnosis with m = 2 (Δ = {s1, s2, s3, s4})
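Since the pseudocode of Algorithm 1 is not reproduced in this text, the following Python sketch is a reconstruction under stated assumptions rather than the authors' implementation: it assumes the recursion scheme of FastDiag (Felfernig et al. 2012) with the base case generalized from a single constraint to |S| ≤ m. Consistency checking is done by brute force instead of a CSP solver, AC is represented only by the names of the assignments of S it still contains (C ∪ Rρ is always part of every check), and the names flex_diag, flex_d, consistent, and STATS are introduced here for illustration.

```python
from itertools import product

VARS = ["fuel", "paper", "fireworks", "pc", "games", "oil", "roof", "pipes"]
DOMAIN = (1, 2, 3)
C = [
    lambda a: a["fireworks"] != a["fuel"],   # c1
    lambda a: a["fireworks"] != a["paper"],  # c2
    lambda a: a["fireworks"] != a["oil"],    # c3
    lambda a: a["pipes"] == a["roof"],       # c4
    lambda a: a["paper"] != a["fuel"],       # c5
]
R_RHO = [lambda a: a["pc"] == a["games"], lambda a: a["paper"] == 3]  # r1', r2'
ASSIGN = {"s1": ("pc", 3), "s2": ("games", 1), "s3": ("paper", 2), "s4": ("fuel", 3),
          "s5": ("fireworks", 1), "s6": ("oil", 2), "s7": ("roof", 1), "s8": ("pipes", 1)}
NAMES = list(ASSIGN)  # ordered s1..s8; a lower index means lower importance
STATS = {"checks": 0}  # number of consistency checks of the last flex_diag call


def consistent(kept):
    """Brute-force check: does C ∪ R_rho ∪ {assignments named in kept} have a solution?"""
    STATS["checks"] += 1
    fixed = dict(ASSIGN[name] for name in kept)
    free = [v for v in VARS if v not in fixed]
    return any(
        all(c({**fixed, **dict(zip(free, values))}) for c in C + R_RHO)
        for values in product(DOMAIN, repeat=len(free))
    )


def flex_d(d, s, ac, m):
    """Recursive step (assumed FastDiag scheme); stops splitting once |s| <= m."""
    if d and consistent(ac):
        return []            # removing d already restored consistency
    if len(s) <= m:
        return list(s)       # accept the whole block as part of the diagnosis
    k = len(s) // 2
    s1, s2 = s[:k], s[k:]
    d1 = flex_d(s1, s2, [x for x in ac if x not in s1], m)
    d2 = flex_d(d1, s1, [x for x in ac if x not in d1], m)
    return d1 + d2


def flex_diag(s, m=1):
    """Return one diagnosis (list of assignment names) for granularity m >= 1."""
    STATS["checks"] = 0
    if not s or not consistent([]):  # AC − S = C ∪ R_rho must itself be consistent
        return []
    return flex_d([], list(s), list(s), m)


print(sorted(flex_diag(NAMES, m=1)))  # ['s1', 's3', 's4']
print(sorted(flex_diag(NAMES, m=2)))  # ['s1', 's2', 's3', 's4']
```

Under these assumptions the sketch reproduces the diagnoses named in the captions of Figs. 1 and 2. On the example it performs 7 consistency checks for m = 1 and 4 for m = 2 (both counts include the initial check of AC − S), which illustrates the trade-off between minimality and the number of checks.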

FlexDiag is based on the concepts of FastDiag (Felfernig et al. 2012), i.e., it returns one diagnosis (Δ) at a time and is complete in the sense that if a diagnosis is contained in S, then the algorithm will find it. A corresponding reconfiguration can be determined by a CSP solver call C ∪ Rρ ∪ (S − Δ). The determination of multiple diagnoses at a time can be realized on the basis of the construction of an HSDAG (Reiter 1987). In FlexDiag, the parameter m is used to control diagnosis quality in terms of minimality, accuracy, and the performance of diagnostic search (see Section 5). The higher the value of m, the higher the performance of FlexDiag and the lower the degree of diagnosis quality. The inclusion of m to control quality and performance is the major difference between FlexDiag and FastDiag. If m = 1 (see Algorithm 1), the number of consistency checks needed for determining one minimal diagnosis is \(2\delta \times \log_{2}(\frac{n}{\delta}) + 2\delta\) (in the worst case) (Felfernig et al. 2012). In this context, δ represents the set size of the minimal diagnosis Δ and n represents the number of constraints in solution S.

If m > 1, the number of needed consistency checks can be systematically reduced if we accept the tradeoff of possibly losing the property of diagnosis minimality (see Definition 2). If we allow settings with m > 1, we can reduce the upper bound of the number of consistency checks to \(2\delta \times \log_{2}(\frac{2n}{\delta \times m})\) (in the worst case). These upper bounds on the number of needed consistency checks allow us to estimate the worst-case runtime of the diagnosis algorithm, which is extremely important for real-time scenarios. Consequently, if we are able to estimate the upper limit of the time needed for completing one consistency check (e.g., on the basis of simulations with an underlying constraint solver), we are also able to figure out lower bounds for m that must be chosen in order to guarantee a FlexDiag runtime within predefined time limits.

Table 1 depicts an overview of consistency checks needed depending on the setting of the parameter m and the diagnosis size δ for |S| = 16. For example, if m = 2 and the size of a minimal diagnosis is δ = 4, then the upper bound for the number of needed consistency checks is 16. If the size of δ further increases, the number of corresponding consistency checks does not increase anymore. Figures 1 and 2 depict FlexDiag search trees depending on the setting of granularity parameter m. The upper bound for the number of consistency checks helps us to determine the maximum amount of time that will be needed to determine a diagnosis on the basis of FlexDiag. For example, if the maximum time needed for one consistency check is 20ms, the maximum time needed for determining a diagnosis with m = 2 (given δ = 8) is ≈ 320 milliseconds.
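The figures quoted above follow directly from the stated bound. As a small worked check (the function names are assumptions introduced here), the sketch below evaluates 2δ × log2(2n / (δ × m)) and converts it into a time estimate. For m = 1 the expression equals 2δ × log2(n/δ) + 2δ, so one function covers both cases; it is only meaningful while δ × m does not exceed 2n.

```python
import math


def worst_case_checks(n, delta, m=1):
    """Upper bound on consistency checks: 2*delta*log2(2n / (delta*m))."""
    return 2 * delta * math.log2(2 * n / (delta * m))


def worst_case_time_ms(n, delta, m, ms_per_check):
    """Worst-case diagnosis time if one consistency check takes at most ms_per_check."""
    return worst_case_checks(n, delta, m) * ms_per_check


print(worst_case_checks(16, 4, m=2))          # 16.0
print(worst_case_checks(16, 8, m=2))          # 16.0
print(worst_case_time_ms(16, 8, m=2, ms_per_check=20))  # 320.0

# For m = 1 the bound coincides with 2*delta*log2(n/delta) + 2*delta.
print(worst_case_checks(16, 4, m=1) == 2 * 4 * math.log2(16 / 4) + 2 * 4)  # True
```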

Table 1 Worst-case estimates for the number of needed consistency checks depending on the granularity parameter m and the diagnosis size δ for |S| = 16

FlexDiag determines one diagnosis at a time, which indicates the variable assignments of the original configuration that have to be changed such that a reconfiguration that conforms to the new requirements (Rρ) is possible. The algorithm supports the determination of leading diagnoses, i.e., diagnoses that are preferred with regard to given user preferences (Felfernig et al. 2012; Walter et al. 2016). FlexDiag is based on a strict lexicographical ordering of the constraints in S: the lower the importance of a constraint si ∈ S, the lower the index of the constraint in S. For example, s1 : pc = 3 has the lowest ranking. The lower the ranking, the higher the probability that the constraint will be part of a reconfiguration SΔ. Since s1 has the lowest priority and it is part of a conflict, it is an element of the diagnosis returned by FlexDiag. For a discussion of the properties of lexicographical orderings we refer to Felfernig et al. (2012) and Junker (2004).

5 Evaluation

In this section, we present the evaluation we conducted to assess the performance of FlexDiag. We first analyze how FlexDiag performs on real-world and randomly generated models and then compare it with an evolutionary approach.

5.1 Evaluation aspects

To evaluate FlexDiag, we analyzed the two aspects of (1) algorithm performance (in terms of milliseconds needed to determine one minimal diagnosis) and (2) diagnosis quality (in terms of minimality and accuracy – see Formulae (1) and (2)). We analyzed both aspects by varying the value of parameter m. Our hypothesis in this context was that the higher the value of m, the lower the number of needed consistency checks (the higher the efficiency of diagnosis search) and the lower diagnosis quality in terms of the share of diagnosis-relevant constraints returned by FlexDiag. Diagnosis quality can, for example, be measured in terms of the degree of minimality of the constraints in a diagnosis Δ (see Formula (1)), i.e., the cardinality of Δ compared to the cardinality of Δmin. |Δmin| represents the cardinality of a minimal diagnosis identified with m = 1.

$$ minimality({\Delta})=\frac{|{\Delta}_{min}|}{|{\Delta}|} \qquad (1) $$

If m > 1, there is no guarantee that the diagnosis Δ determined for S is a superset of the diagnosis Δmin determined for S in the case m = 1. Besides minimality, we introduce accuracy as an additional quality indicator (see Formula (2)). The higher the share of elements of Δmin in Δ, the higher the corresponding accuracy (the algorithm is able to reproduce the elements of the minimal diagnosis for m = 1).

$$ accuracy({\Delta})=\frac{|{\Delta} \cap {\Delta}_{min}|}{|{\Delta}_{min}|} \qquad (2) $$
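Both quality indicators can be illustrated with the diagnoses of the working example. The sketch below (function names are assumptions introduced here) compares the diagnosis obtained with m = 2, Δ = {s1, s2, s3, s4}, with the minimal diagnosis obtained with m = 1, Δmin = {s1, s3, s4}.

```python
def minimality(delta, delta_min):
    """Formula (1): |delta_min| / |delta|."""
    return len(delta_min) / len(delta)


def accuracy(delta, delta_min):
    """Formula (2): |delta ∩ delta_min| / |delta_min|."""
    return len(delta & delta_min) / len(delta_min)


delta_min = {"s1", "s3", "s4"}     # diagnosis for m = 1
delta = {"s1", "s2", "s3", "s4"}   # diagnosis for m = 2

print(minimality(delta, delta_min))  # 0.75
print(accuracy(delta, delta_min))    # 1.0
```

In this instance the larger granularity costs minimality (one superfluous element) while all elements of the minimal diagnosis are still reproduced.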

5.2 Datasets and results

We evaluated FlexDiag with regard to both aspects (algorithm performance and diagnosis quality) by applying the algorithm to different benchmarks: first, random feature models generated with the BeTTy tool (Segura et al. 2012); second, the set of models hosted in the S.P.L.O.T repository; third, a real-world model extracted from the latest Ubuntu Linux distribution (Galindo et al. 2010); and finally, a real-world automotive dataset. The configuration models are feature models which include requirement constraints, compatibility constraints, and different types of structural constraints such as mandatory relationships and alternatives.

For all datasets we report averaged values. To this end, we first calculate the accuracy, execution time, and minimality for all executions. Then, we aggregate the data and calculate the mean of each metric.

5.2.1 Experimental platform

The experiments were conducted using a version of FlexDiag implemented in Java and integrated in the FaMa Tool Suite (Benavides et al. 2013). All models were translated to a Constraint Satisfaction Problem (CSP), and the Choco library was used for consistency checking. Our FlexDiag implementation ran on a grid of four-CPU Dell Blades with Intel Xeon X5560 CPUs running at 2.8GHz, with 8 threads per CPU, and CentOS v6. The total RAM was 8GB. To parallelize the executions we used GNU Parallel (Tange 2011).

5.2.2 Random models

The first dataset used to evaluate FlexDiag was randomly generated. We used BeTTy (Segura et al. 2012) to generate a dataset that ranges from 50 to 2000 features and 10% to 30% of cross-tree constraints. The generation approach is based on Thum et al. (2009) that imitates realistic topologies.

For each model combining a given number of features and a percentage of cross-tree constraints, we randomly generated reconfiguration requirements of different sizes, involving 10%, 30%, 50% and 100% of the features of the model. Then we randomly reordered each of the reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FlexDiag on each combination of parameters three times to obtain average execution times and to reduce the influence of third-party threads.
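The resulting experiment grid can be sketched as follows. This is a hypothetical illustration of the procedure described above, not the scripts used for the evaluation: it assumes that a requirement set of a given size is obtained by sampling that share of the model's features, and the function name experiment_runs and its parameters are introduced here.

```python
import random


def experiment_runs(features, fractions=(0.1, 0.3, 0.5, 1.0),
                    reorderings=10, repetitions=3, seed=0):
    """Yield (fraction, ordering, repetition) for every planned FlexDiag execution."""
    rng = random.Random(seed)
    for fraction in fractions:
        size = max(1, round(fraction * len(features)))
        requirement = rng.sample(features, size)  # assumed: sample a share of the features
        for _ in range(reorderings):
            ordering = requirement[:]
            rng.shuffle(ordering)                 # random reordering against ordering bias
            for repetition in range(repetitions):
                yield fraction, tuple(ordering), repetition


runs = list(experiment_runs([f"f{i}" for i in range(50)]))
print(len(runs))  # 120 executions per model and per value of m
```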

In the following, we present the results showing a comparison between the different values of m and how the values evolved depending on the size of the models. Note that to generate the plots we aggregated the data and therefore the values shown are averaged results.

Figure 3 shows how the diagnosis performance can be increased depending on the setting of the m parameter. Also we observe how the minimality deteriorates when increasing m.

Fig. 3 Random evolution based on features, time and m

Table 2 shows the averaged data we obtained. It is worth mentioning that minimality decreases when m increases and that accuracy still provides acceptable results with m = 10. Also, the execution time (reported in milliseconds) is less than five minutes in the worst case.

Table 2 Random evaluation depending on m value and model size

As we can observe in Table 2 and Fig. 3, while the execution time decreases when incrementing m, quality deteriorates. However, minimality is clearly affected while accuracy shows only minor variations. Also, we observe that the time improvement depends on m and the number of features. For example, if we compare the time between m = 1 and m = 10, we observe a runtime improvement of 2.26× for 50 features, 5.21× for 500 features, 9.14× for 1000 features, and 16.5× for 2000 features.

5.2.3 SPLOT repository models

We extracted a total of 387 models from the SPLOT repository. For each model, we randomly generated reconfiguration requirements of different sizes, involving 10%, 30%, 50% and 100% of the features of the model. Then we randomly reordered each of the reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FlexDiag on each combination of parameters three times to obtain average execution times and to reduce the influence of third-party threads.

Table 3 shows the data of those models categorized as realistic in the repository. We again see that FlexDiag scales with no problem offering a good trade-off between accuracy and minimality while keeping the average runtime (in milliseconds) below a second.

Table 3 FlexDiag results for realistic SPLOT models

As we can observe in Table 3, while the execution time decreases when incrementing m, quality deteriorates again. However, minimality is clearly affected while accuracy shows only minor variations, although we can observe some special cases (“REAL-FM-5” with m = 2) where it deteriorates a bit more.

5.2.4 Ubuntu-based model

In order to test FlexDiag with large-scale real models, we encoded the variability existing in the Debian packaging system for the Ubuntu distribution and generated a set of configurations representing Ubuntu user installations with wrong package selections. Concretely, we modelled the Ubuntu Xenial distribution containing 58,107 packages and 52,721 constraints. This model was extracted using the mapping presented in Galindo et al. (2010) and Galindo et al. (2011). We executed FlexDiag with different m values. We randomly generated reconfiguration requirements of different sizes, involving 10%, 30%, 50% and 100% of the features of the model. Then we randomly reordered each of the reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FlexDiag on each combination of parameters three times to obtain average execution times and to reduce the influence of third-party threads.

Table 4 shows that FlexDiag is able to provide good accuracy even with m set to 10. It also shows, as expected, the negative impact of m on minimality. We observe that the execution time with m = 1 was 3.7 hours while with m = 10 it was 2.5 hours. This represents a runtime improvement of 1.47×.

Table 4 Results obtained after executing FlexDiag with the Ubuntu Xenial variability model

5.2.5 Automotive models

The benchmark used in this experiment includes three automotive configuration models from a German car manufacturer. For each model, we randomly generated reconfiguration requirements of different sizes, involving 10%, 30%, 50% and 100% of the features of the model. Then we randomly reordered each of the reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FlexDiag on each combination of parameters three times to obtain average execution times and to reduce the influence of third-party threads.

Table 5 shows that FlexDiag is able to provide good accuracy even with m set to 10. Again, it shows, as expected, the negative impact of m on minimality. Also, the execution time for the model with id = 1 and m = 1 was 6.33 hours while with m = 10 it was 7.9 minutes. This represents an improvement of 26.9×.

Table 5 FlexDiag evaluated with benchmarks from the automotive industry (calculation of the first diagnosis: ids 1–3 represent different type series of a German premium car manufacturer)

5.3 Comparing FlexDiag with evolutionary algorithms

In this research we do not compare FlexDiag with more traditional diagnosis approaches; for related evaluations we refer the reader to Felfernig et al. (2012), where detailed analyses can be found. These analyses clearly indicate that direct diagnosis approaches outperform standard diagnosis approaches based on the resolution of minimal conflicts (Reiter 1987) (if the search goal is to identify not all minimal but the so-called leading diagnoses which should, for example, be shown to users in interactive settings).

However, in this section we compare FlexDiag with an evolutionary algorithm inspired by Ćendić-Lazović (2014) and Li and Yunfei (2002). The evolutionary algorithm was built with the Jenetics framework for Java (Footnote 6), leaving all parameters at their defaults and fixing the process to 500 generations. To have a fair comparison, we evaluate the performance of FlexDiag with m set to 1. Note that with higher values of m, FlexDiag would perform even better in terms of runtime.

The first observation is that the evolutionary approach was not capable of dealing with very large realistic models (Ubuntu, automotive) within a time-out of 24 hours. Therefore, we report this comparison for randomly generated models only.

Figure 4 shows that FlexDiag usually requires more time than the evolutionary algorithm for models with fewer than 500 features. Beyond that point, FlexDiag pays off and scales much better. It is also worth mentioning that the evolutionary algorithm was not capable of obtaining a complete and minimal explanation: within 500 generations, it returned only partial diagnoses.

Fig. 4 Comparison between FlexDiag and the evolutionary approach regarding time

Table 6 shows that FlexDiag returns minimal diagnoses, whereas the evolutionary approach was not capable of detecting minimal diagnoses. FlexDiag also achieved a much better accuracy.

Table 6 Comparison results of the evolutionary approach and FlexDiag

5.4 Threats to validity

Even though the experiments presented in this paper provide evidence that the solution proposed is valid, there are some assumptions that we made that may affect their validity. In this section, we discuss the different threats to validity that affect the evaluation.

External validity

The inputs used for the experiments presented in this paper were either realistic or designed to mimic realistic feature models. The Debian feature model and the automotive models are realistic since numerous experts were involved in their design. However, since they were developed using a manual design process, they may contain errors and may not encode all configurations. Also, the random feature models may not accurately reflect the structure of real feature models used in industry. The major threats to the external validity are:

  • Population validity: The real feature models that we used may not represent all valid configurations in their domains due to their manual construction. Also, random models might not have the same structure as real models (e.g., mathematical operators used in the complex constraints). To reduce these threats to validity, we generated the models using previously published techniques (Thum et al. 2009) and existing implementations of these techniques in Betty (Segura et al. 2012).

  • Ecological validity: While external validity, in general, is focused on the generalization of the results to other contexts (e.g., using other models), ecological validity is focused on possible errors in the experiment materials and tools used. To prevent ecological validity threats, such as third-party threads running in the virtual machines and impacting performance, the FlexDiag analyses were executed three times and then averaged.

Internal validity

The CPU resources required to analyse a feature model depend on the number of features and percentage of cross-tree constraints. However, there may be other variables that affect performance, such as the nature of the constraints used. To minimize these other possible effects, we introduced a variety of models to ensure that we covered a large part of the constraint space.

5.5 Final remarks

We observed that FlexDiag scales up with random and real-world feature models. We also observed that, generally, diagnosis quality in terms of minimality and accuracy deteriorates with an increasing value of parameter m.

Minimality and accuracy depend on the configuration domain and are not necessarily monotonic in m. For example, since a diagnosis determined by FlexDiag is not necessarily a superset of a diagnosis determined with m = 1, it can be the case that the minimality of a diagnosis determined with m > 1 is greater than 1 (if FlexDiag determines a diagnosis with lower cardinality than the minimal diagnosis determined with m = 1). For simplicity, let us assume that AC = S = {c1,c2,c3,c4,c5,c6,c7,c8} and that the following conflict sets CSi exist between the constraints ci ∈ S: CS1 : {c1,c3}, CS2 : {c2,c3}, and CS3 : {c4,c6}. Given m = 1, FlexDiag would determine the diagnosis {c1,c2,c4}, whereas in the case of m = 2, {c3,c4} is returned by the algorithm.
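The example above can be replayed with a small sketch. It assumes a FastDiag-style divide-and-conquer recursion that stops splitting once a candidate set has at most m elements. The function names, the consistency check via explicit conflict sets, and the order of the two recursive calls are assumptions of this sketch (the call order was chosen because it reproduces the two diagnoses stated above). The minimality ratio at the end assumes that minimality compares the cardinality of the m = 1 diagnosis with that of the diagnosis under evaluation.

```python
def flexdiag(S, AC, consistent, m=1):
    """Return a diagnosis (list of constraints from S) for the constraint list AC."""
    rest = [c for c in AC if c not in S]
    if not S or not consistent(rest):
        return []
    return _flexd([], S, AC, consistent, m)


def _flexd(D, S, AC, consistent, m):
    # If removing D already restored consistency, nothing in S is needed.
    if D and consistent(AC):
        return []
    # Stop splitting once the candidate set has at most m elements.
    if len(S) <= m:
        return list(S)
    k = len(S) // 2
    S1, S2 = S[:k], S[k:]
    d1 = _flexd(S1, S2, [c for c in AC if c not in S1], consistent, m)
    d2 = _flexd(d1, S1, [c for c in AC if c not in d1], consistent, m)
    return d1 + d2


# Conflict sets CS1..CS3 from the example; a constraint set is consistent
# if it does not fully contain any conflict set.
CONFLICTS = [{"c1", "c3"}, {"c2", "c3"}, {"c4", "c6"}]


def consistent(constraints):
    present = set(constraints)
    return not any(cs <= present for cs in CONFLICTS)


S = [f"c{i}" for i in range(1, 9)]
diag_m1 = set(flexdiag(S, S, consistent, m=1))
diag_m2 = set(flexdiag(S, S, consistent, m=2))
minimality_m2 = len(diag_m1) / len(diag_m2)

print(sorted(diag_m1))   # ['c1', 'c2', 'c4']
print(sorted(diag_m2))   # ['c3', 'c4']
print(minimality_m2)     # 1.5
```

Both results resolve all three conflict sets, but the m = 2 diagnosis has lower cardinality, which is why the ratio exceeds 1 in this case.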

6 Another example: reconfiguration in production

The following simplified reconfiguration task is related to scheduling in production where it is often the case that, for example, schedules and the corresponding production equipment have to be reconfigured. In this example setting, we do not take into account configurable production equipment (configurable machines) and limit the reconfiguration to the assignment of orders to corresponding machines. The assignment of an order oi to a certain machine mj is represented by the corresponding variable oimj. The domain of each such variable represents the different possible slots in which an order can be processed, for example, o1m1 = 1 denotes the fact that the processing of order o1 on machine m1 is performed during and finished after time slot 1.

Further constraints restrict the way in which orders are allowed to be assigned to machines, for example, o1m1 < o1m2 denotes the fact that order o1 must be completed on machine m1 before further processing is started on machine m2. Furthermore, no two orders may be assigned to the same machine during the same time slot, for example, o1m1 ≠ o2m1 denotes the fact that orders o1 and o2 must not be processed on the same machine in the same time slot (slots 1..3). Finally, the definition of our reconfiguration task is completed with an already determined schedule S and a corresponding reconfiguration request represented by the reconfiguration requirement Rρ = {r1′ : o3m3 < 5}, i.e., order o3 should be completed within less than 5 time units.

  • V = {o1m1,o1m2,o1m3,o2m1,o2m2,o2m3,o3m1,o3m2,o3m3}

  • dom(o1m1) = dom(o2m1) = dom(o3m1) = {1,2,3}. dom(o1m2) = dom(o2m2) = dom(o3m2) = {2,3,4}. dom(o1m3) = dom(o2m3) = dom(o3m3) = {3,4,5}.

  • C = {c1 : o1m1 < o1m2, c2 : o1m2 < o1m3, c3 : o2m1 < o2m2, c4 : o2m2 < o2m3, c5 : o3m1 < o3m2, c6 : o3m2 < o3m3, c7 : o1m1 ≠ o2m1, c8 : o1m1 ≠ o3m1, c9 : o2m1 ≠ o3m1, c10 : o1m2 ≠ o2m2, c11 : o1m2 ≠ o3m2, c12 : o2m2 ≠ o3m2, c13 : o1m3 ≠ o2m3, c14 : o1m3 ≠ o3m3, c15 : o2m3 ≠ o3m3}

  • S = {s1 : o1m1 = 1,s2 : o1m2 = 2,s3 : o1m3 = 3, s4 : o2m1 = 2,s5 : o2m2 = 3,s6 : o2m3 = 4, s7 : o3m1 = 3,s8 : o3m2 = 4,s9 : o3m3 = 5}

  • Rρ = {r1′ : o3m3 < 5}

This reconfiguration task can be solved using FlexDiag. If we keep the ordering of the constraints as defined in S, FlexDiag (with m = 1) returns the diagnosis Δ : {s1,s2,s3,s7,s8,s9}, which can be used to determine the new solution S′ = {s1 : o1m1 = 3, s2 : o1m2 = 4, s3 : o1m3 = 5, s4 : o2m1 = 2, s5 : o2m2 = 3, s6 : o2m3 = 4, s7 : o3m1 = 1, s8 : o3m2 = 2, s9 : o3m3 = 3} (see Table 7). If we change the parametrization to m = 2, FlexDiag returns the same diagnosis but in approximately half of the time (over 10 iterations, 16 milliseconds were needed on average for m = 2 whereas 31 milliseconds were needed for m = 1). This is consistent with the estimates in Table 1.
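The rescheduling example can be checked mechanically. The sketch below does not run FlexDiag. It only encodes the variables, domains and constraints listed above and verifies that the original schedule satisfies C but violates the reconfiguration requirement, that the new schedule satisfies both, and that the assignments that differ are exactly those in Δ. All identifiers (`var`, `satisfies`, `S_new`, and so on) are illustrative names introduced here.

```python
from itertools import combinations

ORDERS = (1, 2, 3)
MACHINES = (1, 2, 3)


def var(o, m):
    return f"o{o}m{m}"


# dom(o?m1) = {1,2,3}, dom(o?m2) = {2,3,4}, dom(o?m3) = {3,4,5}
DOMAINS = {var(o, m): set(range(m, m + 3)) for o in ORDERS for m in MACHINES}


def satisfies(a, with_request):
    in_domain = all(a[v] in DOMAINS[v] for v in DOMAINS)
    # c1..c6: each order passes through machines 1, 2, 3 in sequence.
    precedence = all(a[var(o, m)] < a[var(o, m + 1)]
                     for o in ORDERS for m in (1, 2))
    # c7..c15: no two orders share a machine in the same time slot.
    exclusive = all(a[var(o1, m)] != a[var(o2, m)]
                    for m in MACHINES
                    for o1, o2 in combinations(ORDERS, 2))
    # Reconfiguration requirement: o3m3 < 5.
    request = a["o3m3"] < 5
    return in_domain and precedence and exclusive and (request or not with_request)


S = {"o1m1": 1, "o1m2": 2, "o1m3": 3,
     "o2m1": 2, "o2m2": 3, "o2m3": 4,
     "o3m1": 3, "o3m2": 4, "o3m3": 5}
S_new = {"o1m1": 3, "o1m2": 4, "o1m3": 5,
         "o2m1": 2, "o2m2": 3, "o2m3": 4,
         "o3m1": 1, "o3m2": 2, "o3m3": 3}

# s1..s9 label the assignments in the order in which S lists them.
labels = {v: f"s{i}" for i, v in enumerate(S, start=1)}
changed = {labels[v] for v in S if S[v] != S_new[v]}

print(satisfies(S, with_request=False))     # True
print(satisfies(S, with_request=True))      # False
print(satisfies(S_new, with_request=True))  # True
print(sorted(changed))  # ['s1', 's2', 's3', 's7', 's8', 's9']
```

The unchanged assignments s4, s5 and s6 are those of order o2, which keeps its slots while orders o1 and o3 swap theirs.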

Table 7 Reconfiguration determined for the rescheduling task: S represents the original configuration and S′ represents a configuration resulting from a reconfiguration task

Possible ordering criteria for constraints in such rescheduling scenarios can be, for example, customer value (changes related to orders of important customers should occur with a significantly lower probability) and the importance of individual orders. If some orders in a schedule should not be changed, this can be achieved by simply defining such requests as requirements (Rρ), i.e., change requests as well as stability requests can be included as constraints ri′ in Rρ.

7 Future work

In our work, we focused on the evaluation of reconfiguration scenarios where the knowledge base itself is assumed to be consistent. In future work, we will extend the FlexDiag algorithm to make it applicable in scenarios where knowledge bases are tested (Felfernig et al. 2004). An example issue is to take into account situations where unintended configurations are accepted by the knowledge base. In this context, we will extend the work of Felfernig et al. (2004) by not only taking into account negative test cases but also automatically generating relevant test cases, for example, on the basis of mutation testing approaches. We plan to extend our empirical evaluation to further industrial configuration knowledge bases. Furthermore, we want to analyze in which way we are able to further improve the output quality (e.g., in terms of minimality and accuracy) of FlexDiag, for example, by applying different constraint orderings depending on the observed interaction patterns (of users) and probability estimates for diagnosis membership derived thereof. The better potentially relevant constraints are predicted, the better the diagnosis quality in terms of the mentioned metrics of minimality and accuracy. Note that, for example, counting the number of elements already identified as partial diagnosis elements in FlexDiag does not help to keep diagnosis determination within certain time limits. However, this mechanism could be used when determining more than one diagnosis to include diagnosis size as a relevance criterion. Also, in this context we will analyze further alternatives to evaluate the quality of diagnoses which go beyond the metrics used in this article.

8 Conclusions

Efficient reconfiguration functionalities are needed in various scenarios such as the reconfiguration of production schedules, the reconfiguration of the settings in mobile phone networks, and the reconfiguration of robot context information. We analyzed the FlexDiag algorithm with regard to its potential for improving on existing direct diagnosis algorithms. When using FlexDiag, there is a clear trade-off between the performance of diagnosis calculation and diagnosis quality (measured, for example, in terms of minimality and accuracy).