Anytime Diagnosis for Reconfiguration

Many domains require scalable algorithms that determine diagnoses efficiently and often within predefined time limits. Anytime diagnosis is able to determine solutions under such constraints and is thus especially useful in real-time scenarios such as production scheduling, robot control, and communication network management, where diagnosis and corresponding reconfiguration capabilities play a major role. Anytime diagnosis in many cases entails a trade-off between diagnosis quality and the efficiency of diagnostic reasoning. In this paper we introduce and analyze FlexDiag, an anytime direct diagnosis approach. We evaluate the algorithm with regard to performance and diagnosis quality using a configuration benchmark from the domain of feature models and an industrial configuration knowledge base from the automotive domain. Results show that FlexDiag significantly increases the performance of direct diagnosis search, with corresponding quality trade-offs in terms of minimality and accuracy.

Bakker et al. [3] show how to apply model-based diagnosis [36] to determine minimal sets of constraints in a knowledge base that are responsible for a given inconsistency. A variant thereof is documented in Felfernig et al. [12], where an approach to the automated debugging of knowledge bases with test cases is introduced. Test cases are interpreted as positive or negative examples that describe the intended behavior of a knowledge base. If some positive examples induce conflicts in the configuration knowledge base, some of the constraints in the knowledge base are faulty and have to be adapted or deleted. If some negative examples are accepted (i.e., not rejected) by the configuration knowledge base, further constraints have to be included in order to take these examples into account (in Felfernig et al. [12] this issue is solved by simply including negative examples in negated form in the configuration knowledge base). A related approach in the area of software product lines is proposed in [56]. A second source of inconsistency is that customer requirements can be inconsistent with the underlying knowledge base. Felfernig et al. [12] also show how to diagnose customer requirements that are inconsistent with a configuration knowledge base. The underlying assumption is that the configuration knowledge base itself is consistent but becomes inconsistent when combined with a set of requirements.
The configuration-related diagnosis approaches mentioned so far are based on conflict-directed hitting set determination, where conflicts have to be calculated in order to derive one or more corresponding diagnoses [8,27,28,36,41]. These approaches often determine diagnoses in a breadth-first search manner, which allows the identification of minimal cardinality diagnoses. The major disadvantage of applying these approaches is the need to determine minimal conflicts, which is inefficient especially in cases where only the leading diagnoses (the most relevant ones) are sought. Furthermore, in many application domains it is not necessarily the case that minimal cardinality diagnoses are the preferred ones; Felfernig et al. [14] show how recommendation technologies [26] can be exploited for guiding the search for preferred (minimal but not necessarily minimal cardinality) diagnoses.
Algorithms based on the idea of anytime diagnosis are useful in scenarios where diagnoses have to be provided in real-time, i.e., within given time limits. Efficient diagnosis and reconfiguration of communication networks is crucial to retain the quality of service, i.e., if some components/nodes in a network fail, corresponding substitutes and extensions have to be determined immediately [34,47]. In today's production scenarios, which are characterized by small batch sizes and high product variability, it is increasingly important to develop algorithms that support the efficient reconfiguration of schedules. Such functionalities support the paradigm of smart production, i.e., the flexible and efficient production of highly variant products. Further applications are the diagnosis and repair of robot control software [44], sensor networks [35], feature models [27,56], the reconfiguration of cars [54], and the reconfiguration of buildings [19]. In the diagnosis approach presented in this paper, we ensure diagnosis determination within certain time limits by systematically reducing the number of solver calls needed. This specific interpretation of anytime diagnosis requires a trade-off between diagnosis quality (evaluated, e.g., in terms of minimality) and the time needed for diagnosis determination.
Algorithmic approaches to provide efficient solutions for diagnosis problems are manifold. Some approaches focus on improvements of Reiter's original hitting set directed acyclic graph (HSDAG) [36] in terms of a personalized computation of leading diagnoses [9] or other extensions that make the basic approach [36] more efficient [57]. Wang et al. [55] introduce an approach to derive binary decision diagrams (BDDs) [1,6] on the basis of a pre-determined set of conflicts; diagnoses can then be determined by finding paths in the BDD that include given variable settings (e.g., requirements defined by the user). A predefined set of conflicts can also be compiled into a corresponding linear optimization problem [16]; diagnoses can then be determined by solving the given problem. In knowledge-based recommendation scenarios, diagnoses for user requirements can be pre-compiled in such a way that for a given set of customer requirements, the diagnosis search task can be reduced to querying a relational table (see, for example, [25,39]). All of the mentioned approaches either extend the approach of Reiter [36] or improve efficiency by exploiting pre-generated information about conflicts or diagnoses.
An alternative to conflict-directed diagnosis [36] are direct diagnosis algorithms, which determine minimal diagnoses without the need to predetermine minimal conflict sets [15,42]. The FASTDIAG algorithm [15] is a divide-and-conquer based algorithm that supports the determination of diagnoses without a preceding conflict detection. Such direct diagnosis approaches are especially useful in situations where not the complete set of diagnoses has to be determined but users are interested in the leading diagnoses, i.e., diagnoses with a high probability of being relevant for the user. In the context of SAT solving as well, algorithms have been developed that allow the determination of diagnoses (also denoted as minimal correction subsets) in an efficient fashion [2,22,32,33]. Besides efficiency, the prediction quality of a diagnosis algorithm is a major issue in interactive configuration settings, i.e., those diagnoses have to be identified that are relevant for the user. A corresponding comparison of approaches to determine preferred minimal diagnoses and unsatisfied clauses with minimum total weights is provided in [52]. The authors point out theoretical commonalities and prove the reducibility of both concepts to each other.
In this paper we show how the FASTDIAG approach can be converted into an anytime diagnosis algorithm (FLEXDIAG) that allows trade-offs between diagnosis quality (minimality and accuracy) and performance. We focus on reconfiguration scenarios [19,34,47,53], i.e., we show how FLEXDIAG can be applied in situations where a given configuration (solution) has to be adapted to conform to a changed set of customer requirements. Our contributions in this paper are the following. First, based on previous work on the diagnosis of inconsistent knowledge bases, we show how to solve reconfiguration tasks with direct diagnosis. Second, we make direct diagnosis anytime-aware by including a parametrization that helps to systematically limit the number of consistency checks and thus makes diagnosis search more efficient. Finally, we report the results of a FLEXDIAG-related evaluation conducted on the basis of real-world configuration knowledge bases (feature models and configuration knowledge bases from the automotive industry) and discuss quality properties of the related diagnoses not only in terms of minimality but also in terms of accuracy.
The remainder of this paper is organized as follows. In Section 2 we introduce an example configuration knowledge base from the domain of resource allocation. This knowledge base will serve as a working example throughout the paper. Thereafter (Section 3) we introduce a definition of a reconfiguration task. In Section 4 we discuss basic principles of direct diagnosis on the basis of FLEXDIAG and show how this algorithm can be applied in reconfiguration scenarios. In Section 5 we present the results of an analysis of algorithm performance and the quality of determined diagnoses. A simple example of the application of FLEXDIAG in production environments is given in Section 6. In Section 7 we discuss issues for future work. With Section 8 we conclude the paper.

Example Configuration Knowledge Base
A configuration system determines configurations (solutions) on the basis of a given set of customer requirements [24]. In many cases, constraint satisfaction problem (CSP) representations are used for the definition of a configuration task. A configuration task and a corresponding configuration (solution) can be defined as follows: Definition 1 (Configuration Task and Configuration). A configuration task can be defined as a CSP (V, D, C) where V = {v_1, v_2, ..., v_n} is a set of variables, D = {dom(v_1), dom(v_2), ..., dom(v_n)} represents domain definitions, and C = {c_1, c_2, ..., c_m} is a set of constraints (the configuration knowledge base). Additionally, user requirements are represented by a set of constraints R = {r_1, r_2, ..., r_k} where R and C are disjoint. A configuration (solution) for a configuration task is a complete set of assignments (constraints) S = {s_1: v_1 = a_1, s_2: v_2 = a_2, ..., s_n: v_n = a_n} where a_i ∈ dom(v_i) which is consistent with C ∪ R.
An example of a configuration task represented as a constraint satisfaction problem (CSP) is the following. Example (Configuration Task). In this resource allocation problem example, items (a barrel of fuel, a stack of paper, a pallet of fireworks, a pallet of personal computers, a pallet of computer games, a barrel of oil, a pallet of roof tiles, and a pallet of rain pipes) have to be assigned to three different containers. There are a couple of constraints (c_i) to be taken into account, for example, fireworks must not be combined with fuel (c_1). Furthermore, there is one requirement (r_1) which indicates that the pallet of fireworks has to be assigned to container 1. On the basis of this configuration task definition, a configurator can determine a configuration (solution) S. On the basis of the given definition of a configuration task, we now introduce the concept of reconfiguration (see also [19,34,47,53]).
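To make Definition 1 concrete, the container example can be encoded as a small CSP and solved by brute-force enumeration. The sketch below (plain Python, no solver library) includes only constraint c_1 and requirement r_1; the remaining constraints c_i of the knowledge base are omitted for brevity.

```python
from itertools import product

# Items to be assigned to one of three containers (the variable domains).
items = ["fuel", "paper", "fireworks", "pc", "games", "oil", "roof", "pipes"]
containers = [1, 2, 3]

# c1: fireworks must not be combined with fuel (same container).
C = [lambda a: a["fireworks"] != a["fuel"]]
# r1: the pallet of fireworks has to be assigned to container 1.
R = [lambda a: a["fireworks"] == 1]

def first_solution():
    """Enumerate complete assignments; return the first one consistent with C ∪ R."""
    for vals in product(containers, repeat=len(items)):
        a = dict(zip(items, vals))
        if all(c(a) for c in C + R):
            return a
    return None

S = first_solution()
```

A real configurator would use a constraint solver instead of enumeration; the point here is only the shape of the task: variables, domains, C, R, and a complete consistent assignment S.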

Reconfiguration Task
It can be the case that an existing configuration S has to be adapted due to new customer requirements. Examples thereof are changing requirements in production schedules, failing components or overloaded network infrastructures in a mobile phone network, and changes in the internal model of the environment of a robot. In the following we assume that the stack of paper should be reassigned to container 3 and the personal computer and games pallets should be assigned to the same container. Formally, the set of new requirements is represented by R_ρ = {r_1: pc = games, r_2: paper = 3}. In order to determine reconfigurations, we have to calculate a corresponding diagnosis ∆ (see Definition 2).
Definition 2 (Diagnosis). A diagnosis for a reconfiguration task is a set of constraints ∆ ⊆ S such that C ∪ R_ρ ∪ (S − ∆) is consistent; a diagnosis ∆ is minimal if there does not exist a diagnosis ∆' with ∆' ⊂ ∆.
On the basis of the definition of a minimal diagnosis, we can introduce a formal definition of a reconfiguration task. Definition 3 (Reconfiguration Task and Reconfiguration). A reconfiguration task can be defined as a CSP (V, D, C, S, R_ρ) where V is a set of variables, D represents variable domain definitions, C is a set of constraints, S represents an existing configuration, and R_ρ = {r_1, r_2, ..., r_q} (R_ρ consistent with C) represents a set of reconfiguration requirements. Furthermore, let ∆ be a minimal diagnosis for the reconfiguration task. A reconfiguration is a set of variable assignments S_∆ for the variables contained in ∆ such that C ∪ R_ρ ∪ (S − ∆) ∪ S_∆ is consistent. If R_ρ is inconsistent with C, the new requirements have to be analyzed and changed before a corresponding reconfiguration task can be triggered [10,14]. An example of a reconfiguration task in the context of our configuration knowledge base is the following.
Example (Reconfiguration Task). In the resource allocation problem, the original customer requirements R are substituted by the requirements R ρ = {r 1 : pc = games, r 2 : paper = 3}. The resulting reconfiguration task instance is the following.
To solve a reconfiguration task (see Definition 3), conflict-directed diagnosis approaches [36] would determine a set of minimal conflicts and then determine a hitting set that resolves each of the identified conflicts. In this context, a minimal conflict set CS ⊆ S is a minimal set of variable assignments that triggers an inconsistency with C ∪ R_ρ, i.e., CS ∪ C ∪ R_ρ is inconsistent and there does not exist a conflict set CS' with CS' ⊂ CS. In our working example, corresponding minimal conflict sets CS_i and diagnoses ∆_i can be determined. The elements in a diagnosis indicate which variable assignments have to be adapted such that a reconfiguration can be determined that takes into account the new requirements in R_ρ. Consequently, a reconfiguration represents a minimal set of changes to the original configuration (S) such that the new requirements R_ρ are taken into account. If we choose ∆_1, a reconfiguration S_∆ (reassignments for the variable assignments in ∆_1) can be determined by a CSP solver call with C ∪ R_ρ ∪ (S − ∆_1). The resulting configuration can be {s_1: pc = 1, s_2: games = 1, s_3: paper = 3, s_4: fuel = 2, s_5: fireworks = 1, s_6: oil = 2, s_7: roof = 1, s_8: pipes = 1}. For a detailed discussion of conflict-based diagnosis we refer to Reiter [36]. In the following we introduce an approach to the determination of minimal reconfigurations which is based on a direct diagnosis algorithm, i.e., diagnoses are determined without the need to determine related minimal conflict sets.
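For illustration, minimal conflict sets and the diagnoses derived from them as hitting sets can be computed by brute force on a tiny instance. The assignments below are a simplified fragment chosen for illustration (only the requirements r_1: pc = games and r_2: paper = 3 are checked; the background constraints C are assumed away):

```python
from itertools import combinations

# Hypothetical old configuration S (assignments s1..s3).
S = {"s1": ("pc", 3), "s2": ("games", 2), "s3": ("paper", 1)}

def consistent(ids):
    """Check a subset of S (given by its ids) against R_rho."""
    a = dict(S[i] for i in ids)
    if "pc" in a and "games" in a and a["pc"] != a["games"]:
        return False                      # violates r1: pc = games
    if "paper" in a and a["paper"] != 3:
        return False                      # violates r2: paper = 3
    return True

def minimal_conflicts():
    """Subset-minimal inconsistent subsets of S."""
    found = []
    for k in range(1, len(S) + 1):
        for ids in combinations(sorted(S), k):
            if not consistent(ids) and not any(set(f) < set(ids) for f in found):
                found.append(ids)
    return found

def minimal_diagnoses(conflicts):
    """Subset-minimal hitting sets over the conflict sets."""
    diags = []
    for k in range(1, len(S) + 1):
        for ids in combinations(sorted(S), k):
            if all(set(ids) & set(c) for c in conflicts) and \
               not any(set(d) <= set(ids) for d in diags):
                diags.append(ids)
    return diags

conflicts = minimal_conflicts()           # [("s3",), ("s1", "s2")]
diagnoses = minimal_diagnoses(conflicts)  # [("s1", "s3"), ("s2", "s3")]
```

Each diagnosis intersects every minimal conflict, which is exactly the hitting set property the conflict-directed approaches exploit; direct diagnosis, introduced next, avoids materializing the conflicts at all.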

Reconfiguration with FLEXDIAG
In the following discussions, the set AC = C ∪ R ρ ∪ S represents the union of all constraints that restrict the set of possible solutions for a given reconfiguration task. Furthermore, S represents a set of constraints that are considered as candidates for being included in a diagnosis ∆. The idea of FLEXDIAG (Algorithm 1) is to systematically filter out the constraints that become part of a minimal diagnosis using a divide-and-conquer based approach.
Sketch of Algorithm. In our example reconfiguration task, the original configuration is S = {s_1, s_2, ..., s_8} and the new set of customer requirements is R_ρ = {r_1, r_2}. Since S ∪ R_ρ ∪ C is inconsistent, we need a minimal diagnosis ∆ and a reconfiguration S_∆ such that (S − ∆) ∪ S_∆ ∪ R_ρ ∪ C is consistent. In the following we show how FLEXDIAG (Algorithm 1) can be applied to determine such a minimal diagnosis ∆.
FLEXDIAG is activated under the assumption that AC = C ∪ R_ρ ∪ S is inconsistent, i.e., the consistency of AC is not checked by the algorithm. If AC is inconsistent but AC − S is also inconsistent, FLEXDIAG will not be able to identify a diagnosis in S; therefore ∅ is returned. Otherwise, a recursive function FLEXD is activated which is in charge of determining one minimal diagnosis ∆ ⊆ S. In each recursive step, the constraints in S are divided into two subsets (S_1 and S_2) in order to figure out whether one of these subsets already includes a diagnosis. If this is the case, the second set need not be inspected for diagnosis elements anymore. If we assume, for example, that S = {s_1, s_2, s_3, s_4, s_5, s_6, s_7, s_8} is inconsistent with C ∪ R_ρ and we divide S into the two subsets S_1 = {s_1, s_2, s_3, s_4} and S_2 = {s_5, s_6, s_7, s_8}, and S_1 is already consistent with C ∪ R_ρ, then diagnosis elements are searched for in S_2 only. The complete related walkthrough is depicted in Figures 1 and 2.
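The divide-and-conquer scheme can be sketched in Python as follows. This is a simplified reading of Algorithm 1, not the paper's exact listing: constraints are plain hashable values, `consistent` stands in for a solver call, and the granularity parameter m stops the splitting once a splinter has at most m elements.

```python
def flexdiag(S, AC, consistent, m=1):
    """Return one diagnosis Δ ⊆ S (minimal for m = 1), or [] if none exists.
    S:  candidate constraints, ordered lowest-priority first
    AC: all constraints (C ∪ R_ρ ∪ S); AC is assumed inconsistent
    """
    if not S or not consistent([c for c in AC if c not in S]):
        return []                       # no diagnosis contained in S
    return _flexd([], S, AC, consistent, m)

def _flexd(D, S, AC, consistent, m):
    # If constraints were just removed (D != []) and AC became consistent,
    # no diagnosis elements have to be searched for in S.
    if D and consistent(AC):
        return []
    if len(S) <= m:                     # m > 1 trades minimality for speed
        return list(S)
    k = len(S) // 2
    S1, S2 = S[:k], S[k:]
    D1 = _flexd(S1, S2, [c for c in AC if c not in S1], consistent, m)
    D2 = _flexd(D1, S1, [c for c in AC if c not in D1], consistent, m)
    return D1 + D2

# Toy reconfiguration: old assignments vs. R_ρ = {pc = games, paper = 3};
# the consistency check enforces R_ρ on whatever assignments remain.
def consistent(cs):
    a = dict(cs)
    if "pc" in a and "games" in a and a["pc"] != a["games"]:
        return False
    if "paper" in a and a["paper"] != 3:
        return False
    return True

S = [("pc", 3), ("games", 2), ("paper", 1)]
delta = flexdiag(S, list(S), consistent, m=1)   # [("paper", 1), ("pc", 3)]
```

With m = 1 this behaves like FASTDIAG and returns a minimal diagnosis; with m = |S| the whole candidate set is returned after a single split test, illustrating the quality/effort trade-off.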
FLEXDIAG is based on the concepts of FASTDIAG [15], i.e., it returns one diagnosis (∆) at a time and is complete in the sense that if a diagnosis is contained in S, the algorithm will find it. A corresponding reconfiguration can be determined by a CSP solver call with C ∪ R_ρ ∪ (S − ∆). The determination of multiple diagnoses at a time can be realized on the basis of the construction of an HSDAG [36]. In FLEXDIAG, the parameter m is used to control diagnosis quality in terms of minimality and accuracy as well as the performance of diagnostic search (see Section 5): the higher the value of m, the higher the performance of FLEXDIAG and the lower the degree of diagnosis quality. The inclusion of m to control quality and performance is the major difference between FLEXDIAG and FASTDIAG. If m = 1 (see Algorithm 1), the number of consistency checks needed for determining one minimal diagnosis is 2δ × log₂(n/δ) + 2δ in the worst case [15]. In this context, δ represents the size of the minimal diagnosis ∆ and n represents the number of constraints in the solution S.
If m > 1, the number of needed consistency checks can be systematically reduced if we accept the trade-off of possibly losing the property of diagnosis minimality (see Definition 2). With settings m > 1, the upper bound on the number of consistency checks is reduced to 2δ × log₂(2n/(δ × m)) in the worst case. These upper bounds allow us to estimate the worst-case runtime of the diagnosis algorithm, which is extremely important for real-time scenarios. Consequently, if we are able to estimate the upper limit of the time needed for completing one consistency check (e.g., on the basis of simulations with an underlying constraint solver), we are also able to derive lower bounds for m that must be chosen in order to guarantee a FLEXDIAG runtime within predefined time limits. Table 1 gives an overview of the number of consistency checks needed depending on the setting of the parameter m and the diagnosis size δ for |S| = 16. For example, if m = 2 and the size of a minimal diagnosis is δ = 4, then the upper bound on the number of needed consistency checks is 16. If δ increases further, the number of consistency checks does not increase anymore. Figures 1 and 2 depict FLEXDIAG search trees depending on the setting of the granularity parameter m. The upper bound on the number of consistency checks helps us to determine the maximum amount of time needed to determine a diagnosis with FLEXDIAG. For example, if the maximum time needed for one consistency check is 20 ms, the maximum time needed for determining a diagnosis with m = 2 (given δ = 8) is 16 × 20 ms ≈ 320 milliseconds.
FLEXDIAG determines one diagnosis at a time which indicates the variable assignments of the original configuration that have to be changed such that a reconfiguration conforming to the new requirements (R_ρ) is possible. The algorithm supports the determination of leading diagnoses, i.e., diagnoses that are preferred with regard to given user preferences [15,52]. FLEXDIAG is based on a strict lexicographical ordering of the constraints in S: the lower the importance of a constraint s_i ∈ S, the lower the index of the constraint in S. For example, s_1: pc = 3 has the lowest ranking. The lower the ranking, the higher the probability that the constraint will be part of a reconfiguration S_∆. Since s_1 has the lowest priority and is part of a conflict, it is an element of the diagnosis returned by FLEXDIAG. For a discussion of the properties of lexicographical orderings we refer to [15,28].

Table 1. Worst-case estimates for the number of needed consistency checks depending on the granularity parameter m and the diagnosis size δ for |S| = 16.

δ     m=1   m=2   m=4   m=8
1     10    8     6     4
2     16    12    8     4
4     24    16    8     -
8     32    16    -     -
16    32    -     -     -
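The worst-case bounds quoted above can be evaluated directly. The helper below reproduces the entries of Table 1; it assumes the bound expressions exactly as given in the text and that the arguments of the logarithms are powers of two, as they are in the table.

```python
import math

def max_checks(n, delta, m=1):
    """Worst-case number of consistency checks for one FLEXDIAG diagnosis,
    with n = |S| and delta = |Δ|, per the bounds given in the text."""
    if m == 1:
        return round(2 * delta * math.log2(n / delta) + 2 * delta)
    return round(2 * delta * math.log2((2 * n) / (delta * m)))

# Reproducing a row of Table 1 (|S| = 16, delta = 4): [24, 16, 8] checks.
row = [max_checks(16, 4, m) for m in (1, 2, 4)]

# Worst-case runtime estimate: 20 ms per check, m = 2, delta = 8 -> 320 ms.
runtime_ms = max_checks(16, 8, 2) * 20
```

This kind of calculation is what allows choosing m for a given real-time budget: fix the per-check time bound, then pick the smallest m whose worst-case total stays within the limit.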

Evaluation
In this section, we present the evaluation we conducted to assess the performance of FLEXDIAG. We first analyze how FLEXDIAG performs on real and randomly generated models and then compare it with an evolutionary approach.

Evaluation aspects
To evaluate FLEXDIAG, we analyzed two aspects: (1) algorithm performance (in terms of milliseconds needed to determine one minimal diagnosis) and (2) diagnosis quality (in terms of minimality and accuracy; see Formulae 1 and 2). We analyzed both aspects by varying the value of the parameter m. Our hypothesis in this context was that the higher the value of m, the lower the number of needed consistency checks (i.e., the higher the efficiency of diagnosis search) and the lower the diagnosis quality in terms of the share of diagnosis-relevant constraints returned by FLEXDIAG. Diagnosis quality can, for example, be measured in terms of the degree of minimality of the constraints in a diagnosis ∆ (see Formula 1), i.e., the cardinality of ∆ compared to the cardinality of ∆_min, where |∆_min| represents the cardinality of a minimal diagnosis identified with m = 1.
If m > 1, there is no guarantee that the diagnosis ∆ determined for S is a superset of the diagnosis ∆ min determined for S in the case m = 1. Besides minimality, we introduce accuracy as an additional quality indicator (see Formula 2). The higher the share of elements of ∆ min in ∆, the higher the corresponding accuracy (the algorithm is able to reproduce the elements of the minimal diagnosis for m = 1).
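Since Formulae 1 and 2 are referenced but only described in the surrounding text, here is one consistent reading of the two metrics as code. The exact formulas are an assumption reconstructed from the text: minimality as the cardinality ratio |Δ_min|/|Δ| (which can exceed 1 when a diagnosis smaller than Δ_min is found), and accuracy as the share of Δ_min reproduced in Δ.

```python
def minimality(delta, delta_min):
    """Formula 1 (reconstructed): |Δ_min| / |Δ|. Values below 1 indicate a
    non-minimal diagnosis; values above 1 can occur for m > 1 when the
    returned diagnosis is smaller than the m = 1 reference diagnosis."""
    return len(delta_min) / len(delta)

def accuracy(delta, delta_min):
    """Formula 2 (reconstructed): share of the elements of Δ_min in Δ."""
    return len(set(delta) & set(delta_min)) / len(delta_min)
```

For example, a diagnosis {s1, s2, s3, s4} measured against Δ_min = {s1, s3} has minimality 0.5 (twice as large as necessary) but accuracy 1.0 (it contains all of Δ_min).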

Datasets and Results
We evaluated FLEXDIAG with regard to both metrics (algorithm performance and diagnosis quality) by applying the algorithm to different benchmarks: first, random feature models generated with the BeTTy tool [40]; second, the set of models hosted in the S.P.L.O.T. repository; third, a real-world model extracted from a recent Ubuntu Linux distribution [20]; and finally, a real-world automotive dataset. The configuration models are feature models which include requirement constraints, compatibility constraints, and different types of structural constraints such as mandatory relationships and alternatives. For all datasets we report averaged values: we first calculate the accuracy, execution time, and minimality for all executions, and then aggregate the data and compute the mean for each metric.

Experimental platform
The experiments were conducted using a version of FLEXDIAG implemented in Java and integrated into the FaMa Tool Suite [5]. All models were translated into a constraint satisfaction problem (CSP) representation, and the Choco library was used for consistency checking. Our FLEXDIAG implementation ran on a grid of four-CPU Dell blades with Intel Xeon X5560 CPUs running at 2.8 GHz, with 8 threads per CPU, and CentOS v6. The total RAM was 8 GB. To parallelize the executions we used GNU Parallel [48].

Random models
The first dataset used to evaluate FLEXDIAG was randomly generated. We used BeTTy [40] to generate a dataset ranging from 50 to 2000 features and from 10% to 30% cross-tree constraints. The generation approach is based on Thüm et al. [49] and imitates realistic topologies.
For each model combining a given number of features and a percentage of cross-tree constraints, we randomly generated reconfiguration requirements of different sizes involving 10%, 30%, 50%, and 100% of the features of the model. We then randomly reordered each set of reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FLEXDIAG on each combination of parameters three times and averaged the execution times to reduce the impact of third-party threads.
In the following, we present the results as a comparison between the different values of m, showing how the values evolve depending on the size of the models. Note that to generate the plots we aggregated the data, and therefore the values shown are averaged results. Figure 3 shows how diagnosis performance can be increased depending on the setting of the parameter m. We also observe how minimality deteriorates when increasing m. Table 2 shows the averaged data we obtained. It is worth mentioning that minimality decreases when m increases, while accuracy still provides acceptable results with m = 10. Also, the execution time is below five minutes even in the worst case.
As we can observe in Table 2 and Figure 3, the execution time decreases when incrementing m, while quality deteriorates. However, minimality is clearly affected, whereas accuracy shows only minor variations.

SPLOT repository models
We extracted a total of 387 models from the S.P.L.O.T. repository. For each model, we randomly generated reconfiguration requirements of different sizes involving 10%, 30%, 50%, and 100% of the features of the model. We then randomly reordered each set of reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FLEXDIAG on each combination of parameters three times and averaged the execution times to reduce the impact of third-party threads. Table 3 shows the data for those models categorized as realistic in the repository. We again see that FLEXDIAG scales without problems, offering a good trade-off between accuracy and minimality while keeping the average runtime below a second.
As we can observe in Table 3, the execution time again decreases when incrementing m, while quality deteriorates. Minimality is clearly affected, whereas accuracy shows only minor variations, although we observe some special cases ("REAL-FM-5" with m = 2) where it deteriorates somewhat more.

Ubuntu-based model
In order to test FLEXDIAG with large-scale real models, we encoded the variability existing in the Debian packaging system for the Ubuntu distribution and generated a set of configurations representing Ubuntu user installations with wrong package selections. Concretely, we modelled the Ubuntu Xenial distribution containing 58,107 packages and 52,721 constraints. This model was extracted using the mapping presented in [20,21]. We executed FLEXDIAG with different m values. We randomly generated reconfiguration requirements of different sizes involving 10%, 30%, 50%, and 100% of the features of the model. We then randomly reordered each set of reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FLEXDIAG on each combination of parameters three times and averaged the execution times to reduce the impact of third-party threads. Table 4 shows that FLEXDIAG is able to provide good accuracy even with m set to 10. It also shows, as expected, the negative impact of m on minimality. We observe that the execution time with m = 1 was 3.7 hours while with m = 10 it was 2.5 hours, a runtime improvement of 1.47×.

Automotive models
The benchmark used in this experiment includes three automotive configuration models from a German car manufacturer. For each model, we randomly generated reconfiguration requirements of different sizes involving 10%, 30%, 50%, and 100% of the features of the model. We then randomly reordered each set of reconfiguration requirements 10 times (to prevent ordering biases). Moreover, we executed FLEXDIAG on each combination of parameters three times and averaged the execution times to reduce the impact of third-party threads. Table 5 shows that FLEXDIAG is able to provide good accuracy even with m set to 10. Again, it shows, as expected, the negative impact of m on minimality. The execution time for the model with id = 1 and m = 1 was 6.33 hours while with m = 10 it was 7.9 minutes, an improvement of 26.9×.

Comparing FLEXDIAG with Evolutionary Algorithms
In this research we do not compare FLEXDIAG with more traditional diagnosis approaches; for related evaluations we refer the reader to [15], where detailed analyses can be found. These analyses clearly indicate that direct diagnosis approaches outperform standard diagnosis approaches based on the resolution of minimal conflicts [36] if the search goal is to identify not all minimal diagnoses but the so-called leading diagnoses, which should, for example, be shown to users in interactive settings.
However, in this section we compare FLEXDIAG with an evolutionary algorithm inspired by [7,30]. The evolutionary algorithm has been built with the Jenetics framework for Java, leaving all parameters at their default values and fixing the process to 500 generations. We compare with FLEXDIAG at m = 1 to obtain a fair comparison; with higher values of m, FLEXDIAG would perform even better in terms of runtime.

Table 5. FLEXDIAG evaluated with benchmarks from the automotive industry (calculation of the first diagnosis; ids 1-3 represent different type series of a German premium car manufacturer). The second column shows the m value used in FLEXDIAG. |V| represents the number of variables in the CSP, |C| the number of constraints, and |∆| the average size of a diagnosis; average time (in milliseconds), average minimality, and average accuracy represent the means of the calculated values for those metrics.

The first observation is that the evolutionary approach was not capable of dealing with the very large realistic models (Ubuntu, automotive) within a time-out of 24 hours. Therefore, we report this comparison only on randomly generated models. Figure 4 shows that the time required by FLEXDIAG is usually higher for models having fewer than 500 features; beyond this point FLEXDIAG pays off and scales much better. It is also worth mentioning that the evolutionary algorithm was not capable of obtaining complete and minimal explanations: in 500 generations it found only partial diagnoses. Table 6 shows that FLEXDIAG returns minimal diagnoses, while the evolutionary approach was not capable of detecting minimal diagnoses; FLEXDIAG also offered much better accuracy.

Table 6. Comparison results of the evolutionary approach and FLEXDIAG. Approach refers to the approach used, |V| represents the number of variables in the CSP, |C| the number of constraints, and |∆| the average size of a diagnosis; average time (in milliseconds), average minimality, and average accuracy represent the means of the calculated values for those metrics.

Threats to validity
Even though the experiments presented in this paper provide evidence that the proposed solution is valid, some of the assumptions we made may affect their validity. In this section, we discuss the different threats to validity that affect the evaluation.

External validity. The inputs used for the experiments presented in this paper were either realistic or designed to mimic realistic feature models. The Debian feature model and the automotive models are realistic, since numerous experts were involved in their design. However, since they were developed in a manual design process, they may contain errors and may not encode all configurations. Also, the random feature models may not accurately reflect the structure of real feature models used in industry. The major threats to external validity are:
- Population validity: the real feature models that we used may not represent all valid configurations in their domains due to their manual construction. Also, random models might not have the same structure as real models (e.g., the mathematical operators used in the complex constraints). To reduce these threats, we generated the models using previously published techniques [49] and existing implementations of these techniques in BeTTy [40].
- Ecological validity: while external validity in general is focused on the generalization of the results to other contexts (e.g., using other models), ecological validity is focused on possible errors in the experiment materials and tools used. To prevent ecological validity threats, such as third-party threads running in the virtual machines and impacting performance, the FLEXDIAG analyses were executed three times and then averaged.
Internal validity. The CPU resources required to analyze a feature model depend on the number of features and the percentage of cross-tree constraints. However, other variables may affect performance, such as the nature of the constraints used. To minimize these possible effects, we used a variety of models to ensure that we covered a large part of the constraint space.

Final Remarks
We observed that FLEXDIAG scales well with both random and real-world feature models. We also observed that, generally, diagnosis quality in terms of minimality and accuracy deteriorates as the value of the parameter m increases.
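The role of m in this trade-off can be illustrated with a minimal Python sketch (not the implementation evaluated above): FLEXDIAG follows the FastDiag divide-and-conquer scheme but stops splitting once |S| ≤ m, returning the remaining chunk as a whole. The toy variables, domains, and constraints below are invented for illustration, and a brute-force check stands in for the consistency checker.

```python
from itertools import product

DOMAINS = {"a": [1, 2, 3], "b": [1, 2, 3]}  # toy finite-domain CSP

def consistent(constraints):
    """Brute-force satisfiability check over the small domains."""
    names = list(DOMAINS)
    return any(all(pred(dict(zip(names, vals))) for _, pred in constraints)
               for vals in product(*(DOMAINS[n] for n in names)))

def flexdiag(S, AC, m=1):
    """Diagnosis D subset of S such that AC - D is consistent.
    m = 1 splits down to single constraints (minimal diagnosis);
    m > 1 returns whole chunks, trading minimality for speed."""
    if not S or consistent(AC) or not consistent([c for c in AC if c not in S]):
        return []
    return _fd([], S, AC, m)

def _fd(D, S, AC, m):
    if D and consistent(AC):
        return []
    if len(S) <= m:               # anytime shortcut: accept S as a whole
        return S
    k = len(S) // 2
    S1, S2 = S[:k], S[k:]
    D1 = _fd(S1, S2, [c for c in AC if c not in S1], m)
    D2 = _fd(D1, S1, [c for c in AC if c not in D1], m)
    return D1 + D2

# Consistent knowledge base plus user requirements that conflict with it
KB = [("kb1", lambda v: v["a"] != v["b"])]
S = [("r1", lambda v: v["a"] <= 3), ("r2", lambda v: v["a"] == 1),
     ("r3", lambda v: v["b"] == 1), ("r4", lambda v: v["b"] >= 1)]
AC = KB + S

print([n for n, _ in flexdiag(S, AC, m=1)])  # minimal diagnosis: ['r2']
print([n for n, _ in flexdiag(S, AC, m=2)])  # coarser result: ['r1', 'r2']
```

In this toy run, the diagnosis returned for m = 2 contains the innocuous constraint r1 alongside r2, so its minimality is 1/2 compared with the minimal diagnosis {r2} found for m = 1; this is exactly the kind of quality degradation referred to above.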
Minimality and accuracy depend on the configuration domain and are not necessarily monotone. For example, since a diagnosis determined by FLEXDIAG is not necessarily a superset of a diagnosis determined with m = 1, it can be the case that the minimality of a diagnosis determined with m > 1 is greater than 1 (if FLEXDIAG determines a diagnosis with lower cardinality than the minimal diagnosis determined with m = 1). For simplicity, let us assume that AC = S = {c1, c2, c3, c4, c5, c6, c7, c8} and that the following conflict sets CSi exist between the constraints ci ∈ S:

Another Example: Reconfiguration in Production
The following simplified reconfiguration task is related to scheduling in production, where it is often the case that, for example, schedules and corresponding production equipment have to be reconfigured. In this example setting, we do not take into account configurable production equipment (configurable machines) and limit the reconfiguration to the assignment of orders to machines. The assignment of an order oi to a certain machine mj is represented by the corresponding variable oimj. The domain of each such variable represents the different possible slots in which an order can be processed; for example, o1m1 = 1 denotes the fact that the processing of order o1 on machine m1 is performed during and finished after time slot 1.
Further constraints restrict the way in which orders may be assigned to machines; for example, o1m1 < o1m2 denotes the fact that order o1 must be completed on machine m1 before further processing is started on machine m2. Furthermore, no two orders may be assigned to the same machine during the same time slot; for example, o1m1 ≠ o2m1 denotes the fact that orders o1 and o2 must not be processed on the same machine in the same time slot (slots 1..3). Finally, the definition of our reconfiguration task is completed by an already determined schedule S and a corresponding reconfiguration request represented by the reconfiguration requirement Rρ = {r1 : o3m3 < 5}, i.e., order o3 should be completed within less than 5 time units. The reconfiguration determined for this task is shown in Table 7. If we change the parametrization to m = 2, FLEXDIAG returns the same diagnosis in approximately half of the time (averaged over 10 iterations, 16 milliseconds were needed for m = 2, whereas 31 milliseconds were needed for m = 1). This is consistent with the estimates in Table 1.
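A rescheduling task of this kind can be encoded, under simplifying assumptions, as a small finite-domain CSP in which the old schedule is expressed as unary assignment constraints and the diagnosis identifies which assignments must be retracted. The sketch below uses invented variables, slot ranges, and constraints, and a brute-force minimal-diagnosis search stands in for the FLEXDIAG call.

```python
from itertools import product, combinations

SLOTS = range(1, 7)                       # hypothetical time slots 1..6
VARS = ["o1m1", "o1m2", "o2m1", "o3m3"]   # hypothetical subset of the task

# Knowledge base: precedence and no-overlap constraints (illustrative)
KB = [
    lambda v: v["o1m1"] < v["o1m2"],      # o1 finishes on m1 before m2
    lambda v: v["o1m1"] != v["o2m1"],     # no shared slot on machine m1
]
# Reconfiguration requirement R_rho: o3 must finish before slot 5
R = [lambda v: v["o3m3"] < 5]
# Original schedule S, expressed as unary assignment constraints
S = [("o1m1=1", lambda v: v["o1m1"] == 1),
     ("o1m2=2", lambda v: v["o1m2"] == 2),
     ("o2m1=2", lambda v: v["o2m1"] == 2),
     ("o3m3=5", lambda v: v["o3m3"] == 5)]

def consistent(cs):
    """Brute-force satisfiability over all slot assignments."""
    return any(all(c(dict(zip(VARS, vals))) for c in cs)
               for vals in product(SLOTS, repeat=len(VARS)))

def min_diagnosis(S, background):
    """Smallest set of old assignments whose retraction restores
    consistency (a brute-force stand-in for the FLEXDIAG call)."""
    for k in range(len(S) + 1):
        for D in combinations(S, k):
            keep = [c for n, c in S if (n, c) not in D]
            if consistent(background + keep):
                return [n for n, _ in D]

print(min_diagnosis(S, KB + R))  # → ['o3m3=5']
```

Here only the assignment o3m3 = 5 has to be retracted: re-solving the CSP with the remaining assignments and Rρ then yields a reconfigured schedule in which o3 finishes before slot 5.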
Possible ordering criteria for constraints in such rescheduling scenarios are, for example, customer value (changes related to orders of important customers should occur with a significantly lower probability) and the importance of individual orders. If some orders in a schedule should not be changed, this can be achieved by simply defining such requests as requirements (Rρ), i.e., change requests as well as stability requests can be included as constraints ri in Rρ.

Table 7 Reconfiguration determined for the rescheduling task. S represents the original configuration and S′ represents the configuration resulting from the reconfiguration task.
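As a small sketch of how such an ordering criterion could be applied (the requirement names and customer values below are invented), constraints can simply be sorted by customer value before the diagnosis is determined, assuming a search that prefers to change the constraints it considers first:

```python
# Hypothetical requirement records: (constraint name, customer value)
requirements = [("o1m1=1", 9), ("o2m1=2", 3), ("o3m3=5", 1)]

# Order by ascending customer value: assignments of low-value orders
# come first and are thus the preferred candidates for change
ordered = sorted(requirements, key=lambda r: r[1])
print([name for name, _ in ordered])  # → ['o3m3=5', 'o2m1=2', 'o1m1=1']
```

Stability requests would bypass this ordering entirely, since constraints placed in Rρ are never candidates for change.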

Future Work
In our work, we focused on the evaluation of reconfiguration scenarios where the knowledge base itself is assumed to be consistent. In future work, we will extend the FLEXDIAG algorithm to make it applicable in scenarios where knowledge bases are tested [12]. An example issue is to take into account situations where unintended configurations are accepted by the knowledge base. In this context, we will extend the work of [12] by not only taking into account negative test cases but also automatically generating relevant test cases, for example, on the basis of mutation testing approaches. We plan to extend our empirical evaluation to further industrial configuration knowledge bases. Furthermore, we want to analyze in which ways we can further improve the output quality of FLEXDIAG (e.g., in terms of minimality and accuracy), for example, by applying different constraint orderings depending on observed user interaction patterns and the probability estimates for diagnosis membership derived from them. The better potentially relevant constraints can be predicted, the better the diagnosis quality in terms of the mentioned metrics of minimality and accuracy. Note that, for example, counting the number of elements already identified as partial diagnosis elements in FLEXDIAG does not help to keep diagnosis determination within certain time limits; however, this mechanism could be used when determining more than one diagnosis, in order to include diagnosis size as a relevance criterion. In this context, we will also analyze further alternatives for evaluating the quality of diagnoses that go beyond the metrics used in this article.

Conclusions
Efficient reconfiguration functionalities are needed in various scenarios such as the reconfiguration of production schedules, the reconfiguration of the settings in mobile phone networks, and the reconfiguration of robot context information.
We analyzed the FLEXDIAG algorithm with regard to its potential to improve existing direct diagnosis algorithms. When using FLEXDIAG, there is a clear trade-off between the performance of diagnosis calculation and diagnosis quality (measured, for example, in terms of minimality and accuracy).