Incomplete MaxSAT approaches for combinatorial testing

We present a Satisfiability (SAT)-based approach for building Mixed Covering Arrays with Constraints of minimum length, referred to as the Covering Array Number problem. This problem is central in Combinatorial Testing for the detection of system failures. In particular, we show how to apply Maximum Satisfiability (MaxSAT) technology by describing efficient encodings for different classes of complete and incomplete MaxSAT solvers to compute optimal and suboptimal solutions, respectively. Similarly, we show how to solve through MaxSAT technology a closely related problem, the Tuple Number problem, which we extend to incorporate constraints. For this problem, we additionally provide a new MaxSAT-based incomplete algorithm. The extensive experimental evaluation we carry out on the available Mixed Covering Arrays with Constraints benchmarks and the comparison with state-of-the-art tools confirm the good performance of our approaches.


Introduction
The Combinatorial Testing (CT) problem (Nie and Leung 2011) addresses the question of how to efficiently verify the proper operation of a system, where a system can be a program, a circuit, a package that integrates several pieces of software, a GUI interface, a cloud application, etc. This problem requires exploring the parameter space of the system by iteratively testing different settings of the parameters to detect errors, bugs or faults. If we consider the system parameters as variables, a setting can be described as a full assignment to these parameters.
Exploring the whole parameter space exhaustively, i.e., the set of all possible full assignments, is, in general, out of reach. Notice that if a system has a set of parameters P, the number of different full assignments is ∏_{p∈P} g_p = O(g^{|P|}), where g_p is the cardinality of the domain of parameter p and g is the cardinality of the greatest domain.
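As an illustration, the size of the full assignment space is just the product of the domain cardinalities; a minimal sketch (the domain sizes below are made up for illustration):

```python
from math import prod

def num_full_assignments(domain_sizes):
    """Number of distinct full assignments: the product of the domain
    cardinalities g_p over all parameters p."""
    return prod(domain_sizes)

# A hypothetical SUT with three parameters of domain sizes 2, 3 and 2:
print(num_full_assignments([2, 3, 2]))  # -> 12
```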
The good news is that, in practice, there is no need to explore all the parameter space to detect errors, bugs or faults. We just need to cover a portion of the possible parameter combinations (Kuhn et al. 2004). For example, most software errors (75%-80%) are caused by certain individual parameters or by the interaction of just two of them.
To cover that portion of parameter combinations exhaustively, Covering Arrays (CAs) play an important role in CT. Given a set of parameters P and a strength t, a Covering Array CA(N; t, P) is a test suite of N tests that is guaranteed to cover all the possible interactions of t parameters (referred to as t-tuples). Since executing a test in the system has a cost, we are interested in working with relatively small covering arrays. We refer to the minimum N for which a CA(N; t, P) exists as the Covering Array Number, denoted by CAN(t, P). In particular, we are interested in building an optimal CA, i.e., a covering array of length CAN(t, P). Notice that the number of tests required to cover all t-way parameter combinations, for fixed t, is guaranteed to grow logarithmically in the number of parameters (Colbourn 2004), which indicates that optimal or near-optimal covering arrays can be used in practical terms. The computational challenge is to build optimal CAs in a reasonable time frame.
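To make the covering guarantee concrete, the following sketch checks whether a test suite covers every t-way interaction; the three-parameter binary SUT and the suite are illustrative (a classic optimal CA(4; 2, 2^3)):

```python
from itertools import combinations, product

def covers_all_t_tuples(tests, domains, t):
    """Check that every t-way value combination of every parameter tuple
    appears in at least one test of the suite."""
    n = len(domains)
    for params in combinations(range(n), t):
        needed = set(product(*(range(domains[p]) for p in params)))
        seen = {tuple(test[p] for p in params) for test in tests}
        if needed - seen:
            return False
    return True

# A classic optimal CA(4; 2, 2^3): four tests cover all 2-way interactions
# of three binary parameters, while the exhaustive suite needs 2^3 = 8 tests.
suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(covers_all_t_tuples(suite, [2, 2, 2], 2))  # -> True
```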
In this paper, we focus on Mixed Covering Arrays with Constraints (MCACs). The term Mixed refers to the possibility of having parameter domains of different sizes. The term Constraints refers to the existence of some parameter interactions that are not allowed in the system. These forbidden interactions are usually implicitly described by a set of constraints. The problem of computing an MCAC of minimum length, to which we refer in this paper as the Covering Array Number problem, is NP-hard (Maltais and Moura 2010).
There exist several greedy approaches that tackle the problem of building minimum MCACs, such as PICT (Czerwonka 2006), based on the OTAT framework (Bryce et al. 2005), and ACTS (Borazjany et al. 2012), based on the IPOG algorithm (Duan et al. 2017). One downside of these approaches is that they become more inefficient as the hardness of the set of forbidden interactions increases. Therefore, we are more interested in constraint programming approaches, which are better suited for handling constraints. For example, CALOT (Yamada et al. 2015) is a tool for building MCACs based on Satisfiability (SAT) technology (Biere et al. 2009) that can handle constraints efficiently.
Within constraint programming techniques (Rossi et al. 2006), SAT technology provides a highly competitive generic problem approach for solving decision problems. In particular, the decision problem to be solved is translated into a SAT instance (a propositional formula) and a SAT solver is used to determine whether there is a solution. In this paper, we will review in detail the CALOT tool, which essentially solves a sequence of SAT instances to compute an optimal MCAC. Each SAT instance in the sequence encodes the decision query of whether there exists an MCAC of a certain length. By iteratively bounding the length, the optimum can be determined.
Since the problem of computing minimum MCACs is, in essence, an optimization problem, we also consider its reformulation into the Maximum Satisfiability (MaxSAT) problem (Biere et al. 2009), which is an optimization version of the SAT problem.
We show empirically that MaxSAT approaches outperform ACTS and CALOT (the state of the art) once suitable MaxSAT encodings are used. We evaluate both complete or exact MaxSAT solvers (which certify optimality) and incomplete MaxSAT solvers (which provide suboptimal solutions). In particular, we show that while complete MaxSAT solvers perform similarly to CALOT (substantially in contrast to previously reported experiments with MaxSAT solvers (Yamada et al. 2015)), incomplete MaxSAT solvers obtain better suboptimal solutions, and obtain them faster, than ACTS and CALOT on many instances. This confirms the practical interest of incomplete MaxSAT approaches because, in real environments, we are mainly concerned with obtaining the best possible solution within a given runtime budget.
Having confirmed the good performance of MaxSAT approaches for computing minimum MCACs, we explore another related problem, the Tuple Number (TN) Problem. Informally, the TN problem is to determine the minimum set of missing t-tuples in a test suite of N tests, or the maximum set of t-tuples that these N tests cover. This problem is related to the Optimal Shortening Covering Arrays (OSCAR) problem (Carrizales-Turrubiates et al. 2011) (which is NP-hard), where given a matrix of tests the goal is to find a submatrix of a fixed number of tests and parameters that maximizes the number of covered t-tuples. These shortened covering arrays have been used to improve the initialization of metaheuristic approaches for Covering Arrays (without SUT constraints).
In this paper, we explore (for the first time) the Mixed and with Constraints variants of the TN problem, assessing the performance of complete and incomplete MaxSAT approaches. Obviously, this problem is of interest when N < CAN(t, P). We additionally present another incomplete approach based on MaxSAT technology, to which we refer as MaxSAT Incremental Test Suite (MaxSAT ITS), that incrementally builds the test suite with the help of a MaxSAT query that aims to maximize the coverage of allowed tuples at every step.
The Covering Array Number problem is concerned with reporting solutions with the least number of tests. From a practical point of view, whether we are satisfied with suboptimal solutions will depend on the cost of the tests. This cost basically includes the cost of generating the tests (computational resources) and the cost of testing the system. In particular, when the cost is too prohibitive in terms of our budget, and we are satisfied with covering a statistically significant portion of the tuples, we aim to solve (even suboptimally) the Tuple Number problem. Therefore, there exist real-world scenarios where all the approaches described in this paper are of practical interest.
The rest of the paper is structured as follows: Sect. 2 introduces definitions on CAs, SAT/MaxSAT instances, constraints and SAT solvers. For computing MCACs of a given length, Sect. 3 defines different SAT encodings and Sects. 4 and 5 describe techniques to make the SAT encodings more efficient. Section 6 introduces the incremental SAT algorithm CALOT for computing minimum MCACs. Subsequently, Sect. 7 defines MaxSAT encodings and Sect. 8 describes how to efficiently apply MaxSAT solvers. For the Tuple Number problem, Sect. 9 defines a MaxSAT encoding and Sect. 10 presents a new incomplete approach using MaxSAT solvers. To assess the impact of the presented approaches, Sect. 11 reports on an extensive experimental investigation on the available MCAC benchmarks. Finally, Sect. 12 concludes the paper.

Preliminaries
We first introduce the definitions related to Systems Under Test and Covering Arrays.
Definition 1 A System Under Test (SUT) model is a tuple ⟨P, ϕ⟩, where P is a finite set of variables p of finite domain, called SUT parameters, and ϕ is a set of constraints on P, called SUT constraints, that implicitly represents the parameterizations that the system accepts. We denote by d(p) and g_p, respectively, the domain and the domain cardinality of p. For the sake of clarity, we will assume that the system accepts at least one parameterization.
In the following, we assume S = ⟨P, ϕ⟩ to be a SUT model. We will refer to P as S_P, and to ϕ as S_ϕ.

Example 1
As an example of SUT model, we focus on the domain of autonomous driving. Table 1 shows the parameters and values, S_P, and the SUT constraints, S_ϕ.

Definition 2 An assignment is a set of pairs (p, v) where p is a variable and v is a value of the domain of p. A test case for S is a full assignment A to the variables in S_P such that A entails S_ϕ (i.e., A ⊨ S_ϕ). A parameter tuple of S is a subset π ⊆ S_P. A value tuple of S is a partial assignment to S_P; in particular, we refer to a value tuple of length t as a t-tuple.
Example 2 Consider the SUT presented in Example 1. An example of test case is {(L, dy), (E, hw), (S, ca), (M, cb)}. {L, E} is a parameter tuple and {(L, dy), (E, hw)} is a value tuple for t = 2.

The MCAC problem is a decision problem. The Covering Array Number and the Tuple Number problems, to which we refer in short as the CAN(t, S) and T(N; t, S) problems, respectively, are optimization problems: the former asks for the minimum length of an MCAC, and the latter for a test suite of size N that covers T(N; t, S) t-tuples. Now, we introduce the definitions related to the encodings and the SATisfiability-based solving technology we will use to solve the problems defined above.
Definition 9 A literal is a propositional variable x or a negated propositional variable ¬x. A clause is a disjunction of literals. A formula in Conjunctive Normal Form (CNF) is a conjunction of clauses.
Definition 10 A weighted clause is a pair (c, w), where c is a clause and w, its weight, is a natural number or infinity. A clause is hard if its weight is infinity (or no weight is given); otherwise, it is soft. A Weighted Partial MaxSAT instance is a multiset of weighted clauses.
Definition 11 A truth assignment for an instance φ is a mapping that assigns to each propositional variable in φ either 0 (False) or 1 (True). A truth assignment is partial if the mapping is not defined for all the propositional variables in φ.
Definition 12 A truth assignment I satisfies a literal x (¬x) if I maps x to 1 (0). A truth assignment I falsifies a literal x (¬x) if I maps x to 0 (1). A truth assignment I satisfies a clause if I satisfies at least one of its literals; otherwise, it is violated or falsified. The cost of a clause (c, w) under I is 0 if I satisfies the clause; otherwise, it is w. Given a partial truth assignment I , a literal or a clause is undefined if it is neither satisfied nor falsified. A clause c is a unit clause under I if c is not satisfied by I and contains exactly one undefined literal.

Definition 13
The cost of a formula φ under a truth assignment I, denoted by cost(I, φ), is the aggregated cost of all its clauses under I.

Definition 14
The Weighted Partial MaxSAT (WPMaxSAT) problem for an instance φ is to find an assignment in which the sum of weights of the falsified soft clauses is minimal (referred to as the optimal cost of φ) and all the hard clauses are satisfied. The Partial MaxSAT problem is the WPMaxSAT problem when all the soft clauses have the same weight. The MaxSAT problem is the Partial MaxSAT problem when there are no hard clauses. The SAT problem is the Partial MaxSAT problem when there are no soft clauses.

Example 10
The optimal cost of the instance φ in Example 8 is 1.
Definition 15 An instance of Weighted Partial MaxSAT, or any of its variants, is unsatisfiable if its optimal cost is ∞. A SAT instance ϕ is satisfiable if there is a truth assignment I, called model, such that cost(I, ϕ) = 0.
Definition 16 An unsatisfiable core is a subset of clauses of a SAT instance that is unsatisfiable.
Definition 17 Given a SAT instance ϕ and a partial truth assignment I, we refer as Unit Propagation, denoted by UP(I, ϕ), to the Boolean inference mechanism (propagator) defined as follows: find a unit clause in ϕ under I, where l is its undefined literal, and propagate it, i.e., extend I with x = 1 (x = 0) if l ≡ x (l ≡ ¬x); repeat the process until a fixpoint is reached or a conflict is derived (i.e., a clause in ϕ is falsified by I). We refer to UP(I, ϕ) simply as UP(ϕ) when I is empty.
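Unit Propagation as defined above can be sketched as follows, using DIMACS-style integer literals (the representation is our choice, not the paper's):

```python
def unit_propagate(clauses, assignment):
    """UP(I, phi): repeatedly extend the partial assignment I with the only
    undefined literal of some unit clause, until a fixpoint is reached or a
    clause is falsified (conflict). Literals are DIMACS-style non-zero ints;
    the assignment maps variable -> bool."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            undefined, satisfied = [], False
            for lit in clause:
                var, val = abs(lit), lit > 0
                if var not in assignment:
                    undefined.append(lit)
                elif assignment[var] == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not undefined:            # all literals falsified
                return assignment, 'CONFLICT'
            if len(undefined) == 1:      # unit clause: propagate it
                lit = undefined[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, 'FIXPOINT'

# phi = (x1) AND (not x1 OR x2): UP derives x1 = 1 and then x2 = 1.
print(unit_propagate([[1], [-1, 2]], {}))  # -> ({1: True, 2: True}, 'FIXPOINT')
```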

Definition 18 Let A and B be SAT instances.
A ⊨ B denotes that A entails B, i.e., all assignments satisfying A also satisfy B.
Definition 19 A pseudo-Boolean (PB) constraint is a Boolean function of the form ∑_{i=1}^{n} q_i·l_i ⋈ k, where k and the q_i are integer constants, the l_i are literals, and ⋈ ∈ {<, ≤, =, ≥, >}.
Definition 20 A Cardinality (Card) constraint is a PB constraint where all q_i are equal to 1. An At-Most-One (AMO) constraint is a cardinality constraint of the form ∑_{i=1}^{n} l_i ≤ 1. An At-Least-One (ALO) constraint is a cardinality constraint of the form ∑_{i=1}^{n} l_i ≥ 1. An Exactly-One (EO) constraint is a cardinality constraint of the form ∑_{i=1}^{n} l_i = 1.
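For instance, an AMO constraint can be translated to CNF with the simple pairwise encoding (the paper instead uses the regular encoding, which is more compact; pairwise is shown only for illustration):

```python
from itertools import combinations

def pairwise_amo(lits):
    """At-Most-One via the pairwise encoding: one binary clause per pair of
    literals, forbidding both from being true at the same time."""
    return [[-a, -b] for a, b in combinations(lits, 2)]

def exactly_one(lits):
    """EO = the ALO clause (the disjunction itself) plus the AMO clauses."""
    return [list(lits)] + pairwise_amo(lits)

print(exactly_one([1, 2, 3]))
# -> [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```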
The interface of a modern SAT solver is presented in code fragment SATSolver. The input instance is added to the solver with functions add_clause and add_retractable (in case the clause can be retracted) (lines 5-7), which operate on a single clause, while functions add and retract operate on a set of clauses. The last two functions are overloaded to ease the usage of SAT solvers within MaxSAT solvers (lines 10-13 and 14-18). Variable n_vars indicates the number of variables of the input formula (line 1).
Function solve (lines 8-9) returns UNSAT (SAT) if the input formula is unsatisfiable (satisfiable) and sets variable core (model) to the corresponding unsatisfiable core (model). Function assume (line 4) allows placing an assumption on the truth value of a literal before function solve is called. Finally, modern SAT solvers also support an incremental solving mode, which allows keeping the learnt clauses across calls to the function solve.
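A minimal, brute-force sketch of this interface (the function names follow the text, but the implementation is ours; real solvers are CDCL-based, incremental, and also expose cores and retractable clauses):

```python
from itertools import product

class SATSolver:
    """Brute-force stand-in for the solver interface described in the text
    (add_clause, assume, solve, model); toy problem sizes only."""
    def __init__(self):
        self.n_vars, self.clauses, self.assumptions = 0, [], []
        self.model = None
    def add_clause(self, clause):
        self.n_vars = max(self.n_vars, max(abs(l) for l in clause))
        self.clauses.append(list(clause))
    def assume(self, lit):            # assumption holds for the next solve only
        self.assumptions.append(lit)
    def solve(self):
        fixed = {abs(l): l > 0 for l in self.assumptions}
        self.assumptions = []
        for bits in product([False, True], repeat=self.n_vars):
            m = {v + 1: bits[v] for v in range(self.n_vars)}
            if any(m[v] != b for v, b in fixed.items()):
                continue
            if all(any(m[abs(l)] == (l > 0) for l in c) for c in self.clauses):
                self.model = m
                return "SAT"
        return "UNSAT"

s = SATSolver()
s.add_clause([1, 2])
s.add_clause([-1])
print(s.solve())   # -> SAT (the model sets x2 to True)
s.assume(-2)
print(s.solve())   # -> UNSAT
```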

The MCAC problem as SAT
In this section, we present the SAT encoding described in Yamada et al. (2015) to decide whether there exists a CA(N; t, S) for a given SUT model S = ⟨P, ϕ⟩. It is similar to previous encodings described in Hnich et al. (2005, 2006) and Banbara et al. (2010).
In the following, we list the set of constraints that define the SAT encoding and describe the semantics of the propositional variables they refer to. To encode each constraint, we assume that AMO and EO cardinality constraints are translated into CNF through the regular encoding (Ansótegui and Manyà 2004; Gent and Nightingale 2004) and that the typical transformations (Tseitin 1983) of → and ↔ are implicitly applied. 2

First, we define variables x_{i,p,v} to be true iff test case i assigns value v to parameter p, and state that each parameter in each test case takes exactly one value:

  EO({x_{i,p,v} | v ∈ d(p)})   for each i ∈ [N], p ∈ S_P   (X)

Second, to enforce the SUT constraints ϕ, for each test case i we add the CNF formula that encodes the constraints of ϕ into SAT, substituting each appearance of the pair (p, v) in ϕ by the corresponding literal on the propositional variable x_{i,p,v}.

Third, we introduce propositional variables c_τ^i and state that if they are true, then tuple τ must be covered at test i, by forcing the parameters p in the test case to be assigned the values specified in τ:

  c_τ^i → x_{i,p,v}   for each i ∈ [N], τ ∈ T_a, (p, v) ∈ τ   (C)

Notice that only t-tuples that can be covered by a test case are encoded, i.e., τ ∈ T_a. In Sect. 4, we discuss how to detect the t-tuples forbidden by the SUT constraints.
Finally, we state that every t-tuple τ ∈ T_a must be covered by at least one test case:

  ⋁_{i∈[N]} c_τ^i   for each τ ∈ T_a   (C_X)

Inspired by the incremental SAT approach in Yamada et al. (2015) (see Sect. 6), we present another encoding where C and C_X are replaced by CC_X:

  c_τ^i → c_τ^{i−1} ∨ ⋀_{(p,v)∈τ} x_{i,p,v}   for each i ∈ [N], τ ∈ T_a   (a)
  c_τ^N   for each τ ∈ T_a   (b)
  c_τ^N → ¬c_τ^0   for each τ ∈ T_a   (c)

Variables c_τ^i have now a different semantics, i.e., if they are true, τ is covered by test case i or by some lower test case j, where 1 ≤ j ≤ i (equation (a)). In order to guarantee that τ will be covered by some test, notice that we just need to force c_τ^N to be true and c_τ^0 to be false (variables c_τ^0 are additionally included in the encoding). This can be achieved by adding the unit clauses c_τ^N (equation (b)) and the implication c_τ^N → ¬c_τ^0 (equation (c)) for every allowed tuple τ. The seasoned reader may wonder why we do not simply replace equation (c) by ⋀_{τ∈T_a} ¬c_τ^0. Indeed, this is possible. First, notice that UP on the conjunction of equations (b) and (c) will derive exactly the same. Second, for encoding some problems where it is not mandatory to cover all the tuples (see Sect. 9 on encoding the Tuple Number problem), we have to erase equation (b) from CC_X and also guarantee that if a tuple τ is not covered in an optimal solution, i.e., c_τ^N has to be False, then the related clauses in CC_X have to be satisfied (these are hard clauses) and, if possible, to be trivially satisfied, i.e., without requiring search. Equation (c) eases this case for all the scenarios in Sect. 9. Notice that, once c_τ^N is False, clauses in equation (c) are trivially satisfied and, by setting the remaining c_τ^i variables to True, clauses in equation (a) are also trivially satisfied.
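A sketch of how the CC_X clauses for a single allowed tuple τ could be generated, assuming the clausal forms of equations (a)-(c) described above (the variable-id mapping and input format are our own):

```python
def ccx_clauses(N, c_var, x_lits):
    """CC_X clauses for one allowed tuple tau. c_var[i] is the DIMACS id of
    c_tau^i for i = 0..N; x_lits[i] lists the x_{i,p,v} literals that tau
    fixes in test i. Per literal x, equation (a) becomes the clause
    (not c^i) or c^{i-1} or x; (b) is the unit clause c^N; and (c) is
    (not c^N) or (not c^0)."""
    clauses = []
    for i in range(1, N + 1):
        for x in x_lits[i]:
            clauses.append([-c_var[i], c_var[i - 1], x])   # equation (a)
    clauses.append([c_var[N]])                             # equation (b)
    clauses.append([-c_var[N], -c_var[0]])                 # equation (c)
    return clauses

# One test (N = 1), a tuple fixing the single literal x = 5,
# with c_tau^0 = 10 and c_tau^1 = 11:
print(ccx_clauses(1, [10, 11], {1: [5]}))
# -> [[-11, 10, 5], [11], [-11, -10]]
```

Unit propagation on this tiny output behaves as the text describes: c_τ^1 is forced True, hence c_τ^0 False, hence x must be True.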
Remark 1 There are some variations of equation (a) in CC_X that can be beneficial when using some SAT solvers, as we will see in Sect. 11.1. For example, we can use full implication instead of half implication in equation (a), i.e., c_τ^i ↔ c_τ^{i−1} ∨ ⋀_{(p,v)∈τ} x_{i,p,v}. Also, we can consider full implication in equation (c) and, for some of the problems analyzed in Sect. 11.1, we can even replace equation (c) by ⋀_{τ∈T_a} ¬c_τ^0.

Example 12
We show how to build Sat^{N=10,t=2,S}_{C_X} for the SUT in Example 1, where N = 10 is an upper bound ub for this SUT (see Sect. 4).

Preprocessing for the MCAC problem
In the context of the Covering Array Number problem, we define an upper bound ub and a lower bound lb to be integers such that ub ≥ CAN(t, S) > lb. When ub = lb + 1, we can stop the search and report ub as the minimum covering array number CAN(t, S).
To get an initial value for ub, we can execute a greedy approach to obtain a suboptimal CA(N; t, S) and set ub to N. For example, in the experiments, we use the tool ACTS (Borazjany et al. 2012), which supports Mixed Covering Arrays with Constraints. Moreover, a lower ub also implies a smaller initial encoding.
Additionally, by inspecting the solution, i.e., the test cases that certify the suboptimal CA(N; t, S), we can compute which tuples are not covered, the set of forbidden tuples, since the suboptimal CA(N; t, S) is guaranteed to cover all allowed t-tuples.
Furthermore, let r be the maximum number of allowed t-tuples associated with any parameter tuple of length t. Then, we can set lb = r − 1, since these r value tuples (mutually exclusive) need to be covered by different test cases.
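Ignoring SUT constraints (so every value tuple of a parameter tuple counts as allowed), this initial lower bound can be sketched as:

```python
from math import prod
from itertools import combinations

def initial_lower_bound(domains, t):
    """lb = r - 1, where r is the maximum number of value tuples over any
    parameter tuple of length t (SUT constraints are ignored here, i.e.,
    every value tuple is counted as allowed)."""
    r = max(prod(c) for c in combinations(domains, t))
    return r - 1

# Domains 3, 2, 2 at strength t = 2: the largest parameter pair admits
# 3 * 2 = 6 mutually exclusive value tuples, hence lb = 5.
print(initial_lower_bound([3, 2, 2], 2))  # -> 5
```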
Using an approach like ACTS, which is not based on constraint programming techniques, has a drawback: it may not be efficient enough if testing the satisfiability of S_ϕ (the set of SUT constraints) is computationally hard. In this case, to detect the forbidden tuples, we can simply apply algorithm ForbiddenTuples, whose input is the SUT model S and a SAT solver sat. This algorithm tests, for every tuple τ, whether it is compatible with the SUT constraints through a SAT query; if the solver reports unsatisfiability, the tuple is added to the set of forbidden tuples T_f, which is ultimately returned by the algorithm.
For t = 2, which is already of practical importance (Kuhn et al. 2004), the experiments carried out in this paper show that this detection process is negligible runtime-wise.
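A toy sketch of the ForbiddenTuples idea, where a brute-force satisfiability check stands in for the SAT solver and tuples are given as lists of DIMACS-style literals (both choices are ours, for illustration):

```python
from itertools import product

def brute_sat(clauses, n_vars, assumptions=()):
    """Tiny brute-force satisfiability check standing in for the SAT solver;
    only usable on toy SUT models."""
    fixed = {abs(l): l > 0 for l in assumptions}
    for bits in product([False, True], repeat=n_vars):
        model = {v + 1: bits[v] for v in range(n_vars)}
        if any(model[v] != b for v, b in fixed.items()):
            continue
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def forbidden_tuples(sut_clauses, n_vars, candidate_tuples):
    """For every candidate tuple (a list of literals), ask whether it is
    compatible with the SUT constraints; an UNSAT answer means the tuple is
    forbidden and it joins the returned set T_f."""
    return [tau for tau in candidate_tuples
            if not brute_sat(sut_clauses, n_vars, assumptions=tau)]

# SUT constraint "not x1 or not x2": values 1 and 2 are incompatible, so the
# tuple {x1, x2} is forbidden while {x1, not x2} is allowed.
print(forbidden_tuples([[-1, -2]], 2, [[1, 2], [1, -2]]))  # -> [[1, 2]]
```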

Symmetry breaking for the MCAC problem
Following Yamada et al. (2015), we fix the r t-tuples that led us to set the initial lb (see Sect. 4) to test cases {1, …, r}. This helps us break row symmetries for the first r test cases. We will refer to this as fixed-tuple symmetry breaking.
There are other alternatives. We can impose row symmetry breaking constraints as in Flener et al. (2002): since each row (test) represents a number in base 2, we can add constraints ordering the tests in monotonically increasing order, from test 0 to test N − 1. We can also apply, as explained above, fixed-tuple symmetry breaking to the first r tuples (first partition) and apply row symmetry breaking constraints to the remaining ub − (lb + 1) test cases (second partition). Furthermore, we can impose an order between the tuples in the first partition and the second partition, so that if two tests share the same value for the fixed tuple, then the one representing the lower number must be in the first partition.
Our experimental analysis shows that fixed-tuple symmetry breaking is superior to the other mentioned alternatives. Due to lack of space, we restricted all the experiments to this symmetry breaking approach.

Example 13
We show how to apply symmetry breaking to the SUT in Example 1.
(E, S) is the parameter tuple with the largest number of allowed tuples, which we selected. Its set of allowed value tuples is: {(E, hw), (S, ca)}, {(E, hw), (S, ra)}, {(E, ur), (S, ca)}, {(E, ur), (S, ra)}, {(E, ur), (S, li)}, {(E, co), (S, ca)}, {(E, co), (S, ra)}. To apply the fixed-tuple symmetry breaking variant, we just need to fix each allowed value tuple in a different test, as shown below (Ex. SYM X):

  x_{1,E,hw} ∧ x_{1,S,ca}
  x_{2,E,hw} ∧ x_{2,S,ra}
  x_{3,E,ur} ∧ x_{3,S,ca}
  x_{4,E,ur} ∧ x_{4,S,ra}
  x_{5,E,ur} ∧ x_{5,S,li}
  x_{6,E,co} ∧ x_{6,S,ca}
  x_{7,E,co} ∧ x_{7,S,ra}

Solving the CAN(t, S) problem with Incremental SAT
In this section, we present the CALOT algorithm, an incremental SAT approach for computing optimal covering arrays with SUT constraints described by Yamada et al. (2015). The input to the algorithm is an upper bound ub (computed as in Sect. 4), the strength t and the SUT model S. In line 2, the incremental SAT solver is initialized with the SAT instance Sat^{N=ub,t,S}_{CC_X}. Additionally, symmetry breaking constraints for the first lb + 1 tuples, as described in Sect. 5, are added to the SAT solver. The output is the covering array number and an optimal model. The algorithm works by iteratively decreasing ub until it reaches lb + 1 (line 5) or the current SAT instance is unsatisfiable (line 6). To decrease ub by one, the algorithm adds the set of unit clauses ⋀_{τ∈T_a} c_τ^{i−1} (line 7), which state that every t-tuple is covered by a test case with an index smaller than i.
There is a subtle detail in lines 9 and 10. Whenever the algorithm finds a new upper bound, the variables x_{i,p,v} related to the previous upper bound are fixed to their values in the last model found (b_model in line 8), so that these variables do not need to be decided in the next iterations. As Yamada et al. (2015) report, not fixing these variables can have a negative impact on performance.

Remark 2
The pseudocode of the original algorithm in Yamada et al. (2015) is slightly different. First, it assigns the i-th test at iteration i to the value it had in the previous model found, instead of assigning the (i + 1)-th test. This does not correspond to the description given in the text of the paper and may lead to an incomplete algorithm. Second, the set of constraints (a) (CC_X), described in Yamada et al. (2015), does not set c_τ^N to True as we do in this paper, which makes the pseudocode perform a dummy first step that can cause it to report a wrong optimum. We think that these are merely errors in the description, and we have fixed them. Since the tool CALOT is not available from the authors for reproducibility, we have tried to do our best to reproduce (or extend) the idea behind their work.
In Sect. 8, we will see that this incremental SAT approach resembles how SAT-based MaxSAT algorithms behave (Morgado et al. 2013). Actually, in contrast to Yamada et al. (2015), we show that MaxSAT technology can be effectively applied to solve Covering Arrays. Previous work proposes an encoding into Partial MaxSAT to build covering arrays without constraints of minimum size. The main idea is to use an indicator variable u_i that is True iff test case i is used to build the covering array. The objective function of the optimization problem, which aims to minimize the number of variables u_i set to True, is encoded into Partial MaxSAT by adding the following set of soft clauses:

  (¬u_i, 1)   for lb + 2 ≤ i ≤ N   (SoftU)

The CAN(t, S) problem as partial MaxSAT
Notice that we only need to use N − (lb + 1) indicator variables since we know that the covering array will have at least lb + 1 tests (see Sect. 4).
To avoid symmetries, it is also enforced that if test case i + 1 belongs to the minimum covering array, so does the previous test case i:

  u_{i+1} → u_i   for lb + 2 ≤ i < N   (B_SU)

Then, variables u_i are connected to variables c_τ^i, expressing that if we want test i to be the proof that τ is covered, then test i must be in the optimal solution:

  c_τ^i → u_i   for lb + 2 ≤ i ≤ N, τ ∈ T_a   (CU)

In order to build the Partial MaxSAT version of Sat^{N,t,S}_{CC_X}, we just need to change how variables u_i are related to variables c_τ^i. This constraint reflects that if u_i is False (i.e., test i is not in the solution and, therefore, due to constraint B_SU, none of the tests > i can be in the solution either), then the tuple τ has to be covered at some test below i:

  ¬u_i → c_τ^{i−1}   for lb + 2 ≤ i ≤ N, τ ∈ T_a   (CCU)

Remark 3 Alternatively, variables u_i can instead be connected to variables x_{i,p,v}, stating that if test i is not in the solution, then its parameters take no value:

  ¬u_i → ¬x_{i,p,v}   for lb + 2 ≤ i ≤ N, p ∈ S_P, v ∈ d(p)

This is a more compact encoding, but it requires Eq. X to use an AMO constraint instead of an EO constraint.
Finally, we can convert these Partial MaxSAT instances into Weighted Partial MaxSAT by modifying SoftU as follows:

  (¬u_i, w_i)   for lb + 2 ≤ i ≤ N   (WSoftU)

If we use w_i = 2^{i−(lb+2)}, we naturally introduce a lexicographical preference in the soft constraints. This is a key detail to alter the behaviour of SAT-based MaxSAT algorithms when solving Covering Arrays. If the MaxSAT solver applies the stratified approach (Ansótegui et al. 2012) (see Sect. 8 for more details), it suffices to use w_i = i − (lb + 2) + 1, i.e., to increase the weights linearly. This is of interest since a high number of tests in WSoftU can result in too large weights for some MaxSAT solvers.
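The two weighting schemes can be sketched as follows (indices follow the text: one weight per indicator variable u_i, lb + 2 ≤ i ≤ ub):

```python
def soft_weights(ub, lb, exponential=True):
    """Weights for the soft clauses (not u_i, w_i), one per indicator
    variable u_i with lb + 2 <= i <= ub. The exponential scheme
    w_i = 2^(i - (lb + 2)) enforces a strict lexicographic preference;
    the linear scheme w_i = i - (lb + 2) + 1 suffices when the MaxSAT
    solver applies the stratified approach."""
    return [2 ** (i - (lb + 2)) if exponential else i - (lb + 2) + 1
            for i in range(lb + 2, ub + 1)]

# With ub = 10 and lb = 6 (as in Example 14) there are 3 indicator variables:
print(soft_weights(10, 6))                     # -> [1, 2, 4]
print(soft_weights(10, 6, exponential=False))  # -> [1, 2, 3]
```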
Example 14 We extend our working example to obtain the Partial MaxSAT and Weighted Partial MaxSAT encodings described in this section. Recall that in our example ub = 10 and lb = 6 (see Examples 12 and 13). Therefore, we will have N − (lb + 1) = 10 − (6 + 1) = 3 indicator variables u_8, u_9 and u_10, with SoftU = {(¬u_8, 1), (¬u_9, 1), (¬u_10, 1)} and B_SU = {u_9 → u_8, u_10 → u_9}.
To build the PMSat^{N=10,t=2,S,lb=6}_{C_X} instance, we add the CU constraint to Sat^{N=10,t=2,S}_{C_X}; for the CC_X variant, we add the CCU constraint instead. The weighted counterparts, WPMSat^{N=10,t=2,S,lb=6}_{C_X} and WPMSat^{N=10,t=2,S,lb=6}_{CC_X}, only need to replace SoftU by WSoftU (using w_i = i − (lb + 2) + 1), i.e., {(¬u_8, 1), (¬u_9, 2), (¬u_10, 3)}. To build the resulting MCAC from the MaxSAT solver truth assignment, we discard the x_{i,p,v} variables whose corresponding u_i is assigned to False (i.e., test i does not belong to the solution), and proceed as in Example 12.

Solving the CAN(t, S) problem with MaxSAT
In this section, we show that SAT-based MaxSAT approaches can simulate the CALOT algorithm, while the opposite is not true. This is an interesting insight since the MaxSAT approach additionally provides the option of applying a plethora of MaxSAT algorithms.
Let us first introduce a short description of SAT-based MaxSAT algorithms; for further details, please consult Morgado et al. (2013). Roughly speaking, SAT-based MaxSAT algorithms proceed by reformulating the MaxSAT optimization problem into a sequence of SAT decision problems. Each SAT instance of the sequence encodes whether there exists an assignment to the MaxSAT instance with a cost less than or equal to a certain k. SAT instances with a k less than the optimal cost are unsatisfiable, while the others are satisfiable. The SAT solver is executed in incremental mode to keep the clauses learnt at each iteration over the sequence of SAT instances. Thus, SAT-based MaxSAT solving can also be viewed as a particular application of incremental SAT solving.
There are two main types of SAT-based MaxSAT solvers: (i) model-guided and (ii) core-guided. The first ones iteratively refine (decrease) the upper bound and guide the search with the satisfying assignments (models) obtained from satisfiable SAT instances. The second ones iteratively refine (increase) the lower bound and guide the search with the unsatisfiable cores obtained from unsatisfiable SAT instances. Both have strengths and weaknesses, and hybrid approaches exist (Ansótegui et al. 2016; Ansótegui and Gabàs 2017).

The linear MaxSAT algorithm
The Linear algorithm (Eén and Sörensson 2006; Le Berre and Parrain 2010), described in Algorithm Linear, is a model-guided algorithm for WPMaxSAT.
At each iteration of the Linear algorithm, the SAT instance solved by the incremental SAT solver is composed of: (i) the hard clauses φ_h (line 2), which guarantee that any possible solution is a feasible solution; (ii) the reification of each soft clause (c_i, w_i) as c_i ∨ b_i, where b_i is a fresh auxiliary variable which acts as a collector of the truth value of the soft clause (line 3); and (iii) the CNF translation of the PB constraint ∑_{(c_i,w_i)∈φ_s} w_i·b_i ≤ k, where k = ub − 1 bounds the aggregated cost of the falsified soft clauses, i.e., the value of the objective function.
Initially, ub is set to ∑_{(c_i,w_i)∈φ_s} w_i + 1 (line 4), which is semantically equivalent to ∞. Then, iteratively, if the incremental SAT solver returns satisfiable, ub is updated to the cost of the last model found. A technical point to mention is that the PB constraint is translated into SAT through an incremental PB encoding (line 5), so that whenever we tighten the upper bound, instead of retracting the original PB constraint and encoding the new one, we just need to add some additional clauses (line 10). Additionally, if all the weights of the soft clauses are equal, instead of an incremental PB encoding we can use an incremental cardinality encoding, for which more efficient encodings exist.
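The core loop of the Linear algorithm can be sketched as follows; here the PB bound is checked semantically over brute-force model enumeration rather than translated to CNF, so this is a toy-size illustration of the algorithm's logic only:

```python
from itertools import product

def all_models(clauses, n_vars):
    """Enumerate all models of a CNF by brute force (toy sizes only)."""
    for bits in product([False, True], repeat=n_vars):
        m = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(m[abs(l)] == (l > 0) for l in c) for c in clauses):
            yield m

def linear_maxsat(hard, soft, n_vars):
    """Sketch of the model-guided Linear algorithm: each soft clause
    (c_i, w_i) is reified with a fresh variable b_i (as c_i OR b_i), and the
    bound sum(w_i * b_i) <= ub - 1 is tightened after every model found,
    until the bounded query becomes unsatisfiable."""
    work, weight = [list(c) for c in hard], {}
    for c, w in soft:
        n_vars += 1
        weight[n_vars] = w
        work.append(list(c) + [n_vars])   # reified soft clause: c_i OR b_i
    ub = sum(weight.values()) + 1         # semantically infinity
    best = None
    while True:
        model = next((m for m in all_models(work, n_vars)
                      if sum(w for b, w in weight.items() if m[b]) < ub), None)
        if model is None:
            return ub, best               # ub is now the optimal cost
        ub = sum(w for b, w in weight.items() if model[b])
        best = model

# Hard: (x1 or x2); soft: (not x1, 1) and (not x2, 1). One soft clause must
# be falsified, so the optimal cost is 1.
cost, model = linear_maxsat([[1, 2]], [([-1], 1), ([-2], 1)], 2)
print(cost)  # -> 1
```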

Proposition 6 The Linear algorithm with the Weighted Partial MaxSAT instance WPMSat^{N,t,S,lb}_{CC_X} as input can simulate the CALOT algorithm (excluding lines 9 and 10).
In the first place, notice that in the worst case the Linear algorithm will decrease the current upper bound by one unit, as the CALOT algorithm does. Then, the key point establishing the connection between the Linear algorithm and the CALOT algorithm is to show that, given the same upper bound k to both algorithms, the Linear algorithm can propagate the same set of c_τ^{i−1} variables (line 7 in Algorithm CALOT). Let us recall that the Linear algorithm, with input φ ≡ WPMSat^{N,t,S,lb}_{CC_X}, will generate a sequence of SAT instances composed of the original hard clauses φ_h, the reification of the soft clauses (clauses of the form ¬u_i ∨ b_i), and the CNF translation of the PB constraint ∑ w_i·b_i ≤ k (with w_i = 2^{i−(lb+2)} when using the exponential increase), where k is the current upper bound.
First of all, notice that the weight of a higher index test is strictly greater than the aggregated weights of the lower index tests. Given an upper bound k, an efficient CNF translation of the PB constraint will allow Unit Propagation (UP) to derive that all b_i variables associated with soft clauses of weight greater than k must be False. Then, from the set of clauses that reify the soft clauses (of the form ¬u_i ∨ b_i), UP will also derive that the corresponding u_i variables must be False and, from the set of hard clauses CCU, UP will derive that the corresponding c^{i-1}_τ must be True.
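The effect of such weights is easy to see numerically. Below is a minimal sketch with a hypothetical helper name (`forced_false` is not part of any solver API):

```python
def forced_false(weights, k):
    """Under the bound sum(w_i * b_i) <= k, any b_i with w_i > k must be
    False; with weights where each w_i exceeds the sum of all smaller ones
    (e.g. powers of two), this is exactly what UP derives from a good
    CNF translation of the PB constraint."""
    return [i for i, w in enumerate(weights) if w > k]

weights = [2 ** j for j in range(5)]   # exponential increase: 1, 2, 4, 8, 16
print(forced_false(weights, 6))        # -> [3, 4]: the two heaviest b_i
```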
If the input problem is a Partial MaxSAT instance, i.e., PMSat^{N,t,S,lb}_{CCX}, where the ith soft clause is of the form (¬u_i, 1), the Linear algorithm uses a cardinality constraint instead of a PB constraint to bound the aggregated cost of the falsified soft clauses. In this case, the simulation argument no longer applies: given an upper bound k, UP cannot derive on Σ_{(¬u_i,1)∈φ_s} b_i ≤ k the set of b_i variables that must be False, because all of them correspond to soft clauses of equal weight.
The CALOT algorithm cannot simulate the Linear algorithm While the CALOT algorithm decreases the upper bound by one at each iteration, the Linear algorithm can decrease it more aggressively. This is the case when it finds a model with a cost lower than k − 1 (line 9), which can significantly reduce the number of calls to the SAT solver.

The WPM1 MaxSAT algorithm
The Fu&Malik algorithm (Fu and Malik 2006) is a core-guided SAT-based MaxSAT algorithm for Partial MaxSAT instances. In contrast to the Linear algorithm, which uses the models found to iteratively refine the upper bound, the Fu&Malik algorithm uses the unsatisfiable cores to refine the lower bound. In particular, the initial SAT instance ϕ_0 explored by the Fu&Malik algorithm is composed of the hard clauses φ_h in the input MaxSAT instance plus the SAT clauses c_i extracted from the soft clauses (c_i, w_i). We refer to these c_i clauses as soft-indicator clauses.
At each iteration, if ϕ_k is satisfiable, the optimum is k. If ϕ_k is unsatisfiable, the clauses in the unsatisfiable core retrieved by the SAT solver are analyzed. If none of the clauses is a soft-indicator clause, the Partial MaxSAT formula is declared unsatisfiable and the algorithm stops. Otherwise, the core tells us that we need to relax the soft-indicator clauses, i.e., we need to violate more clauses. To construct the next instance, ϕ_{k+1}, each soft-indicator clause in the core of ϕ_k is relaxed with a fresh auxiliary variable b, and a hard exactly-one (EO) cardinality constraint is added on these new variables, indicating that at least one clause must be violated (this is what the core told us) and at most one clause is violated (this prevents jumping over the optimum).
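The iteration above can be sketched in a few lines. This is an illustration only, with naive enumeration in place of a SAT solver and, for simplicity, the "core" taken to be the full set of soft-indicator clauses (any unsatisfiable core is sound for the algorithm; a real solver returns a much smaller one):

```python
from itertools import product

def sat(clauses, n_vars):
    """Naive satisfiability check by enumeration; returns a model or None."""
    for bits in product([False, True], repeat=n_vars):
        m = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(m[abs(l)] == (l > 0) for l in c) for c in clauses):
            return m
    return None

def fu_malik(hard, soft, n_vars):
    """Core-guided lower-bound search: each UNSAT answer relaxes the
    soft-indicator clauses in the core with fresh variables plus an
    exactly-one constraint, raising the bound by one until satisfiable."""
    soft = [list(c) for c in soft]
    k = 0
    while True:
        if sat(hard + soft, n_vars) is not None:
            return k                     # satisfiable: the optimum is k
        if not soft:
            return float('inf')          # hard clauses alone are unsatisfiable
        bs = []
        for c in soft:                   # naive core: every soft clause
            n_vars += 1
            bs.append(n_vars)
            c.append(n_vars)             # relax with a fresh b variable
        # exactly-one over the fresh bs: at least one violated, at most one
        hard = hard + [bs] + [[-a, -b] for i, a in enumerate(bs)
                              for b in bs[i + 1:]]
        k += 1
```

On the toy instance with hard clause x1 ∨ x2 and softs ¬x1, ¬x2, one relaxation round suffices and the optimum 1 is returned.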
The WPM1 algorithm (Ansótegui et al. 2009; Manquinho et al. 2009) is an extension of the Fu&Malik algorithm that solves Weighted Partial MaxSAT instances by applying the split rule for weighted clauses. In particular, we are interested in using the Stratified WPM1 algorithm (WPM1) (Ansótegui et al. 2012), which clusters the input clauses according to their weights; these clusters were originally named strata in Ansótegui et al. (2012). The algorithm incrementally merges the clusters, solving the related subproblem, until all clusters have been merged. In its simpler version, all the clauses in a cluster have the same weight (called the representative weight), and clusters are added in decreasing order with respect to the representative weight, but other strategies can also be applied (Ansótegui et al. 2012).
In the WPM1 algorithm, variable φ_wk represents the formula that contains the clusters (strata) merged so far, while φ_re represents the remaining weighted clauses from the original input instance φ. Whenever we have solved to optimality the current instance φ_wk, i.e., the SAT solver returned a SAT answer in the last call (line 4) but φ_re ≠ ∅, function next_stratum updates variable φ_st to the new stratum (cluster), which is merged into the working SAT instance (line 5), and variables φ_wk and φ_re are updated accordingly (line 6). Otherwise, the SAT solver returned UNSAT in the previous call, meaning that we are still optimizing the current subproblem φ_wk and need to call the SAT solver again (line 7).
If the SAT solver returns a SAT answer and all the original clauses in φ have been considered, i.e., φ_re = ∅, then we have optimized the input instance φ and return its cost and an optimal model (line 8).
If the SAT solver returns an UNSAT answer, first we analyze the unsatisfiable core returned by the SAT solver (line 10) and return the soft-indicator clauses to be relaxed in variable to_relax, if any; otherwise, we have certified that the set of hard clauses is unsatisfiable, i.e., we return cost ∞ and an empty model.
Function split_and_relax (line 11) first applies the split rule to the soft-indicator clauses in to_relax and generates two sets, one where all the clauses are normalized to have the minimum weight, and another with the residuals of each clause with respect to the minimum weight in to_relax. Second, the set of clauses with the minimum weight are extended, each with an additional fresh variable and stored in the set relaxed as in the Fu&Malik algorithm. The new fresh variables are returned in set B.
Finally, the original set of clauses to_relax is retracted from the SAT solver (line 12), and the new set relaxed is added to the working SAT instance plus the cardinality constraint that increases the lower bound, as in the Fu&Malik algorithm (line 13). In line 14, φ_wk is updated to reflect the changes in the SAT working formula, and the remaining formula φ_re is extended with the residuals generated from the application of the split rule.
As a final remark, notice that if the statements in grey boxes of the WPM1 algorithm are erased and function next_stratum is instructed to report sequentially, first the hard clauses and then the soft clauses, we get the original Fu&Malik algorithm.
In the context of the Covering Array Number problem, the Fu&Malik algorithm on the PMSat^{N,t,S,lb}_{CCX} instance will perform a bottom-up search, i.e., the first query will correspond to the question of whether the covering array can be constructed with k = 0 tests, then with k = 1 tests, etc. This approach does not provide any intermediate upper bounds, since the only query answered positively corresponds to the optimum.
However, interestingly, by considering the weighted version of the Fu&Malik algorithm, we can perform a top-down search on the Covering Array problem and provide intermediate upper bounds.

Proposition 8 The Stratified WPM1 algorithm with input WPMSat^{N,t,S,lb}_{CCX} can simulate the CALOT algorithm (excluding lines 9 and 10).
Back in the context of covering arrays, each cluster in WPMSat^{N,t,S,lb}_{CCX} would be composed of a single soft clause (¬u_i, w_i), except for the cluster containing all the hard clauses. The first subproblem seen by the Stratified WPM1 algorithm encodes the query of whether one can build a covering array using N tests. The next subproblem incorporates the first soft clause (¬u_N, w_N) and encodes the query of whether one can construct the covering array using N − 1 tests. Notice that each ¬u_i will propagate, according to CCU, the corresponding c^{i-1}_τ variables as in the CALOT algorithm. Notice also that every solution of a subproblem is an upper bound for the covering array.
The discussion of this section has provided insights into how to solve Covering Arrays through MaxSAT, but also into how to fix similar difficulties in other problems where MaxSAT is not yet effective enough.

Test-based Streamliners for the CAN(t, S) problem
Notice that a solution for a CAN(t, S) problem can be extended to multiple solutions in the previous MaxSAT translations. This happens when CAN(t, S) < N, since the assignment to the x variables related to any test i with i > CAN(t, S) (useless from the point of view of the CAN(t, S) problem) still needs to be consistent with the X and SUT_X constraints. In general, notice that SUT_X can be NP-complete.
Lines 9 and 10 of the CALOT algorithm, as described in Sect. 6, fix that problem, but cannot be directly applied within MaxSAT algorithms, since the solver is not aware of the CAN(t, S) problem semantics.
However, we can reproduce a similar effect. At the preprocessing step, we can build a dummy test case υ by computing a solution to S_ϕ (e.g., with a SAT solver), or select any of the test cases in the solution returned by the ACTS tool when computing the upper bound (see Sect. 4). Then, we can state in the MaxSAT encoding that if a given test i is not part of the optimal solution (i.e., u_i is False), then the corresponding x variables are set to the values in the test case υ.
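This amounts to adding one binary clause per dropped test and parameter. A sketch of the clause generation follows; the variable numbering and the helper names (`streamliner_clauses`, `x_var`) are illustrative only, not part of the paper's encoder:

```python
def streamliner_clauses(u_vars, x_var, dummy_test):
    """If test i is dropped (u_i False), fix its row to the dummy test case:
    one clause (u_i v x_{i,p,v}) per parameter p with value v = dummy_test[p]."""
    clauses = []
    for i, u in enumerate(u_vars):
        for p, v in enumerate(dummy_test):
            clauses.append([u, x_var(i, p, v)])
    return clauses

# toy numbering: 2 optional tests, 3 binary parameters, dummy test (0, 1, 0)
x = lambda i, p, v: 100 + i * 10 + p * 2 + v
clauses = streamliner_clauses([1, 2], x, [0, 1, 0])
print(len(clauses), clauses[0])   # -> 6 [1, 100]
```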
The dummy test case υ plays exactly the role of the so-called streamliner constraints (Gomes and Sellmann 2004), which rule out some of the possible solutions but make the search for the remaining solutions more efficient.
There is yet another way to mitigate that potential bottleneck. We can indeed extend the SUT_X clauses for test i with the literal ¬u_i. Therefore, whenever test i is no longer in the optimal solution (i.e., u_i is False), the corresponding SUT constraints are trivially satisfied. However, in the experimental investigation, we confirmed that this option is less efficient than adding the NU_X clauses.

The T(N; t, S) problem as weighted partial MaxSAT
For some applications, we may not be able to use as many test cases as the covering array number (e.g. due to budget restrictions), but we may still be interested in solving the Tuple Number problem, i.e., to determine the maximum number of covered t-tuples we can get with a test suite of fixed size.
Once again, MaxSAT technology can play an important role when SUT constraints are considered. Moreover, the size of the SAT/MaxSAT encodings for this problem is smaller than that of the encodings for computing the Covering Array Number, since fewer tests are taken into consideration.
In the following, we show how to modify the Sat^{N,t,S}_{CX} and Sat^{N,t,S}_{CCX} formulae to become Partial MaxSAT encodings of the Tuple Number problem.
The basic idea is that we need to soften the hard restriction that enforces all allowed t-tuples to be covered. To this end, we modify the SAT instance Sat^{N,t,S}_{CX} as follows. First, we soften all the clauses from Eq. C, which encode that every t-tuple τ must be covered by at least one test case, therefore allowing these constraints to be violated (or relaxed). For the sake of clarity, although not required for soundness, we introduce a new set of indicator variables c_τ that reify each ALO constraint in Eq. C through a set of hard constraints (RC). Then, we add the set of soft clauses (c_τ, 1), one per allowed t-tuple τ. Finally, we replace in Sat^{N,t,S}_{CX} the set of constraints C (the hard constraints that forced covering all the tuples) by the previous two sets of constraints.
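The objective this Partial MaxSAT encoding maximises, the number of covered t-tuples, can be checked semantically on a toy SUT for t = 2. The sketch below enumerates test suites instead of calling a MaxSAT solver, which is feasible only at toy scale; the helper names are illustrative:

```python
from itertools import product, combinations

def pairs_of(test):
    """Set of 2-tuples ((p1, v1), (p2, v2)) covered by a single test."""
    return {((p1, test[p1]), (p2, test[p2]))
            for p1, p2 in combinations(range(len(test)), 2)}

def tuple_number(domains, sut_ok, N):
    """T(N; 2, S): maximum number of pairs coverable with N tests that
    satisfy the SUT constraints (predicate sut_ok)."""
    tests = [t for t in product(*[range(d) for d in domains]) if sut_ok(t)]
    return max(len(set().union(*(pairs_of(t) for t in suite)))
               for suite in combinations(tests, N))

# 3 binary parameters; the SUT forbids p0 = p1 = 1; two tests cover at most
# 3 + 3 = 6 pairs, and a suite achieving 6 exists
print(tuple_number([2, 2, 2], lambda t: not (t[0] == 1 and t[1] == 1), 2))  # -> 6
```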

Proposition 9 Let S be a SUT model and let T P M Sat
Remark 4 Even if N > lb, we cannot use fixed-tuple symmetry breaking, since we do not know whether the t-tuples that we fix will lead to an optimal solution. Therefore, fixed-tuple symmetry breaking is disabled for all the encodings in this section.

Remark 5
When computing the tuple number, we can avoid the step of detecting all forbidden tuples, since the encoding remains sound, i.e., we can interchange T_a with T. Notice that the c_τ variables related to forbidden tuples will always be set to False. Moreover, notice that a core-guided algorithm may easily detect as many unsatisfiable cores as there are forbidden tuples, each including just the unit soft clause that represents the forbidden tuple.
In case we want to extend Sat^{N,t,S}_{CCX} to compute the tuple number, we just need to notice that the previously defined role of c_τ corresponds exactly to variable c^N_τ in Sat^{N,t,S}_{CCX}, so we just need to soften the hard unit clauses (c^N_τ) (described in CCX) with weight 1. In what follows, we present two extensions.

Combining the CAN(t, S) and T(N; t, S) problems
The Covering Array and Tuple Number problems naturally suggest a more general formulation of the optimization problem, where we want to maximize the number of covered t-tuples while minimizing the number of test cases. Notice that whether we are, in essence, solving the Covering Array Number or the Tuple Number problem will depend on the value of N with respect to the covering array number (not necessarily known a priori).
To this end, we take the PMSat^{N,t,S,lb}_{CX} encoding of the Covering Array Number problem for a SUT model S, N tests and strength t. As shown earlier in this section, we first replace the set of hard constraints C by RC and SoftC_WU.
{(c_τ, |U| + 1) : τ ∈ T_a}, where |U| is the number of u_i variables. (SoftC_WU)
Notice that we prefer violating all soft clauses (¬u_i, 1) over violating a single soft clause (c_τ, |U| + 1). This way, we guarantee that any solution to our new Weighted Partial MaxSAT instance maximises the number of covered t-tuples while minimising the number of needed test cases.
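The weight choice can be sanity-checked arithmetically. A minimal sketch, where `cost` is simply the objective of the weighted encoding and a falsified (¬u_i, 1) means test i is used:

```python
def cost(uncovered, used_tests, num_u):
    """Objective under the weighted encoding: each uncovered tuple costs
    num_u + 1 (its soft clause (c_tau, |U|+1) is falsified), while each
    test kept in the solution costs 1 (its soft clause (-u_i, 1) is)."""
    return uncovered * (num_u + 1) + used_tests

num_u = 10
# losing a single tuple (cost 11) is worse than keeping every test (cost 10),
# so optimal solutions maximise coverage first, then minimise tests
assert cost(1, 0, num_u) > cost(0, num_u, num_u)
print(cost(1, 0, num_u), cost(0, num_u, num_u))  # -> 11 10
```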

Proposition 11
If N ≥ CAN(t, S), the optimal cost of the Weighted Partial MaxSAT instance PMSat^{N,t,S,lb}
The same idea can be applied to PMSat^{N,t,S,lb}_{CCX} by softening the unit hard clauses (c^N_τ) in equation (b) from CCX with weight |U| + 1. Here, it is important to recall the discussion in Sect. 3 on the need for equation (c) in CCX. The other, perhaps more natural, alternative was to replace equation (c) in CCX by ⋀_{τ∈T_a} ¬c^0_τ. The problem arises when, in an optimal solution, τ is not covered, which also implies that c^N_τ is False. Notice that we need to satisfy all clauses related to τ in CCX but, in order to do that, we need to set all c^i_τ variables to False. This may not be compatible with equation CCU (clauses of the form ¬c^{i-1}_τ → u_i) when some test i is discarded from the solution and variable u_i is set to False, since UP will derive in CCU that c^{i-1}_τ is True. In this case, a contradiction is reached. On the other hand, as discussed in Sect. 3, equation (c) allows setting all c^i_τ variables to True when c^N_τ is False, trivially satisfying all clauses in CCX related to τ.

The CAN(t, S) problem with relaxed tuple ratio coverage as MaxSAT
We can tackle other realistic settings where we still want to use the minimum number of tests, but there is no need to achieve a 100% ratio of covered t-tuples (mandatory by definition in Covering Arrays). Notice that the last tests that shape the covering array number tend to cover very few not-yet-covered t-tuples. Therefore, if these tests are expensive enough in our setting, we may consider relaxing the coverage ratio and skipping these tests. The mentioned problem can be encoded by replacing the previous soft constraints on the c_τ variables with a hard cardinality constraint on the minimum number of t-tuples to be covered: Σ_{τ∈T_a} c_τ ≥ rt · |T_a| (CCard), where rt is the ratio of allowed t-tuples that we want to cover. Notice that, for efficiency reasons, CCard can also be described as Σ_{τ∈T_a} ¬c_τ ≤ |T_a| · (1 − rt).
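The at-most form of CCard directly gives a budget of tuples that may remain uncovered. A minimal sketch (the helper name is illustrative):

```python
import math

def max_uncovered(num_allowed, rt):
    """At-most form of CCard: sum over T_a of (not c_tau) <= |T_a| * (1 - rt),
    i.e. at most this many allowed tuples may stay uncovered."""
    return math.floor(num_allowed * (1 - rt))

# with |T_a| = 1000 allowed pairs and a required ratio rt = 0.95,
# up to 50 pairs may remain uncovered
print(max_uncovered(1000, 0.95))  # -> 50
```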

Remark 6
With this formulation, we cannot use fixed-tuple symmetry breaking, since we do not know whether we will require at least lb tests to cover the specified ratio of allowed t-tuples.

Incomplete MaxSAT algorithms for the T(N; t, S) problem
As argued earlier, if certifying optimality is not a requirement and we are just interested in obtaining a good suboptimal solution in a reasonable amount of time, we can apply incomplete MaxSAT algorithms on the encodings of the Tuple Number problem described in the previous section. Additionally, in this section, we present a new incomplete algorithm to compute suboptimal solutions for the Tuple Number problem.

MaxSAT based incremental test suite construction
A way to reduce the search space of any constraint problem is to add the so-called streamliner constraints (Gomes and Sellmann 2004). We recall that these constraints rule out some of the possible solutions but make the search for the remaining solutions more efficient. However, in practice, streamliners can rule out all the solutions.
In our context, the streamliner constraints correspond to a set of tests that we think have the potential to be part of optimal solutions. By fixing these tests, we generate a new covering array problem, easier to solve, but whose Covering Array Number can be greater than or equal to that of the original covering array, because we may have missed all the optimal solutions. We iterate this process until all t-tuples get covered. To select the k candidate tests to be fixed at each iteration, we solve the Tuple Number problem restricted to length k.
In the context of the Tuple Number problem, this iterative process of fixing tests should not only finish when all t-tuples have been covered but also when the requested N tests have been fixed.
To that end, here we combine a greedy iterative approach with the SAT-based MaxSAT approaches from Sect. 9 in the IncrementalCA algorithm.

Input: SUT model S, tests N_i per iteration, SAT-based MaxSAT solver msat
In this algorithm, we begin with the remaining tuples to cover T_r, initially assigned the allowed tuples T_a, as well as an empty test suite Υ (line 2). Then, we first check how many tests should be encoded: the minimum between the tests per iteration N_i and the number of tests left to complete the test suite, N − |Υ| (line 4), storing the result in N′. Next, we solve the Tuple Number problem for these N′ tests, encoded as a TPMSat^{N′,t,S}_{CCX} formula (lines 5, 6) from Sect. 9. We extract the model from the MaxSAT solver, interpreting it into the newly found test cases υ (line 7). Then, those new tests are added to the test suite Υ (line 8). Finally, the tuples covered by these new test cases are removed from T_r (line 9). This process is repeated until no more tuples are left in T_r or we have reached the requested N test cases (line 3), in which case we return the constructed test suite Υ (line 10).
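The loop structure of IncrementalCA can be sketched as follows. This is an illustration under simplifying assumptions: the exact TPMSat call is replaced by exhaustive search over N′-test suites (feasible only at toy scale), strength is fixed to t = 2, and the helper names are hypothetical.

```python
from itertools import product, combinations

def pairs_of(test):
    """Pairs ((p1, v1), (p2, v2)) covered by one test (strength t = 2)."""
    return {((p1, test[p1]), (p2, test[p2]))
            for p1, p2 in combinations(range(len(test)), 2)}

def incremental_ca(domains, sut_ok, N, Ni):
    """IncrementalCA sketch: each round fixes the N' tests that cover the
    most remaining pairs, then removes the newly covered pairs from T_r."""
    candidates = [t for t in product(*[range(d) for d in domains]) if sut_ok(t)]
    remaining = set().union(*(pairs_of(t) for t in candidates))  # T_r := T_a
    suite = []
    while remaining and len(suite) < N:
        n = min(Ni, N - len(suite))                  # N' = min(N_i, N - |suite|)
        best = max(combinations(candidates, n),      # stands in for the MaxSAT call
                   key=lambda s: len(remaining &
                                     set().union(*(pairs_of(t) for t in s))))
        suite.extend(best)                           # fix the selected tests
        remaining -= set().union(*(pairs_of(t) for t in best))
    return suite

suite = incremental_ca([2, 2], lambda t: True, 4, 2)
print(len(suite))  # -> 4: all four value pairs of the two parameters covered
```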

Experimental evaluation
In this section, we report on an extensive experimental investigation conducted to assess the approaches proposed in the preceding sections. We start by defining the benchmarks, which include 28 industrial, real-world or real-life instances and 30 crafted instances, and the algorithms involved in the evaluation.
We contacted the authors of Yamada et al. (2015) and Yamada et al. (2016) to obtain the benchmarks used in their experiments. In particular, the available benchmarks are: (i) Cohen et al. (2008), with 5 real-world and 30 artificially generated (crafted) covering array problems; (ii) Segall et al. (2011), with 20 industrial instances; (iii) Yu et al. (2015), with two real-life systems reported by ACTS users; and (iv) Yamada et al. (2016), with an industrial instance named "Company_B". Table 4 provides information about the System Under Test of each instance, where S_P is the number of parameters and their domains (e.g., 2^29 3^1 in instance 7 means that the instance contains 29 parameters of domain 2 and 1 parameter of domain 3); S_ϕ is the number of SUT constraints and their sizes (e.g., 2^13 3^2 in instance 7 means that the instance contains 13 constraints of size 2 and 2 constraints of size 3); and #lits CNF(S_ϕ) is the number of literals of the CNF representation of S_ϕ (i.e., the sum of the sizes of all clauses). Table 4 also reports, for t = 2, the following data: ub_ACTS, which indicates the upper bound returned by the ACTS tool (see Sect. 4); ub, which is the best known upper bound (a star indicates that it is optimal, i.e., CAN(2, S)); lb, which reports the lower bound (computed as in Sect. 4); and |T_a| and |T_f|, which report the number of allowed and forbidden tuples, respectively.
Finally, we also show, for the PMSat^{N,t=2,S,lb}_{CCX} encoding of each instance, the following information: #vars, the number of variables used by this encoding; #clauses, the number of clauses; #lits, the number of literals; and size (MB), the file size of the WCNF formula in MB.
Notice that in this paper we focus on strength t = 2 coverage. Regarding existing tools for solving Mixed Covering Arrays with Constraints, the main tool we compare with is CALOT (Yamada et al. 2015). Unfortunately, CALOT is not available from its authors, but we did our best to reproduce it (see Sect. 6), and our experimental investigation shows that the results are consistent with those of Yamada et al. (2015). Our implementation of CALOT and all the algorithms presented in this paper can be found at http://hardlog.udl.cat/static/doc/inc-maxsat-ct/html/index.html, which we think is also a nice contribution for both the combinatorial testing and satisfiability communities.
Since all the algorithms presented in this paper are built on top of a SAT solver, we compared, when possible, all the algorithms with the same underlying SAT solver. That is not the case in Yamada et al. (2015), which may lead to flawed conclusions. In our experimental investigation we chose Glucose (version 4.1) (Audemard et al. 2013), as most state-of-the-art MaxSAT solvers are built on top of it.
We also use the ACTS tool (Borazjany et al. 2012) to compute fast and good enough upper bounds of the Covering Array Number problem, although it is not competitive with SAT-based approaches.
The environment of execution consists of a computer cluster with machines equipped with two Intel Xeon Silver 4110 (octa-core processors at 2.1GHz, 11MB cache memory) and 96GB DDR4 main memory. Unless otherwise stated, all the experiments were executed with a timeout of 2h and a memory limit of 18GB. To mitigate the impact of randomness, we executed all the algorithms using five different seeds for each instance.

The rest of the experimental section is organized as follows. Regarding the Covering Array Number, in Sect. 11.1, we compare the CALOT algorithm with the MaxSAT encodings and SAT-based MaxSAT approaches described in Sects. 7 and 8. Regarding the Tuple Number problem, in Sect. 11.2, we evaluate the complete and incomplete MaxSAT algorithms on the encoding described in Sect. 9. Then, in Sect. 11.3, we evaluate the incomplete approach for computing the Tuple Number described in Sect. 10.

SAT-based MaxSAT approaches for the covering array number problem
In this experiment, we compare the performance of state-of-the-art SAT-based MaxSAT solvers with the CALOT algorithm described in Sect. 6. We hypothesise that since these SAT-based MaxSAT algorithms, once executed on the suitable MaxSAT encodings, can simulate the behaviour of the CALOT algorithm (see Propositions 6 and 8) while the opposite is not true, MaxSAT algorithms may perform similarly to or outperform the CALOT algorithm. This hypothesis would contradict the findings in Yamada et al. (2015), where it was reported that the CALOT algorithm clearly dominates the MaxSAT-based approach evaluated there. If our hypothesis is correct, MaxSAT approaches for solving the Covering Array Number problem would be put back on the agenda. We focus on anytime algorithms, which must be able to report suboptimal solutions.

Solvers The CALOT algorithm (described in Sect. 6) and the model-guided Linear SAT-based MaxSAT algorithm Linear (described in Sect. 8) were implemented on top of the OptiLog (Ansótegui et al. 2021) Python framework for SAT solving. This framework includes Python bindings for several state-of-the-art SAT solvers and a Python binding to PBLib (Logic and Optimization Group 2019).
We additionally tested several complete and incomplete algorithms from the MaxSAT Evaluation 2020. Among complete MaxSAT solvers, we tested MaxHS (Bacchus 2020), EvalMaxSAT (Avellaneda 2020), RC2 (Ignatiev 2020) and maxino (Alviano et al. 2015). We only report results for RC2 and one seed, as this was the complete solver that reported better results. MaxHS obtained the best results for 2 of the tested instances, but we decided to exclude it from the comparison since it cannot report upper bounds for most of the instances and it uses an underlying SAT solver other than Glucose41.
Regarding incomplete MaxSAT algorithms, we tested Loandra (Berg et al. 2020), tt-open-wbo-inc (Nadel 2020) and SatLike (Lei and Cai 2018). We report results for Loandra and tt-open-wbo-inc, as SatLike crashed on some of the tested instances.
MaxSAT encodings We report results on PMSat^{N,t,S,lb}_{CCX} and the weighted version WPMSat^{N,t,S,lb}_{CCX} using a linear increase for the weights (w_i = i − (lb + 2) + 1, see Eq. WSoftU in Sect. 7). We found that WPMSat^{N,t,S,lb}_{CCX} with the linear and the exponential increase (w_i = 2^{i−(lb+2)}) led to the same performance, but the exponential increase represented a problem for some MaxSAT solvers when i was high enough.
We further tested the three different alternatives for equation (a) from CCX, of which two reported good results. The first one is the original equation (a) shown in Sect. 3, which we will refer to as a.0. The second one is the variation which we will refer to as a.1.

Results Table 5 shows the results of our experimentation. For each row and solver column, we give the average size of the minimum MCAC (out of the 5 executions per instance) and the average runtime. Bold values represent the best results. In case of ties in size, the best time is marked. Sizes with a star indicate that the optimum has been certified in at least one of the five seeds executed for that benchmark instance. Table 6 aggregates the information presented in Table 5 to analyze the dominance relations among approaches. In particular, we show for each row the number of wins (W) and losses (L) with respect to each of the approaches in columns, for both sizes and runtimes. We consider that if algorithm A finds a smaller MCAC than B, then A also needs less runtime than B. In this sense, we say that an approach outperforms another if it provides a strictly better solution within the given timeout or finds the same best suboptimal solution faster. For example, in the ACTS row we find that it obtains worse sizes than CALOT CCX a.0 in 52 instances (0 W, 52 L in column size), better runtimes in 2 and worse runtimes in 56 (2 W, 56 L in column time).
We observe that both tt-open-wbo-inc and Loandra outperform the results obtained by CALOT, improving the sizes in more than 10 of the 58 available instances; in the case of tt-open-wbo-inc, we also improve the runtimes in more than 40 instances. This confirms our hypothesis that MaxSAT approaches can simulate and even improve on the results obtained by the CALOT algorithm.
Regarding the different variations of the CCX encoding, we notice that for tt-open-wbo-inc and Loandra, variation a.1 slightly improves the results obtained by the original variation a.0. In particular, we observe that tt-open-wbo-inc with this specific encoding obtains the best size in instance RL-B (727), while the CALOT algorithm reports a size of 760. However, this behaviour of encoding a.1 is not observed in the CALOT algorithm, as in this case the best variation of equation (a) seems to be a.0. These results suggest that when using a new MaxSAT solver we should not discard any encoding variation upfront.
For the RC2 and Linear approaches, we can observe clear differences between them when applying the PMSat^{N,t,S,lb}_{CCX} encoding, as Linear obtains better sizes and times in 21 and 57 instances, respectively. These results show that, for the Covering Array Number problem, it is more effective to perform a search that iteratively refines the upper bound, as the Linear approach does (see Sect. 8). However, we observe a substantial improvement when using WPMSat^{N,t,S,lb}_{CCX} with the RC2 MaxSAT solver, improving the sizes obtained by its unweighted counterpart in 19 of the 58 instances, which produces results similar to the CALOT and PMSat^{N,t,S,lb}_{CCX} Linear approaches. This is expected, since the weighted version forces RC2 to perform a top-down search, as discussed in Sect. 8.
We also tested the WPMSat^{N,t,S,lb}_{CCX} encoding with tt-open-wbo-inc, a non-core-guided MaxSAT solver. We observe that the results are similar to or slightly worse than with the unweighted encoding. We believe the WPMSat^{N,t,S,lb}_{CCX} encoding is more useful for core-guided MaxSAT solvers, as it modifies their refinement strategy (i.e., they improve the upper bound instead of the lower bound). We also observed that refining the lower bound for the Covering Array Number problem is more challenging than refining the upper bound, as there are some instances where the PMSat^{N,t,S,lb}_{CCX} encoding with RC2 (which refines the lower bound) is not able to report any result, usually instances where the CAN(t, S) is not found.

Weighted partial MaxSAT approaches for the tuple number problem
Encouraged by the good results of the proposed MaxSAT approaches for the Covering Array Number problem, we now evaluate the MaxSAT approach described in Sect. 9 on SAT-based MaxSAT approaches for solving the Tuple Number problem. Notice that the CALOT algorithm only works for solving the Covering Array Number problem. In this sense, this is a pioneering work on applying SAT technology to solve the Tuple Number problem.
Solvers We chose the tt-open-wbo-inc MaxSAT solver to perform these experiments, as it was the approach that achieved the best results in Sect. 11.1.
MaxSAT encodings We recall that there are also some variations of equation (a) from CCX (see Sect. 11.1). A further variation, to which we refer as a.2, also reported good results, while variation a.1 did not and was excluded.
We additionally noticed that, when computing the tuple number, the cost of the solution returned by the MaxSAT solver when using the original encoding of equation (a) in CCX, (c^i_τ → c^{i-1}_τ ∨ x_{i,p,v}), can indeed overestimate the real cost of the solution induced by the value of the x_{i,p,v} variables, i.e., the assignments that represent the actual tests used in the solution. This can happen since it is possible to set a c^i_τ to False even if the right-hand side of the implication is True. Enforcing the other side of the implication corrects this issue. For these reasons, we will use variation a.2 in this section.

Results We would like to study the evolution of the number of covered tuples as a function of the number of tests, as we hypothesise that adding a new test close to the Covering Array Number (which guarantees all tuples can be covered) will allow covering very few additional tuples. In that sense, if these tests are expensive enough, they will not pay off in terms of the available budget and the additional percentage of coverage we can achieve.
In Fig. 1, we show the number of tests required to reach a certain percentage of the tuples to cover for the tt-open-wbo-inc approach. Notice that tt-open-wbo-inc is an incomplete MaxSAT solver and we are therefore reporting a lower bound on the possible percentage by a particular number of tests. For lack of space, we only show the most representative instances of all the benchmark families.
We observe, for all the tested instances, that most of the tuples are covered using a relatively small number of tests, and the remaining tuples require a relatively large additional number of tests. In our experiments, with only 52% of the tests required by the covering arrays reported in Table 5 in Sect. 11.1, we are able to reach 95% coverage, whereas the remaining 5% of the tuples need the remaining 48% of the tests.
We also notice that the Tuple Number problem is more challenging than the Covering Array Number problem. According to some experimentation that we performed using complete MaxSAT solvers, none of the tested approaches has been able to certify any optimum for N > 1, even for the instances that were easy to solve for the Covering Array Number problem.
Another interesting observation is the erratic behaviour on the RL-B instance (Yu et al. 2015) (Fig. 1, bottom right). RL-B is the biggest instance in the available benchmarks, with 27 parameters, domains of size up to 37, and a suboptimal solution for the Covering Array Number (for t = 2) of 727 tests. After 100 tests, the results for the Tuple Number problem become quite unstable, in contrast to the behaviour on the rest of the instances. This phenomenon points out that the approach analyzed in this section has some limitations when instances are large enough. For a fixed set of parameters, instances grow when we increase the strength t or, as in this case, the number of tests.
To conclude this section, we have confirmed that MaxSAT is a good approach to solve the Tuple Number problem with constraints. We have also observed that with a relatively small number of tests we can cover most of the tuples, and that this approach can be useful for medium-sized instances that do not need a large number of tests to reach a reasonable coverage percentage.
In the next section, we explore the Incremental Test Suite Construction for the Tuple Number problem described in Sect. 10.1. It allows us to tackle more efficiently those Tuple Number problems involving a relatively large number of tests.

MaxSAT-based incremental test suite construction for T(N; t, S)
In Sect. 11.2, we analyzed an approach that can be used to maximise the number of tuples covered by a number of tests smaller than CAN(t, S). However, we have seen that it becomes less efficient when the Tuple Number problem must be solved for a large enough number of tests.
Solving approaches Here we propose three incomplete alternatives for solving the Tuple Number problem, with the aim of improving the results obtained in Sect. 11.2. Our hypothesis is that incomplete approaches are better suited to solving bigger instances.
The first approach is the greedy algorithm presented in Yamada et al. (2016), referred to as maxh-its. This algorithm incrementally adds one test at a time. Each test is constructed through a heuristic (Czerwonka 2006) that tries to increase the number of tuples covered so far by selecting, at each step, the parameter tuple with the most value tuples yet to be covered.
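As a rough sketch of this kind of one-test-at-a-time greedy construction, the following simplified pairwise (t = 2) variant is our own illustration, ignoring SUT constraints; it is not the exact heuristic of the cited works. Each parameter is assigned, in order, the value that belongs to the most still-uncovered value pairs consistent with the partial test.

```python
# Simplified sketch of a greedy one-test-at-a-time construction for t = 2,
# ignoring SUT constraints (all names are ours, not the cited tool's).
from itertools import combinations, product

def all_pairs(domains):
    # Every value pair over every pair of parameters (p < q).
    return {((p, v), (q, w))
            for p, q in combinations(range(len(domains)), 2)
            for v, w in product(range(domains[p]), range(domains[q]))}

def score(p, v, test, uncovered):
    # Uncovered pairs containing (p, v) that are still reachable given
    # the values already fixed in the partial test.
    s = 0
    for a, b in uncovered:
        if (p, v) in (a, b):
            q, w = b if a == (p, v) else a
            if q >= len(test) or test[q] == w:
                s += 1
    return s

def greedy_suite(domains, max_tests):
    uncovered, suite = all_pairs(domains), []
    while uncovered and len(suite) < max_tests:
        test = []
        for p, dom in enumerate(domains):
            test.append(max(range(dom),
                            key=lambda v: score(p, v, test, uncovered)))
        suite.append(test)
        uncovered -= {((p, test[p]), (q, test[q]))
                      for p, q in combinations(range(len(test)), 2)}
    return suite, uncovered

suite, left = greedy_suite([2, 2, 2], max_tests=10)
print(len(suite), len(left))  # → 4 0
```

On this tiny example (three Boolean parameters), the greedy sketch happens to reach the optimal pairwise covering array of 4 tests; in general, greedy constructions only give upper bounds.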
The second approach is the Incremental Test Suite Construction from Sect. 10.1 (referred to here as maxsat-its), which also adds one test at a time, but this test is built by solving the Tuple Number problem through an incomplete MaxSAT solver instead of using a heuristic as in the previous approach.
In the third approach, instead of a MaxSAT query as in the second approach, we apply a SAT query that returns a test covering at least one tuple not yet covered by the incremental test suite built so far (referred to as sat-its).
We also evaluate the approach described in Sect. 9.2. The idea is to relax the Covering Array Number problem by requiring only 95% of the allowed tuples (τ_a) to be covered. We refer to this approach as mints-95%|τ_a|. As for the Covering Array Number problem, we use the upper bound returned by the ACTS tool (see Sect. 4) as the initial number of tests.
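The relaxation idea behind mints-95%|τ_a| can be illustrated with a simple greedy stand-in (our own sketch, not the paper's exact MaxSAT minimization): starting from a complete suite, drop tests as long as at least 95% of the pairwise tuples remain covered.

```python
# Greedy stand-in for the mints-95% relaxation: shrink an existing suite
# while keeping >= 95% of the pairwise tuples covered (our sketch only).
from itertools import combinations

def pairs_of(test):
    return {((p, test[p]), (q, test[q]))
            for p, q in combinations(range(len(test)), 2)}

def shrink(suite, all_tuples, ratio=0.95):
    kept = list(suite)
    need = ratio * len(all_tuples)
    for test in list(kept):
        rest = [t for t in kept if t is not test]
        covered = set().union(set(), *(pairs_of(t) for t in rest))
        if len(covered & all_tuples) >= need:
            kept = rest            # dropping this test keeps the target
    return kept

# A pairwise-complete suite for three Boolean parameters, plus one
# redundant duplicate test that the shrinking step can remove.
suite = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0]]
tuples = set().union(*(pairs_of(t) for t in suite))
kept = shrink(suite, tuples)
print(len(kept))  # → 4
```

Here only the duplicate can be dropped: removing any of the four distinct tests would leave 9 of the 12 pairs covered, below the 95% threshold. The exact approach instead minimizes the number of tests directly with a MaxSAT query.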
Results: We present the relative performance of the previous four approaches with respect to the best incomplete MaxSAT approach (tt-open-wbo-inc) for solving the Tuple Number problem from Sect. 11.2, referred to as T(N; t, S); the values reported for T(N; t, S) correspond to suboptimal solutions. All the approaches shown in this section also use the incomplete SAT-based MaxSAT solver tt-open-wbo-inc, except sat-its, which uses the Glucose41 SAT solver. For the encoding of equation (a) of CC_X we use variation a.2 (c^i_τ ↔ c^{i−1}_τ ∨ x_{i,p,v}), as in Sect. 11.2.

To perform a fair comparison, we tried to execute all the algorithms under the same runtime conditions. We use as a reference the runtime that maxsat-its needs to cover all the allowed tuples. In more detail, we set a timeout of 100 seconds for each iteration of the maxsat-its approach. Therefore, the total runtime in seconds consumed by maxsat-its is the number of tests it reaches multiplied by 100. For maxh-its and sat-its, the timeout is the total runtime consumed by maxsat-its. For mints-95%|τ_a|, we use as timeout the runtime consumed by T(N; t, S) to reach 95% coverage. Finally, for T(N; t, S), we use a timeout of N · 100 seconds for each N. Notice that in this last case, for a given N, both T(N; t, S) and maxsat-its have the same execution time limit. All approaches have been executed with 3 seeds and the mean is reported.

The experimental results are presented in Figs. 2 and 3. As in Sect. 11.2, we only plot the most representative instances. Figure 2 shows the increment (or decrement) in the number of tests required by maxsat-its, maxh-its and mints-95%|τ_a| to cover the same number of tuples as T(N; t, S). Figure 3 shows the increment (or decrement) in the number of tests required to reach the same coverage ratio as T(N; t, S).
For the sat-its approach, we found that in most cases it covers only one tuple per test, so we excluded these results from the figures, as sat-its was clearly outperformed by the rest of the presented approaches.
In both figures, we plot vertical lines to mark the points where T(N; t, S) reaches 95% and 100% of the tuples covered.
In general, maxsat-its clearly outperforms maxh-its. This is to be expected, since the aim of the incremental approach is to do the best at each iteration, and maxsat-its tackles exactly this goal by solving the Tuple Number problem, while maxh-its does not. We also observe that maxsat-its outperforms the tuple coverage that T(N; t, S) achieves for the first tests. In particular, maxsat-its reduces the number of tests required to cover 95% of the allowed tuples in 7 of the 8 instances shown in Figs. 2 and 3. On the other hand, above 95%, T(N; t, S) seems to be the best approach in terms of using fewer tests for the same coverage. This makes sense, since the incomplete nature of maxsat-its makes it less efficient when approaching complete coverage, which may not be needed for several applications.
In Fig. 2 we observe an erratic behaviour on instance RL-B, the largest instance available to us. These results are in line with those in Fig. 1 of Sect. 11.2, and show the issues that T(N; t, S) can suffer when dealing with large instances. In particular, Fig. 4 shows the number of literals of the MaxSAT instance solved by T(N; t, S) and maxsat-its as the size of the test suite increases for the RL-B benchmark. We observe that T(N; t, S) has to deal with a Partial MaxSAT instance whose size grows proportionally to the number of tests in the test suite. In contrast, for maxsat-its the size of the instance decreases, since only one test is encoded and the number of tuples to cover shrinks as the test suite grows. This is an interesting insight, since the RL-B instance comes from an industrial application and may reflect what we can face in harder real-world scenarios. Therefore, maxsat-its seems better suited for these harder real-world domains and may extend the reach of Combinatorial Testing to more complex SUTs.
Finally, although mints-95%|τ_a| is not consistently the best option for obtaining a good suboptimal test suite that covers 95% of the total tuples, it obtains the best results on the NetworkMgmt and Storage5 instances. Moreover, it is the only method that guarantees optimality when combined with a complete MaxSAT solver.
Conclusions
We have shown that MaxSAT technology is well suited for solving the Covering Array Number problem for Mixed Covering Arrays with Constraints. In particular, we discussed efficient encodings and how MaxSAT algorithms perform on them.
We also presented MaxSAT encodings for the Tuple Number problem. To the best of our knowledge, this is the first time that this problem has been studied with SUT constraints. Additionally, we presented a new incomplete algorithm that can be applied efficiently to those instances where the Tuple Number problem encoding into MaxSAT is too large. In particular, we showed that we can build good enough solutions by incrementally adding new tests synthesized through MaxSAT queries that maximize the coverage of additional allowed tuples with respect to the test suite under construction.
Another interesting result is that if we do not aim to cover all t-tuples but only a statistically significant fraction, we can save a great number of tests. We experimentally showed that to reach 95% coverage we just need, on average, 52% of the tests of the best suboptimal solution reported so far. This is of high practical importance for applications where test cases are expensive relative to the available budget.
From the point of view of Combinatorial Testing, it is reasonable to say that the practical and theoretical interest of our findings and approaches will grow proportionally to the hardness or complexity of the SUT constraints. This will certainly extend the reach of Combinatorial Testing to more challenging SUTs.
From the point of view of Constraint programming, the lessons learnt on how to design efficient encodings for MaxSAT solvers can be exported to solve similar problems. These problems are roughly characterized by having an objective function whose size is proportional to the best known upper bound.
The SAT and MaxSAT communities will also benefit from new challenging benchmarks on which to test new advances in the field. Moreover, any future advance in MaxSAT technology can be applied to solve the Covering Array Number and Tuple Number problems more efficiently at no additional cost.