Negative Learning Ant Colony Optimization for MaxSAT

Recently, a new negative learning variant of ant colony optimization (ACO) has been used to successfully tackle a range of combinatorial optimization problems. To provide stronger evidence of the general applicability of negative learning ACO, we investigate how it can be adapted to solve the Maximum Satisfiability problem (MaxSAT). The structure of MaxSAT differs from that of the problems considered to date, and only a few ACO approaches for MaxSAT exist. In this paper, we describe three negative learning ACO variants. They differ in the way in which sub-instances are solved at each algorithm iteration to provide negative feedback to the main ACO algorithm. In addition to using IBM ILOG CPLEX, two of these variants use existing MaxSAT solvers for this purpose. The experimental results show that the proposed negative learning ACO variants significantly outperform the baseline ACO as well as IBM ILOG CPLEX and the two MaxSAT solvers. This result is of special interest because it shows that negative learning ACO can improve over the results of existing solvers by internally using them to solve smaller sub-instances.


Introduction
Ant colony optimization (ACO) is a metaheuristic optimization technique [1] inspired by the foraging behaviour of ant colonies in nature. In technical terms, ACO algorithms are based on the repeated, step-by-step construction of solutions to the tackled optimization problem. For this purpose, ACO algorithms make use of a greedy heuristic in a probabilistic way. At each construction step, the probabilities of all options for extending the current partial solution are calculated based on the corresponding greedy function values and on so-called pheromone information. Hereby, pheromone information is implemented by means of a pheromone model that consists of a set of pheromone trail parameters together with their values. In most ACO algorithms there is a one-to-one correspondence between so-called solution components and pheromone trail parameters. In the case of the traveling salesman problem, for example, each edge of the input graph is a solution component, and exactly one pheromone trail parameter is assigned to each solution component. At each iteration, ACO algorithms perform two main actions. First, a certain number of solutions to the tackled optimization problem are generated based both on greedy information and on pheromone information. Then, high-quality solutions from the current iteration and/or from earlier iterations are used to update the values of the pheromone trail parameters (henceforth simply called pheromone values). This is done to reward solution components that form part of high-quality solutions, with the aim of producing, over time, better and better solutions. Hence, it can be said that ACO incorporates positive learning. Note that the solution construction mechanism, together with the pheromone values, defines a probability distribution over the search space which is sampled at each iteration of the algorithm.
The pheromone value update then changes this probability distribution, presumably towards an area of the search space that contains even better solutions than the ones encountered so far.
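For illustration, the sampling of a single construction step can be sketched as follows. This is a minimal sketch; the option names and the conventional weighting exponents alpha and beta are our own illustration, not taken from this paper:

```python
import random

def construction_step_probs(options, pheromone, heuristic, alpha=1.0, beta=2.0):
    """Probability of each option for extending a partial solution,
    combining pheromone values and greedy (heuristic) information."""
    weights = [(pheromone[o] ** alpha) * (heuristic[o] ** beta) for o in options]
    total = sum(weights)
    return [w / total for w in weights]

def sample_option(options, probs, rng=random.random):
    """Roulette-wheel sampling of one option according to `probs`."""
    r, acc = rng(), 0.0
    for o, p in zip(options, probs):
        acc += p
        if r <= acc:
            return o
    return options[-1]  # guard against floating-point round-off

# Toy example: three candidate solution components with equal greedy values,
# so the pheromone values alone shape the distribution.
options = ["a", "b", "c"]
pheromone = {"a": 0.8, "b": 0.1, "c": 0.1}
heuristic = {"a": 1.0, "b": 1.0, "c": 1.0}
probs = construction_step_probs(options, pheromone, heuristic)
```

After a pheromone update that rewards component "a", the same sampling routine would pick "a" even more often, which is exactly the shift of the probability distribution described above.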
Our first contribution [26] to the development of negative learning ACO variants was characterized by adding two features to the baseline ACO algorithm: (1) the use of an additional algorithmic component that provides negative feedback to the main ACO algorithm, and (2) the construction of a sub-instance for the additional algorithmic component to work on. We first applied negative learning ACO variants to the Capacitated Minimum Dominating Set (CapMDS) problem and used IBM ILOG CPLEX as the first additional algorithmic component. The empirical results showed that our negative learning ACO variants significantly outperformed the baseline ACO. Moreover, they proved to be competitive with the respective state of the art [27]. We then showed that these approaches are also effective for other relevant combinatorial optimization problems [28-30]. Moreover, in [28], we developed negative learning ACO variants that incorporate a MAX-MIN Ant System (MMAS) [6] as an additional algorithmic component. These variants outperform the baseline ACO, demonstrating that our negative learning proposal is effective with algorithmic components other than CPLEX.
In this paper, we apply the negative learning approach to the Maximum Satisfiability problem (MaxSAT) [31,32], which differs substantially from the problems considered to date. Given a multiset of clauses, where each clause is a disjunction of Boolean literals, the goal of MaxSAT is to find a truth assignment that maximizes the number of satisfied clauses or, equivalently, that minimizes the number of unsatisfied clauses.
There exist only a few ACO approaches (e.g. [33][34][35][36]) for solving MaxSAT, despite its practical relevance. The community working on satisfiability testing, for example, has solved challenging optimization problems by first encoding them as MaxSAT instances and then solving the resulting encodings with a MaxSAT solver. Nowadays, MaxSAT offers a competitive generic problem-solving formalism for combinatorial optimization. For example, MaxSAT has been applied to solve optimization problems in domains as diverse as bioinformatics [37,38], combinatorial testing [39], community detection in complex networks [40], diagnosis [41], planning [42], scheduling [43] and team formation [44]. Moreover, the MaxSAT community has held an annual MaxSAT Evaluation (MSE) since 2006 [45,46]. This event has promoted the implementation of highly optimized MaxSAT solvers and the creation of a wide collection of MaxSAT instances from different domains. Thus, MaxSAT is a suitable test problem to validate the general applicability of the negative learning ACO approach in an extremely competitive scenario.

Related Work
Attempts to incorporate negative learning into ACO started almost a decade after ACO's creation in 1991. Maniezzo [47] in 1999 and Cordon et al. [48] in 2000 pioneered the incorporation of negative learning into ACO by reducing the pheromone values of those components that form low-quality solutions. Then, Montgomery and Randall [49] proposed three variants of negative learning ACO, each adopting a different strategy for the identification, storage, and use of the negative feedback information. The first variant identifies bad solutions by simply choosing the worst solution found in each ACO iteration. The second variant has a function that shifts its preference from searching for good solutions at the beginning of an iteration to searching for bad solutions at the end of the iteration. The negative feedback information found by this variant is stored in an additional pheromone matrix. The third variant has a dedicated function to search for bad solutions, in addition to the standard function that searches for good solutions. The negative feedback information, however, is stored in the same pheromone matrix that stores the positive feedback information. The techniques described in some of these variants were also partially used in the negative learning ACO applications developed by other researchers, such as Simon and Smith [50], Ye et al. [51], Masukane and Mizuno [52], and Rojas-Morales et al. [53]. In [28], we re-implemented negative learning ACO variants that incorporate the techniques proposed by Montgomery and Randall as well as the one proposed by Ramos et al. [54]. We compared these variants with our own negative learning approach on a large number of Minimum Dominating Set (MDS) and Multi-Dimensional Knapsack Problem (MDKP) instances. The empirical results showed that our negative learning variants significantly outperform the above-mentioned variants.
Negative learning ACO has not been used so far to solve MaxSAT. In fact, despite being a state-of-the-art metaheuristic, ACO itself has been applied only a few times to MaxSAT. The first ACO application was due to Drias and Ibri [34], who used a variant of ACO known as Ant Colony System (ACS) [5] to solve weighted MaxSAT. This algorithm works by generating an initial solution to which a number of successive variable flips are applied. Drias and Ibri also parallelized their sequential ACS technique using synchronous and asynchronous methods, but the empirical results showed that their algorithm did not outperform the existing approaches.
Another ACO implementation for MaxSAT was due to Pinto et al. [35]. They used an ACS variant to solve two unweighted and three weighted types of static and dynamic MaxSAT instances. Their implementation constructs solutions in two phases: (1) variable selection, which is done randomly, and (2) value selection, which is based on a heuristic and on pheromone values. This ACO variant outperforms the baseline local search algorithm [55,56]. The authors, however, admitted that WalkSAT [57,58] and other native MaxSAT solvers remained a substantial challenge for their proposal.
Villagra and Barán [36] developed Max-Min-SAT, a version of ACO specifically designed to solve MaxSAT. This algorithm borrowed the adaptive fitness function [59] from genetic algorithms and comes in three variants: (1) ACO-SAW, which uses the step-wise adaptation of weights, (2) ACO-RF, which implements refining functions, and (3) ACO-RFSAW, which employs both the step-wise adaptation of weights and refining functions. An empirical comparison on the basis of 50 random Max-3SAT instances showed that Villagra and Barán's approach did not outperform the WalkSAT MaxSAT solver.

Contribution and General Idea
This paper aims to demonstrate the general applicability of our negative learning ACO framework by (1) applying the approach to an optimization problem with characteristics different from those of the problems considered to date, and (2) exploring further options for the additional algorithmic component that provides negative feedback to the main ACO algorithm. As mentioned above, there are only a few implementations of ACO for MaxSAT. Therefore, this work not only makes significant contributions to our negative learning ACO framework but also to the use of ACO for MaxSAT solving.
As mentioned before, in earlier work we showed that our negative learning ACO approach works very well with CPLEX and MMAS as additional algorithmic components that provide negative feedback to ACO [26,28-30]. In this paper, we go a step further and consider two local search MaxSAT solvers, SATLike-c [60] and SLSMCS [61], as additional algorithmic components. The empirical results show that our approach also performs very well with these new algorithmic components. Moreover, the experimental investigation shows that all our negative learning ACO variants significantly outperform the baseline ACO, CPLEX, and the two MaxSAT solvers. Therefore, the obtained results provide stronger evidence of the general applicability and effectiveness of our negative learning ACO approach. In particular, it may be of special interest to the MaxSAT community that our approach can be seen as a general framework for improving existing MaxSAT solvers. Moreover, considering our findings, we believe that this algorithmic framework may also be very useful for other combinatorial optimization approaches.

The Maximum Satisfiability Problem
MaxSAT is an NP-hard optimization problem that can be stated as follows. Given is a set of n Boolean variables X = {x_1, x_2, …, x_n}. A clause is a disjunction of literals, where each literal is either a variable x_i (a positive literal) or its negation ¬x_i (a negative literal). A variable x_i can take the truth value 0 (FALSE) or 1 (TRUE). A Conjunctive Normal Form (CNF) formula is a conjunction of a set of m clauses C = {c_1, c_2, …, c_m}. A valid solution S to a MaxSAT instance is a complete truth assignment to all variables in X. The optimization objective of MaxSAT is to satisfy as many clauses in C as possible.
Weighted MaxSAT is a variant of MaxSAT in which each clause has an associated positive weight and its optimization objective is to maximize the sum of weights of the satisfied clauses.
A standard integer linear programming (ILP) model for Weighted MaxSAT can be stated as follows [32]:

max ∑_{j=1}^{m} w_j · z_j    (1)

subject to the constraints:

∑_{i ∈ I_j^+} x_i + ∑_{i ∈ I_j^-} (1 − x_i) ≥ z_j    for j = 1, …, m    (2)

The model consists of a set Z of binary variables z_1, z_2, …, z_m, one for each clause in C. Variable z_j takes value 1 if clause c_j is satisfied and value 0 otherwise. The sets I_j^+ and I_j^- contain the indexes of the positive and negative literals of clause c_j, respectively. Parameter w_j represents the weight of clause c_j. We implemented all the approaches in this paper for unweighted MaxSAT; therefore, all clauses have weight 1. The objective function (1) counts the (weighted) number of satisfied clauses, and constraint (2) ensures that each satisfied clause has at least one satisfied literal.
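The semantics of the model can be checked on a toy instance. The following sketch (our own illustration, not code from the paper) computes the optimum of objective (1) with unit weights by exhaustively enumerating all truth assignments, which is feasible only for very small n:

```python
from itertools import product

def satisfied(clause, assignment):
    """A clause (list of DIMACS-style signed literals) is satisfied if at
    least one of its literals is true under `assignment` (dict var -> 0/1),
    mirroring constraint (2)."""
    return any(assignment[abs(l)] == (1 if l > 0 else 0) for l in clause)

def max_sat_brute_force(n_vars, clauses):
    """Maximize the number of satisfied clauses (objective (1), unit
    weights) by enumerating all 2^n truth assignments."""
    best = -1
    for values in product([0, 1], repeat=n_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        best = max(best, sum(satisfied(c, assignment) for c in clauses))
    return best

# (x1 v x2), (-x1 v x2), (x1 v -x2), (-x1 v -x2): at most 3 clauses
# can be satisfied simultaneously, so the optimum is 3.
clauses = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
```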
The satisfiability testing community has been very active in the development of MaxSAT solvers. As a result, their performance has improved dramatically in the last years, as witnessed by the results of the different editions of the MaxSAT Evaluation. The efforts have mainly focused on developing local search and exact MaxSAT solvers.
Local search MaxSAT solvers start from an initial complete assignment and, at each step, flip the Boolean value of a selected variable to find a better solution using a heuristic. The most critical issue with such solvers is that they can get trapped in local optima, so they must incorporate suitable strategies to escape them. Among the best performing solvers, we find Dist [62], CCEHC [63], SATLike [64] and SATLike 3.0 [65].
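The general flip-based scheme these solvers share can be sketched as follows. This is a minimal WalkSAT-style illustration with a simple noise parameter as the escape strategy; it does not reproduce the actual heuristics of Dist or SATLike:

```python
import random

def num_satisfied(clauses, assignment):
    """Count satisfied clauses (DIMACS-style signed literals)."""
    return sum(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def flip_local_search(n_vars, clauses, max_flips=1000, noise=0.3, seed=0):
    """Start from a random complete assignment; at each step flip one
    variable. With probability `noise` flip a random variable (to escape
    local optima), otherwise flip the variable giving the best score."""
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best = num_satisfied(clauses, assignment)
    for _ in range(max_flips):
        if rng.random() < noise:
            var = rng.randint(1, n_vars)          # random walk step
        else:
            def score(v):                         # satisfied count after flipping v
                assignment[v] = not assignment[v]
                s = num_satisfied(clauses, assignment)
                assignment[v] = not assignment[v]
                return s
            var = max(range(1, n_vars + 1), key=score)
        assignment[var] = not assignment[var]
        best = max(best, num_satisfied(clauses, assignment))
    return best
```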
There are two main groups of exact MaxSAT solvers: branch-and-bound (BnB) and SAT-based solvers. BnB MaxSAT solvers implement the branch-and-bound scheme and are competitive on random and some types of crafted instances. At each node of the search tree, they apply some inference rules and compute a lower bound by detecting disjoint inconsistent subsets of soft clauses with unit propagation [66,67]. Representative BnB solvers are MaxSatz [68,69] and Ahmaxsat [70]. BnB MaxSAT solvers can become competitive on industrial instances by incorporating the clause learning mechanism recently introduced in [71].
SAT-based MaxSAT solvers proceed by reformulating the MaxSAT optimization problem into a sequence of SAT decision problems [31] and are particularly competitive on industrial instances. These solvers can be further divided into three subgroups: model-guided, core-guided and Minimum Hitting Set (MHS-)guided solvers. Model-guided approaches reduce to SAT the problem of deciding whether there exists an assignment for the MaxSAT instance with a cost less than or equal to a given k, and successively decrease k until an unsatisfiable SAT instance is found. Examples of such solvers are SAT4J-Maxsat [72] and Pacose [73]. Core-guided and MHS-guided approaches consider a MaxSAT instance as a SAT instance and call a CDCL SAT solver to identify an unsatisfiable subset of soft clauses, called a core. Then, they relax this core and solve the relaxed instance with a CDCL SAT solver to identify another core, repeating this process until a satisfiable instance is derived. The difference between them is that core-guided solvers relax a core using cardinality constraints, while MHS-guided solvers remove one clause from each detected core so that the number of different clauses removed from the cores is minimized by solving a minimum hitting set instance with an integer programming solver. The solvers Open-WBO [74], WPM3 [75] and RC2 [76] are representative core-guided solvers, and the solvers MHS [77] and MaxHS [78] are representative MHS-guided solvers.

Negative Learning ACO for MaxSAT
Our negative learning ACO for MaxSAT uses as its baseline algorithm an MMAS variant implemented in the hypercube framework [7]. Depending on the type of additional algorithmic component used for providing negative feedback to the main ACO algorithm, we constructed three variants: (1) ACOSAT+neg, which uses the local search MaxSAT solver SATLike-c [60]; (2) ACOSLS+neg, which uses the local search MaxSAT solver SLSMCS [61]; and (3) ACO+neg, which applies the integer linear programming (ILP) solver CPLEX. Note that ACOSAT+neg, ACOSLS+neg and ACO+neg benefit from both the positive and the negative feedback information obtained by the solvers, whereas a fourth variant, called ACOneg, uses the solver CPLEX exclusively as a negative feedback provider. Algorithm 1 displays the pseudo-code of the general algorithmic framework of all these variants.

General Description of the Algorithmic Framework
Our baseline ACO algorithm is a MAX-MIN Ant System (MMAS) implemented in the so-called hypercube framework [7]. This algorithm variant is nowadays one of the most used versions of ACO. It is characterized by the following three features:
1. All pheromone values are naturally limited to [0, 1], due to the specific pheromone value update that is employed. When each pheromone value is either zero or one, the algorithm is fully converged. Therefore, MMAS algorithms further limit the range of the pheromone values to [τ_min, τ_max], where τ_min is a small constant close to zero and τ_max is a constant close to one. As in most MMAS algorithms in the literature, we use the fixed values τ_min = 0.001 and τ_max = 0.999.
2. At each iteration, the state of convergence of the algorithm is measured by calculating the so-called convergence factor cf, where 0 ≤ cf ≤ 1. Once convergence is detected (which is the case when cf = 1), the algorithm is restarted by re-initializing the pheromone values to their initial values.
3. MMAS algorithms keep three solutions at any time: (1) the best solution constructed at the current iteration (S_ib), (2) the best solution found since the last restart of the algorithm (S_rb), and (3) the best overall solution (S_bsf). These three solutions are used in a weighted form for updating the pheromone values at each iteration. The weights used for this purpose depend on (1) the value of the convergence factor cf and (2) the value of a Boolean control variable s_update, whose role is the following. After pheromone (re-)initialization, only solutions S_ib and S_rb are used for the pheromone update; in this phase, s_update has value FALSE. When convergence is detected in this phase, the value of s_update changes to TRUE and the pheromone update is performed exclusively with solution S_bsf. When convergence is detected again in this phase, the algorithm is restarted as described above.
The pseudo-code of our baseline MMAS algorithm can be found in Algorithm 1. At the start of the algorithm, both S_rb and S_bsf are initialized as empty sets (line 3). Moreover, cf and s_update are initialized to 0 and FALSE, respectively.
For the application to MaxSAT, the algorithm applies a standard pheromone model T that consists of a pheromone value τ(⟨x_i, j⟩) ≥ 0 for each possible truth value j ∈ {0, 1} that can be assigned to each Boolean variable x_i. In addition to the standard pheromone model, the algorithm also employs a negative pheromone model T_neg that consists of a negative pheromone value τ_neg(⟨x_i, j⟩) for each truth value j that can be assigned to each Boolean variable x_i. The pheromone values in T are initialized to 0.5, while those in T_neg are initialized to 0.001, at the start of the algorithm by function InitializePheromoneValues(T, T_neg) (line 4). Then, at each iteration, n_a solutions are generated based on greedy and pheromone information according to function Construct_Solution(T, T_neg, d_rate) (lines 6-10). Further explanations on how this function works are given after this general description. All solutions S_k found in the current iteration are added to a set S_iter. Subsequently, function SolveSubinstance(S_iter, t_solver) builds a sub-instance I_sub in the form of a MaxSAT partial solution. The pre-assigned variables in this partial solution are stored in a set X′ ⊆ X, which contains the variables that have been assigned the same truth value in every S_k ∈ S_iter. Depending on the specific negative learning ACO variant, the function then chooses either CPLEX or one of the two MaxSAT solvers for solving this sub-instance (line 11). Note that this function is the only place where these solvers are used within our negative learning ACO algorithm. After trying to solve the sub-instance for a maximum of t_solver CPU seconds, the function returns a solution S_sub. Next, S_sub is compared with the solutions in S_iter.
The solution with the best objective function value becomes the iteration-best solution S_ib (line 12). This means that, in addition to being used for the update of the negative pheromone values (as outlined below), S_sub also boosts the positive learning mechanism of the algorithm, because it is added to the set S_iter which is used to update solutions S_ib, S_rb and S_bsf. There is only one exception: in the case of algorithm variant ACOneg, S_sub is not added to S_iter. Hence, in this variant, S_sub is used exclusively for updating the negative pheromone values. Afterwards, the restart-best solution S_rb and the best-so-far solution S_bsf are updated with S_ib (lines 13-14). Finally, the pheromone update and the calculation of the convergence factor are performed by functions ApplyPheromoneUpdate(T, T_neg, cf, s_update, S_ib, S_rb, S_bsf, S_sub, ρ, ρ_neg) and ComputeConvergenceFactor(T) (lines 15-16), respectively. If cf > 0.999 and s_update = TRUE, the algorithm is restarted (lines 17-24). The algorithm terminates once its termination conditions are met. In most of our experiments, the termination condition is a given CPU time limit; in some experiments we instead used a maximum number of solution constructions, which is clearly stated in the section on the experimental results. In the following, the functions of the algorithm are described in more detail.

Solution Construction
Function Construct_Solution(T, T_neg, d_rate) generates each solution S_k in two phases: (1) variable selection and (2) value selection. In the first phase, a variable x_i is taken from the set X̂ ⊆ X that contains the variables that have not yet been assigned a value in solution S_k. The probability p(x_i) of selecting variable x_i is calculated according to the following equation:

p(x_i) = η_i / ∑_{x_k ∈ X̂} η_k    (5)

where η_i is the greedy information for variable selection. More specifically, η_i is the number of occurrences of variable x_i in the original instance. Afterwards, a random number r ∈ [0, 1] is generated. If r ≤ d_rate, the variable with the highest value of p(x_i) in Eq. 5 is selected deterministically. Otherwise, the variable is selected by roulette wheel selection [79] according to these probabilities. Hereby, d_rate is the so-called determinism rate.
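The first construction phase can be sketched as follows; the dictionary-based data layout is our own illustration:

```python
import random

def select_variable(unassigned, occurrences, d_rate, rng=random.Random(42)):
    """First construction phase: pick the next variable to assign.
    `occurrences[v]` is eta_v, the number of occurrences of variable v
    in the instance (Eq. 5); d_rate is the determinism rate."""
    weights = {v: occurrences[v] for v in unassigned}
    total = sum(weights.values())
    if rng.random() <= d_rate:
        # Deterministic step: take the variable with the highest probability.
        return max(unassigned, key=lambda v: weights[v])
    # Probabilistic step: roulette wheel selection proportional to eta_v.
    r, acc = rng.random() * total, 0.0
    for v in unassigned:
        acc += weights[v]
        if r <= acc:
            return v
    return unassigned[-1]  # guard against floating-point round-off
```

With d_rate close to 1 the construction is almost greedy; with d_rate close to 0 it is almost a pure roulette wheel over the occurrence counts.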
In the second phase of the solution construction, a truth value is assigned to the selected variable x_i in a way similar to the first phase. The probability of assigning truth value j to variable x_i is calculated with the following equation:

p(⟨x_i, j⟩) = τ(⟨x_i, j⟩) · η(⟨x_i, j⟩) · (1 − τ_neg(⟨x_i, j⟩)) / ∑_{j′ ∈ {0,1}} τ(⟨x_i, j′⟩) · η(⟨x_i, j′⟩) · (1 − τ_neg(⟨x_i, j′⟩))    (6)

where

η(⟨x_i, j⟩) = 1 / (1 + cost(S_k ∪ {⟨x_i, j⟩}) − cost(S_k))    (7)

Hereby, the greedy information η(⟨x_i, j⟩) for the truth value selection in Eq. 7 is inversely proportional to the number of new constraint violations in the partial solution S_k. In this equation, cost() is a function that counts the number of constraint violations in a partial solution. With this definition, cost(S_k) refers to the number of constraint violations in the current partial solution S_k, while cost(S_k ∪ {⟨x_i, j⟩}) refers to the cost of the partial solution obtained by extending S_k with the assignment x_i = j.
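The value-selection probabilities can be sketched as follows. Note that the exact way in which the negative pheromone enters the probability, here as a penalty factor (1 − tau_neg), is our assumption based on the surrounding text, not a formula confirmed verbatim by this paper:

```python
def value_selection_probs(var, tau, tau_neg, cost_increase):
    """Second construction phase: probability of assigning value j in {0, 1}
    to `var`. `tau` and `tau_neg` map (var, j) to standard and negative
    pheromone values; `cost_increase[(var, j)]` is the number of NEW
    constraint violations caused by the assignment var = j (Eq. 7)."""
    def weight(j):
        eta = 1.0 / (1.0 + cost_increase[(var, j)])       # greedy information
        # Assumed combination: standard pheromone times greedy information,
        # damped by the negative pheromone penalty factor.
        return tau[(var, j)] * eta * (1.0 - tau_neg[(var, j)])
    w0, w1 = weight(0), weight(1)
    return w0 / (w0 + w1), w1 / (w0 + w1)

# Toy setting: uniform pheromones, but assigning value 1 causes no new
# violations while value 0 causes one, so value 1 is preferred.
p0, p1 = value_selection_probs(
    1,
    tau={(1, 0): 0.5, (1, 1): 0.5},
    tau_neg={(1, 0): 0.001, (1, 1): 0.001},
    cost_increase={(1, 0): 1, (1, 1): 0},
)
```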
These two phases of the solution construction are repeated until all x_i ∈ X have been assigned a truth value. Figure 1 shows an illustrative example of how negative learning is added to the baseline ACO in the context of the MaxSAT problem. The example shows five solutions generated in the current ACO iteration for a MaxSAT problem on seven binary variables. These five solutions are added to the set S_iter. Subsequently, function SolveSubinstance(S_iter, t_solver) (line 11 of Algorithm 1) builds a sub-instance I_sub in the form of a MaxSAT partial solution. The pre-assigned variables in this partial solution are stored in a set X′ ⊆ X, which contains the variables that have been assigned the same truth value in every solution S_k ∈ S_iter. In the illustrative example in Fig. 1, variables x_2, x_5, and x_6 are assigned the values 1, 0 and 1, respectively, in each of the five solutions. Consequently, in sub-instance I_sub, variables x_2, x_5, and x_6 are pre-assigned the fixed values 1, 0 and 1, respectively. With this, the additional algorithmic component can only work on the remaining variables, whose values are still unassigned.
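The construction of the set X′ of pre-assigned variables can be sketched as follows (solutions are represented here, for illustration, as dictionaries mapping variables to truth values):

```python
def build_subinstance(solutions):
    """Fix every variable that received the same truth value in ALL
    solutions of the current iteration (the set X'); the remaining
    variables are left free for the additional algorithmic component."""
    fixed = dict(solutions[0])
    for sol in solutions[1:]:
        for var in list(fixed):
            if sol[var] != fixed[var]:
                del fixed[var]  # disagreement: leave this variable free
    return fixed

# Three toy iteration solutions over variables 1..4: only variables 2
# and 3 receive the same value everywhere, so only they are pre-assigned.
fixed = build_subinstance([
    {1: 1, 2: 1, 3: 0, 4: 0},
    {1: 0, 2: 1, 3: 0, 4: 1},
    {1: 1, 2: 1, 3: 0, 4: 0},
])
```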

Pheromone Update
Function ApplyPheromoneUpdate(T, T_neg, cf, s_update, S_ib, S_rb, S_bsf, S_sub, ρ, ρ_neg) updates the standard pheromone model T and the negative pheromone model T_neg at each iteration. The standard pheromone model T is updated in the same way as in all MMAS algorithms implemented in the hypercube framework. The value of each standard pheromone τ(⟨x_i, j⟩) is updated with the following equation:

τ(⟨x_i, j⟩) := τ(⟨x_i, j⟩) + ρ · (ξ(⟨x_i, j⟩) − τ(⟨x_i, j⟩))    (8)

where:

ξ(⟨x_i, j⟩) := κ_ib · Δ(S_ib, x_i, j) + κ_rb · Δ(S_rb, x_i, j) + κ_bsf · Δ(S_bsf, x_i, j)    (9)

The term ξ(⟨x_i, j⟩) in Eq. 9 stores the accumulated update received by each possible value j ∈ {0, 1} that can be assigned to each Boolean variable x_i. The weights κ_ib, κ_rb, and κ_bsf represent the influences of solutions S_ib, S_rb, and S_bsf, respectively, on the amount of pheromone deposited, and ρ is the learning rate. The values of these weights are determined based on the states of cf and s_update, as shown in Table 1. Note that in each state, the sum of κ_ib, κ_rb, and κ_bsf is equal to 1. Furthermore, Δ(S, x_i, j) evaluates to 1 if, and only if, truth value j is assigned to variable x_i in the corresponding solution S; otherwise, it evaluates to 0. To prevent the algorithm from reaching complete convergence, the pheromone values are limited to the range from τ_min = 0.001 to τ_max = 0.999: any pheromone value that falls below τ_min is set back to τ_min, and any value that exceeds τ_max is set back to τ_max.
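The update of Eqs. 8-9, including the clamping to [tau_min, tau_max], can be sketched as follows (the dictionary-based layout is our own illustration; the weights and learning rate follow the text):

```python
def update_pheromone(tau, s_ib, s_rb, s_bsf, kappa, rho,
                     t_min=0.001, t_max=0.999):
    """Hypercube-framework MMAS update: move each pheromone value
    towards the weighted deposit xi (Eq. 9) with learning rate rho
    (Eq. 8), then clamp it to [t_min, t_max].
    `tau` maps (variable, truth value) to a pheromone value; each
    solution maps variable -> truth value; kappa = (k_ib, k_rb, k_bsf)
    must sum to 1 (Table 1 of the paper)."""
    k_ib, k_rb, k_bsf = kappa
    for (var, j) in tau:
        xi = (k_ib * (s_ib[var] == j)
              + k_rb * (s_rb[var] == j)
              + k_bsf * (s_bsf[var] == j))            # Eq. 9
        value = tau[(var, j)] + rho * (xi - tau[(var, j)])  # Eq. 8
        tau[(var, j)] = min(t_max, max(t_min, value))       # keep in range
    return tau

# One update step with kappa = (1, 0, 0), i.e. only S_ib contributes.
tau = update_pheromone({(1, 0): 0.5, (1, 1): 0.5},
                       {1: 1}, {1: 1}, {1: 1}, (1.0, 0.0, 0.0), 0.1)
```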
Function ApplyPheromoneUpdate(T, T_neg, cf, s_update, S_ib, S_rb, S_bsf, S_sub, ρ, ρ_neg) also updates the negative pheromone values with a mechanism similar to the one used for the standard pheromone update. However, Eq. 10 is only used to update the negative pheromone values corresponding to variables from X ∖ X′, that is, variables that did not already have a pre-assigned value in sub-instance I_sub. The update formula for the negative pheromone values is as follows:

τ_neg(⟨x_i, j⟩) := τ_neg(⟨x_i, j⟩) + ρ_neg · (ξ_neg(⟨x_i, j⟩) − τ_neg(⟨x_i, j⟩))    (10)

Hereby, ρ_neg is the negative learning rate. Furthermore, for all x_i ∈ X ∖ X′, ξ_neg(⟨x_i, 0⟩) is set to 1 if x_i has value 1 in solution S_sub, and to 0 otherwise. Likewise, ξ_neg(⟨x_i, 1⟩) is set to 1 if x_i has value 0 in solution S_sub, and to 0 otherwise. In the illustrative example in Fig. 1, the negative pheromone update is only applied to the truth values of variables x_1, x_3, x_4, and x_7, since their values are not pre-assigned in the sub-instance I_sub. In
this example, the truth values 1, 0, 1, and 1 are assigned to variables x_1, x_3, x_4, and x_7, respectively. As a consequence of this assignment, a negative pheromone increase is given to the truth values 0, 1, 0, and 0, which are not assigned to variables x_1, x_3, x_4, and x_7, respectively. Hence, our algorithm penalizes, in the form of a negative pheromone increase, each Boolean value that is not assigned in S_sub to a variable x_i ∈ X ∖ X′. Note that the negative pheromone values are also limited to the range from τ_min = 0.001 to τ_max = 0.999: any negative pheromone value that falls below τ_min is set back to τ_min, and any value that exceeds τ_max is set back to τ_max.
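The negative pheromone update of Eq. 10, restricted to the variables in X ∖ X′, can be sketched as follows (the data layout mirrors the sketches above and is our own illustration):

```python
def update_negative_pheromone(tau_neg, s_sub, fixed_vars, rho_neg,
                              t_min=0.001, t_max=0.999):
    """Eq. 10: for every variable NOT pre-assigned in the sub-instance
    (i.e. not in X'), push negative pheromone towards the truth value
    that S_sub did NOT choose, then clamp to [t_min, t_max]."""
    for (var, j) in tau_neg:
        if var in fixed_vars:
            continue                                  # only x_i in X \ X'
        xi_neg = 1.0 if s_sub[var] != j else 0.0      # penalize unchosen value
        value = tau_neg[(var, j)] + rho_neg * (xi_neg - tau_neg[(var, j)])
        tau_neg[(var, j)] = min(t_max, max(t_min, value))
    return tau_neg

# Variable 2 is pre-assigned (in X'), so only variable 1 is updated:
# S_sub sets x1 = 1, hence the negative pheromone of <x1, 0> grows.
tau_neg = update_negative_pheromone(
    {(1, 0): 0.001, (1, 1): 0.001, (2, 0): 0.001, (2, 1): 0.001},
    s_sub={1: 1, 2: 0}, fixed_vars={2}, rho_neg=0.2)
```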

Calculation of the Convergence Factor
Function ComputeConvergenceFactor(T) calculates the value of cf that regulates the update of the standard pheromone model T using Eq. 11:

cf := 2 · ( ( ∑_{τ ∈ T} max{τ_max − τ, τ − τ_min} ) / ( |T| · (τ_max − τ_min) ) − 0.5 )    (11)

With this equation, the value of cf is equal to zero when all pheromone values are equal to their initial value of 0.5. On the contrary, the value of cf is equal to one when all pheromone values are either τ_min or τ_max. In all other cases, the value of cf lies between 0 and 1.
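Eq. 11 can be sketched directly:

```python
def convergence_factor(tau_values, t_min=0.001, t_max=0.999):
    """Eq. 11: cf = 2 * (sum_tau max(t_max - tau, tau - t_min)
                         / (|T| * (t_max - t_min)) - 0.5).
    cf = 0 when all pheromone values equal 0.5 (the initial value);
    cf = 1 when every value sits at t_min or t_max (full convergence)."""
    num = sum(max(t_max - t, t - t_min) for t in tau_values)
    return 2.0 * (num / (len(tau_values) * (t_max - t_min)) - 0.5)
```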

Experimental Evaluation
We performed the experimental evaluation of our negative learning ACO variants, the baseline ACO algorithm without negative learning, the ILP solver CPLEX, the two chosen local search MaxSAT solvers, and the exact MaxSAT solver MaxHS [78] on a cluster of machines with two Intel® Xeon® Silver 4210 CPUs with 10 cores of 2.20 GHz and 92 GB of RAM. The version of CPLEX used by the ACO variants ACO+neg and ACOneg, as well as in standalone mode, was 12.10, in single-threaded mode. The local search MaxSAT solvers SATLike-c [60] and SLSMCS [61] used by variants ACOSAT+neg and ACOSLS+neg, as well as the exact MaxSAT solver MaxHS [78], were taken from [80]. The solvers SATLike-c and MaxHS were the winners of the incomplete and complete unweighted tracks of the MaxSAT Evaluation 2020 (MSE 2020), respectively.

Problem Instances
First, we compare our negative learning ACO approaches with the ACO approaches for MaxSAT by Pinto et al. [35] and Villagra and Barán [36]. Second, we compare our approaches with the state-of-the-art MaxSAT solvers SATLike-c, SLSMCS and MaxHS from MSE 2020.
For this purpose, we tested our negative learning ACO variants on the problem instances from [35,36] as well as on a wide range of problem instances from MSE 2016 [81] and MSE 2020 [80]. The specifications of these MaxSAT instances are given in Tables 2 and 3. The instance set from [36] consists of phase-transition instances (instances 1-25 in Table 3) and over-constrained instances (instances 26-50 in Table 3). Each of these 50 instances has three literals per clause and 50 variables. Each phase-transition instance has 215 clauses, while each over-constrained instance has 323 clauses. The chosen instance set from MSE 2020 and MSE 2016 consists of four groups: (1) maxcut, (2) highirth, (3) ramsey, and (4) set-covering. In Tables 4, 5, 6 and 7, we sort these 113 instances according to the number of literals, the number of variables, and the number of clauses. Overall, these instances vary considerably in terms of size and structure.

Algorithm Tuning and Test Settings
The baseline ACO as well as the negative learning variants require well-working configurations of their parameter values. We used the scientific tuning software irace [82] for parameter tuning. In particular, we carried out separate tuning runs for each of the considered MaxSAT instance groups. Concerning the instances by Pinto et al., we chose instance number 1 from Table 2 for parameter tuning. The parameter values obtained for this instance group are presented in Table 8.

Table 4: Best results of all algorithms tested on MSE 2020 and 2016 instances

Pinto et al. employed an ACO approach using a single ant that was evaluated for 100 runs, where each run consisted of 100 iterations. Consequently, we limited our algorithms to match the number of solution constructions of their ACO algorithm. In particular, we limited the execution of ACO+neg, ACOneg, and ACO to 6, 14, and 50 iterations, respectively.
Concerning the instances of Villagra and Barán, we chose the first five instances from each of the two instance types (phase-transition and over-constrained) for tuning. The parameter values obtained for this instance group are presented in Table 9. Villagra and Barán employed 10 ants in their ACO variants and limited the executions to 10,000 iterations for each of the 10 test runs per instance. To match their test setting, we limited the execution of ACO+neg, ACOneg, and ACO to 20,000, 16,666, and 8333 iterations, respectively.
As shown in Tables 4, 5, 6 and 7, the instance set selected from the MSE 2016 and MSE 2020 evaluations is very diverse in its specifications. For tuning purposes, we divided these instances into 8 sub-groups based on their type and size: (1) maxcut1, (2) maxcut2, (3) highgirth, (4) ramsey1, (5) ramsey2, (6) setcov1, (7) setcov2, and (8) setcov3. We took the first two instances from each of these sub-groups for the tuning process. As an exception, for the sub-group highgirth we took the first two instances from each configuration of n_l, n_x, and n_c; therefore, for this sub-group we used a total of eight instances for tuning. The obtained parameter values are presented in Table 10. We limited the execution time of all algorithms tested on this instance group to 300 seconds, corresponding to one of the time limits used in MSE 2020 [80].

Results
The empirical results of all algorithms applied to the instances of Pinto et al. are presented in Table 2. Note that the results are given in terms of the average number of satisfied clauses obtained over 100 runs; hence, a higher value represents a better result. Moreover, the results under the header ACOPinto are those of the ACO version from Pinto et al. [35]. Results marked in bold correspond to the best result of the comparison in each table row. In summary, the results show that ACO+neg is the best algorithm for these instances. They also show that, even though ACOneg is outperformed by CPLEX, it still performs significantly better than ACOPinto. Finally, each of our negative learning approaches produces a remarkable improvement over the baseline ACO.

Table 3 shows the empirical results of all algorithms applied to the instances of Villagra and Baran in summarized form. In particular, results are averaged over the 25 instances of each of the two instance sub-groups. In addition, the number of instances solved to optimality in each sub-group is given in brackets after the corresponding average result. In the context of these instances, our negative learning ACO variants are compared to the MaxSAT solver WalkSAT as well as to the ACO variants from Villagra and Baran: ACOSAW, ACORF, and ACORFSAW. Each result in the table indicates the average number of satisfied clauses obtained over 10 algorithm runs. Additionally, we made use of the R package scmamp [83] to facilitate the interpretation of the results in Table 3. This statistical tool works as follows. First, the results of all algorithms are compared simultaneously using the Friedman test to determine whether the hypothesis that all algorithms perform equally can be rejected.
Next, a set of pairwise comparisons is performed using the Nemenyi post-hoc test [84], and the output of this statistical analysis is presented as a critical difference (CD) plot in Fig. 2. The horizontal axis of the CD plot represents the range of algorithm ranks, while each vertical line represents the average rank of the corresponding algorithm. A bold horizontal line connecting algorithm markers means that the corresponding algorithms perform statistically equivalently, that is, the difference between their average ranks does not exceed the critical difference computed at a significance level of 0.05. Fig. 2 shows that all of our negative learning approaches, as well as the baseline ACO, perform statistically better than each of the ACO versions from Villagra and Baran. Furthermore, all our ACO versions perform statistically equivalently to the MaxSAT solver WalkSAT and the ILP solver CPLEX on this instance group.
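For readers without R, the Friedman-plus-Nemenyi procedure that scmamp automates can be sketched directly. In the following Python sketch the results matrix, algorithm labels, and numbers are invented placeholders, and the q-values are the standard Nemenyi critical values at a significance level of 0.05; this illustrates the procedure, not the paper's actual analysis.

```python
import math
import numpy as np
from scipy import stats

# Hypothetical results matrix: rows = instances, columns = algorithms
# (values = satisfied clauses; invented for illustration).
rng = np.random.default_rng(42)
base = rng.integers(300, 320, size=(25, 1))
results = np.hstack([base + 3, base + 2, base, base - 4])  # 4 algorithms
algs = ["ACO+neg", "ACOneg", "ACO", "ACO-baseline"]

# Step 1: Friedman test -- can "all algorithms perform equally" be rejected?
chi2, p = stats.friedmanchisquare(*results.T)
print(f"Friedman chi2={chi2:.2f}, p={p:.4g}")

# Step 2: average ranks (rank 1 = best, i.e. most satisfied clauses).
ranks = stats.rankdata(-results, axis=1)
avg_ranks = ranks.mean(axis=0)

# Step 3: Nemenyi critical difference at alpha = 0.05.
# q_0.05 values (studentized range / sqrt(2)) for k = 2..5 algorithms.
q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}
k, n = results.shape[1], results.shape[0]
cd = q_05[k] * math.sqrt(k * (k + 1) / (6 * n))
print("average ranks:", dict(zip(algs, avg_ranks.round(2))))
print(f"critical difference: {cd:.2f}")
```

Two algorithms whose average ranks differ by less than `cd` would be joined by a bold line in the CD plot, i.e. declared statistically equivalent.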
Tables 4, 5, 6 and 7 present the results of all algorithms applied to the selected MSE 2016 and MSE 2020 instances. Note that these tables provide the best result of each (stochastic) algorithm, while the average results are, due to space limitations, provided as supplementary material [85]. Also note that the results in Tables 4, 5, 6 and 7 are given in terms of the number of violated clauses; thus, a lower value represents a better result. To facilitate the interpretation of the results obtained for this instance group, we additionally summarize the data from Tables 4, 5, 6 and 7 in Table 11. In addition, we applied the same statistical analysis with scmamp (as explained above) to the data from Tables 4, 5, 6 and 7 and present the result as a CD plot in Fig. 3.
In particular, Table 11 shows the number of instances for which each of the negative learning ACO variants performs better, worse, or equally well in comparison to its individual algorithmic component. These summarized results indicate that, in general, each of our negative learning ACO variants improves both over the baseline ACO and over each of the solvers that are used internally for solving sub-instances. Among all the negative learning ACO variants, ACOSAT+neg achieved the highest number of improvements over the best MaxSAT solver, SATLike-c; however, it improves over the result of SATLike-c only in 11.5% of all the problem instances. Nevertheless, ACOSAT+neg can be called the best algorithm for this instance group according to the CD plot in Fig. 3, even though no statistical difference can be detected with respect to SATLike-c and ACO+neg. Furthermore, all remaining negative learning ACO variants also significantly improve over both the baseline ACO approach and their internally used solvers. Even ACOneg, the variant that does not take advantage of the internally derived CPLEX result for updating its own best result, improves over both the baseline ACO approach and its constituent solver CPLEX on most of the problem instances. Moreover, the statistical analysis graphically presented in Fig. 3 shows that ACOneg outperforms the MaxSAT solver SLSMCS, which demonstrates the effectiveness of our negative learning strategy.

Table 5
Continuation of Table 4: best results of all algorithms tested on MSE 2020 and 2016 instances. Results marked in bold correspond to the best result of the comparison.

Table 6
Continuation of Table 4: best results of all algorithms tested on MSE 2020 and 2016 instances. Results marked in bold correspond to the best result of the comparison.

Table 7
Continuation of Table 4: best results of all algorithms tested on MSE 2020 and 2016 instances.
Moreover, these results reveal an interesting aspect: our negative learning ACO framework can potentially be used to improve the results of MaxSAT solvers that are already very successful in standalone mode.

Discussion and Conclusions
Ant colony optimization (ACO) has undergone several major improvements and extensions throughout its history. Most of these extensions, however, deal exclusively with improving the positive learning mechanism. Observing that negative learning works in synergy with positive learning in nature, several works have attempted to integrate negative learning into ACO over the past decades. Most of these works, however, achieved only limited success. In previous work, we introduced a novel strategy for the implementation and use of negative learning in ACO. In contrast to other negative learning proposals, we make use of an additional algorithmic component to provide negative feedback to the main ACO algorithm. Furthermore, we implemented an effective cooperation mechanism between the main ACO approach and the additional algorithmic component through the use of a sub-instance that is not only reduced in size but also contains high-quality solutions. Our strategy proved useful for improving the performance of the baseline ACO algorithm on a range of subset selection problems. In our opinion, one of the main reasons for the success of negative learning in ACO is that, in addition to guiding the algorithm towards promising areas of the search space by means of positive learning, it also steers the algorithm away from solution components that initially seem promising but turn out, during the search, to lead to rather low-quality solutions.
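The interplay described in the last sentence can be sketched as a biased sampling rule in which positive pheromone and greedy information attract while negative pheromone repels. The function below is a hypothetical illustration: the names, the parameters alpha and beta, and the exact way the negative pheromone enters the weight are our assumptions, not the precise rule used in the paper.

```python
import random

def choose_component(options, tau_pos, tau_neg, eta, alpha=1.0, beta=2.0):
    """Roulette-wheel selection of the next solution component.

    tau_pos: positive pheromone values (attract); tau_neg: negative
    pheromone values in [0, 1] (repel); eta: greedy heuristic values.
    Illustrative combination rule only.
    """
    weights = [
        (tau_pos[c] ** alpha) * (eta[c] ** beta) * (1.0 - tau_neg[c])
        for c in options
    ]
    total = sum(weights)
    r, acc = random.uniform(0, total), 0.0
    for c, w in zip(options, weights):
        acc += w
        if r <= acc and w > 0:
            return c
    return options[-1]

# Component "b" carries strong negative feedback, so "a" dominates.
tau_pos = {"a": 1.0, "b": 1.0}
tau_neg = {"a": 0.1, "b": 0.9}
eta = {"a": 1.0, "b": 1.0}
picks = [choose_component(["a", "b"], tau_pos, tau_neg, eta) for _ in range(1000)]
print(picks.count("a"), "of 1000 picks chose 'a'")
```

In our framework, the negative values would be derived from the sub-instance solved at each iteration by the additional algorithmic component (CPLEX, SATLike-c, or SLSMCS).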
In this work, we applied the negative learning ACO strategy to the MaxSAT problem, an optimization problem that differs substantially from the problems considered to date. Moreover, MaxSAT is an extremely well-studied optimization problem for which a wide range of high-performance solvers are available for comparison. Also, ACO approaches have rarely been implemented for this problem, and most of the existing implementations are far from being able to compete with state-of-the-art approaches. Hence, testing our negative learning proposal on MaxSAT provides a good opportunity to demonstrate both the general applicability and the effectiveness of our approach. In addition to the ILP solver CPLEX, which we already employed in previous work, we made use of two high-performance MaxSAT solvers, SATLike-c and SLSMCS, as new options for the additional algorithmic component used internally by negative learning ACO. We evaluated the resulting negative learning ACO variants on three instance groups. For the first two instance groups, the results show that our negative learning ACO variants perform significantly better than the baseline ACO as well as the existing ACO variants from the literature. For the third instance group, consisting of instances used in recent MaxSAT evaluations, the obtained results show that all our negative learning ACO variants improve over the baseline ACO approach and over each of the internally used solvers. This is a very interesting result, as it shows that high-performance MaxSAT solvers can be improved even further by using them for solving sub-instances within our framework. In our opinion, this happens because the ACO algorithm reduces the original problem instance, and the solver is then applied only to limited areas of the search space in which presumably good solutions can be found.
A natural extension of this work is to adapt our negative learning ACO to weighted MaxSAT as well as to partial MaxSAT, the variant of MaxSAT that declares some clauses as hard and requires that any valid solution satisfies all hard clauses. Since industrial instances are generally encoded as weighted or partial MaxSAT, it might be interesting to use a SAT-based MaxSAT solver or a branch-and-bound MaxSAT solver with clause learning as the additional algorithmic component. These solvers are particularly competitive on industrial instances, and our negative learning ACO might help to improve their performance further. Finally, another extension of this work is to incorporate a decimation approach [65] into the generation of solutions.