Refining neural network predictions using background knowledge

Recent work has shown that logical background knowledge can be used in learning systems to compensate for a lack of labeled training data. Many methods work by creating a loss function that encodes this knowledge. However, the logic is often discarded after training, even if it is still useful at test time. Instead, we ensure neural network predictions satisfy the knowledge by refining the predictions with an extra computation step. We introduce differentiable refinement functions that find a corrected prediction close to the original prediction. We study how to effectively and efficiently compute these refinement functions. Using a new algorithm called Iterative Local Refinement (ILR), we combine refinement functions to find refined predictions for logical formulas of any complexity. ILR finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot. Finally, ILR produces competitive results in the MNIST addition task.


Introduction
Recent years have shown promising examples of using symbolic background knowledge in learning systems: from training classifiers with weak supervision signals [25], generalizing learned classifiers to new tasks [29], and compensating for a lack of good supervised data [9,10], to enforcing the structure of outputs through a logical specification [34]. The main idea underlying these integrations of learning and reasoning, often called neuro-symbolic integration, is that background knowledge can complement the neural network when one lacks high-quality labeled data [16]. Although pure deep learning approaches excel when learning over vast quantities of data with massive amounts of compute [5,27], most tasks are not afforded this luxury.
Many neuro-symbolic methods work by creating a differentiable loss function that encodes the background knowledge (Figure 1a). However, the logic is often discarded after training, even though this background knowledge could still be helpful at test time [15,29]. Instead, we constrain the neural network with the background knowledge both during training and at test time by correcting its output such that it satisfies the background knowledge (Figure 1b). In particular, we consider how to make such corrections while staying as close as possible to the original predictions of the neural network.
We study how to effectively and efficiently correct the neural network by ensuring its predictions satisfy the symbolic background knowledge. In particular, we consider fuzzy logics formed using functions called t-norms [23,28]. Prior work has shown how to use a gradient ascent-based optimization procedure to find a prediction that satisfies this fuzzy background knowledge [9,29]. However, a recent model called KENN [7] shows how to compute the correction analytically for a fragment of the Gödel logic.
To extend this line of work, we introduce the concept of refinement functions and derive refinement functions for many fuzzy logic operators. Refinement functions find a prediction that satisfies the background knowledge while staying close to the neural network's original prediction. Using a new algorithm called Iterative Local Refinement (ILR), we can combine refinement functions for different fuzzy logic operators to efficiently find refinements for logical formulas of any complexity. Since refinement functions are differentiable, we can easily integrate them as a neural network layer.
In our experiments, we compare ILR with an approach using gradient ascent. We find that ILR finds optimal refinements in significantly fewer iterations. Moreover, ILR often produces results that stay closer to the original predictions or better satisfy the background knowledge. Finally, we evaluate ILR on the MNIST Addition task [25] and show that ILR can be combined with neural networks to solve common neuro-symbolic tasks.
In summary, our contributions are:
1. We introduce the concept of minimal refinement functions, which find predictions that satisfy the background knowledge while staying close to the original predictions.
2. We introduce the Iterative Local Refinement (ILR) algorithm, which combines minimal refinement functions of individual fuzzy operators to find refinements for logical formulas of any complexity.
3. Since ILR is differentiable, we show it can be integrated as a layer in a neural network.
4. We analytically derive minimal refinement functions for individual fuzzy operators constructed from the Gödel, Lukasiewicz, and product t-norms in Section 7.2.
5. We discuss a large class of t-norms for which we can analytically derive minimal refinement functions in Section 7.
6. We compare ILR to gradient descent approaches and show it finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot.
7. We apply ILR to the MNIST Addition task [25] to test how ILR behaves when injecting knowledge into neural network models.

Related work
ILR falls into a larger body of work that attempts to integrate background knowledge expressed as logical formulas into neural networks. For an overview, see [16]. As shown in Figure 1, methods can be categorized by whether they only use background knowledge during training in the form of a loss function [3,9,12,32,34,35] or whether the background knowledge is part of the model and therefore also enforced at test time [1,7,11,14,17,33]. ILR is a method in the second category. We note that these approaches can be combined [15,29]. First, we discuss approaches that construct loss functions from the logical formulas (Figure 1a). These loss functions measure when the deep learning model violates the background knowledge, such that minimizing the loss function amounts to "correcting" such violations [32]. While these methods show significant empirical improvement, they do not guarantee that the neural network will satisfy the formulas outside the training data. LTN and SBR [3,9] use fuzzy logic to provide compatibility with neural network learning, while Semantic Loss [34] uses probabilistic logics. The formalization of refinement functions can be extended to probabilistic logics by defining a suitable notion of minimality, like the KL-divergence between the original and refined distributions over ground atoms.

Table 1. Some common t-norms extended to any-arity aggregation operators.
Among the methods where knowledge is part of the model, KENN [7,8] inspired ILR. KENN is a framework that injects knowledge into neural networks by iteratively refining their predictions. It uses a relaxed version of the Gödel t-conorm obtained through a relaxation of the argmax function, which it applies in logit space. Closely related to both ILR and KENN is C-HMCNN(h) [14], which we see as computing the minimal refinement function for stratified normal logic programs under Gödel t-norm semantics. We discuss this connection in more detail in Section 7.2.1.
The loss-function-based method SBR also introduces a procedure for using the logical formulas at test time in the context of collective classification [9,29]. Unlike KENN [7], these approaches do not enforce the background knowledge during training but only use it in a test-time procedure. In particular, [29] shows that doing these corrections at test time improves upon just using the loss-function approach. Unlike our analytic approach to refinement functions, SBR finds new predictions using a gradient descent procedure very similar to the algorithm we discuss in Section 9.1.2. We show it is much slower to compute than ILR.
Another method closely related to ILR is the neural network layer SATNet [33], which has a setup closely related to ours. However, SATNet does not have a notion of minimality and uses a different underlying logic constructed from a semidefinite relaxation. DeepProbLog [25] is also a probabilistic method, but unlike Semantic Loss, it is used to derive new statements through proofs and cannot directly be used to correct the neural network on predictions that do not satisfy the background knowledge. ILR, instead, can be used both for injecting constraints on the output of a neural network and for proving new statements starting from the neural network predictions.
Finally, some methods are limited to equality and inequality constraints rather than general symbolic background knowledge [12,17]. DL2 [12] combines these constraints into a real-valued loss function, while MultiplexNet [17] adds the knowledge as part of the model. However, MultiplexNet requires expressing the logical formulas in DNF, which is hard to scale.

Fuzzy Operators
We will first provide the necessary background for defining and analyzing minimal refinement functions. In particular, we will consider fuzzy operators, which generalize the connectives of classical boolean logic. For formal treatments of the study of such operators, we refer the reader to [23], which discusses t-norms and t-conorms, to [20] for fuzzy implications, to [4] for aggregation functions, and to [32] for an analysis of the derivatives of these operators.

A function T : [0, 1]^2 → [0, 1] is a t-norm if it is commutative, associative, increasing in both arguments, and if for all t ∈ [0, 1], T(1, t) = t. Similarly, a function S : [0, 1]^2 → [0, 1] is a t-conorm if it is commutative, associative, increasing in both arguments, and if for all t ∈ [0, 1], S(0, t) = t. We list any-arity extensions, constructed using T(t) = T(t_1, T(t_{2:n})) and T([t_1]) = t_1, of five basic t-norms in Table 1. Here t = [t_1, ..., t_n] ∈ [0, 1]^n is a vector of fuzzy truth values, which we will often refer to as (truth) vectors. These any-arity extensions are examples of fuzzy aggregation operators [4].
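As a concrete illustration (our own sketch, not code from the paper), the any-arity extensions of three of the basic t-norms in Table 1 can be written directly:

```python
import numpy as np

# Any-arity extensions of three basic t-norms applied to a truth vector
# t in [0, 1]^n. Function names are ours, chosen for readability.
def goedel_tnorm(t):
    return float(np.min(t))

def product_tnorm(t):
    return float(np.prod(t))

def lukasiewicz_tnorm(t):
    t = np.asarray(t, dtype=float)
    return max(0.0, float(t.sum()) - (len(t) - 1))

t = [0.9, 0.8, 0.7]
```

For t = [0.9, 0.8, 0.7], these evaluate to 0.7 (minimum), 0.504 (product), and 0.4 (truncated sum), respectively.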
Note that fuzzy implications do not have n-ary extensions, as they are not associative. The so-called S-implications are formed from a t-conorm by generalizing the material implication: I(a, c) = S(1 − a, c). Furthermore, every t-norm induces a unique residuum or R-implication [20]: R_T(a, c) = sup{z | T(z, a) ≤ c}.
Logical formulas ϕ can be evaluated using compositions of fuzzy operators. We assume ϕ is a propositional logic formula, but note that this evaluation procedure can be extended to grounded first-order logical formulas on finite domains. For instance, [8] introduced a technique for propositionalizing universally quantified formulas of predicate logic in the context of KENN. Moreover, this technique can be extended to existential quantification by treating it as a disjunction. We assume a set of propositions P = {P_1, ..., P_n} and constants C = {C_1, ..., C_m}, where each constant has a fixed truth value. If T is a t-norm, S a t-conorm, and I a fuzzy implication, then the fuzzy evaluation operator f_ϕ : [0, 1]^n → [0, 1] of the formula ϕ with propositions P and constants C is a function of truth vectors t, defined recursively by matching on the structure of the formula ϕ in the subscript of f_ϕ.
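The recursive evaluation can be sketched as follows; the tuple-based formula encoding and operator labels are our own illustration, here instantiated with the Gödel operators:

```python
def evaluate(phi, t):
    # Formulas are nested tuples; a bare name is a proposition looked up in t.
    op, *args = phi if isinstance(phi, tuple) else ('VAR', phi)
    if op == 'VAR':
        return t[args[0]]
    if op == 'NOT':
        return 1.0 - evaluate(args[0], t)
    vals = [evaluate(a, t) for a in args]
    return min(vals) if op == 'AND' else max(vals)  # Goedel t-norm / t-conorm

phi = ('AND', ('NOT', 'A'), ('OR', 'B', 'C'))       # f for the formula ¬A ∧ (B ∨ C)
```

With t(A) = 0.2, t(B) = 0.6, t(C) = 0.9, this evaluates ¬A to 0.8, B ∨ C to 0.9, and the conjunction to 0.8.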

Minimal Fuzzy Refinement Functions
We will next define (fuzzy) refinement functions, which consider how to change the input arguments of fuzzy operators such that the output of the operators is a given truth value. We prefer changes to the input arguments that are as small as possible. We will introduce several definitions to facilitate studying this concept. The first is an optimality criterion.
A refinement function for f_ϕ changes the input truth vector such that the new output of f_ϕ will be t̂_ϕ. Whenever t̂_ϕ is high, we want the refined vector to satisfy the formula ϕ, while if t̂_ϕ is low, we want it to satisfy its negation. When t̂_ϕ = 1, the constraint created by the formula is a hard constraint, while if it is in (0, 1), this constraint is soft. We require bounding the set of possible t̂_ϕ by min_ϕ and max_ϕ: if there are constants C_i, or if ϕ has no satisfying (discrete) solutions, there can be formulas for which no refined vector t̂ attains f_ϕ(t̂) = 1.
Next, we introduce a notion of minimality of refinement functions. The intuition behind this concept is that we prefer the new output, the refined vector t̂, to stay as close as possible to the original truth vector t. Therefore, we assume we want to find a truth vector near the neural network's output that satisfies the background knowledge.

Figure 2. In the forward pass (left), ILR computes the truth value of ϕ. In the backward pass (right), ILR traverses the computational graph of the forward step in reverse to calculate the refined vector t̂. ILR substitutes each fuzzy operator of the forward pass with the corresponding refinement function. Each refinement function receives as input the initial truth values used by the fuzzy operator in the forward step (purple lines) and the target value for the corresponding subformula. The scheduler, which ILR calls between the forward and backward steps, calculates the target value t̂_¬A∧(B∨C) for the entire formula.
For a particular fuzzy evaluation operator f_ϕ, finding the minimal refinement function corresponds to solving the following optimization problem:

t̂* = arg min_{t̂ ∈ [0,1]^n} ‖t̂ − t‖ subject to f_ϕ(t̂) = t̂_ϕ. (7)

For some f_ϕ we can solve this problem analytically using the Karush-Kuhn-Tucker (KKT) conditions. However, while the norm ‖·‖ is convex, f_ϕ (usually) is not. Therefore, we cannot rely on efficient convex solvers. Furthermore, for strict t-norms, finding exact solutions to this problem is equivalent to solving PMaxSAT when t̂_ϕ = 1 [9,15], hence this problem is NP-complete. In Sections 7 and 8, we will analytically derive minimal refinement functions for a large number of individual fuzzy operators. These results are the theoretical contribution of this paper. We first discuss in Section 5 a method called ILR for finding general solutions to the problem of finding minimal refinement functions. ILR uses the analytical minimal refinement functions of individual fuzzy operators in a forward-backward algorithm. Then, in Section 6, we discuss how to use this algorithm for neuro-symbolic AI.
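For contrast, the gradient-based alternative that ILR is later compared against can be sketched for a single product-t-norm conjunction. The soft-constraint formulation, the penalty weight lam, and all names are our own simplifications, not the paper's implementation:

```python
import numpy as np

def refine_gd(t, target, lam=10.0, lr=0.05, steps=500):
    """Minimize ||t_hat - t||^2 + lam * (prod(t_hat) - target)^2 by gradient descent."""
    t_hat = t.copy()
    for _ in range(steps):
        p = np.prod(t_hat)
        # gradient of the distance term plus the soft-constraint penalty
        grad = 2.0 * (t_hat - t) + 2.0 * lam * (p - target) * p / np.clip(t_hat, 1e-6, 1.0)
        t_hat = np.clip(t_hat - lr * grad, 1e-6, 1.0)
    return t_hat

t_hat = refine_gd(np.array([0.6, 0.7]), 0.8)  # push the product toward 0.8
```

Because the constraint is only penalized, the product approaches but need not exactly reach the target, and convergence takes many iterations; this is the inefficiency the analytic refinement functions avoid.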

Iterative Local Refinement
We introduce Iterative Local Refinement (ILR), a fast, iterative, differentiable, but approximate algorithm that finds minimal refinement functions for general formulas. ILR is a forward-backward algorithm acting on the computation graph of formulas. First, it traverses the graph from its leaves to its root to compute the current truth values of subformulas. Then, it traverses the graph back from the root to the leaves to compute new truth values for the subformulas. ILR makes use of analytical minimal refinement functions to perform this backward pass. ILR is differentiable if the fuzzy operators and their corresponding minimal refinement functions are differentiable, as it computes compositions of these functions. An example of one step of the ILR algorithm is presented in Figure 2, while Algorithm 1 contains the full pseudocode.
First, ILR computes the truth value of the formula in the forward pass (left side of Figure 2), saving the truth vectors of intermediate subformulas (t_sub in Algorithm 1; the numbers inside colored boxes in Figure 2). Then, ILR computes a backward pass (right side of Figure 2). ILR uses the previously computed truth vectors of subformulas to compute the minimal refined vectors for the components of each subformula. We use the results from Section 7.2 to compute these for the Gödel, Lukasiewicz, and product fuzzy operators.
In lines 13 to 19, ILR computes the minimal refined vector for a conjunction of subformulas. We retrieve the truth values of the subformulas from the forward pass and call the minimal refinement function ρ*_T for the chosen t-norm. This procedure gives us a refined vector, where each value corresponds to the refined value of a subformula. ILR then recurses on the subformulas. Note that the pseudocode for disjunction and implication is analogous.
One choice in ILR is how to combine the results from different subformulas. When a proposition appears in multiple subformulas, it can be assigned multiple different refined values. We found that the heuristic in line 18, which takes the refined value t̂_j whose change from the current value is largest in absolute value, generally works well. We also explored averaging the different refined values, but this took significantly longer to converge. Another choice is the convergence criterion. A simple option is to stop running the algorithm when it has stopped getting closer to the target value for a couple of iterations. In our experiments, we observed that ILR monotonically decreases the distance to the target value, after which it either gets stuck on a single local optimum or oscillates between two local minima.
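Under one reading of the line-18 heuristic (the function and variable names are ours, not the paper's), the combination step can be sketched as:

```python
import numpy as np

def combine(current, candidates):
    # Keep the candidate refined value whose change from `current`
    # is largest in magnitude (one reading of line 18 of Algorithm 1).
    deltas = np.asarray(candidates, dtype=float) - current
    return current + deltas[np.argmax(np.abs(deltas))]
```

For example, combine(0.5, [0.6, 0.1]) picks 0.1, since its change of 0.4 dominates the change of 0.1; averaging instead would yield 0.35 and, per the experiments above, slower convergence.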
Moreover, we experimented with a scheduling mechanism to smooth the behavior of ILR, implemented in line 6. The scheduling mechanism chooses a different target value at each iteration: the difference between the current truth value and the target value is multiplied by a scheduling parameter α, which we choose to be either 0.1 or 1 (no scheduling). While usually not necessary, for some formulas the scheduling mechanism allowed for finding better solutions.
ILR is not guaranteed to find a refined vector t̂ such that f_ϕ(t̂) = t̂_ϕ. This is easy to see theoretically: for many fuzzy logics, like the product and Gödel logics, the case t̂_ϕ = 1 corresponds to the PMaxSAT problem, which is NP-complete [9,15], while ILR has linear time complexity. However, this is traded off by 1) being highly efficient, usually requiring only a couple of iterations for convergence, and 2) not having any hyperparameters to tune, except arguably the combination function. Furthermore, ILR usually converges quickly in neuro-symbolic settings, since background knowledge is very structured and the solution space is relatively dense. These settings are unlike the randomly generated SAT problems we study in Section 9.1.3, which contain little structure for ILR to exploit.

Algorithm 1 Iterative Local Refinement
Require: ϕ, t̂_ϕ, t, α ∈ (0, 1]
1: t̂ ← t
2: while not converged do
3:   for subformula φ of ϕ do
5:     forward pass using Definition 4
6:   t̂ ← Backward(ϕ, t̂_ϕ, t_sub)

The ILR algorithm can be added as a module after a neural network g to create a neuro-symbolic AI model. The neural network predicts (possibly some of) the initial truth values t. Since both the forward and backward passes of ILR are differentiable computations, we can treat ILR as a constrained output layer [16]. For instance, in Figure 2, the input t could be generated by the neural network, and we provide supervision directly on the refined predictions t̂. ILR ensures the predictions, i.e., the refined vector t̂, satisfy the background knowledge while staying close to the original predictions made by the neural network. Loss functions like cross-entropy can use t̂ as the prediction. We train the neural network g by minimizing the loss function with gradient descent and backpropagating through the ILR layer.
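A single ILR step for the simplest case, a Gödel conjunction over its literals, can be sketched as follows; this increase-only sketch with our own names illustrates the forward pass, the scheduler, and the backward refinement:

```python
import numpy as np

def ilr_step(t, target, alpha=1.0):
    current = float(np.min(t))                    # forward pass: Goedel conjunction
    goal = current + alpha * (target - current)   # scheduled target (line 6)
    if goal <= current:
        return np.asarray(t, dtype=float).copy()  # already satisfied (increase-only sketch)
    return np.maximum(t, goal)                    # backward: Goedel minimal refinement
```

With t = [0.9, 0.3, 0.6] and target 0.7, one full step (α = 1) refines to [0.9, 0.7, 0.7]; with α = 0.5, the scheduler only moves the conjunction halfway, giving [0.9, 0.5, 0.6].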
One strength of ILR is the flexibility of the refinement values t̂_ϕi for each formula ϕ_i. These can be set to 1 to treat ϕ_i as a hard constraint that always needs to be satisfied. Alternatively, refinement values can be trained as part of a larger deep learning model. Since ILR is a differentiable layer, we can compute gradients with respect to the refinement values. This procedure allows ILR to learn what formulas are useful for prediction. For instance, in Figure 2, t̂_¬A∧(B∨C) can either be given or act as a parameter of the model that is learned together with the neural network parameters.
We give an example of the integration of ILR with a neural network in Figure 3, where we use ILR for the MNIST Addition task proposed by [25]. In this task, we have access to a training set composed of triplets (x, y, z), where x and y are images of MNIST [24] handwritten digits, and z is a label representing an integer in the range {0, ..., 18}, corresponding to the sum of the digits represented by x and y. The task consists of learning the addition function and a classifier for the MNIST digits, with supervision only on the sums. To achieve this, knowledge consisting of the rules of addition is given. For instance, the rule Is(x, 3) ∧ Is(y, 2) → Is(x + y, 5) states that the sum of 3 and 2 is 5.
The architecture of Figure 3 consists of a neural network (a CNN) that performs digit recognition on the inputs x and y. After this step, ILR predicts a truth value for each possible sum. Notice that we define the CNN outputs C_x, C_y ∈ [0, 1]^10 as constants, i.e., ILR does not change the predictions of the digits. Moreover, the initial prediction for the truth vector of possible sums t_{x+y} ∈ [0, 1]^19 is the zero vector. This allows ILR to act as a proof-based method. Indeed, similarly to DeepProbLog [25], the architecture proposed in Figure 3 uses the knowledge in combination with the predictions of the neural network to derive truth values for new statements (the sum of the two digits). We apply the loss function to the final predictions t̂_{x+y}. During learning, the error is backpropagated through the entire model, reaching the CNN, which learns to classify the MNIST images from indirect supervision.
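How the addition rules turn digit predictions into sum truth values can be sketched as follows; the function name and the concrete operator choice (product t-norm for the rule bodies, Gödel t-conorm for aggregation) are our assumptions for illustration:

```python
import numpy as np

def sum_truths(cx, cy):
    # t_{x+y}[z] aggregates the rules Is(x, i) ∧ Is(y, j) → Is(x+y, z)
    # over all digit pairs with i + j = z.
    t = np.zeros(19)
    for i in range(10):
        for j in range(10):
            # product t-norm for the conjunction, Goedel t-conorm over rules
            t[i + j] = max(t[i + j], cx[i] * cy[j])
    return t
```

If the CNN is certain that x is 3 and y is 2, the resulting vector assigns truth 1 to the sum 5 and 0 everywhere else, mirroring the rule Is(x, 3) ∧ Is(y, 2) → Is(x + y, 5).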
We present the results obtained by ILR in Section 9.2, and compare its performance with other neuro-symbolic AI frameworks.

Analytical minimal refinement functions
Having introduced the ILR algorithm, we next study the problem of finding minimal refinement functions for individual fuzzy operators. We need these in closed form, as ILR uses them during the backward pass. This section first discusses several transformations of minimal refinement functions and gives the minimal refinement functions of the basic t-norms: Gödel, Lukasiewicz, and product. In Section 7 we investigate a large class of t-norms for which we have closed-form formulas for the minimal refinement functions.

General results
We first provide several basic results on minimal refinement functions for fuzzy operators. In particular, we will consider formulas such as ϕ = ⋀_{i=1}^n P_i ∧ ⋀_{i=1}^m C_i, that is, conjunctions of propositions and constants. With slight abuse of notation, from here on we will refer to min_ϕ and max_ϕ when evaluated by the t-norm T as min_T and max_T, and will do so also for other fuzzy operators. Using Definition 1, we find that for a t-norm T, min_T = 0 and max_T = T(c), where c is the vector of truth values of the constants C_1, ..., C_m, while for a t-conorm S, min_S = S(c) and max_S = 1. Note that for m = 0, max_T = 1 and min_S = 0. Next, we find some useful transformations of minimal refinement functions to derive new results.

Proposition 1. Assume ρ*_φ is a minimal refinement function for f_φ evaluated using t-norm T, and consider f_ψ(t) evaluated using the dual t-conorm S of T. Then ρ*_ψ(t, t̂_ψ) = 1 − ρ*_φ(1 − t, 1 − t̂_ψ) is a minimal refinement function for f_ψ.
An analogous argument can be made for φ = ⋁_{i=1}^n P_i ∨ ⋁_{i=1}^m C_i. We will use this result to simplify the process of finding minimal refinement functions for the t-norms and dual t-conorms. For example, assume we have a minimal refinement function ρ*_T for t̂_T ∈ [T(t), max_T], and let S be the corresponding dual t-conorm. Then, we can change the constraint S(t̂, c) = t̂_S in Equation 7 to the equivalent constraint 1 − S(t̂, c) = 1 − t̂_S. We then use Proposition 1 to find the minimal refined vector for t̂_S ∈ [min_S, S(t)] as 1 − ρ*_T(1 − t, 1 − t̂_S).

Proposition 2. Consider the formulas φ = P_1 ∨ P_2 and ψ = ¬P_1 ∨ P_2, and assume ρ*_φ is a minimal refinement function for f_φ evaluated using the t-conorm S. Let t' = [1 − t_1, t_2]. By the assumption of the proposition, ρ*_φ(t', t̂_ψ) is a minimal refinement function for S(t') = f_ψ(t). Similar to the previous proposition, this gives us a simple procedure for finding the minimal refinement functions for the S-implication of some t-conorm.
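The duality transformation above can be sketched concretely for the Gödel operators; the helper names are ours, and the t-conorm case shown is the decreasing one obtained through negation:

```python
import numpy as np

def refine_goedel_tnorm(t, target):
    # Minimal refinement raising the Goedel conjunction min(t) to `target`:
    # lift every value below the target up to it.
    return np.maximum(np.asarray(t, dtype=float), target)

def refine_goedel_tconorm_down(t, target):
    # Duality: lower the Goedel disjunction max(t) to `target` by negating
    # inputs and target, refining with the dual t-norm, and negating back.
    return 1.0 - refine_goedel_tnorm(1.0 - np.asarray(t, dtype=float), 1.0 - target)
```

For instance, lowering max([0.9, 0.4]) to 0.6 via the dual t-norm yields [0.6, 0.4]: only the values above the target are pulled down.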

Basic T-norms
In this section, we introduce the minimal refinement functions for the t-norms and t-conorms of the three main fuzzy logics (Gödel, Lukasiewicz, and Product). In particular, we consider the case where these t-norms and t-conorms act on both propositions and constants, that is, ϕ = ⋀_{i=1}^n P_i ∧ ⋀_{i=1}^m C_i, which is evaluated with T(t, c). We present the main results with simple examples.

Gödel t-norm
In this section, we derive minimal refinement functions for the Gödel t-norm and t-conorm for the family of p-norms.

Proposition 3. A minimal refinement function of the Gödel t-norm for t̂_T ∈ [T_G(t, c), max_T] is given by t̂*_i = max(t_i, t̂_T). A minimal refinement function of the Gödel t-conorm for t̂_S increases (one of) the largest truth values to t̂_S when S_G(t, c) < t̂_S; the decreasing cases follow by duality (Proposition 1).
Proof. Follows from Propositions 1, 10 and 11; see Appendix A.1.1 and A.1.2.

Proposition 4. A minimal refinement function of the Gödel implication
where ε is an arbitrarily small positive number.
The proof is in Appendix 4. The bar plot in Figure 4(a) shows an example for the Gödel t-conorm with four literals. The minimal refined vector is represented by the orange boxes, while the initial and target values of the entire formula are represented as a blue and a purple line, respectively. Here, our goal is to increase the value of the t-conorm, i.e., the maximum value. Increasing other literals up to t̂_ϕ would require longer orange bars and larger values of the L_p norm. Figure 4(b) represents the case when multiple literals have the largest truth value. Here, only one should be increased. Finally, Figure 4(c) shows the refined vector for the Gödel t-norm. Since the smallest truth value should be at least t̂_ϕ, we simply ensure all truth values are at least t̂_ϕ.
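The t-conorm case of Figure 4(a) can be sketched as follows (our own helper, matching the intuition above):

```python
import numpy as np

def refine_goedel_tconorm(t, target):
    # Raise the Goedel disjunction max(t) to `target` by increasing
    # only (one of) the largest literals; any other choice would need
    # a larger change and hence a larger L_p norm.
    t = np.array(t, dtype=float)
    if t.max() < target:
        t[np.argmax(t)] = target
    return t
```

For t = [0.2, 0.5, 0.3] and target 0.8, only the largest literal moves, giving [0.2, 0.8, 0.3].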
Our results are closely related to those of [14], which considers hard constraints, i.e., t̂_ϕ = 1. In the hierarchical multi-label classification setting, the authors introduce an output layer that ensures predictions satisfy a set of hierarchy constraints. This layer corresponds to applications of the minimal refinement function for the Gödel implication with t̂_{R_G} = 1. Furthermore, [14] introduces C-HMCNN(h). This method considers an output layer that ensures predictions satisfy background knowledge expressed in a stratified normal logic program. The authors introduce an iterative algorithm that computes the minimal solution for such programs. This algorithm is related to ILR in Section 5. However, their formalization differs somewhat from ours, and future work could study whether these results also hold for our formalization of minimal refinement functions and whether they can be extended to any value of t̂_ϕ. Finally, [14] introduces a loss function that compensates for gradient bias introduced by the constrained output layer.

Lukasiewicz t-norm
In this section, we derive minimal refinement functions for the Lukasiewicz t-norm and t-conorm for the family of p-norms. We will use the following notation: t↑ refers to the truth values t_i sorted in ascending order, while t↓ refers to the truth values sorted in descending order.
The minimal refined vector of the Lukasiewicz t-norm increases all unclipped literals by an equal amount, clipping values at 1, where the number of unclipped literals K* is chosen such that the constraint is met; the t-conorm case is analogous. Although slightly obfuscated, these refinement functions simply increase each of the literals equally, while properly dealing with the constraints on the truth values. We explain this using Figure 5, where the optimal solution corresponds to a vector that, starting from the original truth values t, is perpendicular to the contour line of the operator at the value t̂_ϕ. Moreover, the figure also provides some intuition for our proofs: the stationary points of the Lagrangian correspond to the points where the constraint function (blue circumference) tangentially touches the contour line of the target value (orange line).
The change applied by the refinement function is proportional to the refinement value t̂. Computing these refinement functions requires finding K*, which can be done efficiently in log-linear time using a sort of the input truth values and a binary search.
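A sketch of the increasing t-norm case (our own implementation; for clarity it finds K* with a linear scan over the sorted values rather than the binary search mentioned above):

```python
import numpy as np

def refine_luk_tnorm(t, target):
    """Raise T_L(t) = max(0, sum(t) - (n-1)) to `target` with equal increases."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    need = target + (n - 1) - t.sum()   # total increase required on the sum
    if need <= 0:
        return t.copy()                 # already satisfied (increase-only sketch)
    order = np.argsort(-t)              # descending: clipping candidates first
    t_hat, clip_sum = t.copy(), 0.0
    for m in range(n):
        delta = (need - clip_sum) / (n - m)
        if t[order[m]] + delta <= 1.0:  # remaining n - m literals share delta equally
            t_hat[order[m:]] = t[order[m:]] + delta
            return t_hat
        t_hat[order[m]] = 1.0           # this literal hits the cap at 1
        clip_sum += 1.0 - t[order[m]]
    return t_hat                        # target infeasible: everything clipped to 1
```

For t = [0.9, 0.5, 0.4] and target 0.7, the largest literal is clipped to 1 and the remaining two are raised equally by 0.4, giving [1.0, 0.9, 0.8], whose truncated sum is exactly 0.7.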
The residuum of the Lukasiewicz t-norm is equal to its S-implication formed using S_L(1 − a, c), and so its minimal refinement function can be found using Proposition 2.
The Lukasiewicz logic is unique in containing large convex and concave fragments [13]. In particular, any CNF formula interpreted using the weak conjunction (Gödel t-norm) and the Lukasiewicz t-conorm is concave, allowing for efficient maximization of a slightly relaxed variant of the problem in Equation 7 using a quadratic program. [13] studies this property in a setting similar to ours in the context of collective classification. Future work could study using this convex fragment to find minimal refinement functions for more complex formulas.

Product t-norm
To present the three basic t-norms together, we give the closed-form refinement function for the product t-norm with the L1 norm. Our proof is a special case of the general results on a large class of t-norms we will discuss in Section 7. In particular, the product t-norm is a strict, Schur-concave t-norm with an additive generator. It is an example of a t-norm for which we can find a closed-form refinement function for the L1 norm using Propositions 15 and 1. First, we show the minimal refinement function for the product t-norm.
t̂*_i = μ if T_P(t, c) < t̂_{T_P} and t_i < μ, and t̂*_i = t_i otherwise, (13) where μ is a threshold chosen such that T_P(t̂*, c) = t̂_{T_P}. Next, we present the result for the product t-conorm. This refinement function increases all the literals smaller than a certain threshold up to the threshold itself, where we assume t̂_{T_P} is greater than the initial truth value. In fact, like the other t-norms in the class discussed in Section 7, it is similar to the Gödel t-norm in that it increases all literals below some threshold to the same value. Similarly, the refinement function for the t-conorm increases the highest literal. Figure 6 gives an intuition behind this behaviour.
Finally, the minimal refinement function for the residuum can be found analogously. We also studied the minimal refinement function for the L2 norm, but concluded that the result is a root of a polynomial of degree 2n with no simple closed-form solution. For details, see Appendix D.

A general class of t-norms with analytical minimal refinement functions
In this section, we introduce and discuss a general class of t-norms that admit analytic solutions to the problem in Equation 7, allowing us to find their corresponding minimal refinement functions. We can find those for the t-norm, the t-conorm, and the residuum.

Background on t-norms
To properly discuss this class of t-norms, we first provide some more background on the theory of t-norms.

Additive generators
The study of t-norms frequently involves the study of their additive generators [22,23], which are univariate functions that can be used to construct t-norms, t-conorms, and residuums.
A function g : [0, 1] → [0, ∞] with g(1) = 0 is an additive generator if it is strictly decreasing, right-continuous at 0, and if for all t_1, t_2 ∈ [0, 1], g(t_1) + g(t_2) is either in the range of g or in [g(0^+), ∞].
Using Equation 15, the function g acts like an invertible function. It transforms truth values into a new space that can be seen as measuring 'untruthfulness': Σ_{i=1}^n g(t_i) can be seen as a measure of the 'untruth' of the conjunction. T-norms constructed in this way are necessarily Archimedean, and each continuous Archimedean t-norm has an additive generator. T_P, T_L and T_D have an additive generator, but T_G and T_N do not. Furthermore, if g(0^+) = ∞, T is strict and we find T(t) = g^{-1}(Σ_{i=1}^n g(t_i)). The residuum constructed from continuous t-norms with an additive generator can be computed using g^{-1}(max(g(c) − g(a), 0)) [20].
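For the product t-norm, the generator construction can be sketched directly with g(t) = −ln(t) (function names are ours):

```python
import numpy as np

def g(t):      # additive generator of the product t-norm
    return -np.log(t)

def g_inv(x):  # inverse generator
    return np.exp(-x)

def tnorm(t):
    # strict t-norm from its generator: T(t) = g^{-1}(sum_i g(t_i))
    return float(g_inv(np.sum(g(np.asarray(t, dtype=float)))))

def residuum(a, c):
    # R_T(a, c) = g^{-1}(max(g(c) - g(a), 0))
    return float(g_inv(max(g(c) - g(a), 0.0)))
```

Since sums of −ln(t_i) correspond to products of t_i, tnorm([0.5, 0.4]) recovers 0.5 * 0.4 = 0.2, and the residuum recovers the familiar product residuum c/a when c < a and 1 otherwise.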

Schur-concave t-norms
We will frequently consider the class of Schur-concave t-norms, together with their dual t-conorms and the residuums formed from them. We denote by t↓ the truth vector t sorted in descending order, and by t↑ the vector t sorted in ascending order. Definition 8. A vector t ∈ R^n is said to majorize another vector u ∈ R^n, denoted t ⪰ u, if Σ_{i=1}^n t_i = Σ_{i=1}^n u_i and if for each k ∈ {1, ..., n} it holds that Σ_{i=1}^k t↓_i ≥ Σ_{i=1}^k u↓_i. A Schur-convex function f has that t ⪰ u implies f(t) ≥ f(u); similarly, a Schur-concave function has that t ⪰ u implies f(t) ≤ f(u).
The dual t-conorm of a Schur-concave t-norm is Schur-convex. The three basic continuous t-norms T_G, T_P and T_L are Schur-concave. There are also non-continuous Schur-concave t-norms, such as the nilpotent minimum [30,32]; the drastic t-norm is an example of a t-norm that is not Schur-concave [30]. This class includes all quasi-concave t-norms, since all symmetric quasi-concave functions are also Schur-concave [26, p. 98, C.3]. Therefore, this class constitutes a significant class of relevant t-norms. For a more precise characterization of Schur-concave t-norms, see [2,30].
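The majorization condition of Definition 8 is straightforward to check numerically; the following is a minimal sketch (the function name and tolerance are ours):

```python
def majorizes(t, u, tol=1e-9):
    # t majorizes u: equal total sums, and every partial sum of the
    # descending sort of t dominates the corresponding partial sum of u.
    if abs(sum(t) - sum(u)) > tol:
        return False
    t_desc = sorted(t, reverse=True)
    u_desc = sorted(u, reverse=True)
    partial_t = partial_u = 0.0
    for a, b in zip(t_desc, u_desc):
        partial_t += a
        partial_u += b
        if partial_t < partial_u - tol:
            return False
    return True
```

For example, [1, 0] majorizes [0.5, 0.5] but not vice versa, matching the intuition that majorization orders vectors from "spread out" to "even".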

Minimal refinement functions for Schur-concave t-norms
We now have the background to discuss several useful and interesting results on Schur-concave t-norms. First, we present two results that characterize Schur-concave minimal refinement functions. We use the notion of "strictly cone-increasing" functions here, which is discussed in Appendix B.1.
Theorem 2. Let T be a Schur-concave t-norm that is strictly cone-increasing at t̂_T and let ‖·‖ be a strict norm. Then there is a minimal refined vector t* for t and t̂_T such that whenever t_i > t_j, then t*_i − t_i ≤ t*_j − t_j. For a proof, see Appendix C.1. We note that we can make this argument in the other direction to show that any Schur-convex t-conorm will have a minimal refined vector such that t_i > t_j implies t*_i ≥ t*_j. Furthermore, if we know that a t-norm has a unique minimal refinement function, we can use this theorem to infer a useful ordering on how it changes the truth values.
Next, we consider the L1 norm Σ_{i=1}^n |t̂_i − t_i|, for which we can find general solutions for the t-norm, t-conorm and R-implication when the t-norm is Schur-concave. Proposition 6. Let t ∈ [0, 1]^n and let T be a Schur-concave t-norm that is strictly cone-increasing at t̂_T ∈ [T(t, c), max T]. Then there is a value λ ∈ [0, 1] such that the vector t* with t*_i = max(t_i, λ) is a minimal refined vector for T and the L1 norm at t and t̂_T.
For a proof, see Appendix C.2. We found this result rather surprising: for a large class of t-norms and the L1 norm, it is optimal to increase the lower truth values to some shared value λ. In this sense, these solutions are very similar to those of the Gödel refinement functions. The value of λ depends on the choice of t-norm, and T(t*, c) is a non-decreasing function of λ. We show in Section 8.3 how to compute these values.
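A minimal sketch of Proposition 6, instantiated (as an assumption, for illustration) with the product t-norm: values below λ are raised to λ, and since T of the clamped vector is non-decreasing in λ, a suitable λ for a given target value can be recovered by bisection:

```python
import math

def refine_l1(t, target, tol=1e-12):
    """Raise all truth values below lam up to lam, with lam chosen by
    bisection so the product t-norm of the result reaches the target.
    Assumes target >= prod(t); the product t-norm is our example choice."""
    def tnorm_at(lam):
        return math.prod(max(ti, lam) for ti in t)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tnorm_at(mid) < target:
            lo = mid  # need a larger lam to reach the target
        else:
            hi = mid
    lam = (lo + hi) / 2
    return [max(ti, lam) for ti in t]
```

Values already above λ stay untouched, while all raised values share the same λ, matching the "shape" described in the proposition.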
We have a similar result, proved at the end of Appendix C.2, for the refinement functions of Schur-convex t-conorms. This proposition shows that, under the L1 norm, it is optimal to increase only the largest literal, just like with the Gödel t-conorm. Proposition 7. Let t ∈ [0, 1]^n and let S be a Schur-convex t-conorm that is strictly cone-increasing at t̂_S ∈ [S(t, c), 1]. Then there is a value λ ∈ [0, 1] such that the vector t*, equal to t except that its largest element is raised to λ, is a minimal refined vector for S and the L1 norm at t and t̂_S.

Closed forms using Additive Generators
Where the previous section gives general results on the form or "shape" of minimal refinement functions for t-norms and t-conorms under the L1 norm, we still need to determine the value of λ for a particular t̂_ϕ. Luckily, additive generators do the job here.
Proposition 8. Let T be a Schur-concave t-norm with additive generator g and let 0 < t̂_T ∈ [T(t, c), max T]. Let K ∈ {0, ..., n − 1} denote the number of truth values such that t*_i = t_i in Equation 28. Then using λ_K = g⁻¹((g(t̂_T) − Σ_{i=1}^K g(t↓_i))/(n − K)) in Equation 28 gives a minimal refined vector. See Appendix C.2 for a proof. g(t̂_T) can be seen as the 'untruth' value in g-space that t* should attain. Since we have n − K truth values that we can move freely, we need to make sure that their 'untruth' value in g-space is g(t̂_T)/(n − K). However, we also need to handle the truth values we cannot change freely, which is why those are subtracted from g(t̂_T).
We should note that this does not yet give a procedure for computing the correct K ∈ {0, ..., n − 1}. The intuition is that we should find a K such that t_i ≥ λ_K for the K largest values and t_i < λ_K for the remaining n − K. As with computing K* for the refinement function of the Lukasiewicz t-norm (Section 7.2.2), we can do this in logarithmic time after sorting t, but we choose to compute λ_K for each K ∈ {0, ..., n − 1} in parallel.
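The procedure sketched above — computing λ_K for each candidate K and keeping the consistent one — can be illustrated for the product t-norm, whose additive generator we assume to be g(t) = −log t (the function names are ours):

```python
import math

def g(t):
    return -math.log(t)

def g_inv(y):
    return math.exp(-y)

def find_lambda(t, target):
    """Closed-form lambda for the product t-norm under the L1 norm.
    The K largest truth values stay fixed; the remaining n - K are raised
    to lam_K, chosen so that the generator values sum to g(target)."""
    ts = sorted(t, reverse=True)
    n = len(ts)
    for K in range(n):
        kept = sum(g(ti) for ti in ts[:K])  # untruth of the fixed values
        lam = g_inv((g(target) - kept) / (n - K))
        # Consistency check: fixed values lie above lam, raised ones below.
        if all(ti >= lam for ti in ts[:K]) and all(ti < lam for ti in ts[K:]):
            return lam
    return None
```

In a vectorized implementation all λ_K can be evaluated at once, which is the parallel computation mentioned in the text.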
We can similarly find a closed form for the t-conorms. Finally, we turn to the residuum: Proposition 9. Let t_1, t_2 ∈ [0, 1], let T be a strict Schur-concave t-norm with additive generator g, and consider its residuum R. Then t* = [t_1, g⁻¹(g(t̂_R) + g(t_1))] is a minimal refined vector for R and the L1 norm at t and t̂_R.
Here, we find that for this class of residuums, increasing the consequent (the second argument of the implication) is minimal for the L1 norm. This update reflects modus ponens reasoning: when the antecedent is true, increase the consequent. As we have argued in [32], this could cause issues in many machine learning setups. Consider instead the modus tollens correction, which decreases the antecedent: when the consequent is false, decrease the antecedent. For common-sense knowledge, this is more likely to reflect the true state of the world.

Experiments
We performed experiments on two tasks. The first does not involve learning: instead, we aim to solve SAT problems. This experiment allows us to assess whether ILR can enforce complex and unstructured knowledge. The second experiment is on the MNIST Addition task [25], to test ILR in a neuro-symbolic setting and assess its ability to learn from data.

Experiments on 3SAT problems
With these experiments, our goal is to find out how quickly ILR finds a refined vector and how minimal this vector is. We test this on formulas of varying complexity to analyze for which problems each algorithm performs well.

Fig. 7 Comparison of ILR with ADAM on uf20-91 in SATLIB, with refined value 1.0. The x-axis corresponds to the number of iterations, while the y-axis shows the value of t̂_ϕ in the first row of the grid and the L1 norm in the second row.
We found that ADAM [21] significantly outperformed standard gradient descent on all metrics, and we chose to use it throughout our experiments. Furthermore, inspired by the analysis of the derivatives of aggregation operators in [32], we slightly change the formulation of the loss function for the Lukasiewicz and product t-norms. The Lukasiewicz t-norm has exactly zero gradients on most of its domain; therefore, we remove the max operator when evaluating the conjunction of clauses in the SAT formula so that it has nonzero gradients. For the product t-norm, the gradient also approaches 0 because of the large set of numbers in [0, 1] that it multiplies. As suggested by [32], we instead optimize the logarithm of the product t-norm.
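The gradient argument can be seen in a small sketch (assumed shapes, not the paper's code): the raw product of many clause values in [0, 1] vanishes, while its log-space surrogate keeps well-scaled gradients:

```python
import math

def product_sat(ts):
    # Direct product semantics: the value (and its gradient
    # d/dt_i prod_j t_j = prod_{j != i} t_j) shrinks with every clause.
    return math.prod(ts)

def log_product_sat(ts, eps=1e-12):
    # Log-space surrogate suggested in [32]: same maximizers, but the
    # gradient d/dt_i sum_j log t_j = 1 / t_i is independent of the
    # number of clauses. eps guards against log(0).
    return sum(math.log(max(ti, eps)) for ti in ts)
```

With 20 clauses at value 0.5, the raw product is below 1e-5 while the log form stays at a comfortable magnitude, which is why the log formulation is preferred for gradient-based optimization.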

Results
In Figure 7, we show the results obtained by ILR and ADAM for the three t-norms (one for each column of the grid). We observe that ILR with schedule parameter α = 0.1 has a smoother plot than ILR with α = 1.0, which converges faster: in our experiments, the number of steps until convergence was always between 2 and 5. For both values of the scheduling parameter, ILR outperforms ADAM in terms of convergence speed. When comparing satisfaction and minimality, the behavior differs based on the t-norm used. In the case of Lukasiewicz, all methods find feasible solutions to the optimization problem. Furthermore, in terms of minimality (i.e., the L1 norm), ILR finds better solutions than ADAM.
For the Gödel logic, no method is capable of reaching a feasible solution. Here, ILR with schedule parameter α = 1 performs very poorly, obtaining worse solutions than the original truth values. On the other hand, with α = 0.1, it performs as well as ADAM on both metrics but converges faster. Finally, for the product logic, ILR fails to increase the satisfaction of the formula to the refined value. ADAM finds much better solutions, getting the average truth value to around 0.5, though it is still far from a feasible solution. Hence, we recommend using ADAM for complicated formulas in the product logic.
However, we argue that in the context of neuro-symbolic integration, the provided knowledge is usually relatively easy to satisfy. With 91 clauses, there are few satisfying assignments in the space of 2^20 possible binary assignments; background knowledge usually does not constrain the space of possible solutions this heavily. For this reason, we propose a simplified formula, where we use only 20 of the 91 clauses. Figure 8 shows the results for this setting. We see that ILR with no scheduling (α = 1) finds feasible solutions for all t-norms. ILR finds solutions for the Gödel t-norm where ADAM cannot find any, while for Lukasiewicz and product, it finds solutions in far fewer iterations and with a lower L1 norm. Hence, we argue that for knowledge bases that are less constraining, ILR without scheduling is the best choice.

Experiments on MNIST Addition
The experiments on the SATLIB benchmark show how well ILR can enforce knowledge in highly constrained settings. However, as already mentioned, in neuro-symbolic AI the background knowledge is typically much simpler: SAT benchmarks often have only a few solutions, which would heavily limit what predictions the neural network can make. Moreover, the previous experiments only tested ILR on random initial truth vectors, without any neural networks or learning.
To evaluate the performance of ILR in neuro-symbolic settings, we implemented the architecture of Figure 3. Here, the task is to learn a classifier for handwritten digits while only receiving supervision on the sums of pairs of digits.

Table 2 Results on the MNIST Addition task. We report the accuracy of predicting the sum (in %) on the test set with 30000 and 3000 samples. DeepProbLog results are taken from [3]. LTN results have been obtained by replicating the experiments of [3].

                    30000          3000
DeepProbLog [25]    97.20 ± 0.45   92.18 ± 1.57
LTN [3]             96.78 ± 0.5    92.15 ± 0.75
ILR                 96.67 ± 0.45   93.38 ± 1.70

Setup
We follow the architecture of Figure 3. We use the neural network proposed by [25], which is composed of two convolutional layers, followed by a MaxPool layer, a fully connected layer with ReLU activation, and a fully connected layer with softmax activation. We use the Gödel t-norm and the corresponding minimal refinement functions. Note that the Gödel implication can only increase the consequent and never decreases the antecedents. For this reason, ILR converges in a single step.
We set both α and the target value t̂ to one, meaning that we ask ILR to make the entire formula completely satisfied in one step. We use the ADAM optimizer with a learning rate of 0.01 and the cross-entropy loss function. However, since the outputs of the ILR step do not sum to one, we cannot directly apply the cross-entropy loss to the refined vector ILR computes. To overcome this issue, we add a logarithm followed by a softmax as the last layers of the model. If the sum of the refined vector is one, the composition of the logarithm and softmax functions corresponds to the identity function. Moreover, these two layers are monotonically increasing functions and preserve the order of the refined vector.
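The logarithm-then-softmax normalization described above can be sketched as follows (a plain-Python illustration of the identity property, not the paper's implementation):

```python
import math

def log_then_softmax(t, eps=1e-12):
    """softmax(log(t))_i = t_i / sum_j t_j: the identity when t sums to
    one, and an order-preserving renormalization otherwise.
    eps guards against log(0)."""
    logits = [math.log(max(ti, eps)) for ti in t]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

For a vector that already sums to one, the output equals the input; otherwise the values are rescaled by their sum, preserving the ordering of the refined truth values.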
We use the dataset defined in [25] with 30000 samples, and also run the experiment using only 10% of the dataset (3000 samples). We run ILR for 5 epochs on the complete dataset and 30 epochs on the small one, repeating each experiment 10 times. We are interested in the accuracy obtained on the test set for the addition task. We ran the experiments on a MacBook Pro (2016) with a 3.3 GHz Dual-Core Intel Core i7.

Results
ILR can efficiently learn to predict the sum, reaching results similar to the state of the art while requiring on average 30 seconds per epoch. However, ILR sometimes got stuck in a local minimum during training, where the accuracy reached was close to 50%. It is worth noticing that LTN suffers from the same problem [3], with results strongly dependent on the initialization of the parameters. To better understand this local minimum, we analyzed the confusion matrix. Figure 9 shows one of the confusion matrices for a model stuck in the local minimum: the CNN recognizes each digit as either the correct digit minus one or plus one. Then, our model obtains the correct prediction in close to 50% of the cases. For example, suppose the digits are a 3 and a 5. The 3 is classified as either a 2 or a 4, while the 5 is classified as a 4 or a 6. In half of these combinations, the predicted sum is the correct 8 (2 + 6 and 4 + 4); otherwise, it is not. We believe that in these local minima there is no way for the model to change the digit predictions without increasing the loss, so the model remains stuck.
Table 2 shows the results in terms of accuracy for ILR, LTN [3] and DeepProbLog [25]. To calculate the accuracy, we follow [3] and select only the models that do not get stuck in the local minimum. Notice that this problem is very rare for ILR (once every 30 runs) and happens more frequently with LTN (once every 5 runs).

Conclusion and Future Work
We analytically studied a large class of minimal fuzzy refinement functions and used them to construct ILR, an efficient algorithm for general formulas. Another benefit of these analytical results is a good intuition into what kind of corrections each t-norm makes. In our experimental evaluation, we found that ILR converges much faster and often finds better solutions than the ADAM baseline, especially for problems that are less constraining; however, for complicated formulas in the product logic, we conclude that ADAM finds better results. Finally, we assessed ILR on the MNIST Addition task and showed it can be combined with a neural network, providing results similar to two of the most prominent methods for neuro-symbolic AI.
There are many opportunities for future work on refinement functions. We will study how the refinement functions induced by different t-norms perform in practical neuro-symbolic integration settings. On the theoretical side, possible future work includes analytical refinement functions for certain classes of complex formulas; furthermore, there are many classes of t-norms and norms for which finding analytical refinement functions is an open problem. Another promising avenue for research is designing specialized loss functions that handle biases in the gradients arising from combining constrained output layers with cross-entropy loss functions [14]. We also want to highlight the possibility of extending the work on fuzzy refinement functions to probabilistic refinement functions, using a notion of minimality such as the KL-divergence.
then necessarily t̂_i > t*_i, or t̂_i ≥ t̂_T_G but t̂_i = t*_i = t_i. In either case, since ‖·‖_p is strictly convex in each argument with minimum at t, ‖t̂ − t‖_p > ‖t* − t‖_p, hence t̂ could not have a smaller norm.

A.1.2 Gödel t-conorm
A derivation for increasing the Gödel t-conorm was first presented in [7] and is adapted to our notation here. Proposition 11. The minimal refinement function of the Gödel t-conorm for t̂_S_G increases only the largest literal to t̂_S_G.

A.1.3 Gödel Implication
We next present a proof for Proposition 4.
Proof First, assume t̂_R_G < 1. To ensure R_G(t_1, t_2) = t̂_R_G, we require t_2 = t̂_R_G, as is clear from the definition. However, we also require t_1 > t̂_R_G. If t_1 is already larger, we can leave it unchanged to ensure minimality. Otherwise, we require it to be at least infinitesimally larger, that is, t̂_R_G + ε.
Next, assume t̂_R_G = 1. If t_1 ≤ t_2, then the implication is already 1 and we do not need to revise anything. Otherwise, setting both truth values equal to any value between t_2 and t_1 is minimal.

A.2 Lukasiewicz t-norm minimal refined function proofs
Then the minimal refinement vector of the Lukasiewicz t-norm is as follows. Proof We prove this using the KKT conditions, which are both necessary and sufficient for minimality for the Lukasiewicz t-norm, since it is affine when the max constraint is not active. We drop the p-th root in the norm, since it is a strictly monotonically increasing function. From the Lagrangian and its derivative, we note that we can drop the absolute values, since T_L is a strictly monotonically increasing function and t̂_T_L ≥ T_L(t, c). Assuming t̂_T_L > 0, T_L(t̂, c) = t̂_T_L can only hold if the first argument of the max is chosen. Then for all i, j ∈ {1, ..., n}, p(t̂_i − t_i)^(p−1) + γ_i = p(t̂_j − t_j)^(p−1) + γ_j. Define I as the set of the K* smallest t_i.

A.2.2 Lukasiewicz t-conorm
Proposition 13. The minimal refinement function of the Lukasiewicz t-conorm for t̂_S_L is as follows. Proof We do not add multipliers for the constraints on t̂_i, and instead show that critical points adhere to these constraints. Note that max S_L = 1. Taking the derivative of the Lagrangian with respect to t̂_i and assuming t̂_S_L ≥ S_L(t), this gives three cases for all i ∈ {1, ..., n}: 1. If t̂_S_L = S_L(t, c), then δ = 0, and so t̂_i = t_i.
2. If ‖t‖_1 + ‖c‖_1 ≥ 1, then min S_L = max S_L = 1, and again t̂_i = t_i.
3. Otherwise, it must be that the first argument of the min is chosen. Since the equality holds for all i ∈ {1, ..., n}, we find p(t̂_i − t_i)^(p−1) = p(t̂_j − t_j)^(p−1) for all i, j ∈ {1, ..., n}. As we are only interested in real non-negative solutions, we find that t̂_i − t_i = t̂_j − t_j = δ. Since ‖t̂‖_1 + ‖c‖_1 = ‖t‖_1 + ‖c‖_1 + nδ = t̂_S_L, we find δ = (t̂_S_L − S_L(t, c))/n. Note that t̂_i ≥ t_i, since by assumption t̂_S_L ≥ S_L(t, c), and t̂_i ≤ 1 since t̂_S_L ≤ 1; that is, the constraints of Equation 7 are satisfied.

B Dual Problem
This section introduces a dual problem to Equation 7, which is used extensively in several proofs.
A fuzzy evaluation operator f_ϕ is strictly cone-increasing at t ∈ [0, 1]^n if there is a nonempty cone K(t) such that t′ − t ∈ K(t) implies f_ϕ(t) < f_ϕ(t′).
vectors and continuous (since it is a norm), necessarily there is some s > 0 such that t̂ + ε(s) = u. However, this is in contradiction with the premise that t* is a solution of Equation 26, as f_ϕ(t̂ + ε(s)) > f_ϕ(t*).

C Schur-concave t-norms (Proofs)
C.1 Minimal refinement function for t-norms Theorem 4. Let T be a Schur-concave t-norm that is strictly cone-increasing at t̂_T and let ‖·‖ be a strict norm. Then there is a minimal refined vector t* for t and t̂_T such that whenever t_i > t_j, then t*_i − t_i ≤ t*_j − t_j.
Proof Assume there is a minimal refined vector t̂ ≠ t* which has some t̂_i − t_i > t̂_j − t_j while t_i > t_j. Consider t′ equal to t̂ except that t′_i = t̂_j − t_j + t_i and t′_j = t̂_i − t_i + t_j, such that by symmetry ‖t̂ − t‖ = ‖t′ − t‖. Define t_max = max(t′_i, t′_j) and t_min = min(t′_i, t′_j).
Clearly, t̂_i > t_max ≥ t_min > t̂_j. We will show that t̂ majorizes t′ by checking the condition of Definition 8 for every k ∈ {1, ..., n}.
1. If t̂↓_k > t̂_i, then the first k elements of t̂↓ and t′↓ coincide and the partial sums are equal. Therefore, t̂ majorizes t′, and so by Schur-concavity, T(t̂, c) ≤ T(t′, c), noting that the additional truth vector c does not influence the majorization result, since it appears on both sides. By Theorem 3, either 1) T(t̂, c) < T(t′, c), so t̂ could not have been minimal, leading to a contradiction, or 2) T(t̂, c) = T(t′, c) and both t̂ and t′ are minimal.

C.2 Closed-form refinement function using additive generators
Proposition 15. Let T be a Schur-concave t-norm with additive generator g and let 0 < t̂_T ∈ [T(t, c), max T]. Let K ∈ {0, ..., n − 1} denote the number of truth values such that t*_i = t_i in Equation 28. Then using λ_K = g⁻¹((g(t̂_T) − Σ_{i=1}^K g(t↓_i))/(n − K)) in Equation 28 gives a minimal refined vector. Proof Using Equations 15 and 28, we find that, since t̂_T > 0, we can remove the min. We apply g to both sides of the equation, which is allowed since g is a bijection, and solve for g(λ_K); in the last step we apply g⁻¹.
In a similar manner, we can find the λ for the t-conorm. Let j = arg max_{i=1}^n t_i.

If the expression for the t-conorm is well defined, then we can ignore the min, and the resulting vector is a minimal refined vector for T and the L1 norm at t and t̂_T. Proof Assume otherwise. Then, using Theorem 3, there must be a refined vector t̂ such that ‖t̂ − t‖_1 = ‖t* − t‖_1 but T(t̂, c) > T(t*, c). Since t̂_T ∈ [T(t, c), max T], we can assume t̂_i ≥ t_i. We define π*(i) as the permutation sorting t* in descending order. Furthermore, let k be the smallest j such that t*↓_j < λ. Since ‖t̂‖_1 = ‖t*‖_1 by the assumption of equal L1 norms of t̂ and t*, we will prove for all i ∈ {1, ..., n} that t̂ majorizes t*.
The first inequality follows from the fact that no ordering of t̂ has a larger partial sum than the descending order.
We will distinguish two cases:
Proof We assume t_1 > t_2, as otherwise R(t_1, t_2) = 1 for any residuum, which necessarily means t̂_R = 1 and so t* = t. Assume t* is not minimal. Since R is strictly cone-increasing at t̂_R, by Theorem 3 there must be some t̂ such that ‖t̂ − t‖ = ‖t* − t‖ = λ − t_2 but R(t̂_1, t̂_2) > R(t*_1, t*_2). Since R is non-increasing in the first argument and non-decreasing in the second, we consider t̂ = [t_1 − ε, λ − ε] for ε > 0.

D Product t-norm with L2 norm
In this appendix, we consider the refinement functions for the product t-norm under the L2 norm. We find that there is no simple closed-form parameterization in terms of t̂_ϕ, but we can find approximations in linear time. These are satisfactory for reliably finding the minimal refinement function.
In the following, we will ignore constants, consider formulas ⋀_{i=1}^n P_i, and consider the problem in Equation 7. We consider the logarithm of the product, as its optimum coincides with that of the product. The Lagrangian is L = Σ_i (t̂_i − t_i)² + λ(Σ_i log t̂_i − log t̂_T_P) − Σ_i γ_i(t̂_i − 1). Since the stationarity condition holds for all i, we find that for all i, j, (γ_i + 2t_i − 2t̂_i)t̂_i = (γ_j + 2t_j − 2t̂_j)t̂_j = λ. We partition {1, ..., n} into sets I and M, where I contains all i such that t̂_i < 1 and M those where t̂_i = 1. For i ∈ I, using the complementary slackness condition γ_i = 0, this induces a quadratic equation in t̂_i with solutions t̂_i = ½(t_i ± √(t_i² − 2λ)).
Since we assume t̂_i ≥ t_i, we must take the solution that adds the root of the discriminant, that is, t̂_i = ½(t_i + √(t_i² − 2λ)). Furthermore, since we constrain t̂_i < 1 for i ∈ I, we require for all i ∈ I that λ > 2t_i − 2, and so λ > min_i 2t_i − 2. Unfortunately, finding the exact value of λ such that T_P(t̂) = t̂_T_P is a challenge. Filling in t̂_i, we obtain a 2n-th degree polynomial in λ, for which we were not able to find an obvious, general closed-form solution. Mathematica [19] finds a complicated closed-form formula for n = 2, but cannot find closed-form formulas for n > 2.
We also still need to determine how to partition {1, ..., n} into I and M. Since t̂_i as computed by Equation 32 is a strictly decreasing function of λ for all i ∈ I, we have the following unproven proposition, which supports the result given in Theorem 4.
Proposition 20. For all λ ∈ [min_i 2t_i − 2, 0], the function ρ*_T_P(t, λ) has the following properties: 1. ρ*_T_P(t, λ) is a minimal refinement vector for the product t-norm, the L2 norm and t̂_T_P = T_P(ρ*_T_P(t, λ)); 2. t̂_T_P = T_P(ρ*_T_P(t, λ)) is a strictly decreasing function of λ on (min_i 2t_i − 2, 0], and so there is a bijection between λ and t̂ ∈ [T_P(t), 1] on this interval.
The second property is easy to see by noting that the derivative of ρ*_T_P(t, λ) is negative on λ ∈ (min_i 2t_i − 2, 0], but for the first we do not yet have a direct proof and leave this for future work. Although ρ*_T_P(t, λ) is not parameterized in terms of t̂_T_P, it can still be used in practical scenarios, where λ can be seen as the negative "confidence" in the clause. A practical implementation could learn a weight for the clause between 0 and 1, and then transform it to the domain of λ by multiplying by min_i 2t_i − 2. Alternatively, (T_P(ρ*_T_P(t, λ)) − t̂_T_P)² can be minimized with respect to λ using mathematical optimization methods like gradient descent or Newton's method to find answers in terms of t̂_T_P.
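The last suggestion can be sketched with a simple bisection instead of Newton's method, exploiting property 2 of Proposition 20. The function names are ours, and clipping Equation 32's refined values to 1 is our shortcut around the explicit I/M partition:

```python
import math

def refined(t, lam):
    # Equation 32: t_i_hat = (t_i + sqrt(t_i^2 - 2*lam)) / 2,
    # clipped to at most 1 (our approximation of the M set).
    return [min(1.0, 0.5 * (ti + math.sqrt(ti * ti - 2.0 * lam))) for ti in t]

def solve_lambda(t, target, tol=1e-12):
    """Recover lam for a given target T_P value by bisection, using the
    fact that T_P(refined(t, lam)) is strictly decreasing in lam."""
    lo = min(2.0 * ti - 2.0 for ti in t)  # most negative lam: all values hit 1
    hi = 0.0                              # lam = 0 leaves t unchanged
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if math.prod(refined(t, mid)) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because the mapping from λ to the t-norm value is a bijection on this interval, bisection is guaranteed to converge for any target in [T_P(t), 1].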

E Additional experiments
In this appendix, we present additional experiments for the case where t̂_ϕ is not 1.

E.1 Results -Refined value 0.3
The figures in this section present the results when the refined value tϕ = 0.3.

Fig. 1
Fig. 1 Comparing different approaches for constraining neural networks with background knowledge. Loss-based approaches include LTN, SBR, and Semantic Loss, while KENN, C-HMCNN(h), and SBR-CC are representative of approaches based on refinement functions.

Fig. 2
Fig. 2 Visualization of one step of ILR for Gödel logic and the formula φ = ¬A ∧ (B ∨ C). In the forward pass (left), ILR computes the truth value of φ. In the backward pass (right), ILR traverses the computational graph of the forward step in reverse to calculate the refined vector t̂, substituting each fuzzy operator of the forward pass with the corresponding refinement function. Each refinement function receives as input the initial truth values used by the fuzzy operator in the forward step (purple lines) and the target value for the corresponding subformula. The scheduler, which ILR calls between the forward and backward steps, calculates the target value t¬A∧(B∨C) for the entire formula.
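The forward and backward passes in the caption can be sketched in a few lines. The following is a minimal illustration (our own naming, not the authors' implementation), using the Gödel operators min and max together with their minimal refinement functions, for φ = ¬A ∧ (B ∨ C):

```python
import numpy as np

# One ILR step for phi = NOT A AND (B OR C) under Goedel semantics.
# Forward: evaluate the formula; backward: push a target truth value
# back through min/max/negation via the minimal refinement functions.

def refine_min(args, target):
    """Goedel t-norm refinement: make min(args) equal target with minimal change."""
    args = np.asarray(args, dtype=float)
    if target >= args.min():
        return np.maximum(args, target)   # raise every value below the target
    out = args.copy()
    out[np.argmin(out)] = target          # lower a single minimal value
    return out

def refine_max(args, target):
    """Goedel t-conorm refinement: make max(args) equal target with minimal change."""
    args = np.asarray(args, dtype=float)
    if target <= args.max():
        return np.minimum(args, target)   # lower every value above the target
    out = args.copy()
    out[np.argmax(out)] = target          # raise a single maximal value
    return out

def ilr_step(a, b, c, target):
    # forward pass: truth value of phi
    not_a = 1.0 - a
    b_or_c = max(b, c)
    phi = min(not_a, b_or_c)
    # backward pass: refine sub-formulas towards their targets
    new_not_a, new_b_or_c = refine_min([not_a, b_or_c], target)
    new_a = 1.0 - new_not_a               # negation refinement is again 1 - x
    new_b, new_c = refine_max([b, c], new_b_or_c)
    return new_a, new_b, new_c, phi

a, b, c, phi = ilr_step(0.8, 0.3, 0.4, target=0.9)
```

With these inputs, a target of 0.9 for φ pushes both ¬A and B ∨ C up to 0.9, which lowers A to 0.1 and raises the larger of B, C.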

Fig. 3
Fig. 3 Neuro-symbolic architecture based on ILR for the MNIST addition task. A CNN takes as input two images of MNIST digits and returns their classifications. The predictions of the CNN are concatenated with a vector of zeros representing the initial prediction for the addition task. We then perform an ILR step to update the sum of the two numbers, which is the final output of the model.
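The input construction described in the caption can be sketched as follows (shapes and names are our assumptions; the CNN outputs are random stand-ins):

```python
import numpy as np

# Sketch of the input vector for the MNIST-addition architecture:
# two 10-way digit predictions from the CNN, concatenated with 19
# zeros as the initial truth values for the possible sums 0..18,
# before an ILR step refines the sum prediction.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

digit1 = softmax(np.random.randn(10))   # CNN output for the first image
digit2 = softmax(np.random.randn(10))   # CNN output for the second image
sum_init = np.zeros(19)                 # initial prediction for Sum = 0..18
t = np.concatenate([digit1, digit2, sum_init])  # truth vector passed to ILR
```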

Fig. 4
Fig. 4 Gödel minimal refinement functions. The grey bars represent the initial truth vectors t; the light blue and purple lines indicate the initial truth value of the formula and the revision value tϕ; the orange bars are the corresponding minimal refined vectors. (a) t-conorm; (b) t-conorm with two literals with the same truth value; (c) t-norm.

Fig. 5
Fig. 5 Łukasiewicz minimal refinement functions. The orange line corresponds to the contour line of S_L and T_L at the value tϕ. The dotted blue circumference corresponds to a set of points at equal distance from t. (a) t-conorm; (b) t-norm; (c) t-norm in the limit case.
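Since the Łukasiewicz operators T_L(t) = max(0, ∑ t_i − (n − 1)) and S_L(t) = min(1, ∑ t_i) have hyperplane level sets away from the clipping region, the L2-closest refined vector shifts every coordinate equally, matching the circular contours in the figure. A minimal sketch (our naming; the limit case of panel (c) is not handled):

```python
import numpy as np

def refine_lukasiewicz(t, target, tnorm=True):
    """L2-minimal refinement sketch for the Lukasiewicz operators.

    Away from the clipping region the level sets are hyperplanes
    sum(t) = const, so the orthogonal projection moves every
    coordinate by the same amount. Boundary cases are not handled."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    required_sum = target + (n - 1) if tnorm else target
    delta = (required_sum - t.sum()) / n   # equal shift = projection step
    return np.clip(t + delta, 0.0, 1.0)

refined = refine_lukasiewicz([0.7, 0.8], 0.7, tnorm=True)   # -> [0.8, 0.9]
```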

Fig. 6
Fig. 6 Product minimal refinement functions. The grey areas represent the truth value of the operator associated with the initial vector t. Red and blue areas represent the refined values when increasing a single literal. (a) t-conorm; (b) t-norm; (c) t-norm when multiple literals have the same truth value. The green area represents the improvement obtained by increasing both literals equally.
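The panels suggest a water-filling behaviour for the product t-norm: raising the smallest literal is most effective, and tied literals are raised together. A sketch under that assumption (our naming; the common level λ is found by bisection):

```python
import numpy as np

def refine_product_tnorm(t, target):
    """Water-filling sketch for raising prod(t) to `target`:
    raise the smallest coordinates to a common level lam such that
    prod(max(t, lam)) == target, locating lam by bisection."""
    t = np.asarray(t, dtype=float)
    assert np.prod(t) <= target <= 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):                    # bisection on lam in [0, 1]
        lam = 0.5 * (lo + hi)
        if np.prod(np.maximum(t, lam)) < target:
            lo = lam
        else:
            hi = lam
    return np.maximum(t, 0.5 * (lo + hi))
```

For t = [0.5, 0.8] and target 0.6 only the smaller literal moves (to 0.75); for target 0.72 both literals end up at the same level, as in panel (c).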

Fig. 9
Fig. 9 Confusion matrix for MNIST classification at a local minimum

Fig. 10
Fig. 10 Comparison of ILR with ADAM on uf20-91 of SATLIB (refined value 0.3). The x-axis corresponds to the number of iterations, while the y-axis shows the value of tϕ in the first row of the grid and the L1 norm in the second row.
Definition 6 (Minimal refinement function). Let ρ* be a refinement function for the operator fϕ. ρ* is a minimal refinement function with respect to some norm ‖·‖ if, for each t ∈ [0, 1]^n and tϕ ∈ [minϕ, maxϕ], there is no refined vector t̂ for tϕ such that ‖ρ*(t, tϕ) − t‖ > ‖t̂ − t‖.
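Definition 6 can be checked numerically on a small example. The sketch below (example values and grid resolution are ours) verifies that, for the Gödel t-norm min, no grid point reaching the target truth value is closer in L2 than the refined vector that raises every coordinate below the target:

```python
import itertools
import numpy as np

# Brute-force illustration of Definition 6 in two dimensions:
# among all grid points whose Goedel t-norm (min) hits the target,
# none is closer in L2 to t than the candidate refined vector.

t = np.array([0.3, 0.8])
target = 0.6
candidate = np.maximum(t, target)        # raise coordinates below the target

grid = np.linspace(0.0, 1.0, 101)
best = min(
    np.linalg.norm(np.array([x, y]) - t)
    for x, y in itertools.product(grid, grid)
    if abs(min(x, y) - target) < 1e-9    # refined vectors for the target
)
# `best` cannot beat the candidate's distance (up to grid resolution)
```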
since by removing common terms we get t_i > t_max. 4. If t_min ≥ t↓_k > t_j, then, removing all common terms in the sums, we are left with t_i + t↓_k > t_min + t_max. Note that t_min + t_max = t_i + t_j, since the correction terms (t_i − t_j) and (t_j − t_i) cancel.
Minimal refinement functions for t-norms

Proposition 16. Let t ∈ [0, 1]^n and let T be a Schur-concave t-norm that is strictly cone-increasing at tT ∈ [T(t, c), maxT]. Then there is a value λ ∈ [0, 1] such that the vector t* obtained by raising the coordinates of t lying below λ up to λ is a minimal refined vector for T and the L1 norm at t and tT. In the proof, assuming a closer refined vector t̂ exists leads to a contradiction: t̂ majorizes t*, and by Schur-concavity of T, T(t̂, c) ≤ T(t*, c).

Proposition 18. Let t ∈ [0, 1]^n and let S be a Schur-convex t-conorm that is strictly cone-increasing at tS ∈ [S(t, c), 1]. Then there is a value λ ∈ [0, 1] such that the vector t* with, for i ∈ D,

t*_i = λ if i = arg max_{i∈D} t_i, and t*_i = t_i otherwise, (30)

is a minimal refined vector for S and the L1 norm at t and tS.

Proof. Assume otherwise. Then, using Theorem 3, there must be a refined vector t̂ ≠ t* such that ‖t̂ − t‖_1 = ‖t* − t‖_1 = λ − t↓_1 but S(t̂, c) > S(t*, c). Let π(i) be the permutation that sorts t̂ in descending order. Consider any k ∈ {1, ..., n}. No permutation has a higher partial sum than the descending order, so ∑_{i=1}^k t̂_{π(i)} ≤ ∑_{i=1}^k t̂↓_i. Furthermore, since ‖t̂ − t‖_1 = λ − t↓_1, we get ∑_{i=1}^k t̂↓_i ≤ ∑_{i=1}^k t*↓_i; that is, t* majorizes t̂, and by Schur-convexity of S, S(t*, c) ≥ S(t̂, c), a contradiction.
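Proposition 18 can be illustrated with the probabilistic sum S(t) = 1 − ∏(1 − t_i), which is Schur-convex: only the largest coordinate is raised, and λ has a closed form. A sketch with our own naming:

```python
import numpy as np

# Numerical sketch of Proposition 18 for the probabilistic sum:
# the L1-minimal way to raise S(t) to a target raises only the
# largest coordinate, to a level lam solving S(t*) == target.

def prob_sum(t):
    return 1.0 - np.prod(1.0 - np.asarray(t, dtype=float))

def refine_prob_sum(t, target):
    t = np.asarray(t, dtype=float)
    i = int(np.argmax(t))
    rest = np.prod(np.delete(1.0 - t, i))   # product over the other coordinates
    lam = 1.0 - (1.0 - target) / rest       # closed-form solution for lam
    out = t.copy()
    out[i] = lam
    return out

t_star = refine_prob_sum([0.2, 0.5], 0.8)   # -> [0.2, 0.75]
```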