In this section, we introduce the SMT encoding of tensor operations and loops.
5.1 Encoding Tensor Operations
The result of a tensor operation is encoded as a lambda expression in SMT. For example, the negation of a tensor t is encoded as ‘\(\mathsf {lambda}~i, \texttt {negate}(\mathsf {select}(\texttt {t}, i))\)’ where i is a 32-bit bit-vector variable, ‘\(\mathsf {select}(\texttt {t}, i)\)’ selects the i-th element from the SMT array of \(\texttt {t}\), and ‘\(\texttt {negate}(bv)\)’ is an alias for an SMT expression that extracts the sign bit of bv and concatenates its negation with the remaining bits of bv. Note that the encoding does not check whether i is within the bounds of the tensor, because the values at out-of-bounds indices cannot affect the program’s behavior.
For operations returning a multidimensional tensor, the lambda chooses and returns the elements in row-major order. For example, the transpose of t whose size is \(N \times N\) is encoded as ‘\(\mathsf {lambda}~i, \mathsf {select}(\texttt {t}, (i \% N) \times N+i/N)\)’.
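As an illustration, the following is a minimal z3py sketch of these lambda encodings. The tensor name t, the dimension size N, and the float-as-32-bit-bit-vector representation are assumptions made for the example, not the tool’s actual implementation.

```python
from z3 import *

# A tensor is modeled as an SMT array from 32-bit indices to 32-bit values
# (floats kept as raw bit-vectors here, purely for illustration).
Idx  = BitVecSort(32)
Elem = BitVecSort(32)
t = Array('t', Idx, Elem)
i = BitVec('i', 32)

# negate(bv): flip the sign bit and keep the remaining bits.
def negate(bv):
    sign = Extract(31, 31, bv)
    rest = Extract(30, 0, bv)
    return Concat(~sign, rest)

# Elementwise negation of t: lambda i. negate(select(t, i)).
neg_t = Lambda([i], negate(Select(t, i)))

# Transpose of an N x N tensor in row-major order:
# lambda i. select(t, (i % N) * N + i / N).
N = BitVecVal(4, 32)          # hypothetical dimension size
transposed_t = Lambda([i], Select(t, URem(i, N) * N + UDiv(i, N)))
```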
Encoding Reduction Operations. In general, reduction operations like the summation of an array cannot be precisely encoded in SMT-LIB 2. To support them, we abstractly encode the reduction operations using UFs. For example, we declare sum, a UF that takes an array and returns a floating-point number. Since this is an over-approximation, the validation may fail; in that case, we perform abstraction refinement, which is described in Sect. 6.3.
The out-of-bounds elements of an array are masked out before the array is passed to the UF because they must not affect the result. This is done by wrapping the input array with \(\mathsf {lambda}\) and \(\mathsf {select}\). The \(\mathsf {select}\) returns a value that does not affect the result of the reduction (e.g., \(-0.0\) for a summation) if the index is out of bounds.
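A sketch of this abstraction in z3py might look as follows; the UF name sum, the bound variable n, and the Float32 element type are assumptions made for the example.

```python
from z3 import *

Idx = BitVecSort(32)
F32 = Float32()
ArrSort = ArraySort(Idx, F32)

# Abstract reduction: an uninterpreted function from arrays to floats.
sum_uf = Function('sum', ArrSort, F32)

t = Array('t', Idx, F32)
n = BitVec('n', 32)           # number of valid elements in t
i = BitVec('i', 32)

# Mask out-of-bounds elements with a value that cannot affect the
# reduction's result (-0.0 for a summation).
masked = Lambda([i], If(ULT(i, n), Select(t, i), fpMinusZero(F32)))

result = sum_uf(masked)       # over-approximated summation of t[0..n-1]
```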
Tensor Operations and Undefined Behavior. The documentation was not clear about the behavior of a program that violates the assumptions tensor operations expect at runtime. Such violations include out-of-bounds accesses, size mismatches of dynamically shaped tensors, and reads of uninitialized elements. If a violation were defined to have well-defined side effects such as calling exit, dead tensor operations could not be freely removed, and lowering to LLVM IR, whose behavior may be undefined, could not be explained. Therefore, we define these violations as UB.
5.2 Encoding Loops
In MLIR, linalg loops are typically generated from high-level tensor operations. Compared to loops in general programs, they are simple and syntactically provide rich information. A loop consists of instructions without side effects (modulo UB), and linalg loops explicitly state the input/output tensors’ index mappings as well as the parallelizable induction variables. Therefore, we can construct the output tensor or buffer without synthesizing loop invariants.
Consider the above loop that adds tensors %A and \(\texttt {\%B}^T\). The indexing maps (#id, #transposed, #id) are mappings from the two induction variables (hence a doubly nested loop) to the indices of the input (%A, %B) and output (%out) tensors. The loop body shows that the initial value of %out is not used. Since the iterations over each dimension are parallel (iterator_types) and thus carry no dependency, we can conclude that %out[i][j] = %A[i][j] + %B[j][i].
In this section, we propose an encoding of loops in linalg using the lambda theory and universal quantification. Encoding a linalg loop starts with finding the loop bounds. The loop bounds are determined by matching the ranges of the indexing maps with the tensor (buffer) sizes. Then, the loop body, which yields the elements of the resulting tensor, is encoded. If the output type is a tensor, the resulting tensor is encoded as a lambda in row-major order. If the output type is a buffer, the memory locations are updated accordingly.
For the above example, the result yielded at each iteration is described as a lambda expression with two parameters: ‘\(\mathsf {lambda}~(d_0,d_1),~\texttt {add}(\texttt {\%A}[d_0, d_1], \texttt {\%B}[d_1, d_0])\)’. Then, the output tensor \(\texttt {\%C}\) is encoded as a lambda with a single parameter i that selects \((i\,/\,N, i\,\%\,N)\) from the first lambda, where N is %out’s width.
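A z3py sketch of this two-step encoding follows; the N x N shapes, Float32 elements, and variable names are illustrative assumptions.

```python
from z3 import *

Idx = BitVecSort(32)
F32 = Float32()

# %A and %B flattened in row-major order.
A = Array('A', Idx, F32)
B = Array('B', Idx, F32)
N = BitVecVal(8, 32)                      # hypothetical dimension size

d0, d1, i = BitVecs('d0 d1 i', 32)

# Per-iteration yielded value: lambda (d0, d1). add(%A[d0,d1], %B[d1,d0]).
body = Lambda([d0, d1],
              fpAdd(RNE(), Select(A, d0 * N + d1), Select(B, d1 * N + d0)))

# Output tensor (%C in the text): lambda i. body(i / N, i % N),
# i.e. the elements in row-major order.
out = Lambda([i], Select(body, UDiv(i, N), URem(i, N)))
```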
Determining Loop Bounds. If the sizes of %A and %B are larger than that of %out, should the linalg.generic raise UB, or should it add only parts of the inputs?
To find its valid semantics, the first transformation to consider is linalg’s conversion from linalg.generic to a canonical for loop in another dialect. The conversion generates a for loop whose induction variables’ upper bounds are explicitly given. To determine each bound, the conversion sequentially visits the indexing maps and finds the first dimension that matches exactly. Exact matching means that the range of the indexing map must be the identity, not e.g., d0 + 1. If such a dimension cannot be found, the linalg.generic is considered syntactically invalid.
The second transformation is the canonicalization of linalg.generic. If a linalg.generic loop merely iterates over the input tensors and returns their elements, its output is replaced with the input tensors regardless of the input/output tensors’ shapes. However, if we determined the loop bounds only from the shape of the first matched tensor, this transformation could not be justified when the input tensors have different sizes.
Therefore, we encode the loop bounds of linalg.generic as follows. First, we find the loop bounds according to the algorithm of the first transformation (generic to for). For the above example, the upper bounds of d0 and d1 are the sizes of %A’s dimensions because the first indexing map is %A’s. Second, all input tensors’ shapes must match the determined loop bounds; otherwise, the operation is UB. In the above example, the shapes of %A, %B, and %out must be equal.
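The following Python sketch illustrates this two-step rule; representing indexing maps as lists of loop-dimension indices (identity results only) is a simplifying assumption.

```python
# Simplified sketch of the bound-determination rule. Indexing maps are
# given as lists of loop-dimension indices, e.g. #id = [0, 1] and
# #transposed = [1, 0]; shapes are tuples of ints.
def determine_loop_bounds(indexing_maps, operand_shapes, num_dims):
    bounds = [None] * num_dims
    for dim in range(num_dims):
        # Step 1 (generic-to-for algorithm): visit the indexing maps in
        # order and take the first operand dimension that matches exactly.
        for amap, shape in zip(indexing_maps, operand_shapes):
            if dim in amap:
                bounds[dim] = shape[amap.index(dim)]
                break
        if bounds[dim] is None:
            raise ValueError("syntactically invalid linalg.generic")
    # Step 2: every operand's shape must agree with the determined bounds;
    # otherwise the operation is UB.
    is_ub = any(shape[pos] != bounds[dim]
                for amap, shape in zip(indexing_maps, operand_shapes)
                for pos, dim in enumerate(amap))
    return bounds, is_ub
```

For the running example, calling it with the maps [[0, 1], [1, 0], [0, 1]] takes both bounds from %A and flags UB whenever %B’s or %out’s shape disagrees.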
Encoding Loops on Buffers. If the inputs/outputs are buffers, tensors are loaded from the input buffers, the loop is performed on the tensors, and the resulting tensor is stored into the output buffer. The input and output buffers of linalg.generic must be disjoint (Sect. 6.2). If the output buffer’s layout map is the identity, the output memory block is updated using a lambda. If not, a fresh SMT array for the updated block is created, and the equalities between the old/new elements of the block and the output tensor are encoded using forall quantifications.
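For the non-identity case, a z3py sketch might look as follows; layout is a hypothetical map standing in for the buffer’s actual layout map, and the names and shapes are illustrative.

```python
from z3 import *

Idx = BitVecSort(32)
F32 = Float32()

old_blk = Array('blk',     Idx, F32)   # output block before the loop
new_blk = Array('blk_new', Idx, F32)   # fresh array for the updated block
res     = Array('res',     Idx, F32)   # resulting tensor, row-major
M, N    = BitVecs('M N', 32)           # output shape
i, j    = BitVecs('i j', 32)

def layout(i, j):
    # Hypothetical non-identity layout map (here: column-major).
    return j * M + i

s = Solver()
# Written offsets of the new block hold the resulting tensor's elements;
# analogous equalities tie the untouched offsets to old_blk.
s.add(ForAll([i, j], Implies(And(ULT(i, M), ULT(j, N)),
      Select(new_blk, layout(i, j)) == Select(res, i * N + j))))
```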
Encoding Reduction Loops. Induction variables that are marked “parallel” in the iterator_types attribute must appear as parameters of the SMT lambda expression. The remaining (reduction) variables, however, must be encoded differently. To encode reduction loops, we syntactically match the operand of the last yield and use the corresponding UF for the reduction (Sect. 5.1). This worked well in practice because the reduction loops in MLIR followed common patterns.
5.3 Supporting Arithmetic Properties of Reductions
Floating-point addition and multiplication are not associative, but programmers sometimes want to boost performance at the expense of precision by allowing compiler optimizations that rely on associativity. To encode this property, the definitions of addition and multiplication must differ from IEEE-754, because using the IEEE-754 definitions would make the underlying logic inconsistent.
Then, what is the semantics of \(x + y + z\)? One possible solution is that its evaluation nondeterministically yields either \((x + y) + z\) or \(x + (y + z)\) [11]. However, encoding this semantics in SMT requires introducing quantified variables.
Therefore, as described in Sect. 5.1, we start by abstractly encoding reduction operations as UFs. For example, the UF \(\texttt {sum}\) takes an array [x, y, z] and returns its summation. The question is how to encode arithmetic properties such as \(\texttt {sum}([\texttt {sum}([x, y]), z]) = \texttt {sum}([x, \texttt {sum}([y, z])])\). We introduce a new technique that works when the length of the input array is constant. The technique is not specific to summation; it can be applied to any reduction.
Encoding Commutativity. The first arithmetic property to consider is commutativity: ‘\(\texttt {sum}(A) = \texttt {sum}(A')\) if \(A'\) is a permutation of \(A\)’.
A straightforward solution is to use the multiset theory: two \(\texttt {sum}\)s are considered equal if the multisets converted from their input arrays are equal. For solvers that do not support the multiset theory, a multiset can be simulated with an array that maps an element to its count. However, this multiset-based approach does not scale well (Sect. 7.3). We conjecture that the existing algorithms in the solvers are not efficient at checking the equality of two multisets (cvc5) or counter arrays (Z3).
We suggest a hash-based approach for encoding multiset equality. Our approach begins by defining a hash function F on arrays. If two arrays are equal, their hash values must be equal; the converse holds when the range of F is sufficiently large. The encoding only uses the theories of UF and BV, which are cheap.
To define F, we define another hash function f on floating-point numbers. F(A) is defined as the summation of the hash values of its elements, \(\sum _{x \in A} f(x)\). By the arithmetic properties of bit-vector addition, \(F(A) = F(A')\) if \(A'\) is a permutation of A. The converse also holds: we prove that if \(F(A) = F(A')\) for every f, then \(A'\) is a permutation of A.
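For instance, for two elements, permutation invariance is immediate from the commutativity of bit-vector addition:

$$F([x, y]) = f(x) + f(y) = f(y) + f(x) = F([y, x]).$$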
Theorem 1
Given A and \(A'\) that are arrays of type T, if \(\forall f\,.\,\sum _{x \in A}\!f(x)\!=\!\sum _{x \in A'}\!f(x)\) where \(f \in T\!\rightarrow \!BV(\lceil \log _2 max(|A|,|A'|) \rceil )\), then \(A'\) is a permutation of A.
Proof
Let count(S, x) denote the number of occurrences of x in multiset S. For example, \(count(\{1,1,3\}, 1)\) is 2. We first prove the following lemma.
Lemma 1
Given two multisets S and \(S'\), \(S = S'\) holds if
$$\forall g, \left( \sum _{x \in S}count(S, x)\times g(x)\right) = \left( \sum _{x \in S'}count(S', x)\times g(x)\right) $$
where \(g \in T \rightarrow BV(\lceil \log _2 max(|S|,|S'|) \rceil )\).
Proof
Let \(g_k(x)\) be the function that returns 1 if \(x=k\) and 0 otherwise. Instantiating g with \(g_k\) for each element k of S and \(S'\) yields \(count(S, k) = count(S', k)\) for every k; hence \(S=S'\) holds. \(\square \)
Let S be the multiset of elements of array A, and \(S'\) that of \(A'\). From the assumption \(\forall f\,.\,\sum _{x \in A} f(x) = \sum _{x \in A'} f(x)\), we can derive \(\forall g, \left( \sum _{x \in S}count(S, x)\times g(x)\right) = \left( \sum _{x \in S'}count(S', x)\times g(x)\right) \). We can then apply the lemma; by its conclusion, the two multisets are equal, hence A is a permutation of \(A'\). \(\square \)
For each pair of \(\texttt {sum}\) applications appearing in the source and target, their equality is encoded as a constraint. Since \(P \implies Q\) iff \(\lnot Q \implies \lnot P\), the universal quantification in Theorem 1 can be converted into the existential form ‘\(\texttt {sum}(A) \ne \texttt {sum}(A') \implies \exists f\,.\,\sum _{x \in A} f(x) \ne \sum _{x \in A'} f(x)\)’. Since \(\exists f\) can be moved out, the precondition is quantifier-free.
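A z3py sketch of the resulting quantifier-free precondition for one pair of length-3 arrays follows; the fresh function f plays the role of the moved-out \(\exists f\), and the names and element type are illustrative assumptions.

```python
from z3 import *

Idx = BitVecSort(32)
F32 = Float32()
ArrSort = ArraySort(Idx, F32)

sum_uf = Function('sum', ArrSort, F32)        # abstract reduction UF (Sect. 5.1)

# Two length-3 arrays whose summations appear in the source and target.
A  = Array('A',  Idx, F32)
A2 = Array('A2', Idx, F32)
elems  = [Select(A,  BitVecVal(k, 32)) for k in range(3)]
elems2 = [Select(A2, BitVecVal(k, 32)) for k in range(3)]

# Fresh hash on elements, with width ceil(log2 max(|A|, |A'|)) = 2 bits.
f = Function('f', F32, BitVecSort(2))

def F(xs):
    # Hash of an array: bit-vector sum of the element hashes.
    acc = BitVecVal(0, 2)
    for x in xs:
        acc = acc + f(x)
    return acc

s = Solver()
# Quantifier-free precondition (contrapositive form with the existential f
# moved out): if the hash sums coincide, the abstract sums must coincide.
s.add(Implies(F(elems) == F(elems2), sum_uf(A) == sum_uf(A2)))
```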
Encoding Flattening of a Nested Reduction. By extending the hash-function-based approach, we can encode the equality between nested reductions. Consider the equality ‘\(\texttt {sum}([\texttt {sum}(A), \texttt {sum}(B)]) = \texttt {sum}(A {{\,\mathrm{{+\!\!+}}\,}}B)\)’.
Since the array \([\texttt {sum}(A), \texttt {sum}(B)]\) is not a permutation of \(A {{\,\mathrm{{+\!\!+}}\,}}B\), the previous encoding does not guarantee that the two summations are equivalent. To support this case, given a hash function F and a summation \(\texttt {sum}(A)\), we add the precondition \(F(\texttt {sum}(A)) = \sum _{x \in A} F(x)\). That is, the hash value of \(\texttt {sum}(A)\) equals the summation of the hash values of the elements \(x \in A\).
Note that the hash function is individually defined for each pair of summations in the programs. This requires additional preconditions relating the inner and outer summations for each pair of hash functions. We reduce the number of preconditions by unifying the hash functions into one.
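Continuing the z3py sketch above and assuming a single unified hash function f, the flattening precondition for an inner summation \(\texttt {sum}(A)\) could be asserted as follows (illustrative only, not the tool’s exact implementation).

```python
# Relate the inner summation to the outer one through the unified hash:
# the hash of sum(A) equals the BV sum of the hashes of A's elements.
s.add(f(sum_uf(A)) == F(elems))
```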