Extreme Witnesses and Their Applications

We study the problem of computing the so-called minimum and maximum witnesses for Boolean vector convolution. We also consider a generalization of the problem which is to determine, for each positive value at a coordinate of the convolution vector, the q smallest (largest) witnesses, where q is the minimum of a parameter k and the number of witnesses for this coordinate. We term this problem the smallest k-witness problem or the largest k-witness problem, respectively. We also study the corresponding smallest and largest k-witness problems for Boolean matrix product. First, we present an $$\tilde{O}(n^{1.5}k^{0.5})$$-time algorithm for the smallest or largest k-witness problem for the Boolean convolution of two n-dimensional vectors, where the notation $$\tilde{O}(\ )$$ suppresses factors polylogarithmic in n. In consequence, we obtain new upper time bounds on reporting positions of mismatches in potential string alignments and on computing restricted cases of the $$(\min , +)$$ vector convolution. Next, we present a fast (substantially subcubic in n and linear in k) algorithm for the smallest or largest k-witness problem for the Boolean matrix product of two $$n\times n$$ Boolean matrices. It yields fast algorithms for reporting k lightest (heaviest) triangles in a vertex-weighted graph.


Introduction
For a potential alignment of a pattern string with a text string over the same alphabet, a position in the alignment where the pattern symbol is different from the text symbol is a witness to the symbol mismatch while a position where the pattern and text symbol are equal is a witness to the symbol match.
Similarly, if A and B are two n × n Boolean matrices and C is their Boolean matrix product then for any entry C[i, j] = 1 of C, a witness is an index m such that A[i, m] ∧ B[m, j] = 1. The smallest (or, largest) possible witness is called the minimum witness (or, maximum witness, respectively).
The problems of finding "witnesses" have been extensively studied for several decades, initially independently within stringology and within graph algorithms relying on matrix computations. In string matching, witnesses for symbol mismatches or matches in potential alignments of two strings are sought [4,9,17], while in graph algorithms, witnesses for the Boolean matrix product are typically sought, originally in order to solve shortest path problems in graphs [2,3]. In both cases, highly non-trivial efficient algorithmic solutions have been presented [2,3,4,17].
Also in both areas, useful generalizations and/or specializations of the problems of finding witnesses have been studied. A natural generalization introduced for string matching in [17] is to request up to k witnesses instead of a single one. It has been efficiently solved by using concepts from group testing in [4] and conveyed to Boolean matrix product in [4,14]. A natural specialization is to request minimum or maximum witnesses. This specialization has been introduced and efficiently solved in [10] in the context of finding lowest common ancestors in directed acyclic graphs and it found many other applications since then (cf. [8,18,21]).
In analogy to witnesses for Boolean matrix product, if a and b are two n-dimensional Boolean vectors and c is their Boolean convolution then for any coordinate $c_i = 1$ of c, a witness is an index l such that $a_l \wedge b_{i-l} = 1$. In contrast to string matching and Boolean matrix product, the problem of computing the witnesses of Boolean vector convolution does not seem to have been explicitly studied in the literature. On the other hand, Boolean vector convolution is closely related to string matching [12], and hence the algorithms for reporting a witness or, more generally, up to k witnesses can be easily conveyed from stringology to Boolean vector convolution (see Proposition 3.1).
In this paper, we study the problem of computing minimum and maximum witnesses for Boolean vector convolution. We also consider a generalization of the problem which is to determine, for each positive coordinate of the convolution vector, the q smallest (largest) witnesses, where q is the minimum of a parameter k and the number of witnesses for this coordinate. We term this problem the smallest k-witness problem or the largest k-witness problem, respectively. We also study the corresponding generalization for Boolean matrix product.
Let $\omega(1, r, 1)$ denote the exponent of fast arithmetic multiplication of an $n \times n^r$ matrix by an $n^r \times n$ matrix. In particular, $\omega(1, 1, 1)$, denoted by $\omega$, is known not to exceed 2.373 [15,22]. Next, let the notation $\tilde{O}(\ )$ suppress factors polylogarithmic in n. Our main contributions are as follows:
- an $\tilde{O}(n^{1.5})$-time algorithm for reporting minimum and maximum witnesses for the Boolean convolution of two n-dimensional vectors, and more generally, an $\tilde{O}(n^{1.5}k^{0.5})$-time algorithm for the smallest or largest k-witness problem for the convolution;
- as corollaries, $\tilde{O}(n^{1.5}k^{0.5})$ time bounds for the smallest or largest k-witness problems in string matching;
- in part as corollaries, several upper time bounds on computing the (min, +) integer vector convolution in restricted cases, summarized in Table 1;
- an $O(n^{2+\lambda}k)$-time algorithm for the smallest or largest k-witness problem for the Boolean matrix product of two $n \times n$ Boolean matrices, where $\lambda$ is a solution to the equation $\omega(1, \lambda, 1) = 1 + 2\lambda + \log_n k$;
- as a corollary, an $O(n^{2+\lambda}k)$ time bound for the problem of reporting, for each edge of a vertex-weighted graph, k lightest (heaviest) triangles containing it, where $\lambda$ satisfies the aforementioned equation; also, an $O(\min\{n^{\omega}k + n^{2+o(1)}k,\ n^{2+\lambda}k\})$ time bound for the problem of reporting k lightest (heaviest) triangles in the input vertex-weighted graph.
We shall use the unit-cost RAM computational model [1] with computer word of length logarithmic in the maximum of the size of the input and the value of the largest input integer.

Fact 2.1 Let p and q be two n-dimensional integer vectors. The arithmetic convolution of p and q can be computed in $\tilde{O}(n)$ time. Hence, the Boolean convolution of two n-dimensional vectors can also be computed in $\tilde{O}(n)$ time.
For a sequence S of integers, we shall denote the minimum number of monotone subsequences into which S can be decomposed by mon(S).

Fact 2.5 [11] A lightest (heaviest) triangle in an undirected vertex-weighted graph on n vertices can be found in $O(n^{\omega} + n^{2+o(1)})$ time.

Extreme Witnesses for Boolean Convolution
Let $c = (c_0, \ldots, c_{2n-2})$ be the Boolean convolution of two n-dimensional Boolean vectors a and b. A witness of $c_i = 1$ is any $l \in [\max\{i - n + 1, 0\}, \min\{i, n - 1\}]$ such that $a_l \wedge b_{i-l} = 1$. A minimum witness (or maximum witness) of $c_i = 1$ is the smallest (or the largest, respectively) witness of $c_i$. The witnesses problem (or minimum witness problem, or maximum witness problem) for the Boolean convolution of two n-dimensional Boolean vectors is to determine witnesses (or the minimum witnesses, or the maximum witnesses, respectively) for all non-zero coordinates of the Boolean convolution of the vectors. The k-witness problem (or the smallest k-witness problem, or the largest k-witness problem) for the Boolean convolution of two n-dimensional Boolean vectors is to determine, for each non-zero coordinate of the convolution, q witnesses (or q smallest witnesses, or q largest witnesses, respectively), where q is the minimum of k and the number of witnesses for this coordinate.
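As a reference point for these definitions, the following minimal Python sketch lists all witnesses of every coordinate by brute force in quadratic time. The function name `all_witnesses` is hypothetical and 0-based indexing is assumed; it is meant only as a specification against which faster algorithms can be checked, not as one of the algorithms of this paper.

```python
def all_witnesses(a, b):
    """Brute-force list of witnesses for each coordinate c_i, i = 0..2n-2.

    a and b are 0/1 lists of the same length n (an assumption of this sketch).
    The list for coordinate i is empty exactly when c_i = 0.
    """
    n = len(a)
    result = []
    for i in range(2 * n - 1):
        lo, hi = max(i - n + 1, 0), min(i, n - 1)
        # l is a witness of c_i when a_l AND b_{i-l} = 1
        result.append([l for l in range(lo, hi + 1) if a[l] and b[i - l]])
    return result


wit = all_witnesses([1, 0, 1], [0, 1, 1])
# wit == [[], [0], [0], [2], [2]]
```

The minimum and maximum witnesses of a non-zero coordinate are then the first and the last entry of its list, and the q smallest (largest) witnesses are the first (last) q entries.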
Boolean vector convolution is closely related to string matching problems [12]. The corresponding problems of reporting a symbol mismatch or match, or up to k such mismatches or matches, for each potential alignment of the pattern with the text have been studied in the so-called non-standard stringology [4,17]. Moreover, the focus of this paper is on extreme witnesses. For these reasons, and for the sake of completeness, we only state a proposition and its generalization on standard witnesses for Boolean vector convolution; both can be obtained analogously to the well-known corresponding facts on string matching or Boolean matrix product.

Proposition 3.1 (Analogous to [3]) The witnesses problem for Boolean convolution of two n-dimensional vectors can be solved in $\tilde{O}(n)$ time.
Proof sketch. The witnesses for the Boolean convolution c of two n-dimensional vectors a and b can be computed analogously to the witnesses for the Boolean matrix product [3]. The first observation is that for all coordinates of c that have a single witness, their witnesses can be obtained by computing the arithmetic convolution of b with the vector a′ resulting from replacing each 1 in a with the number of the respective coordinate. The next idea is to dilute the other vector b gradually, so that the number of witnesses for each positive coordinate of c eventually decreases to zero, in most cases passing through 1 first. For instance, if $c_i$ has l witnesses and in each phase each coordinate of b is set to 0 with probability 1/2, then after a logarithmic number of such phases there is a positive probability that exactly one witness will remain. By iterating the process a logarithmic number of times, witnesses for all positive coordinates of c can be determined with high probability.
In order to remove the randomness, we can use small c-wise ε-bias sample spaces analogously to Alon and Naor in their deterministic algorithm for witnesses of Boolean matrix product [3].
The algorithm, its analysis and derandomization are totally analogous to those of the algorithm of Alon and Naor for witnesses of Boolean matrix product [3]. We refer the reader for the technical details to their paper. It is sufficient to replace matrices with vectors, entries with coordinates and Boolean matrix product with Boolean vector convolution in their proof.
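The randomized idea of the proof sketch can be illustrated as follows. This is a simplified sketch under several assumptions that are not taken from [3]: the index weights are placed on a, a second convolution counts the surviving witnesses so that a coordinate is read off only when exactly one witness remains, the naive quadratic `arithmetic_convolution` stands in for an FFT-based routine (Fact 2.1), and the names and the `repetitions` parameter are hypothetical.

```python
import random


def arithmetic_convolution(x, y):
    """Naive O(n^2) stand-in for a fast (FFT-based) arithmetic convolution."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        if xi:
            for j, yj in enumerate(y):
                out[i + j] += xi * yj
    return out


def find_witnesses(a, b, repetitions=8, seed=0):
    """Return one witness per coordinate where one was isolated, else None."""
    n = len(a)
    rng = random.Random(seed)
    indexed_a = [l * a[l] for l in range(n)]  # each 1 in a replaced by its index
    found = [None] * (2 * n - 1)
    phases = n.bit_length() + 1               # logarithmic number of dilution phases
    for _rep in range(repetitions):
        diluted_b = list(b)
        for _phase in range(phases):
            counts = arithmetic_convolution(a, diluted_b)
            index_sums = arithmetic_convolution(indexed_a, diluted_b)
            for i in range(2 * n - 1):
                # exactly one surviving witness: the index sum is that witness
                if found[i] is None and counts[i] == 1:
                    found[i] = index_sums[i]
            # dilute b: every coordinate is zeroed with probability 1/2
            diluted_b = [bit if rng.random() < 0.5 else 0 for bit in diluted_b]
    return found
```

Every reported index is a correct witness because it is only recorded when the witness count is exactly one; coordinates with several witnesses are resolved only with some probability per repetition, which is why the process is iterated.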
Following [4] and [14], one can also generalize Proposition 3.1 to include an algorithmic solution to the k-witness problem for Boolean convolution of two n-dimensional vectors in $\tilde{O}(nk)$ time.
With moderate technical effort, the minimum or maximum witness problem for Boolean convolution could be solved by combining the known $O(n^{2.575})$-time algorithm for the corresponding problem of minimum or maximum witnesses of Boolean matrix product [10] with the known reduction of vector convolution over an arbitrary semi-ring to matrix product over the semi-ring described in Fact 2.3 [5]. The combination results in an $O(n^{1.787})$-time solution to the extreme witness problem for Boolean convolution. We shall show that a substantially more efficient solution can be obtained directly.

Theorem 3.2 The minimum witness problem (maximum witness problem, respectively) for Boolean convolution of two n-dimensional vectors can be solved in $\tilde{O}(n^{1.5})$ time.
Proof Let a and b be two n-dimensional vectors. Let r be an integer parameter between 1 and n. For $p = 1, \ldots, \lceil n/r \rceil$, let $a^p$ be the Boolean n-dimensional vector resulting from setting to zero all coordinates of a with indices not exceeding $(p-1)r$ and those with indices greater than $pr$. We compute, for each $p = 1, \ldots, \lceil n/r \rceil$, the Boolean convolution $c^p$ of $a^p$ and b. Next, for each $i = 0, \ldots, 2n-2$, we determine the smallest p such that $c^p_i$ [...]

The method of Theorem 3.2 can be generalized to include the smallest k-witness problem and the largest k-witness problem.

Theorem 3.3
The smallest k-witness problem as well as the largest k-witness problem for Boolean convolution of two n-dimensional vectors can be solved in $\tilde{O}(n^{1.5}k^{0.5})$ time.
Proof Let a and b be two input n-dimensional vectors. Let r be an integer parameter between 1 and n. As in the proof of Theorem 3.2, for $p = 1, \ldots, \lceil n/r \rceil$, we let $a^p$ denote the Boolean n-dimensional vector resulting from setting to zero all coordinates of a with indices not exceeding $(p-1)r$ and those with indices greater than $pr$. Next, we compute, for each $p = 1, \ldots, \lceil n/r \rceil$, the arithmetic convolution $w^p$ of $a^p$ and b by interpreting these vectors as 0-1 vectors. The arithmetic convolutions provide us with the number of witnesses in each interval $((p-1)r, pr]$ for each coordinate $c_i$ of the Boolean convolution c of a and b. Their coordinate-wise sum provides us with the total number of witnesses for each coordinate of c. In order to solve the smallest k-witness problem, for $p = 1, \ldots, \lceil n/r \rceil$ and for $i = 0, \ldots, 2n-2$, whenever $w^p_i > 0$ and the number of witnesses for $c_i$ found so far is less than the minimum of k and the number of witnesses of $c_i$, we search through the interval $((p-1)r, pr]$ from left to right for further witnesses. For details see the algorithm depicted in Fig. 1. In the worst case, for each $i = 0, \ldots, 2n-2$, we need to search through k such intervals. The total cost of the searches becomes $O(n \cdot \frac{n}{r} + n \cdot k \cdot r)$, see lines 15-19 in the algorithm depicted in Fig. 1. On the other hand, the $\lceil n/r \rceil$ computations of the arithmetic convolutions $w^p$ take $\tilde{O}(n^2/r)$ time. By setting $r = \sqrt{n/k}$, we obtain the claimed time complexity for the smallest k-witness problem.
The largest k-witness problem can be solved analogously in the same asymptotic time by considering the intervals in the opposite order and searching them from the right to the left instead.
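The block decomposition used in the proofs of Theorems 3.2 and 3.3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm of Fig. 1: indices are 0-based, the blocks are the half-open ranges [start, start + r), the function names are hypothetical, and the naive quadratic `arithmetic_convolution` stands in for the $\tilde{O}(n)$-time convolution of Fact 2.1, so the sketch does not attain the stated running time.

```python
from math import isqrt


def arithmetic_convolution(x, y):
    """Naive O(n^2) stand-in for a fast (FFT-based) arithmetic convolution."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        if xi:
            for j, yj in enumerate(y):
                out[i + j] += xi * yj
    return out


def smallest_k_witnesses(a, b, k):
    """For each coordinate i, return its min(k, #witnesses) smallest witnesses."""
    n = len(a)
    r = max(1, isqrt(max(1, n // k)))          # block length, roughly sqrt(n/k)
    witnesses = [[] for _ in range(2 * n - 1)]
    for start in range(0, n, r):               # blocks in increasing index order
        end = min(start + r, n)
        a_p = [a[l] if start <= l < end else 0 for l in range(n)]
        w_p = arithmetic_convolution(a_p, b)   # witness counts inside this block
        for i in range(2 * n - 1):
            if w_p[i] == 0 or len(witnesses[i]) >= k:
                continue                       # nothing to find, or already done
            for l in range(start, end):        # scan the block from left to right
                if 0 <= i - l < n and a[l] and b[i - l]:
                    witnesses[i].append(l)
                    if len(witnesses[i]) == k:
                        break
    return witnesses


res = smallest_k_witnesses([1, 0, 1, 1], [1, 1, 0, 1], 2)
# res == [[0], [0], [2], [0, 2], [3], [2], [3]]
```

With k = 1 the same routine returns the minimum witnesses; processing the blocks in decreasing order and scanning each from right to left would give the largest k witnesses instead.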

String Matching
Fischer and Paterson showed as early as 1974 [12] that several string matching problems can be efficiently reduced to Boolean vector convolution.
Suppose we are given two strings $\tau = \tau_{m-1}\tau_{m-2}\ldots\tau_0$ and $\rho = \rho_0\rho_1\ldots\rho_{n-1}$, where $m < n$, over a finite alphabet $\Sigma$. Following [12], for $\gamma \in \Sigma$, let $H_\gamma(\cdot)$ be a function from $\Sigma$ to {true, false} such that $H_\gamma(x) = \text{true}$ if and only if $x = \gamma$. If $i + m \le n$, the question of whether $\tau_{m-1}\tau_{m-2}\ldots\tau_0$ matches $\rho_i\rho_{i+1}\ldots\rho_{i+m-1}$ is equivalent to a conjunction of the negations of the terms $\bigvee_{l=0}^{m-1} H_\alpha(\rho_{i+l}) \wedge H_\beta(\tau_{m-1-l})$, where $\alpha, \beta \in \Sigma$ and $\alpha \neq \beta$. Note that whenever such a term is true, the matching cannot take place, as at some position $\alpha$ clashes with $\beta$. In this way, the standard string matching problem for $\tau$ and $\rho$ easily reduces to $O(|\Sigma|^2)$ Boolean convolutions of two Boolean vectors of length at most n. Observe now that witnesses for the aforementioned Boolean convolutions yield positions of the clashes, in other words, symbol mismatches. If we modify the terms to $\bigvee_{l=0}^{m-1} H_\alpha(\rho_{i+l}) \wedge H_\alpha(\tau_{m-1-l})$, for $\alpha \in \Sigma$, the witnesses for the $O(|\Sigma|)$ Boolean convolutions yield positions of two-sided matches with $\alpha \in \Sigma$. Hence, we obtain the following theorem as a corollary from Theorem 3.3.

Theorem 3.4 Consider the string matching problem for a text string of length n and a pattern string of length m < n, both over a finite alphabet. For each alignment of the pattern with the text, we can provide locations of the k earliest symbol mismatches and/or the k earliest symbol matches as well as locations of the k latest symbol mismatches and the k latest symbol matches in the alignments in $\tilde{O}(n^{1.5}k^{0.5})$ time in total.
In particular, we can also provide positions of the earliest and/or latest two-sided symbol matches with a given alphabet symbol (cf. ones problem in [17]) in the alignments in $\tilde{O}(n^{1.5}k^{0.5})$ time in total.
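The reduction behind Theorem 3.4 can be illustrated for k = 1 with the sketch below. It assumes 0-based positions and a pattern given left to right, so that the pattern symbol at position l plays the role of $\tau_{m-1-l}$; alignment i then corresponds to coordinate i + m - 1 of each convolution, and a witness is the text position of the clash. The function names are hypothetical, and the naive `min_witnesses` helper merely stands in for the algorithm of Theorem 3.2.

```python
def min_witnesses(x, y):
    """Naive stand-in: minimum witness l (x[l] and y[s-l]) per coordinate s, or None."""
    res = [None] * (len(x) + len(y) - 1)
    for l, xl in enumerate(x):          # l increases, so the first hit is minimal
        if xl:
            for j, yj in enumerate(y):
                if yj and res[l + j] is None:
                    res[l + j] = l
    return res


def earliest_mismatches(text, pattern):
    """For each alignment i, the earliest mismatching text position, or None."""
    n, m = len(text), len(pattern)
    alphabet = sorted(set(text) | set(pattern))
    best = [None] * (n - m + 1)
    for alpha in alphabet:
        x = [int(ch == alpha) for ch in text]                     # H_alpha on the text
        for beta in alphabet:
            if beta == alpha:
                continue                                          # only clashing pairs
            y = [int(ch == beta) for ch in reversed(pattern)]     # H_beta on tau_0..tau_{m-1}
            wit = min_witnesses(x, y)
            for i in range(n - m + 1):
                w = wit[i + m - 1]      # coordinate belonging to alignment i
                if w is not None and (best[i] is None or w < best[i]):
                    best[i] = w
    return best


positions = earliest_mismatches("abcab", "ab")
# positions == [None, 1, 2, None]
```

Replacing the minimum over pairs by a merge of up to k smallest witnesses per pair would extend the sketch to the k earliest mismatches, and using pairs with alpha equal to beta would report matches instead.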

(min, +) Convolution
Our original motivation has been an extension of the $O(n^{1.859})$-time algorithm due to Chan and Lewenstein for the (min, +) convolution of two n-dimensional vectors with integer coordinates of size O(n) forming monotone sequences [6] to include the case where the vectors are decomposable into relatively few monotone subsequences. The major difficulty here is that a completion of the subsequences to full monotone sequences can affect the result. Roughly, we can avoid this difficulty when the coordinates of each of the vectors range over relatively few different values or all the subsequences are simultaneously either non-decreasing or non-increasing (see Table 1). The idea is to use our algorithm for minimum and maximum witnesses of Boolean convolution.
The correctness of the algorithm depicted in Fig. 2 relies on the following straightforward lemma.

Lemma 3.5 In the algorithm depicted in Fig. 2, the following equivalence holds: $d_k \neq 0$ in line 13 if and only if $\min\{a_l + b_m \mid l + m = k \wedge a_l \in a^i \wedge b_m \in b^j\}$ is equal to the first argument of the minimum in this line.

Theorem 3.6 Let a and b be two n-dimensional integer vectors such that the coordinates of a range over at most $c_a$ different values while the sequence of the consecutive coordinates of b can be decomposed into $m_b$ monotone subsequences. The algorithm depicted in Fig. 2 computes their (min, +) convolution in $\tilde{O}(c_a m_b n^{1.5})$ steps.

Proof By Lemma 3.5 and line 13 in the algorithm, none of the coordinates of the output vector has a lower value than the corresponding coordinate of the (min, +) [...]

If we are given decompositions of the two input n-dimensional vectors a and b into monotone subsequences that are either all non-decreasing or all non-increasing, then we can use the algorithm depicted in Fig. 3, which is analogous to that depicted in Fig. 2, in order to compute the (min, +) convolution of a and b. Thus, first, for each subsequence $a^i$ of a and each subsequence $b^j$ of b, we compute the Boolean vectors $\mathrm{char}(a^i)$ and $\mathrm{char}(b^j)$ indicating with ones the coordinates of a or b covered by $a^i$ or $b^j$, respectively. Next, depending on whether the subsequences are non-decreasing or non-increasing, for each pair of such subsequences $a^i$ and $b^j$, we compute the minimum witnesses of the Boolean convolution of $\mathrm{char}(a^i)$ and $\mathrm{char}(b^j)$ or the maximum witnesses of this convolution, respectively. We use the extreme witnesses to update the corresponding coordinates of the computed (min, +) convolution as in the algorithm depicted in Fig. 2. Hence, we obtain the following theorem.

Proof The proof of the correctness of the algorithm depicted in Fig. 3 is analogous to that of the correctness of the algorithm depicted in Fig. 2. The time complexity analysis of the former algorithm is also similar to that of the latter algorithm. The main difference is that the decompositions of a and b into subsequences are given and that the $\tilde{O}(n^{1.5})$-time algorithm for minimum or maximum witnesses of Boolean convolution is run $m_a m_b$ times instead of $c_a m_b$ times.
By combining Fact 2.3 with Fact 2.4, we also obtain the following bound.

Theorem 3.8 Let a and b be two n-dimensional integer vectors such that the coordinates of a range over at most $c_a$ different values. The (min, +) convolution of a and b can be computed in $\tilde{O}(c_a n^{1.844})$ time.
We can also consider the problem of computing the (min, +) integer vector convolution of the input vectors a and b, when their coordinates range over $c_a$ and $c_b$ different integers, respectively. We can use the algorithm depicted in Fig. 4, analogous to that depicted in Fig. 2. The first difference is that the subsequences $b^j$ on the side of b are also constant. It follows that for any pair of such constant subsequences $a^i$ and $b^j$, the value of the sum of any element from $a^i$ with any element from $b^j$ is constant and it can be trivially computed as $a^i_1 + b^j_1$ a priori. For this reason, it is sufficient to compute the Boolean convolution d of $\mathrm{char}(a^i)$ and $\mathrm{char}(b^j)$ for each pair $a^i$ and $b^j$. Then, for any non-zero coordinate of d, we need to update the corresponding coordinate of the computed (min, +) convolution of a and b by taking the minimum of the coordinate and $a^i_1 + b^j_1$. By Fact 2.1, we obtain the following theorem.

Theorem 3.9 Let a and b be two n-dimensional integer vectors such that their coordinates range over at most $c_a$ or $c_b$ different values, respectively. The algorithm depicted in Fig. 4 computes the (min, +) convolution of a and b in $\tilde{O}(c_a c_b n)$ time.
Proof The algorithm depicted in Fig. 4 can be easily implemented in $\tilde{O}(c_a c_b n)$ time by running $c_a c_b$ times the known $\tilde{O}(n)$-time algorithm for Boolean convolution of two n-dimensional Boolean vectors, see Fact 2.1.
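The approach of Theorem 3.9 can be sketched in a few lines of Python. The sketch follows the textual description above rather than Fig. 4 itself; the function names are hypothetical, and the naive `boolean_convolution` is an assumed stand-in for the $\tilde{O}(n)$-time routine of Fact 2.1.

```python
from math import inf


def boolean_convolution(x, y):
    """Naive O(n^2) stand-in for a fast Boolean convolution (Fact 2.1)."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        if xi:
            for j, yj in enumerate(y):
                if yj:
                    out[i + j] = 1
    return out


def min_plus_few_values(a, b):
    """(min, +) convolution via one Boolean convolution per pair of distinct values."""
    result = [inf] * (len(a) + len(b) - 1)
    for va in set(a):
        char_a = [int(v == va) for v in a]       # coordinates of a holding value va
        for vb in set(b):
            char_b = [int(v == vb) for v in b]   # coordinates of b holding value vb
            d = boolean_convolution(char_a, char_b)
            for s, ds in enumerate(d):
                # a non-zero d_s means some pair l + m = s realizes the sum va + vb
                if ds and va + vb < result[s]:
                    result[s] = va + vb
    return result


conv = min_plus_few_values([3, 1, 3], [2, 2, 5])
# conv == [5, 3, 3, 5, 8]
```

The number of Boolean convolutions equals the product of the numbers of distinct values in a and in b, which matches the $c_a c_b$ factor in the theorem.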

Extreme Witnesses for Boolean Matrix Product
For two n × n Boolean matrices A and B with Boolean matrix product C, a witness of an entry C[i, j] is any index m such that A[i, m] ∧ B[m, j] = 1. Next, the minimum witness and maximum witness for an entry of C as well as the witness problem, the minimum and maximum witness problems, the k-witness problem, and the smallest k-witness and largest k-witness problems for Boolean matrix product of A and B are defined analogously to those for Boolean vector convolution.
In this section, we shall present a generalization of the algorithm for minimum and maximum witnesses for Boolean matrix product from [10] to include the smallest and largest k-witness problems.
Let $\ell$ be a positive integer smaller than n. We may assume w.l.o.g. that n is divisible by $\ell$. Partition the matrix A into $n \times \ell$ sub-matrices $A_p$ and the matrix B into $\ell \times n$ sub-matrices $B_p$, such that $1 \le p \le n/\ell$, and the sub-matrix $A_p$ covers the columns $(p-1)\ell + 1$ through $p\ell$ of A whereas the sub-matrix $B_p$ covers the rows $(p-1)\ell + 1$ through $p\ell$ of B.
For $p = 1, \ldots, n/\ell$, let $W_p$ be the arithmetic product of $A_p$ and $B_p$ treated as 0-1 matrices. On the other hand, let C denote the Boolean matrix product of A and B. [...] $[1, p\ell]$.
By this lemma, after computing all the matrix products $W_p = A_p \cdot B_p$, $1 \le p \le n/\ell$, we need $O(n/\ell + k\ell)$ time per positive entry of C to find up to k smallest witnesses. Recall that $\omega(1, r, 1)$ denotes the exponent of the multiplication of an $n \times n^r$ matrix by an $n^r \times n$ matrix. It follows that the total time taken by our algorithm for the smallest k-witness problem is $O((n/\ell) \cdot n^{\omega(1, \log_n \ell, 1)} + n^3/\ell + n^2 k \ell)$.

Theorem 4.2
Let $\lambda$ be such that $\omega(1, \lambda, 1) = 1 + 2\lambda + \log_n k$. The smallest k-witness problem as well as the largest k-witness problem for the Boolean matrix product of two $n \times n$ Boolean matrices can be solved in $O(n^{2+\lambda}k)$ time.
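The column-block scheme behind Theorem 4.2 can be sketched as follows. The sketch assumes 0-based indices and a caller-supplied block width (the hypothetical parameter `block`); the naive `block_product` stands in for fast rectangular matrix multiplication, so only the structure of the algorithm is illustrated, not its running time.

```python
def block_product(A, B, start, end):
    """Arithmetic product of columns [start, end) of A with rows [start, end) of B."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(start, end)) for j in range(n)]
            for i in range(n)]


def smallest_k_witnesses_matrix(A, B, k, block):
    """For each entry (i, j), return its min(k, #witnesses) smallest witnesses."""
    n = len(A)
    witnesses = [[[] for _ in range(n)] for _ in range(n)]
    for start in range(0, n, block):          # blocks in increasing index order
        end = min(start + block, n)
        W_p = block_product(A, B, start, end)  # witness counts inside this block
        for i in range(n):
            for j in range(n):
                found = witnesses[i][j]
                if W_p[i][j] == 0 or len(found) >= k:
                    continue                  # skip blocks without witnesses
                for m in range(start, end):   # scan the block from left to right
                    if A[i][m] and B[m][j]:
                        found.append(m)
                        if len(found) == k:
                            break
    return witnesses
```

For example, with A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]], B = [[1, 0, 1], [1, 1, 0], [0, 1, 1]], k = 2 and block = 2, the entry (0, 0) receives the witnesses [0, 1]. Traversing the blocks in decreasing order and scanning each from right to left would yield the largest k witnesses.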
Le Gall has recently substantially improved upper time bounds on rectangular matrix multiplication in [16]. In consequence, he could show that the solution $\mu$ of the equation $\omega(1, \mu, 1) = 1 + 2\mu$ satisfies $\mu < 0.5302$. This in particular improves the upper time bound for the minimum and maximum witness problems from $O(n^{2.575})$ to $O(n^{2.5302})$. It follows that for $k \gg 1$, $\lambda$ in Theorem 4.2 is substantially smaller than 0.5302.

Lightest Triangles
By generalizing the reduction of the problem of reporting, for each edge of a vertex-weighted graph, a heaviest triangle containing it to the maximum witness problem for Boolean matrix product from [21] to include reporting k heaviest triangles and the largest k-witness problem, we obtain the following theorem as a corollary from Theorem 4.2.

Proof Number the vertices of G in non-decreasing vertex-weight order. Next, solve the smallest (largest) k-witness problem for the Boolean matrix product C of the adjacency matrix of G with itself. For each edge e = {i, j} of G, the up to k smallest (or largest) witnesses of C[i, j] yield the $q_e$ lightest (or heaviest, respectively) triangles in G including e. Theorem 4.2 yields the claimed upper bound.
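The reduction in the proof can be sketched as follows. The function name and the input conventions (a list of vertex weights and a list of edges given as vertex pairs) are assumptions of this sketch, and the naive scan over candidate third vertices stands in for the smallest k-witness algorithm of Theorem 4.2.

```python
def k_lightest_triangles_per_edge(weights, edges, k):
    """For each edge (u, v), the third vertices of its up to k lightest triangles."""
    n = len(weights)
    order = sorted(range(n), key=lambda v: weights[v])  # rank -> vertex, by weight
    rank = {v: r for r, v in enumerate(order)}
    adj = [[0] * n for _ in range(n)]                    # adjacency matrix in rank order
    for u, v in edges:
        adj[rank[u]][rank[v]] = adj[rank[v]][rank[u]] = 1
    result = {}
    for u, v in edges:
        i, j = rank[u], rank[v]
        thirds = []
        # smallest witnesses of C[i][j] correspond to the lightest third vertices,
        # since the weights of u and v are shared by every triangle through the edge
        for m in range(n):
            if adj[i][m] and adj[m][j]:
                thirds.append(order[m])
                if len(thirds) == k:
                    break
        result[(u, v)] = thirds
    return result


tri = k_lightest_triangles_per_edge(
    [5, 1, 3, 2], [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3)], 2)
# tri[(0, 1)] == [3, 2]
```

Scanning the ranks in decreasing order would report the heaviest triangles through each edge instead.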
As for the problem of finding k lightest (heaviest) triangles in a vertex-weighted graph, iterating the $O(n^{\omega} + n^{2+o(1)})$-time algorithm for finding a lightest or heaviest triangle described in Fact 2.5 seems to be a better choice for up to moderate values of k. Before each next iteration, we remove the three vertices of the most recently reported triangle. After k iterations, we stop and find, among the reported triangles and no more than $3(k-1)n^2$ other triangles incident to the removed vertices, the k lightest (heaviest) triangles if possible. The method takes $O(n^{\omega}k + n^{2+o(1)}k + n^2 k)$, i.e., $O(n^{\omega}k + n^{2+o(1)}k)$ time.

Theorem 4.4
Let G be an undirected vertex-weighted graph on n vertices and let k be a natural number not exceeding n. Next, let $\lambda$ be such that $\omega(1, \lambda, 1) = 1 + 2\lambda + \log_n k$. We can list q lightest (heaviest) triangles in G, where q is the minimum of k and the number of triangles in G, in $O(\min\{n^{\omega}k + n^{2+o(1)}k,\ n^{2+\lambda}k\})$ time.
Finding or detecting triangles of extreme weight in vertex-weighted graphs has a number of applications. First of all, it can be used to solve the corresponding general problem of finding or detecting subgraphs or induced subgraphs of extreme weight [11,20,21]. Vassilevska and Williams also list two other applications in [19]: a general variant of the 3-SUM problem and a general buyer-seller problem in computational economy.

Final Remarks
It is an interesting open problem whether any of our upper time bounds on minimum and maximum witnesses for Boolean vector convolution and on the extreme k-witness problems, both for Boolean vector convolution and Boolean matrix product, can be substantially improved. Note here that so far the $O(n^{2+\lambda})$ time bound (where $\omega(1, \lambda, 1) = 1 + 2\lambda$) on minimum and maximum witnesses of Boolean matrix product established one decade ago [10] could not be improved (see also [8]).
The problems of Boolean vector convolution and Boolean matrix product seem to be similar, but there are some substantial differences between them. The former problem admits an algorithm that is almost linear in the input size, while for the latter problem the current upper time bound is substantially non-linear [15,22]. There is a moderately efficient reduction of vector convolution to matrix product described in Fact 2.3, while no such reverse reduction is known. Our upper time bounds for minimum and maximum witnesses of Boolean vector convolution show that a direct approach to Boolean vector convolution can yield better upper time bounds than those obtained by conveying known upper time bounds for the witness problems for Boolean matrix product via Fact 2.3 to the corresponding problems for Boolean vector convolution.
The extreme k-witness problems for Boolean matrix product presumably admit several other applications often corresponding to generalizations of the applications for minimum and maximum witnesses of Boolean matrix product [18,21] and/or the applications of the k-witness problem for Boolean matrix product [4], e.g., the all-pairs k-bottleneck paths.
Finally, a potentially interesting direction for further research is to consider approximation variants of the extreme witnesses problems and the (min, +) vector convolution.