Improved Merlin–Arthur Protocols for Central Problems in Fine-Grained Complexity

In a Merlin–Arthur proof system, the proof verifier (Arthur) accepts valid proofs (from Merlin) with probability 1, and rejects invalid proofs with probability arbitrarily close to 1. The running time of such a system is defined to be the length of Merlin’s proof plus the running time of Arthur. We provide new Merlin–Arthur proof systems for some key problems in fine-grained complexity. In several cases our proof systems have optimal running time. Our main results include:
• Certifying that a list of n integers has no 3-SUM solution can be done in Merlin–Arthur time $\tilde{O}(n)$. Previously, Carmosino et al. [ITCS 2016] showed that the problem has a nondeterministic algorithm running in $\tilde{O}(n^{1.5})$ time (that is, there is a proof system with proofs of length $\tilde{O}(n^{1.5})$ and a deterministic verifier running in $\tilde{O}(n^{1.5})$ time).
• Counting the number of k-cliques with total edge weight equal to zero in an n-node graph can be done in Merlin–Arthur time $\tilde{O}(n^{\lceil k/2\rceil})$ (where $k \ge 3$). For odd k, this bound can be further improved for sparse graphs: for example, counting the number of zero-weight triangles in an m-edge graph can be done in Merlin–Arthur time $\tilde{O}(m)$. Previous Merlin–Arthur protocols by Williams [CCC’16] and Björklund and Kaski [PODC’16] could only count k-cliques in unweighted graphs, and had worse running times for small k.
• Computing the All-Pairs Shortest Distances matrix for an n-node graph can be done in Merlin–Arthur time $\tilde{O}(n^2)$. Note this is optimal, as the matrix can have $\Omega(n^2)$ nonzero entries in general. Previously, Carmosino et al. [ITCS 2016] showed that this problem has an $\tilde{O}(n^{2.94})$ nondeterministic time algorithm.
• Certifying that an n-variable k-CNF is unsatisfiable can be done in Merlin–Arthur time $2^{n/2 - n/O(k)}$. We also observe an algebrization barrier for the previous $2^{n/2}\cdot \mathrm{poly}(n)$-time Merlin–Arthur protocol of R. Williams [CCC’16] for #SAT: in particular, his protocol algebrizes, and we observe there is no algebrizing protocol for k-UNSAT running in $2^{n/2}/n^{\omega(1)}$ time. Therefore we have to exploit non-algebrizing properties to obtain our new protocol.
• Certifying a Quantified Boolean Formula is true can be done in Merlin–Arthur time $2^{4n/5}\cdot \mathrm{poly}(n)$. Previously, the only nontrivial result known along these lines was an Arthur–Merlin–Arthur protocol (where Merlin’s proof depends on some of Arthur’s coins) running in $2^{2n/3}\cdot \mathrm{poly}(n)$ time.
Due to the centrality of these problems in fine-grained complexity, our results have consequences for many other problems of interest. For example, our work implies that certifying there is no Subset Sum solution to n integers can be done in Merlin–Arthur time $2^{n/3}\cdot \mathrm{poly}(n)$, improving on the previous best protocol by Nederlof [IPL 2017] which took $2^{0.49991n}\cdot \mathrm{poly}(n)$ time.


Introduction
Fine-grained complexity has identified core problems that act as bottlenecks to obtaining faster algorithms for various tasks in computer science. Perhaps the most prominent problems among these are Satisfiability (SAT), Orthogonal Vectors (OV), 3-SUM, and All-Pairs Shortest Paths (APSP). The hypotheses that the known algorithms for these problems are essentially optimal have led to far-reaching consequences for the exact complexity of many problems of interest (see for example the survey by Vassilevska Williams [53]). There is now a vast web of "fine-grained" reductions among computational tasks, which has led to large equivalence classes of problems [3,10,19,23,44,48,55], many of which a priori look unrelated. For each of these equivalence classes, solving one problem in the class faster means that all problems in that class have more efficient algorithms.
Recently there has been growing interest in obtaining efficient Merlin-Arthur (MA) proof systems for problems studied in fine-grained complexity. Recall that in a Merlin-Arthur proof system, the probabilistic verifier (Arthur) always accepts valid proofs (from the prover Merlin), and rejects invalid proofs with probability arbitrarily close to 1. Williams [58] shows (among other results) that the OV problem¹ for sets of n vectors with dimension d can be solved by a Merlin-Arthur protocol in near-optimal $\tilde{O}(nd)$ time,² achieving a nearly quadratic speedup compared to the fastest known algorithms for OV [7,20]. One consequence of this result is the refutation of the Merlin-Arthur Strong Exponential-Time Hypothesis, which could be viewed as evidence against the Nondeterministic Strong Exponential-Time Hypothesis (NSETH) proposed by Carmosino et al. [16].³ The main technical component in Williams' work is a "batch evaluation" protocol for low-degree arithmetic circuits, with which Merlin can quickly convince Arthur of the outputs of a circuit C on a set of points $a_1, \dots, a_K$, faster than evaluating C independently on each point $a_i$. This protocol can be used to obtain efficient Merlin-Arthur protocols for various other problems, such as #SAT, counting dominating pairs of vectors, and counting Hamiltonian cycles [58]. Building on Williams' batch-evaluation protocol and employing additional algebraic techniques, Björklund and Kaski [11] obtained improved Merlin-Arthur protocols⁴ for more problems, such as #k-Clique, Graph Coloring, and Set Cover. For many of these problems, the obtained Merlin-Arthur protocols achieve quadratic speedup compared to the fastest known algorithms. Variants of these protocols have been used as Proofs of Work based on fine-grained hardness assumptions [14], which have led to further work in fine-grained cryptography and average-case fine-grained complexity [8,13,21,27,30,33,43].
In [28,29], doubly efficient proof systems for #k-SUM, #k-Clique, and APSP were constructed, in which the prover runs in polynomial time (for constant k) and the verifier runs in "almost linear" time (i.e., $N^{1+o(1)}$ time, where N is the input length).⁵ Efficient batch verification using interactive protocols with a constant (but greater than two) number of rounds has also been developed for problems with polynomial-time verifiable, unique witnesses [49] and problems which can be solved by algorithms with small space complexity [50]. A different line of work in the stream verification setting has developed sublinear space protocols for various graph problems [15,17].
Given the interest in fine-grained complexity and proof systems, a natural question is to understand the Merlin-Arthur time complexity of core problems in fine-grained complexity. How efficiently can solutions to these problems be verified, with a randomized verifier? As seen above, such questions may have cryptographic applications, and in general they may give insight into the structure of these problems. Williams [58] already showed that OV admits near-optimal Merlin-Arthur protocols. In this work, we present improved Merlin-Arthur protocols for 3-SUM, APSP, and many more core computational tasks. For several of these problems our protocols yield optimal running times (up to polylogarithmic factors) for the verifier.
¹ The OV problem is the following: Let $d = \omega(\log n)$; given two sets $A, B \subseteq \{0,1\}^d$ with $|A| = |B| = n$, determine whether there exist $a \in A$ and $b \in B$ so that $\sum_{i=1}^{d} a_i \cdot b_i = 0$.
² We use $\tilde{O}(f(n))$ to hide $\mathrm{polylog}(f(n))$ factors.

Our Results
In this section, we describe our new results and compare them with previous work.
Faster Protocols for k-SUM and Related Problems. In the k-SUM problem, we are given n integers from $[-n^c, n^c]$ for some constant c (only depending on k), and wish to determine if some k of them sum to zero. The unparameterized version of this problem is the Subset Sum problem, in which we are given n positive integers less than or equal to $2^{cn}$, for some constant c, and a target integer t, and must decide if some subset of the input integers sums to t.
Our first result is a Merlin-Arthur protocol for certifying there is no k-SUM solution, which is significantly more efficient than the best known nondeterministic algorithm [16] for the same task, which runs in $\tilde{O}(n^{k/2})$ time.⁶
Theorem 1.1 For any fixed integer k ≥ 3, certifying that a list of n integers has no k-SUM solution can be done in Merlin-Arthur time $\tilde{O}(n^{k/3})$.
In particular, our protocol for 3-SUM runs in near-optimal $\tilde{O}(n)$ time. As an immediate corollary, we obtain a faster Merlin-Arthur protocol for certifying a Subset Sum instance has no solution.

Corollary 1.2 Certifying that a list of n integers has no Subset Sum solution can be done in $2^{n/3} \cdot \mathrm{poly}(n)$ Merlin-Arthur time.
The previous best Merlin-Arthur protocol for Subset Sum was presented by Nederlof in [45] and takes $2^{0.49991n} \cdot \mathrm{poly}(n)$ time.⁷ Note that Subset Sum can be solved deterministically in $O(2^{n/2})$ time [32].
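A natural route from Theorem 1.1 to Corollary 1.2 (assumed here, since the reduction is not spelled out in this section) is to split the n inputs into three groups and list the at most $2^{n/3}$ subset sums of each group, so that a Subset Sum solution corresponds to one sum from each list adding up to t. The sketch below illustrates only this splitting step, with a brute-force check in place of a protocol; the function names are hypothetical.

```python
def subset_sums(nums):
    """Return the set of sums of all subsets of nums (the empty subset gives 0)."""
    sums = {0}
    for x in nums:
        sums |= {s + x for s in sums}
    return sums


def has_subset_sum_via_three_lists(nums, target):
    """Decide Subset Sum by splitting nums into three groups.

    Each group contributes at most 2^(n/3) subset sums, so the question becomes
    a 3-SUM-like one on three lists: is there a + b + c == target?
    """
    third = (len(nums) + 2) // 3
    groups = [nums[:third], nums[third:2 * third], nums[2 * third:]]
    A, B, C = (subset_sums(g) for g in groups)
    # Brute force over pairs (a, b), with a hash lookup in the third list.
    return any((target - a - b) in C for a in A for b in B)
```

The check above is only a correctness illustration; certifying that no such triple exists is where a 3-SUM protocol on lists of size about $2^{n/3}$ would come in.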
In the MinPlus Convolution problem, we are given two integer arrays $a = (a_0, a_1, \dots, a_{n-1})$ and $b = (b_0, b_1, \dots, b_{n-1})$, and want to compute the array $c = (c_0, c_1, \dots, c_{n-1})$ whose entries are defined by taking $c_k = \min_{i+j=k} (a_i + b_j)$ for each $0 \le k \le n-1$. The best known algorithm for MinPlus Convolution takes $n^2/2^{\Omega(\sqrt{\log n})}$ time [9,20,59]. It is known that MinPlus Convolution has a fine-grained reduction to 3-SUM [18,48,54]. We observe that these reductions combined with Theorem 1.1 imply a near-linear time Merlin-Arthur protocol for MinPlus Convolution.

Corollary 1.3 MinPlus Convolution can be solved in Merlin-Arthur time $\tilde{O}(n)$.
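To make the problem concrete, here is a minimal quadratic-time sketch of MinPlus Convolution, assuming the usual definition $c_k = \min_{i+j=k}(a_i + b_j)$ for $0 \le k < n$. It is the naive baseline, not the protocol behind Corollary 1.3, and the function name is illustrative.

```python
def min_plus_convolution(a, b):
    """Naive O(n^2) MinPlus Convolution: c[k] = min over i + j = k of a[i] + b[j]."""
    n = len(a)
    assert len(b) == n, "both arrays are assumed to have the same length"
    c = []
    for k in range(n):
        # Only index pairs (i, k - i) with 0 <= i <= k stay inside both arrays.
        c.append(min(a[i] + b[k - i] for i in range(k + 1)))
    return c
```

For instance, with a = [0, 2, 5] and b = [1, 0, 3], the entry c[1] is min(0 + 0, 2 + 1) = 0.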
The All-Numbers k-SUM problem is a seemingly harder version of k-SUM, where we want to decide for every input integer whether or not it belongs to a set of k inputs that sum to zero. It is known that All-Numbers k-SUM and k-SUM are fine-grained equivalent [55, Theorem 8.1] in the sense that one of the problems can be solved in $O(n^{k/2-\varepsilon})$ time for some ε > 0 if and only if the other problem can also be solved in that running time, but with possibly a different ε. We observe that our k-SUM protocol can be extended to solve All-Numbers k-SUM in the same running time.

Corollary 1.4 For any fixed integer k ≥ 3, the All-Numbers k-SUM problem can be solved in Merlin-Arthur time $\tilde{O}(n^{k/3})$.
Counting zero-weight cliques. In the #Zero-Weight k-Clique problem, we are given a simple undirected graph G on n vertices and m edges with integer edge weights from $[-n^c, n^c]$ for a positive constant c, and are tasked with counting the number of k-cliques in G whose edge weights sum to zero. The easier #k-Clique problem is equivalent to the special case of #Zero-Weight k-Clique where all edge weights in the input graph are zero.
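As a concrete reference point for the definition, the following is a brute-force counter for #Zero-Weight k-Clique. It is a sketch of the trivial $O(n^k)$-style enumeration, not of any protocol in this paper; the edge representation (a dictionary keyed by `frozenset({u, v})`) is an assumption made for illustration.

```python
from itertools import combinations


def count_zero_weight_cliques(n, weights, k):
    """Count k-cliques whose edge weights sum to zero.

    n: number of vertices, labelled 0..n-1.
    weights: dict mapping frozenset({u, v}) to the integer weight of edge {u, v};
             a missing key means the edge is absent.
    """
    count = 0
    for nodes in combinations(range(n), k):
        total = 0
        is_clique = True
        for u, v in combinations(nodes, 2):
            w = weights.get(frozenset((u, v)))
            if w is None:  # some pair is not an edge, so this is not a clique
                is_clique = False
                break
            total += w
        if is_clique and total == 0:
            count += 1
    return count
```

Setting every weight to zero recovers plain #k-Clique, matching the special case described above.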
These two problems have been extensively studied in fine-grained complexity. The trivial brute-force algorithm for both problems runs in $O(n^k)$ time. The #k-Clique problem can be solved faster using fast matrix multiplication [36,47]. For the detection versions of these problems, it has been conjectured that Zero-Weight k-Clique cannot be solved faster than $n^{k-\varepsilon}$, and that k-Clique cannot be solved faster than $n^{k\omega/3-\varepsilon}$ (where 2 ≤ ω < 2.373 denotes the matrix multiplication exponent [5,42,52]), for any constant ε > 0 [1,44]. Some recent works employ even stronger conjectures about the hardness of k-Clique for integers k not divisible by 3 [2]. For k = 3, the Zero-Weight k-Clique problem is simply the Zero-Weight Triangle problem, and it is known that any truly subcubic algorithm for this problem would refute both the APSP conjecture and the 3-SUM conjecture [54,55].
We present an improved Merlin-Arthur protocol for the harder #Zero-Weight k-Clique problem.
Theorem 1.5 For any fixed integer k ≥ 3, #Zero-Weight k-Clique on a graph with n nodes and m edges can be solved by a Merlin-Arthur protocol with proof length $O(n^{\lfloor k/2 \rfloor})$ and verification time $\tilde{O}\big(n^{\lceil k/2 \rceil} \cdot (m/n^2)^{\lfloor (k+1)/4 \rfloor} + n^{\lfloor k/2 \rfloor}\big)$.
Two notable cases are k = 3 and k = 4, for which Theorem 1.5 shows that #Zero-Weight 4-Clique can be solved in Merlin-Arthur time $\tilde{O}(n^2)$, which is near-optimal for dense graphs, and #Zero-Weight Triangle can be solved with proof length $\tilde{O}(n)$ and verification time $\tilde{O}(m)$, which is near-optimal for graphs of any sparsity. Applying known reductions [55] immediately implies quadratic time Merlin-Arthur protocols for the following problems, which we define later in Sect. 4.
Unsatisfiability of k-CNFs. We give a Merlin-Arthur protocol for certifying the unsatisfiability of a k-CNF (i.e., solving the k-UNSAT problem), which runs faster than the previously known $2^{n/2} \cdot \mathrm{poly}(n)$-time protocol.
Theorem 1.7 There is a universal constant δ > 0 such that for all sufficiently large integers k > 0, we can verify any unsatisfiable n-variable m-clause k-CNF with a Merlin-Arthur protocol running in $2^{n(1/2-\delta/k)} \cdot \mathrm{poly}(n, m)$ time.
Previously, Williams [58] had shown it is possible to count the number of satisfying assignments to CNFs on n variables and m clauses with a Merlin-Arthur protocol in $2^{n/2} \cdot \mathrm{poly}(n, m)$ time. We find Theorem 1.7 intriguing, not just because it runs more efficiently, but also because the result provably must not algebrize (in the sense of [6,34]). In particular, we observe that
• Williams' Merlin-Arthur protocol for #SAT algebrizes, and
• there is no algebrizing protocol for k-UNSAT running in $2^{n/2}/n^{\omega(1)}$ time.
More formally, we have the following two theorems:
Proposition 1.8 [Williams' protocol algebrizes] For every oracle A, #CNF-SAT$^A$ on formulas with n variables and size poly(n) can be computed in Merlin-Arthur time $2^{n/2} \cdot \mathrm{poly}(n)$ with oracle access to the multilinear extension of A over any field of characteristic greater than $2^n$ (and order at most $2^{\mathrm{poly}(n)}$).
Proposition 1.9 [Follows from [6]] There is an oracle A such that there is no Merlin-Arthur protocol running in $2^{n/2}/n^{\omega(1)}$ time for 1-UNSAT$^A$, even for protocols with oracle access to the multilinear extension of A (over any field of order $2^{\mathrm{poly}(n)}$).
Therefore the properties exploited in the protocol of Theorem 1.7 (which applies an earlier reduction of [35] from fine-grained complexity) are provably non-algebrizing. We are hopeful that further study of such results may lead to new progress in lower bounds via algorithms.
Quantified Boolean Formulas. A Quantified Boolean Formula in prenex normal form (QBF) is a formula consisting of a propositional formula F of size m over n variables, preceded by quantifiers $Q_i \in \{\exists, \forall\}$. Deciding whether a given QBF is true is a canonical PSPACE-complete problem.
Williams [58] gave a 3-round interactive protocol (i.e., an AMA protocol) for true QBFs that runs in $2^{2n/3} \cdot \mathrm{poly}(n, m)$ time. The same work [58] raised the question of whether there is a Merlin-Arthur (two-round) protocol for deciding true QBFs which runs in $2^{(1-\varepsilon)n} \cdot \mathrm{poly}(n, m)$ time for some constant ε > 0. We resolve this open problem in the affirmative:
Theorem 1.10 True Quantified Boolean Formulas (TQBF) with n variables and size $m \le 2^n$ can be certified by a Merlin-Arthur protocol running in $2^{4n/5} \cdot \mathrm{poly}(n, m)$ time.

Organization
In Sect. 2 we provide definitions and some useful known results. In Sect. 3 we present Merlin-Arthur protocols for the k-SUM problem and several related problems. In Sect. 4 we present a Merlin-Arthur protocol for counting zero-weight k-cliques, and show that it implies near-optimal protocols for many related fine-grained problems such as APSP. In Sect. 5 we present a Merlin-Arthur protocol for certifying unsatisfiability of k-CNFs. Then, in Sect. 6 we describe two barriers for obtaining better Merlin-Arthur protocols. In Sect. 7 we present a Merlin-Arthur protocol for the True Quantified Boolean Formulas problem. Finally we conclude with some open questions in Sect. 8.

Preliminaries
We assume basic familiarity with computational complexity and algorithms. The following notions will be particularly important for this paper.
Merlin-Arthur Protocols. We say that a function $f : \{0,1\}^* \to \{0,1\}$ has a Merlin-Arthur protocol (or proof system) running in T(n) time with proofs of length P(n) if there is a probabilistic algorithm V such that for all binary strings x with |x| = n:
• If f(x) = 1, then there is a $y \in \{0,1\}^{P(n)}$ such that V(x, y) accepts in T(n) time, with probability 1.
• If f(x) = 0, then for every $y \in \{0,1\}^{P(n)}$, V(x, y) rejects in T(n) time, with probability at least 2/3.
Concretely, we assume Arthur's verification algorithm V runs in the word-RAM model with words of length log(n). We only consider protocols where the proof length P(n) is bounded above by the verification time T (n). We often refer to T (n) as the Merlin-Arthur time of the protocol. If we say a problem can be "solved in Merlin-Arthur time T (n)," we mean it has a Merlin-Arthur protocol running in time T (n).
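Because valid proofs are accepted with probability 1, the rejection probability of 2/3 in the definition can be pushed arbitrarily close to 1 by independent repetition: Arthur reruns V on the same proof and accepts only if every run accepts, so an invalid proof survives r runs with probability at most $(1/3)^r$. The sketch below shows this standard amplification; `verifier` is a hypothetical callable standing in for V, and `toy_verifier` is an artificial stand-in used only to exercise the wrapper.

```python
import random


def amplified_verifier(verifier, x, y, rounds, rng=random):
    """Accept only if `verifier` accepts in every one of `rounds` independent runs.

    Assumes `verifier(x, y, rng)` has one-sided error: it always accepts valid
    proofs and accepts any invalid proof with probability at most 1/3.
    """
    return all(verifier(x, y, rng) for _ in range(rounds))


def toy_verifier(x, y, rng):
    """Artificial verifier: the only valid proof for x is y == x."""
    if y == x:
        return True  # perfect completeness
    return rng.random() < 1 / 3  # an invalid proof slips through w.p. 1/3
```

With 50 rounds, an invalid proof for `toy_verifier` is accepted with probability $(1/3)^{50}$, while a valid proof is still accepted with certainty.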
Williams' Multipoint Evaluation Protocol. We will use Williams' protocol [58] for the Multipoint Circuit Evaluation problem, defined as follows.
Definition 2.1 [58, Definition 1.1] The Multipoint Circuit Evaluation problem: given an arithmetic circuit C on n variables over a finite field $\mathbb{F}$, and a list $a_1, \dots, a_K \in \mathbb{F}^n$, output $(C(a_1), \dots, C(a_K)) \in \mathbb{F}^K$.
Theorem 2.2 [58, Theorem 3.1] For every prime power q and ε > 0, Multipoint Circuit Evaluation for K points in $(\mathbb{F}_q)^n$ on an arithmetic circuit C of n inputs, s gates, and degree d has an MA-proof system where:

Fast Polynomial Evaluation and Interpolation.
We need the following classical results on algebraic algorithms. We write [n] = {1, 2, . . . , n} to denote the set of the first n positive integers.

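Theorem 2.3 [Fast multipoint evaluation [22], multivariate version] Let k be a fixed positive integer. Given a k-variate polynomial $p \in \mathbb{F}[X_1, \dots, X_k]$ with each variable having individual degree less than n, presented as at most $n^k$ coefficients, and given kn points $\alpha_{i,j} \in \mathbb{F}$ for $i \in [k]$ and $j \in [n]$, we can compute $p(\alpha_{1,j_1}, \dots, \alpha_{k,j_k})$ for all $(j_1, \dots, j_k) \in [n]^k$ in $\tilde{O}(n^k)$ additions and multiplications in $\mathbb{F}$.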

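Theorem 2.4 [Fast interpolation [31], multivariate version] Let k be a fixed positive integer. Given kn points $\alpha_{i,j} \in \mathbb{F}$ for $i \in [k]$ and $j \in [n]$, where $\alpha_{i,1}, \dots, \alpha_{i,n}$ are distinct for each i, together with a prescribed value in $\mathbb{F}$ for each of the $n^k$ grid points $(\alpha_{1,j_1}, \dots, \alpha_{k,j_k})$, we can output the coefficients of the unique polynomial $p \in \mathbb{F}[X_1, \dots, X_k]$ taking these values in which every variable has individual degree less than n, in $\tilde{O}(n^k)$ additions and multiplications in $\mathbb{F}$.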
The original references for these two theorems only proved the univariate case (k = 1), but one can easily prove the multivariate versions above by applying the univariate algorithms to each variable one by one. Here we provide a sketch of the reduction from multivariate interpolation to univariate interpolation.
The original univariate versions of these theorems were also used in Williams' protocol [58].
Proof Sketch of Theorem 2.4 We will show how the univariate version [31] of the interpolation algorithm easily implies the k-variate case.
We prove the result by induction on k; the base case k = 1 is the univariate algorithm. Recall that the n points on the $x_k$ coordinate are $\alpha_{k,1}, \alpha_{k,2}, \dots, \alpha_{k,n}$. Consider the (k − 1)-variate polynomials $q_j(x_1, \dots, x_{k-1}) := p(x_1, \dots, x_{k-1}, \alpha_{k,j})$ for $j \in [n]$. Since the values of each $q_j$ on the (k − 1)-dimensional grid are given, by the induction hypothesis we know that the $\{q_j\}_{j\in[n]}$ are uniquely determined and can be computed by running the (k − 1)-variate interpolation algorithm n times, in $\tilde{O}(n \cdot n^{k-1}) = \tilde{O}(n^k)$ total time. Finally, for each tuple of degrees $(d_1, \dots, d_{k-1})$, the coefficients of the monomial $x_1^{d_1} \cdots x_{k-1}^{d_{k-1}}$ in the polynomials $\{q_j\}_{j\in[n]}$ taken together uniquely determine the coefficients of $x_1^{d_1} \cdots x_{k-1}^{d_{k-1}} x_k^{d}$ in the polynomial p for all 0 ≤ d ≤ n − 1. These coefficients can again be recovered by the univariate interpolation algorithm, taking $\tilde{O}(n)$ time for each tuple.
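The induction in the proof sketch can be traced in code. The following is a minimal sketch over a prime field $\mathbb{F}_p$, in which the fast univariate algorithm of [31] is swapped for naive Lagrange interpolation (so it does not achieve the $\tilde{O}(n^k)$ bound). The dictionary-based representation and the function names are assumptions made for illustration.

```python
from itertools import product


def interpolate_univariate(xs, ys, p):
    """Coefficients (low degree first) of the unique polynomial of degree < len(xs)
    over F_p through the points (xs[j], ys[j]). Naive Lagrange interpolation."""
    n = len(xs)
    coeffs = [0] * n
    for j in range(n):
        basis = [1]  # running product of (X - xs[i]) over i != j
        denom = 1    # running product of (xs[j] - xs[i]) over i != j
        for i in range(n):
            if i == j:
                continue
            new = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):
                new[d] = (new[d] - xs[i] * c) % p
                new[d + 1] = (new[d + 1] + c) % p
            basis = new
            denom = denom * (xs[j] - xs[i]) % p
        scale = ys[j] * pow(denom, -1, p) % p  # modular inverse, Python 3.8+
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * c) % p
    return coeffs


def interpolate_grid(alphas, values, p):
    """k-variate interpolation on a grid, following the induction on k.

    alphas[i] lists the n distinct points for variable i (0-indexed).
    values maps each index tuple (j_1, ..., j_k) to the prescribed value at
    (alphas[0][j_1], ..., alphas[k-1][j_k]).
    Returns a dict mapping exponent tuples (d_1, ..., d_k) to coefficients mod p.
    """
    k, n = len(alphas), len(alphas[0])
    if k == 1:
        cs = interpolate_univariate(alphas[0], [values[(j,)] for j in range(n)], p)
        return {(d,): cs[d] for d in range(n)}
    # q_j is p with its last variable fixed to alphas[-1][j]; interpolate each q_j.
    slices = []
    for j in range(n):
        sub = {idx[:-1]: v for idx, v in values.items() if idx[-1] == j}
        slices.append(interpolate_grid(alphas[:-1], sub, p))
    # For each monomial in the first k-1 variables, recover its dependence on x_k.
    result = {}
    for degs in product(range(n), repeat=k - 1):
        cs = interpolate_univariate(alphas[-1], [slices[j][degs] for j in range(n)], p)
        for d in range(n):
            result[degs + (d,)] = cs[d]
    return result
```

The final loop performs one univariate interpolation per degree tuple, which corresponds to the last step of the proof sketch.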

An Improved Merlin-Arthur Protocol for k-SUM
Let k be a positive integer. In the k-SUM problem, we are given n integers $a_1, \dots, a_n$ with magnitude at most $n^c$ for some constant c, and are tasked with determining if there exist indices $i_1, \dots, i_k$ (not necessarily distinct) such that $a_{i_1} + a_{i_2} + \cdots + a_{i_k} = 0$. We call a list of indices $i_1, \dots, i_k$ satisfying the above equation a k-SUM solution. We remark that another popular version of k-SUM from the literature, which we call k-SUM-Distinct, additionally requires the indices $i_1, \dots, i_k$ in the solution to be distinct. Here, for convenience and consistency with the definition used in [16], we focus on the k-SUM problem, and later in Sect. 3.1 note how k-SUM-Distinct can be easily reduced to k-SUM.
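The difference between k-SUM (indices may repeat) and k-SUM-Distinct is easy to miss, so here is a brute-force sketch of both checks. It only illustrates the definitions, assuming a solution means the chosen entries sum to zero; the function names are hypothetical.

```python
from itertools import combinations, product


def has_k_sum(nums, k):
    """k-SUM as defined above: indices i_1, ..., i_k need not be distinct."""
    return any(sum(choice) == 0 for choice in product(nums, repeat=k))


def has_k_sum_distinct(nums, k):
    """k-SUM-Distinct: the k indices must be pairwise distinct."""
    return any(sum(choice) == 0 for choice in combinations(nums, k))
```

For the list [1, -2, 5] with k = 3, the first check succeeds by using the entry 1 twice (1 + 1 - 2 = 0), while the second fails because 1 - 2 + 5 = 4.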
In the Merlin-Arthur setting, it is trivial to verify a k-SUM solution exists since Merlin can just send Arthur a solution. Certifying that no k-SUM solutions exist is much more challenging. Our protocol for this problem is based on the following protocol for quickly computing a coefficient in a product of polynomials.
Let $F_1(x), \dots, F_k(x)$ be univariate polynomials over $\mathbb{F}_q$ each of degree at most d, for some prime q. Let M be the total number of nonzero coefficients appearing among these polynomials. Then given any integer t and error rate δ ∈ (0, 1), there is a Merlin-Arthur protocol for determining the coefficient of $x^t$ in the product $P(x) = \prod_{i=1}^{k} F_i(x)$.
Proof We may assume that 0 ≤ t ≤ kd, since otherwise the coefficient of $x^t$ in P is zero. Let $P(x) = \prod_{i=1}^{k} F_i(x) = \sum_{\ell=0}^{kd} p_\ell\, x^\ell$ be the product of all the $F_i$ polynomials. Set $m = \lceil \sqrt{d}\, \rceil$. The protocol works as follows.
1. For each nonnegative integer $\ell \le kd$ with $\ell \equiv t \pmod m$, Merlin sends Arthur some $c_\ell \in \mathbb{F}_q$. Each such $c_\ell$ term is Merlin's claim for the value of the corresponding coefficient $p_\ell$ in P(x).
2. Arthur takes an integer h such that $q^h \ge kd/\delta$ and then samples $w \in \mathbb{F}_{q^h}$ uniformly at random. To construct the field $\mathbb{F}_{q^h}$, Arthur just needs a polynomial of degree h irreducible over $\mathbb{F}_q$. As noted in [58], we can do this efficiently by having Merlin send such a polynomial, and then having Arthur verify the polynomial is irreducible in asymptotically (1) time using known irreducibility tests [40, Section 8.2].
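For the rest of this protocol, Arthur performs all computations over $\mathbb{F}_{q^h}$. For each polynomial $F_i(x) = \sum_{\ell=0}^{d} f_{i,\ell}\, x^\ell$ of degree at most d, we say its reduced form is the polynomial
$$G_i(x) = \sum_{\ell=0}^{d} f_{i,\ell}\, w^{\ell}\, x^{\ell \bmod m} \tag{1}$$
formed by reducing the polynomial $F_i(wx)$ modulo $x^m - 1$.

Arthur first constructs the reduced forms $G_i$ of $F_i$ for each $1 \le i \le k$, in $\tilde{O}(M)$ time. Then, using fast polynomial multiplication, Arthur computes the product $G(x) = \prod_{i=1}^{k} G_i(x)$. Writing $[x^j]\,G(x)$ for the coefficient of $x^j$ in G, by adding the coefficients of this polynomial at exponents congruent to t modulo m and appealing to the definition of the reduced polynomials in Eq. (1), Arthur can compute the quantity
$$\sum_{j \equiv t \ (\mathrm{mod}\ m)} [x^j]\, G(x) \;=\; \sum_{\substack{(\ell_1, \dots, \ell_k) \\ b_1 + \dots + b_k \equiv t \ (\mathrm{mod}\ m)}} \prod_{i=1}^{k} f_{i,\ell_i} w^{\ell_i} \;=\; \sum_{\substack{(\ell_1, \dots, \ell_k) \\ \ell_1 + \dots + \ell_k \equiv t \ (\mathrm{mod}\ m)}} \prod_{i=1}^{k} f_{i,\ell_i} w^{\ell_i} \;=\; \sum_{\substack{\ell \le kd \\ \ell \equiv t \ (\mathrm{mod}\ m)}} p_\ell\, w^\ell. \tag{2}$$
In the second summation above, we are summing over a subset of k-tuples $(\ell_1, \dots, \ell_k)$ with the property that $0 \le \ell_i \le d$ for each i. We define $b_i$ to be the residue of $\ell_i$ modulo m for each i, and only consider those k-tuples in the sum if the sum of their residues modulo m is congruent to t modulo m. In the transition from the second to the third summation above, we note that this is equivalent to summing over all k-tuples $(\ell_1, \dots, \ell_k)$ such that $0 \le \ell_i \le d$ for each i and the sum of the $\ell_i$ is congruent to t modulo m.

After computing the sum from Eq. (2), Arthur also computes
$$\sum_{\substack{\ell \le kd \\ \ell \equiv t \ (\mathrm{mod}\ m)}} c_\ell\, w^\ell \tag{3}$$
in $\tilde{O}(M)$ time using the values Merlin sent. If this sum and the value of the sum from Eq. (2) agree over $\mathbb{F}_{q^h}$, then Arthur accepts and returns $c_t$ as the coefficient of $x^t$ in P(x). If the sums disagree, then Arthur rejects the proof.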
In the above protocol, if Merlin sends values with $c_\ell = p_\ell$ for all $\ell \le kd$ with $\ell \equiv t \pmod m$, then the values from Eqs. (2) and (3) will agree. If this happens, Arthur will accept and correctly determine $p_t$ as the value of the desired coefficient.

The only way for Arthur to accept an incorrect value for the coefficient is if $c_t \ne p_t$. In this case,
$$Q(x) = \sum_{\substack{\ell \le kd \\ \ell \equiv t \ (\mathrm{mod}\ m)}} p_\ell\, x^\ell \quad \text{and} \quad C(x) = \sum_{\substack{\ell \le kd \\ \ell \equiv t \ (\mathrm{mod}\ m)}} c_\ell\, x^\ell$$
are distinct polynomials over $\mathbb{F}_q$ of degree at most kd. This means they agree on at most kd points. So, for uniform random $w \in \mathbb{F}_{q^h}$, we have $Q(w) = C(w)$ with probability at most $kd/q^h \le \delta$. Thus with probability at least $1 - \delta$, Arthur rejects an incorrect proof.
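To make Arthur's side of this protocol concrete, the following Python sketch runs the check on small inputs. It is only an illustration under simplifying assumptions that are ours rather than the paper's: the extension degree is fixed to h = 1 (so the prime q itself must be at least kd/δ), m is taken to be ⌈√d⌉, polynomials are stored as dictionaries mapping exponents to coefficients, the product of reduced forms uses schoolbook multiplication in place of fast polynomial multiplication, and the names `reduced_form`, `poly_mul`, `arthur_check` and `Q_PRIME` are hypothetical.

```python
import math

Q_PRIME = (1 << 61) - 1  # assumed field size: the Mersenne prime 2^61 - 1 (h = 1)


def reduced_form(f, w, m, q):
    """Coefficient list of f(w*x) reduced modulo x^m - 1.

    f maps exponent -> coefficient; the term f_l * x^l contributes
    f_l * w^l to the coefficient of x^(l mod m).
    """
    g = [0] * m
    for ell, coeff in f.items():
        g[ell % m] = (g[ell % m] + coeff * pow(w, ell, q)) % q
    return g


def poly_mul(a, b, q):
    """Schoolbook product of two coefficient lists modulo q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % q
    return out


def arthur_check(polys, d, t, claims, w, q=Q_PRIME):
    """Return Merlin's claimed coefficient of x^t if the check passes, else None.

    polys:  list of k sparse polynomials of degree at most d.
    claims: maps every l in [0, k*d] with l = t (mod m) to Merlin's value c_l.
    w:      Arthur's random evaluation point; assumes 0 <= t <= k*d.
    """
    k = len(polys)
    m = math.isqrt(max(d, 1) - 1) + 1  # ceil(sqrt(d)) for d >= 1
    expected = {ell for ell in range(k * d + 1) if ell % m == t % m}
    if set(claims) != expected:
        return None
    prod = [1]
    for f in polys:
        prod = poly_mul(prod, reduced_form(f, w, m, q), q)
    # Adding the coefficients at exponents j = t (mod m) gives sum_l p_l * w^l.
    lhs = sum(c for j, c in enumerate(prod) if j % m == t % m) % q
    # Merlin's side of the identity: sum_l c_l * w^l.
    rhs = sum(c * pow(w, ell, q) for ell, c in claims.items()) % q
    return claims[t] % q if lhs == rhs else None
```

In this sketch an honest list of claims is always accepted. If Merlin alters only $c_t$, the two sides differ by a nonzero multiple of $w^t$, so the check fails for every nonzero w; more elaborate lies are caught only with the probability guaranteed by the lemma.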

Remark 3.2
Recall that in the Subset Sum problem, we are given input integers $a_1, \dots, a_n$, and must decide if some collection of the inputs sums to a given target integer t. Although framed somewhat differently, Nederlof's Merlin-Arthur protocol for Subset Sum from [45] employs a similar tactic, and can be recovered by applying Lemma 3.1 to check if the coefficient of $x^t$ in the product $\prod_{i=1}^{n} (1 + x^{a_i})$ is nonzero or not.

Reminder of Theorem 1.1 For any fixed integer k ≥ 3, certifying that a list of n integers has no k-SUM solution can be done in Merlin-Arthur time $\tilde{O}(n^{k/3})$.
Before proving Theorem 1.1, we first informally describe the three primary ideas underlying our Merlin-Arthur protocol.
First, solving k-SUM corresponds to checking the coefficient of some product of polynomials, where the degree of the polynomials is related to the magnitude max i |a i | of the input integers. This is hard in general since these magnitudes could be large polynomials in n, but could be made more efficient if there was a simple way to reduce the sizes of the inputs.
The second idea comes from the conondeterministic algorithm of [16, Lemma 5.8] for k-SUM: we have Merlin send a small prime p such that "few" sums of the k input integers vanish modulo p. Given p, Arthur can easily count the number of these sums (intuitively, because Arthur can replace each a i with its residue modulo p to reduce the size of the input integers). If Merlin then sends all the k-tuples of inputs that sum to zero modulo p, Arthur can check that the number of tuples sent matches the count computed, and then scan through the list to verify that none of the given sums equal zero over the integers.
The third and final idea is to employ the protocol for fast polynomial multiplication from Lemma 3.1.
We now describe the protocol.
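Proof of Theorem 1.1 Suppose we are given a k-SUM instance on n integers $a_1, \dots, a_n \in [-n^c, n^c]$ for some constant c > 0. Merlin first sends a prime $p = \tilde{\Theta}(n^{2k/3})$. Let
$$S = \{(i_1, \dots, i_k) \in [n]^k : a_{i_1} + \dots + a_{i_k} \equiv 0 \pmod p\}$$
be the set of k-tuples whose sums vanish modulo p. Merlin additionally sends a set T of k-tuples of indices such that $|T| \le \tilde{O}(n^{k/3})$ and claims that T = S. Now, for each $i \in [n]$ let $b_i$ be the residue of $a_i$ modulo p. Define the polynomial
$$B(x) = \sum_{i=1}^{n} x^{b_i}.$$
The coefficients of the k-th power of this polynomial encode information that will help us solve the k-SUM problem. In particular, we leverage the following simple observation.

Claim 3.3 For each integer j, let $s_j$ denote the coefficient of $x^j$ in $B(x)^k$. Then $|S| = s_0 + s_p + \dots + s_{(k-1)p}$.

Proof Expanding the product, $s_j$ equals the number of k-tuples $(i_1, \dots, i_k) \in [n]^k$ with $b_{i_1} + \dots + b_{i_k} = j$. Since $0 \le b_i \le p - 1$ for each i, such a sum lies in $[0, k(p-1)]$, so it is divisible by p if and only if it equals $\ell p$ for some $\ell \in \{0, 1, \dots, k-1\}$. Now, since each $b_i$ is the residue of $a_i$ modulo p, we know that
$$a_{i_1} + \dots + a_{i_k} \equiv b_{i_1} + \dots + b_{i_k} \pmod p.$$
Combining these observations, we get that $s_0 + s_p + \dots + s_{(k-1)p}$ is equal to the number of k-tuples $(i_1, \dots, i_k) \in [n]^k$ such that
$$a_{i_1} + \dots + a_{i_k} \equiv 0 \pmod p,$$
which proves the desired result.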
Returning to the protocol, Merlin and Arthur run k instances of the protocol from Lemma 3.1 in parallel, over a field of size q for some prime $q > n^k$, with error rate δ = 1/(kn), to determine the coefficients $s_{\ell p}$ of $x^{\ell p}$ in $B(x)^k$ for all $\ell \in \{0, 1, \dots, k-1\}$. Arthur rejects if Merlin fails to convince him of the values of any of these coefficients.

Otherwise, Arthur checks that
$$s_0 + s_p + \dots + s_{(k-1)p} = |T|. \tag{4}$$
He also checks that for each $(i_1, \dots, i_k) \in T$, we have $a_{i_1} + \dots + a_{i_k} \equiv 0 \pmod p$ and $a_{i_1} + \dots + a_{i_k} \ne 0$ over the integers. If both these checks pass, Arthur accepts. Otherwise, he rejects.

We now explain why this Merlin-Arthur proof system is correct. First, suppose that no k of the $a_i$ sum to zero. We show that Merlin has a proof which always convinces Arthur to accept.
By the prime number theorem, there exists some constant C such that there are at least $n^{2k/3}$ distinct primes in the interval $I = [n^{2k/3}, C n^{2k/3} \log n]$. Now, by assumption, each sum
$$a_{i_1} + \dots + a_{i_k} \tag{5}$$
is a nonzero integer with magnitude at most $k n^c$, and thus has at most $c(\log n + \log k)$ distinct prime divisors. Thus by the pigeonhole principle, there exists a prime in the interval I which divides at most
$$\frac{n^k \cdot c(\log n + \log k)}{n^{2k/3}} \le \tilde{O}(n^{k/3})$$
of the $n^k$ sums of the form presented in Eq. (5). So, Merlin can send a prime p satisfying the desired properties to Arthur. He also sends T = S, the list of sums of the form given in Eq. (5) which are divisible by p, which has $|T| \le \tilde{O}(n^{k/3})$ by the choice of p. If Merlin sends the correct values for $s_0, s_p, \dots, s_{(k-1)p}$, then Eq. (4) will hold by Claim 3.3, and Arthur accepts.

Now, suppose that some k of the $a_i$ do in fact sum to zero. In this case, we show that with high probability Arthur will reject.
First, if the set T which Merlin sends contains a k-tuple whose sum does not vanish modulo p, or a k-tuple whose sum is zero over the integers, then Arthur will automatically reject. Otherwise, every tuple in T belongs to S, and by assumption the set T is missing some tuple $(j_1, \dots, j_k)$ such that $a_{j_1} + \dots + a_{j_k} = 0$.
Reducing the above equation modulo p, we see that $b_{j_1} + \dots + b_{j_k} \equiv 0 \pmod p$, so this tuple lies in S but not in T.
Then by Claim 3.3, if Merlin and Arthur have decided on the correct coefficients of $B(x)^k$, we have $s_0 + s_p + \dots + s_{(k-1)p} = |S| > |T|$ and Arthur will reject. By the union bound and Lemma 3.1, Arthur correctly rejects with probability at least $1 - k\delta = 1 - 1/n$.
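The counting identity of Claim 3.3 and Arthur's two final checks can be tried out on toy instances with the Python sketch below. It is an illustration only: the coefficients of $B(x)^k$ are computed directly by repeated multiplication rather than certified through the protocol of Lemma 3.1, the prime p is supplied by hand instead of being chosen by the pigeonhole argument, indices are 0-based, and the function names are hypothetical.

```python
from itertools import product


def power_coefficients(residues, k, p):
    """Coefficient list of B(x)^k, where B(x) = sum_i x^(b_i) with 0 <= b_i < p."""
    base = [0] * p
    for b in residues:
        base[b] += 1
    coeffs = [1]
    for _ in range(k):
        nxt = [0] * (len(coeffs) + p - 1)
        for i, x in enumerate(coeffs):
            if x:
                for j, y in enumerate(base):
                    nxt[i + j] += x * y
        coeffs = nxt
    return coeffs


def count_vanishing_mod_p(a, k, p):
    """s_0 + s_p + ... + s_{(k-1)p}, the quantity on the left of Eq. (4)."""
    s = power_coefficients([x % p for x in a], k, p)
    return sum(s[l * p] for l in range(k) if l * p < len(s))


def honest_T(a, k, p):
    """The set S of all index tuples whose sums vanish modulo p."""
    return [tup for tup in product(range(len(a)), repeat=k)
            if sum(a[i] for i in tup) % p == 0]


def arthur_accepts(a, k, p, T):
    """Arthur's final checks, given Merlin's prime p and list of tuples T."""
    tuples = set(T)
    # Check (4): the number of sums vanishing mod p must equal |T|.
    if count_vanishing_mod_p(a, k, p) != len(tuples):
        return False
    # Every listed tuple must vanish mod p but be nonzero over the integers.
    for tup in tuples:
        total = sum(a[i] for i in tup)
        if total % p != 0 or total == 0:
            return False
    return True
```

For an instance with no solution, passing `honest_T(a, k, p)` as T is accepted. For an instance with a solution, any T that omits the true solutions fails the count in Eq. (4), and any T that includes one fails the nonzero check.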

Implications of the k-SUM protocol
We now show the implications of our k-SUM protocol.
We first consider the k-SUM-Partitioned problem, where we are given k input lists $A^{(1)}, A^{(2)}, \dots, A^{(k)}$ each consisting of n integers from $[-n^c, n^c]$ for some constant c, and want to determine if there exist indices $i_1, \dots, i_k$ such that $A^{(1)}_{i_1} + A^{(2)}_{i_2} + \dots + A^{(k)}_{i_k} = 0$ (this problem has also been called k-SUM' [25] and Colorful k-SUM). We note there is a deterministic reduction from k-SUM-Partitioned to k-SUM, extending the case of 3-SUM [25].

Corollary 3.4 For any fixed integer k ≥ 3, certifying that a k-SUM-Partitioned instance has no solution can be done in Merlin-Arthur time $\tilde{O}(n^{k/3})$.
Proof Sketch Let M = 10kn c . We create a k-SUM instance as follows. For every 1 ≤ i ≤ k and every integer a from the input list A (i) , we include the integer in the k-SUM instance. A solution to the k-SUM-Partitioned immediately implies a solution of the new k-SUM instance. Conversely, it can be shown that any k-SUM solution must recover a solution of the k-SUM-Partitioned instance. Hence, applying the protocol from Theorem 1.1 to this k-SUM instance of kn integers can solve the original k-SUM-Partitioned instance.
Recall that in the k-SUM-Distinct problem, we are given n integers a 1 , . . . , a n with magnitude at most n c and need to determine if there exist k distinct indices i 1 , . . . , i k such that a i 1 + · · · + a i k = 0. We will use a folklore deterministic reduction from k-SUM-Distinct to k-SUM-Partitioned.

Corollary 3.5 For any fixed integer k ≥ 3, certifying that a list of n integers has no k-SUM-Distinct solution can be done in Merlin-Arthur time $\tilde{O}(n^{k/3})$.
Proof Given a k-SUM-Distinct instance a 1 , a 2 , . . . , a n , we will deterministically create Observe that there must exist a set of coordinates C ⊆ [log n] of size |C| ≤ k − 1, so that the projections bin(i 1 )| C , . . . , bin(i k )| C are still distinct. Hence, for every possibility of The total number of instances created is at most c k · (log n) k−1 for some constant c k .
By another folklore reduction, our protocol for the 3-SUM-Partitioned problem immediately implies an improved protocol for the Subset Sum problem.

Reminder of Corollary 1.2 Certifying that a list of n integers has no Subset Sum solution can be done in 2 n/3 · poly(n) Merlin-Arthur time.
Proof Suppose we have an instance of Subset Sum consisting of n inputs a 1 , a 2 , . . . , a n and a target integer t. We partition the set [n] into the disjoint union of three subsets A, B, C ⊆ [n], each with size at most n/3 , and define the sets Then there exists a subset S ⊆ [n] such that i∈S a i = t if and only if there exist x ∈ X , y ∈ Y , z ∈ Z such that x + y + z = 0, which is a 3-SUM-Partitioned instance. Note that X , Y , Z each have at most 2 n/3 elements. Applying Corollary 3.4 solves the problem in 2 n/3 · poly(n) time.
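The three-way split used in this proof can be sketched in Python as follows. The sketch makes its own choices where the text above leaves details open: the inputs are split by position into three blocks of size at most ⌈n/3⌉, the target is folded into the third list by subtracting t from every subset sum of the third block (one natural choice, assumed here), the empty sub-collection is allowed, and a brute-force search stands in for the Merlin-Arthur protocol of Corollary 3.4. All function names are hypothetical.

```python
def subset_sums(values):
    """All sums of sub-collections of values, including the empty one."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return sums


def to_three_sum_partitioned(a, t):
    """Split the inputs into three blocks and build the lists X, Y, Z."""
    n = len(a)
    third = -(-n // 3)  # ceil(n / 3)
    block_a, block_b, block_c = a[:third], a[third:2 * third], a[2 * third:]
    X = sorted(subset_sums(block_a))
    Y = sorted(subset_sums(block_b))
    # Assumed placement of the target: shift the third list by -t.
    Z = sorted(s - t for s in subset_sums(block_c))
    return X, Y, Z


def has_partitioned_solution(X, Y, Z):
    """Brute-force stand-in for deciding the 3-SUM-Partitioned instance."""
    zs = set(Z)
    return any(-(x + y) in zs for x in X for y in Y)
```

With these definitions, x + y + z = 0 for some x ∈ X, y ∈ Y, z ∈ Z exactly when some sub-collection of the inputs sums to t, and each list has at most $2^{\lceil n/3 \rceil}$ entries.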

Reminder of Corollary 1.4 For any fixed integer k ≥ 3, the All-Numbers k-SUM problem can be solved in Merlin-Arthur time $\tilde{O}(n^{k/3})$.
Proof For every index i such that a i is part of a k-SUM solution, Merlin simply sends a witnessing solution to Arthur. Let S ⊆ [n] be the set of remaining indices, which do not participate in any solution. It remains to verify that S is correct.
Denoting By our choice of M, every k-SUM solution in this new instance must use exactly one integer from B, and hence corresponds to a k-SUM solution in the original instance that uses a i for some i ∈ S. So it suffices to use the protocol from Theorem 1.1 to prove that this new k-SUM instance has no solution.

Reminder of Corollary 1.3 MinPlus Convolution can be solved in Merlin-Arthur time $\tilde{O}(n)$.
Proof Sketch Merlin first sends the correct values of c k , each accompanied with a witness pair (i, j) such that i + j = k and a i + b j = c k . Then it remains to verify that a i + b j ≥ c i+ j for all i, j, which is equivalent to the MaxConv UpperBound problem defined in [18].
In [18, Appendix A], it was shown (using the techniques from [54, Theorem 3.3]) that MaxConv UpperBound can be deterministically reduced to the 3-SUM Convolution problem. In this problem, we are given three integer arrays a, b, c and want to decide whether there exists a pair of indices (i, j) such that $a_i + b_j = c_{i+j}$.

In the Zero-Weight Triangle problem, we are given an undirected graph G on n vertices and m edges with weights from $[-n^c, n^c]$ for some positive constant c, and are tasked with determining if G contains a triangle whose edge weights sum to zero.

Corollary 3.6 Certifying that a given graph has no zero-weight triangles can be done in Merlin-Arthur time $\tilde{O}(m)$.
Proof Sketch We first make the graph directed by replacing each edge connecting vertices u and v with two arcs, one going from u to v and the other going from v to u. Then by making three copies of the original graph, we may assume without loss of generality that the graph is tripartite with three parts A, B, C, and edges are oriented from A to B, B to C, and C to A.
We use the reduction described in [37]. Merlin first assigns integer node labels $0 \le \ell(u) \le \mathrm{poly}(n)$ to each node u in the graph. For each edge (u, v) of weight w, insert the integer $\ell(u) - \ell(v) + w$ into the 3-SUM instance. Then it is easy to see that any zero-weight triangle (a, b, c) would lead to a 3-SUM solution
$$(\ell(a) - \ell(b) + w_{ab}) + (\ell(b) - \ell(c) + w_{bc}) + (\ell(c) - \ell(a) + w_{ca}) = 0,$$
where $w_{uv}$ denotes the weight of edge (u, v). On the other hand, if the graph does not contain a zero-weight triangle, then a simple probabilistic argument implies the existence of a way to pick the node labels so that the resulting 3-SUM instance has no solution.
In the next section we will see that we can actually count the number of zero-weight triangles by a different Merlin-Arthur protocol with essentially the same time complexity.

Counting Zero-Weight Cliques
In this section we present the Merlin-Arthur protocol for #Zero-Weight k-Clique and prove Theorem 1.5, which is restated below. We assume the input graph is a simple undirected graph with n nodes and m weighted edges, where $m \ge \Omega(n)$ and all edge weights are in $[-n^c, n^c]$ for some constant c.
Proof Without loss of generality, we assume that the input graph is a k-partite graph with k parts of nodes A 1 , A 2 , . . . , A k/2 , B 1 , B 2 , . . . , B k/2 , each containing n nodes, and every edge connects two nodes coming from different parts. We identify the nodes with integers {1, 2, . . . , kn}. We also denote We encode the edge weights in binary, and will use arrow notation to emphasize that they are bit-vectors of length O(log n). For each node b ∈ B and node a ∈ A, let f b (a) be the binary encoding of the weight of edge (a, b) (if this edge does not appear in the input graph, we simply treat its edge weight as a large enough positive number M so that it can never participate in a zero-weight k-clique). We extend f b (a) to a vector polynomial f b (x) of degree |A| = O(n). Note that f b (x) consists of O(log n) many scalar polynomials each corresponding to one bit in the binary encoding of edge weights. These scalar polynomials are over the field F p for some prime p = poly(n) and p > n k .
Then, define a k/2 -variate vector polynomial h(x 1 , . . . , x k/2 ), such that for every vector h(a 1 , . . . , a k/2 ) encodes the total weight of the clique formed by the nodes a 1 , . . . , a k/2 . Note that h(x 1 , . . . ,  (x 1 , . . . , x k/2 ), w(b 1 , . . . , b k/2 ), where Q takes 2 + k/2 · k/2 input integers (encoded in binary), and outputs 1 if the input integers sum to exactly zero, and outputs 0 otherwise. Hence, by definition, a 1 ∈A 1 ,...a k/2 ∈A k/2 P a 1 , . . . , a k/2 (7) equals the number of zero-weight k-cliques in the input graph. Note that Q only involves a constant number of additions and a comparison, which can be implemented by an AC 0 circuit with O(log n) input gates and polylog(n) size. We can convert Q into an equivalent arithmetic circuit of polylog(n) size and degree. It then follows that P is a polynomial (over F p ) of degree at most n · polylog(n).
At the beginning of the protocol, Merlin sends the polynomial P defined in Equation (6) to Arthur, represented asÕ(n k/2 ) many coefficients. Then, Arthur can evaluate the values of P(a 1 , . . . , a k/2 ) for all (a 1 , . . . , a k/2 ) ∈ A 1 × · · · × A k/2 inÕ(n k/2 ) time using Theorem 2.3. Then Arthur can easily compute the count of zero-weight k-cliques using Equation (7).
It remains to analyze the time complexity for these preprocessing steps.
Let N (b) denote the set of neighbors of node b in A. We will show that, after O(n)-time preprocessing, this step can be performed in |N (b)| · polylog(n) time for every node b, and thus the total running time isÕ(n + b∈B |N ( Since the O(log n) coordinates of the vector will be considered separately, in the following we only need to discuss how to process one of these coordinates. Abusing notation, we use f b (a) to indicate the value of f b (a) on the coordinate under consideration, and use w(a, b) and M to denote the corresponding values on this coordinate. That is, f b (a) = w(a, b) if a ∈ N (b),  and f b (a) = M if a ∈ A \ N (b).

By Lagrange interpolation, we have
$$f_b(x) = \sum_{a \in A} f_b(a) \prod_{a' \in A \setminus \{a\}} \frac{x - a'}{a - a'}.$$
The denominator $\prod_{a' \in A \setminus \{a\}} (a - a')$ can be easily computed for all a in $\tilde{O}(n)$ total time (recall that the node set A is identified with the integer set $\{1, 2, \dots, |A|\}$).
For each $x = r_i$, one can perform a simple $\tilde{O}(n)$-time preprocessing so that for each $a \in A$, the numerator $\prod_{a' \in A \setminus \{a\}} (r_i - a')$ can be computed in a constant number of field operations. Then, it only takes $O(|N(b)|)$ field operations to evaluate $f_b(r_i)$.
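One way to realize this evaluation step is sketched below in Python for a single scalar coordinate. The sketch rests on assumptions of ours: A is the set {1, ..., N}, p is a prime larger than N, the evaluation point r is not congruent to any element of A, and the default value M is handled through the identity $\sum_{a \in A} L_a(r) = 1$ for the Lagrange basis polynomials $L_a$, so that $f_b(r) = M + \sum_{a \in N(b)} (w(a,b) - M)\, L_a(r)$. The text above does not spell out this last step, and the function names are hypothetical.

```python
def inverse_denominators(N, p):
    """inv[a] = 1 / prod_{a' != a} (a - a') mod p, for A = {1, ..., N}.

    Uses prod_{a' != a} (a - a') = (a-1)! * (-1)^(N-a) * (N-a)!.
    """
    fact = [1] * (N + 1)
    for i in range(1, N + 1):
        fact[i] = fact[i - 1] * i % p
    inv = [0] * (N + 1)
    for a in range(1, N + 1):
        den = fact[a - 1] * fact[N - a] % p
        if (N - a) % 2 == 1:
            den = (-den) % p
        inv[a] = pow(den, -1, p)
    return inv


def vanishing_value(r, N, p):
    """Z(r) = prod_{a in A} (r - a) mod p, computed once per point r."""
    z = 1
    for a in range(1, N + 1):
        z = z * (r - a) % p
    return z


def evaluate_f_b(r, z_r, weights, M, p, inv):
    """Evaluate the interpolant of f_b at r using only the neighbours of b.

    weights maps each neighbour a in N(b) to w(a, b); every other a in A
    carries the default value M. The numerator prod_{a' != a} (r - a')
    is recovered as Z(r) / (r - a).
    """
    total = M % p
    for a, w in weights.items():
        lagrange = z_r * pow((r - a) % p, -1, p) % p * inv[a] % p
        total = (total + (w - M) * lagrange) % p
    return total
```

After the shared preprocessing (`inverse_denominators` once, `vanishing_value` once per point), each call to `evaluate_f_b` touches only the |N(b)| neighbours of b, which matches the operation count stated above up to the cost of modular inverses.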
We remark that the protocol of Theorem 1.5 can be also used to count cliques with other kinds of restrictions on the edge weights, by simply modifying the predicate Q in Equation (6). For example, our protocol can also apply to the #Negative k-Clique problem, which asks to count the number of k-cliques whose sum of edge weights is negative. By modifying Q in Eq. (6) we can also count the number of any 4-node (induced or not-necessarily-induced) subgraphs in the input graph, in near-optimalÕ(n 2 ) Merlin-Arthur time. See [56] for the best known algorithms to detect 4-node subgraphs in the input graph.

Corollary 4.2 For any 4-node pattern graph H , counting the number of (induced or not-necessarily-induced) copies of H in the input graph can be done inÕ(n 2 ) Merlin-Arthur time.
Combining known reductions with Corollary 4.1, our protocol for #Negative Triangle implies near-optimal protocols for MinPlus Product and APSP. Recall that in the MinPlus Product problem, we are given two n × n integer matrices A, B, and want to compute the matrix C defined as $C_{i,j} = \min_{k=1}^{n} \{A_{i,k} + B_{k,j}\}$.

Proof of Corollary 1.6 We first show that MinPlus Product can be solved in Merlin-Arthur time $\tilde{O}(n^2)$. Merlin first sends to Arthur the correct product C, together with the witness $\arg\min_{k=1}^{n} \{A_{i,k} + B_{k,j}\}$ for each entry $C_{i,j}$ in the product. Arthur checks the validity of these witnesses, and then verifies that $A_{i,k} + B_{k,j} \ge C_{i,j}$ holds for all i, j, k. This task easily reduces to the Negative Triangle problem [55] as follows: create a tripartite graph (X, Y, Z) with edge weights defined as
$$w(x_i, y_k) = A_{i,k}, \qquad w(y_k, z_j) = B_{k,j}, \qquad w(z_j, x_i) = -C_{i,j},$$
and certify that this new graph has no negative triangles, using Corollary 4.1.
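Arthur's two checks in this proof can be illustrated with the short Python sketch below. It is not the protocol itself: a cubic-time brute-force search stands in for the Negative Triangle certificate of Corollary 4.1, the triangle weights follow the standard reduction $A_{i,k} + B_{k,j} - C_{i,j}$ (assumed here), witnesses are assumed to be valid indices, and all function names are hypothetical.

```python
def min_plus_with_witnesses(A, B):
    """What an honest Merlin sends: the product C and one argmin per entry."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            best = min(range(n), key=lambda k: A[i][k] + B[k][j])
            C[i][j], W[i][j] = A[i][best] + B[best][j], best
    return C, W


def witnesses_valid(A, B, C, W):
    """O(n^2) check that every claimed entry is attained by its witness."""
    n = len(A)
    return all(A[i][W[i][j]] + B[W[i][j]][j] == C[i][j]
               for i in range(n) for j in range(n))


def has_negative_triangle(A, B, C):
    """Brute-force stand-in for the Negative Triangle protocol.

    The triangle (x_i, y_k, z_j) has weight A[i][k] + B[k][j] - C[i][j].
    """
    n = len(A)
    return any(A[i][k] + B[k][j] - C[i][j] < 0
               for i in range(n) for k in range(n) for j in range(n))


def arthur_accepts_product(A, B, C, W):
    """Accept iff each entry is attained and no entry is too large."""
    return witnesses_valid(A, B, C, W) and not has_negative_triangle(A, B, C)
```

The witness check rules out entries that are too small or not attained at all, while the absence of negative triangles rules out entries that are attained but not minimal.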
Using the Merlin-Arthur protocol for MinPlus Product, we immediately obtain an $\tilde{O}(n^2)$ time Merlin-Arthur protocol for APSP via the standard repeated squaring procedure. In particular, Merlin can send the matrices obtained from all O(log n) repeated squarings upfront, along with $\tilde{O}(n^2)$-length proofs of their correctness; Arthur can verify each squaring is correct in $\tilde{O}(n^2)$ time, one by one.
Given a simple undirected graph and a parameter t, the Triangle Listing problem [12,48,57] asks to report min(t, z) triangles in the graph, where z denotes the total number of triangles in the graph. Our results immediately imply a near-optimal protocol for this task.

Corollary 4.3 Triangle Listing can be solved in Merlin-Arthur time $\tilde{O}(m + t)$.
Proof Merlin uses Theorem 1.5 to prove that the input graph has z triangles in $\tilde{O}(m)$ time, and then sends min(t, z) many triangles to Arthur, who verifies that these triangles are valid and distinct.

Unsatisfiability of k-CNFs
In this section, we present a $2^{n/2 - n/O(k)} \cdot \mathrm{poly}(n, m)$ time Merlin-Arthur protocol for k-UNSAT with n variables and m clauses, proving Theorem 1.7 (note that $m \le O(n^k)$ in a k-CNF formula). This beats the previously known protocol for k-UNSAT running in $2^{n/2} \cdot \mathrm{poly}(n, m)$ time, which follows directly from [58, Theorem 3.4]. We need the following useful theorems.

Theorem 5.1 [Impagliazzo-Paturi [35, Lemma 2]]
Let δ > 0, and let F be a k-CNF formula on m clauses such that every satisfying assignment to F has at least δn variables set to true. For any ε > 0, there exists a k' > 0 and F', which is a disjunction of at most $2^{\varepsilon n}$ k'-CNFs on at most n(1 − δ/(ek)) variables, such that F is satisfiable iff F' is satisfiable. Moreover, F' can be computed from F in $2^{2\varepsilon n}\, \mathrm{poly}(m)$ time.

Theorem 5.2 [#SAT for Boolean formulas [58, Theorem 3.4]]
For any k > 0, #SAT for Boolean formulas with n variables and m connectives has a Merlin-Arthur proof system using 2 n/2 poly(n, m) time with randomness O(n) and error probability 1/ exp(n).

Recall that the binary entropy function H(·) is defined by taking
$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$
for all p ∈ (0, 1). We prove the following result.

Proof The idea behind this protocol is to handle the assignments with at most δn variables set to true, and the assignments with more than δn variables set to true, separately. Once we have verified that there are no satisfying assignments with at most δn variables set to true, we can make use of Theorem 5.1 to decompose the formula into formulas with fewer variables. Formally, given a k-CNF F on n variables and m clauses, Merlin and Arthur certify the unsatisfiability of F as follows:
1. Arthur enumerates over all possible $O(2^{H(\delta) n})$ assignments with at most δn variables set to true and verifies that none of them satisfy F.
2. Arthur uses Theorem 5.1 with $\varepsilon = 1/k^2$ to obtain at most $t = 2^{n/k^2}$ k'-CNFs
Verifying unsatisfiability for all the $F_i$'s in step 3 takes time where the inequality holds for sufficiently large k (for example, k ≥ 60 suffices). Thus, the total time taken by Arthur for verification is $\left(2^{n/2 - \delta n/(6k)} + 2^{H(\delta) n}\right) \cdot \mathrm{poly}(n, m)$. This completes the proof.
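The count of low-weight assignments enumerated in step 1 relies on the standard bound $\sum_{i \le \delta n} \binom{n}{i} \le 2^{H(\delta) n}$, valid for 0 < δ ≤ 1/2. The Python sketch below checks this bound numerically for given parameters. It is a sanity check of the arithmetic, not part of the protocol, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p), for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def low_weight_assignments(n, delta):
    """Number of assignments to n variables with at most delta*n ones."""
    return sum(math.comb(n, i) for i in range(math.floor(delta * n) + 1))


def entropy_bound_holds(n, delta):
    """Check sum_{i <= delta n} C(n, i) <= 2^(H(delta) n), for 0 < delta <= 1/2."""
    return math.log2(low_weight_assignments(n, delta)) <= binary_entropy(delta) * n
```

For example, `entropy_bound_holds(100, 0.1)` compares the number of assignments to 100 variables with at most 10 ones against $2^{100 \cdot H(0.1)}$.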
Reminder of Theorem 1.7 There is a universal constant δ > 0 such that for all sufficiently large integers k > 0, we can verify any unsatisfiable n-variable m-clause k-CNF with a Merlin-Arthur protocol running in 2 n(1/2−δ/k) · poly(n, m) time.
We stress that both of these are observations, which do not require any significant ideas that are not already in the literature. However, we find them striking to consider in the context of our other Merlin-Arthur protocols such as Theorem 1.7, which beat 2 n/2 time by exploiting the structure of k-CNF formulas.
First, we observe that Williams' protocol naturally algebrizes. Let $A : \{0, 1\}^* \to \{0, 1\}$ be an arbitrary oracle. For a constant $k \in \mathbb{N}$, we say that a k-CNF$^A$ formula is a k-CNF in n variables $x_1, \dots, x_n$ whose atoms are either literals, or of the form $A(x_{i_1}, \dots, x_{i_{k'}})$ where $k' \in [k]$ and each $i_j \in [n]$. For example, is a 3-CNF$^A$ formula. Recall that 3-CNF-SAT$^A$ (where we are given a CNF$^A$ formula $F^A$ and are asked if $F^A$ is satisfiable) is NP$^A$-complete, and its corresponding counting version #3-CNF-SAT$^A$ is #P$^A$-complete. This definition appeared in [24,51].
Reminder of Proposition 1.8 For every oracle A, #CNF-SAT A on formulas with n variables and size poly(n) can be computed in Merlin-Arthur time 2 n/2 · poly(n) with oracle access to the multilinear extension of A over any field of characteristic greater than 2 n (and order at most 2 poly(n) ).

Proof Sketch
The proposition follows almost directly from the same sort of argument used by Aaronson and Wigderson [6] to show that PSPACE = IP algebrizes, applying it to Williams' protocol. Given a #CNF-SAT A instance F A on n variables with poly(n) size, we can think of F A as an AND of poly(n) ORs of poly(n) literals plus copies of the oracle A which take variables as input. We convert F A into an arithmetic circuit over F p where p > 2 n is a prime in the natural way, where the ANDs and ORs are replaced by corresponding multilinear polynomials of degree at most poly(n), and the copies of oracle A are replaced by calls to the multilinear extensionÃ of A. This results in an arithmetic circuit C of at most poly(n) degree that agrees with F A on all Boolean assignments, with the property that C can be evaluated on any particular assignment in (F p ) n in poly(n) time, provided p < 2 poly(n) . Note that we are using the fact that we have oracle access toÃ: without it, we would not necessarily be able to evaluate C in poly(n) time.
The Merlin-Arthur protocol then divides the set of variables into two halves, and creates a new arithmetic circuit C' on n/2 variables, which equals the sum of $C(x_1, \dots, x_{n/2}, a)$ where a ranges over all $2^{n/2}$ Boolean assignments to the second half of variables. Merlin tells Arthur a list of values $v_1, \dots, v_{2^{n/2}} \in \mathbb{F}_p$, and wishes to prove to Arthur that $C'(b_i) = v_i$ for all $i = 1, \dots, 2^{n/2}$, where $b_1, \dots, b_{2^{n/2}} \in \{0, 1\}^{n/2}$ is a list of all Boolean assignments to the first half of variables (if Merlin can do so, $\sum_i v_i$ will equal the number of satisfying assignments to $F^A$). This is achieved by first defining "interpolating polynomials" $Q_1, \dots, Q_{n/2}$ such that for a fixed list of $2^{n/2}$ distinct elements $\alpha_1, \dots, \alpha_{2^{n/2}} \in \mathbb{F}_p$, we have that $Q_i(\alpha_j)$ outputs the i-th bit of the assignment $b_j$. Note that each $Q_i$ has degree at most $2^{n/2}$. Merlin sends to Arthur a univariate polynomial P(y) of degree $2^{n/2} \cdot \mathrm{poly}(n)$ representing the circuit C' composed with these $Q_i$'s. Arthur checks P by: (a) picking a random point $a \in \mathbb{F}_p$, and confirming that $C'(Q_1(a), \dots, Q_{n/2}(a)) = P(a)$, in $2^{n/2} \cdot \mathrm{poly}(n)$ time (using the properties of our C and C'), and (b) checking for all $i = 1, \dots, 2^{n/2}$ that $P(\alpha_i) = v_i$ in $2^{n/2} \cdot \mathrm{poly}(n)$ time, using fast univariate polynomial evaluation.
Finally, Arthur concludes that $\sum_{i=1}^{2^{n/2}} v_i$ equals the number of satisfying assignments to $F^A$.
We want to compute where [P] takes value 1 if the statement P is true, and 0 otherwise.
Letting t = log 2 (n/2) and letting our formula be By assumption, there is a Merlin-Arthur protocol (with access to the unique multilinear extensionÃ) running in time 2 t/2 /t ω (1) for computing 1-UNSAT A (F A ). Let n 1 = 2 t/2 /t ω(1) = √ n/(log n) ω (1) . By definition, the algorithm proceeds by guessing n 1 bits, randomly choosing n 1 bits, and then running an n 1 -time algorithm that makes at most n 1 calls toÃ.
Alice and Bob compute DISJ as follows. First, they both know the formula $F^A$ (but not necessarily the oracle A). So they just start simulating the MA protocol for 1-UNSAT$^A(F^A)$ separately. They can simulate the Merlin and Arthur steps in an MA communication protocol by having "public" nondeterminism of $n_1$ bits followed by "public" randomness. To simulate the deterministic algorithm making oracle calls, Alice and Bob have to communicate as follows. To handle all $n_1$ oracle calls, they need to make up to $n_1$ evaluations of $\tilde{A}(a_1, \dots, a_t, a_{t+1})$ on given tuples of points a in $\mathbb{F}_q^{t+1}$ (the tuple is determined by all the information computed so far, which both Alice and Bob know). Note that because $\tilde{A}$ is multilinear, we can always write
$$\tilde{A}(a_1, \dots, a_t, a_{t+1}) = a_{t+1} \cdot A_1(a_1, \dots, a_t) + (1 - a_{t+1}) \cdot A_0(a_1, \dots, a_t)$$
for some multilinear $A_0$ and $A_1$. Now, what are these $A_0$ and $A_1$? When we plug in $a_{t+1} = 0$, we get $\tilde{A} = A_0$, and the remaining function has a truth table equal to x. Similarly, when we plug in $a_{t+1} = 1$, the remaining function has a truth table equal to y, which is just $A_1$. Therefore Alice can actually compute $A_0(a_1, \dots, a_t)$ by herself, and Bob can compute $A_1(a_1, \dots, a_t)$. So to evaluate $\tilde{A}(a_1, \dots, a_t, a_{t+1})$, the two only have to exchange O(log(q)) bits (the values of $A_0$ and $A_1$ on these tuples). Thus they can simulate each query to $\tilde{A}$ using $O(\log(q)) \le \mathrm{poly}(t) \le \mathrm{polylog}(n)$ bits of communication. It follows that they can jointly compute DISJ with only $n_1 \cdot \mathrm{polylog}(n) = o(\sqrt{n})$ communication, contradicting the known $\sqrt{n}$ lower bound for Merlin-Arthur protocols computing DISJ.
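The decomposition of $\tilde{A}$ used in this simulation can be checked on small truth tables with the following Python sketch. It assumes a prime field $\mathbb{F}_q$ and a particular indexing convention of ours: coordinate j of a point corresponds to bit j of the table index, so the last coordinate selects between the first half of the table (Alice's string x) and the second half (Bob's string y). The function names are hypothetical.

```python
def mle_eval(table, point, q):
    """Evaluate the multilinear extension of a truth table at a point in F_q^t.

    table has length 2^t. Coordinate j of the point corresponds to bit j of
    the table index, so the last coordinate picks the lower or upper half.
    """
    vals = [v % q for v in table]
    for a in reversed(point):
        half = len(vals) // 2
        # Interpolate linearly between the two halves in the current variable.
        vals = [((1 - a) * vals[i] + a * vals[half + i]) % q for i in range(half)]
    return vals[0]


def joint_query(x, y, point, q):
    """Simulate one oracle query with Alice holding x and Bob holding y."""
    *prefix, last = point
    alice_value = mle_eval(x, prefix, q)  # A_0(a_1, ..., a_t), known to Alice
    bob_value = mle_eval(y, prefix, q)    # A_1(a_1, ..., a_t), known to Bob
    # Only these two field elements need to be exchanged.
    return (last * bob_value + (1 - last) * alice_value) % q
```

Here `joint_query(x, y, point, q)` agrees with `mle_eval(x + y, point, q)` on every point, which is the identity Alice and Bob exploit to answer each query with O(log q) bits of communication.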

Quantified Boolean Formulas
We consider Quantified Boolean Formulas (QBFs) in prenex normal form
$$\phi = (Q_1 x_1) \cdots (Q_n x_n)\, F(x_1, \dots, x_n),$$
where F is an arbitrary propositional formula of size m, preceded by quantifiers of the form $Q_i \in \{\exists, \forall\}$.
Williams [58] gave a 3-round interactive protocol (i.e., an AMA protocol) for QBFs that ran in $O^*(2^{2n/3})$ time. It was asked as an open question [58] whether there is a 2-round Merlin-Arthur protocol for QBFs with $O^*(2^{(1-\varepsilon)n})$ running time for some constant ε > 0. Here we resolve this open problem:

Reminder of Theorem 1.10 True Quantified Boolean Formulas (TQBF) with n variables and size $m \le 2^n$ can be certified by a Merlin-Arthur protocol running in $2^{4n/5} \cdot \mathrm{poly}(n, m)$ time.
Our new protocol follows the basic outline of Williams' earlier AMA protocol [58,Section 4], with several key differences we highlight in the proof.
We will prove the following lemma.
Before proving Lemma 7.1, we show that it implies the claimed QBF protocol.
To complete the argument, it remains to prove Lemma 7.1.
Proof of Lemma 7.1 First, we apply the same strategy as in [58]. Convert the propositional formula F to an equivalent arithmetic formula P of poly(m) degree and size, by replacing A ∧ B with A · B and replacing A ∨ B with A + B − A · B. Note that P outputs 0 or 1 on every Boolean input. Then, we convert the subformula $\phi'(x_1, \dots, x_{n-\ell}) = (Q_{n-\ell+1} x_{n-\ell+1}) \cdots (Q_n x_n)\, P(x_1, \dots, x_n)$ into an arithmetic formula P', by replacing each $(\exists x_i)$ with a sum over $x_i \in \{0, 1\}$, and each $(\forall x_i)$ with a product over $x_i \in \{0, 1\}$. Note that for every $(a_1, \dots, a_{n-\ell}) \in \{0, 1\}^{n-\ell}$, $P'(a_1, \dots, a_{n-\ell})$ evaluates to a positive integer if the subformula $\phi'(a_1, \dots, a_{n-\ell})$ is true, and evaluates to zero if $\phi'(a_1, \dots, a_{n-\ell})$ is false. Note that P' has a depth-$\ell$ binary tree structure with each leaf being a copy of P. The size of P' is at most $2^{\ell} \cdot \mathrm{poly}(m)$, and the degree of P' is at most $2^k \cdot \mathrm{poly}(m)$, since there are at most k layers of multiplication gates in this binary tree. For every Boolean input $(a_1, \dots, a_{n-\ell}) \in \{0, 1\}^{n-\ell}$, observe that the output of $P'(a_1, \dots, a_{n-\ell})$ is a non-negative integer no larger than $(2^n \cdot m)^{O(2^k)}$. Now we separately consider the two scenarios.
Case 1: To prove φ is true. In this case, Merlin sends Arthur a prime p from the interval [2, 2 2n 2 · m], such that, for every a 1 , . . . , a n− ∈ {0, 1} with P (a 1 , . . . a n− ) being a positive integer (over Z), P (a 1 , . . . a n− ) mod p is also non-zero (over F p ). The existence of such prime p was already proved in [58, Section 4] using a standard argument by considering the number of prime factors of P (a 1 , . . . a n− ) and applying a union bound.
The only concern is that the prime p sent by Merlin might not satisfy the required condition: there could exist some $(a_1, \dots, a_{n-\ell}) \in \{0, 1\}^{n-\ell}$ where $P'(a_1, \dots, a_{n-\ell})$ is non-zero over $\mathbb{Z}$ but is zero over $\mathbb{F}_p$, so that Arthur will be evaluating $\phi = (Q_1 x_1) \cdots (Q_{n-\ell} x_{n-\ell})\, \phi'(x_1, \dots, x_{n-\ell})$ based on incorrect values of $\phi'(a_1, \dots, a_{n-\ell})$. However, Merlin is not able to cheat by doing this, since the value of φ is monotone increasing in the values of $\phi'(a_1, \dots, a_{n-\ell})$, and modifying some of these values from true to false will never change the value of φ from false to true.
We remark that the only difference of this protocol from the previous AMA protocol [58] is that we let Merlin send the prime p, whereas [58] let Arthur send a random p (which satisfies the required condition with high probability), costing an extra round of interaction.
Case 2: To prove φ is false. Note that the previous protocol for the "φ is true" case no longer applies here, since Merlin would be able to cheat by picking a prime p that makes many of the positive integers P (a 1 , . . . , a n− ) vanish in F p .
Recall that these positive integers P (a 1 , . . . , a n− ) are upper bounded by (2 n · m) O(2 k ) . Instead of picking a single prime p for the protocol, Merlin picks s distinct primes p 1 < p 2 < · · · < p s so that their product p 1 p 2 · · · p s is larger than this upper bound. In this way, by Chinese Remainder Theorem we can ensure that, every positive integer P (a 1 , . . . , a n− ) is non-zero mod p j for at least one 1 ≤ j ≤ s. Then, we can simply run the previous protocol for every p j (1 ≤ j ≤ s), in total time s · (2 n− +k + 2 ) · poly(n, m, log p s ).
By choosing the smallest s primes $p_1 < \cdots < p_s$ such that p 1 p 2 . . . p s > (2 n · m) (2 k ) , we can ensure that the above protocol works with $s \le 2^k \cdot$ poly(n log m) and $p_s \le O(s \log s)$, the latter by the prime number theorem. Hence, the total time complexity is (2 n− +2k + 2 k+ ) · poly(n, m).
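To illustrate the choice of primes in Case 2, here is a small sketch under simplifying assumptions: the function names are hypothetical, a small integer `bound` stands in for the upper bound on the positive integers, and primes are found by naive trial division rather than anything efficient. It collects the smallest primes whose product exceeds the bound and checks that every positive integer up to the bound is non-zero modulo at least one of them.

```python
def smallest_primes_with_product_above(bound):
    """Return the smallest primes p_1 < ... < p_s with p_1 * ... * p_s > bound."""
    primes = []
    prod = 1
    candidate = 2
    while prod <= bound:
        # primes holds every prime below candidate, so trial division is exact
        if all(candidate % p for p in primes):
            primes.append(candidate)
            prod *= candidate
        candidate += 1
    return primes

def nonzero_witness(value, primes):
    """Return the first prime p with value % p != 0, or None if all divide value."""
    for p in primes:
        if value % p != 0:
            return p
    return None

# Toy bound (an assumption for illustration only).
bound = 1000
primes = smallest_primes_with_product_above(bound)
# Every positive integer up to the bound survives modulo some chosen prime.
assert all(nonzero_witness(v, primes) is not None for v in range(1, bound + 1))
```

With `bound = 1000` the primes are 2, 3, 5, 7, 11, whose product is 2310. The integer 2310 itself is divisible by all of them, which is why the product must strictly exceed the bound.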

Open Questions
There remain many interesting open problems concerning the nondeterministic and Merlin-Arthur complexity of problems in fine-grained complexity. A few questions which are particularly relevant to our work are highlighted below.
could be very powerful (see for example [26,39]). Perhaps an interesting lower bound for the Disjointness problem follows from Nonuniform NSETH?
• Given a universe U = {1, . . . , n} of n elements, a family F of subsets of U, and a target integer t, the #Set Cover problem is the task of computing how many choices of t sets from F have the property that their union equals U. Similarly, the #Exact Cover problem is the task of counting how many choices of t pairwise disjoint sets from F have their union equal to U. Both of these problems can be solved deterministically in $2^n \cdot \mathrm{poly}(n)$ time. However, in the Merlin-Arthur setting, although there is a $2^{n/2} + |F|\,\mathrm{poly}(n)$-time protocol for solving #Exact Cover, the fastest known protocol for solving #Set Cover takes $2^{n/2}\,|F| \cdot \mathrm{poly}(n)$ time [11]. Is there a faster Merlin-Arthur protocol for #Set Cover, or is #Set Cover truly harder than #Exact Cover in the Merlin-Arthur setting for families consisting of 2 (n) sets?
• To what extent can our Merlin-Arthur protocols be derandomized to obtain better nondeterministic algorithms for fine-grained problems? For example, derandomizing our protocol for 3-SUM without a loss in the running time would imply a nondeterministic derandomization of Freivalds' verification algorithm for Boolean Matrix Multiplication [41, Theorem 1.1] and answer an open question raised by [41]. Finding faster nondeterministic verifiers in this way may also lead to new barriers in deterministic fine-grained reductions between problems [16].
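To make the two counting problems above concrete, the following brute-force sketch enumerates all choices of t sets. It is only an illustration of the definitions, not one of the protocols discussed; the function names and the toy family are assumptions introduced here, and the running time is exponential in t.

```python
from itertools import combinations

def count_set_cover(n, family, t):
    """Count choices of t sets from family whose union is {1, ..., n}."""
    universe = frozenset(range(1, n + 1))
    return sum(
        1
        for choice in combinations(family, t)
        if frozenset().union(*choice) == universe
    )

def count_exact_cover(n, family, t):
    """Count choices of t pairwise disjoint sets whose union is {1, ..., n}."""
    universe = frozenset(range(1, n + 1))
    count = 0
    for choice in combinations(family, t):
        covers = frozenset().union(*choice) == universe
        # Sets covering the universe are pairwise disjoint exactly when
        # their sizes add up to n.
        if covers and sum(len(s) for s in choice) == n:
            count += 1
    return count

# Toy instance (hypothetical data).
family = [frozenset(s) for s in ({1, 2}, {3, 4}, {2, 3, 4}, {1}, {1, 2, 3})]
print(count_set_cover(4, family, 2), count_exact_cover(4, family, 2))
```

On this toy instance every exact cover is also a set cover, so the second count can never exceed the first.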
For all of the problems discussed above, evidence against the existence of a better algorithm or protocol (via conditional hardness results) would also be interesting.