Making $\textsf{IP}=\textsf{PSPACE}$ Practical: Efficient Interactive Protocols for BDD Algorithms

We show that interactive protocols between a prover and a verifier, a well-known tool of complexity theory, can be used in practice to certify the correctness of automated reasoning tools. Theoretically, interactive protocols exist for all $\textsf{PSPACE}$ problems. The verifier of a protocol checks the prover's answer to a problem instance in probabilistic polynomial time, with polynomially many bits of communication, and with exponentially small probability of error. (The prover may need exponential time.) Existing interactive protocols are not used in practice because their provers use naive algorithms, inefficient even for small instances, that are incompatible with practical implementations of automated reasoning. We bridge the gap between theory and practice by means of an interactive protocol whose prover uses BDDs. We consider the problem of counting the number of assignments to a QBF instance ($\#\textrm{CP}$), which has a natural BDD-based algorithm. We give an interactive protocol for $\#\textrm{CP}$ whose prover is implemented on top of an extended BDD library. The prover has only a linear overhead in computation time over the natural algorithm. We have implemented our protocol in $\textsf{blic}$, a certifying tool for $\#\textrm{CP}$. Experiments on standard QBF benchmarks show that $\textsf{blic}$ is competitive with state-of-the-art QBF-solvers. The run time of the verifier is negligible. While loss of absolute certainty can be concerning, the error probability in our experiments is at most $10^{-10}$ and reduces to $10^{-10k}$ by repeating the verification $k$ times.


Introduction
Automated reasoning tools often underlie our assertions about the correctness of critical hardware and software components. In recent years, the scope and scalability of these techniques have grown significantly.
Automated reasoning tools are not immune to bugs. If we are to trust their verdict, it is important that they provide evidence of their correct behaviour. A substantial amount of research has gone into proof-producing automated reasoning tools [15,22,21,13,4]. These works define a notion of "correctness certificate" suitable for the reasoning problem at hand, and adapt the reasoning engine to produce independently checkable certificates. For example, SAT solvers produce either a satisfying assignment or a proof of unsatisfiability in some proof system, e.g. resolution (see [15] for a survey). Extending such certificates beyond boolean SAT is an active area of current research [4,17,23,28,3].
In the worst case, the size of certificates grows exponentially in the size of the input, even for boolean unsatisfiability (unless NP = coNP). If users have limited computational or communication resources, transferring and checking large certificates becomes a burden. Large certificates are not just a theoretical curiosity. In practice, resolution proofs for complex SAT problems may run to petabytes [14]. Ideally, we would prefer "small" certificates (polynomial in the size of the input) which can be checked independently in polynomial time.
The IP = PSPACE theorem proves that certification with polynomial verification time is possible for any problem in PSPACE, provided one trades off absolute certainty for certainty with high probability [26]. The complexity class IP consists of those languages for which there is a polynomial-round, complete and sound interactive protocol [12,2,19,1]: a sequence of interactions between a (computationally unbounded) prover and a (computationally bounded) verifier, after which the verifier decides whether the prover correctly performed a computation. The protocol is complete if, whenever an input belongs to the language, there is an honest prover who can convince a polynomial-time randomised verifier in a polynomial number of rounds. The protocol is sound if, whenever an input does not belong to the language, the Verifier rejects the input with high probability, no matter what certificates are provided to the Verifier. That is, a "Prover" cannot fool the certification process.
Since every language in PSPACE has an interactive protocol, there are interactive protocols for UNSAT, QBF, counting QBF, safety verification of concurrent state machines, etc. Observe that the prover of a protocol may perform exponential time computations (which is unavoidable unless P = PSPACE), but the verifier only requires polynomial time in the original input.
If interactive protocols provide a foundation for small and efficiently verifiable certificates (at least for problems in PSPACE), why are they not in widespread practice? We believe the reason to be the following: for asymptotic complexity purposes, it suffices to use honest provers with best-case exponential complexity that naively enumerate all possibilities. Such provers are incompatible with automated reasoning tools, which use more sophisticated data structures and heuristics to scale to real-world examples. So we need to make practical algorithms for automated reasoning efficiently certifying. We call an algorithm efficiently certifying if, in addition to computing the output, it can execute the steps of an honest prover in an interactive protocol with only polynomial overhead over its running time.
In this paper, we show that algorithms using reduced ordered binary decision diagrams (henceforth called BDDs) [9] can be made efficiently certifying. We consider #CP, the problem of computing the number of satisfying assignments of a circuit with partial evaluation (CP). Besides boolean nodes, a CP contains partial evaluation nodes π_[x:=false] (resp., π_[x:=true]) that take a boolean predicate as input, say φ, and output the result of setting x to false (resp., true) in φ. #CP generalises SAT, QBF, and counting SAT (#SAT), and has a natural algorithm using BDDs: compute BDDs for each node of the circuit in topological order, and count the accepting paths of the final BDD.
The theoretical part of the paper proceeds in two steps. First, we present CPCertify, a complete and sound interactive protocol for #CP. CPCertify is similar to the SumCheck protocol [19]. It involves encoding boolean formulas as polynomials over a finite field. The prover is responsible for producing certain polynomials from the original circuit and evaluating them at points of the field chosen by the verifier. These polynomials are either multilinear (all exponents are at most 1) or quadratic (at most 2).
Second, we show that an honest prover in CPCertify can be implemented on top of a suitably extended BDD library. The run times of the certifying BDD algorithms incur only a constant-factor overhead over the computation time without certification: they depend linearly on the total number of nodes of the intermediate BDDs computed by the prover to solve the #CP instance. We use two key insights. The first is an encoding of multilinear polynomials as BDDs; we show that the intermediate BDDs represent all the multilinear polynomials a prover needs during a run of CPCertify. The second is that the quadratic polynomials correspond to intermediate steps during the computation of the intermediate BDDs. We extend BDDs with additional "book-keeping" nodes that allow the prover to also compute the quadratic polynomials while solving the problem. So computing the polynomials required by CPCertify has zero additional cost; the only overhead is the cost of evaluating the polynomials at the field points chosen by the verifier.
We have implemented a certifying #CP solver based on our extended BDD library. Our experiments show that the solver is competitive with state-of-the-art non-certifying QBF solvers, and can outperform certifying QBF solvers based on BDDs. The number of bytes exchanged between the prover and the verifier is an order of magnitude smaller, and Verifier's run time several orders of magnitude smaller, than for current encodings of QBF proofs, while bounding the error probability to below 10^{-10}. Thus, our results open the way for practically efficient, probabilistic certification of automated reasoning problems using interactive protocols.

Additional Related Work. Proof systems for SAT and QBF remain an active area of research, both in theoretical proof complexity and in practical tool development. Jussila, Sinz, and Biere [16,27] showed how to extract extended resolution proofs from BDD operations. This is the basis for proof-producing SAT and QBF solvers based on BDDs [8,7,6]. As in our work, the proof uses intermediate nodes produced in the construction of the BDD operations. We focus on interactive certification instead of extended resolution proofs, which can be exponentially larger than the input formula.
Recently, Luo et al. [20] considered the problem of providing zero-knowledge proofs of unsatisfiability, a motivation similar but not identical to ours. Their techniques require the verifier to work in time polynomial in the proof, which can be exponentially bigger than the input formula. In contrast, the Verifier of CPCertify runs in time polynomial in the input. Since any language in PSPACE has a zero-knowledge proof [5], our protocol can in principle be made zero-knowledge. Whether such a system scales in practice is left for future work.

Preliminaries
The Class IP. An interactive protocol between a Prover and a Verifier consists of a sequence of interactions in which a Verifier asks questions to a Prover, receives responses to the questions, and must ultimately decide if a common input x belongs to a language. The computational power of the Prover is unbounded but the Verifier is a randomised, polynomial-time algorithm.
Formally, let P, V denote (deterministic) Turing machines. A k-round interaction between P and V on input x and random bit sequence r is a sequence of messages m_1, ..., m_k, where m_{i+1} = V(r, x, m_1, ..., m_i) for even i and m_{i+1} = P(m_1, ..., m_i) for odd i; the output out(P, V)(x, r, k) is the last message m_k. We think of r as an additional sequence of bits given to Verifier V that is chosen randomly. A language L belongs to IP if there exist some V, P_H and polynomials p_1, p_2, p_3, s.t. V(r, x, m_2, ..., m_i) runs in time p_1(|x|) for all r, x, m_2, ..., m_i, and, for each x and an r ∈ {0, 1}^{p_2(|x|)} chosen uniformly at random:
1. (Completeness) x ∈ L implies out(P_H, V)(x, r, p_3(|x|)) = 1 with probability 1, and
2. (Soundness) x ∉ L implies that for all P we have out(P, V)(x, r, p_3(|x|)) = 1 with probability at most 2^{-|x|}.
Intuitively, in an interactive protocol, a computationally unbounded Prover interacts with a randomised polynomial-time Verifier for k rounds. In each round, Verifier sends probabilistic "challenges" to Prover, based on the input and the answers to prior challenges, and receives answers from Prover. At the end of k rounds, Verifier decides to accept or reject the input. The completeness property ensures that if the input belongs to the language L, then there is an "honest" Prover P H who can always convince Verifier that indeed x ∈ L. If the input does not belong to the language, then the soundness property ensures that Verifier rejects the input with high probability no matter how a (dishonest) Prover tries to convince them.
It is known that IP = PSPACE [19,26], that is, every language in PSPACE has a polynomial-round interactive protocol. The proof exhibits an interactive protocol for the language QBF of true quantified boolean formulae; in particular, the honest Prover is a polynomial-space, exponential-time algorithm that uses a truth table representation of the formula to implement the protocol.

Polynomials. Interactive protocols make extensive use of polynomials over some prime finite field F.
Let X be a finite set of variables. We use x, y, z, ... for variables and p, q, ... for polynomials. When we write a polynomial explicitly, we write it in brackets, e.g. [3xy − z²]. We write 1 and 0 for the polynomials [1] and [0], respectively. We use the following operations on polynomials:
• Sum, difference, and product. Denoted p + q, p − q, p · q, and defined as usual.
• Evaluation. Given a partial assignment σ, we write Π_σ p for the result of substituting σ(x) for x in p, for every variable x on which σ is defined.
A (partial) assignment is a (partial) mapping σ : X → F; we write σ = [x_1 := a_1, ..., x_k := a_k] if x_1, ..., x_k are the variables for which σ is defined. Additionally, we call σ binary if σ(x) ∈ {0, 1} for each x ∈ X.
Binary and multilinear polynomials. A polynomial p is multilinear in x if the degree of x in p is 0 or 1. A polynomial is multilinear if it is multilinear in all its variables. For example, [xy − y²] is multilinear in x but not in y, and [3xy − 2zy] is multilinear. A polynomial p is binary if Π_σ p ∈ {0, 1} for every binary assignment σ. Two polynomials p, q are binary equivalent, denoted p ≡_b q, if Π_σ p = Π_σ q for every binary assignment σ. (Note that non-binary polynomials can be binary equivalent.)

Circuits with Partial Evaluation
We introduce circuits with partial evaluation (CP), a compact representation of quantified boolean formulae, and formulate #CP, the problem of counting the number of satisfying assignments of a CP. #CP generalises QBF, the satisfiability problem for quantified boolean formulas. Figure 1 shows an example of a CP. Informally, it is a directed acyclic graph whose nodes are labelled with variables, boolean operators, or partial evaluation operators π_[x:=b]. Intuitively, π_[x:=b] φ sets the variable x to the truth value b in the formula φ. In this way, each node of a circuit stands for a boolean function, and the complete circuit stands for the boolean function of the root. Figure 1 shows the formulae represented by each node.

Definition 1. Let X denote a finite set of variables and S ⊆ X. A circuit with partial evaluation and variables in S (S-CP) has one of the forms true, false, or x for x ∈ S; ¬φ, φ ∧ ψ, or φ ∨ ψ for S-CPs φ, ψ; or π_[x:=b] φ for an S-CP φ, x ∈ S, and b ∈ {false, true}.

We represent a CP φ as a directed acyclic graph. The nodes of the graph are the descendants of φ. A CP φ encodes a boolean predicate P_φ, which maps assignments σ : free(φ) → {false, true} to a truth value P_φ(σ) ∈ {false, true}. It does so in the obvious manner, e.g., P_x(σ) := σ(x), P_{φ∧ψ}(σ) := P_φ(σ) ∧ P_ψ(σ), etc. Using π_[x:=b] as partial evaluation operator, Figure 1 shows a CP for the quantified boolean formula ∀y (¬x ∨ (x ∧ y)).
We consider the following problem:

#CP
Input: A CP φ.
Output: The number of satisfying assignments of φ.
Given a quantified boolean formula, we can use the macros for quantifiers to construct in linear time an equivalent CP, i.e., a CP with the same satisfying assignments. Similarly, #SAT instances can also be reduced to #CP.

Structure of the rest of the paper. In Section 4, we give an interactive protocol for #CP called CPCertify. In Section 5, we implement an honest Prover for CPCertify on top of an extended BDD-based algorithm for #CP. The Prover runs in time polynomial in the size of the largest BDD for any of the subcircuits of the initial circuit. Together, these results yield our main result, Theorem 1, showing that any BDD-based algorithm can be modified to run an interactive protocol with small polynomial overhead. Finally, Section 6 presents empirical results.
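One natural reading of the quantifier macros, which we use in the sketch below, is ∀x φ ≡ π_[x:=false] φ ∧ π_[x:=true] φ and ∃x φ ≡ π_[x:=false] φ ∨ π_[x:=true] φ. The following toy Python sketch (the tuple encoding and function names are ours, not blic's) checks this reading against the formula of Figure 1:

```python
# Hypothetical mini-encoding of CPs as nested tuples, for illustration only.

def forall(x, phi):               # ∀x φ  ≡  π[x:=false]φ ∧ π[x:=true]φ
    return ("and", ("pi", x, False, phi), ("pi", x, True, phi))

def exists(x, phi):               # ∃x φ  ≡  π[x:=false]φ ∨ π[x:=true]φ
    return ("or", ("pi", x, False, phi), ("pi", x, True, phi))

def eval_cp(phi, env):
    """Evaluate the boolean predicate P_phi under assignment env."""
    op = phi[0]
    if op == "var":   return env[phi[1]]
    if op == "const": return phi[1]
    if op == "not":   return not eval_cp(phi[1], env)
    if op == "and":   return eval_cp(phi[1], env) and eval_cp(phi[2], env)
    if op == "or":    return eval_cp(phi[1], env) or eval_cp(phi[2], env)
    if op == "pi":    # partial evaluation: set x := b inside phi
        _, x, b, sub = phi
        return eval_cp(sub, {**env, x: b})
    raise ValueError(op)

# ∀y (¬x ∨ (x ∧ y)) from Figure 1: true exactly when x is false
phi = forall("y", ("or", ("not", ("var", "x")),
                         ("and", ("var", "x"), ("var", "y"))))
```

Since both π nodes introduced by a macro point to the same shared subcircuit, each quantifier adds only a constant number of nodes to the DAG, which is what makes the translation linear.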

An Interactive Protocol for #CP
In this section we describe an interactive protocol for #CP, following the SumCheck protocol of [19]. Section 4.1 introduces arithmetisation, a technique to transform #CP into an equivalent problem about polynomials. Section 4.2 shows how to transform #CP into an equivalent problem about evaluating polynomials of low degree. Finally, Section 4.3 presents an interactive protocol for this problem.

Arithmetisation
We define a mapping [[·]] that assigns to each CP φ a polynomial [[φ]] over the variables free(φ), called the arithmetisation of φ:

[[true]] := 1, [[false]] := 0, [[x]] := [x],
[[¬φ]] := 1 − [[φ]],
[[φ ∧ ψ]] := [[φ]] · [[ψ]],
[[φ ∨ ψ]] := 1 − (1 − [[φ]]) · (1 − [[ψ]]),
[[π_[x:=b] φ]] := Π_[x:=b] [[φ]].

The mapping [[·]] is defined to satisfy the following property, whose proof is immediate: Π_σ̄ [[φ]] = P_φ(σ) for every truth assignment σ to S, where σ̄ denotes the binary assignment that maps x to 1 iff σ(x) = true.

So, intuitively, the polynomial [[φ]] is a conservative extension of the predicate P_φ: it returns the same values for all binary assignments. Accordingly, in the rest of the paper we abuse language and write σ instead of σ̄ for the binary assignment corresponding to the truth assignment σ.
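As a sanity check, the arithmetisation can be executed directly by evaluating [[φ]] bottom-up at a given field assignment, using the rules 1 − p for ¬ and p · q for ∧. A minimal Python sketch (the tuple encoding and the modulus are our illustrative choices):

```python
P = 2**61 - 1   # a prime field; the modulus is our choice for illustration

def arith(phi, env):
    """Evaluate the arithmetisation [[phi]] at a field assignment env."""
    op = phi[0]
    if op == "var":   return env[phi[1]] % P
    if op == "const": return 1 if phi[1] else 0
    if op == "not":   return (1 - arith(phi[1], env)) % P
    if op == "and":   return arith(phi[1], env) * arith(phi[2], env) % P
    if op == "or":    # 1 − (1−p)(1−q), i.e. ∨ via the ¬ and ∧ rules
        p, q = arith(phi[1], env), arith(phi[2], env)
        return (1 - (1 - p) * (1 - q)) % P
    if op == "pi":    # partial evaluation substitutes a constant for x
        _, x, b, sub = phi
        return arith(sub, {**env, x: 1 if b else 0})
    raise ValueError(op)

phi = ("or", ("not", ("var", "x")), ("and", ("var", "x"), ("var", "y")))
# On binary inputs, [[phi]] agrees with the boolean predicate P_phi:
assert [arith(phi, {"x": a, "y": b}) for a in (0, 1) for b in (0, 1)] == [1, 1, 0, 1]
```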

Degree Reduction
Given a CP φ, its associated polynomial can have degree exponential in the height of φ. Since we are ultimately interested in evaluating polynomials over binary assignments, and since x 2 = x for x ∈ {0, 1}, we can convert polynomials to low degree without changing their behaviour on binary assignments.
For this, we use a degree-reduction operator δ_x for every variable x. The operator δ_x p reduces the exponent of all powers of x in p to 1. For example, δ_x [x²y + 3xy² − 2x³y² + 4] = [xy + 3xy² − 2xy² + 4]. Observe that δ_x p ≡_b p. Instead of working on the input CP directly, we first convert it into a circuit with partial evaluation and degree reduction by inserting degree-reduction operators after binary operations. This ensures that all intermediate polynomials obtained by arithmetisation have low degree.
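Representing a polynomial as a map from exponent vectors to coefficients, δ_x is a single pass over the monomials. A small sketch (the dict encoding is ours) reproducing the example above:

```python
def delta(p, i):
    """Degree reduction δ_{x_i}: cap the exponent of variable i at 1."""
    q = {}
    for exps, c in p.items():
        e = list(exps)
        e[i] = min(e[i], 1)        # x^k becomes x for every k ≥ 1
        key = tuple(e)
        q[key] = q.get(key, 0) + c # merged monomials add their coefficients
    return {k: c for k, c in q.items() if c != 0}

# δ_x [x²y + 3xy² − 2x³y² + 4] from the text; exponent tuples are (x, y)
p = {(2, 1): 1, (1, 2): 3, (3, 2): -2, (0, 0): 4}
assert delta(p, 0) == {(1, 1): 1, (1, 2): 1, (0, 0): 4}   # xy + xy² + 4
```

Note that the merged result xy + xy² + 4 is the simplified form of [xy + 3xy² − 2xy² + 4].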

Definition 2.
A circuit with partial evaluation and degree reduction over the set S of variables (S-CPD) is defined in the same manner as an S-CP, extended as follows: • if φ is an S-CPD and x ∈ S, then δ x φ is an S-CPD, For an S-CPD φ we define free(φ), |φ|, children, descendants, and the graphical representation as for S-CPs. [xy] [xy] [xy] We convert a CP φ into a CPD conv(φ) by adding a degree-reduction operator for each free variable before any binary operation.
We collect some basic properties of CPDs:

Lemma 1. For every CP φ: (a) [[conv(φ)]] is a binary multilinear polynomial, and (b) [[conv(φ)]] ≡_b [[φ]].
CPDs have another useful property. Recall that given a CP φ we are interested in its number of satisfying assignments. The next lemma shows that this number can be computed by evaluating the polynomial [[conv(φ)]] on a single input.

Lemma 2. Let φ be a CP with n free variables, and let σ_{1/2} be the assignment that maps every free variable to 1/2 ∈ F. Then φ has exactly 2^n · Π_{σ_{1/2}} [[conv(φ)]] satisfying assignments.
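The arithmetic fact behind the single evaluation is that a multilinear polynomial evaluated with every variable set to 1/2 equals the average of its values over all 2^n binary assignments. A sketch over a prime field (the modulus is our choice), using the multilinear polynomial [1 − x + xy] of our running example ¬x ∨ (x ∧ y):

```python
P = 2**61 - 1                 # illustrative prime field
inv2 = pow(2, P - 2, P)       # 1/2 in F_P (Fermat inverse)

def q(x, y):                  # multilinear [[conv(φ)]] for φ = ¬x ∨ (x ∧ y)
    return (1 - x + x * y) % P

# q agrees with φ on binary points; satisfying assignments: (0,0), (0,1), (1,1)
assert [q(x, y) for x in (0, 1) for y in (0, 1)] == [1, 1, 0, 1]

# number of satisfying assignments = 2^n · q(1/2, ..., 1/2)
count = (2**2 * q(inv2, inv2)) % P
assert count == 3
```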

CPCertify: An Interactive Protocol for #CP
We describe an interactive protocol, called CPCertify, for a CP φ with n free variables. Let X denote the variables used in φ. Prover and Verifier fix a finite field with at least m + 1 elements, where m is an upper bound on the number of assignments (e.g. m = 2 n ). Prover tries to convince the Verifier that Π σ [[conv(φ)]] = K for some K ∈ F.
In the protocol, Verifier challenges Prover to evaluate polynomials of the form Π_σ [[ψ]], where ψ is a node of the CPD conv(φ) and σ : free(ψ) → F is a (non-binary!) assignment; we call the expression Π_σ [[ψ]] a challenge. Observe that all assignments are chosen by Verifier. Prover answers with some k ∈ F; we call the expression Π_σ [[ψ]] = k a claim. CPCertify consists of an initialisation and a number of rounds, one for each descendant of conv(φ). Rounds are executed in topological order, starting at the root, i.e. at conv(φ) itself. The structure of a round for a node ψ of conv(φ) depends on whether ψ is an internal node (including the root) or a leaf.
At each point, Verifier keeps track of a set C of claims that must be checked.

Initialisation. Verifier sends Prover the challenge Π_{σ_{1/2}} [[conv(φ)]], where σ_{1/2} assigns 1/2 to every free variable; Prover responds with a claim Π_{σ_{1/2}} [[conv(φ)]] = K. (By Lemma 2, this amounts to claiming that φ has K · 2^n satisfying assignments.) Verifier adds this claim to C.

Round for an internal node. A round for an internal node ψ runs as follows:
(a) Verifier removes from C all claims about ψ.
(b) Verifier merges these claims into a single claim about ψ. (See "Description of step (b)" below.)
(c) Verifier computes from this claim new claims about the children of ψ and adds them to C, in such a way that if Prover cheated, i.e. the unique claim from (b) does not hold, then very likely one of the resulting claims will be wrong. Depending on the type of ψ, the claims are computed based on the answers of Prover to challenges sent by Verifier. (See "Description of step (c)" below.)
Observe that, since a node ψ can be a child of several nodes, Verifier may collect multiple claims for ψ, one for each parent node.

Round for a leaf. If ψ is a leaf, then ψ = x for a variable x, or ψ ∈ {true, false}.

Verifier removes all claims {Π_{σ_1} [[ψ]] = k_1, ..., Π_{σ_m} [[ψ]] = k_m} about ψ from C and checks them directly: since [[x]] = [x], [[true]] = 1, and [[false]] = 0, Verifier can evaluate each Π_{σ_i} [[ψ]] itself. If any claim fails, Verifier rejects.
Observe that if all claims made by Prover about leaves are true, then very likely Prover's initial claim is also true.

Description of step (b). Let {Π_{σ_1} [[ψ]] = k_1, ..., Π_{σ_m} [[ψ]] = k_m} be the claims in C relating to node ψ, where the assignments σ_1, ..., σ_m differ only in the value of a single variable x. Let σ̃_i denote the partial assignment which is undefined on x and otherwise matches σ_i. Verifier sends the challenges Π_{σ̃_i} [[ψ]], to which Prover answers with polynomials q_i in the single variable x of degree at most 2. Verifier checks that Π_{[x:=σ_i(x)]} q_i = k_i holds for every i (rejecting otherwise), picks a random r ∈ F, and replaces the claims in C by the single claim Π_{σ̃_1[x:=r]} [[ψ]] = Π_{[x:=r]} q_1.

Example 1. Consider the case in which X = {x} and Prover has made two claims, Π_{[x:=a_1]} [[ψ]] = k_1 and Π_{[x:=a_2]} [[ψ]] = k_2. Verifier asks Prover for the polynomial [[ψ]] itself. If Prover answers honestly with q = [[ψ]], then both checks Π_{[x:=a_1]} q = k_1 and Π_{[x:=a_2]} q = k_2 pass, and the merged claim Π_{[x:=r]} [[ψ]] = Π_{[x:=r]} q holds for every choice of r. If Prover answers with some q' ≠ [[ψ]] of degree at most 2, the merged claim fails for all but at most two choices of r.

This concludes the description of the interactive protocol. We now show CPCertify is complete and sound.
Proposition 2 (CPCertify is complete and sound). Let φ be a CP with n free variables. Let Π σ [[conv(φ)]] = K be the claim initially sent by Prover to Verifier. If the claim is true, then Prover has a strategy to make Verifier accept. If not, for every Prover, Verifier accepts with probability at most 4n|φ|/|F|.
If the original claim is correct, Prover can answer every challenge truthfully, and all claims pass all of Verifier's checks, so Verifier accepts. If the claim is not correct, we proceed round by round: we bound the probability that Verifier is tricked in a single step by 2/|F| using the Schwartz-Zippel lemma, bound the number of such steps by 2n|φ|, and conclude with a union bound.
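The Schwartz-Zippel argument can be seen concretely in the claim-merging step: Prover supplies a low-degree polynomial, Verifier checks it against the existing claims and collapses them into a single claim at a random point. The sketch below is generic and illustrative (the field size, message format, and helper names are ours, not the protocol's actual messages):

```python
import random

P_FIELD = 10**9 + 7        # small illustrative prime, not the implementation's

def eval_poly(coeffs, r):  # c0 + c1·x + c2·x², degree at most 2
    c0, c1, c2 = coeffs
    return (c0 + c1 * r + c2 * r * r) % P_FIELD

# Low-degree polynomial q(x) known to an honest Prover
q = (5, 3, 2)                              # 5 + 3x + 2x²

# Two claims about the same node, at assignments differing only in x:
claims = [(0, eval_poly(q, 0)), (7, eval_poly(q, 7))]

def merge(poly, claims):
    """Verifier's merge step: check the answer polynomial against the claims,
    then collapse them into one claim at a uniformly random point."""
    for point, value in claims:
        assert eval_poly(poly, point) == value, "Prover caught lying"
    r = random.randrange(P_FIELD)
    return r, eval_poly(poly, r)           # single surviving claim q(r)

r, k = merge(q, claims)                    # honest Prover always passes

# A dishonest q' = q + x(x−7) still matches both claimed points, but agrees
# with q on at most deg(q' − q) = 2 choices of r (Schwartz-Zippel):
q_bad = (5, (-4) % P_FIELD, 3)
```

A dishonest polynomial that matches the claimed points can agree with the true one on at most two further points, so a uniformly random r exposes it with probability at least 1 − 2/|F|.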

A BDD-based Prover
We assume familiarity with reduced ordered binary decision diagrams (BDDs) [9]. We use BDDs over X = {x_1, ..., x_n}. We fix the variable order x_1 < x_2 < ... < x_n, i.e. the root node decides based on the value of x_n.

Definition 4. BDDs are defined inductively as follows:
• ⟨true⟩ and ⟨false⟩ are BDDs of level 0;
• if u ≠ v are BDDs and i > ℓ(u), ℓ(v), then ⟨x_i, u, v⟩ is a BDD of level i;
• we identify ⟨x_i, u, u⟩ and u for a BDD u and i > ℓ(u).
The level of a BDD w is denoted ℓ(w). The set of descendants of w is the smallest set S with w ∈ S and u, v ∈ S for all ⟨x, u, v⟩ ∈ S. The size |w| of w is the number of its descendants.
BDDSolver: A BDD-based Algorithm for #CP. An instance φ of #CP can be solved using BDDs. Starting at the leaves of φ, we iteratively compute, for each node ψ of the circuit, a BDD encoding the boolean predicate P_ψ. At the end of this procedure we obtain a BDD for P_φ. The number of satisfying assignments of φ is the number of accepting paths of the BDD, which can be computed in linear time in the size of the BDD.
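Counting accepting paths in linear time is a standard memoized traversal; levels skipped between a node and a child contribute a factor of 2 per skipped variable. A sketch with a toy node encoding (ours, not blic's):

```python
from functools import lru_cache

# Nodes: ("leaf", True/False) at level 0, or ("node", i, lo, hi) at level i,
# where lo/hi are the 0-/1-children. A toy encoding for illustration.

def level(w):
    return 0 if w[0] == "leaf" else w[1]

def count_sat(w, n):
    """Number of satisfying assignments over x_1..x_n, linear in the BDD size."""
    @lru_cache(maxsize=None)
    def go(w):   # counts assignments to the variables x_1..x_{level(w)}
        if w[0] == "leaf":
            return 1 if w[1] else 0
        _, i, lo, hi = w
        # each level skipped on an edge doubles the count below it
        return (go(lo) << (i - 1 - level(lo))) + (go(hi) << (i - 1 - level(hi)))
    return go(w) << (n - level(w))

# BDD for ¬x₂ ∨ (x₂ ∧ x₁), i.e. the formula of Figure 1 with x = x₂, y = x₁
T, F = ("leaf", True), ("leaf", False)
bdd = ("node", 2, T, ("node", 1, F, T))
assert count_sat(bdd, 2) == 3
```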
For a node ψ = ψ_1 ⊛ ψ_2, given BDDs representing the predicates P_{ψ_1} and P_{ψ_2}, we compute a BDD for the predicate P_ψ := P_{ψ_1} ⊛ P_{ψ_2} using the Apply_⊛ operator on BDDs. We name this algorithm for solving #CP "BDDSolver."

Figure 3: A BDD and its arithmetisation. For ⟨x, u, v⟩, we denote the link from x to v with a solid edge and the link from x to u with a dotted edge. We omit links to ⟨false⟩.
From BDDSolver to CPCertify. Our goal is to modify BDDSolver to play the role of an honest Prover in CPCertify with minimal overhead. In CPCertify, Prover repeatedly performs the same task: evaluate polynomials of the form Π σ [[ψ]], where ψ is a descendant of the CPD conv(φ), and σ assigns values to all free variables of ψ except possibly one. Therefore, the polynomials have at most one free variable and, as we have seen, degree at most 2.
Before defining the concepts precisely, we give a brief overview of this section.
• First (Proposition 3), we show that BDDs correspond to binary multilinear polynomials. In particular, BDDs allow for efficient evaluation of the polynomial. As argued in Lemma 1(a), for every descendant ψ of φ, the CPD conv(ψ) (which is a descendant of conv(φ)) evaluates to a multilinear polynomial. In particular, Prover can use standard BDD algorithms to calculate the corresponding polynomials Π σ [[ψ]] for all descendants ψ of conv(φ) that are neither binary operators nor degree reductions.
• Second (the rest of the section), we prove a surprising connection: the intermediate results obtained while executing the BDD algorithms (with slight adaptations) correspond precisely to the remaining descendants of conv(φ).
The following proposition proves that BDDs represent exactly the binary multilinear polynomials.
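One direction of this correspondence is easy to make concrete: reading a node ⟨x_i, u, v⟩ as the polynomial (1 − x_i) · p_u + x_i · p_v turns a BDD into a multilinear polynomial that can be evaluated at arbitrary field points, not just binary ones. A sketch (toy node encoding as before; the choice of field is illustrative):

```python
P = 2**61 - 1   # illustrative prime field

def mlin_eval(w, sigma):
    """Evaluate the binary multilinear polynomial of a BDD at a field
    assignment sigma = {i: value}. Skipped levels need no correction,
    since <x_i, u, u> is identified with u."""
    if w[0] == "leaf":
        return 1 if w[1] else 0
    _, i, lo, hi = w
    a = sigma[i] % P
    return ((1 - a) * mlin_eval(lo, sigma) + a * mlin_eval(hi, sigma)) % P

T, F = ("leaf", True), ("leaf", False)
bdd = ("node", 2, T, ("node", 1, F, T))   # ¬x₂ ∨ (x₂ ∧ x₁)

# Evaluating at the all-1/2 point recovers the satisfying-assignment
# count of the example, matching the degree-reduction section:
inv2 = pow(2, P - 2, P)
assert (4 * mlin_eval(bdd, {1: inv2, 2: inv2})) % P == 3
```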

Extended BDDs
During the execution of CPCertify for a given CPD conv(φ), Prover sends to Verifier claims of the form Π σ [[ψ]], where ψ is a descendant of conv(φ), and σ : X → F is a partial assignment. While all polynomials computed by CPCertify are binary, not all are multilinear: some polynomials have degree 2. For these polynomials, we introduce extended BDDs (eBDDs) and give eBDD-based algorithms for the following two tasks: 1. Compute an eBDD representing [[ψ]] for every node ψ of conv(φ).

Given an eBDD for [[ψ]] and a partial assignment σ, compute Π σ [[ψ]].
Computing eBDDs for CPDs: Informal introduction. Consider a CP φ and its associated CPD conv(φ). Each node of φ induces a chain of nodes in conv(φ), consisting of degree-reduction nodes δ x 1 , ..., δ xn , followed by the node itself (see Figure 4). Given BDDs u and v for the children of the node in the CP, we can compute a BDD for the node itself using a well-known BDD algorithm Apply ⊛ (u, v) parametric in the boolean operation ⊛ labelling the node [9]. Our goal is to transform Apply ⊛ into an algorithm that computes eBDDs for all nodes in the chain, i.e. eBDDs for all the polynomials p 0 , p 1 , ..., p n of Figure 4. Figure 4: A node of a CP (⊛) gets a chain of degree reduction nodes in the associated CPD.
Roughly speaking, Apply_⊛(u, v) recursively computes BDDs w_0 = Apply_⊛(u_0, v_0) and w_1 = Apply_⊛(u_1, v_1), where u_b and v_b are the b-children of u and v, and then returns the BDD with w_0 and w_1 as 0- and 1-child, respectively. Most importantly, we modify Apply_⊛ to run in breadth-first order. Figure 5 shows a graphical representation of a run of Apply_∨(u, v), where u and v are the two BDD nodes labelled by x. Square nodes represent pending calls to Apply_⊛. Initially there is only one square node, Apply_∨(u, v) (Figure 5, top left). Apply_∨ calls itself recursively for u_0, v_0 and u_1, v_1 (Figure 5, top right). Each of the two calls splits again into two; however, the first three are identical (Figure 5, bottom left), and so reduce to two. These two calls can now be resolved directly; they return nodes true and false, respectively. At this point, the children of Apply_⊛(u, v) become ⟨y, true, true⟩ = true and ⟨y, true, false⟩, which exists already as well (Figure 5, bottom right).
We look at the diagrams of Figure 5 not as a visualisation aid, but as graphs with two kinds of nodes: standard BDD nodes, represented as circles, and product nodes, represented as squares. We call them extended BDDs. Each node of an extended BDD is assigned a polynomial in the expected way: the polynomial of a node ⟨x, u, v⟩ is (1 − x) · p_u + x · p_v, the polynomial of a product node ⟨u ⊛ v⟩ is obtained by applying the arithmetisation of ⊛ to the polynomials of u and v, etc. In this way we assign to each eBDD a polynomial. In particular, we obtain the intermediate polynomials p_0, p_1, p_2, p_3 of the figure, one for each level in the recursion. In the rest of the section we show that these are precisely the polynomials p_0, p_1, ..., p_n of Figure 4.
Thus, in order to compute eBDDs for all nodes of a CPD conv(φ), it suffices to compute BDDs for all nodes of the CP φ. Since we need to do this anyway to solve #CP, the polynomial certification does not incur any overhead.

Extended BDDs. As for BDDs, we define eBDDs over X = {x_1, ..., x_n} with the variable order x_1 < x_2 < ... < x_n.

Figure 5: Run of Apply_∨(u, v), with recursive calls evaluated in breadth-first order. All missing edges go to node false.

Definition 5. Let ⊛ be a binary boolean operator. The set of eBDDs (for ⊛) is inductively defined as follows:
• every BDD is also an eBDD of the same level;
• if u, v are BDDs (not eBDDs!), then ⟨u ⊛ v⟩ is an eBDD of level l := max{ℓ(u), ℓ(v)}; we call eBDDs of this form product nodes;
• if u ≠ v are eBDDs and i > ℓ(u), ℓ(v), then ⟨x_i, u, v⟩ is an eBDD of level i;
• we identify ⟨x_i, u, u⟩ and u for an eBDD u and i > ℓ(u).
(For instance, the polynomial assigned to the initial product node of Figure 5 is the polynomial p_0 shown in the figure.)

Computing eBDDs for CPDs. Given a node of a CP corresponding to a binary operator ⊛, Prover has to compute the polynomials p_0, δ_{x_1} p_0, ..., δ_{x_n} ··· δ_{x_1} p_0 corresponding to the nodes of the CPD shown in Figure 4. We show that Prover can compute these polynomials by representing them as eBDDs. Table 1 (left) describes the algorithm ComputeEBDD, which gets as input an eBDD w of level n and outputs a sequence w_0, w_1, ..., w_{n+1} of eBDDs such that w_0 = w and w_{n+1} is a BDD. Interpreted as a sequence of eBDDs, Figure 5 shows a run of this algorithm. Table 1 (right) shows EvaluateEBDD, a recursive algorithm to evaluate the polynomial represented by an eBDD at a given partial assignment; P(w) is a mapping used to memoize the polynomials returned by recursive calls. EvaluateEBDD has the standard structure of BDD procedures: it recurs on the structure of the eBDD, memoizing the result of recursive calls so that the algorithm is called at most once with a given input.
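The evaluation side can be sketched as a memoized recursion over the two node kinds, where product nodes apply the arithmetisation of ⊛ to their children's values. This is an illustrative analogue of EvaluateEBDD for ⊛ = ∨ and a full assignment (the encoding and names are ours):

```python
P = 2**61 - 1   # illustrative prime field

def eval_ebdd(w, sigma, memo=None):
    """Evaluate the polynomial of an eBDD (for ⊛ = ∨) at assignment sigma,
    visiting each node at most once, as in EvaluateEBDD."""
    if memo is None:
        memo = {}
    if w in memo:
        return memo[w]
    if w[0] == "leaf":
        r = 1 if w[1] else 0
    elif w[0] == "prod":              # product node <u ∨ v>: arithmetised ∨
        pu = eval_ebdd(w[1], sigma, memo)
        pv = eval_ebdd(w[2], sigma, memo)
        r = (1 - (1 - pu) * (1 - pv)) % P
    else:                             # circle node <x_i, lo, hi>
        _, i, lo, hi = w
        a = sigma[i] % P
        r = ((1 - a) * eval_ebdd(lo, sigma, memo)
             + a * eval_ebdd(hi, sigma, memo)) % P
    memo[w] = r
    return r
```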

Efficient Certification
In the CPCertify algorithm, Prover must (a) compute polynomials for all nodes of the CPD, and (b) evaluate them on assignments chosen by Verifier. In the last section we have seen that ComputeEBDD (for binary operations of the CP), combined with standard BDD algorithms (for all other operations), yields eBDDs representing all these polynomials-at no additional overhead, compared to a BDD-based implementation. This covers part (a). Regarding (b), recall that all polynomials computed in (a) have at most one variable. Therefore, using EvaluateEBDD we can evaluate a polynomial in linear time in the size of the eBDD representing it.
The Verifier of CPCertify is implemented in a straightforward manner. As the Verifier runs in time polynomial in the size of the CP (and not of the computed BDDs, which may be exponentially larger), incurring some overhead is less of a concern. As presented above, EvaluateEBDD incurs a factor-of-n overhead, as every node of the CPD must be evaluated. In our implementation, we use a caching strategy to reduce the complexity of Theorem 1(b) to O(T).
Note that the bounds above assume a uniform cost model; in particular, operations on BDD nodes and finite field arithmetic are assumed to be O(1). This is a reasonable assumption, as for a constant failure probability log |F| grows only logarithmically in n, so the finite field remains small. (It is possible to verify the number of assignments even if it exceeds |F|; see below.)

Implementation concerns
We list a number of points that are not described in detail in this paper, but need to be considered for an efficient implementation.

Finite field arithmetic. It is not necessary to use large finite fields; in particular, one can avoid the overhead of arbitrary-precision integers. For our implementation we fix the finite field F := Z_p, with p = 2^61 − 1 (the largest Mersenne prime that fits in 64 bits).
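A Mersenne modulus makes reduction particularly cheap: since 2^61 ≡ 1 (mod 2^61 − 1), the high bits of a product can be folded into the low bits with shifts and adds instead of a division. An illustrative Python sketch of the idea (the actual implementation is C++ on 64-bit words):

```python
M = (1 << 61) - 1   # the Mersenne prime 2^61 − 1

def mul_mod(a, b):
    """Multiply in F_M for 0 <= a, b < M, using Mersenne folding."""
    x = a * b                 # x < 2^122
    x = (x & M) + (x >> 61)   # fold once: now x < 2^62
    x = (x & M) + (x >> 61)   # fold again: now x <= 2^61 + 1 < 2M
    return x if x < M else x - M
```

Each fold preserves the value modulo M because the high part, weighted 2^61 ≡ 1, may simply be added to the low part; after two folds a single conditional subtraction suffices.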

Incremental eBDD representation. Algorithm ComputeEBDD computes a sequence of eBDDs. These need not be stored explicitly, which would incur a space overhead. Instead, we only store the last eBDD as well as the differences between subsequent elements of the sequence. To evaluate the eBDDs, we then revert to a previous state by applying the differences appropriately.

Evaluation order. It simplifies the implementation if Prover only needs to evaluate nodes of the CPD in some (fixed) topological order. CPCertify can easily be adapted to guarantee this, by picking the next node appropriately in each iteration, and by evaluating only one child of a binary operator ψ_1 ⊛ ψ_2. The value of the other child can then be derived by solving a linear equation.

Efficient evaluation. As stated in Theorem 1, using EvaluateEBDD Prover needs Ω(nT) time to respond to Verifier's challenges. In our implementation we instead use a caching strategy that reduces this time to O(T). Essentially, we exploit the special structure of conv(φ): Verifier sends a sequence of challenges where assignments σ_i and σ_{i+1} differ only in variables x_i and x_{i+1}. The corresponding eBDDs likewise change only at levels i and i + 1. We cache the linear coefficients of eBDD nodes that contribute to the arithmetisation of the root top-down, and the arithmetised values of nodes bottom-up. As a result, only levels i and i + 1 need to be updated.

Large numbers of assignments. If the number of satisfying assignments of a CP exceeds |F|, Verifier would not be able to verify the count accurately. Instead of choosing |F| ≥ 2^n, which incurs a significant overhead, Verifier can query the precise number of assignments and then choose |F| randomly. This introduces another possibility of failure, but (roughly speaking) it suffices to double log |F| for the additional failure probability to match the existing one. Our implementation does not currently support this technique.

Evaluation
We have implemented an eBDD library, blic (BDD Library with Interactive Certification), that is a drop-in replacement for BDDs but additionally performs the role of Prover in the CPCertify protocol. We have also implemented a client that executes the protocol as Verifier. The eBDD library is about 900 lines of C++ code and the CPCertify protocol is about 400 lines. We have built a prototype certifying QBF solver in blic, totalling about 2600 lines of code. We aim to answer the following questions in our evaluation:

RQ1: Performance of blic. We compare blic with CAQE, DepQBF, and PGBDDQ, three state-of-the-art QBF solvers. CAQE [28,10] does not provide any certificates in its most recent version. DepQBF [18,11] is a certifying QBF solver. PGBDDQ [7,24] is an independent implementation of a BDD-based QBF solver. Both DepQBF and PGBDDQ provide specialised checkers for their certificates, though PGBDDQ can also produce proofs in the standard QRAT format. Note that PGBDDQ is written in Python and generates proofs in an ASCII-based format, incurring overhead compared to the other tools. We take 172 QBF instances (all unsatisfiable) from the Crafted Instances track of the QBF Evaluation 2022. The Prenex CNF track of the QBF competition is not evaluated here: it features instances with a large number of variables, and BDD-based solvers perform poorly in these circumstances without additional optimisations. Our overall goal is not to propose a new approach for solving QBF, but rather to certify a BDD-based approach, so we focus on cases where the existing BDD-based approaches are practical.

We ran each benchmark with a 10 minute timeout; all tools other than CAQE were run with certificate production enabled. All times were obtained on a machine with an Intel Xeon E7-8857 CPU and 1.58 TiB RAM running Linux. See Appendix H for a detailed description. blic solved 96 out of the 172 benchmarks, CAQE solved 98, DepQBF solved 87, and PGBDDQ solved 91. Figure 6(a) shows the run times of blic compared to the other tools. The plot indicates that blic is competitive on these instances, with a few cases, mostly from the Lonsing family of benchmarks, where blic is slower than DepQBF by an order of magnitude. Figure 6(b) shows the overhead of certification: for each benchmark that finishes within the timeout, we plot the ratio of the time it takes to run Prover in CPCertify to the time to compute the answer. The dotted regression line shows that CPCertify has a 2.8× overhead over computing the BDDs. For this set of examples, the error probability never exceeds 10^{-8.9} (10^{-11.6} when the Lonsing examples are excluded); running the verifier k times reduces it to 10^{-8.9k}.

RQ2: Communication Cost of Certification and Verifier Time. We explore RQ2 by comparing the number of bytes exchanged between Prover and Verifier, and the time needed for Verifier to execute CPCertify, with the number of bytes in a QBF proof and the time required to verify the proofs produced by DepQBF and PGBDDQ, for which we use QRPcheck [23,25] and qchecker [7,24], respectively. Note that the latter is written in Python.

Table 2: Comparison of certificate generation, bytes exchanged between prover and verifier, and time taken to verify the certificate on a set of QBF benchmarks from [7]. "Solve time" is the time taken to solve the instance and to generate a certificate (seconds), "Certificate" is the size of the proof encoding for PGBDDQ and the number of bytes exchanged by CPCertify for blic, and "Verifier time" is the time to verify the certificate (Verifier's run time for blic and the time taken by qchecker).
We show that the overhead of certification is low. Figure 6(c) shows the run time of Verifier; this is generally negligible for blic, except for the Lonsing and KBKF families, which have a large number of variables but very small BDDs. Figure 6(d) shows the total number of bytes exchanged between Prover and Verifier in blic against the size of the proofs generated by PGBDDQ and DepQBF. For large instances, the number of bytes exchanged by blic is significantly smaller than the size of the proofs. The exceptions are again the Lonsing and KBKF families of instances. In both plots, the dotted line results from a log-linear regression.
In addition to the Crafted Instances, we compare against PGBDDQ on a challenging family of benchmarks used in the PGBDDQ paper (matching the parameters of [7, Table 3]); these are QBF encodings of a linear domino placing game. Our results are summarised in Table 2. The upper bound on Verifier error is 10^{-9.22}. We show that blic outperforms PGBDDQ both in the overall cost of computing the answer and the certificates, as well as in the number of bytes communicated and the time used by Verifier.
Our results indicate that giving up absolute certainty through interactive protocols can lead to an order of magnitude smaller communication cost and several orders of magnitude smaller checking costs for the verifier.

Conclusion
We have presented a solver that combines BDDs with an interactive protocol. blic can be seen as a self-certifying BDD library able to certify the correctness of arbitrary sequences of BDD operations. In order to trust the result, a user must only trust the verifier (a straightforward program that poses challenges to the prover). We have shown that blic (including certification time) is competitive with other solvers, and Verifier's time and error probabilities are negligible.
Our results show that IP = PSPACE can become an important result not only in theory but also in the practice of automatic verification. From this perspective, our paper is a first step towards practical certification based on interactive protocols. While we have focused on BDDs, we can ask the more general question: which practical automated reasoning algorithms can be made efficiently certifying? For example, whether there is an interactive protocol and an efficient certifying version of modern SAT solving algorithms is an interesting open challenge.

Lemma 3. Let p_1, p_2, q_1, q_2 denote polynomials with p_i ≡_b q_i for i ∈ {1, 2}. Then p_1 + p_2 ≡_b q_1 + q_2, p_1 · p_2 ≡_b q_1 · q_2, and π_{[x:=r]} p_1 ≡_b π_{[x:=r]} q_1 for r ∈ {0, 1}.
Proof. Let σ denote an arbitrary binary assignment. At (*) we use both the induction hypothesis and Lemma 3; we use (*) for the other cases as well, with the same meaning.
• Let φ = ψ_1 ∧ ψ_2. We use δ_x p ≡_b p for any polynomial p and variable x.
It remains to show that [[conv(φ)]] is multilinear. Again, we proceed by structural induction. The base case φ ∈ {true, false} is again trivial. The other cases all follow from the observation that, given multilinear polynomials p, q over variables

Second, we have the case ψ = δ_x ψ_1. By induction, we find that ψ_1 has maximum degree 2, which cannot be increased by δ_x. We now introduce the notation Σ_x p := π_{[x:=0]} p + π_{[x:=1]} p for x ∈ X. Using this notation, we rewrite the above equation.

Lemma 2. A CP φ with n free variables has m < |F| satisfying assignments iff
Crucially, we can now use the fact that p is multilinear (again Lemma 1) to derive, for any x ∈ X, that setting x to 1/2 yields Σ_x p = 2 · π_{[x:=1/2]} p. By plugging this into (*) we get m = 2^n Π_σ p, where σ(x) := 1/2 for x ∈ X, as desired.
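As a hypothetical worked instance of this identity (our example, not from the paper): the arithmetisation of x ∧ y is the multilinear polynomial p(x, y) = x·y, and evaluating 2^2 · p(1/2, 1/2) over Z_p recovers its single satisfying assignment:

```cpp
#include <cassert>
#include <cstdint>

// Worked example of the counting trick over F = Z_p, p = 2^61 - 1: the number
// of satisfying assignments equals 2^n times the multilinear polynomial
// evaluated at x = 1/2 everywhere. Here p(x, y) = x*y arithmetises x AND y.
constexpr uint64_t P = (1ULL << 61) - 1;

uint64_t mul_mod(uint64_t a, uint64_t b) {
    unsigned __int128 z = (unsigned __int128)a * b;
    uint64_t s = (uint64_t)(z & P) + (uint64_t)(z >> 61);  // 2^61 ≡ 1 (mod P)
    return s >= P ? s - P : s;
}

uint64_t count_and() {
    uint64_t half = (P + 1) / 2;         // the field element 1/2: 2*half ≡ 1
    uint64_t val = mul_mod(half, half);  // p(1/2, 1/2)
    return mul_mod(4, val);              // 2^n * p(1/2,...,1/2) with n = 2
}
```

Here count_and() evaluates to 1, the number of satisfying assignments of x ∧ y.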

C. Proof of Proposition 2
The core of the argument in Proposition 2 uses the Schwartz-Zippel Lemma, of which we only need a very simple version.
Lemma 4 (Schwartz-Zippel Lemma). Let p_1, p_2 be distinct univariate polynomials over F of degree at most d ≥ 0. Let r be selected uniformly at random from F. The probability that p_1(r) = p_2(r) holds is at most d/|F|.
Proof. Since p_1 ≠ p_2, the polynomial p := p_1 − p_2 is not the zero polynomial and has degree at most d. Therefore p has at most d zeros, and so the probability of p(r) = 0 is at most d/|F|.

Now we move on to the proof. (Note that any prime field F with |F| > 2 has an element c such that 2c = 1.)

Proof. If the claim is true, Prover can always answer every challenge posed by Verifier truthfully. True claims pass all the checks conducted by Verifier, and so Verifier accepts. Assume now that the claim is false. We show that Verifier accepts with probability at most 4n|φ|/|F|. First, we consider the contents of C, the set of claims yet to be checked, after each step of the protocol. This gives rise to a sequence C_0, C_1, ..., C_l, where C_0 contains only the initial claim. In particular, we consider each iteration of step (b.1) separately, so executing the loop s times adds s elements to the sequence.
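Lemma 4 can be checked exhaustively on a toy example (our illustration, over the small prime field Z_101 rather than the field used by blic): the distinct polynomials p_1(x) = x^2 and p_2(x) = x, of degree at most d = 2, agree exactly on the roots of x^2 − x, i.e. on at most d of the 101 field elements.

```cpp
#include <cassert>

// Count the field elements of Z_101 on which the distinct polynomials
// p1(x) = x^2 and p2(x) = x agree. By Schwartz-Zippel this is at most
// d = 2; here it is exactly 2 (the roots 0 and 1 of x^2 - x).
int agreements() {
    const int q = 101;  // toy prime field, small enough to enumerate
    int count = 0;
    for (int r = 0; r < q; ++r)
        if ((r * r) % q == r) ++count;  // p1(r) == p2(r) in Z_101
    return count;
}
```

A uniformly random r thus exposes the difference with probability at least 1 − 2/101, which is the mechanism behind Verifier's soundness.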

Proposition 2 (CPCertify is complete and sound).
As (by assumption) the initial claim is false, C_0 contains a false claim. For the moment, assume that Verifier accepts; it must then complete all rounds without rejecting. As the nodes are processed in topological order (every node is processed before its descendants), eventually C must become empty: an inner node ψ replaces all claims about itself with claims about its children, and each leaf node ψ removes all claims about itself.
So C_l = ∅ contains only true claims, and thus there must be an i such that C_i contains a false claim but C_{i+1} contains only true claims. For any such i, we say that Prover tricks Verifier in step i. In other words: if Verifier accepts, it was tricked at some point.
We now show that at each step, Verifier is tricked with probability either 0 or at most 2/|F|, and that the latter case occurs at most 2n|φ| times. By the union bound, this implies the stated bound.
For step (c.4) we get

Otherwise, the claim added to C is false and Verifier is not tricked.
Step (b.1) remains. We remark that, though unintuitive, the probability that Verifier is tricked does not increase with m, the number of claims to be merged. If all claims

To conclude the proof, we argue that steps (b.1) and (c.4) occur at most 2n|φ| times in total.
Step (c.4) occurs at most |conv(φ)| ≤ n|φ| times. For (b.1), we note that it is executed at most n times for each node with more than one parent. The conversion of φ to a CPD does not increase the number of such nodes, so it also occurs at most n|φ| times.

It remains to argue that [[w]] is binary, for a BDD w. We again proceed by induction on ℓ(w). For w ∈ {⟨false⟩, ⟨true⟩} this is clear, so let w = ⟨x, u, v⟩ and let σ denote a binary assignment. We get

D. Proof of Proposition 3
The last step uses σ(x_i) ∈ {0, 1}. By the induction hypothesis, both Π_σ [[u]] and Π_σ [[v]] are 0 or 1, and the claim follows.

Part (b). Before we prove this part, we remark that the statement follows from two well-known facts: multilinear polynomials are uniquely determined by the values they take on binary assignments, and BDDs uniquely represent arbitrary boolean functions. Here, however, we give an elementary proof.
We show that such a w exists if p is a polynomial over the variables x_1, ..., x_i, for all 0 ≤ i ≤ n. We proceed by induction on i. For i = 0 we have p ∈ {0, 1} and choose the appropriate w ∈ {⟨false⟩, ⟨true⟩}. For i > 0, let

It remains to show that w is unique. We will do this by proving that

First, we consider the case that ℓ(u) < i.
This contradicts the minimality of the counterexample.
Second, we have the case

The proof will take up the remainder of this section. For the time bound, observe that EvaluateEBDD performs the same operations as Apply_⊛, but with a breadth-first traversal of the BDD instead of a depth-first one. Clearly, this does not increase the time complexity.
The bound T ∈ O(|u_1| · |u_2|) is well known for Apply_⊛: it relies on there being at most |u_1| · |u_2| recursive calls, as each call corresponds to a pair of BDD nodes (and identical calls are memoised). Naturally, the same bound holds for EvaluateEBDD, where the number of created product nodes is bounded by |u_1| · |u_2|. Each of these product nodes is operated on once, by replacing it with a BDD node. We now move on to correctness, which will follow from Lemmata 9 and 10. We start with a basic property of the degree reduction operator.

Lemma 5. Let p denote a polynomial and x a variable. Then we have
Proof. We write p as p = p_0 + x · p_1 + ... + x^k · p_k for polynomials p_0, ..., p_k which do not depend on x.

We first show that the innermost loop of ComputeEBDD computes a degree reduction.

Lemma 6. Let ⟨u ⊛ v⟩ be a product eBDD, and let s := ⟨x_{n−i}, t_0, t_1⟩ be the eBDD computed by the innermost loop of ComputeEBDD (Table 1,
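Assuming δ_x is the standard multilinearisation operator in x of sumcheck-style protocols (our reconstruction of the display missing from the proof of Lemma 5, under that assumption), writing $p = p_0 + x\,p_1 + \dots + x^k p_k$ gives

```latex
\delta_x p
  \;=\; \pi_{[x:=0]}\,p \;+\; x\,\bigl(\pi_{[x:=1]}\,p - \pi_{[x:=0]}\,p\bigr)
  \;=\; p_0 + x\,(p_1 + \dots + p_k),
```

which has degree at most 1 in $x$ and agrees with $p$ on every binary assignment, since $x^j = x$ for $x \in \{0,1\}$ and $j \ge 1$; this is consistent with $\delta_x p \equiv_b p$ as used in the proof of Lemma 1.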
Now we show some simple invariants of the algorithms.
Lemma 7. w_i only has product nodes at levels 1, ..., n − i.
Proof. For w_0 the statement holds vacuously. Assume the statement holds for w_i. The algorithm computes w_{i+1} by replacing each product node ⟨u ⊛ v⟩ of w_i at level n − i with a non-product node ⟨x_{n−i}, t_0, t_1⟩. So w_{i+1} has no product nodes at level n − i.
Proof. For any node v of w_i, let v* denote the corresponding node in w_{i+1}. More precisely, let u_1, v_1, ..., u_l, v_l denote the sequence of replaced nodes, i.e.
Then v*:

So if v is a product node at level n − i, v* denotes the BDD node that replaces it, and for the other nodes, v* and v are the same node, except that some descendants of v have been replaced by new nodes. Note that w_i* = w_{i+1}, and

We proceed by induction. For the two leaves true and false the statement clearly holds. For the induction step, we consider two cases.
• v is a product node.
follows from Lemma 6.

• where (1)

Proof. By Lemma 7, w_n has no product nodes. By Definition 5, an eBDD without product nodes is a BDD. Moreover, by restricting to the descendants that correspond to BDDs (and not eBDDs), we note |S_bdd| = |φ| and |S| ≤ n|φ|. Additionally, let B_ψ denote the eBDD representing ψ ∈ S, computed by ComputeEBDD. Note that B_ψ is a BDD if ψ ∈ S_bdd, and (as BDDs are unique, see Proposition 3) these necessarily match the BDDs computed by BDDSolver. We thus observe Σ_{ψ ∈ S_bdd} |B_ψ| ≤ T. For the eBDDs ψ ∈ S \ S_bdd, each node also appears in the computation of ComputeEBDD. In the sum Σ_{ψ ∈ S} |B_ψ|, however, a node is counted up to n times, so we get Σ_{ψ ∈ S} |B_ψ| ≤ nT. As shown in Proposition 5, responding to one challenge takes time linear in |w|, where w is the evaluated eBDD.
Step (b.1.1) of CPCertify sends at most n challenges for each node in S_bdd, which are evaluated in time linear in Σ_{ψ ∈ S_bdd} n|B_ψ| ≤ nT.
Step (c) sends a challenge for each node in S, which takes at most Σ_{ψ ∈ S} |B_ψ| ≤ nT time.
Part (c). As argued for part (b), Verifier sends at most n|φ| challenges. Each challenge consists of one partial assignment, which has size at most n. The failure probability follows from Proposition 2.

H.1. Instances
We used the instances from the Crafted Instances track of the QBF Evaluation 2022 (http://www.qbflib.org/QBFEVAL_20_DATASET.zip); these are re-used from the QBF Evaluation 2020. It should be noted that these instances are all unsatisfiable. (Instances in QDIMACS format are specified so that the outermost quantifier is existential.) We also use the linear domino placement game used for evaluating PGBDDQ [7], which can be obtained at https://github.com/rebryant/pgbddq-artifact. We reproduce the parameters of [7, Table 3]. On these instances we run only PGBDDQ and our tool, which are both BDD-based, and we allow both to use the provided variable ordering, which improves performance significantly.

H.2. Tools
The following four tools were compared: blic, CAQE, DepQBF, and PGBDDQ. All solvers were run without a preprocessor, limiting their performance. In the QBF Evaluation 2022, CAQE combined with the preprocessor Bloqqer achieved first place in the Crafted Instances track. For comparison, we ran our tool and CAQE on the preprocessed instances: Bloqqer solves 97 of the 172 instances by itself; of the remaining 74 instances, our tool solves 34, while CAQE solves 50 (timeout of 10 minutes).
We ran DepQBF with certificate generation enabled. More precisely, we used the flags -trace=bqrp -dep-man=simple -no-lazy-qpup -no-dynamic-nenofex -no-qbce-dynamic -no-trivial-falsity -no-trivial-truth. These flags disable features that are not supported in conjunction with certificate generation, which reduces the performance of DepQBF: in its default configuration it can solve 104 of the 172 instances (compared to 87 with certification enabled). To verify the certificates generated by PGBDDQ we use the tool qchecker, which is part of PGBDDQ and specialised for the certificates PGBDDQ generates.

H.3. Time Measurement
Times are measured in the calling process; these exceed self-reported times by about 1-2 ms. Running our tool repeatedly on one (arbitrarily chosen) instance yields an average run-to-run deviation of 11 ms and a maximum deviation of 34 ms (against a total running time of 3.08 s).
To measure the time taken by our tool for Verifier and Prover parts of CPCertify, it is necessary to measure the contributions of each round of the interactive protocol. As the protocol is executed within a single process, these data are collected internally in our tool.

H.3.1. Comparison with results in [7].
As we use the same instances and configuration of PGBDDQ, we can compare the times we obtained with the times in [7, Table 3] to verify that we can reproduce their numbers.
Our results match theirs relatively closely. Our times are between 15% and 29% slower for 10 ≤ N ≤ 25, and 45-49% slower for N = 45, with one outlier at 58% (a subsequent run was 54% slower, making it unlikely that the issue is intermittent). The difference can likely be accounted for by the faster (in terms of single-thread performance) Intel Core i7-7700K processor used in [7], and by differences in main memory.