Learning Abstractions for Program Synthesis
Abstract
Many example-guided program synthesis techniques use abstractions to prune the search space. While abstraction-based synthesis has proven to be very powerful, a domain expert needs to provide a suitable abstract domain, together with the abstract transformers of each DSL construct. However, coming up with useful abstractions can be non-trivial, as it requires both domain expertise and knowledge about the synthesizer. In this paper, we propose a new technique for learning abstractions that are useful for instantiating a general synthesis framework in a new domain. Given a DSL and a small set of training problems, our method uses tree interpolation to infer reusable predicate templates that speed up synthesis in a given domain. Our method also learns suitable abstract transformers by solving a certain kind of second-order constraint solving problem in a data-driven way. We have implemented the proposed method in a tool called Atlas and evaluated it in the context of the Blaze meta-synthesizer. Our evaluation shows that (a) Atlas can learn useful abstract domains and transformers from few training problems, and (b) the abstractions learned by Atlas allow Blaze to achieve significantly better results compared to manually-crafted abstractions.
1 Introduction
Program synthesis is a powerful technique for automatically generating programs from high-level specifications, such as input-output examples. Due to its myriad use cases across a wide range of application domains (e.g., spreadsheet automation [1, 2, 3], data science [4, 5, 6], cryptography [7, 8], improving programming productivity [9, 10, 11]), program synthesis has received widespread attention from the research community in recent years.
Because program synthesis is, in essence, a very difficult search problem, many recent solutions prune the search space by utilizing program abstractions [4, 12, 13, 14, 15, 16]. For example, state-of-the-art synthesis tools, such as Blaze [14], Morpheus [4] and Scythe [16], symbolically execute (partial) programs over some abstract domain and reject those programs whose abstract behavior is inconsistent with the given specification. Because many programs share the same behavior in terms of their abstract semantics, the use of abstractions allows these synthesis tools to significantly reduce the search space.
In this paper, we propose a novel technique for automatically learning domain-specific abstractions that are useful for instantiating an example-guided synthesis framework in a new domain. Given a DSL and a training set of synthesis problems (i.e., input-output examples), our method learns a useful abstract domain in the form of predicate templates and infers sound abstract transformers for each DSL construct. In addition to eliminating the significant manual effort required from a domain expert, the abstractions learned by our method often outperform manually-crafted ones in terms of their benefit to synthesizer performance.
The workflow of our approach, henceforth called Atlas^{1}, is shown schematically in Fig. 1. Since Atlas is meant to be used as an offline training step for a general-purpose programming-by-example (PBE) system, it takes as input a DSL as well as a set of synthesis problems \({\varvec{\mathcal {E}}}\) that can be used for training purposes. Given these inputs, our method enters a refinement loop where an Abstraction Learner component discovers a sequence of increasingly precise abstract domains \(\mathcal {A}_1, \ldots , \mathcal {A}_n\), and their corresponding abstract transformers \(\mathcal {T}_1, \ldots , \mathcal {T}_n\), in order to help the Abstraction-Guided Synthesizer (AGS) solve all training problems. While the AGS can reject many incorrect solutions using an abstract domain \(\mathcal {A}_i\), it might still return some incorrect solutions due to the insufficiency of \(\mathcal {A}_i\). Thus, whenever the AGS returns an incorrect solution to any training problem, the Abstraction Learner discovers a more precise abstract domain and automatically synthesizes the corresponding abstract transformers. Upon termination of the algorithm, the final abstract domain \(\mathcal {A}_n\) and transformers \(\mathcal {T}_n\) are sufficient for the AGS to correctly solve all training problems. Furthermore, because our method learns general abstractions in the form of predicate templates, the learned abstractions are expected to be useful for solving many other synthesis problems beyond those in the training set.
From a technical perspective, the Abstraction Learner uses two key ideas, namely tree interpolation and data-driven constraint solving, for learning useful abstract domains and transformers, respectively. Specifically, given an incorrect program \(\mathcal {P}\) that cannot be refuted by the AGS using the current abstract domain \(\mathcal {A}_i\), the Abstraction Learner generates a tree interpolant \(\mathcal {I}_i\) that serves as a proof of \(\mathcal {P}\)’s incorrectness and constructs a new abstract domain \(\mathcal {A}_{i+1}\) by extracting templates from the predicates used in \(\mathcal {I}_i\). The Abstraction Learner also synthesizes the corresponding abstract transformers for \(\mathcal {A}_{i+1}\) by setting up a second-order constraint solving problem where the goal is to find the unknown relationship between symbolic constants used in the predicate templates. Our method solves this problem in a data-driven way by sampling input-output examples for DSL operators and ultimately reduces the transformer learning problem to solving a system of linear equations.
We have implemented these ideas in a tool called Atlas and evaluated it in the context of the Blaze program synthesis framework [14]. Our evaluation shows that the proposed technique eliminates the manual effort involved in designing useful abstractions. More surprisingly, our evaluation also shows that the abstractions generated by Atlas outperform manually-crafted ones in terms of the performance of the Blaze synthesizer in two different application domains.

In summary, this paper makes the following contributions:

• We describe a method for learning abstractions (domains and transformers) that are useful for instantiating program synthesis frameworks in new domains.

• We show how tree interpolation can be used for learning abstract domains (i.e., predicate templates) from a few training problems.

• We describe a method for automatically synthesizing transformers for a given abstract domain under certain assumptions. Our method is guaranteed to find the unique best transformer if one exists.

• We implement our method in a tool called Atlas and experimentally evaluate it in the context of the Blaze synthesis framework. Our results demonstrate that the abstractions discovered by Atlas outperform the manually-written ones used for evaluating Blaze in two application domains.
2 Illustrative Example
Suppose that we wish to use the Blaze meta-synthesizer to automate the class of string transformations considered by FlashFill [1] and BlinkFill [17]. In the original version of the Blaze framework [14], a domain expert needs to come up with a universe of suitable predicate templates as well as abstract transformers for each DSL construct. We will now illustrate how Atlas automates this process, given a suitable DSL and its semantics (e.g., the one used in [17]).
In order to construct the abstract domain \(\mathcal {A}\) and transformers \(\mathcal {T}\), Atlas starts with the trivial abstract domain \(\mathcal {A}_0 = \{ \top \}\) and transformers \(\mathcal {T}_0\), defined as \([\![ F(\top , \ldots , \top ) ]\!]^\sharp = \top \) for each DSL construct F. Using this abstraction, Atlas invokes Blaze to find a program \(\mathcal {P}_0\) that satisfies specification \(\mathcal {E}_1\) under the current abstraction \((\mathcal {A}_0, \mathcal {T}_0)\). However, since the program \(\mathcal {P}_0\) returned by Blaze is incorrect with respect to the concrete semantics, Atlas tries to find a more precise abstraction that allows Blaze to succeed.
3 Overall Abstraction Learning Algorithm
Our top-level algorithm for learning abstractions, called LearnAbstractions, is shown in Fig. 2. The algorithm takes two inputs, namely a domain-specific language \(\mathcal {L}\) (both syntax and semantics) and a set of training problems \({\varvec{\mathcal {E}}}\), where each problem is specified as a set of input-output examples \(\mathcal {E}_i\). The output of our algorithm is a pair \((\mathcal {A}, \mathcal {T})\), where \(\mathcal {A}\) is an abstract domain represented by a set of predicate templates and \(\mathcal {T}\) is the set of corresponding abstract transformers.
At a high level, the LearnAbstractions procedure starts with the most imprecise abstraction (consisting only of \(\top \)) and incrementally improves the precision of the abstract domain \(\mathcal {A}\) whenever the AGS fails to synthesize the correct program using \(\mathcal {A}\). Specifically, the outer loop (lines 4–10) considers each training instance \(\mathcal {E}_i\) and performs a fixed-point computation (lines 5–10) that terminates when the current abstract domain \(\mathcal {A}\) is good enough to solve problem \(\mathcal {E}_i\). Thus, upon termination, the learned abstract domain \(\mathcal {A}\) is sufficiently precise for the AGS to solve all training problems \({\varvec{\mathcal {E}}}\).
Specifically, in order to find an abstraction that is sufficient for solving \(\mathcal {E}_i\), our algorithm invokes the AGS with the current abstract domain \(\mathcal {A}\) and corresponding transformers \(\mathcal {T}\) (line 6). We assume that Synthesize returns a program \(\mathcal {P}\) that is consistent with \(\mathcal {E}_i\) under abstraction \((\mathcal {A}, \mathcal {T})\). That is, symbolically executing \(\mathcal {P}\) (according to \(\mathcal {T}\)) on inputs \(\mathcal {E}_i^{in}\) yields abstract values \({\varvec{\varphi }}\) that are consistent with the outputs \(\mathcal {E}_i^{out}\) (i.e., \(\forall j. \ \mathcal {E}_{ij}^{out} \in \gamma (\varphi _j)\)). However, while \(\mathcal {P}\) is guaranteed to be consistent with \(\mathcal {E}_i\) under the abstract semantics, it may not satisfy \(\mathcal {E}_i\) under the concrete semantics. We refer to such a program \(\mathcal {P}\) as spurious.
Thus, whenever the call to IsCorrect fails at line 8, we invoke the LearnAbstractDomain procedure (line 9) to learn additional predicate templates that are later added to \(\mathcal {A}\). Since the refinement of \(\mathcal {A}\) necessitates the synthesis of new transformers, we then call LearnTransformers (line 10) to learn a new \(\mathcal {T}\). The new abstraction is guaranteed to rule out the spurious program \(\mathcal {P}\) as long as there is a unique best transformer of each DSL construct for domain \(\mathcal {A}\).
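The refinement loop described above can be sketched in Python as follows. This is a minimal sketch, not the Java implementation of Atlas: the synthesizer (Synthesize), the concrete check (IsCorrect), and the two learners (LearnAbstractDomain, LearnTransformers) are passed in as hypothetical callables, templates are any hashable values, and the `max_refinements` cap is an assumption added so that the sketch cannot loop forever.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set, Tuple

# Hypothetical types for this sketch: a template is any hashable value,
# a program is any object, and a problem is a list of (input, output) pairs.
Problem = List[Tuple[object, object]]
Domain = Set[Hashable]

TOP = "TOP"  # stands for the trivial predicate (top)


def learn_abstractions(
    problems: Iterable[Problem],
    synthesize: Callable[[Problem, Domain, Dict], Optional[object]],
    is_correct: Callable[[object, Problem], bool],
    learn_abstract_domain: Callable[[object, Problem], Domain],
    learn_transformers: Callable[[Domain], Dict],
    max_refinements: int = 100,
) -> Tuple[Domain, Dict]:
    """Refinement loop in the style of LearnAbstractions (Fig. 2)."""
    domain: Domain = {TOP}                     # A_0 = {top}
    transformers = learn_transformers(domain)  # T_0
    for problem in problems:                   # outer loop over training problems
        for _ in range(max_refinements):       # refine until this problem is solved
            program = synthesize(problem, domain, transformers)
            if program is None:
                raise RuntimeError("AGS returned no candidate program")
            if is_correct(program, problem):   # not spurious: A is precise enough
                break
            # Spurious program: add templates mined from its refutation ...
            domain = domain | learn_abstract_domain(program, problem)
            # ... and re-learn the transformers for the refined domain.
            transformers = learn_transformers(domain)
        else:
            raise RuntimeError("no convergence within max_refinements")
    return domain, transformers
```

Passing the components as callables keeps the loop independent of any particular synthesizer, which mirrors how Atlas treats the AGS as an input.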
4 Learning Abstract Domain Using Tree Interpolation
In this section, we present the LearnAbstractDomain procedure: Given a spurious program \(\mathcal {P}\) and a synthesis problem \(\mathcal {E}\) that \(\mathcal {P}\) does not solve, our goal is to find new predicate templates \(\mathcal {A}^\prime \) to add to the abstract domain \(\mathcal {A}\) such that the Abstraction-Guided Synthesizer no longer returns \(\mathcal {P}\) as a valid solution to the synthesis problem \(\mathcal {E}\). Our key insight is that we can mine for such useful predicate templates by constructing a tree interpolation problem. In what follows, we first review tree interpolants (based on [18]) and then explain how we use this concept to find useful predicate templates.
Definition 1
(Tree interpolation problem). A tree interpolation problem \(T = (V, r, P, L)\) is a directed labeled tree, where V is a finite set of nodes, \(r \in V\) is the root, \(P: (V \backslash \{ r \}) \mapsto V\) is a function that maps children nodes to their parents, and \(L : V \mapsto \mathbb {F}\) is a labeling function that maps nodes to formulas from a set \(\mathbb {F}\) of first-order formulas such that \(\bigwedge _{v \in V} L(v)\) is unsatisfiable.
In other words, a tree interpolation problem is defined by a tree T where each node is labeled with a formula and the conjunction of these formulas is unsatisfiable. In what follows, we write \(Desc (v)\) to denote the set of all descendants of node v, including v itself, and we write \(NonDesc (v)\) to denote all nodes other than those in \(Desc (v)\) (i.e., \(V\backslash Desc (v)\)). Also, given a set of nodes \(V'\), we write \(L(V')\) to denote the set of all formulas labeling nodes in \(V'\).
Given a tree interpolation problem T, a tree interpolant \(\mathcal {I}\) is a mapping from every node in V to a formula such that the annotation of the root node is false and the annotation of each node v is entailed by the conjunction of its children's annotations together with v's own label L(v). More formally, a tree interpolant is defined as follows:
Definition 2
(Tree interpolant). Given a tree interpolation problem \(T = (V, r, P, L)\), a tree interpolant for T is a function \(\mathcal {I}: V \mapsto \mathbb {F}\) that satisfies the following conditions:
 1. \(\mathcal {I}(r) = false \);
 2. For each \(v \in V\): \(\big ( \bigwedge _{P(c_i) = v} \mathcal {I}(c_i) \big ) \wedge L(v) \Rightarrow \mathcal {I}(v)\);
 3. For each \(v \in V\): \(Vars (\mathcal {I}(v)) \subseteq Vars (L(Desc (v))) \cap Vars (L(NonDesc (v)))\).
Intuitively, the first condition ensures that \(\mathcal {I}\) establishes the unsatisfiability of the formulas in T, and the second condition states that \(\mathcal {I}\) is a valid annotation. As is standard in Craig interpolation [19, 20], the third condition imposes a “shared vocabulary” requirement by ensuring that the annotation at each node v refers only to variables common to the descendants and non-descendants of v.
Example 1
Consider the tree interpolation problem \(T = (V, r, P, L)\) in Fig. 3, where L(v) is shown to the right of each node v. A tree interpolant \(\mathcal {I}\) for this problem maps each node to the corresponding underlined formula. For instance, we have \(\mathcal {I}(v_1) = (len (v_1) \ne 7)\). It is easy to confirm that \(\mathcal {I}\) is a valid interpolant according to Definition 2.

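Given a spurious program \(\mathcal {P}\) whose abstract syntax tree is \(\varPi \) and an input-output example \(({e_{in }}, e_{out })\) that \(\mathcal {P}\) does not satisfy, the ConstructTree procedure builds a tree interpolation problem \(T = (V, r, P, L)\) as follows:

• V consists of all AST nodes in \(\varPi \) as well as a “dummy” node d.

• The root r of T is the dummy node d.

• P is a function that maps children AST nodes to their parents and maps the root AST node to the dummy node d.

• L maps each node \(v \in V\) to a formula, as described below.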
Essentially, the ConstructTree procedure labels any leaf node representing the program input with the input example \({e_{in }}\) and the root node with the output example \(e_{out }\). All other internal nodes are labeled with the axiomatic semantics of the corresponding DSL operator (modulo renaming).^{3} Observe that the formula \(\bigwedge _{v\in V} L(v)\) is guaranteed to be unsatisfiable since \(\mathcal {P}\) does not satisfy the I/O example \(({e_{in }}, e_{out })\); thus, we can obtain a tree interpolant for T.
Example 2
Consider program \(\mathcal {P}: \texttt {Concat}(x, \texttt {"18"})\), which concatenates the constant string “18” to input x. Figure 3 shows the result of invoking ConstructTree for \(\mathcal {P}\) and input-output example \((\texttt {"CAV"}, \texttt {"CAV2018"})\). As mentioned in Example 1, the tree interpolant \(\mathcal {I}\) for this problem is indicated with the underlined formulas.
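To make ConstructTree and Definition 2 concrete, the sketch below encodes the tree for Example 2 and checks the three conditions of Definition 2 by brute force. It rests on simplifying assumptions that are not from the paper: every string is abstracted by its length (integer variables `lx`, `lc`, `lv1`), formulas are Python predicates tagged with their free variables, and entailment is decided by enumerating lengths in a small bounded range rather than by a theorem prover such as Z3. All names (`Formula`, `entails`, `is_tree_interpolant`, ...) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

Assignment = Dict[str, int]


class Formula:
    """A formula over integer variables: its free variables plus a predicate."""

    def __init__(self, variables: Iterable[str], pred: Callable[[Assignment], bool]):
        self.vars: FrozenSet[str] = frozenset(variables)
        self.pred = pred


FALSE = Formula([], lambda a: False)

# Tree for P = Concat(x, "18") on ("CAV", "CAV2018"), with each string
# abstracted by its length (a simplification of the string semantics).
PARENT = {"x": "v1", "c": "v1", "v1": "d"}  # child -> parent; d is the dummy root
LABEL = {
    "x": Formula(["lx"], lambda a: a["lx"] == len("CAV")),        # input example
    "c": Formula(["lc"], lambda a: a["lc"] == len("18")),         # constant "18"
    "v1": Formula(["lv1", "lx", "lc"],                            # Concat semantics
                  lambda a: a["lv1"] == a["lx"] + a["lc"]),
    "d": Formula(["lv1"], lambda a: a["lv1"] == len("CAV2018")),  # output example
}
VARS = sorted(set().union(*(phi.vars for phi in LABEL.values())))
BOUND = range(0, 11)  # assumed finite range of lengths for brute-force checks


def entails(premises: List[Formula], conclusion: Formula) -> bool:
    """Check that the conjunction of premises implies conclusion on BOUND."""
    for values in product(BOUND, repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if all(p.pred(a) for p in premises) and not conclusion.pred(a):
            return False
    return True


def children(v: str) -> List[str]:
    return [c for c, p in PARENT.items() if p == v]


def desc(v: str) -> Set[str]:
    """Desc(v): v together with all of its descendants."""
    return {v}.union(*(desc(c) for c in children(v)))


def vars_of(nodes: Iterable[str]) -> Set[str]:
    return set().union(*(LABEL[n].vars for n in nodes))


def is_tree_interpolant(interp: Dict[str, Formula], root: str = "d") -> bool:
    nodes = set(LABEL)
    if not entails([interp[root]], FALSE):             # condition 1
        return False
    for v in nodes:
        kids = [interp[c] for c in children(v)]
        if not entails(kids + [LABEL[v]], interp[v]):  # condition 2
            return False
        shared = vars_of(desc(v)) & vars_of(nodes - desc(v))
        if not interp[v].vars <= shared:               # condition 3
            return False
    return True


# Candidate interpolant; I["v1"] plays the role of len(v1) != 7 from Example 1.
I = {
    "x": Formula(["lx"], lambda a: a["lx"] == 3),
    "c": Formula(["lc"], lambda a: a["lc"] == 2),
    "v1": Formula(["lv1"], lambda a: a["lv1"] != 7),
    "d": FALSE,
}
```

Here `is_tree_interpolant(I)` holds, whereas replacing `I["v1"]` with a formula over `lx` violates the shared-vocabulary condition, because `lx` does not occur in the label of the only non-descendant of v1 (the dummy root d).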
Since the tree interpolant \(\mathcal {I}\) effectively establishes the incorrectness of program \(\mathcal {P}\), the predicates used in \(\mathcal {I}\) serve as useful abstract values that the synthesizer (AGS) should consider during the synthesis task. Towards this goal, the LearnAbstractDomain algorithm iterates over each predicate used in \(\mathcal {I}\) (lines 7–8 in Fig. 4) and converts it to a suitable template by replacing the constants and variables used in \(\mathcal {I}(v)\) with symbolic names (or “holes”). Because the original predicates used in \(\mathcal {I}\) may be too specific for the current input-output example, extracting templates from the interpolant allows our method to learn reusable abstract domains.
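The template-extraction step can be illustrated with a short Python sketch. The predicate syntax here is an assumption made for illustration: a predicate is a nested tuple `(operator, arg1, ..., argk)` whose leaves are variable names (strings) or integer constants. Every variable is replaced by a single hole `α` and every constant by a numbered hole `c1`, `c2`, ...; the helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple, Union

# Hypothetical predicate syntax: (operator, arg1, ..., argk), where an arg is a
# nested tuple, a variable name (str), or an integer constant.
Term = Union[str, int, Tuple]

HOLE_VAR = "α"  # symbolic name for the abstracted value


def to_template(pred: Term) -> Term:
    """Replace variables with HOLE_VAR and constants with numbered holes c1, c2, ..."""
    count = 0

    def walk(t: Term) -> Term:
        nonlocal count
        if isinstance(t, tuple):
            op, *args = t
            return (op, *(walk(a) for a in args))  # keep the operator symbol
        if isinstance(t, int):
            count += 1
            return f"c{count}"
        return HOLE_VAR  # any variable name

    return walk(pred)


def templates_from_interpolant(preds: Iterable[Term]) -> Set[Term]:
    """Collect templates from the predicates of an interpolant (skipping true/false)."""
    return {to_template(p) for p in preds if isinstance(p, tuple)}
```

For instance, `to_template(("!=", ("len", "v1"), 7))` yields `("!=", ("len", "α"), "c1")`, and predicates that differ only in their constants collapse to the same template, which is what makes the learned domain reusable across examples.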
Example 3
Given the tree interpolant \(\mathcal {I}\) from Example 1, LearnAbstractDomain extracts two predicate templates by replacing the constants and variables in the predicates of \(\mathcal {I}\), such as \(len (v_1) \ne 7\), with symbolic holes.
5 Synthesis of Abstract Transformers
The key part of our LearnTransformers procedure is the inner loop (lines 5–8) for inferring each of these \({\varvec{f}}_j\)’s. Specifically, given an output predicate template \(\chi '_j\), our algorithm first generates a set of input-output examples E of the form \([p_1, \ldots , p_n] \mapsto p_0\) such that \([\![ F(p_1, \ldots , p_n) ]\!]^\sharp = p_0\) is a sound (albeit overly specific) transformer. Essentially, each \(p_i\) is a concrete instantiation of a predicate template, so the examples E generated at line 6 of the algorithm can be viewed as sound input-output examples for the general symbolic transformer given in Eq. 1. (We describe the GenerateExamples procedure in Sect. 5.1.)
Once we generate these examples E, the next step of the algorithm is to learn the unknown coefficients of matrix \(P_j\) from Eq. 5 by solving a system of linear equations (line 7). Specifically, observe that we can use each input-output example \([p_1, \ldots , p_n] \mapsto p_0\) in E to construct one row of Eq. 4. In particular, we can directly extract \({\varvec{c}} = {\varvec{c}}_1, \ldots , {\varvec{c}}_n\) from \(p_1, \ldots , p_n\) and the corresponding value of \({\varvec{f}}_j({\varvec{c}})\) from \(p_0\). Since we have one instantiation of Eq. 4 for each of the input-output examples in E, the problem of inferring matrix \(P_j\) now reduces to solving a system of linear equations of the form \(A P_j^T = B\), where A is an \(|E| \times (|{\varvec{c}}| + 1)\) (input) matrix and B is an \(|E| \times |{\varvec{f}}_j|\) (output) matrix; the extra column of A holds the constant 1 and accounts for the offset of the affine function. Thus, a solution to the equation \(A P_j^T = B\) generated from E corresponds to a candidate solution for matrix \(P_j\), which in turn uniquely defines \({\varvec{f}}_j\).
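The Solve step can be sketched as exact Gauss-Jordan elimination over the rationals. This is an illustrative sketch, not the JLinAlg-based code in Atlas: `solve_affine` is a hypothetical name, each example is assumed to be given as a pair (constants c, observed value of f_j(c)), at least one example is assumed, and free unknowns of an underdetermined system are set to 0, so the result is only a candidate that still needs the soundness check.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[int], Sequence[int]]  # (constants c, observed f_j(c))


def solve_affine(examples: Sequence[Example]) -> Optional[List[List[Fraction]]]:
    """Solve A P^T = B exactly, where row i of A is [c_i, 1] and row i of B is f_j(c_i).

    Returns P (|f_j| rows, |c| + 1 columns), or None if the system is inconsistent,
    i.e., no affine function fits the examples.
    """
    A = [[Fraction(v) for v in c] + [Fraction(1)] for c, _ in examples]
    B = [[Fraction(v) for v in out] for _, out in examples]
    rows, cols, outs = len(A), len(A[0]), len(B[0])
    M = [A[i] + B[i] for i in range(rows)]  # augmented matrix [A | B]
    pivots: List[int] = []
    r = 0
    for col in range(cols):  # Gauss-Jordan elimination over the rationals
        pr = next((i for i in range(r, rows) if M[i][col] != 0), None)
        if pr is None:
            continue  # no pivot: this unknown is free
        M[r], M[pr] = M[pr], M[r]
        pivot = M[r][col]
        M[r] = [v / pivot for v in M[r]]
        for i in range(rows):
            if i != r and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
        if r == rows:
            break
    # A zero row in A with a nonzero right-hand side: no affine f_j exists.
    for i in range(r, rows):
        if any(M[i][cols + k] != 0 for k in range(outs)):
            return None
    X = [[Fraction(0)] * outs for _ in range(cols)]  # X = P^T, free unknowns = 0
    for i, col in enumerate(pivots):
        for k in range(outs):
            X[col][k] = M[i][cols + k]
    return [[X[col][k] for col in range(cols)] for k in range(outs)]
```

For example, the length examples \([3, 2] \mapsto 5\), \([1, 4] \mapsto 5\), \([0, 0] \mapsto 0\) give \(P = [[1, 1, 0]]\), i.e., \(c_0 = c_1 + c_2\), whereas the examples \([1] \mapsto 1\), \([2] \mapsto 4\), \([3] \mapsto 9\) are not fit by any affine function and yield None. Exact rational arithmetic avoids the rounding issues a floating-point least-squares fit would introduce.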
Observe that the call to Solve at line 7 may return null if no affine function exists. Furthermore, any nonnull \({\varvec{f}}_j\) returned by Solve is just a candidate solution and may not satisfy Eq. 5. For example, this situation can arise if we do not have sufficiently many examples in E and end up discovering an affine function that is “overfitted” to the examples. Thus, the validity check at line 8 of the algorithm ensures that the learned transformers are actually sound.
5.1 Example Generation
In our discussion so far, we assumed an oracle that is capable of generating valid input-output examples for a given transformer. We now explain our GenerateExamples procedure from Fig. 6, which essentially implements this oracle. In a nutshell, the goal of GenerateExamples is to synthesize input-output examples of the form \([p_1, \ldots , p_n] \mapsto p_0\) such that \([\![ F(p_1, \ldots , p_n) ]\!]^\sharp = p_0\) is sound, where each \(p_i\) is a concrete (rather than symbolic) predicate.

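Specifically, GenerateExamples samples concrete inputs \((s_1, \ldots , s_n)\) for F (drawn from a distribution \(R_F\) over the domain of F), computes the concrete output \(s_0 = [\![ F(s_1, \ldots , s_n) ]\!]\), and invokes a procedure Abstract that returns, for each \(s_i\), a set \(A_i\) of predicates such that every \(p_i \in A_i\) satisfies the following conditions:

• \(p_i\) is an instantiation of template \(\chi _i\).

• \(p_i\) is a sound overapproximation of \(s_i\) (i.e., \(s_i \in \gamma (p_i)\)).

• For any other \(p_i'\) satisfying the above two conditions, \(p_i'\) is not logically stronger than \(p_i\).

In other words, we assume that Abstract returns a set of “best” sound abstractions of \((s_0, \ldots , s_n)\) under predicate templates \((\chi _0, \ldots , \chi _n)\).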
Next, given abstractions \((A_0, \ldots , A_n)\) for \((s_0, \ldots , s_n)\), we consider each candidate abstract example of the form \([p_1, \ldots , p_n] \mapsto p_0\) where \(p_i \in A_i\). Even though each \(p_i\) is a sound abstraction of \(s_i\), the example \([p_1, \ldots , p_n] \mapsto p_0\) may not be valid according to the semantics of operator F, because \(p_0\) is only guaranteed to overapproximate the particular output \(s_0\), not every output that F produces on inputs satisfying \(p_1, \ldots , p_n\). Thus, the validity check at line 8 ensures that each example added to E is in fact valid.
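A toy version of GenerateExamples for a single operator is sketched below. The assumptions are ours, not the paper's: F is string concatenation, the only template is len(α) = c, Abstract returns the exact length (its best abstraction under that template), inputs are sampled uniformly over a two-letter alphabet, and the validity check exhaustively enumerates all strings of the given lengths over that alphabet instead of calling a decision procedure. All names are hypothetical.

```python
import random
from itertools import product
from typing import Iterator, Set, Tuple

ALPHABET = "ab"  # assumed tiny alphabet so validity can be checked exhaustively
MAX_LEN = 4      # assumed bound on sampled string lengths


def concat(s1: str, s2: str) -> str:
    """Concrete semantics of the example operator F (string concatenation)."""
    return s1 + s2


def abstract_len(s: str) -> Set[int]:
    """Best abstraction of s under the template len(alpha) = c: the constant |s|."""
    return {len(s)}


def strings_of_len(n: int) -> Iterator[str]:
    return ("".join(t) for t in product(ALPHABET, repeat=n))


def is_valid(c1: int, c2: int, c0: int) -> bool:
    """Check [len = c1, len = c2] -> len = c0 for all inputs over ALPHABET (bounded)."""
    return all(len(concat(a, b)) == c0
               for a in strings_of_len(c1) for b in strings_of_len(c2))


def generate_examples(num_samples: int, seed: int = 0) -> Set[Tuple[Tuple[int, int], int]]:
    rng = random.Random(seed)
    examples = set()
    for _ in range(num_samples):
        # Sample concrete inputs (s1, s2) and compute s0 = F(s1, s2).
        s1 = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, MAX_LEN)))
        s2 = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, MAX_LEN)))
        s0 = concat(s1, s2)
        # Every combination of best abstractions is a candidate abstract example.
        for c1, c2, c0 in product(abstract_len(s1), abstract_len(s2), abstract_len(s0)):
            if is_valid(c1, c2, c0):  # keep only examples that are actually valid
                examples.add(((c1, c2), c0))
    return examples
```

Because the length abstraction is exact here, every candidate passes the check; with coarser templates, some candidate examples would fail `is_valid` and be discarded, as described above. Each surviving example `((c1, c2), c0)` supplies one row of the linear system used to learn the affine function.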
Example 4
6 Soundness and Completeness
In this section we present theorems stating some of the soundness, completeness, and termination guarantees of our approach. All proofs can be found in the extended version of this paper [21].
Theorem 1
(Soundness of LearnTransformers). Let \(\mathcal {T}\) be the set of transformers returned by LearnTransformers. Then, every \(\tau \in \mathcal {T}\) is sound according to Eq. 2.
The remaining theorems are predicated on the assumptions that for each DSL construct F and input predicate templates \(\chi _1, \ldots , \chi _n\) (i) there exists a unique best abstract transformer and (ii) the strongest transformer expressible in Eq. 2 is logically equivalent to the unique best transformer. Thus, before stating these theorems, we first state what we mean by a unique best abstract transformer.
Definition 3
(Unique best function). Consider a family of transformers of the shape \([\![ F(\chi _1(x_1, {\varvec{c}}_1), \ldots , \chi _n(x_n, {\varvec{c}}_n)) ]\!]^\sharp = \chi '(y, \star )\). We say that \({\varvec{f}}\) is the unique best function for \((F, \chi _1, \ldots , \chi _n, \chi ')\) if (a) replacing \(\star \) with \({\varvec{f}}\) yields a sound transformer, and (b) replacing \(\star \) with any other \({\varvec{f'}}\) yields a transformer that is either unsound or strictly worse (i.e., \(\chi '(y, {\varvec{f}}) \Rightarrow \chi '(y, {\varvec{f}}')\) and \(\chi '(y, {\varvec{f}}') \not \Rightarrow \chi '(y, {\varvec{f}})\)).
We now define unique best transformer in terms of unique best function:
Definition 4
(Unique best transformer). Let F be a DSL construct and let \((\chi _1, \ldots , \chi _n) \in \mathcal {A}^n\) be the input templates for F. We say that the abstract transformer \(\tau \) is a unique best transformer for \(F, \chi _1, \ldots , \chi _n\) if (a) \(\tau \) is sound, and (b) for any predicate template \(\chi \in \mathcal {A}\), we have \((\chi , {\varvec{f}}) \in \mathsf {Outputs}(\tau )\) if and only if \({\varvec{f}}\) is a unique best function for \((F, \chi _1, \ldots , \chi _n, \chi )\) for some affine \({\varvec{f}}\).
Definition 5
(Complete sampling oracle). Let F be a construct, \(\mathcal {A}\) an abstract domain, and \(R_F\) a probability distribution over \(\textsc {Domain}(F)\) with finite support S. Further, for any input predicate templates \(\chi _1, \ldots , \chi _n\) and output predicate template \(\chi _0\) in \(\mathcal {A}\) admitting a unique best function \({\varvec{f}}\), let \(C(\chi _0, \ldots , \chi _n)\) be the set of tuples \((c_0, \ldots , c_n)\) such that \((\chi _0(y,c_0), \chi _1(x_1,c_1), \ldots , \chi _n(x_n,c_n)) \in A_0 \times \cdots \times A_n\) and \(c_0 = {\varvec{f}}(c_1, \ldots , c_n)\), where \(A_0 \times \cdots \times A_n = \textsc {Abstract}(s_0, \chi _0, \ldots , s_n, \chi _n)\), \((s_1, \ldots , s_n) \in S\), and \(s_0 = [\![ F(s_1, \ldots , s_n) ]\!]\). The distribution \(R_F\) is a complete sampling oracle if \(C(\chi _0, \ldots , \chi _n)\) has full rank for all \(\chi _0, \ldots , \chi _n\).
The following theorem states that LearnTransformers is guaranteed to synthesize the best transformer if a unique one exists:
Theorem 2
(Completeness of LearnTransformers). Given an abstract domain \(\mathcal {A}\) and a complete sampling oracle \(R_F\) for \(\mathcal {A}\), LearnTransformers terminates. Further, let \(\mathcal {T}\) be the set of transformers returned and let \(\tau \) be the unique best transformer for DSL construct F and input predicate templates \(\chi _1, \ldots , \chi _n \in \mathcal {A}^n\). Then we have \(\tau \in \mathcal {T}\).
Using this completeness (modulo unique best transformer) result, we can now state the termination guarantees of our LearnAbstractions algorithm:
Theorem 3
(Termination of LearnAbstractions). Given a complete sampling oracle \(R_F\) for every abstract domain and the unique best transformer assumption, if there exists a solution for every problem \(\mathcal {E}_i \in {\varvec{\mathcal {E}}}\), then LearnAbstractions terminates.
7 Implementation and Evaluation
We have implemented the proposed method as a new tool called Atlas, which is written in Java. Atlas takes as input a set of training problems, an Abstraction-Guided Synthesizer (AGS), and a DSL and returns an abstract domain (in the form of predicate templates) and the corresponding transformers. Internally, Atlas uses the Z3 theorem prover [22] to compute tree interpolants and the JLinAlg linear algebra library [23] to solve linear equations.
We evaluate Atlas by answering the following research questions:
 1. How does Atlas perform during training? That is, how many training problems does it require and how long does training take?
 2. How useful are the abstractions learned by Atlas in the context of synthesis?
7.1 Abstraction Learning
To answer our first question, we use Atlas to automatically learn abstractions for two application domains: (i) string manipulations and (ii) matrix transformations. We provide Atlas with the DSLs used in [14] and employ Blaze as the underlying Abstraction-Guided Synthesizer. Axiomatic semantics for each DSL construct were given in the theory of equality with uninterpreted functions.
Training Set Information. For the string domain, our training set consists of exactly the four problems used as motivating examples in the BlinkFill paper [17]. Specifically, each training problem consists of 4–6 examples that demonstrate the desired string transformation. For the matrix domain, our training set consists of four (randomly selected) synthesis problems taken from online forums. Since almost all online posts contain a single input-output example, each training problem includes one example illustrating the desired matrix transformation.
Looking at the right side of Fig. 7, we also observe similar results for the matrix domain. In particular, Atlas learns 10 predicate templates and 59 abstract transformers in a total of 22.5 s. Furthermore, Atlas converges to the final abstract domain after processing the first three problems^{5} and the number of iterations for each training instance is also quite small (ranging from 1 to 3).
7.2 Evaluating the Usefulness of Learned Abstractions
To answer our second question, we integrated the abstractions synthesized by Atlas into the Blaze meta-synthesizer. In the remainder of this section, we refer to all instantiations of Blaze using the Atlas-generated abstractions as Blaze \(^\star \). To assess how useful the automatically generated abstractions are, we compare Blaze \(^\star \) against Blaze \(^\dagger \), which refers to the manually-constructed instantiations of Blaze described in [14].
Benchmark Information. For the string domain, our benchmark suite consists of (1) all 108 string transformation benchmarks that were used to evaluate Blaze \(^\dagger \) and (2) 40 additional challenging problems collected from online forums that involve manipulating file paths, URLs, etc. The number of examples for each benchmark ranges from 1 to 400, with a median of 7 examples. For the matrix domain, our benchmark set includes (1) all 39 matrix transformation benchmarks in the Blaze \(^\dagger \) benchmark suite and (2) 20 additional challenging problems collected from online forums. We emphasize that the set of benchmarks used for evaluating Blaze \(^\star \) is completely disjoint from the set of synthesis problems used for training Atlas.
Experimental Setup. We evaluate Blaze \(^\star \) and Blaze \(^\dagger \) using the same DSLs from the Blaze paper [14]. For each benchmark, we provide the same set of input-output examples to Blaze \(^\star \) and Blaze \(^\dagger \), and use a time limit of 20 min per synthesis task.
Main Results. Our main evaluation results are summarized in Fig. 8. The key observation is that Blaze \(^\star \) consistently improves upon Blaze \(^\dagger \) for both string and matrix transformations. In particular, Blaze \(^\star \) not only solves more benchmarks than Blaze \(^\dagger \) for both domains, but also achieves about an order of magnitude speedup on average for the common benchmarks that both tools can solve. Specifically, for the string domain, Blaze \(^\star \) solves 133 (out of 148) benchmarks within an average of 2.8 s and achieves an average 8.3\(\times \) speedup over Blaze \(^\dagger \). For the matrix domain, we also observe a very similar result where Blaze \(^\star \) leads to an overall speedup of 9.2\(\times \) on average.
In summary, this experiment confirms that the abstractions discovered by Atlas are indeed useful and that they outperform manually-crafted abstractions despite eliminating human effort.
8 Related Work
To our knowledge, this paper is the first one to automatically learn abstract domains and transformers that are useful for program synthesis. We also believe it is the first to apply interpolation to program synthesis, although interpolation has been used to synthesize other artifacts such as circuits [24] and strategies for infinite games [25]. In what follows, we briefly survey existing work related to program synthesis, abstraction learning, and abstract transformer computations.
Program Synthesis. Our work is intended to complement example-guided program synthesis techniques that utilize program abstractions to prune the search space [4, 14, 15, 16]. For example, Simpl [15] uses abstract interpretation to speed up search-based synthesis and applies this technique to the generation of imperative programs for introductory programming assignments. Similarly, Scythe [16] and Morpheus [4] perform enumeration over program sketches and use abstractions to reject sketches that do not have any valid completion. Somewhat different from these techniques, Blaze constructs a finite tree automaton that accepts all programs whose behavior is consistent with the specification according to the DSL’s abstract semantics. We believe that the method described in this paper can be useful to all such abstraction-guided synthesizers.
Abstraction Refinement. In verification, as opposed to synthesis, there have been many works that use Craig interpolants to refine abstractions [20, 26, 27]. Typically, these techniques generalize the interpolants to abstract domains by extracting a vocabulary of predicates, but they do not generalize by adding parameters to form templates. In our case, this is essential because interpolants derived from fixed input values are too specific to be directly useful. Moreover, we reuse the resulting abstractions for subsequent synthesis problems. In verification, this would be analogous to reusing an abstraction from one property or program to the next. It is conceivable that template-based generalization could be applied in verification to facilitate such reuse.
Abstract Transformers. Many verification techniques use logical abstract domains [28, 29, 30, 31, 32]. Some of these, following Yorsh et al. [33], use sampling with a decision procedure to evaluate the abstract transformer [34]. Interpolation has also been used to compile efficient symbolic abstract transformers [35]. However, these techniques are restricted to finite domains or domains of finite height to allow convergence. Here, we use infinite parameterized domains to obtain better generalization; hence, the abstract transformer computation is more challenging. Nonetheless, the approach might also be applicable in verification.
9 Limitations
While this paper takes a first step towards automatically inferring useful abstractions for synthesis, our proposed method has the following limitations:
Shapes of Transformers. Following prior work [14], our algorithm assumes that abstract transformers have the shape given in Eq. 1. We additionally assume that constants \({\varvec{c}}\) used in predicate templates are numeric values and that functions in Eq. 1 are affine. This assumption holds in several domains considered in prior work [4, 14] and allows us to develop an efficient learning algorithm that reduces the problem to solving a system of linear equations.
DSL Semantics. Our method requires the DSL designer to provide the DSL’s logical semantics. We believe that giving logical semantics is much easier than coming up with useful abstractions, as it does not require insights about the internal workings of the synthesizer. Furthermore, our technique could, in principle, also work without logical specifications, although the learned abstract domain may not be as effective (see Footnote 3 in Sect. 4), and the synthesized transformers would not be provably sound.
UBT Assumption. Our completeness and termination theorems are predicated on the unique best transformer (UBT) assumption. While this assumption holds in our evaluation, it may not hold in general. However, as mentioned in Sect. 6, we can always guarantee termination by including the concrete predicates used in the interpolant \(\mathcal {I}\) in addition to the symbolic templates extracted from \(\mathcal {I}\).
10 Conclusion
We proposed a new technique for automatically instantiating abstraction-guided synthesis frameworks in new domains. Given a DSL and a few training problems, our method automatically discovers a useful abstract domain and the corresponding transformers for each DSL construct. From a technical perspective, our method uses tree interpolation to extract reusable templates from failed synthesis attempts and automatically synthesizes unique best transformers if they exist. We have incorporated the proposed approach into the Blaze meta-synthesizer and shown that the abstractions discovered by Atlas are very useful, allowing Blaze to achieve significantly better results than manually crafted abstractions.
While we have applied the proposed technique to program synthesis, we believe that some of the ideas introduced here are more broadly applicable. For instance, the idea of extracting reusable predicate templates from interpolants and synthesizing transformers in a data-driven way could also be useful in the context of program verification.
Footnotes
1. Atlas stands for AuTomated Learning of AbStractions.
2. Without loss of generality, we assume that programs take a single input x, as we can always represent multiple inputs as a list.
3. Here, we assume access to the DSL’s axiomatic semantics. If this is not the case (i.e., we are only given the DSL’s operational semantics), we can still annotate each node as \(v = c\) where c denotes the output of the partial program rooted at node v when executed on \({e_{in }}\). However, this may affect the quality of the resulting interpolant.
4. We assume that \(\chi _1^{\prime }, \ldots , \chi _m^{\prime }\) are distinct.
5. The learned abstractions can be found in the extended version of this paper [21].
References
1. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. In: Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL, pp. 317–330. ACM (2011)
2. Singh, R., Gulwani, S.: Transforming spreadsheet data types using examples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL, pp. 343–356. ACM (2016)
3. Wang, X., Gulwani, S., Singh, R.: FIDEX: filtering spreadsheet data using examples. In: OOPSLA, pp. 195–213. ACM (2016)
4. Feng, Y., Martins, R., Van Geffen, J., Dillig, I., Chaudhuri, S.: Component-based synthesis of table consolidation and transformation tasks from examples. In: PLDI, pp. 422–436. ACM (2017)
5. Wang, X., Dillig, I., Singh, R.: Synthesis of data completion scripts using finite tree automata. Proc. ACM Program. Lang. 1(OOPSLA), 62:1–62:26 (2017)
6. Yaghmazadeh, N., Wang, X., Dillig, I.: Automated migration of hierarchical data to relational tables using programming-by-example. In: Proceedings of the VLDB Endowment (2018)
7. Gascón, A., Tiwari, A., Carmer, B., Mathur, U.: Look for the proof to find the program: decorated-component-based program synthesis. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 86–103. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_5
8. Tiwari, A., Gascón, A., Dutertre, B.: Program synthesis using dual interpretation. In: Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 482–497. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_33
9. Feng, Y., Martins, R., Wang, Y., Dillig, I., Reps, T.W.: Component-based synthesis for complex APIs. In: POPL, vol. 52, pp. 599–612. ACM (2017)
10. Gvero, T., Kuncak, V., Kuraj, I., Piskac, R.: Complete completion using types and weights. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 27–38. ACM (2013)
11. Mandelin, D., Xu, L., Bodík, R., Kimelman, D.: Jungloid mining: helping to navigate the API jungle. In: Proceedings of the 26th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 48–61. ACM (2005)
12. Feser, J.K., Chaudhuri, S., Dillig, I.: Synthesizing data structure transformations from input-output examples. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 229–239. ACM (2015)
13. Polikarpova, N., Kuraj, I., Solar-Lezama, A.: Program synthesis from polymorphic refinement types. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 522–538. ACM (2016)
14. Wang, X., Dillig, I., Singh, R.: Program synthesis using abstraction refinement. Proc. ACM Program. Lang. 2(POPL), 63:1–63:30 (2017)
15. So, S., Oh, H.: Synthesizing imperative programs from examples guided by static analysis. In: Ranzato, F. (ed.) SAS 2017. LNCS, vol. 10422, pp. 364–381. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66706-5_18
16. Wang, C., Cheung, A., Bodik, R.: Synthesizing highly expressive SQL queries from input-output examples. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 452–466. ACM (2017)
17. Singh, R.: BlinkFill: semi-supervised programming by example for syntactic string transformations. Proc. VLDB Endow. 9(10), 816–827 (2016)
18. Blanc, R., Gupta, A., Kovács, L., Kragl, B.: Tree interpolation in Vampire. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.) LPAR 2013. LNCS, vol. 8312, pp. 173–181. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45221-5_13
19. McMillan, K.L.: Applications of Craig interpolants in model checking. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 1–12. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31980-1_1
20. McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1–13. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_1
21. Wang, X., Dillig, I., Singh, R.: Learning Abstractions for Program Synthesis. arXiv preprint arXiv:1804.04152 (2018)
22.
23. Keilhauer, A., Levy, S., Lochbihler, A., Ökmen, S., Thimm, G., Würzebesser, C.: JLinAlg: a Java library for linear algebra without rounding errors. Technical report (2003–2010). http://jlinalg.sourceforge.net/
24. Bloem, R., Egly, U., Klampfl, P., Könighofer, R., Lonsing, F.: SAT-based methods for circuit synthesis. In: Formal Methods in Computer-Aided Design, FMCAD 2014, 21–24 October 2014, Lausanne, Switzerland, pp. 31–34. IEEE (2014)
25. Farzan, A., Kincaid, Z.: Strategy synthesis for linear arithmetic games. Proc. ACM Program. Lang. 2(POPL), 61 (2017)
26. Beyer, D., Henzinger, T.A., Jhala, R., Majumdar, R.: The software model checker BLAST. Int. J. Softw. Tools Technol. Transf. 9(5–6), 505–525 (2007)
27. Albarghouthi, A., Li, Y., Gurfinkel, A., Chechik, M.: Ufo: a framework for abstraction- and interpolation-based software verification. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 672–678. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_48
28. Lev-Ami, T., Manevich, R., Sagiv, M.: TVLA: a system for generating abstract interpreters. In: Jacquart, R. (ed.) Building the Information Society. IIFIP, vol. 156, pp. 367–375. Springer, Boston, MA (2004). https://doi.org/10.1007/978-1-4020-8157-6_28
29. Lev-Ami, T., Sagiv, M.: TVLA: a system for implementing static analyses. In: Palsberg, J. (ed.) SAS 2000. LNCS, vol. 1824, pp. 280–301. Springer, Heidelberg (2000). https://doi.org/10.1007/978-3-540-45099-3_15
30. Pnueli, A., Ruah, S., Zuck, L.: Automatic deductive verification with invisible invariants. In: Margaria, T., Yi, W. (eds.) TACAS 2001. LNCS, vol. 2031, pp. 82–97. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45319-9_7
31. Lahiri, S.K., Bryant, R.E.: Constructing quantified invariants via predicate abstraction. In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 267–281. Springer, Heidelberg (2004)
32. Reps, T., Thakur, A.: Automating abstract interpretation. In: Jobstmann, B., Leino, K.R.M. (eds.) VMCAI 2016. LNCS, vol. 9583, pp. 3–40. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49122-5_1
33. Reps, T., Sagiv, M., Yorsh, G.: Symbolic implementation of the best transformer. In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 252–266. Springer, Heidelberg (2004)
34. Thakur, A., Reps, T.: A method for symbolic computation of abstract operations. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 174–192. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_17
35. Jhala, R., McMillan, K.L.: Interpolant-based transition relation approximation. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp. 39–51. Springer, Heidelberg (2005). https://doi.org/10.1007/11513988_6
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.