Deductive Program Repair
Abstract
We present an approach to program repair and its application to programs with recursive functions over unbounded data types. Our approach formulates program repair in the framework of deductive synthesis that uses existing program structure as a hint to guide synthesis. We introduce a new specification construct for symbolic tests. We rely on such user-specified tests as well as automatically generated ones to localize the fault and speed up synthesis. Our implementation is able to eliminate errors within seconds from a variety of functional programs, including symbolic computation code and implementations of functional data structures. The resulting programs are formally verified by the Leon system.
Keywords
Fault Localization, Recursive Function, Recursive Call, Deduction Rule, Symbolic Test
1 Introduction
This paper explores the problem of automatically repairing programs written as a set of mutually recursive functions in a purely functional subset of Scala. We consider a function to be subject to repair if it does not satisfy its specification, expressed in the form of pre- and postconditions. The task of repair consists of automatically generating an alternative implementation that meets the specification. The repair problem has been studied in the past for reactive and pushdown systems [8, 10, 11, 19, 20, 26]. We view repair as generalizing, for example, the choose construct of complete functional synthesis [15], sketching [21, 22], and program templates [23], because the exact location and nature of the expressions to be synthesized are left to the algorithm. Repair is thus related to localization of error causes [12, 14, 27]. To speed up our repair approach, we do use coarse-grained error localization based on derived test inputs. However, the more precise nature of the fault is in fact an outcome of our tool, because the repair identifies a particular change that makes the program correct. Using tests alone as a criterion for correctness is appealing for performance reasons [7, 17, 18], but this can lead to erroneous repairs. We therefore leverage prior work [13] on verifying and synthesizing recursive functional programs with unbounded datatypes (trees, lists, integers) to provide strong correctness guarantees, while at the same time optimizing our technique to use automatically derived tests. By phrasing the problem of repair as one of synthesis and introducing tailored deduction rules that use the original implementation as a guide, we allow the repair-oriented synthesis procedure to automatically find correct fixes, in the worst case resorting to re-synthesizing the desired function from scratch.
To make the repair approach practical, we found it beneficial to extend the power and generality of the synthesis engine itself, as well as to introduce explicit support for symbolic tests in the specification language and the repair algorithm.
Contributions. The overall contribution of this paper is a new repair algorithm and its implementation inside a deductive synthesis framework for recursive functional programs. The specific new techniques we contribute are the following.

Exploration of similar expressions. We present an algorithm for expression repair based on a grammar for generating expressions similar to a given expression (according to an error model we propose). We use such grammars within our new generic symbolic term exploration routine, which leverages test inputs as well as an SMT solver, and efficiently explores the space of expressions that contain recursive calls whose evaluation depends on the expression being synthesized.

Fault localization. To narrow down repair to a program fragment, we localize the error by doing dynamic analysis using test inputs generated automatically from specifications. We combine two automatic sources of inputs: enumeration techniques and SMT-based techniques. We collect traces leading to erroneous executions and compute common prefixes of branching decisions. We show that this localization is in practice sufficiently precise to repair sizeable functions efficiently.

Symbolic examples. We propose an intuitive way of specifying possibly symbolic input-output examples using Scala pattern matching. This allows the user to partially specify a function without necessarily having to provide full inputs and outputs. Additionally, it enables the developer to easily describe properties of generic (polymorphic) functions. We present an algorithm for deriving new examples from existing ones, which improves the usefulness of example sets for fault localization and repair.
In our experience, the combination of formal specification and symbolic examples gives the user significant flexibility when specifying functions, and increases success rates when discovering and repairing program faults.

Integration into a deductive synthesis and verification framework. Our repair system is part of a deductive verification system, so it can automatically produce new inputs from specification, prove correctness of code for all inputs ranging over an unbounded domain, and synthesize program fragments using deductive synthesis rules that include common recursion schemas.
2 Deductive Guided Repair
We next describe our deductive repair framework. The framework currently works under several assumptions, which we consider reasonable given the state of the art in repair of infinite-state programs. We consider the specifications of functions as correct; the code is assumed wrong if it cannot be proven correct with respect to this specification for all of the infinitely many inputs. If the specification includes input-output tests, it follows that the repaired function must have the same behavior on these tests. We do not guarantee that the output of the function is the same as the original one on tests not covered by the specification, though the repair algorithm tends to preserve some of the existing behaviors due to the local nature of repair. It is the responsibility of the developer to sufficiently specify the function being repaired. Although underspecified benchmarks may produce unexpected expressions as repair solutions, we found that even partial specifications often yield the desired repairs. A particularly effective specification style in our experience is to give a partial specification that depends on all components of the structure (for example, one that describes a property of the set of stored elements), and then additionally provide a finite number of symbolic input-output tests. We assume that only one function of the program is invalid; the implementation of all other functions is considered valid as far as the repair of interest is concerned. Finally, we assume that all functions of the program, even the invalid one, terminate.
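The specification style described above can be sketched in plain Scala. This is only an illustration under stated assumptions: the sorted-insertion function, the helper names `content`, `isSorted` and `expected`, and the use of the standard `require`/`ensuring` runtime checks are all hypothetical choices made here, and they stand in for Leon's verified contracts rather than reproduce them. The postcondition is deliberately partial (it only constrains the set of stored elements), and a few symbolic input-output examples written with pattern matching pin down the ordering for short lists.

```scala
object SpecStyleSketch {
  // Hypothetical helpers, introduced only for this sketch.
  def content(l: List[Int]): Set[Int] = l.toSet

  def isSorted(l: List[Int]): Boolean =
    l.zip(l.drop(1)).forall { case (a, b) => a <= b }

  // Symbolic input-output examples: a and b stand for arbitrary integers.
  // None means that no example constrains this input.
  def expected(l: List[Int], v: Int): Option[List[Int]] = (l, v) match {
    case (Nil, a)              => Some(List(a))
    case (List(a), b) if b < a => Some(List(b, a))
    case (List(a), b)          => Some(List(a, b))
    case _                     => None
  }

  def insert(l: List[Int], v: Int): List[Int] = {
    require(isSorted(l)) // precondition
    l match {
      case Nil              => List(v)
      case h :: _ if v <= h => v :: l
      case h :: t           => h :: insert(t, v)
    }
  } ensuring { res =>
    // Partial postcondition over all stored elements, plus the examples.
    content(res) == content(l) + v && expected(l, v).forall(_ == res)
  }
}
```

With this split, swapping the two branches of `insert` would still satisfy the content-based postcondition but would violate one of the symbolic examples, which is the kind of failing test the repair procedure relies on.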

Test generation and verification. We combine enumeration and SMT-based techniques to either verify the validity of the function, or, if it is not valid, discover counterexamples (examples of misbehaviors).

Fault localization. Our localization algorithm then selects the smallest expression executed in all failing tests, modulo recursion.

Synthesis of similar expressions. This erroneous expression is replaced by a “program hole”. The now-incomplete function is sent to synthesis, with the previous expression used as a synthesis hint. (Neither the notion of holes nor the notion of synthesis hints has been introduced in prior work on deductive synthesis [13].)

Verification of the solution. Lastly, the system attempts to prove the validity of the discovered solution. Our results in Sect. 5, Fig. 4 indicate in which cases the synthesized function passed the verification.
2.1 Fault Localization
A contribution of our system is the ability to focus the repair problem on a small subpart of the function’s body that is responsible for its erroneous behavior. The underlying hypothesis is that most of the original implementation is correct. This technique allows us to reuse as much of the original implementation as possible and minimizes the size of the expression given to subsequent, more expensive techniques. Focusing also has the beneficial side effect of making repair more predictable, even in the presence of weak specifications: repair tends to produce programs that preserve some of the existing branches, and thus have the same behavior on the executions that use only these preserved branches. We rely on the list of examples that fail the function specification to lead us to the source of the problem: if all failing examples only use one branch of some branching expression in the program, then we assume that the error is contained in that branch. We define \(\mathcal {F}\) as the set of all inputs of collected failing tests (see Sect. 4). We describe focusing using the following rules.
The above rules use tests to locally approximate the validity of branches. They are sound only if \(\mathcal {F}\) is sufficiently large. Our system therefore performs an end-to-end verification for the complete solution, ensuring the overall soundness.
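The idea of computing common prefixes of branching decisions over failing executions can be sketched as follows. This is a minimal illustration, not the focusing rules themselves: the encoding of a trace as a list of branch labels and the names `Trace` and `commonPrefix` are assumptions made for this sketch. If every failing test in \(\mathcal {F}\) takes the same initial sequence of branches, the repair can be focused on the sub-expression reached by that sequence.

```scala
object FocusSketch {
  // Hypothetical encoding: one label per branching decision taken at run time.
  type Trace = List[String]

  // Longest sequence of branching decisions shared by all failing traces.
  def commonPrefix(traces: List[Trace]): Trace = traces match {
    case Nil => Nil
    case first :: rest =>
      rest.foldLeft(first) { (acc, t) =>
        acc.zip(t).takeWhile { case (x, y) => x == y }.map(_._1)
      }
  }
}
```

For example, if two failing runs take the branches `else, then, then` and `else, then, else`, the shared prefix `else, then` identifies the nested branch to focus on. Such a result is only a test-based approximation, which is why the complete solution is still verified afterwards.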
2.2 Guided Decompositions
2.3 Generating Recursive Calls
Our purely functional language often requires us to synthesize recursive implementations. Consequently, the synthesizer must be able to generate calls to the function currently being synthesized. However, we must take special care to avoid introducing calls resulting in a non-terminating implementation. (Such an erroneous implementation would be considered valid if it trivially satisfies the specification due to an inductive hypothesis over a non-well-founded relation.)
This relatively simple technique allows us to introduce recursive calls while filtering trivially non-terminating calls. In the case where it still introduces infinite recursion, we can discard the solution using a more expensive termination checker, though we found that this is seldom needed in practice.
2.4 Synthesis Within Repair
The repair-specific rules described earlier aim at solving repair problems according to the error model. Thanks to integration into the Leon synthesis framework, general synthesis rules also apply, which enables the repair of more intricate errors. This achieves an appealing combination between fast repairs for predictable errors and expressive, albeit slower, repairs for more complicated errors.
3 Counterexample-Guided Similar-Term Exploration
After following the overall structure of the original problem, it is often the case that the remaining erroneous branches can be fixed by applying small changes to their implementations. For instance, an expression calling a function might be wrong only in one of its arguments or have two of its arguments swapped. We exploit this assumption by considering different variations of the original expression. Due to the lack of a large code base in the pure Scala subset that Leon handles, we cannot use statistically informed techniques such as [9], so we define an error model following our intuition and experience from previous work.
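The two mistakes named above (one wrong argument, two swapped arguments) suggest how candidate expressions similar to the original one could be enumerated. The sketch below is an assumption-laden illustration, not the error model or grammar used by the tool: the tiny expression type and the functions `swaps`, `replacements` and `similar` are hypothetical, and the candidates produced here would still have to be checked against tests and by the solver.

```scala
object SimilarTermsSketch {
  // Hypothetical miniature expression language for this sketch.
  sealed trait Expr
  case class Var(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Call(fn: String, args: List[Expr]) extends Expr

  // Variants of a call with two of its arguments swapped.
  def swaps(c: Call): List[Expr] =
    for {
      i <- c.args.indices.toList
      j <- c.args.indices.toList if i < j
    } yield Call(c.fn, c.args.updated(i, c.args(j)).updated(j, c.args(i)))

  // Variants of a call with exactly one argument replaced by a term in scope.
  def replacements(c: Call, scope: List[Expr]): List[Expr] =
    for {
      i <- c.args.indices.toList
      e <- scope if e != c.args(i)
    } yield Call(c.fn, c.args.updated(i, e))

  // All distinct candidates that differ from the original expression.
  def similar(e: Expr, scope: List[Expr]): List[Expr] = e match {
    case c: Call => (swaps(c) ++ replacements(c, scope)).distinct.filter(_ != e)
    case _       => scope.filter(_ != e)
  }
}
```

For a call `f(x, y)` with `x`, `y` and `z` in scope, this yields five candidates, among them the swapped call `f(y, x)`. Keeping the candidate set this close to the original expression is what makes exploring it cheaper than unguided synthesis.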
4 Generating and Using Tests for Repair
Tests play an essential role in our framework, allowing us to gather information about the valid and invalid parts of the function. In this section we elaborate on how we select, generate, and filter examples of inputs and possibly outputs. Several components of our system then make use of these examples. We distinguish two kinds of tests: input tests and input-output tests. Namely, input tests provide valid inputs for the function according to its precondition, while input-output tests also specify the exact output corresponding to each input.
Extraction and Generation of Tests. Our system relies on three main sources for tests that are used to make the repair process more efficient.
(1) User-provided symbolic tests. Partial specifications use the passes construct, which can match more than one input and provides the expected output as an expression.
Having partially symbolic input-output examples strikes a good balance between literal examples and full functional specifications. From the symbolic tests, we generate concrete input-output examples by instantiating each pattern several times using enumeration techniques, and executing the output expression to yield an output value. For instance, from case Cons(a, Cons(b, Nil())) \(\Rightarrow \) a + b we generate tests by replacing a and b with all combinations of values from a finite set, including, for example, a test with input Cons(1, Cons(2, Nil())) and output 3. We generate up to 5 distinct tests per pattern, when possible. These symbolic specifications are the only form of tests provided by the developer; any other tests that our system uses are derived automatically.
(2) Generated input tests. We rely on the same enumeration technique to generate inputs satisfying the precondition of the function. Using a generate-and-test approach, we gather up to 400 valid input tests among the first 1000 enumerated.
(3) Solver-generated tests. Lastly, we rely on the underlying solvers for recursive functions of Leon [25] to generate counterexamples. Given that the function is invalid and that it terminates, the solver (which is complete for counterexamples) is guaranteed to eventually provide us with at least one failing test.
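The first two sources of tests can be illustrated with a small sketch. It is written in plain Scala under explicit assumptions: the `IntList` type, the functions `instantiate` and `validInputs`, and the way values are enumerated are hypothetical stand-ins for the tool's enumeration techniques. The limits (5 tests per pattern, 400 valid inputs among the first 1000 candidates) are passed as parameters and the values used below are those quoted in the text.

```scala
object SymbolicTestSketch {
  // Hypothetical list type mirroring the Cons/Nil pattern used in the text.
  sealed trait IntList
  case class Cons(head: Int, tail: IntList) extends IntList
  case class Nil() extends IntList

  // The symbolic example `case Cons(a, Cons(b, Nil())) => a + b`,
  // split into its input pattern and its output expression.
  val pattern: (Int, Int) => IntList = (a, b) => Cons(a, Cons(b, Nil()))
  val output: (Int, Int) => Int = (a, b) => a + b

  // Instantiate the pattern variables with all combinations of `values`
  // and keep at most `limit` distinct concrete input-output tests.
  def instantiate(values: List[Int], limit: Int): List[(IntList, Int)] = {
    val all = for { a <- values; b <- values } yield (pattern(a, b), output(a, b))
    all.distinct.take(limit)
  }

  // Generate-and-test: look at the first `examined` candidates and keep
  // at most `kept` of those that satisfy the precondition `pre`.
  def validInputs[A](candidates: Iterator[A], examined: Int, kept: Int)(pre: A => Boolean): List[A] =
    candidates.take(examined).filter(pre).take(kept).toList
}
```

Calling `instantiate(List(1, 2), 5)` produces four tests, one of which has input `Cons(1, Cons(2, Nil()))` and output `3`, as in the example above. The generate-and-test helper simply discards enumerated candidates that violate the precondition, so a restrictive precondition may yield fewer tests than the limit.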
Fig. 4. Automatically repaired functions using our system. For each operation we provide: a short description of the kind of error introduced, the overall program size, the size of the invalid function, the size of the erroneous expression we locate, and the size of the repaired version. We then provide the times our tool took to gather and classify tests, and to repair the erroneous expression. Finally, we mention whether the resulting expression verifies. The source of all benchmarks can be found on http://lara.epfl.ch/w/leonrepair (see also http://leon.epfl.ch).
5 Evaluation
We evaluate our implementation on a set of benchmarks in which we manually injected errors (Fig. 4). The programs mainly focus on data structure implementations and syntax tree operations. Each benchmark consists of algebraic datatype definitions and recursive functions that manipulate them, specified using strong yet still partial preconditions and postconditions. We manually introduced errors of different types in each copy of the benchmarks. We ran our tool unassisted until completion to obtain a repair, providing it only with the name of the file and the name of the function to repair (typically the choice of the function could also have been localized automatically by running the verification on the entire file). The experiments were run on an Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz with 16GB RAM, with 2GB given to the Java Virtual Machine. While the deductive reasoning supports parallelism in principle, our implementation is currently single-threaded.
The repairs listed in the evaluation are not only valid according to their specification, but were also manually validated by us to match the intended behavior. A failing proof thus does not indicate a wrong repair, but rather that our system was not able to automatically derive a proof of its correctness, often due to insufficient inductive invariants. We identify three scenarios under which repair itself may not succeed: if the assumptions mentioned in Sect. 2 are violated, if the necessary repair is either too big or outside of the scope of general synthesis, or if test collection does not yield sufficiently many interesting failing tests to locate the error.
6 Further Related Work
Much of the prior work focused on imperative programming, without native support for algebraic data types, making it typically infeasible to even automatically verify data structure properties of the kind that our benchmarks contain. The syntax-guided synthesis format [1, 2] does not support algebraic data types or a specific notion of repair (it could be used to specify some of the subproblems that our system generates, such as those of Sect. 3).
GenProg [7] and SemFix [17] accept as input a C program along with user-provided sets of passing and failing test cases, but no formal specification. Our technique for fault localization is not applicable to a sequential program with side effects, and these tools employ statistical fault localization techniques, based on program executions. GenProg applies no code synthesis, but tries to repair the program by iteratively deleting, swapping, or duplicating program statements, according to a genetic algorithm. SemFix, on the other hand, uses synthesis, but does not take into account the faulty expression while synthesizing. AutoFix-E/E2 [18] operates on Eiffel programs equipped with formal contracts. Formal contracts are used to automatically generate a set of passing and failing test cases, but not to verify candidate solutions. AutoFix-E uses an elaborate mechanism for fault localization, which combines syntactic, control flow and statistical dynamic analysis. It follows a synthesis approach with repair schemas, which reuse the faulty statement (e.g. as a branch of a conditional). Samanta et al. [20] propose abstracting a C program with a boolean constraint, repairing this constraint so that all assertions in the program are satisfied by repeatedly applying update schemas to it according to a cost model, and then concretizing the boolean constraint back to a repaired C program. Their approach needs developer intervention to define the cost model for each program, as well as at the concretization step. Logozzo et al. [16] present a repair suggestion framework based on static analysis provided by the CodeContracts static checker [5]; the properties checked are typically simpler than those in our case. In [6], Gopinath et al. repair data structure operations by picking an input which exposes a suspicious statement, then using a SAT solver to discover a corresponding concrete output that satisfies the specification. This concrete output is then abstracted to various possible expressions to yield candidate repairs, which are filtered with bounded verification. In their approach, Chandra et al. [3] consider an expression as a candidate for repair if substituting it with some concrete value fixes a failing test.
Repair has also been studied in the context of reactive and pushdown systems with otherwise finite control [8, 10, 11, 19, 20, 26]. In [26], the authors generate repairs that explicitly preserve subsets of traces of the original program, in a way strengthening the specification automatically. We deal with the case of functions from inputs to outputs equipped with contracts. In the case of a weak contract we provide only heuristic guarantees that the existing behaviors are preserved, arising from the tendency of our algorithm to reuse existing parts of the program.
7 Conclusions
We have presented an approach to program repair of mutually recursive functional programs, building on top of a deductive synthesis framework. The starting point gives it the ability to verify functions, find counterexamples, and synthesize small fragments of code. When doing repair, it has proven fruitful to first localize the error and then perform synthesis on a small fragment. Tests proved very useful in performing such localization, as well as for generally speeding up synthesis and repair. In addition to deriving tests by enumeration and verification, we have introduced a specification construct that uses pattern matching to describe symbolic tests, from which we efficiently derive concrete tests without invoking full-fledged verification. In the case of tests for recursive functions, we perform dependency analysis and introduce new ones to better localize the cause of the error. While localization of errors within conditional control flow can be done by analyzing test runs, the challenge remains to localize a change inside large expressions with nested function calls. We have introduced the notion of guided synthesis that uses the previous version of the code as a guide when searching for a small change to an existing large expression. The use of a guide is very flexible, and also allows us to repair multiple errors in some cases.
Our experiments with benchmarks of thousands of syntax tree nodes in size, including tree transformations and data structure operations, confirm that repair is more tractable than synthesis for functional programs. The existing (incorrect) expression provides a hint on useful code fragments from which to build a correct solution. Compared to unguided synthesis, the common case of repair remains more predictable and scalable. At the same time, the developer need not learn a notation for specifying holes or templates. We thus believe that repair is a practical way to deploy synthesis in software development.
References
1. Alur, R., Bodik, R., Dallal, E., Fisman, D., Garg, P., Juniwal, G., Kress-Gazit, H., Madhusudan, P., Martin, M.M.K., Raghothaman, M., Saha, S., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. To appear in Marktoberdorf NATO proceedings (2014). http://sygus.seas.upenn.edu/files/sygus_extended.pdf. Accessed 02 June 2015
2. Alur, R., Bodík, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-guided synthesis. In: FMCAD, pp. 1–17. IEEE (2013)
3. Chandra, S., Torlak, E., Barman, S., Bodík, R.: Angelic debugging. In: Taylor, R.N., Gall, H.C., Medvidovic, N. (eds.) ICSE, pp. 121–130. ACM, New York (2011)
4. de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
5. Fähndrich, M., Logozzo, F.: Static contract checking with abstract interpretation. In: Beckert, B., Marché, C. (eds.) FoVeOOS 2010. LNCS, vol. 6528, pp. 10–30. Springer, Heidelberg (2011)
6. Gopinath, D., Malik, M.Z., Khurshid, S.: Specification-based program repair using SAT. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 173–188. Springer, Heidelberg (2011)
7. Goues, C.L., Nguyen, T., Forrest, S., Weimer, W.: GenProg: a generic method for automatic software repair. TSE 38(1), 54–72 (2012)
8. Griesmayer, A., Bloem, R., Cook, B.: Repair of boolean programs with an application to C. In: Ball, T., Jones, R.B. (eds.) CAV 2006. LNCS, vol. 4144, pp. 358–371. Springer, Heidelberg (2006)
9. Gvero, T., Kuncak, V., Kuraj, I., Piskac, R.: Complete completion using types and weights. In: PLDI, pp. 27–38 (2013)
10. Jobstmann, B., Griesmayer, A., Bloem, R.: Program repair as a game. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp. 226–238. Springer, Heidelberg (2005)
11. Jobstmann, B., Staber, S., Griesmayer, A., Bloem, R.: Finding and fixing faults. JCSS 78(2), 441–460 (2012)
12. Jose, M., Majumdar, R.: Cause clue clauses: error localization using maximum satisfiability. In: Hall, M.W., Padua, D.A. (eds.) PLDI, pp. 437–446. ACM, New York (2011)
13. Kneuss, E., Kuraj, I., Kuncak, V., Suter, P.: Synthesis modulo recursive functions. In: Hosking, A.L., Eugster, P.T., Lopes, C.V. (eds.) OOPSLA, pp. 407–426. ACM, New York (2013)
14. Könighofer, R., Bloem, R.: Automated error localization and correction for imperative programs. In: Bjesse, P., Slobodová, A. (eds.) FMCAD, pp. 91–100. FMCAD Inc, Austin (2011)
15. Kuncak, V., Mayer, M., Piskac, R., Suter, P.: Functional synthesis for linear arithmetic and sets. STTT 15(5–6), 455–474 (2013)
16. Logozzo, F., Ball, T.: Modular and verified automatic program repair. In: Leavens, G.T., Dwyer, M.B. (eds.) OOPSLA, pp. 133–146. ACM, New York (2012)
17. Nguyen, H.D.T., Qi, D., Roychoudhury, A., Chandra, S.: SemFix: program repair via semantic analysis. In: Notkin, D., Cheng, B.H.C., Pohl, K. (eds.) ICSE, pp. 772–781. IEEE and ACM, New York (2013)
18. Pei, Y., Wei, Y., Furia, C.A., Nordio, M., Meyer, B.: Code-based automated program fixing. ArXiv e-prints (2011). arXiv:1102.1059
19. Samanta, R., Deshmukh, J.V., Emerson, E.A.: Automatic generation of local repairs for boolean programs. In: Cimatti, A., Jones, R.B. (eds.) FMCAD, pp. 1–10. IEEE, New York (2008)
20. Samanta, R., Olivo, O., Emerson, E.A.: Cost-aware automatic program repair. In: Müller-Olm, M., Seidl, H. (eds.) Static Analysis. LNCS, vol. 8723, pp. 268–284. Springer, Heidelberg (2014)
21. Solar-Lezama, A.: Program sketching. STTT 15(5–6), 475–495 (2013)
22. Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S.A., Saraswat, V.A.: Combinatorial sketching for finite programs. In: ASPLOS, pp. 404–415 (2006)
23. Srivastava, S., Gulwani, S., Foster, J.S.: Template-based program verification and program synthesis. STTT 15(5–6), 497–518 (2013)
24. Suter, P.: Programming with Specifications. Ph.D. thesis, EPFL, December 2012
25. Suter, P., Köksal, A.S., Kuncak, V.: Satisfiability modulo recursive programs. In: Yahav, E. (ed.) Static Analysis. LNCS, vol. 6887, pp. 298–315. Springer, Heidelberg (2011)
26. von Essen, C., Jobstmann, B.: Program repair without regret. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 896–911. Springer, Heidelberg (2013)
27. Zeller, A., Hildebrandt, R.: Simplifying and isolating failure-inducing input. TSE 28(2), 183–200 (2002)