Automatic Repair for Network Programs

Debugging imperative network programs is a difficult task for operators as it requires understanding various network modules and complicated data structures. For this purpose, this paper presents an automated technique for repairing network programs with respect to unit tests. Given as input a faulty network program and a set of unit tests, our approach localizes the fault through symbolic reasoning, and synthesizes a patch ensuring that the repaired program passes all unit tests. It applies domain-specific abstraction to simplify network data structures and exploits function summary reuse for modular symbolic analysis. We have implemented the proposed techniques in a tool called NetRep and evaluated it on 10 benchmarks adapted from real-world software-defined network controllers. The evaluation results demonstrate the effectiveness and efficiency of NetRep for repairing network programs.

In the SDN framework, a logically centralized control plane generates rules that are installed into data planes, which in turn decide the routing of packets throughout the network. While network verification is a well-studied field in which operators can be alerted to incorrectly installed rules [3,4,22], little prior work has explored the problem of automatically repairing the corresponding bug in the control plane, especially for control planes written in widely used general-purpose languages such as Java or Python. Existing work mostly restricts the target to control plane programs written in domain-specific languages such as Datalog [51,17].
Since networks cannot tolerate even small mistakes, and most network operators are not trained in programming, debugging and repair tools in this domain should prioritize accuracy and automation. This means that many existing techniques for general program repair are not suitable for this domain, as they trade accuracy for heuristics in order to scale with the size of the analyzed programs and the number of potential bugs discovered.
Motivated by the demand for automated repair and the limitations of existing techniques, we develop a precise and scalable program repair technique for network programs. Specifically, our repair technique takes as input a network program and a set of unit tests, reveals the program location that causes the test failure, and automatically generates a patch to fix the program. In the setting of SDN, a unit test corresponds to an incorrectly installed routing rule generated by the control plane from a reported packet. Such unit tests can be discovered by a separate network verification procedure [3,4,22].
Our main idea is to use symbolic reasoning over constraints that capture the semantics of the program for accurate repair, and modular analysis to improve efficiency. We extend the encoding techniques from prior work [21,12] to support object-oriented features in Java. We also develop a new approach that focuses the analysis on one function at a time and gradually narrows down the range of faulty statements along with the specification of the expected behavior.
The proposed technique is implemented in an automatic network program repair tool called NetRep. To evaluate NetRep, we adapt 10 benchmarks from real-world faulty network programs in Floodlight that require changing up to 3 lines of code to fix and apply NetRep to repair the benchmarks automatically. The experimental results show that NetRep is able to find a repair that passes all unit tests for faulty programs up to 738 lines of code for 8 benchmarks using 2 or 3 test cases, outperforming a state-of-the-art repair tool for general Java programs. Furthermore, NetRep is efficient in terms of repair time, requiring only an average running time of 744 seconds across all benchmarks.
Contributions. We make the following main contributions in this paper:
- We present an automated program repair technique that aims to help network operators debug and fix network controller programs automatically.
- We describe a bug localization approach based on symbolic execution and constraint solving for programs with imperative object-oriented features such as virtual function calls.
- We propose novel modular analysis techniques to effectively scale up the symbolic reasoning for automatic repair.
- We develop a tool called NetRep based on the proposed techniques and evaluate it using 10 benchmarks adapted from real-world network programs. The evaluation results demonstrate that NetRep is effective for bug localization and able to generate correct patches for realistic network programs.

Overview
In this section, we give a high-level overview of our repair techniques and walk through the NetRep tool using an example adapted from the Floodlight SDN controller [9]. Figure 1 shows a simplified code snippet about firewall rules in Floodlight. Specifically, the program consists of two classes: FirewallRule and MacAddr. The FirewallRule class describes rules enforced by the firewall, including information about source and destination MAC addresses. The MacAddr class is an auxiliary data structure that stores the raw value of a MAC address.
The network program shown in Figure 1 is problematic because the isSameAs function compares two MAC addresses using the != operator rather than a negation of the equals function. The != operator only compares two objects based on their memory addresses, whereas the intent of the developer is to check whether two MAC addresses have the same raw value. The bug is revealed by the unit test in Figure 2 and was then confirmed and fixed by the Floodlight developers. Next, we illustrate how NetRep localizes this bug based on the unit tests test(1, 2) = false and test(1, 1) = true and automatically synthesizes a patch to fix it.
At a high level, NetRep enters a loop that iteratively attempts to find the fault location and synthesize the patch. Since our repair technique works in a modular fashion, NetRep first selects a function F in the program and tries to repair one possible fault location at a time. If NetRep cannot synthesize a patch consistent with the provided unit tests for any potential fault location in F, it backtracks, selects the next function, and repeats the same process until all possible functions are checked. We now describe the experience of running NetRep on our illustrative example.
Iteration 1. NetRep selects the constructor of FirewallRule as the target function. Fault localization proposes the dl_dst = MacAddr.NONE part of Line 10 as the fault location, because it is related to the equality check in the unit test. However, this is not the actual fault location: NetRep tries to synthesize a patch that replaces this statement and passes all unit tests, but fails.
Iteration 2. NetRep selects the same function, the constructor of FirewallRule, but fault localization switches to a different statement, any_dl_dst = true at Line 10. As in Iteration 1, the synthesizer cannot generate a correct patch by replacing this statement.
Iteration 3. Since none of the statements in the constructor is the fault location, NetRep now selects a different function: isSameAs. Fault localization determines that any_dl_dst = false at Line 13 may be the fault location, as it may affect the test results. However, having tried to replace the statement with many other candidate statements, e.g., r.any_dl_dst = false and any_dl_dst = true, the synthesizer still fails to generate a correct patch.
Last iteration. Finally, after several attempts to localize the fault, NetRep identifies that the fault lies in dl_dst != r.dl_dst at Line 14, which is indeed the reported bug location. This time, the synthesizer manages to generate a correct patch, !dl_dst.equals(r.dl_dst). Replacing the original condition at Line 14 with this patch results in a program that passes all the provided test cases, so NetRep has successfully repaired the original faulty program.
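To make the bug concrete, the following is a minimal Python analogue of the comparison in isSameAs. It is only a sketch and not the Floodlight code: Python's `is not` stands in for Java's reference-comparing `!=`, and the reduced class layout and the `run_test` helper are assumptions modeled on the unit tests test(1, 2) = false and test(1, 1) = true.

```python
class MacAddr:
    """Hypothetical stand-in for the MacAddr class: wraps a raw value."""

    def __init__(self, raw):
        self.raw = raw

    def equals(self, other):
        # Value comparison, the behavior the developer intended.
        return isinstance(other, MacAddr) and self.raw == other.raw


class FirewallRule:
    """Hypothetical stand-in for FirewallRule, reduced to one field."""

    def __init__(self, dl_dst):
        self.dl_dst = dl_dst

    def is_same_as_buggy(self, r):
        # Mirrors `dl_dst != r.dl_dst`: compares object identity only.
        if self.dl_dst is not r.dl_dst:
            return False
        return True

    def is_same_as_patched(self, r):
        # Mirrors the synthesized patch `!dl_dst.equals(r.dl_dst)`.
        if not self.dl_dst.equals(r.dl_dst):
            return False
        return True


def run_test(method_name, raw_a, raw_b):
    """Build two rules from raw values and compare them (assumed test shape)."""
    a = FirewallRule(MacAddr(raw_a))
    b = FirewallRule(MacAddr(raw_b))
    return getattr(a, method_name)(b)
```

With the buggy comparison, two rules built from the same raw value are distinct objects and are reported as different, so the second unit test fails; the patched comparison satisfies both tests.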

Preliminaries
In this section, we present the language of network programs and describe a program formalism that is used in the rest of the paper. We also define the program repair problem that we want to solve.

Language of Network Programs
The language of network programs considered in this paper is summarized in Figure 3. A network program consists of a set of classes, where each class has an optional annotation @network to denote that the class can benefit from network domain-specific abstraction.
Each class in the program consists of a list of fields and functions. Each function has a name, a parameter list, and a function body. The function body is a list of statements, where each statement is labeled with its line number. Various kinds of statements are included in our language of network programs. Specifically, the assign statement l := e assigns expression e to left value l. The conditional jump statement jmp (e) L first evaluates predicate e. If the result is true, then the control flow jumps to line L; otherwise, it performs no operation. Note that our language does not have traditional if statements or loop statements, but those statements can be expressed using conditional jumps. The return statement ret v exits the current function with return value v. The new statement x := new C creates an object of class C and assigns the object address to variable x. The static call x := C.f(v_1, ..., v_n) invokes the static function f in class C with arguments v_1, ..., v_n and assigns the return value to variable x. Similarly, the virtual call x := y.f(v_1, ..., v_n) invokes the virtual function f on receiver object y with arguments v_1, ..., v_n and assigns the return value to variable x. Different kinds of expressions are supported, including constants, variable accesses, field accesses, array accesses, arithmetic operations, and logical operations. Since the semantics of network programs is similar to that of traditional programs written in object-oriented languages, we omit the formal description of semantics.
In addition, we assume each statement in the program is labeled with a globally unique line number, and line numbers are consecutive within a function.
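As an illustration of how if statements and loops reduce to conditional jumps, here is a small Python interpreter for a fragment of such a language. It is a sketch under simplifying assumptions: only assign, jmp, and ret are modeled, expressions are Python lambdas over an environment dictionary, and the tuple encoding of statements is invented for this example.

```python
def run(body, env):
    """Interpret a function body given as {line: statement}; return its value."""
    env = dict(env)
    line = min(body)                      # lines are consecutive within a function
    while True:
        stmt = body[line]
        kind = stmt[0]
        if kind == "assign":              # l := e
            _, target, expr = stmt
            env[target] = expr(env)
            line += 1
        elif kind == "jmp":               # jmp (e) L: jump if e holds, else no-op
            _, cond, dest = stmt
            line = dest if cond(env) else line + 1
        elif kind == "ret":               # ret v
            return stmt[1](env)
        else:
            raise ValueError(f"unknown statement kind: {kind}")


# if (x > 0) { y = 1 } else { y = 2 }; return y
IF_ELSE = {
    1: ("jmp", lambda e: e["x"] > 0, 4),
    2: ("assign", "y", lambda e: 2),
    3: ("jmp", lambda e: True, 5),
    4: ("assign", "y", lambda e: 1),
    5: ("ret", lambda e: e["y"]),
}

# total = 0; for (i = 1; i <= n; i++) total += i; return total
SUM_TO_N = {
    1: ("assign", "total", lambda e: 0),
    2: ("assign", "i", lambda e: 1),
    3: ("jmp", lambda e: e["i"] > e["n"], 7),
    4: ("assign", "total", lambda e: e["total"] + e["i"]),
    5: ("assign", "i", lambda e: e["i"] + 1),
    6: ("jmp", lambda e: True, 3),
    7: ("ret", lambda e: e["total"]),
}
```

An unconditional jump is simply a conditional jump whose predicate is the constant true, which is how the else branch and the loop back-edge are encoded here.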

Problem Statement
We assume a unit test t is written in the form of a pair (I, O), where I is the input and O is the expected output. Given a network program P and a unit test t = (I, O), we say P passes the test t if executing P on input I yields the expected output O, denoted by ⟦P⟧_I = O. Otherwise, if ⟦P⟧_I ≠ O, we say P fails the test t. In general, given a network program P and a set of unit tests E, program P is faulty modulo E if there exists a test t ∈ E such that P fails on t. We now turn to the meaning of fault locations and patches.
Definition 1 (Fault location and patch). Let P be a program that is faulty modulo tests E. Line L is called a fault location of P if there exists a statement s such that replacing line L of P with s yields a new program that passes all tests in E. Here, the statement s is called a patch to P.
Problem statement. Given a network program P that is faulty modulo tests E, our goal is to find a fault location L in P and generate the corresponding patch s, such that the patched program P′ passes every unit test t ∈ E.
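The definitions above can be made executable on a toy example. The sketch below assumes a deliberately simple program model that is not the paper's: a program is a dictionary from line numbers to state-transforming functions, executed in line order, with the input bound to `x` and the output read from `out`. All names are illustrative.

```python
def execute(program, x):
    """Run a straight-line program: each line maps a state dict to a new one."""
    state = {"x": x}
    for line in sorted(program):
        state = program[line](state)
    return state["out"]


def passes(program, test):
    """P passes t = (I, O) if executing P on I yields O."""
    inp, expected = test
    return execute(program, inp) == expected


def is_faulty(program, tests):
    """P is faulty modulo E if it fails at least one test in E."""
    return any(not passes(program, t) for t in tests)


def is_patch(program, line, stmt, tests):
    """stmt is a patch at `line` if the patched program passes all tests."""
    patched = {**program, line: stmt}
    return not is_faulty(patched, tests)


# Intended behavior: out = (x + 1) * 2. Line 1 contains the bug.
P = {
    1: lambda s: {**s, "t": s["x"] - 1},
    2: lambda s: {**s, "out": s["t"] * 2},
}
E = [(1, 4), (3, 8)]

fix_line_1 = lambda s: {**s, "t": s["x"] + 1}
fix_line_2 = lambda s: {**s, "out": (s["x"] + 1) * 2}
```

Note that both line 1 and line 2 qualify as fault locations under Definition 1, since a passing replacement statement exists for each; the definition does not single out the line a developer would consider the root cause.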

Modular Program Repair
In this section, we present our algorithm for automatically repairing network programs from a set of unit tests.

Algorithm Overview
The top-level repair algorithm is described in Algorithm 1. The Repair procedure takes as input a faulty network program P and unit tests E and produces as output a repaired program P ′ or ⊥ to indicate repair failure. At a high level, the Repair procedure maintains a visited map V from line numbers to boolean values, representing whether each line of P is checked or not.
The Repair procedure first applies the domain-specific abstraction to program P (Line 2) and initializes the visited map V by setting every line in P as not checked (Line 3). Next, it tries to iteratively repair P in a modular way until it finds a program P′ that is not faulty modulo tests E (Lines 4-8). In particular, the Repair procedure invokes SelectFunction to choose a function F as the target of repair (Line 5). If none of the functions in P can be repaired, it returns ⊥ to indicate that the repair procedure failed (Line 6). Otherwise, it invokes the RepairFunction procedure (Line 7) to enter the localization-synthesis loop inside the target function F.
In addition to the program P and tests E, the RepairFunction procedure takes as input a target function F and the current visited map V. It produces as output the updated version of the visited map V, as well as a repaired program P′ or ⊥ to indicate that the function F cannot be repaired. As shown in Lines 11-18 of Algorithm 1, RepairFunction alternately invokes the sub-procedures LocalizeFault and SynthesizePatch to repair the target function. In particular, the goal of LocalizeFault is to identify a fault location in function F. If LocalizeFault manages to find a fault location L in F, then line L is marked as visited (Line 14). Otherwise, if LocalizeFault returns ⊥, it means function F and all functions transitively invoked in F are correct or not repairable. In this case, all lines in F and its transitive callees are marked as checked (Line 16). Furthermore, if the identified fault location L corresponds to a statement that invokes F′, it means the fault location is inside F′. Thus, RepairFunction directly returns ⊥ (Line 17) and SelectFunction will choose F′ as the target function in the next iteration. On the other hand, the goal of the sub-procedure SynthesizePatch is to generate a patch for function F given the fault location L. If SynthesizePatch successfully synthesizes a patch and produces a non-faulty program P′, then the entire procedure succeeds with repaired program P′. Otherwise, RepairFunction backtracks with a new program location and repeats the same process.
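The control flow described above can be sketched in Python as follows. This is a reconstruction from the prose only, not the listing of Algorithm 1: the program representation (a dictionary with `functions` and `calls`), the callback signatures, and the stub sub-procedures at the bottom are all assumptions made for illustration, and `None` plays the role of ⊥.

```python
def transitive_callees(program, func):
    """Return func together with every function reachable through its calls."""
    seen, stack = set(), [func]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for line in program["functions"][current]:
            if line in program["calls"]:
                stack.append(program["calls"][line])
    return seen


def repair_function(program, tests, func, visited, localize_fault, synthesize_patch):
    """Alternate fault localization and patch synthesis inside one function."""
    while True:
        line = localize_fault(program, tests, func, visited)
        if line is None:
            # func and its transitive callees are correct or not repairable.
            for callee in transitive_callees(program, func):
                for l in program["functions"][callee]:
                    visited[l] = True
            return visited, None
        visited[line] = True
        if line in program["calls"]:
            # The fault is inside the callee; hand control back to the caller.
            return visited, None
        patched = synthesize_patch(program, tests, func, line)
        if patched is not None:
            return visited, patched


def repair(program, tests, select_function, localize_fault, synthesize_patch,
           abstract=lambda p: p):
    """Top-level loop: returns a repaired program, or None for repair failure."""
    program = abstract(program)                  # domain-specific abstraction
    visited = {l: False for lines in program["functions"].values() for l in lines}
    while True:
        func = select_function(program, visited)
        if func is None:
            return None
        visited, patched = repair_function(
            program, tests, func, visited, localize_fault, synthesize_patch)
        if patched is not None:
            return patched


# Toy instantiation with stub sub-procedures (all illustrative).
PROGRAM = {"functions": {"f": [1, 2], "g": [3, 4]}, "calls": {}}
tried = []

def select_function(program, visited):
    for name, lines in program["functions"].items():
        if not all(visited[l] for l in lines):
            return name
    return None

def localize_fault(program, tests, func, visited):
    return next((l for l in program["functions"][func] if not visited[l]), None)

def synthesize_patch(program, tests, func, line):
    tried.append(line)
    return {"patched_line": line} if line == 4 else None

result = repair(PROGRAM, [], select_function, localize_fault, synthesize_patch)
```

In the toy run, synthesis is attempted at lines 1 and 2 of `f`, then at lines 3 and 4 of `g`, mirroring the backtracking behavior from the overview. The stub `select_function` simply picks the first function with unchecked lines; a faithful SelectFunction would prefer the callee identified by the previous iteration.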
In the rest of this section, we explain fault localization, modular analysis, and patch synthesis in more detail.

Fault Localization
Next, we give a high-level description of our fault localization technique that aims to find the fault location in a given program. This corresponds to the LocalizeFault procedure in Algorithm 1. We will first show how to encode the problem on an entire program, and then explain how the analysis can be made modular to boost the performance.
At a high level, our fault localization technique uses a symbolic approach by reducing the fault localization problem into a constraint solving problem. In particular, we introduce a boolean variable for each line L, denoted by B[L], and encode the fault localization problem as an SMT formula, such that the value of the variable B[L] indicates whether line L is correct or not.
Checking faulty programs. To understand how to encode the fault localization problem, let us first explain how to encode the consistency check given a program P and a test case t = (I, O). Specifically, the encoded SMT formula Φ(t) consists of three components:
1. Semantic constraints. For each line L_i : s_i, we generate a formula Φ_i(S, S′) to describe the semantics of the statement s_i. Specifically, given a state S that holds before statement s_i, Φ_i(S, S′) holds if S′ is the state after executing s_i. There are two parts of the constraint: the memory contents that are changed, and the memory contents that are preserved. For example, in the case of an assignment statement, the constraint states that 1) the evaluation result of the right value in state S equals the left value in state S′, and 2) all values except for the left value are the same in S and S′.
2. Control flow integrity constraints. In order to ensure that all traces satisfying the constraint faithfully follow the control flow structure of a given program P, we generate another set of formulae Φ_f. Specifically, we require that any line of code that is executed must have exactly one predecessor and one successor that are executed, and that the branch condition in the code must be respected when picking the successor. This guarantees that there is exactly one valid execution trace corresponding to one test case.
3. Consistency between program and test. For the provided test case t = (I, O), we also generate formulae Φ_in(S_0, I) and Φ_out(S_n, O) to ensure that the program behavior is consistent with the test. In particular, Φ_in(S_0, I) binds input I to the initial state S_0 and Φ_out(S_n, O) describes the connection between output O and final state S_n.
The satisfiability of formula Φ(t) indicates the result of the consistency check. If Φ(t) is satisfiable, the solver generates a feasible execution trace and an assignment of all intermediate states along this trace. In this case, program P passes the test t because there exists a valid trace following the control flow and every pair of adjacent states in the trace is consistent with the semantics of the corresponding statement. Otherwise, if Φ(t) is unsatisfiable, P fails the test t. To check P against a set of unit tests E, we conjoin the formula Φ(t_j) for each unit test t_j ∈ E and obtain the conjunction Φ = ⋀_{t_j ∈ E} Φ(t_j). The satisfiability of formula Φ indicates whether P is faulty modulo tests E.
Methodology of fault localization. Let P be a faulty program modulo E, so the corresponding consistency check formula Φ is unsatisfiable. Suppose the fault location is line L_i. One key insight is that replacing the semantic constraint Φ_i(S, S′) with true yields a satisfiable formula. This is because true does not enforce any constraint between the pre-state S and post-state S′, so a previously invalid trace caused by the bug at L_i becomes valid.
Based on this insight, we develop a methodology to find the fault location using symbolic reasoning. Specifically, given a consistency check formula Φ, we can obtain a fault localization formula Φ′ by replacing the semantic constraint Φ_i(S, S′) of each line L_i with the implication B[L_i] → Φ_i(S, S′), so that the constraint is enforced only when B[L_i] is true. One hiccup here is that formula Φ′ is always satisfiable, since a model of Φ′ can simply assign B[L_i] = false for all L_i. Such a model says that all lines in the program are fault locations, which is not useful for fault localization. To address this issue, we add a cardinality constraint stating that exactly K variables in map B are assigned false, which forces the constraint solver to find exactly K fault locations in program P.
Modular analysis. The method above can precisely compute a potential fault location, but an obvious shortcoming is that it is hard to scale. Encoding a long program involves 1) a large number of semantic constraints, 2) many fault location choices, and 3) many intermediate states to be assigned.
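As a concrete illustration of the whole-program encoding (per-line semantic constraints that can be relaxed to true, plus a cardinality bound on how many lines are relaxed), consider the following Python sketch. It makes strong simplifying assumptions that are not part of NetRep: the program is straight-line, a state is a single integer, and a brute-force search over a small finite domain stands in for the SMT solver.

```python
from itertools import combinations

DOMAIN = range(-50, 51)   # finite stand-in for the solver's search space


def satisfiable(lines, relaxed, test):
    """Is there a trace from the test input to its expected output?"""
    inp, expected = test
    states = {inp}
    for i, step in enumerate(lines):
        if i in relaxed:
            # B[L_i] = false: the semantic constraint becomes `true`,
            # so any post-state in the domain is allowed.
            states = set(DOMAIN)
        else:
            # B[L_i] = true: the post-state must follow the statement.
            states = {step(s) for s in states}
    return expected in states


def localize(lines, tests, k=1):
    """Return every set of exactly k lines whose relaxation satisfies all tests."""
    return [set(c) for c in combinations(range(len(lines)), k)
            if all(satisfiable(lines, set(c), t) for t in tests)]


# Intended program: s + 1; s * 3; s - 4. The middle line is faulty.
LINES = [lambda s: s + 1, lambda s: s * 2, lambda s: s - 4]
TESTS = [(1, 2), (2, 5)]
```

With K = 0 no candidate is returned, which confirms that the program is faulty modulo the tests. With K = 1 the search reports the middle line and the last line but rules out the first: no intermediate value v satisfies v * 2 - 4 = 5. The surviving candidates are only potential fault locations, which is why localization is followed by patch synthesis.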
Notice that although a program can be arbitrarily long, developers usually follow the design practice that every function is of limited size. Analyzing one function at a time and recursively searching for the final fault location can be far more efficient than solving one NP-hard problem at the scale of the entire program.
To facilitate modular analysis of a function, we need to summarize the behavior of its sub-modules (callee functions) and infer external specification from its higher-level module (caller function).
The encoding method introduced above treats one line of code as a constraint on its pre-state and post-state. To summarize the behavior of a callee function, we aim to turn it into a similar constraint on the pre-state and post-state of the calling statement. The inner states of this callee function should be skipped in the encoding. We can compute such summaries of the target function's callees by symbolic execution. We start with a symbolic representation of the pre-state, execute the callee function until it returns, and assert that the output state equals the post-state. In this way, we entirely eliminate all bug location choices and inner state assignments in the callee function, and greatly simplify the semantic constraint.
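The idea of summarizing a callee once and reusing the summary at each call site can be sketched as follows. This is a toy symbolic executor, not NetRep's: callee bodies are assumed to be straight-line sequences of binary assignments, symbolic values are nested tuples, and the `FUNCTIONS` table and function names are hypothetical.

```python
import operator
from functools import lru_cache

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical callee: parameters, assignments `dst := a op b`, returned variable.
FUNCTIONS = {
    "scale": (("x", "y"), [("t", "+", "x", "y"), ("r", "*", "t", 2)], "r"),
}


@lru_cache(maxsize=None)
def summarize(name):
    """Symbolically execute `name` once; return (params, return expression)."""
    params, body, ret = FUNCTIONS[name]
    env = {p: p for p in params}              # parameters start as symbols
    for dst, op, a, b in body:
        # Substitute the current symbolic value of each operand.
        env[dst] = (op, env.get(a, a), env.get(b, b))
    return params, env[ret]


def evaluate(expr, binding):
    """Evaluate a symbolic expression under a binding of symbols to values."""
    if isinstance(expr, tuple):
        op, a, b = expr
        return OPS[op](evaluate(a, binding), evaluate(b, binding))
    return binding.get(expr, expr)            # symbol lookup, or a constant


def call_via_summary(name, *args):
    """Apply the cached summary at a call site instead of re-running the body."""
    params, expr = summarize(name)
    return evaluate(expr, dict(zip(params, args)))
```

The summary mentions only the parameters and the return value, so the temporaries `t` and `r` never appear in the caller's encoding, which corresponds to skipping the callee's inner states.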
There are two ways to infer the specification of the target function. The first way is to encode only the call stack of the target function up to the top-level function, where we can use the test case as the specification. All function calls made by the target's caller and transitive callers that are not on the stack can be replaced by the automatically computed summary. We can also disable all fault location choices except for lines in the target function. The second way is to infer a possible pre-condition and post-condition of the target function. From the perspective of the caller, the target function is a line of code that puts an incorrect constraint on its pre-state and post-state. After the analysis, the constraint solver will infer a feasible pre-state and post-state assuming this incorrect constraint is removed. This assignment can be used as the pre-condition and post-condition, which eliminates the need to encode any caller function. Since the second approach may introduce incompleteness into the analysis, we use it only to infer a specification for synthesizing the final patch, and use the first one for the analysis of every function.
Domain-specific abstraction. A domain-specific abstraction is essentially a function summary as discussed above. But for repeatedly used network classes (identified by the @network annotation), we can pre-define more succinct abstractions based on domain knowledge to make the analysis easier. The abstraction A[F] of a function F is an over-approximation of F that is precise enough to characterize the behavior of F.
The abstraction is useful due to two observations. First, source code for network programs may only be partially available due to the use of high-level interfaces and native implementations. For example, when comparing the equality between two network addresses, the getClass function is frequently used, but its implementation depends on the runtime and is not available. To make the analysis easier, we can instead use an abstraction for such comparisons that is stated in terms of x.dtype, the dynamic type of the object x. Second, network programs have complex operations that are challenging for symbolic reasoning. For instance, bit manipulations are heavily used in network data structures. While bit manipulations can improve the performance of network programs, they present significant challenges for symbolic analysis due to the encoding in the theory of bitvectors. We can give an abstraction that is equivalent with respect to correctness but simpler in behavior, e.g., using the identity function instead of a hash code computation.
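The hash code example can be illustrated with a small Python sketch. The concrete hash below folds a 64-bit value in the style of Java's `Long.hashCode`; which hash functions NetRep actually abstracts is not stated here, so the functions and the `lookup` client are assumptions chosen to show why the identity function is an acceptable replacement for a client that only relies on equal keys having equal hashes.

```python
def concrete_hash(raw):
    """Bit-twiddling hash in the style of Java's Long.hashCode (64-bit fold)."""
    raw &= 0xFFFFFFFFFFFFFFFF
    return (raw ^ (raw >> 32)) & 0xFFFFFFFF


def abstract_hash(raw):
    """Abstraction: the identity function, which needs no bitvector reasoning."""
    return raw


def lookup(table_hash, stored, query):
    """Membership test that only relies on equal keys having equal hashes."""
    buckets = {}
    for key in stored:
        buckets.setdefault(table_hash(key), []).append(key)
    # Keys in the matching bucket are still compared by value.
    return query in buckets.get(table_hash(query), [])


STORED = [1, 2, 0xAABBCCDDEEFF]
QUERIES = [1, 3, 0xAABBCCDDEEFF, 0xEEFFAABBCCDD]
```

Both hash functions give the same lookup results on every query, so a program whose observable behavior depends only on such lookups is unaffected by the abstraction.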

Patch Synthesis
The last step of our repair algorithm is to generate a patch to fix the faulty program. This corresponds to the SynthesizePatch procedure in Algorithm 1. It can be reduced to a sketch completion problem in program synthesis, where we replace the existing faulty line with a hole.
Our general idea is to use plain enumerative search with a depth bound over the space of candidate patches, but with two significant optimizations.
First, we reduce the search space with heuristics. On the one hand, we only replace the core expression in the faulty statement with a hole to focus on the most expressive part. To be specific, we consider changing the right-hand-side expressions of assignments, conditional expressions of jump statements, return values of return statements, and functions and arguments of function invocations. On the other hand, we use a limited grammar to guide the search. We parameterize all constants, variables, fields, functions, and operators over the sketch and only instantiate constructs that are in scope. For example, given a particular sketch with a hole, we only populate the variable set with the local and global variables that are in scope at the hole. Also, if the hole corresponds to the conditional expression of an if statement, we only add logical operators to the grammar.
Second, we use the local specification to guide the synthesis. Sketch completion differs from synthesizing a complete program in that the specification is defined for the entire program, so a naive approach repeatedly wastes time executing the correct part of the program to verify each candidate patch. Instead, we use the technique described in the modular analysis section to generate a pre-condition and post-condition for only the faulty line. In this way, only the generated patch needs to be executed to check it against the specification, which saves considerable time as the program grows larger.
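A depth-bounded enumerative search guided by a local specification can be sketched as follows. The grammar (integer leaves and three arithmetic operators), the tuple encoding of expressions, and the example specification are assumptions made for illustration; the real grammar is derived from the constructs in scope at the hole.

```python
import operator

ARITHMETIC = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def exprs_up_to(leaves, operators, depth):
    """All expressions of nesting depth <= depth, shallower ones first."""
    if depth == 0:
        return list(leaves)
    smaller = exprs_up_to(leaves, operators, depth - 1)
    deeper = [(op, a, b) for op in operators for a in smaller for b in smaller]
    return smaller + deeper


def evaluate(expr, state, operators):
    """Evaluate a candidate expression in a concrete pre-state."""
    if isinstance(expr, tuple):
        op, a, b = expr
        return operators[op](evaluate(a, state, operators),
                             evaluate(b, state, operators))
    if isinstance(expr, str):
        return state[expr]        # variable in scope at the hole
    return expr                   # constant


def synthesize(leaves, operators, local_spec, depth):
    """Return the first candidate consistent with every (pre-state, value) pair."""
    for candidate in exprs_up_to(leaves, operators, depth):
        if all(evaluate(candidate, pre, operators) == post
               for pre, post in local_spec):
            return candidate
    return None


# Local specification for the hole in `t := ??`: pre-state and required value.
LOCAL_SPEC = [({"x": 1}, 2), ({"x": 3}, 4)]
patch = synthesize(["x", 1, 2], ARITHMETIC, LOCAL_SPEC, depth=2)
```

Because each candidate is checked only against the pre-state and post-state of the hole, no other part of the program is executed during the search. Restricting `operators` to logical operators would model the case where the hole is a branch condition.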

Implementation
We have implemented the proposed repair technique in a tool called NetRep. NetRep leverages the Soot static analysis framework [26] to convert Java programs into Jimple code, which provides a succinct yet expressive set of instructions for analysis. In addition, NetRep utilizes the Rosette tool [48] to perform symbolic reasoning for fault localization and patch synthesis. While our implementation closely follows the algorithm presented in Section 4, we also apply several optimizations that are important for the performance of NetRep.
Memories for different types. Since the conversion between bitvectors and integers imposes significant overhead on running time, NetRep divides the memory into one part for integers and another for bitvectors. In this design, NetRep automatically selects the memory chunk based on the variable types. Type checking guarantees that no such conversion is needed.
Stack and heap. In order to reduce the number of memory operations, NetRep also divides the memory into stack and heap. As is standard, the stack only stores static data and its layout is deterministic. Therefore, stacks are implemented using fixed-size vectors and can be efficiently accessed for read and write operations. On the other hand, the heap stores dynamic data that are usually not known at compile time, such as allocated objects. Since the heap size cannot be determined beforehand, NetRep uses an uninterpreted function f(x) to represent heaps, where x is the address and f(x) is the value stored at x.
String values. Since reasoning over string values is a challenging task and not always necessary for repairing network programs, we simplify the representation of strings with integer values. Specifically, NetRep maps each string literal to a unique integer and represents all string operations (e.g., concatenation) with uninterpreted functions.
Bounded program analysis. In order to improve the repair time, NetRep only performs bounded program analysis for fault localization and patch synthesis. Namely, we unroll loops and inline functions up to K times, where K is a predefined hyper-parameter. In this way, function summaries can be easily and efficiently computed using symbolic execution.
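The memory layout described above can be pictured with a concrete Python analogue. This sketch is not NetRep's symbolic encoding: a dictionary stands in for the uninterpreted heap function f(x), the split between integer and bitvector memories is omitted, and the class and method names are hypothetical.

```python
class Memory:
    """Concrete analogue of the stack/heap split with interned strings."""

    def __init__(self, stack_size):
        self.stack = [0] * stack_size   # fixed-size vector, indexed by slot
        self.heap = {}                  # address -> value, stand-in for f(x)
        self.next_addr = 1              # address 0 is reserved for null
        self.strings = {}               # string literal -> unique integer

    def alloc(self, n_fields):
        """Reserve n_fields consecutive heap addresses for a new object."""
        addr = self.next_addr
        self.next_addr += n_fields
        return addr

    def store(self, addr, value):
        self.heap[addr] = value

    def load(self, addr):
        return self.heap.get(addr, 0)

    def intern(self, literal):
        """Map each distinct string literal to a unique integer."""
        return self.strings.setdefault(literal, len(self.strings))


memory = Memory(stack_size=4)
rule = memory.alloc(2)                  # object with two fields
mac = memory.alloc(1)
memory.store(rule, mac)                 # field 0 of rule points to mac
memory.stack[0] = rule                  # local variable slot 0 holds the address
```

Interning strings means equality on string literals reduces to integer equality, which is the property the integer representation is meant to preserve.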
Benchmark collection. To obtain realistic benchmarks, we crawl the commit history of Floodlight [9], a representative open-source SDN controller in Java that supports the OpenFlow protocol and a rich set of network functions. To distinguish commits that repair bugs from those made for other purposes, we identify commits based on the following criteria: 1) the commit message contains keywords about repairing bugs, e.g., "bug", "error", "fix"; 2) the commit changes no more than three lines of code. Following these criteria, we have collected 10 commits from the Floodlight repository and adapted them into our benchmarks. Specifically, given a commit in the repository, we take the code before the commit as the faulty network program and the version after the commit as the ground-truth repaired program. The code is post-processed and the parts irrelevant to the bug of interest are removed. We also identify corresponding unit tests and modify them to directly reveal the bug as appropriate. Each benchmark in our evaluation consists of a faulty network program and its corresponding unit tests.
Experimental setup. All experiments are conducted on a computer with a 4-core 2.80GHz CPU and 16GB of physical memory, running the Arch Linux operating system. We use Racket v7.7 as the compiler and runtime system of NetRep and set a time limit of 1 hour for each benchmark.
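The two commit-selection criteria used for benchmark collection can be written down as a simple filter. The sketch below is an assumption about how such a filter might look, not the script used for the evaluation: it uses plain case-insensitive substring matching on the three example keywords and takes the number of changed lines as a precomputed input.

```python
KEYWORDS = ("bug", "error", "fix")   # example keywords from the criteria


def is_repair_commit(message, changed_lines, max_lines=3):
    """Keep commits whose message mentions a repair and that change few lines."""
    text = message.lower()
    mentions_repair = any(keyword in text for keyword in KEYWORDS)
    return mentions_repair and changed_lines <= max_lines


COMMITS = [
    ("Fix MAC comparison in firewall rule", 1),
    ("Add REST endpoint for topology", 2),
    ("Bug: wrong lease time in DHCP server", 12),
]
selected = [msg for msg, n in COMMITS if is_repair_commit(msg, n)]
```

Substring matching is deliberately permissive (for example, "prefix" also contains "fix"), so candidates selected this way would still need manual inspection.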

Main Results
Our main experimental results are summarized in Table 1. The column labeled "Module" describes the network module to which the benchmark belongs. The next two columns labeled "LOC" and "# Funcs" show the number of lines of source code (in Jimple) and the number of functions, respectively. The "# Tests" column presents the number of unit tests used for fault localization and patch synthesis. Next, the "Succ" and "Exp" columns show whether NetRep can successfully repair the program and if the generated patch is exactly the same as the ground-truth. Since NetRep returns the first fix that can pass all provided test cases, the repaired programs are not necessarily the same as those expected in the ground-truth. In this case, the table will show a "Yes" in the "Succ" column and a "No" in the "Exp" column. Finally, the last three columns in Table 1 denote the fault localization time, patch synthesis time and the total running time of NetRep.
As shown in Table 1, each benchmark contains 13 to 65 functions, with an average of 34 functions across all benchmarks. Each benchmark has 212-809 lines of Jimple code, with the average being 496. NetRep succeeds in repairing 8 out of 10 benchmarks. Furthermore, for 5 of the benchmarks that can be successfully repaired, NetRep is able to generate exactly the same fix as the ground truth. Given that our benchmarks cover programs from a variety of Floodlight modules, such as DHCP Server and Firewall, we believe that NetRep is effective at repairing realistic network programs (RQ1).
We inspected the reasons why NetRep fails to repair benchmarks 2 and 5. NetRep is not able to localize the fault in benchmark 2 due to its incomplete support for unbounded data structures with dynamic allocation, such as hash maps. For benchmark 5, NetRep is able to localize the fault but not able to synthesize the correct patch. This is because the function expected to be invoked has side effects that interact with another function, and verifying such a patch requires improvements to the specification checking.
Regarding efficiency, NetRep can repair 8 benchmarks in an average of 744 seconds with only 2 to 3 test cases. The fault localization time ranges from 39 seconds to 893 seconds, with 50% of the benchmarks finishing within five minutes. The patch synthesis time ranges from 39 seconds to 2139 seconds, with 60% of the benchmarks finishing within five minutes. In summary, the evaluation results show that NetRep takes only minutes to localize bugs in a faulty program and synthesize a correct patch based on two to three unit tests (RQ2).

Ablation Study
To explore the impact of modular analysis and domain-specific abstraction on the proposed repair technique, we develop three variants of NetRep:
- NetRep-NoMod is a variant of NetRep without modular analysis. Specifically, NetRep-NoMod inlines the functions in a given program but still uses abstractions for network data structures for fault localization and patch synthesis.
- NetRep-NoAbs is a variant of NetRep without domain-specific abstraction. In particular, NetRep-NoAbs uses the original concrete implementation of network functions for symbolic reasoning. If the implementation is written in a different language, we manually translate the implementation to Java.
To understand the impact of modular analysis and domain-specific abstraction, we run all variants on the 10 collected benchmarks. For each variant, we

Comparison with the Baseline
To understand how NetRep performs compared to other Java program repair tools, we compare NetRep against a state-of-the-art tool called Jaid [5] on our benchmarks. Specifically, Jaid takes as input a faulty Java program, a set of unit tests, and a function signature for fault localization and patch synthesis, a setting closest to NetRep among a variety of tools. Note that Jaid solves a simpler repair problem than NetRep, because it requires the user to specify a function that is potentially incorrect in the program, whereas NetRep does not need input other than the faulty program and unit tests. In order to run Jaid on our benchmarks, we adjust their formats to fit Jaid's and provide the faulty function (known from the ground truth) as input for Jaid. Jaid will indefinitely enumerate all possible patches, rather than recommending a most correct one. We think it is successful if the expected patch can be found among the results. In practice, human assistance is needed to pick out this patch from the thousands of candidates.
Jaid finishes on 8 out of 10 benchmarks, and the expected patch appears among its results for 2 of them, whereas NetRep produces the expected patch as its first recommendation for 5 benchmarks. On one benchmark Jaid is unable to produce a fix, and on another it runs out of memory.
We argue that NetRep is better suited than Jaid for automatically repairing network programs. First, it only requires network operators to provide unit test cases, which, as discussed above, can be discovered automatically by another verification or testing procedure. In comparison, Jaid requires users to be skilled enough in programming network controllers to identify the buggy function and to pick the correct patch from the results. This is beyond the ability of most network operators and calls for an expert team. Second, NetRep has higher repair accuracy. As discussed above, networks are sensitive to small mistakes, so high accuracy is crucial for a network to function correctly.
In summary, NetRep is more effective at automatically fixing bugs in network programs than state-of-the-art repair tools for Java programs, especially with respect to repair accuracy and automation (RQ4).

Related Work
Automated program repair. Automated program repair is an active research area that aims to automatically fix mistakes in programs based on specifications of correctness criteria [11,28,39,18], with a variety of applications such as aiding software development [34], finding security vulnerabilities [37], and teaching novice programmers [49,14]. Different techniques have been proposed to solve the automated program repair problem, including heuristics-based techniques [16,31], semantics-based techniques [37,27], and learning-based techniques [45,30,32,47]. NetRep is a semantics-based automated repair tool. Different from prior work, NetRep is specialized to repair network programs based on modular analysis and network data structure abstractions.
Fault localization. Researchers have developed various approaches to fault localization, including spectrum-based, learning-based, and constraint-based techniques. Specifically, the spectrum-based techniques [27,1,2,7,44,6,19] perform fault localization by using execution profiles (called program spectra) to identify which parts of a program are active during a run. Learning-based techniques [29,53,54] typically train machine learning models to predict and rank possible fault locations. By contrast, constraint-based techniques [21,20,12] encode the semantics of programs as logical constraints and reduce the fault localization problem to a constraint satisfaction problem. In spirit, NetRep uses a similar idea for fault localization. However, NetRep performs modular analysis and enables debugging programs involving object-oriented features, whereas prior work analyzes only whole programs written in a C-like language. Besides, NetRep reuses the fault localization result to speed up patch synthesis, while prior work mainly focuses on the fault localization step.
Patch synthesis. Many synthesis algorithms have been developed for generating patches, including enumerative search [27], constraint-based techniques [37], statistical models [52], machine learning [15], and hints from existing code [25]. In terms of patch synthesis, NetRep generates a context-free grammar from the context of fault locations and performs enumerative search based on the grammar to synthesize patches. It does not require a machine learning model or statistical information for ranking all possible patches. However, it is conceivable that NetRep would benefit from the guidance of such ranking techniques.
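As a rough illustration of grammar-based enumerative search, the sketch below enumerates expressions of increasing depth from a tiny grammar and returns the first one that passes every unit test. It is not NetRep's implementation: the class name, the grammar `E ::= a | b | 0 | 1 | (E + E) | (E - E)`, and the integer-valued test cases are all assumptions made for the example, and the candidates are run concretely on the tests rather than checked symbolically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Illustrative sketch only: none of these names come from NetRep.
class EnumerativePatchSketch {

    /** A candidate expression over inputs (a, b) together with its printable form. */
    static final class Candidate {
        final String text;
        final IntBinaryOperator eval;
        Candidate(String text, IntBinaryOperator eval) {
            this.text = text;
            this.eval = eval;
        }
    }

    /** A unit test: two inputs and the value the repaired expression must yield. */
    static final class TestCase {
        final int a, b, expected;
        TestCase(int a, int b, int expected) {
            this.a = a;
            this.b = b;
            this.expected = expected;
        }
    }

    // Terminals of the hypothetical grammar E ::= a | b | 0 | 1 | (E + E) | (E - E).
    static List<Candidate> terminals() {
        List<Candidate> out = new ArrayList<>();
        out.add(new Candidate("a", (a, b) -> a));
        out.add(new Candidate("b", (a, b) -> b));
        out.add(new Candidate("0", (a, b) -> 0));
        out.add(new Candidate("1", (a, b) -> 1));
        return out;
    }

    // Keeps every smaller expression and adds all sums and differences of pairs.
    static List<Candidate> expand(List<Candidate> smaller) {
        List<Candidate> out = new ArrayList<>(smaller);
        for (Candidate l : smaller) {
            for (Candidate r : smaller) {
                out.add(new Candidate("(" + l.text + " + " + r.text + ")",
                        (a, b) -> l.eval.applyAsInt(a, b) + r.eval.applyAsInt(a, b)));
                out.add(new Candidate("(" + l.text + " - " + r.text + ")",
                        (a, b) -> l.eval.applyAsInt(a, b) - r.eval.applyAsInt(a, b)));
            }
        }
        return out;
    }

    static boolean passesAll(Candidate c, List<TestCase> tests) {
        for (TestCase t : tests) {
            if (c.eval.applyAsInt(t.a, t.b) != t.expected) {
                return false;
            }
        }
        return true;
    }

    /** Returns the first candidate of depth <= maxDepth that passes all tests, or null. */
    static Candidate synthesize(List<TestCase> tests, int maxDepth) {
        List<Candidate> level = terminals();
        for (int depth = 0; depth <= maxDepth; depth++) {
            for (Candidate c : level) {
                if (passesAll(c, tests)) {
                    return c;
                }
            }
            if (depth < maxDepth) {
                level = expand(level); // grows quickly, so keep maxDepth small
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<TestCase> tests = Arrays.asList(
                new TestCase(3, 4, 8), new TestCase(0, 0, 1), new TestCase(5, 1, 7));
        Candidate patch = synthesize(tests, 2);
        System.out.println(patch == null ? "no patch found" : patch.text);
    }
}
```

With the three tests in `main`, no expression of depth 0 or 1 fits, so the search only succeeds at depth 2. The candidate it returns is merely consistent with the tests, which is why a small test suite can admit unintended patches.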
Verification and synthesis for SDN. In the networking domain, several verification tools [3,33,23,24] have been proposed based on either model checking or theorem proving. For example, VeriCon [3] performs deductive verification to verify the correctness of SDN programs specified by network-wide invariants on all admissible topologies. In addition to verification, synthesis techniques [36,35,38] have also been proposed to aid software-defined networking. NetRep aims to repair network programs automatically, which is a different problem from SDN verification or synthesis.
Repair for network programs. Our work is most related to automated repair of network programs in the SDN domain [50,51,17]. Prior work on automated repair [50,51] relies on using Datalog to capture the operational semantics of the target language to be repaired. These repair techniques work for domain-specific languages (e.g., Datalog or Ruby on Rails) with simple structure. Similarly, Hojjat et al. [17] propose a framework based on the Horn clause repair problem to help network operators fix faulty configurations. In contrast, NetRep targets Java network programs with object-oriented features and more complex constructs, which cannot be handled by existing techniques.

Limitations and Future Work
We discuss several limitations of NetRep that we plan to improve in future work. First, NetRep repairs the faulty network program with the first patch that passes all tests. If that patch is not what the user intended, or does not satisfy a more formal specification, a user interaction that resumes the synthesis could be introduced.
Second, patches that require complicated changes, e.g., those involving control flow structures, are beyond NetRep's ability. They make up 44% of our collection of bug-fixing commits. We envision that the challenge can be addressed by introducing more sophisticated patch synthesis techniques such as searching over a domain-specific language for edits.
Third, in order to force symbolic execution to terminate in finite time, NetRep currently unrolls all loops in the network program, which may result in missing a potential bug. Loop invariant inference techniques can be leveraged to overcome this challenge and still guarantee termination.
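To make the third limitation concrete, the sketch below shows how an analysis that explores only a bounded number of loop iterations can miss a bug that appears on a later iteration. Everything in it is hypothetical: the TTL routine, the injected bug, and the bounds are invented for illustration, and plain concrete enumeration stands in for symbolic execution.

```java
// Illustrative sketch only: a toy routine and checker, not part of NetRep.
class BoundedUnrollSketch {

    // Hypothetical buggy routine: the TTL should drop by 1 per hop,
    // but every hop after the third wrongly costs 2.
    static int ttlAfter(int hops, int initialTtl) {
        int ttl = initialTtl;
        for (int i = 0; i < hops; i++) {
            ttl -= (i < 3) ? 1 : 2; // injected bug on iterations i >= 3
        }
        return ttl;
    }

    // Checks the property ttl == initialTtl - hops, but only for executions
    // that take at most `bound` loop iterations (the unrolling depth).
    static boolean violationFound(int bound, int initialTtl) {
        for (int hops = 0; hops <= bound; hops++) {
            if (ttlAfter(hops, initialTtl) != initialTtl - hops) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("bound 3: " + violationFound(3, 64));
        System.out.println("bound 4: " + violationFound(4, 64));
    }
}
```

With an unrolling bound of 3 the faulty fourth iteration is never explored, so no violation is reported. Raising the bound to 4 exposes it.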

Conclusion
In this paper, we have proposed an automated repair technique for network controller programs with unit tests as specifications. Our technique internally performs symbolic reasoning for bug localization and patch synthesis, optimized by network domain-specific abstractions and modular analysis to reduce encoding size. We have implemented a tool called NetRep and evaluated it on 10 benchmarks adapted from the Floodlight framework. The experimental results demonstrate that NetRep is effective for repairing realistic network programs with moderate change sizes.