1 Introduction

Deductive software verification aims at formally verifying that all possible behaviors of a given program satisfy formally defined, possibly complex properties, where the verification process is based on some form of logical inference, i.e., “deduction”. In this article we follow the trajectory of the field of deductive software verification from its inception in the late 1960s via its current state to its promises for the future. It was a long way from pen-and-paper proofs for programs in small, idealized languages to highly automated proofs of complex library or system code written in mainstream programming languages. We argue that the field has reached a stage of maturity that permits deductive verification technology to be used in an industrial setting. However, this does not mean that all problems are solved. On the contrary, formidable challenges remain, and not the least of them is how to bring about the transfer into practical software development. Hence, the second contribution of this article is to present an overview of what we consider the most important challenges in the area of deductive software verification.

To keep this article at a feasible length (and to avoid overlap with other contributions in this volume) we focus on contract-based, deductive verification of imperative and object-oriented programs. Hence, we do not discuss model checking, SMT solvers, general proof assistants, program synthesis, correctness-by-construction, runtime verification, or abstract interpretation. Instead, we refer to the articles Runtime Verification: Past Experiences and Future Projections, Software Architecture of Modern Model-Checkers, Statistical Model Checking, as well as The 10,000 Facets of MDP Model Checking in this issue. We also do not cover fully automated verification tools for generic safety properties (see the article Static Analysis for Proactive Security in this issue for some aspects of these). This is not at all to say that these methods or tools are unimportant or irrelevant. On the contrary, their integration with deductive verification appears to be highly promising, as we point out in Sect. 5.2 below.

This paper is organized as follows: in the next section we walk through a non-trivial example of contract-based verification to clarify the scope and illustrate some of the important issues. In Sect. 3 we sketch the main developments in the field up to ca. the year 2000. In Sect. 4 we sketch the current state of the art and discuss the two main approaches to deductive verification: symbolic execution and verification condition generation. The core of the paper consists of Sects. 5 and 6, where we discuss the main achievements and the remaining challenges of the field, divided into technical and non-technical aspects. We conclude in Sect. 7.

2 An Example

Properties to be proven by deductive verification are expressed in a formal specification language. Ada was the first language for which formal specification annotations could be expressed directly as structured comments next to the program elements they relate to [97]. As this proved to be natural and easy to use, the approach was subsequently adopted for other programming languages. Eiffel [99] popularized a contract-based paradigm, where the prerequisites and obligations of each method are laid down in a contract. This has the very important advantage that methods, as the central abstraction concept to structure a program, have a direct counterpart in formal specifications. Hence, specifications and programs follow the same structure. For most major imperative/object-oriented programming languages there exist dedicated contract-based annotation languages (see Sect. 5.1).

We give an example of contract-based formal specification and verification of a Java program with the Java Modeling Language (JML) and provide informal explanations of JML specification elements; more details are in [66, 85]. Consider the Java method in Fig. 1 which implements binary search in a sorted integer array. Its code is given completely, so it can be compiled and run from a suitable calling method.

Fig. 1. Formal JML specification of a Java binary search method
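A minimal sketch of what a JML-annotated binary search in the style of Fig. 1 could look like is the following. Identifiers, the exact clauses, and the layout are ours, so the line numbers referenced in the text refer to the original figure, not to this sketch; in particular, sortedness is inlined here as a quantified formula rather than specified via the model method mentioned below.

public class BinarySearch {

    // Note: \strictly_nothing is a KeY extension; plain JML would use \nothing.
    /*@ public normal_behavior
      @   requires (\forall int i; 0 < i && i < a.length; a[i-1] <= a[i]);
      @   ensures (\exists int i; 0 <= i && i < a.length; a[i] == v)
      @             ? (0 <= \result && \result < a.length && a[\result] == v)
      @             : \result == -1;
      @   assignable \strictly_nothing;
      @*/
    public static int search(int[] a, int v) {
        int low = 0;
        int high = a.length;
        /*@ loop_invariant 0 <= low && low <= high && high <= a.length;
          @ loop_invariant (\forall int i; 0 <= i && i < low; a[i] != v);
          @ loop_invariant (\forall int i; high <= i && i < a.length; a[i] != v);
          @ decreases high - low;
          @*/
        while (low < high) {
            int mid = low + (high - low) / 2;   // written so that no integer overflow can occur
            if (a[mid] == v) {
                return mid;                     // some index at which v occurs
            } else if (a[mid] < v) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return -1;                              // our choice for the "value not found" result
    }
}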

The method contract (lines 1–7) specifies the intended behavior whenever the method terminates normally. The contract’s only requirement (line 2) is that the input array is sorted (in JML all reference types are assumed to be non-null by default, so this does not need to be spelled out). Sortedness is specified with the help of a model method that is not shown. The contract says that whenever the method is called with a sorted, non-null array, then the call terminates and in the final state the property given in the ensures clause (lines 3–5) is satisfied. In addition, the assignable clause (line 6) says that the execution has strictly no side effects, not even creation of new objects. The contract is valid for any input of unbounded size that satisfies the requirements.

We take a closer look at the ensures clause: line 3 is the guard of a conditional term saying that the searched value occurs as an entry of the array. If true, an array index where the value is found is returned as the result; otherwise, a value signaling failure is returned. We do not specify whether the result is the smallest such index, but make sure that the result is in a valid range.

The loop invariant (lines 14–16) specifies the valid range of the pivots and says that the searched value can never occur below the lower pivot or above the upper one. To ensure termination of the method it is sufficient to ensure termination of the loop. This is achieved by the decreases clause (line 18), an expression over a well-ordered type that becomes strictly smaller in each iteration.

A central advantage of contract-based verification is compositionality and scalability: after showing that a method satisfies its contract, each call to that method can be replaced with its contract, instead of inlining the code. Specifically, if the callee’s requires clause is satisfied at the call point, then its ensures clause can be assumed and the values of all memory locations of the caller, except the assignable ones, are preserved. We illustrate the idea with a simple client method, see Fig. 2.

Fig. 2. Formal JML specification of a Java client method

The client method searches for a value in a given array. If the entry was found at some index, then that index is appended to an array stored in a field of the client object and returned. The specification is surprisingly complex. First of all, as specified in the exceptional termination case (lines 14–19), an exception is thrown if the stored array is full and the given value is found (line 16): in this case the stored array would have to be extended (line 17). The assignable clause (line 19) is not strict, because a new exception object is created. Sortedness of the array parameter is necessary to ensure that the contract of the binary search method can go into effect.

The specification case for normally terminating behavior (lines 5–12) of the client method is similar to that of the binary search method: in addition we require (line 6) that the parameter array and the stored array are different arrays (Java arrays can be aliased) and that there is still space for a new entry. The latter condition could be weakened by disjoining the condition that the value is not found. The second ensures clause (lines 10–11) is almost identical to the one of the binary search method (we left out the bounds on the result). The first ensures clause (lines 7–9) says that, if the value is found, then its index is appended to the stored array; otherwise, the stored array and its fill counter are unchanged. We use the \old keyword to refer to a value in the prestate. This is necessary, because the counter was updated in the method.

Typically, one also specifies class invariants that, for example, capture consistency properties of the instance fields that all methods must maintain. In JML any existing class invariants are implicitly added to all requires and ensures clauses, which helps to keep them concise. In the example, we maintain the invariant that the fill counter is non-negative (line 1). Class invariants must be established by all constructors (not shown here).
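For illustration, a simplified sketch in the spirit of Fig. 2 follows. The field and method names are ours, the exceptional specification case discussed above is omitted for brevity, and the binary search method from the earlier sketch is assumed:

public class Client {

    private /*@ spec_public @*/ int[] found = new int[16];
    private /*@ spec_public @*/ int n = 0;

    //@ public invariant 0 <= n;

    /*@ public normal_behavior
      @   requires (\forall int i; 0 < i && i < a.length; a[i-1] <= a[i]);
      @   requires a != found && n < found.length;
      @   ensures \result >= 0 ==>
      @             (a[\result] == v && n == \old(n) + 1 && found[n - 1] == \result);
      @   ensures \result < 0 ==> (\result == -1 && n == \old(n));
      @   assignable n, found[*];
      @*/
    public int searchAndRecord(int[] a, int v) {
        int idx = BinarySearch.search(a, v);   // verified modularly against search's contract
        if (idx >= 0) {
            found[n] = idx;
            n++;
        }
        return idx;
    }
}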

Discussion. Even our small example shows that precise contracts, even of seemingly innocent methods, can become bulky. The specification of the client method is about twice as long as its implementation. And, as pointed out, that specification could be made even more precise. But without further information about the call context it is hard to decide whether that is useful. A subtle question is whether the annotation in line 4 of Fig. 1 is needed at all: if it does not hold, then the expression is not well-defined in Java anyway. But most verifiers will not be able to deduce this by themselves, because they treat such an expression as underspecified. Moreover, the semantics implemented by the various tools, and even the JML standard itself, is not always unambiguous.

It is very easy to forget parts of specifications: in most cases, the first attempt will not be verifiable. While developing the example we forgot the condition in line 6 of Fig. 2, a typical omission. Good feedback from the verification tool is very important here. Conversely, some of the specification annotations should be automatically derivable, for example, the bounds on the result. Note that reuse of specification elements is essential to obtain concise and readable annotations.

It took about one hour (for an expert) to specify and verify the example reproduced here. After finding the correct specifications, formal verification with the system KeY [3] is fully automatic and takes about 6 s on a state-of-the-art desktop, including the constructor and model methods not displayed. The most complex method led to a proof tree with ca. 4,000 nodes and 27 branches. Interestingly, when we loaded the verified example in OpenJML [37], we only had to rename some KeY-specific keywords to their OpenJML counterparts, and then most of the example could be verified directly. The only specification that could not be verified was the exceptional behavior specification of the client method, as OpenJML adds extra proof obligations to every array access, capturing that the index must be within the bounds of the array, to ensure the absence of runtime errors.

3 History Until LNCS 1750 (aka Y2000)

The Roots of Deductive Verification. The history of deductive software verification dates back to the 1960s and 70s. Seminal work in this area is Floyd-Hoare logic [48, 61], Dijkstra’s weakest preconditions [42], and Burstall’s intermittent assertions [31].

Floyd and Hoare introduced the notion of pre- and postcondition to describe the behaviour of a program: a Hoare triple \(\{P\}S\{Q\}\) is used to express that if program \(S\) is executed in an initial state \(\sigma \), such that the precondition \(P\) holds for \(\sigma \), then if execution of \(S\) terminates in a state \(\sigma '\), the postcondition \(Q\) holds for the final state \(\sigma '\). This relation is also called partial correctness (partial, because termination is not enforced). In terms of a big-step semantics [110] of \(S\): for any pair of states \((\sigma ,\sigma ')\) in that semantics, if \(P\) holds for \(\sigma \), then \(Q\) holds for \(\sigma '\). Floyd and Hoare proposed a set of syntactic proof rules to prove the correctness of an algorithm. One classical example of such a proof rule is the rule for statement composition:

$$\begin{aligned} \frac{\{P\}\ S_1\ \{R\} \qquad \{R\}\ S_2\ \{Q\}}{\{P\}\ S_1;S_2\ \{Q\}} \end{aligned}$$

This rule expresses that, to prove that if \(S_1;S_2\) is executed in a state satisfying precondition \(P\) then any terminating execution ends in a state satisfying \(Q\), it is sufficient to find an intermediate assertion \(R\), such that \(R\) can be established as a postcondition of the first statement \(S_1\), and is a sufficient precondition for \(S_2\) to establish postcondition \(Q\). Rules like this break up the correctness problem of a complete algorithm into correctness problems of the individual instructions.
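For instance (a standard textbook-style illustration, not taken from the cited sources), choosing the intermediate assertion \(R \equiv x \ge 1\) allows the rule to be applied as follows:

$$\begin{aligned} \frac{\{x \ge 0\}\ x := x+1\ \{x \ge 1\} \qquad \{x \ge 1\}\ y := 2 \cdot x\ \{y \ge 2\}}{\{x \ge 0\}\ x := x+1;\ y := 2 \cdot x\ \{y \ge 2\}} \end{aligned}$$

Both premises are instances of the assignment axiom \(\{Q[e/x]\}\ x := e\ \{Q\}\): substituting \(x+1\) for \(x\) in \(x \ge 1\) gives \(x \ge 0\), and substituting \(2 \cdot x\) for \(y\) in \(y \ge 2\) gives \(x \ge 1\).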

Dijkstra observed that it is possible to compute the weakest precondition that guarantees that a program will establish a given postcondition. This simplifies verification, because one does not have to “invent” the intermediate predicate that describes the state between two statements; instead, it can be computed. In particular, the weakest precondition \(\mathsf {wp}\) for a statement \(S_1;S_2\) can be computed using the following rule:

$$\begin{aligned} \mathsf {wp}(S_1;S_2, Q) = \mathsf {wp}(S_1, \mathsf {wp}(S_2, Q)) \end{aligned}$$

For other instructions, similar rules exist. A Verification Condition Generator (VCG) is a deductive verification tool that produces proof obligations expressing that the specified precondition implies (i.e., is at least as strong as) the weakest precondition as computed by the \(\mathsf {wp}\) rules. For this approach to work, we require the presence of loop invariants and of method contracts for all methods called in the verified code, which give rise to additional proof obligations.
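For example (our illustration, continuing the program from the composition rule above), with the assignment rule \(\mathsf {wp}(x := e, Q) = Q[e/x]\) the intermediate assertion need not be invented:

$$\begin{aligned} \mathsf {wp}(x := x+1;\ y := 2 \cdot x,\ y \ge 2) = \mathsf {wp}(x := x+1,\ 2 \cdot x \ge 2) = 2 \cdot (x+1) \ge 2 \equiv x \ge 0 \end{aligned}$$

A VCG would then emit the proof obligation that the specified precondition implies \(x \ge 0\).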

VCGs in essence apply \(\mathsf {wp}\) transformation rules backwards through the target program, starting with the postcondition to be proven. However, it is also possible to verify a program in the forward direction of its control flow. Burstall [31] proposed to combine symbolic execution with induction to show that a program establishes its postcondition (see also Sect. 4).

First Deductive Verification Tools. The early program verification techniques were, to a large extent, a pen-and-paper activity. However, the limitations of doing such proofs with pen-and-paper were immediately obvious, and several groups started to develop tools to support formal verification. These efforts were all isolated, and usually still required extensive user interaction. Nevertheless, the correct application of the proof rules was checked by the system, and many obvious errors were avoided this way. It is not possible to give a complete overview of early verification systems, but we mention some representative tools and their main characteristics.

Tatzelwurm [41] was a VCG for a subset of UCSD Pascal. It accepted specification annotations in sorted first-order logic and used a tableau-based theorem prover with a decision procedure for linear integer expressions to discharge verification conditions.

Higher-order logic theorem provers were frequently used to construct a verified program verifier. The soundness of the verification technique was proven inside the theorem prover, and the program to be verified was encoded in the logic of the theorem prover, after which the verified rules could be applied. This approach was used for example in the Loop project, where Hoare logic rules were formalized in PVS (later also Isabelle) to reason about Java programs [65, 67], the Sunrise project, which used a verification condition generator verified in HOL for a standard while-language [64], by Von Oheimb who formalized a Hoare logic for Java in Isabelle/HOL [126], and by Norrish, who formalized a Hoare logic for C in HOL [105].

SPARK [112] and ESC [90] were among the first tools to directly implement the weakest precondition calculus. Development of SPARK started in an academic setting, was further extended and refined in an industrial setting, and is now maintained and marketed by AdaCore and Altran. SPARK realizes a VCG for (a safety-critical subset of) Ada and is still actively developed [60]. The ESC (Extended Static Checker) tool originally targeted Modula-3, but was then adapted to Java [88]. ESC was designed with automation in mind: it traded off soundness and completeness against the capability to quickly identify possible problems in a program, thus providing the programmer with useful feedback.

Another early implementation of the weakest precondition calculus was provided in the B Toolkit [113], which realized tool support for the B Method [1]. The B Method is based on successive refinement of a sequence of abstract state machines—weakest precondition reasoning is used to establish invariants, preconditions, and intermediate assertions for a state machine. The B Method is one of the industrially most successful formal methods (see [93] for an overview); however, it is not a deductive software verification approach in the sense of this article and, for this reason, is not discussed further.

The KIV system [51] was the first interactive program verifier based on dynamic logic, an expressive program logic that can be viewed as the syntactic closure of the language of Hoare triples with respect to first-order connectives and quantifiers [54]. It formalizes Burstall’s [31] approach as a dynamic logic calculus whose rules mimic a symbolic interpreter [55]. Induction rule schemata permit complete symbolic execution of loops. KIV is still actively being developed; much effort has been put into automation and into an expressive specification language based on higher-order algebraic specifications [45]. It has been used for verification of smart card applications and the Flashix file system.

ACL2 (A Computational Logic for Applicative Common Lisp) is a program verification tool for Lisp [78]. Like other members of the Boyer-Moore family of provers, it has a small trusted core, and all other proof rules are built on top of this trusted core and cannot introduce inconsistencies. Its main proof strategies are based on induction and rewriting. The ACL2 prover is actively developed. It has been used to verify properties of, for example, models of microprocessors, microcode, the Sun Java Virtual Machine, and operating system kernels.

STeP, the Stanford Temporal Prover, used a combination of deductive and algorithmic techniques to verify temporal logic properties of reactive and real-time systems. It featured a set of verification rules which reduce temporal properties of systems to first-order verification conditions and implemented several techniques for automated invariant generation [19].

4 From LNCS 1750 to LNCS 10000

A Deductive Verification Community. After the year 2000, we see a gradual change from tools developed in isolation to a community of deductive software verification tool developers and users. Within this community, there is active exchange and discussion of ideas and knowledge. Effort has been put into standardizing specification languages, notably JML, now used by most contemporary tools aiming at verification of Java. Further, the VS-Comp and VerifyThis program verification competitions have been established, where the developers and users of various deductive verification tools are challenged to solve verification problems within a limited time frame [68]. After the competition, participants present their solutions to each other, which leads to substantial cross-fertilization.

Deductive Verification Architectures. As mentioned above, there are two main approaches for the construction of deductive verification tools: VCG and symbolic execution. Tools based on VCG use transformation rules to reduce an annotated program to a set of verification conditions whose correctness entails correctness of the annotated program. Tools that use symbolic execution collect constraints on the program execution by executing the program with symbolic variables. If the collected constraints can be fulfilled and imply the annotations at each symbolic state, then the annotated program is correct. Both approaches can be formalized within suitable program logics.
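To make the contrast concrete, consider the following small example (a hedged sketch of ours, not taken from the cited work); the comments indicate how each kind of tool would treat it:

public class Max {

    /*@ public normal_behavior
      @   ensures \result >= x && \result >= y;
      @   ensures \result == x || \result == y;
      @*/
    public static int max(int x, int y) {
        // A symbolic execution engine starts from symbolic values x0, y0 and splits
        // into two branches with path constraints x0 >= y0 and x0 < y0; in each final
        // symbolic state it checks that the ensures clauses hold.
        // A VCG instead pushes the postcondition backwards through the body and emits
        // a single verification condition, roughly:
        //   (x >= y ==> x >= x && x >= y && (x == x || x == y)) &&
        //   (x <  y ==> y >= x && y >= y && (y == x || y == y))
        if (x >= y) {
            return x;
        } else {
            return y;
        }
    }
}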

Kassios et al. [77] report that symbolic execution tends to be faster than VCG, but the former is sometimes less complete and occasionally suffers from path explosion. However, the completeness issue seems to derive from the specific architecture of the symbolic execution tool that was used in their study, which relies on an inherently incomplete separation of heap reasoning and arithmetic SMT solving. Path explosion, however, is clearly an issue for symbolic execution of complex target code [39]. It was recently shown that it can be mitigated with symbolic state merging techniques [117].

Long-Running Deductive Verification Projects. Several tools whose development started around the year 2000 still exist currently, or evolved into new tools. We sketch the development of some of these tools.

Work on the KeY tool [3] started in 1998 [53] and it has been actively developed ever since. Like KIV, KeY is based on symbolic execution formalized in dynamic logic, but it extends the KIV approach to contract-based verification of Java programs and uses loop invariants as a specific form of induction that is more amenable to automation. KeY is not merely focused on functional verification, but complements it with debugging and visualization [3, Chap. 11] and test generation [3, Chap. 12]. It covers the complete JavaCard language [102] and was used to identify a bug in the Timsort algorithm [39], the standard sorting algorithm provided in the Oracle JDK, Python, Android, and other frameworks.

The development of ESC/Java [88] was taken over by David Cok and Joe Kiniry, resulting in ESC/Java2 [38]. Initially, their goal was to bring ESC/Java up-to-date, as well as to provide support for a larger part of JML and more Java features. ESC/Java2 is not actively developed as a separate tool anymore, however, it formed the foundation for the static verification support in the OpenJML framework [37]. Over the years, the proving capabilities of the static verification support in OpenJML have been strengthened. Like ESC/Java, it still prioritizes a high degree of automation, but soundness is not traded off anymore. OpenJML offers not merely support for static verification, but also for runtime verification.

The original ESC/Java development team around Rustan Leino moved in a different direction. In 2004, they presented Spec#, a deductive verification tool for C# [11], which reused much of the philosophy of ESC/Java. In parallel to the development of Spec#, the team also designed Boogie as an intermediate language for static verification [10]. Boogie is a very simple programming language, for which it is straightforward to build correct verification tools. To provide support for more advanced programming languages, it is sufficient to define an encoding into Boogie. Boogie is used as the intermediate verification language for various programming languages, including Java (in OpenJML), Java bytecode [86], and C# (in Spec#). After the work on Spec# and Boogie, Leino took a slightly different approach: instead of developing a verification tool for an existing programming language, he designed Dafny, a programming language with built-in support for specification and verification [89], which in particular supports dynamic frames [76].

Another widely used intermediate language is Why3 [24] which nowadays is used as a backend for SPARK 2014, the current version of SPARK/Ada [81], and Frama-C, a tool for the verification of C programs [80], specified with the JML-like language ACSL. Its original version (Why [47]) has been used as a backend for Krakatoa [98] (for Java programs) and Jessie (for C programs). Frama-C provides more than mere deductive verification: it also supports runtime verification, and it contains analysis tools such as a slicer and a tool for dependency analysis. Much attention is given to the combination and interaction between these tools, for example how testing can be used automatically to understand why a proof fails [109]. Intermediate languages in the context of model checking are discussed in the article Software Architecture of Modern Model-Checkers in this issue.

A final example is the Infer tool [32], which supports fully automated deductive verification techniques to reason about memory safety properties of C programs. Infer uses separation logic, an extension of classical Hoare logic, which is especially suited to reasoning about pointer programs. The development of separation logic resulted in the creation of a series of research prototype tools (Smallfoot, Space Invader, Abductor) as a way to automatically analyze memory safety of programs. As the focus of Infer is on a restricted set of properties, specifications are not required (but it is possible to obtain the specifications that Infer derives during the analysis). Infer is integrated in the Facebook code inspection chain, and is used as one of the standard checks before code changes are committed.

All tools mentioned above have their specific strengths and weaknesses. However, they share the goal of verifying programs written in realistic programming languages, and they have made substantial progress in this direction. Several of the tools mentioned above are used in undergraduate teaching (at both Bachelor and Master level). Importantly, this does not happen only at the universities of the tools’ own developers, but also at other universities where lecturers find it important to teach their students state-of-the-art techniques that can help to improve software reliability.

There exist many verification case studies, where unmodified (library) code was annotated and verified, and often bugs were discovered, see e.g. [39, 79, 102, 111, 116]. Despite those success stories, there is a growing realization that post-hoc verification and, in particular, specification, remains difficult and challenging, and that there always is a trade-off between the verification effort and the level of reliability that is required for an application. A result of this realization is that we see a shift of emphasis from proving correctness of an application to bug-finding and program understanding.

5 Achievements and Challenges: Technical

5.1 Specification Languages

Deductive verification starts with specifying what should be verified, i.e., what behaviour we expect from the implementation. This is where the specification language comes into play.

In essence, expected program behaviour is described in the form of a method contract: a precondition specifies the assumptions under which a method may be called; a postcondition specifies what is achieved by its implementation, e.g., the computed result, or its effect on the global state. Eiffel was the first mainstream programming language that featured such method specifications [100].

Achievements. For the deductive software verification community, the design of JML, the Java Modeling Language [66, 84], has been a major achievement. Figures 1 and 2 in Sect. 2 illustrate typical JML specifications. JML features method contracts, similar to Eiffel, but in addition provides support for more high-level specification constructs for object-oriented programming languages, such as class invariants, model elements, and history constraints [94]. One of the important design principles of JML is that its notation is similar to Java. Properties in JML are basically Java expressions of Boolean type, and only a few specification-only constructs, such as quantification and implication, have been added. As a result, JML specifications have a familiar look and feel, and can easily be understood. JML is also used as a specification language for other formal validation techniques, such as test case generation and runtime assertion checking, which further increases its usability in the software development process.

JML is a rich specification language in which complex specifications can be expressed. It provides extensive support for abstraction in the form of a fully-fledged theory of model specification elements, based on the idea of data abstraction as introduced by Hoare [62]. The principles behind this are old, but JML turns them into a technique that can be used in practice. Abstraction allows a clear separation of concerns between specification language and implementation [33], and increases portability of specifications.

The design of JML has been influential in the design of other specification languages for deductive verification, such as the ANSI/ISO C Specification Language (ACSL), which is used in the Frama-C project [80], and the Spec# specification language for C# [11] and its spin-off Code Contracts [96].

Challenges. A central problem of deductive verification is that specifications cannot be as declarative and abstract as one would like them to be if verification proofs are to succeed. Specifications become polluted with intermediate assertions and implementation properties that are necessary as hints for the verification engine. This becomes problematic in the verification of large code bases and is exacerbated by the usage of off-the-shelf libraries. To improve the situation, we believe attention should be given to the following two challenges:

  1. S.1

    Provide specifications for widely-used APIs. At the very least, these should describe under which circumstances methods will (not) produce exceptions. For specific APIs, such as the standard Java collection library, functional specifications describing their intended behaviour are also required (a hedged sketch of such an exception-oriented contract is shown after this list). This task is work-intensive and has little (direct) scientific reward. It is, therefore, difficult to find funding to conduct the required work, see also challenge F.1.

  2. S.2

    Develop techniques to infer specifications from code in a (semi-)automated manner. Many specification details that currently have to be spelled out explicitly can actually be inferred from the code (as illustrated in the example of Sect. 2). There is work on specification generation [63, 101], but it is not integrated into deductive verification frameworks (see challenge I.9).
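As an illustration of challenge S.1, the following sketch (ours; the interface name and method signatures are hypothetical, not an official library specification) shows a JML contract stating when a list lookup does or does not throw an IndexOutOfBoundsException:

public interface IntList {

    // Pure query that serves as an abstraction of the list size in specifications.
    /*@ public normal_behavior
      @   ensures \result >= 0;
      @*/
    /*@ pure @*/ int size();

    /*@ public normal_behavior
      @   requires 0 <= i && i < size();
      @   assignable \nothing;
      @ also
      @ public exceptional_behavior
      @   requires i < 0 || i >= size();
      @   assignable \nothing;
      @   signals_only IndexOutOfBoundsException;
      @*/
    int get(int i);
}

A full functional specification would additionally relate the result of get to the list contents, e.g., via model fields.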

5.2 Integration

Integration aspects of formal verification appear on at least three levels. On the most elementary level, there is the software engineering aspect of tool integration and reuse. Then there is the aspect of integrating different methods and analyses to combine their complementary strengths. Finally, there is the challenge to integrate formal verification technology into an existing production environment such that added value is perceived by its users. We discuss each aspect in turn in the following subsections.

Tool Integration and Reuse. Software reuse is still considered to be a challenging technology in Software Engineering in general. Therefore, it is not surprising that this is the case for formal verification in particular. The situation is exacerbated there due to the complexity of interfaces and data structures.

Achievements. One success story of tool reuse in deductive verification is centered around Boogie [10] (see also Sect. 4), an intermediate specification and verification language and VCG tool chain, most often complemented by the SMT solver Z3 [40] as its backend. Boogie is a minimalist language, optimized for formal verification. It is used as a backend in several verification tool chains, including Chalice [87], Dafny [89], Spec# [11], and VCC [36]. More recently, the intermediate verification language Silver [104], which has built-in support for permission-based reasoning, also reuses Boogie as one of its backends. In addition, it comes with its own verification backend, a symbolic-execution-based tool called Silicon. Interestingly, Silver, in turn, is used as a backend in the VerCors platform [8] for reasoning about concurrent Java and OpenCL programs. Similarly, but with less extensive reuse, the WhyML intermediate verification language is used in the verification systems Frama-C [80] and Krakatoa [47]. Recently, a translation from Boogie to WhyML was presented [5] that links both strands. The state of the art on tool integration in the model checking domain is discussed in the article Software Architecture of Modern Model-Checkers in this issue.

Challenges. Intermediate verification languages are good reuse candidates, because they are small and have a clear semantics. In addition, compilation is a well-understood, mainstream technology with excellent tool support. This makes it relatively easy to implement new frontends. On the other hand, tool reuse at the “user level”, for example, for JML/Java or ACSL/C is much harder to achieve and we are not aware of any significant case.

  1. I.1

    Equip frontend (JML, Java, ACSL, C, ...) as well as backend (Boogie, Silver, Why, ...) languages with precise, preferably formal, semantics. In the case of complex frontend languages this involves identifying a “core” that must then be supported by all tools.

  2. I.2

    Equip formal verification tools with a clear, modular structure and offer their functionality in well-documented APIs. This is a work-intensive task with few scientific rewards and, therefore, closely related to Challenge F.1.

  3. I.3

    Establish and maintain a tool integration community, to foster work on reuse and increase its appreciation as a valuable contribution.

Method Integration. Arguably, one of the largest, self-imposed stumbling blocks of formal methods has been the propagation of monolithic approaches. At least in deductive verification, it became very clear within the last decade that software development, formal specification, formal verification, runtime verification, test case generation, and debugging are not separate activities, but they have to be done in concert. At the same time, formal specifications have to be incrementally developed and debugged just as the pieces of code whose behavior they describe. This is now commonly accepted in the community, even if the infrastructure is not there yet; however, there are encouraging efforts.

Achievements. It is impossible to exhaustively list the flurry of papers that recently combined formal verification with, for instance, abstract interpretation [117], debugging [58], invariant generation [82], software IDEs [92], and testing [109].

Most deductive verification tools (as well as proof assistants) provide an interface to SMT solvers via the SMT-LIB [12] standard. There is growing interest in formal verification from the first-order theorem proving community where tools can be integrated via the TPTP standard [119]. There is also work towards the exchange of correctness witnesses among verifiers [17].

An interesting recent trend is that specialized verification and static analysis tools are being equipped with more general techniques. For example, the termination analysis tool AProVE [50] as well as the safety verification tool CPAchecker [18] both implement a symbolic execution engine to improve their precision. We observe that boundaries between different verification subcommunities that used to be demarcated by different methods and tools are dissolving.

Challenges. In addition to the tool integration challenges mentioned above, on the methodological level questions of semantics and usability arise. To mention just one example, there is a plethora of approaches to loop invariant generation, see e.g., [46, 63, 114]. All of them come with certain limitations. They tend to be driven by the technology they employ, not by applications, and they are designed as stand-alone tools. This makes their effective usage very difficult.

Another area from whose integration deductive verification could benefit is machine learning, specifically, automata learning (see also the article Combining Black-Box and White-Box Techniques for Learning Register Automata in this issue).

  1. I.4

    Calls to auxiliary tools must return certificates, which must be re-interpreted in the caller’s correctness framework. This is necessary to ensure correctness arguments without gaps.

  2. I.5

    The semantic assumptions on which different analysis methods are based must be spelled out, so that it is possible to combine different approaches in a sound manner. Some work in this direction has been done for the .NET static analyzer Clousot [35], but such investigations should be done on a much larger scale.

  3. I.6

    There is a plethora of possible combinations of tools and methods. So far, method and tool integration is very much ad hoc. There should be a systematic investigation about which combinations make methodological sense, what their expected impact is, and what effort their realization would require.

  4. I.7

    A research community working on method integration should be established.

Integration with the Software Production Environment. It is very difficult to integrate software verification technology into a production environment. Some of the reasons are of a non-technical nature and concern, for example, usability or the production context. These are explored in Sect. 6 below. Another issue might be the lack of coverage, see Sect. 5.3. In the following, we concentrate on processes and work flows.

Achievements. Our guiding question is: How can formal software verification be usefully integrated into a software development process? The emerging integration of verification, test generation, and debugging aspects into single tool chains, as described above, is an encouraging development. We begin to see deductive verification tools that are intentionally presented as enhanced software development environments, for example, the Symbolic Execution Debugger (SED) [56] based on Eclipse or the Dafny IDE [91] based on MS Visual Studio.

Several verification tools support users in keeping track of open proof obligations [59, 80, 91] after changes to the code or specification. This is essential to support incremental software development, but not sufficient. To realize versioning and team-based development of verified software, it is necessary to generalize code repositories into proof repositories [30]: a commit computes not merely changes, but a minimal set of new proof obligations that arise as a consequence of what was changed.

Another issue is that most verification attempts fail at first. It often requires many tries to render a complex verification target provable [39]. It is crucial to provide feedback to the user about the possible cause behind a failed proof. Systems such as KeY [3] can provide symbolic counter examples, and SED [56] computes symbolic backward slices from failure nodes in symbolic evaluation trees. The system StaDy [109] goes beyond this and uses dynamic verification to analyze failed proofs. The StaRVOOrS framework [34] generates optimized runtime assertion monitors for the unprovable parts of a specification.

In the context of commercial software production one can question whether functional verification is a worthwhile and realistic goal in the first place. Arguably, for safety- and security-critical code, as well as for software libraries used by millions, it is, but probably not for every kind of software. However, this does not mean that formal verification technology is restricted to the niches mentioned above, because there are many relevant formal verification scenarios in addition to functional verification, notably: bug finding (discussed in Sect. 4) [15], information flow [44], and symbolic fault injection [83].

Challenges. The nature of software development is mostly incremental and evolutionary, and this must be accounted for by formal verification technology when used in commercial production. This is not the case at the moment.

Perhaps the biggest obstacle in functional verification is the lack of detailed enough specification annotations in the form of contracts and loop invariants. Without contracts, in particular for library methods, deductive verification does not scale. For some verification scenarios less precise annotations will do, but in general this is a huge bottleneck [13].

  1. I.8

    Implement proof repositories that support incremental and evolutionary verification and integrate them with verification tools.

  2. I.9

    Integrate automated specification generation techniques into the verification process.

5.3 Coverage

To make sure that deductive verification tools are practically usable, they need to support verification of a substantial part of the programming language. This means that for every construct of the programming language, verification techniques need to be developed (or at least, clear boundaries have to be provided, detailing what is covered, and what is not). Moreover, once the verification techniques are there, all variations of the programming language construct need to have tool support. Developing suitable verification techniques is typically a scientific challenge, but ensuring that a tool supports all variations of a language construct is mainly an engineering issue. If a language construct is not supported, preferably the tool design is such that it gracefully ignores the non-understood construct, and warns the user about this.

Achievements. State-of-the-art tools for deductive verification currently cover a very large part of the sequential fragment of industrially-used languages. To mention a few: OpenJML [37], KeY [3] and KIV [45] for Java, Frama-C [80], VeriFast [72] and Infer [32] for C, AutoProof [120] for Eiffel, and SPARK [60] for Ada. These tools are mature enough to verify non-trivial software applications, and to identify real bugs in them, as discussed in Sect. 4. However, for more advanced language features such as reflection, and recent features such as lambdas in Java, verification technology still has to be developed (and thus, is currently not supported by these tools).

To provide tool support for a realistic programming language entails verification techniques for reasoning about, for example, integer types (including overflow) [16], reference types, and exceptions [70]. Some of these, for example support for reasoning about exceptions, have become mainstream and are built into all modern deductive verification tools. In contrast, precise reasoning about integers, including overflow, often clutters up specifications and renders verification much harder. Therefore, many deductive verification tools abstract away from it, or provide it as an optional feature.
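The following sketch (ours, not taken from the cited work) illustrates why the choice of integer semantics matters; the contracts are deliberately minimal:

public class Midpoint {

    /*@ public normal_behavior
      @   requires 0 <= low && low <= high;
      @   ensures low <= \result && \result <= high;
      @*/
    public static int midUnsafe(int low, int high) {
        // Verifies if int is idealized as an unbounded integer; with overflow-aware
        // semantics, low + high can wrap around (e.g. low = 1, high = Integer.MAX_VALUE),
        // making the result negative and the ensures clause unprovable.
        return (low + high) / 2;
    }

    /*@ public normal_behavior
      @   requires 0 <= low && low <= high;
      @   ensures low <= \result && \result <= high;
      @*/
    public static int midSafe(int low, int high) {
        // high - low cannot overflow because 0 <= low <= high, so this variant
        // verifies under both integer semantics.
        return low + (high - low) / 2;
    }
}

The unsafe variant is the classic midpoint computation found in many binary search implementations; it only verifies when int is treated as a mathematical integer.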

There is active research on how to extend deductive verification to concurrent software. This opens up a whole new range of problems, because one has to consider all possible interleavings of the different program threads. Pen-and-paper verification techniques have existed for a long time [75, 107]; however, tool support for them remained a challenge.

The advent of concurrent separation logic [28, 106] gave an important boost, as it enabled modular verification of individual threads in a (relatively) simple way. This has given rise to a plethora of new program logics to reason about both coarse-grained and fine-grained concurrency, see [29] for an overview. Also variations of separation logic for relaxed memory models have been proposed [121, 124]. However, most of these logics still lack tool support.

In parallel to the theoretical developments, the basic ideas of concurrent separation logic, extended with permissions [25, 26], started to find their way into deductive verification tools. Existing tools such as VeriFast [72], VerCors [9, 20] and VCC [36] support verification of data race-freedom for different programming languages, using both re-entrant locks [6] and atomic operations as synchronisation primitives [7, 71]. Current investigations focus on the verification of functional properties of concurrent software by means of abstraction [23]. In addition to Java and C, the VerCors tool set also supports reasoning about OpenCL kernels, which use a different concurrency paradigm [22]. The KeY verifier also provides some support for reasoning interactively about data race freedom of concurrent applications [103]. This approach can be used in addition to VeriFast and VerCors, and is particularly suitable for tracing the source of a failing verification.

There also exist alternative verification techniques for concurrent software that use a restricted setup to achieve their goals. In particular, Cave [123] automatically proves memory safety and linearizability using an automated inference algorithm for RGSep, a combination of rely-guarantee reasoning and separation logic [125]. Just as the Infer tool mentioned above, it achieves automation by restricting the class of properties that can be verified. Another alternative line of work is to investigate more restricted concurrency models that allow near-sequential verification techniques. This is the approach advocated in ABS [74] which supports cooperative multitasking with explicit scheduling points [43].

Challenges. The main challenges with respect to coverage go into two different directions: one is to cover more aspects of the programming languages already supported; the other is to cover new classes of programming languages.

  1. C.1

    Precise verification of floating point numbers is essential for many algorithms, in particular in domains such as avionics. There is preliminary work [108], but a full-fledged implementation of floating point numbers in deductive verification systems is not yet available. A promising recent breakthrough is an automatable formal semantics for floating point numbers [27], which has also found its way into SMT-LIB and the SMT competition.

  2. C.2

    Tool support for verification of concurrent software is still in its infancy. We need further developments in two directions: (1) automated support for proving functional properties of fine-grained concurrent programs, which does not require an overwhelming amount of complex annotations and can be used by non-experts in formal verification, and (2) verification techniques for relaxed memory models that resemble realistic hardware-supported concurrent execution models.

  3. C.3

    Reasoning techniques for programs that use reflection are necessary for application scenarios such as the analysis of obfuscated malware, or of dynamic software updates.

  4. C.4

    The rapid evolution of industrial programming languages (e.g., substantial new features are added to Java every 2–3 years) is a challenge for tools that are maintained with the limited manpower of academic research groups. Translation to intermediate languages is one way out, but makes it harder to provide feedback at the source level. Ulbrich [122] suggested a systematic framework for combining deductive verification at the intermediate language level with user interaction at the source level, but it has yet to be integrated into a major tool.

  5. C.5

    Deductive verification technology is not merely applicable to software, but also to cyber-physical systems, as they exhibit similar properties [52]. Computational engineers are mainly working with partial differential equations to describe their systems, and they implement these in C, MATLAB, etc. There are some results and tools for deductive verification of hybrid systems [49]. Hybrid systems have been traditionally modeled with differential equations (see the article Multi-Mode DAE Models: Challenges, Theory and Implementation in this issue) and automata-based techniques (article Continuous-time Models for System Design and Analysis in this issue). It is an open problem to find out how these different methodological approaches relate to and could benefit from each other.

6 Achievements and Challenges: Non-technical

6.1 Usability

Research in formal verification is method- and tool-driven. As a consequence, the effectiveness of a novel method or a new tool is usually simply claimed without justification or, at best, underpinned by citing execution statistics. The latter are often micro benchmarks carried out on small language fragments. The best cases are industrial case studies, which may or may not be representative; in nearly all publications these are performed by the researchers and tool builders themselves, not by the intended users.

To convince industrial stakeholders of the usefulness of a formal verification approach, it is not only necessary to demonstrate that it can fit into the existing development environment (see Sect. 5.2), but also to argue that one can solve tasks more effectively or faster than with a conventional solution. This is only possible with the help of experimental user and usability studies.

Achievements. There are very few usability studies around formal verification tools. We know of an evaluation of KeY and Isabelle based on focus groups [14], while the papers [21, 57, 69] contain user studies or analyses on API usage, prover interfaces, and proof critics, respectively. There are a few papers that attempt to construct user models or elicit user expectations, but [57] seems to be the only experimental user study so far that investigated the impact of design decisions taken in a verification system on user performance.

Challenges. To guide research about formal verification so that it has impact on industrial practice, it is essential to back up claims on increased effectiveness or productivity with controlled user experiments. This has been proven to be beneficial in the fields of Software Engineering and Computer Security.

  1. U.1

    Claims about increased effectiveness or productivity attributed to new methods or tools should be backed up by experimental user studies.

  2. U.2

    Establish the paper category Experimental User Study as an acceptable kind of submission in formal verification conferences and journals.

6.2 Funding

To support formal verification of industrial languages in real applications requires a sustained effort over many years. As detailed in the previous sections, to specify and to reason about programs means that the semantics of the language they are written in must be fully and deeply understood, solutions for inference and its automation must be found, suitable specification abstractions must be discovered. To formulate appropriate theoretical and methodological underpinnings took decades and the process is still not complete for complex aspects such as floating point types and weak memory models (Sect. 5.3).

The road from the first axiomatic descriptions of program logics (Sect. 3) to the verification of software written in major programming languages that are actually in use was long, and we are far from its end. It takes a long view, much patience, and careful documentation to avoid “re-invention of the wheel” or even regression. Tool building is particularly expensive and can take decades. To protect these large investments and to ensure measurable progress, long-term projects turned out to be most suitable.

Achievements. There are several long-term projects in deductive software verification that have sufficiently matured to enable industrial applicability (see also Sect. 4). We mention ACL2, Boogie, KeY, KIV, OpenJML, SparkPro, and Why/Krakatoa.

Challenges. Some of the long-term projects mentioned above are supported by research labs with strong industrial ties (Altran, INRIA, MSR). Unfortunately, neither the trend to embedded industrial research nor the current climate of academic funding is very well suited for this kind of enterprise. The challenge for ambitious projects, such as DeepSpec, is their continuation after the initial funding runs out. It is worrying that all existing long-term academic projects on deductive software verification were started before 2000. Further detrimental factors to long-term, engineering-heavy projects are the publication requirements for tenured positions in Computer Science as well as the unrealistic expectations of short-term impact demanded by many funding agencies. Successful long-term research is not “disruptive” in its nature, but slowly and systematically builds on previous results. On the other hand, usability aspects of formal verification are hardly ever evaluated.

  1. F.1

    The academic reward system should give incentives for practical achievements and for long-term success (see [4] for some concrete suggestions how this could be achieved).

  2. F.2

    Large parts of Computer Science should be classified and treated as an Engineering or Experimental Science with an according funding model. Specifically, there needs to be funding for auxiliary personnel (professional software developers) and for software maintenance: complex software systems should be viewed like expensive equipment, such as particle colliders. The base level of funding should be that of an engineering or experimental science, not a mathematical science.

  3. F.3

    Grant proposals should foresee and include funding to carry out systematic experimental studies, also involving users. For example, money to reward the participants of user studies must be allocated.

6.3 Industrial and Societal Context

The best prospects for industrial take-up of deductive verification technology are in application areas that are characterized by high demands on software quality. This is clearly the case for safety- and security-critical domains that are regulated by formal standards overseen by certification authorities.

In many other application domains, however, timely delivery or new features are considered to be more important than quality. A contributing factor is certainly the relatively weak legal regulation of software liability. With the ongoing global trend towards digitalization, however, we might experience a surge in software that can be deemed safety- or security-critical, in particular in the embedded market (e.g., self-driving cars [2], IoT). On the other hand, that market is partially characterized by a strong vendor lock-in in the form of modeling tools such as MATLAB/Simulink, which have no formal foundations. An interesting side effect of digitalization is the arrival of companies on the software market that so far had no major stake in software. Here is an opportunity for formal methods and formal verification in particular, since software verification tools are also applicable to cyber-physical systems [52, 73] (see Challenge C.5).

Formal specification and deductive verification methods are expressed relative to a target programming language. New features of languages such as C/C++ or Java are not introduced with an eye on verifiability, making formal verification and coverage unnecessarily difficult.

Achievements. The latest version of the DO-178C standard [115], which is the basis for certification for avionics products, contains the Formal Methods Supplement DO-333 that permits formal methods to complement testing. This makes it, in principle, possible to argue that formal verification can speed up or decrease the cost of certification.

The development of the concurrent modeling language ABS [74] demonstrated that it is possible to design a complex programming language with many advanced features that has an associated verification tool box with high coverage [127], provided that analyzability and verifiability are taken into account during language design.

Challenges. In order to ensure substantial impact of deductive software verification in society and industry, a coordinated effort is necessary to influence standardization and certification activities.

  1. ISC.1

    Researchers from the formal verification area should become involved in language standardization. In general, research in the fields of programming languages and formal verification must be better coordinated.

  2. ISC.2

    Researchers from the formal verification area should become actively involved in the standardization efforts of certification authorities.

  3. ISC.3

    Specific quality assurance measures for verification tools such as test coverage, incremental testing, external validation, etc. should be developed and applied. If deductive software verification should become usable in certification activities, the software quality of the verification tools themselves is a critical issue.

7 Summary

We described the progress made in the area of deductive software verification. Starting as a pen-and-paper activity in the late 1960s, deductive verification nowadays is a mature technique and it can substantially increase the reliability of software in actual production. Advanced tool support is available to reason about the behaviour of complex programs and library code, written in mainstream programming languages. Industrial applicability of deductive verification is witnessed by several success stories.

However, many challenges remain to be addressed before deductive verification can turn from an academic technique into a routine part of commercial software development processes. We divided these challenges into two categories: technical and non-technical. Technical challenges relate to what properties can be verified, what programs we can reason about, how we can make verification largely automatic, and how we can provide feedback when verification fails. Non-technical challenges relate to how we can fund all necessary engineering efforts, how we can ensure that tool developers get sufficient scientific credit, and how to convince industrial management that the extra effort needed for verification will actually be beneficial. We hope that these challenges can serve as an incentive for future research directions in deductive software verification.