1 Introduction

This paper is meant as a journey through the development of a technology that started as a completely intractable endeavor and now plays a key role in the success of various commercial projects (e.g., [10, 14, 51, 69, 125]).

We contribute this report to the Festschrift for Tom Henzinger, who has influenced the development in several ways. In particular, he led the Blast project with Ranjit Jhala and Rupak Majumdar, and he pushed for the convergence of data-flow analysis, model checking, and software testing.

Dirk came to UC Berkeley as a young postdoc to join Tom’s group. Dirk thought that he would work on topics such as timed and hybrid systems, but Tom asked him whether he would mind working on software model checking instead. He was immediately infected by the charm of the Blast project and has never stopped working in this area since, instantiating some of the joint ideas in the CPAchecker project and in the competition on software verification.

Andreas had discussed possible approaches with Tom in an earlier period, when software model checking did not yet exist but many people were thinking about it. Andreas distinctly remembers one discussion in front of the coffee machine at the Max Planck Institute for Computer Science in Saarbrücken, when he and Tom concluded that abstracting a program to a finite-state system seemed a bad idea (Tom, matter-of-factly: “a loser right from the start”). How inspiring a bad idea can be.

2 Timeline of Formal Verification of Software

This section outlines a few milestones in the area of software verification that we think were instrumental to its success and led to the breakthrough in technology.

2.1 Before 1962

First Insights (1880–1940). Mathematicians have long been concerned with verifying the consistency of arithmetic axioms. Giuseppe Peano described an axiom system for arithmetic [158], and David Hilbert wanted to know whether a contradiction can be generated after finitely many proof steps [113, 2nd problem]. The dream of a machine that can generate all truth ended after only a few decades, when Kurt Gödel showed that there are certain theorems that cannot be proven [101], and Alonzo Church and Alan Turing showed that our abilities to prove the correctness of programs are limited [62, 191]. Despite this initial ‘bad’ news, software verification can solve many interesting and practical problems. One approach is to restrict the proof system to a decidable theory, of which Presburger arithmetic [167] is a prominent and still often-used example.

Computing Machinery (1940s). Z3, the first working digital, automatic, and programmable computer, was constructed by Konrad Zuse and was ready to be used in 1941. It was a binary computer, built using relays. The second computer, ENIAC, was completed in 1944, based on vacuum tubes. Leibniz and Babbage also constructed computers, but theirs were not digital, automatic, and programmable. The unavailability of good hardware foundations had hindered the development of computers for a long time. But hardware was not the only missing enabling technology: the necessary theories in logic were also not yet sufficiently developed. Predicate logic was needed to prove that the halting problem of Turing machines is undecidable [191]. In parallel, Shannon showed how to implement Boolean algebra using electric circuits [185], which is still how computers are built today (just using transistors instead of relays, and somewhat smaller).

Assertions, Proof Decomposition, and Abstraction. As early as 1949, Alan Turing published a method —based on assertions— to prove the correctness of computer programs [192]. He wrote: “In order that the man who checks may not have too difficult a task the programmer should make a number of definite assertions which can be checked individually, and from which the correctness of the whole program easily follows.” Assertions are nowadays one of the most common notations to write invariants in software development.

Abstraction was considered the key to proving correctness, and assertions abstract the set of states at a certain location in the program. Konrad Zuse understood that programming requires abstract languages, and developed the first high-level programming language designed for a computer (Plankalkül [180]).

Craig Interpolation (1950s). But how to automatically compute abstractions? William Craig defined interpolation for logic formulas in 1957 [75]. Given two formulas \(\phi _1\) and \(\phi _2\) such that \(\phi _1\) implies \(\phi _2\), an interpolant for \(\phi _1\) and \(\phi _2\) is a formula \(\psi \) that is implied by \(\phi _1\) and that implies \(\phi _2\), and contains only symbols that occur in both \(\phi _1\) and \(\phi _2\). Applied to program verification, if \(\phi _1\) represents a program path and \(\phi _2\) represents a safety property, then the interpolant \(\psi \) is an abstraction of the program path \(\phi _1\) that (1) can be automatically constructed and (2) makes it potentially easier to prove that the property \(\phi _2\) holds.
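To make the definition concrete, the following is a minimal brute-force sketch (a toy example, not taken from any tool) that computes the strongest propositional interpolant for the assumed formulas \(\phi_1 = p \wedge q\) and \(\phi_2 = p \vee r\) by existentially projecting \(\phi_1\) onto the shared symbol \(p\):

```python
from itertools import product

def models(formula, variables):
    """All satisfying assignments of `formula` over `variables` (brute force)."""
    return [dict(zip(variables, bits))
            for bits in product([False, True], repeat=len(variables))
            if formula(dict(zip(variables, bits)))]

phi1 = lambda a: a["p"] and a["q"]       # phi1 = p AND q   (symbols p, q)
phi2 = lambda a: a["p"] or a["r"]        # phi2 = p OR r    (symbols p, r)
shared = ["p"]                           # symbols occurring in both formulas

# Strongest interpolant: existentially project phi1 onto the shared symbols.
proj = {tuple(m[v] for v in shared) for m in models(phi1, ["p", "q"])}
psi = lambda a: tuple(a[v] for v in shared) in proj

# Interpolant properties: phi1 implies psi, and psi implies phi2.
assert all(psi(m) for m in models(phi1, ["p", "q"]))
assert all(phi2({**m, "r": r})
           for m in models(psi, ["p"]) for r in (False, True))
```

Here the interpolant turns out to be simply \(p\); real interpolating provers extract such formulas from resolution proofs rather than by enumeration.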

2.2 After 1962

Decision Procedures (1960s). The advent of programmable computers and continuous advancement of the theory made it possible to implement automatic theorem proving [79, 80, 95, 166, 170]. The algorithm of Davis, Putnam, Logemann, and Loveland is still used today and led to the notion of decision procedures. Such decision procedures were further extended to combinations with other theories [153, 154] and led to the theorem prover Simplify [81], which was used as the backend in the Extended Static Checkers ESC/Java [88] and ESC/Modula-3 [139].
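The core of the Davis-Putnam-Logemann-Loveland procedure, unit propagation plus case splitting, fits in a few lines. The following is a deliberately minimal sketch (a toy, without the watched-literal and clause-learning machinery of modern solvers):

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL: clauses are sequences of nonzero ints; -v means "not v".
    Returns a satisfying assignment (dict var -> bool) or None if unsat."""
    assignment = dict(assignment or {})
    while True:                                    # unit propagation to a fixed point
        pending, unit = [], None
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                           # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return None                        # conflict: clause falsified
            if len(free) == 1 and unit is None:
                unit = free[0]                     # forced literal
            pending.append(clause)
        clauses = pending
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
    if not clauses:
        return assignment                          # all clauses satisfied: a model
    lit = next(l for c in clauses for l in c if abs(l) not in assignment)
    for value in (True, False):                    # case split on one variable
        model = dpll(clauses, {**assignment, abs(lit): value})
        if model is not None:
            return model
    return None
```

For example, `dpll([(1, 2), (-1, 3)])` returns a model, while `dpll([(1,), (-1,)])` returns `None`.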

Program Correctness. In the 1960s, the availability of computers led to an enormous growth of software production. At the same time, the fundamental principles of programming and engineering of large software systems were not yet sufficiently studied. The term software engineering was established and a conference held: the first NATO Software-Engineering Conference took place in 1968 in Garmisch, Germany. One of the solutions was to support software development with formal methods [82, 83, 89, 114, 143, 200], which established mathematically precise foundations of computer programming.

Data-Flow Analysis and Abstract States (1970s). In his famous POPL 1973 paper “A Unified Approach to Global Program Optimization” [126], Gary Kildall provided many of the technical ingredients of data-flow analysis that we still use today (fixed-point iteration, lattice operations, ...). The mathematical foundation for more general forms of program analysis was then given by Patrick and Radhia Cousot [74]. The general idea is to define an abstract domain via a lattice and then compute a fixed point in order to construct an overapproximation of the behavior of the program (see also [155]).
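The ingredients Kildall identified, a lattice of abstract values, transfer functions, and fixed-point iteration over a worklist, can be sketched on an assumed toy program (x = 1; if * then y = x + 1 else y = 2; z = y) with a constant-propagation domain:

```python
TOP = "TOP"   # lattice element for "not a constant"

def join(a, b):
    """Pointwise join of two abstract stores."""
    out = dict(a)
    for var, val in b.items():
        out[var] = val if out.get(var, val) == val else TOP
    return out

cfg = {   # node -> (transfer function, successors)
    1: (lambda s: {**s, "x": 1}, [2]),
    2: (lambda s: s, [3, 4]),                   # branch condition unknown
    3: (lambda s: {**s, "y": s["x"] + 1 if isinstance(s.get("x"), int) else TOP}, [5]),
    4: (lambda s: {**s, "y": 2}, [5]),
    5: (lambda s: {**s, "z": s.get("y", TOP)}, []),
}

state = {n: None for n in cfg}   # None = not yet reached
state[1] = {}
worklist = [1]
while worklist:                  # fixed-point iteration (Kildall-style)
    node = worklist.pop()
    transfer, succs = cfg[node]
    out = transfer(state[node])
    for succ in succs:
        new = out if state[succ] is None else join(state[succ], out)
        if new != state[succ]:
            state[succ] = new
            worklist.append(succ)

result = cfg[5][0](state[5])     # y is 2 on both branches, so z is constant
```

The analysis concludes that z is the constant 2 even though the branch taken is unknown, an overapproximation of all program behaviors in the sense of abstract interpretation.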

LTL and Model Checking (1980s). Zohar Manna, Amir Pnueli, Ed Clarke, Allen Emerson, and Joseph Sifakis contributed theoretical and conceptual foundations to the verification of systems (not only software systems), leading to the notion of model checking [68, 169]. Manna and Pnueli developed LTL as a specification language and used it to formally specify the behavior of a system using temporal logic [142]. Tools based on model checking became more and more important. Binary decision diagrams [3, 137] were extended to their shared and reduced versions by Randy Bryant [47], who introduced BDDs as a data structure with wide applicability in formal methods. For many years, the article by Randy Bryant was the most cited article in computer science. We refer to the Handbook of Model Checking [67] for overviews of specific topics on model checking, specifically temporal logic [160] and binary decision diagrams [48].
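The two rules that make BDDs canonical, eliminating redundant tests and sharing structurally identical nodes in a unique table, can be shown in a compact sketch (a toy hash-consed implementation, without the memoization and complement edges of production packages):

```python
class BDD:
    """A minimal shared, reduced, ordered BDD sketch.
    Terminals are 0 and 1; variables are positive integers, ordered by index."""
    def __init__(self):
        self.table = {}                     # unique table: (var, lo, hi) -> id
        self.nodes = {}                     # id -> (var, lo, hi)
        self.next_id = 2                    # ids 0 and 1 are the terminals

    def mk(self, var, lo, hi):
        if lo == hi:                        # reduction rule: redundant test
            return lo
        key = (var, lo, hi)
        if key not in self.table:           # sharing rule: one node per triple
            self.table[key] = self.next_id
            self.nodes[self.next_id] = key
            self.next_id += 1
        return self.table[key]

    def var(self, v):
        return self.mk(v, 0, 1)

    def apply(self, op, u, v):
        """Combine two BDDs with a binary Boolean operator (Shannon expansion)."""
        if u in (0, 1) and v in (0, 1):
            return int(op(bool(u), bool(v)))
        uvar, ulo, uhi = self.nodes[u] if u > 1 else (float("inf"), u, u)
        vvar, vlo, vhi = self.nodes[v] if v > 1 else (float("inf"), v, v)
        top = min(uvar, vvar)
        lo = self.apply(op, ulo if uvar == top else u, vlo if vvar == top else v)
        hi = self.apply(op, uhi if uvar == top else u, vhi if vvar == top else v)
        return self.mk(top, lo, hi)

# Canonicity makes equivalence checking trivial: equal functions get equal ids.
bdd = BDD()
p, q = bdd.var(1), bdd.var(2)
f = bdd.apply(lambda a, b: a and b, p, q)
g = bdd.apply(lambda a, b: b and a, q, p)
assert f == g
```

Canonicity is what made BDDs so effective for symbolic model checking: a fixed point is detected simply by comparing node identifiers.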

Symbolic Model Checking (1990s). While the 80s produced many of the theoretical foundations, the 90s brought verification algorithms to practice. Ken McMillan introduced BDDs as the data structure for symbolic model checking [50, 144]. BDDs and other symbolic state-space representations became an enabling technology to verify large systems.

Predicate Abstraction. In 1997, as a step towards connecting model checking with program verification, Susanne Graf and Hassen Saïdi developed a deduction-based method to partition the state space of a program according to an equivalence relation defined by a given finite set of state predicates [96]. We obtain a finite abstract system if we associate each block in the partition with an abstract state. The abstract system contains a transition between two blocks if the program has a transition between a state from one block to a state from the other block. If the state predicates and the program’s transition relation are represented by logical formulas, then the existence of such states in each of the two blocks reduces to the satisfiability of a logical formula (and this is how deduction comes in).
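The construction can be sketched on a tiny assumed example, where each satisfiability query is replaced by brute-force enumeration over a small concrete state space:

```python
# Concrete toy system (assumed for illustration): states are 0..7 and each
# step increments modulo 8.  Each abstract state is the bit-vector of
# predicate values of a concrete state, i.e., a block of the partition.
states = range(8)
def trans(x, y): return y == (x + 1) % 8

predicates = [lambda x: x < 4, lambda x: x % 2 == 0]

def block(x):
    """The abstract state (partition block) of a concrete state."""
    return tuple(p(x) for p in predicates)

# Existential abstraction: an abstract transition (a, b) exists iff some
# concrete transition leads from a state in block a to a state in block b;
# in the deduction-based setting this is one satisfiability check per pair.
abstract_trans = {(block(x), block(y))
                  for x in states for y in states if trans(x, y)}
```

For instance, the abstract transition from block (True, True) to (True, False) exists (witnessed by 0 → 1), while no transition connects (True, True) to itself, because incrementing flips parity.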

2.3 Software Model Checking

Tools for Software Model Checking (2000s). The time was ripe for software model checking. In summer 2000, Tom Ball and Sriram Rajamani, with help from others, notably Rupak Majumdar and Todd Millstein, developed Slam [10, 11, 14], a tool that performs an abstraction-refinement loop. In each iteration of the loop, the tool, using a first-order logic theorem prover, abstracts the given C program (with procedures, possibly recursive) for a given set of predicates. If an error path is found, it checks the feasibility of the sequence of transitions that corresponds to the error path in the abstract system by checking satisfiability. If the error path is infeasible, it uses the proof of unsatisfiability to derive new predicates for the refined abstraction in the next iteration of the loop. The notion of counterexample-guided abstraction refinement (CEGAR) was born, developed around the same time in the context of software programs [13] and in the context of finite-state systems [65, 66]. In fall 2000, Tom Henzinger, Ranjit Jhala, Rupak Majumdar, and Grégoire Sutre, with help from others, developed Blast [32, 111, 112], a tool that implements a similar abstraction refinement but circumvents the abstraction of the whole C program by lazily constructing an abstract reachability tree.
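The abstraction-refinement loop can be sketched end-to-end on a deliberately small, brute-force toy (not the Slam or Blast algorithms themselves): x starts at 0 and is repeatedly incremented by 2, and the "error" states satisfy x == 5. Real tools derive new predicates from unsatisfiability proofs; this sketch refines from a fixed, assumed predicate pool instead:

```python
STATES = range(8)
def init(x): return x == 0
def bad(x):  return x == 5
def step(x, y): return y == x + 2 and y in STATES

POOL = [lambda x: x == 5, lambda x: x % 2 == 0]   # assumed refinement pool

def cegar():
    preds = POOL[:1]
    while True:
        absf = lambda x: tuple(p(x) for p in preds)
        # 1. Existential abstraction for the current predicate set.
        atrans = {(absf(x), absf(y))
                  for x in STATES for y in STATES if step(x, y)}
        ainit = {absf(x) for x in STATES if init(x)}
        abad = {absf(x) for x in STATES if bad(x)}
        # 2. Search for an abstract error path (bounded BFS).
        queue, found = [[a] for a in ainit], None
        while queue and found is None:
            path = queue.pop(0)
            if path[-1] in abad:
                found = path
            elif len(path) <= len(STATES):
                queue += [path + [b] for (a, b) in atrans if a == path[-1]]
        if found is None:
            return ("safe", len(preds))
        # 3. Feasibility: does any concrete path match the abstract one?
        concrete = [x for x in STATES if init(x) and absf(x) == found[0]]
        for a in found[1:]:
            concrete = [y for x in concrete for y in STATES
                        if step(x, y) and absf(y) == a]
        if concrete:
            return ("unsafe", len(preds))
        preds = POOL[:len(preds) + 1]   # 4. spurious: refine and repeat
```

On this toy, the first abstraction (tracking only x == 5) yields a spurious counterexample; adding the parity predicate makes the abstraction precise enough to prove safety in the second iteration.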

These early developments received a lot of attention, and software model checking became a research topic in its own right. The Slam project paved the road for the success of software model checking in industrial software development. The success of Slam is witnessed by the Static Driver Verifier project, which is based on Slam and was used as part of Microsoft’s Windows Driver Development Kit in daily software production. The Blast project showed the effects of applying Craig interpolation to the abstraction-refinement process in program analysis [111], first using McMillan’s original FOCI library [146] and later the independently developed SMT solver CSIsat [40]. Later versions of Blast, which was by that time maintained by a different group [188], received gold medals in the category Device Drivers of the competition on software verification in 2012, 2014, and 2015. Both projects were highly influential in the research community: the PLDI ’01 paper on Slam [11] received a PLDI test-of-time award in 2011, and the POPL ’04 paper on Blast [111] received a POPL test-of-time award in 2014.

Also, as a sign of maturity, survey papers appeared, on software verification [85], on software model checking [121], and on deductive verification [19]. A recent survey addresses the current status of formal methods [92], and competition reports give an overview of the status of tools for software verification [30].

Satisfiability Modulo Theory. In the early 2000s, there was enormous progress in research on satisfiability (SAT), with the appearance of efficient implementations of algorithms for SAT solving, most notably Chaff [149]. Theory combinations led to the notion of satisfiability modulo theories (SMT), an integration of SAT with theories like linear arithmetic, bitvectors, and arrays. The SMT-LIB format for input formulas [16] facilitated the use of SMT tools. Some SMT solvers support interpolation; examples are CSIsat [40], MathSAT [46], and SMTInterpol [61].

Boolean and Cartesian Abstraction. At the beginning, when predicate abstraction was first used for the abstraction of C programs (by Slam and Blast), it was implemented by Cartesian abstraction [12]. At that time, disjunctions were not efficiently supported by automatic solvers such as Simplify [81]. Only later, interpolating SMT solvers such as CSIsat [40] and MathSAT [46] could handle disjunctions efficiently. Cartesian predicate abstraction seemed suitable as long as only simple program paths were encoded in path formulas. In connection with large-block encoding [31, 36], however, when it was possible to delegate large amounts of work (i.e., large formulas) to the SMT solvers, it became feasible to use Boolean abstraction [129] to implement predicate abstraction, as done in CPAchecker  [35].
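The precision gap between the two abstractions is easy to demonstrate on an assumed toy set of states phi = { (x, y) | x == y } with the predicates p1: x > 0 and p2: y > 0. Boolean abstraction keeps the correlation between the predicates (which needs a disjunction), while Cartesian abstraction treats each predicate independently:

```python
# Brute-force illustration over a small concrete domain (toy example).
states = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
phi = [(x, y) for (x, y) in states if x == y]          # set to abstract
preds = [lambda s: s[0] > 0, lambda s: s[1] > 0]       # p1: x > 0, p2: y > 0
vecs = [tuple(p(s) for p in preds) for s in phi]

# Boolean abstraction: the exact set of predicate valuations covered by phi,
# i.e., an arbitrary Boolean combination of the predicates.
boolean = set(vecs)

# Cartesian abstraction: a conjunction of literals -- keep a literal only if
# the predicate has the same value in *all* states of phi.
cartesian = []
for i in range(len(preds)):
    values = {v[i] for v in vecs}
    cartesian.append(values.pop() if len(values) == 1 else "?")  # "?" = unconstrained
```

Here the Boolean abstraction is the formula p1 ⟺ p2, whereas the Cartesian abstraction constrains neither predicate and thus loses all information, which is why Boolean abstraction only became practical once solvers handled disjunctions efficiently.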

Verification with Interpolants. Ken McMillan published how to use Craig interpolation [75] for finding abstract descriptions of the behavior of transition systems [145]. Later, Craig interpolation was also applied to program paths, in order to automatically learn abstractions for the verification of computer programs [111]. For every program path we can construct a formula such that the program path is infeasible (there is no execution of all statements along the path) if and only if the formula is unsatisfiable. Assume that we have split a given infeasible program path at a certain program location, and the path prefix and the path suffix correspond to the path formulas \(\phi ^{pre}\) and \(\phi ^{post}\), respectively. Then \(\phi ^{pre}\) implies the negation of \(\phi ^{post}\). The interpolant \(\psi = itp(\phi ^{pre}, \lnot \phi ^{post})\) represents an abstraction; i.e., it describes what we need to know about the states after executing the path prefix in order to derive that continuing the execution of the path suffix is not possible. Ken McMillan also developed the first tool for automatically computing interpolants [146]. An overview of the use of interpolation for verification is given in a chapter of the Handbook of Model Checking [147].
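A worked toy instance (assumed for illustration): take the path x := 0; y := x + 1, split before the failing assertion y > 0, and check over a small integer domain that the candidate interpolant psi: y >= 1 has the required properties. Note that psi mentions only y, the symbol shared by prefix and suffix:

```python
# phi_pre encodes the path prefix; phi_post encodes that the suffix
# (reaching the assertion violation) is still possible, i.e., not (y > 0).
DOMAIN = range(-4, 5)

def phi_pre(x, y):  return x == 0 and y == x + 1
def phi_post(y):    return not (y > 0)
def psi(y):         return y >= 1          # candidate interpolant, only over y

# phi_pre implies psi: every state after the prefix satisfies psi.
assert all(psi(y) for x in DOMAIN for y in DOMAIN if phi_pre(x, y))
# psi and phi_post are together unsatisfiable: from psi alone we can
# already conclude that the suffix cannot be executed.
assert not any(phi_post(y) for y in DOMAIN if psi(y))
```

The abstraction forgets x entirely: y >= 1 is all we need to know after the prefix, which is exactly the role interpolants play in lazy abstraction refinement.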

Trace Abstraction. As an alternative to constructing a sequence of (more and more refined) abstractions (abstract systems or abstract reachability graphs), the approach of trace abstraction [108] is to construct a sequence of programs until all paths of the input program are covered. Each program in the sequence is constructed from the proof of the infeasibility of a (spurious) counterexample. The covering check can be reduced to automata inclusion.
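The covering check at the heart of this approach is language inclusion. For deterministic finite automata it reduces to a reachability search in the product, as in this sketch (a toy over an assumed alphabet of statements 'i' and 'd'; the automata in trace abstraction are built from infeasibility proofs, which we do not model here):

```python
def included(A, B, alphabet):
    """Is L(A) a subset of L(B)?  Search the product for a word that A
    accepts and B rejects.  Each DFA is (initial state, total transition
    dict keyed by (state, symbol), accepting set)."""
    (qa0, da, fa), (qb0, db, fb) = A, B
    seen, stack = set(), [(qa0, qb0)]
    while stack:
        qa, qb = stack.pop()
        if (qa, qb) in seen:
            continue
        seen.add((qa, qb))
        if qa in fa and qb not in fb:
            return False              # counterexample trace exists
        stack += [(da[(qa, c)], db[(qb, c)]) for c in alphabet]
    return True

# Toy instance: A accepts traces ending in 'i'; B accepts traces containing
# an 'i'.  Every trace of A is covered by B, so inclusion holds.
A = (0, {(0, 'i'): 1, (0, 'd'): 0, (1, 'i'): 1, (1, 'd'): 0}, {1})
B = (0, {(0, 'i'): 1, (0, 'd'): 0, (1, 'i'): 1, (1, 'd'): 1}, {1})
```

In the trace-abstraction setting, A would represent the (error) paths of the program and B the union of proof automata; verification succeeds once the inclusion check returns true.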

Termination. After a series of breakthroughs in making safety analysis of large software systems practically relevant, liveness properties were investigated as well. Algorithmic approaches for constructing ranking functions [161] made it possible to perform termination analysis. Since termination of functions in an operating system is a major concern, e.g., for Microsoft, tool support for termination analysis [70, 162] became important.
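The proof obligation behind a ranking function is simple to state: it must be bounded from below on loop entry and strictly decrease on every iteration. A minimal check on an assumed toy loop (while x > 0: x = x - 2, with ranking function f(x) = x):

```python
# Toy termination argument, checked by enumeration over a finite sample.
def guard(x):  return x > 0         # loop condition
def body(x):   return x - 2         # loop body
def rank(x):   return x             # candidate ranking function

for x in range(0, 100):
    if guard(x):
        assert rank(x) >= 0               # bounded below on loop entry
        assert rank(body(x)) < rank(x)    # strictly decreases per iteration
```

Since a value bounded from below cannot decrease strictly forever, the loop terminates; termination provers synthesize such functions (often lexicographic combinations) automatically instead of checking a given candidate.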

Competition on Software Verification (2010s). In order to make progress explicit and show that there are many good tools for software verification available, a competition on software verification (SV-COMP) was developed in 2010–2011, with the first results published in 2012 [20]. Such competitions create awareness of tools and the available technology, provide comparative evaluations, and establish standards (e.g., input formats, results formats, comparability, reproducibility). The most recent instance of the competition evaluated 47 verification tools.

Property-Directed Reachability. Property-directed reachability (PDR) is a SAT/SMT-based reachability algorithm that incrementally constructs inductive invariants. After it was successfully applied to hardware model checking [43, 44], several adaptations to software model checking have been proposed [42, 63, 64, 130, 131].

Interpolation-Based Model Checking. While interpolation became a key ingredient in many verification approaches for software, the original algorithm from 2003 [145] was adapted to the verification of software only recently [37].

Approaches Used in Tools. Current tools usually combine a set of approaches. We report in Table 4 which approaches are used by tools for software verification.

2.4 Current Developments (2022)

While the focus of the past decades was on contributing tools that implement verification approaches in order to make the research results practically usable, today we observe a move from a lack of tools to an abundance of tools (see Sect. 3.3).

A new research question arises in this context: How can we integrate existing verification systems in order to maximally benefit from their respective strengths? To enable cooperation between verification tools, we need standardized interfaces that make it possible to pass artifacts with valuable information from one tool to another. Such verification artifacts include, besides the programs and their specifications, also transformed or reduced programs, error paths, invariants, witnesses, and partial verification results in general [39, 41].

3 Maturity of the Research Area

The area of software model checking is 22 years old at the time of writing, and several aspects indicate that software model checking is a mature research area. We outline a few such indicators in the following.

3.1 Competitions

It is well understood that competitions are an important scientific method. Competitions provide regular comparative evaluations. In the area of formal methods, there are plenty of competitions [17], most of them being concerned with comparisons of tools that solve a certain kind of problem, most prominently, SAT and SMT solving. Five competitions are concerned with the verification of software: RERS, SV-COMP, Test-Comp, VerifyThis, and TermComp (Table 1).

Table 1. Competitions in the area of software verification

The Competition on Software Verification (SV-COMP) provides an annual comparative evaluation of automatic tools for software verification. The first results were published in 2012 [20]. The objectives of the competition include:

  • create awareness of tools,

  • provide yearly comparative evaluations,

  • create and maintain a benchmark collection (SV-Benchmarks repository),

  • establish standards (input, exchange, comparability, reproducibility),

  • conserve tools at a central place and make them available, and

  • educate PhD students and postdocs on benchmarking and reproducibility.

The competition was a success, in that all of the above-mentioned objectives were achieved. Over the last ten years, more and more verification tools participated, and the last edition compared 47 verification tools.

3.2 Publication Venues and Research Activity

Software model checking is a research field at the intersection of programming languages, software engineering, and theory of computation. Thus, the research results are mainly published in outlets in the area of programming languages, such as POPL, PLDI, and OOPSLA, of software engineering, such as ICSE, ESEC/FSE, ASE, and ISSTA, and of formal methods, such as CAV, TACAS, and ATVA.

Figure 1 illustrates the development of the research area. For each year in the range 1999–2008, that is, the first 10 years, we counted the number of Google Scholar search results for the term “software model checking”. The graph drawn in Fig. 1 illustrates how the interest in software model checking grew in the early 2000s and how it stabilized afterwards.

Fig. 1. Number y of search results found by Google Scholar for “software model checking” per year x; illustrates growing interest in the topic in the first 10 years

Fig. 2. Number y of citations up to year x of SV-COMP reports according to the COCI CSV data set [157]; illustrates constant interest in verifier competitions

As mentioned above, the competition SV-COMP serves as a platform for the creation and maintenance of benchmark sets and community standards. To illustrate the continuous interest in the topics of the competition, we counted the number y of citations up to year x and drew the function in Fig. 2. We used COCI [106], an open citation index that is regularly extended by OpenCitations. We used the data set version 16 (2022-08-31) [157], and counted the number of citations of any of the SV-COMP reports [20,21,22,23,24,25,26, 28,29,30].

3.3 Verification Tools and Artifacts

The research community developed new approaches and implemented them in readily available tools. As shown by the competition SV-COMP, there are many verification systems available. Table 2 illustrates the rich set of verification tools, by listing the tool names, the language that they are mainly used for, references to literature, contact persons, and the location where the tools are developed, maintained, and hosted. All listed tools participated at least once in the competition on software verification SV-COMP. This is also a sign of maturity: Researchers develop tool implementations and hand them in for evaluation. Table 3 shows which tool participated when in the competition. It is interesting to see that there are verification systems that are maintained long-term and participate often, while some research prototypes were made to explore an idea, participated once, and were then abandoned. The overview in Table 4 shows that there are many different technologies implemented and used.

Table 2. Tools for software verification, with the programming language for which they participated (J for Java), main references, contact, and origin (assembled from SV-COMP reports 2012–2022)
Table 3. Participation in SV-COMP evaluations 2012–2022
Table 4. Algorithms and techniques, by verifier (assembled from SV-COMP reports)

4 Conclusion

We have given an overview of several milestones in the history and the development of software verification, and have illustrated the maturity of the research area. This report also showcases the research area by providing a comprehensive collection of competition-evaluated verification systems for the programming languages C and Java. We will not speculate about the future of software verification, but current trends are concerned with, for example, verification witnesses, concurrent programs, unbounded parallelism, termination, cooperative verification, machine-learning-based invariant generation, hyperproperties, and quantum programs.