How can we be certain that software is reliable? Is there any method that can verify the correctness of software for all cases of interest? Computer scientists and software engineers have informally assumed that there is no fully general solution to the verification problem. In this paper, we survey approaches to the problem of software verification and offer a new proof that there can be no general solution.
Characterizing the satisfaction relation in the verification problem in model-theoretic terms may seem to differ from the way some computer scientists characterize verification. Emerson (2008, 28), for example, suggests the verification problem is determining “whether or not the behavior [our emphasis] of M meets the specification h” where M is the program and h is the specification. Our approach does not depend on behavioral properties, as such, of a program. Instead, we characterize the satisfaction relation in terms of a function that relates the models of a program/software system to models of the specification (see Sect. 3 for more detail).
The root toolkit example shows how difficult it is, in practice, to distinguish specification issues from implementation issues. To put the point more sharply: the requirement to cut a piece of wood could be met with a saw or with a stick of dynamite. Both would do the job, though the dynamite would surely cause much collateral damage.
Collectively, (D2), (D3), and (D5) significantly overlap with what we mean when we say a verification procedure is algorithmic. In particular, an algorithmic procedure is an effective procedure, and an effective procedure by definition implies (D2), (D3), and (D5). One might, therefore, replace the union of (D2), (D3), and (D5) with a desideratum requiring an approach to verification to be algorithmic. Articulating (D) as shown, however, supports some informative distinctions among the relative strengths of approaches that have been taken to the verification problem, and for this reason we adopt (D) in the more expansive form shown.
Two programs, A and B, are concurrent if at least some portions of those programs execute at the same time.
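The definition can be made concrete with a minimal sketch (ours, not the authors'; the names `task` and `results` are illustrative): two threads whose executions overlap in time are concurrent in exactly the sense defined above.

```python
import threading

# Shared dictionary collecting each task's result.
results = {}

def task(name: str, n: int) -> None:
    # Independent work; the scheduler may interleave the two
    # executions arbitrarily, which is what makes them concurrent.
    results[name] = sum(range(n))

a = threading.Thread(target=task, args=("A", 1000))
b = threading.Thread(target=task, args=("B", 2000))
a.start(); b.start()   # both threads are live at the same time
a.join(); b.join()     # wait for both to finish
```

The interleaving of the two executions is chosen by the scheduler and may differ from run to run, which is one source of the non-repeatable behavior of concurrent systems noted later in these footnotes.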
Copeland et al. (2016) make the case that hypercomputation should be taken seriously as a candidate for what is meant by “computation”, given hypercomputation’s compatibility with the Church–Turing thesis. We will not defend our choice to exclude hypercomputation in this paper; however, see Davis (2004) for reasons to be skeptical.
This problem has been known, at least informally, since the earliest days of software testing.
Among philosophers, Jim Fetzer was the first to point out that software verification characterized as a testing problem poses challenges (Fetzer 1988) that are intractable in practice.
A control-flow statement in S is a statement that can, based on a condition that may not always obtain, change the order of execution of the statements in S.
Note that this definition requires that S can run to “completion”. Some software systems, such as operating systems, by design “run forever”, and thus have no “completion”.
This definition of path complexity is different from McCabe complexity, which is a count of the number of independent paths in S (McCabe 1976).
A conditional statement is a statement of the form “If X, do Y”. A Turing-complete language, in addition to providing a way to implement conditional statements, must also provide a way to implement loops. For the purposes of this paper, we can restrict the analysis to a program whose control statements are “if–then” statements only. Why? To show that verification-as-testing fails to satisfy (D1), it is sufficient to show that even if S contained only if–then control constructs, verification-as-testing would fail to satisfy (D1). (Accommodating loop control constructs in S only increases the complexity.)
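The growth in path count can be made concrete. In a straight-line program containing n independent if–then statements, each condition either fires or does not, so the number of distinct execution paths is 2^n. A small illustrative sketch (ours, not the authors'; `count_paths` is a hypothetical helper):

```python
from itertools import product

def count_paths(n_conditions: int) -> int:
    """Number of distinct execution paths through a straight-line
    program with n independent if-then statements.

    Each condition is either taken or skipped, so the paths are
    exactly the n-tuples of branch outcomes; we enumerate them."""
    return sum(1 for _ in product((True, False), repeat=n_conditions))
```

For example, `count_paths(3)` enumerates 8 paths, and each additional independent condition doubles the count; this exponential growth is the phenomenon the path complexity discussion here is concerned with.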
For a complete discussion of the path complexity catastrophe see Symons and Horner (2014).
In typical practice, “soak testing” refers to informally observing the behavior of S over P under nominal operating conditions.
Some variants of Linux, for example, contain a coverage-analysis utility, gcov.
A liveness property asserts that program execution eventually reaches some desirable state (Owicki and Lamport 1982).
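In standard linear temporal logic notation (a conventional rendering, not necessarily the formalism used by the sources cited here), liveness properties are typically written with the “eventually” operator:

```latex
% F = "eventually", G = "always" (standard LTL operators)
\mathbf{F}\,\mathit{terminated}
% "the program eventually terminates" --- a liveness property
\mathbf{G}(\mathit{request} \rightarrow \mathbf{F}\,\mathit{granted})
% "every request is eventually granted" --- also a liveness property
```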
Concurrent systems are often reactive systems, i.e., they execute in response to stimuli (e.g., from a sensor) external to the programs themselves. Reactive systems are often nondeterministic, so their non-repeatable behavior is not amenable to testing. Their semantics can be given as infinite sequences of computation states.
A system K has the small model property if and only if any satisfiable formula in K has a “small” finite model, i.e., a model whose size is a polynomial function of the formula size.
The model checker used in this case found errors in the original design of the system. Some of these errors would have made buildings less resistant to earthquakes.
Robinson arithmetic is “ordinary” (Peano) arithmetic without the Peano induction axiom.
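For reference, Robinson arithmetic (usually called Q) is finitely axiomatized by seven sentences (see Mostowski, Robinson, and Tarski 1953); in standard notation, with S the successor function:

```latex
% The seven axioms of Robinson arithmetic Q:
S x \neq 0                                   % (Q1)
S x = S y \rightarrow x = y                  % (Q2)
x \neq 0 \rightarrow \exists y\,(x = S y)    % (Q3)
x + 0 = x                                    % (Q4)
x + S y = S(x + y)                           % (Q5)
x \cdot 0 = 0                                % (Q6)
x \cdot S y = (x \cdot y) + x                % (Q7)
```

Axiom (Q3) does the work that the induction schema does in Peano arithmetic, but only very weakly; this is the sense in which Q is Peano arithmetic without induction.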
Virtually all business and scientific software must implement arithmetic.
Note that a theory of arithmetic (or anything else) that is not finitely axiomatizable cannot be implemented on a finite Universal Turing Machine. No second- or higher-order theory of arithmetic, for example, can be implemented on a Universal Turing Machine. (See, for example, Chang and Keisler 2012, Chapter 1.)
To avoid a problem of self-reference one need only partition the specification into two components. One component would state the requirement for the relationship between the models of H and the models of S, and the other component would describe everything not included in or implied by the first component of the specification.
Although beyond the scope of this paper, it’s worth noting that criteria (A) and (Q) rest on a theory of verification that does not appear to be limited to software regimes as such, and thus might help to characterize verification in ordinary empirical science (and even more generally, in any regime in which verification must be accomplished by a multi-step procedure in finite time).
In practice, we would attempt to limit the range of such sequences to those that had passed some testing or formal verification regimen.
We thank a reviewer of an earlier version of this paper for this objection.
We thank Troy Catterson for suggesting this elegant construction.
Amdahl, G. M. (1967). Validity of the single processor approach to achieving large-scale computing capabilities. AFIPS Conference Proceedings, 30, 483–485. https://doi.org/10.1145/1465482.1465560.
Ammann, P., & Offutt, J. (2016). Introduction to software testing (2nd ed.). Cambridge: Cambridge University Press.
Baier, C., & Katoen, J. P. (2008). Principles of model checking. Cambridge: MIT Press.
Black, R., Veenendaal, E., & Graham, G. (2012). Foundations of software testing ISTQB certification. Boston: Cengage Learning EMEA.
Blum, E. K., Paul, M., & Takasu, S. (Eds.). (1979). Mathematical studies of information processing: Proceedings of the international conference, Kyoto, Japan, August 23-26, 1978. Lecture notes in computer science (Vol. 75). Springer.
Boolos, G., Burgess, J., & Jeffrey, R. (2007). Computability and logic (5th ed.). Cambridge: Cambridge University Press.
Boschetti, F., Fulton, E. A., Bradbury, R., & Symons, J. (2012). What is a model, why people don’t trust them, and why they should. Negotiating our future: Living scenarios for Australia to 2050, 2, 107–119.
Brockwell, P. J., & Davis, R. A. (2006). Time series: Theory and methods (2nd ed.). Berlin: Springer.
Chang, C., & Keisler, J. (2012). Model theory (3rd ed.). New York: Dover.
Chung, K. L. (2001). A course in probability theory (3rd ed.). New York: Academic Press.
Clarke, E. M., Bloem, R., Veith, H., & Henzinger, T. A. (Eds.). (2018). Handbook of model checking. Berlin: Springer.
Clarke, E. M., & Emerson, E. A. (1981). Design and synthesis of synchronization skeletons for branching time temporal logic. In D. Kozen (Ed.), Logic of programs. Lecture notes in computer science (Vol. 131, pp. 52–71). Berlin: Springer.
Copeland, J., Dresner, E., Proudfoot, D., & Shagrir, O. (2016). Time to reinspect the foundations? Communications of the ACM, 59(11), 34–38.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Hoboken: Wiley.
Davis, M. (2004). The myth of hypercomputation. In C. Teuscher (Ed.), Alan turing: Life and legacy of a great thinker (pp. 195–211). Berlin: Springer.
DeMarco, T. (1979). Structured analysis and system specification. Englewood Cliffs: Prentice-Hall.
Diestel, R. (1997). Graph theory. New York: Springer.
Emerson, E. A. (2008). The beginning of model checking: A personal perspective. In O. Grumberg & H. Veith (Eds.), 25 years of model checking—History, achievements, perspectives. Vol. 5000 of lecture notes in computer science. Berlin: Springer.
Fetzer, J. H. (1988). Program verification: The very idea. Communications of the ACM, 31(9), 1048–1063.
Floridi, L., Fresco, N., & Primiero, G. (2015). On malfunctioning software. Synthese, 192(4), 1199–1220.
Floyd, R. W. (1967). Assigning meanings to programs. In Schwartz, J. T. (Ed.), Proceedings of a symposium in applied mathematics. Mathematical aspects of computer science (Vol. 19, pp. 19–32). Dordrecht: Springer.
Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38, 173–198.
Gries, D. (1981). The science of programming. New York: Springer.
Hennessy, J., & Patterson, D. (2007). Computer architecture: A quantitative approach (4th ed.). New York: Elsevier.
Hoare, C. A. R. (1969). An axiomatic basis for computer programming. Communications of the ACM, 12, 576–580.
Hogg, R., McKean, J., & Craig, A. (2005). Introduction to mathematical statistics (6th ed.). London: Pearson.
Horner, J. K. (2003). The development programmatics of large scientific codes. In Proceedings of the 2003 international conference on software engineering research and practice (pp. 224–227). Athens, Georgia: CSREA Press.
Horner, J. K., & Symons, J. (2014). Reply to Angius and Primiero on software intensive science. Philosophy & Technology, 27(3), 491–494.
Huth, M., & Ryan, M. (2004). Logic in computer science. Cambridge: Cambridge University Press.
IEEE. (2000). IEEE-STD-1471-2000. Recommended practice for architectural description of software-intensive systems. http://standards.IEEE.org. Accessed 10 Nov 2018.
Koopman, P. (2014). A case study of Toyota unintended acceleration and software safety. https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_slides.pdf. Accessed 17 Apr 2018.
Kozen, D. (1983). Results on the propositional μ-calculus. Theoretical Computer Science, 27, 333–354.
Littlewood, B., & Strigini, L. (2000). Software reliability and dependability: A roadmap. Proceedings of the Conference on the Future of Software Engineering. https://doi.org/10.1145/336512.336551.
Löwenheim, L. (1915). Über Möglichkeiten im Relativkalkül. Mathematische Annalen, 76(4), 447–470. https://doi.org/10.1007/bf01458217. An English translation can be found in Löwenheim, L. (1977), “On possibilities in the calculus of relatives”, From Frege to Gödel: A source book in mathematical logic, 1879–1931 (3rd ed.), Cambridge, MA: Harvard University Press, pp. 228–251.
McCabe, T. (1976). A complexity measure. IEEE Transactions on Software Engineering, 2, 308–320.
Millikan, R. G. (1989). In defense of proper functions. Philosophy of Science, 56(2), 288–302.
Mostowski, A., Robinson, R. M., & Tarski, A. (1953). Undecidability and essential undecidability in arithmetic. In A. Tarski, A. Mostowski, & R. M. Robinson (Eds.), Undecidable theories. New York: Dover reprint.
Nielson, F., Nielson, H. R., & Hankin, C. (1999). Principles of program analysis. Berlin: Springer.
Owicki, S., & Lamport, L. (1982). Proving liveness properties of concurrent programs. ACM Transactions on Programming Languages and Systems, 4, 455–495.
Pnueli, A. (1977). The temporal logic of programs. In Proceedings of the 18th annual symposium on foundations of computer science, pp. 46–57.
Reichenbach, H. (1957). The philosophy of space and time. (Maria Reichenbach, Trans.). Dover edition.
Skolem, T. (1920). Logisch-kombinatorische Untersuchungen über die Erfüllbarkeit oder Beweisbarkeit mathematischer Sätze nebst einem Theoreme über dichte Mengen. Videnskapsselskapet Skrifter, I. Matematisk-naturvidenskabelig Klasse, 6, 1–36. An English translation can be found in Skolem, T. (1977), “Logico-combinatorial investigations in the satisfiability or provability of mathematical propositions: A simplified proof of a theorem by L. Löwenheim and generalizations of the theorem”, From Frege to Gödel: A source book in mathematical logic, 1879–1931 (3rd ed.), Cambridge, MA: Harvard University Press, pp. 252–263.
Symons, J., & Alvarado, R. (2016). Can we trust Big Data? Applying philosophy of science to software. Big Data & Society, 3(2), 2053951716664747.
Symons, J., & Alvarado, R. (2019). Epistemic entitlements and the practice of computer simulation. Minds and Machines. https://doi.org/10.1007/s11023-018-9487-0.
Symons, J., & Horner, J. K. (2014). Software intensive science. Philosophy and Technology, 27(3), 461–477.
Symons, J., & Horner, J. K. (2017). Software error as a limit to inquiry for finite agents: Challenges for the post-human scientist. In T. Powers (Ed.), Philosophy and computing: Essays in epistemology, philosophy of mind, logic, and ethics (pp. 85–97). Berlin: Springer.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230–265.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX, 433–460.
Valmari, A. (1998). The state explosion problem. In Lectures on Petri nets I: Basic models. Lecture notes in computer science (Vol. 1491, pp. 429–528). Berlin: Springer.
Venema, Y. (2001). Temporal logic. In L. Goble (Ed.), The Blackwell guide to philosophical logic (pp. 259–281). Hoboken: Blackwell.
We are grateful to Perry Alexander, Ray Bongiorni, Troy Catterson, Richard de George, Corey Maley, Eileen Nutting, and two anonymous referees for this journal for their critical comments. This work is partly supported by The National Security Agency through the Science of Security initiative contract #H98230-18-D-0009.
Symons, J., Horner, J.K. Why There is no General Solution to the Problem of Software Verification. Found Sci 25, 541–557 (2020). https://doi.org/10.1007/s10699-019-09611-w
- Software verification
- Model checker
- Philosophy of computer science