KLEE symbolic execution engine in 2019

KLEE is a popular dynamic symbolic execution engine, initially designed at Stanford University and now primarily developed and maintained by the Software Reliability Group at Imperial College London. KLEE has a large community spanning both academia and industry, with over 60 contributors on GitHub, over 350 subscribers on its mailing list, and over 80 participants at a recent dedicated workshop. KLEE has been used and extended by groups from many universities and companies in a variety of different areas such as high-coverage test generation, automated debugging, exploit generation, wireless sensor networks, and online gaming, among many others.

The KLEE workshop
The first dedicated KLEE workshop attracted over 80 participants, with registration reaching capacity. The workshop was sponsored by the UK EPSRC, Baidu, Bloomberg, Fujitsu, Huawei, and Imperial College London.
Talks covered a wide range of topics related to KLEE and symbolic execution, such as scalability, usability, memory models, constraint solving, and new application domains. The schedule included both academic and industry speakers, with academic keynotes from Sarfraz Khurshid (UT Austin), Alessandro Orso (Georgia Tech), and Abhik Roychoudhury (NUS) and industry keynotes from Indradeep Ghosh (Fujitsu) and Peng Li (Baidu).

Software project and contributors
KLEE is an open-source tool, released under a liberal UIUC license and hosted on GitHub. It is the work of many different contributors, over 60 of whom are listed on GitHub. Since KLEE was moved to GitHub only in 2013, not all contributors are listed there. KLEE would not have been such a successful tool without its open-source contributors. Special thanks go to Daniel Dunbar, the main author of the original tool, and to the maintainers of KLEE over the years, who, besides the authors of this paper, have included Daniel Dunbar, Dan Liew, and Andrea Mattavelli.

Software architecture
KLEE works at the level of LLVM bitcode, the intermediate language of the widely used LLVM compiler infrastructure [16]. It provides an interpreter that can execute almost arbitrary code represented in LLVM IR, both concretely and symbolically.
One of the main strengths of KLEE is its modular and extensible architecture. For example, while KLEE already provides a variety of search heuristics to explore the program state space, it can easily be extended with new ones. A similar approach is taken for constraint solving, where solving activities (such as optimisations and caching) are structured as a series of stages: existing stages can be readily enabled or disabled, and new ones easily added. Moreover, a variety of SMT solvers are supported via different back ends, including STP [12] (the default solver), Boolector [3], CVC4 [1], Yices 2 [11], and Z3 [10]. Some of these solvers are supported via the MetaSMT framework [14].
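This staged design can be illustrated with a minimal sketch. The class and method names below are invented for illustration only; KLEE itself is written in C++ and its solver chain differs in many details:

```python
# Sketch of a chained solver architecture: each stage may answer a query
# itself or delegate to the next stage, so stages can be enabled or
# disabled simply by (not) inserting them into the chain.

class CoreSolver:
    """Stands in for a back-end SMT solver such as STP or Z3."""
    def __init__(self):
        self.queries = 0
    def solve(self, query):
        self.queries += 1
        # Toy 'solver': a frozenset of constraint strings is satisfiable
        # unless it contains both a condition and its negation.
        return not any(('not ' + c) in query for c in query)

class CachingStage:
    """Stage that caches results of previously seen queries."""
    def __init__(self, next_stage):
        self.next_stage = next_stage
        self.cache = {}
    def solve(self, query):
        if query not in self.cache:
            self.cache[query] = self.next_stage.solve(query)
        return self.cache[query]

core = CoreSolver()
solver = CachingStage(core)
q = frozenset({"x > 0", "y < 10"})
assert solver.solve(q) and solver.solve(q)
assert core.queries == 1   # the second call is answered from the cache
```

The key design point is that every stage exposes the same interface as the underlying solver, so callers are oblivious to how many stages sit in front of the back end.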

Modifications for Test-Comp 2019
The version of KLEE submitted to Test-Comp 2019 is based on commit b845877a in mainline KLEE (from January 2019). It uses the default options at that time, except for a few changes discussed below. While KLEE supports multiple solvers, we decided to configure it with STP, which in prior experiments had the best overall performance [20]. The use of other solvers could be explored in future editions of the competition.
We made several modifications to KLEE based on the nature of the Test-Comp benchmarks. For instance, we configured it differently for the bug-finding and the coverage categories, e.g. with the former configuration stopping KLEE as soon as an error is found, and the latter generating tests on the fly as soon as new basic blocks are covered.
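The on-the-fly generation of tests for new coverage can be sketched as follows. This is a simplified model with invented names, not KLEE's actual code:

```python
# Sketch: emit a test case only when a path covers a basic block that no
# earlier path has covered, so redundant inputs are discarded on the fly.

def explore(paths):
    """paths: iterable of (blocks_executed, concrete_input) pairs."""
    covered = set()
    tests = []
    for blocks, test_input in paths:
        new_blocks = set(blocks) - covered
        if new_blocks:
            # This path reaches previously uncovered blocks: keep its input.
            tests.append(test_input)
            covered |= new_blocks
    return tests, covered

paths = [
    (["entry", "then"], b"\x01"),
    (["entry", "then"], b"\x02"),   # no new coverage: no test emitted
    (["entry", "else"], b"\x00"),
]
tests, covered = explore(paths)
assert tests == [b"\x01", b"\x00"]
assert covered == {"entry", "then", "else"}
```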
As another example, we extended KLEE to better handle large numbers of symbolic variables, a situation that is atypical for the benchmarks on which KLEE is usually run. We also extended KLEE to support the generation of XML-based test cases, a requirement of Test-Comp.
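A minimal writer for such XML-based test cases might look like the sketch below, with one input element per symbolic value. The header and DTD reference are simplified here; the official Test-Comp test-format specification defines the authoritative schema:

```python
# Sketch of a writer for Test-Comp-style XML test cases: one <input>
# element per value fed to the program's nondeterministic inputs.
from xml.sax.saxutils import escape

def write_testcase(values):
    lines = ['<?xml version="1.0" encoding="UTF-8" standalone="no"?>',
             '<testcase>']
    for v in values:
        lines.append('  <input>%s</input>' % escape(str(v)))
    lines.append('</testcase>')
    return '\n'.join(lines)

xml = write_testcase([42, -1])
assert '<input>42</input>' in xml
assert '<input>-1</input>' in xml
```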
The Test-Comp benchmarks also helped reveal a series of bugs in KLEE, such as one affecting the handling of arrays of symbolic size and one concerning the debug information used by a coverage-based search heuristic. In addition, we reduced the memory footprint of KLEE so that it can handle more execution states simultaneously, by sharing more information between states (e.g. common stack frames).
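The idea of sharing common stack frames between states can be sketched as a shallow, copy-on-write style fork, in which two forked states reference the same frame objects instead of each holding private copies. This is purely illustrative; KLEE's state representation is considerably more involved:

```python
# Sketch: forked execution states share their common stack frames rather
# than duplicating them, reducing per-state memory cost.

class Frame:
    def __init__(self, function):
        self.function = function

class State:
    def __init__(self, stack):
        self.stack = stack          # list of Frame objects
    def fork(self):
        # Shallow copy: the list is private to the new state, but the
        # Frame objects themselves remain shared between the two states.
        return State(list(self.stack))

main_frame = Frame("main")
s1 = State([main_frame, Frame("helper")])
s2 = s1.fork()
assert s1.stack[0] is s2.stack[0]   # the frame itself is not duplicated
```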

Set-up and configuration for Test-Comp 2019
The binary artefact is publicly available from the Test-Comp 2019 archives. It runs on Ubuntu 18.04 and uses LLVM 6.0. Details of the Test-Comp-specific invocation can be found in the Python script bin/klee inside the artefact directory. This script compiles the given C file into LLVM bitcode and invokes KLEE (klee_build/bin/klee) on it. For detailed information on the available options, see the documentation provided via the integrated help (klee -help) and at http://klee.github.io/.
We took part in all Test-Comp categories, even though support for symbolic floating-point arithmetic is not part of the artefact. We note that there are two extensions of KLEE for floating point [19], but they have not yet been integrated into the mainline.

Test-Comp 2019 results
KLEE placed second in both the bug-finding and coverage categories. In the coverage category, the tool came particularly close to the winner, VeriFuzz, scoring 1226 points versus 1238 points for VeriFuzz. In the bug-finding category, it scored 499 points versus 595 for the winning tool.
One notable observation is that while KLEE found far fewer bugs in the bug-finding category than the winning tool (437 vs 592), it found the vast majority of the common bugs faster: of the 433 bugs found by both KLEE and the winning tool, KLEE found 345 faster.
KLEE gained points in every sub-category, with the exception of the bug-finding sub-category involving floats, as the submitted version of KLEE does not have support for symbolic floating-point values (see Sect. 3.2).
More details on the results, including per-task results and score-based quantile plots, can be obtained at https://testcomp.sosy-lab.org/2019/results/results-verified/.

KLEE and the Test-Comp 2019 benchmarks
The main strength of KLEE, and dynamic symbolic execution in general, is that by modelling paths using mathematical formulas, it has applications to a wide variety of problems, beyond test generation and bug finding. Examples include debugging, specification inference, program and input repair, fault reproduction, and patch analysis.
In the context of the tasks relevant to Test-Comp, test generation and bug finding, KLEE benefits from a systematic exploration of the program search space and its ability to reason about all possible values on each path explored.
The weaknesses of KLEE and symbolic execution more generally are well documented and are mainly related to path explosion and constraint solving [8].
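Both the strength and the weakness can be seen in a toy sketch that enumerates every path through a sequence of branches, collecting the constraints a concrete input must satisfy to follow each path. The number of paths doubles at every branch, which is precisely the path explosion problem (names are invented; a real engine would also use a solver to discard infeasible paths):

```python
# Toy symbolic exploration: a program is modelled as a list of branch
# conditions; every branch forks each existing path into a taken and a
# not-taken successor, each carrying its own path constraints.

def explore_paths(conditions):
    paths = [[]]                      # one initial path, no constraints
    for cond in conditions:
        forked = []
        for pc in paths:
            forked.append(pc + [cond])             # branch taken
            forked.append(pc + ["!(%s)" % cond])   # branch not taken
        paths = forked
    return paths

paths = explore_paths(["x > 5", "y == x"])
assert len(paths) == 4                 # 2^n paths for n branches
assert ["x > 5", "y == x"] in paths
```

Asking a solver for a model of each path's constraints then yields one concrete test input per feasible path, which is how symbolic execution reasons about all possible values along a path rather than a single concrete run.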
The benchmarks used in Test-Comp are inherited from SV-COMP, the software verification competition. As a result, they have a verification twist and are quite dissimilar to the benchmarks currently used to evaluate software testing tools. First, they are quite small: using the cloc tool, we measured a range of 10 to 184,969 executable lines of code (ELOC), with a median of only 1409 ELOC. By contrast, the benchmarks currently used in software testing research are considerably larger.
Another particularity of the current Test-Comp benchmarks is that they involve huge numbers of symbolic bytes, as well as memory objects of arbitrary sizes. Furthermore, many examples are hand-crafted to make analysis difficult. These present interesting challenges for testing tools, but they are quite atypical of the way such tools are usually used.
We think that new types of benchmarks and challenges are needed to encourage more robust software testing tools. We believe it is important for future editions of Test-Comp to incorporate larger real-world applications like the ones used in current evaluations of testing tools. The difficulty is, of course, to incorporate such benchmarks in the current evaluation infrastructure, construct appropriate drivers for the competition environment, and deal with the complexity brought by real-world applications, such as complex build systems and interactions with the environment.
We look forward to continuing to discuss these challenges with members of the community.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.