1 New in STTT: Competitions and challenges

We are proud to announce a new theme in STTT: “Competitions and Challenges”. This is the inaugural issue of the newly introduced theme of the journal Software Tools for Technology Transfer.

This theme is dedicated to making available overview articles about new competitions, progress reports on established research competitions, and articles that provide insights into research competitions as a scientific method. For the various research communities working on tool implementations, it is important to bring the community together and to compare the state of the art, in order to identify both progress and new challenges in the research area. One of the main challenges in tool development is that it requires considerable engineering effort. In order to publish and widely disseminate knowledge about tools that represent the state of the art according to the latest research results, we need to obtain results using scientifically valid methods, and rigorous comparative evaluations are an example of such a method.

Evaluation of scientific contributions can be done in many different ways; research competitions and challenges are suitable for evaluating tools and have been a success story so far. Community challenges, or grand challenges, are problems that cannot be solved by a single research team but only by the whole community as a long-term project, potentially spanning decades. The goal of such challenges is to focus the community effort on certain topics. Competition events can serve as milestones that capture the status at a certain point in time. For example, in the early 1990s, when the research area of formal methods became more mature, case studies were proposed to get an overview of the strengths and weaknesses of the various modelling approaches. The first such ‘competition’ was probably the Production Cell case study of the KORSO project [22]. Many more have followed, with the VerifyThis Long-Term Challenge [16] as the most recent one. These challenges are in the tradition of evaluating approaches, rather than tool performance.

A different kind of challenge is the community exemplar, such as the Pacemaker Challenge (Footnote 1), which has been used in over 50 formal-methods research papers and at least one book to illustrate and evaluate formal methods. Exemplars are examples that are provided (and perhaps enhanced over time by the community) for the purpose of showcasing and comparing techniques. Such examples are typically designed explicitly as an open-source subject for demonstrating the application of rigorous techniques, while incorporating domain realism (for example, by adapting them from real-world industry artifacts), scale, and complexity.

The first formal-methods competition of tools was the SAT competition, which was founded in 1992 [18], followed by the CASC competition in 1996 [27]. Since the year 2000, the number of dedicated formal-methods and verification competitions has been steadily increasing. Many of these events now take place regularly, gathering researchers who would like to understand how well their research prototypes work in practice. Scientific results have to be reproducible, and powerful computers are becoming affordable; thus, these competitions are becoming an important means of advancing research.

The new CoCha (“Competitions and Challenges”) theme focuses on, but is not limited to, the following kinds of publications:

  • reports about competitions that describe the progress of technology,

  • system descriptions that provide an overview of tools that participated in a competition,

  • analysis articles and surveys on the topic of competitions,

  • articles that focus on reproducibility and benchmarking technology,

  • articles that describe benchmark sets that are used in research competitions,

  • proposals and definitions of community challenges,

  • progress reports on community challenges, and

  • proposals for open-source system examples that are explicitly designed to stimulate community cross-assessment of different methods and demonstration of integration of methods across the system life-cycle.

The Theme Editors in Chief for the STTT theme “Competitions and Challenges” are:

  • Dirk Beyer (LMU Munich, Germany)

  • Marieke Huisman (University of Twente, Netherlands)

2 This special issue

TOOLympics 2019 was an event to draw attention to the achievements of the various competitions, and to understand their commonalities and differences. The event was part of the celebration of the 25th anniversary of the conference TACAS and was held at ETAPS 2019 in Prague, Czechia. TOOLympics 2019 [3] included presentations of 16 competitions in the area of formal methods: CASC [26], CHC-COMP, CoCo [1], CRV [4], MCC [19], QComp [13], REC [11], RERS [14], Rodeo (planned), SAT [5], SL-COMP [25], SMT-COMP [2], SV-COMP [6], termCOMP [23], Test-Comp [7], and VerifyThis [15].

This issue is the first of two special issues on the TOOLympics 2019 event. The issue is dedicated to Test-Comp, the International Competition on Software Testing, which was held for the first time in 2019. The goals and design of the competition are described in the competition description [7]. The participating teams submitted test-generation tools, and the competition execution consisted of (a) running the test generators and (b) evaluating the produced test suites with regard to coverage. This journal issue contains a report by the organizer that presents the results of the competition, and seven selected competition contributions, which are briefly described in the following.

2.1 First international competition on software testing [8]

The competition report provides an overview of the competition, the definitions, technical setup, composition of the competition jury, the scoring schema and ranking calculation, and the results.

2.1.1 CoVeriTest: Interleaving value and predicate analysis for test-case generation [17]

CoVeriTest is a hybrid approach to test generation that combines several verification techniques. The tool interleaves a predicate analysis and a value analysis, and allows cooperation between the analyses. For the Test-Comp participation, a configuration was used in which both analyses reuse the internal data structures (abstract reachability graphs) from their previous iteration. CoVeriTest is based on the verification framework CPAchecker.

2.1.2 CPA/Tiger-MGP: Test-goal set partitioning for efficient multi-goal test-suite generation [24]

CPA/Tiger-MGP implements a test-generation technique that is based on configurable multi-goal set partitioning (MGP). The tool supports configurable partitioning strategies and processes several test goals at once in a reachability analysis. CPA/Tiger-MGP is based on a predicate-abstraction-based program analysis of the verification framework CPAchecker.

2.1.3 Esbmc 6.1: Automated test-case generation using bounded model checking [12]

Esbmc is a bounded model checker that uses an SMT solver as its backend. The tool participated in the theme in which the test specification required producing a test that covers a certain function call. For Test-Comp 2019, Esbmc incrementally increased the bound until the specific function call was reached in the program. Once Esbmc has found an error path to the function call, it produces a test suite that contains at least one test that exposes the reachability of the function call.
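The idea of incrementally increasing the bound can be illustrated with a small sketch. It is not Esbmc's implementation: the SMT query is replaced by a brute-force enumeration of input sequences, and the toy program `reaches_target`, the input domain, and the function `find_test` are hypothetical names chosen only for this illustration.

```python
from itertools import product


def reaches_target(inputs):
    """Toy program (hypothetical): reads one input per loop iteration.

    The 'target function call' counts as reached once the running sum
    of the inputs is exactly 7.
    """
    total = 0
    for value in inputs:
        total += value
        if total == 7:
            return True  # the target call would be executed here
    return False


def find_test(max_bound, domain=(0, 1, 2, 3)):
    """Increase the unwinding bound step by step until the target is reached.

    Assumption: exhaustive enumeration stands in for the SMT query that a
    bounded model checker would issue for each bound.
    """
    for bound in range(1, max_bound + 1):
        for candidate in product(domain, repeat=bound):
            if reaches_target(candidate):
                # The witness doubles as a test: a concrete input sequence
                # that drives execution to the target call.
                return bound, list(candidate)
    return None  # target not reachable within max_bound iterations
```

In this sketch, `find_test(5)` stops at the smallest bound for which a witness exists and returns that bound together with the input sequence, which mirrors how a counterexample found at a given bound can be turned into a test.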

2.1.4 FairFuzz-TC: A fuzzer targeting rare branches [21]

FairFuzz is an Afl-based fuzzing tool that uses coverage-guided mutation. By targeting the mutation strategy towards rare branches, it tries to increase code coverage quickly. The tool participated in Test-Comp with a few modifications, and the competition contribution is called FairFuzz-TC.
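The selection step behind "targeting rare branches" can be sketched as follows. This is a simplified illustration under stated assumptions, not FairFuzz's actual algorithm: the corpus representation (a mapping from input to the set of branches it covers) and the function names are hypothetical, and the mutation step itself is not shown.

```python
from collections import Counter


def rarest_branch(coverage_by_input):
    """Return the branch that is hit by the fewest inputs in the corpus.

    Ties are broken alphabetically so that the result is deterministic.
    """
    hits = Counter(
        branch
        for branches in coverage_by_input.values()
        for branch in branches
    )
    return min(sorted(hits), key=hits.__getitem__)


def inputs_to_mutate(coverage_by_input):
    """Select the corpus inputs that exercise the rarest branch.

    A rare-branch-targeting fuzzer would prefer these inputs as mutation
    seeds, hoping to explore the code behind the rarely hit branch.
    """
    target = rarest_branch(coverage_by_input)
    seeds = sorted(
        inp for inp, branches in coverage_by_input.items() if target in branches
    )
    return target, seeds
```

For a corpus in which every input covers some branch `b1` but only one input covers `b3`, the sketch selects `b3` as the target and that single input as the seed for further mutation.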

2.1.5 Klee symbolic execution engine in 2019 [9]

Klee is a tool for dynamic symbolic execution. The tool automatically explores the paths of a program, using a constraint solver to decide path feasibility. Klee integrates the solvers Stp, Boolector, Cvc4, Yices 2, and Z3. In the configuration for Test-Comp, the tool uses the solver Stp for best performance and was extended such that it can better handle large numbers of symbolic variables.

2.1.6 Plain random test generation with PRTest [20]

PRTest is meant as a baseline tool for test generation, which means that it uses only a ‘plain’ approach of random test generation in a black-box manner. PRTest executes the program for which the tests are to be generated and creates a new test value randomly whenever the program requires an input value. The resulting test vector is recorded, and at the end of the execution the achieved coverage is measured; the new test vector is added to the test suite only if it increases the coverage. This process is repeated until the coverage criterion is satisfied or the time limit is reached. PRTest is publicly available and open source.
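The loop described above can be sketched in a few lines. This is a minimal illustration, not PRTest's code: the toy `program_under_test`, the branch labels, the input range, and the run limit `max_runs` (which stands in for the time limit) are assumptions made for the example.

```python
import random


def program_under_test(next_input, covered):
    """Toy program (hypothetical): records the branches it executes."""
    x = next_input()
    if x % 2 == 0:
        covered.add("even")
        y = next_input()
        if y > 200:
            covered.add("even-high")
        else:
            covered.add("even-low")
    else:
        covered.add("odd")


ALL_BRANCHES = {"even", "odd", "even-high", "even-low"}


def generate_suite(program, all_branches, max_runs=10_000, seed=0):
    """Plain random test generation that keeps only coverage-increasing vectors."""
    rng = random.Random(seed)
    suite = []
    covered_total = set()
    for _ in range(max_runs):  # assumption: run limit instead of a time limit
        if covered_total == all_branches:
            break  # coverage criterion satisfied
        vector = []

        def next_input():
            # Create a random value whenever the program asks for an input,
            # and record it as part of the current test vector.
            value = rng.randrange(256)
            vector.append(value)
            return value

        covered = set()
        program(next_input, covered)
        # Keep the vector only if it covers at least one new branch.
        if not covered <= covered_total:
            suite.append(vector)
            covered_total |= covered
    return suite, covered_total
```

Calling `generate_suite(program_under_test, ALL_BRANCHES)` returns the kept test vectors together with the set of covered branches. Vectors that add no new coverage are discarded, which keeps the suite small at the cost of measuring coverage after every execution.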

2.1.7 Symbiotic 6: Generating test-cases by slicing and symbolic execution [10]

Symbiotic is a tool for bug-finding that works in two phases: first, it preprocesses the input program by applying static analyses, instrumentation, and program slicing, and second, it runs a symbolic-execution engine to find interesting program paths. Klee is used as the backend for symbolic execution.