Fast detection of concurrency errors by state space traversal with randomization and early backtracking

State space traversal is a widely used approach for detecting concurrency errors and testing concurrent programs. However, it is not practically feasible for complex programs with many thread interleavings and a large state space. Many techniques therefore explore only a part of the state space in order to find errors quickly, building on the observation that errors can often be found in a particular small part of the state space. Randomization has also brought great improvements in performance. In the context of this research direction, we present the DFS-RB algorithm, which augments the standard algorithm for depth-first traversal with early backtracking: the search may backtrack from a state before all of its outgoing transitions have been explored. The DFS-RB algorithm is non-deterministic: it uses random numbers, together with the values of several parameters, to determine when and how early backtracking takes place in the search. To evaluate DFS-RB, we performed a large experimental study with our prototype implementation in Java Pathfinder on several Java programs. The results show that, for many benchmarks in our set, DFS-RB achieves better performance in terms of speed and error detection than many state-of-the-art techniques. Nevertheless, it is difficult to find a single configuration of DFS-RB that works well for many different benchmarks. We therefore designed a ranking algorithm whose purpose is to identify configurations that yield consistently good overall performance with small variation.
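
The general idea of early backtracking can be illustrated with a minimal sketch. This is not the DFS-RB algorithm itself, whose parameters and decision rules are not given in the abstract. It assumes an explicit state graph and a single hypothetical parameter, `backtrack_prob`, the probability of abandoning a state each time the search visits it. The names `dfs_early_backtrack`, `successors`, and `is_error` are likewise illustrative and do not come from the prototype implementation in Java Pathfinder.

```python
import random


def dfs_early_backtrack(initial, successors, is_error, backtrack_prob=0.2, seed=None):
    """Depth-first search with randomized early backtracking (illustrative sketch).

    Returns a path (list of states) to the first error state found, or None.
    backtrack_prob is a hypothetical parameter, not one of the DFS-RB parameters.
    """
    rng = random.Random(seed)
    if is_error(initial):
        return [initial]
    visited = {initial}
    # Each stack frame holds a state and an iterator over its unexplored transitions.
    stack = [(initial, iter(successors(initial)))]
    while stack:
        state, transitions = stack[-1]
        # Early backtracking: with probability backtrack_prob, leave this state
        # even though some of its outgoing transitions may still be unexplored.
        if rng.random() < backtrack_prob:
            stack.pop()
            continue
        nxt = next(transitions, None)
        if nxt is None:
            stack.pop()  # standard backtracking: every transition was explored
            continue
        if nxt in visited:
            continue
        visited.add(nxt)
        if is_error(nxt):
            return [s for s, _ in stack] + [nxt]
        stack.append((nxt, iter(successors(nxt))))
    return None


# Toy state graph: state 3 plays the role of an error state.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
path = dfs_early_backtrack(0, graph.__getitem__, lambda s: s == 3, backtrack_prob=0.0)
```

With `backtrack_prob=0.0` the sketch reduces to plain depth-first search. Larger values prune more of the state space, so a single run may miss an error; in practice such a search would be repeated with different seeds.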

The first phase of this work was partially supported by the Czech Science Foundation project 14-11384S, and the second phase was partially supported by the Czech Science Foundation project 18-17403S. It was also partially supported by the Natural Sciences and Engineering Research Council of Canada.

Author information



Corresponding author

Correspondence to Pavel Parízek.

A Results of experiments with twelve benchmarks

Table 12 Results for Elevator
Table 13 Results for Alarm Clock
Table 14 Results for Linked List
Table 15 Results for Producer Consumer
Table 16 Results for RAX Extended
Table 17 Results for Replicated Workers
Table 18 Results for jPapaBench
Table 19 Results for Monte Carlo
Table 20 Results for CDx
Table 21 Results for Cache4j
Table 22 Results for QSortMT

Here we provide Tables 11–22 with concrete data on error detection performance that were used to create the graphs in Fig. 8. We created one table per benchmark to enable easy comparison of the error detection performance of different configurations and techniques on each benchmark. Each table is divided into three segments. The top segment contains the performance data for selected configurations of DFS-RB from the overall top 10 list (which reflects the combined score). We selected the configurations at positions 1, 2, 3, and 10 in the list in order to show the best ones while also covering the whole range. The middle segment of each table shows the performance data for the state-of-the-art techniques that we included in our experimental comparison. Finally, the bottom segment contains the data for the combination of the overall best configuration of DFS-RB with the state-of-the-art techniques.

