Composing verification techniques in sequence has proven to be a promising approach in the annual software verification competition SV-COMP. In 2018 in particular (Footnote 1), the software verification framework CPAchecker [3], using a composition of analyses, outperformed its competitors in the category ReachSafety. However, the analysis sequence is often predefined and fixed. As a result, a problem instance might pass through a sequence of unsuccessful verification configurations until it is processed by the right technique or exceeds the time limit.
Our competition contribution utilizes the sequential setting of CPAchecker (more precisely, of CPA-Seq), but predicts the order in which the verification tools, that is, the configurations, are run. For this, we applied an extension of the rank prediction approach we introduced in [7]. For a given verification task, we predict an ordering of CPAchecker configurations and then run these configurations sequentially. Configurations are ordered by their predicted performance on the verification task.
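The sequential execution described above can be illustrated with a minimal Python sketch. It is not the CPA-Seq implementation: the `Config` callable type, the function `run_in_predicted_order`, the budget parameters, and the configuration names used in the usage note are all hypothetical and only show how a predicted ranking could drive the order of execution.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a configuration receives a task and a time budget
# (in seconds) and returns a verdict string, or None if it gives up.
Config = Callable[[str, float], Optional[str]]


def run_in_predicted_order(
    task: str,
    ranking: List[str],
    configs: Dict[str, Config],
    total_budget: float,
    per_config_budget: float,
) -> Tuple[Optional[str], Optional[str]]:
    """Run configurations in ranked order until one returns a verdict."""
    remaining = total_budget
    for name in ranking:
        if remaining <= 0:
            break  # overall time limit exhausted
        budget = min(per_config_budget, remaining)
        verdict = configs[name](task, budget)
        # Simplifying assumption: each attempt is charged its full slice.
        remaining -= budget
        if verdict is not None:
            return name, verdict
    return None, None
```

For example, with two placeholder configurations `config_a` (which gives up) and `config_b` (which answers), the ranking `["config_a", "config_b"]` yields the verdict of `config_b`, provided the total budget leaves time for a second attempt. A better ranking would try `config_b` first and spend no budget on `config_a`.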
The prediction employs machine learning. For the learning, we extract features of verification tasks via an encoding of programs as graphs combining concepts of control-flow and program dependence graphs with abstract syntax trees. Features represent certain graph substructures of programs, where the depth of substructures considered is configurable.
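To make the notion of depth-bounded graph substructures concrete, the following Python sketch counts Weisfeiler-Lehman-style subtree labels in a small node-labeled graph. It is a simplified illustration under stated assumptions, not our feature extractor: the function name `wl_features`, the node labels, and the plain successor-list graph encoding are hypothetical, edge labels are ignored, and relabeled nodes keep their full string instead of a compressed hash.

```python
from collections import Counter
from typing import Dict, Hashable, List


def wl_features(
    labels: Dict[Hashable, str],
    edges: Dict[Hashable, List[Hashable]],
    depth: int,
) -> Counter:
    """Count node labels after 0..depth rounds of neighbourhood relabeling."""
    current = dict(labels)
    features = Counter(current.values())  # depth 0: the plain node labels
    for _ in range(depth):
        relabeled = {}
        for node, label in current.items():
            # Sort neighbour labels so the result is order-independent.
            neighbours = sorted(current[n] for n in edges.get(node, []))
            relabeled[node] = label + "(" + ",".join(neighbours) + ")"
        current = relabeled
        features.update(current.values())
    return features


# Toy graph: assign -> branch -> call (labels are illustrative only).
toy_labels = {0: "assign", 1: "branch", 2: "call"}
toy_edges = {0: [1], 1: [2], 2: []}
toy_features = wl_features(toy_labels, toy_edges, depth=1)
```

Each additional round makes a label describe a substructure that reaches one step further into the graph, so the `depth` parameter plays the role of the configurable substructure depth mentioned above.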
To obtain the execution order for a new problem instance, we employ the ranking by pairwise comparison (RPC) framework [9] with kernelized Support Vector Machines (SVMs) [11] as base learners. Employing SVMs allows us to choose a kernel function (Footnote 2), similar to Weisfeiler-Lehman kernels [12], that is specifically designed for graph substructures. However, the function proposed in [7] has to be computed between the input instance X (the graph of a verification task) and every training sample Y, which can be quite costly in practice. We have therefore re-implemented this approach and now compute Weisfeiler-Lehman-based features of single graphs. This significantly improves prediction performance.
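The aggregation step of ranking by pairwise comparison can be sketched in Python as follows. This is a minimal illustration, not our implementation: the `PairwiseModel` callables stand in for the trained SVM base learners, each pair casts a single unweighted vote, and the names `rank_by_pairwise_comparison`, `a`, `b`, and `c` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical base learner: returns True if the first configuration of the
# pair is predicted to perform better on the task with the given features.
PairwiseModel = Callable[[Dict[str, int]], bool]


def rank_by_pairwise_comparison(
    features: Dict[str, int],
    labels: List[str],
    models: Dict[Tuple[str, str], PairwiseModel],
) -> List[str]:
    """Order labels by the number of pairwise comparisons each one wins."""
    votes = {label: 0 for label in labels}
    for first, second in combinations(labels, 2):
        winner = first if models[(first, second)](features) else second
        votes[winner] += 1
    # Stable sort: ties keep the order in which the labels were given.
    return sorted(labels, key=lambda label: -votes[label])


# Stub models with fixed answers, used only to show the voting.
stub_models = {
    ("a", "b"): lambda f: False,  # b preferred over a
    ("a", "c"): lambda f: True,   # a preferred over c
    ("b", "c"): lambda f: True,   # b preferred over c
}
ranking = rank_by_pairwise_comparison({}, ["a", "b", "c"], stub_models)
# ranking == ["b", "a", "c"]
```

One model is needed per pair of configurations, so the number of base learners grows quadratically with the number of configurations. With features computed per graph, each learner only needs the feature vector of the new instance as input.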