
1 Introduction

In recent years, many algorithms for solving string constraints have been developed and implemented in SMT solvers such as Norn [6], CVC4 [12], and Z3 (e.g., Z3str2 [13] and Z3str3 [7]). To validate and benchmark these solvers, their developers have relied on hand-crafted input suites [1, 4, 5] or real-world examples from a limited set of industrial applications [2, 11]. These test suites have helped developers identify implementation defects and develop more sophisticated solving heuristics. Unfortunately, as more features are added to solvers, these benchmarks often remain stagnant, leaving increasing functionality untested. As such, there is an acute need for a more robust, inexpensive, and automatic way of generating benchmarks to test the correctness and performance of SMT solvers.

Fuzzing has been used to test all kinds of software including SAT solvers [10]. Inspired by the utility of fuzzers, we introduce StringFuzz and describe its value as an exploratory testing tool. We demonstrate its efficacy by presenting limitations it helped discover in leading string solvers. To the best of our knowledge, StringFuzz is the only tool aimed at automatic generation of string constraints. StringFuzz can be used to mutate or transform existing benchmarks, as well as randomly generate structured instances. These instances can be scaled with respect to a variety of parameters, e.g., length of string constants, depth of concatenations (concats) and regular expressions (regexes), number of variables, number of length constraints, and many more.

Contributions

  1. The StringFuzz tool: In Sect. 2, we describe a modular fuzzer that can transform and generate SMT-LIB 2.0/2.5 string and regex instances. Scaling inputs (e.g., long string constants, deep concatenations) are particularly useful in identifying asymptotic behaviors in solvers, and StringFuzz has many options to generate them. We briefly document StringFuzz’s components and modular architecture, and we provide example use cases to demonstrate its utility as an exploratory solver testing tool.

  2. A repository of SMT-LIB 2.0/2.5 instances: In Sect. 3, we present a repository of SMT-LIB 2.0/2.5 string and regex instance suites that we generated using StringFuzz. This repository consists of two categories: one with new instances generated by StringFuzz (generated); and another with transformed instances generated from a small suite of industrial benchmarks (transformed).

  3. Experimental Results and Analysis: In Sect. 4, we compare the performance of Z3str3, CVC4, Z3str2, and Norn on the StringFuzz suites Concats-Balanced, Concats-Big, Concats-Extracts-Small, and Different-Prefix. We highlight these suites because they make some solvers perform poorly, but not others. We analyze our experimental results, and pinpoint algorithmic limitations in Z3str3 that cause poor performance.

2 StringFuzz

Implementation and Architecture. StringFuzz is implemented as a Python package, and comes with several executables to generate, transform, and analyze SMT-LIB 2.0/2.5 string and regex instances. Its components are implemented as UNIX “filters” to enable easy integration with other tools (and with one another). For example, the output of a generator can be piped into a transformer, and transformers can be chained to produce a stream of tuned inputs for a solver (a sketch of such a pipeline follows the tool list below). StringFuzz is composed of the following tools:

  • stringfuzzg

    This tool generates SMT-LIB instances. It supports several generators and options that specify its output. Details can be found in Table 1a.

  • stringfuzzx

    This tool transforms SMT-LIB instances. It supports several transformers and options that control its input and output, which are explained in Table 1b. Note that the Translate and Reverse transformers also preserve satisfiability under certain conditions.

  • stringstats

    This tool takes an SMT-LIB instance as input and outputs its properties: the number of variables/literals, the max/median syntactic depth of expressions, the max/median literal length, etc.
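For example, the regex generator invocation shown later in this section can be piped through a transformer and straight into a solver. The pipeline below is a sketch: only the stringfuzzg invocation is taken from this paper, while the lowercase reverse sub-command for stringfuzzx and the CVC4 flags are assumed spellings that may vary across versions.

    # generate a regex instance, reverse it, and hand the result to CVC4 on stdin
    stringfuzzg regex -r 2 -d 1 -t 1 -M 3 -X 10 \
        | stringfuzzx reverse \
        | cvc4 --lang smt2 --strings-exp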

We organized StringFuzz to be easily extended: while the whole project contains 3,183 lines of code, a new transformer takes an average of only 45 lines of code to implement. StringFuzz can be installed from source, or from the Python Package Index (PyPI) via pip.

Table 1. StringFuzz built-in (a) generators and (b) transformers.

Regex Generating Capabilities.

StringFuzz can generate and transform instances with regex constraints. For example, the command “stringfuzzg regex -r 2 -d 1 -t 1 -M 3 -X 10” produces this instance:

figure b

Each instance is a set of one or more regex constraints on a single variable, with optional maximum and minimum length constraints. Each regex constraint is a concatenation (re.++ in SMT-LIB string syntax) of regex terms:

$$\begin{aligned}&\texttt {(re.++ T1 (re.++ T2} \; ... \; \texttt {(re.++ Tn-1 Tn )))} \end{aligned}$$

and each term Ti is recursively defined as any one of: Kleene star (re.*), Kleene plus (re.+), union (re.union), or a character literal. Operators are nested up to a recursion depth specified with the --depth flag; terms at depth 0 are regex constants. Below are three example regexes (in regex, rather than SMT-LIB, syntax) of depth 2 that can be produced this way:

$$\begin{aligned}&((\texttt {a}|\texttt {b})|(\texttt {cc})+)\quad \quad \quad ((\texttt {ddd})*)+\quad \quad \quad ((\texttt {ee})+|(\texttt {fff})*) \end{aligned}$$
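For concreteness, an instance of the shape just described might look like the following hand-written sketch (not the literal output of the command above, and assuming the -M and -X flags bound the variable’s length). It constrains a single variable x with one regex constraint, roughly corresponding to the regex (a|b)(cc)+, plus minimum and maximum length constraints:

    (declare-fun x () String)
    (assert (str.in.re x
        (re.++ (re.union (str.to.re "a") (str.to.re "b"))
               (re.+ (str.to.re "cc")))))
    (assert (>= (str.len x) 3))
    (assert (<= (str.len x) 10))
    (check-sat)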

Equisatisfiable String Transformations. StringFuzz can also transform problem instances, which it does by manipulating their parsed syntax trees. Most of the built-in transformers guarantee only that the output is well-formed; some, however, also guarantee equisatisfiability. Table 1b lists the built-in transformers and notes these guarantees.
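As an illustration of what such a guarantee means, consider the Reverse transformer. Assuming it mirrors every string literal and flips the argument order of every concatenation (our reading of Table 1b, stated here as an assumption), the two toy instances below are equisatisfiable: x = "c" satisfies both, and in general reversing the value of each variable maps models of one instance to models of the other.

    ; original instance
    (declare-fun x () String)
    (assert (= (str.++ x "ab") "cab"))
    (check-sat)

    ; after Reverse: literals mirrored, concatenation order flipped
    (declare-fun x () String)
    (assert (= (str.++ "ba" x) "bac"))
    (check-sat)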

Example Use Case. In Sect. 3 we use StringFuzz to generate benchmark suites in batch mode. We can also use StringFuzz for online exploratory debugging. For example, the script below repeatedly feeds random StringFuzz instances to CVC4 until the solver produces an error:

figure c
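A loop of this kind takes only a few lines of shell. The sketch below is illustrative rather than the exact script in the figure: the random-ast generator name and the CVC4 flags are assumptions that may need adjusting for a particular StringFuzz or solver version.

    #!/bin/bash
    # Feed freshly generated instances to CVC4 until the solver reports an error.
    while true; do
        stringfuzzg random-ast > instance.smt2
        cvc4 --lang smt2 --strings-exp instance.smt2 > out.log 2>&1
        if grep -qi "error" out.log; then
            echo "CVC4 reported an error on instance.smt2"
            break
        fi
    done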

3 Instance Suites

In this section, we describe the benchmark suites we generated with StringFuzz and on which we conducted our experimental evaluation. Table 2a lists instances that were generated by stringfuzzg. Table 2b lists instances derived from existing seed instances by iteratively applying stringfuzzx. Every transformed instance is named after its seed and the transformations applied to it. For example, z3-regex-1-fuzz-graft.smt2 was produced by applying Fuzz and then Graft to z3-regex-1.smt2.
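Such a chain of transformations can be reproduced by composing stringfuzzx with itself as a filter. The command below is a sketch; the lowercase fuzz and graft sub-commands are assumed spellings of the Fuzz and Graft transformers.

    # derive z3-regex-1-fuzz-graft.smt2 from its seed by chaining two transformers
    stringfuzzx fuzz < z3-regex-1.smt2 \
        | stringfuzzx graft \
        > z3-regex-1-fuzz-graft.smt2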

The Amazon category contains 472 instances derived from two seeds supplied by our industrial collaborators. The Regex category is seeded by the Z3str2 regex test suite [4], which contains 42 instances. Through cumulative transformations we expanded the 42 seeds to 7,551 unique instances. Finally, the Sanitizer category is obtained from five industrial e-mail address and IPv4 sanitizers.

Table 2. Repository of 10,258 SMT-LIB 2.0/2.5 instances.

4 Experimental Results and Analysis

We generated several problem instance suites with StringFuzz that made one solver perform poorly, but not the others. They are Concats-Balanced, Concats-Big, Concats-Extracts-Small, and Different-Prefix. Figure 1 shows the suites that were uniquely difficult for CVC4. Figure 2 shows the suites that were uniquely difficult for Z3str3. All experiments were conducted sequentially, each with a timeout of 15 s, on an Ubuntu Linux 16.04 computer with 32 GB of RAM and an Intel® Core™ i7-6700 CPU (3.40 GHz).

Fig. 1. Instances hard for CVC4.

Fig. 2. Instances hard for Z3str3.

Usefulness to Z3str3: A Case Study. StringFuzz’s ability to produce scaling instances helped uncover several implementation issues and performance limitations in Z3str3. Scaling inputs can reveal issues that would normally be out of scope for unit tests or industrial benchmarks. Three different performance and implementation bugs were identified and fixed in Z3str3 as a result of testing with the StringFuzz scaling suites Lengths-Long and Concats-Big.

StringFuzz also helped identify a number of performance issues and opportunities for new heuristics in Z3str3. For example, by examining Z3str3’s execution traces on the instances in the Concats-Big suite, we discovered a potential new heuristic. In particular, Z3str3 does not make full use of the solving context (e.g., the fact that some terms are known to be empty strings) to simplify long concatenations of string terms before reasoning about the equivalences among sub-terms. Z3str3 therefore introduces a large number of unnecessary intermediate variables and propagations.
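A minimal example of the missed simplification (our own illustration, not an instance from Concats-Big): once y is asserted to be the empty string, the nested concatenation below collapses to (str.++ x z), yet a solver that does not apply this simplification up front still introduces intermediate variables and propagations for every nested str.++ term.

    (declare-fun x () String)
    (declare-fun y () String)
    (declare-fun z () String)
    (assert (= y ""))
    (assert (= (str.++ x (str.++ y (str.++ z y))) "ab"))
    (check-sat)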

5 Related Work

Many solver developers create their own test suites to validate their solvers [1, 4, 5]. Several popular instance suites are also publicly available for solver testing and benchmarking, such as the Kaluza [2] and Kausler [11] suites. There are likewise several fuzzers and instance generators currently available, but none of them can generate or transform string and regex instances. For example, the FuzzSMT [9] tool generates SMT-LIB instances with bit-vectors and arrays, but does not support strings or regexes. The SMTpp [8] tool pre-processes and simplifies instances, but does not generate new ones or fuzz existing ones.