Plain random test generation with PRTest

Automatic test-suite generation tools are often complex and their behavior is not predictable. To provide a minimum baseline that test-suite generators should be able to surpass, we present PRTest, a random black-box test-suite generator for C programs: To create a test, PRTest natively executes the program under test and creates a new, random test value whenever an input value is required. After execution, PRTest checks whether any new program branches were covered and, if this is the case, the created test is added to the test suite. This way, tests are rapidly created either until a crash is found, or until the user aborts the creation. While this naive mechanism is not competitive with more sophisticated, state-of-the-art test-suite generation tools, it is able to provide a good baseline for Test-Comp and a fast alternative for automatic test-suite generation for programs with simple control flow. PRTest is publicly available and open source.


Introduction
Automatic test-suite generation is a highly active field of research, and many successful tools exist to date. Unfortunately, most of these tools are based on sophisticated algorithms, so both their code and their behavior can be hard to understand for non-experts. In addition, these tools and their improvements are usually only compared to each other, and no naive baseline exists. We present PRTest, a plain random test-suite generator that addresses both issues. PRTest is designed to be simple: its full test-suite generation logic consists of 125 lines of code, and it uses no heuristics or sophisticated algorithms. Instead, PRTest provides random input generation [2]: It repeatedly executes the program under test with random inputs and stores the input values of an execution as a test if the execution increased the overall coverage. Thanks to its pure randomness and native execution of the program under test, its behavior is easy to understand, and it can be used as a lower baseline for Test-Comp.
First, PRTest uses the clang compiler to compile the program under test against a C harness that provides the full test-suite generation logic ('Test Gen. Harness' in Fig. 1). The harness provides: (1) a custom program entry point for test-generation setup, (2) definitions for the Test-Comp-specific input methods __VERIFIER_nondet_X (where X is any primitive C type; e.g., __VERIFIER_nondet_int), (3) a method input that creates new test inputs, and (4) clang-specific methods that allow PRTest to track program coverage at runtime.
When the compilation result is executed, the custom program entry point initializes a random number generator and traps signals that would usually terminate the program (e.g., SIGINT) as well as calls to the exit method. This is necessary so that PRTest is not terminated prematurely if the input program raises a signal or calls the exit method.
Then, the test-generation loop starts and calls the original main function of the program under test on clean memory. Whenever a method __VERIFIER_nondet_X is called in the program under test, method input introduces a new test input of the expected type, records it as the next test input for the current execution, and returns it to the function call in the program under test. When the program under test terminates, PRTest checks whether the execution covered any new code blocks. If it did, the test inputs that were recorded for that execution are stored as a new test. If no new code blocks were covered, the test inputs are discarded. We call this mechanism the test filter. After test filtering, the loop starts again by calling the main method of the input program, creating another random test in the process. The test-generation loop stops if a looked-for program bug is found (in case of category Cover-Error) or if the process is aborted by the user.
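The loop and the test filter can be illustrated with a minimal, self-contained sketch. All names here (generate_tests, next_input, cover_block, toy_program) are hypothetical stand-ins rather than PRTest's identifiers; coverage is tracked by explicit calls instead of compiler instrumentation, and the loop runs for a fixed number of executions instead of until a user abort.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_INPUTS 64
#define MAX_BLOCKS 16

static int current_inputs[MAX_INPUTS]; /* inputs recorded for the running execution */
static size_t current_count;
static bool block_seen[MAX_BLOCKS];    /* blocks covered by any execution so far */
static bool new_coverage;              /* set if the running execution covers a new block */

/* Stand-in for coverage instrumentation: marks a block as covered. */
void cover_block(size_t id) {
    if (id < MAX_BLOCKS && !block_seen[id]) {
        block_seen[id] = true;
        new_coverage = true;
    }
}

/* Stand-in for an input method: creates and records a random value. */
int next_input(void) {
    int value = rand();
    if (current_count < MAX_INPUTS) {
        current_inputs[current_count++] = value;
    }
    return value;
}

/* Toy program under test with one branch on its input. */
void toy_program(void) {
    cover_block(0);
    if (next_input() % 2 == 0) {
        cover_block(1);
    } else {
        cover_block(2);
    }
}

/* Runs the program `executions` times and returns how many tests pass the filter. */
size_t generate_tests(void (*program)(void), size_t executions) {
    size_t kept = 0;
    for (size_t i = 0; i < executions; i++) {
        current_count = 0;
        new_coverage = false;
        program();
        if (new_coverage) {
            /* A real harness would store current_inputs as a test here. */
            kept++;
        }
        /* Otherwise the recorded inputs are simply discarded. */
    }
    return kept;
}
```

For toy_program, only two executions can ever pass the filter: the first one, and the first later one that takes the other branch. All remaining executions are discarded.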
The test harness of PRTest defines the input methods __VERIFIER_nondet_X so that they declare a new program variable of their respective type X and call method input to introduce a new test input of the required size. Figure 2 shows this for method __VERIFIER_nondet_int as an example.
Method input receives a pointer to the input variable var that a new value should be assigned to, and the size of the type of var in bytes. For each byte, input creates a random byte value and stores it in an array that represents the new value of the given size. To create random values, it uses the random number generator rand() provided by the C standard library. After a value has been created for each byte, this byte sequence is copied into var (Fig. 3). Method input considers all types in their binary representation and is thus type-agnostic: it uses a uniform distribution over arbitrary-size binary values and is able to handle both integer and float types.
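Since the figures are not reproduced here, the following is a minimal sketch of what the two methods might look like. It follows the description above but is not the original code: the recording of the value as a test input is omitted, the buffer is heap-allocated for simplicity, and the byte values are only uniform under the assumption that RAND_MAX + 1 is a multiple of 256.

```c
#include <stdlib.h>
#include <string.h>

/* Assigns var_size random bytes to the object that var points to. */
void input(void *var, size_t var_size) {
    unsigned char *bytes = malloc(var_size);
    if (bytes == NULL) {
        return; /* nothing to assign, or allocation failed */
    }
    for (size_t i = 0; i < var_size; i++) {
        /* One random byte per position; a real harness would also record it. */
        bytes[i] = (unsigned char)(rand() % 256);
    }
    memcpy(var, bytes, var_size);
    free(bytes);
}

/* Input method for int: declare a variable and let input fill it. */
int __VERIFIER_nondet_int(void) {
    int var;
    input(&var, sizeof(var));
    return var;
}

/* The same pattern works for float types, since input only sees bytes. */
double __VERIFIER_nondet_double(void) {
    double var;
    input(&var, sizeof(var));
    return var;
}
```

Because input works on raw bytes, a random double may also be a NaN or an infinity; the sketch makes no attempt to exclude such values.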
To measure code coverage of program executions, PRTest uses the program instrumentation SanitizerCoverage that is provided by clang. This instrumentation adds a special method call at the beginning of each code block. We define this method so that, whenever a new code block is covered, a Boolean flag is set to indicate that the current test covers new program behavior. This flag is then checked by the test filter to decide whether to keep or discard a test.
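A sketch of such callback definitions is given below. It assumes the trace-pc-guard mode of SanitizerCoverage (compiler flag -fsanitize-coverage=trace-pc-guard); the text does not state which mode PRTest uses. The two __sanitizer_cov_* functions are the callbacks that clang expects in this mode, while new_block_covered and take_new_coverage_flag are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: true if the current execution reached a new block. */
static bool new_block_covered = false;

/* Called by the instrumentation once per module, before any guard is used.
   Assigns every guard a non-zero id. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t next_id = 0;
    if (start == stop || *start != 0) {
        return; /* empty range, or guards already initialized */
    }
    for (uint32_t *guard = start; guard < stop; guard++) {
        *guard = ++next_id;
    }
}

/* Called by the instrumentation whenever an instrumented location is reached. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (*guard == 0) {
        return; /* this location was already covered earlier */
    }
    *guard = 0;               /* do not report this location again */
    new_block_covered = true; /* the current test covers new behavior */
}

/* Used by the test filter: reads the flag and resets it for the next execution. */
bool take_new_coverage_flag(void) {
    bool result = new_block_covered;
    new_block_covered = false;
    return result;
}
```

Setting a guard to zero after its first hit keeps the callback cheap for repeated executions, which matters when hundreds of thousands of tests are run per second.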
The version of PRTest used in Test-Comp '19 was implemented as part of tbf [1]. It is written in Python 3 and C, and uses the pseudo-random number generator provided by the C standard library with a uniform distribution. For reproducibility of the Test-Comp results, the seed of the random value generator is set to the arbitrary value 1618033988, derived from the golden ratio. Since version 2.0, PRTest is a stand-alone application that does not require Python anymore.

Strengths and weaknesses
Strengths
PRTest does not interpret or analyze the program under test, but executes it natively with a test-generation harness. Thanks to this, PRTest is able to handle all existing C constructs and can efficiently handle all numeric types, including floats.
PRTest is also able to create a vast number of tests in a very short time: For example, for benchmark task floats-cdfpl/square_2.i, PRTest generated over 400 000 tests per second. This allows very fast generation of a rudimentary test suite that covers the program branches that are most probable under a naive, uniform distribution of input values. PRTest is also very simple: The C harness, which is the only necessary component to create tests, is only 125 lines of code. The remaining code exists to determine the input methods for methods outside of Test-Comp, and to transform tests into the Test-Comp test format. This functionality is not required if one wants to apply PRTest's approach to a specific program with a fixed set of input methods.

Weaknesses
The uniform randomness of PRTest cannot compete with control-flow-aware test generators if programs contain deeply nested branches or branches that are only entered on a small range of inputs or a single input: The probability of generating a random test that reaches a code block that is only entered for a single value of a 32-bit input is $1/2^{32} \approx 2 \cdot 10^{-10}$. If PRTest produced tests with the same speed as for the task floats-cdfpl/square_2.i that was mentioned above, PRTest would have a chance of about 8 % to create a test that enters this code block within the Test-Comp time limit. To achieve a 90 % probability of producing a test that reaches the code block, PRTest would have to create almost 10 billion random tests. For task floats-cdfpl/square_2.i, this would take PRTest about 7 hours. The probability of entering a program branch also decreases exponentially with the number of conditions required to enter the branch.
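The figures above can be checked with a short calculation: with n independent random tests, the probability of hitting one specific 32-bit value at least once is 1 - (1 - 2^-32)^n. The following sketch computes this value without the math library. The function names are hypothetical, and the time limit of 900 seconds used in the usage note is an assumption that is consistent with the 8 % stated above.

```c
#include <stdint.h>

/* Computes base^exponent by repeated squaring (sufficiently precise here). */
static double power(double base, uint64_t exponent) {
    double result = 1.0;
    while (exponent > 0) {
        if (exponent & 1u) {
            result *= base;
        }
        base *= base;
        exponent >>= 1;
    }
    return result;
}

/* Probability that at least one of `tests` independent, uniformly random
   32-bit inputs equals one specific value. */
double hit_probability(uint64_t tests) {
    const double miss = 1.0 - 1.0 / 4294967296.0; /* 1 - 2^-32 */
    return 1.0 - power(miss, tests);
}
```

At 400 000 tests per second and an assumed limit of 900 seconds, PRTest creates 360 million tests, for which hit_probability yields about 0.08. For 10 billion tests, it yields about 0.90, and creating that many tests at the same speed takes 25 000 seconds, i.e., about 7 hours.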
In the literature, random testing is mostly used as a complement to control-flow-aware testing techniques, for example to provide an initial test suite [4] or to keep other generation techniques from getting stuck [3].

Installation and Usage
PRTest requires Python 3.5 or later and clang 3.9 or later. It can be installed by following the steps described in file README.md. The following command line runs PRTest in its configuration for Test-Comp '19, for coverage-property file PROP_FILE and input program PROGRAM.c:

Participation
PRTest participated in all categories of Test-Comp. In category Cover-Error, PRTest was not able to get any points in sub-categories ReachSafety-ControlFlow, ReachSafety-ECA and ReachSafety-Sequentialized because of its weakness regarding control flow. In sub-category ReachSafety-Floats, in contrast, PRTest even reached third place due to its ability to natively handle float types. PRTest also proved useful as a baseline to identify potential weaknesses of other participants: The result tables of Test-Comp '19 (e.g., for branch coverage) can show scatter plots for the values of chosen table columns. This allows a quick comparison of the coverage achieved per task by the random test suites created by PRTest and the test suites created by other participants. If a tool achieves significantly worse results for a task than PRTest, this may hint at a potential weakness in that tool. Such tasks exist for all participants.