To identify open problems and future research directions, we manually analyzed the execution logs of 1,653 frames that could not be reproduced in any of the 10 executions. This analysis includes a description of the reason why a frame could not be reproduced.Footnote 13 Based on those descriptions, we grouped the reason of the different failures into 13 categories and identified future research directions. Table 7 provides the number and frequency of frames classified in each category.Footnote 14 The complete categorization table is available in our replication package.Footnote 15
Table 7 Challenges with the number and percentage of frames identified for this challenge For each challenge, we discuss to what extent it is crash-reproduction-specific and its relation to search-based software testing in general. In particular, for challenges previously identified by the related literature in search-based test case generation, we highlight the differences originating from the crash reproduction context.
Input Data Generation
Generating complex input objects is a challenge faced by many automated test generation approaches, including search-based software testing and symbolic execution (Braione et al. 2017). Usually, the input space of each input is large and generating proper data enabling the search process to cover its goals is difficult.
As we can see from Table 7, this challenge is substantial in search-based crash reproduction. Trying to replicate a crash for a target frame requires to set the input arguments of the target method and all the other calls in the sequence properly such that when calling the target method, the crash happens. Since the input space of a method is usually large, this can be challenging. EvoCrash uses randomly generated input arguments and mock objects as inputs for the target method. As we described in Section 7, we observe that a widespread problem when reproducing a ClassCastException (CCE) is to identify which types to use as input parameters such that a CCE is thrown. In the case of a CCE, this information can be obtained from the error message of the exception. Our future work includes harvesting additional information, like error messages, to help the search process.
We also noticed that some stack traces involving Java generic types make EvoCrash abort the search after failing to inject the target method in every generated test during the guided initialization phase. Generating generic type parameters is also a recognized challenge for automated testing tools for Java (Fraser and Arcuri 2014a). To handle these parameters, EvoCrash, based on EvoSuite’s implementation (Fraser and Arcuri 2014a), collects candidate types from castclass and instanceof operators in Java bytecode, and randomly assign them to the type parameter. Since the candidate types may themselves have generic type parameters, a threshold is used to avoid large recursive calls to generic types. One possible explanation for the crashes in these cases could be that the threshold is not correctly tuned for the kind of classes involved in the recruited projects. Thus, the tool fails to set up the target method to inject to the tests. Based on the results of our evaluation, handling Java generics in EvoCrash needs further investigation to identify the root cause(s) of the crashes and devise effective strategies to address them.
For instance, EvoCrash cannot reproduce the first frame of crash XWIKI-13708Footnote 16, presented in Listing 5. The target method onEvent (detailed in Listing 6) has three parameters. EvoCrash could not reach the target line (line 78 in Listing 6) as it failed to generate a fitted value for the second parameter (source). This (Object) parameter should be castable to XWikiDocument and should return values for getXObject() or getAttachment() (using mocking for instance).
Chosen Examples
XWIKI-13708, frame 1; ES-22922, frame 5; ES-20479, frame 10.Footnote 17
Complex Code
Generating tests for complex methods is hard for any search-based software testing tool (Harman et al. 2004). In this study, we indicate a method as complex if (i) it contains more than 100 lines of code and high cyclomatic complexity; (ii) it holds nested predicates (Malburg and Fraser 2011; Harman et al. 2004); or (iii) it has the flag problem (Malburg and Fraser 2011; McMinn 2011), which include (at least one) branch predicate with a binary (boolean) value, making the landscape of the fitness function flat and turning the search into a random search (Harman et al. 2004).
As presented in Section 2, the first component of the fitness function that is used in EvoCrash encodes how close the algorithm is to reach the line where the exception is thrown. Therefore, frames of a given stack trace pointing to methods with a high code complexityFootnote 18 are more costly to reproduce, since reaching the target line is more difficult.
Handling complex methods in search-based crash reproduction is harder than in general search-based testing. The search process in crash reproduction should cover (in most cases) only one specific path in the software under test to achieve the reproduction. If there is a complex method on this path, the search process cannot achieve reproduction without covering it. Unlike the more general coverage driven search-based testing approach (with line coverage for instance), where the are usually multiple possible executions paths to cover a goal.
Chosen Examples
XWIKI-13096, frame 3; ES-22373, frame 10.Footnote 19
Environmental Dependencies
As discussed by Arcuri et al. (2014), generating unit tests for classes which interact with the environment leads to (i) difficulty in covering certain branches which depend on the state of the environment, and (ii) generating flaky tests (Luo et al. 2014), which may sometimes pass, and sometimes fail, depending on the state of the environment. Despite the numerous advances made by the search-based testing community in handling environmental dependencies (Arcuri et al. 2014; Fraser and Arcuri 2014b), we noticed that having such dependencies in the target class hampers the search process. Since EvoCrash builds on top of EvoSuite (Fraser and Arcuri 2013b), which is a search-based unit test generation tool, we face the same problem in the crash reproduction problem as well.
For instance, Listing 7 shows the stack trace of the crash XWIKI-12584.Footnote 20 During the evaluation, EvoCrash could not reproduce any of the frames of this stack trace. During our manual analysis, we discovered that, for the four first frames, EvoCrash was unable to instantiate an object of class XWikiHibernateStore,Footnote 21 resulting in an abortion of the search. Since the class XWikiHibernateStore relies on a connection to an environmental dependency (here, a database), generating unit test requires substantial mocking codeFootnote 22 that is hard to generate for EvoCrash. As for input data generation, our future work includes harvesting and leveraging additional information from existing tests to identify and use relevant mocking strategies.
Chosen Examples
ES-21061, frame 4; XWIKI-12584, frame 4.Footnote 23
Static Initialization
In Java, static initializers are invoked only once when the class containing them is loaded. As explained by Fraser and Arcuri (2014b), these blocks may depend on static fields from other classes on the classpath that have not been initialized yet, and cause exceptions such as NullPointerException to be thrown. In addition, they may involve environmental dependencies that are restricted by the security manager, which may also lead to unchecked exceptions being generated.
In our crash reproduction benchmark, we see that about 9% (see Table 7) of the cases cannot be reproduced as they point to classes that have static initializers. When such frames are used for crash reproduction with EvoCrash, the tool currently aborts the search without generating any crash reproducing test. As Fraser and Arcuri (2014b) discuss, automatically determining and solving all possible kinds of dependencies in static initializers is a non-trivial task that warrants dedicated research.
Chosen Examples
ES-20045, frames 1 and 2.Footnote 24
Abstract Classes and Methods
In Java, abstract classes cannot be instantiated. Although generating coverage driven unit tests for abstract classes is possible (one would most likely generate unit tests for concrete classes extending the abstract one or use a parametized test to check that all implementations respect the contract defined by the abstract class), when a class under test is abstract, EvoSuite (as the general test generation tool for java) looks for classes on the classpath that extend the abstract class to create object instances of that class. In order to cover (e.g., using line coverage) specific parts of the abstract class, EvoSuite needs to instantiate the right concrete class allowing to execute the different lines of the abstract class.
For crash reproduction, as we can see from Table 7, it is not uncommon to see abstract classes and methods in a stack trace. In several cases from Elasticsearch, the majority of the frames from a given stack trace point to an abstract class. Similarly to coverage-driven unit test generation, EvoCrash needs to instantiate the right concrete class: if EvoCrash picks the same class that has generated the stack trace in the first place, then it can generate a test for that class that reproduces the stack trace. However, if EvoCrash picks a different class, it could still generate a test case that satisfies the first two conditions of the fitness function (section 2). In this last case, the stack trace generated by the test would match the frames of the original stack trace, as the class names and line numbers would differ. The fitness function would yield a value between 0 and 1, but it may never be equal to 0.
Chosen Examples
ES-22119, frames 3 and 4; XRENDERING-422, frame 6.Footnote 25
Anonymous Classes
As discussed in the study by Fraser and Arcuri (2013b), generating automated tests for covering anonymous classes is more laborious because they are not directly accessible. We observed the same challenge during the manual analysis of crash reproduction results generated by EvoCrash. When the target frame from a given crash stack trace points to an anonymous object or a lambda expression, guided initialization in EvoCrash fails, and EvoCrash aborts the search without generating any test.
Chosen Examples
ES-21457, frame 8; XWIKI-12855, frames 30 and 31.Footnote 26
Private Inner Classes
Since it is not possible to access a private inner class, and therefore, not possible to directly instantiate it, it is difficult for any test generation tool in Java to create an object of this class. As for anonymous classes, this challenge is also present for crash reproduction approaches. In some crashes, the target frame points to a failing method inside a private inner class. Therefore, it is not possible to directly inject the failing method from this class during the guided initialization phase, and EvoCrash aborts the search.
Chosen Example
MATH-58b, frame 3.Footnote 27
Interfaces
In 6 cases, the target frame points to an interface. In Java, similar to abstract classes, interfaces may not be directly instantiated. In these cases also, EvoCrash randomly selects the classes on the classpath that implement the interface and, depending on the class picked by EvoCrash, the fitness function may not reach 0.0 during the search if the class is different from the one used when the input stack trace has been generated. This category is a special case of Abstract classes and methods (described in Section 8.5), however, since the definition of a default behavior for an interface is a feature introduced by Java 8 (Oracle 2019) that has, to the best of our knowledge, not been previously discussed for search-based testing, we choose to keep it as a separate category.
Chosen Example
ES-21457, frame 9.Footnote 28
Nested Private Calls
In multiple cases, the target frame points to a private method. As we mentioned in Section 6, those private methods are not directly accessible by EvoCrash. To reach them, EvoCrash detects other public or protected methods which invoke the target method directly or indirectly and randomly choose during the search. If the chain of method calls, from the public caller to the target method, is too long, the likelihood that EvoCrash may fail to pick the right method during the search increases.
In general, calling private methods is challenging for any automated test generation approach. For instance, Arcuri et al. (2017) address this problem by using the Java reflection mechanism to access private methods and private attributes during the search. As mentioned in Section 6.1, this can generate invalid objects (with respect to their class invariants) and lead to generating test cases helplessly trying to reproduce a given crash (Chen and Kim 2015).
Chosen Examples
XRENDERING-422, frames 7 to 9.Footnote 29
Empty enum Type
In the stack trace of the ES-25849 crash,Footnote 30 the 4th frame points to an empty enumeration Java type.Footnote 31 Since there are no values in the enumeration, EvoCrash was not able to instantiate a value and aborted during the initialization of the population. Frames pointing to code in an empty enumeration Java type should not be selected as target frames and could be filtered out using a preliminary static analysis.
Chosen Example
ES-25849, frame 4.
Frames with try/catch
Some frames have a line number that designates a call inside a try/catch block. When the exception is caught, it is no longer thrown at the specific line given in the trace, rather it is typically handled inside the associated catch blocks. From what we observed, often catch blocks either (i) re-throw a checked exception, which yield chained stack traces with information that is not exactly as the input stack trace but can still be used for crash reproduction; or (ii) log the caught exception. Since EvoCrash only considers uncaught exceptions that are generated as the result of running the generated test cases during the search, the logged stack traces is presently no use for crash reproduction. Also, even if a stack trace is recorded to an error log, this stack trace is not the manifestation of a crash per se. Indeed, once the exception logged, the execution of the program continues normally.
For instance, for the crash ES-20298,Footnote 32 EvoCrash cannot reproduce the fourth frame of the crash. This frame points to the following method call in a try and catch:
Even if an exception is thrown by the processResponse method, this exception is caught and logged, and the execution of the program continues normally.
Generally, if an exception is caught in one frame, it cannot be reproduced (as it cannot be observed) from higher level frames. For instance, for ES-20298, all frames above level 4 cannot be reproduced since the exception is catch in frame 4 and not propagated to the higher frames. This property of a crash stack trace implies that, for now, depending on where in the trace such frames exist, only a fraction of the input stack traces can actually be used for automated crash reproduction. Future development of EvoCrash can alleviate this limitation by, additionally to the monitoring of uncaught exceptions, read the error log to affecting the propagation of exceptions during execution. However, unlike other branching instructions relying on boolean values, for which classical coverage driven unit test generation can use the branch distance (see Section 2.2.1) to guide the search (McMinn 2004), there is little guidance offered for try/catch instructions since the branching condition is implicit in one or more instructions in the try.
Chosen Example
ES-14457, frame 4.Footnote 33
Missing Line Number
31 frames in JCrashPack have frames with a missing line number, as shown in Listing 8. This happens if the Java files have been compiled without any debug information (by default, the Java compiler add information about the source files and line numbers, for instance, when printing a stack trace) or if the frame points to a class part of the standard Java library and the program has been run in the Java Runtime Environment (JRE) and not the JDK.
Since EvoCrash currently requires a line number to compute the fitness values during the search, those frames have been ignored during our evaluation and do not appear in the results. Yet, as frames with missing line number appear in JCrashPack (and in other stack traces), we decided to mention this trial here as a search-based crash reproduction challenge. A possible solution, as the future work, is to relax the fitness function so that it can still approximate fitness if line numbers are missing.
Chosen Example
XRENDERING-422.Footnote 34
Incorrect Line Numbers
In 37 cases, the target frame points to the line in the source code where the target class or method is defined. This happens when the previous frame points to an anonymous class or a lambda expression. Such frames practically cannot be used for crash reproduction as the location they point to does not reveal where exactly the target exception occurs. One possible solution would be to consider the frame as having a missing line number and use the relaxed fitness function to approximate the fitness.
Chosen Examples
MATH-49b, frames 1 and 4.Footnote 35
Unknown
We were unable to identify why EvoCrash failed to reproduce 16 frames (out of 1,653 frames manually analyzed). In these cases, neither the logs nor the source code could help us understand how the exception was propagated.
Summary (RQ3)
What are the open problems that need to be solved to enhance search-based crash reproduction? Based on the manual analysis of the frames that could not be reproduced at least once out of 10 rounds of executions, we identified 13 challenges for search-based crash reproduction. We confirmed challenges previously identified in other search-based software testing approaches and specified how they affect search-based crash reproduction. And discovered new challenges, more specific to search-based crash reproduction and explained how the can affect other search-based software testing approaches.
These challenges are related to the difficulty to generate test cases due to complex input data, environmental dependencies, or complex code; abstraction (static initialization, interfaces, abstract, and anonymous classes); encapsulation mechanisms (private inner classes and nested private calls in the given stack trace) of object-oriented languages; or the selection of the target frame in crash reproduction (in try/catch blocks, in empty enumerations, when the location in the source code is unknown, or when the frame has an incorrect line number).