Logging to Facilitate Combinatorial System Testing

  • Peter M. Kruse
  • I. S. Wishnu B. Prasetya
  • Jurriaan Hage
  • Alexander Elyasov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8432)


Testing a web application is typically very complicated. Imposing simple coverage criteria such as function or line coverage is often not sufficient to uncover bugs caused by incorrect integration of components. Combinatorial testing can enforce a stronger criterion, while still allowing the prioritization of test cases in order to keep the overall effort feasible. Combinatorial testing requires the whole testing domain to be classified and formalized, e.g., in terms of classification trees. At the system testing level, these trees can be quite large. This short paper presents our preliminary work to automatically construct classification trees from logs of the system, and to subsequently calculate the coverage of our test runs against various combinatorial criteria. We use the tool CTE, which allows such criteria to be custom specified. Furthermore, it comes with a graphical interface to simplify the specification of new test sequences.


Combinatorial testing · Classification trees · Logging

1 Introduction

A web application typically consists of both client and server components, and is generally quite complex. The server implements complicated business logic, and the client is rich, consisting of a whole range of graphical user interface (GUI) elements that the user can employ. Unit testing is effective for finding bugs in the implementation of the business logic. However, the use of rich clients can easily lead to integration bugs. Unfortunately, the only way to discover such bugs is by testing the application as a whole. Simple event coverage, where we require that every type of GUI event has been tested, is usually too weak [1]. Combinatorial testing, e.g. pair-wise testing, is believed to lead to better results [2]. To systematically perform combinatorial testing, the testing domain should be well-partitioned; a good way to express such a partition is by means of classification trees [3].

A classification tree is the graphical representation of a test specification. It contains the test-relevant aspects (e.g. parameters) and their instantiations (e.g. values). In addition to the actual tree, the classification tree method guides the selection of test cases in a combination table. The classification trees of a realistic web application are typically very large. Manually constructing these trees is tedious and error-prone. We therefore pose the following research questions:

RQ1: Can we analyze the system under test (SUT), or its runtime behavior, to acquire elements of the classification trees?

RQ2: Can we analyze the SUT to obtain its operational profile? A software operational profile is a quantitative characterization of the usage of software in the field, e.g. expressed in terms of the occurrence probabilities of a set of operations of the software [4]. Such a profile can be used to prioritize testing [5]. In combinatorial testing, we can use profiles to annotate a classification tree.

RQ3: Can we analyze the SUT to validate, or even to acquire constraints on its valid behavior?

For example, when the SUT expects multiple parameters, it is not wise to consider all combinations of these parameters’ values as potential test cases if it turns out that only a few combinations are valid. A classification tree can be annotated with a constraint that prevents such invalid combinations from being tested.

In this paper we describe an approach to automatically construct classification trees from logs of the application. A log-based approach is chosen because many applications nowadays already employ logging: web servers produce access logs, embedded software produces operational logs, database servers produce transaction logs, and operating systems produce system logs. Classification trees constructed from the logs are then imported into the CTE XL Professional tool, which allows us to specify combinatorial coverage criteria and to analyze the logs generated by a test suite to see how well the suite satisfies these criteria. CTE XL Professional also allows constraints to be expressed, so that we can specify more semantically informed, context-sensitive coverage criteria.

The tooling can also be interesting for supporting exploratory testing [6]. In this approach the tester starts with an abstract testing goal (e.g. to test the user registration feature) and then proceeds to interact with the SUT. As understanding improves, the tester tries to more intelligently interact with the SUT in order to test it with respect to the testing goal. Exploratory testing relies on the tester’s cognitive ability to discover which scenarios are more critical, and to focus the testing effort on those. In traditional script-based testing the tester also has to do that, but has to anticipate those scenarios upfront. Moreover, adhering to scripts prohibits the tester from exploring new test cases. By logging exploratory testing sessions, our tooling will help to analyze the combinatorial strength of the testing, and to specify complementary test cases through CTE XL Professional’s graphical interface.

2 Concepts

When applying the classification tree method, the first step is to decide which test aspects of the SUT to focus on. For example, the type of user and the type of browser can be two aspects we are interested in. A test aspect can also concern the SUT’s output, or even non-functional aspects such as response time. In the terminology of classification trees, these test aspects are called classifications. After selecting relevant test aspects, the tester identifies the (equivalence) classes for each classification. The whole process is usually requirements-driven. For semantic purposes, and to facilitate structuring, classifications can be (hierarchically) grouped into compositions.
Fig. 1. Classification tree for an example application

Figure 1 contains a classification tree for an example application. The root node Application contains the name of the selected SUT. There are two compositions, Inputs and Outputs. Five classifications have been identified: three input classifications (Param A, Param B, Param C) and two output classifications (Result Param A, Result Param B). For each classification, a set of available classes has been determined, e.g. Param B has classes \(\alpha \), \(\beta \) and \(\gamma \). A test case is then constructed by selecting exactly one class for each classification in the tree. For example, \((a,\alpha ,true,\mathsf{success},-1)\) and \((a,\alpha ,true,\mathsf{fail},-1)\) are two test cases.
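To make the structure concrete, the tree of Fig. 1 can be held in a plain mapping from compositions to classifications to classes. Since the figure is not reproduced here, this minimal sketch only uses classes named in the text: it assumes Param C has the classes true and false, lists only \(-1\) for Result Param B, and spells \(\alpha\), \(\beta\), \(\gamma\) as strings. The helper names are illustrative, not part of CTE XL.

```python
from itertools import product

# Tree of Fig. 1, restricted to classes named in the text (assumption: the
# figure may contain further classes, e.g. for Result Param B).
TREE = {
    "Inputs": {
        "Param A": ["a", "b", "c"],
        "Param B": ["alpha", "beta", "gamma"],
        "Param C": ["true", "false"],
    },
    "Outputs": {
        "Result Param A": ["success", "fail"],
        "Result Param B": ["-1"],
    },
}

def classifications(tree):
    """Flatten compositions into an ordered list of (classification, classes)."""
    return [(name, classes)
            for composition in tree.values()
            for name, classes in composition.items()]

def is_test_case(tree, selection):
    """A test case selects exactly one class for every classification."""
    flat = classifications(tree)
    return len(selection) == len(flat) and all(
        cls in classes for cls, (_, classes) in zip(selection, flat))

def all_test_cases(tree):
    """Every possible test case: the Cartesian product of the class sets."""
    return list(product(*(classes for _, classes in classifications(tree))))

print(is_test_case(TREE, ("a", "alpha", "true", "success", "-1")))  # True
print(len(all_test_cases(TREE)))                                     # 36
```

Even this small tree yields \(3 \cdot 3 \cdot 2 \cdot 2 \cdot 1 = 36\) full combinations, which is why combination criteria, constraints and prioritization matter.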

Such a test case is “abstract”: it cannot be executed yet. To transform it into an executable test, each equivalence class needs to be mapped to a concrete value that represents the class. For example, the classes \(a\) and \(\alpha \) above may represent negative integers and short strings, respectively, and we may decide to map them to, e.g., \(-1\) and “xyz”.
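The concretization step can be sketched as a lookup from (classification, class) pairs to representative values. The table below is hypothetical and only contains the two representatives from the text; in practice the tester chooses one for every class.

```python
# Hypothetical representatives, following the example in the text:
# class a (negative integers) -> -1, class alpha (short strings) -> "xyz".
REPRESENTATIVES = {
    ("Param A", "a"): -1,
    ("Param B", "alpha"): "xyz",
}

def concretize(abstract_case, representatives):
    """Map each (classification, class) pair of an abstract test case to its
    concrete representative; fail loudly if none has been chosen yet."""
    concrete = {}
    for classification, cls in abstract_case.items():
        key = (classification, cls)
        if key not in representatives:
            raise KeyError(f"no concrete value chosen for {key}")
        concrete[classification] = representatives[key]
    return concrete

print(concretize({"Param A": "a", "Param B": "alpha"}, REPRESENTATIVES))
# {'Param A': -1, 'Param B': 'xyz'}
```

Raising an error for a missing representative, rather than guessing a value, keeps abstract test cases from silently turning into unintended concrete ones.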

In real-world test scenarios, some combinations of classes may actually be invalid. Therefore, constraints may need to be introduced to specify which combinations are allowed. Constraints can be found in many specifications of a software system. They exist for many reasons, such as limitations of the components used in the target system, lack of available resources and even marketing decisions [7]. We use dependency rules to formalize constraints [8]. For example, we may have a rule specifying that if Param A equals a and Param B equals \(\alpha \), then Param C may not be false. Such rules are used to annotate classification trees.
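The example rule can be expressed as a predicate over a combination of input classes and used to prune the combination space. This is a minimal sketch, not CTE XL's dependency-rule syntax; it assumes Param C has the classes true and false, modelled here as Python booleans.

```python
from itertools import product

PARAM_A = ["a", "b", "c"]
PARAM_B = ["alpha", "beta", "gamma"]
PARAM_C = [True, False]   # assumption: Param C is boolean

def rule_ok(a, b, c):
    """Example dependency rule: if Param A = a and Param B = alpha,
    then Param C may not be false."""
    return not (a == "a" and b == "alpha" and c is False)

# All input combinations, and those that satisfy the rule.
combinations = list(product(PARAM_A, PARAM_B, PARAM_C))
valid = [combo for combo in combinations if rule_ok(*combo)]

print(len(combinations), len(valid))  # 18 17
```

A single rule removes exactly one of the 18 input combinations here; with realistic trees, rules can remove large parts of the space.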

A classification tree editor can be used to perform prioritized test case generation [9]. Due to limited test resources, it can be necessary to prioritize and select subsets of test cases. For prioritized test case generation, all classes in the tree need to be annotated with weights. The weight of a class is its relative frequency of occurrence within the classification to which it belongs. Essentially, this captures the SUT’s operational profile [4], but in terms of classes rather than operations. Assigning weights to classes requires a data set containing a representative distribution of values and is typically done manually. In the example from Fig. 1, it might be the case that for Param A, its classes a, b, and c occur in \(50\,\%\), \(40\,\%\), and \(10\,\%\) of all cases, respectively. Prioritization can be based on usage models [10], error models [11], and risk models [12].
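A simple way to use such weights is to score each combination and keep the highest-scoring ones within a test budget. The sketch below assumes a scoring scheme (the product of class weights) that is not taken from CTE XL; the Param A weights come from the text, while the Param B weights are hypothetical.

```python
from itertools import product
from math import prod

# Class weights per classification. Param A follows the text;
# Param B's weights are hypothetical.
WEIGHTS = {
    "Param A": {"a": 0.5, "b": 0.4, "c": 0.1},
    "Param B": {"alpha": 0.7, "beta": 0.2, "gamma": 0.1},
}

def prioritized(weights, budget):
    """Rank all class combinations by the product of their class weights
    (an assumed scoring scheme) and keep the `budget` best ones."""
    names = list(weights)
    scored = []
    for combo in product(*(weights[n].items() for n in names)):
        classes = tuple(cls for cls, _ in combo)
        score = prod(w for _, w in combo)
        scored.append((score, classes))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:budget]

for score, case in prioritized(WEIGHTS, 3):
    print(case, round(score, 2))
```

With a budget of three, the sketch selects (a, alpha), (b, alpha) and (a, beta), the combinations that the assumed profile makes most frequent.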

The classification tree method can also be used for test sequence generation [13]. The idea is to interpret classification trees as FSMs [14], and then generate covering paths through the FSMs. This involves the identification of allowed transitions between individual classes. In our example, it may be that Param A can only be c in a test step if there has been an earlier test step with Param A equal to b. Then Param A cannot equal c directly after a test step with a, but always requires an intermediate step in which it equals b.
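Interpreting a classification as an FSM, a test sequence is valid if it starts in an allowed state and every consecutive pair of classes is an allowed transition. This sketch encodes the example for Param A; the initial states and all transitions other than the excluded a to c are assumptions made for illustration.

```python
# FSM over the classes of Param A. Following the text, c may only be
# reached via b, so (a, c) is absent. The other transitions and the
# initial states are illustrative assumptions.
INITIAL = {"a", "b"}
ALLOWED = {
    ("a", "a"), ("a", "b"),
    ("b", "a"), ("b", "b"), ("b", "c"),
    ("c", "a"), ("c", "b"), ("c", "c"),
}

def is_valid_sequence(steps):
    """Check a test sequence (one class of Param A per test step) against
    the FSM. An empty sequence is rejected in this sketch."""
    if not steps or steps[0] not in INITIAL:
        return False
    return all((prev, cur) in ALLOWED for prev, cur in zip(steps, steps[1:]))

print(is_valid_sequence(["a", "b", "c"]))  # True
print(is_valid_sequence(["a", "c"]))       # False: c directly after a
```

Generating covering paths then amounts to walking this graph so that every allowed transition is exercised at least once.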

3 Approach

To answer our research questions, we have implemented a prototype with CTE XL Professional. The prototype requires log files in the FITTEST format [15] from an instrumented SUT. The format is well structured, like XML, which makes interpretation easier, but it is more compact than XML in order to reduce I/O overhead.

3.1 Tool Chain

Fig. 2. Classification tree construction algorithm

The algorithm for constructing the classification tree from logs is given in Fig. 2. Abstractly, a log is a sequence of entries; each entry describes the occurrence of an event during operation of the SUT and the state of the SUT after the event. An event description consists of a name, a type and a list of input parameters. A state is represented by a set of internal SUT variables, which can store complex values such as objects; logging object variables therefore produces deeply nested state structures. We view log entries as test steps, together forming a test sequence. The algorithm constructs not only a classification tree, but also a CTE XL test sequence that corresponds to the log. It consists of three phases.

In phase 1 (Lines 1–9), all log entries are parsed. The elements in each log entry are analyzed, and their names and types are stored. The concrete values found during this process are added to the list of possible values for each element. Each log entry is then collected together with all contained parameters and their values, and the entry’s time stamp is also stored.
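Phase 1 can be sketched roughly as follows. This is our own reading of the description, not the algorithm of Fig. 2 itself, and it does not parse the FITTEST format: it assumes entries have already been parsed into a time stamp plus named elements, with Python lists standing for arrays. The log data are hypothetical, loosely modelled on the events discussed for Figs. 3 and 4.

```python
from collections import defaultdict

# Hypothetical, already-parsed log entries (not real FITTEST syntax).
LOG = [
    {"time": 1000, "elements": {"type": "itemclick", "targetID": "ButtonBar0", "args": [0, 100]}},
    {"time": 1250, "elements": {"type": "itemclick", "targetID": "ButtonBar0", "args": [2, 100]}},
    {"time": 1900, "elements": {"type": "click", "targetID": "Button1", "args": []}},
]

def phase1(log):
    """Record each element's kind and observed values, and keep every
    entry together with its parameters and time stamp."""
    kinds = {}                   # element name -> "plain" or "array"
    values = defaultdict(set)    # element name -> observed values or members
    steps = []
    for entry in log:
        for name, value in entry["elements"].items():
            if isinstance(value, list):
                kinds[name] = "array"
                values[name].update(value)   # collect array members
            else:
                kinds[name] = "plain"
                values[name].add(value)      # collect plain values
        steps.append((entry["time"], entry["elements"]))
    return kinds, dict(values), steps

kinds, values, steps = phase1(LOG)
print(kinds)
```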

In phase 2 (Lines 10–26), the classification tree is created. The list of all found elements is processed. If the element is a plain variable, we create a new element of type classification in the tree (Line 13). For each possible value of that variable, a class is created (Line 15). Since the values of plain variables are exclusive, i.e., a single log entry contains exactly one value for each variable, the process maps variables to classifications and their values to classes.

If the element is an array (with different subsets of members found among the log entries), we create a new element of type composition in the tree (Line 19). For each array member, we then create a classification (Line 21). For this classification, we always create two classes, \(1\) and \(0\) (Lines 22–23): \(1\) indicates that the element is a member of the array, and \(0\) that it is not. Since array members are not exclusive, i.e. a log entry can contain an array with more than one member, we map each member to its own classification and group these classifications under a parent composition.
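Phase 2 can be sketched as a function from the element catalogue of phase 1 to a nested tree. As before, this is an illustration under our own data layout (plain dictionaries for classifications and compositions), not CTE XL's internal model, and the input catalogue is hypothetical.

```python
def _ordered(vals):
    """Sort observed values for a stable layout; fall back to their
    repr if the values are of mixed, incomparable types."""
    try:
        return sorted(vals)
    except TypeError:
        return sorted(vals, key=repr)

def phase2(kinds, values):
    """Plain variables become classifications with one class per observed
    value. Arrays become compositions with one classification per observed
    member, each having the classes 1 (present) and 0 (absent)."""
    tree = {}
    for name, kind in kinds.items():
        if kind == "plain":
            tree[name] = {"kind": "classification",
                          "classes": _ordered(values[name])}
        else:
            tree[name] = {"kind": "composition",
                          "children": {m: [1, 0] for m in _ordered(values[name])}}
    return tree

# Hypothetical catalogue, as phase 1 might produce it.
KINDS = {"type": "plain", "args": "array"}
VALUES = {"type": {"itemclick", "click"}, "args": {0, 2, 100}}
TREE = phase2(KINDS, VALUES)
print(TREE["type"]["classes"])  # ['click', 'itemclick']
```

The asymmetry reflects the exclusivity argument above: a plain variable needs only one classification because exactly one value occurs per entry, whereas each array member needs its own 1/0 classification.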

In phase 3 (Lines 27–37), we fill the combination table. Recall that each log entry is considered a test step (Line 28). For each plain variable, we select the corresponding value class from the tree. For arrays, we iterate through all known possible array members and compare them against the members of the array in the current log entry. We select \(1\)-classes for all members contained in the current entry (Line 33) and \(0\)-classes for absent members (Line 34). Finally, we set the relative time stamp on the test step (Line 36).
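Phase 3 turns each logged entry into one row of the combination table. In this sketch the tree and steps are hypothetical, the array values reuse events 1 and 2 of Fig. 4, and we assume the relative time stamp is measured from the first entry (the text does not say what it is relative to).

```python
# A tree as phase 2 might produce it (hypothetical content).
TREE = {
    "type": {"kind": "classification", "classes": ["click", "itemclick"]},
    "args": {"kind": "composition", "children": {0: [1, 0], 2: [1, 0], 100: [1, 0]}},
}

def phase3(tree, steps):
    """Turn each (time stamp, elements) entry into one combination-table
    row; time is taken relative to the first entry (an assumption)."""
    rows = []
    start = steps[0][0] if steps else 0
    for time, elements in steps:
        row = {}
        for name, node in tree.items():
            if node["kind"] == "classification":
                row[name] = elements.get(name)        # selected value class
            else:
                present = set(elements.get(name, []))
                for member in node["children"]:       # 1-class or 0-class
                    row[(name, member)] = 1 if member in present else 0
        row["time"] = time - start                    # relative time stamp
        rows.append(row)
    return rows

STEPS = [
    (1000, {"type": "itemclick", "args": [0, 100]}),   # event 1
    (1250, {"type": "itemclick", "args": [2, 100]}),   # event 2
]
for row in phase3(TREE, STEPS):
    print(row)
```

An entry with an empty array would get \(0\) for every member, matching the all-zeros configuration described for Fig. 4.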

The result from an example import is shown in Fig. 3.
Fig. 3. Classification tree resulting from a log

The left branch of the tree contains details about the events of an application. The classification targetID has been collapsed to save space, and its selected values are given as text in the test matrix. The classification type is a typical example of a plain value variable: it consists of three classes, itemclick, click and change. As can be seen from the test matrix, whenever the targetID is ButtonBar0, the corresponding event type is itemclick (Events 1–6). Elements with a different targetID cause events of type click instead. The event type change does not occur in the first 10 events.

The composition args was collapsed in Fig. 3 in order to save space. Details are given in Fig. 4.
Fig. 4. Classification tree part for array

For each member of the array args, a classification has been created in the tree. In this case, there are twelve different possible array members. The members present in a particular event are those whose classification selects \(1\) for that event. For example, for event 1 these are \(0\) and \(100\), and for event 2 they are \(2\) and \(100\). An empty array would also be a possible configuration, with \(0\) selected for every member.

3.2 Addressing the RQs

Earlier in the paper, we posed three research questions. Below we discuss our progress in answering these questions with a log-based approach, and the challenges that are still left for future work.

RQ1: Can logs be used to acquire elements of classification trees?

Since logs contain recurring patterns that hint at the structures underlying the SUT, identifying classifications is quite straightforward; moreover, these patterns also help to identify test aspects missing from the test specification. To construct the classes of the classifications, we took a direct approach in which each occurring concrete parameter value is mapped to its own class. As a result, the classification tree may be too verbose. Ideally, we would merge different parameter values that result in the same behavior into the same equivalence class. This remains future work; the expected challenge here is over- and under-approximation [16]. Furthermore, for the automatic construction of classification trees, or of any kind of test specification, scalability and applicability depend heavily on the granularity and general quality of the logs used. This raises the question of how to obtain such logs in the first place.

RQ2: Can logs be used to assess profiles of the application under test?

Calculating the classes’ distributions is simple: it involves counting all events in the logs and then calculating, for each class, the ratio of its occurrences to the total number of events. This gives testers a quick estimate of the class weights from the observed distribution.
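The counting described above fits in a few lines. The sketch assumes each logged event has already been reduced to its selected classes (as in the combination table); the ten event types are hypothetical.

```python
from collections import Counter

def class_distribution(steps, classification):
    """Relative frequency of each class of one classification: the
    number of events selecting it divided by the total number of events."""
    counts = Counter(step[classification] for step in steps)
    total = len(steps)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical event types from ten logged events.
EVENTS = [{"type": t} for t in ["itemclick"] * 5 + ["click"] * 4 + ["change"]]
print(class_distribution(EVENTS, "type"))
# {'itemclick': 0.5, 'click': 0.4, 'change': 0.1}
```

The resulting ratios can be used directly as class weights for prioritized test case generation; classes never observed in the logs simply receive no weight.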

Calculating error probabilities [11] is future work. The error probability of a class is the probability that an error occurs when this class is used in a test case. In principle, this can be done by comparing values in the logs that represent the SUT’s state or outputs to their expected values. If they do not match, one could assume the execution that produced the value to be erroneous. The error probability of a class would then be the ratio of erroneous executions to all executions in which the class is used. However, this requires oracles to be specified and integrated into the tool, which we have not done yet.
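Assuming an oracle were available, the per-class ratio could be computed as below. Both the oracle and the execution records are hypothetical stand-ins; as stated above, the prototype does not yet have oracles.

```python
from collections import defaultdict

def error_probabilities(executions, oracle):
    """Per-class error probability: among executions whose step uses a
    class, the fraction whose observed output differs from the oracle's.

    `executions` holds (step, observed_output) pairs, where a step maps
    classifications to classes; `oracle(step)` returns the expected output.
    """
    used = defaultdict(int)
    failed = defaultdict(int)
    for step, observed in executions:
        erroneous = observed != oracle(step)
        for classification, cls in step.items():
            used[(classification, cls)] += 1
            failed[(classification, cls)] += int(erroneous)
    return {key: failed[key] / used[key] for key in used}

def expect_ok(step):
    """Trivial hypothetical oracle: every step should yield 'ok'."""
    return "ok"

EXECS = [
    ({"Param A": "a", "Param B": "alpha"}, "ok"),
    ({"Param A": "a", "Param B": "beta"}, "error"),
    ({"Param A": "b", "Param B": "alpha"}, "ok"),
]
print(error_probabilities(EXECS, expect_ok))
```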

RQ3: Can logs be used to validate existing and acquire new constraints?

The classification tree editor is already capable of checking a test case against a set of behavior constraints in the form of dependency rules [8]. Since in our approach logs are transformed into test sequences, they can thus be used to check the validity of existing dependency rules. If there is a mismatch, then either the system is not behaving correctly or there is a problem with the test specification.
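The validation idea can be illustrated by running rule predicates over test steps reconstructed from a log and reporting every violation. The rule is the example from Sect. 2, written as a Python predicate rather than in CTE XL's rule notation, and the logged steps are hypothetical.

```python
# Dependency rules as (name, predicate over one test step).
RULES = [
    ("A=a and B=alpha implies C is not false",
     lambda s: not (s["Param A"] == "a" and s["Param B"] == "alpha"
                    and s["Param C"] is False)),
]

def violations(steps, rules):
    """Return (step index, rule name) for every logged step that breaks a
    rule. Each hit means either the SUT misbehaved or the rule is wrong."""
    return [(i, name)
            for i, step in enumerate(steps)
            for name, holds in rules
            if not holds(step)]

LOGGED = [  # hypothetical steps reconstructed from a log
    {"Param A": "a", "Param B": "alpha", "Param C": True},
    {"Param A": "a", "Param B": "alpha", "Param C": False},
    {"Param A": "b", "Param B": "beta", "Param C": False},
]
print(violations(LOGGED, RULES))
```

Here only the second step is reported; deciding whether it reveals a bug or a faulty rule remains the tester's call.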

Given a sufficiently long execution time or a high number of repetitions of the actual system, one may assume that all combinations that did not occur are invalid. This allows dependency rules to be extracted automatically from the logs. A potential problem is, again, over- and under-approximation: if the execution time is too short, too many combinations are considered forbidden. We think this RQ can only be solved partially, and it still requires future work.
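Restricted to pairs of classes, a simplification we assume here, the extraction idea reduces to listing class pairs from different classifications that never co-occur in a logged step. Each such pair is only a candidate constraint, and short logs make the list too long. Tree and steps below are hypothetical.

```python
from itertools import combinations, product

def unobserved_pairs(tree, steps):
    """Candidate constraints: pairs of classes from two different
    classifications that never co-occur in any logged step."""
    seen = set()
    for step in steps:
        # Sorting by classification name gives a canonical pair order.
        for pair in combinations(sorted(step.items()), 2):
            seen.add(pair)
    missing = []
    for c1, c2 in combinations(sorted(tree), 2):
        for v1, v2 in product(tree[c1], tree[c2]):
            if ((c1, v1), (c2, v2)) not in seen:
                missing.append(((c1, v1), (c2, v2)))
    return missing

TREE = {"Param A": ["a", "b"], "Param B": ["alpha", "beta"]}
STEPS = [
    {"Param A": "a", "Param B": "alpha"},
    {"Param A": "b", "Param B": "alpha"},
    {"Param A": "b", "Param B": "beta"},
]
print(unobserved_pairs(TREE, STEPS))
# [(('Param A', 'a'), ('Param B', 'beta'))]
```

Whether (a, beta) is truly forbidden or merely untested cannot be decided from the log alone, which is exactly the approximation problem noted above.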

To identify transitions between classes of the classification tree, it is necessary to identify individual test steps from the logs. By monitoring the order in which classes occur, one can start from the assumption that any two classes occurring in consecutive steps indicate a valid transition between them. However, the absence of a particular order in the logs does not necessarily indicate an invalid transition. The quality of the identification depends strongly on the extent to which the logs are representative of the system’s behavior. Future work is still required.
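Collecting transitions from consecutive steps can be sketched as follows, using a hypothetical log of Param A classes. Observed transitions are treated as valid; anything absent is only unobserved.

```python
from collections import defaultdict

def observed_transitions(steps, classification):
    """Count transitions (previous class -> next class) of one
    classification across consecutive logged steps."""
    transitions = defaultdict(int)
    for prev, cur in zip(steps, steps[1:]):
        transitions[(prev[classification], cur[classification])] += 1
    return dict(transitions)

LOGGED = [{"Param A": v} for v in ["a", "b", "c", "b", "a"]]
print(observed_transitions(LOGGED, "Param A"))
# {('a', 'b'): 1, ('b', 'c'): 1, ('c', 'b'): 1, ('b', 'a'): 1}
```

In this log the transition from a to c never appears, which is consistent with, but does not prove, the constraint from the FSM example in Sect. 2.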

4 Related Work

Previous work to automatically generate and validate classification trees, e.g., using Z-specifications, includes [17, 18]. However, the requirement to first have a formal Z specification may in practice be unrealistic. A log-based approach is more pragmatic, since many modern applications already employ logging.

Logs can potentially provide much information; if semantic information can be extracted, powerful analyses become possible. For example, logs can be converted into a different representation to be queried for properties. Database-style and Prolog-style queries have been investigated in this context [19, 20]. It may be worthwhile to investigate how elements of a classification tree can be queried against the logs. Marchetto et al. [21] and Lorenzoli et al. [22] investigated how FSMs (modeling the behavior of a program) can be reconstructed from logs. Nguyen et al. investigated how classification trees can be generated from FSMs [23]. However, a tree in Nguyen et al. represents a family of test sequences through an FSM, whereas our tree represents the classification of the entire SUT. Their choice is natural when the goal is to optimize the combinatorial coverage of each test sequence, whereas ours is natural for measuring and optimizing the combinatorial coverage of an entire suite.

In practice, people often use a “simple” log format, e.g. the Common Log Format produced by many web servers. Logs produced using the popular log4j family of logging libraries also employ a simple format. Such a format allows log entries to be accompanied by free-form text. When these free-text fields are used, they often contain useful semantic information. Extracting or even classifying the semantic information in the free text is a difficult problem. Jain et al. [24] applied clustering techniques to automatically classify unstructured log messages. The clustering is based on common textual structures within the messages, which are then associated with a semantic categorization of the messages, including info messages and (various kinds of) alert messages. If we can assume more rigidly formatted logs, e.g. in Daikon format [25] or FITTEST format [15], the clustering approach can potentially be lifted to identify equivalence classes from the semantic parts of the log entries. This in turn may be useful for the inference of more abstract classes in the inferred classification trees.

5 Conclusion and Future Work

We have presented the use of logs for the generation, completion and validation of classification trees. We have proposed three research questions and discussed possible solutions, outcomes, and future work. A working prototype has been developed and integrated into CTE XL Professional. The answers we found for the research questions are promising:
  • For RQ1, we found that elements of the classification tree can indeed be acquired from logs. We do, however, only obtain concrete values from the logs; a mapping to abstract values is not provided. Some future work is still required for grouping values into equivalence classes.

  • For RQ2, we are able to calculate occurrence probabilities, though the calculation of error probabilities still has some issues.

  • For RQ3, the validation of logs against dependency rules has been performed successfully. The acquisition of dependency rules and allowed transitions depends on the amount of logs available.

Providing positive answers to our RQs helps to improve the quality of test design. Reducing manual labor when creating test specifications in terms of classification trees helps test engineers to focus on the actual product features and changes.

While all work has been done on classification trees, we are certain that the results can be extended to other test specification formats as well.

We have earlier indicated what challenges are left for future work. Additionally, we want to perform a large scale evaluation.



This work is supported by EU grant ICT-257574 (FITTEST).


  1. Memon, A.M., Soffa, M.L., Pollack, M.E.: Coverage criteria for GUI testing. In: Proceedings of the 8th European Software Engineering Conference Held Jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-9, pp. 256–267. ACM, New York (2001)
  2. Nie, C., Leung, H.: A survey of combinatorial testing. ACM Comput. Surv. 43, 11:1–11:29 (2011)
  3. Grochtmann, M., Grimm, K.: Classification trees for partition testing. Softw. Test. Verif. Reliab. 3(2), 63–82 (1993)
  4. Musa, J.D.: Operational profiles in software-reliability engineering. IEEE Softw. 10(2), 14–32 (1993)
  5. Misra, R.B., Saravana Kumar, K.: Software operational profile based test case allocation using fuzzy logic. Int. J. Autom. Comput. 4(4), 388 (2007)
  6. Bach, J.: Exploratory testing explained (2003). http://www.satisfice.com/articles/et-article.pdf
  7. Cohen, M.B., Dwyer, M.B., Shi, J.: Interaction testing of highly-configurable systems in the presence of constraints. In: ISSTA ’07: Proceedings of the 2007 International Symposium on Software Testing and Analysis, New York, NY, USA, pp. 129–139 (2007)
  8. Lehmann, E., Wegener, J.: Test case design by means of the CTE XL. In: Proceedings of the 8th European International Conference on Software Testing, Analysis and Review (EuroSTAR 2000), Copenhagen, Denmark, December 2000
  9. Kruse, P.M., Luniak, M.: Automated test case generation using classification trees. Softw. Qual. Prof. 13(1), 4–12 (2010)
  10. Walton, G.H., Poore, J.H., Trammell, C.J.: Statistical testing of software based on a usage model. Softw. Pract. Exper. 25(1), 97–108 (1995)
  11. Elbaum, S., Malishevsky, A.G., Rothermel, G.: Test case prioritization: a family of empirical studies. IEEE Trans. Softw. Eng. 28(2), 159–182 (2002)
  12. Amland, S.: Risk-based testing: risk analysis fundamentals and metrics for software testing including a financial application case study. J. Syst. Softw. 53(3), 287–295 (2000)
  13. Kruse, P.M., Wegener, J.: Test sequence generation from classification trees. In: Proceedings of ICST 2012 Workshops (ICSTW 2012), Montreal, Canada (2012)
  14. Conrad, M., Dörr, H., Fey, I., Yap, A.: Model-based generation and structured representation of test scenarios. In: Proceedings of the Workshop on Software-Embedded Systems Testing, Gaithersburg, Maryland, USA (1999)
  15. Prasetya, I.S.W.B., Middelkoop, A., Elyasov, A., Hage, J.: D6.1: FITTEST Logging Approach, Project no. 257574, FITTEST Future Internet Testing (2011)
  16. Tonella, P., Marchetto, A., Nguyen, D.C., Jia, Y., Lakhotia, K., Harman, M.: Finding the optimal balance between over and under approximation of models inferred from execution logs. In: 2012 IEEE 5th International Conference on Software Testing, Verification and Validation, pp. 21–30. IEEE (2012)
  17. Singh, H., Conrad, M., Sadeghipour, S.: Test case design based on Z and the classification-tree method. In: Proceedings of the 1st International Conference on Formal Engineering Methods, ICFEM ’97, pp. 81–90. IEEE Computer Society, Washington, DC (1997)
  18. Hierons, R.M., Harman, M.: Automatically generating information from a Z specification to support the classification tree method. In: Bert, D., Bowen, J.P., King, S., Waldén, M. (eds.) ZB 2003. LNCS, vol. 2651, pp. 388–407. Springer, Heidelberg (2003)
  19. Feather, M.S.: Rapid application of lightweight formal methods for consistency analyses. IEEE Trans. Softw. Eng. 24(11), 949–959 (1998)
  20. Ducasse, S., Girba, T., Wuyts, R.: Object-oriented legacy system trace-based logic testing. In: 10th European Conference on Software Maintenance and Reengineering (CSMR). IEEE (2006)
  21. Marchetto, A., Tonella, P., Ricca, F.: State-based testing of Ajax web applications. In: ICST, pp. 121–130. IEEE (2008)
  22. Lorenzoli, D., Mariani, L., Pezzè, M.: Automatic generation of software behavioral models. In: 30th International Conference on Software Engineering, pp. 501–510. ACM (2008)
  23. Nguyen, C.D., Marchetto, A., Tonella, P.: Combining model-based and combinatorial testing for effective test case generation. In: Proceedings of International Symposium on Software Testing and Analysis (ISSTA), Minneapolis, Minnesota, USA (2012)
  24. Jain, S., Singh, I., Chandra, A., Zhang, Z.-L., Bronevetsky, G.: Extracting the textual and temporal structure of supercomputing logs. In: Yang, Y., Parashar, M., Muralidhar, R., Prasanna, V.K. (eds.) HiPC, pp. 254–263. IEEE (2009)
  25. Ernst, M.D., Perkins, J.H., Guo, P.J., McCamant, S., Pacheco, C., Tschantz, M.S., Xiao, C.: The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69(1–3), 35–45 (2007)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Peter M. Kruse (1)
  • I. S. Wishnu B. Prasetya (2)
  • Jurriaan Hage (2)
  • Alexander Elyasov (2)
  1. Berner and Mattner Systemtechnik GmbH, Berlin, Germany
  2. Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands
