Install the latest version of the R software environment (R Core Team 2014). In the menu of R, go to Packages, install the package quint from CRAN, and load the package.
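The same installation can also be done from the R console instead of the menu. A minimal sketch, assuming an internet connection and a configured CRAN mirror:

```r
# Install quint from CRAN only if it is not available yet, then load it
if (!requireNamespace("quint", quietly = TRUE)) {
  install.packages("quint")
}
library(quint)
```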
Input: Data and formula
The study design for the data to be analyzed with quint needs to be a randomized controlled trial. The data structure in R can be an R data frame or an R matrix. The data set has to include at least the following variables, the order and names of which are not important: one continuous outcome variable (with the class of this variable being numeric), a dichotomous treatment variable (i.e., class may be factor or numeric), and several baseline characteristics (i.e., candidate splitting variables) that can be ordinal or continuous (i.e., class is numeric), or dichotomous (i.e., categorical variables with only two categories, such as gender or continuous characteristics that are dichotomized using a prespecified clinically informed cut-off score). The current version of quint is restricted to a dichotomous treatment variable and can handle neither categorical baseline characteristics with more than two categories, nor categorical outcome variables. Our example data set is included in the R package and can be inspected in the following way:
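A minimal sketch of such an inspection, assuming the example data set is shipped under the name bcrp (the name used by the package documentation):

```r
library(quint)

data(bcrp)   # load the example data set shipped with the package
head(bcrp)   # first six rows
str(bcrp)    # class of every variable
```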
The two outcome variables, physical functioning (phys) and depression (cesd), have been measured at baseline (t1) and at 9 months post-treatment (t3). The variables negative social interaction (negsoct1) and unmitigated communion (uncomt1) are patient characteristics measured at baseline (i.e., a selection of the nine characteristics in this data set). The treatment variable cond represents three therapy conditions (nutrition, education, and control condition, denoted by 1 to 3, respectively). To get more insight into the meaning of the variables, one may use the help function:
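For instance, assuming the data set is documented under the topic name bcrp:

```r
library(quint)

# Open the documentation page of the example data set (same as ?bcrp)
help("bcrp", package = "quint")
```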
If a data set contains more than two treatment conditions, as this one does, the user needs to select the two conditions of interest before performing a quint analysis. Because this example focuses on the comparison between the nutrition and the education condition, we create a new data set without the third condition by the following command:
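A sketch of this selection step. The object name bcrp2arm is an illustrative choice; the coding of cond (1 = nutrition, 2 = education, 3 = control) is taken from the description above:

```r
library(quint)
data(bcrp)

# Keep the nutrition (1) and education (2) conditions; drop the control condition (3)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))

table(bcrp2arm$cond)   # number of patients per remaining condition
```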
Before the analysis, the user needs to specify the role of all variables by means of a formula, which looks as follows: $Y \sim T \mid X_1 + \dots + X_J$, with a single outcome variable $Y$ followed by two parts separated by the symbol |. The first part represents the dichotomous treatment variable $T$ and the second part the baseline characteristics $X_1$ to $X_J$, where $J$ equals the total number of baseline characteristics under study. The order of $X_1$ to $X_J$ within the second part of the formula is arbitrary. In general, the outcome $Y$ may be a single follow-up measure, a change or rate of change score from baseline to follow-up, a follow-up score adjusted for baseline, or a variable indicating time to an event. (Note that if outcome measurements at more than two time points are available, quint analyses can be run on change scores between any two time points of interest.) We recommend using outcome variables measured on scales that are calibrated in terms of what constitutes clinically meaningful differences. Furthermore, we recommend constructing the outcome variable in such a way that a higher score indicates a better treatment outcome, to facilitate the interpretation of the output.
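The structure of such a formula can be explored in base R without any data. In the sketch below, Y, T, X1, X2, and X3 are placeholder names rather than variables of a real data set:

```r
# Generic layout: outcome ~ treatment | baseline characteristics
f <- Y ~ T | X1 + X2 + X3

all.vars(f)          # all variables named in the formula, from left to right

rhs <- f[[3]]        # right-hand side, i.e. the call `|`(T, X1 + X2 + X3)
all.vars(rhs[[2]])   # first part: the treatment variable
all.vars(rhs[[3]])   # second part: the baseline characteristics
```

Because the symbol | has lower precedence than +, everything after it is parsed as one block of candidate splitting variables.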
For our example data, we create two formulas, one for each outcome variable. For change in depression, the formula is specified as follows:
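One possible way to write this formula is sketched below. Rather than typing the names of all baseline characteristics, the sketch collects them as every column that is neither an outcome measurement nor the treatment variable. This assumes the four outcome columns are named physt1, physt3, cesdt1, and cesdt3, following the naming scheme described above. The object names chars and formula1 are illustrative.

```r
library(quint)
data(bcrp)

# Baseline characteristics: all columns except the outcome measurements
# and the treatment variable (assumed column names, see text above)
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))

# Change in depression (baseline minus posttest), treatment cond, and the
# characteristics plus baseline depression as candidate splitting variables
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)
formula1
```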
In the above formula, the expression I(cesdt1 - cesdt3) is used to calculate the change score. The posttest depression score (cesdt3) is subtracted from the baseline (cesdt1) to ensure that a higher score indicates a better treatment outcome, that is, a larger improvement in depression. Furthermore, the nine patient characteristics are listed as candidate splitting variables, in addition to the baseline measurement of the outcome variable.
For change in physical functioning, the formula is specified as follows:
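A sketch of the second formula, under two assumptions: the physical functioning columns are named physt1 and physt3, and the same ten candidate splitting variables are used as for change in depression (the text does not list them for this formula). The object names are illustrative.

```r
library(quint)
data(bcrp)

# Baseline characteristics: all columns except outcomes and treatment
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))

# Change in physical functioning (posttest minus baseline)
formula2 <- as.formula(
  paste("I(physt3 - physt1) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)
formula2
```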
In the above formula, the baseline score is subtracted from the posttest score to ensure that a higher score indicates a larger improvement in physical functioning.
First analyses with default values of parameters
We now start with the first analyses using the main function (called quint) of the package with default values for the tuning parameters. In the next section, an overview of the tuning parameters is given, and it will be shown how (and why) default values can be changed. Just before running the code, we fix the seed to be able to replicate the results of the bootstrapping. During the analysis, screen output is generated automatically to enable the user to follow the process.
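A self-contained sketch of this first run. The seed value is arbitrary and the object names are illustrative; with another seed, or another version of the package, the bootstrap results may differ from those discussed below.

```r
library(quint)
data(bcrp)

# Two-arm data set and formula for change in depression (assumed column names)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)

set.seed(2)                                  # fix the seed of the bootstrap
quint1 <- quint(formula1, data = bcrp2arm)   # default tuning parameters
```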
The first two lines of this output explain the relation between the categories of the treatment variable T used in the analysis, and the categories of the treatment variable in the data set (in our case variable cond). The third line shows the number of patients that are used in the analysis; these are the patients without missing values on any of the variables included in the formula. Thus, in our example data, 148 out of the total of 168 patients who received nutrition or education therapy have no missing values on the outcome and baseline variables included in formula1. The end of the output shows the reason why the splitting process stopped. In this case, no split 7 could be found that implied a higher value of C. For this analysis, the output also gives two warning messages of which one is displayed above. It refers to difficulties in the bootstrap procedure that suggest instability of the tree after split 5. The result of the analysis is an object of class quint, from which the fit, split and leaf information can be obtained using the summary function:
The first line of this output concerns the type of partitioning criterion C. In this case, the default criterion was used, namely the Effect size criterion. The fit information of the full tree displays per split the apparent value of C, the bias-corrected value of C (which resulted from the bootstrap procedure), and the corresponding standard error (se). Note that the apparent value of C increases with an increasing number of splits (this is always true), whereas the bias-corrected value of C reaches its maximum at four splits, and then decreases.
The split information shows in the first two columns the node numbers of the parent nodes that were split and those of the resulting child nodes. The node numbering is the same as the one commonly used (e.g., in Breiman et al., 1984). In the third and fourth column, the splitting variable and corresponding split point are displayed per split.
The leaf information contains standard descriptive statistics (group size, mean outcome, and standard deviation [SD]) for each treatment group per leaf of the full tree (i.e., the tree after six splits). In addition, the effect size d (i.e., the standardized mean difference of T=1 minus T=2), its standard error (se), and the class assignment are displayed. When instead of the Effect size criterion, the Difference in means criterion is used, the same leaf information is given. In this example, the first leaf consists of 11 women from the nutrition condition (T=1), with a mean improvement in depression of 1.00 (SD = 3.10) and seven women from the education condition (T=2), with a mean improvement of 3.71 (SD = 4.75). The corresponding effect size d equals -0.71 (se = 0.54), and the leaf is assigned to $\wp_2$, indicating that for these women education therapy outperforms nutrition therapy.
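The reported effect size of the first leaf can be checked by hand. The sketch below assumes that d is the difference in means divided by the pooled standard deviation of the two treatment groups. Under that assumption, the reported value of -0.71 is reproduced from the descriptive statistics given above.

```r
# Descriptive statistics of the first leaf, as reported in the text
n1 <- 11; m1 <- 1.00; sd1 <- 3.10   # nutrition (T = 1)
n2 <- 7;  m2 <- 3.71; sd2 <- 4.75   # education (T = 2)

# Pooled standard deviation (assumed definition)
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Standardized mean difference of T = 1 minus T = 2
d <- (m1 - m2) / sd_pooled
round(d, 2)
```

The negative sign reflects the larger improvement in the education condition.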
The fit information (fi), split information (si), and leaf information (li) are stored in three matrices that can also be inspected separately:
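A sketch of both ways of inspecting the result, repeating the setup so that it runs on its own (object names, column names, and seed are the same assumptions as before):

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)
set.seed(2)
quint1 <- quint(formula1, data = bcrp2arm)

summary(quint1)   # fit, split, and leaf information in one overview

quint1$fi         # fit information
quint1$si         # split information
quint1$li         # leaf information
```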
As explained before, the full tree may be too large and needs to be pruned back to avoid overfitting. The best tree is selected automatically by the function prune.quint. The input of this function is the object of the full tree (i.e., quint1).
The resulting pruned tree with four splits (i.e., five leaves) is an object of class quint, from which fit, split, and leaf information can be obtained using the summary function, and it can be visualized by plot.quint:
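A sketch of the pruning step followed by the summary and the plot. It assumes that prune.quint is reached through the generic call prune(), and it reuses the illustrative object names, column names, and seed from before. The size of the pruned tree may differ from the one discussed here if the seed or package version differs.

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)
set.seed(2)
quint1 <- quint(formula1, data = bcrp2arm)

quint1pr <- prune(quint1)   # dispatches to prune.quint; selects the best tree
summary(quint1pr)           # fit, split, and leaf information of the pruned tree
plot(quint1pr)              # dispatches to plot.quint
```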
The plot of the pruned tree is displayed in Fig. 1. The inner nodes of the tree contain the labels of the splitting variables, and next to the branches the split points are shown. In the leaves of the tree, the effect sizes d are displayed by black dots, along with a 95 % confidence interval.
For the interpretation of the pruned tree, we inspect the assignment of the leaves to the partition classes and the paths of the tree leading to the leaves. Figure 1 shows that for one group of women (Leaf 2, green) the nutrition intervention outperforms the education intervention. In particular, the nutrition intervention resulted in a larger improvement in depression for those women with a lower level of dispositional optimism, a higher level of negative social interaction, and the least extensive form of primary treatment (i.e., lumpectomy without or with only one form of adjuvant therapy). In contrast, for two groups of women (Leaves 1 and 4, red), the education intervention outperforms the nutrition intervention; one of these groups of women reported a lower level of dispositional optimism and a lower level of negative social interaction, whereas the other group reported a medium level of dispositional optimism. For the remaining types of women (Leaves 3 and 5, grey) both interventions resulted in about the same improvement in depression. To learn more about the exact levels of improvement and the effect sizes, we inspect the leaf information of the pruned tree, rounded to two decimals:
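A sketch of this inspection. To stay on the safe side, only the numeric columns of the leaf information are rounded; the setup repeats the earlier assumptions (object names, column names, seed).

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)
set.seed(2)
quint1pr <- prune(quint(formula1, data = bcrp2arm))

# Leaf information of the pruned tree, numeric columns rounded to two decimals
li <- as.data.frame(quint1pr$li)
is_num <- vapply(li, is.numeric, logical(1))
li[is_num] <- round(li[is_num], digits = 2)
li
```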
Also, for the second outcome variable, Improvement in physical functioning, a quint analysis with default values of the tuning parameters was performed:
Because the result of this analysis was a tree with just two leaves, there was no need for pruning, and we continued by just inspecting the leaf information and the plot of the tree:
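A sketch of the analysis for the second outcome and the subsequent inspection. The seed is arbitrary, and the formula rests on the assumptions stated earlier (column names physt1 and physt3, same candidate splitting variables as for depression).

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula2 <- as.formula(
  paste("I(physt3 - physt1) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)

set.seed(3)
quint2 <- quint(formula2, data = bcrp2arm)   # default tuning parameters

quint2$li      # leaf information
plot(quint2)   # tree plot
```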
The resulting plot (see Fig. 2) shows that for women with four or fewer comorbidities (Leaf 1, the red one assigned to $\wp_2$) the education intervention was better than the nutrition intervention. The leaf information shows that in this leaf the mean improvement was 3.28 for the nutrition intervention and 6.88 for the education intervention. This latter value was larger, but can be considered as a small improvement from a clinical viewpoint, taking into account the guidelines from Wyrwich et al. (2005).
For women with more than four comorbidities (Leaf 2, the green one assigned to $\wp_1$), the leaf information shows that the nutrition intervention resulted in a larger improvement in physical functioning (i.e., 4.33) than the education intervention (1.53). Yet, 4.33 can also be considered as a small change from a clinical viewpoint.
Second analyses with modified values for the tuning parameters
Several values of the tuning parameters used in a quint analysis can be adapted by the user. Table 1 gives an overview of all parameters involved, subdivided in those concerning the partitioning criterion, the stopping criterion, the boundary conditions, and the bootstrap procedure. In this section, we will describe how to change the parameters, and the considerations associated with these changes.
With regard to the partitioning criterion, a first parameter concerns the type of partitioning criterion, that is, the Effect size criterion (which is the default as mentioned before) or the Difference in means criterion. For this choice, one possible consideration concerns the measurement scale of the outcome variable: If the outcome is measured on a scale with values that do not have a well-specified meaning (such as, Improvement in depression), the Effect size criterion may be preferred. In contrast, if a scale is used with values that bear a well-defined clinical interpretation (such as, Improvement in physical functioning), the Difference in means criterion is to be preferred. Another consideration pertains to whether and how one is willing to take into account subgroup heterogeneity. If one wants to identify subgroups that are homogeneous in treatment effect, then the Effect size criterion is to be preferred (note that an effect size of the same difference in means is larger when the pooled standard deviation of the treatment groups is smaller); if, in contrast, the only research concern is to identify subgroups with a mean difference in treatment outcome that is as large as possible, then the Difference in means criterion is to be preferred. A final consideration pertains to the robustness of the results. Baguley (2009) showed that the raw difference in means is more robust than the standardized effect size.
A second parameter concerns the weights of the two components of the partitioning criterion, the Difference in treatment outcome and the Cardinality component, that is, $w_1$ and $w_2$ (see also formula 6 in Dusseldorp & Van Mechelen, 2014). As mentioned before (see section Goal of QUINT), the Cardinality component concerns the sample sizes of the leaves assigned to $\wp_1$ and $\wp_2$. The default weights are chosen in such a way that the two components are weighted equally with the maximum possible value for each component being 2. The default value of $w_1$ depends on the criterion that is used: if this is the Difference in means criterion, the default value of $w_1$ is put equal to $1/\log(1+\mathrm{IQR}(Y))$, where IQR denotes the interquartile range (which can be considered as a plausible maximum value for the difference in means). If the Effect size criterion is used, the default value of $w_1$ is put equal to $1/\log(1+3)$, with 3 being considered as a plausible maximum value of the effect size. In a specific research field this value may be typically lower (e.g., 2), and the weight can be changed accordingly (see Table 1 for an example).
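The arithmetic behind these default weights can be illustrated in base R. The sketch assumes that the default for the Difference in means criterion has the form 1/log(1 + IQR(Y)), in parallel with the Effect size case. The outcome vector y is made up for the illustration.

```r
# Effect size criterion: 3 as plausible maximum effect size
w1_es <- 1 / log(1 + 3)

# Same criterion with a lower plausible maximum of 2
w1_es_max2 <- 1 / log(1 + 2)

# Difference in means criterion: interquartile range of the outcome
y <- c(-4, -1, 0, 2, 3, 5, 8, 12)   # hypothetical outcome scores
w1_dm <- 1 / log(1 + IQR(y))

round(c(es = w1_es, es_max2 = w1_es_max2, dm = w1_dm), 3)
```

A lower plausible maximum gives a larger weight, so that the Difference in treatment outcome component keeps the same maximum contribution.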
We change the values of the tuning parameters, using the function quint.control. For example, if we want to use the Difference in means criterion for Improvement in physical functioning, we first make a new control object, and then we use this control object in the analysis:
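A sketch of these two steps, assuming that the criterion is selected through the argument crit of quint.control, with the value "dm" for the Difference in means criterion, and that the control object is passed through the argument control of quint. Object names, column names, and seed are illustrative.

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula2 <- as.formula(
  paste("I(physt3 - physt1) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)

# New control object with the Difference in means criterion
control3 <- quint.control(crit = "dm")

set.seed(3)
quint3 <- quint(formula2, data = bcrp2arm, control = control3)
quint3$li
```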
For this example, the resulting tree is the same as the tree grown with the Effect size criterion. In our experience this is often the case, but subtle differences may occur.
With regard to the stopping criterion, the maximum number of leaves of the tree can be changed. This enables the user to stop the tree algorithm before the maximum value of the partitioning criterion C is reached, for example, to inspect a tree of a certain fixed size (e.g., two leaves). The default value is set at ten leaves (which usually suffices in practice because the maximum value of C is reached earlier). This value can be changed into, for example, two leaves using the following commands:
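A sketch of these commands, assuming that the maximum number of leaves is set through the argument maxl of quint.control (object names, column names, and seed are illustrative):

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)

# Stop the algorithm as soon as the tree has two leaves
control4 <- quint.control(maxl = 2)

set.seed(2)
quint4 <- quint(formula1, data = bcrp2arm, control = control4)
quint4$li
```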
With regard to the boundary conditions, a first tuning parameter concerns the critical minimum value of the absolute effect size in each leaf ($d_{\min}$) that is checked by the algorithm after the first split (i.e., the qualitative interaction condition). The results of an extensive simulation study (Dusseldorp & Van Mechelen, 2014) showed that a good balance between type I error and type II error is obtained for $d_{\min} = 0.30$ and N = 400. Therefore, the default value of $d_{\min}$ equals 0.30. For smaller sample sizes, a higher value of $d_{\min}$ is recommended to control for the risk of finding spurious interactions. In our example with a sample size of N = 148, it may be advisable to increase the value to 0.40. For Improvement in depression, this change will not influence the result, because the effect sizes in the two leaves after the first split (see output above) are both greater than 0.40. However, if we change $d_{\min}$ to 0.40 for Improvement in physical functioning, we obtain the following result:
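A sketch of this analysis, assuming that the critical minimum effect size is set through the argument dmin of quint.control. The call is wrapped in try() so that the script continues whether the violated condition is reported as a message or as an error. Object names, column names, and seed are illustrative.

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula2 <- as.formula(
  paste("I(physt3 - physt1) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)

# Stricter qualitative interaction condition: |d| of at least 0.40 in each leaf
control5 <- quint.control(dmin = 0.40)

set.seed(3)
quint5 <- try(quint(formula2, data = bcrp2arm, control = control5))
```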
The error message shows that the qualitative interaction condition (as explained at length in the section on the QUINT algorithm) is violated, and, as a consequence, no tree is grown. This result suggests that the interaction we found earlier for Improvement in physical functioning using the default values, may be a spurious one.
The remaining tuning parameters associated with the boundary conditions concern the minimal sample size per leaf in treatment condition T=1 (a1) and in T=2 (a2). The default values have been set at 10 % of the treatment group sample sizes. However, the user is free to choose any value as minimum treatment sample size. When, on the one hand, treatment sample sizes are relatively small, 10 % of them may not allow the mean outcome in a treatment group to be estimated with sufficient confidence. When, on the other hand, treatment sample sizes are large (i.e., 500 or more), we recommend choosing a lower value than the default to prevent the tree algorithm from stopping (too) early (see Table 1 for an example).
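A sketch of how these minima could be set, assuming that a1 and a2 are also the argument names in quint.control. The value 15 is a hypothetical choice for illustration, not a recommendation.

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))

# Treatment group sizes, as a reference for choosing the minima
n_group <- table(bcrp2arm$cond)
n_group
round(0.10 * n_group)   # 10 % of each treatment group size

# Hypothetical minimum of 15 patients per treatment condition in each leaf
control6 <- quint.control(a1 = 15, a2 = 15)
```

The control object is then passed to quint through its control argument, as for the other tuning parameters.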
With regard to the bootstrap procedure, a first tuning parameter determines whether or not this procedure is performed. If bootstrapping is not performed, the computation time of quint is much shorter, but no information is obtained on the amount of overfitting. A second tuning parameter concerns the number of bootstrap samples, with a higher number (e.g., B=200) leading to more stable results. The default value has been set to B=25 (i.e., the minimum value recommended by LeBlanc & Crowley, 1993).
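A sketch of both settings, assuming that the corresponding arguments of quint.control are named Bootstrap and B. Only the analysis without bootstrapping is run here, because 200 bootstrap samples take considerably longer. Object names and column names are illustrative.

```r
library(quint)
data(bcrp)
bcrp2arm <- subset(bcrp, cond %in% c(1, 2))
chars <- setdiff(names(bcrp), c("physt1", "physt3", "cesdt1", "cesdt3", "cond"))
formula1 <- as.formula(
  paste("I(cesdt1 - cesdt3) ~ cond |", paste(c(chars, "cesdt1"), collapse = " + "))
)

# Without bootstrapping: fast, but no bias-corrected values of C
control7 <- quint.control(Bootstrap = FALSE)
quint7 <- quint(formula1, data = bcrp2arm, control = control7)

# With 200 bootstrap samples: slower, more stable bias correction
control8 <- quint.control(B = 200)
```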
We proposed a new R package quint that can be used to study the important clinical problem of differential treatment efficacy. When many client characteristics (or other baseline characteristics) have been measured that may moderate treatment outcome, the problem of subgroup identification is a very difficult one with a high risk of type I and type II errors. In such a situation, the package quint can be most useful through its versatile way of searching for subgroups and its procedures that control for inferential errors and overfitting. The quint analysis focuses especially on the identification of subgroups that are involved in so-called qualitative treatment-subgroup interactions. This type of interaction implies that for some subgroup of clients, one treatment alternative outperforms another while for another subgroup the reverse holds, and is therefore of utmost importance for personalized treatment assignment. It should be noted that a quint analysis does not aim at identifying quantitative interactions. If data contain no qualitative interactions, no tree will be grown by quint. In this paper, we demonstrated the functions of the package using data from the Breast Cancer Recovery Project, and highlighted possibilities to direct the analysis on the basis of theoretical and practical considerations.
The R package quint can be used for data from a randomized controlled trial. In this paper, we focused on a clinical trial involving cancer patients, but the method is applicable to controlled experiments in any setting, such as randomized experiments in which two interventions, training programs, or any other type of experimental manipulations are compared (e.g., Taylor, Davis, & Maxwell, 2001), including controlled web-based experiments (so-called A/B tests) in marketing research (Kohavi, Longbotham, Sommerfield, & Henne, 2009). The most important features of the data are that the persons are randomly assigned to two conditions (A and B) and that the person characteristics are measured before the treatment is received (unless it is very unlikely that the treatment has altered the characteristic, e.g., gender or age in years). Also, a total sample size of at least 400 is recommended, based on results from a simulation study (Dusseldorp & Van Mechelen, 2014), to allow for the study of more complex treatment-subgroup interactions.
The core idea of random assignment of clients to treatment groups is that the clients only differ with respect to the treatment variable. This implies that the client characteristics are not associated with the treatment variable, and it allows the observed differences in the (sub)groups to be attributed to the differences in treatment. However, this does not imply that the result of a subgroup analysis, such as the tree found by quint, is always generalizable to the full client population. In some cases, indeed, the distribution of some characteristics in the study sample may not be the same as that in the population. For example, our sample might consist of 20 % male clients, while the population to which we want to generalize consists of 50 % male clients. One possible solution to take this imbalance into account is to incorporate weighting in the analysis by quint. A vector of weights could easily be implemented for the Difference in means criterion of quint. For the Effect size criterion, this is more difficult, due to the estimation of a pooled standard deviation.
The current implementation of quint has several limitations: (a) weighting of clients according to some known population distribution is not possible; (b) clients with one or more missing values on any of the variables are omitted from the analysis (so-called listwise deletion); (c) the outcome variable should be numeric, and (d) categorical baseline characteristics involving more than two categories cannot be handled by the software. Currently, we are working on a new version of the R package that can deal with categorical baseline characteristics involving more than two categories.
Because QUINT is a post hoc method, it is recommended for clinical practice to check whether the results of QUINT can be replicated in a new randomized controlled trial. Ideally, for the sampling of the participants in this new trial a stratified sampling scheme should be used with stratification on the patterns of moderator variables identified by QUINT.