Regression tree construction by bootstrap: Model search for DRG-systems applied to Austrian health data
Abstract
Background
DRG-systems are used to allocate resources fairly to hospitals based on their performance. Statistically, this allocation is based on simple rules that can be modeled with regression trees. However, the resulting models often have to be adjusted manually to be medically reasonable and ethical.
Methods
Instead of relying on manual adaptations of the original model, which degrade its performance, alternative trees are searched for systematically. The bootstrap-based method bumping is used to build diverse and accurate regression tree models for DRG-systems. A two-step model selection approach is proposed. First, a reasonable model complexity is chosen, based on statistical, medical and economical considerations. Second, a medically meaningful and accurate model is selected. An analysis of 8 datasets from Austrian DRG-data is conducted and evaluated with respect to the possibility of producing diverse and accurate models for predefined tree complexities.
Results
The best bootstrap-based trees offer increased predictive accuracy compared to the trees built by the CART algorithm. The analysis demonstrates that even for very small tree sizes, diverse models can be constructed that are equally or even more accurate than the single model built by the standard CART algorithm.
Conclusions
Bumping is a powerful tool to construct diverse and accurate regression trees, to be used as candidate models for DRG-systems. Furthermore, bumping and the proposed model selection approach are also applicable to other medical decision and prognosis tasks.
Keywords
Regression Tree, Internal Node, Tree Size, Terminal Node, Regression Tree Model
Background
The aim of diagnosis related group (DRG) systems is to classify hospital patients into clinically meaningful and comprehensible groups that consume similar hospital resources, usually measured by their length of stay (LOS). These homogeneous patient groups are described by simple rules, often including the patients' diagnoses, procedures, sex and age. The aim of DRG is to use these parameters as an estimate for the resource consumption of the hospital's individual patients. Among other purposes, e.g. to monitor quality of care and utilization of services, one of their most important applications is a fair, performance-based allocation of available resources among hospitals.
Similar to the British Healthcare Resource Groups (HRG) [1] system and the Canadian Case Mix Groups (CMG) [2] system, the Austrian DRG-system [3] is based on conjunctive rules only. No disjunctions are used, in contrast to other DRG-systems such as the Australian AR-DRG [4, Chapter H.3] and the German G-DRG [5] system. A major advantage of using only conjunctive rules is the possibility to interpret them as a tree structure, which gives a compact, intuitively interpretable representation of the statistical model. In principle, these rules can be created by regression tree methods; the resulting trees, however, often have to be readjusted according to medical knowledge. Unfortunately, this manual adjustment usually decreases predictive accuracy.
Instead of manually adapting the original tree, alternative models can be searched for more systematically. One possibility for such an approach arises from an important characteristic of regression trees: their solutions are unstable. Minor changes in the data can thus result in completely different trees, yet all of these trees can be statistically accurate. Through systematic resampling of the data by bootstrapping, a wider range of trees can be constructed. In this work, bumping [6], a bootstrap-based method proposed by Tibshirani and Knight, is used.
In this article, we show that bumping allows us to build diverse and more accurate trees compared to the tree constructed by the currently used Classification and Regression Trees (CART) algorithm [7], while being equally or less complex. As shown in the Results section, the statistically most accurate trees are too complex for the DRG-application. We propose to select the final models in a two-step approach from preprocessed models. In a first step, the tree size is chosen based on the models' accuracies as well as economical and medical considerations. These considerations require a lot of domain knowledge and are very difficult to express numerically. Therefore, the final tree size cannot be selected based on statistics alone, but has to be chosen manually. In a second step, given the pre-specified tree size, an accurate and medically reasonable model can be selected. In this way, statistically suboptimal, manual alterations of models are minimized.
The Austrian DRG-System
Since 1997, Austrian hospital financing has been based on an activity-based system called Leistungsorientierte Krankenhausfinanzierung (LKF). The aim was to replace the previously used per-diem-based payment scheme by a case-based one, with the following main objectives [8]:

Consolidate rapidly increasing costs by reducing the LOS

Reduce costs by substituting inpatient care through ambulatory care

Make the hospital system more efficient

Increase the transparency of costs and services

Improve data quality

Maintain the quality of medical services

Ensure modern scientific methods in medical care
For the construction of the current LDF model the CART algorithm, a predictive tree model for regression and classification problems, was used. A main advantage of regression tree models is that they can be interpreted as simple rules without requiring any knowledge about the algorithm itself. This is particularly important as the final model is not only based on statistics, but its medical suitability also has to be evaluated by domain experts. For hospital management and budgeting these simple rules provide transparent information.
Methods
Regression Trees
The aim of regression tree analysis is to explain a continuous response variable Y by a vector of n predictor variables X = (X_{1}, X_{2}, ..., X_{n}), which can be an arbitrary mix of continuous, ordinal and nominal variables. The CART algorithm recursively splits the data into two groups based on a splitting rule. The partitioning is intended to increase the homogeneity of the two resulting subsets, or nodes, with respect to the response variable. The partitioning stops when no splitting rule can improve the homogeneity of the nodes significantly. The procedure consists of the following steps:
1. Examine every allowable split on each predictor variable. Commonly, the binary splits are defined as X_{i} < c for continuous variables and as X_{i} ∈ C for categorical variables, where C is a subset of the finite set of categories b_{1}, b_{2}, ..., b_{m}.
2. Select and execute the split that minimizes the impurity measure in the nodes. Samples that fulfill the criterion of the binary split propagate down into the left descendant node and the other samples into the right node. In our analysis we used the least squares cost function, which is computationally efficient and the standard choice in implementations of the CART algorithm.
3. Recursively repeat steps 1 and 2 on the descendant nodes until the homogeneity of the nodes cannot be improved significantly. Often, additional stopping criteria are defined, e.g. minimum sample sizes in the terminal nodes.
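Steps 1 and 2 can be illustrated with a minimal sketch for continuous predictors. It is not the implementation used in this study: the function names, the `min_leaf` parameter, the choice of midpoints as candidate thresholds and the toy data are assumptions made for illustration only.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (least squares impurity)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, y, min_leaf=1):
    """Return (impurity, feature index, threshold c) of the best split X_i < c, or None."""
    best = None
    for i in range(len(rows[0])):
        # Assumed candidate thresholds: midpoints between consecutive distinct values.
        values = sorted(set(r[i] for r in rows))
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2
            left = [t for r, t in zip(rows, y) if r[i] < c]
            right = [t for r, t in zip(rows, y) if r[i] >= c]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, i, c)
    return best

# Invented toy data: feature 0 = age, feature 1 = number of procedures, y = LOS in days.
rows = [(30, 1), (35, 2), (40, 1), (70, 2), (75, 1), (80, 2)]
los = [3.0, 4.0, 3.0, 10.0, 11.0, 12.0]
cost, feature, threshold = best_split(rows, los)
print(feature, threshold)  # age (feature 0) is split between 40 and 70
```

Step 3 would call `best_split` again on the two resulting subsets; categorical predictors would need a search over category subsets instead of thresholds.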
Trees constructed in the described fashion tend to grow too big and have too few observations in the terminal nodes. In order to overcome this problem the trees are recursively pruned back to smaller size. In the DRG application we iteratively pruned back the internal node which led to the smallest degeneration in accuracy, until only one internal node remained. From there all tree sizes are evaluated separately.
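The iterative pruning can be sketched as follows. This is only an illustration under stated assumptions: trees are represented as nested dictionaries in which every node stores the squared error `sse` it would have as a terminal node, only internal nodes with two terminal children are considered for pruning (so that the tree shrinks by exactly one internal node per step), and the error values of the toy tree are invented.

```python
def leaf(sse):
    return {"sse": sse}

def node(sse, left, right):
    return {"sse": sse, "left": left, "right": right}

def is_internal(n):
    return "left" in n

def internal_nodes(n):
    """Yield all internal nodes of the (sub)tree rooted at n."""
    if is_internal(n):
        yield n
        yield from internal_nodes(n["left"])
        yield from internal_nodes(n["right"])

def tree_sse(n):
    """Training error of the tree: sum of the errors of its terminal nodes."""
    if not is_internal(n):
        return n["sse"]
    return tree_sse(n["left"]) + tree_sse(n["right"])

def prune_sequence(root):
    """Yield (number of internal nodes, training SSE) after each pruning step.

    Note: the tree is modified in place.
    """
    yield sum(1 for _ in internal_nodes(root)), tree_sse(root)
    while sum(1 for _ in internal_nodes(root)) > 1:
        # Candidates: internal nodes whose two children are both terminal nodes.
        candidates = [n for n in internal_nodes(root)
                      if not is_internal(n["left"]) and not is_internal(n["right"])]
        # Collapse the node whose removal increases the error the least.
        weakest = min(candidates, key=lambda n: n["sse"] - tree_sse(n))
        del weakest["left"], weakest["right"]
        yield sum(1 for _ in internal_nodes(root)), tree_sse(root)

# Invented toy tree with three internal nodes.
root = node(100.0,
            node(30.0, leaf(5.0), leaf(5.0)),
            node(40.0, leaf(10.0), leaf(20.0)))
sequence = list(prune_sequence(root))
print(sequence)
```

Each entry of `sequence` corresponds to one tree size, which is what allows all tree sizes to be evaluated separately afterwards.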
Suitable tree sizes can be assessed with the cost-complexity criterion

$$R_{\alpha}(T) = R(T) + \alpha\,|\tilde{T}|$$

where R(T) is the Mean Squared Error (MSE) and $|\tilde{T}|$ is the number of terminal nodes, i.e. the number of internal nodes plus one, of model T. α is a non-negative constant which regulates the additional cost for more complex trees.
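As a small worked example, the sketch below assumes the usual cost-complexity form, R(T) plus α times the number of terminal nodes, and shows how the preferred tree size shifts with α. The pruning sequence and its MSE values are invented; the function names are hypothetical.

```python
def cost_complexity(mse, n_terminal, alpha):
    """Penalized error: MSE plus alpha times the number of terminal nodes."""
    return mse + alpha * n_terminal

def select_tree_size(candidates, alpha):
    """candidates: list of (number of terminal nodes, MSE) pairs, one per pruned tree."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[0], alpha))

# Invented pruning sequence: the error decreases more and more slowly with tree size.
pruning_sequence = [(2, 40.0), (3, 34.0), (4, 31.0), (5, 30.0), (6, 29.8)]

small_alpha = select_tree_size(pruning_sequence, alpha=0.5)
large_alpha = select_tree_size(pruning_sequence, alpha=5.0)
print(small_alpha, large_alpha)  # a larger alpha favors the smaller tree
```

In the DRG application the final size is not fixed by such a criterion alone, since medical and economical considerations also enter the choice.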
Requirements and Review of Alternative Tree Methods
There are many alternative regression tree algorithms, mainly differing in their tree structure, splitting criteria, pruning method and handling of missing values. In addition, quite a lot of hybrid algorithms have been proposed, e.g. Quinlan's M5 algorithm [9], which fits a linear regression model in each of the leaves to improve accuracy. Ensembles of trees [10] have become commonly used, but they are less easy to interpret as the resulting model consists of more than one tree. Moreover, regression trees with soft splits [11] and methods to combine multiple trees into a single tree [12] were introduced. Both methods provide more accurate trees which, however, do not offer a distinct split point. For the DRG application, however, low complexity, interpretability and a simple tree structure are, apart from the model's accuracy, the most desirable properties.
The CART algorithm is a greedy algorithm which builds trees in a forward stepwise search. Therefore, its results are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. By perturbing the data, bumping still identifies each tree in a greedy manner, but some of the resulting models may be closer to a global optimum or to a better local optimum. Besides bumping, there are two other common groups of algorithms that search for more globally optimal trees and fulfill our requirements of simplicity and interpretability. They are discussed in the following.
The first approach is to build trees in a more globally optimal way. This can be done by calculating the effects of the choice of the attribute deeper down in the tree, which in principle can be accomplished by an exhaustive search [13]. However, this is computationally intractable for larger datasets. As a consequence, the search space is usually limited by heuristics. According to previous studies, lookahead procedures are not always beneficial over greedy strategies and have been criticized [14, 15]. In contrast, several authors [16, 17, 18] reported a significant improvement in tree quality. Murthy and Salzberg [14] conclude that limited lookahead search on average produces shallower trees with the same classification accuracy. In some cases the trees from the lookahead procedures are even both less accurate and larger than the trees produced by a greedy strategy. Quinlan and Cameron-Jones [15] argue that these rather unpromising results are due to oversearching the hypothesis space, resulting in an overfit of the training data.
Shi and Lyons-Weiler [19] presented the Clinical Decision Modeling System (CDMS), which allows searching for random classification trees that fulfill user-specified constraints about model complexity and accuracy. Similar to our approach, they follow the idea of constructing a set of models first and leave the selection of a clinically meaningful tree to the user of their software.
The second group of algorithms builds the tree in a greedy manner first and improves the tree structure later by the use of optimization methods, e.g. evolutionary algorithms [20], Bayesian CART [21, 22], simulated annealing [23] and tabu search [24].
Evolutionary algorithms are a family of algorithms that use stochastic optimization based on concepts of natural Darwinian evolution. For tree algorithms genetic operations can be applied to modify the tree structure and the tests that are applied in the internal nodes. Based on these operations new populations of trees are explored iteratively. The newly generated population is then assessed by a fitness function, which evaluates the quality of an individual within one population. Individual trees that are assessed to have a high fitness are more likely to be used in the next round, whereas the other models are rejected.
Kalles' [25] classification tree algorithm uses a fitness function that takes the two quality attributes of misclassification rate and tree size into account. A survey of fitness approximations is given in [26]. An evolutionary approach that is applicable to classification and regression trees is presented in [20].
Bayesian CART [21, 22] algorithms aim to stochastically optimize pre-specified CART trees in an approximate Bayesian way. The space of all possible trees is explored by Monte Carlo methods, which give an approximation to a probability distribution over the space of all possible trees. Modification of the tree structure is conducted by employing different move types, including grow and prune steps, as well as a change step which changes the split at an internal node. In contrast to evolutionary algorithms, Bayesian CART is not population-oriented, but only modifies one tree at a time.
Simulated annealing [23] is a stochastic search method that is inspired by the annealing of metals. An initial solution is modified by perturbations and controlled by an evaluation function. Uphill moves, i.e., changes to a worse solution, are accepted depending on the degree of badness and a parameter called Temperature (T). When T is high the search is almost random, while at a lower temperature the updates are greedier. During the iteration T is slowly decreased and the time spent at a specific temperature is increased. The basic idea of simulated annealing is to avoid getting stuck in a local minimum too early while T is high, and to find the locally optimal solution when T is low.
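The role of the temperature can be made concrete with a generic sketch. It assumes the Metropolis acceptance rule exp(-Δ/T) and a geometric cooling schedule with one move per temperature, which are common choices but are not taken from [23]; the toy problem (minimizing a quadratic over the integers) merely stands in for a search over tree modifications.

```python
import math
import random

def acceptance_probability(delta, temperature):
    """Probability of accepting a move that changes the cost by delta."""
    if delta <= 0:          # improvements are always accepted
        return 1.0
    return math.exp(-delta / temperature)  # worse moves: less likely when T is low

def anneal(initial, neighbor, cost, t_start=10.0, t_end=0.01, cooling=0.95, seed=0):
    """Generic simulated annealing loop; returns the best solution visited."""
    rng = random.Random(seed)
    current = best = initial
    t = t_start
    while t > t_end:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        if rng.random() < acceptance_probability(delta, t):
            current = candidate
        if cost(current) < cost(best):
            best = current
        t *= cooling        # assumed geometric cooling schedule
    return best

# Toy problem: find the integer minimizing (x - 3)^2, starting far away.
def toy_cost(x):
    return (x - 3) ** 2

def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))

result = anneal(20, toy_neighbor, toy_cost)
print(result, toy_cost(result))
```

For a tree search, `neighbor` would propose a modified tree (e.g. a changed split) and `cost` would be its prediction error.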
Starting from an initial tree model, tabu search [24] iteratively conducts several neighborhood moves, i.e., modifications of the tree, and selects the move with the best solution among all candidate moves for the current iteration. A set of admissible solutions is stored in a so-called candidate list. The size of the candidate list determines the tradeoff between time and performance. Reversal moves are avoided by making selected attributes of moves tabu, i.e., forbidden. Tabu search allows searching for solutions beyond a local optimum while still making the best possible move at each iteration.
Model Search by Bootstrap
Bootstrap methods are most commonly based on the idea of combining and averaging models to reduce prediction error. Examples of such methods include Bagging [27], Boosting [28] and Random Forests [10]. The basic idea behind Bagging and Random Forests is to reduce variance by averaging a number of B models, created on the basis of B different datasets. In contrast, Boosting reduces the overall training error by recursively fitting models to the residuals of the previously constructed regression tree. Although these methods can improve the accuracy and the variability of the results significantly, the final model itself loses its interpretability and the influence of the predictor variables becomes unclear.
In contrast to other bootstrap methods, the result of bumping is not an ensemble of trees but only single trees, which are built on different bootstrap samples. The bootstrap samples themselves are formed by random sampling with replacement from the original training data, where each bootstrap sample has the same size as the original training dataset. This procedure is repeated B times, producing B bootstrap datasets, from which, in turn, B models can be built.
Bumping was successfully applied in combination with several learning algorithms including Classification Trees, Linear Regression, Splines and parametric density estimation [6], Linear Discriminant Analysis (LDA) [29], Neural Networks [30] and Self-Organizing Maps (SOM) [31]. The procedure consists of the following steps:
1. A set of bootstrap samples z^{*1}, z^{*2}, ..., z^{*B} is drawn from the training set z.
2. A model is fit to each bootstrap sample, giving the prediction $\hat{f}^{*b}(x)$ for each bootstrap sample b = 1, 2, ..., B at input point x. By convention, the original training set z is included among the B bootstrap samples as well.
3. For each tree complexity, the best trees are selected based on their average prediction error on the original training set z.
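The three steps can be sketched in a few lines. The sketch makes several simplifying assumptions that are not part of the study: a small greedy least squares tree of fixed depth stands in for CART, a single best tree is selected instead of the best trees per tree size, and the data are synthetic with an interaction-like structure that a greedy first split handles poorly.

```python
import random
from statistics import fmean

def sse(values):
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, ys, depth):
    """Greedy least squares tree; a leaf is a float, an internal node a tuple."""
    if depth == 0 or len(set(ys)) == 1:
        return fmean(ys)
    best = None
    for i in range(len(rows[0])):
        values = sorted(set(r[i] for r in rows))
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2
            left = [k for k, r in enumerate(rows) if r[i] < c]
            right = [k for k, r in enumerate(rows) if r[i] >= c]
            cost = sse([ys[k] for k in left]) + sse([ys[k] for k in right])
            if best is None or cost < best[0]:
                best = (cost, i, c, left, right)
    if best is None:  # no predictor has two distinct values
        return fmean(ys)
    _, i, c, left, right = best
    return (i, c,
            build_tree([rows[k] for k in left], [ys[k] for k in left], depth - 1),
            build_tree([rows[k] for k in right], [ys[k] for k in right], depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):
        i, c, left, right = tree
        tree = left if row[i] < c else right
    return tree

def mse(tree, rows, ys):
    return fmean((y - predict(tree, r)) ** 2 for r, y in zip(rows, ys))

def bump(rows, ys, n_bootstrap=25, depth=2, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    # Step 1: bootstrap samples; the original training set is included by convention.
    index_sets = [list(range(n))]
    index_sets += [[rng.randrange(n) for _ in range(n)] for _ in range(n_bootstrap)]
    # Step 2: one tree per sample.
    trees = [build_tree([rows[k] for k in idx], [ys[k] for k in idx], depth)
             for idx in index_sets]
    # Step 3: every tree is scored on the ORIGINAL training set.
    errors = [mse(t, rows, ys) for t in trees]
    best = min(range(len(trees)), key=errors.__getitem__)
    return trees[best], errors[best], errors[0]

# Synthetic data: the response depends on the interaction of two predictors.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(60)]
ys = [10.0 if (a < 0.5) != (b < 0.5) else 2.0 for a, b in rows]
best_tree, best_error, original_error = bump(rows, ys)
print(best_error, original_error)
```

Because the tree grown on the original data is among the candidates, the selected tree can never have a larger training error than it; whether the gain carries over to independent data has to be checked separately.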
In the following section the evaluation of the selected trees on independent data is further discussed. Additionally, the evaluation criteria to assess the number of statistically accurate model choices are defined.
From the presented methods that allow searching for alternative tree models, only bumping and evolutionary algorithms offer a diverse set of model choices. However, in principle the other methods could be modified to store an arbitrary amount of accurate candidate trees that are created during the search process.
A particular advantage of bumping compared to other non-greedy regression tree methods is that the best models for each tree size can be constructed and selected in a computationally efficient way. With bumping, all candidate trees can simply be grown to full size first and then be pruned back iteratively, one node at a time. As a result, for each tree size the best model can be selected from the B bootstrap trees. Other algorithms that search for globally optimal candidate models would tend towards trees that are optimal for some tree complexity. These trees would either be very complex, or would at least have similar complexity for all candidate trees if the models' quality is measured by accuracy and the complexity of the tree. However, iterative pruning of these models does not necessarily result in optimal models with smaller tree size. Therefore, in order to build optimal trees for each tree size, each model complexity, determined by the number of internal nodes, would have to be handled separately.
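The number of possible tree topologies grows rapidly with the number of internal nodes. For binary trees it is given by the Catalan number

$$C_{n} = \frac{(2n)!}{(n+1)!\,n!}$$

where C_{n} is the number of topologies for trees with n internal nodes. For n = 1 to n = 6 internal nodes, the numbers of binary tree topologies are 1, 2, 5, 14, 42 and 132.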
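The quoted counts are the Catalan numbers, which can be checked with a short computation. The closed form used here, the binomial coefficient (2n choose n) divided by n + 1, is the standard one; the function name is hypothetical.

```python
from math import comb

def n_tree_topologies(n_internal):
    """Number of binary tree topologies with n_internal internal nodes (Catalan number)."""
    return comb(2 * n_internal, n_internal) // (n_internal + 1)

counts = [n_tree_topologies(n) for n in range(1, 7)]
print(counts)  # [1, 2, 5, 14, 42, 132]
```

The count only covers the shape of the tree; the choice of split variable and split point at every internal node multiplies the number of candidate models further.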
Evaluation Criteria
The performance of bumping compared to the standard CART algorithm is evaluated based on its ability to find homogeneous patient groups with similar LOS, that is, to model and predict the LOS of hospital patients, as described in the third step of the three-step classification procedure summarized in Figure 1.
Accuracy of the Best Bootstrapped Tree
In this first evaluation step we want to show that the best bootstrapped tree offers increased predictive accuracy compared to the CART algorithm. The difference in accuracy is assessed by the use of 10-fold cross-validation [34, Chapter 7]. In 10-fold cross-validation the data is first partitioned into 10 complementary subsets called folds. The model is then built on 9 folds and the remaining fold is used as a test set. This analysis is repeated 10 times, so that each of the folds is used as the test set once. Finally, the estimate of predictive accuracy is calculated from the average performance of the 10 models on their associated test sets. The evaluation on independent data is especially important as a wider search of the hypothesis space can lead to overfitting of the data [15].
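The cross-validation scheme can be sketched generically. The code below is an illustration, not the evaluation code of this study: the fold assignment by shuffling, the function names and the placeholder model (which simply predicts the training mean) are assumptions. Any `fit`/`predict` pair, such as a regression tree, could be passed in instead.

```python
import random
from statistics import fmean

def k_fold_indices(n, k=10, seed=0):
    """Partition the indices 0..n-1 into k disjoint folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=10, seed=0):
    """Average test MSE over k folds; each fold serves as the test set exactly once."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(
            fmean((ys[i] - predict(model, xs[i])) ** 2 for i in test_idx))
    return fmean(fold_errors)

# Placeholder model (assumption): predict the mean response of the training folds.
def fit_mean(xs, ys):
    return fmean(ys)

def predict_mean(model, x):
    return model

xs = list(range(50))
ys = [2.0 * x for x in xs]
folds = k_fold_indices(len(xs))
cv_error = cross_validate(xs, ys, fit_mean, predict_mean)
print(len(folds), cv_error)
```

When bumping is evaluated this way, the whole procedure, including the bootstrap sampling and the selection of the best tree, has to be repeated inside each training split so that the test fold stays independent.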
To avoid overfitting, each terminal node should contain a minimum number of observations m_{min}. However, in our comparison, we did not restrict m_{min}. The reason is that we want to avoid the effect of trees stopping to split at nodes with m_{min} - k observations, where k is a small number of instances, while similar trees with m_{min} observations split further. To give an example where this is important, imagine that the standard CART tree stops splitting at node j with m_{min} - 1 observations. One of the 200 bootstrap trees is very similar to the standard CART tree but has m_{min} observations in node j. As a result, the bootstrap tree splits at j while the CART tree stops splitting. Thus, this marginal difference of one more observation in j results in two different tree topologies which can have significantly different predictive accuracy.
Number of Accurate Model Choices
In the second step of our evaluation, the possibility of constructing diverse choices of accurate trees by the use of bootstrap sampling is presented. The estimation of accuracy takes the whole dataset into account. In this part of the evaluation, where we assess the number of diverse choices of accurate trees, we set the minimum number of observations per terminal node to 30. We considered this value large enough to avoid overfitting, and it is also a minimum requirement to form a patient group in the LKF model.
The DRG-Data
Table 1: Description of the evaluated datasets.
DataSet | Description | Sample Size | Variables (Interval, Nominal)
HDG0106 | Parkinson's disease | 6155 | 114 (109, 5)
HDG0202 | Malignant neoplasms | 3933 | 55 (47, 8)
HDG0304 | Eye diagnoses | 9067 | 41 (36, 5)
HDG0502 | Acute affections of the respiratory tract and middle atelectasis | 8251 | 100 (92, 8)
MEL0101 | Interventions on the skull | 875 | 60 (54, 6)
MEL0203 | Small interventions in connective tissue and soft tissue | 17268 | 58 (52, 6)
MEL0401 | Interventions on the outer and middle ear, designed to treat a liquorrhoe | 4102 | 44 (40, 4)
MEL0501 | Interventions on the esophagus, stomach and diaphragm | 3432 | 86 (80, 6)
Results
Accuracy of the Best Bootstrapped Tree
Table 2: Relative average improvement (%).
Tree Size (internal nodes) | HDG0106 | HDG0202 | HDG0304 | HDG0502 | MEL0101 | MEL0203 | MEL0401 | MEL0501 | Average
2 | 0.00 | 1.12 | 2.55 | 0.71 | 1.20 | 3.74 | 3.34 | 1.52 | 1.39
3 | 0.00 | 2.78 | 3.33 | 1.65 | 5.96 | 1.88 | 3.92 | 1.97 | 2.19
4 | 0.36 | 5.57 | 3.52 | 1.23 | 5.77 | 3.30 | 4.28 | 1.05 | 2.78
5 | 0.42 | 3.18 | 3.85 | 2.30 | 7.43 | 0.26 | 3.81 | 0.84 | 2.55
6 | 0.24 | 4.38 | 5.47 | 1.13 | 9.65 | 12.03 | 2.33 | 4.41 | 4.90
8 | 0.11 | 6.05 | 1.75 | 1.15 | 1.06 | 12.91 | 2.67 | 3.63 | 3.64
10 | 0.06 | 3.99 | 3.16 | 0.69 | 2.93 | 5.09 | 1.94 | 2.83 | 1.84
12 | 0.42 | 4.14 | 3.24 | 1.75 | 2.89 | 1.61 | 1.24 | 4.95 | 2.43
14 | 1.87 | 3.35 | 1.82 | 1.20 | 0.36 | 0.00 | 2.15 | 2.17 | 1.06
16 | 0.76 | 2.11 | 2.52 | 1.27 | 1.38 | 1.18 | 1.89 | 0.65 | 1.28
Figure 3 illustrates the reduction of the total MSE by models with different tree complexities, estimated by 10-fold cross-validation. It can be observed that the predictive error is already reduced with a small number of splits and that the improvements obtained by additional splits become progressively smaller with increasing tree complexity. Although very large trees often give the best predictive performance, these complex trees are difficult to interpret and hard to work with.
The improvement in relative accuracy achieved by the bootstrap method often yields models with the same accuracy but less complex rules. For example, CART models with 3 internal nodes compared to CART models with 2 internal nodes offer an average increase in accuracy of 1.60%, while the bootstrap method achieved an average improvement of 1.39% with 2 internal nodes. For the datasets HDG0304, MEL0203 and MEL0401 the best bootstrapped tree with 2 internal nodes even outperforms the CART tree with 3 internal nodes. This effect becomes even more pronounced for larger tree sizes, where one or even several rules can be omitted without degeneration in performance.
Number of Accurate Model Choices
In the second step, the number of trees constructed by bumping that are at least as accurate as the standard tree is evaluated. Models are considered dissimilar when at least one split variable differs between the trees. For groups of trees where all the split variables are the same but the split points differ, the most accurate tree is selected and considered as a candidate model.
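The grouping rule can be sketched as follows. The representation of a tree as a dictionary with the keys `split_vars` and `mse`, the variable names and the error values are invented for illustration. The sketch also assumes that two trees count as similar when they use the same split variables regardless of their position in the tree, which is one possible reading of the rule.

```python
def select_diverse_trees(trees):
    """Keep the most accurate tree for every distinct combination of split variables."""
    best_per_signature = {}
    for tree in trees:
        # Assumed signature: the sorted split variables, ignoring split points.
        signature = tuple(sorted(tree["split_vars"]))
        current = best_per_signature.get(signature)
        if current is None or tree["mse"] < current["mse"]:
            best_per_signature[signature] = tree
    return sorted(best_per_signature.values(), key=lambda t: t["mse"])

# Invented bootstrap trees with 2 internal nodes each.
bootstrap_trees = [
    {"split_vars": ["age", "diagnosis"], "mse": 31.2},
    {"split_vars": ["diagnosis", "age"], "mse": 30.5},   # same variables, other split points
    {"split_vars": ["age", "procedure"], "mse": 30.9},
]
candidates = select_diverse_trees(bootstrap_trees)
print([(c["split_vars"], c["mse"]) for c in candidates])
```

The resulting candidates can then be counted per accuracy band relative to the standard CART tree.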
Table 3: Number of diverse trees per accuracy band relative to the standard CART tree, given as average (minimum, maximum).
Tree Size | [-1%, ∞] | [+1%, ∞] | [+3%, ∞]
2 | 3.4 (0, 9) | 0.1 (0, 1) | 0.1 (0, 1)
3 | 14.1 (0, 45) | 4.8 (0, 27) | 1.3 (0, 9)
4 | 23.3 (6, 67) | 7.4 (0, 23) | 3.9 (0, 21)
5 | 30.3 (10, 45) | 12.4 (0, 37) | 7.1 (0, 34)
6 | 39.8 (7, 66) | 10.4 (0, 47) | 4.3 (0, 34)
8 | 42.9 (12, 84) | 2.5 (0, 8) | 0.0 (0, 0)
10 | 60.1 (10, 115) | 9.0 (0, 29) | 0.1 (0, 1)
12 | 63.4 (6, 181) | 12.1 (0, 93) | 0.0 (0, 0)
14 | 76.1 (5, 183) | 13.1 (0, 70) | 8.8 (0, 70)
16 | 82.5 (5, 187) | 16.6 (0, 98) | 1.9 (0, 15)
The results show that even for very low tree complexities alternative models can be found. For the simplest models, with only 2 internal nodes, an average of 3.4 different trees with at least similar accuracy [-1%, +1%] were found. For slightly more complex models with 3 rules, the average number of models with at least similar accuracy increased to 14.1, and 4.8 trees offered an improved accuracy of > 1% compared to the standard CART tree. It can be observed that with increasing model size the number of different trees increases to up to 187 for models with 16 internal nodes. However, many of these models only differ in less important splits at the bottom of the trees, which do not contribute much to the reduction of impurity and are medically very similar.
Therefore, the similarity of trees should be further distinguished. How to assess the statistical similarity of trees by means of topology and similar partitioning is discussed in [36, 37]. However, in the DRG-application we are mainly interested in the choices of split variables regarding their medical meaning. In our analyses, nodes differing further up in the tree are considered more influential, as more patients are affected by these rules and they also contribute more to the reduction of the total variance. As an estimate of the levels on which the differences occur, the results from Table 3 can be taken into account.
Conclusions
Based on the evaluation of 8 large datasets taken from the Austrian DRG system, we showed that bumping can be used to construct diverse and accurate candidate models for DRG-systems that are based on conjunctive rules. Compared to other methods that allow a broader search of the hypothesis space, bumping is computationally more efficient. The presented results show that, on average, the best bootstrap-based tree offers improved predictive accuracy compared to the tree from the standard CART algorithm. Furthermore, less complex trees can be found that are non-inferior to the single tree constructed by the original algorithm.
During the whole development of the Austrian DRG-system, medical experts have been involved in the evaluation of the resulting regression trees. Many times the statistically optimal tree was not selected because of medical expert opinion. From discussions with medical experts, we know that a single, data-driven model is not always the medically correct one and that different options have to be presented for medical evaluation. With our approach of constructing diverse models for different pre-specified tree sizes, we allow a wide range of candidate models to be considered. For these candidate models, suitable tree sizes can be selected based on the cost-complexity criterion as well as on economical and medical considerations. Subsequently, given a desired tree complexity, medical domain experts can choose a final model. In this way, statistically suboptimal, manual alterations of models can be minimized.
This presentation illustrates the possibilities of bumping, which will be used in the coming years for the maintenance and further development of the Austrian DRG-system. Besides its relevance to DRG-systems, bumping and the proposed two-step model selection process are especially useful to assist in any kind of classification or regression problem in medical decision and prognosis tasks [38, 39, 40]. This is because domain-specific knowledge can be used to guide the selection of a medically meaningful and statistically accurate model.
Notes
Acknowledgements
The authors would like to thank Michael Edlinger, Department of Medical Statistics, Informatics and Health Economics, for reviewing the paper. Furthermore, we want to thank the Bundesministerium für Gesundheit, Familie und Jugend for providing and approving the use of the datasets for this study.
References
1. The Casemix Service: HRG4 Design Concepts. 2007, (accessed January 29, 2010), [http://www.ic.nhs.uk/webfiles/Services/casemix/Prep%20HRG4/HRG4%20design%20concepts%20a.pdf]
2. Canadian Institute for Health Information: Acute Care Grouping Methodologies. 2004, (accessed on January 29, 2010), [http://secure.cihi.ca/cihiweb/en/downloads/Acute_Care_Grouping_Methodologies2004_e.pdf]
3. Bundesministerium für Gesundheit, Familie und Jugend: Leistungsorientierte Krankenanstaltenfinanzierung. LKF Systembeschreibung. 2009, (accessed on January 29, 2010), [http://bmg.gv.at/cms/site/attachments/6/4/5/CH0719/CMS1159516854629/systembeschreibung_2009.pdf]
4. Fischer W: Diagnosis Related Groups (DRG's) und Verwandte Patientenklassifikationssysteme. 2000, Wolfertswil: Zentrum für Informatik und wirtschaftliche Medizin
5. Institut für das Entgeltsystem im Krankenhaus GmbH: German Diagnosis Related Groups Definitionshandbuch. Siegburg: Deutsche Krankenhaus Verlagsgesellschaft GmbH. 2005
6. Tibshirani R, Knight K: Model Search by Bootstrap "Bumping". Journal of Computational and Graphical Statistics. 1999, 8 (4): 671-686. 10.2307/1390820.
7. Breiman L, Friedman J, Olshen R, Stone C: Classification and Regression Trees. 1984, Belmont: Wadsworth
8. Theurl E, Winner H: The impact of hospital financing on the length of stay: Evidence from Austria. Health policy. 2007, 82 (3): 375-389. 10.1016/j.healthpol.2006.11.001.
9. Quinlan J: Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence. 1992, 343-348.
10. Breiman L: Random Forests. Machine Learning. 2001, 45: 5-32. 10.1023/A:1010933404324.
11. Suárez A, Lutsko J: Globally Optimal Fuzzy Decision Trees for Classification and Regression. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1999, 21 (12): 1297-1311. 10.1109/34.817409.
12. Shannon W, Banks D: Combining classification trees using MLE. Statistics in Medicine. 1999, 18 (6): 727-740. 10.1002/(SICI)1097-0258(19990330)18:6<727::AID-SIM61>3.0.CO;2-2.
13. Vogel D, Asparouhov O, Scheffer T: Scalable lookahead linear regression trees. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007, ACM Press New York, NY, USA, 757-764.
14. Murthy S, Salzberg S: Lookahead and pathology in decision tree induction. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. 1995, 1025-1031.
15. Quinlan J, Cameron-Jones R: Oversearching and Layered Search in Empirical Learning. Breast Cancer. 1995, 286: 27.
16. Esmeir S, Markovitch S: Anytime Learning of Decision Trees. The Journal of Machine Learning Research. 2007, 8: 891-933.
17. Esmeir S, Markovitch S: Lookahead-based algorithms for anytime induction of decision trees. ACM International Conference Proceeding Series. 2004, ACM Press New York, NY, USA, 257-264.
18. Norton S: Generating better decision trees. Proceedings of the Eleventh International Conference on Artificial Intelligence. 1989, 800-805.
19. Shi H, Lyons-Weiler J: Clinical decision modeling system. BMC Medical Informatics and Decision Making. 2007, 7: 23. 10.1186/1472-6947-7-23.
20. Fan G, Gray J: Regression tree analysis using TARGET. Journal of Computational and Graphical Statistics. 2005, 14: 206-218. 10.1198/106186005X37210.
21. Chipman H, George E, McCulloch R: Bayesian CART Model Search. Journal of the American Statistical Association. 1998, 93: 935-947. 10.2307/2669832.
22. Denison D, Mallick B, Smith A: A Bayesian CART algorithm. Biometrika. 1998, 85 (2): 363-377. 10.1093/biomet/85.2.363.
23. Sutton C: Improving Classification Trees with Simulated Annealing. Proceedings of the 23rd Symposium on the Interface, Interface Foundation of North America. 1992, 333-44.
24. Youssef H, M Sait S, Adiche H: Evolutionary algorithms, simulated annealing and tabu search: a comparative study. Engineering Applications of Artificial Intelligence. 2001, 14 (2): 167-181. 10.1016/S0952-1976(00)00065-8.
25. Kalles D: Lossless fitness inheritance in genetic algorithms for decision trees. Arxiv preprint cs/0611166. 2006
26. Jin Y: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing - A Fusion of Foundations, Methodologies and Applications. 2005, 9: 3-12.
27. Breiman L: Bagging predictors. Machine Learning. 1996, 24 (2): 123-140.
28. Friedman J: Greedy function approximation: a gradient boosting machine. Annals of Statistics. 2001, 29 (5): 1189-1232. 10.1214/aos/1013203451.
29. Gao H, Davis J: Sampling Representative Examples for Dimensionality Reduction and Recognition - Bootstrap Bumping LDA. Lecture Notes in Computer Science. 2006, 3953: 275-287.
30. Heskes T: Balancing between bagging and bumping. Advances in Neural Information Processing Systems 9. 1997, MIT Press, 466-472.
31. Petrikieva L, Fyfe C: Bagging and bumping self-organising maps. Computing and Information Systems. 2002, 9 (2): 69
32. Weisstein Eric WA: "Binary Tree." From MathWorld - A Wolfram Web Resource. (accessed May 21, 2009), [http://mathworld.wolfram.com/BinaryTree.html]
33. Therneau T, Atkinson E: An introduction to recursive partitioning using the RPART routines. Mayo Foundation. 1997
34. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2008, Springer
35. Bundesministerium für Gesundheit, Familie und Jugend: Bundesministerium für Gesundheit, Familie und Jugend. (accessed on December 23, 2009), [http://bmg.gv.at]
36. Chipman H, George E, McCulloch R: Making sense of a forest of trees. Proceedings of the 30th Symposium on the Interface. 1998, 84-92.
37. Miglio R, Soffritti G: The comparison between classification trees through proximity measures. Computational Statistics and Data Analysis. 2004, 45 (3): 577-593. 10.1016/S0167-9473(03)00063-X.
38. Ji S, Smith R, Huynh T, Najarian K: A comparative analysis of multi-level computer-assisted decision making systems for traumatic injuries. BMC Medical Informatics and Decision Making. 2009, 9: 2. 10.1186/1472-6947-9-2.
39. Toussi M, Lamy J, Le Toumelin P, Venot A: Using data mining techniques to explore physicians' therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Medical Informatics and Decision Making. 2009, 9: 28. 10.1186/1472-6947-9-28.
40. Barrett J, Mondick J, Narayan M, Vijayakumar K, Vijayakumar S: Integration of modeling and simulation into hospital-based decision support systems guiding pediatric pharmacotherapy. BMC Medical Informatics and Decision Making. 2008, 8: 6. 10.1186/1472-6947-8-6.
Prepublication history
The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/14726947/10/9/prepub
Copyright information
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.