# A Comparison of Four Software Programs for Implementing Decision Analytic Cost-Effectiveness Models

## Abstract

The volume and technical complexity of both academic and commercial research using decision analytic modelling have increased rapidly over the last two decades. The range of software programs used to implement these models has also increased, but it remains true that a small number of programs account for the vast majority of cost-effectiveness modelling work. We report a comparison of four software programs: TreeAge Pro, Microsoft Excel, R and MATLAB. Our focus is on software commonly used for building Markov models and decision trees to conduct cohort simulations, given their predominance in the published literature around cost-effectiveness modelling. Our comparison uses three qualitative criteria as proposed by Eddy et al.: “transparency and validation”, “learning curve” and “capability”. In addition, we introduce the quantitative criterion of processing speed. We also consider the cost of each program to academic users and commercial users. We rank the programs based on each of these criteria. We find that, whilst Microsoft Excel and TreeAge Pro are good programs for educational purposes and for producing the types of analyses typically required by health technology assessment agencies, the efficiency and transparency advantages of programming languages such as MATLAB and R become increasingly valuable when more complex analyses are required.

### Key Points for Decision Makers

Microsoft Excel and TreeAge Pro are good programs for implementing the types of cost-effectiveness analyses commonly required by health technology assessment bodies.

MATLAB and R are particularly valuable for implementing more complex decision analytic models and computationally demanding analyses, such as expected value of perfect parameter information (EVPPI), due to their processing speed and transparency.

## 1 Introduction

Volume of published decision models (2000–2015)

| Time frame | Markov model | Decision tree | Individual-level model |
|---|---|---|---|
| 2000–2009 | 3344 | 896 | 252 |
| 2010–2014 | 2969 | 932 | 385 |
| 2015 | 443 | 214 | 116 |

Parsimony, transparency and reproducibility are well-established principles in decision analytic modelling, and they are relevant considerations when choosing which software to use for model implementation. However, these are not the only considerations. The software must be capable of implementing the type of model needed, and it must match the experience and technical skills of the analyst. The technical demands of health technology assessment (HTA) bodies effectively define a minimum capacity set for software. At the same time, the ability of HTA bodies to evaluate submitted models limits the options available to the decision modeller. Furthermore, the complexity of the proposed analyses can mean that computational efficiency is important.

There is scant literature around the software programs used for implementing cost-effectiveness models. A 2008 survey by Tosh and Wailoo for the UK’s National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) identified Microsoft Excel, R and TreeAge Pro as the predominant software used by the UK’s National Institute for Health Research (NIHR) Technology Assessment Reviews (TARs) teams [1]. In 2009, Menn and Holle conducted a review of cost-effectiveness modelling (CEM) software that included Arena, Microsoft Excel and TreeAge Pro [2]. A 2014 review by Davis et al. [3] for the NICE DSU identified Microsoft Visual Basic, R, TreeAge Pro and SIMUL8 as software used for developing models to conduct patient-level simulations. In 2017, Jalal et al. [4] identified that, over time, the landscape of software used for decision analysis is changing, with R having an increasingly important role in decision analysis in health sciences.

In this paper we consider TreeAge Pro 2016 R1 (hereafter referred to as TreeAge), Microsoft Excel 2016 (hereafter Excel) and two programming language-based software programs: R 3.2.4 x64 (hereafter R) and MATLAB 2016a x64 (hereafter MATLAB). Our focus is on software commonly used for building Markov models and decision trees to conduct cohort simulations, given their predominance in the published literature around CEM (Table 1). We do not consider Arena or SIMUL8 because these packages are more commonly used for discrete-event simulation (DES) than for building Markov models or decision trees. The included programs have been extensively used in both academia and industry. With the exception of TreeAge, the programs considered are either open source or have open-source alternatives.

The remainder of this paper is structured as follows. Section 2 provides an overview for each of the four programs, including possible means for optimizing their processing speed and consideration of how much each costs to purchase. Section 3 briefly describes the criteria used to compare the programs. Section 4 describes the methods used for the benchmarking analyses. Section 5 presents the results of the benchmarking analyses and our assessment of each program’s performance against the evaluation criteria. Section 6 provides our summative assessment, including recommendations for how each program might be utilized in different settings such as education, HTA agency submissions and academic research.

## 2 Overview of the Software

### 2.1 Excel

Excel is the ubiquitous workhorse of CEM. All 28 respondents to the 2008 NICE DSU survey reported using Excel in the construction of a model submitted for technology appraisal. Along with TreeAge, it is the only software for which all six responding TAR teams stated they had expertise [1].

Menn and Holle [2] found that Excel was efficient for constructing simple models but that other programs were better suited to more complex models. Excel is highly extensible through the use of macros programmed in Visual Basic for Applications (VBA) and numerous third-party packages to enhance functionality. Furthermore, Microsoft continues to develop the JavaScript application programming interface (API) across the Microsoft Office product line, allowing for choices in more advanced modelling [5].

Excel is often used in introductory courses and textbooks on CEM, and it is available on both the Microsoft Windows and the Apple macOS operating systems [6].

LibreOffice and OpenOffice are open-source spreadsheet programs that are largely compatible with Excel [7, 8]; however, some Excel macros may require translation. The benchmark model discussed in Sect. 4, which uses simple macros, ran in LibreOffice with the sole modification of splitting one worksheet in Excel into two in LibreOffice because of the column constraints (a limit of 1024 columns in LibreOffice, compared with 16,384 in Excel) [9, 10].

Excel has many valuable features, including support for a variety of statistical and econometric functions and the ability to extend its built-in capabilities through the recording or direct programming of macros. These macros can vary in terms of their sophistication, from relatively basic ‘for’ loops to more complex user-defined functions. For advanced users who program macros to extend Excel’s capacities, the VBA development environment is fully featured, including syntax highlighting and completion as well as debugging and project management tools. Excel has extensive aesthetic options, allowing for a rich presentation of models and their results, as well as explicit documentation of the structure of the model. The links between Excel and other components of the Microsoft Office suite, such as Microsoft Word and Microsoft PowerPoint, support the efficient production of reports and presentations for dissemination.

Botchkarev [11] reviews the criticism that Microsoft has received for the quality of the algorithms used in Excel’s statistical functions and random number generators (RNGs). Numerous authors have identified problems with the quality of Excel’s RNGs and the accuracy of its statistical functions. Botchkarev documents the state of the software and provides a table of known issues with functions as of Microsoft Excel 2013. He concludes that persisting problems with the default RNG necessitate the use of an RNG external to the software. Specifically, the RNG is undocumented, and the seed, which is used to initialize an RNG, cannot be set. This is problematic for reproducibility and, consequently, validation. In spite of these problems, Botchkarev concludes, “Microsoft Excel (versions 2010 and 2013) is a strong Monte Carlo simulation application” given the demonstrated improvements in many functions. As of Microsoft Excel 2016, the RNG remains largely undocumented, although a Microsoft support document [12] applicable to versions 2003–2010 suggests that the RNG is of the Wichmann-Hill type. The ‘Randomize’ VBA function can be used to seed the RNG within VBA macros but not within Excel proper.

As of Microsoft Excel 2007, the software has a feature known as ‘multithreaded recalculation’ (MTR) that will automatically try to parallelize any re-calculations, provided they use only thread-safe functions [13]. MTR is active by default, requiring no additional settings from the user. Disabling unnecessary display outputs such as progress bars, screen updating and ‘Application.EnableEvents’ can also substantially reduce processing times [14].

Prices ($US) for Microsoft Office 2016 licences (March 2017) [15]

| Version | Price |
|---|---|
| Home and student | 149.99 |
| Home and business | 229.99 |
| Professional | 399.99 |

### 2.2 TreeAge

TreeAge is a common visual development tool used in HTA [16]. The Tosh and Wailoo survey indicated that 57% (*n* = 28) of respondents had used TreeAge to submit a model for a technology appraisal [1]. Along with Excel, it is the only software for which all six responding assessment groups stated they had expertise.

Summary of TreeAge Pro software capabilities

| Models explicitly supported | Analysis and outputs | Interoperability | Scalability |
|---|---|---|---|
| Budget impact analysis (dynamic cohorts); Decision trees; DES (time to event); Markov models; Micro-simulation (individual state transition models) | Bayesian revision; CE plane/scatter plots; CEAC; Deterministic and PSA; EVPI; EVPPI; ICERs and dominance; Markov trace; Survival curves; Threshold analysis; Tornado diagrams; Various charts (NMB vs. WTP, EVPI vs. WTP, etc.); Various distributions (ICERs, stochastic parameters, etc.) | Excel; Java and ActiveX API; ODBC database connections; Python | Distributed computing; Multi-threaded |

It is worth noting that TreeAge supports two implementations of the Markov model: the standard Markov chain and the Markov tree. The Markov tree was initially specified by Hollenberg [18], allowing a more aesthetic presentation of Markov models. These two forms are not completely equivalent: although the differences are subtle, they can have a substantial impact upon the results. Appendix 2 provides a more detailed exposition of the differences between Markov chain and Markov tree models and why the differences can be important.

With anything more than simple models, it is important to allow TreeAge to use as many central processing unit (CPU) threads as possible. It is also advisable to allow TreeAge to use a significant quantity of the available random-access memory (RAM; heap memory). TreeAge supports distributed processing, with only one computer (the ‘master’) required to have an active licence.

Prices ($US) for TreeAge Pro licences (March 2017) [16]

| Licence | Core: commercial | Core: non-profit and government | Core: academic | Healthcare: commercial | Healthcare: non-profit and government | Healthcare: academic |
|---|---|---|---|---|---|---|
| Standard | 1300 | 1150 | 1100 | 1760 | 1575 | 1100 |
| Maintenance | 210 | 210 | 210 | 210 | 210 | 210 |
| Annual licence | 475 | 435 | 425 | 600 | 550 | 425 |
| Annual renewal | 475 | 435 | 425 | 600 | 550 | 425 |
| Student course licence | – | – | – | – | – | 45 |
| Student research licence | – | – | – | – | – | 275 |

### 2.3 MATLAB and R

Both MATLAB and R are general purpose programming languages frequently used for mathematical and statistical analyses. They share common strengths and weaknesses, and the development cycle is similar across both packages. A general analysis of both is first offered here, followed by application-specific details relevant to the analyst. No review of programming languages was offered by Menn and Holle [2]. The NICE DSU survey reported that, of the six responding assessment groups, two had expertise in R and the rest would either require training or be unable to review a model in R [1].

As general purpose high-level programming languages, MATLAB and R support all model types. There are few limits on the complexity, structure or scope and scale of models and analyses, and both value of information (VOI) and expected value of perfect parameter information (EVPPI) can be implemented [19, 20]. Both languages offer rich plotting facilities with many customizable options. Both languages are also highly extensible through community- and/or vendor-developed packages, which augment basic functionality.

MATLAB and R have built-in functions for routine statistics, permitting in-model specification of econometrics and other statistical analyses that drive model parameters. Both either include or can be used with an integrated development environment (IDE), such as RStudio for R. An IDE can greatly enhance productivity during model development as it provides visual identification of syntax errors as well as code management and debugging tools.

Both MATLAB and R use the Mersenne Twister as the default RNG, which has many desirable properties, including a long period length and efficiency in implementation [21].

As with all programming languages, computational efficiency in MATLAB and R may be increased by pre-allocating any matrices, arrays or vectors that will be used in computation. Both MATLAB and R make extensive use of vectorization, which enables the simultaneous processing of multiple data elements; replacing vectorized operations with ‘for’ loops diminishes computational efficiency, so such loops should be avoided where possible. For probabilistic models, it is more efficient to draw the random variables outside of the simulation loop. Profiling tools are available that identify which sections of code account for the majority of the processing time and hence where there is the greatest potential for improving the efficiency of the code. As with Excel, it is advisable to eliminate progress bars, since these slow down analyses.
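
Two of these points, pre-allocating the result vector and drawing the random inputs before the simulation loop, can be illustrated with a minimal sketch. It is written in Python purely for illustration (Python is not one of the four programs compared), and the function name, the distributions and the toy "model" are hypothetical. Vectorization itself requires an array library and is not shown.

```python
import random

def run_psa(n_sims, seed=1):
    """Toy probabilistic analysis: expected cost = p_event * cost_event."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced

    # Draw every random input once, before the simulation loop starts.
    # The beta and gamma parameters are hypothetical.
    p_event = [rng.betavariate(20, 80) for _ in range(n_sims)]
    cost_event = [rng.gammavariate(100, 50) for _ in range(n_sims)]

    # Pre-allocate the result vector instead of growing it inside the loop.
    expected_cost = [0.0] * n_sims
    for i in range(n_sims):
        expected_cost[i] = p_event[i] * cost_event[i]
    return expected_cost

costs = run_psa(10_000)
print(sum(costs) / len(costs))
```

In MATLAB or R the loop body would normally be replaced by a single element-wise operation on the two pre-drawn vectors.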

#### 2.3.1 MATLAB

MATLAB, a contraction of ‘matrix laboratory’, is a popular high-level development environment. GNU Octave (hereafter Octave) is an open-source solution that is “mostly compatible” with the MATLAB language [22]. Although both are well suited to all forms of CEM, neither MATLAB nor Octave was considered in the review by Menn and Holle [2], nor did any respondents to the NICE DSU survey indicate the use of either tool in submitting an HTA report [1].

Being a high-level language combined with an IDE, MATLAB is a general purpose computing platform providing for rapid development of numerical programs. An extensive standard library of functions provides a strong foundation for a variety of problems, and both MATLAB and Octave are extensible through user-submitted packages [23, 24]. MATLAB provides domain-specific extensions through ‘Toolboxes’, covering domains as diverse as bioinformatics and aerospace. For example, MathWorks Simulink and SimEvents products provide graphical DES, although there is sufficient infrastructure within MATLAB and the Statistics & Machine Learning Toolbox to implement DES without them [25, 26].

MATLAB has a built-in profiler to analyse the performance of programs and identify potential areas of improvement. The MATLAB editor provides real-time visual identification of syntax and programming errors and inefficiencies, although this is not available in Octave. The built-in debugging tools allow the programmer to step through a program, line by line, to inspect problems. Its accelerator feature further optimizes programs, which can result in substantial reductions in run time. It also makes extensive use of parallel processing across many of its standard functions.

Although programs written in MATLAB are largely compatible with Octave, some components differ between the two programs. For example, Octave seeds RNGs using a different syntax to MATLAB. Speed of processing in Octave is highly dependent upon user optimization of the code, whereas the MATLAB accelerator makes much of this optimization unnecessary. Code optimization is particularly important for EVPPI and other multi-level Monte Carlo analyses. By default, MATLAB uses double precision floating point arithmetic; this level of precision may be superfluous for some models and results in additional computational burden [27]. Single precision is more computationally efficient, so it should be used if the level of accuracy is sufficient.

Prices ($US) for MATLAB licences (March 2017) [28]

| Licence | Academic | Student | Student suite | Commercial and government | Home |
|---|---|---|---|---|---|
| MATLAB base | 500 | 49 | 99 | 2150 | 149 |
| Statistics and machine learning toolbox | 200 | 29 | Included | 1000 | 45 |
| Other toolboxes | 200–500 | 29 | 29 | 1000–3250 | 45 |

#### 2.3.2 R

R is an open-source language and environment designed for the development of statistical programming solutions [29]. Developed “as a different implementation of S [a language and environment developed at Bell Laboratories]”, its popularity has grown substantially since its first official release in 2000 [30]. A similar trend in R’s popularity in applications of decision analysis in health was recently observed by Jalal et al. [4]. The NICE DSU survey reported that 18% (*n* = 28) of respondents indicated they had used R for development of a model submitted as part of a NICE technology appraisal [1].

R is currently supplemented by more than 8000 community-developed open-source packages available for download from the Comprehensive R Archive Network (CRAN) [30]. These include packages designed specifically for cost-effectiveness analysis, such as BCEA, a Bayesian cost-effectiveness analysis package. Authors often supplement the package submission with a publication in the *Journal of Statistical Software*, usually offering more rigorous documentation, sometimes including the relevant theoretical material [31].

In addition to the ‘base’ version of R available from CRAN, other versions are developed with various extensions and enhancements. Microsoft’s R Open (MRO) is a popular example that includes more efficient math routines and advanced parallelization capabilities, amongst other benefits [32].

For programmers, the profiling tools that examine program performance, alongside the debugging tools, are particularly valuable. From an analytical perspective, the extensive range of free packages for statistical and econometric analyses, supported by an extensive help system and documentation, makes R an attractive implementation option.

Standard R is not inherently multi-threaded but requires additional packages to add this functionality. Eddelbuettel [33] provides a broad overview of packages that increase R’s computational capabilities. It should be noted that the use of such packages can necessitate additional technical knowledge. Unlike MATLAB, R does not have an ‘accelerator’; consequently, users have more direct responsibility for ensuring that their code is optimized. Compiling functions and files may decrease running times, particularly with the just-in-time compilation that is supported by the ‘compiler’ package. Further significant gains in processing speeds can be achieved by an installation of R which has had its math routines optimized and multi-threaded.

## 3 Methods

Our comparison of the four software programs uses three qualitative criteria as proposed by Eddy et al. [34]: “transparency and validation”, “learning curve” and “capability”. In addition, we introduce a quantitative criterion, processing speed. We also consider the cost of each program to academic users and commercial users. We rank the four programs based on each of these criteria, allowing for joint ranks in cases where we could not identify a clear difference in performance between programs.

Eddy et al. [34] state that “transparency refers to the extent to which interested parties can review a model’s structure, equations, parameter values, and assumptions”. They identify two levels at which this is important: to allow a general understanding, as well as a more technical understanding, of the model. Validation complements transparency as a “set of methods for judging a model’s accuracy in making relevant predictions”. They list four main types of “validation”: face validity, verification (“internal validity”), external validity and predictive validity. In this work, validation refers primarily to verification which “addresses whether the model’s parts behave as intended and the model has been implemented correctly”.

The “learning curve” criterion is concerned with the ease with which a neophyte could acquire the skill necessary to implement a CEM in the software. In forming our opinions, we consider not only the resources offered by the software to support the necessary skill acquisition, including worked examples, manuals, training videos and courses, but also whether any additional background knowledge, such as basic mathematical or programming concepts (e.g. linear algebra), is necessary.

“Capability” refers to the scope of what is technically possible in the software. For example, while TreeAge is competent across a diverse array of models, these are not extensible; i.e. adding new types of analyses can only be done by the company itself. The growing literature on, and interest in, methods for approximating computationally burdensome VOI is a prime example of a capability to which users might wish to have access, but that they must wait for TreeAge to implement.

Computational speed is a critical component of CEM software because it is one of the key determinants of (1) how long it will take to produce a given analysis and (2) the scope of analyses that are feasible within the time available for a project. Complex models increase the computational burden and, when performing multi-stage Monte Carlo simulations, any inefficiency in model implementation is exacerbated. Marked differences in the time it takes to run simulations have implications for the cost of undertaking research, the ability to use decision analytic modelling as part of research and design and the level to which uncertainty around the model’s assumptions can be incorporated [35]. Processing speed was examined using a benchmarking exercise that is described in more detail in Sect. 4.

## 4 Benchmarking

For our benchmarking assessment, we implemented a CEM previously published by Paulden et al. [36]. This CEM was developed to evaluate the cost effectiveness of using a 21-gene assay, in conjunction with Adjuvant! Online, for risk stratification in the provision of chemotherapy for patients with early-stage breast cancer.

A complete description of the model is provided in the paper by Paulden et al. [36]. Each risk category, representing a unique combination of the Adjuvant! Online and 21-gene assay risk groups, has its own Markov chain, modelling the probability of a distant recurrence. The strategies considered are the unique combinations of Adjuvant! Online risk groups to which the 21-gene assay may be provided. The original study found that providing the 21-gene assay to all Adjuvant! Online risk groups is cost effective.
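
For readers unfamiliar with cohort simulation, the general mechanics of a Markov chain model can be sketched as follows. This is a minimal illustration in Python and not the benchmark model: the three health states, the transition probabilities and the function name are all hypothetical.

```python
def markov_cohort(transition, start, n_cycles):
    """Return the Markov trace: the cohort's distribution over states
    at cycle 0 and after each of n_cycles transitions."""
    n_states = len(start)
    trace = [list(start)]
    for _ in range(n_cycles):
        current = trace[-1]
        # Next distribution = current distribution x transition matrix.
        nxt = [
            sum(current[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        trace.append(nxt)
    return trace

# Hypothetical states: recurrence-free, distant recurrence, dead.
# Each row holds the per-cycle probabilities of moving from one state.
P = [
    [0.90, 0.07, 0.03],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],
]
trace = markov_cohort(P, start=[1.0, 0.0, 0.0], n_cycles=10)
```

Costs and quality-adjusted life-years would then be accumulated by weighting each row of the trace by state-specific values, which are omitted here.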

The model was originally implemented in TreeAge 2009; adapting the model to run in the 2016 version required only minor modifications. The model was then recreated in MATLAB, R and Excel and implemented to track the same metrics as in TreeAge. The computer used for the benchmarking had an Intel i7-4770 @ 3.4 GHz CPU, with four physical and eight logical cores. It had 16 GB of RAM and ran the Microsoft Windows 10 64-bit operating system.

TreeAge was allocated the full amount of memory available on the computer and was allowed to use all available CPU cores. The Excel model makes only basic use of VBA, namely to loop over the Monte Carlo simulations. In TreeAge, MATLAB and R, the same RNG seed was used on each run; it is not possible to set the seed in Excel. The R model makes use of multi-threading through the ‘doParallel’ package. No explicit multi-threading was used for the MATLAB model. We considered the recreation of the TreeAge model in each software program to be complete when Table 2a from Paulden et al. [36] could be accurately reproduced using each model.

Ten simulations of 10,000 draws each were run using each software program. The required time for each simulation was reported directly by TreeAge; in Excel, R and MATLAB, a script was created that recorded the simulation time.
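
A timing script of the kind described might look like the following minimal sketch. It is written in Python for illustration only and is not the script used in the benchmark; `toy_simulation` is a hypothetical stand-in for one simulation of 10,000 draws, and the use of the sample standard deviation is an assumption.

```python
import statistics
import time

def time_runs(simulate, n_runs=10):
    """Time n_runs calls of simulate() and summarise wall-clock seconds."""
    elapsed = []
    for _ in range(n_runs):
        start = time.perf_counter()
        simulate()
        elapsed.append(time.perf_counter() - start)
    return {
        "average": statistics.mean(elapsed),
        "standard deviation": statistics.stdev(elapsed),  # sample SD (assumed)
        "minimum": min(elapsed),
        "maximum": max(elapsed),
        "median": statistics.median(elapsed),
    }

def toy_simulation(n_draws=10_000):
    # Hypothetical stand-in for one simulation of 10,000 draws.
    return sum(i * 0.5 for i in range(n_draws))

summary = time_runs(toy_simulation)
```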

## 5 Results

Total costs ($CAD) and quality-adjusted life-years estimated by each software program

| Strategy | Published costs | Excel costs | TreeAge costs | MATLAB costs | R costs |
|---|---|---|---|---|---|
| No patients | 13.86 m | 13.85 m | 13.85 m | 13.85 m | 13.85 m |
| High risk only | 14.09 m | 14.09 m | 14.09 m | 14.09 m | 14.09 m |
| Int risk only | 14.19 m | 14.18 m | 14.18 m | 14.18 m | 14.18 m |
| Int/high risk only | 14.42 m | 14.42 m | 14.42 m | 14.42 m | 14.41 m |
| Low risk only | 15.75 m | 15.75 m | 15.75 m | 15.75 m | 15.75 m |
| Low/high risk only | 15.99 m | 15.99 m | 15.99 m | 15.99 m | 15.99 m |
| Low/int risk only | 16.08 m | 16.08 m | 16.08 m | 16.08 m | 16.08 m |
| All patients | 16.32 m | 16.32 m | 16.32 m | 16.31 m | 16.31 m |

| Strategy | Published QALYs | Excel QALYs | TreeAge QALYs | MATLAB QALYs | R QALYs |
|---|---|---|---|---|---|
| No patients | 11,063 | 11,062 | 11,061 | 11,061 | 11,061 |
| High risk only | 11,276 | 11,276 | 11,275 | 11,275 | 11,275 |
| Int risk only | 11,193 | 11,193 | 11,193 | 11,193 | 11,193 |
| Int/high risk only | 11,407 | 11,408 | 11,406 | 11,407 | 11,406 |
| Low risk only | 11,147 | 11,146 | 11,145 | 11,146 | 11,145 |
| Low/high risk only | 11,361 | 11,360 | 11,359 | 11,360 | 11,359 |
| Low/int risk only | 11,278 | 11,277 | 11,277 | 11,278 | 11,277 |
| All patients | 11,492 | 11,492 | 11,490 | 11,492 | 11,491 |

### 5.1 Transparency and Validation

The models built in Excel are implicitly transparent when no restrictions on visibility of model structure or code are enforced (such as hiding a worksheet). Parameters should be obvious, as should the model structure. The use of cell names is required for efficient updating of parameter values in any but the simplest of models. However, the models can become opaque through what Tosh and Wailoo [1] call “cell chasing”, where references and names lead to a tangled web of variables. The ability of Excel to highlight these relationships, using the ‘trace precedents’ and ‘trace dependents’ facility, offers some relief. Tosh and Wailoo also note that complex models will likely require macros, the validation of which requires additional technical skills.

That the seed cannot be set in Excel is a significant problem. Any model should be reproducible upon demand and, as of Microsoft Excel 2016, a model implemented without complex VBA (including code that sets the seed of the RNG) could not be re-run without storing the entire draw for the simulation. This is also problematic when doing sensitivity analyses on a model, where one ought to use as much of the same draw as possible. For example, if one changes one parameter of one distribution, all the other draws should be identical for the sake of consistency.
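
The principle of reusing as much of the same draw as possible can be sketched as follows. This is a minimal Python illustration under stated assumptions: the parameter names, the distributions and the scheme of one seeded stream per parameter are hypothetical choices, not the approach taken in any of the benchmark models.

```python
import random

def draw_parameters(n_sims, seed, cost_mean=1000.0):
    """Draw model inputs using one seeded RNG stream per parameter."""
    # Separate streams mean that changing one distribution cannot shift
    # the draws of another. The seed offsets are a hypothetical scheme.
    rng_p = random.Random(seed)
    rng_c = random.Random(seed + 1)
    p_recurrence = [rng_p.betavariate(10, 90) for _ in range(n_sims)]
    cost = [rng_c.expovariate(1.0 / cost_mean) for _ in range(n_sims)]
    return p_recurrence, cost

# Base case and a sensitivity analysis that changes only the mean cost.
base_p, base_c = draw_parameters(1_000, seed=2017)
alt_p, alt_c = draw_parameters(1_000, seed=2017, cost_mean=1500.0)

# The recurrence draws are untouched by the change to the cost distribution.
print(base_p == alt_p)
```

Re-running either call with the same seed reproduces the draw exactly, so the full set of sampled values does not need to be stored.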

Transparency of the model structure in TreeAge is explicit in visual form, and parameterization can be made transparent by outputting the model to a spreadsheet wherein the variables, distributions (and their parameters), tables and trackers can be reported easily. It offers model validation tools to check common development mistakes. These are particularly useful when working with large models. The ability to provide the model to unlicensed users in the form of the TreeAge Pro Player—including stored analyses—and extensive debugging options, such as console output and Markov cohort analysis, permit rigorous validation and exposition. Complementing this is the ability to incorporate user-defined, model-specific help files, which allow an analyst to document the model for reference and validation.

Although TreeAge’s implementation is proprietary and code validation is not possible, the software allows for sufficient output to confirm the calculations. In many ways, TreeAge standardizes the CEMs it supports, which is not the case with any of the other programs. However, as of March 2017, the distinction between a Markov chain and a Markov cycle tree receives no comment in the TreeAge user manual, with the “Markov Technical Details” chapter consisting of a single sentence that states “this chapter has not yet been written” [17]. Without this knowledge, validation and transparency of TreeAge models are rendered fatuous and may even be misleading.

Models constructed in languages such as R and MATLAB provide for a very high degree of transparency, since the code implicitly documents the structure. When combined with thoughtful comments, programs can be easily followed and assumptions regarding parameters, model structure, and analyses are readily identifiable.

Changing parameters is simple and straightforward, allowing for multiple scenario analyses to be easily conducted by an assessment authority. Mathematical formulas are also explicit and their understanding informed by relevant comments and references. As statistics can be computed within the software itself, such analyses are equally transparent. However, although these programming languages offer the highest degree of transparency, they can be difficult to rigorously validate as a model may comprise many thousands of lines of code.

### 5.2 Processing Time

Time required for 10,000 simulations

| Statistic | MATLAB (seconds) | R (seconds) | Excel (seconds) | Excel (minutes) | TreeAge (seconds) | TreeAge (minutes) | TreeAge (hours) |
|---|---|---|---|---|---|---|---|
| Average | 11.22 | 31.83 | 872.65 | 14.54 | 15,798.72 | 263.31 | 4.39 |
| Standard deviation | 0.06 | 0.68 | 0.89 | 0.01 | 144.10 | 2.40 | 0.04 |
| Minimum | 11.10 | 31.03 | 871.55 | 14.53 | 15,560.81 | 259.35 | 4.32 |
| Maximum | 11.31 | 33.20 | 874.38 | 14.57 | 16,018.08 | 266.97 | 4.45 |
| Median | 11.23 | 31.75 | 872.53 | 14.54 | 15,808.27 | 263.47 | 4.39 |

Estimated time required for an expected value of partial perfect information analysis consisting of 1000 runs of 10,000 simulations each (via extrapolation)

| Unit | MATLAB | R | Excel | TreeAge |
|---|---|---|---|---|
| Hours | 3.12 | 8.84 | 242.40 | 4388.53 |
| Days | 0.13 | 0.37 | 10.10 | 182.86 |
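
The extrapolated figures follow directly from the average times per run: each average (in seconds) is multiplied by 1000 runs and converted to hours and days. A minimal Python sketch of this arithmetic, with a hypothetical function name, is:

```python
# Average seconds per run of 10,000 simulations, from the benchmark table.
MEAN_SECONDS = {
    "MATLAB": 11.22,
    "R": 31.83,
    "Excel": 872.65,
    "TreeAge": 15798.72,
}

def extrapolate(mean_seconds, n_runs=1000):
    """Linear extrapolation to n_runs; returns (hours, days) to 2 d.p."""
    hours = mean_seconds * n_runs / 3600
    return round(hours, 2), round(hours / 24, 2)

estimates = {name: extrapolate(sec) for name, sec in MEAN_SECONDS.items()}
for name, (hours, days) in estimates.items():
    print(f"{name}: {hours} hours, {days} days")
```

The extrapolation assumes that run time scales linearly with the number of runs and ignores any overhead of the outer loop.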

### 5.3 Processing Speed

There is no ambiguity in the results of the benchmarking exercise. It is clear that complex models and their analyses benefit immensely from code tailored to the task at hand. One should bear this in mind when considering training options. It may be easier to learn TreeAge or Excel, and this might be attractive for one-off modelling tasks. However, as most analysts will be involved in constructing multiple models over time, consideration should be given as to whether the additional investment required to learn a programming language will be more than offset in the longer term by the benefits associated with substantial reductions in the time required to run model simulations.

### 5.4 Learning Curve

It is self-evident that, of the four programs, it is easiest to construct a model in TreeAge, given the graphical nature of the program and the various utilities provided to convert diagrams into models. Time spent with the TreeAge user manual, and the extensive examples included in the program, will be sufficient to learn the nuances of the software. In addition, TreeAge offers a variety of training options.

It is our judgment that Excel without complex VBA is the second simplest software to learn. With an understanding of spreadsheets and the linkages between cells, it is relatively straightforward to implement a CEM in Excel. However, stochastic models require some command of VBA, and more complex simulations require an increasingly sophisticated command of VBA. For analysts requiring this level of coding expertise, it is likely that the time is better spent learning MATLAB or R, given the step-change improvements in speed of processing, as well as the transparency and validation advantages that they offer.

R and MATLAB have equivalent learning curves. Implementing a model in a programming language requires, at the very least, an understanding of control flow, data structures, RNGs, file operations and syntax. Parallelizing a model to run on multiple processing elements requires additional skill, as one must ensure that the implementation is thread-safe. MATLAB offers training courses and certification, and one can find equivalent courses for R.
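To illustrate what thread safety involves in practice, the following Python sketch splits a set of simulations across workers and gives each worker its own random number generator, so that no generator state is shared between threads. It is not taken from the benchmarked models. The function names, the distributions, the threshold of 50,000 and the seeding scheme (`base_seed + i`) are all hypothetical. Python threads are used here only to show the pattern. For CPU-bound work in Python, a process pool would usually be needed to obtain an actual speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed, n_draws):
    """Sum the net monetary benefit over n_draws using a private RNG."""
    rng = random.Random(seed)  # private generator: no state shared across workers
    total = 0.0
    for _ in range(n_draws):
        effect = rng.gauss(0.5, 0.1)        # hypothetical incremental QALYs
        cost = rng.gauss(10000.0, 2000.0)   # hypothetical incremental cost
        total += 50000.0 * effect - cost    # hypothetical willingness-to-pay threshold
    return total


def mean_net_benefit(n_draws=10000, n_workers=4, base_seed=2017):
    """Split the draws into equal chunks, run them concurrently and average."""
    chunk = n_draws // n_workers
    seeds = [base_seed + i for i in range(n_workers)]  # one seed per worker
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in submission order, so the sum is reproducible
        totals = list(pool.map(run_chunk, seeds, [chunk] * n_workers))
    return sum(totals) / (chunk * n_workers)


print(mean_net_benefit())
```

Because each worker draws from its own generator and results are combined in a fixed order, repeated calls with the same `base_seed` return the same value regardless of how the threads are scheduled.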

### 5.5 Capability

TreeAge provides the ability to run specific model types with varying structure. In every other software package, model type and variation are limited only by imagination and computational power. Furthermore, the ability to perform statistical analyses within the model in Excel, R and MATLAB enhances transparency, validation and productivity. With TreeAge, one would have to produce any statistics (for example, a complex model of a parameter) elsewhere and then import or link them into TreeAge. If these statistical analyses are not well documented elsewhere, ambiguity can confound validation and transparency. Excel ranks below the programming languages as its statistical facilities are not as sophisticated.

Ranking of software on four domains of performance and purchase cost

Rank | Transparency and validation | Rank | Simulation time | Rank | Learning curve | Rank | Capability | Rank | Cost (academic) | Cost (commercial)
---|---|---|---|---|---|---|---|---|---|---
1 | MATLAB | 1 | MATLAB | 1 | TreeAge | 1 | MATLAB | 1 | R | R
1 | R | 2 | R | 2 | Excel | 1 | R | 2 | Excel | Excel
3 | Excel | 3 | Excel | 3 | MATLAB | 3 | Excel | 3 | MATLAB | TreeAge
4 | TreeAge | 4 | TreeAge | 3 | R | 4 | TreeAge | 4 | TreeAge | MATLAB
 |  |  |  | 3 | Excel with complex VBA |  |  |  |  | 

## 6 Discussion

It is clear that each of the programs has distinct advantages and disadvantages, and which is the ‘best’ program will depend upon a number of factors, including the purpose in building the model, the pre-existing level of expertise of the analyst, the sophistication of analysis required, the time available for the completion of the analysis, and the financial resources available to support the work.

For educational users, the following observations may prove useful. TreeAge provides an environment in which it is possible to quickly and easily implement concepts discussed in a classroom. It does not require complex mathematical skills, only an understanding of CEM. Implementing concepts such as the half-cycle correction is trivial in TreeAge, as is specifying distributions and model structures. Consequently, students can go from concept to application with ease.

For courses with more emphasis on mathematical content, Excel provides a rapid development environment where students are forced to engage not only with the concepts but also with the math. For example, students must explicitly form a Markov chain, parameterize distributions, draw from those distributions, debug any problems encountered and combine the results in meaningful ways. It is easier to do these things in Excel (without complex VBA) than it is in a programming language.
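As an indication of what forming a Markov chain explicitly entails, whether in spreadsheet cells or in code, the following Python sketch builds a cohort trace for a three-state model and applies a half-cycle correction. It is not one of the benchmarked models. The states, transition probabilities and utilities are hypothetical values chosen only for illustration, and the half-cycle correction is implemented here as a trapezoidal average of the start and end of each cycle.

```python
# Hypothetical three-state model: Well, Sick, Dead (all values are illustrative)
P = [
    [0.85, 0.10, 0.05],  # transitions from Well
    [0.00, 0.80, 0.20],  # transitions from Sick
    [0.00, 0.00, 1.00],  # Dead is absorbing
]
utility = [1.0, 0.6, 0.0]  # hypothetical utility weight per state per cycle


def cohort_trace(P, start, n_cycles):
    """Return the state occupancy at cycle 0, 1, ..., n_cycles."""
    trace = [list(start)]
    n_states = len(P)
    for _ in range(n_cycles):
        prev = trace[-1]
        # Occupancy of state j next cycle = sum over i of prev[i] * P[i][j]
        nxt = [sum(prev[i] * P[i][j] for i in range(n_states)) for j in range(n_states)]
        trace.append(nxt)
    return trace


def total_qalys(trace, utility, half_cycle=True):
    """Sum utility-weighted occupancy over the cycles (undiscounted)."""
    per_cycle = [sum(p * u for p, u in zip(row, utility)) for row in trace]
    if half_cycle:
        # Trapezoidal rule: average the start and end of each cycle
        return sum((a + b) / 2 for a, b in zip(per_cycle, per_cycle[1:]))
    return sum(per_cycle[1:])  # end-of-cycle occupancy only


trace = cohort_trace(P, [1.0, 0.0, 0.0], n_cycles=10)
print(total_qalys(trace, utility, half_cycle=True))
print(total_qalys(trace, utility, half_cycle=False))
```

Each row of the trace sums to one, which is a simple check students can use when debugging. A probabilistic analysis would wrap these functions in a loop that redraws the transition probabilities and utilities from their assigned distributions.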

Advanced courses would benefit from the use of a programming language. The students are forced to engage with the math (as with Excel), but they are also gaining the additional skill of basic programming. Statistical methods often used in HTA, such as survival analysis, can easily be incorporated and enhance the learning outcomes. If a student’s program includes a rigorous training in econometrics, economies of scope will be experienced if there is a common package used, such as R or MATLAB.

For commercial users, the scope of analyses that are likely to be undertaken across projects, rather than for any one project, is an important consideration. Here, the flexibility of a programming language may be necessary. While MATLAB poses higher acquisition costs, these may be quickly offset by productivity gains. This is especially true for those users who incorporate more burdensome computations into their projects. If such analyses are not a consistent characteristic of the portfolio of work, R may well represent a more efficient investment proposition.

Companies whose work program is made up entirely of submissions to HTA organizations, such as NICE in the UK or the Canadian Agency for Drugs and Technologies in Health (CADTH) in Canada, can currently meet their needs using any of the software programs compared in this paper. That said, as HTA agencies engage with earlier-stage evidence for technologies approved under conditional licensing processes, the nature of the evidence required by HTA agencies may change. For example, the UK’s newly implemented Cancer Drugs Fund process creates funding arrangements that are conditional upon the generation of additional evidence [39]. What new evidence will be required is to be defined by NICE. Relatively sophisticated VOI analyses may be required to identify the appropriate sample size for these on-market studies, and these analyses might not be feasible to implement using some software programs. Indeed, HTA agencies must themselves consider the virtues of demanding a higher degree of sophistication in their submissions.

## Notes

### Compliance with Ethical Standards

### Funding

This study was funded through grants from the Canadian Institutes of Health Research (CIHR) and Genome Canada. Christopher McCabe is supported through a Capital Health Research Chair in Emergency Medicine Research.

### Conflicts of interest

Mike Paulden and Christopher McCabe have taught introductory courses on decision modelling using Microsoft Excel, but have no relationships with the developer and have received no financial benefits for using this software to teach these courses. Petros Pechlivanoglou has taught introductory courses on decision modelling using R and has contributed to decision modelling courses that use TreeAge, but has no relationships with any of the developers and has received no financial benefits for using this software to teach these courses. Chase Hollman, Mike Paulden, Petros Pechlivanoglou and Christopher McCabe have no other potential conflicts of interest to report.

### Author contributions

Mike Paulden built the TreeAge model used for the benchmark comparisons and rebuilt this model using Microsoft Excel. Chase Hollman rebuilt this model in MATLAB and R, with support from Petros Pechlivanoglou, and conducted the benchmarking exercise. Christopher McCabe supervised the project. Chase Hollman wrote the first draft of the manuscript. All authors contributed to subsequent drafts of the manuscript, responses to peer review, and preparation of the manuscript for publication.

### Data availability statement

We have provided the models used in our benchmarking exercise as supplementary material.

### References

- 1.Tosh J, Wailoo A. Review of Software for Decision Modelling. Report by the NICE Decision Support Unit. 2008. http://www.nicedsu.org.uk/PDFs%20of%20reports/softwarereport-final.pdf. Accessed 1 May 2017.
- 2.Menn P, Holle R. Comparing three software tools for implementing Markov models for health economic evaluations. Pharmacoeconomics. 2009;27:745–53.
- 3.Davis S, Stevenson M, Tappenden P, Wailoo A. NICE DSU technical support document 15: cost-effectiveness modelling using patient-level simulation. Report by the NICE Decision Support Unit. 2014. http://www.nicedsu.org.uk/TSD15_Patient-level_simulation.pdf. Accessed 1 May 2017.
- 4.Jalal H, Pechlivanoglou P, Krijkamp E, Alarid-Escudero F, Enns E, Hunink MG. An overview of R in health decision sciences. Med Decis Making. 2017. doi:10.1177/0272989X16686559.
- 5.Microsoft Corporation. Increase the productivity of Users’ with enhanced Office.js APIs in Office 2016. 2015. https://dev.office.com/blogs/Office-js-Public-Preview. Accessed 1 May 2017.
- 6.Edlin R, McCabe C, Hulme C, et al. Cost effectiveness modelling for health technology assessment: a practical course. Heidelberg: Springer; 2015.
- 7.The Document Foundation. LibreOffice. 2017. https://www.libreoffice.org/. Accessed 1 May 2017.
- 8.The Apache Software Foundation. OpenOffice. 2017. https://www.openoffice.org/. Accessed 1 May 2017.
- 9.The Document Foundation. Frequently asked questions - Calc. 2016. https://wiki.documentfoundation.org/Faq/Calc. Accessed 1 May 2017.
- 10.Microsoft Corporation. Excel specifications and limits. 2017. https://support.office.com/en-us/article/Excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3. Accessed 1 May 2017.
- 11.Botchkarev A. Assessing Excel VBA Suitability for Monte Carlo Simulation. Spreadsheets Educ (eJSiE). 2015;8(2):3.
- 12.Microsoft Corporation. Description of the RAND function in Excel. 2011. https://support.microsoft.com/en-us/help/828795/description-of-the-rand-function-in-excel. Accessed 1 May 2017.
- 13.Microsoft Corporation. Multithreaded recalculation in excel. 2012. https://msdn.microsoft.com/en-us/library/office/bb687899.aspx. Accessed 1 May 2017.
- 14.Microsoft Corporation. Excel 2010 performance: Tips for optimizing performance obstructions. 2011. https://msdn.microsoft.com/en-us/library/office/ff726673(v=office.14).aspx. Accessed 1 May 2017.
- 15.Microsoft Corporation. Microsoft Store—Microsoft Office. 2017. https://www.microsoftstore.com/store/msusa/en_US/list/Office/categoryID.71148700. Accessed 1 May 2017.
- 16.TreeAge Software Inc. Products—TreeAge Software. 2017. https://www.treeage.com/shop/. Accessed 1 May 2017.
- 17.TreeAge Software Inc. TreeAge Pro 2017 User’s Manual. 2017. http://files.treeage.com/treeagepro/17.1.0/20170109/TP-Manual-2017R1.pdf. Accessed 1 May 2017.
- 18.Hollenberg J. Markov cycle trees: a new representation for complex Markov processes. Med Decis Making. 1984;4:529–30.
- 19.Claxton K, Eggington S, Ginnelly L, et al. A pilot study of value of information analysis to support research recommendations for NICE. CHE Research Paper 4. York: Centre for Health Economics, University of York; 2005. https://www.york.ac.uk/media/che/documents/papers/researchpapers/rp4_Pilot_study_of_value_of_information_analysis.pdf. Accessed 1 May 2017.
- 20.Ades AE, Lu G, Claxton K. Expected value of sample information calculations in medical decision modeling. Med Decis Making. 2004;24:207–27.
- 21.Matsumoto M. Mersenne Twister Home Page. 2011. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html. Accessed 1 May 2017.
- 22.GNU Octave. About GNU Octave. 2017. https://www.gnu.org/software/octave/about.html. Accessed 1 May 2017.
- 23.Mathworks Inc. File Exchange. 2017. https://www.mathworks.com/matlabcentral/fileexchange/. Accessed 1 May 2017.
- 24.Octave-Forge. Extra packages for GNU Octave. 2017. https://octave.sourceforge.io/packages.php. Accessed 1 May 2017.
- 25.Mathworks Inc. Products and Services. 2017. https://www.mathworks.com/products.html. Accessed 1 May 2017.
- 26.Mathworks Inc. SimEvents. 2017. https://www.mathworks.com/products/simevents.html. Accessed 1 May 2017.
- 27.Mathworks Inc. Data Types. 2017. https://www.mathworks.com/help/matlab/data-types_data-types.html. Accessed 1 May 2017.
- 28.Mathworks Inc. Mathworks Store. 2017. https://www.mathworks.com/store/. Accessed 1 May 2017.
- 29.R Foundation for Statistical Computing. What is R? 2017. https://www.r-project.org/about.html. Accessed 1 May 2017.
- 30.Revolution Analytics. Revolutions: popularity. 2017. http://blog.revolutionanalytics.com/popularity/. Accessed 1 May 2017.
- 31.Foundation for Open Access Statistics. Journal of Statistical Software. 2017. https://www.jstatsoft.org/index. Accessed 1 May 2017.
- 32.Microsoft Corporation. Microsoft R Open: The Enhanced R Distribution. 2017. https://mran.microsoft.com/open/. Accessed 1 May 2017.
- 33.Eddelbuettel D. CRAN Task View: High-Performance and Parallel Computing with R. 2017. https://cran.r-project.org/web/views/HighPerformanceComputing.html. Accessed 1 May 2017.
- 34.Eddy DM, Hollingworth W, Caro JJ, et al. Model transparency and validation: a report of the ISPOR-SMDM modeling good research practices task force-7. Med Decis Making. 2012;32:733–43.
- 35.Sculpher M, Drummond M, Buxton M. Economic evaluation in health care research and development: undertake it early and often. Health Economics Research Group discussion paper no. 12. London: Brunel University, HERG; 1995.
- 36.Paulden M, Franek J, Pham B, et al. Cost-effectiveness of the 21-gene assay for guiding adjuvant chemotherapy decisions in early breast cancer. Value Health. 2013;16:729–39.
- 37.Sadatsafavi M, Bansback N, Zafari Z, Najafzadeh M, Marra C. Need for speed: an efficient algorithm for calculation of single-parameter expected value of partial perfect information. Value Health. 2013;16(2):438–48.
- 38.Strong M, Oakley JE, Brennan A. Estimating multiparameter partial expected value of perfect information from a probabilistic sensitivity analysis sample: a nonparametric regression approach. Med Decis Making. 2014;34(3):311–26.
- 39.NHS England Cancer Drugs Fund Team. Appraisal and funding of cancer drugs from July 2016 (including the new Cancer Drugs Fund): a new deal for patients, taxpayers and industry. 2016. https://www.england.nhs.uk/wp-content/uploads/2013/04/cdf-sop.pdf. Accessed 1 May 2017.