Key Points for Decision Makers

This article demonstrates that the Markov decision process (MDP) has the potential to steer the optimization of sequential treatment to facilitate personalized treatment decisions.

This article specifically identifies applications of the MDP that have been used to address sequential decision problems in somatic diseases. The results indicate that the MDP could potentially be useful for addressing sequential decision making in depression.

Our study reveals that, although the structure of the state-transition model could potentially be suitable for extension into the MDP model, doing so would require a sufficiently extensive model.

1 Introduction

Depression is one of the most burdensome and costly of all mental health disorders, with a worldwide average lifetime and 12-month prevalence of 14.6% and 5.5%, respectively [1]. People with depression experience impairment in daily life, resulting in a quality of life that is lower than in the general population [2]. According to WHO projections, depression will rank first in terms of disability-adjusted life-years lost by 2030 [3]. The economic burden of depression is also high, having been estimated at US$326.2 billion for the United States in 2018 (price level 2020) [4]. Depression thus imposes a high burden on society, the healthcare system, and individuals [5]. To reduce this burden and support appropriate treatment selection, increasing attention is being directed to studies comparing different treatments regarding health outcomes and cost effectiveness. Most previous studies have examined only limited numbers of different treatments (e.g., psychotherapies, pharmacotherapy, brain stimulation therapy [6,7,8], genetic testing to support targeted therapy [9]), using different health economic (HE) models. While such studies have supported choices between different treatments, they have yielded little insight into treatment duration or sequential treatment choices.

To date, no consensus has been reached concerning how long (e.g., days, weeks, months, years) a patient should be treated with a specific treatment for depression [10]. Furthermore, it is unclear how consecutive treatments should be selected when initial treatment is not successful. One widespread approach is stepped care: a gradual increase in the intensity of treatments [11]. More recently, however, scholars have been directing greater attention to matched care [12], which implies that initial and sequential treatment steps are carefully adjusted to the personal characteristics and treatment history of the individual. Such adjustments are usually pragmatic and based on general guidelines, although they might also be informed by data-driven optimization.

The Markov decision process (MDP) is a mathematical model for sequential decisions and dynamic optimization [13], which generalizes standard Markov models by embedding a sequential decision process into the model and allowing multiple decisions in multiple time periods [14]. To support optimization, MDP models have been applied to address a variety of industrial operation problems, including cost-effective maintenance [15,16,17,18], electricity supply [19], and dynamic pricing [20]. Recent studies have demonstrated that MDP has potential to support clinical decision making [14]. Steimle and Denton [21] argue that the MDP model is essential for guiding decision makers in treatment decisions for chronic diseases, as it provides an analytical framework for studying sequential decisions. The framework is very general, however, and not geared toward specific diseases, nor does it contain actual input data. The feasibility of its actual application is therefore unclear. For this reason, two questions are worth exploring. The first concerns the identification of any actual applications of MDP within the field of healthcare, and the second concerns whether MDP could be fruitfully applied to address treatment decision issues in depression.

Given the state of knowledge as described above, the primary aim of this article was to examine how MDP has been implemented by reviewing all existing applications of MDP to medical decision making for diseases. It also provides a review of existing HE models of depression and an analysis of the potential of MDP to support sequential treatment decisions in depression, based on the reformulation of an existing HE model of depression and an assessment of the suitability of MDP.

2 Background

State-transition models (STMs) conceptualize a decision problem in terms of a set of health (or other) states and transitions among these states [22]. They are structured around a set of mutually exclusive and collectively exhaustive health states, transitions, initial-state vectors, transition probabilities, cycle lengths, and state values (‘rewards’). In this background section, the elements of an MDP are defined and compared with their analogues in STMs. The basic definition of an MDP comprises five elements \((\mathrm{T}, \mathrm{S}, A_{s}, \mathrm{P}(\cdot \mid s,a), r_{t}(s,a))\), described using a standard notation [23]. To build an MDP model, the decision epochs (\(\mathrm{T} = 1, 2, \ldots, \mathrm{N}\)), state space (\(\mathrm{S}\)), action space (\(A_{s}\)), transition probabilities (\(\mathrm{P}(\cdot \mid s,a)\)), and rewards (\(r_{t}(s,a)\)) should be defined. All elements of an MDP are listed in Table 1, in comparison with the corresponding elements in a cohort-level STM.
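As a minimal sketch of how the five elements might be held together in code, the following Python container stores decision epochs, states, state-specific actions, transition probabilities, and rewards. The class name `FiniteMDP`, the two-state example, and all numbers are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass
class FiniteMDP:
    """Hypothetical container for the five MDP elements (T, S, A_s, P, r)."""
    horizon: int                                   # T = 1, ..., N decision epochs
    states: List[State]                            # S
    actions: Dict[State, List[Action]]             # A_s: actions allowed in state s
    transitions: Dict[Tuple[State, Action], Dict[State, float]]  # P(. | s, a)
    rewards: Dict[Tuple[State, Action], float]     # r(s, a), time-invariant here

    def validate(self) -> None:
        """Check that every (s, a) pair has a reward and a proper distribution."""
        for s in self.states:
            for a in self.actions[s]:
                row = self.transitions[(s, a)]
                if abs(sum(row.values()) - 1.0) > 1e-9:
                    raise ValueError(f"P(.|{s},{a}) does not sum to 1")
                if (s, a) not in self.rewards:
                    raise ValueError(f"missing reward for ({s},{a})")


# Illustrative two-state example with made-up numbers.
toy = FiniteMDP(
    horizon=3,
    states=["ill", "well"],
    actions={"ill": ["treat", "wait"], "well": ["wait"]},
    transitions={
        ("ill", "treat"): {"ill": 0.4, "well": 0.6},
        ("ill", "wait"): {"ill": 0.8, "well": 0.2},
        ("well", "wait"): {"ill": 0.1, "well": 0.9},
    },
    rewards={("ill", "treat"): 0.5, ("ill", "wait"): 0.6, ("well", "wait"): 1.0},
)
toy.validate()
```

Keeping the action set dependent on the state mirrors the notation \(A_{s}\); a time-dependent reward \(r_{t}(s,a)\) would require an additional epoch index on the reward dictionary.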

Table 1 Elements of a Markov decision process (MDP) and comparable structures in a cohort-level state-transition model (STM)

As demonstrated by this comparison, an MDP can be regarded as an extension of an STM. The difference is the addition of actions (e.g., stop treatment, remain on current treatment, change treatment) and rewards, which may depend on these actions, with transition probabilities being conditional on both current state and current action. Conversely, if each state has only one action and if all rewards depend only on the state, the MDP reduces to an STM. Note that most STMs that have been applied to actual HE evaluations deviate in some way from the pure Markov property (e.g., because mortality depends on age or because some pay-offs vary according to both state and model run-time).
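The reduction from an MDP to an STM can be sketched in a few lines: once one action is fixed in every state, the MDP's action-specific transition rows collapse into a single transition matrix, which can then be run as an ordinary cohort trace. The three-state structure, the action names, and all probabilities below are made up for illustration.

```python
# Hypothetical MDP transition arrays P[action][state][next_state]; numbers are made up.
P = {
    "treat":    [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.0, 0.0, 1.0]],
    "no_treat": [[0.8, 0.2, 0.0], [0.2, 0.6, 0.2], [0.0, 0.0, 1.0]],
}


def induced_matrix(P, policy):
    """Row s of the STM matrix is the MDP row for the action the policy fixes in s."""
    return [P[policy[s]][s] for s in range(len(policy))]


def cohort_trace(initial, matrix, cycles):
    """Propagate a cohort distribution through a fixed transition matrix."""
    trace = [list(initial)]
    n = len(initial)
    for _ in range(cycles):
        prev = trace[-1]
        trace.append([sum(prev[i] * matrix[i][j] for i in range(n)) for j in range(n)])
    return trace


# One action per state turns the MDP into a plain Markov chain (an STM).
policy = ["no_treat", "treat", "no_treat"]
stm_matrix = induced_matrix(P, policy)
trace = cohort_trace([0.0, 1.0, 0.0], stm_matrix, cycles=5)
```

Comparing two pre-specified strategies in an STM then corresponds to running `cohort_trace` once per fixed policy, whereas an MDP searches over the policies themselves.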

It stands to reason that no specific corresponding analogy exists for the MDP actions and decision rules, given that STMs are applied in HE evaluations primarily to compare two or more pre-specified strategies or scenarios. In contrast, states (as applied in STMs) are very similar to the states distinguished in MDPs. The decision epochs of an MDP are a set of points at which decisions are made, and they are analogous to the cycle time in standard Markov models.

While cohort-level Markov models can thus be extended into an MDP at the aggregate level, it is also possible to define an MDP with parameters that depend on individual characteristics and to define optimal strategies that vary by individual. Such patient-level MDPs could be regarded as an extension of patient-level STMs, sometimes also called microsimulation models, or patient-level Markov models. Finally, MDPs can be defined in continuous rather than discrete time, and with a finite or infinite time horizon [24].

3 Methods

We performed two reviews, the first to identify existing applications of the MDP in treatment of disease and the second to identify existing HE models in depression. Data were extracted from MDP applications to articulate assumptions and requirements for the MDP. We then illustrated the elements of an MDP by reformulating an existing HE model and examining its added value. This served as input for discussing the suitability of an MDP for solving sequential treatment decisions in depression. The methodological framework of the present study is displayed in Fig. 1. The protocols for the two reviews were registered in the Open Science Framework.

Fig. 1

Methodological framework of the present study. MDP Markov decision process, HE health economic

3.1 Review of Markov Decision Process (MDP) and Health Economic (HE) Models

The two reviews followed the guidelines for Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA ScR) (see Appendix Part 1 in the electronic supplementary material [ESM] for the checklist).

The search strings for existing applications of MDP and HE decision models were designed to identify relevant literature (see Appendix Part 2 in the ESM). Web of Science and PubMed were searched in September 2021. An article was eligible for inclusion only if it addressed the treatment of diseases rather than the optimization of hospital operations, surgical techniques, or the application of healthcare devices using MDP. In the review of HE models for depression, publications were eligible for inclusion only if they concerned the economic evaluation of treatments for depression. Both reviews excluded papers published in languages other than English, meeting abstracts, reviews, and publications that were not available in full text.

After eliminating duplicates, two reviewers (F.L, X.L) independently screened titles and abstracts. Disagreements were initially addressed through discussion and consensus. Any remaining disputes between the two reviewers were solved by appealing to a third author (T.F). Two authors (F.L, X.L) abstracted data on general study characteristics using a data extraction form.

For the MDP review, data extraction focused on the structure of the MDP in each of the applications to evaluate the assumptions and requirements of MDP. The following elements were extracted: time horizon, disease, state space, action space, reward function, and main perspective. The authors also attempted to extract the requirements and assumptions of MDP when applied in healthcare settings based on the studies identified. Both general and specific assumptions related to specific applications were included.

The review of HE models for depression started with the categorization of model structures. Given our interest in the structure of STMs and whether this structure could be used as a starting point for MDPs, we retained only models that are structured as a set of health (or other) states and transitions among these states. For these studies, further information on each model was collected. General study characteristics were authors and year, treatment types, and their comparators. Model characteristics were health states, time horizon, cycle length, and aim.

3.2 Illustration of Elements of an MDP Using a Reformulation of an Existing HE Model into an MDP

We use a case study to illustrate how a real-world HE decision problem can be reformulated as an MDP. The required model elements (e.g., states, transition probabilities, costs, quality-of-life weights) were first extracted from an existing HE model. We then translated the HE model to an MDP formulation based on the information collected. To investigate the consistency of conclusions between the existing HE model and the MDP approach, we compared the results from the existing HE model to those of the MDP model. Finally, we discussed the potential added value of MDP after reformulation.

3.3 Assessment of the Suitability of MDP for Optimizing Sequential Treatment in Depression

After comparing the findings of the two reviews to clarify the assumptions about the use of MDP in depression, we examined whether they might be satisfied. We then discussed how to define MDP structure when used in depression. In this step, we also discussed challenges associated with using MDP to optimize sequential treatment decisions for depression.

4 Results

4.1 Overview of Existing Applications of MDP in Treatment of Disease

All existing applications of MDP to optimize treatment concern somatic disease. As shown in Fig. 2, we selected a total of 23 applications of MDP for inclusion in the review. An overview of the characteristics of these applications is provided in Table 2.

Fig. 2

Flow chart of study selection for MDP applications in the field of healthcare. MDP Markov decision process

Table 2 Summary of MDP model applications in healthcare

Researchers have applied MDP models to optimize initial treatment selection [25, 26] and the timing of transplantation [27, 28], to compare the effectiveness of different combinations of treatment [29], to optimize screening policy [30], and to prevent disease-related complications [31]. The majority (16 studies), however, concern the optimization of treatment decisions [32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47]. By disease area, five studies use the MDP to optimize decisions for cancer [30, 35, 39, 42, 43], five focus on diabetes mellitus [31,32,33,34, 41], and the remaining (N = 13) studies are concerned with liver diseases [27, 28], high blood pressure/hypertension [37, 40], hepatitis C [44], atherosclerotic cardiovascular disease [45], ischemic heart disease [29, 36], atrial fibrillation [38], anemia [47], tuberculosis [46], aneurysms [25], and stroke [26].

The MDP approach has been used to determine the optimal sequence of chemotherapy and radiation therapy [35, 39, 42, 43] and to select the appropriate drugs for anemia [47], tuberculosis [46], atherosclerotic cardiovascular disease [45], and hepatitis C [44]. In studies by Meng et al. [32], Mason et al. [33], and Shifrin and Siegelmann [41], MDP is applied to optimize the management of diabetes medication for glycemic control. An MDP-based treatment recommendation system for diabetes medication steps has also been proposed by Oh et al. [34]. In studies by Choi et al. [37] and Schell et al. [40], MDP is used to develop an automated strategy to select suitable anti-hypertensive medications and dosages for patients, thus accounting for their heterogeneity. In contrast, the articles by Ibrahim et al. [38] and Hauskrecht and Fraser [36] are primarily theoretical and do not apply MDP to any actual clinical settings.

Of the studies identified, 11 address treatment decisions at the individual level, especially in applications for diabetes and ischemic heart disease [32, 33, 35,36,37,38, 40, 41, 44, 45, 47]. They apply risk engines using individual-level covariates (e.g., the Framingham model [48] and the UKPDS risk engine [49]) to calculate transition probabilities between states with different treatments. In the gastro-esophageal cancer treatment application, the transition probability is calculated individually using the expected toxicity level and demographic variables [35]. In contrast, for the hypertension application, the authors examined several individual-level covariates, including 11 variables used as treatment effect modifiers to modify baseline risks [37, 40]. Ibrahim et al. [38] include different transition probabilities when analyzing/optimizing the length of the initiation period of anticoagulation therapy.

In all, 20 studies concern MDPs with a finite time horizon, while another three articles involve MDPs with an infinite time horizon [27, 28, 36]. Infinite-horizon MDPs do not require a pre-defined time horizon. For most algorithms to work and result in a well-defined optimal solution, however, these models do require a boundedness condition on the value function.
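One common sufficient condition for such boundedness, offered here as an illustration rather than as the condition used in the reviewed studies, is that per-epoch rewards are bounded, \(|r(s,a)| \le r_{\max}\), and that future rewards are discounted by a factor \(0 \le \gamma < 1\) (the symbols \(r_{\max}\), \(\gamma\), and \(\pi\) are assumptions introduced for this sketch). The value of any policy \(\pi\) is then bounded by a geometric series:

\[
|v^{\pi}(s)| \;\le\; \sum_{t=0}^{\infty} \gamma^{t}\, r_{\max} \;=\; \frac{r_{\max}}{1-\gamma}.
\]

With \(\gamma = 0.97\), for example, the bound is \(r_{\max}/0.03 \approx 33.3\, r_{\max}\), so the infinite sum of rewards remains finite even without a pre-defined time horizon.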

Most studies define states discretely, according to clinically relevant variables, with numbers ranging widely from 4 to 8492 states (see Table 2), except for one study [36] that reports 11 state variables rather than listing all the states. Nine studies consider three actions [25, 28, 30, 34, 37, 39, 43, 45, 46], while six consider two actions [27, 31,32,33, 41, 44], four use five actions [29, 35, 38, 40], three use four actions [26, 36, 47], and one does not specify the number of actions [42]. In most cases, a larger number of actions is associated with greater complexity in the process of finding an optimal solution. Rewards most frequently consist exclusively of health benefits, with more than half of the studies having optimal treatment outcomes as their objective [25,26,27,28,29, 31, 32, 34, 37, 38, 40,41,42,43,44,45, 47]. Only three studies focus on minimizing costs [30, 35, 36], and three other studies use the combination of treatment outcomes and costs (or net benefits) as the reward function [33, 39, 46].

4.1.1 Assumptions and Requirements of MDP

An MDP model explicates a stochastic control process and, in addition to the decision epochs, formally consists of four essential elements: states, actions, transition probabilities, and rewards. Three assumptions are common to all studies in clinical settings: (i) both the state space and the action space are finite sets; (ii) an absorbing state (either death or severe functional impairment) is included in the Markov process, which is essential for any finite-horizon MDP to obtain an optimal solution; (iii) MDP states are observable and mutually exclusive. Several authors make additional assumptions based on the characteristics of specific research questions. For example, Alagoz et al. [28] assume that the reward function is positive and non-increasing in a particular state after a cadaveric liver transplant action. This implies that the intermediate reward does not increase as the patient deteriorates. For diabetes, Eghbali-Zarch et al. [31] assume that the decision to initiate insulin is irreversible (implying that, once patients initiate insulin, they remain on it until the end of the time horizon), thus avoiding optimal strategies that would not correspond to clinical practice. Similarly, to mimic current clinical practice, Choi et al. [37] exclude a dosage decrease from the action space. In the study by Kim et al. [42], a non-zero dose is assumed in each treatment. In this sense, additional assumptions could be added to the model to accommodate current treatment practice or to avoid clinically unrealistic or unacceptable solutions.

4.2 Overview of Existing HE Decision Models for Depression

In all, we identified 63 existing HE decision models for depression, more than half of which are STMs (Appendix Fig. S1 in ESM). The number of model states distinguished varies from three to eight (Appendix Table S1 in ESM). In 21 studies, states are defined by disease severity in terms of clinically relevant criteria (e.g., symptom scores for depression). Only five studies have a lifetime time horizon [50,51,52,53,54]. In the remaining studies, except for one study with a very short time horizon (3 months) [55] and one with a relatively long time horizon (11 years) [56], the time horizon varies between 1 and 5 years [6,7,8,9, 57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77].

The models focus predominantly on five categories of interventions (Appendix Fig. S2 in ESM). In all, 16 studies use a healthcare perspective [7, 8, 50,51,52, 63, 65,66,67,68, 70,71,72,73, 75, 76], while nine adopt a societal perspective [6, 9, 53, 55, 56, 59, 60, 64, 77]. Only two studies use the payer perspective [57, 62], and the rest present results for both the healthcare and societal perspectives [54, 58, 61, 69, 74].

4.3 Illustration of Elements of an MDP Using a Reformulation of an Existing HE Model into an MDP

We applied the MDP to reproduce the research carried out by Ssegonja et al. [74]. This study was chosen for three reasons. First, it involves a relatively small number of states, such that it is easy for readers to understand and suitable for use as an example. Second, the study reports all model parameters clearly, providing a basis for reformulating it into MDP. Finally, the model structure of a pure STM (rather than a combination of Markov and decision tree) facilitates reformulation.

The study by Ssegonja et al. [74] uses a cost-effectiveness analysis at the cohort level to compare a group-based cognitive behavior therapy (GB-CBT) preventive intervention for depression with a non-intervention option in Sweden for adolescents, using an STM. The transition from subthreshold depression to depression and from subthreshold to healthy was affected by GB-CBT, as illustrated in Fig. 3.

Fig. 3

Simplification of model structure in the original paper [74]

Translating this decision problem to an MDP formulation, Fig. 4 displays the process of reformulating an existing study into an MDP model, designed to explore the best decision between treating adolescents with GB-CBT and leaving them untreated. The possible decisions are represented by the actions (treating with GB-CBT or leaving untreated).

Fig. 4

Process of the Markov decision process (MDP) model based on the original model by Ssegonja et al. [74]. Note: \(s\) denotes the current state; \(s'\) denotes the next state; \(R_{t}\) denotes the reward at time \(t\); \(\gamma\) is the discount factor; \(Q(s_{t}, a_{t})\) indicates the monetary value of quality-adjusted life-years in state \(s\) at time \(t\), taking decision \(a\); \(C(s_{t}, a_{t})\) indicates the total cost in state \(s\) at time \(t\), taking decision \(a\); \(v(s)\) denotes the state value function, which is the expected monetary return starting from state \(s\); \(q(s_{t}, a)\) indicates the expected monetary return starting from state \(s\), taking action \(a\) at time \(t\); \(v^{*}(s)\) indicates the optimal value function over all decisions in state \(s\); \(q^{*}(s_{t}, a)\) is the optimal value function for action \(a\) in state \(s\); \(t\) is measured in years

According to the original study, the discount factor \(\gamma\) was 0.97. The Bellman optimality equation was used to find the solution [78]. Based on the optimal state value function \(v^{*}(s)\) at the following decision epoch, the optimal action-value function \(q^{*}(s_{t}, a)\) was calculated, as shown in Fig. 4. The model was coded in Python (version 3.3.8) using the MDP toolbox [79]. In keeping with the uncertainty analysis in the original study, we also considered different willingness-to-pay (WTP) thresholds. The values for each state are presented in Table 3, along with different WTP thresholds. Note that, for this simple example, the optimization could be simplified to deciding whether GB-CBT should be implemented in the first epoch, given that the action space for each decision epoch except the first is confined to a single action.
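The backward recursion implied by the Bellman optimality equation can be sketched in plain Python. This is not the authors' code and does not use the MDP toolbox; it is a self-contained illustration under stated assumptions. Only the discount factor of 0.97 and the restriction of the intervention to the first epoch come from the text; the three state names, the transition probabilities, the QALY weights, and the costs are hypothetical placeholders.

```python
GAMMA = 0.97  # discount factor reported in the text; everything else is hypothetical

STATES = ["healthy", "subthreshold", "depressed"]

# P[action][state] -> distribution over next states, ordered as STATES (made-up numbers)
P = {
    "no_intervention": {
        "healthy":      [0.90, 0.08, 0.02],
        "subthreshold": [0.20, 0.60, 0.20],
        "depressed":    [0.10, 0.30, 0.60],
    },
    "intervention": {
        "subthreshold": [0.35, 0.55, 0.10],
    },
}
QALY = {"healthy": 0.95, "subthreshold": 0.80, "depressed": 0.60}    # per year, hypothetical
COST = {"healthy": 0.0, "subthreshold": 500.0, "depressed": 4000.0}  # per year, hypothetical
INTERVENTION_COST = 300.0                                            # hypothetical


def reward(state, action, wtp):
    """R = WTP * Q(s, a) - C(s, a): net monetary benefit for one year."""
    cost = COST[state] + (INTERVENTION_COST if action == "intervention" else 0.0)
    return wtp * QALY[state] - cost


def allowed_actions(state, t):
    """The intervention is only available for subthreshold depression at the first epoch."""
    if t == 1 and state == "subthreshold":
        return ["no_intervention", "intervention"]
    return ["no_intervention"]


def backward_induction(horizon, wtp):
    """Return the epoch-1 optimal values and all action values q[(t, s)][a]."""
    v_next = {s: 0.0 for s in STATES}  # terminal value after the last epoch
    q = {}
    for t in range(horizon, 0, -1):
        v_now = {}
        for s in STATES:
            q_sa = {}
            for a in allowed_actions(s, t):
                expected = sum(p * v_next[s2] for p, s2 in zip(P[a][s], STATES))
                q_sa[a] = reward(s, a, wtp) + GAMMA * expected
            q[(t, s)] = q_sa
            v_now[s] = max(q_sa.values())
        v_next = v_now
    return v_next, q


v1, q = backward_induction(horizon=10, wtp=20000.0)
best = max(q[(1, "subthreshold")], key=q[(1, "subthreshold")].get)
```

Because every epoch after the first offers a single action, the maximization only has an effect at `t == 1`, which reflects the simplification noted above.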

Table 3 Value of different states with different willingness-to-pay thresholds

At the WTP threshold value of US$20,000/QALY, \(q^{*}(\text{subthreshold}, \text{intervention})\) was US$134,000 at t = 1, and \(q^{*}(\text{subthreshold}, \text{no intervention})\) was US$131,000. The optimal action value when choosing to implement GB-CBT is therefore higher than for the alternative strategy, and the former is thus optimal. This means that choosing the intervention yields a positive net monetary benefit. We therefore conclude that adolescents can benefit from the GB-CBT preventive intervention and that it can also generate good value for money, as compared with leaving adolescents with subthreshold depression untreated. This conclusion is consistent with that of Ssegonja et al. [74].

In contrast to the original HE model, the MDP structure allows for more flexibility. We could now extend the action space and consider other strategies (e.g., starting the preventive treatment after a person has been in the subthreshold space for one period). This could be achieved by separating more minor decision epochs that would allow interventions to be performed at more appropriate times, as well as by increasing the number of actions, making it possible to compare multiple preventive treatments simultaneously. In addition, the comparison between different strategies is based on the reward function, and it might therefore be relatively easy to vary the weight assigned to health outcomes or costs to investigate impact on the optimal decision.
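To illustrate how the weight on health outcomes in the reward function can be varied, the sketch below values QALYs at different WTP thresholds and reports which strategy has the higher net monetary benefit. The strategy totals are hypothetical placeholders chosen for the example and are not results from the original study or from Table 3.

```python
def net_monetary_benefit(qalys, costs, wtp):
    """Reward on a monetary scale: health valued at wtp per QALY, minus costs."""
    return wtp * qalys - costs


# Hypothetical discounted totals per strategy: (QALYs, costs). Not from the source.
STRATEGIES = {
    "GB-CBT": (8.10, 12000.0),
    "no intervention": (8.00, 11000.0),
}


def preferred_strategy(strategies, wtp):
    """Return the strategy name with the highest net monetary benefit at this WTP."""
    return max(strategies, key=lambda name: net_monetary_benefit(*strategies[name], wtp))


# Sweep the weight placed on health outcomes.
for wtp in (5000.0, 20000.0, 50000.0):
    print(wtp, preferred_strategy(STRATEGIES, wtp))
```

With these made-up inputs, the preferred strategy switches where the WTP threshold crosses the incremental cost-effectiveness ratio, (12,000 − 11,000) / (8.10 − 8.00) = 10,000 per QALY.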

4.4 Assessment of the Suitability of MDP for Solving Sequential Treatment Decisions in Depression

The Markov property is a precondition for any MDP. To assess the suitability of MDP for depression, it is important to recognize two important assumptions of an MDP. First, the state space and the action space are finite. A state explosion might occur, especially in a state-transition system with many processes or a complex data structure. This means that an infinite number of states could trap the model in an endless loop, causing it to fail in finding the optimal solution. All existing HE models for depression consist of a finite number of states (varying from three to eight), indicating that the application of MDP to optimizing sequential treatment decisions for depression would probably not result in a state explosion problem. The second assumption is that MDP states are observable, which essentially corresponds to the situation in which we know with certainty the disease from which the patient is suffering at all epochs.

As for other diseases, the five core elements of MDP for depression are decision epoch, state space, action space, reward, and transition probabilities. The decision epoch of the MDP structure could be the beginning of each treatment cycle, with a decision made at every clinical visit. In practice, this would depend on the frequency of visits. The MDP states could be defined by depression severity. Depression differs from many somatic illnesses, in which states are distinguished according to clinical parameters (e.g., blood glucose level in diabetes mellitus). Such clinical parameters are not easily defined for depression. As illustrated by the review of HE models, the states in most studies concerning depression are defined by disease severity in terms of clinically relevant criteria (e.g., symptom scores for depression).

Regarding the action space, depression interventions can largely be divided into two categories: psychotherapies and medications. The action/treatment choice for a patient at a specific point in time could thus be simplified to no intervention, psychotherapy, medication use, or both. In reality, however, many different medications and psychotherapies might be distinguished, and different intensities (dosages and hours of therapy per unit of time) and combinations could be considered. Finally, QALYs, costs, or their combination could serve as a reward, depending on the objective of the decision maker.
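The growth of the action space as treatments and intensities are distinguished can be made concrete with a small enumeration. The medication names, therapy names, dose levels, and hours per week below are hypothetical placeholders, not a proposed clinical classification.

```python
from itertools import product

# Hypothetical treatment options; names and levels are placeholders.
MEDICATIONS = ["med_A", "med_B", "med_C"]
DOSES = ["low", "high"]
THERAPIES = ["therapy_X", "therapy_Y"]
HOURS_PER_WEEK = [1, 2]

# None means "no medication" or "no psychotherapy", respectively.
med_options = [None] + list(product(MEDICATIONS, DOSES))              # 1 + 3*2 = 7
therapy_options = [None] + list(product(THERAPIES, HOURS_PER_WEEK))   # 1 + 2*2 = 5

# Each action pairs a medication choice with a psychotherapy choice;
# (None, None) is "no intervention".
actions = list(product(med_options, therapy_options))
print(len(actions))  # 7 * 5 = 35
```

Even this small example yields 35 actions rather than the four of the simplified scheme, which illustrates the trade-off between treatment detail and the tractability of the optimization.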

The heterogeneity of patients with depression could also be integrated into the MDP, allowing for individuals experiencing different trajectories. In theory, therefore, it would be feasible to use MDP to optimize the sequential treatment decision at the individual level. For depression, individual-level covariates (including age, gender, baseline symptomatology, educational level, or socio-economic position) could be used to calculate different transition probabilities between states with specific treatment. This would nevertheless require sufficient data on how these covariates affect transition probabilities.
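One possible way to make transition probabilities depend on individual-level covariates is a multinomial logistic (softmax) model, sketched below. This is an assumption made for illustration; the text does not prescribe a functional form, and the coefficient values, covariate names, and state names are all hypothetical.

```python
import math


def transition_row(covariates, betas):
    """Softmax over linear predictors -> probabilities over next states."""
    scores = {}
    for next_state, beta in betas.items():
        scores[next_state] = beta["intercept"] + sum(
            beta.get(name, 0.0) * value for name, value in covariates.items()
        )
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(exp_scores.values())
    return {s: e / total for s, e in exp_scores.items()}


# Hypothetical coefficients for transitions out of "depressed" under "medication";
# "remission" serves as the reference category with all coefficients zero.
BETAS_DEPRESSED_MEDICATION = {
    "remission": {"intercept": 0.0},
    "depressed": {"intercept": 0.2, "baseline_score": 0.05, "age": 0.01},
}

patient_mild = {"age": 30, "baseline_score": 10}
patient_severe = {"age": 30, "baseline_score": 25}

row_mild = transition_row(patient_mild, BETAS_DEPRESSED_MEDICATION)
row_severe = transition_row(patient_severe, BETAS_DEPRESSED_MEDICATION)
```

In a patient-level MDP, one such row would be needed for every state and action pair, which is why sufficient data on covariate effects are a prerequisite.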

Although the MDP appears suitable for supporting sequential treatment decisions for depression, several issues continue to require careful consideration: (i) how many states to distinguish and how to define them based on severity; (ii) how to decide on the proper granularity of the treatment choices and decision epochs considered; (iii) which individual characteristics are important to include when optimizing at the individual level; and (iv) how to achieve a balance between the level of detail in treatment specification and the feasibility of optimization.

5 Discussion

Markov decision processes can be regarded as an extension of a state-transition model, which is the most frequently applied model structure in health economic evaluations. The STM model structure is based on the Markov chain, which is also the underlying structure in MDPs. In contrast to STMs, however, MDPs include actions and rewards, thereby allowing greater flexibility in defining treatment strategies and enhancing the optimization of these strategies. To optimize sequential treatment decisions in depression, the MDP structure is relevant and interesting for further pursuit. The current study identifies 23 applications of MDP in healthcare, 16 of which use MDP to solve sequential treatment decisions in somatic disease. This demonstrates how MDP has been used to address treatment issues related to somatic disease. In addition, the reformulation of the existing HE model provides insight into how MDP can be applied to depression, and the added value of MDP demonstrates that it has the capacity to make dynamic comparisons of more interventions over time than would a traditional STM.

Our study is subject to several limitations. First, we analyze the potential use of MDP for depression only in theory. In real-world practical settings, the sequential treatment decision problem might be more complex. Second, we do not assess the quality of each paper, as our main aim is to explore a model for optimizing treatment decisions for depression, rather than to analyze the existing publications systematically. Moreover, our search was limited to publications written in English. While we are relatively confident that we identified most existing HE models for depression, we are less certain about our coverage of MDP applications in healthcare, as there is a long list of journals in which such applications could potentially be published. Furthermore, the MDP structure is difficult to identify when it is not adequately described or when it is included as a component of a hybrid model. Third, our review of HE decision models is relatively brief and focused only on aspects that are relevant to the aims of our study. For a complete overview of existing models and their characteristics, other more extensive reviews are available [80, 81].

Sequential decision making in depression treatment is a difficult problem that has given rise to a large volume of research. While some trials have investigated the appropriate type of treatment for patients with depression [82, 83], optimization through a formal simulation modeling approach for depression has yet to be conducted. The repeated choice of optimal sequential treatment decisions (e.g., remain with the current intervention, change to another intervention, or stop treatment) could also help to identify the best treatment duration, based on individual characteristics and a predefined objective.

Recently, a new methodological framework known as whole disease modeling (WDM) has attracted attention. This framework is characterized by its ability to reflect decisions occurring at multiple points within the entire clinical trajectory of a disease. As with MDP, it aims to support decision making throughout the clinical trajectory. In contrast, however, WDM emphasizes macro-level HE evaluation considering all relevant aspects of the disease and its treatment from the preclinical phase until death at the system level (e.g., of a national healthcare system). Like MDP, its decision node is transferable across the entire process, as opposed to the single decision node in conventional HE models. At the same time, however, MDP is suitable for supporting decisions concerning a sequence of treatment decisions that support optimal clinical treatment at the individual level, whereas WDM would not usually allow treatment decisions to be changed based on patient characteristics within a short period. More specifically, the scope of a WDM is usually wider, while its depth is lower.

The current study provides a review of MDP applications within the field of healthcare and demonstrates that the MDP has the potential to steer the optimization of sequential treatment to aid personalized treatment decisions in the treatment of depression. This could potentially inspire healthcare decision makers, modelers, and the research community with regard to optimizing the allocation of healthcare resources.

6 Conclusion

The MDP has been successfully used to address healthcare decision-making problems, especially those involving sequential treatment decisions. For depression, existing STMs have potential for fitting into the MDP approach, thereby laying a solid foundation for developing an MDP for depression. This approach might be better than STM at depicting continuous treatment decision making. In current practice, clinicians lack decision rules on what to do for each patient, when, and in which order. In addition to supporting clinicians by offering an optimal sequential treatment plan over time, an MDP model could also provide information about the best timing for starting and ending treatment for heterogeneous patient groups. We conclude that the MDP is a potentially powerful model for optimizing sequential treatment in depression and for finding the optimal treatment duration for individuals.