Having given an overview of CLDs, stock and flow diagrams, and system dynamics, we now simulate a system dynamics model for an application in global health. Specifically, we use the model to undertake a case study on research data generated by clinical trials, electronic medical records (EMR), and patients. This is an important area of research because the movement of data in the global health research system has the potential to affect treatment development worldwide.
The current premise in global health research is that using clinical trials, and thus the data underlying them, will lead to better treatment development and public health outcomes (Rosala-Hallas et al. 2018). However, to bring the right treatment to the right patient at the right time, the process must both draw data from, and contribute data to, a larger global health data ecosystem. Generating real-time, pragmatic evidence is not enough; a human-centered data ecosystem will learn from the experience of real patients in real time, employing all tiers of biological and non-biological data across therapeutic areas and stakeholders, to better respond to individual and population-based health needs.
Our question addresses the relationship between patient EMR data and clinical trial data, and whether the former can complement existing global health research data sources to enhance our understanding of human disease progression, with the ultimate goal of improving general health outcomes globally. This problem helps us think about the policy changes that governments, research organizations, and health-service organizations (e.g., providers, hospitals, and clinics) might need to make to encourage the sharing and use of proprietary data (EMR and clinical trial data). We hope this will help identify the feedback loops needed to facilitate better data flows in global health research and to make medical breakthroughs that benefit the entire world.
This exercise proceeds in five parts. First, we identify the key variables in the system. Next, using CLDs and their components (feedback, stock/flow, time delay, non-linearity), we visualise how data flows within the current global health research system. Third, we derive equations from the variables and the CLDs. Fourth, we model these equations computationally in R. Lastly, we share the types of policy questions and directions that may be explored with the model.
6.3.1 Identifying the System and Variables
When choosing variables, it is important to use variables that describe patterns of behaviour and activity in global health rather than specific, singular events (Kim 1992). It can also help to consider which variables affect the problem the most and which the least. To this end, subject matter experts should be consulted, through expert elicitation, to identify the key factors affecting the model.
In our case study, four stock variables were ultimately judged critical to the movement of data within the global health landscape. Patients in hospitals, shared EMR data, health research data, and available treatments were all identified as key recurring variables contributing to the data-rich ecosystem of global health. These four were chosen because we are interested in the amount of data generated (EMR and clinical) and the impact of that data on public health and research (patients and treatment availability). They also represent the units of measure for data (shared EMR data and health research data), a surrogate measure for general health (patients in hospitals), and the scope of health products (available treatments).
We then brainstormed key factors that would affect these stock variables (Table 6.1). In a real exercise, experts would be consulted on these factors to confirm their suitability for the model, including whether they can be easily measured and monitored. Several variables were deliberately excluded from this conceptualization of global health research, including disease prevention, intellectual property governance, research infrastructure, and workforce.
Table 6.1 Brainstorming variables for systems conceptualization
6.3.2 Causal Loop Diagrams (Feedback, Stock/Flow, Time Delay, Non-linearity)
Next, we establish the links between the stock variables on the CLD, the polarity (direction) of each link, and the stocks and flows, and we identify and label the reinforcing and balancing loops in the diagram. For example, one of the primary causal links driving public health outcomes is the number of approved treatments (a stock variable), whose inflow variable is the approval rate. Approval rates, in turn, are affected by research productivity and the total number of clinical trials, both of which form reinforcing loops. On the other hand, research and development budgets have a balancing effect on the total available treatments.
We have identified two balancing loops in Fig. 6.5: loop B1, indicated by the red counter-clockwise arrow, and loop B2, indicated by the clockwise arrow. B1 regulates the amount of EMR data being generated, and B2 helps control the number of patients admitted to the hospital. The reinforcing loop R1 amplifies change in the system; the positive (+) signs at its arrowheads indicate that each change is reinforced in the same direction. We show only these three feedback loops as examples, but this CLD contains other feedback loops that also contribute to the behavior of the system.
In our hypothetical system, we assume that the hospital collects data directly from patients and their EMR. Patients have the right to decide whether to share their data, so the amount of data shared depends on their willingness to share it. For example, the threat of cyber attacks on EMR systems affects patients’ willingness to share; this effect is captured by feedback loop B1. In loop B2, the single negative polarity between general health outcomes and sick fraction makes the entire feedback loop balancing. This loop represents the tradeoff between general health outcomes and the number of sick people in hospitals, whose data contributes to treatment development. In other words, we assume that a population of only healthy people would stall the treatment development process.
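A loop's polarity can be checked mechanically: multiply the signs of its links, so a loop with an odd number of negative links is balancing and one with an even number is reinforcing. The Python sketch below uses a hypothetical helper, `loop_polarity`, that is not part of the authors' R model, and applies this rule to loop B1, with link signs read off Eqs. (6.6) to (6.10) later in this section.

```python
def loop_polarity(links):
    """Classify a closed feedback loop as reinforcing or balancing.

    `links` is a list of (source, target, sign) tuples in loop order,
    with sign = +1 or -1. The loop is reinforcing when the product of
    its link signs is positive (an even number of negative links) and
    balancing when the product is negative (an odd number).
    """
    # Check that each link ends where the next one starts (closed loop)
    for (_, target, _), (source, _, _) in zip(links, links[1:] + links[:1]):
        if target != source:
            raise ValueError(f"broken loop between {target!r} and {source!r}")
    product = 1
    for _, _, sign in links:
        product *= sign
    return "reinforcing" if product > 0 else "balancing"


# Loop B1, with link signs taken from Eqs. (6.6) to (6.10)
B1 = [
    ("shared EMR data", "security breaches", +1),                     # Eq. (6.6)
    ("security breaches", "willingness to share", -1),                # Eq. (6.7)
    ("willingness to share", "data shared per patient", +1),          # Eq. (6.8)
    ("data shared per patient", "creation of shared EMR data", +1),   # Eq. (6.9)
    ("creation of shared EMR data", "shared EMR data", +1),           # Eq. (6.10)
]

print(loop_polarity(B1))  # balancing
```

The single negative link (security breaches reducing willingness to share) is enough to make B1 balancing, mirroring the argument made for B2.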
In feedback loop R1, we assume that pharmaceutical companies are the sole sponsors of clinical trials and have direct access to both EMR and clinical trial data (health research data). The number of available treatments is directly linked to the research and development budget. Furthermore, we assume that the entire research and development budget is spent on conducting clinical trials (represented as the number of clinical trials) in each time period. The cost of clinical trials is positively correlated with the number of subjects enrolled in each trial (enrollment per trial).
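Under these R1 assumptions, a fixed budget buys fewer trials as each trial enrolls more subjects. The sketch below makes that trade-off concrete with a hypothetical linear cost model (trial cost equals a cost per subject times enrollment); the function name `trials_funded` and all numbers are illustrative and are not taken from the authors' model.

```python
def trials_funded(rd_budget, enrollment_per_trial, cost_per_subject):
    """Number of clinical trials the R&D budget pays for in one period.

    Assumes (hypothetically) that the whole budget goes to trials and
    that trial cost is linear in enrollment.
    """
    cost_per_trial = cost_per_subject * enrollment_per_trial
    return rd_budget / cost_per_trial


print(trials_funded(60e6, 500, 20_000))   # 6.0 trials
print(trials_funded(60e6, 1000, 20_000))  # 3.0 trials: doubling enrollment halves the count
```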
Although many other variables and loops, both reinforcing and balancing, could be relevant to the process, we kept the conceptual diagram as simple as possible to ensure a parsimonious model. Thus, only the dominant loops are reflected in the CLD; together they capture the system's shift from acceleration to deceleration and gradual equilibrium. As mentioned, the CLD may be revised a number of times as understanding deepens and the multidisciplinary process unfolds.
6.3.3 Constructing System Dynamics Equations
Based on the CLD/stock and flow diagram developed in Fig. 6.5, we can translate the conceptual model into mathematical equations. To illustrate how to formulate a stock and flow equation, consider the stock variable ‘patients in hospitals’.
For patients in hospitals \(P\), there is one inflow and one outflow: admissions \(IP\) and discharges \(OP\). We assume that no one dies in our fictional hospital. As a result, our equation is:
$$\frac{dP\left( t \right)}{dt} = IP\left( t \right) - OP\left( t \right)$$
(6.2)
The inflow, admissions \(IP\), is simply equal to the sick population \(SP\) (an auxiliary variable), which assumes that every sick person goes directly to the hospital in our fictional world. For real-life applications, we could include a delay and/or a constraint on how sick people translate into hospital admissions. Thus,
$$IP\left( t \right) = SP\left( t \right)$$
(6.3)
The outflow, discharges \(OP\), can be modeled as a function of patients by assuming that a fraction \(\lambda\) of patients is discharged from the hospital per unit of time. As mentioned previously, this term is subtracted in Eq. (6.2) because it is an outflow.
$$OP\left( t \right) = \lambda \cdot P\left( t \right)$$
(6.4)
Substituting Eqs. (6.3) and (6.4) for \(IP\) and \(OP\), we rewrite the differential equation for the stock variable \(P\) as
$$\frac{dP\left( t \right)}{dt} = SP\left( t \right) - \lambda \cdot P\left( t \right)$$
(6.5)
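As a worked example, suppose (purely for illustration) that the sick population is constant, \(SP\left( t \right) = SP\). Eq. (6.5) is then a linear first-order equation whose closed-form solution shows the goal-seeking behaviour of the stock:

$$P\left( t \right) = \frac{SP}{\lambda} + \left( P\left( 0 \right) - \frac{SP}{\lambda} \right) e^{-\lambda t}$$

The stock settles exponentially at the equilibrium \(P^{*} = SP/\lambda\), where admissions equal discharges. With hypothetical values of \(SP = 10\) admissions per day and \(\lambda = 0.2\) per day, the hospital would converge to \(P^{*} = 50\) patients, each staying \(1/\lambda = 5\) days on average.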
It is also worth noting that if we continue substituting auxiliary variables into the flow variables of our stock differential equations, the entire model reduces to a system of only four differential equations: the system has four stock variables, and each differential equation corresponds to one stock.
The following paragraphs illustrate the logic behind formulating a balancing feedback loop. The example involves three auxiliary variables, one inflow variable, and one stock variable, based on expert knowledge. Consider the balancing feedback loop B1, highlighted in Fig. 6.6:
Since the security risk \(\alpha\) is fixed in our model, we assume that more people sharing their data would lead to more frequent security breaches. Therefore, security breaches \(S\) are defined as a function of the security risk (the percentage of data that is compromised) and the amount of shared EMR data \(D_{S}\).
$$S\left( t \right) = \alpha \cdot D_{S} \left( t \right)$$
(6.6)
In turn, security breaches negatively affect the willingness to share EMR data \(WS\), following the logic that fewer people are willing to share their personal information if they observe more security incidents. This negative polarity makes feedback loop B1 balancing. We must therefore choose a mathematical function with an inverse relationship between the dependent and independent variables, and we express the relationship between \(WS\) and \(S\) as
$$WS\left( t \right) = \frac{1}{\beta \cdot S\left( t \right)}$$
(6.7)
where \(\beta\) is the sensitivity of patients to security risks. The willingness to share data positively influences the data shared by each patient (labeled data shared per patient and symbolized as \({\widehat{D_{S}}}\)), scaled by a proportionality constant \(\gamma\):
$${\widehat{D_{S}}} \left( t \right) = \gamma \cdot WS\left( t \right)$$
(6.8)
Proceeding along the loop, we reach the inflow variable, creation of shared EMR data \(ID_{S}\), which is the number of patients \(P\) multiplied by the data shared per patient \(\widehat{D_{S}}\):
$$ID_{S} \left( t \right) = {\widehat{D_{S}}} \left( t \right) \cdot P\left( t \right)$$
(6.9)
Finally, the creation of shared EMR data directly feeds into the shared EMR data stock.
$$\frac{{dD_{S} \left( t \right)}}{dt} = ID_{S} \left( t \right)$$
(6.10)
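To illustrate the reduction to one equation per stock, substituting Eqs. (6.6) to (6.9) into Eq. (6.10) collapses loop B1 into a single differential equation for the stock \(D_{S}\):

$$\frac{dD_{S} \left( t \right)}{dt} = \gamma \cdot WS\left( t \right) \cdot P\left( t \right) = \frac{\gamma \cdot P\left( t \right)}{\beta \cdot S\left( t \right)} = \frac{\gamma \cdot P\left( t \right)}{\alpha \cdot \beta \cdot D_{S} \left( t \right)}$$

If, as a simplifying assumption, \(P\) is held constant and \(D_{S} \left( 0 \right) > 0\) (Eq. (6.7) is undefined when no data has yet been shared, since then \(S = 0\)), separating variables gives

$$D_{S} \left( t \right) = \sqrt{D_{S} \left( 0 \right)^{2} + \frac{2 \gamma P}{\alpha \beta} t}$$

Shared EMR data keeps growing, but at a rate that falls as the stock grows (eventually roughly as \(\sqrt{t}\)); this slowing growth is the signature of the balancing loop B1.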
For a complete listing of all the equations in our model, please refer to the supplementary Jupyter notebook.
6.3.4 Modelling and Data Integration
As we have demonstrated, we can convert our CLD/stock and flow diagram into a system dynamics model by assigning a mathematical equation to each causal link. Once every link has an equation, we can numerically solve the entire system as a set of ordinary differential equations using established methods for solving differential equations.
The system dynamics model described in the previous section was implemented in R, a statistical programming environment. As previously mentioned, R is a robust programming language that allows users to develop their own numerical solvers or to access a wide variety of libraries for statistical analysis and numerical methods. In our example, we used the packages deSolve (Soetaert et al. 2010) and FME (Soetaert and Petzoldt 2010) to compute the dynamic behavior and to parameterize our coefficients, respectively. Solving these models requires a numerical method (e.g., the forward or backward Euler method, or a solver from the Runge-Kutta family). These solvers are available in the deSolve package, which makes them easy to apply to any system dynamics model. The reader may refer to the accompanying Jupyter notebook, which contains the steps needed to implement the system dynamics model: https://github.com/scarygary89/SimpleHealthDataSharingSD/blob/master/HealthDataSharingModel.ipynb.
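The authors used deSolve in R; the sketch below is a Python stand-in, not their code, that hand-codes two of the methods named above (forward Euler and classical fourth-order Runge-Kutta) and applies them to Eq. (6.5). It assumes a constant sick population and hypothetical parameter values, which also gives a closed-form solution to check the solvers against.

```python
import math

# Hypothetical parameter values, for illustration only
SP = 10.0   # sick population admitted per unit time (held constant)
LAM = 0.2   # discharge fraction, lambda
P0 = 0.0    # initial patients in hospital


def dP_dt(t, p):
    """Eq. (6.5): admissions minus discharges."""
    return SP - LAM * p


def euler(f, y0, t_end, h):
    """Forward (explicit) Euler: y_{n+1} = y_n + h * f(t_n, y_n)."""
    t, y = 0.0, y0
    for _ in range(round(t_end / h)):
        y += h * f(t, y)
        t += h
    return y


def rk4(f, y0, t_end, h):
    """Classical fourth-order Runge-Kutta with a fixed step h."""
    t, y = 0.0, y0
    for _ in range(round(t_end / h)):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def exact(t):
    """Closed-form solution of Eq. (6.5) when SP is constant."""
    return SP / LAM + (P0 - SP / LAM) * math.exp(-LAM * t)


if __name__ == "__main__":
    t_end, h = 10.0, 1.0
    print(f"exact {exact(t_end):.4f}")
    print(f"euler {euler(dP_dt, P0, t_end, h):.4f}")
    print(f"rk4   {rk4(dP_dt, P0, t_end, h):.4f}")
```

With a step of one time unit, forward Euler misses the exact value by more than one patient, while RK4 is within 0.001. In practice, adaptive solvers such as deSolve's default `lsoda` choose the step size automatically.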
System dynamics models can be developed to formalize logic without much data. However, a model's usefulness is greatly enhanced if its parameters are estimated by maximum likelihood and their confidence intervals are found using likelihood methods or bootstrapping (Dogan 2007). The parameters that we adjust to fit the model are the constants of the system dynamics model; in some cases, the initial values of the stocks are also treated as parameters. Figure 6.7 compares the simulated trendline produced by the fitted parameters (teal line) with “actual data” (red points). For demonstration purposes, the data in Fig. 6.7 are generated synthetically and are not based on any real dataset.
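As a minimal sketch of the calibration idea (not the FME workflow the authors used), the code below fits the discharge fraction \(\lambda\) of Eq. (6.5) to synthetic data. It assumes independent Gaussian errors with constant variance, under which maximizing the likelihood is equivalent to minimizing the sum of squared residuals, and it assumes a known, constant sick population. The helper names (`simulate`, `sse`, `golden_section`) and all values are illustrative.

```python
import math
import random

SP = 10.0  # hypothetical sick population, assumed known and constant


def simulate(lam, times):
    """Closed-form solution of Eq. (6.5) with constant SP and P(0) = 0."""
    return [SP / lam * (1.0 - math.exp(-lam * t)) for t in times]


def sse(lam, times, observed):
    """Sum of squared residuals; under i.i.d. Gaussian errors the lambda
    that minimizes it is the maximum likelihood estimate."""
    return sum((y - p) ** 2 for y, p in zip(observed, simulate(lam, times)))


def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


# Synthetic observations: true lambda = 0.2 plus Gaussian noise (sd = 1)
rng = random.Random(42)
times = list(range(30))
observed = [p + rng.gauss(0.0, 1.0) for p in simulate(0.2, times)]

lam_hat = golden_section(lambda lam: sse(lam, times, observed), 0.01, 1.0)
print(f"estimated lambda = {lam_hat:.4f}")
```

Golden-section search is used here only because it needs no external libraries and the problem has a single parameter; with several parameters, a general-purpose optimizer such as those available through FME's `modFit` would be the usual choice.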
The parameter values resulting from the calibration are listed in Table 6.2.
Table 6.2 Parameter values based on model fitting with data
Based on these constants, we have enough information to replicate the model and develop a baseline scenario against which we can test alternative policy scenarios. These parameters were estimated by maximum likelihood, which calibrates each parameter to minimize the residuals (errors) and best fit the data.
Error analysis can be conducted to calculate standard statistical measures similar to those used for regression models. We can test each parameter against the null hypothesis that it equals zero, with the alternative that it equals its calibrated value, by comparing the resulting error distributions; assuming normally distributed errors, this allows us to conduct a t-test and calculate p-values. Other parameter validation methods include bootstrapping (refer to Rahmandad et al. 2015a) and the method of simulated moments (refer to Rahmandad et al. 2015b), which can help the modeler build confidence in the estimate of each parameter. To further understand the parameters and their sensitivities, the modeler may wish to perform a Markov chain Monte Carlo (MCMC) analysis (refer to Rahmandad et al. 2015c).
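Bootstrapping can be sketched in a few lines. The example below is a residual bootstrap, one of several bootstrap variants and not necessarily the one in Rahmandad et al.: it fits \(\lambda\) of Eq. (6.5) by least squares, resamples the residuals onto the fitted curve, refits each resampled series, and reads a percentile 95% interval off the refitted values. The data are synthetic, the sick population is assumed constant, and the helpers (`simulate`, `fit_lambda`, `residual_bootstrap`) are hypothetical; they repeat the fitting logic so that the block runs on its own.

```python
import math
import random

SP = 10.0  # hypothetical constant sick population


def simulate(lam, times):
    """Closed-form solution of Eq. (6.5) with constant SP and P(0) = 0."""
    return [SP / lam * (1.0 - math.exp(-lam * t)) for t in times]


def fit_lambda(times, observed, lo=0.01, hi=1.0, tol=1e-7):
    """Least-squares estimate of lambda by golden-section search."""
    def sse(lam):
        return sum((y - p) ** 2 for y, p in zip(observed, simulate(lam, times)))
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def residual_bootstrap(times, observed, n_boot=200, seed=1):
    """Return the point estimate and a percentile 95% interval for lambda."""
    rng = random.Random(seed)
    lam_hat = fit_lambda(times, observed)
    fitted = simulate(lam_hat, times)
    residuals = [y - f for y, f in zip(observed, fitted)]
    estimates = []
    for _ in range(n_boot):
        # New synthetic series: fitted curve plus residuals drawn with replacement
        resampled = [f + rng.choice(residuals) for f in fitted]
        estimates.append(fit_lambda(times, resampled))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lam_hat, lower, upper


# Synthetic observations: true lambda = 0.2 plus Gaussian noise (sd = 1)
rng = random.Random(42)
times = list(range(30))
observed = [p + rng.gauss(0.0, 1.0) for p in simulate(0.2, times)]

lam_hat, lower, upper = residual_bootstrap(times, observed)
print(f"lambda = {lam_hat:.4f}, 95% CI [{lower:.4f}, {upper:.4f}]")
```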
6.3.5 Interpreting Results and Policy Directions
Once the model is calibrated and running, researchers may wish to use it for testing targeted policy questions. For the purposes of this exercise, we are less concerned with the actual results since we are only illustrating the types of questions that may be of interest to global health policy-makers wishing to utilize system dynamics.
Three insights from our hypothetical CLD led us to consider the impact on the global health research system of cyber security attacks (security risk), data capture and interoperability in clinical trials (data generated per trial), and machine learning and artificial intelligence algorithms (data technology). These areas of interest led us to propose the kinds of questions that may be run on the model we have generated:
- Assuming an increase in data security attacks, where can policy-makers best target resources to ensure that patients continue to contribute health data?
- How might birth-to-death collection of data affect the cost of clinical trials? How does collecting data from both sick and healthy people affect the price of clinical trials?
- How much money would need to be invested in new data technology to halve the number of patients in hospitals?
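To show how the first question could be posed as a scenario experiment, the hypothetical sketch below couples Eq. (6.5) with loop B1 (Eqs. (6.6) to (6.10)), holds the sick population constant, starts patients at the equilibrium \(SP/\lambda\), and reruns a forward-Euler simulation with the security risk \(\alpha\) doubled and quadrupled. All parameter values and the names `BASE`, `derivatives`, and `run` are made up for illustration; the initial shared data is positive because Eq. (6.7) is undefined when \(S = 0\).

```python
# Hypothetical baseline parameters (not the calibrated values of Table 6.2)
BASE = {"SP": 10.0, "lam": 0.2, "alpha": 0.01, "beta": 2.0, "gamma": 1.0}


def derivatives(p, d_s, prm):
    """Flows for patients (Eq. 6.5) and shared EMR data (Eqs. 6.6-6.10)."""
    breaches = prm["alpha"] * d_s                 # Eq. (6.6)
    willingness = 1.0 / (prm["beta"] * breaches)  # Eq. (6.7); needs d_s > 0
    per_patient = prm["gamma"] * willingness      # Eq. (6.8)
    dp_dt = prm["SP"] - prm["lam"] * p            # Eq. (6.5)
    dd_dt = per_patient * p                       # Eqs. (6.9) and (6.10)
    return dp_dt, dd_dt


def run(prm, p0=50.0, d0=100.0, t_end=50.0, h=0.1):
    """Forward-Euler simulation; returns the final (patients, shared data)."""
    p, d_s = p0, d0
    for _ in range(round(t_end / h)):
        dp_dt, dd_dt = derivatives(p, d_s, prm)
        p, d_s = p + h * dp_dt, d_s + h * dd_dt
    return p, d_s


if __name__ == "__main__":
    for factor in (1, 2, 4):
        scenario = dict(BASE, alpha=BASE["alpha"] * factor)
        _, shared = run(scenario)
        print(f"security risk x{factor}: shared EMR data = {shared:.1f}")
```

Higher security risk lowers the final stock of shared EMR data, which is the effect loop B1 encodes; a real analysis would use the calibrated constants from Table 6.2 and the full four-stock model.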