Objectives

This chapter will proceed in three parts. First, in the background, we describe how system dynamics applies to data-rich ecosystems. Next, we share key introductory elements of system dynamics, including causal loop diagrams (CLDs), stock and flow diagrams, and systems modelling. Finally, in the hands-on exercise, we simulate a system dynamics model of clinical trial data for application in global health.

No prior modelling experience is assumed.

1 Background

System dynamics is a fundamentally interdisciplinary field of study that helps us understand complex systems and the sources of policy resistance in those systems in order to guide effective change (Sterman 2001). Within system dynamics, causal loop diagrams (CLDs) are the main analytical tools that assist in the identification and visualization of key variables and the connections between them. The related systems modelling methodology of system dynamics involves computer simulation models that are fundamentally unique to each problem setting (Homer and Hirsch 2006: 452).

This section will proceed in three parts. We will introduce, first, what we mean by data-rich ecosystems; second, the terminology of system dynamics; and third, a few applications of system dynamics in data-rich ecosystems.

1.1 Data-Rich Ecosystems

Data-rich ecosystems are defined as “technological and social arrangements underpinning the environments in which [data] is generated, analysed, shared and used” (Marjanovic et al. 2017: ii). These systems give rise to dynamic complexity because the system is: (1) constantly changing, (2) tightly coupled, (3) governed by feedback, (4) nonlinear, (5) history-dependent, (6) self-organizing, (7) adaptive, (8) characterized by trade-offs, (9) counterintuitive, and (10) policy resistant (Sterman 2001: 12). Indeed, problems plaguing data-rich ecosystems require understanding how the whole system will react to a seemingly inconsequential modification in one part of the system (Sterman 2001).

One example of such a data-rich ecosystem is global health, where data holds promise across all the building blocks of health systems. In its broadest sense, health data refers to any type of data that is useful for improving research and innovation, as well as healthcare-related decision making (Marjanovic et al. 2017). As a result, a supportive health data ecosystem requires at least the following five elements: (1) collaboration and coordination, (2) public acceptance and engagement with health data, (3) data protection regulation and models of data access and use, (4) data quality, interoperability, and other technical considerations, and (5) workforce capacity (Marjanovic et al. 2017). These complexities, coupled with the still growing landscape of global health data generation, interpretation and use, require a systematic approach that has the potential to facilitate decision-making that aligns our long-term best interests with those of the system as a whole (Sterman 2001).

1.2 System Dynamics

A system can be characterized as a group of multiple components that interact with each other. System dynamics was originally meant to invoke systems thinking by endogenizing relevant variables and mathematically connecting causally linked variables (Richardson 2011). In particular, it requires moving away from isolated events and causes and toward the organization of the system as a set of interacting parts (Kirkwood 1998). As a result of its conceptual intuition, the system dynamics paradigm originally came to be interdisciplinary in nature (Sterman 2001). This has notably allowed researchers without quantitative backgrounds to participate in structural formation of the model.

Today, a system dynamics model consists of an interlocking set of differential and algebraic equations developed from a broad spectrum of relevant measured and experiential data (Cavana and Mares 2004). Systems thinking and causal loop diagramming allow researchers to move from a conceptual understanding of unidimensional problems to a completed systems model containing scores of such equations, each with its appropriate numerical inputs. Once computerized, these models offer ways of systematically testing policies and scenarios in ways that answer both “what if” and “why” (Homer and Hirsch 2006).

Further, since modelling is iterative, the process relies on repeated attempts of scope selection, hypothesis generation, realistic causal diagramming, quantification, reliability testing, and policy analysis. These steps are selectively repeated until the model is able to generate useful insights while meeting certain requirements, such as its realism, robustness, and flexibility (Homer and Hirsch 2006). Ultimately, the ability to see systems “as a whole” provides a framework for understanding complexity and change, testing levers for policies that would result in sustainable progress (Cavana and Mares 2004; Homer and Hirsch 2006; Senge 1990).

1.3 Applications of System Dynamics in Data-Rich Ecosystems

One early application of system dynamics modelling is an integrated assessment of anthropogenic impacts on the environment and resource scarcity, which paved the way for integrated assessment modelling in sustainability applications (Forrester 1961; Meadows et al. 1972). System dynamics models have since found application in a number of data-rich ecosystems. For example, management and business operations benefit from modelling their entire enterprise at a systems level, which includes all relevant processes, stakeholders, and components. In the context of healthcare delivery, system dynamics has been deployed to address problems with capacity and management of patient flow. It could also be employed to study multiple, mutually reinforcing, interacting diseases and risks in a way that gives a more realistic snapshot of overall epidemiology and policy implications (Homer and Hirsch 2006). Other successful interventions using system dynamics include long-range market forecasting, strategy development in manufacturing and commercial product development, and models for effective management of large-scale software projects (Sterman 2001).

2 Causal Loop Diagrams, Stock and Flow Diagrams, and System Dynamics

This section will give a more thorough introduction to the terminologies, concepts, equations, and tools utilized in system dynamics. The first part will discuss CLDs and, with the use of a classic example, their role in visualizing the relationships that govern complex systems. Then, we will introduce and describe how stock and flow diagrams quantitatively build upon the qualitative relationships mapped out in CLDs. Finally, we briefly discuss the software utilized to simulate and test multiple scenarios for a given system dynamics model.

2.1 Causal Loop Diagrams

Social issues that affect people and society typically involve complex systems composed of several components and interactions. Thus, answering policy questions typically involves a team of interdisciplinary researchers that observe and discuss the drivers surrounding a certain social issue. In these complex systems, “cause and effect are often distant in time and space, and the delayed and distant consequences of actions are different from and less salient than their proximate effects—or are simply unknown” (Sterman 2001: 16). These components and interactions can be visually mapped using a methodological paradigm or a “language” for understanding the dynamic, interconnected nature of our world, known as CLDs.

CLDs allow researchers to use a systems approach to understand the scale and scope of an issue. One of the more immediate advantages of CLDs is the intuitive methodology behind building these maps. Indeed, the development of such diagrams or maps (that aim to capture the complexities of a multifaceted issue) does not require extensive quantitative training in engineering or mathematics. Detailed methods for developing CLDs have been outlined by Roberts et al. (1983), Richardson and Pugh (1981), Richardson (1991), Coyle (1996), Sterman (2001), and Maani and Cavana (2004).

For our purposes, the mapping legend to make CLDs comprises two basic features. First, CLDs are composed of variables and directional links (i.e., arrows) that represent causal interactions. The directional links illustrate a “cause and effect” relationship such that the origin variable will affect another variable (i.e., cause → effect). Second, causal linkages have two polarities: positive (same direction) and negative (opposite direction) (Cavana and Mares 2004; Kim 1992; Maani and Cavana 2004; Richardson 1991). A positive causal link indicates that two linked variables will increase or decrease together (same direction). A negative polarity between two variables implies an inverse or opposing relationship (opposite direction); an increase in one variable causes a decrease in the other linked variable and vice versa. In this way, a CLD is developed based on linking variables that are causally related. The following figure represents a simple example that is commonly observed in population modelling (Fig. 6.1).

Fig. 6.1 Positive versus negative polarities

Once the problem is defined, the next step is to identify the relevant variables that affect the issue. Subsequently, the goal is to identify the variables in the adjacent systems that affect the ‘primary variables’. From a graphical standpoint, one can view all the variables in a CLD as ‘nodes’, and links as ‘edges’. After all the variables (nodes) and links are mapped, the ‘feedback loops’—or closed loops of variables—become more apparent. A coherent and holistic narrative about a particular problem is created by connecting the nodes and links of several loops (Kim 1992).

Feedback loops are next classified into two categories: reinforcing and balancing. In the literature, reinforcing and balancing feedback loops are sometimes called positive and negative feedback loops, respectively (Kirkwood 1998). A reinforcing feedback loop contains either only positive polarities or an even number of negative polarities (Kim 1992). An even number of negative polarities yields an overall positive polarity because each pair of negative links cancels out (e.g., two sequential links with negative polarity). We demonstrate an example of a reinforcing feedback loop in Fig. 6.2. In this example, a rise in income leads to a rise in savings, which in turn boosts the amount of interest accrued. Reinforcing loops are notable because such systems lack a nontrivial equilibrium, rendering them unstable.

Fig. 6.2 Example of a reinforcing feedback loop (Income and interest on a bank account)

In contrast, balancing feedback loops exist when a series of variables connected in a loop has an odd number of negative polarities. An example of a balancing feedback loop is shown in Fig. 6.3, where we recreated a CLD of the Lotka-Volterra system, which is more commonly known as the ‘predator-prey’ model. In our example, the sheep are the prey population and the wolves are the predator population. If there are more wolves, then the population of sheep will decline because the wolves consume more sheep. When there are not enough sheep to sustain the consumption requirement of the wolves, the wolf population will dwindle. Therefore, this system is inherently a balancing feedback loop because of the inverse relationship between the populations of wolves and sheep (Fig. 6.3).

Fig. 6.3 Example of a Lotka-Volterra system of a balancing feedback loop
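
The polarity rule described above can be checked mechanically. The following minimal Python sketch counts the negative links in a closed loop. The function name `loop_polarity` and the two link lists are illustrative assumptions about how the loops in Figs. 6.2 and 6.3 are drawn; they are not taken from the original model.

```python
def loop_polarity(link_signs):
    """Classify a closed causal loop from the signs of its links.

    link_signs: iterable of +1 (same direction) or -1 (opposite direction),
    one entry per causal link in the loop.
    """
    negatives = sum(1 for sign in link_signs if sign < 0)
    # An even number of negative links (including zero) is reinforcing;
    # an odd number is balancing.
    return "reinforcing" if negatives % 2 == 0 else "balancing"


# Assumed savings loop: savings -> interest (+), interest -> savings (+).
savings_loop = [+1, +1]
# Assumed predator-prey loop: wolves -> sheep (-), sheep -> wolves (+).
predator_prey_loop = [-1, +1]

print(loop_polarity(savings_loop))        # reinforcing
print(loop_polarity(predator_prey_loop))  # balancing
```

Only the parity of the negative links matters, so the same check applies to loops of any length once each link has been assigned a polarity.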

The CLD is constantly analysed visually to identify the key variables and the range of balancing and reinforcing loops it contains. A key feature of this process is also to simplify the conceptual diagram so the resulting insights can be used as the basis for developing and implementing policy (Cavana and Mares 2004). Based on the definition of feedback loops, researchers should be able to understand certain mechanisms of the system they are studying. Further, for a system to be stable (in other words, self-correcting or equilibrium-seeking), a balancing loop must exist in some combination with a reinforcing loop. We encounter many examples of stable systems on a daily basis. For instance, a swinging pendulum eventually returns to its original resting position (stable equilibrium point) due to gravity and remains stationary after some time.

We can also characterize a system as being unstable when a certain variable or mechanism is perturbed from its equilibrium. When a loop loses stability or balancing variables, one must correct the system by adding more counteracting forces. The variables responsible for this instability are typically targets for policy interventions. An example of an unstable system includes social dynamics in a country with conflicting groups. In this situation, the equilibrium would be peace. However, peace would be relatively fragile if those groups did not get along with each other. A slight provocation would cause the system to stray away from peace (unstable equilibrium point).

2.2 Stock and Flow Diagrams and System Dynamics Modelling

Systems academics have long noted human limitations when dealing with dynamic, evolving, and interconnected systems (Sterman 2001). Stock and flow diagrams and system dynamics modelling can help us avoid mental models that are “static, narrow, and reductionist” (Sterman 2001: 11). Specifically, dynamic systems change over time, which necessitates considering the behavior of each variable in a temporal domain. This involves translating the visual mapping of CLDs to diagrams that measure the accumulation and dispersal of resources over time (Sterman 2001).

In a stock and flow diagram, there are four main types of variables: stock, flow, auxiliary, and delay. A stock variable can be thought of as a “memory” variable whose value carries over after each time step. Because stocks define the “state” of the system, they send out signals to all the other parts of the system. For example, the volume of water in a container is a stock variable, since the volume at a previous time carries over into the present unless someone adds more water or drains the container. Changes to a stock are represented by flow variables, which would vanish if time were hypothetically stopped. Thus, when the volume of water is increasing because water is pouring into the container, the volumetric flow rate is an inflow variable, while the volumetric loss due to drainage of the container is an outflow variable. In Fig. 6.4, the stock variables are Sheep and Wolves, whereas the flow variables are Wolf Births, Wolf Deaths, Sheep Births, and Sheep Deaths.

Fig. 6.4 Stock and flow diagram of the Lotka-Volterra model. Assuming arbitrary initial values and constants, the theoretical results of the Lotka-Volterra system were generated using the software Vensim to demonstrate that the stock and flow diagram is equivalent to the mathematical formulation. The top right panel represents the phase space between wolf and sheep populations. The bottom right panel represents the time series of wolf and sheep populations

Outside of stock and flow variables, there exist auxiliary variables, which are simply defined as variables that influence the flows. These variables do not change the mathematical structure of the system, but do help bring transparency to the model. In Fig. 6.4, the auxiliary variables (endogenous) include Wolf Fertility Rate and Sheep Mortality Rate. We also have constant values (exogenous), which include Sheep Fertility Rate and Wolf Mortality Rate.

Lastly, delay variables exist when a causal action takes effect at a later time, for instance when there is a time lag between a policy intervention and a change in a pattern of human behavior. For example, a tax imposed on a specific good may not result in an immediate decline in demand because it takes time for consumers to notice and respond to the surge in the price of the good. It should be noted that the delay length itself is a constant that needs to be parameterized and may introduce additional mathematical complexity.
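
As a rough illustration of a delay, the sketch below uses a first-order (exponential) delay, which is one common way to formulate delays in system dynamics; the chapter itself does not specify a formulation, so this choice is an assumption. The function name `first_order_delay`, the price values, and the delay time of 5 time units are all hypothetical.

```python
def first_order_delay(inputs, delay_time, dt, initial):
    """Smooth an input series with a first-order delay.

    The delayed value closes the gap to the current input at a rate of
    1 / delay_time, integrated here with a simple Euler step.
    """
    delayed = initial
    history = [delayed]
    for value in inputs:
        delayed += dt * (value - delayed) / delay_time
        history.append(delayed)
    return history


# Hypothetical tax: the actual price jumps from 1.0 to 1.5 at t = 0,
# but consumers only gradually perceive the new price.
dt = 0.1
actual_price = [1.5] * 300  # 30 time units
perceived_price = first_order_delay(actual_price, delay_time=5.0, dt=dt, initial=1.0)

print(round(perceived_price[50], 3))   # perceived price after one delay time
print(round(perceived_price[-1], 3))   # perceived price after six delay times
```

In this formulation, roughly two thirds of the adjustment has occurred after one delay time, which is why demand responds gradually rather than instantly.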

System dynamics models use stock and flow diagrams to translate a conceptual model into a mathematical one. Stocks can be expressed mathematically as integrals and are generally considered the state variables of the system. A stock variable \(y\) can be explicitly represented as:

$$\frac{dy\left( t \right)}{dt} = x_{IN} \left( t \right) - x_{OUT} \left( t \right)$$
$$y\left( t \right) = y\left( t_{0} \right) + \int_{t_{0}}^{t} \left[ x_{IN} \left( \tau \right) - x_{OUT} \left( \tau \right) \right] d\tau$$
(6.1)

In Eq. (6.1), the variable \(x_{IN}\) is the inflow, and \(x_{OUT}\) is the outflow of the system. The combined effect of the inflow and outflow variables represents the derivative of the stock, such that inflows are a positive change and outflows are a negative change. There could be multiple inflow and outflow variables associated with a stock. As a result, we are able to mathematically solve a system dynamics model as a system of ordinary differential equations. The phase plot and time series plot in Fig. 6.4 were generated using a system dynamics approach with chosen parameters. This was executed by converting the stock and flow diagram of the Lotka-Volterra model into corresponding mathematical equations. The auxiliary variables that determine the inflow of wolves and outflow of sheep are

$$\begin{aligned} Wolf\; Fertility\; Rate & = \alpha \cdot Sheep \\ Sheep\; Mortality\; Rate & = \omega \cdot Wolves \\ \end{aligned}$$

where \(\alpha\) and \(\omega\) are constants.

In the stock and flow diagram in Fig. 6.4, the two stocks, Wolves and Sheep (plotted in the panels on the right), are modified by births and deaths, which correspond to the inflow and outflow variables of each stock. They are described by the following differential equations.

$$\begin{aligned} \frac{dWolves\left( t \right)}{dt} & = Wolf\; Births - Wolf\; Deaths \\ & = Wolf\; Fertility\; Rate \cdot Wolves - Wolf\; Mortality\; Rate \cdot Wolves \\ & = \alpha \cdot Sheep \cdot Wolves - Wolf\; Mortality\; Rate \cdot Wolves \\ \end{aligned}$$
$$\begin{aligned} \frac{dSheep\left( t \right)}{dt} & = Sheep\; Births - Sheep\; Deaths \\ & = Sheep\; Fertility\; Rate \cdot Sheep - Sheep\; Mortality\; Rate \cdot Sheep \\ & = Sheep\; Fertility\; Rate \cdot Sheep - \omega \cdot Wolves \cdot Sheep \\ \end{aligned}$$

In this predator-prey model, Wolf Mortality Rate and Sheep Fertility Rate will also be considered constants.
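
To make the translation from stocks and flows to equations concrete, the following Python sketch integrates the two predator-prey equations with a hand-written fourth-order Runge-Kutta step. It is not the Vensim model behind Fig. 6.4: the parameter values and initial populations are arbitrary assumptions, and the helper names (`derivatives`, `rk4_step`, `simulate`, `conserved_quantity`) are hypothetical.

```python
import math


def derivatives(state, params):
    """Net flows for the two stocks (Wolves, Sheep)."""
    wolves, sheep = state
    alpha, omega, wolf_mortality, sheep_fertility = params
    wolf_fertility_rate = alpha * sheep      # auxiliary variable
    sheep_mortality_rate = omega * wolves    # auxiliary variable
    d_wolves = wolf_fertility_rate * wolves - wolf_mortality * wolves
    d_sheep = sheep_fertility * sheep - sheep_mortality_rate * sheep
    return (d_wolves, d_sheep)


def rk4_step(f, state, dt, params):
    """Advance the state by one classical Runge-Kutta step."""
    k1 = f(state, params)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), params)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), params)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), params)
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, params, dt=0.01, steps=5000):
    """Return the list of (wolves, sheep) states, including the initial one."""
    trajectory = [initial]
    state = initial
    for _ in range(steps):
        state = rk4_step(derivatives, state, dt, params)
        trajectory.append(state)
    return trajectory


def conserved_quantity(state, params):
    """Quantity that stays constant along exact Lotka-Volterra orbits."""
    wolves, sheep = state
    alpha, omega, wolf_mortality, sheep_fertility = params
    return (alpha * sheep - wolf_mortality * math.log(sheep)
            + omega * wolves - sheep_fertility * math.log(wolves))


# Assumed constants: alpha, omega, Wolf Mortality Rate, Sheep Fertility Rate.
params = (0.01, 0.02, 0.5, 1.0)
trajectory = simulate((40.0, 60.0), params)
drift = abs(conserved_quantity(trajectory[-1], params)
            - conserved_quantity(trajectory[0], params))
print("final (wolves, sheep):", trajectory[-1])
print("drift in conserved quantity:", drift)
```

Plotting wolves against sheep from `trajectory` would give a closed orbit in phase space, and plotting each against time would give the oscillating time series; the small drift in the conserved quantity is a simple check that the numerical solution stays on that orbit.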

2.3 Software and Computational Implementation

System dynamics modelling was originally meant to simulate the emerging mechanics of a system and to explain it. Simulation of the mathematical model following the above system conceptualization stages can be undertaken using computers and software. Modern system dynamics modelling software makes it possible for almost anyone to participate in the modelling process. These simulations allow researchers to experiment, test their decision-making skills, and just ‘play’ (Sterman 2001: 21). System dynamics models can be easily implemented in a number of free and commercial software packages, such as Vensim (Free personal learning edition); STELLA (Proprietary); AnyLogic (Proprietary); and NetLogo (Free).

When choosing suitable software, researchers working in data-rich ecosystems should pay particular attention to the capacity of the program to handle large datasets and advanced analytical tools. R and Python are favorites among data scientists and researchers because both computing environments are open-source and adept at handling large data (Ihaka and Gentleman 1996; Johansson et al. 2012; McKinney 2013; Pedregosa et al. 2011). Furthermore, these computing languages have access to a wide range of libraries that allow for easy implementation of structural equations modelling.

Although machine learning and probabilistic methods typically perform better in prediction due to their enhanced capabilities of detecting trends in big data, system dynamics models provide more inferential capabilities by allowing researchers to test an expertise-based hypothesis with a complex causal structure. Nevertheless, recent advances in dynamical systems theory have allowed careful and effective parameterization using statistical learning and probabilistic methods (Brunton et al. 2016). Therefore, system dynamics can be robust while incorporating both human domain knowledge and advanced statistical tools. Parameters and initial conditions of the model are usually estimated using statistical means, market research data, analogous product histories, expert opinion, and any other relevant sources of data, quantitative or judgmental (Sterman 2001).

Finally, it is worth mentioning that simulation experiments can suggest the collection of new data and new types of experiments to run to resolve uncertainties and improve the model structure (Sterman 2001). Building model confidence is truly an iterative process that requires robust statistical testing.

3 Exercise

Having completed an overview of CLDs, stock and flow diagrams, and system dynamics, this section will now proceed to simulate a system dynamics model for application in global health. Specifically, we apply the system dynamics model to undertake a case study on research data generated by clinical trials, electronic medical records (EMR), and patients. This is an important area of research because movement of data in the global health research system has the potential to impact treatment development globally.

The current premise in global health research is that using clinical trials—and thus using the underlying data for clinical trials—will lead to better treatment development and public health outcomes (Rosala-Hallas et al. 2018). However, in order to bring the right treatment to the right patient at the right time, the process must utilize data from, and contribute data to, a larger global health data ecosystem. Generating real time, pragmatic evidence is not enough; a human-centered data ecosystem will learn from the experience of real patients in real-time, employing all tiers of biological and non-biological data, across therapeutic areas and stakeholders, to better respond to individual and population-based health needs.

Our question seeks to address the relationship between patient EMR data and clinical trial data, and whether the former can complement existing global health research data sources to enhance our understanding of human disease progression, with the ultimate goal of improving general health outcomes globally. This problem helps us think about the types of policy-based changes that might be necessary for governments, research organizations, and health-service organizations (e.g., providers, hospitals, and clinics) to encourage sharing and use of proprietary data (EMR and clinical trial data). This, we hope, will help identify the types of feedback loops necessary to facilitate better data flows in global health research and make medical breakthroughs benefitting the entire world.

This exercise will proceed in five parts. First, we identify the key variables in the system. Second, using CLDs and their components (feedback, stock/flow, time delay, non-linearity), we conceptually visualise how data flows within the current global health research system. Third, we derive equations from the variables and the CLDs. Fourth, we model these equations computationally in R. Lastly, we share the types of policy questions and directions that may be run on the model.

3.1 Identifying the System and Variables

When choosing variables, it is important to use variables that describe patterns of behaviour and activity in global health, rather than specific singular events (Kim 1992). Further, it can be helpful to think about the types of variables that affect the problem the most, and which the least. In this regard, subject matter experts should be consulted to broadly identify key factors affecting the model in the form of expert elicitation.

In our case study, four stock variables were ultimately identified as critical to the movement of data within the global health landscape: patients in hospitals, shared EMR data, health research data, and available treatments. These four variables were chosen because we are interested in the amount of data generated (EMR and clinical) and the impact data has on public health and research (patients and treatment availability). Further, these variables represent the units of measure for data (Shared EMR data and Health Research Data), as well as a surrogate measure for general health (Patients in hospitals) and scope of health products (Available treatments).

We brainstormed some of the key factors that would affect the aforementioned stock variables (Table 6.1). In a real exercise, these factors would be reviewed by experts to confirm their suitability for the model, including whether they could be easily measured and monitored. Moreover, it must be mentioned that a number of variables were deliberately not included in the system conceptualization of global health research, including disease prevention, intellectual property governance, research infrastructure, and workforce.

Table 6.1 Brainstorming variables for systems conceptualization

3.2 Causal Loop Diagrams (Feedback, Stock/Flow, Time Delay, Non-linearity)

Next, we proceed to establish the links between the stock variables on the CLD, the polarity or direction of each link, the stocks and flows, and the identification and labelling of the reinforcing or balancing loops in the diagram. For example, we know that one of the primary causal links that drives public health outcomes is the number of approved treatments (stock variable) with an inflow variable called approval rate. Similarly, approval rates are affected by research productivity and total number of clinical trials, both of which form reinforcing loops. On the other hand, research and development budgets have a balancing effect on the total available treatments.

We have identified two balancing loops in Fig. 6.5: loop B1, indicated by the red counter-clockwise arrow, and loop B2, indicated by the clockwise arrow. B1 regulates the amount of EMR data being generated, and B2 helps control the number of patients admitted to the hospital. The reinforcing loop R1 amplifies the system; the positive (+) signs at its arrowheads indicate that the effect is reinforcing in a positive direction. We show only these three feedback loops as examples, but it is worth noting that this CLD contains other feedback loops that also contribute to the behavior of the system.

Fig. 6.5 CLD showing data flow in global health research

In our hypothetical system, we assumed that hospitals directly collect the data from the patients and their EMR. The patients have the right to share their data, which determines the amount of data that can be shared; however, the amount of data that is actually shared is influenced by their willingness to share it. As an example, the threat of cyber attacks on EMR systems affects the patients’ willingness to share. This effect is captured by the feedback loop B1. In loop B2, the single negative polarity between general health outcomes and sick fraction causes the entire feedback loop to be balancing. This loop signifies the tradeoff between the number of sick people in hospitals generating data that contributes to treatment development and general health outcomes. In other words, it is assumed that having a healthy-only population would stall the treatment development process.

In feedback loop R1, we assume that pharmaceutical companies are the sole sponsors of clinical trials, having direct access to both EMR and clinical trial data (health research data). The number of available treatments is directly linked to research & development budget. Furthermore, we assumed that all of the research & development budget is spent on conducting clinical trials (represented as number of clinical trials) in each time period. The cost of clinical trials is positively correlated with number of enrolled subjects in clinical trials (enrollment per trial).

Although a host of other variables and loops, both reinforcing and balancing, could be identified as being relevant in the process, care was taken to keep the conceptual diagram as simple as possible in order to ensure a parsimonious model. Thus, only the dominant loops were reflected in the CLD, which signify the behaviour of the system shifting from acceleration to deceleration, and gradual equilibrium. As mentioned, the CLD may be revised a number of times as understanding deepens and the multidisciplinary process unfolds.

3.3 Constructing System Dynamics Equations

Based on the CLD/stock & flow diagram that we developed in Fig. 6.5, we can formulate the conceptual model into mathematical equations. To illustrate how one would formulate a stock and flow equation, let us look at the stock variable ‘patients in hospitals’.

For patients in hospitals \(P\), we can deduce that there is one primary inflow and one primary outflow: Admissions \(IP\) and Discharges \(OP\). We assume that no one dies in our fictional hospital. As a result, our equation looks like the following:

$$\frac{dP\left( t \right)}{dt} = IP\left( t \right) - OP\left( t \right)$$
(6.2)

The inflow, admissions \(IP\), in our model is simply equal to the sick population \(SP\) (auxiliary variable), which assumes that all sick people go directly to the hospital in our fictional world. For real-life applications, we could include a delay variable and/or a constraint on the impact of sick people on hospital admissions. Thus,

$$IP\left( t \right) = SP\left( t \right)$$
(6.3)

The outflow, discharges \(OP\), can be expressed as a function of patients if we model it as a fraction, \(\lambda\), of patients being discharged from the hospital. As mentioned previously, this value is subtracted in Eq. (6.2) because it is an outflow.

$$OP\left( t \right) = \lambda \cdot P\left( t \right)$$
(6.4)

As a result, we substitute the expressions for \(IP\) and \(OP\) from Eqs. (6.3) and (6.4) into Eq. (6.2) and rewrite the differential equation for the stock variable \(P\) as

$$\frac{dP\left( t \right)}{dt} = SP\left( t \right) - \lambda \cdot P\left( t \right)$$
(6.5)
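
A minimal Python sketch of Eq. (6.5) is given here for illustration. It holds the sick population \(SP\) constant, which is a simplifying assumption (in the full model \(SP\) is an auxiliary variable that changes over time), and it uses a plain forward Euler step. The function name `simulate_patients` and all numerical values are hypothetical.

```python
def simulate_patients(p0, sick_population, discharge_fraction, dt=0.1, steps=1000):
    """Integrate dP/dt = SP - lambda * P with forward Euler steps."""
    patients = p0
    history = [patients]
    for _ in range(steps):
        admissions = sick_population                   # Eq. (6.3): IP = SP
        discharges = discharge_fraction * patients     # Eq. (6.4): OP = lambda * P
        patients += dt * (admissions - discharges)     # Eq. (6.5)
        history.append(patients)
    return history


# Assumed values: 100 sick people per time unit, 20% of patients discharged per time unit.
history = simulate_patients(p0=0.0, sick_population=100.0, discharge_fraction=0.2)

# With constant SP, the stock settles where inflow equals outflow: P = SP / lambda.
equilibrium = 100.0 / 0.2
print(round(history[-1], 3), equilibrium)
```

Under these assumptions the number of patients rises quickly at first and then levels off at \(SP/\lambda\), the point at which admissions and discharges balance.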

It is also worth mentioning that if we continue substituting auxiliary variables into the flow variables of our stock differential equations, we can mathematically reduce the entire model to a system of only four differential equations, because our system has only four stock variables and each differential equation corresponds to one stock variable.

We present several examples in the following paragraphs to illustrate the logic behind formulating a balancing feedback loop. This includes three auxiliary variables, one inflow variable, and one stock variable based on expert knowledge. Take a look at the balancing feedback loop B1, highlighted in Fig. 6.6:

Fig. 6.6 Feedback loop B1

Since security risk \(\alpha\) is fixed in our model, we assume that more people sharing their data leads to a higher frequency of security breaches. Therefore, security breaches \(S\) is defined as a function of security risk (the percentage of data that is compromised) and the amount of shared EMR data \(D_{S}\).

$$S\left( t \right) = \alpha \cdot D_{S} \left( t \right)$$
(6.6)

In turn, security breaches negatively affect the willingness to share EMR data \(WS\), following the logic that fewer people are willing to share their personal information if they observe a higher incidence of security threats. This negative polarity makes feedback loop B1 balancing. Therefore, we must choose a mathematical function in which the dependent variable decreases as the independent variable increases. We can express such an inverse relationship between \(WS\) and \(S\) as

$$WS\left( t \right) = \frac{1}{\beta \cdot S\left( t \right)}$$
(6.7)

where \(\beta\) is the sensitivity of patients to security risks. The willingness to share data positively influences the data shared by each patient (labeled as data shared per patient and symbolized as \({\widehat{D_{S}}}\)) through a coefficient \(\gamma\):

$${\widehat{D_{S}}} \left( t \right) = \gamma \cdot WS\left( t \right)$$
(6.8)

Proceeding along the link, we get to the inflow variable, creation of shared EMR data \(ID_{S}\), which is the number of patients \(P\) multiplied by the data shared per patient \(\widehat{D_{S}}\):

$$ID_{S} \left( t \right) = {\widehat{D_{S}}} \left( t \right) \cdot P\left( t \right)$$
(6.9)

Finally, the creation of shared EMR data directly feeds into the shared EMR data stock.

$$\frac{{dD_{S} \left( t \right)}}{dt} = ID_{S} \left( t \right)$$
(6.10)
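The chain of Eqs. (6.6) to (6.10) can be followed step by step in a short sketch. The Python code below is an illustration only: it holds the number of patients \(P\) constant (in the full model \(P\) is itself a stock), uses hypothetical parameter values, and assumes a positive initial stock of shared data, since Eq. (6.7) is undefined when \(S = 0\). Under these assumptions the loop collapses to \(dD_{S}/dt = \gamma P / (\alpha \beta D_{S})\), whose solution \(D_{S}(t) = \sqrt{D_{S}(0)^{2} + 2\gamma P t/(\alpha\beta)}\) serves as a check on the simulation.

```python
import math


def b1_inflow(d_s, p, alpha, beta, gamma):
    """Walk around loop B1 and return the inflow ID_S for the current stock D_S."""
    s = alpha * d_s          # Eq. (6.6): security breaches
    ws = 1.0 / (beta * s)    # Eq. (6.7): willingness to share
    d_hat = gamma * ws       # Eq. (6.8): data shared per patient
    return d_hat * p         # Eq. (6.9): creation of shared EMR data


def simulate_shared_data(d0, p, alpha, beta, gamma, t_end, dt):
    """Forward Euler integration of Eq. (6.10) with P held constant."""
    d_s = d0
    for _ in range(int(round(t_end / dt))):
        d_s += dt * b1_inflow(d_s, p, alpha, beta, gamma)
    return d_s


# Hypothetical values, chosen only for illustration.
D0, P, ALPHA, BETA, GAMMA = 10.0, 50.0, 0.05, 2.0, 1.0
T_END = 20.0

simulated = simulate_shared_data(D0, P, ALPHA, BETA, GAMMA, T_END, dt=0.001)
closed_form = math.sqrt(D0**2 + 2.0 * GAMMA * P * T_END / (ALPHA * BETA))
print(f"simulated: {simulated:.2f}, closed form: {closed_form:.2f}")
```

The balancing character of B1 shows up in the inflow: the larger the stock of shared data, the more breaches occur, and the smaller the inflow of newly shared data becomes.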

For a complete listing of all the equations in our model, please refer to the supplementary Jupyter notebook.

3.4 Modelling and Data Integration

As we have demonstrated, we can convert our CLD/stock and flow diagram into a system dynamics model by assigning a mathematical equation to each causal link. After populating each link with an equation, we are able to solve the entire system numerically as a set of ordinary differential equations, using established methods for solving differential equations.

The system dynamics model described in the previous section was implemented in R, a statistical programming environment. As previously mentioned, R is a robust programming language that allows users to develop their own numerical solvers or to access a wide variety of libraries for statistical analysis and numerical methods. In our example, we used the packages deSolve (Soetaert et al. 2010) and FME (Soetaert and Petzoldt 2010) to compute the dynamic behavior and to parameterize our coefficients, respectively. Solving these models requires a numerical method (e.g., the forward or backward Euler method, or the Runge-Kutta family of solvers). These solvers are implemented in the deSolve library in R, which makes them easy to apply to any system dynamics model. The reader may refer to the accompanying Jupyter notebook, available at the following URL, which contains the steps needed to implement the system dynamics model: https://github.com/scarygary89/SimpleHealthDataSharingSD/blob/master/HealthDataSharingModel.ipynb.
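To illustrate why the choice of numerical method matters, the sketch below compares a forward Euler step with a classical fourth-order Runge-Kutta (RK4) step on Eq. (6.5). It is a hand-written Python illustration, not the deSolve implementation used in the notebook; the sick population is assumed constant and all values are hypothetical. A deliberately coarse time step is used so that the difference between the two methods is visible.

```python
import math


def euler_step(f, t, y, dt):
    """One forward Euler step."""
    return y + dt * f(t, y)


def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2.0, y + dt / 2.0 * k1)
    k3 = f(t + dt / 2.0, y + dt / 2.0 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(step, f, y0, t_end, dt):
    """Advance y from t = 0 to t_end with a fixed step size."""
    t, y = 0.0, y0
    for _ in range(int(round(t_end / dt))):
        y = step(f, t, y, dt)
        t += dt
    return y


# Hypothetical values, chosen only for illustration.
SP, LAM, P0, T_END, DT = 100.0, 0.2, 0.0, 10.0, 0.5


def patients_rhs(t, p):
    """Right-hand side of Eq. (6.5) with constant SP."""
    return SP - LAM * p


exact = SP / LAM + (P0 - SP / LAM) * math.exp(-LAM * T_END)
euler_error = abs(integrate(euler_step, patients_rhs, P0, T_END, DT) - exact)
rk4_error = abs(integrate(rk4_step, patients_rhs, P0, T_END, DT) - exact)
print(f"Euler error: {euler_error:.4f}, RK4 error: {rk4_error:.6f}")
```

With the same step size, the RK4 estimate lies much closer to the closed-form solution than the Euler estimate, at the cost of four evaluations of the right-hand side per step instead of one.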

System dynamics models can be developed to formalize logic without much data. However, a model’s usefulness is greatly enhanced if its parameters can be estimated from data, for example by maximum likelihood estimation, with confidence intervals obtained using likelihood methods or bootstrapping (Dogan 2007). The parameters that we adjust to fit the model are the constants of the system dynamics model. At times, initial values may also be treated as parameters. Figure 6.7 shows how the simulated trendline produced by the fitted parameters (teal line) compares with the “actual data” (red points). For demonstration purposes, the data in Fig. 6.7 are generated synthetically and are not based on any real dataset.

Fig. 6.7 Model results. Dynamic trendlines (teal line) of our system dynamics model for the four stock variables (patients in hospitals \(P\), shared EMR data \(D_{S}\), health research data \(D_{R}\), and available treatments \(TR\)) and five auxiliary variables (general health outcomes \(H\), willingness to share \(WS\), research & development budget \(R\), motivation to join a clinical trial \(MT\), and security breaches \(S\)) compared with data points (red points)

The parameter values resulting from the calibration are listed in Table 6.2.

Table 6.2 Parameter values based on model fitting with data

Based on these constants, we have enough information to replicate the model and develop a baseline scenario against which we can test alternative policy scenarios. These parameters are obtained by maximum likelihood estimation, which calibrates each parameter so as to minimize the residuals (errors) and best fit the data.
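The idea of calibrating a parameter by minimizing residuals can be sketched for a single constant. The Python example below is an assumption-laden illustration, not the FME-based procedure used in the notebook: it generates synthetic observations from the closed-form solution of Eq. (6.5) with \(P(0) = 0\) and constant \(SP\), adds Gaussian noise, and recovers the discharge fraction \(\lambda\) by minimizing the sum of squared residuals with a golden-section search. If the errors are assumed to be independent and normally distributed with constant variance, this least-squares estimate coincides with the maximum likelihood estimate. All names and values are hypothetical.

```python
import math
import random


def patients_model(t, sp, lam):
    """Closed-form solution of Eq. (6.5) for constant SP and P(0) = 0."""
    return sp / lam * (1.0 - math.exp(-lam * t))


def sum_squared_residuals(lam, times, observed, sp):
    """Objective function: squared distance between model and data."""
    return sum((patients_model(t, sp, lam) - y) ** 2
               for t, y in zip(times, observed))


def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize a one-dimensional function that is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


# Synthetic "observations": the true lambda is known only because we generate the data.
rng = random.Random(42)
SP, TRUE_LAM, NOISE_SD = 100.0, 0.2, 5.0
times = [float(t) for t in range(1, 31)]
observed = [patients_model(t, SP, TRUE_LAM) + rng.gauss(0.0, NOISE_SD) for t in times]

fitted_lam = golden_section_min(
    lambda lam: sum_squared_residuals(lam, times, observed, SP), 0.01, 1.0
)
print(f"true lambda: {TRUE_LAM}, fitted lambda: {fitted_lam:.4f}")
```

A full model has many parameters, so in practice a multidimensional optimizer replaces the one-dimensional search used here.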

Error analysis can be conducted to calculate standard statistical measures similar to those used for regression models. We can conduct hypothesis testing on each parameter by taking as the null hypothesis that the parameter is equal to zero, testing it against the alternative hypothesis that the parameter is equal to the calibrated value, and comparing the error distributions. This allows us to conduct a t-test and calculate p-values, assuming normality in the error distribution. Other parameter validation methods include bootstrapping (refer to Rahmandad et al. 2015a) and the method of simulated moments (refer to Rahmandad et al. 2015b), which can help the modeler build confidence in the estimation of each parameter. To further understand the parameters and their sensitivities, the modeler may wish to perform a Markov chain Monte Carlo (MCMC) analysis (refer to Rahmandad et al. 2015c).

3.5 Interpreting Results and Policy Directions

Once the model is calibrated and running, researchers may wish to use it for testing targeted policy questions. For the purposes of this exercise, we are less concerned with the actual results since we are only illustrating the types of questions that may be of interest to global health policy-makers wishing to utilize system dynamics.

Three particular insights produced from our hypothetical CLD led us to consider the impact of cyber security attacks (security risk), data capture and interoperability in clinical trials (data generated per trial), and machine learning and artificial intelligence algorithms (data technology) on the global health research system. These areas of interest led us to propose the kinds of questions that may be run on the model we have generated:

  • Assuming an increase in data security attacks, where can policy-makers best target resources to ensure patients continue to contribute health data?

  • How might birth-to-death collection of data impact the cost of clinical trials? How does the collection of data both from sick and healthy people impact the price of clinical trials?

  • How much money can be invested in new data technology to result in a two-fold decrease of patients in hospitals?
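Questions of this kind are typically explored by rerunning the model with one parameter changed and comparing the result against the baseline. As a minimal, purely illustrative sketch of the first question, the Python code below reuses only feedback loop B1 (Eqs. 6.6 to 6.10), holds the number of patients constant, and compares a baseline security risk \(\alpha\) with a doubled value. The parameter values and scenario names are hypothetical and are not the calibrated values of Table 6.2.

```python
def shared_data_after(alpha, d0=10.0, p=50.0, beta=2.0, gamma=1.0,
                      t_end=20.0, dt=0.001):
    """Simulate loop B1 with forward Euler and return the final shared EMR data stock."""
    d_s = d0
    for _ in range(int(round(t_end / dt))):
        breaches = alpha * d_s                    # Eq. (6.6)
        willingness = 1.0 / (beta * breaches)     # Eq. (6.7)
        d_s += dt * gamma * willingness * p       # Eqs. (6.8) to (6.10)
    return d_s


# Hypothetical scenarios: the same model, with only the security risk changed.
scenarios = {"baseline": 0.05, "higher_security_risk": 0.10}
results = {name: shared_data_after(alpha) for name, alpha in scenarios.items()}

for name, value in results.items():
    print(f"{name}: shared EMR data at end of run = {value:.1f}")
```

In this reduced setting, a higher security risk leaves less shared data at the end of the run. In the full model the effect would also propagate to the research data and treatment stocks, which is where a policy comparison becomes informative.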

4 Uses and Limitations

Our generation faces unimagined levels of information generated every second by new and existing actors. System dynamics complements existing modelling and simulation methodologies for navigating policy questions affecting data-rich ecosystems. Unlike the predictive and explanatory powers of machine learning and probabilistic methods, system dynamics is simply a tool for formalizing and quantifying complex relationships.

Like all models, system dynamics models are ‘wrong’ because they cannot capture all of the complex relationships in a system and because of the limits of human rationality. It must be noted that modelling complex systems requires an understanding of the dynamic behavior of variables and of the range of possible parameter values, which can lead to system uncertainties. System dynamics attempts to understand complexity by incorporating the knowledge of the modeler and their collaborators into a logical formalism that allows a mathematical structure to be developed. However, parameterization of a complex model can be difficult due to the “curse of dimensionality”, and some have proposed methods to deal with this issue (Bellman 1957; Ye and Sugihara 2016). Finally, the validation of system dynamics models depends on the availability of data and the opinions of domain experts. For more information on this topic, please refer to others who have written comprehensive reviews of the limitations of system dynamics (Featherston and Doolan 2012).