
1 Introduction

This paper discusses the ongoing development of DEAR, a decision-making framework intended to facilitate decision making in complex and uncertain environments. Systems are becoming increasingly interactive: the feedback from one set of decisions becomes input for another. These problems span many industries, including web-based advertising, automated investing, real-time user recommendations, and industrial processes. Most techniques for facilitating decision making with Decision Support Systems (DSSs) assume that the input data is neither complex nor non-linear [1, 2]. As data becomes increasingly novel, voluminous, and complex, and as feedback becomes increasingly common within online systems, researchers and practitioners need techniques that help users take action confidently. By incorporating a technique for complex causal inference and risk mitigation recommendations into a DSS, this research aims to extend the usefulness of DSSs to these modern data domains.

Beneath the concerns about facilitating decision making in complex systems lies the challenge of supporting causal inference and risk assessment. A technique called Convergent Cross-Mapping (CCM) allows causal analysis in non-linear systems [3, 4]. The authors have already applied this technique to novel domains and have extended its scalability so that it can apply to larger data sets [5, 6]. The authors have also used the Kelly criterion to govern the optimal level of action to take in an environment given a desired level of risk. By combining these two techniques, the authors propose the DEAR framework, which allows the creation of DSSs that support inference in non-linear systems.

This paper first discusses background information related to DSSs, non-linear systems, decision-making loops, and CCM. It then defines the problem and recommends the DEAR approach as a solution. It discusses how this framework helps solve the problem and presents an example use case. Finally, it discusses the results of applying the framework to that use case and summarizes the next steps for further refining this set of ideas.

1.1 Decision Support Systems

There are several primary ways that DSSs can facilitate decision making. First, they can help reduce cognitive load. Well-known information visualization techniques allow users to quickly compare the behavior of data-defined concepts; for example, showing multiple line charts on the same axes can help the user spot changes both in individual time series and in groups of them. Second, DSSs can support cognitive scaffolding by using language and page layouts to augment a user's ability to understand the relationships between data points. For example, an interface may include a language tree of terms related to the decision space. Third, DSSs can support attribution formation by helping users form and evaluate hypotheses, for instance by including information from multiple sources within a single chart. Finally, DSSs can support the decision-making process by guiding a user from exploration to hypothesis formation, to evaluation, and ultimately to action. While these four types of cognitive support are powerful, they have pronounced limitations when applied to complex, non-linear domains.

The need for action recommendations.

While DSSs often do a good job of providing context and helping users compare alternative hypotheses, they typically leave it to the user to figure out which decision to make [1, 7, 8]. Without clear support for considering the likely impacts and marginal contribution of specific sets of actions, DSSs force the user to guess which type of action to take and how much of that action to take. This can result in a lack of confidence in the decision, a decline in trust in the DSS, and a lack of adoption of the proposed solution. For this reason, this paper proposes an approach to supporting decision making that helps guide actions.

The need for detection-oriented approaches.

Another challenge facing DSSs is the number of variables present in many modern systems, which makes it difficult to visualize information and to support hypothesis generation and evaluation. For this reason, many DSSs have recently begun incorporating machine-assisted features that recommend areas for consideration. While this is a good first step, it does not scale well since the number of possible interactions grows combinatorially. For this reason, this paper proposes an approach that limits the number of possible combinations considered, thereby minimizing the computational effort required to form comparisons.

The need for concept learning.

One of the reasons it can be hard to recommend an action is that there is limited or no historic data on actions and their success. Additionally, in complex or non-linear environments, the user may encounter concepts or types of interactions that they have not seen before. Such concepts are novel and typically have no names. A limitation of most modern DSS tools that utilize ontological scaffolding is that they require both comprehensive domain knowledge and mappings of that domain to the data set [1]. Recent examples of cognitive tools suggest that such tools are beginning to infer groups of examples based upon relations between their properties [8]. This form of unsupervised learning is helpful since it allows structured inference to take place in the absence of clearly defined or clearly mapped topologies.

1.2 Linear and Non-Linear Decision Making Contexts

A complex system is one that contains many processes or components whose behavior is a function both of each other's states and of their own previous states [3]. The feedback of information from previous states into future states creates non-linear dynamics such that traditional causal analysis becomes much more difficult [2]. In such systems, stable states, called attractors, tend to emerge around the boundaries where the feedback force is much stronger than the noise within the system. Conversely, points where the noise is similar to or greater than the feedback force represent periods of phase transition. If one variable causes another, then embedded states of these variables will tend to concomitantly be either part of an attractor or part of a phase shift. By examining historically offset time series values of each process or component, a non-linear causal technique can examine the relationship between variables in complex systems. Takens' theorem describes and proves the efficacy of this type of complex inference.
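
As an illustration of these historically offset values, the following minimal Python sketch constructs a time-delay embedding of a single series; the function name and the embedding dimension E and lag tau used in the example are illustrative assumptions rather than part of the framework.

```python
import numpy as np

def delay_embed(x, E, tau):
    """Construct an E-dimensional time-delay embedding of series x with lag tau.

    Row t holds (x[t], x[t - tau], ..., x[t - (E - 1) * tau]), the embedded state
    used to reconstruct the system's attractor in the sense of Takens' theorem.
    """
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    return np.column_stack([x[start - i * tau : len(x) - i * tau] for i in range(E)])

# Example: embed a noisy seasonal signal in 3 dimensions with a lag of 2 samples.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50) + 0.1 * np.random.randn(len(t))
manifold = delay_embed(series, E=3, tau=2)
print(manifold.shape)  # (196, 3)
```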

As data becomes increasingly novel, voluminous, and complex, the presence of non-linear effects tends to increase. Non-linear systems are those whose previous states substantially affect their current state. In these systems, reducing cognitive load is challenging since it becomes unclear which data to concomitantly display. Facilitating attribution formation is challenging since it is difficult to infer how to sample data so as to prevent certain paradoxes, such as Simpson's paradox, from affecting the validity of analysis [10]. Scaffolding is challenging since the interactions between various data types often produce novel interactions that do not map well onto existing topologies. Finally, the decision-making process itself is more circuitous, and it is difficult to figure out where to recommend intervention.

1.3 Decision Making Feedback Loops

In complex environments, user feedback on the usefulness of DSSs is often either that the output is not actionable or that the system does a poor job of communicating the relationships between variables. Either way, users are unsure which action to take. DSSs for non-linear environments that offer clear action recommendations need to scaffold two things. First, the decision-making process should be fully scaffolded from problem identification through to action orientation; orienting an action requires both focusing upon where to act and recommending the degree of action to take. Second, there is an emphasis upon creating models whose results become part of the future model. Both of these concepts relate to uncertainty about which action to take. A common approach to solving both problems is the use of decision-making loops. In addition to creating a feedback-based approach, this approach must also account for action constraints such as the need to explore and the limited ability to take action.

The OODA loop.

This is one example of an action-oriented loop [7]. The framework asks the user to observe, orient, decide, and act. The process is intended for individuals making real-time decisions within their working environments. The loop is critical to the framework since it feeds historic outcomes back into future decisions.

Analyst loops.

Intelligent systems that actively involve the analyst in a decision-making loop have recently received increased attention [8]. This type of approach is naturally congruent with complex environments, since in such environments the future state of a variable depends both upon other variables and upon its own previous state.

Action budgets.

To truly capture the constraints that users face in applied settings, this approach must recognize that the user can typically take only one or a few actions [8, 9]. This limited ability to act results in constraints characterized by the action budget for a given environment. Such constraints arise from any number of limitations: time, computational power, human attention, process constraints, and so on. Whatever the source, the takeaway is the presence of an upper limit on the number of simultaneous actions a user can take. This means the approach must account for both the conditional and the marginal effects of an action or of inaction.

Reinforcement learning.

If the outcome of an action informs the future prediction as to which action a system should take, then that system engages in reinforcement learning [11]. The typical problem with such exploratory approaches is that they must balance pursuing the historically optimal solution against the need to ensure that the system has explored the space of possible solutions. While this exploration-exploitation trade-off lies outside the specific problem this approach strives to solve, it is one that algorithms such as online bandit approaches address. This approach learns from that work and recommends that DSSs balance the potential reward of an action against the underlying risk associated both with taking the action and with taking no action.

1.4 Problem Statement

Driving actions in complex systems is challenging because the future state of these systems is highly dependent upon former states. Control variables are thus often a function not only of other variables but also of their own former states. In such environments, traditional causal methods break down, leading to a lack of interpretability and an inability to take action based upon the results from traditional DSSs. To combat this, an approach needs to consider practical factors such as the computational complexity of the model and the organizational risk associated with certain actions. Finally, the approach needs to account for the ability of the user or of the system to take action as part of its recommendations.

2 Description of DEAR Framework

The DEAR framework proposes a four-step approach to guiding non-linear decision making that extends the OODA loop, in which the user observes, orients, decides, and then acts. DEAR stands for Detect, Evaluate, Assess, and Recommend: the framework detects opportunities for action, evaluates root cause, assesses risk, and then recommends a degree of action to take. First, the system monitors a multitude of variables for unexpected changes in real time. Second, if it detects an unexpected change, it evaluates the causal relationships among the variables to examine which factors could plausibly explain the change. Third, it assesses the risk and expected payout of the candidate actions. Fourth, it recommends a degree of action that balances exploration with exploitation. This final step acknowledges the iterative nature of decision making necessary to achieve success in non-linear systems. By providing mathematical definitions alongside practical examples, the authors hope to present an approach upon which researchers and practitioners can build.

2.1 Detecting Changes

The first technique used to support DSSs in non-linear environments is automated change detection. The goal is to detect meaningful changes in real time so that users or machines can take action using the subsequent stages of this approach. Change detection is thus critical for increasing awareness of unexpected events and for reducing the number of subsequent comparisons.

Change detection can take place using any of a variety of techniques. Regardless of the method, this approach starts by building a model of each variable [12]. It then notifies the system when incoming examples are unexpected given their historic context. An example can be substantially different for several reasons: its combination of features is rare, its magnitude is statistically unlikely, its rate of occurrence is unlikely, its change from the previous value is larger than normal, and so on. In such cases we can think of this either as anomaly detection or as outlier detection. Two common families of techniques are outlier detection and surprise-based detection.

Outlier detection.

There are many forms of outlier detection [12]. Each typically examines the relevant distribution and the percentile at which an incoming value falls on that distribution. If the position of the incoming value is above a certain pre-defined threshold, then that example is an outlier [10]. Common forms include value-based detection using the normal distribution, variance-based detection using the F-distribution, and rate-based detection using the Poisson distribution. If an example exceeds the threshold on the relevant distribution, then it is considered rare. The technique comes in one-tailed and two-tailed variants.

Surprise-based detection.

In addition, certain unsupervised clustering techniques group variables into sets based upon the properties of their attributes [12]. By examining the frequency of occurrence for sets of values, these techniques can determine whether a certain set of attributes is unlikely. Unlike outlier detection, similarity-based techniques typically have some final objective used to determine the rarity of a set of variables; the simplest example would be the chain product rule. As with outlier detection, there is a threshold past which examples are rare.

Since this approach is primarily concerned with combining several different processes, the rest of this paper considers one of the simplest possible forms of detection: value-based outlier detection using the normal distribution. Rarity will be set at the p = 0.99 level, meaning that, on average, this approach will only consider about 1% of incoming examples. Future work may focus upon automatically learning this threshold so that it results in less subsequent computational effort.
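
A minimal sketch of this value-based, one-tailed detector is given below; it fits a normal distribution to a rolling window of historic context, and the window length and function names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def detect_outliers(values, threshold=0.99, window=52):
    """Flag incoming values whose one-tailed normal percentile exceeds `threshold`.

    Each value is compared against a normal distribution fitted to the preceding
    `window` observations (the historic context). Returns indices of flagged points.
    """
    values = np.asarray(values, dtype=float)
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        mu, sigma = history.mean(), history.std(ddof=1)
        if sigma == 0:
            continue
        percentile = stats.norm.cdf(values[i], loc=mu, scale=sigma)
        if percentile > threshold:  # one-tailed: unusually large values only
            flagged.append(i)
    return flagged
```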

2.2 Evaluating Causality

Once this approach detects an unexpected change, the next step is to examine which variables could have caused it. This evaluation process uses causal inference to determine which factors could have contributed to the observed outcome. Two common types of causal inference are frequentist and probabilistic methods, and both carry experimental design constraints. Frequentist methods such as ANOVA require either a within-groups or a between-groups experimental design [10]. Probabilistic methods such as Bayesian inference require a proper understanding of a priori independence. While both families are useful, they are not helpful in complex environments where traditional design-of-experiments approaches are not possible and where a priori knowledge of independence is difficult to surmise. For this reason, this approach uses the CCM technique for causal inference.

Convergent Cross-Mapping (CCM).

Traditional experimental design focuses upon understanding the direct relationship between a dependent and an independent variable. In such a case, any change in the independent variable must precede a related change in the dependent variable. Over time this precedence leads to a parametric relationship upon which researchers can establish significance. This approach runs into problems when used to analyze complex environments. For example, it can lead to the identification of spurious relationships. Paradoxically and perniciously, it can also result in an inability to define causal relations for systems with longer embeddings. For this reason, a DSS designed for use in complex environments must employ a different technique for assessing causality.

Fortunately, there are already approaches researchers can use. CCM is one such approach, and it primarily considers the level of embedding between variables [2,3,4,5]. The technique starts by creating time-delayed embeddings, or manifolds, of the variables in question. It then uses the manifold of the test variable to predict the manifold of the control variable. By performing this process many times, the technique obtains a set of predicted values alongside a set of actual values and examines the correlation between them; correlation here represents the unit-normalized covariance between the predicted and actual values.
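
The following simplified sketch illustrates the cross-mapping step described above. It computes the skill at a single (full) library length rather than across increasingly large libraries, uses a common simplex-style nearest-neighbour weighting, and its names and default parameters are assumptions rather than the authors' implementation.

```python
import numpy as np

def cross_map_skill(x, y, E=3, tau=1):
    """Return the paper's 'y xmap x' value: how well y's shadow manifold recovers x.

    A high value suggests x drives y, because y then contains information about x.
    """
    def embed(s):
        start = (E - 1) * tau
        return np.column_stack([s[start - i * tau : len(s) - i * tau] for i in range(E)])

    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    My = embed(y)                                 # shadow manifold of y
    x_aligned = x[(E - 1) * tau:]                 # x values aligned with rows of My
    preds = np.empty(len(My))
    for t in range(len(My)):
        d = np.linalg.norm(My - My[t], axis=1)
        d[t] = np.inf                             # never use the point itself
        nn = np.argsort(d)[: E + 1]               # E+1 nearest neighbours on My
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12)) # simplex-style exponential weights
        w /= w.sum()
        preds[t] = np.dot(w, x_aligned[nn])       # estimate x(t) from y's manifold
    return np.corrcoef(preds, x_aligned)[0, 1]
```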

An astute reader may question why this technique uses correlation to test for causation. Because the technique considers the correlation of the embedded manifolds, it is effectively examining the postulates of Takens' theorem [2]. Broadly stated, if one variable causes another, then the influenced variable will contain information about its cause. This is an information-based approach to establishing causal relationships. By measuring how well the test variable can predict the control variable, the technique examines the amount of information present in the test variable about the location of the control variable as it vacillates between being bound to an attractor and undergoing phase shifts. CCM provides the correlation between the predicted and actual values of the control variable (Table 1).

Table 1. CCM output for variables A and B alongside the related causal inference.

The CCM technique outputs a correlation value for a set of increasingly large library lengths. While each length yields a value, the reported output is the value from the final, largest library length. By examining the manifold relationship between variables at various numbers of examples, this technique can examine the rate at which the CCM relationship converges to a stable value. Consider two variables A and B. By using CCM to test for causality in both directions (i.e., A xmap B and B xmap A), CCM can determine which of four cases holds: both values are significant (bidirectional causality), only A xmap B is significant (B drives A), only B xmap A is significant (A drives B), or neither is significant (no detectable causal relationship).

Whereas correlation is symmetrical, the output of CCM is not. Recall that CCM tests for causality by examining the flow of information between variables. Because of this, and seemingly paradoxically, if the value for A xmap B is high then this implies that B is driving A, since A contains information about B. The definition of "high" is left to the user to interpret. As with other measures of correlation, one could examine the significance of this value at the p = 0.05 level to determine that the causal relationship is stronger than a random interaction.

2.3 Assessing Relative Risk

At this point the DSS has detected an unexpected event and has established the causal relations between this event and other known variables. Before the DSS can move towards choosing which action to take, it first needs to determine whether taking an action is likely to be successful. Doing this requires forming probabilistic inferences.

The output of the CCM technique is a correlation describing how well the variance in one set of variables can be explained given knowledge of a different set. However, CCM also offers a stronger assertion: this value reflects the probability that the information in one set will carry through to another, which is related to the probability that movement in one set will cause the other to move in the same direction. Thus, the square of the CCM output represents the expected amount of variance that the source variables can control.

Kelly’s criterion.

This formula is a probabilistic technique for determining the ideal amount of action to take given a set of constraints [9]. There are two inputs to the formula: the probability that the action will result in success and the payout if the action is successful. Used together, these determine the degree to which an agent should commit to an action tied to a specific outcome, based upon the likelihood of that outcome to effect change and the expected payout. The more accurately one can determine the odds of success and the likely payout, the more useful the formula becomes. Where w is the total funds available to invest, p is the probability of success, b is the odds (payout), and k is the funds to invest, the criterion states:

$$ k = w\,\frac{p(b + 1) - 1}{b} $$
(1)

As mentioned above, the CCM correlation value is synonymous with causation. Furthermore, recall that each causal edge carries a normalized correlation value, such that it represents the probability that one variable can control another variable of interest.

The expected payout is an external factor determined by the user. It should be a function of the expected gain if the action taken is as successful as possible. The units of this gain are unimportant so long as the measure is continuous and the return on investment is expressed in the same units; if the investment is time, the return is time savings. If this factor is negative, it effectively forces the agent to take action to avoid the outcome. If the payout is expressed as the probability that an intervention will succeed, convert it to an odds value using:

$$ \text{Odds} = \frac{\text{Probability}}{1 - \text{Probability}} $$
(2)

Once these factors are incorporated into the formula, the result tells the user how much of an action to take given a limited pool of resources to commit. This pool of resources, called a wallet, is expressed in the same units as the payout factor. The size of the wallet should represent the total risk the user is willing to lose over a long period of time. For example, if the unit of risk is dollars, the wallet may represent a budget used for maintaining infrastructure. The result of the equation determines which variable the user should act upon and the degree to which the user should commit resources given a fixed budget. This approach works well for cases where the agent takes only one type of action.
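
A minimal sketch of this calculation is shown below. It assumes the squared CCM value serves as the success probability p, as described above; the function names are hypothetical, and the illustration borrows the case-study values from Sect. 4 (CCM = 0.80, odds 9:1).

```python
def kelly_fraction(p, b):
    """Eq. 1 rearranged as a fraction of the wallet: k / w = (p * (b + 1) - 1) / b."""
    return (p * (b + 1) - 1) / b

def recommended_commitment(wallet, ccm_value, odds):
    """Square the CCM correlation to get the probability of control, then apply
    the Kelly criterion; never recommend a negative commitment."""
    p = ccm_value ** 2
    return wallet * max(kelly_fraction(p, odds), 0.0)

# Illustration using the case-study values from Sect. 4: CCM = 0.80, odds 9:1.
print(recommended_commitment(wallet=100.0, ccm_value=0.80, odds=9))  # ~ 60.0
```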

Multiple actions types.

There are multiple ways to apply the Kelly criterion to multiple action types. The first is to select the single action that has the highest expected payout, where payout is the result of the equation. An alternative is to combine the expected return from a number of different options. So far this approach has assumed that all causal variables act independently; testing interaction effects is a good way to examine independence [10, 13]. With this knowledge, the DSS can normalize the expected payout for each variable by its marginal contribution, that is, the contribution above and beyond the interaction contribution. The resulting normalization gives the percentage of action to direct towards each option, as sketched below.
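
One possible reading of this normalization is sketched below: each action's Kelly fraction is weighted by its marginal contribution and the wallet is split proportionally. The exact weighting rule is an assumption made for illustration rather than a formula given in this paper.

```python
def split_budget(wallet, kelly_fractions, marginal_contributions):
    """Split a fixed wallet across several candidate actions.

    `kelly_fractions` are the per-action results of Eq. 1; `marginal_contributions`
    are each action's contribution above and beyond the interaction contribution.
    Shares are proportional to the marginally weighted Kelly fraction, and the
    total committed never exceeds the combined Kelly fraction (capped at 100%).
    """
    scores = [max(k, 0.0) * m for k, m in zip(kelly_fractions, marginal_contributions)]
    total_score = sum(scores)
    if total_score == 0:
        return [0.0] * len(scores)
    total_commit = min(1.0, sum(max(k, 0.0) for k in kelly_fractions))
    return [wallet * total_commit * s / total_score for s in scores]

# Two candidate actions with Kelly fractions 0.6 and 0.3 and marginal
# contributions of 0.7 and 0.3 respectively.
print(split_budget(100.0, [0.6, 0.3], [0.7, 0.3]))  # ~ [74.1, 15.9]
```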

Unintended consequences.

Finally, this approach can also estimate the potential consequences of taking an action (or actions) upon other variables, since users may wish to take the action that maximally changes one variable while leaving others unaffected. DSSs can combine this approach with the knowledge of interaction effects from above to help users prevent unintended consequences. Doing so requires setting negative expected odds for the variables the user wants to leave alone. The total expected payout then becomes the sum of the expected payout from the action or set of actions and the expected loss from the optional, negative-payout variables.

2.4 Recommending Action

At this point the DSS has detected unexpected changes in one or more variables, expressed the causal relations between them, and determined which set of actions has the highest expected payout. However, the system cannot simply recommend the highest payout, because doing so would not account for the remaining uncertainty about whether there are better options that the model has never considered [11]. To account for this, the approach must balance pursuing the optimal action with the need to occasionally explore alternatives. This requires choosing an initial minimum probability with which to pursue an entirely random course of action. Then, by keeping track of the actual payout for each action, the approach can determine whether the model is likely accounting for external factors. If the difference between actual and expected payout does not decrease as the model continues to learn, the model should increase the rate at which it takes random actions. This is similar to how multi-armed bandit algorithms hone in on the correct balance between exploratory and exploitative actions.
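
An epsilon-greedy sketch of this balance is shown below; the multi-armed bandit literature offers more principled schemes, and the initial rate, step size, floor, and ceiling here are illustrative assumptions.

```python
import random

def choose_action(expected_payouts, epsilon):
    """Epsilon-greedy selection: usually exploit the highest expected payout,
    occasionally explore a uniformly random alternative."""
    if random.random() < epsilon:
        return random.randrange(len(expected_payouts))
    return max(range(len(expected_payouts)), key=lambda i: expected_payouts[i])

def update_epsilon(epsilon, recent_errors, older_errors,
                   step=0.05, floor=0.05, ceiling=0.5):
    """Raise the exploration rate when the gap between actual and expected payout
    stops shrinking; lower it while the model keeps improving."""
    recent = sum(recent_errors) / len(recent_errors)
    older = sum(older_errors) / len(older_errors)
    if recent >= older:                  # prediction error not decreasing: explore more
        return min(ceiling, epsilon + step)
    return max(floor, epsilon - step)    # model improving: exploit more
```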

Sampling saves time.

Machine simulation using sampling techniques such as the Monte Carlo method or Gibbs sampling allows this approach to explore the possible outcomes associated with a series of actions [10]. While the approach can directly model the result of a given action, it requires many computations to consider the possible payouts for a series of actions. Sampling techniques simplify this computational burden.
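
For example, a simple Monte Carlo sketch such as the one below can approximate the distribution of long-run payouts for a repeated action without enumerating every possible sequence of outcomes; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_final_wallet(p, b, fraction, wallet=1.0, periods=52, runs=10_000, seed=0):
    """Monte Carlo sketch: distribution of the wallet after repeatedly committing
    `fraction` of it to an action with success probability p and odds b."""
    rng = np.random.default_rng(seed)
    successes = rng.random((runs, periods)) < p        # True where the action paid off
    growth = np.where(successes, 1 + fraction * b, 1 - fraction)
    return wallet * growth.prod(axis=1)                # final wallet per simulated run

final = simulate_final_wallet(p=0.64, b=9, fraction=0.6)
print(np.percentile(final, [5, 50, 95]))               # spread of plausible outcomes
```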

2.5 Integrating the Framework

This approach recommends the possible use of a variety of techniques, not all of which are necessary for a majority of use cases. Nonetheless, when used in conjunction with the DEAR framework, they result in a method that can consistently deliver insights that are both actionable and parsimonious. To integrate this approach within an existing DSS, consider treating each stage of the approach as a discrete set of functional processes. While each stage is conceptually separate, the stages are also computationally separable. Treating each stage as separate allows for models that are more scalable, because each stage then contains only the information necessary for performing its operations. This reduces storage, computational, and transportation costs.

3 Method

3.1 Context

Many scenarios benefit from risk mitigation. One area that influences both public policy and corporate decision making is the reduction of mortal risk. One of the leading causes of unexpected death is vector-borne disease carried by insects [14]. The incidence rate of these diseases is in the hundreds of millions per year, with about a million annually attributable fatalities. By far the largest contributor is the mosquito. Mosquitoes are ubiquitous throughout the world and carry many diseases, including the Zika virus, West Nile virus, Chikungunya virus, dengue, yellow fever, malaria, lymphatic filariasis, and Japanese encephalitis [14,15,16].

Preventing mosquito-borne diseases could focus upon preventing mosquitoes, preventing mosquitoes from contracting viruses, or preventing humans from coming into contact with mosquitoes. This paper does not focus upon preventing mosquitoes from contracting viruses, since that problem is intractable enough to require specialized knowledge. While there are methods to prevent humans from contacting mosquitoes, the likelihood of contact is also a function of mosquito populations [14]. Therefore, this paper focuses upon mosquito populations.

There are many hypotheses about what factors cause mosquito populations to affect humans. Example hypotheses include precipitation, weather, geography, human population density, and time of year [15]. Since it is challenging to obtain precise geographical data, this paper focuses upon weather and precipitation [17, 18]. Notably, mosquito populations have been shown to thrive in and near urban areas due to the prevalence of favorable environmental conditions [16]. Researchers find it challenging to prove that weather or precipitation cause influxes of mosquitoes, since it has proved hard to control for these variables. Additionally, natural systems are complex enough that natural experiments are hard to come by, and the few good examples are difficult to reconcile with the general population. Finally, this problem is of particular concern because the close proximity of mosquitoes to humans raises the potential for disease transmission.

3.2 Application

The previous section discussed the importance of this scenario; it is also important from an application perspective. Public policy struggles to balance competing approaches to increasing public safety. In complex and uncertain environments, unambiguous solutions, even if less efficacious, are easier to justify to critics. Furthermore, because there are many competing hypotheses regarding the cause of large changes in mosquito populations, this represents a good opportunity to test the framework on a socially impactful problem. The method of this paper is applied to mosquito populations in the United States. Weather data consist of temperature in degrees Fahrenheit and precipitation in inches.

To exercise the full DEAR framework, four different algorithms examine this scenario. First, large changes in mosquito populations are detected using a statistical outlier detection technique; this simple technique tests for values beyond the 95th percentile. Second, the CCM technique is applied to evaluate causal relations between the environmental variables and the mosquito population. Third, the risk associated with taking actions based upon these causal relationships is modeled. Fourth, several possible actions are examined in order to recommend a policy action.

4 Results

4.1 Detecting Percentile-Based Outliers

Detection techniques are both statistical and user-driven. In July 2015 in Columbus, OH, the population was aware of a large increase in mosquitoes [19]. There was fear about West Nile virus and an observed case of Zika virus. Precipitation had also increased markedly, by nearly 50% year over year. This discovery led to the use of causal inference techniques.

4.2 Evaluating Causality Using CCM

To interpret CCM plots, examine the strength of correlation across increasing library lengths. If the correlation value increases as the library length increases, it suggests that information about one variable is increasingly found in the other. If the line for A xmap B increases but B xmap A does not, it suggests uni-directional causality (B causes A). Recall from above that there are four possible interpretations, and that significance is still assessed at the 0.05 level; with a library length of 25, the correlation value has to be greater than 0.46 to achieve significance.

Initial investigation did not show any relationship between precipitation and mosquito populations or between temperature and population. However, subsequent analysis considered degree days and cumulative mosquito population, where degree days is the cumulative sum of average daily temperatures. Turning these variables into cumulative variables immediately revealed causal relationships.
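
A minimal sketch of this cumulative transform, assuming hypothetical column names for the daily data, might look as follows.

```python
import pandas as pd

def to_cumulative(df: pd.DataFrame) -> pd.DataFrame:
    """Build the cumulative variables used for the CCM analysis from a daily frame.

    Column names ('mean_temp_f', 'precip_in', 'mosquito_count') are hypothetical.
    Degree days here is simply the running sum of mean daily temperature.
    """
    out = df.copy()
    out["degree_days"] = df["mean_temp_f"].cumsum()
    out["cum_precip"] = df["precip_in"].cumsum()
    out["cum_mosquitoes"] = df["mosquito_count"].cumsum()
    return out
```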

Figure 1 shows evidence of non-linearity and the presence of a complex attractor in the phase portraits for Daily Precipitation, Mean Daily Temperature, and Daily Mosquito Count, along with a plot of the relationship between Mean Daily Temperature and Daily Mosquito Count. This qualifies the datasets as suitable candidates for CCM analysis. The authors have discovered attractors with similar characteristics in other industrial and physical systems [5, 6].

Fig. 1. Phase portraits for Daily Precipitation, Mean Daily Temperature, Daily Mosquito Count, and a plot showing the relationship between Mean Daily Temperature and Daily Mosquito Count.

Figure 2 illustrates that while the mosquito population data contain information about seasonal precipitation, seasonal precipitation does not contain much information about mosquito populations. This implies that seasonal precipitation has a causal effect on the mosquito population: cumulative mosquito population xmap cumulative precipitation is significant (0.73) while the inverse is not. This asymmetrical pattern of information recurs throughout the other variables.

Fig. 2. Causality of cumulative seasonal rainfall on mosquito population based on convergent cross mapping.

Figure 3 illustrates that cumulative mosquito population xmap degree days is significant (0.80). This means that the mosquito population contains information about temperature; changes in temperature cause changes in mosquitoes. Notice the informational asymmetry: temperature xmap mosquitoes is not significant.

Fig. 3. Causality for cumulative degree days on mosquito population based on convergent cross mapping.

Since both temperature and precipitation significantly affect the mosquito population, the next step was to check for two-way interactions. This approach tests for an interaction by feeding the product of the input variables into CCM as a single input [13].
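
Reusing the cross_map_skill sketch from Sect. 2.2, such a test might look like the following; the helper name and arguments are hypothetical.

```python
import numpy as np

def target_xmap_interaction(driver_a, driver_b, target, E=3, tau=1):
    """Cross-map the target's shadow manifold against the product of two drivers,
    i.e. the paper's 'population xmap interaction' test.

    Relies on the cross_map_skill sketch from Sect. 2.2 (not an authors' API).
    """
    interaction = np.asarray(driver_a, dtype=float) * np.asarray(driver_b, dtype=float)
    return cross_map_skill(interaction, target, E=E, tau=tau)
```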

Figure 4 shows that interaction xmap population is not significant but that population xmap interaction (0.7) is significant. This means that mosquito populations contain information about these two variables interacting, and that the interaction therefore affects the population. The strength of this interaction is smaller than either of the two direct effects, which suggests that each direct effect carries some marginal causality.

Fig. 4. Causality for the interaction of seasonal rainfall totals and cumulative degree days on mosquito population based on convergent cross mapping.

4.3 Using Kelly’s Criterion to Assess Risks

The Kelly criterion gives the optimal investment to make for the next action; it is a kinetic approach. Policy decisions, by contrast, are usually periodic, with long delays between decisions. For that reason, think of a policy action as the action to take for the subsequent period. This decision period should match the polling period used by the detection framework to notice changes in environmental variables: if data arrive weekly, then recommendations can have a weekly horizon.

Another peculiarity of applying the Kelly criterion to public policy is that the notion of a payout is abstract. To address this, consider the investment to be organizational effort, including both time and money. If the action is successful, then the need to take that action in the future should be reduced in proportion to the amount of action taken and the expected payout determined by the Kelly criterion.

Effective planning suggests that implementing control systems in anticipation of peak mosquito population should have the most beneficial result in terms of minimizing disease-vector prevalence. Fortunately, from the causal analysis phase the DSS knows the probability that an increase in temperature will lead to an increase in mosquito population. Recall that cumulative mosquito population xmap degree days was 0.80; the square of this gives a 0.64 probability that information in temperature will be found in mosquitoes. The DSS can now use the Kelly criterion to examine preventative actions.

Mosquito control authorities have numerous options at their disposal for mosquito population control, including biological, environmental, and chemical remediation systems. The historical success rate of these control systems has been 90% [20]. Although effective, these control systems represent significant expenditures of resources. Nonetheless, the odds associated with these interventions are 9:1. Given these favorable odds, Kelly's equation suggests investing up to 60% of resources during this cycle.
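
Substituting the values above into Eq. 1 (p = 0.64 from the causal analysis and odds b = 9) reproduces this figure:

$$ \frac{k}{w} = \frac{p(b + 1) - 1}{b} = \frac{0.64(9 + 1) - 1}{9} = \frac{5.4}{9} = 0.60 $$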

4.4 Recommending Mosquito Control Action

Given the results of the risk analysis, this approach recommends that the responsible agency spend 60% of its funds for preventing vector-borne diseases on mosquito control systems. Since the data update weekly, this decision can also update weekly; the shorter the decision period, the faster the approach will converge on the correct set of actions. If successful, this will reduce the need for future action in proportion to the payout. It also preserves the remaining funds for alternative initiatives that might be more successful. Finally, the result of each action should be recorded in order to further refine future risk models.

5 Conclusions

5.1 Conclusions Related to Applying the DEAR Framework to This Context

Monitored changes in environmental variables led to a causal analysis of mosquito populations in Columbus, OH. This analysis revealed that while both temperature and precipitation exhibit a causal effect on mosquito populations, temperature is slightly more explanatory, and the two variables also interact. By translating the degree of control discovered using CCM into the Kelly criterion, this approach was able to recommend a data-driven policy for mosquito mitigation efforts.

5.2 Conclusions Related to the DEAR Framework

Treating each stage of this model as separate made it easy to integrate the different parts of the analytical process. Separating the components made the approach more scalable and will allow the creation of models for larger data sets, since each stage contains only the information necessary for performing its operations. This reduces the storage, computational, and transportation costs of data while empowering researchers.

6 Next Steps

A substantial area for future work relates to applying this technique to systems containing many simultaneous variables. This will require an understanding of marginal causality. Preliminary plans involve building graphical models containing the causal information from the evaluation phase of this approach. Once a graph exists, this approach should be able to leverage graph-based processing techniques to perform the necessary likelihood estimations.

Finally, while there are good methods for doing the detection in real-time, CCM remains a computationally complex process and future work will also focus upon building a more scalable version of CCM with similar guarantees.