This chapter presents a more realistic version of the model, Risk and Rumours, which extends the previous, theoretical version (Routes and Rumours) by including additional empirical and experimental information, following the process described in Part II of this book. We begin by reflecting on the integration of the five elements of the modelling process, followed by a more detailed description of the Risk and Rumours model and how it differs from the previous version. Subsequently, we present selected results of the uncertainty and sensitivity analysis, enabling us to draw further inferences about information gaps and areas for potential data collection. We also present model calibration for an empirically grounded version of the model, Risk and Rumours with Reality. In that way, we can evaluate to what extent the iterative modelling process has enabled a reduction in the uncertainty surrounding migrant route formation. In the final part of the chapter, we reflect on the model-building process and its implementation.

1 Integrating the Five Building Blocks of the Modelling Process

The move from a data-free, theoretical agent-based model to one that represents the underlying social processes and reality more closely requires advances in all five areas presented in Part II of this book. The model itself needs to be further developed to answer more specific research questions in a more realistic scenario; the data and experimental information need to be collected, ideally guided by statistical analysis where possible; and the modelling language and formalism need to be chosen so that they serve the new modelling aims and purposes.

In the context of the migration model presented in this book, we have therefore set out to create a more realistic version of the simulation of the migration routes into Europe. To make the model better resemble real-life scenarios, the notion of personal risk was introduced into the modelled world – in this case, the chance of not being able to make it safely to the destination and, in extreme cases, of perishing along the way. This was intended to align the scenario more closely with the sad reality of the deadly maritime crossings from North Africa and Turkey into Europe, especially via the Central Mediterranean route, where at least 17,400 people perished between 2014 and January 2021 – a majority of the more than 21,300 deaths in the whole Mediterranean basin in that period (Frontex, 2018; IOM, 2021; see also Chap. 4).

In particular, by extending the model and its purpose, we were interested in investigating whether our model could be used to test the claim – made by some parties within the EU – that an increased risk on the Mediterranean would lead to a decrease in the ‘pull factors’ of migration and thus a decrease in the number of arrivals (for a critical discussion of this idea, see e.g. the Death by Rescue report of Heller & Pezzani, 2016, as well as other studies, overviews and briefs, such as Cusumano & Pattison, 2018; Cusumano & Villa, 2019; and Gabrielsen Jumbert, 2020). This is the type of research question that does not necessarily imply predictive capabilities in a simulation model, but rather seeks to illuminate the mechanisms and trade-offs involved in the interplay between risk, information, communication, and decisions.

In our case, the starting point for the model extension was the theoretical Routes and Rumours model, presented in Chap. 3 and Appendix A. Each of the subsequent building blocks – the empirical data, statistical analysis, psychological experiments, and the discussion around the choice of an appropriate programming language – as well as the changes made to the model itself as it was further developed to serve the purpose, were then used to augment the simulated reality in the light of the knowledge that became available as the modelling process unfolded.

Of course, as discussed before, identifying the empirical basis for the model proved challenging. Of the many different data sources on asylum migration discussed in Chap. 4 and Appendix B, only a handful were directly applicable to the new version of the model, and of those, only a couple ended up being used. The potentially applicable sources concentrated mainly on the process data on registered arrivals in Europe, (uncertain) risk-related data on the deaths in the Mediterranean, and survey-based indications of the sources of information used by migrants along the way (see Box 4.1).

The statistical analysis discussed in Chap. 5 served as a way of focusing the model on the most important aspects of the route dynamics, while at the same time allowing its development in other areas. To that end, the key findings regarding the sensitivity of the model outputs to a small set of information-related variables enabled us to concentrate on the key defining features of the underlying social mechanism driving route formation – in this case, information exchange. At the same time, as was expected given the nature of migration processes, the levels of uncertainty surrounding the modelled route formation and the impact of its drivers (via model parameters) remained high – and higher than in the Routes and Rumours model.

On the one hand, the results of the statistical analysis carried out on the first, theoretical version of the model (Routes and Rumours) therefore helped delineate the possible uses of the psychological experiments in enhancing the simulation. In particular, the design of the second set of experiments discussed in Chap. 6, looking at attitudes to risk and eliciting subjective probabilities of a safe journey depending on the source of information, was directly informed by both the model design and the sensitivity analysis reported above. The data from this experiment were then used directly to inform the way the agents respond to different types of information in the current model version.

On the other hand, the choice of a modelling language also influenced the model-building, albeit indirectly. Although the model development continued in a general-purpose programming language (Julia) rather than a domain-specific one (ML3), the new version described in Chap. 3 includes some aspects of the model formalism and semantics uncovered through parallel implementation in both languages (Reinhardt et al., 2019). This mainly relates to using a continuous definition of time and to modelling events through waiting times, as recommended in Chap. 7. At the same time, the provenance description of the model helped us understand the mechanics of the modelling process itself, and offered a more systematic way in which to extend the first version of the model.

Throughout the remainder of this chapter, we present the results of following the modelling process discussed before, in the form of a more realistic and empirically grounded, yet still explanatory rather than predictive, model of migration route formation. In comparison with Routes and Rumours, the focus goes beyond the role of information and the choice between different options under uncertainty, and now additionally includes risk and risk avoidance, with potentially very serious consequences for the agents. Next, we discuss the motivation for the specific elements of the construction of the resulting Risk and Rumours model, and describe its constituent parts in detail.

2 Risk and Rumours: Motivation and Model Description

Most of the capabilities required for our model to test whether increased risk could lead to a reduction in arrivals were already in place in the Routes and Rumours version, except for one crucial element: the presence of risk, and the rules governing the agents’ decisions in relation to risky circumstances, the addition of which is the key feature of the new version, called Risk and Rumours. Other than that, in the previous version the agents already reacted in real (simulated) time to the changes in travel conditions. Here, the continuous-time paradigm offers a much more natural environment for framing the process of information flow and belief update, devoid of the artificial constraints imposed by the granularity of time steps and scheduling problems in discrete simulations (Chap. 7). Furthermore, the agents’ decisions are based not only on their subjective (and possibly imperfect) knowledge, which can be exchanged with other agents, mediated by the levels of trust, or gained by exploring the environment, but also on different levels of risk and attitudes towards it.

In contrast to the previous version, and to keep the Risk and Rumours model consistent, both internally and with the reality it aims to represent, in this version it is possible for agents to die, which removes them from the simulation entirely. For the sake of simplicity, we assume that the agents can only die when moving across transport links. As with the other processes in the continuous-time version of the model, death happens stochastically at a certain rate. The rate of death for a given link is calculated from a risk value associated with each link, representing the expected probability of an agent dying when crossing that link, and from the expected time it takes to cross that link. The death rates can be taken from empirical data, such as the Missing Migrants project (see Chap. 4), either applied directly as model inputs, or used to calibrate the outputs.
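
To make this mechanism concrete, the sketch below shows one way of converting a per-crossing death probability into a constant hazard rate, as the continuous-time paradigm requires. This is our own minimal reconstruction in Julia, assuming a constant hazard over the crossing; the published model code (see Appendix A) may differ in detail.

```julia
# Minimal sketch (our reconstruction, not the published code) of turning a
# link's per-crossing death probability into a constant hazard rate.

# risk: expected probability of dying while crossing the link
# crossing_time: expected time needed to cross the link
# With hazard λ, P(death during crossing) = 1 - exp(-λ·t), so
# λ = -log(1 - risk) / crossing_time reproduces the per-crossing probability.
death_rate(risk, crossing_time) = -log(1.0 - risk) / crossing_time

# The agent dies on the link only if an exponential waiting time to death
# falls within the crossing itself.
function dies_during_crossing(risk, crossing_time)
    λ = death_rate(risk, crossing_time)
    time_to_death = -log(rand()) / λ     # inverse-CDF draw from Exp(λ)
    return time_to_death <= crossing_time
end
```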

The agents’ information on the transport links now also includes corresponding knowledge about risk, which they are able to learn about and communicate in the same way as the links’ friction and other properties of their environment (see Chap. 3). This is the one aspect of the new version of the model that is of crucial importance from the point of view of examining substantive research questions, many of which – implicitly or explicitly – rely on some assumptions about the attitudes of prospective migrants towards risk, and on the decisions taken in this light.

To that end, the risk-based decision making in the current version of the model is directly informed by the empirical experiments on subjective probabilities, risk attitudes and confidence in the ensuing decisions according to the source of information, as described in Sect. 6.3. Here, we used a logistic regression of the (stated) probability of making a decision to travel against the (stated) perceived level of risk, to parameterise a bivariate normal distribution. From this distribution, we draw for each agent individual values for the slope S and intercept I of the logit-linear function mapping the probability of travel, p (as per the experimental setup), and the agent’s perceived risk, s. As discussed in more detail in Box 6.1 in Sect. 6.5, the logit of the probability to travel can then be calculated as p = I + S * s. In this version of the model the value of p is transformed into a probability, and used as part of the cost calculation on which the agents’ path planning is based. For specific details on the calculation of risk functions, including the role of risk scaling factors, see Box 6.1 in Sect. 6.5, as well as the online material referenced in Appendix A.
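
To illustrate, a minimal sketch of this mapping is given below; the means and covariance of the bivariate normal are placeholders, whereas in the model they come from the experimental regression fit (Box 6.1).

```julia
# Sketch of the experimentally informed risk-to-probability mapping; μ and Σ
# are placeholder values, not the estimates from the Chap. 6 experiment.
using LinearAlgebra

logistic(x) = 1.0 / (1.0 + exp(-x))

# Draw a correlated (intercept I, slope S) pair for one agent from N(μ, Σ),
# using the Cholesky factor of the covariance matrix.
function draw_agent_coefficients(μ::Vector{Float64}, Σ::Matrix{Float64})
    μ .+ cholesky(Symmetric(Σ)).L * randn(2)
end

# Probability of deciding to travel given perceived risk s: logit(p) = I + S·s.
prob_travel(I, S, s) = logistic(I + S * s)

μ = [2.0, -6.0]               # placeholder means for (I, S)
Σ = [0.5 -0.1; -0.1 0.8]      # placeholder covariance
I0, S0 = draw_agent_coefficients(μ, Σ)
p = prob_travel(I0, S0, 0.3)  # an agent's travel probability at risk s = 0.3
```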

In terms of the topology of the new version of the model, to simulate the effect of elevated risk we implemented a ‘virtual Mediterranean’ by keeping the risk at very low levels (0.001) for most links in the world, but increasing it for all links overlapping a rectangular region spanning half of the width of the simulated area (the red – darker – central area in Fig. 8.1, showing the model topology).
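
A minimal sketch of this risk assignment, under assumed coordinates in a unit square and a placeholder high-risk value, could look as follows (the actual values and ranges are given in Table 8.1).

```julia
# Illustrative risk assignment for the 'virtual Mediterranean'; coordinates,
# band position and HIGH_RISK value are assumptions for a unit-square world.
struct Link
    x1::Float64; y1::Float64   # one endpoint
    x2::Float64; y2::Float64   # the other endpoint
    risk::Float64
end

const BASE_RISK = 0.001        # low risk for most links, as in the text
const HIGH_RISK = 0.05         # placeholder elevated risk

# A vertical band spanning half of the width of the simulated area.
overlaps_band(l::Link; xmin = 0.25, xmax = 0.75) =
    max(l.x1, l.x2) >= xmin && min(l.x1, l.x2) <= xmax

assign_risk(l::Link) =
    Link(l.x1, l.y1, l.x2, l.y2, overlaps_band(l) ? HIGH_RISK : BASE_RISK)
```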

Fig. 8.1

Topology of the Risk and Rumours model: the simulated world with a link risk represented by colour (green/lighter – low, red/darker – high) and traffic intensity shown as line width. In this scenario, cautious agents (left) take traffic routes around the high-risk area, whereas agents exhibiting risky behaviour (right) take the shortest paths, crossing through the dangerous parts of the map. (Source: own elaboration)

In order to be able to run simulation experiments based on complex pre-defined scenarios such as, for example, policy interventions or changes in the agents’ environment over time, we further added a generic ‘plug-in’ scenario system to the model. This makes it possible to load additional code during the runtime of the simulation that, for example, changes the values of some parameters at a pre-defined time, or occasionally modifies the properties of some parts of the simulated world.
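
As an illustration of how such a plug-in system can work, the sketch below loads a scenario file at runtime and lets it schedule parameter changes at pre-defined simulated times. All names here (Simulation, schedule!, load_scenario!) are illustrative assumptions, not the actual API of our implementation.

```julia
# Illustrative sketch of a runtime 'plug-in' scenario system; the type and
# function names are assumptions, not the published model's API.
mutable struct Simulation
    time::Float64
    params::Dict{Symbol, Float64}
    events::Vector{Tuple{Float64, Function}}   # (firing time, callback)
end

# Register a callback to be executed at a pre-defined simulated time.
schedule!(sim::Simulation, t::Float64, f::Function) = push!(sim.events, (t, f))

# Load a scenario file while the simulation is running. The file's last
# expression is expected to be a function of the simulation, e.g.:
#     sim -> schedule!(sim, 50.0, s -> s.params[:risk_high] *= 2.0)
# include() evaluates the file and returns that function, which we then apply.
function load_scenario!(sim::Simulation, path::String)
    scenario = include(path)
    scenario(sim)
end
```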

Examples of policy-relevant simulations generated by this model are described in more detail in Chap. 9. Their implementation required three such ‘plug-in’ scenario modules. Two of them simulate simple changes in the external conditions of departures (the migrant-generating process) and travel conditions, namely a change in the departure rate at a given time, and a change in the level of risk in the high-risk area at a given time. The third module simulates a government information campaign to make migrants aware of the high risk of crossing a dangerous area (here, our virtual Mediterranean) under varying levels of trust in official information sources, informed by the Flight 2.0/Flucht 2.0 survey (see Box 4.1 in Sect. 4.5, and Appendix B for source details), as well as by the psychological experiment on eliciting subjective probabilities, reported in Chap. 6 (Sect. 6.2).

In this module, the information campaign has been implemented by introducing a simulated ‘government agent’ with full knowledge of the high-risk area, who interacts with a certain probability with agents present in the entry cities (see Appendix A). If an interaction takes place, the migrant agent in question exchanges information with the government agent in a manner analogous to the information exchange happening during regular agent contacts, albeit with modified trust levels.
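
A minimal sketch of this interaction is given below, under our own assumptions about how beliefs are stored; p_interact and trust_gov are placeholder parameters, and the weighting scheme is illustrative rather than the exact update rule of the model (see Appendix A).

```julia
# Illustrative belief update for the information campaign; the belief
# representation and weighting are our assumptions, not the published rule.
struct RiskBelief
    value::Float64       # believed risk of crossing the dangerous area
    confidence::Float64  # weight attached to the current belief, in [0, 1]
end

# A 'government agent' knowing the true risk meets a migrant agent in an
# entry city with probability p_interact; information is then exchanged as
# in regular agent contacts, but mediated by the trust level trust_gov.
function campaign_contact(belief::RiskBelief, true_risk::Float64;
                          p_interact = 0.1, trust_gov = 0.5)
    rand() < p_interact || return belief        # no interaction this time
    w = trust_gov * (1.0 - belief.confidence)   # weight of the new message
    RiskBelief((1 - w) * belief.value + w * true_risk,
               belief.confidence + w * (1.0 - belief.confidence))
end
```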

In addition to providing insights into the topology of the modelled world, Fig. 8.1 offers some preliminary descriptive findings about the role of risk and risk attitudes, based on a single model run. In this example, the agents are on average either more cautious or more risk-taking, in line with the qualitative findings of the first cognitive experiment on eliciting the prospect curves (Sect. 6.2). These differences in attitudes to risk have a clear impact on the number of journeys undertaken by agents through the high-risk area. As expected, the more cautious agents are more likely to attempt travelling around it, while in the scenario with higher risk tolerance, the intensity of travel through the high-risk area is visibly elevated. Some further substantive questions, which can be posed within the context of the Risk and Rumours setup, are examined for several policy-relevant scenarios generated by the model, presented in Chap. 9. Before that, however, an important intermediate question is: what is driving the behaviour observed in the model? As discussed in Chap. 5, uncertainty and sensitivity analysis can offer at least some indications in that respect. We discuss this step of the analysis of the model behaviour next.

3 Uncertainty, Sensitivity, and Areas for Data Collection

To analyse the behaviour of the Risk and Rumours model itself, we follow the template from Chap. 5, with a few modifications. To start with, we limit the analysis to four model parameters related to information exchange, previously identified as key in Chap. 5, and one parameter related to the speed of exploration of the local environment (speed_expl), plus five additional free parameters, not identified from the data, yet crucial for the mechanics of the model. These additional parameters are related to the perceptions of risk, and the detailed list of all ten parameters used for the uncertainty and sensitivity analysis is provided in Table 8.1.

Table 8.1 Parameters of the Risk and Rumours model used in the uncertainty and sensitivity analysis

This time, our focus is on two key outputs: the number of arrivals, and the number of drownings, as the ultimate human cost of undertaking perilous migration journeys. Both of these outputs are analysed globally, but can also be looked at as time series of the relevant variables for more specific policy-related questions and for setting up coherent scenarios, as discussed further in Chap. 9.

Given the moderate number of parameters studied in this version of the model, there is no need to carry out extensive pre-screening, so the analysis can focus on assessing the uncertainty of the outputs and their sensitivity to the individual model inputs, in order to unravel the dynamics of the system and the interactions between its different components. As before, a standard experimental design based on Latin Hypercube Samples is applied, with 80 design points and five replicates per point.
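
The design itself can be generated along the following lines; this is a generic Latin Hypercube sketch (the reported analysis used a standard design and the GEM-SA toolchain), with parameter bounds to be taken from Table 8.1.

```julia
# Generic Latin Hypercube design sketch (not the actual design code):
# each column of the n × d output is stratified into n equal bins.
using Random: randperm

function latin_hypercube(n::Int, d::Int)
    X = zeros(n, d)
    for j in 1:d
        strata = randperm(n)                  # assign one point per stratum
        X[:, j] = (strata .- rand(n)) ./ n    # jitter uniformly within bins
    end
    return X
end

# Rescale columns from [0, 1] to parameter bounds, e.g. risk_scale ∈ [4, 20].
rescale(X, lo, hi) = lo' .+ X .* (hi .- lo)'

design = rescale(latin_hypercube(80, 10), zeros(10), ones(10))  # placeholder bounds
```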

The main results of the sensitivity and uncertainty analysis of the Risk and Rumours model are reported in Table 8.2. For the two outputs considered – the number of arrivals and the number of deaths – three parameters related to information exchange, introduced in Chap. 5, remain of pivotal importance. The key parameter is the probability of exchanging information through direct communication (p_transfer_info), followed by the probability of communicating with an agent’s contacts (p_info_contacts) and of losing contacts (p_drop_contact). Of the newly added parameters describing the relationship with risk, the most important are those related to the speed of updating the information about risk (speed_expl_risk), and to the mapping between the objective risk of death and its subjective assessment (risk_scale). The interactions between these parameters also play a role in shaping both outputs, as shown in Table 8.2.

Table 8.2 Uncertainty and sensitivity analysis for the Risk and Rumours model

The mean and variance levels of the expected model outputs indicate that on average, across the whole ten-dimensional parameter space, each run with 10,000 travelling agents generates nearly 7800 arrivals and 2200 deaths, although with some non-negligible variation. The resulting death rate, of around 22%, is an order of magnitude higher than would be observed even on a high-risk maritime crossing, such as the Central Mediterranean. This suggests that the model needs to be properly calibrated to the empirical data on deaths in order for it to be more representative of the underlying reality of migration journeys. The estimated total variance in the code output translates into standard deviations of nearly 1150 for arrivals and over 650 for deaths, indicating considerable disparities across the whole parameter space. On the other hand, the impact of code uncertainty on the total estimated emulator variance is relatively small: the variance term for the code-variability ‘nugget’ is two orders of magnitude smaller than the overall fitted variance term of the emulator, σ2. On the whole, the fit of the underlying GP emulator is reasonable, with the root mean squared standardised error (RMSSE) above two for both outputs, somewhat larger than the ideal level of one, which would indicate that the emulator results are close to the model outputs.
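
To make the variance decomposition concrete, the sketch below implements a basic GP predictor with a squared-exponential kernel and a nugget term, so that the roles of the fitted signal variance and of the code-variability ‘nugget’ are explicit. This is a generic illustration, not the GEM-SA implementation used for the reported results.

```julia
# Generic GP emulator sketch with a 'nugget' (not the GEM-SA code): σ2 is
# the fitted signal variance, the nugget captures stochastic code variability.
using LinearAlgebra

# Squared-exponential kernel with signal variance σ2 and length-scales ℓ.
k(x, y, σ2, ℓ) = σ2 * exp(-0.5 * sum(((x .- y) ./ ℓ) .^ 2))

# Predictive mean and variance at point xstar, given an n × d design matrix
# X and a vector of n observed (noisy) simulator outputs y.
function gp_predict(X, y, xstar, σ2, ℓ, nugget)
    n = size(X, 1)
    K  = [k(X[i, :], X[j, :], σ2, ℓ) for i in 1:n, j in 1:n] + nugget * I
    kx = [k(X[i, :], xstar, σ2, ℓ) for i in 1:n]
    μ = dot(kx, K \ y)                   # predictive mean
    v = σ2 + nugget - dot(kx, K \ kx)    # predictive variance
    return μ, v
end
```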

Figure 8.2 illustrates the response surfaces with respect to the two parameters describing the relationship with risk (risk_scale and speed_expl_risk), over their space of variability defined in Table 8.1, [4, 20] × [0, 1]. The predicted values of the GP emulator, means and standard deviations, are shown for the two outputs: the numbers of arrivals and deaths. For simplicity, only the results assuming Normal prior distributions of inputs are shown, and the values of the remaining parameters are set at arbitrary, yet realistic levels. As can be seen from Fig. 8.2, both outputs show clear gradients along both risk-related parameter dimensions, with arrivals increasing and deaths decreasing with both risk_scale and speed_expl_risk, and with lower uncertainty estimated for ‘middle’ values of both parameters than around the edges of the respective graphs.

Fig. 8.2
[Figure: four contour plots of risk_scale versus speed_expl_risk; predicted means for arrivals and deaths on the left, their standard deviations on the right.]

Response surfaces of the two output variables, numbers of arrivals and deaths, for the two parameters related to risk. (Source: own elaboration in GEM-SA, Kennedy & Petropoulos, 2016)

The results of the sensitivity analysis additionally point to areas for further data collection, in particular with respect to information transfers over networks (parameters p_transfer_info, p_info_contacts, and p_drop_contact), the mapping of objective onto subjective risk measures (risk_scale), and the speed of updating the information about risk through observation (speed_expl_risk). These are the areas where the information gains in the model are likely to be the highest and, at the same time, where the existing evidence base is scarce or non-existent. Here, as discussed in Chap. 6, carrying out more interactive and immersive cognitive experiments on decision making holds the promise of producing results less influenced by respondent bias, which is a concern for respondents with no lived experience of migration, not to mention asylum migration. Setting up such an experiment can additionally be helped by carrying out a dedicated qualitative survey, specifically targeted at asylum seekers and refugees, the results of which would inform the experimental protocol and help manage some ethical issues related to the sensitivity of the topic.

Still, even within the confines of the current model, there is scope for further inclusion of selected data sources, discussed in Chap. 4, in order to align it even more closely with the reality the model aims to represent. We discuss these additions, leading to the creation of a new version of the model, called Risk and Rumours with Reality, and the process of calibrating this model to observed data by using Bayesian statistical methods, in the next section of this chapter.

4 Risk and Rumours with Reality: Adding Empirical Calibration

As discussed before, during the so-called ‘migration crisis’ following the Arab Spring and the Syrian civil war, attempts to cross the Mediterranean via the Central route, from Libya and Tunisia to Italy and Malta, saw a massive increase (Chap. 4). The European Union reacted to these developments by implementing a ‘deterrence’ strategy, in cooperation with North African states. This strategy relied on making it harder for humanitarian rescue missions to operate in the Mediterranean, while at the same time boosting efforts by coast guards in Libya and Tunisia to intercept asylum seekers’ boats before they could reach international waters. As mentioned before, the available data indicate that between 2015 and 2019 these policy changes could have led to a strong increase in interceptions at the African coast, and also to a greater number of fatalities, especially on the Central Mediterranean route (Frontex, 2018; IOM, 2021; see Sects. 4.2 and 8.1). The concomitant reduction in sea arrivals in Southern Europe, however, seems to indicate that, their harrowing humanitarian costs notwithstanding, these policy changes at least accomplished their declared goal.

It should be possible to test if this ‘deterrence hypothesis’ is true – that is, whether the effect of deterrence can indeed explain the reduction in the number of arrivals – by using an empirically calibrated model of migration that includes the effects of perceived risk on the migrants’ decisions. A full test of the hypothesis goes beyond the scope of this book; however, in the following discussion we demonstrate the first steps towards such a test, by calibrating the Risk and Rumours model against the refugee situation in the Mediterranean in the years 2016–2019, and thus creating a new version, Risk and Rumours with Reality. Setting up the modelling framework for this exercise involved four additional processes: (1) specifying the topology of the transport network, (2) extracting and assessing data on fatality and interception rates, (3) reassessing the sensitivity of the adjusted model to key parameters, and finally (4) calibrating the parameter values based on the empirical information.

To begin with, to define a geographically plausible model topology for the network of cities and the links between them, we extracted from OpenStreetMap (using OpenRouteService – source S02 in Appendix B) the geographical locations of the most important cities in North Africa, the Levant and on the Turkish coast, as well as some important landing points for refugee boats in Italy, Malta, Cyprus and Greece. From the same data source, we calculated travel distances between these locations, to be used as a proxy for the friction parameter. The resulting map is shown in Fig. 8.3.
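
For reference, the distance extraction can be sketched as a query to the public OpenRouteService directions API; the endpoint and response fields below follow the public v2 documentation as we understand it, and should be treated as assumptions to verify rather than tested code.

```julia
# Hedged sketch of a travel-distance query via OpenRouteService; endpoint,
# payload and response fields are assumptions based on the public v2 docs.
using HTTP, JSON3

const ORS_KEY = "YOUR_API_KEY"   # placeholder API key

function travel_distance(lonlat_from, lonlat_to; profile = "driving-car")
    resp = HTTP.post(
        "https://api.openrouteservice.org/v2/directions/$profile",
        ["Authorization" => ORS_KEY, "Content-Type" => "application/json"],
        JSON3.write((coordinates = [lonlat_from, lonlat_to],)),
    )
    body = JSON3.read(resp.body)
    return body.routes[1].summary.distance   # distance in metres
end

# Example (approximate coordinates), as a friction proxy for one link:
# travel_distance([32.30, 31.26], [33.81, 27.26])  # Port Said → Hurghada
```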

Fig. 8.3
[Figure detail: for example, a high-risk, high-intensity link connects Larnaka and Port Said, while a low-risk, high-intensity link connects Port Said and Hurghada.]

Basic topological map of the Risk and Rumours with Reality model with example routes: green/lighter (overland) with lower risk, and red/darker (maritime) with higher risk. Line thickness corresponds to travel intensity over a particular route for a randomly-selected model run, with dashed lines denoting unused routes. (Source: own elaboration based on OpenStreetMap)

In terms of data for the period 2016–2019, the numbers of interceptions at the Tunisian and Libyan coasts, as well as the numbers of presumed fatalities, are available from IOM (2021) (see also Chap. 4, with sources 11 and 12 listed and discussed in more detail in Appendix B). Since we do not know the number of departures, we have to infer fatality and interception rates for each year by using arrivals (idem) in the corresponding year. For this, we assume that every migrant will attempt departure until they either manage to make the crossing, or die. Intercepted migrants wait a certain amount of time and then make another attempt. Based on these assumptions, we can estimate the interception probability as p_i = N_i / (N_i + N_a + N_d) and the probability of dying as p_d = N_d / (N_i + N_a + N_d), where N_i denotes the number of interceptions, N_a the number of arrivals, and N_d the number of fatalities.
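
A worked example of these estimates, with purely hypothetical counts in place of the IOM figures, is shown below.

```julia
# Worked example of the yearly rate estimates defined above; the counts are
# hypothetical placeholders, not the IOM (2021) figures.
function crossing_rates(n_intercepted, n_arrived, n_dead)
    total = n_intercepted + n_arrived + n_dead   # attempted crossings
    return (p_i = n_intercepted / total,         # interception probability
            p_d = n_dead / total)                # fatality probability
end

rates = crossing_rates(20_000, 100_000, 2_000)
# rates.p_i ≈ 0.164, rates.p_d ≈ 0.016
```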

In the third step, we revisited the sensitivity and uncertainty of the revised version of the model to different parameters, with the detailed results reported in Table 8.3. In this iteration of the analysis, there is a noteworthy decrease in the share of the variance explained by individual parameters in comparison with previous model versions. There is also a visibly higher impact of parameter interactions, as well as of other, residual factors driving the model behaviour that are not yet fully accounted for in the model, such as changes in the intensity of migrant departures.

Table 8.3 Uncertainty and sensitivity analysis for the Risk and Rumours with Reality model

To increase the alignment of the model with reality further, by using the three outputs discussed above, N_i, N_a and N_d, we selected a number of parameters that had emerged as the most important in the sensitivity analysis – p_info_contacts, p_drop_contact and speed_expl – as well as the two most important parameters determining the agents’ sensitivity to risk – risk_scale and path_penalty_risk. We subsequently calibrated the model using a Population Monte Carlo ABC algorithm (Beaumont et al., 2009), with the rates of change in the numbers of arrivals and interceptions between the years, as well as the fatality rates per year, as summary statistics. The rates of change were used in order to remove, at least approximately, the possible biases identified for these sources during the data assessment presented in Chap. 4 (in Table 4.3), tacitly assuming that these biases remain constant over time. A similar rationale was applied for using fatality rates; here, the assumption was that the biases in the numerator (number of deaths) and in the denominator (attempted crossings) were of the same, or at least similar, magnitude.

We ran the model for 2000 simulation runs spread over ten iterations, with 500 time periods for each run, corresponding to five years in historical time, 2015–19, with the first year treated as a burn-in period. Under this setup, however, the model turned out not to converge very well. Therefore, we additionally included the between-year changes in departure rates among the parameters to be calibrated. With this change, we were able to closely approximate the development of the real numbers of arrivals and fatalities for the years 2016–19 in our model (see also Chap. 9).
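
For illustration, a compact one-parameter version of the PMC-ABC scheme (after Beaumont et al., 2009) is sketched below, with a uniform prior, a shrinking tolerance, and a Gaussian perturbation kernel; the actual calibration was multi-parameter and used the summary statistics described above, so simulate and distance are placeholders.

```julia
# Compact one-parameter PMC-ABC sketch (after Beaumont et al., 2009);
# simulate(θ) and distance(sim, obs) are placeholders for the model run
# and the discrepancy between simulated and observed summary statistics.
using Statistics

ϕ(x, μ, τ2) = exp(-0.5 * (x - μ)^2 / τ2) / sqrt(2π * τ2)  # Gaussian kernel

function abc_pmc(simulate, distance, y_obs; lo = 0.0, hi = 1.0,
                 n = 200, iterations = 10)
    θ = lo .+ (hi - lo) .* rand(n)            # sample from the uniform prior
    w = fill(1.0 / n, n)
    for _ in 1:iterations
        d = [distance(simulate(p), y_obs) for p in θ]
        ε = quantile(d, 0.5)                  # shrink the tolerance
        τ2 = 2 * var(θ)                       # kernel variance (unweighted here)
        θnew, wnew, cw = similar(θ), similar(w), cumsum(w)
        for i in 1:n
            while true
                cand = θ[searchsortedfirst(cw, rand())] + sqrt(τ2) * randn()
                (lo <= cand <= hi) || continue          # zero prior density
                distance(simulate(cand), y_obs) <= ε || continue
                θnew[i] = cand                          # accept the particle
                wnew[i] = 1 / sum(w[j] * ϕ(θ[j], cand, τ2) for j in 1:n)
                break
            end
        end
        θ, w = θnew, wnew ./ sum(wnew)
    end
    return θ, w                               # weighted posterior sample
end
```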

In parallel, we carried out a calibration for two outputs together (arrivals and interceptions) based on the GP emulator approach, the results of which confirmed those obtained with the ABC algorithm. Specifically, we estimated the GP emulator on a sample of 400 LHS design points, with twelve repetitions at each point, and 13 input variables, including three sets of departure rates (for 2017–19). The emulator performance and fit were found to be reasonable, and the results proved to be sensitive to the prior assumptions about the variance of the model discrepancy term (see also Chap. 5).

Selected results of the model calibration exercise are presented in Fig. 8.4, in terms of the posterior estimates of selected model parameters: as with the ABC estimates, we did not learn much about most of the model inputs, except for those related to departures. This outcome confirmed that our main results and qualitative conclusions were broadly stable across the two methods of calibration (ABC and GP emulators), strengthening the substantive interpretations made on their basis. To illustrate the calibration outcomes, Fig. 8.5 presents the trajectories of the model runs for the calibrated period. These two figures – 8.4 and 8.5 – are equivalent to Figs. 5.7 and 5.8 presented in Chap. 5 for the purely theoretical model (Routes and Rumours), but this time including actual empirical data, both on inputs and outputs, and allowing for a time-varying model response.

Fig. 8.4
[Panels: posterior histograms (value versus count) for departure_1, departure_2, departure_3, error, p_info_contacts, p_notice_death, p_transfer_info, path_penalty_risk, risk_scale, speed_expl_move, speed_expl_stay, and speed_risk.]

Selected calibrated posterior distributions for the Risk and Rumours with Reality model parameters, obtained by using the GP emulator. (Source: own elaboration)

Fig. 8.5
[Two panels: simulated trajectories of log_diff_arrived and log_diff_ints, 2017–2019.]

Simulator output distributions for the uncalibrated (black/darker lines) and calibrated (green/lighter lines) Risk and Rumours with Reality model. For calibrated outputs, the simulator was run at a sample of input points from their calibrated posterior distributions. (Source: own elaboration)

In the light of the results for the three successive model iterations, one important question from the point of view of the iterative modelling process is: to what extent does adding more empirically relevant detail to the model, at the expense of increased complexity, change the uncertainty of the model output? To that end, Table 8.4 compares the results of the uncertainty analysis for the number of arrivals in three versions of the model: two theoretical (Routes and Rumours and Risk and Rumours), and one more empirically grounded (Risk and Rumours with Reality). The results of the comparison are unequivocal: the key indicator of how uncertain the model results are, the mean total variance in code output (shown in bold in Table 8.4), is nearly two orders of magnitude larger for the more sophisticated version of the theoretical model, Risk and Rumours, than for the basic one, Routes and Rumours. On the other hand, the inclusion of additional data in Risk and Rumours with Reality enabled a more than two-fold reduction in this uncertainty. Still, the variance of the expected code output turned out to be the largest for the empirically informed model version.

Table 8.4 Uncertainty analysis – comparison between the three models: Routes and Rumours, Risk and Rumours, and Risk and Rumours with Reality, for the number of arrivals, under Normal prior for inputs

At the same time, the reduction in the mean model output for the number of arrivals is not surprising, as in Risk and Rumours, ceteris paribus, many agents may die during their journey, especially while crossing the high-risk routes. In the Risk and Rumours with Reality version, the level of this risk is smaller by an order of magnitude (and more realistic). This brings the mean output back to the levels seen for the Routes and Rumours version, which is also more credible in the light of the empirical data, although this time with a more realistic variance estimate. In addition, the fitted variance parameters of the GP emulator are smaller for both Risk and Rumours models, meaning that within the total variability, the uncertainty related to the emulator fit and code variability is even smaller. In the more refined versions of the model, it is the uncertainty induced by the unknown inputs that matters most.

Altogether, our results point to the possible further extensions of the models of migrant routes, as well as to the importance of adding both descriptive detail and empirical information into the models, but also to their intrinsic limitations. Reflections on these issues, and on other, practical aspects of the process of model construction and implementation, are discussed next.

5 Reflections on the Model Building and Implementation

In terms of the practical side of the construction of the model – in particular its more complex and more empirically grounded versions (Risk and Rumours and Risk and Rumours with Reality, respectively) – the modifications necessary to make the model ready for more empirically oriented studies were surprisingly easy to implement. In part, this was due to the transition to an event-based paradigm which, as set out in Chap. 7, tends to lead to a more modular model architecture.

Additionally, we found it straightforward to implement a very general scenario system in the model. This is largely because Julia – the general-purpose programming language used for this purpose – is a dynamic language that makes it easy to modify the existing code during runtime. Traditionally, dynamic languages (such as Python, Ruby or Perl) have bought this advantage at the cost of substantially slower execution, and have therefore rarely been used for time-critical modelling. Statically compiled languages such as C++, on the other hand, while much faster, make it much harder to perform these kinds of runtime modifications. Julia’s just-in-time compilation, however, offers the possibility of combining the high speed of a static language with the flexibility of a dynamic one, therefore making it an excellent choice for agent-based modelling.

As concerns the combination of theoretical modelling with empirical experiments, one conclusion we can draw is that having a theoretical model first makes designing the empirical version substantially easier. Only after implementing, running, and analysing the first version of the model (see Chap. 3) were we able to determine which pieces of empirical information would be most useful in developing the model further. This also makes a strong case for using a model-based approach not only as a tool for theoretical research, but also as a method to guide and inspire empirical studies, reinforcing the case for iterative model-based enquiries, advocated throughout this book (see Courgeau et al., 2016).

In terms of the future work enabled by the modelling efforts presented in this book, the changes implemented to the model through the process we describe would also make it easy to tackle larger, empirically oriented projects that go beyond the scope of this work. In particular, with a flexible scenario system in place, we could model arbitrary changes to the system over time. For example, using detailed data on departures, arrivals and fatalities around the Mediterranean (see Chap. 4) as well as the timing of some crucial policy changes in the EU affecting death rates, we would be able to better calibrate the model parameters to empirical data. In the next step, we could then run a detailed analysis of policy scenarios (see Chap. 9) using the calibrated model to make meaningful statements on whether an increased risk does indeed lead to a reduction of arrivals.

Similar types of scenarios can involve complex patterns of changes in border permeability, asylum policy developments, and either support or hostility directed towards refugees in different parts of Europe between 2015 and 2020. A well-calibrated model, together with an easy way to set up complex scenarios, would allow investigating the effectiveness of actual as well as potential policy measures, relative to their declared aims, as well as to humanitarian criteria. An example of applying this approach in practice, based on the Risk and Rumours with Reality model, is presented in Chap. 9. In addition, the adversarial nature of some of the agents within the model, such as law enforcement agents and migrant smugglers, can be explicitly recognised and modelled (for a thorough, statistical treatment of adversarial decision-making processes, see Banks et al., 2015).

At a higher level, model validation remains a crucial general challenge in complex computational modelling. As laid out in Chaps. 4, 5 and 6, and demonstrated above, additional data and ‘custom-made’ empirical studies, coupled with a comprehensive sensitivity and uncertainty analysis of model outcomes, can be a very useful way of directly improving aspects of a model that are known to be underdefined. In order to test the overall validity of the model, however, it ideally has to be tested and calibrated against known outcomes.

One possible way of doing that would entail focusing on a limited real-world scenario with relatively good availability of data. The assumption would then be that a good fit to the data in a particular scenario implies a good fit in other scenarios as well. For example, we could use detailed geographical data on transport topology in a small area in the Balkans, combined with data on the presence of asylum seekers in camps, coupled with registration and flow data, to calibrate the model parameters. An indication of the ‘empirical’ quality of the model would then be its ability to track historical changes in these numbers, whether spontaneous or in reaction to external factors. Given the level of spatial detail that would be required to design and calibrate such models, they remain beyond the scope of our work; however, even the version of the model presented throughout this book, and more broadly the iterative process of arriving at successive model versions in an inductive framework, enables drawing some conclusions and recommendations for practical and policy uses.

This discussion leads to a more general point: what lessons have we learned from the iterative and gradual process of model-building and its practical implementation? The proposed process, with five clearly defined building blocks, allows for greater control over the model and its different constituent parts. Analytical (and theoretical) rigour, coherence of the assumptions and results, as well as an in-built process of discovery of previously unknown features of the phenomena under study, can be gained as a result. Even though some elements of this approach cannot be seen as a purely inductive way of making scientific advances, the process nonetheless offers a clear gradient of continuous ascent in terms of the explanatory power of models built according to the principles proposed in this book, following Franck (2002) and Courgeau et al. (2016).

In terms of the analysis, the coherent description of phenomena at different levels of aggregation also helps illuminate their mutual relationships and trade-offs, as well as – through the sensitivity analysis – identify the influential parts of the process for further enquiries. Needless to say, within each of the five building blocks in its own right, including data analysis, cognitive experiments, model implementation and analysis, as well as language development, interesting discoveries can be made.

At the same time, it is also crucial to reflect on what the process does not allow. The proposed approach is unlikely to bring about a meaningful reduction in the uncertainty of the social processes and phenomena being modelled. This is especially visible in situations where uncertainty and volatility are very high to start with, as is the case for asylum migration. This point is particularly well illustrated by the uncertainty analysis presented in the previous section: introducing more realism into the model in practice meant adding more complexity, with further interacting elements and elusive features of human behaviour thrown into the design mix. It is no surprise, then, that, as in our case, this striving for greater realism and empirical grounding ultimately led to a large increase in the associated uncertainty of the model output.

In situations such as those described in this chapter, there are simply too many ‘moving parts’ and degrees of freedom in the model for a reduction of uncertainty to be even contemplated. Crucially, this uncertainty is very unlikely to be reduced with the available data: even when many data sources are seemingly available, as in the case of Syrian migration to Europe (Chap. 4), the empirical material that corresponds exactly to the modelling needs, and that can be mapped onto the sometimes abstract concepts used in the model (e.g., trust, confidence, information), is likely to be limited. This requires the modellers to make compromises and sometimes arbitrary decisions, or to leave the model parameters underspecified and uncertain, which increases the errors of the outputs further.

These limitations underline the high levels of aleatory uncertainty in the modelling of such a volatile process as asylum migration. Even if the inductive model-building process can help reduce the epistemic uncertainty to some extent, by furthering our knowledge on different aspects of the observed phenomena, it also clearly illuminates the areas we do not know about. In other words, besides learning about the social processes and how they work, we also learn about what we do not know, and may never be able to know. Besides the obvious philosophical point, variably attributed to many great thinkers from Socrates to Albert Einstein (passim), that the more we know, the more we realise what we do not know, this poses a fundamental problem for possible predictive applications of agent-based models, even empirically grounded ones.

If simulation models of social phenomena are to be realistic, and if they are to reflect the complex nature of the processes under study, their predictive capabilities are bound to be extremely limited, except perhaps for very specific and well-defined situations where an exact description of the underlying mechanisms is possible. At the same time, such models allow for knowledge advances by making theoretical explanations possible, and by furthering their depth and nuance. The process we propose in this book additionally enables researchers to identify gaps and future research directions, so that the modelling process for a given phenomenon can continue. We discuss some ideas regarding possible scientific and policy impacts in the next chapter, with examples based on the current versions of the Risk and Rumours model, both theoretical and empirically grounded.