In the concluding chapter we summarise the theoretical, methodological and practical outcomes of the model-based process of scientific enquiry presented in the book, against the wider background of recent developments in demography and population studies. We offer a critical self-reflection on further potential and on limitations of Bayesian model-based approaches, alongside the lessons learned from the modelling exercise discussed throughout this book. As concluding thoughts, we suggest potential ways forward for statistically-embedded model-based computational social studies, including an assessment of the future viability of the wider model-based research programme, and its possible contributions to policy and decision making.

1 Bayesian Model-Based Population Studies: Moving the Boundaries

Given the current state of knowledge, what are the prospects for computational migration and population modelling? The two intertwined challenges, those of uncertainty and complexity, can be broken down into a range of specific knowledge gaps, dependent on the context and research questions being addressed. The explanatory power of simulation models (for a general discussion, see Franck, 2002 and Courgeau et al., 2016), well suited for tackling the complexity of social processes, such as migration, can be coupled with statistical analysis aimed at quantifying uncertainty. Throughout this book, we have argued for the use of modelling and its encompassing statistical analysis as elements of a language for describing and formalising relationships between elements of complex systems. We discuss some of the specific points and lessons next.

The main high-level argument put forward in this book is that model building is – or needs to be – a continuing process, which aims to reduce the complexity of social reality. The formal sensitivity analysis helps retain focus on the important aspects, while disregarding those whose impact is only marginal. All the constituting building blocks of this process are therefore important: starting from the computational model itself, and its implementation in a suitable programming language, through empirical data, information on human decision making – which, as in our case, can come from experiments – and the statistical analysis of each model version. All of these elements contribute to our greater ability to understand the model workings, while retaining realism about the degree to which the model remains a faithful description of the reality it aims to represent. The formalisation of model analysis also allows us to explore the model behaviour and outcomes in a rigorous way, while being transparent about the assumptions made. In this way, we can illuminate the micro-level mechanisms (micro-foundations) that generate the population-level processes we observe at the macro scale, while formally acknowledging the different sources of their uncertainty.
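To make the role of formal sensitivity analysis more concrete, the following minimal Python sketch estimates first-order variance-based (Sobol') sensitivity indices for a deliberately simple stand-in simulator. Everything in it is an assumption made for illustration: the function `toy_model`, its three inputs and coefficients, and the uniform input ranges are hypothetical and are not taken from the migration model discussed in this book, and the estimator shown is one common pick-and-freeze variant rather than the specific approach applied in earlier chapters.

```python
import random
import statistics


def toy_model(x1, x2, x3):
    # Hypothetical stand-in for a simulation model: two influential inputs
    # and one input (x3) whose impact on the output is only marginal.
    return 4.0 * x1 + 2.0 * x2 + 0.1 * x3


def first_order_indices(model, n_inputs, n_samples=20000, seed=1):
    """Estimate first-order sensitivity indices with a pick-and-freeze scheme.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    # Two independent input samples, A and B.
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(*row) for row in a]
    y_b = [model(*row) for row in b]
    total_var = statistics.pvariance(y_a + y_b)

    indices = []
    for i in range(n_inputs):
        acc = 0.0
        for row_a, row_b, ya, yb in zip(a, b, y_a, y_b):
            # Take the row from A, but replace input i with its value from B.
            mixed = list(row_a)
            mixed[i] = row_b[i]
            acc += yb * (model(*mixed) - ya)
        # Share of the output variance attributable to input i alone.
        indices.append(acc / n_samples / total_var)
    return indices


if __name__ == "__main__":
    for name, s in zip(("x1", "x2", "x3"), first_order_indices(toy_model, 3)):
        print(f"{name}: first-order index ~ {s:.3f}")
```

In this toy setting, the third input should receive an index close to zero, which is the kind of signal that would justify fixing it at a default value in the next model iteration. For an expensive simulation, the same logic would more realistically be applied to a statistical emulator of the model rather than to the model itself.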

Of course, when it comes to representing reality, any model is likely to resemble the actual processes more closely under specific conditions than in general. To that end, adding more detail and data helps approximate reality, but this comes at the cost of increased uncertainty. In the process, the models also run the risk of losing generality, and their nature becomes more descriptive than predictive or explanatory. At the same time, as shown in Chap. 9, there are also trade-offs between the different purposes of modelling: better predictive capabilities can come at the expense of a model’s power to explain the underlying mechanisms, if the model is dominated by the information used for its calibration.

In such cases, additional effort is required in terms of data collection and assessment, to make sure that the model-based description of an idiosyncratic social process is as accurate as possible. The successive model iterations may then not be strictly embedded within one another, so that the ‘ascent’ of knowledge, which would be ideally seen in the classical inductive approach, is not necessarily monotonic (Courgeau et al., 2016). Still, even in such cases, the more detailed models can offer more accurate approximations of the reality. Formal description of the model-building process, for example by using provenance modelling tools discussed in Chap. 7, can help shed light on that, while keeping track of the developments in the individual building blocks in the successive model versions.

At the same time, such models can retain some ability to generalise their outcomes, although at the price of increased uncertainty. To that end, models can still make some theoretical contributions (Burch, 2018), especially if ‘theory’ is not interpreted in a strict nomological way, as a set of well-established propositions from which the predictions can be simply deduced (Hempel, 1962). Instead, the models can answer well-posed explanatory questions (‘how?’) in a credible manner – offering increasingly plausible descriptions of the underlying social mechanisms, as long as their construction follows several iterations of the outlined process, checking the model-based predictions against the observed reality. At the same time, some residual (aleatory) uncertainty always remains, especially in the modelling of social processes, and addressing it requires going beyond models alone.

In the light of the above findings, the modelling processes can also be given novel interpretations. Social phenomena, such as migration, are very complicated and complex inverse problems, which in the absence of an omniscient Laplace’s demon – a hypothetical being with the complete knowledge of the world, devoid of the epistemic uncertainty – do not have unique solutions (see Frigg et al., 2014). The scientific challenges of model identifiability are therefore akin to the studies of non-response or missing information, but this time carried out on a space of several possible (and plausible) models. Model choice becomes yet another source of the uncertainty of the description of the process under study, alongside the data, parameters, expert input, and so on. Still, the iterative model construction process advocated throughout this book enables building models of increasing analytical and explanatory potential, which at the same time remain computationally tractable.

This is yet another argument for turning to the philosophy of Bayesian statistical inference: the initial model specification is but a prior in the space of all possible models, and the modelling process by which we can arrive at the increasingly accurate approximations of reality is akin to Bayesian model selection. Of course, there is an obvious limitation here of being restricted to a class of models pre-defined by the modellers’ choices and, ultimately, their imagination (see also the discussion of inductive and abductive reasoning in Chap. 2). The inductive process of iterative learning about the dynamics of complex phenomena, besides being potentially Bayesian itself, can also include several other Bayesian elements, describing the uncertainty of different constituting parts, such as individual decisions of agents in the model (and updating of knowledge), model estimation and calibration, and meta-modelling.
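The analogy with Bayesian model selection can be written down in standard textbook notation. The symbols below are assumptions introduced here for illustration and do not appear elsewhere in the book: $M_1, \dots, M_K$ denote the candidate models considered by the modellers, $y$ the observed data, and $\theta_k$ the parameters of model $M_k$.

```latex
\[
  p(M_k \mid y)
  = \frac{p(y \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(y \mid M_j)\, p(M_j)},
  \qquad
  p(y \mid M_k)
  = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, \mathrm{d}\theta_k .
\]
```

Here $p(M_k)$ plays the role of the prior over model specifications, and the posterior probabilities $p(M_k \mid y)$ summarise what has been learned about the competing models from the data. The sum in the denominator runs only over the $K$ models that have actually been specified, which is the formal counterpart of the limitation noted above: the comparison is always restricted to the class of models that the modellers were able to imagine.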

The status quo in demography and population studies, on which this work builds, can be broadly described as the domination of empiricism at the expense of more theoretical enquiries (Xie, 2000), with an increasing recognition that some areas of theoretical void can be filled by formal models (see Burch, 2003, 2018). At the same time, recent years have seen promising advances in demographic and social science methodology. The modelling approaches of statistical demography, including Bayesian ones, hardly existent until the second half of the twentieth century, are now a well-established part of mainstream population sciences (Courgeau, 2012; Bijak & Bryant, 2016), while agent-based and other computational approaches, despite recent advances (Billari & Prskawetz, 2003; van Bavel & Grow, 2016), remain something of a novelty. So far, as discussed in Daniel Courgeau’s Foreword, these two modelling approaches have remained largely unconnected, and connecting them was one of the main motivations behind undertaking the work presented in this book.

Against this background, our achievements can be seen both at the level of the individual constituent parts of the modelling process, presented in Chaps. 3, 4, 5, 6, and 7, and – if still tentatively – in the way in which they can coherently work together. To that end, advances made at the level of process development and documentation, together with their philosophical underpinnings, offer a blueprint for constructing empirically relevant computational models for studying population (and, more broadly, social) research questions. The opening up of population and other social sciences to new approaches and insights from other disciplines can be an important step towards moving the boundaries of analytical possibilities for studying the complex and the uncertain social world. However, despite all the advances, some important obstacles on this journey remain, which we discuss next.

2 Limitations and Lessons Learned: Barriers and Trade-Offs

From the discussion so far, key challenges for advancing the Bayesian model-based agenda for population and broader social sciences are already clear. The main one relates to putting the different building blocks together in a unified, interdisciplinary modelling workflow. The interdisciplinarity is of lesser concern: most disciplines in social sciences are very familiar and comfortable with the high-level notion of modelling as an approximation of reality, so all that is needed for a successful bridging of disciplinary barriers is willingness to share other perspectives, open communication, and clear definitions of the concepts and ideas so that they can be understood across disciplines.

A much greater challenge lies in the fusion of different building blocks at an operational level. How to include experimental results in the simulation model? How to operationalise data and model uncertainty? How to implement the model in a way that balances computational efficiency with the transparency of code? These are just a few examples of questions that need answering for this approach to reach its full potential. Some possible ways of dealing with these challenges have been proposed throughout this book, but they are just the tip of the iceberg. To develop some of these ideas further, and to come up with robust practical recommendations, a higher-level reflection is needed. Such a synthetic view and advice could be offered, for example, from the point of view of philosophy of science, science and technology studies, or similar meta-disciplines.

Another key challenge relates to the empirical information being too sparse and not well tailored, either to the model requirements or to answering individual research questions. What is contained in the publicly available datasets is often, at least to some extent, different to what is needed for modelling purposes. This leads to important problems at several levels. First, the models can be only partially identified through data, with many data gaps and free parameters compounding the output uncertainty. Second, the quality of the existing data may be low, with their uncertainty assessment introducing additional errors into the model. Third, the use of proxies for variables that conceptually may be somewhat different (e.g. GDP per capita instead of income, or Euclidean distance between capital cities of origin and destination countries instead of the distance travelled) can introduce additional biases and uncertainty, not all aspects of which may be readily visible even after a thorough quality assessment (see Chap. 4). The operationalisation problem is particularly acute for such variables and concepts as, for example, trust, risk-aversion, or many other psychological traits, for which no standard measures exist.

At the same time, as shown in Chaps. 5 and 8, modelling coupled with a formal sensitivity analysis can provide a way of identifying the data and knowledge gaps, and consequently of filling them with information collected through dedicated means. From the point of view of addressing individual research questions, this can be quite resource-consuming, sometimes prohibitively so, as collecting new data requires additional time, labour and money. Yet when such data can be generated and deposited in an open-access repository, these activities can offer positive externalities for a broader research community, with the possible applications of the collected data going beyond a particular piece of research (see Chap. 10). The same holds for tailor-made experiments, for which an additional aspect of the sensitivity analysis involves verifying the impact of psychologically plausible decision rules and mechanisms against the default placeholder assumptions, such as rational choice and maximum utility (Chap. 6).

The interpretation of models as tools to broaden the understanding of the processes at hand, through illuminating the information gaps, feedbacks, unintended consequences, and other aspects of individual-level human decisions and their impact on observed macroscopic, population-level patterns, is one of the many non-predictive applications of formal modelling (Epstein, 2008). In fact, as with the examples presented in this book, the purely predictive uses of models become of secondary importance. There is so much uncertainty in complex social and population processes that not only does a proper description of the full extent of this uncertainty become difficult, but any formal decision analysis based on such predictive models would also be very limited, and may well be hardly possible.

In the case of complex social processes, even once everything that is potentially known or knowable has been accounted for, and the corresponding epistemic uncertainty, related to imperfect knowledge, has been reduced, the residual uncertainty remains large. Even the most carefully designed and calibrated models still reflect the underlying messy and complex social reality, which is characterised by relatively large and irreducible aleatory uncertainty, related to the intrinsic randomness of the social world. For such applications, the focus of the analysis shifts from exact prediction and the resulting well-defined cost-benefit decision analysis, to aiding the broader preparedness and planning. In this way, the models can play an important role in testing the impact of different scenarios and assumptions, including qualitative ones, in a logically coherent simulated environment (Chap. 9).
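The distinction between the two kinds of uncertainty can be illustrated with a small numerical sketch based on the law of total variance. The set-up below is entirely hypothetical and is not the model analysed in this book: a single parameter `theta` is drawn from an assumed normal distribution (standing in for epistemic uncertainty about the parameter), and each simulated outcome then adds assumed normal noise around `theta` (standing in for the aleatory component). The function name `decompose_variance` and all numerical values are illustrative assumptions.

```python
import random
import statistics


def decompose_variance(param_sd, noise_sd, n_params=200, n_runs=200, seed=42):
    """Split output variance into aleatory and epistemic parts.

    Uses the law of total variance:
        Var(Y) = E[Var(Y | theta)] + Var(E[Y | theta]).
    """
    rng = random.Random(seed)
    run_means, run_vars, all_outputs = [], [], []
    for _ in range(n_params):
        # Epistemic layer: we do not know the parameter exactly.
        theta = rng.gauss(100.0, param_sd)
        # Aleatory layer: repeated runs differ even for a fixed parameter.
        outputs = [rng.gauss(theta, noise_sd) for _ in range(n_runs)]
        run_means.append(statistics.fmean(outputs))
        run_vars.append(statistics.pvariance(outputs))
        all_outputs.extend(outputs)
    return {
        "total": statistics.pvariance(all_outputs),
        # Average within-parameter variance: intrinsic randomness.
        "aleatory": statistics.fmean(run_vars),
        # Variance of the per-parameter means: parameter uncertainty
        # (plus a small Monte Carlo error from the finite number of runs).
        "epistemic": statistics.pvariance(run_means),
    }


if __name__ == "__main__":
    for label, param_sd in (("before learning", 10.0), ("after learning", 1.0)):
        parts = decompose_variance(param_sd=param_sd, noise_sd=20.0)
        print(label, {k: round(v, 1) for k, v in parts.items()})
```

Shrinking `param_sd` mimics the effect of additional data or calibration: the epistemic component falls, while the aleatory component is left essentially untouched. In this stylised setting, that residual variance is what remains even after everything knowable about the parameter has been learned.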

The main lessons learned from the model-based endeavours, however, are about trade-offs. Of course, such trade-offs also exist at the level of the model analysis, with changes in some variables having non-trivial impact on others through non-linear relationships and feedback loops. Still, from the methodological point of view, even more important may be the process-level trade-offs, such as between increasing the level of detail and description of the social phenomena (topology of the world, decision processes, agents’ memory and learning, and so on), and the computational constraints, including run times and computer memory efficiency.

Every building block of the modelling process includes trade-offs as well. For data, the choice may be between their bias and variance; for experiments, between different levels of cognitive plausibility and less realistic default assumptions; for implementation, between general-purpose and domain-specific languages; for the analysis, between descriptive and more sophisticated analytical tools; and for documentation, between description and formalisation. As in real life, modelling leaves plenty of room for choice, but the model-based process we suggest in this book is designed to help make these choices and their consequences transparent and explicit.

3 Towards Model-Based Social Enquiries: The Way Forward

So, in summary, what can formal models and the lessons learned from following an interdisciplinary modelling process potentially offer population and other social scientists? The specific findings and more general reflections reported throughout this book point to important insights that can be generated by modelling, not necessarily limited to the specific research question or questions, but also leading to chance discoveries of some related process features, which can in turn produce new insights or lines of enquiry. In this way, modelling increases not only our understanding of the pre-defined features of the processes, but also the more general characteristics of the process dynamics. This is especially important for such complex and uncertain phenomena as migration flows. At the same time, it is also important to reflect on the practical limitations of furthering the model-based agenda, and health warnings related to the interpretation of the model results.

The key lessons from the work we describe throughout this book are threefold. First, modelling of a complex social phenomenon itself is a process, not a one-off endeavour. The process is iterative, and its aim is a sequence of ever-better approximations of the problem at hand, in line with the inductive philosophical principles of the scientific method, possibly coupled, where needed, with the pragmatic tenets of abductive reasoning (see Chap. 2). Second, the many facets of the modelling process – as well as of the process being modelled, especially in the social realm – require true interdisciplinarity and interconnectedness between the different perspectives, rather than working in individual, discipline-specific silos. Third, the formal acknowledgement of uncertainty – in the data, parameters, and models themselves – needs to be central to the modelling efforts. Given the complex and highly structured nature of social problems, Bayesian methods provide an appropriate formal language for describing this uncertainty in different guises. These principles, coupled with a thorough and meticulous documentation of the work, both for legacy purposes and possible replication (see Chap. 10), are the main scientific guidelines for model development and implementation.

At the same time, the impact of models is not limited to the scientific arena. To make the most of the modelling endeavours targeted at practical applications, as argued in Chap. 9, the involvement of the users and other relevant audiences in the modelling process needs amplifying. This in turn requires greater modelling literacy on the part of the model users, next to statistical literacy (Sharma, 2017). The onus of ensuring greater literacy is on the modellers, though: the communication of model workings and limitations needs to be specific and trustworthy, and provided at the right level of technical detail for the audience to understand. The levels of trust can be, of course, heightened by following established conventions in modelling (see Chap. 3): carrying out a thorough assessment of the available data (Chap. 4) and a multi-dimensional assessment of uncertainty (Chap. 5); following established ethical principles in gathering information that requires it (Chap. 6); and providing meticulous documentation of the process, for example through ODD and provenance description (Chap. 7). In short, the keys to good communication and effective user involvement are transparency, rigour, and awareness of the limitations of modelling. At the same time, the very purpose of model-building, and any practical uses of the models, are also related to societal values and can have ethical dimensions, which need to be borne in mind.

There are other practical obstacles related to interdisciplinary modelling. Large and properly multi-perspective modelling endeavours are themselves complex, time-consuming and costly, having to rely on interdisciplinary teams. For communication within teams, a common language needs to be established, ensuring that the joint efforts are targeting shared problems. Even within the best-functioning teams, however, scientific challenges at the connecting points between the disciplines are inevitable (see Chap. 8). At the same time, overcoming them takes time and patience. Some interesting discoveries reported in this book were a result of our evolution in thinking about the modelling process and its components over the course of a five-year project. That there are not many existing examples of such modelling projects and endeavours is exactly why such work is both needed and so difficult at the same time. This is also why large-scale scientific investments, offering funding beyond disciplinary silos, with modelling explicitly recognised as a cross-cutting activity, are of crucial importance. They provide the necessary structures to help scientists from different areas connect by making them learn – and speak – the same language: the language of formal models.

Of course, modelling cannot solve all problems faced by population sciences, migration studies, or social enquiries more generally. As argued above, the aleatory uncertainty, some of which is related to human behaviour and agency, remains irreducible: this is in fact a welcome sign of the power of the human spirit, free will and imagination. Still, formal models can help us get answers to questions that are more complex and sophisticated – and hopefully also more interesting and relevant – than those allowed by the more traditional social science tools. This is the beginning of a longer journey into the world of modelling, and despite the price that has to be paid for engaging in such activities, it is definitely worth undertaking, for the sake of exploring new intellectual horizons, designing more robust solutions to practical and policy problems, and ultimately making the social world a bit less uncertain.