Abstract
This chapter introduces the purpose of the book. When a researcher needs to perform microsimulation for population projections, building their own model with common statistical software such as SAS might be a good option, because this software is widely used among scholars and is taught in most social science departments. We define microsimulation as a modelling approach based on individual-level data rather than aggregate-level data, in which transitions between states are determined stochastically with a random experiment. Finally, we provide some examples of microsimulation models used by social scientists.
1.1 Why This Book?
Most population projections forecast the population using only demographic characteristics (age and sex), but the inclusion of additional dimensions such as education (Lutz et al. 2014) and sociocultural variables (Bélanger et al. 2019) is an emerging approach in the social sciences (Spielauer 2010). Indeed, in addition to providing a richer set of outputs, including additional dimensions provides more flexibility in the generation of policy-relevant alternative projection scenarios. Furthermore, it improves the overall quality of the projection, as more sources of heterogeneity are considered, which also allows for a more refined modeling of demographic events.
Traditional demographic projections using the cohort-component method can only provide outcomes related to the age and sex structure of a population. When extended to multistate and multiregional applications (Rogers 1980, 1995), more dimensions can also be added (such as region or education). Microsimulation is a powerful tool that can be used to create population projections when the number of dimensions becomes large. Such a model is very flexible and characterised by the stochastic simulation of individual life courses based on derived parameters and individual characteristics (Van Imhoff and Post 1998). Until the late 1990s, computing power was not sufficient to use microsimulation for very complex population projections. However, with a newer generation of powerful computers, some institutions around the world changed their projection methods to microsimulation (Caron-Malenfant et al. 2017).
Many microsimulation models are built using a language or software specifically designed for this purpose, such as ModGen, JAMSIM, Mic-Core, or OpenM++ (Bélanger and Sabourin 2017; Mannion et al. 2012; Zinn 2014). Using these tools requires specific and exhaustive prior knowledge, as they are complex and not user-friendly. Moreover, user guides and online support are in general limited, given the small number of users. Most of those tools are also not very flexible, as they are usually designed for a specific purpose and their functions cannot be modified or adapted easily for other purposes. This also keeps the user in the dark concerning what exactly happens when a function is called, sometimes leading to unexpected or awkward outcomes. Indeed, when using such tools, the assistance of a coding expert is generally required.
For those reasons, when a researcher needs to perform microsimulation, building one’s own model with common statistical software, such as SAS, Stata, or R, might be a good option. These programs are widely used among scholars and are taught in most social science departments, so many social scientists already have the required background in the coding language. Given a large number of users, online support can also be found easily when needed.
Microsimulation packages specifically designed for population projection already exist in R (Zinn 2014). This book is a step-by-step guide showing how to build a microsimulation model for demographic projections using the SAS language. For this book, we used SAS 9.4. The code we provide also works with other versions of SAS, such as SAS University Edition. The guide is designed for people with beginner to intermediate knowledge of SAS. We favour code that is easy to understand so that it can be replicated or adapted for other purposes; it is, however, not necessarily the most efficient.
First, this book shows how to convert an existing multistate projection by age, sex, education and region into a microsimulation model framework. Two new dimensions are then added, the labour force participation and the sector of activity, and some examples of outputs and alternative scenarios that would not be possible with standard demographic methods are shown. Other chapters show how to adapt the model for other countries or other purposes.
The book is intended for people with a good background in demography, population dynamics, and quantitative analysis, who wish to extend their technical skills by learning how to use microsimulation in demography with SAS. The user needs to know the principles of population projection, as the book does not explain how to build demographic assumptions for the future. The demographic components of the microsimulation models constructed as examples in this book come from existing multistate projections, either from KC et al. (2018) or from Lutz et al. (2018), that forecast populations by region and educational attainment. We do however build assumptions for additional dimensions of the projection, labour force participation and sector of activity, which are modelled from various surveys.
For each chapter, all input files and code files used in this book can be found in the Chapter ESM (Electronic Supplementary Materials).
1.2 What Is Microsimulation? Why Use It?
Microsimulation is an alternative approach to the deterministic macro-level population projection models that use aggregate-level data, such as the cohort-component method, to project future population dynamics. In microsimulation, the modelling is based on individual-level data. Though microsimulation methods have been conceptualised for decades and used for other purposes (Orcutt 1957), their application for population forecasts is quite new. For an exhaustive description of microsimulation for population forecasting and its properties, compared to multistate cohort-component methods, see Van Imhoff and Post (1998).
A microsimulation model starts from a baseline population that consists of individual actors whose characteristics represent the composition of a given population across chosen dimensions. These individual actors are exposed to the risk of a set of events relevant to their state and specific to their own characteristics: death, births of children (which generate new actors inside the model), moving to a different region within a country, leaving the country, achieving a level of education, entering or exiting the labour market, and so on. International immigrants enter the model with a predetermined set of individual characteristics and are subjected to the risks of the events mentioned above. Transitions between the states are determined stochastically with a random experiment (Monte Carlo method). Microsimulation thus allows not only for including a larger set of dimensions than the standard multistate population projection models, in which handling more than three or four dimensions becomes challenging, but also for handling competing risks easily.
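To make the idea of stochastic transitions with competing risks more concrete, here is a minimal Python sketch (it is not the SAS code developed in this book). The event names and probabilities are hypothetical placeholders, and comparing one uniform draw against cumulative probabilities is only one common way of resolving competing risks, assuming the events are mutually exclusive within a time step.

```python
import random

def draw_event(probabilities, rng):
    """Return the event that occurs for one actor, or None if nothing happens.

    `probabilities` maps event names to per-step probabilities. They are
    assumed to be mutually exclusive and to sum to at most 1.
    """
    u = rng.random()              # uniform random number in [0, 1)
    cumulative = 0.0
    for event, p in probabilities.items():
        cumulative += p
        if u < cumulative:        # the draw falls in this event's interval
            return event
    return None                   # the actor stays in the current state

# Hypothetical per-step risks for one type of actor (illustrative values only)
risks = {"death": 0.05, "emigration": 0.02, "internal_migration": 0.03}

rng = random.Random(2015)         # arbitrary seed, for reproducibility
counts = {}
for _ in range(10_000):           # 10,000 identical actors
    event = draw_event(risks, rng)
    counts[event] = counts.get(event, 0) + 1

print(counts)
```

With a large number of actors, the share of each event should be close to its assumed probability, while each individual outcome remains random.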
Figure 1.1 shows a simple example of how stochastic microsimulation works (using the Monte Carlo method), compared to the cohort-component method. Suppose 1000 women were aged 75–79 in 2015. If we assume a probability of dying of 5%, the cohort-component method will simply remove 5% of the cohort, and we will get 950 survivors aged 80–84 in 2020.
In microsimulation, we start with a dataset in which each row is an individual. In this example, we thus have 1000 rows representing the 1000 women aged 75–79 in 2015, all tagged as being alive. Some of them will die before 2020, about 5% according to our assumptions. We determine who dies with random experiments, which means comparing the probability of dying (5%) with a random number drawn from a uniform distribution between 0 and 1. When the random number is lower than the probability of dying, the individual dies, and we switch the variable alive to 0. Out of 1000, we will get about 950 survivors. When the sample is small (for instance 10 individuals), the number of survivors in a single run can be far from the expected number: this is the Monte Carlo error resulting from the random experiment. In these cases, it might be useful to take the average of multiple simulations or to increase the sample size, which reduces the error.
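The example above can be sketched in a few lines of Python (an illustration only, not the SAS implementation presented later in the book). The function name, the seed and the number of repeated runs are assumptions chosen for the sketch; the cohort size and the probability of dying come from the example.

```python
import random

def simulate_survivors(n_individuals, prob_dying, rng):
    """Count survivors after one projection step using Monte Carlo draws."""
    alive = [1] * n_individuals            # every individual starts alive
    for i in range(n_individuals):
        if rng.random() < prob_dying:      # draw below the probability: death
            alive[i] = 0
    return sum(alive)

rng = random.Random(12345)                 # arbitrary seed, for reproducibility

# Cohort-component method: deterministic removal of 5% of the cohort
expected = 1000 * (1 - 0.05)

# Microsimulation: one run with 1000 individuals
large_run = simulate_survivors(1000, 0.05, rng)

# Small samples: 200 independent runs of 10 individuals each
small_runs = [simulate_survivors(10, 0.05, rng) for _ in range(200)]
average_small = sum(small_runs) / len(small_runs)

print("cohort-component survivors:", expected)
print("microsimulation, n=1000:", large_run)
print("n=10, range over 200 runs:", min(small_runs), "to", max(small_runs))
print("n=10, average over 200 runs:", average_small)
```

A single run with 10 individuals can only return a whole number of survivors and may deviate noticeably from the expected 9.5, whereas the average over many runs should be much closer to it.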
If microsimulation gives similar results to the cohort-component methods, why choose this method? Spielauer (2010) describes three broad situations when microsimulation should be used, rather than the multistate cohort-component method:
- When heterogeneity matters in the projection modeling or in the projection outcomes. The multistate cohort-component method can only handle a limited number of dimensions because the number of cells in the transition matrices is the product of the number of categories of each dimension. In microsimulation, each additional dimension only adds a new column to the dataset. Suppose we have a 7-dimension model projecting age (20 age groups), sex (2 categories), education (6 categories), education of the mother (6 categories), region (70 categories), labour force participation (2 categories) and child parity (10 categories). The matrix for a multistate model would require more than 2 M cells (20*2*6*6*70*2*10). In microsimulation, the number of cells is the number of individuals in the sample multiplied by the number of dimensions. So if we have a sample of 100,000 individuals, the number of cells would be 700,000 (7 dimensions * 100,000). Then, if we want to add another dimension, for instance religion in 4 categories, the number of cells in the multistate model would be multiplied by 4 and would exceed 8 M, while in the microsimulation model, we just add one column to the dataset and get 800,000 cells, which is much more manageable.
- When behaviours can be better understood at the micro level than at the aggregate level. For instance, the number of years spent in a country is a major predictor of immigrants’ fertility, mortality or labour force participation. At the micro level, these predictors can easily be taken into account. Only one additional column is required for the variable “time spent in the country”, the value of which is incremented every year without any complex modeling. The variable can then be used in the modeling of other events, using, for instance, relative risks and logit parameters.
- When individual histories matter. For instance, past life habits might have a big impact on mortality at older ages. Similarly, retirement pensions depend in many cases on past income and the number of years worked. Microsimulation can also easily keep a record of the birth history of women. Every time the birth event occurs, we can just increment a variable “number of births”, which can then be used once women get older to analyse their potential as caregivers.
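The cell counts given in the first situation above can be checked with a short Python calculation. The dictionary keys and function names are labels chosen for this sketch; the numbers of categories and the sample size are those of the example.

```python
from math import prod

# Number of categories for each dimension of the 7-dimension example
categories = {
    "age": 20,
    "sex": 2,
    "education": 6,
    "education_of_mother": 6,
    "region": 70,
    "labour_force_participation": 2,
    "parity": 10,
}
sample_size = 100_000

def multistate_cells(cats):
    """Cells in a multistate matrix: product of the category counts."""
    return prod(cats.values())

def microsimulation_cells(cats, n_individuals):
    """Cells in a microsimulation dataset: one column per dimension."""
    return n_individuals * len(cats)

print(multistate_cells(categories))                    # 2016000
print(microsimulation_cells(categories, sample_size))  # 700000

# Adding religion in 4 categories as an eighth dimension
with_religion = {**categories, "religion": 4}
print(multistate_cells(with_religion))                    # 8064000
print(microsimulation_cells(with_religion, sample_size))  # 800000
```

The multistate count grows multiplicatively with each new dimension, whereas the microsimulation count grows by one column of 100,000 cells.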
1.3 Examples of Demographic Projections Using Microsimulation
Many types of microsimulation models have been developed and used to address different types of research questions in various fields. For example, they have been used to evaluate the future performance of long-term programs such as pensions (Morrison 2017) and long-term care (Carrière et al. 2008), to simulate the potential impacts of prospective public policies or policy changes (Sutherland 2007), and to project life-time behaviours (e.g. saving) or complex dynamics (e.g. ageing) for policy analysis (Sundberg 2007). An exhaustive overview of microsimulation applications in social sciences and other areas can be found elsewhere (Li & O’Donoghue 2013; Spielauer 2010).
Recent developments in computing technology, as well as the rise in the number of micro-data sources needed to calculate the parameters of microsimulation, have made it easier to develop more complex models and have increased the level of interest in such models (Bélanger and Sabourin 2017). Those interested in reading about the different uses of specific microsimulation models and their specific methodological issues can browse the International Journal of Microsimulation, which is the official peer-reviewed journal of the International Microsimulation Association.
With regard to population projections that use microsimulation, Statistics Canada, the official statistical agency of Canada, is a pioneer. The agency has used microsimulation methods for its official projections for many years. This started in 2004 with the model PopSim (now DemoSim), which was designed to project the Canadian population in terms of various characteristics (Caron-Malenfant et al. 2017). The model is built using the ModGen language and its most recent version begins with the microdata file of the National Household Survey of 2011. It projects, dynamically and in continuous time, sociodemographic characteristics such as age, sex, education and labour force participation on the one hand, and several ethnocultural variables, such as visible minority group, place of birth, generation status, and language, on the other hand.
As Canada is becoming more and more diverse with large inflows of international immigrants, the model includes explicitly the different behaviours of ethnocultural groups living in the country. Among other sources of heterogeneity, the model accounts for higher fertility for some ethnic groups (Black, Muslim, First Nations), as well as for recent immigrants, compared to those who have been living in the country longer. It accounts for the higher propensity of international immigrants to emigrate (return migration), as compared to the native population. The “healthy immigrant” effect is also implemented, which provides immigrants with lower probabilities of dying in the years following their arrival as a result of direct and indirect immigration selection (McDonald and Kennedy 2004). Domestic migration is also modulated by languages, as the French and English speakers that constitute the core of the Canadian population have very different mobility patterns.
Microsimulation is the only possible method for dynamically including such heterogeneity in sociodemographic behaviours, thus allowing for more accurate and more detailed projection outcomes. Statistics Canada has used the model to produce several reports on future Canadian populations, such as visible minority groups (Morency et al. 2017), aboriginal populations (Caron Malenfant et al. 2015), and language speakers (Houle and Corbeil 2017), and to forecast labour force participation (Martel 2019).
DemoSim is built using several confidential data files that are not available to external researchers. From public microdata files, the Laboratoire de simulations démographiques (LSD) (Demographic Simulation Laboratory) of the Institut national de la recherche scientifique (National Institute for Scientific Research) proposed a framework for a lighter version of the microsimulation model that could project the population while accounting for several sociodemographic and ethnocultural variables, in order to study population changes in a context of relatively high immigration and low fertility (Bélanger et al. 2019). This framework has been adapted to produce several region-specific versions. For instance, the LSD framework was used to build a model for the United States (Van Hook et al. 2020), LSD-USA, from the anonymised public files of the 2015 American Community Survey and General Social Surveys (1995–2015). It projects the population of the USA to 2065 and includes dimensions such as race, generation, duration of stay, education and labour force participation. LSD-USA has been used to project the effect of several policy-oriented scenarios regarding immigration levels and educational attainment on the future workforce of the country (Van Hook et al. 2020).
From the LSD framework, the Center of Expertise on Population and Migration (CEPAM), a partnership between the International Institute for Applied Systems Analysis and the Joint Research Center of the European Commission, built a similar model called CEPAM-Mic (Bélanger et al. 2019). The base population and assumptions are built from different sources: public microdata files of European Labour Force Surveys and General Social Surveys on the one hand, and aggregated data from the Census 2011 and from a multistate cohort-component model on the other hand.
The CEPAM-Mic model can dynamically project the population for the EU28 member states in terms of several socioeconomic and ethnocultural dimensions, including education, labor force participation, employment, age at immigration, region of birth, duration of residence, education of the mother, religion and language. This model allows for the study of alternative scenarios of migration and their consequences on future populations and labour supply trends in the European Union. It has been used to assess policy-relevant scenarios with regard to sociocultural inequalities in education (Marois et al. 2019a) and integration of immigrants (Marois et al. 2019b), as well as to propose an innovative dependency ratio that takes into account the productivity of workers (Marois et al. 2020). CEPAM-Mic allows researchers to assess a large range of policy-relevant alternative scenarios and produce indicators showing that population aging is less daunting than it may seem when only age structure is considered.
Beyond ethnocultural and sociodemographic variables, other types of dimensions can also be implemented in microsimulation models for demographic projections. Starting from the CEPAM-Mic model mentioned above, the model ATHLOS-Mic implements a health module that refines projection outcomes (Marois and Aktas 2021). This module adds a health metric ranging from 0 to 100 and a set of risk factors (such as smoking, obesity, etc.) to the characteristics of individuals. Changes in risk factors are determined with logit regression parameters that take into account other risk factors. The value of the health metric, which is also used to modulate the probability of dying, is then determined from risk factors and other sociodemographic characteristics. This model thus allows researchers to assess the impact of policy-intervention scenarios on different outcomes, such as the number of years of life lost or the average health of the population.
References
Bélanger, A., & Sabourin, P. (2017). Microsimulation and population dynamics: An introduction to Modgen 12. Springer International Publishing.
Bélanger, A., Sabourin, P., Marois, G., et al. (2019). A framework for the prospective analysis of ethno-cultural super-diversity. Demographic Research, 41, 293–330. https://doi.org/10.4054/DemRes.2019.41.11
Caron Malenfant, É., Coulombe, S., Langlois, S., & Morency, J.-D. (2015). Projections of the aboriginal population and households in Canada, 2011–2036. Statistics Canada, Ottawa, Canada.
Caron-Malenfant, É., Coulombe, S., & Grenier, D. (2017). Demosim: An overview of methods and data sources.
Carrière, Y., Keefe, J., Légaré, J., et al. (2008). Projecting the future availability of the informal support network of the elderly population and assessing its impact on home care services. Statistics Canada.
Houle, R., & Corbeil, J.-P. (2017). Language projections for Canada, 2011–2036. Statistics Canada, Ottawa, Canada.
KC, S., Wurzer, M., Speringer, M., & Lutz, W. (2018). Future population and human capital in heterogeneous India. Proceedings of the National Academy of Sciences of the United States of America, 115, 8328. https://doi.org/10.1073/pnas.1722359115
Li, J., & O’Donoghue, C. (2013). A survey of dynamic microsimulation models: Uses, model structure and methodology. International Journal of Microsimulation, 6(2), 3–55.
Lutz, W., Butz, W. P., & KC, S. (Eds.). (2014). World population and human capital in the twenty-first century. Oxford, UK: Oxford University Press.
Lutz, W., Goujon, A., KC, S., et al. (Eds.). (2018). Demographic and human capital scenarios for the 21st century. Luxembourg: Publications Office of the European Union.
Mannion, O., Lay-Yee, R., Wrapson, W., et al. (2012). JAMSIM: A microsimulation modelling policy tool. Journal of Artificial Societies and Social Simulation, 15.
Marois, G., & Aktas, A. (2021). Projecting health-ageing trajectories in Europe using a dynamic microsimulation model. Scientific Reports, 11, 1785. https://doi.org/10.1038/s41598-021-81092-z
Marois, G., Sabourin, P., & Bélanger, A. (2019a). How reducing differentials in education and labor force participation could lessen workforce decline in the EU-28. Demographic Research, 41, 125–160.
Marois, G., Sabourin, P., & Bélanger, A. (2019b). Implementing dynamics of immigration integration in labor force participation projection in EU28. Population Research and Policy Review. https://doi.org/10.1007/s11113-019-09537-y
Marois, G., Bélanger, A., & Lutz, W. (2020). Population aging, migration, and productivity in Europe. Proceedings of the National Academy of Sciences of the United States of America, 117, 7690. https://doi.org/10.1073/pnas.1918988117
Martel, L. (2019). The labour force in Canada and its regions: Projections to 2036. Statistics Canada, Ottawa, Canada.
McDonald, J., & Kennedy, S. (2004). Insights into the “healthy immigrant effect”: Health status and health service use of immigrants to Canada. Social Science & Medicine, 59, 1613–1627.
Morency, J.-D., Caron Malenfant, É., & MacIsaac, S. (2017). Immigration and diversity: Population projections for Canada and its regions. Statistics Canada, Ottawa, Canada.
Morrison, R. J. (2017). Rates of return in the Canada pension plan: Sub-populations of special policy interest and preliminary after-tax results. In: New frontiers in microsimulation modelling. Routledge, London.
Orcutt, G. H. (1957). A new type of socio-economic system. The Review of Economics and Statistics, 39, 116–123. https://doi.org/10.2307/1928528
Rogers, A. (1980). Essays in multistate mathematical demography. Laxenburg, Austria: International Institute for Applied Systems Analysis (IIASA).
Rogers, A. (1995). Multiregional demography: Principles, methods and extensions. Chichester, UK: Wiley.
Spielauer, M. (2010). What is social science microsimulation? Social Science Computer Review, 29, 9–20. https://doi.org/10.1177/0894439310370085
Sundberg, O. (2007). Model 5: SESIM (Longitudinal dynamic microsimulation model). Modelling our future: Population ageing, health and aged care (pp. 453–460). Bingley: Emerald Group Publishing Limited.
Sutherland, H. (2007). EUROMOD: The tax-benefit microsimulation model for the European union. Modelling our future: Population ageing, health and aged care (pp. 483–488). Amsterdam: Elsevier.
Van Hook, J., Bélanger, A., Sabourin, P., & Morse, A. (2020). Immigration selection and the educational composition of the US labor force. Population and Development Review, 46, 321–346. https://doi.org/10.1111/padr.12315
Van Imhoff, E., & Post, W. (1998). Microsimulation methods for population projection. Population: an English Selection, 10, 97–138.
Zinn, S. (2014). The MicSim package of R: An entry-level toolkit for continuous-time microsimulation. International Journal of Microsimulation, 7, 3–32.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2021 The Author(s)
About this chapter
Cite this chapter
Marois, G., KC, S. (2021). Introduction. In: Microsimulation Population Projections with SAS. SpringerBriefs in Population Studies. Springer, Cham. https://doi.org/10.1007/978-3-030-79111-7_1
DOI: https://doi.org/10.1007/978-3-030-79111-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79110-0
Online ISBN: 978-3-030-79111-7
eBook Packages: Social Sciences (R0)