1 Introduction

Prenatal events have far-reaching consequences for child health that persist well into adulthood, affecting mental health, educational attainment and labor market outcomes (Almond and Currie, 2011; Almond et al., 2018). A large strand of the literature has proposed stress during pregnancy as one of the key drivers of adverse health outcomes for children (Black et al., 2016; Mansour and Rees, 2012; Camacho, 2008). Yet, these studies face two empirical challenges that preclude a clear conclusion about the role of stress. First, unobserved heterogeneity at the level of the household may lead to differential selection into stressful environments, based on risk aversion, job flexibility or other unobservable characteristics, and thus bias the results (Koppensteiner and Manacorda, 2016; Black et al., 2016; Duncan et al., 2016; Currie and Rossin-Slater, 2013). Second, in the context of conflict-related stress, many of these studies focus on secular rises or extreme episodes of violence, such as 9/11 or the al-Aqsa Intifada, with potentially large consequences for local infrastructure and access to prenatal care (Eccleston, 2011; Mansour and Rees, 2012; Akbulut-Yuksel, 2014; Lee, 2014; Quintana-Domeque and Ródenas-Serrano, 2017; Lebedinski et al., 2021).

In this paper, I address these empirical challenges in two ways. First, I exploit within-mother variation in exposure to violence during pregnancy, effectively comparing differences in health outcomes between siblings with and without prenatal exposure to violence. Second, I leverage rich information on the type, severity and perpetrator of violence, which allows me to consider incidents with little impact on the local infrastructure and incidents that are arguably more stressful. I investigate this question in an understudied context: insurgent violence following the US invasion of Iraq in 2003. I combine geo-coded information on civilian casualties between 2003 and 2009 from Condra and Shapiro (2012) with detailed information on outcomes for over 36,000 children from the Multiple Indicator Cluster Survey (MICS) conducted in Iraq in 2011. I use information on children’s day of birth to trace back in utero exposure to violence and include mother fixed effects and children’s year of birth interacted with district fixed effects. Additionally, this paper is the first to go beyond birth weight as a primary measure of health, following children throughout their first years of life and tracking their cognitive, motor and behavioral skills.

The identifying assumption relies on the exogeneity of the timing of exposure to violence. I present several pieces of evidence that support the plausibility of this assumption. First, I show in a balance test that experiencing a one-casualty violent incident during pregnancy is not systematically related to an array of household and mother characteristics. Second, a placebo exercise reveals no systematic differences in health outcomes between non-exposed children in treated households and non-exposed children in control households. Third, focusing on selective fertility and migration, I show that it is unlikely that mothers either anticipate small-scale violent incidents or adjust their behavior differentially ex post.

My estimates suggest that one violent incident during pregnancy significantly decreases height- and weight-for-age z-scores and reduces children’s cognitive and motor skills. The magnitudes are large: in utero exposure to violence decreases children’s height and weight by approximately 0.13 standard deviations, which translates into 2.7 cm and 370 grams for a 1-year-old child. Children also perform worse in terms of cognitive and motor skills compared to children without in utero exposure to violence. It is important to note that, with the exception of a few households in Kurdish districts, all households experience at least one violent incident throughout the observation period between 2006 and 2009. The magnitudes are therefore particularly striking since they capture the marginal effect of an additional violent incident in utero.

I investigate stress as the primary driver of the adverse effects of violence on child health. There is a substantial medical literature linking in utero conditions, including stress, to health outcomes of children. Cortisol, a hormone released during stress, is managed by the body’s hypothalamic-pituitary-adrenal (HPA) axis. In pregnant women, higher cortisol levels can occur due to additional release from the placenta. Stress reduces the placental conversion of cortisol to its inactive form, cortisone, leading to constriction in the uterine arteries, which restricts blood flow to the fetus, resulting in poor birth outcomes such as low weight or small size at birth (Babu et al., 2022). Cortisol increases the risk of adverse birth outcomes especially in the early stages of pregnancy (Beydoun and Saftlas, 2008; László et al., 2014; Glynn et al., 2001). These adverse birth outcomes are in turn linked to a variety of long-run effects, which range from lower cognitive skills to higher risk of chronic diseases and beyond (Huxley et al., 2002; Cutler et al., 2006; Victora et al., 2008). While there is some evidence that children are able to catch up over the first years of life in terms of height and weight (Claas et al., 2011), the medical literature has also established that this accelerated infancy growth can have adverse effects resulting in obesity or diabetes in childhood and in later life (Cole, 2004; Ong, 2007).

Consistent with the medical literature, I find evidence that stress is a major determinant of adverse health outcomes of children in the aftermath of a violent event. First, violent incidents in the first trimester drive the main result. Second, throughout all estimations I restrict attention to incidents that have arguably little effect on the local infrastructure, i.e., violent incidents with only one casualty. I also show that results hold when I only include events where civilians were killed by a direct gunshot, rather than bombings, explosions or airstrikes. Third, I provide evidence that the results are driven by religiously motivated killings that specifically target the civilian population, rather than exposure to confrontations between coalition and insurgent forces.

I perform several robustness checks to confirm the validity of these findings. While the mother fixed effects account for time-invariant unobserved heterogeneity, there may still be selection into fertility based on levels of violence. This can bias the results if violence is serially correlated and mothers dynamically adjust fertility. For this reason, I include the number of incidents in the three months before conception and the results hold. I also relax the assumption that children are carried to full term, in order to reduce measurement error around the conception cutoff. The results hold when I assume a gestational period of 30 instead of 40 weeks for all children. I also verify that selective out-migration is not a main driver of my results. First, if households move between births due to a violent incident, I consider them as not treated. If these children have worse health outcomes due to in utero exposure to violence, this would downward bias my findings. In addition, almost all districts experience a violent incident in the observation period and violent episodes (and their location) were difficult to predict. It is therefore unlikely that households move based on (low-casualty) incidents. I verify this claim by looking at another survey of Iraqi households that contains detailed information on the migration history of 175,000 respondents and their reasons for moving. In addition, I verify that the results are robust to changes in the assumptions about the duration of pregnancy, the definition of exposure to violence, alternative outcome measures, and changes to the sample composition.

This paper contributes to the literature on the so-called “fetal origins” hypothesis (FOH). The FOH in economics has been analyzed in six major contexts: nutritional shocks, infectious diseases, exposure to pollution, weather and climate change, use of alcohol and tobacco, as well as maternal stress. Several papers look at mild nutritional shocks in utero induced by pregnancy during Ramadan and show that they have a substantial effect on various outcomes, such as income, educational attainment or mental illness (Majid, 2015; Van Ewijk, 2011; Almond and Mazumder, 2011). Similarly, other papers investigate the role of environmental factors, such as hurricanes, radioactive fallout or pollution, as well as substance abuse during pregnancy in determining health and long-run outcomes of children (Beach et al., 2016; Isen et al., 2017; Almond et al., 2009; Shah and Steinberg, 2017; Ngo and Horton, 2016; Andalón et al., 2016; Dercon and Porter, 2014; Banerjee et al., 2010; Adhvaryu et al., 2014; Nilsson, 2017; Reader, 2023). In addition, some researchers have highlighted the effect of economic conditions during pregnancy on outcomes of children (Dehejia and Lleras-Muney, 2004; Willage and Willén, 2022; De Cao et al., 2022; Reader, 2023).

There is a prominent sub-strand of this literature that tries to isolate the effect of stress. While it is well established in the medical literature that stress and the associated bio-chemical response are harmful to children, it is difficult to disentangle stress from other components in larger population studies. There are no large-scale longitudinal data sets that measure corticotropin-releasing hormone (CRH) or cortisol levels in mothers (Footnote 1). Economists therefore resort to reduced-form estimations that use an exogenous event that plausibly affects stress in mothers, such as potential paths of major hurricanes, important sports events or family ruptures (Currie and Rossin-Slater, 2013; Duncan et al., 2016; Persson and Rossin-Slater, 2018).

In the context of conflict and violence, it is difficult to disentangle the effect of stress from other mechanisms. The literature on in utero exposure to conflict and child health outcomes has studied two main contexts: brief and extreme episodes of violence, such as the attacks on the World Trade Center in September 2001, landmine explosions or the al-Aqsa Intifada (Brown, 2015; Mansour and Rees, 2012; Eccleston, 2011; Camacho, 2008), and systematic “day-to-day” crime-related exposure to violence in Latin and South America (Nasir et al., 2016; Koppensteiner and Manacorda, 2016; Brown, 2015; Torche and Villarreal, 2014). On the one hand, large-scale incidents of violence are difficult to disentangle from other variables that may simultaneously influence child health outcomes, such as deterioration of health infrastructure or behavioral responses of parents. On the other hand, it is hard to argue that regular, crime-related violence is exogenous or unpredictable. Therefore, a major concern in these studies is the selection of households along unobservable characteristics.

Mansour and Rees (2012) is the study that is most closely related to this analysis. The authors draw from the 2004 Palestinian Demographic and Health Survey, which was conducted approximately 4 years after the start of the al-Aqsa Intifada, and find that an additional conflict-related fatality 9 to 6 months before birth is associated with a modest increase in the probability of a child weighing less than 2500 g. The authors have to rely on self-reported measures of health for children. In contrast, I can draw on indicators of child health that were measured by enumerators at the time of the interview and thus do not suffer from reporting or recall bias. In addition, the authors rely on a sample of 244 mothers in Western Gaza, while I can — for the sample of siblings — leverage information on over 12,000 mothers and 25,000 children. While the authors can account for unobserved heterogeneity at the level of the mother, they acknowledge that the al-Aqsa Intifada came with a lot of restrictions and curfews, potentially restricting access to health care. I can improve on isolating the effect of stress from access to health care by looking at events of the same severity but arguably different levels of stress for mothers. These differences may explain why my estimates substantially deviate from theirs in terms of magnitude.

2 Data and the Iraqi context

2.1 The US invasion of Iraq

In 2003, the United States, in conjunction with a coalition of international allies, embarked on a military campaign to oust the regime of Saddam Hussein in Iraq. The decision to invade was made by President George W. Bush and his administration, who posited that Iraq, under the leadership of Saddam Hussein, posed a threat to national security. As the United States and its allies sought to gain support for the war, the administration presented evidence that Iraq was in possession of weapons of mass destruction (WMDs) and had ties to terrorist groups, including Al-Qaida. The Bush administration also claimed that the war was necessary to spread democracy and human rights in Iraq. In retrospect, the evidence presented by the Bush administration to justify the invasion of Iraq has been widely criticized and discredited (Betts, 2007). Despite opposition from several countries such as France and Germany, the coalition launched the invasion on March 20th, 2003, with the stated objective of removing the regime of Saddam Hussein, eliminating the supposed WMDs and ties to terrorist groups and bringing democracy to Iraq.

The coalition, which included the United Kingdom, Australia, Poland, and several other countries, quickly secured control of key cities and infrastructure in Iraq. The initial invasion was marked by the swift collapse of the Iraqi military and the fall of the Hussein regime, culminating in the capture of Baghdad on April 9, 2003. However, the post-invasion period was plagued by a prolonged and violent insurgency. The insurgency was mainly composed of various sectarian and ethnic groups that opposed the foreign presence, the new political order and the lack of services and security (Hashim, 2011). Throughout the war, coalition forces struggled to identify insurgents and gain the support of the local population (Cockburn, 2007). Continuous confrontations and acts of violence between coalition forces, insurgents, and sectarian groups have caused immense damage in the form of civilian casualties. The Iraq Body Count Project (IBC) has recorded a total of 200,000 civilian non-combatant casualties since 2003.

Aside from its devastating effects on Iraq’s economy, the continuous exposure to conflict has also taken a toll on the mental health of the Iraqi population. According to the 2007 Iraq Mental Health Survey, 16.5% of the 32,000 respondents reported symptoms of severe mental disorder; for women this number is even higher at about 20% (Footnote 2). About 5% of women in the Iraq Mental Health Survey report symptoms of post-traumatic stress disorder (PTSD). Health professionals propose that among all mental disorders, PTSD is one of the main channels through which parents transmit adverse effects to their children during wartime (Devakumar et al., 2014; Murthy and Lakshminarayana, 2006).

2.2 Data sources and descriptive statistics

Multiple Indicator Cluster Survey. In collaboration with the local government, UNICEF assists countries in collecting and analyzing data on the situation of children and women through its international household survey initiative, called the Multiple Indicator Cluster Surveys (MICS). The cross-sectional survey in 2011 is the first to be representative at the district level in Iraq (Footnote 3). The survey collects detailed information on living conditions, mothers’ birth history, and very detailed health outcomes for children under the age of 5, sampling 35,580 households, 56,445 women (age 15–49 years) and 36,599 small children (0–4 years).

Information on infant health outcomes includes classical anthropometric indicators such as the height-for-age z-score (HAZ) and the weight-for-age z-score (WAZ). These are not self-reported but are measured by the interviewer. The main index is based on the National Center for Health Statistics (NCHS) malnutrition and stunting scale and expresses the distance between an individual child’s height/weight and the average height/weight of comparable children in the reference population, taking into account the dispersion of the distribution. Additionally, child development indicators are collected in the form of questions on motor skills (child is able to pick up a small object with 2 fingers), cognitive skills (child identifies at least ten letters of the alphabet, reads at least four simple, popular words, knows name and recognizes symbols of all numbers from 1 to 10), and behavior (child follows simple directions, is able to do something independently, gets along well with other children, kicks, bites or hits other children or adults, gets distracted easily, is sometimes too sick to play). These more complex health indicators were not collected in all households but for a sub-sample of about 13,000 children (out of 36,000 children). The randomization was made at the household level such that all children under the age of 5 in the randomly selected household were tested.
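As a simplified sketch of how such a z-score is constructed (the notation is introduced here for illustration and is not taken from the survey documentation; the reference tables may define the dispersion term in a more involved way), the score for child \(i\) with measurement \(x_{i}\), age \(a_{i}\) and sex \(s_{i}\) can be written as

$$\begin{aligned} z_{i} = \frac{x_{i} - \mu ^{ref}(a_{i}, s_{i})}{\sigma ^{ref}(a_{i}, s_{i})} \end{aligned}$$

where \(\mu ^{ref}\) and \(\sigma ^{ref}\) denote the average and the standard deviation of height or weight among comparable children in the reference population. A value of \(z_{i} = -1\) then means that the child lies one reference standard deviation below the average of comparable children.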

Table 1 Summary statistics — 2011 Multiple Indicator Cluster Survey Iraq

Table 1 reports the various characteristics at the household, parent and child level for all households in the MICS data set that have at least one child under the age of 5 (which I call the “full sample”) and the sample of households with at least 2 children born in the observation period (i.e., the “sibling sample”). Approximately 20,000 out of the total 35,000 households have at least one child under the age of 5. About 12,000 mothers gave birth to multiple children during the relevant observation period between 2006 and 2009. Parents in the sibling sample are less educated, younger and less wealthy than their counterparts in the full sample. In total, 24,000 children under the age of 5 have a sibling who is also under the age of 5. Health outcomes for children who have siblings of similar age are generally worse: they tend to weigh less and be shorter. With regard to cognitive, behavioral and motor skills, children in both samples perform similarly well. In addition to information on children, there is a vast array of household and parents’ characteristics, including detailed information on household size, composition, and wealth, as well as information on parents’ education, labor market status, proxies for religiosity and a limited number of attitudinal questions.

Iraq Body Count Project. The Iraq Body Count Project (IBC) provides daily geo-coded information on the location of the attack, the perpetrators of the attack (coalition, insurgent, sectarian, unknown), and the type of attack (mortars, missiles, suicide attacks, snipers, improvised explosive devices (IEDs), rocket-propelled grenades (RPGs), car bombs, and small arms fire, usually with assault rifles). The data are gathered from media reports, hospital documentation, morgues and other sources. In cooperation with the IBC, Condra and Shapiro (2012) assign 19,961 violent incidents to districts for the period between 2003 and 2009, accounting for a total of 59,245 civilian deaths.

Fig. 1 Iraq Body Count Data by district: cumulative 2003–2009

Figure 1 depicts the cumulative number of violent incidents (panel a) and casualties involved (panel b) across all 118 districts between 2003 and 2009. Violent incidents cover the whole of Iraq, with the exception of Kurdish districts in the North-Eastern part of the country. Most of the violent incidents occur in more populated districts (the Southwestern part of Iraq is very sparsely populated), including the capital Baghdad. However, there is still substantial variation in exposure to violence across the entire Northwestern-Southeastern strip of the country.

This paper follows Condra and Shapiro (2012) in dividing killings into 4 categories: (i) insurgent killings of civilians that occur in the course of attacking Coalition or Iraqi government targets; this category explicitly excludes insurgent killings that are unrelated to attacks and are better classified as intimidation killings related to dynamics of the civil war, (ii) Coalition killings of civilians, (iii) sectarian killings, defined as those conducted by an organization representing an ethnic group and which did not occur in the context of attacks on Coalition or Iraqi forces, (iv) unknown killings, where a clear perpetrator could not be identified. This last category captures much of the violence associated with ethnic cleansing, reprisal killings, and the like, where claims of responsibility were rarely made and bodies were often simply dropped by the side of the road. While all casualties reported are civilian casualties, the casualties resulting from coalition-insurgent confrontation are considered “non-targeted collateral damage”, e.g., civilians dying in the crossfire. Sectarian violence, by contrast, consists of killings by a clearly identified militia targeted at the civilian population.

The IBC data include a descriptive section on the type of violent act perpetrated. Using keyword searches in the incident descriptions, I created 5 categories of attacks. These categories are not mutually exclusive since the same incident can contain multiple types of violence. I distinguish between suicide bombings, other bombs and explosions (typically so-called Improvised Explosive Devices), airstrikes and missiles (which are mostly long-distance weaponry or military planes), gunfire, and execution or torture. Table A.1 shows the breakdown of violent incidents by type and perpetrator. Since violent incidents can involve multiple perpetrators and combine different types of violence, the shares reported do not add to 100 percent. Almost 60% of incidents with civilian casualties involved gun violence. Almost one-quarter involved executions and torture, which were mostly perpetrated by sectarian agents, who are also the principal source of violent incidents. However, the majority of suicide bombings were caused by insurgent rather than sectarian agents.
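The keyword-based classification can be illustrated with a minimal Python sketch. The keyword lists, category labels and the function name `classify` are hypothetical, since the exact search terms are not reported; the sketch only shows how one description can fall into several categories at once.

```python
import re

# Hypothetical keyword lists (assumption): the actual search terms are not reported.
CATEGORY_KEYWORDS = {
    "suicide_bombing": ["suicide"],
    "other_bomb_or_explosion": ["ied", "roadside bomb", "car bomb", "explosion"],
    "airstrike_or_missile": ["airstrike", "air strike", "missile"],
    "gunfire": ["gunfire", "shot", "sniper", "small arms"],
    "execution_or_torture": ["executed", "execution", "torture"],
}


def classify(description: str) -> set:
    """Return every category whose keywords appear in the incident description.

    Categories are not mutually exclusive, so the result can hold several labels.
    Word boundaries prevent short keywords such as "ied" from matching "died".
    """
    text = description.lower()
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords)
    }


if __name__ == "__main__":
    print(classify("Roadside bomb explosion followed by gunfire"))
```

Returning a set rather than a single label mirrors the statement that shares across categories need not add to 100 percent.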

One concern is that the IBC data could be subject to reporting bias. On the one hand, reporters may avoid high-violence environments and not report stories from there. On the other hand, the opposite bias could be present if high-violence areas attract the most media attention. Condra and Shapiro (2012) address this concern using 2612 incidents for which a larger level of aggregation (the governorate) is known but the district is not. They analyze whether the proportion of non-attributable incidents at the governorate level correlates with levels of violence. The authors do not find a significant correlation that would suggest a systematic bias of non-attributable incidents. In this paper, I restrict attention to one-casualty incidents only, further alleviating concerns about differential reporting bias based on the severity of violence. I also perform another sanity check on the data, using an alternative data set, namely the unclassified data from the Multi-National Force-Iraq (MNFI) through daily Significant Activity Reports (SIGACTS) (Footnote 4). The data set covers all known attacks on Coalition forces, Iraqi Security Forces, the civilian population, and infrastructure between 2004 and 2008. The data do not capture the perpetrator of violence, and incidents are only recorded if they involve coalition forces in some way. In contrast to the IBC data, they also include incidents with no casualties. Comparing the yearly activities at the district level between the two data sets, there is a very high correlation of about 0.8, which is significant at the 1% level. Given the very limited data sources on district-level violent incidents in Iraq, I consider this one of the few possible sanity checks for the accuracy of the IBC data.
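A comparison of this kind can be sketched in a few lines of Python. The dictionaries below hold made-up district-year counts, and the helper names `aligned_counts` and `pearson` are introduced only for illustration; the sketch computes a plain Pearson correlation over the district-years present in both sources and does not reproduce the significance test.

```python
from math import sqrt


def aligned_counts(source_a: dict, source_b: dict):
    """Keep only (district, year) cells observed in both sources, in a fixed order."""
    keys = sorted(set(source_a) & set(source_b))
    return [source_a[k] for k in keys], [source_b[k] for k in keys]


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up yearly incident counts keyed by (district, year); not real data.
    ibc = {("A", 2006): 10, ("A", 2007): 20, ("B", 2006): 5}
    sigacts = {("A", 2006): 30, ("A", 2007): 50, ("B", 2006): 20, ("C", 2006): 7}
    xs, ys = aligned_counts(ibc, sigacts)
    print(pearson(xs, ys))
```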

Linking violent incidents to pregnancies. In order to link district-level data on violent incidents to child health outcomes, I use the exact birthdays of the children under the age of 5 from the MICS 2011 and calculate their period in utero backward. I drop observations where there is no information on the birthday of the children (Footnote 5). The earliest date of conception is five years and nine months before the time of the first MICS 2011 interview, i.e., September 2005. There will be some measurement error, particularly for the first trimester, as I cannot account for early births. Instead, I assume that all children are carried to full term. However, I also show that the results hold when I assume a shorter gestational period for all children in the sample. Once I have assigned in utero periods to all children in the MICS data set, I construct the final data set, which consists of the overlap period between all recorded incidents in the IBC data set and the earliest birth in the 2011 MICS data set (see Fig. 2). This leaves me with the relevant period of observation from 2006 to 2009, comparing siblings with a maximum age gap of 2.5 years.
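The backward calculation of the in utero period can be sketched in Python as follows. This is an illustrative sketch, not the code used for the analysis: the function names, the half-open date window and the split into three equal-length trimesters are assumptions. The 40-week default reflects the full-term assumption, and passing 30 weeks corresponds to the shorter gestational period used as a robustness check.

```python
from datetime import date, timedelta

GESTATION_WEEKS = 40  # full-term assumption; 30 weeks is the robustness variant


def in_utero_window(birth: date, gestation_weeks: int = GESTATION_WEEKS):
    """Return (assumed conception date, birth date), counting back from the birthday."""
    return birth - timedelta(weeks=gestation_weeks), birth


def trimester_windows(birth: date, gestation_weeks: int = GESTATION_WEEKS):
    """Split the in utero period into three roughly equal windows (an assumption)."""
    conception, _ = in_utero_window(birth, gestation_weeks)
    total_days = (birth - conception).days
    cuts = [conception + timedelta(days=round(total_days * k / 3)) for k in range(4)]
    # Each window is half-open: start date included, end date excluded.
    return [(cuts[k], cuts[k + 1]) for k in range(3)]


def exposed(birth: date, incident_dates, gestation_weeks: int = GESTATION_WEEKS) -> bool:
    """True if any incident in the child's district falls inside the in utero window."""
    conception, end = in_utero_window(birth, gestation_weeks)
    return any(conception <= d < end for d in incident_dates)


if __name__ == "__main__":
    # Hypothetical child and incident dates for the child's district of residence.
    birthday = date(2007, 6, 15)
    district_incidents = [date(2006, 10, 1), date(2008, 1, 5)]
    print(in_utero_window(birthday))
    print(exposed(birthday, district_incidents))
    print(exposed(birthday, district_incidents, gestation_weeks=30))
```

In the hypothetical example, the same child switches from exposed to not exposed when the assumed gestation is shortened, which is the kind of reclassification around the conception cutoff that the robustness check is meant to probe.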

Importantly, for the sake of this analysis, I exclude all events that have more than one casualty in order to limit the influence of large-scale attacks that may impact the regional health infrastructure or induce more substantial change in the socio-demographic composition or regional amenities. In a robustness check, I show that the effect is marginally larger when I include violent incidents with more casualties. I also treat violent episodes that last several days as one incident and take the start date of the violent episode. 17,500 of the 18,500 recorded events only last 1 day. The average duration of a violent episode is between 1 and 1.5 days.
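A minimal Python sketch of this sample restriction is given below. The `Episode` record and the function `to_incident_days` are hypothetical names, and the example records are made up; the sketch only illustrates dating each episode by its start day and keeping one-casualty events, with a parameter that can be raised to mimic the robustness check with more casualties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Episode:
    """One recorded violent episode (hypothetical record layout)."""
    district: str
    start: date
    end: date
    casualties: int


def to_incident_days(episodes, max_casualties: int = 1):
    """Return sorted (district, start date) pairs for the retained episodes.

    A multi-day episode counts as a single incident dated by its start day.
    """
    kept = [(e.district, e.start) for e in episodes if 1 <= e.casualties <= max_casualties]
    return sorted(kept)


if __name__ == "__main__":
    sample = [
        Episode("Karkh", date(2006, 7, 1), date(2006, 7, 30), 1098),  # dropped by default
        Episode("Mosul", date(2007, 3, 2), date(2007, 3, 3), 1),      # kept, dated 2 March
        Episode("Basra", date(2006, 1, 9), date(2006, 1, 9), 1),      # kept
    ]
    print(to_incident_days(sample))
```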

In Appendix Table A.3, I show the average duration and casualties associated with each incident by perpetrator, when I do not restrict the sample to one-casualty events. Half of the recorded violent incidents involve only one civilian casualty. Generally, casualty numbers are in the single digits: only 6% of incidents involve more than 10 casualties, and the sample includes 70 observations with more than 100 casualties. Events with very high civilian casualties occurred almost exclusively in Baghdad. The vast majority of incidents, more than 70%, involve sectarian forces. However, coalition attacks have been the deadliest, resulting, for instance, in a violent episode of 30 days (July 1 to 30, 2006) that took 1098 civilian lives in Baghdad.

3 Empirical strategy

3.1 Estimating equation

I leverage within-mother variation in exposure to violence during pregnancy, effectively comparing health gaps between pairs of siblings where one was exposed to violence in utero and the other one was not to those of sibling pairs where either both or neither were exposed to violence. My preferred specification reads as follows:

$$\begin{aligned} Y_{imd} = \beta _0 + \beta _1 violence_{d}\times in\;utero_{i} + \beta _2 X_{i} + YoB_{i} \times \theta _{d} + \gamma _{m} + \varepsilon _{imd} \end{aligned}$$

I observe health outcomes \(Y\) of child \(i\) born to mother \(m\) located in district \(d\) in 2011 and trace back their in utero exposure to violence in each of the trimesters. The treatment variable \(violence_{d}\times in\;utero_{i}\) is a dummy variable that switches on if any violent incident with one casualty was recorded at the district level (\(violence_{d}\)) when the child was still in utero (\(in\;utero_{i}\)). The vector of control variables \(X_i\) includes child-level characteristics, such as the gender of the child, whether he or she has a twin sibling, and their birth order (e.g., whether the child is first, second or third born etc.). I also include a year of birth dummy \(YoB_{i}\) interacted with district fixed effects \(\theta _{d}\), which capture child age-district specific time trends. Additionally, I include mother fixed effects \(\gamma _{m}\), absorbing any time-invariant unobserved heterogeneity at the level of the mother. The main outcomes of interest, weight and height, are measured by the enumerator and presented as z-scores. Other health outcomes, such as the ability to recognize letters or numbers, are based on survey questions (see Table 1 for the full list of non-anthropometric outcomes). Standard errors are clustered at the district level. The coefficient \(\beta _1\) can be interpreted as follows: a violent incident involving one casualty during pregnancy (or each trimester of pregnancy) changes the height or weight of a child by \(\beta _1\) standard deviations, so a negative estimate indicates a decrease.
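To make the role of the mother fixed effects concrete, the following Python sketch applies the within transformation to a stripped-down version of the specification: it demeans the treatment dummy and the outcome by mother and computes the resulting slope. It deliberately omits the controls \(X_i\), the year of birth by district fixed effects and the clustered standard errors, so it is an illustration of the identifying comparison rather than the estimator used in the paper; the function name and the data are made up.

```python
from collections import defaultdict


def within_mother_slope(rows) -> float:
    """OLS slope of outcome on treatment after demeaning both by mother.

    rows: iterable of (mother_id, treated, outcome) tuples, one per child.
    """
    by_mother = defaultdict(list)
    for mother_id, treated, outcome in rows:
        by_mother[mother_id].append((float(treated), float(outcome)))

    numerator = denominator = 0.0
    for children in by_mother.values():
        t_bar = sum(t for t, _ in children) / len(children)
        y_bar = sum(y for _, y in children) / len(children)
        for t, y in children:
            numerator += (t - t_bar) * (y - y_bar)
            denominator += (t - t_bar) ** 2

    if denominator == 0:
        raise ValueError("no within-mother variation in treatment")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up z-scores: mothers A and C have one exposed and one unexposed child.
    data = [
        ("A", 1, -0.5), ("A", 0, -0.3),
        ("B", 0, 0.4), ("B", 0, 0.5),
        ("C", 1, -1.0), ("C", 0, -0.9),
    ]
    print(within_mother_slope(data))
```

In this stripped-down version, mothers whose children share the same treatment status (mother B) contribute nothing to the slope, because their demeaned treatment is zero; level differences across mothers are removed by the demeaning.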

Fig. 2 Linking violent incidents to pregnancies: cross-sectional MICS survey 2011 and Iraq Body Count data 2003–2009

It is important to note that violence varies at the district and day level. The treatment therefore affects everyone who lives in the district at a certain point in time. The identifying variation is at the child level and comes from exposure to violence while the child is in utero. This is why it is possible to include mother fixed effects although outcomes are only observed once in 2011 and mothers are uniformly treated within a district. In order to further illustrate the identifying variation, in Appendix Table A.4, I show the probability of treatment for the sample of mothers that have more than one child under the age of 5 at the time of the interview. About 5900 mothers have experienced a violent incident during pregnancy, either for some or all of their children (see second and third row). The overall probability of experiencing a violent incident during pregnancy therefore lies at around 60%. As mentioned above, I compare health gaps of siblings in the second row with health gaps of siblings in the first and last row (Footnote 6).
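The grouping of mothers by treatment pattern can be illustrated with a short Python sketch. The labels "none", "some" and "all" and the function name are introduced here for illustration, and the example data are made up; the sketch does not reproduce the counts in Appendix Table A.4.

```python
from collections import Counter, defaultdict


def exposure_pattern(children) -> Counter:
    """Count mothers by whether none, some, or all of their children were exposed.

    children: iterable of (mother_id, exposed_in_utero) pairs, one per child.
    """
    flags_by_mother = defaultdict(list)
    for mother_id, was_exposed in children:
        flags_by_mother[mother_id].append(bool(was_exposed))

    counts = Counter()
    for flags in flags_by_mother.values():
        if all(flags):
            counts["all"] += 1
        elif any(flags):
            counts["some"] += 1  # mixed sibling pairs: one exposed, one not
        else:
            counts["none"] += 1
    return counts


if __name__ == "__main__":
    sample = [("A", 1), ("A", 0), ("B", 0), ("B", 0), ("C", 1), ("C", 1), ("D", 0), ("D", 1)]
    print(exposure_pattern(sample))
```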

It is also worth noting that all children in the data set were exposed to a violent incident at some point after they have been born (except for a few districts in the Kurdish region that have not experienced any violence during the observation period). All of these events will naturally have an impact on child health a few years later. In this context, I only test whether an additional incident during pregnancy worsens health outcomes. Ex ante it is not obvious whether we should observe an effect at all. On the one hand, violence may be “normalized” in these contexts and mothers may respond less to an incident, as compared to rare events which are typically analyzed in the literature (Currie and Rossin-Slater, 2013; Brown, 2015; Duncan et al., 2016; Persson and Rossin-Slater, 2018). On the other hand, there is evidence that even small events in early stages of pregnancy can have significant effects in the long-run and that exogenous shocks are easier to mitigate in later stages of pregnancy and early childhood (Camacho, 2008; Duncan et al., 2016).

3.2 Threats to identification

A causal interpretation of \(\beta _1\) requires that there are no time-varying unobserved characteristics at the level of the mother that determine both the likelihood of being exposed to violence during pregnancy and health outcomes of the child they are carrying. In essence, the identifying assumption relies on the exogeneity of the timing of exposure to violence.

Plausibility of treatment exogeneity. In Fig. 3, I present a plausibility check for the assumption that the timing of exposure to violence is quasi-random. Specifically, I predict the probability of treatment (in utero exposure to a violent incident with one casualty) with an array of mother and household-level characteristics. I include district fixed effects and standardize coefficients for comparability. Reassuringly, almost all variables are not systematically correlated with the treatment, and the coefficients on those that are correlated are small in magnitude. Importantly, the mother fixed effects capture all characteristics presented in Fig. 3 and others that could be related to selection into treatment and child health outcomes, such as risk aversion, religiosity or attitudes towards contraception. Any differences in district characteristics in the year the child was born are captured by the district-year of birth fixed effects, including political stability, ethnic polarization and quality of health infrastructure. In addition, any concerns related to time-varying unobserved heterogeneity at the level of the mother would have to point to variation in a time span of 2.5 years since this is the largest age gap observed between siblings in my data set. More generally, the coefficient \(\beta _1\) is biased if mothers either (i) anticipate small-scale violent incidents or (ii) adjust their behavior differentially ex post.

Fig. 3

Plausibility of exogenous exposure to violence in utero. Note: I estimate the following regression: \( violence_{d}\times in\;utero_{i} = \gamma X_{mh}+ \theta _{d} + \epsilon _{imhd} \). I regress the treatment variable \( violence_{d}\times in\;utero_{i} \), i.e., a dummy variable that switches on if child i living in household h born to mother m has been exposed to a violent incident with one casualty in utero, on a set of mother- and household-level characteristics \( X_{mh}\). I include district fixed effects \(\theta _{d}\) and cluster standard errors at the district level. All coefficients are standardized for comparability.

Next, I conduct a placebo test where I compare children’s health outcomes across mothers. More specifically, I assign treatment status to children that have not been treated in reality but are the sibling of a child that was treated in utero. I run the baseline regression but remove mother fixed effects since now all children of a mother belong to the treatment group and the fixed effect would absorb the treatment status of children. If mothers select into treatment based on their children’s health outcomes, I should detect these differences with this exercise. Of course, there are two caveats. First, even in the presence of quasi-random exposure in the timing of violence, I may still find differences in health outcomes if there is negative selection of mothers along unobservables (these would be captured by the mother fixed effects). Second, intrauterine exposure to violence of one child may have spillover effects on other children in the household even if they have not been treated themselves. However, if I do not detect any large and systematic differences between placebo and control children, then this is reassuring. Columns 1 and 3 of Appendix Table A.5 compare the health outcomes of children of mothers where at least one child has been treated to those of children of mothers without any treated child. In columns 2 and 4, I drop children that were actually treated from the sample, effectively comparing placebo treated children with control children. While the coefficients are negative throughout (potentially because of the aforementioned caveats), I find no large or statistically significant differences in health outcomes across placebo and control children.
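The assignment logic of the placebo exercise can be sketched as follows. This is a minimal illustration only, not the code used for Appendix Table A.5; the record layout and the key names `mother_id`, `treated` and `placebo_group` are hypothetical.

```python
def assign_placebo(children):
    """Flag every child whose mother has at least one child treated in utero.

    `children` is a list of dicts with the hypothetical keys
    'mother_id' and 'treated' (True if exposed in utero).
    """
    treated_mothers = {c["mother_id"] for c in children if c["treated"]}
    # All children of such a mother are assigned to the placebo group,
    # which is why mother fixed effects can no longer be included.
    return [{**c, "placebo_group": c["mother_id"] in treated_mothers}
            for c in children]


def placebo_vs_control(children):
    """Drop actually treated children, leaving placebo and control children."""
    return [c for c in assign_placebo(children) if not c["treated"]]
```

The first function corresponds to the comparison in columns 1 and 3, the second to the restricted comparison in columns 2 and 4.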

Selective fertility First, I only observe pregnancies that resulted in a live birth. It is possible that violent incidents and stress increase the likelihood of miscarriage (László et al., 2014). Consequently, women that have been exposed to violence and experienced a miscarriage will not appear in the treatment group although they are the ones most severely affected by it. Therefore, the average treatment effect may underestimate the true impact of exposure to violence during pregnancy and only captures the adverse consequences conditional on having a live birth.

I examine the correlation between exposure to violence and selective fertility in Appendix Table A.6. Specifically, I run a mother-level regression, estimating the relationship between the cumulative number of violent incidents (and casualties) between 2003 and 2009 and the likelihood of having experienced the death of an infant or a miscarriage in the past. I include governorate fixed effects (N=18) and an array of district-, household- and mother-level controls (footnote 7). Overall, the coefficients exhibit a positive sign but are small in magnitude and statistically insignificant, suggesting that there are no large differences in miscarriages across districts with different levels of violence. I repeat this exercise using the likelihood of seeking prenatal care as the outcome of interest. Again, there is no systematic difference in accessing prenatal care across districts with higher cumulative exposure to violence over the observation period. However, the mother-level analysis does not allow me to exploit the quasi-exogenous timing of conception or include a more saturated fixed effect structure, such that the estimates likely suffer from bias due to selection into violent districts, recall bias and measurement error in effective exposure to violence due to lack of information on the timing of death and miscarriage.

In addition, it is possible that even low-casualty incidents are serially correlated and mothers dynamically adjust their fertility in response to episodes of violence. For instance, it is possible that wealthier mothers with access to contraceptives can adjust fertility while others cannot. If there is negative selection into fertility that is time-varying at the level of the mother, I would falsely attribute the adverse health outcomes to violent incidents rather than time-varying unobserved heterogeneity of mothers. In order to address this concern, I include a dummy variable that captures any violent incident in the three months before conception in one specification.

Moreover, exposure to violence might increase the likelihood of early births. Overall, the pre-term birth rate (defined as births \(\le \) 37 weeks per 100 live births by the World Health Organization [WHO]) of 9.1% in Iraq today is comparatively low and is even below the 10% rate in the United States (Perin et al., 2023). The outcomes HAZ and WAZ will capture this effect since children that are born before full term will be smaller and need time to catch up with their birth cohort (Claas et al., 2011). Nevertheless, and as mentioned above, pre-term births will introduce some measurement error around the conception date. For this reason, I construct a data set that assumes a gestational period of 30 instead of 40 weeks for all children (footnote 8). While this approach may underestimate the true exposure to violence in the most important period, the first trimester, it will offer a very conservative estimate for in utero exposure to violence and child health.
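To make the consequence of the shorter gestational window concrete, the following sketch assigns an incident to a trimester given a child's date of birth and an assumed gestation length. It is an illustration under simplifying assumptions (three equal-length trimesters, conception backed out from the birth date), and the function name and arguments are hypothetical rather than taken from the paper.

```python
from datetime import date, timedelta


def exposure_trimester(birth, incident, gestation_weeks=40):
    """Return 1, 2 or 3 for the trimester in which the incident falls,
    or None if the incident lies outside the assumed pregnancy window."""
    conception = birth - timedelta(weeks=gestation_weeks)
    if not (conception <= incident < birth):
        return None
    days_in = (incident - conception).days
    trimester_days = gestation_weeks * 7 / 3  # 70 days when gestation is 30 weeks
    return min(3, int(days_in // trimester_days) + 1)


birth = date(2008, 1, 1)
early_incident = date(2007, 4, 1)
print(exposure_trimester(birth, early_incident, 40))  # first trimester under 40 weeks
print(exposure_trimester(birth, early_incident, 30))  # outside the 30-week window
```

An incident shortly after the assumed 40-week conception date counts as first-trimester exposure, but falls before conception once gestation is shortened to 30 weeks, which is why some exposed children are classified as untreated under the conservative assumption.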

Selective migration As mentioned previously, the mother fixed effects capture a substantial share of the unobserved heterogeneity associated with both exposure to violence and child health. However, mothers may choose to move in response to exposure to violence. Since I have no information on the migration history and timing of migration (or any other proxy, such as district of birth of mother or father), I cannot control for this problem directly. If there is selective emigration in response to violence, I would classify mothers who were exposed and subsequently moved away as not treated. Consequently, I may underestimate the true effect of exposure to violence in utero.

In addition, I focus on low-casualty incidents in a highly violent environment. The decision to emigrate would have to be based on these relatively small-scale events. In order to assess the plausibility of this claim, I use a different data source to verify whether internal migration was a major determinant in the demographic composition of districts. The Iraq Household Socio-Economic Survey (IHSES) is a large-scale survey conducted by the Iraqi Central Office for Statistics and supported by the World Bank, that covers over 25,000 Iraqi households and over 175,000 individuals in the year 2012. The data set contains detailed information on the migration history of respondents. Approximately 18% of respondents have migrated at some point in their lives. Appendix Table A.7 describes the different reasons for migration for the sub-sample of approximately 1600 respondents that moved during the observation period between 2006 and 2009. Approximately 6% of the individuals that migrated during this period have done so for reasons of civil conflict, armed conflict, security reasons, or overall forced displacement. Overall, they make up less than 0.1% of the full sample, alleviating concerns that migration is a main driver in the composition of respondents in the MICS. Lastly, in a robustness exercise, I exclude districts that have likely experienced large compositional changes due to violence. I repeat the analysis without the capital Baghdad, districts in the most peaceful region of Kurdistan and districts in the top decile of violence.
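As a back-of-the-envelope check of the migration shares quoted above, using the rounded figures from the text (so the result is approximate):

\[
0.06 \times 1600 \approx 96 \text{ respondents}, \qquad \frac{96}{175{,}000} \approx 0.055\% < 0.1\%.
\]

Because the IHSES covers more than 175,000 individuals, the true share is, if anything, slightly smaller than this figure.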

4 Empirical results and mechanisms

4.1 Main results

Anthropometric outcomes I present the main results in Table 2. I consider the height-for-age z-score (HAZ) and the weight-for-age z-score (WAZ) in columns 1 to 3 and 4 to 6, respectively. As previewed above, all specifications include child-level controls, mother fixed effects as well as year of birth interacted with district fixed effects (I show in Appendix Table A.8 the evolution of the coefficient of interest when successively introducing controls and fixed effects). The average height and weight as well as the standard deviation of children by age are reported at the bottom.

In columns 1 and 4, I first include a dummy variable for any low-casualty violent incident during pregnancy and find that in utero exposure to violence significantly reduces the height and weight scores of children. The estimates suggest that one incident decreases the HAZ by 0.135 standard deviations and the WAZ by 0.143 standard deviations, which translates into 2.7 cm and 370 grams, respectively, for a 1-year-old child (footnote 9).

In columns 2 and 5, I present the results for assigning violent incidents to trimesters of pregnancy. The medical literature proposes that the production of the stress-induced hormone CRH has the most prominent effect on birth outcomes during the first trimester of pregnancy, i.e., children are most sensitive to maternal stress in early stages of pregnancy (Glynn et al., 2001). The estimates confirm this finding as most of the adverse effect of conflict on height and weight comes from events that occurred during the first trimester. The effect is significantly larger and more precisely estimated when I consider the timing of the violent incident. As mentioned in the previous section, I will have some measurement error around the exact timing of pregnancy since I cannot identify whether the child was born pre-maturely. This is why I may observe — although imprecisely estimated — a positive effect in the last trimester.

In columns 3 and 6, I include a dummy variable for any violent incident in the three months prior to the conception date in order to account for differential selection into fertility. As anticipated above, I find evidence in line with negative selection into fertility. Children of mothers who were exposed to violence before pregnancy and still became pregnant generally exhibit worse health outcomes both in terms of height and weight. But even conditional on pre-pregnancy levels of violence, the coefficients (albeit marginally smaller in magnitude) are robust and more precisely estimated.

Table 2 Violent incident during pregnancy deteriorates child health

Lastly, in columns 4 and 8, I account for the possibility of early births by assuming a shorter gestational period for all children. While the average number of pre-mature births is comparatively low in Iraq, it is well established in the medical literature that stress during pregnancy increases the likelihood of giving birth before 37 weeks (Kramer et al., 2009). The outcomes capture pre-term births in the form of lower height and weight scores. However, the estimation will introduce some measurement error around the conception date, which may lead to an over-estimation in the presence of selective fertility due to previous levels of violence. Therefore, I make the more conservative assumption that all children were born pre-maturely and use one-casualty incidents in each 10-week trimester of the short gestational period of 30 weeks as the main treatment variable. As expected, the coefficients are smaller since some treated children will not be considered as treated but are nevertheless positive and highly significant. In addition, the positive effect of violence on HAZ in the last trimester becomes small and statistically insignificant, confirming that this approach reduces measurement error around the conception cutoff.

Mitigating factors In a next step, I investigate potential mitigating factors. In Table 3, I interact the treatment with (i) the age of the child, (ii) the age of the mother and (iii) the gender of the child to explore heterogeneous effects in exposure to violence. In contrast to previous studies that measure anthropometric outcomes at birth, I am able to trace HAZ and WAZ for children throughout their first four years of life. This allows me to analyze whether the adverse effects of exposure to violence in utero can be mitigated over time. In columns 1 and 4 of Table 3, I interact violence with the age (in months) of children and find that over time children are able to catch up with their siblings in terms of health outcomes. The coefficient implies that children need about 3 years to converge to their siblings. However, as noted above, the largest age gap in the sample is 2.5 years. Therefore, I do not observe full convergence and it is possible that some developmental gaps persist over time. These estimates are broadly in line with those in the medical literature by Claas et al. (2011), who find that children born small for their gestational age need between 3.5 and 5.5 years to catch up with comparable children.
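To see how a catch-up horizon follows from an interaction specification of this kind, consider a stylized version in which the effect of in utero exposure on the z-score varies linearly with the child's age in months. The symbols \(\beta _1\) (effect at age zero), \(\beta _3\) (coefficient on the interaction with age) and \(a^{*}\) (age at which the gap closes) are notation assumed here for illustration and need not match the labels in Table 3:

\[
\frac{\partial \, HAZ}{\partial \, violence} = \beta _1 + \beta _3 \cdot age
\quad \Longrightarrow \quad
a^{*} = -\frac{\beta _1}{\beta _3}.
\]

With \(\beta _1 < 0\) and \(\beta _3 > 0\), the gap shrinks with age and closes at \(a^{*}\). A horizon of about 3 years corresponds to \(a^{*} \approx 36\) months, which lies beyond the maximum sibling age gap of 2.5 years (30 months) in the sample.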

In a next step, I check whether the age of the mother plays a significant role in mitigating the adverse effects. On the one hand, older mothers may be more sensitive to adverse effects of stress since the likelihood of a risky pregnancy increases with age, even in the absence of exposure to violence. On the other hand, older mothers may also have more resources (both financial and social) to mitigate the adverse effects. Since more direct measures of resources are endogenous to exposure to violence (i.e., household income may itself be a consequence of violence) and therefore bad controls, I resort to age as a pre-determined proxy for both resources and risks. In columns 2 and 5, I do not find a significant relationship between the age of the mother and her ability to mitigate the consequences of violence, potentially due to the aforementioned countervailing forces.

Table 3 Mitigating factors: children catch up over time
Table 4 Violent incident during pregnancy and illness, cognitive skills and behavior

Lastly, in columns 3 and 6, I analyze whether the gender of the child exacerbates the adverse effects of exposure to violence. It is possible that parents allocate resources differentially across gender and therefore contribute to a widening health gap among siblings. The coefficient of the interaction term is small in magnitude and negative but noisily estimated, indicating that there are no large differences in the effect of a low-casualty violent incident between boys and girls.

Illness, cognitive skills and behavior The effects of in utero exposure to violence can also extend to cognitive, motor, and behavioral skills, as well as vulnerability of the immune system. Some of these outcomes are only available for a small but randomly chosen subset of households and are based on survey questions rather than direct measures taken by the enumerator (I list the questions and their coverage in Appendix Table A.9). In columns 1 and 2 of Table 4, I look at the likelihood that a child was sick with a cough or diarrhea in the last 2 weeks. This question is available for all households and is less likely to suffer from recall or reporting bias. In column 1, I find that children that were exposed to a violent incident in utero are more likely to have recently suffered from a cough. There is no differential effect on the likelihood of diarrhea in column 2. This is in line with the medical literature, which posits that prenatal conditions have a more direct and significant impact on the likelihood of a child experiencing cough due to their influence on lung development and immune function, compared to diarrhea, which is more influenced by postnatal environmental and dietary factors (Genser et al., 2006; Vieira, 2015; Muglia et al., 2022). Specifically, research indicates that prenatal stress can affect lung maturation, growth, and function through neuroendocrine, immune/inflammatory, and vascular pathways. This can lead to impaired lung function and immunity in the child often expressed as suffering from a cough (van de Loo et al., 2016; Bush et al., 2021).

Next, I look at the effect on cognitive skills and behavior of children for a small subset of approximately 2200 children and 1100 mothers. Cognitive skills include, for instance, the ability to recognize letters or numbers. Behavioral outcomes include, for instance, whether the child is able to follow simple directions or gets along well with other children. Appendix Table A.10 shows the correlation between these variables and HAZ and WAZ of children. Overall the correlation is very low (no higher than 0.05) and is not significant in many cases, indicating that either cognitive skills and behavior are largely orthogonal to anthropometric measures or that there is measurement error or reporting bias on the side of mothers. I create two indicators of cognitive skills and behavior based on the first principal component of multiple questions listed in Appendix Table A.9 and report the results in columns 3 and 4, respectively. As mentioned above, the sample size is reduced by 90%. Nevertheless, I find a negative effect, albeit noisily estimated, on the cognitive skills and behavior of children. I observe the same overall patterns when expanding the definition of the treatment to account for the intensity of violence (i.e., the number of violent incidents during pregnancy) and severity of violence (i.e., including large-scale events). The coefficients reported in Appendix Table A.11 are more precisely estimated and the effect of the intensity of violence on cognitive skills of children becomes significant at the 10% level, with a p-value of 0.057.

4.2 Stress mechanism

Perpetrator of violence: collateral casualties and directed violence I leverage information on the perpetrator of violence to argue that some events are likely to be more stressful than others, holding the severity (i.e., the number of casualties) constant. As outlined in Section 2, the IBC data allows me to distinguish between three perpetrators of violence: (i) insurgent killings of civilians that occur in the course of attacking coalition or Iraqi government targets, (ii) coalition killings of civilians by US or allied forces and (iii) casualties caused by sectarian militia that did not happen during combat with coalition forces, i.e., these attacks targeted the civilian population directly and are arguably more stressful. There may be concerns that sectarian violence is just a proxy for violence of extreme intensity, e.g., long duration and many casualties. While this concern would already be remedied by the restriction to one-casualty events, the correlation coefficients in Appendix Table A.12 confirm that this concern is not valid (footnote 10). In fact, there is no significant correlation between the duration of a violent incident and the perpetrator, and the number of casualties is even negatively correlated with incidents caused by sectarian militia.

Table 5 Stress mechanism: perpetrator of violence
Table 6 Stress mechanism: type of violence

I test this hypothesis in Table 5 and distinguish between the three perpetrators in contexts where one unique perpetrator could be assigned to the death of one civilian. I run separate regressions for each perpetrator, dropping from the sample sibling pairs that have been exposed to violence by the other perpetrators. Holding the control group fixed allows me to compare coefficients across columns. I compare health outcomes between siblings with prenatal exposure to violence perpetrated by sectarian (columns 1 and 4), insurgent (columns 2 and 5) or coalition (columns 3 and 6) forces to those without any exposure to violence. While all violent incidents exhibit a negative sign, the effect is largest and most precisely estimated for sectarian violence. This is in line with the idea that sectarian violence is more stressful for mothers.

Notably, the effect of coalition violence is close to zero. This does not negate the adverse effects of coalition violence on child health overall but indicates that they are less likely to operate through the stress mechanism. This suggests that the effects of in utero exposure to violence on child health can be heterogeneous and, in addition to their effect on local infrastructure, also depend on their psycho-social impact on the civilian population.

Types of violence: stress versus damages to infrastructure Next, I look at different types of violence, distinguishing between direct gunshots, bombings, airstrikes and missiles, suicide bombings and torture and executions in Table 6. Again, I drop from the sample mothers that have been exposed to other types of violence, such that the comparison is between children that have been exposed to a specific type of violence and those that were not exposed to violence. The categories are not mutually exclusive since, for instance, executions of civilians often involve a direct gunshot, and bombings can include suicide bombings but also other types of explosions. As shown in Appendix Table A.1, the type of violence is closely linked to the perpetrator of violence. Sectarian violence is largely associated with execution and torture as well as gunfire. Insurgent agents typically resort to suicide bombings. Airstrikes and missiles typically involve coalition forces.

In a first step, I compare types of violence that impact access to prenatal care to varying degrees. For instance, it is unlikely that the killing of one civilian with a direct gunshot causes damage to the infrastructure or triggers curfews, as those are usually initiated after bombs or explosions or more large-scale events. In columns 1 and 6 of Table 6, I restrict the sample of incidents to those that came from direct gunfire, again excluding from the control group mothers that have been exposed to other types of violence during pregnancy. Despite this limitation, I detect an adverse effect of direct gunshots on height and weight scores of children, albeit smaller than in the baseline regression. Conversely, bombings are likely to impact the health infrastructure more substantially. I test this in columns 2 and 7 and find, as expected, that the effect size is substantially larger, reflecting a compound effect of stress and access to prenatal care.

Next, in columns 3 and 4, as well as 8 and 9, I look at other events that may have a large effect on the infrastructure, including airstrikes, missiles and suicide bombings. Restricting the sample to these events reduces the number of treated mothers substantially since one-casualty suicide bombings and airstrikes are rare: the IBC records 125 suicide bombings with one casualty and 528 airstrikes and missiles with one casualty compared to approximately 6000 incidents involving gunfire. In addition, I expect to find noisier estimates for WAZ since the catch-up effect of children with low birth weight can overshoot and even result in obesity (Cole, 2004; Ong, 2007). Overall, I find that these events negatively impact the HAZ and WAZ of children, although as anticipated the estimates are noisier.

Lastly, I investigate another type of violence that is arguably particularly stressful. The majority of sectarian violence is directed at the civilian population and involves gruesome acts of torture and execution, typically involving a public display of violence (either through exposure to the victims of this violence or through propagation on social media). This violence is individualized and may not even involve the use of high impact weapons. I look at these incidents in columns 5 and 10 and find a large and negative effect on children’s HAZ and WAZ. The effect size is statistically indistinguishable from that of bombings, suggesting that in some cases the severity of stress may even exceed the compounding effect of access to prenatal care.

4.3 Additional results and robustness checks

Definition of violence So far, I have focused on events with only one casualty to rule out that the effect is driven by damage to the infrastructure. In Appendix Table A.14, I narrow and expand the definition of the severity of violence. First, I consider all violent incidents including large-scale events with many casualties and potentially large effects on the infrastructure. Including these events may exert two countervailing forces on the size of the coefficient: on the one hand, the coefficient may increase since the level of stress increases with the severity of the event and damage to the infrastructure may exacerbate the adverse effect on child health. On the other hand, these events may be more likely to trigger miscarriages or lead to internal migration of those that were most affected, so that the estimate understates the negative effects on child health. Including large-scale events in the sample in columns 1 and 3 produces slightly smaller coefficients that are statistically indistinguishable from those of the main estimation, potentially because both of the aforementioned forces are at play.

In columns 2 and 4, I consider a different measure for severity, namely the number of violent incidents during pregnancy. Any additional violent incident during each trimester of pregnancy does not deteriorate child health further. This is in line with the findings of Camacho (2008) who shows that violence has a large effect at the extensive but not intensive margin. Alternatively, this could also reflect the fact that there are relatively few high frequency episodes of violence in the observation period (the majority of these occurred right after the invasion in 2003).

In a next step, I focus on violence involving direct gunfire with a high death toll. These are large-scale events that might cause more stress while having arguably little impact on the infrastructure. Most of these events last 1 day (< 1%, i.e., 31 events last more than 1 day). Compared to the effect of low-casualty and low-impact violent incidents (reported in columns 1 and 6 of Table 6), the effect is about 45% larger. This is either because these events are more stressful for mothers or because these types of events come with curfews or other restrictions that may impact access to prenatal care.

Lastly, I use a more restrictive definition of low impact violent incidents. In columns 3 and 6, I only include violent incidents with one casualty that lasted 1 day and drop from the sample mothers that have been treated with multi-day events. The coefficients are statistically indistinguishable from the baseline estimates since the share of multi-day low-casualty incidents is rather low: only 124 incidents lasted more than 1 day with the vast majority lasting 2 days.

Alternative outcome measures I verify that my results are robust to changes in the measurement of the treatment and the outcomes. In Appendix Table A.15, I show that using height and weight of children as a percentage of the median in columns 1 and 2 does not alter the results. In columns 3 and 4, I also show that — albeit more noisily estimated — my results are largely robust to using the WHO definition of height- and weight-for-age z-scores. However, the WHO measures are less granular and the distribution is more condensed (scale reaching from −6 to 6, rather than −10 to 10) and may therefore not pick up small changes in height and weight.

Sample composition In a next step, I verify whether districts with very high or low levels of violence are driving the results. This exercise addresses two concerns. First, selection into or out of these districts may contaminate the control group. For instance, the most resourceful mothers are able to move to districts that are known to be very safe. If these children are generally healthier, then I may overestimate the true effect of being exposed to violence during pregnancy. Conversely, while low-casualty incidents may not be predictable, in districts with generally high levels of violence they may be correlated with large-scale incidents, so that I may again overestimate the effect. In Appendix Table A.16, I address these points by dropping the 10 districts of Baghdad, the capital, where many of the violent incidents occurred (columns 1 and 5), and by dropping all districts in the Kurdish autonomous region, which are among the safest in the country (columns 2 and 6). In columns 3 and 7, I also exclude all districts that fall into the top decile of violent incidents between 2006 and 2009. The results are robust to these changes.

Lastly, I repeat the main analysis but exclude sibling pairs that have both been exposed to violence. The interpretation of the coefficient changes from a comparison of sibling pairs where only one child was exposed to violence with sibling pairs where either both or none have been exposed to violence in utero to a comparison of sibling pairs where one has been exposed to violence with sibling pairs where none have been exposed to violence. The coefficient is robust to these changes and is statistically indistinguishable from the coefficients in the baseline regression.
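The sample restriction described above can be sketched as a simple filter. This is an illustrative reading of the exercise, not the author's code; the record layout and the key names `mother_id` and `treated` are hypothetical.

```python
from collections import defaultdict


def drop_fully_exposed_siblings(children):
    """Remove sibling groups in which every child was exposed in utero.

    What remains are groups with exactly mixed exposure (identifying
    variation) and groups with no exposure at all (controls).
    """
    by_mother = defaultdict(list)
    for child in children:
        by_mother[child["mother_id"]].append(child)

    kept = []
    for siblings in by_mother.values():
        all_exposed = len(siblings) > 1 and all(s["treated"] for s in siblings)
        if not all_exposed:
            kept.extend(siblings)
    return kept
```

After this filter, the comparison is between sibling groups with one exposed child and sibling groups without any exposed child.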

5 Conclusion

This paper provides novel evidence on the effect of in utero exposure to violence on child health in the medium run, focusing on stress as a possible mechanism through which adverse consequences of war are transmitted across generations. Detailed information on the type and severity of violence combined with micro-level data on child health allows me to overcome important empirical challenges in the literature. Specifically, I am able to address concerns about selection into fertility as well as selection into exposure to violence. In addition, I can follow children over the first years of life and can rely on anthropometric outcomes measured by enumerators and a broader set of health outcomes, including cognitive and behavioral skills. Lastly, this paper is the first to show that the type of violence matters for the health outcomes of children.

I show that one single violent incident during pregnancy significantly increases the risk of stuntedness and malnutrition. I also find suggestive evidence that in utero exposure to violence increases children’s likelihood of contracting illnesses, and deteriorates cognitive skills and behavioral outcomes. However, children are able to catch up — but potentially not fully converge — over time. Focusing on exposure to violence with arguably little effect on the infrastructure and access to prenatal care as well as comparing violent incidents that are more stressful for the civilian population allows me to improve on establishing stress as a plausible mechanism.

My results emphasize that an additional act of violence during pregnancy can still have a strong adverse effect on child health in an environment where violence has remained persistently high over the last 15 years. The paper also uncovers heterogeneous effects of different types of violence, showing that incidents with a low number of casualties but a large psychological impact on the civilian population can be detrimental for child health. Against the backdrop of the large literature on the long-run effects of adverse health outcomes of children, the results highlight the importance of policies and interventions that target pregnant women, particularly in places where ethnic tensions are high.