The Decline of War Since 1950: New Evidence

For the past 70 years, there has been a downward trend in war sizes, but the idea of an enduring ‘long peace’ remains controversial. Some recent contributions suggest that observed war patterns, including the long peace, could have resulted from a long-standing and unchanging war-generating process, an idea rooted in Lewis F Richardson’s pioneering work on war. Focusing on the hypothesis that the war sizes after the Second World War are generated by the same mechanism that generated war sizes before the Second World War, recent work failed to reject this ‘no-change’ hypothesis. In this chapter, we transform the war-size data into units of battle deaths per 100,000 of world population rather than absolute battle deaths – units appropriate for investigating the probability that a random person will die in a war. This change tilts the evidence towards rejecting no-change hypotheses. We also show that sliding the candidate break point slightly forward in time, to 1950 rather than 1945, leads us further down the path toward formal rejection of a large number of no-change hypotheses. We expand the range of wars considered to include not just inter-state wars, as is commonly done, but also intra-state wars. Now we do formally reject many versions of the no-change hypothesis. Finally, we show that our results do not depend on the choice of war dataset.

some detail (Goldstein, 2011; Pinker, 2011; Hathaway & Shapiro, 2017) while others reject it (Braumoeller, 2013; Cirillo & Taleb, 2016b; Clauset, 2018, 2020). Here we do not attempt a broad survey of the existing literature. Rather, we focus on the recent contributions of Cirillo & Taleb (2016b) and Clauset (2018, 2020) suggesting that observed war patterns, including the long peace, could have come from a long-standing and unchanging war-generating process. In particular, we engage with Clauset (2018), who tests the hypothesis that the war sizes after the Second World War are generated by the same mechanism that generated war sizes before the Second World War. He fails to reject what we will call a 'no-change hypothesis'.
Here are the main contributions of our chapter. First, we give a simple exposition of the central ideas behind the new critiques of the decline-of-war thesis made by Cirillo & Taleb (2016b) and Clauset (2018, 2020). These ideas hinge centrally on the original insight of Richardson (1948, 1960) into the fat-tailed size distribution of modern wars. Second, we transform the war-size data into units of battle deaths per 100,000 of world population rather than absolute battle deaths and argue that these units are appropriate for investigating the probability that a random person will die in a war. We show that this change tilts the evidence towards rejecting a large number of no-change hypotheses. Third, we show that sliding the candidate break point slightly forward in time, to 1950 rather than 1945, leads us further down the path toward formal rejection of a range of no-change hypotheses. Fourth, we expand the types of wars considered to include intra-state as well as inter-state wars. Now we almost always formally reject our no-change hypotheses. 1 Finally, we show that our results do not depend on the choice between two widely used war datasets.

Richardson Provides Our Framework
Decades ago, Richardson (1948) introduced the idea that war sizes tend to follow what is known as a power law distribution. 2 Technically, this means that the frequency of wars of size x is proportional to x^(-a), where a > 1 is some constant. Thus, bigger wars are less common than smaller ones, with the value of a governing the rate at which war frequencies decrease as war sizes increase. This remarkable insight has fared well against more than half a century of new data and the development of more rigorous statistical methods for estimating and testing power laws (Cederman, 2003; Clauset, 2018; González-Val, 2016).
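As a quick illustration of what the power law means in practice (a sketch of our own, with a hypothetical exponent a = 2.5 chosen only for the example), the frequency ratio between wars of size 2x and wars of size x is always 2^(-a), no matter what x is:

```python
# Illustrative sketch: relative frequencies under a power law f(x) ~ x^(-a).
# The exponent a = 2.5 is hypothetical, chosen only for this example.
a = 2.5

def relative_frequency(x):
    """Frequency of wars of size x, up to a constant of proportionality."""
    return x ** (-a)

# Doubling the war size divides the frequency by 2^a, regardless of x:
ratio_small = relative_frequency(2_000) / relative_frequency(1_000)
ratio_large = relative_frequency(2_000_000) / relative_frequency(1_000_000)

print(ratio_small, ratio_large)  # both equal 2^(-2.5), about 0.177
```

This scale-free property is what separates power laws from thin-tailed distributions: the relative rarity of a war twice as large never depends on how large the wars already are.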
For our purpose, the important characteristic of power-law distributions is that they have what are known as 'fat upper tails' governing the relationship between war sizes and their frequencies. This property entails that, although bigger wars are less common than smaller ones, the rate at which war frequencies decline with war sizes is much slower than would be the case if war sizes followed a common normal, or 'Bell Curve', distribution. Most people are conditioned to think in terms of Bell Curves, so some mental effort is required to adjust to fat tails. Here is the most salient point to bear in mind in the present context: huge wars are really rare, but not really, really, really rare.

1 Our findings do not refute those of Clauset (2018). It can be true simultaneously that per capita war sizes decrease while the absolute war-size generation mechanism does not change.

2 Spagat (2015) provides a non-technical introduction to power laws.
We illustrate the key fat-tail property with the following numerical example. Suppose that every time the world experiences a new war, w, the probability that the war size will grow to at least the size of the First World War, hereafter a 'truly huge war', is P(w ≥ w*) = 0.006. 3 We now make the important assumption that war-size realizations are statistically independent of each other, which implies that the size of war w tells us nothing about the sizes of previous or future wars. Under these conditions, the chance that there is at least one truly huge war after 200 war-size realizations is roughly 2/3. 4 If we lower the probability that each new war will turn out to be a truly huge one from 0.006 down to P(w ≥ w*) = 0.0001, then the chance of at least one truly huge war in 200 draws drops to around 1 in 50. Decreasing the probability of a truly huge war on each draw even further, down to P(w ≥ w*) = 10^(-7), decreases this chance all the way down to about 1 in 50,000. Thus, it makes a big practical difference whether truly huge wars are really rare, P(w ≥ w*) = 0.006; really, really rare, P(w ≥ w*) = 0.0001; or really, really, really rare, P(w ≥ w*) = 10^(-7). This fat-tail property of the war-size distribution potentially places the world into what we might call a 'bad Goldilocks' range. On the one hand, 0.006 is large enough that we might expect to suffer a truly huge war once every few generations, far too often for such a calamity. On the other hand, 0.006 is small enough that the risk of a truly huge war can lurk below the surface for a long time without being exposed as a major threat. This is evident within our example, according to which the world has about a 1/3 chance of experiencing 200 wars without suffering a truly huge one. And if our luck holds out this long, we could easily last another 200 wars without suffering a truly huge war.
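Under the independence assumption, all of the chances quoted above come from the formula 1 - (1 - p)^200. The following sketch reproduces them:

```python
# Chance of at least one 'truly huge' war in n independent war-size draws,
# given per-war probability p of a truly huge war.
def prob_at_least_one(p, n=200):
    return 1 - (1 - p) ** n

really_rare = prob_at_least_one(0.006)                # roughly 2/3
really_really_rare = prob_at_least_one(0.0001)        # about 1 in 50
really_really_really_rare = prob_at_least_one(1e-7)   # about 1 in 50,000

print(really_rare, really_really_rare, really_really_really_rare)
```

The same function also gives the complementary 1/3 chance of seeing no truly huge war in 200 draws at p = 0.006, i.e. 1 - prob_at_least_one(0.006).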
Thus, we arrive at an important insight flowing from the pioneering work of Richardson (1948) and developed further by Clauset (2018): the threat of a truly huge future war can be quite serious while simultaneously remaining well-hidden for a long time. In other words, we should not dismiss the possibility of a truly huge future war just because such an event would be dramatically out of line with our range of experience over the last 70 years. At the same time, we must not imprison ourselves in our own ahistorical assumptions that rely on the artifice of independent draws with fixed and unchanging probabilities. These calculations are helpful for understanding important concepts and establishing baseline expectations. But they do not possess any special powers to describe the world we currently live in or to predict its future. A finding that the war-size pattern of recent decades is consistent with an unchanging war generation mechanism over the last two centuries does not prove that such a mechanism actually exists.

3 This probability is not entirely fictitious. In the dataset compiled by Gleditsch (2004), the First and Second World Wars are by far the two biggest out of the 362 wars that occurred between the beginning of the 19th century and 1945: 2/362 ≈ 0.006.

A New Debate on the Decline of War
There is diversity of opinion among proponents of the decline-of-war thesis. First, it is standard to claim that the absolute level of war violence has declined over time, albeit unevenly (Lacina & Gleditsch, 2005; Human Security Report Project, 2011). Different scholars emphasize different time periods, although most view the Second World War as an important turning point. Second, sometimes the main claim is about per capita, rather than total, war violence (Pinker, 2011). Third, no one we are aware of argues that truly huge wars have become impossible. To be sure, a sense of optimism pervades this literature, with proponents generally providing reasons why war violence is decreasing and why this trend might reasonably be expected to continue. Yet, invariably, there is also a note of caution about the future.
The recent critique of the decline-of-war thesis was instigated by Cirillo & Taleb (2016b), who collected data on 565 wars going all the way back to Boudicca's rebellion against the Romans in the first century common era (CE). Using extreme value theory to fit the fat-tailed data, they find that they cannot reject their model and conclude from this non-rejection that the data do not support a decline-of-war thesis. In a companion paper they go further, writing that 'there is no scientific basis for narratives about change in risk' (Cirillo & Taleb, 2016a). Cirillo & Taleb (2016b) helped to prompt renewed focus on the importance of fat tails in war sizes for the decline-of-war debate; however, they left several important issues unresolved. First, although a main contribution of their work is the data collection effort, their dataset is not publicly available, and they have refused to allow other researchers to examine it (Spagat, 2017). This stance takes their work outside the scientific universe, at least for now. Second, non-rejection of a model fitting two thousand years of data does not rule out the possibility of scientifically grounded discussions about possible changes in war risks during subsets of these two thousand years. For example, there could be a big change after war number 500 without the last 65 draws disturbing the fit of the first 500 draws sufficiently to lead to rejection of the whole model. Imagine flipping a coin that has a 0.5 probability of landing heads for the first 500 flips and a 0.3 probability of landing heads for the last 65. You would probably not reject a hypothesis that all the flips had a chance pretty close to 0.5 of landing heads. More importantly, if you confine your analysis to the 565 flips as a whole, then you will get no hint that there was a dramatic change after flip number 500.
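The coin-flip intuition can be checked numerically. Assuming the numbers in the text (500 fair flips, then 65 flips with a 0.3 chance of heads), the expected pooled outcome is about 270 heads in 565 flips, and an exact two-sided binomial test of 'all flips are fair' on that outcome comes nowhere near rejection:

```python
from math import comb

# Sketch of the coin example, using the text's numbers:
# 500 flips with P(heads) = 0.5 followed by 65 flips with P(heads) = 0.3.
N = 565
heads = 270  # approximate expected pooled heads: 500*0.5 + 65*0.3 = 269.5

def binom_pmf(k, n, p=0.5):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact two-sided binomial p-value for H0: every flip is fair (p = 0.5):
# sum the probabilities of all outcomes at least as unlikely as the observed one.
pmf = [binom_pmf(k, N) for k in range(N + 1)]
p_value = sum(q for q in pmf if q <= pmf[heads])

print(f"two-sided p-value: {p_value:.2f}")  # well above 0.05
```

The pooled test returns a p-value around 0.3, so the fair-coin hypothesis survives comfortably even though the generating process changed dramatically at flip 500.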
It would have been more appropriate to test for a break in the data at a potential change point, such as the end of the Second World War; Cirillo & Taleb (2016b) do not provide such a test. Third, there is an overarching assumption in this approach that the only evidence scientifically admissible to the discussion is a list of war sizes and timings. Cirillo & Taleb (2016b) seem to think that historical events such as peace treaties, the formation of international institutions or social trends such as improving human rights are simply outside the bounds of a scientific discussion; this restrictive view makes little sense. Clauset (2018) addresses the first two of these unresolved issues. First, he uses the open-source Correlates of War (COW) dataset, which covers interstate wars from the beginning of the 19th century through 2007. Second, his whole analysis focuses on testing for a trend break starting at the end of the Second World War. The essence of his approach on war sizes is to fit a power law to the data up through the Second World War and then test the hypothesis that the data after 1945 were generated by this distribution, i.e., he tests what we call a no-change hypothesis. Clauset (2018) concludes that he cannot reject the no-change hypothesis. This finding is intuitive in light of the numerical examples provided above, although there is certainly tension between the no-change hypothesis and the last 70 years.
Clauset (2018) provides a useful contribution to our thinking but, at the same time, we must be cautious about this result for several reasons. First, other information besides the time series of war sizes is potentially relevant. Second, we should not think exclusively in terms of any one particular hypothesis such as the no-change one. There are other hypotheses, more in line with a decline-of-war thesis, that would also not be rejected by the data. For example, suppose we modify the no-change hypothesis by stipulating that wars with more than 5 million battle deaths became very, very rare after the Second World War. That is, we virtually eliminate the fat tail from the hypothesized war generation mechanism. This restriction is fully consistent with the post-1945 experience since no war during this period comes close to such a size. Thus, this hypothesis is consistent with decline-of-war ideas and will also not be rejected by the data. And there is no reason to privilege the no-change hypothesis over this one. Third, we must not fall into the trap of accepting the null hypothesis based on its non-rejection. Clauset (2018) finds that we would finally reject his no-change hypothesis (p < 0.05) after about 100-140 more years without a truly huge war. Even then we still would not be able to entirely rule out the no-change hypothesis. However, if the data became extremely contrary to the no-change hypothesis after 100 sufficiently peaceful years, then the data would already be fairly contrary to this hypothesis after 50 sufficiently peaceful years. Returning to our earlier calculations, recall that the Gleditsch (2004) dataset contains 212 wars for the period after the Second World War. If a further 212 wars occur without a truly huge one, perhaps over the next 70 years, we could then reject this version of the no-change hypothesis at the 10% level, which would be rather convincing evidence that there was a change for the better.
In other words, the 0.05 threshold is arbitrary and excessively binary; non-rejection of the no-change hypothesis does not mean that the decline-of-war thesis is false until it suddenly switches to true after 100 years without a truly huge war.
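To make the 10% figure concrete: in the Gleditsch (2004) data, 2 of the 362 pre-1945 wars were truly huge, so under the no-change hypothesis each new war independently has probability 2/362 of being truly huge, and the p-value after n consecutive wars without one is (1 - 2/362)^n. A minimal sketch:

```python
# p-value on the no-change hypothesis after n consecutive wars with no
# truly huge war, where the pre-1945 fraction of truly huge wars in the
# Gleditsch (2004) data is 2/362.
p_huge = 2 / 362

def p_value_no_huge(n):
    return (1 - p_huge) ** n

print(f"after 212 wars: {p_value_no_huge(212):.3f}")  # the post-1945 record so far
print(f"after 424 wars: {p_value_no_huge(424):.3f}")  # 212 more peaceful wars
```

The p-value falls from about 0.31 today to just under 0.10 after a further 212 peaceful wars, which is the 10% rejection discussed above.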

Measuring War
Our empirical analysis relies on two datasets that cover war sizes and dates: the commonly used Correlates of War (COW) dataset (Sarkees & Wayman, 2010), which was also used by Clauset (2018), and the dataset compiled by Gleditsch (2004). The two datasets overlap substantially, and both cover the period 1816-2007. Indeed, the Gleditsch (2004) dataset is based on the COW dataset. However, there are important distinctions that are worth understanding even though it turns out that our results do not depend materially on the choice of dataset. For COW there is a big change in the inclusion criteria in 1920 with the founding of the League of Nations. The fundamental test for COW is always membership in the international system, for both states in the case of inter-state war and for the one state in the case of intra-state war. Between 1816 and 1920 this test breaks down into two parts: (i) a population greater than 500,000 and (ii) being 'sufficiently unencumbered by legal, military, economic, or political constraints to exercise a fair degree of sovereignty and independence'. After 1920, the COW test switches to membership in the League of Nations (or United Nations) and receiving diplomatic missions from any two major powers (Singer & Small, 1972). Gleditsch & Ward (1999) note that, in practice, the pre-1920 test boils down to having formal diplomatic relations with Britain and France. This rule excludes many countries and their wars, including the three Anglo-Afghan wars that took place between 1839 and 1919 and some intrastate wars such as the 1831-45 civil war in Central America.
It would be unfair to label the COW dataset as simply incorrect, yet we believe that its British-French emphasis excludes many wars that are relevant to the decline-of-war debate. The revised data of Gleditsch (2004), which corrects these systematic problems, contains 574 wars between 1816 and 2007, 136 of which are interstate wars. During the same period COW contains only 474 wars, 95 of which are interstate. Thus, the difference in war counts is substantial. Moreover, 1920 is close enough to the Second World War that the 1920 switch could potentially affect the results of Clauset (2018). Thus, we prefer the Gleditsch data but run our calculations on both datasets. A second contrast with Clauset (2018) is that we measure war sizes in battle deaths per 100,000 of world population, using population estimates (2013) with some interpolations before 1950. The probability that an average person will be killed in war is of particular interest to the decline-of-war discussion, and population adjustment is appropriate for assessing this probability. In a similar vein, analysts normally assess progress against violence by examining the number of homicides per 100,000 of population, or the quality of health services through the number of maternal deaths per 1,000 live births. At the same time, we recognize the point of Braumoeller (2013), who argues that examination of unadjusted war sizes is of great relevance to understanding human war-proneness. 6 A third contrast with Clauset (2018) is that we include in our analysis all the wars in each dataset, not just interstate wars. 7 We think that there is no a priori theoretical justification for separating out interstate wars and agree with Small & Singer (1982), who argued that 'an understanding of international war cannot rest on interstate wars alone'. The common focus on wars involving major powers or other interstate wars seems to be driven by data availability rather than theoretical considerations (Cunningham & Lemke, 2013). Indeed, the third, fourth and sixth largest wars measured in per capita terms in the Gleditsch dataset are all intra-state (Fig. 11.1).
Thus, combining all wars is best practice in our view although we also run our calculations on interstate wars alone.
War-size numbers are intended to include just battle deaths, but both of our datasets work from available sources that sometimes mix in other kinds of deaths. This issue creates two separate problems. First, ideally we would have data on the full human cost of war, but often we only have data on the battle-death component of this cost. For example, both datasets record 910,084 deaths for the Korean War, but a full figure would include famine deaths that could push the number up to 5 or 6 million (Lacina et al., 2006). Second, there is inconsistency across wars since some figures hew close to a battle-deaths-only concept whereas other figures are more comprehensive.

6 A war that kills one million people is an unmitigated disaster both in a world of 5 billion people and in a world of 9 billion people.

7 For COW, all wars means inter-state, intra-state, extra-systemic and non-state. The Gleditsch dataset does not have the last two of these categories, although its more inclusive definition of state means that it codes some COW extra-systemic and non-state wars as either inter-state or intra-state wars. Arguably, we should subtract the populations of ungoverned spaces that fall outside the scope of the Gleditsch dataset from our world population figures. Such adjustments would enhance our decline-of-war results because they would increase the per capita sizes of earlier wars relative to later ones; governance spreads over time. However, these adjustments would be very hard to perform with any degree of accuracy, so we do not attempt them here.

Insights from the Data
A particular feature of our approach is the large number of no-change hypotheses that we test. All our hypotheses are based on two separate cut-off points: one for time periods and the other for per capita war sizes. Our time periods pivot around either the Second World War or the Korean War, but future work should consider more cut-off points. For war sizes we consider all possible cut-offs and examine the fraction of all wars above each war-size cut-off for both the early period and the late period. In short, we examine many right-hand tails and test whether the tails for the later periods are thinner than the tails for the earlier periods.
Here are some sample calculations when the time cut-off point is 1945. According to the Gleditsch (2004) data, there were 362 wars between 1816 and 1945, with the Second World War being by far the largest. Our first no-change hypothesis for the post-1945 period is that the probability that a random war after 1945 will kill at least 781 people per 100,000 (Fig. 11.1) is given by the fraction of all wars before 1945 that reached this violence level. This fraction is p_0 = 1/362 ≈ 0.003. Zero wars out of 212 in the Gleditsch (2004) data attained this size between 1946 and 2007. If war sizes are drawn randomly and independently of each other and if the no-change hypothesis is true, then the probability of this happening is (1 - 1/362)^212 = 0.56. This probability can be interpreted as a p-value on one particular no-change hypothesis at the most extreme end of the distribution of war sizes. 8 Next we calculate exactly the same types of p-values but for lower and lower war sizes. For war sizes beginning at 781 per 100,000 and moving down towards 499 per 100,000, the size of the First World War, the p-values stay constant. At 499 battle deaths per 100,000 the p-value drops to (1 - 2/362)^212 = 0.31. It then stays constant all the way down to 52 battle deaths per 100,000, the size of the American Civil War (1861-65), where the p-value drops down to (359/362)^212 ≈ 0.17. In short, the three biggest wars all occurred no later than the Second World War, and together they yield a preponderance of evidence against a no-change hypothesis but not a formal rejection at the 0.05 level. The next largest war is the second phase of the Chinese Civil War, which pitted the communists under Mao Zedong against the nationalists under Chiang Kai-shek and caused 51 battle deaths per 100,000 people.

Fig. 11.1 The largest wars as measured by battle deaths per 100,000 of world population. Based on Gleditsch (2004)
The no-change hypothesis assigns probability 3/362 to the event that each new war after the Second World War will exceed size 51. This happens once in 212 draws, so the p-value on the no-change hypothesis adds together the probability of 0 wars above size 51 and the probability of 1 war above size 51, leading to a p-value of 0.47.
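These p-values can be reproduced with a short calculation, assuming the 362 pre-1945 and 212 post-1945 war counts from the Gleditsch (2004) data:

```python
from math import comb

N_PRE, N_POST = 362, 212  # Gleditsch (2004) war counts before/after 1945

def no_change_p_value(m_pre, k_post):
    """P-value for one war-size cut-off: m_pre pre-1945 wars reached the
    cut-off, k_post post-1945 wars did. Under the no-change hypothesis each
    post-1945 war independently reaches the cut-off with probability
    m_pre/N_PRE, so the p-value is the binomial probability of observing
    k_post or fewer exceedances in N_POST draws."""
    p = m_pre / N_PRE
    return sum(comb(N_POST, k) * p**k * (1 - p) ** (N_POST - k)
               for k in range(k_post + 1))

print(f"{no_change_p_value(1, 0):.3f}")  # cut-off at WWII size
print(f"{no_change_p_value(2, 0):.3f}")  # cut-off at WWI size
print(f"{no_change_p_value(3, 0):.3f}")  # cut-off at US Civil War size
print(f"{no_change_p_value(3, 1):.3f}")  # cut-off at Chinese Civil War size
```

The first three values match the 0.56, 0.31 and 0.17 reported above; the last is the two-term binomial sum, about 0.47 to 0.48 depending on rounding.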
We calculate p-values similarly as we move to smaller and smaller war sizes. When, for example, there are 6 wars before 1945 of size s and above, then the no-change hypothesis fixes a probability of 6/362 on the event that a new post-1945 war will be of size s or above. When, for example, three out of the 212 wars after 1945 are above size s, then the p-value on the no-change hypothesis is the probability of three or fewer wars of size s or greater after 212 independent draws, each with probability 6/362 of reaching this size. We use the binomial formula to make this calculation. 9 Panel (a) in Fig. 11.2 displays the p-values for the tests of all no-change hypotheses with cut-offs for war sizes below 50 battle deaths per 100,000 and with a time break point of 1945. Reading from right to left, the curve dips down below 0.2 as we move through the Third Sino-Japanese War, which began in 1937, 10 the Russian Civil War following the Russian Revolution of 1917, and the 1864 Muslim revolt in Xinjiang, China. The p-values then rise back above 0.8 because the next four largest wars all occurred after the Second World War. These are the Korean War, the second phase of the Vietnam War, which started in 1965, the Iran-Iraq War between 1980 and 1988, and finally the Second South Sudan War. Next, continuing to read from right to left, the following 9 wars all took place before the Second World War, bringing the p-values back down to around 0.2.
The evidence in Fig. 11.2 is unfavorable to the no-change hypothesis (p < 0.5) except in a narrow range of tails for war sizes between about 25 and 28 per 100,000. At the same time, we never reject the no-change hypothesis at the standard 0.05 level. The evidence leans towards the decline-of-war idea but is far from definitive.
When we use 1950, rather than 1945, as a break point the results are much more favourable to the decline-of-war thesis. Now the eight largest wars in per capita terms all occur before the break point. Panel (b) displays the new p-values. No-change hypotheses are often rejected at the 0.05, and even 0.01, levels for a wide range of tails. Two of the very biggest wars (the Chinese Civil War and the Korean War) broke out within the 1945-50 time window, so the p-value curve now drops much lower than it did when 1945 was the break point. 11

We have made four separate data changes compared to Clauset (2018): measuring war sizes in per capita terms, using Gleditsch data rather than COW data, considering 1950 as a break point, and including intrastate as well as interstate wars. To isolate the importance of each particular change we now consider them in turn. We first note that adjusting for world population levels is essential to get anything resembling the results in this chapter. This is so much the case that we do not even bother showing pictures unadjusted for population. Second, the choice of COW or the Gleditsch war data does not matter much (Fig. 11.3). Third, both Figs. 11.2 and 11.3 show that the choice of break point does matter; evidence against the no-change hypothesis is much stronger when the break is at 1950 than it is when the break point is 1945. Finally, Figs. 11.4 and 11.5 show that our decision to include intrastate wars also matters. We think this is simply due to sample size; excluding intrastate wars decreases the number of wars, making it harder to reject the no-change hypothesis.

Fig. 11.2 Tests of no-change hypotheses for all wars. Based on Gleditsch (2004), using 1945 (a) and 1950 (b) as break points

11 We date wars by when they start.

A More Peaceful World Since 1950
There will continue to be debate on the probability of another truly huge war. If we limit our attention to the probability of a future war at least as large as the First World War then, consistent with Clauset (2018), our analysis suggests that there is presently not enough data to draw a strong conclusion. At the same time, our analysis also suggests that the chances of drawing a truly huge war are probably lower now than they were in the 19th century and the first half of the 20th century. When we widen our scope to include smaller but still very large wars, e.g., wars killing more than 40 per 100,000 of world population, then there is substantial evidence that the world has become more peaceful since the 1950s.
Until recently scholars have tended to assume that the Second World War is the obvious candidate for a break point into a more peaceful world. However, recent papers by Fagan et al. (2018) and Cunen et al. (2018) start from an agnostic position on potential break points and use statistical methods to detect convincing ones. Both papers find substantial evidence for a change at 1950 although they identify other candidate break points including 1912 (Fagan et al., 2018) and 1965 (Cunen et al., 2018). These results complement ours nicely.
There is certainly room to improve our analysis. First, we repeat our caution that a full treatment of the issues should consider more than just the time series of war sizes (and population numbers). Second, it would be helpful to go beyond battle deaths to include more complete numbers on war deaths. Unfortunately, it is unlikely that this second hope will ever be fully realized. Third, the new research into defining change points is an important development that will, hopefully, continue. Despite the potential for improvement, we believe that our chapter should shift the debate in favour of the decline-of-war thesis.