1 Introduction

Several recent studies have uncovered a conditional association between oil and the onset of ethnic war.Footnote 1 Challenging and improving upon them, we develop a more integrative and fine-grained theory regarding the conditional relationship between oil and the onset of ethnic war. We contend that it is the ethno-geographical location of oil that really connects oil with the onset of ethnic war, rather than a country’s exact amount of oil income or rent (e.g., Collier and Hoeffler 1998, 2004), local production (e.g., Sorens 2011; Hunziker and Cederman 2012; Asal et al. 2016), or (relative) concentration and distribution (e.g., Le Billon 2001; Morelli and Rohner 2015).Footnote 2 More concretely, when the core territory of a minority group possesses a significant amount of oil, the minority group is more likely to rebel against a central state dominated by another group, and oil is strongly associated with the onset of ethnic war. In contrast, when oil is located within the core territory of a dominant majority group, or when a country has a fairly even distribution of ethnic groups and no group can lay an exclusive claim to oil, oil is not associated with a higher risk of ethnic war (for details, see Sect. 2 below).

To test our theory, we construct two new indicators that integrate the geographical location of oil with the geographical distribution of ethnic groups. Our new indicators avoid not only all the pitfalls of endogeneity associated with earlier measurements of oil rent or income (absolute or proportional) as incisively criticized by Ross (2006, pp. 275–277, 2012, pp. 17–22), but also some key pitfalls associated with more recent indicators on local production and relative concentration of oil (e.g., Sorens 2011; Hunziker and Cederman 2012; Morelli and Rohner 2015; Asal et al. 2016). We test our empirical hypotheses with our new indicators and statistical results strongly support our hypotheses.

In a forthcoming accompanying paper (Tang et al. 2017), we deploy comparative case studies with process-tracing to demonstrate that the mechanisms singled out by our theory really did drive ethnic wars, whereas ethnic peace usually results when these mechanisms lie dormant. Our case studies also strengthen our case against other studies that center upon local production or relative concentration of oil. Together, our two studies make clear that it is the ethno-geographical location of oil that truly matters for the onset of ethnic war, rather than oil alone, the amount of oil income/rent/(local) production,Footnote 3 or relative concentration. Our studies thus provide a close-to-definitive answer to the question of whether and how oil is associated with the onset of ethnic war. Our exercise also points to a broader theory that connects the ethno-geography of other commodity-type mineral resources with the onset of ethnic war.

Before we proceed further, a key caveat is in order. Many earlier studies on oil and conflict (ethnic or non-ethnic) deploy aggregate data at the country level and thus do not explicitly link the subnational properties of oil with the subnational location of ethnic groups. Most of these studies conclude that oil is generally a curse (for reviews, see Ross 2006, 2012, 2014; Blattman and Miguel 2010; Van der Ploeg 2011). These results, however, have been questioned from time to time (some key studies are cited in Appendix E in supplementary material). Because our study deploys subnational data and explicitly links the subnational location of oil with the subnational location of ethnic groups, it differs fundamentally from these earlier studies. We thus do not engage extensively with this literature based on aggregate data, although we do cite some of these studies when appropriate. We concur with Smith’s (2014) call that the two levels (i.e., national and subnational) should be synthesized. Indeed, our results below, which cover both the group level and the state level, provide a possible explanation for why results from earlier studies with aggregate national data have been inconsistent.

The rest of the article is structured as follows. Section 2 outlines our new theory, lays out our empirical hypotheses, and paves the way for contrasting our exercise with several recent studies. Section 3 identifies the shortcomings within two recent studies that have reported seemingly similar results, focusing on theorization, measurement, and technical issues. Section 4 provides a brief description of our key explanatory variables and then presents quantitative results for our theory. Section 5 draws implications and concludes.

2 Oil and the Onset of Ethnic War: An Integrative Theory

In a series of excellent reviews (2004a, b, 2006), Michael Ross pointed out that despite some kind of consensus that oil does impact ethnic war (and non-ethnic civil war), results have been inconsistent within the first wave of literature. Ross reasoned that three deficiencies might have caused these inconsistencies (see also Hegre and Sambanis 2006; Smith 2014; Koubi et al. 2014). First, earlier studies rely on aggregate national-level data, even though ethnic war is often a subnational phenomenon. Most prominently, earlier studies do not contain any geographical data regarding either resources or ethnic groups. Second, earlier studies have deployed oil production, income, and rent as instruments for oil, and yet all these instruments suffer from serious problems of endogeneity (Humphreys 2005; Ross 2006; for earlier evidence, see Brunnschweiler and Bulte 2009; Mitchell and Thies 2012). Different authors have also used different data sets of civil war or ethnic war. Finally, many studies have proposed a wide range of causal mechanisms that purportedly link oil with ethnic war but have not explicitly tested these mechanisms.

The first deficiency has now been largely corrected, thanks to the pioneering studies by Le Billon (2001) and Buhaug and Gates (2002). More and more recent studies of natural resources and civil wars now deploy Geographic Information System (hereafter, GIS) data. As a result, our understanding of oil and ethnic war has become far more fine-grained.

Regarding the remaining two deficiencies, we hold that they are so tightly interconnected that we cannot expect to correct one without correcting the other, even with GIS data. Fundamentally, the two deficiencies are underpinned by inadequate theorizing about oil and the onset of ethnic war, aggravated by the lumping together of ethnic civil war and non-ethnic civil war. Without adequate theorizing, we are often at a loss about what should be measured, how things should be measured, what mechanisms should be tested, and how they should be tested. Consequently, some clear-cut measurements of oil have not been tried: most authors still measure oil in quantity, as oil rents, oil vs. total GDP, oil vs. total export (e.g., Collier and Hoeffler 1998, 2004; Fearon and Laitin 2003), (local) oil production (e.g., Humphreys 2005; Brunnschweiler and Bulte 2009; Wimmer et al. 2009; Cederman et al. 2010; Hunziker and Cederman 2012; Asal et al. 2016), oil income per capita (Ross 2006, 2012), or (relative) concentration/distribution (Morelli and Rohner 2015). Meanwhile, apart from several exceptions (e.g., Ross 2004a, pp. 38, 60–61; Smith 2014; Tang 2015; Paine 2016) insisting that some mechanisms are not mutually exclusive, most authors still pit different mechanisms against each other (i.e., greed, grievance, or opportunity) rather than integrating underlying factors, immediate drivers, and mechanisms into a coherent theory.

We thus aim to correct the two interconnected deficiencies together by developing a more integrative theory regarding oil and the onset of ethnic war. Critically, our theory builds upon elements and insights from the existing literature on natural resources and civil conflict (esp. Le Billon 2001; Ross 2004a, 2012) and the literature on the nexus of ethnic domination/subordination and resentment/hatred (e.g., Cederman et al. 2013; Horowitz 1985; Wimmer 2013).

Our theory contends that it is the ethno-geographical location of oil that truly connects oil with the onset of ethnic war. When the core territory of a (subordinate) minority group possesses (a significant amount of) oil,Footnote 4 this minority group is more likely to rebel against a central government dominated by another group, seeking greater autonomy or outright secession, ceteris paribus. Oil with such an ethno-geographical location is thus more likely to trigger an ethnic war or intensify an ongoing ethnic war.Footnote 5 Consequently, countries with oil located within the core territories of minority groups are more likely to experience ethnic war. In contrast, when oil is located within the core territory of a dominant majority group, or when a country has a fairly even distribution of ethnic groups and no group can lay an exclusive claim to oil, oil is not associated with a higher risk of ethnic war.

Our theory further proposes two interlocked major mechanisms linking the ethno-geographical location of oil with ethnic war.Footnote 6 On the one hand, when oil is discovered within the core territory of a minority group, the central government (dominated by another group or other groups) almost inevitably tries to control the resources for two reasons. The first is simple economic interest (i.e., “greed”): every state desires to control more resources and revenues. Second, the central government seeks to preempt the minority group from controlling the resources and revenues, partly because the central government fears that the minority group may seek greater autonomy. This fear is most severe when there have been earlier episodes of ethnic tension or, worse, earlier ethnic war between the group that dominates the central government and the minority group. These two dynamics almost inevitably lead the central government to tighten its grip on the minority’s core territory and its oil, via (para-)military deployment, forced or induced migration of the majority group to the core territory of the minority group, or, usually, both. Indeed, due to the technology- and capital-intensive nature of oil production, even without encouragement from the central government, the extraction of oil almost inevitably attracts an influx of immigrant workers, usually in the form of ethnic aliens (from the majority group or other countries) with more technological and linguistic skills plus political and business connections. The result is an “internal colonialization” of the core territory of the minority group by the majority group (Hechter 1970), which, in turn, increases the risk of “sons of the soil” conflict (Weiner 1978; Fearon and Laitin 2011).

On the other hand, even without earlier episodes of ethnic tension and conflict, the minority group will resent that the central government, dominated by another group, takes what rightly “belongs” to the minority group away from its natural owner. Put crudely, the minority group will inevitably hold that oil discovered within their core territory is their oil. The influx of immigrant workers as ethnic aliens and the fact that immigrants usually take up most of the high-paying jobs available only add to the resentment felt by the local minority group in the form of “relative deprivation”, partially driven by the fact or perception that the income gap between the minority group and the majority group is widening. Worse, oil extraction and processing almost inevitably entail severe environmental degradation, and oil companies, whether multinationals or state-controlled, almost never compensate the local people enough or do enough to protect the environment. These dynamics generate greater resentment by the “native” minority group against the “alien” majority group. Finally, elites within the minority group can employ the expected oil revenue to broadcast the bright prospect of greater autonomy or secession, with looting and extortion being a source of income for financing the war effort if oil production has already commenced.

Oil located within the core territory of a minority group, therefore, impacts both the minority group and the state controlled by another (majority) group. Most critically, adding the two sides of the dynamics together results in a powerful mixture of immediate drivers of ethnic war. More specifically, oil located in the core territory of a subordinate minority group impacts fear of secession (by the majority), resentment (by the minority), interest or greed (both sides), and possibly capability (for the minority especially). And if some hatred between the minority group and the majority group already exists, oil located within the core territory of a subordinate minority group would impact five of the seven immediate drivers of ethnic war, which, in turn, will drive the two sides into a spiral of escalating tension and mutual distrust, and eventually war. As such, our theory predicts that oil located within the core territory of a (subordinate) minority group should be strongly and positively associated with the onset of secessionist ethnic war.

From our new theory, two hypotheses for quantitative exercises,Footnote 7 each with two subhypotheses, can be derived:

H1A-G: At the group level, when oil is located within the core territory of a minority group, this minority group is more likely to rebel against the central government dominated by another group, seeking greater autonomy or outright secession, ceteris paribus.

H1B-G: At the group level, oil located within the core territory of a minority group may have little impact on the risk of governmental civil war.

H2A-C: At the country level, countries with oil located within the core territory of a minority group are more likely to experience secessionist ethnic war, ceteris paribus.

H2B-C: At the country level, countries with oil located within the core territory of a minority group are no more likely to experience war of ethnic infighting that is for the control of the government.

3 A Critique of Existing Theorizing and Empirical Efforts

Oil and civil conflict is an extremely crowded field, and several studies have touched upon some of the elements within our theory and proposed one or two of the hypotheses we advance here. They have also provided empirical results that at first glance seem to be similar to our results reported below. Due to space limitations, we leave a more extensive survey of the literature to Appendix E in supplementary material. In this section, we critically focus on two recent studies, because they look the most similar to what we report here. We first show that a study by Morelli and Rohner (2015) is theoretically unsound and statistically un-robust (our replications of their results are in Appendix D in supplementary material). While explicitly noting the useful and valid factors, mechanisms, and theses from it, we also show that a study by Asal et al. (2016) too suffers from several weaknesses.

A common weakness of Morelli and Rohner (hereafter, M&R 2015) and Asal et al. (2016) is their reliance on the PETRODATA data set for coding oil location or production without acknowledging that data on oil discovery and production within the PETRODATA data set have many missing observations. Thus, M&R’s (2015, p. 37) claim that their data set on oil discovery and production is complete simply cannot be substantiated. Indeed, within the PETRODATA data set, only 540 entries (out of 891) or 60.6% have data on discovery time, and only 309 entries (out of 891) or 34.8% have data on production time. In contrast, not only do we use two different data sources (USGS and PETRODATA) to code our key independent variable, but we also obtain almost identical results with the two different data sources (see Sect. 4 below for details).

3.1 M&R’s (2015) “Oil Gini” and the Onset of Civil War

M&R (2015) first built a simple bargaining model that addresses both group concentration and resource concentration. Their simple model points to the possibility that when group concentration and resource concentration are both high, groups then cannot credibly commit themselves to peaceful bargains under a variety of conditions, and civil war may result. Their model suggests a key empirical prediction “that conflicts are fuelled by non-governing ethnic minority groups living in very oil-rich regions…civil war is likely when resource discoveries happen in regions that are significantly populated by groups that do not belong to the governing coalition in the country.”

Then, by merging Lujala et al.’s (2007) PETRODATA data set and the Geo-referencing of Ethnic Groups (GREG) data set by Weidmann et al. (2010), M&R (2015, p. 33) created an indicator of “Oil Gini” to capture “the unevenness of oil field distribution across ethnic group for a given country and year” at the country level. At the group level, M&R invented an indicator of “R1/(R1 + R2)” to capture a group’s relative concentration of oil within a country. Within “R1/(R1 + R2)”, R1 denotes resources (i.e., oil and gas) within the core territory of a minority group (group 1), whereas R2 denotes resources within the core territory of all other groups within a country. Hence, “R1/(R1 + R2)” is “the surface of an ethnic group’s territory covered with oil and gas as a percentage of the country’s total surface covered with oil and gas” at the (ethnic) group level (ibid, 33). They report quantitative results that corroborate their model and hypotheses.

Unfortunately, M&R’s theoretical and empirical exercises suffer from three critical shortcomings. Most critically, their key explanatory variables at both the country level (i.e., Oil Gini) and the group level (i.e., R1/(R1 + R2)) are problematic, because different situations may have the same “Oil Gini” value at the country level or the same R1/(R1 + R2) value at the group level, and yet these situations may hold very different, even diametrically opposite, implications for ethnic war. Simply put, not all situations of (un-)equal distribution with the same Oil Gini score or the same R1/(R1 + R2) value are the same.

First, at the country level, “Oil Gini” cannot differentiate two situations of extremely uneven distribution of oil and gas: both a situation in which all the oil is located within the core territory of a majority group and a situation in which all the oil is located within a minority region will have an “Oil Gini” value of one. Yet, these two situations will have diametrically opposite impacts on the onset of ethnic war. In the former case, the probability that a minority group will rebel for the oil in the territory of the majority group is almost zero and any ethnic war experienced by this country will have little to do with oil.Footnote 8 In contrast, in the latter case, the probability that the minority group will demand to control the oil revenue will be extremely high and hence the chance of an ethnic war is going to be high, as suggested by M&R’s own logic (see also Sect. 2 above). This suggests that the effect of “Oil Gini” on the onset of ethnic war may be non-monotonic, yet “Oil Gini” cannot cope with this possibility. For M&R (2015, p. 32), the relationship is monotonically (and linearly) positive: “war [will] be more likely when resource concentration and group concentration are high.”
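The symmetry problem described above can be illustrated with a minimal sketch. The code uses the textbook Gini coefficient over per-group oil shares, which is an assumption made for illustration and not necessarily M&R's exact construction; their normalization may differ, so the absolute values need not match theirs. The two-group country and its oil shares are hypothetical.

```python
def gini(shares):
    """Textbook Gini coefficient of non-negative values (0 = perfectly even).

    Assumption: this is the generic formula, not M&R's exact "Oil Gini".
    """
    n = len(shares)
    total = sum(shares)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs, scaled by the total.
    diff_sum = sum(abs(a - b) for a in shares for b in shares)
    return diff_sum / (2 * n * total)


# Hypothetical country; order of entries: [majority group, minority group].
all_oil_in_majority = [1.0, 0.0]
all_oil_in_minority = [0.0, 1.0]

# The index depends only on how unevenly oil is spread, not on WHICH group
# holds it, so the two opposite situations receive the same score.
same_score = gini(all_oil_in_majority) == gini(all_oil_in_minority)
print(gini(all_oil_in_majority), gini(all_oil_in_minority), same_score)
```

Any concentration index that is invariant to relabeling the groups shares this property, which is why a separate indicator of who holds the oil is needed.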

Second, R1/(R1 + R2), M&R’s (2015) measurement of uneven distribution of oil at the group level, is also problematic. According to them, R2 within R1/(R1 + R2) contains all other groups except the first group. This neglects the fact that many multiethnic countries have more than one (minority) group and each of the subordinate minority groups may stake a separate (and separatist) claim to the oil within its core territory. As such, what really matters is not one group versus the rest but one group versus the majority or dominant group. The case of Indonesia, in which both Aceh and Irian Jaya/West Papua held substantial oil and gas, is one such example.

More critically, like their logic at the country level, M&R’s logic at the group level suggests that R1/(R1 + R2) must be high enough to trigger the onset of civil war (non-ethnic and ethnic war), although they fall short of specifying a threshold level for R1/(R1 + R2). Their logic thus ignores the possibility that a subordinate minority group may possess only a small fraction of the total oil and gas reserve in a country and yet may still rebel, because the group deems the small fraction now being controlled by the dominant majority group a sufficient casus belli. Aceh in Indonesia has been such a tragic example. According to M&R’s own measurement, Aceh’s R1/(R1 + R2) value is only 0.027, a very small value. Yet, Hasan di Tiro, the former leader of the Free Aceh Movement, still called for the Acehnese to rebel against “the neo-colonialist Javanese state”, because gas in Aceh belongs to the Acehnese (di Tiro 1984; for a more detailed discussion of the Aceh-Indonesia conflict, see Tang et al. 2017 and the references cited there).
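A short sketch may help make the group-level measure concrete. It computes R1/(R1 + R2) as defined in the text, with R2 summed over all other groups. The oil-covered areas below are hypothetical numbers chosen only so that one minority group's share equals the 0.027 reported for Aceh; they are not M&R's data, and the group names are placeholders.

```python
def relative_oil_share(r1, r2):
    """Return R1 / (R1 + R2), or None when the country has no oil at all."""
    total = r1 + r2
    if total == 0:
        return None
    return r1 / total


def share_for(group, areas):
    """R1 is the group's own oil-covered area; R2 sums all other groups."""
    r1 = areas[group]
    r2 = sum(area for name, area in areas.items() if name != group)
    return relative_oil_share(r1, r2)


# Hypothetical oil-covered areas (arbitrary units) in a three-group country.
oil_area = {"dominant_majority": 900, "minority_a": 27, "minority_b": 73}

share_a = share_for("minority_a", oil_area)
has_oil_a = oil_area["minority_a"] > 0

# A very small relative share coexists with oil being present in the
# group's core territory, which is what a dichotomous indicator records.
print(round(share_a, 3), has_oil_a)
```

The contrast between `share_a` and `has_oil_a` mirrors the argument above: a continuous share can be tiny while the presence of oil in the group's territory, the condition our theory emphasizes, still holds.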

Third, although M&R’s two major independent variables are strictly related to ethnic groups (their claim that their model does not depend on ethnicity notwithstanding; M&R 2015, p. 34, fn. 12), their dependent variables at both the country level and the group level are civil war onsets and incidences, thus including both ethnic wars and non-ethnic civil conflicts. This is not only theoretically inconsistent, but can also seriously bias estimations. Moreover, in most of their regressions, they use OLS rather than logit or probit models to estimate the impact of oil concentration upon the onset of civil war, even though it is widely acknowledged that OLS estimation tends to generate smaller standard errors and thus exaggerate the significance of explanatory variables when the dependent variable is binary, multinomial, or ordinal.

Unsurprisingly, M&R’s results that supposedly show that oil concentration has a robust and significant effect upon the onset of civil war at both the country level and the group level are extremely fragile, contrary to their claims of robustness, as our replications of their results clearly show (see Appendix D in supplementary material for details).

3.2 Asal et al. (2016)

On the surface, our study is also quite similar to Asal et al. (2016). Upon closer inspection, however, it is evident that our study improves upon Asal et al. (2016) both theoretically and empirically.

First, our theorization is both more integrated and more fine-grained than theirs. Contrary to Asal et al. (2016), we do not hold that a group has to be excluded from the center to rebel. This is so because a minority group, by definition, is almost always under-represented in the center, and ex ante grievance can amplify the degree of a minority group’s feeling of being dominated or excluded. We also explicitly differentiate not only ethnic civil war from non-ethnic civil war but also secessionist ethnic war from ethnic infighting. We then theorize that oil located within the core territory of a minority group impacts only secessionist ethnic war, but not non-ethnic civil war or ethnic infighting (see Sect. 2).

Second, and consistent with our theory, we provide more fine-grained results, showing that oil located within the core territory of a minority group impacts only secessionist ethnic war, but not non-ethnic civil war or ethnic infighting (see Sect. 4).

Third, although Asal et al. (2016) hypothesized that the interactive term between political exclusion and oil location drives the onset of ethnic war, they initially failed to obtain such clear-cut results (10–11). As a result, they had to use a less straightforward technique to obtain results that can support their hypotheses. In contrast, our empirical results have consistently supported our hypotheses (see Sect. 4). In addition, for some reason, Asal et al. (2016) did not report country-level results.

Finally, econometric analyses by Asal et al. (2016) leave much room for improvement, to say the least. For instance, they did not perform regressions for rare events. In contrast, we have performed a variety of robustness checks and obtained positive results consistently.

3.3 Summary

Despite their various merits, the above-mentioned two studies have some shortcomings. In contrast, our empirical exercises are consistent with our theoretical exercises, because the ethno-geographical location of oil, our key explanatory variable, is firmly underpinned by our new theory. Moreover, our key explanatory variable is almost entirely exogenous, clear-cut, and non-arbitrary, and it has few missing data points. As becomes clear in Sect. 4, our key explanatory variable, oil location, is significant across all specifications with an array of control variables, at both the country level and the group level.

4 Quantitative Data and Results

Humphreys (2005, pp. 518–522) complained that earlier indicators of oil (and natural resources) are simply too aggregated for differentiating the different mechanisms through which natural resources impact civil war. Accordingly, Humphreys suggests that one solution is to construct more fine-grained indicators of oil (and civil war) that allow for finer resolution of mechanisms. We concur with Humphreys (2005) but go further. We believe that a more valid solution is to construct clear-cut and non-arbitrary measurements of oil (and natural resources), informed by rigorous theorizing. Existing indicators of oil, including oil rent, oil versus export (Collier and Hoeffler 2004), oil export dummy (Fearon and Laitin 2003), (local) oil production (Humphreys 2005; Hunziker and Cederman 2012; Sorens 2011), 100 US$ per capita oil income as the cutoff point (Ross 2006, 2012), or relative distribution/concentration (M&R 2015), are endogenous to conflict, arbitrary, or invalid. Moreover, these indicators are not underpinned by a sophisticated theory that links oil with the onset of ethnic war.

Our theory explicitly posits that it is whether a significant amount of oil is located within the core territory of a (subordinate) minority group that really connects oil with the onset of ethnic war. We construct two new explanatory variables to capture this ethno-geographical location of oil. Our two explanatory variables are not only clear-cut dichotomous variables with few missing data points, but also avoid most, if not all, of the pitfalls associated with earlier indicators of oil (e.g., endogeneity, arbitrariness).

4.1 Sample and Key Explanatory Variables

Following our theory, we restrict our sample to multiethnic countries at the country level and to minority groups that are concentrated within a core territory at the group level.

At the country level, we rely on the Geo-Ethnic Power Relations (GeoEPR-ETH) data set by Wucherpfennig et al. (2011), which is the GIS-informed version of the Ethnic Power Relations (EPR) data set (Cederman et al. 2009; Wimmer et al. 2009). Countries with a homogenous population (e.g., the two Koreas) are excluded from the GeoEPR-ETH data set, because they, by definition, cannot experience ethnic war. The GeoEPR-ETH data set covers 125 multiethnic countries with a population of more than half a million, including 110 onsets of ethnic war from 1946 to 2005. Our final data sets cover the same 125 countries from 1946 to 2005 (group level) and from 1946 to 2010 (country level).

At the (ethnic) group level, following our theory and Wucherpfennig et al. (2011, pp. 428–429), we restrict our sample to minority ethnic groups with a territorial base, that is, minority groups that are coded as 1 (regionally based), 3 (regional and urban), and 6 (aggregate) in the GeoEPR-ETH data set (Wucherpfennig et al. 2011; Bormann 2011). Our final data set covers 491 minority groups within the 125 multiethnic countries.

Within these samples, following our theory, we construct two new explanatory variables, one at the group level and one at the country level.

At the group level (variable name: oil location, group), for each minority ethnic group with a territorial base, we code it 1 when a group’s core territory has oil and 0 when it does not.

At the country level (variable name: oil location, country), we code it 1 when a multiethnic country has oil located within core territories of minority group(s) and 0 when (1) a country does not have any oil, (2) a country has oil but its oil is located within the territory of the dominant majority group, or (3) a country, despite being multiethnic, has no minority group concentrated within a particular region (i.e., the country is evenly occupied by a mixture of ethnic groups). At the country level, we also restrict our sample to only countries with oil as a robustness check. Results with this smaller sample are essentially identical to results with the sample of all countries (see Table 4).
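The coding rules for the two variables can be summarized in a minimal sketch. The record layout (the keys `is_minority`, `type_code`, and `has_oil`) and the example countries are hypothetical and are not taken from the GeoEPR-ETH files; only the settlement codes 1, 3, and 6 come from the sample restriction described above.

```python
# Settlement codes that define a territorial base, as stated in the text.
TERRITORIAL_TYPES = {1, 3, 6}


def in_group_sample(group):
    """Minority groups with a territorial base form the group-level sample."""
    return group["is_minority"] and group["type_code"] in TERRITORIAL_TYPES


def oil_location_group(group):
    """1 if the group's core territory has oil, 0 if not, None if unsampled."""
    if not in_group_sample(group):
        return None
    return 1 if group["has_oil"] else 0


def oil_location_country(groups):
    """1 if any territorially based minority group has oil, otherwise 0.

    The 0 branch covers the three cases in the text: no oil at all, oil only
    in the dominant majority's territory, and no regionally concentrated
    minority group.
    """
    return 1 if any(oil_location_group(g) == 1 for g in groups) else 0


# Hypothetical countries (field values are invented for illustration).
majority_with_oil = {"is_minority": False, "type_code": 1, "has_oil": True}
minority_with_oil = {"is_minority": True, "type_code": 1, "has_oil": True}
minority_no_oil = {"is_minority": True, "type_code": 3, "has_oil": False}
# type_code 5 stands in for any code outside {1, 3, 6}.
unconcentrated_minority = {"is_minority": True, "type_code": 5, "has_oil": False}

country_minority_oil = [majority_with_oil, minority_with_oil]
country_majority_oil = [majority_with_oil, minority_no_oil]
country_no_concentration = [majority_with_oil, unconcentrated_minority]

print(oil_location_country(country_minority_oil))      # coded 1
print(oil_location_country(country_majority_oil))      # coded 0, case (2)
print(oil_location_country(country_no_concentration))  # coded 0, case (3)
```

Deriving the country-level variable from the group-level one keeps the two indicators consistent by construction, although the actual coding may of course have been carried out differently.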

We construct the two key explanatory variables by merging the information on the geographical location of ethnic groups in the GeoEPR-ETH with either the information on the geographical location of oil basins from the U.S. Geological Survey (hereafter, USGS) or the information on the geographical location of actual oil fields in the PETRODATA data set by Lujala et al. (2007). By so doing, we are able to assign most major oil basins or major oil fields to specific geographical locations and then determine whether oil basins cover the core territory of a minority group (as opposed to covering only the core territory of a majority group, or there being no concentration of minority groups), or whether oil fields are located within the core territory of a minority group.
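The overlay step can be sketched in a deliberately simplified form. Real GIS data use polygons and a spatial join; the sketch below substitutes axis-aligned bounding boxes so that it stays self-contained. The `Box` type, the coordinates, and the helper names are all hypothetical and do not reproduce the procedure actually applied to the GeoEPR-ETH, USGS, or PETRODATA files.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box standing in for a GIS polygon (toy model)."""
    west: float
    south: float
    east: float
    north: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of positive area."""
    return (a.west < b.east and b.west < a.east
            and a.south < b.north and b.south < a.north)


def has_oil(territory: Box, oil_features: list) -> bool:
    """True when any oil basin or field overlaps the group's core territory."""
    return any(overlaps(territory, feature) for feature in oil_features)


# Hypothetical coordinates for one country.
minority_territory = Box(west=0.0, south=0.0, east=2.0, north=2.0)
majority_territory = Box(west=2.0, south=0.0, east=6.0, north=4.0)
oil_basins = [Box(west=1.0, south=1.0, east=1.5, north=1.5)]

print(has_oil(minority_territory, oil_basins))  # basin lies in minority area
print(has_oil(majority_territory, oil_basins))  # no overlap with majority
```

With real data, the same yes/no question would be answered by a polygon intersection test in a GIS package; the dichotomous outcome is what feeds the oil location variables.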

The reason why we have two sets of key explanatory variables, one underpinned by data on oil basins and the other by data on actual oil fields, requires a bit of explanation.

USGS provides the most complete coverage on all known oil basins in the world. Yet, oil basin does not mean actual oil deposit, and certainly does not mean actual oil discovery and production. Thus, using the USGS data on oil basin also allows us to avoid at least some endogeneity in oil discovery and production (Melando 2014; cf. M&R 2015; Asal et al. 2016). Moreover, the USGS data should overestimate the number of (minority) ethnic groups with actual oil field(s). Logically, this means that the explanatory variable constructed from the USGS data (oil location, USGS) should point to a diminished impact on the onset of ethnic war by the ethno-geographical location of oil. Consequently, if our hypotheses are supported even with USGS data, we shall have great confidence that the ethno-geographical location of oil is positively associated with the onset of ethnic war in the real world.

The PETRODATA data set by Lujala et al. (2007) represents a major advancement: for the first time, geographical information of oil–gas fields based on GIS data has been brought into the study of civil conflict. The basic entry unit in the PETRODATA data set is an oil (and gas) field with either a discovery date or a production date.

The PETRODATA data set, however, suffers from a severe problem of missing data. Within the data set, of the 891 entries of oil fields, only 540 (60.6%) have data on discovery time, and only 309 (34.8%) have data on production time.Footnote 9 For our purpose here, this severe problem of missing data means that we cannot possibly know whether the PETRODATA data set underestimates or overestimates the number of (minority) ethnic groups with actual oil fields located in their core territories. Yet, if our quantitative hypotheses are also supported by the PETRODATA data set, in addition to being supported by the USGS data, we shall have great confidence that our hypotheses hold in the real world. We thus use the results with the PETRODATA data set as additional robustness checks (reported in Appendix C in supplementary material). Here, suffice it to say that results with the PETRODATA data set are almost identical to the results with the USGS data set (see immediately below): both sets of results strongly support our quantitative hypotheses.

To improve our coding of the two key explanatory variables based on the PETRODATA data set, we also employ information within the large map collections from the University of Texas Perry-Castaneda Library Map Collections. Some of the high resolution maps from this collection provide not only information on geographical information of ethnic groups but also geographical information of major oil and gas fields. Whenever possible, we also use other information (e.g., Li 2011; Petroleum Economists by World Energy Atlas, Oil and Gas Journal, and internet) to further improve the PETRODATA data set.Footnote 10

We admit one key drawback of our key explanatory variable (i.e., oil location): it is a dichotomous variable that is often time-invariant. As both a compensating measure and a robustness check, we multiply it by oil price from 1946 to 2005 (for group level) or 2010 (for country level), in nominal dollars per million British Thermal Units of natural gas priced at the Henry Hub in Louisiana compiled by Ross and Mahdavi (2015) from British Petroleum (BP) Statistical Review and the Economist Intelligence Unit (EIU). We reason that oil price is mostly exogenous to any specific ethnic or non-ethnic civil war within a country. Instead, oil price has mostly been driven by world consumption and overall production and by key interstate wars and their aftermath (e.g., the Fourth Israel–Arab War, Iraq’s invasion of Kuwait, and U.S. invasion of Iraq in 2003), not by any particular ethnic or non-ethnic civil war. We thus use the product of oil location and oil price as another key explanatory variable for another set of robustness checks (see Appendix B in supplementary material). Here, suffice it to say that using the product of oil location and oil price as the key explanatory variable yields almost identical results to models with oil location as the key explanatory variable.
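The construction of this interaction term can be sketched as follows. This is only a minimal illustration: the price values, group labels, and the names `OIL_PRICE`, `group_years`, and `add_price_interaction` are hypothetical and are not taken from the Ross and Mahdavi (2015) series or from our data.

```python
# Hypothetical illustration: interact a (mostly time-invariant) oil-location
# dummy with a yearly oil price to obtain a time-varying regressor.

# Made-up prices for three years (NOT the Ross and Mahdavi 2015 series).
OIL_PRICE = {1972: 2.0, 1973: 3.0, 1974: 11.0}

# Group-year records; oil_location is 1 if oil lies in the group's core territory.
group_years = [
    {"group": "A", "year": 1972, "oil_location": 1},
    {"group": "A", "year": 1974, "oil_location": 1},
    {"group": "B", "year": 1974, "oil_location": 0},
]


def add_price_interaction(rows, prices):
    """Return copies of the rows with oil_location * oil price added."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["oil_loc_x_price"] = row["oil_location"] * prices[row["year"]]
        out.append(new_row)
    return out


augmented = add_price_interaction(group_years, OIL_PRICE)
for row in augmented:
    print(row["group"], row["year"], row["oil_loc_x_price"])
```

For a group without oil the product is always zero, so all of the added variation over time comes from groups (or countries) whose oil location is coded as 1.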

We, however, refrain from artificially making our key explanatory variables more time-variant by multiplying them by a country’s oil production (per capita), rent, or value, because these instruments of oil suffer from serious endogeneity problems with ethnic and non-ethnic civil war (Humphreys 2005; Ross 2006, 2012; for earlier empirical evidence, Brunnschweiler and Bulte 2009; Mitchell and Thies 2012). Indeed, our own results reported below also indicate that at least at the national level, oil production per capita is not significantly associated with the onset of ethnic war after controlling for oil location.

We believe that our key explanatory variables, short of more accurate information on the date of discovery and production or the amount of oil extracted from the core territory of minority groups (which is difficult, if not entirely impossible, to obtain), represent the most accurate and comprehensive indicators on the ethno-geographical location of oil so far.Footnote 11

4.2 Dependent Variables and Control Variables

Our key dependent variable is the onset of ethnic war. Our dependent variables also have two levels: group and country. To facilitate interpretation and comparison with results reported by earlier studies, in addition to the coding of the onset of civil war in the UCDP/PRIO Armed Conflicts Data set (ACD data set, version 4-2009; for details on the ACD data set, see Gleditsch et al. 2002; Harbom and Wallensteen 2009), we also adopt the coding of the EPR data set. The EPR data set not only singles out the onset of ethnic war (variable name: Ethonset), but also refines the coding of ACD by further differentiating ethnic war into infighting ethnic war between groups within the central government and rebellion ethnic war by groups not within the central government (Wimmer et al. 2009; Cederman et al. 2010). According to our theory, we shall expect the ethno-geographical location of oil to be significantly associated with only ethnic or secessionist civil war but not with governmental civil war at the group level (H1A–G and H1B–G). At the country level, we shall expect the ethno-geographical location of oil to be significantly associated with only rebellion ethnic wars but not with infighting ethnic wars (H2A–C and H2B–C). Due to space limitations, descriptions of control variables at both the group level and the country level are found in Appendix A in supplementary material.

Following the recommendations of Beck et al. (1998) and Carter and Signorino (2010), cubic splines for peace years are introduced to reduce possible biases in working with binary or multinomial panel data at both the group level and the country level, although they are not shown in the tables for the sake of space.
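How a peace-years counter is built can be sketched as below. Two assumptions should be noted: the counting convention (the number of consecutive onset-free years before the current year, reset after an onset) is one common choice and may differ from the exact coding in our data, and the sketch uses the simpler cubic polynomial of peace years proposed by Carter and Signorino (2010) as a stand-in for the cubic splines used in our models. The function names are hypothetical.

```python
def peace_years(onsets):
    """Count consecutive onset-free years preceding each observation.

    onsets: sequence of 0/1 flags for one unit, ordered by year.
    The counter starts at 0 and is reset to 0 in the year after an onset.
    """
    counter = 0
    out = []
    for onset in onsets:
        out.append(counter)
        counter = 0 if onset else counter + 1
    return out


def cubic_terms(t):
    """Cubic polynomial of peace years: (t, t^2, t^3)."""
    return (t, t ** 2, t ** 3)


series = [0, 0, 1, 0, 0]  # hypothetical onset history for one group
years_at_peace = peace_years(series)
duration_controls = [cubic_terms(t) for t in years_at_peace]
print(years_at_peace)
print(duration_controls)
```

The resulting terms enter the logit models as additional regressors that absorb duration dependence in the risk of onset.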

All descriptive statistics and explanations of the key independent variables, dependent variables, and control variables are in Appendix A in supplementary material.

4.3 Model Specifications and Results

At the group level, our main specification function is:

$${\text{Onset}}_{i,t} = \alpha {\text{Oil}}\_{\text{Loc}}_{i,t} + \beta {\text{Group}}_{i} + \gamma {\text{Year}}_{t} + \varepsilon_{i,t} ,$$
(1)

where \({\text{Onset}}_{i,t}\) is the onset of ethnic war with a group i in year t, \({\text{Oil}}\_{\text{Loc}}_{i,t}\) is the location of oil at the group level or the country level (our key independent variable), \({\text{Group}}_{i}\) is the group fixed effect, \({\text{Year}}_{t}\) is the year fixed effect, and \(\varepsilon_{i,t}\) is the error term. Standard errors are clustered by group.

At the country level, our main specification function is:

$${\text{Onset}}_{i,t} = \alpha {\text{Oil}}\_{\text{Loc}}_{i,t} + \beta {\text{Country}}_{i} + \gamma {\text{Year}}_{t} + \varepsilon_{i,t} ,$$
(2)

where \({\text{Onset}}_{i,t}\) is the onset of ethnic war within a country i in year t, \({\text{Oil}}\_{\text{Loc}}_{i,t}\) is the location of oil at the country level (the key independent variable), \({\text{Country}}_{i}\) is the country fixed effect, \({\text{Year}}_{t}\) is the year fixed effect, and \(\varepsilon_{i,t}\) is the error term. Standard errors are clustered by country.
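Because the outcome is binary and the models are estimated by logistic regression, Eqs. (1) and (2) are best read as the linear index of a logit model. As a sketch, assuming the standard logit link (the symbol \(\Lambda\) is introduced here only for illustration), the group-level specification corresponds to:

$$\Pr \left( {{\text{Onset}}_{i,t} = 1} \right) = \Lambda \left( {\alpha {\text{Oil}}\_{\text{Loc}}_{i,t} + \beta {\text{Group}}_{i} + \gamma {\text{Year}}_{t} } \right),\quad \Lambda \left( z \right) = \frac{1}{{1 + e^{ - z} }},$$

with \({\text{Country}}_{i}\) replacing \({\text{Group}}_{i}\) at the country level. Under this reading, \(e^{\alpha }\) is the odds ratio associated with oil location being coded as 1 rather than 0.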

Because our dependent variables are either dichotomous or trichotomous, we employ binary or multinomial logistic regression throughout. Because results at the group level drive results at the country level, we present results at the group level first, followed by results at the country level. In the main text, we report only regular logit results with a minimal number of tables. More robustness checks, including those with penalized maximum likelihood logistic regression that checks for rare-events bias (Firth 1993),Footnote 12 are reported in the online appendixes. Here, suffice it to note that our results are extremely robust.
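The intuition behind the penalized likelihood check can be illustrated in the simplest possible setting. The sketch below assumes a logit model with only an intercept and one binary regressor; in that special case, the Firth (1993) penalized estimate coincides with adding 0.5 to each cell of the 2×2 table of regressor by outcome. The general estimator used in the appendixes requires iterative fitting and is not reproduced here. The counts and function names are hypothetical.

```python
def odds_ratio_mle(a, b, c, d):
    """Ordinary maximum-likelihood odds ratio for a 2x2 table.

    a: onsets with x = 1,      b: non-onsets with x = 1,
    c: onsets with x = 0,      d: non-onsets with x = 0.
    """
    return (a * d) / (b * c)


def odds_ratio_firth(a, b, c, d):
    """Firth-type estimate for the same table (special case only):
    add 0.5 to every cell before taking the cross-product ratio."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical rare-event table with an empty cell (no onsets when x = 1).
a, b, c, d = 0, 200, 5, 1795

mle = odds_ratio_mle(a, b, c, d)      # degenerate: exactly zero
firth = odds_ratio_firth(a, b, c, d)  # finite and strictly positive
print(mle, firth)
```

With an empty cell the ordinary estimate degenerates (its logarithm, the logit coefficient, is not finite), whereas the penalized estimate remains finite, which is why such checks are useful when onsets are rare.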

Table 1 shows results at the group level with the key explanatory variable being oil location constructed from USGS data [denoted as oil location (USGS, group)]. In model 1 (the baseline model), we first test the explanatory power of oil location (USGS, group) when the dependent variable is onset of ethnic war as recoded by the EPR data set, with a minimal number of control variables to facilitate interpretation (Ray 2003). As expected, oil location (USGS, group) is positively and significantly associated with the onset of ethnic war (model 1). In terms of odds ratio, model 1 suggests that the odds that a minority group with oil located in its core territory will rebel against a state are 1.92 times the odds that a minority group without oil will rebel. This result holds in model 2 even after we add a battery of control variables conventionally included in other group-level studies (e.g., Cederman et al. 2010; Weidmann et al. 2010; Wucherpfennig et al. 2011).
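To make the magnitude of this odds ratio concrete, the following sketch converts it into a logit coefficient and into an implied change in probability. An odds ratio multiplies odds rather than probabilities, although the two nearly coincide when onsets are rare. The baseline probability `p0` is a hypothetical value chosen for illustration, not an estimate from our data, and the function names are ours.

```python
import math


def prob_from_odds(odds):
    """Convert odds into a probability."""
    return odds / (1.0 + odds)


def shifted_probability(p0, odds_ratio):
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    odds0 = p0 / (1.0 - p0)
    return prob_from_odds(odds0 * odds_ratio)


ODDS_RATIO = 1.92            # reported for model 1 of Table 1
coef = math.log(ODDS_RATIO)  # implied logit coefficient on oil location

p0 = 0.01                    # hypothetical baseline onset probability
p1 = shifted_probability(p0, ODDS_RATIO)
print(round(coef, 3), round(p1, 4), round(p1 / p0, 2))
```

At such a low baseline, the ratio of probabilities is only slightly smaller than the odds ratio itself; the gap widens as the baseline probability grows.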

Table 1 Ethno-geographical location of oil and the onset of ethnic conflict, group-level, with USGS data set (dependent variable is onset of ethnic conflict experienced by a group, 1945–2005)

The more interesting results appear in models 3 and 4, which are multinomial logit models. After dividing civil war within the ACD data set into two types, territorial (or ethnic secessionist) and governmental (i.e., aiming for the control of central government, thus non-secessionist), striking results emerge: whereas oil location (USGS, group) remains positively and significantly associated with the onset of ethnic civil war, it is not significantly associated with the onset of governmental (i.e., non-ethnic) civil war. In terms of relative risk ratio, model 3 suggests that in a given group year, groups with oil location coded as 1 are 3.15 times more likely to experience an onset of territorial (ethnic) war than groups with oil location coded as 0 but are only 1.14 times more likely to experience an onset of governmental (non-ethnic) war than groups with oil location coded as 0.
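How relative risk ratios from a multinomial logit translate into outcome probabilities can be sketched as follows. Each ratio rescales the risk of one outcome relative to the base outcome (here, no onset), and the probabilities are then renormalized to sum to one. The baseline probabilities are hypothetical values for illustration only, and the function name `apply_rrr` is ours.

```python
def apply_rrr(baseline, rrr, base="none"):
    """Shift multinomial outcome probabilities by relative risk ratios.

    baseline: outcome -> probability when oil location is 0 (includes base).
    rrr: non-base outcome -> relative risk ratio for oil location = 1.
    """
    weights = {base: 1.0}
    for outcome, ratio in rrr.items():
        # risk relative to the base outcome, scaled by the reported ratio
        weights[outcome] = (baseline[outcome] / baseline[base]) * ratio
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


# Hypothetical baseline probabilities for a group-year without oil.
baseline = {"none": 0.98, "territorial": 0.01, "governmental": 0.01}
# Relative risk ratios reported for model 3 of Table 1.
rrr = {"territorial": 3.15, "governmental": 1.14}

with_oil = apply_rrr(baseline, rrr)
for outcome, prob in with_oil.items():
    print(outcome, round(prob, 4))
```

Under these illustrative baseline values, the probability of a territorial onset rises substantially while that of a governmental onset barely moves, mirroring the contrast between the two ratios.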

These results strongly support the core argument of our theory that the ethno-geographical location of oil impacts only the onset of secessionist/ethnic war but not non-ethnic or non-secessionist (i.e., governmental) civil wars. As far as we can tell, we are the first to report such strikingly differentiating results when using oil as a key explanatory variable in the study of ethnic civil war at the global level. These results also strongly support the notion that ethnic civil wars have some fundamental differences from non-ethnic civil wars (e.g., Horowitz 1985; Wimmer 2013), and it may not always be wise to study them together as if they were fundamentally similar (e.g., Fearon and Laitin 1996; Walter 2001).

In Table 2, we focus on the onset of ethnic war at the country level, as recoded by the EPR data set. Again, the results strongly support our theory and hypotheses. Model 1 is our baseline model, and our key explanatory variable oil location (USGS, country) is positively and significantly associated with onset of ethnic war. In terms of odds ratio, model 1 of Table 2 suggests that the odds that a country with oil location coded as 1 will experience an onset of ethnic war are 4.19 times the odds for a country with oil location coded as 0. Again, this result holds as we progressively add more control variables (models 2 and 3). In models 4 and 5, we drop ongoing wars, and oil location (USGS, country) remains positively and significantly associated with the onset of ethnic war. Our hypothesis H2A-C is thus strongly supported. Note, however, that oil production per capita is insignificant.

Table 2 Ethno-geographical location of oil and the onset of ethnic conflict, country-level (dependent variable is onset of ethnic conflict experienced by a country, 1946–2010)

In Table 3, we move to more fine-grained analyses of onset of ethnic war, again at the country level. Here, ethnic conflict is divided into two categories: ethnic infighting among power-holders (infighting) and rebellion (i.e., an excluded ethnic group rebels against the state), according to the EPR data set. Again, the results strongly support our theory and hypotheses. In model 1 (the baseline model), oil location (USGS, country) remains positively and significantly associated with the onset of ethnic rebellion but not with the onset of infighting. In terms of relative risk ratio, model 1 suggests that in a given country year, countries with oil location coded as 1 are six times more likely to experience an onset of rebellion ethnic war than countries with oil location coded as 0 but are only 1.6 times more likely to experience an onset of infighting ethnic war than countries with oil location coded as 0. Again, the overall result holds as we add more and more control variables progressively in model 2 and model 3.

Table 3 Ethno-geographical location of oil and the onset of ethnic conflict, country-level (dependent variable is onset of ethnic conflict experienced by a country, 1946–2010, differentiated into ethnic infighting and ethnic rebellion, according to EPR)

Thus, our key explanatory variable (i.e., oil location) captures the different impact of oil location upon two different types of ethnic war (i.e., infighting vs. rebellion). Again, as far as we can tell, we are the first to report such strikingly differentiating results when using oil as a key explanatory variable in the study of ethnic war. Our hypothesis H2B-C is strongly supported. Again, note that oil production per capita is insignificant.

In Table 4, we restrict the sample to country-years with actual oil production (i.e., country-years without actual oil production are dropped from the sample) and replicate the models in Tables 2 and 3. Results with a reduced sample are mostly consistent with results in Tables 2 and 3. When the dependent variable is the onset of ethnic war at the country level, oil location retains its significance although the level of significance is reduced after adding more control variables (model 1 and model 2). When ethnic war is differentiated into infighting and rebellion/secessionist, oil location is only significantly associated with rebellion ethnic wars in the baseline model (model 3), although it becomes significant with both infighting and rebellion after more control variables are added (model 4).

Table 4 Replication of country-level results within only oil producing countries (dependent variable is onset of ethnic conflict experienced by a country, 1946–2010, differentiated into ethnic infighting and ethnic rebellion, according to EPR)

At the group level and the country level, the signs of the control variables are highly consistent with the results reported by Cederman et al. (2010), Wucherpfennig et al. (2011) and Wimmer et al. (2009). Variables that are significant in their regressions remain significant in our regressions, and their directions remain steady. For instance, at the group level (Table 1), both war history (i.e., previous conflict) and distance between the core territory of a minority group and a state’s capital (cap distance) are positively and significantly associated with onset of ethnic war, whereas the nearest distance between the core territory of a minority group and a foreign country (border distance) is negatively and significantly associated with onset of ethnic war. Two political variables from the EPR data set that have been consistently found to be positively associated with onset of ethnic war (Cederman et al. 2010; Weidmann et al. 2010; Wucherpfennig et al. 2011), “excluded” and “downgraded2”, are also significantly and positively associated with onset of ethnic war at the group level in our regressions (see also Asal et al. 2016). Meanwhile, at both the group level and the country level, GDP per capita is significantly and negatively associated with the onset of ethnic war. Interestingly, at the country level, after controlling for the ethno-geographical location of oil, oil production per capita is no longer significant (Tables 2, 3, 4). This result further strengthens our core argument that it is the ethno-geographical location of oil rather than oil production per se that truly connects oil and the onset of ethnic war.

We perform several sets of robustness checks, and these results are reported in Appendixes B and C in supplementary material, respectively. In Appendix B of supplementary material, we check our results with USGS by employing the product of oil location and oil price as the key explanatory variable. We also check our results using penalized maximum likelihood logistic regression that checks for rare-events bias (Firth 1993). In Appendix C of supplementary material, we replicate all the regressions with the PETRODATA data set and obtain results almost identical to those with the USGS data. As noted above, this fact should give us great confidence that our empirical hypotheses hold in the real world.

4.4 Summary

To summarize, quantitative evidence based on our new indicators strongly supports our quantitative hypotheses. At the group level, the ethno-geographical location of oil is strongly and positively associated with the onset of secessionist ethnic war, but not with the onset of non-ethnic (i.e., governmental) ones. At the country level, the ethno-geographical location of oil is strongly and positively associated with the onset of rebellion ethnic war, but not with the onset of ethnic infighting. To our knowledge, we are the first to report such fine-grained and conclusive evidence regarding oil and the onset of ethnic war with a global data set. These results are highly robust across a wide variety of robustness checks (see Appendixes B and C in supplementary material for details). By comparison, earlier results reported by M&R’s (2015) study are quite fragile (see Appendix D in supplementary material for details), most likely due to the questionable logic of their key explanatory variable (i.e., Oil Gini and R1/R1 + R2). In addition, results reported by Asal et al. (2016) are not as fine-grained as what we report here (see also Paine 2016).

5 Discussion and Conclusion

(Formerly united) Sudan and Nigeria are similar on at least one front: although both have plenty of oil, most of the oil in both countries is located in the core territories of subordinate minority groups. Both countries have experienced bloody secessionist ethnic wars, and one of them ended in the formal breakup of the country (Sudan). Indonesia is another oil-rich country, and most of its oil is located within the territory controlled by its two dominant majority groups (Javanese and Sundanese). Even though only a small proportion of its gas reserve is located within the province of Aceh populated by a subordinate minority group (the Acehnese), Indonesia too has experienced a savage ethnic war. Yet in Gabon, another African country with plenty of oil, significant ethnic diversity, and little democracy, but an even distribution of ethnic groups, ethnic peace has prevailed so far. Thus, although all four of these multiethnic countries are major oil producers, their encounters with the specter of ethnic war have differed greatly. The impact of oil on ethnic war is thus conditional, rather than an inescapable “(oil) curse”.

We have advanced a more integrated and fine-grained theory regarding the conditional association between oil and the onset of ethnic war, singling out the ethno-geographical location of oil as the key in linking oil with the onset of ethnic war. We then present quantitative evidence to support our theory. Along the way, we correct key shortcomings of several recent studies. To our knowledge, we are the first to report such fine-grained and conclusive evidence regarding the ethno-geographical location of oil and the onset of ethnic war. Together with evidence from comparative case studies with process-tracing in an accompanying paper (Tang et al. 2017), which demonstrates that the mechanisms singled out by our theory really did operate in driving ethnic wars, we provide a more complete and close-to-definitive answer to the question whether and how oil is associated with the onset of ethnic war. Our theory and empirical results hold important implications for understanding the “oil curse” (and the broader “natural resource curse”) and ethnic war in general.Footnote 13

First and foremost, our theory and evidence point to a broader theory regarding mineral resources and ethnic war. Whenever a significant chunk of commodity-type mineral resources (oil, gas, copper, gold, and diamonds) is located in the core territory of a minority group, that group is more likely to rebel, especially if the group has been marginalized or dominated by the central government, all else being equal. As such, a state with significant commodity-type mineral resources located within the core territories of minority groups is more likely to experience ethnic war, all else being equal. We are now in the process of extending our theory in this direction with more systematic evidence.

Second, although we do not engage with the other famous thesis linking oil (and other commodities) with civil war, that is, the “weak state (capacity) thesis” by Fearon and Laitin (2003), our studies cast some doubt on it. Our studies suggest that “weak state capacity” may not be the primary channel or mechanism through which oil impacts ethnic war (see also Smith 2014). Our studies, however, do point to some possible directions for investigating the interaction between state capacity and the ethno-geographical location of oil and other commodity-type mineral resources and how this interaction impacts ethnic conflict. For instance, admitting that state capacity is endogenous to both ethnic civil war and non-ethnic civil war (Thies 2010), it will be interesting to examine whether states may intentionally devote fewer resources to necessary infrastructure in restless ethnic minority regions with plenty of oil and how such different choices by states impact ethnic conflict.

Third, our theory and empirical results from case studies (Tang et al. 2017) reinforce the argument that understanding ethnic war requires not only more disaggregated and fine-grained analyses with geographical information but also careful case studies with process-tracing (Sambanis 2004; Ross 2004a; Smith 2014). Even with GIS data, purely quantitative exercises cannot really differentiate genuinely positive cases from false positive cases. For instance, in most quantitative exercises, whether according to oil income or rent at the national level or according to the “Oil Gini”, and even our own oil location, the two Chechnya wars would have been identified as positive cases that suggest a link between oil and ethnic war. Yet, the two Chechnya wars had little to do with oil located within Chechnya. The Chechens rebelled not because of oil but because of their desire to become independent again (Tang et al. 2017).