## Abstract

Civil conflicts, which have been much more prevalent than inter-state conflicts over the last fifty years, vary enormously in their intensity, with some causing millions of deaths and some far fewer. The central goal of this paper is to test an argument from previous theoretical research that high inequality within an ethnic group can make inter-ethnic conflict *more violent* because such inequality decreases the opportunity cost to poor group members of fighting, and also decreases the opportunity cost to rich group members of funding the conflict. To test this argument, we create a new data set that uses individual-level surveys to measure inequality within ethnic groups. The analysis using these data provides strong evidence for the importance of within-group inequality, and thus underscores the value of focusing on the *capacity* of groups to fight if one wishes to limit the destruction of civil conflicts.

## Notes

Gleditsch et al. (2002), for example, find that since WWII, there were 46 interstate conflicts with more than 25 battle-related deaths per year, 22 of which have killed at least 1000 over the entire history of the conflict. Over the same period, there were 181 civil conflicts with more than 25 battle-related deaths per year, and almost half of them had killed more than 1000 people.

These figures are based on the “best estimate” of the Battle Deaths dataset, version 3.1; see Lacina and Gleditsch (2005).

Previous studies have focused instead on conflict onset, which, as noted above, is not clearly linked to the ER theory. Østby et al. (2009) find a positive association between *within-region* inequality and conflict onset in 22 countries in Sub-Saharan Africa, and Kuhn and Weidmann (2015), which we discuss below, find a positive association between within-group inequality and conflict onset using a global dataset that has the ethnic group (rather than the region) as the unit of analysis.

See Sect. 5, however, which is devoted to assessing issues of reverse causation and omitted variable bias.

Notice that the Rwandan genocide is not in our dataset as our empirical investigation focuses on two-sided conflicts exclusively. We have included this example to illustrate the fact that Esteban and Ray’s theory is quite general and applies to different conflict typologies.

It is also worth noting that micro-level studies of participation emphasize that richer elites recruit the poor to fight. Brubaker and Laitin (1998), for example, argue that most ethnic leaders are well-educated and from middle-class backgrounds while the lower-ranking troops are more often poorly educated and from working-class backgrounds. In their study of Sierra Leone’s civil war, Humphreys and Weinstein (2008) find that factors such as poverty, a lack of access to education, and political alienation are good predictors of conflict participation and that they may proxy, among other factors, for a greater vulnerability to political manipulation by elites. Justino (2009) also emphasizes that poverty is a leading factor in explaining participation in ethnic conflict.

The SWIID provides comparable (country-level) Gini indices of gross and net income inequality for 173 countries from 1960 to the present and is one of the most thorough attempts to tackle the comparability challenge.

There are a number of data sets on the geographic location of groups, including the GREG, the GeoEPR and the Ethnologue. The GREG dataset (Weidmann et al. 2010) is based on the Soviet Atlas Narodov Mira. The *Ethnologue* provides information on the spatial location of linguistic groups in much of the world. The group-level studies of horizontal inequality and conflict have relied on the GeoEPR dataset, described in Wucherpfennig et al. (2011), which utilizes an expert survey to determine the identity and location of politically relevant ethnic groups (Wimmer et al. 2009).

Østby et al. (2009), for example, use survey data from the Demographic Health Surveys in 22 countries in Sub-Saharan Africa. Their study calculates the Gini coefficient for each region and their analysis finds that regions with higher levels of inequality are more likely to experience the onset of conflict. Fjelde and Gudrun (2014) provide a similar regional-level study, focusing on civil unrest rather than civil war.

KW point out that although groups might be relatively geographically segregated in the countryside, this is unlikely to be the case in urban areas. Including urban cells can therefore introduce measurement error.

The data are accessed through the ETH Zurich’s GROWup data portal (http://growup.ethz.ch). We combine data from the Ethnic Power Relations (EPR) Core Dataset 2014 (https://icr.ethz.ch/data/epr/core/), the ACD2EPR 2014 dataset on conflict (https://icr.ethz.ch/data/epr/acd2epr/) and the GeoEPR 2014 data set on group attributes, including group economic well being (https://icr.ethz.ch/data/epr/geoepr/). See “Appendix C” for further details.

Following previous literature, we drop from the sample monopoly and dominant groups as by definition these groups cannot stage rebellions against themselves. A group is classified as monopoly if the elite members hold monopoly power in the executive to the exclusion of members of all other ethnic groups. A group is classified as dominant if elite members of the group hold dominant power in the executive but there is some limited inclusion of “token” members of other groups who however do not have real influence on decision making.

The adjusted inequality measures are generated regressors and therefore standard errors that do not take this fact into account are generally invalid since they ignore the sampling variation in such regressors. Nevertheless, for the purpose of testing whether the inequality variables are significantly different from zero, the sampling variation in the generated regressors can be ignored (at least asymptotically). See Wooldridge (2002), chapters 6 and 12 for additional details.

Studies of conflict onset have employed a variety of measures of horizontal inequality. Table 6 in “Appendix A” shows that our results are robust when these alternative measures are used as controls.

XPOLITY combines 3 out of the 5 components of Polity IV and leaves out the two components (PARCOMP and PARREG) that are constructed using political violence in their definition (Vreeland 2008).

Dropping gdp from column 6 and keeping group gdp produces essentially identical results; see “Appendix A”.

More specifically, conflict-share is the average of incidence over the period 1992–2010 and is employed in regressions where the unit of analysis is the group. All time-varying variables are based on values at the beginning of the sample.

This variable is called onset_do_flag in the EPR data set, and it is missing in years where a group is involved in conflict during the years immediately following the year when conflict is initiated. Thus, the data include group-years where a group either begins involvement in conflict or is not involved in conflict. See Wucherpfennig et al. (2011) for details on conflict measures.

We have formally tested whether the coefficients for \(\textsc {G}^R\) in the onset and the intensity regressions in Tables 1 and 3 are statistically different. Since the number of observations in the onset regressions is considerably smaller, power is likely to be low. Nonetheless, tests comparing the estimates of \(\textsc {G}^R\) in columns 3 and 4 from Table 3 to those obtained in the analogous regressions in Table 1 (columns 6 and 7) deliver *p*-values of .026 and .06, respectively, depending on the specification employed, which suggests there is reasonable evidence in the data to reject equality of coefficients.

If no survey is available for a group until time *t*, observations up to *t* are set to missing.

linguistic frac. is based on the group language variables in the EPR data set and is set equal to \(1-\text {size}_{LG_1}^2-\text {size}_{LG_2}^2\), where \(\text {size}_{LG_1}\) and \(\text {size}_{LG_2}\) indicate the size of the two largest linguistic subgroups within the group. Religious fractionalization is defined similarly.
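The definition of linguistic frac. in this note can be illustrated with a minimal sketch. It assumes the two subgroup sizes are expressed as shares of the group's population (between 0 and 1); the function name and the example values are hypothetical and are not taken from the EPR data.

```python
def two_group_fractionalization(size_lg1: float, size_lg2: float) -> float:
    """Return 1 - size_lg1^2 - size_lg2^2 for the two largest subgroups.

    Assumption: both sizes are population shares of the group in [0, 1]
    and together do not exceed 1.
    """
    for share in (size_lg1, size_lg2):
        if not 0.0 <= share <= 1.0:
            raise ValueError("subgroup shares must lie in [0, 1]")
    if size_lg1 + size_lg2 > 1.0 + 1e-12:
        raise ValueError("subgroup shares must not sum to more than 1")
    return 1.0 - size_lg1 ** 2 - size_lg2 ** 2


# Hypothetical group whose two largest linguistic subgroups hold 60% and 40%.
example_value = two_group_fractionalization(0.6, 0.4)
print(round(example_value, 4))
```

A linguistically homogeneous group (shares 1 and 0) receives a value of zero under this definition, and the value rises as the two largest subgroups become smaller or more evenly sized.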

This variable is taken from the EPR data set and equals 1 if geo\(\_\)typename equals “Regionally based.”

All the variables that are introduced in the remainder of this section are taken from the recently released PRIO-GRID 2.0 (Tollefsen et al. 2016). urbanization is the variable in this dataset called urban\(\_\)gc\(\_\)mean.

rainfall is the log of \(prec\_gpcp\_mean\) in Tollefsen et al. (2016) and agriculture is \(agri\_gc\_mean*100\) in this data set.

We also added each of the other variables discussed above (relig. frac., regional, urbanization, rainfall and agriculture) to models 3–7, and the results for the group Gini remain highly robust.

This definition of \(\delta \) corresponds to the case where there is a single observable control; see Oster (2017) for details on the more general case.

The data employed for this regression have been taken from the GROWup portal (https://growup.ethz.ch) except for igi, which has been computed by Kuhn and Weidmann (2015).

The models presented in the previous tables use lags of the economic variables in order to diminish concerns about reverse causality. For consistency with the Kuhn and Weidmann (2015) and CWG approaches, variables in Table 12 are not lagged. The results are substantively similar regardless of whether the variables are lagged. Model 1 also omits 15 observations by using exclusion rules commonly used in research that employs EPR data to estimate models of civil war onset. In particular, groups are excluded if they are judged to be dominant, to have a monopoly on power, or if they are geographically dispersed (see discussion in CWG 2011). The results are essentially identical if these exclusion rules are ignored.

Kuhn and Weidmann (2015) also consider in their robustness checks an omitted variable bias analysis in a similar vein as the one discussed in the text, and obtain values of \(\delta \) that suggest that results are robust to omitted variable bias. However, they use the technique introduced by Bellows and Miguel (2009) that does not incorporate the movements in \(R^2\) in the estimation of \(\delta \). As discussed at length in Oster (2017), the omitted variable bias is proportional to the movement in coefficients *only* if movements in \(R^2\)’s are also taken into account and, thus, it is critical to introduce this term in order to have accurate results.

As in Table 10 in “Appendix A”, we have considered different values for R\(^2_\text {max}\), the maximum value of the \(R^2\) coefficient that could be obtained if all the relevant controls were included in the regressions. Only when R\(^2_\text {max}\) was set as low as 0.1 did we obtain values \(\delta \) larger than 1 in some specifications.

group size is size (epr) from the EPR data, and is the “group’s population size as a fraction of the ethnically relevant population of this group’s country.”

## References

Acemoglu, D., & Robinson, J. (2005). *Economic origins of dictatorship and democracy*. Cambridge: Cambridge University Press.

Adhvaryu, A., Fenske, J., Khanna, G., & Nyshadham, A. (2017). Resources, conflict, and economic development in Africa. In: *NBER working papers 24309*. National Bureau of Economic Research Inc.

Alesina, A., Michalopoulos, S., & Papaioannou, E. (2016). Ethnic inequality. *Journal of Political Economy*, *124*(2), 428–488.

Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on observed and unobserved variables: Assessing the effectiveness of Catholic schools. *Journal of Political Economy*, *113*(1), 151–184.

Angel, S. (2012). *Planet of cities*. Cambridge, MA: Lincoln Institute of Land Policy.

Angrist, J. D., & Pischke, J.-S. (2009). *Mostly harmless econometrics: An empiricist’s companion*. Princeton: Princeton University Press.

Arbatli, C. E., Ashraf, Q. H., Galor, O., & Klemp, M. (2018). Diversity and conflict. *Econometrica* (forthcoming).

Ashraf, Q., & Galor, O. (2013a). The “Out of Africa” hypothesis, human genetic diversity, and comparative economic development. *American Economic Review*, *103*(1), 1–46.

Ashraf, Q., & Galor, O. (2013b). Genetic diversity and the origins of cultural fragmentation. *American Economic Review*, *103*(3), 528–533.

Ashraf, Q. H., & Galor, O. (2018). The macrogenoeconomics of comparative development. *Journal of Economic Literature*, *56*(3), 1119–1155.

Baldwin, K., & Huber, J. D. (2010). Economic versus cultural differences: Forms of ethnic diversity and public goods provision. *American Political Science Review*, *104*(4), 644–662.

Bellows, J., & Miguel, E. (2009). War and local collective action in Sierra Leone. *Journal of Public Economics*, *93*(11–12), 1144–1157.

Brubaker, R., & Laitin, D. D. (1998). Ethnic and nationalist violence. *Annual Review of Sociology*, *24*, 423–452.

Cederman, L.-E., Gleditsch, K. S., & Buhaug, H. (2013). *Inequality, grievances, and civil war*. Cambridge: Cambridge University Press.

Cederman, L.-E., Weidmann, N. B., & Bormann, N.-C. (2015). Triangulating horizontal inequality: Toward improved conflict analysis. *Journal of Peace Research*, *52*(6), 806–821.

Cederman, L.-E., Weidmann, N. B., & Gleditsch, K. S. (2011). Horizontal inequalities and ethnonationalist civil war: A global comparison. *American Political Science Review*, *105*(3), 478–495.

Cederman, L.-E., Wimmer, A., & Min, B. (2010). Why do ethnic groups rebel? New data and analysis. *World Politics*, *62*(1), 87–119.

Cheibub, J. A., Gandhi, J., & Vreeland, J. R. (2010). Democracy and dictatorship revisited. *Public Choice*, *143*(1–2), 67–101.

Cramer, C. (2003). Does inequality cause conflict? *Journal of International Development*, *15*(4), 397–412.

Dahrendorf, R. (1959). *Class and class conflict in industrial society*. Stanford, CA: Stanford University Press.

Deininger, K., & Squire, L. (1996). A new data set measuring income inequality. *The World Bank Economic Review*, *10*(3), 565–591.

Desmet, K., Ortuño-Ortin, I., & Wacziarg, R. (2012). The political economy of linguistic cleavages. *Journal of Development Economics*, *97*(2), 322–338.

Doyle, M., & Sambanis, N. (2006). *Making war and building peace*. Princeton: Princeton University Press.

Esteban, J., & Ray, D. (2011). A model of ethnic conflict. *Journal of the European Economic Association*, *9*(3), 496–521.

Esteban, J., Mayoral, L., & Ray, D. (2012). Ethnicity and conflict: An empirical study. *American Economic Review*, *102*(4), 1310–1342.

Fearon, J. D. (2003). Ethnic and cultural diversity by country. *Journal of Economic Growth*, *8*(2), 195–222.

Fearon, J. D., & Laitin, D. D. (2000). Violence and the social construction of ethnic identity. *International Organization*, *54*(4), 845–877.

Fearon, J. D., & Laitin, D. D. (2003). Ethnicity, insurgency, and civil war. *American Political Science Review*, *97*(1), 75–90.

Fjelde, H., & Gudrun, Ø. (2014). Socioeconomic inequality and communal conflict: A disaggregated analysis of sub-Saharan Africa, 1990–2008. *International Interactions*, *40*(5), 737–762.

Gleditsch, N. P., Wallensteen, P., Eriksson, M., Sollenberg, M., & Strand, H. (2002). Armed conflict 1946–2001: A new data set. *Journal of Peace Research*, *39*(5), 615–637. (accessed October 1, 2010).

Gurr, T. R. (1970). *Why men rebel*. Princeton: Princeton University Press.

Gurr, T. R. (1980). *Why men rebel: Handbook of political conflict: Theory and research*. New York: Free Press.

Humphreys, M., & Weinstein, J. M. (2008). Who fights? The determinants of participation in civil war. *American Journal of Political Science*, *52*(2), 436–455.

Justino, P. (2009). Poverty and violent conflict: A micro-level perspective on the causes and duration of warfare. *Journal of Peace Research*, *46*(3), 315–333.

Kuhn, P. M., & Weidmann, N. B. (2015). Unequal we fight: Between- and within-group inequality and ethnic civil war. *Political Science Research & Methods*, *3*(3), 543–568.

Lacina, B. (2006). Explaining the severity of civil wars. *Journal of Conflict Resolution*, *50*(2), 276–289.

Lacina, B., & Gleditsch, N. P. (2005). Monitoring trends in global combat: A new dataset of battle deaths. *European Journal of Population/Revue européenne de Démographie*, *21*(2), 145–166.

Melvern, L. (2000). *A people betrayed: The role of the West in Rwanda’s genocide*. New York: Zed Books.

Mitra, A., & Ray, D. (2014). Implications of an economic theory of conflict: Hindu–Muslim violence in India. *Journal of Political Economy*, *122*(4), 719–765.

Morelli, M., & Rohner, D. (2015). Resource concentration and civil wars. *Journal of Development Economics*, *117*(C), 32–47.

Müller-Crepon, C., & Hunziker, P. (2018). New spatial data on ethnicity: Introducing SIDE. *Journal of Peace Research*, *55*(5), 687–698.

Nordhaus, W. D. (2006). Geography and macroeconomics: New data and new findings. *Proceedings of the National Academy of Sciences of the USA*, *103*(10), 3510–3517.

Østby, G., Nordås, R., & Rød, J. K. (2009). Regional inequalities and civil conflict in sub-Saharan Africa. *International Studies Quarterly*, *53*(2), 301–324.

Oster, E. (2017). Unobservable selection and coefficient stability: Theory and evidence. *Journal of Business & Economic Statistics*. https://doi.org/10.1080/07350015.2016.1227711.

Penn World Table. (2015). Dataset. Retrieved March 19, 2017 from https://pwt.sas.upenn.edu/php_site/pwt_index.php.

Polity IV. (2011). Polity IV project: Political regime characteristics and transitions, 1800–2009. http://www.systemicpeace.org/polity/polity4.htm. Accessed 1 Oct 2011.

Ross, M. (2013). The political economy of petroleum wealth. *Middle East Development Journal*, *5*(2), 1350009-1–1350009-19.

Ross, M., & Mahdavi, P. (2015). Oil and gas data, 1932–2014. https://doi.org/10.7910/DVN/ZTPW0Y, Harvard Dataverse, V2.

Sambanis, N., & Milanovic, B. (2011). Explaining the demand for sovereignty. In: *Policy research working paper series*, vol. 5888. The World Bank.

Solt, F. (2009). Standardizing the world income inequality database. *Social Science Quarterly*, *90*(2), 231–242.

Stewart, F. (2002). Horizontal inequalities: A neglected dimension of development. In: *Annual Lecture No. 5, UNU world institute for development economics research*.

Tollefsen, A. F., Strand, H., & Buhaug, H. (2012). PRIO-GRID: A unified spatial data structure. *Journal of Peace Research*, *49*(2), 363–374.

Tollefsen, A. F., Bahgat, K., Nordkvelle, J., & Buhaug, H. (June 2, 2016). PRIO-GRID v.2.0 Codebook. Retrieved June 20, 2017 from http://www.nber.org/ens/feldstein/ENSA_Sources/UCDP_PRIO/PRIO%20GRID/PRIO-GRID-Codebook.pdf.

Verwimp, P. (2005). An economic profile of peasant perpetrators of genocide. Micro-level evidence from Rwanda. *Journal of Development Economics*, *77*(2), 297–323.

Vreeland, J. R. (2008). The effect of political regime on civil war. *Journal of Conflict Resolution*, *52*(3), 401–425.

Weidmann, N. B., Rød, J. K., & Cederman, L.-E. (2010). Representing ethnic groups in space: A new dataset. *Journal of Peace Research*, *47*(4), 491–499.

Wimmer, A., Cederman, L.-E., & Min, B. (2009). Ethnic politics and armed conflict. A configurational analysis of a new global dataset. *American Sociological Review*, *74*(2), 316–337.

Wintrobe, R. (1995). Some economics of ethnic capital formation and conflict. In: A. Breton, G. Galeotti, and R. Wintrobe (Eds.), *Nationalism and Rationality* (pp. 43–70).

World Bank. (2013). Dataset. Retrieved October 17, 2015 from http://iresearch.worldbank.org/PovcalNet/index.htm?0,2.

Wooldridge, J. M. (2002). *Econometric Analysis of Cross Section and Panel Data (Second Edition)*. Cambridge: MIT Press.

Wucherpfennig, J., Metternich, N. W., Cederman, L.-E., & Gleditsch, K. S. (2012). Ethnicity, the state, and the duration of civil war. *World Politics*, *64*(1), 79–115.

Wucherpfennig, J., Weidmann, N. B., Girardin, L., Cederman, L.-E., & Wimmer, A. (2011). Politically relevant ethnic groups across space and time: Introducing the GeoEPR dataset. *Conflict Management and Peace Science*, *28*(5), 423–437.

Yanagizawa-Drott, D. (2012). *Propaganda and conflict: Theory and evidence from the Rwandan genocide*. Cambridge: Harvard University.

## Additional information

This paper is a revised version of our previous work “Inequality, Ethnicity and Civil Conflict”. John Huber is grateful for financial support from the National Science Foundation (SES-0818381). Laura Mayoral gratefully acknowledges financial support from the Generalitat de Catalunya, and the Ministry of Economy and Competitiveness Grant Number ECO2015-66883-P and from National Science Foundation Grant SES-1629370 (PI: Debraj Ray). We received helpful comments from Lars-Erik Cederman, Joan Esteban, Debraj Ray and seminar participants at various venues where this paper was presented. We also thank Sabine Flamand, Andrew Gianou and Tom Orgazalek for superb research assistance.

## Appendices

### Group inequality and conflict: additional analysis

This appendix provides the additional empirical analyses discussed in the text. Section A.1 contains the robustness tests discussed in Sect. 4 of the main text; Sect. A.2 examines the issues of correlation versus causation discussed in Sect. 5; and Sect. A.3 provides additional analysis of WGI and conflict onset.

### Robustness checks

Tables 4 and 5 re-estimate Table 1 using alternative measures of WGI. Models 1–7 in Table 4 use \(\textsc {g}^{{I}}\), the group Gini variable adjusted with the intercept approach. Model 8 uses \(\textsc {g}^U\), the unadjusted survey-based group Gini. The results for WGI are robust to using either measure. Model 8 suggests that the relationship between WGI and conflict incidence exists in the raw data and is not being driven by decisions with respect to adjusting the heterogeneous surveys, although the heterogeneity in the surveys obviously warrants making such adjustments.

The measures of WGI used in the previously discussed results are based on the average of the various surveys for a group applied to all years, even those that precede the date of the survey. Since the surveys cover a relatively short period (1992–2010) and inequality variables are known to evolve slowly, this approach is a reasonable way to maximize the available data. Nonetheless, reverse causality is a valid concern. To avoid this possibility, we re-computed the inequality variables so that in year *t* we take the mean of all available surveys in year *t* and in prior years for which surveys exist.^{Footnote 24} Since inequality changes slowly, the correlation between our baseline group Gini and the one just described (denoted as \(\textsc {g}^R\textsc {-pre}\)) is very high (\(\hbox {r}=.94\)). Table 5 replicates Table 1 using these redefined measures of inequality; despite the fact that the sample size reduces considerably, the conclusions regarding within-group inequality are generally robust.
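The re-computed measure described above can be sketched as a running mean over surveys dated at or before each year. This is a minimal illustration, not the code used to build \(\textsc {g}^R\textsc {-pre}\): the function name, the dictionary layout, and the Gini values are hypothetical, and years before a group's first survey are returned as missing.

```python
from typing import Dict, List, Optional


def pre_dated_gini(surveys: Dict[int, float],
                   years: List[int]) -> Dict[int, Optional[float]]:
    """For each year t, average the group Ginis from surveys dated <= t.

    `surveys` maps survey year -> group Gini from that survey (hypothetical
    layout). Years with no survey at or before t are set to None (missing).
    """
    result: Dict[int, Optional[float]] = {}
    for t in years:
        available = [gini for year, gini in surveys.items() if year <= t]
        result[t] = sum(available) / len(available) if available else None
    return result


# Hypothetical group with surveys in 1995 and 2003.
series = pre_dated_gini({1995: 0.40, 2003: 0.50}, [1993, 1995, 2000, 2003])
for year, value in series.items():
    print(year, value)
```

Because no survey dated after year *t* enters the value for year *t*, conflict in year *t* cannot feed back into the inequality measure through later surveys, at the cost of dropping the group-years that precede the first survey.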

Table 6 considers different measures of horizontal inequality and GDP. As noted above, group gdp and gdp (a country-level measure) are correlated at .96, and similar results are obtained regardless of whether group GDP or country GDP is included. We can see this in Model 1 of Table 6, which re-estimates model 7 in Table 1, but substitutes group gdp for gdp. The remaining models in the table re-estimate our baseline specification (model 7 in Table 1) with different measures of horizontal inequality (see “Appendix B” for definitions of these variables): models 2–3 use lineq2 (the measure of horizontal inequality in Cederman et al. 2011; Kuhn and Weidmann 2015), computed using data either from the surveys (model 2) or GECON (model 3), and models 4–5 consider the low and high variables (the measure of horizontal inequality in Cederman et al. 2015), also computed using data from the surveys (model 4) or from GECON (model 5). Across all five models, the coefficient for WGI remains positive and significant at conventional levels. Similar conclusions are obtained if one controls for group GDP rather than country GDP.

Table 7 considers a range of alternative variables and model specifications. Column 1 includes a variable measuring the net value of oil and gas exports per capita, oil/gas exports. The data are from Ross and Mahdavi (2015). Column 2 adds pol. transitions, the number of political transitions in the previous 5 years (Cheibub et al. 2010). Column 3 controls for the distance from the ethnic homeland to the country capital (dist.cap). Model 4 adds poverty2, a country-level measure of poverty (from PovcalNet, World Bank), while column 5 introduces infant mortality (from PRIO-GRID), which measures the group-level infant mortality rate. Column 6 introduces the demographic power balance between the group and the group(s) in power (from EPR), while column 7 includes all the previous variables in the same specification. Model 8 introduces country-year fixed effects and is estimated by OLS, as the optimization algorithm of the ordered logit fails to converge. The results regarding WGI remain robust to the inclusion of these additional controls.

Figure 2 illustrates graphically our empirical results and plots the relationship between conflict-share and \(\textsc {G}^R\).

### Analysis of correlation versus causation

This section takes up the issues of correlation versus causation that are discussed in Sect. 5 of the paper. The analysis has three parts. Section A.2.1 considers the relationship between WGI and other group-level variables. Section A.2.2 presents a test for omitted variable bias. And Sect. A.2.3 considers the issue of reverse causation.

#### Group-level variables’ relationship with WGI

As discussed in Sect. 5.1 in the main text, this section treats within-group inequality as the dependent variable and explores the correlation of this variable with other group-level variables. We begin by considering group-level variables already included in the baseline regression models (see Table 1). Model 1 in Table 8 presents the results where a country-group is the unit of analysis, \(\textsc {g}^{{R}}\) is the *dependent* variable, and the right-hand side variables include the group-level variables in column 5 of Table 1, as well as country fixed effects. The only variable with a precisely estimated coefficient is group GDP: not surprisingly, groups that are on average richer tend to be *less* unequal.

Next we examine whether other variables that have *not* been the focus of previous group-level conflict studies have a systematic relationship with \(\textsc {g}^{{R}}\). As discussed above, Arbatli et al. (2018) demonstrate a strong positive relationship between conflict and interpersonal diversity (proxied by genetic diversity), and they provide suggestive evidence that this relationship may be mediated by social trust. We might also expect that distrust within a group will be highest in groups with high intra-group inequality. Since within-group trust is likely to be related both to conflict and to WGI, its omission could bias our results. While we do not have measures of intra-group trust levels that would allow us to probe this issue directly, if anything, the omission of within-group trust should bias the coefficient of WGI *downwards*. The direction of the bias is given by the sign of \(\beta cov(WGI, WGTrust)/var(WGI)\), where \(\beta \) is the coefficient of within-group trust (WGTrust) in the conflict regression. We expect \(\beta \) to be positive since within-group trust should decrease within-group coordination costs, increasing a group’s capacity to wage conflicts in which they become involved. Since the term *cov*(*WGI*, *WGTrust*) is likely to be negative, the sign of the omitted variable bias is most likely negative. This would suggest that our results for WGI represent a lower bound on this variable’s estimated effect.
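The sign argument above can be checked with a small numerical sketch of the term \(\beta cov(WGI, WGTrust)/var(WGI)\). The data are invented purely for illustration (we have no measure of within-group trust), and the function name is hypothetical; the only point is that a positive \(\beta \) combined with a negative covariance yields a negative bias term.

```python
from typing import Sequence


def omitted_variable_bias(beta: float,
                          wgi: Sequence[float],
                          trust: Sequence[float]) -> float:
    """Return beta * cov(wgi, trust) / var(wgi) using sample moments."""
    n = len(wgi)
    if n != len(trust) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_wgi = sum(wgi) / n
    mean_trust = sum(trust) / n
    cov = sum((w - mean_wgi) * (t - mean_trust)
              for w, t in zip(wgi, trust)) / (n - 1)
    var = sum((w - mean_wgi) ** 2 for w in wgi) / (n - 1)
    return beta * cov / var


# Invented values: trust falls as within-group inequality rises.
wgi_values = [0.30, 0.35, 0.40, 0.45, 0.50]
trust_values = [0.80, 0.70, 0.65, 0.50, 0.45]

# beta > 0: trust is assumed to raise a group's capacity to sustain conflict.
bias = omitted_variable_bias(0.5, wgi_values, trust_values)
print("bias term is negative:", bias < 0)
```

With these signs the estimated WGI coefficient would understate the true effect, which is the sense in which the reported estimates can be read as a lower bound.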

Although we do not have direct measures of trust, we might expect that intra-group trust will decrease with the cultural heterogeneity within groups. Models 2 and 3 therefore each include a variable that may be related to *distrust* within groups: model 2 includes linguistic frac., a variable measuring the linguistic fractionalization *within* a group, and model 3 includes relig. frac., a variable measuring the religious fractionalization within a group.^{Footnote 25} The results suggest no relationship between the group Gini and religious divisions within a group, but the coefficient for linguistic frac. is positive and highly significant, raising the possibility that our estimates for WGI may be biased downward. We return to this issue in Table 9 below, where we include linguistic frac. in the conflict regressions.

Two other variables related to the geographic dispersion of groups might at once be related to group inequality and conflict. A group that is regionally concentrated may have lower levels of inequality than groups that are spread out if this regional concentration constrains the nature of economic opportunities. It is also plausible that regionally concentrated groups may find it easier to coordinate fighting, and thus to sustain conflict, creating concerns about omitted variable bias. Model 4 therefore includes regional, an indicator variable that takes the value 1 if a group is regionally concentrated.^{Footnote 26} This variable has no precise relationship with a group’s Gini. Similarly, a more urbanized group may have higher levels of inequality than a group that is more rural (if the nature of economic opportunities varies more in cities) and may have different propensities to sustain conflict (if, for example, it is more difficult to sustain conflict in urban areas). Model 5 includes urbanization, a variable that measures the proportion of a group’s cells that are urbanized.^{Footnote 27} Again we find no relationship between this variable and a group’s Gini.

Finally, we consider two variables related to the terrain a group occupies. First, groups in areas with more rainfall could have more inequality if adequate rainfall creates more opportunities for more productive economic activity among skilled and industrious individuals, and groups in such areas might also be associated with more conflict for reasons unrelated to labor-capital considerations (if such areas are simply attractive to plunder by the government or by other groups). rainfall measures the average annual rainfall in the areas controlled by a group, and it is included in model 6. We find no relationship between this variable and WGI. For reasons related to those regarding precipitation, groups that live in areas that can be used for agriculture might have more group-based inequality and might be attractive to governments or other groups. Model 7 therefore includes agriculture, which measures the percentage of a group’s area that is covered with agricultural production. Again, we find no relationship between this variable and the group Gini.^{Footnote 28} Finally, model 8 presents the results when all of these additional variables are included, and the findings reinforce those above: only group GDP and group language fractionalization have a robust association with a group’s Gini.

Since our regressions with standard controls include the group’s GDP, the analysis in Table 8 identifies one additional variable that raises concern about omitted variable bias: linguistic fractionalization, which we posit may also be related to intra-group trust, which itself may be related to conflict. Since this variable has a precisely estimated relationship with the group Gini, it is important to assess the robustness of the WGI coefficients when linguistic fractionalization is included in the conflict models. We therefore added linguistic fractionalization to models 3–7 in Table 1, and the results in Table 9 show no association between linguistic fractionalization and conflict. The coefficient for the group Gini remains positive and precisely estimated across all five models. Thus, there is little evidence that the results in the preceding discussion are not robust to the inclusion of this variable.^{Footnote 29}

Before turning to statistical efforts to address issues of causal identification, it is important to acknowledge one additional issue that is related to the endogeneity of the WGI measures, which concerns the nature and evolution of group boundaries themselves. Our measures of group inequality require that we begin with a definition of a group: it is impossible to proceed otherwise. But we know that such definitions are based on social processes and scholarly perceptions of these processes, and that they often begin with patterns of migration that are linked to genetic differences which themselves have implications for the processes we study (Ashraf and Galor 2013b). Moreover, the salience of group boundaries can change over time, and these changes may be due in part to the outcomes of conflict. Thus, even if we could nail down the causal effect of WGI using the group definitions we employ, we do not know if factors affecting the definition of these boundaries in fact shape conflict patterns. This is a problem inherent to all studies that must assign individuals to groups, and though we cannot address it here, we feel it is useful to flag it as a topic for future research.

#### Employing Oster (2017) to assess the possible influence of omitted variables

To test for the possible importance of omitted variable bias, we have computed the amount of correlation between the unobservables and WGI, relative to the correlation of the observables and WGI, that would be necessary to explain away our key result (i.e., to make the coefficient of WGI equal to zero). In its simplest formulation, this value, denoted by \(\delta \), can be computed as follows (see Oster 2017):^{Footnote 30}

\[\delta = \frac{\beta _c \, (R^2_{c} - R^2_{nc})}{(\beta _{nc} - \beta _c)\, (R^2_{\text {max}} - R^2_{c})}\]

where \(\beta _c\) and \(\beta _{nc}\) are the coefficients of WGI in a model that contains all the observable controls and one with no or a few controls, respectively, and \(R^2_{c}\) and \(R^2_{nc}\) are the \(R^2\)’s associated with those regressions. Finally, \(R^2_{\text {max}}\) is one’s assumption about the maximum \(R^2\) that could be attained if all the relevant controls were observed.

A value of \(\delta = 2\), for example, would suggest that the unobservables would need to be twice as important as the observables to produce a treatment effect of zero. Altonji et al. (2005) and Oster (2017) suggest that values of \(\delta \) larger than 1 in absolute value can be interpreted as evidence that omitted variable bias is unlikely to explain the observed result. A value of 1 (or larger) means that the unobservables would need to be at least as important as the observables to produce a treatment effect of zero. Since researchers typically choose the controls they believe *ex ante* to be the most important (Angrist and Pischke 2009), situations where the effect of the unobservables is larger than that of the controls are deemed unlikely.

Table 10 presents the results. The full model corresponds to that in column 7 in Table 1. Restricted models I, II and III correspond to models with no controls, with year fixed effects and with country and year fixed effects, respectively. In order to implement the test, a value for \(R^2_{\text {max}}\) needs to be chosen. To select this value, we have followed the advice in Oster (2017), who suggests using a value equal to 1.3 times the value of the \(R^2\) obtained in the regression with all controls, which in our case equals 0.89. In addition, we have also considered two additional values, so that \(R^2_{\text {max}}=\{0.8, 0.85, 0.9\}\). The figures in Table 10 correspond to the values of \(\delta \) for each of the 9 cases considered. In most cases we obtain values of \(\delta \) that are larger than 1 in absolute value, which suggests that it is not likely that the significance of WGI is due to omitted variable bias. Only in one of the nine combinations, when we consider the model with country and year fixed effects (Restricted model III) and a very high value of \(R^2_{\text {max}}\) (equal to 0.9), do we obtain a value of \(\delta \) that is smaller than 1. This test therefore suggests that it is very unlikely that the results for within-group inequality are driven by omitted variables.
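The calculation behind each cell of a table like Table 10 can be sketched in a few lines. The snippet below is only an illustration of the simplest approximation of Oster's \(\delta \) for a target effect of zero, namely the coefficient with controls times the gain in \(R^2\) from adding controls, divided by the coefficient movement times the remaining unexplained variation up to \(R^2_{\text {max}}\). The function name `oster_delta` and every numerical input are hypothetical and are not taken from the paper's regressions.

```python
def oster_delta(beta_c, beta_nc, r2_c, r2_nc, r2_max):
    """Approximate delta that would drive the coefficient of interest to zero.

    beta_c, r2_c   : coefficient and R^2 from the model with all controls
    beta_nc, r2_nc : coefficient and R^2 from the restricted model
    r2_max         : assumed R^2 if all relevant controls were observed
    """
    if not (r2_nc <= r2_c < r2_max):
        raise ValueError("expected r2_nc <= r2_c < r2_max")
    if beta_nc == beta_c:
        raise ValueError("the coefficient does not move when controls are added")
    return beta_c * (r2_c - r2_nc) / ((beta_nc - beta_c) * (r2_max - r2_c))


# Hypothetical inputs, chosen only to show how delta falls as r2_max rises.
for r2_max in (0.80, 0.85, 0.90):
    delta = oster_delta(beta_c=2.0, beta_nc=2.5, r2_c=0.70, r2_nc=0.10,
                        r2_max=r2_max)
    print(f"R2_max = {r2_max:.2f}: delta = {delta:.1f}")
```

Because \(R^2_{\text {max}} - R^2_{c}\) appears in the denominator, a more demanding assumption about \(R^2_{\text {max}}\) mechanically lowers \(\delta \), which is why the test is reported for several values of \(R^2_{\text {max}}\).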

#### Reverse causation

The test for reverse causation is described in Sect. 5.3. Table 11 presents the results.

### WGI and conflict onset: additional analysis

This section revisits the analysis in Kuhn and Weidmann (2015), as discussed in the paper’s Sect. 6.1. Model 1 in Table 12 replicates model 4 in Table 1 of Kuhn and Weidmann’s paper, which relies on the same set of control variables as in Cederman et al. (2011, “CWG”), but which (unlike CWG) includes country fixed effects. Since we are considering all years (starting in 1992) for which EPR data is available, our data merging process results in more observations than in Kuhn and Weidmann’s original paper.^{Footnote 31} However, the results are similar to those presented in Kuhn and Weidmann: the coefficient for igi is positive and significant at the 10% level, though not as precisely estimated as in their paper.^{Footnote 32}

This result for igi, however, is not at all robust. Omitted variable bias is one concern. We explored the robustness of the results to omitted variables employing the technique developed by Oster (2017), as in Table 10, in a setup where the full model is that of Kuhn and Weidmann (2015) (column 1 in Table 12) and the three restricted models contain (i) no controls, (ii) country fixed effects and (iii) country and year dummies.^{Footnote 33} In all cases the values of \(\delta \) are close to zero, raising concerns that the estimates for igi in model 1 could suffer from omitted variable bias.^{Footnote 34}

This concern is clear from the model in column 2, which adds the group-level control variables that are present in the models in Sect. 4 and that are taken directly from the *Growup* portal: group gdp, group elev. (sd), group size, group diamonds and group oil.^{Footnote 35} Though the coefficient for igi remains positive, it is now estimated with considerable error (*p*-value .30). Of the five group-level variables added to model 2, however, only one is significant, group elev. (sd), so model 3 re-estimates model 2 omitting the insignificant group-level regressors. Model 1 also lacks two country-level controls, pop and polity. Column 4 adds these two variables and shows that pop has a positive and significant coefficient while polity is estimated with a large error. The coefficient for igi decreases to less than half the value obtained in Column 1 and remains very imprecisely estimated (*p*-value .42). Finally, it is useful to note that the results in model 1 require the presence of a particular group: the East Timorese in Indonesia. Model 5 presents results from re-estimating model 1 without this group and the coefficient for igi is insignificant. Columns 6 and 7 reproduce columns 3 and 4 omitting this group, obtaining similar results.

We therefore find little support for a robust association between within-group inequality and conflict onset, regardless of whether WGI is measured using nightlights data or surveys. While it is always possible that these null results are due to measurement error, we suspect that is not the issue here. Instead, the null results are not inconsistent with the ER argument, which emphasizes that WGI increases the capacity to fight rather than creating incentives to do so.

### Variable definitions and summary statistics

This section provides detailed definitions for the variables employed in the empirical analysis as well as a table of summary statistics.

#### Variable definitions

**Conflict variables**

intensity: “Group level Conflict intensity”. We assign a value of 0 to group *i* in year *t* if that group is at peace in year *t*, a value of 1 if group *i* is engaged in armed conflict against the state resulting in more than 25 but fewer than 1000 battle-related deaths in year *t*, and a value of 2 if group *i* is engaged in a conflict resulting in more than 1000 battle-related deaths in year *t*. Ethnic groups are coded as engaged in conflict if a rebel organization involved in the conflict expresses its political aims in the name of the group and a significant number of members of the group participate in the conflict. Source: *Growup portal*, https://growup.ethz.ch.

incidence: “Group level Armed conflict”. A binary measure taking a value of 1 for those years where an ethnic group is involved in armed conflict against the state resulting in more than 25 battle-related deaths, that is, for the years where intensity is either 1 or 2. Source: *Growup portal*, https://growup.ethz.ch.
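The relationship between intensity and incidence can be made concrete with a small sketch. The variables themselves come precoded from the Growup portal, so the code below is not the source's procedure; it only mirrors the thresholds stated in the definitions. The function names and the example group-years are hypothetical, and the treatment of a year with exactly 1000 deaths is an assumption, since the definitions do not say which category it falls in.

```python
def code_intensity(in_conflict, battle_deaths):
    """Return the 0/1/2 intensity code for one group-year."""
    # At peace, or below the 25-death threshold: coded 0.
    if not in_conflict or battle_deaths <= 25:
        return 0
    # Assumption: a year with exactly 1000 deaths falls in the top category.
    return 2 if battle_deaths >= 1000 else 1


def code_incidence(intensity):
    """Binary armed-conflict indicator: 1 whenever intensity is 1 or 2."""
    return 1 if intensity in (1, 2) else 0


# Hypothetical group-years: (engaged in conflict?, battle-related deaths).
group_years = [(False, 0), (True, 140), (True, 2300)]
intensity = [code_intensity(c, d) for c, d in group_years]
incidence = [code_incidence(v) for v in intensity]
print(intensity)  # [0, 1, 2]
print(incidence)  # [0, 1, 1]
```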

conflict-share: Share of years in the period 1992–2010 in which a group has been in conflict against the state resulting in more than 25 battle-related deaths.

battle deaths (best): Log of the number of battle-related deaths according to the *best* estimate from Lacina and Gleditsch (2005) (one is added if the number of battle deaths is zero). In situations where the best estimate was missing, we used the low estimate instead.

battle deaths (low): Log of the number of battle-related deaths according to the *low* estimate from Lacina and Gleditsch (2005) (one is added if the number of battle deaths is zero).
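The handling of zeros and missing values in the battle deaths variables can be illustrated as follows. This is a minimal sketch that assumes the natural logarithm and assumes that missing estimates are represented by `None`; the function name `log_battle_deaths` and the example values are hypothetical.

```python
import math


def log_battle_deaths(best, low=None):
    """Log battle deaths, falling back to the low estimate if best is missing.

    A zero count is replaced by one before taking logs, so that the
    transformed value is 0 rather than undefined.
    """
    deaths = best if best is not None else low
    if deaths is None:
        return None  # neither estimate is available
    if deaths == 0:
        deaths = 1
    return math.log(deaths)


print(log_battle_deaths(0))          # 0.0
print(log_battle_deaths(None, 100))  # log of the low estimate
print(log_battle_deaths(None))       # None
```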

onset: “Group level Conflict Onset”. A binary measure reflecting the first year in which a group enters a conflict, as defined in incidence above. Source: *Growup portal*, https://growup.ethz.ch.

intensity (ongoing only): This variable is identical to intensity, except that onset conflict years are set to zero.

incidence (ongoing only): This variable is identical to incidence, except that onset conflict years are set to zero.
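The two "ongoing only" variables and conflict-share are simple transformations of the yearly series. The sketch below shows one way to derive them for a single group; the helper names and the five-year example series are hypothetical (the actual variables span 1992–2010).

```python
def ongoing_only(values, onset):
    """Set onset years to zero, keeping all other years unchanged."""
    if len(values) != len(onset):
        raise ValueError("series must cover the same years")
    return [0 if o == 1 else v for v, o in zip(values, onset)]


def conflict_share(incidence):
    """Share of years in which the group is in conflict."""
    return sum(incidence) / len(incidence)


# Hypothetical yearly series for one group.
intensity = [0, 1, 2, 2, 0]
onset = [0, 1, 0, 0, 0]
incidence = [1 if v > 0 else 0 for v in intensity]

print(ongoing_only(intensity, onset))  # [0, 0, 2, 2, 0]
print(ongoing_only(incidence, onset))  # [0, 0, 1, 1, 0]
print(conflict_share(incidence))       # 0.6
```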

peaceyears: Number of years since the last conflict observation. Source: *Growup portal*, https://growup.ethz.ch.

**Within-group Inequality variables**

\(\textsc {g}^{{R}}\): Group Gini coefficient, computed using survey data and adjusted using the Ratio approach, as described in Sect. 3.1 and “Appendix C.2”. All available observations for a group are averaged and assigned to all the years in the period 1992–2010.

\(\textsc {g}^{{I}}\): Group Gini coefficient, computed using survey data and adjusted using the Intercept approach, as described in Sect. 3.1 and “Appendix C.2”. All available observations for a group are averaged and assigned to all the years in the period 1992–2010.

\(\textsc {g}^U\): Group Gini coefficient, computed using survey data, unadjusted. All available observations for a group are averaged and assigned to all the years in the period 1992–2010.

\(\textsc {g}_t^R\)-pre: Group Gini coefficient, computed using survey data and adjusted using the Ratio approach. The value at time *t* of this variable is computed by averaging all available surveys in year *t* and in prior years. If no observations are available until period *t*, this variable is set to missing.
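The difference between the time-invariant Gini measures and the "pre" measure lies only in which surveys enter the average. The sketch below contrasts the two for a single group, assuming that each survey receives equal weight and that missing values are represented by `None`. The function names and the two survey observations are hypothetical.

```python
def static_gini(surveys, first_year=1992, last_year=2010):
    """Average all surveys and assign the result to every year in the period."""
    average = sum(g for _, g in surveys) / len(surveys)
    return {year: average for year in range(first_year, last_year + 1)}


def pre_gini(surveys, first_year=1992, last_year=2010):
    """Average only the surveys fielded in year t or earlier."""
    out = {}
    for t in range(first_year, last_year + 1):
        past = [g for year, g in surveys if year <= t]
        out[t] = sum(past) / len(past) if past else None  # missing until first survey
    return out


# Hypothetical (survey year, adjusted group Gini) observations for one group.
surveys = [(1995, 0.40), (1998, 0.50)]

static = static_gini(surveys)
pre = pre_gini(surveys)
print(static[1992])  # the overall average, even before the first survey
print(pre[1994])     # None
print(pre[1995])     # 0.4
print(pre[2010])     # the average of both surveys
```

The "pre" version never uses information from surveys conducted after year *t*, at the cost of missing values in the years before a group's first survey.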

igi: Group-level Gini coefficient computed using nightlight emissions. Source: Kuhn and Weidmann (2015).

**Controls**

gdp: log of real GDP per capita, lagged 1 year. Source: Penn World Table (2015).

pop: log of the population in millions, lagged 1 year, as reported by the Penn World Table (2015).

xpolity: democracy score based on Polity IV, lagged 1 year. It combines 3 out of the 5 components of Polity IV (XCONST, XRCOMP, XROPEN) and leaves out the two components (PARCOMP and PARREG) that are related to political violence, and hence are likely to be endogenous. It ranges from -6 (maximum level of autocracy) to 7 (maximum level of democracy). See Vreeland (2008) for details.

group gdp: Survey-based group GDP per capita, lagged 1 year. See Section C.1.3 for details on its construction.

group size: Relative size of the group. Source: GrowUp portal, https://growup.ethz.ch/.

group oil: dummy variable indicating whether the group has oil in its homeland. Source: PRIOGRID, through the Growup portal, https://growup.ethz.ch/.

group elev. (sd): the standard deviation of the elevation of the ethnic homeland. Source: GrowUp portal, https://growup.ethz.ch/.

excluded: dummy variable indicating whether the group is excluded from power. Source: GrowUp portal, https://growup.ethz.ch/.

hi(ln): group-level measure of horizontal inequality defined as \(\log |g-\bar{G}|\), where *g* is the group’s GDP per capita (the above-defined group gdp variable) and \(\bar{G}\) is total GDP minus the GDP corresponding to group *i*.

lineq2: Measure of horizontal inequality, defined as \(\left( \log \frac{g}{G}\right) ^2\), where *g* is the group’s GDP per capita and *G* is the (unweighted) average of GDP per capita of all groups. Source: we have computed this measure using the surveys and GECON data. Details are provided in the corresponding tables.
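As a worked illustration of lineq2, the sketch below computes the measure for every group in one country from a mapping of group names to GDP per capita. It assumes the natural logarithm; the function name and the three example groups are hypothetical.

```python
import math


def lineq2(group_gdp_pc):
    """Squared log ratio of each group's GDP per capita to the unweighted mean."""
    mean_gdp = sum(group_gdp_pc.values()) / len(group_gdp_pc)
    return {name: math.log(g / mean_gdp) ** 2 for name, g in group_gdp_pc.items()}


# Hypothetical GDP per capita for three groups in one country.
example = {"A": 1000.0, "B": 2000.0, "C": 3000.0}
for name, value in lineq2(example).items():
    print(name, round(value, 4))
```

The measure is zero for a group whose GDP per capita equals the country's group average and grows as the group moves away from that average in either direction, so it does not distinguish relatively poor from relatively rich groups.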

low: It is defined as \(\max \{1, \frac{g}{G}\}\), where *g* is the group’s per capita GDP and *G* is the average of the per capita GDPs of all groups. Source: we have computed this measure using the surveys and GECON data. Details are provided in the corresponding tables.

high: It is defined as \(\max \{1, \frac{G}{g}\}\), where *g* is the group’s per capita GDP and *G* is the average of the per capita GDPs of all groups. Source: we have computed this measure using the surveys and GECON data. Details are provided in the corresponding tables.

poverty2: Percentage of the total population with income lower than 2 dollars a day. Source: World Bank.

infant mortality: Infant mortality rate. This variable is a snapshot from the year 2000. Source: PRIOGRID, accessed through Growup portal, https://growup.ethz.ch/.

N. excluded groups: Number of groups in the country excluded from power. Source: GrowUp portal, https://growup.ethz.ch/.

group diamonds: dummy variable indicating whether the group has diamonds in its homeland. Source: PRIOGRID, through the Growup portal, https://growup.ethz.ch/.

pow. balance: Demographic power balance between the group and the group(s) in power. Denoting the populations of the group and the group(s) in power as *s* and *S*, respectively, the power balance is defined as *s*/(*s*+*S*) if the group is excluded, and as *s*/*S* otherwise.
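The two branches of the power balance definition can be written out directly. In the sketch below the function name, the argument names and the population shares are hypothetical; only the two formulas come from the definition.

```python
def power_balance(s, S, excluded):
    """Demographic power balance of a group relative to the group(s) in power.

    s        : population (or population share) of the group
    S        : population (or population share) of the group(s) in power
    excluded : True if the group is excluded from power
    """
    if S <= 0:
        raise ValueError("the group(s) in power must have positive population")
    return s / (s + S) if excluded else s / S


# Hypothetical population shares.
print(power_balance(0.2, 0.6, excluded=True))   # s / (s + S)
print(power_balance(0.2, 0.8, excluded=False))  # s / S
```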

linguistic frac: Within-group linguistic fractionalization index. Source: Growup portal, https://growup.ethz.ch/.

religious frac: Within-group religious fractionalization index. Source: Growup portal, https://growup.ethz.ch/.

regional: A dummy measuring whether a group is regionally concentrated. Source: Growup portal, https://growup.ethz.ch/.

urbanized: The proportion of a group’s homeland that is urbanized. Source: Prio-Grid.

rainfall: Average annual rainfall in the areas controlled by a group. Source: Prio-Grid.

agriculture: Percentage of a group’s area that is covered with agricultural production. Source: Prio-Grid.

#### Summary statistics

Table 13 provides summary statistics for the variables employed in Sect. 4.

## About this article

### Cite this article

Huber, J.D., Mayoral, L. Group inequality and the severity of civil conflict.
*J Econ Growth* **24**, 1–41 (2019). https://doi.org/10.1007/s10887-019-09162-6

### Keywords

- Ethnicity
- Within-group inequality
- Civil conflict

### JEL Classification

- D63
- D74
- J15
- O15