Group inequality and the severity of civil conflict


Civil conflicts, which have been much more prevalent than inter-state conflicts over the last fifty years, vary enormously in their intensity, with some causing millions of deaths and some far fewer. The central goal of this paper is to test an argument from previous theoretical research that high inequality within an ethnic group can make inter-ethnic conflict more violent because such inequality decreases the opportunity cost to poor group members of fighting, and also decreases the opportunity cost to rich group members of funding the conflict. To test this argument, we create a new data set that uses individual-level surveys to measure inequality within ethnic groups. The analysis using these data provides strong evidence for the importance of within-group inequality, and thus underscores the value of focusing on the capacity of groups to fight if one wishes to limit the destruction of civil conflicts.

Fig. 1


  1. Gleditsch et al. (2002), for example, find that since WWII, there were 46 interstate conflicts with more than 25 battle-related deaths per year, 22 of which have killed at least 1000 over the entire history of the conflict. Over the same period, there were 181 civil conflicts with more than 25 battle-related deaths per year, and almost half of them had killed more than 1000 people.

  2. These figures are based on the “best estimate” of the Battle Deaths dataset, version 3.1; see Lacina and Gleditsch (2005).

  3. In a related study, Arbatli et al. (2018) find a positive relationship between conflict and within-group genetic diversity, which they argue is a proxy for interpersonal diversity. We discuss how our findings and those of Arbatli et al. (2018) are related in Sect. 2.

  4. Previous studies have focused instead on conflict onset, which, as noted above, is not clearly linked to the ER theory. Østby et al. (2009) find a positive association between within-region inequality and conflict onset in 22 countries in Sub-Saharan Africa, and Kuhn and Weidmann (2015), which we discuss below, find a positive association between within-group inequality and conflict onset using a global dataset that has the ethnic group, rather than the region, as the unit of analysis.

  5. See Sect. 5, however, which is devoted to assessing issues of reverse causation and omitted variable bias.

  6. Notice that the Rwandan genocide is not in our dataset as our empirical investigation focuses on two-sided conflicts exclusively. We have included this example to illustrate the fact that Esteban and Ray’s theory is quite general and applies to different conflict typologies.

  7. It is also worth noting that micro-level studies of participation emphasize that richer elites recruit the poor to fight. Brubaker and Laitin (1998), for example, argue that most ethnic leaders are well-educated and from middle-class backgrounds while the lower-ranking troops are more often poorly educated and from working-class backgrounds. In their study of Sierra Leone’s civil war, Humphreys and Weinstein (2008) find that factors such as poverty, a lack of access to education, and political alienation are good predictors of conflict participation and that they may proxy, among other factors, for a greater vulnerability to political manipulation by elites. Justino (2009) also emphasizes that poverty is a leading factor in explaining participation in ethnic conflict.

  8. See

  9. On the relationship between interpersonal trust and interpersonal diversity, see also Ashraf and Galor (2013a) and Ashraf and Galor (2018).

  10. The SWIID provides comparable (country-level) Gini indices of gross and net income inequality for 173 countries from 1960 to the present and is one of the most thorough attempts to tackle the comparability challenge.

  11. There are a number of data sets on the geographic location of groups, including the GREG, the GeoEPR and the Ethnologue. The GREG dataset (Weidmann et al. 2010) is based on the Soviet Atlas Narodov Mira. The Ethnologue provides information on the spatial location of linguistic groups in much of the world. The group-level studies of horizontal inequality and conflict have relied on the GeoEPR dataset, described in Wucherpfennig et al. (2011), which utilizes an expert survey to determine the identity and location of politically relevant ethnic groups (Wimmer et al. 2009).

  12. Regions with mixed ethnic composition are quite common (e.g., Müller-Crepon and Hunziker 2018), and Morelli and Rohner (2015) link this segregation itself to civil conflict.

  13. Østby et al. (2009), for example, use survey data from the Demographic Health Surveys in 22 countries in Sub-Saharan Africa. Their study calculates the Gini coefficient for each region and their analysis finds that regions with higher levels of inequality are most likely to experience the onset of conflict. Fjelde and Gudrun (2014) provide a similar regional-level study, focusing on civil unrest rather than civil war.

  14. KW point out that although groups might be relatively geographically segregated in the countryside, this is unlikely to be the case in urban areas. Thus, including urban cells can introduce measurement error.

  15. The data are accessed through the ETH Zurich’s GROWup data portal ( We combine data from the Ethnic Power Relations (EPR) Core Dataset 2014 (, the ACD2EPR 2014 dataset on conflict ( and the GeoEPR 2014 data set on group attributes, including group economic well being ( See “Appendix C” for further details.

  16. Following previous literature, we drop from the sample monopoly and dominant groups as by definition these groups cannot stage rebellions against themselves. A group is classified as monopoly if the elite members hold monopoly power in the executive to the exclusion of members of all other ethnic groups. A group is classified as dominant if elite members of the group hold dominant power in the executive but there is some limited inclusion of “token” members of other groups who however do not have real influence on decision making.

  17. The adjusted inequality measures are generated regressors and therefore standard errors that do not take this fact into account are generally invalid since they ignore the sampling variation in such regressors. Nevertheless, for the purpose of testing whether the inequality variables are significantly different from zero, the sampling variation in the generated regressors can be ignored (at least asymptotically). See Wooldridge (2002), chapters 6 and 12 for additional details.

  18. Studies of conflict onset have employed a variety of measures of horizontal inequality. Table 6 in “Appendix A” shows that our results are robust when these alternative measures are used as controls.

  19. XPOLITY combines 3 out of the 5 components of Polity IV and leaves out the two components (PARCOMP and PARREG) that are constructed using political violence in their definition (Vreeland 2008).

  20. Dropping gdp from column 6 and keeping group gdp produces essentially identical results (see “Appendix A”).

  21. More specifically, conflict-share is the average of incidence over the period 1992–2010 and is employed in regressions where the unit of analysis is the group. All time-varying variables are based on values at the beginning of the sample.

  22. This variable is called onset_do_flag in the EPR data set, and it is missing in years where a group is involved in conflict during the years immediately following the year when conflict is initiated. Thus, the data include group-years where a group either begins involvement in conflict or is not involved in conflict. See Wucherpfennig et al. (2011) for details on conflict measures.

  23. We have formally tested whether the coefficients for \(\textsc {G}^R\) in the onset and the intensity regressions in Tables 1 and 3 are statistically different. Since the number of observations in the onset regressions is considerably smaller, power is likely to be low. Nonetheless, tests comparing the estimates of \(\textsc {G}^R\) in columns 3 and 4 from Table 3 to those obtained in the analogous regressions in Table 1 (columns 6 and 7) deliver p-values of .026 and .06, respectively, depending on the specification employed, which suggests there is reasonable evidence in the data to reject equality of coefficients.

  24. If no survey is available for a group until time t, observations up to t are set to missing.

  25. linguistic frac. is based on the group language variables in the EPR data set and is set equal to \(1-\text {size}_{LG_1}^2-\text {size}_{LG_2}^2\), where \(\text {size}_{LG_1}\) and \(\text {size}_{LG_2}\) indicate the size of the two largest linguistic subgroups within the group. Religious fractionalization is defined similarly.

  26. This variable is taken from the EPR data set and equals 1 if geo_typename equals “Regionally based.”

  27. All the variables that are introduced in the remainder of this section are taken from the recently released PRIO-GRID 2.0 (Tollefsen et al. 2016). urbanization is the variable in this dataset called urban_gc_mean.

  28. rainfall is the log of prec_gpcp_mean in Tollefsen et al. (2016) and agriculture is agri_gc_mean × 100 in this data set.

  29. We also added each of the other variables discussed above (relig. frac., regional, urbanization, rainfall and agriculture) to models 3–7, and the results for the group Gini remain highly robust.

  30. This definition of \(\delta \) corresponds to the case where there is a single observable control; see Oster (2017) for details on the more general case.

  31. The data employed for this regression has been taken from the Growup portal ( except for igi, which has been computed by Kuhn and Weidmann (2015).

  32. The models presented in the previous tables use lags of the economic variables in order to diminish concerns about reverse causality. For consistency with the Kuhn and Weidmann (2015) and CWG approaches, variables in Table 12 are not lagged. The results are substantively similar regardless of whether the variables are lagged. Model 1 also omits 15 observations by using exclusion rules commonly used in research that employs EPR data to estimate models of civil war onset. In particular, groups are excluded if they are judged to be dominant, to have a monopoly on power, or if they are geographically dispersed (see discussion in CWG 2011). The results are essentially identical if these exclusion rules are ignored.

  33. Kuhn and Weidmann (2015) also consider in their robustness checks an omitted variable bias analysis in a similar vein as the one discussed in the text, and obtain values of \(\delta \) that suggest that results are robust to omitted variable bias. However, they use the technique introduced by Bellows and Miguel (2009) that does not incorporate the movements in \(R^2\) in the estimation of \(\delta \). As discussed at length in Oster (2017), the omitted variable bias is proportional to the movement in coefficients only if movements in \(R^2\)’s are also taken into account and, thus, it is critical to introduce this term in order to have accurate results.

  34. As in Table 10 in “Appendix A”, we have considered different values for \(R^2_{\text {max}}\), the maximum value of the \(R^2\) coefficient that could be obtained if all the relevant controls were included in the regressions. Only when \(R^2_{\text {max}}\) was set as low as 0.1 did we obtain values of \(\delta \) larger than 1 in some specifications.

  35. group size is size (epr) from the EPR data, and is the “group’s population size as a fraction of the ethnically relevant population of this group’s country.”
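As a small illustration of the fractionalization measure defined in footnote 25, the sketch below computes \(1-\text {size}_{LG_1}^2-\text {size}_{LG_2}^2\) for a single group. It assumes that the two sizes are expressed as shares of the group's population (between 0 and 1); the function name and the example shares are hypothetical and are not taken from the EPR data.

```python
def two_group_fractionalization(size_lg1: float, size_lg2: float) -> float:
    """Return 1 - size_lg1^2 - size_lg2^2 (the formula in footnote 25).

    Assumption: both arguments are population shares of the two largest
    linguistic (or religious) subgroups within the ethnic group.
    """
    for share in (size_lg1, size_lg2):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie between 0 and 1")
    if size_lg1 + size_lg2 > 1.0 + 1e-12:
        raise ValueError("the two shares cannot sum to more than 1")
    return 1.0 - size_lg1 ** 2 - size_lg2 ** 2


# Hypothetical examples: a linguistically homogeneous group and an evenly split one.
homogeneous = two_group_fractionalization(1.0, 0.0)
evenly_split = two_group_fractionalization(0.5, 0.5)
print(homogeneous, evenly_split)
```

Under this reading, a group with a single language scores zero, and the score rises as the two largest subgroups become more equal in size.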


  • Acemoglu, D., & Robinson, J. (2005). Economic origins of dictatorship and democracy. Cambridge: Cambridge University Press.

  • Adhvaryu, A., Fenske, J., Khanna, G., & Nyshadham, A. (2017). Resources, conflict, and economic development in Africa. In: NBER working papers 24309. National Bureau of Economic Research Inc.

  • Alesina, A., Michalopoulos, S., & Papaioannou, E. (2016). Ethnic inequality. Journal of Political Economy, 124(2), 428–488.

  • Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on observed and unobserved variables: Assessing the effectiveness of Catholic schools. Journal of Political Economy, 113(1), 151–184.

  • Angel, S. (2012). Planet of cities. Cambridge, MA: Lincoln Institute of Land Policy.

  • Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton: Princeton University Press.

  • Arbatli, C. E., Ashraf, Q. H., Galor, O., & Klemp, M. (2018). Diversity and conflict. Econometrica (forthcoming).

  • Ashraf, Q., & Galor, O. (2013a). The “Out of Africa” hypothesis, human genetic diversity, and comparative economic development. American Economic Review, 103(1), 1–46.

  • Ashraf, Q., & Galor, O. (2013b). Genetic diversity and the origins of cultural fragmentation. American Economic Review, 103(3), 528–533.

  • Baldwin, K., & Huber, J. D. (2010). Economic versus cultural differences: Forms of ethnic diversity and public goods provision. American Political Science Review, 104(4), 644–662.

  • Ashraf, Q. H., & Galor, O. (2018). The Macrogenoeconomics of Comparative Development. Journal of Economic Literature, 56(3), 1119–1155.

  • Bellows, J., & Miguel, E. (2009). War and local collective action in Sierra Leone. Journal of Public Economics, 93(11–12), 1144–1157.

  • Brubaker, R., & Laitin, D. D. (1998). Ethnic and nationalist violence. Annual Review of Sociology, 24, 423–452.

  • Cederman, L.-E., Gleditsch, K. S., & Buhaug, H. (2013). Inequality, grievances, and civil war. Cambridge: Cambridge University Press.

  • Cederman, L.-E., Weidmann, N. B., & Bormann, N.-C. (2015). Triangulating horizontal inequality: Toward improved conflict analysis. Journal of Peace Research, 52(6), 806–821.

  • Cederman, L.-E., Weidmann, N. B., & Gleditsch, K. S. (2011). Horizontal inequalities and ethnonationalist civil war: A global comparison. American Political Science Review, 105(3), 478–495.

  • Cederman, L.-E., Wimmer, A., & Min, B. (2010). Why do ethnic groups rebel? New data and analysis. World Politics, 62(1), 87–119.

  • Cheibub, J. A., Gandhi, J., & Vreeland, J. R. (2010). Democracy and dictatorship revisited. Public Choice, 143(1–2), 67–101.

  • Cramer, C. (2003). Does inequality cause conflict? Journal of International Development, 15(4), 397–412.

  • Dahrendorf, R. (1959). Class and class conflict in industrial society. Stanford, CA: Stanford University Press.

  • Deininger, K., & Squire, L. (1996). A new data set measuring income inequality. The World Bank Economic Review, 10(3), 565–591.

  • Desmet, K., Ortuño-Ortin, I., & Wacziarg, R. (2012). The political economy of linguistic cleavages. Journal of Development Economics, 97(2), 322–338.

  • Doyle, M., & Sambanis, N. (2006). Making war and building peace. Princeton: Princeton University Press.

  • Esteban, J., & Ray, D. (2011). A model of ethnic conflict. Journal of the European Economic Association, 9(3), 496–521.

  • Esteban, J., Mayoral, L., & Ray, D. (2012). Ethnicity and conflict: An empirical study. American Economic Review, 102(4), 1310–1342.

  • Fearon, J. D. (2003). Ethnic and cultural diversity by country. Journal of Economic Growth, 8(2), 195–222.

  • Fearon, J. D., & Laitin, D. D. (2000). Violence and the social construction of ethnic identity. International Organization, 54(4), 845–877.

  • Fearon, J. D., & Laitin, D. D. (2003). Ethnicity, insurgency, and civil war. American Political Science Review, 97(1), 75–90.

  • Fjelde, H., & Gudrun, Ø. (2014). Socioeconomic inequality and communal conflict: A disaggregated analysis of sub-Saharan Africa, 1990–2008. International Interactions, 40(5), 737–762.

  • Gleditsch, N. P., Wallensteen, P., Eriksson, M., Sollenberg, M., & Strand, H. (2002). Armed conflict 1946–2001: A new data set. Journal of Peace Research, 39(5), 615–637. (accessed October 1, 2010).

  • Gurr, T. R. (1970). Why men rebel. Princeton: Princeton University Press.

  • Gurr, T. R. (1980). Why men rebel: Handbook of political conflict: Theory and research. New York: Free Press.

  • Humphreys, M., & Weinstein, J. M. (2008). Who fights? The determinants of participation in civil war. American Journal of Political Science, 52(2), 436–455.

  • Justino, P. (2009). Poverty and violent conflict: A micro-level perspective on the causes and duration of warfare. Journal of Peace Research, Peace Research Institute Oslo, 46(3), 315–333.

  • Kuhn, P. M., & Weidmann, N. B. (2015). Unequal we fight: Between-and within-group inequality and ethnic civil war. Political Science Research & Methods, 3(3), 543–568.

  • Lacina, B. (2006). Explaining the severity of civil wars. Journal of Conflict Resolution, 50(2), 276–289.

  • Lacina, B., & Gleditsch, N. P. (2005). Monitoring trends in global combat: A new dataset of battle deaths. European Journal of Population/Revue européenne de Démographie, 21(2), 145–166.

  • Melvern, L. (2000). A people betrayed: The role of the West in Rwanda’s genocide. New York: Zed Books.

  • Mitra, A., & Ray, D. (2014). Implications of an economic theory of conflict: Hindu–Muslim violence in India. Journal of Political Economy, 122(4), 719–765.

  • Morelli, M., & Rohner, D. (2015). Resource concentration and civil wars. Journal of Development Economics, 117(C), 32–47.

  • Müller-Crepon, C., & Hunziker, P. (2018). New spatial data on ethnicity: Introducing SIDE. Journal of Peace Research, 55(5), 687–698.

  • Nordhaus, W. D. (2006). Geography and macroeconomics: New data and new findings. Proceedings of the National Academy of Sciences of the USA, 103(10), 3510–3517.

  • Østby, G., Nordås, R., & Rød, J. K. (2009). Regional inequalities and civil conflict in sub-Saharan Africa. International Studies Quarterly, 53(2), 301–324.

  • Oster, E. (2017). Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics.

  • Penn World Table. (2015). Dataset. Retrieved March 19, 2017 from

  • Polity IV. (2011). Polity IV project: Political regime characteristics and transitions, 1800–2009. Accessed 1 Oct 2011.

  • Ross, M. (2013). The political economy of petroleum wealth. Middle East Development Journal, 5(2), 1350009-1–1350009-19.

  • Ross, M., & Mahdavi, P. (2015). Oil and gas data, 1932–2014. Harvard Dataverse, V2.

  • Sambanis, N., & Milanovic, B. (2011). Explaining the demand for sovereignty. In: Policy research working paper series, vol. 5888. The World Bank.

  • Solt, F. (2009). Standardizing the world income inequality database. Social Science Quarterly, 90(2), 231–242.

  • Stewart, F. (2002). Horizontal inequalities: A neglected dimension of development. In: Annual Lecture No. 5, UNU world institute for development economics research.

  • Tollefsen, A. F., Strand, H., & Buhaug, H. (2012). PRIO-GRID: A unified spatial data structure. Journal of Peace Research, 49(2), 363–374.

  • Tollefsen, A. F., Bahgat, K., Nordkvelle, J., & Buhaug, H. (June 2, 2016). PRIO-GRID v.2.0 Codebook. Retrieved June 20, 2017 from

  • Verwimp, P. (2005). An economic profile of peasant perpetrators of genocide. Micro-level evidence from Rwanda. Journal of Development Economics, 77(2), 297–323.

  • Vreeland, J. R. (2008). The effect of political regime on civil war. Journal of Conflict Resolution, 52(3), 401–425.

  • Weidmann, N. B., Rød, J. K., & Cederman, L.-E. (2010). Representing ethnic groups in space: A new dataset. Journal of Peace Research, 47(4), 491–499.

  • Wimmer, A., Cederman, L.-E., & Min, B. (2009). Ethnic politics and armed conflict. A configurational analysis of a new global dataset. American Sociological Review, 74(2), 316–337.

  • Wintrobe, R. (1995). Some economics of ethnic capital formation and conflict. In: A. Breton, G. Galeotti, and R. Wintrobe (Eds.), Nationalism and Rationality (pp. 43–70).

  • World Bank. (2013). Dataset. Retrieved October 17, 2015 from,2.

  • Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data (Second Edition). Cambridge: MIT Press.

  • Wucherpfennig, J., Metternich, N. W., Cederman, L.-E., & Gleditsch, K. S. (2012). Ethnicity, the state, and the duration of civil war. World Politics, 64(1), 79–115.

  • Wucherpfennig, J., Weidmann, N. B., Girardin, L., Cederman, L.-E., & Wimmer, A. (2011). Politically relevant ethnic groups across space and time: Introducing the GeoEPR dataset. Conflict Management and Peace Science, 28(5), 423–437.

  • Yanagizawa-Drott, D. (2012). Propaganda and conflict: Theory and evidence from the Rwandan genocide. Cambridge: Harvard University.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Laura Mayoral.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is a revised version of our previous work “Inequality, Ethnicity and Civil Conflict”. John Huber is grateful for financial support from the National Science Foundation (SES-0818381). Laura Mayoral gratefully acknowledges financial support from the Generalitat de Catalunya, and the Ministry of Economy and Competitiveness Grant Number ECO2015-66883-P and from National Science Foundation Grant SES-1629370 (PI: Debraj Ray). We received helpful comments from Lars-Erik Cederman, Joan Esteban, Debraj Ray and seminar participants at various venues where this paper was presented. We also thank Sabine Flamand, Andrew Gianou and Tom Orgazalek for superb research assistance.

Electronic supplementary material


Group inequality and conflict: additional analysis

This appendix provides the additional empirical analyses discussed in the text. Section A.1 contains the robustness tests discussed in Sect. 4 of the main text; Sect. A.2 examines the issues of correlation versus causation discussed in Sect. 5; and Sect. A.3 provides additional analysis of WGI and conflict onset.

Robustness checks

Tables 4 and 5 reestimate Table 1 using alternative measures of WGI. Models 1–7 in Table 4 use \(\textsc {g}^{{I}}\), the group Gini variable adjusted with the intercept approach. Model 8 uses \(\textsc {g}^U\), the unadjusted survey-based group Gini. The results for WGI are robust to using either measure. Model 8 suggests that the relationship between WGI and conflict incidence exists in the raw data and is not being driven by decisions with respect to adjusting the heterogeneous surveys, although the heterogeneity in the surveys obviously warrants making such adjustments.

Table 4 Alternative ways of computing within group inequality
Table 5 Estimating models in Table 1 using only past surveys to compute \(\textsc {g}^{{R}}\)
Table 6 Alternative measures of group GDP and horizontal inequality

The measures of WGI used in the previously discussed results are based on the average of the various surveys for a group applied to all years, even those that precede the date of the survey. Since the surveys cover a relatively short period (1992–2010) and inequality variables are known to evolve slowly, this approach is a reasonable way to maximize the available data. Nonetheless, reverse causality is a valid concern. To address it, we re-computed the inequality variables so that in year t we take the mean of all available surveys in year t and in prior years for which surveys exist (see footnote 24). Since inequality changes slowly, the correlation between our baseline group Gini and the one just described (denoted as \(\textsc {g}^R\textsc {-pre}\)) is very high (\(\hbox {r}=.94\)). Table 5 replicates Table 1 using these redefined measures of inequality; although the sample size falls considerably, the conclusions regarding within-group inequality are generally robust.
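The re-computation described above can be sketched in a few lines. This is only an illustration under stated assumptions: survey Ginis for one group are held in a dictionary keyed by survey year, years before the group's first survey are set to missing (one reading of footnote 24), and the function name and the example values are hypothetical rather than taken from our data.

```python
from typing import Dict, List, Optional


def running_survey_mean(
    surveys: Dict[int, List[float]], years: List[int]
) -> Dict[int, Optional[float]]:
    """For each year t, average every survey Gini dated t or earlier.

    Years with no survey at or before t get None (treated as missing).
    """
    result: Dict[int, Optional[float]] = {}
    for t in years:
        past = [g for year, ginis in surveys.items() if year <= t for g in ginis]
        result[t] = sum(past) / len(past) if past else None
    return result


# Hypothetical group with one survey in 1995 and two surveys in 2001.
example_surveys = {1995: [0.40], 2001: [0.44, 0.48]}
series = running_survey_mean(example_surveys, list(range(1993, 2004)))
for year in (1993, 1995, 2000, 2001):
    print(year, series[year])
```

By construction, the value assigned to year t never uses a survey fielded after t, which is the point of the exercise.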

Table 6 considers different measures of horizontal inequality and GDP. As noted above, group gdp and gdp (a country-level measure) are correlated at .96, and similar results are obtained regardless of whether group GDP or country GDP is included. We can see this in Model 1 of Table 6, which re-estimates model 7 in Table 1 but substitutes group gdp for gdp. The remaining models in the table re-estimate our baseline specification (model 7 in Table 1) with different measures of horizontal inequality (see “Appendix B” for definitions of these variables). Models 2–3 use lineq2 (the measure of horizontal inequality in Cederman et al. 2011; Kuhn and Weidmann 2015), computed using data either from the surveys (model 2) or GECON (model 3), and models 4–5 consider the low and high variables (the measure of horizontal inequality in Cederman et al. 2015), also computed using data from the surveys (model 4) or from GECON (model 5). Across all five models, the coefficient for WGI remains positive and significant at conventional levels. Similar conclusions are obtained if one controls for group GDP rather than country GDP.

Table 7 considers a range of alternative variables and model specifications. Column 1 includes a variable measuring the net value of oil and gas exports per capita, oil/gas exports. The data are from Ross and Mahdavi (2015). Column 2 adds pol. transitions, the number of political transitions in the previous 5 years (Cheibub et al. 2010). Column 3 controls for the distance from the ethnic homeland to the country capital (dist.cap). Column 4 adds poverty2, a country-level measure of poverty (from PovcalNet, World Bank), while column 5 introduces infant mortality (from PRIO-GRID), which measures the group-level infant mortality rate. Column 6 introduces the demographic power balance between the group and the group(s) in power (from EPR), while column 7 includes all the previous variables in the same specification. Column 8 introduces country-year fixed effects and is estimated by OLS, as the optimization algorithm of the ordered logit fails to converge. The results regarding WGI remain robust to the inclusion of these additional controls.

Table 7 Adding additional control variables

Figure 2 illustrates graphically our empirical results and plots the relationship between conflict-share and \(\textsc {G}^R\).

Fig. 2 The relation between conflict-share and \(\textsc {g}^{{R}}\). The graph is a binned scatter that plots the residuals obtained by regressing \(\textsc {g}^{{R}}\) and conflict-share (the share of conflict years) on the controls (including country fixed effects) in column 7 from Table 1
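The construction behind a binned scatter of residuals can be sketched as follows. This is a deliberately simplified illustration, not the code used for Fig. 2: it residualizes both variables on country fixed effects only (which amounts to subtracting country means) and omits the other controls in column 7 of Table 1. All function names and data values are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def demean_by_group(values: Sequence[float], groups: Sequence[Hashable]) -> List[float]:
    """Residuals from regressing values on group dummies, i.e. deviations from group means."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [value - totals[group] / counts[group] for value, group in zip(values, groups)]


def binned_means(x: Sequence[float], y: Sequence[float], n_bins: int) -> List[Tuple[float, float]]:
    """Sort observations by x, split them into n_bins near-equal bins, and average x and y per bin."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_x = sum(p[0] for p in chunk) / len(chunk)
            mean_y = sum(p[1] for p in chunk) / len(chunk)
            points.append((mean_x, mean_y))
    return points


# Hypothetical data: two countries with two ethnic groups each.
countries = ["A", "A", "B", "B"]
gini = [0.30, 0.40, 0.50, 0.60]
conflict_share = [0.0, 0.2, 0.1, 0.5]

points = binned_means(
    demean_by_group(gini, countries),
    demean_by_group(conflict_share, countries),
    n_bins=2,
)
print(points)
```

Each returned pair is one plotted point: the average residualized Gini in a bin against the average residualized conflict share in that bin.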

Analysis of correlation versus causation

This section takes up the issues of correlation versus causation that are discussed in Sect. 5 of the paper. The analysis has three parts. Section A.2.1 considers the relationship between WGI and other group-level variables. Section A.2.2 presents a test for omitted variable bias. And Sect. A.2.3 considers the issue of reverse causation.

Group-level variables’ relationship with WGI

As discussed in Sect. 5.1 in the main text, this section treats within-group inequality as the dependent variable and explores the correlation of this variable with other group-level variables. We begin by considering group-level variables already included in the baseline regression models (see Table 1). Model 1 in Table 8 presents the results where a country-group is the unit of analysis, \(\textsc {g}^{{R}}\) is the dependent variable, and the right-hand side variables include the group-level variables in column 5 of Table 1, as well as country fixed effects. The only variable with a precisely estimated coefficient is group GDP: not surprisingly, groups that are on average richer tend to be less unequal.

Table 8 Group level variables related to WGI

Next we examine whether other variables that have not been the focus of previous group-level conflict studies have a systematic relationship with \(\textsc {g}^{{R}}\). As discussed above, Arbatli et al. (2018) demonstrate a strong positive relationship between conflict and interpersonal diversity (proxied by genetic diversity), and they provide suggestive evidence that this relationship may be mediated by social trust. We might also expect that distrust within a group will be highest in groups with high intra-group inequality. Since within-group trust is likely to be related both to conflict and to WGI, its omission could bias our results. While we do not have measures of intra-group trust levels that would allow us to probe this issue directly, if anything, the omission of within-group trust should bias the coefficient of WGI downwards. The direction of the bias is given by the sign of \(\beta \, cov(WGI, WGTrust)/var(WGI)\), where \(\beta \) is the coefficient of within-group trust (WGTrust) in the conflict regression. We expect \(\beta \) to be positive since within-group trust should decrease within-group coordination costs, increasing a group’s capacity to wage conflicts in which it becomes involved. Since the term \(cov(WGI, WGTrust)\) is likely to be negative, the sign of the omitted variable bias is most likely negative. This would suggest that our results for WGI represent a lower bound on this variable’s estimated effect.
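To make the sign argument concrete, the following worked equation sketches the textbook omitted-variable formula in the simplest possible case. It assumes, purely for illustration, a linear conflict equation in which WGI and WGTrust are the only regressors and the error term \(\varepsilon \) is uncorrelated with both; the symbols \(\alpha \), \(\gamma \), \(\varepsilon \) and \(\hat{\gamma }\) are introduced here and are not part of the estimated ordered logit models.

\[
\text{conflict} = \alpha + \gamma \, WGI + \beta \, WGTrust + \varepsilon ,
\qquad
\operatorname{plim}\, \hat{\gamma } = \gamma + \beta \, \frac{cov(WGI, WGTrust)}{var(WGI)} ,
\]

where \(\hat{\gamma }\) is the coefficient on WGI when WGTrust is left out of the regression. With \(\beta > 0\) and \(cov(WGI, WGTrust) < 0\), the second term is negative, so in this stylized setting \(\hat{\gamma }\) would tend to understate \(\gamma \).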

Although we do not have direct measures of trust, we might expect that intra-group trust will decrease with the cultural heterogeneity within groups. Models 2 and 3 therefore include two variables that may be related to distrust within groups: model 2 includes linguistic frac., a variable measuring the linguistic fractionalization within a group, and model 3 includes relig. frac., a variable measuring the religious fractionalization within a group (see footnote 25). The results suggest no relationship between the group Gini and religious divisions within a group, but the coefficient for linguistic frac. is positive and highly significant, raising the possibility that our estimates for WGI may be biased downward. We return to this issue in Table 9 below, where we include linguistic frac. in the conflict regressions.

Two other variables related to the geographic dispersion of groups might at once be related to group inequality and conflict. A group that is regionally concentrated may have lower levels of inequality than groups that are spread out if this regional concentration constrains the nature of economic opportunities. It is also plausible that regionally concentrated groups may find it easier to coordinate fighting, and thus to sustain conflict, creating concerns about omitted variable bias. Model 4 therefore includes regional, an indicator variable that takes the value 1 if a group is regionally concentrated (see footnote 26). This variable has no precise relationship with a group’s Gini. Similarly, a more urbanized group may have higher levels of inequality than a group that is more rural (if the nature of economic opportunities varies more in cities) and may have different propensities to sustain conflict (if, for example, it is more difficult to sustain conflict in urban areas). Model 5 includes urbanization, a variable that measures the proportion of a group’s cells that are urbanized (see footnote 27). Again we find no relationship between this variable and a group’s Gini.

Table 9 Group linguistic fractionalization and civil conflict

Finally, we consider two variables related to the terrain a group occupies. First, groups in areas with more rainfall could have more inequality if adequate rainfall creates more opportunities for productive economic activity among skilled and industrious individuals, and groups in such areas might also be associated with more conflict for reasons unrelated to labor-capital considerations (if such areas are simply attractive to plunder by the government or by other groups). rainfall measures the average annual rainfall in the areas controlled by a group, and it is included in model 6. We find no relationship between this variable and WGI. For reasons related to those regarding precipitation, groups that live in areas that can be used for agriculture might have more group-based inequality and might be attractive to governments or other groups. Model 7 therefore includes agriculture, which measures the percentage of a group’s area that is covered with agricultural production. Again, we find no relationship between this variable and the group Gini.Footnote 28 Finally, model 8 presents the results when all of these additional variables are included, and the findings reinforce those above: only group GDP and group language fractionalization have a robust association with a group’s Gini.

Since our regressions with standard controls include the group’s GDP, the analysis in Table 8 identifies one additional variable that raises concern about omitted variable bias: linguistic fractionalization, which we posit may also be related to intra-group trust, which itself may be related to conflict. Since this variable has a precisely estimated relationship with the group Gini, it is important to assess the robustness of the WGI coefficients when linguistic fractionalization is included in the conflict models. We therefore added linguistic fractionalization to models 3–7 in Table 1, and the results in Table 9 show no association between linguistic fractionalization and conflict. The coefficient for the group Gini remains positive and precisely estimated across all five models. Thus, there is little evidence that the results in the preceding discussion are not robust to the inclusion of this variable.Footnote 29

Before turning to statistical efforts to address issues of causal identification, it is important to acknowledge one additional issue that is related to the endogeneity of the WGI measures, which concerns the nature and evolution of group boundaries themselves. Our measures of group inequality require that we begin with a definition of a group: it is impossible to proceed otherwise. But we know that such definitions are based on social processes and scholarly perceptions of these processes, and that they often begin with patterns of migration that are linked to genetic differences which themselves have implications for the processes we study (Ashraf and Galor 2013b). Moreover, the salience of group boundaries can change over time, and these changes may be due in part to the outcomes of conflict. Thus, even if we could nail down the causal effect of WGI using the group definitions we employ, we do not know if factors affecting the definition of these boundaries in fact shape conflict patterns. This is a problem inherent to all studies that must assign individuals to groups, and though we cannot address it here, we feel it is useful to flag it as a topic for future research.

Employing Oster (2017) to assess the possible influence of omitted variables

To test for the possible importance of omitted variable bias, we have computed the amount of correlation between the unobservables and WGI, relative to the correlation of the observables and WGI, that would be necessary to explain away our key result (i.e., to make the coefficient of WGI equal to zero). In its simplest formulation, this value, denoted by \(\delta \), can be computed as follows (see Oster 2017):Footnote 30

$$\begin{aligned} \delta =\frac{\beta _c}{\beta _{nc}-\beta _{c}}\frac{R^2_{c}-R^2_{nc}}{R^2_{\text {max}}-R^2_c}, \end{aligned}$$

where \(\beta _c\) and \(\beta _{nc}\) are the coefficients of WGI in a model that contains all the observable controls and one with no or a few controls, respectively, and \(R^2_{c}\) and \(R^2_{nc}\) are the \(R^2\)’s associated with those regressions. Finally, \(R^2_{\text {max}}\) is one’s assumption about the maximum \(R^2\) that could be attained if all the relevant controls were observed.
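As a worked illustration, the simplified expression above can be computed directly. The following is a minimal sketch: the function name and the input values are hypothetical and are not taken from our tables, and it implements only the simplest formulation shown above rather than Oster's full estimator.

```python
def oster_delta(beta_c, beta_nc, r2_c, r2_nc, r2_max):
    """Simplified delta: relative importance of unobservables needed to
    drive the coefficient of interest to zero.

    beta_c, r2_c   : coefficient and R^2 from the model with all controls
    beta_nc, r2_nc : coefficient and R^2 from the restricted model
    r2_max         : assumed maximum attainable R^2
    """
    if beta_nc == beta_c:
        raise ValueError("coefficient does not move when controls are added")
    if r2_max <= r2_c:
        raise ValueError("r2_max must exceed the R^2 of the full model")
    return (beta_c / (beta_nc - beta_c)) * ((r2_c - r2_nc) / (r2_max - r2_c))


# Hypothetical inputs, with r2_max set to 1.3 times the full-model R^2.
r2_full = 0.6
delta = oster_delta(beta_c=0.8, beta_nc=1.0, r2_c=r2_full, r2_nc=0.2,
                    r2_max=1.3 * r2_full)
print(round(delta, 2))
```

With these illustrative numbers the coefficient moves little relative to the gain in \(R^2\) when controls are added, so \(\delta \) is well above 1.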

Table 10 Assessing the importance of omitted variable bias

A value of \(\delta = 2\), for example, would suggest that the unobservables would need to be twice as important as the observables to produce a treatment effect of zero. Altonji et al. (2005) and Oster (2017) suggest that values of \(\delta \) larger than 1 in absolute value can be interpreted as evidence that omitted variable bias is unlikely to explain the observed result. A value of 1 (or larger) means that the unobservables would need to be at least as important as the observables to produce a treatment effect of zero. Since researchers typically choose the controls they believe ex ante to be the most important (Angrist and Pischke 2009), situations where the effect of the unobservables is larger than that of the controls are deemed unlikely.

Table 10 presents the results. The full model corresponds to that in column 7 in Table 1. Restricted models I, II and III correspond to models with no controls, with year fixed effects, and with country and year fixed effects, respectively. To implement the test, a value for \(R^2_{\text {max}}\) needs to be chosen. To select this value, we have followed the advice in Oster (2017), who suggests using a value equal to 1.3 times the value of the \(R^2\) obtained in the regression with all controls, which in our case equals 0.89. We have also considered two additional values, so that \(R^2_{\text {max}}=\{0.8, 0.85, 0.9\}\). The figures in Table 10 correspond to the values of \(\delta \) for each of the 9 cases considered. In most cases we obtain values of \(\delta \) that are larger than 1 in absolute value, which suggests that the significance of WGI is unlikely to be due to omitted variable bias. Only in one of the nine combinations (a model with country and year fixed effects, Restricted model III, combined with a very high value of \(R^2_{\text {max}}\), equal to 0.9) do we obtain a value of \(\delta \) that is smaller than 1. This test therefore suggests that it is very unlikely that the results for within-group inequality are driven by omitted variables.

Reverse causation

The test for reverse causation is described in Sect. 5.3. Table 11 presents the results.

Table 11 Reverse causation: civil conflict and changes in WGI

WGI and conflict onset: additional analysis

This section revisits the analysis in Kuhn and Weidmann (2015), as discussed in the paper’s Sect. 6.1. Model 1 in Table 12 replicates model 4 in Table 1 of Kuhn and Weidmann’s paper, which relies on the same set of control variables as in Cederman et al. (2011, “CWG”), but which (unlike CWG) includes country fixed effects. Since we are considering all years (starting in 1992) for which EPR data is available, our data merging process results in more observations than in Kuhn and Weidmann’s original paper.Footnote 31 However, the results are similar to those presented in Kuhn and Weidmann: the coefficient for igi is positive and significant at the 10% level, though not as precisely estimated as in their paper.Footnote 32

Table 12 Within-group inequality (measured using nightlights) and conflict onset

This result for igi, however, is not at all robust. Omitted variable bias is one concern. We explored the robustness of the results to omitted variables employing the technique developed by Oster (2017), as in Table 10, in a setup where the full model is that in Kuhn and Weidmann (2015) (column 1 in Table 12) and the three restricted models contain (i) no controls, (ii) country fixed effects and (iii) country and year dummies.Footnote 33 In all cases the values of \(\delta \) are close to zero, raising concerns that the estimates for igi in model 1 could suffer from omitted variable bias.Footnote 34

This concern is clear from the model in column 2, which adds the group-level control variables that are present in the models in Sect. 4 and that are taken directly from the Growup portal: group gdp, group elev. (sd), group size, group diamonds and group oil.Footnote 35 Though the coefficient for igi remains positive, it is now estimated with considerable error (\(p = .30\)). Of the five group-level variables added to model 2, however, only one is significant, group elev. (sd), so model 3 re-estimates model 2 omitting the insignificant group-level regressors. Model 1 also lacks two country-level controls, pop and polity. Column 4 adds these two variables and shows that pop has a positive and significant coefficient while polity is estimated with a large error. The coefficient for igi decreases to less than half the value obtained in column 1 and remains very imprecisely estimated (\(p = .42\)). Finally, it is useful to note that the results in model 1 require the presence of a particular group: the East Timorese in Indonesia. Model 5 presents results from re-estimating model 1 without this group, and the coefficient for igi is insignificant. Columns 6 and 7 reproduce columns 3 and 4 omitting this group, obtaining similar results.

We therefore find little support for a robust association between within-group inequality and conflict onset, regardless of whether WGI is measured using nightlights data or surveys. While it is always possible that these null results are due to measurement error, we suspect that is not the issue here. Instead, the null results are consistent with the ER argument, which emphasizes that WGI increases the capacity to fight rather than creating incentives to do so.

Variable definitions and summary statistics

This section provides detailed definitions for the variables employed in the empirical analysis as well as a table of summary statistics.

Variable definition

Conflict variables

intensity: “Group level Conflict intensity”. We assign a value of 0 to group i in year t if that group is at peace in year t, a value of 1 if group i is engaged in armed conflict against the state resulting in more than 25 battle-related deaths but less than 1000 in year t, and a value of 2 if group i is engaged in a conflict resulting in more than 1000 battle-related deaths in year t. Ethnic groups are coded as engaged in conflict if a rebel organization involved in the conflict expresses its political aims in the name of the group and a significant number of members of the group participate in the conflict. Source: Growup portal.

incidence: “Group level Armed conflict”. A binary measure taking a value of 1 for those years where an ethnic group is involved in armed conflict against the state resulting in more than 25 battle-related deaths, that is, for the years where intensity is either 1 or 2. Source: Growup portal.

conflict-share: Share of years in the period 1992–2010 in which a group has been in conflict against the state resulting in more than 25 battle-related deaths.
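The coding rules for intensity, incidence and conflict-share can be summarized in a short sketch. The function names and the example death counts are hypothetical; in particular, treating exactly 1000 deaths as the higher category is an assumption, since the definition above does not say how that boundary case is handled.

```python
def intensity(battle_deaths):
    """Code one group-year: 0 = peace, 1 = minor conflict, 2 = major conflict."""
    if battle_deaths >= 1000:   # assumption: exactly 1000 counts as the top category
        return 2
    if battle_deaths > 25:      # "more than 25" as stated in the definition
        return 1
    return 0


def incidence(level):
    """Binary indicator: 1 whenever intensity is 1 or 2."""
    return 1 if level in (1, 2) else 0


def conflict_share(levels):
    """Share of the observed years in which the group is in conflict."""
    return sum(incidence(level) for level in levels) / len(levels)


# Hypothetical yearly battle-death counts for one group.
deaths_by_year = [0, 30, 1200, 10]
levels = [intensity(d) for d in deaths_by_year]
print(levels, conflict_share(levels))
```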

battle deaths (best): Number of battle-related deaths according to the best estimate from Lacina and Gleditsch (2005). One is added if the number of battle deaths is zero. In situations where the best estimate was missing, we used the low estimate instead.

battle deaths (low): Log of the number of battle-related deaths according to the low estimate from Lacina and Gleditsch (2005). One is added if the number of battle deaths is zero.
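The two rules used for the battle deaths variables (falling back to the low estimate when the best estimate is missing, and adding one to a zero count before taking logs) can be sketched as follows. The function names, the use of None for a missing estimate and the choice of the natural logarithm are assumptions made for illustration only.

```python
import math


def select_estimate(best, low):
    """Use the best estimate, falling back to the low estimate if it is missing."""
    return best if best is not None else low


def log_deaths(deaths):
    """Log of battle deaths, adding one when the count is zero so the log is defined."""
    if deaths == 0:
        deaths = 1
    return math.log(deaths)   # assumption: natural logarithm


print(log_deaths(0))                          # zero deaths map to log(1)
print(log_deaths(select_estimate(None, 50)))  # missing best estimate: low estimate is used
```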

onset: “Group level Conflict Onset”. A binary measure reflecting the first year in which a group enters a conflict, as defined in incidence above. Source: Growup portal.

intensity (ongoing only): This variable is identical to intensity, except that onset conflict years are set to zero.

incidence (ongoing only): This variable is identical to incidence, except that onset conflict years are set to zero.

peaceyears: Number of years since the last conflict observation. Source: Growup portal.

Within-group Inequality variables

\(\textsc {g}^{{R}}\): Group Gini coefficient, computed using survey data and adjusted using the Ratio approach, as described in Sect. 3.1 and “Appendix C.2”. All available observations for a group are averaged and assigned to all the years in the period 1992–2010.

\(\textsc {g}^{{I}}\): Group Gini coefficient, computed using survey data and adjusted using the Intercept approach, as described in Sect. 3.1 and “Appendix C.2”. All available observations for a group are averaged and assigned to all the years in the period 1992–2010.

\(\textsc {g}^U\): Group Gini coefficient, computed using survey data, unadjusted. All available observations for a group are averaged and assigned to all the years in the period 1992–2010.

\(\textsc {g}_t^R\)-pre: Group Gini coefficient, computed using survey data and adjusted using the Ratio approach. The value at time t of this variable is computed by averaging all available surveys in year t and in prior years. If no observations are available until period t, this variable is set to missing.
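The construction of this time-varying measure can be sketched as a running average. The function name, the dictionary layout and the Gini values below are hypothetical, and pooling all surveys with equal weight is an assumption about how the average is taken.

```python
def pre_period_gini(surveys, years):
    """Average of all survey-based Ginis observed in year t or earlier.

    surveys: dict mapping a survey year to a list of group Gini values
    years:   iterable of years for which the variable is needed
    Returns a dict mapping each year to the average, or None (missing)
    when no survey is available up to that year.
    """
    result = {}
    for t in years:
        values = [g for year, ginis in surveys.items() if year <= t for g in ginis]
        result[t] = sum(values) / len(values) if values else None
    return result


# Hypothetical surveys for one group: one in 1995 and two in 2000.
surveys = {1995: [0.40], 2000: [0.50, 0.60]}
series = pre_period_gini(surveys, range(1994, 2002))
print(series)
```

Because only current and past surveys enter the value for year t, the measure does not use information collected after the conflict outcomes it is used to explain.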

igi: Group-level Gini coefficient computed using nightlight emissions. Source: Kuhn and Weidmann (2015).


gdp: log of real GDP per capita, lagged 1 year. Source: Penn World Table (2015).

pop: log of the population in millions, lagged 1 year, as reported by the Penn World Table (2015).

xpolity: democracy score based on Polity IV, lagged 1 year. It combines 3 out of the 5 components of Polity IV (XCONST, XRCOMP, XROPEN) and leaves out the two components (PARCOMP and PARREG) that are related to political violence, and hence are likely to be endogenous. It ranges from -6 (maximum level of autocracy) to 7 (maximum level of democracy). See Vreeland (2008) for details.

group gdp: Survey-based group GDP per capita, lagged 1 year. See Section C.1.3 for details on its construction.

group size: Relative size of the group. Source: GrowUp portal.

group oil: dummy variable indicating whether the group has oil in its homeland. Source: PRIOGRID, through the Growup portal.

group elev. (sd): the standard deviation of the elevation of the ethnic homeland. Source: GrowUp portal.

excluded: dummy variable indicating whether the group is excluded from power. Source: GrowUp portal.

hi(ln): group-level measure of horizontal inequality, defined as \(\log |g-\bar{G}|\), where g is the group’s GDP per capita (the above-defined group gdp variable) and \(\bar{G}\) is total GDP minus the GDP corresponding to group i.

lineq2: Measure of horizontal inequality, defined as \(\left( \log \frac{g}{G}\right) ^2\), where g is the group’s GDP per capita and G is the (unweighted) average of GDP per capita of all groups. Source: we have computed this measure using the surveys and GECON data. Details are provided in the corresponding tables.

low: It is defined as \(\max \{1, \frac{g}{G}\}\), where g is the group’s per capita GDP and G is the average of the per capita GDP of all groups. Source: we have computed this measure using the surveys and GECON data. Details are provided in the corresponding tables.

high: It is defined as \(\max \{1, \frac{G}{g}\}\), where g is the group’s per capita GDP and G is the average of the per capita GDPs of all groups. Source: we have computed this measure using the surveys and GECON data. Details are provided in the corresponding tables.
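For concreteness, the four horizontal inequality measures can be computed as in the sketch below, which follows the definitions exactly as written above. The function names and the GDP figures are hypothetical, and the natural logarithm is an assumption.

```python
import math


def hi_ln(g, g_bar):
    """log|g - G_bar|; undefined when g equals G_bar."""
    return math.log(abs(g - g_bar))


def lineq2(g, G):
    """Squared log ratio of group GDP per capita to the average across groups."""
    return math.log(g / G) ** 2


def low(g, G):
    """max{1, g/G}, as written in the definition of low."""
    return max(1.0, g / G)


def high(g, G):
    """max{1, G/g}, as written in the definition of high."""
    return max(1.0, G / g)


# Hypothetical group with per capita GDP of 2000 against an average of 4000.
g, G = 2000.0, 4000.0
print(lineq2(g, G), low(g, G), high(g, G))
```

In this example the group is poorer than average, so one of the two ratio measures is truncated at 1 while the other records the size of the gap.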

poverty2: Percentage of the total population with income lower than 2 dollars a day. Source: World Bank.

infant mortality: Infant mortality rate. This variable is a snapshot from the year 2000. Source: PRIOGRID, accessed through Growup portal.

N. excluded groups: Number of groups in the country excluded from power. Source: GrowUp portal.

group diamonds: dummy variable indicating whether the group has diamonds in its homeland. Source: PRIOGRID, through the Growup portal.

pow. balance: Demographic power balance between the group and the group(s) in power. Denoting the populations of the group and the group(s) in power as s and S, respectively, the power balance is defined as s/(s+S) if the group is excluded, and as s/S otherwise.
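The two cases of this definition can be written compactly. The function name and the population figures in the example are hypothetical.

```python
def power_balance(s, S, excluded):
    """Demographic power balance of a group.

    s:        population of the group
    S:        population of the group(s) in power
    excluded: True if the group is excluded from power
    """
    return s / (s + S) if excluded else s / S


# Hypothetical populations, in millions.
print(power_balance(10, 30, excluded=True))    # 10 / (10 + 30)
print(power_balance(10, 40, excluded=False))   # 10 / 40
```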

linguistic frac: Within-group linguistic fractionalization index. Source: Growup portal.

religious frac: Within-group religious fractionalization index. Source: Growup portal.

regional: A dummy measuring whether a group is regionally concentrated. Source: Growup portal.

urbanized: The proportion of a group’s homeland that is urbanized. Source: Prio-Grid.

rainfall: Average annual rainfall in the areas controlled by a group. Source: Prio-Grid.

agriculture: Percentage of a group’s area that is covered with agricultural production. Source: Prio-Grid.

Summary statistics

Table 13 provides summary statistics for the variables employed in Sect. 4.

Table 13 Summary statistics

Huber, J.D., Mayoral, L. Group inequality and the severity of civil conflict. J Econ Growth 24, 1–41 (2019).

  • Ethnicity
  • Within-group inequality
  • Civil conflict

JEL Classification

  • D63
  • D74
  • J15
  • O15