Any attempt to operationalize these three dimensions over space and time requires a considerable amount of compromise. One particular challenge in choosing indicators is the interdependence of different functions of statehood. We thus need to be careful not to build these interdependencies into our operationalization. For example, the ability of a state to tax its population requires a high degree of violence control (enforcing compliance) or empirical legitimacy (equivalent to voluntary compliance) and is at the same time the prerequisite for projecting implementation capacity (providing public goods).
Two further factors compound the attribution problem: some states are unwilling to strictly enforce laws (including taxation) in the hope of obtaining popular consent (Holland 2016), and alternative revenues from natural resources or foreign aid are occasionally available. Our selection of indicators to measure state functions must strike a careful balance between conceptual fit (validity), measurement precision (reliability), and availability (coverage). We discuss our choices below.
In order not to truncate our sample artificially, we include all independent countries with at least 250,000 inhabitants in our universe of cases, and not only those suspected to be “fragile” based on whatever prior knowledge is available. We aim at measuring our latent state functions directly, e.g., the demonstrated ability of a state to implement policy (implementation capacity), but we often have to resort to observable outcome variables as imperfect proxies for these latent concepts, e.g., the level of public service provision. Where available, expert assessments complement such proxy indicators, allowing us to capture dysfunctionalities that do not show up in observable indicators, such as the latent inability of states to control their territory in the absence of actual violence.
The violence control dimension represents the state’s ability to mute competing claims to the monopoly of violence and excessive manifestations of violence. We draw on two proxy variables to measure the level of violence control at the disposal of the state. One is battle-related deaths (Footnote 5). This includes all casualties directly related to combat occurring within the territory of a country. The measure reflects the intensity of internal and external attacks on the integrity of a state and thus the degree to which the state faces organized (but only acute) challenges to its monopoly of violence. Whereas war size is usually defined by absolute battle deaths, we employ battle deaths per 100,000 inhabitants because this better captures the impact violent conflict has on a country’s population. The second observational indicator of violence control is homicides, i.e., “unlawful death purposefully inflicted on a person by another person” (UNODC 2013, p. 9). Individual instances of homicide do not, in the vast majority of cases, stem from explicit challenges to the dominance of the state. But widespread lethal crime can be considered an indicator of organized crime in conflict with governing authorities, i.e., a systemic malfunction affecting the state’s claim to dominance. In addition to these observable count measures of lacking state control, the Bertelsmann Transformation Index (BTI) provides a direct expert assessment which is better able to detect latent conflict: the BTI monopoly of violence indicator (BTI 2016, p. 16).
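The population scaling described above can be illustrated with a minimal sketch. The function name and the example figures are hypothetical and serve only to show why a per-capita rate ranks countries differently than an absolute count does.

```python
def per_100k(count: int, population: int) -> float:
    """Express an absolute count (e.g., battle deaths) per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical figures: the same absolute death toll in a small and a large country.
small_country = per_100k(1_000, 1_000_000)
large_country = per_100k(1_000, 100_000_000)
```

With identical absolute battle deaths, the smaller country in this illustration receives a rate one hundred times higher, which is the sense in which the per-capita measure tracks the burden on the population rather than the size of the war.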
The implementation capacity dimension represents the state’s ability to carry out policies. While the classical state capacity literature is agnostic about what this capacity is being employed for in detail, the state fragility literature is explicit about the state’s obligation to provide something to the people in return for their obedience. This something may range from the minimalist “night-watchman” state to an extensive welfare state. We opt for a rather minimalist definition that is restricted to assisting citizens with basic life chances. These include protection from (relatively easily) avoidable harmful diseases, a basic education that allows for active participation in social and economic activities, and a basic administration that regulates social and economic activities sufficiently to increase collective gains and avoid massive negative externalities. Our proxies for disease control are the share of the population with access to improved drinking water sources and under-five mortality per 1,000 births, hereafter child mortality. Our education proxy is the rate of primary school enrollment. These are all outcome measures that may also be influenced by other actors, so we require an additional corrective to assess whether the state’s bureaucracy itself is actually less capable than it seems. BTI basic administration provides such a corrective. It is an expert-based assessment of the existence of fundamental structures of a civilian administration, such as a basic system of courts and tax authorities (BTI 2016, p. 17). Other approaches to measuring core implementation capacity rather than public good outcomes have been proposed, but none of these is available with global coverage over a sufficient number of years (e.g., Lee and Zhang 2017).
Legitimacy is notoriously difficult to measure (von Haldenwang 2017; Weatherford 1992). In line with our conceptualization of empirical legitimacy as the acceptance of state rule, we explicitly do not aim to assess normative legitimacy, i.e., the extent to which the state’s claim to rule conforms to a predefined set of norms. Unfortunately, no valid and reliable survey data of sufficient coverage on perceived legitimacy exists (cp. Call 2011, p. 308). The World Values Survey (WVS) provides data for only about seven percent of the country-years covered by our sample (Footnote 6). It would require an imputational overstretch to use this data. Nonetheless, Gilley (2006) has used the WVS to present one of the few rationalizations of empirical legitimacy across a significant number of countries. Yet even his study does not cover more than 72 countries (Footnote 7). Levi et al. (2009) have used Afrobarometer survey data to analyze the effect of trustworthiness of government and procedural justice on legitimacy. However, Afrobarometer and its siblings on other continents still do not provide sufficient coverage across time and space due to insufficient survey frequency. In addition, Levi, Sacks, and Tyler limited their analysis to “[c]ountries involved in a transition to democracy” (2009, p. 370), rightly assuming that in these contexts survey data would yield a reliable representation of respondents’ actual beliefs. Under conditions of a repressive government with an elaborate system of surveillance and control, by contrast, such an assumption would be more than daring.
In the absence of reliable survey data, our second-best option is thus to draw on indirect indicators of legitimacy. One of these is repression expressed in state-sponsored human rights violations. Due to its high cost, outright repression is a state’s last resort. It can thus serve as a proxy indicator, as Dogan (1992, p. 120) notes: “Theoretically, the lower the degree of legitimacy, the higher should be the amount of coercion. Therefore, in order to operationalize the concept of legitimacy it is advisable to take into consideration some indicators of coercion, such as the absence of political rights and of civil liberties.” We employ a new, continuous meta index of human rights protection developed by Fariss (2014). A similar reasoning applies to the cost of restricting press freedom. It will only be attempted when free media would undermine the state’s ability to claim the support of the wider population. We employ Freedom House’s “Freedom of the Press Data” to measure press freedom. Finally, a more legitimate state can be expected to drive fewer citizens into emigration, e.g., through political persecution. Even if people have no possibility of expressing their discontent publicly, they usually still have the option of “exit” (Hirschman 1970). The number of asylums granted in other countries per 100,000 inhabitants in the sending country is a good indicator of politically (rather than economically) motivated exit. To be sure, none of these indicators measures empirical legitimacy directly, and none of them is a perfect representation of the underlying concept. Yet, they jointly represent conditions of which at least one can be expected to be present in any state struggling to achieve domestic legitimacy. Hence, for want of better options, we consider this set of indicators the best approximation of empirical legitimacy available with sufficient coverage.
Some of our indicators do not report data for every country-year in our sample. In the case of homicides, for example, reporting is incomplete for many poor countries. BTI data is only published biennially. To close these gaps, we linearly interpolate missing data points within countries. Where data at the beginning or the end of a time series are missing, we extrapolate the nearest available score. Table 1 shows what share of observations is imputed for each indicator, and how many years we extrapolate, if necessary. Note that there may still be missing data for some country-years if missing observations lie outside the extrapolation ranges we define, or if countries have no data point at all for a particular indicator (e.g., most OECD countries for the BTI indicators). Table 2 shows the number of observations available after imputation. The Supplementary File provides full details on our imputation procedure and discusses its justification.
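The imputation logic described above can be sketched as follows. This is a simplified illustration and not the procedure documented in the Supplementary File: the function name `impute_series`, the parameter `max_extrapolate`, and the example values are assumptions introduced here. The sketch treats one indicator for one country as a yearly list, fills interior gaps linearly, and carries the first or last observed score outward for a limited number of years.

```python
from typing import List, Optional, Sequence


def impute_series(
    values: Sequence[Optional[float]], max_extrapolate: int
) -> List[Optional[float]]:
    """Fill gaps in one country's yearly series for one indicator.

    Interior gaps are linearly interpolated. Leading and trailing gaps
    receive the nearest observed score, but only up to `max_extrapolate`
    years away from it (a hypothetical stand-in for the ranges in Table 1).
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        return out  # no data point at all: the series stays missing

    # Interior gaps: straight line between neighbouring observed years.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            out[i] = out[left] + weight * (out[right] - out[left])

    first, last = known[0], known[-1]
    # Leading gap: carry the first observed score backward.
    for i in range(max(0, first - max_extrapolate), first):
        out[i] = out[first]
    # Trailing gap: carry the last observed score forward.
    for i in range(last + 1, min(len(out), last + 1 + max_extrapolate)):
        out[i] = out[last]
    return out


# Hypothetical series observed every second year, with gaps at both ends.
example = impute_series([None, None, 4.0, None, 8.0, None, None], max_extrapolate=1)
```

In this illustration the two outermost years remain missing because they lie beyond the one-year extrapolation range, which mirrors the remaining gaps noted in the text.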
Table 1 Imputation, truncation, and transformation of the indicators
Table 2 Summary statistics: dimension scores and imputed and transformed indicators
In order to combine the information across the indicators into dimension scores, we transform all raw data to scores ranging from 0 to 1, where higher values imply better outcomes. This is done by first truncating the raw variable scores at pre-defined lower and upper bounds. This step is necessary to prevent extremely large values from dwarfing the differences between other countries in this dimension. We calibrated these extremes so that the variables that best represent each dimension determine the lion’s share of each dimension’s scores. These variables are homicides, child mortality, press freedom, and human rights. Empirically, they exhibit sufficient amounts of exploitable variance in most countries in the world (unlike, e.g., battle deaths, which is often zero). Conceptually, these outcome variables proxy a deficiency of the state in its respective core function. The goal is thus not to normalize each indicator, but to give it a distribution that translates into dimension scores that correspond with their concept.
The chosen lower and upper bounds for truncation are listed in Table 1; the resulting impacts are listed in the last column of Table 2 (Footnote 8). After truncation, all variables are re-scaled to a zero-to-one scale. Some of the truncated and standardized indicator scores are strongly skewed, with very low frequencies at higher values. We assume that marginal effects decrease with higher values and thus take their logarithms (and bring them back to the zero-to-one scale). In a final step, we align all variables to range from their worst to their best extremes, inverting variables where necessary. Table 1 indicates how each indicator was treated in the transformation step.
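The sequence of truncation, re-scaling, optional log transformation, and alignment can be sketched in a few lines. The exact functional form is given in the Supplementary File and is not reproduced here, so the log step below, `log(1 + c·x) / log(1 + c)`, is an assumption chosen only because it maps the unit interval onto itself. The function name, the `curvature` parameter, and the bounds in the example are likewise hypothetical and do not correspond to the values in Table 1.

```python
import math


def transform(
    raw: float,
    lower: float,
    upper: float,
    use_log: bool = False,
    invert: bool = False,
    curvature: float = 100.0,
) -> float:
    """Map a raw indicator value to a 0-1 score where 1 is the best outcome."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    # Step 1: truncate at the pre-defined bounds.
    clipped = min(max(raw, lower), upper)
    # Step 2: re-scale to the zero-to-one interval.
    score = (clipped - lower) / (upper - lower)
    # Step 3 (optional): compress high values; assumed form, stays within 0-1.
    if use_log:
        score = math.log1p(curvature * score) / math.log1p(curvature)
    # Step 4: align so that higher always means better.
    if invert:
        score = 1.0 - score
    return score


# Hypothetical bounds for a "higher is worse" rate, e.g., homicides per 100,000.
no_homicides = transform(0.0, 0.0, 50.0, use_log=True, invert=True)
extreme_rate = transform(80.0, 0.0, 50.0, use_log=True, invert=True)
```

Under these assumptions, a value above the upper bound is treated like the bound itself, so a single extreme country cannot compress the differences among all others.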
A crucial question is now how to aggregate indicator scores within each dimension of fragility. The most widespread approach in index building is taking averages. This approach, however, has weak theoretical underpinnings. Why, for instance, should the absence of drinking water be made up for with higher enrollment rates? And if so, to what degree? Following Goertz (2006, pp. 128–131), we combine the transformed scores of our indicators with a “weakest link approach”: the score of each dimension per country-year is determined by the lowest value among the available indicators. Should fewer than two indicators be available, no dimension score is calculated (Footnote 9).
For example, a country with standardized scores of 1.0 for battle deaths, 0.5 for homicides, and 0.2 for the BTI assessment of the monopoly of violence will receive a violence control score of 0.2. The idea is that even if there is no civil war causing battle deaths, and even if reported homicide rates are rather average, there must be a reason for such a low expert assessment of the monopoly of violence that is not captured by the former two indicators. This reason could be severely under-reported homicide rates, or a latent threat to stability that does not yet translate into battles or violent crime. Our approach thus prevents the undesired effects of compensation (Munck 2009, p. 32). When calculating dimension scores as averages, a country that experiences more severe civil war battles could offset this deterioration by achieving lower criminal murder rates. Such a trade-off is not a valid translation of our concept of violence control (Footnote 10). The weakest-link approach is equivalent to considering each variable a necessary component of a functioning state in the respective dimension. Table 2 shows descriptive statistics of the imputed and transformed data and of our three dimension scores. For clarity, the Supplementary File describes the entire transformation procedure in mathematical notation.
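The weakest-link aggregation and its contrast with averaging can be made concrete with the worked example from the text. The function and the dictionary keys below are illustrative names, not taken from the replication material; missing indicators are represented as `None`.

```python
from typing import Dict, Optional


def dimension_score(indicators: Dict[str, Optional[float]]) -> Optional[float]:
    """Weakest-link score: the lowest transformed value among available indicators.

    Returns None when fewer than two indicators are available.
    """
    available = [v for v in indicators.values() if v is not None]
    if len(available) < 2:
        return None
    return min(available)


# Scores from the worked example in the text.
violence_control = {
    "battle_deaths": 1.0,
    "homicides": 0.5,
    "bti_monopoly_of_violence": 0.2,
}
weakest_link = dimension_score(violence_control)
# For comparison only: the compensating average that the approach avoids.
average = sum(violence_control.values()) / len(violence_control)
```

Here the weakest-link score equals the expert assessment of 0.2, whereas the average would lie well above it because the absence of battle deaths compensates for the low assessment.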