Measuring precision precisely: A dictionary-based measure of imprecision

How can we measure and explain the precision of international organizations’ (IOs) founding treaties? We define precision by its negative – imprecision – as indeterminate language that intentionally leaves a wide margin of interpretation for actors after agreements enter into force. Compiling a “dictionary of imprecision” from almost 500 scholarly contributions and leveraging insight from linguists that a single vague word renders the whole sentence vague, we introduce a dictionary-based measure of imprecision (DIMI) that is replicable, applicable to all written documents, and yields a continuous measure bounded between zero and one. To demonstrate that DIMI usefully complements existing approaches and advances the study of (im-)precision, we apply it to a sample of 76 IOs. Our descriptive results show high face validity and closely track previous characterizations of these IOs. Finally, we explore patterns in the data, expecting that imprecision in IO treaties increases with the number of states, power asymmetries, and the delegation of authority, while it decreases with the pooling of authority. In a sample of major IOs, we find robust empirical support for the power asymmetries and delegation propositions. Overall, DIMI provides exciting new avenues to study precision in International Relations and beyond.


Introduction
In the year 2000, Abbott, Keohane, Moravcsik, Slaughter, and Snidal edited a groundbreaking special issue predicated on the idea that international legalization can be described along three fundamental dimensions. This special issue has since amassed thousands of citations in the Web of Science and left a major impact on International Relations (IR) as a discipline. Still, the three dimensions pose sizable challenges to empirical measurement. Focusing on the dimension of "precision" in this article, we introduce a dictionary-based measure of imprecision (DIMI) that is fully replicable and overcomes the subjectivity sometimes encumbering existing studies. To pick just one example, while Abbott and colleagues (2000, p. 406) described the European Union (EU) as highly precise, Hooghe and colleagues (2019, p. 60) characterized the very same international organization (IO) as an "organization with studied imprecision." While this discrepancy partly goes back to different understandings and operationalizations of precision, it still raises the question of how precise the EU as an IO is precisely. DIMI answers this question in a systematic fashion and yields a continuous measure of imprecision applicable to all agreements. It is a significant advancement for studies of precision, addressing the fact that, despite the prominence of precision in political science, law, and economics, "no quantifiable guidelines have been elaborated [yet]" (Koremenos, 2016, p. 161).
DIMI proceeds in two steps. First, we reviewed how precision has been operationalized in around 500 scholarly contributions citing the work of Abbott and colleagues. This allowed us to compile a "dictionary of imprecision" based on the extant literature. Second, we use this dictionary to determine the share of imprecise sentences, i.e., those containing at least one vague word. This approach is in line with the insights generated by linguists, who have written about vagueness being "contagious" in the sense of a single vague word making the entire sentence vague. We thus measure precision by its negative, as the share of sentences indicating that states wanted to increase the margin of interpretation.
To validate DIMI, we identify patterns of precision in the founding treaties of IOs, drawing on the Measure of International Authority (MIA) dataset (Hooghe et al., 2017). We opted for IOs as an application of our measure for two reasons. First, IOs are sophisticated international institutions subject to hard bargaining among states. If our approach fails to produce valid results in the case of IOs, we should have little confidence that it can measure precision in other agreements. Second, the literature provides a set of theoretical expectations about the relationship between precision and other design features encountered in IOs, allowing us to highlight the utility of our approach by empirically testing these expectations. We find that greater power asymmetries and higher delegation in IOs are associated with higher levels of imprecision. By contrast, there is no association between precision and the number of states or the level of pooling. Overall, we present reasonable evidence that DIMI produces valid measurements and illustrate that it can significantly advance the study of precision in world politics and beyond.
In the next section, we conceptualize imprecision and explain how ambiguous language shapes this dimension of international legalization. In the third section, we review existing approaches to measure imprecision, introduce our dictionary-based measure, and explain how DIMI complements existing approaches. In the fourth section, we apply DIMI to the founding treaties of major IOs. We conclude by summarizing how our approach benefits future research.

Conceptualizing precision
The notion of precision is central to any understanding of legalization in world politics and has received wide attention in the literature. The two other dimensions in the legalization literature are delegation and obligation. Precision "logically precedes" (Manger & Peinhardt, 2017, p. 3) the other two because it shapes them. For example, compare a provision such as "the parties shall substantially increase exports" with "the parties endeavor to increase exports by 20 percent". The former provision, while obligatory, may be fulfilled by a 5 percent increase (the question is how much more is substantially more; see also Linos & Pegram, 2016, p. 593). This imprecision makes it difficult for any party to seek redress. Under the latter provision, which is precise but not obligatory, the same increase would cast doubt on a party acting in good faith and at least entail reputational costs. This conceptual superiority of precision, combined with the wide scholarly attention that precision as a separate dimension of legalization has received (see our review of the literature below and, in more extensive form, in Appendix A2.1), 1 underlines the need for novel methodological approaches to measuring precision systematically.
Abbott and colleagues famously defined precision as a rule that "specifies clearly and unambiguously what is expected of a state" (2000, p. 412). Similarly, Koremenos (2016, p. 160) defined the precision of international agreements as the "exactness or vagueness of its prescribed, proscribed, and authorized behaviors." As a final example, Best saw a central component of precision in linguistic ambiguity and that a text is "open to multiple interpretations because certain words are ambiguous" (2012, p. 677; italics added). All these definitions share a common conceptual core in that all, at least partly, understand precision as defined by the language used. The more international agreements are permeated by vague words, the higher their imprecision, which is not necessarily a deficiency but may help states to achieve higher (albeit "softer") levels of cooperation. 2
1 The Appendix and replication materials are available on The Review of International Organizations' webpage or Harvard Dataverse at https://doi.org/10.7910/DVN/2DACNY.
2 In addition to states using vague language, Linos and Pegram (2016, p. 591) highlight the importance of options and caveats in international agreements to define states' flexibility during implementation (see also Baccini et al., 2015). Furthermore, precision could be understood as the extent to which different provisions in agreements reinforce (more precise) or contradict each other (less precise) or lack provisions on key aspects altogether (gaps) (Búzás & Graham, 2020). In this article, we are only interested in the precision of the language used.
The level of precision spells profound consequences for international cooperation. For example, Chayes and Chayes (1993) argued that imprecision decreases compliance because states find it difficult to identify non-compliant behavior. By contrast, Goldstein and Martin (2000) argued that precision inhibits compliance. In trade agreements, precision gives rise to clearer expectations about distributional implications. The information about who gains and who loses incentivizes domestic groups to pressure their governments into non-compliance. This effect also underlines that precision of international agreements can shape key domestic processes and is thus likely carefully considered by state governments. Investigations of the consequences of precision are hampered not least by the dearth of valid quantitative data, which our approach provides.
Precision defined by ambiguous language has garnered wide scholarly attention beyond IR. In economics, Nobel Prize winners Oliver Hart and Bengt Holmström defined "incomplete" contracts as those containing "vague and ambiguous" language (1987, pp. 131-134). Jean Tirole, another Nobel Prize winner, similarly acknowledged that "[incomplete] contracts are vague" (1999, p. 471). Given that actors are boundedly rational, cannot foresee all future contingencies, and face costs in drafting agreements, economists conceive of all written contracts as "unavoidably incomplete" (Williamson, 2000, p. 601; italics original). Still, we subscribe to the view that some contracts are more incomplete than others (Mattli & Stone Sweet, 2012, p. 8), with the ability to discern differences hampered by the substantial challenges of empirically measuring precision.
Law is another discipline greatly concerned with the precision of language (Asgeirsson, 2020; Keil & Poscher, 2016). Some legal scholars uphold the notion that legal texts are invariably precise because vague provisions leave actors with little guidance and contradict the idea of rule-governed behavior (Mellinkoff, 1963; Tiersma, 1999). Other legal scholars are closer to our approach and search for precision in words noting, similar to the legalization literature, that imprecision is not necessarily a deficiency but can be "valuable" to lawmakers (Endicott, 2011, p. 14). As Hafner-Burton and colleagues (2012, p. 74) noted, thus far "little collaboration has occurred between law and political science" on the precision of international agreements. Our approach for measuring precision thus carries profound implications beyond IR and can facilitate interdisciplinary collaboration.
In sum, we conceptualize imprecision as a gradual feature of international institutional design, ranging from high imprecision in agreements where vague language is pervasive to high precision where such linguistic elements are entirely absent. This implies that imprecision and precision are the endpoints of one continuous scale based on the occurrence of vague language. We also note that, in contrast to studies conceiving precision as a binary phenomenon, our approach is closer to the original conceptualization by Abbott and colleagues (2000, p. 415), who offered five broad indicators (from specific "rules" to general "standards") and clearly also understood precision as a gradual feature of international legalization.

Measuring (Im-)precision
We now introduce our approach in greater detail. Our dictionary-based measure of imprecision (DIMI) is replicable, can process vast amounts of text at virtually no cost, and generates a valid and continuous measure of precision. DIMI complements existing approaches in key respects. First, it improves upon approaches based on human judgment (Hafner-Burton et al., 2015; Koremenos, 2016), which can be challenging to replicate transparently and invite subjectivity. Second, DIMI complements approaches in international political economy measuring precision via context-specific clauses (Lechner, 2016; Manger & Peinhardt, 2017). These approaches are replicable but cannot be generalized beyond the contexts for which they were designed. Third, studies on environmental agreements measure precision by searching for specific targets and quotas (Bernauer et al., 2013, p. 485; Böhmelt & Spilker, 2016, p. 77). While this approach is also replicable, it is again not generalizable to contexts where targets are uncommon and yields only a binary measure of precision. DIMI provides significant benefits compared to existing approaches (see Appendix A2.1 for more operationalizations of precision in the literature).
Our approach proceeds in two steps. First, we compiled a dictionary of words indicating imprecision from the scholarly literature. Second, we used the dictionary to identify the share of sentences containing at least one of them.
In the first step, we systematically reviewed the literature to infer which words signal ambiguity in international agreements. Drawing on central contributions in IR, we compiled a dictionary of imprecise words that scholars have identified in various international agreements. We started from the two foundational contributions, Abbott and colleagues (2000) and , and identified all articles citing either of these and attracting at least five citations. This selection left us with 473 articles, primarily published in International Relations and Law, which also referred to additional sources such as scholarly monographs, which we included in our review. We found seven words that constitute the core of our dictionary: "appropriate," "reasonable," "critical," "necessary," "essential," "substantial," and "adequate" (for a detailed list of where we encountered these words, see Appendix A2.1). We extended the dictionary with synonyms of these words from The Oxford Thesaurus of English. Ultimately, our dictionary contains 54 words signaling vague language. We refrained from including expressions such as "common interest," "good faith," or "opinion," which can also be described as imprecise. Linguists similarly argue that many, if not most, words are "fuzzy" (Barker, 2002; Keefe, 2000). But adopting this expansive interpretation would require us to add a vast array of words across all lexical categories to our dictionary of imprecision, including all gradable adjectives and many nouns. However, rather than focusing on the inherent ambiguity of language, DIMI only includes words revealing a genuine intention among states to increase the scope for interpretation in international agreements. Put differently, states have to use some words to write contracts, but they only use some of them habitually to increase the ambiguity of provisions. These words are the focus of our approach.
In the second step, we determined the share of sentences containing at least one of the words in our dictionary. This follows the insight of linguists that, while context still matters and can help to reduce ambiguity, vagueness and context are "in principle independent properties" (van Rooij, 2011, p. 126). Furthermore, vagueness is "contagious, in the sense that expressions built up from vague predicates are often themselves vague as a result" (Barker, 2010, p. 1038; see also van Rooij, 2011, p. 124). Consequently, we do not count the number of occurrences of imprecise words, of which there may be several within the same sentence. Instead, we break up documents into sentences as a delimiter of where contagion stops. We follow the exogenous approach using punctuation marks (Däubler et al., 2012), such as semicolons and colons, as natural sentence delimiters. After dropping sentences of four words or fewer to reduce noise in the data, the text corpus contains almost 20,000 sentences across the 76 IO founding treaties. Whenever at least one of the words in our dictionary forms part of a sentence, we code the whole sentence as imprecise.
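The sentence-level coding described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: only the seven core words are used (the full 54-word dictionary is in the authors' appendix), matching is case-insensitive on whole words without stemming, and sentences are split naively on periods, semicolons, colons, question marks, and exclamation marks. The function names and the sample text are hypothetical.

```python
import re

# Seven core words reported in the text; the thesaurus synonyms that complete
# the 54-word dictionary are not reproduced here.
CORE_WORDS = {"appropriate", "reasonable", "critical", "necessary",
              "essential", "substantial", "adequate"}

def split_sentences(text, min_words=5):
    """Split on punctuation and drop fragments of four words or fewer."""
    fragments = re.split(r"[.;:!?]+", text)  # naive: also splits abbreviations
    sentences = [f.strip() for f in fragments]
    return [s for s in sentences if len(s.split()) >= min_words]

def is_imprecise(sentence, dictionary=CORE_WORDS):
    """A sentence is imprecise if it contains at least one dictionary word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(t in dictionary for t in tokens)

def dimi(text, dictionary=CORE_WORDS):
    """Share of retained sentences coded as imprecise (0.0 if none remain)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = sum(is_imprecise(s, dictionary) for s in sentences)
    return flagged / len(sentences)

# Hypothetical treaty-like text, invented for illustration only.
sample = ("The Council shall take all necessary and appropriate measures to "
          "facilitate trade; each member shall notify the Secretariat within "
          "thirty days. Done at Geneva.")
score = dimi(sample)
```

In the toy text, the first clause contains two dictionary words but is counted once, the second clause contains none, and the three-word closing fragment is dropped, so one of two retained sentences is coded as imprecise.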
DIMI yields a continuous measure of imprecision, ranging from zero (if no sentence contains any of the words in our dictionary) to one (if all sentences contain at least one of the words). We do not dispute that imprecision may be more consequential in some sentences than others. However, we have not found an effective approach for weighting sentences and every ambiguous sentence adds to the overall level of imprecision. Nonetheless, researchers could also use DIMI to create more fine-grained measures, for instance, by comparing the imprecision of human rights provisions across agreements. Similarly, scholars may use our approach to compare imprecision across various substantive and procedural provisions (Koremenos, 2016). Still, our interest here is to assess the general level of precision and we will show how DIMI can advance our understanding of precision in the founding treaties of IOs.
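One way to write the measure down compactly is the following, in notation that is ours for illustration and does not appear in the original: $S_d$ is the set of retained sentences of document $d$, $W(s)$ the set of words in sentence $s$, and $D$ the dictionary.

```latex
\mathrm{DIMI}(d) \;=\; \frac{1}{|S_d|} \sum_{s \in S_d}
  \mathbb{1}\!\left[\, W(s) \cap D \neq \emptyset \,\right],
\qquad 0 \le \mathrm{DIMI}(d) \le 1 .
```

The indicator equals one whenever a sentence contains at least one dictionary word, regardless of how many, which is how the contagion assumption enters the measure.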

Validating imprecision in IOs
We validate DIMI by gauging imprecision across 76 IOs from the MIA dataset (Hooghe et al., 2017). All these IOs have at least three state members included in the Correlates of War dataset, distinct physical headquarters or a website, a formal structure such as an administrative body, at least 50 permanent staff, a written founding treaty, and a decision-making body meeting at least annually (Appendix Table A1). The advantage of using MIA is that it contains fine-grained information on IOs that vary considerably in terms of issue area, founding year, geographic scope, and policy functions. This sample enables us to gauge the validity of DIMI across a range of organizations. Yet, MIA is limited to authoritative IOs. As a result, the effects should come out clearly but may not extend to all extant IOs due to selection bias (Roger & Rowan, 2022). Still, a focus on the most important IOs provides an ideal testing ground for our approach.
Using the UN Treaty Series, IO websites, and requests to IO secretariats, we collected all founding treaties and converted them into machine-readable text, manually inspecting them for conversion errors. We limit our analysis to the body of the founding treaties, excluding preambles, annexes, and protocols. Preambles are primarily exhortative, which should lead to less bargaining over formulations. Annexes and protocols often run to hundreds of pages of technical detail, which we could not inspect for conversion errors due to time constraints. We assume that the body of agreements reflects an agreement's overall level of precision.

Validating sentences
We begin with verifying the validity of our approach on the level of individual sentences. Table 1 lists the ten most frequent words identified by the dictionary across all IOs (for more context on these references, see Appendix Table A2.3). The results demonstrate that many words in our dictionary hardly feature in IO founding treaties. The words "necessary," "appropriate," and "reasonable" alone are found in over 1,500 imprecise sentences, whereas the seven words completing the top ten combined yield only about 500 hits. Except for "critical," all words we inferred directly from the literature can be found in the top ten. The first synonym is "sufficient" in sixth position. IR scholars' intuition that imprecision results from a limited number of words, which our dictionary-based approach builds on, is confirmed by these results.
The examples also highlight that our dictionary identifies substantively and procedurally important provisions. While the literature often focuses on substantive provisions (Bernauer et al., 2013; Koremenos, 2016, p. 158; Spilker & Böhmelt, 2013), our measure also identifies procedural issues that are regularly central to the interpretation and application of international agreements. The example for "necessary" from the Convention Establishing the European Free Trade Association (EFTA) gives the Council a wide margin of interpretation on how to facilitate (or hinder) the establishment of businesses in other member states, which was a complicated issue to resolve in the negotiations (Curzon-Price, 1998, pp. 26-27). Similar examples, ranging from import regulations in the case of NAFTA, through external tariffs (CAN), taxation (CEMAC), and air navigation (ICAO), to the functioning of a common market (COMESA), show that imprecision is not limited to marginal provisions but reaches to the core of an IO's raison d'être.
The example for "appropriate" taken from the International Labour Organization (ILO) illustrates that DIMI also identifies imprecision in procedural provisions. The wide margin of interpretation going back to the vague language  (Schmidt, 2018); the "gravity threshold" of the Rome Statute giving the ICC broad discretion on whether or not to prosecute certain crimes (deGuzman, 2008); and a vague provision on how much time the member states of the Organization for Security and Co-operation in Europe (OSCE) can take to examine visa requests. These examples show that DIMI identifies important provisions that shape the decision-making of states and international agents across a wide range of issue areas.

Validating descriptive patterns
We next gauge the validity of DIMI by focusing on variation across IOs (Fig. 1). The first important observation is that our measure generates a gradual distribution of precision, varying from zero imprecision for the Association of Southeast Asian Nations (ASEAN) to almost 25 percent of sentences classified as imprecise for the OSCE. This pattern shows that IOs spread along a continuum of imprecision. The mean value of 9.3 percent highlights that states largely refrain from adding vague words to sentences when drafting agreements. This supports the intuition of Abbott and colleagues (2000, p. 414) that "much of international law is in fact quite precise."
The second significant result is that the poles of our distribution are in line with existing research. Focusing on the poles ensures that we deal with clear cases that are easier to qualify empirically as extremely precise or imprecise. At the lower end of the distribution, we find the Bangkok Declaration establishing ASEAN, which does not contain a single imprecise sentence. While this finding may appear surprising at first glance, it aligns with qualitative evaluations. The "ASEAN Way" developed a distinct approach to international cooperation that privileged sovereign equality, non-interference in the domestic affairs of the member states, and decision-making by consensus largely without structured formal procedures or the help of a powerful IO secretariat. While the five founding members agreed on the need to begin a process of regional cooperation to avoid conflict, they were keen to ensure that ASEAN would not develop too dynamically (Acharya, 2001). The Bangkok Declaration was negotiated in a limited number of meetings, sparking little controversy (Irvine, 1982, pp. 12-7). This basic approach is reflected in its short founding treaty, providing no basis for expansive linguistic interpretation.
The foundation of the OSCE played out very differently from that of ASEAN. It was set up in 1975 as the Conference on Security and Co-operation in Europe by the Helsinki Final Act. Hence, ASEAN and the OSCE were founded around the same time, towards the height of the Cold War. But while the Cold War was merely a background condition for the creation of ASEAN, the formation of the OSCE took place at the center of the conflict and involved direct negotiations between the United States, the Soviet Union, the EU, and non-aligned countries in Europe. It was an acrimonious process stretching over two and a half years where every word was contested and formulations negotiated at length. Thousands of official negotiating sessions were necessary to forge an agreement, with thousands more unofficial negotiations taking place at all levels (Thomas, 2001, p. 86). In the end, founding members deliberately designed the OSCE as a non-legally binding conference whose hallmark was political flexibility (Moser & Peters, 2019). This arduous negotiating process is reflected in its founding treaty, in which 23.4 percent of all sentences contain at least one vague word.

Some determinants of precision in IOs
This section provides a statistical analysis of the relationship between imprecision and four important IO design variables to illustrate how DIMI can contribute to exploring key questions in the literature. We first draw on the literature on institutional design to develop propositions on the relationship between precision and the number of states, power asymmetries, delegation, and pooling. We then examine these propositions using OLS regression and find robust empirical support for the power asymmetries and delegation propositions. Importantly, we are not making any

Propositions
Membership and power asymmetries are two fundamental institutional design choices (Snidal, 1985, p. 929) that carry significant implications for states' preferred degree of imprecision. First, theorizing on the effect of membership on precision has undergone a shift in the literature on institutional design. Koremenos and colleagues (2001, pp. 794-795) initially conjectured that the flexibility of international institutions decreases with the number of members because the costs of greater flexibility outweigh the benefits. This argument was based on the consideration that renegotiating costs increase, leading states to prefer a more stable settlement. Yet, in her refinement of the rational-design conjectures, Koremenos (2016, p. 49) clarifies that this hypothesis does not apply to imprecision. Instead, she expects imprecision to increase with the number of members to overcome diverse preferences. Other contributions similarly argue that vague language facilitates the conclusion of negotiations (Bernauer et al., 2013; Linos & Pegram, 2016). Hence, we expect that when the number of founding members is high, states are likely to design more imprecise agreements.
Second, the distribution of power among members shapes imprecision (Koremenos et al., 2001, p. 765). Powerful states carry weight in IOs because many cooperation problems cannot be effectively addressed without their participation. This makes powerful states essential to effective international cooperation, thus giving them leverage in negotiations (Gruber, 2000). Imprecision "compensates" powerful states for their participation and allows them to interpret rules in a self-interested manner without formally breaking them (Stone, 2013). Imprecision thus allows powerful states to steer implementation without risking a breakdown in cooperation due to overt non-compliance (Baccini et al., 2015). Hence, we expect that when power asymmetries among members are high, states are likely to design more imprecise agreements.
Third, member states can delegate decision-making powers to permanent IO secretariats to solve collective action problems (Hooghe & Marks, 2015, p. 307). Imprecision reinforces delegation in several ways (Abbott & Snidal, 1998; Hawkins et al., 2006, pp. 13-20). Drafting precise mandates restricts agents' ability to incorporate new information or adjust to unforeseen developments. Moreover, if agents enjoy more leeway to evaluate policy alternatives, states can reduce their own effort and lower transaction costs (Lake & McCubbins, 2006). A higher degree of delegation also indicates that states are more interested in ensuring progress in an area than in avoiding agency losses. Therefore, the "combination of relatively imprecise rules and strong delegation is a common and effective institutional response" to cooperation problems (Abbott et al., 2000, p. 405). We thus expect that as delegation increases, states are more likely to craft more imprecise agreements.
Fourth, pooling can shape imprecision. When states pool authority, they surrender their national veto or accept binding international decisions, resulting in potential sovereignty losses (Lake, 2007). Given the vital importance of sovereignty, states should be more concerned about imprecision when pooling authority in IOs. More imprecision increases the potential for unforeseen and unintended consequences that could encroach on the national sovereignty of founding members (Cooley & Spruyt, 2009). If pooling is high and rules are imprecise, member states may face decisions they would not have accepted in the founding treaty. States thus risk partly losing control over areas they did not want to be subject to collective decision-making. Hence, we expect that when pooling is high, states are likely to design less imprecise agreements.

Operationalization
We operationalize the variables as follows. The number of states is a count of founding states (number of states). For the distribution of power among founding members, we use the presence of unrivaled major powers. This variable scores one if only one of the following countries was a founding member: China, France, Germany (pre-1945 and post-1990), Japan, Russia, the United Kingdom, and the United States. For delegation and pooling, we use the two respective indices from MIA (Hooghe et al., 2017). Delegation captures the extent to which states empower agents to set the agenda and take decisions across six areas (membership accession, membership suspension, policymaking, budgetary issue, dispute settlement, and constitutional reform). Pooling integrates three sets of rules: collective decision-making procedures (consensus, unanimity, or majoritarian voting), domestic ratification requirements (whether states must ratify decisions), and the bindingness of decisions. Both indices range from zero to one, with higher values indicating more delegation or pooling.
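The coding rule for unrivaled major powers can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, country names must match the spelling used in the set, and the boundary years 1945 and 1990 are treated as years in which Germany does not count, which the text leaves open.

```python
# Countries listed in the text as major powers.
MAJOR_POWERS = {"China", "France", "Germany", "Japan", "Russia",
                "United Kingdom", "United States"}

def major_powers_present(founding_members, founding_year):
    """Return the listed major powers among an IO's founding members."""
    present = MAJOR_POWERS & set(founding_members)
    # Germany counts only pre-1945 and post-1990; excluding the boundary
    # years themselves is an assumption of this sketch.
    if 1945 <= founding_year <= 1990:
        present.discard("Germany")
    return present

def unrivaled_major_power(founding_members, founding_year):
    """Score 1 if exactly one major power is a founding member, else 0."""
    present = major_powers_present(founding_members, founding_year)
    return int(len(present) == 1)
```

Called with a hypothetical member list such as `["France", "Germany", "State A"]`, the function returns 1 for a founding year of 1960 (Germany is not counted) and 0 for 1995 (two major powers are present).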
We include three control variables. First, regime type may shape imprecision because democracies trust each other more and are thus more likely to agree to imprecise formulations (Hooghe et al., 2019; Hyde & Saunders, 2020). Using Polity IV scores (Marshall et al., 2016), we compute the share of members scoring seven or higher (democracy). Second, domestic norms and legal traditions also shape international institutions (Carlson & Koremenos, 2021). States with common-law systems may be more likely to prefer less formal and more imprecise designs (McLaughlin Mitchell & Powell, 2011). We control for the effect of domestic legal norms, using data gathered by Powell (2019) to calculate the share of countries with common-law traditions (common law). Third, we control for issue scope. A higher scope could imply more imprecision to overcome collective-action problems in more areas (Hooghe & Marks, 2015, p. 311). We use the absolute number of policy areas covered by an IO (scope) from MIA. Table A2.4 in the Appendix presents descriptive statistics for all variables.

Results
We test the propositions by estimating a cross-sectional OLS model, with the share of imprecise sentences as the dependent variable (Table 2). Beginning with our main model (Model 1), we find a positive and statistically significant effect for power asymmetries. This finding aligns with our expectation that IOs with major powers have more imprecise treaties. The effect of this variable is substantial: the presence of an unrivaled major power is associated with a 15-percentage-point increase in imprecision. Moreover, the statistical association between delegation and imprecision is positive and statistically significant. This result provides evidence that states support delegation through vague language. The substantive effect is strong. Moving delegation from its minimum to its maximum is associated with a 20-percentage-point increase in imprecision.
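For readers who want to see the mechanics of such a model, the following is a minimal pure-Python sketch of ordinary least squares via the normal equations. It is not the authors' estimation code, it reports no standard errors, and the toy data are invented solely to check that the solver recovers known coefficients; in practice a statistics package would be used. With the dependent variable on a zero-to-one scale, a coefficient of 0.15 on a dummy predictor corresponds to the 15-percentage-point difference discussed above.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows without a constant; an intercept is added.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Toy, exactly linear data (y = 1 + 2*x1 - 3*x2); NOT the paper's data.
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_toy = [1 + 2 * x1 - 3 * x2 for x1, x2 in X_toy]
coefs = ols(X_toy, y_toy)
```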
By contrast, we find little evidence that the number of states and pooling are associated with imprecision. Although the signs of both coefficients are in the expected direction, they are not statistically significant. There are several explanations for these results. First, given the small sample size, the models may lack "statistical power" to distinguish smaller effects from zero. For the number of states, it may be that both perspectives in the design literature (one stressing that a greater number of states have an interest in more precise agreements to secure bargains and one positing that flexibility is needed to reach these bargains in the first place) have merit and cancel each other out. As for pooling, the result may be explained by the fact that big states shape IOs, as our findings for major powers suggest. The effect could be that major powers feel less threatened in their sovereignty by imprecision, even when pooling authority in IOs. In Model 2, we include the three control variables, which all fail to achieve statistical significance but behave as expected and leave the main results unaffected.
We implement several checks to assess the robustness of our findings (Appendix A4). First, we use the standard deviation of ideal-point estimates from UN General Assembly voting patterns (UN vote) as an alternative operationalization of preference heterogeneity (Bailey et al., 2017). The results are almost identical. Second, we use two alternative operationalizations of power asymmetries. The first is the number of major powers in IOs. The second is the concentration ratio, a more granular measure of power based on data on National Material Capabilities (NMC, Version 5.0) (Singer et al., 1972). This ratio ranges from zero to one, with zero indicating an even distribution of power among states and one a perfect concentration of power in the hands of a single member. While the number of major powers has the expected effect, the concentration ratio does not. This result suggests that only major powers are powerful enough to shape the level of precision in founding treaties.
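The behavior of such a ratio at its two endpoints can be illustrated with a short sketch. It assumes a commonly used normalized concentration index (the square root of the rescaled sum of squared capability shares); the text above does not spell out the exact formula, so the function `concentration_index` and its inputs are hypothetical.

```python
import math


def concentration_index(capabilities):
    """Normalized concentration of capabilities among members, in [0, 1].

    Assumed formula: sqrt((sum of squared shares - 1/n) / (1 - 1/n)).
    """
    n = len(capabilities)
    total = sum(capabilities)
    if n < 2 or total <= 0:
        raise ValueError("need at least two members and positive total capabilities")
    shares = [c / total for c in capabilities]
    sum_sq = sum(s * s for s in shares)
    # Guard against tiny negative values caused by floating-point rounding.
    scaled = max(0.0, (sum_sq - 1 / n) / (1 - 1 / n))
    return math.sqrt(scaled)


even = concentration_index([1, 1, 1, 1])      # power spread evenly
single = concentration_index([5, 0, 0])       # one member holds everything
mixed = concentration_index([6, 2, 1, 1])     # somewhere in between
```

Under this assumed formula, an even distribution yields zero, a single dominant member yields one, and intermediate distributions fall in between.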
Third, we replicate the analysis for two subsamples. The first subsample only keeps IOs founded after 1944 because the dynamics of international cooperation may have changed after World War II. The second subsample controls for outliers by excluding IOs with a Cook's distance greater than 4/n (OSCE, ICC, AU). Our findings are robust to these changes. Fourth, we run a series of regressions for each predictor separately. The results for power asymmetries and delegation hold, and we also see a statistically significant association between the number of states and imprecision. At the same time, pooling shows a weak association with imprecision in the unexpected (positive) direction.
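The 4/n rule of thumb can be sketched as follows. For brevity, the example assumes a regression with a single predictor, where leverage has a closed form; the models reported here include several predictors, for which the same logic applies with leverages taken from the full hat matrix. The function names and the data are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a one-predictor OLS regression."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each observation (diagonal of the hat matrix).
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


def influential_cases(x, y):
    """Indices of observations whose Cook's distance exceeds 4/n."""
    cutoff = 4 / len(x)
    return [i for i, d in enumerate(cooks_distances(x, y)) if d > cutoff]


# Synthetic data: a clean linear trend with one distorted final observation.
x = list(range(10))
y = [0.1 * xi for xi in x]
y[9] = 5.0
flagged = influential_cases(x, y)
```

Observations returned by `influential_cases` would then be dropped before re-estimating the model on the reduced sample.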
Finally, we employ a set of alternative operationalizations of our dependent variable. First, because OLS estimations with truncated dependent variables can lead to prediction errors, we run a model with a logistically transformed dependent variable. Second, we only use the seven words derived from the literature to compute imprecision, excluding all synonyms that we added. Third, we probe the robustness of our findings by replicating the analysis with dictionaries that iteratively exclude each of the seven original words. The results show that neither the inclusion of synonyms nor the choice of individual words drives the results. Still, we prefer the more extensive version of the dictionary, including all synonyms, to increase the applicability of our approach beyond IR. Fourth, we test for the importance of vagueness contagion for our results and rerun the analysis with the share of imprecise words over the total number of words (after removing stop words and brackets as used in enumerated lists), rather than splitting up founding treaties by sentence. The two measures of imprecision correlate strongly (r = 0.93). Using the share of imprecise words leaves the finding for power asymmetries unaffected, but delegation falls just short of generally accepted levels of statistical significance (p = 0.1072).
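The difference between the sentence-based and the word-based measure, as well as the logistic transformation, can be illustrated with a minimal sketch. The word list, the stop-word list, the sentence splitter, and the clipping constant `eps` below are simplifying assumptions made for illustration; they are not the dictionary or the preprocessing used in our analysis.

```python
import math
import re

# Hypothetical mini-dictionary; NOT the actual dictionary of imprecision.
IMPRECISE = {"appropriate", "reasonable", "necessary"}
# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "shall", "as", "be", "in", "is"}


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive splitter on sentence-final punctuation."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_share(text):
    """Share of sentences containing at least one imprecise word."""
    sentences = split_sentences(text)
    flagged = sum(1 for s in sentences if IMPRECISE & set(tokenize(s)))
    return flagged / len(sentences)


def word_share(text):
    """Share of imprecise words among all non-stop words."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return sum(1 for w in words if w in IMPRECISE) / len(words)


def logit(share, eps=1e-6):
    """Log-odds of a share, clipped away from 0 and 1 to stay finite."""
    clipped = min(max(share, eps), 1 - eps)
    return math.log(clipped / (1 - clipped))


treaty = (
    "Members shall take appropriate measures. The Council meets twice a year. "
    "Each member has one vote. Parties may adopt reasonable and necessary rules."
)
by_sentence = sentence_share(treaty)
by_word = word_share(treaty)
```

In this toy text, two of four sentences contain an imprecise word, whereas only three of the nineteen non-stop words are imprecise. The gap reflects vagueness contagion: under the sentence-based measure, a single vague word renders the whole sentence imprecise.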

Conclusion
Precision is one of three fundamental dimensions of legalization in world politics. Despite the concept's prominence, scholarly efforts to measure it systematically, and thus to prepare the ground for methodical empirical analyses, have been limited. In this article, we introduced DIMI as a dictionary-based approach that is fully replicable, can be used across a variety of international agreements, and yields a valid continuous measure. Our approach opens exciting avenues for future research by giving IR scholars quantifiable guidelines to include (im-)precision in their studies. After two decades in use, the concept of precision can aspire to the same standard of empirical accuracy as other concepts in the institutional design literature. Beyond this contribution, and to illustrate the new possibilities that our approach allows, we have provided some initial evidence that key design features in major IOs are systematically associated with imprecision. In line with expectations derived from the literature, we found that power asymmetries and delegation are associated with higher levels of imprecision. This article thus sheds new light on the institutional design of IOs, a literature that has recently exhibited a trend towards large-N analyses. Apart from using DIMI to measure and explain the level of designed precision in even more international organizations (Roger & Rowan, 2022) and international agreements in various areas (Koremenos, 2016; Mitchell et al., 2020; Thompson et al., 2019; von Stein, 2018), it can power an agenda focused on the consequences of precision for international cooperation.
We already sketched new avenues for studying the effect of precision on compliance made possible by our approach. But DIMI also helps tackle other key questions of international cooperation thus far hindered by the unavailability of quantifiable guidelines. For example, our approach opens exciting opportunities for more empirical research on the effects of precision mediated by levels of obligation. As noted above, precision takes precedence over obligation. Still, the combination of precise language and high obligation likely plays out differently than the combination of precise language and low obligation. DIMI also provides avenues for future research on how imprecision shapes principal-agent dynamics. Principal-agent theorists highlight the dilemma states face when setting up international secretariats. On the one hand, international agents require resources and leeway to fulfill their delegated tasks. On the other hand, agents can use this discretion to seek greater autonomy (Hawkins et al., 2006). Agents established in IOs with imprecise founding treaties could use ambiguous provisions to increase their autonomy, such as by setting up emanations (Johnson, 2014). By contrast, and as argued above, imprecision allows powerful states to exert influence in IOs and thus possibly rein in errant agents. Precise mandates may thus, counterintuitively, put agents in a better position to engage in autonomous behavior. Overall, DIMI could spark a reinvigoration and refinement of the legalization literature.
Finally, our article carries significant implications beyond IR. In political science more broadly, for instance, DIMI can be used to compare and explain varying levels of precision across national constitutions and laws, and to investigate the effects of these differences over time. DIMI could also reinvigorate interest in precision in related disciplines such as law and economics, where problems of empirical measurement have likewise hampered progress. Given that (written) contracts underlie almost all legal and economic interactions, the array of potential applications is vast. Scholars can apply our approach to all documents carefully crafted among negotiators, such that individual words carry significant informational value. DIMI will certainly not be the preferred operationalization for all research questions. But particularly in large-N studies, it could develop into a vital alternative for operationalizing precision and advance empirical studies on its origins and consequences for years to come.