To understand why some international institutions have stronger dispute settlement mechanisms (DSMs) than others, we investigate the dispute settlement provisions of nearly 600 preferential trade agreements (PTAs), which possess several desirable case-selection features and are invoked more than is realized. We broaden the study of dispute settlement design beyond “legalization” and instead reorient theorizing around a multi-faceted conceptualization of the strength of DSMs. We posit that strong DSMs are first and foremost a rational response to features of agreements that require stronger dispute settlement, such as depth and large memberships. Multivariate empirical tests using a new data set on PTA design confirm these expectations and reveal that depth – the amount of policy change specified in an agreement – is the most powerful and consistent predictor of DSM strength, providing empirical support for a long-posited but controversial conjecture. Yet power also plays a sizeable role, since agreements among asymmetric members are more likely to have strong DSMs due to their mutual appeal, as are those involving the United States. Important regional differences also emerge, as PTAs across the Americas are designed with strong dispute settlement, as are Asian PTAs, which contradicts the conventional wisdom about Asian values and legalization. Our findings demonstrate that rationalism explains much of international institutional design, yet it can be enhanced by also incorporating power-based and regional explanations.

Fig. 1


    See Last accessed 22 March 2015.

    As James Gaathi notes, in a paper aptly titled The Under-Appreciated Jurisprudence of Africa’s Regional Trade Judiciaries: “…in Africa there has been an exponential use of these judiciaries without much acknowledgment in the academic literature” (Gaathi 2010a, 246).

    One current example is the investor-state dispute settlement provisions that were widely inserted in bilateral investment treaties (BITs). They subsequently were used more frequently than anticipated, in a manner that imposed significant costs on states, thus provoking a recent backlash (see Simmons 2014).

    See the growing work on forum-shopping (Busch 2007; Davis 2009), in which states strategically use the forum that is most likely to rule in their favor (e.g., WTO vs. a regional or bilateral DSM).

    Jo and Namgung (2012) collapse Smith’s legalism scale to three components – looking only at the presence of legal dispute settlement (third party review), bindingness of rulings, and the permanence of the body. Kono (2007) uses the same coding as Smith.

    From purely a methodological standpoint, one way to deal with this concern is by using factor analysis (Hooghe et al. 2014).

    Indeed, our data show that a sizeable majority of PTAs signed over the past three decades allow for one or more forms of legal dispute settlement.

    We claim that the number of members and overall depth of the agreement should affect DSM design. These first two elements are established during the early stages of talks, while decisions about how to handle future disputes over the agreement’s terms are made later in the negotiation process.

    For instance, a Congressional Research Service (CRS) analysis of the U.S.-South Korea PTA notes that “(t)he potential scope of KORUS FTA dispute settlement is limited by the scope of the KORUS FTA obligations that would be taken on by the Parties” (Grimmett 2012, 2).

    Gruber (2000) discusses this issue in the context of NAFTA dispute settlement. Also see Allee and Peinhardt (2010, 2014) on the role of powerful states in the design of DSMs in BITs.

    The project identifies just over 700 post-war treaties, but the number of coded treaties is 589 due to treaty text availability.

    Laks-Hutnick (2013), for instance, identifies a total of 1,207 trade disputes that were taken before the DSMs of four prominent Western Hemisphere PTAs (North American Free Trade Agreement, NAFTA; the Central American Common Market, CACM; the Common Market of the South, MERCOSUR; Andean Community) between 1995 and 2010.

    (last accessed 25 March 2015). For a discussion of the mechanisms and their performance, see De Mestral 2006.

    The list of such cases is on the U.S. State Department’s website: (last accessed 25 March 2015).

    See Porges 2011, 473, on the former and Chase et al. 2013, 47, on the latter.

    For instance, Garcia Bercero (2006) notes that many trade disputes involving the E.U. are addressed through the various consultation mechanisms that are included in E.U. PTAs. See also Luo (2006, 443) on disputes within the Association of Southeast Asian Nations (ASEAN) and Dominguez (2007) on disputes among CACM members.

    We posit that stronger DSMs, which include these design features, should enhance treaty enforcement. Note that our dependent variable measures DSM strength; it is not a measure of actual treaty enforcement or dispute activity.

    We do not count cases where no arbitration or adjudication is specified, but the dispute settlement language allows for trade remedies.

    In principle this index can range from 0 to 12, but in practice the largest observed value is 9.

    Two of the six components (time limits, comprehensiveness) are already binary (0–1). For three of the others (forum choice, chairman selection, sanctions) we divide by the maximum value to array them on a 0–1 scale. For the remaining component (delegation), we (re-)code both ad hoc arbitration and creation of a standing body as 1.
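
The rescaling described in this note can be illustrated with a minimal sketch. The component names, the maximum raw values in `RAW_MAX`, and the 0/1/2 raw coding of delegation are hypothetical placeholders chosen for illustration; they are not taken from the authors' codebook.

```python
# Hypothetical maximum raw scores for the three multi-valued components.
# These values are placeholders, not the authors' actual coding scheme.
RAW_MAX = {"forum_choice": 2, "chairman_selection": 3, "sanctions": 2}

# Components that the note describes as already binary (0 or 1).
BINARY = ("time_limits", "comprehensiveness")


def rescale_components(raw):
    """Map one agreement's raw component scores onto a common 0-1 scale."""
    scaled = {}
    # Binary components are carried over unchanged.
    for name in BINARY:
        scaled[name] = float(raw[name])
    # Multi-valued components are divided by their maximum raw value.
    for name, max_value in RAW_MAX.items():
        scaled[name] = raw[name] / max_value
    # Delegation: assumed raw coding 0 = none, 1 = ad hoc arbitration,
    # 2 = standing body; both forms of delegation are recoded as 1.
    scaled["delegation"] = 1.0 if raw["delegation"] >= 1 else 0.0
    return scaled


# Example with made-up scores for a single agreement.
example = rescale_components({
    "time_limits": 1, "comprehensiveness": 0, "forum_choice": 1,
    "chairman_selection": 3, "sanctions": 2, "delegation": 2,
})
```

After rescaling, every component lies between 0 and 1, so no single component dominates a subsequent factor analysis merely because of its raw range.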

    One clear factor emerges, with an eigenvalue of 3.37. The loadings for the six variables that comprise the factor are as follows: delegation (0.81), forum choice (0.70), chairman selection (0.78), time limits (0.81), sanctions (0.82), and lack of exemptions (0.53).

    For purposes of robustness, we also consider various transformations of this variable (log, quadratic) as well as a simple distinction between bilateral (two) and plurilateral (three or more) agreements.

    We follow the United Nations classification for matching countries to continents. See, last accessed on 31 August 2014.

    This indicator of depth and the next two are drawn from Dür et al. (2014). For details on the components for the various depth measures, see Appendix 1 of our online appendix, which is available at the journal’s website.

    See Appendix 1 for details.

    We categorize the United States, Canada, Japan, Australia, New Zealand and all countries in Western Europe as the “North.”

    PTAs only among Northern states should have less demand for strong enforcement, and Southern states should be less likely to surrender sovereignty to a powerful DSM in their agreements with one another in the absence of any benefit from checking the power of a larger state.

    Our empirical measure takes the log of the mean value of imports across all members of the PTA (in millions of dollars), but we also consider exports as well as other methods of aggregation. Trade data are drawn from the International Monetary Fund’s Direction of Trade Statistics (DOTS, 2010) as well as Gleditsch (2002).
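
As a rough illustration of this measure, the sketch below averages imports across the members of one agreement and then takes the log. The function name and the example figures are hypothetical, and the use of the natural log is an assumption, since the note does not state the base.

```python
import math


def log_mean_imports(imports_by_member):
    """Log of the mean import value (millions of dollars) across PTA members.

    `imports_by_member` maps a member name to its imports. The natural log
    is an assumption here; the source does not specify the base.
    """
    if not imports_by_member:
        raise ValueError("at least one member is required")
    mean_imports = sum(imports_by_member.values()) / len(imports_by_member)
    if mean_imports <= 0:
        raise ValueError("mean imports must be positive to take a log")
    return math.log(mean_imports)


# Made-up figures for a two-member agreement (millions of dollars).
value = log_mean_imports({"member_a": 100.0, "member_b": 300.0})
```

Swapping in exports or a different aggregation (for example, the sum or the minimum across members) only requires changing the line that computes `mean_imports`.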

    Our primary dependent variable (see Table 1g) is ordered but not interval or continuous. Thus when using this primary outcome variable we estimate ordered probit models due to worries that OLS would produce biased and misleading estimates (Peel et al. 1998). We also note that our major findings are robust across estimators.

    We include dummy variables for five-year intervals beginning with years ending in “0” or “5” (except for the period 1948–54). As a sensitivity check, we also substitute narrower (yearly) or broader (decade) time-period dummies, which has no impact on our core conclusions.
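
One way to build such period indicators is sketched below. The function names and the `period_<start year>` labels are illustrative assumptions; the only details taken from the note are the five-year intervals starting in years ending in 0 or 5 and the longer initial period 1948–54.

```python
def period_start(year):
    """Return the first year of the time period that contains `year`."""
    if year < 1948:
        raise ValueError("years before 1948 fall outside the sample")
    if year <= 1954:
        # The first period is the seven-year span 1948-54.
        return 1948
    # Later periods start in years ending in 0 or 5.
    return year - (year % 5)


def period_dummies(years):
    """Build one 0/1 indicator per observed period for each signing year."""
    starts = sorted({period_start(y) for y in years})
    return [
        {f"period_{s}": int(period_start(y) == s) for s in starts}
        for y in years
    ]


# Made-up signing years for four agreements.
rows = period_dummies([1950, 1957, 1999, 2000])
```

In estimation, one of these indicators would be dropped as the reference category to avoid perfect collinearity with the constant.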

    To do so, we use the prvalue post-estimation command from the SPost package (Long and Freese 2014) in Stata 12. Unless specified otherwise, we hold all variables at their median values, use 2000 as the year, and set depth = 3. For tractability, these calculations draw upon a variant of the primary model in which a year variable is inserted in place of fixed time-period effects.

    We substitute a plurilateral (vs. bilateral) variable and find that it is always a positive and statistically significant predictor of DSM strength, with or without the number of members variable. We also find that the log of number of members is positive and significant when substituted in place of the untransformed number of members variable.

    According to some, the E.U. long took a more skeptical position toward legally-based dispute settlement because it felt that encouraging diplomatic dispute settlement with its PTA partners would better serve its interests (e.g., Broude 2004). European Commission official Ignacio Garcia Bercero (2006) argues somewhat differently, claiming that the Community’s negative experiences with GATT dispute settlement shaped its preference for diplomatic dispute settlement in its PTAs. Regardless of the reason, close observers now claim that the E.U. has become more receptive to strong, legal dispute settlement in recent years and recent PTAs (e.g., Garcia Bercero 2006; Porges 2011). We find some evidence of such a shift in our design data beginning around the year 2000.

    Additionally, we are quite confident in the validity of our findings: there are over 75 Asian PTAs in our data set, and our measure of dispute settlement strength is carefully constructed and multi-faceted.

    In the course of our research we have seen references to several disputes in Asian PTAs that were filed before arbitration or adjudication mechanisms, but later were withdrawn, presumably due to an “out of court” settlement. See also Luo (2006, 443).

    This is potentially consistent with Jo and Namgung’s (2012) claim that democracies prefer “medium” levels of legalism.

    Note that we also explored various selection models, with DSM design as the second (outcome) stage, but in all instances the lack of statistical significance of rho, the correlation between the first and second-stage estimations, led us to reject this approach.

    All primary findings and conclusions are unchanged, with the only exception being that the estimate for the U.S. dummy variable remains positive but is not statistically significant.


We are grateful to the Swiss Science Foundation for financial support and to our research assistants for help in collecting the data used in this paper. For helpful comments we thank Chad Bown, Christina Davis, Jeff Kucik, and the three anonymous reviewers.

