In 1990, an obscure, one-year task force published 40 vague and voluntary recommendations for building a better international anti-money laundering (AML) regime. Susan Strange [1] later would suggest that this Financial Action Task Force on Money Laundering [2] was likely an empty effort by a G7 motivated by a need to be seen as doing something, anything, to respond to expanding transnational illicit narcotics markets. In the intervening years, FATF has become the focal institution of a powerful financial governance regime. Governments rely on its tools to counter crimes of all kinds, including transnational organized crime, terrorism, and weapons proliferation. Nearly all states have committed to meet its standards. Most states have joined regional iterations of FATF. The most influential international organizations have integrated FATF’s standards as their own. G20 leaders in 2009 identified FATF as a keystone institution in a promised new global financial regulatory architecture. Once an oddity, FATF now seems like a model of global governance in a landscape characterized by contested multilateralism [3], plurilateralism [4], regime complexes [5], and “good enough global governance” [6].

What explains this rapid evolution from a fringe institution to a key body of governance, especially in a field as challenging as illicit finance? Most observers suggest that this impact stems from some mix of material and social—or coercive and persuasive—mechanisms. Socialization within a growing epistemic community is important, they argue, and may trickle up to the state. Ultimately, however, threats of exclusion from major markets raise the cost of non-compliance, which provides an incentive to cooperate.

There is some validity to that interpretation, but it misses much of what happens within FATF and the anti-money laundering regime. The evidence below shows that FATF’s development is better understood as a case of “experimentalist governance” [7,8,9,10,11,12,13,14].Footnote 1

Experimentalist decision-making is built around an inclusive, multi-level network of stakeholders who establish, evaluate, contest, and consistently revise standards in order to generate new knowledge about challenges and solutions [11]. Material sanctions are not precluded, but participants should use them to encourage deeper engagement with experimentalist processes, not to enforce narrow compliance. Knowledge creation, social learning, and persuasion are critical tools for promoting cooperation. Proponents argue that such governance systems create more democratic deliberative spaces that simultaneously promote normative convergence and the ratcheting up of standards through a forward-looking, problem-solving dynamic [15].

It has been through this decision-making process that FATF members have built a far-reaching and influential anti-money laundering regime. When members have abandoned this approach, FATF’s influence has been more contested and its progress more inconsistent. In this way, experimentalism as an analytical framework performs better than approaches that focus primarily on material mechanisms of change in the AML regime.

This article forgoes a primer on money laundering; for more on that, see the editor’s introduction in this special issue. The next section highlights the shortcomings of existing explanations before comparing those approaches, in section three, with experimentalist governance. In section four, the paper provides a closer look at FATF’s form and function to justify the interpretation. It analyzes in particular the architecture of FATF decision-making, the controversial and complex role of blacklisting, and specific cases in which experience with implementation has led to the updating of international standards. Finally, the article considers the implications of this new interpretation of FATF and the AML regime, in particular for the continued development of the regime and for global governance more broadly.

Old and new takes on FATF: From realism to experimentalism

Despite its significance, we lack a satisfactory understanding of the dynamics that drive FATF and the AML regime. The simplest explanation for FATF’s influence, that it reflects the power distribution among states, is sensitive to assumptions about what states’ preferences actually are. Drezner [16] argues that the US and EU “were able to cajole, coerce, and enforce a global anti-money-laundering standard into existence.” Disagreements between the two led them to rely on more flexible “soft law” standards. The result is a powerful tool of global financial regulation that confirms states’ continued dominance in a globalizing economy. In contrast, Simmons [17] assumes that the US is the regulatory hegemon, but is unwilling to pay the enforcement costs for imposing a binding standard. FATF therefore can rely only on peer pressure “to embarrass governments” and global compliance reflects a spheres-of-influence pattern (Ibid, p. 609). Jojarth [18] argues that FATF publicizes standards set by the United States. The soft law basis reflects uncertainty about the best policy, but that also limits enforcement and, therefore, convergence. These three materialist approaches generate very different final analyses based on assumptions, not empirical verifications, of what powerful states want and the effects of different forms of law. A single approach, realism, built around different assumptions about states’ interests, thus yields very different understandings of FATF.

Other scholars emphasize the constitutive effects of power-based networks. Abbott and Snidal [19] argue that “a significant degree of convergence” has occurred because soft law accommodates national diversity, creates expectations of political costs for non-compliance, legitimates third-party influence, and invokes a legal discourse. Hülsse and Kerwer [20] write that FATF’s experts generate legitimacy by marketing practical knowledge as the correct, impartial solution, which grants experts rule-making authority that is further enhanced by third-party endorsement. Jakobi [21] writes that the US has put itself at the center of the global AML network and adopted what might be described as a Gramscian strategy, such that other actors begin to accept the US’ preferences as their own. Finally, Sharman [22, 23] argues that supporters of AML strategically constructed non-compliant jurisdictions as renegades in the financial system, which persuaded key actors in targeted states that the material impacts would be serious and converted those actors into advocates for reform.

These network approaches, especially Sharman’s, highlight important aspects of FATF, but they remain fundamentally materialist. Social mechanisms matter only to the extent that participants fear a material impact will result. They also assume that one or two states’ interests drive the standards, but stop short of squaring that assumption with the reality that FATF’s standards have changed over time and in ways that run contrary to the preferences of domestic constituencies. Of all the interest-based explanations, only Sharman [22] goes to any length to explain what shapes the varying interests in the re-regulatory process that the AML regime represents.

The centrality of blacklists in almost all of those explanations also raises difficult questions that deserve some attention at this point.Footnote 2 The internal logic is reasonable. FATF names states or jurisdictions that are non-compliant, signaling to market actors that transactions with institutions in those jurisdictions may be subject to additional scrutiny by the blacklisting states. This potential added difficulty compels market actors to either divest currently held assets or avoid new transactions in targeted countries, which harms the targeted jurisdiction’s economy. Targets, or would-be targets, comply with FATF’s standards to avoid that fate. In my interviews with delegates to and from FATF, informants generally stressed first that FATF was about knowledge creation and persuasion, but followed that quickly with the idea that the threat of material enforcement was necessary, too. A German treasury official’s comments were typical. When asked why states do not want to be blacklisted, he responded: “Fear.” Why was there fear? “Loss of reputation, [debt] ratings, a loss of image.”Footnote 3 Talking about the case of Liechtenstein, Sharman [23] writes: “In many cases it is difficult to conclusively link material decline to the effects of blacklisting but the general opinion among government officials and those in the financial services industry was that the lists caused the damage.”

The challenge for this interpretation is finding systematic evidence to support it. Even anecdotal evidence is rare. When I asked the German official cited above to provide an example where that had happened, he responded: “It’s simply believed.” In my interviews with representatives of large and small states, as well as delegates from FATF and regional AML bodies, no one was able to give an example of blacklisting having actually harmed an economy.

Furthermore, no study to date has offered systematic evidence of this impact, while there is evidence that those perceptions are inaccurate. Harvey and Lau [24] find that banks rarely publicize their own AML efforts, which we would expect them to do if there were rewards or penalties at stake. We might expect blacklisted jurisdictions’ currencies to lose value, especially relative to the US dollar, given the US’ position as a financial superpower and strong FATF supporter, but that is not the case [25]. Even those involved in FATF policy-making disagree on the exact tools at work. The German treasury official cited above and a high-ranking official in an international business lobbying organization suggested that debt rating agencies consider FATF compliance in their ratings. Officials within the FATF secretariat were skeptical (interview with author, 2014). In other words, people who strongly shape the AML regime have different understandings of the issue. This suggests that FATF’s market effects are not pre-eminent in the minds of those involved.

In the most sophisticated analysis to date, Kudrle [26] uses Bank for International Settlements data to analyze blacklisted tax havens’ “total assets and liabilities and the associated component non-bank assets and liabilities as well as their counterparts for the loan and deposit data.” Testing whether blacklisting would lessen, increase, or have no effect on totals, he finds that the results are inconsistent across countries, across blacklists, and across instruments. He concludes that the data fail to “suggest that blacklisting made an important systematic difference in the volume of banking system-related tax haven fortunes” (Ibid, p. 45).

The best available evidence is imperfect, but it suggests that market activity does not respond strongly to blacklisting. If that is true, it means we need a much better understanding of blacklisting’s complex role in the regime. It also means that we still have an incomplete picture of FATF. The following section attempts to fill in some of those important blanks through the lens of experimentalist governance.

Experimentalist governance

Scholars in the fields of law (e.g., [27]), sociology (e.g., [28]), and European Union studies [10, 14] have begun re-imagining how institutional design and function can combine to produce a different mode of governance. Sabel and Zeitlin [10] identify four cornerstones to this “experimentalist architecture:”

  1) Relevant actors (often states and international institutions) set framework goals (i.e., broad goals such as “full employment” or “a unified energy grid”) and metrics for gauging progress;

  2) Lower-level units are obliged to implement policies aimed at meeting those standards, but are given the autonomy to implement the policies they deem most appropriate;

  3) In return for that autonomy, they must report regularly on their performance and participate in a peer review that compares their performance to others’; and

  4) The collective wisdom gained from implementation and evaluation is taken into account as the goals, metrics, and procedures are periodically reviewed and revised by participants.

Nance and Cottrell [15] summarize what distinguishes experimentalism from command-and-control regulation as an emphasis on: flexible and revisable standards over fixed and universal rules; broadly participatory networks over state-centric quasi-hierarchy; and dynamic problem-solving over rule enforcement. In the context of the AML regime, five distinctions are particularly important in separating experimentalist governance from more traditional conceptualizations of governance.

First, the role that material sanctions play is the most relevant to the FATF debate. Experimentalism sees an important role for the threat of material consequences, but it prescribes them to compel engagement in decision-making, not to compel narrow compliance. Nor is the consequence usually a direct sanction, like a fine or tariff. The goal is to establish a “penalty default,” or some governance outcome that actors perceive as sub-optimal and that applies if experimentalist governance fails. Penalty defaults are “destabilization mechanisms that induce reluctant parties to cooperate in framework rule-making and respect its outcomes, while stimulating them to propose plausible and superior alternatives, typically by threatening to reduce their control over their own fates” (Zeitlin, 2015, p. 5). Material sanctions that fail to secure the targeted actor’s deeper engagement in the governance process will be less effective, eliciting “shallow compliance” at best [29]. By this logic, material sanctions should become less necessary over time as actors become embedded within the experimentalist process, even with rising standards.

A second critical distinction is the intentional reflexivity of experimentalist governance. Participants should regularly update the core standards and should do so in ways that reflect lessons learned in implementation. Flexibility in implementation across contexts reinforces this dynamic. Experimentalist governance requires that participants adapt broad standards to their particular contexts. The network as a whole then systematically assesses the effectiveness of those variations, and uses lessons learned to revise the standards. We therefore would expect to see that revisions sometimes reflect the preferences of the powerful, but not always. In short, experimentalism expects social learning. In Hall’s [30] classic formulation, social learning is “a deliberate attempt to adjust the goals or techniques of policy in response to past experience and new information. Learning is indicated when policy changes as the result of such a process.” This implies that experimentalist institutional development may follow a path that, looking back, was not clear at any given point [31].

The third critical difference is the assumed possibility of persuasion. In addition to the “shallow” or “strategic” persuasion of rationalist approaches ([32], p. 5), experimentalist theory argues that “deep” persuasion, or a shift in the target’s preferences over outcomes, is also possible. Gheciu [33] defines persuasion as “social interactions between actors who have drawn different conclusions regarding the nature, merits, and/or implications of [a given] action or policy, and in which one or more of those parties attempt, through arguments, to get their interlocutors to rethink their conclusions.” Persuasion is driven by an actor presenting a policy as “‘the right thing to do,’ even in the absence of direct international rewards for taking that action” (Ibid, p. 982). Like other modes of interaction, persuasion is a power-laden process; some actors are better able to shape the ideas that define the kinds of arguments and evidence that are considered valid (Ibid).

The challenge is identifying when persuasion has been used. Risse [34] notes that: “We have probably witnessed processes of argumentative persuasion when powerful governments change their minds and subsequently their behavior, even though their instrumental interests would suggest otherwise, or when materially less powerful actors such as small states or nonstate actors carry the day.” If we lack evidence for a shift in interests and the use of coercion, on one hand, and we see a change in preferences or behavior and mechanisms of persuasion or learning, on the other, then we have good evidence that persuasion or learning has occurred.

In experimentalist institutions, peer review is the most important site of persuasion and learning. While it takes many forms, in essence peer review means that a state’s performance (in this case) is compared “with those pursuing other means to the same general ends” ([11], p. 3). It is “experts criticizing and responding to criticism by experts in public” (Ibid, p. 6). As Abdelal [35] shows in the case of the OECD, peer review encourages the diffusion of standards through the socialization of actors into the group’s collective norms. Mutual expert evaluation and the requirement to justify policies by a common standard also help counteract some of the information asymmetries inherent in delegation, especially when the review is open to public scrutiny ([11], p. 12).

A fourth distinction is the wider base of participation in decision-making. Greater participation enhances the legitimacy of the process. It reinforces reflexivity, as those most responsible for, and most directly affected by, the standards’ implementation feed their experiences back into the standard-setting process. Participation also is key to “democratic destabilization,” a process whereby deliberative rule-making renders formerly technocratic and closed processes accessible to “a wider range of information, experience, and argument” [36, 37]. Democratic destabilization helps dislodge participants’ pre-existing understandings of cause and effect, which minimizes the positive feedback effect of existing institutions and therefore promotes learning [38]. Broader participation also promotes more dynamic forms of accountability, enmeshing decision-makers in a denser web of domestic and international oversight [36].

Finally, experimentalism assumes that obligations are social facts that may extend from, and affect, a variety of actors or settings. It thus challenges the distinction between “hard” and “soft” law and focuses instead on the social construction of obligation. Experimentalist work in this way overlaps with constructivist approaches to law and governance [14].

Importantly, critics have argued that procedures identified as experimentalist often fall short of this standard: in particular, that participation is select or ineffective [39,40,41] and that learning and persuasion are rare [42]. Coercive interpretations of FATF would suggest the same is true of that institution: that function does not follow form. The following sections consider those aspects—FATF’s experimentalist form, then its function—in turn and consider the extent to which we can think of FATF as a case of experimentalist governance.

Experimentalism in FATF

Experimentalism in form

This section repeats some of the information in the editor’s introduction (Nance, this volume), but the argument for an experimentalist interpretation of FATF necessarily begins with the close correspondence between the ideal-typical form of experimentalist governance outlined above and the FATF process. Thus, this section provides a basic outline of FATF’s form. It is worth stressing that many of these factors have changed over time, as I will discuss. The goal here is to provide a frame of reference for the discussion of experimentalism’s emergence below.

In its own words, FATF’s objectives today are “to set standards and promote effective implementation of legal, regulatory and operational measures for combating money laundering, terrorist financing and other related threats to the integrity of the international financial system.”Footnote 4 This shift to a focus on systemic “integrity” is the most recent iteration of FATF’s mission and came about in the wake of the 2008 financial crisis. It is also the latest example of FATF’s widening mandate (editor’s introduction, this volume).

FATF is a transnational public policy network [43]. The participants represent a cross-section of international society: large and small states; powerful and weak states; post-industrial and developing economies; umbrella organizations like the IMF and World Bank; and more specific organizations like the World Customs Organization. There is also a long list of actors who participate in the consultative network surrounding FATF’s work. FATF over time has included a wider array of representatives from interested parties. For some years it has held a consultative forum with private sector interest representatives, especially from those industries affected most directly by FATF’s activities, including the banking industry, lawyers, and accountants, as well as law enforcement representatives. More recently, it has begun a regular consultative forum with non-profit organizations (NPOs), aimed at getting their input on efforts to implement a recommendation that seeks to prevent the abuse of NPOs (see Keatinge & Romaniuk, this issue). Adding to that diversity of perspectives, national delegations generally include representatives from any department that might be responsible for some aspect related to money laundering or counter-terrorism financing. Thus, FATF looks much more like a transnational, multi-level network than the quasi-hierarchical structure of many international organizations.

Despite describing itself as an inter-governmental body, FATF secretariat leaders and FATF members regularly interact with representatives from a very wide range of interests. Those meetings are surely not all aimed at FATF proponents learning from others; they more often are official outreach meetings to diffuse FATF’s standards. To the degree that those meetings allow for interaction between those implementing standards and those promulgating and overseeing the standards themselves, however, they become important potential sites of learning on both sides.

The central norms of FATF are the “FATF 40 Recommendations.”Footnote 5 While the precision of the Recommendations varies, in general they are broad and open-ended. In order to comply with the recommendations, states can and must write their own regulations that meet the goals set forth in the relevant recommendations. Recommendation 9 (formerly Recommendation 4), for example, reads: “Countries should ensure that financial institution secrecy laws do not inhibit implementation of the FATF Recommendations.”

Members review the recommendations annually for necessary changes and have revised them fully three times: in 1996, 2003/4, and 2012. In cases where experience shows difficulty in implementation or ineffectiveness, FATF often issues interpretative notes, which may after a trial period be integrated into the official recommendation it originally modified.

There is no central agreement in FATF that outlines the decision-making procedures.Footnote 6 By tradition, decision-making is consensus-based. Voting formally takes place during the thrice-yearly plenary meetings, often in Paris where the small secretariat sits under the same roof as the OECD. Informally, delegates hammer out agreements before the plenary in working groups or in more social gatherings that surround the plenary meetings. Meetings are run by the president, a one-year office that rotates among members that volunteer for the position. Members also select the president by consensus. The secretariat, composed of only roughly 15 people, most of whom are seconded from the OECD, has no vote and traditionally performed a largely organizational and facilitative role, albeit an important one. More recently, FATF members decided that the secretariat should play a more active role in carrying out some of the core monitoring functions discussed below. This institutional detail is more significant than it might seem. While recent research on international organizations has begun to take more seriously the role of the secretariat as a quasi-independent actor in international relations, that characterization fits FATF less well. Thus, when we talk of “FATF doing” anything, in fact that means the member states that make up the network.

FATF’s monitoring powers are among the farthest reaching at the international level. There are two key monitoring mechanisms. The most important mechanism is the mutual evaluation. Conducted originally about every three years but now much less frequently (see Levi, et al., this issue), the mutual evaluation entails an on-site visit by a team of experts from other FATF members or representing other international institutions. That team spends roughly a week interviewing relevant actors and making site visits. In the process, they fill out a very detailed, very extensive “common methodology.”Footnote 7 Members discuss the results in a peer review in the Plenary. An unsatisfactory performance results in closer monitoring and the requirement to report again at the following Plenary on progress. Because that common methodology exists, these mutual evaluations are now generally conducted by regional FATF groupings, the International Financial Institutions, or some other international organization.

The “typology exercises” also play a monitoring role, although they are not generally acknowledged as such. These exercises are both diagnostic and generative. The exercises bring together money laundering experts from around the world to discuss current issues in money laundering. They have three basic goals: to exchange information on any on-going cases and operations; to identify and describe current trends in money laundering; and to identify and describe any effective counter-measures, as well as any failed attempts. Since 1995 the results of those meetings have been made public. Over the years the typologies have become more focused, with working groups established to intensify the scrutiny given to a particular topic.

As noted above, enforcement in FATF is a critical point of contention among scholars. The policy currently in place is a synthesis of previous versions and I discuss those changes in detail below. The process in place now, the International Cooperation Review Group (ICRG), began in 2007. In 2009, the G20 called on FATF to strengthen the ICRG procedures and publicly name countries with deficient AML systems. The enhanced procedure entails an initial mutual evaluation. If the country’s system is found to be insufficient, the country report is sent to the ICRG for an initial review, with comments from the target jurisdiction, should officials choose to provide them. If the ICRG also finds the system insufficient, FATF conducts additional monitoring and expects the target to work with FATF members in order to develop a comprehensive reform plan with high-level political commitment. FATF now maintains a rolling list of jurisdictions that are seriously deficient and making no progress toward reform, as well as a list of those that have made the commitment but have not yet completed the required reforms.

This basic outline of standard setting and governance in FATF closely reflects Sabel and Zeitlin’s architecture of experimentalism. First, FATF members and relevant institutions set framework goals (the 40 Recommendations) and metrics for gauging progress toward those aims. The common methodology used for the mutual evaluation, as well as the process for evaluating the results, defines the metrics for evaluation. Second, lower-level units are obliged to implement policies aimed at meeting those standards, but are given the autonomy to implement the policies they deem most appropriate. The Recommendations are broad and focus on outcomes, requiring jurisdictions to write their own legislation and encouraging culturally more specific regulatory systems. Third, in return for that autonomy, they must report regularly on their performance and participate in a peer review of their policies that compares their performance to others’. Monitoring is extensive in FATF and peer review is a critical part of decision-making in FATF, dynamics bolstered by the systemic, peer-based monitoring of the typology exercises. Finally, the lessons learned from implementation and evaluation are taken into account as the goals, metrics, and procedures are periodically reviewed and revised by participants.Footnote 8 The most recent iteration represents the fourth version of the Recommendations. In between major revisions, FATF’s use of interpretive notes serves to clarify the intention of the Recommendations. FATF also circulates best practices to serve as a guide. Figure 1 provides a rough flowchart of FATF decision-making.

Fig. 1 A representation of the process by which participants have created and revised FATF standards

Beyond form: Experimentalism in function

Three key pieces of evidence best illustrate the contention that FATF often, although not always, functions in line with experimentalism. First, blacklisting has not played the decisive role commonly ascribed to it, even by practitioners. In addition, more purely coercive versions of the blacklists were less effective than the versions more aligned with experimentalism. Second, monitoring is a definitive aspect of FATF, and a close look shows that it is diagnostic, rather than punitive. Finally, there is an important reflexivity to FATF decision-making, as shown in the development of a risk-based approach. All of these examples represent a broader experimentalist dynamic within FATF. They also all address issues that are at the center of what FATF aims to do. These examples are indicative of a trend, but also show that experimentalism is at the center of FATF’s governance.

Blacklisting in FATF

As already noted, academics and practitioners alike grant the FATF blacklist a lot of significance in shaping the network’s influence. This is true, despite the paucity of evidence to verify that pattern. The experimentalist framework used here, however, uncovers a different story.

We begin with the fact that FATF had no formal policy for enforcing its standards for a full decade following its creation, despite the extensive monitoring process described above. The only note of enforcement was to be found in Recommendation 21 (now Rec. 19), which calls for all members to practice extreme caution in dealing with financial bodies from non-compliant states, including obtaining written explanations of the reason for any financial transaction.Footnote 9 The process for invoking enforcement was not evident. Not only was there no policy; meeting records show that some delegates early on explicitly rejected the use of blacklists or whitelists in FATF, and so FATF as a whole rejected them. Scholars have failed to consider this important reality: in the early phases of regime formation, members explicitly rejected material enforcement. Nevertheless, this was a vital period in which members reached consensus on the foundational recommendations that remain the cornerstone of the regime.

Members developed the first process for enforcement in an ad hoc manner, in response to continued, flagrant non-compliance by Turkey and Austria. Turkey had failed to criminalize money laundering, while Austrian banks issued anonymous bank accounts (for more details, see [15]). FATF developed a loose three-step process. States first had to respond to concerns by continuing to report in future FATF plenaries until found to be sufficiently compliant. The second step was a letter or series of letters from the FATF president expressing concern and a visit from a FATF delegation. The last step was expulsion from FATF and the invocation of the call for careful treatment of financial transactions with those jurisdictions. In contrast to the European Union, however, FATF has no competence to apply any kind of material sanction on non-compliant states. The recommendation also places the primary onus on private sector actors.

The second enforcement route garnered more headlines and is the primary source of FATF’s reputation as a coercive institution. Members developed this Non-Cooperative Countries and Territories (NCCT) process to deal with the problem of “non-compliant” non-members. In contrast to the patient approach taken with members, non-members that had insufficient regulatory systems faced quick enforcement, especially those perceived to be intentionally marketing their lax standards. Members issued a report in 2000 that identified twenty-five criteria that comprised the most important recommendations. After members decided on the most problematic jurisdictions, they reviewed those jurisdictions’ AML systems, met with local authorities, and issued a report. The whole process took four months. Members listed sixteen jurisdictions and added eight more the following year. Even under this stricter system, Hülsse and Kerwer (2004) argue, the imposition of formal sanctions was an outlier in an AML regime in which non-compliance was common. Furthermore, and despite the attention this process receives in scholarship, members stopped adding new jurisdictions after 2002 and had removed all jurisdictions from the list by 2006. Members then officially suspended the NCCT process. They eventually replaced the NCCT with the ICRG process, described above, which remains in place today.

The history of blacklisting reveals two points that bolster the case for an experimentalist interpretation. First, blacklisting clearly has changed over time and has done so in response to new knowledge about the problem. States initially rejected blacklists, but, as described in the Annual Report (1996), developed them in response to the perception that continued non-compliance by Turkey and Austria was “clearly damaging” their efforts. Members developed the NCCT process in reaction to the move by Seychelles to profit from money laundering. They developed the ICRG process in 2007 as a synthesis model, merging the member and the NCCT processes in light of protests from FATF members, targets of the NCCT process, and, most importantly, the World Bank and IMF, who refused to cooperate with FATF while the NCCT process was still in place. Members altered the ICRG process in June 2009. According to the 2009 Annual Report, the reform process had begun internally. The G20 gave additional momentum to those efforts when it called for regularly updated lists of non-compliant jurisdictions. There is a clear pattern within the blacklisting process, a process that many see as vital to FATF’s operation, whereby members alter it in light of perceptions about the challenges of implementation.

More importantly, with the notable exception of the NCCT, which was not very effective and which members quickly abandoned, blacklists in FATF more generally have been applied only when the state or jurisdiction in question has stopped engaging the reform process. This is in line with experimentalist expectations of the use of a “penalty default.” An ideal typical penalty default entails a third party imposing “rules sufficiently unpalatable to all parties that each is motivated to contribute to an information-sharing regime that allows fair and effective regulation of their interdependence” (Sabel & Zeitlin, 2010). That predictably suboptimal outcome provides an additional incentive to engage in good faith with the decision-making process. This should be especially true since participants routinely revise the standards of an experimentalist process, providing those that participate with more voice than in a more hierarchical regulatory system. But that penalty default should be imposed only as a last resort. The use of blacklists against members prior to the NCCT fits this mold. Despite imperfect compliance among most members, only Turkey and Austria faced blacklisting and only as extreme cases. Once they re-engaged FATF, but before they were fully compliant, FATF members removed the two countries from the list.

The ICRG blacklisting process in place today operates similarly. It effectively establishes two categories of non-compliance, and only those on the worst-performing list face calls for enforcement. The first list, the so-called grey list, has a rotating cast of inhabitants based on whether members determine different states to be cooperative or un-cooperative, not whether they are compliant or non-compliant (Footnote 10). None has ever moved to the blacklist. Since the ICRG process’s inception, only North Korea (the Democratic People’s Republic of Korea) and Iran have been blacklisted. This suggests the dynamic of a penalty default. It certainly cuts against the argument that enforcement is credible, and a lack of credibility should diminish the efficacy of enforcement. Once a state engages the process and makes credible plans to improve its AML system, members quickly remove the state from the grey list.

In short, experimentalism provides a more comprehensive understanding of the FATF blacklists, the core body of evidence for a coercive understanding of FATF, than do other approaches. For most of its history, members have been reluctant to deploy their substantial material power. When they have done so, the results have been mixed at best. Members also have altered the enforcement process in response to experience implementing the strategy. Those changes in general have moved away from a more material application and more toward the “penalty default” logic of experimentalism.

Evolution of diagnostic monitoring

Monitoring is another vital part of FATF that displays key experimentalist attributes. Building from the previous section, if blacklisting operates more in line with the notion of a penalty default than with more standard conceptualizations of material enforcement, then it stands to reason that the role of extensive monitoring within the FATF system should be more diagnostic than punitive. The mutual evaluations are among the most extensive monitoring systems of any international organization and imperfect compliance is well documented, yet enforcement remains rare. This stands in stark contrast to standard approaches to monitoring and enforcement within political science, which see monitoring as necessary to detect defections and to provide the evidence needed to overcome the collective action problem associated with enforcement. On that view, without strict monitoring and enforcement, compliance with standards that require meaningful change from states will be lacking (Footnote 11).

Monitoring in FATF has shown the experience-based revision that experimentalism prescribes. While FATF was a one-year task force, the question of monitoring was irrelevant. This changed once the 40 Recommendations had been published and the mandate expanded for an additional year. Delegates by then had converged around the idea that FATF should monitor the implementation of the recommendations. At the beginning, that meant a self-evaluation questionnaire in which states noted with a simple “yes” or “no” whether they had implemented or planned to implement a particular recommendation.

After the initial round of evaluations, the 1993 annual report indicates, members found that the format yielded too little information. The revised questionnaire was an enhanced version of the self-evaluations that allowed states to explain at length whether and how they intended to implement the recommendations. At the same time, again under the temporary mandate of FATF, states proposed that FATF begin the much more invasive system of mutual evaluations (Footnote 12). This phase, however, still focused predominantly on whether states had the appropriate laws on the books. By the third round of mutual evaluations, which began in 2004 and ended in 2014, members had shifted the focus from a simple box-ticking exercise of whether the laws had been passed to whether those laws were actually being implemented.

At about the same time they implemented the third round standards on implementation, members began discussions about how to move the standard up from implementation to effectiveness. In that vein, the common methodology that evaluators filled out in conducting the third round of mutual evaluations included a small amount of vague guidance on how to assess the effectiveness of a jurisdiction’s AML system. The handbook for assessors published in 2009 gave more specific instructions to the assessment team. It expounded upon the methodology and included a three-page annex on the kinds of information that assessors might seek in judging effectiveness. The primary focus of the third round remained on “technical compliance,” or whether states had the appropriate laws on the books and whether those laws were being implemented. The inclusion of effectiveness in the third round was, for lack of a better phrase, experimental.

Near the end of the third round, FATF began revising its standards, in the members’ words, “to take into account the changing threats to the international financial system, and close any shortcomings and loopholes in the existing Recommendations, reflecting the lessons learnt from implementing and evaluating them” ([45], p. 15). As a result of those discussions, the fourth round takes as its primary focus the question of effectiveness, a condition in which “Financial systems and the broader economy are protected from the threats of money laundering and the financing of terrorism and proliferation, thereby strengthening financial sector integrity and contributing to safety and security” (ibid.). In judging that effectiveness, and reflecting the outcome-based nature of experimentalism, the methodology emphasizes that the aim is not to assess how a jurisdiction “is implementing individual Recommendations; or the performance of specific organisations, or institutions” ([45], p. 14). Rather, assessors will rate states’ performances on eleven “immediate outcomes,” which FATF members believe feed eventually into overall effectiveness. These outcomes retain, however, the broadly written character commonly found in experimentalism. Outcome 2, for example, reads: “International cooperation delivers appropriate information, financial intelligence, and evidence and facilitates action against criminals and their assets” ([45], p. 15). Outcome 5 reads: “Legal persons and arrangements are prevented from misuse for money laundering or terrorist financing, and information on their beneficial ownership is available to competent authorities without impediments” (ibid.). Then-FATF President Bjørn Aamo noted that the approach was novel for a standard-setting body and so would entail challenges for FATF. He stressed that the new approach would “result in an in-depth knowledge of a country’s money laundering and terrorist financing threats and risks and its AML/CFT measures” ([45], p. 7; Footnote 13).

This evolution of monitoring shows strong signs of experimentalist dynamics, particularly in the sense that changes seem to come in light of experience. This does not mean that the system is ideal. The outcomes of the fourth round of evaluations are still to be seen. The structure of evaluation itself seems to have responded to some external critics. In 2012, for example, Global Witness proposed a series of reforms to the common methodology that it felt would strengthen the AML regime. The 2013 common methodology and the fourth round evaluation reports closely reflect many of those proposals [48]. Whether or not those proposals directly influenced FATF is not clear. It is possible that they did, in which case it is a good, specific example of the feedback loop that experimentalism expects. It is also possible that both documents reflect a more general consensus within the expert community regarding the needed changes for a better AML system. That would still be a strong example, if a more generalized one, of how diffused knowledge and expertise within the broader AML network is integrated effectively into FATF’s standards. Either way, the evolution of monitoring goes beyond “just” process.

The information generated by mutual evaluations, however, is only as good as the steps members take to use that information. Here, the peer review system deserves special attention. According to numerous interviews with participants, the peer review system is a valuable one. It plays two key roles. The first is to allow for peer pressure. Members discuss the results of the mutual evaluations in plenary. The President appoints a discussant who is responsible for digesting the full report and questioning the relevant jurisdiction on the problems identified. This can only work if the state under examination accepts the legitimacy of the questioning state, as well as the FATF process more generally.

The peer review also can enhance the accuracy of the evaluation process. The 1992–1993 FATF Annual Report explicitly acknowledges this role: “The purpose of this phase is to verify the validity of the facts and the state of implementation. This intensive peer review is a necessary means for reaching a clear and unbiased assessment of where the country stands and of the areas in which further efforts may be warranted.” (Footnote 14) In interviews, delegates to FATF from both Mexico and the United States provided anecdotes of cases in which their respective delegations were able to justify in the peer review some domestic law or regulation that reviewers initially had considered non-compliant with FATF standards. Through peer review, they were able to explain how the law in question, in their respective contexts, in fact was closer to the FATF standard than originally judged. As a result, members judged them to be partially compliant instead of non-compliant.

In sum, whereas standard schools of thought in International Relations understand monitoring largely as a tool of enforcement, monitoring in experimentalist systems is primarily diagnostic. Monitoring also reflects the evolutionary dynamic that experimentalism predicts. To be sure, FATF members can be critiqued for not having turned sooner to the question of effectiveness, and time will tell how well they are able to gauge and influence outcomes in that regard. The gradual move from self-questionnaires to mutual evaluations, and from mutual evaluations focused on passing laws, to implementing laws, and, now, to ensuring the effectiveness of those laws, represents more than just a change in procedures. It is a fundamental re-writing, and in this case a ratcheting up, of the standards that FATF requires for compliance. In this regard, experimentalism provides us with a much more nuanced understanding of the role and evolution of monitoring within FATF.

Specifying a “risk-based approach”

A third, more specific, example of experience-driven change in FATF is the continued development of knowledge surrounding a risk-based approach (RBA) to regulation. In brief, a risk-based approach to AML means that the level of regulatory attention paid to any one recommendation by FATF should be determined by the likelihood that the jurisdiction in question would be subject to the abuse that the recommendation targets. FATF describes it in a guidance paper on the topic: “The general principle of a RBA is that where there are higher risks countries should require financial institutions to take enhanced measures to manage and mitigate those risks, and that correspondingly where the risks are lower (and there is no suspicion of money laundering or terrorist financing) simplified measures may be permitted” (18). This in theory is a significant move forward in the contextualization of FATF’s standards, and in general most actors seemed to endorse the idea of a risk-based approach. In practice, however, the new approach generated much uncertainty about how to carry out what FATF and national regulators would consider an appropriately rigorous risk assessment. While that work is still ongoing, the process of developing and elaborating on a risk-based approach reflects the logic of experimentalism in two important ways.

To begin, members developed the risk-based approach in close consultation with the private sector. The group that drafted the key documents comprised representatives from fourteen FATF member states, six international organizations, eight securities firms or organizations, and twenty-one banking industry representatives (either individual banks or banking associations). A member of the United Kingdom’s Financial Services Authority and a private sector AML expert who, at the time, worked for GE Money in the United States were co-chairs. As members noted in the guidance on RBA adopted in 2008, it was “the first occasion that the FATF has developed guidance using a public-private sector partnership approach” ([49], p. 1). The private sector’s influence is visible in the standards. Para. 1.13 of the 2007 guidance on the RBA notes: “…it must be recognized that any reasonably applied controls, including controls implemented as a result of a reasonably implemented risk-based approach will not identify and detect all instances of money laundering or terrorist financing. Therefore, regulators, law enforcement and judicial authorities must take into account and give due consideration to a financial institution’s well-reasoned risk-based approach.” In this sense, the development of the RBA within FATF also represents the increased integration of stakeholders into the standard-setting process, as experimentalism prescribes.

That statement also points to a central question in the further development of the RBA. What is an acceptable level of risk, and who oversees those decisions as made by the financial actors in question? Without meaningful oversight, of course, RBA can easily mean simple de-regulation. For FATF, the answer to one part of that question is that jurisdictions using RBA must put in place “an adequate mechanism” to judge the appropriateness of institutions’ risk determinations. National authorities are responsible for overseeing the risk assessment process that institutions use in applying exemptions.

At the same time, FATF has continued to work to clarify what those “adequate mechanisms” are. FATF originally gave the following description in the 2012 FATF Recommendations: “When a financial activity is carried out by a natural or legal person on an occasional or very limited basis (having regard to quantitative and absolute criteria) such that there is a low risk of money laundering or terrorist financing, a country may decide that the application of AML/CFT measures is not necessary, either fully or partially.” As FATF itself notes, that standard is very important and very vague, in particular the standard of activity “on an occasional or very limited basis.” The initial guidance on the RBA foresaw and acknowledged this challenge. It also emphasized that the goal was not to develop “a single model” but “to provide guidance for a broad framework based on high level principles and procedures that countries may wish to consider when applying the risk-based approach…” ( [49], p. 1).

In 2011, FATF issued another guidance on the implementation of its Recommendations, with a heavy focus on how to implement an RBA. In it, members elaborated on the standards for exemptions from FATF’s Recommendations for an individual or sector. It notes that “Countries that opt for such an exemption must be able to make and demonstrate the correlation and cause and effect relationship between, on the one hand, the very limited and occasional nature of the financial activity and, on the other hand, the assessed low level of ML and TF risk” (20). It also highlights, however, that based on the mutual evaluations, “In most countries, the current exemptions are essentially based on a ‘perception’ of low risk because of the size of the activity or its nature…with no or very little evidence to support the risk ranking” (21). In monitoring the implementation of the risk-based approach, in other words, members had noted a pattern that was in conflict with the Recommendations.

Members then sought to provide further guidance based on a positive case. The European Union’s anti-money laundering directive had implemented similar flexibility in 2005, and a 2006 directive had set out the technical criteria for simplified customer due diligence requirements and for exemptions based on the occasional or limited basis of the activity. FATF’s 2011 guidance paper, which aimed to help states implement the risk-based approach, provides the details of the UK’s implementing regulation, which was written to meet the EU’s standards (p. 21). It bears repeating those standards here to emphasize the way in which the UK’s regulation fills in the details of the EU’s directive, which in turn is at least in part filling in FATF’s standards:

…a person is to be considered as engaging in financial activity on an occasional or very limited basis if all the following conditions are fulfilled:

  (a) the person’s total annual turnover in respect of the financial activity does not exceed GBP 64000;

  (b) the financial activity is limited in relation to any customer to no more than one transaction exceeding 1 000 EUR, whether the transaction is carried out in a single operation, or a series of operations which appear to be linked;

  (c) the financial activity does not exceed 5% of the person’s total annual turnover;

  (d) the financial activity is ancillary and directly related to the person’s main activity;

  (e) the financial activity is not the transmission or remittance of money (or any representation of monetary value) by any means;

  (f) the person’s main activity is not that of [(a) credit institutions; (b) financial institutions; (c) auditors, insolvency practitioners, external accountants and tax advisers; (d) independent legal professionals; (e) trust or company service providers; (f) estate agents; and (h) casinos];

  (g) the financial activity is provided only to customers of the person’s main activity and is not offered to the public.
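Because all seven conditions must hold at once, the quoted test can be read as a simple conjunction of checks. The following Python sketch is only an illustration of that logic, not the regulation's or FATF's own method. The `Business` record, its field names, and the helper functions are hypothetical; the sketch assumes that linked operations have already been combined into a single transaction amount and that the 5% limit compares the financial activity's turnover with total turnover.

```python
from dataclasses import dataclass, field

# Thresholds taken from the conditions quoted above.
MAX_ACTIVITY_TURNOVER_GBP = 64_000
LARGE_TRANSACTION_EUR = 1_000
MAX_SHARE_OF_TOTAL_TURNOVER = 0.05

# Main activities that rule out the exemption under condition (f).
EXCLUDED_MAIN_ACTIVITIES = {
    "credit institution", "financial institution", "auditor",
    "insolvency practitioner", "external accountant", "tax adviser",
    "independent legal professional", "trust or company service provider",
    "estate agent", "casino",
}


@dataclass
class Business:
    """Hypothetical record of the facts the seven conditions ask about."""
    main_activity: str
    total_turnover_gbp: float
    activity_turnover_gbp: float
    # Transaction amounts in EUR per customer. Assumption: operations that
    # appear linked have already been summed into one amount.
    transactions_eur: dict[str, list[float]] = field(default_factory=dict)
    ancillary_to_main_activity: bool = True
    is_money_transmission: bool = False
    only_for_main_activity_customers: bool = True


def failed_conditions(b: Business) -> list[str]:
    """Return the letters of the quoted conditions that the business fails."""
    failed = []
    if b.activity_turnover_gbp > MAX_ACTIVITY_TURNOVER_GBP:
        failed.append("a")
    # (b): at most one transaction above EUR 1,000 for any single customer.
    if any(
        sum(1 for amount in amounts if amount > LARGE_TRANSACTION_EUR) > 1
        for amounts in b.transactions_eur.values()
    ):
        failed.append("b")
    if b.activity_turnover_gbp > MAX_SHARE_OF_TOTAL_TURNOVER * b.total_turnover_gbp:
        failed.append("c")
    if not b.ancillary_to_main_activity:
        failed.append("d")
    if b.is_money_transmission:
        failed.append("e")
    if b.main_activity in EXCLUDED_MAIN_ACTIVITIES:
        failed.append("f")
    if not b.only_for_main_activity_customers:
        failed.append("g")
    return failed


def is_occasional_or_very_limited(b: Business) -> bool:
    """The exemption test passes only if every condition is fulfilled."""
    return not failed_conditions(b)
```

Returning the list of failed conditions, rather than a bare yes or no, mirrors the point made in the guidance that an exemption should rest on demonstrable evidence rather than a general perception of low risk.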

FATF’s guidance also discusses how France applies an exemption to two sectors (money changers and insurance intermediaries). It describes the rules and then writes specifically that FATF sees those standards as acceptable, while noting that “The exemption…is mainly expected to apply to tourist offices, hotels, travel agencies, businesses serving foreign travellers, etc” (22). The guidance emphasizes that “The French authorities have been required by the FATF to apply adequate monitoring of these exemptions” (22). These same examples were highlighted in the 2013 guidance on the 40 Recommendations (25–6). In a 2013 guidance on how to conduct and use a risk assessment, FATF highlighted the approach used in Switzerland as an additional example.

In the case of the risk-based approach, FATF members, in close cooperation with the private sector, developed an intentionally broad standard: the RBA in general and, in particular, the standard of activity that occurs on “an occasional or very limited basis.” In monitoring the implementation, members noted the lax application of the standard for exemptions from FATF’s highest standard of scrutiny. They then drew on best practices by other states that had developed detailed procedures in response to the EU’s 2005 and 2006 AML directives that established the same standard. It remains to be seen how this process of RBA refinement will continue or how best practices will be integrated into FATF’s standards or its monitoring. The process to date, however, is a strong, substantive, and specific example of the process of experience-driven revision of international standards that experimentalism expects and prescribes.

Re-thinking FATF, re-thinking experimentalism

The unusual role and history of blacklisting, the diagnostic use of monitoring, and the experience-based development of the risk-based approach are all important examples of experimentalism at work in the AML regime. This interpretation of FATF has several important implications for scholars of FATF in particular, of international cooperation more generally, and of experimentalist governance.

For scholars of FATF, an experimentalist understanding means taking more seriously the internal operations and processes of FATF. This is difficult, as identifying them requires careful process tracing. To do otherwise, however, is to risk imputing causation to what in fact is correlation. Many observers argue that FATF is driven by the US and EU because their interests align. This overlooks substantial disagreements between the US and the many diverse members of the EU. It also ignores the possibility that the causal arrows, in some cases, point in the opposite direction. If this experimentalist interpretation is correct, it means that FATF, the network, plays a much larger role in shaping actor preferences than has previously been acknowledged.

This is not to argue that FATF and its members have always acted in line with the principles of experimentalist governance. The NCCT blacklist process, seen in isolation, was not experimentalist. The process by which members refined the blacklists, however, is in line with experimentalism. Likewise, the fact that blacklisting remains extremely rare and even “grey-listing” is short-lived matches the experimentalist prediction that standards can be raised over time without needing to increase the role of enforcement. The shift to judging system effectiveness shows this same ratcheting up dynamic.

Nor is the argument that FATF is a power-free realm. Rather, experimentalist governance alters the form of power that is most effective. It alters “the coin of the realm” from coercion to learning. FATF has faced the most problems when it has veered from the experimentalist track and turned to a more coercive model. That should alter our understanding of what it takes for FATF to be a more effective tool in the fight against illicit finance.

This understanding of FATF also underscores the seriousness of the problem raised by Levi et al., in this volume, namely, that states now face a mutual evaluation only once every eight years or more. If the goal is to generate more information that helps states react to a fast-moving environment, mutual evaluations should happen more frequently, as they did in the beginning.

This relatively long-running “experiment with experimentalism” provides several important lessons for those seeking to understand how and where experimentalism might fit among the panoply of governance tools at work today. It confirms de Búrca’s [50] argument that experimentalist institutions are not always designed to be so, but sometimes develop in response to uncertainty. It reinforces the notion that experimentalist governance is not a power-free governance system and as such is subject to misuse by the powerful. It also reinforces the idea that such misuses are overcome more easily in experimentalist governance than is true in most quasi-hierarchical systems. The case illustrates that experimentalism can work even in cases of significant power asymmetries, as is true of international finance.

This question about the nature of FATF also raises important questions for policy makers. As noted throughout this special issue, the most pressing question for FATF right now is its overall effectiveness. Is it slowing money laundering? Leaders thus far have been quite open about their uncertainty regarding how best to measure effectiveness. Attention to that question seems long overdue, but if experimentalism can be harnessed to improve the network’s understanding of its own effectiveness, the result should be a better FATF and a better AML regime.

The findings here also draw attention to the general applicability of experimentalist models. I have argued elsewhere [15, 51] that the United Nations’ recent efforts to fight proliferation of Weapons of Mass Destruction under the auspices of UN Security Council Resolution 1540 have experimentalist overtones. The experimentalist work cited above finds experimentalism at work in areas ranging from environmental policy to human rights. What changes in policy-making does experimentalism require? This question deserves greater attention, but initial evidence suggests that such processes require a longer time horizon for results and greater tolerance for deviation across local contexts. They also seem to require an openness to critique in order to promote learning. Governments often find those conditions difficult to accept, so the task becomes convincing leaders that the improvement in problem-solving is worth the costs.