1 Introduction

Excessive and useless reporting, known as the “crying wolf effect” (Takáts 2011), is a crucial shortcoming that any anti-money laundering (AML) design aims to address and fix. The “crying wolf effect” harms the informational value of reports that banks and other professionals are obliged to file to comply with AML regulations.

The AML system in Europe and in the United States consists of a three-layer hierarchy of enforcers: financial intermediaries and other professionals; a Financial Intelligence Unit (FIU), normally established at the central bank; and the judiciary system. At the first level, financial intermediaries and other professionals are required to monitor all financial transactions and report suspected acts of money laundering to the FIU by filing a suspicious transaction report (STR).

Initially, the AML system followed a rule-based approach. Financial intermediaries and other professionals used a set of standardized criteria (determined by the law and the FIU) to identify suspicious transactions and report them to the FIU. In that system, the role of financial intermediaries and other professionals was relatively passive. A chief problem of the rule-based approach was the high number of STRs erroneously issued by financial intermediaries and professionals. The high incidence of type-I errors (false positives) in the rule-based AML system was considered inefficient because it wasted the FIU’s resources. Moreover, it was ineffective in deterring money laundering and detrimental for intermediaries and professionals (especially from a reputational perspective). However, simply raising the bar by imposing stricter rules and criteria for reporting a transaction as suspicious to the FIU was not a solution. In this regard, false negatives represent an additional problem for AML systems, as decisions to not report potential money-laundering transactions (type-II errors) both dilute deterrence and make the financial system less reliable (Demetis 2010).

Between 2007 and 2010, AML policies in both the US and Europe switched from a rule-based reporting system to a risk-based system in which all layers of the system need to respond to money-laundering threats in ways that are proportionate to the risks involved.Footnote 1 In particular, financial intermediaries and other professionals are required to play an active role in identifying suspicious transactions (Black and Baldwin 2010; Dalla Pellegrina and Masciandaro 2009). They must exploit their knowledge and other information regarding the financial habits of their customers [the know-your-customer (KYC) approach] to better determine which transactions should be reported as suspicious to the FIU. They must also apply their subjective judgment to assess the actual risk of a transaction being money laundering. In fact, intermediaries and professionals are required to adjust their reporting criteria and, therefore, move up or down their decisional bars (reporting test) when deciding whether to report a transaction to the FIU based on the actual risk of money laundering (Axelrod 2017; Lowe 2017).

The risk-based approach was introduced starting from 2007 mainly to avoid over-reporting to the FIU without allowing type-II errors to explode. In general, the risk-based AML system aims to increase the reliability and accuracy of the STRs that financial intermediaries and other professionals send to the FIU. In this vein, the KYC approach (see Jeans 2016) should allow the reduction of the number of both type-I and type-II errors at the first level of the AML system. At the second level of the AML system, the FIU analyses the collected STRs and reports the transactions that it deems to be money-laundering acts to the judicial authority. Type-I errors in STRs submitted by financial intermediaries and professionals are typically dismissed by the FIU because they are not considered true money-laundering transactions.Footnote 2 The judicial authority, the third level of the AML system, collects reports from the FIU and decides whether to issue a referral to trial.

Type-II errors committed on both the first and second levels of the AML system can be detected by the judicial authority. The latter also collects reports on money laundering that come from institutions other than the FIU and from actors other than financial intermediaries and professionals. For instance, money laundering can be detected by the police while investigating other crimes. Sometimes, criminal organizations’ confessions describe how illegal funds are laundered and how those activities avoid AML measures (Arnone and Borlini 2010; Barone and Masciandaro 2019). Although they are possible, errors at the level of the law enforcement system are not included in this analysis, which focuses on the first level of the AML system.

As Unger and van Waarden (2009) discussed, despite the aim of making the first level of the AML system more reliable, the impact of the risk-based approach differs across countries. In some countries, over-reporting decreased, and the overall quality of the reported information improved. However, this is not the case in other countries.Footnote 3

This study aims to investigate whether the risk-based approach introduced in Italy in 2009 had the expected results in terms of increased reporting accuracy and, in particular, a lower rate of type-I errors at the first level of the AML system. The analysis is based on a theoretical model that describes the relations between the reporting test, type-I errors, type-II errors, and their sum (a measure of accuracy), and the deterrence of money laundering activities. In general, the empirical aim is to test the most important implications of the theoretical model using data from the Italian FIU in the aftermath of the risk-based approach introduction (2009–2012).Footnote 4 We perform regression analysis supported by an approach based on the concept of sufficient statistics (Chetty 2009). Our results show that deterrence is prioritized, although an increase in type-I errors likely occurred following the introduction of the risk-based system. We thereby make inferences regarding the trend in type-II errors using the predictions of the theoretical model. We conclude that type-II errors decreased during the period of interest.

The remainder of this paper proceeds as follows. In the next section, we present our theoretical framework. In Sect. 3, we empirically assess the model and discuss the policy implications. Section 4 concludes this paper. The three appendices provide all details of the analysis.

2 The Model

All three levels of the AML system make their decisions based on pieces of evidence that support or contradict the hypothesis that a certain transaction involves money laundering. In particular, financial intermediaries and professionals decide whether to issue an STR for a given transaction mainly by considering the transaction attributes and customer characteristics (Gara and Pauselli 2015).

The professional assessment of intermediaries is relevant and must be considered together with all the other elements when assessing the risk that a transaction is money laundering. Financial intermediaries and professionals issue an STR when the evidence corroborating the idea that a transaction involves money laundering is greater than a certain threshold they set to conduct the anti-money laundering activity (reporting test).

2.1 Evidence of Money Laundering

We assume that each attribute of a transaction, including judgements on the parties involved in the transaction, is a piece of evidence that takes either a positive sign when consistent with the suspicion of money laundering or a negative sign when against that suspicion. We do not consider how the net evidence is produced; however, all pieces of evidence combine in the net evidence, which is the continuous random variable E (see Rizzolli and Saraceno 2013). In particular, we assume the following:

  • \(\left(E|Guilty\right)\sim N\left({\mu }_{G},{\sigma }_{G}^{2}\right)\); we define \({g}_{E}\left(e\right)\) as the probability density function of E conditional on the client being guilty and \({G}_{E}\left(e\right)\) as its cumulative distribution function.

  • \(\left(E|Innocent\right)\sim N\left({\mu }_{I},{\sigma }_{I}^{2}\right)\); we define \({i}_{E}\left(e\right)\) as the probability density function of E conditional on the client being innocent and \({I}_{E}\left(e\right)\) as its cumulative distribution function.

  • \({\mu }_{I}<0<{\mu }_{G}\), and \({\sigma }_{I}^{2}={\sigma }_{G}^{2}\) (first-order stochastic dominance).

  • \({i}_{E}\left(-e\right)={g}_{E}\left(e\right)\) (symmetry).

The assumption of normality is for the sake of simplicity and is consistent with the graphic illustration of the model (Figs. 1, 2, 3, 4, 5, 6, 7); it might be relaxed. The assumption of first-order stochastic dominance of \({G}_{E}\) over \({I}_{E}\) is consistent with the idea that the net evidence is informative to some extent: on average, the net evidence is positive for a money-laundering transaction and negative for a transaction that does not involve money laundering. As the observed net evidence e increases, it becomes less likely that it is about an innocent client. Conversely, as the observed net evidence decreases, it is less likely to concern a guilty client.
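The assumptions listed above can be checked numerically. The following is a minimal sketch, assuming illustrative parameters \({\mu }_{G}=1\), \({\mu }_{I}=-1\) and a common standard deviation of 1; these values and the names `guilty`, `innocent` and `check_assumptions` are our own choices for illustration and are not taken from the paper.

```python
from statistics import NormalDist

# Illustrative parameters (assumptions, not taken from the paper).
MU_G, MU_I, SIGMA = 1.0, -1.0, 1.0
guilty = NormalDist(MU_G, SIGMA)      # distribution of E | Guilty
innocent = NormalDist(MU_I, SIGMA)    # distribution of E | Innocent

def check_assumptions(grid):
    """Return (symmetry_ok, dominance_ok) evaluated on a grid of evidence values."""
    # Symmetry: i_E(-e) = g_E(e) for every e.
    symmetry_ok = all(abs(innocent.pdf(-e) - guilty.pdf(e)) < 1e-12 for e in grid)
    # First-order stochastic dominance of G_E over I_E: G_E(e) <= I_E(e).
    dominance_ok = all(guilty.cdf(e) <= innocent.cdf(e) for e in grid)
    return symmetry_ok, dominance_ok

grid = [k / 10 for k in range(-50, 51)]
symmetry_ok, dominance_ok = check_assumptions(grid)
print(symmetry_ok, dominance_ok)
```

With equal variances and means of opposite sign, both checks hold by construction; changing only one of the means breaks the symmetry check while dominance is preserved as long as \({\mu }_{I}<{\mu }_{G}\).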

Fig. 1. Reporting test and probabilities of a correct or incorrect reporting decision

Fig. 2. ROC curve

Fig. 3. Reporting test and money-laundering rate for different levels of sanctions

Fig. 4. Reporting activity and money-laundering rate

Fig. 5. Error rate, shares of type-I and type-II errors, and money-laundering rate

Fig. 6. Error rate for various MLRmin corresponding to various sanctions S

Fig. 7. Error rate for various MLRmin corresponding to various sanctions S

Fig. 8. Intervals of assumptions H1–H3

Fig. 9. Scree plot from factor analysis on predicate crimes, Italian provinces, 2009–2012

The assumption of symmetry with respect to a vertical axis remains reasonable because we presume that the process of evidence production is analogous both in the case of guilty behavior and in the case of legitimate transactions. However, clients’ behavior could differently affect the availability of indications of innocence or guilt contingent on whether they are innocent or guilty. Elusive behavior on the side of guilty clients might imply conditional distributions that are not symmetric with respect to a vertical axis. The proofs in Appendix 1 are provided by assuming two generic distributions that are symmetric with respect to any given vertical axis (in the figures provided in the rest of the paper, the conditional distributions are symmetric with respect to the vertical axis e = 0). By inspecting the proofs provided in Appendix A.1.1 we observe that symmetry is needed only to provide clearer and intuitive thresholds and the results of Sect. 2.4. Implications for the gold standard and deterrence (Result 1) and the relation between error rate, reporting test, and sanctions (Result 2) are valid even by relaxing symmetry.

2.2 The Reporting Test and the Decision to Issue an STR

For each transaction, financial intermediaries and professionals observe a certain net evidence e and issue an STR when e is greater than a certain threshold, ê, which is the reporting test. Therefore, given a transaction, the probability of a correct STR being issued is \(\mathrm{P}\left[E>\widehat{e}|Guilty\right]=1-{G}_{E}\left(\widehat{e}\right)\). Conversely, an STR is incorrectly issued (type-I error) with a probability of \(\mathrm{P}\left[E>\widehat{e}|Innocent\right]=1-{I}_{E}\left(\widehat{e}\right)\). Moreover, an STR is correctly not issued for a transaction that is not an act of money laundering with a probability of \({I}_{E}\left(\widehat{e}\right)\). Finally, an STR is incorrectly not issued for a money-laundering transaction (type-II error) with a probability of \({G}_{E}\left(\widehat{e}\right)\). Figure 1 illustrates these probabilities as functions of the reporting test \(\widehat{e}\).Footnote 5
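The four probabilities just defined can be computed directly from the two conditional distributions. The sketch below assumes the same illustrative parameters as before (\({\mu }_{G}=1\), \({\mu }_{I}=-1\), unit variance); the parameter values and the function name `reporting_probabilities` are hypothetical and serve only to illustrate the definitions.

```python
from statistics import NormalDist

# Illustrative parameters (assumed, not from the paper): mu_G = 1, mu_I = -1, sigma = 1.
guilty = NormalDist(1.0, 1.0)
innocent = NormalDist(-1.0, 1.0)

def reporting_probabilities(e_hat):
    """Probabilities of the four outcomes for one transaction, given the test e_hat."""
    return {
        "correct_report": 1 - guilty.cdf(e_hat),    # P[E > e_hat | Guilty]
        "type_I": 1 - innocent.cdf(e_hat),          # P[E > e_hat | Innocent]
        "correct_no_report": innocent.cdf(e_hat),   # P[E <= e_hat | Innocent]
        "type_II": guilty.cdf(e_hat),               # P[E <= e_hat | Guilty]
    }

for e_hat in (-1.0, 0.0, 1.0):
    p = reporting_probabilities(e_hat)
    print(e_hat, round(p["type_I"], 3), round(p["type_II"], 3))
```

Raising the reporting test lowers the type-I probability and raises the type-II probability, which is the trade-off the rest of the section builds on.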

Note that the probabilities of committing a type-I or a type-II error for a given transaction depend on the overlap between \({i}_{E}\left(e\right)\) and \({g}_{E}\left(e\right)\). As shown by the ROC curves (Marzban 2004) plotted for different pairs of conditional distributions in Fig. 2, the usefulness of the reporting test (as measured by the area under the ROC curve) depends on the extent to which the conditional distributions overlap and, in turn, on whether the evidence is confounding or discriminating. The gold standard reporting test that maximizes the probability of reporting a true positive while simultaneously minimizing the probability of reporting a false positiveFootnote 6 is such that the conditional probability density functions \({i}_{E}\) and \({g}_{E}\) cross each other, that is \({\widehat{e}}_{gold}: {i}_{E}\left({\widehat{e}}_{gold}\right)={g}_{E}\left({\widehat{e}}_{gold}\right)\). A formal proof is provided in Appendix 1.Footnote 7
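The ROC curve and the gold standard can be illustrated with a short numerical sketch. It assumes the illustrative parameters \({\mu }_{G}=1\), \({\mu }_{I}=-1\) and unit variance, and it reads the gold standard as the threshold that maximizes the difference between the true-positive and the false-positive rates; both the parameters and this operational reading are assumptions made here for illustration.

```python
from statistics import NormalDist

# Illustrative parameters (assumed, not from the paper).
guilty = NormalDist(1.0, 1.0)
innocent = NormalDist(-1.0, 1.0)
grid = [k / 100 for k in range(-800, 801)]  # candidate reporting tests e_hat

def roc_points(thresholds):
    """(false-positive rate, true-positive rate) for each threshold."""
    return [(1 - innocent.cdf(e), 1 - guilty.cdf(e)) for e in thresholds]

def area_under_curve(points):
    """Trapezoidal approximation of the area under the ROC curve."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Threshold maximizing TPR - FPR = I_E(e) - G_E(e) (assumed reading of the gold standard).
e_gold = max(grid, key=lambda e: innocent.cdf(e) - guilty.cdf(e))
auc = area_under_curve(roc_points(grid))
print(e_gold, round(auc, 3))
```

Under these parameters the maximizer coincides with the point where the two densities cross, and moving the two means closer together lowers the area under the curve, reflecting more confounding evidence.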

2.3 Reporting Test, Sanctions, and Money Laundering Activity

The possibility of being caught when laundering money represents a deterrent to money laundering activities. Deterrence depends on several factors, particularly the effectiveness of the AML system and imposed sanctions. As we focus on the first level of the AML system, we model the deterrent impact of a given reporting test \(\widehat{e}\) by considering the related probabilities to be reported as a suspicious client, ceteris paribus; that is, we do not make other factors explicit, including the probability of being investigated by the police as a consequence of independent investigations or the probability that the further levels of the AML system decide to discharge intermediaries’ reports.Footnote 8

We define the stochastic variable w > 0 as the extra gain that a money laundering transaction would produce, net of the gain that could otherwise be obtained legally (i.e., by obeying the law). Then, we define S > 0 as the expected sanction applied when an STR is issued.Footnote 9

A rational, risk-neutral individual undertakes a money-laundering transaction only when the associated expected benefit is greater than what is expected from abstaining and possibly being erroneously sanctioned. Thus, a money laundering transaction is committed when

$$ \begin{array}{ll} {w - S\left[ {1 - G_{E} \left( {\hat{e}} \right)} \right] > - S\left[ {1 - I_{E} \left( {\hat{e}} \right)} \right]} \\ {w > S\left[ {I_{E} \left( {\hat{e}} \right) - G_{E} \left( {\hat{e}} \right)} \right] \equiv \hat{w}\left( {S,\hat{e}} \right)} \\ \end{array} $$
(1)

Given (1), the probability that an individual decides to engage in a money laundering transaction is

$$\mathrm{P}\left[w>\widehat{w}\left(S,\widehat{e}\right)\right]$$
(2)

As we do not know the probability distribution of w, we restrict ourselves to defining a probability measureFootnote 10 consistent with the probability defined in (2). In particular, starting from the deterrence threshold \(\widehat{w}\left(S,\widehat{e}\right)\) expressed in (1), we define a probability measure such that the condition determining the probability expressed in (2) is verified for a given value of w. Specifically, we define the money-laundering rateFootnote 11 (MLR) as

$$MLR\left(S,\widehat{e}\right)=1-\widehat{w}\left(S,\widehat{e}\right)=1-S\left[{I}_{E}\left(\widehat{e}\right)-{G}_{E}\left(\widehat{e}\right)\right]$$
(3)

As expected, the money laundering rate negatively depends on sanctions \(S\) (see Fig. 3 and proof A.1.2. in Appendix 1). As illustrated in Fig. 3, for any level of sanctions S, the money laundering rate \(MLR\left(\widehat{e}\right)\) is minimal when the reporting test is set equal to the gold standard \({\widehat{e}}_{gold}\) (see proof A.1.3 in Appendix 1).Footnote 12

Result 1

(Deterrence) The reporting test that minimizes the money-laundering rate is the gold standard \({\widehat{e}}_{gold}: {i}_{E}\left({\widehat{e}}_{gold}\right)={g}_{E}\left({\widehat{e}}_{gold}\right)\).Footnote 13 Sanctions negatively affect the money laundering rate.
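Result 1 can be illustrated numerically. The sketch below assumes the illustrative parameters \({\mu }_{G}=1\), \({\mu }_{I}=-1\) and unit variance, together with sanction levels chosen so that the money-laundering rate stays between 0 and 1; the function names and all numerical values are hypothetical.

```python
from statistics import NormalDist

# Illustrative parameters (assumed, not from the paper).
guilty = NormalDist(1.0, 1.0)
innocent = NormalDist(-1.0, 1.0)

def deterrence_threshold(S, e_hat):
    """w_hat(S, e_hat) from Eq. (1): the extra gain above which laundering pays."""
    return S * (innocent.cdf(e_hat) - guilty.cdf(e_hat))

def mlr(S, e_hat):
    """Money-laundering rate from Eq. (3)."""
    return 1 - deterrence_threshold(S, e_hat)

grid = [k / 100 for k in range(-300, 301)]
for S in (0.5, 1.0):  # assumed sanction levels keeping MLR within [0, 1]
    e_best = min(grid, key=lambda e: mlr(S, e))
    print(S, e_best, round(mlr(S, e_best), 3))
```

For both sanction levels the minimizing test is the crossing point of the two densities (here 0), and the higher sanction yields the lower money-laundering rate.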

2.4 Reporting Activity, Errors, and Accuracy

As shown in Sect. 2.3, by setting the reporting test, signaling bodies can affect the money laundering rate. If they opt for the gold standard, they maximize deterrence. However, the reporting test implemented by intermediaries depends on their aims. Several elements contribute to the formation of the reporting test used by the signaling bodies: primarily, the bank’s incentive to monitor and report and the actual possibility to identify the true nature of a transaction (Takáts 2011). Monitoring and reporting are costly tasks for signaling bodies. Similarly, mistakes in these dual tasks are costly because type-II errors are typically sanctioned by the upper levels of the AML system, and type-I errors can imply relevant reputational costs. Although it is not obvious that type-I and type-II errors should be equally weighted from the intermediaries’ perspective and even from a social welfare maximization perspective, given the overall approach of the AML system described in Sect. 1, we assume that the pivotal objective for signaling bodies is to maximize the accuracy of their reporting activity.

Given the sanctions \(S\) and the reporting test \(\widehat{e}\), money laundering occurs with probability \(MLR\left(S,\widehat{e}\right)\). This implies that financial intermediaries and professionals that observe all the transactions that take place report a share of transactions to the FIU as suspicious, corresponding to

$$ STR\left( {S,\hat{e}} \right) = \underbrace {{MLR\left( {S,\hat{e}} \right) \times \left( {1 - G_{E} \left( {\hat{e}} \right)} \right)}}_{{Share\;of\;correct\;reports}} + \underbrace {{\left( {1 - MLR\left( {S,\hat{e}} \right)} \right) \times \left( {1 - I_{E} \left( {\hat{e}} \right)} \right)}}_{{Share\;of\;type\text{-}I\;errors}} $$
(4)

where the first addendum corresponds to the share of truly positive reports, and the second addendum corresponds to the share of falsely positive reports (type-I errors).
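The decomposition in Eq. (4) can be sketched as follows, assuming the illustrative parameters \({\mu }_{G}=1\), \({\mu }_{I}=-1\), unit variance and sanction levels that keep the money-laundering rate between 0 and 1. The function name `str_share` and all numerical values are assumptions made for illustration.

```python
from statistics import NormalDist

# Illustrative parameters (assumed, not from the paper).
guilty = NormalDist(1.0, 1.0)
innocent = NormalDist(-1.0, 1.0)

def mlr(S, e_hat):
    """Money-laundering rate from Eq. (3)."""
    return 1 - S * (innocent.cdf(e_hat) - guilty.cdf(e_hat))

def str_share(S, e_hat):
    """Eq. (4): returns (share of correct reports, share of type-I errors, total)."""
    rate = mlr(S, e_hat)
    correct = rate * (1 - guilty.cdf(e_hat))            # launderers who are reported
    type_one = (1 - rate) * (1 - innocent.cdf(e_hat))   # innocents who are reported
    return correct, type_one, correct + type_one

for S in (0.5, 1.0):
    correct, type_one, total = str_share(S, 0.0)
    print(S, round(correct, 3), round(type_one, 3), round(total, 3))
```

At a fixed reporting test, the higher sanction lowers the total share of reported transactions while raising the share of type-I errors, because fewer of the observed transactions involve money laundering.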

A further relevant measure for the AML system is the error rate \(ER\left(\widehat{e}\right)\), which is defined as the share of total transactions that are either erroneously reported as suspicious (type-I errors) or erroneously not reported though implying money laundering (type-II errors) by the signaling bodies:

$$ \begin{aligned} ER\left( {\hat{e}} \right) & = \underbrace {{\left( {1 - MLR\left( {\hat{e}} \right)} \right) \times \left( {1 - I_{E} \left( {\hat{e}} \right)} \right)}}_{{Share\;of\;type\text{-}I\;errors}} + \underbrace {{MLR\left( {\hat{e}} \right) \times G_{E} \left( {\hat{e}} \right)}}_{{Share\;of\;type\text{-}II\;errors}} \\ &= S\left[ {I_{E} \left( {\hat{e}} \right) - G_{E} \left( {\hat{e}} \right)} \right]\left( {1 - I_{E} \left( {\hat{e}} \right)} \right) + (1 - S\left[ {I_{E} \left( {\hat{e}} \right) - G_{E} \left( {\hat{e}} \right)} \right])G_{E} \left( {\hat{e}} \right) \\ \end{aligned}$$
(5)

We study \(STR\left(S,\widehat{e}\right)\) and \(ER\left(S,\widehat{e}\right)\) as functions of the reporting test \(\widehat{e}\) (see Fig. 4) and sanctions S. In this regard, note that banks and intermediaries (the first level of the AML system) can adjust the reporting test that they employ in their AML activities while considering sanctions S as exogenous.

Regarding the reporting activity \(STR\), it is decreasing in sanctions S since higher sanctions discourage money laundering (see Appendix A.1.4). Instead, \(STR\) is not a monotonic function of the reporting test; it is initially decreasing and then locally increasing when the reporting test is sufficiently higher than the gold standard to induce intense money-laundering activity that, in turn, pushes reporting. Finally, when the reporting test increases even further, reports decrease, although money laundering is intense because the strictness of the test prevails.

It is worth noting that our analysis can be limited to the neighborhood of the gold standard \({\widehat{e}}_{gold}\). On the one hand, reporting tests in the neighborhood of the gold standard are sufficiently accurate for each given transaction (see the ROC curve in Fig. 2) and guarantee a good level of deterrence (see Fig. 3). On the other hand, as shown in Fig. 4, “extreme” reporting tests (either very low or very high with respect to the gold standard) result in limit cases: a very low reporting test (\(\widehat{e}\ll {\widehat{e}}_{gold}\)) implies that all the transactions are correctly reported as suspicious (no errors) because any incentive to abstain from money laundering completely disappears (money laundering activity explodes, as shown in Fig. 3). Analogously, a very high reporting test \(\widehat{e}\gg {\widehat{e}}_{gold}\) implies that, although all the transactions are illicit, none is reported (100% of errors).Footnote 14 The proofs are provided in Appendix A.1.4.

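Concerning the errors that signaling bodies make, differentiating (5) with respect to sanctions gives \(\partial ER/\partial S=\left[{I}_{E}\left(\widehat{e}\right)-{G}_{E}\left(\widehat{e}\right)\right]\left[1-{I}_{E}\left(\widehat{e}\right)-{G}_{E}\left(\widehat{e}\right)\right]\), and under symmetry \({I}_{E}\left(\widehat{e}\right)+{G}_{E}\left(\widehat{e}\right)\) is smaller than 1 below the gold standard and greater than 1 above it. Hence, \(ER\left(\widehat{e}\right)\) is increasing in sanctions S when the applied reporting test is smaller than the gold standard and decreasing in sanctions S when the applied reporting test is greater than the gold standard (see Appendix A.1.5).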

\(ER\left(\widehat{e}\right)\) is the sum of the share of type-I errors and the share of type-II errors (Fig. 5). By inspecting (5), we observe that when the reporting test is low, the share of type-I errors prevails (the second addendum is initially null and then very small because \({G}_{E}\left(\widehat{e}\ll {\widehat{e}}_{gold}\right)\to 0\)); conversely, when the reporting test is high the share of type-II errors prevails (the first addendum is very small and finally null because \({1-I}_{E}\left(\widehat{e}\gg {\widehat{e}}_{gold}\right)\to 0\)). This is consistent with the general idea that stricter tests imply many false negative reports, whereas excessively lax tests result in many false positive reports.

As illustrated in Fig. 5 (and proven in Appendix A.1.5), \(ER\left(\widehat{e}\right)\) shows a global maximum and a global minimum. Neither of these points deserves particular comment, as they correspond to reporting tests either so low or so high with respect to the gold standard that they result in the limit cases described above: \(\widehat{e}\ll {\widehat{e}}_{gold}\) implies that all the transactions are correctly reported (0% of errors) because any incentive to abstain from money laundering completely disappears; \(\widehat{e}\gg {\widehat{e}}_{gold}\) implies that, though all the transactions are illicit, none is reported (100% of errors).

Focusing on the neighborhood of the gold standard \({\widehat{e}}_{gold}\) and defining \({\widehat{e}}_{minER}\) as the reporting test that locally minimizes the error rate, we prove that \(ER\left(S,\widehat{e}\right)\) has its local minimum in \({\widehat{e}}_{gold}\) if and only if \(S=\frac{0.5}{\left({I}_{E}\left({\widehat{e}}_{gold}\right)-{G}_{E}\left({\widehat{e}}_{gold}\right)\right)}\) (see Appendix A.1.5). In addition, we verify that when \(S<\frac{0.5}{\left({I}_{E}\left({\widehat{e}}_{gold}\right)-{G}_{E}\left({\widehat{e}}_{gold}\right)\right)}\) the local minimum of the error rate is on the left of \({\widehat{e}}_{gold}\); the smaller the sanctions, the lower the reporting test at which the local minimum is attained. Conversely, when \(S>\frac{0.5}{\left({I}_{E}\left({\widehat{e}}_{gold}\right)-{G}_{E}\left({\widehat{e}}_{gold}\right)\right)}\) the local minimum of the error rate is on the right of \({\widehat{e}}_{gold}\); the bigger the sanctions, the higher the reporting test at which the local minimum is attained.

Finally, recalling Eq. (3) and result 1, we define \({MLR}_{min}\equiv 1-S\left[{I}_{E}\left({\widehat{e}}_{gold}\right)-{G}_{E}\left({\widehat{e}}_{gold}\right)\right]\). Now, we can rewrite the relation between sanctions and \({MLR}_{min}\) as \(S=\frac{1-{MLR}_{min}}{{I}_{E}\left({\widehat{e}}_{gold}\right)-{G}_{E}\left({\widehat{e}}_{gold}\right)}\). This allows us to reformulate the previous implications as follows:

$$\begin{array}{cc}if {MLR}_{min}>0.5,& {\widehat{e}}_{minER}<{\widehat{e}}_{gold}\\ if {MLR}_{min}=0.5,& {\widehat{e}}_{minER}={\widehat{e}}_{gold}\\ if {MLR}_{min}<0.5,& {\widehat{e}}_{minER}>{\widehat{e}}_{gold}\end{array}$$
(6)
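The relation in (6) can be illustrated with a grid search in a neighborhood of the gold standard. The sketch assumes the illustrative parameters \({\mu }_{G}=1\), \({\mu }_{I}=-1\) and unit variance, for which \({\widehat{e}}_{gold}=0\), and sanction levels close to the threshold; under these assumed parameters an interior local minimum is found only when sanctions are not too far from the threshold, so the search window (`half_width`) and the values 0.70 and 0.77 are choices made here for illustration.

```python
from statistics import NormalDist

# Illustrative parameters (assumed, not from the paper); the gold standard is e = 0.
guilty = NormalDist(1.0, 1.0)
innocent = NormalDist(-1.0, 1.0)

def mlr(S, e_hat):
    """Money-laundering rate from Eq. (3)."""
    return 1 - S * (innocent.cdf(e_hat) - guilty.cdf(e_hat))

def error_rate(S, e_hat):
    """Eq. (5): share of type-I errors plus share of type-II errors."""
    rate = mlr(S, e_hat)
    return (1 - rate) * (1 - innocent.cdf(e_hat)) + rate * guilty.cdf(e_hat)

def local_min_test(S, half_width=0.5, step=0.01):
    """Grid search for the error-minimizing test within a window around e_gold = 0."""
    n = round(half_width / step)
    grid = [k * step for k in range(-n, n + 1)]
    return min(grid, key=lambda e: error_rate(S, e))

# Sanction threshold at which the local minimum coincides with the gold standard.
S_star = 0.5 / (innocent.cdf(0.0) - guilty.cdf(0.0))
for S in (0.70, S_star, 0.77):
    print(round(S, 3), round(mlr(S, 0.0), 3), round(local_min_test(S), 2))
```

The first column is the sanction, the second is \({MLR}_{min}\), and the third is the locally error-minimizing test: it lies to the left of the gold standard when \({MLR}_{min}>0.5\), at the gold standard when \({MLR}_{min}=0.5\), and to the right when \({MLR}_{min}<0.5\).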

Moreover, by following the same reasoning made for sanctions, we remark that the further \({MLR}_{min}\) lies above 0.5, the lower the reporting test at which the error rate attains its local minimum; the further \({MLR}_{min}\) lies below 0.5, the higher the reporting test at which the error rate attains its local minimum. Figure 6 provides examples under the usual parameters.

Finally, the analysis above allows us to derive the following:

Result 2

(Accuracy of reporting activity) The reporting test \({\widehat{e}}_{minER}\) that (locally) minimizes the error rate decreases as the minimum money-laundering rate MLRmin increases (sanctions S decrease). \({\widehat{e}}_{minER}={\widehat{e}}_{gold}\) only for a specific level of \({MLR}_{min}\) (sanctions S).

As expressed, result 2 is quite general, while the relevant threshold \({MLR}_{min}=\) 50% (corresponding to the sanction threshold \(S=\frac{0.5}{\left({I}_{E}\left({\widehat{e}}_{gold}\right)-{G}_{E}\left({\widehat{e}}_{gold}\right)\right)}\)) strictly depends on the assumption that the conditional distributions \({g}_{E}\) and \({i}_{E}\) are symmetric. Figure 7 graphically summarizes the present implications by assuming the usual parameters.

As illustrated by the model, intermediaries and professionals typically face a trade-off when setting their reporting tests. While deterrence goals can always be pursued by setting the reporting test to the gold standard \(\widehat{e}:\) \({i}_{E}\left(\widehat{e}\right)={g}_{E}\left(\widehat{e}\right)\), the accuracy of reporting activity depends on the minimum level of money-laundering activity (which in turn depends on sanctions). Those intermediaries and professionals concerned with the accuracy of their reporting activity should opt for a reporting test lower than the gold standard when sanctions are low and money-laundering activity is intense. Conversely, they should employ a reporting test higher than the gold standard when sanctions are high and money-laundering activities are not overly pervasive.

3 Empirical Analysis

The empirical analysis aims to assess the evolution of the quality of the STRs received from financial intermediaries and processed by the Italian FIU to understand whether the introduction of the risk-based approach has changed the reporting test adopted by intermediaries at the first level of the AML system.

Two considerations are worthwhile before conducting the empirical exercise. First, the analysis focuses on the role of financial intermediaries in combating the phenomenon of money laundering. Banks and other financial companies play a predominant role in the transmission of suspicious transactions. For example, in the period on which our analysis focuses (2009–2012) financial intermediaries transmitted 48,822 STRs to the Italian FIU, against 805 STRs sent by professionals and non-financial operators.Footnote 15 No STR was received from other potential reporting agents. Furthermore, given the size and quantity of transactions processed, financial intermediaries have developed sophisticated technologies and software to improve the quality of reporting activity. These instruments are infrequently used by other reporting agents, in light of the economies of scale required to use them.

Second, by quality of STRs, we refer to the ability of financial intermediaries to (i) adjust the share of errors they commit, and (ii) affect ML deterrence in the design of the suspicious reporting mechanism.

Through the predictions provided by the model illustrated in the previous section, the empirical analysis is primarily aimed at understanding how possible changes in the reporting activity due to the introduction of the risk-based approach and KYC principles may have affected the number of STRs (for a given volume of banking activities), the money laundering rate (identified as the dimension of laundering activity in relation to predicate crimes), and consequently type-I error (false positives reported by the banks as suspicious transactions to the FIU). It is therefore important to note that this study does not aim to estimate the volume of ML, but rather focuses on the evolution of the behavior of intermediaries in the process of reporting STRs to the FIU.

To empirically evaluate this aspect, we perform regression analysis on Italian data at the provincial level, referring to the period immediately following the introduction of the risk-based approach (which took place in 2009 in Italy). This will also allow us to account for geographical specificities that might characterize the incidence of both laundering and reporting activities (dalla Pellegrina et al. 2020a, b). We use regression analysis, based on the use of proxies of the reporting test, STR, and money laundering rate (MLR) along with the model’s predictions to orient the statistical inference regarding the Italian situation in the period after 2009. Finally, by combining the empirical evidence obtained through regression analysis with the time evolution of aggregated data (at the national level), we will provide inference in terms of type-I error, and from there, we will use the model to infer the pattern of type-II error in the period of interest.Footnote 16

3.1 Empirical Hypotheses

In the first part of the analysis, we check the relationship between (i) a proxy for the reporting test and the actual evolution of STR in terms of the volume of bank activity, and (ii) a proxy for the reporting test and the MLR.

Referring to the model in Sect. 2, we make the following assumptions:

H1. If both (i) and (ii) indicate a negative correlation between the variables of interest, we infer that the reporting test is lower than the gold standard or in its positive neighborhood (see interval H1 in Fig. 8).

H2. If both (i) and (ii) indicate a positive correlation between the variables of interest, we infer that the reporting test is higher than the gold standard (interval H2 in Fig. 8).

H3. If (i) indicates a negative correlation between STR and the reporting test, while (ii) indicates a positive correlation between MLR and the reporting test, we can infer that the latter is much higher than the gold standard (or in a narrow interval at the immediate right-hand side of the gold standard) (intervals H3 in Fig. 8).

However, the assumptions outlined above do not represent sufficient conditions to understand the evolution of the variables of interest, that is, STR and MLR in the period following the introduction of the risk-based approach. In fact, referring again to Fig. 8, we are not aware of the increasing or decreasing pattern of the reporting test after 2009, as we do not know if the intermediaries have decided to raise or lower the reporting test.

To better understand these aspects, the official statistics reporting trends over time of (our proxies of) the reporting test, STR, and MLR are of fundamental support. In particular, if both STR and MLR increase over time while the proxy of the reporting test decreases, we can infer that H1 is verified. The same will be true if both STR and MLR decrease over time while the proxy of the reporting test increases.

On the contrary, if both STR and MLR increase over time while the proxy of the reporting test also increases, we can infer that H2 is verified. The same will be true if both STR and MLR decrease over time while the proxy of the reporting test also decreases. If none of these cases occur, hypothesis H3 holds.
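The two decision rules described in this subsection, one based on the signs of the estimated correlations and one based on the signs of the observed trends, can be summarized in a short sketch. The function names and the "undetermined" label for sign combinations that the text does not assign to any hypothesis are assumptions introduced here for illustration.

```python
def classify_by_correlation(corr_str, corr_mlr):
    """Map the signs of (i) corr(STR, test proxy) and (ii) corr(MLR, test proxy) to H1-H3."""
    if corr_str < 0 and corr_mlr < 0:
        return "H1"
    if corr_str > 0 and corr_mlr > 0:
        return "H2"
    if corr_str < 0 and corr_mlr > 0:
        return "H3"
    return "undetermined"  # combination not covered by H1-H3 in the text

def classify_by_trend(d_str, d_mlr, d_test):
    """Map the signs of the changes over time in STR, MLR and the test proxy to H1-H3."""
    move_together = d_str * d_mlr > 0
    if move_together and d_str * d_test < 0:
        return "H1"  # STR and MLR move opposite to the reporting-test proxy
    if move_together and d_str * d_test > 0:
        return "H2"  # STR and MLR move in the same direction as the proxy
    return "H3"      # none of the cases above

print(classify_by_correlation(-0.4, -0.2), classify_by_trend(+1, +1, -1))
```

Only the signs of the inputs matter, so either estimated coefficients or simple differences between the first and the last period can be passed in.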

According to the arguments presented in Sect. 1, we expect that H1 may plausibly hold; in particular, that financial intermediaries have relaxed the reporting test as a consequence of the imbalance between the pros and cons of the reporting activity by intermediaries (“crying wolf” attitude). This would mean that banks now ask for less evidence to report suspicious transactions to the FIU, and consequently, the number of bad reports may have reasonably increased. Using official data from the FIU, we check whether STRs to the FIU have actually increased over time in relation to the banking activity, and especially if this increase in reports has translated into an increased rate of dismissal of reports received (at an aggregate level), a figure officially recognized as a measure of type-I error.Footnote 17 From here, we proceed with the appropriate inference of the type-II error using only the predictions of the model.

3.2 Data Sources and Proxies for Reporting Test and STR

The data collected from the Italian provinces cover the period from 2009 (corresponding to the introduction of the risk-based approach in Italy) to 2012.Footnote 18

A challenging methodological aspect stems from the fact that we cannot directly observe the true incidence of money laundering because of its concealed nature. We use the number of police reports for ML as a proxy of the amount of assets laundered by criminals. As widely debated in the literature, this measure has to be taken cautiously as criminal activities are part of an underground economy, thus the number of reports submitted to the authorities provides only partial insight into the overall phenomenon. Nevertheless, the total number of money laundering police reports is generally recognized to be a good indicator of the actual (unobserved) flow of laundered money. The underlying hypothesis is that all other factors being equal (i.e. controlled for in empirical analyses, with particular emphasis on predicate crimes), there should be positive correlation between police reports and the actual incidence of ML (see dalla Pellegrina et al. 2020a, b, and literature cited therein).Footnote 19

The number of suspicious transactions was provided by the FIU at the provincial level for the period of interest. We were endowed with STRs from financial intermediaries, representing the most consistent volume of reports transmitted to the FIU.Footnote 20 Data are bi-annual, and we summed the number of reports accrued to the FIU in each semester. We used the lagged value of the number of reports made in the last 12 months in relation to each predicate crime to build an index representing a proxy for the reporting test by means of factor analysis.

Because it enters the criminals’ utility function, the probability of getting caught when laundering money is accounted for by including a proxy of the detected ML crimes, divided by each province’s population (ML crimes with known offender /total ML crimes). Furthermore, as is known in the literature, lengthy procedures and delays in trials dilute crime deterrence. We used the duration of criminal trials (in days) to capture the deterrence effects against ML and predicate crimes. Data were obtained from the Ministry of Justice on an annual basis, disaggregated by district and court. They were then aggregated to obtain an average trial duration by district and remapped to establish a correspondence between districts and provinces.

We also account for the fact that, for reasons of non-traceability, one of the main mechanisms through which ML is executed involves cash transactions (Axelrod 2017; Dalla Pellegrina and Masciandaro 2009; Lowe 2017). We include cash inflows to the banking system (finally accruing to the Central Bank) divided by each province’s population. In the same spirit, as they could favor cash transactions compared to other forms of payment (for example, online), we also add the number of bank branches present in each province. In line with the above, both the number of real estate transactions and the number of real estate loans were included in the regression analysis (all data are from the Bank of Italy on an annual basis). Real GDP and real per capita GDP are also included to account for the (economic) dimension of each province (ISTAT). The summary statistics for all variables involved in the analysis are presented in Appendix 2.

To construct a measure of reporting tests, we collected information on police reports of both money laundering and predicate crimes for each Italian province on an annual basis. Previous studies (e.g., Abadinsky 2010) estimate that organized crime’s highest-return activities are drug trafficking, exploitation and abetting of prostitution, racketeering, fraud, and counterfeiting of brands and industrial products. In line with the literature (e.g., Arnone and Borlini 2010; Draghi 2007; Jayasekara 2020; Mugarura 2011), we added armed robbery and micro-crime indexes. In addition, to capture the activities conducted by criminal organizations, we also included criminal associations, mafia-type associations, usury, and corruption. Finally, the tax gap is used as a measure of tax evasion. All data were obtained from the Italian Institute of Statistics (ISTAT). In particular, police reports on various crimes were drawn from criminal judicial statistics, while the tax gap was extracted from the well-being and sustainability indicators (BES).

We define the following empirical proxy for the reporting test \(\widehat{e}\):

$$\widehat{e} = -\,\mathrm{factor}\left(\frac{STR}{\text{source crime}_{k}}\right)$$
(7)

The numerator accounts for excess reporting activity to the FIU in a given province-year. Reporting activity was measured as the number of STRs submitted to the FIU by financial intermediaries. Predicate crimes are used in the denominator to normalize the excess reporting activity across provinces with different crime rates, and k refers to each predicate crime illustrated above.

As there are several different types of predicate crimes, we performed a confirmatory factor analysis (Jöreskog 1969) to obtain a unique and comprehensive measure of the standard of error. This technique is useful to the extent that the frequencies of similar types of crime are correlated across provinces. Hence, reducing the number of predicate crimes to one or more latent factors simplifies the interpretation of the subsequent empirical analysis.Footnote 21 Specifically, we constructed several individual reporting test measures, one for each type of predicate crime. These measures have identical numerators, but different denominators. The factor analysis of these individual measures allows us to obtain a reduced number of factors as proxies for the reporting test.

Note that the retrieved factor in Eq. (7) is an inverse measure of what is referred to in the model in Sect. 2 as the reporting test (\(\widehat{e}\)). To obtain a measure of the reporting test, which is in line with the model’s meaning, we take the negative value of the factor as a proxy of \(\widehat{e}\). Specifically, its increase identifies the fact that banks need more evidence to report a transaction as suspicious to the FIU.
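To make the construction of the proxy concrete, the sketch below builds one STR-to-crime ratio per predicate crime, extracts a single common component, and takes its negative. It is a simplified illustration under stated assumptions: it uses a principal-component extraction on the correlation matrix rather than the confirmatory factor analysis employed in the paper, and the function name `reporting_test_proxy` and its inputs are hypothetical.

```python
import numpy as np


def reporting_test_proxy(str_counts, crime_counts):
    """Negative of the first common component of the STR / crime_k ratios.

    str_counts:   shape (n,)    STRs per province-year
    crime_counts: shape (n, K)  police reports for each of K predicate crimes
    NOTE: principal-component stand-in for the paper's factor analysis.
    """
    str_counts = np.asarray(str_counts, dtype=float)
    crime_counts = np.asarray(crime_counts, dtype=float)

    # Same numerator, different denominators: one ratio per predicate crime.
    ratios = str_counts[:, None] / crime_counts

    # Standardize each ratio so that no single crime dominates the component.
    z = (ratios - ratios.mean(axis=0)) / ratios.std(axis=0)

    # eigh returns eigenvalues in ascending order; the last column is the
    # eigenvector associated with the largest eigenvalue.
    corr = np.corrcoef(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)
    loadings = eigvecs[:, -1]

    # Eigenvector signs are arbitrary; orient loadings to be positive overall.
    if loadings.sum() < 0:
        loadings = -loadings

    factor = z @ loadings
    # The factor measures excess reporting, so its negative proxies the test.
    return -factor
```

Under this construction, a province-year with many STRs relative to its predicate crimes receives a low value of the proxy, that is, a lenient reporting test.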

The results of the factor analysis are shown in Table 1 (additional details are provided in Appendix 3). The corresponding scree plot presented in Fig. 9 suggests the retention of a single factor that represents a unique empirical measure of reporting tests at the provincial level. We aim to analyze its evolution over time and its incidence on other measures in the model’s setup.

Table 1 Factor analysis on predicate crimes, Italian provinces, 2009–2012

3.3 Econometric Evidence

According to Sect. 3.1 and with specific reference to the hypotheses outlined therein, the first step in assessing the model’s ability to evaluate the effects of changes in the reporting test on STR, MLR, type-I, and type-II errors is to understand where in Fig. 4 the data suggest setting the Italian situation in the years of interest. More precisely, we aim to formally check the sign of the correlation between the proxy for the reporting test (\(\widehat{e}\)) and STR, on the one hand, and between \(\widehat{e}\) and the MLR, on the other hand, at the provincial level.

To this end, we estimate the following regressions:

$$ STR_{i,t} = \alpha_0 + \alpha_2 \widehat{e}_{i,t} + \alpha_3\, \text{Real GDP}_{i,t} + \alpha_4\, \text{Real per capita GDP}_{i,t} + \alpha_5\, \text{Length of criminal trials}_{i,t} + \alpha_6\, \text{Detected\_ML}_{i,t} + \lambda_i + \mu_t + \varepsilon_{i,t},$$
(8)
$$ MLR_{i,t} = \beta_0 + \beta_2 \widehat{e}_{i,t} + \beta_3\, \text{Real GDP}_{i,t} + \beta_4\, \text{Real per capita GDP}_{i,t} + \beta_5\, \text{Length of criminal trials}_{i,t} + \beta_6\, \text{Detected\_ML}_{i,t} + \lambda_i + \mu_t + \varepsilon_{i,t}, $$
(9)

where the subscript i refers to the province and t refers to the year.

Equations (8) and (9) are the baseline specifications. We provide additional regressions by adding cash transactions, bank branches, and real estate-related variables. Other specifications involve a set of predicate crimes divided by each province’s population.

Given our empirical purposes, we measure STR as the number of suspicious transactions reported by banks to the FIU scaled by the volume of banks’ activities, namely the amount of bank transactions (loans plus deposits). MLR is the money laundering rate observed in the province in the 12 months preceding time t, measured by the number of police reports for ML in relation to the economic dimension of each province (GDP)Footnote 22; \(\lambda_i\) and \(\mu_t\) are province and time fixed effects, respectively, and \(\varepsilon_{i,t}\) is an idiosyncratic error term clustered at the provincial level. The estimated signs of \(\alpha_2\) and \(\beta_2\) are our focus, as they capture the correlation of STR and MLR, respectively, with our proxy for the reporting test, which we use to address our hypotheses.Footnote 23
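As an illustration of how a specification with province and time fixed effects can be estimated, the sketch below applies the two-way within transformation to a balanced panel and then runs ordinary least squares. It is a minimal example under our own assumptions: the function names are hypothetical, rows are assumed to be ordered province by province with the same years for each, and clustered standard errors are not computed.

```python
import numpy as np


def two_way_within(x, n_units, n_periods):
    """Remove unit and period means from a balanced panel.

    Rows must be ordered unit by unit (all periods of unit 1, then unit 2, ...).
    """
    x = np.asarray(x, dtype=float).reshape(n_units, n_periods, -1)
    unit_mean = x.mean(axis=1, keepdims=True)    # absorbs province effects
    time_mean = x.mean(axis=0, keepdims=True)    # absorbs year effects
    grand_mean = x.mean(axis=(0, 1), keepdims=True)
    demeaned = x - unit_mean - time_mean + grand_mean
    return demeaned.reshape(n_units * n_periods, -1)


def fe_ols(y, X, n_units, n_periods):
    """Two-way fixed-effects slope estimates (point estimates only)."""
    y_w = two_way_within(y, n_units, n_periods)[:, 0]
    X_w = two_way_within(X, n_units, n_periods)
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return coef
```

With the reporting-test proxy placed in the first column of `X`, the first element of the returned vector plays the role of \(\alpha_2\) (or \(\beta_2\) when the dependent variable is MLR), and its sign is the quantity of interest.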

The regression output, obtained through linear estimation, is reported in Table 2 (upper panel). The negative sign of the estimated parameter associated with \(\widehat{e}\) indicates that either, as \(\widehat{e}_{i,t}\) increases, both STR and MLR decrease, or, as \(\widehat{e}_{i,t}\) decreases, both STR and MLR increase. This is likely supportive of H1, suggesting that in Italy \(\widehat{e}\) is likely to be positioned on the left-hand side of, or in the neighborhood of, the gold standard (i.e., not far from the golden rule in Fig. 8).

Table 2 Regression analysis, Italian provinces, 2009–2012

To elucidate each possible reverse situation (i.e., whether the reporting test has actually increased or decreased in the observed period), we consider the pattern of the aggregate variables at the national level. In other words, we use the model’s predictions combined with the statistics in Table 3 to make inferences on the direction taken by the reporting test in Fig. 8.

Table 3 Patterns of \(\widehat{\mathrm{e}}\), MLR and STR, Italian provinces, 2009–2012

First, Table 3 (Column (1)) shows that our proxy for \(\widehat{e}\) followed a substantially decreasing pattern from 2009 to 2012, indicating that the reporting test decreased in the aftermath of the introduction of the risk-based approach.

Second, our data point toward a growing number of STRs in relation to the volume of bank transactions (STR) in the observed period (Table 3, Column (2)). Since STRs are not monotonic for all positive values of the reporting test, an increase in reporting activity combined with a decrease in \(\widehat{e}\) confirms that H1 may reasonably hold.

Third, Table 3 (Column (3)) provides evidence of the stability of the MLR. According to the model, the MLR was around its minimum in the period of interest, that is, \(\widehat{e}\) was likely to be close to the gold standard. More precisely, from the pattern of MLR, which decreased from 2009 to 2010, stabilized in 2011, and then increased in 2012, we can conclude that deterrence was most likely maximized in 2010–2011, when it crossed the gold standard (from right to left, see Fig. 8).

To provide inference on type-I and type-II errors, we use the sufficient statistics approach pioneered by Chetty (2009) and recently applied to the study of money laundering by Imanpour et al. (2019). One useful feature of the model is that the variables of interest depend on only a few constructs that correspond to the real world and are easily observable in the available (mostly aggregated) data.

To test the predictions of the model, we refer to Fig. 5 in Sect. 2, suggesting that type-I error should have increased in the aftermath of the introduction of the risk-based approach, which we motivated as a consequence of the reduction of the reporting test (i.e., a shift leftwards in Figs. 5 and 8, in the neighborhood of the gold standard).

Some measures (aggregated at the national level) of reports received, processed, and dismissed by the FIU are available from the FIU Anti-money laundering notebooks (Statistical data and annual reports). Specifically, we were provided with three types of reports: (i) total reports for suspicious transactions (money laundering, terrorism, arms trafficking) received from financial intermediaries; (ii) reports for ML received from all reporting agents (professionals and non-financial operators, others); and (iii) dismissed reports for money laundering received from all reporting subjects, including professionals and non-financial operators. We do not have a precise measure of the reports for ML received from financial intermediaries, but only the total number of STRs, including those for terrorism and arms trafficking, although the latter represent only 1% of the STRs from intermediaries. Hence, we believe this can be interpreted as a reliable measure of the flow of reports for ML received from financial intermediaries.

Starting from these data, we constructed several proxies of type-I error, all being representative measures of the dismissal rate of the STRs received by the FIU, with a focus on financial intermediaries. As reported in Table 4, the ratio of reports received but not analyzed by the FIU to the overall reports received from intermediaries (column (1)) increased from 2009 to 2011 and then returned to 2008 levels. Similarly, the ratio of overall dismissed reports for ML to analyzed reports for ML (column (2)) rose in 2009 and 2010 and then dropped considerably. The ratio of dismissed reports for ML to reports received from intermediaries (column (3)) was also higher in 2009 and 2010 (compared to 2008) and then fell. However, some considerations are needed to explain this pattern.

Table 4 Pattern of STRs and proxies of type-I error incidence, Italian provinces, 2008–2012

In general, all proxies of type-I error support the “crying wolf” hypothesis, at least in the period immediately after the introduction of the risk-based approach, showing a substantial increase in 2009–2010 (or 2011) compared to the pre-reform period in 2008. However, after reaching their maximum after two or three years, they suddenly dropped, stabilized, or even improved upon the pre-reform levels. Given the time required to process STRs, an initial backlog and an increase in the STRs’ dismissal rate seem to have occurred, which we attribute to a decrease in the reporting test adopted by financial intermediaries. This seems reasonable to the extent that the new regulation caused relevant changes in the attitudes of banks, which plausibly interpreted the cost of signaling as exempt from sanctions, or subject to relatively low reputational costs, along with the fact that the transposition of the new legislation led to difficulties in learning the new reporting procedures. This caused an anomalous wave of STRs, which is also attested by the FIU Annual Report of 2010 (referring to year 2009), reporting that: “The length of reporting times often makes it difficult to promptly intercept suspicious flows, nullifying the preventive effectiveness of the system.” A similar judgment is given in the Report of the following year: “Despite the continuous increases in productivity achieved by the Unit in the financial analysis of STRs, the large and persistent growth in the number of incoming reports has determined an increase in the stock of STR waiting for processing” (FIU Annual Report of 2011, referring to year 2010).

Nevertheless, the subsequent improvement and consequent (sudden) break in our proxies of type-I error do not represent a failure of our model. The improvement was instead driven by an important technological change that occurred at the FIU. Indeed, the implementation of a new IT system for the acquisition and processing of reports allowed the intelligence unit to cope with the exceptional increase and to accelerate and rationalize the screening process, as documented in the FIU Annual Report of 2012 (referring to 2011).

Finally, given that there are no reliable proxies for type-II errors, we exclusively rely on and draw inferences from the model (Fig. 5) to determine whether type-II errors increased during the period of interest. We cautiously infer that as the type-I error continuously increased until 2017, the opposite occurred with type-II errors. In conclusion, we feel reasonably confident in asserting that while the risk-based approach favored higher standards of deterrence in money laundering activities, pushing the reporting test toward the gold standard does not necessarily increase accuracy (see Fig. 5).

4 Concluding Remarks

The risk-based approach involves financial intermediaries in the AML system proactively. These actors must establish risk-based procedures and criteria (such as the KYC principle) to report certain transactions as suspicious to the FIU. From this perspective, the first level of the AML system can change the reporting test used to report a given transaction to the FIU.

The theoretical model proposed in this paper offers an interpretation of the behavior of the agents signaling STR to the FIU with respect to two important goals: the deterrence effects of their reporting activities and the accuracy of those reports. We empirically assessed the model’s main predictions using multivariate techniques and sufficient statistics. The analysis, which was based on Italian data, focused on the role of financial intermediaries—the largest pool of actors submitting STRs to the FIU. We first tested the effects of the introduction of the risk-based methodology on deterrence and type-I errors by means of regression analysis (using disaggregated data at the provincial level) combined with aggregate statistics. Thereafter, we used the model’s predictions to draw inferences regarding the pattern of type-II errors and, consequently, accuracy.

The empirical outcome suggests that, in the period of interest (i.e., in the years immediately following the introduction of the risk-based approach), financial intermediaries lowered the reporting test required to report a transaction as suspicious. The adoption of this “tougher” approach by the Italian intermediaries might have been motivated by the fact that these intermediaries are severely sanctioned if they do not report transactions subsequently detected as money-laundering transactions. The data combined with the model’s predictions suggest that the reporting test moved closer to the gold standard, further promoting deterrence.

We found that the observed increase in STR activity in relation to total bank transactions could be explained by a decrease in the reporting test adopted by the intermediaries. The model predicts an increase in the incidence of type-I errors, which was confirmed by aggregate data on dismissal rates from the FIU. These findings might relate to the fact that intermediaries and professionals are not formally punished for over-reporting. Conversely, we conclude that the incidence of type-II errors must have decreased during the period of interest, a conclusion that stems directly from the model and is consistent with the decrease in the reporting test. In terms of policy, this inferred conclusion is particularly important, as data or proxies for the incidence of type-II errors are not easily observed at any layer of reporting activity.

Finally, although the risk-based approach aims to improve reporting quality, Italian authorities should be aware that, while the MLR seems to be relatively stable (i.e., close to the maximum deterrence), the incidence of reporting errors is not at the minimum. In particular, while the inference about type-II errors is encouraging, type-I errors increased following the introduction of the new reporting rules, though this was largely compensated by substantial investment in IT-based screening technologies and in the number of employees in the FIU.Footnote 24 However, these aspects could be detrimental to a bank’s reputation and client retention, while the AML authorities are doing most of the job of managing type-I errors.

From a policy perspective, the emerging evidence seems to indicate that the risk-based paradigm is helpful in combatting money laundering, at least in Italy. However, false positives remain a major issue that deserves further consideration. Researchers may wish to apply the approach proposed in this paper to other contexts to understand where different countries are positioned in terms of the accuracy of the information transmitted to the FIUs and the deterrence of money laundering activities.