1 Introduction

Money laundering is the act of making money obtained through illegal operations, such as the trafficking of illegal drugs, appear to have come from legitimate commercial activities. Money from illegal activities is viewed as dirty, and the process therefore “launders” it to make it appear clean.

More formally, money laundering is the process by which criminals conceal the original ownership and control of the proceeds of their illicit activity by making those proceeds appear to have come from a legitimate source [1].

Money can be laundered in a number of ways, ranging from very simple schemes to very complicated ones. One of the most popular methods is to use a legal cash-based business owned by criminals to launder money. If criminals ran a shop, for instance, they might inflate the daily cash collections to move the dirty money from the shop into the shop’s bank account. The prevention of financial crime has therefore become increasingly important for banks. With the growth of modern technology and global communication, money laundering is rising drastically, causing banks to suffer significant losses, from the world’s largest fine ($1.9 billion) imposed on HSBC to fines of millions of dollars imposed on other banks worldwide [1].

Money laundering is a dynamic three-stage process: (1) placement, which moves cash away from its source and places it into circulation through financial institutions or other legitimate organizations; placement can be carried out in many ways; (2) layering, which makes laundering activity more difficult to detect and uncover by making the trail of illegal proceeds hard for law enforcement agencies to follow; and (3) integration, which moves the previously laundered money back into the economy, mainly through financial organizations or the banking system, so that it appears to be normal business earnings.

To fight money laundering, many systems have been introduced. Most of them are rule-based systems [2] that generate a lot of false positive alerts. These false positives waste compliance time and increase the risk that high-risk alerts will be missed among them. The alternative is to use machine learning methods. Anti-money laundering (AML) with machine learning is either supervised or unsupervised. Due to the lack of real-world labeled data, most newly introduced systems use unsupervised techniques such as clustering and anomaly detection. The drawback of this approach is that not every suspicious behavior is outlier behavior: money launderers constantly try to replicate normal customer behavior [3]. To overcome these challenges, this paper proposes a new framework based on supervised learning techniques. The contributions of this article are summarized as follows:

  • Introducing a machine learning model that works together with a rule-based system, takes its results, suppresses false positive alerts, and ranks true positive alerts with a risk-based mechanism.

  • Selecting the features from rule-based model alerts, generated by AML scenarios designed by AML specialists, which gives the solution the advantage of interpretability.

  • Applying the solution to real-world data and having real experts review the results and provide feedback, which makes the results accountable.

  • Applying hyperparameter tuning with Optuna [4], which gives excellent results with our machine learning model.

The organization of this article is as follows: in Sect. 2, we present an overview of related work. Section 3 is devoted to the presentation of the proposed framework. Section 4 describes our experimental evaluation approach and the different parameters used, along with a discussion of the results obtained. Finally, Sect. 5 concludes this paper.

2 Related works

Anti-money laundering has for decades been one of the hot areas to be addressed with machine learning. Domashova et al. [5] used Optuna with boosting algorithms to evaluate organizations based on non-transactional characteristics such as the organization’s age, authorized capital size, founder composition, and so on. The issue with this approach is that it neglects the fact that money laundering is a behavioral process that depends on patterns of transactions. In [6], the authors utilized four supervised learning algorithms on a bitcoin dataset to find money laundering. Unified terminologies related to the AML field were proposed in [7]. That paper points to two major fields to work on: customer risk profiling and suspicious behavior. The authors also point out a major challenge, which is the lack of public data. In [8], the authors utilized transactions, senders, and receivers to build graphs that help reduce false positives. They used real banking data, but many types of transactions in real data do not contain both sides of the transaction, and some types have only one side.

Weber et al. [9] pointed out that reducing the false positives of AML systems will also reduce the true positives, and that keeping the rule-based model is important because it is interpretable. In [10], the authors applied supervised learning techniques such as XGBoost to real transactional data divided into three groups: normal transactions, fired event transactions, and suspected STR transactions; they achieved an 82% AUC. Ahmed et al. [11] used a gradient boosting algorithm to detect money laundering activities at the transaction and account levels. Some other studies focused on filtering watch lists based on machine learning in AML [12]. Based on a 10,000-transaction dataset, the authors of [13] used decision tree and support vector machine classifiers to determine whether a transaction is legal or not, and they discovered that decision trees outperform SVM on their customized dataset. Xia et al. [14] proposed a money laundering prediction model based on graph convolutional neural networks (GCN) and long short-term memory (LSTM) over transactions; it gives good precision results but, on the other hand, loses many true positive transactions.

The authors in [15] utilized real-world transactional data. They built monthly profiles with the sums and counts of each type of transaction, such as cash and wires, to identify money laundering. They used regression and classification techniques such as logistic regression, logistic regression with lasso, K-NN, and XGBoost. The F1 score and accuracy were used in that paper to compare the results, and the authors concluded that traditional logistic regression with a binary outcome performs better than the other models. Raiter [16] utilized logistic regression, random forest, artificial neural network, and support vector machine with transactional features such as the amounts of each transaction type. They used the accuracy score to compare the results, which is not appropriate for this problem: the nature of AML and fraud data is that it contains very rare events, so the accuracy score gives a misleading picture of the model. In [17], social network analysis was applied to fight money laundering; the authors introduced a prediction model that uses social networks to predict customer risk and the involvement of accounts belonging to customers engaged in money laundering.

The deep learning approach proposed in [18] for anti-money laundering (AML) has several strengths. Firstly, it replaces predefined rules with automatically extracted latent features, allowing for a more flexible and adaptable system that can detect new patterns and anomalies in transaction sequences. Secondly, the use of recurrent and Transformer encoder layers enables the model to capture long-term dependencies and complex relationships between transactions, leading to better performance in reducing false positives and retaining true positives. Additionally, the experiment with a large dataset from Spar Nord Bank and the subsequent expert review of 26 clients reported to the Danish authorities suggest that the approach has real-world potential for improving AML compliance.

However, there are also some limitations to consider. Firstly, the proposed approach requires significant data processing and model training, which may pose challenges for smaller institutions with limited resources. Additionally, the reliance on transaction sequences may overlook other important sources of information, such as customer profiles and external data sources. Finally, the experiment with a small sample of high-risk clients prompts further questions about the approach’s scalability and generalizability to different populations and contexts.

Overall, while the proposed deep learning approach shows promise for enhancing AML systems, it is important to consider both its strengths and limitations when evaluating its potential impact on financial institutions and regulatory compliance.

However, these works mainly focus on transactions or customer profiles, whereas our research concentrates on the alerts generated from customer scenarios. Because these alerts already quantify the behavior, the features in this article are the alerts produced by the AML system rather than the raw transactions. Moreover, the results of previous works were not measured effectively: focusing solely on reducing false positives without also tracking true positive events hides the trade-off between reducing false positives and not losing too many true positives. We therefore employ F-beta with different \(\beta \) values to comply with the requirements of the financial authorities.

3 The proposed ASXAML framework

In this paper, we propose a novel anti-money laundering framework called automatic suppression based on XGBoost for AML (ASXAML). This section presents the basic idea, key features, and architecture of the proposed AML framework; then, the steps of the ASXAML framework are described.

The architecture of the ASXAML framework is described in Fig. 1, which shows the flow of the cycle after applying the proposed framework. Data are extracted from the core banking system; then transformation and load jobs are applied to place the data in the target AML format. The AML scenarios then analyze the transactions and other data to generate alerts. These alerts pass through our ASXAML framework before being presented to investigators. The ASXAML framework predicts the importance level of the alerts grouped by customer and suppresses those whose predicted importance falls below the cut-off value; the alerts above the cut-off value are raised for investigation.

Fig. 1 Anti-money laundering solution architecture
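As a rough illustration of this suppression step, the sketch below scores each customer's alert profile with an already trained classifier and forwards only the customers whose predicted importance exceeds a cut-off value. The names `model`, `customer_profiles`, and `cutoff` are hypothetical placeholders, not objects defined in the paper.

```python
# Minimal sketch of the alert-suppression logic described above (illustrative only).
import pandas as pd

def suppress_alerts(model, customer_profiles: pd.DataFrame, cutoff: float = 0.5) -> pd.DataFrame:
    """Score each customer's alert profile; keep only customers above the cut-off."""
    scores = model.predict_proba(customer_profiles)[:, 1]  # probability of the "important" class
    scored = customer_profiles.copy()
    scored["importance_score"] = scores
    # Customers below the cut-off are suppressed; the rest are raised for investigation.
    return scored[scored["importance_score"] >= cutoff]
```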

Current AML systems that depend on a rule-based model produce a lot of false positives, which drain compliance time and effort. Our main goal is to intercept the alerts raised by the rule-based model, filter these alerts with our framework, and raise only the important ones. We therefore propose the ASXAML framework to cover the drawbacks of the previous work. Figure 2 shows the steps of the ASXAML framework, which we demonstrate in the next subsections.

Fig. 2 The architecture of the ASXAML framework

3.1 Data preparation

The first phase of the ASXAML framework involves the data preparation process. This phase encompasses the data acquisition, the extraction, and the transformation of the data into a format that is suitable for processing by the model. Additionally, it describes the original features of the dataset, details of data insights and variable analysis, and some basic descriptors to help understand the final data structure.

3.2 Dataset description

We use a real-world banking dataset for all our experiments. For privacy concerns, we are unable to reveal the name of the bank or provide precise information; however, we do provide approximations to characterize the data where we can.

As shown in Table 1, the dataset used is real data that contains 46 scenarios, and the number of customers with alerts is more than 210K. Each customer can have more than one alert for different scenarios, and the same scenario can fire on the same customer multiple times during the period. The number of important entities (suspected of money laundering) is 6420 customers, and the number of false positive customers is 204,576, so the data are highly unbalanced. We took a sample of 10K from the false positive customers and labeled it 0, and all 6420 important customers were labeled 1. The dataset was divided into two subsets, a training dataset and a testing dataset, following a split ratio of 70% for training and 30% for testing. This partitioning maintains the distribution of the samples for each class in the dataset.

Table 1 Data summary
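For illustration only, the undersampling and stratified 70/30 split described above could be reproduced along the following lines; the column names `customer_id` and `label` are assumptions, not the actual schema of the banking dataset.

```python
# Sketch of the undersampling and stratified train/test split (assumed column names).
import pandas as pd
from sklearn.model_selection import train_test_split

def build_train_test(customers: pd.DataFrame, n_false_positives: int = 10_000, seed: int = 42):
    """Undersample the false-positive class and split 70/30 while preserving class ratios."""
    important = customers[customers["label"] == 1]                      # 6420 customers, label 1
    false_pos = customers[customers["label"] == 0].sample(n=n_false_positives, random_state=seed)
    data = pd.concat([important, false_pos], ignore_index=True)

    X = data.drop(columns=["customer_id", "label"])
    y = data["label"]
    # stratify=y keeps the class distribution identical in the training and testing subsets
    return train_test_split(X, y, test_size=0.30, stratify=y, random_state=seed)
```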

3.2.1 Data acquisition

The data utilized in this study are a real-world dataset obtained from a financial institution that has requested to remain anonymous due to privacy concerns. During this phase, the handling of missing values, removal of useless or highly correlated data, and data transformations were performed to optimize the information provided to the models and adapt the data to the behavior of an anti-money laundering department.

3.2.2 Data extraction

The data utilized in this study consist of the alerts generated by the AML system based on 46 configured scenarios in the financial institution over a period of 6 months. The scenarios include, but are not limited to, large cash deposits on a daily basis, structuring the deposits over multiple days, transferring money to high-risk countries, and receiving money from non-customers. The data were divided into two groups, the important group and the not-important group.

The important group includes alerts that were deemed important by the financial institution investigator and reported to the Money Laundering Combating Unit (MLCU), alerts that were investigated and raised to manager level, and alerts that required significant investigation even though they were not money laundering cases. The not-important group includes alerts that were determined to be false positives by investigators. As most alerts in the AML solutions are false positives, only random samples were selected from the not-important category, with a sample size of four events from the not-important group for each important event, while all important alerts were included in this study. The proportion of important alerts reported to the MLCU is typically between 2 and 5% of the total events produced by the AML system [19].

3.2.3 Customer profiling

The customer profiling step is a critical part of the ASXAML framework, which aims to build a customer profile based on the historical fired alerts in the last six months prior to the last closed alert in the not-important group, and before the last case or report on the customer in the important group. To ensure the integrity and accuracy of the data, the framework excludes all alerts that were fired after the reported case, as these alerts were not considered when the investigator made their decision.

The data extracted include the customer number, the scenario name, and the date of the fired alert. The gathered data are then grouped by customer number, so that each customer has only one record with all their alerts placed next to it. Table 2 illustrates a sample of the extracted data, while Table 3 shows the structure of the prepared data: the number in each alert-related cell represents the number of alerts of that scenario that occurred for this customer over the period, and the label represents whether this customer is important or not.

Table 2 Sample of data extracted
Table 3 Profiling customers

The profiling of all customers enables the framework to group customers with similar alerts and classify them accordingly. Ultimately, the framework can use the generated profiles to predict instances of money laundering activity, as it accurately captures customer behavior and detects any unusual or suspicious activities.
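A minimal sketch of this profiling step is given below, assuming an alert table with columns `customer_no`, `scenario_name`, and `alert_date` (illustrative names): the alerts are counted per customer and scenario so that each customer becomes a single row, as in Table 3.

```python
# Sketch of customer profiling: one row per customer, one column per scenario,
# each cell holding the number of alerts fired in the profiling window.
import pandas as pd

def build_customer_profiles(alerts: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Pivot raw alerts into per-customer scenario counts and attach the important/not label.

    `labels` is assumed to be a Series of 0/1 labels indexed by customer number.
    """
    profiles = pd.pivot_table(
        alerts,
        index="customer_no",      # one record per customer
        columns="scenario_name",  # one column per AML scenario
        values="alert_date",
        aggfunc="count",          # number of alerts of that scenario
        fill_value=0,
    )
    return profiles.join(labels.rename("label"), how="left")
```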

3.3 Feature selection

Feature selection is a crucial step in machine learning to select relevant and important features while avoiding irrelevant, redundant, or unimportant ones. Recursive feature elimination with cross-validation (RFECV) is a widely used method for feature selection [20]. The main idea of RFECV is to recursively fit a model and repeatedly eliminate the least important variable. The process continues until all variables have been ranked, and the order of elimination reflects their importance.

To implement the RFE algorithm, a machine learning algorithm such as random forest, Naïve Bayes, logistic regression, or gradient boosting is needed to evaluate the importance of the features. In this paper, random forest is used as the classifier for the RFE model. The initial selection of features significantly affects the features selected later in the RFE model, as the splits are not the same when the model is rebuilt [21]. The selected features therefore vary with each RFE run, and k-fold cross-validation is used to address this challenge when selecting the important features for automatic suppression.

In the RF-RFECV strategy, the best variables are selected based on the highest cross-validated recall of each RFE-with-cross-validation model. This technique is used for feature selection in this paper.
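A sketch of the RF-RFECV step under these choices (random forest as the base estimator, recall as the cross-validation score) is shown below; `X_train` and `y_train` are assumed to be the customer profiles and labels from the data-preparation phase.

```python
# Sketch of RF-RFECV: recursively eliminate features, keeping the subset
# with the highest cross-validated recall.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

def select_features(X_train, y_train, n_folds: int = 5):
    selector = RFECV(
        estimator=RandomForestClassifier(random_state=42),
        step=1,                               # drop one feature per iteration
        cv=StratifiedKFold(n_splits=n_folds),
        scoring="recall",                     # recall drives the selection
        n_jobs=-1,
    )
    selector.fit(X_train, y_train)
    return list(X_train.columns[selector.support_]), selector
```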

3.4 Classification model

The ASXAML framework uses Extreme Gradient Boosting (XGBoost) in the classification phase. XGBoost is an ensemble learning method that learns from labeled data. It is sometimes not sufficient to depend on one learning model and only one result; ensemble learning combines multiple learners to enhance the result. The final model is a combination of several models whose results are aggregated to obtain a better model than any of the individual ones. XGBoost was chosen because it performs well on real-world classification problems. XGBoost supports parallel processing and can use all the cores of the machine it runs on. It is highly scalable and can effectively deal with classification and preprocessing of data. XGBoost is flexible and can be integrated with many platforms; it is not tied to any specific one and can be used from multiple programming languages, such as C++, Python, R, and Java. XGBoost can transform a weak learner into a strong learner by boosting that learner through its optimization process, and it avoids over-fitting through regularization, whether trees or linear models are used [22].

XGBoost has an internal cross-validation function, so there is no need for external packages to apply cross-validation. It can deal with missing values and allows the user to set a custom objective function and define custom evaluation metrics. In addition to the previous features, XGBoost has been used in many winning entries of machine learning competitions, such as those on Kaggle [23]. Therefore, XGBoost is a good choice for dealing with the money laundering problem. Next, we discuss the mathematical model of XGBoost.
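For orientation, a bare-bones way to plug the selected features into an XGBoost classifier through its Python scikit-learn API is sketched below; the parameter values are placeholders, since the actual hyperparameters are tuned with Optuna in Sect. 3.5.

```python
# Illustrative XGBoost setup for the classification phase (placeholder parameters).
from xgboost import XGBClassifier

def train_classifier(X_train, y_train):
    model = XGBClassifier(
        n_estimators=200,        # number of boosted trees (assumed value)
        eval_metric="logloss",   # internal evaluation metric
        n_jobs=-1,               # use all available cores
        random_state=42,
    )
    model.fit(X_train, y_train)
    return model
```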

3.4.1 Mathematical model of XGBoost

The objective function in the XGBoost model is:

$$\begin{aligned} Obj(\theta )=L(\theta )+R(\theta ) \end{aligned}$$
(1)

where \(\theta \) denotes the model parameters to be fitted given the independent variables \(x_i\) and the dependent variables \(y_i\). L is the training loss function, and R is the regularization term, which controls the complexity of the model and helps to avoid over-fitting. A common choice for L is the mean squared error:

$$\begin{aligned} L(\theta )=\sum _{i}(y_i - \tilde{y_i})^2 \end{aligned}$$
(2)

and the prediction \(\tilde{y_i}\) can be written in the form:

$$\begin{aligned} \tilde{y_i}=\sum _{t=1}^{T} f_t(x_i),f_t \in F \end{aligned}$$
(3)

where T is the number of trees and F is the set of all possible classification and regression trees (CARTs). In XGBoost, instead of training many independent trees, the trees are created sequentially; each tree depends on the previous ones, and the objective of each tree is to improve on the previous result:

$$\begin{aligned} \tilde{y_i}^{(0)}= & {} 0 \end{aligned}$$
(4)
$$\begin{aligned} \tilde{y_i}^{(1)}= & {} f_1(x_i) =\tilde{y_i}^{(0)} + f_1(x_i) \end{aligned}$$
(5)
$$\begin{aligned} \tilde{y_i}^{(2)}= & {} f_1(x_i) + f_2(x_i) =\tilde{y_i}^{(1)} + f_2(x_i) \end{aligned}$$
(6)

...

$$\begin{aligned} \tilde{y_i}^{(T)} = \sum _{t=1}^{T} f_t(x_i) = \tilde{y_i}^{(T-1)} + f_T(x_i) \end{aligned}$$
(7)

The objective function then becomes:

$$\begin{aligned} Obj^{(T)}=\sum _{i=1}^{n}(y_i - (\tilde{y_i}^{(T-1)} + f_T(x_i)))^2 + \sum _{t=1}^{T}\omega (f_t) \end{aligned}$$
(8)

3.5 Hyperparameter tuning using Optuna

In the present study, the hyperparameter tuning phase employed Optuna, an automatic hyperparameter optimization software framework specifically designed for machine learning [4]. Optuna offers several advantages, such as a dynamic define-by-run API that enables users to define their search space dynamically, an efficient implementation of searching and pruning strategies, and a versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to lightweight experiments conducted via an interactive interface. Additionally, Optuna is an open source framework that outperforms many black-box frameworks while being easy to use and set up in various environments. By leveraging Optuna, the present study aimed to obtain new hyperparameters automatically, without requiring interventions from money laundering experts [24]. Specifically, Optuna was utilized to tune the hyperparameters of the XGBoost model, and the optimized hyperparameters were then fed into the model to score the data.
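A minimal sketch of this tuning step, assuming an objective that maximizes cross-validated F-beta (\(\beta \) = 2 in experiment #1) over the hyperparameters discussed in Sect. 4 (booster, alpha, lambda, colsample_bytree, subsample), is given below; the search ranges are illustrative, not the exact study configuration.

```python
# Sketch of Optuna tuning for XGBoost with F-beta as the maximization objective.
import optuna
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def tune_hyperparameters(X_train, y_train, beta: float = 2.0, n_trials: int = 1000):
    scorer = make_scorer(fbeta_score, beta=beta)

    def objective(trial: optuna.Trial) -> float:
        model = XGBClassifier(
            booster=trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
            reg_alpha=trial.suggest_float("alpha", 1e-8, 10.0, log=True),    # L1 regularization
            reg_lambda=trial.suggest_float("lambda", 1e-8, 10.0, log=True),  # L2 regularization
            colsample_bytree=trial.suggest_float("colsample_bytree", 0.5, 1.0),
            subsample=trial.suggest_float("subsample", 0.5, 1.0),
            n_jobs=-1,
            random_state=42,
        )
        # Cross-validated F-beta is the objective value Optuna maximizes.
        return cross_val_score(model, X_train, y_train, scoring=scorer, cv=5).mean()

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study
```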

3.6 Validating the results

In this step, the confusion matrix was used to obtain the results of the models and to compare all the models used. The confusion matrix (Fig. 3) describes the performance of the classification models.

Fig. 3 Confusion matrix

From the confusion matrix, we can conclude the following measures:

  • Recall = true positive rate = \(\frac{\text {true positives}}{\text {true positives} + \text {false negatives}}\)

  • Precision = positive predictive value = \(\frac{\text {true positives}}{\text {true positives} + \text {false positives}}\)

  • F-measure = harmonic mean of recall and precision = \(\frac{2 \times \text {precision} \times \text {recall}}{\text {precision} + \text {recall}}\)

These are the standard measures for most classification problems. We also use the most important measure for this problem [25], which is F-beta. F-beta can be used to emphasize one measure over the other; in our case, we need to concentrate on recall, while false positives remain important.

$$\begin{aligned} F_\beta = (1 + \beta ^2) \frac{precision*recall}{(\beta ^2 * precision) + recall } \end{aligned}$$
(9)
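As an illustration, these measures can be computed from the confusion matrix as sketched below. Here, the “reduced false positives” value reported in Sect. 4 is interpreted as the fraction of not-important customers that the model closes automatically, i.e. TN / (TN + FP); this interpretation is an assumption based on the wording of the results.

```python
# Sketch of the evaluation step: confusion-matrix metrics plus F-beta (Eq. 9).
from sklearn.metrics import confusion_matrix, fbeta_score, precision_score, recall_score

def evaluate(y_true, y_pred, beta: float = 2.0) -> dict:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "recall": recall_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "f_beta": fbeta_score(y_true, y_pred, beta=beta),
        # Assumed definition: share of not-important customers suppressed automatically.
        "reduced_false_positives": tn / (tn + fp),
    }
```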

4 Experiments and results

In this section, we describe the experiments performed on a real dataset. The following subsections present and analyze the results.

4.1 Performance evaluation

In this section, we present the evaluation of the ASXAML framework on real-world datasets. The dataset was partitioned into two distinct subsets, a training dataset and a testing dataset, with a division ratio of 70% for training and 30% for testing. This stratified partitioning ensured an equitable representation of samples from all classes. To evaluate the performance of the proposed framework, we conducted five experiments using different combinations of feature selection methods, classification algorithms, and optimization settings. The first experiment used the RFECV feature selection method with XGBoost and Optuna, the second experiment used only XGBoost and Optuna, the third experiment used only XGBoost with default parameters, the fourth experiment varied the \(\beta \) value of the Optuna optimization function, and the fifth experiment used multiple classifiers, including SVM, RF, NB, KNN, and DT. The reason for conducting experiments with different combinations of methods and algorithms is to assess the effectiveness and robustness of the ASXAML framework in different scenarios and to compare its performance with other commonly used classifiers. The experiments were conducted on Google Colaboratory, which provides a free Jupyter notebook environment with GPU support for running machine learning experiments [26].

4.1.1 Experiment #1

In this experiment, Optuna with \(\beta \) = 2 as the optimization function and XGBoost with feature selection were both employed. We used multiple cross-fold validations, from two folds to ten folds, but as there would be numerous figures, we only show the findings of the two- and five-fold validations. Figure 4 shows the result of applying the feature selection method (RFECV) to the 46 features with 2 cross-fold validation. The optimal number of features chosen by RFECV, which represents the highest score, is 16. The two lines in this figure display the result of each cross-validation applied; the x-axis is the number of features, and the y-axis shows the recall obtained by RFECV on that fold.

Fig. 4 RFECV feature selection method for 2 cross-fold (C-F) validations

After applying Optuna to tune the hyperparameters, Fig. 5 shows the hyperparameters that affect the model the most. Booster has three options: gbtree, gblinear, or dart; gbtree and dart use tree-based models, while gblinear uses linear models. The second most important hyperparameter in this experiment was alpha, an L1 regularization term on the weights (analogous to Lasso regression). The third most important parameter was colsample_bytree, the subsample ratio of columns when constructing each tree, followed by subsample, which denotes the fraction of observations randomly sampled for each tree; subsampling occurs once for every tree constructed. Lambda, the L2 regularization term on the weights (analogous to Ridge regression), did not affect the model in this experiment. It is noticeable that the booster hyperparameter has the most important effect in this experiment. Out of 1000 Optuna trials, the best trial for the XGBoost model was trial 421, with an F-beta of 0.86 (Fig. 5); the majority of the trials had objective values (F-beta) above 0.80, which indicates that most of the trials produced excellent outcomes.

Fig. 5 Hyperparameter importance and Optuna trials for 2 cross-fold validations
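For reference, Optuna exposes these importances programmatically; a minimal sketch, assuming the `study` object returned by the tuning step sketched in Sect. 3.5, is:

```python
# Sketch: retrieving and plotting hyperparameter importances from an Optuna study.
import optuna

def report_importances(study: optuna.study.Study) -> dict:
    importances = optuna.importance.get_param_importances(study)  # e.g. {"booster": 0.96, ...}
    optuna.visualization.plot_param_importances(study).show()     # interactive Plotly figure
    return importances
```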

The second observation in this experiment concerns five cross-fold validations. In this case, the best number of features according to five cross-fold validations was 44, as shown in Fig. 6. It was again found that the booster hyperparameter has the greatest influence on the outcome (see Fig. 7). The best trial was trial 770, with an F-beta of 0.86, which is close to the result obtained with two cross-fold validations. The objective value of the majority of trials is over 0.80, which indicates that the majority of trials offer very good results.

Fig. 6 RFECV feature selection method for 5 cross-fold (C-F) validations

Fig. 7 Hyperparameter importance and Optuna trials for 5 cross-fold validations

The comparison of all cross-fold validation settings, from 2 to 10 cross-folds, is shown in Fig. 8. The comparison metrics are recall, reduced false positives, and F-beta, which are also displayed in Table 4. It is noted that all settings provide excellent results overall and that the variation in terms of F-beta is not very high. It is further noted that F-beta is a well-suited measure for this particular problem because it balances lowering the false positives against maintaining a high recall. The 9- and 7-fold validations had the best overall F-beta of 0.86. The relevant hyperparameters affecting the model for each cross-validation setting are shown in Table 5, and it can be seen that the booster hyperparameter is the most crucial hyperparameter in many settings. The best two results of applying XGBoost after RFECV and Optuna were as follows: for 9 cross-folds, F-beta = 0.86 and the recall of the suspicious customers (important customers) was 0.92; the proportion of reduced false positive customers is 0.73, which means that only 0.27 of customers, with a 0.92 true positive hit, will be investigated. For 7 cross-folds, F-beta = 0.86 and the recall of the suspicious customers was 0.93; the proportion of reduced false positive customers is 0.71, which means that only 0.29 of customers, with a 0.93 true positive hit, will be investigated.

Fig. 8 Comparison between F-beta, recall, and reduced false positives for different numbers of cross-folds

Table 4 Comparison between recall, F-beta and reduced false positives for all C-F validations
Table 5 Hyperparameters importance for all combinations of C-F validations

After applying Optuna (1000 trials) and comparing the results of the various cross-fold validations, ranging from 2 to 10 folds, it is evident that the final results are quite similar and do not differ significantly. This means that our framework continues to demonstrate its value even when used with various configurations.

4.1.2 Experiment #2

In this experiment, XGBoost with Optuna and \(\beta \) = 2 as the optimization function was utilized without any feature selection method. Booster, with an importance score of 0.96, is the key hyperparameter affecting the model when Optuna is applied (see Fig. 9). Alpha was the second most significant hyperparameter, while colsample_bytree, subsample, and lambda had minimal influence on the model. Optuna conducted 1000 trials, with the best trial, number 537, yielding an F-beta of 0.86.

Fig. 9 Hyperparameter importance and Optuna trials for experiment #2

The result of applying XGBoost and Optuna was as follows: F-beta = 0.86, and the recall of the important customers was 0.90. The proportion of reduced false positive customers is 0.76, which means that only 0.24 of customers, with a 0.90 true positive hit, will be investigated. Figure 10 compares the outcomes of experiment #1 and experiment #2. In this comparison, we used 5 cross-fold validation, which is the RFECV default, to represent experiment #1. As shown, the recall and F-beta in experiment #1 are superior to those of experiment #2; however, experiment #2 performs better in terms of reducing false positives.

Fig. 10 Comparison of ASXAML results in experiment #1 versus experiment #2 in terms of F-beta, recall, and reduced false positives

4.1.3 Experiment #3

In this experiment, XGBoost was applied with its default parameters, without RFECV and without Optuna. The results were as follows: F-beta = 0.81, and the recall of the suspicious customers (important customers) was 0.79. The proportion of reduced false positive customers is 0.95, which means that investigators will investigate only 0.05 of customers, with a 0.79 true positive hit.

4.1.4 Experiment #4

In this experiment, the Optuna maximization function was varied over multiple \(\beta \) values, ranging from 1 to 2, and cross-fold values of 2 and 5 were used for feature selection. According to Fig. 11, the recall with 2 cross-folds starts at 0.80 and rises as \(\beta \) is increased. On the other hand, the reduced false positives start at 0.94 and decrease as \(\beta \) is raised. At \(\beta \) = 1.4 and 1.5, the two metrics are more evenly balanced, while at \(\beta \) = 2 the bias toward recall sets in and the false positive reduction is at its lowest. The same trend was present with 5 cross-folds, with no noticeable differences. This indicates that the degree of watchfulness toward customers suspected of money laundering is up to the financial institution: it can increase \(\beta \) while still reducing false positives by at least 0.70, or aim for \(\beta \) = 1.4 or 1.5 if a balance is needed.

Fig. 11 Optuna XGB results with different \(\beta \) values as the maximization function on C-F 2 and 5
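A simple way to reproduce this sweep, reusing the hypothetical `tune_hyperparameters` and `evaluate` helpers sketched earlier, is shown below; the exact grid of \(\beta \) values is an assumption.

```python
# Sketch of the beta sweep in experiment #4: retune and re-evaluate for each beta value.
from xgboost import XGBClassifier

def sweep_beta(X_train, y_train, X_test, y_test, betas=(1.0, 1.2, 1.4, 1.5, 1.8, 2.0)):
    results = {}
    for beta in betas:
        params, _ = tune_hyperparameters(X_train, y_train, beta=beta)
        model = XGBClassifier(
            booster=params["booster"],
            reg_alpha=params["alpha"],
            reg_lambda=params["lambda"],
            colsample_bytree=params["colsample_bytree"],
            subsample=params["subsample"],
            n_jobs=-1,
            random_state=42,
        ).fit(X_train, y_train)
        results[beta] = evaluate(y_test, model.predict(X_test), beta=beta)
    return results  # per-beta recall, F-beta, and reduced false positives
```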

4.1.5 Experiment #5

In this experiment, we applied other classifiers, namely the support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), Naive Bayes (NB), and decision tree (DT) classifiers, with default parameters. These classifiers were tested to find the best overall classifier for our problem. Table 6 illustrates the performance of the five approaches; the best F-beta results were obtained by KNN, with an F-beta of 0.82, a recall of 0.82, and a false positive reduction of 0.90, and by NB, with an F-beta of 0.83, a recall of 0.81, and a false positive reduction of 0.95.

Table 6 SVM, KNN, DT, RF and NB classifiers
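An illustrative way to run this comparison with default parameters, using the hypothetical `evaluate` helper from Sect. 3.6, is sketched below; Gaussian Naive Bayes is assumed for the NB variant.

```python
# Sketch of experiment #5: baseline classifiers with default parameters,
# compared with the same metrics used for ASXAML.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def compare_baselines(X_train, y_train, X_test, y_test, beta: float = 1.5):
    baselines = {
        "SVM": SVC(),
        "RF": RandomForestClassifier(random_state=42),
        "KNN": KNeighborsClassifier(),
        "NB": GaussianNB(),
        "DT": DecisionTreeClassifier(random_state=42),
    }
    return {name: evaluate(y_test, clf.fit(X_train, y_train).predict(X_test), beta=beta)
            for name, clf in baselines.items()}
```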

The results from experiment #1 are compared to all employed classifiers in Fig. 12, using five cross-fold validation with \(\beta \) equal to 1.5. It is evident that, while most of the models achieve greater false positive reduction than the proposed model, they fall short in recall and F-beta. As a result, even though the reduction achieved by these approaches is very good (0.95 for NB, for example), their recall is rather poor (0.81). On the one hand, this means that the NB model mistakenly closed 366 money laundering customers out of the 1926 customers in the test data. On the other hand, while still achieving an adequate amount of false positive reduction, the ASXAML framework closes fewer money laundering customers by mistake (231), and when \(\beta \) is increased to 2 this number drops to 149, achieving a higher recall value. This is because the other algorithms only aim to reduce false positives, which makes them miss the crucial money laundering events and close them by mistake, putting the bank at a greater risk of fines.

Fig. 12 Comparison of different classifier models in terms of F-beta, recall, and reduced false positives

The overall results show that the proposed ASXAML framework performs better than the other classifiers. There is a gap of at least 0.05 in recall between ASXAML and the other classifiers, which affects the number of important customers that slip away and are closed by mistake. In this problem, it is important to close false positives without losing too many important customers, and it is clear that the proposed framework provides this balance without jeopardizing either goal. Our framework's dynamic nature relies on three main components, RFECV, Optuna, and the XGBoost model, making it adaptable for use with different datasets. The adaptability of the system is underscored by its reliance solely on rule-based scenarios and the requisite involvement of investigators, a characteristic common to all financial institutions equipped with AML systems. This renders the system applicable across diverse datasets and a wide spectrum of financial institutions.

5 Conclusion

In conclusion, we have introduced a novel framework for enhancing anti-money laundering (AML) systems built on rule-based models. Our framework involves extracting features from alerts generated by AML scenarios created by experts. These alerts are grouped by customer and fed into the RFECV feature selection method, which selects the best features before passing them to Optuna for hyperparameter tuning of the XGBoost model. The resulting classifier uses the best parameters generated by Optuna to classify customers as either important or not, striking a balance between reducing false positives and not missing important customers.

Our experimental results using five cross-fold validation with \(\beta \) equal to 1.5 demonstrate that our framework outperforms most of the other classifiers, achieving a higher recall value and F-beta score. While other approaches, such as Naive Bayes, achieved a high false positive reduction (0.95), their recall was poor (0.81), leading to the incorrect exclusion of 366 money laundering customers out of 1926 in the test data. In contrast, the ASXAML framework, even with a lower false positive reduction, missed only 231 money laundering customers, and increasing \(\beta \) to 2 further reduced the number to 149, demonstrating the framework's ability to detect crucial money laundering events while minimizing the risk of fines for the bank.

It is imperative to acknowledge potential drawbacks in the application of this methodology. Firstly, human investigators must abstain from participating in fictitious actions. Secondly, the occurrence of true positive cases is significantly less frequent when contrasted with the prevalence of false positives. Additionally, the necessity for historical actions implies that the proposed solution necessitates a minimum duration of coexistence with a rule-based model. Lastly, the quality of data is contingent upon the rigor and relevance of the predefined scenarios.

Future work will involve applying this framework to other features, such as customer profiling features, and utilizing clustering methods to identify the most important customer groups.