Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies

Vassallo, Dylan; Vella, Vincent; Ellul, Joshua

doi:10.1007/s42979-021-00558-z

Original Research
Published: 17 March 2021

SN Computer Science, volume 2, article number 143 (2021)

Abstract

The recent emergence of cryptocurrencies has added another layer of complexity to the fight against financial crime. Cryptocurrencies require no central authority and offer pseudo-anonymity to their users, allowing criminals to disguise themselves among legitimate users. On the other hand, the openness of the data adds to the investigator’s toolkit for conducting forensic examinations. This study focuses on the detection of illicit activities (e.g., scams, terrorism financing, and Ponzi schemes) on cryptocurrency infrastructures, at both the account and transaction levels. Previous work has identified two widespread problems in this domain: class imbalance, and a dynamic environment created by the evolving techniques that criminals deploy to avoid detection. In our study, we propose Adaptive Stacked eXtreme Gradient Boosting (ASXGB), an adaptation of eXtreme Gradient Boosting (XGBoost), to better handle dynamic environments, and we present a comparative analysis of various offline decision tree-based ensembles and heuristic-based data-sampling techniques. Our results show that: (i) offline decision tree-based gradient boosting algorithms outperform state-of-the-art Random Forest (RF) results at both the account and transaction levels, (ii) the data-sampling approach NCL-SMOTE further improves recall at the transaction level, and (iii) our proposed ASXGB reduces the impact of concept drift while further improving recall at the transaction level.
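As a rough illustration of the oversampling idea behind the SMOTE half of NCL-SMOTE, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation: the function name `smote_sketch`, its parameters, and the toy data are all hypothetical, and the Neighbourhood Cleaning Rule (NCL) step that removes noisy majority samples is omitted entirely.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration only; the NCL cleaning step
    of NCL-SMOTE is not implemented here.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the synthetic point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (e.g. illicit transactions) with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class, which is why a cleaning step such as NCL is often paired with it where classes overlap.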

Notes

xgboost library: https://xgboost.readthedocs.io/en/latest/python/index.html.
lightgbm library: https://lightgbm.readthedocs.io/en/latest/.
catboost library: https://catboost.ai/docs/concepts/python-quickstart.html.
Microsoft Azure: https://azure.microsoft.com.
scikit-multiflow: https://scikit-multiflow.github.io/.
AXGB: https://github.com/jacobmontiel/AdaptiveXGBoostClassifier.
AML-Crypto: https://github.com/achmand/aml-crypto-graph.
Sampled Datasets: https://drive.google.com/drive/folders/1xxJgmMPKVGLymI90fX1JxHFU9GCEJvK-.

Funding

The research work disclosed in this paper is partially funded by the Malta Information Technology Agency (MITA) Distributed Ledger Technologies (DLT) Scholarship Scheme (Malta), initiated by MITA in collaboration with the Centre for DLT at the University of Malta.

Author information

Authors and Affiliations

Department of Artificial Intelligence, University of Malta, Msida, Malta
Dylan Vassallo & Vincent Vella
Centre for Distributed Ledger Technologies, University of Malta, Msida, Malta
Joshua Ellul

Corresponding author

Correspondence to Dylan Vassallo.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vassallo, D., Vella, V. & Ellul, J. Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies. SN COMPUT. SCI. 2, 143 (2021). https://doi.org/10.1007/s42979-021-00558-z

Received: 21 October 2020
Accepted: 02 March 2021
Published: 17 March 2021
DOI: https://doi.org/10.1007/s42979-021-00558-z
