Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies

  • Original Research
  • SN Computer Science

Abstract

The recent emergence of cryptocurrencies has added another layer of complexity to the fight against financial crime. Cryptocurrencies require no central authority and offer pseudo-anonymity to their users, allowing criminals to hide among legitimate users. On the other hand, the openness of the data gives investigators additional material for forensic examination. This study focuses on the detection of illicit activities (e.g., scams, terrorism financing, and Ponzi schemes) on cryptocurrency infrastructures, at both the account and the transaction level. Previous work has identified two widespread challenges in this domain: class imbalance, and a dynamic environment created by the evolving techniques that criminals deploy to avoid detection. We propose Adaptive Stacked eXtreme Gradient Boosting (ASXGB), an adaptation of eXtreme Gradient Boosting (XGBoost) designed to better handle dynamic environments, and present a comparative analysis of various offline decision tree-based ensembles and heuristic-based data-sampling techniques. Our results show that: (i) offline decision tree-based gradient boosting algorithms outperform state-of-the-art Random Forest (RF) results at both the account and the transaction level, (ii) the data-sampling approach NCL-SMOTE further improves recall at the transaction level, and (iii) the proposed ASXGB reduces the impact of concept drift while further improving recall at the transaction level.
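
The abstract names NCL-SMOTE as the data-sampling approach that improves recall under class imbalance. As an illustration only, the sketch below shows the SMOTE-style oversampling idea in plain Python: a synthetic minority sample is placed on the line segment between a minority point and one of its nearest minority neighbours. It leaves out the Neighbourhood Cleaning Rule step entirely and is not the authors' implementation; the function name `smote_sketch`, the parameters `k` and `seed`, and the toy data are assumptions made for this example.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric tuples (the rare, e.g. illicit, class).
    k:        number of nearest minority neighbours considered per base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a base minority sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): four minority samples with two features each.
minority_samples = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
for point in smote_sketch(minority_samples, n_new=3, k=2):
    print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls within the region already spanned by the minority class. In practice a maintained library implementation would normally be preferred over a hand-written sketch like this one.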

Notes

  1. xgboost library: https://xgboost.readthedocs.io/en/latest/python/index.html.

  2. lightgbm library: https://lightgbm.readthedocs.io/en/latest/.

  3. catboost library: https://catboost.ai/docs/concepts/python-quickstart.html.

  4. Microsoft Azure: https://azure.microsoft.com.

  5. scikit-multiflow: https://scikit-multiflow.github.io/.

  6. AXGB: https://github.com/jacobmontiel/AdaptiveXGBoostClassifier.

  7. AML-Crypto: https://github.com/achmand/aml-crypto-graph.

  8. Sampled Datasets: https://drive.google.com/drive/folders/1xxJgmMPKVGLymI90fX1JxHFU9GCEJvK-.

Funding

The research work disclosed in this paper is partially funded by the Malta Information Technology Agency (MITA) Distributed Ledger Technologies (DLT) Scholarship Scheme (Malta), initiated by MITA in collaboration with the Centre for DLT at the University of Malta.

Author information

Corresponding author

Correspondence to Dylan Vassallo.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vassallo, D., Vella, V. & Ellul, J. Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies. SN COMPUT. SCI. 2, 143 (2021). https://doi.org/10.1007/s42979-021-00558-z

  • DOI: https://doi.org/10.1007/s42979-021-00558-z
