Machine learning techniques for anti-money laundering (AML) solutions in suspicious transaction detection: a review

Abstract

Money laundering has affected the global economy for many years. Large sums of money are laundered every year, threatening the global economy and its security. Money laundering encompasses the activities used to make illegally acquired funds appear legal and legitimate. This paper provides a comprehensive survey of machine learning algorithms and methods applied to the detection of suspicious transactions. In particular, solutions for anti-money laundering typologies, link analysis, behavioural modelling, risk scoring, anomaly detection, and geographic capability are identified and analysed. The key steps of data preparation, data transformation, and data analytics are discussed, and the machine learning algorithms and methods described in the literature are categorised, summarised, and compared. Finally, techniques that are lacking or under-addressed in existing research are examined in order to pinpoint future research directions.

Acknowledgements

This work was supported by a 3rd Called Collaboration with Public Universities and Agencies grant from the University of Nottingham, Malaysia Campus with Project No. UNHT0001.

Author information

Corresponding author

Correspondence to Zhiyuan Chen.

About this article

Cite this article

Chen, Z., Van Khoa, L.D., Teoh, E.N. et al. Machine learning techniques for anti-money laundering (AML) solutions in suspicious transaction detection: a review. Knowl Inf Syst 57, 245–285 (2018). https://doi.org/10.1007/s10115-017-1144-z

  • DOI: https://doi.org/10.1007/s10115-017-1144-z

Keywords

  • Anti-money laundering
  • Data mining methods and algorithms
  • Supervised learning
  • Unsupervised learning
  • Anti-money laundering typologies
  • Link analysis
  • Behavioural modelling
  • Risk scoring
  • Anomaly detection
  • Geographic capability