Credit Card Transactions, Fraud Detection, and Machine Learning: Modelling Time with LSTM Recurrent Neural Networks

  • Bénard Wiese
  • Christian Omlin
Part of the Studies in Computational Intelligence book series (SCI, volume 247)


In recent years, fraud detection and fraud prevention have received considerable research attention, in particular from payment card issuers. This increase in research activity can be attributed to the huge annual financial losses that card issuers incur through fraudulent use of their card products. A successful strategy for dealing with fraud can quite literally mean millions of dollars in savings per year on operational costs. Artificial neural networks have come to the fore as an at least partially successful method for fraud detection. Their success in this field is, however, limited by their underlying design: a feedforward neural network is simply a static mapping of input vectors to output vectors, and as such is incapable of adapting to the changing shopping profiles of legitimate card holders. Thus, fraud detection systems in use today are plagued by misclassifications, and their usefulness is hampered by high false positive rates.

We address this problem by proposing a dynamic machine learning method that models the time series inherent in sequences of same-card transactions. We believe that, instead of looking at individual transactions, it makes more sense to look at sequences of transactions as a whole; a technique that can model time in this context will be more robust to minor shifts in legitimate shopping behaviour. To form a clear basis for comparison, we investigated feature selection, preprocessing, and the selection of performance measures; the latter facilitates comparison of results obtained by applying machine learning methods to the biased data sets typically associated with fraud detection. We ran experiments on real-world credit card transaction data using two innovative machine learning techniques: the support vector machine (SVM) and the long short-term memory recurrent neural network (LSTM).
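The idea of treating same-card transactions as a sequence rather than as isolated records can be illustrated with a minimal Python sketch. It is not the authors' preprocessing: the `Transaction` fields, the two per-step features (amount and time since the previous transaction), and the rule that labels a sequence as fraudulent if any of its transactions is fraudulent are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Transaction(NamedTuple):
    # Hypothetical field names; the chapter's actual features are not given here.
    card_id: str
    timestamp: int   # e.g. seconds since some reference time
    amount: float
    is_fraud: bool


def group_by_card(transactions: List[Transaction]) -> Dict[str, List[Transaction]]:
    """Collect each card's transactions into one time-ordered sequence."""
    sequences: Dict[str, List[Transaction]] = defaultdict(list)
    for tx in transactions:
        sequences[tx.card_id].append(tx)
    for seq in sequences.values():
        seq.sort(key=lambda tx: tx.timestamp)
    return dict(sequences)


def to_model_input(seq: List[Transaction]) -> Tuple[List[List[float]], int]:
    """Turn one card's sequence into per-step features and a sequence label.

    Each step is [amount, seconds since the previous transaction].
    The label is 1 if any transaction in the sequence is fraudulent
    (an assumed labelling rule, chosen only for this sketch).
    """
    steps: List[List[float]] = []
    prev_time = seq[0].timestamp
    for tx in seq:
        steps.append([tx.amount, float(tx.timestamp - prev_time)])
        prev_time = tx.timestamp
    label = int(any(tx.is_fraud for tx in seq))
    return steps, label
```

A sequence model such as an LSTM would consume the `steps` list one element at a time, whereas a static classifier such as a feedforward network or an SVM would see each transaction (or a fixed-length summary) in isolation.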


Support Vector Machine; Mean Square Error; Receiver Operating Characteristic Curve; False Alarm Rate; Recurrent Neural Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Bénard Wiese (1)
  • Christian Omlin (2)
  1. Intelligent Systems Group, Department of Computer Science, University of the Western Cape, Cape Town, South Africa
  2. Middle East Technical University, Northern Cyprus Campus, Kalkanli, Güzelyurt, KKTC, Mersin 10, Turkey
