Detection of Fraud Symptoms in the Retail Industry

  • Rita P. Ribeiro
  • Ricardo Oliveira
  • João Gama
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10022)


Data mining is one of the most effective methods for fraud detection. This is highlighted by 25 % of the organizations that have suffered from economic crimes [1]. This paper presents a case study using real-world data from a large retail company. We identify symptoms of fraud by looking for outliers. To identify the outliers and the context in which they appear, we learn a regression tree. For a given node, the outliers are identified using the set of examples covered by that node, and the context is the conjunction of the conditions on the path from the root to that node. Surprisingly, we observe that at different nodes of the tree some outliers disappear and new ones appear. From the business point of view, the outliers detected near the leaves of the tree are the most suspicious ones. These cases are difficult to detect because they are observed only in a given context, defined by the set of rules associated with the node.
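The node-wise procedure described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy transactions, the field names `dept` and `payment`, the hand-written nodes (in the paper the tree is learned from data), and the boxplot rule (values beyond 1.5 times the interquartile range), which stands in for whatever outlier criterion the authors actually use.

```python
from statistics import quantiles

# Toy transactions: (department, payment method, amount). Invented data.
RECORDS = (
    [("grocery", "card", a) for a in (4, 4, 5, 5, 6, 6)]
    + [("grocery", "cash", a) for a in (1, 1, 2, 2, 2, 5)]
    + [("electronics", "card", a) for a in (30, 32, 34)]
)

# Position of each (hypothetical) descriptive field inside a record.
FIELDS = {"dept": 0, "payment": 1}

# Hand-written nodes standing in for a learned regression tree. Each context
# is the conjunction of the conditions on the path from the root to the node.
NODES = {
    "root": {},
    "dept=electronics": {"dept": "electronics"},
    "dept=grocery": {"dept": "grocery"},
    "dept=grocery AND payment=cash": {"dept": "grocery", "payment": "cash"},
}


def covered(record, context):
    """True if the record satisfies every condition of the node's context."""
    return all(record[FIELDS[name]] == value for name, value in context.items())


def iqr_outliers(indexed_amounts, k=1.5):
    """Return the ids whose amount lies outside the boxplot fences.

    The boxplot rule is an assumption of this sketch, not the paper's method.
    """
    values = [amount for _, amount in indexed_amounts]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, amount in indexed_amounts if amount < low or amount > high]


def outliers_by_node(records, nodes):
    """Detect outliers separately at each node, using only the examples it covers."""
    result = {}
    for name, context in nodes.items():
        examples = [(i, r[2]) for i, r in enumerate(records) if covered(r, context)]
        result[name] = iqr_outliers(examples)
    return result


if __name__ == "__main__":
    for name, ids in outliers_by_node(RECORDS, NODES).items():
        print(f"{name}: outlier record ids {ids}")
```

On this toy data, records 12 to 14 (the electronics purchases) are flagged at the root but not inside the electronics node, while record 11 (a cash grocery purchase of 5) is flagged only in the context "dept=grocery AND payment=cash". This mirrors, on a small scale, the effect of outliers disappearing and appearing at different nodes.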


Keywords: Outliers, Contextual outliers, Data mining



This work was supported by the research project TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020, financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund, and by the European Commission through the project MAESTRA (ICT-2013-612944).


  1. Skalak, S.: Global economic crime survey. Technical report, PwC (2014)
  2. Jans, M., Lybaert, N., Vanhoof, K.: Data mining for fraud detection: toward an improvement on internal control systems? In: 30th Annual Congress European Accounting Association (EAA 2007)
  3. Coderre, D.: Computer-Aided Fraud Prevention & Detection. Wiley, Hoboken (2009)
  4. Torgo, L.: Data Mining with R: Learning with Case Studies, 1st edn. Chapman & Hall/CRC, Boca Raton (2010)
  5. Bates, A.: Fraud risk management: developing a strategy for prevention, detection, and response. Technical report, KPMG Advisory Forensic (2006)
  6. Stulb, D., Remnitz, D.: Big risks require big data thinking: global forensic data analytics survey 2014. Technical report, EY (2014)
  7. Singh, K., Upadhyaya, S.: Outlier detection: applications and techniques. Int. J. Comput. Sci. Issues 9(3), 307–323 (2012)
  8. Kristin, R.N., Matkovsky, I.P.: Using data mining techniques for fraud detection. Technical report, SAS Institute Inc. and Federal Data Corporation (1999)
  9. Phua, C., Lee, V.C.S., Smith-Miles, K., Gayler, R.W.: A comprehensive survey of data mining-based fraud detection research. CoRR abs/1009.6119 (2010)
  10. Hawkins, D.: Identification of Outliers. Monographs on Applied Probability and Statistics. Chapman & Hall, New York (1980)
  11. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)
  12. Gupta, M., Gao, J., Aggarwal, C.C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014)
  13. Anglia Ruskin University: NuMBerS: numerical methods for biosciences students. Accessed 02 May 2016
  14. Wells, J.T.: Corporate Fraud Handbook: Prevention and Detection, 2nd edn. Wiley, Hoboken (2007)
  15. Gama, J., Carvalho, A., Faceli, K., Lorena, C., Oliveira, M.: Extração de Conhecimento de Dados - Data Mining, 1st edn. Silabo (2012)
  16. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall, New York (1984)
  17. Therneau, T., Atkinson, B., Ripley, B.: rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10 (2015)
  18. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2016)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Faculty of Sciences, University of Porto, Porto, Portugal
  2. LIAAD/INESC TEC, University of Porto, Porto, Portugal
  3. Faculty of Economics, University of Porto, Porto, Portugal
