Detection of Fraud Symptoms in the Retail Industry
Data mining is one of the most effective methods for fraud detection. This is highlighted by 25% of the organizations that have suffered from economic crimes. This paper presents a case study using real-world data from a large retail company. We identify symptoms of fraud by looking for outliers. To identify the outliers and the context in which they appear, we learn a regression tree. For a given node, we identify the outliers using the set of examples covered at that node, and we define the context as the conjunction of the conditions on the path from the root to that node. Surprisingly, we observe that some outliers disappear and new ones appear at different nodes of the tree. From the business point of view, the outliers detected near the leaves of the tree are the most suspicious ones. These cases are difficult to detect, since they are observed only in a given context, defined by the set of rules associated with the node.
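The node-wise procedure described above can be sketched in Python. The abstract does not specify the tree learner or the outlier test, so this is a minimal illustration under stated assumptions, not the authors' implementation: a small CART-style regression tree that splits on variance reduction, and Tukey's boxplot rule (1.5 * IQR) applied to the target at each node. All function names (`grow`, `best_split`, `tukey_outliers`, `contextual_outliers`) and the toy data are hypothetical.

```python
"""Sketch: contextual outliers at every node of a regression tree.

Assumptions (hypothetical, not taken from the paper): a minimal CART-style
regression tree that splits on variance reduction, and Tukey's boxplot rule
(1.5 * IQR) applied to the target as the per-node outlier test.
"""
from dataclasses import dataclass, field
from statistics import quantiles


@dataclass
class Node:
    rows: list      # indices of the examples covered by this node
    context: tuple  # conditions on the path from the root to this node
    children: list = field(default_factory=list)


def sse(ys):
    """Sum of squared deviations from the mean (the node's impurity)."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def tukey_outliers(values, k=1.5):
    """Positions of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:  # too few points for meaningful quartiles
        return []
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def best_split(X, y, rows, min_leaf):
    """Find the (feature, threshold) split that most reduces the SSE."""
    parent = sse([y[r] for r in rows])
    best = None
    for j in range(len(X[rows[0]])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({X[r][j] for r in rows})[:-1]:
            left = [r for r in rows if X[r][j] <= t]
            right = [r for r in rows if X[r][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[r] for r in left]) + sse([y[r] for r in right])
            if cost < parent and (best is None or cost < best[0]):
                best = (cost, j, t, left, right)
    return best


def grow(X, y, rows, names, context=(), depth=0, max_depth=2, min_leaf=5):
    """Recursively build the tree, recording each node's path conditions."""
    node = Node(rows, context)
    if depth < max_depth:
        split = best_split(X, y, rows, min_leaf)
        if split is not None:
            _, j, t, left, right = split
            for side, op in ((left, "<="), (right, ">")):
                cond = f"{names[j]} {op} {t}"
                node.children.append(
                    grow(X, y, side, names, context + (cond,),
                         depth + 1, max_depth, min_leaf))
    return node


def contextual_outliers(node, y):
    """Yield (context, outlier row indices) for every node, root first."""
    values = [y[r] for r in node.rows]
    yield node.context, [node.rows[i] for i in tukey_outliers(values)]
    for child in node.children:
        yield from contextual_outliers(child, y)


if __name__ == "__main__":
    # Toy data (invented): one feature, "store", with two very different
    # sales levels. The value 300 is not flagged at the root but is
    # extreme within store 0.
    store0 = [100, 102, 98, 101, 99, 100, 103, 97, 300]
    store1 = [1000, 1002, 998, 1001, 999, 1000, 1003, 997]
    y = store0 + store1
    X = [[0] for _ in store0] + [[1] for _ in store1]
    tree = grow(X, y, list(range(len(y))), names=["store"])
    for context, found in contextual_outliers(tree, y):
        print(" AND ".join(context) or "(root)", "->", found)
```

On this invented data the script prints `(root) -> []`, `store <= 0 -> [8]` and `store > 0 -> []`: row 8 (the value 300) is only flagged once the context narrows to `store <= 0`, which mirrors, in miniature, the abstract's observation that outliers can appear at deeper nodes. Swapping in a different per-node outlier test only requires replacing `tukey_outliers`.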
Keywords: Outliers, Contextual outliers, Data mining
This work was supported by the research project TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020, financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund, and by the European Commission through the project MAESTRA (ICT-2013-612944).