Abstract
In this paper we present a hybrid method for identifying suspicious behavior in transactional data that combines techniques from outlier detection and subgroup discovery. Most existing outlier detection approaches identify individual outlying records, without describing them and without grouping them. However, when searching for fraud, it is important to analyze data not at the level of single records but at a higher, group level, such as the sets of records belonging to a customer or a shop. Our method analyzes data at this higher level and additionally provides descriptions of the groups of outliers it finds.
The method involves three steps: scoring individual records with a newly proposed outlier measure that is calculated with the help of random forests; identifying unusual groups of records using subgroup discovery techniques; and, finally, identifying the most deviating entities, such as shops or hospitals.
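The last two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the random-forest outlier measure, the subgroup quality function, or the entity ranking, so everything below is an assumption. The per-record scores are simply given as input (standing in for step one), the quality function is a common mean-shift heuristic, entities are ranked by the share of their records covered by the best subgroup, and all names and data (`RECORDS`, `subgroup_quality`, `best_subgroup`, `rank_entities`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical toy data. "score" stands in for the per-record outlier score
# of step one; it is given here, not computed with random forests.
RECORDS = [
    {"shop": "A", "product": "x", "payment": "card", "score": 0.1},
    {"shop": "A", "product": "y", "payment": "card", "score": 0.2},
    {"shop": "A", "product": "x", "payment": "cash", "score": 0.1},
    {"shop": "B", "product": "z", "payment": "cash", "score": 0.9},
    {"shop": "B", "product": "z", "payment": "cash", "score": 0.8},
    {"shop": "B", "product": "x", "payment": "card", "score": 0.2},
    {"shop": "C", "product": "y", "payment": "card", "score": 0.1},
    {"shop": "C", "product": "z", "payment": "card", "score": 0.7},
]


def subgroup_quality(records, attr, value):
    """Assumed quality measure: sqrt(n) * (subgroup mean - overall mean)."""
    overall = mean(r["score"] for r in records)
    covered = [r["score"] for r in records if r[attr] == value]
    return sqrt(len(covered)) * (mean(covered) - overall)


def best_subgroup(records, attrs):
    """Step two (simplified): search single attribute=value conditions."""
    candidates = {(a, r[a]) for a in attrs for r in records}
    return max(candidates, key=lambda c: subgroup_quality(records, *c))


def rank_entities(records, subgroup, entity_key):
    """Step three (assumed): rank entities by their share of covered records."""
    attr, value = subgroup
    total = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        total[r[entity_key]] += 1
        hits[r[entity_key]] += r[attr] == value
    shares = {e: hits[e] / total[e] for e in total}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


subgroup = best_subgroup(RECORDS, ["product", "payment"])
ranking = rank_entities(RECORDS, subgroup, "shop")
print("most unusual subgroup:", subgroup)
print("entities by share of records in it:", ranking)
```

The condition that describes the subgroup (here a single attribute-value pair) is what makes the result interpretable: it tells an analyst which kind of record is unusual, and the ranking tells them where such records concentrate.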
Keywords
- subgroup discovery
- outlier detection
- fraud detection
- random forests
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Konijn, R.M., Kowalczyk, W. (2012). Hunting for Fraudsters in Random Forests. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28942-2_16
DOI: https://doi.org/10.1007/978-3-642-28942-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28941-5
Online ISBN: 978-3-642-28942-2
eBook Packages: Computer Science (R0)
