Hunting for Fraudsters in Random Forests

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7208)

Abstract

In this paper we present a hybrid method for identifying suspicious behavior in transactional data that combines techniques from outlier detection and subgroup discovery. Most existing outlier detection approaches identify single outlying records, rather than groups of them, and provide no description of the outliers they find. When searching for fraud, however, it is important to analyze data not at the level of single records but at a higher, group level, such as the sets of records belonging to a customer or a shop. Our method analyzes data at this higher level and additionally provides descriptions of the groups of outliers it finds.

The method involves three steps: scoring individual records with a newly proposed outlier measure that is calculated with the help of random forests; identifying unusual groups of records with subgroup discovery techniques; and, finally, identifying the most deviating entities, such as shops or hospitals.
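
The flow of these three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the per-record outlier scores are assumed to be given (the random-forest measure itself is not reproduced), the subgroup search is restricted to single attribute-value conditions, and the quality function (square root of subgroup size times the shift in mean score) and the sum-based entity ranking are assumptions chosen for illustration. All record fields and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy records. "score" stands in for step 1: it is assumed to be
# an outlier score already computed per record (higher = more unusual).
records = [
    {"shop": "A", "product": "x", "weekday": "sat", "score": 0.9},
    {"shop": "A", "product": "x", "weekday": "sun", "score": 0.8},
    {"shop": "B", "product": "x", "weekday": "mon", "score": 0.7},
    {"shop": "B", "product": "y", "weekday": "mon", "score": 0.1},
    {"shop": "C", "product": "y", "weekday": "sat", "score": 0.2},
    {"shop": "C", "product": "y", "weekday": "sun", "score": 0.1},
    {"shop": "A", "product": "y", "weekday": "mon", "score": 0.2},
    {"shop": "C", "product": "x", "weekday": "mon", "score": 0.6},
]


def subgroup_quality(sub_scores, overall_mean):
    """Assumed quality measure: sqrt(size) * (subgroup mean - overall mean)."""
    n = len(sub_scores)
    return sqrt(n) * (sum(sub_scores) / n - overall_mean)


def best_subgroup(records, attributes):
    """Step 2 (simplified): find the single attribute=value condition whose
    records have the most unusually high scores."""
    overall_mean = sum(r["score"] for r in records) / len(records)
    groups = defaultdict(list)
    for r in records:
        for attr in attributes:
            groups[(attr, r[attr])].append(r["score"])
    return max(groups, key=lambda k: subgroup_quality(groups[k], overall_mean))


def rank_entities(records, condition, entity_key):
    """Step 3 (simplified): rank entities by their total score inside the subgroup."""
    attr, value = condition
    totals = defaultdict(float)
    for r in records:
        if r[attr] == value:
            totals[r[entity_key]] += r["score"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


condition = best_subgroup(records, ["product", "weekday"])
ranking = rank_entities(records, condition, "shop")
print(condition, ranking)
```

The subgroup condition doubles as the description of the outlying group, and the ranking points to the entities that contribute most to it.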

Keywords

  • subgroup discovery
  • outlier detection
  • fraud detection
  • random forests

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Konijn, R.M., Kowalczyk, W. (2012). Hunting for Fraudsters in Random Forests. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28942-2_16

  • DOI: https://doi.org/10.1007/978-3-642-28942-2_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28941-5

  • Online ISBN: 978-3-642-28942-2

  • eBook Packages: Computer Science, Computer Science (R0)