Data Mining and Knowledge Discovery, Volume 25, Issue 3, pp 577–602

Mining blackhole and volcano patterns in directed graphs: a general approach

Article

Abstract

Given a directed graph, the problem of blackhole mining is to identify groups of nodes, called blackhole patterns, such that the average in-weight of the group is significantly larger than its average out-weight. Finding volcano patterns is the dual problem of mining blackhole patterns, so we focus on discovering blackhole patterns. In this article, we develop a generalized blackhole mining framework. Specifically, we design two pruning schemes that lower the computational cost by reducing both the number of candidate patterns and the average computation cost per candidate pattern. The first pruning scheme exploits the concept of combination dominance to shrink the exponentially growing search space; based on this scheme, we develop the gBlackhole algorithm. The second pruning scheme, by contrast, is an approximate approach, named approxBlackhole, which strikes a balance between the efficiency and the completeness of blackhole mining. Finally, experimental results on real-world data show that approxBlackhole can be several orders of magnitude faster than gBlackhole, and that both have large computational advantages over the brute-force approach. We also show that the blackhole mining algorithm can be used to capture some suspicious financial fraud patterns.
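To make the problem statement concrete, the following is a minimal Python sketch of the brute-force baseline that the abstract mentions, not the gBlackhole or approxBlackhole algorithms. Several details are assumptions made for illustration because the abstract does not specify them: the in-weight of a group is taken as the total weight of edges entering it from outside, the out-weight as the total weight of edges leaving it, "average" means dividing by the group size, and "significantly larger" is modeled by a hypothetical threshold `theta`. The function names and the toy graph are also hypothetical.

```python
from itertools import combinations


def group_weights(edges, group):
    """Return (in_weight, out_weight) for a node group.

    Assumption: only edges crossing the group boundary are counted;
    edges between two members of the group are ignored.
    """
    group = set(group)
    in_w = sum(w for (u, v, w) in edges if u not in group and v in group)
    out_w = sum(w for (u, v, w) in edges if u in group and v not in group)
    return in_w, out_w


def brute_force_blackholes(edges, nodes, max_size, theta):
    """Enumerate every node group of up to max_size nodes and keep those
    whose average in-weight exceeds the average out-weight by more than theta.

    The threshold test is an assumed reading of "significantly larger".
    """
    found = []
    for k in range(1, max_size + 1):
        for group in combinations(sorted(nodes), k):
            in_w, out_w = group_weights(edges, group)
            avg_in, avg_out = in_w / k, out_w / k
            if avg_in - avg_out > theta:
                found.append(group)
    return found


# Hypothetical toy graph: (source, target, weight) triples.
edges = [
    ("a", "x", 5.0),
    ("b", "x", 4.0),
    ("c", "y", 6.0),
    ("x", "y", 1.0),
    ("y", "x", 1.0),
    ("y", "a", 0.5),
]
nodes = {"a", "b", "c", "x", "y"}

result = brute_force_blackholes(edges, nodes, max_size=2, theta=4.0)
print(result)
```

The number of candidate groups grows combinatorially with the number of nodes and the maximum group size, which is the cost that the two pruning schemes described above aim to reduce.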

Keywords

Blackhole pattern, Volcano pattern, Financial fraud detection, Graph mining, Network structure analysis

Copyright information

© The Author(s) 2012

Authors and Affiliations

  1. Department of Management Science and Information Systems, Rutgers University, Newark, USA
  2. University of Science and Technology Beijing, Beijing, China
