Knowledge and Information Systems, Volume 48, Issue 2, pp 399–428

Inferring lockstep behavior from connectivity pattern in large graphs

  • Meng Jiang
  • Peng Cui
  • Alex Beutel
  • Christos Faloutsos
  • Shiqiang Yang
Regular Paper

Abstract

Given multimillion-node graphs such as “who-follows-whom”, “patent-cites-patent”, “user-likes-page” and “actor/director-makes-movie” networks, how can we find unexpected behaviors? When companies with monetary incentives operate on these graphs to sell Twitter “Followers” and Facebook page “Likes”, the graphs show strange connectivity patterns. In this paper, we study a complete graph from a large Twitter-style social network with up to 3.33 billion edges. We report strange deviations from typical patterns such as smooth degree distributions. We find that such deviations are often due to “lockstep behavior”, in which large groups of followers connect to the same groups of followees. Our first contribution is a study of strange patterns in the adjacency matrix and in the spectral subspaces with respect to several flavors of lockstep. We discover that (a) lockstep behavior on the graph forms a dense “block” in its adjacency matrix and creates “rays” in spectral subspaces, and (b) partially overlapping lockstep behaviors form a “staircase” in the adjacency matrix and create “pearls” in spectral subspaces. Our second contribution is a fast algorithm, which uses these discoveries as a guide for practitioners, to detect users who exhibit lockstep behavior in undirected, directed and bipartite graphs. We carry out extensive experiments on both synthetic and real datasets, as well as public datasets from IMDb and US Patent. The results demonstrate the scalability and effectiveness of our proposed algorithm.
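
The link between a dense “block” in the adjacency matrix and a distinctive signature in the spectral subspace can be illustrated with a small synthetic example. The sketch below is not the algorithm proposed in the paper; it is a toy illustration under stated assumptions. It builds a random follower-by-followee matrix, injects one dense block of lockstep followers, and ranks followers by the magnitude of their entry in the top left singular vector. The function names (`make_lockstep_graph`, `flag_followers`) and all parameter values (graph size, block size, edge probabilities, random seed) are hypothetical choices made for this illustration.

```python
import numpy as np


def make_lockstep_graph(n_followers=300, n_followees=300, block=40,
                        p_background=0.02, p_block=0.9, seed=0):
    """Random bipartite adjacency matrix with one injected dense block.

    Rows are followers, columns are followees. The first `block` followers
    connect to the first `block` followees with high probability, which
    mimics a single group acting in lockstep (all values are assumptions).
    """
    rng = np.random.default_rng(seed)
    # Sparse background: each edge present independently with p_background.
    A = (rng.random((n_followers, n_followees)) < p_background).astype(float)
    # Overwrite the top-left corner with a dense block.
    A[:block, :block] = (rng.random((block, block)) < p_block).astype(float)
    return A


def top_left_singular_vector(A):
    """Return the first left singular vector and its singular value."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, 0], s[0]


def flag_followers(A, n_flag):
    """Return the n_flag followers with the largest |entry| in the top vector."""
    u1, _ = top_left_singular_vector(A)
    order = np.argsort(-np.abs(u1))
    return {int(i) for i in order[:n_flag]}


A = make_lockstep_graph()
suspects = flag_followers(A, n_flag=40)
# True when the flagged followers are exactly the injected block.
print(sorted(suspects) == list(range(40)))
```

In this toy setting the injected block is much denser than the background, so it dominates the leading singular vector. Real graphs contain several overlapping groups and heavy-tailed degrees, so a single singular vector and a fixed cutoff would not be expected to suffice there.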

Keywords

Lockstep behavior, Connectivity pattern, Spectral subspace, Propagation method, Singular vector

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Meng Jiang (1)
  • Peng Cui (1)
  • Alex Beutel (2)
  • Christos Faloutsos (2)
  • Shiqiang Yang (1)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
