Abstract
In the past decade, graph-based structures have penetrated nearly every aspect of our lives. The detection of anomalies in these networks has become increasingly important, such as in exposing infected endpoints in computer networks or identifying socialbots. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by utilizing topology-based features. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we applied our method on 10 networks of various scales, from a network of several dozen students to online networks with millions of vertices. In every scenario, we succeeded in identifying anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is generic, and efficient both in revealing fake users and in disclosing the influential people in social networks.
Similar content being viewed by others
Notes
In a large dataset, computing dozens of features can last several hours or even several days; to avoid extremely long computations we used only computationally efficient features (Fire et al. 2013).
This not the standard kNN acronym, but a set of weight functions defined by Cukierski et al. (2011).
We limited the crawler to crawling a maximum of 1000 friends and followers for every profile (see Sect. 9). This limitation is due to the fact that Twitter accounts can have an unlimited number of friends and followers, which in some cases can reach several million.
A Sybil attack is when the adversary controls a substantial fraction of the vertices in the system, which are then used to influence and manipulate the system to achieve the end goals of the attacker.
References
Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Zaki MJ, Yu JX, Ravindran B, Pudi V (eds) Advances in Knowledge Discovery and Data Mining, vol 6119. Springer, Berlin, Heidelberg
Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688
Al Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: SDM’06: workshop on link analysis, counter-terrorism and security
Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47
Balthrop J, Forrest S, Newman ME, Williamson MM (2004) Technological networks and the spread of computer viruses. Science 304(5670):527–529
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex networks: structure and dynamics. Phys Rep 424(4):175–308
Bolton RJ, Hand DJ (2002) Statistical fraud detection: a review. Stat sci 17:235–249
Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: when bots socialize for fame and money. In: Proceedings of the 27th Annual Computer Security Applications Conference. ACM, pp 93–102
Brin S, Page L (2012) Reprint of: the anatomy of a large-scale hypertextual web search engine. Comput Netw 56(18):3825–3833
Cao Q, Sirivianos M, Yang X, Pregueiro T (2012) Aiding the detection of fake accounts in large scale social online services. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. USENIX Association, p 15
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15
Cukierski W, Hamner B, Yang B (2011) Graph-based features for supervised link prediction. In: The 2011 international joint conference on neural networks (IJCNN). IEEE, pp 1237–1244
Douceur JR (2002) The Sybil attack. In: International workshop on peer-to-peer systems. Springer, pp 251–260
Eberle W, Holder L (2007) Anomaly detection in data represented as graphs. Intell Data Anal 11(6):663–689
Elyashar A, Fire M, Kagan D, Elovici Y (2014) Guided socialbots: infiltrating the social networks of specific organizations’ employees. AI Commun 29(1):87–106
Facebook (2015) Facebooks annual report 2015. https://s21.q4cdn.com/399680738/files/doc_financials/annual_reports/2015-Annual-Report.pdf. Accessed 16 Oct 2016
Fawcett T, Provost F (1997) Adaptive fraud detection. Data Min Knowl Discov 1(3):291–316
Ferrara E, Varol O, Davis C, Menczer F, Flammini A (2016) The rise of social bots. Commun ACM 59(7):96–104
Fire M, Guestrin C (2016) Analyzing complex network user arrival patterns and their effect on network topologies. arXiv:160307445
Fire M, Tenenboim L, Lesser O, Puzis R, Rokach L, Elovici Y (2011) Link prediction in social networks using computationally efficient topological features. In: 2011 IEEE third international conference on privacy, security, risk and trust (PASSAT) and social computing (SocialCom). IEEE, pp 73–80
Fire M, Katz G, Elovici Y (2012) Strangers intrusion detection-detecting spammers and fake profiles in social networks based on topology anomalies. Hum J 1(1):26–39
Fire M, Tenenboim-Chekina L, Puzis R, Lesser O, Rokach L, Elovici Y (2013) Computationally efficient link prediction in a variety of social networks. ACM Trans Intell Syst Technol (TIST) 5(1):10
Heidler R, Gamper M, Herz A, Eßer F (2014) Relationship patterns in the 19th century: the friendship network in a German boys’ school class from 1880 to 1881 revisited. Soc Netw 37:1–13
Hernandez D (2015) Why can’t twitter kill its bots? http://fusion.net/story/195901/twitter-bots-spam-detection/. Accessed 16 Oct 2016
Hofmeyr SA, Forrest S, Somayaji A (1998) Intrusion detection using sequences of system calls. J Comput Secur 6(3):151–180
Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016) Fraudar: bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 895–904
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031
Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Noble CC, Cook DJ (2003) Graph-based anomaly detection. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 631–636
Papadimitriou P, Dasdan A, Garcia-Molina H (2010) Web graph similarity for anomaly detection. J Internet Serv Appl 1(1):19–30
Plante C (2014) That’s not a celebrity you’re following on twitter, it’s an assistant. http://www.theverge.com/2014/9/8/6121985/celebrity-twitter-adam-levine. Accessed 16 Oct 2016
Stringhini G, Kruegel C, Vigna G (2010) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference. ACM, pp 1–9
Strogatz SH (2001) Exploring complex networks. Nature 410(6825):268–276
Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graphs. In: Fifth IEEE international conference on data mining. IEEE, p 8
Thomas K, Grier C, Song D, Paxson V (2011) Suspended accounts in retrospect: an analysis of Twitter spam. In: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, pp 243–258
Vaas L (2014) Good bot, bad bot? 23 million Twitter accounts are automated. https://nakedsecurity.sophos.com/2014/08/14/good-bot-bad-bot-23-million-twitter-accounts-are-automated/. Accessed 16 Oct 2016
Wang XF, Chen G (2003) Complex networks: small-world, scale-free and beyond. IEEE Circuits Syst Mag 3(1):6–20
Acknowledgements
We would like to thank Carol Teegarden and Robin Levy-Stevenson for editing and proofreading this article to completion. We also thank the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery, and the Moore/Sloan Data Science Environment Project at the University of Washington for supporting this study. Finally, we would like to thank the anonymous reviewers for their helpful comments.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (MP4 53,299 kb)
Rights and permissions
About this article
Cite this article
Kagan, D., Elovichi, Y. & Fire, M. Generic anomalous vertices detection utilizing a link prediction algorithm. Soc. Netw. Anal. Min. 8, 27 (2018). https://doi.org/10.1007/s13278-018-0503-4
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-018-0503-4