Generic anomalous vertices detection utilizing a link prediction algorithm

Kagan, Dima; Elovichi, Yuval; Fire, Michael

doi:10.1007/s13278-018-0503-4

Original Article
Published: 05 April 2018

Volume 8, article number 27, (2018)

Social Network Analysis and Mining

Abstract

In the past decade, graph-based structures have penetrated nearly every aspect of our lives. The detection of anomalies in these networks has become increasingly important, such as in exposing infected endpoints in computer networks or identifying socialbots. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by utilizing topology-based features. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we applied our method on 10 networks of various scales, from a network of several dozen students to online networks with millions of vertices. In every scenario, we succeeded in identifying anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is generic, and efficient both in revealing fake users and in disclosing the influential people in social networks.
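The core intuition, that a vertex whose links are mostly improbable is more likely to be anomalous, can be illustrated with a minimal sketch. This is not the two-layered meta-classifier described in the article: as a stated assumption, the trained link predictor is replaced here by the Jaccard coefficient of the endpoints' neighbor sets, and the helper names (`jaccard`, `anomaly_scores`, `build_graph`) and the toy graph are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Stand-in link likelihood (assumption): Jaccard coefficient of neighbor sets."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def anomaly_scores(adj):
    """Score each vertex by the mean improbability (1 - likelihood) of its edges."""
    scores = {}
    for u, neighbors in adj.items():
        if not neighbors:
            scores[u] = 0.0  # an isolated vertex has no links to judge
            continue
        improbabilities = [1.0 - jaccard(adj, u, v) for v in neighbors]
        scores[u] = sum(improbabilities) / len(improbabilities)
    return scores


# Toy graph: two tightly knit groups plus a vertex "x" that links into both
# groups without sharing any neighbor with the vertices it connects to.
clique_a = ["a1", "a2", "a3", "a4"]
clique_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(clique_a, 2)) + list(combinations(clique_b, 2))
edges += [("x", "a1"), ("x", "b1")]

adj = build_graph(edges)
scores = anomaly_scores(adj)
ranked = sorted(scores, key=scores.get, reverse=True)  # most suspicious first
```

In this toy graph, `x` ranks first because neither of its links is supported by a common neighbor, whereas edges inside each group share several. A learned link predictor over richer topological features would take the place of `jaccard` in a realistic setting.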

Notes

https://www.kaggle.com/c/socialNetwork.
In a large dataset, computing dozens of features can last several hours or even several days; to avoid extremely long computations we used only computationally efficient features (Fire et al. 2013).
This is not the standard kNN acronym, but a set of weight functions defined by Cukierski et al. (2011).
https://www.academia.edu.
https://www.arxiv.com.
https://snap.stanford.edu/data/cit-HepPh.html.
https://github.com/gephi/gephi/wiki/Datasets.
https://www.dblp.com.
http://dblp.uni-trier.de/xml/dblp.xml.gz.
https://www.flixster.com.
https://www.twitter.com.
https://about.twitter.com/company.
We limited the crawler to crawling a maximum of 1000 friends and followers for every profile (see Sect. 9). This limitation is due to the fact that Twitter accounts can have an unlimited number of friends and followers, which in some cases can reach several million.
https://www.yelp.com.
https://www.yelp.com/dataset_challenge.
https://support.twitter.com/articles/18311.
https://support.twitter.com/articles/15790.
A Sybil attack is when the adversary controls a substantial fraction of the vertices in the system, which are then used to influence and manipulate the system to achieve the end goals of the attacker.
http://data4good.io/dataset.html.
https://github.com/Kagandi/anomalous-vertices-detection.

References

Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Zaki MJ, Yu JX, Ravindran B, Pudi V (eds) Advances in Knowledge Discovery and Data Mining, vol 6119. Springer, Berlin, Heidelberg
Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688
Al Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: SDM’06: workshop on link analysis, counter-terrorism and security
Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47
Balthrop J, Forrest S, Newman ME, Williamson MM (2004) Technological networks and the spread of computer viruses. Science 304(5670):527–529
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex networks: structure and dynamics. Phys Rep 424(4):175–308
Bolton RJ, Hand DJ (2002) Statistical fraud detection: a review. Stat Sci 17:235–249
Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: when bots socialize for fame and money. In: Proceedings of the 27th Annual Computer Security Applications Conference. ACM, pp 93–102
Brin S, Page L (2012) Reprint of: the anatomy of a large-scale hypertextual web search engine. Comput Netw 56(18):3825–3833
Cao Q, Sirivianos M, Yang X, Pregueiro T (2012) Aiding the detection of fake accounts in large scale social online services. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. USENIX Association, p 15
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15
Cukierski W, Hamner B, Yang B (2011) Graph-based features for supervised link prediction. In: The 2011 international joint conference on neural networks (IJCNN). IEEE, pp 1237–1244
Douceur JR (2002) The Sybil attack. In: International workshop on peer-to-peer systems. Springer, pp 251–260
Eberle W, Holder L (2007) Anomaly detection in data represented as graphs. Intell Data Anal 11(6):663–689
Elyashar A, Fire M, Kagan D, Elovici Y (2014) Guided socialbots: infiltrating the social networks of specific organizations’ employees. AI Commun 29(1):87–106
Facebook (2015) Facebook's annual report 2015. https://s21.q4cdn.com/399680738/files/doc_financials/annual_reports/2015-Annual-Report.pdf. Accessed 16 Oct 2016
Fawcett T, Provost F (1997) Adaptive fraud detection. Data Min Knowl Discov 1(3):291–316
Ferrara E, Varol O, Davis C, Menczer F, Flammini A (2016) The rise of social bots. Commun ACM 59(7):96–104
Fire M, Guestrin C (2016) Analyzing complex network user arrival patterns and their effect on network topologies. arXiv:1603.07445
Fire M, Tenenboim L, Lesser O, Puzis R, Rokach L, Elovici Y (2011) Link prediction in social networks using computationally efficient topological features. In: 2011 IEEE third international conference on privacy, security, risk and trust (PASSAT) and social computing (SocialCom). IEEE, pp 73–80
Fire M, Katz G, Elovici Y (2012) Strangers intrusion detection-detecting spammers and fake profiles in social networks based on topology anomalies. Hum J 1(1):26–39
Fire M, Tenenboim-Chekina L, Puzis R, Lesser O, Rokach L, Elovici Y (2013) Computationally efficient link prediction in a variety of social networks. ACM Trans Intell Syst Technol (TIST) 5(1):10
Heidler R, Gamper M, Herz A, Eßer F (2014) Relationship patterns in the 19th century: the friendship network in a German boys’ school class from 1880 to 1881 revisited. Soc Netw 37:1–13
Hernandez D (2015) Why can’t twitter kill its bots? http://fusion.net/story/195901/twitter-bots-spam-detection/. Accessed 16 Oct 2016
Hofmeyr SA, Forrest S, Somayaji A (1998) Intrusion detection using sequences of system calls. J Comput Secur 6(3):151–180
Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016) Fraudar: bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 895–904
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031
Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Noble CC, Cook DJ (2003) Graph-based anomaly detection. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 631–636
Papadimitriou P, Dasdan A, Garcia-Molina H (2010) Web graph similarity for anomaly detection. J Internet Serv Appl 1(1):19–30
Plante C (2014) That’s not a celebrity you’re following on twitter, it’s an assistant. http://www.theverge.com/2014/9/8/6121985/celebrity-twitter-adam-levine. Accessed 16 Oct 2016
Stringhini G, Kruegel C, Vigna G (2010) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference. ACM, pp 1–9
Strogatz SH (2001) Exploring complex networks. Nature 410(6825):268–276
Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graphs. In: Fifth IEEE international conference on data mining. IEEE, p 8
Thomas K, Grier C, Song D, Paxson V (2011) Suspended accounts in retrospect: an analysis of Twitter spam. In: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, pp 243–258
Vaas L (2014) Good bot, bad bot? 23 million Twitter accounts are automated. https://nakedsecurity.sophos.com/2014/08/14/good-bot-bad-bot-23-million-twitter-accounts-are-automated/. Accessed 16 Oct 2016
Wang XF, Chen G (2003) Complex networks: small-world, scale-free and beyond. IEEE Circuits Syst Mag 3(1):6–20

Acknowledgements

We would like to thank Carol Teegarden and Robin Levy-Stevenson for editing and proofreading this article to completion. We also thank the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery, and the Moore/Sloan Data Science Environment Project at the University of Washington for supporting this study. Finally, we would like to thank the anonymous reviewers for their helpful comments.

Author information

Authors and Affiliations

Ben-Gurion University of the Negev, Beersheba, Israel
Dima Kagan & Yuval Elovichi
University of Washington, Seattle, USA
Michael Fire

Corresponding author

Correspondence to Dima Kagan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (MP4 53,299 kb)

About this article

Cite this article

Kagan, D., Elovichi, Y. & Fire, M. Generic anomalous vertices detection utilizing a link prediction algorithm. Soc. Netw. Anal. Min. 8, 27 (2018). https://doi.org/10.1007/s13278-018-0503-4

Received: 01 November 2017
Revised: 18 February 2018
Accepted: 21 March 2018
Published: 05 April 2018
DOI: https://doi.org/10.1007/s13278-018-0503-4
