Generic anomalous vertices detection utilizing a link prediction algorithm

Abstract

In the past decade, graph-based structures have penetrated nearly every aspect of our lives. Detecting anomalies in these networks has become increasingly important, for example in exposing infected endpoints in computer networks or identifying socialbots. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by utilizing topology-based features. The method follows the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous. We applied it to 10 networks of various scales, from a network of several dozen students to online networks with millions of vertices. In every scenario, we identified anomalous vertices with lower false positive rates and higher AUCs than other prevalent methods. Moreover, we demonstrated that the presented algorithm is generic, and that it is efficient both in revealing fake users and in disclosing influential people in social networks.
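
The core intuition, that a vertex with many improbable links is more likely to be anomalous, can be illustrated with a minimal Python sketch. This is not the two-layered meta-classifier described in the article: as a simplifying assumption, the Jaccard overlap of two neighborhoods stands in for the probability that a trained link predictor would assign to an edge, and the vertex score is simply the mean improbability of its links. All function names and the toy network are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Return an undirected adjacency map {vertex: set(neighbors)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def link_likelihood(graph, u, v):
    """Jaccard overlap of the two neighborhoods, excluding u and v.

    Assumption: a stand-in for the output of a trained link
    prediction classifier, used here only for illustration.
    """
    nu = graph[u] - {v}
    nv = graph[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def anomaly_scores(graph):
    """Score each vertex by the mean improbability of its links."""
    scores = {}
    for u, neighbors in graph.items():
        if not neighbors:
            scores[u] = 0.0
            continue
        total = sum(1.0 - link_likelihood(graph, u, v) for v in neighbors)
        scores[u] = total / len(neighbors)
    return scores


# Toy network: two tight groups of friends plus one vertex "x" that
# links into both groups without sharing any of their neighbors.
group_a = list(combinations(["a1", "a2", "a3", "a4"], 2))
group_b = list(combinations(["b1", "b2", "b3", "b4"], 2))
odd_links = [("x", "a1"), ("x", "b1")]
graph = build_graph(group_a + group_b + odd_links)

scores = anomaly_scores(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], round(scores[ranking[0]], 3))
```

In this toy network the vertex "x" receives the highest score, because neither of its links is supported by shared neighbors. A real pipeline would replace `link_likelihood` with a classifier trained on the network's own edges and would aggregate several statistics of the per-link probabilities rather than a single mean.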

Notes

  1. https://www.kaggle.com/c/socialNetwork.

  2. In a large dataset, computing dozens of features can take several hours or even several days; to avoid extremely long computations, we used only computationally efficient features (Fire et al. 2013).

  3. This is not the standard kNN acronym, but a set of weight functions defined by Cukierski et al. (2011).

  4. https://www.academia.edu.

  5. https://www.arxiv.com.

  6. https://snap.stanford.edu/data/cit-HepPh.html.

  7. https://github.com/gephi/gephi/wiki/Datasets.

  8. https://www.dblp.com.

  9. http://dblp.uni-trier.de/xml/dblp.xml.gz.

  10. https://www.flixster.com.

  11. https://www.twitter.com.

  12. https://about.twitter.com/company.

  13. We limited the crawler to a maximum of 1000 friends and followers for every profile (see Sect. 9). The limit is needed because Twitter accounts can have an unlimited number of friends and followers, which in some cases reaches several million.

  14. https://www.yelp.com.

  15. https://www.yelp.com/dataset_challenge.

  16. https://support.twitter.com/articles/18311.

  17. https://support.twitter.com/articles/15790.

  18. In a Sybil attack, the adversary controls a substantial fraction of the vertices in the system and uses them to influence and manipulate the system toward the attacker's goals.

  19. http://data4good.io/dataset.html.

  20. https://github.com/Kagandi/anomalous-vertices-detection.

References

  • Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Zaki MJ, Yu JX, Ravindran B, Pudi V (eds) Advances in Knowledge Discovery and Data Mining, vol 6119. Springer, Berlin, Heidelberg

  • Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688

  • Al Hasan M, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: SDM’06: workshop on link analysis, counter-terrorism and security

  • Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47

  • Balthrop J, Forrest S, Newman ME, Williamson MM (2004) Technological networks and the spread of computer viruses. Science 304(5670):527–529

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex networks: structure and dynamics. Phys Rep 424(4):175–308

  • Bolton RJ, Hand DJ (2002) Statistical fraud detection: a review. Stat Sci 17:235–249

  • Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: when bots socialize for fame and money. In: Proceedings of the 27th Annual Computer Security Applications Conference. ACM, pp 93–102

  • Brin S, Page L (2012) Reprint of: the anatomy of a large-scale hypertextual web search engine. Comput Netw 56(18):3825–3833

  • Cao Q, Sirivianos M, Yang X, Pregueiro T (2012) Aiding the detection of fake accounts in large scale social online services. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. USENIX Association, p 15

  • Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15

  • Cukierski W, Hamner B, Yang B (2011) Graph-based features for supervised link prediction. In: The 2011 international joint conference on neural networks (IJCNN). IEEE, pp 1237–1244

  • Douceur JR (2002) The Sybil attack. In: International workshop on peer-to-peer systems. Springer, pp 251–260

  • Eberle W, Holder L (2007) Anomaly detection in data represented as graphs. Intell Data Anal 11(6):663–689

  • Elyashar A, Fire M, Kagan D, Elovici Y (2014) Guided socialbots: infiltrating the social networks of specific organizations’ employees. AI Commun 29(1):87–106

  • Facebook (2015) Facebooks annual report 2015. https://s21.q4cdn.com/399680738/files/doc_financials/annual_reports/2015-Annual-Report.pdf. Accessed 16 Oct 2016

  • Fawcett T, Provost F (1997) Adaptive fraud detection. Data Min Knowl Discov 1(3):291–316

  • Ferrara E, Varol O, Davis C, Menczer F, Flammini A (2016) The rise of social bots. Commun ACM 59(7):96–104

  • Fire M, Guestrin C (2016) Analyzing complex network user arrival patterns and their effect on network topologies. arXiv:1603.07445

  • Fire M, Tenenboim L, Lesser O, Puzis R, Rokach L, Elovici Y (2011) Link prediction in social networks using computationally efficient topological features. In: 2011 IEEE third international conference on privacy, security, risk and trust (PASSAT) and social computing (SocialCom). IEEE, pp 73–80

  • Fire M, Katz G, Elovici Y (2012) Strangers intrusion detection-detecting spammers and fake profiles in social networks based on topology anomalies. Hum J 1(1):26–39

  • Fire M, Tenenboim-Chekina L, Puzis R, Lesser O, Rokach L, Elovici Y (2013) Computationally efficient link prediction in a variety of social networks. ACM Trans Intell Syst Technol (TIST) 5(1):10

  • Heidler R, Gamper M, Herz A, Eßer F (2014) Relationship patterns in the 19th century: the friendship network in a German boys’ school class from 1880 to 1881 revisited. Soc Netw 37:1–13

  • Hernandez D (2015) Why can’t twitter kill its bots? http://fusion.net/story/195901/twitter-bots-spam-detection/. Accessed 16 Oct 2016

  • Hofmeyr SA, Forrest S, Somayaji A (1998) Intrusion detection using sequences of system calls. J Comput Secur 6(3):151–180

  • Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016) Fraudar: bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 895–904

  • Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031

  • Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256

  • Noble CC, Cook DJ (2003) Graph-based anomaly detection. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 631–636

  • Papadimitriou P, Dasdan A, Garcia-Molina H (2010) Web graph similarity for anomaly detection. J Internet Serv Appl 1(1):19–30

  • Plante C (2014) That’s not a celebrity you’re following on twitter, it’s an assistant. http://www.theverge.com/2014/9/8/6121985/celebrity-twitter-adam-levine. Accessed 16 Oct 2016

  • Stringhini G, Kruegel C, Vigna G (2010) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference. ACM, pp 1–9

  • Strogatz SH (2001) Exploring complex networks. Nature 410(6825):268–276

  • Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graphs. In: Fifth IEEE international conference on data mining. IEEE, p 8

  • Thomas K, Grier C, Song D, Paxson V (2011) Suspended accounts in retrospect: an analysis of Twitter spam. In: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, pp 243–258

  • Vaas L (2014) Good bot, bad bot? 23 million Twitter accounts are automated. https://nakedsecurity.sophos.com/2014/08/14/good-bot-bad-bot-23-million-twitter-accounts-are-automated/. Accessed 16 Oct 2016

  • Wang XF, Chen G (2003) Complex networks: small-world, scale-free and beyond. IEEE Circuits Syst Mag 3(1):6–20

Acknowledgements

We would like to thank Carol Teegarden and Robin Levy-Stevenson for editing and proofreading this article to completion. We also thank the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery, and the Moore/Sloan Data Science Environment Project at the University of Washington for supporting this study. Finally, we would like to thank the anonymous reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Dima Kagan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (MP4 53,299 kb)

About this article

Cite this article

Kagan, D., Elovici, Y. & Fire, M. Generic anomalous vertices detection utilizing a link prediction algorithm. Soc. Netw. Anal. Min. 8, 27 (2018). https://doi.org/10.1007/s13278-018-0503-4

  • DOI: https://doi.org/10.1007/s13278-018-0503-4

Keywords

  • Anomalous Vertex
  • Classical Link Prediction
  • Anomaly Detection
  • Fake Profiles
  • Transit Friendly