
Facebook Inspector (FbI): Towards automatic real-time detection of malicious content on Facebook

  • Prateek Dewan
  • Ponnurangam Kumaraguru
Original Article

Abstract

Online Social Networks witness a rise in user activity whenever a major event makes news. Cyber criminals exploit this surge in user engagement to spread malicious content that compromises system reputation, causes financial losses, and degrades user experience. In this paper, we collect and characterize a dataset of 4.4 million public posts generated on Facebook during 17 news-making events (natural calamities, sports, terror attacks, etc.) over a 16-month period. From this dataset, we filter out two sets of malicious posts, one using URL blacklists and another using human annotations. Our observations reveal characteristic differences between the malicious posts obtained by the two methodologies, motivating a twofold filtering process for a more complete and robust filtering system. We empirically confirm the need for this twofold approach by cross-validating supervised learning models trained on the two sets of malicious posts; these models include Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine classifiers. Based on this learning, we implement Facebook Inspector, a REST API-based browser plug-in for identifying malicious Facebook posts in real time. Facebook Inspector combines class probabilities from two independent Random Forest models, each trained on a set of 44 publicly available features and each achieving an accuracy of over 80%. During the first 9 months of its public deployment (August 2015–May 2016), Facebook Inspector processed 0.97 million posts at an average response time of 2.6 s per post and was downloaded over 2500 times. We also evaluate Facebook Inspector in terms of performance and usability to identify further scope for improvement.
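To make the two-model design concrete, the following Python sketch (assuming scikit-learn) illustrates how class probabilities from two independently trained Random Forest classifiers could be combined to flag a post. The feature vectors, labels, decision threshold, and max-probability combination rule are illustrative assumptions, not the authors' exact pipeline.

    # Illustrative sketch, not the paper's exact implementation: two Random
    # Forest models, one per labeling methodology (URL blacklists vs. human
    # annotations), whose malicious-class probabilities are combined.
    # The random training data and the 0.5 threshold are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Placeholder training sets: 44 public features per post (see abstract),
    # with binary labels (1 = malicious, 0 = benign).
    X_blacklist, y_blacklist = rng.random((500, 44)), rng.integers(0, 2, 500)
    X_annotated, y_annotated = rng.random((500, 44)), rng.integers(0, 2, 500)

    # One independent model per labeling methodology.
    model_blacklist = RandomForestClassifier(n_estimators=100, random_state=0)
    model_annotated = RandomForestClassifier(n_estimators=100, random_state=0)
    model_blacklist.fit(X_blacklist, y_blacklist)
    model_annotated.fit(X_annotated, y_annotated)

    def is_malicious(post_features, threshold=0.5):
        """Flag a post if either model assigns a high malicious-class probability."""
        x = post_features.reshape(1, -1)
        p_bl = model_blacklist.predict_proba(x)[0, 1]  # P(malicious | blacklist model)
        p_an = model_annotated.predict_proba(x)[0, 1]  # P(malicious | annotation model)
        return max(p_bl, p_an) >= threshold

    print(is_malicious(rng.random(44)))

Taking the maximum of the two malicious-class probabilities reflects the abstract's observation that blacklist-derived and human-annotated malicious posts differ in character: a post missed by one model can still be caught by the other.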

Keywords

Facebook · Malicious content · Machine learning · Real-time system

Acknowledgements

We would like to thank Manik Panwar for helping with the development of Facebook Inspector and Bhavna Nagpal for helping with conducting the usability survey. We would also like to thank the members of Precog Research Group at IIIT-Delhi for their constant support and feedback.


Copyright information

© Springer-Verlag Wien 2017

Authors and Affiliations

  1. Precog, Indraprastha Institute of Information Technology - Delhi (IIITD), New Delhi, India