Forecasting Suspicious Account Activity at Large-Scale Online Service Providers

  • Hassan Halawa
  • Konstantin Beznosov
  • Baris Coskun
  • Meizhu Liu
  • Matei Ripeanu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11598)

Abstract

In the face of large-scale automated social engineering attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of the attack and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop an early warning system that harnesses account activity traces to predict which accounts are likely to be compromised in the future. We demonstrate the feasibility and applicability of the system through an experiment at a large-scale online service provider using four months of real-world production data encompassing hundreds of millions of users. We show that, even when limited to login data only (in order to derive features with low computational cost) and to a basic model selection approach, our classifier can be tuned to achieve good classification precision when used for forecasting. Our system correctly identifies, up to one month in advance, the accounts later flagged as suspicious, with precision, recall, and false positive rates that indicate the mechanism is likely to prove valuable in operational settings to support additional layers of defense.
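
To make the tuning trade-off mentioned above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a classifier has already produced a score per account and that each account carries a binary label saying whether it was flagged as suspicious in a later time window. The function names (`confusion_counts`, `metrics`, `pick_threshold`) and the toy scores are hypothetical. The sketch picks the lowest score threshold that meets a target precision, which also gives the highest recall among the qualifying thresholds.

```python
from typing import Dict, List, Optional, Tuple


def confusion_counts(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) when accounts scoring >= threshold are predicted suspicious."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(scores: List[float], labels: List[int],
            threshold: float) -> Dict[str, float]:
    """Precision, recall, and false positive rate at a given threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


def pick_threshold(scores: List[float], labels: List[int],
                   min_precision: float) -> Optional[Tuple[float, Dict[str, float]]]:
    """Return the lowest threshold whose precision meets the target, or None.

    Recall never increases as the threshold rises, so the lowest qualifying
    threshold is also the one with the highest recall.
    """
    for threshold in sorted(set(scores)):
        m = metrics(scores, labels, threshold)
        if m["precision"] >= min_precision:
            return threshold, m
    return None


# Toy data (hypothetical): label 1 means "flagged as suspicious in the later window".
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(pick_threshold(toy_scores, toy_labels, min_precision=0.8))
```

Raising the precision target pushes the threshold up and trades away recall, which is the kind of operating-point choice an operator would make depending on how costly a false alarm is.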

Keywords

Forecasting, Machine learning for security, Big data analytics for security, Large-scale cyberattacks, Cloud security

Copyright information

© International Financial Cryptography Association 2019

Authors and Affiliations

  • Hassan Halawa (1), corresponding author
  • Konstantin Beznosov (1)
  • Baris Coskun (2)
  • Meizhu Liu (3)
  • Matei Ripeanu (1)

  1. University of British Columbia, Vancouver, Canada
  2. Amazon Web Services, New York, USA
  3. Yahoo! Research, New York, USA
