Cyber Warfare, pp. 287-306

Graph Mining for Cyber Security

Part of the Advances in Information Security book series (ADIS, volume 56)


How does malware propagate? How do software patches propagate? Given a set of malware samples, how can we identify all malware variants that exist in a database? Which human behaviors may lead to increased malware attacks? These are challenging problems in their own right, especially because they depend on access to extensive, field-gathered data that highlight current trends. Such datasets are increasingly easy to collect, and they are also large and highly complex. Hence data mining can play an important role in cyber-security by answering these questions in an empirical, data-driven manner. In this chapter, we discuss how related problems in cyber-security can be tackled via techniques from graph mining (specifically, mining network propagation) on large field datasets collected on millions of hosts.


Keywords: Graph Mining; Seed Node; Epidemic Threshold; Executable File; Benign File



The WINE platform data analyzed here is available for follow-on research as the reference data set WINE-2012-006. This chapter is based on work partly supported by the Army Research Laboratory under grant number W911NF-09-2-0053, the National Science Foundation under grant numbers IIS-1017415 and IIS-1353346, and the Maryland Procurement Office under contract H98230-14-C-0127.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, Virginia Tech, Blacksburg, USA
