Identifying Threats Using Graph-based Anomaly Detection

  • William Eberle
  • Lawrence Holder
  • Diane Cook

Much of the data collected during the monitoring of cyber and other infrastructures is structural in nature, consisting of various types of entities and the relationships between them. Detecting threatening anomalies in such data is crucial to protecting these infrastructures. We present an approach to detecting anomalies in a graph-based representation of such data that explicitly represents these entities and relationships. The approach first finds normative patterns in the data using graph-based data mining and then searches for small, unexpected deviations from these normative patterns, on the assumption that illicit behavior tries to mimic legitimate, normative behavior. The approach is evaluated using several synthetic and real-world datasets. Results show that the approach has high true-positive rates and low false-positive rates, and is capable of detecting complex structural anomalies in real-world domains including email communications, cellphone calls, and network traffic.
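The two-step idea described above (find a normative pattern, then look for small deviations from it) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. As simplifying assumptions, each graph is reduced to a set of labeled edges, the "normative pattern" is simply the most frequent edge set, and a "small deviation" is a graph that differs from it by at most a hypothetical `max_changes` edge additions or removals. All function names and the toy data are invented for illustration.

```python
from collections import Counter

# Assumption: each graph is a frozenset of labeled edges,
# written as (source_label, edge_label, target_label) triples.

def normative_pattern(graphs):
    """Return the most common edge set, a crude stand-in for a mined normative pattern."""
    return Counter(graphs).most_common(1)[0][0]

def edge_difference(graph, pattern):
    """Count the edges that would have to be added or removed to turn graph into pattern."""
    return len(graph ^ pattern)

def small_deviations(graphs, max_changes=1):
    """Return (index, difference) pairs for graphs that nearly, but not exactly,
    match the normative pattern."""
    pattern = normative_pattern(graphs)
    flagged = []
    for i, g in enumerate(graphs):
        d = edge_difference(g, pattern)
        if 0 < d <= max_changes:
            flagged.append((i, d))
    return flagged

# Toy data (hypothetical): three normal sessions, one with a single extra edge,
# and one that is structurally unrelated to the normative pattern.
normal = frozenset({("user", "logs_into", "host"), ("host", "connects_to", "server")})
extra = normal | {("host", "connects_to", "unknown")}
unrelated = frozenset({("printer", "sends", "report")})
graphs = [normal, normal, normal, extra, unrelated]

flagged = small_deviations(graphs)
print(flagged)
```

In this toy setup only the graph with one extra edge is flagged. The unrelated graph is ignored because it differs from the normative pattern by more than `max_changes` edges, which mirrors the assumption that illicit behavior stays close to normative behavior rather than looking entirely different.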


Keywords: Intrusion Detection, Anomaly Detection, Normative Pattern, Minimum Description Length, Frequent Subgraph





Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  1. Department of Computer Science, Tennessee Technological University, Cookeville
  2. School of Electrical Engineering and Computer Science, Washington State University, Pullman
