Journal of Intelligent Information Systems, Volume 39, Issue 1, pp 59–85

Community-based anomaly detection in evolutionary networks

  • Zhengzhang Chen
  • William Hendrix
  • Nagiza F. Samatova

Abstract

Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done on detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these approaches has considered an important property of evolutionary networks: their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm that uses a proposed representative-based technique to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We compare the algorithm against a non-representative-based baseline on synthetic networks, where it achieves an 11- to 46-fold runtime speedup. We have also applied the algorithm to two real-world evolutionary networks, Food Web and Enron Email, and detected significant and informative community-based anomaly dynamics in both.
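
To make the six anomaly types concrete, the following is a minimal illustrative sketch, not the representative-based algorithm of the paper. It assumes a deliberately naive rule that is not taken from the article: two communities in consecutive snapshots are treated as related whenever they share at least one node. The function name `classify_changes` and the toy snapshots are hypothetical.

```python
from typing import Dict, FrozenSet, List

Community = FrozenSet[str]


def classify_changes(prev: List[Community], curr: List[Community]) -> Dict[str, list]:
    """Label community changes between two snapshots.

    Assumption (not from the paper): communities are related if they share a node.
    """
    events: Dict[str, list] = {
        k: [] for k in ("grown", "shrunken", "merged", "split", "born", "vanished")
    }
    # For each community, list the overlapping communities in the other snapshot.
    successors = {p: [c for c in curr if p & c] for p in prev}
    predecessors = {c: [p for p in prev if p & c] for c in curr}

    for p, succ in successors.items():
        if not succ:
            events["vanished"].append(p)
        elif len(succ) > 1:
            events["split"].append((p, succ))
        else:
            c = succ[0]
            # One-to-one match: compare by strict set containment.
            if len(predecessors[c]) == 1:
                if p < c:
                    events["grown"].append((p, c))
                elif c < p:
                    events["shrunken"].append((p, c))

    for c, pred in predecessors.items():
        if not pred:
            events["born"].append(c)
        elif len(pred) > 1:
            events["merged"].append((pred, c))
    return events


# Toy snapshots (hypothetical data) exercising each of the six types once.
prev = [frozenset("ab"), frozenset("def"), frozenset("gh"),
        frozenset("ij"), frozenset("klmn"), frozenset("op")]
curr = [frozenset("abc"), frozenset("de"), frozenset("ghij"),
        frozenset("kl"), frozenset("mn"), frozenset("qr")]
events = classify_changes(prev, curr)
```

This pairwise comparison costs time proportional to the product of the two snapshot sizes, which is the kind of exhaustive matching a representative-based technique is meant to avoid.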

Keywords

Anomaly detection; Time-varying graphs; Evolutionary analysis; Community detection; Community-based anomaly

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Zhengzhang Chen (1, 2)
  • William Hendrix (1)
  • Nagiza F. Samatova (1, 2)
  1. Department of Computer Science, North Carolina State University, Raleigh, USA
  2. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, USA
