# Community-based anomaly detection in evolutionary networks

## Abstract

Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has addressed the detection of community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these approaches has considered an important property of evolutionary networks: their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm that uses a proposed representative-based technique to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We measure the performance of the algorithm by comparing it to a non-representative-based baseline on synthetic networks; in these experiments, our algorithm achieves a runtime speedup of 11–46 times over the baseline. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email, and detected significant and informative community-based anomaly dynamics in both cases.
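
To make the six anomaly types concrete, the following is a minimal illustrative sketch and not the representative-based algorithm of the paper. It assumes each snapshot's communities are given as sets of node identifiers, and it labels changes between two consecutive snapshots with simple subset and overlap rules. The function name `classify_changes` and the rules themselves are assumptions introduced here for illustration only.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Community = FrozenSet[str]
Event = Tuple[str, Community]


def classify_changes(prev: Iterable[Set[str]], curr: Iterable[Set[str]]) -> List[Event]:
    """Label community changes between two snapshots (hypothetical rules).

    Assumed rules, chosen only for illustration:
      vanished: a previous community shares no node with any current one
      split:    a previous community properly contains >= 2 current ones
      shrunken: a previous community properly contains exactly 1 current one
      born:     a current community shares no node with any previous one
      merged:   a current community properly contains >= 2 previous ones
      grown:    a current community properly contains exactly 1 previous one
    """
    prev_c = [frozenset(c) for c in prev]
    curr_c = [frozenset(c) for c in curr]
    events: List[Event] = []

    for p in prev_c:
        if p in curr_c:
            continue  # unchanged community, not an anomaly
        if not any(p & c for c in curr_c):
            events.append(("vanished", p))
            continue
        parts = [c for c in curr_c if c < p]  # proper subsets of p
        if len(parts) >= 2:
            events.append(("split", p))
        elif len(parts) == 1:
            events.append(("shrunken", p))

    for c in curr_c:
        if c in prev_c:
            continue
        if not any(c & p for p in prev_c):
            events.append(("born", c))
            continue
        parts = [p for p in prev_c if p < c]  # previous communities absorbed by c
        if len(parts) >= 2:
            events.append(("merged", c))
        elif len(parts) == 1:
            events.append(("grown", c))

    return events


# Toy snapshots with made-up node names.
snapshot_t = [{"a", "b", "c"}, {"d", "e"}, {"f", "g"},
              {"h", "i", "j", "k"}, {"x", "y"}, {"m", "n", "o"}]
snapshot_t1 = [{"a", "b", "c", "z"}, {"d", "e", "f", "g"},
               {"h", "i"}, {"j", "k"}, {"m", "n"}, {"q", "r"}]

for label, community in classify_changes(snapshot_t, snapshot_t1):
    print(label, sorted(community))
```

Each event is reported on the side where the change is visible: split, shrunken, and vanished refer to a community of the earlier snapshot, while merged, grown, and born refer to a community of the later one. Partial overlaps that fit none of these rules are left unlabeled in this sketch.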

### Keywords

Anomaly detection, Time-varying graphs, Evolutionary analysis, Community detection, Community-based anomaly

## Notes

### Acknowledgements

The authors would like to thank Matthew C. Schmidt for his maximal clique enumeration program code, and we would like to thank Kevin A. Wilson and Ye Jin for valuable discussions.
