Abstract
Detecting anomalies in structured graph data is becoming a critical task in many applications, such as the analysis of disease infection in communities. To date, however, no efficient method exists that can detect anomalous subgraphs, that is, subgraphs with an abnormal distribution of vertex attributes, on massive attributed graphs with millions of vertices. Here we report that this task can be solved efficiently using a recent graph cut-based formulation. In particular, the full hierarchy of anomalous subgraphs can be obtained simultaneously via the parametric flow algorithm, which allows us to impose a size constraint on anomalous subgraphs. We thoroughly examine the method on synthetic and real-world datasets of various sizes and show that it is more than five orders of magnitude faster than the state-of-the-art method and more effective at detecting anomalous subgraphs.
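The abstract does not spell out the graph cut formulation, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes each vertex carries an integer anomaly score, and that a subgraph S is chosen to maximize the sum of (score - lam) over S minus eta times the number of edges leaving S; this objective, the names `anomalous_subgraph` and `min_cut_source_side`, and the parameters `lam` and `eta` are all hypothetical. The maximizer is read off a minimum s-t cut, computed here with a plain Edmonds-Karp max flow. Sweeping `lam` one value at a time is a naive stand-in for the parametric flow algorithm, which would obtain the whole nested family in a single computation.

```python
from collections import defaultdict, deque


def min_cut_source_side(arcs, s, t):
    """Run Edmonds-Karp on integer capacities and return the set of
    vertices reachable from s in the final residual graph (this is the
    source side of a minimum s-t cut)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in arcs:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse arc exists
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # no augmenting path left
        # Walk back from t to s, then push the bottleneck along the path.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck


def anomalous_subgraph(scores, edges, lam, eta):
    """Assumed objective: maximize sum_{v in S}(scores[v] - lam)
    minus eta * (number of edges with exactly one endpoint in S)."""
    s, t = "s", "t"
    arcs = []
    for v, score in enumerate(scores):
        gain = score - lam
        if gain > 0:
            arcs.append((s, v, gain))   # cost of leaving v out of S
        elif gain < 0:
            arcs.append((v, t, -gain))  # cost of putting v into S
    for u, v in edges:
        arcs.append((u, v, eta))        # penalty for separating u and v
        arcs.append((v, u, eta))
    return min_cut_source_side(arcs, s, t) - {s}


# Toy path graph 0-1-2-3-4 in which vertices 1 and 2 have high scores.
scores = [1, 9, 8, 2, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
hierarchy = [anomalous_subgraph(scores, edges, lam, eta=1) for lam in range(10)]
```

In this toy example, raising `lam` acts as an implicit size control: the selected vertex sets shrink monotonically, so a size constraint can be met by picking the appropriate level of the hierarchy.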
Keywords
- Anomalous Subgraphs
- Maximum Flow Algorithm
- Anomalous Vertex
- Real-world Graph Datasets
- Comparison Partners
Notes
- 2. Source code is available at http://research.microsoft.com/en-us/downloads/d3adb5f7-49ea-4170-abde-ea0206b25de2/. Since the code can handle only integer parameters, we first convert every parameter to an integer by multiplying it by a constant.
Acknowledgment
The authors thank Yoshinobu Kawahara for insightful discussions. This work was partially supported by JSPS KAKENHI 26880013 and Grant-in-Aid for JSPS Fellows 26-4555.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sugiyama, M., Otaki, K. (2015). Detecting Anomalous Subgraphs on Attributed Graphs via Parametric Flow. In: Murata, T., Mineshima, K., Bekki, D. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2014. Lecture Notes in Computer Science, vol 9067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48119-6_26
DOI: https://doi.org/10.1007/978-3-662-48119-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48118-9
Online ISBN: 978-3-662-48119-6
eBook Packages: Computer Science, Computer Science (R0)