JSAI International Symposium on Artificial Intelligence

JSAI-isAI 2014: New Frontiers in Artificial Intelligence, pp. 340-355

Detecting Anomalous Subgraphs on Attributed Graphs via Parametric Flow

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9067)

Abstract

Detecting anomalies in structured graph data is becoming a critical task for many applications, such as the analysis of disease infection in communities. To date, however, there exists no efficient method that works on massive attributed graphs with millions of vertices for detecting anomalous subgraphs with an abnormal distribution of vertex attributes. Here we report that this task can be solved efficiently using a recent graph cut-based formulation. In particular, the full hierarchy of anomalous subgraphs can be obtained simultaneously via the parametric flow algorithm, which allows us to introduce a size constraint on anomalous subgraphs. We thoroughly examine the method on synthetic and real-world datasets of various sizes and show that it is more than five orders of magnitude faster than the state-of-the-art method and more effective at detecting anomalous subgraphs.
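
The abstract does not state the objective function, so the following is only an illustrative sketch of how a graph cut can select a subgraph and how sweeping one parameter yields nested solutions. It assumes a hypothetical objective that rewards high per-vertex anomaly scores, charges a per-vertex cost `eta` (which controls size), and penalizes edges leaving the selected set with weight `lam`. All names (`select_subgraph`, `eta`, `lam`, `scores`) are assumptions made for illustration. For simplicity the sketch re-solves an independent minimum cut with Edmonds-Karp for each `eta`; it does not implement the parametric flow algorithm, which would obtain the whole family of solutions far more efficiently.

```python
from collections import defaultdict, deque

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the nodes reachable from s in the
    final residual graph (the smallest source side of a minimum cut)."""
    res = defaultdict(lambda: defaultdict(float))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0.0  # make sure the reverse residual arc exists
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # no augmenting path left
        # Find the bottleneck capacity, then push that much flow.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u

def select_subgraph(scores, edges, eta, lam):
    """Maximize sum_{v in S} (scores[v] - eta) - lam * (weight of edges
    leaving S) by reduction to a minimum s-t cut. Assumed objective."""
    s, t = "_source", "_sink"
    cap = defaultdict(dict)
    for v, c in scores.items():
        gain = c - eta
        if gain > 0:
            cap[s][v] = gain    # cost of leaving a rewarding vertex out
        elif gain < 0:
            cap[v][t] = -gain   # cost of taking an unrewarding vertex
    for u, v, w in edges:
        cap[u][v] = cap[u].get(v, 0) + lam * w
        cap[v][u] = cap[v].get(u, 0) + lam * w
    return min_cut_source_side(cap, s, t) - {s}

# Toy path graph 0-1-2-3-4-5 with made-up anomaly scores.
scores = {0: 5, 1: 4, 2: 1, 3: 0, 4: 3, 5: 0}
edges = [(i, i + 1, 1) for i in range(5)]

# Sweeping eta upward shrinks the selected set, giving a hierarchy.
hierarchy = {eta: select_subgraph(scores, edges, eta, lam=1)
             for eta in range(7)}
```

In this toy instance, raising `eta` plays the role of a size constraint: one can walk through `hierarchy` and pick the largest selected set that fits a given vertex budget.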

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. ISIR, Osaka University, Osaka, Japan
  2. JST, PRESTO, Chiyoda-ku, Japan
  3. Graduate School of Informatics, Kyoto University, Kyoto, Japan
