Community Distribution Outlier Detection in Heterogeneous Information Networks

  • Manish Gupta
  • Jing Gao
  • Jiawei Han
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)


Heterogeneous networks are ubiquitous. For example, bibliographic data, social data, medical records, movie data and many more can be modeled as heterogeneous networks. Rich information associated with multi-typed nodes in heterogeneous networks motivates us to propose a new definition of outliers, which is different from those defined for homogeneous networks. In this paper, we propose the novel concept of Community Distribution Outliers (CDOutliers) for heterogeneous information networks, which are defined as objects whose community distribution does not follow any of the popular community distribution patterns. We extract such outliers using a type-aware joint analysis of multiple types of objects. Given community membership matrices for all types of objects, we follow an iterative two-stage approach which performs pattern discovery and outlier detection in a tightly integrated manner. We first propose a novel outlier-aware approach based on joint non-negative matrix factorization to discover popular community distribution patterns for all the object types in a holistic manner, and then detect outliers based on such patterns. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting community distribution outliers.
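The iterative two-stage idea described above can be illustrated with a small sketch. This is not the method of the paper: the joint non-negative matrix factorization over all object types is replaced here by plain k-means-style centroids computed for a single object type, and every name (`discover_patterns`, `outlier_scores`, `detect_outliers`, the toy `rows` data) is hypothetical. It only shows how pattern discovery and outlier detection might alternate, with the current outliers hidden from the pattern discovery step.

```python
import math

def distance(p, q):
    """Euclidean distance between two community distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(rows):
    """Component-wise mean of a non-empty list of distributions."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def discover_patterns(rows, patterns, iters=20):
    """Stage 1 (stand-in for joint NMF): refine patterns as k-means centroids."""
    for _ in range(iters):
        groups = [[] for _ in patterns]
        for r in rows:
            nearest = min(range(len(patterns)),
                          key=lambda j: distance(r, patterns[j]))
            groups[nearest].append(r)
        # Keep the old pattern when no row is assigned to it.
        patterns = [centroid(g) if g else p for g, p in zip(groups, patterns)]
    return patterns

def outlier_scores(rows, patterns):
    """Stage 2: score each object by the distance to its closest pattern."""
    return [min(distance(r, p) for p in patterns) for r in rows]

def detect_outliers(rows, init_patterns, n_outliers, rounds=5):
    """Alternate the two stages, excluding current outliers from stage 1."""
    outliers = set()
    patterns = init_patterns
    scores = []
    for _ in range(rounds):
        inliers = [r for i, r in enumerate(rows) if i not in outliers]
        patterns = discover_patterns(inliers, patterns)
        scores = outlier_scores(rows, patterns)
        ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
        new_outliers = set(ranked[:n_outliers])
        if new_outliers == outliers:
            break  # the outlier set has stabilized
        outliers = new_outliers
    return sorted(outliers), patterns, scores

# Toy community membership rows over three communities (made-up data).
rows = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.00],
    [0.85, 0.15, 0.00],
    [0.00, 0.10, 0.90],
    [0.00, 0.20, 0.80],
    [0.05, 0.10, 0.85],
    [0.34, 0.33, 0.33],  # follows neither popular pattern
]
outliers, patterns, scores = detect_outliers(rows, [rows[0], rows[3]], n_outliers=1)
print(outliers)  # prints [6]
```

In this toy run the last object is flagged because its distribution is spread almost evenly across communities, while the popular patterns concentrate on the first or the third community. Excluding it in the second round also moves the first pattern back to the mean of the three objects that actually follow it, which is the motivation for making pattern discovery outlier-aware.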


Keywords: Outlier Detection, Heterogeneous Network, Cluster Centroid, Pattern Discovery, Community Distribution
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Manish Gupta (1)
  • Jing Gao (2)
  • Jiawei Han (3)
  1. Microsoft, India
  2. SUNY, Buffalo, USA
  3. UIUC, USA
