Finding the Most Descriptive Substructures in Graphs with Discrete and Numeric Labels

  • Michael Davis
  • Weiru Liu
  • Paul Miller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7765)


Many graph datasets are labelled with discrete and numeric attributes. Frequent substructure discovery algorithms usually ignore numeric attributes; in this paper we show that they can be used to improve discrimination and search performance. Our thesis is that the most descriptive substructures are those which are normative both in terms of their structure and in terms of their numeric values. We explore the relationship between graph structure and the distribution of attribute values and propose an outlier-detection step, which is used as a constraint during substructure discovery. By pruning anomalous vertices and edges, more weight is given to the most descriptive substructures. Our experiments on a real-world access control database returns similar substructures to unconstrained search with 30% fewer graph isomorphism tests.


graph mining frequent substructure discovery numeric attributes outlier detection 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)Google Scholar
  2. 2.
    Borgelt, C.: Canonical forms for frequent graph mining. In: 30th Annual Conf. German Classification Society, GfKl 2006, pp. 337–349. Springer (2006)Google Scholar
  3. 3.
    Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. SIGMOD Rec. 29(2), 93–104 (2000)CrossRefGoogle Scholar
  4. 4.
    Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: A recursive model for graph mining. In: SDM 2004. SIAM (2004)Google Scholar
  5. 5.
    Cook, D.J., Holder, L.B.: Graph-based data mining. IEEE Intelligent Systems 15, 32–41 (2000)Google Scholar
  6. 6.
    Davis, M., Liu, W., Miller, P., Redpath, G.: Detecting anomalies in graphs with numeric labels. In: CIKM 2011, pp. 1197–1202. ACM (2011)Google Scholar
  7. 7.
    Eichinger, F., Huber, M., Böhm, K.: On the usefulness of weight-based constraints in frequent subgraph mining. In: ICAI 2010, pp. 65–78. BCS SGAI (December 2010)Google Scholar
  8. 8.
    Fortin, S.: The graph isomorphism problem. Tech. rep., Univ. of Alberta (1996)Google Scholar
  9. 9.
    Huan, J., Wang, W., Prins, J., Yang, J.: SPIN: Mining maximal frequent subgraphs from graph databases. In: KDD 2004, pp. 581–586. ACM (2004)Google Scholar
  10. 10.
    Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  11. 11.
    Jiang, C., Coenen, F., Zito, M.: Frequent sub-graph mining on edge weighted graphs. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds.) DAWAK 2010. LNCS, vol. 6263, pp. 77–88. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  12. 12.
    Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery in Data 3(1), 1:1–1:58 (2009)Google Scholar
  13. 13.
    Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: ICDM 2001, pp. 313–320. IEEE (2001)Google Scholar
  14. 14.
    McKay, B.D., Piperno, A.: Practical graph isomorphism, II (January 2013),
  15. 15.
    Moser, F., Colak, R., Rafiey, A., Ester, M.: Mining cohesive patterns from graphs with feature vectors. In: SDM, pp. 593–604 (2009)Google Scholar
  16. 16.
    Papadimitriou, S., Kitagawa, H., Gibbons, P., Faloutsos, C.: Loci: fast outlier detection using the local correlation integral. In: Proceedings of the 19th International Conference on Data Engineering 2003, pp. 315–326 (March 2003)Google Scholar
  17. 17.
    de Vries, T., Chawla, S., Houle, M.: Finding local anomalies in very high dimensional space. In: ICDM 2010, pp. 128–137. IEEE (2010)Google Scholar
  18. 18.
    Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: ICDM 2002, pp. 721–724. IEEE (2002)Google Scholar
  19. 19.
    Yan, X., Han, J.: CloseGraph: Mining closed frequent graph patterns. In: KDD 2003, pp. 286–295. ACM (2003)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Michael Davis
    • 1
  • Weiru Liu
    • 1
  • Paul Miller
    • 1
  1. 1.Centre for Secure Information Technologies (CSIT), School of Electronics, Electrical Engineering and Computer ScienceQueen’s UniversityBelfastUnited Kingdom

Personalised recommendations