Substructure Clustering: A Novel Mining Paradigm for Arbitrary Data Types

  • Stephan Günnemann
  • Brigitte Boden
  • Thomas Seidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7338)

Abstract

Subspace clustering is an established mining task for grouping objects that are represented by vector data. By considering subspace projections of the data, it avoids the problem of full-space clustering: objects show no similarity w.r.t. all of their attributes, but only w.r.t. subsets of their characteristics. This effect is not limited to vector data. It can be observed in several other scientific domains, including graphs, where only subgraphs are similar, and time series, where only shorter subsequences show the same behavior. In each of these scenarios, using the whole representation of the objects for clustering is futile; instead, we need to find clusters of similar substructures. However, none of the existing substructure mining paradigms, such as subspace clustering, frequent subgraph mining, or motif discovery, solves this task entirely, since each tackles only some of the challenges and is restricted to a specific type of data.
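The following is a minimal illustration, using made-up values, of the effect described above: two objects that agree on a subset of their attributes can look dissimilar in the full space. The helper `euclidean` and the chosen subspace `[0, 1, 2]` are hypothetical and serve only to contrast full-space and subspace distances; this is not a subspace clustering algorithm.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float], dims: Sequence[int]) -> float:
    """Euclidean distance between a and b restricted to the attribute indices in dims."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


# Two hypothetical objects that agree on attributes 0-2 but differ strongly on 3-5.
x = [1.0, 2.0, 3.0, 0.0, 9.0, 4.0]
y = [1.1, 2.0, 2.9, 8.0, 1.0, 7.0]

full_space = range(len(x))
subspace = [0, 1, 2]

d_full = euclidean(x, y, full_space)
d_sub = euclidean(x, y, subspace)
print(f"full space: {d_full:.2f}, subspace {subspace}: {d_sub:.2f}")
```

The irrelevant attributes 3 to 5 dominate the full-space distance, hiding the near-identical behavior in attributes 0 to 2.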

In this work, we unify and generalize existing substructure mining tasks into the novel paradigm of substructure clustering, which is applicable to data of arbitrary type. As a proof of concept showing the feasibility of this paradigm, we present a specific instantiation for the task of subgraph clustering. By integrating ideas from different research areas into a single paradigm, our paper aims to inspire future research directions in each of these areas.
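To make the idea of a type-independent formulation more concrete, here is a hypothetical, brute-force sketch: the data type `T`, the substructure type `S`, a function enumerating the substructures of an object, and a distance on substructures are all parameters. This is not the authors' model. The abstract does not give their definitions, and their instantiation is for subgraphs, whereas this sketch uses time series subsequences for brevity. All names (`substructure_clusters`, `windows`, `max_abs`, `eps`, `min_size`) are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # type of the data objects (vectors, graphs, time series, ...)
S = TypeVar("S")  # type of their substructures (subspaces, subgraphs, subsequences, ...)


def substructure_clusters(
    objects: Sequence[T],
    substructures: Callable[[T], Iterable[S]],
    dist: Callable[[S, S], float],
    eps: float,
    min_size: int,
) -> List[Tuple[S, List[int]]]:
    """Hypothetical brute-force sketch: try every substructure of every object as a prototype.

    A prototype's cluster is the list of indices of all objects that contain at
    least one substructure within distance eps of it.
    """
    clusters: List[Tuple[S, List[int]]] = []
    seen_member_sets = set()
    for obj in objects:
        for proto in substructures(obj):
            members = [
                i
                for i, other in enumerate(objects)
                if any(dist(proto, s) <= eps for s in substructures(other))
            ]
            key = tuple(members)
            # Keep clusters that are large enough; report each member set only once
            # (a crude redundancy filter).
            if len(members) >= min_size and key not in seen_member_sets:
                seen_member_sets.add(key)
                clusters.append((proto, members))
    return clusters


# Hypothetical instantiation for time series: substructures are sliding windows.
def windows(series: Sequence[float], w: int = 3) -> List[Tuple[float, ...]]:
    return [tuple(series[i : i + w]) for i in range(len(series) - w + 1)]


def max_abs(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    # Chebyshev (maximum absolute) distance between two equal-length windows.
    return max(abs(x - y) for x, y in zip(a, b))


series = [
    [0, 5, 1, 2, 3, 9],
    [7, 1, 2, 3, 0, 4],
    [8, 8, 8, 1, 2, 3],
    [4, 9, 0, 6, 2, 7],
]
result = substructure_clusters(series, windows, max_abs, eps=0.0, min_size=3)
print(result)  # [((1, 2, 3), [0, 1, 2])]
```

Only the shared subsequence (1, 2, 3) groups series 0, 1, and 2, although the full series differ; series 3 shares no window with the others. Swapping in a different `substructures` function and `dist` (e.g. subgraphs with a graph distance) would reuse the same skeleton. The nested loops are quadratic in the number of substructures, so any practical method would need pruning and more careful redundancy handling.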

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Stephan Günnemann (1)
  • Brigitte Boden (1)
  • Thomas Seidl (1)
  1. RWTH Aachen University, Germany