Substructure Clustering: A Novel Mining Paradigm for Arbitrary Data Types

Günnemann, Stephan; Boden, Brigitte; Seidl, Thomas

doi:10.1007/978-3-642-31235-9_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7338)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Subspace clustering is an established mining task for grouping objects that are represented by vector data. By considering subspace projections of the data, the problem of full-space clustering is avoided: objects show no similarity w.r.t. all of their attributes but only w.r.t. subsets of their characteristics. This effect is not limited to vector data but can be observed in several other scientific domains, including graphs, where only some subgraphs are similar, and time series, where only shorter subsequences show the same behavior. In each scenario, using the whole representation of the objects for clustering is futile. We need to find clusters of similar substructures. However, none of the existing substructure mining paradigms, such as subspace clustering, frequent subgraph mining, or motif discovery, is able to solve this task entirely, since each tackles only a few of the challenges and is restricted to a specific type of data.

In this work, we unify and generalize existing substructure mining tasks to the novel paradigm of substructure clustering that is applicable to data of an arbitrary type. As a proof of concept showing the feasibility of our novel paradigm, we present a specific instantiation for the task of subgraph clustering. By integrating the ideas of different research areas into a novel paradigm, the aim of our paper is to inspire future research directions in the individual areas.
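
The subspace effect described in the first paragraph of the abstract can be illustrated with a minimal sketch. It is not the authors' method: the two objects, their attribute values, and the helper name `distance` are hypothetical and chosen only to show how a large full-space distance can hide a close match on a subset of attributes.

```python
import math

def distance(a, b, dims=None):
    """Euclidean distance between a and b, restricted to the attribute
    indices in dims (all attributes if dims is None)."""
    if dims is None:
        dims = range(len(a))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))

# Two hypothetical objects: attributes 0 and 1 nearly agree,
# attributes 2 and 3 differ strongly.
x = [1.0, 2.0, 9.0, 0.0]
y = [1.1, 2.1, 0.0, 8.0]

full = distance(x, y)              # uses all four attributes
sub = distance(x, y, dims=[0, 1])  # uses only the subspace {0, 1}

print(f"full-space distance: {full:.2f}")
print(f"subspace distance:   {sub:.2f}")
```

In this toy setting the two objects look dissimilar in the full space but nearly identical in the subspace spanned by attributes 0 and 1, which is the situation that motivates clustering substructures rather than whole objects.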

Author information

Authors and Affiliations

RWTH Aachen University, Germany
Stephan Günnemann, Brigitte Boden & Thomas Seidl

Editor information

Editors and Affiliations

Computer Science, EPFL IC SIN-GE, Ecole Polytechnique Federale de Lausanne, Batiment BC, Station 14, 1015, Lausanne, Switzerland
Anastasia Ailamaki
Department of Computer Science, Gonzaga University, 502 E. Boone Avenue, 99258-0026, Spokane, WA, USA
Shawn Bowers

About this paper

Cite this paper

Günnemann, S., Boden, B., Seidl, T. (2012). Substructure Clustering: A Novel Mining Paradigm for Arbitrary Data Types. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_19

DOI: https://doi.org/10.1007/978-3-642-31235-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31234-2
Online ISBN: 978-3-642-31235-9
eBook Packages: Computer Science, Computer Science (R0)