Abstract
Subspace clustering is an established mining task for grouping objects that are represented by vector data. By considering subspace projections of the data, the problem of full-space clustering is avoided: objects show no similarity w.r.t. all of their attributes but only w.r.t. subsets of their characteristics. This effect is not limited to vector data but can be observed in several other scientific domains, including graphs, where only some subgraphs are similar, and time series, where only shorter subsequences show the same behavior. In each scenario, using the whole representation of the objects for clustering is futile; instead, we need to find clusters of similar substructures. However, none of the existing substructure mining paradigms, such as subspace clustering, frequent subgraph mining, or motif discovery, is able to solve this task entirely, since each tackles only a few of the challenges and is restricted to a specific type of data.
In this work, we unify and generalize existing substructure mining tasks into the novel paradigm of substructure clustering, which is applicable to data of an arbitrary type. As a proof of concept showing the feasibility of this paradigm, we present a specific instantiation for the task of subgraph clustering. By integrating ideas from different research areas into a single paradigm, we aim to inspire future research directions in the individual areas.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Günnemann, S., Boden, B., Seidl, T. (2012). Substructure Clustering: A Novel Mining Paradigm for Arbitrary Data Types. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_19
DOI: https://doi.org/10.1007/978-3-642-31235-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31234-2
Online ISBN: 978-3-642-31235-9
eBook Packages: Computer Science, Computer Science (R0)