Scientific and Statistical Database Management

Volume 7338 of the series Lecture Notes in Computer Science, pp. 280-297

Substructure Clustering: A Novel Mining Paradigm for Arbitrary Data Types

  • Stephan Günnemann (RWTH Aachen University)
  • Brigitte Boden (RWTH Aachen University)
  • Thomas Seidl (RWTH Aachen University)


Subspace clustering is an established mining task for grouping objects that are represented by vector data. By considering subspace projections of the data, it avoids the central problem of full-space clustering: objects are typically not similar w.r.t. all of their attributes but only w.r.t. subsets of their characteristics. This effect is not limited to vector data; it can be observed in several other scientific domains, including graphs, where only subgraphs are similar, and time series, where only shorter subsequences show the same behavior. In each scenario, using the whole representation of the objects for clustering is futile. We need to find clusters of similar substructures. However, none of the existing substructure mining paradigms, such as subspace clustering, frequent subgraph mining, or motif discovery, is able to solve this task entirely, since each tackles only a few of the challenges and is restricted to a specific type of data.

In this work, we unify and generalize existing substructure mining tasks into the novel paradigm of substructure clustering, which is applicable to data of an arbitrary type. As a proof of concept showing the feasibility of this paradigm, we present a specific instantiation for the task of subgraph clustering. By integrating ideas from different research areas into a single paradigm, we aim to inspire future research directions in the individual areas.