Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors


Abstract

Large graphs are ubiquitous in today’s applications. Besides the mere graph structure, data sources usually provide information about individual objects in the form of feature vectors. To realize the full potential for knowledge extraction, recent approaches consider both information types simultaneously. Thus, for the task of clustering, combined clustering models determine object groups within one network that are densely connected and show similar characteristics. However, due to the inherent complexity of such a combination, the existing methods cannot be executed efficiently and are hardly applicable to large graphs.

In this work, we develop a method for efficient clustering of combined data sources that at the same time finds high-quality results. We prove the complexity of our model and identify the critical parts inhibiting an efficient execution. Based on this analysis, we develop the algorithm EDCAR, which approximates the optimal clustering solution using the established GRASP (Greedy Randomized Adaptive Search Procedure) principle. In thorough experiments we show that EDCAR outperforms all competing approaches in terms of runtime and simultaneously achieves high clustering quality. For repeatability and further research we publish all datasets, executables and parameter settings on our website.
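
To illustrate the GRASP principle mentioned above, the following is a minimal sketch of the generic scheme: a greedy randomized construction phase that picks from a restricted candidate list, followed by a local search, repeated over several iterations. It is not the EDCAR algorithm. The quality function (density times cluster size times number of feature dimensions in which all members lie within `width`), the parameters `alpha` and `width`, and the toy graph are all assumptions made up for this example.

```python
import random
from itertools import combinations


def quality(nodes, graph, features, width=1.0):
    """Hypothetical score: density * size * number of 'similar' dimensions."""
    nodes = sorted(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in graph[u])
    density = 2.0 * edges / (n * (n - 1))
    dims = range(len(features[nodes[0]]))
    # A dimension counts as relevant if all members lie within `width` of each other.
    relevant = [
        d for d in dims
        if max(features[v][d] for v in nodes) - min(features[v][d] for v in nodes) <= width
    ]
    return density * n * len(relevant)


def construct(graph, features, rng, alpha=0.3):
    """Greedy randomized construction using a restricted candidate list (RCL)."""
    cluster = {rng.choice(sorted(graph))}
    while True:
        current = quality(cluster, graph, features)
        neighbours = sorted(set().union(*(graph[v] for v in cluster)) - cluster)
        scored = [(quality(cluster | {c}, graph, features), c) for c in neighbours]
        scored = [(q, c) for q, c in scored if q > current]
        if not scored:
            return cluster
        best = max(q for q, _ in scored)
        worst = min(q for q, _ in scored)
        threshold = best - alpha * (best - worst)
        rcl = [c for q, c in scored if q >= threshold]
        cluster.add(rng.choice(rcl))  # random pick among the near-best candidates


def local_search(cluster, graph, features):
    """First-improvement search over single-node additions and removals."""
    cluster = set(cluster)
    improved = True
    while improved:
        improved = False
        current = quality(cluster, graph, features)
        moves = [cluster - {v} for v in sorted(cluster)]
        moves += [cluster | {v} for v in sorted(set(graph) - cluster)]
        for candidate in moves:
            if quality(candidate, graph, features) > current:
                cluster, improved = candidate, True
                break
    return cluster


def grasp(graph, features, iterations=20, seed=0):
    """Repeat construction + local search and keep the best cluster found."""
    rng = random.Random(seed)
    best, best_q = set(), 0.0
    for _ in range(iterations):
        candidate = local_search(construct(graph, features, rng), graph, features)
        q = quality(candidate, graph, features)
        if q > best_q:
            best, best_q = candidate, q
    return best, best_q


# Toy data (invented): a, b, c, d form a clique and agree in the first dimension.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "b", "c"},
    "e": {"c", "f"},
    "f": {"e"},
}
FEATURES = {
    "a": (1.0, 5.0), "b": (1.2, 9.0), "c": (0.9, 1.0),
    "d": (1.1, 7.0), "e": (5.0, 1.2), "f": (9.0, 3.0),
}

if __name__ == "__main__":
    print(grasp(GRAPH, FEATURES))
```

Under this assumed quality function, the clique of a, b, c and d is the highest-scoring vertex set in the toy graph, so it serves as a simple check that the search behaves sensibly. The randomized choice from the candidate list is what lets repeated iterations escape poor greedy starts, such as the pair c and e.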