Abstract
Graph mining is a set of techniques for finding useful patterns in various types of structured data. Many effective algorithms for mining static graphs have been proposed. However, graphs of human relationships and evolving genes change over time, and such evolving graphs require different algorithms for analysis. In this chapter, we explain a method called O2I for clustering in evolving graphs that can detect changes in clusters over time. O2I partitions the graph sequence into smooth clusters, even when the numbers of clusters and vertices vary. It first constructs a graph from the graph sequence, then uses spectral clustering and the RatioCut to apply k partitioning to this graph. O2I is compared in detail with the preserving clustering membership (PCM) algorithm, which is a conventional online graph-sequence clustering algorithm in which the numbers of clusters and vertices must remain constant. We further show that, in contrast to PCM, the performance of O2I is not dependent on the clustering of the initial graph in the graph sequence. Experiments on synthetic evolving graphs show that O2I is practical to calculate and addresses the main disadvantages of PCM. Further tests on real-world data show that O2I can obtain reasonable clusters. This method is hence a flexible clustering solution and will be useful on a wide range of graph-mining applications in which the connections, number of clusters, and number of vertices of the graphs evolve over time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The weight 0 means that there is no connection between two vertices. We need these zeros to create Laplacian matrices in Sect. 2.2.
- 2.
Because of space limitations, we omit XT X = I henceforth.
- 3.
The numbers of vertices in the third detected cluster sequence increases with time as 〈0, 1, 4, 8, 13, 16, 16, 18, 18, 18〉.
- 4.
Vector yi is the same symbol used in Algorithm 2.
References
Aggarwal, C.C., Han, J., Wang, J., Philip S.Y.: A framework for clustering evolving data streams. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 81–92 (2003)
Aggarwal, C.C., Han, J., Wang, J., Philip S.Y.: A framework for projected clustering of high dimensional data streams. In: Proc. of International Conference on Very Large Data Bases (VLDB), pp. 852–863 (2004)
Aggarwal, C.C., Han, J., Wang, J., Philip S.Y.: On demand classification of data streams. In: Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD), pp. 503–508 (2004)
Bar-Joseph, Z., Gerber, G.K., Gifford, D.K., Jaakkola, T.S., Simon, I.: A new approach to analyzing gene expression time series data. In: Proceedings of International Conference on Computational Biology (RECOMB), pp. 39–48 (2002)
Beringer, J., Hüllermeier, E.: Online clustering of parallel data streams. Data Knowl. Eng. 58(2), 180–204 (2006)
Berlingerio, M., Bonchi, F., Bringmann, B., Gionis, A.: Mining graph evolution rules. In: Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pp. 115–130 (2009)
Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp. 328–339 (2006)
Chakrabarti, D., Kumar, R., Tomkins, A.: Evolutionary clustering. In: Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD), pp. 554–560 (2006)
Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proceedings of Annual ACM Symposium on Theory of Computing (STOC), pp. 30–39 (2003)
Chi, Y., Song, X., Zhou, D., Hino, K., Tseng, B.L.: On evolutionary spectral clustering. ACM Trans. Knowl. Discov. Data 3(4), 17:1–17:30 (2009)
Domingos, P.M., Hulten, G.: A general method for scaling up machine learning algorithms and its application to clustering. In: Proceedings of International Conference on Machine Learning (ICML), pp. 106–113 (2001)
Inokuchi, A., Washio, I.: Mining frequent graph sequence patterns induced by vertices. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp. 466–477 (2010)
Klimmt, B., Yang, Y.: Introducing the Enron corpus. In: CEAS Conference (2004)
Möller-Levet, C.S., Klawonn, F., Cho, K.-H., Yin, H., Wolkenhauer, O.: Clustering of unevenly sampled gene expression time-series data. Fuzzy Sets Syst. 152(1), 49–66 (2005)
O’Callaghan, L., Meyerson, A., Motwani, R., Mishra, N., Guha, S.: Streaming-data algorithms for high-quality clustering. In: Proceedings of International Conference on Data Engineering (ICDE), pp. 685–694 (2002)
Okui, S., Osamura, K., Inokuchi, A.: Detecting smooth cluster changes in evolving graphs. In: Proceedings of International Conference on Machine Learning and Applications (ICMLA), pp. 369–374 (2016)
van Wijk, J.J., van Selow, E.R.: Cluster and calendar based visualization of time series data. In: Proceedings of IEEE Symposium on Information Visualization (INFOVIS), pp. 4–9 (1999)
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Wang, Y., Liu, S.-X., Feng, J., Zhou, L.: Mining naturally smooth evolution of clusters from dynamic data. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp. 125–134 (2007)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Okui, S., Osamura, K., Inokuchi, A. (2019). Detecting Smooth Cluster Changes in Evolving Graph Structures. In: Sayed-Mouchaweh, M. (eds) Learning from Data Streams in Evolving Environments. Studies in Big Data, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-319-89803-2_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-89803-2_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89802-5
Online ISBN: 978-3-319-89803-2
eBook Packages: EngineeringEngineering (R0)