Abstract
The analysis of large networks is a central topic in data mining, and a significant body of work in the area studies the problem of node similarity. One way to express node similarity is to associate each node with the set of its 1-hop neighbors and compute the Jaccard similarity between these sets. This information can subsequently be used for more complex operations such as link prediction, clustering, or dense subgraph discovery. In this work, we study algorithms that continuously monitor the result of a similarity join between nodes, assuming a sliding window over the stream of graph edges. Since the arrival of a new edge or the expiration of an existing one may change the similarity between several node pairs, the challenge is to maintain the similarity join result as efficiently as possible. Our theoretical study is validated by a thorough experimental evaluation on real-world as well as synthetically generated graphs, which demonstrates the superiority of the proposed technique over baseline approaches.
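To make the problem setting concrete, the following is a minimal sketch of a naive baseline, not the technique proposed in the paper: it keeps the 1-hop neighbor sets up to date as edges enter and leave the window, and recomputes the similarity join from scratch on demand. All names (`SlidingWindowGraph`, `add_edge`, `similarity_join`, `window_size`, `threshold`) are hypothetical. The sketch assumes a count-based window, undirected edges, comparable node identifiers, and no edge appearing twice within the same window.

```python
from collections import defaultdict, deque
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; taken as 0.0 if both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class SlidingWindowGraph:
    """Count-based sliding window over an undirected edge stream (illustrative)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.edges = deque()               # edges currently in the window, oldest first
        self.neighbors = defaultdict(set)  # node -> set of 1-hop neighbors

    def add_edge(self, u, v):
        """Insert edge (u, v); expire the oldest edge first if the window is full."""
        if len(self.edges) == self.window_size:
            old_u, old_v = self.edges.popleft()
            # Assumption: no duplicate edges inside a window, so a plain
            # discard is enough to undo the expired edge.
            self.neighbors[old_u].discard(old_v)
            self.neighbors[old_v].discard(old_u)
        self.edges.append((u, v))
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def similarity_join(self, threshold):
        """Recompute from scratch all node pairs with Jaccard >= threshold."""
        active = sorted(n for n, nbrs in self.neighbors.items() if nbrs)
        result = {}
        for u, v in combinations(active, 2):
            sim = jaccard(self.neighbors[u], self.neighbors[v])
            if sim >= threshold:
                result[(u, v)] = sim
        return result


if __name__ == "__main__":
    g = SlidingWindowGraph(window_size=4)
    for edge in [(1, 3), (2, 3), (1, 4), (2, 4)]:
        g.add_edge(*edge)
    print(g.similarity_join(threshold=0.9))
    g.add_edge(5, 6)  # the window is full, so edge (1, 3) expires
    print(g.similarity_join(threshold=0.9))
```

Recomputing every pair after each update is quadratic in the number of active nodes. That cost is what motivates maintaining the join result incrementally, since a single edge arrival or expiration only changes the neighbor sets of its two endpoints.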
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Valari, E., Papadopoulos, A.N. (2013). Continuous Similarity Computation over Streaming Graphs. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_41
DOI: https://doi.org/10.1007/978-3-642-40988-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2
eBook Packages: Computer Science, Computer Science (R0)