A space and time efficient algorithm for SimRank computation

Yu, Weiren; Zhang, Wenjie; Lin, Xuemin; Zhang, Qing; Le, Jiajin

doi:10.1007/s11280-010-0100-6

Published: 07 December 2010

World Wide Web, Volume 15, pages 327–353 (2012)

Abstract

SimRank has become an important similarity measure to rank web documents based on a graph model of hyperlinks. The existing approaches for conducting SimRank computation adopt an iteration paradigm. The most efficient deterministic technique yields \(O\left(n^3\right)\) worst-case time per iteration with the space requirement \(O\left(n^2\right)\), where \(n\) is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes \(O \left(\min \left\{ n \cdot m , n^r \right\}\right)\) time and \(O \left( n + m \right)\) space, where \(m\) is the number of edges in a web-graph model and \(r \le \log_2 7\). In addition, we extend the similarity transition matrix to prevent random surfers from getting stuck, and devise a pruning technique to eliminate impractical similarities in each iteration. Moreover, we develop a reordering technique combined with an over-relaxation method, which not only speeds up the convergence rate of the existing techniques but also achieves I/O efficiency. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.
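
For orientation, the iteration paradigm mentioned above can be illustrated with the basic SimRank recurrence, in which the similarity of two distinct nodes is the decayed average similarity of their in-neighbor pairs. The following is a minimal sketch of that naive baseline only, not the optimized algorithm proposed in the paper. The function name `simrank`, the dictionary-based graph encoding, and the default decay factor of 0.8 are assumptions chosen for illustration.

```python
def simrank(in_neighbors, decay=0.8, iterations=5):
    """Naive iterative SimRank (illustrative baseline, not the paper's method).

    in_neighbors maps every node to a list of its in-neighbors; each
    in-neighbor must itself appear as a key. `decay` plays the role of the
    decay factor and its default value is an assumption.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 to all others.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}

    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors is similar to no other node.
                    new_sim[a][b] = 0.0
                    continue
                # Sum the previous-round scores over all in-neighbor pairs,
                # then average and apply the decay factor.
                total = sum(sim[i][j] for i in ia for j in ib)
                new_sim[a][b] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: page "u" links to both "a" and "b".
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank(graph, decay=0.8, iterations=3)
print(scores["a"]["b"])  # 0.8
```

This version stores all \(n^2\) scores and touches every in-neighbor pair of every node pair in each round, which is the kind of per-iteration cost that the techniques summarized in the abstract aim to reduce.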

Author information

Authors and Affiliations

School of Computer Science & Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
Weiren Yu, Wenjie Zhang & Xuemin Lin
E-Health Research Center, Australia CSIRO ICT Center, Herston, Queensland, 4029, Australia
Qing Zhang
School of Computer Science & Technology, Donghua University, Shanghai, China
Jiajin Le

Corresponding author

Correspondence to Weiren Yu.

Additional information

The work was supported by ARC Grants DP0987557 and DP0881035 and a Google Research Award. We also appreciate the general support from NICTA.

About this article

Cite this article

Yu, W., Zhang, W., Lin, X. et al. A space and time efficient algorithm for SimRank computation. World Wide Web 15, 327–353 (2012). https://doi.org/10.1007/s11280-010-0100-6

Received: 27 August 2010
Revised: 19 November 2010
Accepted: 19 November 2010
Published: 07 December 2010
Issue Date: May 2012
DOI: https://doi.org/10.1007/s11280-010-0100-6
