A space and time efficient algorithm for SimRank computation

World Wide Web

Abstract

SimRank has become an important similarity measure for ranking web documents based on a graph model of hyperlinks. Existing approaches to SimRank computation adopt an iterative paradigm. The most efficient deterministic technique requires \(O\left(n^3\right)\) worst-case time per iteration and \(O\left(n^2\right)\) space, where \(n\) is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes \(O \left(\min \left\{ n \cdot m , n^r \right\}\right)\) time and \(O \left( n + m \right)\) space, where \(m\) is the number of edges in the web-graph model and \(r \le \log_2 7\). In addition, we extend the similarity transition matrix to prevent random surfers from getting stuck, and devise a pruning technique to eliminate impractical similarities in each iteration. Moreover, we develop a reordering technique combined with an over-relaxation method, which not only speeds up the convergence rate of the existing techniques but also achieves I/O efficiency. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.
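To make the iteration paradigm mentioned in the abstract concrete, the following is a minimal sketch of the basic SimRank update as commonly defined (a node is maximally similar to itself, and two distinct nodes are similar when their in-neighbors are similar). It is a baseline for illustration only, not the optimized algorithm proposed in the paper: it keeps the full dense score matrix and so needs \(O\left(n^2\right)\) space. The function name `simrank`, the `in_neighbors` input layout, and the default decay factor of 0.8 are assumptions made for this example.

```python
def simrank(in_neighbors, decay=0.8, iterations=5):
    """Baseline SimRank iteration (illustrative sketch, not the paper's method).

    in_neighbors[v] lists the nodes that have an edge into node v;
    nodes are assumed to be numbered 0..n-1.
    Returns a dense n x n list of lists of similarity scores.
    """
    n = len(in_neighbors)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    scores = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # partial[a][j] = sum of scores[i][j] over all in-neighbors i of a.
        # Computing it once per iteration lets every pair (a, b) reuse it.
        partial = [
            [sum(scores[i][j] for i in in_neighbors[a]) for j in range(n)]
            for a in range(n)
        ]
        updated = [[0.0] * n for _ in range(n)]
        for a in range(n):
            updated[a][a] = 1.0
            for b in range(n):
                # A pair with a node lacking in-neighbors keeps score 0.
                if a == b or not in_neighbors[a] or not in_neighbors[b]:
                    continue
                total = sum(partial[a][j] for j in in_neighbors[b])
                norm = len(in_neighbors[a]) * len(in_neighbors[b])
                updated[a][b] = decay * total / norm
        scores = updated
    return scores


# Toy graph: node 0 links to nodes 1 and 2, so 1 and 2 share one in-neighbor.
toy_scores = simrank([[], [0], [0]])
print(toy_scores[1][2])  # → 0.8
```

In the toy graph, nodes 1 and 2 are both referenced only by node 0, so their score equals the decay factor times the self-similarity of node 0.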

Author information

Correspondence to Weiren Yu.

Additional information

This work was supported by ARC Grants DP0987557 and DP0881035 and a Google Research Award. We also appreciate the general support from NICTA.

About this article

Cite this article

Yu, W., Zhang, W., Lin, X. et al. A space and time efficient algorithm for SimRank computation. World Wide Web 15, 327–353 (2012). https://doi.org/10.1007/s11280-010-0100-6

  • DOI: https://doi.org/10.1007/s11280-010-0100-6
