Abstract
SimRank has become an important similarity measure for ranking web documents based on a graph model of hyperlinks. Existing approaches to SimRank computation adopt an iterative paradigm. The most efficient deterministic technique takes \(O\left(n^3\right)\) worst-case time per iteration and requires \(O\left(n^2\right)\) space, where n is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes \(O\left(\min\left\{ n \cdot m, n^r \right\}\right)\) time and \(O\left(n + m\right)\) space, where m is the number of edges in the web graph and \(r \le \log_2 7\). In addition, we extend the similarity transition matrix to prevent random surfers from getting stuck, and devise a pruning technique that eliminates impractical similarities in each iteration. Moreover, we develop a reordering technique combined with an over-relaxation method, which not only speeds up the convergence rate of the existing techniques but also achieves I/O efficiency. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.
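To illustrate where an \(O(n \cdot m)\) per-iteration bound can come from, the following is a minimal Python sketch of the basic SimRank iteration written in matrix form, \(S_{k+1} = (C \cdot Q \cdot S_k \cdot Q^{T}) \vee I_n\). Here Q is the row-normalized in-link matrix (\(Q_{a,i} = 1/|I(a)|\) if i is an in-neighbor of a), and \(\vee I_n\) resets the diagonal to 1. Because Q has only m nonzero entries, each of the two products \(Q \cdot S_k\) and \((Q \cdot S_k) \cdot Q^{T}\) costs \(O(n \cdot m)\). The function name `simrank_matrix_iteration`, the decay factor C = 0.8, the fixed iteration count and the toy graph are illustrative assumptions, not taken from the paper. The sketch stores the dense n × n matrix S. It does not reproduce the paper's extended transition matrix, pruning, reordering or over-relaxation. It also omits the \(n^r\) branch, which relies on fast dense matrix multiplication such as Strassen's algorithm.

```python
from typing import List


def simrank_matrix_iteration(
    in_nbrs: List[List[int]], c: float = 0.8, iterations: int = 10
) -> List[List[float]]:
    """Basic SimRank iteration S_{k+1} = (c * Q * S_k * Q^T) v I (illustrative sketch).

    in_nbrs[a] lists the in-neighbours of node a; it is a sparse encoding of
    the row-normalised in-link matrix Q (Q[a][i] = 1/|I(a)| for i in I(a)).
    """
    n = len(in_nbrs)
    # S_0 = I: each node is fully similar to itself and to nothing else.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # T = Q * S_k: row a is the mean of the rows of S_k indexed by I(a).
        # Work: sum over a of |I(a)| * n = O(n * m).
        t = [[0.0] * n for _ in range(n)]
        for a, nbrs in enumerate(in_nbrs):
            if not nbrs:
                continue  # a has no in-links, so row a of Q is zero
            w = 1.0 / len(nbrs)
            row = t[a]
            for i in nbrs:
                s_i = s[i]
                for col in range(n):
                    row[col] += w * s_i[col]
        # S_{k+1} = c * T * Q^T: entry (a, b) is c times the mean of T[a][j]
        # over j in I(b). Work: again O(n * m).
        new_s = [[0.0] * n for _ in range(n)]
        for b, nbrs in enumerate(in_nbrs):
            if not nbrs:
                continue  # b has no in-links, so column b stays zero
            w = c / len(nbrs)
            for a in range(n):
                t_a = t[a]
                new_s[a][b] = w * sum(t_a[j] for j in nbrs)
        # "v I": reset the diagonal to 1, since SimRank defines s(a, a) = 1.
        for a in range(n):
            new_s[a][a] = 1.0
        s = new_s
    return s


# Hypothetical toy graph: pages 0 and 1 both link to pages 2 and 3.
sims = simrank_matrix_iteration([[], [], [0, 1], [0, 1]], c=0.8, iterations=5)
print(round(sims[2][3], 4))  # 0.4
```

Storing Q as in-neighbor lists rather than as a dense matrix keeps each product proportional to m instead of \(n^2\). On dense graphs, where m approaches \(n^2\), a fast dense matrix-multiplication routine becomes the cheaper choice, which is presumably why the stated bound takes the minimum of the two costs.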
Additional information
The work was supported by ARC Grants DP0987557 and DP0881035 and a Google Research Award. We also appreciate the general support from NICTA.
About this article
Cite this article
Yu, W., Zhang, W., Lin, X. et al. A space and time efficient algorithm for SimRank computation. World Wide Web 15, 327–353 (2012). https://doi.org/10.1007/s11280-010-0100-6
DOI: https://doi.org/10.1007/s11280-010-0100-6