Skip to main content
Log in

Accelerating pairwise SimRank estimation over static and dynamic graphs

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Measuring similarities among different vertices is a fundamental problem in graph analysis. Among different similarity measurements, SimRank is one of the most promising and popular. In reality, instead of computing the whole similarity matrix, people often issue SimRank queries in a pairwise manner, each of which needs to estimate an approximate SimRank value within a specified accuracy for a given pair of nodes. These pairwise SimRank queries are often processed on real-life graphs, which typically evolve over time, requiring efficient algorithms that can query pairwise SimRank under dynamic graph updates. However, current single-pair SimRank solutions are either static or inefficient in handling dynamic cases with good-quality results. Observing that the sample size is the major factor that determines the efficiency and the accuracy in Monte Carlo methods to estimate pairwise SimRank, in this paper, we propose three algorithms to query pairwise SimRank over static and dynamic graphs efficiently, by using different sample reduction strategies. The accuracy of our algorithms is guaranteed by the different invariants we propose for pairwise SimRank. We show that our algorithms outperform the state-of-the-art static and dynamic solutions for pairwise SimRank estimation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Notes

  1. If \(|In(a)|=0\), the SimRank score between a and any other node is 0.

  2. Sometimes, the PSE problem is also referred as the single-pair SimRank problem [7, 14, 21].

  3. \(\delta \) is the failure probability.

  4. d is the average in-degree of G.

  5. The details of the derivation of the concentration inequality, which determines N, are shown in Sect. 3.

  6. Since \(N=\frac{c^{2}\ln {\frac{2}{\delta }}}{2\epsilon ^{2}}= 0.6^{2} \times \frac{\ln {\frac{2}{0.01}}}{2 * (0.01)^{2}} \approx 9537\).

  7. We assume \(\epsilon \) is a rationale number, instead of a real number; otherwise, \(\epsilon \) is uncountable and the problem is undecidable.

  8. Mathematically, we can view \(\mathbf {p} = \overrightarrow{vec}(\mathbf {P})\) and \(\mathbf {p} = \overrightarrow{vec}(\mathbf {R})\), where \(\mathbf {p}, \mathbf {r} \in {\mathbb {R}}^{n^{2} \times 1}\) and \(\mathbf {P},\mathbf {R} \in {\mathbb {R}}^{n \times n}\).

  9. https://snap.stanford.edu/data/index.html.

  10. http://konect.uni-koblenz.de/networks/.

  11. If both algorithms A and B achieve the \(\epsilon \)-bound, and B has a smaller ME than A, we cannot conclude that B has higher accuracy than A, since \(\epsilon \) is a predefined accuracy parameter. In contrast, this indicates that B may have unnecessary redundant computational cost.

  12. Since the ground truth all-pairs SimRank scores of these four small data sets can be computed within several minutes, it would be unaffordable to run the algorithm which has a total running time longer than 1 h.

References

  1. Abbassi, Z., Mirrokni, V.S.: A recommender system based on local random walks and spectral methods. In: WebKDD/SNA-KDD (2007). https://doi.org/10.1145/1348549.1348561

  2. Andersen, R., Chung, F., Lang, K.: Local graph partitioning using pagerank vectors. In: FOCS (2006)

  3. Antonellis, I., Garcia-Molina, H., Chang, C.: Simrank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421, (2008). http://www.vldb.org/pvldb/1/1453903.pdf

  4. Fogaras, D., Rácz, B.: Scaling link-based similarity search. In: WWW (2005)

  5. Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Onizuka, M.: Efficient search algorithm for SimRank. In: ICDE (2013)

  6. He, G., Feng, H., Li, C., Chen, H.: Parallel SimRank computation on large graphs with iterative aggregation. In: KDD (2010). https://doi.org/10.1145/1835804.1835874

  7. He, J., Liu, H., Yu, J.X., Li, P., He, W., Du, X.: Assessing single-pair similarity over graphs by aggregating first-meeting probabilities. Inf. Syst. 42, 107–122 (2014). https://doi.org/10.1016/j.is.2013.12.008

    Article  Google Scholar 

  8. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963). http://www.jstor.org/stable/2282952

  9. Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: KDD (2002)

  10. Jeh, G., Widom, J.: Scaling personalized web search. In: Proceedings of the Twelfth International World Wide Web Conference, WWW 2003, Budapest, Hungary, 20–24 May 2003, pp. 271–279 (2003). https://doi.org/10.1145/775152.775191

  11. Jiang, M., Fu, A.W., Wong, R.C., Wang, K.: READS: a random walk approach for efficient and accurate dynamic SimRank. PVLDB 10(9), 937–948 (2017)

    Google Scholar 

  12. Kusumoto, M., Maehara, T., Kawarabayashi, K: Scalable similarity search for SimRank. In: SIGMOD (2014). https://doi.org/10.1145/2588555.2610526

  13. Li, C., Han, J., He, G., Jin, X., Sun, Y., Yu, Y., Wu, T.: Fast computation of SimRank for static and dynamic information networks. In: EDBT (2010a). https://doi.org/10.1145/1739041.1739098

  14. Li, P., Liu, H., Yu, J.X., He, J., Du, X.: Fast single-pair SimRank computation. In: Proceedings of the 2010 SIAM International Conference on Data Mining, SIAM, pp. 571–582 (2010b)

  15. Li, Z., Fang, Y., Liu, Q., Cheng, J., Cheng, R., Lui, J.C.S.: Walking in the cloud: parallel simrank at scale. PVLDB 9(1), 24–35 (2015)

    Google Scholar 

  16. Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol. 58(7), 1019–1031 (2007)

    Article  Google Scholar 

  17. Liu, Y., Zheng, B., He, X., Wei, Z., Xiao, X., Zheng, K., Lu, J.: Probesim: scalable single-source and top-k SimRank computations on dynamic graphs. PVLDB 11(1), 14–26 (2017). http://www.vldb.org/pvldb/vol11/p14-liu.pdf

  18. Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for SimRank computation. VLDB 1(1), 422–433 (2008)

    Google Scholar 

  19. Lofgren, P., Banerjee, S., Goel, A.: Personalized PageRank estimation and search: a bidirectional approach. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, ACM, New York (WSDM ’16), pp. 163–172 (2016). https://doi.org/10.1145/2835776.2835823

  20. Lu, J., Gong, Z., Lin, X.: A novel and fast SimRank algorithm. IEEE Trans. Knowl. Data Eng. (2017). https://doi.org/10.1109/TKDE.2016.2626282

    Google Scholar 

  21. Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient SimRank computation via linearization (2014). CoRR arXiv:1411.7228

  22. Mislove, A., Koppula, H.S., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Growth of the flickr social network. In: Proceedings of the First Workshop on Online Social Networks (WOSN 2008), Seattle, 17–22 Aug 2008, pp. 25–30 (2008). https://doi.org/10.1145/1397735.1397742

  23. Shao, Y., Cui, B., Chen, L., Liu, M., Xie, X.: An efficient similarity search framework for SimRank over large dynamic graphs. PVLDB 8(8), 838–849 (2015)

    Google Scholar 

  24. Spirin, N., Han, J.: Survey on web spam detection: principles and algorithms. SIGKDD Explor. Newsl. 13(2), 50–64 (2012)

    Article  Google Scholar 

  25. Tao, W., Yu, M., Li, G.: Efficient top-k SimRank-based similarity join. PVLDB 8(3):317–328, (2014). http://www.vldb.org/pvldb/vol8/p317-tao.pdf

  26. Tian, B., Xiao, X.: Sling: A near-optimal index structure for SimRank. SIGMOD (2016).https://doi.org/10.1145/2882903.2915243

  27. Wang, Y., Lian, X., Chen, L.: Efficient SimRank tracking in dynamic graphs. In: ICDE (2018)

  28. Yin, X., Han, J., Yu, P.S.: Linkclus: efficient clustering via heterogeneous semantic links. In: VLDB (2006)

  29. Yoon, M., Jin, W., Kang, U.: Fast and accurate random walk with restart on dynamic graphs with guarantees (2017). CoRR arXiv:1712.00595

  30. Yu, W., McCann, J.A.: Efficient partial-pairs SimRank search for large networks. PVLDB 8(5), 569–580 (2015)

    Google Scholar 

  31. Yu, W., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for simrank computation. WWW 15(3) (2012). https://doi.org/10.1007/s11280-010-0100-6

  32. Yu, W., Lin, X., Zhang, W.: Towards efficient SimRank computation on large networks. In: ICDE, pp. 601–612 (2013a). https://doi.org/10.1109/ICDE.2013.6544859

  33. Yu, W., Lin, X., Zhang, W., Chang, L., Pei, J.: More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB 7(1), 13–24 (2013b)

    Google Scholar 

  34. Yu, W., Lin, X., Zhang, W.: Fast incremental SimRank on link-evolving graphs. In: ICDE, pp. 304–315 (2014). https://doi.org/10.1109/ICDE.2014.6816660

  35. Yu, W., Lin, X., Zhang, W., McCann, J.A.: Dynamical simrank search on time-varying networks. VLDB J. 27(1), 79–104 (2018). https://doi.org/10.1007/s00778-017-0488-z

    Article  Google Scholar 

  36. Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, ACM, pp. 553–562 (2009)

  37. Zheng, W., Zou, L., Chen, L., Zhao, D.: Efficient simrank-based similarity join. ACM Trans. Database Syst. 42(3), 16:1–16:37 (2017). https://doi.org/10.1145/3083899

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

Yue Wang and Lei Chen are supported in part by the Hong Kong RGC GRF Project 16214716, National Grand Fundamental Research 973 Program of China under Grant 2014CB340303, the National Science Foundation of China (NSFC) under Grant No. 61729201, Science and Technology Planning Project of Guangdong Province, China, No. 2015B010110006, Huawei Co.Ltd Collaboration Project, YBCB2009041-45, Hong Kong ITC ITF grants ITS/391/15FX and ITS/212/16FP, Microsoft Research Asia Collaborative Research Grant, and WeChat-HKUST Joint Lab on Artificial Intelligence Technology, RDC-17182280. Yulin Che and Qiong Luo are supported in part by grants 16206414 from the Hong Kong Research Grants Council and MRA11EG01 from Microsoft.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yue Wang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, Y., Chen, L., Che, Y. et al. Accelerating pairwise SimRank estimation over static and dynamic graphs. The VLDB Journal 28, 99–122 (2019). https://doi.org/10.1007/s00778-018-0521-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-018-0521-x

Keywords

Navigation