Abstract
SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than \(10^6\) nodes. Consequently, no existing work has evaluated the actual accuracy of various single-source and top-k SimRank algorithms on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. This algorithm produces ground truths with precision up to 7 decimal places with high probability. With the ground truths computed by ExactSim, we present the first experimental study of the accuracy/cost trade-offs of existing approximate SimRank algorithms on large real-world graphs and synthetic graphs. Finally, we use the ground truths to exploit various properties of SimRank distributions on large graphs.
Similar content being viewed by others
References
Aldecoa, Rodrigo, Orsini, Chiara, Krioukov, Dmitri: Hyperbolic graph generator. Computer Phys. Commun. 196, 492–496 (2015)
Andersen, Reid., Chung, Fan R. K., Lang, Kevin J.: Local graph partitioning using pagerank vectors. In FOCS, pp. 475–486, (2006)
Antonellis, Ioannis, Molina, Hector Garcia, Chang, Chi Chao: Simrank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421 (2008)
Bahmani, Bahman, Chowdhury, Abdur, Goel, Ashish: Fast incremental and personalized pagerank. VLDB 4(3), 173–184 (2010)
Chung, Fan R.K., Lu, Lincoln: Concentration inequalities and martingale inequalities: a survey. Internet Math. 3(1), 79–127 (2006)
Fogaras, Daniel., Racz, Balazs.: Scaling link-based similarity search. In: WWW, pp. 641–650, (2005)
Fogaras, Dániel, Rácz, Balázs, Csalogány, Károly, Sarlós, Tamás: Towards scaling fully personalized pagerank: algorithms, lower bounds, and experiments. Internet Math. 2(3), 333–358 (2005)
Fujiwara, Yuichiro., Nakatsuji, Makoto., Shiokawa, Hiroaki., Onizuka, Makoto.: Efficient search algorithm for simrank. In: ICDE, pp. 589–600, (2013)
He, Guoming., Feng, Haijun., Li, Cuiping., Chen, Hong.: Parallel simrank computation on large graphs with iterative aggregation. In: KDD, pp. 543–552, (2010)
Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: SIGKDD, pp. 538–543, (2002)
Jiang, M., Fu, A.W.C., Wong, R.C.W.: Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB 10(9), 937–948 (2017)
Krioukov, Dmitri, Papadopoulos, Fragkiskos, Kitsak, Maksim, Vahdat, Amin, Boguná, Marián: Hyperbolic geometry of complex networks. Phys. Rev. E 82(3), 036106 (2010)
Kusumoto, M., Maehara, T., Kawarabayashi, K-I.: Scalable similarity search for simrank. In: SIGMOD, pp. 325–336, (2014)
Lee, P., Lakshmanan, LVS., Yu, JX.: On top-k structural similarity search. In: ICDE, pp. 774–785, (2012)
Leskovec, J, Chakrabarti, D, Kleinberg, J, Faloutsos, C, Ghahramani, Z: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11(2), (2010)
Li, C., Han, J., He, G., Jin, X., Sun, Y., Yu, Y., Wu, T.: Fast computation of simrank for static and dynamic information networks. In: EDBT, pp. 465–476, (2010)
Li, L., Li, C., Chen, H., Du, X.: Mapreduce-based simrank computation and its application in social recommender system. In: 2013 IEEE International Congress on Big Data, pp. 133–140. IEEE, (2013)
Li, Zhenguo, Fang, Yixiang, Liu, Qin, Cheng, Jiefeng, Cheng, Reynold, Lui, John: Walking in the cloud: parallel simrank at scale. PVLDB 9(1), 24–35 (2015)
Lin, Zhenjiang, Lyu, Michael R., King, Irwin: Matchsim: a novel similarity measure based on maximum neighborhood matching. KAIS 32(1), 141–166 (2012)
Litvak, N., Scheinhardt, W.R.W., Volkovich, Y.: In-degree and pagerank: why do they follow similar power laws? Internet Math. 4(2–3), 175–198 (2007)
Liu, Y., Zheng, B., He, X., Wei, Z., Xiao, X., Zheng, K., Jiaheng, L.: Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB 11(1), 14–26 (2017)
Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)
Lizorkin, D., Velikhov, P., Grinev, M.N., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)
Lü, Linyuan, Zhou, Tao: Link prediction in complex networks: a survey. Phys. A: Stat. Mech. Appl. 390(6), 1150–1170 (2011)
Luo, X., Gao, J., Zhou, C., Yu, J. X.: Uniwalk: Unidirectional random walk based scalable simrank computation over large graph. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 325–336, (2017)
Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient simrank computation via linearization. CoRR, abs/1411.7228, (2014)
Maehara, T., Kusumoto, M., Kawarabayashi, K.: Scalable simrank join algorithm. In: ICDE, pp. 603–614, (2015)
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. (1999)
Shao, Y., Cui, B., Chen, L., Liu, M., Xie, X.: An efficient similarity search framework for simrank over large dynamic graphs. PVLDB 8(8), 838–849 (2015)
Tao, W., Minghe, Y., Li, G.: Efficient top-k simrank-based similarity join. PVLDB 8(3), 317–328 (2014)
Tian, B., Xiao, X.: SLING: a near-optimal index structure for simrank. In: SIGMOD, pp. 1859–1874, (2016)
Tsitsulin, A., Mottin, D., Karras, P., Müller, E.: Verse: Versatile graph embeddings from similarity measures. In: WWW, pp. 539–548. International World Wide Web Conferences Steering Committee, (2018)
Wang, H., Wei, Z., Yuan, Y., Du, X., Wen, J.: Exact single-source simrank computation on large graphs. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 653–663, (2020)
Wang, Y, Che, Y, Lian, X, Chen, L, Luo, Q: Fast and accurate simrank computation via forward local push and its parallelization. In: IEEE Transactions on Knowledge and Data Engineering (2020)
Wang, Y., Chen, L., Che, Y., Luo, Q.: Accelerating pairwise simrank estimation over static and dynamic graphs. VLDB J. 28(1), 99–122 (2019)
Wei, Z., He, X., Xiao, X., Wang, S., Liu, Y., Du, X., Wen, J.: Prsim: sublinear time simrank computation on large power-law graphs. In: SIGMOD, pp. 1042–1059. ACM, (2019)
Xi, W., Fox, EA., Fan, W., Zhang, B., Chen, Z., Yan, J., Zhuang, D.: Simfusion: measuring similarity using unified relationship matrix. In: SIGIR, pp. 130–137. ACM, (2005)
Yu, W., Lin, X., Zhang, W.: Fast incremental simrank on link-evolving graphs. In: ICDE, pp. 304–315, (2014)
Weiren, Y., Lin, X., Zhang, W., Chang, L., Pei, J.: More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB 7(1), 13–24 (2013)
Yu, W., McCann, J.: Gauging correct relative rankings for similarity search. In: CIKM, pp. 1791–1794, (2015)
Weiren, Y., McCann, J.A.: Efficient partial-pairs simrank search for large networks. PVLDB 8(5), 569–580 (2015)
Yu, W., McCann, J.A.: Efficient partial-pairs simrank search on large networks. Proc. VLDB Endow. 8(5), 569–580 (2015)
Yu, W., McCann, JA.: High quality graph-based similarity search. In: SIGIR, pp. 83–92, (2015)
Weiren, Y., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for simrank computation. World Wide Web 15(3), 327–353 (2012)
Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., Li, J.: Panther: Fast top-k similarity search on large networks. In: SIGKDD, pp. 1445–1454. ACM, (2015)
Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562. ACM, (2009)
Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp. 553–562, (2009)
Zheng, W., Zou, L., Feng, Y., Chen, L., Zhao, D.: Efficient simrank-based similarity join over large graphs. PVLDB 6(7), 493–504 (2013)
Acknowledgements
Zhewei Wei was supported by National Natural Science Foundation of China (NSFC) No. 61972401 and No. 61932001, by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China under Grant 18XNLG21, and by Alibaba Group through Alibaba Innovative Research Program. The work is partially done at Beijing Key Laboratory of Big Data Management and Analysis Methods, MOE Key Lab DEKE, Renmin University of China, and Pazhou Lab, Guangzhou, 510330, China. Hanzhi Wang was supported by the Outstanding Innovative Talents Cultivation Funded Programs 2020 of Renmin University of China. Ye Yuan was supported by NSFC No. 61932004 and No. 61622202 and by FRFCU No. N181605012. Ji-Rong Wen was supported by NSFC No. 61832017 and by Beijing Outstanding Young Scientist Program NO. BJJWZYJH012019100020098. Xiaoyong Du was supported by NSFC No. U1711261.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wang, H., Wei, Z., Liu, Y. et al. ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths. The VLDB Journal 30, 989–1015 (2021). https://doi.org/10.1007/s00778-021-00672-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-021-00672-7