The VLDB Journal, Volume 27, Issue 1, pp 127–152

Second-order random walk-based proximity measures in graph analysis: formulations and algorithms

  • Yubao Wu
  • Xiang Zhang
  • Yuchen Bian
  • Zhipeng Cai
  • Xiang Lian
  • Xueting Liao
  • Fengpan Zhao
Regular Paper


Measuring the proximity between different nodes is a fundamental problem in graph analysis. Random walk-based proximity measures have been shown to be effective and are widely used. Most existing random walk measures are based on the first-order Markov model, i.e., they assume that the next step of the random surfer depends only on the current node. However, this assumption neither holds in many real-life applications nor captures the clustering structure in the graph. To address this limitation of the existing first-order measures, in this paper, we study second-order random walk measures, which take the previously visited node into consideration. While the existing first-order measures are built on node-to-node transition probabilities, the second-order random walk requires edge-to-edge transition probabilities. Using incidence matrices, we develop simple and elegant matrix representations for the second-order proximity measures. A desirable property of the developed measures is that they degenerate to their original first-order forms when the effect of the previous step is zero. We further develop Monte Carlo methods to efficiently compute the second-order measures and provide theoretical performance guarantees. Experimental results show that in a variety of applications, the second-order measures can dramatically improve the performance compared to their first-order counterparts.
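The idea of an edge-to-edge transition that degenerates to the first-order form can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the blending rule in `second_order_probs`, the parameter names `alpha` (effect of the previous step) and `restart`, the toy graph, and the simple Monte Carlo estimator, which approximates a random-walk-with-restart score by counting where restart-terminated walks end.

```python
import random
from collections import Counter


def first_order_probs(graph, v):
    """Uniform node-to-node transition probabilities out of node v."""
    nbrs = graph[v]
    return {w: 1.0 / len(nbrs) for w in nbrs}


def second_order_probs(graph, u, v, alpha):
    """Edge-to-edge transition probabilities from edge (u, v) to edges (v, w).

    Assumed blend (hypothetical, for illustration): weight (1 - alpha) on the
    first-order step out of v, weight alpha on the step out of the previous
    node u, restricted to neighbors of v and renormalized.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    p_v = first_order_probs(graph, v)
    p_u = first_order_probs(graph, u)
    raw = {w: (1 - alpha) * p_v[w] + alpha * p_u.get(w, 0.0) for w in graph[v]}
    total = sum(raw.values())  # at least 1 - alpha, so never zero
    return {w: p / total for w, p in raw.items()}


def monte_carlo_rwr(graph, query, alpha, restart=0.15, num_walks=20000, seed=0):
    """Estimate restart-based proximity to `query` from walk end points."""
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        prev, cur = None, query
        # Each step, the walk stops with probability `restart`.
        while rng.random() >= restart:
            if prev is None:
                probs = first_order_probs(graph, cur)  # no history yet
            else:
                probs = second_order_probs(graph, prev, cur, alpha)
            nodes = list(probs)
            nxt = rng.choices(nodes, weights=[probs[w] for w in nodes])[0]
            prev, cur = cur, nxt
        ends[cur] += 1
    return {v: ends[v] / num_walks for v in graph}


# Toy undirected graph: two triangles joined by the bridge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}

scores = monte_carlo_rwr(graph, "a", alpha=0.5)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

With `alpha=0` the blend reduces to the first-order probabilities, mirroring the degeneration property stated above. With `alpha=0.5`, a surfer arriving at `c` from `a` moves to `b` with probability 5/9 and to `d` with probability 2/9 under this assumed rule, so the walk tends to stay inside the triangle it came from.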


Second-order random walk, Proximity measure, Graph mining, PageRank, SimRank



This work was partially supported by the National Science Foundation Grants IIS-11623-74, CAREER, and the NIH Grant R01GM115833.




Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Computer Science, Georgia State University, Atlanta, USA
  2. College of Information Sciences and Technology, The Pennsylvania State University, State College, USA
  3. Department of Computer Science, Kent State University, Kent, USA
