
Accurate relational reasoning in edge-labeled graphs by multi-labeled random walk with restart


Given an edge-labeled graph and two nodes, how can we accurately infer the relation between the nodes? Reasoning about how nodes are related is a fundamental task in analyzing network data, and various relevance measures have been suggested to identify relevance between nodes in graphs. Although many random walk based models have been extensively utilized to reveal relevance between nodes, they cannot distinguish how those nodes are related in terms of edge labels since the traditional surfer does not consider edge labels when estimating relevance scores. In this paper, we propose MuRWR (Multi-Labeled Random Walk with Restart), a novel random walk based model that accurately identifies how nodes are related by considering multiple edge labels. We introduce a labeled random surfer whose label indicates the relation between the starting and visiting nodes, and change the surfer’s label during random walks for multi-hop relational reasoning. We also learn appropriate rules for changing the surfer’s label from the edge-labeled graph to accurately infer relations. We develop an iterative algorithm for computing MuRWR, and prove the convergence guarantee of the algorithm. Through extensive experiments, we show that our model MuRWR provides the best inference performance.



  1. Avron, H., Horesh, L.: Community detection using time-dependent personalized pagerank. In: ICML (2015)

  2. Bonchi, F., Gionis, A., Gullo, F., Ukkonen, A.: Distance oracles in edge-labeled graphs. In: EDBT (2014)

  3. Bordes, A., Glorot, X., Weston, J., Bengio, Y.: A semantic matching energy function for learning with multi-relational data. Mach. Learn. 94(2), 233–259 (2014)


  4. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NIPS (2013)

  5. Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press, Cambridge (2004)


  6. Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In: SIGKDD. ACM (2014)

  7. Eswaran, D., Günnemann, S., Faloutsos, C., Makhija, D., Kumar, M.: Zoobp: Belief propagation for heterogeneous networks. In: Proceedings of the VLDB Endowment, vol. 10, pp 625–636 (2017)

  8. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: SIGKDD. ACM (2016)

  9. Guu, K., Miller, J., Liang, P.: Traversing knowledge graphs in vector space. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp 318–327 (2015)

  10. He, J., Li, M., Zhang, H.-J., Tong, H., Zhang, C.: Manifold-ranking based image retrieval. In: Proceedings of the 12th Annual ACM International Conference on Multimedia. ACM (2004)

  11. Jiang, Z., Liu, H., Fu, B., Wu, Z., Zhang, T.: Recommendation in heterogeneous information networks based on generalized random walk model and bayesian personalized ranking. ACM (2018)

  12. Jung, J., Jung, W.J., Sael, L., Kang, U: Personalized ranking in signed networks using signed random walk with restart. In: ICDM. IEEE (2016)

  13. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999)


  14. Koutra, D., Ke, T.-Y., Kang, U, Chau, D.H.P., Pao, H.K.K., Faloutsos, C.: Unifying guilt-by-association approaches: Theorems and fast algorithms. In: ECML, pp 245–260. Springer (2011)

  15. Kunegis, J., Lommatzsch, A., Bauckhage, C.: The slashdot zoo: mining a social network with negative edges. In: WWW. ACM (2009)

  16. Lao, N., Mitchell, T., Cohen, W.W: Random walk inference and learning in a large scale knowledge base. In: EMNLP. ACL (2011)

  17. Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)


  18. Laub, A.J.: Matrix analysis for scientists and engineers, vol. 91. SIAM (2005)

  19. Lee, S., Park, S., Kahng, M., Lee, S.: Pathrank: a novel node ranking measure on a heterogeneous graph for recommender systems. In: CIKM. ACM (2012)

  20. Leskovec, J., Huttenlocher, D.P, Kleinberg, J.M: Governance in social media: A case study of the wikipedia promotion process. In: ICWSM (2010)

  21. Leskovec, J., Mcauley, J.J: Learning to discover social circles in ego networks. In: NIPS (2012)

  22. Li, L., Yao, Y., Tang, J., Fan, W., Tong, H.: Quint: on query-specific optimal networks. In: SIGKDD. ACM (2016)

  23. Lin, Y., Liu, Z., Luan, H., Sun, M., Rao, S., Liu, S.: Modeling relation paths for representation learning of knowledge bases. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp 705–714 (2015)

  24. Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI, vol. 15, pp 2181–2187 (2015)

  25. Massa, P., Avesani, P.: Controversial users demand local trust metrics: An experimental study on epinions.com community. In: AAAI (2005)

  26. Massa, P., Salvetti, M., Tomasoni, D.: Bowling alone and trust decline in social network sites. In: DASC. IEEE (2009)

  27. Miller, G.A: Wordnet: a lexical database for english. Commun. ACM 38(11), 39–41 (1995)


  28. Murphy, K.: Machine learning: a probabilistic perspective. MIT Press (2012)

  29. Page, L., Brin, S., Motwani, R., Winograd, T.: The Pagerank Citation Ranking: Bringing Order to the Web. Technical report, Stanford InfoLab (1999)

  30. Park, H., Jung, J., Kang, U.: A comparative study of matrix factorization and random walk with restart in recommender systems. In: BigData. IEEE (2017)

  31. Perozzi, B., Schueppert, M., Saalweachter, J., Thakur, M.: When recommendation goes wrong: Anomalous link discovery in recommendation networks. In: SIGKDD. ACM (2016)

  32. Shahriari, M., Jalili, M.: Ranking nodes in signed social networks. Soc. Netw. Anal. Min. 4(1), 172 (2014)


  33. Socher, R., Chen, D., Manning, C.D, Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: NIPS (2013)

  34. Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inform. Process. Manag. 45(4), 427–437 (2009)


  35. Strang, G.: Linear Algebra and Its Applications. Thomson Brooks/Cole, Pacific Grove (2006)


  36. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: Large-scale information network embedding. In: WWW. International World Wide Web Conferences Steering Committee (2015)

  37. Tong, H., Faloutsos, C.: Center-piece subgraphs: problem definition and fast solutions. In: SIGKDD. ACM (2006)

  38. Tong, H., Faloutsos, C., Pan, J.-Y.: Fast random walk with restart and its applications. In: ICDM. IEEE (2006)

  39. Trefethen, L.N., Bau, D. III: Numerical linear algebra, vol. 50. SIAM (1997)

  40. Wang, X., van Eeden, C., Zidek, J.V: Asymptotic properties of maximum weighted likelihood estimators. J. Stat. Plan. Infer. 119(1), 37–54 (2004)


  41. Yang, B., Yih, W.-T., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)



This work was supported by the ICT R&D program of MSIT/IITP (No.2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test). The Institute of Engineering Research at Seoul National University provided research facilities for this work. The ICT at Seoul National University provides research facilities for this study.

Author information

Authors and Affiliations


Corresponding author

Correspondence to U Kang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Artificial Intelligence and Big Data Computing

Guest Editors: Wookey Lee and Hiroyuki Kitagawa


Appendix A: Lemma for convexity and solution of label transition probabilities

Lemma 2

The estimator in (2) maximizes the weighted log-likelihood function \(WL(\mathbf {S}_{k};\mathcal {D}_{k})\) in (1).


Our goal is to find \(\mathbf{S}_{k}\) that maximizes \(WL(\mathbf {S}_{k};\mathcal {D}_{k})\), which is equivalent to minimizing \(-WL(\mathbf {S}_{k};\mathcal {D}_{k})\). The probability \(P(\mathbf{x}_{kh}; \mathbf{S}_{k})\) is written as follows:

$$ P(\mathbf{x}_{kh}; \mathbf{S}_{k}) = \mathbf{S}_{k\mathbf{x}_{khs}\mathbf{x}_{kht}}=\prod\limits_{i=1}^{K}\prod\limits_{j=1}^{K}(\mathbf{S}_{kij})^{\mathbf{1}(\mathbf{x}_{khs}=i, \mathbf{x}_{kht}=j)} $$

Then \(-WL(\mathbf {S}_{k};\mathcal {D}_{k})\) is represented as follows:

$$ \begin{array}{@{}rcl@{}} -WL(\mathbf{S}_{k};\mathcal{D}_{k}) &=& -\sum\limits_{h=1}^{n_{k}} w_{\mathbf{x}_{kht}}\log{P(\mathbf{x}_{kh};\mathbf{S}_{k})}\\ &=& -\sum\limits_{h=1}^{n_{k}}\sum\limits_{i=1}^{K}\sum\limits_{j=1}^{K}w_{j}\mathbf{1}(\mathbf{x}_{khs} = i, \mathbf{x}_{kht} = j)\log{\mathbf{S}_{kij}}\\ &=& -\sum\limits_{i=1}^{K}\sum\limits_{j=1}^{K}w_{j}\left( \sum\limits_{h=1}^{n_{k}}\mathbf{1}(\mathbf{x}_{khs}=i, \mathbf{x}_{kht}=j)\right)\log\mathbf{S}_{kij}\\ &=& -\sum\limits_{i=1}^{K}\sum\limits_{j=1}^{K}w_{j}N_{kij}\log\mathbf{S}_{kij} \end{array} $$

where \(N_{kij}={\sum }_{h=1}^{n_{k}}\mathbf {1}(\mathbf {x}_{khs} = i, \mathbf {x}_{kht} = j)\) is the count of the label transition observations. Then the minimization problem is represented as follows:

$$ \begin{array}{@{}rcl@{}} \underset{\mathbf{S}_{kij}}{\text{minimize}} -WL(\mathbf{S}_{k};\mathcal{D}_{k}) &=& -\sum\limits_{i=1}^{K}\sum\limits_{j=1}^{K}w_{j}N_{kij}\log\mathbf{S}_{kij}\\ \text{subject to} \mathbf{S}_{kij} &\geq& 0 \text{ for } 1\leq i,j \leq K, \\ \sum\limits_{j=1}^{K}\mathbf{S}_{kij}&=&1 \text{ for } 1\leq i \leq K. \end{array} $$

Note that the above problem is convex (see Lemma 3); thus, it can be solved using the KKT conditions [5], and its solution is given by (2) (details in Lemma 4). □

Lemma 3

The optimization problem in (7) is convex.


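The objective function is convex since the negative log functions \(-\log {\mathbf {S}_{kij}}\) are convex, and a non-negatively weighted sum of convex functions is convex (here the weights satisfy \(w_{j}N_{kij} \geq 0\)) [5]. Let \(\mathbf{C}\) be the set of \(\mathbf{S}_{k}\) satisfying the constraints, i.e., \(\mathbf{C} = \{\mathbf{S}_{k} \mid \mathbf{S}_{k}\mathbf{1} = \mathbf{1}, \mathbf{S}_{kij} \geq 0 \text{ for } 1 \leq i,j \leq K\}\). For \(\mathbf {S}_{k_{1}},\mathbf {S}_{k_{2}} \in \mathbf {C}\) and \(\theta_{1} + \theta_{2} = 1\) such that \(\theta_{1},\theta_{2} \geq 0\), let \(\mathbf {S}_{k_{3}}=\theta _{1}\mathbf {S}_{k_{1}} + \theta _{2}\mathbf {S}_{k_{2}}\). Then \(\mathbf {S}_{k_{3}}\mathbf {1} = (\theta _{1}\mathbf {S}_{k_{1}} + \theta _{2}\mathbf {S}_{k_{2}})\mathbf {1} = (\theta_{1} + \theta_{2})\mathbf {1} = \mathbf {1}\), and every entry of \(\mathbf {S}_{k_{3}}\) is non-negative because it is a non-negative combination of non-negative entries, indicating \(\mathbf {S}_{k_{3}} \in \mathbf {C}\). Thus \(\mathbf{C}\) is convex by the definition of a convex set [5]. □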

Lemma 4

The solution of the optimization problem in (7) is represented as (2).


The Lagrangian \({\mathscr{L}}(\cdot )\) of the objective function in (7) is represented as follows:

$$ \begin{array}{@{}rcl@{}} \mathcal{L}(\mathbf{S}_{k}, \mathbf{\lambda}, \mathbf{\nu}) &=& -\sum\limits_{i=1}^{K}\sum\limits_{j=1}^{K}w_{j}N_{kij}\log\mathbf{S}_{kij} - \sum\limits_{i=1}^{K}\sum\limits_{j=1}^{K}\lambda_{ij}\mathbf{S}_{kij} \\ &+& \sum\limits_{i=1}^{K}\nu_{i}\left( \sum\limits_{j=1}^{K}\mathbf{S}_{kij}-1\right) \end{array} $$

where \(\lambda\) and \(\nu\) are the Lagrange multipliers for the inequality and equality constraints, respectively. Let \(\hat {\mathbf {S}}_{kij}\) be the solution that minimizes (7), and let \(\lambda^{*}\) and \(\nu^{*}\) denote the optimal points for \(\lambda\) and \(\nu\), respectively. The stationarity condition \(\nabla _{\mathbf {S}_{k}}{\mathscr{L}}(\hat {\mathbf {S}}_{k},\lambda ^{*},\nu ^{*})=0\) implies the following equation:

$$ \frac{\partial \mathcal{L}(\hat{\mathbf{S}}_{k}, \mathbf{\lambda^{*}}, \mathbf{\nu^{*}})}{\partial \mathbf{S}_{kij}} = - \frac{w_{j}N_{kij}}{\hat{\mathbf{S}}_{kij}} - \lambda^{*}_{ij} + \nu^{*}_{i} = 0 \Leftrightarrow \hat{\mathbf{S}}_{kij} = \frac{w_{j}N_{kij}}{\nu^{*}_{i}-\lambda^{*}_{ij}} $$

By the complementary slackness \(\lambda ^{*}_{ij}\hat {\mathbf {S}}_{kij} = 0\), primal feasibility \(\hat {\mathbf {S}}_{kij} \geq 0\), and dual feasibility \(\lambda ^{*}_{ij} \geq 0\),

$$ \begin{array}{@{}rcl@{}} \bullet \text{ } \hat{\mathbf{S}}_{kij} > 0 \Rightarrow \lambda^{*}_{ij} &=& 0 \Leftrightarrow \hat{\mathbf{S}}_{kij} = \frac{w_{j}N_{kij}}{\nu^{*}_{i}} > 0 \Leftrightarrow w_{j}N_{kij} \neq 0\\ \bullet \text{ } \lambda^{*}_{ij} > 0 \Rightarrow \hat{\mathbf{S}}_{kij} &=& 0 \Leftrightarrow \hat{\mathbf{S}}_{kij} = \frac{w_{j}N_{kij}}{\nu^{*}_{i}\! -\! \lambda^{*}_{ij}} = 0 \Leftrightarrow w_{j}N_{kij} = 0 \end{array} $$

For the case that \(\hat {\mathbf {S}}_{kij} > 0\), \(\nu ^{*}_{i}\) is obtained from the equality constraint \({{\sum }_{z=1}^{K}\hat {\mathbf {S}}_{kiz}=1}\) as follows:

$$ \begin{array}{@{}rcl@{}} \sum\limits_{z=1}^{K}\hat{\mathbf{S}}_{kiz} &=& \!\!\!\!\!\underset{ \{z | \hat{\mathbf{S}}_{kiz} > 0 \}}{\sum}\!\!\!\!\! \hat{\mathbf{S}}_{kiz} = \!\!\!\!\!\underset{ \{z | \hat{\mathbf{S}}_{kiz} > 0 \}}{\sum}\!\!\!\!\! \frac{w_{z}N_{kiz}}{\nu^{*}_{i}} = 1 \Leftrightarrow\\ \nu^{*}_{i} &=& \!\!\!\!\!\!\! \underset{ \{z | \hat{\mathbf{S}}_{kiz} > 0 \}}{\sum} \!\!\!\!\! w_{z}N_{kiz} = \!\!\!\!\!\!\!\! \underset{ \{z | \hat{\mathbf{S}}_{kiz} > 0 \}}{\sum}\!\!\!\!\! w_{z}N_{kiz} + \!\!\!\!\!\!\!\!\underset{ \{z | \hat{\mathbf{S}}_{kiz} = 0 \}}{\sum}\!\!\!\!\! w_{z}N_{kiz} \! = \!\! \sum\limits_{z=1}^{K}w_{z}N_{kiz} \end{array} $$

Hence, \(\hat {\mathbf {S}}_{kij} = w_{j}N_{kij}/\nu ^{*}_{i} = w_{j}N_{kij}/({\sum }_{z=1}^{K}w_{z}N_{kiz})\). □
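The closed form derived in Lemma 4 can be illustrated with a minimal sketch. The function name `estimate_label_transitions`, the toy counts, and the toy weights are illustrative assumptions, not part of the original algorithms; rows without any observation are left as zeros here, which is also an assumption of this sketch.

```python
def estimate_label_transitions(counts, weights):
    """Closed form of Lemma 4 for one fixed edge label k:
    S[i][j] = w[j] * N[i][j] / sum_z (w[z] * N[i][z]).

    counts  -- K x K list of lists, counts[i][j] plays the role of N_kij
    weights -- list of length K, weights[j] plays the role of w_j
    """
    K = len(weights)
    S = []
    for i in range(K):
        weighted = [weights[j] * counts[i][j] for j in range(K)]
        total = sum(weighted)  # corresponds to nu_i^* in the proof
        if total == 0:
            # No observation for source label i (assumption: keep a zero row).
            S.append([0.0] * K)
        else:
            S.append([x / total for x in weighted])
    return S


# Hypothetical toy data with K = 2 labels.
counts = [[3, 1], [0, 2]]
weights = [1.0, 2.0]
S = estimate_label_transitions(counts, weights)
for row in S:
    print(row)
```

Each non-empty row of the result sums to 1, matching the equality constraint of the optimization problem.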

Appendix B: Lemma for recursive equation of MuRWR score matrix

Lemma 5

Equation (3) is represented as (4).


In (3), let \(l_{p}\) denote \(l_{vu}\). For edge \(v \rightarrow u\),

$$ l_{i}) = \sum\limits_{j=1}^{K}\mathbf{R}_{vj}P(l_{j} \xrightarrow{l_{p}} l_{i}) = {\sum}_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{pji} $$

where \(\mathbf{S}_{pji}\) is the label transition probability \(P(l_{j} \! \xrightarrow {l_{p}} \! l_{i})\). By Definition 6, \(\tilde {\mathbf {A}}_{pvu} = |\overrightarrow {\mathbf {N}}_{v}|^{-1} = {{\tilde {\mathbf {A}}}^{\top }}_{puv}\) for all p when \(\tilde {\mathbf {A}}_{puv}\) is non-zero. Hence,

$$ \underset{v \in \overleftarrow{\mathbf{N}}_{u}}{\sum} \left( \frac{1}{\lvert \overrightarrow{\mathbf{N}}_{v} \rvert} \sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{pji}\right) = \underset{v \in \overleftarrow{\mathbf{N}}_{u}}{\sum} {\tilde{\mathbf{A}}^{\top}}_{puv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{pji} $$

Let \({\overleftarrow {\mathbf {N}}}_{u}^{(i)}\) be the set of in-neighbors of node u such that node \(v \in \overleftarrow {\mathbf {N}}_{u}^{(i)}\) is connected to node u with edge label \(l_{i}\). Then \(\overleftarrow {\mathbf {N}}_{u}\) is represented as \(\overleftarrow {\mathbf {N}}_{u} = \overleftarrow {\mathbf {N}}_{u}^{(1)} \cup {\cdots } \cup \overleftarrow {\mathbf {N}}_{u}^{(K)}\).

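If there is no \(l_{i}\)-labeled edge to node u from any in-neighbor node, then \(\overleftarrow {\mathbf {N}}_{u}^{(i)}\) is empty and contributes nothing to the sum. Thus (8) is represented as follows: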

$$ \begin{array}{@{}rcl@{}} &&\underset{v \in \overleftarrow{\mathbf{N}}_{u}}{\sum}{\tilde{\mathbf{A}}^{\top}}_{puv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{pji} \\ &=& \!\!\!\!\underset{v \in \overleftarrow{\mathbf{N}}_{u}^{(1)}}{\sum}{\tilde{\mathbf{A}}^{\top}}_{1uv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{1ji}+ {\cdots} + \!\!\!\underset{v \in \overleftarrow{\mathbf{N}}_{u}^{(K)}}{\sum}{\tilde{\mathbf{A}}^{\top}}_{Kuv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{Kji} \\ &=& \sum\limits_{k=1}^{K}\left( \underset{v \in \overleftarrow{\mathbf{N}}_{u}^{(k)}}{\sum}{\tilde{\mathbf{A}}^{\top}}_{kuv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{kji}\right) \end{array} $$

Let \((\cdot)_{ij}\) be the (i,j)-th entry of a matrix. Then, \({\sum }_{v \in \overleftarrow {\mathbf {N}}_{u}^{(k)}}{\tilde {\mathbf {A}}^{\top }}_{kuv}{\sum }_{j=1}^{K}\mathbf {R}_{vj}\mathbf {S}_{kji}\) in the above equation is written as:

$$ \underset{v \in \overleftarrow{\mathbf{N}}_{u}^{(k)}}{ \sum}{\tilde{\mathbf{A}}^{\top}}_{kuv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{kji} = \underset{v \in \overleftarrow{\mathbf{N}}_{u}^{(k)}}{\sum}{\tilde{\mathbf{A}}^{\top}}_{kuv}(\mathbf{R}\mathbf{S}_{k})_{vi} = ({\tilde{\mathbf{A}}^{\top}}_{k}\mathbf{R}\mathbf{S}_{k})_{ui} $$

Then (9) is represented as follows:

$$ \sum\limits_{k=1}^{K}\left( \underset{v \in \overleftarrow{\mathbf{N}}_{u}^{(k)}}{\sum}{\tilde{\mathbf{A}}^{\top}}_{kuv}\sum\limits_{j=1}^{K}\mathbf{R}_{vj}\mathbf{S}_{kji}\right) = \sum\limits_{k=1}^{K}({\tilde{\mathbf{A}}^{\top}}_{k}\mathbf{R}\mathbf{S}_{k})_{ui} = \left( \sum\limits_{k=1}^{K}{\tilde{\mathbf{A}}^{\top}}_{k}\mathbf{R}\mathbf{S}_{k}\right)_{\!\!ui} $$

Thus \(\mathbf{R}_{ui}\) in (3) is written as follows:

$$ \mathbf{R}_{ui} = (1-c) \left( \sum\limits_{k=1}^{K}{\tilde{\mathbf{A}}^{\top}}_{k}\mathbf{R}\mathbf{S}_{k}\right)_{\!\!ui} \!\!\! + c\mathbf{1}(u=s, l_{i}=l_{d}) $$

For \(1 \leq u \leq n\) and \(1 \leq i \leq K\) where n is the number of nodes, the above equation is represented as (4). □
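The element-wise recursion above can be turned into a fixed-point iteration. The following is a minimal sketch under stated assumptions, not the paper's Algorithm 3: the function name `murwr_scores`, the edge-list representation `(v, u, k)`, the initialization from the restart indicator, the L1 stopping rule, and the toy graph are all illustrative choices, and the graph is assumed to have no deadend nodes.

```python
def murwr_scores(n, K, edges, S, s, d, c=0.15, eps=1e-9, max_iter=1000):
    """Iterate R <- (1 - c) * sum_k A~_k^T R S_k + c * Q until the L1 change <= eps.

    n, K  -- number of nodes and number of edge labels
    edges -- list of (v, u, k): an edge v -> u carrying label index k
    S     -- S[k][j][i] = P(l_j --l_k--> l_i); each S[k] is row-stochastic
    s, d  -- seed node and the index of the surfer's initial label
    """
    out_deg = [0] * n
    for v, _, _ in edges:
        out_deg[v] += 1

    # Assumption of this sketch: start from the restart indicator Q.
    R = [[0.0] * K for _ in range(n)]
    R[s][d] = 1.0

    for _ in range(max_iter):
        new_R = [[0.0] * K for _ in range(n)]
        new_R[s][d] = c  # restart term c * 1(u = s, l_i = l_d)
        for v, u, k in edges:
            a = (1.0 - c) / out_deg[v]  # (1 - c) times the entry of A~_k for v -> u
            for i in range(K):
                new_R[u][i] += a * sum(R[v][j] * S[k][j][i] for j in range(K))
        delta = sum(abs(new_R[u][i] - R[u][i]) for u in range(n) for i in range(K))
        R = new_R
        if delta <= eps:
            break
    return R


# Hypothetical toy graph: 0 -> 1 with label 0, and 1 -> 0 with label 1.
edges = [(0, 1, 0), (1, 0, 1)]
S = [
    [[1.0, 0.0], [0.0, 1.0]],  # label 0 keeps the surfer's label
    [[0.0, 1.0], [1.0, 0.0]],  # label 1 swaps the surfer's label
]
R = murwr_scores(n=2, K=2, edges=edges, S=S, s=0, d=0)
for row in R:
    print(row)
```

Looping over edges instead of forming dense matrices mirrors the sum over in-neighbors in the proof: each edge contributes one term \(|\overrightarrow{\mathbf{N}}_{v}|^{-1}\sum_{j}\mathbf{R}_{vj}\mathbf{S}_{kji}\) to row u.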

Appendix C: Lemma for spectral radius in convergence theorem

Lemma 6

Suppose \(\tilde {\mathbf {B}}^{\top } = {\sum }_{k=1}^{K}\mathbf {S}^{\top }_{k} \otimes {\tilde {\mathbf {A}}^{\top }}_{k}\) where \(\tilde {\mathbf {A}}_{k}\) is the k-th labeled semi-row-normalized matrix, and \(\mathbf{S}_{k}\) is the k-th label transition probability matrix. Then, \(\| \tilde {\mathbf {B}}^{\top } \|_{1} \leq 1\), and the spectral radius of \(\tilde {\mathbf {B}}^{\top }\) is bounded as follows: \(\rho (\tilde {\mathbf {B}}^{\top }) \leq 1\).


According to the spectral radius theorem [39], \(\rho (\tilde {\mathbf {B}}^{\top }) \leq \| \tilde {\mathbf {B}}^{\top } \|_{1}\), where \(\| \tilde {\mathbf {B}}^{\top } \|_{1}\) is the maximum absolute column sum of \(\tilde {\mathbf {B}}^{\top }\). Since each entry of \(\tilde {\mathbf {B}}^{\top }\) is non-negative, \(\| \tilde {\mathbf {B}}^{\top } \|_{1}\) is equal to the maximum value of the column sums of the matrix. The column sums are represented as follows:

$$ \begin{array}{@{}rcl@{}} (\mathbf{1}^{\top} \otimes \mathbf{1}^{\top})\tilde{\mathbf{B}}^{\top} &=& (\mathbf{1}^{\top} \otimes \mathbf{1}^{\top})\left( \sum\limits_{k=1}^{K}\mathbf{S}^{\top}_{k} \otimes {\tilde{\mathbf{A}}^{\top}}_{k}\right) \\ & =&\sum\limits_{k=1}^{K}(\mathbf{1}^{\top} \otimes \mathbf{1}^{\top})(\mathbf{S}^{\top}_{k} \otimes {\tilde{\mathbf{A}}^{\top}}_{k}) = \sum\limits_{k=1}^{K}\mathbf{1}^{\top}\mathbf{S}^{\top}_{k} \otimes \mathbf{1}^{\top}{\tilde{\mathbf{A}}^{\top}}_{k} \end{array} $$

According to Definition 3, the sum of each row of \(\mathbf{S}_{k}\) is 1, i.e., \(\mathbf {S}_{k}\mathbf {1} = \mathbf {1} \Leftrightarrow \mathbf {1}^{\top }\mathbf {S}^{\top }_{k} = \mathbf {1}^{\top }\). Hence,

$$ \begin{array}{@{}rcl@{}} \sum\limits_{k=1}^{K}\mathbf{1}^{\top}\mathbf{S}^{\top}_{k} \otimes \mathbf{1}^{\top}{\tilde{\mathbf{A}}^{\top}}_{k} &=& \sum\limits_{k=1}^{K}\mathbf{1}^{\top} \otimes \mathbf{1}^{\top}{\tilde{\mathbf{A}}^{\top}}_{k} = \mathbf{1}^{\top} \otimes \sum\limits_{k=1}^{K}\mathbf{1}^{\top}{\tilde{\mathbf{A}}^{\top}}_{k} \\ &=& \mathbf{1}^{\top} \otimes \mathbf{1}^{\top}\sum\limits_{k=1}^{K}{\tilde{\mathbf{A}}^{\top}}_{k} \end{array} $$

Note that \({\tilde {\mathbf {A}}^{\top }}_{k}={\mathbf {A}}_{k}^{\top }\mathbf {D}^{-1}\) according to Definition 5. Suppose \(\mathbf {A}^{\prime }={\sum }_{k=1}^{K}\mathbf {A}_{k}\) is the adjacency matrix of the graph G without edge labels. Then, \(\mathbf {1}^{\top }{\sum }_{k=1}^{K}{\tilde {\mathbf {A}}^{\top }}_{k} = \mathbf {1}^{\top }{\sum }_{k=1}^{K}{\mathbf {A}}_{k}^{\top }\mathbf {D}^{-1} = (\mathbf {A}^{\prime }\mathbf {1})^{\top }\mathbf {D}^{-1}\). The u-th entry of \(\mathbf {A}^{\prime }\mathbf {1}\) indicates the out-degree of node u, denoted by \(deg_{u}\). If node u is a deadend node, then \(deg_{u} = 0\). Otherwise, \(deg_{u} > 0\). Note that \(\mathbf{D}\) is the out-degree diagonal matrix of G and \({\mathbf {D}}^{-1}_{uu} = 1/deg_{u}\) if node u is not a deadend. Otherwise, \(\mathbf {D}^{-1}_{uu} = 0\). Thus, \((\mathbf {A}^{\prime }\mathbf {1})^{\top }\mathbf {D}^{-1} = \mathbf {b}^{\top }\) where the u-th entry of \(\mathbf{b}\) is 1 if node u is a non-deadend, or 0 otherwise. Hence, \((\mathbf {1}^{\top } \otimes \mathbf {1}^{\top })\tilde {\mathbf {B}}^{\top } = \mathbf {1}^{\top } \otimes \mathbf {b}^{\top }\), which is the column sum vector of \(\tilde {\mathbf {B}}^{\top }\), and the maximum value of the vector is less than or equal to 1. Therefore, \(\| \tilde {\mathbf {B}}^{\top } \|_{1} \leq 1\), implying \(\rho (\tilde {\mathbf {B}}^{\top }) \leq \| \tilde {\mathbf {B}}^{\top } \|_{1} \leq 1\). □
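The column-sum argument of Lemma 6 can be checked numerically on a small graph. The sketch below is only an illustration: the helper names (`transpose`, `kron`, `build_B_transpose`, `column_sums`), the dense list-of-lists representation, and the toy graph with one deadend node are assumptions, and the graph is assumed to have at most one edge per ordered node pair.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def kron(X, Y):
    """Kronecker product of two dense matrices given as lists of lists."""
    return [[x * y for x in xrow for y in yrow] for xrow in X for yrow in Y]


def build_B_transpose(n, edges, S):
    """Form B~^T = sum_k S_k^T (x) A~_k^T from labeled edges (v, u, k)."""
    K = len(S)
    out_deg = [0] * n
    for v, _, _ in edges:
        out_deg[v] += 1
    # A[k][v][u] = 1 / out_deg[v] if v -> u has label k; deadend rows stay zero.
    A = [[[0.0] * n for _ in range(n)] for _ in range(K)]
    for v, u, k in edges:
        A[k][v][u] = 1.0 / out_deg[v]
    size = K * n
    B_t = [[0.0] * size for _ in range(size)]
    for k in range(K):
        term = kron(transpose(S[k]), transpose(A[k]))
        for r in range(size):
            for col in range(size):
                B_t[r][col] += term[r][col]
    return B_t


def column_sums(M):
    return [sum(col) for col in zip(*M)]


# Hypothetical toy graph with 3 nodes and K = 2 labels; node 2 is a deadend.
edges = [(0, 1, 0), (0, 2, 1), (1, 2, 0)]
S = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.5, 0.5], [1.0, 0.0]],
]
sums = column_sums(build_B_transpose(3, edges, S))
print(sums)
print(max(sums))
```

Up to floating-point rounding, the column sums follow the pattern \(\mathbf{1}^{\top} \otimes \mathbf{b}^{\top}\): 1 for columns of non-deadend nodes and 0 for columns of the deadend node, so the maximum column sum does not exceed 1.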

Appendix D: Lemma for Complexity Analysis

Lemma 7

The time complexity of Algorithms 1 and 2 is \(O(m^{1.5} + K^{3})\) where m is the number of edges, and K is the number of edge labels.


In Algorithm 1, it takes \(O(m^{1.5})\) time to enumerate all transitive triangles in the given graph G using a triangle enumeration algorithm [17] (line 1 in Algorithm 1). Also, estimating \(\mathbf{S}_{k}\) requires \(O(K^{3})\) time (lines 2–8 in Algorithm 1). Algorithm 2 takes \(O(m)\) time for counting out-degrees of nodes (line 2 in Algorithm 2) and computing \(\tilde {\mathbf {A}}_{k} = \mathbf {D}^{-1}\mathbf {A}_{k}\) for \(1 \leq k \leq K\) (line 3 in Algorithm 2). Since \(O(m)\) is dominated by \(O(m^{1.5})\), the total time complexity is \(O(m^{1.5} + K^{3})\). □

Lemma 8

The time complexity of Algorithm 3 is \(O(T(Km + K^{3}n))\) where \(T=\log _{(1-c)}\frac {\epsilon }{2}\) indicates the number of iterations for convergence, \(\epsilon\) is an error tolerance, m is the number of edges, n is the number of nodes, and K is the number of edge labels.


Let \(m_{k}\) denote the number of non-zeros in the k-th semi-row-normalized matrix \(\tilde {\mathbf {A}}_{k}\) stored in a sparse matrix format such as compressed column storage (CCS). For each iteration, it takes \(O(Km_{k} + K^{2}n)\) time to compute \({{\tilde {\mathbf {A}}}^{\top }}_{k}\mathbf {R}^{(t-1)}\mathbf {S}_{k}\) since the sparse matrix product \({\tilde {\mathbf {A}}^{\top }}_{k}\mathbf {R}^{(t-1)}\) requires \(O(Km_{k})\) time, and the dense matrix product \(({\tilde {\mathbf {A}}^{\top }}_{k}\mathbf {R}^{(t-1)})\mathbf {S}_{k}\) takes \(O(K^{2}n)\) time. Thus, computing \({\sum }_{k=1}^{K}({\tilde {\mathbf {A}}^{\top }}_{k}\mathbf {R}^{(t-1)}\mathbf {S}_{k})\) takes \(O({\sum }_{k=1}^{K}(Km_{k} + K^{2}n)) = O(Km + K^{3}n)\) where \({\sum }_{k=1}^{K}m_{k} = m\) (line 6). Note that when \(2(1-c)^{t} \leq \epsilon\), \(\mathbf{R}^{(t)}\) converges since \(\delta^{(t)} \leq 2(1-c)^{t}\) by Theorem 1. Hence, for \(t \geq \log _{(1-c)}\frac {\epsilon }{2}\), the iteration is necessarily terminated. Thus, the number of iterations for convergence is estimated at \(\log _{(1-c)}\frac {\epsilon }{2}\), and the total time complexity is \(O((\log _{(1-c)}\frac {\epsilon }{2})(Km + K^{3}n))\). □
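To make the iteration bound concrete, the sketch below evaluates \(T=\log_{(1-c)}\frac{\epsilon}{2}\) by a change of base and rounds it up to an integer. The function name `iteration_bound` and the sample values c = 0.15 and \(\epsilon = 10^{-9}\) are illustrative assumptions, not settings taken from the paper.

```python
import math


def iteration_bound(c, eps):
    """Smallest integer t with 2 * (1 - c)**t <= eps, via
    log_{1-c}(eps / 2) = ln(eps / 2) / ln(1 - c)."""
    if not (0.0 < c < 1.0) or eps <= 0.0:
        raise ValueError("expected 0 < c < 1 and eps > 0")
    return math.ceil(math.log(eps / 2.0) / math.log(1.0 - c))


# Hypothetical settings for the restart probability and the tolerance.
c, eps = 0.15, 1e-9
T = iteration_bound(c, eps)
print(T, 2 * (1 - c) ** T)
```

Because \(\log(1-c)\) is negative, dividing by it flips the inequality, which is why the bound is a lower limit on t; a larger restart probability c shrinks the bound.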

About this article

Cite this article

Jung, J., Jin, W., Park, Hm. et al. Accurate relational reasoning in edge-labeled graphs by multi-labeled random walk with restart. World Wide Web 24, 1369–1393 (2021).


  • Relational reasoning
  • Multi-labeled random walk with restart
  • Edge-labeled graphs