Random walk-based ranking in signed social networks: model and algorithms

Abstract

How can we rank nodes in signed social networks? Relationships between nodes in a signed network are represented as positive (trust) or negative (distrust) edges. Many social networks have adopted signed networks to express trust between users. Consequently, ranking friends or enemies in signed networks has received much attention from the data mining community. The ranking problem, however, is challenging because it is difficult to interpret negative edges. Traditional random walk-based methods such as PageRank and random walk with restart cannot provide effective rankings in signed networks since they assume only positive edges. Although several methods have been proposed by modifying traditional ranking models, they also fail to produce proper rankings because they cannot capture complex relations between signed edges. In this paper, we propose Signed Random Walk with Restart (SRWR), a novel model for personalized ranking in signed networks. We introduce a signed random surfer who accounts for negative edges by flipping her sign as she walks. Based on the signed random walk, our model provides proper rankings that reflect edge signs. We develop two methods for computing SRWR scores: SRWR-Iter and SRWR-Pre, which are iterative and preprocessing methods, respectively. SRWR-Iter naturally follows the definition of SRWR and iteratively updates SRWR scores until convergence. SRWR-Pre enables fast ranking computation, which is important for applications of SRWR. Through extensive experiments, we demonstrate that SRWR achieves the best accuracy for link prediction, predicts trolls \(4\times \) more accurately, and shows satisfactory performance for inferring missing signs of edges compared to other competitors.
In terms of efficiency, SRWR-Pre preprocesses a signed network \(4.5 \times \) faster and requires \(11 \times \) less memory space than other preprocessing methods; furthermore, SRWR-Pre computes SRWR scores up to \(14 \times \) faster than other methods in the query phase.



Notes

  1. SALSA [25] is a normalized version of HITS [22].

References

  1. Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the fourth ACM international conference on web search and data mining. ACM, pp 635–644

  2. Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. Proc VLDB Endow 4(3):173–184

  3. Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  4. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  5. Cartwright D, Harary F (1956) Structural balance: a generalization of Heider's theory. Psychol Rev 63(5):277

  6. Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20(2):181–187

  7. Duff IS, Grimes RG, Lewis JG (1989) Sparse matrix test problems. ACM Trans Math Softw (TOMS) 15(1):1–14

  8. Easley D, Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge

  9. Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. Proc VLDB Endow 5(5):442–453

  10. Gleich DF, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 597–605

  11. Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU Press

  12. Guha R, Kumar R, Raghavan P, Tomkins A (2004) Propagation of trust and distrust. In: Proceedings of the 13th international conference on World Wide Web. ACM, pp 403–412

  13. Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, pp 517–526

  14. Heider F (1946) Attitudes and cognitive organization. J Psychol 21(1):107–112

  15. Jin W, Jung J, Kang U (2019) Supervised and extended restart in random walks for ranking and link prediction in networks. PLoS ONE 14(3):e0213857

  16. Jung J, Jin W, Sael L, Kang U (2016) Personalized ranking in signed networks using signed random walk with restart. In: IEEE 16th international conference on data mining (ICDM 2016), Barcelona, Spain, pp 973–978. http://dx.doi.org/10.1109/ICDM.2016.0122

  17. Jung J, Park N, Sael L, Kang U (2017) BePI: fast and memory-efficient method for billion-scale random walk with restart. In: Proceedings of the 2017 ACM SIGMOD international conference on management of data, Chicago, IL, USA, pp 789–804

  18. Jung J, Shin K, Sael L, Kang U (2016) Random walk with restart on large graphs using block elimination. ACM Trans Database Syst 41(2):12. https://doi.org/10.1145/2901736

  19. Kang U, Faloutsos C (2011) Beyond 'caveman communities': hubs and spokes for graph compression and mining. In: IEEE 11th international conference on data mining (ICDM 2011)

  20. Kang U, Tong H, Sun J (2012) Fast random walk graph kernel. In: Proceedings of the twelfth SIAM international conference on data mining, Anaheim, California, USA, pp 828–838

  21. Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM 46(5):604–632

  22. Kleinberg JM (1999) Hubs, authorities, and communities. ACM Comput Surv 31(4es):5

  23. Kunegis J, Lommatzsch A, Bauckhage C (2009) The Slashdot Zoo: mining a social network with negative edges. In: Proceedings of the 18th international conference on World Wide Web. ACM, pp 741–750

  24. Langville AN, Meyer CD, Fernández P (2008) Google's PageRank and beyond: the science of search engine rankings. Math Intell 30(1):68–69

  25. Lempel R, Moran S (2001) SALSA: the stochastic approach for link-structure analysis. ACM Trans Inf Syst 19(2):131–160

  26. Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World Wide Web. ACM, pp 641–650

  27. Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1361–1370

  28. Lim Y, Kang U, Faloutsos C (2014) SlashBurn: graph compression and mining beyond caveman communities. IEEE Trans Knowl Data Eng 26(12):3077–3089

  29. Mishra A, Bhattacharya A (2011) Finding the bias and prestige of nodes in networks based on trust scores. In: Proceedings of the 20th international conference on World Wide Web. ACM, pp 567–576

  30. Ng AY, Zheng AX, Jordan MI (2001) Stable algorithms for link analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 258–266

  31. Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the Web

  32. Saad Y (2003) Iterative methods for sparse linear systems, vol 82. SIAM

  33. Shahriari M, Jalili M (2014) Ranking nodes in signed social networks. Soc Netw Anal Min 4(1):1–12

  34. Shin K, Jung J, Lee S, Kang U (2015) BEAR: block elimination approach for random walk with restart on large graphs. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 1571–1585

  35. Song D, Meyer DA (2015) Recommending positive links in signed social networks by optimizing a generalized AUC. In: AAAI, pp 290–296

  36. Strang G (2006) Linear algebra and its applications. Thomson, Brooks/Cole. https://books.google.ie/books?id=q9CaAAAACAAJ

  37. Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc Natl Acad Sci 107(31):13636–13641

  38. Taylor ME (2006) Measure theory and integration. American Mathematical Society, Providence

  39. Tong H, Faloutsos C, Gallagher B, Eliassi-Rad T (2007) Fast best-effort pattern matching in large attributed graphs. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 737–746

  40. Tong H, Faloutsos C, Pan J-Y (2008) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14(3):327–346

  41. Wu Z, Aggarwal CC, Sun J (2016) The troll-trust model for ranking in signed networks. In: Proceedings of the ninth ACM international conference on web search and data mining. ACM, pp 447–456

  42. Yang B, Cheung WK, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19(10):1333–1348

  43. Yoon M, Jin W, Kang U (2018) Fast and accurate random walk with restart on dynamic graphs with guarantees. In: Proceedings of the 2018 World Wide Web conference (WWW 2018), Lyon, France, pp 409–418

  44. Yoon M, Jung J, Kang U (2018) TPA: fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In: 34th IEEE international conference on data engineering (ICDE 2018), Paris, France


Acknowledgements

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2013-0-00179, Development of Core Technology for Context-aware Deep-Symbolic Hybrid Learning and Construction of Language Resources]. The Institute of Engineering Research and the ICT at Seoul National University provided research facilities for this work.

Author information


Corresponding author

Correspondence to U Kang.


Appendix

Details of the hub-and-spoke reordering method

SlashBurn [19, 28] is a node reordering algorithm which concentrates the nonzero entries of the adjacency matrix of a given graph based on the hub-and-spoke structure. Let n be the number of nodes in a graph, and let t be the hub selection ratio with \(0< t < 1\); then \(\lceil tn \rceil \) is the number of nodes SlashBurn selects as hubs in each iteration. For each iteration, SlashBurn disconnects the \(\lceil tn \rceil \) highest-degree nodes, called hub nodes, from the graph; the graph is then split into the giant connected component (GCC) and the disconnected components. The nodes in the disconnected components are called spokes, and each disconnected component forms a block in \(\mathbf {|H|}_{11}\) (or \(\mathbf {T}_{11}\)) in Fig. 6. SlashBurn then reorders nodes such that the hub nodes get the highest ids, the spokes get the lowest ids, and the nodes in the GCC get the ids in the middle. SlashBurn repeats this procedure recursively on the GCC until the size of the GCC becomes smaller than \(\lceil tn \rceil \). After SlashBurn finishes, the reordered adjacency matrix contains a large and sparse block diagonal matrix in the upper left area, as shown in Fig. 6. Figure 14 depicts the procedure of SlashBurn when \(\lceil tn \rceil =1\).
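The procedure can be sketched in Python as follows (a simplified illustration rather than the reference implementation of [19, 28]; the function names and 0-based ids are our own, and ties among equal-degree hubs are broken arbitrarily):

```python
from collections import deque

def components(nodes, adj):
    """Connected components of the subgraph induced by `nodes` (BFS)."""
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = [], deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def slashburn(adj, k=1):
    """Hub-and-spoke ordering: spokes get the lowest ids, hubs the highest,
    and the GCC is processed recursively in between.  `adj` maps each node
    to the set of its neighbors; position in the returned list is the id."""
    nodes = set(adj)
    front, back = [], []                       # spokes (low ids), hubs (high ids)
    while len(nodes) > k:
        hubs = sorted(nodes, key=lambda u: -len(adj[u] & nodes))[:k]
        back = hubs + back                     # later hubs sit just below earlier hubs
        nodes -= set(hubs)
        comps = sorted(components(nodes, adj), key=len)
        gcc = comps[-1] if comps else []
        for comp in comps[:-1]:                # disconnected components become spokes
            front.extend(comp)
            nodes -= set(comp)
        if not gcc:
            break
    front.extend(nodes)                        # remaining GCC nodes take the middle ids
    return front + back
```

Running it on a small star-plus-chain graph removes the star center first (it becomes the last, i.e., highest, id), then recurses on the remaining chain.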

Properties and lemmas

Sum of positive and negative SRWR scores

Property 3

Consider the recursive equation \(\mathbf {p}= (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {q}\) where \(\mathbf {p}= \mathbf {r}^{+}+ \mathbf {r}^{-}\) and \(|{{\tilde{\mathbf {A}}}}|^{\top }\) is a column stochastic matrix. Then \(\mathbf {1}^{\top }\mathbf {p}= \sum _{i}\mathbf {p}_{i} = 1\).

Proof

By multiplying both sides by \(\mathbf {1}^{\top }\), the equation is represented as follows:

$$\begin{aligned} \mathbf {p}= (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {q}\Leftrightarrow \mathbf {1}^{\top }\mathbf {p}= (1-c)\mathbf {1}^{\top }|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {1}^{\top }\mathbf {q}\end{aligned}$$

Note that \(\mathbf {1}^{\top }|{{\tilde{\mathbf {A}}}}|^{\top } = (|{{\tilde{\mathbf {A}}}}|\mathbf {1})^{\top }\), and \(|{{\tilde{\mathbf {A}}}}|\) is a row stochastic matrix; thus, \((|{{\tilde{\mathbf {A}}}}|\mathbf {1})^{\top } = \mathbf {1}^{\top }\). Hence, the above equation is represented as follows:

$$\begin{aligned} \mathbf {1}^{\top }\mathbf {p}= (1-c)\mathbf {1}^{\top }|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {1}^{\top }\mathbf {q}\Leftrightarrow \mathbf {1}^{\top }\mathbf {p}= (1-c)\mathbf {1}^{\top }\mathbf {p}+ c \Leftrightarrow \mathbf {1}^{\top }\mathbf {p}= 1 \end{aligned}$$

\(\square \)
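Property 3 can also be checked numerically; the sketch below iterates the recursion with a small hypothetical column-stochastic matrix and confirms that the entries of \(\mathbf {p}\) keep summing to 1:

```python
# Numerical check of Property 3: iterating p <- (1-c) M p + c q with a
# column-stochastic M preserves sum(p) = 1 at every step, since
# sum(new p) = (1-c) * sum(p) + c.
c = 0.15
M = [[0.0, 0.5, 1.0],   # each column sums to 1 (column stochastic)
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
q = [1.0, 0.0, 0.0]     # restart vector for seed node 0
p = q[:]
for _ in range(100):
    p = [(1 - c) * sum(M[i][j] * p[j] for j in range(3)) + c * q[i]
         for i in range(3)]
total = sum(p)
```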

Fig. 14

Node reordering based on the hub-and-spoke method when \(\lceil tn \rceil =1\), where \(\lceil tn \rceil \) indicates the number of selected hubs at each step, and t is the hub selection ratio (\(0< t < 1\)). Red nodes are hubs; blue nodes are spokes belonging to the disconnected components; green nodes belong to the giant connected component. At Step 1 in (a), the method disconnects a hub node and assigns node ids as shown in (b). The hub node gets the highest id (14), the spoke nodes get the lowest ids (1–7), and the GCC gets the middle ids (8–13). The next iteration starts on the GCC in (b), and the node ids are assigned as in (c)

Analysis on number of iterations of SRWR-Iter

Lemma 3

Suppose \(\mathbf {h}= [\mathbf {r}^{+};\mathbf {r}^{-}]^{\top }\), and \(\mathbf {h}^{(k)}\) is the result of kth iteration in SRWR-Iter. Let \(\delta ^{(k)}\) denote the error \(||\mathbf {h}^{(k)} - \mathbf {h}^{(k-1)} ||_{1}\). Then \(\delta ^{(k)} \le 2(1-c)^{k}\), and the estimated number T of iterations for convergence is \(\log _{1-c}\frac{\epsilon }{2}\) where \(\epsilon \) is an error tolerance, and c is the restart probability.

Proof

According to Eq. (4), \(\delta ^{(k)}\) is represented as follows:

$$\begin{aligned} \delta ^{(k)} = ||\mathbf {h}^{(k)} - \mathbf {h}^{(k-1)} ||_{1}&= (1-c) ||{{\tilde{\mathbf {B}}}}^{\top } (\mathbf {h}^{(k-1)} - \mathbf {h}^{(k-2)}) ||_{1} \\&\le (1-c) ||{{\tilde{\mathbf {B}}}}^{\top } ||_{1}||\mathbf {h}^{(k-1)} - \mathbf {h}^{(k-2)} ||_{1} \\&= (1-c) ||\mathbf {h}^{(k-1)} - \mathbf {h}^{(k-2)} ||_{1} = (1-c)\delta ^{(k-1)} \end{aligned}$$

Note that \(||{{\tilde{\mathbf {B}}}}^{\top } ||_{1} = 1\) since \({{\tilde{\mathbf {B}}}}^{\top }\) is column stochastic as described in Theorem 1. Hence, \(\delta ^{(k)} \le (1-c)\delta ^{(k-1)} \le \cdots \le (1-c)^{k-1}\delta ^{(1)}\). Since the iteration starts from \(\mathbf {h}^{(0)} = [\mathbf {q};\mathbf {0}]\), Eq. (4) gives \(\mathbf {h}^{(1)} - \mathbf {h}^{(0)} = (1-c)({{\tilde{\mathbf {B}}}}^{\top }\mathbf {h}^{(0)} - \mathbf {h}^{(0)})\); thus, \(\delta ^{(1)} \le (1-c)(||{{\tilde{\mathbf {B}}}}^{\top }\mathbf {h}^{(0)} ||_{1} + ||\mathbf {h}^{(0)} ||_{1}) = 2(1-c)\), and therefore \(\delta ^{(k)} \le 2(1-c)^{k}\). Note that the iteration of SRWR-Iter terminates when \(\delta ^{(k)} \le \epsilon \). Since \(2(1-c)^{k} \le \epsilon \) holds once \(k \ge \log _{1-c}\frac{\epsilon }{2}\), the number T of iterations for convergence is estimated at \(\log _{1-c}\frac{\epsilon }{2}\). \(\square \)
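The estimated iteration count is easy to evaluate in practice (a small sketch; the parameter values below are illustrative, not prescribed by the paper):

```python
import math

def estimated_iterations(c, eps):
    """T = log_{1-c}(eps / 2): the number of iterations after which the
    error bound 2(1-c)^k of Lemma 3 drops below the tolerance eps."""
    return math.log(eps / 2) / math.log(1 - c)

# With restart probability c = 0.15 and tolerance eps = 1e-9,
# roughly 132 iterations suffice.
T = estimated_iterations(c=0.15, eps=1e-9)
```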

Time complexity of sparse matrix multiplication

Lemma 4

(Sparse Matrix Multiplication [32]) Suppose that \(\mathbf {A}\) and \(\mathbf {B}\) are \(p \times q\) and \(q \times r\) sparse matrices, respectively, and \(\mathbf {A}\) has \( nnz (\mathbf {A})\) nonzeros. Calculating \(\mathbf {C}=\mathbf {A}\mathbf {B}\) using sparse matrix multiplication requires \(O( nnz (\mathbf {A})r)\).
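The bound can be read off a row-wise sparse multiplication: every nonzero \(\mathbf {A}_{ik}\) is combined with row k of \(\mathbf {B}\), which has at most r entries. A dictionary-of-rows sketch (the storage format is a simplification for illustration, not the format used by the paper):

```python
def sparse_matmul(A_rows, B_rows):
    """C = A @ B with matrices stored as {row: {col: value}} dictionaries.
    Each nonzero A[i][k] is combined with row k of B (at most r entries),
    which yields the O(nnz(A) * r) bound of Lemma 4."""
    C = {}
    for i, row in A_rows.items():
        acc = {}
        for k, a in row.items():
            for j, b in B_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a * b
        C[i] = acc
    return C

A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {0: 1.0}, 1: {0: 4.0, 1: 5.0}}
C = sparse_matmul(A, B)
```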

Complexity analysis of proposed methods for SRWR

We analyze the complexity of our proposed methods SRWR-Iter and SRWR-Pre in terms of time and space. The space and time complexities of SRWR-Iter are presented in Lemma 5, and those of SRWR-Pre are in Lemmas 6, 7, and 8, respectively.

Space and time complexities of SRWR-Iter

Lemma 5

(Space and Time Complexities of SRWR-Iter) Let n and m denote the number of nodes and edges of a signed network, respectively. Then the space complexity of Algorithm 2 is \(O(n+m)\). The time complexity of Algorithm 2 is \(O(T(n+m))\) where the number T of iterations is \(\log _{1-c}\frac{\epsilon }{2}\), c is the restart probability, and \(\epsilon \) is an error tolerance.

Proof

The space complexity for \({{\tilde{\mathbf {A}}}_{+}}\) and \({{\tilde{\mathbf {A}}}_{-}}\) is O(m) if we exploit a sparse matrix format such as compressed column storage to store the matrices. We need O(n) for the SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\). Thus, the space complexity is \(O(n+m)\). One iteration in Algorithm 2 takes \(O(n+m)\) time due to sparse matrix vector multiplications and vector additions, where the time complexity of a sparse matrix vector multiplication is linear in the number of nonzeros of the matrix [7]. Hence, the total time complexity is \(O(T(n+m))\) where the number T of iterations is \(\log _{1-c}\frac{\epsilon }{2}\), as proved in Lemma 3. \(\square \)
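The per-iteration cost is visible in a minimal Python sketch of the basic SRWR iteration (a simplified illustration without the balance attenuation factors of the full model; the edge-list representation, function name, and toy graph are our own, and every node is assumed to have at least one outgoing edge so that row normalization is well defined). Each iteration touches every edge once and every node a constant number of times, i.e., O(n + m) work:

```python
def srwr_iter(pos_edges, neg_edges, n, seed, c=0.15, eps=1e-9, max_iter=1000):
    """Basic signed random walk with restart (sketch).  The surfer keeps
    her sign on positive edges and flips it on negative edges:
        r+ <- (1-c) (A+^T r+ + A-^T r-) + c q
        r- <- (1-c) (A-^T r+ + A+^T r-)
    Edges are (u, v) pairs; rows are normalized by total out-degree."""
    out = [0] * n
    for u, v in pos_edges + neg_edges:
        out[u] += 1
    rp, rn = [0.0] * n, [0.0] * n
    rp[seed] = 1.0
    for _ in range(max_iter):
        new_p, new_n = [0.0] * n, [0.0] * n
        for u, v in pos_edges:                 # sign preserved
            new_p[v] += (1 - c) * rp[u] / out[u]
            new_n[v] += (1 - c) * rn[u] / out[u]
        for u, v in neg_edges:                 # sign flipped
            new_n[v] += (1 - c) * rp[u] / out[u]
            new_p[v] += (1 - c) * rn[u] / out[u]
        new_p[seed] += c                       # restart with positive sign
        delta = sum(abs(a - b) for a, b in zip(new_p, rp)) + \
                sum(abs(a - b) for a, b in zip(new_n, rn))
        rp, rn = new_p, new_n
        if delta <= eps:
            break
    return rp, rn
```

On a toy graph where the seed trusts node 1 and distrusts node 2, node 1 accumulates mostly positive score and node 2 mostly negative score, while the total probability mass stays 1 (Property 3).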

Space and time complexities of SRWR-Pre

Lemma 6

(Space Complexity of SRWR-Pre) The space complexity of the preprocessed matrices from SRWR-Pre is \(O(n_{2}^{2} + m)\) where \(n_{2}\) is the number of hubs and m is the number of edges in the graph.

Proof

The space complexity of each preprocessed matrix is summarized in Table 7. \({{\tilde{\mathbf {A}}}_{-}}\), \(\mathbf {|H|}_{12}\), \(\mathbf {|H|}_{21}\), \(\mathbf {T}_{12}\), and \(\mathbf {T}_{21}\) are sparse matrices, and constructed from the input graph; hence, the space complexity is bounded by the number of edges (i.e., O(m)). Note that \(\mathbf {|H|}\) and \(\mathbf {T}\) have the same sparsity pattern; hence, \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) identified by [19, 28] have the same b blocks. The ith block in \(\mathbf {|H|}_{11}^{-1}\) (or \(\mathbf {T}_{11}^{-1}\)) contains \(n_{1i}^{2}\) nonzeros; therefore, \(\mathbf {|H|}_{11}^{-1}\) and \(\mathbf {T}_{11}^{-1}\) require \(O(\sum _{i=1}^{b}n_{1i}^{2})\) space, respectively. Since the dimension of \(\mathbf {L}^{-1}_{\mathbf {|H|}}\), \(\mathbf {U}^{-1}_{\mathbf {|H|}}\), \(\mathbf {L}^{-1}_{\mathbf {T}}\), and \(\mathbf {U}^{-1}_{\mathbf {T}}\) is \(n_2\), they require \(O(n_2^2)\) space. \(\square \)

Note that the blocks in \(\mathbf {|H|}_{11}\) (or \(\mathbf {T}_{11}\)) are discovered by the reordering method [19, 28] as briefly described in Appendix A.1. In real-world graphs, \(\sum _{i=1}^{b}n_{1i}^{2}\) can be bounded by O(m) as shown in [34]. Hence, we assume that the space complexity of \(\mathbf {|H|}_{11}^{-1}\) and \(\mathbf {T}_{11}^{-1}\) is O(m) for simplicity.

Table 7 Space complexity of each preprocessed matrix from Algorithm 3

Lemma 7

(Time Complexity of Preprocessing Phase in SRWR-Pre) The preprocessing phase in Algorithm 3 takes \(O(T(m+n\log {n}) + n_{2}^{3} + mn_{2})\) where \(T=\lceil {\frac{n_{2}}{tn}}\rceil \) is the number of iterations, and t is the hub selection ratio in the hub-and-spoke reordering method [19, 28].

Proof

We only consider the main factors of the time complexity of Algorithm 3 in this proof. The hub-and-spoke reordering method takes \(O(T(m + n\log {n}))\) time (line 1) where T is \(\lceil {\frac{n_{2}}{tn}}\rceil \) which is proved in [19, 28]. Computing the Schur complement of \(\mathbf {|H|}_{11}\) takes \(O(n_{2}^{2} + mn_{2})\) because it takes \(O(mn_{2})\) to compute \(\mathbf {P}_{1} = \mathbf {|H|}_{11}^{-1}\mathbf {|H|}_{12}\) and \(\mathbf {P}_{2} = \mathbf {|H|}_{21}\mathbf {P}_{1}\) by Lemma 4, and \(O(n_{2}^{2})\) to compute \(\mathbf {|H|}_{22} - \mathbf {P}_{2}\) (line 6). It takes \(O(n_{2}^{3})\) to compute the inverse of the LU factors (line 8). Note that computing \(\mathbf {|H|}^{-1}_{11}\)(line 4) requires \(O(\sum _{i=1}^{b}n_{1i}^{3})\) time where it takes \(n_{1i}^{3}\) to obtain the inverse of ith block. In real-world networks, the size \(n_{1i}\) of each block is much smaller than the number \(n_2\) of hubs; thus, we assume that \(\sum _{i=1}^{b}n_{1i}^{3} \ll n_{2}^{3}\) [34]. Hence, the time complexity of preprocessing \(\mathbf {|H|}\) is \(O(T(m + n\log {n}) + n_{2}^{3} + mn_{2})\). Note that the time complexity of preprocessing \(\mathbf {T}\) is included into that of preprocessing \(\mathbf {|H|}\) since \(\mathbf {T}\) and \(\mathbf {|H|}\) have the same sparsity pattern. \(\square \)

Lemma 8

(Time Complexity of Query Phase in SRWR-Pre) The query phase in Algorithm 4 takes \(O(n_{2}^{2} + n + m)\) time.

Proof

We only consider the main factors of the time complexity of Algorithm 4 in this proof. It takes \(O(n_{2}^{2} + m)\) to compute \(\mathbf {p}_{2}\) since it takes \(O(n_{2} + m)\) to compute \({{\tilde{\mathbf{q}}}}_{2} = \mathbf {q}_{2} - \mathbf {|H|}_{21}(\mathbf {|H|}^{-1}_{11}\mathbf {q}_{1})\), and \(O(n_{2}^{2})\) to compute \(\mathbf {U}^{-1}_{\mathbf {|H|}}(\mathbf {L}^{-1}_{\mathbf {|H|}}{{\tilde{\mathbf{q}}}}_{2})\) (line 2). It takes O(n) time to concatenate the partitioned vectors (lines 4 and 8) and compute \(\mathbf {r}^{+}\) and \(\mathbf {r}\) (lines 9 and 10). Hence, the total time complexity of the query phase is \(O(n_{2}^{2} + n + m)\). \(\square \)
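To illustrate the block elimination behind the query phase, the following sketch solves a tiny hypothetical block system via the Schur complement and checks the result against a direct solve. (In SRWR-Pre the block inverses and LU factors are precomputed in the preprocessing phase, so the query phase performs only the cheap multiplications; here we call a small dense solver in their place.)

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical partitioned system [[H11, h12], [H21, H22]] [x1; x2] = [q1; q2].
H11 = [[2.0, 0.0], [0.0, 4.0]]
h12 = [1.0, 1.0]               # H12 as a single column
H21 = [[1.0, 2.0]]
H22 = 5.0
q1, q2 = [1.0, 2.0], 3.0

# Step 1: Schur complement S = H22 - H21 H11^{-1} H12.
y = solve(H11, h12)
S = H22 - sum(H21[0][j] * y[j] for j in range(2))
# Step 2: x2 = S^{-1} (q2 - H21 H11^{-1} q1).
z = solve(H11, q1)
x2 = (q2 - sum(H21[0][j] * z[j] for j in range(2))) / S
# Step 3: back-substitute x1 = H11^{-1} (q1 - h12 x2).
x1 = solve(H11, [q1[i] - h12[i] * x2 for i in range(2)])

# Sanity check against solving the full 3x3 system directly.
x_full = solve([[2.0, 0.0, 1.0], [0.0, 4.0, 1.0], [1.0, 2.0, 5.0]],
               [1.0, 2.0, 3.0])
```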

Detailed limitations of existing random walk-based ranking models in signed networks

In this section, we describe the detailed limitations of existing random walk-based ranking models, which are briefly discussed in Sect. 1.

  • Random Walk with Restart (RWR): We perform RWR on a given signed network after taking absolute edge weights to obtain \(\mathbf {r}\) as follows:

    $$\begin{aligned} \mathbf {r}= (1-c){|{\tilde{\mathbf {A}}}|}^{\top }\mathbf {r}+ c\mathbf {q} \end{aligned}$$

    where \({|{\tilde{\mathbf {A}}}|}\) is the row-normalized matrix of the absolute adjacency matrix of the signed network. Since edge signs are discarded, RWR does not properly reflect negative edges in \(\mathbf {r}\).

  • Modified Random Walk with Restart (M-RWR) [33]: M-RWR applies RWR separately on both a positive subgraph and a negative subgraph; thus, it obtains \(\mathbf {r}^{+}\) on the positive subgraph and \(\mathbf {r}^{-}\) on the negative subgraph, and then, computes \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\). The detailed equations for M-RWR are as follows:

    $$\begin{aligned} \mathbf {r}^{+}= (1-c){{\tilde{\mathbf {B}}}}_{+}^{\top }\mathbf {r}^{+}+ c\mathbf {q} \text { and } \mathbf {r}^{-}= (1-c){{\tilde{\mathbf {B}}}}_{-}^{\top }\mathbf {r}^{-}+ c\mathbf {q} \end{aligned}$$

    where \({{\tilde{\mathbf {B}}}}_{+}\) is the row-normalized matrix of the adjacency matrix containing only positive edges, and \({{\tilde{\mathbf {B}}}}_{-}\) is that of the absolute adjacency matrix containing only negative edges. The main limitation of M-RWR is that, due to this separation, it does not consider relationships between positive and negative edges, as shown in the above equations.

  • Modified Personalized SALSA (M-PSALSA) [30]: Ng et al. modified SALSA (footnote 1) by introducing a random jump, yielding Personalized SALSA (PSALSA). Similarly to M-RWR, we apply PSALSA separately to the positive and negative subgraphs, and take the authority scores on the positive subgraph as \(\mathbf {r}^{+}\) and those on the negative subgraph as \(\mathbf {r}^{-}\). M-PSALSA has the same limitation as M-RWR.

  • Personalized Signed Spectral Rank (PSR) [23]: Kunegis et al. proposed PSR, a variant of PageRank, based on the following matrix similar to the Google matrix:

    $$\begin{aligned} \mathbf {M}_{PSR} = (1-c)\mathbf {D}^{-1}\mathbf {A}^{\top } + c\mathbf {e}_{s}\mathbf {1}^{\top } \end{aligned}$$

    where \(\mathbf {A}\) is the signed adjacency matrix, \(\mathbf {D}\) is the diagonal out-degree matrix, and \(\mathbf {e}_{s}\) is the sth unit vector. PSR then computes the left eigenvector of \(\mathbf {M}_{PSR}\), which induces a relative trustworthiness score vector \(\mathbf {r}\) containing positive and negative values. Although PSR is able to produce \(\mathbf {r}\), its formulation is heuristic because \(\mathbf {M}_{PSR}\) is not a column stochastic matrix, and it is unclear how the corresponding random surfer interprets negative edges.

Detailed description of evaluation metrics

We describe the details of the metrics used in the link prediction and troll identification tasks. The metrics for the sign prediction task are described in Sect. 5.5.

Link prediction

  • GAUC (Generalized AUC): Song et al. [35] proposed GAUC which measures the quality of link prediction in signed networks. An ideal personalized ranking w.r.t. a seed node s needs to rank nodes with positive links to s at the top, those with negative links at the bottom, and other unknown status nodes in the middle of the ranking. For a seed node s, suppose that \(\mathbf {P}_{s}\) is the set of positive nodes potentially connected by s, \(\mathbf {N}_{s}\) is that of negative nodes, and \(\mathbf {O}_{s}\) is that of the other nodes. Then, GAUC of the personalized ranking w.r.t. s is defined as follows:

    $$\begin{aligned} \text {GAUC}_{s}&= \frac{\eta }{|\mathbf {P}_{s}|(|\mathbf {O}_{s}| + |\mathbf {N}_{s}|)}\left( \sum _{p \in \mathbf {P}_{s}}\sum _{i \in \mathbf {O}_{s} \cup \mathbf {N}_{s}} {\mathbb {I}}(\mathbf {r}_{p} > \mathbf {r}_{i}) \right) \\&\quad + \frac{1-\eta }{|\mathbf {N}_{s}|(|\mathbf {O}_{s}| + |\mathbf {P}_{s}|)} \left( \sum _{i \in \mathbf {O}_{s} \cup \mathbf {P}_{s}} \sum _{n \in \mathbf {N}_{s}} {\mathbb {I}}(\mathbf {r}_{i} > \mathbf {r}_{n}) \right) \end{aligned}$$

    where \(\eta = \frac{|\mathbf {P}_{s}|}{|\mathbf {P}_{s}| + |\mathbf {N}_{s}|}\) is the ratio of positive edges to all signed edges, and \({\mathbb {I}}(\cdot )\) is an indicator function that returns 1 if a given predicate is true, and 0 otherwise. GAUC is 1.0 for the perfect ranking list and 0.5 for a random ranking list [35].

  • AUC (Area Under the Curve): AUC of the personalized ranking scores \(\mathbf {r}\) w.r.t. seed node s in signed networks is defined as follows [35]:

    $$\begin{aligned} \text {AUC}_{s} = \frac{1}{|\mathbf {P}_{s}||\mathbf {N}_{s}|}\sum _{p\in \mathbf {P}_{s}}\sum _{n \in \mathbf {N}_{s}}{\mathbb {I}}(\mathbf {r}_{p} > \mathbf {r}_{n}) \end{aligned}$$

    where \(\mathbf {P}_{s}\) is the set of positive nodes potentially connected by s, and \(\mathbf {N}_{s}\) is that of negative nodes. \({\mathbb {I}}(\cdot )\) is an indicator function that returns 1 if a given predicate is true, and 0 otherwise. For an ideal ranking list, AUC is 1, indicating that every positive sample is ranked higher than all negative samples; for a random ranking, AUC is 0.5. However, AUC is not a satisfactory metric for the link prediction task in signed networks because it is designed for two classes (positive and negative), while link prediction in signed networks should consider three classes (positive, unknown, and negative) as described above.
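Both metrics are direct to compute from ranking scores; the sketch below follows the definitions (the score dictionary and node lists are hypothetical, chosen so the ranking is perfect):

```python
def auc(scores, pos, neg):
    """AUC_s: fraction of (positive, negative) node pairs ranked correctly."""
    hits = sum(1 for p in pos for n in neg if scores[p] > scores[n])
    return hits / (len(pos) * len(neg))

def gauc(scores, pos, neg, other):
    """GAUC_s: positives should outrank unknown and negative nodes, and
    unknown and positive nodes should outrank negatives, weighted by eta."""
    eta = len(pos) / (len(pos) + len(neg))
    top = sum(1 for p in pos for i in other + neg if scores[p] > scores[i])
    bot = sum(1 for n in neg for i in other + pos if scores[i] > scores[n])
    return (eta * top / (len(pos) * (len(other) + len(neg)))
            + (1 - eta) * bot / (len(neg) * (len(other) + len(pos))))

scores = {'p': 3.0, 'u1': 2.0, 'u2': 1.5, 'n': 0.0}
pos, neg, other = ['p'], ['n'], ['u1', 'u2']
```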

Troll identification

Suppose that we have a personalized ranking \({\mathcal {R}}\) sorted in ascending order of the trustworthiness scores w.r.t. a seed node (i.e., a node with a low score is ranked high). This has the same effect as searching for trolls at the bottom of the original ranking, which is sorted in descending order of those scores.

  • MAP@k (Mean Average Precision): MAP@k is the mean of average precisions, AP@k, for multiple queries. Suppose that there are l trolls to be captured. Then, AP@k is defined as follows:

    $$\begin{aligned} \text {AP@}k = \frac{1}{\min (l, k)}\left( \sum _{t \in \mathbf {T}} \text {Precision@}t\right) \end{aligned}$$

    where \(\text {Precision@}t\) is the precision at the cutoff t. Note that \(\mathbf {T} = \{t | {\mathbb {I}}({\mathcal {R}}[t]) = 1 \text { for } 1 \le t \le k \}\) where \({\mathcal {R}}[t]\) denotes the user ranked at position t in the ranking \({\mathcal {R}}\), and \({\mathbb {I}}({\mathcal {R}}[t])\) is 1 if \({\mathcal {R}}[t]\) is a troll. For N queries, MAP@k is defined as follows:

    $$\begin{aligned} \text {MAP@}k = \frac{1}{N}\left( \sum _{i=1}^{N}\text {AP@}k\right) \end{aligned}$$
  • NDCG@k (Normalized Discount Cumulative Gain): NDCG is the normalized value of Discount Cumulative Gain (DCG), which is defined as follows:

    $$\begin{aligned} \text {DCG}@k = rel_{1} + \sum _{i=2}^{k}\frac{rel_i}{\log _{2}{(i)}}, \quad \text {and}\quad \text {NDCG}@k = \frac{\text {DCG}@k}{\text {IDCG}@k} \end{aligned}$$

    where \(rel_{i}\) is the user-graded relevance score of the ith ranked item. NDCG@k is obtained by normalizing DCG@k by the Ideal DCG (IDCG), i.e., the DCG of the ideal ranking order.

  • Precision@k and Recall@k: Precision@k (Recall@k) is the precision (recall) at cutoff k in a ranking. Precision@k is the fraction of the top-k ranking that consists of trolls, and Recall@k is the fraction of all trolls that appear in the top-k ranking.

  • MRR (Mean Reciprocal Rank): MRR@k is the mean of the reciprocal ranks (RR) over the top-k responses of multiple queries. RR is the multiplicative inverse of the rank of the first correct answer. Hence, for N queries, MRR@k is defined as follows:

    $$\begin{aligned} \text {MRR}@k = \frac{1}{N}\sum _{i=1}^{N}\frac{1}{rank_{i}} \end{aligned}$$

    where \(rank_{i}\) is the rank position of the first relevant item in the top-k ranking. If there is no relevant item in the ranking for the ith query, the inverse of the rank, \({rank_{i}}^{-1}\), becomes zero.
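The troll identification metrics above admit short direct implementations; the following sketch mirrors the definitions (rankings, troll sets, and cutoffs are hypothetical examples):

```python
import math

def ap_at_k(ranking, trolls, k):
    """AP@k: average of Precision@t over the positions t <= k holding a
    troll, normalized by min(#trolls, k)."""
    hits, total = 0, 0.0
    for t, node in enumerate(ranking[:k], start=1):
        if node in trolls:
            hits += 1
            total += hits / t          # Precision@t at this hit position
    return total / min(len(trolls), k)

def ndcg_at_k(rels, k):
    """NDCG@k with DCG@k = rel_1 + sum_{i=2..k} rel_i / log2(i)."""
    def dcg(r):
        return r[0] + sum(r[i - 1] / math.log2(i)
                          for i in range(2, min(k, len(r)) + 1))
    return dcg(rels) / dcg(sorted(rels, reverse=True))

def mrr_at_k(rankings, relevant, k):
    """Mean reciprocal rank of the first relevant item within the top-k;
    a query with no relevant item in its top-k contributes 0."""
    total = 0.0
    for ranking in rankings:
        for i, node in enumerate(ranking[:k], start=1):
            if node in relevant:
                total += 1.0 / i
                break
    return total / len(rankings)
```

For example, a ranking with trolls at positions 1 and 3 of its top-4 gets AP@4 of (1/1 + 2/3)/2 = 5/6, and an already ideally ordered relevance list gets NDCG equal to 1.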

Discussion on relative trustworthiness scores of SRWR

In Sect. 4.1, we define the relative trustworthiness \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) where \(\mathbf {r}^{+}\) is for positive SRWR scores, and \(\mathbf {r}^{-}\) is for negative SRWR ones. We show that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are measures, and \(\mathbf {r}\) is a signed measure using definitions from measure theory [38]. We first introduce the definition of measure as follows:

Definition 5

(Measure [38]) A measure \(\mu \) on a (finite) set \(\Omega \) with \(\sigma \)-algebra \({\mathcal {A}}\) is a function \(\mu : {\mathcal {A}} \rightarrow {\mathbb {R}}_{\ge 0}\) such that

  1. 1.

    (Nonnegativity) \(\mu (E) \ge 0\) for all \(E \in {\mathcal {A}}\),

  2. 2.

    (Null empty set) \(\mu (\emptyset ) = 0\),

  3. 3.

    (Countable additivity) \(\mu (\bigcup _{i=1}^{\infty }E_{i})=\sum _{i=1}^{\infty }\mu (E_{i})\) for any sequence of pairwise disjoint sets \(E_1, E_2, \ldots \in {\mathcal {A}}\)

where \(\sigma \)-algebra \({\mathcal {A}}\) on \(\Omega \) is a collection \({\mathcal {A}}\subseteq 2^{\Omega }\) s.t. it is nonempty, and closed under complements (i.e., \(E \in {\mathcal {A}} \Rightarrow E^{c} \in {\mathcal {A}}\)) and countable unions (i.e., \(E_1, E_2, \ldots \in {\mathcal {A}} \Rightarrow \bigcup _{i=1}^{\infty }E_{i}\in {\mathcal {A}}\)). The pair \((\Omega , {\mathcal {A}})\) is called a measurable space. \(\square \)

In probability theory, \(\sigma \)-algebra \({\mathcal {A}}\) describes all possible events to be measured as probability. Note that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are joint probabilities of nodes and signs, i.e., \(\mathbf {r}^{+}_{u}=P(N={u}, S=+)\) and \(\mathbf {r}^{-}_{u}=P(N=u, S=-)\) where N is a random variable of nodes, and S is a random variable of the surfer’s sign. Note that N takes an item from \(\sigma \)-algebra \({\mathcal {A}}\). The following property shows that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are (nonnegative) measures.

Property 4

Suppose \(\Omega \) is the set \(\mathbf {V}\) of nodes, and \(\sigma \)-algebra \({\mathcal {A}}\) on \(\Omega \) is \(2^\Omega \). Let \(\mu ^{+}=P(N, S=+)\) and \(\mu ^{-}=P(N, S=-)\). Then, both \(\mu ^{+}\) and \(\mu ^{-}\) are (nonnegative) measures according to Definition 5.

Proof

For any \(E \in {\mathcal {A}}\), \(\mu ^{+}(E) \ge 0\) and \(\mu ^{+}(\emptyset ) = 0\) are obviously true since \(P(N, S=+)\) is a probability; hence, \(P(E, S=+) \ge 0\) and \(P(\emptyset , S=+) = 0\). Let \((E_n)_{n\in {\mathbb {N}}}\) be a sequence of pairwise disjoint sets where \(E_n \in {\mathcal {A}}\). Since the sets in the sequence are mutually disjoint, the following holds:

$$\begin{aligned} P\left( \bigcup _{n \in {\mathbb {N}}}E_n, S=+\right) = \sum _{n \in {\mathbb {N}}} P(E_n, S=+) \end{aligned}$$

Therefore, \(\mu ^{+}=P(N, S=+)\) is a measure by Definition 5. Similarly, \(\mu ^{-}=P(N, S=-)\) is also a measure. \(\square \)
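On a finite node set, the measure axioms of Definition 5 can also be checked directly by enumeration. The sketch below does so for a hypothetical \(\mu ^{+}\) built from illustrative positive SRWR scores on a three-node network; the node names and score values are assumptions for this sketch only, not taken from the paper's experiments:

```python
from itertools import chain, combinations

# Illustrative positive SRWR scores r+_u = P(N=u, S=+) on a toy 3-node
# network; these values are assumed for this sketch only.
r_plus = {"a": 0.4, "b": 0.15, "c": 0.05}  # remaining mass is negative

def powerset(omega):
    """Return 2^Omega, the sigma-algebra on a finite set Omega."""
    items = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def mu_plus(event):
    """mu+(E) = P(E, S=+), summing joint probabilities over nodes in E."""
    return sum(r_plus[u] for u in event)

sigma_algebra = powerset(r_plus)

# Axiom 1 (nonnegativity) and Axiom 2 (null empty set)
assert all(mu_plus(E) >= 0 for E in sigma_algebra)
assert mu_plus(frozenset()) == 0

# Axiom 3 (additivity) for every disjoint pair of events in the algebra
for E1 in sigma_algebra:
    for E2 in sigma_algebra:
        if E1.isdisjoint(E2):
            assert abs(mu_plus(E1 | E2) - (mu_plus(E1) + mu_plus(E2))) < 1e-12
```

Since \(\Omega \) is finite, the power set serves as the \(\sigma \)-algebra and countable additivity reduces to finite additivity, which the nested loop exhausts.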

Next, we introduce the definition of signed measure, a generalized version of measure by allowing it to have negative values.

Definition 6

(Signed Measure [38]) Given a set \(\Omega \) and \(\sigma \)-algebra \({\mathcal {A}}\), a signed measure on \((\Omega , {\mathcal {A}})\) is a function \(\mu : {\mathcal {A}} \rightarrow {\mathbb {R}}\) such that

  1. (Real value) \(\mu (E)\) takes a real value in \({\mathbb {R}}\),

  2. (Null empty set) \(\mu (\emptyset ) = 0\),

  3. (Countable additivity) \(\mu (\bigcup _{i=1}^{\infty }E_{i})=\sum _{i=1}^{\infty }\mu (E_{i})\) for any sequence of pairwise disjoint sets \(E_1, E_2, \ldots \in {\mathcal {A}}\). \(\square \)

Note that Shannon entropy and electric charge are representative examples of signed measures. The following lemma then shows that the difference between two nonnegative measures is a signed measure.

Lemma 9

(Difference Between Two Nonnegative Measures [38]) Suppose we are given nonnegative measure \(\mu ^{+}\) and \(\mu ^{-}\) on the same measurable space \((\Omega , {\mathcal {A}})\). Then, \(\mu = \mu ^{+} - \mu ^{-}\) is a signed measure.

Proof

Since \(\mu ^{+}\) and \(\mu ^{-}\) are nonnegative and finite (both are probability measures), \(\mu (E) = \mu ^{+}(E) - \mu ^{-}(E)\) takes a real value for every \(E \in {\mathcal {A}}\). Also, \(\mu (\emptyset ) = \mu ^{+}(\emptyset ) - \mu ^{-}(\emptyset ) = 0\). Moreover, \(\mu \) is countably additive, i.e.,

$$\begin{aligned} \mu \left( \bigcup _{i=1}^{\infty }E_{i}\right) = \mu ^{+}\left( \bigcup _{i=1}^{\infty }E_{i}\right) - \mu ^{-}\left( \bigcup _{i=1}^{\infty }E_{i}\right) = \sum _{i=1}^{\infty }\left( \mu ^{+}(E_i) - \mu ^{-}(E_i)\right) = \sum _{i=1}^{\infty }\mu (E_i) \end{aligned}$$

Hence, \(\mu = \mu ^{+} - \mu ^{-}\) is a signed measure according to Definition 6. \(\square \)

Lemma 9 implies that the relative trustworthiness \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) is a signed measure. The trustworthiness \(\mathbf {r}_u\) measures the degree of trust between seed node s and node u: if \(\mathbf {r}_{u} > 0\), seed node s is likely to trust node u with strength \(\mathbf {r}_{u}\), while if \(\mathbf {r}_{u} < 0\), s is likely to distrust u with strength \(|\mathbf {r}_{u}|\).
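The signed-measure axioms of Definition 6 can likewise be exercised for the relative trustworthiness directly. The sketch below uses hypothetical positive and negative SRWR scores (node names and values are illustrative assumptions, not from the paper) and checks that their difference vanishes on the empty set, is additive over disjoint events, and, unlike a nonnegative measure, may take negative values:

```python
# Hypothetical SRWR scores on a toy 3-node network (illustrative only);
# r_plus[u] = P(N=u, S=+) and r_minus[u] = P(N=u, S=-).
r_plus  = {"a": 0.40, "b": 0.15, "c": 0.05}
r_minus = {"a": 0.05, "b": 0.25, "c": 0.10}

def mu(event):
    """Signed measure mu(E) = mu+(E) - mu-(E) over a set E of nodes."""
    return sum(r_plus[u] - r_minus[u] for u in event)

assert mu(frozenset()) == 0  # null empty set

# Additivity over the disjoint events {a} and {b, c}
assert abs(mu({"a", "b", "c"}) - (mu({"a"}) + mu({"b", "c"}))) < 1e-12

# The sign of mu({u}) = r_u distinguishes trust from distrust:
assert mu({"a"}) > 0   # positive: the seed tends to trust node a
assert mu({"b"}) < 0   # negative: the seed tends to distrust node b
```

Here \(\mu (\{u\})\) plays the role of \(\mathbf {r}_{u}\): its sign indicates trust or distrust, and its magnitude the strength of that judgment.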

Cite this article

Jung, J., Jin, W. & Kang, U. Random walk-based ranking in signed social networks: model and algorithms. Knowl Inf Syst 62, 571–610 (2020). https://doi.org/10.1007/s10115-019-01364-z

Keywords

  • Signed networks
  • Signed random walk with restart
  • Personalized node ranking
  • Trustworthiness measure