Abstract
How can we rank nodes in signed social networks? Relationships between nodes in a signed network are represented as positive (trust) or negative (distrust) edges. Many social networks have adopted signed networks to express trust between users. Consequently, ranking friends or enemies in signed networks has received much attention from the data mining community. The ranking problem, however, is challenging because it is difficult to interpret negative edges. Traditional random walk-based methods such as PageRank and random walk with restart cannot provide effective rankings in signed networks since they assume only positive edges. Although several methods have been proposed by modifying traditional ranking models, they also fail to produce proper rankings since they cannot account for complex edge relations. In this paper, we propose Signed Random Walk with Restart (SRWR), a novel model for personalized ranking in signed networks. We introduce a signed random surfer who handles negative edges by changing her sign while walking. Our model provides proper rankings over signed edges based on the signed random walk. We develop two methods for computing SRWR scores: SRWR-Iter and SRWR-Pre, which are iterative and preprocessing methods, respectively. SRWR-Iter naturally follows the definition of SRWR, and iteratively updates SRWR scores until convergence. SRWR-Pre enables fast ranking computation, which is important for applications of SRWR. Through extensive experiments, we demonstrate that SRWR achieves the best accuracy for link prediction, predicts trolls \(4\times \) more accurately, and shows satisfactory performance for inferring missing signs of edges compared to other competitors.
In terms of efficiency, SRWR-Pre preprocesses a signed network \(4.5\times \) faster and requires \(11\times \) less memory space than other preprocessing methods; furthermore, SRWR-Pre computes SRWR scores up to \(14\times \) faster than other methods in the query phase.
References
 1.
Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, pp 635–644
 2.
Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184
 3.
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
 4.
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
 5.
Cartwright D, Harary F (1956) Structural balance: a generalization of Heider's theory. Psychol Rev 63(5):277
 6.
Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20(2):181–187
 7.
Duff IS, Grimes RG, Lewis JG (1989) Sparse matrix test problems. ACM Trans Math Softw (TOMS) 15(1):1–14
 8.
Easley D, Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge
 9.
Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. Proc VLDB Endow 5(5):442–453
 10.
Gleich DF, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 597–605
 11.
Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU press
 12.
Guha R, Kumar R, Raghavan P, Tomkins A (2004) Propagation of trust and distrust. In: Proceedings of the 13th international conference on World Wide Web. ACM, pp 403–412
 13.
Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, pp 517–526
 14.
Heider F (1946) Attitudes and cognitive organization. J Psychol 21(1):107–112
 15.
Jin W, Jung J, Kang U (2019) Supervised and extended restart in random walks for ranking and link prediction in networks. PLoS ONE 14(3):e0213857
 16.
Jung J, Jin W, Sael L, Kang U (2016) Personalized ranking in signed networks using signed random walk with restart. In: IEEE 16th international conference on data mining, ICDM 2016, December 12–15, 2016, Barcelona, Spain, pp 973–978. http://dx.doi.org/10.1109/ICDM.2016.0122
 17.
Jung J, Park N, Sael L, Kang U (2017) BePI: fast and memory-efficient method for billion-scale random walk with restart. In: Proceedings of the 2017 ACM international conference on management of data, SIGMOD conference 2017, Chicago, IL, USA, May 14–19, 2017, pp 789–804
 18.
Jung J, Shin K, Sael L, Kang U (2016) Random walk with restart on large graphs using block elimination. ACM Trans Database Syst 41(2):12. https://doi.org/10.1145/2901736
 19.
Kang U, Faloutsos C (2011) Beyond ‘caveman communities’: hubs and spokes for graph compression and mining. In: ICDM
 20.
Kang U, Tong H, Sun J (2012) Fast random walk graph kernel. In: Proceedings of the twelfth SIAM international conference on data mining, Anaheim, California, USA, April 26–28, 2012, pp 828–838
 21.
Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632
 22.
Kleinberg JM (1999) Hubs, authorities, and communities. ACM Comput Surv (CSUR) 31(4es):5
 23.
Kunegis J, Lommatzsch A, Bauckhage C (2009) The Slashdot Zoo: mining a social network with negative edges. In: Proceedings of the 18th international conference on World Wide Web. ACM, pp 741–750
 24.
Langville AN, Meyer CD, Fernández P (2008) Google's PageRank and beyond: the science of search engine rankings. Math Intell 30(1):68–69
 25.
Lempel R, Moran S (2001) SALSA: the stochastic approach for link-structure analysis. ACM Trans Inf Syst (TOIS) 19(2):131–160
 26.
Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World Wide Web. ACM, pp 641–650
 27.
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1361–1370
 28.
Lim Y, Kang U, Faloutsos C (2014) Slashburn: graph compression and mining beyond caveman communities. IEEE Trans Knowl Data Eng 26(12):3077–3089
 29.
Mishra A, Bhattacharya A (2011) Finding the bias and prestige of nodes in networks based on trust scores. In: Proceedings of the 20th international conference on World Wide Web. ACM, pp 567–576
 30.
Ng AY, Zheng AX, Jordan MI (2001) Stable algorithms for link analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 258–266
 31.
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the Web
 32.
Saad Y (2003) Iterative methods for sparse linear systems, vol 82. SIAM
 33.
Shahriari M, Jalili M (2014) Ranking nodes in signed social networks. Soc Netw Anal Min 4(1):1–12
 34.
Shin K, Jung J, Lee S, Kang U (2015) Bear: Block elimination approach for random walk with restart on large graphs. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 1571–1585
 35.
Song D, Meyer DA (2015) Recommending positive links in signed social networks by optimizing a generalized AUC. In: AAAI, pp 290–296
 36.
Strang G (2006) Linear algebra and its applications. Thomson, Brooks/Cole. https://books.google.ie/books?id=q9CaAAAACAAJ
 37.
Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc Natl Acad Sci 107(31):13636–13641
 38.
Taylor ME (2006) Measure theory and integration. American Mathematical Soc, Providence
 39.
Tong H, Faloutsos C, Gallagher B, Eliassi-Rad T (2007) Fast best-effort pattern matching in large attributed graphs. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 737–746
 40.
Tong H, Faloutsos C, Pan JY (2008) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14(3):327–346
 41.
Wu Z, Aggarwal CC, Sun J (2016) The troll-trust model for ranking in signed networks. In: Proceedings of the ninth ACM international conference on Web search and data mining. ACM, pp 447–456
 42.
Yang B, Cheung WK, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19(10):1333–1348
 43.
Yoon M, Jin W, Kang U (2018) Fast and accurate random walk with restart on dynamic graphs with guarantees. In: Proceedings of the 2018 World Wide Web conference on World Wide Web, WWW 2018, Lyon, France, April 23–27, 2018, pp 409–418
 44.
Yoon M, Jung J, Kang U (2018) Tpa: Fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In: 34th IEEE international conference on data engineering, ICDE 2018, Paris, France, April 16–19, 2018
Acknowledgements
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2013000179, Development of Core Technology for Context-aware Deep-Symbolic Hybrid Learning and Construction of Language Resources]. The Institute of Engineering Research at Seoul National University provided research facilities for this work. The ICT at Seoul National University also provided research facilities for this study.
Appendix
Details of the hub-and-spoke reordering method
SlashBurn [19, 28] is a node reordering algorithm which concentrates nonzero entries of the adjacency matrix of a given graph based on the hub-and-spoke structure. Let n be the number of nodes in a graph, and t be the hub selection ratio whose range is between 0 and 1, where \(\lceil tn \rceil \) indicates the number of nodes selected by SlashBurn as hubs. For each iteration, SlashBurn disconnects \(\lceil tn \rceil \) high-degree nodes, called hub nodes, from the graph; then the graph is split into the giant connected component (GCC) and the disconnected components. The nodes in the disconnected components are called spokes, and each disconnected component forms a block in \(\mathbf {H}_{11}\) (or \(\mathbf {T}_{11}\)) in Fig. 6. Then, SlashBurn reorders nodes such that the hub nodes get the highest ids, the spokes get the lowest ids, and the nodes in the GCC get the ids in the middle. SlashBurn repeats this procedure on the GCC recursively until the size of the GCC becomes smaller than \(\lceil tn \rceil \). After SlashBurn is done, the reordered adjacency matrix contains a large and sparse block diagonal matrix in the upper-left area, as shown in Fig. 6. Figure 14 depicts the procedure of SlashBurn when \(\lceil tn \rceil =1\).
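The procedure above can be sketched as follows. This is an illustrative simplification (plain degree ordering plus SciPy's connected-components routine), not the exact SlashBurn implementation of [19, 28]; the demo graph at the end is a hypothetical 7-node star with one pendant edge.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def slashburn_order(A, t=0.1):
    # Simplified SlashBurn-style reordering: repeatedly disconnect the
    # ceil(t*n) highest-degree nodes (hubs), give hubs the highest ids,
    # spokes (nodes outside the GCC) the lowest ids, and recurse on the GCC.
    n = A.shape[0]
    k = max(1, int(np.ceil(t * n)))
    remaining = np.arange(n)            # node ids of the current GCC
    order = np.empty(n, dtype=int)      # order[new_id] = original node id
    lo, hi = 0, n
    while len(remaining) > k:
        sub = A[remaining][:, remaining]
        deg = np.asarray(abs(sub).sum(axis=1)).ravel()
        hubs = np.argsort(-deg)[:k]                  # k highest-degree nodes
        order[hi - k:hi] = remaining[hubs]           # hubs: highest ids
        hi -= k
        rest = np.setdiff1d(np.arange(len(remaining)), hubs)
        ncomp, label = connected_components(sub[rest][:, rest], directed=False)
        gcc = np.argmax(np.bincount(label))          # giant connected component
        spokes = rest[label != gcc]
        order[lo:lo + len(spokes)] = remaining[spokes]   # spokes: lowest ids
        lo += len(spokes)
        remaining = remaining[rest[label == gcc]]
    order[lo:lo + len(remaining)] = remaining        # leftover small GCC
    return order

# demo: star graph with an extra pendant edge (node 0 is the hub)
dense = np.zeros((7, 7))
for u, v in [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (5, 6)]:
    dense[u, v] = dense[v, u] = 1
order = slashburn_order(csr_matrix(dense), t=0.1)
```

In the demo, the hub (node 0) receives the highest id, matching the reordering described above.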
Properties and lemmas
Sum of positive and negative SRWR scores
Property 3
Consider the recursive equation \(\mathbf {p}= (1-c){{\tilde{\mathbf {A}}}}^{\top }\mathbf {p} + c\mathbf {q}\) where \(\mathbf {p}= \mathbf {r}^{+}+ \mathbf {r}^{-}\) and \({{\tilde{\mathbf {A}}}}^{\top }\) is a column stochastic matrix. Then \(\mathbf {1}^{\top }\mathbf {p}= \sum _{i}\mathbf {p}_{i} = 1\).
Proof
By multiplying both sides by \(\mathbf {1}^{\top }\), the equation is represented as follows:
$$\begin{aligned} \mathbf {1}^{\top }\mathbf {p} = (1-c)\mathbf {1}^{\top }{{\tilde{\mathbf {A}}}}^{\top }\mathbf {p} + c\mathbf {1}^{\top }\mathbf {q} \end{aligned}$$
Note that \(\mathbf {1}^{\top }{{\tilde{\mathbf {A}}}}^{\top } = ({{\tilde{\mathbf {A}}}}\mathbf {1})^{\top }\), and \({{\tilde{\mathbf {A}}}}\) is a row stochastic matrix; thus, \(({{\tilde{\mathbf {A}}}}\mathbf {1})^{\top } = \mathbf {1}^{\top }\). Since \(\mathbf {1}^{\top }\mathbf {q} = 1\), the above equation is represented as follows:
$$\begin{aligned} \mathbf {1}^{\top }\mathbf {p} = (1-c)\mathbf {1}^{\top }\mathbf {p} + c \;\Rightarrow \; c\,\mathbf {1}^{\top }\mathbf {p} = c \;\Rightarrow \; \mathbf {1}^{\top }\mathbf {p} = 1 \end{aligned}$$
\(\square \)
Analysis on the number of iterations of SRWR-Iter
Lemma 3
Suppose \(\mathbf {h}= [\mathbf {r}^{+};\mathbf {r}^{-}]^{\top }\), and \(\mathbf {h}^{(k)}\) is the result of the kth iteration in SRWR-Iter. Let \(\delta ^{(k)}\) denote the error \(\Vert \mathbf {h}^{(k)} - \mathbf {h}^{(k-1)} \Vert _{1}\). Then \(\delta ^{(k)} \le 2(1-c)^{k-1}\), and the estimated number T of iterations for convergence is \(\log _{1-c}\frac{\epsilon }{2}\) where \(\epsilon \) is an error tolerance, and c is the restart probability.
Proof
According to Eq. (4), \(\delta ^{(k)}\) is represented as follows:
$$\begin{aligned} \delta ^{(k)} = \Vert \mathbf {h}^{(k)} - \mathbf {h}^{(k-1)} \Vert _{1} = (1-c)\Vert {{\tilde{\mathbf {B}}}}^{\top }(\mathbf {h}^{(k-1)} - \mathbf {h}^{(k-2)}) \Vert _{1} \le (1-c)\Vert {{\tilde{\mathbf {B}}}}^{\top } \Vert _{1}\,\delta ^{(k-1)} \end{aligned}$$
Note that \(\Vert {{\tilde{\mathbf {B}}}}^{\top } \Vert _{1} = 1\) since \({{\tilde{\mathbf {B}}}}^{\top }\) is column stochastic as described in Theorem 1. Hence, \(\delta ^{(k)} \le (1-c)\delta ^{(k-1)} \le \cdots \le (1-c)^{k-1}\delta ^{(1)}\). Since \(\delta ^{(1)} = \Vert \mathbf {h}^{(1)} - \mathbf {h}^{(0)} \Vert _{1} \le \Vert \mathbf {h}^{(1)}\Vert _{1} + \Vert \mathbf {h}^{(0)} \Vert _{1} = 2\), we have \(\delta ^{(k)} \le 2(1-c)^{k-1}\). Note that when \(\delta ^{(k)} \le \epsilon \), the iteration of SRWR-Iter is terminated. Thus, for \(k \ge 1 + \log _{1-c}\frac{\epsilon }{2}\), the iteration is terminated, and the number T of iterations for convergence is estimated at \(\log _{1-c}\frac{\epsilon }{2}\). \(\square \)
Time complexity of sparse matrix multiplication
Lemma 4
(Sparse Matrix Multiplication [32]) Suppose that \(\mathbf {A}\) and \(\mathbf {B}\) are \(p \times q\) and \(q \times r\) sparse matrices, respectively, and \(\mathbf {A}\) has \( nnz (\mathbf {A})\) nonzeros. Calculating \(\mathbf {C}=\mathbf {A}\mathbf {B}\) using sparse matrix multiplication requires \(O( nnz (\mathbf {A})r)\) time.
Complexity analysis of proposed methods for SRWR
We analyze the complexity of our proposed methods SRWR-Iter and SRWR-Pre in terms of time and space. The space and time complexities of SRWR-Iter are presented in Lemma 5, and those of SRWR-Pre are in Lemmas 6, 7, and 8, respectively.
Space and time complexities of SRWR-Iter
Lemma 5
(Space and Time Complexities of SRWR-Iter) Let n and m denote the number of nodes and edges of a signed network, respectively. Then the space complexity of Algorithm 2 is \(O(n+m)\). The time complexity of Algorithm 2 is \(O(T(n+m))\) where the number T of iterations is \(\log _{1-c}\frac{\epsilon }{2}\), c is the restart probability, and \(\epsilon \) is an error tolerance.
Proof
The space complexity for \({{\tilde{\mathbf {A}}}_{+}}\) and \({{\tilde{\mathbf {A}}}_{-}}\) is O(m) if we exploit a sparse matrix format such as compressed column storage to save the matrices. We need O(n) for the SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\). Thus, the space complexity is \(O(n+m)\). One iteration in Algorithm 2 takes \(O(n+m)\) time due to sparse matrix-vector multiplications and vector additions, where the time complexity of a sparse matrix-vector multiplication is linear in the number of nonzeros of the matrix [7]. Hence, the total time complexity is \(O(T(n+m))\) where the number T of iterations is \(\log _{1-c}\frac{\epsilon }{2}\), as proved in Lemma 3. \(\square \)
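As an illustration of the iterative computation analyzed above, the following sketch implements the signed random walk update for the simple SRWR formulation, i.e., assuming no balance attenuation (a surfer always flips her sign on a negative edge). The semi-row-normalization and the toy 4-node signed graph are assumptions for this example, not the exact Algorithm 2.

```python
import numpy as np
from scipy.sparse import csr_matrix

def srwr_iter(A, seed, c=0.15, eps=1e-9, max_iters=200):
    # Sketch of SRWR-Iter without balance attenuation: a positive surfer
    # crossing a negative edge becomes negative, and vice versa.
    # A is a signed adjacency matrix with +1 / -1 entries.
    n = A.shape[0]
    Apos = A.maximum(0)                  # positive edges
    Aneg = (-A).maximum(0)               # |negative edges|
    deg = np.asarray((Apos + Aneg).sum(axis=1)).ravel()
    deg[deg == 0] = 1.0                  # guard against dangling nodes
    d_inv = 1.0 / deg
    # semi-row-normalization: both parts share the signed out-degree
    At_pos = csr_matrix(Apos.multiply(d_inv[:, None])).T
    At_neg = csr_matrix(Aneg.multiply(d_inv[:, None])).T
    q = np.zeros(n); q[seed] = 1.0
    r_pos, r_neg = q.copy(), np.zeros(n)
    for _ in range(max_iters):
        new_pos = (1 - c) * (At_pos @ r_pos + At_neg @ r_neg) + c * q
        new_neg = (1 - c) * (At_neg @ r_pos + At_pos @ r_neg)
        delta = np.abs(new_pos - r_pos).sum() + np.abs(new_neg - r_neg).sum()
        r_pos, r_neg = new_pos, new_neg
        if delta < eps:                  # convergence (Lemma 3)
            break
    return r_pos, r_neg

# hypothetical signed 4-cycle: one negative edge (2 -> 3)
A = csr_matrix(np.array([[0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, -1],
                         [1, 0, 0, 0]], dtype=float))
r_pos, r_neg = srwr_iter(A, seed=0)
```

On this toy graph the scores satisfy Property 3: the positive and negative SRWR scores sum to 1.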
Space and time complexities of SRWR-Pre
Lemma 6
(Space Complexity of SRWR-Pre) The space complexity of the preprocessed matrices from SRWR-Pre is \(O(n_{2}^{2} + m)\) where \(n_{2}\) is the number of hubs and m is the number of edges in the graph.
Proof
The space complexity of each preprocessed matrix is summarized in Table 7. \({{\tilde{\mathbf {A}}}_{-}}\), \(\mathbf {H}_{12}\), \(\mathbf {H}_{21}\), \(\mathbf {T}_{12}\), and \(\mathbf {T}_{21}\) are sparse matrices constructed from the input graph; hence, their space complexity is bounded by the number of edges (i.e., O(m)). Note that \(\mathbf {H}\) and \(\mathbf {T}\) have the same sparsity pattern; hence, \(\mathbf {H}_{11}\) and \(\mathbf {T}_{11}\) identified by [19, 28] have the same b blocks. The ith block in \(\mathbf {H}_{11}^{-1}\) (or \(\mathbf {T}_{11}^{-1}\)) contains \(n_{1i}^{2}\) nonzeros; therefore, \(\mathbf {H}_{11}^{-1}\) and \(\mathbf {T}_{11}^{-1}\) require \(O(\sum _{i=1}^{b}n_{1i}^{2})\) space, respectively. Since the dimension of \(\mathbf {L}^{-1}_{\mathbf {H}}\), \(\mathbf {U}^{-1}_{\mathbf {H}}\), \(\mathbf {L}^{-1}_{\mathbf {T}}\), and \(\mathbf {U}^{-1}_{\mathbf {T}}\) is \(n_2\), they require \(O(n_2^2)\) space. \(\square \)
Note that the blocks in \(\mathbf {H}_{11}\) (or \(\mathbf {T}_{11}\)) are discovered by the reordering method [19, 28] as briefly described in Appendix A.1. In real-world graphs, \(\sum _{i=1}^{b}n_{1i}^{2}\) can be bounded by O(m) as shown in [34]. Hence, we assume that the space complexity of \(\mathbf {H}_{11}^{-1}\) and \(\mathbf {T}_{11}^{-1}\) is O(m) for simplicity.
Lemma 7
(Time Complexity of Preprocessing Phase in SRWR-Pre) The preprocessing phase in Algorithm 3 takes \(O(T(m+n\log {n}) + n_{2}^{3} + mn_{2})\) where \(T=\lceil {\frac{n_{2}}{tn}}\rceil \) is the number of iterations, and t is the hub selection ratio in the hub-and-spoke reordering method [19, 28].
Proof
We only consider the main factors of the time complexity of Algorithm 3 in this proof. The hub-and-spoke reordering method takes \(O(T(m + n\log {n}))\) time (line 1) where T is \(\lceil {\frac{n_{2}}{tn}}\rceil \), as proved in [19, 28]. Computing the Schur complement of \(\mathbf {H}_{11}\) takes \(O(n_{2}^{2} + mn_{2})\) because it takes \(O(mn_{2})\) to compute \(\mathbf {P}_{1} = \mathbf {H}_{11}^{-1}\mathbf {H}_{12}\) and \(\mathbf {P}_{2} = \mathbf {H}_{21}\mathbf {P}_{1}\) by Lemma 4, and \(O(n_{2}^{2})\) to compute \(\mathbf {H}_{22} - \mathbf {P}_{2}\) (line 6). It takes \(O(n_{2}^{3})\) to compute the inverse of the LU factors (line 8). Note that computing \(\mathbf {H}^{-1}_{11}\) (line 4) requires \(O(\sum _{i=1}^{b}n_{1i}^{3})\) time where it takes \(O(n_{1i}^{3})\) to obtain the inverse of the ith block. In real-world networks, the size \(n_{1i}\) of each block is much smaller than the number \(n_2\) of hubs; thus, we assume that \(\sum _{i=1}^{b}n_{1i}^{3} \ll n_{2}^{3}\) [34]. Hence, the time complexity of preprocessing \(\mathbf {H}\) is \(O(T(m + n\log {n}) + n_{2}^{3} + mn_{2})\). Note that the time complexity of preprocessing \(\mathbf {T}\) is included in that of preprocessing \(\mathbf {H}\) since \(\mathbf {T}\) and \(\mathbf {H}\) have the same sparsity pattern. \(\square \)
Lemma 8
(Time Complexity of Query Phase in SRWR-Pre) The query phase in Algorithm 4 takes \(O(n_{2}^{2} + n + m)\) time.
Proof
We only consider the main factors of the time complexity of Algorithm 4 in this proof. It takes \(O(n_{2}^{2} + m)\) to compute \(\mathbf {p}_{2}\) since it takes \(O(n_{2} + m)\) to compute \({{\tilde{\mathbf {q}}}}_{2} = \mathbf {q}_{2} - \mathbf {H}_{21}(\mathbf {H}^{-1}_{11}\mathbf {q}_{1})\), and \(O(n_{2}^{2})\) to compute \(\mathbf {U}^{-1}_{\mathbf {H}}(\mathbf {L}^{-1}_{\mathbf {H}}{{\tilde{\mathbf {q}}}}_{2})\) (line 2). It takes O(n) time to concatenate the partitioned vectors (lines 4 and 8) and to compute \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) (lines 9 and 10). Hence, the total time complexity of the query phase is \(O(n_{2}^{2} + n + m)\). \(\square \)
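The block-elimination solve sketched in this proof can be written out with dense matrices as follows. The small random \(\mathbf {H}\) and \(\mathbf {q}\) here are hypothetical stand-ins for the preprocessed matrices; in SRWR-Pre itself, \(\mathbf {H}_{11}\) is block diagonal and sparse formats are used throughout.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(42)
n1, n2 = 6, 3                        # spoke/GCC part and hub part sizes
H = np.eye(n1 + n2) + 0.1 * rng.random((n1 + n2, n1 + n2))
q = rng.random(n1 + n2)

# partition H and q as in Fig. 6
H11, H12 = H[:n1, :n1], H[:n1, n1:]
H21, H22 = H[n1:, :n1], H[n1:, n1:]
q1, q2 = q[:n1], q[n1:]

# preprocessing phase (sketch): invert H11 and LU-factorize the
# Schur complement S = H22 - H21 H11^{-1} H12
H11_inv = np.linalg.inv(H11)
S = H22 - H21 @ H11_inv @ H12
lu, piv = lu_factor(S)

# query phase (sketch of the block elimination in Algorithm 4)
q2_tilde = q2 - H21 @ (H11_inv @ q1)
p2 = lu_solve((lu, piv), q2_tilde)
p1 = H11_inv @ (q1 - H12 @ p2)
p = np.concatenate([p1, p2])         # solves H p = q
```

Only the query phase depends on \(\mathbf {q}\), which is why the expensive inversion and factorization can be amortized over many queries.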
Detailed limitations of existing random walk-based ranking models in signed networks
In this section, we describe the detailed limitations of existing random walk-based ranking models, which are briefly discussed in Sect. 1.
Random Walk with Restart (RWR): We perform RWR on a given signed network after taking absolute edge weights to obtain \(\mathbf {r}\) as follows:
$$\begin{aligned} \mathbf {r}= (1-c){{\tilde{\mathbf {A}}}}^{\top }\mathbf {r}+ c\mathbf {q} \end{aligned}$$where \({{\tilde{\mathbf {A}}}}\) is the row-normalized matrix of the absolute adjacency matrix of the signed network. RWR does not properly consider negative edges for \(\mathbf {r}\).
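For concreteness, a minimal power-iteration sketch of this baseline follows; the 4-node signed graph is a hypothetical example, and the sign of its one negative edge is simply discarded, which is exactly the limitation described above.

```python
import numpy as np

def rwr(A_abs, seed, c=0.15, eps=1e-9):
    # RWR on the absolute adjacency matrix: negative edges are
    # treated exactly like positive ones.
    n = A_abs.shape[0]
    At = (A_abs / A_abs.sum(axis=1, keepdims=True)).T   # column stochastic
    q = np.zeros(n); q[seed] = 1.0
    r = q.copy()
    while True:
        r_new = (1 - c) * (At @ r) + c * q
        if np.abs(r_new - r).sum() < eps:
            return r_new
        r = r_new

# hypothetical signed 4-cycle with one negative edge (2 -> 3)
A_abs = np.abs(np.array([[0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, -1],
                         [1, 0, 0, 0]], dtype=float))
r = rwr(A_abs, seed=0)
```

Every node receives a strictly positive score here, so no amount of distrust along the negative edge is reflected in the ranking.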
Modified Random Walk with Restart (MRWR) [33]: MRWR applies RWR separately on a positive subgraph and a negative subgraph; it obtains \(\mathbf {r}^{+}\) on the positive subgraph and \(\mathbf {r}^{-}\) on the negative subgraph, and then computes \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\). The detailed equations for MRWR are as follows:
$$\begin{aligned} \mathbf {r}^{+}= (1-c){{\tilde{\mathbf {B}}}}_{+}^{\top }\mathbf {r}^{+}+ c\mathbf {q} \text { and } \mathbf {r}^{-}= (1-c){{\tilde{\mathbf {B}}}}_{-}^{\top }\mathbf {r}^{-}+ c\mathbf {q} \end{aligned}$$where \({{\tilde{\mathbf {B}}}}_{+}\) is the row-normalized matrix of the adjacency matrix containing only positive edges, and \({{\tilde{\mathbf {B}}}}_{-}\) is that of the absolute adjacency matrix containing only negative edges. The main limitation of MRWR is that it does not consider relationships between positive and negative edges due to this separation, as shown in the above equations.
Modified Personalized SALSA (MPSALSA) [30]: Ng et al. modified SALSA [25] by introducing a random jump, yielding Personalized SALSA (PSALSA). Similar to MRWR, we apply PSALSA separately on the positive and negative subgraphs, and consider the authority scores on the positive subgraph as \(\mathbf {r}^{+}\) and those on the negative subgraph as \(\mathbf {r}^{-}\). MPSALSA has the same limitation as MRWR.
Personalized Signed Spectral Rank (PSR) [23]: Kunegis et al. proposed PSR, a variant of PageRank, by constructing the following matrix similar to the Google matrix:
$$\begin{aligned} \mathbf {M}_{PSR} = (1-c)\mathbf {D}^{-1}\mathbf {A}^{\top } + c\mathbf {e}_{s}\mathbf {1}^{\top } \end{aligned}$$where \(\mathbf {A}\) is the signed adjacency matrix, \(\mathbf {D}\) is the diagonal out-degree matrix, and \(\mathbf {e}_{s}\) is the sth unit vector. Then, PSR computes the left eigenvector of \(\mathbf {M}_{PSR}\), which induces a relative trustworthiness score vector \(\mathbf {r}\) containing positive and negative values. Although PSR is able to produce \(\mathbf {r}\), the equation for PSR is heuristic because \(\mathbf {M}_{PSR}\) is not a column stochastic matrix. Also, how the random surfer based on this equation interprets negative edges remains unclear.
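A small numeric check confirms that \(\mathbf {M}_{PSR}\) fails to be column stochastic once negative edges are present; the 3-node signed graph below is a hypothetical example.

```python
import numpy as np

# hypothetical 3-node signed adjacency matrix
A = np.array([[ 0.0, 1.0, -1.0],
              [ 1.0, 0.0,  1.0],
              [-1.0, 1.0,  0.0]])
c, s = 0.15, 0
n = A.shape[0]
D_inv = np.diag(1.0 / np.abs(A).sum(axis=1))   # inverse out-degree matrix
e_s = np.zeros(n); e_s[s] = 1.0
M_psr = (1 - c) * D_inv @ A.T + c * np.outer(e_s, np.ones(n))
col_sums = M_psr.sum(axis=0)
```

The columns touched by negative edges sum to c rather than 1, so the surfer's probability mass is not conserved and the model lacks a clean random-walk interpretation.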
Detailed description of evaluation metrics
We describe the details of the metrics used in the link prediction and troll identification tasks. The metrics for the sign prediction task are described in Sect. 5.5.
Link prediction

GAUC (Generalized AUC): Song et al. [35] proposed GAUC, which measures the quality of link prediction in signed networks. An ideal personalized ranking w.r.t. a seed node s needs to rank nodes with positive links to s at the top, those with negative links at the bottom, and nodes of unknown status in the middle of the ranking. For a seed node s, suppose that \(\mathbf {P}_{s}\) is the set of positive nodes potentially connected by s, \(\mathbf {N}_{s}\) is that of negative nodes, and \(\mathbf {O}_{s}\) is that of the other nodes. Then, GAUC of the personalized ranking w.r.t. s is defined as follows:
$$\begin{aligned} \text {GAUC}_{s}&= \frac{\eta }{|\mathbf {P}_{s}|(|\mathbf {O}_{s}| + |\mathbf {N}_{s}|)}\left( \sum _{p \in \mathbf {P}_{s}}\sum _{i \in \mathbf {O}_{s} \cup \mathbf {N}_{s}} {\mathbb {I}}(\mathbf {r}_{p} > \mathbf {r}_{i}) \right) \\&\quad + \frac{1-\eta }{|\mathbf {N}_{s}|(|\mathbf {O}_{s}| + |\mathbf {P}_{s}|)} \left( \sum _{i \in \mathbf {O}_{s} \cup \mathbf {P}_{s}} \sum _{n \in \mathbf {N}_{s}} {\mathbb {I}}(\mathbf {r}_{n} < \mathbf {r}_{i}) \right) \end{aligned}$$where \(\eta = \frac{|\mathbf {P}_{s}|}{|\mathbf {P}_{s}| + |\mathbf {N}_{s}|}\) is the relative ratio of the number of positive edges to that of negative edges, and \({\mathbb {I}}(\cdot )\) is an indicator function that returns 1 if a given predicate is true, and 0 otherwise. GAUC is 1.0 for a perfect ranking list and 0.5 for a random ranking list [35].

AUC (Area Under the Curve): AUC of the personalized ranking scores \(\mathbf {r}\) w.r.t. seed node s in signed networks is defined as follows [35]:
$$\begin{aligned} \text {AUC}_{s} = \frac{1}{|\mathbf {P}_{s}||\mathbf {N}_{s}|}\sum _{p\in \mathbf {P}_{s}}\sum _{n \in \mathbf {N}_{s}}{\mathbb {I}}(\mathbf {r}_{p} > \mathbf {r}_{n}) \end{aligned}$$where \(\mathbf {P}_{s}\) is the set of positive nodes potentially connected by s, and \(\mathbf {N}_{s}\) is the set of negative nodes. \({\mathbb {I}}(\cdot )\) is an indicator function that returns 1 if a given predicate is true, and 0 otherwise. With an ideal ranking list, AUC is 1, representing that each positive sample is ranked higher than all negative samples. For a random ranking, AUC is 0.5. However, AUC is not a satisfactory metric for the link prediction task in signed networks because AUC is designed for two classes (positive and negative), while link prediction in signed networks should consider three classes (positive, unknown, and negative) as described above.
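Both metrics can be sketched directly from their definitions, with the second GAUC term rewarding negatives ranked below all other nodes. The score vector and node sets below are hypothetical, forming a perfect ranking (positives on top, negatives at the bottom).

```python
def gauc(r, P, N, O):
    # GAUC of score vector r (indexable by node id) w.r.t. the positive,
    # negative, and unknown-status node sets of a seed node
    eta = len(P) / (len(P) + len(N))
    pos = sum(r[p] > r[i] for p in P for i in O | N) / (len(P) * (len(O) + len(N)))
    neg = sum(r[n] < r[i] for i in O | P for n in N) / (len(N) * (len(O) + len(P)))
    return eta * pos + (1 - eta) * neg

def auc(r, P, N):
    # plain two-class AUC: positives vs. negatives only
    return sum(r[p] > r[n] for p in P for n in N) / (len(P) * len(N))

# hypothetical perfect ranking: positives {0,1}, unknown {2}, negatives {3,4}
scores = [0.9, 0.8, 0.5, 0.2, 0.1]
P, O, N = {0, 1}, {2}, {3, 4}
g = gauc(scores, P, N, O)
a = auc(scores, P, N)
```

Note that AUC ignores the unknown-status node entirely, while GAUC also checks its position relative to both classes.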
Troll identification
Suppose that we have a personalized ranking \({\mathcal {R}}\) sorted in ascending order of the trustworthiness scores w.r.t. a seed node (i.e., a node with a low score is ranked high); this has the same effect as searching for trolls at the bottom of the original ranking sorted in descending order of those scores.
MAP@k (Mean Average Precision): MAP@k is the mean of the average precision, AP@k, over multiple queries. Suppose that there are l trolls to be captured. Then, AP@k is defined as follows:
$$\begin{aligned} \text {AP@}k = \frac{1}{\min (l, k)}\left( \sum _{t \in \mathbf {T}} \text {Precision@}t\right) \end{aligned}$$where \(\text {Precision@}t\) is the precision at the cutoff t. Note that \(\mathbf {T} = \{t \mid {\mathbb {I}}({\mathcal {R}}[t]) = 1 \text { for } 1 \le t \le k \}\) where \({\mathcal {R}}[t]\) denotes the user ranked at position t in the ranking \({\mathcal {R}}\), and \({\mathbb {I}}({\mathcal {R}}[t])\) is 1 if \({\mathcal {R}}[t]\) is a troll. For N queries, MAP@k is defined as follows:
$$\begin{aligned} \text {MAP@}k = \frac{1}{N}\left( \sum _{i=1}^{N}\text {AP@}k_{i}\right) \end{aligned}$$where \(\text {AP@}k_{i}\) denotes AP@k of the ith query.
NDCG@k (Normalized Discounted Cumulative Gain): NDCG is the normalized value of the Discounted Cumulative Gain (DCG), which is defined as follows:
$$\begin{aligned} \text {DCG}@k = rel_{1} + \sum _{i=2}^{k}\frac{rel_i}{\log _{2}{(i)}}, \quad \text {and}\quad \text {NDCG}@k = \frac{\text {DCG}@k}{\text {IDCG}@k} \end{aligned}$$where \(rel_{i}\) is the user-graded relevance score of the ith ranked item. NDCG@k is obtained by normalizing with the Ideal DCG (IDCG), which is the DCG of the ideal ordering of the ranking.
Precision@k and Recall@k: Precision@k (Recall@k) is the precision (recall) at the cutoff k in a ranking: Precision@k is the fraction of the top-k ranked nodes that are trolls, and Recall@k is the fraction of all trolls that appear in the top-k ranking.
MRR (Mean Reciprocal Rank): MRR@k is the mean of the reciprocal rank (RR) over the top-k responses of multiple queries. RR is the multiplicative inverse of the rank of the first correct answer. Hence, for N queries, MRR@k is defined as follows:
$$\begin{aligned} \text {MRR}@k = \frac{1}{N}\sum _{i=1}^{N}\frac{1}{rank_{i}} \end{aligned}$$where \(rank_{i}\) is the rank of the first relevant item in the top-k ranking of the ith query. If there is no relevant item in the top-k ranking for the ith query, the inverse rank \({rank_{i}}^{-1}\) is set to zero.
Discussion on relative trustworthiness scores of SRWR
In Sect. 4.1, we define the relative trustworthiness \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) where \(\mathbf {r}^{+}\) is for positive SRWR scores, and \(\mathbf {r}^{-}\) is for negative SRWR ones. We show that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are measures, and \(\mathbf {r}\) is a signed measure, using definitions from measure theory [38]. We first introduce the definition of a measure as follows:
Definition 5
(Measure [38]) A measure \(\mu \) on a (finite) set \(\Omega \) with \(\sigma \)algebra \({\mathcal {A}}\) is a function \(\mu : {\mathcal {A}} \rightarrow {\mathbb {R}}_{\ge 0}\) such that
 1.
(Nonnegativity) \(\mu (E) \ge 0\)\(\forall E \in {\mathcal {A}}\),
 2.
(Null empty set) \(\mu (\emptyset ) = 0\),
 3.
(Countable additivity) \(\mu (\bigcup _{i=1}^{\infty }E_{i})=\sum _{i=1}^{\infty }\mu (E_{i})\) for any sequence of pairwise disjoint sets \(E_1, E_2, \ldots \in {\mathcal {A}}\)
where \(\sigma \)algebra \({\mathcal {A}}\) on \(\Omega \) is a collection \({\mathcal {A}}\subseteq 2^{\Omega }\) s.t. it is nonempty, and closed under complements (i.e., \(E \in {\mathcal {A}} \Rightarrow E^{c} \in {\mathcal {A}}\)) and countable unions (i.e., \(E_1, E_2, \cdots \in {\mathcal {A}} \Rightarrow \bigcup _{i=1}^{\infty }E_{i}\in {\mathcal {A}}\)). The pair of \((\Omega , {\mathcal {A}})\) is called measurable space. \(\square \)
In probability theory, \(\sigma \)-algebra \({\mathcal {A}}\) describes all possible events to be measured as probabilities. Note that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are joint probabilities of nodes and signs, i.e., \(\mathbf {r}^{+}_{u}=P(N={u}, S=+)\) and \(\mathbf {r}^{-}_{u}=P(N=u, S=-)\), where N is a random variable of nodes and S is a random variable of the surfer's sign. Note that N takes an item from \(\sigma \)-algebra \({\mathcal {A}}\). The following property shows that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are (nonnegative) measures.
Property 4
Suppose \(\Omega \) is the set \(\mathbf {V}\) of nodes, and \(\sigma \)-algebra \({\mathcal {A}}\) on \(\Omega \) is \(2^\Omega \). Let \(\mu ^{+}=P(N, S=+)\) and \(\mu ^{-}=P(N, S=-)\). Then, both \(\mu ^{+}\) and \(\mu ^{-}\) are (nonnegative) measures according to Definition 5.
Proof
For any \(E \in {\mathcal {A}}\), \(\mu ^{+}(E) \ge 0\) and \(\mu ^{+}(\emptyset ) = 0\) obviously hold since \(P(N, S=+)\) is a probability; hence, \(P(E, S=+) \ge 0\) and \(P(\emptyset , S=+) = 0\). Let \((E_n)_{n\in {\mathbb {N}}}\) be a sequence of pairwise disjoint sets where \(E_n \in {\mathcal {A}}\). Since the sets in the sequence are mutually disjoint, the following holds:
$$\begin{aligned} \mu ^{+}\left( \bigcup _{n=1}^{\infty }E_{n}\right) = P\left( \bigcup _{n=1}^{\infty }E_{n}, S=+\right) = \sum _{n=1}^{\infty }P(E_{n}, S=+) = \sum _{n=1}^{\infty }\mu ^{+}(E_{n}) \end{aligned}$$
Therefore, \(\mu ^{+}=P(N, S=+)\) is a measure by Definition 5. Similarly, \(\mu ^{-}=P(N, S=-)\) is also a measure. \(\square \)
Next, we introduce the definition of signed measure, a generalized version of measure by allowing it to have negative values.
Definition 6
(Signed Measure [38]) Given a set \(\Omega \) and \(\sigma \)algebra \({\mathcal {A}}\), a signed measure on \((\Omega , {\mathcal {A}})\) is a function \(\mu : {\mathcal {A}} \rightarrow {\mathbb {R}}\) such that
 1.
(Real value) \(\mu (E)\) takes a real value in \({\mathbb {R}}\),
 2.
(Null empty set) \(\mu (\emptyset ) = 0\),
 3.
(Countable additivity) \(\mu (\bigcup _{i=1}^{\infty }E_{i})=\sum _{i=1}^{\infty }\mu (E_{i})\) for any sequence of pairwise disjoint sets \(E_1, E_2, \ldots \in {\mathcal {A}}\)\(\square \)
Note that Shannon entropy and electric charge are representative examples of signed measures. The following lemma indicates that the difference between two nonnegative measures is a signed measure.
Lemma 9
(Difference Between Two Nonnegative Measures [38]) Suppose we are given nonnegative measures \(\mu ^{+}\) and \(\mu ^{-}\) on the same measurable space \((\Omega , {\mathcal {A}})\). Then, \(\mu = \mu ^{+} - \mu ^{-}\) is a signed measure.
Proof
Since \(\mu ^{+}\) and \(\mu ^{-}\) are nonnegative, \(\mu \) takes values between \(-\infty \) and \(\infty \). Also, \(\mu (\emptyset ) = \mu ^{+}(\emptyset ) - \mu ^{-}(\emptyset ) = 0\). Moreover, \(\mu \) is countably additive, i.e.,
$$\begin{aligned} \mu \left( \bigcup _{i=1}^{\infty }E_{i}\right) = \mu ^{+}\left( \bigcup _{i=1}^{\infty }E_{i}\right) - \mu ^{-}\left( \bigcup _{i=1}^{\infty }E_{i}\right) = \sum _{i=1}^{\infty }\left( \mu ^{+}(E_{i}) - \mu ^{-}(E_{i})\right) = \sum _{i=1}^{\infty }\mu (E_{i}) \end{aligned}$$
Hence, \(\mu = \mu ^{+} - \mu ^{-}\) is a signed measure according to Definition 6. \(\square \)
Lemma 9 implies that the relative trustworthiness \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) is a signed measure. The trustworthiness \(\mathbf {r}_u\) measures the degree of trustworthiness between seed node s and node u: if \(\mathbf {r}_{u} > 0\), seed node s is likely to trust node u as much as \(\mathbf {r}_{u}\), while if \(\mathbf {r}_{u} < 0\), s is likely to distrust u as much as \(|\mathbf {r}_{u}|\).
Cite this article
Jung, J., Jin, W. & Kang, U. Random walk-based ranking in signed social networks: model and algorithms. Knowl Inf Syst 62, 571–610 (2020). https://doi.org/10.1007/s10115-019-01364-z
Keywords
 Signed networks
 Signed random walk with restart
 Personalized node ranking
 Trustworthiness measure