Abstract
How can we rank nodes in signed social networks? Relationships between nodes in a signed network are represented as positive (trust) or negative (distrust) edges. Many social networks have adopted signed networks to express trust between users. Consequently, ranking friends or enemies in signed networks has received much attention from the data mining community. The ranking problem, however, is challenging because it is difficult to interpret negative edges. Traditional random walk-based methods such as PageRank and random walk with restart cannot provide effective rankings in signed networks since they assume only positive edges. Although several methods have been proposed by modifying traditional ranking models, they also fail to account for proper rankings due to the lack of ability to consider complex edge relations. In this paper, we propose Signed Random Walk with Restart (SRWR), a novel model for personalized ranking in signed networks. We introduce a signed random surfer so that she considers negative edges by changing her sign for walking. Our model provides proper rankings considering signed edges based on the signed random walk. We develop two methods for computing SRWR scores: SRWR-Iter and SRWR-Pre which are iterative and preprocessing methods, respectively. SRWR-Iter naturally follows the definition of SRWR, and iteratively updates SRWR scores until convergence. SRWR-Pre enables fast ranking computation which is important for the performance of applications of SRWR. Through extensive experiments, we demonstrate that SRWR achieves the best accuracy for link prediction, predicts trolls \(4\times \) more accurately, and shows a satisfactory performance for inferring missing signs of edges compared to other competitors. 
In terms of efficiency, SRWR-Pre preprocesses a signed network \(4.5 \times \) faster and requires \(11 \times \) less memory space than other preprocessing methods; furthermore, SRWR-Pre computes SRWR scores up to \(14 \times \) faster than other methods in the query phase.
References
Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, pp 635–644
Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Cartwright D, Harary F (1956) Structural balance: a generalization of Heider’s theory. Psychol Rev 63(5):277
Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20(2):181–187
Duff IS, Grimes RG, Lewis JG (1989) Sparse matrix test problems. ACM Trans Math Softw (TOMS) 15(1):1–14
Easley D, Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge
Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. Proc VLDB Endow 5(5):442–453
Gleich DF, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 597–605
Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU press
Guha R, Kumar R, Raghavan P, Tomkins A (2004) Propagation of trust and distrust. In: Proceedings of the 13th international conference on World Wide Web. ACM, pp 403–412
Haveliwala TH (2002) Topic-sensitive pagerank. In: Proceedings of the 11th international conference on World Wide Web. ACM, pp 517–526
Heider F (1946) Attitudes and cognitive organization. J Psychol 21(1):107–112
Jin W, Jung J, Kang U (2019) Supervised and extended restart in random walks for ranking and link prediction in networks. PLoS ONE 14(3):e0213857
Jung J, Jin W, Sael L, Kang U (2016) Personalized ranking in signed networks using signed random walk with restart. In: IEEE 16th international conference on data mining, ICDM 2016, December 12–15, 2016, Barcelona, Spain, pp 973–978. http://dx.doi.org/10.1109/ICDM.2016.0122
Jung J, Park N, Sael L, Kang U (2017) Bepi: Fast and memory-efficient method for billion-scale random walk with restart. In: Proceedings of the 2017 ACM international conference on management of data, SIGMOD conference 2017, Chicago, IL, USA, May 14–19, 2017, pp 789–804
Jung J, Shin K, Sael L, Kang U (2016) Random walk with restart on large graphs using block elimination. ACM Trans Database Syst 41(2):12. https://doi.org/10.1145/2901736
Kang U, Faloutsos C (2011) Beyond ‘caveman communities’: hubs and spokes for graph compression and mining. In: ICDM
Kang U, Tong H, Sun J (2012) Fast random walk graph kernel. In: Proceedings of the twelfth SIAM international conference on data mining, Anaheim, California, USA, April 26-28, 2012, pp 828–838
Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632
Kleinberg JM (1999) Hubs, authorities, and communities. ACM Comput Surveys (CSUR) 31(4es):5
Kunegis J, Lommatzsch A, Bauckhage C (2009) The slashdot zoo: mining a social network with negative edges. In: Proceedings of the 18th international conference on World wide web. ACM, pp 741–750
Langville AN, Meyer CD, Fernández P (2008) Google’s pagerank and beyond: the science of search engine rankings. Math Intell 30(1):68–69
Lempel R, Moran S (2001) Salsa: the stochastic approach for link-structure analysis. ACM Trans Inf Syst (TOIS) 19(2):131–160
Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World Wide Web. ACM, pp 641–650
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1361–1370
Lim Y, Kang U, Faloutsos C (2014) Slashburn: graph compression and mining beyond caveman communities. IEEE Trans Knowl Data Eng 26(12):3077–3089
Mishra A, Bhattacharya A (2011) Finding the bias and prestige of nodes in networks based on trust scores. In: Proceedings of the 20th international conference on World Wide Web. ACM, pp 567–576
Ng AY, Zheng AX, Jordan MI (2001) Stable algorithms for link analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 258–266
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the Web
Saad Y (2003) Iterative methods for sparse linear systems, vol 82. SIAM
Shahriari M, Jalili M (2014) Ranking nodes in signed social networks. Soc Netw Anal Min 4(1):1–12
Shin K, Jung J, Lee S, Kang U (2015) Bear: Block elimination approach for random walk with restart on large graphs. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 1571–1585
Song D, Meyer DA (2015) Recommending positive links in signed social networks by optimizing a generalized auc. In: AAAI, pp 290–296
Strang G (2006) Linear algebra and its applications. Thomson, Brooks/Cole. https://books.google.ie/books?id=q9CaAAAACAAJ
Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc Nat Acad Sci 107(31):13636–13641
Taylor ME (2006) Measure theory and integration. American Mathematical Soc, Providence
Tong H, Faloutsos C, Gallagher B, Eliassi-Rad T (2007) Fast best-effort pattern matching in large attributed graphs. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 737–746
Tong H, Faloutsos C, Pan J-Y (2008) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14(3):327–346
Wu Z, Aggarwal CC, Sun J (2016) The troll-trust model for ranking in signed networks. In: Proceedings of the ninth ACM international conference on Web search and data mining. ACM, pp 447–456
Yang B, Cheung WK, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19(10):1333–1348
Yoon M, Jin W, Kang U (2018) Fast and accurate random walk with restart on dynamic graphs with guarantees. In: Proceedings of the 2018 World Wide Web conference on World Wide Web, WWW 2018, Lyon, France, April 23–27, 2018, pp 409–418
Yoon M, Jung J, Kang U (2018) Tpa: Fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In: 34th IEEE international conference on data engineering, ICDE 2018, Paris, France, April 16–19, 2018
Acknowledgements
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2013-0-00179, Development of Core Technology for Context-aware Deep-Symbolic Hybrid Learning and Construction of Language Resources]. The Institute of Engineering Research at Seoul National University provided research facilities for this work. The ICT at Seoul National University provided research facilities for this study.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Details of the hub-and-spoke reordering method
SlashBurn [19, 28] is a node reordering algorithm which concentrates nonzero entries of the adjacency matrix of a given graph based on the hub-and-spoke structure. Let n be the number of nodes in a graph, and t be the hub selection ratio whose range is between 0 and 1 where \(\lceil tn \rceil \) indicates the number of nodes selected by SlashBurn as hubs. For each iteration, SlashBurn disconnects \(\lceil tn \rceil \) high-degree nodes, called hub nodes, from the graph; then the graph is split into the giant connected component (GCC) and the disconnected components. The nodes in the disconnected components are called spokes, and each disconnected component forms a block in \(\mathbf {|H|}_{11}\) (or \(\mathbf {T}_{11}\)) in Fig. 6. Then, SlashBurn reorders nodes such that the hub nodes get the highest ids, the spokes get the lowest ids, and the nodes in the GCC get the ids in the middle. SlashBurn repeats this procedure on the GCC recursively until the size of GCC becomes smaller than \(\lceil tn \rceil \). After SlashBurn is done, the reordered adjacency matrix contains a large and sparse block diagonal matrix in the upper left area, as shown in Fig. 6. Figure 14 depicts the procedure of SlashBurn when \(\lceil tn \rceil =1\).
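The reordering procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SlashBurn implementation of [19, 28]: the function names `connected_components` and `hub_and_spoke_order`, the tie-breaking by node id, and the placement of any leftover GCC nodes in the middle of the ordering are all choices made here for illustration.

```python
from collections import deque
from math import ceil

def connected_components(nodes, adj):
    """Connected components of the subgraph induced by `nodes` (BFS)."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def hub_and_spoke_order(adj, t):
    """Return `order`, where order[new_id] is the original node id.

    adj: dict mapping each node to the set of its (undirected) neighbors.
    t:   hub selection ratio in (0, 1].
    """
    k = max(1, ceil(t * len(adj)))   # number of hubs removed per iteration
    remaining = set(adj)             # current giant connected component (GCC)
    spokes, hubs = [], []            # spokes get the lowest ids, hubs the highest
    while remaining and len(remaining) >= k:
        # Degree of each node inside the current GCC.
        deg = {u: sum(1 for v in adj[u] if v in remaining) for u in remaining}
        top = sorted(remaining, key=lambda u: (-deg[u], u))[:k]
        hubs.extend(top)
        remaining -= set(top)
        comps = connected_components(remaining, adj)
        if not comps:
            break
        comps.sort(key=len, reverse=True)
        for comp in comps[1:]:       # disconnected components become spoke blocks
            spokes.extend(comp)
        remaining = set(comps[0])    # recurse on the new GCC
    # Hubs selected first receive the highest ids.
    return spokes + sorted(remaining) + hubs[::-1]

# Toy graph: node 0 is a hub; removing it isolates nodes 1, 2, 3.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4}}
order = hub_and_spoke_order(adj, t=0.1)
```

On this toy graph the first iteration removes node 0, which turns nodes 1, 2, and 3 into singleton spoke blocks and leaves {4, 5} as the GCC for the next iteration.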
1.2 Properties and lemmas
1.2.1 Sum of positive and negative SRWR scores
Property 3
Consider the recursive equation \(\mathbf {p}= (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {q}\) where \(\mathbf {p}= \mathbf {r}^{+}+ \mathbf {r}^{-}\) and \(|{{\tilde{\mathbf {A}}}}|^{\top }\) is a column stochastic matrix. Then \(\mathbf {1}^{\top }\mathbf {p}= \sum _{i}\mathbf {p}_{i} = 1\).
Proof
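By multiplying both sides by \(\mathbf {1}^{\top }\), the equation is represented as follows:
$$\begin{aligned} \mathbf {1}^{\top }\mathbf {p}= (1-c)\mathbf {1}^{\top }|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {1}^{\top }\mathbf {q} \end{aligned}$$
Note that \(\mathbf {1}^{\top }|{{\tilde{\mathbf {A}}}}|^{\top } = (|{{\tilde{\mathbf {A}}}}|\mathbf {1})^{\top }\), and \(|{{\tilde{\mathbf {A}}}}|\) is a row stochastic matrix; thus, \((|{{\tilde{\mathbf {A}}}}|\mathbf {1})^{\top } = \mathbf {1}^{\top }\). Since the entries of the starting vector \(\mathbf {q}\) sum to 1 (i.e., \(\mathbf {1}^{\top }\mathbf {q} = 1\)), the above equation is represented as follows:
$$\begin{aligned} \mathbf {1}^{\top }\mathbf {p}= (1-c)\mathbf {1}^{\top }\mathbf {p} + c \;\Leftrightarrow \; c\mathbf {1}^{\top }\mathbf {p} = c \;\Leftrightarrow \; \mathbf {1}^{\top }\mathbf {p} = 1 \end{aligned}$$
\(\square \)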
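Property 3 can also be checked numerically. The sketch below iterates the recursion of Property 3 on a small hand-made row stochastic matrix and confirms that the entries of \(\mathbf {p}\) keep summing to 1; the matrix `A_abs`, the seed vector `q`, and the helper `rwr_step` are hypothetical values and names chosen only for this illustration.

```python
def rwr_step(A_abs, p, q, c):
    """One update p <- (1 - c) * |A~|^T p + c * q, with |A~| row stochastic."""
    n = len(p)
    new_p = [c * q[j] for j in range(n)]
    for i in range(n):
        for j in range(n):
            # (|A~|^T p)_j = sum_i |A~|_ij * p_i
            new_p[j] += (1 - c) * A_abs[i][j] * p[i]
    return new_p

# Hypothetical 3-node example: every row of A_abs sums to 1.
A_abs = [[0.0, 0.5, 0.5],
         [1.0, 0.0, 0.0],
         [0.5, 0.5, 0.0]]
q = [1.0, 0.0, 0.0]   # all restart mass on the seed node
c = 0.15              # restart probability

p = q[:]
for _ in range(200):
    p = rwr_step(A_abs, p, q, c)

total = sum(p)        # stays at 1 up to floating-point rounding
```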
1.2.2 Analysis on number of iterations of SRWR-Iter
Lemma 3
Suppose \(\mathbf {h}= [\mathbf {r}^{+};\mathbf {r}^{-}]^{\top }\), and \(\mathbf {h}^{(k)}\) is the result of kth iteration in SRWR-Iter. Let \(\delta ^{(k)}\) denote the error \(||\mathbf {h}^{(k)} - \mathbf {h}^{(k-1)} ||_{1}\). Then \(\delta ^{(k)} \le 2(1-c)^{k}\), and the estimated number T of iterations for convergence is \(\log _{1-c}\frac{\epsilon }{2}\) where \(\epsilon \) is an error tolerance, and c is the restart probability.
Proof
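According to Eq. (4), the restart term is identical in consecutive iterations and cancels in the difference; hence, \(\delta ^{(k)}\) is represented as follows:
$$\begin{aligned} \delta ^{(k)} = ||\mathbf {h}^{(k)} - \mathbf {h}^{(k-1)} ||_{1} = (1-c)||{{\tilde{\mathbf {B}}}}^{\top }(\mathbf {h}^{(k-1)} - \mathbf {h}^{(k-2)}) ||_{1} \le (1-c)||{{\tilde{\mathbf {B}}}}^{\top } ||_{1}\,\delta ^{(k-1)} \end{aligned}$$
Note that \(||{{\tilde{\mathbf {B}}}}^{\top } ||_{1} = 1\) since \({{\tilde{\mathbf {B}}}}^{\top }\) is column stochastic as described in Theorem 1. Hence, \(\delta ^{(k)} \le (1-c)\delta ^{(k-1)} \le \cdots \le (1-c)^{k}\delta ^{(1)}\). Since \(\delta ^{(1)} = ||\mathbf {h}^{(1)} - \mathbf {h}^{(0)} ||_{1} \le ||\mathbf {h}^{(1)}||_{1} + ||\mathbf {h}^{(0)} ||_{1} = 2\), \(\delta ^{(k)} \le 2(1-c)^{k}\). Note that when \(\delta ^{(k)} \le \epsilon \), the iteration of SRWR-Iter is terminated. Because \(0< 1-c < 1\), the bound \(2(1-c)^{k}\) drops to \(\epsilon \) or below once \(k \ge \log _{1-c}\frac{\epsilon }{2}\). Thus, the iteration is terminated for such k, and the number T of iterations for convergence is estimated at \(\log _{1-c}\frac{\epsilon }{2}\). \(\square \)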
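To get a feel for the estimate in Lemma 3, the following sketch evaluates \(T = \log _{1-c}\frac{\epsilon }{2}\) for given values of c and \(\epsilon \). The helper name `estimated_iterations` and the sample values of `c` and `eps` are assumptions made for illustration; rounding up to an integer is also a choice made here, since an iteration count must be whole.

```python
import math

def estimated_iterations(c, eps):
    """T = log_{1-c}(eps / 2), rounded up to a whole number of iterations."""
    return math.ceil(math.log(eps / 2) / math.log(1 - c))

c, eps = 0.15, 1e-9            # hypothetical restart probability and tolerance
T = estimated_iterations(c, eps)

# The bound 2 * (1 - c)^k from Lemma 3 first drops to eps or below at k = T.
bound_at_T = 2 * (1 - c) ** T
bound_before_T = 2 * (1 - c) ** (T - 1)
```

Because the base \(1-c\) is smaller than 1, the estimate grows only logarithmically as the tolerance \(\epsilon \) shrinks, and it grows as the restart probability c decreases.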
1.2.3 Time complexity of sparse matrix multiplication
Lemma 4
(Sparse Matrix Multiplication [32]) Suppose that \(\mathbf {A}\) and \(\mathbf {B}\) are \(p \times q\) and \(q \times r\) sparse matrices, respectively, and \(\mathbf {A}\) has \( nnz (\mathbf {A})\) nonzeros. Calculating \(\mathbf {C}=\mathbf {A}\mathbf {B}\) using sparse matrix multiplication requires \(O( nnz (\mathbf {A})r)\) time.
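The bound in Lemma 4 can be made concrete with a small operation count. The sketch below stores \(\mathbf {A}\) as a list of `{column: value}` rows and, as a worst-case simplification assumed here, treats each row of \(\mathbf {B}\) as having all r entries; the function name `sparse_times_dense` and the example matrices are hypothetical.

```python
def sparse_times_dense(A_rows, B, r):
    """Multiply sparse A (list of {col: value} rows) by B (q x r nested lists).

    Returns (C, ops), where ops counts the scalar multiply-add operations.
    """
    C = [[0.0] * r for _ in A_rows]
    ops = 0
    for i, row in enumerate(A_rows):
        for k, a_ik in row.items():      # visit only the nonzeros of A
            for j in range(r):           # each nonzero touches one row of B
                C[i][j] += a_ik * B[k][j]
                ops += 1
    return C, ops

# A is 2 x 3 with nnz(A) = 3; B is 3 x 2, so r = 2.
A_rows = [{0: 1.0, 2: 2.0}, {1: 3.0}]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C, ops = sparse_times_dense(A_rows, B, r=2)
```

Each nonzero of \(\mathbf {A}\) is combined with one row of \(\mathbf {B}\), so the count is exactly \( nnz (\mathbf {A}) \cdot r\) in this worst case.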
1.3 Complexity analysis of proposed methods for SRWR
We analyze the complexity of our proposed methods SRWR-Iter and SRWR-Pre in terms of time and space. The space and time complexities of SRWR-Iter are presented in Lemma 5, and those of SRWR-Pre are presented in Lemmas 6, 7, and 8.
1.3.1 Space and time complexities of SRWR-Iter
Lemma 5
(Space and Time Complexities of SRWR-Iter) Let n and m denote the number of nodes and edges of a signed network, respectively. Then the space complexity of Algorithm 2 is \(O(n+m)\). The time complexity of Algorithm 2 is \(O(T(n+m))\) where the number T of iterations is \(\log _{1-c}\frac{\epsilon }{2}\), c is the restart probability, and \(\epsilon \) is an error tolerance.
Proof
The space complexity for \({{\tilde{\mathbf {A}}}_{+}}\) and \({{\tilde{\mathbf {A}}}_{-}}\) is O(m) if we store the matrices in a sparse matrix format such as compressed column storage. We need O(n) for SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\). Thus, the space complexity is \(O(n+m)\). One iteration in Algorithm 2 takes \(O(n+m)\) time due to sparse matrix vector multiplications and vector additions where the time complexity of a sparse matrix vector multiplication is linear in the number of nonzeros of a matrix [7]. Hence, the total time complexity is \(O(T(n+m))\) where the number T of iterations is \(\log _{1-c}\frac{\epsilon }{2}\), as proved in Lemma 3. \(\square \)
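The \(O(n+m)\) cost per iteration is easiest to see in an edge-list implementation. The sketch below is not Algorithm 2 itself, which is not reproduced in this appendix. It assumes that the update has the form \(\mathbf {r}^{+}\leftarrow (1-c)({{\tilde{\mathbf {A}}}}_{+}^{\top }\mathbf {r}^{+}+ {{\tilde{\mathbf {A}}}}_{-}^{\top }\mathbf {r}^{-}) + c\mathbf {q}\) and \(\mathbf {r}^{-}\leftarrow (1-c)({{\tilde{\mathbf {A}}}}_{-}^{\top }\mathbf {r}^{+}+ {{\tilde{\mathbf {A}}}}_{+}^{\top }\mathbf {r}^{-})\), a form chosen here because it is consistent with Property 3 and with a surfer who flips her sign on negative edges. The function name `srwr_iter`, its parameters, and the toy graph are hypothetical, and every node is assumed to have at least one outgoing edge.

```python
def srwr_iter(edges, n, seed, c=0.15, eps=1e-9, max_iters=1000):
    """Iterate the assumed SRWR update on a signed edge list.

    edges: list of (u, v, sign) with sign in {+1, -1}.
    Returns (r_pos, r_neg, iterations).
    """
    out_deg = [0] * n
    for u, _, _ in edges:
        out_deg[u] += 1              # row sums of |A| for normalization
    r_pos = [0.0] * n
    r_neg = [0.0] * n
    r_pos[seed] = 1.0                # surfer starts at the seed with sign +
    iterations = 0
    for iterations in range(1, max_iters + 1):
        new_pos = [0.0] * n
        new_neg = [0.0] * n
        new_pos[seed] = c            # restart: back to the seed with sign +
        for u, v, sign in edges:     # O(m) work per iteration
            w = (1 - c) / out_deg[u]
            if sign > 0:             # positive edge keeps the surfer's sign
                new_pos[v] += w * r_pos[u]
                new_neg[v] += w * r_neg[u]
            else:                    # negative edge flips the surfer's sign
                new_pos[v] += w * r_neg[u]
                new_neg[v] += w * r_pos[u]
        # L1 distance between consecutive iterates, O(n) work.
        delta = (sum(abs(a - b) for a, b in zip(new_pos, r_pos))
                 + sum(abs(a - b) for a, b in zip(new_neg, r_neg)))
        r_pos, r_neg = new_pos, new_neg
        if delta <= eps:
            break
    return r_pos, r_neg, iterations

# Hypothetical signed graph in which every node has an outgoing edge.
edges = [(0, 1, +1), (0, 2, -1), (1, 2, +1), (2, 0, +1)]
r_pos, r_neg, iters = srwr_iter(edges, n=3, seed=0)
r = [a - b for a, b in zip(r_pos, r_neg)]   # relative trustworthiness scores
```

Only two score vectors of length n and the edge list are kept in memory, which matches the \(O(n+m)\) space bound of Lemma 5.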
1.3.2 Space and time complexities of SRWR-Pre
Lemma 6
(Space Complexity of SRWR-Pre) The space complexity of the preprocessed matrices from SRWR-Pre is \(O(n_{2}^{2} + m)\) where \(n_{2}\) is the number of hubs and m is the number of edges in the graph.
Proof
The space complexity of each preprocessed matrix is summarized in Table 7. \({{\tilde{\mathbf {A}}}_{-}}\), \(\mathbf {|H|}_{12}\), \(\mathbf {|H|}_{21}\), \(\mathbf {T}_{12}\), and \(\mathbf {T}_{21}\) are sparse matrices, and constructed from the input graph; hence, the space complexity is bounded by the number of edges (i.e., O(m)). Note that \(\mathbf {|H|}\) and \(\mathbf {T}\) have the same sparsity pattern; hence, \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) identified by [19, 28] have the same b blocks. The ith block in \(\mathbf {|H|}_{11}^{-1}\) (or \(\mathbf {T}_{11}^{-1}\)) contains \(n_{1i}^{2}\) nonzeros; therefore, \(\mathbf {|H|}_{11}^{-1}\) and \(\mathbf {T}_{11}^{-1}\) require \(O(\sum _{i=1}^{b}n_{1i}^{2})\) space, respectively. Since the dimension of \(\mathbf {L}^{-1}_{\mathbf {|H|}}\), \(\mathbf {U}^{-1}_{\mathbf {|H|}}\), \(\mathbf {L}^{-1}_{\mathbf {T}}\), and \(\mathbf {U}^{-1}_{\mathbf {T}}\) is \(n_2\), they require \(O(n_2^2)\) space. \(\square \)
Note that the blocks in \(\mathbf {|H|}_{11}\) (or \(\mathbf {T}_{11}\)) are discovered by the reordering method [19, 28] as briefly described in Appendix A.1. In real-world graphs, \(\sum _{i=1}^{b}n_{1i}^{2}\) can be bounded by O(m) as shown in [34]. Hence, we assume that the space complexity of \(\mathbf {|H|}_{11}^{-1}\) and \(\mathbf {T}_{11}^{-1}\) is O(m) for simplicity.
Lemma 7
(Time Complexity of Preprocessing Phase in SRWR-Pre) The preprocessing phase in Algorithm 3 takes \(O(T(m+n\log {n}) + n_{2}^{3} + mn_{2})\) where \(T=\lceil {\frac{n_{2}}{tn}}\rceil \) is the number of iterations, and t is the hub selection ratio in the hub-and-spoke reordering method [19, 28].
Proof
We only consider the main factors of the time complexity of Algorithm 3 in this proof. The hub-and-spoke reordering method takes \(O(T(m + n\log {n}))\) time (line 1) where T is \(\lceil {\frac{n_{2}}{tn}}\rceil \), as proved in [19, 28]. Computing the Schur complement of \(\mathbf {|H|}_{11}\) takes \(O(n_{2}^{2} + mn_{2})\) because it takes \(O(mn_{2})\) to compute \(\mathbf {P}_{1} = \mathbf {|H|}_{11}^{-1}\mathbf {|H|}_{12}\) and \(\mathbf {P}_{2} = \mathbf {|H|}_{21}\mathbf {P}_{1}\) by Lemma 4, and \(O(n_{2}^{2})\) to compute \(\mathbf {|H|}_{22} - \mathbf {P}_{2}\) (line 6). It takes \(O(n_{2}^{3})\) to compute the inverse of the LU factors (line 8). Note that computing \(\mathbf {|H|}^{-1}_{11}\) (line 4) requires \(O(\sum _{i=1}^{b}n_{1i}^{3})\) time where it takes \(n_{1i}^{3}\) to obtain the inverse of the ith block. In real-world networks, the size \(n_{1i}\) of each block is much smaller than the number \(n_2\) of hubs; thus, we assume that \(\sum _{i=1}^{b}n_{1i}^{3} \ll n_{2}^{3}\) [34]. Hence, the time complexity of preprocessing \(\mathbf {|H|}\) is \(O(T(m + n\log {n}) + n_{2}^{3} + mn_{2})\). Note that the time complexity of preprocessing \(\mathbf {T}\) is included in that of preprocessing \(\mathbf {|H|}\) since \(\mathbf {T}\) and \(\mathbf {|H|}\) have the same sparsity pattern. \(\square \)
Lemma 8
(Time Complexity of Query Phase in SRWR-Pre) The query phase in Algorithm 4 takes \(O(n_{2}^{2} + n + m)\) time.
Proof
We only consider the main factors of the time complexity of Algorithm 4 in this proof. It takes \(O(n_{2}^{2} + m)\) to compute \(\mathbf {p}_{2}\) since it takes \(O(n_{2} + m)\) to compute \({{\tilde{\mathbf{q}}}}_{2} = \mathbf {q}_{2} - \mathbf {|H|}_{21}(\mathbf {|H|}^{-1}_{11}\mathbf {q}_{1})\), and \(O(n_{2}^{2})\) to compute \(\mathbf {U}^{-1}_{\mathbf {|H|}}(\mathbf {L}^{-1}_{\mathbf {|H|}}{{\tilde{\mathbf{q}}}}_{2})\) (line 2). It takes O(n) time to concatenate the partitioned vectors (lines 4 and 8) and compute \(\mathbf {r}^{+}\) and \(\mathbf {r}\) (lines 9 and 10). Hence, the total time complexity of the query phase is \(O(n_{2}^{2} + n + m)\). \(\square \)
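The origin of \({{\tilde{\mathbf{q}}}}_{2}\) in the proof above can be seen from a standard block elimination. The derivation below is a sketch under an assumption: it takes the linear system to be \(\mathbf {|H|}\mathbf {p}= c\mathbf {q}\) with \(\mathbf {|H|} = \mathbf {I} - (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\), which follows from rearranging the recursion in Property 3 but is not restated in this appendix. The symbol \(\mathbf {S}\) for the Schur complement is introduced here only for illustration.

$$\begin{aligned} \begin{bmatrix} \mathbf {|H|}_{11} & \mathbf {|H|}_{12} \\ \mathbf {|H|}_{21} & \mathbf {|H|}_{22} \end{bmatrix} \begin{bmatrix} \mathbf {p}_{1} \\ \mathbf {p}_{2} \end{bmatrix}&= c\begin{bmatrix} \mathbf {q}_{1} \\ \mathbf {q}_{2} \end{bmatrix} \\ \mathbf {p}_{1}&= \mathbf {|H|}_{11}^{-1}(c\mathbf {q}_{1} - \mathbf {|H|}_{12}\mathbf {p}_{2}) \\ \mathbf {S}\mathbf {p}_{2}&= c(\mathbf {q}_{2} - \mathbf {|H|}_{21}\mathbf {|H|}_{11}^{-1}\mathbf {q}_{1}) = c{{\tilde{\mathbf{q}}}}_{2}, \quad \mathbf {S} = \mathbf {|H|}_{22} - \mathbf {|H|}_{21}\mathbf {|H|}_{11}^{-1}\mathbf {|H|}_{12} \\ \mathbf {p}_{2}&= c\mathbf {U}^{-1}_{\mathbf {|H|}}(\mathbf {L}^{-1}_{\mathbf {|H|}}{{\tilde{\mathbf{q}}}}_{2}), \quad \mathbf {S} = \mathbf {L}_{\mathbf {|H|}}\mathbf {U}_{\mathbf {|H|}} \end{aligned}$$

The second line comes from the first block row, and the third from substituting it into the second block row. Under this reading, the only dense operations at query time are the two products with the \(n_{2} \times n_{2}\) inverse factors, which is where the \(O(n_{2}^{2})\) term comes from.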
1.4 Detailed limitations of existing random walk-based ranking models in signed networks
In this section, we describe in detail the limitations of the existing random walk-based ranking models that are briefly described in Sect. 1.
Random Walk with Restart (RWR): We perform RWR on a given signed network after taking absolute edge weights to obtain \(\mathbf {r}\) as follows:
$$\begin{aligned} \mathbf {r}= (1-c){|{\tilde{\mathbf {A}}}|}^{\top }\mathbf {r}+ c\mathbf {q} \end{aligned}$$
where \({|{\tilde{\mathbf {A}}}|}\) is the row-normalized matrix of the absolute adjacency matrix in the signed network. RWR does not properly consider negative edges for \(\mathbf {r}\).
Modified Random Walk with Restart (M-RWR) [33]: M-RWR applies RWR separately on both a positive subgraph and a negative subgraph; thus, it obtains \(\mathbf {r}^{+}\) on the positive subgraph and \(\mathbf {r}^{-}\) on the negative subgraph, and then, computes \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\). The detailed equations for M-RWR are as follows:
$$\begin{aligned} \mathbf {r}^{+}= (1-c){{\tilde{\mathbf {B}}}}_{+}^{\top }\mathbf {r}^{+}+ c\mathbf {q} \text { and } \mathbf {r}^{-}= (1-c){{\tilde{\mathbf {B}}}}_{-}^{\top }\mathbf {r}^{-}+ c\mathbf {q} \end{aligned}$$
where \({{\tilde{\mathbf {B}}}}_{+}\) is the row-normalized matrix of the adjacency matrix containing only positive edges, and \({{\tilde{\mathbf {B}}}}_{-}\) is that of the absolute adjacency matrix containing only negative edges. The main limitation of M-RWR is that it does not consider relationships between positive and negative edges due to the separation as shown in the above equations.
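The effect of this separation can be seen on a tiny example. The sketch below follows the two M-RWR equations by running a plain RWR on the positive-only and negative-only edge lists and subtracting the results. The function names `rwr` and `m_rwr`, the fixed iteration count, and the three-node graph are assumptions made for illustration; nodes without outgoing edges in a subgraph simply pass no score onward.

```python
def rwr(edges, n, seed, c=0.15, iters=100):
    """Plain RWR on an unsigned edge list, with row-normalized transitions."""
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    r = [0.0] * n
    r[seed] = 1.0
    for _ in range(iters):
        new_r = [0.0] * n
        new_r[seed] = c                       # restart at the seed
        for u, v in edges:
            new_r[v] += (1 - c) * r[u] / out_deg[u]
        r = new_r
    return r

def m_rwr(signed_edges, n, seed, c=0.15):
    """M-RWR: run RWR on each subgraph separately, then subtract."""
    pos = [(u, v) for u, v, s in signed_edges if s > 0]
    neg = [(u, v) for u, v, s in signed_edges if s < 0]
    r_pos = rwr(pos, n, seed, c)
    r_neg = rwr(neg, n, seed, c)
    return [a - b for a, b in zip(r_pos, r_neg)]

# Seed 0 distrusts node 1, and node 1 distrusts node 2.
signed_edges = [(0, 1, -1), (1, 2, -1)]
r = m_rwr(signed_edges, n=3, seed=0)
```

Node 2 is reached only through two consecutive negative edges, yet it receives a negative score just like node 1, because the negative subgraph is walked in isolation and the sign of the first edge never interacts with the sign of the second.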
Modified Personalized SALSA (M-PSALSA) [30]: Ng et al. modified SALSA by introducing a random jump into it; the resulting method is called Personalized SALSA (PSALSA). Similar to M-RWR, we apply PSALSA separately to the positive and negative subgraphs, and consider the authority scores on the positive subgraph as \(\mathbf {r}^{+}\), and those on the negative subgraph as \(\mathbf {r}^{-}\). M-PSALSA has the same limitation as M-RWR.
Personalized Signed Spectral Rank (PSR) [23]: Kunegis et al. proposed PSR which is a variant of PageRank by constructing the following matrix similar to Google matrix:
$$\begin{aligned} \mathbf {M}_{PSR} = (1-c)\mathbf {D}^{-1}\mathbf {A}^{\top } + c\mathbf {e}_{s}\mathbf {1}^{\top } \end{aligned}$$
where \(\mathbf {A}\) is the signed adjacency matrix, \(\mathbf {D}\) is the diagonal out-degree matrix, and \(\mathbf {e}_{s}\) is the sth unit vector. Then, PSR computes the left eigenvector of \(\mathbf {M}_{PSR}\), which induces a relative trustworthiness score vector \(\mathbf {r}\) including positive and negative values. Although PSR is able to produce \(\mathbf {r}\), the equation for PSR is heuristic because \(\mathbf {M}_{PSR}\) is not a column stochastic matrix. Also, it is unclear how a random surfer following this equation interprets negative edges.
1.5 Detailed description of evaluation metrics
We describe the details of the metrics used in the link prediction and the troll identification tasks. The metrics for the sign prediction task are described in Sect. 5.5.
1.5.1 Link prediction
- GAUC (Generalized AUC): Song et al. [35] proposed GAUC which measures the quality of link prediction in signed networks. An ideal personalized ranking w.r.t. a seed node s needs to rank nodes with positive links to s at the top, those with negative links at the bottom, and other unknown status nodes in the middle of the ranking. For a seed node s, suppose that \(\mathbf {P}_{s}\) is the set of positive nodes potentially connected by s, \(\mathbf {N}_{s}\) is that of negative nodes, and \(\mathbf {O}_{s}\) is that of the other nodes. Then, GAUC of the personalized ranking w.r.t. s is defined as follows:
$$\begin{aligned} \text {GAUC}_{s}&= \frac{\eta }{|\mathbf {P}_{s}|(|\mathbf {O}_{s}| + |\mathbf {N}_{s}|)}\left( \sum _{p \in \mathbf {P}_{s}}\sum _{i \in \mathbf {O}_{s} \cup \mathbf {N}_{s}} {\mathbb {I}}(\mathbf {r}_{p} > \mathbf {r}_{i}) \right) \\&\quad + \frac{1-\eta }{|\mathbf {N}_{s}|(|\mathbf {O}_{s}| + |\mathbf {P}_{s}|)} \left( \sum _{i \in \mathbf {O}_{s} \cup \mathbf {P}_{s}} \sum _{n \in \mathbf {N}_{s}} {\mathbb {I}}(\mathbf {r}_{i} > \mathbf {r}_{n}) \right) \end{aligned}$$
where \(\eta = \frac{|\mathbf {P}_{s}|}{|\mathbf {P}_{s}| + |\mathbf {N}_{s}|}\) is the relative ratio of the number of positive edges and that of negative edges, and \({\mathbb {I}}(\cdot )\) is an indicator function that returns 1 if a given predicate is true, or 0 otherwise. The first term rewards positive nodes that score above the unknown and negative nodes, and the second term rewards unknown and positive nodes that score above the negative nodes. GAUC will be 1.0 for the perfect ranking list and 0.5 for a random ranking list [35].
- AUC (Area Under the Curve): AUC of the personalized ranking scores \(\mathbf {r}\) w.r.t. seed node s in signed networks is defined as follows [35]:
$$\begin{aligned} \text {AUC}_{s} = \frac{1}{|\mathbf {P}_{s}||\mathbf {N}_{s}|}\sum _{p\in \mathbf {P}_{s}}\sum _{n \in \mathbf {N}_{s}}{\mathbb {I}}(\mathbf {r}_{p} > \mathbf {r}_{n}) \end{aligned}$$
where \(\mathbf {P}_{s}\) is the set of positive nodes potentially connected by s, and \(\mathbf {N}_{s}\) is the set of negative nodes. \({\mathbb {I}}(\cdot )\) is an indicator function that returns 1 if a given predicate is true, or 0 otherwise. With an ideal ranking list, AUC should be 1, meaning that each positive sample is ranked higher than all the negative samples. For a random ranking, AUC will be 0.5. However, AUC is not a satisfactory metric for the link prediction task in signed networks because AUC is designed for two classes (positive and negative), while link prediction in signed networks should consider three classes (positive, unknown, and negative) as described above.
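Both link prediction metrics reduce to counting correctly ordered pairs, as the following sketch shows. The function names `auc` and `gauc`, the dictionary-based score lookup, and the example scores are assumptions made for illustration. In `gauc`, the second term counts an unknown or positive node when it scores above a negative node, which is the direction required for a perfect ranking to reach 1.0.

```python
def auc(scores, pos, neg):
    """Fraction of (positive, negative) pairs ranked in the right order."""
    hits = sum(1 for p in pos for n in neg if scores[p] > scores[n])
    return hits / (len(pos) * len(neg))

def gauc(scores, pos, neg, others):
    """Generalized AUC over positive, unknown (others), and negative nodes."""
    eta = len(pos) / (len(pos) + len(neg))
    below_pos = list(others) + list(neg)     # should rank below positives
    above_neg = list(others) + list(pos)     # should rank above negatives
    top = sum(1 for p in pos for i in below_pos if scores[p] > scores[i])
    bottom = sum(1 for i in above_neg for n in neg if scores[i] > scores[n])
    return (eta * top / (len(pos) * len(below_pos))
            + (1 - eta) * bottom / (len(neg) * len(above_neg)))

# Hypothetical scores: one positive, two unknown, and one negative node.
scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": -0.3}
perfect_gauc = gauc(scores, pos=["a"], neg=["d"], others=["b", "c"])
perfect_auc = auc(scores, pos=["a"], neg=["d"])

# Ranking the unknown nodes above the positive node lowers GAUC but not AUC.
scores_bad = {"a": 0.2, "b": 0.5, "c": 0.4, "d": -0.3}
bad_gauc = gauc(scores_bad, pos=["a"], neg=["d"], others=["b", "c"])
bad_auc = auc(scores_bad, pos=["a"], neg=["d"])
```

The second example illustrates why AUC alone is insufficient for three classes: it stays at its maximum even though the unknown nodes outrank the positive one.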
1.5.2 Troll identification
Suppose that we have a personalized ranking \({\mathcal {R}}\) in ascending order of the trustworthiness scores w.r.t. a seed node (i.e., a node with a low score is ranked high). Searching the top of \({\mathcal {R}}\) is then equivalent to searching for trolls at the bottom of the original ranking, which is in descending order of those scores.
MAP@k (Mean Average Precision): MAP@k is the mean of average precisions, AP@k, for multiple queries. Suppose that there are l trolls to be captured. Then, AP@k is defined as follows:
$$\begin{aligned} \text {AP@}k = \frac{1}{\min (l, k)}\left( \sum _{t \in \mathbf {T}} \text {Precision@}t\right) \end{aligned}$$
where \(\text {Precision@}t\) is the precision at the cutoff t. Note that \(\mathbf {T} = \{t | {\mathbb {I}}({\mathcal {R}}[t]) = 1 \text { for } 1 \le t \le k \}\) where \({\mathcal {R}}[t]\) denotes the user ranked at position t in the ranking \({\mathcal {R}}\), and \({\mathbb {I}}({\mathcal {R}}[t])\) is 1 if \({\mathcal {R}}[t]\) is a troll, and 0 otherwise. For N queries, MAP@k is defined as follows:
$$\begin{aligned} \text {MAP@}k = \frac{1}{N}\left( \sum _{i=1}^{N}\text {AP@}k_{i}\right) \end{aligned}$$
where \(\text {AP@}k_{i}\) is the AP@k of the ith query.
NDCG@k (Normalized Discounted Cumulative Gain): NDCG is the normalized value of Discounted Cumulative Gain (DCG), which is defined as follows:
$$\begin{aligned} \text {DCG}@k = rel_{1} + \sum _{i=2}^{k}\frac{rel_i}{\log _{2}{(i)}}, \quad \text {and}\quad \text {NDCG}@k = \frac{\text {DCG}@k}{\text {IDCG}@k} \end{aligned}$$
where \(rel_{i}\) is the user-graded relevance score for the ith ranked item. NDCG@k is obtained by normalizing DCG@k by the Ideal DCG (IDCG), which is the DCG for the ideal order of ranking.
Precision@k and Recall@k: Precision@k (Recall@k) is the precision (recall) at the cutoff k in a ranking. Precision@k is the ratio of identified trolls in top-k ranking, and Recall@k is the ratio of identified trolls in the total trolls.
MRR (Mean Reciprocal Rank): MRR@k is the mean of the reciprocal ranks (RR) of the top-k query responses. RR is the multiplicative inverse of the rank of the first correct answer. Hence, for N queries, MRR@k is defined as follows:
$$\begin{aligned} \text {MRR}@k = \frac{1}{N}\sum _{i=1}^{N}\frac{1}{rank_{i}} \end{aligned}$$
where \(rank_{i}\) is the rank position of the first relevant item in the top-k ranking of the ith query. If there is no relevant item in the ranking for the ith query, the inverse of the rank, \({rank_{i}}^{-1}\), becomes zero.
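The troll identification metrics above can be sketched for a single query as follows; MAP@k and MRR@k are then plain averages of the per-query values. The function names, the use of a set of known trolls, and the example ranking are assumptions made for illustration. The ranking is taken to be in ascending order of trustworthiness score, so likely trolls come first.

```python
import math

def average_precision_at_k(ranking, trolls, k):
    """AP@k: sum of Precision@t over troll positions t <= k, over min(l, k)."""
    hits, total = 0, 0.0
    for t, user in enumerate(ranking[:k], start=1):
        if user in trolls:
            hits += 1
            total += hits / t                  # Precision@t at a troll position
    return total / min(len(trolls), k)

def ndcg_at_k(relevance, k):
    """NDCG@k with DCG@k = rel_1 + sum_{i>=2} rel_i / log2(i)."""
    def dcg(rels):
        return sum(rel if i == 1 else rel / math.log2(i)
                   for i, rel in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0

def reciprocal_rank(ranking, trolls, k):
    """1 / rank of the first troll in the top-k, or 0 if none is found."""
    for t, user in enumerate(ranking[:k], start=1):
        if user in trolls:
            return 1.0 / t
    return 0.0

# Hypothetical query: two known trolls, one of them found at position 2.
ranking = ["u3", "u1", "u4", "u2"]
trolls = {"u1", "u2"}
ap = average_precision_at_k(ranking, trolls, k=3)
rr = reciprocal_rank(ranking, trolls, k=3)
ndcg = ndcg_at_k([1 if u in trolls else 0 for u in ranking], k=3)
```

Here binary relevance (1 for a troll, 0 otherwise) is used for NDCG, which is one possible choice of relevance score.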
1.6 Discussion on relative trustworthiness scores of SRWR
In Sect. 4.1, we define the relative trustworthiness \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) where \(\mathbf {r}^{+}\) is for positive SRWR scores, and \(\mathbf {r}^{-}\) is for negative SRWR ones. We show that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are measures, and \(\mathbf {r}\) is a signed measure using definitions from measure theory [38]. We first introduce the definition of measure as follows:
Definition 5
(Measure [38]) A measure \(\mu \) on a (finite) set \(\Omega \) with \(\sigma \)-algebra \({\mathcal {A}}\) is a function \(\mu : {\mathcal {A}} \rightarrow {\mathbb {R}}_{\ge 0}\) such that
1. (Nonnegativity) \(\mu (E) \ge 0\) \(\forall E \in {\mathcal {A}}\),
2. (Null empty set) \(\mu (\emptyset ) = 0\),
3. (Countable additivity) \(\mu (\bigcup _{i=1}^{\infty }E_{i})=\sum _{i=1}^{\infty }\mu (E_{i})\) for any sequence of pairwise disjoint sets, \(E_1, E_2, \ldots \in {\mathcal {A}}\),

where \(\sigma \)-algebra \({\mathcal {A}}\) on \(\Omega \) is a collection \({\mathcal {A}}\subseteq 2^{\Omega }\) s.t. it is nonempty, and closed under complements (i.e., \(E \in {\mathcal {A}} \Rightarrow E^{c} \in {\mathcal {A}}\)) and countable unions (i.e., \(E_1, E_2, \ldots \in {\mathcal {A}} \Rightarrow \bigcup _{i=1}^{\infty }E_{i}\in {\mathcal {A}}\)). The pair \((\Omega , {\mathcal {A}})\) is called a measurable space. \(\square \)
In probability theory, \(\sigma \)-algebra \({\mathcal {A}}\) describes all possible events to be measured as probability. Note that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are joint probabilities of nodes and signs, i.e., \(\mathbf {r}^{+}_{u}=P(N={u}, S=+)\) and \(\mathbf {r}^{-}_{u}=P(N=u, S=-)\) where N is a random variable of nodes, and S is a random variable of the surfer’s sign. Note that N takes an item from \(\sigma \)-algebra \({\mathcal {A}}\). The following property shows that \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are (nonnegative) measures.
Property 4
Suppose \(\Omega \) is the set \(\mathbf {V}\) of nodes, and \(\sigma \)-algebra \({\mathcal {A}}\) on \(\Omega \) is \(2^\Omega \). Let \(\mu ^{+}=P(N, S=+)\) and \(\mu ^{-}=P(N, S=-)\). Then, both \(\mu ^{+}\) and \(\mu ^{-}\) are (nonnegative) measures according to Definition 5.
Proof
For any \(E \in {\mathcal {A}}\), \(\mu ^{+}(E) \ge 0\) and \(\mu ^{+}(\emptyset ) = 0\) are obviously true since \(P(N, S=+)\) is a probability; hence, \(P(E, S=+) \ge 0\) and \(P(\emptyset , S=+) = 0\). Let \((E_n)_{n\in {\mathbb {N}}}\) be a sequence of pairwise disjoint sets where \(E_n \in {\mathcal {A}}\). Since the sets in the sequence are mutually disjoint, the following holds:
$$\begin{aligned} \mu ^{+}\left( \bigcup _{n=1}^{\infty }E_{n}\right) = P\left( N \in \bigcup _{n=1}^{\infty }E_{n}, S=+\right) = \sum _{n=1}^{\infty }P(N \in E_{n}, S=+) = \sum _{n=1}^{\infty }\mu ^{+}(E_{n}) \end{aligned}$$
Therefore, \(\mu ^{+}=P(N, S=+)\) is a measure by Definition 5. Similarly, \(\mu ^{-}=P(N, S=-)\) is also a measure. \(\square \)
Next, we introduce the definition of signed measure, a generalized version of measure by allowing it to have negative values.
Definition 6
(Signed Measure [38]) Given a set \(\Omega \) and \(\sigma \)-algebra \({\mathcal {A}}\), a signed measure on \((\Omega , {\mathcal {A}})\) is a function \(\mu : {\mathcal {A}} \rightarrow {\mathbb {R}}\) such that
1. (Real value) \(\mu (E)\) takes a real value in \({\mathbb {R}}\),
2. (Null empty set) \(\mu (\emptyset ) = 0\),
3. (Countable additivity) \(\mu (\bigcup _{i=1}^{\infty }E_{i})=\sum _{i=1}^{\infty }\mu (E_{i})\) for any sequence of pairwise disjoint sets, \(E_1, E_2, \ldots \in {\mathcal {A}}\). \(\square \)
Note that Shannon entropy and electric charge are representative examples of signed measure. Then, the following lemma indicates the difference between two nonnegative measures is a signed measure.
Lemma 9
(Difference Between Two Nonnegative Measures [38]) Suppose we are given nonnegative measure \(\mu ^{+}\) and \(\mu ^{-}\) on the same measurable space \((\Omega , {\mathcal {A}})\). Then, \(\mu = \mu ^{+} - \mu ^{-}\) is a signed measure.
Proof
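Since \(\mu ^{+}\) and \(\mu ^{-}\) are nonnegative, \(\mu \) is located between \(-\infty \) and \(\infty \). Also, \(\mu (\emptyset ) = \mu ^{+}(\emptyset ) - \mu ^{-}(\emptyset ) = 0\). Moreover, \(\mu \) is countably additive, i.e., for any sequence of pairwise disjoint sets \(E_1, E_2, \ldots \in {\mathcal {A}}\),
$$\begin{aligned} \mu \left( \bigcup _{i=1}^{\infty }E_{i}\right) = \mu ^{+}\left( \bigcup _{i=1}^{\infty }E_{i}\right) - \mu ^{-}\left( \bigcup _{i=1}^{\infty }E_{i}\right) = \sum _{i=1}^{\infty }\mu ^{+}(E_{i}) - \sum _{i=1}^{\infty }\mu ^{-}(E_{i}) = \sum _{i=1}^{\infty }\mu (E_{i}) \end{aligned}$$
Hence, \(\mu = \mu ^{+} - \mu ^{-}\) is a signed measure according to Definition 6. \(\square \)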
Lemma 9 implies that the relative trustworthiness \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) is a signed measure. The trustworthiness \(\mathbf {r}_u\) measures the degree of trustworthiness between seed node s and node u: if \(\mathbf {r}_{u} > 0\), seed node s is likely to trust node u as much as \(\mathbf {r}_{u}\), while if \(\mathbf {r}_{u} < 0\), s is likely to distrust u as much as \(|\mathbf {r}_{u}|\).
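On a finite node set, the conditions of Definition 6 are easy to check by hand or in code. The sketch below treats score vectors as weights on nodes and defines the measure of a node set as the sum of its weights. The helper `measure` and the score values are hypothetical and chosen only so that the positive and negative scores together sum to 1.

```python
def measure(weights, event):
    """Measure of a set of nodes: the sum of their weights."""
    return sum(weights[u] for u in event)

# Hypothetical positive and negative scores on three nodes.
r_pos = {"a": 0.4, "b": 0.1, "c": 0.2}
r_neg = {"a": 0.05, "b": 0.2, "c": 0.05}
r = {u: r_pos[u] - r_neg[u] for u in r_pos}   # relative trustworthiness

E1, E2 = {"a"}, {"b", "c"}                    # two disjoint events
union_value = measure(r, E1 | E2)
sum_of_parts = measure(r, E1) + measure(r, E2)
distrusted = measure(r, {"b"})                # negative: b is distrusted
```

The value on the union of disjoint sets equals the sum of the values on the parts, the empty set has value 0, and individual sets such as {b} may take negative values, which is what separates a signed measure from a nonnegative one.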
Cite this article
Jung, J., Jin, W. & Kang, U. Random walk-based ranking in signed social networks: model and algorithms. Knowl Inf Syst 62, 571–610 (2020). https://doi.org/10.1007/s10115-019-01364-z
DOI: https://doi.org/10.1007/s10115-019-01364-z