1 Introduction

Recently, graph-structured data has become increasingly ubiquitous, especially with the growing popularity of e-commerce platforms and social networks [32, 33]. These networks can be modeled as signed graphs whose edges carry either positive or negative signs. Great research efforts have been devoted to unipartite signed graphs [4, 18, 22, 25, 26, 37]. However, signed bipartite graphs, a common form of signed graphs, have been largely overlooked by the research community. Signed bipartite graphs have two independent partitions of nodes, and edges are only formed between nodes of different types. Such graphs are prevalent across many domains. For example, on e-commerce platforms such as Amazon, buyers can provide positive or negative reviews of a product.

Previous works primarily focus on unipartite signed graphs or unsigned bipartite graphs separately [2, 5, 31], i.e., graphs without node partitions or without edge sign information, respectively. These methods cannot handle the complexities brought by the combination of bipartite and signed settings. Some methods have been proposed for signed bipartite graphs based on balance theory [6, 8]. However, they rely on conventional random walks and thus cannot fully exploit the structural, attribute, and high-order information, which limits their performance on the sign prediction task. With the growing popularity of graph neural networks (GNNs), a variety of network embedding and GNN-based methods have been developed for unipartite signed graphs and unsigned bipartite graphs. These models also lack the capability to fully preserve the information of negative links and node partitions. For example, graph neural networks for signed graphs cannot capture the similarity between nodes within the same partition of a signed bipartite graph. Similarly, GNNs designed for unsigned bipartite graphs [9, 21] aggregate neighbor information in the same way for both positive and negative edges; as a result, they entirely ignore the sign information and cannot be used for the sign prediction task. Unsatisfactory results were observed in our initial attempt to directly apply these models to signed bipartite graphs. Therefore, a graph neural network on signed bipartite graphs cannot be obtained by simply applying existing models.

Fig. 1: Example of a signed bipartite graph

Motivated by the above observations, in this paper, we design a novel graph neural network on signed bipartite graphs that integrates a proposed polarity attribute, named the Polarity-based Graph Convolutional Network (PbGCN). PbGCN first obtains a polarity value for each node, which describes the other nodes' opinions toward it.

For example, in Figure 1, people hold polarized opinions toward durian (\(t_2\)), even though they may share similar dietary preferences, e.g., they all love banana (\(t_1\)). Consequently, durian (\(t_2\)) and banana (\(t_1\)) receive small and large polarity values, respectively.

The polarity value paves the way for PbGCN to perform one-mode projection [38] by adding edges between nodes of the same type that share similar polarity values. PbGCN can then aggregate information directly along the established edges, even though such node pairs only have high-order proximity in the original graph. Based on the above ideas, we build a graph neural network for signed bipartite graphs that significantly boosts the performance of the sign prediction task compared with the baseline approaches. The main contributions of the paper are summarized as follows.

  • To the best of our knowledge, this is the first work that develops a graph neural network model on the signed bipartite graph to solve the sign prediction task.

  • Our model combines the advantages of balance theory and polarity information, which describes the degree of controversy in the opinions toward a node. This allows the graph neural network to perform aggregation between nodes of the same type in signed bipartite graphs directly.

  • Extensive experiments conducted on both real-life and synthetic graphs demonstrate that PbGCN outperforms the state-of-the-art methods for sign prediction.

Roadmap. The rest of the paper is organized as follows: Section 2 introduces the work highly related to this paper (Section 2.1) and gives the key definitions and important notations used throughout (Section 2.2). Section 3 describes our proposed model in detail; specifically, Section 3.1 gives the overall framework of the model, and the remaining subsections describe each component thoroughly. Section 4 presents the empirical evaluation of the proposed model, where Section 4.1 introduces the datasets and baseline methods used in the experiments, Section 4.2 describes the experimental settings, and Sections 4.3 to 4.5 present the experimental results, including the ablation study and the parameter sensitivity analysis. Finally, the paper concludes in Section 5.

2 Related work and preliminaries

Recently, with the increasing popularity of social networks and e-commerce platforms, signed bipartite graphs have become increasingly ubiquitous. Following this trend, a great number of graph analytics studies have been proposed to solve specific tasks on signed bipartite graphs, such as link prediction, sign prediction, and node classification. Besides, due to the significant success of graph neural networks (GNNs) on various graph analysis tasks, several GNN-based models have also been proposed for bipartite graphs. In this section, we introduce the related work along with important definitions and notations.

Fig. 2: Example of one-mode projection

2.1 Related work

2.1.1 Analysis on relatively simple graphs

Numerous research works have proposed effective methods on graphs with relatively simple structures, i.e., unipartite signed graphs and unsigned bipartite graphs [2, 18, 22, 31, 37]. Among them, [18, 22, 37] focus on the analysis of unipartite signed graphs. These studies carefully and effectively analyze the positive and negative relationships among the nodes of signed social networks and point out the patterns and regularities that frequently appear in such networks. As a result, they achieve high accuracy on the sign prediction task, which is useful for predicting whether users are more likely to have positive or negative relationships with each other in social networks. In addition, the method proposed by [2] can be applied to the structure of bipartite graphs and efficiently reproduces their degree distributions as well as the degree-wise metamorphosis coefficients. Moreover, [31] presents an advanced method for counting the number of butterflies in a bipartite graph. However, these methods cannot be directly applied to the analysis of signed bipartite graphs due to their unipartite or unsigned settings, respectively.

2.1.2 Methods for signed bipartite graph

Aiming to fill the gap mentioned in the previous subsection, a variety of heuristic and theoretical models [1, 3, 6, 8, 11, 13, 17, 24] have been proposed for applications on signed bipartite graphs such as sign prediction, link prediction, and recommendation. Among these efforts, [6, 8, 13] analyze the voting records of representatives in the U.S. Congress using balance theory and achieve good predictions of voting outcomes. In addition, other methods analyze signed bipartite graphs based on signed random walk with restart [17], signed multiplicative rank propagation [24], a projection method [11], and a linear algorithm [1]. Furthermore, [3] develops a heuristic solution and obtains outstanding performance on node classification in signed bipartite networks.

2.1.3 GNNs on bipartite graph

Recently, with the increasing popularity and development of network embedding [10, 23] and graph neural networks (GNNs) [12, 16, 27], numerous studies have proposed neural network-based methods for solving problems on bipartite graphs [9, 19, 21, 28, 30, 34]. Among them, [9] proposes a novel framework for cancer survival prediction, and [19, 21] propose novel methods to handle large-scale e-commerce tasks by analyzing the hierarchical structures of bipartite graphs with GNNs. [34] proposes an analysis that provides insights into better extracting and fusing information from protein-protein interaction networks for drug repurposing. Furthermore, [7, 14, 15, 20, 35] design representation learning models [29, 36] that preserve both positive and negative link information within signed graphs. However, these methods lack the capability to fully exploit the structural and attribute information that lies in signed bipartite graphs. In this paper, we propose the first graph neural network for signed bipartite graphs that addresses the above issues and improves sign prediction performance.

2.2 Preliminaries

This subsection presents the key definitions used in this paper; Table 1 summarizes the important notations used frequently throughout.

Definition 1

(Signed Bipartite Graph) A bipartite graph has two separate vertex sets, each of which only has connections with the vertices in the other vertex set. A signed bipartite graph can be denoted as \(G = (\mathcal {V}_S, \mathcal {V}_T ,\mathcal {E}^+, \mathcal {E}^-)\), where \(\mathcal {V}_S = \{s_1, s_2,...,s_m\}, \mathcal {V}_T = \{t_1, t_2,...,t_n\}\) are the mutually exclusive node sets. \(\mathcal {E}^+ \subset \mathcal {V}_S \times \mathcal {V}_T\) and \(\mathcal {E}^- \subset \mathcal {V}_S \times \mathcal {V}_T\) are the positive and negative edges that connect nodes between two partitions, where \(\mathcal {E}^+ \cap \mathcal {E}^- = \emptyset\).

Definition 2

(One-Mode Projection) [38] One-mode projection on a bipartite graph aims to construct projection graphs that create links between nodes of the same type, i.e., to build graphs \(G_S = (\mathcal {V}_S ,\mathcal {E}_S)\) and \(G_T = (\mathcal {V}_T ,\mathcal {E}_T)\), where \(\mathcal {E}_S \subset \mathcal {V}_S \times \mathcal {V}_S\) and \(\mathcal {E}_T \subset \mathcal {V}_T \times \mathcal {V}_T\).

Definition 3

(Polarity Value) The polarity value measures the disagreement among the attitudes of a vertex's neighbors toward it. Formally, in this work, the polarity value is defined as \(r_{v_i} = 1 - \eta ^{DS}_{v_i}/\eta ^{all}_{v_i}\), where \(\eta ^{DS}_{v_i} = \left| \mathcal {N}^{+}_{v_i}\right| \times \left| \mathcal {N}^{-}_{v_i}\right|\) is the product of the numbers of positive and negative neighbors of \(v_i\), and \(\eta ^{all}_{v_i}\) is the number of all possible neighbor pairs of \(v_i\). We also normalize the polarity value as \(\left| \tau _{v_i}\right| = \frac{r_{v_i} - min(r_{\mathcal {V}})}{max(r_{\mathcal {V}}) - min(r_{\mathcal {V}})}\).

Definition 4

(Balance Theory) Balance theory states that a cycle in a signed network is balanced if it contains an even number of negative links, which is typically summarized as "a friend of my friend is my friend" and "an enemy of my friend is my enemy".

Typically, one-mode projection is performed based on the number of common neighbors of two nodes of the same type, as illustrated in the following example.

Example 1

One-mode projection. Figure 2 shows a one-mode projection of the signed bipartite graph in Figure 1. In this figure, both \(s_1\) and \(s_3\) are connected to \(t_1\), \(t_2\), and \(t_3\) with positively signed edges, so \(s_1\) and \(s_3\) have 3 common neighbors connected in the same way. With a threshold of 3 for deciding whether a connection should be created, \(s_1\) and \(s_3\) are connected. Similarly, \(s_2\) and \(s_4\) are connected to both \(t_1\) and \(t_3\) with positive edges and to \(t_2\) with negative edges, so \(s_2\) and \(s_4\) are also connected. \(t_1\) and \(t_3\) have 4 common neighbors with the same sign and are connected as well. However, if the number of common neighbors with the same sign is less than 3, as for \(s_1\) and \(s_2\), no edge is established.
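To make this counting concrete, the following minimal Python sketch (with helper and variable names of our own; not taken from the paper) reproduces the threshold-based projection on the source-side nodes of Figures 1 and 2:

```python
from itertools import combinations

def common_sign_projection(edges, nodes, threshold=3):
    """Conventional one-mode projection: connect two same-type nodes when
    they share at least `threshold` neighbors reached via identically
    signed edges. `edges` maps a node to {neighbor: sign}, sign in {+1, -1}."""
    projected = []
    for u, v in combinations(nodes, 2):
        shared = set(edges[u]) & set(edges[v])
        same_sign = sum(1 for t in shared if edges[u][t] == edges[v][t])
        if same_sign >= threshold:
            projected.append((u, v))
    return projected

# Source-side nodes of Figure 1: s1/s3 share t1, t2, t3 with matching signs,
# s2/s4 match on all three as well, while s1/s2 match on only two.
edges = {
    "s1": {"t1": 1, "t2": 1, "t3": 1},
    "s2": {"t1": 1, "t2": -1, "t3": 1},
    "s3": {"t1": 1, "t2": 1, "t3": 1},
    "s4": {"t1": 1, "t2": -1, "t3": 1},
}
print(common_sign_projection(edges, ["s1", "s2", "s3", "s4"]))
# [('s1', 's3'), ('s2', 's4')]
```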

Table 1 Notation Table

3 Model

In this section, we introduce the details of our proposed Polarity-based Graph Convolutional Network (PbGCN). In Section 3.1, we give an overall introduction to the framework of PbGCN. Section 3.2 describes how the polarity attribute is computed in our model, based on which we perform the one-mode projection to establish links between nodes of the same type. Section 3.3 introduces the graph convolution operations utilized in our model, which aggregate information along the edges of the graph after one-mode projection. Section 3.4 gives the learning objectives used to optimize our model. Finally, the balance theory-based sign prediction is introduced in Section 3.5.

Fig. 3: The framework of the proposed PbGCN

3.1 Framework

Our proposed PbGCN aims to preserve the universal relationship between neighboring nodes and thus achieve higher accuracy in the sign prediction task for signed bipartite graphs compared to the state-of-the-art heuristic methods. PbGCN is a GNN and balance theory-based model that combines the advantages of both. The framework of PbGCN is shown in Figure 3. We first extract the initial structural information of the graph, which is used to summarize the polarity information and to add signed weighted edges between same-type nodes for one-mode projection. The added edges are assembled into positive and negative adjacency matrices, respectively. Afterwards, a graph neural network is applied to obtain the predicted polarity \(\tau\), which is then combined with the result \({\varvec{Y}}^b\) obtained from balance theory for the downstream prediction task.

3.2 Polarity attribute

Firstly, we need an initial polarity value \(\tau _{init}\) derived from the structure of the given graph to perform the one-mode projection and guide the training process. The core idea is to capture the distribution of edge signs between each node and its neighbors. For a node \(v_i \in \mathcal {V}\) in the bipartite graph G, we compute the number of neighboring node pairs connected by edges of different signs as:

$$\begin{aligned} \eta ^{DS}_{v_i} = \left| \mathcal {N}^{+}_{v_i}\right| \times \left| \mathcal {N}^{-}_{v_i}\right| , \end{aligned}$$
(1)

where \(\left| \mathcal {N}^{+}_{v_i}\right|\) and \(\left| \mathcal {N}^{-}_{v_i}\right|\) denote the numbers of positive and negative neighbors of \(v_i\), respectively. For instance, in Figure 1, Jack (\(s_2\)) has two positive links and one negative link; thus \(\left| \mathcal {N}^{+}_{s_2}\right| = 2\) and \(\left| \mathcal {N}^{-}_{s_2}\right| = 1\). We can also compute the number of all possible pairs among the neighbors of \(v_i\) using the following equation:

$$\begin{aligned} \eta ^{all}_{v_i} = \frac{\left| \mathcal {N}_{v_i}\right| !}{2 \times (\left| \mathcal {N}_{v_i}\right| - 2)!}, \end{aligned}$$
(2)

where \(\left| \mathcal {N}_{v_i}\right|\) denotes the total number of \({v_i}\)'s neighbors. Having obtained the number of possible pairs, we can compute the dissimilar connection rate from the fraction between the number of differently signed neighbor pairs and the number of all possible pairs: \(r_{v_i} = 1 - \eta ^{DS}_{v_i}/\eta ^{all}_{v_i}\). It is easy to see that, with the total number of neighbors \(\left| \mathcal {N}_{v_i}\right|\) fixed, \(r_{v_i}\) becomes smaller as the edge signs are more polarized, i.e., as \(\left| \mathcal {N}^{+}_{v_i}\right|\) gets closer to \(\left| \mathcal {N}^{-}_{v_i}\right|\).
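For instance, continuing the example of Jack (\(s_2\)) above: \(\eta ^{DS}_{s_2} = 2 \times 1 = 2\) and \(\eta ^{all}_{s_2} = \frac{3!}{2 \times 1!} = 3\), so \(r_{s_2} = 1 - 2/3 = 1/3\), a relatively small value reflecting the mixed edge signs around \(s_2\).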

We normalize the dissimilar rate by:

$$\begin{aligned} \left| \tau _{v_i}\right| = \frac{r_{v_i} - min(r_{\mathcal {V}})}{max(r_{\mathcal {V}}) - min(r_{\mathcal {V}})}, \end{aligned}$$
(3)

where \(r_{\mathcal {V}}\) is the set of rates of all nodes \(\mathcal {V}\) and \(\left| \tau _{v_i}\right|\) is the normalized value. The sign of \(\tau _{v_i}\) is determined by the numbers of positive and negative neighbors of \(v_i\), i.e., \(\tau _{v_i} > 0\) if \(\left| \mathcal {N}^{+}_{v_i}\right| > \left| \mathcal {N}^{-}_{v_i}\right|\) and vice versa. We denote this initial polarity value \(\tau\) as \(\tau _{init}\) in the rest of this paper. \(\tau _{init}\) comes directly from the neighbor information and thus captures abundant structural information. However, \(\tau _{init}\) is hard to utilize as a training objective or to use directly for sign prediction. Therefore, we convert \(\tau _{v_i}\) to a label \(y_i\) by thresholding. The nodes in the given signed bipartite graph are categorized into three classes according to their polarity values: positive tendency nodes and negative tendency nodes, which are more likely to have positive or negative links respectively, and polarized nodes, which have no obvious tendency to form positive or negative links. These labels provide a qualitative training target for the GNN, and as a result, our model captures the graph information on a learning basis rather than simply using the values from heuristic statistics.
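The computation of \(\tau _{init}\) and the label conversion can be sketched as follows. This is a minimal numpy sketch under our own assumptions: the paper does not state the exact class thresholds, so the cut-off `band` below is hypothetical.

```python
import numpy as np

def initial_polarity(pos_deg, neg_deg):
    """tau_init per node from positive/negative degrees (Eqs. (1)-(3))."""
    deg = pos_deg + neg_deg
    eta_ds = pos_deg * neg_deg              # Eq. (1): differently signed pairs
    eta_all = deg * (deg - 1) / 2.0         # Eq. (2): all pairs, C(deg, 2)
    r = 1.0 - np.divide(eta_ds, eta_all,
                        out=np.zeros_like(eta_all, dtype=float),
                        where=eta_all > 0)  # deg < 2 => no pair => r = 1
    # Eq. (3): min-max normalization, sign taken from the majority edge sign
    r_norm = (r - r.min()) / (r.max() - r.min())
    return np.where(pos_deg >= neg_deg, r_norm, -r_norm)

def polarity_labels(tau, band=0.3):
    """3-class labels: 0 = negative tendency, 1 = polarized, 2 = positive
    tendency. `band` is a hypothetical cut-off, not taken from the paper."""
    return np.where(np.abs(tau) < band, 1, np.where(tau > 0, 2, 0))
```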

Since bipartite graphs have no edges between nodes of the same type, in order to allow neighbor aggregation among same-type nodes, we perform one-mode projection to add edges between them according to their polarity values. The main idea for creating such edges is to consider the closeness of \(\tau _{v_i}\) and \(\tau _{v_j}\) for same-type nodes \(v_i, v_j\). When their gap is less than a threshold \(\gamma\), we create a weighted edge \({\varvec{M}}^\tau _{ij} \in (0,1]\) between them:

$$\begin{aligned} {\varvec{M}}^\tau _{ij} = 1-\left( \left| \tau _{v_i}-\tau _{v_j}\right| \times \frac{1}{\gamma }\right) . \end{aligned}$$
(4)

This weight is determined by the similarity of the nodes' polarity values \(\tau _{init}\): the smaller the gap, the greater the weight. After the pair-wise comparison of \(\tau _{init}\) between nodes in the same partition, we obtain the one-mode projection (OMP) graphs \({\varvec{M}}^\tau _S\) and \({\varvec{M}}^\tau _T\) for node sets S and T, respectively. We then form an adjacency matrix that involves all nodes in the graph as \({\varvec{M}}^{\tau } = \left[ \begin{array}{cc}{\varvec{M}}^\tau _S &{} {\varvec{0}}\\ {\varvec{0}} &{} {\varvec{M}}^\tau _T \end{array}\right]\) for the aggregation in the graph neural network. The procedure is summarized in Algorithm 1, where line 2 computes the number of positive edges, the number of negative edges, and the total number of edges connected to node v, and lines 17 to 19 avoid self-loops. A sketch of this procedure follows the algorithm.

Algorithm 1: One-mode projection based on polarity values
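Since the listing of Algorithm 1 is not reproduced here, the following Python sketch captures its core steps under our own assumptions (dense matrices, \(\gamma = 0.1\) as the default suggested by Section 4.5; the function names are ours):

```python
import numpy as np

def polarity_projection(tau, gamma=0.1):
    """One-mode projection within a single partition: connect same-type nodes
    whose polarity gap is below gamma, weighted as in Eq. (4); the diagonal
    is zeroed to avoid self-loops (cf. lines 17-19 of Algorithm 1)."""
    gap = np.abs(tau[:, None] - tau[None, :])          # pairwise |tau_i - tau_j|
    M = np.where(gap < gamma, 1.0 - gap / gamma, 0.0)  # Eq. (4)
    np.fill_diagonal(M, 0.0)
    return M

def full_projection(tau_S, tau_T, gamma=0.1):
    """Block-diagonal matrix M^tau over both partitions S and T."""
    M_S = polarity_projection(np.asarray(tau_S), gamma)
    M_T = polarity_projection(np.asarray(tau_T), gamma)
    top = np.hstack([M_S, np.zeros((len(tau_S), len(tau_T)))])
    bot = np.hstack([np.zeros((len(tau_T), len(tau_S))), M_T])
    return np.vstack([top, bot])
```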

3.3 Graph convolutional layers

In PbGCN, we use the graph convolutional network (GCN) to perform neighbor aggregation. The overall idea of GCN aggregation is to use the adjacency matrix to select the neighboring nodes of each node, then weight and sum the feature vectors of these neighbors. An activation function is then applied to update the feature vectors, so that similar nodes obtain similar feature vectors. Besides, in order to retain the feature vector of each original node, an identity matrix is added to the adjacency matrix as self-loops. And to ensure that all nodes aggregate the feature vectors of their neighbors at the same scale, a normalization operation is applied to the adjacency matrix. As a result, each layer in the original GCN algorithm can be formulated as:

$$\begin{aligned} {\varvec{H}}^{(l+1)} = \sigma (\widetilde{\varvec{D}}^{-\frac{1}{2}}\widetilde{\varvec{A}}\widetilde{\varvec{D}}^{-\frac{1}{2}}{\varvec{H}}^{(l)}{\varvec{W}}^{(l)}), \end{aligned}$$
(5)

where \({\varvec{H}}^{(l)}\) is the hidden feature matrix at the \(l^{th}\) layer, \(\widetilde{\varvec{A}}\) is the adjacency matrix \({\varvec{A}}\) with self-loops, i.e., \(\widetilde{\varvec{A}} = {\varvec{A}} + {\varvec{I}}_N\), \(\widetilde{\varvec{D}}\) is the diagonal degree matrix of \(\widetilde{\varvec{A}}\) with \(\widetilde{\varvec{D}}_{ii} = \sum _j\widetilde{\varvec{A}}_{ij}\), and \({\varvec{W}}^{(l)}\) is the trainable weight matrix of the \(l^{th}\) layer. Clearly, the original GCN will suffer from poor performance if directly applied to a signed bipartite graph. There are two major challenges in applying graph convolutional networks to signed bipartite graphs: first, it is difficult to aggregate the feature vectors of nodes of the same type; second, the primitive aggregation approach cannot distinguish the sign information of the edges.

To solve the first problem, we have obtained \(\tau _{init}\) in the above steps and used it to build the one-mode projection graph \({\varvec{M}}^\tau\), which connects nodes of the same type with potentially similar properties and provides edge weights according to the polarity values to ensure better aggregation results. To address the second problem, we divide the original signed adjacency matrix into a positive-edge adjacency matrix and a negative-edge adjacency matrix. We design the graph neural network on these adjacency matrices separately so that the information of positive and negative edges can each be fully utilized. The alternatives, i.e., ignoring the sign information entirely or directly using the original adjacency matrix with 1 and \(-1\) weights, cannot fully account for the sign information during aggregation and lead to unsatisfactory results. The aggregation process can be summarized in the following equations:

$$\begin{aligned} {\varvec{H}}_{p}^{(l+1)}= & {} \sigma \big ((\widetilde{\varvec{D}}^{\tau }_{p})^{-\frac{1}{2}}\widetilde{\varvec{M}}^\tau _{p}(\widetilde{\varvec{D}}^{\tau }_{p})^{-\frac{1}{2}}{\varvec{H}}_p^{(l)}{\varvec{W}}_p^{(l)}\big )\end{aligned}$$
(6)
$$\begin{aligned} {\varvec{H}}_{n}^{(l+1)}= & {} \sigma \big ((\widetilde{\varvec{D}}^{\tau }_{n})^{-\frac{1}{2}}\widetilde{\varvec{M}}^\tau _{n}(\widetilde{\varvec{D}}^{\tau }_{n})^{-\frac{1}{2}}{\varvec{H}}_n^{(l)}{\varvec{W}}_n^{(l)}\big )\end{aligned}$$
(7)
$$\begin{aligned} {\widehat{\varvec{Y}}}= & {} softmax({\varvec{H}}^{(L)}_p + {\varvec{H}}^{(L)}_n), \end{aligned}$$
(8)

where \(\widetilde{\varvec{M}}^{\tau }_p = \left[ \begin{array}{cc}{\varvec{M}}^\tau _S &{} {\varvec{M}}_p\\ {\varvec{M}}^\mathsf {T}_p &{} {\varvec{M}}^\tau _T \end{array}\right] + {\varvec{I}}_M\), \(\widetilde{\varvec{M}}^{\tau }_n = \left[ \begin{array}{cc}{\varvec{M}}^\tau _S &{} {\varvec{M}}_n\\ {\varvec{M}}^\mathsf {T}_n &{} {\varvec{M}}^\tau _T \end{array}\right] + {\varvec{I}}_M\), and \(\widetilde{\varvec{D}}^\tau _{p,ii} = \sum _j \widetilde{\varvec{M}}^\tau _{p,ij}\), \(\widetilde{\varvec{D}}^\tau _{n,ii} = \sum _j \widetilde{\varvec{M}}^\tau _{n,ij}\); \(\sigma\) denotes the activation function, \({\varvec{H}}_p^{(0)}\) and \({\varvec{H}}_n^{(0)}\) are the input features of the graph, and \({\widehat{\varvec{Y}}}\in (0, 1)^{\left| \mathcal {V}\right| \times 3}\) is the output tensor for the label prediction. It is worth noting that the softmax function is applied to the sum of the two sets of feature vectors at layer L to obtain \({\widehat{\varvec{Y}}}\), while the activation function is applied when aggregating the feature vectors in the other layers.
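The following is a minimal numpy sketch of this two-branch propagation, under our own assumptions: dense matrices, ReLU as \(\sigma\), and inputs `M_tau_p`, `M_tau_n` being the block matrices of Eqs. (6)-(7) without self-loops, which are added inside.

```python
import numpy as np

def sym_normalize(M):
    """D^{-1/2} (M + I) D^{-1/2}: add self-loops, then symmetric normalization."""
    M_tilde = M + np.eye(M.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(M_tilde.sum(axis=1))
    return M_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def pbgcn_forward(M_tau_p, M_tau_n, X, Ws_p, Ws_n):
    """Two-branch propagation of Eqs. (6)-(8): one GCN branch per edge sign,
    summed and passed through softmax at the last layer. Ws_p / Ws_n are
    lists of per-layer weight matrices for the positive/negative branch."""
    A_p, A_n = sym_normalize(M_tau_p), sym_normalize(M_tau_n)
    H_p = H_n = X
    for l, (W_p, W_n) in enumerate(zip(Ws_p, Ws_n)):
        H_p, H_n = A_p @ H_p @ W_p, A_n @ H_n @ W_n
        if l < len(Ws_p) - 1:                      # sigma on hidden layers only
            H_p, H_n = np.maximum(H_p, 0), np.maximum(H_n, 0)
    Z = H_p + H_n                                  # Eq. (8): sum the branches
    Z = np.exp(Z - Z.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)        # row-wise softmax
```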

3.4 Learning objectives

With the obtained label prediction, we need a loss function as the training target of the model. As mentioned in Section 3.2, we generate a label \(\varvec{y}\) for each node describing its probability of forming a new positive/negative link. Based on this, we adopt the cross-entropy loss as our learning objective:

$$\begin{aligned} \mathcal {L} = -\sum _{v_i \in \mathcal {V}_{train}}\sum ^{K}_{k=1}{\varvec{y}}_{k}(v_i)\log ({\widehat{\varvec{y}}}_{k}(v_i)), \end{aligned}$$
(9)

where \(\mathcal {V}_{train}\) is the node set involved in the training edge set, and K is the number of labels, which in our case is \(K = 3\). The prediction result \(\widehat{\varvec{Y}}\) is then mapped to the polarity attribute vector \(\tau\) for all nodes by \(f(\cdot ): \mathbb {R}^{\left| \mathcal {V}\right| \times 3} \rightarrow \mathbb {R}^{\left| \mathcal {V}\right| \times 1}\):

$$\begin{aligned} f(\widehat{\varvec{Y}}) = \frac{\widehat{\varvec{Y}}_{pos}-\widehat{\varvec{Y}}_{neg}}{\widehat{\varvec{Y}}_{pos}+ \widehat{\varvec{Y}}_{pol} + \widehat{\varvec{Y}}_{neg}}, \end{aligned}$$
(10)

where \(\widehat{\varvec{Y}}_{pos}, \widehat{\varvec{Y}}_{pol}, \widehat{\varvec{Y}}_{neg}\) represent the predicted probabilities of a node being a positive tendency node, a polarized node, and a negative tendency node, respectively.

The mapping function f assigns a predicted polarity value close to 1 to nodes that are more likely to form positive edges with similar neighboring nodes and a value close to \(-1\) to nodes more likely to form negative edges. We then use the computed polarity values to refine the balance theory-based prediction.
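A compact sketch of the loss of Eq. (9) and the mapping f of Eq. (10) follows; the column order of \(\widehat{\varvec{Y}}\) is our assumption for illustration.

```python
import numpy as np

def cross_entropy_loss(Y_hat, Y, train_nodes):
    """Eq. (9): cross-entropy over the nodes touched by training edges.
    Y holds one-hot label rows; Y_hat holds softmax outputs."""
    return -np.sum(Y[train_nodes] * np.log(Y_hat[train_nodes] + 1e-12))

def to_polarity(Y_hat):
    """Eq. (10): collapse the 3-way prediction to a scalar tau in (-1, 1).
    Assumed column order: [negative tendency, polarized, positive tendency]."""
    neg, pol, pos = Y_hat[:, 0], Y_hat[:, 1], Y_hat[:, 2]
    return (pos - neg) / (pos + pol + neg)
```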

3.5 Balance theory-based sign prediction

Inspired by the application of balance theory on signed bipartite graphs [6], we design a heuristic prediction process based on balance theory. According to balance theory, we add signed edges between two nodes of the same type with the following equation:

$$\begin{aligned} {\varvec{P}}_{ij} = \left\{ \begin{aligned} 0, \qquad&\delta _n< \eta _{ij}^A - \eta _{ij}^D < \delta _p \\ \eta _{ij}^A - \eta _{ij}^D, \qquad&\text {otherwise} \end{aligned}\right. \end{aligned}$$
(11)

\(\varvec{P}\) is an adjacency matrix storing the edges between same-type nodes. To create these edges, we count the number of common neighbors \(\eta _{ij}^A\) linked to both \(v_i\) and \(v_j\) with the same edge sign and the number of common neighbors \(\eta _{ij}^D\) linked to both with different edge signs. When \(\eta _{ij}^A - \eta _{ij}^D\) is greater or less than a certain threshold, \(v_i\) and \(v_j\) are connected by an edge with weight \(\eta _{ij}^A - \eta _{ij}^D\). After adding edges between same-type nodes and combining them with the weighted adjacency matrix \({\varvec{B}} \in \mathbb {R}^{|S| \times |T|}\) of the original graph, the adjacency matrix \({\varvec{M}}^b \in \mathbb {R}^{\left| \mathcal {V}\right| \times \left| \mathcal {V}\right| }\) used for the random walk can be constructed:

$$\begin{aligned} {\varvec{M}}^{b} = \left[ \begin{array}{cc}\widehat{\varvec{P}}_S &{} \lambda \widehat{\varvec{B}} \\ \lambda \widehat{\varvec{B}}^\mathsf {T} &{} \widehat{\varvec{P}}_T \end{array}\right] , \end{aligned}$$
(12)

where \({\varvec{M}}^{b}\) is the matrix constructed according to balance theory, \(\widehat{\varvec{B}}\) is the row-normalized bi-adjacency matrix, i.e., \(\widehat{\varvec{B}}_{ij} = {\varvec{B}}_{ij} / \sum _k|{\varvec{B}}_{ik}|\) where \({\varvec{B}} \in \mathbb {R}^{|S| \times |T|}\) is the adjacency matrix of the signed bipartite graph, and \(\lambda =2\) is the weight coefficient for \(\widehat{\varvec{B}}\); we likewise normalize the matrices \({\varvec{P}}\) and \({\varvec{M}}^{b}\) to obtain \(\widehat{\varvec{P}}\) and \(\widehat{\varvec{M}}^{b}\). According to balance theory, the relationship among any three nodes is consistent if and only if the number of negative edges among them is even. We then perform a two-step random walk to capture more information, and the signs of the edges that connect the starting and ending nodes are determined by the signs of the edges visited along the random walk paths. As a result, the prediction matrix \({\varvec{Y}}^b\) under balance theory can be obtained by performing random walk on the normalized adjacency matrix \(\widehat{\varvec{M}}^{b}\):

$$\begin{aligned} \varvec{Y}^b = (1-c)({\varvec{I}}-c\widehat{\varvec{M}}^{b})^{-1}, \end{aligned}$$
(13)

where \((1-c)\) is the random walk restart probability.

After obtaining the prediction matrix \({\varvec{Y}}^b\) from balance theory, we add the weighted \({\varvec{\tau }}\) to it before making the prediction. To enable \({\varvec{\tau }}\) to be added directly to the matrix \({\varvec{Y}}^b\), we first expand \({\varvec{\tau }}\), i.e., we copy \({\varvec{\tau }} \in \mathbb {R}^{ \left| \mathcal {V}\right| }\) \(\left| \mathcal {V}\right|\) times to obtain \({\varvec{\tau }}_{expand} \in \mathbb {R}^{\left| \mathcal {V}\right| \times \left| \mathcal {V}\right| }\). To further improve the prediction results with the help of \(\varvec{\tau }\), in practice we first compute the mean value \(\overline{y}\) of \({\varvec{Y}}^b\) and then multiply \(\overline{y}\) with \({\varvec{\tau }}_{expand}\). On this basis, the optimized prediction matrix \(\varvec{Y}_\tau\) is obtained as:

$$\begin{aligned} {\varvec{Y}_\tau } = {\varvec{Y}}^b + \omega \cdot \overline{y} \cdot {\varvec{\tau }}_{expand}, \end{aligned}$$
(14)

where \(\omega\) is an adjustable weight. Finally, we use the sign of the value at the corresponding position in \(\varvec{Y}_\tau\) as the sign prediction for the given edge.
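Putting Eqs. (13) and (14) together, the following is a minimal sketch of the balance theory-based prediction and its polarity refinement, with \(1-c = 0.15\) and \(\omega = 0.15\) as in Section 4; the orientation of the \(\tau\) tiling is our assumption, since the paper only specifies the copying.

```python
import numpy as np

def balance_rwr(M_b_hat, c=0.85):
    """Eq. (13): closed-form random walk with restart on the normalized
    balance-theory matrix; restart probability 1 - c = 0.15."""
    n = M_b_hat.shape[0]
    return (1.0 - c) * np.linalg.inv(np.eye(n) - c * M_b_hat)

def predict_signs(Y_b, tau, omega=0.15):
    """Eq. (14) plus the final readout: blend the balance-theory scores with
    the expanded polarity vector, then read off the sign per node pair."""
    tau_expand = np.tile(tau.reshape(1, -1), (len(tau), 1))  # |V| row copies
    Y_tau = Y_b + omega * Y_b.mean() * tau_expand            # Eq. (14)
    return np.sign(Y_tau)
```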

4 Experiment

In this section, we empirically evaluate the effectiveness of the polarity value \(\tau\) and the GNN in improving sign prediction performance. We attempt to answer the following two research questions:

  • Q1. Does the utilization of the polarity attribute and graph neural networks lead to improved prediction results?

  • Q2. How much does the proposed method improve over the baselines?

To answer the above questions, experiments are conducted on three real-world and two synthetic signed bipartite graphs to evaluate the sign prediction performance of the proposed method against the baselines. To further investigate the proposed method, we also perform a parameter sensitivity analysis on PbGCN. The datasets and baselines are listed in Section 4.1, the experimental settings are introduced in Section 4.2, the result comparisons are shown in Section 4.3, the ablation study in Section 4.4, and the parameter analysis results in Section 4.5.

4.1 Datasets and baselines

The experiments are conducted on the following real-life and synthetic datasets:

  • Senate and House [6] are directed graphs that contain the roll call vote records from the 1st to the 10th United States Senate and House of Representatives.

  • Gene is an undirected graph that describes the regulatory effects of transcription factors on the regulated genes.

  • Synth1 and Synth2 are synthetic datasets that we generate by randomly adding positive and negative edges between two randomly separated node sets.

The statistics of these datasets are summarized in Table 2.

Table 2 Statistics of Datasets

Two sets of synthetic data are generated from two undirected bipartite graphs: Robertson (1929) and YouTube Group Memberships. We randomly add positive and negative labels to the edges of these two bipartite graphs to generate signed bipartite graphs, where the probability of an edge obtaining a positive label is 35%. It should be noted that, since the YouTube dataset is too large, we only take its first 230,000 edges to generate the dataset. The number of edges added by the one-mode projection is usually under five times the number of original edges in the graph.

We follow the settings in [6] and randomly split 85%, 5%, and 10% of the edge labels in the original datasets for training, validation, and testing, respectively. More specifically, during the training process only the training set is accessible, and when evaluating the sign prediction performance we compare the ground truth signs of the edges in the test set with the signs predicted by the models.

We compare the proposed framework with the following baselines:

  • SBRW [6] is a sign prediction algorithm on signed bipartite graphs that performs random walks on an adjacency matrix constructed according to balance theory.

  • SBRW+GCN and SBRW+GAT are optimized models that add the predictions learned by directly applying GCN [16] or GAT [27] on the original signed bipartite graph to the prediction value of SBRW. Please note that these baselines do not consider the sign information.

  • SBRW\(+\tau _{init}\) is the model that utilizes \(\tau _{init}\) to boost the prediction performance of SBRW by simply adding \(\tau _{init}\) to its results.

4.2 Experimental settings

We design the polarity-based graph neural network for edge sign prediction on signed bipartite graphs to improve the prediction accuracy over conventional balance theory-based methods. In our proposed model, we use a two-layer GCN, choose Adam as the optimizer with learning rate \(lr=0.01\), and use randomly generated features as the input of the GCN. The input feature dimension is 1000, the hidden representation dimension is 64, and the final output dimension for prediction is 3. The random walk restart probability \(1-c\) is set to 0.15. Since the sign prediction problem can be formulated as a binary classification task, we select the widely used Area Under the Curve (AUC) and F1-score as the evaluation metrics. The bold entries in Tables 3 and 4 indicate the best results.

Fig. 4: Parameter analysis results of PbGCN on the Gene dataset

Table 3 Experiment results for sign prediction

4.3 Sign prediction results

The experiment results are illustrated in Table 3. Our proposed PbGCN outperforms all baseline methods across all datasets. Compared to the state-of-the-art sign prediction model SBRW, PbGCN improves AUC and F1-score by up to \(6.54\%\) and \(16.71\%\), respectively. It is worth mentioning that, when SBRW encounters a result from which no decision can be drawn, it guesses the sign based on the proportion of positive and negative edges in the dataset. Our proposed PbGCN avoids this situation and thus achieves better performance. The improvement over SBRW+GCN and SBRW+GAT indicates that the edges constructed from the polarity attribute, together with the separate treatment of positive and negative links, consistently enhance the prediction accuracy. These results provide a positive answer to our first research question: the utilization of the polarity value and GNNs indeed yields an empirically verifiable improvement.

Moreover, PbGCN performs better than SBRW+\(\tau _{init}\). The reason is that \(\tau _{init}\) only contains a node's own attribute rather than its neighbor information. Therefore, PbGCN can make a more accurate prediction by aggregating neighbor information in the signed bipartite graph augmented with \(\tau _{init}\)-based edges.

Overall, PbGCN achieves superior performance because of the combination of the polarity attribute, one-mode projection, and a GNN that aggregates neighbor information while considering both positive and negative links.

Table 4 Experiment results for ablation study

4.4 Ablation study

We also perform an ablation study to show the effectiveness of our model. We replace the graph neural network in our model with a basic neural network (i.e., a multilayer perceptron, MLP), denoted as NN-Baseline. The results are illustrated in Table 4. On most datasets, PbGCN outperforms the NN-Baseline, which indicates the effectiveness of the design of PbGCN.

4.5 Parameter analysis

We conduct parameter analysis experiments on the following parameters: the weight coefficient \(\omega\) of the learned \(\tau\), the threshold \(\gamma\), the number of GCN layers L, and the learning rate lr. All parameter analysis experiments are performed on the Gene dataset, and the results are shown in Figure 4.

For PbGCN, the most important parameter is the weight \(\omega\) of \(\tau\). We test seven \(\omega\) values between 0 and 0.3 with an interval of 0.05. The results are shown in Figure 4(a) for AUC and Figure 4(b) for F1. When \(\omega\) is set to 0, the prediction is made entirely by the heuristic method based on balance theory. Both the best AUC value and the best F1-score are achieved when the weight is 0.15, and the performance of PbGCN worsens as \(\omega\) increases beyond this point. Therefore, for the Gene dataset, \(\omega\) is set to around 0.15.

In addition, the threshold \(\gamma\) that determines whether an edge is established between two nodes of the same type is also an important parameter. We test six thresholds \(\{0.05, 0.1, 0.15, 0.2, 0.25, 0.3\}\); the results are shown in Figure 4(c) for AUC and Figure 4(d) for F1. The results show that the F1-score of PbGCN is robust with respect to the varying threshold \(\gamma\), while the best AUC appears at 0.1. As a result, for the Gene dataset, we choose 0.1 as the threshold for deciding whether an edge should be added between same-type nodes.

To test the impact of the number of layers L in the graph convolutional network, we evaluate our model with 1 to 5 GCN layers. The results are illustrated in Figure 4(e) and (f). Interestingly, unlike the common observation that deeper graph neural networks perform worse, the overall impact of the number of GCN layers L is insignificant in our case. Consequently, we use a GCN with a depth of 2 for higher efficiency while maintaining effectiveness.

Finally, we test the learning rate lr with five values {\(1 \times 10^{-4}, 1 \times 10^{-3}, 5 \times 10^{-3}, 0.01, 0.1\)}, with results in Figure 4(g) and (h). While demonstrating the robustness of the proposed PbGCN, the results also indicate that the best learning rate is 0.01 on the Gene dataset. Taking the experimental results on the other datasets into consideration, we set the default learning rate of our model to 0.01.

5 Conclusion

Signed bipartite graphs are becoming increasingly ubiquitous in real-life applications, yet little research has been conducted on them due to the complexities brought by the signed links and the bipartite setting. In this paper, we propose the first graph neural network on signed bipartite graphs for the sign prediction task. The proposed PbGCN first introduces the polarity attribute to signed bipartite graphs to describe the tendency of a node to establish positive or negative links with others. Based on these polarity values, we construct one-mode projection graphs that enable the developed graph convolutional network to perform neighbor aggregation more effectively on signed bipartite graphs. By integrating the learning-based and balance theory-based predictions, PbGCN boosts the sign prediction accuracy significantly. Comprehensive experiments on three real-life and two synthetic datasets demonstrate the notable improvement of PbGCN over the state-of-the-art methods.