Introduction

Network Representation Learning (NRL) is a useful tool that preserves the structural features of a network in a low-dimensional space. NRL [1] helps people better understand the semantic relationships between vertices and the topology of interactions among network components, and it has proven effective in a variety of network processing and analysis tasks such as node classification [2], recommendation systems [3], and network visualization [4]. NRL learns dense structural characteristics and represents them in a low-dimensional space, effectively denoising or removing redundant information while preserving the essential structure. In real-world networks, edges usually have directionality, and nodes often carry rich information. For example, web pages on the Internet are connected to each other, forming a graph in which web pages serve as vertices. These webpages typically contain rich text information, and tasks such as web-page information retrieval often involve high-dimensional and directed data. While web-page information retrieval has been addressed by ranking algorithms such as PageRank [5, 6], vertex classification and clustering remain challenging in directed networks. Zhou [7] proposed a semi-supervised learning algorithm for classification on directed graphs, but it focuses only on digraph partitioning. Chen et al. [8] proposed a random-walk-based vertex embedding algorithm for directed networks, but that work considers only the directional feature of the directed network representation. Many network representation learning methods exist for undirected networks, but edge directionality and rich node information have been less explored in directed NRL. Therefore, a natural idea is to learn directed network representations by considering both the network direction and the text information. The edge-direction representation is derived from the directed network structure, while the text information of vertices provides additional information for directed NRL. Nevertheless, the directional features of the network and the rich information are typically treated independently and concatenated as separate representations. In this work, we propose an approach that jointly leverages the directional features and the text attributes of the digraph. Our algorithm combines the in-degree-based Laplacian and signless Laplacian with vertex text information through inductive matrix completion (IMC) [9]. One main contribution is the matrix definition of the in-degree-based Laplacian and signless Laplacian for digraphs. In our experiments, we compare against baselines on four open directed network datasets. The classification [10] accuracy on these datasets outperforms the baseline by up to 20% when the training ratio ranges from 10 to 90%, with the largest advantage observed when the training ratio is 10%.

Related work

Related method

Representation learning methods for homogeneous information networks can be divided into three types: those that consider only vertices, those that consider only edges, and those that synthesize information from both edges and vertices, supplemented by external descriptors such as labels and text. For NRL based primarily on the network structure, computational complexity escalates when computing vertex adjacency matrices. Some researchers proposed the Laplacian matrix method [11], a spectral decomposition method that uses k-nearest neighbor kernel functions to learn highly similar vertex representations. Inspired by the Word2vec algorithm [12] in natural language processing, Perozzi et al. [13] proposed the DeepWalk algorithm. This algorithm performs random walks over the nodes of the network, generating ordered node sequences; the closer two nodes are, the higher their co-occurrence probability. The skip-gram model is then used to predict the context to the left and right of each node in the sequence, and each node is represented by a learned vector. Node2vec [14] improved upon DeepWalk by introducing breadth-first and depth-first search into the random-walk sequence generation process through two parameters p and q. The LINE [15] algorithm preserves the first-order node similarity and the second-order neighbor similarity, and can effectively preserve both the local and global network structure. The SDNE algorithm [16] uses a semi-supervised deep neural network to model multiple nonlinear mappings for NRL. It retains local information from the Laplacian matrix and global information based on unsupervised deep autoencoder learning.

NRL methods that combine edges and vertices with external information include the work by Tu et al. [17], who proposed the semi-supervised model MMDW, which is based on matrix decomposition and can learn node representations containing both network structure and vertex information. TADW [18] is another matrix-decomposition-based representation learning method that incorporates the text information of vertices. However, algorithms such as TADW and MMDW have a high computational cost, and they simply combine the node attributes, leading to a loss of semantic information in the network. Sun et al. [19] proposed the CENE algorithm, which uses a logistic regression function to learn an extended network and optimizes the objective function using negative sampling. This algorithm captures both the network structure and the semantic information between nodes and content. Pan et al. [20] proposed the deep learning model TriDNR, which combines three types of network information, namely network structure, node content, and node labels. The random walk of the TriDNR model retains the structural similarity between vertices, and a neural network is used to learn the correlations between node contexts. The node labels are also used as input to learn label vectors and word vectors, but the model is based on an undirected network.

Currently, there is relatively little research on representation learning methods based on the in-degree Laplacian matrix of directed networks.

Text-enhanced method

In the undirected network context, DeepWalk [13] generates ordered sequences of vertices by random walks over all vertices. Given an undirected graph with vertex set V and edge set E, the DeepWalk objective function is as follows,

$$\begin{aligned} O(V)=\frac{1}{|V|}\sum _{i=1}^{|V|}\sum _{i-t\le j\le i+t,j\ne i}\log Pr(v_j|v_i). \end{aligned}$$
(1)

where V is the set of vertices sampled by the random walk, \(v_i\) is the current central vertex within the sampling window, and t is the window size. In a vertex pair, \(v_i\) is the center and \(v_j\) is a context vertex reachable from \(v_i\) within t steps. \(Pr(v_j|v_i)\) is defined by the softmax function as:

$$\begin{aligned} Pr(v_j|v_i)=\frac{\exp (\mathbf {v_j\cdot v_i})}{\sum _{v'\in V}\exp (\mathbf {v'\cdot v_i})}. \end{aligned}$$
(2)

Bold symbols denote vertex vectors. Yang et al. [18] prove that DeepWalk is equivalent to matrix factorization, where the factorized matrix M, with entries \(M_{ij}\), is derived from t-step random walks traversing the entire network. For computational efficiency, instead of using the exact \(M_{ij}\), Yang et al. approximately factorize the matrix \(M = (A+A^2)/2 \).

$$\begin{aligned} M_{ij}= \log \left( \frac{[e_i(A + A^2 +\cdots + A^t)]_j}{t}\right) . \end{aligned}$$
(3)

Yang et al. propose the Text-Associated DeepWalk (TADW) algorithm, which incorporates a corresponding text feature matrix T to learn a representation of each vertex \(v_i \in V \) from both the network structure and the text features. The TADW algorithm has a complexity of \(O(|V|^3)\) when t is large; in fact, DeepWalk relies on neural networks precisely to avoid the explicit computation of the exact matrix M.
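For concreteness, the following minimal numpy sketch builds the approximated matrix \(M = (A+A^2)/2\) from a row-normalized adjacency matrix; the toy graph and the function name are our illustrative assumptions, not part of the original TADW formulation.

```python
import numpy as np

def tadw_target_matrix(A):
    """Approximate the DeepWalk factorization target as M = (A + A^2) / 2,
    following the approximation of Yang et al. [18].  A is assumed to be a
    row-normalized adjacency (one-step transition) matrix."""
    return (A + A @ A) / 2.0

# Toy example (illustrative): a 4-node undirected path graph, row-normalized.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
M = tadw_target_matrix(A)
```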

Let M be an \(n \times n\) matrix and T the text feature matrix; the representation is obtained by solving an optimization problem. Yang et al. propose the following objective function,

$$\begin{aligned} \begin{aligned}&min_{W, H}\sum _{(i,j)\in {\Omega }}(M_{i,j}-(W^THT)_{i,j})^2\\&\quad +\frac{\varepsilon }{2}(\Vert W\Vert _F^2+\Vert H\Vert _F^2). \end{aligned} \end{aligned}$$
(4)

Yang et al. also use the inductive matrix completion (IMC) algorithm proposed by Natarajan and Dhillon [9]. The IMC algorithm hypothesizes that the association matrix is generated by applying feature vectors associated with its rows and columns to a low-rank matrix \(W^TH\). Let \(M^{'}\) be the observation matrix; we aim to recover \(W^TH\), where \(W\in {\mathbb {R}}^{N_g\times k}\) and \(H\in {\mathbb {R}}^{N_t\times k}\) have rank \(k \ll m,n\). Minimizing the discrepancy \(W^TH\approx M^{'}\) is an optimization problem; however, solving a rank-constrained optimization problem is in general NP-hard. Therefore, the problem is relaxed by using a squared loss function and adding two regularization terms, which limit overfitting of the data. The IMC algorithm incorporates additional information about the row and column units by including two feature matrices in the objective function. In the IMC setting, let \(x_i\in {\mathbb {R}}^{f_g}\) denote the feature vector of gene i, \(y_j\in {\mathbb {R}}^{f_t}\) the feature vector of disease j, \(X\in {\mathbb {R}}^{N_g \times f_g} \) the training feature matrix of \(N_g\) genes, and \(Y \in {\mathbb {R}}^{N_t\times f_t}\) the training feature matrix of \(N_t\) diseases. The IMC algorithm aims to recover a low-rank matrix using the observed entries of the gene–disease association matrix \(M^{'}\), whose observed entries \(M^{'}_{i,j}\) are indexed by the set \(\Omega \).

Natarajan and Dhillon formalized the objective function:

$$\begin{aligned} \begin{aligned}&min_{W,H}\sum _{(i,j)\in {\Omega }}(M^{'}_{i,j}-(X^TW^THY)_{i,j})^2\\&\quad +\frac{\varepsilon }{2}(\Vert W\Vert _F^2+\Vert H\Vert _F^2). \end{aligned} \end{aligned}$$
(5)

where \(\varepsilon \) is a regularization parameter. Inspired by inductive matrix completion, our idea for directed network representation is to incorporate the in-direction and the text features of vertices to obtain better directed network representations. Our algorithm obtains representation vectors for the vertices of a directed network using the in-degree-Laplacian, the in-degree-signless-Laplacian, and text features.
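To make the IMC formulation concrete, the sketch below minimizes a fully observed variant of Eq. (5) by plain gradient descent, using the common parameterization \(M' \approx XWH^TY^T\); the variable shapes, learning rate, and iteration count are illustrative assumptions rather than the exact setting of [9].

```python
import numpy as np

def imc_gradient_descent(M, X, Y, k=16, eps=0.5, lr=1e-3, iters=500):
    """Sketch of inductive matrix completion: fit M ~ X @ W @ H.T @ Y.T with a
    squared loss plus Frobenius regularization on W and H.  For simplicity all
    entries of M are treated as observed (Omega covers every (i, j))."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(X.shape[1], k))   # row-side factor
    H = rng.normal(scale=0.1, size=(Y.shape[1], k))   # column-side factor
    for _ in range(iters):
        R = X @ W @ H.T @ Y.T - M                     # residual matrix
        grad_W = X.T @ R @ Y @ H + eps * W
        grad_H = Y.T @ R.T @ X @ W + eps * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H
```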

Embedding algorithm of directed network

The World Wide Web is the largest network for which topological information is currently available, and it is a typical directed network dataset. Our algorithm uses digraph theory to model the hyperlink structure of the World Wide Web: the vertices are the documents (webpages) and the edges are the hyperlinks (URLs) that connect one document to another. Let G denote a digraph, E its edge set, and V its finite vertex set. An arc of a digraph is an ordered pair of vertices \((v_i,v_j)\), and each arc may carry a positive weight w. The in-degree \(d^-_i\) of a vertex \(v_i\) is defined as \(d^-_i=\sum _{v_j,v_j\rightarrow v_i}w(v_i,v_j)\), where \(v_j\rightarrow v_i\) means that \(v_j\) has a directed arc pointing to \(v_i\). For a Markov random walk on the digraph, a transition probability matrix is defined as \(P=[p(v_i,v_j)]_{v_i,v_j}\), where \(p(v_i,v_j)\) is the probability that a walker moves from vertex \(v_i\) to vertex \(v_j\), and \(\sum _{v_i}p(v_i,v_j)=1,\forall v_j\). A Markov random walk is a stochastic process in which the next state depends only on the current state. The stationary distribution of the walk is the probability distribution describing its long-term behavior, i.e. the distribution to which the visit frequencies of the vertices converge once the walk reaches a steady state. If the digraph is irreducible (i.e. connected), the stationary distribution exists; we denote it by \(\pi \) with \(\sum _{v_i}\pi _{v_i}=1\) [21]. The entries of the transition probability matrix can be defined as \( p(v_i,v_j)=w(v_i,v_j)/d^-(v_i)\), meaning that a random walker at a vertex jumps to its neighbors with probability proportional to the edge weight.
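As an illustration, the following numpy sketch computes weighted in-degrees, a column-stochastic transition matrix, and the stationary distribution by power iteration. The storage convention W[i, j] for the arc \(v_i\rightarrow v_j\) and the normalization of each column by the in-degree of the target vertex (so that \(\sum _{v_i}p(v_i,v_j)=1\) holds) are our reading of the definitions above, not a prescription from the paper.

```python
import numpy as np

def transition_and_stationary(W, iters=1000, tol=1e-10):
    """W[i, j] > 0 is assumed to be the weight of the arc v_i -> v_j.
    Columns of P are normalized so that sum_i P[i, j] = 1 for every j,
    matching the constraint stated in the text; the stationary distribution
    pi (with sum(pi) = 1) is then obtained by power iteration."""
    d_in = W.sum(axis=0)                      # weighted in-degree of each vertex
    P = W / np.maximum(d_in, 1e-12)           # column-stochastic transition matrix
    n = W.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_pi = P @ pi                       # one step of the walk
        if np.linalg.norm(new_pi - pi, 1) < tol:
            pi = new_pi
            break
        pi = new_pi
    return P, pi
```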

Preliminaries

Since results from research on directed graphs are seldom used in network representation, a natural idea for directed network representation is to convert directed edges into undirected edges [22, 23], because undirected network representations are simpler and easier to analyze than directed ones. However, this conversion can also lead to the loss of information, such as the direction of the edges. Therefore, whether or not to convert a directed network into an undirected one depends on the specific needs and constraints of the problem.

Case 1

One widely used method is the \(A+A^T\) symmetrization. Let \(A_U\) denote the undirected network's adjacency matrix. This method adds the original directed network's adjacency matrix A to its transpose \(A^T\) to create a symmetric adjacency matrix for an undirected network; essentially, each directed edge is replaced with an undirected one. However, this symmetrization does not fully capture the effect of in- and out-edges on node similarity.

$$\begin{aligned} A_U= A + A^T. \end{aligned}$$
(6)

Case 2

The symmetrization based on random walks transforms the directed normalized cut of a set of nodes into the normalized cut of the same set in the generated symmetrized undirected network [24]. Specifically, the adjacency matrix of the digraph can be represented as \(A=(\Pi P + P^T\Pi )/2\), where P is the transition matrix of the random walk, \(P^T\) is its transpose, and \(\Pi = \text {diag}(\pi _1, \pi _2, \ldots , \pi _n)\) is a diagonal matrix containing the stationary probability of each node. However, because it depends on the normalized cut criterion, this symmetrization follows the concept of density-based clustering, making it difficult to identify other types of meaningful structure. In many social networks, it is hard to identify the connections between different communities. For instance, members of the same community often interact with each other and share common interests; after identifying such communities, we observe that their internal connections are dense while their connections to other communities are relatively sparse. Nevertheless, the connections between communities are of great importance in practice.

$$\begin{aligned} A_U= \frac{(\Pi P + P^T\Pi )}{2}. \end{aligned}$$
(7)

Case 3

Symmetric combination methods utilize the adjacency matrix of a directed graph: \(A^TA\) identifies the common in-edges (the number of shared nodes pointed to by two nodes) between pairs of nodes, while \(AA^T\) captures the common out-edges. This combination method [25] accounts for both the in-degree and out-degree of the digraph, recognizing the equal importance of the in-degree and out-degree of nodes in communities. The resulting combination matrix is symmetric.

$$\begin{aligned} A_U = AA^T + A^TA. \end{aligned}$$
(8)

Case 4

A major characteristic of real-world networks is that they exhibit a power-law degree distribution [26]: a few nodes have very high degrees, while the majority have low degrees. In a directed network, the contribution of each node to the community is normalized based on its node degree, which includes both the in-degree and the out-degree. To symmetrize the adjacency matrix of a digraph, we take the contribution of the node degree into account. Let \(\alpha \) denote the contribution of the in-degree and \(\beta \) the contribution of the out-degree of a node.

$$\begin{aligned} A_U=B+C, \end{aligned}$$
(9)

where

$$\begin{aligned} \begin{aligned} B&= D_{+}^{-\alpha } AD_{-}^{-\beta } A^T D_{+}^{-\alpha },\\ C&= D_{-}^{-\beta } A^T D_{+}^{-\alpha } AD_{-}^{-\beta },\\ \end{aligned} \end{aligned}$$
(10)

When \(\alpha =\beta =0.5 \), Eq. (9) becomes

$$\begin{aligned} \begin{aligned} A_U=D_{+}^{-0.5} AD_{-}^{-0.5} A^T D_{+}^{-0.5} +D_{-}^{-0.5} A^T D_{+}^{-0.5} AD_{-}^{-0.5}.\\ \end{aligned} \end{aligned}$$
(11)

where \(D_{+}\) denotes the out-degree matrix and \(D_{-}\) the in-degree matrix. The above method converts a directed graph into an undirected graph and yields an equivalent representation of the network.
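A minimal numpy sketch of the symmetrizations in Cases 1, 3, and 4 follows (Case 2 additionally requires the stationary distribution \(\Pi \) from the random-walk sketch above); the adjacency convention A[i, j] > 0 for an arc \(v_i\rightarrow v_j\) is our assumption.

```python
import numpy as np

def symmetrize(A, alpha=0.5, beta=0.5):
    """Return the symmetrized matrices of Eq. (6), Eq. (8) and Eqs. (9)-(11).
    A is a (weighted) adjacency matrix with A[i, j] > 0 for an arc v_i -> v_j."""
    # Case 1: naive symmetrization, Eq. (6).
    A_u1 = A + A.T
    # Case 3: common out-edges plus common in-edges, Eq. (8).
    A_u3 = A @ A.T + A.T @ A
    # Case 4: degree-normalized combination, Eqs. (9)-(11).
    d_out = np.maximum(A.sum(axis=1), 1e-12)       # out-degrees (row sums)
    d_in = np.maximum(A.sum(axis=0), 1e-12)        # in-degrees (column sums)
    Dp = np.diag(d_out ** -alpha)                  # D_+^{-alpha}
    Dm = np.diag(d_in ** -beta)                    # D_-^{-beta}
    B = Dp @ A @ Dm @ A.T @ Dp
    C = Dm @ A.T @ Dp @ A @ Dm
    A_u4 = B + C
    return A_u1, A_u3, A_u4
```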

Case 5

Another method is to translate the directed graph into a bipartite graph; we do not go into further details here.

Similar to the adjacency matrix, the Laplacian matrix can also be symmetrized using these representation methods, as in the combinatorial Laplacian matrix. Equation (12) gives the combinatorial Laplacian matrix of a digraph defined by Chung [21].

$$\begin{aligned} L=\Phi -\frac{\Phi P+P^{*}\Phi }{2}. \end{aligned}$$
(12)

Definition

Our directed NRL algorithm defines the relevant concepts, based on the in-degree of vertices in the digraph, as follows:

Definition 1

The in-degree-Laplacian matrix \(L^-\) of a digraph G is defined as an \(n \times n\) matrix,

where, if \(i=j\), the entry is \(d^-_{i}\), the in-degree of the ith vertex, and if \(i\not =j\), the (i,j)th entry of the in-degree-Laplacian matrix of the digraph is \(-1\).

Definition 2

The in-degree-signless-Laplacian matrix \(Q^-\) of a digraph G is defined as an \(n \times n\) matrix,

where, if \(i=j\), the entry is \(d^-_{i}\), the in-degree of the ith vertex, and if \(i\not =j\), the (i,j)th entry of the in-degree-signless-Laplacian matrix of the digraph is 1.
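The following sketch instantiates Definitions 1 and 2 in numpy; it assumes the off-diagonal values \(-1\) and 1 are placed only at positions corresponding to arcs (as in L = D − A), which the definitions leave implicit.

```python
import numpy as np

def in_degree_laplacians(A):
    """A is a 0/1 adjacency matrix with A[i, j] = 1 for an arc v_i -> v_j,
    so the in-degree of vertex i is the i-th column sum of A."""
    D_in = np.diag(A.sum(axis=0))       # diagonal matrix of in-degrees d^-_i
    L_minus = D_in - A                  # in-degree-Laplacian, Definition 1
    Q_minus = D_in + A                  # in-degree-signless-Laplacian, Definition 2
    return L_minus, Q_minus
```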

Embedding procedure

Next, we suppose A is a Hermitian matrix, so that \(A^H=A\).

Theorem 1

Consider the characteristic equation \(Ax=\lambda x\), where A is an n-th order Hermitian matrix and x is an eigenvector. Computing the eigenvalues \(\lambda _1,\lambda _2,\ldots ,\lambda _n\) and eigenvectors \(x_1,x_2,\ldots ,x_n\), we obtain an orthogonal set of eigenvectors. Let \(\Lambda =\text {Diag}(\lambda _{1},\ldots ,\lambda _{n})\), \(\Gamma =(x_1,x_2,\ldots ,x_n)\), and \(x=\Gamma y\). Then the standard quadratic form of the Hermitian matrix is

$$\begin{aligned} f(x)=y^H\Lambda y \end{aligned}$$
(13)

Proof

$$\begin{aligned} \begin{aligned} f(x)=x^HAx&=(\Gamma y)^HA(\Gamma y)\\&=y^H(\Gamma ^HA\Gamma )y\\&=y^H\Lambda y.\\ \end{aligned} \end{aligned}$$
(14)

\(\square \)
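Theorem 1 can be checked numerically; the small script below builds a random Hermitian matrix (an illustrative example, not data from the paper) and verifies that \(x^HAx = y^H\Lambda y\) with \(y=\Gamma ^Hx\).

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (Z + Z.conj().T) / 2                 # Hermitian matrix: A^H = A
lam, Gamma = np.linalg.eigh(A)           # Lambda = diag(lam), Gamma unitary
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = Gamma.conj().T @ x                   # x = Gamma y  <=>  y = Gamma^H x
lhs = x.conj() @ A @ x                   # f(x) = x^H A x
rhs = y.conj() @ (lam * y)               # y^H Lambda y
assert np.isclose(lhs, rhs)              # Eq. (13) holds numerically
```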

The adjacency matrix \(A^-\) of the digraph is Hermitian, and \(A^{-H}\) is its conjugate transpose. Through matrix operations, the diagonal of \(A^-A^{-H}\) gives the out-degrees of the vertices, and the diagonal of \(A^{-H}A^-\) gives the in-degrees. Namely,

$$\begin{aligned} \begin{aligned} \text {Diag}(A^{-H}A^-)=\text {Diag}(d^-_{1},d^-_{2},\ldots ,d^-_{n}),\\ \text {Diag}(A^-A^{-H})=\text {Diag}(d^+_{1},d^+_{2},\ldots ,d^+_{n}).\\ \end{aligned} \end{aligned}$$
(15)

By Theorem 1, we obtain,

$$\begin{aligned} \begin{aligned} f(x)&=x^HA^-x\\&=(\Gamma y)^HA^-(\Gamma y)\\&=y^H(\Gamma ^HA^-\Gamma )y\\&=y^H\Lambda ^-y,\\ \end{aligned} \end{aligned}$$
(16)

where \(\Lambda ^-\) is the diagonal matrix of in-degree-based eigenvalues. According to the Rayleigh quotient and the Perron–Frobenius theorem, a digraph has a transition probability matrix P and a Perron vector \(\pi \). For any \(f: V(G) \rightarrow {\mathbb {C}}\), an irreducible matrix \(A^-\) with non-negative entries has a unique (left) eigenvector with all entries positive. Let \(\lambda _i^-\) denote the in-degree-based eigenvalue associated with the all-positive eigenvector of the transition probability matrix P. Therefore, a strongly connected digraph has a unique left eigenvector \(\pi \) with \(\pi (v_i)> 0\) for all \(v_i\), and we have \(\Pi P = \Lambda ^-\Pi \). Let \(\Pi ^-\) be the in-degree-based stationary distribution diagonal matrix, i.e. \(\Pi ^-=\text {diag}(\pi ^{-}_1,\pi ^{-}_2,\ldots ,\pi ^{-}_n)\).

Next, the in-degree-based Laplacian standard quadratic form of the directed network is derived as follows:

Proposition 1

$$\begin{aligned} \begin{aligned} f'Lf&=x^{H}Lx\\&=x^{H}D^-x-x^{H}A^-x\\&=y^2D^--y^H\Lambda ^-y\\&=D^-\sum _{i=1}^{n}f^{2}(v_{i})-\sum _{i,j=1}^{n}\lambda ^-_{v_{j}\rightarrow v_{i}}f(v_{i})f(v_{j})\\&=\frac{1}{2}\Bigg (\sum _{i=1}^{n}d_{i}f^{2}(v_{i})-2\sum _{i,j=1}^{n}{\lambda ^-_{v_{j}\rightarrow v_{i}}}f(v_{i})f(v_{j})\\&\quad + \sum _{j=1}^{n}d_{j}f^{2}(v_{j})\Bigg )\\&=\frac{1}{2}\sum _{i,j=1}^{n}{\lambda ^-_{v_{j}\rightarrow v_{i}}}\left( f(v_{i})-f(v_{j})\right) ^{2}. \end{aligned} \end{aligned}$$
(17)
Fig. 1
figure 1

Diagram of the proposed directed graph representation with text algorithm. First, we construct the digraph structure from four open-source datasets. Second, we incorporate text information through matrix completion in directed network representation learning

By Theorem 1 and Proposition 1, we obtain the representation learning of the directed network. The representations based on the in-degree-Laplacian and the in-degree-signless-Laplacian are given by:

$$\begin{aligned} L^-&= \Pi ^--P^H\Pi ^-, \end{aligned}$$
(18)
$$\begin{aligned} Q^-&= \Pi ^-+P^H\Pi ^-. \end{aligned}$$
(19)
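A direct numpy sketch of Eqs. (18) and (19) is given below; P and the stationary distribution are assumed to be available, e.g. from the power-iteration sketch in the previous section.

```python
import numpy as np

def in_degree_directed_laplacians(P, pi):
    """Build L^- = Pi^- - P^H Pi^- and Q^- = Pi^- + P^H Pi^- (Eqs. (18)-(19)).
    P is the transition probability matrix and pi the stationary distribution."""
    Pi = np.diag(pi)
    PH = P.conj().T                      # conjugate transpose of P
    L_minus = Pi - PH @ Pi               # in-degree-Laplacian representation
    Q_minus = Pi + PH @ Pi               # in-degree-signless-Laplacian representation
    return L_minus, Q_minus
```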

Our algorithm utilizes IMC for supervised node classification on the WebKB network database. IMC recovers a low-rank matrix using the observed entries of the directional and text association matrix. The directional low-rank constrained problem is solved on \(S^-\) with rich text information. Our algorithm aims to minimize the following objective functions:

$$\begin{aligned} \begin{aligned}&min_{W, H}\sum _{(i,j)\in {\Omega }}(L^-_{i,j}-(X^TW^THT)_{i,j})^2\\&\quad +\frac{\varepsilon }{2}(\Vert W\Vert _F^2+\Vert H\Vert _F^2), \end{aligned} \end{aligned}$$
(20)

or

$$\begin{aligned} \begin{aligned}&min_{W, H}\sum _{(i,j)\in {\Omega }}(Q^-_{i,j}-(X^TW^THT)_{i,j})^2\\&\quad +\frac{\varepsilon }{2}(\Vert W\Vert _F^2+\Vert H\Vert _F^2). \end{aligned} \end{aligned}$$
(21)

Let \(S^-\) denote \(L^-\) or \(Q^-\). Our algorithm framework is shown in Fig. 1, and the pseudocode is given in Algorithm 1.

Algorithm 1
figure a

The DGRT Algorithm

Our algorithm solves the directed network embedding from the in-degree-Laplacian matrix and the in-degree-signless-Laplacian matrix with rich text information.
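Since the Algorithm 1 figure is not reproduced here, the following gradient-descent sketch illustrates the core factorization step of DGRT in the spirit of Eq. (22); the solver, learning rate, iteration count, and the final concatenation of W and HT (as in TADW) are illustrative choices, not necessarily those of Algorithm 1.

```python
import numpy as np

def dgrt_embed(S, T, k=80, eps=0.2, lr=1e-3, iters=300):
    """Approximate S^- (in-degree-Laplacian or in-degree-signless-Laplacian,
    shape (n, n)) by W^T H T, where T (f_t, n) holds the vertex text features.
    Plain gradient descent on the squared loss with Frobenius regularization."""
    n, f_t = S.shape[0], T.shape[0]
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(k, n))
    H = rng.normal(scale=0.01, size=(k, f_t))
    for _ in range(iters):
        R = W.T @ (H @ T) - S            # residual, shape (n, n)
        W -= lr * ((H @ T) @ R.T + eps * W)
        H -= lr * (W @ R @ T.T + eps * H)
    # One common choice (following TADW) is to concatenate W and HT per vertex.
    return np.concatenate([W, H @ T], axis=0).T   # (n, 2k) vertex representations
```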

Table 1 Accuracy at different training ratios on Cornell
Table 2 Accuracy at different training ratios on Texas
Table 3 Accuracy at different training ratios on Washington
Table 4 Accuracy at different training ratios on Wisconsin

Complexity analysis

In this work, the complexity analysis of the optimization problem follows Yu [26]. For the equation,

$$\begin{aligned} min_{W, H}\sum _{(i,j)\in {\Omega }}(S^-_{i,j}-(W^THT)_{i,j})^2 +\frac{\varepsilon }{2}(\Vert W\Vert _F^2+\Vert H\Vert _F^2),\nonumber \\ \end{aligned}$$
(22)

where the objective is convex in W and in HT separately, and the complexity of each iteration is dominated by the alternating minimization over W and over HT. Each subproblem is equivalent to a regularized least-squares problem with k variables. If k and \(|\Omega |\) are large, conjugate gradient (CG) iterative methods can be used; they offer cheap updates and provide a good approximate solution within a few iterations, which makes them particularly suitable for solving Eq. (22). Consequently, Yu's techniques make the alternating minimization efficient enough to handle large-scale problems, especially for the gradient calculations. The complexity of our algorithm is \(O(nnz(S)k +|V|f_tk+|V|k^2)\).
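For illustration, the sketch below solves one such regularized least-squares subproblem matrix-free with SciPy's conjugate gradient routine, fixing B = HT and updating the columns of W; the exact scaling of the regularizer and the stopping criteria are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_W_by_cg(B, S, eps=0.2):
    """With B = H @ T (shape (k, n)) fixed, each column w_j of W minimizes
    ||B^T w - S[:, j]||^2 + eps * ||w||^2, i.e. (B B^T + eps I) w = B S[:, j].
    The operator is applied matrix-free, so B B^T is never formed explicitly."""
    k = B.shape[0]
    op = LinearOperator((k, k), matvec=lambda w: B @ (B.T @ w) + eps * w)
    W = np.zeros((k, S.shape[1]))
    for j in range(S.shape[1]):
        w_j, info = cg(op, B @ S[:, j])
        W[:, j] = w_j                    # info == 0 signals CG convergence
    return W
```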

Experiments

Our experiments use multi-class vertex classification to evaluate the performance of in-degree-based network representation learning for directed graphs. The final result is a low-dimensional representation of directed network datasets with rich text attributes. The experiments involve predicting the type of webpage based on web attributes, with an SVM (Support Vector Machine) as the classifier. The proposed algorithm is evaluated against the baseline methods on the same publicly available web datasets.

Experiments settings

Datasets The WebKB dataset contains webpages from four universities: Cornell, Texas, Washington, and Wisconsin, labeled according to whether they are professor, student, project, or other pages. The Cornell dataset has 867 webpages, Texas 827, Washington 1205, and Wisconsin 1263.

Table 5 Accuracy at different training ratios and embedding parameters on Cornell
Table 6 Accuracy at different training ratios and embedding parameters on Texas
Table 7 Accuracy at different training ratios and embedding parameters on Washington
Table 8 Accuracy at different training ratios and embedding parameters on Wisconsin
Fig. 2
figure 2

Embeddings of the four universities with textrank \(=\) 20

Fig. 3
figure 3

Embeddings of the four universities with the same method and textrank \(=\) 50

Fig. 4
figure 4

In-degree-Laplacian parameter sensitivity of k and \(\epsilon \)

Fig. 5
figure 5

In-degree-signless-Laplacian parameter sensitivity of k and \(\epsilon \)

Baseline CL Combina-Laplacian (combinatorial Laplacian) is the baseline method; its node classification accuracy lies between 36 and 53% on the corresponding datasets. Chen [8] proposed a directed network embedding algorithm based on Chung [21], who defined the combinatorial Laplacian of a directed graph as \(L=\Phi -\frac{\Phi P+P^{*}\Phi }{2}\), where L is the combinatorial Laplacian matrix, P is the transition matrix of the directed network, and \(\Phi \) is the diagonal matrix of the stationary distribution, i.e. \(\Phi = \text {diag}(\phi _1,\ldots ,\phi _n)\) with entries \(\Phi (v, v) = \phi (v)\). From Chung's definition it can be seen that the combinatorial Laplacian matrix is symmetric. In fact, the matrices of a digraph are asymmetric, including the adjacency matrix, the Laplacian matrix, and the signless Laplacian matrix of a directed graph; our in-degree-Laplacian and in-degree-signless-Laplacian matrices are also asymmetric.

Adjacency matrix average The adjacency matrix average method decomposes the adjacency matrix of the directed network; its accuracy ranges from 36 to 58%.

IDL In-degree-Laplacian is the Kirchhoff matrix factorization and is one of our proposed methods. Its performance is at least 3 percentage points better than the baseline method.

IDUL In-degree-signless-Laplacian is the signless Kirchhoff matrix factorization. Its performance is at least 4 percentage points better than the baseline method.

CLT Combina-Laplacian with text is our improved baseline method, which introduces vertex text information. The observed performance improvement is at least 20%.

IDLT In-degree-Laplacian with text is the Kirchhoff matrix with text method. Its performance is at least 23% better than the baseline method.

IDULT In-degree-signless-Laplacian with text is the signless Kirchhoff matrix with text method. Its performance is at least 23% better than the baseline method.

Settings For all four university webpage datasets, we select \(k=80\), \(\lambda =0.2\), and text attribute dimensions \(T\in \{10,20,30,50\}\). For the supervised classifier, a linear SVM is implemented with liblinear [10]. The training ratio for the linear SVM varies from 10 to 90%.
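A sketch of the corresponding evaluation protocol is shown below, using scikit-learn's LinearSVC (a wrapper around liblinear) and stratified train/test splits; the function and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate_embeddings(embeddings, labels, ratios=(0.1, 0.3, 0.5, 0.7, 0.9), seed=0):
    """Train a linear SVM on a fraction of the vertices (the training ratio)
    and report the classification accuracy on the remaining vertices."""
    results = {}
    for r in ratios:
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=r, random_state=seed, stratify=labels)
        clf = LinearSVC().fit(X_tr, y_tr)
        results[r] = accuracy_score(y_te, clf.predict(X_te))
    return results
```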

Results

The application of directed network representation learning involves the task of vertex classification. In this work, we evaluate the vertex classification performance of the proposed DGRT (Directed Graph Representation with Text) algorithm on four university webpage datasets from the USA, in comparison with existing directed representation learning algorithms. The experimental results for node classification are shown in Tables 1, 2, 3 and 4, which display the classification accuracies for the university webpages of Cornell, Texas, Washington, and Wisconsin. The accuracies for network node classification at different training ratios and embedding parameters are shown in Tables 5, 6, 7 and 8. These results show that our algorithm outperforms the baseline by over \(20\%\) when the training ratio ranges from 10 to 90%. The in-degree (out-degree) based representation more effectively captures the directional features of the digraph and is more suitable for directed network representation. Moreover, when we integrate text information through matrix completion in the directed NRL experiments, we achieve an improvement of up to 20% compared to the baseline, particularly when the training ratio is 10%.

Parameter sensitivity The DGRT algorithm uses two hyper-parameters: the dimension of the network representation vector, k, and the regularization term \(\epsilon \). To analyze their impact on representation learning performance, we vary k from 40 to 100 and \(\epsilon \) from 0.1 to 1, with the training ratio fixed at 10%. The corresponding vertex classification accuracies are shown in Figs. 2, 3, 4 and 5.

Conclusion

In this work, we propose a novel algorithm that utilizes the rich text features of vertices in directed network representation learning. Experimental results on the WebKB datasets demonstrate its effectiveness and robustness across different training ratios. The experiments show that the directed attribute is captured by the in-degree-Laplacian and in-degree-signless-Laplacian matrices, outperforming the baseline method. Furthermore, the inclusion of text information enhances directed network representation learning; the experimental results show an advantage of up to 20% over the baseline method. Instead of simply concatenating features, our algorithm combines them, providing a novel approach to modeling digraphs. As for future work, a promising research direction is to explore online and distributed learning techniques for large-scale network data.