A novel message passing neural network based on neighborhood expansion

Message passing neural networks (MPNNs) are widely used for assortative network representation learning under the assumption of homophily between connected nodes. However, this fundamental assumption is inconsistent with the heterophily of disassortative networks (DNs) in many real-world applications. We therefore propose NEDA, a novel MPNN based on neighborhood expansion for disassortative network representation learning (DNRL). Specifically, NEDA first performs neighborhood expansion to seek more informative nodes for aggregation, and then performs data augmentation to speed up the optimization of a set of parameter matrices, exploiting the maximum available training data at minimal computational cost. To evaluate the performance of NEDA comprehensively, we perform several experiments on benchmark disassortative network datasets of varying sizes, and the results demonstrate the effectiveness of our model. The code is publicly available at https://github.com/xueyanfeng/NEDA.


Introduction
Network representation learning aims to learn representations of network nodes so that these representations can be expressed as latent, informative, and low-dimensional vectors while preserving the network topology, node features, labels, and other auxiliary information [24,31]. The generated vectors can effectively support a wide range of downstream tasks, such as node classification [15,21,42], link prediction [4,48], and recommendation [43]. MPNNs [13,51] have therefore attracted a significant amount of research interest as a powerful tool for automatically learning multi-level representations from networks without extensive domain knowledge or laborious exploration of the networks themselves.
MPNNs aggregate information recursively from neighborhoods to get a global view of the whole network. Neighborhood aggregation implicitly assumes that the observed network satisfies homophily [28], which, in the context of semi-supervised node classification, can be understood as the general tendency of adjacent nodes with similar features and the same labels to form edges. However, this fundamental assumption might not hold for DNs, where linked nodes may have distinct features and different labels, such as chemical and molecular networks [50] and biological and technological networks [38]. In other words, the homophily hypothesis fails to match the real-world phenomenon of "opposites attracting" [50]. Consequently, at least two drawbacks arise when a DN is fed as an input into existing MPNNs: (1) Features ("messages") of destination nodes are "washed out" by those from a large number of irrelevant, proximal source nodes [33]. For example, some classic MPNNs specially designed for assortative networks, such as GCN [21] and GAT [42], are outperformed by a Multilayer Perceptron (MLP) with one hidden layer, which ignores the network structure entirely [50]. (2) No node can capture information from distant nodes or extract deep representations, further exacerbating the limited representation capability [17,33]. For example, as shown in Fig. 1, information of node 3 cannot fuse with that of node 9, even though both nodes have the same label and the highest structural role proximity [47]. Simply stacking hidden layers of MPNNs can capture long-range dependencies, but it inevitably causes an expensive recursive message-passing procedure [5] and an over-smoothing problem in which nodes become totally indistinguishable from each other in the final representation space [7,26].
Intuitively, (1) a node should aggregate more similar nodes with higher probability, and vice versa; (2) structural information should break the limitation of the immediate neighborhood so that nodes can capture long-distance dependencies. Inspired by these two intuitions, we design a neighborhood expansion technique that simultaneously addresses the two problems caused by neighborhood aggregation in heterophily settings. Data augmentation [15] is then performed on the expanded neighborhood for cheap computation. Finally, our NEDA successfully generalizes to networks with heterophily. Our contributions are as follows:

• Heterophily generalization of MPNNs We design a neighborhood expansion strategy based on transmission dynamics for DNRL under the hypothesis of heterophily between connected nodes.

• Preprocessing for the neighborhood of nodes As a preprocessing method for the neighborhood of nodes, our neighborhood expansion can be equipped with other, more sophisticated MPNNs and benefits practical scenarios where the similarity between a pair of nodes can be defined via domain knowledge.

• Comparison among techniques We compare our neighborhood expansion with a wide range of techniques empirically and comprehensively, which helps other researchers and practitioners design more explanatory graph neural networks (GNNs).
We empirically validate the effectiveness of NEDA via thorough comparisons with state-of-the-art alternatives on several challenging benchmarks with heterophily. We also verify the rationality of the neighborhood expansion and data augmentation and perform an ablation study. The rest of the paper is organized as follows. We introduce notations and preliminaries in Sect. 2 and review related works in Sect. 3. Sections 4 and 5 are dedicated to our proposed model and its analysis, respectively. The experimental results are presented in Sect. 6, and the conclusions in Sect. 7.

Notations and preliminaries
This section briefly introduces some basic notations and relevant epidemic models.

Notations
Let $G(V, E)$ denote a network with a set of $|V| = N$ nodes and a set of $|E| = M$ edges between node pairs. Each node $n \in \{1, \ldots, N\}$ is associated with (1) the feature vector $\mathbf{x}_n \in \mathbb{R}^F$, where $F$ represents the feature dimension, (2) the ground-truth label vector $\mathbf{y}_n$, represented by a one-hot vector with $C$ classes, and (3) the set $\mathcal{N}(n)$ of its neighbor nodes in $G$.
A disassortative network The heterophily level of $G$ can be formulated as

$$\mathrm{het}(G) = \frac{\sum_{(i,j) \in E} \mathrm{ind}(\mathbf{y}_i \neq \mathbf{y}_j)}{|E|} \tag{1}$$

where $\mathrm{ind}(\cdot)$ denotes the indicator function and $\mathrm{het}(G) \in [0, 1]$. The heterophily of $G$ becomes stronger as $\mathrm{het}(G) \to 1$, and $G$ is generally considered a disassortative network if $\mathrm{het}(G) > 0.5$. The network $G$ in Fig. 1 is clearly disassortative because $\mathrm{het}(G) = 0.80 > 0.5$.
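For concreteness, $\mathrm{het}(G)$ can be computed directly from an edge list and node labels. A minimal sketch in Python (the function and variable names are ours, not from the paper):

```python
import numpy as np

def heterophily_level(edges, labels):
    """het(G): fraction of edges whose endpoints carry different labels."""
    edges = np.asarray(edges)            # shape (M, 2), node index pairs
    labels = np.asarray(labels)          # shape (N,), class ids
    mismatched = labels[edges[:, 0]] != labels[edges[:, 1]]
    return mismatched.mean()             # het(G) in [0, 1]

# Toy check: 4 of 5 edges connect nodes with different labels.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
labels = [0, 1, 0, 1, 1]
print(heterophily_level(edges, labels))  # 0.8 -> disassortative
```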

Epidemic models
Epidemic models are typically employed to simulate the processes of an infectious disease propagating in contact networks where nodes represent individuals, and edges encode the contact pattern amongst individuals [22,30].
Susceptible-infected (SI) models [2] are the simplest epidemic models, in which a node $n$ in the infected or susceptible status is represented by $I_n$ or $S_n$, respectively. The status space of an edge can be expressed as $\{S_iS_j, S_iI_j, I_iS_j, I_iI_j\}$ according to the status of its endpoints $i$ and $j$. Assuming that the infectious disease can only be transmitted from $i$ to $j$ at each time step $t$, the volume of transmission events equals the number of IS edges, $|\{I_iS_j\}|$. In individual-based SI epidemic models, transmission events are simulated with probability $\beta_{ij}$, as shown in Fig. 2, where the red and blue solid circles represent nodes in the infected and susceptible status, respectively. In other words, an infected node $i$ infects its susceptible neighbor $j$ with probability $\beta_{ij}$, independent of the status of its other neighbors. Meanwhile, the nodes infected in cascade by patient zero [39] usually include its first-order, second-order, third-order, and higher-order neighbors.

Fig. 1 A toy example of a disassortative network G where each node tends to connect to nodes with distinct labels, represented by three different shapes
In this paper, the motivation for using SI models in neighborhood expansion is to analogize each node with patient zero, and neighborhood expansion with discrete-time stochastic simulations of transmission events in networks. The probability $\beta_{ij}$ can be further refined as $\beta_{ij} = \mathrm{cs}(\mathbf{x}_i, \mathbf{x}_j)$, where $\mathrm{cs}(\cdot, \cdot)$ denotes the cosine similarity between the features of nodes $i$ and $j$ in the original feature space [17].
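To make the analogy concrete, here is a small sketch of one discrete-time step of an individual-based SI simulation with feature-based transmission probabilities; the helper names (`cosine_sim`, `si_step`) are illustrative, not from the paper:

```python
import numpy as np

def cosine_sim(x_i, x_j):
    """cs(x_i, x_j): cosine similarity of two feature vectors."""
    denom = np.linalg.norm(x_i) * np.linalg.norm(x_j)
    return float(x_i @ x_j) / denom if denom > 0 else 0.0

def si_step(infected, adj, X, rng):
    """One time step: every IS edge (i, j) transmits with probability beta_ij."""
    newly_infected = set()
    for i in infected:
        for j in adj[i]:                     # neighbors of infected node i
            if j not in infected and rng.random() < cosine_sim(X[i], X[j]):
                newly_infected.add(j)
    return infected | newly_infected
```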

Related works
Several works have addressed network representation learning. Perozzi et al. [35] utilize unbiased random walks, and Grover and Leskovec [14] subsequently employ biased random walks. As a particular case of biased random walk, Zhan et al. [46] make use of an SI model in which nodes are not sampled repeatedly during each walk. Tang et al. [40] design the LINE model, which preserves either first-order or second-order proximities. The above four models are all trained in an unsupervised way [18], i.e., without leveraging node labels. In contrast, label propagation (LP) [52] explores topology in a supervised way via a Gaussian random field model. None of the above models use node features. However, node features can provide as much or even more information than network structure in DNRL, as empirically validated in Table 3 of Sect. 6.2.
GraphSAGE [15] is the first MPNN based on data augmentation, uniformly sampling from the entire local neighborhood for aggregation. Subsequently, FastGCN [8] proposes a global sampling method based on the importance distribution of nodes. EpidemicGCN [11] analyzes in detail the relationship between local sampling and an independent cascade model (an SI model with constraints), as well as that between global sampling and SIS models, and effectively balances the two sampling schemes via a hyperparameter. Nevertheless, unlike global sampling, which leads to irrelevance between layers in MPNNs or fails to generalize to larger networks, our neighborhood expansion can learn hierarchical representations of nodes in larger networks.
To remedy the problems caused by the homophily assumption and neighborhood aggregation in heterophily settings, extensive current attempts mainly focus on three families of MPNNs: (1) those that perform either "skip connection" operations between latent representations of different hidden layers [9,23] or a concatenation operation across all hidden layers at each node [45]; (2) those that augment network data by either mixing the features of each node and its neighbors [12] or randomly dropping a certain number of edges from input networks [37] at each training epoch; and (3) those that expand the neighborhood of destination nodes with distant nodes of either high structural similarity [33] or feature similarity [17], or via efficient attention-guided sorting [27]. We compare the effects of several techniques from these models on performance, comprehensively and empirically, and design our own NEDA model under memory constraints. More recently, some interpretable models have been proposed. For example, CPGNN incorporates a compatibility matrix for modeling the heterophily level [49]. GPRGNN jointly optimizes node feature and topological information extraction by adaptively learning the GPR weights [10]. BernNet provides a simple and intuitive mechanism for designing and learning an arbitrary spectral filter via Bernstein polynomial approximation [16].

Proposed model: NEDA
The core idea behind our model is that each node can aggregate information from nodes that have similar features and lie beyond its local neighborhood. We first describe the representation generation (forward propagation) process of NEDA in Sect. 4.1, under the assumption that the model has already been trained so that its parameters are fixed.

Representation generation process
The representation generation process of NEDA consists of the following five operations: neighborhood expansion, data augmentation, aggregation, combination, and feature transformation, which are executed independently and iteratively across nodes, as shown in Algorithm 1. Without loss of generality, we introduce each operation individually, taking node n as an example.
Fig. 2 Schematic diagram of SI models

Neighborhood expansion As the only patient zero, node $n$ obtains its extended neighborhood $\delta_I(I_n)$ by infecting other susceptible nodes in a cascading manner until the final size reaches a hyperparameter $s_0$. In other words, the process of its neighborhood expansion from $\delta_I(I_n) = \emptyset$ to $|\delta_I(I_n)| = s_0$ is that of its extended neighbors being infected by $n$ in cascade. Precisely, the process is performed recursively based on the current IS edge set $\{I_iS_j\}$, denoted by $\mathrm{ne}(\{I_iS_j\})$, via the following conditional probability:

$$p(S_j \mid \{I_iS_j\}) = \frac{\beta_{ij}}{Q}, \qquad Q = \sum_{(i,j) \in \{I_iS_j\}} \beta_{ij} \tag{2}$$

i.e., the transmission probability on each current IS edge, normalized by the total weight $Q$ over all current IS edges.
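Under this reading of Eq. (2), the expansion can be sketched as follows: one susceptible endpoint of the current IS edges is infected per step, chosen with probability $\beta_{ij}/Q$, until $s_0$ extended neighbors are collected. A hedged sketch reusing the `cosine_sim` helper from Sect. 2.2 (this is our illustration, not the authors' Algorithm 1):

```python
import numpy as np

def expand_neighborhood(n, adj, X, s0, rng):
    """Grow delta_I(I_n) for patient zero n by cascading SI infections."""
    infected = {n}
    extended = []                           # delta_I(I_n), in infection order
    while len(extended) < s0:
        # Current IS edge set: infected endpoint i, susceptible endpoint j.
        is_edges = [(i, j) for i in infected for j in adj[i] if j not in infected]
        if not is_edges:
            break                           # cascade exhausted early
        beta = np.array([cosine_sim(X[i], X[j]) for i, j in is_edges])
        beta = np.clip(beta, 0.0, None)     # guard against negative similarities
        Q = beta.sum()
        if Q == 0:
            break                           # all similarities zero (Q = 0)
        pick = rng.choice(len(is_edges), p=beta / Q)   # Eq. (2)
        _, j = is_edges[pick]
        infected.add(j)
        extended.append(j)
    return extended

# Usage: expand_neighborhood(3, adj, X, s0=5, rng=np.random.default_rng(0))
```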

Data augmentation
The neighborhood expansion successfully generalizes MPNNs to DNRL. However, the process is time-consuming, especially in the multi-layer architecture of MPNNs [9,37]. Therefore, a fixed set of neighbors of size $s_l$, denoted by $\delta_s^{l-1}(n)$, is uniformly sampled at random from $\delta_I(I_n)$, denoted by $\mathrm{sample}(\delta_I(I_n))$, for cheap computation in the $l$-th layer of NEDA, where $l \in \{1, \ldots, L\}$ and $\delta_s^{l-1}(n) \subset \delta_I(I_n)$. Finally, we aggregate the nodes from $\delta_s^{l-1}(n)$ instead of $\delta_I(I_n)$ at each training epoch, so as to receive multiple stochastically augmented features while minimizing computational cost.

Algorithm 1 The representation generation process of NEDA (its line 13 performs $\delta_s^{l-1}(n) \leftarrow \mathrm{sample}(\delta_I(I_n))$ for $l = 1, \ldots, L$ and $n = 1, \ldots, N$)

Aggregation The aggregation operation, a differentiable and permutation-invariant function [44], can be formulated as:

$$\mathbf{a}_n^l = \mathrm{mean}(\{\mathbf{h}_j^{l-1} : j \in \delta_s^{l-1}(n)\}) \tag{3}$$

where $\mathrm{mean}(\cdot)$ takes the element-wise mean of the features in the sampled neighbor set $\delta_s^{l-1}(n)$.

Combination The combination operation concatenates the latent vector of the previous layer $\mathbf{h}_n^{l-1}$ with the aggregated vector $\mathbf{a}_n^l$, formulated as:

$$\mathbf{c}_n^l = \mathrm{concat}(\mathbf{h}_n^{l-1}, \mathbf{a}_n^l) \tag{4}$$

where $\mathbf{h}_n^0 = \mathbf{x}_n$. It has been proven theoretically and empirically that a trained model can achieve better performance via this concatenation than via the weighted sum (e.g., the combination operation of GCN or GAT) of $\mathbf{h}_n^{l-1}$ and $\mathbf{a}_n^l$ under the heterophily setting [50].
Feature transformation The feature transformation operation is a composite function with a learnable parameter matrix $\mathbf{W}^l$, which can be summarized as:

$$\mathbf{h}_n^l = \sigma^l(\mathbf{W}^l \mathbf{c}_n^l) \tag{5}$$

where $\mathbf{W}^l$ is leveraged for feature fusion and dimensionality reduction, and $\sigma^l(\cdot)$ denotes a nonlinear activation that boosts the quality of the representations. Finally, the real-valued vector $\mathbf{z}_n = \mathbf{h}_n^L$ produced by the output layer is the representation of node $n$, which preserves both its feature and topological structure information.
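The five operations compose into a per-layer forward pass. Below is a hedged PyTorch sketch of one NEDA layer implementing Eqs. (3)-(5); the class and argument names are ours, the extended neighborhoods are assumed precomputed and non-empty, and the sampling is simplified to uniform sampling with replacement:

```python
import torch
import torch.nn as nn

class NEDALayer(nn.Module):
    """One NEDA layer: sample -> mean-aggregate -> concatenate -> transform."""
    def __init__(self, in_dim, out_dim, sample_size):
        super().__init__()
        self.sample_size = sample_size                # s_l, data augmentation size
        self.linear = nn.Linear(2 * in_dim, out_dim)  # W^l acts on the concat

    def forward(self, h, extended_nbrs):
        # h: (N, in_dim); extended_nbrs[n] lists delta_I(I_n) for node n.
        aggregated = []
        for n in range(h.size(0)):
            nbrs = torch.tensor(extended_nbrs[n])
            idx = torch.randint(len(nbrs), (self.sample_size,))  # sample(.)
            aggregated.append(h[nbrs[idx]].mean(dim=0))          # Eq. (3)
        a = torch.stack(aggregated)               # aggregated vectors a^l
        c = torch.cat([h, a], dim=1)              # Eq. (4): concatenation
        return torch.relu(self.linear(c))         # Eq. (5): sigma(W^l c)
```

Stacking two such layers, with the second mapping to the $C$ classes, gives the $L = 2$ architecture used in Sect. 6.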

The training process of NEDA
The set of parameter matrices $\mathbf{W} = \{\mathbf{W}^1, \ldots, \mathbf{W}^L\}$ is updated via the cross-entropy loss function

$$\mathcal{L} = -\sum_{n \in Y_L} \sum_{c=1}^{C} \mathbf{y}_n[c] \ln \mathrm{softmax}(\mathbf{z}_n)[c] \tag{6}$$

where $Y_L$ denotes the index set of nodes in the training set.
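A minimal training sketch consistent with Eq. (6) and the optimizer settings of Sect. 6.1, assuming a `model` that stacks NEDALayer modules from the sketch above and a `y` holding integer class labels (PyTorch's `cross_entropy` applies the softmax internally):

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

def train_step(model, X, extended_nbrs, y, train_idx):
    """One epoch: forward pass, then cross-entropy over the training set Y_L."""
    model.train()
    optimizer.zero_grad()
    z = model(X, extended_nbrs)                         # (N, C) output logits
    loss = F.cross_entropy(z[train_idx], y[train_idx])  # Eq. (6)
    loss.backward()
    optimizer.step()
    return loss.item()
```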

The trained NEDA
When the disassortative network $G$ shown in Fig. 1 is fed into the trained NEDA with $s_0 = 5$, $s_1 = 3$, and $s_2 = 4$, the representation generation process for node 3 is shown in Fig. 3. To ease the understanding of the neighborhood expansion, we elaborate on how node 3 obtains its extended neighbors. Firstly, $G$ is mapped into a bipartite network $G'(I, S)$, with $I = \{I_3\}$, IS edges such as $I_3S_4$ and $I_3S_5$, and $S = \{S_1, S_2, S_4, \cdots, S_{11}\}$, based on the fact that transmission events occur only on IS edges, as introduced in Sect. 2.2. Secondly, at the initial time $t = 0$, $\delta_I(I_3) = \emptyset$, and the conditional probability $p(S_j \mid \{I_iS_j\})$ that each node $j$ is about to be infected is calculated via Eq. (2). When $t = 1$, node 5 is transferred from the set $S$ to the set $I$, assuming that $S_5$ with the highest $p(S_5 \mid \{I_iS_j\}) = 0.48$ is infected, so that $\delta_I(I_3) = \{5\}$. The expansion then proceeds recursively in the same manner until $|\delta_I(I_3)| = s_0 = 5$.

NEDA*
A "feature copy" operation is added to the neighborhood expansion process in NEDA*, a variant of NEDA. In other words, the features of node $j$ are reset as a whole to those of node $n$ when $j$ is infected by $n$. This "feature copy" operation can be enabled in Algorithm 1 by uncommenting line 7.
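Relative to the expansion sketch in Sect. 4.1, the NEDA* variant amounts to a single extra assignment at infection time; a hypothetical illustration (operating on a copy of X so the original features stay intact):

```python
# NEDA* variant: inside expand_neighborhood, right after infected.add(j),
# reset the newly infected node's features to those of patient zero n:
X[j] = X[n]   # "feature copy": j inherits patient zero's feature vector
```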

Model analysis
In this section, we analyze our proposed model NEDA from the following aspects:

Time complexity The time complexity of performing one neighborhood expansion and multiple data augmentations for aggregation is linear w.r.t. the number of edges and the number of nodes, respectively, during the training of NEDA. Hence, its time complexity is only $O(M + N)$.
Model complexity Only one hyperparameter, $s_0$, is added on top of GraphSAGE to control the size of the extended neighborhood in NEDA. Due to the small-world characteristics of networks [1,29], $s_0$ is usually small and independent of network size.
Scalability In contrast to GCNII, Geom-GCN, and SimP-GCN, our neighborhood expansion is parallelizable and can be performed with only local neighborhood knowledge, making NEDA suitable for larger networks.
Relation to GraphSAGE GraphSAGE conceptually inspires NEDA's sampling from the set of all adjacent nodes. As shown in line 13 of Algorithm 1, if $\delta_I(I_n)$ is replaced by $\mathcal{N}(n)$, NEDA reduces to GraphSAGE. In addition, as a plug-in module on top of GraphSAGE, NEDA can be trained in minibatch mode and is capable of inductive network representation learning, i.e., no retraining is needed when new nodes are added [36].
Relation to SimP-GCN Like SimP-GCN, NEDA performs neighborhood expansion based on the cosine similarities between node features. The difference is that SimP-GCN generates a kNN graph from these similarities, independent of the original graph, whereas in NEDA the similarities directly guide the neighborhood expansion. NEDA is better than SimP-GCN because our neighborhood expansion can adaptively improve the clustering coefficient [3] of the rooted sub-network around each destination node. For example, $S_2$ or $S_4$ are more likely to be infected in the presence of $I = \{I_3, I_5\}$ than of $I = \{I_3\}$, as shown in Fig. 3. This improvement enhances the representation capability by comprehensively capturing much richer information from both the network structure and the node features.

Experiment
In this section, we first compare the various techniques used in the models, then evaluate the effectiveness of our NEDA and NEDA* against a wide range of competitive state-of-the-art baselines, and finally analyze the expanded neighborhood and perform an ablation study.

Experimental settings
Datasets We use four standard disassortative network datasets: (1) Actor, where each node corresponds to an actor and each edge between connected nodes denotes co-occurrence on the same Wikipedia page [41]; and (2) Cornell, (3) Texas, and (4) Wisconsin, three webpage datasets where each node denotes a web page and each edge a hyperlink between two pages [33]. An overview of the characteristics of these datasets is given in Table 1.
Experimental setup We train NEDA and NEDA* with the same hyperparameter settings in PyTorch [32]. We use the Adam SGD optimizer [20] with a learning rate of 0.01 and $L_2$ regularization of 5e-4, use the validation set for early stopping with a patience of 100 epochs, and set the number of layers $L$ to 2, the number of hidden units to 32, and the maximum number of epochs to 5000 across all four datasets. For the dataset Actor, with 7,600 nodes and about 33,000 edges, both $s_1$ and $s_2$ in data augmentation are searched over $\{5, 10, 15, 20, 25\}$, and $s_0$ in neighborhood expansion is tuned over $\{\lceil s_{\max} \rceil, \lceil 1.5 s_{\max} \rceil, \lceil 2 s_{\max} \rceil\}$, where $s_{\max} = \max\{s_1, s_2\}$. For the remaining three datasets with fewer than 300 nodes and about 1,700 edges each (Cornell, Texas, and Wisconsin), both $s_1$ and $s_2$ are chosen from $\{3, 6, 9, 12, 15, 18, 21, 24\}$, and $s_0$ from $\{\lceil s_{\max} \rceil, \lceil 1.25 s_{\max} \rceil, \lceil 1.5 s_{\max} \rceil, \lceil 1.75 s_{\max} \rceil, \lceil 2 s_{\max} \rceil\}$.
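The grids described above can be written down directly; a sketch (the dictionary layout and helper name are ours):

```python
import math

# Actor (7,600 nodes, ~33,000 edges).
actor_grid = {"s1": [5, 10, 15, 20, 25],
              "s2": [5, 10, 15, 20, 25],
              "s0_factors": [1.0, 1.5, 2.0]}

# Cornell, Texas, Wisconsin (< 300 nodes, ~1,700 edges each).
webpage_grid = {"s1": list(range(3, 25, 3)),      # {3, 6, ..., 24}
                "s2": list(range(3, 25, 3)),
                "s0_factors": [1.0, 1.25, 1.5, 1.75, 2.0]}

def s0_candidates(s1, s2, factors):
    """s_0 choices: ceil(f * s_max) for each factor f."""
    s_max = max(s1, s2)
    return [math.ceil(f * s_max) for f in factors]
```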
The techniques in the models Except for the feature-agnostic baseline LP [52], which only explores network structure, and two graph-agnostic baselines, MLP and kNN-GCN, which only explore node features, all models jointly capture both network structure and node features. To provide a more comprehensive understanding of all models, Table 2 lists some baselines and our models (NEDA and NEDA*) and specifies their technique types and model characteristics. In Table 2, the concatenation (CONC-1) operation used in Eq. (4) is taken as the combination operation. A concatenation (CONC-2) operation across all hidden layers at the same node is implemented to select intermediate latent representations adaptively. Neighborhood expansion (NEI-EXP) is designed to aggregate features of nodes at various distances in each hidden layer. Data augmentation (DATA-AUG) is applied at each message passing step via uniform sampling from the extended or entire local neighborhood. The feature similarity (FEAT-SIM) between pairwise nodes is used to prioritize nodes for aggregation and prevent node features from being "washed out". Deep (DEEP) GNNs indicate models whose number of layers is greater than or equal to 4. Some heterophily-related models (HET-REL) are specially designed for DNs. The references (REF) report the average performance of the corresponding model over 10 random splits. Note that the CONC-1, CONC-2, NEI-EXP, and DATA-AUG techniques, which are designed on top of the original network structure, do not apply to LP, MLP, and kNN-GCN; "-" means not applicable.

Performance comparison
Following the common experimental setup, we randomly split nodes into 60%, 20%, and 20% for training, validation, and testing. In Table 3, we reused all (12) metrics reported in Ref. [17] and the metric of NLMLP from Ref. [27]. In addition, we reproduced the experiments of MLP, DropEdge [37], and Grand [12] via CogDL [6], and the experiments of GraphSAGE, H2GCN, and CPGNN via the codes provided by their authors. Finally, we reported the average performance of each model, and of our models with the highest validation accuracy on the test set, over 10 random splits. The following observations are obtained from the comparison results:

• Since LP only uses the structural information of the networks, it obtains the worst performance among all models.

• MLP and kNN-GCN only employ node features, yet their performances improve significantly, indicating that node features are more important than network structure in DNRL.

• GCN, GAT, DropEdge, and Grand utilize both the topology and the features of nodes, but their performances are inferior to those of the two graph-agnostic baselines, attributable to the features of destination nodes being "washed out" by nodes with different labels. In contrast, GraphSAGE improves performance by alleviating this problem via DATA-AUG and by using CONC-1 instead of the weighted sum.

• Although the deep GNNs, i.e., JK-Net, GCNII, and GCNII*, obtain more complex or higher-order structural information while avoiding over-smoothing by reusing the original node features, they still fail to prevent node features from being "washed out". As a result, they are not as effective as GraphSAGE.

• The designs of Geom-GCN, (A+kNN)-GCN, SimP-GCN, and NLMLP are all based on neighborhood expansion, among which SimP-GCN and NLMLP achieve further performance gains. Like them, H2GCN and CPGNN learn representations for DNs.

• Our models achieve the best performances except against CPGNN-MLP-1 on the dataset Wisconsin, where the optimal and sub-optimal performances are highlighted in bold and underlined, respectively. The reasons are threefold. Firstly, the CONC-1 and DATA-AUG techniques of GraphSAGE [15] are integrated into our models. Secondly, the NEI-EXP and FEAT-SIM operations adopted by SimP-GCN and (A+kNN)-GCN, as well as infectious disease dynamics, are also incorporated into our models. Thirdly, both the deep GNNs and the GNNs with the CONC-2 operation require too many parameters to be trained, leading to insufficient memory and over-fitting, whereas our two-layer models with fewer learnable parameters reconcile the contradiction between shallow and deep GNNs by incorporating multi-hop neighborhood information.

Analysis of the expanded neighborhood
We analyze the size and order distributions of the extended neighbors, the data augmentation based on the extended neighbors, and the visualization and clustering of the node representations, only for the network Wisconsin due to space limitations.

The size distribution We visualize the node classification accuracies against the size of the expanded neighborhood via the scatter plots shown in Fig. 4. The red horizontal dotted line marks the 10th-best performance of our models across the different sizes $s_0$ of the expanded neighborhood, which is determined uniquely by the maximum of $s_1$ and $s_2$ (as discussed in Sect. 6.1). On the one hand, when $s_0 = 37$, the top 10 performances of our models, including the best performances of 89.4% for NEDA and 89.8% for NEDA*, are achieved, as marked by the black dashed vertical line. Moreover, even the worst performances at this size (87.0% for NEDA and 88.0% for NEDA*) are higher than that of SimP-GCN (85.5%). The optimal size of the extended neighborhood is 37 because too small an $s_0$ makes it challenging to capture long-range dependencies, while too large an $s_0$ would discard all structural information of the network. On the other hand, when $s_0 = 6$, our models degrade to their worst performances (79.8% for NEDA and 80.0% for NEDA*). Nevertheless, even these are comparable to the performances of some recent heterophily-specific equivalents (e.g., 64.1% for Geom-GCN-P and 77.8% for CPGNN-Cheby-2), empirically verifying the rationality of our neighborhood expansion.
The order distribution To better understand the order distribution of the expanded neighborhood, we define a coefficient $\mathrm{ned}(n)$ for each node $n$ as:

$$\mathrm{ned}(n) = \frac{1}{|\delta_I(I_n)|} \sum_{j \in \delta_I(I_n)} \mathrm{dist}(v_n, v_j)$$

where $\mathrm{dist}(v_n, v_j)$ denotes the distance, i.e., the length of the shortest path, between nodes $n$ and $j$ [19]. We observe the order distribution of the extended neighborhoods of our models with the highest validation accuracy (i.e., NEDA with $s_0 = 37$, $s_1 = 21$, $s_2 = 9$ and NEDA* with $s_0 = 37$, $s_1 = 21$, $s_2 = 3$) from a macro perspective by visualizing the average coefficient $\overline{\mathrm{ned}}$ over all nodes through histograms, as shown in Fig. 5. Unlike the models whose aggregation neighborhoods are entirely first-order, the aggregation neighborhoods of our models contain far fewer first-order neighbors than higher-order ones, especially second-order and third-order neighbors. This observation is consistent with the fact that in DNRL, the immediate neighborhood may be dominated by heterophily, whereas the higher-order neighborhood may be dominated by homophily, thus providing a more informative context [50].
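Under the definition above, $\mathrm{ned}(n)$ can be computed with a standard shortest-path routine; a sketch assuming networkx is available (the function name is ours):

```python
import networkx as nx

def ned(G, n, extended_nbrs):
    """Average shortest-path distance from n to its extended neighbors."""
    dists = [nx.shortest_path_length(G, source=n, target=j)
             for j in extended_nbrs[n]]
    return sum(dists) / len(dists)
```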
The data augmentation We explore the impact on performance of different combinations of parameter values from the first (hidden) and second (output) layers, represented by $\Theta = \{(s_1, s_2) \mid s_1, s_2 \in \{3, 6, \cdots, 24\}\}$. When the size of the extended neighborhood $s_0$ is fixed at its optimal value 37, the relationship between $s_1$, $s_2$, and $s_0$, specified as $s_0 = \lceil 1.75 \max\{s_1, s_2\} \rceil$ (as discussed in Sect. 6.1), is approximately refined into the following two formulas:

$$s_1 = 21, \quad s_2 \in \{3, 6, \cdots, 21\}$$
$$s_2 = 21, \quad s_1 \in \{3, 6, \cdots, 21\}$$

Fig. 6 A scatter plot of the top-13 combinations of our models on Wisconsin

Ablation study
To fairly compare the impact of the CONC-2 operation and of our neighborhood expansion on classification performance, we use the average performance of our models with the highest validation accuracy on the test set over 10 random splits, under the 48%, 32%, and 20% splits used in [50]. Essentially, NEDA, NEDA*, and GraphSAGE+JK are all variants of GraphSAGE based on different techniques for DNRL. Here, we take a deeper examination of the different techniques against the single baseline GraphSAGE to empirically quantify how each affects its performance. As can be seen from Fig. 8, all performance improvements obtained by the CONC-2 technique are negligible. On the contrary, our neighborhood expansion boosts performance by a large margin (e.g., roughly 10% on the dataset Cornell), indicating that our expanded neighborhood captures more information simultaneously from network structures and node features.

Conclusion
We propose neighborhood expansion as a preprocessing step for the neighborhood of nodes to generalize MPNNs to DNs. Data augmentation on the expanded neighborhood is performed at each training epoch for cheap computation and for stochastically receiving multiple augmented aggregated features from nodes with similar features. Finally, our NEDA and NEDA* are proposed by integrating the neighborhood expansion and data augmentation into an MPNN. Experimental results on real-world heterophilic network datasets empirically demonstrate that our models outperform extensive state-of-the-art baselines and confirm the rationality of the neighborhood expansion and data augmentation. In future work, we will explore a rewiring scheme to deal with the situation where all similarities between a specific node and its adjacent nodes are zero, i.e., when $Q = 0$ in Eq. (2). We will also further exploit the application of epidemic dynamics in heterogeneous graph neural networks [25,34].
Fig. 7 The visualization and clustering of the node representations

Fig. 8 The impact of different techniques on performance

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.