1 Introduction

The complex interactions in real-world data, such as social networks, biological networks and knowledge graphs, can be modelled as Heterogeneous Information Networks (HINs) (Sun and Han 2012), which are commonly associated with multiple types of nodes and relations. Take the academia HIN depicted in Fig. 1a as an example; it involves 4 types of nodes, including Papers (P), Authors (A), Institutions (I) and publication venues (V), and 8 types of relations. Owing to their capability to depict the complex inter-dependency between nodes, HINs have attracted increasing attention in the research community and have been applied in fields such as relational learning (Van Otterlo 2005), recommender systems (Zheng et al. 2021) and information retrieval (Wan et al. 2020). However, the complex semantics and non-Euclidean nature of HINs make them challenging to model with conventional machine learning algorithms designed for tabular or sequential data.

Over the past decade, a significant line of research on HINs has been Heterogeneous Graph Representation Learning (HGRL). The goal of HGRL is to learn latent node representations, which encode the complex graph structure and multi-typed nodes, for downstream tasks including link prediction (Wan et al. 2020), node classification (Wang et al. 2019) and node clustering (Fu et al. 2020). As discussed in a recent survey (Dong et al. 2020), one paradigm of HGRL is to manually define meta-paths to model a HIN’s rich semantics and to leverage random walks to transform the graph structure into a set of sequences (Dong et al. 2017; Fu et al. 2017; Shi et al. 2019), which can be further exploited by shallow embedding learning algorithms (Mikolov et al. 2013; Le and Mikolov 2014). A meta-path scheme is defined as a sequence of relations over the HIN’s network schema. For instance, an illustrative meta-path in the academia HIN in Fig. 1a is A\(\xrightarrow {Cite}\)A\(\xrightarrow { Write}\)P. Some follow-up shallow HGRL models try to avoid the requirement of manually defined meta-paths by developing jump-and-stay random walk strategies (Hussein et al. 2018), performing random walks guided by node contexts (Jiang et al. 2020), or switching to the network schema (Zhao et al. 2020). Nevertheless, these “shallow” methods neither support end-to-end training to learn more effective representations for a specific task nor fully utilise node attributes, due to the limitations of the embedding algorithms.

Recently, in view of the impressive success of Graph Neural Networks (GNNs) (Kipf and Welling 2017; Velickovic et al. 2018), a second paradigm of HGRL devises Heterogeneous Graph Neural Networks (HGNNs) (Schlichtkrull et al. 2019; Zhang et al. 2019; Fu et al. 2020), which extend various graph convolutions to HINs. Compared with “shallow” HGRL methods, HGNNs support an end-to-end training mechanism that can learn node representations with the assistance of a few labelled nodes, and they are empowered by more expressive encoders than shallow embedding learning methods. HGNNs can model both structure and node attributes in HINs with the guidance of meta-paths. However, they still rely on hand-crafted meta-paths to explicitly model the semantics of HINs, and obtaining meaningful and useful meta-paths to guide HGNNs remains highly non-trivial.

Fig. 1: An example HIN and a few meta-paths: a an academia HIN; b meta-paths designed for senior vs. junior authors

More precisely, existing meta-path guided HGRL methods simply assume that nodes of the same type share the same meta-paths. Take the academia HIN as an example (Fig. 1a) and assume we plan to learn node representations to determine the research area of \({ Author}\)s. A meta-path \(\Omega _{1}\): A\(\xrightarrow { Write}\)P\(\xrightarrow { Published}\)V may be useful to learn the representation of a senior researcher, since their published papers and publication venues may provide sufficient information to decide their research area. In contrast, when learning the representation of a junior PhD candidate with just a few published papers, we may need to extract information from their collaborators following the meta-path \(\Omega _{2}\): A\(\xrightarrow { Cite}\)A\(\xrightarrow { Write}\)P, because \(\Omega _{1}\) retains little information in the case of junior PhD candidates. Hence, we argue that we should generate a personalised meta-path for each “individual node” according to its attributes and neighbouring relational structure, instead of giving each “node type” several pre-defined meta-paths in general.

Motivated by the outstanding success of Reinforcement Learning (RL) in strategy selection problems (Mnih et al. 2015), previous methods attempt to apply RL techniques to find paths between given node pairs that model the similarity between the two nodes (Meng et al. 2015; Yang et al. 2018; Wan et al. 2020). The found paths are then fed into an encoder to learn representations for pairwise tasks such as link prediction. Nevertheless, challenges remain in designing personalised meta-paths for individual nodes for node-wise tasks.

Key challenges in personalised meta-path generation. First, the definition of meta-paths requires rich domain knowledge that is extremely difficult to obtain in complex and semantically rich HINs (Dong et al. 2020). Specifically, given a HIN G with a node type set N, a relation type set R and a fixed meta-path length T, the possible meta-paths form a set of size \((\vert N \vert \times \vert R \vert )^{T}\). Such a huge set can result in a combinatorial explosion as the scales of N, R and T increase. Second, the representation capacity of manually-defined meta-paths is limited to a specific task on a specific HIN, since different HINs with the same N and R may have different node attributes and relation type distributions. This requires defining appropriate meta-paths for each task on each HIN, which is extremely difficult in practical applications.

In light of these challenges, we propose to investigate HGRL with the objectives of (i) learning to automatically generate a personalised meta-path for each individual node in HINs, (ii) learning node representations effectively and efficiently with the personalised meta-paths, and (iii) retaining the end-to-end training strategy to achieve task-oriented optimisation. To achieve these objectives, we present a novel Personalised Meta-path based Heterogeneous Graph Neural Network (PM-HGNN) to unleash the power of HGRL.

Key Ideas of PM-HGNN. Generally, we aim to replace the human effort in meta-path generation with an RL agent, thereby addressing HGNNs’ dependency on hand-crafted meta-paths. Compared with experts possessing domain knowledge, the RL agent can adaptively generate personalised meta-paths for each individual node with respect to a specific task/HIN through sequential exploration and exploitation. That said, the obtained meta-paths are no longer tied to specific node types but are personalised for each individual node. Both the graph structure and node attributes are considered in the meta-path generation process, making it practicable for HINs with complex semantics.

As illustrated in Fig. 2, the meta-path generation process can be naturally considered a Markov Decision Process (MDP), in which the next relation used to extend the meta-path depends on the current state of the meta-path. Moreover, an HGNN model is proposed to learn node representations from the derived meta-paths, which can be applied to a downstream task, such as node classification. We propose to employ a policy network (agent) to solve this MDP and use an RL algorithm, driven by a reward function, to train the agent. The reward function is defined as the performance improvement over the historical performance, which encourages the RL agent to achieve better and more stable performance. In addition, we find that there exists a large computational redundancy during information aggregation along meta-paths; thus we develop an efficient strategy to achieve redundancy-free aggregation.

We showcase an instance of our framework, PM-HGNN, by implementing it with a classic RL algorithm, Deep Q-learning (Mnih et al. 2015). Besides, we propose an extension, PM-HGNN++, to deal with PM-HGNN’s issues of high computational cost and its neglect of relational information. Specifically, PM-HGNN generates a meta-path for each node according to node attributes only, while PM-HGNN++ further enables the meta-path generation to explore the structural semantics of the HIN. PM-HGNN++ not only significantly accelerates the HGRL process but also improves the effectiveness of the learned node representations, achieving promising performance on downstream tasks.

Main contributions. We summarise our contributions below:

  • We present a framework, PM-HGNN, to learn node representations in a HIN without hand-crafted meta-paths. An essential novelty of PM-HGNN is that the generated meta-paths are personalised to every individual node rather than general to each node type.

  • We propose an attention-based redundancy-free mechanism to reduce redundant computation during heterogeneous information aggregation on the derived meta-path instances.

  • We further develop an extension of PM-HGNN, PM-HGNN++, which not only improves the meta-path generation by incorporating node attributes and relational structure but also accelerates the training process.

  • Experiments conducted on node classification tasks with unsupervised and (semi-)supervised settings show that our framework significantly and consistently outperforms 16 competing methods (up to \(5.6\%\) Micro-F1 improvement). Advanced studies further reveal that PM-HGNN++ can identify meaningful meta-paths that human experts have ignored.

2 Related work

Relational learning In the past decades, research has focused on frameworks that can represent a variable number of entities and the relationships that hold amongst them. The interest in learning with this expressive representation formalism soon resulted in the emergence of a new subfield of machine learning described as relational learning (Van Otterlo 2005; Raedt 2008). For instance, TILDE (Blockeel and Raedt 1998) learns decision trees within inductive logic programming systems. Serafino et al. (2018) proposed an ensemble-based relational learning model for multi-type classification tasks in HINs. Petkovic et al. (2020) proposed a relational feature ranking method for relational data based on gradient-boosted relational trees. Lavrac et al. (2020) presented a unifying methodology combining propositionalisation and embedding techniques, which benefits from the advantages of both in solving complex relational learning tasks. Nevertheless, most of these methods do not leverage neural networks and thus fall behind in automatically mining complex HINs.

Graph neural networks Existing GNNs generalise the convolutional operations of deep neural networks to deal with arbitrary graph-structured data. Generally, a GNN model can be regarded as using the input graph structure to generate the computation graph of nodes for message passing; the local neighbourhood information is aggregated to obtain more effective contextual node representations (Bruna et al. 2014; Defferrard et al. 2016; Kipf and Welling 2017; Velickovic et al. 2018). However, the complex graph-structured data in the real world are commonly associated with multiple types of objects and relations. All of the GNN models mentioned above assume homogeneous graphs, so it is difficult to apply them to HINs directly.

Heterogeneous graph representation learning HGRL aims to project nodes in a HIN into a low-dimensional vector space while preserving the heterogeneous node attributes and edges. A recent survey presents a comprehensive overview of HGRL (Dong et al. 2020), covering shallow heterogeneous network embedding methods (Dong et al. 2017; Fu et al. 2017; Fan et al. 2018), and heterogeneous GNN-based approaches that are empowered by rather complex deep encoders (Meng et al. 2015; Schlichtkrull et al. 2019; Zhang et al. 2019; Wang et al. 2019; Fu et al. 2020; Hu et al. 2020; Zheng et al. 2021). The “shallow” methods are characterised as an embedding lookup table, meaning that they directly encode each node in the network as a vector, and this embedding table is the parameter to be optimised. However, they cannot utilise the node attributes and do not support the end-to-end training strategy. On the other hand, inspired by the recent outstanding performance of GNN models, some studies have attempted to extend GNNs for HINs. R-GCNs (Schlichtkrull et al. 2019) keep a distinct linear projection weight for each relation type. HetGNN (Zhang et al. 2019) adopts different recurrent neural networks for different node types to incorporate multi-modal attributes. HAN (Wang et al. 2019) extends GAT (Velickovic et al. 2018) by maintaining weights for different meta-path-defined edges. MAGNN (Fu et al. 2020) defines meta-path instance encoders, which are used to extract the structural and semantic information ingrained in the meta-path instances. However, all of these models require manual effort and domain expertise to define meta-paths in order to capture the semantics underlying the given heterogeneous graph.

A recent model, HGT (Hu et al. 2020), attempts to avoid the dependency on hand-crafted meta-paths by devising transferable relation scores, but its exploration range is limited by the number of layers, and it introduces a large number of additional parameters to optimise. GTN Yun et al. (2019) selects meta-paths from a group of adjacency matrices with learnable weights. The weights are shared among all nodes and are thus not flexible enough to generate node-specific meta-paths for each individual node. In addition, FSPG Meng et al. (2015), AutoPath Yang et al. (2018) and MPDRL Wan et al. (2020) attempt to employ RL techniques to discover paths between pairs of nodes and further learn node representations for predicting whether edges exist between node pairs. They assume the found paths explicitly represent the similarity between two nodes. However, they can only identify meta-paths that describe the similarity of two nodes, instead of generating meta-paths for individual nodes to learn their representations for node-wise tasks. Moreover, some works (Tanon et al. 2018; Ahmadi et al. 2020) concern the discovery of frequent patterns in a HIN and the subsequent transformation of these patterns into rules, aka rule mining, but the found patterns are not designed for specific tasks or nodes. Consequently, we believe it is necessary and essential to develop a new HGRL framework that supports the adaptive generation of personalised meta-paths for each node in a HIN for node-wise tasks.

Discussion Table 8 summarises the key advantages of PM-HGNN and compares it with a number of recent state-of-the-art methods. PM-HGNN is the first HGRL model that can adaptively generate personalised meta-paths for each individual node to support node-wise tasks and maintain the end-to-end training mechanism.

3 Preliminaries

3.1 Problem Statement

Definition 1

(Heterogeneous information network): A HIN is defined as a directed graph \(G=(V, E, N, R)\), associated with a node type mapping function \(\phi : V \rightarrow N\) and a relation type mapping function \(\varphi : E \rightarrow R\), where N and R are the sets of node and edge types, respectively. Node \(v_{i}\)’s attribute vector is denoted as \(x_i \in \mathbb {R}^{\lambda }\) (with the dimensionality \(\lambda \)).

Definition 2

(Meta-path): Given a HIN G, a meta-path \(\Omega \) with length T is defined as: \(\omega _{0} \xrightarrow {r_{1}} \omega _{1} \xrightarrow {r_{2}} \dots \xrightarrow {r_{T}} \omega _{T}\), where \(\omega _{j} \in N\) denotes a certain node type, and \(r_{j} \in R\) denotes a relation type.

Definition 3

(Meta-path Instance): Given a meta-path \(\Omega \), a meta-path instance p is defined as a node sequence following the schema defined by \(\Omega \).

Problem: Heterogeneous graph representation learning. We formulate heterogeneous graph representation learning (HGRL) as an information integration optimisation problem. For a given HIN G, let \(f: (V, E)\rightarrow \mathbb {R}^{d}\) be a function that integrates information from node attributes and the network structure into node representations. Without manually specifying any meta-paths, we aim to jointly generate \(\rho \) (e.g., \(\rho \)=1) meta-paths \(\{\Omega _{j}\}^{\rho }_{j=1}\) for each node \(v_i\in V\) to guide f to encode the rich structural and semantic information in the HIN, and accordingly learn representations for all nodes in G.

3.2 Markov decision process

A Markov Decision Process (MDP) is an idealised mathematical framework for describing a sequential decision process that satisfies the Markov property (Sutton and Barto 1998). An MDP can be formally represented by the quadruple \((S, A, \mathcal {P}, \mathcal {R})\), where S is a finite set of states, A is a finite set of actions, and \(\mathcal {P}: S \times A \rightarrow (0, 1)\) is a decision policy function that identifies the probability distribution of the next action given the current state. Specifically, the policy encodes the state and the available actions at step t to output a probability distribution \(\mathcal {P}(a_{t} \vert s_{t})\), where \(s_{t} \in S\) and \(a_{t} \in A\). \(\mathcal {R}\) is a reward function \(\mathcal {R}: S \times A \rightarrow \mathbb {R}\) evaluating the result of taking action \(a_{t}\) in the observed state \(s_{t}\).

Fig. 2: Illustration of generating meta-paths as an MDP

Modelling HGRL with MDP. As illustrated in Fig. 2, the meta-path generation process of HGRL can be naturally modelled as an MDP. To generate a meta-path with a maximum of 3 steps for \(v_{\textit{A-1}}\), take the first step as an example: the state \(s_{1}\) is the identifiable information of \(v_{\textit{A-1}}\), and the action set includes the relations in the HIN, i.e., \(\{ Work, Cite, \dots \}\). The decision maker selects one relation from the action set to extend the meta-path as \(a_{1} = { argmax}_{a \in A}(\mathcal {P}(a \mid s_{1}))\). Then, the selected meta-path is fed into the HGNN to learn node representations, which are applied to the downstream task to obtain a reward score \(\mathcal {R}(s_{1}, a_{1})\) that can be used to update \(\mathcal {P}\). We refer to Sect. 4 for more details about modelling HGRL with an MDP.

3.3 Solving MDP with reinforcement learning

Deep Reinforcement Learning (RL) is a family of algorithms that optimise an MDP with deep neural networks. At each step t, the RL agent takes an action \(a_{t} \in A\) based on the current state \(s_{t} \in S\), and observes the next state \(s_{t+1}\) as well as a reward \(r_{t} = \mathcal {R}(s_{t}, a_{t})\). Following the definition of the MDP, the agent acts as the decision policy \(\mathcal {P}\). We aim to search for the optimal decisions that maximise the expected discounted cumulative reward, i.e., we aim to learn an agent network \(\pi : S \rightarrow A \) that maximises \(\mathbb {E}_{\pi } [\sum _{t'=t}^{T} \gamma ^{t'}r_{t'}]\), where T is the maximum number of steps and \(\gamma \in [0, 1]\) is a discount factor balancing short-term and long-term gains; smaller \(\gamma \) values place more emphasis on immediate rewards (Arulkumaran et al. 2017).

Existing RL algorithms are mainly classified into two families: model-based and model-free algorithms. Compared with model-based algorithms, model-free algorithms offer better flexibility, since a learned environment model always risks modelling errors, which in turn affect the learned policy (Arulkumaran et al. 2017). We adopt a classic model-free RL algorithm, Deep Q-learning (DQN) (Mnih et al. 2015). The basic idea of DQN is to estimate the action-value function \(\mathcal {Q}^{*}\) by using the Bellman equation (Mnih et al. 2015) as an iterative update, based on the following intuition: if the optimal value \(\mathcal {Q}^{*}(s_{t+1}, a_{t+1})\) of the next state \(s_{t+1}\) were known for all possible actions \(a_{t+1} \in A\), then the optimal policy would select the action \(a_{t+1}\) maximising the expected value \(\mathcal {R}(s_{t}, a_{t}) + \gamma \mathcal {Q}^{*}(s_{t+1}, a_{t+1})\). It is common to use a function approximator \(\mathcal {Q}\) to estimate \(\mathcal {Q}^{*}\) (Mnih et al. 2015):

$$\begin{aligned} \mathcal {Q}(s_{t},a_{t}; \theta ) = \mathbb {E}_{s_{t+1}} \left[ \mathcal {R}(s_{t}, a_{t}) + \gamma \max _{a_{t+1}\in A} \left( \mathcal {Q}(s_{t+1}, a_{t+1}; \theta )\right) \right] , \end{aligned}$$
(1)

where \(\theta \) stands for the trainable parameters of the neural network used to estimate the decision policy. A value iteration algorithm will converge to the optimal action-value function \(\widetilde{\mathcal {Q}}\), i.e., \(\mathcal {Q} \rightarrow \widetilde{\mathcal {Q}}\), as \(t \rightarrow \infty \).

DQN exploits two techniques to stabilise training: (1) a memory buffer \(\mathcal {D}\) that stores the agent’s experience in a replay memory, which can be accessed to perform weight updates; (2) a separate target Q-network (\(\hat{\mathcal {Q}}\)) that generates the targets for Q-learning and is periodically updated.
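
The two stabilisation tricks can be summarised in a few lines; the following is a minimal sketch in PyTorch, assuming a small MLP Q-network, a buffer capacity of 10,000 transitions and a synchronisation interval of 100 steps (all illustrative choices, not values from the paper).

```python
import random
from collections import deque

import torch.nn as nn

# online Q-network and separate target network Q_hat (illustrative sizes)
q_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
target_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
target_net.load_state_dict(q_net.state_dict())   # Q_hat starts as a copy of Q

memory = deque(maxlen=10_000)                     # replay memory D

def store(s_t, a_t, s_next, r_t):
    """Store one transition; random minibatches are later drawn for updates."""
    memory.append((s_t, a_t, s_next, r_t))

def sample(batch_size=32):
    return random.sample(memory, min(batch_size, len(memory)))

def maybe_sync(step, sync_every=100):
    """Periodically copy the online network's weights into the target network."""
    if step % sync_every == 0:
        target_net.load_state_dict(q_net.state_dict())
```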

4 The PM-HGNN Framework

We present an overview of the proposed PM-HGNN in Fig. 3a; it consists of two components: an RL agent module and an HGNN module. According to the states, the RL agent predicts an action for each individual node so as to arrive at better rewards. Next, we generate meta-path instances based on the generated personalised meta-paths to support the information aggregation of the HGNN and learn effective node representations. Finally, we apply the generated representations to the downstream task for performance evaluation to obtain reward scores, and save the states, actions and reward scores into the RL agent for subsequent updating.

Fig. 3: Overview of PM-HGNN and PM-HGNN++

4.1 Personalised Meta-path Generation with RL

A personalised meta-path generation process with a maximum of T steps for each node can be modelled as a T-round decision-making process, which can naturally be treated as an MDP. We elaborate on the alignment between each MDP component and the personalised meta-path generation process in the following.

State (S): The state is a vector used to assist the decision policy in selecting a relation type to extend the personalised meta-path for each node. Hence, it is crucial to comprehensively encode the existing part of a meta-path into the state. We adopt a gating mechanism to adaptively update the state. Take \(v_{i}\)’s meta-path \(\Omega \) (starting from node type \(N_{v_{i}}\)) as an example; the state \(s_{t}\) of \(\Omega \) at step t is formally defined as:

$$\begin{aligned} s_{t} = q \circ \left( \frac{1}{\vert D(v_{i}) \vert } \sum _{j \in D(v_i)} x_{j}\right) + (1-q) \circ s_{t-1}, \end{aligned}$$
(2)

where \(\circ \) stands for the Hadamard product, \(D(v_{i})\) represents the set of past nodes on \(v_i\)’s meta-path at step t, and \(\frac{1}{\vert D(v_{i}) \vert } \sum _{j \in D(v_{i})} x_{j}\) is the average of the past nodes’ attribute vectors. \(s_{t-1}\) stands for the state at step \(t-1\). q is the update gate that determines whether to update the state with the past nodes’ attributes; we estimate it by exploring the relationship between the past nodes’ attributes and the previous state. It is formally defined as: \(q = Sigmoid\left( f_{\varphi }\left( (\frac{1}{\vert D(v_i) \vert } \sum _{j \in D(v_i)} x_{j}) \parallel s_{t-1}\right) \right) \), where \(f_{\varphi }\) can be seen as a shared convolutional kernel (Velickovic et al. 2018), and \(1-q\) is the reset gate.
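
For concreteness, the gated state update of Eq. 2 can be sketched as follows; here \(f_{\varphi }\) is realised as a single linear layer over the concatenated vectors, which is an assumption made for illustration rather than the paper’s exact kernel.

```python
import torch
import torch.nn as nn

class StateUpdater(nn.Module):
    """Gated state update of Eq. 2 (minimal sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.f_phi = nn.Linear(2 * dim, dim)  # shared kernel over [mean_attr || s_{t-1}]

    def forward(self, past_attrs, s_prev):
        # past_attrs: (|D(v_i)|, dim) attributes of nodes already on the path
        mean_attr = past_attrs.mean(dim=0)                              # average of past nodes
        q = torch.sigmoid(self.f_phi(torch.cat([mean_attr, s_prev])))   # update gate
        return q * mean_attr + (1 - q) * s_prev                          # Eq. 2

updater = StateUpdater(dim=8)
s_t = updater(torch.randn(3, 8), torch.zeros(8))
```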

Action (A): The action space is the set of relation types that the policy network can choose from to extend the meta-path, and each relation type is labelled with a positive integer. Note that we add a special action STOP to allow each node to have a flexible-length meta-path. Beginning with the starting node \(v_{i}\), the decision policy iteratively predicts which relation can lead to a higher reward score and accordingly uses it to extend the current meta-path. The decision policy selects the action STOP to finish the path generation process if it encounters a state in which appending any extra relation to the current meta-path would hurt performance on the downstream task.

Decision policy (\(\mathcal {P}\)): The decision policy aims at mapping a state in S to an action in A. The state space and the action space are continuous and discrete, respectively; thus, we use a deep neural network to approximate the action-value function: \(\mathcal {P}(a_{t} \vert s_{t}; \theta ): S \times A \rightarrow (0, 1)\). Besides, since any action \(a \in A\) is always a positive integer, we use DQN (Mnih et al. 2015) to optimise the meta-path generation problem. We utilise an MLP (Haykin 1999) as the decision policy network in the DQN, defined as: \(z_{1} = W^T_{1} s_{t} + b_{1}\), \(z_{2} = W^T_{2} z_{1} + b_{2}\), ..., \(\hat{P} = { Softmax}(\phi _{m}(W^T_{m} z_{m-1} + b_{m}))\), where \(W_{m}\) and \(b_{m}\) denote the weight matrix and bias vector of the \(m\)-th layer’s perceptron, respectively. The output \(\hat{P} \in (0, 1)\) represents the probabilities of selecting the different relations \(a_{t} \in A\) to extend the meta-path. Note that it is possible to adopt other RL algorithms to optimise the policy network; here, we utilise a basic RL algorithm to illustrate our framework’s main idea and demonstrate its effectiveness.
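
The following sketch illustrates such a policy network together with a simple \(\epsilon \)-greedy action selection over relation types; the relation list and layer sizes are hypothetical and only serve to make the mapping from state to action concrete.

```python
import torch
import torch.nn as nn

RELATIONS = ["MA", "AM", "MD", "DM", "STOP"]   # hypothetical action set for IMDb

# MLP policy: state vector -> probability over relation types (\hat{P} in the text)
policy = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, len(RELATIONS)),
    nn.Softmax(dim=-1),
)

def select_action(state, epsilon=0.1):
    """Epsilon-greedy selection over relation types, as commonly used with DQN."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(len(RELATIONS), (1,)).item()   # explore
    with torch.no_grad():
        probs = policy(state)                               # exploit \hat{P}
    return int(probs.argmax())

a_t = select_action(torch.randn(16))
print(RELATIONS[a_t])
```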

Reward function (\(\mathcal {R}\)): We devise a reward function to encourage the RL agent to achieve better and more stable performance on downstream tasks. We define the reward function \(\mathcal {R}\) as the performance improvement on the specific downstream task compared with the historical performance, given by:

$$\begin{aligned} \mathcal {R}(s_{t}, a_{t}) = \mathcal {M}(s_{t}, a_{t}) - \frac{\sum _{j=t-b}^{t-1} \mathcal {M}(s_{j}, a_{j})}{b}, \end{aligned}$$
(3)

where \(\frac{\sum _{i=t-b}^{t-1} \mathcal {M}(s_{j}, a_{j})}{b}\) is the baseline performance value at step t, which contains the historical performances of the last b steps. \(\mathcal {M}(s_{t}, a_{t})\) is the performance based on the learned node representation \(H^{t}[v_{i}]\) on the downstream task (e.g., node classification). And use its accuracy on the validation set as the evaluation performance \(\mathcal {M}\).

Optimisation The proposed meta-path generation at step t consists of three phases: (i) obtaining the state \(s_{t}\); (ii) predicting an action \(a_{t}=\arg \max _{a \in A}(\mathcal {Q}(s_{t}, a; \theta ))\) to extend the meta-path according to the current state \(s_{t}\); (iii) updating state \(s_{t}\) to \(s_{t+1}\). Moreover, we train the policy network (Q-function) by optimising Eq. 1 with the reward function defined in Eq. 3, and the loss function is given by:

$$\begin{aligned} \mathcal {L}_{Q}(\theta ) = \mathbb {E}_{\mathcal {T} \sim U(\mathcal {D})}\left[ \left( \mathcal {R}(s_{t}, a_{t}) + \gamma \max _{a_{t+1} \in A} \hat{\mathcal {Q}}(s_{t+1}, a_{t+1}; \theta ^{-}) - \mathcal {Q}(s_{t}, a_{t}; \theta ) \right) ^{2}\right] , \end{aligned}$$
(4)

where \(\mathcal {T} = (s_{t}, a_{t}, s_{t+1}, \mathcal {R}(s_t, a_t))\) is a transition randomly sampled from the memory buffer \(\mathcal {D}\), \(\theta ^{-}\) is the set of parameters of the separate target Q-network \(\hat{\mathcal {Q}}\), \(\max \limits _{a_{t+1}\in A} \hat{\mathcal {Q}}(s_{t+1}, a_{t+1}; \theta ^{-})\) stands for the optimal target value, and \(\mathcal {Q}(s_{t}, a_{t}; \theta )\) is the predicted reward value based on the training Q-network. We optimise the Q-network’s parameters by minimising this loss function.
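
Eq. 4 is a standard squared temporal-difference error with a target network; the following self-contained sketch shows one such update step, with network sizes, learning rate and \(\gamma \) chosen only for illustration.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, num_actions, gamma = 16, 5, 0.9
q_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, num_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, num_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)   # replay buffer D of (s_t, a_t, s_{t+1}, r_t)

def dqn_update(batch_size=32):
    """One gradient step on the squared TD error of Eq. 4."""
    if len(memory) < batch_size:
        return
    s, a, s_next, r = zip(*random.sample(memory, batch_size))
    s, s_next = torch.stack(s), torch.stack(s_next)
    a, r = torch.tensor(a), torch.tensor(r, dtype=torch.float32)

    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)         # Q(s_t, a_t; theta)
    with torch.no_grad():                                          # target uses theta^-
        q_target = r + gamma * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q_pred, q_target)                            # squared TD error

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```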

4.2 Information aggregation with personalised meta-paths

Due to the heterogeneity of nodes in a HIN, different node types have different attribute spaces. Before the information aggregation, we first apply a node type-specific transformation to project the features of different node types into the same space, given by: \(H^{0}[v_{i}] = W_{\omega _{i}} \cdot x_{i}\), where \(x_{i}\) and \(H^{0}[v_{i}]\) are the original attributes and projected features of node \(v_{i}\), and \(W_{\omega _{i}}\) is a learnable transformation matrix for the node type \(\omega _{i} = \phi (v_i)\). Next, we perform the information aggregation over the generated meta-path instances to learn effective node representations for the downstream task. Take node \(v_{i}\) as an example, and let its obtained meta-path be \(\Omega \).
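
A minimal sketch of this type-specific projection is given below, assuming each node type has its own attribute dimensionality; the type names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TypeSpecificProjection(nn.Module):
    """H^0[v_i] = W_{omega_i} * x_i with one learnable matrix per node type."""
    def __init__(self, type_dims, out_dim):
        super().__init__()
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim, bias=False)
                                   for t, d in type_dims.items()})

    def forward(self, node_type, x):
        return self.proj[node_type](x)        # projected feature H^0[v_i]

proj = TypeSpecificProjection({"Movie": 20, "Actor": 12, "Director": 12}, out_dim=8)
h0_movie = proj("Movie", torch.randn(20))
```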

Fig. 4: Comparison between information aggregation by conventional meta-path instances and the proposed redundancy-free computation. a The sequential aggregation path instances generated from the meta-path \(\Omega \): P\(\xrightarrow { Written\_by}\)A\(\xrightarrow { Cite}\)A\(\xrightarrow { Write}\)P (A: Author, P: Paper). b The improved aggregation structure produced by our redundancy-free computation method, with attention scores to distinguish messages from different nodes

We first generate meta-path instances \(\{p_{1}, p_{2}, \dots \}\) based on meta-path \(\Omega \). As the example in Fig. 4 shows, there can be many meta-path instances that follow meta-path \(\Omega \) (e.g., \(\Omega : { Paper} \xrightarrow { Written\_by} { Author} \xrightarrow { Cite} { Author} \xrightarrow { Write} { Paper}\)). An intuitive approach to performing information aggregation is sequential aggregation, as in HetGNN Zhang et al. (2019) and HAN Wang et al. (2019). However, we argue that these aggregation paths contain computational redundancy and that the aggregation can be further improved. For example, \(P_{3} \rightarrow A_{2}\) and \(P_{5} \rightarrow A_{2}\) are repeatedly calculated in the first aggregation step. We aim to merge these two instances into one process \(\{P_{3}, P_{5} \} \rightarrow A_{2}\) to reduce redundant computation, termed redundancy-free aggregation. Besides, we learn an attention score \(Att(v_{i},v_{j})\) for each link in the aggregation path, where nodes \(v_{i}\) and \(v_{j}\) are the two ends of the link, so that messages from different nodes can be distinguished from each other.

Let \(H^{0}[v_i]\) be the projected features of node \(v_{i}\) involved in the instance set \(\{p_{1}, p_{2}, \dots \}\) of meta-path \(\Omega \). We give the updating procedure of \(H[v_{i}]\):

$$\begin{aligned} H^{l}[v_{i}] = \mathop { Aggregate}\limits _{v_{j} \in I(v_{i})} \left( { Att}(v_{i}, v_{j}) \cdot H^{l-1}[v_{j}]\right) , \end{aligned}$$
(5)

where \(I(v_{i})\) indicates the set of past nodes of \(v_{i}\) in \(\{p_{1}, p_{2}, \dots \}\), \(l \in \{1, 2, \dots , t \}\) is the index of the aggregator performing the information aggregation, and \({ Aggregate}(\cdot ) = Relu ({ Mean}(\cdot ))\). The operator \({ Att(\cdot )}\) calculates the importance of the received messages using the relation between past messages and node features, given by:

$$\begin{aligned} { Att}(v_{i}, v_{j}) = \mathop { Softmax}\limits _{j \in I(v_{i})} \left( LeakyRelu \left( W H^{l-1}[v_{j}] \parallel W H^{l-1}[v_{i}]\right) \right) , \end{aligned}$$
(6)

where W denotes the trainable parameters, and \(\parallel \) is the concatenation operator. Sect. 5.4 presents an empirical study revealing how the redundancy-free aggregation affects time efficiency.
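
The following sketch shows Eqs. 5 and 6 for a single target node over its de-duplicated set of past nodes; the scalar scoring layer used to reduce the concatenated features to an attention logit is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RedundancyFreeAggregator(nn.Module):
    """Attention-weighted, redundancy-free aggregation for one target node (Eqs. 5-6)."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.score = nn.Linear(2 * dim, 1, bias=False)   # reduces [Wh_j || Wh_i] to a logit

    def forward(self, h_i, h_neighbours):
        # h_i: (dim,); h_neighbours: (|I(v_i)|, dim), each past node counted once
        wh_j = self.W(h_neighbours)                        # W H^{l-1}[v_j]
        wh_i = self.W(h_i).expand_as(wh_j)                 # W H^{l-1}[v_i], broadcast
        e = F.leaky_relu(self.score(torch.cat([wh_j, wh_i], dim=-1)))
        att = torch.softmax(e, dim=0)                      # Eq. 6: softmax over I(v_i)
        return F.relu((att * h_neighbours).mean(dim=0))    # Eq. 5: Relu(Mean(...))

agg = RedundancyFreeAggregator(dim=8)
h_new = agg(torch.randn(8), torch.randn(4, 8))
```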

Algorithm 1: The PM-HGNN framework

4.3 Model training

Training of HGNN Finally, the updated node representation \(H^t[v_{i}]\) can be applied to downstream tasks. Take semi-supervised node classification as an example: with a small fraction of labelled nodes, we can optimise the HGNN’s parameters by minimising the cross-entropy loss \(\mathcal {L}\) via back-propagation and gradient descent: \(\mathcal {L} = - \sum \limits _{v \in \mathcal {V}_{L}} \sum \limits _{c=0}^{C-1}y_{v}[c] \cdot \log (H^{t}[v][c])\), where \(\mathcal {V}_{L}\) stands for the set of labelled nodes, C is the number of classes, \(y_{v}\) is the one-hot label vector of node v, and \(H^{t}[v]\) is the learned representation vector of node v. Meanwhile, the learned node representations can be evaluated by a task-specific evaluation metric \(\mathcal {M}_{ eval}\) to obtain a reward score. We summarise the PM-HGNN framework in Algorithm 1. Note that the PM-HGNN framework maintains the ability of HGNNs to learn representations for nodes newly added to the graph: the trained RL agent can adaptively generate a meta-path for the new node to compute its representation.
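
As a minimal sketch, the semi-supervised objective over the labelled node set \(\mathcal {V}_{L}\) can be written as follows, assuming the final representations are mapped to per-class scores; the shapes and indices are illustrative.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(H_t, labels, labelled_idx):
    """Cross-entropy over labelled nodes only (the loss L above)."""
    # H_t: (num_nodes, C) per-class scores derived from the node representations
    # labels: (num_nodes,) integer class labels; labelled_idx: indices of V_L
    log_probs = F.log_softmax(H_t[labelled_idx], dim=-1)
    return F.nll_loss(log_probs, labels[labelled_idx])

loss = semi_supervised_loss(torch.randn(10, 3), torch.randint(3, (10,)),
                            torch.tensor([0, 2, 5]))
```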

4.4 PM-HGNN++

As seen in previous sections, PM-HGNN adopts an RL agent to adaptively generate meta-paths for HGRL. However, reviewing the overall workflow of PM-HGNN, we identify two limitations: (1) PM-HGNN neglects the relational structure of the HIN while generating meta-paths; (2) the HGNN requires a large number of epochs to train its parameters and obtain node representations, which restricts PM-HGNN’s efficiency. To be more specific, PM-HGNN generates meta-paths by considering only the node attribute information of the HIN, i.e., the state \(s_{t}\) summarises the attributes of nodes involved in the meta-path as in Eq. 2. However, the HIN’s semantic structure is also important information for assisting the meta-path generation. In addition, similar to the training process of other deep neural networks (Kipf and Welling 2017), the encoder of PM-HGNN needs a number of epochs to train its parameters to learn effective node representations for a given set of meta-path instances. In this way, if the HGNN needs B epochs to train its parameters, then we need \(T \times B\) epochs to complete a meta-path generation process with a maximum number of steps T. Meanwhile, the RL agent also needs many explorations to obtain enough samples (\(\mathcal {T}=(s_{t}, a_{t}, s_{t+1}, \mathcal {R}(s_t, a_t))\)) to train the policy network, which can result in a combinatorial explosion. Finally, following the average aggregation approach to defining node states in Eq. 2, PM-HGNN is not able to distinguish node states with categorical attributes.

New State (S). To address the identified limitations, we propose an extension of PM-HGNN, PM-HGNN++, which utilises the structure information of the HIN and significantly accelerates the meta-path generation process. Our solution is to define a novel state to replace Eq. 2, because we notice that the encoder (i.e., HGNN) of PM-HGNN can already summarise the node attributes and topological structure information involved in meta-paths to assist meta-path generation. In particular, we utilise the latest node representation vector \(H^{t-1}[v_i]\) of the meta-path’s starting node \(v_{i}\) as the state. The new state (S) can be formally described as:

$$\begin{aligned} s_{t} = { Normalise}(H^{t-1}[v_{i}]), \end{aligned}$$
(7)

where \({ Normalise}\) is a normaliser fitted on the first B generated states to convert the states into the same distribution; the input is normalised as \(\frac{H[v_{i}]-H_{mean}}{H_{std}}\), where \(H_{mean}\) and \(H_{std}\) are calculated from the first B states. Note that such a normaliser is a common trick in deep RL for stabilising training when S is very sparse (Hou et al. 2017). In addition, \(H^{t-1}[v_i]\) contains not only node attributes but also the relevant graph structure information of each node; hence, \(s_t\) of Eq. 7 can distinguish different nodes with categorical attributes. This endows PM-HGNN++ with the ability to handle graphs with diverse node attributes.
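
A minimal sketch of such a normaliser is shown below; it fits the mean and standard deviation on the first B observed states and standardises all later states, with the value of B and the pass-through behaviour during warm-up being illustrative assumptions.

```python
import torch

class StateNormaliser:
    """Fit H_mean and H_std on the first B states, then standardise later states (Eq. 7)."""
    def __init__(self, warmup=64):
        self.warmup, self.buffer = warmup, []
        self.mean, self.std = None, None

    def __call__(self, h):
        if self.mean is None:
            self.buffer.append(h)
            if len(self.buffer) >= self.warmup:        # fit on the first B states
                stacked = torch.stack(self.buffer)
                self.mean = stacked.mean(dim=0)
                self.std = stacked.std(dim=0) + 1e-8
            return h                                    # pass through during warm-up
        return (h - self.mean) / self.std               # (H[v_i] - H_mean) / H_std

normalise = StateNormaliser(warmup=4)
s_t = normalise(torch.randn(8))
```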

Algorithm 2: The PM-HGNN++ framework

New training process. The training process can be further improved based on the new definition of the state. The learned node representations can be used both to optimise the HGNN’s parameters and to update the RL agent. We achieve such a mutual optimisation between the RL agent and the HGNN because the meta-paths are generated based on the current status of the HGNN. With this novel training process, we only need T epochs, instead of \(T \times B\) epochs, to complete a meta-path generation procedure, because we do not need to wait for the encoder to complete an entire training process. We summarise PM-HGNN++ in Algorithm 2. We perform an empirical analysis to compare the time consumption of PM-HGNN and PM-HGNN++ in Sect. 5.4. Note that, similar to PM-HGNN, the PM-HGNN++ framework can adaptively generate a meta-path for a newly added node to compute its representation, thereby maintaining the ability of HGNNs to learn representations for new nodes.

4.5 Model analysis

The proposed models can deal with various types of nodes and relations and fuse the rich semantics in HINs. Benefiting from the RL agent, personalised meta-paths are generated for different nodes, and the HGNN encoder allows information to be transferred between nodes via diverse relations. For the RL agent adopted in this paper, Deep Q-learning (DQN), it is hard to give a precise computational complexity (Fan et al. 2020); hence, we report the empirical meta-path generation time on the IMDb and DBLP datasets in Sect. 5.4 (Fig. 8). Given a meta-path \(\Omega \) and node representations of (hidden) dimensionality d, the time complexity of the node representation learning process is \(\mathcal {O}(V_{\Omega } d^2 + E_{\Omega }d)\), where \(V_{\Omega }\) is the number of nodes following the meta-path \(\Omega \) and \(E_{\Omega }\) is the number of meta-path based node pairs. \(\mathcal {O}(V_{\Omega } d^2)\) accounts for the representation transformation process and \(\mathcal {O}(E_{\Omega }d)\) represents the \(Att(\cdot )\) computation over the relevant edges. To further unfold the relationship between the complexity of the generated meta-paths and the performance, we also report how the maximum step T and the number of aggregation paths affect the node classification performance on the IMDb dataset in Sect. 5.4 (Fig. 6).

5 Experiments

5.1 Experimental settings

Datasets We adopt three HIN datasets (IMDb, DBLP and ACM) (Wang et al. 2019; Fu et al. 2020) from different domains to evaluate our models’ performance. Their statistics are summarised in Table 4, and the meta-relation schemas are shown in Fig. 9. Detailed descriptions of the datasets are given in Appendix A.

Competing methods and model configuration We compare our models’ performance against various state-of-the-art models, including 5 homogeneous graph representation learning models: LINE Tang et al. (2015), DeepWalk Perozzi et al. (2014), MLP, GCN Kipf and Welling (2017) and GAT Velickovic et al. (2018); 10 HGRL models: Esim Shang et al. (2016), metapath2vec Dong et al. (2017), JUST Hussein et al. (2018), HERec Shi et al. (2019), NSHE Zhao et al. (2020), RGCN Schlichtkrull et al. (2019), HAN Wang et al. (2019), GTN Yun et al. (2019), MAGNN Fu et al. (2020) and HGT Hu et al. (2020); and 1 state-of-the-art relational learning model, PropStar (Lavrac et al. 2020). Note that FSPG Meng et al. (2015), AutoPath Yang et al. (2018) and MPDRL Wan et al. (2020) do not support node-wise tasks since they learn paths that explicitly represent the similarity between pairs of nodes (as discussed in Sect. 2); therefore, we cannot compare against them. Detailed model descriptions and implementation configurations are given in Appendix C and Appendix E.

Evaluation settings We evaluate our models on node classification. For the semi-supervised setting, we adopt the same settings as MAGNN Fu et al. (2020): 400 instances for training, another 400 instances for validation, and the remaining nodes for testing. The generated node representations are further fed into a support vector machine (SVM) classifier to get the prediction results. For the supervised setting, we use the same percentage of nodes for training and validation, respectively, and the rest of the nodes for testing. We report the average Micro-F1 and Macro-F1 scores over 10 runs with different seeds.
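
As a rough sketch of this evaluation protocol, the learned embeddings can be fed to scikit-learn’s LinearSVC and scored with Micro-/Macro-F1; the data shapes, split sizes and default hyper-parameters below are illustrative only and do not correspond to the actual datasets.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

emb = np.random.rand(1000, 64)           # learned node representations (placeholder)
labels = np.random.randint(0, 3, 1000)   # node classes (placeholder)

# illustrative split: 400 train / 400 validation (unused here) / rest test
train_idx, test_idx = np.arange(400), np.arange(800, 1000)

clf = LinearSVC().fit(emb[train_idx], labels[train_idx])
pred = clf.predict(emb[test_idx])
print("Micro-F1:", f1_score(labels[test_idx], pred, average="micro"))
print("Macro-F1:", f1_score(labels[test_idx], pred, average="macro"))
```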

Table 1 Experiment results (%) on the IMDb, DBLP and ACM datasets for the node classification task with unsupervised and semi-supervised settings. MP2V stands for metapath2vec

5.2 Experimental results

Semi-supervised node classification. We present the results in Table 1, in which competing methods that cannot support the semi-supervised setting use the unsupervised setting. PM-HGNN++ performs consistently better than all competing methods across different proportions and datasets. On IMDb, the performance gain obtained by PM-HGNN++ over the best competing method (MAGNN) is around \(3.7\%-5.08\%\). GNN models designed for homogeneous and heterogeneous graphs perform better than shallow HGRL models, which demonstrates that modelling heterogeneous node attributes improves performance significantly. On DBLP and ACM, all models perform better overall than on IMDb. It is interesting to observe that, unlike on IMDb, the shallow heterogeneous network methods JUST and NSHE obtain better performance than a number of homogeneous GNNs; that is, the heterogeneous relations in DBLP and ACM are useful for the node classification task. PM-HGNN++ clearly outperforms the strongest competing method (MAGNN) by up to \(2.4\%\), showing the generality and superiority of PM-HGNN++. In addition, among unsupervised competing methods, NSHE obtains better performance than the others, showing that the network schema of a HIN helps obtain better representations.

Table 2 Experiment results (%) on the IMDb, DBLP and ACM datasets for the node classification task with supervised settings

Supervised node classification. We present the results in Table 2. We can see that PM-HGNN++ consistently outperforms all competing models on IMDb, DBLP and ACM datasets, with up to \(5.6\%\) in terms of Micro-F1. Heterogeneous GNNs outperform homogeneous GNNs, and our PM-HGNN++ achieves the best performance. This demonstrates the usefulness of heterogeneous relations and the advantages of generating appropriate personalised meta-paths for each node according to its attributes and relational structure.

5.3 Meta-path analysis

Meta-path generation process visualisation We visualise how the RL agent in PM-HGNN++ generates personalised meta-paths for each target node in Figs. 10, 11 and 12. Fig. 10 summarises the actions taken by the RL agent on IMDb with different maximum steps under the semi-supervised setting. The percentages marked in the figure indicate the fraction of nodes choosing the corresponding relation to extend the meta-path at that step. For example, consider the meta-path \({ M\xrightarrow {64.8\%}A\xrightarrow {8.8\%}M}\) with \(T=2\): \(64.8\%\) means that \(64.8\%\) of Movie nodes select the MA relation at step 1, and \(8.8\%\) means that, among the Movie nodes that selected MA at step 1, \(8.8\%\) select AM to extend the meta-path. It is interesting to see that the RL agent selects the relation MA at the first step more often than the other relation, MD. This means that a movie’s characteristics are more related to its actors than to its director. Besides, at step \(T=2\), the RL agent selects plenty of STOP actions. This shows that two short meta-paths, \({ Movie}\xrightarrow { MA} { Actor}\) and \({ Movie} \xrightarrow { MD} { Director}\), are informative enough for the majority of nodes to learn effective representations. Moreover, \(6.7\%-9.4\%\) of nodes do not need any meta-paths to learn their representations, which implies that their attributes alone provide enough information. This is also reflected in the results of MLP in Sect. 5.2, which utilises no structural information but only node attributes. More analyses of Figs. 11 and 12 can be found in Appendix B.

Table 3 Meta-paths designed by PM-HGNN++ on IMDb

Overall meta-path statistics. We further present the meta-paths generated for most of the target nodes in Table 3, which are summarised from Fig. 10. Tables 6 and 7 present the generated meta-paths on DBLP and ACM, respectively. We can see from Table 3 that the RL agent can generate the same meta-paths as manually-defined ones. Besides, we find that the meta-paths specified by human experts are not the most useful ones for learning representations of Movie nodes; the two short meta-paths \({ Movie} \xrightarrow { MA} { Actor}\) and \({ Movie} \xrightarrow { MD} { Director}\) are the most useful. This indicates that the participants (director and actors) of a movie largely determine its type, as the task is to predict the class of a movie. The top three meta-paths generated by the strongest competing method, GTN, confirm our observation that the two shorter meta-paths are more valuable for learning Movie nodes’ representations. However, GTN can only select several meta-paths for every node type, whereas our model can identify personalised meta-paths for each node. More analyses of Tables 6 and 7 can be found in Appendix B.

Fig. 5: The average meta-path length generated by PM-HGNN++ on different datasets. MLP (pos)/MLP (neg) are the groups of nodes that MLP classifies correctly and incorrectly, respectively

Meta-path comparison To understand how PM-HGNN++ generates personalised meta-paths for different nodes, we present another analysis that investigates the lengths of the meta-paths designed for nodes that can be correctly classified from their raw attributes. Specifically, we divide nodes into two groups, pos and neg, according to whether a multi-layer perceptron (MLP) applied to node attributes produces a correct classification under the supervised setting. That is, correctly-classified nodes and incorrectly-classified ones are grouped into pos and neg, respectively. Then, we calculate the average lengths of the meta-paths PM-HGNN++ generates for the nodes in each group. The results are reported in Fig. 5. We find that nodes that cannot be correctly classified by MLP are associated with longer personalised meta-paths. Such results deliver an important insight: PM-HGNN++ tends to explore the deeper relational neighbouring structure to enhance node representation learning, i.e., to discriminate nodes with different labels from one another, when node attributes themselves cannot provide sufficient information. Moreover, on the DBLP dataset, which has more complicated heterogeneous semantic patterns (4 node types and 6 relation types), the node group MLP (neg) has relatively longer meta-paths because PM-HGNN++ has more options to explore when generating personalised meta-paths.

Fig. 6: Micro-F1 and the number of aggregations of PM-HGNN++ on IMDb, with different max steps

5.4 Model analysis on PM-HGNN++

Maximum step T. We study how the maximum step T and the resulting number of aggregation paths affect the classification performance. The results on the IMDb dataset are presented in Fig. 6. Setting \(T \ge 2\) leads to the most significant improvement in Micro-F1 over \(T\!=\!1\). As T increases, the number of aggregation paths also increases. Considering that more aggregation paths bring higher computational cost, we choose \(T\!=\!2\) for the experiments in Sect. 5.2, even though \(T\!>\!2\) can produce better performance.

Fig. 7: The relative numbers of aggregations (# without the redundancy-free aggregation divided by # with the redundancy-free aggregation) under different maximum steps on IMDb (left) and DBLP (right)

Redundancy-free aggregation We investigate the influence of the redundancy-free improvement by comparing the number of aggregations with and without it. Fig. 7 shows the results. Our redundancy-free aggregation strategy significantly reduces the number of information aggregations, and the reduction becomes more pronounced as the maximum step T increases. On IMDb, our method reduces nearly 50% of the aggregations with \(T=2\), and the reduction ratio exceeds 80% when \(T=4\).

Fig. 8: Run time of meta-path generation for PM-HGNN and PM-HGNN++ under different max steps on IMDb (left) and DBLP (right). The x-axis is the maximum number of timesteps in generating meta-paths; the y-axis is the relative time (the run time of PM-HGNN++ divided by the run time of PM-HGNN). The numbers at the tops of the bars indicate the run times in seconds

Run time analysis of meta-path generation As described in Sect. 4.4, PM-HGNN++ accelerates the meta-path generation process through the proposed novel training process. Here we compare the run time of PM-HGNN and PM-HGNN++ for personalised meta-path generation. We report their actual run times in seconds and calculate the run time of PM-HGNN++ relative to PM-HGNN (i.e., relative time). The results on the IMDb and DBLP datasets are shown in Fig. 8. PM-HGNN++ takes less than \(5\%\) of PM-HGNN’s run time on both datasets. Such results exhibit the promising time efficiency of PM-HGNN++.

6 Conclusions and future work

We have studied the HGRL problem in this paper and identified a limitation of existing HGRL methods, namely their dependency on hand-crafted meta-paths. In order to fully unleash the power of HGRL, we presented a novel framework, PM-HGNN, and proposed an extension, PM-HGNN++. Compared with existing HGRL models, the most significant advantages of our framework lie in avoiding manual effort in defining meta-paths and in generating personalised meta-paths for each individual node. The experimental results demonstrated that our framework generally outperforms the competing approaches and discovers useful meta-paths that have been ignored by human expertise. In the future, we plan to extend our framework to other tasks on HINs, such as online recommendation and knowledge graph completion; understanding the meta-paths generated by PM-HGNN is another promising direction.