1 Introduction

The complex interactions in real-world data, such as social networks, biological networks and knowledge graphs, can be modelled as Heterogeneous Information Networks (HINs) (Sun and Han 2012), which are commonly associated with multiple types of nodes and relations. Take the academia HIN depicted in Fig. 1a as an example; it involves 4 types of nodes, including Papers (P), Authors (A), Institutions (I) and publication venues (V), and 8 types of relations. Owing to their capability to depict the complex inter-dependency between nodes, HINs have attracted increasing attention in the research community and have been applied in fields such as relational learning (Van Otterlo 2005), recommender systems (Zheng et al. 2021) and information retrieval (Wan et al. 2020). However, the complex semantics and non-Euclidean nature of HINs make them challenging to model with conventional machine learning algorithms designed for tabular or sequential data.

Over the past decade, a significant line of research on HINs has been Heterogeneous Graph Representation Learning (HGRL). The goal of HGRL is to learn latent node representations, which encode the complex graph structure and multi-typed nodes, for downstream tasks including link prediction (Wan et al. 2020), node classification (Wang et al. 2019) and node clustering (Fu et al. 2020). As discussed in a recent survey (Dong et al. 2020), one paradigm of HGRL is to manually define meta-paths to model a HIN’s rich semantics and to leverage random walks to transform the graph structure into a set of sequences (Dong et al. 2017; Fu et al. 2017; Shi et al. 2019), which can be further exploited by shallow embedding learning algorithms (Mikolov et al. 2013; Le and Mikolov 2014). A meta-path scheme is defined as a sequence of relations over the HIN’s network schema. For instance, an illustrative meta-path in the academia HIN in Fig. 1a is A\(\xrightarrow {Cite}\)A\(\xrightarrow { Write}\)P. Some follow-up shallow HGRL models try to avoid the requirement of manually defined meta-paths by developing jump-and-stay random walk strategies (Hussein et al. 2018), performing random walks guided by node contexts (Jiang et al. 2020), or switching to the network schema (Zhao et al. 2020). Nevertheless, these “shallow” methods neither support end-to-end training to learn more effective representations for a specific task nor fully utilise node attributes, due to the limitations of the embedding algorithms.

Recently, in view of the impressive success of Graph Neural Networks (GNNs) (Kipf and Welling 2017; Velickovic et al. 2018), a second paradigm of HGRL devises Heterogeneous Graph Neural Networks (HGNNs) (Schlichtkrull et al. 2019; Zhang et al. 2019; Fu et al. 2020), which extend various graph convolutions to HINs. Compared with “shallow” HGRL methods, HGNNs support an end-to-end training mechanism that can learn node representations with the assistance of a few labelled nodes, and they are empowered by more expressive encoders than shallow embedding learning methods. HGNNs can model both structure and node attributes in HINs with the guidance of meta-paths. However, they still rely on hand-crafted meta-paths to explicitly model the semantics of HINs, and obtaining meaningful and useful meta-paths to guide HGNNs remains highly non-trivial.

Fig. 1: An example HIN and a few meta-paths: a an academia HIN; b meta-paths designed for senior vs. junior authors

More precisely, existing meta-path guided HGRL methods simply assume that nodes of the same type share the same meta-paths. Take the academia HIN as an example (Fig. 1a) and assume we plan to learn node representations to determine the research area of \({ Author}\)s. A meta-path \(\Omega _{1}\): A\(\xrightarrow { Write}\)P\(\xrightarrow { Published}\)V may be useful to learn the representation of a senior researcher, since their published papers and publication venues may provide sufficient information to decide their research area. In contrast, when learning the representation of a junior PhD candidate with just a few published papers, we may need to extract information from their collaborators following the meta-path \(\Omega _{2}\): A\(\xrightarrow { Cite}\)A\(\xrightarrow { Write}\)P, because \(\Omega _{1}\) retains little information in the case of junior PhD candidates. Hence, we argue that we should generate a personalised meta-path for each “individual node” according to its attributes and neighbouring relational structure, instead of giving each “node type” several pre-defined meta-paths in general.

Motivated by the outstanding success of Reinforcement Learning (RL) in strategy selection problems (Mnih et al. 2015), previous methods attempt to apply RL techniques to find paths between given node pairs that model the similarity between the two nodes (Meng et al. 2015; Yang et al. 2018; Wan et al. 2020). The found paths are then fed into an encoder to learn representations for pairwise tasks such as link prediction. Nevertheless, challenges remain in designing personalised meta-paths for individual nodes for node-wise tasks.

Key challenges in personalised meta-path generation. First, the definition of meta-paths requires rich domain knowledge that is extremely difficult to obtain in complex and semantically rich HINs (Dong et al. 2020). Specifically, given a HIN G with a node type set N, a relation type set R and a fixed meta-path length T, the possible meta-paths form a set of size \((\vert N \vert \times \vert R \vert )^{T}\). Such a huge set can result in a combinatorial explosion as the scales of N, R and T increase. Second, the representation capacity of manually-defined meta-paths is limited to a specific task on a specific HIN, since different HINs with the same N and R may have different node attributes and relation type distributions. This requires defining appropriate meta-paths for each task on each HIN, which is extremely difficult in practical applications.

In light of these challenges, we propose to investigate HGRL with the objectives of (i) learning to automatically generate a personalised meta-path for each individual node in HINs, (ii) learning node representations effectively and efficiently with the personalised meta-paths, and (iii) retaining the end-to-end training strategy to achieve task-oriented optimisation. To achieve these objectives, we present a novel Personalised Meta-path based Heterogeneous Graph Neural Network (PM-HGNN) to unleash the power of HGRL.

Key Ideas of PM-HGNN. Generally, we aim to replace the human effort in meta-path generation with an RL agent, thereby addressing HGNNs’ dependency on hand-crafted meta-paths. Compared with experts possessing domain knowledge, the RL agent can adaptively generate personalised meta-paths for each individual node with respect to a specific task/HIN through sequential exploration and exploitation. That said, the obtained meta-paths are no longer tied to specific node types but are personalised for each individual node. Both the graph structure and node attributes are considered in the meta-path generation process, making it practicable for HINs with complex semantics.

As illustrated in Fig. 2, the meta-path generation process can be naturally considered a Markov Decision Process (MDP), in which the next relation used to extend the meta-path depends on the current state of the meta-path. Moreover, an HGNN model is proposed to learn node representations from the derived meta-paths, which can be applied to a downstream task, such as node classification. We propose to employ a policy network (agent) to solve this MDP and use an RL algorithm, driven by a reward function, to train the agent. The reward function is defined as the performance improvement over the historical performance, which encourages the RL agent to achieve better and more stable performance. In addition, we find that there exists a large computational redundancy during information aggregation along meta-paths; thus we develop an efficient strategy to achieve redundancy-free aggregation.

We showcase an instance of our framework, PM-HGNN, by implementing it with a classic RL algorithm, Deep Q-learning (Mnih et al. 2015). Besides, we propose an extension, PM-HGNN++, to deal with PM-HGNN’s issues of high computational cost and its neglect of relational information. Specifically, PM-HGNN generates a meta-path for each node according to node attributes only, while PM-HGNN++ further enables the meta-path generation to explore the structural semantics of the HIN. PM-HGNN++ not only significantly accelerates the HGRL process but also improves the effectiveness of the learned node representations, achieving promising performance on downstream tasks.

Main contributions. We summarise our contributions below:

  • We present a framework, PM-HGNN, to learn node representations in a HIN without hand-crafted meta-paths. An essential novelty of PM-HGNN is that the generated meta-paths are personalised to every individual node rather than general to each node type.

  • We propose an attention-based redundancy-free mechanism to reduce redundant computation during heterogeneous information aggregation on the derived meta-path instances.

  • We further develop an extension of PM-HGNN, PM-HGNN++, which not only improves the meta-path generation by incorporating node attributes and relational structure but also accelerates the training process.

  • Experiments conducted on node classification tasks with unsupervised and (semi-)supervised settings show that our framework significantly and consistently outperforms 16 competing methods (up to \(5.6\%\) Micro-F1 improvement). Advanced studies further reveal that PM-HGNN++ can identify meaningful meta-paths that human experts have ignored.

2 Related work

Relational learning In the past decades, research has focused on frameworks that can represent a variable number of entities and the relationships that hold amongst them. The interest in learning with this expressive representation formalism soon resulted in the emergence of a new subfield of machine learning described as relational learning (Van Otterlo 2005; Raedt 2008). For instance, TILDE (Blockeel and Raedt 1998) learns decision trees within inductive logic programming systems. Serafino et al. (2018) proposed an ensemble-based relational learning model for multi-type classification tasks in HINs. Petkovic et al. (2020) proposed a relational feature ranking method for relational data based on gradient-boosted relational trees. Lavrac et al. (2020) presented a unifying methodology combining propositionalisation and embedding techniques, which benefits from the advantages of both in solving complex relational learning tasks. Nevertheless, most of these methods do not leverage neural networks and thus fall behind in automatically mining complex HINs.

Graph neural networks Existing GNNs generalise the convolutional operations of deep neural networks to deal with arbitrary graph-structured data. Generally, a GNN model can be regarded as using the input graph structure to generate the computation graph of nodes for message passing; the local neighbourhood information is aggregated to obtain more effective contextual node representations (Bruna et al. 2014; Defferrard et al. 2016; Kipf and Welling 2017; Velickovic et al. 2018). However, the complex graph-structured data in the real world are commonly associated with multiple types of objects and relations. All of the GNN models mentioned above assume homogeneous graphs, so it is difficult to apply them to HINs directly.

Heterogeneous graph representation learning HGRL aims to project nodes in a HIN into a low-dimensional vector space while preserving the heterogeneous node attributes and edges. A recent survey presents a comprehensive overview of HGRL (Dong et al. 2020), covering shallow heterogeneous network embedding methods (Dong et al. 2017; Fu et al. 2017; Fan et al. 2018), and heterogeneous GNN-based approaches that are empowered by rather complex deep encoders (Meng et al. 2015; Schlichtkrull et al. 2019; Zhang et al. 2019; Wang et al. 2019; Fu et al. 2020; Hu et al. 2020; Zheng et al. 2021). The “shallow” methods are characterised as an embedding lookup table, meaning that they directly encode each node in the network as a vector, and this embedding table is the parameter to be optimised. However, they cannot utilise the node attributes and do not support the end-to-end training strategy. On the other hand, inspired by the recent outstanding performance of GNN models, some studies have attempted to extend GNNs for HINs. R-GCNs (Schlichtkrull et al. 2019) keep a distinct linear projection weight for each relation type. HetGNN (Zhang et al. 2019) adopts different recurrent neural networks for different node types to incorporate multi-modal attributes. HAN (Wang et al. 2019) extends GAT (Velickovic et al. 2018) by maintaining weights for different meta-path-defined edges. MAGNN (Fu et al. 2020) defines meta-path instance encoders, which are used to extract the structural and semantic information ingrained in the meta-path instances. However, all of these models require manual effort and domain expertise to define meta-paths in order to capture the semantics underlying the given heterogeneous graph.

A recent model, HGT (Hu et al. 2020), attempts to avoid the dependency on hand-crafted meta-paths by devising transferable relation scores, but its exploration range is limited by the number of layers, and it introduces a large number of additional parameters to optimise. GTN Yun et al. (2019) selects meta-paths from a group of adjacency matrices with learnable weights. The weights are shared among all nodes and are thus not flexible enough to generate node-specific meta-paths for each individual node. In addition, FSPG Meng et al. (2015), AutoPath Yang et al. (2018) and MPDRL Wan et al. (2020) attempt to employ RL techniques to discover paths between pairs of nodes and further learn node representations for predicting whether edges exist between node pairs. They assume the found paths explicitly represent the similarity between two nodes. However, they can only identify meta-paths that describe the similarity of two nodes, instead of generating meta-paths for individual nodes to learn their representations for node-wise tasks. Moreover, some works (Tanon et al. 2018; Ahmadi et al. 2020) concern the discovery of frequent patterns in a HIN and the subsequent transformation of these patterns into rules, aka rule mining, but the found patterns are not designed for specific tasks or nodes. Consequently, we believe it is necessary and essential to develop a new HGRL framework that supports the adaptive generation of personalised meta-paths for each node in a HIN for node-wise tasks.

Discussion Table 8 summarises the key advantages of PM-HGNN and compares it with a number of recent state-of-the-art methods. PM-HGNN is the first HGRL model that can adaptively generate personalised meta-paths for each individual node to support node-wise tasks and maintain the end-to-end training mechanism.

3 Preliminaries

3.1 Problem Statement

Definition 1

(Heterogeneous information network): A HIN is defined as a directed graph \(G=(V, E, N, R)\), associated with a node type mapping function \(\phi : V \rightarrow N\) and a relation type mapping function \(\varphi : E \rightarrow R\), where N and R are the sets of node and edge types, respectively. Node \(v_{i}\)’s attribute vector is denoted as \(x_i \in \mathbb {R}^{\lambda }\) (with the dimensionality \(\lambda \)).

Definition 2

(Meta-path): Given a HIN G, a meta-path \(\Omega \) with length T is defined as: \(\omega _{0} \xrightarrow {r_{1}} \omega _{1} \xrightarrow {r_{2}} \dots \xrightarrow {r_{T}} \omega _{T}\), where \(\omega _{j} \in N\) denotes a certain node type, and \(r_{j} \in R\) denotes a relation type.

Definition 3

(Meta-path Instance): Given a meta-path \(\Omega \), a meta-path instance p is defined as a node sequence following the schema defined by \(\Omega \).

Problem: Heterogeneous graph representation learning. We formulate heterogeneous graph representation learning (HGRL) as an information integration optimisation problem. For a given HIN G, let \(f: (V, E)\rightarrow \mathbb {R}^{d}\) be a function that integrates information from node attributes and the network structure into node representations. Without manually specifying any meta-paths, we aim to jointly generate \(\rho \) (e.g., \(\rho \)=1) meta-paths \(\{\Omega _{j}\}^{\rho }_{j=1}\) for each node \(v_i\in V\) to guide f to encode the rich structural and semantic information in the HIN, and accordingly learn representations for all nodes in G.

3.2 Markov decision process

A Markov Decision Process (MDP) is an idealised mathematical framework for describing a sequential decision process that satisfies the Markov property (Sutton and Barto 1998). An MDP can be formally represented by the quadruple \((S, A, \mathcal {P}, \mathcal {R})\), where S is a finite set of states, A is a finite set of actions, and \(\mathcal {P}: S \times A \rightarrow (0, 1)\) is a decision policy function that identifies the probability distribution of the next action given the current state. Specifically, the policy encodes the state and the available actions at step t to output a probability distribution \(\mathcal {P}(a_{t} \vert s_{t})\), where \(s_{t} \in S\) and \(a_{t} \in A\). \(\mathcal {R}\) is a reward function \(\mathcal {R}: S \times A \rightarrow \mathbb {R}\) evaluating the result of taking action \(a_{t}\) in the observed state \(s_{t}\).

Fig. 2: Illustration of generating meta-paths as an MDP

Modelling HGRL with MDP. As illustrated in Fig. 2, the meta-path generation process of HGRL can be naturally modelled as an MDP. To generate a meta-path with a maximum of 3 steps for \(v_{\textit{A-1}}\), take the first step as an example: the state \(s_{1}\) is the identifiable information of \(v_{\textit{A-1}}\), and the action set includes the relations in the HIN, i.e., \(\{ Work, Cite, \dots \}\). The decision maker selects one relation from the action set to extend the meta-path as \(a_{1} = { argmax}_{a \in A}(\mathcal {P}(a \mid s_{1}))\). Then, the selected meta-path is fed into the HGNN to learn node representations, which are applied to the downstream task to obtain a reward score \(\mathcal {R}(s_{1}, a_{1})\) that can be used to update \(\mathcal {P}\). We refer to Sect. 4 for more details about modelling HGRL with an MDP.

3.3 Solving MDP with reinforcement learning

Deep Reinforcement Learning (RL) is a family of algorithms that optimise an MDP with deep neural networks. At each step t, the RL agent takes an action \(a_{t} \in A\) based on the current state \(s_{t} \in S\), and observes the next state \(s_{t+1}\) as well as a reward \(r_{t} = \mathcal {R}(s_{t}, a_{t})\). Following the definition of the MDP, the agent acts as the decision policy \(\mathcal {P}\). We aim to search for the optimal decisions that maximise the expected discounted cumulative reward, i.e., we aim to learn an agent network \(\pi : S \rightarrow A \) that maximises \(\mathbb {E}_{\pi } [\sum _{t'=t}^{T} \gamma ^{t'}r_{t'}]\), where T is the maximum number of steps and \(\gamma \in [0, 1]\) is a discount factor balancing short-term and long-term gains; smaller \(\gamma \) values place more emphasis on immediate rewards (Arulkumaran et al. 2017).

Existing RL algorithms are mainly classified into two families: model-based and model-free algorithms. Compared with model-based algorithms, model-free algorithms offer better flexibility, since a learned environment model always risks modelling errors, which in turn affect the learned policy (Arulkumaran et al. 2017). We adopt a classic model-free RL algorithm, Deep Q-learning (DQN) (Mnih et al. 2015). The basic idea of DQN is to estimate the action-value function \(\mathcal {Q}^{*}\) by using the Bellman equation (Mnih et al. 2015) as an iterative update, based on the following intuition: if the optimal value \(\mathcal {Q}^{*}(s_{t+1}, a_{t+1})\) of the next state \(s_{t+1}\) were known for all possible actions \(a_{t+1} \in A\), then the optimal policy would select the action \(a_{t+1}\) maximising the expected value \(\mathcal {R}(s_{t}, a_{t}) + \gamma \mathcal {Q}^{*}(s_{t+1}, a_{t+1})\). It is common to use a function approximator \(\mathcal {Q}\) to estimate \(\mathcal {Q}^{*}\) (Mnih et al. 2015):

$$\begin{aligned} \mathcal {Q}(s_{t},a_{t}; \theta ) = \mathbb {E}_{s_{t+1}} \left[ \mathcal {R}(s_{t}, a_{t}) + \gamma \max _{a_{t+1}\in A} \left( \mathcal {Q}(s_{t+1}, a_{t+1}; \theta )\right) \right] , \end{aligned}$$
(1)

where \(\theta \) stands for the trainable parameters of the neural network used to estimate the decision policy. A value iteration algorithm will converge to the optimal action-value function \(\widetilde{\mathcal {Q}}\), i.e., \(\mathcal {Q} \rightarrow \widetilde{\mathcal {Q}}\), as \(t \rightarrow \infty \).

DQN exploits two techniques to stabilise training: (1) a memory buffer \(\mathcal {D}\) that stores the agent’s experience in a replay memory, which can be accessed to perform weight updates; (2) a separate target Q-network (\(\hat{\mathcal {Q}}\)) that generates the targets for Q-learning and is periodically updated.
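
The two stabilisation tricks can be summarised in a few lines; the following is a minimal sketch in PyTorch, assuming a small MLP Q-network, a buffer capacity of 10,000 transitions and a synchronisation interval of 100 steps (all illustrative choices, not values from the paper).

```python
import random
from collections import deque

import torch.nn as nn

# online Q-network and separate target network Q_hat (illustrative sizes)
q_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
target_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
target_net.load_state_dict(q_net.state_dict())   # Q_hat starts as a copy of Q

memory = deque(maxlen=10_000)                     # replay memory D

def store(s_t, a_t, s_next, r_t):
    """Store one transition; random minibatches are later drawn for updates."""
    memory.append((s_t, a_t, s_next, r_t))

def sample(batch_size=32):
    return random.sample(memory, min(batch_size, len(memory)))

def maybe_sync(step, sync_every=100):
    """Periodically copy the online network's weights into the target network."""
    if step % sync_every == 0:
        target_net.load_state_dict(q_net.state_dict())
```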

4 The PM-HGNN Framework

We present an overview of the proposed PM-HGNN in Fig. 3a; it consists of two components: an RL agent module and an HGNN module. According to the states, the RL agent predicts an action for each individual node so as to arrive at better rewards. Next, we generate meta-path instances based on the generated personalised meta-paths to support the information aggregation of the HGNN and learn effective node representations. Finally, we apply the generated representations to the downstream task for performance evaluation to obtain reward scores, and save the states, actions and reward scores into the RL agent for subsequent updating.

Fig. 3: Overview of PM-HGNN and PM-HGNN++

4.1 Personalised Meta-path Generation with RL

A personalised meta-path generation process with a maximum of T steps for each node can be modelled as a T-round decision-making process, which can naturally be treated as an MDP. We elaborate on the alignment between each MDP component and the personalised meta-path generation process in the following.

State (S): The state is a vector used to assist the decision policy in selecting a relation type to extend the personalised meta-path for each node. Hence, it is crucial to comprehensively encode the existing part of a meta-path into the state. We adopt a gating mechanism to adaptively update the state. Take \(v_{i}\)’s meta-path \(\Omega \) (starting from node type \(N_{v_{i}}\)) as an example; the state \(s_{t}\) of \(\Omega \) at step t is formally defined as:

$$\begin{aligned} s_{t} = q \circ \left( \frac{1}{\vert D(v_{i}) \vert } \sum _{j \in D(v_i)} x_{j}\right) + (1-q) \circ s_{t-1}, \end{aligned}$$
(2)

where \(\circ \) stands for the Hadamard product, \(D(v_{i})\) represents the set of past nodes on \(v_i\)’s meta-path at step t, and \(\frac{1}{\vert D(v_{i}) \vert } \sum _{j \in D(v_{i})} x_{j}\) is the average of the past nodes’ attribute vectors. \(s_{t-1}\) stands for the state at step \(t-1\). q is the update gate that determines whether to update the state with the past nodes’ attributes; we estimate it by exploring the relationship between the past nodes’ attributes and the previous state. It is formally defined as: \(q = Sigmoid\left( f_{\varphi }\left( (\frac{1}{\vert D(v_i) \vert } \sum _{j \in D(v_i)} x_{j}) \parallel s_{t-1}\right) \right) \), where \(f_{\varphi }\) can be seen as a shared convolutional kernel (Velickovic et al. 2018), and \(1-q\) is the reset gate.
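
For concreteness, the gated state update of Eq. 2 can be sketched as follows; here \(f_{\varphi }\) is realised as a single linear layer over the concatenated vectors, which is an assumption made for illustration rather than the paper’s exact kernel.

```python
import torch
import torch.nn as nn

class StateUpdater(nn.Module):
    """Gated state update of Eq. 2 (minimal sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.f_phi = nn.Linear(2 * dim, dim)  # shared kernel over [mean_attr || s_{t-1}]

    def forward(self, past_attrs, s_prev):
        # past_attrs: (|D(v_i)|, dim) attributes of nodes already on the path
        mean_attr = past_attrs.mean(dim=0)                              # average of past nodes
        q = torch.sigmoid(self.f_phi(torch.cat([mean_attr, s_prev])))   # update gate
        return q * mean_attr + (1 - q) * s_prev                          # Eq. 2

updater = StateUpdater(dim=8)
s_t = updater(torch.randn(3, 8), torch.zeros(8))
```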

Action (A): The action space is the set of relation types that the policy network can choose from to extend the meta-path, and each relation type is labelled with a positive integer. Note that we add a special action STOP to allow each node to have a flexible-length meta-path. Beginning with the starting node \(v_{i}\), the decision policy iteratively predicts which relation can lead to a higher reward score and accordingly uses it to extend the current meta-path. The decision policy selects the action STOP to finish the path generation process if it encounters a state in which appending any extra relation to the current meta-path would hurt performance on the downstream task.

Decision policy (\(\mathcal {P}\)): The decision policy aims at mapping a state in S to an action in A. The state space and the action space are continuous and discrete, respectively; thus, we use a deep neural network to approximate the action-value function: \(\mathcal {P}(a_{t} \vert s_{t}; \theta ): S \times A \rightarrow (0, 1)\). Besides, since any action \(a \in A\) is always a positive integer, we use DQN (Mnih et al. 2015) to optimise the meta-path generation problem. We utilise an MLP (Haykin 1999) as the decision policy network in the DQN, defined as: \(z_{1} = W^T_{1} s_{t} + b_{1}\), \(z_{2} = W^T_{2} z_{1} + b_{2}\), ..., \(\hat{P} = { Softmax}(\phi _{m}(W^T_{m} z_{m-1} + b_{m}))\), where \(W_{m}\) and \(b_{m}\) denote the weight matrix and bias vector of the \(m\)-th layer’s perceptron, respectively. The output \(\hat{P} \in (0, 1)\) represents the probabilities of selecting the different relations \(a_{t} \in A\) to extend the meta-path. Note that it is possible to adopt other RL algorithms to optimise the policy network; here, we utilise a basic RL algorithm to illustrate our framework’s main idea and demonstrate its effectiveness.
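
The following sketch illustrates such a policy network together with a simple \(\epsilon \)-greedy action selection over relation types; the relation list and layer sizes are hypothetical and only serve to make the mapping from state to action concrete.

```python
import torch
import torch.nn as nn

RELATIONS = ["MA", "AM", "MD", "DM", "STOP"]   # hypothetical action set for IMDb

# MLP policy: state vector -> probability over relation types (\hat{P} in the text)
policy = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, len(RELATIONS)),
    nn.Softmax(dim=-1),
)

def select_action(state, epsilon=0.1):
    """Epsilon-greedy selection over relation types, as commonly used with DQN."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(len(RELATIONS), (1,)).item()   # explore
    with torch.no_grad():
        probs = policy(state)                               # exploit \hat{P}
    return int(probs.argmax())

a_t = select_action(torch.randn(16))
print(RELATIONS[a_t])
```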

Reward function (\(\mathcal {R}\)): We devise a reward function to encourage the RL agent to achieve better and more stable performance on downstream tasks. We define the reward function \(\mathcal {R}\) as the performance improvement on the specific downstream task compared with the historical performance, given by:

$$\begin{aligned} \mathcal {R}(s_{t}, a_{t}) = \mathcal {M}(s_{t}, a_{t}) - \frac{\sum _{j=t-b}^{t-1} \mathcal {M}(s_{j}, a_{j})}{b}, \end{aligned}$$
(3)

where \(\frac{\sum _{i=t-b}^{t-1} \mathcal {M}(s_{j}, a_{j})}{b}\) is the baseline performance value at step t, which contains the historical performances of the last b steps. \(\mathcal {M}(s_{t}, a_{t})\) is the performance based on the learned node representation \(H^{t}[v_{i}]\) on the downstream task (e.g., node classification). And use its accuracy on the validation set as the evaluation performance \(\mathcal {M}\).

Optimisation The proposed meta-path generation at step t consists of three phases: (i) obtaining the state \(s_{t}\); (ii) predicting an action \(a_{t}=\arg \max _{a \in A}(\mathcal {Q}(s_{t}, a; \theta ))\) to extend the meta-path according to the current state \(s_{t}\); (iii) updating state \(s_{t}\) to \(s_{t+1}\). Moreover, we train the policy network (Q-function) by optimising Eq. 1 with the reward function defined in Eq. 3, and the loss function is given by:

$$\begin{aligned} \mathcal {L}_{Q}(\theta ) = \mathbb {E}_{\mathcal {T} \sim U(\mathcal {D})}\left[ \left( \mathcal {R}(s_{t}, a_{t}) + \gamma \max _{a_{t+1} \in A} \hat{\mathcal {Q}}(s_{t+1}, a_{t+1}; \theta ^{-}) - \mathcal {Q}(s_{t}, a_{t}; \theta ) \right) ^{2}\right] , \end{aligned}$$
(4)

where \(\mathcal {T} = (s_{t}, a_{t}, s_{t+1}, \mathcal {R}(s_t, a_t))\) is a transition randomly sampled from the memory buffer \(\mathcal {D}\), \(\theta ^{-}\) is the set of parameters of the separate target Q-network \(\hat{\mathcal {Q}}\), \(\max \limits _{a_{t+1}\in A} \hat{\mathcal {Q}}(s_{t+1}, a_{t+1}; \theta ^{-})\) stands for the optimal target value, and \(\mathcal {Q}(s_{t}, a_{t}; \theta )\) is the predicted reward value based on the training Q-network. We optimise the Q-network’s parameters by minimising this loss function.
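
Eq. 4 is a standard squared temporal-difference error with a target network; the following self-contained sketch shows one such update step, with network sizes, learning rate and \(\gamma \) chosen only for illustration.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, num_actions, gamma = 16, 5, 0.9
q_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, num_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, num_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)   # replay buffer D of (s_t, a_t, s_{t+1}, r_t)

def dqn_update(batch_size=32):
    """One gradient step on the squared TD error of Eq. 4."""
    if len(memory) < batch_size:
        return
    s, a, s_next, r = zip(*random.sample(memory, batch_size))
    s, s_next = torch.stack(s), torch.stack(s_next)
    a, r = torch.tensor(a), torch.tensor(r, dtype=torch.float32)

    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)         # Q(s_t, a_t; theta)
    with torch.no_grad():                                          # target uses theta^-
        q_target = r + gamma * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q_pred, q_target)                            # squared TD error

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```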

4.2 Information aggregation with personalised meta-paths

Due to the heterogeneity of nodes in a HIN, different node types have different attribute spaces. Before the information aggregation, we first apply a node type-specific transformation to project the features of different node types into the same space, given by: \(H^{0}[v_{i}] = W_{\omega _{i}} \cdot x_{i}\), where \(x_{i}\) and \(H^{0}[v_{i}]\) are the original attributes and projected features of node \(v_{i}\), and \(W_{\omega _{i}}\) is a learnable transformation matrix for the node type \(\omega _{i} = \phi (v_i)\). Next, we perform the information aggregation over the generated meta-path instances to learn effective node representations for the downstream task. Take node \(v_{i}\) as an example, and let its obtained meta-path be \(\Omega \).
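
A minimal sketch of this type-specific projection is given below, assuming each node type has its own attribute dimensionality; the type names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TypeSpecificProjection(nn.Module):
    """H^0[v_i] = W_{omega_i} * x_i with one learnable matrix per node type."""
    def __init__(self, type_dims, out_dim):
        super().__init__()
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim, bias=False)
                                   for t, d in type_dims.items()})

    def forward(self, node_type, x):
        return self.proj[node_type](x)        # projected feature H^0[v_i]

proj = TypeSpecificProjection({"Movie": 20, "Actor": 12, "Director": 12}, out_dim=8)
h0_movie = proj("Movie", torch.randn(20))
```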

Fig. 4: Comparison between information aggregation by conventional meta-path instances and the proposed redundancy-free computation. a The sequential aggregation path instances generated from the meta-path \(\Omega \): P\(\xrightarrow { Written\_by}\)A\(\xrightarrow { Cite}\)A\(\xrightarrow { Write}\)P (A: Author, P: Paper). b The improved aggregation structure produced by our redundancy-free computation method, with attention scores to distinguish messages from different nodes

We first generate meta-path instances \(\{p_{1}, p_{2}, \dots \}\) based on meta-path \(\Omega \). As the example in Fig. 4 shows, there can be many meta-path instances that follow meta-path \(\Omega \) (e.g., \(\Omega : { Paper} \xrightarrow { Written\_by} { Author} \xrightarrow { Cite} { Author} \xrightarrow { Write} { Paper}\)). An intuitive approach to performing information aggregation is sequential aggregation, as in HetGNN Zhang et al. (2019) and HAN Wang et al. (2019). However, we argue that these aggregation paths contain computational redundancy and that the aggregation can be further improved. For example, \(P_{3} \rightarrow A_{2}\) and \(P_{5} \rightarrow A_{2}\) are repeatedly calculated in the first aggregation step. We aim to merge these two instances into one process \(\{P_{3}, P_{5} \} \rightarrow A_{2}\) to reduce redundant computation, termed redundancy-free aggregation. Besides, we learn an attention score \(Att(v_{i},v_{j})\) for each link in the aggregation path, where nodes \(v_{i}\) and \(v_{j}\) are the two ends of the link, so that messages from different nodes can be distinguished from each other.

Let \(H^{0}[v_i]\) be the projected features of node \(v_{i}\) involved in the instance set \(\{p_{1}, p_{2}, \dots \}\) of meta-path \(\Omega \). We give the updating procedure of \(H[v_{i}]\):

$$\begin{aligned} H^{l}[v_{i}] = \mathop { Aggregate}\limits _{v_{j} \in I(v_{i})} \left( { Att}(v_{i}, v_{j}) \cdot H^{l-1}[v_{j}]\right) , \end{aligned}$$
(5)

where \(I(v_{i})\) indicates the set of past nodes of \(v_{i}\) in \(\{p_{1}, p_{2}, \dots \}\), \(l \in \{1, 2, \dots , t \}\) is the index of the aggregator performing the information aggregation, and \({ Aggregate}(\cdot ) = Relu ({ Mean}(\cdot ))\). The operator \({ Att(\cdot )}\) calculates the importance of the received messages using the relation between past messages and node features, given by:

$$\begin{aligned} { Att}(v_{i}, v_{j}) = \mathop { Softmax}\limits _{j \in I(v_{i})} \left( LeakyRelu \left( W H^{l-1}[v_{j}] \parallel W H^{l-1}[v_{i}]\right) \right) , \end{aligned}$$
(6)

where W denotes the trainable parameters, and \(\parallel \) is the concatenation operator. Sect. 5.4 presents an empirical study revealing how the redundancy-free aggregation affects time efficiency.
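
The following sketch shows Eqs. 5 and 6 for a single target node over its de-duplicated set of past nodes; the scalar scoring layer used to reduce the concatenated features to an attention logit is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RedundancyFreeAggregator(nn.Module):
    """Attention-weighted, redundancy-free aggregation for one target node (Eqs. 5-6)."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.score = nn.Linear(2 * dim, 1, bias=False)   # reduces [Wh_j || Wh_i] to a logit

    def forward(self, h_i, h_neighbours):
        # h_i: (dim,); h_neighbours: (|I(v_i)|, dim), each past node counted once
        wh_j = self.W(h_neighbours)                        # W H^{l-1}[v_j]
        wh_i = self.W(h_i).expand_as(wh_j)                 # W H^{l-1}[v_i], broadcast
        e = F.leaky_relu(self.score(torch.cat([wh_j, wh_i], dim=-1)))
        att = torch.softmax(e, dim=0)                      # Eq. 6: softmax over I(v_i)
        return F.relu((att * h_neighbours).mean(dim=0))    # Eq. 5: Relu(Mean(...))

agg = RedundancyFreeAggregator(dim=8)
h_new = agg(torch.randn(8), torch.randn(4, 8))
```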

Algorithm 1: The PM-HGNN framework

4.3 Model training

Training of HGNN Finally, the updated node representation \(H^t[v_{i}]\) can be applied to downstream tasks. Take semi-supervised node classification as an example: with a small fraction of labelled nodes, we can optimise the HGNN’s parameters by minimising the cross-entropy loss \(\mathcal {L}\) via back-propagation and gradient descent: \(\mathcal {L} = - \sum \limits _{v \in \mathcal {V}_{L}} \sum \limits _{c=0}^{C-1}y_{v}[c] \cdot \log (H^{t}[v][c])\), where \(\mathcal {V}_{L}\) stands for the set of labelled nodes, C is the number of classes, \(y_{v}\) is the one-hot label vector of node v, and \(H^{t}[v]\) is the learned representation vector of node v. Meanwhile, the learned node representations can be evaluated by a task-specific evaluation metric \(\mathcal {M}_{ eval}\) to obtain a reward score. We summarise the PM-HGNN framework in Algorithm 1. Note that the PM-HGNN framework maintains the ability of HGNNs to learn representations for nodes newly added to the graph: the trained RL agent can adaptively generate a meta-path for the new node to compute its representation.
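
As a minimal sketch, the semi-supervised objective over the labelled node set \(\mathcal {V}_{L}\) can be written as follows, assuming the final representations are mapped to per-class scores; the shapes and indices are illustrative.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(H_t, labels, labelled_idx):
    """Cross-entropy over labelled nodes only (the loss L above)."""
    # H_t: (num_nodes, C) per-class scores derived from the node representations
    # labels: (num_nodes,) integer class labels; labelled_idx: indices of V_L
    log_probs = F.log_softmax(H_t[labelled_idx], dim=-1)
    return F.nll_loss(log_probs, labels[labelled_idx])

loss = semi_supervised_loss(torch.randn(10, 3), torch.randint(3, (10,)),
                            torch.tensor([0, 2, 5]))
```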

4.4 PM-HGNN++

As seen in previous sections, PM-HGNN adopts an RL agent to adaptively generate meta-paths for HGRL. However, reviewing the overall workflow of PM-HGNN, we identify two limitations: (1) PM-HGNN neglects the relational structure of the HIN while generating meta-paths; (2) the HGNN requires a large number of epochs to train its parameters and obtain node representations, which restricts PM-HGNN’s efficiency. To be more specific, PM-HGNN generates meta-paths by considering only the node attribute information of the HIN, i.e., the state \(s_{t}\) summarises the attributes of nodes involved in the meta-path as in Eq. 2. However, the HIN’s semantic structure is also important information for assisting the meta-path generation. In addition, similar to the training process of other deep neural networks (Kipf and Welling 2017), the encoder of PM-HGNN needs a number of epochs to train its parameters to learn effective node representations for a given set of meta-path instances. In this way, if the HGNN needs B epochs to train its parameters, then we need \(T \times B\) epochs to complete a meta-path generation process with a maximum number of steps T. Meanwhile, the RL agent also needs many explorations to obtain enough samples (\(\mathcal {T}=(s_{t}, a_{t}, s_{t+1}, \mathcal {R}(s_t, a_t))\)) to train the policy network, which can result in a combinatorial explosion. Finally, following the average aggregation approach to defining node states in Eq. 2, PM-HGNN is not able to distinguish node states with categorical attributes.

New State (S). To address the identified limitations, we propose an extension of PM-HGNN, PM-HGNN++, which utilises the structure information of the HIN and significantly accelerates the meta-path generation process. Our solution is to define a novel state to replace Eq. 2, because we notice that the encoder (i.e., HGNN) of PM-HGNN can already summarise the node attributes and topological structure information involved in meta-paths to assist meta-path generation. In particular, we utilise the latest node representation vector \(H^{t-1}[v_i]\) of the meta-path’s starting node \(v_{i}\) as the state. The new state (S) can be formally described as:

$$\begin{aligned} s_{t} = { Normalise}(H^{t-1}[v_{i}]), \end{aligned}$$
(7)

where \({ Normalise}\) is a normaliser fitted on the first B generated states to convert the states into the same distribution; the input is normalised as \(\frac{H[v_{i}]-H_{mean}}{H_{std}}\), where \(H_{mean}\) and \(H_{std}\) are calculated from the first B states. Note that such a normaliser is a common trick in deep RL for stabilising training when S is very sparse (Hou et al. 2017). In addition, \(H^{t-1}[v_i]\) contains not only node attributes but also the relevant graph structure information of each node; hence, \(s_t\) of Eq. 7 can distinguish different nodes with categorical attributes. This endows PM-HGNN++ with the ability to handle graphs with diverse node attributes.
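
A minimal sketch of such a normaliser is shown below; it fits the mean and standard deviation on the first B observed states and standardises all later states, with the value of B and the pass-through behaviour during warm-up being illustrative assumptions.

```python
import torch

class StateNormaliser:
    """Fit H_mean and H_std on the first B states, then standardise later states (Eq. 7)."""
    def __init__(self, warmup=64):
        self.warmup, self.buffer = warmup, []
        self.mean, self.std = None, None

    def __call__(self, h):
        if self.mean is None:
            self.buffer.append(h)
            if len(self.buffer) >= self.warmup:        # fit on the first B states
                stacked = torch.stack(self.buffer)
                self.mean = stacked.mean(dim=0)
                self.std = stacked.std(dim=0) + 1e-8
            return h                                    # pass through during warm-up
        return (h - self.mean) / self.std               # (H[v_i] - H_mean) / H_std

normalise = StateNormaliser(warmup=4)
s_t = normalise(torch.randn(8))
```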

Algorithm 2: The PM-HGNN++ framework

New training process. The training process can be further improved based on the new definition of the state. The learned node representations can be used both to optimise the HGNN’s parameters and to update the RL agent. We achieve such a mutual optimisation between the RL agent and the HGNN because the meta-paths are generated based on the current status of the HGNN. With this novel training process, we only need T epochs, instead of \(T \times B\) epochs, to complete a meta-path generation procedure, because we do not need to wait for the encoder to complete an entire training process. We summarise PM-HGNN++ in Algorithm 2. We perform an empirical analysis to compare the time consumption of PM-HGNN and PM-HGNN++ in Sect. 5.4. Note that, similar to PM-HGNN, the PM-HGNN++ framework can adaptively generate a meta-path for a newly added node to compute its representation, thereby maintaining the ability of HGNNs to learn representations for new nodes.

4.5 Model analysis

The proposed models can deal with various types of nodes and relations and fuse the rich semantics in HINs. Benefiting from the RL agent, personalised meta-paths are generated for different nodes, and the HGNN encoder allows information to be transferred between nodes via diverse relations. For the RL agent adopted in this paper, Deep Q-learning (DQN), it is hard to give a precise computational complexity (Fan et al. 2020); hence, we report the empirical meta-path generation time on the IMDb and DBLP datasets in Sect. 5.4 (Fig. 8). Given a meta-path \(\Omega \) and node representations of (hidden) dimensionality d, the time complexity of the node representation learning process is \(\mathcal {O}(V_{\Omega } d^2 + E_{\Omega }d)\), where \(V_{\Omega }\) is the number of nodes following the meta-path \(\Omega \) and \(E_{\Omega }\) is the number of meta-path based node pairs. \(\mathcal {O}(V_{\Omega } d^2)\) accounts for the representation transformation process and \(\mathcal {O}(E_{\Omega }d)\) represents the \(Att(\cdot )\) computation over the relevant edges. To further unfold the relationship between the complexity of the generated meta-paths and the performance, we also report how the maximum step T and the number of aggregation paths affect the node classification performance on the IMDb dataset in Sect. 5.4 (Fig. 6).

5 Experiments

5.1 Experimental settings

Datasets We adopt three HIN datasets (IMDb, DBLP and ACM) (Wang et al. 2019; Fu et al. 2020) from different domains to evaluate our models’ performance. Their statistics are summarised in Table 4, and the meta-relation schemas are shown in Fig. 9. Detailed descriptions of the datasets are given in Appendix A.

Competing methods and model configuration We compare our models’ performance against various state-of-the-art models, including 5 homogeneous graph representation learning models: LINE Tang et al. (2015), DeepWalk Perozzi et al. (2014), MLP, GCN Kipf and Welling (2017) and GAT Velickovic et al. (2018); 10 HGRL models: Esim Shang et al. (2016), metapath2vec Dong et al. (2017), JUST Hussein et al. (2018), HERec Shi et al. (2019), NSHE Zhao et al. (2020), RGCN Schlichtkrull et al. (2019), HAN Wang et al. (2019), GTN Yun et al. (2019), MAGNN Fu et al. (2020) and HGT Hu et al. (2020); and 1 state-of-the-art relational learning model, PropStar (Lavrac et al. 2020). Note that FSPG Meng et al. (2015), AutoPath Yang et al. (2018) and MPDRL Wan et al. (2020) do not support node-wise tasks since they learn paths that explicitly represent the similarity between pairs of nodes (as discussed in Sect. 2); therefore, we cannot compare against them. Detailed model descriptions and implementation configurations are given in Appendix C and Appendix E.

Evaluation settings We evaluate our models on node classification. For the semi-supervised setting, we adopt the same settings as MAGNN Fu et al. (2020): 400 instances for training, another 400 instances for validation, and the remaining nodes for testing. The generated node representations are further fed into a support vector machine (SVM) classifier to get the prediction results. For the supervised setting, we use the same percentage of nodes for training and validation, respectively, and the rest of the nodes for testing. We report the average Micro-F1 and Macro-F1 scores over 10 runs with different seeds.
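
As a rough sketch of this evaluation protocol, the learned embeddings can be fed to scikit-learn’s LinearSVC and scored with Micro-/Macro-F1; the data shapes, split sizes and default hyper-parameters below are illustrative only and do not correspond to the actual datasets.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

emb = np.random.rand(1000, 64)           # learned node representations (placeholder)
labels = np.random.randint(0, 3, 1000)   # node classes (placeholder)

# illustrative split: 400 train / 400 validation (unused here) / rest test
train_idx, test_idx = np.arange(400), np.arange(800, 1000)

clf = LinearSVC().fit(emb[train_idx], labels[train_idx])
pred = clf.predict(emb[test_idx])
print("Micro-F1:", f1_score(labels[test_idx], pred, average="micro"))
print("Macro-F1:", f1_score(labels[test_idx], pred, average="macro"))
```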

Table 1 Experiment results (%) on the IMDb, DBLP and ACM datasets for the node classification task with unsupervised and semi-supervised settings. MP2V stands for metapath2vec

5.2 Experimental results

Semi-supervised node classification. We present the results in Table 1, in which competing methods that cannot support the semi-supervised setting use the unsupervised setting. PM-HGNN++ performs consistently better than all competing methods across different proportions and datasets. On IMDb, the performance gain obtained by PM-HGNN++ over the best competing method (MAGNN) is around \(3.7\%-5.08\%\). GNN models designed for homogeneous and heterogeneous graphs perform better than shallow HGRL models, which demonstrates that modelling heterogeneous node attributes improves performance significantly. On DBLP and ACM, all models perform better overall than on IMDb. It is interesting to observe that, unlike on IMDb, the shallow heterogeneous network methods JUST and NSHE obtain better performance than a number of homogeneous GNNs; that is, the heterogeneous relations in DBLP and ACM are useful for the node classification task. PM-HGNN++ clearly outperforms the strongest competing method (MAGNN) by up to \(2.4\%\), showing the generality and superiority of PM-HGNN++. In addition, among unsupervised competing methods, NSHE obtains better performance than the others, showing that the network schema of a HIN helps obtain better representations.

Table 2 Experiment results (%) on the IMDb, DBLP and ACM datasets for the node classification task with supervised settings

Supervised node classification. We present the results in Table 2. We can see that PM-HGNN++ consistently outperforms all competing models on IMDb, DBLP and ACM datasets, with up to \(5.6\%\) in terms of Micro-F1. Heterogeneous GNNs outperform homogeneous GNNs, and our PM-HGNN++ achieves the best performance. This demonstrates the usefulness of heterogeneous relations and the advantages of generating appropriate personalised meta-paths for each node according to its attributes and relational structure.

5.3 Meta-path analysis

Meta-path generation process visualisation We visualise how the RL agent in PM-HGNN++ generates personalised meta-paths for each target node in Figs. 10, 11 and 12. Fig. 10 summarises the actions taken by the RL agent on IMDb with different maximum steps under the semi-supervised setting. The percentages marked in the figure indicate the fraction of nodes choosing the corresponding relation to extend the meta-path at that step. For example, consider the meta-path \({ M\xrightarrow {64.8\%}A\xrightarrow {8.8\%}M}\) with \(T=2\): \(64.8\%\) means that \(64.8\%\) of Movie nodes select the MA relation at step 1, and \(8.8\%\) means that, among the Movie nodes that selected MA at step 1, \(8.8\%\) select AM to extend the meta-path. It is interesting to see that the RL agent selects the relation MA at the first step more often than the other relation, MD. This means that a movie’s characteristics are more related to its actors than to its director. Besides, at step \(T=2\), the RL agent selects plenty of STOP actions. This shows that two short meta-paths, \({ Movie}\xrightarrow { MA} { Actor}\) and \({ Movie} \xrightarrow { MD} { Director}\), are informative enough for the majority of nodes to learn effective representations. Moreover, \(6.7\%-9.4\%\) of nodes do not need any meta-paths to learn their representations, which implies that their attributes alone provide enough information. This is also reflected in the results of MLP in Sect. 5.2, which utilises no structural information but only node attributes. More analyses of Figs. 11 and 12 can be found in Appendix B.

Table 3 Meta-paths designed by PM-HGNN++ on IMDb

Overall meta-path statistics. We further present the meta-paths generated for most of the target nodes in Table 3, which are summarised from Fig. 10. Tables 6 and 7 present the generated meta-paths on DBLP and ACM, respectively. We can see from Table 3 that the RL agent can generate the same meta-paths as manually-defined ones. Besides, we find that the meta-paths specified by human experts are not the most useful ones for learning representations of Movie nodes; the two short meta-paths \({ Movie} \xrightarrow { MA} { Actor}\) and \({ Movie} \xrightarrow { MD} { Director}\) are the most useful. This indicates that the participants (director and actors) of a movie largely determine its type, as the task is to predict the class of a movie. The top three meta-paths generated by the strongest competing method, GTN, confirm our observation that the two shorter meta-paths are more valuable for learning Movie nodes’ representations. However, GTN can only select several meta-paths for every node type, whereas our model can identify personalised meta-paths for each node. More analyses of Tables 6 and 7 can be found in Appendix B.

Fig. 5: The average meta-path length generated by PM-HGNN++ on different datasets. MLP (pos)/MLP (neg) are the groups of nodes that MLP classifies correctly and incorrectly, respectively

Meta-path comparison To understand how PM-HGNN++ generates personalised meta-paths for different nodes, we present another analysis that investigates the lengths of the meta-paths designed for nodes that can be correctly classified from their raw attributes. Specifically, we divide nodes into two groups, pos and neg, according to whether a multi-layer perceptron (MLP) applied to node attributes produces a correct classification under the supervised setting. That is, correctly-classified nodes and incorrectly-classified ones are grouped into pos and neg, respectively. Then, we calculate the average lengths of the meta-paths PM-HGNN++ generates for the nodes in each group. The results are reported in Fig. 5. We find that nodes that cannot be correctly classified by MLP are associated with longer personalised meta-paths. Such results deliver an important insight: PM-HGNN++ tends to explore the deeper relational neighbouring structure to enhance node representation learning, i.e., to discriminate nodes with different labels from one another, when node attributes themselves cannot provide sufficient information. Moreover, on the DBLP dataset, which has more complicated heterogeneous semantic patterns (4 node types and 6 relation types), the node group MLP (neg) has relatively longer meta-paths because PM-HGNN++ has more options to explore when generating personalised meta-paths.

Fig. 6: Micro-F1 and the number of aggregations of PM-HGNN++ on IMDb, with different max steps

5.4 Model analysis on PM-HGNN++

Maximum step T. We study how the maximum step T and the resulting number of aggregation paths affect the classification performance. The results on the IMDb dataset are presented in Fig. 6. Setting \(T \ge 2\) leads to the most significant improvement in Micro-F1 over \(T\!=\!1\). As T increases, the number of aggregation paths also increases. Considering that more aggregation paths bring higher computational cost, we choose \(T\!=\!2\) for the experiments in Sect. 5.2, even though \(T\!>\!2\) can produce better performance.

Fig. 7: The relative numbers of aggregations (# without the redundancy-free aggregation divided by # with the redundancy-free aggregation) under different maximum steps on IMDb (left) and DBLP (right)

Redundancy-free aggregation We investigate the influence of the redundancy-free improvement by comparing the number of aggregations with and without it. Fig. 7 shows the results. Our redundancy-free aggregation strategy significantly reduces the number of information aggregations, and the reduction becomes more pronounced as the maximum step T increases. On IMDb, our method reduces nearly 50% of the aggregations with \(T=2\), and the reduction ratio exceeds 80% when \(T=4\).

Fig. 8: Run time of meta-path generation for PM-HGNN and PM-HGNN++ under different max steps on IMDb (left) and DBLP (right). The x-axis is the maximum number of timesteps in generating meta-paths; the y-axis is the relative time (the run time of PM-HGNN++ divided by the run time of PM-HGNN). The numbers at the tops of the bars indicate the run times in seconds

Run time analysis of meta-path generation As described in Sect. 4.4, PM-HGNN++ accelerates the meta-path generation process through the proposed novel training process. Here we compare the run time of PM-HGNN and PM-HGNN++ for personalised meta-path generation. We report their actual run times in seconds and calculate the run time of PM-HGNN++ relative to PM-HGNN (i.e., relative time). The results on the IMDb and DBLP datasets are shown in Fig. 8. PM-HGNN++ takes less than \(5\%\) of PM-HGNN’s run time on both datasets. Such results exhibit the promising time efficiency of PM-HGNN++.

6 Conclusions and future work

We have studied the HGRL problem in this paper and identified a limitation of existing HGRL methods, namely their dependency on hand-crafted meta-paths. In order to fully unleash the power of HGRL, we presented a novel framework, PM-HGNN, and proposed an extension, PM-HGNN++. Compared with existing HGRL models, the most significant advantages of our framework lie in avoiding manual effort in defining meta-paths and in generating personalised meta-paths for each individual node. The experimental results demonstrated that our framework generally outperforms the competing approaches and discovers useful meta-paths that have been ignored by human expertise. In the future, we plan to extend our framework to other tasks on HINs, such as online recommendation and knowledge graph completion; understanding the meta-paths generated by PM-HGNN is another promising direction.