Personalised Meta-path Generation for Heterogeneous GNNs

Recently, increasing attention has been paid to heterogeneous graph representation learning (HGRL), which aims to embed the rich structural and semantic information of heterogeneous information networks (HINs) into low-dimensional node representations. To date, most HGRL models rely on hand-crafted meta-paths. However, the dependency on manually defined meta-paths requires domain knowledge, which is difficult to obtain for complex HINs. More importantly, the pre-defined or generated meta-paths of all existing HGRL methods are attached to each node type or node pair and cannot be personalised to each individual node. To fully unleash the power of HGRL, we present a novel framework, Personalised Meta-path based Heterogeneous Graph Neural Networks (PM-HGNN), to jointly generate meta-paths that are personalised for each individual node in a HIN and learn node representations for a target downstream task such as node classification. Specifically, PM-HGNN treats meta-path generation as a Markov Decision Process and utilises a policy network to adaptively generate a meta-path for each individual node while simultaneously learning effective node representations. The policy network is trained with deep reinforcement learning by exploiting the performance improvement on a downstream task. We further propose an extension, PM-HGNN++, to better encode relational structure and accelerate training during meta-path generation. Experimental results reveal that both PM-HGNN and PM-HGNN++ significantly and consistently outperform 16 competing baselines and state-of-the-art methods in various settings of node classification. Qualitative analysis also shows that PM-HGNN++ can identify meaningful meta-paths overlooked by human knowledge.


Introduction
The complex interactions in real-world data, such as social networks, biological networks and knowledge graphs, can be modelled as Heterogeneous Information Networks (HINs) [1], which are commonly associated with multiple types of nodes and relations. Take the academia HIN depicted in Fig. 1-(a) as an example; it involves 4 types of nodes, including Papers (P), Authors (A), Institutions (I) and publication venues (V), and 8 types of relations. Due to the capability of HINs to depict the complex inter-dependency between nodes, they have attracted increasing attention in the research community and have been applied in the fields of relational learning [2], recommender systems [3], information retrieval [4], etc. However, the complex semantics and non-Euclidean nature of HINs make them difficult to handle with conventional machine learning algorithms designed for tabular or sequential data. Over the past decade, a significant line of research on HINs is Heterogeneous Graph Representation Learning (HGRL). The goal of HGRL is to learn latent node representations, which encode complex graph structure and multi-typed nodes, for downstream tasks, including link prediction [4], node classification [5] and node clustering [6]. As discussed in a recent survey [7], one paradigm of HGRL is to manually define and use meta-paths to model a HIN's rich semantics and to leverage random walks to transform the graph structure into a set of sequences [8][9][10], which can be further exploited by shallow embedding learning algorithms [11,12]. A meta-path scheme is defined as a sequence of relations over the HIN's network schema. For instance, an illustrative meta-path in the academia HIN in Fig. 1-(a) is A −Cite→ A −Write→ P.
Some follow-up shallow HGRL models try to avoid the requirement of manually defined meta-paths by developing jump-and-stay random walk strategies [13], performing random walks with the guidance of node contexts [14], or switching to utilising the network schema [15]. Nevertheless, these "shallow" methods neither support end-to-end training to learn more effective representations for a specific task nor fully utilise node attributes, due to the limitations of the embedding algorithms.
Recently, in view of the impressive success of Graph Neural Networks (GNNs) [16,17], the second paradigm of HGRL attempts to devise Heterogeneous Graph Neural Networks (HGNNs) [6,18,19], which extend various graph convolutions to HINs. Compared with "shallow" HGRL methods, HGNNs support an end-to-end training mechanism that can learn node representations with the assistance of a few labelled nodes, and are also empowered by more complex encoders instead of shallow embedding learning methods. HGNNs can model both structure and node attributes in HINs with the guidance of meta-paths. However, they still rely on hand-crafted meta-paths to explicitly model the semantics of HINs, and obtaining meaningful and useful meta-paths to guide HGNNs remains highly non-trivial.
More precisely, existing meta-path guided HGRL methods simply assume that nodes of the same type share the same meta-paths. Take the academia HIN as an example (Fig. 1-(a)) and assume we plan to learn node representations to determine the research areas of Authors. A meta-path Ω1: A −Write→ P −Published→ V may be useful for learning the representation of a senior researcher, since his/her published papers and publication venues may provide sufficient information to decide his/her research area. But when learning the representation of a junior PhD candidate with just a few published papers, we may need to extract information from his/her collaborators following a collaborator-centred meta-path such as A −Cite→ A −Write→ P, because Ω1 retains little information in the case of junior PhD candidates. Hence, we argue that we should generate a personalised meta-path for each "individual node" according to its attributes and neighbouring relational structure, instead of giving each "node type" several pre-defined meta-paths in general. Motivated by the outstanding success of Reinforcement Learning (RL) in strategy selection problems [20], previous methods attempt to apply RL techniques to find paths between given node pairs that model the similarity between the two nodes [4,21,22]. The found paths are then fed into an encoder to learn representations for pairwise tasks like link prediction. Nevertheless, challenges remain in designing personalised meta-paths of individual nodes for node-wise tasks. We therefore build on meta-path generation with an RL agent to address HGNNs' dependency on hand-crafted meta-paths. Compared with experts with domain knowledge, the RL agent can adaptively generate personalised meta-paths for each individual node in terms of a specific task/HIN through sequential exploration and exploitation. That said, the obtained meta-paths are no longer tied to specific node types but are personalised for each individual node.
Both graph structure and node attributes are considered in the metapath generation process, and it is practicable for HINs with complex semantics.
As illustrated in Fig. 2, the meta-path generation process can be naturally considered as a Markov Decision Process (MDP), in which the next relation used to extend the meta-path depends on the current state of the meta-path. Moreover, an HGNN model is proposed to learn node representations from the derived meta-paths that can be applied to a downstream task, such as node classification. We propose to employ a policy network (agent) to solve this MDP and use an RL algorithm enhanced by a reward function to train the agent. The reward function is defined as the performance improvement over historical performance, which encourages the RL agent to achieve better and more stable performance. In addition, we find that there exists a large computational redundancy during information aggregation along meta-paths; thus we develop an efficient strategy to achieve redundancy-free aggregation.
We showcase an instance of our framework, PM-HGNN, by implementing it with a classic RL algorithm, Deep Q-learning [20]. Besides, we further propose an extension, PM-HGNN++, to deal with PM-HGNN's issues of high computational cost and of ignoring relational information. Specifically, PM-HGNN generates a meta-path for each node according to node attributes only, while PM-HGNN++ further enables the meta-path generation to explore the structural semantics of the HIN. PM-HGNN++ is able to not only significantly accelerate the HGRL process but also improve the effectiveness of the learned node representations, with promising performance on downstream tasks.
Main Contributions. We summarise our contributions below:
• We present a framework, PM-HGNN, to learn node representations in a HIN without hand-crafted meta-paths. An essential novelty of PM-HGNN is that the generated meta-paths are personalised to every individual node rather than general to each node type.
• We propose an attention-based redundancy-free mechanism to reduce redundant computation during heterogeneous information aggregation on the derived meta-path instances.
• We further develop an extension of PM-HGNN, PM-HGNN++, which not only improves the meta-path generation by incorporating node attributes and relational structure but also accelerates the training process.
• Experiments conducted on node classification tasks under unsupervised and (semi-)supervised settings exhibit that our framework can significantly and consistently outperform 16 competing methods (up to 5.6% Micro-F1 improvement). Advanced studies further reveal that PM-HGNN++ can identify meaningful meta-paths that human experts have ignored.

Related work
Relational Learning. In the past decades, research focused on frameworks that could represent a variable number of entities and the relationships that hold amongst them. The interest in learning with this expressive representation formalism soon resulted in the emergence of a new subfield of machine learning described as relational learning [2,23]. For instance, TILDE [24] learns decision trees within inductive logic programming systems. Serafino et al. [25] proposed an ensemble-learning-based relational learning model for multi-type classification tasks in HINs. Petkovic et al. [26] proposed a relational feature ranking method based on gradient-boosted relational trees. Lavrac et al. [27] presented a unifying methodology combining propositionalisation and embedding techniques, which benefits from the advantages of both in solving complex relational learning tasks. Nevertheless, most of these methods are not built on neural networks, which leaves them behind in automatically mining complex HINs.
Graph Neural Networks. Existing GNNs generalise the convolutional operations of deep neural networks to deal with arbitrary graph-structured data.
Generally, a GNN model can be regarded as using the input graph structure to generate the computation graph of nodes for message passing; local neighbourhood information is aggregated to obtain more effective contextual node representations [16,17,28,29]. However, complex graph-structured data in the real world are commonly associated with multiple types of objects and relations. All of the GNN models mentioned above assume homogeneous graphs, so it is difficult to apply them to HINs directly.
Heterogeneous Graph Representation Learning. HGRL aims to project nodes in a HIN into a low-dimensional vector space while preserving the heterogeneous node attributes and edges. A recent survey presents a comprehensive overview of HGRL [7], covering shallow heterogeneous network embedding methods [8,9,30], and heterogeneous GNN-based approaches that are empowered by rather complex deep encoders [3,5,6,18,19,21,31]. The "shallow" methods are characterised as an embedding lookup table, meaning that they directly encode each node in the network as a vector, and this embedding table is the parameter to be optimised. However, they cannot utilise the node attributes and do not support the end-to-end training strategy. On the other hand, inspired by the recent outstanding performance of GNN models, some studies have attempted to extend GNNs for HINs. R-GCNs [18] keep a distinct linear projection weight for each relation type. HetGNN [19] adopts different recurrent neural networks for different node types to incorporate multi-modal attributes. HAN [5] extends GAT [17] by maintaining weights for different meta-path-defined edges. MAGNN [6] defines meta-path instance encoders, which are used to extract the structural and semantic information ingrained in the meta-path instances. However, all of these models require manual effort and domain expertise to define meta-paths in order to capture the semantics underlying the given heterogeneous graph.
A recent model, HGT [31], attempts to avoid the dependency on hand-crafted meta-paths by devising transferable relation scores, but its exploration range is limited by the number of layers, and it introduces a large number of additional parameters to optimise. GTN [32] selects meta-paths from a group of adjacency matrices with learnable weights. The weights are shared among all nodes and are thus not flexible enough to generate node-specific meta-paths for each individual node. In addition, FSPG [21], AutoPath [22] and MPDRL [4] attempt to employ RL techniques to discover paths between pairs of nodes and further learn node representations for predicting the existence of edges between node pairs. They assume the found paths explicitly represent the similarity between two nodes. However, they can only identify meta-paths that describe two nodes' similarity, rather than generating meta-paths for individual nodes to learn their representations for node-wise tasks. Moreover, some work [33,34] concerns the discovery of frequent patterns in a HIN and the subsequent transformation of these patterns into rules, aka rule mining, but the found patterns are not designed for specific tasks or nodes. Consequently, we believe it is necessary to develop a new HGRL framework that supports the adaptive generation of personalised meta-paths for each node in a HIN for node-wise tasks.
Discussion. Table 8 summarises the key advantages of PM-HGNN and compares it with a number of recent state-of-the-art methods. PM-HGNN is the first HGRL model that can adaptively generate personalised meta-paths for each individual node to support node-wise tasks and maintain the end-to-end training mechanism.

Problem Statement
Definition 1. (Heterogeneous Information Network): A HIN is defined as a directed graph G = (V, E, N, R), associated with a node type mapping function φ : V → N and a relation type mapping function ϕ : E → R, where N and R are the sets of node and relation types, respectively. Node v_i's attribute vector is denoted as x_i ∈ R^λ (with dimensionality λ).
Definition 2. (Meta-path): Given a HIN G, a meta-path Ω with length T is defined as Ω : ω_0 −r_1→ ω_1 −r_2→ · · · −r_T→ ω_T, where ω_j ∈ N denotes a node type and r_j ∈ R denotes a relation type.
Definition 3. (Meta-path Instance): Given a meta-path Ω, a meta-path instance p is defined as a node sequence following the schema defined by Ω.
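To make Definitions 2 and 3 concrete, the following is a minimal sketch of enumerating the instances of a given meta-path in a toy HIN. The dictionary-based graph encoding and the node/relation names are illustrative assumptions, not the paper's data structures.

```python
# Minimal sketch: enumerating meta-path instances in a toy HIN.
# A meta-path schema is given as a sequence of relation types; every
# node sequence that follows the schema is one meta-path instance.

def meta_path_instances(start, schema, edges):
    """Return all node sequences that follow the relation schema.

    start:  starting node id
    schema: list of relation types, e.g. ["Write", "Published"]
    edges:  dict mapping (node, relation) -> list of neighbour nodes
    """
    paths = [[start]]
    for rel in schema:
        # extend every partial path by one hop of the required relation
        paths = [p + [nxt]
                 for p in paths
                 for nxt in edges.get((p[-1], rel), [])]
    return paths

# Toy academia HIN: author A1 wrote P1 and P2; only P1 appeared at V1.
edges = {
    ("A1", "Write"): ["P1", "P2"],
    ("P1", "Published"): ["V1"],
}
print(meta_path_instances("A1", ["Write", "Published"], edges))
# -> [['A1', 'P1', 'V1']]
```

Note that partial paths that cannot be extended by the next relation (here, A1 → P2) are dropped, so only complete instances of the schema survive.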
Problem: Heterogeneous Graph Representation Learning. We formulate heterogeneous graph representation learning (HGRL) as an information integration optimisation problem. For a given HIN G, let f : (V, E) → R^d be a function that integrates information from node attributes and network structure into node representations. Without manually specifying any meta-paths, we aim to jointly generate ρ (e.g., ρ = 1) meta-paths {Ω_j}_{j=1}^{ρ} for each node v_i ∈ V to guide f to encode the rich structural and semantic information in the HIN, and accordingly learn representations for all nodes in G.

Markov Decision Process
Markov Decision Process (MDP) is an idealised mathematical form to describe a sequential decision process that satisfies the Markov Property [35]. An MDP can be formally represented with the quadruple (S, A, P, R), where S is a finite set of states, A is a finite set of actions, and P : S × A → (0, 1) is a decision policy function to identify the probability distribution of the next action according to the current state. Specifically, the policy encodes the state and the available action at step t to output a probability distribution P(a t |s t ), where s t ∈ S and a t ∈ A. R is a reward function R : S × A → R, evaluating the result of taking action a t on the observed state s t .
Modelling HGRL with MDP. As illustrated in Fig. 2, the meta-path generation process of HGRL can be naturally modelled as an MDP. To generate a meta-path with a maximum of 3 steps for v_{A-1}, take the first step as an example: the state s_1 is the identifiable information of v_{A-1}, and the action set includes the relations in the HIN, i.e., {Work, Cite, . . . }. The decision maker selects one relation from the action set to extend the meta-path as a_1 = argmax_{a∈A} P(a | s_1). Then, the selected meta-path is fed into the HGNN to learn node representations, which are applied to the downstream task to obtain a reward score R(s_1, a_1) that can be used to update P. We refer to Sec. 4 for more details about modelling HGRL with MDP.
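One decision step of this MDP can be sketched as follows; the policy score table is an illustrative stand-in for a learned policy P(a | s), and the relation names are taken from the example above.

```python
# Toy sketch of one MDP decision step in meta-path generation: the
# policy scores each candidate relation for the current state, and the
# argmax relation extends the meta-path.  The probabilities below are
# illustrative stand-ins for a learned policy P(a | s_1).

def extend_meta_path(meta_path, policy_scores):
    """Pick the relation a_1 = argmax_a P(a | s_1) and append it."""
    best_relation = max(policy_scores, key=policy_scores.get)
    return meta_path + [best_relation]

state_scores = {"Work": 0.1, "Cite": 0.6, "Write": 0.3}  # P(a | s_1)
print(extend_meta_path(["Author"], state_scores))
# -> ['Author', 'Cite']
```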

Solving MDP with Reinforcement Learning
Deep Reinforcement Learning (RL) is a family of algorithms that optimise the MDP with deep neural networks. At each step t, the RL agent takes action a_t ∈ A based on the current state s_t ∈ S, and observes the next state s_{t+1} as well as a reward r_t = R(s_t, a_t). Looking at the definition of MDP, the agent acts as the decision policy P. We aim to search for the optimal decisions that maximise the expected discounted cumulative reward, i.e., we aim to learn an agent network
π* = argmax_π E[ Σ_{t=0}^{T} γ^t R(s_t, a_t) ],   (1)
where T is the maximum number of steps, and γ ∈ [0, 1] is a discount factor balancing short-term and long-term gains; smaller γ values place more emphasis on immediate rewards [36].
Existing RL algorithms are mainly classified into two families: model-based and model-free algorithms. Compared with model-based algorithms, model-free algorithms offer better flexibility, as model-based methods always risk suffering from model understanding errors, which in turn affect the learned policy [36]. We adopt a classic model-free RL algorithm, Deep Q-learning (DQN) [20]. The basic idea of DQN is to estimate the action-value function Q* via the Bellman equation [20] as an iterative update, based on the following intuition: if the optimal value Q*(s_t, a_t) of the state s_t were known for all possible actions a_t ∈ A, then the optimal policy would select the action maximising the expected value R(s_t, a_t) + γQ*(s_{t+1}, a_{t+1}). It is common to use a function approximator Q to estimate Q* [20]:
Q(s_t, a_t; θ) ≈ Q*(s_t, a_t) = E[ R(s_t, a_t) + γ max_{a_{t+1}∈A} Q*(s_{t+1}, a_{t+1}) ],
where θ stands for the trainable parameters of the neural network used to estimate the decision policy. A value iteration algorithm will approach the optimal action-value function, i.e., Q → Q*, as t → ∞.
DQN exploits two techniques to stabilise training: (1) a memory buffer D that stores the agent's experience in a replay memory, which can be accessed to perform weight updates; (2) a separate target Q-network Q̂ that generates the targets for Q-learning and is periodically updated.
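As a rough illustration of these two tricks, here is a minimal DQN sketch with a linear Q-approximator. The class interface, the toy environment, and all hyper-parameters are our own assumptions for illustration, not the paper's implementation.

```python
import random
import numpy as np

# Minimal DQN sketch illustrating the two stabilisation tricks from the
# text: a replay memory D and a separate, periodically-synced target
# Q-network.  A linear approximator stands in for a deep Q-network.

class DQN:
    def __init__(self, state_dim, n_actions, gamma=0.9, lr=0.01):
        self.theta = np.zeros((n_actions, state_dim))   # training Q-network
        self.theta_target = self.theta.copy()           # frozen target Q-network
        self.gamma, self.lr = gamma, lr
        self.memory = []                                # replay buffer D

    def q(self, s, theta=None):
        W = self.theta if theta is None else theta
        return W @ s                                    # Q(s, a) for every action a

    def act(self, s, eps=0.1):
        if random.random() < eps:                       # epsilon-greedy exploration
            return random.randrange(len(self.theta))
        return int(np.argmax(self.q(s)))

    def store(self, s, a, s_next, r):
        self.memory.append((s, a, s_next, r))           # experience replay memory

    def update(self, batch_size=8):
        batch = random.sample(self.memory, min(batch_size, len(self.memory)))
        for s, a, s_next, r in batch:
            # Bellman target computed with the frozen target network
            target = r + self.gamma * np.max(self.q(s_next, self.theta_target))
            td_error = target - self.q(s)[a]
            self.theta[a] += self.lr * td_error * s     # SGD step on squared TD error

    def sync_target(self):
        self.theta_target = self.theta.copy()           # periodic target update

# Toy demo: action 0 always yields reward 1, action 1 yields 0, so the
# agent should learn Q(s, 0) > Q(s, 1).
random.seed(0)
agent = DQN(state_dim=2, n_actions=2)
s = np.ones(2)
for step in range(200):
    a = agent.act(s, eps=0.3)
    agent.store(s, a, s, float(a == 0))
    agent.update()
    if step % 20 == 0:
        agent.sync_target()
```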

The PM-HGNN Framework
We present the overview of the proposed PM-HGNN in Fig. 3-(a), which consists of two components: the RL agent module and an HGNN module. According to states, the RL agent aims at predicting actions for each individual node to arrive at better rewards. Next, we generate meta-path instances based on generated personalised meta-paths to support the information aggregation of HGNN to learn effective node representations. Finally, we apply generated representations on the downstream task for performance evaluation to obtain reward scores and save states, actions, and reward scores into the RL agent for the subsequent updating.

Personalised Meta-path Generation with RL
A personalised meta-path generation process with a maximum of T steps for each node can be modelled as a T-round decision-making process, which can naturally be treated as an MDP. We elaborate on the alignment between each MDP component and the personalised meta-path generation process in the following.
State (S): The state is a vector used to assist the decision policy in selecting a relation type to extend the personalised meta-path of each node. Hence, it is crucial to comprehensively encode the existing part of a meta-path into the state. We adopt a gating mechanism to adaptively update the state. Taking v_i's meta-path Ω (starting from node type N_{v_i}) as an example, the state s_t of Ω at step t is formally defined as:
s_t = q ∘ (1/|D(v_i)|) Σ_{j∈D(v_i)} x_j + (1 − q) ∘ s_{t−1},   (2)
where ∘ stands for the Hadamard product, D(v_i) represents the set of past nodes at step t, and (1/|D(v_i)|) Σ_{j∈D(v_i)} x_j is the average of the past nodes' attribute vectors; s_{t−1} stands for the state at step t − 1. q is the update gate that determines whether to update the state with the past nodes' attributes, and we estimate it by exploring the relationship between the past nodes' attributes and the state, i.e., q = σ(f_φ(s_{t−1}, (1/|D(v_i)|) Σ_{j∈D(v_i)} x_j)), where f_φ can be seen as a shared convolutional kernel [17] and 1 − q is the reset gate.
Action (A): The action space is the set of relation types the policy network can choose from to extend the meta-path, and each relation type is labelled with a positive integer. Note that we add a special action STOP to allow each node to have a flexible-length meta-path. Beginning with the starting node v_i, the decision policy iteratively predicts which relation can lead to a higher reward score and accordingly uses it to extend the current meta-path. The decision policy selects the action STOP to finish the generation process when it encounters a state in which adding any further relation to the current meta-path hurts the performance on the downstream task.
Decision Policy (P): The decision policy aims at mapping a state in S into an action in A.
The state space is continuous while the action space is discrete. Thus, we use a deep neural network to approximate the action-value function P(a_t | s_t; θ) : S × A → (0, 1). Besides, since any action a ∈ A is a positive integer, we use DQN [20] to optimise the meta-path generation problem. We utilise an MLP [37] as the decision policy network in the DQN, defined as:
P̂ = σ(W_M · ReLU(· · · ReLU(W_1 · s_t + b_1) · · ·) + b_M),
where W_m and b_m denote the weight matrix and bias vector of the m-th layer's perceptron, respectively. The output P̂ ∈ (0, 1) stands for the probabilities of selecting the different relations a_t ∈ A to extend the meta-path. Note that it is possible to adopt other RL algorithms to optimise the policy network; here, we utilise a basic RL algorithm to illustrate our framework's main idea and demonstrate its effectiveness.
Reward Function (R): We devise a reward function to encourage the RL agent to achieve better and more stable performance on downstream tasks. We define the reward function R as the performance improvement on the specific downstream task compared with the historical performance, given by:
R(s_t, a_t) = M(s_t, a_t) − (1/b) Σ_{j=t−b}^{t−1} M(s_j, a_j),   (3)
where the second term is the baseline performance value at step t, i.e., the average of the historical performances over the last b steps. M(s_t, a_t) is the performance based on the learned node representation H_t[v_i] on the downstream task (e.g., node classification); we use accuracy on the validation set as the evaluation performance M.
Optimisation. The proposed meta-path generation at step t consists of three phases: (i) obtaining state s_t; (ii) predicting an action a_t = argmax_{a∈A} Q(s_t, a; θ) to extend the meta-path according to the current state s_t; (iii) updating state s_t to s_{t+1}. Moreover, we train the policy network (Q-function) by optimising Eq. 1 with the reward function as defined in Eq.
3, and the loss function is given by:
L(θ) = E_{T∼D} [ (R(s_t, a_t) + γ max_{a_{t+1}∈A} Q̂(s_{t+1}, a_{t+1}; θ−) − Q(s_t, a_t; θ))² ],   (4)
where T = (s_t, a_t, s_{t+1}, R(s_t, a_t)) is a transition randomly sampled from the replay memory buffer D, θ− is the set of parameters of the separate target Q-network Q̂, R(s_t, a_t) + γ max_{a_{t+1}∈A} Q̂(s_{t+1}, a_{t+1}; θ−) stands for the target value, and Q(s_t, a_t; θ) is the value predicted by the training Q-network. We optimise the Q-network's parameters by minimising this loss function.
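The squared temporal-difference error inside this loss can be sketched for a single sampled transition as follows; the numeric values are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch of the squared TD-error loss used to train the Q-network,
# computed for one transition sampled from the replay buffer.

def q_loss(reward, q_next_target, q_pred_sa, gamma=0.9):
    """(R(s_t, a_t) + gamma * max_a' Q_target(s_{t+1}, a') - Q(s_t, a_t))^2."""
    target = reward + gamma * np.max(q_next_target)    # frozen-network target
    return (target - q_pred_sa) ** 2

loss = q_loss(reward=1.0, q_next_target=np.array([0.2, 0.5]), q_pred_sa=0.8)
# target = 1.0 + 0.9 * 0.5 = 1.45, so loss = (1.45 - 0.8)^2 = 0.4225
```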

Information Aggregation with Personalised Meta-paths
Due to the heterogeneity of nodes in a HIN, different node types have different attribute spaces. Before information aggregation, we first apply a node type-specific transformation to project the features of different node types into the same space:
H_0[v_i] = W_{ω_i} · x_i,
where x_i and H_0[v_i] are the original attributes and projected features of node v_i, and W_{ω_i} is a learnable transformation matrix for node type ω_i = φ(v_i). Next, we perform information aggregation using the generated meta-path instances to learn effective node representations for the downstream task. Take node v_i as an example, and assume its obtained meta-path is Ω. Fig. 4 illustrates the aggregation structure improved by our redundancy-free computation method, which uses attention scores to distinguish messages from different nodes.
We first generate meta-path instances {p_1, p_2, . . . } based on meta-path Ω. As the example in Fig. 4 shows, there can be many meta-path instances that follow meta-path Ω (e.g., Ω : Paper ). An intuitive approach to information aggregation is sequential aggregation, as in HetGNN [19] and HAN [5]. But we argue that these aggregation paths contain computational redundancy, and the aggregation can be further improved. For example, P3 → A2 and P5 → A2 are repeatedly calculated in the first aggregation step. We aim to merge these two instances into one process {P3, P5} → A2 to reduce redundant computation, termed redundancy-free aggregation. Besides, we learn an attention score Att(v_i, v_j) for each link in the aggregation path, where nodes v_i and v_j are the two ends of the link, so that messages from different nodes can be distinguished from each other.
Let H_0[v_i] be the projected features of node v_i involved in the instance set {p_1, p_2, . . . } of meta-path Ω. The updating procedure of H[v_i] is:
H_l[v_i] = Aggregate({Att(v_i, v_j) · H_{l−1}[v_j] | v_j ∈ I(v_i)}),   (5)
where I(v_i) indicates the set of past nodes of v_i in {p_1, p_2, . . . }, l ∈ {1, 2, . . . , t} is the id of the aggregator performing the information aggregation, and Aggregate(·) = ReLU(Mean(·)). The operator Att(·) calculates the importance of the received messages using the relation between past messages and node features:
Att(v_i, v_j) = softmax(LeakyReLU(W · [H[v_i] ∥ H[v_j]])),
where W are the trainable parameters and ∥ is the concatenation operator. Sec. 5.4 presents an empirical study on how redundancy-free aggregation affects time efficiency.
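The redundancy-free idea can be sketched as follows: instead of walking each instance separately, instances are grouped by their receiving node so each merged hop is computed once. Attention scores are replaced by uniform mean weights for brevity, and the node names and scalar features are illustrative.

```python
from collections import defaultdict

# Sketch of one redundancy-free aggregation step: shared hops such as
# P3 -> A2 and P5 -> A2 are merged into a single computation
# {P3, P5} -> A2 instead of being recomputed per instance.

def redundancy_free_step(instances, features):
    """One aggregation step over the last hop of every instance.

    instances: node-id paths, e.g. [["A1", "A2", "P3"], ["A1", "A2", "P5"]]
    features:  dict node-id -> scalar feature
    Returns the aggregated (mean) message for each receiving node.
    """
    senders = defaultdict(set)            # receiver -> unique sender set
    for path in instances:
        senders[path[-2]].add(path[-1])   # {P3, P5} -> A2 merged into one step
    return {recv: sum(features[s] for s in srcs) / len(srcs)
            for recv, srcs in senders.items()}

feats = {"P3": 2.0, "P5": 4.0}
print(redundancy_free_step([["A1", "A2", "P3"], ["A1", "A2", "P5"]], feats))
# -> {'A2': 3.0}
```

Deduplicating senders per receiver is what removes the repeated P → A computations; a full implementation would weight each sender by Att(v_i, v_j) instead of the uniform mean used here.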

Model Training
Training of HGNN. Finally, the updated node representation H_t[v_i] can be applied to downstream tasks. Taking semi-supervised node classification as an example, with a small fraction of labelled nodes, we can optimise the HGNN's parameters by minimising the cross-entropy loss L via back-propagation and gradient descent:
L = − Σ_{v∈V_L} Σ_{c=1}^{C} y_v[c] · log(H_t[v][c]),
where V_L stands for the set of labelled nodes, C is the number of classes, y_v is the one-hot label vector of node v, and H_t[v] is the learned representation vector of node v. Meanwhile, the learned node representations can be evaluated with a task-specific evaluation metric M_eval to obtain a reward score. We summarise the PM-HGNN framework in Algorithm 1; its main loop can be sketched as follows: at each training step, sample a batch of nodes V_j from V; for each v_i ∈ V_j, obtain state s_t[v_i] via Eq. 2 and predict an action to extend its meta-path; perform information aggregation on the meta-path instances p_t with the trained HGNN via Eq. 5; obtain the learned node representations H_t; obtain R(s_t, a_t) on the validation set via Eq. 3; store the quadruple T = (s_t, a_t, s_{t+1}, R(s_t, a_t)) into D; and optimise the Q-function using the data in D via the loss function in Eq. 4. Note that the PM-HGNN framework maintains the ability of HGNNs to learn representations for nodes newly added to the graph: the trained RL agent can adaptively generate a meta-path for a new node to compute its representation.
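The semi-supervised objective above can be sketched as follows; treating the node representations H directly as class logits is a simplification for illustration, and the numeric values are our own.

```python
import numpy as np

# Sketch of the semi-supervised cross-entropy objective: the loss is
# averaged only over the labelled node set V_L; unlabelled nodes do not
# contribute.

def semi_supervised_ce(H, labels, labelled_ids):
    """Mean cross-entropy over labelled nodes only."""
    total = 0.0
    for v in labelled_ids:
        logits = H[v]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                  # softmax over classes
        total -= np.log(probs[labels[v]])     # -log p(true class)
    return total / len(labelled_ids)

H = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # 3 nodes, 2 classes
labels = {0: 0, 1: 1}                                # node 2 is unlabelled
print(semi_supervised_ce(H, labels, labelled_ids=[0, 1]))
# -> about 0.1269
```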

PM-HGNN++
As seen in previous sections, PM-HGNN adopts an RL agent to adaptively generate meta-paths for HGRL. However, reviewing the overall workflow of PM-HGNN, we identify two limitations. (1) PM-HGNN neglects the relational structure of the HIN while generating meta-paths; (2) the HGNN requires a large number of epochs to train its parameters and obtain node representations, which restricts the efficiency of PM-HGNN. To be more specific, PM-HGNN generates meta-paths by only considering the node attribute information from the HIN, i.e., state s_t summarises the attributes of nodes involved in the meta-path as in Eq. 2. However, the HIN's semantic structure is also important information for assisting meta-path generation. In addition, similar to other deep neural networks [16], the encoder of PM-HGNN needs a number of epochs to train its parameters to learn effective node representations for a set of meta-path instances. In this way, if the HGNN needs B epochs to train its parameters, then we need T × B epochs to complete a meta-path generation process with a maximum of T steps. Meanwhile, the RL agent also needs many explorations to obtain the numerous samples (T = (s_t, a_t, s_{t+1}, R(s_t, a_t))) required to train the policy network, which can result in a combinatorial-scale explosion. Finally, following the average aggregation approach to defining node states in Eq. 2, PM-HGNN is not able to distinguish node states with categorical attributes.
New State (S). To address the identified limitations, we propose an extension of PM-HGNN, PM-HGNN++, which can utilise the structure information of the HIN and significantly accelerate the meta-path generation process. Our solution is to define a novel state instead of Eq. 2, because we notice that the encoder (i.e., HGNN) of PM-HGNN can summarise the node attributes and topological structure information involved in meta-paths to assist meta-path generation. In particular, we utilise the latest representation vector H_{t−1}[v_i] of the meta-path's starting node v_i as the state. The new state (S) can be formally described as:
s_t = Normalise(H_{t−1}[v_i]),   (7)
where Normalise is a normaliser trained on the first B generated states to convert the states into the same distribution. Hence, the input is normalised as (H[v_i] − H_mean)/H_std, where H_mean and H_std are calculated from the first B states. Note that the normaliser is a common trick in deep RL for stabilising training when S is very sparse [38]. In addition, H_{t−1}[v_i] contains not only node attributes but also each node's relevant graph structure information; hence s_t of Eq. 7 can distinguish different nodes with categorical attributes. This endows PM-HGNN++ with the ability to handle graphs with diverse node attributes.
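The normaliser can be sketched as follows: statistics are fitted on the first B generated states and then applied to every later state as (H[v_i] − H_mean)/H_std. The value of B and the toy states are illustrative.

```python
import numpy as np

# Sketch of the state normaliser in Eq. 7: mean/std statistics are
# fitted on the first B generated states, then every later state is
# standardised with those frozen statistics.

class StateNormaliser:
    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, first_b_states):
        states = np.stack(first_b_states)
        self.mean = states.mean(axis=0)
        self.std = states.std(axis=0) + 1e-8     # avoid division by zero

    def __call__(self, state):
        return (state - self.mean) / self.std    # (H - H_mean) / H_std

norm = StateNormaliser()
norm.fit([np.array([0.0, 2.0]), np.array([2.0, 4.0])])   # B = 2 warm-up states
print(norm(np.array([1.0, 3.0])))
# -> [0. 0.]  (the mean state maps to zero)
```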
New Training Process. The training process can be further improved based on the new definition of the state. The learned node representations can be further used to optimise the HGNN's parameters and update the RL agent. We achieve this kind of mutual optimisation between the RL agent and the HGNN since the meta-paths are generated based on the current status of the HGNN. With this novel training process, we only need T epochs, instead of T × B, to complete a meta-path generation procedure, because we do not need to wait for the encoder to complete an entire training process. We summarise PM-HGNN++ in Algorithm 2. We perform an empirical analysis comparing the time consumption of PM-HGNN and PM-HGNN++ in Sec. 5.4. Note that, similar to PM-HGNN, the PM-HGNN++ framework can adaptively generate a meta-path for a new node to compute its representation, which maintains the ability of HGNNs to learn representations for newly added nodes.

Model Analysis
The proposed models can deal with various types of nodes and relations and fuse rich semantics in HINs. Benefiting from the RL agent, personalised meta-paths are generated for different nodes, and the HGNN encoder allows information to be transferred between nodes via diverse relations. For the RL agent adopted in this paper, i.e., Deep Q-learning (DQN), it is hard to give a precise computational complexity [39]; we therefore report the empirical meta-path generation time on the IMDb and DBLP datasets in Sec. 5.4 (Fig. 8).

[Algorithm 2 excerpt: at each epoch k, set step t = mod(k, T) + 1; sample a batch of nodes V_j from V; obtain the learned node representations H_t; obtain R(s_t, a_t) on the validation set via Eq. 3; store the quadruple T = (s_t, a_t, s_{t+1}, R_t) into D; optimise the Q-function using the data in D via the Q-loss (Eq. 4).]

Given a meta-path Ω and dimensionality d of the (hidden) node representations, the time complexity of the node representation learning process is O(V_Ω d^2 + E_Ω d), where V_Ω is the number of nodes following the meta-path Ω and E_Ω is the number of meta-path based node pairs. O(V_Ω d^2) accounts for the representation transformation process, and O(E_Ω d) for the Att(·) computation over the relevant edges. To further unfold the relationship between the complexity of the generated meta-paths and the performance, we also report how the maximum step T and the number of aggregation paths affect node classification performance on the IMDb dataset in Sec. 5.4 (Table 6).
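The replay-buffer update of the Q-function (store transitions (s_t, a_t, s_{t+1}, R_t) into D, then optimise against a one-step TD target) can be sketched as follows. This is an illustrative tabular sketch, not the paper's method: the paper uses a neural Q-function, and the discount factor, learning rate and two-action space here are assumed values.

```python
import random
from collections import deque

GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.5  # learning rate (assumed value)

def q_update(q_table, replay, batch_size=2, rng=random):
    """One optimisation step: sample transitions (s, a, s', r) from the
    replay buffer and move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    batch = rng.sample(list(replay), min(batch_size, len(replay)))
    for s, a, s_next, r in batch:
        # Two illustrative actions (0, 1); the real action space is the
        # set of extendable relations plus STOP.
        target = r + GAMMA * max(q_table.get((s_next, b), 0.0) for b in (0, 1))
        old = q_table.get((s, a), 0.0)
        q_table[(s, a)] = old + ALPHA * (target - old)
    return q_table
```

A single stored transition with reward 1.0 pulls Q(s, a) halfway towards the target under the assumed learning rate.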

Experimental Settings
Datasets. We adopt three HIN datasets (IMDb, DBLP and ACM) [5,6] from different domains to evaluate our models' performance. Statistics are summarised in Table 4 and the meta-relation schemas are shown in Fig. 9. Detailed descriptions of the datasets are given in Appendix A.
Evaluation Settings. We evaluate our models on node classification. For the semi-supervised setting, we adopt the same settings as MAGNN [6]: 400 instances for training, another 400 for validation, and the remaining nodes for testing. The generated node representations are fed into a support vector machine (SVM) classifier to obtain the predictions. For the supervised setting, we use the same percentage of nodes for training and validation, respectively, and the remaining nodes for testing. We report the average Micro-F1 and Macro-F1 scores over 10 runs with different seeds.

Semi-supervised Node Classification. We present the results in Table 1, in which competing methods that cannot support the semi-supervised setting use the unsupervised setting. PM-HGNN++ performs consistently better than all competing methods across different proportions and datasets. On IMDb, the performance gain obtained by PM-HGNN++ over the best competing method (MAGNN) is around 3.7%−5.08%. GNN models designed for homogeneous and heterogeneous graphs perform better than shallow HGRL models, which demonstrates that modelling heterogeneous node attributes improves performance significantly. On DBLP and ACM, all models perform better overall than on IMDb. Interestingly, unlike on IMDb, the shallow heterogeneous network methods, i.e., JUST and NSHE, outperform several homogeneous GNNs, suggesting that the heterogeneous relations in DBLP and ACM are useful for the node classification task. PM-HGNN++ outperforms the strongest competing method (i.e., MAGNN) by up to 2.4%, showing its generality and superiority. In addition, among the unsupervised competing methods, NSHE performs best, showing that the network schema of a HIN helps to obtain better representations. Supervised Node Classification.
We present the results in Table 2. PM-HGNN++ consistently outperforms all competing models on the IMDb, DBLP and ACM datasets, with gains of up to 5.6% in Micro-F1. Heterogeneous GNNs outperform homogeneous GNNs, and our PM-HGNN++ achieves the best performance. This demonstrates the usefulness of heterogeneous relations and the advantage of generating appropriate personalised meta-paths for each node according to its attributes and relational structure.
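The evaluation protocol above (learned representations fed to an SVM, scored with Micro/Macro F1) can be sketched as follows. The embeddings and labels below are synthetic stand-ins, not the paper's data; the 400/400 split mirrors the semi-supervised setting.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
emb = rng.normal(size=(800, 16))        # stand-in node representations
labels = (emb[:, 0] > 0).astype(int)    # synthetic, separable labels

# 400 instances for training, the rest for testing (validation omitted here).
train, test = np.arange(400), np.arange(400, 800)
clf = SVC(kernel="linear").fit(emb[train], labels[train])
pred = clf.predict(emb[test])

micro = f1_score(labels[test], pred, average="micro")
macro = f1_score(labels[test], pred, average="macro")
```

In the paper this is repeated over 10 seeds and the scores are averaged; the sketch shows a single run.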

Meta-path Analysis
Meta-path Generation Process Visualisation. We visualise how the RL agent in PM-HGNN++ generates personalised meta-paths for each target node in Fig. 10, Fig. 11 and Fig. 12. Fig. 10 summarises the actions made by the RL agent on IMDb with different max steps under the semi-supervised setting. The percentages marked in the figure represent the fraction of nodes choosing the corresponding relation to extend the meta-path at that step. For example, 64.8% means that 64.8% of Movie nodes select the MA relation at step-1, and 8.8% means that, among the Movie nodes that selected MA at step-1, 8.8% select AM to extend the meta-path. It is interesting to see that the RL agent selects the relation MA at the first step more often than the other option, MD. This suggests that a movie's characteristics are more related to its actors than to its director. Besides, when step T = 2, the RL agent selects plenty of STOP actions. This shows that two short meta-paths, i.e., Movie −MA→ Actor and Movie −MD→ Director, are informative enough for the majority of nodes to learn effective representations. Moreover, 6.7%−9.4% of the nodes do not need any meta-path to learn their representations, implying that their attributes alone provide enough information. This is also reflected in the results of MLP in Sec. 5.2, which utilises no structural information but only node attributes. More analyses of Fig. 11 and Fig. 12 can be found in Appendix B.

Overall Meta-path Statistics. We further present the meta-paths generated for most of the target nodes in Table 3, summarised from Fig. 10. Table 6 and Table 7 present the generated meta-paths on DBLP and ACM, respectively. We can see from Table 3 that the RL agent can generate the same meta-paths as the manually-defined ones. Moreover, we find that the meta-paths specified by human experts are not the most useful ones for learning node representations of Movie nodes.
Two short meta-paths, Movie −MA→ Actor and Movie −MD→ Director, are the most useful ones. This indicates that the participants of a movie (its director and actors) largely determine its type, as the task is to predict the class of a movie. The top three meta-paths generated by the strongest competing method, GTN, confirm our observation that the two shorter meta-paths are more valuable for learning Movie nodes' representations. However, GTN can only select several meta-paths for each node type, whereas our model can identify personalised meta-paths for each node. More analyses of Table 6 and Table 7 can be found in Appendix B.

Meta-path Comparison. To understand how PM-HGNN++ generates personalised meta-paths for different nodes, we present another analysis that investigates the lengths of meta-paths designed for nodes that can be correctly classified from raw node attributes alone. Specifically, we divide the nodes into two groups, pos and neg, according to whether a multi-layer perceptron (MLP) applied to node attributes produces a correct classification under the supervised setting. That is, correctly-classified and incorrectly-classified nodes are grouped into pos and neg, respectively. We then calculate the average length of the meta-paths generated by PM-HGNN++ for the nodes in each group. The results are reported in Fig. 5. We find that nodes that cannot be correctly classified by the MLP are associated with longer personalised meta-paths. This delivers an important insight: when node attributes themselves cannot provide sufficient information, PM-HGNN++ tends to explore the deeper relational neighbouring structure to enhance node representation learning, i.e., to discriminate nodes with different labels from one another.
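The grouping analysis above can be sketched as a small helper; the function name, the dictionary-based inputs and the example relations are our assumptions for illustration.

```python
def average_path_length_by_group(meta_paths, mlp_correct):
    """Average generated meta-path length per group.

    meta_paths:  {node_id: list of relations forming the node's meta-path}
    mlp_correct: {node_id: True if an attribute-only MLP classifies it correctly}
    Returns {"pos": avg length, "neg": avg length}.
    """
    groups = {"pos": [], "neg": []}
    for node, path in meta_paths.items():
        groups["pos" if mlp_correct[node] else "neg"].append(len(path))
    return {g: sum(ls) / len(ls) if ls else 0.0 for g, ls in groups.items()}
```

On the paper's datasets, the neg group (nodes the MLP misclassifies) comes out with the longer average meta-path length.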
Moreover, on the DBLP dataset, which has more complicated heterogeneous semantic patterns (4 node types and 6 relation types), the neg group has relatively longer meta-paths, because PM-HGNN++ has more options to explore when generating personalised meta-paths.

Maximum Step T. We study how the maximum step T and the number of aggregation paths affect the classification performance. The results on the IMDb dataset are presented in Fig. 6. Setting T ≥ 2 leads to the most significant improvement in Micro-F1 over T = 1. As T increases, the number of aggregation paths also increases. Considering that more aggregation paths bring higher computational cost, we choose T = 2 for the experiments in Sec. 5.2, even though T > 2 can produce better performance.

Redundancy-free Aggregation. We investigate the influence of the redundancy-free improvement by comparing the number of aggregations with and without it. Fig. 7 shows the results. Our redundancy-free aggregation strategy significantly reduces the number of information aggregations, and the reduction becomes more pronounced as the maximum step T increases. On IMDb, our method reduces aggregations by nearly 50% with T = 2, and by more than 80% with T = 4.
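The saving from redundancy-free aggregation can be illustrated by counting hop-level aggregations: when several meta-path instances share a prefix, the shared hops only need to be aggregated once. The function below is a simplified counting sketch of this idea, not the paper's aggregation code.

```python
def aggregation_counts(path_instances):
    """Compare naive vs. redundancy-free aggregation counts.

    path_instances: list of meta-path instances, each a list of hops
                    (src, relation, dst).
    Returns (naive, redundancy_free): naive aggregates every hop of every
    instance; redundancy-free aggregates each distinct (step, hop) once.
    """
    naive = sum(len(p) for p in path_instances)
    unique = {(step, hop) for p in path_instances for step, hop in enumerate(p)}
    return naive, len(unique)
```

For two instances sharing the first hop (e.g. both starting with the same Movie −MA→ Actor edge), the shared hop is counted once, so 4 naive aggregations shrink to 3; with longer shared prefixes and larger T the reduction grows, matching the trend in Fig. 7.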
Run Time Analysis of Meta-path Generation. As described in Sec. 4.4, PM-HGNN++ accelerates the meta-path generation process through the proposed novel training process. Here we compare the run time of PM-HGNN and PM-HGNN++ for personalised meta-path generation. We report the actual run time in seconds and the run time of PM-HGNN++ relative to PM-HGNN (i.e., relative time). The results on the IMDb and DBLP datasets are shown in Fig. 8. We find that PM-HGNN++ needs less than 5% of the run time of PM-HGNN to generate personalised meta-paths.

Conclusions and future work
In this paper, we have studied the HGRL problem and identified a key limitation of existing HGRL methods, namely their dependency on hand-crafted meta-paths. To fully unleash the power of HGRL, we presented a novel framework, PM-HGNN, and proposed an extension, PM-HGNN++. Compared with existing HGRL models, the most significant advantages of our framework lie in avoiding manual effort in defining meta-paths and in generating personalised meta-paths for each individual node. The experimental results demonstrated that our framework generally outperforms the competing approaches and discovers useful meta-paths that have been ignored by human expertise. In the future, we plan to extend our framework to other tasks on HINs, such as online recommendation and knowledge graph completion; understanding the meta-paths generated by PM-HGNN is another promising direction.

A Datasets
The statistics of the datasets are summarised in Table 4. Detailed descriptions of the three datasets are presented as follows. IMDb is an online dataset about movies and television programs, including information such as cast, production crew and plot summaries. We extract a subset of IMDb that contains 4,278 movies, 2,081 directors and 5,257 actors. The movies are labelled as one of three classes, i.e., Action, Comedy and Drama, according to their genre information. The attributes of each movie correspond to elements of a bag of words (i.e., their plot keywords, 1,232 in total). DBLP is an online computer science bibliography. We extract a subset of DBLP that contains 4,057 authors, 14,328 papers, 7,723 terms and 20 venues. The authors are labelled as one of the following four research areas: Database, Data mining, Machine learning and Information retrieval. Each author is described by a bag of words (i.e., their paper keywords, 334 in total). ACM is an online academic publication dataset. We extract papers published in KDD, SIGMOD, SIGCOMM, MobiCOMM and VLDB and divide them into three classes (Database, Wireless Communication and Data Mining). We then construct a HIN that comprises 3,025 papers, 5,835 authors and 56 subjects. Paper attributes correspond to elements of a bag of words represented by keywords. We label the papers according to the conference in which they were published.

B More Meta-paths Analysis
We have introduced the three datasets used in the experiments in Section 5.1 and Section A. Here we further present the set of meta-paths that can possibly be generated at each step in Table 5. Note that, as discussed in Section 4, a STOP action is always available at each step; that is, the PM-HGNN variants can design meta-paths of flexible lengths.

Fig. 11: Actions that the RL agent takes on DBLP: (a), (b) and (c) correspond to PM-HGNN++ with max step T = 1, 2, 3, respectively. The red triangles with "S" indicate the "STOP" action. The thickness of links represents the ratio of the corresponding action.

Fig. 12: Actions that the RL agent takes on ACM: (a), (b) and (c) correspond to PM-HGNN++ with max step T = 1, 2, 3, respectively. The red triangles with a black "S" indicate the "STOP" action. The thickness of links represents the ratio of the corresponding action.

We present more meta-path analysis here. Figure 11 and Figure 12 visualise how the reinforcement learning agent in PM-HGNN++ generates personalised meta-paths for each target node on the DBLP and ACM datasets with different max steps under the semi-supervised setting. The percentages marked in the figures represent the fraction of nodes choosing the corresponding relation to extend the meta-path at that step. Table 6 and Table 7 summarise the predefined meta-paths, the important meta-paths found by GTN [32] and the top-frequent personalised meta-paths designed by PM-HGNN++ for the target nodes of DBLP and ACM, respectively. As shown in Figure 11 and Figure 12, the RL agent makes decisions to extend the meta-paths for different nodes according to the defined state S. Note that we only present the top-5 frequent meta-paths when there are more possibilities.