1 Introduction

The last decades have witnessed the flourishing of the World Wide Web, facilitating the development of online recommender systems (a.k.a., recommendation). Recommender systems have become essential components of internet applications (e.g., micro-video [1], E-commerce [2], and P2P lending [3]), discovering latent user interests and selecting items of interest for users accurately and in a timely manner based on a user-item historical interaction network. To alleviate the inherent sparsity and cold-start problems of recommender systems, an increasing amount of cutting-edge research has focused on recommendation methods that incorporate auxiliary information, including social networks [4,5,6], tags [7], and multi-modal information [8], to capture deeper features and improve recommendation performance. Knowledge Graphs (KGs) are beneficial for enhancing item features with a large amount of structured attribute information. Unlike classic user-item bipartite graphs and user-user social networks, KGs are composed of a set of triplets, i.e., <head entity, relation, tail entity>. Figure 1 illustrates an example of a music KG, which can describe not only node attributes, such as <Music1, Style, Popular>, but also node relationships, such as <Singer, Friend, Arranger>. Recently, several large-scale KGs have been released, such as Satori, Freebase, and Google’s Knowledge Graph. These KGs can benefit recommender systems by introducing relatedness among entities, which makes it convenient to build KGs for recommendation, enrich entity information, and produce explainability. The enriched information in KGs can also supplement the relational modelling between users and items. Therefore, recommendation methods incorporating KGs are of interest to researchers and can effectively improve recommendation performance.

Fig. 1 A schematic diagram of the music KGs recommendation system

Indeed, research dedicated to knowledge graph recommendation (a.k.a., KGR) has been ongoing for some time. Previous KGs-based recommendation studies aim to obtain high-quality item and KGs entity embeddings by leveraging pre-trained models; classical Knowledge Graph Embedding (KGE) approaches, including TransR [9], TransE [10], and RotatE [11], are remarkable examples. Unfortunately, KGE-based KGR approaches neglect high-order connectivity and collaborative signals, leading to poor performance.

Subsequently, to resolve the natural shortcomings of the KGE approach, researchers have attempted to probe the sophisticated high-order connectivity between items and entities. There are primarily two lines of endeavour: 1) Meta-path-based methods. These methods mainly leverage meta-paths to capture item-entity long-distance connections and entity affiliations to augment the item/user representation. Nevertheless, they heavily rely on hand-designed meta-paths requiring expert knowledge and are therefore difficult to optimize in a practical training process. 2) Graph Neural Networks (GNNs)-based methods. These approaches iteratively propagate over the whole KGs to gather side information for recommendation. GNNs provide a widespread neighbour aggregation strategy used to integrate multi-hop KGs entity node features into the target user/item representation. We must acknowledge that existing GNNs-based KGR methods achieve excellent performance. However, can current GNNs-based KGR methods achieve adequate awareness of high-order feature interaction signals and users’ latent intent information?

Motivated by the above questions, we rethink the shortcomings and improvement goals of existing GNNs-based KGR methods. We believe that an effective direction for GNNs-based KGR methods is to couple high-order feature interaction and user intent modelling as a whole, based on which the knowledge-enhanced method can be fully investigated. Nevertheless, fully exploiting high-order feature interaction signals and modelling latent intent information is by no means an easy task. To build an end-to-end KGR framework via high-order feature interaction and latent intent modelling, two issues inevitably need to be tackled:

  • How can an effective high-order feature interaction paradigm be designed? In existing studies, high-order features from GNNs aggregation are generally combined via concatenation, pooling, or summation without explicitly modelling their interactions. Such high-order aggregation mechanisms easily lead to the over-smoothing issue. Besides, no further valuable feature information can be encoded, which significantly limits model performance. Intuitively, adequately modelling the fine-grained feature interaction signals among high-order features has profound implications for enriching node representation learning.

  • How can a user’s latent intent signal be fully captured? In the real world, users’ intents are sophisticated and diversified, driving users to consume different items. The intent behind a user-item interaction offers a deep understanding of user preferences. Existing KGR studies rarely consider underlying user intent modelling, which makes the trained models uninterpretable and leads to unsatisfactory model performance.

Consequently, to solve the above two issues, we propose a novel Knowledge-enhanced Recommendation with Feature Interaction and Intent-aware Attention Networks (FIRE) to address the latent intent modelling and high-order feature interaction deficiencies ignored by existing KGR methods. Initially, we adopt a GNNs-based knowledge-aware backbone network to generate the user/item prototype representations. Next, to combat the first issue raised above, we innovatively use Convolutional Neural Networks (CNNs) that perform vertical convolutions (a.k.a., bit-level convolutions) and horizontal convolutions (a.k.a., vector-level convolutions) to model multi-granular high-order feature interactions and enhance item-side representation learning. For the second issue, we use a two-level attention mechanism (i.e., a node-level attention mechanism and an intent-level attention mechanism) to model the latent intent embedding and enhance user-side representation learning. Finally, all user-side/item-side representations are integrated, and inner product operations are performed to output prediction scores.

Overall, the contributions of our FIRE framework are three-fold:

  • A novel high-order feature interaction paradigm: To the best of our knowledge, our work is the first attempt to incorporate high-order feature interaction techniques into the knowledge-aware recommendation task. Concretely, to enhance item-side representations, we highlight the critical importance of explicitly exploiting feature interaction methods in KGs-based GNNs recommendation. We propose a novel CNNs-based high-order feature interaction strategy to extract fine-grained interaction information, which enriches the item-side node representation learning capability.

  • Comprehensive modelling of users’ latent intent signals: To enhance user-side representations, we propose a new approach that models a user’s latent intent by leveraging a two-level attention mechanism, enriching the node representation learning capability on the user side.

  • Extensive experiments: We prepare three real-world datasets to evaluate our model. The empirical results demonstrate the effectiveness of our FIRE framework for KGs-based recommendation and show its superiority over the current state-of-the-art baselines. Besides, numerous ablation and model studies demystify the working mechanism and elucidate the plausibility of our FIRE model.

The rest of our paper is organized as follows. We summarize the related work in Section 2. Then, we briefly outline our task in Section 3. Based on this, we give a detailed description of our method in Section 4. In addition, the proposed model is analyzed and discussed in depth. In Section 5, a series of experiments on real-world KGs-based recommendation data are conducted, and the results are discussed in detail. Finally, a brief conclusion and future work are given in Section 6.

2 Related work

In this section, we review the work most relevant to the proposed approach from three technical perspectives: 1) KGs-based recommendation, 2) Feature interaction methods in recommendation, and 3) Disentangled representation learning (intent modelling) methods in recommendation. We summarize each line of work at the end of its subsection and briefly explain how it differs from our proposed method.

2.1 KGs-based recommendation methods

In the early stages of KGR research, related work focused on embedding-based techniques. Entities and relations in KGs are used as supplementary information for users and items in recommendation tasks. To fully utilize the KGs information, the Knowledge Graph Embedding (KGE) technique is used to encode the entities and relations of the KGs as low-rank embeddings. Mainstream KGE algorithms are based on translation models (e.g., TransE, TransR, and RotatE). Based on the above methods, a global graph representation of the user/item can be obtained. For instance, CKE [12] utilizes multimodal information as item-side supplemental information and learns item representations via TransR. DKN [13] treats contextual and word embedding information in news as side information and employs a multi-channel approach to generate news representations via TransD. However, embedding-based KGs recommendation approaches neglect high-order connectivity and fail to adequately capture item-side high-order attribute embeddings.

Next, to pay more attention to the high-order connectivity problem in KGR, researchers have advanced path-based recommendation methods. The path-based approach aims at path representation utilizing high-order entity connectivity patterns in heterogeneous information networks. A meta-path is a relation sequence (i.e., \(\mathcal {P}=A_{0} \overset {{R}_{1}}{\rightarrow } A_{1} \overset {{R}_{2}}{\rightarrow } {\ldots } \overset {{R}_{k}}{\rightarrow } A_{k}\)) connecting object pairs in a Heterogeneous Information Network (HIN). It can be used to extract connectivity features in the graph, accounting for long-range connectivity by extracting paths that connect the target user and item nodes via KGs entities, as in PER [14], McRec [15], and HERec [16]. However, mainstream path-based methods suffer from some inherent limitations: 1) Brute-force searches tend to lead to labour-intensive and time-consuming feature engineering when large-scale graphs are involved; 2) Experts are needed to define domain knowledge. Path-based methods therefore often result in difficult-to-train models and poor performance.

Furthermore, the rise of GNNs-based technology offers the possibility of exploring long-range connectivity in recommendation. GNNs iteratively execute a propagation mechanism to capture the high-order semantic information of target nodes on KGs, thus updating the high-order embeddings of the target nodes. For instance, KGAT [17] proposes a Collaborative Knowledge Graph (CKG) that combines users, items, and entities, and recursively performs propagation over the CKG via Graph Attention Networks (GATs [18]) to enrich entity embeddings. KGCN [19] recursively performs propagation over KGs via GNNs to enhance item-side entity embeddings. CKAN [20] utilizes a heterogeneous propagation strategy, which enables simultaneous augmentation of user- and item-side representations via GATs. KGIN [21] utilizes intent- and relation-aware mechanisms to model user/item representations in KGR. Nevertheless, mainstream GNNs-based methods suffer from two inherent limitations: 1) Neglecting fine-grained feature interactions and user intent modelling; 2) An unavoidable over-smoothing phenomenon.

Summary

As mentioned in the introduction, existing GNNs-based KGR models show strong dominance. Yet, the lack of fine-grained feature interactions and the absence of an explicit intent modelling mechanism limit recommendation performance to some extent. Our work adopts a GNNs-based model as a backbone network to explore the issues of high-order feature interactions and intent-aware modelling in KGR models, which is the primary focus of our model.

2.2 Feature interaction technique in recommendations

Feature interaction (a.k.a., feature combination) techniques have been successful in the field of Click-Through Rate (CTR) prediction, as they can fully extract explicit as well as implicit interactions between high-order features. Combining feature interactions with deep neural networks in end-to-end models has become a mainstream approach for CTR, which enhances the nonlinear capability of models and is significant for increasing prediction accuracy. DeepFM [22] is a feature interaction model based on deep neural networks and factorization machines, which can effectively model the interactions of low-order and high-order features. xDeepFM [23] proposes a Compressed Interaction Network (CIN), which aims to learn arbitrary low-order and high-order feature interactions. FINT [24] proposes a feature interaction model that performs high-order feature interaction while preserving semantic information at the field level. CAN [25] proposes a method for modelling feature interactions utilizing a co-action network.

Afterwards, with the rise of GNNs technology, researchers have attempted to utilize GNNs to deal with feature interactions. Fi-GNN [26] proposes a graph-based feature interaction method that models high-order features as nodes on a graph to simulate complex high-order feature interactions. L0-SIGN [27] is a graph-based feature interaction model that proposes an L0-regularization-based method to preserve useful feature interactions and filter irrelevant ones in the feature graph.

Summary

In this paper, inspired by the great progress made in the feature interaction paradigm in the CTR task as well as in the sequence model [28], we introduce high-order feature interaction techniques into the KGR task to model fine-grained feature interactions. To the best of our knowledge, our work is the first attempt to incorporate high-order feature interaction techniques into the KGR task. This is one of the contributions and novelties of our work.

2.3 Disentangled representation learning (intent modelling) technique in recommendations

In the real world, the construction of graphs (e.g., social networks and user-item bipartite graphs) often results from highly complex factors. Despite the boom in graph machine learning, deep learning techniques often ignore the latent factors behind these interactions. Indirectly, this leads to poor model robustness, neglected interpretability, and unsatisfactory performance. The idea of disentangled representation learning originated in the Capsule Network, whose core algorithm is the neighbour routing mechanism. DGCF [29] combines disentangled representation learning with collaborative filtering. IPREC [30] adopts a novel package recommendation framework that considers user latent intent modelling via an attention mechanism. GNUD [31] is a news recommendation framework that combines disentangled representation learning with news recommendation, where a neighbour routing algorithm is applied. MIDGN [32] proposes a multi-view intent-disentangled GNNs-based bundle recommendation model. IDS4NR [33] proposes a novel intent-disentangled recommendation model based on item popularity and user preference perspectives. DisenHAN [34] proposes a recommendation method for disentangling user intent in heterogeneous information networks. However, little research has explored the great potential of intent modelling for KGR.

Summary

In this paper, benefiting from the success of disentangled representation learning and intent modelling, we consider the introduction of an intent-aware technique based on a two-level attention mechanism for modelling fine-grained user intents in KGR tasks. This is another of the contributions and novelties of our work.

3 Problem formulation

We have the following definition and description of the KGs-based recommendation task.

User-Item interaction data

In a classic recommendation scenario, we have a set of M users \(U=\left \{u_{1}, u_{2}, \ldots , u_{M}\right \}\) and a set of N items \(V=\left \{v_{1}, v_{2}, \ldots , v_{N}\right \}\). The user-item interaction matrix Y is defined according to user-item (u, v) implicit feedback as follows:

$$ y_{u v}=\left\{\begin{array}{ll} 1 & \text{if interaction } (u, v) \text{ is observed} \\ 0 & \text{otherwise.} \end{array}\right. $$
(1)

Knowledge graph

We have a knowledge graph \(G=(\mathcal {E}, \mathcal {R})\), which is an undirected graph composed of entity-relation-entity triples (h,r,t), where \(h, t \in \mathcal {E}\) and \(r \in \mathcal {R}\) denote the head entity, tail entity, and KGs-relation of a knowledge triple, and \(\mathcal {E}\) and \(\mathcal {R}\) are the sets of entities and relations in the KGs, respectively. In addition, we define an item-entity alignment set \(A=\{(v, e) \mid v \in V, e \in \mathcal {E}\}\) that is designed to align items between the user-item interaction matrix and the KGs. Ultimately, we aim to learn a matching function \(\tilde {y}_{u v}=\mathcal {F}(u, v \mid {\Theta }, \mathbf {Y}, G)\), where \(\tilde {y}_{u v}\) denotes the probability that user u will match with item v, and Θ is the set of model parameters (a.k.a., configuration).
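To make the formulation concrete, the minimal Python sketch below shows one way the inputs defined above could be represented; the variable names and toy values are our own illustrations, not part of the paper.

```python
import numpy as np

# Observed implicit-feedback pairs (u, v) for M users and N items (toy data).
interactions = {(0, 2), (0, 5), (1, 3)}
M, N = 2, 6

# Eq. (1): the binary user-item interaction matrix Y.
Y = np.zeros((M, N), dtype=np.int64)
for u, v in interactions:
    Y[u, v] = 1

# The KG as (head, relation, tail) triples, plus the item-entity
# alignment set A = {(v, e)} stored here as a mapping.
kg_triples = [(7, 0, 8), (8, 1, 9)]   # entity/relation ids
alignment = {2: 7, 3: 8, 5: 9}        # item id -> aligned entity id
```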

Task description

We formalize the recommendation task of this paper as follows.

  • Input: User-Item interaction matrix Y, knowledge graph G, and the model parameter set Θ.

  • Output: The probability that a user interacts with the item \(\tilde {y}_{u v}\).

In addition, the important symbols involved in this paper are listed in Table 1.

Table 1 The important symbols and definitions in this paper

4 Method

In this section, we introduce the proposed FIRE. The framework is shown in Fig. 2. Precisely, FIRE consists of three modules: 1) the Attentive Propagation Layer, which generates high-order propagation embeddings of the target user and target item; 2) the Feature Interaction (Bi-CNNs) module, which integrates the high-order representations computed for the target item into a 2-D matrix and fuses and interacts features with convolutional kernels of two granularities, thus obtaining a feature-enhanced item embedding; 3) the Intent Disentangled Module, which utilizes two-level attention mechanisms to model the latent intent behind user-item interactions and obtain an intent-enhanced user embedding.

Fig. 2 The framework of the proposed FIRE. The framework is composed of three crucial components: 1) Knowledge-aware attention network (backbone network), which yields the user/item embedding eu/v via the GNNs-based attentive propagation layer. 2) Feature interaction module (Bi-CNNs), which extracts local and global feature interaction signals from the high-order feature matrix (see Fig. 3 for Bi-CNNs details). 3) Intent Disentangled module, which extracts fine-grained latent intent factors via node- and intent-level attention mechanisms. Best viewed in color

4.1 Backbone network

FIRE relies on GATs to capture high-order neighbour information, following CKAN [20] and RippleNet [35]. The receptive field R is defined as follows:

$$ \mathrm{R}_{u}^{0}=\left\{e \mid v \in\left\{v \mid y_{u v}=1\right\} \wedge(v, e) \in A\right\} $$
(2)
$$ \mathrm{R}_{u}^{l}=\left\{t \mid(h, r, t) \in G \wedge h \in \mathrm{R}_{u}^{l-1}\right\} $$
(3)
$$ \mathrm{R}_{v}^{0}=\left\{e \mid v^{*} \in\left\{v^{*} \mid \exists u \in U, y_{u v^{*}}=1 \wedge y_{u v}=1\right\} \wedge\left( v^{*}, e\right) \in A\right\} $$
(4)

Here, l={1,2,…,L}. We describe the user representation learning process in detail; because item representation learning is a dual process, we omit it for brevity. In addition, we need to define the set of l-th-order knowledge affiliation triples, taking the user as an example. The size of the triple set \(\mathcal {T}\) directly determines the number of associated high-order entities.

$$ \mathcal{T}_{u}^{l}=\left\{(h, r, t) \mid(h, r, t) \in \mathcal{G} \text { and } h \in \mathrm{R}_{u}^{l-1}\right\} $$
(5)
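To make (2)-(5) concrete, the sketch below computes a user's seed entities and layer-wise triple sets by breadth-first expansion over the KG. The names (Y_pos mapping each user to interacted items, alignment mapping items to entities, build_kg_index) are our own illustrative choices, not the paper's code.

```python
from collections import defaultdict

def build_kg_index(kg_triples):
    # Index the KG triples by head entity for fast neighbourhood lookup.
    index = defaultdict(list)
    for h, r, t in kg_triples:
        index[h].append((h, r, t))
    return index

def user_receptive_fields(user, Y_pos, alignment, kg_index, L):
    """Eqs. (2), (3), (5): seed set R_u^0 and the l-hop triple sets T_u^l."""
    # R_u^0: entities aligned with the items the user has interacted with.
    seeds = {alignment[v] for v in Y_pos[user] if v in alignment}
    triple_sets, frontier = [], seeds
    for _ in range(L):
        # T_u^l: all triples whose head lies in the previous receptive field.
        triples = [trip for h in frontier for trip in kg_index[h]]
        triple_sets.append(triples)
        frontier = {t for _, _, t in triples}  # R_u^l: the new tail entities
    return seeds, triple_sets
```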

Next, first-order propagation on the KGs is used as a case to demonstrate in detail the calculation of high-order embeddings via the knowledge propagation attention mechanism as follows:

$$ \pi_{i}=\text{softmax}\left( f_{\text{neural}}\left( \boldsymbol{e}_{i}^{h} \oplus \boldsymbol{e}_{i}^{r}\right)\right) $$
(6)

Here, \(i=1,2,\ldots ,\left \vert \mathcal {T}_{u}^{l}\right \vert\), \(f_{\text{neural}}(\cdot)\) is a 3-layer feedforward neural network, and ⊕ is the vector concatenation operation; the softmax is taken over all triples in the set. Then, we can obtain the user embedding after first-layer propagation as follows:

$$ \boldsymbol{e}_{u}^{1}=\sum\limits_{i=1}^{\left\vert\mathcal{T}_{u}^{1}\right\vert} \pi_{i} \cdot \boldsymbol{e}_{i}^{t} $$
(7)

In addition, we introduce the initial propagation embeddings \(\boldsymbol {e}_{u}^{0}, \boldsymbol {e}_{v}^{0}\) and the item initial embedding \(\boldsymbol {e}_{v}^{i n i t}\) as follows:

$$ \boldsymbol{e}_{u}^{0}=\frac{1}{\left\vert\mathrm{R}_{u}^{0}\right\vert} \sum\limits_{e \in \mathrm{R}_{u}^{0}} \boldsymbol{e}, \boldsymbol{e}_{v}^{0}=\frac{1}{\left\vert\mathrm{R}_{v}^{0}\right\vert} \sum\limits_{e \in \mathrm{R}_{v}^{0}} \boldsymbol{e} $$
(8)
$$ \boldsymbol{e}_{v}^{i n i t}=\frac{1}{\vert\{e \mid(v, e) \in A\}\vert} \sum\limits_{\{e \mid(v, e) \in A\}} \boldsymbol{e} $$
(9)

By analogy, we can obtain both user and item high-order embeddings:

$$ \left\{\boldsymbol{e}_{u}^{0}, \boldsymbol{e}_{u}^{1}, \ldots, \boldsymbol{e}_{u}^{L}\right\},\left\{\boldsymbol{e}_{v}^{i n i t}, \boldsymbol{e}_{v}^{0}, \boldsymbol{e}_{v}^{1}, \ldots, \boldsymbol{e}_{v}^{L}\right\} $$
(10)
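The following PyTorch sketch shows one reading of a single propagation layer: (8)-(9) average the seed-entity embeddings, and (6)-(7) attentively aggregate a triple set. The class and tensor names are ours, and the triple embeddings are assumed to be pre-gathered into [num_triples, dim] tensors; this is an illustration, not the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def initial_embedding(seed_entities, entity_emb):
    # Eqs. (8)-(9): the layer-0 embedding is the mean of seed-entity embeddings.
    return entity_emb[list(seed_entities)].mean(dim=0)

class AttentivePropagation(nn.Module):
    """Sketch of one knowledge-propagation layer (Eqs. (6)-(7))."""
    def __init__(self, dim):
        super().__init__()
        # f_neural: a 3-layer feedforward network scoring each (h, r) pair.
        self.f_neural = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, e_h, e_r, e_t):
        # e_h, e_r, e_t: [num_triples, dim] embeddings of one triple set.
        logits = self.f_neural(torch.cat([e_h, e_r], dim=-1))  # Eq. (6)
        pi = F.softmax(logits, dim=0)                          # over the set
        return (pi * e_t).sum(dim=0)                           # Eq. (7)
```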

To aggregate high-order embeddings, we implement three aggregators: sum, concat, and maxpooling. Thus, we obtain the user and item integrated representations eu, ev. For convenience, we replace the user/item with the uniform symbol o.

$$ agg_{\text{sum}}=\boldsymbol{e}_{o}^{0}+\boldsymbol{e}_{o}^{1}+\ldots+\boldsymbol{e}_{o}^{L} $$
(11)
$$ agg_{\text{concat}}=\boldsymbol{e}_{o}^{0}\oplus\boldsymbol{e}_{o}^{1}\oplus\ldots\oplus\boldsymbol{e}_{o}^{L} $$
(12)
$$ agg_{\text{maxpooling}}=\text{maxpooling}\left\{\boldsymbol{e}_{o}^{0}, \boldsymbol{e}_{o}^{1}, \ldots, \boldsymbol{e}_{o}^{L}\right\} $$
(13)

where + is the vector summation operator, ⊕ is the vector concatenation operator, and maxpooling{⋅} is the vector maxpooling function.
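A small sketch of the three aggregators in (11)-(13), under our own function name:

```python
import torch

def aggregate(layer_embs, mode="sum"):
    """Eqs. (11)-(13): combine the per-layer embeddings {e_o^0, ..., e_o^L}."""
    stacked = torch.stack(layer_embs, dim=0)   # [L+1, d]
    if mode == "sum":
        return stacked.sum(dim=0)              # Eq. (11)
    if mode == "concat":
        return torch.cat(layer_embs, dim=-1)   # Eq. (12)
    if mode == "maxpooling":
        return stacked.max(dim=0).values       # Eq. (13), element-wise max
    raise ValueError(f"unknown aggregator: {mode}")
```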

4.2 Feature interaction

Classic GNNs-based KGs recommendation frameworks, e.g., KGCN, RippleNet, and KGIN, merely aggregate high-order embeddings by concat, mean, sum, or maxpooling, while fine-grained feature interactions are neglected. We believe that ignoring high-order feature interactions is a serious omission, as they are crucial for recommendation. Convolutional Neural Networks (CNNs) and related variants have recently witnessed breakthroughs in computer vision and natural language processing and have proven successful at extracting both local and global features. We innovatively employ CNNs as feature aggregators to process the 2-D high-order embedding matrix \(\textit {\textbf {M}}_{v} \in \mathbb {R}^{(L+2) \times \mathrm {\textit {d}}}\). On the item side, we adopt two convolutional kernels (a horizontal convolutional kernel and a vertical convolutional kernel) to extract feature interaction signals, and the module is named Bi-CNNs.

More precisely, the vertical convolutional kernel \(\boldsymbol {V}^{t} \in \mathbb {R}^{(L+2) \times 1}\) slides column-wise over matrix Mv to extract fixed-dimension interactions among high-order features, called the bit-level feature interaction mode. Similarly, a horizontal convolutional kernel \(\boldsymbol {H}^{t} \in \mathbb {R}^{h \times d}\) slides row-wise over matrix Mv to extract interaction signals between neighbouring high-order features, called the vector-level feature interaction mode. In addition, to extract richer feature interaction information, we follow the concept of the multi-head mechanism [18] in GATs and stack several convolutional kernels for both convolutional modes.

Initially, the high-order embeddings are stacked into a 2-D matrix, formalized as follows.

$$ \boldsymbol{M}_{v}=\left[\begin{array}{c} \boldsymbol{e}_{v}^{i n i t} \\ \boldsymbol{e}_{v}^{0} \\ {\cdots} \\ \boldsymbol{e}_{v}^{L} \end{array}\right]_{\mathbb{R}^{(L+2) \times d}} $$
(14)

Vector-level feature interaction mode

As shown in Fig. 3, the upper part of the Bi-CNNs module depicts the working mechanism of the horizontal convolutional kernel, denoted as \(\boldsymbol {H}^{t} \in \mathbb {R}^{h \times d}\), for extracting neighbouring-order feature interaction signals. As mentioned previously, following the multi-head mechanism, several convolutional kernels are required to extract more feature interaction signals. Hence, \(t \in [1, \tilde {n}]\), and h ∈ {1,2,…,(L + 2)} is the height of the horizontal convolutional kernel. The i-th convolutional value \(\tilde {\boldsymbol {c}}_{i}^{t}\) is computed as follows:

$$ \tilde{\boldsymbol{c}}_{i}^{t}=\text{ReLU}\left( \boldsymbol{M}_{v}[i: i+h-1,:] \odot \boldsymbol{H}^{t}\right) $$
(15)

where ⊙ denotes the inner product operator. Thus, the horizontal convolutional result \(\tilde {\boldsymbol {c}}^{t} \in \mathbb {R}^{(L+2)-h+1}\) is:

$$ \tilde{\boldsymbol{c}}^{t}=\left[\tilde{c}_{1}^{t}, \tilde{c}_{2}^{t}, \ldots, \tilde{c}_{i}^{t},\ldots,\tilde{c}_{((L+2)-h+1)}^{t}\right] $$
(16)

The vector-level feature interactions produced by the horizontal convolutional kernels inevitably contain overlapping and redundant interaction information. Hence, we apply a maxpooling operation, and the output vector \(\boldsymbol {o}_{h} \in \mathbb {R}^{\tilde {n} \times 1}\) for the \(\tilde {n}\) kernels is denoted as:

$$ \boldsymbol{o}_{h}=\left[\max \left( \tilde{\boldsymbol{c}}^{1}\right), \max \left( \tilde{\boldsymbol{c}}^{2}\right), \ldots, \max \left( \tilde{\boldsymbol{c}}^{\tilde{n}}\right)\right] $$
(17)
Fig. 3 Bi-CNNs employ two kinds of convolutional kernels (horizontal and vertical) to process high-order feature interactions for local and global feature interactions. Specifically, the order L = 2, the number of horizontal convolutional kernels \(\tilde {n}\) = 2, the number of vertical convolutional kernels n = 2, and the embedding dimension d = 3. Best viewed in color

Bit-level feature interaction mode

To extract significant feature interactions from a fixed-dimension viewpoint, we employ a vertical convolutional kernel \(\boldsymbol {V}^{t} \in \mathbb {R}^{(L+2) \times 1}\) to extract bit-level feature interactions. As shown in Fig. 3, the lower part of the Bi-CNNs layer depicts the working mechanism of the vertical convolutional kernel. The vertical convolutional kernel Vt covers the 2-D high-order feature matrix Mv and slides along the embedding dimension.

Similar to the horizontal convolutional kernel, the i-th convolutional value \(\boldsymbol {c}_{i}^{t}\) is denoted as:

$$ \boldsymbol{c}_{i}^{t}=\text{ReLU}\left( \boldsymbol{M}_{v}[:, i] \odot \boldsymbol{V}^{t}\right) $$
(18)

where \(\boldsymbol {V}^{t} \in \mathbb {R}^{(L+2) \times 1}\) and \(t \in [1, n]\). Thus, the vertical convolutional result \(\boldsymbol {c}^{t} \in \mathbb {R}^{d}\) is:

$$ \boldsymbol{c}^{t}=\left[c_{1}^{t}, {c_{2}^{t}}, \ldots, {c_{i}^{t}},\ldots, {c_{d}^{t}}\right] $$
(19)

where d denotes the embedding dimension.

In particular, the vertical convolutional interaction result equals the weighted sum over the (L + 2) rows of Mv, weighted by the vertical convolutional kernel Vt:

$$ \boldsymbol{c}^{t}=\sum\limits_{l=1}^{L+2} \boldsymbol{V}^{\textit{t}}[l,:] \cdot \boldsymbol{M}_{v}[l,:] $$
(20)

Here, l ∈ [1,(L + 2)] and ⋅ denotes scalar multiplication.

We stack n vertical convolutional kernels. In contrast to the horizontal convolutional kernel processing, we aim to maximally retain the bit-level feature interaction information in each dimension. Hence, we concatenate the outputs of the n vertical convolutional kernels in sequence. The output vector \(\boldsymbol {o}_{v} \in \mathbb {R}^{dn}\) is denoted as:

$$ \boldsymbol{o}_{v}=\left[\boldsymbol{c}^{1}\oplus \boldsymbol{c}^{2}\oplus \ldots\oplus \boldsymbol{c}^{n}\right] $$
(21)

where ⊕ denotes the concatenation operator.

Dense layer

We concatenate the above two convolutional output vectors, feed them into the dense layer to extract global interaction features, and output the feature-enhanced item convolutional embedding zv as follows:

$$ \mathbf{z}_{v}=\varphi\left( \mathbf{W} \cdot \left[\begin{array}{l} \boldsymbol{o}_{v} \\ \boldsymbol{o}_{h} \end{array}\right]\right) $$
(22)

where \(\mathbf {W}\in \mathbb {R}^{d \times (\tilde {n}+dn)}\) is the transformation matrix, the convolutional embedding \(\mathbf {z}_{v} \in \mathbb {R}^{d \times 1}\), and φ(⋅) is the sigmoid function. Then the final representation \(\tilde {\boldsymbol {e}}_{v}\) is:

$$ \tilde{\boldsymbol{e}}_{v}=\boldsymbol{e}_{v}+\mathbf{z}_{v} $$
(23)

In summary, we innovatively employ CNNs and dense networks (layers) to extract feature interaction signals, with the following advantages: 1) The horizontal convolutional mode primarily extracts feature interaction signals between adjacent high-order features, i.e., overall feature interactions between vectors. 2) The vertical convolutional mode extracts fine-grained feature interaction signals in each dimension of all features. 3) Similar to the multi-head mechanism, several convolutional kernels of each type are used to extract more feature interaction signals. 4) The local feature interaction signals generated by the two convolutional modes are recombined and fed into a dense network to learn advanced global feature interaction signals. A code sketch of the module is given below.
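Putting (14)-(23) together, the following PyTorch sketch shows one way the Bi-CNNs module could be realized. The names (BiCNNs, n_h for the \(\tilde{n}\) horizontal kernels, n_v for the n vertical kernels, h for the horizontal kernel height) are our own, the default sizes are merely example values, and the sketch processes a single item for clarity; it illustrates our reading of the equations, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiCNNs(nn.Module):
    """Sketch of the Bi-CNNs feature interaction module (Eqs. (14)-(23))."""
    def __init__(self, L, d, n_h=4, n_v=2, h=2):
        super().__init__()
        rows = L + 2
        # Horizontal kernels H^t in R^{h x d}: vector-level interactions.
        self.horizontal = nn.Conv2d(1, n_h, kernel_size=(h, d))
        # Vertical kernels V^t in R^{(L+2) x 1}: bit-level interactions.
        self.vertical = nn.Conv2d(1, n_v, kernel_size=(rows, 1))
        # Dense layer W in R^{d x (n_h + d * n_v)} used in Eq. (22).
        self.dense = nn.Linear(n_h + d * n_v, d, bias=False)

    def forward(self, M_v):
        # M_v: [L+2, d] stacked high-order embeddings of one item (Eq. (14)).
        x = M_v.unsqueeze(0).unsqueeze(0)             # [1, 1, L+2, d]
        # Eqs. (15)-(17): convolve, then maxpool each horizontal feature map.
        c_h = F.relu(self.horizontal(x)).squeeze(-1)  # [1, n_h, L+2-h+1]
        o_h = c_h.max(dim=-1).values.flatten()        # o_h in R^{n_h}
        # Eqs. (18)-(21): vertical maps are kept whole and concatenated.
        c_v = F.relu(self.vertical(x)).squeeze(2)     # [1, n_v, d]
        o_v = c_v.flatten()                           # o_v in R^{d * n_v}
        # Eq. (22): dense layer + sigmoid yields the convolutional embedding.
        z_v = torch.sigmoid(self.dense(torch.cat([o_v, o_h])))
        return z_v                                    # added to e_v in Eq. (23)
```

For example, with L = 2 and d = 16, BiCNNs(L=2, d=16)(torch.randn(4, 16)) returns a 16-dimensional z_v that is then added to e_v as in (23).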

4.3 Intent disentangled module

The construction of a real-world user-item interaction graph often results from a highly complex interplay of many latent factors. Existing deep learning techniques consider graph interactions holistically and rarely consider the entanglement of latent factors, causing the learned embeddings to be flawed for downstream tasks and uninterpretable. Figure 4 illustrates how latent intents drive user decisions and motivations. Thus, we first disentangle the user and item representations into different spaces and then let the user interact with the item in the same space to model the complex intent behind the user-item interaction:

$$ \boldsymbol{e}_{u^{s}}=\mathbf{W}^{s} \boldsymbol{e}_{u}, \quad \boldsymbol{e}_{i_{x}^{s}}=\mathbf{W}^{s} \boldsymbol{e}_{i_{x}} $$
(24)

where \(\boldsymbol {e}_{u^{s}}\) and \(\boldsymbol {e}_{i_{x}^{s}}\) represent the corresponding embeddings in the s-th disentangled space, s ∈ [1,S], and Ws denotes the disentangling matrix. \(O^{history}=\left \{i_{1}, i_{2}, \ldots , i_{x}\right \}\) is defined as the set of items with which the target user has interacted. Afterwards, we integrate the embeddings for each intent via the node-level attention mechanism as follows:

$$ \boldsymbol{f}^{s}=\sum\limits_{i_{x} \in O^{history}} {\alpha_{x}^{s}} \boldsymbol{e}_{{i_{x}^{s}}} $$
(25)
$$ {\alpha_{x}^{s}}=\frac{\exp \left( \mathbf{q}^{\mathrm{T}} \cdot \tanh \left( \mathbf{W}\left[\boldsymbol{e}_{u^{s}} \oplus \boldsymbol{e}_{i_{x}^{s}}\right]\right)\right)}{{\sum}_{i_{x^{\prime}} \in O^{history}} \exp \left( \mathbf{q}^{\mathrm{T}} \cdot \tanh \left( \mathbf{W}\left[\boldsymbol{e}_{u^{s}} \oplus \boldsymbol{e}_{i_{x^{\prime}}^{s}}\right]\right)\right)} $$
(26)

where \(\mathbf {q} \in \mathbb {R}^{d}\) and \(\mathbf {W} \in \mathbb {R}^{d \times 2 d}\) are the trainable parameters of the attention mechanism. In essence, the attention weight \({\alpha _{x}^{s}}\) captures the target user's preference for an item in a particular intent disentangled space. Intuitively, user preferences vary across intents, motivating us to further combine the influence of the S disentangled spaces with an intent-level attention mechanism:

$$ \boldsymbol{f}=\sum\limits_{s=1}^{S} \beta^{s} \boldsymbol{f}^{s} $$
(27)
$$ \beta^{s}=\frac{\exp \left( \mathbf{q}^{\mathrm{T}} \cdot \tanh \left( \mathbf{W}\left[\boldsymbol{e}_{u} \oplus \boldsymbol{f}^{s}\right]\right)\right)}{{\sum}_{s^{\prime}=1}^{S} \exp \left( \mathbf{q}^{\mathrm{T}} \cdot \tanh \left( \mathbf{W}\left[\boldsymbol{e}_{u} \oplus \boldsymbol{f}^{s^{\prime}}\right]\right)\right)} $$
(28)

where \(\boldsymbol {f} \in \mathbb {R}^{d}\) is the final embedding encoding the complex intent influence for the user. Then the final representation \(\tilde {\boldsymbol {e}}_{u}\) is:

$$ \tilde{\boldsymbol{e}}_{u}=\boldsymbol{e}_{u}+\boldsymbol{f} $$
(29)
Fig. 4 There are complex and diverse intents behind user-item interactions, and specific users display different intents and interests when confronted with a particular item. Best viewed in color
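A PyTorch sketch of the intent module ((24)-(29)) follows. We assume a shared projection matrix W^s per intent space and a shared attention scorer (q, W) across the two levels, as the shared symbols in (26) and (28) suggest; all module and tensor names are ours, and the code handles one user for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntentDisentangled(nn.Module):
    """Sketch of the two-level intent-aware attention (Eqs. (24)-(29))."""
    def __init__(self, d, S):
        super().__init__()
        self.S = S
        self.W_s = nn.Parameter(torch.randn(S, d, d) * 0.01)  # Eq. (24)
        self.att = nn.Linear(2 * d, d, bias=False)            # W in (26)/(28)
        self.q = nn.Parameter(torch.randn(d))                 # q in (26)/(28)

    def score(self, left, right):
        # q^T tanh(W [left || right]) for a batch of concatenated pairs.
        return torch.tanh(self.att(torch.cat([left, right], dim=-1))) @ self.q

    def forward(self, e_u, hist_items):
        # e_u: [d]; hist_items: [X, d] embeddings of interacted items.
        per_intent = []
        for s in range(self.S):
            u_s = self.W_s[s] @ e_u                  # project user into space s
            i_s = hist_items @ self.W_s[s].T         # project history items
            alpha = F.softmax(self.score(u_s.expand_as(i_s), i_s), dim=0)
            per_intent.append((alpha.unsqueeze(-1) * i_s).sum(0))  # Eq. (25)
        f_s = torch.stack(per_intent)                # [S, d]
        beta = F.softmax(self.score(e_u.expand_as(f_s), f_s), dim=0)  # Eq. (28)
        f = (beta.unsqueeze(-1) * f_s).sum(0)        # Eq. (27)
        return e_u + f                               # Eq. (29)
```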

4.4 Model prediction and optimization

Once the user/item prototype representations eu, ev are obtained via the knowledge-aware attention networks (cf. (11)-(13)), we endow the model with high-order feature interaction (cf. (23)) and intent-awareness (cf. (29)).

Next, the interaction probability (matching score) \(\tilde {y}_{u v}\) between the target user u and the target item v is calculated by the inner product as follows:

$$ \tilde{y}_{u v}=\sigma\left( \tilde{\boldsymbol{e}}_{u}^{\mathrm{T}} \cdot \tilde{\boldsymbol{e}}_{v}\right) $$
(30)

Here, σ(⋅) is the sigmoid function.

To ensure the effectiveness of model training and to improve training efficiency, we adopt the negative sampling strategy, sampling the same number of negative samples for each user. Ultimately, the loss function of FIRE is defined as follows.

$$ L=\sum\limits_{u \in U}\left( \sum\limits_{v \in\left\{v \mid (u, v) \in \gamma^{+}\right\}} {\Gamma}\left( y_{u v}, \tilde{y}_{u v}\right) - \sum\limits_{v \in\left\{v \mid (u, v) \in \gamma^{-}\right\}} {\Gamma}\left( y_{u v}, \tilde{y}_{u v}\right)\right) + \lambda\|{\Theta}\|_{2}^{2} $$
(31)

where Γ(⋅) is the cross-entropy loss function, γ+ denotes the set of positive user-item pairs, and γ− denotes the set of negative user-item pairs. λ is the L2-regularization coefficient for reducing overfitting, and Θ denotes the model parameter set (model configuration). Finally, we compute the loss and adopt Adam optimization [36] to optimize our model parameters.
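Since Γ is the cross-entropy loss and training uses BCE with 1:1 negative sampling (cf. Section 5.1.2), (30)-(31) can be sketched as below. The function names are ours, and we read the negative term of (31) as the standard label-0 BCE contribution of the sampled negatives.

```python
import torch
import torch.nn.functional as F

def predict(e_u, e_v):
    # Eq. (30): inner product of the final representations, then sigmoid.
    return torch.sigmoid((e_u * e_v).sum(dim=-1))

def fire_loss(pos_scores, neg_scores, params, lam=1e-5):
    """Sketch of Eq. (31): BCE over positive and sampled negative pairs."""
    bce = F.binary_cross_entropy(pos_scores, torch.ones_like(pos_scores)) \
        + F.binary_cross_entropy(neg_scores, torch.zeros_like(neg_scores))
    l2 = sum((p ** 2).sum() for p in params)  # L2 regularizer on Theta
    return bce + lam * l2
```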

To make the overall framework of FIRE readable, we present the pseudo-code for the overall prediction method in Algorithm 1.

Algorithm 1 The overall prediction algorithm of FIRE (u, v, Θ).

4.5 Model analysis and discussion

In this subsection, we conduct an in-depth analysis and discussion of the relation between FIRE and existing GNNs-based KGR models [17, 19, 20, 39]. Two main aspects are developed: 1) Novelty and differences, and 2) Relation with the state-of-the-art approach (KGIN [21]).

  • Novelty and differences. For the knowledge graph recommendation task, previous work [17, 19, 20, 39] has focused on extracting long-range attribute (knowledge) information using GNNs techniques, leading to better representation learning of the user/item. Due to the inherent sparsity of recommender systems, we believe high-order feature interactions are crucial for recommendations to achieve feature enhancement. To the best of our knowledge, this is the first attempt to utilize CNNs for feature interaction on high-order attributes generated by knowledge-aware networks. In addition, previous KGR models rarely considered the user intent modelling process. We strongly believe that the knowledge-aware backbone network, item-side feature interaction, and user-side intent modelling jointly make FIRE more effective than previous work.

  • Relation with KGIN. KGIN [21] is the state-of-the-art model for KGs-based recommendation. Its main highlights are 1) setting multiple latent intent factors to describe the intent associations behind user-item interactions, and 2) proposing a relation-aware mechanism to extract relational dependency signals in long-range connections. Although KGIN has an intent-aware design similar to FIRE's, there are major distinctions: 1) KGIN's intent modelling process is naive, setting the latent intent factor as a trainable parameter, which inevitably leads to insufficient extraction of latent intent signals and unsatisfactory results. In FIRE, we couple the user's historical interaction behaviour with the intent-aware process and design a more sophisticated two-level attention mechanism to capture latent intent-aware signals. 2) Despite the significant progress of KGIN's path-aware mechanism, it still does not consider the feature interaction module that is crucial in the KGR task. In summary, we believe that a strong KGR model should fully extract both the feature interaction signals and the latent intent information to achieve satisfactory performance.

5 Experiment

In this section, we perform experiments on three real-world scenario datasets to evaluate our method and answer the following four research questions:

  • RQ1: What is the performance of the proposed FIRE framework compared to state-of-the-art KGs-based recommendation models?

  • RQ2: How do the key hyperparameters influence the performance of the proposed FIRE?

  • RQ3: How do different components affect FIRE?

  • RQ4: What is the time efficiency of FIRE in model training?

5.1 Experimental setting

5.1.1 Dataset description

To evaluate the effectiveness of our method, we conduct a series of experiments on three different recommendation scenario datasets: 1) Last.FM, 2) Dianping-Food, and 3) MovieLens-1M, which are all openly accessible.

  • Last.FM is a widely utilized benchmark for music KGs-based recommendation, which includes the listening information of approximately 2000 users from the Last.FM website.

  • Dianping-Food is a restaurant recommendation (POI recommendation) dataset provided by Meituan Dianping, which contains more than 10 million interactions between approximately 2 million users and 1000 restaurants.

  • MovieLens-1M is a widely utilized benchmark for movie KGs-based recommendation, which contains 1 million ratings (ranging from 1 to 5) on a total of 2445 items from 6036 users.

For the construction of the datasets, we follow the treatment of previous work [20, 35, 39]. First, since the interactions in MovieLens-1M and Last.FM are both explicit feedback, they are converted to implicit feedback, where 1 indicates a positive sample (for MovieLens-1M, the threshold for a rating to be considered positive is 4; no threshold is set for Last.FM due to its sparsity). We randomly sample negative samples per user with a negative sampling rate of 1. Second, in addition to the construction of the U-I interaction data, we construct item-side sub-KGs for each dataset. The sub-KGs for MovieLens-1M and Last.FM are constructed using Microsoft Satori. For the Dianping-Food dataset, we use the KGs provided by Meituan to construct the sub-KGs. For all items in the datasets (music, movies, restaurants), the IDs can be matched in the corresponding sub-KGs. In addition, to filter out noise, we filter out items that match multiple entities and items that do not match any entity. Table 2 summarizes the detailed statistics of the three datasets: Last.FM (music), MovieLens-1M (movies), and Dianping-Food (restaurants).
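A minimal sketch of this preprocessing, with our own function name, converts explicit ratings to implicit feedback and pairs each positive with one sampled negative:

```python
import random

def to_implicit(ratings, num_items, threshold=4):
    """Explicit (u, v, rating) triples -> implicit samples (u, v, label)."""
    positives = {(u, v) for u, v, r in ratings if r >= threshold}
    interacted = {}
    for u, v in positives:
        interacted.setdefault(u, set()).add(v)
    samples = [(u, v, 1) for u, v in positives]
    for u, items in interacted.items():
        # One unseen item per positive (negative sampling rate: 1).
        for _ in range(len(items)):
            v_neg = random.randrange(num_items)
            while v_neg in items:
                v_neg = random.randrange(num_items)
            samples.append((u, v_neg, 0))
    return samples
```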

Table 2 Statistics and hyper-parameter settings for the three datasets (d: embedding dimension, s: intent number, λ: L2 regularization coefficient, L: depth of GNNs layer)

5.1.2 Experimental settings

We utilize PyTorch to implement our method and deploy it on a server with a Quadro RTX 6000 GPU with 24 GB of video memory. For each dataset, the ratio of the training, validation, and test sets is 6:2:2. For the BCE loss, we construct the training set by randomly sampling 1 negative item for each positive item (i.e., a negative sampling rate of 1).

5.1.3 Evaluation metrics

We perform experiments in two prototypical recommendation scenarios. For the top-K recommendation task, we adopt a widely-used evaluation protocol to evaluate the effectiveness of our proposed method: Recall@K, and we set K={50, 100}. For the CTR prediction task, we utilize Area Under the ROC Curve (AUC) and F1-score for the evaluation protocol [40].

$$ \text{Recall}@K(u)=\frac{\left\vert R^{1: K}(u) \cap T(u)\right\vert}{\vert T(u) \vert} $$
(32)
$$ \text{Precision} =\frac{T P}{T P+F P} $$
(33)
$$ \text{F1-score} =\frac{2 \times \text{Precision} \times \text{Recall}} {\text{Precision} + \text{Recall}} $$
(34)
$$ \text{AUC}(u)=\frac{{\sum}_{i \in T(u)} {\sum}_{j \in I \backslash T(u)} P\left( \tilde{y}_{u i} \geq \tilde{y}_{u j}\right)}{\vert T(u)\vert \cdot \vert I \backslash T(u)\vert} $$
(35)

where P(⋅) is the indicator function, T(u) denotes the ground-truth item set of user u, I denotes the full item set, and R1:K(u) denotes the top-K recommended item list.
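For reference, straightforward per-user implementations of Recall@K (32) and AUC (35) could look as follows; the function names and argument layout are our own choices.

```python
def recall_at_k(ranked, truth, K):
    # Eq. (32): fraction of ground-truth items appearing in the top-K list.
    return len(set(ranked[:K]) & truth) / len(truth)

def auc_per_user(scores, truth, all_items):
    # Eq. (35): probability that a positive item outscores a negative one.
    pos = [scores[i] for i in truth]
    neg = [scores[j] for j in all_items - truth]
    hits = sum(p >= n for p in pos for n in neg)
    return hits / (len(pos) * len(neg))
```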

5.1.4 Implementation details

We train the model by optimizing the BCE loss with the Adam [36] optimizer and Xavier [37] initializer. We train the model for 50 epochs, and an early-stopping strategy is applied to prevent overfitting. We perform a grid search over hyperparameters: the embedding dimension in {8, 16, 32, 64, 128, 256}, the L2 regularization factor in {0, 1e-6, 1e-5, …, 1e-3}, and the learning rate in {5e-2, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4}. During KGs propagation, the depth of the GNNs layers is adjusted among {1, 2, 3, 4}. We select the user- and item-triple-set sizes from {8, 16, 32, 64}. For user intent modelling, the number of intents is searched within {2, 4, 6, 8, 10}. For Bi-CNNs, we search the numbers of the two convolutional kernels in {2, 4, 8, 16}. Three aggregators (sum, concat, and maxpooling) are employed for aggregating high-order representations, and the default Dropout [38] rate is set to Dr = 0.5 in both the Bi-CNNs and all attention networks.

5.1.5 Baselines

To illustrate the effectiveness of our model, we choose eight baselines, as follows:

  • CKE [12]: It incorporates multi-modal knowledge to enhance item embeddings for collaborative filtering.

  • PER [14]: It treats the KGs as a heterogeneous information network (HIN) and explores high-order path information in a meta-path-based way.

  • RippleNet [35]: It is a propagation-based KGs recommendation method that models the user representation with the set of entities related to the user's historically interacted items and treats user preferences as ripples propagating over the KGs to capture high-order attributes.

  • KGCN [19]: It is a GNNs-based KGR method that effectively captures user-specific preferences for items in the KGs.

  • KGNN-LS [39]: It is a GNNs-based KGR method that applies label smoothing in the information aggregation phase to generate user-specific item representations.

  • KGAT [17]: It is a GNNs-based KGR method that adopts an attention mechanism and relies on GATs to capture high-order neighbour information and node-level feature interaction.

  • CKAN [20]: It is a GNNs-based KGR method that utilizes an attention mechanism and relies on GATs to capture high-order neighbour information.

  • KGIN [21]: It is a state-of-the-art GNNs-based KGR method that disentangles the intent factors behind user-item interactions and utilizes a relation-aware mechanism to obtain user/item representations.

5.2 Performance comparison (RQ1)

To answer the first research question, we compare the performance of all the baselines in Tables 3 and 4. From the results, we make the following observations:

  • PER shows the worst performance of all baselines on the three datasets. This indicates that the path-based method requires domain knowledge to define meta-paths and that path selection is hard to optimize during training, which limits the performance of PER.

  • CKE achieves a performance improvement over PER because it introduces the embedding method. This indicates that knowledge graph embedding technology helps improve recommendation performance. Nevertheless, CKE cannot effectively capture high-order features on the KGs, which results in insufficient performance. In addition, CKE's weak performance might also be caused by a lack of multimodal information.

  • RippleNet shows a significant performance improvement over path-based and embedding-based methods. This demonstrates the importance of exploring high-order attribute information on the KGs. However, RippleNet constructs user and item representations asymmetrically and ignores high-order connectivity, which can lead to unsatisfactory performance.

  • Compared with the shallow models, the performance of KGCN, KGNN-LS, KGAT, and CKAN confirms that incorporating high-order connectivity and attention mechanisms can improve the recommendation effect. However, these KGs-based GNNs models neglect fine-grained feature interaction and intent disentanglement, and struggle to avoid over-smoothing of high-order features, which leads to unsatisfactory performance.

  • KGIN is currently the strongest baseline model for KGR and is highly related to our model. Its strengths lie in fine-grained intent modelling and path-aware techniques. However, it still lacks the high-order feature interactions of our model, resulting in inferior performance. This reflects the advantage of our model's fine-grained feature interactions.

  • FIRE outperforms all the baseline methods, which demonstrates that FIRE can effectively explore user latent intent and item fine-grained high-order feature interactions via a two-level attention mechanism and Bi-CNNs module.

Table 3 Overall performance comparison of AUC and F1
Table 4 Overall performance comparison of Recall@K

5.3 Study of FIRE (RQ2)

To answer the second research question, we perform a crucial hyperparameter analysis of FIRE. From the results, we have the following conclusions.

5.3.1 Impact of embedding size

Figure 5 reports the effect of embedding size on the AUC performance in the three recommendation scenarios. The best performance is achieved when the embedding size is set to 128, 64, and 64 for the three datasets, respectively. We observe that as the embedding size increases from 8 to 256, the recommendation performance first improves due to the stronger representation feature space. However, a larger embedding size does not always yield stronger model representation ability; the degradation is caused by model overfitting as well as the encoding of irrelevant feature information.

Fig. 5 AUC results of FIRE with different embedding dimensions

5.3.2 Impact of different user intent numbers

To analyze the effect of the number of intents, we adjust it in the range {2, 4, 6, 8, 10} and illustrate the performance change on the three datasets in Fig. 6. We find that the best performance is achieved when the number of intents is set to 6, 4, and 6 for the three datasets, respectively. Specifically, performance is poorer when the number of intents is set small, i.e., in the case of coarse-grained intent modelling. This justifies encouraging multiple user intents. However, when the number of intents is set larger, the model performance decreases instead. A reasonable explanation is that overly fine-grained intent modelling encodes irrelevant information and noise, which is detrimental to the accurate representation of the model.

Fig. 6 AUC results of FIRE with different user intent numbers

5.3.3 Impact of different L2 regularization coefficients

Figure 7 summarizes the effect of different L2 regularization coefficients on AUC performance, and we reach the consistent conclusion that the model has different tolerances for different regularization coefficients. We find that the best performance is achieved when the L2 regularization coefficient is set to 1e-5, 1e-5, and 1e-6 for the three datasets, respectively. Specifically, when the regularization coefficient is too small, the model can overfit. Conversely, L2 regularization is a double-edged sword: when the coefficient is too large, it shifts the optimization away from the correct direction and leads to the severe consequence of underfitting. Therefore, choosing an appropriate regularization coefficient allows the model to achieve its best performance.

Fig. 7 AUC results of FIRE with different L2 regularization coefficients

5.3.4 Impact of different aggregators

We explore the impact of the three aggregators on the AUC performance in Table 5 and find that the sum aggregator consistently outperforms the others. In contrast, the maxpooling aggregator causes the model performance to collapse. One possible reason is that the concat aggregator may encode some irrelevant features, while the maxpooling aggregator undoubtedly loses important attribute information, which leads to poor performance.

Table 5 AUC results of FIRE with different aggregators

5.3.5 Impact of different GNNs layers

We verify how the GNNs depth affects model performance by varying the number of GNNs layers from 1 to 4. Table 6 shows that FIRE achieves the best performance when L is set to 2, 1, and 2 for the three datasets, respectively. In addition to the well-known over-smoothing problem of deep GNNs, two other significant reasons are: 1) high-order propagation inevitably introduces irrelevant information along with remote knowledge, especially on large-scale datasets; and 2) representation degradation. Both problems can lead to model overfitting and thus poor performance. Therefore, keeping an appropriate layer depth in high-order information propagation maximizes recommendation performance.

Table 6 AUC results of FIRE with depth of GNNs layer

5.3.6 Impact of the size of the triple set

We fine-tune the sizes of the user and item triple sets to explore their impact on FIRE within the range {8, 16, 32, 64}. Here we choose Last.FM and MovieLens-1M for our experiments. Tables 7 and 8 show the experimental results. We find that the best results are obtained when the user triple-set size is uniformly set to 16, while the best performance is achieved when the item triple-set size is 32 and 64, respectively. One possible reason is that the initial number of user/item entities differs across datasets, which directly determines the number of triples that can be associated and thus causes performance differences. In addition, when the user triple-set size is set too large, the model's performance is weakened, which we attribute to a degree of overfitting.

Table 7 AUC results for Last.FM w.r.t. different sizes of the triple set
Table 8 AUC results for MovieLens-1M w.r.t. different sizes of the triple set

5.3.7 Impact of convolutional kernels

We fine-tune the numbers of the vertical and horizontal convolutional kernels to explore their impact on FIRE within the range {2, 4, 8, 16}. Here we choose Last.FM and MovieLens-1M for our experiments. Tables 9 and 10 show the experimental results. We find that the best results are obtained when the number of horizontal convolutional kernels is uniformly set to 4 and the number of vertical convolutional kernels is set to 2 and 4, respectively. A consistent conclusion is that when the number of convolutional kernels is too small, the feature interaction information cannot be encoded adequately. Conversely, when it is too large, some noise is encoded, leading to overfitting. Therefore, choosing a suitable number of convolutional kernels is crucial for improving model performance.

Table 9 AUC results for Last.FM w.r.t. different numbers of the two convolutional kernels (n: number of vertical convolutional kernels, \(\tilde {n}\): number of horizontal convolutional kernels)
Table 10 AUC results for MovieLens-1M w.r.t. different numbers of the two convolutional kernels (n: number of vertical convolutional kernels, \(\tilde {n}\): number of horizontal convolutional kernels)

5.4 Network visualization

To explore the effects produced by the two core modules in the FIRE model (i.e., the feature interaction module (Bi-CNNs) and the intent modelling module), we inspect some details of the trained network. Figure 8(a) shows the convolutional values of the five vertical convolutional kernels after training FIRE on the Last.FM dataset with L = 3. We find that the five convolutional kernels are trained to be diverse. After analysis, we believe that the vertical convolutional kernels produce an effect similar to the attention mechanism by assigning a corresponding weighted sum to each dimension of the high-order features (cf. (20)), which is sufficient to capture fine-grained bit-level feature interactions.

Fig. 8 Network visualization w.r.t. the vertical convolutional kernels and intent-level attention weights

Next, we explore the practical effects produced by the intent-level attention mechanism in Fig. 8(b). We again conduct experiments on Last.FM (with the number of intents set to 5) and randomly select five users (U204, U32, U1324, U305, U472) to visualize the intent-level attentive weights (cf. (27)-(28)). We find that, after network training, the five attentive weights diverge significantly. This indirectly coincides with our vision of modelling the complex and diverse intent-perception factors of users to pinpoint their intents and interests.

5.5 Ablation analysis (RQ3)

To answer RQ3, as shown in Figs. 9 and 10, a comprehensive ablation analysis is conducted from two views (i.e., macro- and micro views) to assess the effectiveness and performance of all key components of the model. The specific variants are shown below.

  • FIRE w/o Intent: Removes the user intent component entirely.

  • FIRE Dual-Bi-CNNs: Adds a Bi-CNNs module to the user side as well.

  • FIRE-light: Removes both the Bi-CNNs and intent components.

  • FIRE w/o H-kernel: Removes the horizontal convolution module.

  • FIRE w/o V-kernel: Removes the vertical convolution module.

  • FIRE w/o GNNs-att: Removes the knowledge-aware attention mechanism module.

  • FIRE w/o Intent-att: Removes the intent-aware attention mechanism module.

Fig. 9 AUC results of FIRE w.r.t. macro-ablation analysis

Fig. 10 AUC results of FIRE w.r.t. micro-ablation analysis

Impact of the intent-aware attention mechanism

To investigate the impact of the intent-aware attention module in FIRE, we disable the whole intent module and denote the variant as FIRE w/o Intent. Based on the results reported in Fig. 9, the AUC performance decreases significantly after removing the whole intent component, which demonstrates that intent-aware modelling is essential for performance improvement. In addition, we conduct an extra experiment to verify the effectiveness of the two-level intent-aware attention mechanism. We construct a variant model, FIRE w/o Intent-att, which replaces attention aggregation with average aggregation. The results in Fig. 10 indicate a significant drop in performance, demonstrating the importance of the two-level intent-aware attention mechanism for distinguishing intents.

Impact of the feature interaction mechanism

Similarly, to investigate the need for the feature interaction module, we set up a variant named FIRE-light, which removes both the Bi-CNNs and intent-aware components. As shown in Fig. 9, a significant performance degradation occurs when the feature interaction module is removed, indicating that it is critical to knowledge-aware recommendation models and contributes significantly to performance improvement. Next, to investigate the effect of the two convolutional kernels on model performance, two additional variants are set up: FIRE w/o H-kernel and FIRE w/o V-kernel. As shown in Fig. 10, we find that using only one type of convolutional kernel for feature fusion significantly degrades the model accuracy. Furthermore, the performance degradation is more pronounced when only the horizontal convolutional kernels are retained. Therefore, we conclude that 1) the best performance is achieved by using both types of convolutional kernels, and 2) bit-level feature interactions (i.e., vertical convolutional kernels) make a substantial contribution to the model.

Impact of the knowledge-aware attention mechanism

To explore the role of the knowledge-aware attention mechanism in the overall model, we set up the variant FIRE w/o GNNs-att. As shown in Fig. 10, the model performance collapses when average aggregation is used instead of the knowledge-aware attention mechanism. This demonstrates that the knowledge-aware attention mechanism plays a crucial role in user/item representation: it allows adaptive aggregation of entity information and ensures that the model learns accurate representations.

Other aspects

To further explore the efficiency of the feature interaction module in Fig. 9, we construct a variant, FIRE Dual-Bi-CNNs, which also provides feature interaction functionality on the user side. Interestingly, we find that performance collapses when Bi-CNNs modules are applied to both the user and item sides. One possible reason is that the model undergoes severe overfitting, rendering the feature interaction module useless. We leave the exploration of a more fine-grained and balanced approach to feature interaction for future work.

5.6 Training cost and efficiency analysis (RQ4)

To answer RQ4, in this section we examine the time efficiency of the FIRE model and the two macro-variants, and select four baseline methods for the CTR task as controls. All methods are run in the same hardware environment, and the corresponding results are reported in Fig. 11. Specifically, the upper part of Fig. 11 shows the specific training time of each model, and the lower part reports the overall trade-off between training time and accuracy (AUC) as a scatter plot, where the top-right corner implies the best performance.

Fig. 11 Model comparison of efficiency and accuracy (AUC)

We have the following observations: 1) Compared to the GNNs-based KGR models, FIRE's additional overhead is justified by the introduction of the feature interaction module and the intent-aware attention module, which make the model more robust. 2) Compared to RippleNet, FIRE's training time is relatively superior on large-scale datasets, demonstrating the effectiveness and efficiency of the adaptive knowledge-aware attention mechanism.

Unfortunately, while the performance is sufficiently good compared to traditional GNNs-based KGR methods, the stacking of additional neural modelling mechanisms means the time consumption is not advantageous. In future work, we will design a lightweight neural recommendation model to achieve a double win between accuracy and time cost.

6 Conclusion and future work

In this paper, we propose a novel end-to-end KGs-based GNNs recommendation method. Specifically, the method leverages an attention mechanism to capture the high-order attribute information of users and items on the KGs. Next, a multi-granular convolutional neural network is adopted to capture high-order feature interactions on the item side. Last but not least, a two-level attention mechanism is utilized to model the latent intent of the user, thus achieving enhanced user embeddings. Empirical results on three large-scale benchmark datasets demonstrate the superiority and efficiency of our FIRE method.

In future work, we will consider how to distill and refine the sub-KGs in KGs-based recommendation and attempt to integrate self-supervised learning techniques into KGs-based GNNs recommendation. Besides, in view of the shortcoming that our work consumes more time than existing methods, we aim to design more efficient and lightweight neural recommendation models [41, 42] in the future.

On the other hand, we will pay more attention to negative sampling techniques for KGs-based GNNs recommendation methods to generate higher-quality negative samples. We leave these directions for future work.