Introduction

With the rapid development of artificial intelligence, and of machine learning and deep learning in particular, recommendation systems have gained wide popularity. Personalized recommendation systems create a close connection between users and information resources [1], aiming to provide users with information relevant to their interests, and the impact of recommendation at scale is significant [2]. Classical recommendation methods such as matrix factorization [3] model preferences through the interaction history between users and items (for example, movies), or through similarity functions that perform recommendation by judging the similarity of objects [4]. Trattner et al. capture the similarity of neighbors between users or movies based on their historical co-ratings and then recommend suitable movies [5]. Intelligent recommendation systems can suggest suitable movies based on different user preferences; Lavanya et al. enhance movie recommendations by mixing different interest profiles, and such systems are widely used for accurate matching of users and movies [6]. Auxiliary data are also reused in recommendation systems, and many approaches have been derived to further improve recommendation performance by exploiting contextual information [7, 8].

With the rapid development of film, music, and online shopping platforms, recommender systems have taken root and evolved; bidirectional neural networks, heterogeneous information networks (HINs), and knowledge graphs are driving the next generation of recommender systems [9]. Chuan et al. improve recommendation accuracy by introducing knowledge graphs to capture heterogeneous information. HINs are often used in recommender systems to obtain information about interactions between different edge types and nodes, which is inextricably linked to their ability to flexibly characterize all types of heterogeneous data [10]. Meta-paths are sequences of relations connecting pairs of objects in HINs [29, 30] and have been widely used to obtain semantic structural information concerning the different edge types and nodes related to recommendations [11, 18]. Mukul et al. and Fulian et al. improve recommendation accuracy by introducing heterogeneous information networks to capture more heterogeneous information related to user items. In this paper, we present an example of movie recommendation based on a HIN. HIN-based recommendation methods can be classified into two types. The first type uses path-based semantic relevance directly as a feature for recommendation [12, 13]; the second type applies transformations to path-based similarity to learn effective transformation features. Both approaches are designed to improve the representation of bidirectional user-item interactions by extracting meta-path-based features [14]. Hu et al. propose a network model that introduces meta-paths into the recommendation algorithm to improve the accuracy of recommendations.

Table 1 Recommendation algorithm model information contribution table

Nowadays, in order to make better use of user-item information, researchers have proposed models such as ItemKNN, Bayesian Personalized Ranking (BPR) [20], Matrix Factorization (MF), HeteRS [19], FMG [21], SVDFeature (hete) [23], MCRec [25], MCRec (avg), MCRec (mp), MCRec (rand), and MAGNN_Rec [32] for use in recommendation algorithms. However, different models exploit different information, and for this reason we have constructed a table of the contributions made by each model in the application of information, as shown in Table 1 [36]. The information applied by the models includes user information, item information, additional information, supplementary information, user-item interaction information, and higher-order interaction information. Supplementary information includes click information, heterogeneous information, and attribute information. Additional information means information that completes missing information. Y indicates that the model applies such information, while N indicates that it does not.

In movie recommendation algorithms [15, 31], especially the traditional collaborative filtering algorithm [16, 26], only the user's history of clicks [27] is used to make recommendations [17]. When collaborative filtering is used for movie recommendation, only information about the user and the movies they have previously clicked on is used; no user-movie interaction information is applied. In addition, information about newly registered users and newly added movies cannot be used for recommendation, which inevitably causes the cold-start problem for users and movies in the recommendation algorithm [28]. To solve these problems, the application of user-item interaction information, better use of associative interaction information, and enhanced user and item representations derived from user-item interaction information are necessary. A main area of research in the field of recommendation algorithms is therefore the better application of user-item interaction information to improve recommendation accuracy. If more user-item interaction information can be used in the recommendation algorithm, recommendation accuracy can be greatly improved; if it is ignored, irrelevant and confounding information can affect the recommendation result and its accuracy.

To improve the use of causal association information, reinforcement learning is the first approach that comes to mind; its advantage is that, with sufficient data feedback, its exploratory power can reach upper bounds unattainable by traditional machine learning. However, it suffers from several serious shortcomings: (1) poor sampling efficiency; (2) difficulty balancing exploration and exploitation; (3) learning difficulties caused by delayed rewards [34]. Recommendation algorithms require fast adaptation: user needs keep changing and can alter the environment, so the requirements often change before reinforcement learning has finished learning the existing ones, which in turn leads to non-convergence of the reinforcement learning [33] recommendation algorithm. Therefore, when applying reinforcement learning to recommendation algorithms, it is necessary to build models based on changes in users' rating behaviour and their interest in watching movies, and to build suitable reinforcement recommendation algorithms according to the requirements of different projects. There is a greater need for recommendation algorithm models that can learn consistently across different projects, so reinforcement learning [35] is not suitable for our recommendation algorithm.

In order to make better use of the interaction information that exists between users and items, we construct a recommendation algorithm whose learning is more stable than that of a reinforcement learning recommendation algorithm. In this study, a three-way neural interaction model based on the meta-path context \(\langle \)user, meta-path, item\(\rangle \) is combined with a two-layer, one-dimensional convolutional neural network (CNN); the model incorporates a collaborative attention mechanism and makes top-N recommendations using meta-path-based contexts. The MCRec model can effectively represent users, items, and meta-path-based contexts; the introduction of a two-layer, one-dimensional CNN makes the model more powerful in modeling interactions. Adding a dropout layer to the interaction model and using a two-layer CNN prevents overfitting and discards irrelevant information features, improving the recommendation. In addition, as the traditional cross-entropy loss cannot effectively address the unbalanced nature of the training samples, an extreme cross-entropy loss (argmaxminloss) that combines the characteristics of the argmin and argmax functions is designed, reducing the loss in recommendation training. The SCLW_MCRec model can effectively represent and learn the contexts of users, items, and meta-paths, and has a powerful interaction function that can more accurately analyze user-item interactions to provide better recommendations.

The SCLW_MCRec model captures user and item information and their interaction through a three-part representation of the user, the item, and the user-item interaction information. The three-way neural network, the two-layer CNN, and the UMUM, UMGM, UUUM, and UMMM meta-path methods obtain user-item interaction information, helping the model obtain path instance information related to the user-item pair. The extreme cross-entropy loss (argmaxminloss) combines the properties of the argmin and argmax functions, reducing the loss during training of the recommendation model. Weight normalization is used to better optimize the SCLW_MCRec model and accelerate the convergence of stochastic gradient descent optimization. Extensive experiments on the Movielens dataset show that the SCLW_MCRec model has better recommendation performance than the other models, with an improvement of 2.94-35.8% in the Prec evaluation index, 8.41-53.51% in Recall, and 24.52-49.37% in NDCG.

Fig. 1 SCLW_MCRec model architecture

Therefore, the innovation points of this paper are shown below.

  1. The model constructs a three-way neural interaction network \(\langle \)user, meta-path, item\(\rangle \) from meta-path contextual information.

  2. The three-way neural network, the two-layer CNN, and the UMUM, UMGM, UUUM, and UMMM meta-path methods obtain user-item interaction information, helping the model obtain path instance information related to the user-item pair.

  3. An extreme cross-entropy loss that introduces the characteristics of the argmin and argmax functions is designed.

  4. A weight-normalization optimization method is used to optimize the model.

Model description

General structural model

Compared with previous user-item learning, the SCLW_MCRec model introduces embedded learning of user-item interaction contexts, which can exploit the relationships between users and items and thus influence the recommendation results. In contrast to the user and item embeddings of the two-way neural interaction model, the SCLW_MCRec model adds a further embedding focus: meta-path-based contextual embedding. Although only one embedding structure is added, it addresses the application of user-item interaction information that two-way neural networks tend to overlook. For recommender systems, the interaction relationship has a direct impact on the recommendation result, just as one may judge a person's behavior by what they do and by the relationships involved. In this study, we use a three-way neural interaction model based on meta-paths, consisting of users, items, and meta-paths. The model incorporates the UMUM, UMGM, UUUM, and UMMM meta-paths, compensating for the neglect of the relationship between users and items and reducing the impact of cold starts on the recommendation results. The SCLW_MCRec model first obtains the overall user-item information; the information obtained is then refined along the four meta-paths UMUM, UMGM, UUUM, and UMMM to obtain more accurate information about the user's interests. This approach follows the idea of acquiring information from the whole to the local.

The architecture of the SCLW_MCRec model is shown in Fig. 1. The \(\langle \)user, meta-path, item\(\rangle \) three-way interaction neural network is constructed using the embedding representations of the user, the item, and the meta-path to obtain user information, item information, and user-item interaction information, respectively. The interaction information of the UMUM, UMGM, UUUM, and UMMM meta-paths yields the final relevant path instances. A two-layer CNN is used to learn the embeddings of the final path instances, filtered for strong relevance to the user-item pair. The CNN is followed by a max-pooling layer to capture higher-order interaction features with greater relevance to the path instances. The path instances obtained after the max-pooling operation are processed by the dropout layer to filter out unrelated and confounding path instances. As users do not have the same degree of association with movie items obtained from different meta-paths, attention weights are assigned to the meta-paths when learning the user-item interaction representation, with higher weights assigned to the more relevant ones. The representations of user information, item information, and path instance information processed through the meta-paths are then fed into the MLP component to model the non-linear function of complex interactions, resulting in the final recommendation.

The SCLW_MCRec model performs contextual embedding based on meta-paths using a two-layer, one-dimensional CNN consisting of a convolutional layer (generating new features through convolution operations), a max-pooling layer (Maxpooling), and a dropout layer (Dropout). The convolutional layers consist of 128 and 256 kernels; configurations with 64 and 128 convolutional kernels are used for validation. In the model, a 1D convolutional layer is used to obtain the local features of the movie dataset. Maxpooling downsamples the information, reducing the number of features without losing the main ones. Dropout is used to prevent overfitting, and a fully connected layer weighs the local features of the previously collected movie dataset. The two-layer, one-dimensional CNN structure of the SCLW_MCRec model is shown in Fig. 2. This structure captures the feature information of the path instances represented by the meta-path through the convolutional layer. The max-pooling layer captures the higher-order interaction features of the path instances, and the dropout layer improves the relevance of the interaction information for recommendation by discarding information with little or no relevance among the higher-order interaction features of the path instances.
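To make the structure concrete, the following is a minimal Keras sketch of a two-layer, one-dimensional CNN of the kind described above (Conv1D, Conv1D, max-pooling, dropout). The 128/256 kernel counts follow the configuration mentioned in the text, while the path length, embedding dimension, kernel size, and dropout rate are illustrative assumptions rather than values taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical dimensions: each path instance has 4 nodes,
# each embedded in 64 dimensions (placeholders, not from the paper).
path_length, embed_dim = 4, 64

path_instance = layers.Input(shape=(path_length, embed_dim), name="path_instance")

# First 1D convolution: 128 kernels, as in the configuration described above.
x = layers.Conv1D(filters=128, kernel_size=2, activation="relu", padding="same")(path_instance)
# Second 1D convolution: 256 kernels.
x = layers.Conv1D(filters=256, kernel_size=2, activation="relu", padding="same")(x)
# Max-pooling keeps the strongest interaction features along the path.
x = layers.GlobalMaxPooling1D()(x)
# Dropout discards weakly relevant features to reduce overfitting.
x = layers.Dropout(0.5)(x)

path_encoder = models.Model(path_instance, x, name="two_layer_cnn_path_encoder")
path_encoder.summary()
```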

From a mathematical point of view, the role of the SCLW_MCRec model is similar to matrix factorization, which is equivalent to decomposing and extracting the information contained in a matrix. The SCLW_MCRec model obtains relevant interaction information by first obtaining the characteristic information of the user's relevant movie data and then refining it into the UMUM, UMGM, UUUM, and UMMM meta-path information, similar to finding an entry point first and then dividing it in detail on that basis. It is this approach of refining overall logical reasoning into local logical reasoning that allows the SCLW_MCRec model to capture more user-item information for the recommendation algorithm, thus improving recommendation accuracy.

Fig. 2 Structure of two-layer CNN

SCLW_MCRec model principles

User and item embedding

Unlike HIN-based recommendation models, the meta-paths are used here as the context for the interaction between users and items. The model characterizes the three-way interaction \(\langle \)user, meta-path, item\(\rangle \) rather than the two-way interaction \(\langle \)user, item\(\rangle \). To learn better through meta-paths and generate interactions for recommendation, the model introduces a further, more important embedding, the meta-path-based context, in addition to the components used to learn user and item embeddings. The meta-path-based context is first modeled as a low-dimensional embedding using a hierarchical neural network. Using the initially learned embeddings of users, items, and meta-path-based contexts, a co-attention mechanism mutually enhances all three representations. Using the meta-path-based context, a two-layer, one-dimensional CNN, and a dropout layer, the SCLW_MCRec model steadily improves the accuracy of movie recommendations. The symbols involved in the model are shown in Table 2. Each of these symbols plays a key role in the SCLW_MCRec model, and together they build the core of the model. In the SCLW_MCRec model, u and i represent user and item information respectively, \({|\mathrm {\textit{U}}|}\) and \({|\mathrm {\textit{I}}|}\) represent the total numbers of users and items respectively, and the remaining symbols represent information about individual modules or parameters and are described in detail in later sections.

Table 2 Symbols and descriptions

After the embedding, a lookup layer is set up to convert user and item representations into low-dimensional dense vectors. For a given user-item pair \(<u,i>\), \(\mathrm {\textit{m}}_{\textit{u}} \in \textit{R}^{|\mathrm {\textit{U}}| \times 1}\) and \(\mathrm {\textit{n}}_{\textit{i}} \in \textit{R}^{|\textit{I}| \times 1}\) are their respective representations. The parameter matrix \(\mathbf {\textit{M}} \in \textit{R}^{|\mathrm {\textit{U}}| \times \textit{d}}\) of the lookup layer stores the latent factors of the users, and \(\mathbf {\textit{N}} \in \textit{R}^{|\mathrm {\textit{I}}| \times \textit{d}}\) stores those of the items. \({|\mathrm {\textit{U}}| }\) represents the total number of users, \({|\mathrm {\textit{I}}| }\) the total number of items, and d the dimension of the user and item embeddings. The lookup operation is given by the following equations:

$$\begin{aligned} \mathrm {\textit{x}}_{\textit{u}}= & {} M^{T} \cdot m_{u} \end{aligned}$$
(1)
$$\begin{aligned} \mathrm {\textit{y}}_{\textit{i}}= & {} \mathrm {\textit{N}}^{T} \cdot n_{i}. \end{aligned}$$
(2)
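As an illustration of Eqs. (1) and (2), the following NumPy sketch multiplies indicator vectors m_u and n_i by the lookup matrices M and N to obtain dense d-dimensional embeddings. Treating m_u and n_i as one-hot indicators, as well as the sizes and random initialization, are placeholder assumptions for illustration only.

```python
import numpy as np

num_users, num_items, d = 1000, 1700, 64  # illustrative sizes, not from the paper

# Parameter matrices M (users) and N (items), as in Eqs. (1)-(2).
M = np.random.randn(num_users, d) * 0.01
N = np.random.randn(num_items, d) * 0.01

def one_hot(index, size):
    v = np.zeros((size, 1))
    v[index] = 1.0
    return v

u, i = 42, 7                       # a user-item pair <u, i>
m_u = one_hot(u, num_users)        # m_u in R^{|U| x 1}
n_i = one_hot(i, num_items)        # n_i in R^{|I| x 1}

x_u = M.T @ m_u                    # Eq. (1): user embedding, shape (d, 1)
y_i = N.T @ n_i                    # Eq. (2): item embedding, shape (d, 1)
```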

Meta-path-based interaction contexts

The SCLW_MCRec model approach includes four steps. The first step is to embed a single path instance [18]; a two-layer CNN is used for this. The CNN structure consists of a convolutional layer, a max-pooling layer [25], and a dropout layer, as shown in Fig. 2. The embedding is expressed as

$$\begin{aligned} \mathrm {\textit{h}}_{\mathrm {\textit{m}}}={\text {CNN}}\left( {\text {CNN}}\left( X^{m}; \Theta \right) \right) , \end{aligned}$$
(3)

where m denotes a meta-path of one of the four types UMUM, UMGM, UUUM, and UMMM; \({X^{m}}\) denotes the features of a path instance, and \(\Theta \) denotes the parameters of the CNN model.

The second step is to embed multiple path instances [18]. As a meta-path generates more than one path instance, the convolutional layer output is filtered to obtain the K path instances most relevant to the user, denoted as \({\left\{ \textit{h}_{{\textit{m}}}\right\} _{\textit{m}=1}^{\textit{K}}}\). The model uses max-pooling to derive the meta-path embedding, retaining the important dimensional features. For meta-path m this is computed as

$$\begin{aligned} c_{m}=\max {\text {-pooling}}\left( \left\{ \mathrm {\textit{h}}_{\textit{m}}\right\} _{\textit{m}=1}^{\textit{K}}\right) \end{aligned}$$
(4)

The third step is a dropout operation that removes any confusing information contained in the important features, for more accurate recommendations, expressed as

$$\begin{aligned} \begin{aligned} f_{m}&={\text {dropout}}\left( \left\{ \mathbf {\textit{c}}_{\textit{m}}\right\} _{\textit{m}=1}^{\textit{K}}\right) \\&={\text {dropout}}\left( \left\{ \max {\text {-pooling}}\left( \left\{ \mathrm {\textit{h}}_{\mathrm {\textit{m}}}\right\} _{\textit{m}=1}^{\textit{K}}\right) \right\} _{\textit{m}=1}^{\textit{K}}\right) \end{aligned} \end{aligned}$$
(5)

The last step is the embedding of the aggregated meta-paths, derived using an average-pooling operation to facilitate contextual modeling. This is calculated as

$$\begin{aligned} f_{u \rightarrow i}=\frac{1}{\left| G_{u \rightarrow i}\right| } \sum _{m \in G_{u \rightarrow i}} f_{m} \end{aligned}$$
(6)

where \({f_{u \rightarrow i}}\) denotes the meta-path-based context representation; \(f_{m}\) denotes the embedding derived from the path instances of meta-path m, and \({G_{u \rightarrow i}}\) denotes the set of meta-paths used by the model for the user's interaction with the movie item.
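The four steps can be summarized in a short NumPy sketch, in which the two-layer CNN of Eq. (3) is replaced by a placeholder encoder; the number of retained path instances K, the embedding size, and the dropout rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64          # embedding size (assumption)
K = 5           # path instances kept per meta-path (assumption)
meta_paths = ["UMUM", "UMGM", "UUUM", "UMMM"]

def encode_instance(instance_features):
    # Placeholder for the two-layer CNN of Eq. (3); returns h_m for one instance.
    return instance_features.mean(axis=0)

def dropout(x, rate=0.5, training=True):
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

f = {}
for m in meta_paths:
    # K path instances per meta-path, each a (path_length, d) feature matrix.
    h = np.stack([encode_instance(rng.standard_normal((4, d))) for _ in range(K)])
    c_m = h.max(axis=0)                 # Eq. (4): max-pooling over the K instances
    f[m] = dropout(c_m)                 # Eq. (5): drop weakly relevant dimensions

# Eq. (6): average over the meta-path set G_{u->i}
f_u_i = np.mean([f[m] for m in meta_paths], axis=0)
```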

Attention mechanism embedding module

In the model, it is possible to obtain information about the user and the item, as well as about their interaction. The SCLW_MCRec model uses attention-mechanism embedding to obtain and relate three components: the user, the item, and the user-item interaction information (the meta-path context representation). \(\mathrm {\textit{x}}_{\textit{u}}\) is the embedding representation of the user; \(\mathrm {\textit{y}}_{\textit{i}}\) is the embedding representation of the movie item, and \(\mathrm {\textit{c}}_{\textit{m}}\) is the embedding representation of the meta-path context. The attention mechanism for the user and item embeddings uses a single-layer network to compute the attention vectors for user u and item i. The attention vectors \(\mathrm {\beta }_{\textit{u}}\) and \(\mathrm {\beta }_{\textit{i}}\) are then used to improve the user and item embeddings given the meta-path-based context \({c_{u \rightarrow i}}\).

$$\begin{aligned} \beta _{u}= & {} g\left( W_{u} x_{u}+W_{u \rightarrow i} c_{u \rightarrow i}+b_{u}\right) \end{aligned}$$
(7)
$$\begin{aligned} \beta _{i}= & {} g\left( W_{i}^{\prime } y_{i}+W_{u \rightarrow i}^{\prime } c_{u \rightarrow i}+b_{i}^{\prime }\right) , \end{aligned}$$
(8)

where \(\mathrm {\textit{W}}_{\textit{u}}\) and \(\mathrm {\textit{W}_{\textit{u} \rightarrow \textit{i}}}\) are the weight matrices of the user attention layer and \(\mathrm {\textit{b}}_{u}\) is its bias vector, while the weight matrices and bias vector of the item attention layer are represented by \(\mathrm {\textit{W}_{\textit{i}}^{\prime }}\), \(\mathrm {\textit{W}_{\textit{u} \rightarrow \textit{i}}^{\prime }}\), and \(\mathrm {\textit{b}}_{\textit{i}}^{\prime }\), respectively. g() represents the sigmoid function. The final representations of the user and the item are then calculated using the element-wise product \(\mathrm {\otimes }\) with the attention vectors.

$$\begin{aligned} {\widetilde{x}}_{u}= & {} \beta _{u} \otimes x_{u} \end{aligned}$$
(9)
$$\begin{aligned} {\widetilde{y}}_{i}= & {} \beta _{i} \otimes y_{i}, \end{aligned}$$
(10)

where \(\mathrm {\textit{x}}_{\textit{u}}\), \(\mathrm {\textit{y}}_{\textit{i}}\), m and \(\mathrm {\textit{c}}_{\textit{m}}\) denote the user's embedding, the item's embedding, the meta-path, and the contextual embedding of the meta-path, respectively.
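The attention-gated user and item representations of Eqs. (7)-(10) can be sketched as follows in NumPy; the randomly initialized weight matrices, biases, and the dimension d are placeholders used only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 64
rng = np.random.default_rng(1)

x_u = rng.standard_normal(d)          # user embedding
y_i = rng.standard_normal(d)          # item embedding
c_u_i = rng.standard_normal(d)        # meta-path context c_{u->i}

# Attention-layer parameters (randomly initialised here for illustration).
W_u, W_ui, b_u = rng.standard_normal((d, d)), rng.standard_normal((d, d)), np.zeros(d)
W_i, W_ui2, b_i = rng.standard_normal((d, d)), rng.standard_normal((d, d)), np.zeros(d)

beta_u = sigmoid(W_u @ x_u + W_ui @ c_u_i + b_u)     # Eq. (7)
beta_i = sigmoid(W_i @ y_i + W_ui2 @ c_u_i + b_i)    # Eq. (8)

x_u_tilde = beta_u * x_u                             # Eq. (9): element-wise product
y_i_tilde = beta_i * y_i                             # Eq. (10)
```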

For the attention mechanism used in the contextual representation of meta-paths to process user-item interaction information, different meta-paths have different semantics in user-item interactions; we therefore use a two-layer architecture to implement a meta-path-based contextual attention mechanism with interaction-specific attention weights over the meta-paths.

$$\begin{aligned} \alpha _{u, i, m}^{(1)}= & {} g\left( W_{u}^{(1)} x_{u}+{W_{i}^{(1)}} y_{i}+{W_{m}^{(1)}} f_{m}+b^{(1)}\right) \end{aligned}$$
(11)
$$\begin{aligned} \alpha _{u, i, m}^{(2)}= & {} g\left( {w^{(2)T}} \alpha _{u, i, m}^{(1)}+b^{(2)}\right) . \end{aligned}$$
(12)

Meta-path attention scores are obtained by normalizing the attention scores on all meta-paths using the softmax function.

$$\begin{aligned} \alpha _{u, i, m}=\frac{\exp \left( \alpha _{u, i, m}^{(2)}\right) }{\sum _{m^{\prime } \in G_{u \rightarrow i}} \exp \left( \alpha _{u, i, m^{\prime }}^{(2)}\right) } \end{aligned}$$
(13)

The new embedding based on the meta-path context can be expressed as

$$\begin{aligned} \textit{c}_{\textit{u} \rightarrow \textit{i}}=\sum _{\textit{m} \in \textit{G}_{\textit{u} \rightarrow \textit{i}}} \alpha _{\textit{u}, \textit{i}, \textit{m}} \cdot \textit{c}_{\textit{m}}. \end{aligned}$$
(14)

Finally, the three embedding vectors (user embedding, item embedding, and meta-path-based contextual embedding) are combined into a unified representation of the current interaction, expressed as

$$\begin{aligned} \widetilde{\textit{x}}_{\textit{u}, \textit{i}}=\widetilde{\textit{x}}_{\textit{u}} \oplus \textit{c}_{\textit{u} \rightarrow \textit{i}} \oplus \widetilde{\textit{y}}_{\textit{i}}. \end{aligned}$$
(15)
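The meta-path attention and the final fusion of Eqs. (11)-(15) can be sketched as follows; the parameters are randomly initialized placeholders, and the ungated x_u and y_i are used here for brevity where the full model would use the attention-gated versions from Eqs. (9)-(10).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

d, rng = 64, np.random.default_rng(2)
meta_paths = ["UMUM", "UMGM", "UUUM", "UMMM"]
f = {m: rng.standard_normal(d) for m in meta_paths}   # f_m from Eq. (5)
x_u, y_i = rng.standard_normal(d), rng.standard_normal(d)

# Two-layer attention over meta-paths, Eqs. (11)-(12).
W1u, W1i, W1m = (rng.standard_normal((d, d)) for _ in range(3))
b1, w2, b2 = np.zeros(d), rng.standard_normal(d), 0.0

scores = {}
for m in meta_paths:
    a1 = sigmoid(W1u @ x_u + W1i @ y_i + W1m @ f[m] + b1)   # Eq. (11)
    scores[m] = sigmoid(w2 @ a1 + b2)                        # Eq. (12)

alpha = softmax(np.array([scores[m] for m in meta_paths]))  # Eq. (13)

# Eq. (14): attention-weighted meta-path context (using the dropout output f_m).
c_u_i = sum(a * f[m] for a, m in zip(alpha, meta_paths))

# Eq. (15): concatenate the three representations before the MLP.
x_u_i = np.concatenate([x_u, c_u_i, y_i])
```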

The resulting unified representation of the interaction, containing the user embedding, item embedding, and meta-path-based contextual embedding, is fed into the MLP to implement a non-linear function that models complex interactions. The MLP component contains two hidden layers; the sigmoid function is used as the activation function, and the output layer uses a ReLU function. The learning algorithm for the SCLW_MCRec model is given in Algorithm 1. For training efficiency, we use l meta-paths and use Algorithm 1 to obtain the user-item interaction information. Given a node, its outgoing nodes are extracted to construct an information table that supports O(1)-time node sampling. In this way, the path instances generated for the meta-path interactions can be obtained in time \(\textrm{O}(\textit{l} \cdot \mathrm {\textit{L}} \cdot \mathrm {\textit{N}})\), where L is the path length and N is the maximum number of path instances considered per meta-path.

$$\begin{aligned} \hat{\textit{r}}_{\textit{u}, \textit{i}}={\text {MLP}}\left( \widetilde{\textit{x}}_{\textit{u}, \textit{i}}\right) \end{aligned}$$
(16)

Algorithm 1:

// Inputs: implicit feedback and information network
// Extreme-value cross-entropy loss and weight-normalization optimization
// Outputs: recommendation model for users

Inputs: user u, item i, meta-path m
Outputs: \({\hat{r}}_{u, i}\)

foreach \(<\textit{u},\textit{m},\textit{i}>\) do
      for user u and item i do
            Embed the user and the item using Eqs. (1) and (2), respectively
      end
      for meta-path m do
            Embed each single path instance (Eq. (3))
            Obtain the meta-path embedding representation using Eq. (4)
            Filter out invalid information by dropout (Eq. (5))
            Learn the attention of the meta-path for the user and the item through Eqs. (7) and (8), yielding the meta-path context representation \(\textit{c}_{\textit{u} \rightarrow \textit{i}}\)
      end
end

// The meta-path context representation influences the user/item representations

foreach u, i do
      User: Eq. (9)
      Item: Eq. (10)
end

Compute the meta-path attention scores \(\alpha _{\textit{u}, \textit{i}, \textit{m}}\) (Eq. (13))
Eq. (15): combine the user, item, and context embeddings to generate the current interaction representation \(\widetilde{\textit{x}}_{\textit{u}, \textit{i}}\)
Final expression: \(\hat{\textit{r}}_{\textit{u}, \textit{i}}\) (Eq. (16))

Loss function

To improve the movie recommendation algorithm and reduce the loss of the model as much as possible, a loss function is designed that combines the advantages of the cross-entropy loss function with the characteristics of the argmin and argmax functions. The loss function uses the abs function, which returns the absolute value of its argument, so that training and recommendation on the Movielens dataset incur lower loss. The loss function also introduces the extreme value theorem through the argmin and argmax functions, taking the maximum value into account without ignoring the minimum value, which allows the SCLW_MCRec model to obtain more comprehensive information about user-item interactions. The model loss function is expressed in the following equations, where \(\mathrm {\textit{y}}_{\textit{i}}\) denotes the actual label value and \(\hat{\mathcal {\textit{y}}}_{\textit{i}}\) denotes the predicted label value.

$$\begin{aligned} \text{ Loss }= & {} (1+\text{ weight}) \times \text {categorical}\_\text {crossentropy} \end{aligned}$$
(17)
$$\begin{aligned} {\text {Loss}}= & {} -(1+ \text{ weight } ) \sum _{i=1}^{\text{ output size } } y_{i} \cdot \log {\hat{y}}_{i} \end{aligned}$$
(18)
$$\begin{aligned} \text {Loss}= & {} -\left( 1+\frac{{\text {abs}}\left( \frac{3}{4} m-\frac{1}{4} n\right) }{{\hat{y}}_{i}-1}\right) \sum _{i=1}^{\text {output size}} y_{i} \cdot \log {\hat{y}}_{i}. \end{aligned}$$
(19)

The weight is determined by introducing the argmin and argmax functions, bringing in the extreme value theorem to obtain more comprehensive user-item interaction information and reduce the loss; m and n denote the results obtained by applying the argmin and argmax functions to the actual and predicted values, respectively.

$$\begin{aligned} \text{ weight }= & {} \frac{{\text {abs}}\left( \frac{3}{4} m-\frac{1}{4} n\right) }{{\hat{y}}_i-1} \end{aligned}$$
(20)
$$\begin{aligned} m= & {} \arg \min _i y_i+\arg \max _i y_i \end{aligned}$$
(21)
$$\begin{aligned} n= & {} \arg \min _i {\hat{y}}_{i}+\arg \max _i {\hat{y}}_{i} \end{aligned}$$
(22)

Compared with other loss functions, the loss function in this model can obtain more complete data features to reduce loss and make more accurate recommendations.
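As a concrete reading of Eqs. (17)-(22), the following NumPy sketch computes the extreme cross-entropy loss for a single sample. Since Eq. (19) does not fix which predicted probability appears in the weight's denominator, the predicted probability of the true class is used here as an illustrative assumption.

```python
import numpy as np

def argmaxmin_loss(y_true, y_pred, eps=1e-7):
    """Extreme cross-entropy loss of Eqs. (17)-(22), illustrative NumPy version."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)

    # Standard categorical cross-entropy, Eq. (18) without the weight.
    ce = -np.sum(y_true * np.log(y_pred))

    # m and n combine the argmin and argmax indices of the true and
    # predicted label vectors, Eqs. (21)-(22).
    m = np.argmin(y_true) + np.argmax(y_true)
    n = np.argmin(y_pred) + np.argmax(y_pred)

    # Weight of Eq. (20); abs() keeps the numerator non-negative.  The
    # denominator term y_hat_i - 1 is assumed here to use the predicted
    # probability of the true class (an illustrative choice).
    weight = abs(0.75 * m - 0.25 * n) / (y_pred[np.argmax(y_true)] - 1.0)

    return (1.0 + weight) * ce            # Eq. (17)

# Toy example with a single 3-class sample.
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.1])
print(argmaxmin_loss(y_true, y_pred))
```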

Optimizer

For model optimization, the convergence of stochastic gradient descent optimization is accelerated using weight normalization to reparameterize the weight vectors. Application of weight normalization to movie recommendation models shows great advantages.

The weight-normalization optimization approach considers a standard artificial neural network, in which the computation of each neuron consists of two parts: a weighted sum of the input features and an element-wise non-linearity:

$$\begin{aligned} y=\phi (w \cdot x+b). \end{aligned}$$
(23)

The scalar bias term is represented by b; w represents the k-dimensional weight vector, and x represents the k-dimensional vector of input features. The scalar parameter g and the parameter vector v are used to re-parameterize each weight vector w; stochastic gradient descent is then applied to this weight-normalized re-parameterization.

$$\begin{aligned} w=\frac{g}{\Vert v\Vert } v, \end{aligned}$$
(24)

where v is a k-dimensional vector; \(\mathrm {\Vert \textit{v}\Vert } \) denotes the Euclidean norm of v, and g is a scalar.

The neural network is trained in the new parameterization using standard stochastic gradient descent. The gradients of the loss function L with respect to the new parameters g and v are obtained by differentiation.

$$\begin{aligned} \nabla _{\textrm{g}} L= & {} \frac{\nabla _{w} L \cdot v}{\Vert v\Vert } \end{aligned}$$
(25)
$$\begin{aligned} \nabla _{v} L= & {} \frac{g}{\Vert v\Vert } \nabla _{w} L-\frac{g \nabla _{g} L}{\Vert v\Vert ^{2}} v, \end{aligned}$$
(26)

where \(\nabla _{\textit{w}} L\) is the gradient with respect to the weights w as normally used. Weight-normalized backpropagation therefore requires only minor modifications to the usual backpropagation equations in standard neural network software.

The gradient with respect to v can also be written as:

$$\begin{aligned} \nabla _{v} L= & {} \frac{g}{\Vert v\Vert } \times \left( M_{w} \times M_{w}\right) \nabla _{w} L \end{aligned}$$
(27)
$$\begin{aligned} \mathrm {\textit{M}}_{w}= & {} I-\frac{w w^{\prime }}{\Vert w\Vert ^{2}}, \end{aligned}$$
(28)

where \(\mathrm {\textit{M}}_{w}\) is the matrix that projects onto the complement of the w vector. This shows that weight normalization accomplishes two things: it scales the weight gradient by g/\(\mathrm {\Vert \textit{v}\Vert } \), and it projects the gradient away from the current weight vector, bringing the covariance matrix of the gradient closer to identity and benefiting optimization.
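A minimal NumPy sketch of the re-parameterization and its gradients (Eqs. (23)-(26)) is given below; the dimensions and values are illustrative placeholders, and the non-linearity of Eq. (23) is omitted.

```python
import numpy as np

def weight_norm_forward(v, g, x, b):
    """Eqs. (23)-(24): y = w.x + b with w = g * v / ||v|| (non-linearity omitted)."""
    w = g * v / np.linalg.norm(v)
    return w @ x + b, w

def weight_norm_grads(grad_w, v, g):
    """Eqs. (25)-(26): gradients w.r.t. g and v, given the gradient w.r.t. w."""
    v_norm = np.linalg.norm(v)
    grad_g = grad_w @ v / v_norm
    grad_v = (g / v_norm) * grad_w - (g * grad_g / v_norm**2) * v
    return grad_g, grad_v

# Illustrative usage with random values.
rng = np.random.default_rng(3)
k = 8
v, g, b = rng.standard_normal(k), 1.0, 0.0
x = rng.standard_normal(k)
y, w = weight_norm_forward(v, g, x, b)
grad_g, grad_v = weight_norm_grads(rng.standard_normal(k), v, g)
```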

In several experiments on film and television recommendation models, neural networks with weight normalization worked well over a wider range of learning rates than with conventional parameterization. This study uses weight-normalization optimization to decouple the lengths of the weight vectors from their orientations by re-parameterizing them in the neural network, which accelerates the convergence of stochastic gradient descent and considerably improves the model optimization process without introducing any dependencies between them. This suggests that the weight-normalization optimization method can also be applied to deep reinforcement learning or generative models, and that its introduction into film and television recommendation systems is highly effective. The weight-normalization optimization method is much simpler than other optimization methods, is faster than batch normalization, and has a lower computational overhead, allowing more optimization steps to be performed in the same amount of time.

Experimental analysis

In this experiment, the SCLW_MCRec model is tested on the Movielens movie dataset and evaluated against other models by modifying the loss function, activation function, optimizer, convolutional kernel, and network information structure using the Prec, Recall, and NDCG indices.

Datasets and selected meta-paths

The MovieLens dataset contains user and movie attribute information, ratings of individual movies by different users, and different types of interactions between users and movies, users and users, movies and movies, and movies and genres, as shown in Table 3. The MovieLens dataset is one of the most commonly used datasets for recommendation systems and a standard test dataset for machine learning algorithms; many well-known papers have used it, and it was also used in a historical recommendation system competition. Table 3 presents the statistics of the Movielens dataset. The first column corresponds to users, items, the number of interactions between the two, and the available meta-path information. The other columns present statistics for the other relationships: User-Movie, User-User, Movie-Movie, and Movie-Genre. User-Movie corresponds to the UMUM meta-path; User-User corresponds to the UUUM meta-path; Movie-Movie corresponds to the UMMM meta-path, and Movie-Genre corresponds to the UMGM meta-path.

UMUM denotes paths to movies followed by users who follow the same movies, constructed as user-movie-user-movie; UMGM denotes paths to movies of the same genre as movies followed by the user, constructed as user-movie-genre-movie; UUUM denotes paths to movies reached through users related to the user, constructed as user-user-user-movie; and UMMM denotes paths to movies associated with the movies that the user watches, constructed as user-movie-movie-movie. The dataset used in this paper is based on the public MovieLens dataset, refined by constructing user-movie-user-movie, user-movie-genre-movie, user-user-user-movie, and user-movie-movie-movie paths. The MovieLens dataset thus contains information about the user, the movie, their interactions, and the path instances of the four meta-path contexts.
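To illustrate how path instances for one meta-path can be enumerated from raw interactions, the following sketch builds user-movie-user-movie (UMUM) instances from a toy interaction table; the data and the cap on the number of instances are assumptions for illustration, not the sampling strategy of the paper.

```python
from collections import defaultdict

# Toy interaction data (illustrative, not the MovieLens statistics).
user_movies = {
    "u1": {"m1", "m2"},
    "u2": {"m2", "m3"},
    "u3": {"m1", "m3"},
}

# Invert the relation: which users watched each movie.
movie_users = defaultdict(set)
for u, movies in user_movies.items():
    for m in movies:
        movie_users[m].add(u)

def umum_instances(user, max_instances=10):
    """Enumerate user-movie-user-movie path instances for the UMUM meta-path."""
    paths = []
    for m1 in user_movies[user]:
        for u2 in movie_users[m1] - {user}:
            for m2 in user_movies[u2] - {m1}:
                paths.append((user, m1, u2, m2))
                if len(paths) >= max_instances:
                    return paths
    return paths

print(umum_instances("u1"))
```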

Table 3 Statistics of the MovieLens dataset

Evaluation indices

To validate the effectiveness of the SCLW_MCRec model, three evaluation indices, Prec, Recall, and NDCG, are used to evaluate the model recommendation performance.

The recall rate is expressed as

$$\begin{aligned} \text{ recall } =\frac{\sum _{u \in U}|R(u) \cap T(u)|}{\sum _{u \in U}|T(u)|}. \end{aligned}$$
(29)

The accuracy rate is expressed as

$$\begin{aligned} {\text {prec}}=\frac{\sum _{u \in U} |R(u) \cap T(u)|}{\sum _{u \in U}|R(u)|}, \end{aligned}$$
(30)

where R(u) represents the top-n recommendation list made to the user based on the user’s behavior with the training set, and T(u) represents the set of items actually selected by the user after the system has recommended items to the user.

The NDCG (normalized discounted cumulative gain) is expressed as

$$\begin{aligned} \text {NDCG}=\sum _{i \in T(u)} \frac{2^{r_{i}}-1}{\log \left( p_{i}+1\right) }, \end{aligned}$$
(31)

where \(\textrm{r}_{i}\) and \(\mathrm {\textit{p}}_{i}\) denote the relevance of item i in T(u) and its position in R(u), respectively.
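The three evaluation indices can be sketched for a single user as follows; the NDCG here uses the common binary-relevance form with log base 2, which may differ in detail from Eq. (31), and the toy lists are placeholders.

```python
import numpy as np

def prec_recall_ndcg(recommended, relevant, k=10):
    """Top-k precision, recall and NDCG for one user (illustrative definitions)."""
    rec_k = recommended[:k]
    hits = [1 if item in relevant else 0 for item in rec_k]

    prec = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)

    # DCG with binary relevance; IDCG normalises by the ideal ranking.
    dcg = sum(h / np.log2(pos + 2) for pos, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return prec, recall, ndcg

# Example: R(u) is the top-k list, T(u) the items the user actually selected.
print(prec_recall_ndcg(["a", "b", "c", "d"], {"b", "d", "e"}, k=4))
```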

Advanced baseline

To provide a more objective understanding of the effectiveness of the SCLW_MCRec model in obtaining user-item interaction information through the three-way interaction neural network and the two-layer CNN with the four meta-paths (UMUM, UMGM, UUUM, UMMM), and of the optimization effects of the extreme-value loss function and the weight-normalization optimization method on film and television recommendation models, this section describes several advanced recommendation algorithm models and compares them with the SCLW_MCRec model for verification.

Comparison of methods We considered three categories of recommendation methods: CF-based methods using implicit feedback (ItemKNN, BPR, MF), HIN-based methods using rich heterogeneous information (HeteRS, FMG, SVDFeature(hete)), and meta-path-based methods (MCRec, MCRec(avg), MCRec(mp), MCRec(rand), and MAGNN_Rec).

ItemKNN: The ItemKNN model is a classical collaborative filtering model that recommends similar items based on those chosen by users in the past.

Bayesian Personalized Ranking (BPR) [20]: The BPR model is based on Bayesian theory to maximize the posterior probability with prior knowledge and minimize the pairwise ranking loss of implicit feedback.

Matrix Factorization (MF): A standard matrix factorization method, optimized with cross-entropy loss for top-N recommendation.

HeteRS [19]: HeteRS is a recommendation method based on heterogeneous networks that uses multivariate Markov chains to model user preferences.

SVDFeature(hete)/SVDFeature(mp) [23]: A matrix factorization model that uses heterogeneous relations as features.

FMG [21]: This is a heterogeneous network-based rating prediction model.

MCRec [25]: This is a new type of deep neural network with a common attention mechanism for top-N recommendation using context based on rich meta-paths.

MCRec(avg)/MCRec(mp)/MCRec(rand): These are variants of the MCRec model.

MAGNN_Rec [32]: This model uses graph neural networks to aggregate different levels of interaction information, such that the user-item representations obtained are more closely related to the meta-path context. The interaction information is efficiently applied to improve the accuracy of the recommendation performance.

Experimental results

For this experiment, all user implicit feedback records from the movie dataset were randomly divided into a training set (80%) and a test set (20%). The models were compared in terms of precision (Prec), recall (Recall), and normalized discounted cumulative gain (NDCG) to determine the strengths and weaknesses of the different models. The results for each parameter setting were compared with those of previous models using the same dataset, preprocessing method, optimizer, and loss function, and the validity of the SCLW_MCRec model was verified.

The SCLW_MCRec model can effectively recommend movies. To demonstrate its effectiveness, it was compared with the ItemKNN, BPR [20], MF, HeteRS [19], FMG [21], SVDFeature [23], and SVDFeaturemp models, the variants of the MCRec model, and the MAGNN_Rec model. The ItemKNN model can only use the movies users have watched in the past, ignoring information about the interaction between the user and the item. The BPR recommendation model adds implicit feedback to the movie information based on the history of movies watched, but is still ineffective in applying user-item interaction information. The HeteRS [19], FMG [21], SVDFeature [23], and SVDFeaturemp models can obtain heterogeneous information and item attribute information for recommendation, but their application of interaction information is still inadequate. The MCRec model variants obtain user-item interaction information using meta-paths, but the filtering of the obtained path instances is not sufficient.

Unlike the ItemKNN and BPR models, the SCLW_MCRec model can obtain user and item information as well as user-item interaction information; the three-way neural interaction network with a two-layer, one-dimensional CNN can obtain higher-order user-item interaction information. Unlike the MF, HeteRS [19], FMG [21], SVDFeature [23], and SVDFeaturemp models, the SCLW_MCRec model obtains user and item attributes and selects the most suitable of the four meta-path methods (UMUM, UMGM, UUUM, UMMM), using both interaction and attribute information. Compared to the MCRec model and its variants, the SCLW_MCRec model is better at selecting path instances and applying higher-order interaction information across the UMUM, UMGM, UUUM, and UMMM meta-paths. The extreme cross-entropy loss function and the weight-normalization optimization method further optimize the model and reduce the model loss. Compared with the MAGNN_Rec model, the SCLW_MCRec model is less capable of aggregating information, but more capable of selecting suitable meta-path instances and obtaining higher-order interaction information for recommendation.

The SCLW_MCRec model has both advantages and disadvantages compared to currently available state-of-the-art recommendation models. To clearly demonstrate the strengths and weaknesses of the models, this section tests and evaluates different models using the Movielens dataset with the same parameters, and determines the Prec, Recall, and NDCG evaluation indices, as shown in Table 4. The SCLW_MCRec model produced a significant improvement in all evaluation indices compared to the individual recommendation algorithms. Compared with ItemKNN, BPR, MF, HeteRS, SVDFeature, and SVDFeaturemp, the SCLW_MCRec model had significantly improved Prec, Recall, and NDCG indices as it can obtain user interaction information. Compared with the FMG and MCRec variants, the SCLW_MCRec model can obtain more relevant interaction information, although the FMG and MCRec variant models also apply user-item interaction information. The Prec, Recall, and NDCG indices all improved. Compared with the MAGNN_Rec model, the SCLW_MCRec model had improved Recall and NDCG indices; the NDCG increased by 6.7%. The experimental results demonstrate the effectiveness of the SCLW_MCRec model recommendation performance.

Table 4 Comparison of evaluation indices for advanced models

The SCLW_MCRec model is extremely effective for movie recommendations compared to the other recommendation models. In Fig. 3, it is observed that the evaluation indices of the SCLW_MCRec model are much higher than those of the other models, especially NDCG. The SCLW_MCRec model is better for processing analysis and recommendation with the Movielens dataset.

Fig. 3 Bar chart comparing evaluation indices of advanced movie recommendation models

Fig. 4 Advanced baseline model evaluation index comparison line chart

In this study, several experiments were conducted using the Movielens dataset. To demonstrate the validity of the proposed model, an ablation analysis was performed; the results are presented in Table 5.

In the ablation experiments, the proposed model was ablated by deleting or replacing individual modules while keeping the corresponding parameters constant. Four cases were considered: (1) the MCRec model with only the three-way neural network; (2) the SC_MCRec model with the three-way neural network and the two-layer, one-dimensional CNN; (3) the SCL_MCRec model with the three-way neural network, the two-layer, one-dimensional CNN, and the improved loss function; (4) the SCLW_MCRec model with the three-way neural network, the two-layer, one-dimensional CNN, the improved loss function, and weight-normalization optimization. The other parameter settings were unchanged.

Table 5 Ablation study on Movielens dataset

Compared to the MCRec model with only a three-way neural network, the SC_MCRec model with a three-way neural network and a two-layer, one-dimensional CNN obtained more user-item interaction information based on the meta-path context, allowing the model to obtain better interaction information and higher-order user-item interaction features for accurate recommendations to users. The SCL_MCRec model, which adds the improved extreme cross-entropy loss function, further improved the acquisition of higher-order interaction features and enabled the model to better reduce losses during the movie recommendation training process, improving its effectiveness. The SCLW_MCRec model, which additionally adopts the weight-normalization optimization approach, was more stable and better able to provide accurate recommendations to users.

Fig. 5 Comparison of ablation experiment results

To better describe the improvement in recommendation performance on the Movielens dataset provided by each module in the SCLW_MCRec model, the ablation experiment results are presented in the form of a line graph, as shown in Fig. 5. Each module in the model increases recommendation performance on the Movielens dataset, demonstrating the effectiveness of the SCLW_MCRec model in recommender systems.

Conclusion

A network model combining a three-way neural interaction network and a two-layer CNN was designed, and an extreme cross-entropy loss function and a weight-normalization optimization method were devised. The movie recommendation accuracy was greatly improved by using four meta-path methods (UMUM, UMGM, UUUM, UMMM) to obtain user-item interaction information; the interaction information obtained through meta-path filtering is more in line with user interests. On the MovieLens dataset, the Prec, Recall, and NDCG evaluation indices all improved significantly. The SCLW_MCRec model compensates for the shortcomings of other recommendation models in learning effective representations of user, item, and meta-path contexts, with powerful interaction features that can more effectively process user-item interactions that are easily overlooked. However, for a different dataset, the meta-paths must be designed and annotated manually; the model does not provide a way to automatically design and select meta-paths based on the interaction information in the data and apply them directly in other scenarios. In future research, the main goal is the automatic selection of the single or multiple meta-paths best suited to the application, regardless of the dataset, filtering the interaction information so as to provide accurate recommendations to the user by means of self-attention or similar attention mechanisms. In addition, we aim to design optimal loss functions and optimization methods for different meta-path approaches to improve the convergence speed and reduce the loss of the model.