1 Introduction

As the amount of information rapidly increases, recommendation systems have become an important tool for information filtering. They help users discover products and services that they may be interested in. Recommendation systems have achieved great success in modeling users’ preferences and intentions by taking advantage of their recent and long-term behaviors. Existing methods usually use recurrent neural networks [1, 2] and attention mechanisms [3, 4] to model user preferences based on historical interaction sequences. However, recommendation systems face two serious challenges, namely the data sparsity and cold-start problems [5]: cold-start users have few or no interactions with items. Cross-domain recommendation [6] has attracted widespread attention from academia and industry because it leverages the rich user behaviors in the source domain to help the target domain make recommendations, thereby alleviating the data sparsity and cold-start problems.

In recent years, a promising solution in cross-domain recommendation has been to connect the source domain and the target domain by learning a bridge function that transfers the appropriate knowledge across the two domains. For example, the EMCDR [6] model trains a bridge function shared by all users to achieve knowledge transfer from the source domain to the target domain. However, a shared bridge function is a coarse-grained method: it fails to reflect users’ personalized preferences and reduces the accuracy of recommendations. On the one hand, the relationship between user preferences in the source domain and those in the target domain is complex and changeable, and it is difficult for a single bridge function to accurately capture the relationships of all users. On the other hand, training the bridge function on only some active users and popular items is unstable and ignores a large part of important users and items, which weakens the generalization ability. To alleviate these shortcomings, PTUPCDR [7] proposes to leverage a meta-network to learn a personalized bridge function for each user, and achieves good results.

Despite the validity of existing approaches, these studies have some limitations: (1) Most cross-domain approaches only consider long-term interests as users’ overall preferences while ignoring sequential features, and they are limited in modeling dynamic short-term interests. For a new user, current interests should be modeled through short-term behaviors, and long-term interests are then continually complemented and extended in the recommendations. For an existing user, the recommender can model both long-term interests and short-term behaviors so as to capture the latest interest changes. It is important and challenging to adaptively fuse these two aspects. (2) Bridge functions learned by meta-learning are usually unstable and do not adapt well to new tasks. In addition, personalized bridge functions may be too fine-grained and lead to overfitting. Meta-learning requires training the model on a large number of similar training tasks; however, the data of cold-start users are very sparse, and meta-learning-based methods rely heavily on the feature distribution of the training data while neglecting the ability to adapt to new tasks. As a result, the bridge functions learned by meta-learning cannot accurately migrate user features across the two domains.

Therefore, it is an urgent problem to provide more accurate recommendations for cold-start users in the target domain while ensuring good processing granularity and avoiding overfitting problems.

Considering the dynamic representation of users’ sequential interactions over the past period of time and taking advantage of meta-learning, we propose a meta-adversarial framework for cross-domain cold-start recommendation (MAFCDR) to solve the cold-start user recommendation problem. Specifically, we use a Gated Recurrent Unit (GRU) to extract the long short-term preferences of users in the source domain. Then, we construct a multi-level feature attention mechanism to independently learn the weights of long-term and short-term features. By weighting these features, we build the users’ interest representations. To transfer users’ representations, we train an adversarial meta-network with the user’s feature embedding in the source domain as the input. Through the adversarial game between the generator and the discriminator, we obtain model parameters that can be applied to different tasks, which enhances the robustness and stability of the model. The obtained personalized parameters are used as initialization parameters of the bridge function, which can capture the preference relationships between different domains. After training, we input the user embeddings from the source domain to the bridge function generated by the meta-network, and obtain the transformed user embeddings. The transformed user embeddings are used as the initial embeddings in the target domain. With these initial embeddings, our method is effective for cold-start users in the target domain.

The contributions of this paper can be summarized as follows:

  • We propose a meta-adversarial framework for cross-domain cold-start user recommendation. Personalized bridge functions for each user are generated by our model.

  • We design multi-level feature attention to extract the long short-term preferences of users in the source domain, and transfer it to the preference representations of users in the target domain.

  • We conduct extensive experiments on three cross-domain tasks using the Amazon dataset to validate the effectiveness of the proposed model.

2 Related Work

Three lines of work are most related to this paper: cross-domain recommendation, generative adversarial networks and meta-learning.

2.1 Cross-Domain Recommendations

Cross-domain recommendations (CDRs) provide effective solutions for data sparsity and cold-start challenges. The basic idea of CDRs is to utilize the richer training data in the source domain to improve the recommendation accuracy in the sparse target domain. Most of the existing CDR methods are based on collaborative filtering, while others are based on transfer learning. Transfer learning-based CDR methods solve recommendation tasks in the target domain by transferring auxiliary knowledge that is different from but related to the target domain, thereby improving the recommendation performance of the target domain. EMCDR [6] learns the mapping function on overlapping users, which maps user preferences across domains. SSCDR [8] uses overlapping users as anchors to calculate the preference characteristics of cold-start users through semi-supervised learning and k-nearest neighbor clustering to achieve cross-domain recommendation. DDTCDR [9] develops a latent orthogonal mapping to extract users’ preferences in multiple domains while retaining the relationship between users in different latent spaces. Similar to the multi-task approach [10], these approaches focus on well-designed deep network structures. In this paper, we design a framework that can explicitly model knowledge transfer between different domains, rather than using a special deep network structure to implicitly transfer knowledge.

2.2 Generative Adversarial Network

Generative adversarial networks (GANs) [11] are becoming increasingly popular in cross-domain recommendation. Traditional GAN models usually require a lot of training data to learn the data distribution, especially in the case of high-dimensional space and complex data distribution, and the training process is often unstable. ACDN [12] dynamically generates adversarial samples during training to improve the generalization ability of cross-domain recommender systems. GAR [13] tricks the recommender by adversarially training the generator so that the generated cold-item embeddings have a distribution similar to that of the warm-item embeddings. ELECRec [14] trains the generator as an auxiliary model alongside the discriminator to sample plausible alternative items, and the trained discriminator is used as the final model.

Fig. 1 Diagram of the optimization-based meta-learning algorithm

2.3 Meta-learning

In recent years, meta-learning in recommendation systems [15] has attracted attention. Most of these works focus on scenarios with few training samples, because it is natural to turn these tasks into few-shot learning problems. The inspiration for meta-learning comes from the human learning process, which can quickly learn new tasks based on a small number of examples. In existing meta-learning works, metric-based methods [16] learn metrics or distance functions on tasks, while model-based methods [17] aim to design an architecture or training process for fast generalization across tasks. Finally, the optimization-based approach [18] directly adapts the optimization method to achieve fast adaptation. In this work, we consider meta-learning based on parameter optimization for the recommendation system, which can serve the personalized recommendation model by reflecting each user’s item interactions. The optimization-based meta-learning algorithm considers a model N and a distribution of tasks p(T), and attempts to find the ideal parameters of model N, as shown in Fig. 1. The algorithm performs local and global updates. Starting from random initial parameters \(\theta\), the algorithm samples several tasks from the task distribution, \(T_{i} \sim p(T)\). For each task \(i=1,\dots ,T\), the algorithm updates the parameter \(\theta\) to \(\theta ^{*}\) locally using the gradient \(\nabla _{\theta } L_{i}\left( f_{\theta }\right)\), where T is the number of sampled tasks and \(L_{i}\left( f_{\theta }\right)\) represents the training loss of task i. After the local updates, the algorithm updates the parameter \(\theta\) globally over all sampled tasks based on \(L_{i}^{\prime }\left( f_{\theta ^{*}}\right)\), i.e., the test loss of task i under the parameter \(\theta ^{*}\), so that the globally updated parameters are suitable for various tasks.
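
The local and global updates described above can be written compactly as follows. This is the common MAML-style form, given here only as an illustration; the step sizes \(\eta _{\text {in}}\) and \(\eta _{\text {out}}\) and the task index on \(\theta _{i}^{*}\) are notation assumed for this sketch and do not appear in the original description.

$$\begin{aligned} \begin{aligned}&\theta _{i}^{*}=\theta -\eta _{\text {in}} \nabla _{\theta } L_{i}\left( f_{\theta }\right) , \\&\theta \longleftarrow \theta -\eta _{\text {out}} \nabla _{\theta } \sum _{i=1}^{T} L_{i}^{\prime }\left( f_{\theta _{i}^{*}}\right) . \end{aligned} \end{aligned}$$

The first line is the local (per-task) update on the training loss, and the second line is the global update on the summed test losses of the adapted parameters.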

The optimization-based meta-learning algorithm uses two sets for each task, namely a support set and a query set, which are used to calculate the training loss and the test loss on each task, respectively. In the local update process, the algorithm adjusts the parameters of the model on the support set (the learning process). In the global update process, the algorithm trains the parameters (the learning-to-learn process) by minimizing the loss of the adapted parameters on the query set. Once the learning-to-learn process reaches its termination condition on the previous tasks, the algorithm needs only the support set of a new task: using this support set, the model can adapt to the new task. Note that the algorithm does not need to store the parameters of each task; instead, these parameters are computed from the support set.
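
The support/query procedure can be illustrated with a minimal sketch. It is not the model of this paper: the task family (fitting a scalar slope), the function names, the learning rates, and the use of a first-order approximation of the global gradient are all assumptions chosen to keep the example self-contained.

```python
import random

def loss_and_grad(theta, data):
    """Mean squared error of the toy model y = theta * x and its gradient w.r.t. theta."""
    n = len(data)
    loss = sum((theta * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (theta * x - y) * x for x, y in data) / n
    return loss, grad

def sample_task(rng, n_support=5, n_query=5):
    """Hypothetical task T_i ~ p(T): fit y = a * x with a task-specific slope a."""
    a = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_support + n_query)]
    points = [(x, a * x) for x in xs]
    return points[:n_support], points[n_support:]

def meta_train(steps=200, tasks_per_step=4, inner_lr=0.1, outer_lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            support, query = sample_task(rng)
            # Local update: adapt theta on the support set (training loss).
            _, g_support = loss_and_grad(theta, support)
            theta_star = theta - inner_lr * g_support
            # Test loss of the adapted parameter on the query set.
            _, g_query = loss_and_grad(theta_star, query)
            # First-order approximation: the query gradient at theta_star
            # is used directly as the gradient w.r.t. theta.
            meta_grad += g_query
        # Global update: move the shared initialization.
        theta -= outer_lr * meta_grad / tasks_per_step
    return theta
```

At test time only the support set of a new task is needed: one local step from the returned `theta` produces the task-specific parameter, which is never stored.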

We regard each task as estimating user preferences in the recommendation system. Inspired by this, we propose a MAML-based recommendation system that can quickly estimate the preferences of new users based on only a few user-item interactions. The MAML-based recommendation system considers that different users have different optimal parameters. Therefore, our MAML-based model provides personalized recommendations for each user based on their unique consumption history.

3 Preliminaries

The CDR problem studied in this paper includes a source domain and a target domain. Each domain has a user set \(U=\left\{ u_{1}, u_{2}, \ldots \right\}\), an item set \(V=\left\{ v_{1}, v_{2}, \ldots \right\}\) and a rating matrix R. \(r_{ij}\in R\) represents the interaction between user \(u_i\) and item \(v_j\). In order to distinguish these two domains, the user set, item set and rating matrix of the source domain are represented as \(U^s\), \(V^s\), \(R^s\), respectively, and those of the target domain are represented as \(U^t\), \(V^t\), \(R^t\). The set of overlapping users between the two domains is defined as \(U^o=U^s\cap U^t\). For items, \(V^s\) and \(V^t\) are disjoint, which means that there are no overlapping items between the two domains.
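
As a small illustration of this setup, the sketch below stores each domain's observed ratings and derives the user set, the item set and the overlapping users \(U^o\). The `Domain` class, the function name and the toy identifiers are hypothetical and serve only to make the set definitions concrete.

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    """Observed ratings of one domain, keyed by (user, item)."""
    ratings: dict = field(default_factory=dict)

    @property
    def users(self):
        return {user for user, _ in self.ratings}

    @property
    def items(self):
        return {item for _, item in self.ratings}

def overlapping_users(source, target):
    """U^o = U^s ∩ U^t: users observed in both domains."""
    return source.users & target.users

# Toy data (hypothetical identifiers).
source = Domain({("u1", "movie_1"): 5.0, ("u2", "movie_2"): 3.0})
target = Domain({("u1", "cd_1"): 4.0, ("u3", "cd_2"): 2.0})

overlap = overlapping_users(source, target)
# Users seen only in the source domain are cold-start users for the target domain.
cold_start = source.users - target.users
```

Here the item sets of the two toy domains are disjoint, matching the assumption that \(V^s\) and \(V^t\) share no items.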

This paper leverages the embedding method to convert users and items into low-dimensional dense vectors. \(u_{i}^{d} \in {\mathbb {R}}^{k}\) and \(v_{j}^{d} \in {\mathbb {R}}^{k}\) represent the embedding of user \(u_i\) and item \(v_j\), respectively, where k represents the embedding dimension and \(d\in \left\{ s,t \right\}\) represents the domain label. Given users’ behavior sequences, we can generate dense vectors that encode the users’ preferences and can be used (along with other rich features) to predict users’ preference scores for items in the target domain.

4 The Proposed Model

4.1 Model Framework

Inspired by adversarial learning and meta-learning research, we propose a novel model framework as shown in Fig. 2. The framework attempts to combine the long-term (static) and short-term (dynamic) preferences of the user for the next item recommendation [19].

Given user u, we first obtain his/her long-term and short-term preference representations, i.e., \(\varvec{p_l}\) and \(\varvec{p_e}\), according to his/her behavior sequence in the source domain. Then, a multi-level attention structure is adopted to fuse long-term and short-term features, and we obtain a generalized user representation \(\varvec{p_u}\) in which the contributions of long-term and short-term features are determined by dynamically learnable weights. In order to capture the complex relationship between different user preferences in source and target domains, we propose an adversarial training framework containing a generator that generates internal model initialization parameters and a discriminator that maintains meta-task invariance. Through adversarial training, a personalized bridge function is generated between users’ embeddings in source and target domains. After training, we input the user embedding in the source domain into the bridge function and obtain the transformed user embedding, which is used as the initial embedding in the target domain. Through this initial embedding, our method is effective for cold-start users who have no interactions in the target domain.

Fig. 2 Model framework

4.2 User Preference Learning

To capture the users’ personalized preferences, we extract long-term and short-term feature vectors from the users’ interactive sequences. In this way, our model can not only obtain stable long-term preferences of users, but also mine dynamic short-term preferences, which can improve the diversity and novelty of recommendations.

In this module, the users’ long-term sequences are used as input, and the users’ short-term preferences are captured by GRU gating. In order to integrate the long-term and short-term preferences of users, we design a multi-level attention structure that can determine the contribution of long-term and short-term features of users through dynamic learnable weighting factors.

4.2.1 Long Short-Term User Representation Learning

Owing to their excellent performance in modeling users’ sequential behavior, recurrent neural networks (RNNs) have attracted great attention in academia and industry in recent years. The hidden-state update of an RNN can be defined as follows:

$$\begin{aligned} \varvec{h_{k}}=g\left( W\varvec{x_k}+U\varvec{h_{k-1}} +b\right) , \end{aligned}$$
(1)

where g is the activation function, \(\varvec{x_{k}}\) is the latest user behavior, \(\varvec{h_{k-1}}\) is the last hidden state, b is the bias term, and W and U are trainable parameters. Among all RNN-based models, LSTM (long short-term memory) and GRU are most commonly used in recommender systems (RS). Compared with LSTM, GRU has fewer gates and fewer parameters, yet it can achieve comparable functionality and is easier to train, which can greatly improve the training efficiency. Therefore, we apply GRU to users’ historical interactive sequences to extract short-term preferences. The equations are as follows:

$$\begin{aligned}&\varvec{r_i}=\sigma \left( W_r\cdot \left[ \varvec{h_{i-1}},S_u \right] \right) , \varvec{z_i}=\sigma \left( W_z\cdot \left[ \varvec{h_{i-1}},S_u \right] \right) , \\&\varvec{{\widetilde{h}}_i}=\tanh \left( W_{{\tilde{h}}} \cdot \left[ \varvec{r_{i}} \otimes \varvec{h_{i-1}}, S_{u}\right] \right) , \\&\varvec{h_{i}}=\left( 1-\varvec{z_{i}}\right) \otimes \varvec{h_{i-1}}+\varvec{z_{i}} \otimes \varvec{{\widetilde{h}}_{i}}, \end{aligned}$$
(2)

where \(\varvec{r_i}\in R^d\) and \(\varvec{z_i}\in R^d\) are gates controlling past and present information, \(W_r,W_z,W_{{\widetilde{h}}}\in R^{d\times \left( d+1 \right) }\) are learnable weights, \(\sigma (\cdot )\) is the sigmoid function, \(\left[ \cdot ,\cdot \right]\) denotes concatenation, \(\otimes\) denotes element-wise multiplication, and the initial hidden state \(h_0\) is zero-initialized.
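
The gate computations of Eq. 2 can be traced step by step in a minimal sketch. It assumes that the input at each step is a small embedding vector `x` (so the weight matrices have shape d × (d + m) for an m-dimensional input), omits bias terms as Eq. 2 does, and uses random untrained weights; the function names are illustrative only.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def gru_step(h_prev, x, W_r, W_z, W_h):
    """One GRU update following Eq. 2 (no bias terms)."""
    concat = h_prev + x                                   # [h_{i-1}, x]
    r = [sigmoid(a) for a in matvec(W_r, concat)]         # reset gate
    z = [sigmoid(a) for a in matvec(W_z, concat)]         # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x    # [r ⊗ h_{i-1}, x]
    h_tilde = [math.tanh(a) for a in matvec(W_h, gated)]  # candidate state
    return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h_prev, h_tilde)]

def run_gru(sequence, d, seed=0):
    """Run the GRU over a behavior sequence, starting from h_0 = 0."""
    rng = random.Random(seed)
    m = len(sequence[0])

    def init():
        return [[rng.uniform(-0.5, 0.5) for _ in range(d + m)] for _ in range(d)]

    W_r, W_z, W_h = init(), init(), init()
    h = [0.0] * d
    for x in sequence:
        h = gru_step(h, x, W_r, W_z, W_h)
    return h
```

Because each new state is a convex combination of the previous state and a tanh output, every component of the hidden state stays strictly between -1 and 1 when starting from the zero state.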

The short-term user preferences extracted by GRU are obtained by linear transformation using the output hidden layer state \(\varvec{h_i}\) as follows:

$$\begin{aligned} \varvec{p_e}=W\cdot \varvec{h_i}+b, \end{aligned}$$
(3)

where \(W\in R^d\) and \(b\in R^d\) are learnable parameters.

As long-term preferences are inherent and static, we directly use the item sequence embedding of user interaction as the user’s long-term preference, denoted as \(\varvec{p_l}\).

4.2.2 Long Short-Term Preference Fusion

The users’ long-term preferences and short-term preferences reflect different aspects of information, and their dimensions are not exactly the same. We cannot simply use weighted summation to fuse them.

In this work, we use attention mechanisms [20] to address this problem. Attention-based models can not only capture the relationships between different components, but also selectively construct features to emphasize key information and weaken redundant information. In this paper, to describe user interests more carefully, we design a multi-level attention structure to determine the contribution of long-term and short-term features by dynamically learnable weighting factors.

We first augment the long-term feature representation using a first-level attention mechanism to capture key item information, and then apply second-level attention to assign different weights, thus obtaining the fused user representation. The first-level attention formula is as follows:

$$\begin{aligned} \varvec{p_{l}^{'}}=W_1{\varvec{p_l}}, \end{aligned}$$
(4)

where \(\varvec{p_{l}^{'}}\) denotes the embedding obtained after the first-level attention mechanism and \(W_1\) denotes the dynamically learned weight parameter.

Long-term user representations correspond to long-term preferences, while short-term user representations indicate dynamic and recent preferences. These two types of representations are complementary and their fusion may have a stronger expressive power. After obtaining the enhanced long-term user preferences, in order to further determine the proportion of cross-domain long-term preferences and short-term preferences, i.e., which of them occupies the majority in users’ preferences, we use second-level attention to help make judgements.

$$\begin{aligned} \varvec{p_u}=W_2\varvec{p_l{^{'}}}+W_3\varvec{p_e}, \end{aligned}$$
(5)

For more details, we calculate these weighting parameters using the following equations.

$$\begin{aligned} \begin{aligned}&W_1=\exp \left( h_1{^T}{\text {ReLU}} \left( V_1\cdot \varvec{p_l} \right) +\varphi _1 \right) , \\& W_2=\exp \left( h_2{^T} {\text {ReLU}} \left( V_2\cdot [{\varvec{p_l{'}} \oplus \varvec{p_e}}] \right) +\varphi _2 \right) , \\& W_3=\exp \left( h_3{^T}{\text {ReLU}} \left( V_3\cdot [{\varvec{p_l{'}}\oplus \varvec{p_e}}] \right) +\varphi _3\right) , \\ \end{aligned} \end{aligned}$$
(6)

where \(V_1,V_2,V_3\in R^{D_h\times D_{p_l}}\) are matrix parameters that implement the dimensional mapping, \(h_1,h_2,h_3\in R^{D_h}\) are vector parameters, and \(\varphi _1,\varphi _2,\varphi _3\) are scalar parameters.

It is worth noting that the weights are calculated with \(\exp \left( \cdot \right)\), which allows \(W_*\) to be greater than 1. This is a relatively benign property, because such weights can compensate for dimension differences to some extent. Of course, the dynamically learned weights can also be less than 1 if necessary.
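
Equations 4 to 6 can be followed numerically in a minimal sketch. It assumes that \(\varvec{p_l}\) and \(\varvec{p_e}\) have already been brought to the same dimension so that the sum in Eq. 5 is defined, and that \(V_2\) and \(V_3\) act on the concatenated vector; the function names and the parameter packing `(h, V, phi)` are illustrative choices, not the paper's implementation.

```python
import math

def matvec(M, v):
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def attention_weight(h, V, x, phi):
    """exp(h^T ReLU(V x) + phi): a positive scalar that may exceed 1 (Eq. 6)."""
    hidden = [max(0.0, a) for a in matvec(V, x)]
    return math.exp(sum(hi * ai for hi, ai in zip(h, hidden)) + phi)

def fuse_preferences(p_l, p_e, level1, level2_long, level2_short):
    """Each level is a tuple (h, V, phi) of attention parameters."""
    w1 = attention_weight(*level1[:2], p_l, level1[2])
    p_l_prime = [w1 * a for a in p_l]                       # Eq. 4
    joint = p_l_prime + p_e                                 # concatenation
    w2 = attention_weight(*level2_long[:2], joint, level2_long[2])
    w3 = attention_weight(*level2_short[:2], joint, level2_short[2])
    p_u = [w2 * a + w3 * b for a, b in zip(p_l_prime, p_e)]  # Eq. 5
    return p_u, (w1, w2, w3)
```

With all attention parameters set to zero every weight equals \(\exp (0)=1\), so the fused vector reduces to the plain sum of the two preference vectors; a positive \(\varphi\) pushes the corresponding weight above 1, which is the behavior discussed above.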

4.3 Meta-adversarial Training Process

The relationship between user preferences in different domains varies from user to user, and thus the process of preference transfer needs to be personalized. Intuitively, there is some connection between preference relationships and user characteristics, and existing approaches use meta-networks to capture this relationship [7, 21, 22]. However, the distribution of tasks for meta-learning is often complex and variable, with very sparse data in each task, which makes the parameters learned by the meta-network unstable and hard to adapt to new tasks, thus reducing the recommendation performance. To address this problem, we propose a meta-adversarial network in which the generator takes the users’ transferable features as input and generates different model parameters for different tasks, while the discriminator judges whether the generated parameters have ‘task invariance,’ i.e., whether the parameters remain stable across tasks. If ‘task invariance’ is maintained, the discriminator gives positive feedback; otherwise, it gives negative feedback. We use multilayer perceptrons (MLPs) to construct the generator (encoder) and the discriminator.

Formally, for a given user feature \(\varvec{p_{u_{i}}}\), we apply the following procedure to obtain the initial parameters of the personalized bridge function:

$$\begin{aligned} \begin{aligned}&p_{G}\left( \theta \mid \varvec{p_{u_{i}}}\right) ={\text {MLP}}_{\text{ enc } }\left( \varvec{p_{u_{i}}}; \phi \right) , \end{aligned} \end{aligned}$$
(7)

where the generator is a two-layer feedforward network parameterized by \(\phi\).

The generator loss is as follows, where \({\mathbb {I}}(\cdot )\) denotes the indicator function:

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\text{ gen }}=-\sum _{i=1}^{N}{\mathbb {I}}\left( \theta =u_{i}^{t}\right) \log \left( p_{G}(\theta )\right) . \end{aligned} \end{aligned}$$
(8)

To make the generated personalized parameters applicable to various tasks, we let the discriminator identify \(\theta\) as false parameters to deceive the generator, so that the generator learns features shared across multiple tasks during training. Through iterative adversarial training, the generator continuously improves and adapts to the specific needs of the current task. The bridge function can then transfer user preferences to the target domain more accurately when facing new tasks.

The discriminator is defined in Eq. 9:

$$\begin{aligned} \begin{aligned} p_{D}\left( \theta ,u_i^t\right) ={\text {MLP}}_{\text{ dis }}\left( \theta ; \varphi \right) , \end{aligned} \end{aligned}$$
(9)

where the discriminator is a two-layer feedforward network parameterized by \(\varphi\), and \(\theta\) is the parameter generated by the generator.

The purpose of the discriminator is to predict whether \(\theta\) is a ‘real’ or ‘generated’ parameter. Since the generated parameters are eventually used to transfer the user representation, we directly use the embedded representation of the user in the target domain as the real sample to train the discriminator. The discriminator is trained with the binary cross-entropy loss given in Eq. 10, where \({\mathbb {I}}(\cdot )\) denotes the indicator function:

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\text{ dis }}=\sum _{i=1}^{N}\Big [&-{\mathbb {I}}\left( \theta =u_{i}^{t}\right) \log \left( p_{D}\left( u_{i}^{t}\right) \right) \\&-{\mathbb {I}}\left( \theta \ne u_{i}^{t}\right) \log \left( 1-p_{D}(\theta )\right) \Big ]. \end{aligned} \end{aligned}$$
(10)

The adversarial loss is shown in Eq. 11.

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\text{ adv }}={\mathcal {L}}_{\text{ gen }}+{\mathcal {L}}_{\text{ dis }}. \end{aligned} \end{aligned}$$
(11)
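
The binary cross-entropy form of the discriminator loss can be checked with a small numerical sketch. It assumes that the discriminator has already produced probabilities for the real samples (target-domain user embeddings) and for the generated parameters; the function name and the clipping constant `eps` are illustrative additions, not part of the paper.

```python
import math

def discriminator_loss(real_scores, generated_scores, eps=1e-12):
    """Binary cross-entropy over discriminator outputs in [0, 1].

    real_scores:      p_D for real samples (label 1)
    generated_scores: p_D for generated parameters (label 0)
    eps clips probabilities to avoid log(0); it is a numerical safeguard only.
    """
    real_term = -sum(math.log(max(p, eps)) for p in real_scores)
    generated_term = -sum(math.log(max(1.0 - p, eps)) for p in generated_scores)
    return real_term + generated_term
```

A discriminator that outputs 0.5 everywhere pays \(\log 2\) per sample, while one that separates real from generated samples perfectly reaches a loss of zero.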

With the adversarial approach, the meta-generator and the meta-discriminator can contribute to each other and improve the generative power of the generator. After obtaining the personalized parameters for the adaptation task, a bridge function is used to transfer the user preferences in the source domain to the target domain. The bridge function can be defined with any structure; we adopt a multilayer perceptron (MLP) because it can learn more complex features, improve training speed and accuracy, and be fine-tuned on less data, so that it can perform better in a new domain. We migrate user preferences directly using the trained generator as the bridge function, with \(\theta\) serving as the parameters of the generator rather than as an input. The generated bridge function varies from user to user and depends on the user’s characteristics.

The users’ embedding representations in the source domain are sent to the bridge function to obtain the transformed user embedding representation. The transformed embedding representation is considered as the initial embedding of the user in the target domain.

The transformed personalized user embedding can be obtained through the bridge function:

$$\begin{aligned} \begin{aligned} {\hat{u}}_{i}^{t} ={\text {MLP}}_{\textrm{enc}}\left( u_{i}^{s}; \theta \right) , \end{aligned} \end{aligned}$$
(12)

where \(u_{i}^{s}\) denotes the embedding of user \(u_{i}\) in the source domain, \({\hat{u}}_{i}^{t}\) denotes the transformed user personalized embedding. Finally, \({\hat{u}}_{i}^{t}\) is used for prediction.
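
The idea of generating bridge parameters from user features and then applying them to the source embedding can be sketched as follows. This is a simplified illustration under explicit assumptions: the generated \(\theta\) is reshaped into a \(k\times k\) matrix and applied as a single linear map, the generator has one hidden layer with ReLU, and the weights are random and untrained. All function names are hypothetical.

```python
import random

def linear(W, b, v):
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def make_layer(rng, n_out, n_in, scale=0.1):
    """Randomly initialized (weights, bias) pair for one dense layer."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def generate_bridge(p_u, layer1, layer2, k):
    """Map the user feature p_u to personalized bridge parameters theta (k x k)."""
    hidden = [max(0.0, a) for a in linear(*layer1, p_u)]   # ReLU hidden layer
    flat = linear(*layer2, hidden)                         # k * k outputs
    return [flat[r * k:(r + 1) * k] for r in range(k)]

def transfer(theta, u_source):
    """Apply the personalized bridge to the source-domain embedding."""
    return [sum(t * a for t, a in zip(row, u_source)) for row in theta]
```

For an embedding dimension k, `layer1` would map k inputs to a hidden layer and `layer2` would map the hidden layer to k × k outputs, so that each user receives a different matrix and therefore a different mapping of the same source embedding space.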

Existing bridge-based methods [21, 23] directly utilize the transformed user embedding \({\hat{u}}_{i}^{t}\) to minimize the loss. However, because some cold-start users interact with only a limited number of items, the learned user embedding \({u}_{i}^{t}\) may be unreasonable and inaccurate, and such an embedding can negatively affect the model. Therefore, we utilize a task-oriented optimization approach to optimize the whole model.

To train the model, we use a task-oriented training procedure directly using the ratings of the final recommendation task as the optimization objective. The loss function can be formulated as follows:

$$\begin{aligned} {\mathcal {L}}_{\text {rec}}=\frac{1}{\left| R_{o}^{t}\right| } \sum _{r_{ij} \in R_{o}^{t}}\left( r_{ij}-{\hat{r}}_{ij}\right) ^{2}, \end{aligned}$$
(13)

where \(R^t_o=\left\{ r_{ij}\vert u_i\in U^o,v_j\in V^t \right\}\) denotes the interactions of overlapping users in the target domain, \(r_{ij}\) is the true rating of user i on item j, and \({\hat{r}} _{ij}\) is the predicted rating.

In the end, we combine the two loss functions by a linear interpolation to obtain the hybrid loss function:

$$\begin{aligned} {\mathcal {L}}=\alpha {\mathcal {L}}_{\text {rec}}+(1-\alpha ) {\mathcal {L}}_{\text {adv}}, \end{aligned}$$
(14)

where \(\alpha\) is a hyper-parameter to control the relative importance of each loss function.
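
A short numerical sketch shows how the two terms are combined. The function names are illustrative, the adversarial loss is passed in as a precomputed number, and the restriction of \(\alpha\) to the interval [0, 1] is an assumption implied by the interpolation form rather than stated in the text.

```python
def rec_loss(ratings, predictions):
    """Mean squared error between true and predicted ratings (Eq. 13)."""
    return sum((r - p) ** 2 for r, p in zip(ratings, predictions)) / len(ratings)

def hybrid_loss(l_rec, l_adv, alpha):
    """Linear interpolation of the two losses (Eq. 14)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * l_rec + (1 - alpha) * l_adv
```

Setting `alpha` to 1 recovers the pure recommendation objective, and smaller values give more weight to the adversarial term.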

4.4 Training

The meta-network should be optimized over a large number of training tasks. We build this concept into the model so that it reflects personalized user preferences from only a small number of interactions. Based on the user’s consumption history, the model constructs M (\(M>10\)) groups of training tasks. We randomly select 10 items in the sequence as the query set, and the rest form the support set. That is, in order to reflect the user’s interest, the model updates the parameters in the meta-adversarial network according to the user’s unique consumption history. In addition, unlike MAML [24], this paper extends the idea of matching networks [25] without limiting the length of the item consumption history, i.e., the length of the support set is not fixed.

Algorithm 1 Meta-adversarial networks (MAN) for parameter-generated meta-learning

We denote the parameters of the generator and discriminator as \(\phi\) and \(\varphi\), respectively. During each meta-iteration, a meta-batch is first sampled from the meta-training dataset. The model is then trained internally on each sampled task: the parameters of the generator and discriminator are updated locally by computing \({\mathcal {L}}\) and performing a gradient descent step on the support set as follows:

$$\begin{aligned} \begin{aligned}&\phi \longleftarrow \phi -\alpha \sum _{i\in B} \frac{\partial {\mathcal {L}}_{\text {adv}}}{\partial \phi }, \\&\varphi \longleftarrow \varphi -\beta \sum _{i\in B} \frac{\partial {\mathcal {L}}_{\text {adv}}}{\partial \varphi }, \end{aligned} \end{aligned}$$
(15)

where \(\alpha > 0\) and \(\beta > 0\) are the step sizes (learning rates) of the gradient descent. This local update can be considered a personalized iteration, which can be repeated several times. With the updated generator and discriminator, we then globally update the pre-trained model on the new interaction sequence based on \({\mathcal {L}}_{\text {rec}}\). The purpose of this process is to find the ideal parameters that yield good recommendation performance after several local updates for all users.

The meta-optimization is performed on the generator and discriminator parameters, i.e., \(\phi\) and \(\varphi\), while the goal is to use the updated generator to generate the personalized parameters \(\theta\) used for migrating preferences. In fact, the purpose of the meta-phase is to optimize the parameters of the task-oriented meta-adversarial network so that one or a small number of gradient steps simulating a cold-start user will yield the most effective behavior on a real-world cold-start user. Finally, we obtain the overall training algorithm for the model, i.e., Algorithm 1, which updates the meta-parameters by mini-batch stochastic gradient descent.

5 Experiments

This section evaluates the proposed framework for solving cold-start user problems under different tasks. Firstly, the experimental setup and baselines are introduced. Then, extensive experiments are conducted on the Amazon dataset.

5.1 Experimental Setup

5.1.1 Datasets

Table 1 Cross-domain task information

The Amazon review dataset [26] is one of the most widely used public datasets for e-commerce recommendations. This paper uses the Amazon 5-core dataset, in which each user and each item has at least five ratings. The dataset contains 24 different item domains. Three popular categories are chosen for this paper: movies_and_tv (movies), cds_and_vinyl (music) and books (books). Three CDR tasks are defined: task 1: movies\(\rightarrow\)music, task 2: books\(\rightarrow\)movies and task 3: books\(\rightarrow\)music. As shown in Table 1, the number of ratings in the source domain is significantly larger than the number of ratings in the target domain. Unlike many existing works that select only a portion of the dataset for evaluation, this paper uses all the data directly to simulate real-world applications.

5.1.2 Evaluation Metrics

First, to measure the rating-prediction (regression) ability of the model, we select MAE and RMSE as evaluation metrics. Then, to verify the ranking ability of the model, we select AUC and NDCG@10 as evaluation metrics. These metrics are widely used in recommender systems to evaluate model performance.
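
For reference, the four metrics can be computed as in the sketch below. The paper does not state which NDCG variant is used; this sketch assumes the common linear-gain form with a \(\log _2\) discount, and a pairwise definition of AUC in which ties count as one half. The function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for relevance scores listed in the order the model ranked the items."""
    def dcg(rels):
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

def auc(labels, scores):
    """Probability that a positive item is scored above a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Lower values are better for MAE and RMSE, while higher values are better for AUC and NDCG@10.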

5.1.3 Baseline Models

The baselines can be divided into two groups: single domain and cross domain. In the first group, the source and target domains are considered as single domains respectively, and the popular matrix factorization (MF) method is utilized. The second group includes state-of-the-art CDR methods for cold-start users. As the proposed model belongs to the bridge-based CDR methods, this paper focuses on comparing the proposed model with the bridge-based methods. Therefore, the following methods are chosen as the baselines for comparison.

Single domain:

  • TGT: TGT [27] is a MF model, trained using only target domain data.

  • CMF: CMF [28] is an extension of MF. In CMF, the user’s embedding vector can be shared across source and target domains.

Cross-domain:

  • SSCDR: SSCDR [8] is a method based on semi-supervised bridge.

  • DCDCSR: DCDCSR [21] is a bridge-based approach that considers the sparsity of individual users’ ratings in different domains.

  • EMCDR: EMCDR [6] is a commonly used cold-start CDR method. MF is first used to learn the embeddings, and a mapping network is then used to map the user embedding from the source domain to the target domain.

  • RecGURU: RecGURU [29] learns users’ long short-term preference through adversarial training, achieving information sharing and cross-domain collaboration in user representations.

  • ELECRec: ELECRec [14] trains a generator as an auxiliary model alongside a discriminator. The generator samples plausible alternative items, and the trained discriminator is used as the final recommendation model.

  • PTUPCDR: PTUPCDR [7] is a bridge-based cold-start CDR approach. It feeds user feature embeddings into a meta-network to generate a personalized bridge function, which enables personalized preference transfer for each user.

5.1.4 Implementation Details

The proposed framework is implemented in PyTorch. For each task and method, the initial learning rate of the Adam [30] optimizer is tuned by a grid search over {0.001, 0.005, 0.01, 0.02, 0.1}. The dimensionality of the embeddings is set to 10. For all methods, the mini-batch size is set to 512. To make the comparison fair, EMCDR, DCDCSR, SSCDR, PTUPCDR and MAFCDR all use the same fully connected layer as the mapping function; for MAFCDR, the parameters of this mapping function are generated by a meta-network. The meta-network is a two-layer linear model with \(2\times k\) hidden units, where k denotes the embedding dimension, and the output dimension of the meta-network is \(k\times k\).
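The shapes involved in the meta-network can be illustrated with a small sketch. It is written in plain Python rather than PyTorch so that it runs without dependencies, and it is not the authors' implementation. Only the layer sizes (k, 2k hidden units, k × k outputs) come from the text; the ReLU between the two layers, the k-dimensional input, the weight initialization, and all function names are assumptions made for illustration.

```python
import random

K = 10  # embedding dimension stated in the implementation details

def init_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(x, layer):
    """Apply a dense layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    # Assumed non-linearity; the text only says "two-layer linear model".
    return [max(0.0, v) for v in x]

def meta_network(user_feature, layer1, layer2, k=K):
    """Map a k-dim user feature to a k x k personalized bridge matrix."""
    hidden = relu(linear(user_feature, layer1))  # 2k hidden units
    flat = linear(hidden, layer2)                # k * k outputs
    return [flat[r * k:(r + 1) * k] for r in range(k)]

def apply_bridge(bridge, source_embedding):
    """Transform a source-domain user embedding with the generated bridge."""
    return [sum(b * s for b, s in zip(row, source_embedding)) for row in bridge]

rng = random.Random(0)
layer1 = init_linear(K, 2 * K, rng)
layer2 = init_linear(2 * K, K * K, rng)
user_feature = [rng.gauss(0, 1) for _ in range(K)]
source_embedding = [rng.gauss(0, 1) for _ in range(K)]
bridge = meta_network(user_feature, layer1, layer2)
target_init = apply_bridge(bridge, source_embedding)
```

With k = 10, the meta-network emits 100 values per user, which are reshaped into a 10 × 10 matrix that acts as that user's mapping function.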

To evaluate the performance of the proposed model, a portion of the overlapping users is removed from the target domain and used as test users, while the remaining overlapping users are used to train the meta-learner. In the experiments, the proportion of test (cold-start) users \(\beta\) is set to 20%, 50% and 80% of the total overlapping users. Overlapping users whose item consumption histories contain between 13 and 100 items are selected for the training data. For each overlapping user in the training data, 10 items drawn at random from the interaction sequence form the query set and the remaining items form the support set, so the support set contains between 3 and 90 items. The model performs well even though the length of the support set is not fixed.
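The split described above can be sketched as follows. This is a minimal illustration under the stated settings (\(\beta\), history lengths 13 to 100, 10 query items), not the authors' preprocessing code; the function names, the seeding, and the rounding of the test-set size are assumptions.

```python
import random

def split_overlapping_users(user_ids, beta, seed=0):
    """Hold out a fraction beta of overlapping users as cold-start test users."""
    rng = random.Random(seed)
    users = list(user_ids)
    rng.shuffle(users)
    n_test = int(round(beta * len(users)))
    return users[n_test:], users[:n_test]  # (training users, test users)

def support_query_split(history, n_query=10, min_len=13, max_len=100, seed=0):
    """Split one user's history into (support, query).

    Returns None when the history length falls outside [min_len, max_len],
    mirroring the user filter described in the text.
    """
    if not (min_len <= len(history) <= max_len):
        return None
    rng = random.Random(seed)
    query_idx = set(rng.sample(range(len(history)), n_query))
    support = [item for i, item in enumerate(history) if i not in query_idx]
    query = [item for i, item in enumerate(history) if i in query_idx]
    return support, query
```

Because 10 items always go to the query set, a history of 13 items leaves a support set of 3 and a history of 100 items leaves a support set of 90, which matches the range given in the text.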

5.2 Comparative Experiments

Table 2 Regression performance comparison of different models on 3 cross-domain tasks
Table 3 Ranking performance comparison of different models on 3 cross-domain tasks

Tables 2 and 3 show the performance of MAFCDR on the three cross-domain recommendation tasks. For each task, we report the average results of five random runs. The best performance is shown in bold. \(*\) indicates statistical significance at the 0.05 level under a paired t test of MAFCDR against the best baseline. The Improve column indicates the improvement relative to the best baseline. The following observations can be made from the experimental results.

  • TGT is a single-domain model that uses only data from the target domain and its performance is not satisfactory. Compared to TGT, all other cross-domain methods can utilize data from the source domain, resulting in better results. Therefore, utilizing data from source domains is an effective way to alleviate data sparsity and can improve the performance of target domain recommendations.

  • CMF uses auxiliary data by combining data from different domains into a single domain, whereas the CDR approaches are specifically designed for cross-domain transfer. The CDR methods outperform CMF on most tasks because CMF ignores potential domain shifts by treating the data from both domains as identical. In contrast, the bridge function can transform the source embedding into the target feature space, effectively alleviating the effect of domain shifts. It is therefore essential to investigate CDR methods that make more effective use of auxiliary domains.

  • By observing the results of the t test with a 95% confidence level, it can be seen that MAFCDR significantly outperforms the best baseline in most cases, indicating that MAFCDR is an effective solution for cold-start recommendations.
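The significance marker and the Improve column reported in Tables 2 and 3 can be sketched as below. This is an illustrative reconstruction, not the authors' analysis script: the function names are hypothetical, the relative-improvement formula is an assumed reading of "improvement relative to the best baseline", and the numbers in the usage lines are made up for demonstration and are not results from the paper. With five paired runs there are four degrees of freedom, for which the two-sided critical value at the 0.05 level is about 2.776.

```python
import math

T_CRIT_DF4_TWO_SIDED_05 = 2.776  # critical t value for 5 paired runs (df = 4)

def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean / math.sqrt(var / n)

def relative_improvement(model, best_baseline, lower_is_better=True):
    """Improvement over the best baseline, in percent (assumed definition)."""
    if lower_is_better:  # e.g. MAE, RMSE
        return (best_baseline - model) / best_baseline * 100
    return (model - best_baseline) / best_baseline * 100  # e.g. AUC, NDCG@10

# Made-up MAE values over five runs, for illustration only.
baseline_mae = [1.00, 1.02, 0.98, 1.01, 0.99]
model_mae = [0.90, 0.93, 0.89, 0.90, 0.91]
t_stat = paired_t_statistic(model_mae, baseline_mae)
significant = abs(t_stat) > T_CRIT_DF4_TWO_SIDED_05
```

A result would receive the \(*\) marker when `significant` is true for the corresponding task and metric.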

5.3 Ablation Experiments

Table 4 Ablation experiments on three cross-domain tasks

The ablation experiments further explore the impact of the various components of the proposed MAFCDR model on performance. Specifically, the following models will be evaluated.

  • -Mulatt: The multi-level feature attention structure in the model is replaced with a self-attention mechanism, which takes the long-term sequential user features as input.

  • -GAN: The GAN is removed from the model and is replaced with a two-layer linear network as a meta-network.

  • -MAN: The meta-network is removed from the model, and we transfer the user preferences learned through the multi-level attention structure to the target domain through simple matrix multiplication.

  • -TOO: We replace the task-oriented optimization loss with a mapping-oriented optimization process, which minimizes the distance between the transformed user embedding \({\hat{u}}_i^t\) and the target embedding \(u_i^t\).

Table 4 shows the results of the ablation experiments for these variants on the three cross-domain recommendation tasks. Overall recommendation performance changes when individual sub-modules or features are removed from the complete model. This indicates the effectiveness of each module for cold-start cross-domain recommendation.
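The difference between the two objectives contrasted in the -TOO variant can be sketched as follows. This is an illustration under assumptions, not the paper's loss definitions: squared Euclidean distance is assumed for the mapping-oriented objective, and dot-product scoring with a mean squared rating error is assumed for the task-oriented objective. All function names are hypothetical.

```python
def mapping_oriented_loss(transformed_user, target_user):
    """Squared distance between the transformed and the target user embedding."""
    return sum((a - b) ** 2 for a, b in zip(transformed_user, target_user))

def task_oriented_loss(transformed_user, item_embeddings, ratings):
    """Mean squared rating error when scoring items with the transformed embedding."""
    errors = []
    for item, rating in zip(item_embeddings, ratings):
        predicted = sum(u * v for u, v in zip(transformed_user, item))
        errors.append((rating - predicted) ** 2)
    return sum(errors) / len(errors)
```

The mapping-oriented form needs a target-domain embedding for the user as supervision, whereas the task-oriented form is supervised directly by the observed ratings.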

5.4 Parameter Experiments

Fig. 3 Performance of our model when varying the number of local updates on Task 1

We explore the impact of the number of local updates on Task 1. Figure 3 shows the performance of our method on two metrics as the number of personalized (local) update iterations varies. Even with few local updates, the model achieves significant improvements on both metrics: after a single local update, the MAE and RMSE values are already significantly lower. Beyond the first iteration, increasing the number of local updates changes the results only slightly. This contrasts with MAML [31], whose performance improves as the number of iterations increases. Our model can therefore adapt quickly to a user, since a single local update is sufficient. Fast adaptation allows the proposed method to be applied to online recommendation based on user ratings.
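To make the notion of a local update concrete, the sketch below runs a generic MAML-style inner loop that adapts one user embedding on that user's support ratings by gradient descent. It is not the paper's training procedure: the dot-product scorer, the squared-error loss, the learning rate, the toy data, and the function names are all assumptions chosen for illustration.

```python
def support_loss(user, items, ratings):
    """Mean squared error of dot-product predictions on the support set."""
    total = 0.0
    for item, rating in zip(items, ratings):
        predicted = sum(u * v for u, v in zip(user, item))
        total += (predicted - rating) ** 2
    return total / len(ratings)

def local_update(user, items, ratings, lr=0.05, steps=1):
    """Run `steps` gradient-descent updates of the user embedding."""
    user = list(user)
    n = len(ratings)
    for _ in range(steps):
        grad = [0.0] * len(user)
        for item, rating in zip(items, ratings):
            err = sum(u * v for u, v in zip(user, item)) - rating
            for d, v in enumerate(item):
                grad[d] += 2 * err * v / n  # d(MSE)/d(user[d])
        user = [u - lr * g for u, g in zip(user, grad)]
    return user

# Toy support set (illustrative values, not data from the paper).
items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ratings = [4.0, 2.0, 6.0]
initial_user = [0.0, 0.0]
loss_before = support_loss(initial_user, items, ratings)
loss_after_one = support_loss(local_update(initial_user, items, ratings, steps=1), items, ratings)
loss_after_five = support_loss(local_update(initial_user, items, ratings, steps=5), items, ratings)
```

In this toy setting the loss keeps falling with more steps because the starting point is uninformative; the paper's observation is that, with a learned initialization, most of the gain is already obtained after the first update.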

5.5 Generalization Experiments

Fig. 4 Generalization experiment on three base models: (a) MF, (b) GMF, and (c) YouTube DNN. The average results of five runs are shown in the figure

The comparative experiments above mainly use MF as the base model. MF is a non-neural model, and matrix factorization is one of the most effective methods in recommendation. To demonstrate the compatibility of bridge-based methods with other base models, EMCDR, PTUPCDR and MAFCDR are also applied to two more complex neural models: GMF [32] and YouTube DNN [33]. GMF assigns different weights to different dimensions in the dot product prediction function, which can be seen as a generalization of ordinary MF. YouTube DNN is a two-tower model. For GMF, the parameters trained by meta-learning can directly transfer the user embedding to the target domain. For YouTube DNN, the bridge function transforms the output of the user tower. Generalization experiments are thus conducted on the non-neural model (MF) and the neural models (GMF, YouTube DNN). From the results shown in Fig. 4, the following conclusions can be drawn:

  • The bridge-based CDR approaches can be applied to a variety of baseline models. For different baseline models, EMCDR, PTUPCDR and MAFCDR are effective in improving the performance of recommendations for cold-start users in the target domain. As GMF and YouTube DNN are two popular and well-designed models in large-scale real-world recommendations, they achieve better performance than that of ordinary MF.

  • The generalized MAFCDR can achieve satisfactory performance. On the one hand, with various base models, generalized MAFCDR can consistently achieve better results. On the other hand, the cold-start problem is highly challenging and the results of MAE are sufficient to demonstrate the effectiveness of the generalized MAFCDR in cold-start scenarios.
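The relationship between MF and GMF described in this section, where GMF weights each dimension of the dot product, can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the weights would be learned in practice, and any output activation that a full GMF implementation may apply is omitted.

```python
def mf_score(user, item):
    """Ordinary MF prediction: plain dot product of user and item embeddings."""
    return sum(u * v for u, v in zip(user, item))

def gmf_score(user, item, weights):
    """GMF-style prediction: dot product with one weight per dimension."""
    return sum(w * u * v for w, u, v in zip(weights, user, item))
```

With all weights equal to 1, `gmf_score` reduces to `mf_score`, which is the sense in which GMF generalizes ordinary MF. In either case, a bridge-based method only changes which user embedding is passed in for a cold-start user.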

6 Conclusion

To better transfer user preferences from the source domain to the target domain, we proposed to train a meta-learning parameter for each user using a meta-adversarial framework. A meta-generator containing user feature embeddings was learnt to obtain personalized parameters that vary from user to user, and a bridge function was used to initialize the user embeddings to enable personalized transfer of user preferences. Extensive experiments were conducted on real datasets to evaluate the proposed model, and the results validate the effectiveness of the proposed model for cold-start cross-domain recommendation. In the future, we plan to integrate more content information into the framework to further alleviate the cold-start problem.