Introduction

Learning embeddings with excellent expressive ability is the core of recommendation. Early studies [1,2,3] typically projected user and item IDs directly into embedding vectors, following the idea of word embedding [4, 5]. Subsequent studies [6,7,8] attempted to represent a user's interests through the items the user has interacted with, which can be viewed as an early use of the graph neural network (GNN) idea, because the set of items a user has interacted with forms the user's first-order neighbors in the user–item bipartite graph. Based on this principle, it is easy to develop embedding models that exploit higher order information. For example, a user's second-order neighbors are other users who have interacted with the same items, and third-order neighbors are the items clicked by those second-order neighbors [9]. GNN, which aggregates multihop neighbors into node representations through repeated graph convolution operations, has been shown to capture the higher order connectivity implicit in the user–item bipartite graph and to improve the performance of collaborative filtering (CF) [10,11,12]. Despite these studies demonstrating the effectiveness of graph neural networks in CF, several issues still limit the performance of graph collaborative filtering (GCF):

  • Sparse interaction matrix: User–item interaction bipartite graphs are usually highly sparse, whereas deep neural networks require high-quality training data; this mismatch typically leads to poor robustness of deep learning models [13,14,15]. A model with poor robustness becomes less effective when the training set contains complex noise, outlier intrusion, and class imbalance [16]. In CF, this weakness manifests as vulnerability to noise during training, which degrades the quality of the learned representations.

  • Over-smoothing: Graph convolution is essentially the aggregation of neighbor information; repeated aggregation therefore easily produces similar high-order embedding representations, which weakens the expressiveness of the embeddings and limits the depth of GNN models [17].

Contrastive learning (CL) has been introduced into GCF and has produced a series of results addressing these issues. However, the graph augmentation schemes commonly used in CL, despite their effectiveness, may hinder further performance improvements. For example, SGL [18], a representative method that combines CL with GCF, must corrupt the graph structure to generate CL sample pairs, which can discard important edge information and thus harm the representation ability of the embeddings.

Specifically, we first add noise to the embeddings by simply inserting a dropout layer into the graph convolutional network and obtain the augmented view pairs through two forward computations. Dropout [19] is a widely used regularization technique that prevents overfitting by randomly removing a certain percentage of hidden units from the neural network during training, thereby enhancing robustness. Because the neurons deactivated by dropout differ each time, the noise added to the embeddings in the two forward computations also differs. The dropout-based CL scheme avoids the loss of graph structure information caused by graph perturbation, and the simple, elegant design of merely adding dropout layers to the original network to obtain augmented samples can be easily applied to commonly used CF models. Importantly, by constraining the similarity of the contrastive samples obtained from the two forward computations, the error introduced by the random noise can be reduced and the model's resistance to perturbations, i.e., its robustness, can be enhanced. The study [13] indicated that improved robustness helps a model perform better on data with heavier noise disturbances. Then, to reduce the interference of the over-smoothing of high-order embeddings on performance, we design a cross-layer connected embedding approach inspired by residual networks. The method computes higher order embeddings by connecting the original embedding across layers into the current layer's computation, thus providing more usable information to mitigate the smoothing phenomenon. The cross-layer connected graph computation constrains the higher level embeddings and effectively reduces the occurrence of over-smoothing.

It is worth noting that both of our proposed improvements are model-independent; we therefore choose LightGCN as the backbone to validate the effectiveness of the proposed method. SimDCL is an end-to-end model with a simple but effective structure. Experimental studies on five benchmark datasets demonstrate the effectiveness of SimDCL. We summarize the contributions of this paper as follows:

  • We propose SimDCL, a CL-based GCF model, which improves CF through dropout-based CL and cross-layer connected graph embedding.

  • Our proposed method generates random noise through simple dropout alone, preserving the integrity of the user–item bipartite graph, to obtain augmented samples that assist the learning of the recommendation task.

  • Extensive experiments on five datasets demonstrate that our approach consistently outperforms many competing baselines, including GNN-based and CL-based methods.

Preliminary

Graph collaborative filtering

CF is the fundamental paradigm of recommendation; its goal is to predict a user's interest in an item from the given implicit feedback. Specifically, given the set of users \(U\) and the set of items \(I\), we obtain the interaction matrix \(Y\in {\{0,1\}}^{\left|U\right|\times \left|I\right|}\), where \({Y}_{u,i}=1\) indicates an interaction between user \(u\) and item \(i\) and \({Y}_{u,i}=0\) indicates no interaction.

In the GCF model, the interaction matrix is organized as a user–item bipartite graph, denoted by \(G=(V,E)\), where \(V=U\cup I\) is the node set, containing all users and items, and \(E= \left\{(u,i)\mid u\in U, i\in I, {Y}_{u,i}=1\right\}\) is the edge set, representing the interactions between users and items.

Graph convolutional network (GCN) [20] is the basis of GCF; it aggregates higher order connectivity information into the representation of a target node by iteratively performing multiple graph convolution operations. We present the graph convolution paradigm of LightGCN [11], the backbone most commonly used in GCF research [18, 21, 22]. Taking user \(u\) as an example, the core computation in LightGCN is defined as follows:

$${\mathbf{z}}_{u}^{(l)}={f}_{\text{combine }}\left({\mathbf{z}}_{u}^{(l-1)},{f}_{\text{aggregate }}\left(\left\{{\mathbf{z}}_{i}^{(l-1)}\mid i\in {\mathcal{N}}_{u}\right\}\right)\right),$$
(1)

where \({\mathbf{z}}_{u}^{(l)}\) denotes the embedding of user \(u\) at layer \(l\), \({\mathbf{z}}_{u}^{(l-1)}\) denotes the user embedding at the previous layer \(l-1\), and \({\mathbf{z}}_{i}^{(l-1)}\) is the layer-\((l-1)\) embedding of user \(u\)'s neighbor \(i\). \({\mathbf{z}}_{u}^{(0)}\) denotes the user's own node representation, i.e., the ID embedding. \({f}_{\text{aggregate}}(\cdot )\) is the aggregation function applied to the layer-\((l-1)\) embeddings of user \(u\)'s neighbors, and \({f}_{\text{combine}}(\cdot )\) is the combine function, which combines the layer-\((l-1)\) embedding with the aggregated neighbor information to obtain the layer-\(l\) embedding. Notably, although nonlinear activation and feature transformation are commonly used in neural networks [23], LightGCN demonstrates that removing them is highly beneficial to the performance of graph algorithms in CF. After obtaining the embeddings of all layers, which contain information of different orders, they are aggregated into the final embedding:

$${z}_{u} = {f}_{readout}\left({z}_{u}^{\left(0\right)},{z}_{u}^{\left(1\right)}, \dots ,{z}_{u}^{\left(l\right)}\right),$$
(2)

where \({f}_{readout}\left(\cdot \right)\) denotes the readout function that produces the final embedding, typically a weighted sum. The probability of an interaction between user \(u\) and item \(i\) is then obtained as the inner product of the user and item embeddings:

$${\widehat{y}}_{u,i}={z}_{u}^{T}{z}_{i},$$
(3)
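To make the propagation concrete, the following is a minimal PyTorch-style sketch of Eqs. (1)–(3), assuming a pre-computed, symmetrically normalized adjacency matrix `norm_adj` over the concatenated user/item nodes and a uniform weighted-sum readout; the names and the default layer count are illustrative rather than the exact implementation.

```python
import torch

def lightgcn_propagate(ego_emb: torch.Tensor,
                       norm_adj: torch.Tensor,
                       n_layers: int = 3) -> torch.Tensor:
    """ego_emb: concatenated user/item ID embeddings, shape (|U|+|I|, d)."""
    layer_embs = [ego_emb]                    # z^(0)
    z = ego_emb
    for _ in range(n_layers):
        z = torch.sparse.mm(norm_adj, z)      # aggregate + combine (Eq. 1), no nonlinearity
        layer_embs.append(z)
    # readout (Eq. 2): uniform weighted sum over all layers
    return torch.stack(layer_embs, dim=0).mean(dim=0)

def score(user_emb: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    # Eq. (3): inner product between user and item embeddings
    return (user_emb * item_emb).sum(dim=-1)
```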

Contrastive learning

CL is a discriminative representation learning framework based on the idea of contrast. It treats each instance (user or item) as its own category, generates multiple views by imposing transformations, and then pulls the views of the same instance as close as possible in the embedding space while pushing those of different instances apart [24]. CL has been widely used in fields such as CV and NLP owing to its excellent performance [25,26,27,28]. Its goal is to learn an encoder that encodes views of the same instance similarly while making the encodings of different instances as distinct as possible.

Studies that combine recommendation with CL typically design a pretext task and then use a data augmentation strategy to generate contrastive view pairs for each sample. By maximizing the consistency of the two views, training yields an embedding distribution that is as uniform as possible without model collapse. A classic study that applies CL to recommendation is SGL. SGL designs three graph augmentation strategies that perturb the graph structure, among which edge dropout achieves the best performance. Edge dropout generates two subgraphs by discarding connections in the graph, and the loss is then calculated with the InfoNCE function:

$${\mathcal{L}}_{cl}=\sum_{i\in \mathcal{B}} -\log\frac{\exp\left(s\left({\mathbf{z}}_{i}^{\prime},{\mathbf{z}}_{i}^{\prime\prime}\right)/\tau \right)}{\sum_{j\in \mathcal{B}} \exp\left(s\left({\mathbf{z}}_{i}^{\prime},{\mathbf{z}}_{j}^{\prime\prime}\right)/\tau \right)},$$
(4)

where \(\mathbf{z}^{\prime}\) and \(\mathbf{z}^{\prime\prime}\) represent the embedding representations learned from the two views, \(s(\cdot )\) represents the similarity function, typically cosine similarity, and \(\tau \) represents the temperature parameter of the softmax.
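A minimal sketch of this InfoNCE objective, assuming cosine similarity implemented via normalized dot products and in-batch negatives; the tensor names and the default temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """z1, z2: two views of the same batch of nodes, shape (B, d)."""
    z1 = F.normalize(z1, dim=-1)             # cosine similarity via normalized dot products
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau               # (B, B); diagonal entries are positive pairs
    labels = torch.arange(z1.size(0), device=z1.device)
    # -log softmax over in-batch negatives, averaged over the batch
    return F.cross_entropy(logits, labels)
```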

Existing model

We divide the existing GCF approaches into two classes, both of which aim to propagate higher order connectivity through graph convolution to obtain better user and item representations. These studies are summarized in Table 1.

Table 1 Related models

The first category focuses on improving the model structure and computation. GC–MC [29] simply uses the first-order neighbors of users and items to represent them and is one of the earliest attempts to apply graph networks to CF. NGCF [10] extends graph convolution from first-order to higher order neighbors and uses a residual strategy to aggregate the embeddings of different layers. LightGCN [11] removes from NGCF the nonlinear activation and feature transformation commonly used in deep neural networks, simplifying the GCN structure while improving recommendation performance. PinSage [30] designs a random walk-based sampling method that samples fixed-size neighborhoods, gaining efficiency at the expense of losing part of the original graph structure. HMLET [17] observes that the predominance of linear and nonlinear features varies across datasets and thus designs a gated network that selects linear or nonlinear propagation in each convolution. SVD-GCN [31] combines singular value decomposition with graph neural networks and proposes to use the top singular vectors for recommendation, which significantly improves computational efficiency.

The second class of models focuses on using CL to assist training, enhancing the robustness of the model to obtain improved embedding representations. SGL [18] proposes three types of graph augmentation to generate view pairs: edge dropout, node dropout, and random walk. SGL improves model robustness, and thus performance, by maximizing the consistency between view pairs. DCL [32] performs random edge dropout on the \(l\)-hop ego-network of each node to generate two views as positive pairs, which are then optimized with a debiased contrastive loss. PCRec [33] uses CL pre-training in cross-domain recommendation: it first generates augmented subgraphs by random walk, pre-trains encoders on the subgraphs, and finally fine-tunes them on the interaction data. HGCL [34] constructs homogeneous graphs of user and item nodes and then maximizes the mutual information between each homogeneous graph and the global representation of the entire graph. NCL [21] abandons graph augmentation and applies CL to the consistency of embedding features; it designs two CL losses that maximize the similarity between even-layer and central embeddings and between users (items) and their semantic neighbors. SimGCL [35] adds random uniform noise to the hidden representation to obtain augmented data and uses fine-grained regulation of representation uniformity to alleviate the popularity bias problem.

Methodology

As shown in Fig. 1, the proposed SimDCL superimposes multiple dropout layers in the forward computation to generate augmented data and mitigates over-smoothing through cross-layer connectivity. This section first describes the graph convolution with cross-layer connectivity and then provides a detailed description of the dropout-based CL method. Finally, the time complexity of the model is analyzed.

Fig. 1 The model structure of SimDCL

Cross-layer connected graph convolutional network

We first introduce the SimDCL improvement to the way graph convolution is computed. Based on the idea of residual networks, we compute the layer-\(l\) embedding by connecting the original embedding across layers into the layer-\((l-1)\) computation, which alleviates the over-smoothing problem to a certain extent. We name the method cross-layer connected GCN (CLC-GCN), and it is calculated as follows:

$${\mathbf{z}}_{u}^{(l)}={f}_{\text{combine }}\left({f}_{concat}({\mathbf{z}}_{u}^{\left(l-1\right)},{\mathbf{z}}_{u}^{\left(0\right)}),{f}_{\text{aggregate }}\left(\left\{{f}_{concat}({\mathbf{z}}_{i}^{\left(l-1\right)},{\mathbf{z}}_{i}^{\left(0\right)})\mid i\in {\mathcal{N}}_{u}\right\}\right)\right),$$
(5)

where \({f}_{concat}(\cdot )\) denotes the cross-layer connection function, implemented here as a mean operation. Our analysis shows that the smoothing phenomenon typically appears from the second graph convolution onward, so we apply the cross-layer connection in the embedding calculation for layers \(l>1\). As in LightGCN, after performing \(L\) layers of propagation, CLC-GCN obtains the final representations of items and users by a weighted sum of the resulting embeddings and then computes the possibility of an interaction between a user and an item via the inner product.
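A minimal sketch of the CLC-GCN propagation in Eq. (5), assuming the same normalized adjacency matrix as in the LightGCN sketch above and a LightGCN-style matrix form in which the combine step reduces to the normalized neighborhood aggregation; the mean-based cross-layer connection for layers \(l>1\) follows the text, and all names are illustrative.

```python
import torch

def clc_gcn_propagate(ego_emb: torch.Tensor,
                      norm_adj: torch.Tensor,
                      n_layers: int = 3) -> torch.Tensor:
    layer_embs = [ego_emb]
    z = ego_emb
    for layer in range(1, n_layers + 1):
        if layer > 1:
            # cross-layer connection: mean of the previous layer and the ID embedding z^(0)
            z = (z + ego_emb) / 2.0
        z = torch.sparse.mm(norm_adj, z)   # LightGCN-style aggregation, no nonlinearity
        layer_embs.append(z)
    # weighted-sum readout over all layers, as in LightGCN
    return torch.stack(layer_embs, dim=0).mean(dim=0)
```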

In the training phase, the recommendation loss is calculated using BPR. The BPR loss encourages the prediction score of an item the user has interacted with to be higher than that of an item without interaction:

$${\mathcal{L}}_{bpr}=\sum_{u,i,j\in Y}-\mathrm{log}\left(\sigma \left({\widehat{y}}_{u,i}- {\widehat{y}}_{u,j}\right)\right),$$
(6)
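A one-function sketch of this pairwise objective, where `pos_scores` and `neg_scores` are assumed to hold the inner-product scores \(\widehat{y}_{u,i}\) and \(\widehat{y}_{u,j}\) for sampled (user, positive item, negative item) triples:

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # Eq. (6): -log(sigmoid(y_ui - y_uj)), averaged over the sampled triples
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```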

Dropout-based contrastive learning

We propose a simple CL method based on the dropout technique, motivated by the studies [36, 37]. Dropout is a technique for suppressing overfitting in deep learning; it prevents co-adaptation and performs implicit ensembling by removing a certain percentage of hidden units from the neural network during training. Because the excluded neurons are chosen at random in each pass, dropout effectively injects noise into the computation. The CL scheme of SimDCL is built on this randomness.

Specifically, SimDCL performs two forward computations, and the embeddings obtained at each layer of each forward computation are fed into a dropout layer to inject random noise. Owing to the randomness of dropout, the deactivated neurons differ between the two passes; the two forward computations therefore produce a pair of embedding representations carrying different noise. We then apply CL to the two views, treating the embeddings of the same user (item) in the two views as a positive pair and those of other users (items) as negative samples. Based on InfoNCE, the proposed dropout-based CL framework minimizes the distance between positive pairs as follows:

$${\mathcal{L}}_{ssl}^{user}=\sum_{u\in \mathcal{U}} -\log\frac{\exp\left(s\left({\mathbf{z}}_{u}^{\prime},{\mathbf{z}}_{u}^{\prime\prime}\right)/\tau \right)}{\sum_{v\in \mathcal{U}} \exp\left(s\left({\mathbf{z}}_{u}^{\prime},{\mathbf{z}}_{v}^{\prime\prime}\right)/\tau \right)},$$
(7)
$${\mathcal{L}}_{ssl}^{item}=\sum_{i\in I} -\log\frac{\exp\left(s\left({\mathbf{z}}_{i}^{\prime},{\mathbf{z}}_{i}^{\prime\prime}\right)/\tau \right)}{\sum_{j\in I} \exp\left(s\left({\mathbf{z}}_{i}^{\prime},{\mathbf{z}}_{j}^{\prime\prime}\right)/\tau \right)},$$
(8)

where \({\mathcal{L}}_{ssl}^{user}\) and \({\mathcal{L}}_{ssl}^{item}\) denote the user and item CL losses, respectively. \({\mathbf{z}}_{u}^{\prime}\) and \({\mathbf{z}}_{u}^{\prime\prime}\) denote the two final embeddings of the same user with different noise perturbations obtained from the two forward computations, and \({\mathbf{z}}_{v}^{\prime\prime}\) denotes the final embedding of another user. The loss of the CL task is the sum of the user CL loss and the item CL loss:

$${\mathcal{L}}_{ssl}= {\mathcal{L}}_{ssl}^{user}+ {\mathcal{L}}_{ssl}^{item},$$
(9)
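Putting Eqs. (7)–(9) together, the following sketch illustrates how two forward passes with independent dropout masks yield the two views and the CL loss. It reuses the `info_nce` helper sketched in "Contrastive learning", and names such as `norm_adj`, `n_users`, and `dropout_p` are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def noisy_propagate(ego_emb, norm_adj, n_layers=3, dropout_p=0.1):
    # CLC-GCN propagation with a dropout layer applied to each layer's embedding,
    # so every call injects a fresh random noise pattern.
    layer_embs = [ego_emb]
    z = ego_emb
    for layer in range(1, n_layers + 1):
        if layer > 1:
            z = (z + ego_emb) / 2.0                    # cross-layer connection (Eq. 5)
        z = torch.sparse.mm(norm_adj, z)               # LightGCN-style aggregation
        z = F.dropout(z, p=dropout_p, training=True)   # random noise from dropout
        layer_embs.append(z)
    return torch.stack(layer_embs, dim=0).mean(dim=0)  # weighted-sum readout

def dropout_cl_loss(ego_emb, norm_adj, n_users, tau=0.2, dropout_p=0.1):
    z1 = noisy_propagate(ego_emb, norm_adj, dropout_p=dropout_p)  # first view z'
    z2 = noisy_propagate(ego_emb, norm_adj, dropout_p=dropout_p)  # second view z''
    user_loss = info_nce(z1[:n_users], z2[:n_users], tau)         # Eq. (7)
    item_loss = info_nce(z1[n_users:], z2[n_users:], tau)         # Eq. (8)
    return user_loss + item_loss                                  # Eq. (9)
```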

Multi-task training

To fully utilize CL to optimize the recommendation task, we use a joint training scheme that optimizes the recommendation loss and the contrastive loss in one training session as follows:

$$\mathcal{L}={\mathcal{L}}_{bpr}+\lambda {\mathcal{L}}_{ssl}+{\Vert \Theta \Vert }_{1}+{\Vert \Theta \Vert }_{2}^{2},$$
(10)

where \(\Theta \) denotes the parameter set of the model, i.e., all parameters of the user and item embeddings, and \(\lambda \) is the hyperparameter controlling the weight of the SSL loss.
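A compact sketch of the joint objective; Eq. (10) writes the L1 and L2 penalties without explicit coefficients, so the `reg_weight` factor below is an assumption added for practical scaling rather than part of the formula.

```python
def total_loss(bpr, ssl, embedding_params, ssl_weight, reg_weight=1e-4):
    # Eq. (10): BPR loss + lambda * CL loss + L1 and L2 penalties on the embeddings.
    reg = sum(p.abs().sum() + p.pow(2).sum() for p in embedding_params)
    return bpr + ssl_weight * ssl + reg_weight * reg
```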

Complexity

This section analyzes the time complexity of SimDCL and compares it with the baselines LightGCN and SGL. We discuss the time complexity within a single batch. Following [22], we define \(|E|\) as the number of edges in the interaction graph, \(d\) as the embedding size, \(b\) as the batch size, \(M\) as the number of nodes in the batch, and \(1-p\) as the edge dropout rate in SGL. Table 2 shows the results of the time complexity analysis. Observations of this table reveal the following findings.

  • LightGCN and SimDCL do not need to perform graph augmentation and only require \(O(2|E|)\) time to construct the adjacency matrix. By contrast, SGL requires nearly three times this cost at this stage because it needs to perform graph augmentation twice to generate two subgraphs.

  • In the graph convolution phase, SGL requires forward computation for the original and the two enhanced graphs; thus, the time cost is nearly three times that of LightGCN. Notably, SimDCL requires only two graph convolution operations, reducing the computational effort of the graph convolution stage.

  • For the recommendation loss calculation, SimDCL calculates the BPR loss for the results of both forward computations; therefore, the time cost at that stage is twice that of LightGCN and SGL. However, the BPR loss calculation is simple and does not add an excessive amount of time cost.

  • In the CL loss computation phase, SGL and the proposed SimDCL compute the loss over positive sample pairs. Therefore, the complexity is the same, namely \(O(bd + bMd)\), where \(O(bd)\) corresponds to the computation between positive samples and \(O(bMd)\) to that between negative samples.

Table 2 Comparison of time complexity

Overall, the complexity of SimDCL exceeds that of SGL only in the BPR loss computation phase. However, the overall complexity of SimDCL is still lower than that of SGL because the computational cost of the graph convolution part is markedly reduced.

Experiments

To demonstrate the superiority of SimDCL and reveal the reasons for its effectiveness, we conducted extensive experiments and answered the following research questions:

RQ1: How does SimDCL perform compared with the state-of-the-art CF model?

RQ2: How do the two components of SimDCL affect performance?

RQ3: How do the different settings affect the effectiveness of the proposed SimDCL?

Experimental settings

Dataset

We experiment on five benchmark datasets: MovieLens-1M [38], Amazon Books [39], Yelp (Footnote 1), Gowalla [40], and Ta-Feng (Footnote 2). Limited by the experimental environment and to maintain the validity of the data, we apply 20-core filtering to Amazon Books, 15-core filtering to Yelp, and 10-core filtering to Gowalla. Table 3 summarizes the statistics of the processed datasets. We follow the approach in [21] and partition each dataset into training, validation, and test sets with an 8:1:1 split.
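For reference, k-core filtering simply drops users and items with fewer than k interactions until none remain; a sketch under the assumption that the raw data is a list of (user, item) pairs:

```python
from collections import Counter

def k_core(interactions, k):
    """interactions: list of (user, item) pairs; returns the k-core subset."""
    while True:
        user_cnt = Counter(u for u, _ in interactions)
        item_cnt = Counter(i for _, i in interactions)
        kept = [(u, i) for u, i in interactions
                if user_cnt[u] >= k and item_cnt[i] >= k]
        if len(kept) == len(interactions):   # no more users/items to remove
            return kept
        interactions = kept
```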

Table 3 Statistics of datasets

Compared methods

We compared the proposed SimDCL with the following baselines:


BPRMF [2]: This is a personalized ranking algorithm based on Bayesian posterior optimization, which models the relative preference between pairs of items. For each user, it computes a preference ranking over items without past behavior, producing personalized recommendations.


NGCF [10]: This method applies GCN to CF to achieve higher order connectivity propagation to improve performance.


LightGCN [11]: This method removes the nonlinear activation and transformation in GCN based on NGCF, making it simpler and improving performance.


SGL [18]: This method introduces self-supervised CL on top of LightGCN and proposes three augmentation operators to generate contrastive data. We compare against SGL-ED, which uses random edge dropout, the best-performing augmentation scheme in the SGL study.


HMLET [17]: This method investigates the role of linear and nonlinear transforms in GCF, and proposes a gated switch-based GCN for recommendation systems.


NCL [21]: This method abandons data augmentation and instead proposes two contrastive objectives to optimize GCF, namely a structural loss for consistency between layer-0 and even-layer embeddings and a semantic loss for consistency between users (items) and their prototypes.


SimGCL [22]: This method discards the graph enhancement mechanism and instead adds uniform noise to the embedding space to create augmented view pairs.

Evaluation metrics

The evaluation is performed with full ranking over the test set; therefore, we use Recall@\(N\) and NDCG@\(N\), which are widely used in CF [11, 18, 41], as performance metrics, where \(N\) is set to 10 and 20.

\(Recall\) is defined as \(Recall= \frac{TP}{TP+FN}\), where \(TP\) and \(FN\) denote the numbers of true positives and false negatives, respectively. \(NDCG\) (normalized discounted cumulative gain) is defined as \(NDC{G}_{k}=\frac{DC{G}_{k}}{IDC{G}_{k}}\), where \(DC{G}_{k} = \sum_{i=1}^{k}\frac{{2}^{re{l}_{i}}-1}{{\log}_{2}(i+1)}\) and \(IDC{G}_{k}\) is the maximum attainable \(DC{G}_{k}\) under an ideal ranking.
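A sketch of how these metrics can be computed for a single user under full ranking, assuming binary relevance; `ranked_items` and `ground_truth` are illustrative names.

```python
import math

def recall_at_n(ranked_items, ground_truth, n):
    # fraction of the user's held-out items that appear in the top-n ranking
    hits = sum(1 for item in ranked_items[:n] if item in ground_truth)
    return hits / len(ground_truth)

def ndcg_at_n(ranked_items, ground_truth, n):
    # binary relevance: (2^rel - 1) = 1 for hits, 0 otherwise
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:n]) if item in ground_truth)
    ideal_hits = min(len(ground_truth), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```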

Implementation details

We implemented SimDCL based on RecBole [42], and all compared baselines were run using their RecBole implementations. To ensure fair comparison, we optimize all methods with the Adam optimizer and uniformly set the embedding size to 64 and the batch size to 4096. To prevent overfitting, we use an early-stopping strategy that records the best-performing model parameters on the validation set and uses them for evaluation on the test set.
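A generic sketch of such an early-stopping loop; the actual experiments rely on RecBole's implementation, and the `patience` value and callback names below are assumptions for illustration.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate_valid,
                              max_epochs=300, patience=10):
    # Keep the parameters of the best validation checkpoint; stop after `patience`
    # epochs without improvement and restore the best weights for test evaluation.
    best_score, best_state, stale = float("-inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        score = evaluate_valid(model)          # e.g. Recall@20 on the validation set
        if score > best_score:
            best_score, stale = score, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```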

Overall performance

Performance comparison

Table 4 compares the performance of the proposed SimDCL and the baselines on five datasets, where bold indicates the best performance in each row, underline indicates the best baseline, and the value in parentheses is the percentage improvement of SimDCL over the best baseline. Analysis of the experimental results reveals several observations.

  (1) Compared with the traditional CF method BPRMF, the GNN-based approaches generally achieve superior performance. This demonstrates that the high-order connectivity captured by GNN helps the model learn better embedding representations than traditional methods.

  (2) Compared with NGCF, which directly applies graph convolutional networks to CF, LightGCN and HMLET, which improve the GNN structure, achieve performance gains. This demonstrates the importance of simplifying and improving the heavy GCN for better performance.

  (3) SGL, NCL, and SimGCL, which combine CL with GNNs, consistently outperform the other GNN methods on all five datasets. This demonstrates that self-supervised learning, especially CL, plays a valuable role in CF. Notably, NCL and SimGCL, which forgo graph augmentation, outperform SGL, which justifies our CL-based approach that avoids graph augmentation and generates augmented views through dropout layers only.

  (4) Finally, we observe that the proposed SimDCL consistently outperforms all baselines, including the state-of-the-art NCL and SimGCL, on all datasets. This further demonstrates the superiority and effectiveness of SimDCL, which simultaneously improves the graph computation and introduces CL.

Table 4 Performance comparison with other SOTA models

Convergence speed comparison

This section compares the convergence speed of SimDCL with that of representative GCF models and shows that SimDCL converges faster. Figure 2 shows the performance variation on the validation set during training. Note that, owing to the early stopping strategy, training terminates 10 epochs after the best validation performance is reached. The figure shows that the proposed SimDCL achieves its best performance at epochs 35, 23, and 6 on the MovieLens-1M, Yelp, and Ta-Feng datasets, respectively, whereas SGL reaches its best performance only after 53, 55, and 38 epochs, indicating slower convergence. Nevertheless, both CL-based methods converge faster than LightGCN, which is still far from convergence when SGL and SimDCL converge.

Fig. 2 Performance curves on the validation set during training

In addition, the convergence speedup of SimDCL varies across datasets. Compared with SGL, SimDCL converges approximately 30% faster on MovieLens-1M but over 80% faster on Ta-Feng. This difference may be related to dataset sparsity, and SimDCL appears to converge especially quickly on highly sparse datasets.

Running time comparison

This section compares the actual training time of SimDCL with that of LightGCN and SGL. Table 5 reports the runtime of an individual epoch and the total runtime. The results show that

  • The SimDCL method consumes essentially the same running time as SGL within a single epoch. Although its per-epoch cost is slightly higher than SGL's on the Yelp and Ta-Feng datasets, the faster convergence of SimDCL still makes it preferable overall.

  • Combining the total runtime and the number of epochs to convergence of the three methods shows that, thanks to its fast convergence, SimDCL can complete training in even less wall-clock time than LightGCN.

  • It is worth noting that although our analysis in "Complexity" indicates that the time cost of SimDCL is lower than that of SGL, the measured training time does not fully match this conclusion. On the Yelp and Ta-Feng datasets, SimDCL, despite using only two forward computations, takes slightly more time per epoch, which we attribute to the additional dropout operations increasing the computation cost.

Table 5 Runtime of SimDCL compared to the baselines, in seconds

Further study of SimDCL

Ablation study

We propose two improvements in SimDCL: CLC-GCN and dropout-based CL. To demonstrate their effectiveness, we conducted an ablation study on three datasets; the results are depicted in Fig. 3. Specifically, we apply CLC-GCN alone on top of the backbone (denoted w/ CLC-GCN) and dropout-based CL alone on top of the backbone (denoted w/ CL). The analysis of the experimental results shows that

  (1) Compared with LightGCN, w/ CLC-GCN and w/ CL, each of which applies only a single improvement, obtain some performance gain, which shows that both of our proposed improvements are effective.

  (2) Compared with w/ CLC-GCN, which improves the graph computation, w/ CL, which applies CL, achieves a larger performance gain, indicating that the CL-based approach is simpler and more effective than improving the graph structure and computation.

  (3) SimDCL, which applies both improvements simultaneously, achieves the best performance, showing that the two improvements are complementary and together yield further gains.

Fig. 3 The results of SimDCL ablation experiments on three datasets

Visualizing the distribution of representations

In the previous sections, we demonstrated the superiority and effectiveness of SimDCL through extensive experiments; here we seek a more direct explanation for the performance improvement. We therefore followed the operations in [20] and visualized the item embedding distribution on the MovieLens-1M dataset. We first use t-SNE [43] to map the learned representations to two-dimensional normalized vectors on the unit circle \({S}^{1}\), and then plot the feature distribution using a nonparametric Gaussian kernel density estimate [44] in \({\mathbb{R}}^{2}\) (a sketch of this visualization procedure is given after the findings below). As depicted in Fig. 4, we plot the item embedding distributions of LightGCN, SimDCL, and the two ablation models from "Ablation study" at each layer and for the final embedding. The two ablation models can be viewed as adding the proposed CLC-GCN and dropout-based CL, respectively, to LightGCN. Analyzing the results in Fig. 4, we find the following:

  (1) The higher order embedding \({Z}^{(3)}\) of LightGCN exhibits severe feature smoothing, which is mitigated to varying degrees by SimDCL and the two ablation models. Together with the performance results, this suggests a positive correlation between the mitigation of the smoothing phenomenon and performance improvement.

  (2) The higher order embeddings of SimDCL w/ CL are more uniformly distributed than those of SimDCL w/ CLC-GCN. This mirrors the ablation results and indicates that CL is the main source of SimDCL's performance improvement.

  (3) Notably, the uniformity of SimDCL's high-order embedding distribution lies between that of SimDCL w/ CLC-GCN and SimDCL w/ CL. We conclude that both overly uniform and overly concentrated feature distributions degrade performance, and finding a reasonable level of feature uniformity is key to performance improvement.
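For reference, the visualization procedure described above can be sketched as follows, assuming scikit-learn's t-SNE, SciPy's Gaussian KDE, and Matplotlib; the function and variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from scipy.stats import gaussian_kde

def plot_embedding_density(item_emb: np.ndarray, title: str):
    # map item embeddings to 2-D with t-SNE and project onto the unit circle S^1
    xy = TSNE(n_components=2).fit_transform(item_emb)        # (n_items, 2)
    xy = xy / np.linalg.norm(xy, axis=1, keepdims=True)
    # nonparametric Gaussian kernel density estimate in R^2, used to color the points
    density = gaussian_kde(xy.T)(xy.T)
    plt.scatter(xy[:, 0], xy[:, 1], c=density, s=3, cmap="viridis")
    plt.title(title)
    plt.show()
```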

Fig. 4 Visualization of item embedding

Parameter sensitivity analysis

In this section, we investigate the effect of the number of graph convolution layers in SimDCL and of several key hyperparameters on performance. In each experiment, all hyperparameters other than the one under study are set to their optimal values.

Impact of the layers of graph convolution

The high-order connectivity propagated by graph convolution is a key factor in the performance improvement of GCF over traditional methods; we therefore vary the number of graph convolution layers and report the results in Fig. 5. On all three datasets, performance improves as the number of layers increases, but an excessively large number of layers leads to a performance drop. The best performance is achieved with 3 layers on all three datasets.

Fig. 5 The performance comparison of different graph convolution layers

Impact of the parameter \(\lambda \)

In the ablation study, we found that the CL-based auxiliary task is the main source of performance improvement, and \(\lambda \) is the hyperparameter that determines the weight of the CL loss during joint training. We investigate the effect of \(\lambda \) on three datasets; the results are shown in Fig. 6a. The optimal \(\lambda \) is 0.04 on MovieLens-1M and 0.3 and 0.1 on Yelp and Ta-Feng, respectively. Performance improves to a certain extent as \(\lambda \) increases, but increasing it beyond the optimal value causes performance degradation, and an excessively large \(\lambda \) leads to a dramatic drop or even a performance collapse. We therefore suggest tuning \(\lambda \) in the range [0.01, 0.5].

Fig. 6 Performance comparison with respect to different \(\lambda \), \(\tau \), and \(p\)

Impact of the temperature \(\tau \)

The temperature parameter in InfoNCE has a remarkably strong impact on the effect of CL [44]. We tune the temperature in the range [0.05, 0.5] and report the results in Fig. 6b. The optimal \(\tau \) lies in [0.1, 0.3] for all three datasets; both excessively large and excessively small \(\tau \) cause performance degradation but do not lead to a performance collapse.

Impact of the dropout ratio \(p\)

In SimDCL, the dropout ratio of the dropout layer is an important parameter that determines the proportion of the original embedding that is discarded. We tune the parameter in the range [0.05, 0.5] to analyze its impact on performance; the results are shown in Fig. 6c. A small dropout ratio yields superior performance; we therefore suggest tuning the parameter in the range [0.1, 0.3].

Conclusion and future work

In this work, we propose a CL-based recommendation method, named SimDCL, to improve CF performance. SimDCL first applies cross-layer connections in the graph convolution operation, and then a dropout-based CL method is developed and trained jointly with the recommendation model. Extensive experiments on five datasets demonstrate the effectiveness of SimDCL; moreover, SimDCL converges very quickly, which makes it more valuable for practical applications.

However, although SimDCL is effective and efficient, the dropout operation increases the cost of the forward computation, which runs counter to our intention of simplifying CL-based CF. In future work, we will therefore explore further simplifying the model structure and improving efficiency while keeping performance unchanged. In addition, we found in our experiments that SimDCL shows slightly inconsistent performance on different graphics cards even under exactly the same experimental settings. Although the fluctuations are small, this is detrimental to the stability of SimDCL; we believe it may be caused by differences in numerical precision across graphics cards, and we will try to address this issue in future work.