1 Introduction

By accurately predicting users’ future interactions, recommender systems have been applied in various areas [1,2,3], e.g., online purchasing platforms and content distribution platforms. Collaborative filtering (CF) [4] is one of the most important recommendation methods and is widely adopted in industry. Many efforts have been made to improve the performance of CF [5,6,7,8,9,10,11,12].

Among these methods, GNN-based methods [15,16,17,18,19], which build on graph neural networks (GNNs) [13, 14], have shown superior prediction accuracy. GNN-based methods model the historical interactions as a graph and explicitly incorporate the graph collaborative filtering signal into the embedding encoding process. However, these methods suffer from noise and data sparsity [14]. Several studies have combined contrastive learning with GNNs for collaborative filtering to alleviate these problems and learn more robust embeddings, such as self-supervised graph learning (SGL) [10], simple graph contrastive learning (SimGCL) [11], hypergraph contrastive collaborative filtering (HCCF) [8], and neighborhood-enriched contrastive learning (NCL) [12]. Although these contrastive learning-based methods can discriminate node embeddings from one another, the embeddings may still contain indistinguishable graph signals. Taking Fig. 1 as an example, the embeddings of the four users are well separated from each other, but the third dimension is identical to the second, so the representations do not fully exploit their expressive ability. In spectral graph theory, each column of the embedding matrix in Fig. 1 is a graph signal. Previous methods focus on node-level contrastive learning but ignore graph signals.

Fig. 1

Illustration of the motivation: embeddings of four users with respect to three dimensions
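The distinction between node-level and signal-level redundancy can be made concrete with a small Python sketch. The 4×3 embedding matrix below is hypothetical (its values are not those of Fig. 1) but has the same property: every pair of users is well separated, while the third column duplicates the second. Treating each column as a graph signal exposes the redundancy that node-level comparisons miss.

```python
import math

# Hypothetical embeddings: 4 users (rows) x 3 dimensions (columns).
# Column 2 duplicates column 1, mirroring the situation sketched in Fig. 1.
E = [
    [0.9, 0.1, 0.1],
    [0.1, 0.8, 0.8],
    [-0.7, 0.3, 0.3],
    [0.2, -0.9, -0.9],
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Node level: compare rows (users) pairwise.
node_sims = [cosine(E[i], E[j]) for i in range(len(E)) for j in range(i + 1, len(E))]

# Signal level: compare columns (graph signals) pairwise.
signals = [list(col) for col in zip(*E)]
signal_sims = {
    (p, q): cosine(signals[p], signals[q])
    for p in range(len(signals))
    for q in range(p + 1, len(signals))
}

print(f"largest user-user cosine: {max(node_sims):.3f}")
for pair, sim in signal_sims.items():
    print(f"signals {pair}: cosine {sim:.3f}")
```

The users look distinct (the largest pairwise cosine is well below 1), yet signals 1 and 2 have cosine 1, so one embedding dimension carries no extra information.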

To tackle the above problem, we propose a new graph collaborative filtering model named signal contrastive enhanced graph collaborative filtering (SC-GCF), which enhances the graph collaborative filtering task with contrastive learning at both the node level and the signal level. For the contrastive learning task, the node-level contrastive learning minimizes the distance between the node embeddings and the embeddings propagated from the neighbors, which makes the learned representations more robust. Different from the previous methods, we highlight the importance of graph signals, which are the kernel of the original graph convolution operation [20, 21]. Signal-level contrastive learning aims to make the graph signals different from each other, which guarantees the informativeness of the graph signals. In the graph collaborative filtering task, we utilize GNNs to model the local relations of users/items and use learnable hypergraph convolutional networks to model the implicit global dependency among nodes. Experiments are conducted on four real-world public datasets, and the experimental results show the effectiveness and superiority of the proposed model.

The main contributions of this paper are summarized as follows:

  1. We are the first to incorporate signal-level contrastive learning into graph collaborative filtering.

  2. We propose SC-GCF, a new contrastive learning-based collaborative filtering method that uses both GNNs and hypergraphs for representation learning.

  3. We conduct extensive experiments on four public datasets, and the results show the superiority of the proposed model.

The rest of the paper is organized as follows. We briefly review related work in Sect. 2. Section 3 introduces the basic notation of graph collaborative filtering, which serves as the preliminaries for this work. In Sect. 4, the structure of SC-GCF is discussed in detail. Section 5 reports the experimental results, and Sect. 6 concludes the paper.

2 Related Work

In this section, we introduce the related work, namely GNNs-based recommendation, hypergraph-based recommendation, and contrastive learning-based recommendation.

2.1 GNNs-Based Recommendation

GNNs have brought substantial performance improvements to representation learning in many areas. The graph convolutional network (GCN) [13] is a first-order approximation of spectral graph convolution and uses degree normalization to aggregate neighbor information. However, GCN weights neighbor information by the graph structure alone and ignores the relations between node embeddings during aggregation. Graph attention networks (GAT) [22] instead assign aggregation weights according to the similarity of adjacent nodes’ embeddings.

GNNs have performed well in collaborative filtering because they can explicitly encode graph collaborative signals into user/item embeddings. GNN-based methods, such as graph convolutional matrix completion (GCMC) [15], neural graph collaborative filtering (NGCF) [17], and light graph convolution networks (LightGCN) [18], follow the information propagation framework to smooth user/item embeddings. GCMC applies a graph auto-encoder to the bipartite graph to refine user/item embeddings from their structural neighbors [15]. NGCF models high-order connectivity with multiple GNN layers and explicitly injects collaborative signals into the embedding learning procedure [17]. Different from these GNN-based methods, the authors of [18] found that feature transformation and nonlinear activation contribute little to collaborative filtering, and proposed the lightweight LightGCN model.

2.2 Hypergraph-Based Recommendation

A hypergraph can capture complex relations among nodes that cannot be depicted by an ordinary graph structure. For instance, hypergraph neural networks (HGNN) [23] were the first work to encode high-order data correlation in a hypergraph structure using a hyperedge convolution operation. HGNN constructs its hypergraph by finding the K nearest neighbors of the nodes under Euclidean distance. A main drawback of HGNN is that its hypergraph structure is fixed: it is built only from the initial node embeddings and ignores the refined representations. Dynamic hypergraph neural networks (DHGNN) [24] introduce a dynamic hypergraph structure to tackle this problem.
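To illustrate the kind of fixed construction described for HGNN, the Python sketch below forms one hyperedge per node, containing the node itself and its K nearest neighbors by Euclidean distance, and returns the node-hyperedge incidence matrix. This follows our reading of HGNN's construction and is not its reference implementation; the function name and toy features are hypothetical.

```python
import math


def knn_hyperedges(X, k):
    """One hyperedge per node: the node plus its k nearest neighbours
    (Euclidean distance). Returns incidence H with H[v][e] = 1 if node v
    belongs to hyperedge e."""
    n = len(X)
    H = [[0] * n for _ in range(n)]
    for e, center in enumerate(X):
        # Sort all other nodes by distance to the centre node of hyperedge e.
        others = sorted((math.dist(center, X[v]), v) for v in range(n) if v != e)
        members = [e] + [v for _, v in others[:k]]
        for v in members:
            H[v][e] = 1
    return H


# Hypothetical 2-D features for five nodes forming two loose clusters.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0]]
H = knn_hyperedges(X, k=1)
for row in H:
    print(row)
```

Because the input features never change, the resulting incidence matrix stays the same throughout training, which is the limitation that dynamic approaches such as DHGNN address.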

Recent works incorporate hypergraph neural networks into collaborative filtering. Dual channel hypergraph collaborative filtering (DHCF) [7] uses hypergraph neural networks to model the correlations among users and among items on the user and item hypergraphs, respectively. Unlike previous hypergraph-based methods, which construct a fixed hypergraph structure, the dynamic hypergraph learning framework for collaborative filtering (DHLCF) [9] learns the hypergraph structure end to end with a differentiable hypergraph learner and a corresponding hypergraph learning objective. Building on the success of contrastive learning, hypergraph contrastive collaborative filtering (HCCF) [8] contrasts the embeddings of GNNs with those of hypergraph neural networks to learn robust representations.

2.3 Contrastive Learning-Based Recommendation

Recently, contrastive learning [25] has attracted researchers’ attention as a new form of self-supervised learning, which is widely used in computer vision, natural language processing, and other areas. In [26], a graph contrastive learning (GraphCL) framework is proposed to learn robust representations of graph data.

Graph contrastive learning has been combined with collaborative filtering because it can alleviate the data sparsity problem. SGL [10] produces two representations of each node from augmented views of the original graph and maximizes the agreement between the two representations. Unlike these graph augmentation methods, SimGCL [11] constructs new representations by adding small random noise to the embeddings in the same direction as the embeddings themselves. To avoid both corrupting the graph and adding noise, NCL [12] explicitly incorporates structural neighbors and potential neighbors into contrastive pairs to enhance performance.
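SimGCL's "same direction" noise can be sketched as follows, based on our reading of the SimGCL paper: a random vector drawn uniformly from [0, 1)^d is rescaled to a fixed L2 norm ε and its signs are aligned with the embedding, so the perturbed embedding stays in the same hyperoctant. The function name and the value of ε are illustrative assumptions, not SimGCL's reference code.

```python
import math
import random


def simgcl_perturb(e, eps, rng):
    """Return e + delta, where delta has L2 norm eps and the same sign as e
    in every coordinate (uniform noise, rescaled, then sign-aligned)."""
    noise = [rng.random() for _ in e]  # entries drawn from U[0, 1)
    norm = math.sqrt(sum(x * x for x in noise)) or 1.0  # guard against an all-zero draw
    delta = [eps * x / norm * math.copysign(1.0, v) for x, v in zip(noise, e)]
    return [v + dv for v, dv in zip(e, delta)]


rng = random.Random(0)
e = [0.5, -0.3, 0.8]
view1 = simgcl_perturb(e, eps=0.1, rng=rng)
view2 = simgcl_perturb(e, eps=0.1, rng=rng)
print(view1)
print(view2)
```

The two views differ only by small sign-consistent offsets, so they can be contrasted without altering the graph.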

3 Preliminaries

CF takes user-item interactions as input and, for each user, predicts the preferred items among those the user has not interacted with. Specifically, given the training interactions, all users form the user set \({\mathcal {U}}\) and all items form the item set \({\mathcal {I}}\). Let \({\textbf{R}}\) denote the interaction matrix; then \({\textbf{R}}_{ui}=1\) if user u has interacted with item i, and \({\textbf{R}}_{ui} = 0\) otherwise.

Matrix factorization (MF)-based methods [27, 28] use latent representations to capture the preferences of users and the features of items. Let \({\textbf{U}}\) denote the user embeddings and \({\textbf{I}}\) the item embeddings, and let \({\textbf{E}} = [{\textbf{U}},{\textbf{I}}]\) be the concatenation of the user and item embeddings, so that \({\textbf{E}} \in {\mathbb {R}}^{(\mid {\mathcal {U}} \mid + \mid {\mathcal {I}} \mid )\times d}\), where d is the embedding dimension. Later models inherit these latent representations.

GNN-based methods regard the interaction matrix as a graph, where users and items are the nodes and the interactions are the edges between them. Specifically, \({\mathcal {G}}=({\mathcal {V}},{\mathcal {E}} )\), where \({\mathcal {V}}={\mathcal {U}} \cup {\mathcal {I}}\) is the node set and \({\mathcal {E}}\) is the edge set. The adjacency matrix is defined as follows:

$$\begin{aligned} {\textbf{A}} = \begin{bmatrix} 0 & {\textbf{R}} \\ {\textbf{R}}^T & 0 \end{bmatrix} \end{aligned}$$
(1)

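which is a symmetric square matrix of size \((\mid {\mathcal {U}} \mid + \mid {\mathcal {I}} \mid ) \times (\mid {\mathcal {U}} \mid + \mid {\mathcal {I}} \mid )\).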
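For illustration, the Python sketch below assembles A from a toy interaction matrix following Eq. (1), with users indexed first and items after them, matching the ordering of E = [U, I]. The helper `build_adjacency` and the toy data are hypothetical; a practical implementation would use a sparse matrix because R is typically very sparse.

```python
def build_adjacency(R):
    """Build the (|U|+|I|) x (|U|+|I|) bipartite adjacency matrix of Eq. (1)
    from a |U| x |I| binary interaction matrix R (nested lists)."""
    n_users, n_items = len(R), len(R[0])
    n = n_users + n_items
    A = [[0] * n for _ in range(n)]
    for u in range(n_users):
        for i in range(n_items):
            if R[u][i]:
                # Upper-right block holds R, lower-left block holds R^T.
                A[u][n_users + i] = 1
                A[n_users + i][u] = 1
    return A


R = [
    [1, 0, 1],  # user 0 interacted with items 0 and 2
    [0, 1, 0],  # user 1 interacted with item 1
]
A = build_adjacency(R)
for row in A:
    print(row)
```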

GNN-based models follow the message propagation framework: at each layer, they aggregate information from the neighbors, and the embeddings of all layers are then read out into the final embedding. The message propagation step can be defined as follows:

$$\begin{aligned} {\textbf{E}}^{l}_u = f_{propagate}(\{{\textbf{E}}^{l-1}_{i} \mid i \in \{{\mathcal {N}}_u \cup \{u\} \} \}) \end{aligned}$$
(2)

where \({\mathcal {N}}_u\) is the set of neighbors of user u and l is the layer number.

Multiple GNN layers can be stacked, each producing its own node embeddings. Finally, the embeddings of all layers are combined by a readout function:

$$\begin{aligned} \bar{{\textbf{E}}}_u = f_{readout}({\textbf{E}}^0_u, {\textbf{E}}^1_u, \ldots , {\textbf{E}}^L_u ) \end{aligned}$$
(3)

where \(\bar{{\textbf{E}}}_u\) is the final representation of user u produced by the GNN module. The readout function \(f_{readout}\) can be an average, a concatenation, or another function.
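Two common choices of \(f_{readout}\) in Eq. (3) are sketched below for a single node; the function names and the layer embeddings are hypothetical.

```python
def mean_readout(layer_embs):
    """Average a node's embeddings across layers 0..L (element-wise)."""
    n_layers = len(layer_embs)
    return [sum(vals) / n_layers for vals in zip(*layer_embs)]


def concat_readout(layer_embs):
    """Concatenate a node's embeddings from layers 0..L into one vector."""
    return [x for emb in layer_embs for x in emb]


# Hypothetical embeddings of one user at layers 0, 1, and 2 (d = 2).
layers = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(mean_readout(layers))    # [0.5, 0.5]
print(concat_readout(layers))  # [1.0, 0.0, 0.5, 0.5, 0.0, 1.0]
```

Averaging keeps the output dimension at d, whereas concatenation grows it to (L+1)d, which enlarges every downstream inner product.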

4 The Proposed Method

In this section, we introduce the proposed SC-GCF model in detail. The overall framework of the model is depicted in Fig. 2.

Fig. 2

The overall framework of the proposed SC-GCF model

We first introduce graph collaborative filtering in Sect. 4.1, where hypergraph neural networks serve as a complement to GNNs. Section 4.2 then describes the contrastive learning task, which constrains the embeddings at both the node level and the graph signal level. The joint optimization of these two tasks is described in Sect. 4.3.

4.1 Graph Collaborative Filtering

The graph collaborative filtering module of SC-GCF is mainly composed of three parts, namely local relation encoder, global dependency learner, and representation aggregator.

4.1.1 Local Relation Encoder

The user-item interaction history is the most important information in the collaborative filtering task, and the direct interaction relations are encoded in the adjacency matrix of the graph. We use GNNs to model the local relations between user nodes and item nodes. Specifically, inspired by the success of LightGCN, the information propagation function is a sum of neighbor embeddings normalized by the node degrees. The information propagation function for user u is defined as follows:

$$\begin{aligned} {\textbf{E}}^{l}_u = \sum _{i \in \{{\mathcal {N}}_u \cup \{u\} \}} \frac{1}{\sqrt{ \mid {\mathcal {N}}_u \mid \mid {\mathcal {N}}_i \mid }} ( {\textbf{E}}^{l-1}_{i} ) \end{aligned}$$
(4)

where \({\textbf{E}}^{l-1} \in {\mathbb {R}}^{(\mid {\mathcal {U}} \mid + \mid {\mathcal {I}} \mid )\times d}\) is the output of the previous layer, d is the embedding dimension, and \({\textbf{E}}^0\) is randomly initialized. Multiple layers can be stacked to encode information from higher-order neighbors, and each layer produces its own embedding matrix, so L layers of propagation yield \(L+1\) embedding matrices (including \({\textbf{E}}^0\)). We aggregate these matrices with a weighted sum in which layer l receives the weight \(\frac{1}{l+1}\):

$$\begin{aligned} {{\textbf{E}}_{G}}_u = \sum _{l=0}^{L} \frac{1}{l+1} {\textbf{E}}^{l}_u \end{aligned}$$
(5)

Item embeddings are computed symmetrically.

Finally, the module can be written in matrix form as follows:

$$\begin{aligned} \hat{{\textbf{A}}} &= {\textbf{D}}^{-\frac{1}{2}} {\textbf{A}} {\textbf{D}}^{-\frac{1}{2}}\end{aligned}$$
(6)
$$\begin{aligned} {\textbf{E}}^l &= \hat{{\textbf{A}}} {\textbf{E}}^{l-1}\end{aligned}$$
(7)
$$\begin{aligned} {\textbf{E}}_G &= \sum _{l=0}^{L} \frac{1}{l+1} {\textbf{E}}^{l} \end{aligned}$$
(8)

where \({\textbf{D}}\) is the diagonal degree matrix of the graph, \(\hat{{\textbf{A}}}\) is the normalized adjacency matrix, and \({\textbf{E}}_G\) denotes the aggregated embeddings of the GNN module.
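The matrix form in Eqs. (6)-(8) can be sketched in plain Python as follows. Dense nested lists are used for readability only (a practical implementation would use sparse matrix products), and the function names and toy graph are hypothetical.

```python
import math


def normalize_adjacency(A):
    """Eq. (6): A_hat = D^{-1/2} A D^{-1/2}; nodes with degree 0 get all-zero rows."""
    deg = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(A)
    return [[inv_sqrt[r] * A[r][c] * inv_sqrt[c] for c in range(n)] for r in range(n)]


def matmul(X, Y):
    """Dense product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def light_propagate(A, E0, num_layers):
    """Eqs. (7)-(8): E^l = A_hat E^{l-1}, E_G = sum_{l=0}^{L} E^l / (l + 1)."""
    A_hat = normalize_adjacency(A)
    E_l = E0
    E_G = [row[:] for row in E0]  # the l = 0 term has weight 1
    for l in range(1, num_layers + 1):
        E_l = matmul(A_hat, E_l)
        w = 1.0 / (l + 1)
        E_G = [[g + w * e for g, e in zip(g_row, e_row)] for g_row, e_row in zip(E_G, E_l)]
    return E_G


# One user (row 0) who interacted with one item (row 1); d = 1 for readability.
A = [[0, 1], [1, 0]]
E0 = [[1.0], [3.0]]
E_G = light_propagate(A, E0, num_layers=2)
print(E_G)
```

With L = 2, the user's aggregated embedding is 1 + 3/2 + 1/3, mixing its own initial embedding with one-hop (item) and two-hop (user) information.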

4.1.2 Global Dependency Learner

Although the GNN module encodes local relations well, it relies on an adjacency matrix that is fixed during training, whereas the real interactions keep changing as they grow. Moreover, some complex relations have not yet appeared in the data or cannot be expressed by an ordinary graph structure. We refer to such relations as global dependencies; they cannot be captured by the GNN module.

We propose to learn the global dependency of the nodes using a dynamic hypergraph structure learner. This module can be described in three steps: hypergraph structure, hyperedge embedding, and hypergraph convolution.

Unlike an ordinary graph, whose edges each connect two nodes, a hypergraph is described by hyperedges, each of which can connect any number of nodes. Instead of constructing a fixed hypergraph, we use a dynamic hypergraph structure learner to learn the structure.

The hypergraph structure is defined as follows:

$$\begin{aligned} {\textbf{H}} = Softmax(LeakyRelu({\textbf{E}}{\textbf{W}})) \end{aligned}$$
(9)

where \({\textbf{W}} \in {\mathbb {R}}^{d \times h}\) is a linear mapping from node embeddings to node-hyperedge assignment weights, and h is the number of hyperedges. LeakyRelu is the nonlinear activation, and Softmax normalizes the assignment weights. The i-th column of \({\textbf{H}}\), denoted \({\textbf{H}}_{*i}\), gives the weights of the nodes related to the i-th hyperedge.

Each hyperedge embedding is aggregated from the embeddings of the nodes related to that hyperedge, weighted by the learned hypergraph structure:

$$\begin{aligned} \varvec{\Gamma } = LeakyRelu( {\textbf{H}}^T {\textbf{E}} ) \end{aligned}$$
(10)

where \(\varvec{\Gamma } \in {\mathbb {R}}^{h\times d}\), which denotes the embeddings of the hyperedges.

Hypergraph convolution follows the information propagation framework where hyperedge embeddings propagate through the hypergraph structure to refine the node embeddings. With the hypergraph structure and hyperedge embeddings, the hypergraph convolution can be defined as follows:

$$\begin{aligned} \begin{aligned} {\textbf{E}}_{out}&= LeakyRelu( {\textbf{H}} \varvec{\Gamma } ) \\&= LeakyRelu( {\textbf{H}} LeakyRelu({\textbf{H}}^T {\textbf{E}})) \end{aligned} \end{aligned}$$
(11)
Fig. 3

The illustration of the hypergraph structure learner on users

The hypergraph structure learner, illustrated in Fig. 3, is applied to users and items separately. Its inputs are the user and item embeddings in \({\textbf{E}}_G\) learned by the local relation encoder, and its two outputs are concatenated to form the final hypergraph embeddings \({\textbf{E}}_H = Concat({\textbf{E}}_{out}^U, {\textbf{E}}_{out}^I)\).
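A minimal Python sketch of Eqs. (9)-(11) for one node type follows. Several details are our assumptions rather than statements from the text: the softmax is taken row-wise (so each node's assignment weights over the h hyperedges sum to 1), LeakyReLU uses a negative slope of 0.2, and the toy sizes, random inputs, and function names are hypothetical. Dense nested lists stand in for tensors.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    """LeakyReLU; the 0.2 negative slope is an assumption."""
    return x if x > 0 else slope * x


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def elementwise(f, X):
    return [[f(v) for v in row] for row in X]


def softmax_rows(X):
    """Row-wise softmax: each node's hyperedge weights sum to 1 (assumed axis)."""
    out = []
    for row in X:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def hypergraph_learner(E, W):
    """E: n x d node embeddings (users or items), W: d x h learnable mapping."""
    H = softmax_rows(elementwise(leaky_relu, matmul(E, W)))   # Eq. (9),  n x h
    Gamma = elementwise(leaky_relu, matmul(transpose(H), E))  # Eq. (10), h x d
    E_out = elementwise(leaky_relu, matmul(H, Gamma))         # Eq. (11), n x d
    return H, Gamma, E_out


rng = random.Random(42)
n, d, h = 4, 3, 2
E_users = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
W_users = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(d)]
H, Gamma, E_out_users = hypergraph_learner(E_users, W_users)
print(H)
```

Running the same function on the item rows of E_G (with a separate mapping W, also an assumption) and stacking the user and item outputs row-wise gives E_H. Because H is recomputed from the current embeddings, the structure changes as training refines them.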

4.1.3 Representation Aggregator

We use a weighted sum to aggregate the embeddings learned by the local relation encoder and the global dependency learner. The local embeddings carry most of the information for the collaborative filtering task, and the global embeddings serve as a complement. When the global embeddings of a node are similar to its local embeddings, the global embeddings should receive a larger weight. Therefore, the representation aggregator is defined as follows:

$$\begin{aligned} \bar{{\textbf{E}}}&= {\textbf{E}}_G + \alpha {\textbf{F}} \odot {\textbf{E}}_H \end{aligned}$$
(12)
$$\begin{aligned} {\textbf{F}}_{i*}&= \frac{1}{1+e^{- {{\textbf{E}}_G}_i^T {{\textbf{E}}_H}_i} } \end{aligned}$$
(13)

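where \(\alpha\) is a hyperparameter that balances the importance of the local and global embeddings, and \({\textbf{F}}\) measures the similarity between them: every entry of row \({\textbf{F}}_{i*}\) equals the sigmoid of the inner product between node i's local and global embeddings, so a global embedding that agrees with its local counterpart receives a larger weight.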
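Eqs. (12)-(13) can be sketched in Python as below, using toy values (hypothetical) in which node 0's global embedding agrees with its local one and node 1's disagrees. The function names are also hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def aggregate(E_G, E_H, alpha):
    """Eqs. (12)-(13): E_bar_i = E_G_i + alpha * sigmoid(<E_G_i, E_H_i>) * E_H_i."""
    E_bar = []
    for local, glob in zip(E_G, E_H):
        gate = sigmoid(sum(a * b for a, b in zip(local, glob)))  # row i of F
        E_bar.append([a + alpha * gate * b for a, b in zip(local, glob)])
    return E_bar


E_G = [[1.0, 0.0], [0.0, 1.0]]
E_H = [[1.0, 0.0], [0.0, -1.0]]  # node 0: global agrees with local; node 1: disagrees
E_bar = aggregate(E_G, E_H, alpha=0.5)
print(E_bar)
```

The gate for node 0 is about 0.73 and for node 1 about 0.27, so the agreeing global embedding contributes more, as motivated above.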

4.1.4 Prediction and Optimization

With the representations learned before, the preference between user u and item i can be predicted as follows:

$$\begin{aligned} {\hat{y}}_{ui} = \bar{{\textbf{E}}}_u^T \bar{{\textbf{E}}}_i \end{aligned}$$
(14)

Bayesian personalized ranking (BPR) [29] is widely used in the collaborative filtering task. BPR encourages each user's preferred (interacted) items to be scored higher than the items the user has not interacted with.

$$\begin{aligned} {\mathcal {L}}_{BPR} = \sum _{(u,i,j) \in {\mathcal {D}}}-\ln \sigma ({\hat{y}}_{ui} -{\hat{y}}_{uj}) \end{aligned}$$
(15)

where (u, i, j) is an instance of the training data \({\mathcal {D}}\), item i is a preferred item of user u, item j is sampled from the items user u has not interacted with, and \(\sigma (\cdot )\) is the sigmoid function.
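Eqs. (14)-(15) can be sketched in Python as follows; the function names and toy embeddings are hypothetical, and ln σ(x) is computed in a numerically stable form.

```python
import math


def score(e_u, e_i):
    """Eq. (14): preference score as the inner product of final representations."""
    return sum(a * b for a, b in zip(e_u, e_i))


def log_sigmoid(x):
    """Numerically stable ln(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bpr_loss(user_emb, item_emb, triples):
    """Eq. (15): sum over (u, i, j) of -ln sigmoid(y_ui - y_uj)."""
    return sum(
        -log_sigmoid(score(user_emb[u], item_emb[i]) - score(user_emb[u], item_emb[j]))
        for u, i, j in triples
    )


user_emb = [[1.0, 0.0]]
item_emb = [[1.0, 0.0], [0.0, 1.0]]
loss_good = bpr_loss(user_emb, item_emb, [(0, 0, 1)])  # observed item 0 outranks item 1
loss_bad = bpr_loss(user_emb, item_emb, [(0, 1, 0)])   # observed item 1 is outranked
print(loss_good, loss_bad)
```

The loss is small when the observed item already scores higher than the sampled negative and grows when the order is reversed.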

4.2 Contrastive Learning

Graph collaborative filtering relies on the topology of the graph and therefore suffers from the data sparsity problem. Contrastive learning has recently emerged as a form of self-supervised learning that can relieve data sparsity and improve the robustness of embeddings. However, existing methods ignore the importance of graph signals, which we highlight in this paper. We apply contrastive learning at both the node level and the signal level to enhance the robustness of the embeddings and the informativeness of the graph signals.

4.2.1 Node-Level Contrastive Learning

The node-level contrastive learning component is designed to maximize the agreement between the initial embeddings and the embeddings aggregated from neighbors. Based on the InfoNCE loss [30], the node-level contrastive learning objective of users can be defined as follows:

$$\begin{aligned} {\mathcal {L}}^{{\mathcal {U}}}_{SSL} = \sum _{u\in {\mathcal {U}}} -log \frac{exp({\textbf{E}}^0_u \cdot {\textbf{E}}^k_u / \tau )}{\sum _{v \in {\mathcal {U}}} exp({\textbf{E}}^0_v \cdot {\textbf{E}}^k_u / \tau )} \end{aligned}$$
(16)

where k is a hyper-parameter indicating which layer is contrasted with the initial embeddings (\(0 \le k \le L\)), and \(\tau\) is the temperature hyper-parameter of the softmax function. Because the user-item interaction graph is bipartite, the even-hop neighbors of a user are users and those of an item are items; k is therefore chosen to be an even number to guarantee that the contrasted representations are homogeneous.

Symmetrically, the item side node-level contrastive learning objective can be obtained as follows:

$$\begin{aligned} {\mathcal {L}}^{{\mathcal {I}}}_{SSL} = \sum _{i\in {\mathcal {I}}} -log \frac{exp({\textbf{E}}^0_i \cdot {\textbf{E}}^k_i / \tau )}{\sum _{j \in {\mathcal {I}}} exp({\textbf{E}}^0_j \cdot {\textbf{E}}^k_i / \tau )} \end{aligned}$$
(17)
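A direct Python transcription of Eq. (16) follows (Eq. (17) is the same function applied to item rows). It uses plain dot products exactly as written, with the log-sum-exp trick for stability; the toy embeddings and the function name are hypothetical. In SC-GCF the positives would be the rows of E^k for an even k, so that users are contrasted with representations aggregated from users.

```python
import math


def info_nce(anchors, positives, tau):
    """Eqs. (16)-(17) written literally: for each node u,
    -log( exp(a_u . p_u / tau) / sum_v exp(a_v . p_u / tau) ),
    with anchors = rows of E^0 and positives = rows of E^k."""
    loss = 0.0
    for u, p in enumerate(positives):
        logits = [sum(a * b for a, b in zip(anchor, p)) / tau for anchor in anchors]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[u]
    return loss


E0 = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical initial embeddings of two users
Ek_aligned = [[1.0, 0.0], [0.0, 1.0]]  # layer-k embeddings agree with E^0
Ek_swapped = [[0.0, 1.0], [1.0, 0.0]]  # layer-k embeddings disagree with E^0
print(info_nce(E0, Ek_aligned, tau=0.5))
print(info_nce(E0, Ek_swapped, tau=0.5))
```

The loss is low when each node's layer-k embedding matches its own initial embedding better than those of other nodes, which is the agreement the objective maximizes.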

4.2.2 Signal-Level Contrastive Learning

A graph signal is a mapping from the vertices to real numbers, i.e., it assigns one real value to each node. From the spectral perspective, GNNs act as low-pass filters on graph signals. When different signals are smoothed by GNNs, the differences among them should be preserved to keep each signal informative. Based on this insight, signal-level contrastive learning is proposed to maximize the differences among the graph signals. The signal-level contrastive learning objective is defined as follows:

$$\begin{aligned} {\mathcal {L}}^{Signal}_{SSL} = \sum _{k=1}^{d} -log \frac{exp({\textbf{S}}^0_k \cdot {\textbf{S}}^L_k / \tau )}{\sum _{m=1}^{d} exp({\textbf{S}}^0_m \cdot {\textbf{S}}^L_k / \tau )} \end{aligned}$$
(18)

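where \({\textbf{S}}^{0} = ({\textbf{E}}^{0})^T\) and \({\textbf{S}}^{L} = ({\textbf{E}}^{L})^T\), so that \({\textbf{S}}_k\), the k-th row of \({\textbf{S}}\), is the k-th graph signal (the k-th column of \({\textbf{E}}\)); d is the number of graph signals, which equals the embedding dimension.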
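Eq. (18) is the same InfoNCE form applied to the transposed embedding matrices, so the contrasted units are graph signals instead of nodes. The sketch below, with hypothetical names and values, shows that identical signals incur a higher loss than orthogonal ones, which is the pressure that keeps signals distinct. The same matrix is passed as both E^0 and E^L only to isolate the effect of redundancy.

```python
import math


def info_nce(anchors, positives, tau):
    """Same InfoNCE form as Eqs. (16)-(17): anchors v are contrasted against positive u."""
    loss = 0.0
    for u, p in enumerate(positives):
        logits = [sum(a * b for a, b in zip(anchor, p)) / tau for anchor in anchors]
        m = max(logits)
        loss += m + math.log(sum(math.exp(z - m) for z in logits)) - logits[u]
    return loss


def signal_contrastive_loss(E0, EL, tau):
    """Eq. (18): InfoNCE over graph signals, i.e. over the rows of S = E^T."""
    S0 = [list(col) for col in zip(*E0)]  # S^0 = (E^0)^T, shape d x n
    SL = [list(col) for col in zip(*EL)]  # S^L = (E^L)^T, shape d x n
    return info_nce(S0, SL, tau)


# Two nodes, two dimensions (hypothetical values).
E_distinct = [[1.0, 0.0], [0.0, 1.0]]   # the two signals are orthogonal
E_redundant = [[1.0, 1.0], [0.0, 0.0]]  # the two signals are identical
loss_distinct = signal_contrastive_loss(E_distinct, E_distinct, tau=0.5)
loss_redundant = signal_contrastive_loss(E_redundant, E_redundant, tau=0.5)
print(loss_distinct, loss_redundant)
```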

4.3 Optimization

With the graph collaborative filtering task and contrastive learning task described above, the objective function of the model is a weighted sum of these two objective functions as follows:

$$\begin{aligned} \begin{aligned} {\mathcal {L}}&= {\mathcal {L}}_{BPR} + \lambda _1 {\mathcal {L}}_{SSL}\\ {\mathcal {L}}_{SSL}&= {\mathcal {L}}^{{\mathcal {U}}}_{SSL} + {\mathcal {L}}^{{\mathcal {I}}}_{SSL} + \lambda _2 {\mathcal {L}}^{Signal}_{SSL} \end{aligned} \end{aligned}$$
(19)

where \(\lambda _1\) and \(\lambda _2\) are hyper-parameters to control the importance of contrastive learning and signal-level contrastive learning respectively.

5 Experiments

Experiments are conducted on four real-world datasets to answer the following three questions:

  1. How does SC-GCF perform compared with state-of-the-art methods?

  2. How do signal-level contrastive learning and hypergraph learning contribute to the performance?

  3. How do the key hyper-parameters affect the model performance?

5.1 Experimental Settings

In this subsection, we describe the detailed settings of the experiments.

5.1.1 Datasets

We conduct experiments on four public datasets, i.e., LastFM, MovieLens-100K, MovieLens-1M, and Beauty.

  1. LastFM [31]: The LastFM dataset contains listening histories from Last.fm, recording the interactions between users and artists.

  2. MovieLens [32]: The MovieLens datasets are collected by the GroupLens research project at the University of Minnesota. Two versions of different sizes, MovieLens-100K and MovieLens-1M, are used in the experiments.

  3. Beauty [33]: The Amazon-review datasets are crawled from Amazon and are widely used in recommendation research. The Beauty dataset is the subset of Amazon-review for the Beauty category.

The statistics of these datasets are presented in Table 1. To ensure data quality, we filter out users and items with fewer than 10 interactions on MovieLens-1M. The data are then split into training, validation, and test sets with a ratio of 80%, 10%, and 10%, respectively. For each interaction of user u in the training set, we randomly sample an item as a negative instance for user u to form the training data.
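The split and negative-sampling steps can be sketched as follows. We assume a global random split over all interactions and assume negatives are drawn from items the user has not interacted with in the training set; the text does not specify these details, and the function names and toy data are hypothetical. RecBole provides its own splitting and sampling utilities, which may behave differently.

```python
import random


def split_interactions(interactions, rng, ratios=(0.8, 0.1, 0.1)):
    """Randomly split (user, item) pairs into train/valid/test by the given ratios."""
    data = list(interactions)
    rng.shuffle(data)
    n_train = int(len(data) * ratios[0])
    n_valid = int(len(data) * ratios[1])
    return data[:n_train], data[n_train:n_train + n_valid], data[n_train + n_valid:]


def sample_triples(train, n_items, rng):
    """Pair every training interaction (u, i) with one random item j that u has
    not interacted with in the training set, giving BPR triples (u, i, j).
    Assumes no user has interacted with every item (otherwise this never ends)."""
    seen = {}
    for u, i in train:
        seen.setdefault(u, set()).add(i)
    triples = []
    for u, i in train:
        j = rng.randrange(n_items)
        while j in seen[u]:
            j = rng.randrange(n_items)
        triples.append((u, i, j))
    return triples


rng = random.Random(7)
interactions = [(u, i) for u in range(5) for i in range(u, u + 4)]  # 20 toy pairs
train, valid, test = split_interactions(interactions, rng)
triples = sample_triples(train, n_items=20, rng=rng)
print(len(train), len(valid), len(test))  # 16 2 2
```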

Table 1 Descriptive statistics of the datasets

5.1.2 Evaluation Metrics

The following evaluation metrics are adopted in the experiments:

  1. Recall@K: Recall is widely used in top-K recommendation. It measures the fraction of a user's truly interacted (test) items that are ranked in the top-K list.

  2. NDCG@K (Normalized Discounted Cumulative Gain): NDCG measures the ranking quality of the top-K recommended items. The higher the positive items are ranked, the larger the NDCG score.

Rankings are computed over all items rather than over a sampled subset. K is set to 20 for validation and to {10, 20, 50} for testing.
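The two metrics can be sketched in Python for a single user as follows, assuming binary relevance (an item is relevant if it is in the user's test set) and the standard log2 position discount; RecBole's exact implementation may differ in details such as averaging across users. The function names and toy ranking are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant (test) items that appear in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the top-k list divided by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg


ranked = ["a", "b", "c", "d", "e"]  # hypothetical full ranking for one user
relevant = {"b", "e"}               # hypothetical held-out test items
print(recall_at_k(ranked, relevant, 2))  # 0.5
print(recall_at_k(ranked, relevant, 5))  # 1.0
print(ndcg_at_k(ranked, relevant, 2))
```

When K equals the length of the full ranking, recall reaches 1 regardless of the ordering, which illustrates why differences in Recall@K between methods vanish as K approaches the number of items.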

5.1.3 Baseline Methods

We compare SC-GCF with the following eight baseline methods to confirm the effectiveness of our model.

  1. BPRMF [29]: BPRMF learns latent representations of users and items under the low-rank assumption on the interaction matrix, optimized with the BPR loss.

  2. NeuMF [5]: NeuMF uses deep neural networks to capture the relationships between user embeddings and item embeddings.

  3. NGCF [17]: NGCF captures high-order neighbors with multiple graph convolutional layers.

  4. LightGCN [18]: LightGCN removes the feature transformation and nonlinear activation of NGCF and performs strongly among GNN-based models.

  5. DHCF [7]: DHCF applies dual channel hypergraph convolutional networks to the user and item hypergraphs, respectively.

  6. HCCF [8]: HCCF combines graph neural networks with hypergraph neural networks to learn user/item representations.

  7. SGL [10]: SGL applies graph augmentation to the original graph and contrasts the two resulting node embeddings to improve robustness.

  8. SimGCL [11]: SimGCL adds small random noise vectors, aligned with the direction of the node embeddings, to form augmented embeddings.

5.1.4 Parameter Setup

We implement SC-GCF and the baseline methods with RecBole [34], an open-source recommendation framework. Adam is used as the optimizer with a learning rate of 0.001. The embedding dimension is 64, and all parameters are initialized with Xavier initialization. The training batch size is 256. We set the maximum number of training epochs to 300 and apply early stopping with a patience of 10 epochs, using NDCG@20 on the validation set as the indicator. The hyper-parameter \(\lambda _1\) is selected from \(\{1, 10, 1e2, 1e3, 1e4, 1e5 \}\). The hyper-parameters \(\lambda _2\) and \(\alpha\) are selected from \(\{1e{-8}, 1e{-7},\ldots , 1e{-1} \}\). The temperature \(\tau\) is selected from \(\{0.05, 0.1, 0.2, 0.5, 1, 2, 5\}\).
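The early-stopping rule can be sketched generically in Python. This is not RecBole's internal code; `run_epoch` is a hypothetical callback that trains one epoch and returns the validation NDCG@20, and the validation curve in the example is made up.

```python
def train_with_early_stopping(run_epoch, max_epochs=300, patience=10):
    """Run up to max_epochs epochs; run_epoch(epoch) trains one epoch and returns
    the validation NDCG@20. Training stops once `patience` consecutive epochs pass
    without a new best score. Returns (best_epoch, best_score)."""
    best_epoch, best_score, bad_epochs = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        score = run_epoch(epoch)
        if score > best_score:
            best_epoch, best_score, bad_epochs = epoch, score, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, best_score


calls = []


def fake_epoch(epoch):
    """Hypothetical stand-in for one training epoch plus validation."""
    calls.append(epoch)
    return -abs(epoch - 20)  # validation curve that peaks at epoch 20


result = train_with_early_stopping(fake_epoch)
print(result)  # (20, 0)
```

With the curve peaking at epoch 20, training halts after epoch 30, far short of the 300-epoch cap.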

5.2 Comparison with Baseline Methods

Table 2 Comparison with baseline methods on LastFM
Table 3 Comparison with baseline methods on MovieLens-100K

To answer the first question, we compare the experimental results of SC-GCF with all the baseline models on the four public datasets. The comparison results are listed in Tables 2, 3, 4 and 5. The best results of all methods are highlighted using bold font. The performance improvements of SC-GCF over the best results of these baselines are recorded in IMP-BEST. Based on the comparison results, we have the following observations:

  1. SC-GCF consistently outperforms the baseline methods on all datasets. In particular, SC-GCF improves over the best baseline in terms of Recall@10 by 1.76%, 4.47%, 7.43%, and 4.32% on LastFM, MovieLens-100K, MovieLens-1M, and Beauty, respectively. These improvements demonstrate the superiority of SC-GCF over the baselines in learning robust representations and informative signals on sparse data.

  2. The performance improvements vary across datasets and evaluation metrics. Across datasets, the differing latent properties of the datasets lead to different improvements. Across metrics, the improvements generally shrink as K increases, because the items a user is interested in are more likely to appear in the recommended list when K is large. In the extreme case where K equals the number of items, every test item appears in the list, so Recall@K equals 1 for all methods.

  3. Signal-level contrastive learning helps learn better representations. Compared with the node-level contrastive learning-based graph collaborative filtering methods (i.e., SGL and SimGCL), SC-GCF consistently achieves better performance, which verifies the effectiveness of signal-level contrastive learning.

Table 4 Comparison with baseline methods on MovieLens-1M
Table 5 Comparison with baseline methods on Beauty

5.3 Ablation Study

To answer the second question, we conduct experiments to analyze the contributions of signal-level contrastive learning and hypergraph representation learning. The results are reported in Fig. 4, where “w/o S” and “w/o H” denote the variants obtained by removing signal-level contrastive learning and hypergraph representation learning, respectively. Both variants perform worse than SC-GCF, reflecting the removal of the signal-level contrastive learning module and the hypergraph representation learning module, respectively. Compared with LightGCN, each introduced module brings significant performance improvements. In conclusion, the experimental results demonstrate that both signal-level contrastive learning and hypergraph representation learning contribute to the performance of SC-GCF.

Fig. 4

Ablation study on MovieLens-1M in terms of Recall@20

5.4 Parameter Analysis

To answer the third question, in this section, we conduct experiments to analyze how these key parameters (e.g., embedding dimensionality d and temperature \(\tau\)) affect the performance of SC-GCF.

Fig. 5

Parameter analysis: The Recall@20 values obtained by SC-GCF as a function of the six hyper-parameters on MovieLens-1M

The Recall@20 results on MovieLens-1M with respect to the dimensionality d, temperature \(\tau\), weight \(\lambda _1\), weight \(\lambda _2\), layer number L, and weight \(\alpha\) are shown in the six subfigures of Fig. 5, respectively. Performance keeps improving as d increases, since larger embeddings give the model greater expressive ability. For the temperature \(\tau\), the best performance is achieved at \(\tau = 0.05\). A smaller temperature makes the InfoNCE loss pay more attention to harder samples, and the experiments indicate that this focus on harder samples improves performance. For \(\lambda _1\), performance first improves as \(\lambda _1\) grows, peaks at \(\lambda _1 = 100\), and then deteriorates as \(\lambda _1\) increases further. \(\lambda _1\) controls the importance of the contrastive learning loss: a larger \(\lambda _1\) makes the model focus too much on the contrastive learning task instead of the collaborative filtering task, while a very small \(\lambda _1\) effectively removes the contrastive learning task and reduces the model to a pure GNN-based method. Similarly, SC-GCF achieves its best performance on MovieLens-1M when \(\lambda _2\) is \(10^{-6}\), indicating a proper weight for the signal-level contrastive loss. Performance keeps declining as the number of layers L increases, due to the over-smoothing problem when multiple layers are stacked. \(\alpha\) is the weight of the global dependency representation. When \(\alpha\) is large, the representation pays too much attention to global dependencies and ignores local relations. When \(\alpha\) is 0, SC-GCF degrades to the variant without the hypergraph representation learning module.
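The effect of \(\tau\) can be checked numerically. In InfoNCE, the gradient with respect to each negative's similarity is proportional to that negative's softmax weight, so the sketch below shows how the relative weight on a hard negative (one with high similarity to the anchor) grows as \(\tau\) shrinks. The similarity values and the function name are hypothetical.

```python
import math


def softmax(scores, tau):
    """Softmax of similarity scores divided by the temperature tau."""
    scaled = [s / tau for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarities of one anchor to three negatives: one hard, two easy.
negatives = [0.8, 0.1, 0.1]
taus = [1.0, 0.2, 0.05]
hard_share = [softmax(negatives, tau)[0] for tau in taus]
for tau, share in zip(taus, hard_share):
    print(f"tau={tau}: weight on the hard negative = {share:.3f}")
```

At \(\tau = 1\) the hard negative receives about half of the weight, while at \(\tau = 0.05\) it receives nearly all of it.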

6 Conclusion

In this paper, we focus on graph signals, which are ignored by previous contrastive learning-based graph collaborative filtering methods. We have proposed a novel model termed SC-GCF, which introduces signal-level contrastive learning into the collaborative filtering task. For representation learning in the collaborative filtering task, we use GNNs to model the local relations between users and items. In addition, we learn the global dependency among users/items using a dynamic hypergraph structure learner. Experiments are conducted on four public datasets, and the results show the effectiveness and superiority of SC-GCF over the baseline methods.