Joint data augmentations for automated graph contrastive learning and forecasting

Liu, Jiaqi; Chen, Yifu; Ren, Qianqian; Gao, Yang

doi:10.1007/s40747-024-01491-3

Complex & Intelligent Systems

Original Article, Open access

Published: 15 June 2024

Volume 10, pages 6481–6490 (2024)

Abstract

Graph augmentation plays a crucial role in graph contrastive learning. However, existing methods primarily optimize augmentations for particular datasets, which limits their robustness and generalization. To overcome these limitations, many studies have explored automated graph data augmentation, but these approaches still face challenges due to weak labels and data incompleteness. To tackle these challenges, we propose a framework called Joint Data Augmentations for Automated Graph Contrastive Learning (JDAGCL). The proposed model first integrates two augmenters: a feature-level augmenter and an edge-level augmenter. The two augmenters learn whether to drop a node or an edge, yielding optimized graph structures and enriching the information available for modeling and forecasting tasks. Moreover, we introduce a two-stage training strategy that further processes the features extracted by the encoder and makes them more effective for the downstream forecasting task. Experimental results demonstrate that JDAGCL achieves state-of-the-art performance compared with the latest baseline methods, with an average improvement of 14% in forecasting accuracy across multiple benchmark datasets.


Introduction

Graph representation learning encodes graph data into low-dimensional vectors for various downstream tasks, including time series representation, forecasting, and traffic prediction. One popular approach in this field is graph contrastive learning (GCL), which has attracted much attention for its unsupervised learning capability, robustness, and generalization. To further enhance the performance of GCL models, graph augmentation techniques are commonly used. These techniques introduce noise and provide additional training samples, thereby improving the model’s robustness against noisy or incomplete data. Designing effective augmentation schemes for graphs is challenging, as graphs often represent diverse attributes of the original data [1]. GCA (Graph Contrastive learning with Adaptive augmentation) [2] addresses this challenge by incorporating node and edge centrality, which reflects the semantic influence of nodes and edges. However, centrality measures rely on prior domain knowledge: they are task-specific and cannot be learned from data alone.

With advances in contrastive learning, various contrastive learning-based models have been developed. ST-TSNet [3] introduces stochastic augmentation to capture local and spatial dependencies and explore deep spatio-temporal representations. SSGNN [4] uses both original and mask-augmented data, combining self-distillation with self-supervised learning tasks to better capture spatio-temporal features in graph data. JOAO [5] optimizes which combination of augmentations to use rather than the augmentations themselves; by jointly optimizing the augmentation strategy, it improves the performance of contrastive learning models. AutoSSL [6] optimizes the weights of combined self-supervised tasks, automatically selecting the most effective task combination and thereby improving overall model performance.

In this paper, we present JDAGCL (Joint Data Augmentations for Automated Graph Contrastive Learning), a graph contrastive learning framework. Our work makes the following key contributions:

We introduce two augmenters that generate appropriate views of the original graph, taking into account both the topology and features of the graph. These augmenters enhance the learning process by providing diverse perspectives and enabling the encoder to capture essential information from different aspects of the graph.
We introduce a novel joint training strategy that preserves semantic consistency and improves forecasting performance.
We conduct comprehensive experiments on four real-world datasets to evaluate the effectiveness of our approach. The experimental results show that our method outperforms existing techniques and yields superior forecasting results.

Related work

Graph contrastive learning

Contrastive learning has gained significant attention in various domains in recent years [7, 8]. In image-based contrastive learning, effective contrastive samples can be obtained by choosing augmentations that preserve the intrinsic semantics of the images, drawing on human understanding of image semantics. Graph data, however, is more abstract and complex than images, making it hard to ensure that operations on a graph do not disrupt its fundamental semantics [9]. Consequently, selecting appropriate contrastive samples is a significant challenge in graph contrastive learning. To address it, existing research has explored using different aspects of the graph as contrastive samples. For example, the work in [10] introduces a triplet loss to train a biased encoder that prioritizes easy negative samples, focusing the learning process on informative contrastive samples. HSAN (Hard Sample Aware Network) [11] combines attribute embeddings and structure embeddings to compute the similarity between samples. This gives a more comprehensive view of the relationships between samples, which makes it easier to measure sample hardness and select more challenging contrastive samples.

Data augmentation on graph

Data augmentation plays an important role in graph contrastive learning, as it increases the diversity of training data by applying various transformations to graph data. This helps the model learn more robust graph representations and handle variations in input data effectively [9, 12]. Many augmentation techniques have been explored for graph contrastive learning, including node dropping [13, 14], edge perturbation [15], and subgraph extraction [16,17,18]. Each technique applies specific changes to the original graph data to benefit the subsequent learning procedure. GraphCL [1] systematically investigates combinations of data augmentations to identify the best augmentation strategy for specific downstream tasks. SSGNN [4] incorporates both original and mask-augmented data into the model and adopts self-distillation and self-supervised learning tasks to enhance it. AD-GCL [19] employs an automated, learnable edge-dropping augmentation to improve model performance. However, choosing an augmentation rate that guarantees a lower bound on the preserved semantics remains a challenge.

Performance analysis for downstream task

Several existing neural network methods address downstream tasks such as forecasting and classification. MRENN [20] updates model parameters using the Lyapunov stability criterion and establishes a recursive learning rate mechanism to speed up learning. The work in [21] proposes a taxonomy of graph data augmentation techniques and offers a structured review that categorizes related work by the augmented information modality. Hybrid-GTS [22] integrates time series and geographic factors to enhance prediction accuracy: it constructs a geometric graph from node locations and uses a probabilistic graph structure learned from node embeddings to capture nonlinear temporal dynamics. The work in [23] proposes a diagonal recurrent neural network for adaptive control of nonlinear dynamic systems. Gash-LKH [24] proposes a graph-based learning framework that combines a sparse graph neural network with the Lin-Kernighan heuristic solver.

Overall, although significant progress has been made, existing methods still have limitations, including the reliance of data augmentation methods on domain knowledge and the influence of data incompleteness. These limitations motivate the JDAGCL framework. By introducing a joint training strategy and automated graph contrastive learning, we aim to overcome these limitations and provide a more flexible, adaptive, and better-performing solution.

Methodology

Figure 1 illustrates the overall architecture of the model. We describe its three major components: learnable data augmentation, the joint training strategy, and the encoder.

Problem definition

This work mainly addresses forecasting tasks based on our augmentation and contrastive learning. Formally, an unlabeled attributed graph is defined as $G=(V,E,A)$, where $V$ is the node set, $E$ is the edge set, and $A\in \mathbb{R}^{N\times N}$ is the adjacency matrix of the graph. The entry $A_{ij}=1$ if $(v_i,v_j)\in E$, indicating a correlation between nodes $v_i$ and $v_j$; otherwise $A_{ij}=0$. Given the time series data $X_{1:T} = \left( X_{1},X_{2},\dots , X_{T} \right) \in \mathbb{R}^{T\times N\times C}$, where $T$ is the number of time steps, $N$ is the number of nodes, and $C$ is the number of features, the forecasting problem is to predict the data at the next time step, $X_{T+1} \in \mathbb{R}^{N\times C}$, from the historical observations.

Learnable data augmentation

Our automated augmentation strategy consists of two approaches: node-level augmentation and edge-level augmentation. These two approaches target the two core components of the original graph, nodes and edges, and extract appropriate views that help the encoder capture the semantic commonalities between node views and topological views. Combining node-level and edge-level augmentation yields a comprehensive augmentation scheme that captures semantic features from both the attribute and the topological perspective. Figure 1 illustrates the two augmentation mechanisms.

Node-level augmentation

Node-level augmentation introduces diversity by perturbing a subset of nodes at a given time step. Specifically, we randomly select a subset of nodes at a particular time step and perturb their corresponding embeddings. The choice of which nodes to perturb largely determines the diversity introduced into the data. We use the aggregation weight as the selection criterion, since it reflects the importance or influence of a node in the overall patterns. The masking probability $\rho_{t,i}$ is determined from the aggregation weight $\lambda_{t,i}$ via a Bernoulli distribution, $\rho_{t,i} \sim \mathrm{Bern}\left( 1-\lambda_{t,i} \right)$, where $\rho_{t,i}$ indicates the likelihood of masking the feature $X_{t,i}$ of node i at time step t. However, masking directly at random according to this probability can unintentionally mask crucial data.

To prevent important information from being masked, we design a mean-based filtering strategy that ensures high-importance data is not masked while increasing the probability of masking low-importance data. The strategy computes the arithmetic mean of the masking probabilities $\rho_{t,i}$ over all nodes at time step t. We denote this mean by $\rho_1$, which serves as the masking threshold:

$$\begin{aligned} \rho _1 = \frac{1}{N} \sum _{i=1}^{N} \rho _{t,i} \end{aligned}$$

(1)

Given this threshold, we determine whether to mask the data of node i at time step t. If $\rho_{t,i}$ is greater than $\rho_1$, a random masking method is applied to the data of that node. If $\rho_{t,i}$ is less than $\rho_1$, the masking probability for node i is set to 0, so its data are never masked. We use $G_{NM}$ to denote the generated view and $E_{NM}$ its corresponding embedding.
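The node-level masking rule can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: we read $\rho_{t,i}$ as the Bernoulli parameter $1-\lambda_{t,i}$ (rather than a 0/1 sample), treat "masking" as zeroing a node's feature vector, and the function name `node_level_mask` is hypothetical.

```python
import random


def node_level_mask(features, agg_weights, rng=None):
    """Build the node-masked view G_NM for one time step t (illustrative sketch).

    features:    list of N feature vectors X_{t,i} (lists of floats)
    agg_weights: list of N aggregation weights lambda_{t,i} in [0, 1]
    Assumption: rho_{t,i} is read as the Bernoulli parameter 1 - lambda_{t,i},
    and "masking" means zeroing the node's feature vector.
    """
    rng = rng or random.Random(0)
    rho = [1.0 - lam for lam in agg_weights]   # masking probabilities rho_{t,i}
    rho_1 = sum(rho) / len(rho)                # Eq. (1): mean threshold over the N nodes
    masked_features, mask = [], []
    for x, p in zip(features, rho):
        # Nodes at or below the threshold get masking probability 0 (always kept);
        # nodes above it are masked at random with probability p.
        drop = p > rho_1 and rng.random() < p
        mask.append(drop)
        masked_features.append([0.0] * len(x) if drop else list(x))
    return masked_features, mask
```

Because $\rho_1$ is recomputed at every time step, the set of maskable nodes adapts to that step's weight distribution; nodes with a large aggregation weight (small $\rho_{t,i}$) are never masked.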

Edge-level augmentation

Edge-level augmentation operates on the fine-grained structure of the graph. It consists of two operations: adding edges between non-adjacent nodes and deleting edges between adjacent nodes.

Adding edges. Adding edges relies on the correlation between pairs of non-adjacent nodes in the graph. This correlation is represented by $\rho_{i,j}$, drawn from a Bernoulli distribution $\rho_{i,j} \sim \mathrm{Bern}(\psi_{i,j})$, where $\psi_{i,j}$ is the correlation between nodes i and j. A higher value of $\rho_{i,j}$ indicates a stronger correlation between the non-adjacent nodes i and j. To add edges between highly correlated nodes while avoiding edges between weakly correlated ones, we again apply a filtering strategy. Its threshold $\rho_2$ is the arithmetic mean of $\rho_{i,j}$ over all non-adjacent node pairs in the graph:

$$\begin{aligned} \rho_2 = \frac{1}{a} \sum_{e_{ij}\notin E} \rho_{i,j} \end{aligned}$$

(2)

where a is the number of non-adjacent node pairs. Using this threshold, we decide whether to add an edge between a pair of non-adjacent nodes i and j. When $\rho_{i,j} > \rho_2$, the random addition strategy is applied to the pair, which may add an edge between nodes i and j; otherwise, no edge is added.

This filtering strategy prioritizes adding edges between highly correlated non-adjacent nodes while preventing edges between weakly correlated ones. We use $G_{EA}$ to denote the generated view and $E_{EA}$ its corresponding embedding.
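A minimal sketch of the edge-addition filter follows. It assumes that $\rho_{i,j}$ is used as the Bernoulli parameter $\psi_{i,j}$ and that the correlations are supplied as input; how $\psi_{i,j}$ itself is learned is not modeled here, and `add_edges` is a hypothetical name.

```python
import itertools
import random


def add_edges(n_nodes, edges, psi, rng=None):
    """Build the edge-added view G_EA (illustrative sketch).

    edges: set of undirected pairs (i, j) with i < j that are in E
    psi:   dict mapping each non-adjacent pair (i, j), i < j, to a correlation in [0, 1]
    Assumption: rho_{i,j} is read as the Bernoulli parameter psi_{i,j}.
    """
    rng = rng or random.Random(0)
    candidates = [p for p in itertools.combinations(range(n_nodes), 2) if p not in edges]
    if not candidates:                      # complete graph: nothing to add
        return set(edges)
    rho_2 = sum(psi[p] for p in candidates) / len(candidates)   # Eq. (2), a = len(candidates)
    new_edges = set(edges)
    for p in candidates:
        # Only pairs whose correlation exceeds the mean may receive a new edge.
        if psi[p] > rho_2 and rng.random() < psi[p]:
            new_edges.add(p)
    return new_edges
```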

Deleting edges. For adjacent nodes i and j, the probability of deleting the edge between them is obtained from a Bernoulli distribution, $\rho_{i,j} \sim \mathrm{Bern}(1-\psi_{i,j})$, where $\psi_{i,j}$ represents the correlation between the adjacent nodes i and j. A higher value of $\rho_{i,j}$ indicates a lower correlation between the two nodes and hence a higher likelihood of deleting the edge. However, purely random deletion can also remove edges whose $\rho_{i,j}$ is low, that is, edges between highly correlated nodes, which harms the data. To address this issue, a filtering strategy increases the probability of deleting edges between weakly correlated adjacent nodes while ensuring that edges between highly correlated adjacent nodes are not deleted.

The filtering strategy computes the arithmetic mean of $\rho_{i,j}$ over all adjacent node pairs in the graph. We denote this mean by $\rho_3$, which serves as the threshold for distinguishing weakly from strongly correlated adjacent nodes:

$$\begin{aligned} \rho_3 = \frac{1}{b} \sum_{e_{ij}\in E} \rho_{i,j} \end{aligned}$$

(3)

where b is the number of adjacent node pairs, so that $a + b = \frac{N(N-1)}{2}$. Using the threshold $\rho_3$, we decide whether to delete the edge between adjacent nodes i and j. When $\rho_{i,j} > \rho_3$, the deletion probability is left unchanged and the random deletion strategy is applied; otherwise, the probability of deleting $e_{ij}$ is set to 0, so the edge is kept. We use $G_{ED}$ to denote the generated view and $E_{ED}$ its corresponding embedding.
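The edge-deletion filter mirrors the addition case. The sketch below assumes the deletion probability is the Bernoulli parameter $1-\psi_{i,j}$ and that correlations are supplied as input; the function name `delete_edges` is hypothetical.

```python
import random


def delete_edges(edges, psi, rng=None):
    """Build the edge-deleted view G_ED (illustrative sketch).

    edges: set of undirected pairs (i, j) in E
    psi:   dict mapping each edge to its correlation psi_{i,j} in [0, 1]
    Assumption: rho_{i,j} is read as the Bernoulli parameter 1 - psi_{i,j}.
    """
    rng = rng or random.Random(0)
    if not edges:
        return set()
    rho = {e: 1.0 - psi[e] for e in edges}    # deletion probabilities
    rho_3 = sum(rho.values()) / len(rho)       # Eq. (3), b = len(edges)
    kept = set()
    for e, p in rho.items():
        # Weakly correlated edges (p above the mean) may be dropped at random;
        # every other edge has deletion probability 0 and is kept.
        if p > rho_3 and rng.random() < p:
            continue
        kept.add(e)
    return kept
```

As a sanity check on the counts, a graph with N = 4 nodes and b = 2 edges leaves a = 4 non-adjacent pairs, matching $a + b = N(N-1)/2 = 6$.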

Joint training strategy

In this section, we introduce our joint training strategy. The procedure is conducted in two stages, as shown in Fig. 2: Stage 1 operates along the temporal dimension and Stage 2 along the spatial dimension.

Stage 1. This stage aims to strengthen the ability of the embeddings to represent semantic contextual information. It consists of three steps. First, the original data and augmented data are fused by element-wise multiplication with learnable weights to form node embeddings. For each node i at time step t, the fusion is:

$$\begin{aligned} m_{t,i}= e_{t,i} \odot W_{1} + \tilde{e} _{t,i} \odot W_{2} \end{aligned}$$

(4)

where $e_{t,i}$ is the embedding of the original data for node i at time step t, $\tilde{e}_{t,i}$ is the embedding of the augmented data for node i at time step t, and $W_{1}$ and $W_{2}$ are learnable parameters.

Then, the node embeddings $m_{t,i}$ obtained in the previous step are aggregated to generate a global representation $M_{t}$ for time step t. The aggregation is performed by taking the average of the node embeddings across N nodes:

$$\begin{aligned} M_{t} = \frac{1}{N} \sum _{i=1}^{N} m_{t,i} \end{aligned}$$

(5)

Finally, the node embedding $m_{t,i}$ and the global representation $M_{t}$ are treated as a positive pair, as they capture the trend of sequence changes at the current time step. The node embedding $m_{t,i}$ and the embedding $m_{t',i}$ of the same node at another time step $t'$ are treated as a negative pair, as they capture semantic contextual information from different time steps. The model is optimized with the cross-entropy loss:

$$\begin{aligned} \mathcal{L}_{t}=-\left( \sum_{i=1}^{N} \log\left( m_{t,i},M_{t} \right) + \sum_{i=1}^{N} \log\left( 1 - \left( m_{t,i},m_{t',i} \right) \right) \right) \end{aligned}$$

(6)

where t and $t'$ represent two different time steps.
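Eqs. (4) to (6) can be combined into one small function. Equation (6) does not spell out how a pair of embeddings is scored, so this sketch assumes the common choice of a sigmoid applied to their dot product; it also assumes $W_1$ and $W_2$ are D-dimensional weight vectors applied element-wise. All helper names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse(e, e_aug, w1, w2):
    # Eq. (4): m_{t,i} = e_{t,i} ⊙ W1 + ẽ_{t,i} ⊙ W2, with W1, W2 as D-dim weight vectors.
    return [x * a + y * b for x, y, a, b in zip(e, e_aug, w1, w2)]


def stage1_loss(E_t, E_t_aug, E_tp, E_tp_aug, w1, w2):
    """Temporal contrastive loss of Eq. (6) for one pair of time steps (t, t').

    E_t, E_t_aug:   N original / augmented node embeddings at step t
    E_tp, E_tp_aug: the same at a different step t'
    Assumption: the pair score (a, b) in Eq. (6) is sigmoid(a · b).
    """
    m_t = [fuse(e, ea, w1, w2) for e, ea in zip(E_t, E_t_aug)]
    m_tp = [fuse(e, ea, w1, w2) for e, ea in zip(E_tp, E_tp_aug)]
    n = len(m_t)
    M_t = [sum(col) / n for col in zip(*m_t)]   # Eq. (5): mean readout over the N nodes
    loss = 0.0
    for m_i, m_neg in zip(m_t, m_tp):
        pos = sigmoid(dot(m_i, M_t))    # positive pair (m_{t,i}, M_t)
        neg = sigmoid(dot(m_i, m_neg))  # negative pair (m_{t,i}, m_{t',i})
        loss -= math.log(max(pos, EPS)) + math.log(max(1.0 - neg, EPS))
    return loss
```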

Stage 2. This stage captures semantic contextual information in the spatial dimension. First, we fuse spatial semantic contextual information, represented by Q categories, with the augmented node embeddings: $u_{i,q}= r_{q}^{T} \tilde{e}_{i}$, where $r_{q}$ is the embedding of the q-th category and $u_{i,q}$ is the strength of relevance. The category score vector for node i is then $u_{i}=\left( u_{i,1},u_{i,2},\dots,u_{i,Q}\right)^{T}$. Next, we compute the category representation of the original node embedding for the self-supervised task, $v_{i,q}= r_{q}^{T} e_{i}$. Finally, the loss function of the self-supervised task is:

$$\begin{aligned} \mathcal{L}_{s} = -\sum_{i=1}^{N} \sum_{q=1}^{Q} u_{i,q} \log\frac{\exp\left( v_{i,q}/\theta \right)}{ \sum_{q'=1}^{Q} \exp\left( v_{i,q'}/ \theta \right)} \end{aligned}$$

(7)

where $\theta $ represents the temperature parameter. The overall optimization objective for the semantic contextual self-supervised learning task is as follows:

$$\begin{aligned} \mathcal {L} _{st}=\mathcal {L} _{t}\cdot W_{3} +\mathcal {L} _{s}\cdot W_{4} \end{aligned}$$

(8)

where $W_{3}$ and $W_{4}$ represent learnable parameters. Through these two self-supervised tasks, we can effectively assist the model in capturing semantic contextual information in data.
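Eqs. (7) and (8) can be sketched as follows. The temperature default and the helper names are placeholders, the log-softmax uses the usual max-subtraction trick for numerical stability, and $u_{i,q}$ is used unnormalized as the weight, exactly as written in Eq. (7).

```python
import math


def stage2_loss(E, E_aug, R, theta=1.0):
    """Spatial self-supervised loss of Eq. (7) (illustrative sketch).

    E, E_aug: N original / augmented node embeddings (lists of floats)
    R:        Q category embeddings r_q
    theta:    temperature (the default here is an arbitrary placeholder)
    """
    loss = 0.0
    for e, e_aug in zip(E, E_aug):
        u = [sum(a * b for a, b in zip(r, e_aug)) for r in R]        # u_{i,q} = r_q^T ẽ_i
        v = [sum(a * b for a, b in zip(r, e)) / theta for r in R]    # v_{i,q} / θ
        v_max = max(v)                                                # stable log-softmax
        log_z = v_max + math.log(sum(math.exp(x - v_max) for x in v))
        loss -= sum(u_q * (v_q - log_z) for u_q, v_q in zip(u, v))
    return loss


def combined_loss(loss_t, loss_s, w3, w4):
    # Eq. (8): L_st = L_t · W3 + L_s · W4
    return loss_t * w3 + loss_s * w4
```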

Encoder

In this section, we design the encoder to model data correlations, taking traffic forecasting as an example downstream task. The encoder consists of two parallel channels, each combining gated-TCN and multi-scale spatial-convolutional layers. Both channels adopt a "sandwich" structure: the first and third layers of each channel are gated-TCN layers that capture temporal dependencies, and the second layer is a multi-scale spatial-convolutional layer that captures spatial dependencies.

Case 1: Learning Temporal Semantics. To capture temporal correlations, the original traffic tensor and the augmented traffic tensor are fed separately into two parallel gated-TCN layers, which produce temporally aware embedding matrices. The outputs of the gated-TCN layers are denoted $Y_t \in \mathbb{R}^{N \times D}$, the embedding matrix at time step t. The procedure is formulated as:

$$\begin{aligned} \left( Y_{1} ,Y_{2},\dots ,Y_{T} \right) = \mathrm{GatedTCN}\left( X_{1} ,X_{2},\dots ,X_{T} \right) \end{aligned}$$

(9)
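The paper does not give the gated-TCN equations. A common formulation (as in WaveNet-style models) multiplies a tanh filter branch by a sigmoid gate branch; the single-channel sketch below assumes that form, uses causal zero-padding so the output keeps all T steps as in Eq. (9), and takes the kernel size 3 from the implementation details. Function names are hypothetical.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def causal_conv1d(seq, kernel):
    """1-D causal convolution: left-pad with zeros so the output keeps length T.
    kernel[-1] weights the current step, kernel[0] the oldest step in the window."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(seq))]


def gated_tcn(seq, k_filter, k_gate):
    """Gated temporal convolution for one node's single-channel series:
    Y = tanh(filter * X) ⊙ sigmoid(gate * X)."""
    f = causal_conv1d(seq, k_filter)
    g = causal_conv1d(seq, k_gate)
    return [math.tanh(a) * _sigmoid(b) for a, b in zip(f, g)]
```

A real layer would apply learnable multi-channel kernels to every node in parallel; the gate lets the network suppress or pass each time step's filtered signal.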

Case 2: Learning Spatial Semantics. Two multi-scale convolutional layers, $MultiSC_1$ and $MultiSC_2$, are used in the two channels of the module to capture spatial correlations in traffic data. These layers take the embedding matrices $Y_t$ from the gated-TCN layers together with the adjacency matrix A of the region graph. The procedure is formulated as:

$$\begin{aligned} \begin{matrix}\left( Z_{1}^1 ,Z_{2}^1,\dots ,Z_{T}^1 \right) = MultiSC_1\left( Y_{1} ,Y_{2},\dots ,Y_{T} ,A \right) \\ \left( Z_{1}^2 ,Z_{2}^2,\dots ,Z_{T}^2 \right) = MultiSC_2\left( Y_{1} ,Y_{2},\dots ,Y_{T} ,A \right) \end{matrix} \end{aligned}$$

(10)

where A is the adjacency matrix of the region graph. The multi-scale spatial-convolutional layers have different receptive fields so that local and global spatio-temporal features are captured simultaneously. The outputs $\left( Z_{1}^1, Z_{2}^1,\dots ,Z_{T}^1 \right)$ and $\left( Z_{1}^2,Z_{2}^2,\dots ,Z_{T}^2 \right)$ are then fed into the gated-TCN layers of their respective channels. Finally, a dropout operation produces the final embedding matrix $E=\{\mathbf{e}_1,\mathbf{e}_2,\dots ,\mathbf{e}_N\}$, which is used for the prediction task and the self-supervised learning tasks.
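The multi-scale spatial layer is not specified in detail. One plausible reading, assumed here, is that each "kernel size" k is a number of propagation hops over the self-loop-augmented, row-normalized adjacency matrix, and that the per-scale outputs are averaged. The sketch omits learnable weights and handles one scalar feature per node; it illustrates how scales such as (1, 2, 3) widen the receptive field, not the authors' layer, and all names are hypothetical.

```python
def row_normalize(A):
    """Add self-loops to adjacency A (list of lists) and normalize each row to sum to 1."""
    P = []
    for i, row in enumerate(A):
        row = [a + (1.0 if i == j else 0.0) for j, a in enumerate(row)]
        s = sum(row)
        P.append([a / s for a in row])
    return P


def propagate(P, y):
    """One hop of neighborhood averaging: (P y)_i = sum_j P_ij y_j."""
    return [sum(p * v for p, v in zip(row, y)) for row in P]


def multi_scale_sc(y, A, scales=(1, 2, 3)):
    """Average of k-hop propagated node signals, one term per scale k (no learnable weights)."""
    P = row_normalize(A)
    outputs = []
    for k in scales:
        h = list(y)
        for _ in range(k):
            h = propagate(P, h)
        outputs.append(h)
    return [sum(vals) / len(vals) for vals in zip(*outputs)]
```

Under this reading, `scales=(1, 2, 3)` and `scales=(3, 4, 5)` differ only in how far information spreads across the graph before the outputs are combined.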

Loss function definition

After the above process, we feed all the original node embeddings $e_{i}$ into FC layers to predict the future value at time step $T+1$:

$$\begin{aligned} X_{T+1,i}= \mathrm{FC\,Layers}\left( \mathbf{e}_{i} \right) \end{aligned}$$

(11)

where FC layers consist of two fully connected layers. The overall model loss function is as follows:

$$\begin{aligned} \mathcal {L}_{all} = \sum _{i=1}^{N} \left| X_{T+1,i} - \hat{X} _{T+1,i} \right| \cdot W_{5} +\mathcal {L}_{st} \end{aligned}$$

(12)

where $X_{T+1,i}$ is the predicted result, $\hat{X} _{T+1,i}$ is the ground truth, and $W_{5}$ is a learnable parameter.
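The prediction head of Eq. (11) and the objective of Eq. (12) can be sketched as below. The activation between the two fully connected layers is not stated, so ReLU is an assumption; weights are passed in as plain nested lists, the absolute error is summed over both nodes and features, and `loss_st` stands for an already computed $\mathcal{L}_{st}$ from Eq. (8). Function names are hypothetical.

```python
def linear(x, W, b):
    """Fully connected layer: W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def fc_layers(e, W_hidden, b_hidden, W_out, b_out):
    """Eq. (11): two fully connected layers mapping a node embedding e_i to X_{T+1,i}."""
    hidden = [max(0.0, z) for z in linear(e, W_hidden, b_hidden)]   # ReLU (assumed)
    return linear(hidden, W_out, b_out)


def total_loss(preds, targets, w5, loss_st):
    """Eq. (12): W5 * sum_i |X_{T+1,i} - X̂_{T+1,i}| + L_st."""
    abs_err = sum(abs(p - t) for pv, tv in zip(preds, targets) for p, t in zip(pv, tv))
    return abs_err * w5 + loss_st
```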

Experiment

In this section, we conduct comprehensive experiments on four real-world datasets to evaluate the performance of our proposed model.

Experimental settings

Implementation details

All experiments are performed on a single NVIDIA RTX 3060 GPU. We implemented the proposed network in the PyTorch framework. The maximum number of training epochs is set to 100, with early stopping. The temporal convolution kernel size of the encoder is set to 3. The multi-scale spatial convolution kernels of the encoder are set to 1, 2, and 3 in the first layer and to 3, 4, and 5 in the second layer. The Adam optimizer is used during training with a batch size of 32. The embedding dimension D is set to 64. The perturbation ratio for both node-level and edge-level data augmentation is set to 0.1.
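For reproducibility, the reported settings can be collected in one configuration object. The class and field names are hypothetical; only the values come from the implementation details, and settings the paper does not report (for example the learning rate or the early-stopping patience) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JDAGCLConfig:
    """Hyperparameters reported in the implementation details (names are hypothetical)."""
    max_epochs: int = 100                       # upper bound; early stopping may end training sooner
    early_stopping: bool = True
    temporal_kernel_size: int = 3               # gated-TCN kernel
    spatial_kernels_layer1: tuple = (1, 2, 3)   # multi-scale spatial kernels, first layer
    spatial_kernels_layer2: tuple = (3, 4, 5)   # multi-scale spatial kernels, second layer
    optimizer: str = "Adam"
    batch_size: int = 32
    embedding_dim: int = 64                     # D
    perturbation_ratio: float = 0.1             # node- and edge-level augmentation
```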

Table 1 Description of datasets

Table 2 Comparison of experimental results of different approaches

Evaluation metrics and baselines

To compare these methods quantitatively, we use two metrics common in traffic forecasting: mean absolute error (MAE) and mean absolute percentage error (MAPE).
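For reference, the two metrics can be computed as follows. The handling of zero ground-truth values in MAPE is an assumption (many traffic benchmarks mask them), since the paper does not state its convention.

```python
def mae(preds, targets):
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def mape(preds, targets):
    """Mean absolute percentage error, in percent.
    Assumption: entries with a zero ground truth are skipped to avoid division by zero."""
    pairs = [(p, t) for p, t in zip(preds, targets) if t != 0]
    if not pairs:
        return float("nan")
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in pairs) / len(pairs)
```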

We compare JDAGCL with the following baseline models: ARIMA [25], SVR [26], GMAN [27], STFGNN [28], STGCL [29], COST [30], CIGA [31], STNSCM [32], CauST [33], and STSSL [34].

Datasets

In the experiments, we use traffic data to evaluate our proposed model on the traffic forecasting task. The datasets are BJTaxi, NYCBike1, NYCBike2, and NYCTaxi. These datasets were selected based on several considerations that make them suitable for evaluating the effectiveness of our framework. Details of the datasets are given in Table 1.

Main results

In this section, we compare the experimental results of JDAGCL with those of the baseline methods; Table 2 displays the detailed results. Overall, JDAGCL achieves the lowest MAE among all methods on the four real-world datasets, with an average MAE improvement of 14% over the other models. We attribute this gain to learning and optimizing the correlations between the time series in the dataset. On the NYCTaxi dataset, JDAGCL does not achieve the best MAPE, possibly because of the limited size of the training samples, although its MAE still outperforms the other baselines.

In-depth analysis of joint training strategy

We validate and analyze the effectiveness of the joint training strategy through the following comparative experiments.

JDAGCL/SS: remove the Stage 2 from the joint training procedure.
JDAGCL/ST: remove the Stage 1 from the joint training procedure.
JDAGCL: Stage 1 and Stage 2 are trained jointly.

Figure 3 visualizes the performance of the model variants on the NYCBike1 and NYCBike2 datasets. Among all components, data augmentation has the largest impact on predictive performance. Without node-level data augmentation, the MAE for inflow on NYCBike1 increases from 4.89 to 5.01, and the MAE for outflow increases from 5.18 to 5.31. Without edge-level data augmentation, the MAE for inflow on NYCBike2 increases from 4.97 to 5.08, and the MAE for outflow increases from 4.63 to 4.75. Since removing either augmenter raises the MAE by 0.11 to 0.13, we conclude that data augmentation consistently improves forecasting results. The figure also shows that the GCN has the second-largest impact on prediction results, suggesting that graph convolution at different scales helps the model capture spatial features. Finally, both stages of the learning task contribute to improving prediction accuracy.

Furthermore, we randomly select a subset of the experimental data from the NYCBike1 and NYCBike2 datasets to visualize the prediction errors. Figure 4 shows the actual and predicted traffic flow values over 100 time steps. Our model tracks the sudden increases in traffic flow, suggesting that JDAGCL captures the trends in traffic features well.

The effectiveness of augmentation

To further analyze the effect of our proposed augmentation strategy, we conduct an ablation study with the following three variants.

JDAGCL-ED: Only using the edge-level augmenter.
JDAGCL-FM: Only using the node-level augmenter.
JDAGCL-Both: Using both augmenters.

Figure 5 displays the performance of these variants on two datasets. Combining the two augmenters improves performance, likely because a single augmenter limits the GNN encoder’s capacity to learn graph semantic commonality from different perspectives.

Analysis of sampling strategy and parameter sensitivity

In this section, we analyze the influence of the sampling percentage (the data augmentation ratio) on model performance; Fig. 6 presents the results. MAE is lowest when the data augmentation ratio is set to 0.1. Since MAE is the primary metric, we use a ratio of 0.1 throughout all experiments. Between 0.1 and 0.7, both MAE and MAPE show an increasing trend, while both decrease when the ratio exceeds 0.7. These findings indicate that the data augmentation ratio has a significant impact on the model’s predictive performance and on its ability to capture spatio-temporal features.

Figure 7 illustrates the variation of the loss function over epochs for our model on the NYCBike1 and NYCBike2 datasets. The training loss exhibits a similar decreasing trend across different datasets, ultimately stabilizing, indicating the stability of our model.

Conclusion

In this paper, we present JDAGCL, a novel graph contrastive learning framework that enhances learning and forecasting performance through joint training. Our approach incorporates a carefully designed data augmentation strategy to improve both data quality and prediction accuracy. We introduce a two-stage training task that helps the model learn more robust features. Comprehensive evaluations on four real-world datasets demonstrate that our method outperforms state-of-the-art baselines, validating the efficacy of our framework for learning and forecasting on graph-based data.

While our proposed method demonstrates superior performance compared to state-of-the-art baselines on the evaluated datasets, it is important to acknowledge some limitations of our work. These limitations include: (1) Scalability: Although our framework performs well on the evaluated datasets, its scalability to larger and more complex graph structures remains an open question. (2) Hyperparameter Sensitivity: The performance of our framework is dependent on the selection of hyperparameters. The sensitivity of the model’s performance to these hyperparameters necessitates careful tuning and validation, which requires significant computational resources. (3) Applicability Assumptions: Our proposed framework assumes certain characteristics and properties of the input graph data, such as the availability of attribute information or the presence of meaningful graph structures. While these assumptions hold for many real-world scenarios, they may not be universally applicable to all types of graph data.

In our future research, we intend to extend the application of our proposed approach to different domains and datasets, such as energy and power. By exploring diverse datasets, we aim to evaluate the universality and generalizability of our model. Additionally, we plan to investigate the influence of external factors on the performance of our model, further enhancing its robustness and applicability.

Data availability

Data will be made available on request.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
Jiaqi Liu, Yifu Chen, Qianqian Ren & Yang Gao

Corresponding authors

Correspondence to Qianqian Ren or Yang Gao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, J., Chen, Y., Ren, Q. et al. Joint data augmentations for automated graph contrastive learning and forecasting. Complex Intell. Syst. 10, 6481–6490 (2024). https://doi.org/10.1007/s40747-024-01491-3

Received: 20 January 2024
Accepted: 01 May 2024
Published: 15 June 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s40747-024-01491-3
