Sparse self-attention guided generative adversarial networks for time-series generation

Remarkable progress has been achieved in generative modeling for time-series data, where the dominant models are generally generative adversarial networks (GANs) based on deep recurrent or convolutional neural networks. Most existing GANs for time-series generation focus on preserving correlations across time. Although these models may help in capturing long-term dependencies, their capacity to pay varying degrees of attention over different time steps is inadequate. In this paper, we propose SparseGAN, a novel sparse self-attention-based GAN that allows for attention-driven, long-memory modeling for regular and irregular time-series generation through a learned embedding space. In this way, it can yield a more informative representation for time-series generation while using the original data for supervision. We evaluate the effectiveness of the proposed model using synthetic and real-world datasets. The experimental findings indicate that forecasting models trained on SparseGAN-generated data perform comparably to forecasting models trained on real data for both regularly and irregularly sampled time series. Moreover, the results demonstrate that our proposed generative model is superior to the current state-of-the-art models for data augmentation in the low-resource regime and introduces a novel method for generating realistic synthetic time-series data by leveraging long-term structural and temporal information.


Introduction
Time-series generation is a crucial task in many notable disciplines, including finance [1], machine fault diagnosis [2], medicine [3], and music [4]. Significant progress in the area of time-series generation has been achieved with the introduction of generative adversarial networks (GANs) [5]. GANs have demonstrated their ability to generate realistic data and have made remarkable progress in various tasks, such as the generation of time-series [3,4,6,7], images [8,9], and videos [10]. In particular, a significant amount of work has utilized GANs based on Recurrent Neural Networks (RNNs) for time-series generation [3,4,6,7]. However, by carefully examining the samples generated by these models, we can observe that RNN-based GANs, such as LSTM GANs and gated recurrent GANs, cannot handle long sequences. Although RNN-based GANs can generate many realistic samples, training them remains difficult due to exploding and vanishing gradients and mode collapse, which limits their generation capability. In addition, the majority of time-series generation models assume that time-series data are collected at regularly spaced intervals of time, and such data are referred to as "regular time series". In real-world contexts, however, unevenly spaced time is a major issue for time-series data [6]. Several factors, such as technological flaws in sensing equipment and imprecision in the sensors, may lead to irregular sampling [11]. Accordingly, since RNN-based GANs are typically designed for regular time-series data, they cannot properly maintain informative varying intervals, which is a major concern for generating time-series data.
Although there is no single solution to these drawbacks, GANs combined with self-attention mechanisms make a convincing case. Self-attention mechanisms [12] allow faster and higher-quality learning by paying more attention to the most important parts of a sequence at relatively low computational cost. The incorporation of the self-attention module has proved successful in many deep learning-based applications, such as text translation [13], speech recognition [14], and image generation [9]. The softmax transformation is commonly used in self-attention to construct the attention distribution that reflects the relative significance of all positions in a sequence of inputs [15]. However, the memory and computational complexity of self-attention grow quadratically with sequence length.
To address the aforementioned issues, faster and more efficient alternatives such as sparsity-based self-attention can be used instead. In this work, we introduce SparseGAN, a novel end-to-end generative model for regular and irregular multivariate time series. We use sparse self-attention to limit the selection to only the relevant time steps in order to encode the similarity information of temporal behavior more effectively. Our primary innovation is the inclusion of a joint learning mechanism coupled with a sparse self-attention mechanism that replaces the use of a full fixed-size representation derived from continuous-time inputs. This gives SparseGAN more representational flexibility than previous RNN-based GANs. The key contributions of this paper can be summarized as follows:

• To the best of our knowledge, we propose the first time-series generation model that relies on self-attention to generate realistic synthetic data for time-series forecasting. Precisely, we leverage the sparse self-attention layer to build a fully attentional generative model that is not only capable of accessing all historical input steps regardless of the time-series length but is also supported by supervision from the original data.

• We design a joint framework for time-series generation. Our approach is more generic than previous sequential generative models, as it can handle both regular and irregular time-series data.

• Experimental results show that the SparseGAN model consistently outperforms all baseline models, with an error reduction of around 15% over the former best model for both regular and irregular time-series data.

• We further show that the time series synthesized by our model can be applied to other tasks, such as data augmentation and data substitution in low-data regimes for training better time-series forecasting models.

Related work
Generative adversarial networks GANs are generative models comprised of two networks: a generator and a discriminator. They are capable of generating new data points that are similar to the original dataset [5]. Several works have used GANs for one-dimensional time-series generation. For instance, Lou et al. [16] introduced two distinct approaches for one-dimensional data augmentation using fully connected layers: the Wasserstein Generative Adversarial Network (WGAN) and WGAN combined with an autoencoder for more efficient data representation. Ramponi et al. [6] were the first to introduce one-dimensional convolutional layers as an alternative to fully connected networks to handle regularly and irregularly sampled time series. In the recent work of Shao et al. [2], Wiese et al. [1] and Dogariu et al. [17,18], the authors proposed models similar to [6], with emphases on machine fault diagnosis and financial time series, respectively. In addition, other scholars have considered using GANs for multivariate time-series generation. Mogren [4] introduced continuous RNN-based GANs to model the entire conditional distribution of data sequences for classical music generation. In this approach, unidirectional LSTM layers are used in the generator, while bidirectional LSTM layers are used in the discriminator. Similarly, Esteban et al. [3] proposed two different approaches for time-series generation with an emphasis on medical data: the Recurrent GAN (RGAN) and the Recurrent Conditional GAN (RCGAN). These models follow an architecture similar to that of the standard GAN, but LSTM layers replace the fully connected layers in both the generator and the discriminator. Later, Bandara et al. [19] proposed employing standard generation models, such as dynamic time warping averaging, to produce time series synthetically in order to enhance forecasting models. Yoon et al. [7] introduced a joint learning mechanism for training GANs. In particular, the authors proposed an RNN-based GAN model that is trained jointly with an encoder-decoder network. Recently, Arnout et al. [20] introduced the Class-specific Recurrent GAN (CLaRe-GAN), which conditions the generator on auxiliary input comprising class-specific and class-independent properties. Specifically, the model consists of two encoders, one for each kind of information (inter- and intra-class characteristics), a shared-latent-space assumption, and a class discriminator that discriminates across latent vectors to extract class-specific features. Despite their promising performance, these models are still inadequate for generating time-series data that take into account all the relevant long-range dependencies and information in time-series data.
Attention mechanisms Attention mechanisms have been incorporated into many sequential problems, such as time-series forecasting [21][22][23][24], time-series classification [25,26], time-series anomaly detection [27,28] and neural machine translation [29,30]. The seminal work is the Transformer model [12], an encoder-decoder architecture based on the attention mechanism designed to handle sequential data effectively without RNN layers. Recently, a variety of transformer-based methods have been proposed, such as Transformer-XL [29], the Reformer [31] and the Universal Transformer [32]. For example, Transformer-XL [29] can be considered an extension of the Transformer model that tackles the issue of fixed-span attention and allows information to flow across segments. The Reformer [31] is another extension of the Transformer that replaces the dot-product attention with locality-sensitive hashing attention to reduce the computational complexity and replaces the standard residual blocks with reversible residual layers. The Universal Transformer [32] is another Transformer architecture that combines the Transformer model's parallelizability with the inductive bias of RNNs. Similarly, Hu et al. [33] proposed combining self-attention with an RNN to mine more information from time series for the forecasting task. Another work by Wan et al. [34] proposed a CNN-based model using 1D dilated convolutions, followed by feature extraction using a self-attention mechanism for time-series prediction.
Recently, several papers have proposed sparsity-based architectures that rely on sparse attention mechanisms, which are typically used in neural machine translation [35,36]. For instance, the Sparse Transformer is a transformer-based architecture that introduced self-attention with sparse factorization to reduce the computational cost, making it possible to train attention networks with hundreds of layers on long sequences [36]. The Adversarial Sparse Transformer is inspired by the Sparse Transformer and GANs for time-series forecasting [22]. However, none of these studies has investigated adapting self-attention, especially sparse self-attention, into GANs for time-series generation tasks. Unlike the existing models, we propose a novel model for time-series generation that employs sparse self-attention in both parts of the GAN in conjunction with joint learning to increase the quality of the generated time-series data.

Problem formulation
Let $\mathcal{X}$, $\mathcal{M}$ be any sets called the data space and the model space, respectively, e.g., $\mathcal{X} := \mathbb{R}$, and let $\mathcal{X}^* := \bigcup_{i \in \mathbb{N}} \mathcal{X}^i$ denote the set of finite sequences in $\mathcal{X}$. We model a learning algorithm as a map $a : \mathcal{X}^* \to \mathcal{M}$ from datasets $D_{\mathrm{train}} \in \mathcal{X}^*$ to models $\hat{y} \in \mathcal{M}$. Let $\ell : \mathcal{M} \times \mathcal{X}^* \to \mathbb{R}$ be an evaluation measure, where $\ell(\hat{y}, D_{\mathrm{test}})$ with a fresh test sample from the same data-generating distribution as $D_{\mathrm{train}}$ yields an estimate of the expected error of the learned model. Our goal is to use the training data $D_{\mathrm{train}}$ to generate some synthetic data $V$ that best approximates the real data distribution. We distinguish two variants of the synthetic data generation problem:

Data augmentation problem: Given a real dataset $D_{\mathrm{train}}$, a learning algorithm $a$, and an evaluation measure $\ell$, find a synthetic dataset generator $G$ such that the model trained on both the training data and the data synthesized by $G$, $V$, has minimal error, e.g., minimize

$\ell\big(a(D_{\mathrm{train}} \oplus V),\ D_{\mathrm{test}}\big),$

where $\oplus$ denotes the concatenation of sequences. The application idea behind the data augmentation problem is to add more synthetic training data to a dataset and fill unexplored input space in order to learn more powerful models [2,6,37]. Consequently, it is a very successful method for expanding and improving the quality of training data. Thus, the better the synthetic data generator, the more the loss should be reduced compared to training on the real data only.
Data substitution problem: Given the same as above, find a synthetic dataset generator $G$ such that the model trained only on the data synthesized by $G$, $V$, can replace the training data with minimal error, e.g., minimize

$\ell\big(a(V),\ D_{\mathrm{test}}\big).$

This problem is relatively new. The application idea behind the data substitution problem is usually that models for sensitive data should be learned in potentially insecure environments, such as the cloud or by a service provider, and thus the real data should not be shared, but only synthetic data fulfilling the same purpose [38,39]. The data substitution problem silently assumes that the data generator does not have the capacity to simply reproduce the real training data, which arguably would yield the best data foundation for learning a model, but would also invalidate its purpose of not sharing the real data.
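For concreteness, the two evaluation settings can be sketched in Python as follows; here `fit_forecaster`, the `predict` interface, and the dataset objects are placeholders used for illustration rather than components of our implementation.

```python
import numpy as np

def evaluate(model, test_windows, test_targets):
    """Mean absolute error of one-step-ahead forecasts on the test set."""
    preds = model.predict(test_windows)
    return float(np.mean(np.abs(preds - test_targets)))

def augmentation_score(fit_forecaster, real_train, synthetic, test):
    """Data augmentation: train on real data concatenated with synthetic data."""
    X, y = real_train
    Xs, ys = synthetic
    model = fit_forecaster(np.concatenate([X, Xs]), np.concatenate([y, ys]))
    return evaluate(model, *test)

def substitution_score(fit_forecaster, synthetic, test):
    """Data substitution: train on the synthetic data only."""
    Xs, ys = synthetic
    model = fit_forecaster(Xs, ys)
    return evaluate(model, *test)
```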
Here, we deal specifically with the data augmentation and data substitution problems when learning forecasting models for time-series data.

As illustrated in Fig. 1, SparseGAN consists of two sub-networks, a supervision network and a generation network, which are trained jointly during the training stage. The generation network is used to generate time series that are as realistic as possible. The supervision network, in turn, exploits the fact that the training data contain more information than merely whether produced data are genuine or synthetic; we can explicitly learn from the actual data. Therefore, the supervision network reduces the data's complexity to enable the generation network to produce more realistic data, so that generation does not depend only on discriminator feedback. The two sub-nets of SparseGAN are trained in an end-to-end manner. We elaborate on each of the two sub-nets below.

Supervision network
SparseGAN adopts a few existing studies as its building blocks. First, the supervision network is an encoder-decoder network. Consider the input data $X = \{x^i_1, x^i_2, \ldots, x^i_t\}$, where $x^i_t$ denotes the value of time series $i$ at time $t$. The encoder consists of a series of Gated Recurrent Unit (GRU) cells, which take a time series as input and produce a latent feature representation [40,41]. The decoder follows a formulation similar to the encoder: it takes the latent feature representation from the encoder and outputs the reconstructed time-series data $C = \{c^i_1, c^i_2, \ldots, c^i_t\}$, which is similar to the input time series and leverages the entire data structure [42]; this is crucial to facilitate the generation task. Accordingly, the supervision network offers an effective and reliable way to reconstruct a high-dimensional time series. Utilizing the reconstructed data rather than the actual data enables the generation network to learn the underlying dynamics of the data through lower-dimensional representations [7]. In addition, this phase prevents the discriminator from getting stuck in a local minimum [6,7,43]. For the supervision network, we take $x^i_t$ as input, and the objective is to reconstruct $x^i_t$. We use $\mathcal{L}_{\mathrm{reconstruct}}$ as our reconstruction loss to yield an efficient and robust reconstruction of a high-dimensional time series.
Let $A(\cdot)$ denote the supervision network. The objective of the supervision network can then be expressed as

$\mathcal{L}_{\mathrm{reconstruct}} = \mathbb{E}_{x}\big[\, \lVert x - A(x) \rVert_2^{2} \,\big] \qquad (1)$
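For illustration, a GRU encoder-decoder of this kind, together with a squared-error reconstruction loss in the spirit of Eq. (1), can be sketched in PyTorch as follows; the layer sizes and decoding scheme are illustrative assumptions rather than the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn

class SupervisionNetwork(nn.Module):
    """GRU encoder-decoder that maps a time series to a latent representation
    and reconstructs it (hyper-parameters are illustrative)."""

    def __init__(self, n_features: int, latent_dim: int = 24, num_layers: int = 3):
        super().__init__()
        self.encoder = nn.GRU(n_features, latent_dim, num_layers, batch_first=True)
        self.decoder = nn.GRU(latent_dim, latent_dim, num_layers, batch_first=True)
        self.out = nn.Linear(latent_dim, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) -> latent sequence -> reconstruction C
        h, _ = self.encoder(x)          # latent feature representation
        d, _ = self.decoder(h)
        return self.out(d)              # reconstructed series, same shape as x

def reconstruction_loss(net: SupervisionNetwork, x: torch.Tensor) -> torch.Tensor:
    """L_reconstruct: squared error between the input and its reconstruction."""
    return torch.mean((net(x) - x) ** 2)
```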

Generation network
The generation network employs two independent networks, a generator G and a discriminator D, which follow the general GAN architecture. The seed for the generator is a random noise vector $Z$. Here, $Z$ is a sequence of $T$ points $\{z_t\}_{t=1}^{T}$ sampled independently from a Gaussian distribution with mean 0 and standard deviation 1. G attempts to transform the random noise vector $Z$ into a realistic time series. On the other hand, D seeks to distinguish whether the time series generated by G is realistic or not. For the generation network, let $C$ denote the reconstructed time series yielded by the supervision network's decoder. The discriminator evaluates the degree of authenticity of the generated time series $V = \{v^i_1, v^i_2, \ldots, v^i_t\}$ using the adversarial loss $\mathcal{L}_{\mathrm{adversarial}}$: we feed the noise vector $Z$ into G, obtain the generated time series $v^i_t$, feed $v^i_t$ into D, and finally compute the adversarial loss $\mathcal{L}_{\mathrm{adversarial}}$.
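For concreteness, assuming the standard binary cross-entropy GAN formulation, the adversarial objective would take the form

$\mathcal{L}_{\mathrm{adversarial}} = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$

which the discriminator seeks to maximize and the generator seeks to minimize.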
While RNN-based GANs have been widely used to generate time-series data thanks to their ability to handle data sequences, they assume that the observations are sampled regularly and are fully observed at each sampling period. These assumptions often do not hold for real-world multidimensional time series, which can be sparse and irregular. Additionally, RNN-based GANs scale inadequately with long input sequences. Taken together, these limitations make the existing RNN-based GANs unsuitable for our scenario [44,45]. To properly deal with irregular time intervals and learn the implicit long-range dependencies, we propose the SparseGAN model based on the sparse self-attention module.
Sparse self-attention module In general, the canonical self-attention mechanism employs the classical softmax transformation to find the weighted-sum transformations $y = (y_1, \ldots, y_L)$ of the input sequence $h = (h_1, \ldots, h_L)$ based on relevance [12]. The $i$-th output $y_i$ of the input sequence $h$ is determined as follows:

$y_i = \sum_{j=1}^{L} \operatorname{softmax}_j(e_{ij})\, W_v h_j, \qquad e_{ij} = \frac{(W_q h_i)^{\top}(W_k h_j)}{\sqrt{d}} \qquad (3)$

where $W_q \in \mathbb{R}^{d \times d}$, $W_k \in \mathbb{R}^{d \times d}$, and $W_v \in \mathbb{R}^{d \times d}$ represent the parameter matrices, and $e_{ij}$ is the attention score reflecting the relevance between the $i$-th and $j$-th input elements. This results in dense dependencies that capture the complete interactions between each pair of time steps and fails to assign zero probability to less significant relationships. This would result in less attention being assigned to the relevant time steps, diminishing the performance. Besides, it does not scale well with long input sequences, requiring quadratic computation and storage to generate all pairwise similarity scores, according to Jain and Wallace [46] and Wu et al. [22].
To address this problem, we employ a transformation from the entmax family [47] to replace the softmax function in the self-attention module in both the generator and the discriminator; unlike softmax, entmax transformations are capable of producing sparse probability distributions. Specifically, this is done by applying an α-entmax transformation to the attention scores $e$ [47][48][49], defined as:

$\alpha\text{-entmax}(e) := \underset{p \in \Delta^d}{\arg\max}\; p^{\top} e + H^{T}_{\alpha}(p),$

where $\Delta^d := \{ p \in \mathbb{R}^d : p \ge 0, \sum_i p_i = 1 \}$ is the probability simplex and, for $\alpha \ge 1$, $H^{T}_{\alpha}$ is the Tsallis continuous family of entropies [50]:

$H^{T}_{\alpha}(p) := \begin{cases} \frac{1}{\alpha(\alpha-1)} \sum_j \big(p_j - p_j^{\alpha}\big), & \alpha \neq 1,\\ -\sum_j p_j \log p_j, & \alpha = 1. \end{cases}$

Based on the definition of $H^{T}_{\alpha}(p)$, the entmax transformation can be written in the closed form

$\alpha\text{-entmax}(e) = \big[(\alpha - 1)e - \tau \mathbf{1}\big]_{+}^{1/(\alpha-1)},$

where $\tau$ denotes the Lagrange multiplier, which acts as a threshold, i.e., entries with score $e_i \le \tau/(\alpha-1)$ get zero probability, and $\mathbf{1}$ denotes a vector of all ones. In particular, we implement 1.5-entmax as a balance between softmax ($\alpha = 1$) and sparsemax ($\alpha = 2$). By using sparsity to refine the attention weights, we can achieve a more expressive representation of the entire input.
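To illustrate, the following is a minimal PyTorch sketch of a single-head self-attention layer in which the softmax normalization is replaced by 1.5-entmax; it assumes the third-party `entmax` package, and the single-head setup and dimensions are illustrative rather than our exact configuration.

```python
import torch
import torch.nn as nn
from entmax import entmax15  # assumed dependency: pip install entmax

class SparseSelfAttention(nn.Module):
    """Single-head self-attention with 1.5-entmax instead of softmax,
    so irrelevant time steps receive exactly zero attention weight."""

    def __init__(self, d_model: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** 0.5

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, length, d_model)
        q, k, v = self.w_q(h), self.w_k(h), self.w_v(h)
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.scale  # (B, L, L)
        weights = entmax15(scores, dim=-1)   # sparse attention distribution
        return torch.matmul(weights, v)      # (B, L, d_model)
```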

Sparse self-attention GANs
In this part, we propose the sparse self-attention GANs. In particular, the generator is composed of a stack of sparse self-attention layers, where each layer learns a representation from the output of the previous layer, following a setup close to the form of attention proposed by Vaswani et al. [12] but using the 1.5-entmax transformation [47]. A fully connected feed-forward network is then stacked on top. Finally, we employ a residual connection [51] around the stack of layers, followed by layer normalization [52]. The seed for the generator is the random noise vector Z drawn from the Gaussian distribution. To ensure that the discriminator can distinguish fake from real time series, the discriminator is also composed of a stack of self-attention layers using the 1.5-entmax transformation [12,47], followed by a fully connected feed-forward network. As a result, the proposed GAN becomes more capable of generating time-series data that accurately fit the distribution of the real data.
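As a sketch, one such generator could be assembled as follows, building on the SparseSelfAttention module shown above; the layer sizes are illustrative, and for simplicity the residual connection and layer normalization are applied per block rather than around the whole stack.

```python
import torch
import torch.nn as nn

class SparseAttentionBlock(nn.Module):
    """One generator/discriminator block: sparse self-attention followed by a
    feed-forward network, wrapped with a residual connection and layer norm."""

    def __init__(self, d_model: int, d_ff: int = 128):
        super().__init__()
        self.attention = SparseSelfAttention(d_model)  # from the sketch above
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual connection around attention + feed-forward, then layer norm
        return self.norm(x + self.ffn(self.attention(x)))

class Generator(nn.Module):
    """Maps a Gaussian noise sequence Z to a synthetic time series."""

    def __init__(self, n_features: int, d_model: int = 64, n_blocks: int = 3):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        self.blocks = nn.Sequential(
            *[SparseAttentionBlock(d_model) for _ in range(n_blocks)]
        )
        self.output_proj = nn.Linear(d_model, n_features)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, length, n_features) noise -> synthetic series of the same shape
        return self.output_proj(self.blocks(self.input_proj(z)))
```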

End-to-end joint training
We train the two sub-nets jointly in an end-to-end manner. However, depending only on discriminator feedback has a major flaw: the generated samples are based on the distribution from which the input random noise Z is sampled, so there might be a significant gap between the generated samples and the actual data. Accordingly, we use $C$ to supervise the generator of the generation network in order to improve the quality of the generated data and enable the generator to construct more realistic data sequences [7,44,53]. We define the supervision loss function as:

$\mathcal{L}_{\mathrm{supervision}} = \mathbb{E}\big[\, \lVert C - G(Z) \rVert_2^{2} \,\big].$

The final loss is composed of the reconstruction, adversarial and supervision losses, with λ being a hyper-parameter that controls the contribution of the supervision loss. Our final objective function is thus:

$\mathcal{L} = \mathcal{L}_{\mathrm{reconstruct}} + \mathcal{L}_{\mathrm{adversarial}} + \lambda\, \mathcal{L}_{\mathrm{supervision}}.$
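A schematic joint training step is sketched below under simplifying assumptions: the discriminator is assumed to output probabilities and to be trained with a binary cross-entropy objective on the raw real series, and the module and optimizer names are illustrative rather than our exact implementation.

```python
import torch
import torch.nn.functional as F

def training_step(sup_net, generator, discriminator, optimizers, x, lam=1.0):
    """One joint SparseGAN update on a real batch x: (batch, length, features)."""
    opt_sup, opt_g, opt_d = optimizers
    z = torch.randn_like(x)                      # Gaussian noise seed Z

    # 1) Supervision network: reconstruct the real series (L_reconstruct).
    c = sup_net(x)
    loss_rec = F.mse_loss(c, x)
    opt_sup.zero_grad(); loss_rec.backward(); opt_sup.step()

    # 2) Discriminator: real series vs. generated series (L_adversarial, D side).
    v = generator(z).detach()
    d_real = discriminator(x)                    # assumed to output probabilities
    d_fake = discriminator(v)
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 3) Generator: fool D and match the reconstructed series C (λ-weighted).
    v = generator(z)
    d_gen = discriminator(v)
    loss_adv = F.binary_cross_entropy(d_gen, torch.ones_like(d_gen))
    loss_sup = F.mse_loss(v, c.detach())         # supervision signal from C
    loss_g = loss_adv + lam * loss_sup
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    return loss_rec.item(), loss_d.item(), loss_g.item()
```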

Experiments
In this section, we present our experiments to evaluate the proposed generative model, SparseGAN. The experiments seek to answer the following research questions: (i) RQ1: Can SparseGAN improve the quality of data generation for regular and irregular time series compared to state-of-the-art generative models? (ii) RQ2: In low-resource data scenarios, how effective is augmenting real-world datasets with SparseGAN-generated data in boosting the accuracy of time-series forecasting models? (iii) RQ3: Can SparseGAN generate synthetic data that can substitute for the existing real data? (iv) RQ4: How well does SparseGAN-generated data preserve the original data's diversity and realism compared to state-of-the-art generative models?

Datasets
We assess the utility of SparseGAN using five large-scale datasets. Since this study aims to assess the utility of SparseGAN for regular and irregular time series, we train our model on five datasets, three of which are regular and two of which are irregular time series. We list these datasets below (Table 1).

Sine waves
3. Energy appliances: The appliances energy dataset contains 4.5 months of energy consumption readings taken at 10-min intervals. This dataset includes 29 attributes, including temperature and humidity conditions in different areas of the house.

4. Power consumption: We consider a dataset of electricity consumption containing one-minute readings over 4 years in one household. This dataset consists of 9 attributes, including active power, reactive power and voltage. This dataset is processed by selecting only 40% of each time series' points to create irregularly sampled data.

5. Air quality: We consider the hourly responses of gas concentrations from a certified analyzer, recorded between March 2004 and February 2005. This dataset consists of 15 attributes, such as humidity and temperature. We follow [54] to create an irregular version of this dataset, in which the authors select only 40% of each time series to generate an irregular dataset; we follow the same procedure here. To maintain a fair comparison, the eliminated observations are randomly selected and kept constant across all experimental settings and baselines.

Baseline models
To assess the performance of SparseGAN, we compare it with existing time-series generative models, namely TimeGAN [7], a method related to ours that combines unsupervised adversarial learning with supervised training, and several recent deep learning-based models, including T-CGAN [6], a data generation approach designed for time series with irregular sampling; RCGAN [3], a standard GAN in which LSTM layers are used in both the generator and the discriminator; C-RNN-GAN [4], a continuous RNN-GAN that models the entire conditional distribution of data sequences; T-Forcing [55], an RNN model trained with the Teacher Forcing technique; P-Forcing [56], an RNN model trained with the Professor Forcing technique to capture long-term dependencies; WaveNet [57], a model for sequential generation of raw-waveform audio that imitates the human voice; and WaveGAN [58], a convolution-based GAN for raw-waveform audio generation.

Evaluation
Two aspects of the generated time-series data are considered for assessment. The first aspect is fidelity, which refers to how effectively the synthesized data retains the original data properties [59,60]. To measure this, we performed two evaluation tasks comparing our approach to the different baseline models in terms of fidelity [59,60]: (1) Data augmentation: in a low-data regime, we investigate augmenting the training data with generated data. We reduce the number of data points available for training and evaluate the performance on the test set to mimic the low-data regime setting. (2) Data substitution: we consider only the generated examples for training and evaluate the performance on the test set.
In this paper, we assess the SparseGAN model on the data augmentation and data substitution tasks using two time-series forecasting models: an LSTM model following [7] and LSTNet [61], which combines a CNN with an RNN. We reduce the problem to learning a one-step-ahead forecasting model with minimal expected error. The second aspect is diversity [59,60], which is measured by how well the generated data preserves the original data's distribution. Inspired by the Fréchet Inception Distance [62], we employ a 4-layer LSTM model to discriminate between actual and synthetic data as a standard supervised task. The classification accuracy on the held-out test set is then reported. A lower classification accuracy score indicates that the synthesized data closely resembles real-world time-series properties, i.e., that the generated time series are largely indistinguishable from real-world data.
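As an illustration of this diversity evaluation, the sketch below defines an LSTM classifier that separates real from synthetic windows and reports its held-out accuracy; the layer sizes and the omitted training loop are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class DiscriminativeScorer(nn.Module):
    """LSTM classifier used to measure how distinguishable synthetic data is
    from real data; lower held-out accuracy means higher realism/diversity."""

    def __init__(self, n_features: int, hidden: int = 32, num_layers: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out[:, -1]))  # P(window is real)

@torch.no_grad()
def discriminative_accuracy(model, real: torch.Tensor, synthetic: torch.Tensor) -> float:
    """Accuracy of a trained scorer on held-out real (label 1) and synthetic (label 0) windows."""
    preds = torch.cat([model(real), model(synthetic)]).round().squeeze(-1)
    labels = torch.cat([torch.ones(len(real)), torch.zeros(len(synthetic))])
    return (preds == labels).float().mean().item()
```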

Generated data fidelity
We train two time-series forecasting models for each dataset under the two regimes described in Sect. 6: (i) the data substitution scenario and (ii) data augmentation for low-resource data scenarios.
Data augmentation for low-resourced data scenarios In this section, we consider the scenario where forecasting models are trained on real-world data augmented with SparseGAN-generated data. The purpose of this experiment is to investigate whether augmenting the original data with SparseGAN-generated data improves the accuracy of our forecasting models in low-data regime settings (RQ1 and RQ2). Simulating low-data regime settings was introduced for NLP tasks such as text classification [63,64]. Inspired by this line of research, we sub-sample a small training set to mimic the low-data regime scenario for time-series forecasting, where we train the generative models with limited data.
This experiment goes as follows. We focus on experiments with 10 and 50 data points to simulate realistic low-resource data settings in which we frequently observe poor performance. For data augmentation, we add 100 SparseGAN synthetic data points to the original data, as shown in Tables 2 and 3. This procedure allows us to evaluate the performance of SparseGAN on data augmentation tasks for time-series forecasting models. Results are reported as MAE on the test set. The findings in Tables 2 and 3 demonstrate that data augmentation using SparseGAN can drastically improve the accuracy of time-series forecasting models compared to other generative models. In addition, the results provide strong evidence supporting the significance of integrating a sparse self-attention mechanism and a supervised signal to enhance the quality of the generated data, particularly when compared to models that rely solely on convolutional and RNN layers. The supervision network acts as an auxiliary network, providing valuable feedback to the generator based on the properties and characteristics of the real data. This feedback offers more direct and informative signals to the generator, resulting in improved convergence and enhanced quality of the generated samples. By combining the sparse self-attention mechanism as a fundamental building block with the supervised signal obtained from the supervision network, SparseGAN outperforms existing state-of-the-art generation models in terms of accuracy. For instance, there is around a 20% MAE reduction compared to the previous best model, TimeGAN, on the power consumption dataset. Overall, the SparseGAN model yields improvements of around 11-17%, manifested by a reduction in MAE, over the state-of-the-art baseline models on all five datasets, as shown in Tables 2 and 3.

Data substitution scenario We now consider the scenario where we fully train our forecasting models on synthetic data. To answer RQ3, we compare the performance of forecasting models trained on synthetic data generated by SparseGAN against those trained on data from other baseline models. We present the findings of this experiment in Table 4. It is interesting to see that the SparseGAN model consistently outperforms all baseline models. For instance, there is around a 6% improvement over the previous best model, TimeGAN, assuming that real-data performance can be achieved. In addition, the findings corroborate that SparseGAN can better handle the long-range dependencies between distant time stamps compared to other baseline models. SparseGAN also outperforms other baseline models in mimicking the properties of real data, both regular and irregular, emphasizing its utility in generating synthetic data that can substitute for real-world data.

Generated data diversity
To further analyze the generated data, we explore how well SparseGAN-generated data preserves the diversity and patterns of the original data, as described in Sect. 5.3 (RQ4). Experimental results on the different datasets are presented in Table 5. As illustrated in Table 5, SparseGAN consistently generates synthesized data that closely resembles the diverse patterns of real-world time series. Furthermore, we observe in Table 5 that SparseGAN-generated time series are largely indistinguishable from real-world data. The findings in Table 5 demonstrate that our model achieves the lowest classification accuracy, with a large gap compared to the other generative models. To further highlight the similarities between the actual data and the synthetic data, Fig. 2 visualizes the data distributions projected to two dimensions using PCA and t-SNE. Again, we observe that the synthesized data closely mimics the various patterns of real-world time series, demonstrating the effectiveness of the proposed approach.

Sensitivity analysis
We finally conducted an ablation study to test the robustness of our findings. First, we investigated the sensitivity of the parameter λ, which balances the generation loss and the adversarial training part. We varied the value of λ among {0, 0.1, 1, 10, 100} and report the performance of SparseGAN on the Energy Appliances dataset for data substitution. Here, using λ = 0 is equivalent to training our model without the supervision network. As shown in Table 6, SparseGAN performs better as λ increases. It is evident that our proposed approach benefits from the supervision network, which enhances performance, but excessively high values of λ, which give a very large weight to the supervision network, have a negative impact on SparseGAN. Second, we investigated the effect of various attention mechanisms on SparseGAN performance. Table 7 indicates that, compared to softmax, 1.5-entmax is sparser and assigns greater scores to significant information, which helps the model's performance.

Conclusion
In this paper, we have proposed a novel generative adversarial network, SparseGAN, which addresses the limitations of previous time-series generation models with regard to long-term dependencies. While previous research has built time-series generation on RNN and convolutional layers, we base SparseGAN on a sparse self-attention mechanism. In addition, our proposed model utilizes the original data for supervision. We show that SparseGAN can efficiently capture the long-range dependencies in time-series data. It is also capable of maintaining the original distribution of the data based on its internal characteristics. The experimental findings substantiate that SparseGAN-generated data outperformed the generative baseline models on both regular and irregular time-series data. In particular, forecasting models trained on SparseGAN-generated data perform similarly to models trained on real-world data. In addition, SparseGAN provides an effective way to augment training data in low-resource data settings.

Fig. 1
Fig. 1 The overall architecture of the SparseGAN model. The left part of the figure is the supervision network and the right part is the generation network. The whole network is trained in an end-to-end manner, and the supervision network is used to supervise the generation network during the training phase

Fig. 2
Fig. 2 A visual comparison of real data and its generated counterpart. The first row of the figure depicts PCA visualizations of Sine Waves, Google Stocks, and Energy Appliances, whereas the second row depicts the corresponding t-SNE visualizations

Table 1
Datasets statistics

Table 2
Performance summary of MAE scores for data augmentation in the low-data regime. Bold font indicates the best performing model, while underlined font represents the second best performing model. We sample 10 training data points and then use 100 synthetic data points for augmenting our training sets

Table 3
Performance summary of MAE scores for data augmentation in the low-data regime. Bold font indicates the best performing model, while underlined font represents the second best performing model. We sample 50 training data points and then use 100 synthetic data points for augmenting our training sets

Table 4
Performance summary of MAE scores for the data substitution scenario. Bold font indicates the best performing model, while underlined font represents the second best performing model

Table 5
Performance summary of the distance between the generated time series of different methods and the original data in terms of classification accuracy score. Bold font indicates the best performing model, while underlined font represents the second best performing model. Please refer to Sect. 5.3 for details

Table 6
Analysis of the hyper-parameter λ using Energy dataset based on LSTM forecasting model for data substitution task