1 Introduction

Using real-world time series in various applications is often hindered by issues such as a lack of sufficient data, privacy concerns, or anomalies. Real-world time series are therefore often replaced or complemented with synthetic time series, and the generation of synthetic time series consequently aims to provide realistic and useful time series. Generated time series are realistic if they have characteristics similar to real-world time series, such as seasonality and trend [13]. For example, an electricity consumption time series typically has reoccurring daily, weekly, and yearly patterns [10]. However, a realistic generated time series is not necessarily useful for all tasks. For example, to analyse the future growth of a startup company, it may be necessary to generate time series with seasonal fluctuations and a trend when given only a small sample of stationary input data. Therefore, to generate a useful time series, one must be able to influence or even design a time series' characteristics during generation, independently of the available input data.

Generating realistic and useful time series thus requires generation methods that offer further control over the generated time series. Firstly, generation methods have to control the non-stationarity of the generated time series to incorporate effects that change the value of the time series at different times. Examples of non-stationary time series are daily stock prices, monthly beer production, or the annual number of strikes [13]. Secondly, these methods must be able to control the periodicities in the generated time series to represent regularly reoccurring patterns such as the daily, weekly, and yearly patterns in the electricity consumption time series mentioned above.

Despite generally promising results (e.g. [5, 19, 22, 23]), existing time series generation methods do not explicitly control the non-stationarity and periodicities of the generated time series. Instead, these approaches learn a mapping from the latent space, which lacks temporal information, to the realisation space. As a result, all newly sampled data follow the same distribution, resulting in stationary time series. Furthermore, the periodicities in the generated time series are limited to those learned from the training sample.

In the present paper, we thus present a novel approach to control non-stationarity and periodicities with calendar and statistical information when generating time series. For this, we make the following contributions: Firstly, we define the requirements for generation methods to generate time series with non-stationarity and periodicities, which we show are not fulfilled by existing generation methods. Secondly, we formally describe the novel approach for controlling non-stationarity and periodicities in generated time series. Thirdly, we introduce an exemplary implementation of this approach using a conditional Invertible Neural Network (cINN) that preprocesses calendar and statistical information as conditional input with a conditioning network.

To evaluate the proposed cINN, we empirically examine its capabilities to generate time series with controlled non-stationarity and periodicities in experiments with real-world data sets. We also compare the general quality of the time series generated with the cINN to that of state-of-the-art time series generation methods and perform an ablation study to analyse the effects of the conditional information.

2 Related work

The present paper introduces a novel approach for generating realistic and useful time series by controlling non-stationarity and periodicities. In this section, we thus describe how recent approaches to generate realistic and useful time series consider the temporal structure.

The first group of approaches considers the temporal structure independently of a specific domain: Xu et al. [22], for example, propose COT-GAN, which considers causality in the generation process by utilising causal optimal transport theory. With this, the output at a point in time depends only on inputs up to that point in time. Yoon et al. [23] also focus on the temporal dynamics of generated time series: By using a supervised loss, the proposed TimeGAN can better capture temporal dynamics. In addition to temporal structures, MuseGAN by Dong et al. [6] considers the interplay of different instruments. For this purpose, they integrate multiple generators that focus on different music characteristics. To generate time series with irregular steps, Ramponi et al. [20] present the T-CGAN, which uses timestamps to learn the relationship between data and time.

The second group of approaches additionally uses domain-specific information to consider temporal structures. Esteban et al. [8] introduce the Recurrent Conditional GAN (RCGAN) for generating realistic synthetic medical data. By conditioning on specific labels, RCGAN can create time series that can also be used for training models without raising privacy concerns. To perform high-quality speech synthesis, Prenger et al. [19] combine the mel spectrograms presented in WaveNet [17] with an Invertible Neural Network. Furthermore, to generate electricity consumption time series for scheduling and energy management, Lan et al. [16] condition their Wasserstein GAN on temporal and consumer information.

Altogether, existing works consider the temporal structure and domain-specific information when generating time series. However, to the best of our knowledge, no work exists that successfully generates realistic and useful time series by controlling non-stationarity and periodicities.

3 Problem formulation

To generate realistic and useful time series, generation methods must be able to control non-stationarity and periodicities during generation. This section firstly formalises the related requirements before highlighting why current generation methods cannot fulfil them.

3.1 Requirements of non-stationarity and periodicities in time series

This section briefly formalises the requirements of non-stationarity and periodicities in time series with calendar information based on phenomena observed in real-world time series.

Requirement 1: non-stationarity

Realistic time series can include phenomena such as trends or seasonality. Such characteristics can be represented as components of a time series. In time series analysis, a time series \(\mathcal {X}_{t}\) is typically decomposed into a seasonal component \(S_{t}\), a trend component \(T_{t}\), and a random (or remainder) component \(R_{t}\), i.e. \(\mathcal {X}_{t} = S_{t} + T_{t} + R_{t}\) [13].

Since these components can cause a time series to be non-stationary, we detail non-stationary time series next. We regard non-stationary time series as realisations of a non-stationary stochastic process {Xt}. However, since non-stationarity is only defined as the absence of stationarity, we must characterise it via stationary stochastic processes. For time series, it is sufficient to focus on the properties of weakly stationary stochastic processes. According to Hyndman and Athanasopoulos [13], a weakly stationary stochastic process {Xt} has the following properties:

  1. \(\mu = \mathbb {E}[X_{t}] = \mathbb {E}[X_{t+\tau }], \forall t \in [1,L], \forall \tau \in \mathbb {N}\),

  2. \(\sigma ^{2} = \text {var}[X_{t}] = \mathbb {E}[(X_{t} - \mu )(X_{t} - \mu )'], \forall t \in [1,L]\),

  3. \({\Gamma }(k) = \text {cov}(X_{t},X_{t-k}) = \mathbb {E}[(X_{t} - \mu )(X_{t-k}-\mu )'], \forall t \in [1,L], \forall k \in \mathbb {N}\).

A time series is non-stationary if at least one of these properties is violated. That is, the mean \(\mu(t)\), the variance \(\sigma^{2}(t)\), or the autocovariance \({\Gamma}(k,t)\) varies over time and is thus time-dependent.
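These three properties can also be checked empirically. The following minimal sketch (our illustration, not code from the paper, assuming NumPy) estimates the windowed mean, variance, and lag-k autocovariance of a series; if any of these drifts across windows, the corresponding property is violated and the series is non-stationary in the sense of Requirement 1:

```python
# Minimal sketch: estimate per-window statistics of a time series.
# Drift across windows indicates non-stationarity (Requirement 1).
import numpy as np

def windowed_stats(x: np.ndarray, window: int, lag: int) -> np.ndarray:
    """Return the mean, variance, and lag-`lag` autocovariance per window."""
    stats = []
    for start in range(0, len(x) - window + 1, window):
        w = x[start:start + window]
        mu, var = w.mean(), w.var()
        acov = np.mean((w[lag:] - mu) * (w[:-lag] - mu))
        stats.append((mu, var, acov))
    return np.array(stats)

# Example: a linear trend violates property 1 (time-independent mean).
t = np.arange(2000)
x = 0.05 * t + np.random.default_rng(0).normal(size=t.size)
print(windowed_stats(x, window=500, lag=24))  # the mean column drifts upwards
```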

Requirement 2: periodicities

Realistic time series can also contain the phenomenon of periodicities. In a trend-free time series without a random component \(\mathcal {X}_{t} = x_{1}, x_{2}, \dots , x_{L}, t \in [1,L]\), a periodicity with period η can be defined by \(x_{t+\eta } = x_{t}, \forall t \in [1,L-\eta]\).

Since the random component Rt causes the time series to exhibit persistent unpredictable fluctuations, this definition is too strict. Therefore, we utilise the reoccurring autocovariance structure Γ(η) between time series points separated by the period η (see [13]) to define a periodicity, i.e.

$$ {\Gamma}(\eta) \approx {\Gamma}(2\cdot\eta) \approx ... \approx {\Gamma}(\mathcal{P}\cdot\eta), \mathcal{P} \in \mathbb{N}. $$
(1)

Furthermore, we expect a noticeably different autocovariance between time series observations separated by a lag κ that is not a multiple of the period η, i.e.

$$ \lvert {\Gamma}(\eta) - {\Gamma}(\kappa) \rvert \gg 0 : \kappa \neq \mathcal{P}\cdot\eta, \mathcal{P} \in \mathbb{N}. $$
(2)

In the case of trend-free time series, observations separated by the period η are additionally similar to each other, i.e.

$$ x_{t} \approx x_{t+\eta} \approx x_{t+2\cdot\eta} \approx ... \approx x_{t + \mathcal{P}\cdot\eta}, \mathcal{P} \in \mathbb{N}. $$
(3)

Therefore, a time series includes periodicities if a reoccurring autocovariance structure Γ(η) is present.
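To make this requirement concrete, the autocovariance structure in (1) and (2) can be estimated directly from a series. The following sketch is our illustration under assumed values (hourly data with a daily period η = 24), not code from the paper:

```python
# Check Eq. (1) and Eq. (2): autocovariances at multiples of the period eta
# should be roughly equal, while off-period lags should differ clearly.
import numpy as np

def autocov(x: np.ndarray, k: int) -> float:
    mu = x.mean()
    return float(np.mean((x[k:] - mu) * (x[:-k] - mu)))

rng = np.random.default_rng(0)
eta = 24                                       # assumed daily period, hourly data
t = np.arange(24 * 365)
x = np.sin(2 * np.pi * t / eta) + 0.3 * rng.normal(size=t.size)

on_period = [autocov(x, p * eta) for p in (1, 2, 3)]  # roughly equal, Eq. (1)
off_period = autocov(x, eta // 2)                     # clearly different, Eq. (2)
print(on_period, off_period)
```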

3.2 Shortcomings of generation methods

This section explains why current generation methods cannot fulfil the requirements of non-stationarity and periodicities. We first explain the principles of generation methods before describing how to apply them to time series generation. We then point out shortcomings of these methods concerning non-stationarity and periodicities in generated time series.

Principles of generation methods

Generation methods such as GANs [11], VAEs [15], and INNs [14] focus on describing a probability distribution PX of a random variable \(X: {\Omega } \rightarrow \mathbb {X}\), with Ω being a general probability space and \(\mathbb {X}\) the realisation space. The underlying assumption is that the observed data \(\mathbf {x} \in \mathbb {X}\) are realisations of the random variable \(X \sim P_{X}\). Since PX is often an intractable distribution, generation methods indirectly model the joint distribution PX,Z of X and a latent random variable \(Z: {\Omega } \rightarrow \mathbb {Z}\) in the latent space \(\mathbb {Z}\). Given the joint and latent distributions, PX can be expressed as

$$ P_{X} = \int P_{X \mid Z} P_{Z} dZ, $$
(4)

where \(P_{X \mid Z}\) is the likelihood and \(P_{Z}\) the prior. If \(P_{X \mid Z}\) and \(P_{Z}\) are tractable, this expression allows calculating \(P_{X}\) without having to model the intractable distribution \(P_{X}\) directly.

To be able to make use of (4), generation methods learn mappings for the generative process. Given an intractable distribution \(P_{X}\) in the realisation space, VAEs and INNs learn an encoding \(f(X;\theta_{1})\) from the sample distribution \(P_{X}\) to the distribution \(P_{Z \mid X}\), where \(P_{Z \mid X}\) is the probability distribution in the latent space given \(X\) and \(\theta_{1}\) are the trainable parameters. Thereby, regularisation is applied to ensure that \(P_{Z \mid X}\) is a good approximation of \(P_{Z}\) and that \(P_{Z}\) is a tractable distribution in the latent space. Given \(P_{Z}\), generation methods then learn a second mapping \(g(Z;\theta_{2})\) from \(P_{Z}\) to \(P_{X \mid Z}\), where \(\theta_{2}\) are the trainable parameters. Based on the learned probability distribution \(P_{X \mid Z}\) and the known tractable distribution \(P_{Z}\), generation methods finally apply (4) to determine an approximation of the sample distribution \(P_{X}\).

Generating time series

Unfortunately, time series are not realisations of a probability distribution \(P_{X}\) but of a time-dependent stochastic process {Xt}. Therefore, the underlying assumption of generation methods mentioned above does not hold. To still apply the principles of generation methods, one must account for the time-dependency of time series. One possibility is to split a realised time series sample \(\mathbf {x} \in \mathbb {X}\) into, for example, N sequential segments \(\mathbf {x} = (\mathbf {x}^{1}, \mathbf {x}^{2}, \dots , \mathbf {x}^{N})\) of arbitrary length, because a stochastic process is defined as a series of random variables. Analogously, one can aggregate multiple generated time series segments \(\hat {\mathbf {x}}^{i}, i \in [1,N]\) to obtain a generated time series longer than one segment. To generate these time series segments \(\hat {\mathbf {x}}^{i}, i \in [1,N]\), one draws multiple samples \(\mathbf{z}^{i}, i \in [1,N]\) from \(P_{Z}\) and uses the mapping \(g(\mathbf {z}^{i}; \theta _{2}) = \hat {\mathbf {x}}^{i}\).
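This segment-based view can be summarised in a short schematic sketch (our illustration; the mapping g below is an arbitrary stand-in for a learned decoder):

```python
# Schematic only: split a realised series into N segments for training, and
# generate by drawing z^i ~ P_Z and mapping each sample through g.
import numpy as np

def split_into_segments(x: np.ndarray, seg_len: int) -> np.ndarray:
    n = len(x) // seg_len
    return x[: n * seg_len].reshape(n, seg_len)   # shape (N, seg_len)

def g(z: np.ndarray) -> np.ndarray:
    return 10.0 + 2.0 * z                         # placeholder learned mapping

rng = np.random.default_rng(0)
segments = split_into_segments(rng.normal(size=1000), seg_len=24)  # training data
z = rng.standard_normal(segments.shape)           # z^i ~ P_Z for i in [1, N]
generated = np.concatenate([g(zi) for zi in z])   # aggregated generated series
```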

Shortcoming 1: non-stationarity in generated time series

In the generative process, the used samples zi,i ∈ [1,N] from the latent space are realisations of the random variable Z with the known distribution PZ. Similarly, the generated time series segments \(\hat {\mathbf {x}}^{i}, i \in [1,N]\) are realisations of the random variable transformation g(Z;𝜃2). According to the so-called law of the unconscious statistician (LOTUS) [21], this transformed random variable g(Z;𝜃2) has the expected value

$$ \mathbb{E}[g(Z; \theta_{2})] = \int g(\mathbf{z}; \theta_{2}) P_{Z} d\mathbf{z}. $$
(5)

Assuming that the mapping g(Z;𝜃2) only depends on the random variable Z and the fixed learned parameters 𝜃2, all generated time series segments also have the same expected value, i.e.

$$ \mathbb{E}[\hat{\mathbf{x}}^{i}] = \mathbb{E}[g(Z; \theta_{2})], \forall i \in [1,N]. $$
(6)

The same argument applies to the variance. Since the variance of a random variable X is defined as \({\sigma ^{2}_{X}} = \mathbb {E}[X^{2}] - (\mathbb {E}[X])^{2}\), one can again use LOTUS [21] to show that the variance is also the same for all generated time series segments (for details, see the Appendix).

The equal variance across all generated time series segments has implications for the autocovariance. Since all generated time series segments are realisations of the same random variable transformation g(Z;𝜃2), the autocovariance, which is defined between two random variables, simplifies to the variance, i.e.

$$ \text{cov}(g(Z; \theta_{2}),g(Z; \theta_{2})) = \sigma^{2}_{g(Z; \theta_{2})}, $$
(7)

which, as previously shown, is the same for all generated time series segments.

Altogether, existing generation methods cannot vary the statistical properties of generated time series segments. Therefore, these methods cannot control non-stationarity in generated time series and do not fulfil the previously defined Requirement 1.

Shortcoming 2: periodicities in generated time series

As previously shown, the autocovariance structure is the same for all generated time series segments in a generative process. As a result, existing generation methods cannot create reoccurring and different autocovariance structures. Therefore, these methods cannot control periodicities in generated time series and thus do not fulfil the previously defined Requirement 2.

4 Controlling non-stationarity and periodicities in generated time series

This section presents a novel approach for controlling non-stationarity and periodicities in generated time series. Firstly, we formally describe our approach to time series generation that fulfils Requirements 1 and 2. To show the practical viability of our approach, we then introduce a conditional Invertible Neural Network (cINN) as an exemplary implementation.

4.1 Formal solution

This section formally describes the novel approach for generating time series with controlled non-stationarity and periodicities whilst overcoming the previously presented Shortcomings 1 and 2 of existing generation methods. We firstly explain an assumption to guarantee the existence of time series segments with non-stationarity and periodicities. Afterwards, we detail how our approach uses and combines calendar and statistical information to control non-stationarity and periodicities in generated time series.

Existence guarantee

To guarantee the existence of time series segments with controlled non-stationarity and periodicities, we assume the encoding \(f(X;\theta_{1})\) to be a bijective mapping, where \(g(Z;\theta_{2})\) is the inverse function of \(f(X;\theta_{1})\), i.e. \(f^{-1}(\cdot;\theta) := g(\cdot;\theta)\) and \(\theta = \theta_{1} = \theta_{2}\). This mapping guarantees that the image of the composition \(\text {Im}((f^{-1} \circ f)(\mathbb {X})) = \mathbb {X}\) includes the entire realisation space \(\mathbb {X}\). Therefore, for all possible samples from the latent space, a corresponding time series segment in the realisation space exists, i.e.

$$ \forall \mathbf{z} \sim P_{Z} \exists \hat{\mathbf{x}} : f^{-1}(\mathbf{z}; \theta) = \hat{\mathbf{x}}. $$
(8)

Besides guaranteeing the existence of a corresponding time series segment for all possible samples, this bijective mapping allows us to include additional inputs in the mapping. With these inputs, we can vary the properties of each generated time series segment:

Calendar information

The first additional input is calendar information such as the hour, day of week, month, or year. This information is implicitly present in a time series as a realisation of a stochastic process {Xt} but is currently not considered by generation methods. To include this information, we use the calendar information d as an additional input to our mapping, i.e.

$$ f: \mathbb{X} \rightarrow \mathbb{Z}, \mathbf{x}^{i} \mapsto f(\mathbf{x}^{i}; \mathbf{d}, \theta) = \mathbf{z}^{i}. $$
(9)

Considering calendar information enables us to generate time series segments with varying calendar information, even though these segments are generated from samples zi that are realisations of the same random variable Z. However, solely including calendar information does not allow us to vary the statistical properties of each generated time series segment.

Statistical information

To vary the statistical properties of each generated time series segment, we consider statistical information such as mean and variance as a second additional input. Therefore, we add statistical information s to our mapping, i.e.

$$ f: \mathbb{X} \rightarrow \mathbb{Z}, \mathbf{x}^{i} \mapsto f(\mathbf{x}^{i}; \mathbf{d}, \mathbf{s}, \theta) = \mathbf{z}^{i}. $$
(10)

Combining calendar and statistical information

Based on calendar and statistical information as inputs, we are able to generate time series segments with varying statistical properties dependent on the calendar information. As a result, for example, the mean of the transformed random variable \(f^{-1}(Z;\mathbf{d},\mathbf{s},\theta)\) depends on the calendar and statistical information, i.e.

$$ \mathbb{E}[f^{-1}(Z; \mathbf{d}, \mathbf{s}, \theta)] = \int f^{-1}(\mathbf{z}; \mathbf{d}, \mathbf{s}, \theta) P_{Z} d\mathbf{z}. $$
(11)

Similarly, the variance and the autocovariance also depend on the calendar and statistical information. Since the calendar and statistical information are included as additional inputs to the mapping, we can effectively control the statistical properties of the generated time series segments for the calendar information. This interplay between calendar and statistical information enables us to include and control non-stationarities and periodicities in the generated time series. Therefore, combining calendar and statistical information as additional inputs allows us to fulfil Requirements 1 and 2 mentioned above.

4.2 Exemplary implementation

This section presents the exemplary cINN-based implementation of our novel approach for generating time series with controlled non-stationarity and periodicities. After a brief overview of its architecture, we describe its training and generative process.

Architecture

To realise the previously defined bijective mapping that considers calendar and statistical information, we use a conditional Invertible Neural Network (cINN). To implement the exemplary cINN, we use FrEIA and PyTorch [18]. As shown in Fig. 1, the used cINN comprises 15 subsequent invertible coupling layers, one conditioning network q, and their trainable parameters 𝜃.

As coupling layers, we use the conditional affine coupling layer proposed by Ardizzone et al. [2], which extends RealNVP by Dinh et al. [4]. Each coupling layer contains two subnetworks. The architecture of the subnetworks is shown in Table 1. As inputs, each coupling layer takes the output of the previous coupling layer and the conditional information c. This conditional information is the calendar information d and the statistical information s mentioned above, both encoded by a separate conditioning network q.

Table 1 Implementation details of the cINN regarding the used subnetwork
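To make the architecture tangible, the following sketch builds a comparable conditional INN with FrEIA's SequenceINN and GLOWCouplingBlock. It is our minimal illustration, not the paper's implementation: the 15 conditional affine coupling layers follow the description above, while the subnetwork, the conditioning network q (detailed below), and all dimensions are assumptions rather than the exact layers of Tables 1 and 2.

```python
# Minimal conditional INN sketch with FrEIA; dimensions are assumptions.
import torch
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

SEG_LEN, RAW_COND_DIM, COND_DIM = 24, 6, 16   # assumed dimensions

def subnet(dims_in: int, dims_out: int) -> nn.Module:
    # stand-in for the subnetwork described in Table 1
    return nn.Sequential(nn.Linear(dims_in, 64), nn.ReLU(),
                         nn.Linear(64, dims_out))

# stand-in for the conditioning network q of Table 2: c = q(d, s)
q = nn.Sequential(nn.Linear(RAW_COND_DIM, 32), nn.ReLU(),
                  nn.Linear(32, COND_DIM))

cinn = Ff.SequenceINN(SEG_LEN)
for _ in range(15):                      # 15 conditional affine coupling layers
    cinn.append(Fm.GLOWCouplingBlock, cond=0, cond_shape=(COND_DIM,),
                subnet_constructor=subnet)

x = torch.randn(8, SEG_LEN)              # batch of time series segments
c = q(torch.randn(8, RAW_COND_DIM))      # encoded calendar + statistical info
z, log_jac_det = cinn(x, c=[c])          # forward encoding f(x; c, theta)
x_hat, _ = cinn(z, c=[c], rev=True)      # inverse generation f^{-1}(z; c, theta)
```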

The architecture of the conditioning network q is described in Table 2. The inputs of q are the calendar and statistical information. The calendar information d covers all time stamps for which a value should be generated. This information includes the hour of the day encoded as a sine function \(\sin \limits (\pi \cdot \text {hour}/23)\) and cosine function \(\cos \limits (\pi \cdot \text {hour}/23)\), the month of the year as a sine function \(\sin \limits (\pi \cdot \text {month}/11)\) and cosine function \(\cos \limits (\pi \cdot \text {month}/11)\), and the weekend as a Boolean. The statistical information s for the training is the mean of the time series sample, i.e. \(\hat {\mu }^{i} = \mathbb {E}[\mathbf {x}^{i}]\). In the generation, the mean is the desired mean of the generated sample.

Table 2 Implementation details of the cINN regarding the used conditioning network q
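A hedged sketch of how this conditional input could be assembled is given below; the feature order, the zero-based month index, and the use of pandas are our assumptions:

```python
# Build the raw conditional input: cyclic calendar encodings plus the mean.
import numpy as np
import pandas as pd

def conditioning_features(timestamps: pd.DatetimeIndex,
                          segment: np.ndarray) -> np.ndarray:
    hour = timestamps.hour.values                 # 0..23
    month = timestamps.month.values - 1           # assumed 0..11 indexing
    d = np.stack([np.sin(np.pi * hour / 23), np.cos(np.pi * hour / 23),
                  np.sin(np.pi * month / 11), np.cos(np.pi * month / 11),
                  (timestamps.dayofweek.values >= 5).astype(float)], axis=1)
    s = np.full((len(timestamps), 1), segment.mean())  # statistical information
    return np.concatenate([d, s], axis=1)              # shape (len, 6)

idx = pd.date_range("2011-01-01", periods=24, freq="h")
cond = conditioning_features(idx, np.random.default_rng(0).normal(size=24))
```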

Training

To train the used cINN, we extend the training proposed by Ardizzone et al. [2] with a statistical loss. The training is generally based on a maximum likelihood optimisation using the change of variable formula

$$ P_{X}(\mathbf{x};\mathbf{c}, \theta) = P_{Z}(f(\mathbf{x};\mathbf{c}, \theta)) \left\lvert \det \frac{\partial f}{\partial \mathbf{x}} \right\rvert $$
(12)

with the Jacobian matrix f/x [2]. To implement this optimisation, we select the standard normal distribution as the latent distribution, choose Gaussian priors, and apply Bayes’ theorem. The result is the maximum likelihood loss, i.e.

$$ \mathcal{L}_{\text{ml}} = \mathbb{E}_{i}\left[ \frac{\lVert f(\mathbf{x}^{i}; \mathbf{c}^{i}, \theta) \rVert_{2}^{2}}{2} - \log \lvert J^{i} \rvert \right] + \lambda \lVert \theta \rVert_{2}^{2}, $$
(13)

where Ji is the Jacobian corresponding to the i-th sample [2]. In addition to \(\mathcal{L}_{\text {ml}}\) and as an extension of Ardizzone et al. [2], we also minimise the difference between the desired mean and the mean of the generated time series segment, i.e.

$$ \mathcal{L}_{s} = \sqrt{(\mathbf{s}_{\mu} - \frac{1}{n} {\sum}_{j} {\hat{\mathbf{x}}_j})^{2}}, $$
(14)

where \({\hat {\mathbf {x}}_j}\) is an entry of the generated time series segment and sμ is the desired mean in the statistical information. Therefore, the overall loss used to train the cINN is

$$ \mathcal{L} = \mathcal{L}_{\text{ml}} + \lambda_{s} \mathcal{L}_{s}, $$
(15)

where \(\lambda_{s}\) is a hyperparameter weighting the influence of the statistical loss, distinct from the regularisation weight \(\lambda\) in (13).

We train the cINN for 200 epochs with this loss using the ADAM optimiser.
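A condensed training sketch for the combined loss in (13) to (15) could look as follows. It assumes the cinn and q objects from the architecture sketch above; the learning rate, the weight decay (covering the \(\lambda \lVert \theta \rVert_{2}^{2}\) term), the loss weight, and the placeholder batch are illustrative assumptions, not the paper's settings:

```python
import torch
# Assumes `cinn`, `q`, SEG_LEN, and RAW_COND_DIM from the sketch above.
opt = torch.optim.Adam(list(cinn.parameters()) + list(q.parameters()),
                       lr=1e-3, weight_decay=1e-5)
lam_s = 1.0                               # weight of the statistical loss L_s

# one placeholder batch instead of a real data loader
loader = [(torch.randn(8, SEG_LEN),       # segments x^i
           torch.randn(8, RAW_COND_DIM),  # raw calendar + statistical info
           torch.zeros(8))]               # desired means s_mu

for epoch in range(200):                  # 200 epochs as in the paper
    for x, raw_cond, s_mu in loader:
        c = q(raw_cond)
        z, log_jac_det = cinn(x, c=[c])
        # maximum likelihood loss, Eq. (13), under a standard normal latent
        loss_ml = torch.mean(0.5 * (z ** 2).sum(dim=1) - log_jac_det)
        # statistical loss, Eq. (14): deviation of the generated mean
        x_hat, _ = cinn(torch.randn_like(z), c=[c], rev=True)
        loss_s = torch.sqrt((s_mu - x_hat.mean(dim=1)) ** 2).mean()
        loss = loss_ml + lam_s * loss_s   # overall loss, Eq. (15)
        opt.zero_grad(); loss.backward(); opt.step()
```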

Generative process

The cINN learns a mapping from the realisation space to the latent space in the described training. Since this mapping is bijective, we can use the inverse direction as a generative process: To generate a time series segment \(\hat {\mathbf {x}}^{i}\), we firstly draw a sample zi from the random variable Z, choose the desired calendar and statistical information di and si, and apply the conditioning network to encode ci = q(di,si). With these inputs, we use the trained cINN in the inverse direction and obtain a time series segment \(\hat {\mathbf {x}}^{i} = f^{-1}(\mathbf {z}^{i};\mathbf {c}^{i}, \theta )\).

To create a time series longer than one segment, we utilise the calendar information previously included in the mapping to aggregate the generated time series segments \(\hat {\mathbf {x}}^{i}\). More specifically, we take advantage of the fact that the calendar information di of adjacent time series segments overlap: For adjacent segments, similar and related calendar information form the input of the conditional network. This input ensures that the sample distribution is conditioned on similar and related calendar information. This way, generated time series segments \(\hat {\mathbf {x}}^{i}\) with adjacent calendar information di are related, and we can calculate the median over all entries of a certain time t with

$$ \hat{x}_{t} = \text{Median}(\{\hat{\mathbf{x}}^{i}_{j} \mid \mathbf{d}^{i}_{j} \rightarrow t\}), \forall t \in [1,L], $$
(16)

where the condition \(\mathbf {{d^{i}_{j}}} \rightarrow t\) ensures that only entries of the time series segments \(\hat {\mathbf {x}}^{i}\) with the same time t are aggregated.
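The following schematic sketch (our illustration) mirrors this generative process and the median aggregation in (16): overlapping segments are generated for shifted calendar windows, and all entries mapping to the same time t are combined by the median. The stand-in generator replaces the trained cINN's inverse pass:

```python
# Schematic median aggregation of overlapping generated segments, Eq. (16).
import numpy as np

SEG_LEN, L, STEP = 24, 96, 6             # assumed lengths; STEP < SEG_LEN overlaps

def generate_segment(start: int) -> np.ndarray:
    # placeholder for x_hat^i = f^{-1}(z^i; q(d^i, s^i), theta)
    rng = np.random.default_rng(start)
    return rng.normal(size=SEG_LEN)

buckets = [[] for _ in range(L)]         # collected entries per time step t
for start in range(0, L - SEG_LEN + 1, STEP):
    seg = generate_segment(start)        # calendar info d^i starts at t = start
    for j, value in enumerate(seg):
        buckets[start + j].append(value) # condition d^i_j -> t = start + j

series = np.array([np.median(b) for b in buckets if b])
```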

5 Experiments

This section empirically evaluates the cINN mentioned above as an exemplary implementation of our proposed approach for generating time series with controlled non-stationarity and periodicities. After introducing the used data sets, we demonstrate how the cINN generates time series with controlled non-stationarity and periodicities. Finally, we compare the cINN to state-of-the-art time series generation benchmarks to assess the general quality of the generated time series. At this point, we also perform an ablation study to determine the influence of conditional information and the statistical loss.

5.1 Data sets

In order to comprehensively evaluate our proposed approach and the exemplary implementation, we aim to select data sets from different domains. Since data sets commonly used to evaluate time series generation methods are not publicly available or do not contain calendar information (e.g. [8, 22, 23]), we select three publicly available time series data sets. The selected data sets all contain univariate time series with calendar information but differ in their temporal resolution, variance, and periodicities:

Energy :

The first data set has an hourly temporal resolution, a low variance, and contains daily, weekly, and yearly periodicities. It consists of the electricity consumption of one client from the UCI Electricity Load Dataset [7].

Information :

The second data set has a daily temporal resolution, a medium variance, and contains no periodicities. It comprises the number of daily views of the Wikipedia page “Hitchhiker’s Guide to the Galaxy (video game)” from the Web Traffic Time Series Forecasting Dataset.

Mobility :

The third data set also has an hourly temporal resolution, a high variance, and contains daily and yearly periodicities. It contains the hourly records of rented bikes from the UCI Bikesharing Dataset [7, 9].

5.2 Generated time series with controlled non-stationarity and periodicities

To demonstrate how the cINN creates time series with controlled non-stationarity and periodicities as described in Requirements 1 and 2, we generate for each data set first a time series with controlled non-stationarity and then a time series with controlled periodicities. For each time series, we define the calendar and statistical information used in the generation. Afterwards, we evaluate the generated time series by visually inspecting them and calculating statistics corresponding to the requirements of non-stationarity and periodicities, respectively.

Controlled non-stationarity

To demonstrate controlled non-stationarity, we define calendar and statistical information for the cINN as follows. To determine the calendar information, we choose the years from 2011 to 2013 for the three data sets. As statistical information for the energy and information data sets, we specify a mean with a linear trend starting from 75 and ending at 125 and a yearly sinusoidal periodicity with an amplitude of 15. For the mobility data set, we specify a mean with a linear trend starting from 100 and ending at 250 and a yearly sinusoidal periodicity with an amplitude of 25.

The time series generated based on the selected calendar and statistical information are shown in Fig. 2 for the three data sets. For all data sets, the generated time series accurately reflect the specified mean, including the trend and the sinusoidal shape, while retaining the previously described original characteristics of the respective data set concerning variance and periodicities.

Fig. 2 To demonstrate controlled non-stationarity, a time series is generated using defined calendar and statistical information. For the energy, the information, and the mobility data sets, the generated time series is shown in blue and the used controlled mean in orange

To further examine the generated time series according to the previously defined requirement of non-stationarity, we determine corresponding statistics. On four three-month cut-outs, we compare the mean, the variance, and the autocovariance with a fixed lag of half a year for two generated time series of each data set: one with a constant mean as statistical information and one with a controlled mean as statistical information to control the non-stationarity.

For both generated time series of each data set, Table 3 presents the mean, the variance, and the autocovariance of the four cut-outs. For all data sets, the calculated statistics of the generated time series with constant mean are similar for the considered cut-outs. In contrast, the calculated statistics of the generated time series with controlled non-stationarity differ across the considered cut-outs of each data set and are thus time-dependent. This shows that the generated time series are non-stationary according to Requirement 1.

Table 3 Statistics according to Requirement 1, i.e. mean, variance, and autocovariance, for the generated time series with constant mean as statistical information and with a controlled mean as statistical information to control the non-stationarity

Controlled periodicities

To demonstrate controlled periodicities, we also define calendar and statistical information for the cINN. To determine the calendar information, we again choose the years from 2011 to 2013. As statistical information for the three data sets, we specify a constant mean of 150 and a yearly sinusoidal periodicity with an amplitude of 50.

For the three data sets, Fig. 3 shows the resulting generated time series and the used mean. For all data sets, the generated time series follows the periodicity defined by the mean and retains the variance and periodicities of the respective data set.

Fig. 3 To demonstrate controlled periodicities, a time series is generated using defined calendar and statistical information. For the energy, the information, and the mobility data sets, the generated time series is shown in blue and the used controlled mean in orange

To further analyse the generated time series according to the previously defined requirement of periodicities, we examine the autocovariance structure of the generated time series. More specifically, for the generated time series of each data set, we calculate the autocovariance between the first three months and the rest of the time series.

Figure 4 shows the yearly and the daily autocovariance of the generated time series of each data set. Note that no daily autocovariance structure exists for the information data set because this data set has a daily resolution and thus no daily periodicities. The autocovariance has a daily and yearly reoccurring structure for all generated time series. According to Requirement 2, the generated time series thus contains periodicities. These periodicities are more regular for the energy and mobility data sets than for the information data set.

Fig. 4 The yearly and daily autocovariance structure according to Requirement 2 of the generated time series with controlled periodicities based on the three data sets. The autocovariance is calculated between the first three months and all other three-month segments of the time series. Note that the information data set has a daily resolution, so we only calculate the yearly autocovariance

5.3 Quality of generated time series

To assess the general quality of the generated time series, we compare the cINN to state-of-the-art benchmarks on the three selected data sets and perform an ablation study regarding the influence of the conditional information and the statistical loss. We first introduce the three used evaluation metrics and the six benchmarks before presenting the benchmarking and the ablation study results. For this evaluation, we run each generation method three times. The cINN uses the calendar information and a calculated rolling mean of the respective data set for the generation.

5.3.1 Metrics

To assess the quality of the generated time series segments, we use three metrics. Firstly, we apply the train-on-synthetic-test-on-real evaluation [8] to obtain a predictive score. The predictive score measures the usefulness of the generated time series. Secondly, we make use of a discriminator to obtain a discriminative score. With the discriminative score, we examine the distinguishability of the generated and the original time series [23]. Thirdly, we measure the training time of the generation methods to assess their computational cost. In the following, we detail all metrics.

Predictive score

For the train-on-synthetic-test-on-real evaluation [8], we train a predictive model on the generated time series and test the model on the original time series to obtain a predictive score. As the predictive score, we use the mean absolute error (MAE), the mean absolute scaled error (MASE), and the root mean squared error (RMSE).

The architecture of the predictive model is a three-layered fully connected neural network with ten hidden neurons. The model is designed for time series segments of 24 hours, where the first 23 hours are used to forecast the last value. We use a ReLU activation function for the hidden layers and a linear activation function for the output layer.

To implement the predictive model, we use pyWATTS [12] with Keras [3]. We train the implemented predictive model for 100 epochs and apply early stopping during the training process. To obtain more robust results, we train the predictive model five times on the generated time series of each data set.
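A hedged sketch of this train-on-synthetic-test-on-real evaluation is given below, using plain Keras instead of the pyWATTS pipeline; the exact layer split of the three-layer network, the early-stopping patience, and reporting the MAE as the score are our assumptions:

```python
# Train on synthetic (N, 24) segments, evaluate on real segments (TSTR).
import numpy as np
from tensorflow import keras

def predictive_score(synthetic: np.ndarray, real: np.ndarray) -> float:
    model = keras.Sequential([
        keras.layers.Input(shape=(23,)),          # first 23 hours as features
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(1, activation="linear"),  # forecast the 24th value
    ])
    model.compile(optimizer="adam", loss="mae")
    model.fit(synthetic[:, :23], synthetic[:, 23], epochs=100, verbose=0,
              validation_split=0.1,
              callbacks=[keras.callbacks.EarlyStopping(patience=5)])
    return float(model.evaluate(real[:, :23], real[:, 23], verbose=0))
```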

Discriminative score

For the evaluation by the discriminator [23], we merge the generated and the original time series. We label the generated time series with 0 and the original time series with 1. Afterwards, the merged data set is split into a training (70%) and test set (30%). The discriminative model is then trained on the training set and the discriminative score is calculated on the test set. We use ∣Accuracy − 0.5∣ as discriminative score, where Accuracy refers to the performance of the discriminative model on the test set.

The architecture of the discriminative model is a three-layered fully connected neural network. The network uses tanh as an activation function for the hidden layers and softmax for the output layer.

To implement the discriminative model, we also use pyWATTS [12] with Keras [3]. We train the implemented discriminative model on the CPU for ten epochs using the ADAM optimiser and the binary cross-entropy loss. We run the training five times on each generated time series to obtain more robust results.
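The discriminative score can be sketched analogously; the hidden layer sizes and the use of scikit-learn for the split are assumptions, and the two-unit softmax with categorical cross-entropy stands in for the binary cross-entropy described above:

```python
# |accuracy - 0.5| of a classifier separating generated from original segments.
import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

def discriminative_score(generated: np.ndarray, original: np.ndarray) -> float:
    X = np.concatenate([generated, original])
    y = np.concatenate([np.zeros(len(generated)), np.ones(len(original))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    model = keras.Sequential([
        keras.layers.Input(shape=(X.shape[1],)),
        keras.layers.Dense(32, activation="tanh"),
        keras.layers.Dense(32, activation="tanh"),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_tr, y_tr, epochs=10, verbose=0)
    _, acc = model.evaluate(X_te, y_te, verbose=0)
    return abs(acc - 0.5)
```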

Computational cost

To evaluate the computational cost, we measure the training time of the evaluated generation methods in seconds. For this, we measure the time required for training each generation method three times and calculate the respective average to obtain robust results. For comparability, we perform the training on the same hardware: an off-the-shelf Lenovo Thinkpad T490 laptop with an Intel i7-8565U processor and 16 GB of RAM.

5.3.2 Benchmarks

As benchmarks, we select six generation methods. As state-of-the-art benchmark generation methods, we consider COT-GAN [22], RGAN and RCGAN [8], and TimeGAN [23]. As baseline generation methods, we additionally consider a simple GAN and a simple VAE.

COT-GAN

For the implementation of COT-GAN [22], we use the publicly available source code. We only adapt the data loading functionality to apply COT-GAN to our data.

RGAN and RCGAN

For the implementation of RGAN and RCGAN [8], we use the publicly available source code. For our experiments, we only adapt the setting file test.txt and the data_utils.py to load our data and ensure the training of RGAN and RCGAN on our data is robust. For RCGAN, we use the same calendar and statistical information as the implemented cINN.

TimeGAN

To implement TimeGAN [23], we use the publicly available source code. Compared to the source code, we adapt the data loader of TimeGAN to apply it to our data sets.

GAN

To implement the GAN, we use Keras [3] and Tensorflow [1]. Table 4 provides details on the generator and the discriminator of the implemented GAN. The dimension of the random noise input of the implemented generator is 32. We train the GAN for 200 epochs and use the binary cross-entropy as the loss function for the discriminator.

Table 4 Implementation details of the GAN used as benchmark. Note that the stride is 1 for each convolutional layer

VAE

For the implementation of the VAE, we use Keras [3] and Tensorflow [1]. Table 5 details the encoder and the decoder of the implemented VAE. The latent dimension of the VAE is 2. When training the model, we use the Kullback-Leibler divergence and the RMSE as the loss functions. We train the VAE for 2000 epochs and apply early stopping with a patience of 10 epochs.

Table 5 Implementation details of the VAE used as benchmark. Note that the stride is 1 for each convolutional layer

5.3.3 Benchmarking results

With the cINN and the six selected benchmark methods, we generate time series segments for the three selected data sets. We compare the related predictive score, discriminative score, and training time in the following.

Predictive score

For all three data sets, the average, minimum, and maximum predictive scores of the cINN and the six benchmark methods are shown in Table 6. We also report a predictive model trained on the original time series as original data in the table, which corresponds to a train-on-real-test-on-real evaluation.

Table 6 The average, minimum, and maximum predictive scores of the cINN and the six benchmark methods on the energy, information, and mobility data sets. For comparison, a predictive model trained on the original time series is additionally reported as original data. The lower, the better

The cINN outperforms all benchmark methods except COT-GAN on the three selected data sets. While the cINN performs better than COT-GAN on the mobility data set, it is on par with COT-GAN on the energy and information data sets. The cINN is also on par with the GAN for the information data set. Moreover, the cINN is almost on par with the predictive model trained on the original time series. Considering the MASE, we also observe that the performance of the generation methods depends on the data set. More specifically, most generation methods obtain their best relative predictive score, measured by the MASE, on the mobility data set. However, compared to the predictive model trained on the original data, most generation methods achieve their worst predictive score on that data set.

Discriminative score

The average, minimum, and maximum discriminative scores of the cINN and the six benchmark methods on the three data sets are shown in Table 7. We also observe that the cINN outperforms all benchmark methods except COT-GAN for the discriminative score. However, while the cINN performs better than COT-GAN on the information and mobility data sets, it performs slightly worse on the energy data set. Moreover, we observe that the discriminative score of each generation method is more similar across the different data sets than the predictive score.

Table 7 The average, minimum, and maximum discriminative scores of the cINN and the six benchmark methods on the energy, information, and mobility data sets. The lower, the better

Computational cost

The computational cost of the cINN and the six benchmark methods in terms of the average training time is presented in Table 8. Overall, we observe that the simple generation methods, namely the GAN and the VAE, have the lowest training times. However, the cINN has the lowest training times when only considering state-of-the-art generation methods. Note that the training times on the information data set are shorter due to the shorter length of the related time series segments.

Table 8 The average training time in seconds of the cINN and the six benchmark methods for the three selected data sets

5.3.4 Ablation study

To determine the influence of the conditional information comprising calendar and statistical information as well as the statistical loss defined in (14), we perform an ablation study for the predictive and the discriminative scores. Based on the three data sets, we compare the cINN using calendar and statistical information and the statistical loss (cINN) to a cINN using only the calendar and statistical information (cINN Stats + Cal). Additionally, we compare cINNs using only statistical information (cINN Stats), calendar information (cINN Cal), and no information (INN).

Predictive score

The predictive scores of the different cINNs for the three data sets are shown in Table 9. We observe that the considered cINNs generally perform similarly on the three data sets. While the INN achieves the best performance on the energy data set by a narrow margin and the cINN using only calendar information achieves the best results on the mobility data set, all cINNs are on par on the information data set.

Table 9 Ablation study comparing different cINNs with respect to the average, minimum, and maximum predictive score. The lower, the better

Discriminative score

The discriminative scores of the different cINNs for the three data sets are presented in Table 10. We observe that the considered cINNs perform similarly on the energy and information data sets. However, the cINNs using statistical information perform better on the mobility data set than the cINNs that do not consider this information.

Table 10 Ablation study comparing different cINNs with respect to the average, minimum, and maximum discriminative score. The lower, the better

6 Discussion

This section discusses the previously reported results of the experiments, limitations, and benefits of the cINN as the exemplary implementation of the proposed approach.

In the experiments, we observe that, based on defined conditional information, the cINN can generate time series with controlled non-stationarity and periodicities while retaining the characteristics of the original data set. Furthermore, the cINN outperforms or is on par with the selected benchmark generation methods regarding the predictive and discriminative scores. Also, the cINN requires the lowest training time of the considered state-of-the-art generation methods, probably due to the cINN’s lower number of parameters or non-recurrent architecture. Additionally, in the ablation study, we observe that considering calendar and statistical information as conditional information only partly influences the predictive and discriminative score. From these observations, we conclude that the cINN, as an exemplary implementation of the proposed approach, can control the non-stationarity and the periodicities of generated high-quality time series with arbitrary length.

Despite these promising results, we note that the performed experiments may have limitations. One limitation could be that the selected data sets only have an hourly or daily resolution and only contain moderate variations. Another limitation might be that we do not evaluate how the described method for aggregating generated time series segments affects the performance of our approach. Furthermore, we only use univariate time series in our evaluation, even though our approach can be extended to multivariate time series.

Concerning the proposed approach, one limitation could also be the required bijective mapping realised by the cINN in the exemplary implementation. While using the bijective mapping guarantees the existence of all generated time series segments, we assume that it is not a necessary requirement for the proposed approach. Therefore, the proposed approach should also be effective with other generative mappings without the guaranteed existence of the generated time series. These mappings could approximate the inverse function, as in VAEs, or be trained by a discriminator, as in GANs. Extending our method to these generative mappings could lead to a more general framework to control non-stationarity and periodicities when generating time series. Another limitation could be that the proposed approach requires calendar information. Although this calendar information is present in a wide range of real-world time series, some time series, such as audio time series, do not contain such information. In addition to calendar information, the proposed approach also considers statistical information to generate time series with desired properties. In real-world applications, these desired properties may only be partially known and thus may need to be approximated.

However, given calendar and statistical information, the proposed approach enables controlling non-stationarity and periodicities in generated time series in many real-world applications, especially where real-world time series are non-existent, only partly available, not usable due to privacy concerns, or expensive to measure. In these cases, the proposed approach allows the generation of specific and diverse scenarios with non-stationarity and periodicities in time series. These scenarios could then be used to investigate unusual phenomena and various applications such as forecasting and imputation. Hence, our approach noticeably extends the capabilities of existing time series generation methods and offers new opportunities for purposeful time series generation and analysis.

7 Conclusion

The present paper presents a novel approach to control non-stationarity and periodicities with calendar and statistical information when generating time series. For this purpose, we firstly define the requirements for generation methods to generate time series with non-stationarity and periodicities, which we show are not fulfilled by existing generation methods. Secondly, we formally describe the novel approach for controlling non-stationarity and periodicities in generated time series. Thirdly, we introduce an exemplary implementation of this approach using a conditional Invertible Neural Network (cINN) that preprocesses calendar and statistical information as conditional input with a conditioning network.

To evaluate the proposed cINN, we examine its capabilities to generate time series with controlled non-stationarity and periodicities in experiments with real-world data sets. We also compare the general quality of its generated time series to state-of-the-art benchmark generation methods and perform an ablation study to analyse the effects of the conditional information. The presented experiments show that the cINN can generate time series with controlled non-stationarity and periodicities while retaining the characteristics of the original data set. Furthermore, the cINN outperforms or is on par with the selected benchmark generation methods regarding the predictive and discriminative scores. The cINN also requires the lowest training time of the considered state-of-the-art generation methods.

Future work could relax the assumption of a bijective mapping by applying the proposed approach to other generative models to control non-stationarity and periodicities during time series generation. This way, relaxing this assumption could enable a more general framework to control non-stationarity and periodicities during time series generation. Moreover, future work could extend the proposed approach to multivariate time series and time series without calendar information. Similarly, future work could focus on identifying additional controllable properties of time series and incorporating them into the proposed approach.