1 Introduction

There is a vast literature on the classification of time series with applications in many fields, amongst them being medicine, seismology, finance and quality control. Many classification techniques are based on spectral analysis methods (Kakizawa et al. 1998; Maharaj 2002; Shumway 2003; Sakiyama and Taniguchi 2004). Some of the other techniques are based on neural networks (Pham and Chan 1998; Nigam and Graupe 2004), fuzzy inference systems (Kannathal et al. 2005), and wavelets (Maharaj and Alonso 2007). For some of these techniques, consideration has been given to the time varying nature of spectral density functions, in particular, Shumway (2003) and Maharaj (2002). However, no specific consideration has been given to the time varying nature of the amplitude of the cyclical components of the time series for classification purposes.

The aim of this paper then is to investigate the time varying nature of the amplitude of the dominant cyclical component of each of the time series under consideration for discriminating between the patterns of these time series. We will demonstrate through simulation studies and applications to well-known data sets that time varying amplitudes have very good discriminating power and hence their use in classical discriminant analysis is a simple alternative to more complex methods of time series discrimination.

The advantage of using this procedure over some of the other procedures in the literature is that this fairly simple approach to constructing the discriminating variables, i.e., the time varying amplitudes, does not require making choices. For other methods that consider the time varying nature of the time series in the frequency domain, e.g., those of Maharaj (2002) and Shumway (2003), choices have to be made with regard to block lengths of the time series and bandwidths of the smoothing windows before the discriminating variables are constructed; and for the wavelet approach of Maharaj and Alonso (2007), choices have to be made with regard to block lengths of the time series, types of wavelet filters and the number of levels before the construction of the discriminating variables. Furthermore, with regard to the neural network approaches such as those of Pham and Chan (1998) and Nigam and Graupe (2004), choices have to be made about the number of firing neurons and the process of training the networks. Hence, obtaining the classification results with these methods involves a large number of steps.

Since complex demodulation is a local version of harmonic analysis, we first briefly discuss some aspects of harmonic analysis. Harmonic regression models are useful in describing time series with one or more cycles (refer to Bloomfield 2000 for more details). A harmonic regression model with a single cycle is defined as

$$ x_{t} = \mu+ A \cos\omega t + B \sin\omega t + \varepsilon_{t} $$
(1)

where \(x_{t}\) is the time series, μ is the mean of the underlying time series, \(A=R\cos\phi\) and \(B=-R\sin\phi\). R is the amplitude or height of the cycle peaks, ϕ is the phase or the location of the peaks relative to time zero, \(\omega= \frac{2\pi}{\tau}\) is the frequency and τ is the cycle length, i.e., the distance from one peak to the next. \(\varepsilon_{t}\) is a white noise error term that is uncorrelated with the sine and cosine terms.

By varying the A and B parameters, which indicate how much weight is given to the sine and cosine components, it is possible to generate a sinusoidal waveform of period τ with any particular amplitude and phase. Sine and cosine functions have the same waveform shape but differ in phase: the cosine function has a peak value of one at t=0, whereas the sine function has a peak value of one a quarter of a cycle later. The sine and cosine functions of period τ are orthogonal, that is, they are uncorrelated. Therefore, by varying the relative size of the A and B coefficients, it is possible to generate all possible sinusoids of period τ; sinusoids with any mean μ, amplitude R, and phase ϕ. Given an assumed value of τ, the period or cycle length, the remaining three parameters, i.e., the mean, amplitude and phase of the cycle, can be estimated using least squares regression to obtain the best possible fit to an observed time series. The estimated amplitude and phase are obtained from \(\widehat{R} = \sqrt{\hat{A}^{2} + \hat{B}^{2}}\) and \(\tan2\pi\hat{\phi} = -\hat{B}/\hat{A}\), respectively.
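As an illustration, the following Matlab sketch (our own, with illustrative names and parameter values; it is not the paper's code) fits the single-cycle model of Eq. (1) by least squares for an assumed cycle length:

    % Least squares fit of the harmonic model in Eq. (1) for an assumed
    % cycle length tau; all names and values here are illustrative.
    T   = 300;  t = (1:T)';
    tau = 12;   w = 2*pi/tau;              % assumed cycle length and frequency
    x   = 5 + 3*cos(w*t - 1) + randn(T,1); % synthetic series with one cycle
    X   = [ones(T,1), cos(w*t), sin(w*t)]; % regressors for mu, A and B
    b   = X \ x;                           % b = [mu; A; B]
    Rhat   = sqrt(b(2)^2 + b(3)^2);        % estimated amplitude R
    phihat = atan2(-b(3), b(2));           % estimated phase (radians)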

Now if a time series displays multiple cycles, these can be incorporated into the harmonic model as follows:

$$ x_{t} = \mu+ \sum_{i=1}^{k} (A_{i} \cos\omega_{i} t + B_{i} \sin \omega_{i} t) + \varepsilon_{t} $$
(2)

where k is the number of cycles.
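The same least squares construction extends to Eq. (2) by stacking one cosine and one sine column per assumed frequency; a minimal sketch, again with illustrative cycle lengths of our choosing:

    % Least squares fit of the k-cycle model in Eq. (2); the two assumed
    % cycle lengths below are illustrative.
    T   = 300;  t = (1:T)';
    tau = [12 6];                          % assumed cycle lengths
    X   = ones(T,1);                       % column for mu
    for j = 1:numel(tau)
        w = 2*pi/tau(j);
        X = [X, cos(w*t), sin(w*t)];       %#ok<AGROW> % columns for A_j, B_j
    end
    x = 4*cos(2*pi*t/12) + 2*sin(2*pi*t/6) + randn(T,1);
    b = X \ x;                             % b = [mu; A_1; B_1; A_2; B_2]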

While harmonic regression models can capture multiple cycles that may be present in a time series, they are unable to capture the time varying nature of the cycles. The dynamic harmonic regression (DHR) model developed by Young et al. (1999) is better able to do so. The DHR model can be considered as an extension of the harmonic regression model defined in Eq. (1) in which the amplitude and phase of the harmonic components can vary as a result of estimated temporal changes in the parameters \(A_{i}\) and \(B_{i}\), which now become \(A_{it}\) and \(B_{it}\). The time varying parameters are estimated through a state space and frequency domain formulation. Young et al. (1999), who state that their DHR model is asymptotically equivalent to other time varying approaches but is computationally more attractive, give details of this estimation process. However, the estimation of the time varying parameters is quite complex and involves several steps. In contrast, the estimation of the time varying amplitude of a particular cycle, i.e., \(\widehat{R}_{t} = \sqrt{\hat{A}^{2}_{t} + \hat{B}^{2}_{t}}\), can be achieved much more easily using complex demodulation. Since our aim is to use the time varying amplitude of the dominant cycle to discriminate between patterns of cyclical time series, we estimate it using complex demodulation. We briefly discuss complex demodulation as put forward by Bloomfield (2000) in Sect. 2.

In order to distinguish between the patterns of these cyclical time series, the time varying amplitudes will be the input feature variables for classical discriminant analysis, which we briefly describe in Sect. 3. To evaluate the performance of this method of classification, we conduct simulation studies using different scenarios, and we also compare the performance of this method to some of the existing methods. We also consider two well-known applications which have been used by other authors to evaluate their time series classification procedures. The simulation studies and applications are described and discussed in Sects. 5 and 6, respectively.

Note

All programming code to generate output was written in Matlab and is available from the author on request.

2 Complex demodulation

Suppose that the deterministic part of a zero mean stationary time series with a single cycle or periodic component can be represented by

$$ x_{t} = R_{t}e^{2\pi i (\omega_{0} t + \phi_t)} $$
(3)

where \(R_{t}\) is the slowly changing amplitude, \(\phi_{t}\) is the slowly changing phase and \(\omega_{0}\) is the known frequency. Hence, the cycle length \(\tau_{0}=2\pi/\omega_{0}\) is also known. In practice, this frequency will be identified as that corresponding to the dominant peak of the periodogram of the time series under consideration.

The aim of complex demodulation is to extract approximations to the series \(R_{t}\) and \(\phi_{t}\). Hence, complex demodulation is regarded as a local version of harmonic analysis because its aim is to describe the amplitude and phase of an oscillation as they change over time. Constructing a new series from Eq. (3) gives

$$ y_{t} = x_{t} e^{-2\pi i \omega_{0} t} = R_{t}e^{2\pi i \phi_t} $$
(4)

This new series \(y_{t}\) is said to be obtained from \(x_{t}\) by complex demodulation. Then, the estimated time varying amplitude is \(\widehat{R}_{t} = |y_{t}|\), and the estimated time varying phase is obtained from \(e^{2\pi i\hat{\phi}_{t}} = y_{t}/|y_{t}|\). In practice, a plot of the time varying amplitude obtained by complex demodulation is useful for determining whether the constant amplitude so often assumed for seasonal and other cyclical time series is in fact justified. If it is not, the time series under consideration can be modelled by means of Eq. (5) instead of Eq. (1).
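A minimal Matlab sketch of the demodulation step on a synthetic series with a known dominant frequency (all names and values are illustrative):

    % Complex demodulation (Eq. (4)) at a known frequency w0, expressed
    % in cycles per observation; the series below is synthetic.
    T  = 512;  t = (0:T-1)';
    w0 = 1/16;                          % known dominant frequency
    Rtrue = 1 + 0.5*sin(2*pi*t/T);      % slowly varying true amplitude
    x  = Rtrue .* cos(2*pi*w0*t);       % real-valued test series
    y  = x .* exp(-2i*pi*w0*t);         % demodulated series y_t
    Rt = abs(y);                        % raw amplitude estimate |y_t|
    pt = angle(y) / (2*pi);             % raw phase estimate (in cycles)
    % For a real-valued series, Bloomfield (2000) doubles the smoothed
    % modulus; here we follow the text's Eq. (4) and smooth Rt next.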

In practice, a series that is analysed does not consist solely of a single periodic component; there is also a noise component. Hence, in general, the series would be represented by

$$ x_{t} = R_{t}e^{2\pi i (\omega_{0} t + \phi_t)} + \varepsilon_{t} $$
(5)

Hence \(\widehat{R}_{t}\) will tend to be noisy, and to extract a smooth version of the time varying amplitude, it must be filtered. Bloomfield (2000) suggests using various types of linear filters, a simple moving average filter being one of them. To illustrate this concept of the time varying amplitude and its usefulness in practice, we examine the well-known Wolf's sunspot time series, an index of the surface activity of the sun. See Bloomfield (2000) for more details about this series. Figure 1 shows the annual average sunspot index for the years 1700 to 2000, the periodogram of this series and a smoothed version of the time varying amplitude.

Fig. 1 Wolf's sunspot time series

While the series appears to contain a succession of peaks every eleven years or so, there does not appear to be a tendency for them to occur in a regular fashion, i.e., the amplitude of the cycles differs considerably over the given time period. Hence, it is clear that the amplitude is time varying. From the periodogram of this series it can be observed that the most dominant peak occurs at a frequency of about ω=0.57, which corresponds to a period of about τ=2π/ω≈11 years.

Using this frequency, the time varying amplitude is obtained by complex demodulation (Eq. (4)) and then smoothed with a simple moving average filter. Bloomfield (2000) suggests an equally weighted moving average filter of 11 years, which corresponds to the period associated with the dominant peak in the periodogram. With this combination of the frequency corresponding to the period of 11 years and the moving average filter of order 11, complex demodulation can be interpreted as a local version of harmonic analysis, in that the smoothed demodulated series is the discrete Fourier transform of \(x_{t-5}, x_{t-4},\ldots,x_{t+4}, x_{t+5}\), evaluated at the frequency of 1/11 cycles per year. The variations in the amplitude over time can be clearly observed in the third panel of Fig. 1. It should be noted that the sunspot data contain components at frequencies other than the roughly 11-year cycle, which corresponds to the basic frequency. However, these other cycles or frequency components are smoothed out when applying the moving average filter (see Bloomfield 2000, p. 126).
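Continuing the sketch above, the smoothing step with an equally weighted moving average whose order equals the identified cycle length might look as follows:

    % Equally weighted moving average of order equal to the identified
    % cycle length, applied to the raw amplitude Rt from the sketch
    % above (there, 1/w0 = 16); Bloomfield's sunspot example uses 11.
    m  = 16;
    Rs = conv(Rt, ones(m,1)/m, 'same');    % smoothed time varying amplitude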

Other series, e.g., retail sales, could have twelve-month as well as six-month cycles, and these can often be observed from the graph of the time series. The cycles will nevertheless be apparent as peaks in the periodogram of the time series. Applying the smoothing filter to the time varying amplitude associated with the dominant cycle will have the effect of smoothing out the six-month cycle and other lower order cycles that may exist in the original time series.

In what follows, a simple moving average filter of order of the identified dominant cycle length, using equal weights, will be applied to the estimated time varying amplitude before being input into classical discriminant analysis to distinguish between the patterns of cyclical time series under consideration.

3 Discriminant analysis

Classical discriminant analysis is a technique for classifying a set of observations into known groups. The purpose is to determine the group membership of an observation based on a set of input variables in some optimal manner. In the context of discriminating between time series patterns, the observations are time series and the input variables are features associated with the time series.

In linear discriminant analysis, it is assumed that the groups have equal covariance matrices and differ only in their means, while in quadratic discriminant analysis the covariance matrices of the groups are not assumed to be equal. Refer to McLachlan (2004) for more details. There are several ways to evaluate the performance of a discriminant analysis procedure. One method is to split the sample into training and hold-out samples and evaluate the error rate on the hold-out set, which was not used in deriving the classification rule. Another method is the hold-out-one technique of cross-validation, which is particularly useful if the sample sizes are not very large. This technique holds out the observation to be classified and derives the classification function from the remaining observations (see Lachenbruch and Mickey 1968 for more details). The procedure is repeated for each member of the sample and an overall error rate is determined.
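A sketch of the hold-out-one procedure with a linear discriminant, assuming the Matlab Statistics Toolbox function classify, a feature matrix F (one row of feature variables per series) and a numeric label vector g; F and g are assumed inputs here:

    % Hold-out-one cross-validation with a linear discriminant.
    % F: n-by-p matrix of feature variables; g: n-by-1 numeric labels.
    n = size(F,1);  miss = 0;
    for i = 1:n
        idx  = [1:i-1, i+1:n];                           % leave series i out
        pred = classify(F(i,:), F(idx,:), g(idx), 'linear');
        miss = miss + (pred ~= g(i));
    end
    errorRate = miss / n;    % overall hold-out-one error rate

Replacing 'linear' with 'quadratic' gives the quadratic discriminant variant.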

In the typical discriminant analysis approach, the training data are taken to be independent for the purposes of forming the sample discriminant rule. In many studies in the literature, this is not the case. The effect of correlated training data on classification error rates has been investigated by several authors. In particular, for training data following autoregressive (AR) processes of order one, Lawoko and McLachlan (1983) demonstrated that group-specific unconditional error rates increase with positive correlation, and with the dimension of the feature vector.

Assumptions of multivariate normality are made about the probability distribution of the group feature variables in linear and quadratic discriminant analysis. However, in most real applications, this assumption might not be strictly met. Many authors have conducted studies on the robustness of the discriminant functions and have found that some of them are fairly robust to departures from the assumed models with little or no modification (see, e.g., Chinganda and Subrahaniam 1979; Rawlings and Faden 1986; Fatti et al. 1982).

For our study, for each time series, the input variables or features are the smoothed estimated time varying amplitudes \(\widehat{R}_{t}\), t=1,2,…,T, where T is the length of the time series. Even though in most cases the \(\widehat{R}_{t}\) values will be correlated and will not satisfy the normality assumption underlying the linear and quadratic discriminant functions, we proceed with the classical discriminant analysis procedures after carrying out normalising transformations on the data. We derive an expression for the distribution of the time varying amplitude in the Appendix and show that it is not of closed form.

4 Implementation

The implementation of the discrimination procedure using time varying amplitudes consists of the following three steps; a sketch combining them follows the list.

  1. For each time series under consideration, identify the frequency, and hence the cycle length, corresponding to the dominant peak in the periodogram.

  2. Estimate the time-dependent amplitude \(R_{t}\) corresponding to the cycle length obtained in Step 1 by complex demodulation, and smooth this estimate with an equally weighted moving average filter of order equal to the cycle length.

  3. Use the smoothed time varying amplitude ordinates as the input variables in the classical discriminant procedure.
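A minimal Matlab sketch combining the three steps for a single series; the series and its parameters below are illustrative, and this is our own illustration rather than the paper's code:

    % Steps 1-3 for a single series x (column vector).
    x    = 5*sin(2*pi*(1:256)'/8) + randn(256,1);  % illustrative series
    T    = numel(x);
    P    = abs(fft(x - mean(x))).^2 / T;       % Step 1: periodogram
    f    = (0:T-1)'/T;                         % frequencies, cycles/observation
    band = 2:floor(T/2);                       % exclude frequency zero
    [~, k] = max(P(band));
    w0   = f(band(k));                         % dominant frequency
    tau  = round(1/w0);                        % corresponding cycle length
    y    = x .* exp(-2i*pi*w0*(0:T-1)');       % Step 2: complex demodulation
    Rs   = conv(abs(y), ones(tau,1)/tau, 'same');  % smooth over one cycle
    % Step 3: the rows Rs' from all series form the feature matrix for
    % the discriminant analysis of Sect. 3.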

Remark

  1. The method proposed here applies to time series that are mean stationary but not necessarily variance stationary. If the time series under consideration are not mean stationary, the trend can be filtered out by one of the available trend filtering methods, and Steps 1 to 3 applied to the filtered time series.

  2. If the periodogram of a series shows a number of dominant peaks close to each other and there is difficulty identifying the frequency at which the most dominant cycle appears, a smoothed periodogram, which reduces the noise, can be generated and the dominant cycle identified from it.

  3. While for a given time series the number of smoothed time varying amplitude ordinates is equal to the length of the time series, using these ordinates as the discriminating variables is preferable to using the actual time series observations, because the noise in the actual time series would affect the discrimination process. If we were instead to use the periodogram ordinates of the time series as the discriminating features, the number of features would be halved, but the noise present in the periodogram would also affect the discrimination process. Shumway (2003) uses smoothed periodograms in discriminating between patterns of time series signals, which reduces the number of discriminating variables even further and removes the noise present in the periodogram, but decisions still have to be made regarding the bandwidth of the smoothing window. In contrast, no decisions have to be made in smoothing the time varying amplitude, because the smoothing order is the cycle length identified at the first step of the process for obtaining the time varying amplitude.

  4. To address the potential computational problems that may arise when a large number of variables, viz., the time varying amplitude ordinates, is input into the discriminant procedure, we will use diagonal discriminant procedures if necessary (see Dudoit et al. 2002); a minimal sketch follows this list.
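A minimal sketch of such a diagonal variant, again assuming the Statistics Toolbox function classify and illustrative data; the 'diaglinear' type restricts the pooled covariance estimate to its diagonal:

    % Diagonal linear discriminant (see Dudoit et al. 2002) on
    % illustrative data: 30 series, 256 amplitude ordinates each.
    Ftrain = randn(30, 256);
    gtrain = [ones(15,1); 2*ones(15,1)];
    Fnew   = randn(1, 256);
    pred   = classify(Fnew, Ftrain, gtrain, 'diaglinear');
    % The quadratic analogue uses the 'diagquadratic' type.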

5 Simulation studies

5.1 Experimental design

Two sets of simulation studies were conducted, in which two groups of 15 series each, of lengths T=64, 256 and 1024, were generated from various autoregressive moving average (ARMA) processes. Each series was subjected to complex demodulation (Eq. (4)), with the time varying amplitude estimated using the cycle length τ associated with the dominant peak of the periodogram of the series. The time varying amplitude was then filtered using a moving average of order [τ]. Both linear and quadratic discriminant analyses were then applied to the filtered time varying amplitudes (the feature variables) of the 30 series. Refer to McLachlan (2004) for details on how the linear and quadratic discriminants are constructed. For both linear and quadratic functions, standard discriminant analysis, which uses all variables, and stepwise discriminant analysis, in which discriminating variables are selected to minimise the classification error, were applied.

We also compared the performance of the proposed method with methods used by other authors. These included the classical discriminant analysis approach using wavelet variances of Maharaj and Alonso (2007), and the Kullback–Leibler (K–L) and Chernoff discrepancy measures based on spectral densities developed by Kakizawa et al. (1998) (Shumway 2003 extended the K–L discrepancy for use with locally stationary time series). We also applied a neural network approach similar to the type used by authors such as Pham and Chan (1998) and Nigam and Graupe (2004).

5.1.1 Simulation study 1

Time series were generated from AR(1) and MA(1) processes with added cyclical components, namely,

$$ (1-B\phi)Y_{1t} = \varepsilon_{t}+ C_{t} $$
(6)

and

$$ Y_{2t} = (1-B\theta)\varepsilon_{t}+ C_{t} $$
(7)

where B is the backshift operator, \(C_{t} = R_{t}\sin(2\pi t/\tau)\) and \(\varepsilon_{t} \sim N(0,1)\).

For the first scenario, one group of 15 series was generated from \(Y_{1t}\) with ϕ=0, while the second group of 15 series was generated, in turn, from \(Y_{1t}\) with ϕ=0.3, ϕ=0.5, ϕ=0.7 and ϕ=0.9. This scenario was chosen to show the increasing separation of the groups as the AR(1) parameter of the second group increases from 0.3 to 0.9.

For the second scenario, one group of 15 series was generated from \(Y_{1t}\) with ϕ=0.5, and the second group of 15 series was generated from \(Y_{2t}\) with θ=0.5. This was repeated with ϕ=0.9 for the first group and θ=0.9 for the second. This scenario was chosen to show the separation of groups for series generated from distinctly different first order processes.

For both scenarios, for each group of 15 series, the amplitude \(R_{t}\) took on values between 1 and 15, while the cycle length was varied between 4 and 12. The cyclical component with these parameter values is based on that used for the cyclical time series in the synthetic control chart data of Alcock and Manolopoulos (1999).
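As an illustration, one series from each of Eqs. (6) and (7) could be generated as follows; the parameter values are illustrative choices within the ranges just described:

    % One series each from Eq. (6) (AR(1) plus cycle) and Eq. (7)
    % (MA(1) plus cycle); parameter values are illustrative.
    T    = 256;  t = (1:T)';
    tau  = 8;    R = 5;                    % within the ranges used here
    C    = R * sin(2*pi*t/tau);            % cyclical component C_t
    e1   = randn(T,1);  e2 = randn(T,1);
    phi  = 0.5;  theta = 0.5;
    Y1   = filter(1, [1 -phi],  e1 + C);   % (1 - phi*B) Y1 = e + C
    Y2   = filter([1 -theta], 1, e2) + C;  % Y2 = (1 - theta*B) e + C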

We also evaluated the performance of the time varying amplitudes as discriminating variables when the time series were generated from the processes in Eqs. (6) and (7) without the cyclical components.

Figures 2–7 show single realisations of length T=256 of the AR(1) and MA(1) processes in Eqs. (6) and (7), respectively, for parameter values ϕ=0, 0.3, 0.5, 0.9 and θ=0.5, 0.9, with and without the cyclical components. The corresponding filtered time varying amplitudes are shown in the second and fourth panels of each figure. We observe from these figures that while the patterns of the time series from the different groups described above cannot always be distinguished, the patterns of the corresponding time varying amplitude series can often be distinguished.

Fig. 2 Time series and time varying amplitude: AR(1), ϕ=0 and AR(1), ϕ=0.3 with cyclical components

Fig. 3 Time series and time varying amplitude: AR(1), ϕ=0 and AR(1), ϕ=0.3 without cyclical components

Fig. 4 Time series and time varying amplitude: AR(1), ϕ=0.5 and MA(1), θ=0.5 with cyclical components

Fig. 5 Time series and time varying amplitude: AR(1), ϕ=0.5 and MA(1), θ=0.5 without cyclical components

Fig. 6 Time series and time varying amplitude: AR(1), ϕ=0.9 and MA(1), θ=0.9 with cyclical components

Fig. 7 Time series and time varying amplitude: AR(1), ϕ=0.9 and MA(1), θ=0.9 without cyclical components

It should be noted that for the first order processes without the specific cyclical components, the spectral density functions of AR(1) and MA(1) processes exhibit either high or low frequency variation, depending on the signs of the AR or MA parameters (see Chatfield 2004). In other words, the peaks of the estimated periodogram will be located at either very high or very low frequencies. For processes of higher order, it is possible to get spectral density functions with several peaks and troughs.

5.1.2 Simulation study 2

Time series were generated from AR(2), MA(2) and ARMA(1,1) processes each with a 12-period seasonal component, namely

$$\begin{aligned} & { \bigl(1-B\phi_{1}-B^{2} \phi_{2}-B^{12} \varPhi \bigr)Y_{3t} = \varepsilon_{t} } \end{aligned}$$
(8)
$$\begin{aligned} & {Y_{4t} = \bigl(1-B\theta_{1}-B^{2} \theta_{2}-B^{12}\varTheta \bigr)\varepsilon_{t} } \end{aligned}$$
(9)
$$\begin{aligned} & { \bigl(1-B\phi-B^{12}\varPhi \bigr)Y_{5t} = (1-B \theta)\varepsilon_{t} } \end{aligned}$$
(10)

For the first scenario, one group of 15 series was generated from \(Y_{3t}\) with nonseasonal AR parameters ϕ1=−0.5, ϕ2=0.2 and seasonal AR parameter Φ=0.5, and the second group of 15 series was generated from \(Y_{4t}\) with nonseasonal MA parameters θ1=0.9, θ2=0.6 and seasonal MA parameter Θ=0.5. For the second scenario, one group of 15 series was generated from \(Y_{5t}\) with nonseasonal parameters ϕ=0.95, θ=−0.1 and seasonal parameter Φ=0.1, and the second group of 15 series was generated from \(Y_{5t}\) with nonseasonal parameters ϕ=0.1, θ=0.95 and seasonal parameter Φ=0.25. Both scenarios were chosen to show the separation of groups for series generated from distinctly different higher order processes.
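A sketch (our own illustration) of how one series from each of Eqs. (8) and (9) could be generated with the first-scenario parameter values; the seasonal terms enter at lag 12:

    % One series each from Eqs. (8) and (9) with the first-scenario
    % parameter values.
    T    = 256;
    phi1 = -0.5; phi2 = 0.2; PhiS = 0.5;      % AR(2) + seasonal AR
    th1  = 0.9;  th2  = 0.6; ThS  = 0.5;      % MA(2) + seasonal MA
    a    = [1 -phi1 -phi2 zeros(1,9) -PhiS];  % 1 - phi1*B - phi2*B^2 - Phi*B^12
    bma  = [1 -th1  -th2  zeros(1,9) -ThS ];  % 1 - th1*B - th2*B^2 - Th*B^12
    Y3   = filter(1,   a, randn(T,1));        % Eq. (8)
    Y4   = filter(bma, 1, randn(T,1));        % Eq. (9)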

We also evaluated the performance of the time varying amplitudes as discriminating variables when the time series were generated from processes in Eqs. (8), (9), (10), without the specific seasonal components.

Figures 8 and 9 show single realisations of length T=256 of the AR(2) and MA(2) processes in Eqs. (8) and (9), respectively, with and without the seasonal components, while Figs. 10 and 11 show single realisations of length T=256 of the two different ARMA(1,1) processes in Eq. (10), with and without the seasonal components. The corresponding filtered time varying amplitudes are shown in the second and fourth panels of each figure. As before, we observe that the patterns of the corresponding time varying amplitude series can be much more easily distinguished than the patterns of the time series from the different groups.

Fig. 8 Time series and time varying amplitude: AR(2), ϕ1=−0.5, ϕ2=0.2 and MA(2), θ1=0.9, θ2=0.6 with seasonal components

Fig. 9 Time series and time varying amplitude: AR(2), ϕ1=−0.5, ϕ2=0.2 and MA(2), θ1=0.9, θ2=0.6 without seasonal components

Fig. 10 Time series and time varying amplitude: ARMA(1,1), ϕ=0.95, θ=−0.1 and ARMA(1,1), ϕ=0.1, θ=−0.95 with seasonal components

Fig. 11 Time series and time varying amplitude: ARMA(1,1), ϕ=0.95, θ=−0.1 and ARMA(1,1), ϕ=0.1, θ=−0.95 without seasonal components

5.2 Results

Tables 1 and 2 show the average classification error rates over 1000 simulations each for the groups in each of the two scenarios for the two simulation studies, using the proposed procedure with the time varying amplitudes (TVA), the discriminant analysis procedure with wavelet variances (WAV) (Maharaj and Alonso 2007), the Kullback–Leibler (KL) and Chernoff (CH) discrepancy measures based on spectral density functions (Kakizawa et al. 1998; Shumway 2003) and the neural network approach (NN) (Pham and Chan 1998; Nigam and Graupe 2004).

Table 1 Classification error rates for simulation study 1—first order generating processes
Table 2 Classification error rates for simulation study 2—higher order generating processes

For the proposed method (TVA), on using linear and quadratic discriminant functions for standard and stepwise discriminant analysis, the linear stepwise discriminant analysis method produced the lowest hold-out classification errors, and these are reported in the tables. For the wavelet variances (WAV) method, six filters with the corresponding number of levels relevant to each filter and various block sizes (see Maharaj and Alonso 2007) were used with linear and quadratic discriminant functions for standard and stepwise discriminant analysis. It was found that the linear stepwise discriminant analysis method for single block sizes, with the maximum number of levels relevant to a particular filter, produced the lowest hold-out classification errors for all filters, and there was very little variation in the results amongst the filters. The results pertaining to one of the filters, viz. the least asymmetric filter of length 8 (LA8), are reported in the tables. For the Kullback–Leibler (KL) and Chernoff (CH) methods, the lowest hold-out classification errors from the optimal bandwidths, which had to be selected, are reported in the tables. For the neural network (NN) method, the lowest hold-out classification errors from the optimal number of firing neurons, which had to be selected to train the networks, are reported in the tables.

5.2.1 Simulation study 1

For the first scenario, where the AR(1) parameter remained fixed at 0 for Group 1 while the AR(1) parameter of the second group increased from 0.3 to 0.9, it is clear that for all sample sizes the proposed TVA method performs very much better when the cyclical component is included in the generating processes than when it is not. When the cyclical component is included, the average classification error rates decrease as the sample size increases, being close to zero for T=256 and zero for T=1024.

With the added cyclical component,

  • For T=64, while the TVA method is outperformed by the KL method for all parameters, and by the NN method for AR(1), ϕ=0 versus ϕ=0.3, its performance is better than that of the WAV and CH methods.

  • For T=256, the TVA method outperforms all the other methods, while for T=1024 it outperforms the other methods except the KL method for ϕ=0 versus ϕ=0.7 and ϕ=0 versus ϕ=0.9, where the performance is comparable.

Without the added cyclical component, the performance of the TVA method is generally poor compared to the other methods, except for ϕ=0 versus ϕ=0.9, for which its average classification error rates are lower than those of the NN method for all time series lengths.

For the second scenario, where the series were generated from distinctly different first order processes to assess the separation of groups, the proposed TVA method performs better for all sample sizes when the cyclical component is included in the generating processes than when it is not.

With the added cyclical component,

  • For ϕ=0.5 versus θ=0.5, the TVA method outperforms all the other methods except the KL method for T=64, while for T=256 it outperforms all the other methods. For T=1024, it also outperforms the other methods, except the KL method, whose performance is comparable.

  • For ϕ=0.9 versus θ=0.9, for all sample sizes, its performance is better than that of the other methods, with the exception of the KL and CH methods. However, for T=1024, the performance of the TVA is comparable to that of the KL and CH methods.

Without the added cyclical component, for all time series lengths, the performance of the TVA method is generally poor compared to the other methods, except the NN method.

5.2.2 Simulation study 2

For this study, the separation of groups for series generated from distinctly different higher order processes is evaluated. There is little or no difference in the performance of the TVA method with or without the seasonal components. As the sample size increases, the average classification error rates decrease, being zero or almost zero for T=256 and zero for T=1024.

  • For series from AR(2) versus MA(2) processes,

    • with seasonal components, for T=64, with the exception of the WAV method, the TVA performs better than the other methods. However, without the seasonal components, its performance is only better than that of the NN method.

    • for T=256, with or without the seasonal components, the performance of the TVA is comparable to that of the WAV method but is better than that of the other methods.

    • for T=1024, with or without the seasonal components, the performance of the TVA is comparable to that of the other methods, except for the NN method which performs quite poorly.

  • For series from two different ARMA(1,1) processes,

    • for T=64, with or without the seasonal components, with the exception of the NN method, the performance of the other methods is better than that of the TVA.

    • for T=256 and T=1024, with or without the seasonal components, the performance of the TVA is comparable to that of the other methods, except for the NN method which performs quite poorly.

5.2.3 Summary

Overall the results of the simulation studies reveal that

  • when series are generated from first order ARMA models with cyclical components, the time varying amplitude has very good discriminating power for the larger sample sizes, and its performance is better than or comparable to that of the other methods considered. However, when the cyclical components are not included, the reasonably good discriminating power of the TVA is apparent only when the series are generated from distinctly different processes.

  • when series are generated from higher order ARMA models, with or without seasonal components, the time varying amplitude displays very good discriminating power for the larger sample sizes, and its performance is better than or comparable to that of the other methods considered.

6 Applications

We performed both standard and stepwise discriminant analysis with linear and quadratic discriminant functions, using a hold-out-one validation procedure to obtain the classification error rates for differentiating between patterns of time series from two well-known data sets. In both applications, all the time series under consideration are mean stationary. We did not consider applications to other data sets in which some of the time series are mean stationary and others are not, as is the case with the synthetic control chart data of Alcock and Manolopoulos (1999). The reason for this is that the underlying assumption for obtaining the time varying amplitude is that the time series is mean stationary. If, however, the time series under consideration are not all mean stationary, then all series would have to be made stationary by the same transformation, such as the same order of differencing, before the time varying amplitude can be meaningfully used.

6.1 Earthquake and explosion data

Several authors, including Kakizawa et al. (1998), Shumway (2003), Huang et al. (2004), Chinipardaz and Cox (2004), and Maharaj and Alonso (2007), have evaluated their time series classification procedures using the suite of eight earthquakes and eight mining explosions originating in the Scandinavian Peninsula, as well as an unknown event that originated in Novaya Zemlya, Russia. See Shumway and Stoffer (2000) for more details about the data. Figure 12 shows the waveforms of an earthquake, an explosion and the unknown event.

Fig. 12 Waveforms of an earthquake, explosion and the unknown event

The general problem is discriminating between the waveforms generated from earthquakes and explosions. Each waveform has two phases of arrival: the primary (P-phase), which accounts for approximately the first half of the waveform, and the secondary (S-phase), which accounts for approximately the second half. We use the ratios of the filtered time varying amplitudes of the S-phase to the P-phase as the feature variables. Shumway and Stoffer (2000, pp. 9, 462–465) discuss the rationale for using the S-phase to P-phase ratio for discriminating between earthquakes and explosions.
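As an illustration, the feature construction for one waveform might be sketched as follows; the halfway split point and the helper tvaSmooth (a hypothetical function wrapping Steps 1 and 2 of Sect. 4) are our own assumptions:

    % S-phase to P-phase amplitude ratio for one waveform x, assuming
    % the P-phase is (roughly) the first half and the S-phase the
    % second half; tvaSmooth is a hypothetical helper that demodulates
    % at the dominant frequency and smooths over one cycle length.
    h     = floor(numel(x)/2);
    Rp    = tvaSmooth(x(1:h));          % filtered TVA of the P-phase
    Rs    = tvaSmooth(x(h+1:end));      % filtered TVA of the S-phase
    m     = min(numel(Rp), numel(Rs));
    ratio = Rs(1:m) ./ Rp(1:m);         % feature variables for this record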

Figure 13 shows the ratios of the time varying amplitude of the S-phase to the P-phase of an earthquake, explosion and the unknown event. It can be observed that the patterns associated with the earthquake and explosion can be clearly distinguished. The pattern associated with the unknown event appears to be closer to that associated with the explosion. The classification results are given in Table 3.

Fig. 13 Ratios of time varying amplitudes S-phase/P-phase of an earthquake, explosion and the unknown event

Table 3 Classification results for a hold-out-one procedure for earthquakes and explosions

Using the stepwise method with the linear discriminant function, all earthquakes and explosions were correctly classified, and the unknown event was classified as an explosion. This is consistent with results obtained by Kakizawa et al. (1998), Shumway (2003), Huang et al. (2004), Chinipardaz and Cox (2004) and Maharaj and Alonso (2007). From Table 3, it can be observed that the other three combinations of method and discriminant function gave classification error rates of 2/16, 4/16 and 4/16.

6.2 EEG data

We consider the suite of 500 EEG time series studied by Andrzejak et al. (2001) which are divided into five sets (denoted by A–E), each containing 100 EEG segments of 23.6 seconds duration (4096 observations):

  • Sets A and B are EEG recordings of healthy volunteers. They were awake with eyes open (A) and with eyes closed (B).

  • Sets C and D are EEG recordings of epileptic patients measured during seizure-free intervals. Segments in Set D were recorded from what is known as the epileptogenic zone, and segments in Set C from the opposite hemisphere of the brain.

  • Set E consists of EEG recordings during seizure activity.

Figure 14 shows an example from each of the five sets. Notice that the patterns of recordings A and B are similar, as are those of C and D. The patterns of A and E, B and E, C and E, and D and E are dissimilar, while the patterns of recordings A, B, C and D are not greatly dissimilar from one another.

Fig. 14 EEG recordings from each of sets A to E

Figure 15 shows their corresponding filtered time varying amplitudes. It can be observed that there is a distinct difference between the time varying amplitude patterns of the A and E recordings. However, the difference between the patterns of A and B is less distinct, as is the case for the patterns of C, D and E.

Fig. 15 Time varying amplitudes of EEG recordings from each of sets A to E

The results were extremely poor (minimum classification error rate of 57 %) when trying to differentiate between all five patterns, A, B, C, D and E. Since the A and B patterns are so similar, as are the C and D patterns, we considered differentiating between the A, C and E patterns, and also between the B, D and E patterns. In both cases the classification error rate reduced to around 20 %. However, when we considered only the A and E records, which clearly have different patterns both in their EEG records and between their corresponding time varying amplitudes, there was a vast improvement in the classification error rates. These results are given in Table 4.

Table 4 Classification results for a hold-out-one procedure for EEG sets A and E

For each type of discriminant function, the standard and stepwise methods produce the same results. While the results are not that good with the linear discriminant function (12 %), the classification error rate when using the quadratic discriminant function is excellent (0 %). Hence, it is clear that our method works well only when there is a distinct difference between the patterns of the time series from the groups under consideration.

Using Adaptive Neuro Fuzzy Inference System networks to discriminate between the A and E patterns, Kannathal et al. (2005) achieved a classification error rate of 7.8 %. Nigam and Graupe (2004) used a Large-Memory Storage and Retrieval neural network to discriminate between the A and E patterns and achieved a classification error rate of 2.8 %. On the other hand, Maharaj and Alonso (2007) achieved a 0 % classification error rate using time dependent wavelet variances in a quadratic discriminant analysis procedure. Hence, our results compare favourably with those obtained from the classification methods used by other authors.

7 Conclusion

An innovative new procedure for discriminating between patterns of cyclical time series has been proposed. The method focuses on the time varying nature of the amplitude of the cyclical components of the series and uses the filtered time varying amplitudes as the feature variables in linear and quadratic discriminant functions, applying both standard and stepwise discriminant analysis. The simulation results reveal that, for particular combinations of method and discriminant function, the filtered time varying amplitude has very good discriminating power, and its performance is generally comparable to that of some of the existing methods. Applications to well-known data sets show that the performance of the procedure compares favourably with that of some of the existing methods.

The added value of using complex demodulation to obtain the time varying amplitudes for use in discriminant analysis, over some of the other procedures in the literature, is that this fairly simple approach to constructing the discriminating variables does not require making choices. For other methods that consider the time varying nature of the time series in the frequency domain, e.g., those of Maharaj (2002) and Shumway (2003), choices have to be made with regard to block lengths of the time series and bandwidths of the smoothing windows before the discriminating variables are constructed; for the wavelet approach of Maharaj and Alonso (2007), choices have to be made with regard to block lengths of the time series, types of wavelet filters and the number of levels before the construction of the discriminating variables. Furthermore, with regard to the neural network approaches such as those of Pham and Chan (1998) and Nigam and Graupe (2004), choices have to be made about the number of firing neurons and the process of training the networks, and hence obtaining the classification results involves a large number of steps.

Our procedure uses the complete time series record, searches for the single dominant cyclical component, and then uses the cycle length of this component to demodulate the time series and obtain the time varying amplitude. This identified cycle length is then used to filter the noise and other cycles that may be present in the time series out of the time varying amplitude. We believe this method is a simple, useful and innovative contribution to the vast existing literature on the classification of time series, with particular reference to cyclical time series.