1 Introduction

In this paper we illustrate the use of the non-parametric technique Dynamic Time Warping (DTW) to examine similarities of business cycles. DTW is a widely applied algorithm in non-economic fields such as speech, pattern and movement recognition [see, for example, Cedras and Shah (1995), Geiger et al. (1995), and Sakoe and Chiba (1978)] and data mining [see, for example, Keogh and Pazzani (2000), among others]. While its non-parametric similarity measure has been shown to be superior to other measures such as the Pearson correlation coefficient (Petitjean et al. 2011), economic research has not yet used Dynamic Time Warping to its full potential. To the best of our knowledge, Wang et al. (2012) and Raihan (2017) are the exceptions, and both use the DTW technique in its basic standard format. In our study, we tailor the DTW technique to our purpose of identifying dynamic leading and lagging relationships between time series. We illustrate the application of our modified DTW technique by investigating business cycle similarities across US states.

There is a substantial literature on similarities across time series variables, predominantly using parametric techniques. Usually, it is assumed that when one variable leads the other, it does so throughout the entire sample. Bai and Ng (2004) consider a dynamic factor model with a common factor structure. Engle and Kozicki (1993) examine common cyclical and other features. Cubadda et al. (2009) study common autoregressive parts across individual series as an indication of common features in multivariate models. Frühwirth-Schnatter and Kaufmann (2008) propose a clustering method based on mixture models, where each mixture component follows the same underlying model specification. Due to the parametric nature of these approaches to time series similarity, however, none of them can adequately describe a time-varying dynamic specification. In particular, and to put it simply, these methods impose that the variables \(y_{t}\) and \(x_{t}\) are linked via, for example, the (vastly simplified) specification \(y_{t} = \alpha + \beta x_{t - j} + \varepsilon_{t}\) with j fixed over the sample. In this paper we propose to study co-movements using the DTW technique, where we allow for the possibility that j is time-varying and may switch from positive to negative or vice versa.

A prominent strand of economic literature for which this seems particularly relevant is the research on the similarity of business cycles. Here, most studies analyse correlation structures with, for example, common factor models or their adaptations (Francis et al. 2017). Recently, non-parametric clustering techniques have also received increased attention. For example, Crone (2005) and Papageorgiou et al. (2017) consider K-means clustering to group business cycles of US states and European countries, respectively. A common caveat among these previously considered approaches, however, is the strict assumption of identically timed business cycles. As suggested by the findings of Hamilton and Owyang (2012), this is potentially problematic, as a substantial part of idiosyncratic business cycle behaviour seems to stem from differential timing around national recessions. If one is interested not only in the co-movement at a particular point in time, but also in the similarity in the extent or gravity of the economic movements, existing techniques allow for only limited insights when their results are largely driven by even slight temporal shifts of the time series. DTW can alleviate such concerns by directly incorporating potentially dynamic timing differences in the analysis.

For a first and intuitive impression of how DTW can contribute to the existing literature, consider Fig. 1. The graph shows the DTW-produced alignment of quarterly, seasonally adjusted real GDP of Florida and Idaho for the period 2006Q1 to 2017Q4. The lines that link two points of the respective series show their closest correspondence. While in some cases the connected points concern the same quarter, the figure indicates that in most cases earlier or later quarters are a better match. This feature is the key reason that we rely on this non-parametric technique (Footnote 1). From Fig. 1 we see that the time-alignment of the two series, denoted by j earlier, seems to vary over the sample, ranging over values such as 0, 1, −1 and even −4 and 4. One could now decide to make j a time-varying parameter, for example \(j_{t}\). However, given the size of typical samples of real GDP data, a non-parametric technique that does not consume many degrees of freedom is more appropriate. Dynamic Time Warping, at least in the variant proposed below, seems to be a useful technique: it not only illustrates the link between two time series visually, but it also provides an estimate of their temporal alignment, as we will show below.

Fig. 1

Temporal point mapping. Quarterly real GDP of Florida (x, blue) and Idaho (y, green) for the period 2006Q1 to 2017Q4, standardized at 1.00 for 2006Q1. (Color figure online)

The introduction of DTW, a computer science method originally developed for pattern recognition, complements the recent developments in machine learning applications in economics, although with a clearly distinct contribution. In reviews of the relevant literature, Athey (2017) and Mullainathan and Spiess (2017) illustrate the potential of machine learning for advanced methods of causal inference and for predictive applications in economics. For DTW, we suggest that it contributes most to descriptive data analysis by revealing temporal dynamics between time series and allowing for an assessment of their similarity. In the same way that machine learning methods can complement conventional econometric models, we believe DTW can complement, and in some contexts replace, the conventional Pearson correlation coefficient and other similarity measures. At the same time, the potential of this method extends beyond its descriptive abilities. For example, Raihan (2017) makes promising suggestions for the use of DTW techniques for predictive purposes.

In the rest of this paper, we proceed as follows. In Sect. 2, we outline standard DTW and introduce our modifications for the analysis of economic time series. In Sect. 3, our adjusted DTW is illustrated by an application to the real GDP series of three US states. In Sect. 4, we discuss an application of DTW-based time series clustering to assess similarities of all US states' real GDP series. Section 5 concludes with some ideas for further research.

2 Dynamic Time Warping

This section outlines standard DTW and introduces our modification for our analysis of business cycle similarities. Throughout, we focus on the mathematical specification of the measure and its economic interpretation. For calculation of the DTW alignment by means of dynamic programming, Müller (2007) provides an excellent description of the algorithm.

DTW defines a distance-minimising temporal alignment between two time series \(X = \left[ {x_{1} ,x_{2} , \ldots ,x_{N} } \right]\) and \(Y = [y_{1} ,y_{2} , \ldots , y_{M} ]\). For this purpose, we first calculate the \(N \times M\) distance matrix D, where the \((i,j){\text{th}}\) element of the matrix is given by the distance between the points \(x_{i}\) and \(y_{j}\). Each element of the matrix thus corresponds to a mapping of two points between the time series. The optimal alignment is then given by the warping path p that minimises the cumulative distance of all mapped point-pairs on the path. This cumulative distance is defined recursively by

$$\gamma \left( {i,j} \right) = d\left( {x_{i} ,y_{j} } \right) + \min \left[ {\gamma \left( {i - 1,j - 1} \right),\gamma \left( {i - 1,j} \right), \gamma \left( {i,j - 1} \right)} \right]$$

with \(d\left( {x_{i} ,y_{j} } \right)\) being a distance measure between points \(x_{i}\) and \(y_{j}\).

The DTW problem can be solved straightforwardly by means of an \(O\left( {MN} \right)\) dynamic programming algorithm as given by, for example, Müller (2007). The resulting minimal warping cost, \(\gamma \left( {N,M} \right)\), constitutes a non-parametric similarity measure of time series. When aiming to compare time series of differing length, it seems useful to assess the warping cost per considered time series element. In this case, we can consider

$$\varGamma \left( {N,M} \right) = \frac{{\gamma \left( {N,M} \right)}}{M + N}$$

where N and M are the lengths of the considered time series.
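The recursion and the normalised cost above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the results in this paper: the function name `dtw_cost`, the default absolute-difference distance, and the boundary condition \(\gamma(0,0) = 0\) (with all other boundary cells set to infinity) are assumptions that are not spelled out in the text.

```python
from math import inf


def dtw_cost(x, y, dist=lambda a, b: abs(a - b)):
    """Return (gamma(N, M), gamma(N, M) / (N + M)) for two sequences.

    `dist` is any non-negative pointwise distance; the absolute
    difference used by default is an illustrative assumption.
    """
    n, m = len(x), len(y)
    # gamma[i][j] is the minimal cumulative distance of aligning the
    # first i points of x with the first j points of y.
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(
                gamma[i - 1][j - 1],  # both series advance
                gamma[i - 1][j],      # only x advances
                gamma[i][j - 1],      # only y advances
            )
            gamma[i][j] = dist(x[i - 1], y[j - 1]) + best_previous
    total = gamma[n][m]
    return total, total / (n + m)


total, per_element = dtw_cost([1.0, 2.0, 3.0, 3.0], [1.0, 3.0, 3.0])
```

The two nested loops visit each of the \(N \times M\) cells once, which is where the \(O(MN)\) running time comes from.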

2.1 The Growth-Based Distance Function

While any function \(d\left( {x_{i} ,y_{j} } \right) \to {\mathbb{R}}_{ \ge 0}\) could be considered as a distance measure, in standard DTW it is typically defined as either \(\left| {x_{i} - y_{j} } \right|\) or \((x_{i} - y_{j} )^{2}\) [see, for example, Raihan (2017) and Wang et al. (2012)]. In the context of comparing economic time series such as business cycles, however, this choice does not seem perfectly appropriate. For example, it would ignore the definition of expansion and contraction periods if the value of only one period at a time is taken into account. Ideally, one therefore also considers an observation’s relationship to its earlier and subsequent neighbours to identify the features of the time series.

For these reasons, we consider the feature-based distance function of Xie and Wiltgen (2010). It constitutes a well suited measure that captures both the overall shape of the time series and the local trend around the points. The authors illustrate its advantage over standard, value-based DTW and the widely applied derivative DTW of Keogh and Pazzani (2001) by means of a simulation study.

We will adjust the feature-based distance function to avoid issues such as substantial differences in magnitudes of levels. The local feature of a particular observation is originally defined by a 2-element vector that summarises the slope to its left- and right-hand side neighbour. Rather than the absolute slope, we consider the growth rate between the periods to capture local trends. This modification fits common business cycle analysis. The local feature of observation i of a series X is then given by

$$f_{local} \left( {x_{i} } \right) = \left[ {\frac{{x_{i} - x_{i - 1} }}{{x_{i - 1} }},\frac{{x_{i + 1} - x_{i} }}{{x_{i} }}} \right]$$

The global feature is adjusted in a similar manner. Instead of the absolute deviation of a series’ point to its left- and right-hand side as proposed by Xie and Wiltgen (2010), we calculate its relative deviation to capture the shape of the particular series while correcting for substantial differences in magnitude across the set of time series. The global feature of observation i of a series X is then given by

$$f_{global} \left( {x_{i} } \right) = \left[ {\frac{{x_{i} - \mathop \sum \nolimits_{k = 1}^{i - 1} \frac{{x_{k} }}{i - 1}}}{{\mathop \sum \nolimits_{k = 1}^{i - 1} \frac{{x_{k} }}{i - 1}}},\frac{{x_{i} - \mathop \sum \nolimits_{k = i + 1}^{N} \frac{{x_{k} }}{N - i}}}{{\mathop \sum \nolimits_{k = i + 1}^{N} \frac{{x_{k} }}{N - i}}}} \right]$$

Finally, our modified feature-based distance function is thus given by

$$\begin{aligned} d\left( {x_{i} ,y_{j} } \right) & = d_{local} \left( {x_{i} ,y_{j} } \right) + d_{global} \left( {x_{i} ,y_{j} } \right) \\ & = \left| {f_{local} (x_{i} )_{1} - f_{local} (y_{j} )_{1} } \right| + \left| {f_{local} (x_{i} )_{2} - f_{local} (y_{j} )_{2} } \right| \\ & \quad + \left| {f_{global} (x_{i} )_{1} - f_{global} (y_{j} )_{1} } \right| + \left| {f_{global} (x_{i} )_{2} - f_{global} (y_{j} )_{2} } \right| \\ \end{aligned}$$

where the subscripts 1 and 2 refer to the corresponding elements in the vector. In short, our choice of distance function allows the data to have trends, and this is relevant for many economic time series. By balancing the local and global features, we can compare the data points as such, and evaluate them relative to their adjacent observations as well as to the overall pattern in the data.
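To make the definitions concrete, the local feature, the global feature and the resulting distance can be sketched as follows. The function names are hypothetical, indices are 0-based, and the sketch only handles interior points: the text does not state how the first and last observations (which lack a left or right neighbour) are treated, so that case is deliberately left out here.

```python
def local_feature(x, i):
    """Growth rate into point i and growth rate out of point i (0-based)."""
    return ((x[i] - x[i - 1]) / x[i - 1], (x[i + 1] - x[i]) / x[i])


def global_feature(x, i):
    """Relative deviation of x[i] from the mean of all earlier points
    and from the mean of all later points."""
    left_mean = sum(x[:i]) / i
    right_mean = sum(x[i + 1:]) / (len(x) - i - 1)
    return ((x[i] - left_mean) / left_mean, (x[i] - right_mean) / right_mean)


def feature_distance(x, i, y, j):
    """Sum of absolute differences of the four feature elements.

    Assumption: i and j are interior indices, 1 <= i <= len(x) - 2
    and 1 <= j <= len(y) - 2.
    """
    fx = local_feature(x, i) + global_feature(x, i)
    fy = local_feature(y, j) + global_feature(y, j)
    return sum(abs(a - b) for a, b in zip(fx, fy))


# Two series with the same shape but levels that differ by a factor of 10.
small = [100.0, 102.0, 101.0, 105.0]
large = [1000.0, 1020.0, 1010.0, 1050.0]
same_shape = feature_distance(small, 1, large, 1)
```

Because every element is a ratio, rescaling a series leaves its features unchanged, so `same_shape` is zero up to floating-point error. This is the sense in which the modification corrects for differences in magnitude across series.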

3 Identification of Temporal Dynamics

Before we move to the analysis of all US states, we first consider Florida, Idaho and California. We use quarterly real GDP of US states from 2006Q1 till 2017Q4, obtained from the Bureau of Economic Analysis (BEA). For now, we focus on the insights obtained through DTW on the leading and lagging relationship between two time series. A discussion of the comparison of time series based on their similarities will follow in Sect. 4.

First, let us consider the example of Florida’s and Idaho’s GDP series presented in Fig. 1 of the introduction. If leads and lags were constant throughout the sample, the lines connecting the two time series would all have a similar angle relative to one of the two variables. From the slope of the alignments in Fig. 1, however, we can see that Idaho (green) leads Florida (blue) in the first half of the sample, as the connecting lines face the north-east, so to say, while around observations 27 to 30 the connecting lines face the north-west. From around halfway through the sample onward, the dynamics of the two series seem to change and Florida’s GDP growth leads Idaho’s for the remainder of the sample.

An alternative visualisation of the DTW results is given in Fig. 2. The distance-minimising mapping of points, shown in red in Fig. 1, is given by the warping path in blue. It is plotted on the distance matrix that was computed using our distance measure introduced above. A non-shifted temporal alignment between the two series would be indicated by a fully diagonal warping path, whereas deviations from the diagonal indicate leading and lagging relationships of the series. We can thus infer the lead of Idaho over Florida for the first half of the sample from the left-hand deviations from the diagonal. Similarly, we can observe the change in the relationship at around halfway through the sample, after which Florida leads Idaho, as shown by the right-hand deviations from the diagonal. These findings seem to suggest that Idaho’s GDP was affected by the Great Recession in 2008/2009 sooner than Florida’s (a lead of Idaho), but Florida’s GDP began to recover earlier (a lead of Florida). DTW therefore identifies substantial time-varying temporal shifts between the time series. As the conventional correlation coefficient solely considers a pre-specified, fixed temporal alignment of time series, these results highlight the advantage of DTW when one is interested in the similarity of series beyond a fixed temporal alignment.
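How a time-varying alignment can be read off a warping path may be sketched as follows. This is a hedged illustration rather than the procedure used for Figs. 1 and 2: the function names, the boundary condition of the recursion, and the tie-breaking rule in the backtracking step are assumptions. The offset \(j - i\) of each mapped pair is zero on the diagonal; a positive offset only says that the matched point of the second series occurs later in time than the point of the first series.

```python
from math import inf


def warping_path(cost):
    """Backtrack the optimal warping path through an N x M matrix of
    pointwise distances. Returns 1-based (i, j) pairs from (1, 1) to (N, M)."""
    n, m = len(cost), len(cost[0])
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gamma[i][j] = cost[i - 1][j - 1] + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i, j))
        # Move to the predecessor with the smallest cumulative cost.
        # Ties are resolved by tuple order, which prefers the diagonal step.
        _, i, j = min(
            (gamma[i - 1][j - 1], i - 1, j - 1),
            (gamma[i - 1][j], i - 1, j),
            (gamma[i][j - 1], i, j - 1),
        )
    return path[::-1]


def alignment_offsets(path):
    """Offset j - i for each mapped pair on the path."""
    return [j - i for i, j in path]


# Toy example: the same bump occurs two periods later in y than in x.
x = [0.0, 1.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 1.0, 0.0]
cost = [[abs(a - b) for b in y] for a in x]
path = warping_path(cost)
offsets = alignment_offsets(path)  # [0, 1, 2, 2, 2, 1, 0]
```

In this toy example the offsets rise to 2 around the bump and return to 0 at the end of the sample, which is the kind of time-varying shift that a single fixed lag j cannot represent.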

Fig. 2

Warping Path and Distance Matrix. Quarterly real GDP of Florida (x) and Idaho (y) for the period 2006Q1 to 2017Q4. Regions of low and high cost (distance) are indicated by light and dark colours, respectively

To provide another, more pronounced example, we also analyse Florida’s and California’s real GDP series by means of DTW. Figure 3 presents the two series and their point mapping. Figure 4 shows the corresponding distance matrix and the warping path. As indicated by the vertical point-mappings and the diagonal warping path, the two GDP series are not shifted for the first four and a half years. After that, we see a substantial change in the relationship, where California leads Florida’s GDP series for the remainder of the sample. This is shown by both the non-vertical point-mappings and the right-hand deviation from the diagonal of the warping path in Fig. 4. These graphs suggest that both Florida and California were affected by the recent Great Recession at about the same time, but California’s GDP seems to have recovered sooner than Florida’s (that is, there is a lead of California).

Fig. 3

Temporal Point Mapping. Quarterly real GDP of Florida (x, blue) and California (y, green) for the period 2006Q1 to 2017Q4. (Color figure online)

Fig. 4

Warping path. Quarterly real GDP of Florida (x) and California (y) for the period 2006Q1 to 2017Q4

4 Time Series Clustering

Apart from insights on the pairwise leading and lagging relationships between two series, it is also of interest to assess the similarity of time series. When only a small set of series is under investigation, we could simply compare the minimal warping cost per observation as introduced in (2). For the examples discussed above, we find that the warping cost is 0.036 for Florida and Idaho, and 0.048 for Florida and California. Florida is thus more similar to Idaho than to California as assessed by DTW.

In many cases, however, the set of time series is so large that such a comparison becomes cumbersome if not infeasible. For G time series, one would have to assess \(\frac{{G\left( {G - 1} \right)}}{2}\) warping distances. As an alternative, time series clustering makes it possible to draw structured inferences about the similarity of large sets of time series in a concise manner. This is achieved by organizing the series into homogeneous groups, minimizing intra-group dissimilarity and maximizing inter-group dissimilarity. Time series clustering based on DTW allows one to discover data structures, and it has proven to be a useful method, for example, for the purpose of data mining (Aghabozorgi et al. 2015; Liao 2005). In the following, we aim to illustrate DTW’s contribution in this regard in our context of investigating similarity structures across US states’ real GDP series.
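As a rough sense of scale, and assuming \(G = 50\) series (the number of states listed across the eight clusters in Sect. 4.2), the number of pairwise warping distances would be

$$\frac{{G\left( {G - 1} \right)}}{2} = \frac{50 \cdot 49}{2} = 1225$$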

4.1 DTW-Based K-Means

A straightforward clustering approach is the K-means algorithm originally developed by MacQueen et al. (1967). It aims to minimize the total difference between the sequences of each cluster and the respective cluster average. It does so by iteratively reassigning each sequence to the cluster whose average is most similar to it and recalculating the cluster averages in each iteration. The algorithm converges when the total intra-cluster differences cannot be reduced further. Below, we briefly illustrate how K-means can be adjusted to reflect time series similarity as captured by growth rate-based dynamic time warping (Footnote 2).

The K-means algorithm requires a well-defined ‘average’ of a set of series. This is not trivial when considering DTW distance measures. Petitjean et al. (2011) address this issue by introducing a global averaging method, DTW Barycenter Averaging (DBA). It consists of a heuristic strategy to approximate an average that accounts for temporal shifts in the spirit of DTW. DBA iteratively refines an initial average sequence to minimize its squared warping distance to the set of to-be-averaged series. This is achieved by calculating the temporal alignment of each series to the currently best average, and then updating each point of the average to the mean of all time series points that are temporally aligned with it. Note that this allows the DTW average, or DBA, to reflect potentially dynamic temporal shifts across time series. In particular, a specific point of the average might be associated with multiple points of a time series (Footnote 3). To adapt this method to the application of business cycles, we consider the average of associated growth rates rather than each point’s absolute value. The DTW-based average time series can easily be constructed by iterative multiplication of the growth rates and an initial index value (for example 1).
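A single DBA refinement step and the reconstruction of an index series from averaged growth rates can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation behind the results in this paper: the alignment uses a plain squared pointwise distance instead of the feature-based distance of Sect. 2.1, the inputs are taken to be sequences of growth rates, and all function names are hypothetical.

```python
from math import inf


def dtw_path(a, b):
    """Optimal warping path as 0-based (i, j) pairs.

    Assumption: squared pointwise distance, not the feature-based
    distance function of Sect. 2.1.
    """
    n, m = len(a), len(b)
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gamma[i][j] = (a[i - 1] - b[j - 1]) ** 2 + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (gamma[i - 1][j - 1], i - 1, j - 1),
            (gamma[i - 1][j], i - 1, j),
            (gamma[i][j - 1], i, j - 1),
        )
    return path[::-1]


def dba_update(average, series_set):
    """One DBA refinement: every point of the average is replaced by the
    mean of all series points that DTW aligns with it."""
    aligned = [[] for _ in average]
    for series in series_set:
        for i, j in dtw_path(average, series):
            aligned[i].append(series[j])
    return [sum(values) / len(values) for values in aligned]


def rebuild_levels(growth_rates, start=1.0):
    """Turn growth rates into an index series by iterative multiplication."""
    levels = [start]
    for g in growth_rates:
        levels.append(levels[-1] * (1.0 + g))
    return levels


rates_a = [0.01, 0.02, -0.01]
rates_b = [0.01, 0.01, 0.02, -0.01]
refined = dba_update(rates_a, [rates_a, rates_b])
average_index = rebuild_levels(refined)
```

In practice `dba_update` would be repeated until the average stops changing; a single step is shown only to keep the sketch short.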

As K-means might converge to local optima, it has become common practice in economic research to run multiple instances with random starting clusters in the search for a global optimum. To reduce the number of required random instances, we optimize the initial clustering along the lines of the DTW-based K-means++ algorithm in Zhang and Hepner (2017). Rather than initializing clusters by dividing the set of time series entirely at random, each series is selected as the initial centroid of one of the k clusters with a probability that decreases with its similarity to the existing centroids. In particular, one of the considered time series is chosen at random to be the centroid of the first cluster. For the remaining time series, the selection probability \(p_{i}\) is given by

$$p_{i} = \frac{{\varGamma_{i}^{2} }}{{\mathop \sum \nolimits_{j = 1}^{G} \varGamma_{j}^{2} }}$$

where \(\varGamma_{i} = \mathop \sum \limits_{c \in C} DTW\left( {x_{i} ,c} \right)\) with C being the set of existing centroids, \(x_{i}\) being the ith time series, and DTW being the Dynamic Time Warping distance as defined before. A time series is selected as the centroid for the next cluster with probability \(p_{i}\), and the step is repeated until every cluster has been assigned an initial centroid. The remaining time series are then grouped into the cluster with the ‘closest’ centroid as assessed by the DTW distances.
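The seeding step can be sketched as follows, working from a precomputed matrix of pairwise DTW distances. The function names and the matrix `dist` are hypothetical, and one detail is an assumption: the probabilities are normalised over the series that have not yet been chosen as centroids, which is one reading of "the remaining time series" in the text.

```python
import random


def seeding_probabilities(dist, centroids):
    """Selection probability p_i for every series that is not yet a centroid.

    dist[i][c] is the DTW distance between series i and series c.
    Assumption: normalisation runs over the remaining series only.
    Note: if every remaining series had distance zero to all centroids,
    the normalising sum would be zero; that case is not handled here.
    """
    remaining = [i for i in range(len(dist)) if i not in centroids]
    gamma = {i: sum(dist[i][c] for c in centroids) for i in remaining}
    total = sum(value ** 2 for value in gamma.values())
    return {i: gamma[i] ** 2 / total for i in remaining}


def choose_initial_centroids(dist, k, rng):
    """Pick k series as initial centroids: the first uniformly at random,
    the others with probability p_i."""
    centroids = [rng.randrange(len(dist))]
    while len(centroids) < k:
        probs = seeding_probabilities(dist, centroids)
        candidates = list(probs)
        weights = [probs[i] for i in candidates]
        centroids.append(rng.choices(candidates, weights=weights)[0])
    return centroids


# Toy distance matrix for three series (illustrative numbers only).
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
probs = seeding_probabilities(dist, [0])  # {1: 0.1, 2: 0.9}
seeds = choose_initial_centroids(dist, 2, random.Random(0))
```

With series 0 as the first centroid, series 2 is three times as far away as series 1 and is therefore nine times as likely to be drawn, because the distances enter the probability squared.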

Starting at this initial clustering, K-means then proceeds by averaging the time series of each cluster by means of DBA and reassigning each time series to the cluster with the lowest DTW distance. This is continued until no series is reassigned and the procedure terminates.
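The overall iteration can be sketched generically, with the distance and the averaging method passed in as functions. This is a skeleton under assumptions rather than the implementation used here: the names are hypothetical, the loop starts from initial centroids rather than from an initial partition, and the toy usage below substitutes scalars, the absolute difference and the arithmetic mean for time series, the DTW distance and DBA.

```python
def kmeans(series, initial_centroids, distance, average, max_iter=100):
    """Alternate between assigning every series to its closest centroid
    and re-averaging each cluster, until no series is reassigned.

    `distance` and `average` are supplied by the caller; in the setting
    of this paper they would be the DTW distance and DBA (assumption:
    any pair of functions with these roles can be plugged in).
    """
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: distance(s, centroids[c]))
            for s in series
        ]
        if new_labels == labels:
            break  # no series changed cluster, so the procedure terminates
        labels = new_labels
        for c in range(len(centroids)):
            members = [s for s, label in zip(series, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = average(members)
    return labels, centroids


# Toy usage with scalars standing in for time series.
points = [1.0, 1.2, 5.0, 5.2]
labels, centres = kmeans(
    points,
    [1.0, 5.0],
    distance=lambda a, b: abs(a - b),
    average=lambda members: sum(members) / len(members),
)
```

Keeping the old centroid when a cluster becomes empty is a design choice of this sketch; the text does not say how that case is handled.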

We will now illustrate this procedure by analyzing US states’ quarterly GDP around the period of the Great Recession.

4.2 An Application to US States

To illustrate the DTW-based clustering procedure, we proceed by analyzing US states’ quarterly GDP for the period 2006Q1 till 2017Q4. Setting the number of clusters to 8 (Footnote 4), we find the clusters depicted in Fig. 5.

Fig. 5

Map of the US States with 8 clusters

To characterize the clusters, it is useful to consider their averages as computed by DBA. Figure 6 plots the averages of the clusters with more than one state. As DTW assesses the similarity of series while allowing for temporal shifts between them, it does not seem particularly sensible to comment on the specific timing of certain features of the averages. In contrast, it is perfectly sensible to draw inferences about their general shapes.

Fig. 6

Clusters’ DTW Barycenter averages. Only DBAs of clusters with more than one state are presented (North Dakota is omitted). The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

If we zoom in on the graphs in Figs. 7, 8, 9, 10, 11, 12, 13 and 14, we see that there is much heterogeneity across the clusters. Figure 7 shows that Arizona, Florida and Michigan form a cluster of states that suffered strongly from the financial crisis in 2008/2009, and it took many years for these states to recover. In stark contrast, for the states in Cluster 2, which are California, Colorado, Iowa, Montana, Nebraska, Oregon, Utah and Washington, it took less than 3 years to recover from the crisis, and after that they show substantial positive growth. These states came out of the crisis very strongly. At the other end, the states in Cluster 3, which are Connecticut, Louisiana, Maine, Nevada, New Jersey and Rhode Island, no longer attained the high levels of growth that they had before the crisis. Cluster 4 only contains North Dakota, and this state seems to follow its own idiosyncratic pattern. Alaska, Kansas, South Dakota and Wyoming in Cluster 5 did not seem to suffer much of a setback from the 2008/2009 crisis. Clusters 6 and 7 contain the largest contingents of states, for which the crisis caused a setback of just a year or two, after which growth continued at a steady pace. Finally, the states in Cluster 8, which are Oklahoma and Texas, seem not to have suffered at all from the crisis.

Fig. 7

Cluster 1: Arizona, Florida, Michigan. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 8

Cluster 2: California, Colorado, Iowa, Montana, Nebraska, Oregon, Utah, Washington. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 9

Cluster 3: Connecticut, Louisiana, Maine, Nevada, New Jersey, Rhode Island. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 10

Cluster 4: North Dakota. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 11

Cluster 5: Alaska, Kansas, South Dakota, Wyoming. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 12

Cluster 6: Georgia, Hawaii, Idaho, Indiana, Maryland, Massachusetts, Minnesota, New York, Ohio, Pennsylvania, South Carolina, Tennessee. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 13

Cluster 7: Alabama, Arkansas, Delaware, Illinois, Kentucky, Mississippi, Missouri, New Hampshire, New Mexico, North Carolina, Vermont, Virginia, West Virginia, Wisconsin. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 14

Cluster 8: Oklahoma, Texas. The sample (horizontal axis) runs from 2006Q1 till 2017Q4 and the data are obtained from the Bureau of Economic Analysis (BEA)

Fig. 15

Scree plot

In sum, some clusters’ DBAs indicate that some states did not experience a downturn from the recession at all. This seems to hold for the states in cluster 7, including states such as Alabama, Kentucky, New Mexico and Wisconsin. In contrast, the states in cluster 3, which are Connecticut, Louisiana, Maine, Nevada, New Jersey and Rhode Island, did not seem to have recovered from the recession, even by the end of the sample in 2017Q4.

5 Conclusion

We introduced the Dynamic Time Warping technique to examine potential similarities across business cycles, where we focused on quarterly real GDP. This non-parametric DTW technique is based on distances between observations, and seeks to find a minimum total distance as the optimal path. As such, it allows for the possibility that lead and lag relations can change during the sample period. With a few examples we showed that this switching behaviour is more common than rare. Parametric models to allow for this behaviour will quickly become parameter-heavy, and given the commonly observed sample sizes, we believe that a non-parametric technique comes in handy.

The DTW technique is conceptually not complicated, although it may take some computer time to run the calculations, specifically when clusters need to be created. We envisage various other application areas for the DTW technique. In essence it is a method to connect moving objects, and as such many economic variables come to mind which may benefit from DTW analysis. Non-parametric count methods can be used to introduce statistical inference, to examine if a leading variable turned into a lagging variable after some event.

We proposed to use Dynamic Time Warping as a machine learning technique in economics and finance, here with an application to examine common patterns across economic variables. The study of common patterns is important, see also Corona et al. (2020), as it can lead to a better understanding of data features across multiple time series, and it can be beneficial for forecasting. With the advance of more and more detailed data, we foresee the study of common patterns as an important area for research. At the same time, when so many time series data become available, we also envisage a need for the application of machine learning techniques, see also Katris (2020), as these techniques can help to discover patterns that are not immediately obvious. Designing powerful machine learning methods to study common patterns is thus an important avenue for future research.