
Business Cycle Synchronization in the EU: A Regional-Sectoral Look through Soft-Clustering and Wavelet Decomposition

  • Research Paper
  • Published:
Journal of Business Cycle Research

Abstract

This paper elaborates on a sectoral-regional view of business cycle synchronization in the EU – a necessary condition for an optimal currency area. We argue that a complete and tidy clustering of the data improves the decision maker’s understanding of the synchronization phenomenon and the quality of economic decisions. We define the business cycles by applying a wavelet approach to drift-adjusted gross value-added data spanning 2000Q1 to 2021Q2. For the synchronization analysis, we propose a novel soft-clustering approach, which adjusts hierarchical clustering in several respects. First, the method relies on a synchronicity dissimilarity measure, noting that, for time series data, the feature space is the set of all points in time. Second, the “soft” part of the approach strengthens the synchronization signal by using silhouette scores. Finally, we add a probabilistic sparsity algorithm that drops out the most asynchronous, “noisy” data, improving the silhouette scores of the most and less synchronous groups. The method splits the sectoral-regional data into three groups: the synchronous group, which shapes the core EU business cycle; the less synchronous group, which may hint at lagging sectors and regions; and the asynchronous, noisy group, which may help investors diversify the through-the-cycle risks of their investment portfolios. Our results do not contradict the core-periphery hypothesis, suggesting that France, Germany, Austria and Italy, together with export-oriented economic activities, drive the core EU business cycle. The less synchronous group consists of agriculture, public services, and financial services, which respond to global shocks to a lesser extent and were more resilient to the COVID-19 outbreak. Finally, the dropout segment includes periphery regions, containing mainly agriculture and other domestically supplied services. The coherence analysis demonstrates the direction of spillovers, going from the core to the other groups.


Availability of data and material:

All data and materials used for this study are publicly available and named in the manuscript. To facilitate replication, data are also available upon request.

Notes

  1. For “periodic” boundary conditions, the coefficients are calculated by: \(\tilde{W}_{j}(t) = \sum _{l = 0}^{L^{*}_j - 1} \tilde{h}_{j, l} X((t - l) \mod T)\) and \(\tilde{V}_{j}(t) =\sum _{l = 0}^{L^{*}_j - 1} \tilde{g}_{j, l} X((t - l) \mod T)\), where the initial data series \(\{X(t)\}\) is used, for \(t = 0, \ldots , T-1\). A short R sketch of this circular filtering is given after these notes.

  2. The average synchronicity score within the “Type 2” group is 0.47, with a minimum of 0.18 and a maximum of 0.68.

  3. Two of the 120 series within the “Type 3” group demonstrate synchronicity larger than 0.47 (the average score of the “Type 2” cluster), while the average is 0.08, with a minimum of \(-0.26\) and a maximum of 0.6.
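
The circular filtering described in Note 1 can be illustrated with a minimal R sketch. The function below applies a given level-j MODWT filter to a series under the “periodic” boundary condition; the filter coefficients h_j are assumed to be supplied externally (e.g., taken from a wavelet package), and all names are illustrative only.

# Minimal sketch: circular ("periodic") MODWT filtering at level j.
# `x` is the input series of length T; `h_j` holds the level-j MODWT filter
# coefficients (assumed given); the scaling coefficients follow the same
# scheme with the g-filter.
modwt_periodic <- function(x, h_j) {
  T_len <- length(x)
  W_j <- numeric(T_len)
  for (t in seq_len(T_len)) {
    # translate the 0-based (t - l) mod T index of Note 1 to R's 1-based indexing
    idx <- ((t - 1) - (seq_along(h_j) - 1)) %% T_len + 1
    W_j[t] <- sum(h_j * x[idx])
  }
  W_j
}

set.seed(1)
x   <- cumsum(rnorm(86))            # toy series; 2000Q1-2021Q2 spans 86 quarters
h_j <- c(0.48, 0.84, 0.22, -0.13)   # illustrative coefficients only
w_j <- modwt_periodic(x, h_j)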

References

  • Aguiar-Conraria, L., & Soares, M. J. (2011). Business cycle synchronization and the euro: A wavelet analysis. Journal of Macroeconomics, 33(3), 477–489.

  • Aguiar-Conraria, L., & Soares, M. J. (2011). Oil and the macroeconomy: Using wavelets to analyze old issues. Empirical Economics, 40(3), 645–655.

  • Ahlborn, M., & Wortmann, M. (2018). The core-periphery pattern of European business cycles: A fuzzy clustering approach. Journal of Macroeconomics, 55, 12–27.

  • Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J. M., & Perona, I. (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243–256.

  • Arcabic, V., Panovska, I., & Tica, J. (2022). Business cycle synchronization and asymmetry in the European Union. Available at SSRN.

  • Ardila, D., & Sornette, D. (2016). Dating the financial cycle with uncertainty estimates: A wavelet proposition. Finance Research Letters, 19(C), 298–304.

  • Artis, M. J., & Zhang, W. (1997). International business cycles and the ERM: Is there a European business cycle? International Journal of Finance & Economics, 2(1), 1–16.

  • Artis, M. J., & Zhang, W. (2002). Membership of EMU: A fuzzy clustering analysis of alternative criteria. Journal of Economic Integration, 17(1), 54–79.

  • Bandrés, E., Gadea Rivas, M. D., & Gómez-Loscos, A. (2017). Regional business cycles across Europe. Banco de España Occasional Paper (1702).

  • Belo, F., et al. (2001). Some facts about the cyclical convergence in the euro zone. Economic Research Department.

  • Bengoechea, P., Camacho, M., & Perez-Quiros, G. (2006). A useful tool for forecasting the euro-area business cycle phases. International Journal of Forecasting, 22(4), 735–749.

  • Bruzda, J., et al. (2011). Business cycle synchronization according to wavelets: The case of Poland and the euro zone member countries. Bank i Kredyt, 42(3), 5–33.

  • Bry, G., & Boschan, C. (1971). Cyclical analysis of time series: Selected procedures and computer programs. National Bureau of Economic Research.

  • Bunyan, S., Duffy, D., Filis, G., & Tingbani, I. (2020). Fiscal policy, government size and EMU business cycle synchronization. Scottish Journal of Political Economy, 67(2), 201–222.

  • Campos, N. F., & Macchiarelli, C. (2016). Core and periphery in the European Monetary Union: Bayoumi and Eichengreen 25 years later. Economics Letters, 147, 127–130.

  • Celov, D., & Comunale, M. (2022). Business cycles in the EU: A comprehensive comparison across methods. In Essays in Honour of Fabio Canova, Advances in Econometrics, Vol. 44, pp. 99–146. Emerald Group Publishing Limited.

  • Christiano, L. J., & Fitzgerald, T. J. (1999). The band pass filter. Working Papers (Old Series) 9906, Federal Reserve Bank of Cleveland.

  • Cornish, C. R., Bretherton, C. S., & Percival, D. B. (2006). Maximal overlap wavelet statistical analysis with application to atmospheric turbulence. Boundary-Layer Meteorology, 119, 339–374.

  • Coudert, V., Couharde, C., Grekou, C., & Mignon, V. (2020). Heterogeneity within the euro area: New insights into an old story. Economic Modelling, 90, 428–444.

  • Crespo-Cuaresma, J., & Fernández-Amador, O. (2013). Business cycle convergence in EMU: A first look at the second moment. Journal of Macroeconomics, 37(C), 265–284.

  • Crowley, P. M., & Mayes, D. G. (2009). How fused is the euro area core? An evaluation of growth cycle co-movement and synchronization using wavelet analysis. OECD Journal: Journal of Business Cycle Measurement and Analysis, 2008(1), 63–95.

  • de Haan, J., Jacobs, J. P., & Zijm, R. (2023). Coherence of output gaps in the euro area: The impact of the COVID-19 shock. European Journal of Political Economy, 102369.

  • Di Giorgio, C. (2016). Business cycle synchronization of CEECs with the euro area: A regime switching approach. JCMS: Journal of Common Market Studies, 54(2), 284–300.

  • Dickerson, A. P., Gibson, H. D., & Tsakalotos, E. (1998). Business cycle correspondence in the European Union. Empirica, 25(1), 49–75.

  • Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Chapman & Hall/CRC.

  • Fidrmuc, J., & Korhonen, I. (2006). Meta-analysis of the business cycle correlation between the euro area and the CEECs. Journal of Comparative Economics, 34(3), 518–537.

  • Frankel, J. A., & Rose, A. K. (1998). The endogeneity of the optimum currency area criteria. The Economic Journal, 108(449), 1009–1025.

  • Inklaar, R., Jong-A-Pin, R., & De Haan, J. (2008). Trade and business cycle synchronization in OECD countries: A re-examination. European Economic Review, 52(4), 646–666.

  • Kalemli-Ozcan, S., Sørensen, B. E., & Yosha, O. (2001). Economic integration, industrial specialization, and the asymmetry of macroeconomic fluctuations. Journal of International Economics, 55(1), 107–137.

  • Kapounek, S., & Kučerová, Z. (2019). Historical decoupling in the EU: Evidence from time-frequency analysis. International Review of Economics & Finance, 60, 265–280.

  • Krugman, P. (1991). Increasing returns and economic geography. Journal of Political Economy, 99(3), 483–499.

  • Landesmann, M. A. (2003). The CEECs in an enlarged Europe: Patterns of structural change and catching-up. In Structural Reforms in the Candidate Countries and the European Union (pp. 28–87). Vienna: Austrian Ministry for Economic Affairs and Labour, Economic Policy Center.

  • Lee, S., Liao, Y., Seo, M. H., & Shin, Y. (2021). Sparse HP filter: Finding kinks in the COVID-19 contact rate. Journal of Econometrics, 220(1), 158–180.

  • McKinnon, R. I. (1963). Optimum currency areas. The American Economic Review, 53(4), 717–725.

  • Mink, M., Jacobs, J. P., & de Haan, J. (2012). Measuring coherence of output gaps with an application to the euro area. Oxford Economic Papers, 64(2), 217–236.

  • Monfort, M., Cuestas, J. C., & Ordóñez, J. (2013). Real convergence in Europe: A cluster analysis. Economic Modelling, 33, 689–694.

  • Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: Which algorithms implement Ward’s criterion? Journal of Classification, 31, 274–295.

  • Papageorgiou, T., Michaelides, P. G., & Milios, J. G. (2010). Business cycles synchronization and clustering in Europe (1960–2009). Journal of Economics and Business, 62(5), 419–470.

  • Percival, D. B., & Walden, A. T. (2000). Wavelet methods for time series analysis (Vol. 4). Cambridge University Press.

  • Phillips, P. C. B., & Shi, Z. (2019). Boosting: Why you can use the HP filter.

  • Rathke, A., Streicher, S., & Sturm, J.-E. (2022). How similar are country- and sector-responses to common shocks within the euro area? Journal of International Money and Finance, 120, 102313.

  • Roesch, A., & Schmidbauer, H. (2018). WaveletComp: Computational wavelet analysis. R package version 1.1.

  • Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.

  • Soares, M. J., et al. (2011). Business cycle synchronization and the euro: A wavelet analysis. Journal of Macroeconomics, 33(3), 477–489.

  • Stanišić, N., et al. (2013). Convergence between the business cycles of Central and Eastern European countries and the euro area. Baltic Journal of Economics, 13(1), 63–74.

  • Torres, M. E., Colominas, M. A., Schlotthauer, G., & Flandrin, P. (2011). A complete ensemble empirical mode decomposition with adaptive noise. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4144–4147.

  • Wortmann, M., & Stahl, M. (2016). One size fits some: A reassessment of EMU’s core-periphery framework. Journal of Economic Integration, 31(2), 377–413.


Acknowledgements

This research has received funding from the European Social Fund (project No. 09.3.3-LMT-K-712-01-123) under a grant agreement with the Research Council of Lithuania (LMTLT). The authors would like to thank the anonymous Referees for their very constructive and detailed comments and suggestions on the initial version of the manuscript, as well as Svatopluk Kapounek, Jesus Crespo Cuaresma and all participants of “Euro4Europe” workshops for comments and suggestions.

Author information


Corresponding author

Correspondence to Saulius Jokubaitis.

Ethics declarations

Conflict of interest:

The authors declare that they have no conflict of interest.

Ethical approval:

This article does not contain any studies with human participants or animals performed by any of the authors.

Code availability:

The empirical analysis of this study is performed in R (version 4.1.1). All codes used in the analysis are available upon request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Probabilistic Soft-Clustering: Comparison with the Benchmark Alternatives

This appendix compares the probabilistic soft-clustering approach, described in Sect. 2.6, with widely known hard-clustering methods. Seeking to assess the gains and the associated trade-offs introduced through the probabilistic filtering of the modelled variables, we repeat the analysis of Sect. 3.2, but instead of the soft-clustering approach we apply hierarchical clustering using the following dissimilarity measures: the synchronicity measure (13), the Euclidean distance, and the supremum distance. We posit a 3-cluster configuration, which aligns with the assumption of an existing core-periphery structure and makes the direct comparison between the approaches feasible. We treat these “hard” clustering methods as challenger models, in which none of the variables is assigned to a distinct Dropout cluster. The cluster mapping between the compared approaches follows from the overlap of the elements within the identified clusters.
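
As a rough illustration of this benchmark setup, the R sketch below builds the two classical dissimilarity matrices with base-R tools and cuts a Ward-linkage dendrogram into three clusters. The matrix `cycles` is a hypothetical stand-in for the extracted regional-sectoral cycles; the synchronicity-based dissimilarity of the paper would simply replace the dist object.

# Benchmark "hard" clustering sketch: 3 clusters, Ward's linkage.
# `cycles` is a hypothetical T x N matrix (rows = quarters, columns = series).
set.seed(1)
cycles <- matrix(rnorm(86 * 20), nrow = 86, ncol = 20)

cluster_hard <- function(d, k = 3) {
  cutree(hclust(d, method = "ward.D2"), k = k)
}

# Euclidean and supremum (Chebyshev) dissimilarities between the series
d_euc <- dist(t(cycles), method = "euclidean")
d_sup <- dist(t(cycles), method = "maximum")

cl_euc <- cluster_hard(d_euc)
cl_sup <- cluster_hard(d_sup)

# Cluster mapping between two partitions via the overlap of their elements
table(Euclidean = cl_euc, Supremum = cl_sup)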

1.1 A.1 Synchronization Measure

The first experiment compares the results of the probabilistic soft-clustering approach with a hard-clustering approach that keeps the synchronization dissimilarity measure, as defined by Eq. (16), and applies hierarchical clustering with Ward’s minimum variance linkage. Hence, this comparison isolates the gains brought by the probabilistic “soft” part of the clustering approach.

Figure 6 summarizes the results, where the obtained clusters are aligned based on the similarity of the corresponding first principal components. As with the pooled-data results, the benchmark approach retains the core signal within the analysed data. This observation follows from comparing “Cluster 3” with the Synchronous cluster in Fig. 6 and the corresponding first principal components, detailed in Fig. 8. However, even though the benchmark model, with \(48\%\) of variance explained, improves upon the pooled data (\(32\%\) of variance), it is still much noisier than the Synchronous cluster from the soft-clustering approach (\(60\%\) of variance). The other two clusters have a much smaller overlap with the corresponding soft-clustering groups. Moreover, the leading effect observed in Fig. 5 disappears when applying the hard clustering, where the position of local peaks and troughs is more aligned than when allowing for the soft modification (see Fig. 7).
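
The variance-explained figures quoted above can be reproduced, in spirit, from the first principal component of the series assigned to a cluster. A minimal sketch, assuming `cycles` is a hypothetical T x N matrix of extracted cycles and `members` indexes the columns belonging to one cluster:

# Share of variance explained by the first principal component of a cluster.
set.seed(1)
cycles <- matrix(rnorm(86 * 20), nrow = 86, ncol = 20)   # hypothetical cycles

pc1_share <- function(cycles, members) {
  p <- prcomp(cycles[, members, drop = FALSE], center = TRUE, scale. = TRUE)
  p$sdev[1]^2 / sum(p$sdev^2)   # proportion of total variance carried by PC1
}

pc1_share(cycles, members = 1:8)   # e.g. the members of one recovered cluster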

As established in Sect. B.1 and demonstrated in Sect. 3.2, the probabilistic soft-clustering approach provides the analyst with an additional set of tools that allow controlling the tightness and clarity of the core cluster and, hence, a better understanding of which regions and sectors constitute the core, as well as finding spillover directions that do not contradict a priori expectations (Fig. 7).

Fig. 6

Clustering results: hierarchical clustering with synchronization-based dissimilarities (left) and soft clustering with \(1 - \omega _5 = 0.55\) (right). The red line denotes the first principal component within the cluster

Fig. 7

Wavelet coherence graphs for the first principal components of the following clusters: Cluster 3 and Cluster 2 (left); Cluster 3 and Cluster 1 (right). Here, the clusters are formed using a “hard” clustering method with synchronization measure as the dissimilarity. The colours denote the cross-wavelet power by applying the Morlet wavelet. The border areas marked by black lines cover the region of significance based on Monte Carlo tests (1000 simulations, 10% significance), while the arrows mark significant regions with 5% significance. The white shaded cone covers the area of edge effects
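
A coherence analysis of this kind can be sketched with the WaveletComp package cited in the references; the exact argument names follow our reading of that package’s interface, and `pc_a`, `pc_b` stand in for the first principal components of two clusters (quarterly data, dt = 1/4).

# Sketch of a wavelet coherence computation between two cluster components.
library(WaveletComp)

set.seed(1)
pc_a <- cumsum(rnorm(86))                 # hypothetical first principal components
pc_b <- cumsum(rnorm(86))
df <- data.frame(pc_a = pc_a, pc_b = pc_b)

wc <- analyze.coherency(df, my.pair = c("pc_a", "pc_b"),
                        dt = 1/4,          # quarterly observations
                        make.pval = TRUE,  # Monte Carlo significance test
                        n.sim = 100)       # the paper reports 1000 simulations

# Coherence image with significance contours and phase arrows
wc.image(wc, which.image = "wc")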

Fig. 8

Comparison of the clustering results, “hard” vs “soft”, using the synchronization measure. First principal components of every formed cluster, together with the first principal component of all the cycles used in the analysis (in black, denoted as “Pooled”) and Var showing the proportion of variance explained. All principal components are normalized and formed by maximizing the explained variance (varimax rotation)

1.2 A.2 Euclidean and Supremum Dissimilarity Measures

The main feature utilized in our paper is the use of synchronization as the dissimilarity measure for clustering (see Eqs. (13)–(16)). As discussed in Sect. 3.3, an additional advantage of this approach is the clear interpretation behind the resulting clusters stemming from certain levels of cycle synchronization. However, the complete comparison requires evaluating the proposed soft-clustering against other widely used dissimilarity measures. In the second experiment, we repeat the clustering exercise using the Euclidean and supremum dissimilarity measures and compare the arising differences against the soft-clustering approach with hyper-parameters established in Sect. 3.2. Similar to the first experiment, we consider only the “hard” hierarchical clustering approach with a fixed 3-cluster setup, without applying any other modifications discussed in the paper. However, this time the differences are both due to dissimilarity measure and soft adjustments.

First, we examine the benchmark model based on the Euclidean dissimilarity, the results of which are detailed in Fig. 9. Observe that the Euclidean measure allows for the recovery of the core signals – this is evident when comparing each first principal component against the component of the Synchronous cluster, denoted in red. In fact, these results reveal a salient property of the Euclidean measure: the resulting cluster formations depend heavily on the variance of the series. For instance, examine the first principal components of Clusters 1–3. It quickly becomes apparent that the differences between them essentially stem from the magnitude of the response to the two shocks: the financial crisis of 2008 and the COVID-19 shock of 2020. Indeed, “Cluster 1” is driven by the substantial impacts of the COVID-19 shock, while such effects seem considerably dampened when inspecting the signals of “Cluster 2”. “Cluster 3”, in turn, appears to be a collection of series that likely lead “Cluster 2”, which is further supported by the coherence analysis presented in Fig. 10.

Figure 11 compares the first principal components. Note that all signals of the Euclidean benchmark approach are distributed roughly around the signal of the pooled group (see Fig. 1). It is evident that using the Euclidean distance as the dissimilarity measure does not immediately recover the Asynchronous signals that compose the periphery structure of the economy. Recovering the latter likely requires higher precision, which may be achieved by increasing the number of clusters. However, such an investigation diverges from the primary focus of this paper.

Fig. 9

Clustering results: hierarchical clustering, Euclidean dissimilarity measure (left) and soft clustering with \(1 - \omega _5 = 0.55\) (right). The red line denotes the first principal component within the cluster

Fig. 10

Wavelet coherence graphs for the first principal components of the following clusters: Cluster 3 and Cluster 2 (left); Cluster 3 and Cluster 1 (right). Here, the clusters are formed using a “hard” clustering method with Euclidean dissimilarity measure. The colours denote the cross-wavelet power by applying the Morlet wavelet. The border areas marked by black lines cover the region of significance based on Monte Carlo tests (1000 simulations, 10% significance), while the arrows mark significant regions with 5% significance. The white shaded cone covers the area of edge effects

Fig. 11

Comparison of the clustering results, “hard” vs “soft”, using the Euclidean dissimilarity measure. First principal components of every formed cluster, together with the first principal component of all the cycles used in the analysis (in black, denoted as “Pooled”) and Var showing the proportion of variance explained. All principal components are normalized and formed by maximizing the explained variance (varimax rotation)

Similarly to the previous discussion, we examine the benchmark model based on the supremum dissimilarity. Figure 12 compares the resulting clusters. The key aspects are as follows. First, Cluster 1 is able to identify the Asynchronous signal; in fact, 85% of the series from the Asynchronous group are in Cluster 1. Second, however, Cluster 2 and Cluster 3 do not recover the core signal of the data, as can be seen in Fig. 11 when comparing the first principal components with the pooled and Synchronous components. In this case, the Cluster 2 signal follows the Dropout signal, while the Cluster 3 signal overshoots at the tail-ends of the distribution. Thus, the supremum measure yields a split of the pooled signal into a lower and an upper part, without directly constructing around the mean signal. Indeed, 87% of the series from the Synchronous cluster are in Clusters 2 and 3, while a large part of the Dropout cluster is covered by Cluster 1. In this case, the directional spillover analysis becomes meaningless (Figs. 13, 14).

Fig. 12

Clustering results: hierarchical clustering, supremum measure (left) and soft clustering with \(1 - \omega _5 = 0.55\) (right). The red line denotes the first principal component within the cluster

Fig. 13

Wavelet coherence graphs for the first principal components of the following clusters: Cluster 3 and Cluster 2 (left); Cluster 3 and Cluster 1 (right). Here, the clusters are formed using a “hard” clustering method with supremum dissimilarity measure. The colours denote the cross-wavelet power by applying the Morlet wavelet. The border areas marked by black lines cover the region of significance based on Monte Carlo tests (1000 simulations, 10% significance), while the arrows mark significant regions with 5% significance. The white shaded cone covers the area of edge effects

Fig. 14

Comparison of the clustering results, “hard” vs “soft”, using the supremum measure. First principal components of every formed cluster, together with the first principal component of all the cycles used in the analysis (in black, denoted as “Pooled”) and Var showing the proportion of variance explained. All principal components are normalized and formed by maximizing the explained variance (varimax rotation)

In sum, none of the hard-clustering approaches could recover the spillover direction from core to periphery, and they provide a distorted view of the core and periphery composition. It is evident that classical dissimilarity measures are heavily affected by the shape and scale of the series, which may lead to overly high sensitivity in the presence of economic shocks, especially considering the necessary normalization of the extracted series. The synchronization measure, on the other hand, allows us to focus on whether the data are synchronized, bypassing the shape and phase analysis.

B A Simulation Study

Following Celov and Comunale (2022), we assume that the data is generated by:

$$\begin{aligned} y_{t} - y_{t}^{*} - \alpha C_{t}=~ & {} \nu _t, ~~~ \nu _t \sim \mathcal {N}(0, \sigma ^2), \end{aligned}$$
(25)

where \(y_t\) denotes the simulated time series, \(y_t^{*}\) is a stochastic local linear trend, \(C_t\) is a stochastic cycle with scaling factor \(\alpha\) controlling its overall variability (amplitude), and \(\nu _t\) denotes high-frequency irregular shocks. The time series are simulated using a state-space representation of the following structural unobserved components model. The signal equation is defined by (25), the states of the unobserved stochastic trend are defined by (26)–(27),

$$\begin{aligned} y_{t}^{*}~=~ & {} y_{t-1}^{*} + \mu _{t-1} + \varepsilon _{l, t}, ~~~\varepsilon _{l, t} \sim \mathcal {N}(0, \sigma _{l}^{2}), \end{aligned}$$
(26)
$$\begin{aligned} \mu _{t}~=~& {} \mu _{t-1} + \varepsilon _{\mu , t}, ~~~~~~~~~~~~\varepsilon _{\mu , t} \sim \mathcal {N}(0, \sigma _{\mu }^{2}), \end{aligned}$$
(27)

and the unobserved stochastic cycle is defined by (28)–(29),

$$\begin{aligned} C_{t}~=~& {} c_{t-1} + \rho _{c} C_{t-1}, \end{aligned}$$
(28)
$$\begin{aligned} (1 - 2\rho \cos (\lambda )B + \rho ^2B^2)c_t~=~& {} (1 - \rho \cos (\lambda )B) \varepsilon _{1,t}, ~~~\varepsilon _{1,t} \sim \mathcal {N}(0,1), \end{aligned}$$
(29)

where \(\mu _t\) is a time-varying drift with variance \(\sigma _{\mu }^{2}\); \(\sigma _{l}^2\) denotes the variance of the level state of \(y_{t}^{*}\); \(c_{t}\) is the inner first-order stochastic cycle with \(\rho , \rho _{c} \in (0,1)\) denoting the damping factors; \(\lambda = 2 \pi / T\) is the frequency component measured in radians, with T being the average duration of the inner cycle; and B denotes the backshift operator (\(By_t = y_{t-1}\)).

Throughout the simulation study, we assume the following parameters:

$$\begin{aligned} \alpha ~&= ~0.009,&~~\sigma ~&= ~0.01,&~~\sigma _{\mu }~&= ~0.00004,&~~\sigma _{l}~~{} & {} = ~0.001, \end{aligned}$$
(30)
$$\begin{aligned} \rho _{c}~&= ~0.55,&~~\rho ~&=~ 0.7,&~~\lambda ~&= ~0.2165.{} & {} \end{aligned}$$
(31)

Following the evidence presented in Celov and Comunale (2022), we posit that the series, simulated by (25)–(29) with parameters specified in (30)–(31), offer a reliable representation of the empirical EU data.
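
A compact R sketch of the data-generating process (25)–(29), with the parameter values (30)–(31) plugged in, might look as follows; the function name and the burn-in length are our own choices.

# Simulate one series from the unobserved-components DGP (25)-(29)
# with the parameters in (30)-(31). Names and burn-in are illustrative.
simulate_ucm <- function(T_len = 86, alpha = 0.009, sigma = 0.01,
                         sigma_mu = 0.00004, sigma_l = 0.001,
                         rho_c = 0.55, rho = 0.7, lambda = 0.2165,
                         burn = 100) {
  n <- T_len + burn
  y_star <- mu <- c_in <- C <- numeric(n)
  e1  <- rnorm(n)              # cycle innovations, N(0, 1)
  phi <- rho * cos(lambda)
  for (t in 3:n) {
    # inner stochastic cycle (29): (1 - 2*phi*B + rho^2*B^2) c_t = (1 - phi*B) e1_t
    c_in[t] <- 2 * phi * c_in[t - 1] - rho^2 * c_in[t - 2] +
               e1[t] - phi * e1[t - 1]
    # outer cycle (28) and stochastic local linear trend (26)-(27)
    C[t]      <- c_in[t - 1] + rho_c * C[t - 1]
    mu[t]     <- mu[t - 1] + rnorm(1, 0, sigma_mu)
    y_star[t] <- y_star[t - 1] + mu[t - 1] + rnorm(1, 0, sigma_l)
  }
  y <- y_star + alpha * C + rnorm(n, 0, sigma)   # signal equation (25)
  tail(y, T_len)
}

set.seed(1)
y <- simulate_ucm()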

Assume that \(C_{t}^{*}\) denotes a fixed core cycle. The key idea of this section involves the generation of three distinct categories of time series. The first category presumes that each series adheres to the same stochastic cycle \(C_{t}^{*}\); we can interpret this group as the core cluster. The second and third groups permit random variability in the stochastic cycle component. Throughout this section the groups are denoted as “Type 1”–“Type 3”, defined by (32)–(34). Note that the “Type 2” category is carefully curated to include only those series that demonstrate high synchronicity of the unobserved cycles with the core cycle \(C_{t}^{*}\); namely, the top 175 series out of 1 million simulated series are chosen (see Note 2). The “Type 3” category, while randomized, may contain series that synchronize with the core cycle, but only with a small probability (see Note 3).

$$\begin{aligned} {\textbf {Type 1}}:{} & {} y_{i,t} - y_{i,t}^{*} - \alpha C_{t}^{*} ~=~\nu _t, ~~~ \nu _t \sim \mathcal {N}(0, \sigma ^2), \end{aligned}$$
(32)
$$\begin{aligned} {\textbf {Type 2}}:{} & {} y_{i,t} - y_{i,t}^{*} - \alpha C_{i,t} ~=~\nu _t, ~~~ \text {high synchronicity with } C_{t}^{*}, \end{aligned}$$
(33)
$$\begin{aligned} {\textbf {Type 3}}:{} & {} y_{i,t} - y_{i,t}^{*} - \alpha C_{i,t} ~=~\nu _t, ~~~ \text {low (random) synchronicity with } C_{t}^{*}. \end{aligned}$$
(34)
Fig. 15

The generated series, as defined by Eqs. (32)–(34). The red line marks the first principal component of each group, explaining 89%, 49% and 12% of variance, respectively

Figure 15 summarizes the simulated data. The motivation behind the simulation study is to evaluate the soft-clustering performance when the underlying DGP is controlled and assumed to be representative of real economic cycles. For this reason, we generate a substantial number of series under the assumption that a certain portion of them is driven by the same core cycle of the economy – the \(C_{t}^{*}\) cycle (the “Type 1” series). Furthermore, it is reasonable to anticipate that a certain part of the economy may not adhere to the core cycle, yet the series may still display a measure of synchronization. Such an effect could stem from various sources, including but not limited to shared structural shocks, leading or lagging effects of the core cycle, and the convergence and divergence of certain economies, providing a functional interpretation of these series as the peripheral ones (“Type 2”). Finally, it is plausible that a set of the remaining (“Type 3”) series will be asynchronous or simply noisy.

For simplicity, in this simulation study, we assume that the size of these groups is approximately equivalent (in the range of 100–200 series), with the noise group set to take 25% of the data. The primary goal is to categorize the simulated series into three distinct clusters. Firstly, it is imperative to recover the core cycle of the economy and distinguish the specific series that are surrounding it. Secondly, it is crucial to identify the peripheral series. Since these may adhere to possibly divergent cycles, the analyst must be able to discern them separately. Finally, it is desirable to exclude all series that only contribute as noise.

As a benchmark, we use a hierarchical clustering approach with three dissimilarity measures: Euclidean, supremum, and the synchronicity measure. For simplicity, we assume that the underlying number of clusters is known and is equal to 3. Along with a baseline benchmark approach, we employ the probabilistic soft-clustering method as described in Sect. 2.6, where the hyper-parameters \((\omega _1, \omega _2, \omega _3)\) are set as (1000, 1, 2), in line with the results presented throughout this paper, while the parameter set \((\omega _4, \omega _5)\) is unknown and needs to be determined.

Through this simulation study, we demonstrate the following key points. First, we show that the soft-clustering approach with appropriately selected parameters \((\omega _4, \omega _5)\) can outperform the benchmark algorithms with all three considered dissimilarity measures (see Table 6). Second, in Sect. B.1, we establish heuristic guidelines that may, in particular cases, help define the search range for the optimal hyper-parameter pair \((\omega _4, \omega _5)\). Furthermore, we discuss a useful visual tool, defined as the \(\mathcal {L}\)-field, generated by the probabilistic clustering likelihood trajectories \(L_{i}^{(\omega _4)}\) (see Sect. 2.6, Algorithm step (3c)). We demonstrate how traversing through the \(\mathcal {L}\)-field can help determine the search range for the optimal hyper-parameter pair \((\omega _4, \omega _5)\) and provide insights regarding the underlying cluster structure of the modelled data. Finally, we argue that the detailed exploration of the \(\mathcal {L}\)-field, presented in Sect. B.1, provides additional insights for a better understanding of the inner logic of the proposed soft-clustering approach (Sect. 2.6) and its limitations.

Table 6 Clustering accuracy, where three dissimilarity measures are considered: Euclidean, supremum and synchronicity. For each case, hierarchical clustering is performed assuming a 3-cluster structure. Here T1–T3 correspond to “Type 1”–“Type 3” series, as defined by (32)–(34). The values denote the percentage of each cluster that is assigned to a given group by the algorithm. In the first (top) part of the table, the results are generated by the benchmark hierarchical clustering approach. In the second (bottom) part we apply the probabilistic soft-clustering approach, described in the paper, where the hyper-parameters \(\omega _4\) and \(\omega _5\) are chosen to yield the most accurate overall performance

2.1 B.1 Overview of the \(\mathcal {L}\)-field

Recall that the probabilistic clustering likelihood is defined as:

$$\begin{aligned} L_{i}^{(\omega _4)}= & {} \sum _{j: ~ p_{ij} > \omega _4} p_{ij}, \end{aligned}$$
(35)

where \(p_{ij}\) are estimated probabilities that indicator pair (ij) is grouped together, \(i, j \in \mathcal {I}\). For any probabilistic threshold value \(\omega _4 \in [0,1)\), consider a set of ordered likelihood trajectories \(L_{(1)}^{(\omega _4)} \le L_{(2)}^{(\omega _4)} \le \ldots \le L_{(N)}^{(\omega _4)}\), where \(N = |\mathcal {I}|\). Then, the \(\mathcal {L}\)-field can be defined as follows:

$$\begin{aligned} \mathcal {L}~:=~& {} \big \{L_{(i)}^{(\omega _4)}: 1 \le i \le N, ~~\omega _4 \in [0,1)\big \} \end{aligned}$$
(36)
$$\begin{aligned}=~& {} \big \{L_{\left( \lceil \omega _5 (N - 1) + 1 \rceil \right) }^{(\omega _4)}: \omega _5 \in [0,1], ~~\omega _4 \in [0,1)\big \}. \end{aligned}$$
(37)
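
A minimal sketch of how the likelihood trajectories in (35)–(37) could be tabulated in R from a matrix of pairwise co-clustering probabilities follows; the probability matrix below is randomly generated purely for illustration.

# L-field sketch: likelihood trajectories L_i^(w4) from a pairwise
# co-clustering probability matrix P (p_ij = share of bootstrap
# replications in which series i and j end up in the same cluster).
set.seed(1)
N <- 50
P <- matrix(runif(N * N), N, N); P <- (P + t(P)) / 2; diag(P) <- 1   # toy P

L_trajectory <- function(P, w4) {
  # L_i = sum of p_ij over j with p_ij > w4 (Eq. (35)), returned in increasing order
  sort(rowSums(P * (P > w4)))
}

# The L-field: one ordered trajectory per value of the threshold w4
w4_grid <- seq(0, 0.95, by = 0.05)
L_field <- sapply(w4_grid, function(w4) L_trajectory(P, w4))

# Reading off a trajectory value at dropout rate w5, as in Eq. (37);
# column 11 of the grid corresponds to w4 = 0.5
w5 <- 0.45
L_field[ceiling(w5 * (N - 1) + 1), 11]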

The hyper-parameter pair \((\omega _4, \omega _5)\) is a key part of the proposed probabilistic soft-clustering approach, described in Sect. 2.6. Individually, the two parameters control the probabilistic threshold value and the dropout percentage value. However, substantial improvements in the clustering accuracy are observed when both parameters are carefully chosen. The interplay between the two parameters and their consequent impact on the final clustering solution can be inspected through the \(\mathcal {L}\)-field. In Fig. 16, we present a hypothetical representation of the \(\mathcal {L}\)-field, assuming it follows from a specific DGP with a three-cluster structure. We conclude that this representation can help demonstrate the described parameter interplay, discern particular solution zones, and select an optimal hyper-parameter pair \((\omega _4, \omega _5)\).

The remainder of this section scrutinizes the \(\mathcal {L}\)-field through Fig. 16. We first briefly explore the theoretical model and end the section with observations backed by simulated data.

Fig. 16

A sketched example illustrating the \(\mathcal {L}\)-field and its dependence on the (\(\omega _4, \omega _5\)) parameter values

The x-axis of Fig. 16 denotes the percentage values of \(\omega _5\) and corresponds to the dropout rate. The y-axis denotes the values of the likelihood trajectories \(L^{(\omega _4)} = \{L^{(\omega _4)}_{\omega _5}: \omega _5 \in [0,1]\} \subset \mathcal {L}\), which can be normalized so as not to depend on the support size \(|\mathcal {I}|\). Each likelihood trajectory \(L^{(\omega _4)}\) is generated for some fixed \(\omega _4 \in [0,1)\). By Eq. (35), it follows that with increasing \(\omega _4\), a certain portion of the clustering probabilities is dismissed from the summation, leaving active only those indicators that group together with sufficient probability. Consequently, this yields an additional interpretation of the hyper-parameter \(\omega _4\) – as a baseline threshold for the expected tightness of the signal. Thus, traversal through the \(\mathcal {L}\)-field corresponds to increasing the expected tightness of the clusters, the direction of which is denoted by purple arrows in Fig. 16.

The underlying assumption behind the sketch in Fig. 16 is that the DGP has a three-cluster structure, where one of the clusters consists of noise variables. Assuming the separation between the groups is non-trivial, the likelihood curves are expected to have sigmoid shapes. In such a case, the upper part of the sigmoids follows from indicators that cluster with high probability. Such variables can be interpreted as the core synchronous signal, since they frame the base of the clustering exercise. Certain variables will likely be harder to classify accurately; they will therefore either be grouped into smaller clusters or possess lower overall grouping probabilities. We consider such variables a mixed-signal group. Note that the proposed grouping interpretation does not recover the true underlying clustering from the DGP; it is only used to distinguish the clustering difficulty of particular series. Hence, the core signal should be understood as the baseline signal, which the clustering algorithm handles in a straightforward way, while mixed signals are inherently difficult, and their incorrect classification leads to overall unstable clusters. These two signal groups are denoted by blue and green in Fig. 16. Note that the core signal should envelope the likelihood trajectories above the top elbow line, while the mixed-signal group falls between the upper and lower elbow lines. Finally, we argue that the lowest arm of the sigmoids is generated by the noisy signal. The noisy signals should either group with the remaining clusters with low probabilities, or group together as a separate noise cluster. Generally, a separable part of the noise variables will fall below the lowest elbow line. This zone is denoted by brown in Fig. 16.

Note the interplay between the \(\omega _4\) and \(\omega _5\) parameters. As established, \(\omega _4\) controls the tightness of the recovered signal, denoted by the arrows in Fig. 16, while \(\omega _5\) controls the dropout percentage, denoted by the x-axis. If we ignored the parameter \(\omega _4\) and focused only on the dropout percentage, we could find certain dropout thresholds that ensure a level of signal separation at the cost of either dismissing useful information from the analysis (the core and mixed signals) or retaining a part of the noise variables (the noise signals). For instance, examine the points \(M_1, M_2\) and \(M_3\) in Fig. 16, which denote the dropout thresholds for mixed-signal capturing, core-signal capturing and noise separation, respectively. Without controlling for \(\omega _4\), each threshold can isolate a certain family of signals only by dropping a part of the useful information from the final result. However, by enabling the parameter \(\omega _4\) we can search for optimal separation zones by moving through the corresponding likelihood trajectories. Ideally, the goal is to drop as little of the core signal as possible, since it captures the baseline indicators that are easy for the clustering algorithm to structure and are likely the focus of scientific research. Furthermore, the secondary aim is to reduce the noise signal to a minimum, while retaining as much of both the core and mixed signals as possible. Therefore, we deem that the optimal balance between retaining the signal and reducing the noise should be achieved in the optimal parameter zone, denoted by the purple colour in Fig. 16, located between the \(M_1\) and \(M_2\) dropout threshold points, above the lower elbow line.

These guidelines are consistent with the simulation study, where we assumed that the data-generating process is structured in a 3-cluster setup, with one of the clusters generated as the noise cluster.

Fig. 17

\(\mathcal {L}\)-field, generated for the data from the simulation study with synchronicity dissimilarity measure used. The colours of the likelihood trajectories are based on the \(\omega _4\) values, while the dropout percentage corresponds to the \(\omega _5\) values. The elbow lines, dropout thresholds \(M_1\) and \(M_2\) denote the hypothetical solution space, which is established following the guidelines, presented and illustrated in Fig. 16

Indeed, consider Fig. 17, which presents the corresponding \(\mathcal {L}\)-field obtained by applying the soft-clustering approach to the simulated data, where the synchronization dissimilarity measure is used for the clustering algorithm, and the algorithm assumes a two-cluster structure. Following the guidelines, the figure is appended with the resulting upper and lower elbow lines, which are deemed to separate the core, mixed and noise signals (denoted in Fig. 17 as the blue, green and yellow regions). Recall that these signals do not necessarily represent the three underlying clusters of the DGP, but are only referred to as signals based on the clustering difficulty, which can be sensitive to the specific clustering methodology used. The focus here is the optimal recovery of both the core and mixed-signal groups while dropping out as many noise variables as reasonably possible. From the guidelines of Fig. 16, the optimal parameter region should fall between the dropout thresholds \(M_1\) and \(M_2\) and between the upper and lower elbow lines. This region is denoted by the purple area in Fig. 17.

Since the data-generating process is known, we confirm the previously stated ideas by reviewing each hyper-parameter \((\omega _4, \omega _5)\) pair and inspecting which pair leads to the best-performing clustering model. The parameter pairs and the structures of the resulting clusters are presented in Fig. 18. First, we establish a baseline clustering approach by performing hierarchical clustering using the synchronicity dissimilarity measure, aiming for 3 clusters. The accuracy of the baseline approach is presented in Table 6. Next, in order to identify the core and mixed-signal regions, in Fig. 18 we present those hyper-parameter \((\omega _4, \omega _5)\) pairs that yield improved clustering results for either of the signal groups when compared against the benchmark groupings.
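
The accuracy figures of Table 6 are, in essence, row-normalized contingency tables between the recovered clusters and the known series types. A minimal sketch with made-up labels:

# Accuracy tabulation sketch: percentage of each recovered cluster that falls
# into each true series type (Type 1-Type 3). The labels below are made up;
# in practice `true_type` comes from the simulation design and `cluster` from
# the clustering output.
set.seed(1)
true_type <- sample(paste0("T", 1:3), size = 450, replace = TRUE)
cluster   <- sample(1:3, size = 450, replace = TRUE)

round(100 * prop.table(table(Cluster = cluster, Type = true_type), margin = 1), 1)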

Consider the upper right graph of Fig. 18. The presented points yield perfect separation of the Type 1 group (see Eq. (32)). Note that most of these points fall below the upper elbow line, as established by Fig. 17. Furthermore, following the guidelines of Fig. 17, observe that many points fall to the left of the \(M_1\) dropout threshold and below the lower elbow line. This provides evidence that the Type 1 group is primarily contained within the core signal zone.

Furthermore, examine the lower left side of Fig. 18. The presented points result in improved separation of the Type 2 group compared to the benchmark clustering results. Note that, when compared with the guidelines of Fig. 17, many of the presented points fall within the optimal parameter region.

Finally, examine the lower right side of Fig. 18, where, as in the previous observations, we present points that improve the separation of the noise series compared with the benchmark clustering results. When compared with the guidelines of Fig. 17, we observe that most of the points fall above the lower elbow line, to the right of the dropout threshold \(M_1\). This provides evidence that a significant part of the noise series belongs to the noise signal region (highlighted in yellow, Fig. 17).

These results suggest that a large part of the Type 1 and Type 2 series are successfully assigned to the core signal group by the clustering algorithm. Furthermore, it follows that most of the clustering inaccuracies stem from the Type 2 series mixed with the noise series. By combining all of the previous observations we can find a set of optimal points that improve the separation of all three groups when comparing against the benchmark model. These are presented in the top left part of Fig. 18, while in the bottom part of Table 6, we present the precision of the best parameter pair. For robustness, this is repeated for all three dissimilarity measures.

Fig. 18

\(\mathcal {L}\)-field, generated for the data from the simulation study with the synchronicity dissimilarity measure used. Selected hyper-parameter \((\omega _4, \omega _5)\) pairs are plotted based on the results they yield when compared against the benchmark model: optimal points, where the clustering accuracy is improved for all three clusters (top left); points yielding perfect (100%) separation of the core signal (top right); points yielding improved mixed-signal separation accuracy (bottom left); points yielding improved separation of the noise cluster (bottom right). The colours of the likelihood trajectories are based on the \(\omega _4\) values, while the dropout percentage corresponds to the \(\omega _5\) values

Note that the previously determined optimal set is much narrower than the optimal parameter zone established above. Therefore, we deem that the zone should only be used as a recommendation when setting the initial grid of the hyper-parameter search, to be adjusted further based on additional analysis through, e.g., Silhouette scores, cluster cleaning approaches, etc. Furthermore, in practice, the definition of “optimal” may vary depending on the research goals. Nonetheless, the tools provided by the \(\mathcal {L}\)-field can easily be tailored to different optimization objectives by adjusting both the setup of the elbow lines and the interpretation of the resulting zones.

2.2 B.2 \(\mathcal {L}\)-field of the EU regional-sectoral data

In Sect. 3.2, we discussed the selection of the optimal hyper-parameter pair \((\omega _4, \omega _5)\). The reasoning is rooted in balancing the sizes of the Synchronous and Asynchronous clusters while, at the same time, maximizing the Silhouette scores. In the main part of the paper, we employed a grid-search approach to ensure that all of the parameter values are considered; the results are detailed in Appendix E.

In this section, we supplement the discussion of Sect. 3.2 by inspecting the corresponding \(\mathcal {L}\)-field shown in Fig. 19. The elbow lines and the consequent dropout threshold points \(M_1\) and \(M_2\) align with the guidelines proposed in Sect. B.1. Compared with the results of the simulation, we can distinguish at least four elbow lines. However, the uppermost two lack precise definitions due to the shallow slopes of certain portions of the underlying likelihood trajectories (see, e.g., the likelihood trajectories to the left of \(M_2\)). This observation suggests that a sufficiently strong core signal can be recovered even with \(\omega _5\) close to 0, depending on the choice of \(\omega _4\).

The two uppermost elbow lines may add another level of granularity to the resulting clusters: following the details of the topmost elbow line, we can distinguish two dropout threshold points, \(M_1\) and \(M_2\). Here \(M_1\) imposes a cutoff threshold that captures the topmost core signal, while \(M_2\) roughly captures the turning point signalling a substantial difference in clustering efficiency (i.e., the change of steepness around \(M_2\) when comparing \(\omega _4 = 0.9\) with \(\omega _4 < 0.7\)).

In accordance with the established guidelines and assuming a 3-cluster structure, we assert that the lowest two elbow lines should outline the optimal parameter region. We assume that the lowermost elbow line separates the noise signal, while the second lowest separates the core and mixed signals. This assumption allows us to establish the optimal parameter region, contained within \(\omega _4 \in [0.55, 0.95]\) and \(\omega _5 \in [0, 0.7]\), highlighted in Fig. 19. In addition, Table 10 shows in bold the Silhouette scores for hyper-parameter \((\omega _4, \omega _5)\) pairs falling into the established optimal parameter region. Except for \(\omega _5 = 0.1\), the highlighted scores are the highest Silhouette scores in their respective columns of the table. This confirms that parameters within the region are likely to yield adequate clustering results. It then remains for the researcher to choose a suitable tightness baseline through \(\omega _4\) and a desirable dropout rate through \(\omega _5\).

Fig. 19

\(\mathcal {L}\)-field, generated for the GVA data with synchronicity dissimilarity measure used. The colours of the likelihood trajectories are based on the \(\omega _4\) values, while the dropout percentage corresponds to the \(\omega _5\) values. The elbow lines, dropout thresholds \(M_1\) and \(M_2\) denote the hypothetical solution space, which is established following the guidelines, presented and illustrated by Fig. 16

2.3 B.3 Isolating the COVID-19 effect

The purpose of this subsection is to assess the impact of the COVID-19 shock, as observed through the GVA data, on the outcomes of the soft-clustering approach. In line with the discussion in Sect. 3.2, we consider \(\omega _4 = 0.8\) and limit the initial data sample by excluding the time period after the COVID-19 outbreak. Thus, we consider \(\mathcal {T} = \{ 2000Q1, \ldots , 2019Q4\}\), and for convenience in the text below we will call it the “pre-COVID” sample, distinguishing it from the “full” data sample. Furthermore, we let \(\omega _5 \in \{0.4, 0.45, \ldots , 0.95\}\), which allows us to investigate the resulting signal compositions under different noise profiles.

In Fig. 22 we present the \(\mathcal {L}\)-field (as described in Sect. B.1). Compare Fig. 22 with Fig. 19, where the latter uses the “full” dataset. Examining the likelihood trajectories with high signal tightness (e.g., those where \(\omega _4 > 0.8\)), we can observe two COVID-19-induced “ripples”, which in Sect. B.2 help establish the uppermost elbow lines outlining the core signal zone. In contrast, in the pre-COVID case, the likelihood trajectories are almost linear, without significant humps. In such a case, establishing clear elbow lines and the subsequent core signal zones becomes difficult. These observations suggest that the COVID-19 shock, from the perspective of the probabilistic soft-clustering algorithm, induces the grouping of particular variables with high probability (i.e., \(\omega _4 \ge 0.8\)). However, whether these co-movements are spurious or of a systemic nature can be answered with the help of Fig. 20.

Figure 20 presents the overlap of the resulting clusters when comparing the “pre-COVID” with the “full” sample case. The results demonstrate the stability of the Synchronous cluster with around 80% of the series retaining the same grouping over a variety of \(\omega _5\) values. Although to a smaller degree (no less than 50%), similar stability is observed for the Asynchronous cluster. The composition of the Dropout cluster is less stable, approaching 80% with increasing \(\omega _5\) values. For \(\omega _5 \in [0.05, 0.4]\) a significant portion of the Dropout cluster is considered as Synchronous.

Finally, Fig. 21 shows the first principal components of each cluster. The core signals of each cluster overlap to a large extent, which suggests a degree of stability and robustness to omitting the COVID-19 shock. We argue that this property results from both the probabilistic soft-clustering approach and the synchronization measure used.

Fig. 20

Comparison of the soft-clustering compositions. The reference clusters (three graphs for three clusters) are formed using the full data sample, while the examined clusters (discerned by red, green and blue) are formed using the pre-COVID dataset. Here \(\omega _4 = 0.8\), and the \(\omega _5\) values are denoted on the x-axis. The y-axis denotes the percentage of the reference cluster that is contained within one of the examined clusters

Fig. 21

First principal components of three clusters: Asynchronous, Dropout and Synchronous. Each cluster is formed for two data samples: the “full” sample and the “pre-COVID” sample. The principal components are estimated using the “pre-COVID” sample in both cases; the “full” sample was used only for the clustering. A soft-clustering approach was used in all cases with \(\omega _4 = 0.8\) and \(\omega _5 = 0.45\)

Fig. 22

The \(\mathcal {L}\)-field, generated for the GVA data using the pre-COVID sample. The colours of the likelihood trajectories are based on the \(\omega _4\) values, while the dropout percentage corresponds to the \(\omega _5\) values. The elbow-lines, dropout threshold \(M_1\) denote the hypothetical solution space, established following the guidelines, presented and illustrated by Fig. 16

C Comparison of Synchronous Business Cycle Composition

The analysis below demonstrates the trade-off between highlighting the core sectoral-regional data points and the size of the dropout rate. We find that small dropout rates tend to keep many boundary cases, while excessively high rates remove too many business cycles of the core, open-to-trade sectors of the economy. The core regions, however, are the least affected. Hence, we would suggest higher dropout rates to highlight the main contributors, but the findings should be adopted with moderation when formulating economic policy recommendations (Table 7).

Table 7 Composition of synchronous business cycles clusters under differing dropout rates. Here the probability threshold is 80% and dropout rates are 5%, 45%, 65%. The ‘*, **, ***’ denote the inclusion at different dropout rates, with ‘***’ meaning inclusion in all three cases, and ‘*’ inclusion only under 5% dropout. See Appendix F, Table 14 for the list of economic activities and their abbreviations

D Alternative Set of Parameters

We argued in the paper that higher dropout rates have a smaller impact on the synchronous core and demonstrated this by analysing the, in our view, optimal set of hyper-parameters. To complete the discussion, in this Appendix we show alternative sets of parameters with dropout rates \(\omega _5 \in \{0.05, 0.65\}\). We notice that all the main conclusions of the paper remain valid (Figs. 23, 24, Tables 8, 9).

Fig. 23

Summary of Cluster 2 (\(\omega _4 = 0.8\), \(1 - \omega _5 = 0.95\)). Overview of the clustered cycles (top left; grey) with the corresponding first principal component (top left; red), sectoral breakdown (top right); map representation of the clustered series, where the colour corresponds to the number of different sectors selected from the same country (bottom left); summed proximity of synchronization scores at the country level (bottom right). See Appendix F, Table 14 for the list of economic activities and their abbreviations

Table 8 Clustering results: \(\omega _4 = 0.8\), \(1-\omega _5 = 0.95\). The results consist of two clusters (Asynchronous and Synchronous, denoted by ‘−’ and ‘+’, respectively). The remaining series after the dropout are denoted by blank spaces in the table and are aggregated under the Dropout cluster. See Appendix F, Table 14 for the list of economic activities and their abbreviations
Fig. 24

Summary of Cluster 2 (\(\omega _4 = 0.8\), \(1 - \omega _5 = 0.35\)). Overview of the clustered cycles (top left; grey) with the corresponding first principal component (top left; red), sectoral breakdown (top right); map representation of the clustered series, where the colour corresponds to the number of different sectors selected from the same country (bottom left); summed proximity of synchronization scores at the country level (bottom right). See Appendix F, Table 14 for the list of economic activities and their abbreviations

Table 9 Clustering results: \(\omega _4 = 0.8\), \(1 - \omega _5 = 0.35\). The results consist of two clusters (Asynchronous and Synchronous, denoted by ‘−’ and ‘+’, respectively). The remaining series after the dropout are denoted by blank spaces in the table and are aggregated under the Dropout cluster. See Appendix F, Table 14 for the list of economic activities and their abbreviations

E Grid-Search Analysis

This appendix summarizes the grid-search results for the hyper-parameters of the soft-clustering approach \((\omega _3,\omega _4,\omega _5)\), with \(\omega _1 = 1000\) bootstrapped samples, \(\omega _2 = 1\), and \(\omega _6 = \omega _3\). The tables show the average and minimum silhouette scores before (initial) and after (final) the silhouette clean-up from Sect. 2.5 (Tables 11, 12, 13).
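
For a single grid cell, the silhouette summaries reported in Tables 10–13 can be computed with the cluster package, as sketched below. The partition `cl` and the dissimilarity `d_sync` are placeholders standing in for the output of the soft-clustering step and the synchronicity dissimilarity; reading the “minimum” as the smallest cluster-average width is our assumption.

# Silhouette summary sketch for one grid cell of the (w4, w5) search.
library(cluster)

set.seed(1)
cycles <- matrix(rnorm(86 * 30), nrow = 86, ncol = 30)
d_sync <- dist(t(cycles))                                    # placeholder dissimilarity
cl     <- cutree(hclust(d_sync, method = "ward.D2"), k = 3)  # placeholder partition

sil <- silhouette(cl, dist = d_sync)
avg_score <- mean(sil[, "sil_width"])            # table entry: average score
min_score <- min(summary(sil)$clus.avg.widths)   # table entry: minimum cluster average

c(average = avg_score, minimum = min_score)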

Table 10 Composition of silhouette scores during the grid-search. Here \(\omega _1 = 1000\), \(\omega _2 = 1\) and \(\omega _6 = \omega _3\). The presented scores correspond to the average score between all clusters, after the silhouette clean-up. The bold values correspond to \((\omega _4, \omega _5)\) pairs that fall into the optimal parameter zone, as suggested by Fig. 19
Table 11 Composition of silhouette scores during the grid-search. Here \(\omega _1 = 1000\), \(\omega _2 = 1\) and \(\omega _6 = \omega _3\). The presented scores correspond to the minimum score between all clusters, after the silhouette clean-up
Table 12 Composition of silhouette scores during the grid-search. Here \(\omega _1 = 1000\), \(\omega _2 = 1\) and \(\omega _6 = \omega _3\). The presented scores correspond to the average score between all clusters, before the silhouette clean-up
Table 13 Composition of silhouette scores during the grid-search. Here \(\omega _1 = 1000\), \(\omega _2 = 1\) and \(\omega _6 = \omega _3\). The presented scores correspond to the minimum score between all clusters, before the silhouette clean-up

F NACE Economic Activities Codes

Table 14 Statistical classification of economic activities in the European Community (NACE Rev. 2)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jokubaitis, S., Celov, D. Business Cycle Synchronization in the EU: A Regional-Sectoral Look through Soft-Clustering and Wavelet Decomposition. J Bus Cycle Res 19, 311–371 (2023). https://doi.org/10.1007/s41549-023-00090-4
