1 Introduction

Time series prediction and classification in finance are significantly challenging due to the complexity, multivariate nature, and non-stationarity of time series in this domain [1]. Security trading and price dynamics in financial markets are particularly complex because their underlying driving forces and determinants interact and are interconnected, leading to significant co-movements in stock prices. The characterization and modeling of multivariate time series dynamics have long been discussed in the financial literature, where the prevailing approach is based on classical econometric theory. Among the multivariate linear models, the most widespread are vector autoregressive (VAR) models [2, 3], vector moving averages, ARMA (autoregressive moving average) models [4], and cointegrated VAR models [5]. Also widespread are multivariate conditional heteroskedasticity models of the GARCH type (see, e.g., [6] for a review), multivariate stochastic volatility models [7], and methods based on realized volatility [8].

Among the nonlinear models, the threshold autoregressive model [9], smooth transition autoregressive models [10], and Markov switching models [11] are nowadays standard approaches. Alternatives include nonparametric methods, functional coefficient [12] and nonlinear additive AR models [13], recurrence analysis, and neural networks. The complexity of modern financial markets running over the so-called limit-order book mechanism is, however, characterized by typical nonlinear, noisy, and often non-stationary dynamics. In addition, the high-dimensional nature of the limit-order book flow and the complexity of the interactions within it constitute severe limits in the applicability of classic econometric methods for its modeling and forecasting. Besides a very limited number of analytical and tractable models for the order flow and price dynamics in limit-order books [14,15,16], machine learning methods have received much attention [17, 18], as they are naturally appealing in this context.

By considering the stock market as a complex system, it is natural to apply such methods for addressing those prediction problems where the application setting and assumptions beneath standard econometric techniques are stringent or inadequate. Indeed, it has extensively been shown that, in financial applications, deep learning (DL) models are often capable of outperforming traditional approaches due to their ability to learn complex data representations based on end-to-end data-driven training, see, e.g., [19,20,21,22,23,24]. DL models have been adopted for a variety of problems ranging from price prediction [25,26,27,28,29], limit-order book-based mid-price prediction [20, 21, 23, 30, 31], and volatility prediction [32,33,34].

Whereas the target of the above literature is generally the analysis and prediction of single time series, this paper focuses on an analysis of stock pair co-movements. Several trading strategies can be put into play to take advantage of co-movements and exploit statistical arbitrages, including pair-trading, portfolio management, or relative and convergence trading strategies applied at an intraday level; see, e.g., [35] for an overview. While DL provides a basis for prediction given a set of descriptive features, the issue of how to detect and quantify co-movements remains to be addressed. This paper suggests the use of recurrence analysis based on cross recurrence plots (CRP) for detecting and extracting features indicative of stocks’ shared dynamics or co-movements, along with a deep learning framework for predicting whether certain pairs of stocks will exhibit shared dynamics in the future (in the sense specified in Sect. 2). Forecasting epochs of time series synchronization not only extends the ML and applied econometrics literature in this direction but is also relevant for practitioners.

For detecting and quantifying co-movements or, more generally, shared dynamical features in time series, the standard econometric approach is that of cross-correlation analysis [11, Ch. 8]. This intuitive linear approach, based on the estimation and perhaps forecasting of cross-correlation matrices, appears to be an element of a much wider theory and methodological approach that has been explored and developed in the last years within a broader generic non-financial setting. Simple cross-correlation analysis has been remarkably extended and generalized toward methods that help explore co-movements between time series within nonlinear, noisy, and non-stationary systems of very complex dynamics, either financial [36,37,38] or not [39,40,41].

Recurrence analysis [42] explores the reconstruction of a phase-space using time-delay embedding for quantifying characteristics of nonlinear patterns in a time series over time [43]. This is done by calculating the so-called Recurrence Plot [44], the core concept of which is to identify all points in time at which the phase-space trajectory of a single time series visits roughly the same area in the phase-space. Recurrence plot analysis makes no assumptions and imposes no limitations on the dimensionality, distribution, stationarity, or size of the data [42]. These characteristics make it suitable for multidimensional and non-stationary financial time series data analysis. The CRP [45] is an extension of the recurrence plot, introduced to analyze the co-movement and synchronization of two different time series. The CRP indicates the points in time at which one time series visits the state of another time series, possibly of different length, in the same phase-space. These concepts are discussed in further detail in Sect. 2.

In this paper, we propose a method for predicting the state of synchronization over time of two multidimensional financial time series based on their CRP. In particular, we use the CRP to quantify the co-movements and extract the binary representation of its diagonal elements as the prediction targets for a DL model. For predicting the state of synchronization at the next epoch, we employ a Convolutional Neural Network (CNN) that uses as inputs CRPs independently calculated from data-crops obtained by applying fixed-size sliding windows on the time series. Our extensive experiments on 12 stocks of the S&P 100 index selected from different sectors show that the proposed method can predict the synchronization of stock pairs with satisfactory performance.

The remainder of the paper is organized as follows. Section 2 introduces in detail the concepts and theory behind the CRP, with an outlook on its applications in financial and economic problems. Our proposed approach for predicting time instances of time series’ synchronization is presented in Sect. 3. Empirical results on real market data are provided in Sect. 4, while Sect. 5 provides conclusions.

2 Financial time series recurrence analysis

Recurrence in the analysis of time series, seen as a nonlinear dynamic system, is the repetition of a pattern over time. The recurrences in the dynamics of a time series can be visualized via a recurrence plot (RP) or recurrence matrix [42]. In other words, the RP represents the recurrence of the phase-space trajectory to a state. The phase-space of a d-dimensional time series \({\mathcal {N}}\) with T observations \({\mathcal {N}} =\{{\textbf{n}}_1^\top , {\textbf{n}}_2^\top ,\dots , {\textbf{n}}_T^\top \}^\top\), with \({\textbf{n}}_i\) being the row vector representing a generic observation at time i, \(i = 1,\dots ,T\), is calculated using the time-delay embedding method. State \(N_i\) in the phase-space is obtained by

$$\begin{aligned} N_i = [{\textbf{n}}_i, {\textbf{n}}_{i+\tau },\dots , {\textbf{n}}_{i+(k-1)\tau }], \,\, i=1,\dots ,T' \text {,} \end{aligned}$$
(1)

where \(\tau\) denotes the delay, k is the embedding dimension, and \(T' = T-\tau (k-1)\). The parameters \(\tau\) and k can be determined with the Average Mutual Information Function method [46] and the False Nearest Neighbors method of [47], respectively. For a uni-dimensional time series, \(N_i\) is a row vector of size \((1 \times k)\), and for a d-dimensional time series \(N_i\) is a row vector of size \((1 \times kd)\). The recurrence state matrix of the reconstructed phase-space, known as Recurrence Plot (RP), at times i and j, is defined as

$$\begin{aligned} R_{i,j}(\varepsilon ) = H(\varepsilon - \Vert N_i - N_j\Vert ), \,\, i,j = 1,\dots ,T' \text {,} \end{aligned}$$
(2)

where \(\varepsilon\) is a threshold distance value, \(H(\cdot )\) is the Heaviside function, and \(\Vert \cdot \Vert\) is the Euclidean distance. Due to the underlying embedding (1), \(R_{i,j}\) is defined for \(i, j = 1\) up to \(T' = T-\tau (k-1)\). If two states \(N_i\) and \(N_j\) are in an \(\varepsilon\)-neighborhood, the value of \(R_{i,j}\) is equal to 1; otherwise it is 0. The value of \(\varepsilon\) highly affects the output of the RP. When \(\varepsilon\) is too small or too large, the RP cannot identify the true recurrence of states. There are different approaches for finding the best value for \(\varepsilon\) in the literature [42]. We follow the guidelines provided in [48] for selecting \(\varepsilon\). The values on the diagonal line of the RP are equal to one (i.e., \(R_{i,i} = 1\)) because in that case the two states introduced to \(H(\cdot )\) are identical. The diagonal line of the RP is called the Line Of Identity (LOI). Recurrence quantification measures derived from RPs, such as recurrence rate (RR), percent determinism, and maximum line length in the diagonal direction [42], give insights into the dynamic behavior of time series. These measures have been used in financial research to analyze the behavior of financial data, e.g., [49,50,51]. The RP of a time series can be used as a data transformation method for time series prediction. A method employing the RP of seven financial time series to train a deep neural network for predicting the market movement is proposed by [52]. Several authors have suggested an RP forecasting approach via DL. A feature extraction method exploiting the RP as input to a DL algorithm is proposed by [53]. RPs can be treated as images, enabling the use of computer vision techniques for the forecasting task, such as autoencoders [54] or CNNs [55].
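As a minimal sketch of Eqs. (1) and (2), the following NumPy code builds the delay-embedded states and thresholds their pairwise Euclidean distances. The function names `embed` and `recurrence_plot`, the sine-wave input, and the convention \(H(0) = 1\) are illustrative assumptions and are not taken from the original method description.

```python
import numpy as np


def embed(series, k, tau):
    """Time-delay embedding, Eq. (1): row i is [n_i, n_{i+tau}, ..., n_{i+(k-1)tau}]."""
    x = np.asarray(series, dtype=float)
    if x.ndim == 1:                      # treat a univariate series as d = 1
        x = x[:, None]
    t_prime = x.shape[0] - tau * (k - 1)  # T' = T - tau (k - 1)
    if t_prime <= 0:
        raise ValueError("series too short for the chosen k and tau")
    # Place the k delayed copies side by side: shape (T', k * d).
    return np.hstack([x[j * tau : j * tau + t_prime] for j in range(k)])


def recurrence_plot(series, eps, k=1, tau=1):
    """Recurrence plot, Eq. (2): R[i, j] = H(eps - ||N_i - N_j||)."""
    states = embed(series, k, tau)
    dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    return (dist <= eps).astype(int)     # assumes H(0) = 1


x = np.sin(np.linspace(0.0, 4.0 * np.pi, 50))   # synthetic example series
rp = recurrence_plot(x, eps=0.2, k=2, tau=3)
print(rp.shape)  # (47, 47)
```

With \(T = 50\), \(k = 2\), and \(\tau = 3\), the plot has \(T' = 47\) rows and columns; its main diagonal (the LOI) is all ones and the matrix is symmetric, as expected for a single series compared with itself.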

The CRP of two multidimensional time series [56] corresponds to an extension of the RP which explores the co-movement of two time series and allows the study of the nonlinear dependencies between them. Let us denote by \(N_i, \,i=1,\dots ,T\) and \(M_j, \,j=1,\dots ,S\) the phase-space states of the time series \(\mathcal {N}\) and \(\mathcal {M}\) of length T and S, respectively. The cross recurrence (CR) states of the reconstructed state-space at times i and j are

$$\begin{aligned} CR_{i,j}(\varepsilon ) = H(\varepsilon - \Vert N_i - M_j \Vert ) \end{aligned}$$
(3)

with \(i = 1,\dots ,T' = T-\tau (k-1)\) and \(j=1,\dots ,S' = S-\tau (k-1)\). Here \(N_i\) and \(M_j\) are defined as in (1). \(\textrm{CR}_{i,j}\) defines the concept of synchronization and the way synchronization between two financial time series is measured: the two series are synchronized at epochs (i, j) when the embeddings \(N_i\) and \(M_j\) lie in an \(\varepsilon\)-neighborhood of each other. We denote the full cross recurrence matrix, known as the cross recurrence plot (CRP), extracted for \(\mathcal {N}\) and \(\mathcal {M}\) as \(\textrm{CRP}_{(\mathcal {N},\mathcal {M})}\), obtained through

$$\begin{aligned} \textrm{CRP}_{(\mathcal {N},\mathcal {M})}:= \left\{ \textrm{CR}_{i,j}(\varepsilon ) \right\} _{i = 1,\dots ,T',\, j = 1,\dots ,S'}\text {.} \end{aligned}$$
(4)

The CRP corresponds to a matrix of dimension \(T' \times S'\), which may not be square, as the time series \({\mathcal {N}}\) and \({\mathcal {M}}\) may have different lengths, i.e., \(T\ne S\).
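Equations (3) and (4) can be sketched along the same lines. The snippet below is an illustration under stated assumptions: the helper names, the toy series, and the convention \(H(0) = 1\) are hypothetical. It computes the CRP of two univariate series of different lengths, so the result is not square.

```python
import numpy as np


def embed(series, k, tau):
    """Time-delay embedding as in Eq. (1); returns an array of shape (T', k * d)."""
    x = np.asarray(series, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    t_prime = x.shape[0] - tau * (k - 1)
    return np.hstack([x[j * tau : j * tau + t_prime] for j in range(k)])


def cross_recurrence_plot(n, m, eps, k=1, tau=1):
    """CRP, Eqs. (3)-(4): CR[i, j] = H(eps - ||N_i - M_j||), shape (T', S')."""
    a = embed(n, k, tau)                 # states of the first series
    b = embed(m, k, tau)                 # states of the second series
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return (dist <= eps).astype(int)     # assumes H(0) = 1


# Toy series: m repeats the values of n with a shift of two time steps.
n = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # T = 6
m = np.array([2.0, 3.0, 4.0, 5.0])             # S = 4
crp = cross_recurrence_plot(n, m, eps=0.1, k=2, tau=1)
print(crp.shape)  # (5, 3)
print(crp)
```

In this toy case the only ones appear at positions (2, 0), (3, 1), and (4, 2) in zero-based indexing, that is, on a line shifted away from the main diagonal, which reflects the two-step shift between the series.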

For \(\mathcal {N}\), \(\mathcal {M}\) of equal length T, the \(CRP_{(\mathcal {N},\mathcal {M})}\) is a square \(T'\times T'\) matrix. As opposed to the (univariate) recurrence analysis of one time series with itself (the RP in equation (2)), the diagonal entries of the CRP are either 1 or 0, as the two time series may or may not be synchronized at (i, j), \(i=j, i = 1,\dots ,T'\); see, e.g., Fig. 2. In the CRP of two time series, the LOI is replaced by a distorted diagonal, called the Line Of Synchronization (LOS). The LOS reveals the relationship between the two time series in the time domain. In particular, it provides a nonparametric function containing information about the time-rescaling of the two time series, which further allows their re-synchronization [57].

As the time series we consider in our application are multidimensional (\(d>1\)), we point out that the CRP is indeed a multidimensional cross recurrence plot (MdCRP) [56] where \({\textbf{n}}_i\), \({\textbf{m}}_j\) are row vectors rather than scalars, and \(N_i\), \(M_j\) are dk-dimensional row vectors rather than k-dimensional row vectors, as opposed to the conventional CRP based on one-dimensional time series. Yet the above discussion is general and applies to both cases, and \(CR_{i,j}\) is in any case a scalar equal to either 0 or 1. For multidimensional time series, the entries of each of the two time series require normalization in each dimension before estimating the MdCRP [56], e.g., with the z-score.
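The per-dimension normalization and the resulting \(dk\)-dimensional states can be illustrated with a short sketch. The function name `zscore` and the numerical values (a hypothetical price column and volume column) are assumptions made for this example only.

```python
import numpy as np


def zscore(series):
    """Standardize each dimension (column) to zero mean and unit variance."""
    x = np.asarray(series, dtype=float)
    sd = x.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)      # guard against constant columns
    return (x - x.mean(axis=0)) / sd


# Hypothetical bivariate series with d = 2: columns are (price, volume).
n = np.array([[100.0, 1.0e6],
              [101.0, 3.0e6],
              [102.0, 2.0e6],
              [103.0, 4.0e6]])
z = zscore(n)

# Embedding with k = 2 and tau = 1 places consecutive rows side by side,
# so each state N_i has k * d = 4 entries and there are T' = 3 states.
states = np.hstack([z[:-1], z[1:]])
print(states.shape)  # (3, 4)
```

Without the normalization, the Euclidean distance between raw observations would be dominated by the volume column, whose scale is several orders of magnitude larger than that of the price column.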

Financial time series co-movement analysis using the CRP and LOS is studied in [58, 59]. The work in [60] analyses the inter-dependencies of the stock market index and its associated volatility index, further proposing a method for the LOS estimation based on a corrupted CRP. Furthermore, it is important to notice that financial time series are often represented as multivariate instances. Indeed, even the most basic source of financial data generally provides information about volumes along with prices. Furthermore, a number of closely related variables (e.g., volatility measures) are simple to extract. Despite the use of multidimensional inputs being effective and commonly found across several applications [42], existing CRP applications on market data are broadly limited to the use of one-dimensional series only (e.g., prices or volatilities) [42, 49, 61].

3 Proposed method

We exploit the CRP to quantify the co-movement of two multidimensional time series \(\mathcal {N}\) and \(\mathcal {M}\) observed continuously over a common period of length V. Our goal is to predict whether at a certain epoch (e.g., a certain calendar day), \(\mathcal {N}\) and \(\mathcal {M}\) are synchronized in the state-space embedding \(\varepsilon\)-neighborhood given by (3).

Assume two generic time series \(\mathcal {N}\) and \(\mathcal {M}\) are observed over the non-overlapping time-domains \(D_\mathcal {N}= \{ t_1,\dots ,t_T\}\) and \(D_\mathcal {M}= \{ s_1,\dots ,s_S\}\) of respective length T and S. Their CRP corresponds to a \(T' \times S'\) matrix, with \(T' = T-\tau (k-1)\) and \(S' = S-\tau (k-1)\), where the (i, j) element expresses the state of synchronization at the i-th time instance of the first time series (\(t_i\)) and at the j-th time instance of the second (\(s_j\)), in terms of the \(\varepsilon\)-neighborhood of the states \(N_i\) and \(M_j\), as expressed by (3). With the domains being non-overlapping, there are no epochs \(t_i\) and \(s_j\) such that \(t_i = s_j\) in (the same) calendar time, and the state of synchronization at the same calendar time cannot be determined. Indeed, for a fixed \(t_i\), the row vector \(\textrm{CRP}_{i,\cdot }\) reports for the epochs \(s_j,\, j=1,\dots ,S'\) (past or future with respect to \(t_i\)) whether the state-space embedding \(M_j\) is in the \(\varepsilon\)-neighborhood of \(N_i\).

In this light, if the domains of \(\mathcal {N}\) and \(\mathcal {M}\) only partially overlap over a region \(D:=D_{\mathcal {N}} \cap D_{\mathcal {M}}\) of length V, for our forecasting purpose, their non-overlapping regions \(D_\mathcal {N}{\setminus } D\) and \(D_\mathcal {M}{\setminus } D\) are irrelevant and can be discarded. Their V overlapping time instances \(t_{i+1},\dots ,t_{i+V}\) and \(s_{j+1},\dots , s_{j+V}\) correspond to the same calendar epochs, i.e., \(t_{i+h} = s_{j+h},\, \forall h = 1,\dots ,V\), and are of actual relevance. Over the common domain D, the CRP corresponds to a square \(V' \times V'\) matrix (\(V' = V-\tau (k-1)\)) with a well-defined diagonal expressing the state of synchronization at the same calendar epoch, e.g., answering whether on a given calendar day \(\mathcal {N}\) and \(\mathcal {M}\) are synchronized or not.

This justifies the required form for the input data, corresponding to two (multidimensional) time series \(\mathcal {N}\) and \(\mathcal {M}\) observed over a common period D of equal length V, with \(D =\left\{ v_1 = \text {max}(t_1,s_1),\dots , v_V = \text {min}(t_T,s_S)\right\}\). Since the essence of time series forecasting is that of predicting the future from the past, the data from the past needs to be representative of the h-step ahead forecast. This implicitly requires D to be a continuous set of times for the given sampling frequency. That is, there should be no gaps between days or epochs, namely, \(v_V \equiv v_1 + (V-1)\). Furthermore, in order to calculate (3), we require the time series to be non-corrupted over D in all its multivariate entries, i.e., without missing values. The above requirements are generally met for the financial time series of our interest. The only constraints are that of using data for stocks traded at the same exchange (same trading days and observed festivities) and that of selecting stocks not subject to delisting in the period of interest.
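The construction of the common period D and the contiguity requirement can be sketched as follows. This is an illustrative sketch only: it assumes that epochs are encoded as consecutive integer trading-day indices, and the function name `common_domain` is hypothetical.

```python
def common_domain(t, s):
    """Return D = {max(t_1, s_1), ..., min(t_T, s_S)} for two sorted epoch lists.

    Epochs are integer trading-day indices (an assumption of this sketch).
    Raises ValueError if the domains do not overlap or if either series has a
    gap inside the common period.
    """
    start, end = max(t[0], s[0]), min(t[-1], s[-1])
    if start > end:
        raise ValueError("the two domains do not overlap")
    domain = list(range(start, end + 1))   # v_V = v_1 + (V - 1) by construction
    seen_t, seen_s = set(t), set(s)
    missing = [v for v in domain if v not in seen_t or v not in seen_s]
    if missing:
        raise ValueError(f"gaps in the common period at epochs {missing}")
    return domain


t = list(range(0, 8))      # first series observed at epochs 0..7
s = list(range(3, 12))     # second series observed at epochs 3..11
D = common_domain(t, s)
print(D)  # [3, 4, 5, 6, 7]
```

Here the common period has length \(V = 5\); observations outside it would be discarded before computing the CRP.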

Aligned with the general rationale of time series forecasting, we aim at predicting the one-step-ahead synchronization status between \(\mathcal {N}\) and \(\mathcal {M}\) at epoch \(i+1\), based on lagged historical records available up to time i, that is, based on a suitable set of features observed or extracted over w past epochs. For \(i = w,\dots ,V'-1\), let us denote by \(\mathcal {N}^w_i\) and \(\mathcal {M}^w_i\) the sub-samples of \(\mathcal {N}\) and \(\mathcal {M}\) consisting of the w most recent observations up to and including epoch i, that is

$$\begin{aligned} \mathcal {N}^w_i&= [ {\textbf{n}}_{i-w+1}, {\textbf{n}}_{i-w+2},\dots , {\textbf{n}}_{i}]\text {,}\\ \mathcal {M}^w_i&= [ {\textbf{m}}_{i-w+1}, {\textbf{m}}_{i-w+2},\dots , {\textbf{m}}_{i}]\text {.} \end{aligned}$$

Let us denote by \(\textrm{CRP}_{(\mathcal {N}^w_i,\mathcal {M}^w_i)}\) the \(w'\times w'\) CRP computed from \(\mathcal {N}^w_i\) and \(\mathcal {M}^w_i\) (with embedding dimension k, lag \(\tau\) and \(w' = w-\tau (k-1)\)). At epoch i, \(\textrm{CRP}_{(\mathcal {N}^w_i,\mathcal {M}^w_i)}\) is used as the input of the neural network for predicting the state of synchronization at \(i+1\). Within this framework, there are \(V'-w\) input-target pairs. The first pair corresponds to the input \(\textrm{CRP}_{(\mathcal {N}^w_{w},\mathcal {M}^w_{w})}\) and target \((\textrm{CRP}_{(\mathcal {N},\mathcal {M})})_{w+1,w+1}\), the last to the input \(\textrm{CRP}_{(\mathcal {N}^w_{V'-1},\mathcal {M}^w_{V'-1})}\) and target \((\textrm{CRP}_{(\mathcal {N},\mathcal {M})})_{V',V'}\). The prediction target at epoch i corresponds to the state of synchronization at \(i+1\), provided by the (diagonal) entry \(\left( \textrm{CRP}_{(\mathcal {N},\mathcal {M})}\right) _{i+1,i+1}\) of the CRP computed for the entire time series \(\mathcal {N}\), \(\mathcal {M}\). In particular, the state of synchronization at any epoch \(i=1,\dots ,V'\) is provided by the diagonal of \(\textrm{CRP}_{(\mathcal {N},\mathcal {M})}\), i.e.,

$$\begin{aligned} \textbf{diag}(\textrm{CRP}_{(\mathcal {N}, \mathcal {M})})_i = {\left\{ \begin{array}{ll} 1, &{} \text {if } \mathcal {N}\text { and }\mathcal {M} \text { are synchronized at time } i,\\ 0, &{} \text {otherwise,} \end{array}\right. } \quad i= w,\dots ,V'-1, \end{aligned}$$
(5)

so that \(\left\{ \text {diag}\left( \textrm{CRP}_{(\mathcal {N},\mathcal {M})}\right) _i \right\} _{i=w+1,\dots ,V'}\) corresponds to the targets for the inputs \(\left\{ \textrm{CRP}_{(\mathcal {N}^w_{i},\mathcal {M}^w_{i})} \right\} _{i = w,\dots ,V'-1}\). The above corresponds to a framework where inputs are created dynamically by using CRPs computed over sub-sampled time series obtained by applying sliding windows of a fixed size. The above construction is illustrated in Fig. 1. Note that \(\textrm{CRP}_{(\mathcal {N}^w_i,\mathcal {M}^w_i)}\) is not the same as the sub-matrix P obtained from \(\textrm{CRP}_{(\mathcal {N},\mathcal {M})}\) by considering its rows and columns from \(i-w+1\) to i. In \(\textrm{CRP}_{(\mathcal {N},\mathcal {M})}\), the entire data in \(\mathcal {N}\) and \(\mathcal {M}\) enters the time series normalization and, furthermore, the tuning of the parameter \(\varepsilon\). \(\textrm{CRP}_{(\mathcal {N}^w_i,\mathcal {M}^w_i)}\) thus depends only on the cropped time series data \(\mathcal {N}^w_i\), \(\mathcal {M}^w_i\), while P does not. In a forecasting context, our approach is feasible and unbiased, as the inputs do not use any information beyond that available at i. Note that, in general, nothing prevents one from choosing the embedding size and lag parameter differently for the CRP computation of the targets and for the CRP computations of the inputs.
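The sliding-window construction of inputs and targets described above can be sketched as follows. All function names, the synthetic Gaussian data, and the convention \(H(0) = 1\) are assumptions of this illustration. Indices are zero-based, so the window ending at position `end - 1` is paired with the diagonal entry at position `end`.

```python
import numpy as np


def zscore(x):
    """Standardize each dimension; constant dimensions are only centered."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=0)
    return (x - x.mean(axis=0)) / np.where(sd == 0, 1.0, sd)


def embed(x, k, tau):
    """Delay embedding of Eq. (1): one row per state, k * d columns."""
    x = x[:, None] if x.ndim == 1 else x
    t_prime = x.shape[0] - tau * (k - 1)
    return np.hstack([x[j * tau : j * tau + t_prime] for j in range(k)])


def crp(n, m, eps, k, tau):
    """Cross recurrence plot of Eq. (3) on z-scored series, assuming H(0) = 1."""
    a, b = embed(zscore(n), k, tau), embed(zscore(m), k, tau)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return (dist <= eps).astype(int)


def make_dataset(n, m, w, eps, k=1, tau=1):
    """Pair each sliding-window CRP with the next diagonal entry of the full CRP."""
    n, m = np.asarray(n, dtype=float), np.asarray(m, dtype=float)
    diag = np.diag(crp(n, m, eps, k, tau))         # targets, length V'
    inputs, targets = [], []
    for end in range(w, len(diag)):                # window = observations end-w .. end-1
        inputs.append(crp(n[end - w:end], m[end - w:end], eps, k, tau))
        targets.append(diag[end])                  # state one step after the window
    return np.stack(inputs), np.array(targets)


rng = np.random.default_rng(0)                     # synthetic stand-in for market data
n = rng.normal(size=(40, 2))
m = rng.normal(size=(40, 2))
X, y = make_dataset(n, m, w=10, eps=0.75, k=2, tau=1)
print(X.shape, y.shape)  # (29, 9, 9) (29,)
```

With \(V = 40\), \(k = 2\), and \(\tau = 1\), we have \(V' = 39\) and \(w' = 9\), giving \(V' - w = 29\) input-target pairs. Each window is normalized on its own data, whereas the targets come from the CRP normalized over the whole series.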

A simple example with two one-dimensional time series clarifies how we extract the input features and prediction targets. Consider the two time series \(\mathcal {A}\) and \(\mathcal {B}\) of 10 observations:

$$\begin{aligned} \mathcal {A}&= \{A,B,A,A,C,D,D,B,C,C\}^\top , \\ \mathcal {B}&= \{A,C,C,C,D,B,D,B,C,C\}^\top . \end{aligned}$$

Fig. 2 depicts their CRP, i.e., \(\textrm{CRP}_{(\mathcal {A},\mathcal {B})}\) (for simplicity computed with \(k=\tau =1\), and \(V=V'=10\)). The diagonal line of the CRP is highlighted and includes the values of the recurrence states. The diagonal line shows that the behavior of \(\mathcal {A}\) and \(\mathcal {B}\) is synchronized at timestamp 1 and at timestamps 7 to 10; therefore, at these timestamps the prediction targets are set to 1 (the actual value of the Heaviside function in (3)). By, for instance, setting \(w = 3\), we aim at predicting \(V-w = 7\) states of synchronization. The first prediction concerns the synchronization at epoch \(w+1=4\), based on \(\textrm{CRP}_{(\mathcal {A}^3_3,\mathcal {B}^3_3)}\), that is, the CRP calculated from the first three observations of \(\mathcal {A}\) and \(\mathcal {B}\). The prediction of the synchronization at epoch 5 is based on the CRP calculated on observations 2 to 4, i.e., on \(\textrm{CRP}_{(\mathcal {A}^3_4,\mathcal {B}^3_4)}\). The procedure is repeated up to epoch \(V-1=9\), where \(\textrm{CRP}_{(\mathcal {A}^3_{9},\mathcal {B}^3_{9})}\), calculated from the observations 7 to 9, is used for predicting \(\text {diag}(\textrm{CRP}_{(\mathcal {A},\mathcal {B})})_{10}\).
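The toy example can be reproduced in a few lines of plain Python. As a simplifying assumption of this sketch, equality of symbols stands in for the \(\varepsilon\)-neighborhood test of (3), and no normalization is applied; the function name `symbolic_crp` is hypothetical.

```python
A = list("ABAACDDBCC")
B = list("ACCCDBDBCC")


def symbolic_crp(a, b):
    """Entry (i, j) is 1 when a[i] and b[j] carry the same symbol, else 0."""
    return [[int(x == y) for y in b] for x in a]


full = symbolic_crp(A, B)
diag = [full[i][i] for i in range(len(A))]
print(diag)  # [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]

w = 3
# Zero-based: the window covering positions end-w .. end-1 predicts diag[end].
pairs = [(symbolic_crp(A[end - w:end], B[end - w:end]), diag[end])
         for end in range(w, len(A))]
print(len(pairs))   # 7
print(pairs[0])     # ([[1, 0, 0], [0, 0, 0], [1, 0, 0]], 0)
print(pairs[-1])    # ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], 1)
```

The first pair uses observations 1 to 3 to predict the (non-synchronized) state at epoch 4, and the last pair uses observations 7 to 9 to predict the (synchronized) state at epoch 10.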

Fig. 1 Proposed method. For two input time series \(\mathcal {M}\), \(\mathcal {N}\), we construct their CRP and take its one-period-ahead diagonal entries as targets for model prediction (bottom flow). On the other hand, we construct a sequence of CRPs based on a fixed-size sliding window ending at the current period, whose elements constitute the inputs for the Neural Network (upper flow)

Fig. 2 The CRP of two time series with the diagonal line highlighted

Appendix B includes a visualization that displays the results of recurrence analysis on real-world stock data. This visualization provides empirical evidence of the complexity of the patterns underneath the RP and CRP.

To practically implement the underlying DL model that maps each \(\textrm{CRP}_{(\mathcal {N}^w_{i},\mathcal {M}^w_{i})}\) input to its corresponding \(\text {diag}\left( \textrm{CRP}_{(\mathcal {N},\mathcal {M})}\right) _{i+1}\) output, consider that each input consists of a matrix of zeros and ones that can be considered analogous to an image. Therefore we can rely on well-established classification models. In particular, we employ a Convolutional Neural Network (CNN). Such a neural network is well-suited for capturing the spatial relationships between the features in their input, which in our case correspond to the 0-1 features encoded in the entries of \(\textrm{CRP}_{(\mathcal {N}^w_{i},\mathcal {M}^w_{i})}\). Note that in the CRP calculation \(\mathcal {N}^w_{i}\) and \(\mathcal {M}^w_{i}\) are z-score normalized before computing (3).

Fig. 3 Architecture of the proposed convolutional neural network

We use a CNN architecture formed by two convolutional layers and one fully connected layer, as illustrated in Fig. 3. The neural network involves the typical blocks of the CNN architecture. The convolutional layers adaptively learn the spatial relationships of inputs, the Rectified Linear Unit (ReLU) activation introduces nonlinearity to the model, and the max-pooling layer provides down-sampling operations, reducing the size of the feature map by extracting the maximum value in each patch of the input feature map. The current CNN is chosen based on a grid search over different network architectures, layer types, and sizes, aimed at maximizing the F1-score and showing the feasibility of our CRP-based DL approach. Importantly, the temporal connections in the input data are handled within the CNN. In fact, the construction of the input data as outlined above relates all the entries of \(\textrm{CRP}_{(\mathcal {N}^w_{i},\mathcal {M}^w_{i})}\) to the target \(\text {diag}\left( \textrm{CRP}_{(\mathcal {N},\mathcal {M})}\right) _{i+1}\), where these entries furthermore constitute aggregate values capturing the similarity of the input processes at different time lags within the window of size w. The output of the stacked convolutional layers is fed to a fully connected layer, leading to the network’s output by applying a softmax function.

4 Experiments

4.1 Data

Our analyses rely on daily adjusted closing prices and daily number of traded shares (volumes) for 12 representative constituents of the S&P 100 index in the period from December 31st, 2014 to November 29th, 2021 (\(V= 1,741\) trading days). The data is retrieved from Yahoo Finance. These 12 stocks are selected based on their market capitalization and their market sector. For each sector, we select the first two stocks of highest-but-comparable capitalization, a practice well-supported by financial theory [62]. Market sectors provide a natural grouping for securities: analyses conducted at a sector level are a common practice for granting comparability and robustness of the results, as across market sectors the dynamics of economic variables are well-known to be asymmetric. Table 1 lists our stock selection. Each stock is expressed as a trivariate time series consisting of daily prices, volumes, and returns. CRPs express temporal similarities in joint terms of the price level, traded volume, and daily return, providing a generalized definition of similarity in time series dynamics at a multivariate level.

For our bivariate analysis on two time series, we have \((12^2-12)/2 = 66\) pairs of stocks. For each stock pair, we use the first \(70\%\) of the data for training (\(V_\text{train} = 1,218\) days) and the last \(30\%\) for testing (\(V_\text{test} = 523\) days). As the future input instances should not affect the training process, the order of the input data during the training is fixed. The input and targets of the train data and the test data are, respectively,

$$\begin{aligned} \text {Inputs:}&\;\left\{ \textrm{CRP}_{(\mathcal {N}^w_{i} ,\mathcal {M}^w_{i})} \right\} _{i \in I}\text {,}\\ \text {Targets:}&\;\left\{ \text {diag}\left( \textrm{CRP}_{(\mathcal {N},\mathcal {M})}\right) _i \right\} _{i \in T} \text {,} \end{aligned}$$

where \(I = w,\dots ,V_\text {train}\) and \(T =w+1,\dots , V_\text {train}+1\) for the training set and \(I = V_\text {train}+1,\dots ,V-1\) and \(T =V_\text {train}+2,\dots , V\) for the test set. We train the neural network once over the pooled data of all stock pairs. This pooled approach is a common practice in closely related Machine Learning literature, e.g., [21], and is supported by the empirical findings of [63], suggesting the existence of a universal price formation mechanism (model), and thus price dynamics, not specific to individual assets. In practice, the input and output data is the concatenation of the individual pairs’ inputs-targets. For example, for a set window size w, for the train set the input-target data consists of \((V'_\text {train}-w) \times 66\) examples, that is, \((V'_\text {train}-w) \times 66\) pairs of cross recurrence matrices and (scalar) targets, where \(V'_\text {train} = (V_\text {train}-\tau (k-1))\). In the training phase, the training data is used to estimate the optimal weights of the CNN. The test data is then passed to the estimated CNN and the quality of the network outputs is evaluated against the actual targets. Details are provided in the following two subsections.
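The sample sizes quoted above follow from simple arithmetic, sketched below. The function name `n_train_examples` is hypothetical, and the default \(V'_\text {train} = V_\text {train}\) follows the values reported later in the text rather than the general formula.

```python
from math import comb

n_stocks = 12
n_pairs = comb(n_stocks, 2)        # (12^2 - 12) / 2 unordered pairs
V = 1741                           # trading days in the sample
V_train = V * 70 // 100            # first 70% of the days, rounded down
V_test = V - V_train               # remaining 30%
print(n_pairs, V_train, V_test)    # 66 1218 523


def n_train_examples(w, v_train_prime=V_train):
    """Pooled number of training examples, (V'_train - w) * 66."""
    return (v_train_prime - w) * n_pairs


for w in (10, 30, 50, 60, 80):
    print(w, n_train_examples(w))
```

For instance, a window of \(w = 10\) days gives \((1218 - 10) \times 66 = 79{,}728\) pooled training examples.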

Table 1 List of selected stocks

For the training of the CNN we adopt the ADAM optimizer with the following hyperparameters: learning rate 0.01 (reduced by a factor of 5 every 40 epochs), momentum parameters 0.9 and 0.999, batch size 128, and 300 training epochs. Across the epochs, we keep track of the F1-score on the validation set, which is set to the last 15% portion of the training set. For our classification task, we adopt the binary cross-entropy loss. As the target classes are unbalanced, the loss is weighted by the targets’ class proportions. Details on the filter sizes, kernel sizes and the max pooling size are provided in Fig. 3.
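The learning-rate schedule and the class-weighted loss can be illustrated with a small, framework-free sketch. Two points are assumptions here: the schedule is read as a step decay counted from epoch 0, and the class weights are taken to be inverse class frequencies, since the exact weighting scheme is not spelled out. The sketch also assumes both classes are present in the targets. All function names are hypothetical.

```python
import math


def learning_rate(epoch, base_lr=0.01, factor=5.0, step=40):
    """Step decay: divide the base rate by `factor` every `step` epochs."""
    return base_lr / factor ** (epoch // step)


def class_weights(targets):
    """Inverse-frequency weights (w_neg, w_pos); an assumed weighting scheme."""
    n = len(targets)
    n_pos = sum(targets)
    n_neg = n - n_pos
    return n / (2.0 * n_neg), n / (2.0 * n_pos)


def weighted_bce(targets, probs, w_neg, w_pos, tiny=1e-12):
    """Mean binary cross-entropy with a separate weight for each class."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, tiny), 1.0 - tiny)          # avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return total / len(targets)


y = [1, 0, 0, 0]                                   # unbalanced toy targets
w_neg, w_pos = class_weights(y)
loss = weighted_bce(y, [0.5] * 4, w_neg, w_pos)
print(learning_rate(0), learning_rate(40), learning_rate(80))
print(w_neg, w_pos, loss)
```

With these weights, the single positive example contributes as much to the loss as the three negative examples together, so an uninformative prediction of 0.5 yields the same loss, \(\log 2\), as in the balanced case.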

With respect to the CRP computations, throughout our analyses the embedding dimension k is set to 2 or 3 (estimated via FNN method) based on input type, and the delay parameter \(\tau\) is set to 1. Values 0.45, 0.55, 0.65, and 0.75 are used for the threshold \(\varepsilon\). These hyperparameters are selected according to the guidelines and discussion in [48] and [56]. The same values are applied for both the computation for the CRP related to the targets and the CRPs related to the inputs.

In our experiments, we consider five different choices for the window-size hyperparameter, namely \(w=\{10,30, 50, 60, 80\}\) days. With the above settings, \(V=V'=1,741\) days, \(V_\text {train} = V'_\text {train} =1,218\), and \(V_\text {test} = V'_\text {test} = 523\) days. For \(i = w,\dots ,V-1\), \(\textrm{CRP}_{(\mathcal {N}^w_i,\mathcal {M}^w_i)}\) are square matrices of size \(w'=w\) and \(CRP_{(\mathcal {N},\mathcal {M})}\) is a square matrix of size V on whose diagonal the relevant targets are found, i.e., \(\text {diag}\left( CRP_{(\mathcal {N},\mathcal {M})}\right) _i\), \(i = w+1,\dots ,V\).

Table 2 Performance measures on the test set using (adjusted) price and volume as input variables

4.2 Experimental results

Stock pairs from the same sector or from two different sectors, with different co-movement behaviors, provide comprehensive experimental data to show the ability of the proposed method to predict the state of synchronization. To evaluate the performance of our proposed method, all pairs of stocks are used as the input of the method. We collect all pairs of stocks, and for each pair we follow the proposed steps (see Fig. 1) to create the inputs and targets. We stack the pair-specific input-target data to create a single train and test set for all pairs.

Tables 2 and 3 show the performance of our proposed approach for all pairs of stocks using two types of input: (price, volume) and (price, return, volume), respectively. Given that the target classes are generally imbalanced, the preferred reference performance metric is the F1-score. Yet, we also include accuracy, precision, and recall to have a clearer overview of the classification performance. For robustness, we run our experiments over a range of values for the window-size w and threshold \(\varepsilon\) hyperparameters, a setup that further clarifies the effect of these hyperparameters on prediction performance. Additional results for all the 66 pairs of stocks are provided in Appendix A.
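To make the role of the F1-score under class imbalance concrete, the following sketch computes the four metrics from binary predictions using their standard definitions. The function name and the toy labels are illustrative assumptions and do not correspond to the reported experiments.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy imbalanced targets: one synchronized epoch out of ten.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
always_zero = [0] * 10
trivial = classification_metrics(y_true, always_zero)
print(trivial)
```

A classifier that never predicts synchronization reaches 90% accuracy on these toy labels while its F1-score is zero, which is why accuracy alone would be misleading for imbalanced targets.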

Table 3 Performance measures on the test set using (adjusted) price, volume and returns as input variables

Results for the (price, volume) time series input are provided in Table 2, and results for the (price, return, volume) input in Table 3. In general, our results show that predicting the state of synchronization is not only feasible but, under our setup, quite satisfactory: our preferred performance metric, the F1-score, is as high as 84%. As expected, the results are sensitive to the choice of the window size and threshold parameter. In particular, the performance metrics decrease as the threshold parameter and the window size increase. This means that stricter \(\varepsilon\)-neighborhoods are easier to predict and that the relevant information for the prediction of the synchronization state is found in the most recent instances of the CRP. It also suggests the existence of patterns in the data that are strongly indicative of close \(\varepsilon\)-neighborhoods. That is, the CNN detects clear patterns indicating that the day-ahead synchronization is likely to be very strong (the \(\varepsilon\)-neighborhood is tight); the decrease in performance as \(\varepsilon\) increases is consistent with the model picking up evidence of “strong” day-ahead synchronization. Regarding the window size, long-lagged CRP information appears to introduce noise into the system without providing any predictive gain, in line with the intuition that information further back in time is less and less related to the current state of the system and of little use for prediction.

Suspecting that the joint use of prices and returns might be redundant, since the two are closely related, we also run a second experiment involving volumes and returns only. Interestingly, the inclusion of the returns does not seem to provide any advantage over the (price, volume) input time series, but rather the opposite. This is to be expected if the inclusion of further input variables complicates the patterns in the CRP chessboard, so that, under the same network architecture, the performance metrics decrease. Consistent with this, in additional experiments not reported here, we included squared returns (as a rough measure of daily volatility) and found that they are also detrimental to the performance metrics and to the prediction task. This perhaps suggests that the network architecture needs to scale up with the dimensionality of the input data, which plausibly induces more complex patterns in the CRP.

A complementary way to assess the prediction method is to examine its performance over periods of high and low volatility for the stock pairs. To do this, for each stock, we construct estimates of daily volatility by applying the Exponentially Weighted Moving Average (EWMA) filter [64]:

$$\begin{aligned} \sigma _t = \sqrt{\lambda \cdot \sigma _{t-1}^2 + (1-\lambda ) \cdot r_{t-1}^2} \end{aligned}$$
(6)

where \(\sigma\) is the volatility, r is the return, t is an index denoting the day, and \(\lambda\) a decay factor. When calculating the volatility, the decay factor \(\lambda\) determines the weight given to older returns. We set \(\lambda = 0.94\), which is commonly used for daily returns series [64].
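The recursion can be sketched as follows, assuming the RiskMetrics convention in which \(\lambda\) multiplies the lagged variance and \(1-\lambda\) the most recent squared return. The function name and the seeding of the initial variance with the first squared return are assumptions; how \(\sigma _0\) is initialized is not stated here.

```python
import math

def ewma_volatility(returns, lam=0.94, init_var=None):
    """EWMA volatility path under the RiskMetrics convention (assumed).

    var_t = lam * var_{t-1} + (1 - lam) * r_{t-1}^2, with sigma_t = sqrt(var_t).
    """
    if init_var is None:
        init_var = returns[0] ** 2  # assumed seed for the recursion
    var = init_var
    vols = [math.sqrt(var)]
    # Each new estimate uses only the previous day's return and variance.
    for r_prev in returns[:-1]:
        var = lam * var + (1.0 - lam) * r_prev ** 2
        vols.append(math.sqrt(var))
    return vols

toy_returns = [0.01, -0.02, 0.015]  # hypothetical daily returns
toy_vols = ewma_volatility(toy_returns)
print([round(v, 6) for v in toy_vols])
```

With \(\lambda = 0.94\), each day's squared return enters with weight 0.06, and its influence on later estimates decays geometrically at rate 0.94 per day.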

For every pair, we take the days on which both stocks fall within their highest 30% of volatility and, likewise, the days on which both fall within their lowest 30%. These two intersections identify the dates corresponding to two volatility regimes (high volatility and low volatility). For every pair, on average, 14% (12%) of the dates in the test set correspond to high-volatility (low-volatility) days. In this way, we derive high-volatility and low-volatility subsets from the test set.
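A minimal sketch of this regime selection is given below. It assumes that the 30% thresholds are per-stock empirical quantiles computed over the supplied sample and that the inequalities are inclusive; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def volatility_regimes(vol_a, vol_b, q=0.30):
    """Boolean masks of days on which both stocks are in the same regime.

    high: both volatilities at or above their own (1 - q) quantile.
    low:  both volatilities at or below their own q quantile.
    """
    vol_a = np.asarray(vol_a, dtype=float)
    vol_b = np.asarray(vol_b, dtype=float)
    high = ((vol_a >= np.quantile(vol_a, 1 - q))
            & (vol_b >= np.quantile(vol_b, 1 - q)))
    low = ((vol_a <= np.quantile(vol_a, q))
           & (vol_b <= np.quantile(vol_b, q)))
    return high, low

# Toy volatility paths: identical series share all of their extreme days.
vol = np.arange(10, dtype=float)
high_same, low_same = volatility_regimes(vol, vol)
# Opposite series share none of them, so both intersections are empty.
high_opp, low_opp = volatility_regimes(vol, vol[::-1])
print(high_same.sum(), low_same.sum(), high_opp.sum(), low_opp.sum())
```

Because the masks are intersections across the two stocks, the share of selected days is at most 30% per regime and shrinks as the two volatility paths become less aligned, consistent with the average shares of 14% and 12% reported above.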

Table 4 displays the out-of-sample performance of the best model (\(w=10,\, \varepsilon =0.45\), price-volume data) on these two subsets. The results show that the prediction of the synchronization state is very satisfactory on both the high-volatility and low-volatility test sets, with higher accuracy observed for the low-volatility set. Notably, the high-volatility test set has a higher percentage of non-synchronized time instances (12% in Class 0) than the low-volatility test set (5% in Class 0), which can be interpreted as evidence of more frequent non-synchronized, or disentangled, dynamics on high-volatility days than on low-volatility ones. Interestingly, these percentages indicate that even on high-volatility days stocks are synchronized most of the time (88%), though not as often as on low-volatility days (where the fraction of non-synchronized days is 7 percentage points smaller). The relatively low number of non-synchronized days in the high-volatility regime also suggests that stocks are similarly exposed to the market risk factors causing volatility and that their responses to volatility outbursts are similar. Indeed, on 88% of high-volatility test days, the two series are detected as close in the price-volume embedding or, in simpler terms, share similar dynamics.

Table 4 Performance measures on the high-volatility and low-volatility test sets

5 Conclusion

Predicting the co-movement of two multidimensional time series is a relevant task for the financial industry, as it supports potential trading strategies based on their interrelationships. This paper contributes to the literature by providing (i) a method relying upon the CRP to quantify the time series coupling over time, (ii) a DL model for predicting the time series synchronization state, and (iii) a multidimensional time series representation of the inputs involving prices, volumes, and returns. We conducted extensive analyses on real stock market data from different sectors: our results show that the proposed setup can effectively predict the one-day-ahead synchronization of two time series. An interesting future research direction would be to investigate the applicability of such an approach to the high-frequency domain, where the high-dimensional nature of the raw data may provide valuable information for analyzing and predicting the coupling in settings that are known to be characterized by high levels of noise.