Predicting the state of synchronization of financial time series using cross recurrence plots

Cross-correlation analysis is a powerful tool for understanding the mutual dynamics of time series. This study introduces a new method for predicting the future state of synchronization of the dynamics of two financial time series. To this end, we use the cross recurrence plot analysis as a nonlinear method for quantifying the multidimensional coupling in the time domain of two time series and for determining their state of synchronization. We adopt a deep learning framework for methodologically addressing the prediction of the synchronization state based on features extracted from dynamically sub-sampled cross recurrence plots. We provide extensive experiments on several stocks, major constituents of the S &P100 index, to empirically validate our approach. We find that the task of predicting the state of synchronization of two time series is in general rather difficult, but for certain pairs of stocks attainable with very satisfactory performance (84% F1-score, on average).


I. INTRODUCTION
Time series prediction and classification in finance is significantly challenging due to the complexity, multivariate nature, and non-stationary nature of time series in this domain [Murphy, 1999].Security trading and price dynamics in financial markets are particularly complex due to the interacting nature and inter-connectedness of their underlying driving forces and determinants leading to significant co-movements in stocks' prices.The characterization and modeling of multivariate time series dynamics have long been discussed in the financial literature, where the prevailing approach is that based on classical econometric theory.Among the multivariate linear models, the most widespread ones are vector autoregressive (VAR) models [Lütkepohl, 1999], [Lütkepohl, 1991], vector moving averages and ARMA (autoregressive moving average) models [Reinsel, 1993] and cointegrated VAR models [Juselius, 2006].Widespread is the use of multivariate conditional heteroskedasticity GARCH-type, see e.g.[Bauwens et al., 2006] for a review, multivariate stochastic volatility models [Harvey et al., 1994], and more methods based on the realized volatility [Chiriac and Voev, 2011, e.g.].
Among the non-linear models the threshold autoregressive model [Tong, 1978], smooth transition autoregressive [Dijk et al., 2002] and Markov switching models [Krolzig, 2013] 1 Paper submitted to and under consideration at Pattern and Recognition Letters.
are nowadays standard approaches.Alternatives include nonparametric methods, functional coefficient [Chen and Tsay, 1993a] and nonlinear additive AR models [Chen and Tsay, 1993b], recurrence analysis, and neural networks.The complexity of modern financial markets running over the so-called limit-order book mechanism is, however, characterized by typical non-linear, noisy, and often non-stationary dynamics.In addition, the high-dimensional nature of the limit-order book flow and complexity of the interactions within it constitute severe limits in the applicability of classic econometric methods for its modeling and forecasting.Besides a very limited number of analytical and tractable models for the order flow and price dynamics in limit-order books [Cont et al., 2010], [Huang and Kercheval, 2012], [Hawkes, 2018, e.g.], machine learning methods have received much attention [Heaton et al., 2017], [Dixon et al., 2020], as they are naturally appealing in this context.
Whereas the target of the above literature is generally the analysis and prediction of single time series, this paper focuses on an analysis of stock pair co-movements.Several trading strategies can be put into play to take advantage of comovements and exploit statistical arbitrages, including pairtrading, portfolio management, or relative and convergence trading strategies applied at an intraday level, e.g., see [Guo et al., 2017] for an overview.While DL provides a basis for prediction given a set of descriptive features, the issue of how to detect and quantify co-movements remains to be addressed.This paper suggests the use of recurrence analysis based on Cross Recurrence Plots (CRP) for detecting and extracting features indicative of stocks' shared dynamics or co-movements, along with a deep learning framework for predicting whether certain pairs of stocks will exhibit a shared dynamics in the future (in the sense specified in Section II).Not only in the view of extending the ML and applied econometrics literature in this direction, but the possibility of forecasting epochs of time series synchronization is likewise relevant for practitioners.
For detecting and quantifying co-movements or more generally shared dynamical features in time series, the standard econometric approach is that of cross-correlation analysis, e.g., [Tsay, 2005, Chapter 8].This intuitive linear approach, based on the estimation and perhaps forecasting of crosscorrelation matrices, appears to be an element of a much wider theory and methodological approach that has been explored and developed in the last years within a broader generic nonfinancial setting.Simple cross-correlation analysis has been remarkably extended and generalized towards methods that help explore co-movements between time series within nonlinear, noisy, and non-stationary systems of very complex dynamics, either financial [Ma et al., 2013], [Bonanno et al., 2001], [Ramchand and Susmel, 1998] or not [Webber Jr. and Zbilut, 1994], [Marwan and Kurths, 2002], [Lancia et al., 2014].
Recurrent analysis [Webber and Marwan, 2015] explores the reconstruction of a phase-space using time-delay embedding for quantifying characteristics of nonlinear patterns in a time series over time [Takens, 1981].This is done by calculating the so-called Recurrence Plot [Eckmann et al., 1995], the core concept of which is to identify all points in time that the phase-space trajectory of a single time series visits roughly the same area in the phase-space.Recurrence plot analysis has no assumptions or limitations on dimensionality, distribution, stationarity, and size of data [Webber and Marwan, 2015].These characteristics make it suitable for multidimensional and non-stationary financial time series data analysis.The CRP [Marwan, 1999] is an extension of the recurrence plot, introduced to analyze the co-movement and synchronization of two different time series.The CRP indicates points in time that a time series visits the state of another time series, with possibly different lengths in the same phase-space.These concepts are discussed in further detail in Section II.
In this paper, we propose a method for predicting the state of synchronization over time of two multidimensional 2 financial time series based on their CRP.In particular, we use the 2 Throughout the paper, with uni-or multi-variate we refer to the nature of the analyses (RP as opposed to CRP), and with one-or multi-dimensional we refer to the nature of the time series.That is, the RP (as presented in equation ( 2)) provides a univariate analysis of a single one-dimensional time series, while the CRP (as presented in latter equation ( 3)) a multivariate analysis of two one-dimensional or multi-dimensional time series.
CRP to quantify the co-movements and extract the binary representation of its diagonal elements as the prediction targets for a DL model.For predicting the state of synchronization at the next epoch we employ a Convolutional Neural Network (CNN) that uses as inputs CRPs independently calculated from data-crops obtained by applying fixed-size sliding windows on the time series.Our extensive experiments on 12 stocks of the S&P100 index selected from different sectors show that the proposed method can predict the synchronization of stock pairs with satisfactory performance.
The remainder of the paper is organized as follows.Section II introduces in detail the concepts and theory behind the CRP, with an outlook on its applications in financial and economic problems.Our proposed approach for predicting time instances of time series' synchronization is presented in Section III.Empirical results on real market data are provided in Section IV, whilst Section V provides conclusions.

II. FINANCIAL TIME SERIES RECURRENCE ANALYSIS
Recurrence in the analysis of time series, seen as a nonlinear dynamic system, is the repetition of a pattern over time.The visualization of recurrences in the dynamics of a time series can be expressed via a RP or recurrence matrix [Webber and Marwan, 2015].In other words, the RP represents the recurrence of the phase-space trajectory to a state.The phasespace of a d-dimensional time series N with T observations N = {n 1 , n 2 , . . ., n T } , with n i being the row-vector representing a generic observation at time i, i = 1, . . ., T is calculated using the time-delay embedding method.State N i in the phase-space is obtained by where τ denotes the delay and k is the embedding dimension, T = T − τ (k − 1), τ and k can, respectively, be determined with the Average Mutual Information Function (AMI) method [Fraser and Swinney, 1986] and the False Nearest Neighbors (FNN) method of [Kennel et al., 1992].For a uni-dimensional time series N i is a row vector of size (1 × k), for a ddimensional times series N i is a row vector of size (1 × kd).
The recurrence state matrix of the reconstructed phase-space, known as Recurrence Plot (RP), at times i and j, is defined as where ε is a threshold distance value, H(•) is the Heaviside function, and • is the euclidean distance.Due to the underlying embedding (1), R i,j is defined for i i = 1 up to T = T − τ (k − 1).If two states N i and N j are in an ε-neighbourhood the value of R i,j is equal to 1, otherwise is 0.The value of ε highly affects the output of RP.When ε is too small or too large, the RP cannot identify the true recurrence of states.There are different approaches for finding the best value for ε in the literature [Webber and Marwan, 2015].We follow the guidelines provided in [Schinkel et al., 2008] for selecting ε.The values on the diagonal line of RP are equal to one (i.e., R i,i = 1) because in that case the two states introduced to H(•) are identical.The diagonal line of RP is called the Line Of Identity (LOI).Recurrence quantification measures derived from RPs, such as recurrence rate (RR), percent determinism (DET), and maximum line length in the diagonal direction (Dmax) [Webber and Marwan, 2015], give insights about the dynamic behavior of time series.These measures have been used in financial research to analyze the behavior of financial data, e.g.[Bastos and Caiado, 2011], [Fabretti and Ausloos, 2005], [Yin and Shang, 2016], [Hołyst et al., 2001], [Zbilut, 2005].The RP of a time series can be used as a data transformation method for time series prediction.A method employing the RP of seven financial time series to train a deep neural network for predicting the market movement is proposed by [Hailesilassie, 2019].Several authors have suggested an RP forecasting approach via DL.
A feature extraction method exploiting the RP for parsing a DL algorithm is proposed by [Li et al., 2020].On the other hand, RP can be treated as images enabling the use of different computer-vision techniques for the forecasting task, e.g.[Sood et al., 2021] uses autoencoders or [Han et al., 2021, e.g.] uses a CNN.
The CRP of two multi-dimensional time series [Wallot, 2019] corresponds to an extension of RP which explores the co-movement of two time series, and allows the study of the non-linear dependencies between them.Let us denote by N i , i = 1, . . ., T and M j , j = 1, . . ., S the phase-space states of the time series N and M of length T and S, respectively.The Cross-Recurrence (CR) states of the reconstructed statespace a time i and j are calculated by with i = 1, . . ., T = T − τ (k − 1) and j = 1, . . ., S = S − τ (k − 1).Here N i and M i are defined as in (1).CR i,j defines the concept of synchronization and the way synchronization between two financial time series measured: an ε-neighbourhood of the embeddings N i , M j at epochs i, j.
We denote the full cross-recurrence matrix, known as crossrecurrence plot (CRP) extracted for N and M as CRP (N ,M) , obtained through The CRP corresponds to a matrix of dimension T ×S , which may not be square, as the time series N and M may have different lengths, i.e., T = S.
For N , M of equal length T the CRP (N ,M) is a square T ×T matrix.Opposed to the (univariate) recurrence analysis of one time series (with itself, RP in equation ( 2)), the diagonal entries of the CRP are either 1 or 0, as the two time series may or may not be synchronized at (i, j), i = j, i = 1, . . ., T , see e.g. Figure 3.In the CRP of two time series the LOI is replaced by a distorted diagonal, called the Line Of Synchronization (LOS).The LOS reveals the relationship between the two time series in the time domain.In particular, it provides a nonparametric function containing information about the timerescaling of the two time series, that further allows their resynchronization [Marwan et al., 2002].
As the time series we consider in our application are multidimensional (d > 1), we point out that the CRP is indeed a Multidimensional Cross-Recurrence Plot (MdCRP) [Wallot, 2019] where n i , m j are row-vectors rather than scalars, and N i , M j are dk-dimensional row vectors rather than kdimensional row vectors, as opposed to the conventional CRP based on one-dimensional time series.Yet the above discussion is general and applies to both cases, and CR i,j is in any case a scalar equal to either 0 or 1.For multidimensional time series, the entries of each of the two time series require normalization in each dimension before estimating the MdCRP [Wallot, 2019], e.g. with the z-score.
Financial time series co-movement analysis using the CRP and LOS is studied in [Crowley, 2008], [Guhathakurta et al., 2014], [He and Huang, 2020].[Tzagkarakis and Dionysopoulos, 2016] analyses the inter-dependencies of the stock market index and its associated volatility index, further proposing a method for the LOS estimation based on a corrupted CRP.
Furthermore, it is important to notice that financial time are often represented as multivariate instance.Indeed, the most basic source of financial data generally provide information about volumes along with prices.Despite the use of multidimensional inputs being effective and commonly found across several applications [Webber and Marwan, 2015], existing CRP applications on market data are broadly limited to the use of one-dimensional series only (e.g.prices or volatilities) [Webber and Marwan, 2015], [Orlando and Zimatore, 2018], [Addo et al., 2013], [Bastos and Caiado, 2011].

III. PROPOSED METHOD
We exploit the CRP to quantify the co-movement of two multidimensional time series N and M observed continuously over a common period of length V .Our goal is to predict whether at a certain epoch (e.g. a certain calendar day), N and M are synchronized in the state-space embedding εneighbourhood given by (3).
Assume two generic time series N and M are observed over the non-overlapping time-domains D N = {t 1 , . . ., t T } and D M = {s 1 , . . ., s S } of respective length T and S, their CRP corresponds to a T = (T − τ (k − 1)) × (S − τ (k − 1)) = S matrix where the (i, j) element expresses the state of synchronization at the i-th time instance of the first time series (t i ) and at the j-th time instance of the second (s j ), in terms of the ε-neighbourhood of the states N i and M j , as expressed by (3).With the domains being non-overlapping there are no epochs t i and s j such that t i = s j in (the same) calendar time, and the state of synchronization at a same calendar time cannot be determined.Indeed, for a fixed t i , the column-vector CRP i,• reports for the epochs s j , j = 1, . . ., S (past or future with respect to t i ) whether state-space embedding M j is in the same ε-neighbourhood of N i .
In this light, if the domains of N and M only partially overlap over a region D := D N ∩ D M of length V , for our forecasting purpose their non-overlapping regions D N \D and D M \ D are irrelevant and can be discarded.On the other hand, over D, their V overlapping time instances t i , . . ., t i+V and s j , . . ., s j+V correspond to the same calendar epochs, i.e. t i+h = s j+h , ∀h = 1, . . ., V , and are of actual relevance.Over the common domain D, the CRP corresponds to a square V × V matrix (V = V − τ (k − 1)) with a well-defined diagonal expressing the state of synchronization at t i = s j , e.g.answering whether at the (same) calendar day t i = s j , N and M are synchronized or not.
This justifies the required form for the input data, corresponding to two (multidimensional) time series N and M observed over a common period D of equal length V , with D = {v 1 = max(t 1 , s 1 ), . . ., v V = min(t T , s S )}.Since the essence of time series forecasting is that of predicting the future from the past, the data from the past needs to be representative for the h-step ahead forecast.Trivially, this implicitly requires D to be a continuous set of times for the given sampling frequency.That is, there should be no gaps between days or epochs, namely, v V ≡ v 1 + (V − 1).Furthermore, in order to calculate (3), we require the timeseries to be non-corrupted over D in all its multivariate entries, i.e. without missing values.
The above requirements are generally met for the financial time series of our interest.The only constraints are that of using data for stocks traded at the same exchange (same trading days and observed festivities), and that of selecting stocks not subject to delisting in the period of interest.As a precaution, we suggest inspecting the data for missing values due to data-quality issues related to the data provider, or unlikely security-related events such as trading halts.
Aligned with the general rationale of time series forecasting, we aim at predicting the one-step-ahead synchronization status between N and M at epoch i + 1, based on some lagged historical records available up to time i, that is based on some suitable set of feature observed or extracted over w past epochs.For i = w, . . ., V − 1 let us denote by N w i and M w i the sub-sample of N and M of the w most recent observations up to and including epoch i, that is Let us denote by CRP (N w i ,M w i ) the w × w CRP computed from N w i and M w i (with embedding dimension k, lag τ and is used as the input of the neural network for predicting the state of synchronization at i + 1.Within this framework, there are V −w input-target pairs.The first pair corresponds to the input CRP (N w w ,M w w ) and target (CRP (N ,M) ) w+1,w+1 , the last to the input CRP The prediction target at epoch i corresponds to the state of synchronization at i + 1, provided by the (diagonal) entry CRP (N ,M) i+1,i+1 of the CRP computed for the entire times series N , M. In particular, the state of synchronization at any epoch i = 1, . . ., V is provided by the diagonal of CRP (N ,M) , i.e.,

diag(CRP
if N and M are synchronized at time i, 0 otherwise, (5) so that diag CRP (N ,M) i i=w+1,...,V corresponds to the targets for the inputs CRP (N w i ,M w i ) i=w,...,V −1 .The above corresponds to a framework where inputs are created dynamically by using CRPs computed over sub-sampled time series obtained by applying sliding windows of a fixed size.Note that CRP (N w i ,M w i ) is not analogous to the sub-matrix P obtained from CRP (N ,M) by considering its rows and columns from i − w + 1 to i.In CRP (N ,M) the entire data in N and M accounts for the time series normalization and furthermore tunes the parameter ε.CRP (N w i ,M w i ) is thus truly dependent on the cropped times series data N w i , M w i , while P is not.In a forecasting context our approach is feasible and unbiased as it does not use any future information following the one available at i.Note that, in general, nothing prevents from choosing the embedding size and lag parameter differently for the CRP computation of the targets and for the CRP computations of the inputs.
A simple example with two one-dimensional time series clarifies how we extract the input features and prediction tar- Figure 3 depicts their CRP, i.e., CRP (A,B) (for simplicity computed with k = τ = 1, and V = V = 10).The diagonal line of the CRP is highlighted and includes the values of the recurrence states.The diagonal line shows that the behavior of A and B at timestamps between 1 and 7 to 10 is synchronized, therefore at these timestamps the prediction targets are set to 1 (the actual value of the Heaviside function in (3)).By, for instance, setting w = 3, we aim at predicting V − w = 7 states of synchronization.The first prediction concerns the synchronization at epoch w +1 = 4, based on the 3 ) , that is, the CRP calculated from the first three observations of A and B. The prediction of the synchronization at epoch 5, is based on the CRP calculated on observations 2 to 4, i.e. on CRP (N 3 4 ,M 3 4 ) .The procedure is repeated up to epoch V − 1 = 9, where CRP (A 3 9 ,B 3 9 ) , calculated from the observations 7 to 9, is used for predicting diag(CRP (A,B) ) 10 .
To practically implement the underlying DL model that maps each CRP (N w i ,M w i ) input to its corresponding diag CRP (N ,M) i+1 output, consider that each input consists of a matrix of zeros and ones that can be considered analogous to an image.Therefore we can rely on well-established classification models.In particular, we employ a Convolutional Neural Network (CNN).Such a neural network is well-suited for capturing the spatial relationships between the features in their input, which in our case correspond to the 0-1 features encoded in the entries of CRP (N w i ,M w i ) .Note that in the CRP calculation N w i and M w i are z-score normalized before computing (3).
We use a CNN architecture formed by two convolutional and one fully-connected layer, as illustrated in Figure 2. The neural network involves the typical blocks of the CNN architecture.The convolutional layers adaptively learn the spatial relationships of inputs, the Rectified Linear Unit (ReLU) activation introduces nonlinearity to the model, and the maxpooling layer provides down-sampling operations reducing the size of the feature map by extracting the maximum value in each patch from the input feature map.The current CNN is chosen based on a grid search over different network architectures, layers' types and sizes, aimed at maximizing the F1-score and showing the feasibility of our CRP-based DL approach.

A. Data
Our analyses rely on daily adjusted closing prices and daily number of traded shares (volumes) for 12 representative constituents of the S&P100 index in the period from December 31 st , 2014 to November 29 th , 2021 (V = 1, 741 trading days).The data is retrieved from Yahoo Finance.These 12 stocks are selected based on their market capitalization and their market sector.For each sector we select the first two stocks of highestbut-comparable capitalization, a practice well-supported by financial theory [Fama and French, 1993].Market sectors provide a natural grouping for securities: analyses conducted at a sector level are a common practice for granting comparability and robustness of the results, as across market sectors the dynamics of economic variables are well-known to be asymmetric.Table I lists our stock selection.Each stock is expressed as a trivariate time series consisting of daily prices, volumes, and returns.This way, CRPs express temporal similarities in joint terms of the price level, traded volume, and daily return, providing a generalized definition of similarity in time-series dynamics at a multivariate level.
For our bivariate analysis on two time series we have (12 2 − 12)/2 = 66 pairs of stocks.For each stock pair, we use the first 70% of the data for training (V train = 1, 218 days) and the last 30% for testing (V test = 523 days).As the future input instances should not affect the training process, the order of the input data during the training is fixed.The input and targets of the train data and the test data are, respectively, where I = w, . . ., V train and T = w 1, . . ., V train + 1 for the training set and I = V train + 1, . . ., V − 1 and T = V train + 2, . . ., V for the test set.We train the neural network once over the data for all the picks of the stock pairs.This pooled approach is a common practice in closely-related Machine Learning literature [Ntakaris et al., 2018], [Tran et al., 2019, e.g.] and supported e.g. by the empirical findings of [Sirignano and Cont, 2019], suggesting the existence of an universal price formation mechanism (model), and thus price dynamic, not specific for individual assets.In practice, the input and output data is the concatenation of the individual pairs' inputs-targets.For example, for a set window size w, for the train set the input-target data consists of (V train − w) × 66 examples, that is (V train − w) × 66 pairs of cross-recurrent matrices and (scalar) targets, where V train = (V train − τ (k − 1)).
In the training phase, the training data is used to estimate the optimal weights of the CNN.The test data is then parsed to the estimated CNN and the quality of the network outputs is evaluated against the actual targets.Details are provided in the following two subsections.
For the training of the CNN we adopt the ADAM optimizer with the following hyper-parameters: learning rate 0.01 (reduced by a factor of 5 every 40 epochs), momentum parameters 0.9 and 0.999, batch size 128 and epoch size 300.Across the epochs we keep track of the F1-score on the validation set, which is set to the last 15% portion of the training set.For our classification task we adopt the binary cross-entropy loss.As the target classes are unbalanced, the loss is weighted for the targets' class proportion.Details on the filter sizes, kernel sizes and the max pooling size are provided in Figure 2.With respect to the CRP computations, throughout our analyses the embedding dimension k is set to 2 or 3 (estimated via FNN method) based on input type, and the delay parameter τ is set to 1. Values 0.45, 0.55, 0.65, and 0.75 are used for the threshold ε.These hyper-parameters are selected according to the guidelines and discussion in [Schinkel et al., 2008] and [Wallot, 2019].The same values are applied for both the computation for the CRP related to the targets and the CRPs related to the inputs.
In our experiments we consider two different choices for the window-size hyperparameter, namely w = {10, 30, 50, 60, 80} days.With the above settings, V = V = 1, 741 days, V train = V train = 1, 218, and V test = V test = 523 days.For i = w, . . ., V − 1, CRP (N w i ,M w i ) are square matrices of size w = w and CP R (N ,M) a square matrix of size V on whose diagonal are found the relevant targets, i.e. diag CP R (N ,M) i , i = w + 1, . . ., V .

B. Experiments Results
Stock pairs from the same sector or two different sectors with different co-movement behaviors can provide comprehensive experimental data to show the ability of the proposed method in predicting the state of synchronization.To evaluate the performance of our proposed method, all pairs of stocks are used as the input of the method.We collect all pairs of stocks and for each pair, we follow the steps of the proposed method (Fig 1) to create the inputs and targets.We stack the input-target pair-specific data to create a single train and test set for all pairs.
Tables II and III, show the performance of our proposed approach for all pairs of stocks using two types of input: (price, volume) and (price, return, volume) respectively.Given that the target classes are generally imbalanced, the preferred reference performance metric is the F1 score.Yet, we also include accuracy, precision, and recall to have a clearer overview of the classification performance.For robustness, we run our experiments over a range of values for the window-size W and threshold ε hyperparameters, a setup that further clarifies the effect of these hyperparameters on prediction performance.
Results for the (price, volume) time-series input are provided in Table II, results for the (price, return, volume) input in Table III.In general, our results show that the task of predicting the state of synchronization is not only feasible, but, under our setup, quite satisfactory.Indeed our preferred performance F1 metric is as high as 84%.Yet, as expected, the results appear to be sensible to the choice of the window size and threshold parameter.In particular, the performance metrics decrease in their values as the threshold parameter and the window size increase.This means that stricter εneighbourhoods are easier to predict and that the relevant information for the prediction of the synchronization state is found in the most recent instances of the CRP.This suggests the existence of patterns in the data that are strongly indicative of close ε-neighbourhoods, for which the prediction is very satisfactory.I.e. the CNN detects clear patterns indicative of the fact that the day-ahead synchronization is likely to be very strong (the ε-neighbourhoods is tight), indeed, as ε increases, the performance metric decrease, indicating that the model indeed detects strong evidence of "strong" dayahead synchronization.Regarding, the window size, Longlagged CRP information appears to introduce noise in the system without providing any predictive gains, aligned with
gets.Consider the two time series A and B of 10 observations: A = {A, B, A, A, C, D, D, B, C, C} , B = {A, C, C, C, D, B, D, B, C, C} .

TABLE I LIST
OF SELECTED STOCKS.