1 Introduction

Measures of statistical dependence represent one of the cornerstones in the analysis of empirical data. The study of time series measured from complex real-world systems poses a broad variety of challenges in quantifying uni- and multivariate data sets, e.g., lagged dependencies, non-stationarity, noise contamination, uncertainties and the limited length of time series. The set of tools to detect and quantify statistical dependencies ranges from standard correlation analysis techniques [1] over graph theoretical approaches, such as complex networks [2, 3], towards causal discovery algorithms [4,5,6], and detecting critical transitions [7,8,9]. A notoriously challenging problem is to adequately quantify non-monotonous and nonlinear relationships in stochastic and deterministic systems. In this context, information theoretic measures have established as an effective framework [10,11,12]. However, popular methods, such as the mutual information (MI), are not designed to treat higher-dimensional data appropriately (even though extensions have been suggested [13, 14]). The nature and strength of links in complex systems additionally often exhibits scale-dependence, i.e., multi-scale behavior [15]. This motivates the need of methods that are capable of unravelling dependencies at a broad range of scales, e.g., wavelet-based methods [16,17,18]. Few methods succeed to combine both capabilities of capturing nonlinear dependencies at multiple time scales [19, 20].

A powerful method that captures both nonlinear and multi-scale properties of a high-dimensional dynamical system is the recurrence plot (RP) [21]. An RP is a mathematically simple yet effective tool that encodes the tendency of a time series to recur to formerly visited states [22]. An RP is based on a binary recurrence matrix in which recurrences are marked by value one, giving rise to intriguing and well-interpretable structures in the RP. Various quantification measures can be applied to a recurrence matrix and prove powerful in classifying differing systems [23,24,25,26], identifying dynamical regime transitions [27, 28], and detecting non-linear correlations as well as synchronization [6, 29, 30]. Recurrence quantification analysis (RQA) based on diagonal lines in the RP not only allows identification of periodic behavior [18, 31], but also helps to identify unstable periodic orbits in high-dimensional chaotic systems [32]. The conceptual simplicity of RPs allows for a broad range of real-world applications, also for challenging data that are event-like or unevenly sampled in time [33, 34]. Recurrence measures of dependence have not only facilitated the study of synchronization in dynamical systems [30, 35,36,37] but also have been extended to account for lagged and conditional dependencies [6, 29, 38, 39]. Further concepts, including symbolic analyses of relationships, have been conceived more recently [40]. Recurrence-based quantification of statistical dependencies, thus, bares high potential to meet the combination of the above-mentioned challenges. Here, we propose a novel recurrence-based measure of dependence that uses delay coordinates from a given observational time series. Since the measure is based on RPs, nonlinear dependencies with multiple time lags can be quantified which makes the measure applicable to the problem of non-uniform delay selection [19, 20]. The proposed dependence measure, thus, contributes to the challenge of characterizing complex real-world interactions using RPs.

This work is structured as follows: in Sect. 2, we introduce the recurrence flow as a measure of dependence along with a brief review of the RP method. We showcase its scope in different numerical experiments in Sect. 3, covering the characterization of lagged nonlinear dependence and delay selection for uniform and non-uniform TDE. We conclude our findings in Sect. 4.

2 Recurrence flow

We are interested in characterizing nonlinear dependencies in a deterministic, high-dimensional system that is represented by M observational time series \(\left\{ s_n(t)\, |\, n=1,\ldots ,M\right\}\). In general, the relationships between the different time series \(s_n(t)\) and their coordinates do not need to be instantaneous but are often associated with time delays \(\tau _1,\tau _2,\ldots ,\tau _m\). Consequently, we define the recurrence flow measure of redundance to capture such lagged dependencies.

The key idea of the proposed measure is based on the existence of diagonal lines in RPs. An RP is a two-dimensional matrix that encodes how a system recurs to formerly visited states \({\vec{v}}_i, \, i = 1,\ldots ,N\). In general, this representation can be computed for systems of any dimension d and is formally given by

$$\begin{aligned} R_{i,j}(\tau )&= {\varTheta} \left( \varepsilon - \Vert {\vec{v}}_i(\tau ) - {\vec{v}}_j(\tau )\Vert \right) \nonumber \\&=\left\{ \begin{array}{ll} 1 &{} \quad \text {if} \ \Vert {\vec{v}}_i(\tau ) - {\vec{v}}_j(\tau )\Vert \le \varepsilon \\ 0 &{} \quad \text {if}\ \Vert {\vec{v}}_i(\tau ) - {\vec{v}}_j(\tau )\Vert > \varepsilon , \\ \end{array} \right. \end{aligned}$$
(1)

with two arbitrary times i and j, the vicinity threshold \(\varepsilon\) and a suitable norm \(\Vert \cdot \Vert\). The states denoted by \({\vec{v}}\) are either given by the available components (state variables) of the system or, in case of only limited access to the state variables, by delayed copies of the one (or multiple) observational time series \(s_n(t)\) of the studied system. In particular, \({\vec{v}}\) is then obtained by stacking these copies on top of each other as it is common practise in time delay embedding (TDE).

Diagonal lines of length \(L_d\) in an RP resemble periods of enhanced predictability as two trajectory segments at times i and j evolve in parallel in an \(\varepsilon\)-tube for \(L_d\) time instances. For a given system, this may reveal time periods of continuously high determinism or uncover abrupt regime shifts [41]. Properties of the diagonal line length distribution of an RP are linked to dynamical invariants of paradigmatic dynamical systems [42]. However, spurious artifacts are known to disrupt, lengthen or thicken diagonal lines due to erroneous computation of an RP [43]. An inadequate choice of the vicinity threshold \(\varepsilon\) will disrupt diagonal lines, thus underestimating the system’s predictability. On the other hand, too high values will artificially merge fundamentally distinct regions of phase space. Sampling can also alter diagonal line structures: if the system is undersampled, diagonal lines might not emerge continuously as deterministic time intervals are not resolved sufficiently. On the other hand, oversampling results in artificially thickened diagonal lines (tangential motion) [44]. Erroneous time delay embedding of uni-/multivariate time series can have several undesired effects on the formation of diagonal lines; if the embedding dimension is chosen too high, diagonal lines are artificially lengthened and can even emerge in absence of determinism for an uncorrelated stochastic process due to correlations in the underlying distance matrix [45]. A non-optimal choice for the embedding delay will result in diagonal lines that are perpendicular to the line of identity (\(i=j\), LOI). This indicates inclusion of erroneous time scales [43]. The formation of perpendicular lines is caused by the ambiguity in the reconstruction that is introduced by not eliminating the full serial dependence; this results in close evolution of states both forward and backward in time: \(\Vert {\vec{v}}_i - {\vec{v}}_j\Vert< \varepsilon , \, \Vert {\vec{v}}_{i+1} - {\vec{v}}_{j-1}\Vert < \varepsilon\), i.e., the trajectory segments closely evolve in parallel, but with opposite time directions. On top of that, additional deformations to a diagonal line can occur, e.g., in the form of bowed diagonal lines indicating that the evolution of states at different time intervals is similar but occurs with different velocity or temporal resolution [46].

We utilize the formation of such diagonal line artifacts (DLA) to identify time scales of the system that result in well-expressed diagonal lines. The proposed method is, thus, based on the assumption that the studied system exhibits (at least to some degree) deterministic dynamics which will result in meaningful diagonal lines in an RP. An effective way of retrieving information on the formation of DLA is given by scanning an RP diagonal-wise. We can identify an index for each diagonal at which the first recurrence pixel is located. It appears intuitive to regard these pixels as ‘obstacles’ to an imaginary fluid that flows along each diagonal into the RP and is not allowed to turn (Fig. 1C/E). The formation of DLA blocks the flow. As a basic example, we consider a noisy sinusoidal time series (Fig. 1A) with \(n=5000\) and a period of \(T=100\). Formation of perpendicular diagonal lines for \(\tau =\mu T, \, \mu \in {\mathbb {N}}\) (Fig. 1B) reduces the flow through the RP compared to \(\tau =\left. T / 4 \right.\) (Fig. 1D). We use the symmetry of the RP by only flooding the upper triangular matrix to save computation time. The recurrence flow \({\varPhi} (\tau )\) can be computed for varying delays \(\tau\) and encodes similar information as an inverse autocorrelation function, yielding a continuous representation of the redundancy between the time series and its delayed version (Fig. 1F). For continuous variations of \(\tau\) from \(\tau =0\) to \(\tau =\left. T / 4 \right.\), the perpendicular diagonal lines are progressively eliminated. This reproduces the well-known result that a sinusoidal signal needs to be shifted by (odd multiples of) a quarter of its period against itself to minimize redundancy.

To quantify the flow through the RP, we define the recurrence flow matrix \(\phi\) (Fig. 1C/E)

$$\begin{aligned} \phi _{i,j}(\tau ) & =\Bigl (1-{\varTheta} \bigl (\varepsilon - \Vert {\vec{v}}_i(\tau ) - {\vec{v}}_j(\tau )\Vert \bigr )\Bigr ) \, {\varTheta} \left( \ell _j - i \right) ,\\ & \quad i,\,j = 1,\ldots ,N, \end{aligned}$$
(2)

with the time delay \(\tau\) and the length \(\ell _j\) of the \(j\text {th}\) flooded diagonal, i.e., the number of subsequent zeros up to the first one. A flooded diagonal \(\ell _j\) has to be distinguished from a diagonal line \(L_d\): it denotes a diagonal of the RP (parallel to the LOI) starting at time instance j that is flooded with a fictive fluid, regardless of whether any diagonal lines exist on this diagonal. Accordingly, the factor \({\varTheta} \left( \ell _j - i \right)\) in Eq. (2) ensures that the flooding of the \(j\text {th}\) diagonal stops at the first recurrence on this diagonal. \({\phi }_{i,j}\) depends on the vicinity threshold \(\varepsilon\), i.e., the fraction of recurrences. We fix \(\varepsilon\) at some reasonable value that corresponds to a fixed recurrence rate (RR).

We study the dependence on the time delay \(\tau\) contained in the vector \({\vec{v}}(\tau )\) similar as it is done in TDE where the delays \(\tau _1,\tau _2,\ldots ,\tau _m\) for the different coordinates are free parameters and need to be chosen with respect to some notion of optimality [47]. The recurrence flow \({\varPhi} (\tau )\) is computed by summing over the recurrence flow matrix \({{\phi }}_{i,j}\) at given \(\tau\) and dividing by the number of non-recurrences (i.e., zeros in the RP):

$$\begin{aligned} {\varPhi} (\tau ) =\frac{\sum _{i,j=1}^N \phi _{i,j}(\tau )}{\sum _{i,j=1}^N \bigl (1-R_{i,j}(\tau )\bigr )}. \end{aligned}$$
(3)
Fig. 1
figure 1

Recurrence flow for a noisy sinusoidal. A Time series of the noisy sinusoidal and B/C RPs for two different embedding vectors \({\vec{v}}(t)\): \({\vec{v}}(t) = [y(t), y(t+T)]\) and \({\vec{v}}(t) = [y(t), y(t+\frac{T}{4})]\) (left) and flooded RPs/recurrence flow matrices (right). D Recurrence flow \({\varPhi} (\tau )\) through the RP for varying delay \(\tau\). The red dot marks the period T. \({\varPhi} (\tau )\) is maximized for delays that are odd multiples of T/4

In a multivariate application, \({\vec{v}}\) can encompass time series from different systems to study their cross-dependencies. In such a scenario, it is more instructive to define the recurrence flow as a direct measure of correlation/redundance. We, thus, define the recurrence flow measure of redundance (RFMR) \(\theta (\tau )\) as

$$\begin{aligned} \theta (\tau ) =1 - {\varPhi} (\tau ). \end{aligned}$$
(4)

The significance of recurrence flow values can be tested against a random null model based on uncorrelated white noise (Appendix B). Finally, it needs to be noted that the idea of using RPs to identify optimal embedding parameters has been considered before, but to our best knowledge has not been performed systematically [48, 49].

3 Application to model examples

We now demonstrate the scope of the proposed method by highlighting two different potential applications: the quantification of nonlinear correlations (Sect. 3.1) and the identification of uniform embedding delays for TDE of nonlinear signals (Sect. 3.2).

3.1 Nonlinear dependence

We exemplify the efficacy of \(\theta (\tau )\) as a nonlinear dependence measure for deterministic systems with a simple bivariate system:

$$\begin{aligned} x(t) \,&= \, \sin (2\pi t/\omega ) + \eta (t,\sigma _1)\nonumber , \\ y(t) \,&= \, a x(t-\tilde{\tau })^2 + \eta (t,\sigma _2), \end{aligned}$$
(5)

with frequency \(\omega = \left. 2\pi / T \right.\), period T, time lag \(\tilde{\tau }=20\), and normal-distributed white noise processes \(\eta (t,\sigma )\) with standard deviations \(\sigma _1\) and \(\sigma _2\). This system exhibits a sinusoidal cycle with frequency \(\omega\) in its x-component. The y-component is nonlinearly coupled to x(t) and exhibits a cycle with half of the period of x(t). y(t) follows x(t) with a fixed time lag \(\tilde{\tau }\). Both components are superimposed by measurement noise \(\eta (t)\). We consider time series with \(n=5,000\) samples (Fig. 2A). Due to the specific coupling, the relationship between x and y is nonlinear (Fig. 2B).

We test whether we can detect the coupling and the corresponding time lag \(\tilde{\tau }\) by computing \(\theta (\tau )\) for delays in the range \(\tau \in [-200,200]\) (Fig. 2C). In fact, we find that \(\theta (\tau )\) reaches local maxima at integer multiples of \(\tilde{\tau }\), including \(\tau =\tilde{\tau }\) (red dashed line). Similar results are obtained if the mutual information is used (Fig. 2C, black curve), confirming that nonlinear relationships between deterministic time series can be captured by recurrence flow in presence of measurement noise. In this case, the MI yields sharper peaks than the recurrence flow. However, the suggested recurrence flow measure provides advantages for high-dimensional data as discussed below.

Fig. 2
figure 2

Detection of lagged, nonlinear dependence between nonlinearly coupled sinusoidals. A The y(t) time series (black) results from squaring x(t) (blue) and shifting it by a fixed time delay. B Nonlinear relationship between x(t) and y(t). C Recurrence flow measure of redundance \(\theta (\tau )\) (blue) detecting the time delay between x(t) and y(t) (red dashed line), confirmed by the cross-mutual information (black)

3.2 Uniform time delay embedding

The ability of the recurrence flow to detect delayed dependencies between multiple variables motivates its use in the delay selection problem faced in TDE. Proficient delay selection must be based on a measure that captures the redundancy in a (potentially large) set of correlated time series. A popular solution is to use mutual information (MI). However, characterizing the required joint probability density function \(p(s_1, s_2, \ldots , s_m)\) becomes cumbersome for a large number of variables m and is rendered infeasible for many high-dimensional real-world systems. Compared to nonlinear correlation measures like standard MI that are based on binning, recurrence flow offers the advantage that computation times increase less rapidly: given \(k_m\) bins, an increase in dimensionality \(m\rightarrow m+1\) results in \((k_m-1)k_m\) additional bins while in the computation of \({\varPhi}\), a k-d-tree nearest neighbor search-based RP computation increases only linearly with the dimensionality of the system. It has yet to be noted that more sophisticated nearest neighbor-based approaches for MI computation do not suffer from this drawback [14].

Due to the popularity of this problem, other nonlinear correlation measures that do not suffer from the curse of dimensionality have been conceived [20, 50]. In order to validate the effectiveness of the measure proposed here for selecting embedding delays, we do not only compare it to the linear autocorrelation function (ACF) and the auto-mutual information (MI), but also to the delay selection method proposed in [20]: the continuity statistic is based on a hypothesis test of whether a component, added to an existing m-dimensional reconstruction vector, is functionally independent of all existing m components. This is tested by first defining a set of fiducial points and their k-nearest neighbors from the m-dimensional reconstruction vector. These are mapped onto the one-dimensional number line. The continuity statistic \(\langle \varepsilon ^*\rangle (\tau )\) denotes the smallest scale at which the number of observed neighbors mapped onto this line is larger than the number expected from a binomial distribution, implying a functional relationship. For further details, the reader is referred to [20, 51]. For the estimation of the optimal embedding dimension, we employ Cao’s method with a threshold of \({\varDelta} _{\mathrm {afn}}=0.2\) for the change of the number of averaged false neighbors from \(m\rightarrow m+1\) [52]. For the selection of embedding delays using the proposed recurrence flow measure, we identify all peaks of \({\varPhi} (\tau )\) at different delays \(\tau\) and choose the highest of these to obtain the corresponding embedding delay \(\tau ^{({\varPhi} )}\). In case there is several peaks of identical height, we choose the one for the smallest \(\tau\) (i.e., the first one).

To illustrate the procedure, we consider a time series of the past 1 million years of insolation on Earth at \(52.39^{\circ }\) latitude (see Appendix A) [53]. The insolation depends on the Earth orbit and the Earth axis tilt and precession, thus, varies in specific cycles (Milankovich cycles). Cao’s method suggests an embedding dimension \(m=4\). We present results for the second and third component of the embedding vector. Using first-crossing of ACF and first minimum of MI, both measures suggest a delay of \(\tau _1^{({{\mathrm {ACF}}})}=\tau _1^{({{\mathrm {MI}}})}={6}{\text {ka}}\) (Fig. 3A). Since none of both measures is capable of selecting different embedding delays for higher components of the embedding vector, this yields the three-dimensional embedding vector \({\vec{v}}(t)=\left[ y(t), y(t-{6}{\text {ka}}), y(t-{12}{\text {ka}})\right]\). Next, we compute \({\varPhi} (\tau )\) to check if the estimate of \({\vec{v}}(t)\) based on the traditional TDE metrics is confirmed: we find \(\tau _1^{({\varPhi} )}={5}{\text {ka}}\) in close agreement with the ACF and MI-criteria (Fig. 3C). Moreover, the continuity statistic \(\langle \varepsilon ^*\rangle (\tau )\) suggests the same embedding delay for the first component as the auto-correlation and MI (first local maximum). It does not indicate a global maximum at this delay, yielding a more ambiguous choice of the optimal embedding delay than \({\varPhi} (\tau )\). Finally, we examine if both multi-dimensional measures suggest \(\tau _2=2\tau _1\) for the second embedding delay as expected for a traditional uniform time delay embedding (UTDE). Interestingly, \(\langle \varepsilon ^*\rangle (\tau )\) offers only limited information on an optimal embedding delay for the third component of \({\vec{v}}(t)\) (Fig. 3D). Multiple local maxima offer a variety of choices with no clear optimal value. We choose the global maximum (marked by star). Conversely, \({\varPhi} (\tau )\) once more provides a clear choice for the second embedding delay with globally maximized flow for \(\tau _2^{({\varPhi} )}=2\tau _1^{({\varPhi} )}\).

We visually evaluate the quality of the resulting embeddings by comparing the line structures in the corresponding RPs (Fig. 3E). The enlarged details of the RPs illustrate how well deterministic intervals in the evolution of insolation are resolved based on the phase space reconstructions yielded by the ACF, continuity statistic, and recurrence flow. While the uniform embedding vectors obtained from the ACF and \({\varPhi} (\tau )\) result in well-separated, undisturbed diagonal lines, multiple diagonal lines and the related cycles are poorly expressed in the phase space suggested by \(\langle \varepsilon ^*\rangle (\tau )\). The reconstructed phase space based on the embedding vector obtained from the recurrence flow criterion reveals several unstable periodic orbits (Fig. 3F), constituting a concentric spiral-like phase space trajectory in three dimensions.

Many real-world systems allow taking only a relatively short series of measurements for a single variable with high levels of superimposed measurement noise. We study how well a known phase space of a paradigmatic system can be reconstructed based on the four different measures considered above with increasing noise strength. In particular, we generate \(n=2000\) samples of a Rössler system (see Appendix A) such that the resulting trajectory only covers relatively few unstable periodic orbits. We reconstruct the known three-dimensional phase space from the y(t)-component (Fig. 4A) with superimposed uncorrelated white noise realizations (Fig. 4B).

Fig. 3
figure 3

Uniform delay selection for numerical insolation model. A Insolation time series y(t) in W m\(^{-2}\). B Serial dependence of y(t), measured in terms of ACF (red/blue) and auto-MI (black) for the univariate time series, C/D as well as by the continuity statistic \(\langle \varepsilon ^* \rangle (\tau )\) (gray) and the recurrence flow \({\varPhi} (\tau )\) (dark blue) for C two- and D three-dimensional embedding vectors \({\vec{v}}(t)\). The optimal delay is marked by a circle(/cross/star/vertical green line) for the ACF(/MI/continuity statistic/recurrence flow), respectively. E Zoomed RPs for the ACF, \(\langle \varepsilon ^* \rangle (\tau )\) and \({\varPhi} (\tau )\) (from left to right). F Three-dimensional phase space reconstruction based on recurrence flow shows a spiral-like trajectory

The noise strength (standard deviation of the noise) is varied in multiples of the standard deviation \(\sigma _{\mathrm {Roe}}\) of the undisturbed y(t). Even with only \(10\%\) measurement noise, the original attractor is already significantly less smooth (Fig. 4B). We compare the ACF, MI, continuity statistic, and recurrence flow as delay selection measures while only uniform time delays are considered, i.e., the optimal embedding delay is selected only once for the step from a one- to a two-dimensional embedding. This ensures that the two high-dimensional measures (\({\varPhi} (\tau )\) and \(\langle \varepsilon ^* \rangle (\tau )\)) can be compared adequately to the traditional measures.

To quantitatively evaluate the dependence of the reconstruction on noise strength, we generate an RP for each reconstruction and for each of the four delay selection methods. For each RP, we compute the joint recurrence rate fraction (JRRF):

$$\begin{aligned} \hbox {JRRF} \,&= \, \frac{\sum _{i,j}^N JR_{i,j}}{\sum _{i,j=1}^N R_{i,j}^{\mathrm {ref}}} , \qquad \hbox {JRRF}\in [0,1] \nonumber \\&\text {with}\\ {\mathbf {JR}} \,&= \, {\mathbf {R}}^{\mathrm {ref}} \circ {\mathbf {R}}^{\mathrm {rec}}, \end{aligned}$$

from the RP of the (real) reference system \({\mathbf {R}}^{\mathrm {ref}}\) and the RP of the respective reconstruction \({\mathbf {R}}^{\mathrm {rec}}\). We use it to quantify the accordance of the real RP of the corresponding noisy Rössler system to the reconstructions with respect to its recurrence structure (the higher JRRF, the better the reconstruction).

We generate 50 individual noise realizations for each noise strength between 0 and \(100\%\) of the original standard deviation \(\sigma _{\mathrm {Roe}}\) and average the corresponding JRRF values (Fig. 4C). As expected, with increasing noise level, the quality of the reconstruction decreases.

Fig. 4
figure 4

Sensitivity of recurrence flow against measurement noise, compared to other dependence measures. A y(t)-component and reference phase space of Rössler system without noise contamination; B the same, but with \(10\%\) measurement noise. Time axis ticks refer to consecutive integration steps. C Performance of ACF, auto-MI, continuity statistic \(\langle \varepsilon ^* \rangle (\tau )\) and recurrence flow \({\varPhi} (\tau )\) in terms of JRRF for noise strengths from \(0\%\) to \(30\%\) (and \(100\%\) inset)

For noise strengths between 10 and \(30\%\), the linear ACF performs worst in terms of JRRF while the MI and continuity statistic perform equally well. The almost perfect alignment of both might seem surprising but is due to the discreteness immanent in the delay selection. While their agreement might be interpreted in the way that this has to be the optimal delay (i.e., the reconstructed system preserves most of the recurrence structure), the recurrence flow \({\varPhi} (\tau )\) shows superior performance with noise strengths up to \(30\%\), i.e., three times the noise level illustrated in Fig. 4B. Beyond \(30\%\), all four measures yield approximately the same performance (Fig. 4C inset), as for JRRF\(<0.5\), the alignment could be explained by random joint recurrences.

3.3 Non-uniform time delay embedding

For many real-world dynamical systems, it is not sufficient to consider only a single characteristic time scale. Instead, multi-scale systems are governed by a multitude of processes that imprint (quasi)-periodic cycles of various lengths onto the measured time series. The selection of delays must account for this complexity by considering non-uniform embedding delays (non-uniform time delay embedding, NUTD). One of the most studied systems that exhibits multi-scale dynamics is the El Niño-Southern Oscillation (ENSO). ENSO represents a quasi-periodic climate pattern that is associated with spatio-temporal variations of sea surface temperatures in the central and eastern Pacific Ocean, oscillating between El Niño and La Niña events. We use the delay differential ENSO model proposed in [54] to examine if the recurrence flow can unveil distinct delays for a three-dimensional state space reconstruction of model time series. The model is based on a nonlinear delay differential equation and reproduces an abundance of key features of ENSO (see Appendix A). We study two different solution types that are associated with distinct dynamical regimes, i.e., a seasonal oscillation with superimposed faster low-amplitude oscillations prior to a period-doubling and irregular oscillations that are reminiscent of El Niño and La Niña events of random magnitudes. A more detailed discussion of these solution types can be found in [54].

The first solution type exhibits well-pronounced seasonal cycle and fast, amplitude-modulated wiggles on top (Fig. 5A). Cao’s method suggests that this solution type can be embedded in a \(m=3\)-dimensional embedding space. We find that MI does not yield an unambiguous choice for an embedding delay while the ACF suggests \(\tau _1^{({\mathrm {ACF}})} = {0.25}{\text {years}}\), i.e., the expected value of a quarter of the seasonal cycle (Fig. 5B). The same delay is identified with \({\varPhi} (\tau )\) whereby \(\langle \varepsilon ^* \rangle (\tau )\) yields a slightly higher optimal embedding delay (Fig. 5C). While traditional UTDE now suggests \(\tau _2^{({\mathrm {ACF}})} = 2\tau _1^{({\mathrm {ACF}})}\) for a three-dimensional embedding, both \({\varPhi} (\tau )\) and \(\langle \varepsilon ^* \rangle (\tau )\) instead show that a different choice yields a superior phase space reconstruction in terms of minimized redundancy (Fig. 5C/D). Both \(\langle \varepsilon ^* \rangle (\tau )\) and \({\varPhi} (\tau )\) effectively uncover the faster cycle by means of local maxima. However, both also detect a delay that is the sum of the seasonal and the fast cycle as a promising candidate. The fact that the estimate of this conjoint cycle differs for both measures can be explained by the different estimates on \(\tau _1\). Since for \({\varPhi} (\tau )\) both local maxima have the same height, we pick the first. The zoomed RPs clearly express that both NUTD selection methods entail more coherent diagonal lines with less perpendicular distortions (Fig. 5E). Despite the different embedding vectors \({\vec{v}}^{({\varPhi} )}\) and \({\vec{v}}^{(\varepsilon ^*)}\), both reconstructions give a convincing representation of the seasonal cycle in the RP, respectively. The reconstructed phase space based on the delays selected from the recurrence flow yields a clear visualization of the system’s periodic oscillations (Fig. 5F).

Fig. 5
figure 5

Non-uniform delay selection for delay-differential ENSO model with A periodic dynamics, analogously to Fig. 3. Serial dependence of y(t), measured in terms of B ACF (red/blue) and auto-MI (black) for the univariate time series, as well as by C/D the continuity statistic \(\langle \varepsilon ^* \rangle (\tau )\) (gray) and the recurrence flow \({\varPhi} (\tau )\) (dark blue) for C two- and D three-dimensional embedding vectors \({\vec{v}}(t)\). The optimal delay is marked by a circle(/cross/star/vertical green line) for the ACF(/MI/continuity statistic/recurrence flow), respectively. E Zoomed RPs for the ACF, \(\langle \varepsilon ^* \rangle (\tau )\) and \({\varPhi} (\tau )\) (from left to right). F Three-dimensional phase space reconstruction based on recurrence flow

In the same manner, we study the second solution type of irregular ENSO-like oscillations (Fig. 6). Cao’s method yields a 4-dimensional TDE. Both the ACF and MI identify different but similar delays slightly larger than \(\tau _1={0.35}~{\text {years}}\) (Fig. 6B). \(\langle \varepsilon ^* \rangle (\tau )\) and \({\varPhi} (\tau )\) agree on an embedding delay of \(\tau _1^{({\varPhi} )} = \tau _1^{(\varepsilon ^*)} = {0.33}~{\text {years}}\) which is slightly smaller than both \(\tau _1^{({\mathrm {ACF}})}\) and \(\tau _1^{({\mathrm {MI}})}\) (Fig. 6C). Again, the estimates \(\tau _2^{({\varPhi} )}\) and \(\tau _2^{(\varepsilon ^*)}\) differ from the choice that would result from UTDE and indicate that describing the dominant variability in the observed irregular oscillations requires a multi-scale approach (Fig. 6D). Slight deviations in both estimates give rise to minor discrepancies between the emerging diagonal lines in the respective RPs (Fig. 6E). For none of the three reconstructions, DLA are entirely removed which hints at an optimal embedding dimension \(m>3\) as identified by Cao’s method. This is supported by the displayed attractor reconstruction that has a two-winged structure reminiscent of the famous Lorenz attractor but potentially be unfolded further (Fig. 6F). However, both NUTD selection measures once more provide a more convincing result in terms of sparse, continuous diagonal lines than the ACF and, thus, capture the system’s predictability more adequately.

Fig. 6
figure 6

Non-uniform delay selection for delay-differential ENSO model with A irregular oscillations, analogously to Fig. 3. Serial dependence of y(t), measured in terms of B ACF (red/blue) and auto-MI (black) for the univariate time series, as well as by C/D the continuity statistic \(\langle \varepsilon ^* \rangle (\tau )\) (gray) and the recurrence flow \({\varPhi} (\tau )\) (dark blue) for C two- and D three-dimensional embedding vectors \({\vec{v}}(t)\). The optimal delay is marked by a circle(/cross/star/vertical green line) for the ACF(/MI/continuity statistic/recurrence flow), respectively. E Zoomed RPs for the ACF, \(\langle \varepsilon ^* \rangle (\tau )\) and \({\varPhi} (\tau )\) (from left to right). F Three-dimensional phase space reconstruction based on recurrence flow

4 Conclusion

The nonlinearity and scale-dependence of relationships observed in high-dimensional empirical data calls for appropriate and easily applicable methods. Recurrence plots offer a mathematically simple yet effective framework for the study of dependencies in high-dimensional dynamical systems and are an established tool in applied nonlinear time series analysis. For deterministic systems, diagonal lines in an RP yield valuable information on the evolution of a system’s trajectory. We propose a novel recurrence-based dependence measure, called recurrence flow. It builds on the fact that an RP can be computed from delayed copies of one (or multiple) time series that are stacked on top of each other as it is common practise in time delay embedding. The absence of spurious structures perpendicular to diagonal lines hints at a correct choice of (embedding) delays between the embedding vector’s coordinates. We exploit the emergence of these structures to characterize the serial dependence in uni- and cross-dependence of multi-variate, high-dimensional systems. We demonstrated that the recurrence flow \({\varPhi} (\tau )\) captures nonlinear, lagged dependencies in the presence of observational noise. Due to its conceptual proximity to time delay embedding, we put a focus on the delay selection problem that underlies attractor reconstruction. The recurrence flow effectively pinpoints uniform and non-uniform embedding delays in multi-dimensional nonlinear systems. It, thus, bares high potential to be used in a recurrence-based embedding algorithm which will be the focus of future work. Compared to existing nonlinear dependence measures, it offers several advantages: (i) it is not based on a binning procedure and does not suffer from the curse of dimensionality, (ii) it is able to detect non-uniform delays in high-dimensional multi-scale data, (iii) it performs relatively well in the presence of observational noise, and, (iv) for the considered examples, it yields well interpretable, unambiguous maxima in the delay selection procedure (unlike, e.g., the continuity statistic). The recurrence flow formalism appears promising for the conceptualization of an automated recurrence-based time delay embedding algorithm.