1 Introduction

The famous embedding theorems of Whitney [1], Mañé [2], and Takens [3], together with their enhancement by Sauer et al. [4], allow a high-dimensional state space reconstruction from (observed) uni- or multivariate time series. Computing dynamical invariants [5,6,7,8,9] from the observed system, making meaningful predictions even for chaotic or stochastic systems [10,11,12,13,14,15,16], detecting causal interactions [17,18,19], or nonlinear noise reduction algorithms [20, 21] all rely explicitly or implicitly on (time delay) embedding [22] the data into a reconstructed state space. Approaches other than time delay embedding (TDE) are also possible [22,23,24,25,26,27], but due to its ease of use and its good performance in a wide range of situations, TDE is by far the most common reconstruction technique. Suppose there is a multivariate dataset consisting of M time series \(s_i(t),~i=1,\ldots ,M\). The basic idea is to use lagged values of the available time series as components of the reconstruction vector

$$\begin{aligned} \mathbf {v}(t) = \bigl ( s_{i_1}(t-\tau _1), s_{i_2}(t-\tau _2),\ldots ,s_{i_m}(t-\tau _m) \bigr ). \end{aligned}$$
(1)

Here, the delays \(\tau _j\) are multiples of the sampling time \(\varDelta t\) and the indices \(i_1, i_2, \ldots , i_m\) each denote the time series index \(i \in [1,\ldots , M]\) chosen in the \(1\text {st},\, 2\text {nd}, \ldots ,\, m\text {th}\) embedding cycle. The total number of delays \(\tau _j, ~j= 1,\ldots ,m\), i.e., the embedding dimension m, their values, and the corresponding time series \(s_{i_j}, ~i_j \in [1,\ldots ,M]\) need to fulfill certain criteria to guarantee the equivalence to the unknown true attractor, e.g., the embedding dimension must satisfy \(m \geqslant 2D_B+1\), with \(D_B\) being the unknown box-counting dimension (see Casdagli et al. [28], Gibson et al. [24], Uzal et al. [29] or Nichkawde [30] for a profound overview of the problem). Picking optimal embedding parameters \(\tau _j\) and m comes down to making the resulting components of the reconstruction vectors \(\mathbf {v}(t)\) as independent as possible [4, 22], but at the same time not too independent, in order to retain sufficient information about the correlation structure of the data [28, 29, 31,32,33]. Besides some unified approaches [34,35,36,37,38,39,40,41,42,43,44], which tackle the estimation of the delays \(\tau _j\) and the embedding dimension m simultaneously, most researchers use two different methods to perform the reconstruction.

  1.

    A statistic determines the delays \(\tau _j\); we call it \(\varLambda _{\tau }\) throughout this paper. Usually, \(\tau _1 = 0\), i.e., the first component of \(\mathbf {v}(t)\) is the unlagged time series \(s_{i_1}\) in Eq. (1). For embedding a univariate time series, \(s_{i_1}=\ldots =s_{i_m}=s(t)\), the most common approach is to choose \(\tau _2\) from the first minimum of the auto-mutual information [45, 46]. All consecutive delays are then simply integer multiples of \(\tau _2\). Other ideas based on different statistics, like the auto-correlation function of the time series, have been suggested [23, 32, 33, 40, 47,48,49,50]. However, by setting \(\tau _j, j>2\) to multiples of \(\tau _2\), one ignores the fact that this “measure” of independence strictly holds only for the first two components of the reconstruction vectors (\(m=2\)) [51, 52], even though in practice it works well in most cases. More sophisticated ideas, like high-dimensional conditional mutual information [53, 54] and other statistics [54,55,56,57,58,59], some of which include non-uniform delays and the extension to multivariate input data [30, 38, 39, 53, 54, 60,61,62,63,64,65], have been presented.

  2.

    A statistic, which we call \(\varGamma \) throughout this paper, serves as an objective function and quantifies the goodness of a reconstruction, given that delays \(\tau _j\) have been estimated. The embedding process is thought of as an iterative process, starting with an unlagged (given) time series \(s_{i_1}\), i.e., \(\tau _1 = 0\). In each embedding cycle \({\mathscr {D}}_d, [d=1,\ldots ,m]\), a time series \(s_{i_d}\) lagged by \(\tau _d\) gets appended to obtain the actual reconstruction vectors \(\mathbf {v}_d(t) \in {\mathbb {R}}^{d+1}\), and these are compared to the reconstruction vectors \(\mathbf {v}_{d-1}(t)\) of the previous embedding cycle (if \(d=1\), \(\mathbf {v}_{d-1}(t)\) is simply the time series \(s_{i_1}\)). This “comparison” is usually based on the number of false nearest neighbors (FNN) [66,67,68,69,70], some other neighborhood-preserving idea [71, 72], or more ambitious ideas [29, 30].
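For reference, the construction of the reconstruction vectors in Eq. (1), given delays and time series indices chosen by whatever method, can be sketched as follows (Python/NumPy; the function name, argument layout, and the example delays are our own illustration, not part of any particular package).

```python
import numpy as np

def delay_embed(data, ts_indices, delays):
    """Build reconstruction vectors from lagged time series, cf. Eq. (1).

    data       : array of shape (T, M) holding M time series of length T
    ts_indices : time series index i_j used in each component (length m)
    delays     : corresponding delays tau_j in sampling units (length m)
    Returns an array of shape (T - max(delays), m).
    """
    data = np.asarray(data, dtype=float)
    tau_max = max(delays)
    # component j of v(t) is s_{i_j}(t - tau_j); shifting each column by
    # (tau_max - tau_j) aligns all components on a common time axis
    columns = [data[tau_max - tau: data.shape[0] - tau, i]
               for i, tau in zip(ts_indices, delays)]
    return np.column_stack(columns)

# Example: univariate embedding of s(t) with m = 3 and uniform delay 17
# (hypothetical input file):
# s = np.loadtxt("timeseries.txt")
# v = delay_embed(s[:, None], ts_indices=[0, 0, 0], delays=[0, 17, 34])
```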

We have recently proposed an algorithm [41], which minimizes the L-statistic [29] (the objective function) in each embedding cycle \({\mathscr {D}}_d\) over possible delay values in this embedding cycle determined by a continuity statistic [65]. Nichkawde [30] minimizes the FNN-statistic in each embedding cycle over time delays given by a statistic that maximizes directional derivatives of the actual reconstruction vectors. However, it cannot be ruled out that these approaches end up in a local minimum of the corresponding objective function rather than attaining the global minimum.

Here, we propose a Monte Carlo Decision Tree Search (MCDTS) idea to ensure that a global minimum of a freely selectable objective function \(\varGamma \) is reached, e.g., the L- or FNN-statistic or any other suitable statistic, which evaluates the goodness of the reconstruction with respect to the task. A statistic \(\varLambda _{\tau }\), which guides the pre-selection of potential delay values in each embedding cycle (such as the continuity statistic or conditional mutual information), is also freely selectable and can be tailored to the research task. This modular construction might be useful for practitioners, since it has been pointed out that optimal embedding parameters—and thus also the statistics used to approximate them—depend on the research question, e.g., computing dynamical invariants or prediction [63, 64, 73,74,75]. Thus, the proposed method is neither restricted to the auto-mutual information, in order to measure the independence of consecutive reconstruction vector components, nor does it necessarily rely on the ubiquitous false nearest neighbor statistic. Independently of the chosen statistic for potential time delays and of the chosen objective function, the proposed method computes different embedding pathways in a randomized manner and structures these paths as a tree. Consequently, it is able to reveal paths through that tree—if there are any—which lead to a lower value of the objective function than paths that strictly minimize the costs in each embedding cycle. Given a sufficiently high number of samplings, MCDTS is guaranteed to optimize the chosen objective function \(\varGamma \) over the (delay embedding) parameter space. In Sect. 2, we describe this method before we apply it to paradigmatic examples in Sect. 3, which include recurrence analysis, nearest-neighbor-based time series prediction, and causal analysis based on convergent cross mapping.

2 Method

When embedding a time series, in each embedding cycle a suitable delay, and for multivariate data a suitable time series, has to be chosen. While the final embedding vector is invariant to the order of the chosen components, the embedding process, and the statistics and methods used to suggest suitable delays, generally depend on all the previous embedding cycles (see footnote 1). It seems therefore natural to visualize all possible embedding cycles in a tree-like hierarchical data structure as shown in Fig. 1. The initial time series \(s_{i_1}\) with delay \(\tau _1=0\) forms the root of the tree, and each possible embedding cycle \({\mathscr {D}}_d\) is a leaf or node of the tree. With the large number of possible delays and time series to choose from, this decision tree becomes too large to be computed fully. At the same time, the aforementioned statistics like the continuity statistic or conditional mutual information can guide us in pre-selecting potentially suitable delay values, and an objective function like the L- or FNN-statistic can pick the most suitable delay value of the pre-selection by quantifying the quality of the reconstruction in each embedding cycle. Throughout this paper, we denote a statistic which pre-selects potential delay values as \(\varLambda _{\tau }\) and the objective function as \(\varGamma \). The task of embedding a time series can then be interpreted as minimizing \(\varGamma (i_1,i_2,\ldots ,i_m,\tau _1,\tau _2,\ldots ,\tau _m)\). Visualizing this with a tree as in Fig. 1, we actually perform a tree search to minimize \(\varGamma \). However, always choosing the leaf of the tree that decreases \(\varGamma \) the most might lead only to a local minimum.
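As a minimal illustration of this tree structure (a Python sketch with names of our own choosing, not taken from any reference implementation), each node can store the embedding cycle it represents together with the corresponding value of the objective function:

```python
from dataclasses import dataclass, field

@dataclass
class EmbeddingNode:
    """One embedding cycle D_d in the search tree."""
    ts_index: int        # time series s_{i_d} chosen in this cycle
    tau: int             # delay tau_d chosen in this cycle
    gamma: float         # objective function value Gamma_d at this node
    children: list = field(default_factory=list)  # possible next embedding cycles

# The root holds the unlagged time series s_{i_1}, i.e., tau_1 = 0:
# root = EmbeddingNode(ts_index=0, tau=0, gamma=float("inf"))
```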

Fig. 1

All possible embeddings of a time series visualized by a tree. Each leaf of the tree symbolizes one embedding cycle \({\mathscr {D}}_d\) using one selected time series \(s_{i_d}\) from the multivariate data set and delay \(\tau _d\). Marked in orange is one chosen full embedding

As we strive to find a global minimum and cannot compute the full embedding tree, we proceed by sampling the tree. This approach is inspired by the Monte Carlo Tree Search algorithms that were originally envisioned to master the game of Go [76]. Ultimately, computer programs based on these algorithms were able to beat a reigning world champion, a feat that was long thought to be impossible for computer programs [77]. Adapting this idea to the embedding problem, we proceed as follows. We randomly sample the full tree: for each embedding cycle we compute the change in the objective function \(\varGamma \) and preferably pick for the next embedding cycle those delays that decrease \(\varGamma \) further. Each node \({\mathscr {N}}_d\) of the tree encodes one possible embedding cycle and holds the time series used \([s_{i_1}, \ldots , s_{i_d}]\), the delays used up to this node \([\tau _1, \ldots , \tau _{d}]\), i.e., the current path through the tree up to node \({\mathscr {N}}_d\), and a value of the objective function \(\varGamma _d\). We sample the tree \(N_\text {trials}\) times in a two-step procedure:

  • Expand: Starting from the root, for each embedding cycle \({\mathscr {D}}_d\), possible next steps \((s_{i_j},\tau _j,\varGamma _j)\) are either computed using suitable statistics \(\varLambda _{\tau }\) and \(\varGamma \) or, if they have already been computed previously, they are looked up from the tree. As an example, consider the first embedding cycle \({\mathscr {D}}_2\) with the continuity statistic \(\langle \varepsilon ^\star \rangle (\tau )\) used for \(\varLambda _{\tau }\). Then, for each time series \(s_{i}\), the local maxima of the corresponding \(\langle \varepsilon ^\star \rangle (\tau )\) (for a univariate time series there is only one \(\langle \varepsilon ^\star \rangle (\tau )\)) determine the set of possible delay values \(\tau _2\) (see the rows in Figs. 1 and 2 corresponding to \({\mathscr {D}}_2\)). Then, one of the possible \(\tau _2\)’s is randomly chosen with probabilities computed with a softmax of the corresponding values of \(\varGamma _j\). Due to its normalization, the softmax function converts all possible values of \(\varGamma _j\) to probabilities via \(p_j=\exp (-\beta \varGamma _j)/\sum _k\exp (-\beta \varGamma _k)\). This procedure is repeated (consecutive rows for \({\mathscr {D}}_3\), etc., in Figs. 1 and 2) until the very last computed embedding cycle \({\mathscr {D}}_{m+1}\), i.e., when the objective function \(\varGamma _{m+1}\) cannot be decreased further for any of the \(\tau _{m+1}\)-candidates. Figure 2 visualizes this procedure.

  • Backpropagation: After the tree is expanded, the final value \(\varGamma _m\) is backpropagated through the path taken in this trial, i.e., to all leaves (previous embedding cycles d) that were visited during this expand step, updating their \(\varGamma _d\) values to that of the final embedding cycle.

With this two-step procedure, we iteratively build up the part of the tree that leads to embeddings with the smallest values of the objective function. The following two refinements are made to improve this general strategy: in case of multivariate time series input, the probabilities are chosen uniformly at random in the zeroth embedding cycle \({\mathscr {D}}_1\). This ensures an even sampling over the given time series, which can all serve as a valid first component of the final reconstruction vectors. Additionally, as soon as a \(\varGamma _j\) is found that is smaller than the previous global minimum, this embedding cycle is chosen directly and not randomized via the softmax function. This also means that in the very first trial the smallest value of \(\varGamma _j\) is always chosen, resulting in a good starting point for the further Monte Carlo search of the tree. If the continuity statistic \(\langle \varepsilon ^\star \rangle (\tau )\) is used as the delay pre-selection statistic \(\varLambda _{\tau }\) and the \(\varDelta L\)-statistic [29] as the objective function \(\varGamma \), the first sample is thus identical to the PECUZAL algorithm [41], and every further sample improves upon this embedding by further minimizing \(\varDelta L\). Aside from the choice of \(\varLambda _\tau \) and \(\varGamma \), the two hyperparameters of the method are the number of trials \(N_{\text {trials}}\) and the \(\beta \) parameter of the probability distribution used to choose the next delay value. The parameter \(\beta \) governs how likely it is that the minimum of all \(\varGamma _i\) is chosen: in the extreme cases, for \(\beta =0\) the possible delay times are chosen uniformly at random, and for \(\beta \rightarrow \infty \) always the smallest \(\varGamma _i\) is chosen. For the tree search algorithm, this means that \(\beta \) governs how “wide” the tree search is; larger \(\beta \) values concentrate the search along previously found minima, whereas smaller values put more weight on previously unvisited paths through the tree. The default value for \(\beta \), which is used in all shown results, is \(\beta =2\).
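The following sketch (Python/NumPy) illustrates the softmax-based candidate selection and the overall sampling loop described above. The delay pre-selection \(\varLambda _{\tau }\) and the objective function \(\varGamma \) are hidden in a user-supplied callback, the tree is reduced to a dictionary cache keyed by the path (instead of the explicit node objects sketched above), and the backpropagation step is condensed to tracking the best full path; it is a simplified schematic of the procedure, not the reference implementation.

```python
import numpy as np

def softmax_pick(gammas, beta=2.0, rng=np.random):
    """Pick a candidate index with probability p_j ~ exp(-beta * Gamma_j).
    beta = 0 gives a uniform choice, beta -> inf the greedy (smallest Gamma) choice."""
    g = np.asarray(gammas, dtype=float)
    w = np.exp(-beta * (g - g.min()))           # shift for numerical stability
    return int(rng.choice(len(g), p=w / w.sum()))

def mcdts(candidates_fn, n_trials=50, beta=2.0, rng=np.random):
    """Schematic Monte Carlo decision tree search over embedding cycles.

    candidates_fn(path) must return a list of (ts_index, tau, Gamma) tuples for
    the next embedding cycle, given the path chosen so far, or an empty list if
    Gamma cannot be decreased any further (final embedding cycle reached).
    """
    cache = {}                                   # path -> previously computed candidates
    best_path, best_gamma = None, np.inf
    for _ in range(n_trials):
        path, gamma = (), np.inf
        while True:                              # --- expand ---
            if path not in cache:                # reuse already computed embedding cycles
                cache[path] = candidates_fn(path)
            cands = cache[path]
            if not cands:                        # objective cannot be decreased further
                break
            gammas = [g for (_, _, g) in cands]
            if min(gammas) < best_gamma:         # better than the global minimum so far:
                j = int(np.argmin(gammas))       # take it directly (no randomization)
            else:
                j = softmax_pick(gammas, beta, rng)
            path, gamma = path + (cands[j][:2],), gammas[j]
        # --- backpropagation (condensed): keep the best full path found so far ---
        if gamma < best_gamma:
            best_path, best_gamma = path, gamma
    return best_path, best_gamma
```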

The computational complexity of this algorithm obviously scales with the number of trials \(N_{\text {trials}}\), even though already computed embedding cycles are not computed again in later trials. When sampling the tree many times, the path through the tree of the first few embedding cycles will often be the same as in previous trials. In these cases, the delay pre-selection statistic and the objective function would be identical to those of previous trials. All values of possible delays and of the objective function that are computed in a trial are therefore saved during the tree search and reused when the same embedding cycle needs to be computed again.

Otherwise, the complexity depends on the chosen delay pre-selection function \(\varLambda _{\tau }\) and the objective function \(\varGamma \). Clearly, the algorithm is computationally much more demanding than a classical TDE. However, once an embedding is computed for a specified system, it can be reused in later applications.

Fig. 2

Visualization of the expand step of the MCDTS algorithm. Here, as an example, we use the continuity statistic \(\langle \varepsilon ^\star \rangle (\tau )\) as the delay pre-selection statistic \(\varLambda _{\tau }\) and the \(\varDelta L\)-statistic [29] as the objective function \(\varGamma \), as utilized in the recently proposed PECUZAL algorithm [41]

3 Applications

In this section, we demonstrate the potential of the proposed MCDTS method in various applications. We aim to provide suggestions and to show that there are a number of state-space-based applications that directly benefit from our method and yield better results than with state-of-the-art embedding techniques. A variety of applications is presented to support the fact that different research questions elicit different embedding behavior and that our proposed method is able to optimize the embedding with respect to different study objectives. In particular, we investigate the influence of the state space reconstruction parameters on a recurrence analysis of the chaotic Lorenz-96 system (Sect. 3.1), a nearest-neighbor time series prediction for the chaotic Hénon map and for a palaeoclimate dataset (Sects. 3.2, 3.3), and, last but not least, a causal analysis of two physical observables of a combustion process (Sect. 3.4). The selected applications cover many areas of nonlinear time series analysis, and it is not our intention here to propose new techniques for prediction or causal analysis which are necessarily superior to other, alternative approaches. We rather chose well-established state-space-based methods and use them to show how our proposed method optimizes results with respect to the chosen embedding.

3.1 Recurrence properties of the Lorenz-96 system

First, we consider a potentially higher dimensional nonlinear dynamical system and compare the recurrence properties of its dynamics as derived from the original set of system variables with those obtained by applying the different embedding approaches. We utilize the Lorenz-96 system [79], a set of N ordinary first-order differential equations

$$\begin{aligned} \frac{dx_i}{dt} = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F , \end{aligned}$$
(2)

with \(x_i\) being the state of node \(i = 1,\dots ,N\); the total number of nodes is assumed to be \(N\ge 4\). We can think of this system as a ring-like structure of N coupled oscillators—each representing some atmospheric quantity—all subject to the same forcing. The forcing constant F serves as the control parameter. Here, we vary F from \(F=3.7\) to 4.0 in steps of 0.002, covering limit cycle dynamics as well as chaos. We set \(N = 8\), randomly choose the initial condition \(u_0=[0.590; 0.766; 0.566; 0.460; 0.794; 0.854; 0.200; 0.298]\), and use a sampling time of \(\varDelta t=0.1\). By discarding the first 2500 points of the integration as transients, we obtain time series consisting of 5000 samples for each of the considered values of F. We focus on two scenarios: (1) only the time series of the 2nd node is used (univariate embedding) and (2) the three time series of nodes 2, 4, and 7 are used (multivariate embedding). For each of these time series, we perform an embedding using three classic time delay approaches as proposed by Kennel et al. [69] (5%-threshold), Cao [66] (slope threshold of 0.2), and Hegger and Kantz [67] (5%-threshold), each with a uniform delay value estimated as the first minimum of the auto-mutual information (only applicable to the univariate case), as well as the recently proposed PECUZAL algorithm [41]. For our proposed MCDTS approach, we embed the data using the continuity statistic \(\langle \varepsilon ^\star \rangle (\tau )\) as the delay pre-selection statistic \(\varLambda _{\tau }\). For the objective function \(\varGamma \), we try two different approaches, namely the \(\varDelta L\)-statistic [29] (MCDTS-C-L) as well as the FNN-statistic [67] (MCDTS-C-FNN). In all approaches, we discard serially correlated points from the nearest neighbor search by setting a Theiler window [80] to the first minimum of the mutual information. An overview of all MCDTS implementations and abbreviations is given in Table 1.
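For reference, a minimal sketch of the numerical setup described above (Python with NumPy and SciPy's solve_ivp; one exemplary value of F is shown, and the variable names are ours).

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz96(t, x, F):
    """Right-hand side of Eq. (2) with cyclic (ring-like) coupling of the N nodes."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

N, F, dt = 8, 3.8, 0.1                            # one exemplary forcing value
u0 = np.array([0.590, 0.766, 0.566, 0.460, 0.794, 0.854, 0.200, 0.298])
t_eval = np.arange(0, 7500 * dt, dt)              # 2500 transient + 5000 samples
sol = solve_ivp(lorenz96, (0, t_eval[-1]), u0, t_eval=t_eval, args=(F,),
                rtol=1e-9, atol=1e-9)
x = sol.y.T[2500:]                                # discard transients
x2, x4, x7 = x[:, 1], x[:, 3], x[:, 6]            # observables used in this section
```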

Table 1 The different implementations of the MCDTS algorithm used throughout the article and their choice of delay pre-selection and objective function, which is minimized through the tree search

By varying the control parameter F, the system changes its dynamics, which is well reflected by a change in the recurrence behavior [81]. In previous work, we have demonstrated that recurrence quantification analysis (RQA) can be used to qualitatively characterize the typical dynamical properties of the Lorenz-96 system such as chaotic or periodic dynamics [82]. We, therefore, compare the recurrence properties of all reconstructed trajectories to the recurrence properties of the true trajectory (obtained from the numerical integration) by using RQA. The neighborhood relations of a trajectory can be visualized in a recurrence plot (RP), a binary, square matrix \({\mathbf {R}}\) representing the recurrences of states \(\mathbf {x}_i\) (\(i=1,\ldots ,N\), with N the number of points forming the trajectory) in a d-dimensional (optionally reconstructed) state space [83, 84]

$$\begin{aligned} {R}_{i,j}(\varepsilon ) = \varTheta \left( \varepsilon - \Vert \mathbf {x}_i - \mathbf {x}_j\Vert \right) , \qquad \mathbf {x} \in {\mathbb {R}}^d, \end{aligned}$$
(3)

with \(\Vert \cdot \Vert \) a norm, \(\varepsilon \) a recurrence threshold, and \(\varTheta \) the Heaviside function. There are numerous ideas of how to quantify an RP [84, 85]. Some statistics are based on the distribution of recurrence points, some on the diagonal line structures, some on the vertical structures, and it is also possible to use complex-network measures when interpreting \({\mathbf {R}}\) (after subtracting the main diagonal) as an adjacency matrix of a recurrence network (RN) [86]. Some of these quantifiers are related to dynamical invariants [87, 88].
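A minimal sketch of Eq. (3) in Python/NumPy, with the recurrence threshold \(\varepsilon \) fixed such that a prescribed global recurrence rate is obtained (as done below with RR = 5%); the function name is ours.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def recurrence_plot(traj, recurrence_rate=0.05, metric="euclidean"):
    """Binary recurrence matrix R_ij = Theta(eps - ||x_i - x_j||), cf. Eq. (3).
    eps is chosen as the quantile of the pairwise distance distribution that
    yields the requested global recurrence rate."""
    d = pdist(np.asarray(traj, dtype=float), metric=metric)
    eps = np.quantile(d, recurrence_rate)
    return (squareform(d) <= eps).astype(np.uint8)

# R = recurrence_plot(reconstructed_trajectory)   # trajectory: array of shape (N, d)
```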

Fig. 3

Schematic visualization of the data analysis for the Lorenz-96 system, Eq. (2) (see text for details). In case of the univariate approach, the \(x_2(t)\)-time series gets embedded by all considered reconstruction methods; for the multivariate approach, three time series (\(x_2(t)\), \(x_4(t)\) and \(x_7(t)\)) are passed to the reconstruction algorithms. From the reconstructed attractors, we obtain a recurrence plot and quantify it (RQA) by using ten different quantifiers. The same is done for the reference trajectory gained from all 8 time series from the numerical integration. Repeating the analysis for time series corresponding to varying values of the control parameter F of the system, we finally obtain time series of the RQA-quantifiers for each reconstruction method as well as for the true trajectory

Fig. 4

Results of the analysis of the Lorenz-96 system with varying control parameter and for all considered reconstruction approaches (see Table 1 for notations). Shown is the pairwise comparison of the normalized mean squared error of all ten considered RQA-quantifiers with respect to the truth RQA-time series (see text for details). For instance, a value of 70% in the table indicates that for seven out of the ten considered RQA-quantifiers the normalized mean squared error for the reconstruction method displayed on the y-axis is lower than for the reconstruction method displayed on the x-axis

For our purpose of comparing different aspects of recurrence properties of original and reconstructed trajectories, we use the transitivity (TRANS) of the \(\varepsilon \)-RN, the determinism (DET), the mean diagonal line length (\(L_\text {mean}\)), the maximum diagonal line length (\(L_\text {max}\)) and its reciprocal (DIV), the entropy of diagonal line lengths (ENTR), the TREND, the mean recurrence time (MRT), the recurrence time entropy (RTE), and finally, the joint recurrence rate fraction (JRRF). JRRF measures the accordance of the recurrence plot of the (true) reference system, \({\mathbf {R}}^{\text {ref}}\), with the RP of the reconstruction, \({\mathbf {R}}^{\text {rec}}\):

$$\begin{aligned} \text {JRRF}&= \frac{\sum _{i,j}^N JR_{i,j}}{\sum _{i,j}^N R_{i,j}^{\text {ref}}} , \quad \text {JRRF} \in [0,~ 1] \end{aligned}$$
(4)
$$\begin{aligned} \mathbf {JR}&= {\mathbf {R}}^{\text {ref}}\circ {\mathbf {R}}^{\text {rec}}. \end{aligned}$$
(5)

We compute both \({\mathbf {R}}^{\text {ref}}\) and \({\mathbf {R}}^{\text {rec}}\) by fixing the recurrence threshold such that it corresponds to a global recurrence rate (RR) of 5% in order to ensure comparability [89]. Although the quantification measures depend crucially on the chosen recurrence threshold, the particular choice we make here is not so important, since we apply it to all RPs we compare. \(RR= 5\)% ensures a proper resolution of the inherent structures to be quantified by the ten aforementioned measures.
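Given two recurrence matrices computed with the same fixed recurrence rate, the joint recurrence rate fraction of Eqs. (4) and (5) can be sketched as follows (Python/NumPy, with a function name of our own choosing).

```python
import numpy as np

def jrrf(R_ref, R_rec):
    """Joint recurrence rate fraction, cf. Eqs. (4)-(5): fraction of the recurrences
    of the reference RP that are reproduced by the RP of the reconstruction."""
    JR = np.asarray(R_ref) * np.asarray(R_rec)   # elementwise (Hadamard) product, Eq. (5)
    return JR.sum() / np.asarray(R_ref).sum()    # value in [0, 1], Eq. (4)
```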

The described procedure is schematically illustrated in Fig. 3. For each reconstruction method and for each of the ten RQA-statistics, the mean squared error (MSE) with respect to the RQA-statistics of the true reference trajectory is computed (normalized to the reference RQA-values). The pairwise comparison of two reconstruction methods is then evaluated as the percentage of the ten RQA-quantifiers for which one method yields a lower MSE than the other (Fig. 4). For instance, a value of 70% in the table indicates that for seven out of the ten considered RQA-quantifiers the normalized mean squared error for the reconstruction method displayed on the y-axis is lower than for the reconstruction method displayed on the x-axis. The m-notation indicates the multivariate embedding approach, where three instead of one time series have been passed to the reconstruction methods (\(x_2(t)\), \(x_4(t)\), and \(x_7(t)\), see Fig. 3). Since the classic TDE algorithms of Cao, Kennel et al., and Hegger & Kantz are not able to handle multivariate input data, only PECUZAL and the proposed MCDTS idea combined with the L-statistic and with the FNN-statistic are considered in the multivariate scenario. The superiority over the three classic TDE methods is discernible in values \(>50\)% for PECUZAL and MCDTS in the first three columns. While we would expect a better reconstruction for the multivariate cases—because we simply provide more information—our proposed method also performs better in the univariate case when the FNN-statistic is used as an objective function. When using MCDTS with the L-statistic, there is hardly any improvement discernible, while the computational costs are orders of magnitude higher. Here, PECUZAL yields better results, even though it uses the same statistics. However, combined with the FNN-statistic, our proposed idea performs very well in the univariate case and yields excellent results for the multivariate case.

3.2 Short time prediction of the Hénon map time series

In the following, a state space reconstruction \(\mathbf {v}(t)\) of a single time series s(t) is used to predict its future course. Besides a very recent idea [90] to train neural ordinary differential equations on a reconstructed trajectory, which then allows prediction, several attempts have been published [10,11,12,13,14,15,16] which more or less rely on the same basic idea. For the last vector of the reconstructed trajectory, denoted with a time-index l, \(\mathbf {v}(t_{l})\), a nearest neighbor search is performed. Then, these neighbors are used to predict the future value of this point T time steps ahead, \(\mathbf {v}(t_{l+T})\). Knowledge of the embedding which led to the reconstruction vectors \(\mathbf {v}(t)\) then allows one to read off the prediction of the time series \(s(t_l+T)\) from the predicted reconstruction vector \(\mathbf {v}(t_{l+T})\). Usually, \(T=1\), i.e., the forecast is iteratively built by appending \(\mathbf {v}(t_{l+T})\) to the trajectory \(\mathbf {v}(t_{i}),~i=1,\ldots ,l\), and this procedure is repeated N times, in order to obtain an N-step prediction. The aforementioned approaches differ in the way they construct a local model of the dynamics based on the nearest neighbors. For instance, Farmer and Sidorowich [11] proposed a linear approximation, i.e., a linear polynomial is fitted to the pairs \((\mathbf {v}(t_{nn_i}), \mathbf {v}(t_{nn_i+T}))\), where \(nn_i\) denotes the ith nearest neighbor time-index. Sugihara and May [16] used a simplex with minimum diameter to select the nearest neighbor indices \(nn_i\) and projected this simplex T steps into the future. The prediction is then made by computing the location of the original predictee \(\mathbf {v}(t_{l})\) within the range of the projected simplex, “giving exponential weight to its original distances from the relevant neighbors.” Here, a much simpler idea is considered: a zeroth-order approximation of the local dynamics. The prediction is simply the projection of the nearest neighbor of \(\mathbf {v}(t_{l})\), denoted by the index \(nn_1\), i.e., \(\mathbf {v}(t_{l+T})=\mathbf {v}(t_{nn_1+T})\). It is clear that the performance of all prediction approaches based on an approximation of the local dynamics by making use of nearest neighbors will crucially depend on the length of the training set. By training set, we mean the time series s(t), which has been used to construct the trajectory \(\mathbf {v}(t)\). We hypothesize that the accuracy of such a prediction will also depend on the reconstruction method, especially when the training set is rather short (cf. Small and Tse [43] and Bradley and Kantz [73]). In particular, Garland and Bradley [74] have shown that accurate predictions can be achieved with the aforementioned zeroth-order approximation when using an incomplete embedding of the data, i.e., reconstructions that do not satisfy the theoretical requirements on the embedding dimension in Takens' sense.
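A minimal sketch of the iterated zeroth-order (nearest-neighbor) prediction described above (Python/NumPy with a brute-force neighbor search; the function name and the Theiler-window handling are our own simplifications). The reconstructed trajectory is assumed to be an array of shape (l, m).

```python
import numpy as np

def zeroth_order_forecast(traj, n_steps, theiler=1):
    """Iterated one-step zeroth-order prediction: the forecast of the last state
    is the one-step-ahead image of its nearest neighbor in the reconstruction."""
    traj = list(np.asarray(traj, dtype=float))
    for _ in range(n_steps):
        query = traj[-1]
        # neighbors are searched among past points that have a successor and are
        # at least `theiler` samples away from the query (Theiler window)
        past = np.array(traj[:len(traj) - 1 - theiler])
        nn = int(np.argmin(np.linalg.norm(past - query, axis=1)))
        traj.append(traj[nn + 1])          # zeroth-order: append the neighbor's image
    return np.array(traj[-n_steps:])

# 30-step prediction of the first component (e.g., the x-time series):
# x_pred = zeroth_order_forecast(v, 30)[:, 0]
```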

As a proof of concept, we now use the described nearest-neighbor prediction method to predict the x-time series of the Hénon map [91], even though other simple models like low-order polynomial models might be superior for such noise-free, purely deterministic dynamics (we provide a more challenging example in Sect. 3.3). The time series \(x_{i+1}=y_i+1-ax_i^2\) and \(y_{i+1}=bx_i\), with standard parameters \(a=1.4,~b=0.3\) and 100 randomly chosen different initial conditions, are used. For each of those 100 samples, x- and y-time series of length \(N=10,030\) are obtained (transients removed). The first 10,000 points of the time series are used for the state space reconstruction (both time series for the multivariate cases, only the x-time series in the univariate case), while the last 30 points serve as the prediction test set (only the x-time series is predicted). The same reconstruction methods as in Sect. 3.1 are used, but for MCDTS we try two different delay pre-selection statistics \(\varLambda _{\tau }\). Rather than only considering the continuity statistic (denoted as C in the model description), we also look at a whole range of delay values \(\tau =0,\ldots ,50\) (denoted as R in the model description). For the objective function \(\varGamma \), we try

  • The \(\varDelta L\)-statistic (denoted as L in the model description),

  • The FNN-statistic (denoted as FNN in the model description),

  • The root mean squared in-sample one-step prediction error on the first component of the reconstruction vectors, i.e., the x-time series (denoted as MSE in the model description), and finally

  • The mean Kullback–Leibler distance between the in-sample one-step prediction and the “true” trajectory points (denoted as MSE-KL in the model description).

By “in-sample,” we mean the training set, which is used for the reconstruction. For all MCDTS implementations and abbreviations, see again Table 1. The accuracy of the prediction is evaluated by the normalized root-mean-square forecast error (RMS),

$$\begin{aligned} e_{\text {rms}}(T) = \frac{\sqrt{\left\langle \left[ x_{\text {pred}}(T)- x_{\text {true}}(T) \right] ^2 \right\rangle }}{\sqrt{\left\langle \left[ x_{\text {true}}(T) - \langle x_{\text {true}}(T) \rangle \right] ^2 \right\rangle }} \end{aligned}$$

with the index “true” denoting the test set values. This way, \(e_{\text {rms}}(T) = 0\) indicates a perfect prediction, whereas \(e_{\text {rms}}(T) \approx 1\) means that the prediction is no better than a constant mean-predictor of the test set. Figure 5 shows the mean forecast accuracy for the traditional TDE methods (Cao, Kennel et al., Hegger & Kantz) and two selected MCDTS approaches as a function of the prediction time. The largest Lyapunov exponent is estimated as \(\lambda _1 \approx 0.419\), and we display Lyapunov times on the x-axis, i.e., units of \(1/\lambda _1\). As in Sect. 3.1, m indicates the multivariate case, in which both the x- and y-time series are fed into the reconstruction algorithms. The results for all discussed reconstruction methods can be found in Appendix A (Fig. 8). As expected, the forecast accuracy is worse in case of added white noise (Fig. 5B), and the predictions based on multivariate reconstructions perform slightly better. The MCDTS-based forecasts perform significantly better than the forecasts based on the traditional TDE methods. Even though the continuity statistic constitutes a reasonable delay pre-selection statistic with a clear physical meaning, when utilized in our MCDTS approach (MCDTS-C-), it does not perform as well as when we do not pre-select delays on the basis of some statistic but instead try delays over a whole range of values (\(\tau \in [0, 50]\), MCDTS-R-). At least, this statement holds for this example of the Hénon map time series.

Fig. 5

A Normalized root-mean-square prediction errors (RMS) for the Hénon x-time series and for selected reconstruction methods (see Fig. 8 for all mentioned approaches and Table 1) as a function of the prediction time. Shown are mean values of a distribution of 100 trials with different initial conditions. For the prediction, we use a one-step-ahead zeroth-order approximation on the nearest neighbor of the last point of the reconstructed trajectory and iteratively repeat that procedure 30 times in order to obtain a prediction of 31 samples in total for each trial. B Same as in A but with 3% additive white noise

A Wilcoxon rank sum test is applied to underpin the better performance of the MCDTS approaches in comparison with the classical time delay methods. To this end, we define a threshold \(\zeta =0.1\) and compute the prediction times for which \(e_{\text {rms}}(T)\) first exceeds \(\zeta \), for all trials and for all considered reconstruction methods. These distributions of prediction times for each method are used for the statistical test with the null hypothesis that two considered distributions have equal medians. The tests complement the visual analysis of Figs. 5 and 8. For the noise-free case, PECUZAL and all considered MCDTS-based approaches, except the ones combined with the FNN-statistic (MCDTS-FNN), show a significantly better forecast performance (\(\alpha =0.01\)) than the classic time delay embedding methods. In the case of the noise-corrupted time series, PECUZAL (m), all MCDTS-MSE approaches, and MCDTS-C-L (m) achieve a significantly better prediction performance than the classical time delay methods.

Some remarks: Together with PECUZAL (m) and MCDTS-R-MSE (m), MCDTS-C-L (m) achieves the overall best results (Fig. 8). The choice of the threshold \(\zeta \) is obviously subjective, but a range of thresholds gave similar results, and the “grouping” of the results according to the different techniques is clearly discernible already when looking at the mean (Figs. 5, 8). We have to mention that we could not achieve results as shown here for continuous systems like the Lorenz-63 or the Rössler model. In those cases, the difference in the prediction accuracy was not as clear as in the Hénon example and not significant, for both noise-free and noise-corrupted time series. We also investigated the influence of the time series length of the training sets, but the results did not change much. All reconstruction methods gave similar prediction results. We could, however, observe that simple and incomplete embeddings, i.e., a too low embedding dimension, often—but not always—led to similarly good prediction results when compared to “full” embeddings. This was true for the continuous examples (not shown in this work), but it also holds for the Hénon example shown here, where the MCDTS-C-L approach does not yield the best results in the univariate case, although it targets the global minimum of the L-objective function, which we consider to be a suitable cost function for a good/full embedding. These observations are in line with the findings of Garland and Bradley [74], and the fact that our reconstruction methods tend to suggest higher dimensional embeddings with smaller delays in the presence of noise supports the findings of Small and Tse [43]. The FNN-statistic does not seem to be useful in the prediction application shown here, since all approaches which make use of it (including classic TDE) perform clearly worse compared to the other methods used.

3.3 Improved short-time predictions for CENOGRID

To demonstrate that the prediction procedure from the preceding section works for real, noisy data, we apply it to the recently published CENOzoic Global Reference benthic foraminifer carbon and oxygen Isotope Dataset (CENOGRID) [92]. The temperature-dependent fractionation of carbon and oxygen isotopes in benthic foraminifera is an important means to reconstruct past global temperatures and environmental conditions. Moreover, the Cenozoic is interesting because it provides an analogue of a future greenhouse climate and of how and which regime shifts in large-scale atmospheric and ocean circulation can be expected in a future warming climate. Predicting these data may be unrealistic and not motivated by an actual research question; however, this task shall serve as a proof of concept. The non-stationarity and noise level of CENOGRID make prediction particularly difficult.

The dataset consists of a detrended \(\delta ^{18}\)O and a detrended \(\delta ^{13}\)C isotope record with a total length of \(N=13,421\) samples and a sampling period of \(\varDelta t= 5000 \text {yrs}\) (Fig. 9 in Appendix B). Here, we make predictions on the \(\delta ^{13}\)C isotope record. The first 13,311 samples are used as a training set, from which state space reconstructions are obtained. The remaining 110 samples of the \(\delta ^{13}\)C record act as the test set. For 100 different starting points in the test set, we make 10-step-ahead predictions for each reconstruction method by using the embedding parameters gained from the training set and the iterative zeroth-order approximation prediction procedure described in Sect. 3.2. This way, we simulate different initial conditions for the prediction and obtain a distribution of forecasts for each reconstruction method. We again use a Wilcoxon rank sum test on these distributions in order to see whether predictions based on some reconstruction method are significantly better than the predictions obtained from classic TDE (Cao, Kennel et al., Hegger & Kantz). Only one of the applied reconstruction methods (listed in Table 1), MCDTS-R-MSE (m), scores significantly better predictions (highly significant for prediction horizons up to 4\(\varDelta t\) and significant for prediction horizons up to 5\(\varDelta t\)). Figure 6A shows the mean normalized root mean square prediction error gained from the 100 predictions for the classic TDE and the mentioned MCDTS-R-MSE (m). The distribution of all prediction trials for the best performing classic TDE method (Hegger & Kantz) and for MCDTS-R-MSE (m) is shown in panels B and C. Even though the multivariate approach MCDTS-R-MSE (m) could have used both the \(\delta ^{18}\)O and the \(\delta ^{13}\)C time series for the reconstruction, it only uses \(\delta ^{13}\)C lagged by 1 and 2 samples in a three-dimensional reconstruction. The classic TDE methods and all other reconstruction methods (listed in Table 1, not shown in Fig. 9) revealed higher dimensional embeddings (Table 2). Yet, all these higher dimensional reconstructions give poor prediction results, except for MCDTS-C-MSE-KL (m), which gives significantly better predictions (\(\alpha =0.05\)) than the classic TDE methods at least for the one-step-ahead prediction.

Fig. 6

A Mean normalized root mean square prediction error for four selected reconstruction methods on the \(\delta ^{13}\)C CENOGRID record. B Prediction error for all 100 trials for the classic TDE method of [67] (yellow line in panel A). C Prediction error for all 100 trials for the MCDTS-R-MSE (m) method (purple line in panel A). The forecasts based on this method are significantly better than for all three classic TDE methods (up to 4 prediction time steps under a significance level \(\alpha =0.01\) and up to 5 prediction time steps under a significance level \(\alpha =0.05\))

3.4 Estimating causal relationship of observables of a thermoacoustic system

As a final proof of concept, we utilize state space reconstruction for detecting causality between observables X and Y in a turbulent combustion flow in a gas turbine. It is possible to infer a causal relationship between two (or more) time series x(t) and y(t) via convergent cross mapping (CCM) [18, 19, 93], which—in contrast to Granger causality [94]—also works for time series stemming from nonseparable systems, i.e., deterministic dynamical systems. The CCM method “tests for causation by measuring the extent to which the historical record of Y values can reliably estimate states of X. This happens only if X is causally influencing Y.” [18] This also incorporates the embedding theorems [1,2,3] in the sense that a state space reconstruction based on x(t) is diffeomorphic to a reconstruction of y(t), if x(t) and y(t) describe the same dynamical system and the embedding parameters have been chosen correctly. To check for a causal relationship from \(X \rightarrow Y\), a state space reconstruction of y(t) yields a trajectory \(\mathbf {v}_y(t) \in {\mathbb {R}}^{m}\), with m denoting the embedding dimension, which is then used for estimating values of x(t), namely \({\hat{x}}(t)\). It is said that \(\mathbf {v}_y(t)\) cross-maps x(t), in order to get estimates \({\hat{x}}(t)\). Technically, this is done by first searching for the \(m+1\) nearest neighbors of a point corresponding to a time index \(t' \in t\), i.e., by finding the \(m+1\) time indices \(t'_{NN_i}, i=1,\dots ,m+1\) of the nearest neighbors of \(\mathbf {v}_y(t')\). Further, these time indices \(t'_{NN_i}\) are used to “identify points (neighbors) in X (a putative neighborhood) to estimate \(x(t')\) from a locally weighted mean of the \(m+1\) \(x(t'_{NN_i})\) values” [18]:

$$\begin{aligned} {\hat{x}}(t') = \sum w_i x(t'_{NN_i}), \qquad i=1,\dots ,m+1, \end{aligned}$$
(6)

with the weighting \(w_i\) based on the nearest neighbor distances to \(\mathbf {v}_y(t')\):

$$\begin{aligned} w_i&= u_i / \sum u_j, \qquad j=1,\dots ,m+1 \end{aligned}$$
(7)
$$\begin{aligned} u_i&= \exp {\left[ -\Vert \mathbf {v}_y(t') - \mathbf {v}_y(t'_{NN_i}) \Vert ~/~ \Vert \mathbf {v}_y(t') - \mathbf {v}_y(t'_{NN_1}) \Vert \right] } \end{aligned}$$
(8)

with \(\Vert \cdot \Vert \) a norm (we used Euclidean distances). Finally, the agreement of the cross-mapped estimates \({\hat{x}}(t')\) with the true values \(x(t')\) is quantified for all considered \(t' \in t\), e.g., by computing a linear Pearson correlation \(\rho _\text {CCM}\), which has been done in this study. The key point is that the estimation skill, here represented by \(\rho _\text {CCM}\), increases with the amount of data used, if X indeed causally influences Y. This is because the attractor—represented by the reconstruction vectors \(\mathbf {v}_y(t)\)—gets resolved better with increasing time series length, resulting in closer nearest neighbors and therefore a better concordance of \({\hat{x}}(t)\) and x(t), i.e., an increase in \(\rho _\text {CCM}\) with increasing time series length. This convergence of the estimation skill based on cross-mapping, not merely a high value of \(\rho _\text {CCM}\) itself, is the necessary condition for causation (Fig. 7A). Although the embedding process is key to a successful application of CCM to data, its influence has not been discussed by Sugihara et al. [18]. However, Schiecke et al. [95] briefly discussed the impact of the embedding parameters on CCM, and we hypothesize that the embedding method can play a crucial role when analyzing real-world data. Therefore, we utilize the MCDTS framework in the following way. As a delay pre-selection method \(\varLambda _{\tau }\), we use the reliable continuity statistic \(\langle \varepsilon ^\star \rangle (\tau )\) [65, 78]. As a suitable objective function \(\varGamma \), we use the negative of the corresponding \(\rho _\text {CCM}\), i.e., MCDTS optimizes the embedding with respect to maximizing \(\rho _\text {CCM}\) of two given time series. According to our abbreviation scheme given in Table 1, we will refer to this approach as MCDTS-C-CCM.
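The cross-map estimate of Eqs. (6)–(8) can be sketched as follows (Python/NumPy with a brute-force neighbor search; the function name is ours). Here, v_y is the reconstruction built from y(t), aligned sample-by-sample with x(t).

```python
import numpy as np

def ccm_correlation(v_y, x):
    """Cross-map x from the reconstruction v_y (cf. Eqs. 6-8) and return the
    Pearson correlation rho_CCM between the estimates x_hat and the true x."""
    v_y, x = np.asarray(v_y, dtype=float), np.asarray(x, dtype=float)
    n, m = v_y.shape
    x_hat = np.empty(n)
    for t in range(n):
        d = np.linalg.norm(v_y - v_y[t], axis=1)
        d[t] = np.inf                          # exclude the query point itself
        nn = np.argsort(d)[:m + 1]             # m+1 nearest neighbors
        u = np.exp(-d[nn] / d[nn[0]])          # Eq. (8)
        w = u / u.sum()                        # Eq. (7)
        x_hat[t] = np.sum(w * x[nn])           # Eq. (6)
    return np.corrcoef(x_hat, x)[0, 1]

# Convergence check: rho_CCM should grow with the considered time series length,
# e.g., rhos = [ccm_correlation(v_y[:L], x[:L]) for L in range(500, len(x), 500)]
```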

Fig. 7

A Linear correlation coefficient of convergent cross mapping (CCM) heat release \(\rightarrow \) pressure as a function of the considered time series length for Cao's embedding method (gray) and the proposed MCDTS embedding (blue), shown exemplarily for one of 50 sub-samples of length \(N=5000\) drawn from the entire time series (Fig. 10, cf. Table 1 for abbreviations). While the dashed black lines show the linear trend of both CCM correlations, the dashed red line shows the Pearson linear correlation between the heat release and the pressure time series, indicating no influence. We infer convergence of the cross mapping, and thus a true causal relationship, if there is a positive trend in the CCM correlation over increasing time series length (slope of the dashed black lines) and if the last point of the CCM correlation (i.e., longest considered time series length) exceeds a value of 0.2 (in the shown case, Cao's method does not detect a causal influence of the heat release on the pressure). We test this on all 50 sub-samples for both causal directions. B Correctly classified causal relationships as a fraction of all sub-samples based on the embedding of each time series using Cao's method and our proposed MCDTS method

We apply the CCM method to time series data that span the different dynamical regimes of a thermoacoustic system. Here, we investigate the mutual causal influence of two recorded variables of the thermoacoustic system, namely the pressure and the heat release rate fluctuations (Fig. 10). The original experiments were performed on a turbulent combustor with a rectangular combustion chamber (length 700 mm, cross-section 90 mm \(\times \) 90 mm, Fig. 11). In such a combustion experiment, a fixed vane swirler is used to stabilize the flame, and a central shaft that supports the swirler injects the fuel through four radial injection holes. The fuel used is liquefied petroleum gas (60% butane and 40% propane). The airflow enters through the inlet to the combustion chamber. The partially premixed reactant mixture is ignited using a spark plug. Once the flame was established in the combustor, we continuously varied the control parameter (mass flow rate of air, which, in turn, varies the Reynolds number (see footnote 2) and the equivalence ratio (see footnote 3)) to observe the dynamical transitions in the system. Acoustic pressure fluctuations were measured using a piezoelectric transducer (PCB103B02) and the heat release rate using a photomultiplier tube (Hamamatsu H10722-01) at a sampling rate of 4 kHz.

The interactions between the turbulent flow, the unsteady fluctuations of the flame, and the acoustic field of the chamber lead to different dynamical states. As the airflow rate increases, the system transitions from a state of stable operation (which comprises high dimensional chaos of low amplitude [96]) to intermittency, a state that comprises bursts of periodic oscillations amid epochs of aperiodicity [97], and then to a limit cycle [98]. The self-sustained limit cycle oscillations represent a state of oscillatory instability, known as thermoacoustic instability [99]. When the flow rate of air is further increased, the flame loses its stability inside the combustor and blows out. The pressure and heat release rate data capture the transition through all these dynamical states in sequence. In the many different dynamical regimes recorded in the time series, we expect the strength of the causal interaction between the heat release and the pressure to vary. In all regimes, however, we expect a mutual causal interaction between heat release and pressure. Moreover, since a possible asymmetric bi-directional coupling between heat release and pressure has been discovered in a stationary setup of a very similar experiment [100], we would also expect that the heat release rate has a slightly stronger causal influence on pressure than vice versa.

In short, the goal here is twofold:

  1.

    Prove the expected mutual causal relationship between heat release rate and pressure, as well as

  2.

    the hypothesized asymmetry in its strength, by applying MCDTS-C-CCM to a range of time series sampled from the entire record (Fig. 10).

We compare the results to those obtained from using the CCM method with the classical embedding approach of Cao [66]. Specifically, we set up the following workflow for this analysis:

  1.

    50 time indices \(t' \in t\) are drawn randomly, where t covers the entire record.

  2.

    For each of these indices \(t'\), time series of length \(N=5000\) for pressure and heat release are obtained and standardized to zero mean and unit variance (Fig. 10).

  3.

    Both time series samples (of full length \(N=5000\)) are embedded using Cao's method as a classical reference and using our proposed framework MCDTS-C-CCM with 100 trials (Table 1). Based on the obtained reconstructions, \(\rho _\text {CCM-Cao}\) and \(\rho _\text {CCM-MCDTS}\) are computed for both directions as a function of increasing time series length, as exemplarily shown in Fig. 7A.

  4.

    To check for convergence in the CCM sense, we fit a linear model to \(\rho _\text {CCM}\) (dashed black lines in Fig. 7A), and whenever that model gives a positive slope and the last value of \(\rho _\text {CCM}\) (i.e., for the longest considered time series of length \(N=5000\)) exceeds a value of 0.2, we infer a true causal relationship.

  5.

    When we can detect a causal relation simultaneously in both directions, we compute the average of the pointwise difference \(\rho _{\text {CCM}~\mathrm{heat} \rightarrow \mathrm{pressure}} - \rho _{\text {CCM}~\mathrm{pressure} \rightarrow \mathrm{heat}}\).

The minimum considered value of 0.2 for \(\rho _\text {CCM}\) is an arbitrary and subjective choice, and we could have made other choices. But since this procedure is applied to \(\rho _\text {CCM-Cao}\) and \(\rho _\text {CCM-MCDTS}\) at the same time, we think this is reasonable, and it prevents samples from being counted as “true causal” when \(\rho _\text {CCM}\) is near zero but shows a positive linear trend. Results change only slightly when varying this value within the interval [0.2, 0.3]. Figure 7B summarizes the results obtained for both considered embedding methods. Shown are the classification results for correctly deducing a causal influence of pressure on heat release (left panel) and of heat release on pressure (middle panel) based on our definition (item 4 in the list above). Thus, in this first step, we do not measure the strength of the causal relationship, but rather test whether such a relationship actually exists. While MCDTS-C-CCM maintains a correct classification in 92% of all cases considered (50 samples) for pressure \(\rightarrow \) heat release and 94% for heat release \(\rightarrow \) pressure, Cao's method is only able to correctly classify 44% and 74%, respectively. These results themselves already demonstrate a clear advantage of our proposed method, but recall that we expect a causal relationship between heat release and pressure simultaneously for each sample. The right panel of Fig. 7B reveals that in 88% of all cases considered, MCDTS-C-CCM is able to detect a mutual causal relationship, while Cao's method managed to do so in only 28% of the cases.
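The convergence criterion of step 4 (a positive linear trend of \(\rho _\text {CCM}\) over the considered time series lengths plus a final value above 0.2) can be sketched as follows (Python/NumPy; the function name and the default threshold are illustrative).

```python
import numpy as np

def ccm_converged(lib_lengths, rhos, rho_min=0.2):
    """Infer a causal link if rho_CCM shows a positive linear trend over the
    considered time series lengths and its final value exceeds rho_min."""
    slope = np.polyfit(lib_lengths, rhos, 1)[0]   # linear model (dashed lines in Fig. 7A)
    return (slope > 0) and (rhos[-1] > rho_min)
```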

Furthermore, we try to validate a hypothesis made by Godavarthi et al. [100] that heat release has a stronger effect on pressure than vice versa for most of the considered dynamics. The problem of measuring the strength of a causal relationship is twofold: First, the experiment considered here exhibits a number of different dynamics due to the continuously changing control parameter, whereas the hypothesis of an asymmetry in the strength of the interaction was made for stationary cases and the four dynamical regimes the authors investigated. Second, in describing the CCM method, Sugihara et al. [18] merely stated that in the case of a stronger causal effect of X on Y, cross-mapping X with \(\mathbf {v}_y\) converges faster than the other way around. Thus, we would have to define what “faster” means with respect to our experimental curves like the ones shown in Fig. 7A. That would mean introducing additional parameters on which the results would depend too strongly. Here, we pursue a simpler idea in order to detect the strength of a causal interaction. For samples where a causal relation in both directions has been detected, we compute the average of the pointwise difference of the CCM correlation coefficients, i.e., \(\varDelta \rho _{\text {CCM}} = \rho _{\text {CCM}~\mathrm{heat}\rightarrow \mathrm{pressure}} - \rho _{\text {CCM}~\mathrm{pressure}\rightarrow \mathrm{heat}}\). When this difference is positive, we claim that heat release affects pressure more strongly in a causal sense than vice versa. Our analysis reveals that the proposed method is able to reflect the hypothesized stronger causal effect of the heat release on the pressure data. Figure 12 shows that for 29 of the 50 samples (\(\sim 58\%\)) \(\varDelta \rho _{\text {CCM}}\) is indeed positive. Using the Cao method, we were able to derive such a result in only \(\sim 26\%\) of all samples. In this case, however, only \(\sim 28\%\) of the samples were found to be mutually causally related at all (cf. Fig. 7B). Within the group of mutually causally related samples, the assumed asymmetry is reflected very well (13 of 14 mutually causally related samples had a positive \(\varDelta \rho _{\text {CCM}}\)).

The proposed MCDTS reconstruction approach shows a clear advantage when used together with the CCM method. Not only is the general classification ability remarkable, but the MCDTS reconstructions also allow the verification of an assumed asymmetric causal interaction, which would have remained limited with the classical time delay method.

4 Conclusions

A novel perspective on the embedding process has been proposed, in which the state space reconstruction from time series can be treated as a game, in which each move corresponds to an embedding cycle and is subject to an evaluation through an objective function. It is possible to model different embeddings, i.e., different choices of delay values and time series (if multivariate data are at hand) in the embedding cycles, in a tree-like structure. Consequently, our approach randomly samples this tree, in order to ensure that a global minimum of the chosen objective function is found. We leave it to practitioners which state space evaluation statistic, i.e., objective function, they use, since different research questions require different reconstruction approaches. There is also a free choice of the delay pre-selection method for each embedding cycle, e.g., using the minima of the auto-mutual information statistic. We recommend the combination of the continuity statistic of Pecora et al. [65] as a delay pre-selection method with the L-statistic of Uzal et al. [29] as an objective function as a very good “all-rounder” for many research questions in nonlinear time series analysis, as already shown by Kraemer et al. [41] (PECUZAL algorithm). Since the sampling of the tree is a random procedure, the proposed idea only yields converging embedding parameters for a sufficient sampling size \(N_{\text {trials}}\). In our numerical investigations, \(N_{\text {trials}}=50\) usually led to satisfactory results for univariate cases and \(N_{\text {trials}}=80\) for multivariate embedding scenarios. Our proposed method initializes in a local minimum of the objective function, which is achieved by minimizing the objective function in each embedding cycle to the maximum extent. So in practice, even setting \(N_{\text {trials}}\) too low would lead to similar—but never worse—results compared to the state-of-the-art methods. Moreover, the proposed method is not limited to delay pre-selection and objective functions that take into account certain physical constraints. It could also optimize the reconstruction vectors of the state space for research questions such as classification, where we could speak of a feature or latent space instead of the state or phase space notion associated with statistical physics. We exemplified the use of such a modular algorithm by combining different objective and delay pre-selection functions. Its superiority to classical time delay embedding methods has been demonstrated for a recurrence analysis of the Lorenz-96 system, a prediction of the x-time series of the chaotic Hénon map and of the \(\delta ^{13}\)C CENOGRID record, as well as for studying causal interactions between variables in a combustion process.

With these applications, we showed the advantage MCDTS brings to any kind of method that utilizes an embedding, such as recurrence analysis, embedding-based predictions of time series, or causal analysis with convergent cross-mapping. It thus has potential in many applications and disciplines, wherever such phase space-based approaches are used and an automatic phase space reconstruction is required. The latter is of increasing interest, e.g., for big data analysis, analyses with high reliability requirements (e.g., in medical applications), and also for deep learning-based frameworks.