1 Introduction

The classification of patterns in space is of high importance for many research questions in environmental sciences and engineering. Applications include, but are not limited to: rainfall data, seismic activity, animal movements and air traffic. Spatial classification is usually a demanding task, since it is computationally complex and a huge amount of data has to be collected and processed (Long and Nelson 2013). The problems we have in mind are high dimensional. For example, in our initial example we consider hourly radar rainfall data: at each point in time we have a grid of data collecting the amount of rainfall at several points in space. A rainfall event or rainfall cell on these images can be clearly identified visually, since it has intensities greater than zero. Based on these data, we want to classify how the rain cell, or rather the ‘center’ of the cell, moves in space. Movement of rainfall cells can, e.g., be caused by wind or orography, and knowing the movement of the center provides information on the general movement and shift of the rainfall cell as well as on where the main impact is recorded. Such information can, for example, be used for flood forecasting.

Usually questions of this kind and similar classification demands for spatial data are tackled nowadays using methods from AI (artificial intelligence) or ML (machine learning). For example, machine-learning approaches are used to classify the type of rainfall (Ghada et al. 2022), the covered area of rainfall (Meyer et al. 2016), the variability of rainfall patterns (Ibebuchi and Abu 2023) or animal movement data (Wang 2019).

However, classification via AI, especially deep learning such as deep neural networks, has several drawbacks. First of all, a large amount of training data is needed, and sometimes this is simply not available or requires huge manual effort. Secondly, if a structural break in the data occurs, it is not possible to adjust a model (since there is no model). Structural breaks in rainfall data may occur, for example, when the recording instruments are changed or updated, when the basis of the calibration of the radar data is changed, when the timing of records is altered or, more generally, if there is a change in climate. Structural breaks in animal movement data may be caused by anthropogenic impacts, such as the construction of roads. The outcome of ML approaches trained with data recorded before the break might therefore not reflect the change, or the pre-break data might no longer be relevant for training. Thirdly, some concepts such as deep neural networks are often seen as a kind of black box (Liang et al. 2021), as their many layers are hard to interpret. Recently, many studies have focused on how to increase this interpretability, e.g. by adding additional information or so-called explainers (e.g. Buhrmester et al. 2021; Liang et al. 2021; Ferreira et al. 2022). A pre-selection of features or a classification of the input data might increase the interpretability.

In the present article we apply so-called ordinal patterns to classify high-dimensional motion patterns. Ordinal patterns classify a vector or a time series by the relative order of its values. They have the advantage of being robust against measurement errors and against several kinds of structural breaks (such as shifts in the data). Furthermore, the algorithms are very fast and the concept is intuitive. Even for data sets with fewer than 500 data points, one gets reliable results.

Our approach therefore might be a possibility to further contribute to the ongoing process of unboxing the black box of some ML approaches by means of additional information and pre-selection of features (see above). Before applying an ML algorithm to the data, one at least fixes ‘direction’ or ‘motion’ as the feature of interest. ‘The research area of feature extraction or feature learning is concerned with finding efficient mappings from raw data to (low-dimensional) representations that capture information appropriate for the learning task.’ (Mohr 2022, page v). In the present article we describe a method which allows one to reduce the data significantly while keeping the relevant information in a context of two-dimensional motion. Let us emphasize that a pre-processing of this kind not only makes the procedure more transparent, but also more efficient (Mumuni and Mumuni 2022).

Different versions of multivariate ordinal patterns have already been used in order to measure the entropy of certain systems (Mohr et al. 2020a) or data sets and in the context of symmetry approximation (Finke et al. 2020).

Here, we use classes of multivariate ordinal patterns in order to analyze the motion of weather phenomena and animals. Analyzing movements of this kind, as well as those of humans or mechanical objects, helps to understand the impact of these phenomena on the study region. For example, the movement of rainfall cells can help to identify the most critical rainfall events for a region (namely those that hit the study region for the longest time), and animal movement can provide information on the probability that an animal appears in a village. The classification can also be used as the input for further analyses, such as weather or flood forecasts. Currently, easily applicable methods that are able to classify even small data sets (as is frequently the case in environmental research) are often missing. The method presented here closes this gap. We demonstrate that the classification by ordinal patterns indeed delivers meaningful results with respect to the dependence on the spatial resolution and typical rainfall movements in Europe.

Our method can even be generalized to higher dimensions. We provide the mathematical background for such a generalization in Sect. 3. In three dimensions one could think of flying animals or fish in the ocean.

The paper is structured as follows: in the subsequent section we briefly describe the one-dimensional case and then the generalization to multivariate patterns. Section 3 contains the limit theorems for our setting together with a small simulation study using the bootstrap. Section 4 finally deals with the data sets for which our methods are designed.

2 From univariate ordinal patterns to multivariate ones

The reader might wonder why we start with the univariate theory. This is due to the following fact: in the first step, our multivariate patterns are vectors of univariate patterns. This means that we use the classical definitions separately in each component. In the second step, we reduce the massive number of patterns by considering classes of multivariate patterns.

2.1 Univariate patterns

The origin of the whole theory lies in the one-dimensional framework. This is natural, since the order structure of the data plays an important role. The up-and-down behavior of a time series is nicely captured via ordinal patterns.

Before tackling the multivariate setting, we describe the univariate one. Let \((x_j)_{j=1,...,N}\) be a data set with values in \({\mathbb {R}}\), or let \((X_j)_{j\in {\mathbb {Z}}}\) be a univariate time series, that is, a sequence of real-valued random variables \(X_j\) defined on a common probability space.

Let us consider a vector \(x=(x^1,...,x^d)\in {\mathbb {R}}^d\) which is either a part of our data set within a moving window of length d or a realisation of d consecutive random variables of our time series. We assign to each entry its rank; that is, the vector (1.2, 7, 2) is mapped to (1, 3, 2), since 1.2 is the lowest value (rank 1) and 7 is the highest value (rank 3). The function which assigns to the vector x its pattern is called \(\Pi\). Mathematically speaking, \(\Pi\) maps d-dimensional vectors into the space \(S_d\) of permutations. This space consists of the vectors which contain each of the values \(\{1,...,d\}\) exactly once.
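For concreteness, the map \(\Pi\) can be implemented in a few lines. The following Python sketch (the function name is ours) reproduces the example above:

```python
def ordinal_pattern(window):
    """The map Pi: assign to each entry of a window of pairwise
    distinct values its rank, e.g. (1.2, 7, 2) -> (1, 3, 2)."""
    order = sorted(range(len(window)), key=window.__getitem__)
    ranks = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return tuple(ranks)

print(ordinal_pattern((1.2, 7, 2)))   # (1, 3, 2)
```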

So far, we have only considered the case without ties, that is, all values of the vector are different. As soon as the vector x contains a certain value twice (or more often), we call this a tie. In most papers on ordinal patterns it is (sometimes tacitly) assumed that ties are excluded or at least very rare. When dealing, e.g., with finance data having 4 or more digits, that is a reasonable assumption. However, in climate data one might face data sets with several ties. If ties occur, there are several ways to overcome this problem in the classical setting. Either the definition is slightly modified in a way that still yields a unique pattern; in this case one usually assigns the lower rank to the lower index within the vector, that is, one maps (1, 1, 1) to (1, 2, 3), which is at least counter-intuitive and overestimates upward movements. Alternatively, one excludes the values of the respective window, or one randomizes, that is, one adds a small noise to the sequence which breaks the ties. No matter which procedure is used, information is lost. Hence, as soon as several ties occur, it is useful to follow the approach in Schnurr and Fischer (2022) and consider patterns with ties. In this article, we explicitly allow for ties and assign a larger number of patterns to the vectors under consideration.

Consider again a vector of d consecutive values \((x^1,...,x^d)\) and write the attained values as \((y^1,...,y^m)\) with \(y^1<y^2<...<y^m\). Let us emphasize that \(m\in {\mathbb {N}}\) is the number of distinct values attained in \(x^1\),..., \(x^d\). Now define the generalized pattern of \((x^1,...,x^d)\) to be the vector \(\textbf{t}=(t^1,...,t^d) \in {\mathbb {N}}^d\) such that

$$\begin{aligned} t^j=k \text { if and only if } x^j=y^k. \end{aligned}$$

We write \(\Psi\) for the function which assigns to each vector its generalized pattern, in contrast to \(\Pi\) from above. The set of all generalized patterns of order d is denoted by \(T_d\) (in contrast to the classical permutations \(S_d\)). The patterns of length \(d=3\) are given in Fig. 1.

Fig. 1 The 13 generalized patterns of length \(d=3\)
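The map \(\Psi\) follows directly from the definition; a minimal Python sketch (again, the function name is ours):

```python
def generalized_pattern(window):
    """The map Psi: ties share a rank, so t^j = k iff x^j = y^k,
    e.g. (2, 5, 2) -> (1, 2, 1) and (3, 3, 3) -> (1, 1, 1)."""
    levels = sorted(set(window))                  # y^1 < ... < y^m
    rank = {y: k for k, y in enumerate(levels, start=1)}
    return tuple(rank[x] for x in window)

print(generalized_pattern((2, 5, 2)))   # (1, 2, 1)
print(generalized_pattern((3, 3, 3)))   # (1, 1, 1)
```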

A drawback of these generalized patterns is that their number grows even faster than in the classical case (Table 1).

Table 1 Number of classical and generalized patterns depending on the window length d

Let us give an intuitive example for the differences between the classical and the generalized patterns as well as between \(\Pi\) and \(\Psi\): imagine a race where the time which each runner needed to reach the finish line is sampled in a vector x. If these times are all different from each other, one can use \(\Pi\) in order to assign a rank to each runner: the faster the runner, the lower the rank. If two or more runners finish in exactly the same time, \(\Pi\) either randomizes or just assigns the better rank to the runner with the smaller starting number; draws are forbidden. In contrast to this, \(\Psi\) allows for draws, so results like (1, 2, 1) are possible.

A comment on the length of the patterns is in order: experts in the field have emphasized that short pattern lengths like \(d=2\) or \(d=3\) should be used in the context of data analysis (cf. Bandt 2019, and the references given therein). This is due to the fact that each pattern probability is an additional parameter which has to be taken into account and which has to be estimated by statistical methods. In addition, the shorter patterns already contain a lot of information on the longer patterns. As an example, consider the case where one knows the patterns of \((x^1,x^2,x^3)\) and \((x^2,x^3,x^4)\). Then only very few patterns of length 4 are possible for \((x^1,x^2,x^3, x^4)\); in some cases there is only a single possible pattern (see the sketch below).
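This restriction can be made concrete by brute force. The following sketch (no-ties case; all names are ours) enumerates the length-4 patterns compatible with two given overlapping length-3 patterns:

```python
from itertools import permutations

def pattern(v):
    """Rank vector of v (no ties), e.g. (2, 3, 4) -> (1, 2, 3)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    ranks = [0] * len(v)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return tuple(ranks)

def compatible_length4(p1, p2):
    """All length-4 patterns whose windows (x1,x2,x3) and (x2,x3,x4)
    exhibit the given length-3 patterns p1 and p2."""
    return [s for s in permutations((1, 2, 3, 4))
            if pattern(s[:3]) == p1 and pattern(s[1:]) == p2]

print(compatible_length4((1, 2, 3), (1, 2, 3)))  # unique: [(1, 2, 3, 4)]
print(compatible_length4((1, 2, 3), (2, 3, 1)))  # two candidates remain
```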

Later, we will use patterns only up to order 3, which in both cases still yields a tractable number of patterns. The number of classical patterns is just d!, while the number of generalized patterns is given by the so-called ordered Bell numbers. They can be found, e.g., in the Online Encyclopedia of Integer Sequences (oeis.com), along with several other applications in which the ordered Bell numbers appear. To our knowledge, Unakafova and Keller (2013) were the first to calculate these numbers in the context of ordinal patterns.
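The ordered Bell numbers satisfy the recurrence \(a(d)=\sum _{i=1}^{d}\binom{d}{i}\,a(d-i)\) with \(a(0)=1\), obtained by choosing which i positions share the lowest rank. A short sketch comparing both counts:

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def ordered_bell(d):
    """Number of generalized patterns of length d (ordered Bell
    numbers, OEIS A000670): 1, 3, 13, 75, 541, ... for d = 1,...,5."""
    if d == 0:
        return 1
    # split off the i positions sharing the lowest rank, then
    # arrange the remaining d - i positions recursively
    return sum(comb(d, i) * ordered_bell(d - i) for i in range(1, d + 1))

for d in range(2, 6):
    print(d, factorial(d), ordered_bell(d))   # classical vs generalized
```

For \(d=3\) this returns 6 classical and 13 generalized patterns, in line with Fig. 1.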

Analyzing one-dimensional ordinal patterns within a data set, one can describe periodicities, times of monotonicity etc. Furthermore, it is possible to test for serial dependence (Weiß 2022; Weiß and Schnurr 2020), derive the entropy of the underlying system (Piek et al. 2019; Bandt and Pompe 2002) and to calculate the Hurst parameter in the case of long-range dependence (Sinn and Keller 2011).

2.2 Multivariate extensions

Let us now come to the multivariate setting. First we describe what would happen if we tried to generalize the univariate framework into more than one dimension in a naive way. Afterwards we reduce the multivariate complexity in order to derive a tractable and computationally efficient procedure.

As we have pointed out in the introduction, we are considering a 2-dimensional grid with data in \({\mathbb {R}}\) for every time point. That is, for every time step \(j\in {\mathbb {N}}\) we have a \(k\times \ell\)-matrix with values in the real numbers. We consider two standard settings: either we track a movement on the points of the grid, i.e., the values that can be attained are in \(\{1,...,k\}\times \{1,...,\ell \}\), or we track points in \([1,k]\times [1,\ell ]\). An example for the first setting is the movement of the maximum; in this case we track the argmax of the data. The second setting arises, e.g., if we calculate the center of mass (balance point), which is usually a point not contained in the original grid. While we will usually not obtain ties in the second setting, this is indeed possible in the first one, in particular if the grid is coarse.
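As an illustration of the first setting, a minimal sketch that derives the 2-D series from a stack of \(k\times \ell\) matrices by tracking the argmax (the function name and the first-occurrence tie handling are our simplifications):

```python
import numpy as np

def track_argmax(frames):
    """Derive the 2-D time series of positions from a stack of
    k x l matrices of shape (T, k, l): the grid position of the
    maximum at each time step.  np.argmax breaks ties by taking the
    first occurrence; in practice one may instead want to keep all
    maximisers and treat coinciding positions as ties."""
    positions = []
    for frame in frames:
        i, j = np.unravel_index(np.argmax(frame), frame.shape)
        positions.append((i + 1, j + 1))   # indices in {1,...,k} x {1,...,l}
    return np.array(positions)
```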

In principle both standard settings can be tackled in the same way, one only has to keep in mind that ties can be a problem (or carry valuable information) in the first setting.

The time series we consider are now \((\varvec{Y}_j)_{j\in {\mathbb {Z}}}\) having values either in \(\{1,...,k\}\times \{1,...,\ell \} \subseteq {\mathbb {Z}} \times {\mathbb {Z}}\) or in \([1,k] \times [1,\ell ] \subseteq {\mathbb {R}} \times {\mathbb {R}}\). Our aim is to describe the movement within this derived time series in terms of ordinal patterns.

There have been various attempts in the literature on how to deal with multivariate extensions of ordinal patterns. We give a short overview at the end of this section.

It is possible to consider 3 or more dimensions. However, the number of parameters in the background grows very quickly and it becomes more involved to find classes of patterns sharing interesting properties. Hence, we suggest using the procedure described here for two-dimensional data.

Let us consider here explicitly the case of two dimensions. Probably the most natural approach is to consider the vector component-wise and to associate a pattern with each dimension as in the one-dimensional case (cf. Mohr et al. 2020b, Section 4.1). The two-dimensional pattern is then a vector of patterns, or this might be encoded again by a different symbol (which is not canonical, since there is no natural ordering of this two-dimensional structure). A problem which arises directly—even in dimension 2—is the rapidly growing number of patterns under consideration.

Patterns of length 2 could be considered, but usually they do not carry sufficient information. If we consider patterns of length \(d\ge 3\), we have—even in the case without ties—to deal with 36, 576 or 14,400 patterns (for \(d=3,4,5\)). Higher values are usually not considered in the context of statistics; they are, however, considered in the context of dynamical systems, where it is even assumed that d tends to infinity (cf. e.g. Keller and Sinn 2010).

We will (at first) use patterns of order \(d=3\). In all that follows we try to keep the number of objects we consider, and hence the number of parameters, sufficiently low.

Considering patterns of order three in dimension two, we obtain 36 different patterns. In Fig. 2, nine of them are depicted, namely those which start with a movement towards north-east. This is equivalent to both one-dimensional component patterns beginning with an upward movement.

Fig. 2 The nine bivariate patterns moving to north-east first. The two one-dimensional patterns of the components are depicted in the upper left corner of each cell; the left one always describes the motion in left-right direction, the right one the up-down movement

The first idea is to reduce the number of patterns according to the questions we have in mind and which we pointed out in the introduction. Hence, we use the patterns in order to classify whether we are facing a right turn (R), a left turn (L), monotonicity (M) or inversion (I). The reader might adapt this procedure by building his/her own classes of patterns, depending on a different question or type of analysis. Our limit results remain valid no matter how the classes of patterns are built.

Let us give an explicit overview of which patterns yield which class: the patterns belonging to the classes L, R, M and I are listed in Table 2.

Table 2 Two-dimensional ordinal patterns of order \(d=3\) assigned to the four classes right turn (R), left turn (L), monotonicity (M) and inversion (I)

Let us emphasize that instead of dealing with 36 patterns, we deal with 4 pattern classes. In order to derive all patterns which belong to a certain class, the following facts are useful: in one dimension, the inverse pattern of \((x,y,z)\) is \((4-x,4-y,4-z)\). Inverse means that one obtains the pattern by reflecting the original one at the x-axis, that is, the up-and-down behavior is completely opposite to that of the original pattern. Replacing the pattern in one dimension by its inverse and leaving the pattern in the other dimension as it is maps M to M and I to I, while left turns become right turns and right turns become left turns.

Considering longer time windows means that we get more information on the behavior of the rain area or other phenomena under consideration. We could try to use the same procedure for \(d=4,5,...\) as we did in the case \(d=3\), that is, consider all two-dimensional patterns of that length and then classify them. This becomes almost impossible even in the case \(d=4\) (576 patterns): there are patterns exhibiting a kind of zigzag movement in one dimension while being monotone in the other dimension, and so on.

Hence we make use of our second idea: instead of considering longer patterns, we reduce the complexity of the time series again by analyzing its behavior only in terms of the classes L, R, M and I introduced above. A long left turn could then be something like L, L, L, that is, 5 bivariate data points which yield a left turn (of any kind) for three consecutive data triplets: \((X_{j}, X_{j+1}, X_{j+2})\), \((X_{j+1}, X_{j+2}, X_{j+3})\) and \((X_{j+2}, X_{j+3}, X_{j+4})\) each yield the pattern class L.

The complexity of the time series is again reduced by this procedure: we are left with a time series with values in \(\{ \text {L, R, I, M}\}\).

Let us now analyze what changes if we consider ties. The main difference is that we can now encounter the absence of a movement in one or both directions; this yields classes of patterns which can be interpreted as a zero movement, the beginning of a motion or the end of a motion. Even in the case \(d=3\) we have \(13^2=169\) two-dimensional ordinal patterns. Combining patterns with ties in one dimension with patterns without ties in the other dimension usually yields a left or right turn. Nine of the more interesting cases are depicted in Fig. 3; the complete overview of all 169 cases is given in Table 7.

Fig. 3 Nine bivariate patterns with ties moving to north-east. The two one-dimensional patterns are again depicted in the upper left corner of each cell

In addition to \(\{ \text {L, R, I, M}\}\) we now have the total absence of a movement, a zero movement (Z), a break (B), that is, the motion stops, and finally the start (S) of a motion. The 169 different two-dimensional patterns with ties are hence reduced to 7 classes. A data set with values in \({\mathbb {R}}^{k\times \ell }\) becomes a data set in the finite space \(\{ \text {L, R, I, M, B, S, Z}\}\). Admittedly, the three new classes make the procedure (a little) more complicated, and we have three more parameters which have to be estimated in applications. However, if ties appear in the data, adding these classes is inevitable: otherwise we would have to leave out data vectors with ties (of which there might be many) or we would have to randomize as in the one-dimensional case. Either way, we would lose a lot of information.
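A compact way to compute the class of a window is to reduce each of the two increments to its componentwise sign; for \(d=3\) these signs determine the bivariate pattern. The following Python sketch follows this idea. It is our own condensed rule of thumb, the authoritative class assignment (including all 169 tie cases) being Tables 2 and 7; in particular, border cases with a tie in only one coordinate are resolved here via the turn orientation.

```python
import numpy as np

def sign_vector(p, q):
    """Componentwise sign of the step from point p to point q."""
    return np.sign(np.asarray(q, dtype=float) - np.asarray(p, dtype=float))

def classify(p1, p2, p3):
    """Assign one of the seven classes to three consecutive 2-D points."""
    d1, d2 = sign_vector(p1, p2), sign_vector(p2, p3)
    if not d1.any() and not d2.any():
        return "Z"                            # zero movement
    if not d1.any():
        return "S"                            # start of a motion
    if not d2.any():
        return "B"                            # break: the motion stops
    cross = d1[0] * d2[1] - d1[1] * d2[0]     # orientation of the turn
    if cross > 0:
        return "L"                            # left turn (counter-clockwise)
    if cross < 0:
        return "R"                            # right turn (clockwise)
    return "M" if (d1 == d2).all() else "I"   # collinear: monotone/inverted

def symbolize(track):
    """Reduce a trajectory to its symbol sequence in {L,R,I,M,B,S,Z}."""
    return [classify(track[j], track[j + 1], track[j + 2])
            for j in range(len(track) - 2)]

print(symbolize([(0, 0), (1, 1), (2, 1), (2, 0), (1, 0), (1, 0)]))
```

In the printed example, a trajectory looping clockwise and then stopping is reduced to the symbol sequence ['R', 'R', 'R', 'B'].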

Having presented our method to deal with multivariate ordinal patterns, let us briefly recall what can be found in the existing literature: Mohr has dealt with different concepts in her doctoral thesis (Mohr 2022). Compare in this context also Mohr et al. (2020b). The different approaches boil down to the following four ideas:

(a) If the data is n-dimensional, consider the ordinal pattern at each fixed point in time. This means that one does not use a moving window of length d, but instead compares the values of the different dimensions at each time point separately. The pattern length is n. This approach helps to analyze the dependence between the different components, but does not capture evolution or dependence over time. Compare in this context the spatial approach in Schnurr and Fischer (2022).

(b) Secondly, one could work in a one-dimensional setting in each of the components. That is, one considers the different dimensions separately, as is done, for instance, in Oesting and Huser (2022) for longitudinal and latitudinal movements. In the context of permutation entropy one ‘pools’ afterwards over all dimensions (cf. Keller and Lauffer 2003). The interplay between the different components is lost completely; rather, one just glues together the one-dimensional time series.

(c) Furthermore, one could map the multivariate data to scalar data and apply classical one-dimensional ordinal pattern analysis afterwards. This is done using the concept of depth in Betken and Schnurr (2023). If one uses such a dimension-reducing mapping, one has to be very careful that no important information on the multivariate structure is lost: the mapping might introduce biases or artifacts that were not present in the original multivariate data set.

(d) Finally, it is possible to consider a vector of patterns for each time window, that is, elements in \(S_d^2\) in the case without ties and elements in \(T_d^2\) in the case with ties. This is done in the first step of our analysis in the present article.

3 Estimation of pattern distributions

In this section, we discuss how to estimate the distribution of multivariate patterns from an excerpt of a stationary time series. Here, we are more general than above and consider arbitrary dimension p and pattern length d. More precisely, we consider a stationary multivariate time series \(\varvec{Y} = (\varvec{Y}_j)_{j \in {\mathbb {Z}}}\) with values in \({\mathbb {Z}}^p\) or \({\mathbb {R}}^p\) (\(p \in {\mathbb {N}}\)) and aim at estimating the distribution of the multivariate patterns \(\Pi (\varvec{Y}_1,\ldots , \varvec{Y}_d)\) and \(\Psi (\varvec{Y}_1, \ldots , \varvec{Y}_d)\), respectively, based on observations of \(\varvec{Y}_1, \varvec{Y}_2, \ldots , \varvec{Y}_{n+d-1}\). That is, we want to estimate p(A) where

$$\begin{aligned} {\left\{ \begin{array}{ll} p(A) = \textrm{Pr}(\Pi (\varvec{Y}_1, \ldots , \varvec{Y}_d) \in A), &{} \quad A \subset S_d^p, \quad \text {if no ties are considered}, \\ p(A) = \textrm{Pr}(\Psi (\varvec{Y}_1, \ldots , \varvec{Y}_d) \in A), &{} \quad A \subset T_d^p, \quad \text {if ties are considered}. \end{array}\right. } \end{aligned}$$

In particular, this includes the examples for \(p=2\) and \(d=3\) considered in Sect. 2.2.

Due to the stationarity of \(\varvec{Y}\), a natural estimator is given by the sliding window estimator

$$\begin{aligned} {\left\{ \begin{array}{ll} {{\widehat{p}}}_n(A) = \frac{1}{n} \sum \nolimits _{j=1}^n 1\{\Pi (\varvec{Y}_j,\ldots ,\varvec{Y}_{j+d-1}) \in A\},&{} \quad A \subset S_d^p, \\ {{\widehat{p}}}_n(A) = \frac{1}{n} \sum \nolimits _{j=1}^n 1\{\Psi (\varvec{Y}_j,\ldots ,\varvec{Y}_{j+d-1}) \in A\},&{} \quad A \subset T_d^p, \end{array}\right. } \end{aligned}$$

if no ties are considered and if ties are considered, respectively.

In many applications of interest, one might not want to take into account all patterns but only the patterns at times when certain types of events, such as rainfall events, occur. These occurrences are encoded by a binary time series \(O = (O_j)_{j \in {\mathbb {Z}}}\) where

$$\begin{aligned} O_j = {\left\{ \begin{array}{ll} 1, &{} \quad \text {if an event occurs at time }j, \\ 0, &{} \quad \text {if no event occurs at time }j. \end{array}\right. } \end{aligned}$$

Assuming stationarity of the \((p+1)\)-variate time series \((\varvec{Y},O) = ((\varvec{Y}_j,O_j) )_{j \in {\mathbb {Z}}}\), we can thus define the estimator

$$\begin{aligned} {\left\{ \begin{array}{ll} {{\widehat{p}}}^c_n(A) = \frac{\sum \nolimits _{j=1}^n 1\{\Pi (Y_j,\ldots ,Y_{j+d-1}) \in A, O_j = \ldots = O_{j+d-1}=1 \}}{\sum \nolimits _{j=1}^n 1\{O_j = \ldots = O_{j+d-1}=1 \}}, \quad A \subset S_d^p,\\ {{\widehat{p}}}^c_n(A) = \frac{\sum \nolimits _{j=1}^n 1\{\Psi (Y_j,\ldots ,Y_{j+d-1}) \in A, O_j = \ldots = O_{j+d-1}=1 \}}{\sum \nolimits _{j=1}^n 1\{O_j = \ldots = O_{j+d-1}=1 \}}, \quad A \subset T_d^p, \end{array}\right. } \end{aligned}$$
(1)

for the conditional pattern distribution

$$\begin{aligned} {\left\{ \begin{array}{ll} p^c(A) = \textrm{Pr}( \Pi (Y_1, \ldots , Y_d) \in A \mid O_1=\ldots =O_d=1), &{} \quad A \subset S_d^p, \\ p^c(A) = \textrm{Pr}( \Psi (Y_1, \ldots , Y_d) \in A \mid O_1=\ldots =O_d=1), &{} \quad A \subset T_d^p. \end{array}\right. } \end{aligned}$$

Note that the denominator of the conditional estimator \(\widehat{p}^c_n(A)\) might be zero if \(O_j=0\) with positive probability. In this case, it might happen that no sequence of d events in a row is observed and, therefore, the conditional pattern distribution cannot be estimated. If, in contrast, \(O_j = 1\) almost surely, the conditional estimator \({{\widehat{p}}}^c_n(A)\) coincides with the unconditional estimator \({{\widehat{p}}}_n(A)\). Consequently, we henceforth focus on the more general conditional case only.
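A direct implementation of the estimator in Eq. (1) at the level of pattern classes might look as follows. This is a sketch under our own naming; it assumes the window classes (e.g. from the `symbolize` sketch of Sect. 2.2) and the indicator series O as inputs, and it returns None in the non-estimable case just discussed.

```python
import numpy as np

def conditional_class_freq(symbols, events, d=3):
    """Sliding-window estimate of the conditional pattern-class
    distribution, cf. Eq. (1): relative frequency of each class among
    the windows during which the event indicator equals 1 throughout.

    symbols : class label of the window starting at j (length n)
    events  : 0/1 array O_1, ..., O_{n+d-1}
    """
    events = np.asarray(events)
    counts, total = {}, 0
    for j in range(len(symbols)):
        if events[j:j + d].all():            # O_j = ... = O_{j+d-1} = 1
            total += 1
            counts[symbols[j]] = counts.get(symbols[j], 0) + 1
    if total == 0:
        return None                          # denominator zero
    return {a: c / total for a, c in counts.items()}
```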

In the remainder of this section, we will first analyze the asymptotic behaviour of the estimator and establish limit theorems (Sect. 3.1), before demonstrating how to assess the uncertainty of the estimators via bootstrap in numerical experiments (Sect. 3.2).

3.1 Limit theorems

In order to establish asymptotic normality of the estimator, we will have to impose appropriate mixing conditions on the \((p+1)\)-variate time series \((\varvec{Y},O)\). Here, we choose conditions in terms of the strong \(\alpha\)-mixing coefficients. For an arbitrary stationary multivariate time series \(\varvec{Z}=(\varvec{Z}_j)_{j \in {\mathbb {Z}}}\), these are defined by

$$\begin{aligned} \alpha _{\varvec{Z}}(n) = \sup _{A \in \sigma ({\varvec{Z}_j}: \, j \le 0), \, B \in \sigma ({\varvec{Z}_j}: \, j \ge n) } \vert \textrm{Pr}(A \cap B) - \textrm{Pr}(A) \textrm{Pr}(B) \vert , \quad n=0,1,\ldots . \end{aligned}$$
(2)

The process \(\varvec{Z}\) is called strongly \(\alpha\)-mixing if \(\alpha _{\varvec{Z}}(n) \rightarrow 0\) as \(n \rightarrow \infty\). Examples for processes satisfying this condition include certain Gaussian ARMA processes (see Example 2).

A slightly stronger condition on these mixing coefficients allows us to show asymptotic normality of \({{\widehat{p}}}^c_n(A)\). For notational convenience, the results are stated for the case that ties are considered. However, all the results still hold true in the case of no ties if we replace \(\Psi\) by \(\Pi\) and \(T_d^p\) by \(S_d^p\), respectively.

Theorem 1

Let \((\varvec{Y},O) = ((\varvec{Y}_j, O_j))_{j \in {\mathbb {Z}}}\) be a stationary \((p+1)\)-variate time series with \(\alpha\)-mixing coefficients \((\alpha _{(\varvec{Y},O)}(n))_{n=0,1,\ldots }\) satisfying

$$\begin{aligned} \sum \nolimits _{k \in {\mathbb {N}}} \alpha _{(\varvec{Y},O)}(k) < \infty . \end{aligned}$$
(3)

Then, with \(p_d:= \textrm{Pr}(O_1=\ldots =O_d=1)\) and for all \(A \subset T_d^p\), we have

$$\begin{aligned} \sqrt{n} \left( \begin{array}{c} \frac{1}{n p_d} \sum \nolimits _{j=1}^n 1\{\Psi (Y_j,\ldots ,Y_{j+d-1}) \in A, O_j = \ldots = O_{j+d-1}=1 \} - p^c(A) \\ \frac{1}{n p_d} \sum \nolimits _{j=1}^n 1\{O_j = \ldots = O_{j+d-1}=1 \} - 1 \end{array}\right) \longrightarrow _d {\mathcal {N}}\Big ( \Big (\begin{array}{c} 0\\ 0 \end{array}\Big ), \Big (\begin{array}{cc} \sigma _1^2(A) &{} \sigma _{12}(A)\\ \sigma _{12}(A) &{} \sigma _2^2(A) \end{array}\Big ) \Big ), \end{aligned}$$

where

$$\begin{aligned} \sigma _1^2(A)&= \frac{1}{p_d^2} \sum \nolimits _{j \in {\mathbb {Z}}} \textrm{Cov}( 1\{\Psi (Y_1,\ldots ,Y_{d}) \in A, O_1 = \ldots = O_{d}=1\}, \\&\qquad \qquad \qquad \qquad \qquad 1\{\Psi (Y_j,\ldots ,Y_{j+d-1}) \in A, O_j = \ldots = O_{j+d-1}=1\} )\\ \sigma _2^2(A)&= \frac{1}{p_d^2} \sum \nolimits _{j \in {\mathbb {Z}}} \textrm{Cov}( 1\{O_1 = \ldots = O_{d}=1\}, 1\{O_j = \ldots = O_{j+d-1}=1\} ) \\ \text {and} \quad \sigma _{12}(A)&={} \frac{1}{p_d^2} \sum \nolimits _{j \in {\mathbb {Z}}} \textrm{Cov}( 1\{O_1 = \ldots = O_{d}=1\}, \\&\qquad \qquad \qquad \qquad \qquad 1\{\Psi (Y_j,\ldots ,Y_{j+d-1}) \in A, O_j = \ldots = O_{j+d-1}=1\} ). \end{aligned}$$

In particular,

$$\begin{aligned} \sqrt{n} \Big ( {{\widehat{p}}}_n^c(A) - p^c(A) \Big ) \longrightarrow _d {\mathcal {N}}\Big (0, \sigma _1^2(A) - 2 p^c(A) \sigma _{12}(A) + p^c(A)^2 \sigma _2^2(A)\Big ). \end{aligned}$$
(4)

The proof of this theorem is given in “Appendix B”.

To illustrate the applicability of our result, we consider a flexible class of examples including arbitrary transformations of multivariate Gaussian ARMA models. However, these processes just serve as an example. The class of processes that satisfy the assumptions of Theorem 1 is much richer.

Example 2

For all \(j \in {\mathbb {Z}}\), let the vector \((\varvec{Y}_j, O_j)\) be of the form \(f(\varvec{X}_j)\) for some function \(f: {\mathbb {R}}^{q} \rightarrow {\mathbb {R}}^{p+1}\) and some multivariate stationary Gaussian ARMA process \((\varvec{X}_j)_{j \in {\mathbb {Z}}}\). Then, we can write

$$\begin{aligned} \varvec{X}_j = H \varvec{Z}_j, \quad j \in {\mathbb {Z}}, \end{aligned}$$

where

$$\begin{aligned} \varvec{Z}_j = F \varvec{Z}_{j-1} + G \varvec{e}_j, \quad j \in {\mathbb {Z}}, \end{aligned}$$

for some matrices \(F\), \(G\), \(H\), another stationary time series \((\varvec{Z}_j)_{j \in {\mathbb {Z}}}\) and i.i.d. random vectors \(\varvec{e}_j\), \(j \in {\mathbb {Z}}\), with \(\varvec{e}_j \sim {\mathcal {N}}(\varvec{0}, \varvec{I}_r)\). Then, obviously, \({\mathbb {E}}(\Vert \varvec{e}_j\Vert ^{\delta }) < \infty\) for any \(\delta > 0\), and

$$\begin{aligned}&\int \frac{1}{\sqrt{2 \pi }^r} \Big \vert \exp \Big (- \frac{1}{2} (\varvec{x} - \varvec{\theta })^\top (\varvec{x} - \varvec{\theta })\Big ) - \exp \Big (- \frac{1}{2} \varvec{x}^\top \varvec{x} \Big ) \Big \vert \, \textrm{d} \varvec{x} \\&\quad ={} {\mathbb {E}}\Big \vert 1 - \exp (\langle \varvec{e}_j, \varvec{\theta }\rangle ) \exp \Big (- \frac{1}{2} \Vert \varvec{\theta }\Vert ^2\Big )\Big \vert \\&\quad \le {} (1+ \exp (\Vert \varvec{\theta }\Vert ^2)) \sum _{i=1}^r \vert \theta _i\vert {\mathbb {E}}\Big ( \vert e^{(i)} \vert \exp ( \vert \theta _i e^{(i)} \vert ) \Big ) \in {\mathcal {O}}(\Vert \varvec{\theta }\Vert _1) \end{aligned}$$

as \(\Vert \varvec{\theta }\Vert _1 \rightarrow 0\). If the moduli of all eigenvalues of F are less than 1, then, by Thm. 3.1 in Pham and Tran (1985), the \(\alpha\)-mixing coefficients of \((\varvec{X}_j)_{j \in {\mathbb {Z}}}\) decay at an exponential rate, and so do \(\alpha _{(\varvec{Y},O)}(n) \le \alpha _{\varvec{X}}(n)\). Thus, the summability condition in Theorem 1 holds.
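The eigenvalue condition on F is easy to verify numerically; a tiny sketch with a hypothetical VAR(1) coefficient matrix:

```python
import numpy as np

# Hypothetical transition matrix F; the check itself is generic.
F = np.array([[0.5, 0.1],
              [0.0, 0.8]])
# All eigenvalue moduli below 1 => geometric alpha-mixing applies.
assert np.abs(np.linalg.eigvals(F)).max() < 1
```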

It has to be noted that even though the underlying process \((\varvec{X}_j)_{j \in {\mathbb {Z}}}\) is assumed to be Gaussian, the function f allows for very flexible modelling of the process \((\varvec{Y}_j)_{j \in {\mathbb {Z}}}\). This includes phenomena such as multimodal distributions and heavy tails and thus goes beyond the assumption of Gaussianity, which is mathematically convenient but might be unrealistic in practice.

3.2 Assessing the uncertainty of the estimators via bootstrap

While Eq. (1) provides a straightforward point estimate for the probability masses of the conditional pattern distribution, Theorem 1 can be used to assess its uncertainty. The specification of concrete confidence intervals, however, requires knowledge of the asymptotic variance in Eq. (4). As this expression is quite involved and can hardly be estimated directly, it is advisable to resort to resampling techniques such as the bootstrap in order to assess the uncertainty of the estimate. Here, it is important to note that our procedure relies heavily on the temporal order and temporal dependence structure of the data, which therefore necessarily has to be taken into account by the bootstrap. Thus, we propose the use of a multiplier (disjoint) block bootstrap procedure (Bücher and Ruppert 2013; Drees 2015; Bücher and Kojadinovic 2016).

More precisely, we divide our time series into disjoint blocks of length \(\ell\),

$$\begin{aligned} \{ (O_{(i-1)\ell +1}, \varvec{Y}_{(i-1)\ell +1}), \ldots , (O_{i\ell }, \varvec{Y}_{i\ell }) \}, \quad i=1,\ldots , \lfloor n/\ell \rfloor , \end{aligned}$$

estimate both the numerator and denominator in Eq. (1) separately and reweight them by i.i.d. multiplier random variables \(1 + \xi _1, \ldots , 1 + \xi _{\lfloor n/\ell \rfloor }\) satisfying \({\mathbb {E}}(\xi _i)=0\) and \(\textrm{Var}(\xi _i)=1\). Thus, we obtain the bootstrap estimator

$$\begin{aligned} {{\widehat{p}}}^{c,B}_n(A) = \frac{ \sum \limits _{i=1}^{\lfloor n/\ell \rfloor } (1+\xi _i) \sum \limits _{j=(i-1)\ell +1}^{i\ell -d+1} 1\{\Pi (Y_j,\ldots ,Y_{j+d-1}) \in A, O_j = \ldots = O_{j+d-1}=1 \}}{ \sum \limits _{i=1}^{\lfloor n/\ell \rfloor } (1+\xi _i) \sum \limits _{j=(i-1)\ell +1}^{i\ell -d+1} 1\{O_j = \ldots = O_{j+d-1}=1 \}}. \end{aligned}$$

By independently sampling new multiplier random variables \(N_{BS}\) times, we obtain a sample of independent bootstrap estimates (conditionally on the data) whose distribution can be perceived as an approximation of the distribution of the original estimator.
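A sketch of the resulting procedure (all names are ours; standard normal multipliers \(\xi _i\) are one admissible choice, and, for brevity, the \(d-1\) windows crossing each block boundary, which the displayed formula excludes, are not removed):

```python
import numpy as np

def multiplier_block_bootstrap(num, den, block_len, n_boot=500, rng=None):
    """Multiplier (disjoint) block bootstrap for the ratio estimator.

    num[j], den[j] : the 0/1 summands of the numerator and denominator
                     of Eq. (1) for the window starting at j.
    Returns n_boot bootstrap replicates of the conditional estimate.
    """
    rng = np.random.default_rng() if rng is None else rng
    num, den = np.asarray(num, float), np.asarray(den, float)
    n_blocks = len(num) // block_len
    # per-block partial sums of numerator and denominator
    num_b = num[:n_blocks * block_len].reshape(n_blocks, block_len).sum(axis=1)
    den_b = den[:n_blocks * block_len].reshape(n_blocks, block_len).sum(axis=1)
    weights = 1.0 + rng.standard_normal((n_boot, n_blocks))   # 1 + xi_i
    return (weights @ num_b) / (weights @ den_b)

# bootstrap standard deviation of the estimate, e.g.:
# sigma_bs = multiplier_block_bootstrap(num, den, block_len=100).std(ddof=1)
```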

We illustrate this procedure by a small simulation study. Here, we consider a special case of the multivariate stationary Gaussian ARMA time series in Example 2. More precisely, we consider a trivariate time series \((\varvec{Z}_j)_{j \in {\mathbb {Z}}}\) of the form:

$$\begin{aligned} \varvec{Z}_j:= \left( \begin{array}{c} Z_{j}^{(1)}\\ Z_{j}^{(2)}\\ Z_{j}^{(3)} \end{array} \right) = \left( \begin{array}{c} \rho _1 Z_{j-1}^{(1)} + \sqrt{1-\rho _1^2} e_j^{(1)} \\ \rho _2 Z_{j-1}^{(2)} + \sqrt{1-\rho _2^2} e_j^{(2)} \\ \rho _3 Z_{j-1}^{(3)} + \sqrt{1-\rho _3^2} e_j^{(3)} \\ \end{array} \right) , \end{aligned}$$

where \(\varvec{e}_j\), \(j \in {\mathbb {Z}}\), are i.i.d. with \(\varvec{e}_j \sim {\mathcal {N}}(\varvec{0}, \varvec{I}_3)\) and parameters \(\rho _1, \rho _2, \rho _3 \in (-1,1)\). Then, by construction, the stationary solution possesses independent components and standard Gaussian marginal distribution. We set

$$\begin{aligned} O_j = 1\{Z_j^{(1)} > \Phi ^{-1}(0.1)\}, \qquad \varvec{Y}_j = (Z_{j}^{(2)},Z_{j}^{(3)})^\top . \end{aligned}$$

Consequently, we expect events to occur in \(90\,\%\) of all time steps.
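A sketch of this simulation setup (with a common \(\rho\) for all three components, as in the settings below; the \(\sqrt{1-\rho ^2}\) innovation scaling yields standard Gaussian marginals):

```python
import numpy as np
from scipy.stats import norm

def simulate(n, rho, burn_in=500, rng=None):
    """Simulate the trivariate AR(1) series with standard Gaussian
    marginals and derive the event indicator O_j and positions Y_j."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal((burn_in + n, 3))
    Z = np.zeros((burn_in + n, 3))
    for j in range(1, burn_in + n):
        Z[j] = rho * Z[j - 1] + np.sqrt(1 - rho**2) * e[j]
    Z = Z[burn_in:]                             # approximate stationarity
    O = (Z[:, 0] > norm.ppf(0.1)).astype(int)   # events in ~90% of steps
    Y = Z[:, 1:]                                # bivariate position series
    return Y, O
```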

We consider three different parameter settings:

(i) independence: \(\rho _1=\rho _2=\rho _3=0\)

(ii) moderate dependence: \(\rho _1=\rho _2=\rho _3=0.5\)

(iii) strong dependence: \(\rho _1=\rho _2=\rho _3=0.8\)

In each case, we simulate \(N=100\) time series of length \(n=10\,000\) and estimate the probabilities

$$\begin{aligned} {{\widehat{p}}}_{n,j}^c(L), \ {{\widehat{p}}}_{n,j}^c(R), \ \widehat{p}_{n,j}^c(I), \ {{\widehat{p}}}_{n,j}^c(M), \quad j=1,\ldots ,N, \end{aligned}$$

for patterns of length \(d=3\), where the additional index j refers to the sample under consideration. For each sample, we estimate the uncertainty by the bootstrap sample standard deviation \(\hat{\sigma }_j^{(BS)}(A)\), \(A \in \{L,R,I,M\}\), based on a block length of \(\ell =100\) and a bootstrap sample size of \(N_{BS}=500\). We then compare the sample standard deviations

$$\begin{aligned} {\widehat{\sigma }} (A) = \bigg (\frac{1}{N-1} \sum _{j=1}^{N} \Big [{{\widehat{p}}}_{n,j}^c(A) - \overline{p_{n}^c}(A)\Big ]^2\bigg )^{1/2}, \end{aligned}$$

where

$$\begin{aligned} \overline{p_{n}^c}(A)= \frac{1}{N} \sum _{j=1}^{N} {{\widehat{p}}}_{n,j}^c(A), \quad A \in \{L,R,I,M\}, \end{aligned}$$

to the average bootstrap standard deviations \(\overline{\sigma ^{(BS)}}(A) = \frac{1}{N} \sum _{j=1}^{N} \hat{\sigma }_j^{(BS)}(A)\), \(A \in \{L,R,I,M\}\). The results are reported in Table 3. It can be seen that, due to the symmetry of the two components of \(\varvec{Y}\), the pattern classes L and R are equally likely in all three scenarios. The monotone patterns M, however, occur less frequently than the inverse patterns I in the independent scenario, while the order of these two frequencies is reversed in the strongly dependent scenario. In other words: bivariate movements become more persistent in case of stronger dependence. Due to the large sample size n, the overall uncertainty is small, with standard deviations between \(0.3\,\%\) and \(0.7\,\%\), which are very well recovered by the bootstrap estimates. We note that these results are invariant under monotone marginal transformations, that is, they also hold true for non-Gaussian processes with the same dependence structure.

Table 3 Relative frequencies \(\overline{p_{n}^c}(A)\) of patterns, the empirical standard deviations \({\widehat{\sigma }} (A)\) and the estimated standard deviations \(\overline{\sigma ^{(BS)}}(A)\) obtained via bootstrap, \(A \in \{L,R,I,M\}\), for the independent, the moderately dependent and the strongly dependent scenario (from top to bottom)

4 Application: real life data examples

4.1 Example 1: Rainfall radar data

As an example, we consider rainfall radar data for a catchment in Germany. Rainfall radar data in Germany have been recorded for more than twenty years and are available in a bias-corrected version, denoted RADOLAN, from the German Weather Service (DWD) (Winterrath et al. 2018). Due to their high spatial (\(1 \times 1\) km\(^2\)) and temporal (hourly data) resolution, rainfall radar data are used more and more often as input data for hydrological simulation and modelling of runoff. Moreover, they are used as training data sets for short-term flood forecasting based on artificial intelligence. It is therefore of high practical relevance to characterise patterns in rainfall events and their movement in space, e.g. of the focus point. Here, rainfall events are defined as periods of rainfall during which the rainfall in one grid cell exceeds the rainfall sum of a given level. We chose the level corresponding to a return period of one year of the annual maximum 1 h-rainfall sums, which is an often-used input for simulations of the sewer network (Huard et al. 2010). The corresponding rainfall event is then defined as the period during which the threshold of 0.1 mm per hour (a typical threshold to distinguish between wet and dry spells, see, e.g., Huang et al. 2015) is exceeded in any grid cell. As an example, a catchment in Saxony, eastern Germany, is considered. The Tannenberg catchment has a size of 92 km\(^2\), which leads to a consideration of 210 grid cells of radar data (including a 5 km buffer). A total number of 254 events was identified for this catchment in the period 2001–2020. An example of one identified rainfall event is given in Fig. 4.

Fig. 4 Example of one six-hour rainfall event for the Tannenberg catchment in Germany

To track the spot with the highest rainfall amount within a rainfall event, the focus point is considered. The focus point is estimated as the weighted mean of all cells, where the weights are the rainfall heights of the cells; a small sketch of this computation follows below. Hence, as long as the overall rainfall pattern changes (which is usually the case when considering 1-hour time steps), ties will not occur. In this case, only the four main equivalence classes have to be considered: L, R, M, I. Here, bivariate patterns of length three are considered.
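A minimal sketch of the focus-point computation for a single radar image (names are ours; the weighting follows the description above):

```python
import numpy as np

def focus_point(frame):
    """Focus point (rainfall-weighted mean) of one k x l radar image;
    returns a point in [1, k] x [1, l], cf. the second setting of
    Sect. 2.2, or None if the image contains no rainfall."""
    frame = np.asarray(frame, dtype=float)
    k, l = frame.shape
    total = frame.sum()
    if total == 0:
        return None
    rows = np.arange(1, k + 1)
    cols = np.arange(1, l + 1)
    x = (frame.sum(axis=1) @ rows) / total   # weighted mean row index
    y = (frame.sum(axis=0) @ cols) / total   # weighted mean column index
    return (x, y)
```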

The results for the radar rainfall events in the Tannenberg catchment with \(1 \times 1\) km grid resolution are given in Table 4, together with the theoretical frequency of each equivalence class under independence of both components (directions) and independence in time for each component, which serves as a comparison value to evaluate the significance of the results. Additionally, the frequency of each equivalence class under short-range dependence (SRD) of each component but independence between the components is given (see the ARMA model in Sect. 3.2). The results demonstrate that the frequency of monotone movement is much higher than expected, while the left turn movement is less frequent than expected under completely independent components. This implies that indeed SRD of each component can be expected, as this increases the frequency of monotone behaviour (see the last two columns of Table 4). Such dependence is plausible for hourly rainfall data. Interestingly, the inverse movement also occurs frequently, which is counter-intuitive given what one would expect from rainfall cells. This might be due to the methodology used here to define the focus point: it is sensitive and might produce changing patterns simply due to a shift in the outer cells of the rainfall event, which is not what we expect to observe for rainfall events. Therefore, it might be more intuitive to use the argmax function instead of the focus point, that is, to determine the grid cell with maximum rainfall per time step. Yet, this approach also has a drawback: depending on the spatial resolution of the data, the cell with maximum rainfall might not change between two consecutive time steps. This introduces ties in the patterns, and hence the number of equivalence classes to be considered increases, as the classes Z, S and B also have to be taken into account.

Table 4 Frequency of patterns for the focus point of rainfall events in the Tannenberg catchment in Saxony and theoretical frequency of each equivalence class under independence in time and between all components as well as under short-range dependence (SRD) in time of each component (in %)

To additionally investigate the impact of the resolution of the data, the spatial resolution is varied from small scale (\(1 \times 1\) km) to large scale (\(4 \times 4\) km), such that changes over small distances of 1 km-steps as well as over large distances of 4 km-steps can be considered. The results are given in Table 5 together with the theoretical frequency of each equivalence class under independence, which again serves as a comparison value to evaluate the significance of the results. Specific results under SRD are not given here, since they are crucially influenced by the model choice for the ties, e.g., INAR models or Poisson increments. However, the same tendency as for the previous example can be assumed under dependence of the components.

Table 5 Frequency of equivalence classes for the maximum of rainfall events in the Tannenberg catchment in Saxony dependent on the spatial resolution of the underlying radar image and theoretical frequencies under independence in time and between all components (in %)

The results demonstrate that the frequency of the patterns depends strongly on the spatial resolution of the underlying radar data. While a counter-clockwise movement (L) of the storm cell occurs almost constantly with a frequency nearly equal to the theoretical one, the frequency of storms with clockwise movement (R) decreases rapidly with decreasing spatial resolution and is far below what would be expected under independence, and also below what can be expected under SRD (as also under SRD a similar frequency of L and R patterns should be expected). This implies that for large-scale patterns, which are better captured by low-resolution radar images since small changes do not disturb the overall storm pattern, counter-clockwise storm movements occur much more frequently than any other pattern. This is in line with the fact that in the Northern Hemisphere winds blow counter-clockwise around a center of low pressure (Coriolis effect), hence coming along with wet conditions and frontal weather systems. Such a dependence between the components, i.e. the directions, is not considered in the simulation study, which might be the reason why the resulting frequencies differ. Monotone movement on the small scale (i.e., visible in high-resolution data) is more frequent than expected, indicating a rapid movement of the storm cells and again short-range dependence of the components. This is meaningful considering that the focus of this study is on heavy rainfall events, i.e., events with high rainfall intensity, where usually fast movements of the rainfall cells can be observed. Significant stops or breaks in the movement only become present when focusing on the large scale, as here the movement of small rainfall cells is again levelled out by the low resolution. Here, the role of ties becomes apparent for this framework: the focus point of a rainfall event does not necessarily have to move between two time steps, as we are using grid cells. Therefore, it is possible that movements occur only within one grid cell and hence are not visible in the data. This is meaningful when small movements are considered as random and should not impact the resulting pattern. The consideration of ties becomes more important when the spatial resolution decreases, as ties are required when aiming to distinguish between small- and large-scale patterns. The consideration of ties thus enables one to consider small- as well as large-scale changes within the same framework and to avoid an overly sensitive estimation of the focus point.

To test the significance of the results, a second catchment in Bavaria, Germany, was also considered. The Baiersdorf catchment, with a size of about 70 km\(^2\), is located in the northern part of Bavaria close to the uplands. Each radar image related to this catchment includes 89 grid cells. Detailed results are not presented here, as they are similar to those presented for Tannenberg; the overall tendency of the results is thus confirmed.

The results obtained here could also be used to identify the most critical patterns of rainfall for a catchment. For this purpose, the patterns of those events could be detected that have the largest overlap with the catchment in time. This is of course highly dependent on the shape of the catchment but can provide valuable information for hydrological modelling of extremes.

4.2 Example 2: Animal movement data

For the second example, we consider animal movement data. Such data can be expected to have a high spatial resolution, as GPS trackers deliver (sub)daily positions of latitude and longitude of the animal. Therefore, ties will not occur in the data, and no scale dependence or scale sensitivity has to be considered. We consider data provided by the online repository Movebank (www.movebank.org), which is coordinated by the Max Planck Institute of Animal Behavior, the North Carolina Museum of Natural Sciences, and the University of Konstanz. There, large data sets on animal tracking are freely available. We consider the data set on daily satellite location data for leopard management in the North West Province, South Africa, 2014–2020 (Power et al. 2021). For the leopard “Brandy” (Panthera pardus), we have 451 days of tracking positions for the given time period in a region close to Pretoria, South Africa. The habitat the leopard lives in is framed by cliffs in the south and a highway in the north (Fig. 5).

Fig. 5 Daily location data for the leopard “Brandy” in South Africa. The red dot marks the position at the first day of observation

Table 6 Frequency of patterns for the daily location of a leopard in South Africa and theoretical frequencies under independence in time and between all components as well as under short-range dependence (SRD) in time of each component (in %)

The high frequency of monotone movement again emphasizes that a dependence in time of each component can be assumed, i.e. the direction of movement of the leopard is persistent over several days (Table 6). Yet, the high frequency of movements to the left compared to movements to the right also indicates that the leopard is influenced by the topography, more precisely the cliffs in the south and the highway in the north, which force it to take left turns when considering the starting point of the data. A simulation of this movement should therefore take into account not only dependence in time but also dependence between the directional components. This can be detected by the proposed approach and used as a basis for further analyses.