Order patterns, their variation and change points in financial time series and Brownian motion

Order patterns and permutation entropy have become useful tools for studying biomedical, geophysical or climate time series. Here we study day-to-day market data, and Brownian motion which is a good model for their order patterns. A crucial point is that for small lags (1 up to 6 days), pattern frequencies in financial data remain essentially constant. The two most important order parameters of a time series are turning rate and up-down balance. For change points in EEG brain data, turning rate is excellent while for financial data, up-down balance seems the best. The fit of Brownian motion with respect to these parameters is tested, providing a new version of a forgotten test by Bienaym'e.


Overview
Given a finite series of data, time series analysis attempts to find laws of the mechanism or process which generated the data. One tool to this end are order patterns, defined as permutations in section 3. Their probabilities are properties of the random process which can be easily estimated from the data. The permutation entropy [6] measures the variety of patterns. Applications to brain and heart data concern epilepsy [13,17], Alzheimer's disease [22], effect of anaesthesia [24], and cardiac dynamics [26,14,21]. For applications to physics, geophysics, environtmental and climate data see the surveys [33,3] and the collection [1].
Moreover, probabilities of specific order patterns can give important information. For chaotic dynamical systems, certain patterns will not appear [2]. Different versions of permutation entropy and the number of forbidden patterns were used to quantify and monitor the efficiency of stock [36] and bond markets [34,35]. The probabilities of patterns can be combined to form correlation functions [4,8]. The dependence of economical time series was studied in [27] by means of order patterns. Statistical theory of order pattern estimators was developed recently by [28,10], earlier work by [7,29] was restricted to Gaussian processes.
Several authors have studied change points and segmentation of time series by means of order patterns [30,32]. The interesting approach in [27,28] uses bivariate data while we shall study the univariate case only. A very impressive example is sleep stage classification using high-frequency data of a single EEG channel [23,19,8,9]. Here we have an abundance of data and few statistical problems. In the next section we will briefly describe this application.
Our main theme are order patterns in day-to-day financial data and their statistics. The zigzag patterns of up and down have long been considered by stock traders. Today they are certainly incorporated in secret algorithms for high-frequency trade. We are interested in the statistical error of estimates of pattern probabilties, and in the change point problem. We introduce our standard example of oil prices in section 4.
A curious feature of such examples is their apparent'self-similarity' studied by Mandelbrot since the 1960s [20]. This property implies equality of frequencies of patterns for different lags, which can be compared with simulated samples of Brownian motion and other Levy processes. One goal of the paper is to develop tests which decide whether Brownian motion is an appropriate model. In section 5 this is done by simulation, in section 7 rigorously.
It seems that for a large part, 'order self-similarity' is due to the uneven sampling of market data, with missing weekends and holidays. In contrast to measurements done in nature, the observed values are not varying in natural time. Their change is triggered by trade. Volume and number of buying and selling orders provide alternative scales to natural time. Under such conditions, classical tools like autocorrelation become useless. It may be a necessity to postulate equality of pattern frequencies for small lags. Our discussion in section 6 shows that the assumption is justified and can be used to improve estimates of pattern frequencies.
Our expectations should be modest: with at most a few thousand data, statistical accuracy will not be magic. In section 7 we introduce the two most stable and interpretable order parameters, up-down balance and turning rate. They are used to provide statistical tests for comparing models and data series. Actually, a first test for the turning rate was suggested already in [11,12]. The basic message is that pattern statistics has the 1/ √ n accuracy coming from binomial distribution. In Bienaymè's case it is even better because of negative correlations. In section 9 we use order patterns to determine change points in financial time series. It turns out that up-down balance is the most appropriate parameter.

A big data series from medicine
We start with a brief description of segmentation of big series of brain data by means of the order structure. This has been done by many authors, cf. [24,25,23,19,30,8,32]. Our version in [5] is particularly simple. We considered electroencephalographic (EEG) sleep data measured by [31], and publicly available at pysionet [18]. Healthy volunteers and patients were measured for one night in hospital with a sample rate of 500 Hz. These are multivariate data, as is the standard in sleep labs, and sleep stages (wake -REM sleepsleep stages 1 to 4) were annotated by trained experts for every epoch of 30 seconds. For our classification, we used only two EEG channels, each with 20 million data points for one night.
We determined a score for each epoch in an extremely simple way: we just counted the relative number of local maxima and minima in the time series. In Figure 1 it can be seen that the result almost coincides with the expert's annotation. In white noise, the turning rate -the relative amount of local extrema -is known to be 2 3 [11]. In Figure 1 we see turning rates between 0.35 and 0.45 when the proband is awake, with larger rates in the more frontal channel. In REM sleep, rates were between 0.3 and 0.35, also larger for the more frontal channel. When the patient was in non-REM sleep, turning rates went below 0.3, with larger values in the central channel. The smallest turning rate is below 0.2, in the first phase of deep sleep. After midnight, turning rates of sleep stages 3 and 4 gradually increased. Such results, with individual differences, were observed for about 20 persons measured by [31] with 500 Hz while for lower sampling rate, and for other data contaminated by 50 Hz noise from the power net, the results were less impressive [5]. Here we looked for many change points, and a very simple method was sufficient. The main thing was to express the structure by the appropriate parameter (see section 7 for details). Of course, this is preliminary research, and much work has to be done to make it an applicable method. But we are in a comfortable position with 15000 data for each epoch. For this application, a precision of ±10 seconds for change points is fine, so an error of ±1000 time points does not count. The data contain plenty of artefacts when the patient moves, rolls with the eyes in dream, gnabs with the teeth etc. A few bad epochs can be tolerated, however. Most of them are annotated 'wake' in Figure 1. The sliding window technique works well. We just determine turning rate for epochs of 30 seconds. For greater detail, we can take overlapping windows, with a shift of one second only [5]. There are no serious statistical problems.
In this paper on historic economical data, the situation is quite different. We have a few thousand values altogether. We must think about which structural parameters can be estimated with reasonable accuracy, and we have to deal with statistical errors.
3 Pattern frequencies as one facet of a random process As we said, we try to find properties of the process which generates the data. We begin with a brief glance at the theory. A stationary random process X 1 , X 2 , ... in discrete time, N = {1, 2, ...} is essentially defined by its distribution, a probability measure P on Borel sets B of the space Ω = R N of all possible infinite time series y 1 , y 2 , ..., which is invariant under time shifts. It is enough to know the probabilities of those Borel sets B ⊂ R n which depend only on the first values y 1 , ..., y n , for arbitrary n.
A statistician will ask how probabilities P (B) are estimated from a finite time series x = x 1 , x 2 , ..., x T . Since we assume stationarity, an estimate P (B) is the relative frequency of those vectors (x t+1 , ..., x t+n ) which belong to the set B, for t = 0, ..., T − n. As a formula, Here #A denotes the cardinality of a set A. Let us assume that the stationary process is ergodic. Then a classical theorem of Birkhoff says that for T → ∞, the estimate P (B) will converge to the true value P (B), with probability one.
Alright! But in practice, T is fixed and fairly small, say T = 1000. To get reasonable estimates, B should have a simple structure, like {x|x 1 < b} for given numbers b, which leads to the one-dimensional distribution. Multidimensional Borel sets have rarely been studied. Usually one prefers to study autocorrelation, based on averages over t of functions x t+1 x t+n for n = 1, 2, ....
There is, however, one class of multidimensional sets B for which estimation makes sense: the sets {x|x j < x k } and their intersections. The set {x|x 1 < x 3 } describes a halfplane in the x 1 x 3 -plane, and B = {x|x 1 < x 2 < x 3 } describes a 'cone', both in R 3 and Ω. The latter set B is symbolized as the first order pattern in Figure 2. There are five other patterns or permutations π. For example, π = 231 corresponds to B = {x|x 3 < x 1 < x 2 }.  Note that a permutation π of length n is a one-to-one mapping from {1, ..., n} onto itself. Figure 2 shows the graphs of the six permutations of length 3. We shall say that n equally spaced values x t+d , x t+2d , ..., x t+nd in our time series represent the permutation π of length n if they are in the same order. That is, π(j) < π(k) if and only if x t+jd < x t+kd , for 1 ≤ j < k ≤ n. Now we can count relative frequencies of permutations in the time series, which we shall denote p π (d). Let The case d = 1 corresponds to successive values in the series, but sometimes it is good to compare values which are 2 or d steps apart. The formula (1) for sets B determined by a permutation π of length n and a lag d ≥ 1 reads as follows.
Only formally, we allow t to be negative. Otherwise we had to consider permutations on 0, 1, ..., n − 1 as in [3]. Figure 3, taken from [8], shows a small example which leads to the frequencies p 132 (1) = 2 5 , p 321 (2) =   We have to note that our method excludes equality of values x j = x k . In theory, equality happens only with probability zero when the one-dimensional distribution of the underlying process has a density function. In practice, equality of values will happen, but will be rare when the values have an accuracy of 5 or more decimals. As explained in [9], pairs of equal values, as well as missing values, can just be disregarded in the calculation. A simpler practical method to eliminate equal values is to add a tiny white noise (uniform random numbers times 10 −7 , say) to the series x.
There are at least three good arguments for considering frequencies of such order patterns. First, since they describe cones in Ω, their estimates are robust with respect to noise and outliers, and not changed by an increasing nonlinear transformation of the data. Next, order patterns can be evaluated very easily and extremely fast by computer. There are few possible sources of error.
The most important argument for the use of order patterns are the weak stationarity assumptions which are required. The process X 1 , X 2 , ... need not be stationary. It need only have stationary increments in the sense that for all t, n ≥ 1 (X t+1 − X t , ..., X t+n − X t ) has the same distribution as (X 1 , ...., X n ) .
For pairwise different numbers x t , ..., x t+n , the vector (x t+1 − x t , ..., x t+n − x t ) always represents the same permutation as the vector (x t+1 , ..., x t+n ). The same argument works for arbitrary d. We obtain the following conclusion.
Proposition 1. For an ergodic process X 1 , X 2 , ... with stationary increments, p π (d) in (2) and (2d) is a consistent and unbiased estimator of the probability that the vector (X t+d , ..., X t+nd ) represent the permutation π, for any fixed t. This is a special case of the estimator (1) which is known to be unbiased. The proposition says that our naive formula (2d) makes sense. In the sequel, we consider only patterns of length n ≤ 4.
Financial data have a random walk structure and are typical examples of processes with stationary increments. Brownian motion is their classical model. It is not stationary, and its autocorrelation function depends on two arguments. Its pattern frequencies as functions of the lag d can be analytically determined, however [7].

Patterns of length four in a series of oil prices
Permutations of length 2 and 3 lead to two important parameters β and τ, as discussed in Section 7 below. To differentiate between various time series, we can use all 24 permutations of length 4. They provide detail information, and can be reasonably estimated when more than 1000 values are available. For a study of the 120 permutations of length 5 we would need much longer series.
For convenience, we shall assign to each permutation a number between 1 and 24, according to lexicographic order, as indicated in Table 1   On the left, frequencies are shown on linear scale. The largest frequencies appear for patterns 1234 and 4321, with p π (d) around 1/8. This happens for all daily price data. Here, p 1234 is clearly larger 1/8 and p 4321 is smaller, which can be explained by the increasing trend of the series. The largely increasing patterns 1243 and 2341 and their decreasing counterparts 4312, 3214 have probabilities near to 1/16 = 0.0625. All other patterns have smaller probabilities.
On the right, pattern frequencies are shown on logarithmic scale, in order to distinguish better the rare patterns. Differences here actually represent quotients, and deviations from the mean have to be interpreted as relative errors. The pattern with the smallest frequency, less then 1.5 percent, is 3142, followed by 2413 with almost 2 percent. Note that the average frequency is 1/24 ≈ 4.2%. The patterns in the middle, with numbers 8 to 17, as well as 5,6 and 19,20, are below average.
The figure shows an obvious symmetry between patterns with number k and 25 − k. Actually, pattern number 25 − k is obtained by reversing all order relations in pattern number k. Rank 4 is interchanged with 1, rank 2 with 3, and < with > . The graph of pattern 25−k, drawn as in Figure 2, is the graph of the pattern with number k turned upside down. It seems plausible that these two patterns have the same frequency, but in our series it is not quite true. The permutations with higher number have somewhat smaller frequency than their counterparts, with exception of number 12 and 13: 3124 is more frequent than 2431. Can this break in symmetry be explained by a trend? The time series of T = 8497 oil prices, drawn in Figure 5 on logarithmic scale, shows a general upwards trend, as most price series do. It is interrupted by two rather sudden drops of prices: one in 2008 when the world economy was hit by the financial crisis, and one in 2014 when some producers, notably Saudi Arabia, increased their supply to compete with the upcoming fracking industry in the USA.
So the series is not stationary and needs segmentation, which will be done by change point methods below. To study the effect of the trend on pattern frequencies, we construct an ad hoc segmentation into four pieces by eyesight.  Order four pattern frequencies for the four subseries are shown in Figure 6. They do not look as different as expected. It is true that for the strongly increasing series 2009-14, the increasing patterns on the left side of the figure have larger frequencies and the decreasing patterns on the right side, with number 20 to 24, say, have smaller frequencies. And for the time 2014-19 without trend, pattern 4321 appears more often than 1234. But some of the frequencies, notably number 11 to 14 in the middle, are almost constant through time. On the whole, frequencies do not differ much.
How can we decide about differences of such frequencies with statistical rigor? This will be a theme of the next sections. First we introduce a standard model.

Brownian motion as model for patterns in financial data
Since the work of Bachelier in 1900, Brownian motion (abbreviated BM) is a basic model for financial time series. It is usually taken to describe the logarithm of prices. For day-to-day data it postulates that the difference R t = log x t+1 − log x t , the so-called log return of day t, has a Gaussian distribution with mean zero and some variance σ 2 . Moreover, log returns of different days should be independent.
The independence assumption is plausible. When there would be a specific dependence between returns of today and tomorrow, some speculants would immediately bet against it to earn money, which would cause the dependence to disappear. The Gaussian distribution is not realistic, however, since big returns, both negative and positive, are more frequent in practice than the normal distribution allows [20]. Moreover, the important phenomenon of varying volatility is not contained in the Brownian motion model.
For order patterns, Brownian motion is a good model. It makes no difference whether we model logarithms or original values, because log x is an increasing function. The value of σ is also irrelevant, so we take σ = 1. Thus we let x t , t = 1, ..., T be a realization of standard Brownian motion -the cumulative sum of a series of standard normal random numbers. It can be simulated by the Matlab command x = cumsum( randn(1,T) ). We took T = 8500 as for the WTI series above. Theoretical pattern probabilities are marked by circles.
Pattern frequencies for lags d = 1, ..., 10 of a typical simulation of Brownian motion with length T = 8500 are shown in Figure 7. Theoretically, patterns k and 25 − k appear with the same probability, but this need not be true in a simulation. Moreover, the process of Brownian motion in discrete time is known to be self-similar. More precisely, B 1 , B 2 , B 3 , ... and (B s , B s+d , B s+2d , ...)/ √ d have the same distribution for any lag d ≥ 1 and any initial time s. As a consequence, the probabilities p π (d) do not depend on the lag d. Of course, this need not be so in a simulation. For large length T = 10 6 we checked that the coincidence is convincing for virtually every simulation. Here we want to see what happens for T = 8500, the same length as our oil price data. Comparing Figures 7 and 4, we see that the coincidence of frequencies for different lags in the data is not worse than in the model, where it should be perfect by definition. We try to clarify this point in the next section.
For Brownian motion, probabilities of order 4 patterns can be determined analytically [7]: since the multivariate standard normal distribution is spherically symmetric, octant probabilities can be determined  as surfaces of spherical triangles. The angles of these triangles are related to certain correlation coefficients. Table 2 lists the results of [7]. It is curious that there are seven different probabilities of which only four are rational.
Comparing the distributions of pattern frequencies of WTI prices in Figure 4 and BM in Figure 7, we see little difference. Is BM a good model in our case? We consider distributions of order 4 patterns as vectors in R 24 and study their Euclidean distances.
Let q = (q 1 , ..., q 24 ) denote an empirical distribution of WTI for one of the four time periods in Figure  6. We determine the distance q − q to the distribution q of the WTI over the whole time, and the distance q − b to the theoretical distribution b of Brownian motion given in Table 2. Moreover, N = 10 5 simulations of BM with the sample size of the respective WTI series for q are performed and their pattern The distribution of all d k serves as test distribution for the null hypothesis that q is obtained from the Brownian motion model. It resembles a chi-distribution for 24 degrees of freedom, with more heavy tail on the right, but not far from normal. Our simulation provides the median of the d k , and an estimate of the p-value of the data: For different lags d, the distances q − b can differ by 20 or 50 %, but they were always within a factor of 2. The resulting p-value, however, can differ by a factor 10 and more. Thus for a distribution with 24 frequencies, the dependence on the lag can be a problem. This experience leads us to simpler parameters of the time series, introduced in section 7.
Here we decided to take the mean of the frequencies for lags 1,2, and 3 as our empirical distribution q. This method is justified in the next section. The results are shown in Table 3. It turns out that for the last period 2015-19, BM is a perfect model with p = 88.9%. The patterns of the third period 2009-14 could also well be generated from BM. For the whole series as well as for the first two periods, there are significant differences, however.  Table 3: Distances of pattern distribution q of WTI prices in different periods to their mean q and to the exact distribution b of BM. This was compared with the distance of 10 5 simulated sample distributions b k of BM to b. During the last two periods, Brownian motion fits well.
What is the reason for the differences? In the second period, it is the strongly increasing trend of the prices, which results in large frequencies p 1234 and small frequencies p 4321 . This is not surprising and could be mended by taking a biased BM with linear trend fitted to the increase of prices in that time. For the whole series, the patterns would also fit better to a BM with linear trend. However, in that case the trend has to be chosen in such a way that today's prices would range above 300 instead of 60 $. Thus there may be another structural difference between WTI and BM for the whole series and for the first rather stationary period, although the p-value of 3.8 % in Table 3 is not convincing. We return to this question later.
6 Order-self-similarity of financial data Before we go on, we make a few general remarks. Instead of studying the prices x t , it is common to consider the increments Z t = x t+1 − x t or log returns R t = log x t+1 − log x t of every day. Their distribution is determined, and usually has heavier tails than normal distribution. Here we study order patterns in the original data, however. Returns come in automatically when we compare values: x t+1 > x t means that the increment and log return of day t are positive. Our order patterns can be considered as combinations of signs of returns for different days, and different time lags.
In contrast to time series which are measured with a fixed sampling rate, daily financial data come with an irregular time scale. At weekends as well as on holidays, the stock market is closed. We pretend that Monday is the day following Friday, although we know that much business is going on throughout the weekend. Moreover, there are days where no trade takes place, and other days are very busy. It was suggested that the time scale should be modulated by the trading volume, but this is not done. As a consequence, studies of daily market data by classical methods of time series analysis, as autocorrelation and frequency spectrum, do not reveal clear structure.
On the positive side, it has often been observed that financial series have a nice symmetry property. They are statistically self-similar in the sense that a magnification of the function at one place looks very similar to the original function at another place [20]. A random process in continuous time, Y t , t > 0 is said to be self-similar if there is an exponent H > 0 such that For details, see [16] or [15], section 8.9. Brownian motion with H = 1 2 is the standard example. Certain Levy Processes and fractional Brownian motion are also self-similar, and have been used as models of financial data. Usually, equation (4) is only considered for the one-dimensional distribution of the Y t , and there are methods to determine the Hurst exponent H from data, see [20,29]. The mathematical definition applies to the whole process, however. Thus for every n, d ≥ 1 (X d , ..., X nd ) has the same distribution as d H · (X 1 , ...., X n ) .
This multidimensional setting is practically impossible to check with statistical methods. Using order patterns, we now define a simpler concept. Note that (X 1 , ...., X n ) and c · (X 1 , ...., X n ) represent the same order pattern for each positive constant c. So let us say that a process Y 1 , Y 2 , ... is order self-similar if for every n, d ≥ 1 (X d , ..., X nd ) has the same pattern distribution as (X 1 , ...., X n ) .
Proposition 2. Each self-similar process is order self-similar. If the process is also ergodic and has stationary increments, then p π = 1 m · m d=1 p π (d) is a consistent and unbiased estimator of the pattern probability of π, for any fixed m ≥ 1.
The proof follows from Proposition 1 and the remark above. Thus we can use order self-similarity to improve our estimators, as already done in the previous section. Moreover, order self-similarity is a parameter-free concept -we do not need a Hurst exponent. And it seems rather easy to check this concept with sufficiently many data.
Problem. It is not known whether all order self-similar processes are self-similar. Note that self-similar processes are not stationary. The standard examples mentioned above have stationary increments, however. Are there stationary order self-similar processes?
Why should a daily financial series be self-similar? There are small and big orders on the market. Some shareholders keep their items for a long time, others will sell them immediately. Briefly, there are actions of all sizes which produce structure of all sizes in the time series. This may be true, but in our opinion the irregular time sampling of daily financial series contributes a lot to their self-similarity.
Our time series x t mixes time differences over one day, like Monday to Tuesday, with three-day differences from Friday to Monday. When we consider lag 3, like Monday to Thursday, then most differences cover five days, like Thursday to Tuesday. With such an irregular scale, our only chance seems to postulate that order pattern probabilities do not depend on the lag, at least for small lags. This postulate is supported by empirical data like Figure 4.
Henceforth, we shall assume that financial data fulfil order self-similarity in the sense of (6), for d ≤ 6. We use this assumption to improve estimates by applying Proposition 2 with m = 3 or m = 6. It makes no sense to go to much larger m. On one hand, the data will not fulfil (6) for large d, on the other hand the variance of estimates p π (d) increases with d, as we shall see below.
7 Turning rate and up-down balance Pattern frequencies are not so easy to interpret. Certain combinations of pattern frequencies are more meaningful. Let us fix the lag d = 1 for simplicity. The permutation entropy of order n is a sum over all patterns of length n.
This was introduced as a measure of complexity of the data [6]. See [2,33,1,3] for details, applications and related concepts. Here we shall focus on two simpler quantities. For patterns of length 2, we consider the up-down balance β = p 12 − p 21 = 2p 12 − 1 .
This parameter measures asymmetry, non-Gaussianity or irreversibility of the process [7,4,9]. If there is an increasing or decreasing trend in the series, then β will be positive or negative, respectively. In the extreme case of an increasing or decreasing time series β assumes its maximum 1 and minimum −1, respectively. However, β can also be positive in a stationary process. For instance, a stock price could increase always from Monday to Friday, and then fall to the level of last Monday over the weekend. In that case, β = 4 5 − 1 5 = 3 5 . Many time series in nature, for example the daily number of sunspots [7], show similar behaviour. A random walk, that is, a process with independent stationary increments, will have no trend if the increment Z = X t+1 − X t has zero mean. The median need not be zero, and in that case β = 2P (Z > 0) − 1 will be positive or negative. There are also asymmetric Levy processes where the mean of Z does not exist. An autoregressive process with exponential noise is stationary and has nonzero β for small d. An example is discussed at the end of this paper. For financial data, β often differs from zero, but not too much, as Table 4 indicates for our oil data.
Our other important parameter is based on patterns of length 3. It is the turning rate α = p 132 + p 231 + p 213 + p 312 = 1 − p 123 − p 321 , the relative amount of local maxima and minima (turning points) in the series. This is an intensity parameter which varies between 0 and 1. The extreme case zero is approached by a smooth time series which has very few turning points. The other extreme is an alternating series, like x t = (−1) t , for which each point is a turning point. For a series of independent random numbers (white noise) we have α = 2 3 , and for Brownian motion α = 1 2 . In previous work [7,4,9], the author considered the persistence τ = p 123 + p 321 − 1 3 = 2 3 − α instead of α as a function of d. This function shares many properties with autocorrelation. It is zero for white noise, for instance. However, turning rate was essentially introduced in [11] and is a very intuitive concept while the word 'persistence' is used with various meanings even in mathematics. So τ is not used here.
H, β, and α can be considered for arbitrary lag d ≥ 1. So they become functions of d which can be used as correlation functions. This is very useful in the case of big data series from weather or medicine where certain periodicities appear on various scales [9]. The result shown in Figure 1 was obtained with the turning rate for d = 4. For financial data, which are approximately order self-similar, we have essentially only one value α for small d. But Proposition 2 provides different ways of calculation, in particular The same holds for β. In Table 4 we compare α 1 , β 1 . with α 3 , β 3 . We see that the turning rate of our oil data is 1 2 in all periods, with very small fluctuation. The differences of β from zero are a bit larger. The z-values are obtained from Proposition 3 below.

The variance of α and β
Defined in terms of the estimators p π , the quantities H, α, and β are in fact themselves estimators of respective parameters of the underlying process. For instance, β estimates b = P (X 1 < X 2 ) − P (X 1 > X 2 ).
For an ergodic process with stationary increments, Proposition 1 implies that α and β are consistent and unbiased estimators. The entropy H, given by a continuous but nonlinear function of the p π , is consistent but not unbiased. Now we determine the quadratic error of these estimators, at least for the case of Brownian motion. In other words, we calculate the variance of the functions α and β, taken over all possible realizations of BM. We start with a basic fact. , E(β) = 0 , V ar(β) = 1 T − 1 .
Proof. The increments Z t = X t+1 − X t of BM are independent and have positive sign with probability 1 2 . Thus counting the number n 12 of up-steps t with Z t > 0 is really the same as tossing T − 1 fair coins and counting heads. Then p 12 = n 12 /(T − 1) has mean 1 2 and variance 1 4(T −1) , and the assertion for β = 2p 12 − 1 follows. A point t between 2 and T − 1 is a turning point (local maximum or minimum) of the time series x 1 , ..., x T if the increments z t−1 = x t − x t−1 and z t = x t+1 − x t have the same sign. The events to be counted are now A t = {Z t−1 Z t > 0} for t = 2, ..., T − 1. Since the signs of the Z t are independent and are + with probability 1 2 , we have P (A t ) = 1 2 for each t. When we show that the A t are independent, we do again have a series of fair coin tosses. From the definition of A t , it is clear that A t is independent on all A s with s ≤ t − 2. It remains to show that A t−1 and A t are independent: The rest of the proof is similar as for β.
We apply the result to α 1 and β 1 in Table 4, to test the fit of Brownian motion to our oil price data. The z-value of α 1 is obtained as for β 1 we calculate z = β √ T − 1. We compare with the standard normal distribution. For turning rates, only the period 2001-08 showed significant differences. The z-values of β are larger, but the significance for 2001-08 and for the whole time series can be attributed to trend. On the whole, BM is a good model for order patterns of the oil prices. Proposition 3 is paradigmatic because it shows that for all estimators p π (d) we can expect standard deviation and confidence bound of the form C/

√
T where C is a constant. However, beware of correlations! The proposition is not true for lags d greater than 1. The reason is that Z t (d) = X t+d − X t and Z t+1 (d) = X t+1+d − X t+1 are not independent since contain the common part X t+d − X t+1 , and this term is likely to become the main term when d is large. So there is positive correlation among the sets A t , which increases the variance. For BM this can be studied rigorously but it is easier to perform a simulation. The result, shown in Figure 8, is surprising. The variance of both α and β increases almost linearly with d. For α the slope of the line is about 1 3 , and the average α 3 has much smaller variance than α(1). For β, however, the slope is about 4 5 . The increase is so strong that averaging over different lags does not improve the accuracy of the estimator.
We have no space to extend this discussion, but for historical reasons we should mention the groundbreaking result of [11]. He considered turning points for white noise as model for series of astronomical measurements [12]. His 1874 paper was one page without proof. On the last page of his second paper, it is noted that J. Bertrand presented a proof on the next session of the French society for mathematics and astronomy. We slightly modify the result and add the statistics of up-steps.
Bienaymé's theorem. Consider a sequence of T independent and identically distributed random variables. The number V of turning points is asymptotically normal with The number U of up-steps is asymptotically normal with Proof. In an i.i.d. sequence of length m, all permutations of length m are represented with the same probability. This will be the main argument.
We start with up-steps. Note that U = We used that M (U s U t ) = 1 4 for |s−t| > 1 because of independence. Now M (U 2 t ) = 1 2 and M (U t U t+1 ) = For turning points, let V t = 1 {t is turning point} for t = 2, ..., T − 1. Then M (V t ) = p 132 + p 213 + p 231 + 2 as a sum, we now have to subtract 4 9 instead of 1 4 , and we can neglect terms with |s − t| > 2 for which V s and V t are independent. We obtain M (V t V t+1 ) = 10/24 by looking at Table 1 and realising that 10 of the 24 permutations have two turning points in the middle. In terms of the covariance we get Cov(V t , V t+1 ) = M (V t V t+1 ) − 4 9 = −1/36. It takes a little longer to see that exactly 54 of 120 permutations of length 5 have turning points at their second and second-to-last position. This implies that Cov gives the result after a brief calculation. Asymptotic normality for T → ∞ should be rigorously formulated in terms of standardized random variables. In Bienaymé's time it was taken for granted. Today it follows from central limit theorems for k-dependent or strongly mixing sequences of random variables.
Note that the variances here are even smaller than for the binomial distribution in Proposition 3, because of negative correlations. This often happens for the variance of pattern frequencies p π since many patterns, for instance 132, are not compatible with their shifted pattern. Moreover, Bienaymé's theorem holds for any lag d, in contrast to Proposition 3. But now let us turn to change points.

Change points with respect to mean and order structure
Certainly there is no unique way to determine change points in a financial time series. From the mathematical point of view, the simplest way is to look for changes in mean. A classical method takes the time point k where the difference of the means m k = 1 x t assumes a maximum or minimum. Thus we maximize the function The factor c k normalizes the standard deviation of the difference of means, as in a two-sample t-test. (We did not estimate the standard deviation since that would require information on autocorrelation.) We put the constant 2 into c k so that the factor is 1 for all k in the middle of the interval, and f (k) can be interpreted as a difference. As seen in Figure 9, the function f for the WTI series is rather smooth and has a unique maximum point k 1 in the strictly increasing part in the middle of the series in Figure 5, in late 2004. All x t with t < k 1 are smaller, and almost all x t with t > k 1 are greater than the price at k 1 . The average difference is 50 $. This is a clear solution to the optimization problem. However, it is not a point where change happens.  We can also detect change in the mean locally, by comparing the last 100 points before x k with the mean of the next 100 points. As shown in Figure 10, this results in two clear maximum points. One is in the middle of the 2008 period of rapid decrease. The other one at the beginning of decrease in 2014 seems a real change point. A lot of smaller maxima indicate smaller periods of increase and decrease. Now we turn to order structure. Let q k = (q 1 , ..., q 24 ) denote the vector of order 4 pattern frequencies for the time interval [1, k], andq k the respective vector for the time period [k + 1, T ]. Let v denote the Euclidean norm of a vector v. Similar as in [30], we consider the function where the constant c k is the same as in (10). It serves to diminuish the bad estimates of frequencies for k near the marginal values 1 and T, and can be justified by the 1/ √ T magnitude of the standard deviation of the pattern frequencies discussed above. Maxima of g are taken as change points for order. The function g(k) is shown in three versions in the upper panel of Figure 11. For lag d = 1, no clear change points are found. The maximum is near the right margin and turns out to be a random effect too large to be eliminated by c k . When we take q from a mean of lags, 1 to 3, or 1 to 6, we see clear maxima at the end of 1998, in 2008 and 2013/14. The maximum at the left margin is again a random effect due to high variance of g for small k. On the right, the maximum found for d = 1 has disappeared.
When we compare with the time series in Figures 5, we see that the peaks in July 2008 and July 2014 were included in our ad hoc segmentation. The peak in December 1998 is also a convincing change point. Probably it is a better choice than our first segmentation point in October 2001. And the peak in August 2013 is also reasonable. It could be taken as an alternative to July 2014, as can be seen in Figure 10. Moreover, the two means of lags give almost identical results, with few days difference in their maximum points. Distances of means over 1 to 6 are on a larger level than those from means over 1 to 3, because of larger variance discussed at Figure 8.
In the lower panel of Figure 11, the same results are shown for patterns of length 3. Again, d = 1 is not a good option. It was surprising that for means over lags 1 to 3 and 1 to 6 we get the same structure of maxima as for length 4. This indicates that with respect to change points the six patterns of length 3 contain all relevant information.  Figure 12: Difference before and after k of order 3 permutation entropies, conditional entropies recommended by [32], and turning rates. These methods seem not appropriate for segmentation of financial data.
Can we simplify even further? There are three objectives: the function g should become less erratic, more like the function f for the mean. The criterion should be transparent, interpretable, as it is for the mean. And the estimates of structure before and after should be fairly accurate even for small k or T − k. This was not the case for (11) where 1000 values would be minimal for a reasonable estimate of all 24 pattern frequencies. Even with c k there are marginal effects in Figure 11.
In the upper panel of Figure 12. we tested the function h(k) = c K (H k −H k ) where H,H denote permutation entropy of order 3 before and after k, respectively. The maximum in July 1998 is a realistic change point, but it is not convincing. In the middle panel we used conditional permutation entropies recommended by [32]. They work well for time series from certain dynamical systems with a Markov structure. We have the same function h(k) with H = 4 i=1 −p i log p i + p log p + (1 − p) log 1 − p where p = p 12 , p 1 = p 123 , p 2 = p 132 + p 231 , p 3 = p 213 + p 312 and p 4 = p 321 . The function is rather smooth but there is no clear maximum. Our data do not include a Markov structure. In the lower panel we used turning rates, h(k) = c k (α k −α k ). As noted in Section 2 they worked so well for classifying sleep brain data. Here they give almost the same result as permutation entropy. All three functions seem not appropriate for segmenting financial data.

Change points with respect to up-down balance
Our last trial is the function β. It will give the best results. We look for maxima and minima of h(k) = c k (β k −β k ) where β k andβ k are the values of β on [1, k] and [k + 1, T ], respectively. Note that by Proposition 3, c k is exactly the normalizing constant for β and α, at least for lag 1. Figure 13 shows that lag 1 alone is sufficient only from 1992 to 2002. The other two estimators of β give very similar functions. Their main maximum and minimum points are very near to the maxima for the order structure in Figure  11. The first change point k 1 has to be taken in August 2013. Our ad hoc point in July 2014 would be a possible alternative choice. As in the approach with means, we can now look for a second change point k 2 in the longer subinterval, in this case [1, k 1 ]. Figure 14 shows that the second change point is unique when we exclude margins. It is a minimum, which means that prices start to rise there, and can already be seen in Figure 13. Here it is comes one month later than in Figure 11, in early February 1999. When we look for a third change point in the interval [k 1 , k 2 ] we again get a unique solution which is already seen as maximum in Figure 14. This point in July 2008 differs only by one day from our ad hoc point with the highest oil price. Thus β produces a meaningful segmentation which is similar to, and better than, our ad hoc segmentation.  One can also search for local change points, comparing β on the intervals [k−m+1, k] and [k+1, k+m]. The situation is similar as for means: the local search gives a lot of maxima and minima, which depend very much on the scale parameter m. Figure 14 shows the result for m = 150. Two large maxima in 2008 and 2014 are similar to those in Figure 9. Nevertheless, local search offers too many choices.
For financial data, global search with β seems the best order method for finding change points. It provides more meaningful results than the approach with means.
What about significance of h(k 1 ) ? In view of Proposition 3, we can think about random walk statistics to provide p-values. Calculation of β on intervals [1, k] can be seen as a simple symmetric random walk b 1 , ..., b T with T − 1 time steps. The initial value is b 1 = 0, and the terminal value b T is the observed total number of up-steps. The probability p c that the walk will hit the lines y = ±c can be expressed by binomial coefficients.
This exciting idea has several shortcomings. First, we need the difference between relative frequencies, normalized by c k , while random walk concerns absolute values. Next, it would work only for lag 1, which from a practical viewpoint is not the best option for change point detection.in financial time series. The most serious problem, however, is that the question is wrong. It is easy to define changes in the mean or dynamics of model series. But do we really expect similar dynamical changes in a given real-world time series?
For β as mean over lags 1 to 3, we have the maximum value h(k 1 ) = 0.548 in Figure 13. We checked the significance of this change point by simulation, with Brownian motion as null model. The maximum value of |h(k)| = c k · |β k −β k | was determined for 1000 trajectories of BM. It turned out that for 422 simulations, the maximum was larger than 0.548. Almost every second realization of BM had a change point with larger value |h(k 1 )| than our data. This is alright! BM is not stationary. A time series taken from BM needs segmentation, like the data. Any reasonable method should detect change points in BM. As a null model, we better take a stationary series which by definition has no change points. We first took an AR(1) model x t = 0.99x t−1 + z t with i.i.d. Gaussian random numbers z t . We found that only 2 from 1000 realizations have larger |h(k 1 )| than our data. So for stationary series, we get significantly smaller values in the search for change points.  Figure 15: Sample from an AR(1) with exponential noise, with positive β(d) for small d. Although the process is stationary by definition, this cannot be seen in our figure, and segmentation may be necessary.

AR(1) process with exponential noise
We checked this with the same AR(1) process with exponential noise, z t = 1 − e t , where e t has an exponential distribution with mean 1. These random numbers have their heavy tail on the left, so the process will make downwards excursions and only slowly climb up. As a result, β(1) ≈ 0.25, and all β(d) with d ≤ 6 are still greater 0.1. Nevertheless, only two from 1000 simulations gave a maximum larger 0.548. Then we realized that the excursions are still too short, and increased the AR(1) factor to 0.998. A typical realization with maximum h-value 0.32 is shown in Figure 15. For this process, 92 from 1000 simulations had maxima greater 0.548.
These experiments gave the impression that β is a useful parameter for change point detection. It will provide large values of h when we expect them. But this is exploratory research, there is still much work to do. We hope that readers feel encouraged to perform their own studies with order patterns.