Multidimensional Permutation Entropy for Constrained Motif Discovery
Abstract
Constrained motif discovery was proposed as an unsupervised method for efficiently discovering interesting recurrent patterns in timeseries. The de facto standard way to calculate the required constraint on motif occurrence locations is change point discovery. This paper proposes the use of timeseries complexity for finding the constraint and shows that the proposed approach can achieve higher accuracy in localizing motif occurrences, and approximately the same accuracy in discovering different motifs, at three times the speed of change point discovery. Moreover, the paper proposes a new extension of permutation entropy for estimating the complexity of multidimensional timeseries and shows that the proposed extension outperforms the state-of-the-art multidimensional permutation entropy approach both in speed and in usability as a motif discovery constraint.
1 Introduction
Multidimensional time series data are generated by all kinds of industrial and scientific activities at an ever increasing rate. Finding interesting patterns that carry useful information in these data is a daunting task that is attracting more attention in the age of big data. Several information discovery methods rely on the ability to discover approximately recurring short patterns in long timeseries. These patterns are called motifs.
The history of motif discovery (MD) is rich and dates back to the 1980s [17]. The problem originated in the bioinformatics research community, focusing on discovering recurrent patterns in RNA, DNA and protein sequences. Since the 1990s, data mining researchers have shifted their attention to motif discovery in real-valued timeseries, and several formulations of the problem, and solutions to these formulations, have been and are still being proposed.
One of the earliest approaches was to discretize the timeseries (usually, but not necessarily, using SAX [4, 6]) and then apply a discrete motif discovery algorithm from the bioinformatics literature. The most widely used such algorithm is Projections [4]. The main idea of the algorithm is to select several random hash functions and use them to hash the input sequences. Occurrences of the hidden motif are expected to hash frequently to the same value (called a bucket) with a small proportion of background noise. An extension of Projections-based MD was proposed by Tanaka [22] that uses the minimum description length (MDL) principle and PCA to handle multidimensional timeseries and to find only statistically significant motifs.
Another popular approach (used for example by the MK algorithm [20]) is to efficiently find the most similar pair of subsequences of a given length in a single-dimensional timeseries, using the triangular inequality to prune unneeded calculations. Several extensions were proposed that can discover multiple pairs [16], multiple motif lengths [11] and full enumeration of scale-invariant motifs [12, 13, 19].
One problem with both of the aforementioned approaches is that they do not scale well to very long time series (with lengths in the millions of points). Stochastic motif discovery does not suffer from this problem because its computational requirements can be adjusted to match available computational resources. Stochastic motif discovery algorithms sample subsequences from the timeseries and compare them using some predefined distance function, searching for recurrent patterns with small distances. The simplest of these algorithms was proposed by Catalano et al. [2]; it samples subsequences randomly (i.e. from a uniform distribution over subsequence start points) and then uses the distances between short overlapping parts of these subsequences to discover whether two full occurrences of the same motif exist in the sampled subsequences. This approach is only viable when motifs appear frequently enough in the timeseries that random sampling has a chance of capturing two complete occurrences.
Constrained Motif Discovery (CMD) is a special case of stochastic MD which utilizes some constraint to bias the sampling process toward parts of the timeseries in which there is a higher probability of finding a motif occurrence. This can be achieved using domain knowledge but, more interestingly, it can also be achieved by utilizing a change point discovery (CPD) algorithm. This use of CPD for seeding MD is based on the assumption that, when motif occurrences do not overlap or immediately follow each other, a change in the generation process must – by definition – happen at the beginning and end of each occurrence; CPD can then be used to find these locations and bias the search process of MD. Variants of this approach include MCFull and MCInc [9], Distance Graph Relaxation [7], GSteX [8] and the shift–density based approach [14].
In this paper, we propose a new approach for finding the constraint used by constrained MD algorithms through analysis of the complexity of timeseries sequences. The advantage of relying on complexity measures instead of change point scores for constructing the constraint is that it automatically finds motifs that have interesting/complex structure, avoiding the common problem of finding trivial motifs faced by CPD-based algorithms. Moreover, this paper will show that the proposed method is also faster to execute.
Bandt and Pompe proposed Permutation Entropy (PE) as a complexity measure for time series [1], which is similar to Lyapunov exponents and is applicable to any type of timeseries. One of the main limitations of the PE algorithm in its original form is its inability to handle multidimensional timeseries.
Several approaches have been proposed for a multivariate/multidimensional version of PE. Multivariate Permutation Entropy (MPE) focuses exclusively on the order of values across channels at the same time step [5].
Multivariate Multi-Scale Permutation Entropy (MMSPE) applies PE to a matrix, taking into account both the order of a single channel at different times and the order across different variables, and simply combines the pattern frequencies over the whole matrix. Moreover, MMSPE uses straightforward multiscaling through averaging [18]. This method is the most closely related to the proposed approach and we use it as the baseline when evaluating the proposed method.
We propose an extension of PE that is more efficient than MMSPE and MPE, as it takes into consideration timeseries values at different channels and different time steps.
The rest of this paper is organized as follows: Sect. 2 gives a mathematical definition of Permutation Entropy. Section 3 details our proposed algorithm, which is evaluated in Sect. 4. Section 5 concludes the paper.
2 Permutation Entropy
Consider a single-dimensional time series \(x_t\) for \(t=1,2,\ldots,T\), and its time-delay embedding representation \(X_i^{n,\tau } = (x_i, x_{i+\tau }, \ldots, x_{i+(n-1)\tau })\) for \(i = 1,2,\ldots,T-(n-1)\tau \), where n is the embedding dimension or permutation order and \(\tau \) is the time delay that represents the time difference between the sample points of each segment. Each embedding vector is mapped to one of the \(n!\) possible ordinal patterns according to the relative order of its values.
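For completeness, the PE itself is then obtained from the relative frequencies of the \(n!\) ordinal patterns, following Bandt and Pompe [1] (with \(p(\pi )\) denoting the relative frequency of pattern \(\pi \) among the \(T-(n-1)\tau \) embedding vectors):

```latex
% Permutation entropy of order n, summed over the set \Pi_n of the n!
% possible ordinal patterns; it is often normalized by \ln(n!) so that
% 0 corresponds to a fully regular and 1 to a fully random series.
H(n) = -\sum_{\pi \in \Pi_n} p(\pi)\,\ln p(\pi),
\qquad
0 \;\le\; \frac{H(n)}{\ln(n!)} \;\le\; 1
```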
3 Multidimensional Permutation Entropy Variant
Given a multidimensional time series \(\mathbf x _t \in \mathbb {R}^N\) for \({t = 1,2,3,\ldots,T}\), where N is the dimensionality of the time series, we propose ordering the points \(\mathbf x _t\) based on their values [1] using the distance between every point of the time series and a reference point \(\mathbf q \). The rest of the PE calculation is the same as in Sect. 2. This technique can intuitively be understood as ordering the timeseries points along a single-dimensional projection specified by the reference point \(\mathbf q \), converting the problem into a single-dimensional one. Three variants are considered:
 1. Euclidean distance (see Eq. (3)), using the first point of the time series, \(\mathbf x _1\), as the reference point \(\mathbf q \):$$\begin{aligned} d(\mathbf x ,\mathbf q )=\sqrt{(x_1 - q_1)^2 +(x_2 - q_2)^2 + (x_3 - q_3)^2 + \ldots + (x_N - q_N)^2} \end{aligned}$$(3)
 2. Manhattan distance (see Eq. (4)), using the first point of the time series, \(\mathbf x _1\), as the reference point \(\mathbf q \):$$\begin{aligned} d(\mathbf x ,\mathbf q )= |x_1 - q_1| + |x_2 - q_2| + |x_3 - q_3| + \ldots + |x_N - q_N| \end{aligned}$$(4)
 3. Normalized distance, which uses Eq. (3) but employs the zero point \(\mathbf 0 \) as the reference point.
The resulting single-dimensional time series of distances encodes an ordering of the successive time series points and can be used directly as input to PE for complexity estimation. This conversion of a multidimensional PE problem into a standard single-dimensional PE problem by utilizing a distance measure is called MPE hereafter (Multidimensional PE).
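To make the conversion concrete, the following is a minimal Python sketch of the three variants (an illustrative reimplementation under the definitions above, not the authors' original code; the function names are ours):

```python
import math

import numpy as np


def permutation_entropy(x, n=3, tau=1, normalize=True):
    """Single-dimensional permutation entropy (Bandt & Pompe)."""
    x = np.asarray(x, dtype=float)
    m = len(x) - (n - 1) * tau          # number of embedding vectors
    if m <= 0:
        raise ValueError("time series too short for this n and tau")
    counts = {}
    for i in range(m):
        window = x[i : i + n * tau : tau]                    # embedding vector
        pattern = tuple(np.argsort(window, kind="stable"))   # ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / m
    h = -np.sum(probs * np.log(probs))
    return h / math.log(math.factorial(n)) if normalize else h


def multidimensional_pe(X, n=3, tau=1, variant="euclidean"):
    """Project each N-dimensional point to its distance from a reference
    point q, then apply standard PE to the resulting 1-D series."""
    X = np.asarray(X, dtype=float)
    if variant == "euclidean":      # Eq. (3): q = first point of the series
        d = np.linalg.norm(X - X[0], axis=1)
    elif variant == "manhattan":    # Eq. (4): q = first point of the series
        d = np.abs(X - X[0]).sum(axis=1)
    elif variant == "normalized":   # Eq. (3) with q = the zero point
        d = np.linalg.norm(X, axis=1)
    else:
        raise ValueError("unknown variant")
    return permutation_entropy(d, n=n, tau=tau)
```

A perfectly monotone series produces a single ordinal pattern and hence zero normalized entropy, while white noise approaches one; the multidimensional variants inherit this behaviour through the distance projection.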
4 Evaluation
4.1 Dataset
We use the OPPORTUNITY activity recognition dataset [3], which contains three groups of sensors:
 1. Body-worn sensors: 7 inertial measurement units, 12 3D acceleration sensors and 4 sources of 3D localization information.
 2. Object sensors: 12 objects with 3D acceleration and 2D rate of turn.
 3. Ambient sensors: 13 switches and 8 3D acceleration sensors.
In this paper, only the body-worn sensors are used. These sensors give a 36-dimensional timeseries. One run from each user was used for hyperparameter estimation and the remaining five runs were used for evaluation. The same hyperparameters were used for all PE-based algorithms including the proposed variants. Sensor data was annotated at different levels of activities (low level, medium level and high level). We consider only the mode of locomotion annotation, with classes labelled as [\(Stand=1, Walk=2, Sit=3, Lie=4\)]. These classes define different motifs that repeat in the time series in slightly different ways, corresponding to the definition of approximately recurring motifs.
Figure 2 shows a sample from OPPORTUNITY acceleration sensors data, and their locomotion labels. Missing data was ignored.
4.2 Evaluating Motif Boundary Discovery
One complication to this analysis is that constrained motif discovery algorithms do not require the constraint to have a high score exactly at the motif occurrence boundary, only near it. This means that direct calculation of relative entropy is too conservative for estimating the performance of CMD algorithms given any constraint.
The main idea behind ESR is to treat the change scores as a probability density function (pdf) and find the probability that a random sample from this pdf will fall within a given allowed delay from a true change point. This notion of sampling equivalence to the true distribution directly measures the appropriateness of the algorithm for use in Constrained Motif Discovery [9], but it also corresponds to our subjective sense that a better motif occurrence boundary discovery algorithm is one that gives high scores near true motif occurrence boundaries and low scores elsewhere. The main difference between this approach and relative entropy is that it incorporates the concept of a neighbourhood of acceptable shift.
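The pdf-and-neighbourhood intuition described above can be sketched as follows; note that this is an illustrative approximation (with an interface of our own choosing), while the exact ESR formulation is given in [9, 10]:

```python
import numpy as np


def esr_sketch(scores, true_points, delay):
    """Treat nonnegative scores as a discrete pdf over time steps and
    return the probability that a point sampled from it falls within
    `delay` steps of some true boundary.  Illustrative approximation of
    the ESR idea, not the exact definition from the cited papers."""
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    total = scores.sum()
    if total == 0.0:
        return 0.0                      # no score mass anywhere: worst case
    pdf = scores / total
    idx = np.arange(len(pdf))
    near = np.zeros(len(pdf), dtype=bool)
    for t in true_points:               # union of allowed-delay neighbourhoods
        near |= np.abs(idx - t) <= delay
    return float(pdf[near].sum())
```

A score series concentrated exactly at a true boundary yields the ideal value of 1, while a flat, uninformative score series yields only the fraction of time steps covered by the neighbourhoods.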
To get a general performance evaluation independent of the acceptable shift, the Aggregated ESR (AESR) [10] was also calculated for each algorithm. Figure 4a shows these results. It is clear that the proposed three variants significantly outperform MMSPE and CPD both in terms of AESR and speed.
To assess the statistical significance of these results, we applied a factorial t-test with the conservative Bonferroni correction for multiple comparisons. For \(F_{0.5}\), the only statistically significant differences were Normalized MPE vs. MMSPE (\(p=0.002\), \(t=3.35\)), Euclidean MPE vs. MMSPE (\(p=0.003\), \(t=3.12\)), Manhattan MPE vs. MMSPE (\(p=0.0049\), \(t=3.01\)) and MMSPE vs. CPD (\(p=0.003\), \(t=3.2\)), confirming that the proposed variants are better than MMSPE, which in turn is worse than CPD for the evaluated task. Differences in speed showed the same pattern of significance but are not reported due to lack of space.
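For readers unfamiliar with it, the Bonferroni correction simply scales each raw p-value by the number of comparisons performed; a comparison remains significant only if the scaled value still falls below the significance level. A minimal sketch (with illustrative p-values, not those of our experiment):

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: multiply each raw p-value by the number of
    comparisons (capped at 1.0); a comparison is significant only if the
    adjusted value is still below alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p < alpha for p in adjusted]
```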
The main limitation of this evaluation methodology is that it compares discovery of motif occurrences instead of motifs. In some applications, it may not be important to discover all occurrences of a motif but only representative samples. The next experiment evaluates the proposed variants in terms of motif discovery instead of motif occurrence discovery.
4.3 Evaluating Motif Discovery Results
In this experiment, we compared the actual motif discovery performance of the GSteX Constrained Motif Discovery algorithm [8] when the constraint is calculated using the proposed variants against both CPD and MMSPE. The same hyperparameters as in the first experiment were used. The GSteX algorithm expects a discrete set of probable motif locations instead of a real-valued score. A simple local-maxima finding algorithm was used to find these probable locations for all algorithms, as recommended by [8].
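One simple way to turn a real-valued constraint into the discrete candidate locations GSteX expects is to keep well-separated local maxima of the score; the following is our sketch of such a procedure, not necessarily the exact implementation used in [8]:

```python
import numpy as np


def local_maxima(scores, min_separation=1):
    """Indices of strict local maxima, keeping only peaks at least
    `min_separation` samples apart (higher peaks win).  Sketch of how a
    real-valued constraint can be reduced to discrete candidate motif
    locations; not the exact procedure of the cited work."""
    s = np.asarray(scores, dtype=float)
    cand = [i for i in range(1, len(s) - 1)
            if s[i] > s[i - 1] and s[i] > s[i + 1]]
    cand.sort(key=lambda i: s[i], reverse=True)   # strongest peaks first
    kept = []
    for i in cand:
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)
    return sorted(kept)
```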
Table 1. Evaluation metrics of motif discovery using different constraints
Algorithm  Precision  Recall  Accuracy  \(F_{0.5}\)
Normalized MPE  1 ± 0  0.833 ± 0.15  0.856 ± 0.15  0.962 ± 0.043
Euclidean MPE  1 ± 0  0.822 ± 0.141  0.837 ± 0.141  0.957 ± 0.041
Manhattan MPE  1 ± 0  0.856 ± 0.135  0.825 ± 0.135  0.954 ± 0.039
MMSPE  1 ± 0  0.667 ± 0.181  0.675 ± 0.182  0.896 ± 0.072
CPD (m = 100, k = 300)  0.991 ± 0.039  0.889 ± 0.157  0.882 ± 0.154  0.962 ± 0.051
4.4 Sensitivity Analysis
In this experiment, we evaluate the effect of hyperparameters on the quality of motif discovery (using the same evaluation methodology).
Table 2. \(F_{0.5}\) values for different window sizes for motif discovery
Algorithm/Window size  100  200  400  800  1600
Normalized MPE  0.959  0.962  0.962  0.959  0.962
Euclidean MPE  0.952  0.958  0.957  0.957  0.957
Manhattan MPE  0.95  0.965  0.954  0.954  0.957
MMSPE  0.901  0.897  0.896  0.9  0.903
CPD  0.971  0.974  0.962  0.958  0.98
Table 3. Effect of n on the performance of the proposed variants (\(F_{0.5}\))
Proposed variant  \(n=3\)  \(n=4\)  \(n=5\)  \(n=6\)  \(n=7\)  \(n=8\)  \(n=9\)  \(n=10\)
Normalized MPE  0.962  0.962  0.962  0.959  0.959  0.959  0.952  0.958
Euclidean MPE  0.967  0.954  0.957  0.966  0.955  0.955  0.959  0.959
Manhattan MPE  0.962  0.954  0.954  0.978  0.937  0.952  0.963  0.962
Table 4. Effect of \(\tau \) on the performance of the proposed variants at \(n=3\)
Proposed variant/Value of \(\tau \)  1  5  10  20  30  40  50  60  70  80  90
Normalized  0.962  0.962  0.962  0.959  0.962  0.962  0.962  0.962  0.962  0.959  0.962
Euclidean  0.967  0.935  0.981  0.963  0.961  0.94  0.961  0.959  0.95  0.95  0.944
Manhattan  0.962  0.975  0.966  0.946  0.966  0.966  0.966  0.951  0.962  0.952  0.944
4.5 Discussion
Taken together, the results of the first two experiments reported in this section show that the proposed variants provide a better constraint for motif discovery than either change point discovery or MMSPE in terms of occurrence boundary discovery (Figs. 3 and 4a) and outperform MMSPE in motif discovery (Table 1). Moreover, this performance is achieved while being two times faster than MMSPE and three times faster than CPD.
One limitation of the proposed method, and of this type of complexity analysis in general, is that it relies directly on the concept of entropy. It can be argued that very high entropy signals (e.g. pure noise) as well as low entropy signals are both simple, which is not captured by the proposed method. This limitation, nevertheless, does not affect our goal of finding an effective constraint for motif discovery. In the future, we will consider using more rigorously defined complexity measures based on minimum description length. Another limitation of the proposed approach is its dependence on some form of Euclidean or Manhattan distance metric, which is not always meaningful for comparing time series (consider, for example, sound signals). In the future, the use of other metrics will be explored and the proposed method will be applied to other types of signals to assess its generality.
5 Conclusion
In this paper, we proposed a new approach for calculating time series complexity using Permutation Entropy for multidimensional timeseries. The proposed complexity measure was shown to provide a better way to detect motif occurrence boundaries than standard change point discovery in constrained motif discovery applications. Evaluation results established the superiority of the proposed variants over a state-of-the-art timeseries complexity measure on a real-world evaluation dataset. Moreover, the paper showed that the proposed method is robust to variations in its hyperparameters.
References
 1. Bandt, C., Pompe, B.: Permutation entropy: a natural complexity measure for time series. Phys. Rev. Lett. 88(17), 174102 (2002). https://doi.org/10.1103/PhysRevLett.88.174102
 2. Catalano, J., Armstrong, T., Oates, T.: Discovering patterns in real-valued time series. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 462–469. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_44
 3. Chavarriaga, R., et al.: The opportunity challenge: a benchmark database for on-body sensor-based activity recognition. Pattern Recogn. Lett. 34(15), 2033–2042 (2013). https://doi.org/10.1016/j.patrec.2012.12.014
 4. Chiu, B., Keogh, E., Lonardi, S.: Probabilistic discovery of time series motifs. In: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 493–498. ACM, New York (2003). https://doi.org/10.1145/956750.956808
 5. He, S., Sun, K., Wang, H.: Multivariate permutation entropy and its application for complexity analysis of chaotic systems. Phys. A Stat. Mech. Appl. 461, 812–823 (2016)
 6. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Disc. 15(2), 107–144 (2007)
 7. Mohammad, Y., Nishida, T.: Learning interaction protocols using augmented Bayesian networks applied to guided navigation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2010, pp. 4119–4126. IEEE, October 2010. https://doi.org/10.1109/IROS.2010.5651719
 8. Mohammad, Y., Ohmoto, Y., Nishida, T.: GSteX: greedy stem extension for free-length constrained motif discovery. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds.) IEA/AIE 2012. LNCS (LNAI), vol. 7345, pp. 417–426. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31087-4_44
 9. Mohammad, Y., Nishida, T.: Constrained motif discovery in time series. New Gener. Comput. 27(4), 319–346 (2009)
 10. Mohammad, Y., Nishida, T.: On comparing SSA-based change point discovery algorithms. In: IEEE/SICE International Symposium on System Integration, SII 2011, pp. 938–945 (2011)
 11. Mohammad, Y., Nishida, T.: Exact discovery of length-range motifs. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014. LNCS (LNAI), vol. 8398, pp. 23–32. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05458-2_3
 12. Mohammad, Y., Nishida, T.: Scale invariant multi-length motif discovery. In: Ali, M., Pan, J.S., Chen, S.M., Horng, M.F. (eds.) IEA/AIE 2014. LNCS (LNAI), vol. 8482, pp. 417–426. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07467-2_44
 13. Mohammad, Y., Nishida, T.: Exact multi-length scale and mean invariant motif discovery. Appl. Intell. 44, 1–18 (2015)
 14. Mohammad, Y., Nishida, T.: Shift density estimation based approximately recurring motif discovery. Appl. Intell. 42(1), 112–134 (2015)
 15. Mohammad, Y., Nishida, T.: \(MC^2\): an integrated toolbox for change, causality and motif discovery. In: Fujita, H., Ali, M., Selamat, A., Sasaki, J., Kurematsu, M. (eds.) IEA/AIE 2016. LNCS (LNAI), vol. 9799, pp. 128–141. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-42007-3_12
 16. Mohammad, Y., Nishida, T.: Unsupervised discovery of basic human actions from activity recording datasets. In: IEEE/SICE International Symposium on System Integration, SII 2012, pp. 402–409. IEEE (2012)
 17. Mohammad, Y., Ohmoto, Y., Nishida, T.: CPMD: a MATLAB toolbox for change point and constrained motif discovery. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds.) IEA/AIE 2012. LNCS (LNAI), vol. 7345, pp. 114–123. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31087-4_13
 18. Morabito, F.C., Labate, D., Foresta, F.L., Bramanti, A., Morabito, G., Palamara, I.: Multivariate multi-scale permutation entropy for complexity analysis of Alzheimer's disease EEG. Entropy 14(7), 1186–1202 (2012)
 19. Mueen, A.: Enumeration of time series motifs of all lengths. In: 2013 IEEE 13th International Conference on Data Mining (ICDM). IEEE (2013)
 20. Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: SIAM International Conference on Data Mining, SDM 2009, pp. 473–484 (2009)
 21. Riedl, M., Müller, A., Wessel, N.: Practical considerations of permutation entropy. Eur. Phys. J. Spec. Top. 222(2), 249–262 (2013)
 22. Tanaka, Y., Iwamoto, K., Uehara, K.: Discovery of time-series motif from multi-dimensional data based on MDL principle. Mach. Learn. 58(2/3), 269–300 (2005)