Ultra-fast meta-parameter optimization for time series similarity measures with application to nearest neighbour classification

Nearest neighbour similarity measures are widely used in many time series data analysis applications. They compute a measure of similarity between two time series. Most applications require tuning of these measures’ meta-parameters in order to achieve good performance. However, most measures have at least $O(L^2)$ complexity, where L is the length of the series, making them computationally expensive and the process of learning their meta-parameters burdensome, requiring days even for datasets containing only a few thousand series. In this paper, we propose UltraFastMPSearch, a family of algorithms to learn the meta-parameters for different types of time series distance measures. These algorithms are significantly faster than the prior state of the art. Our algorithms build upon the state of the art, exploiting the properties of a new efficient exact algorithm which supports early abandoning and pruning for most time series distance measures. We show on 128 datasets from the UCR archive that our new family of algorithms is up to an order of magnitude faster than the previous state of the art.


Introduction
Time series distance measures are used in a wide range of time series data mining tasks, including similarity search [5,21,24,32], classification [3,18,34,37,43], regression [33], clustering [8,23], indexing [39], and motif discovery [1]. All these tasks rely on nearest neighbour (NN) search, which is widely known to be most effective when the meta-parameters of the distance measures are learnt [3,18,37]. For instance, the Dynamic Time Warping (DTW) distance proves to be most effective when constrained by the right warping window (WW) [8,26,34]. Indeed, unconstrained DTW is subject to pathological warping, leading to unintuitive alignments [34] where a single point of a time series is aligned to a large section of another series [14].
Fig. 1 UltraFastWWSearch vs the naive LOOCV approach and the state-of-the-art FastWWSearch. Total training time on the 6 largest datasets from [7] in terms of N × L
Traditionally, learning the meta-parameters has been a time-consuming leave-one-out cross-validation (LOOCV) process that requires computing the distance between every pair of training instances, for each value of each meta-parameter [7,8,18,35,37]. The problem is compounded by the quadratic time complexity of most distance measures. Take DTW, for example: with a training dataset of N time series of length L, learning the WW naively requires $O(N^2 \cdot L^3)$ operations ($O(L^2)$ per distance computation, for each of the L possible windows; Sect. 2.7). This is extremely slow, even for datasets with only a thousand time series. For instance, the naive approach took 58 h to learn the best WW on the StarLightCurves dataset from the UCR archive [7], which only has 1000 training instances of length 1024 (Fig. 1). In comparison, the prior state-of-the-art method, FastWWSearch, took 30 min, while our proposed approach took only 4 min. Similar improvements can be seen across all the largest (in terms of N × L) datasets from the UCR archive [7], shown in Fig. 1 (note the log scale).
The prior state-of-the-art method, FastWWSearch, which learns the best WW for DTW, is a sophisticated and intricate algorithm [34] that exploits the properties of DTW and its various lower bounds to achieve a speedup of three orders of magnitude over the naive approach. Its significance is demonstrated by FastWWSearch having received the SDM 2018 best paper award. Even though FastWWSearch achieves a 1000-fold speedup compared to the traditional LOOCV approach, it is still undesirably slow for large datasets with long time series. This is shown in Fig. 1, where FastWWSearch took almost 6 h to train on the HandOutlines dataset, one of the largest and longest datasets from the UCR archive [7], compared to just 11 min for our proposed method. FastWWSearch was later extended to other distance measures, forming the Fast Ensemble of Elastic Distances (FastEE) [37], which is 40 times faster than the original EE. In this paper, we refer to them as the FastEE approaches.
Before FastWWSearch and FastEE, there were two primary strategies for speeding up LOOCV. One was to speed up the NN search process, e.g. by using lower bounds to skip most of the distance computations [15-17, 24, 36, 41]. The other was to speed up the core computation of the distances, e.g. by approximating it [28] or by pruning unnecessary operations [31]. However, scalability remains an issue for large datasets and long series [34,37,42].
Recently, Herrmann and Webb [11] developed an efficient implementation strategy for six elastic distances, including DTW. This strategy, known as "Early Abandoned and Pruned" (EAP), relies on an upper bound beyond which the precise distance is not required. This upper bound is used to prune and early abandon the core computation of elastic distances. Nearest neighbour search naturally provides such an upper bound (Sect. 3.1). EAP demonstrated a speedup of more than an order of magnitude for several NN search tasks.
In this paper, we propose UltraFastMPSearch, a family of three algorithms to learn the meta-parameters for the six time series distance measures at the heart of the influential Ensemble of Elastic Distances (EE) algorithm [18]: Dynamic Time Warping (DTW) [27], Weighted Dynamic Time Warping (WDTW) [13], Longest Common Subsequence (LCSS) [4], Edit Distance with Real Penalty (ERP) [5,6], Move-Split-Merge (MSM) [32], and Time Warp Edit Distance (TWE) [21]. We fundamentally transform the FastEE algorithm proposed in [37] to exploit the full capacity of EAP [11]. UltraFastMPSearch consists of: 1. UltraFastWWSearch, recently proposed in our IEEE ICDM 2021 paper [35], of which the current paper is an expanded version. UltraFastWWSearch was designed specifically for DTW, exploiting a DTW property called the window validity to gain further substantial speedup; 2. UltraFastLocalUB, a variant of UltraFastWWSearch extended to other distance measures that lack the window validity property; 3. UltraFastGlobalUB, a variant of UltraFastLocalUB that uses a global rather than a local upper bound, ensuring that a distance computation is only early abandoned if it cannot provide a useful lower bound for distance computations at subsequent meta-parameter values. Like FastWWSearch and all the FastEE approaches, UltraFastMPSearch is exact, i.e. it produces the same results as the traditional LOOCV approach. It is, however, always faster than FastWWSearch and the FastEE approaches, by up to one order of magnitude, when tested on the 128 datasets from the UCR archive [7]. Figure 2 demonstrates this by comparing UltraFastWWSearch to FastWWSearch and to the traditional LOOCV approach.
Similar to the FastWWSearch and FastEE approaches, UltraFastMPSearch systematically fills a table recording the nearest neighbour at each meta-parameter for each series in the dataset T. However, it does so without the intricate cascade of lower bounds that is critical to FastWWSearch and FastEE. UltraFastMPSearch processes each new time series in a systematic order, minimizing the number of distance computations while carefully exploiting the strengths of EAP to speed up the required ones. We release our code as open source¹ to ensure reproducibility and to enable researchers and practitioners to directly use UltraFastMPSearch as a subroutine to tune time series distance measures, whatever their application.
We believe that UltraFastMPSearch serves as an important foundation to further speed up distance-based time series classification algorithms, such as the "Fast Ensemble of Elastic Distances" (FastEE) [37], which has fallen out of favor due to its relatively slow compute time.
This paper is organized as follows. In Sect. 2, we introduce the background and notation used in this work. We review related work in Sect. 3. Section 4 describes UltraFastMPSearch in detail. Then, we evaluate our method against the standard methods in Sect. 5. Lastly, Sect. 6 concludes our work with some future directions.
Fig. 2 Pairwise plot comparing UltraFastWWSearch to baseline methods on 128 UCR datasets. FastWWSearch is the current state of the art [34]. LOOCV is the standard method used to search for the best warping window [18]
Note that this paper is an extended version of our IEEE ICDM2021 UltraFastWWSearch paper [35].

Background
We consider learning from a dataset $\mathcal{T} = \{T_1, \ldots, T_N\}$ of N time series, where each $T_i$ is of length L. The letters S and T denote two time series, and $T_i$ denotes the i-th element of T.
In this section, we briefly discuss the distance measures used in this work and their meta-parameters. We refer interested readers to their respective papers for a detailed overview of the measures.

Dynamic Time Warping
The DTW distance was first introduced in 1971 by Sakoe and Chiba [27] as a speech recognition tool. Since then, it has been one of the most widely used distance measures in NN search, supporting sub-sequence search [24], regression [33], clustering [8,23], motif discovery [1], and classification [3,18,37]; nearest neighbour with DTW (NN-DTW) has been the historical approach to time series classification. The "Ensemble of Elastic Distances" (EE) [18], introduced in 2015, was one of the first classifiers to be consistently more accurate than NN-DTW over a wide variety of tasks. It relies on eleven NN classifiers, including NN-DTW and its variant with a warping window, cDTW (see below). EE opened the door to "ensemble classifiers", i.e. classifiers embedding other classifiers as components, for time series classification. Various recent and accurate ensemble classifiers, such as HIVE-COTE [19], Proximity Forest [20], and TS-CHIEF [29], also embed both NN-DTW and NN-cDTW classifiers.
DTW computes, in $O(L^2)$, the cost of an optimal alignment between two series (lower costs indicating more similar series) by minimizing the cumulative cost of aligning their individual points. Equations 1a to 1d define the "cost matrix" M for two series S and T such that M(i, j) is the minimal cumulative cost of aligning the first i points of S with the first j points of T. It follows that $\mathrm{DTW}(S, T) = M(L, L)$.
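A standard formulation of these recurrences, consistent with the description above, is sketched below. This is our reconstruction: d is the cost of aligning two points, and the labels (1a) to (1d) are assumed to follow the order of the original equations.

```latex
\begin{align}
M(0, 0) &= 0 \tag{1a}\\
M(i, 0) &= +\infty, \quad 1 \le i \le L \tag{1b}\\
M(0, j) &= +\infty, \quad 1 \le j \le L \tag{1c}\\
M(i, j) &= d(S_i, T_j) + \min\{M(i-1, j-1),\ M(i-1, j),\ M(i, j-1)\} \tag{1d}
\end{align}
```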
Common functions for the cost of aligning two points are $d(S_i, T_j) = |S_i - T_j|$ and $d(S_i, T_j) = (S_i - T_j)^2$. In the current paper, we use the latter. However, our algorithms generalize to any cost function. The individual alignments (dotted lines in Fig. 3) form a "warping path" in the cost matrix (Fig. 4a). See how the vertical section of the path in column 1 of Fig. 4a corresponds to the first point of T being aligned three times in Fig. 3.

Warping window
DTW is usually associated with a "warping window" (WW) w (originally called the Sakoe-Chiba band), constraining how far the warping path can deviate from the diagonal of the matrix [27]. Given a line index 1 ≤ l ≤ L and a column index 1 ≤ c ≤ L, only cells with |l − c| ≤ w can be part of the warping path. With the cost function $d(S_i, T_j) = (S_i - T_j)^2$, a WW of 0 is equivalent to the squared Euclidean distance. On the other hand, a WW ≥ L−1 is equivalent to unconstrained (or "full") DTW. DTW with a WW is often called cDTW, or simply denoted $\mathrm{DTW}_w$. In this paper, we focus only on the commonly used warping window (Sakoe-Chiba band) [15,18,26,34,37]. However, there are other types of constraints for DTW, such as the Itakura Parallelogram [12] and the Ratanamahatana-Keogh band [25].
Fig. 4 Cells cut out by the warping window are in light grey, borders are in dark grey. The warping path computed by the full DTW (a) is valid down to w=2 (b). Hence, the next required computation is with w=1, resulting in a higher DTW cost of 25 > 23 (c)
We can make the following observations. 1. WWs have a "validity": if the warping path at a given window w deviates from the diagonal by no more than v, then it remains the same for all windows w' with v ≤ w' ≤ w. This is called the window validity in [34] and is noted [v, w], i.e. we say that $\mathrm{DTW}_w(S, T)$ has a window validity of [v, w]. 2. DTW is monotonic with w: a smaller window can only constrain the warping path further, so $\mathrm{DTW}_w(S, T) \le \mathrm{DTW}_{w-k}(S, T)$ for k ≥ 1.
Figure 4 illustrates these observations on the cost matrix. The full DTW (Fig. 4a) has a window validity of [2, L−1], i.e. the warping path and the DTW cost of 23 are the same for all warping windows from L−1 down to 2 (Fig. 4b). The next WW of 1 actually constrains the warping path, resulting in an increased DTW cost of 25 (Fig. 4c). Figure 5 illustrates the consequence of these observations: the DTW distance is constant over large ranges of windows and increases as w gets smaller.
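To make the window validity concrete, here is a minimal Python sketch, assuming equal-length series and the squared cost used in this paper. The names `dtw_with_path` and `window_validity` are hypothetical helpers, not functions from the authors' released code. The sketch recovers one optimal warping path by backtracking and reports its largest deviation v from the diagonal, so the same cost holds for every window in [v, w].

```python
import math


def dtw_with_path(s, t, w):
    """Windowed DTW (Sakoe-Chiba band) with squared cost.

    Returns (distance, path), where path lists the aligned (i, j) cells
    (1-based) of one optimal warping path. Assumes len(s) == len(t).
    """
    n = len(s)
    # M[i][j]: minimal cumulative cost of aligning s[:i] with t[:j]
    M = [[math.inf] * (n + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= w lie inside the warping window
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            M[i][j] = cost + min(M[i - 1][j - 1], M[i - 1][j], M[i][j - 1])
    # Backtrack from (n, n), always moving to the cheapest predecessor
    path, i, j = [], n, n
    while (i, j) != (0, 0):
        path.append((i, j))
        _, i, j = min((M[i - 1][j - 1], i - 1, j - 1),
                      (M[i - 1][j], i - 1, j),
                      (M[i][j - 1], i, j - 1))
    path.reverse()
    return M[n][n], path


def window_validity(s, t, w):
    """Return (DTW_w(s, t), v): the same path, hence cost, holds for v <= w' <= w."""
    d, path = dtw_with_path(s, t, w)
    v = max(abs(i - j) for i, j in path)
    return d, v


s = [0, 0, 1, 2, 1, 0]
t = [0, 1, 2, 1, 0, 0]
d, v = window_validity(s, t, len(s) - 1)
print(d, v)  # prints: 0.0 1
```

Here the full DTW path never strays more than one cell from the diagonal, so its validity is [1, 5] and only w=0 would need a fresh computation.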

Weighted Dynamic Time Warping
The Weighted Dynamic Time Warping (WDTW) distance was proposed to reduce pathological alignments [13]. Instead of a hard constraint like the WW for DTW, WDTW imposes a soft constraint on the warping path. The cost of aligning two points $S_i$ and $T_j$ is multiplied by a weight that depends on their distance in the time dimension, $a = |i - j|$. The weight grows as i and j move apart, which reduces the chances of aligning $S_i$ with $T_j$ and thus discourages the alignment of points that are too far apart in time. The weights are computed using the modified logistic weight function described in Eq. 2, parameterized by the meta-parameter g that controls the level of penalization for further points [13]. The authors suggest that the optimal value of g lies between 0.01 and 0.6 [13]. $w_{max}$ is the upper bound for the weight and is typically set to 1 [13].
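As a hedged illustration of Eq. 2, the sketch below assumes the modified logistic weight function of the original WDTW paper [13], $w(a) = w_{max} / (1 + e^{-g(a - m_c)})$ with the midpoint $m_c = L/2$, and plugs it into an unconstrained DTW recurrence with the squared cost. The function names are hypothetical, and the logistic is evaluated in a numerically stable form so that long series with large g do not overflow `math.exp`.

```python
import math


def _logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def wdtw_weights(length, g, w_max=1.0):
    """Modified logistic weights for phase differences a = 0 .. length - 1.

    Assumes the midpoint m_c = length / 2, as in the original WDTW formulation.
    """
    m_c = length / 2
    return [w_max * _logistic(g * (a - m_c)) for a in range(length)]


def wdtw(s, t, g, w_max=1.0):
    """Unconstrained WDTW with squared cost; assumes len(s) == len(t)."""
    n = len(s)
    weights = wdtw_weights(n, g, w_max)
    M = [[math.inf] * (n + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # The further apart i and j are, the larger the weight on the cost
            cost = weights[abs(i - j)] * (s[i - 1] - t[j - 1]) ** 2
            M[i][j] = cost + min(M[i - 1][j - 1], M[i - 1][j], M[i][j - 1])
    return M[n][n]
```

With g > 0 the weights increase strictly with a and equal $w_{max}/2$ at the midpoint, which is the soft constraint described above.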
We observed that WDTW monotonically decreases with increasing parameter g (Fig. 6), i.e. $\mathrm{WDTW}_g(S, T) > \mathrm{WDTW}_{g+k}(S, T)$ for k > 0. This means that $\mathrm{WDTW}_g(S, T)$ is a lower bound for all $\mathrm{WDTW}_{g'}(S, T)$ with $0 \le g' < g$. Note that, unlike DTW, which stays constant within a window validity, the sigmoid weighting function makes WDTW vary continuously with g, so it never stays constant over a range of g values, as illustrated in Fig. 6b.

Longest Common Subsequence
The Longest Common Subsequence (LCSS) is a common measure used to compare strings [4,40]. It finds the longest subsequence common to the two strings. Using a distance threshold ε, LCSS can be extended to numeric sequences (time series), where two points $S_i$ and $T_j$ are considered a match if the cost between them is less than ε. Each cell $M_{LCSS}(i, j)$ of the cost matrix indicates the number of matches between the first i points of S and the first j points of T. Similar to the WW, the δ constraint parameter has a validity, making LCSS constant for a range of δ values. By definition, LCSS is monotonic in both δ and ε, as shown in Fig. 7. $\mathrm{LCSS}_{\delta,\varepsilon}(S, T)$ decreases as δ increases, i.e. $\mathrm{LCSS}_{\delta,\varepsilon}(S, T) \ge \mathrm{LCSS}_{\delta+k,\varepsilon}(S, T)$ for k ≥ 1. In addition, $\mathrm{LCSS}_{\delta,\varepsilon}$ also decreases as ε increases, i.e. $\mathrm{LCSS}_{\delta,\varepsilon}(S, T) \ge \mathrm{LCSS}_{\delta,\varepsilon+k}(S, T)$ for k > 0. Note that $\mathrm{LCSS}_{\delta,\varepsilon}(S, T) = \mathrm{LCSS}_{\delta,\varepsilon+k}(S, T) = 0$ at a sufficiently large ε. This allows LCSS with a larger δ or ε to lower bound LCSS with a smaller δ or ε.
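The sketch below is an illustration, not the paper's Eq. 3. It assumes the absolute difference as the matching cost and a window δ that only admits matches with |i − j| ≤ δ. It computes $M_{LCSS}$ and turns the match count into a distance, 1 − LCSS/L, so that a larger δ or ε can only lower it. The function name is hypothetical.

```python
def lcss_distance(s, t, delta, epsilon):
    """LCSS distance 1 - LCSS / L; assumes len(s) == len(t).

    s[i] and t[j] match when |s[i] - t[j]| < epsilon and their indices
    are at most delta apart.
    """
    n = len(s)
    # C[i][j]: number of matches between the first i points of s and the first j of t
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) <= delta and abs(s[i - 1] - t[j - 1]) < epsilon:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])
    return 1.0 - C[n][n] / n


s, t = [0, 1, 2, 3], [1, 2, 3, 4]
print(lcss_distance(s, t, 0, 0.5), lcss_distance(s, t, 1, 0.5))  # prints: 1.0 0.25
```

With δ=0 the shifted series share no matching points on the diagonal; allowing δ=1 recovers three matches, so the distance drops from 1.0 to 0.25.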

Edit distance with real penalty
Most time series distance measures, such as DTW and LCSS, are not metrics, making it challenging to index a time series dataset or prune k-NN queries under these measures. Edit Distance with Real Penalty (ERP) is a metric that combines the L1-norm and edit distances such as DTW [5,6]. It is parameterized by two meta-parameters: the "gap value" g and the WW w. When a gap is added, the penalty is the cost between the point $S_i$ or $T_j$ and g, as described in Eq. 4. Note that our implementation uses the squared Euclidean distance as the cost function. Increasing g increases $\mathrm{ERP}_{g,w}(S, T)$, as illustrated in Fig. 8. We observed that $\mathrm{ERP}_{g+k,w}(S, T) \ge \mathrm{ERP}_{g,w}(S, T)$ for k ≥ 0 and, according to Eq. 4, $\mathrm{ERP}_{g+k,w}(S, T) = \mathrm{ERP}_{g,w}(S, T)$ for a sufficiently large g. Hence, $\mathrm{ERP}_{g,w}(S, T)$ lower bounds $\mathrm{ERP}_{g+k,w}(S, T)$. Similarly, $\mathrm{ERP}_{g,w+k}(S, T)$ lower bounds $\mathrm{ERP}_{g,w}(S, T)$ for all k ≥ 1. Similar to DTW with w=0, which is confined to the diagonal of the cost matrix, ERP is constant when w=0 regardless of the value of g, i.e. $\mathrm{ERP}_{g+k,0}(S, T) = \mathrm{ERP}_{g,0}(S, T)$.
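A minimal ERP sketch with the squared cost and a warping window, assuming that boundary cells (gaps at the start of a series) are also subject to the window. It is not the paper's Eq. 4, and the function name is hypothetical. It shows why ERP no longer depends on g when w=0: only the diagonal remains, so no gap is ever used.

```python
import math


def erp(s, t, g, w):
    """ERP with gap value g, window w and squared-difference cost."""
    n, m = len(s), len(t)
    M = [[math.inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if (i == 0 and j == 0) or abs(i - j) > w:
                continue  # origin already set; cells outside the window stay inf
            best = math.inf
            if i > 0 and j > 0:  # align s[i-1] with t[j-1]
                best = min(best, M[i - 1][j - 1] + (s[i - 1] - t[j - 1]) ** 2)
            if i > 0:  # gap in t: s[i-1] is compared with g
                best = min(best, M[i - 1][j] + (s[i - 1] - g) ** 2)
            if j > 0:  # gap in s: t[j-1] is compared with g
                best = min(best, M[i][j - 1] + (t[j - 1] - g) ** 2)
            M[i][j] = best
    return M[n][m]


s, t = [1, 2, 3], [2, 2, 4]
print(erp(s, t, 0.0, 0), erp(s, t, 5.0, 0))  # prints: 2.0 2.0
```

Widening the window only adds candidate paths, which is why a larger w can lower bound a smaller one.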

Move-Split-Merge
The Move-Split-Merge (MSM) distance is a metric proposed to overcome the limitations of existing distance measures: the Euclidean distance is not robust to temporal misalignment; edit distances such as DTW and LCSS are not metrics; and the ERP distance is a metric but is not translation invariant due to the way the gap cost is computed. MSM is a metric, robust to temporal misalignment and translation invariant [32]. It is parameterized by an additive penalty c that is used to compute the cost of off-diagonal alignments, described in Eq. 5. This cost function takes the new point (np) of the off-diagonal alignment and the two previously considered points (x and y). If the new point lies between x and y, then the penalty is only c; otherwise, the distance from np to the nearer of x and y is added to c.
The MSM distance increases with increasing parameter c and stays constant after some sufficiently large c, because deviating from the diagonal becomes too costly, making MSM similar to a w=0 scenario. Figure 9 shows that $\mathrm{MSM}_c(S, T) \le \mathrm{MSM}_{c+k}(S, T)$ for k > 0, so we can use $\mathrm{MSM}_c(S, T)$ to lower bound $\mathrm{MSM}_{c+k}(S, T)$.
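The sketch below follows the MSM recurrence of Stefan et al. [32], with the split/merge cost described above and the absolute difference as the move cost. The move cost is an assumption, and the function names are hypothetical. Because every operation cost is non-decreasing in c, the resulting distance can only grow with c, which is what makes $\mathrm{MSM}_c$ a lower bound for $\mathrm{MSM}_{c+k}$.

```python
def msm_cost(new, x, y, c):
    """Split/merge cost: c if `new` lies between x and y, else c plus the gap to the nearer one."""
    if x <= new <= y or x >= new >= y:
        return c
    return c + min(abs(new - x), abs(new - y))


def msm(s, t, c):
    """MSM distance between two non-empty series with penalty c."""
    n, m = len(s), len(t)
    D = [[0.0] * m for _ in range(n)]
    D[0][0] = abs(s[0] - t[0])
    for i in range(1, n):  # first column: only split/merge operations on s
        D[i][0] = D[i - 1][0] + msm_cost(s[i], s[i - 1], t[0], c)
    for j in range(1, m):  # first row: only split/merge operations on t
        D[0][j] = D[0][j - 1] + msm_cost(t[j], s[0], t[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            D[i][j] = min(
                D[i - 1][j - 1] + abs(s[i] - t[j]),               # move
                D[i - 1][j] + msm_cost(s[i], s[i - 1], t[j], c),  # split/merge in s
                D[i][j - 1] + msm_cost(t[j], s[i], t[j - 1], c),  # split/merge in t
            )
    return D[n - 1][m - 1]


print(msm([0, 1, 2], [1, 2, 3], 100.0))  # prints: 3
```

With a very large c, off-diagonal operations are never worth their penalty, so the result collapses to the sum of absolute differences along the diagonal (here 3).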

Time Warp Edit Distance
Most existing distance measures assume that the time series are uniformly sampled and do not consider timestamps when aligning time series. The Time Warp Edit Distance (TWE) was designed to take timestamps into account to better align time series that are not uniformly sampled [21]. It uses three main operations ($\text{delete}_A$, $\text{delete}_B$ and match) to align the time series. TWE is parameterized by two meta-parameters, ν and λ. ν controls the "stiffness" of the alignment by weighing the contributions from the timestamps. ν=∞ means that off-diagonal points of the matrix are not considered, making TWE similar to the Euclidean distance, while ν=0 makes it similar to DTW [21]. The match operation is the sum of the costs between the current and previous data points and the ν-weighted contribution of the respective timestamps. λ is a constant penalty added to the cost of the alternative alignments, i.e. the delete operations. The costs of the three operations and the computation of each element of the cost matrix $M_{TWE}$ are described in Eqs. 7 and 8, respectively. Note that $t_{S_i}$ denotes the i-th timestamp of a series S. Our implementation assumes the time series are uniformly sampled and does not use timestamps, thus $t_{S_i} = i$. Figure 10 shows that TWE increases monotonically with ν and λ, allowing us to use distances computed at smaller parameter values to lower bound distances at larger ones. Similarly, the TWE distance becomes constant at sufficiently large ν and λ, where the delete operations become too costly. Hence,
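The recurrence below is a hedged Python sketch of TWE in Marteau's formulation [21]. It assumes uniform timestamps ($t_{S_i} = i$), the absolute difference as the point cost, and both series padded with a leading 0 at timestamp 0. It is not the paper's Eqs. 7 and 8 verbatim, and the name `twe` is hypothetical. Every operation cost is non-decreasing in ν and λ, consistent with the monotonicity shown in Fig. 10.

```python
import math


def twe(s, t, nu, lam):
    """Time Warp Edit Distance with timestamps t_i = i and absolute-difference costs."""
    a = [0.0] + list(s)  # pad with a value 0 at timestamp 0
    b = [0.0] + list(t)
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # match: current and previous points, plus nu times both timestamp gaps
            match = (D[i - 1][j - 1] + abs(a[i] - b[j]) + abs(a[i - 1] - b[j - 1])
                     + nu * 2 * abs(i - j))
            # delete_A / delete_B: consecutive timestamps differ by 1, plus penalty lam
            delete_a = D[i - 1][j] + abs(a[i] - a[i - 1]) + nu + lam
            delete_b = D[i][j - 1] + abs(b[j] - b[j - 1]) + nu + lam
            D[i][j] = min(match, delete_a, delete_b)
    return D[n][m]
```

Larger ν makes off-diagonal matches and deletions more expensive, pushing the alignment towards the diagonal as described above.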

Learning the optimal meta-parameter
There are two main advantages to learning the optimal meta-parameter. First, a good meta-parameter provides better results (e.g. classification accuracy) with NN search. For instance, learning the best DTW warping window prevents spurious alignments, which in turn improves classification accuracy [8,14,34]. Recent research [3,18,34,37] demonstrated that learning the optimal meta-parameter for each distance measure significantly improves classification accuracy, with improvements as great as from 65% to 93% [34]. Second, a window meta-parameter reduces the time complexity to $O(w \cdot L)$. As a result, distance measures with a window meta-parameter, such as DTW, LCSS and ERP, can be significantly faster to compute than their unconstrained versions, especially for small windows. The usual approach for discovering the optimal meta-parameter is leave-one-out cross-validation (LOOCV) [3,7,8,18], optimizing a performance metric such as training accuracy. This can be seen as creating an (N × P) table, as shown in Table 1, giving the nearest neighbour of every time series for each of the P meta-parameter values, and finding the column that gives the best training accuracy.
In this paper, we use 1-NN, which has been widely used for benchmarking distance-based time series classification (TSC) algorithms [18,37]. UltraFastMPSearch can easily be extended to cases where more than one neighbour (k > 1) is desired, by adding a third dimension, k, to Table 1 and keeping track of the distance to the k-th nearest neighbour. For simplicity, we describe our work in this paper using k=1.
Table 1 Table of NNs for each series: nearest neighbour at meta-parameters 0, 1, …, P−2, P−1
A straightforward LOOCV meta-parameter search implementation has $O(N^2 \cdot L^2 \cdot P)$ time complexity ($L^2$ for each distance computation, repeated for every pair of series and over P meta-parameter values). To be consistent with the distance-based TSC literature [18,37], we use P=100 in our implementation. For measures with two meta-parameters, 10 values of each meta-parameter are sampled from a specified distribution (discussed later), forming 100 parameter combinations. We leave the exploration of different parameter search spaces to future work. For the rest of the paper, we use a parameter ID notation, p, to refer to the parameter space of each distance measure. For example, for single-parameter measures like DTW, p=0 refers to w=0, while for two-parameter measures like LCSS, p=1 refers to a combination of δ and ε such as δ=0 and ε=0.2. Due to its $L^2$ complexity, meta-parameter search does not scale to long series [34,35]. Note that FastWWSearch, FastEE and UltraFastMPSearch primarily tackle the $L^2$ part of the complexity, by minimizing the number of times the $O(L^2)$ distance is computed. For instance, the FordA and FordB datasets in Fig. 1 have short series, but many of them: the resulting speedup is limited by the $N^2$ part of the complexity. Figure 15a, b clearly illustrate this. When the length of the series increases (Fig. 15a), UltraFastMPSearch, and especially UltraFastWWSearch, scales extremely well compared to the state of the art. On the other hand, when the number of series increases (Fig. 15b), both methods suffer from the associated quadratic complexity, although UltraFastMPSearch does better.
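For reference, here is a minimal Python sketch of the straightforward LOOCV table filling described above. It calls the distance for every ordered pair of distinct series and every parameter value, hence the $O(N^2 \cdot P)$ distance computations that multiply the per-distance $O(L^2)$ cost. The signature `dist(s, t, param)` and the name `naive_init_table` are assumptions for illustration.

```python
import math


def naive_init_table(series, params, dist):
    """Fill the N x P nearest-neighbour table by brute-force LOOCV.

    table[i][p] = (index of the NN of series[i] at params[p], its distance).
    """
    n = len(series)
    table = [[(None, math.inf) for _ in params] for _ in range(n)]
    for p, param in enumerate(params):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # leave-one-out: a series is never its own neighbour
                d = dist(series[i], series[j], param)
                if d < table[i][p][1]:
                    table[i][p] = (j, d)
    return table


# Toy usage: one-point "series" and a distance scaled by the parameter
toy = [[0], [1], [5]]
table = naive_init_table(toy, [1, 2], lambda s, t, p: p * abs(s[0] - t[0]))
print(table[2])  # prints: [(1, 4), (1, 8)]
```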
Discovering the optimal meta-parameters for all distance measures is so expensive that a recent version of the state-of-the-art classifier HIVE-COTE dropped EE, even though doing so reduced its accuracy by 0.6% on average across the UCR archive [7], and by up to 5% on some datasets [2]. EE was dropped solely because its computational burden was considered too great for it to be feasible to employ. In the future, UltraFastMPSearch may allow a more efficient implementation of EE to be reinstated in HIVE-COTE, providing a substantial improvement to the state of the art.

Related work
In this section, we review the state-of-the-art methods to speed up NN-DTW, focusing on LOOCV and learning the optimal meta-parameter efficiently.

Lower bounding
Filling the NN table (Table 1) can be considered as finding the nearest neighbour of each time series T within $\mathcal{T}$ at each meta-parameter. It follows that one way to speed up LOOCV is to speed up NN search. A common approach to speeding up NN search is "lower bounding" [15-17, 24, 36, 37, 41]. An NN search returns the nearest neighbour $T_{nn}$ of a query S among a dataset $\mathcal{T}$, i.e. $T_{nn} = \arg\min_{T \in \mathcal{T}} \mathrm{DIST}(S, T)$, where DIST denotes a time series distance measure (Algorithm 1).
It turns out that NN search naturally supports lower bounding (Algorithm 2). First, in Algorithm 1, notice how $d_{nn}$ is an upper bound (UB) on the end result: either it is the distance to the actual nearest neighbour, or it will later be replaced by a smaller value. A lower bound LB, with $\mathrm{LB}(S, T) \le \mathrm{DIST}(S, T)$, therefore allows a candidate T to be discarded without computing $\mathrm{DIST}(S, T)$ whenever $\mathrm{LB}(S, T) \ge d_{nn}$. An efficient lower bound must be fast to compute while being a good approximation ("tight") [15,36]. Because these aims compete, lower bounds of increasing tightness and cost are used in cascade, e.g. in the UCR-Suite [24]. Various lower bounds have been developed for DTW, the most common being LB_Kim [16] and LB_Keogh [15]. The UCR-Suite [24] is one of the fastest NN search algorithms. It uses four optimization techniques to speed up NN search: early abandoning, reordering early abandoning, reversing the query and candidate roles in LB_Keogh, and cascading lower bounds (LB_Kim and LB_Keogh). Tan et al. [34] show that although the UCR-Suite is faster than naive LOOCV, it is still significantly slower than FastWWSearch at optimizing DTW's warping window. Lower bounds for other time series distance measures have not been well explored. Tan et al. [37] developed lower bounds for other time series distance measures and demonstrated that they can significantly speed up the meta-parameter optimization process.
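The lower-bounding NN search described above can be sketched in Python as follows, assuming windowed DTW with the squared cost and LB_Keogh computed from the envelope of the candidate. This mirrors the idea of Algorithm 2 under those assumptions; it is not the paper's pseudocode, and the function names are hypothetical. Candidates are skipped whenever LB_Keogh already reaches the current upper bound $d_{nn}$, which is safe because the neighbour is only replaced by a strictly smaller distance.

```python
import math


def dtw_w(s, t, w):
    """Windowed DTW with squared cost; assumes len(s) == len(t)."""
    n = len(s)
    M = [[math.inf] * (n + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            M[i][j] = (s[i - 1] - t[j - 1]) ** 2 + min(M[i - 1][j - 1], M[i - 1][j], M[i][j - 1])
    return M[n][n]


def lb_keogh(s, t, w):
    """LB_Keogh: squared distance from s to the envelope of t; never exceeds dtw_w(s, t, w)."""
    total = 0.0
    for i, x in enumerate(s):
        window = t[max(0, i - w): i + w + 1]  # points of t that s[i] may align with
        upper, lower = max(window), min(window)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (x - lower) ** 2
    return total


def nn_search(query, candidates, w):
    """1-NN search with lower bounding; returns (index of the NN, d_nn)."""
    best, d_nn = None, math.inf  # d_nn: upper bound on the final NN distance
    for k, cand in enumerate(candidates):
        if lb_keogh(query, cand, w) >= d_nn:
            continue  # cand cannot be strictly closer than the current NN
        d = dtw_w(query, cand, w)
        if d < d_nn:
            best, d_nn = k, d
    return best, d_nn
```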

Improving distance measures implementation
Speeding up the distance measure itself also speeds up the whole LOOCV process. PrunedDTW is one of the first approaches to speed up DTW computations [30]. It first computes an upper bound on the result (the Euclidean distance), then skips cells of the cost matrix $M_{DTW}$ whose cumulative cost exceeds it. It was later extended to use the upper bound UB computed by the NN search process ($d_{nn}$ in Algorithm 2), allowing early abandoning [31]. These techniques yielded only minimal improvements when applied to window search [34].
Recently, Herrmann and Webb [11] developed the "EAP" (Early Abandoned and Pruned) strategy, which tightly integrates pruning and early abandoning for the six time series distance measures considered in this paper. EAP supports the fastest known implementations of these distance measures, even reducing the need for lower bounds. Like any early abandoned distance, EAP takes an upper bound UB as an extra parameter (again, $d_{nn}$ in Algorithm 2) and abandons the computation as soon as it can be established that the end result will exceed it. The novelty of EAP is that early abandoning is treated as an extreme consequence of pruning: if UB prunes a full line of the cost matrix M, then no warping path can exist. Note that in Algorithm 2, the initial upper bound is set to ∞, which does not allow EAP to prune anything. By initializing it to the cost of the diagonal of M (i.e. the squared Euclidean distance for DTW), EAP can prune some cells of M from the start but cannot early abandon.
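EAP itself is more involved than what fits here. The hedged sketch below shows only the simplest form of UB-based early abandoning for windowed DTW, using a row-minimum test rather than EAP's pruning: since every warping path crosses every row of M and costs are non-negative, once a whole row exceeds UB, the final distance must too. The function name is hypothetical. Passing the squared Euclidean distance (the cost of the diagonal) as UB guarantees that the exact DTW is returned.

```python
import math


def dtw_early_abandon(s, t, w, ub):
    """Windowed DTW (squared cost) that gives up once the result must exceed ub.

    Simplified row-minimum test, not the EAP algorithm of [11]. Returns
    math.inf when abandoned. Assumes len(s) == len(t).
    """
    n = len(s)
    prev = [math.inf] * (n + 1)  # row i - 1 of the cost matrix M
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            curr[j] = (s[i - 1] - t[j - 1]) ** 2 + min(prev[j - 1], prev[j], curr[j - 1])
        # Every warping path crosses row i and costs never decrease along it
        if min(curr) > ub:
            return math.inf
        prev = curr
    return prev[n]


s, t = [0, 1, 2, 3], [0, 1, 2, 4]
print(dtw_early_abandon(s, t, 1, 10.0), dtw_early_abandon(s, t, 1, 0.5))  # prints: 1.0 inf
```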

FastWWSearch and FastEE
FastWWSearch [34] is a window optimization algorithm for DTW, producing the same results as LOOCV while being significantly faster than both traditional LOOCV and the UCR-Suite. It exploits three important properties of DTW and its LB_Keogh lower bound: warping windows have a validity (Observation 1 in Sect. 2); DTW is monotonic with w (Observation 2 in Sect. 2); and, like DTW, LB_Keogh is monotonic with w.
FastWWSearch exploits these properties by starting from the largest warping window, skipping all the windows where the warping path remains the same. Starting from the largest warping window has another advantage: the monotonic property of DTW and LB_Keogh allows $\mathrm{LB\_Keogh}_{w+k}$ and $\mathrm{DTW}_{w+k}$, for any k ≥ 1, to be used as lower bounds for $\mathrm{DTW}_w$. In other words, results obtained at larger windows provide "free" lower bounds for smaller windows.
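The effect of starting from the largest window and jumping over each validity can be illustrated with a short Python sketch, assuming the squared cost and equal-length series. It is a simplification of what FastWWSearch does for a single pair of series (no lower bounds, no NN table), and the function names are hypothetical. After computing $\mathrm{DTW}_w$ and its validity [v, w], the next window worth computing is v − 1.

```python
import math


def dtw_path(s, t, w):
    """Windowed DTW (squared cost) and the largest |i - j| on one optimal path.

    Assumes len(s) == len(t).
    """
    n = len(s)
    M = [[math.inf] * (n + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            M[i][j] = (s[i - 1] - t[j - 1]) ** 2 + min(M[i - 1][j - 1], M[i - 1][j], M[i][j - 1])
    i, j, v = n, n, 0
    while (i, j) != (0, 0):  # backtrack, tracking the deviation from the diagonal
        v = max(v, abs(i - j))
        _, i, j = min((M[i - 1][j - 1], i - 1, j - 1), (M[i - 1][j], i - 1, j), (M[i][j - 1], i, j - 1))
    return M[n][n], v


def all_windows(s, t):
    """DTW_w(s, t) for every window w, largest first, skipping windows inside a validity.

    Returns (values, computed): values[w] = DTW_w(s, t), computed = number of DTW calls.
    """
    values, computed = {}, 0
    w = len(s) - 1
    while w >= 0:
        d, v = dtw_path(s, t, w)
        computed += 1
        for k in range(v, w + 1):  # same path, hence same cost, for v <= k <= w
            values[k] = d
        w = v - 1  # the next smaller window constrains the path, so recompute there
    return values, computed


values, computed = all_windows([0, 0, 1, 2, 1, 0, 0, 3], [0, 1, 2, 1, 0, 0, 3, 3])
print(computed, "DTW computations for", len(values), "windows")
```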
FastWWSearch was subsequently extended to other distance measures, creating FastEE [37], a significantly faster implementation of EE. It uses lower bounds for the distance measures and exploits their properties described in Sect. 2. We refer to the FastEE method for each distance measure (except cDTW) as FastWDTW, FastLCSS, FastERP, FastMSM and FastTWE, respectively. Each distance measure in EE has 100 parameter values in its search space. FastWWSearch was originally designed to search through L warping windows for DTW and was modified to use only 100 warping windows (expressed as percentages of the time series length) in FastEE.
FastWDTW exploits the monotonic property of WDTW by starting from the largest g value, lower bounding WDTW at smaller g using WDTW at larger g. Since WDTW varies continuously with g, it never stays constant over a range of g values, which prevents FastWDTW from skipping any distance computations in the WDTW parameter search space. The g parameter for WDTW is chosen from a uniform distribution U(0, 1) with 100 values. FastMSM takes advantage of the MSM property that it increases monotonically with its parameter c and stays constant at some sufficiently large c. It uses MSM distances computed at a smaller c to lower bound MSM at a larger c. It also skips the computations at values of c where the distances are constant. The parameter c is sampled from an exponential sequence in the range [0.01, 100] with 100 values.
For LCSS, 10 ε values are sampled uniformly from the range [σ/5, σ], where σ is the standard deviation of the training dataset, and 10 δ values are sampled from the range [0, L/4], forming a total of 100 parameter combinations. These are arranged such that the overall LCSS distance decreases with increasing parameter ID (Fig. 11a), allowing FastLCSS to start from the largest parameter ID. Similar to the WW, the δ parameter also allows FastLCSS to skip the computations at values of δ where the distances are constant. On the other hand, the meta-parameter combinations for ERP and TWE are arranged such that the overall distances increase with increasing parameter ID (Fig. 11b, c), allowing FastERP and FastTWE to start from the smallest parameter combination. The search space for ERP's g and w meta-parameters is chosen in the same way as for LCSS. Note that, regardless of the value of g, the ERP distance is the same when the window parameter is 0. This means that the search space contains redundant combinations, which we will explore in future work.
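A hypothetical Python sketch of how such a 10 × 10 LCSS search space could be generated. It assumes evenly spaced values rather than random draws, σ computed over all training values, and integer windows. The ID layout p = 10 · δ index + ε index is our illustrative choice and may differ from FastEE's actual ordering.

```python
import statistics


def lcss_param_grid(train, length):
    """Hypothetical 10 x 10 LCSS search space as (delta, epsilon) pairs.

    epsilon: 10 evenly spaced values in [sigma / 5, sigma], sigma being the
    standard deviation of all training values; delta: 10 evenly spaced integer
    windows in [0, length / 4]. Parameter ID p = 10 * delta_index + epsilon_index.
    """
    sigma = statistics.pstdev([v for series in train for v in series])
    epsilons = [sigma / 5 + k * (sigma - sigma / 5) / 9 for k in range(10)]
    deltas = [round(k * (length / 4) / 9) for k in range(10)]
    return [(d, e) for d in deltas for e in epsilons]
```

Within each block of 10 IDs, ε grows while δ is fixed, and δ grows from block to block, so both parameters move in the direction that can only lower the LCSS distance.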

Ultra-fast meta-parameter search
Our UltraFastMPSearch is a family of three algorithms: UltraFastWWSearch, UltraFastLocalUB and UltraFastGlobalUB. The UltraFastWWSearch algorithm was introduced in our ICDM 2021 conference paper [35] and is designed specifically for the DTW distance measure. Both UltraFastLocalUB and UltraFastGlobalUB generalize to all distance measures that can be computed with EAP. UltraFastLocalUB is a generalization of UltraFastWWSearch that does not rely on the window validity property. UltraFastGlobalUB differs from UltraFastLocalUB in that it uses a global instead of a local upper bound for EAP. A local upper bound refers to the nearest neighbour distance at the current meta-parameter, while a global upper bound refers to the nearest neighbour distance at some meta-parameter that provides an upper bound over a range of meta-parameters.
The significance of this difference lies in the consequences of EAP early abandoning a distance calculation. UltraFastMPSearch orders the meta-parameter values such that the distance at one value is a lower bound for the distance at the next. This is exploited to avoid calculating the distances at most values. However, if EAP early abandons a distance calculation, the exact distance is not known, only that it is greater than the upper bound employed by EAP. Hence, that upper bound is the tightest lower bound available for the distance at the next parameter value. When distances increase only by a small amount from one parameter value to the next, this is usually sufficient for effective lower bounding. However, when they increase more substantially, opportunities to exploit lower bounding are lost. In this case, it is better to use a weaker upper bound, so that a distance that would be a useful lower bound for some subsequent parameter value is not abandoned. The strongest such weaker bound is the global upper bound: the minimum value that could allow the current candidate series to be a nearest neighbour at the final parameter value in the current sequence. Our experiments in Sect. 5 show that some distance measures are more effective with the local and others with the global upper bound approach.
The core of UltraFastMPSearch is built upon FastEE [37], with the following key differences: 1. It replaces all calls to distance measures with their EAP variants [11]; 2. It takes full advantage of EAP's early abandoning and pruning; 3. It processes the time series in an order that best exploits EAP's capabilities; and 4. It does not use any custom lower bounds.
For the rest of the paper, all distance measures should be understood as being the EAP variant unless specified otherwise.
Recall that learning the meta-parameter can be thought of as filling up an (N × P) nearest neighbour table, as shown in FastEE and illustrated in Table 1. Consequently, FastEE and all algorithms in UltraFastMPSearch share the same space complexity. Once this table has been filled, we can easily determine the best meta-parameter for a particular problem by looking for the column that gives the best performance. In case of ties, we take the meta-parameter that is cheaper to compute at test time, for instance, the smaller w for DTW. Algorithm 3 describes this process. The result is identical to that of FastEE and LOOCV. In general, Algorithm 3 can be transformed into either FastEE or LOOCV by replacing the InitTable algorithm in line 1 with the corresponding algorithm for filling the NNs table. For LOOCV, this is the naive table filling described in Sect. 2.7. UltraFastWWSearch, UltraFastLocalUB and UltraFastGlobalUB each have their own InitTable method, which will be discussed later in this section.
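To make the table-based view concrete, here is a minimal Python sketch of the final selection step, assuming the (N × P) NNs table has already been filled and stores, for each series i and meta-parameter index p, the index of its nearest neighbour. The names `best_parameter` and `cost` are hypothetical; `cost(p)` stands in for "cheaper to compute at test time" (for DTW, simply the window size).

```python
from typing import Callable, List, Sequence, Tuple


def best_parameter(
    nn_table: List[List[int]],     # nn_table[i][p] = index of the NN of series i at parameter p
    labels: Sequence[int],         # class label of each training series
    cost: Callable[[int], float],  # hypothetical test-time cost of parameter p (e.g. window size)
) -> Tuple[int, float]:
    """Return (best parameter index, its leave-one-out 1-NN accuracy)."""
    n, num_params = len(nn_table), len(nn_table[0])
    best_p, best_acc = -1, -1.0
    for p in range(num_params):
        # Each NN excludes the series itself, so a column's accuracy is a LOOCV estimate.
        correct = sum(labels[nn_table[i][p]] == labels[i] for i in range(n))
        acc = correct / n
        # Prefer higher accuracy; break ties in favour of the cheaper parameter.
        if acc > best_acc or (acc == best_acc and cost(p) < cost(best_p)):
            best_p, best_acc = p, acc
    return best_p, best_acc
```

For example, with labels `[0, 0, 1]` and a table in which two columns reach the same accuracy, the column with the smaller `cost` wins, mirroring the "smaller w for DTW" rule.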
In the following, we describe UltraFastMPSearch using UltraFastWWSearch and explain the key differences in UltraFastLocalUB and UltraFastGlobalUB. We present UltraFastWWSearch as a set of algorithms. They rely on a global cache C, indexed by a pair of series, that stores a variety of information. C(S,T).value stores the most recent DTW value (i.e. the value at a larger window); C(S,T).validity stores the minimum window size for which C(S,T).value is valid; and C(S,T).do_euclidean calculates the squared Euclidean distance between S and T on demand, caching the result for future use.
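A minimal sketch of the global cache C, assuming series are identified by their index in the training set and that each pair is stored once regardless of order (DTW is symmetric). The class and method names (`PairCache`, `GlobalCache.pair`) are hypothetical. Initializing `value` to 0.0 is also an assumption, chosen because 0 is a trivially valid lower bound for any DTW distance.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class PairCache:
    s: Sequence[float]
    t: Sequence[float]
    value: float = 0.0              # most recent DTW value (from a larger window); 0 is a trivial lower bound
    validity: Optional[int] = None  # smallest window at which `value` is exact; None = not known to be exact
    _euclidean: Optional[float] = field(default=None, repr=False)

    def do_euclidean(self) -> float:
        """Squared Euclidean distance, computed on first use and then cached."""
        if self._euclidean is None:
            self._euclidean = sum((a - b) ** 2 for a, b in zip(self.s, self.t))
        return self._euclidean


class GlobalCache(dict):
    """Maps an unordered pair of series indices to its PairCache entry."""

    def pair(self, i: int, j: int, series: List[Sequence[float]]) -> PairCache:
        key = (min(i, j), max(i, j))
        if key not in self:
            self[key] = PairCache(series[key[0]], series[key[1]])
        return self[key]
```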
The AssessNN algorithm (Algorithm 4) assesses whether a given pair of time series (S, T) is less than some distance d apart for a meta-parameter p. For DTW, p = w is the warping window and DIST = DTW. AssessNN differs substantially from LazyAssessNN, the FastWWSearch function on which it is based, which incorporates complex management of partially completed lower bound calculations at varying windows. AssessNN uses DTW_{w+k}, computed at a larger window, as a lower bound to avoid computing DTW_w when possible. Unlike the complex cascade of lower bounds used in FastWWSearch, this is the only lower bound used in UltraFastWWSearch.
Algorithm 4 first checks whether the previously computed DTW distance, stored in the cache C(S,T), is larger than the current best-so-far distance to beat, d. If so, the algorithm terminates without any extra computation. This is because the DTW distance increases as w decreases (see Fig. 5): if the distance at a larger w is already larger than the best-so-far distance d at w, then so is DTW_w. Otherwise, if the previously computed DTW is still valid, it is returned (line 2). Failing that, we have to compute DTW_w(S, T). Notice that on line 4, we use the EAP implementation of DTW, passing the upper bound UB as an argument. We describe how UB is calculated in the following paragraphs. If the computation is not early abandoned, the new distance is stored in C(S,T). Otherwise, we store UB in C(S,T) and terminate the algorithm. Storing UB in C(S,T) instead of ∞ provides a better ordering of the candidate series T later in the algorithm.
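The following Python sketch mirrors the control flow of AssessNN described above, under simplifying assumptions. `dtw_ea` is a plain windowed DTW with row-wise early abandoning, not the full EAP algorithm of [11], which additionally prunes cells. Costs are squared differences. Because this sketch does not extract the warping path, an exact result is recorded as valid only at the window where it was computed. All names (`dtw_ea`, `Entry`, `assess_nn`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def dtw_ea(s: Sequence[float], t: Sequence[float], w: int, ub: float = math.inf) -> float:
    """Windowed DTW (squared costs) that returns math.inf once a whole row exceeds ub.

    Costs are non-negative and every warping path crosses every row, so if all
    cells of a row exceed ub, the final distance must exceed ub as well.
    """
    n, m = len(s), len(t)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        lo, hi = max(1, i - w), min(m, i + w)
        for j in range(lo, hi + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            cur[j] = cost + min(prev[j - 1], prev[j], cur[j - 1])
        if min(cur[lo:hi + 1], default=math.inf) > ub:
            return math.inf  # early abandoned
        prev = cur
    return prev[m]


@dataclass
class Entry:
    value: float = 0.0              # lower bound on DTW_w (exact DTW at a larger window, or an abandon UB)
    validity: Optional[int] = None  # smallest window at which `value` is exact; None = only a lower bound


def assess_nn(e: Entry, s, t, w: int, d: float, ub: float) -> float:
    """Return DTW_w(s, t) if it may beat d, otherwise math.inf."""
    if e.value > d:                                  # DTW_w >= e.value > d: cannot be a nearest neighbour
        return math.inf
    if e.validity is not None and e.validity <= w:   # cached exact value still valid at w
        return e.value
    dist = dtw_ea(s, t, w, ub)
    if dist == math.inf:                             # abandoned: only know DTW_w > ub
        e.value, e.validity = ub, None
        return math.inf
    e.value, e.validity = dist, w                    # conservative: exact at w only, in this sketch
    return dist
```

Storing `ub` on abandonment keeps `e.value` a valid lower bound for every smaller window, since DTW only grows as w shrinks.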

Algorithm 3 fills up the NNs table and uses it to learn the warping window for DTW. Algorithm 3 can be transformed into FastWWSearch by replacing line 1 with Algorithm 3 of [34]. We use Algorithm 6 to fill this table efficiently.
Similar to FastWWSearch, we build this table for a subset 𝒯′ ⊆ 𝒯 of the training set 𝒯, of increasing size, until 𝒯′ = 𝒯. This allows us to process all the series in 𝒯 in a systematic and efficient order. We start by building the table for 𝒯′ comprising only the first two time series, T_1 and T_2, and fill this (2 × P)-table as if 𝒯′ were the entire dataset. At this stage, T_2 is trivially the nearest neighbour of T_1 and vice versa. We then add a third time series T_3 from 𝒯 \ 𝒯′ to our growing set 𝒯′. At this point, we have to do two things: (a) find the nearest neighbour of T_3 within 𝒯′ \ {T_3} = {T_1, T_2}; and (b) check whether T_3 has become the nearest neighbour of T_1 and/or T_2. This is described in Algorithm 5. We then add a fourth time series T_4, and so on until 𝒯′ = 𝒯.
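The incremental construction can be sketched for a single column of the NNs table (one meta-parameter value). This hypothetical `incremental_nn` performs a full distance computation for every pair, so it only illustrates the processing order and the two checks (a) and (b); the pruning and early abandoning of Algorithms 5 and 6 are omitted. Up to tie-breaking, it finds the same nearest neighbours as an all-pairs search.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def incremental_nn(
    series: List[Sequence[float]],
    dist: Callable[[Sequence[float], Sequence[float]], float],
) -> List[Tuple[Optional[int], float]]:
    n = len(series)
    nns: List[Tuple[Optional[int], float]] = [(None, math.inf)] * n
    for s in range(1, n):          # grow the processed subset {0, ..., s-1} by series s
        for t in range(s):
            d = dist(series[s], series[t])
            if d < nns[s][1]:      # (a) is t the NN of the newly added series s?
                nns[s] = (t, d)
            if d < nns[t][1]:      # (b) has s become the NN of an earlier series t?
                nns[t] = (s, d)
    return nns
```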
Algorithm 5 checks whether either member of a pair (S, T) is a nearest neighbour of the other and, if so, updates the NNs table accordingly. It differs from FastWWSearch by using a local UB to early abandon and prune DTW computations, exploiting EAP. A tight UB is particularly important at w = L, because DTW_L is the most expensive operation in UltraFastWWSearch, so its cost needs to be minimized.
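A minimal sketch of the pair check in Algorithm 5 for one meta-parameter value, assuming an early-abandoning distance that returns `math.inf` once it exceeds its bound. As a stand-in for EAP, it uses an early-abandoning squared Euclidean distance; `check_pair` and `ea_sqeuclid` are hypothetical names. The key point is the local bound UB = max(NN distance of S, NN distance of T): if the pair distance exceeds it, neither table entry can change, so the computation may safely stop.

```python
import math
from typing import List, Optional, Sequence, Tuple

NN = Tuple[Optional[int], float]  # (index of current nearest neighbour, its distance)


def ea_sqeuclid(a: Sequence[float], b: Sequence[float], ub: float) -> float:
    """Squared Euclidean distance that gives up as soon as the running sum exceeds ub."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += (x - y) ** 2
        if acc > ub:
            return math.inf  # abandoned: the true distance is > ub
    return acc


def check_pair(nns: List[NN], series: List[Sequence[float]], s: int, t: int) -> None:
    # Local UB: a distance above both current NN distances can update neither entry.
    ub = max(nns[s][1], nns[t][1])
    d = ea_sqeuclid(series[s], series[t], ub)
    if d < nns[s][1]:
        nns[s] = (t, d)
    if d < nns[t][1]:
        nns[t] = (s, d)
```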
Using EAP alone provides a significant boost to the speed of FastWWSearch, as our experiments in Sect. 5 will show. The core of UltraFastWWSearch lies in Algorithm 6. In line 1, we initialize the NNs table to (_, +∞), i.e. an otherwise empty table with +∞ nearest neighbour distances. Then, we initialize 𝒯′, the subset of 𝒯 processed so far. After initializing these key components, we start with the second time series in 𝒯 and add the preceding time series T_{s−1} to 𝒯′. We start the computation from the largest window, w = L−1, as described in lines 4 to 10.
Recall that FastWWSearch processes the series at w = L−1 in the same way as at any smaller w. It goes through the processed series in ascending order of their lower bound distance to S. For w = L−1, this ordering is based on LB_Kim, which is a loose lower bound. FastWWSearch exploits its complex cascade of lower bounds to minimize the number of full DTW calculations, using the lower bounds to prune as many as possible. In contrast, UltraFastWWSearch exploits the unique properties of EAP by seeking to minimize the UB used in each call to DTW. Recall that this UB is the maximum of the nearest neighbour distances of S and T, so it is best to process the candidates T with the smallest NNs[T][L−1].dist last, when NNs[S][L−1].dist is also most likely to be small and hence the max will be small. To this end, we process the candidates in descending order of their NN distance at w = L−1, as outlined in lines 6 to 9.
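The ordering used at w = L−1 amounts to a single sort. A hypothetical sketch, assuming `nn_dist` maps each processed candidate to its current NN distance at w = L−1 (`math.inf` when it has none yet):

```python
import math
from typing import Dict, Iterable, List


def order_candidates(processed: Iterable[int], nn_dist: Dict[int, float]) -> List[int]:
    """Largest current NN distance first.

    Candidates with a small NN distance (which give a small max(...) bound) are
    visited last, when the query's own NN distance has also had time to shrink.
    """
    return sorted(processed, key=lambda t: nn_dist[t], reverse=True)
```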
However, while this sort order is important for minimizing the EAP computations for the full DTW, when only loose lower bounds are available, DTW_{w+1}(S, T) provides a very tight lower bound on DTW_w(S, T), and once it is available it is advantageous to exploit it. Hence, on line 14, we sort the candidates in ascending order of their DTW_{L−1} distances. Note that we only do this once for each series S. In practice, the order does not change substantially as the window size decreases. Rather than re-sorting at each window size, it is sufficient to keep track of the nearest neighbour at w+1, process it first, as it is likely to be one of the nearest neighbours at w, and then process the remaining series in the DTW_{L−1} sort order.
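The "sort once, then move the previous nearest neighbour to the front" strategy can be sketched as follows. This assumes `dtw_top` maps each candidate to its DTW_{L−1} distance to S; both function names are hypothetical.

```python
from typing import Dict, List, Optional


def sort_once(dtw_top: Dict[int, float]) -> List[int]:
    """Done once per query S: candidates in ascending order of DTW_{L-1}."""
    return sorted(dtw_top, key=dtw_top.get)


def window_order(sorted_cands: List[int], prev_nn: Optional[int]) -> List[int]:
    """Order for window w: the NN found at w+1 first, then the fixed DTW_{L-1} order."""
    if prev_nn is None:
        return list(sorted_cands)
    return [prev_nn] + [t for t in sorted_cands if t != prev_nn]
```

Only the front element changes between windows, so the O(n log n) sort is paid once per query rather than once per window.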
In addition, we keep track of the maximum window validity ω over NNs[T][L] for all processed series T. By keeping track of ω, we can quickly skip all the windows at which the distances are constant for every processed series. On line 11, once we have the NN of S at w = L−1, we propagate this information to all windows w′ ≤ w for which the warping path is valid. Similarly, in lines 12 and 13, we propagate the NN of every processed series T for which NNs[T][L] = S.
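The window validity of a pair can be read off an optimal warping path: it is the largest deviation |i − j| from the diagonal. For every window at least that large, the path is admissible, so the windowed DTW equals the unconstrained one. Since all other candidates' distances can only grow as the window shrinks, a nearest neighbour found at w = L−1 remains the nearest neighbour down to its validity. The sketch below (squared costs, ties broken arbitrarily; function names hypothetical) computes the validity and propagates an NN entry.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dtw_with_validity(s: Sequence[float], t: Sequence[float]) -> Tuple[float, int]:
    """Unconstrained DTW (squared costs) and the window validity of one optimal path."""
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack one optimal path from (n, m) to (1, 1), recording its largest |i - j|.
    i, j, validity = n, m, 0
    while (i, j) != (1, 1):
        validity = max(validity, abs(i - j))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return D[n][m], validity


def propagate(nn_row: List[Optional[Tuple[int, float]]], neighbour: int,
              dist: float, validity: int, top_window: int) -> None:
    """Copy the NN found at top_window to every window from validity up to top_window."""
    for w in range(validity, top_window + 1):
        nn_row[w] = (neighbour, dist)
```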
From line 16, we continue to process the windows from ω−1 down to 0. Line 17 checks whether we already have a NN for S from larger windows, thanks to window validity. Lines 18 to 23 then check whether S is the NN of any processed series T. Since we already have the NN for S, the process is similar to lines 6 to 8 of Algorithm 5, the difference being the way UB is calculated: we can use the distance of the current NN of T (if available) as the UB instead of taking the max of the two NN distances, as we will not use the result to check whether T is the NN of S. The else branch (line 24) handles the case where we do not obtain the NN of S at w from a larger window. In this case, we need to search for the NN of S among the processed series. We start from T^{NN}_{w+1}, the NN of S at w+1, and update the NN of both S and T^{NN}_{w+1} with Algorithm 5. The remaining processed series are handled as in Algorithm 5, except that we need to keep track of T^{NN}_{w+1} (lines 27 to 35). Finally, once we have NNs[S][w], the NN of S at w, we propagate this information to all valid windows (line 36).

Ultra-fast meta-parameter search with local upper bound
UltraFastWWSearch is specific to DTW and is not applicable to other distance measures. Hence, we designed a more generic algorithm, UltraFastLocalUB, that is applicable to all time series distance measures that can be calculated using EAP, including DTW. The main difference from UltraFastWWSearch is that it does not exploit DTW's window validity property, as not all measures have this property.
UltraFastLocalUB requires some slight modifications to generalize the algorithms of UltraFastWWSearch to other distance measures. Starting from Algorithm 5, Algorithm 7 replaces every w with p and takes an additional input, p_UB, the meta-parameter that gives the upper bound distance in the parameter search space. This parameter allows us to first compute C(S,T,p_UB).do_upperBound = DIST_{p_UB}(S, T), the distance at the upper bound meta-parameter, which is then cached for future use. This is similar to computing the Euclidean distance as the upper bound for DTW, where w = 0. For DTW, WDTW and ERP, this upper bound distance amounts to computing the diagonal of the distance matrix M and takes O(L) time instead of O(L²). As in UltraFastWWSearch, if we do not yet have a best-so-far NN for either S or T, we compute the upper bound distance between S and T at parameter p_UB. The rest of the algorithm is similar to Algorithm 5 of UltraFastWWSearch.
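A hypothetical sketch of the cached upper bound for DTW. For DTW at p_UB (w = 0), the only admissible path is the diagonal of M, so the bound is the sum of the diagonal costs: the squared Euclidean distance, computed in O(L). The weighted (WDTW) and gap-penalty (ERP) diagonals mentioned above are not shown. `diagonal_cost` and `PairUpperBound` are illustrative names, and the call counter exists only to show that the value is computed once.

```python
from typing import Optional, Sequence


def diagonal_cost(s: Sequence[float], t: Sequence[float]) -> float:
    """Sum of the diagonal of the squared-cost matrix M: DTW at w = 0, in O(L)."""
    if len(s) != len(t):
        raise ValueError("the diagonal bound needs series of equal length")
    return sum((a - b) ** 2 for a, b in zip(s, t))


class PairUpperBound:
    """Caches DIST_{p_UB}(S, T) for one pair, computing it on first request."""

    def __init__(self, s: Sequence[float], t: Sequence[float]) -> None:
        self.s, self.t = s, t
        self._ub: Optional[float] = None
        self.calls = 0  # number of actual computations (for illustration)

    def do_upper_bound(self) -> float:
        if self._ub is None:
            self.calls += 1
            self._ub = diagonal_cost(self.s, self.t)
        return self._ub
```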
UltraFastLocalUB then replaces Algorithm 6 in Algorithm 3 with Algorithm 8 to fill the NNs table. Algorithm 8 describes the general process of filling the NNs table for any time series distance measure that supports EAP. For ease of exposition, we assume that the meta-parameter p increases from 0 to P, although our implementation follows each distance measure's properties described in Sect. 2. There are two main differences from Algorithm 6. First, it does not use the maximum window validity of UltraFastWWSearch, because not all distance measures have a window meta-parameter. Second, it needs to keep track of the upper bound parameter ID, p_UB. Distance measures with two meta-parameters have multiple upper bound distances along the parameter search space, and when we do not have a nearest neighbour yet, we want to use the tightest possible upper bound for EAP: the upper bound that is closest to the current distance. Hence, it is important to keep updating p_UB while going through the search space.
The upper bound parameter p_UB should be updated according to the properties of each distance measure, described in Sect. 2. p_UB is constant for single meta-parameter measures, i.e. p_UB = 0 for DTW and WDTW, and p_UB = P for MSM. In this work, we ordered the parameters of two-meta-parameter measures such that every 10th meta-parameter is an upper bound for the previous 9. This means that we need to update p_UB at every 10th meta-parameter. In the special case of ERP, the upper bound is obtained at w = 0, which gives the same ERP distance for all g, i.e. the upper bound for ERP is the same throughout this parameter space. When we start processing a new query, we have to reset p_UB to the first upper bound, as shown in line 4 of Algorithm 8. Line 15 of Algorithm 8 then checks and updates p_UB after moving to the next parameter.
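Under the block ordering described above, locating the current upper bound for a two-meta-parameter measure is simple integer arithmetic. The sketch assumes 1-based positions p = 1, …, P in the processing order and blocks of 10, so that positions 10, 20, … are the upper bounds. The function names are hypothetical, and neither single-parameter measures (constant p_UB) nor ERP are covered.

```python
from typing import List


def block_upper_bound(p: int, block: int = 10) -> int:
    """1-based position of the upper-bound meta-parameter that covers position p."""
    if p < 1:
        raise ValueError("positions are 1-based")
    return ((p - 1) // block + 1) * block


def upper_bound_changes(num_params: int, block: int = 10) -> List[int]:
    """Positions at which p_UB must be updated while sweeping positions 1..num_params."""
    changes: List[int] = []
    current = None
    for p in range(1, num_params + 1):
        ub = block_upper_bound(p, block)
        if ub != current:  # entered a new block: move p_UB to that block's last position
            changes.append(p)
            current = ub
    return changes
```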

Ultra-fast meta-parameter search with global upper bound
UltraFastLocalUB takes the maximum of the nearest neighbour distances of S and T at the current meta-parameter p as the upper bound. We call this a local upper bound, as it is only applicable to the current meta-parameter. Instead of using the nearest neighbours of S and T at the current meta-parameter p, UltraFastGlobalUB uses a global upper bound, i.e. the nearest neighbour distances of S and T at the parameter p_UB. The global upper bound applies to a range of previous meta-parameters. Note that, by definition, the global upper bound is looser than the local upper bound.
We modify Algorithm 7 by replacing line 1 with UB = max(NNs[S][p_UB].dist, NNs[T][p_UB].dist), as presented in Algorithm 9. The rest of Algorithm 9 is the same as Algorithm 7. Similarly, Algorithm 8 is modified to use the global upper bound by changing lines 19 and 26. Note that, as the global upper bound is used frequently, it is important to cache it for future use.
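The only difference between the local and global variants is which column of the NNs table feeds the bound. A hypothetical sketch, assuming `nns[i][p]` holds a (neighbour, distance) pair as in Table 1. In the global variant, the bound is read at p_UB. Because distances at p_UB bound those at every p in its range from above, the NN distances there are at least as large, so the bound is looser.

```python
from typing import List, Optional, Tuple

NNEntry = Tuple[Optional[int], float]  # (neighbour index, distance)


def upper_bound(nns: List[List[NNEntry]], s: int, t: int,
                p: int, p_ub: int, use_global: bool) -> float:
    """UB for EAP: local uses column p, global uses column p_ub (as in Algorithm 9)."""
    col = p_ub if use_global else p
    return max(nns[s][col][1], nns[t][col][1])
```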

Experiments
This section describes the experiments to evaluate UltraFastMPSearch. To ensure reproducibility, we have made our code and results available open source at https://github.com/ChangWeiTan/UltraFastWWS. Note that UltraFastMPSearch is exact, producing the same results as FastWWSearch, FastEE and LOOCV; hence we are only interested in comparing training times.
Our experiments use all 128 benchmark UCR time series datasets [7]. For each method, we perform the search using the set of 100 meta-parameters (Sect. 3.3) used in EE [18] and FastEE [37]. This allows UltraFastMPSearch to be used directly in EE. Since the ordering of the series in a dataset might affect the training time (the speed depends on where the actual nearest neighbour is), we report the average results over 5 runs with different reshuffles of the training dataset. We conducted our experiments in Java, on a standard single-core cluster machine with an AMD EPYC processor @ 2.2 GHz and 32 GB RAM.
Our experiments are divided into three parts. (A) We first study the effect of using EAP with DTW on LOOCV. (B) Then, we explore the features of UltraFastWWSearch that …
Figure 12a compares the total training time of the three methods on the 128 datasets. The results show that EAP_LOOCV reduces the training time by almost 1000 h (42 days) compared to DTW_LOOCV and by about 300 h (12 days) compared to UCR-Suite_LOOCV. Note that EAP_LOOCV achieves this significant speed-up without using any lower bounds, while UCR-Suite_LOOCV uses a series of complex lower bounds. The main reason is that the LB_Kim and LB_Keogh lower bounds used in UCR-Suite are very loose at larger windows, as pointed out in [36], which showed that the more complex LB_Keogh can be looser than the simpler LB_Kim when w ≥ 0.5 · L. This shows that EAP reduces the need for lower bounds for NN-DTW, especially at larger warping windows.

Speeding up the state of the art with ULTRAFASTWWSEARCH
This section examines the features that make UltraFastWWSearch ultra-fast, comparing it to state-of-the-art FastWWSearch. The results are shown in Fig. 12b.
Much of EAP's speed-up in many NN-DTW tasks actually comes from early abandoning (see Fig. 7 of [11]). Section 5.1 shows that EAP, even without using lower bounds, speeds up the naive LOOCV implementation. Hence, we created two variants of FastWWSearch, (1) with early abandoning and (2) without lower bounds, annotated with the suffixes "_EA" and "_NoLb" respectively, to study how each contributes to speeding up FastWWSearch. We adopt the early abandoning strategy described in [24] for the original FastWWSearch and use the UB described in Sect. 3.1 for the early abandoning process.
It is not surprising that removing lower bounds from FastWWSearch makes it slower, as it relies on various lower bounds to achieve its large speed-up. However, it is interesting that adding early abandoning to FastWWSearch makes it the slowest variant. This is because if DTW is early abandoned at a larger window, then when FastWWSearch later needs the DTW distance at a smaller window, it must recalculate DTW from scratch, since the distance was never fully computed. Similar behaviour was also observed in [34]. The opposite is observed for the EAP variants. EAP_FastWWSearch_NoLb in Fig. 12b is EAP_FastWWSearch with the lower bounds removed. It shows that removing lower bounds actually improves EAP_FastWWSearch, albeit only by about 20 min. This is consistent with the results of the EAP paper [11], and again highlights the effectiveness of EAP's early abandoning strategy and the possibility of removing complex lower bounds.
UltraFastWWSearch incorporates six primary strategies that distinguish it from FastWWSearch. We study the effect of introducing each of them in turn with the following algorithms: EAP_FastWWSearch: using EAP for DTW computations; EAP_FastWWSearch_NoLb: removing lower bounds; EAP_FastWWSearch_EA: using early abandoning; UltraFastWWSearch_V1: tighter upper bounds; UltraFastWWSearch_V2: sorting the processed series in ascending order of distance to nearest neighbour and then sorting on DTW_L; and UltraFastWWSearch: skipping windows from L−1 to ω, the maximum window validity at L−1. Figure 12b shows that substituting EAP to compute DTW within FastWWSearch, even without early abandoning (EAP_FastWWSearch), reduces the total training time over all 128 datasets by 5 h. Of this, 3 h 50 min comes from 5 long and large datasets: NonInvasiveFetalECGThorax1, UWaveGestureLibraryAll, HandOutlines, FordA and FordB. The pairwise plot in Fig. 13 illustrates that EAP_FastWWSearch is consistently faster than the original DTW variant, although the difference between them is not large. This shows that, even without early abandoning, EAP is an efficient strategy for pruning unnecessary computations in DTW.
The effectiveness of early abandoning an EAP computation depends on the UB passed to it. EAP_FastWWSearch_EA uses the UB described in Sect. 3.1, whereas the V1 variant of UltraFastWWSearch uses the UB described in Algorithm 5. The results in Fig. 12b show that this UB improves speed, but only by a small margin.
Finally, we add the optimization of skipping windows from L−1 to ω. While the five previous optimizations all exploit the properties of EAP, this final optimization is a novel further exploitation of the window validity property, beyond those in FastWWSearch. It more than halves the total time. Figure 12b shows that UltraFastWWSearch completes all 128 datasets in under 4 h, a 6-fold speed-up compared to the 24 h required by FastWWSearch.
To test the significance of our results, we performed the Friedman test [10] followed by the Wilcoxon signed-rank test with Holm correction as the post hoc test, and visualized the outcome in a critical difference diagram (Fig. 14). Figure 14 shows the average rank of each method over all datasets, with rank 1 being the fastest and rank 9 the slowest. Methods in the same clique (black bars) are not significantly different from each other. Consistent with the results in Fig. 12b, the optimizations in UltraFastWWSearch significantly speed up FastWWSearch. The critical difference diagram shows that all the EAP variants are faster than the original FastWWSearch, with significant consistency across datasets. Interestingly, although early abandoning reduces the total time on the 128 datasets (Fig. 12b), it is ranked lower than all other methods. This is because the early abandoning strategy in EAP reduces the time on the three largest datasets (HandOutlines, FordA and FordB) by a significant amount, while on smaller datasets the overhead of recalculating EAP after a previous early abandon costs more than the computation saved by abandoning. UltraFastWWSearch is the fastest of all, with an average rank close to 1 (i.e. it is faster than all other methods on almost all datasets), followed by its V2 and V1 variants. Figure 2 shows that UltraFastWWSearch is up to one order of magnitude faster than FastWWSearch.

Scalability to large and long datasets
We showed previously that UltraFastWWSearch is efficient on large and long datasets. We now investigate its scalability. We first experimented with the HandOutlines dataset, whose length L = 2709 is the longest in the UCR archive [7]. We varied the length from 0.1 × L to L, recording the time to search for the best warping window. Figure 15a shows that the training time of UltraFastWWSearch grows more slowly than that of FastWWSearch as L increases. At L = 1000 we already achieve a 9.4 times speed-up, rising to 30 times at L = 2709. This reduces 6 h of compute time to 11 min (Fig. 1), effectively tackling the L³ part of the complexity. We then evaluated scalability to larger datasets, using the same SITS dataset as [34], taken from [38]. We chose this dataset because its short length, L = 46, tends to isolate the influence of N on scalability. Figure 15b shows that UltraFastWWSearch is on average 2 times faster than FastWWSearch for all N. Thus, although UltraFastWWSearch is faster than FastWWSearch, the N² part of the complexity remains a limitation of UltraFastWWSearch. However, the traditional LOOCV and UCR-Suite methods do not even scale to this dataset, as shown in [34], requiring days to complete, while UltraFastWWSearch takes only 6 h for N = 90,000.

ULTRAFASTMPSEARCH: Generalizing to other distance measures
The previous experiments showed that UltraFastWWSearch achieves a substantial speed-up over FastWWSearch in optimizing DTW's warping window. In this section, we generalize UltraFastWWSearch to UltraFastLocalUB and UltraFastGlobalUB and apply them to all the other time series distance measures. We compare UltraFastLocalUB and UltraFastGlobalUB only to the baseline FastEE approach for each measure and to FastEE with EAP and early abandoning. We assume that LOOCV for each distance measure takes a similar time to DTW_LOOCV, as they all have the same O(L²) time complexity. Note that our proposed UltraFastWWSearch uses the local upper bound by default; we call the variant with the global upper bound UltraFastWWSearch-Global, which also uses the window validity to skip some DTW computations. Figure 16 compares the total training time of the methods for each distance measure on the 128 datasets. Overall, we observed that the local upper bound is more efficient for most measures, except for MSM and TWE, where the global upper bound is faster. This is expected, because the local upper bound is by definition tighter than the global upper bound. The results show that UltraFastMPSearch significantly reduces the training time compared to the FastEE methods for all distance measures: … between two consecutive λ or c. As the distances will be very similar, the computations will typically early abandon near the end of the matrix, after computing almost the full distance. Recall that an early abandoned distance cannot be used as a lower bound for subsequent meta-parameters in UltraFastMPSearch. The algorithm therefore becomes inefficient when distance computations early abandon late too many times and must be recomputed for the next meta-parameter. This is almost like computing each distance twice, defeating the purpose of early abandoning.
Therefore, by using a looser global upper bound, UltraFastGlobalUB achieves a balance between the number of times the DIST function is called and the number of times it early abandons. In other words, if a distance computation early abandons under a larger global upper bound, then we know that the candidate T can never be the nearest neighbour between the current meta-parameter and p_UB, so all the unnecessary distance function calls are skipped. For other distance measures, where the distances grow only by small amounts from one parameter value to the next, the global upper bound so rarely yields useful tighter lower bounds for subsequent parameter values that the time it saves is less than the time saved by the additional pruning EAP achieves with the tighter local upper bounds. Figure 17 shows the average rank of each method in terms of training time over all datasets, using a critical difference diagram to compare the training time across all the distance measures. At a glance, UltraFastMPSearch is significantly faster than the current state of the art, the FastEE methods. Similar to the results in Sect. 5.2, some methods, such as UltraFastWWSearch-Global, EAP-FastMSM-EA, UltraFastTWE-Local and EAP-FastTWE-EA, have a longer overall training time (Fig. 16) but are ranked better in the critical difference diagram (Fig. 17). This is because these methods are slightly faster on most datasets, which are short and small and require little computation, but take longer on the larger and longer datasets, where overall computation is greatest.

Conclusion
This paper proposes UltraFastMPSearch, a family of ultra-fast algorithms that learn the meta-parameters of time series distance measures efficiently. UltraFastWWSearch fundamentally transforms its predecessor, FastWWSearch. It incorporates six major changes: using EAP to compute DTW; removing the use of DTW lower bounds; adding early abandoning of DTW; establishing tighter upper bounds for early abandoning; ordering the time series so as to best exploit the efficient pruning and early abandoning power of EAP; and using the window validity to skip the majority of window sizes altogether. UltraFastLocalUB generalizes UltraFastWWSearch by not relying on window validity to skip meta-parameters. Instead of using the nearest neighbour distance at the current meta-parameter for EAP, UltraFastGlobalUB uses as the upper bound the nearest neighbour distance computed at a parameter value that provides an upper bound for a series of subsequent parameter values. This achieves a better balance for measures whose distances grow substantially between successive meta-parameters, such as MSM and TWE.
Our experiments show that the UltraFastMPSearch algorithms are up to an order of magnitude faster than the previous state of the art, with the greatest benefit achieved on long time series datasets, where it is most needed.
UltraFastWWSearch speeds up the training of NN-DTW, formerly one of the slowest time series classification (TSC) algorithms, to under 4 h on the UCR datasets, a time close to that of ROCKET, one of the fastest and most accurate TSC algorithms [9]. Similarly, UltraFastMPSearch also speeds up the training of NN search with other distance measures, although the gains are smaller due to the lack of the window validity property. This speed-up holds open the promise of EE being reinstated in HIVE-COTE; EE is known to improve HIVE-COTE's classification performance to a new state-of-the-art level for TSC and was omitted only because of its excessive compute time [22].
Dr. Matthieu Herrmann is a research fellow at Monash University, where he works on time series classification. He obtained his Ph.D. from the University of Paris Diderot in 2016, studying programming languages and formal systems. He then joined Monash University in Melbourne, switching his focus to time series classification. He now mainly works on efficient computation and parameterization for instance-based classifiers, and develops the C++/Python Tempo library to make his research group's work easily accessible to practitioners.
Geoffrey I. Webb is Research Director of the Monash University Data Futures Institute. He was Editor-in-Chief of Data Mining and Knowledge Discovery from 2005 to 2014. He has been Program Committee Chair of both ACM SIGKDD and IEEE ICDM, as well as General Chair of ICDM and a member of the ACM SIGKDD Executive. He is a technical advisor to the machine-learning-as-a-service startup BigML Inc and to the recommender systems startup FROOMLE. He developed many of the key mechanisms of support-confidence association discovery in the 1980s. His OPUS search algorithm remains the state of the art in rule search. He pioneered research areas as diverse as black-box user modelling, interactive data analytics and statistically sound pattern discovery.
He has developed many useful machine learning algorithms that are widely deployed. His many awards include IEEE Fellow and the inaugural Eureka Prize for Excellence in Data Science (2017).