On computing exact means of time series using the move-split-merge metric

Computing an accurate mean of a set of time series is a critical task in applications like nearest-neighbor classification and clustering of time series. While there are many distance functions for time series, the most popular distance function used for the computation of time series means is the non-metric dynamic time warping (DTW) distance. A recent algorithm for the exact computation of a DTW-Mean has a running time of O(n^{2k+1} 2^k k), where k denotes the number of time series and n their maximum length. In this paper, we study the mean problem for the move-split-merge (MSM) metric, which not only offers high practical accuracy for time series classification but also carries the advantages of the metric properties that enable further diverse applications. The main contribution of this paper is an exact and efficient algorithm for the MSM-Mean problem of time series. The running time of our algorithm is O(n^{k+3} 2^k k^3), and thus better than that of the previous DTW-based algorithm. The results of an experimental comparison confirm the running time superiority of our algorithm over the DTW-Mean competitor. Moreover, we introduce a heuristic that improves the running time significantly without sacrificing much accuracy.


Introduction
Time series databases have gained much attention in academia and industry due to demands in many new challenging applications like the Internet of Things (IoT), bioinformatics, and social and system monitoring. In particular, because of the emergence of IoT, the requirement for developing dedicated systems [32] supporting time series as a first-class citizen has increased recently. In addition to supporting fundamental database operations like filters and joins, analytical operations like clustering and classification are highly relevant in time series databases.
The analysis of time series, e.g., by clustering, largely depends on the underlying distance functions. In a recent study, Paparrizos et al. [34] re-examined the impact of 71 distance functions on classification for many datasets. While dynamic time warping (DTW) and related functions [8] had the reputation of being the best choice, Paparrizos et al. [34] found DTW performing inferior for time series classification in comparison to many other elastic distance functions. Among those is the move-split-merge (MSM) metric [19]. It works similarly to the Levenshtein distance [2] by transforming one time series into another using three types of operations. A move operation changes the value of a data point, a merge operation fuses two consecutive points with equal values into one, and a split operation splits a point into two adjacent points with the same value. In addition to its superiority to DTW, MSM offers another significant advantage: it satisfies the properties of a mathematical metric, and thus it is ready-to-use for metric indexing [15] and algorithms that presume the triangle inequality.
Partition-based algorithms such as k-means clustering are among the best methods for clustering time series [29]. One of the fundamental problems of k-means clustering for time series is how to compute a mean for a set of time series. Brill et al. [31] studied the problem for DTW and developed an algorithm computing an exact mean of k time series in O(n^{2k+1} 2^k k) time, where n is the maximum length of an input time series. To the best of our knowledge, the mean problem of time series has not been addressed for other distance functions like the MSM metric so far.
In this paper, we examine the mean problem of time series for the MSM metric. The mean m of a set X of input time series is a time series that minimizes the sum of the distances to the time series in X regarding the MSM metric. In the following, we use MSM-Mean and DTW-Mean to denote the mean of the MSM metric and DTW distance function, respectively.
An example of MSM-Mean is depicted in Figure 1. It comprises four sample time series from the data set Italy Power Demand of the UCR time series archive [21] with their respective MSM-Mean. In contrast to DTW-Mean, we show that MSM-Mean consists of values only present in the underlying time series. This observation is crucial for the design and efficiency of our algorithm. We prove that the running time of our algorithm is O(n^{k+3} 2^k k^3), thus faster than the DTW-based competitor [31].
In summary, our contributions are: • We give new essential characteristics of the MSM metric. We first prove that there always exists an optimal transformation graph that is a forest, and we further specify the values of some crucial nodes within this forest.
The remainder of the paper is structured as follows. Section 2 reviews related work. In Section 3, we give some important preliminaries for the MSM metric and formulate the MSM-Mean problem. Then, in Section 4, we introduce some new properties of the MSM metric to prove at the end of the section that there always exists a mean consisting of data points of the input time series. The dynamic program for the exact MSM-Mean algorithm is given in Section 5. We experimentally evaluate our approach, discuss various heuristics, and compare it to the DTW-Mean algorithm in Section 6, and conclude in Section 7.

Related Work
For the exploratory analysis of time series, clustering is used to discover interesting patterns in time series datasets [9]. Much research has been done in this area [11]. The surveys [20,18] give a recent overview of many methods. The problem of mean computation is discussed for the Euclidean distance and for the DTW distance, but not for the MSM metric.
Moreover, the use of classification methods is indispensable for accurate time series analysis [33]. The temporal aspect of time series has to be taken into account for clustering and classification, though finding a representation of a set of time series is a challenging task. Determining accurate means of time series is crucial for partitioning clustering approaches [13] like k-means [3], where the prototype of a cluster is a mean of its objects, and for nearest-neighbor classification [23]. These methods are based on the choice of the underlying distance function. Among the existing time series distance measures [34], the DTW distance [5] is a very important measure with applications in, e.g., similarity search [12], speech recognition [6], or gene expression [10]. We now give an overview of mean computation methods using the DTW distance. Besides the exact DTW-Mean algorithm [31] that minimizes the Fréchet function [1], there are many heuristics trying to address this problem. Some approaches first compute a multiple alignment of k input time series and then average the aligned time series column-wise [14,17]. DTW barycenter averaging (DBA) [16] is a heuristic strategy that iteratively refines an initial average sequence in order to minimize its squared DTW distance to the input time series. Other approaches exploit the properties of the Fréchet function [27,30]. Their methods are based on the observation that the Fréchet function is Lipschitz continuous and thus differentiable almost everywhere. Cuturi et al. [27] use a smoothed version of the DTW distance to obtain a differentiable Fréchet function. Brill et al. [31] showed that none of the aforementioned approaches is sufficiently accurate compared to the exact method. Since clustering methods based on partitioning rely on cluster prototype determination [20], it is necessary to compute an accurate mean. All these observations make it indispensable to consider the mean problem for other distance functions, like the MSM metric.
The MSM metric has already been investigated for classification. One of the first studies of the MSM metric concerning its application to classification problems was by Stefan et al. [19]. They perform their tests on 20 data sets of the UCR archive [21]. The MSM distance is tested against the DTW distance, the constrained DTW distance, the edit distance on real sequences, and the Euclidean distance. For a majority of the tests, the MSM distance performs better than the compared measures. There have been further studies of the accuracy of different time series distance measures regarding 1-NN classification problems [25,22]. Bagnall et al. [25] also conclude that the MSM distance leads to better accuracy results than DTW but worse running time. All these studies come to a similar result as the most recent study of Paparrizos et al. [34]. To the best of our knowledge, there are no studies that investigate and extend the theoretical concepts and applications of the MSM distance.
The subject of time series, also known as data series, has recently attracted attention within the database research domain; see [28] for a recent survey. There are time series databases, also known as event stores, that are specially designed for the analysis of time series [24,32]. These systems rarely support clustering, but focus on supporting the basic building blocks for query processing.
Since the MSM distance obeys all properties of a mathematical metric, especially the triangle inequality, it also applies to problems like metric indexing [26,15]. In fact, metric indexing also requires the computation of pivots, which is closely related to the mean. However, pivots belong to the underlying data set, while the mean (of a set of time series) is generally a newly generated object.

Preliminaries
Let us first introduce our notation and problem definition. For k ∈ N, let [k] := {1, . . ., k}. A time series of length n is a sequence x = (x1, . . ., xn), where each data point (in short, point) xi is a real number. Let V(x) = {xi | i ∈ [n]} be the set of all values of points of x. For i < j, the point xi is a predecessor of the point xj, and the point xj is a successor of the point xi. For a set of time series X = {x^(1), . . ., x^(k)}, the ith point of the jth time series of X is denoted by x^(j)_i; time series x^(j) has length n_j. Further, let V(X) = ∪_{j∈[k]} V(x^(j)) = {v1, . . ., vr} be the set of the values of all points of all time series in X.

Move-Split-Merge Operations
We now define the MSM metric, following the notation of Stefan et al. [19], and the MSM-Mean problem. The MSM metric allows three transformation operations to transfer one time series into another: move, split, and merge operations.

A move operation changes the value of a single point. For a time series x = (x1, . . ., xn), a value w ∈ R, and i ∈ [n], it is given by Move_{i,w}(x) := (x1, . . ., xi−1, xi + w, xi+1, . . ., xn), with cost Cost(Move_{i,w}) = |w|. A split operation replaces a point by two consecutive points of the same value; it is given by Split_i(x) := (x1, . . ., xi−1, xi, xi, xi+1, . . ., xn).
A merge operation may be applied to two consecutive points of equal value. For xi = xi+1, it is given by Merge_i(x) := (x1, . . ., xi−1, xi+1, . . ., xn). We say that xi and xi+1 merge to a point z. Split and merge operations are inverse operations. Their costs are assumed to be equal and determined by a given nonnegative constant c = Cost(Split_i) = Cost(Merge_i). A sequence of transformation operations is given by S = (S1, . . ., Ss), where Sj ∈ {Move_{ij,wj}, Split_{ij}, Merge_{ij}}. A transformation T(x, S) of a time series x for a given sequence of transformation operations S is defined as T(x, S) := T(S1(x), (S2, . . ., Ss)). If S is empty, we define T(x, ∅) := x. The cost of a sequence of transformation operations S is given by the sum of the costs of all individual operations, that is, Cost(S) := Σ_{S∈S} Cost(S). We say that S transforms x to y if T(x, S) = y. We call a transformation optimal if it has minimal cost transforming x to y. The MSM distance d(x, y) between two time series x and y is defined as the cost of an optimal transformation. The distance D(X, y) of multiple time series X = {x^(1), . . ., x^(k)} to a time series y is given by D(X, y) = Σ_{x∈X} d(x, y). A mean m of a set of time series X is defined as a time series with minimum distance to X, that is, m = arg min_{z∈Z} D(X, z), where Z is the set of all finite time series. The problem of computing a mean is thus defined as follows:

MSM-Mean
Input: A set of time series X = {x^(1), . . ., x^(k)}.
Output: A time series m such that m = arg min_{z∈Z} D(X, z).
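The pairwise distance d(x, y) itself can be computed in quadratic time by the dynamic program of Stefan et al. [19]. The following is a minimal Python sketch of that recurrence (function and variable names are our own):

```python
def msm_dist(x, y, c=1.0):
    """MSM distance between sequences x and y via the O(len(x)*len(y))
    dynamic program of Stefan et al. [19]; c is the split/merge cost."""
    def smc(new, left, right):
        # Cost of a split/merge step: c alone if `new` lies between its
        # two neighbouring values, otherwise c plus the cheapest extra move.
        if left <= new <= right or right <= new <= left:
            return c
        return c + min(abs(new - left), abs(new - right))

    n, m = len(x), len(y)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[1][1] = abs(x[0] - y[0])
    for i in range(2, n + 1):  # first column: merge the points of x
        D[i][1] = D[i - 1][1] + smc(x[i - 1], x[i - 2], y[0])
    for j in range(2, m + 1):  # first row: split towards the points of y
        D[1][j] = D[1][j - 1] + smc(y[j - 1], x[0], y[j - 2])
    for i in range(2, n + 1):
        for j in range(2, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + abs(x[i - 1] - y[j - 1]),        # move
                D[i - 1][j] + smc(x[i - 1], x[i - 2], y[j - 1]),   # merge
                D[i][j - 1] + smc(y[j - 1], x[i - 1], y[j - 2]),   # split
            )
    return D[n][m]
```

For example, msm_dist([1], [1, 1], c=0.1) returns 0.1, since a single split of cost c suffices; the recurrence also reflects the symmetry of d.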
Before we regard the MSM-Mean problem in more detail, we introduce the concept of transformation graphs to describe the structure of a transformation T(x, S) = y.

Transformation graphs
The transformation T(x, S) = y can be described by a directed acyclic graph G_S(x, y), the transformation graph, with source nodes N(x) = {u1, . . ., um} and sink nodes N(y) = {v1, . . ., vn}, where a node ui represents the point xi and a node vj represents the point yj. All nodes which are neither source nor sink nodes are called intermediate nodes. If the time series and operation sequence are clear from context, we may write G instead of G_S(x, y). Each node in the node set V of G is associated with a value given by a function val : V → R. For source and sink nodes, we have val(ui) = xi and val(vj) = yj. Each intermediate node is also associated with a value. The edges represent the transformation operations of S. To create a transformation graph, for each operation in S a respective move edge, two split edges, or two merge edges are added to the graph. An edge is either a move, split, or merge edge. A move edge can be further specified as an increasing (inc-) or decreasing (dec-) edge if the move operation adds a positive or negative value to the value of the parent node, respectively. The cost of a move edge is the difference between the values of the source and the target node. In the example of Figure 2, the total split and merge cost is 3c and the total move cost is 8; hence, for c = 0.1, the distance between x and y is d(x, y) = 8.3. If a node α is connected to a node β by a split edge and β is a child of α, then there exists a node γ ≠ β to which α is connected by a split edge and which is a child of α. If the nodes α and β are connected by a merge edge and α is a parent of β, then there exists a node γ ≠ α which is connected to β by a merge edge and is a parent of β. Moreover, for the split and the merge case, it holds that val(α) = val(β) = val(γ). Given a sequence of operations S, the transformation graph G_S(x, y) is unique. A given transformation graph G, however, may be derived from different sequences of operations since a sequence S is only partially ordered. In the example transformation graph of Figure 2, this means that the move operation between the node u1 and α and the move operation between u4 and v3 are interchangeable.
A transformation path, in short path, in G_S(x, y) is a directed path from a source node ui ∈ N(x) to a sink node vj ∈ N(y). We say that ui is aligned to vj. A path can be further characterized by its sequence of edge labels. For example, in Figure 2, the path from u1 to v2 is an inc-merge-inc-split path. Analogously, we say that the path consists of consecutive inc-merge-inc-split edges.
A transformation path is monotonic if the move edges on this path are only inc- or only dec-edges. A monotonic path may be specified as increasing or decreasing. A transformation is monotonic if the corresponding transformation graph only contains monotonic paths. A transformation graph is optimal if it belongs to an optimal transformation. Two transformation graphs are equivalent if they have the same source and sink nodes.
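These path notions can be made concrete with a small toy encoding of a transformation graph. The class below, its node names, and the example structure are our own illustration, not a data structure from the paper:

```python
from collections import defaultdict

class TransformationGraph:
    """Toy transformation graph: nodes carry values, directed edges carry
    one of the labels "move", "split", or "merge"."""
    def __init__(self):
        self.val = {}
        self.adj = defaultdict(list)   # node -> list of (child, edge label)

    def add_node(self, name, value):
        self.val[name] = value

    def add_edge(self, parent, child, label):
        self.adj[parent].append((child, label))

    def paths(self, source):
        """Yield every (nodes, labels) transformation path from `source`
        down to a sink node."""
        stack = [(source, [source], [])]
        while stack:
            node, nodes, labels = stack.pop()
            if not self.adj[node]:
                yield nodes, labels
            for child, label in self.adj[node]:
                stack.append((child, nodes + [child], labels + [label]))

    def path_is_monotonic(self, nodes, labels):
        """A path is monotonic if its move edges are all non-decreasing
        or all non-increasing in value."""
        deltas = [self.val[b] - self.val[a]
                  for a, b, lab in zip(nodes, nodes[1:], labels)
                  if lab == "move"]
        return all(d >= 0 for d in deltas) or all(d <= 0 for d in deltas)
```

For instance, a move edge u1 → a followed by two split edges to v1 and v2 yields two monotonic inc-move-split paths.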
In the next section, we recap some known properties of the transformation graph and extend them by proving some new essential characteristics.

Properties of Transformation Graphs
In the following, we summarize some important known properties of the transformation graph by Stefan et al. [19]. The first lemma states that there exists an optimal transformation graph without split and merge edges that occur directly after one another on a path.

Lemma 1 (Proposition 2 [19]). For any two time series x and y, there exists an optimal sequence of transformation operations S such that G_S(x, y) contains no consecutive merge-split or split-merge edges.
By construction, two consecutive move edges are not useful, since they can be combined into one move edge. We extend Lemma 1 to further path restrictions in an optimal transformation graph: there exists an optimal transformation graph without paths containing consecutive split-move-merge edges.
Lemma 2. For any two time series x and y, there exists an optimal sequence of transformation operations S such that G S (x, y) contains no consecutive split-move-merge edges.
Proof. Assume an optimal transformation graph G_S(x, y) including split-move-merge edges. Since the underlying set of transformation operations S of G is partially ordered, we can reorder the operations in S, choosing an order where split, move, and merge operations are applied directly after one another. Figure 3 shows two different possibilities of how consecutive split-move-merge edges may be contained in a transformation graph.
Case I: We consider a split at node α to the nodes α′ and α″ where val(α) = val(α′) = val(α″). It is followed by two move edges, from α′ to β′ and from α″ to β″, and a merge of β′ and β″ to β (see Figure 3a). Since β′ and β″ merge, the values added on both move edges have to be equal, that is, a value w ∈ R. The cost of these transformation operations is 2c + 2|w|. Consider replacing the two split-move-merge paths with one direct move edge from α to β adding w to val(α) (see Figure 3b). This replacement leads to an equivalent transformation with cost |w| < 2c + 2|w|. This is a contradiction to our assumption that G is optimal.
Case II: Consider the part of a transformation graph in Figure 3c. There is a split at α to α′ and α″. The node α′ is connected by a move edge to β′, adding a value w to val(α′). The node β′ merges with β″ to β. Deleting the split-move-merge path and editing the part of the graph as shown in Figure 3d leads to an equivalent transformation graph, saving cost 2c + |w|. This is a contradiction to our assumption that G is optimal.
The next lemma states that there is always an optimal monotonic transformation.
Lemma 3 (Monotonicity lemma [19]). For any two time series x and y, there exists an optimal transformation that converts x into y and that is monotonic.
Summarizing the above properties, there always exists an optimal transformation graph only containing paths from source to sink nodes of the following consecutive edge types:
Type 1: a single move edge;
Type 2: move or split edges;
Type 3: move or merge edges;
Type 4: move or merge edges, followed by a single move edge, followed by move or split edges.
Note that paths of Type 2 and Type 3 contain at least one split or merge edge, respectively. In the following, we consider only transformation graphs that contain only paths of Type 1-4. To identify independent transformation operations, we decompose an optimal transformation graph into its weakly connected components. A weakly connected component is a tree if its underlying undirected subgraph is a tree.
In the following, we give a more detailed view of those weakly connected components which are trees (see Figure 4). The first ones are trees of Type 1. These trees contain only paths of Type 1, that is, there is only one move edge in the tree, connecting one source and one sink node (see Figure 4a). A weakly connected component containing only paths of Type 2 has only one source node and at least two paths of Type 2, that is, it has at least two sink nodes. It is a tree since all nodes have indegree 1. We call these trees trees of Type 2 (see Figure 4b). Trees of Type 3 contain only paths of merge or move operations (Type 3). These trees have at least two source nodes whose paths reach the same sink node. All nodes have outdegree 1 (see Figure 4c). The last weakly connected component which is a tree is a tree of Type 4. These trees contain only paths of Type 4, and at least two of them, that is, they have at least two source and two sink nodes (see Figure 4d). For the following sections, we need a more detailed description of Type-4-trees. All source nodes merge and move to some intermediate node σ. After σ, there is a move to σ*, with subsequent split and move edges leading to the sink nodes. All source and sink nodes of this tree are connected via one single path, which we call the bottleneck, with σ as the first bottleneck node and σ* as the second bottleneck node. All nodes above and including the first bottleneck node have outdegree 1. We call this subgraph the upper tree of σ. All nodes below and including the second bottleneck node have indegree 1. We call this subgraph the lower tree of σ*.
The following lemma states that there always exists an optimal transformation graph where every weakly connected component is a tree of Type 1-4.

Lemma 4. Let x and y be two time series. Then there exists an optimal transformation graph G_S(x, y) such that its weakly connected components are only trees of Type 1-4.
Proof. We show that if a weakly connected component of a given optimal transformation graph G is not a tree of Type 1-3, then it has to be a tree of Type 4. We consider a path of Type 4. In a path of Type 4, there is one part with consecutive merge-move-split edges. Let σ be the node between this merge and move operation, and let σ* be the node between this move and split operation. There may be further move and merge operations in the part above σ; each node above σ still has outdegree 1. The same applies to the part below σ*: further move and split operations still lead to indegree 1 for each node below σ*. Hence, the subgraph of G consisting of the edge from σ to σ* and all move or merge edges above σ and all move or split edges below σ* is a tree of Type 4. We now show that this tree structure cannot be extended without violating the assumptions of the above lemmas. Let α be a node in the upper tree that is connected to a node α′ that is neither in the upper nor in the lower tree. If α is a source node or an intermediate node other than σ, the first operation is a split, where one split edge is on the path to σ and the other is on the path to α′. We get a contradiction to Lemma 2, because the first path includes consecutive split-move-merge edges. If α = σ, we again have a split at α, which is a contradiction to Lemma 1 because we have consecutive merge-split edges. The same argumentation applies for an extension of the lower tree, because it is the symmetric case of the one we described.
It follows that we can decompose an optimal transformation graph G_S(x, y) into a sequence of distinct trees (T1, . . ., Tt). Each tree Ti has a set of source nodes N_Ti(x) and a set of sink nodes N_Ti(y). All nodes of N_Ti(x) and N_Ti(y) are successors of N_Ti−1(x) and N_Ti−1(y), respectively. We call a tree monotonic if all paths in the tree are monotonic. Further, a tree may be specified as increasing or decreasing. Two trees are equivalent if they have the same set of source and sink nodes. The cost of a tree T is the sum of the costs of all edges in the tree.
In the following, we denote an optimal transformation graph fulfilling all the above properties as an optimal transformation forest.
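The decomposition into weakly connected components used above is a standard graph traversal over the underlying undirected graph; a short sketch (the edge-list representation and names are our own):

```python
from collections import defaultdict

def weakly_connected_components(edges):
    """Decompose a directed graph, given as (parent, child) pairs, into its
    weakly connected components by traversing the underlying undirected graph."""
    undirected = defaultdict(set)
    nodes = set()
    for a, b in edges:
        undirected[a].add(b)
        undirected[b].add(a)
        nodes.update((a, b))
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative DFS over the undirected adjacency
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(undirected[node] - comp)
        seen |= comp
        components.append(comp)
    return components
```

Applied to a transformation graph of a Type-1 tree plus a Type-3 tree, this yields exactly two components, one per tree.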

Properties of the MSM Metric
As a main result of this section, we prove that for a set of time series X there exists a mean m such that all points of m are points of at least one time series of X. To this end, we first analyze the structure of trees of optimal transformation forests. Some of the following results are only proven for trees of Type 4 since these trees include all types of possible paths; as a consequence, the proofs for the other tree types are simpler versions of the ones for Type 4.
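Once this result is established, it suggests a conceptually simple (though exponential-time) exact baseline: search candidate means only over sequences whose values come from V(X). The sketch below is purely illustrative and is not the dynamic program of Section 5; the candidate length bound max_len is our own assumption for this sketch:

```python
from itertools import product

def msm_dist(x, y, c=1.0):
    """MSM distance via the quadratic dynamic program of Stefan et al. [19]."""
    def smc(new, left, right):
        if left <= new <= right or right <= new <= left:
            return c
        return c + min(abs(new - left), abs(new - right))
    n, m = len(x), len(y)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[1][1] = abs(x[0] - y[0])
    for i in range(2, n + 1):
        D[i][1] = D[i - 1][1] + smc(x[i - 1], x[i - 2], y[0])
    for j in range(2, m + 1):
        D[1][j] = D[1][j - 1] + smc(y[j - 1], x[0], y[j - 2])
    for i in range(2, n + 1):
        for j in range(2, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + abs(x[i - 1] - y[j - 1]),
                          D[i - 1][j] + smc(x[i - 1], x[i - 2], y[j - 1]),
                          D[i][j - 1] + smc(y[j - 1], x[i - 1], y[j - 2]))
    return D[n][m]

def msm_mean_bruteforce(X, c=1.0, max_len=None):
    """Exhaustive mean search exploiting that some mean uses only values
    occurring in X.  Exponential-time illustration only."""
    values = sorted({v for x in X for v in x})     # the value set V(X)
    if max_len is None:
        max_len = max(len(x) for x in X)           # assumed length bound
    best, best_cost = None, float("inf")
    for length in range(1, max_len + 1):
        for cand in product(values, repeat=length):
            cost = sum(msm_dist(list(cand), x, c) for x in X)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost
```

For three one-point series (1), (2), (9) with c = 1, the search returns the median point (2) with total distance 8, as every candidate point outside V(X) can only do worse.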

Properties of Alignment Trees
We first regard some properties of so-called subtrees, which are substructures of trees of Type 4.

Subtrees
Let G_S(x, y) be an optimal transformation forest. For an intermediate node δ in G that has two parent nodes connected to it by a merge edge each, let S(δ) be the subtree of δ, consisting of all source nodes of G that have a path to δ and of all nodes and edges on these paths. Each subtree has a set of source nodes N_S(δ)(x). Let N_S(δ)(x) = {ui, . . ., uj} be the source nodes of S(δ); we call ui the start node of S(δ) and uj the end node of S(δ). A subtree is increasing (decreasing) if all paths to δ are increasing (decreasing). In the following, we give some properties of subtrees. If there are two move edges to some nodes α2 and β2 that merge to another node γ (see Figure 5a), we first observe that these two move edges cannot both be increasing or both be decreasing.
Lemma 5. Let G_S(x, y) be an optimal transformation forest with nodes α1, α2, β1, β2, and γ and move edges between α1, α2 and between β1, β2. If α2 and β2 merge to γ, then the edges between α1, α2 and β1, β2 cannot both be increasing or both be decreasing.
Proof. Without loss of generality, we prove this claim for increasing edges. Assume towards a contradiction two inc-edges, between α1 and α2 and between β1 and β2, with val(α1) < val(β1) (see Figure 5a). Since val(α2) = val(β2) = val(γ), the cost of these move operations is 2·val(γ) − val(β1) − val(α1). We now consider a modified merge structure with an additional intermediate node γ1 with val(γ1) = val(β1) (see Figure 5b). We now have an inc-edge from α1 to γ1, which merges with β1 to a new node γ2. At γ2, there is an inc-edge to γ. The modified transformation forest is equivalent to the old one, since the parent and children nodes of the regarded nodes stay the same (see Figure 5). The cost of the modified move operations is val(γ) − val(α1), which is smaller by val(γ) − val(β1) > 0. This is a contradiction to G being optimal.
We now make two observations about the value of the node δ in a subtree S(δ). The first lemma states that the value of δ is equal to the value of one of the source nodes of S(δ). Recall that u1, . . ., um are the source nodes of G_S(x, y) with values x1, . . ., xm.

Lemma 6. Let S(δ) be an increasing (decreasing) subtree of δ in an optimal transformation forest G_S(x, y) with N_S(δ)(x) = {ui, . . ., uj}. Then, val(δ) = max(xi, . . ., xj) for increasing subtrees and val(δ) = min(xi, . . ., xj) for decreasing subtrees.

Proof of Lemma 6. Without loss of generality, we prove this claim for increasing subtrees. Figure 6 depicts this subgraph with all mentioned intermediate nodes. Assume towards a contradiction that val(δ) ≠ max(xi, . . ., xj), and let u_ℓ ∈ {ui, . . ., uj} be a source node with x_ℓ = max(xi, . . ., xj), so that val(δ) ≠ x_ℓ. For val(δ) < x_ℓ, it follows that there is a decreasing edge on the path between u_ℓ and δ, which is a contradiction. For val(δ) > x_ℓ, let α be the first intermediate node below {ui, . . ., uj} such that val(α) ≠ max_{u∈N_S(α)(x)} val(u), where N_S(α)(x) ⊆ N_S(δ)(x) are the source nodes of the subtree S(α) of α. There exist two intermediate nodes α′ and α″ such that for the source nodes of their subtrees S(α′) and S(α″), respectively, it holds that N_S(α′)(x) ∪ N_S(α″)(x) = N_S(α)(x). It follows that there exists a path from α′ to α and from α″ to α. Since α is the first intermediate node below N_S(δ)(x) with val(α) ≠ max_{u∈N_S(α)(x)} val(u), it holds that val(α′) = max_{u∈N_S(α′)(x)} val(u) and val(α″) = max_{u∈N_S(α″)(x)} val(u). Consequently, there is an inc-edge on the path from α′ to α and on the path from α″ to α. By Lemma 5, this is a contradiction to G being optimal.
In the next lemma, we specify the value of δ in a subtree S(δ), stating that it is always equal to the value of a specific source node in N_S(δ)(x).
Lemma 7. Let the nodes δ1 and δ2 merge to a node δ in an optimal transformation forest G. Let S(δ) be the subtree of δ, S(δ1) be the subtree of δ1 with end node ui−1, and S(δ2) be the subtree of δ2 with start node ui. If the subtree S(δ) is increasing, then val(δ) = max(xi−1, xi); if it is decreasing, then val(δ) = min(xi−1, xi).

Proof. Without loss of generality, S(δ) is increasing. Figure 7 depicts this subgraph with all mentioned intermediate nodes. We only show the case xi−1 > xi since the other case is analogous. By Lemma 6, it holds that val(δ) = max(val(δ1), val(δ2)). We first show that val(δ) = val(δ1). Assume towards a contradiction that val(δ1) < val(δ2) = val(δ). Since xi−1 > xi, there exist intermediate nodes γ′ and γ″ on the path between ui and δ2 such that xi−1 ≥ val(γ′) and xi−1 < val(γ″). Let S′(δ) be the modified subtree of S(δ). The only difference between S′ and S is that γ′ merges to some intermediate node on the path between ui−1 and δ1. The cost of S′(δ) is Cost(S′(δ)) = Cost(S(δ)) − (val(γ″) − val(γ′)) < Cost(S(δ)). This is a contradiction to G being optimal. Applying Lemma 5 and Lemma 6, we get val(δ) = val(δ1).
In a second step, we prove that val(δ1) = xi−1 = max_{u∈N_S(δ1)(x)} val(u). Assume towards a contradiction that there exists a uj ∈ N_S(δ1)(x) \ {ui−1} such that val(δ1) = xj > xi−1 > xi. Then there exist two intermediate nodes δ′ and δ″ on the path between ui−1 and δ1 such that val(δ′) < val(δ1) and val(δ″) = val(δ1). We consider the modified subtree S′(δ), which is almost equal to S(δ), the only difference being that δ2 merges to some intermediate node on the path between δ′ and δ″. The cost of S′(δ) is strictly smaller than the cost of S(δ). This is a contradiction to G being optimal.
We now apply the above properties to the bottleneck nodes in a tree of Type 4, stating that the first and second bottleneck nodes always have values of the input time series x and y, respectively. Recall that the first bottleneck node σ is the intermediate node to which all source nodes in the tree of Type 4 merge, followed by a move edge to the second bottleneck node σ*.

Corollary 1. Let G_S(x, y) be an optimal transformation forest. In a tree T of Type 4, it holds that val(σ) ∈ V_T(x) and val(σ*) ∈ V_T(y), where V_T(x) and V_T(y) denote the values of the source and sink nodes of T, respectively.
Proof. To prove that val(σ) ∈ V_T(x), we apply Lemma 6, since the upper tree of T is the subtree of σ. By symmetry, it follows that val(σ*) ∈ V_T(y).

The Effect of Perturbing Single Values
We aim to show that there exists a mean of a set of time series that only consists of points of the input set. To this end, we first make observations on the effect of shifting points of a time series that are not from V(X). We analyze, for two time series x and y, how the distance between x and y may be affected by shifting one point of x by ε ∈ R. We let xε,i denote the new time series that is equal to x except at position i, where it has the new point xi + ε. The change of the node ui in the transformation forest is denoted by u^ε_i. In the following, we say that if the distance between xε,i and y is shorter than between x and y, the replacement of x by xε,i is beneficial. If it leads to a longer distance, it is detrimental, and if the distance does not change, it is neutral. Assuming that xi ∉ V(y), the next lemma states that if the replacement of x by xε,i is not neutral, it is beneficial for either ε or −ε; that is, there exists an ε′ > 0 such that for all ε ∈ [0, ε′],

d(xε,i, y) + ε = d(x, y) = d(x−ε,i, y) − ε    (1)

or the analogous statement with the roles of ε and −ε exchanged holds.

Proof. We show the lemma for trees of Type 4; all other cases are simpler versions of this proof. Let T be a tree of Type 4 in G_S(x, y). By Lemma 3, the tree T is monotonic. We assume, without loss of generality, that all monotonic paths in T are increasing. We distinguish whether ui has only predecessors or only successors (Case 1) or both (Case 2) in T. We denote the predecessors of ui by P and the successors of ui by F.
Case 1: ui has only predecessors or only successors in T. We prove the case that ui has only predecessors; the other case is analogous. We first describe the possible structures of the upper tree in T for this case, which are depicted in Figure 8. There is a potential move at ui to a node γ. The node γ merges to δ with a node α*, which is the node resulting from a move at α. The nodes {u_{i−ℓ}, . . ., ui−1} ⊆ P are the source nodes of the subtree of α. Below δ, there may be further subsequent merge and move operations to the first bottleneck node σ. Since xi−1 ≠ xi, there has to be an inc-edge either between ui and γ, if xi−1 > xi, or between ui−1 and α*, if xi−1 < xi, because in the first case val(δ) = xi−1 and in the second case val(δ) = xi (see Lemma 7).
Case 1.1: xi−1 > xi. There is an inc-edge between ui and γ (see Figure 8a). The replacement of x by xε,i is a beneficial increase for all ε ∈ [0, ε′] with ε′ = val(γ) − xi, because the node u^ε_i approaches the node γ and the cost of the adapted move decreases by ε. Thus, we get the left side of Equation (1), d(xε,i, y) + ε = d(x, y). Since the subtree of δ is increasing and xi−1 > xi, it holds by Lemma 7 that val(δ) = xi−1. We get that xi−1 = val(γ) = xi + ε′. For the right side of Equation (1), the argumentation is similar: after replacing x by x−ε,i for ε ≤ ε′, the cost of the move between xi − ε and γ is val(γ) − xi + ε. Therefore, it increases by ε.
Case 1.2: x_{i−1} < x_i. There is an inc-edge between u_{i−1} and α* (see Figure 8b). We modify the structure of T for the replacement of x by x_{ε,i} for ε ∈ [0, ε_I], ε_I > 0. Let T′ be the modified tree with a new node u_i^ε instead of u_i. In T′, the nodes α* and δ do not exist, but T′ contains a new node δ′ such that val(δ′) ∈ [val(α*), val(σ*)]. The node u_i^ε merges to δ′. The rest of the tree stays unchanged. For all ε ∈ [0, ε_I] with ε_I = val(σ*) − x_i, the cost of T′ is equal to the cost of T because we only shifted a merge operation to another position in the tree (see Figure 8c). This is a neutral increase for all ε ∈ [0, ε_I]. It holds that x_i + ε_I = val(σ*) ∈ V(y) (see Corollary 1). Let T″ be another modified tree of T with a new node u_i^{−ε} instead of u_i for ε ∈ [0, ε_D], ε_D > 0. The tree T″ does not contain the node δ but contains a new node δ″ such that val(δ″) ∈ [x_{i−1}, val(α*)], and u_i^{−ε} merges to δ″ (see Figure 8d). For all ε ∈ [0, ε_D] with ε_D = val(α*) − x_{i−1}, we get equal cost of T and T″ since we only shifted a merge operation. From Lemma 7 we get that val(α*) = val(δ) = x_i, and hence x_i − ε_D = x_{i−1}.
Case 2: u_i has predecessors P and successors F. Again, we first describe the upper Type-4 tree T (see Figure 9). Let {u_{i−ℓ}, ..., u_{i−1}} ⊆ P be the source nodes of the subtree of α. At α there is a potential move to α*. Let {u_{i+1}, ..., u_{i+r}} ⊆ F be the source nodes of the subtree of ζ. At ζ there is a potential move to ζ*. After a potential move from u_i to γ there is a merge with α*, which is afterwards merged with ζ* to an intermediate node δ. Without loss of generality, we assume this order of merges to δ. What follows are potential move and merge operations until all nodes in N_T(x) merge to the first bottleneck node σ. Since T is increasing, the subtree of δ is increasing. We further analyze the relation of x_i to its direct predecessor x_{i−1} and its direct successor x_{i+1}.
Case 2.1: x_{i−1} < x_i < x_{i+1}. By Lemma 7, it follows that val(δ) = x_{i+1}. Furthermore, there is no inc-edge between u_i and γ because u_i merges with α* to β with a subsequent inc-edge to β* (see Figure 9a). We modify the tree structure of T for the replacement of x by x_{ε,i}. Let T′ be the modified tree of T, where we have the new node u_i^ε instead of u_i.

Case 2.4: x_{i−1} > x_i < x_{i+1}. There is an inc-edge between u_i and γ (see Figure 9c). Again, we further assume, without loss of generality, that x_{i−1} < x_{i+1}. Following the same argumentation as in Case 1.1, the replacement of x by x_{ε,i} is a beneficial increase for all ε ∈ [0, ε_I] with ε_I = val(γ) − x_i. By Lemma 7, it holds that val(β) = x_{i−1} and val(δ) = x_{i+1}. Note that there are no increasing paths between u_{i−1} and β* and between u_{i+1} and ζ*, because otherwise there is no move between u_i and γ (see Lemma 5). For trees whose source nodes lie in N_T(y), the same proof as for trees of Type 4 can be applied. Since the symmetry properties hold for the MSM metric, the lemma holds for trees of Type 2 as well. For a tree containing only a move edge, the argumentation is the same as in Case 1.1.
In the following, we regard a block B of adjacent source nodes N_B(x) = {u_i, ..., u_ℓ} representing points of equal value of a time series x. A block is a maximal contiguous sequence of nodes with the same value. Our aim is to show a generalization of Lemma 8, shifting all points of a block B by some ε ∈ R. We show that shifting a block is either beneficial for one direction or neutral for both directions. Let x_{ε,i,ℓ}, i < ℓ, denote the time series that is equal to x except at the positions i, ..., ℓ, where the points of x are replaced by x_i + ε. The definitions of beneficial, detrimental, and neutral replacements of x by x_{ε,i,ℓ} are analogous to the previous ones. A block may be contained in several trees; hence, shifting a block affects the cost of all these trees. To count the number of trees with beneficial or detrimental replacement, we introduce two further parameters ρ_I, ρ_D ∈ N.

Lemma 9. Let x = (x_1, ..., x_m) and y = (y_1, ..., y_n) be two time series with distance d(x, y). Consider a block B of equal points N_B(x) = {u_i, ..., u_ℓ} with x_i ∉ V(y). Then either there exist an ε′ > 0 and ρ_I, ρ_D ∈ N such that for all ε ∈ [0, ε′] one of the following equations holds:

d(x_{ε,i,ℓ}, y) + ρ_I ε = d(x, y)  (beneficial increase), or
d(x_{−ε,i,ℓ}, y) + ρ_D ε = d(x, y)  (beneficial decrease);

or there exist ε_I, ε_D > 0 such that, for neutral increases and decreases,

d(x_{ε,i,ℓ}, y) = d(x, y) for all ε ∈ [0, ε_I]  (neutral increase), and
d(x_{−ε,i,ℓ}, y) = d(x, y) for all ε ∈ [0, ε_D]  (neutral decrease).
Proof. We distinguish whether all nodes of a block B belong to the same tree or whether they are in different trees. Without loss of generality, we specify monotonic paths and trees to be increasing.

Case 1: All nodes of N_B(x) belong to the same tree T, which is considered to be a tree of Type 4, since all other cases follow the same or a simpler argumentation. We further distinguish whether the nodes of N_B(x) are the only nodes in T.

Case 1.1: N_T(x) = N_B(x). For the bottleneck nodes it holds that val(σ) = x_i and val(σ*) ∈ V(y) (see Corollary 1). The replacement of x by x_{ε,i,ℓ} is a beneficial increase for all ε ∈ [0, ε′] with ε′ = val(σ*) − x_i because the intermediate node is also shifted to val(σ) + ε, which leads to lower move cost of val(σ*) − val(σ) − ε, that is, a decrease by ε. Therefore, we get the left side of the first equation, d(x_{ε,i,ℓ}, y) + ε = d(x, y), with ρ_I = 1. For the right side of the equation with ρ_D = 1, the argumentation is similar: replacing x by x_{−ε,i,ℓ}, we get new move cost of val(σ*) − val(σ) + ε, that is, an increase by ε.
Case 1.2: |N_T(x)| > |N_B(x)|. Since T is a tree of Type 4, all nodes in N_T(x) merge to the first bottleneck node σ. Moreover, it is evident that merging adjacent points in T that are equal creates lower cost than merging two points that are different. Therefore, there exists an intermediate node σ′ to which all nodes in N_B(x) merge (see Figure 10b). Then Lemma 8 can be applied for u_i = σ′ with ρ_I = ρ_D = 1. Depending on the case, an ε_I is specified such that we get one of the above equations for ε ∈ [0, ε_I]. For ε = ε_I, the block is shifted until it reaches a value of the adjacent points of the block, that is, x_{i−1} or x_{ℓ+1}, or it reaches a point in V(y).
Case 2: The nodes in N_B(x) belong to different trees (see Figure 10c). In this case, we need to count for how many trees we have beneficial increases and decreases. To decide whether a replacement is beneficial or detrimental, there are two possible cases of trees belonging to the block B. The first case is that all nodes of a tree belong to N_B(x); then we can apply Case 1.1. The second case concerns the boundary values of N_B(x) merging with the predecessors or successors of the block B; following the argumentation of Case 1.2, we determine whether the replacement of x by x_{ε,i,ℓ} is beneficial, neutral, or detrimental. Ignoring neutral replacements, we let x^+_{i,ℓ} be the number of trees for which we have a beneficial increase and x^−_{i,ℓ} the number of trees for which we have a beneficial decrease. Shifting a whole block may therefore lead to a reduction of distance by more than ε. We get the above statement for ρ_I = x^+_{i,ℓ} and ρ_D = x^−_{i,ℓ}. By Lemma 8, we get an ε_T for all trees T of the block B restricting a beneficial or neutral replacement. Let ε^+_min be the minimum of all ε_T for which we have a beneficial increase; analogously, ε^−_min is defined for beneficial decreases. Without loss of generality, we assume x^+_{i,ℓ} > x^−_{i,ℓ}. Hence, it holds that x_i + ε^+_min is in {x_{i−1}, x_{ℓ+1}} or in V(y).

MSM-Mean Values
We now use beneficial and neutral replacements to prove that for any set X there exists a mean m such that all points of m are points of at least one time series of X.
Proof. Assume towards a contradiction that every mean has at least one point that is not in V(X). Among all means, choose a mean m such that (1) n_V, the number of points of m that are in V(X), is maximum, and (2) among all means with n_V points from V(X), the number of indices j with m_j ≠ m_{j+1} is minimum. In other words, m has a minimal number of blocks. Let B be a block in m whose points are not in V(X). We apply Lemma 9 to show that there exists an ε ∈ R such that m_{ε,i,ℓ} is a mean where the points of the shifted block B reach a point of a predecessor or successor of B or a point in V(X). We now specify ε. First, we determine whether ε is positive or negative. For each time series in X, one of the cases of Lemma 9 applies, and we introduce two variables to count how many beneficial increases and decreases we have; neutral replacements are not counted. Let x^+ be the sum of the ρ_I for beneficial increases and x^− the sum of the ρ_D for beneficial decreases over all time series. If x^+ ≥ x^−, we set ε as the minimum of the specified ε′ for beneficial increases and all ε_I (see Lemma 9). If x^+ < x^−, we set ε as the maximum of the specified −ε′ for beneficial decreases and all −ε_D. Compared to the mean m, all values of m_{ε,i,ℓ} are the same except the values of the shifted block. By Lemma 9, the points of the new mean m_{ε,i,ℓ} are shifted by the specified ε until they reach a point of the right or left neighbor block or a point in V(X). If they reach a point of the right or left neighbor block, we have a contradiction to the selection of a mean with a minimal number of blocks. If they reach a point in V(X), we have a contradiction to the selection of a mean with a minimal number of points that are not in V(X).

Computing an exact MSM-Mean
Based on Lemma 10, we now give a dynamic program computing a mean m of k time series X = {x^(1), ..., x^(k)}. The transformation operations are described for the direction transforming X to m.

Dynamic Program
We fill a (k + 2)-dimensional table D with entries D[(p_1, ..., p_k), ℓ, s], where
• p_i ∈ [n_i] is the current position of the time series x^(i),
• the index ℓ ∈ [N] indicates the current position of m, and
• s is the index of a point v_s ∈ V(X).
We also say that (p_1, ..., p_k) are the current positions of X. For brevity, we write p = (p_1, ..., p_k). The entry D[p, ℓ, s] represents the cost of transforming the prefixes of the time series up to the current positions p into the partial mean (m_1, ..., m_ℓ) with m_ℓ = v_s. Stefan et al. [19] give the recursive formulation of the MSM metric as the minimum of the cost of the three transformation operations (move, split, and merge).
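For reference, the pairwise MSM recursion of Stefan et al. [19] that the table entries build on can be sketched as follows. This is a minimal Python sketch of the published pairwise O(mn) dynamic program (function and variable names are ours), not of the (k + 2)-dimensional mean table:

```python
def msm_distance(x, y, c):
    """Pairwise MSM distance between time series x and y for split/merge cost c."""

    def split_merge_cost(new, prev, other):
        # Cost c if `new` lies between its two neighbour values,
        # else c plus the distance to the closer of the two.
        if prev <= new <= other or prev >= new >= other:
            return c
        return c + min(abs(new - prev), abs(new - other))

    m, n = len(x), len(y)
    D = [[0.0] * n for _ in range(m)]
    D[0][0] = abs(x[0] - y[0])
    for i in range(1, m):                      # first column: splits/merges in x
        D[i][0] = D[i - 1][0] + split_merge_cost(x[i], x[i - 1], y[0])
    for j in range(1, n):                      # first row: splits/merges in y
        D[0][j] = D[0][j - 1] + split_merge_cost(y[j], x[0], y[j - 1])
    for i in range(1, m):
        for j in range(1, n):
            D[i][j] = min(
                D[i - 1][j - 1] + abs(x[i] - y[j]),                   # move
                D[i - 1][j] + split_merge_cost(x[i], x[i - 1], y[j]),  # split/merge in x
                D[i][j - 1] + split_merge_cost(y[j], x[i], y[j - 1]),  # split/merge in y
            )
    return D[m - 1][n - 1]

# The example of Figure 2: merge/split cost 3c plus move cost 8 for c = 0.1.
print(round(msm_distance([4, 5, 5, 10], [10, 7, 8], 0.1), 10))  # 8.3
```

The recursion mirrors the three transformation operations named above: the diagonal step is a move, and the two axis steps apply the split/merge cost function.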

Running Time Bound
We now show an upper bound on the maximum mean length in terms of the total length of X. To this end, we first make the observation that the index set I_MO is never empty. That is, it is not optimal to apply only split operations in one recursion step.
Proof. Let D(X, m) be the distance of a mean m to X. Assume towards a contradiction that there exists a recursion step where I_MO = ∅, that is, in each time series in X there is a split at the current point x^(i)_{p_i}. Let m′ be a mean of X equal to m but with the point m_ℓ deleted. For the mean m′, we save the splitting cost Σ_{i∈I} C(v_s, x^(i)_{p_i}, v_{s′}) without changing the alignment of all other points in X. It follows that D(X, m′) < D(X, m), a contradiction to m being a mean.
Lemma 11 now leads to the following upper bound for the MSM-Mean length.
Lemma 12. Let X = {x^(1), ..., x^(k)} be a set of time series with maximum length max_{j∈[k]} |x^(j)| = n. Then every mean m has length at most (n − 1)k + 1.

We now bound the running time of our algorithm (Lemma 13). In the dynamic programming table D, at most n^{k+2} k^2 entries have to be computed: there are at most n^k combinations of current positions of the k time series, the maximum length of the mean is (n − 1)k + 1 ≤ kn, and the size of V(X) is at most kn. For each table entry, the minimum over the set V(X) is taken, which again comprises kn data points. For each minimum over V(X), all subsets of [k] are considered, which are at most 2^k sets; the subsets of [k] are generated only once for both I_MO and I_ME. Thus, filling the table iteratively takes time O(n^{k+3} 2^k k^3).
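The factors in the running time bound can be checked term by term:

```latex
\underbrace{n^{k}}_{\text{positions } (p_1,\dots,p_k)}
\cdot \underbrace{kn}_{\text{mean positions } \ell}
\cdot \underbrace{kn}_{\text{values } v_s}
\cdot \underbrace{kn}_{\min \text{ over } V(X)}
\cdot \underbrace{2^{k}}_{\text{subsets of } [k]}
\;=\; n^{k+3}\, 2^{k}\, k^{3}.
```

The first three factors count the table entries (at most n^{k+2} k^2), and the last two the work per entry.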
For the traceback, the start entry of D is any entry at the positions (n_1, ..., n_k) with minimal cost, that is, an entry D[(n_1, ..., n_k), ℓ_start, s_start] minimizing the cost over all ℓ and s. The length of the mean m is ℓ_start, with m_{ℓ_start} = v_{s_start}. In each traceback step, the predecessors of the current entry are determined, that is, the entries leading to the cost of the current entry; a predecessor of an entry is not unique. For setting the mean data point, we consider the current entry D[(p_1, ..., p_k), ℓ, s] and the entry of the predecessor D[(q_1, ..., q_k), ℓ′, s′]. If ℓ′ = ℓ − 1, the point v_{s′} is assigned to the mean point m_{ℓ′} and we continue with the next traceback step; otherwise, the next traceback step is applied directly without assigning a mean point. We repeat this procedure until we reach the entry D[(1, ..., 1), 1, s*]. The running time of filling the table clearly dominates the linear time for the traceback.

Implementation & Window Heuristic
We fill the table D iteratively and apply the traceback mechanism described above afterwards. Since the running time of MSM-Mean will often be too high even for moderate problem sizes, we introduce the window heuristic to avoid computing all entries of table D. Similar to a heuristic for the Levenshtein distance [7], the key idea is to introduce a parameter d, called the window size, representing the maximum difference between the current positions of the time series within the recursion. All entries whose current positions are not within distance d of each other are discarded. For example, an entry with current positions (6, 3, 4) of X is not computed for d = 2. In the case of a set of time series with unequal lengths, where n_min and n_max denote the minimum and maximum length, respectively, of all time series, d has to be greater than n_max − n_min.
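As a minimal sketch (the function name is ours, not from the paper's implementation), the pruning test for a table entry reduces to one comparison:

```python
def within_window(positions, d):
    """An entry with current positions (p1, ..., pk) is computed only if
    all positions lie within the window size d of each other."""
    return max(positions) - min(positions) <= d

# The example from the text: for d = 2 the entry at positions (6, 3, 4)
# is discarded, since 6 - 3 > 2.
print(within_window((6, 3, 4), 2))  # False
print(within_window((6, 5, 4), 2))  # True
```

In the dynamic program, entries failing this test are simply never filled, shrinking the table from roughly n^k position tuples to a band of width d around the diagonal.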

Experimental Evaluation
This section provides important results from a selection of experiments using implementations of the mean algorithms. After a description of the experimental setup, we first provide a running time comparison of the DTW-Mean algorithm [31] and our MSM-Mean algorithm. Furthermore, we examine the accuracy and running times of MSM-Mean for various heuristics.

Experimental Setup
The running times of our Java implementations are measured on a server with Ubuntu Linux 20.04 LTS, two AMD EPYC 7742 CPUs at 2.25 GHz (2.8 GHz boost), 1 TB of RAM, and Java version 15.0.2. Our implementations are single-threaded. For our results, at most 26 GB of RAM were occupied.
The experiments are conducted on 20 UCR data sets [21] that Stefan et al. [19] already used (see Table 1). The UCR data sets were collected for time series classification; each consists of a training and a testing set containing time series of different classes and different lengths. Since we are not yet using the sets for classification use cases, we take only the training set of each data set for our experimental setup. The parameter c is set to a constant for every data set, following the suggestions of Stefan et al. [19].
Due to the complexity of the algorithms, we draw time series samples from the training sets obtained from the UCR archive in the following way. For each class of the training sets, we randomly pick k time series, k ∈ {3, 4, 5}, and from each of them we cut out a contiguous subsequence of length n starting at a random data point, n ∈ {10, ..., 50}. In addition, we limit the length of the mean time series to at most n.
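The sampling procedure can be sketched as follows (a hypothetical helper illustrating the described protocol, not the authors' code):

```python
import random

def draw_samples(class_series, k, n):
    """Pick k random time series of one class and cut a contiguous
    subsequence of length n out of each, starting at a random point."""
    chosen = random.sample(class_series, k)
    samples = []
    for series in chosen:
        # Any start position that leaves room for n points is allowed.
        start = random.randrange(len(series) - n + 1)
        samples.append(series[start:start + n])
    return samples
```

Each call yields one problem instance of k subsequences of equal length n, matching the instance sizes used in the experiments below.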

Running Time Comparisons
In our first experiment, we consider the running times for k = 3, 4. Figure 11 shows the average running time over all 20 data sets as a function of n. Our MSM-Mean algorithm is substantially faster than the DTW counterpart. The outlier in the DTW graph is due to a data set where the implementation does not complete within 10 minutes. Figure 12 in the appendix provides box plots depicting the running times of both algorithms for k = 4 and n = 10, ..., 13. They reveal that the MSM-Mean algorithm has smaller medians and interquartile ranges and fewer outliers with high running times compared to the DTW-Mean algorithm. The MSM-Mean implementation was able to compute any instance for k = 3, n < 43; k = 4, n < 19; and k = 5, n < 11 within 10 minutes. For DTW-Mean, this was only the case for k = 3, n < 29 and k = 4, n < 14.

MSM-Mean Quality
To evaluate the quality of the MSM-Mean, we run the algorithm on the ItalyPowerDemand data set [21], where each time series has length 24. The data set contains two classes. For different values of c ∈ {0.01, 0.1, 0.2, 0.5}, Table 2 shows the distance of MSM-Mean to three other time series. The first row reports the distance when the time series belong to one class, while the second row provides the distance when taking them from both classes. The results confirm for all c that the distances of MSM-Mean are lower when the time series belong to one class.

Length of the Mean
We implemented two versions of the MSM-Mean algorithm, one with a fixed length n of the mean and one without length constraints. As shown in Lemma 12, the length of MSM-Mean is at most (n − 1)k + 1. However, the results of our experiments for k = 3 and n ∈ {10, ..., 30} show that the length of MSM-Mean is always exactly n. Thus, it is advisable to use this constraint, as done in the experiments discussed above.

Discretization Heuristic
Because the domain size of the values of a time series has a significant effect on the performance of the MSM-Mean algorithm, we propose a second heuristic where the domain is split into v buckets of equal length. Each value x of a time series is then replaced by the center point of the bucket to which x belongs. Thus, there are at most v different values in total. Figure 13 shows the running time of this heuristic as a function of v for k = 3 and n = 30. There is a substantial (close to linear) decrease in the running time with a decreasing number of buckets. Moreover, the relative error is quite moderate: we observed an average and maximum error of 4.6% and 8.47%, respectively.
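A minimal sketch of this bucketing (our own illustration, assuming the domain is the value range of the input):

```python
def discretize(values, v):
    """Split the value domain [lo, hi] into v buckets of equal length and
    replace every value by the center of its bucket."""
    lo, hi = min(values), max(values)
    if hi == lo:                    # constant series: nothing to discretize
        return list(values)
    width = (hi - lo) / v
    centers = []
    for x in values:
        b = min(int((x - lo) / width), v - 1)  # clamp hi into the last bucket
        centers.append(lo + (b + 0.5) * width)
    return centers

print(discretize([0.0, 1.0, 2.0, 3.0], 2))  # [0.75, 0.75, 2.25, 2.25]
```

Since at most v distinct values remain, the candidate set V(X) that the dynamic program minimizes over shrinks accordingly, which explains the near-linear speedup in v.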

Window Heuristic
In the following, we investigate the window heuristic described in Section 5.3 for the MSM-Mean problem with k = 3 and n ∈ {10, ..., 42}. We examine the window sizes d = 1, 2, 3 in our experiments and analyze the relative error between the exact mean and the means obtained from the window heuristic. As expected, the higher the window size d, the smaller the relative error. The relative error averaged over all n and all data sets was 4.8%, 3.2%, and 2.4%, and the maximum relative error was 9.1%, 6.4%, and 5.4% for d = 1, 2, 3, respectively. Figure 14 shows the running time of the window heuristic in comparison to the exact computation as a function of n. Note that the y-axis plots the running time on a logarithmic scale. For all parameter settings, the running times improve substantially in comparison to the exact approach.

Conclusion and Future Work
This paper introduces the MSM-Mean problem of computing the mean of a set of time series for the move-split-merge (MSM) metric. We present an exact algorithm for MSM-Mean with a better running time than a recent algorithm for computing the mean for the DTW distance. Experimental results confirm the theoretically proven superiority of our MSM-Mean algorithm in comparison to the DTW counterpart. The key observation of our method is that an MSM-Mean exists whose data points occur in at least one of the underlying time series. In addition, we provide an upper bound for the length of MSM-Mean; in our experiments, the mean is much shorter, rarely exceeding the length of the longest time series. The paper also provides two heuristics for speeding up the computation of the mean without sacrificing much accuracy, as shown in our experimental evaluation.
In future work, we will tackle the following issues. First, we will examine how to use MSM-Mean in real clustering and classification problems. Second, we plan to develop optimization strategies such as the A* algorithm [4] for further improving the running time of our algorithm by avoiding filling up the entire dynamic programming table; as a starting point, the structure of the transformation forests and the metric properties of the MSM distance could be further explored. Third, the metric properties of the MSM distance, especially the triangle inequality, enable applications of MSM-Mean to metric indexing. Finally, we conjecture that computing an MSM-Mean is NP-hard; proving this conjecture could be a next research step.

Figure 1: MSM-Mean of four time series from the Italy Power Demand data set containing time series of length n = 24.

Figure 2: Optimal transformation graph of x = (4, 5, 5, 10) to y = (10, 7, 8) for c = 0.1. Move edges are colored red in this work; the cost of a move edge is the difference between its source and target points. In total, we have merge and split cost 3c and move cost 8. Hence, the distance between x and y is d(x, y) = 3 · 0.1 + 8 = 8.3.

Figure 3: a) First possibility of a transformation graph including consecutive split-move-merge edges. b) Equivalent transformation graph to a). c) Second possibility of a transformation graph including consecutive split-move-merge edges. d) Equivalent transformation graph to c).

Figure 4: All red edges are move operations; black arrows are merge or split edges. The dashed lines represent paths from one node to another without specifying how many intermediate nodes lie on them. a) Tree of Type 1. b) Tree of Type 2. c) Tree of Type 3. d) Tree of Type 4.

Figure 6: Structure of the subtree of δ explained in the proof of Lemma 6. Note that this is only a schematic representation and that there may be further intermediate nodes which are not marked.

Figure 7: Structure of the subtree of δ described in the proof of Lemma 7. Note that this is only a schematic representation and that there may be further intermediate nodes which are not marked.

Figure 8: Schematic representation of the trees discussed for Case 1 in the proof of Lemma 8. The node u_i has only predecessors. The dashed red edges show potential move operations. a) Case 1.1: x_{i−1} > x_i. b) Case 1.2: x_{i−1} < x_i. c) Proof mechanism introducing the modified tree T′, where the path on which the new node δ′ can be shifted is marked in blue. d) Proof mechanism introducing the modified tree T″, following the same mechanism as in c).
The node u_i^ε merges with α*; the rest of the tree stays unchanged. For ε ∈ [0, ε_I] with ε_I = val(β*) − x_i, the cost of T′ is equal to the cost of T because we only shifted a merge operation to another position in the tree. This is a neutral increase for all ε ∈ [0, ε_I]. It holds that x_i + ε_I = val(β*) = x_{i+1}. Let further T″ be another modified tree of T. The tree T″ does not contain the node β; instead, it contains a new node β″ with val(β″) ∈ [x_{i−1}, val(α*)], to which u_i^{−ε} merges. Again, we only shifted a merge operation, which leads to equal cost of T and T″ for all ε ∈ [0, ε_D] with ε_D = val(α*) − x_{i−1}. We have val(α*) = x_i and hence x_i − ε_D = x_{i−1}.

Case 2.2: x_{i−1} > x_i > x_{i+1}. This case is analogous to Case 2.1.

Case 2.3: x_{i−1} < x_i > x_{i+1}. We further assume, without loss of generality, that x_{i−1} < x_{i+1}. By Lemma 7, it holds that x_i = val(δ). We have inc-edges between u_{i−1} and α* and between u_{i+1} and ζ* (see Figure 9b). The replacement of x by x_{−ε,i} for ε ∈ [0, ε_D] is a beneficial decrease because the merge points β and δ are shifted by −ε: the move costs are val(α*) − ε − x_{i−1} and val(ζ*) − ε − x_{i+1} for the two move operations, that is, a decrease of 2ε. The new merge node of β* and ζ* is denoted by δ′. For the new path between δ′ and σ*, we have cost |val(σ*) − val(δ′) + ε|, that is, an increase of the cost by ε. We get the left side of Equation (2), that is, d(x_{−ε,i}, y) + ε = d(x, y), for all ε ∈ [0, ε_D] with ε_D = x_i − x_{i+1}. It holds that x_i − ε_D = x_{i+1}. The argumentation for the detrimental replacement of x by x_{ε,i} is analogous to Case 1.1.

Figure 9: Schematic representation of the trees discussed for Case 2 in the proof of Lemma 8. The node u_i has predecessors and successors. a) Case 2.1: x_{i−1} < x_i < x_{i+1}. b) Case 2.3: x_{i−1} < x_i > x_{i+1}. c) Case 2.4: x_{i−1} > x_i < x_{i+1}.

Figure 10: Schematic representation of the three cases for proving Lemma 9, depending on the structure of a block B. a) Case 1.1: N_T(x) = N_B(x). b) Case 1.2: |N_T(x)| > |N_B(x)|. c) Case 2: the nodes in N_B(x) belong to different trees.

Lemma 11.
Let m be a mean of a set of k time series X. Then D[p, ℓ, s] < min_{v_{s′} ∈ V(X)} { D[p, ℓ − 1, s′] + Σ_{i∈I} C(v_s, x^(i)_{p_i}, v_{s′}) }. In other words, regarding the cost for the transformation up to the positions (p_1, ..., p_k) of X and of m, it is never optimal to align all time series only by split operations (I_SP = I) to the points m_ℓ and m_{ℓ−1}, which by the recursion formula would give exactly the cost D[p, ℓ, s] = min_{v_{s′} ∈ V(X)} { D[p, ℓ − 1, s′] + Σ_{i∈I} C(v_s, x^(i)_{p_i}, v_{s′}) }.

Lemma 12. Let X = {x^(1), ..., x^(k)} be a set of time series with maximum length max_{j∈[k]} |x^(j)| = n. Then every mean m has length at most (n − 1)k + 1.

Proof. Towards a contradiction, let m be a mean of X with length N > (n − 1)k + 1. The entry of the first recursion call is D[(n_1, ..., n_k), N, s]. Consider any sequence of recursion steps from D[(n_1, ..., n_k), N, s] to D[(1, ..., 1), ·, ·]; each step is associated with the index sets I_MO and I_SP, or I_ME. By Lemma 11, it holds that I_MO ≠ ∅ in each step. That is, at least one current position of X is reduced by one in each recursion step until an entry D[(1, ..., 1), ℓ′, s′] is reached. These are at most (n − 1)k + 1 recursion steps. Since N > (n − 1)k + 1, it holds that ℓ′ > 1. The only possibility for a further recursion step from D[(1, ..., 1), ℓ′, s′] is to set I_SP = I, since D[p, ℓ, s] = +∞ whenever p_i < 1 for some i. By Lemma 11, we get a contradiction to m being a mean.

We now bound the running time of our algorithm.

Lemma 13. The MSM-Mean problem for k input time series of length at most n can be solved in time O(n^{k+3} 2^k k^3).

Figure 11: Running time comparison of MSM-Mean and DTW-Mean for k = 3 as a function of n.

Figure 12: Running time comparison of MSM-Mean and DTW-Mean for k = 4.

Figure 13: Running time of the MSM-Mean computation with k = 3 and n = 30 using the discretization heuristic.

Table 1 :
List of 21 UCR time series data sets. For our running time experiments, we did not use the Italy Power Demand data set (*) since its time series are too short. The quality analysis of MSM-Mean using this data set was conducted for c ∈ {0.01, 0.1, 0.2, 0.5}.

Table 2 :
Distance of the mean to time series of one class of the ItalyPowerDemand data set and to time series of mixed classes, for k = 3, n = 24, and varying c.