Time-Warping Invariants of Multidimensional Time Series

In data science, one is often confronted with a time series representing measurements of some quantity of interest. Usually, in a first step, features of the time series need to be extracted. These are numerical quantities that aim to succinctly describe the data and to dampen the influence of noise. In some applications, these features are also required to satisfy some invariance properties. In this paper, we concentrate on time-warping invariants. We show that these correspond to a certain family of iterated sums of the increments of the time series, known as quasisymmetric functions in the mathematics literature. We present these invariant features in an algebraic framework, and we develop some of their basic properties.


Motivation
Given a discrete time series $x = (x_0, x_1, \ldots, x_N) \in (\mathbb{R}^d)^{N+1}$, where N ≥ 1 is an arbitrary time horizon, our foremost, and original, motivation stems from the desire to extract features from x that are invariant to time warping.
The precise definition of the latter will be given in Section 4, but Figure 1 illustrates what we mean by time warping: the time series is allowed to "stand still" or to "stutter", which means that x has repetitions of values at consecutive time steps (here at time t = 3).
Remark 1.1. In this section we consider the notationally simpler case d = 1, that is, when $x \in \mathbb{R}^{N+1}$. Our interest is prompted, on the one hand, by the extensive literature on the dynamic time warping (DTW) distance [BC94], a distance on discrete time series that is invariant to time warping. On the other hand, the following example illustrates where such invariant features become useful.
Example 1.2. Assume that there is a deterministic time series $x \in \mathbb{R}^{N+1}$ which models some "prototype" evolution of a quantity, say the prototype heartbeat in a patient's ECG. This prototype is unknown, but one records many samples of it, run at different speeds and contaminated by noise (compare [Big13]). A model for these observations is then
$$ y^{(\ell)}_n = x_{h^{(\ell)}(n)} + w^{(\ell)}_n, \qquad n = 1, \ldots, M, \quad \ell = 1, \ldots, L. $$
Here L is the number of observations, M ≥ N is the time horizon we allow the prototype to be "spread out" over, $h^{(\ell)} : \{1, \ldots, M\} \to \{1, \ldots, N\}$ are unknown non-decreasing, surjective time changes, and the $w^{(\ell)}$ are independent and identically distributed (iid) random walks. The goal is to recover x (up to time warping).
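To make the observation model of Example 1.2 concrete, the following Python sketch generates noisy, time-warped copies of a prototype. It is purely illustrative and not part of the paper; the function names (`random_time_change`, `observe`) are ours, and the random-walk noise is Gaussian by assumption.

```python
import random

def random_time_change(M, N, rng):
    """A non-decreasing surjective map h : {1,...,M} -> {1,...,N}, with M >= N."""
    steps = [1] * (N - 1) + [0] * (M - N)  # exactly N-1 unit increases, placed randomly
    rng.shuffle(steps)
    h = [1]
    for s in steps:
        h.append(h[-1] + s)
    return h  # h[0] = 1, h[-1] = N

def observe(x, M, sigma, rng):
    """One observation y^(l): the prototype x, time-warped by h and corrupted
    by an additive random walk with Gaussian steps of standard deviation sigma."""
    h = random_time_change(M, len(x), rng)
    w, y = 0.0, []
    for n in range(M):
        w += rng.gauss(0.0, sigma)  # random-walk noise w^(l)_n
        y.append(x[h[n] - 1] + w)
    return y

rng = random.Random(42)
x = [0.0, 1.0, 3.0, 2.0, 5.0]  # the unknown prototype (N = 5)
samples = [observe(x, 8, 0.1, rng) for _ in range(3)]
assert all(len(y) == 8 for y in samples)
```

With `sigma = 0` every observation is exactly a time-warped copy of the prototype, which is the situation the invariant features below are designed to quotient out.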
The currently used method [JRCL91, Big13] consists of first trying to align the different samples, i.e., to estimate the time changes $h^{(\ell)}$, and averaging afterwards. This seems to work well in regimes where the noise $w^{(\ell)}$ is small (large signal-to-noise ratio), but will break down otherwise.
Guided by invariant methods in cryo-EM [BBSK + 17] we then propose the following procedure.
1. Calculate features of y ( ) that do not see time warpings.
2. Average those features over the independent samples, giving the law of large numbers a chance to cancel out the noise and getting an approximation of the features of x.
3. Invert the averaged features to arrive at a candidate for x.

Our approach to Step 1 is new and will be presented in this paper. Steps 2 and 3 will be addressed in future work.
A moment's thought reveals that iterated sums of the increments of x are invariant in the desired sense. For example, the simple sum $\sum_i (x_i - x_{i-1})$ or the more complex expressions
$$ \sum_{i_1 < i_2} \Delta x_{i_1}\, \Delta x_{i_2}, \qquad \sum_i (\Delta x_i)^2, \qquad \sum_{i_1 \le i_2} \Delta x_{i_1}\, \Delta x_{i_2}, \tag{1} $$
with $\Delta x_i := x_i - x_{i-1}$, are features of the time series that do not change when warping time, i.e., when repetitions of points, $x_i = x_{i+1} = \cdots = x_{i+j}$, occur in x. However, two questions immediately emerge.
(A) The three expressions in (1) are already linearly dependent (adding the first and second sum gives the third). How to store only linearly independent expressions?
(B) Do iterated sums of increments give all (polynomial) time-warping invariants?
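Both the invariance and the linear dependence noted in (A) can be checked numerically. The following sketch (ours, not from the paper) evaluates the three iterated sums of (1) on a series and on a "stuttered" copy of it:

```python
def increments(x):
    return [b - a for a, b in zip(x, x[1:])]

def sum_strict(dx):
    """sum over i1 < i2 of dx_{i1} * dx_{i2}, computed with a running prefix sum."""
    total, prefix = 0.0, 0.0
    for d in dx:
        total += prefix * d
        prefix += d
    return total

def sum_squares(dx):
    """sum over i of (dx_i)^2."""
    return sum(d * d for d in dx)

def sum_weak(dx):
    """sum over i1 <= i2 of dx_{i1} * dx_{i2}, by a direct double loop."""
    n = len(dx)
    return sum(dx[i] * dx[j] for i in range(n) for j in range(i, n))

x = [1.0, 3.0, 2.0, 2.0, 5.0]
x_stutter = [1.0, 3.0, 3.0, 2.0, 2.0, 2.0, 5.0]  # same series with repeated values
# time-warping invariance: repeated values give zero increments
for f in (sum_strict, sum_squares, sum_weak):
    assert abs(f(increments(x)) - f(increments(x_stutter))) < 1e-12
# the linear dependence of (A): first + second = third
dx = increments(x)
assert abs(sum_strict(dx) + sum_squares(dx) - sum_weak(dx)) < 1e-12
```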
Regarding the first item, it turns out that the above iterated-sums expressions are reminiscent of quasisymmetric functions [MR95]. Consider the space $\mathbb{R}[[Y_1, Y_2, Y_3, \ldots]]$ of formal power series in ordered commuting variables $Y_1, Y_2, Y_3, \ldots$. By definition, a power series (of finite degree) $Q \in \mathbb{R}[[Y_1, Y_2, Y_3, \ldots]]$ is a quasisymmetric function if for all $n \ge 1$, all $i_1 < \cdots < i_n$, all $j_1 < \cdots < j_n$ and all $\alpha_1, \ldots, \alpha_n \ge 1$, the coefficient of the monomial $(Y_{i_1})^{\alpha_1} \cdots (Y_{i_n})^{\alpha_n}$ in Q coincides with that of $(Y_{j_1})^{\alpha_1} \cdots (Y_{j_n})^{\alpha_n}$. We see that the invariants given above follow from the evaluation of such quasisymmetric functions at $Y_1 \mapsto x_1 - x_0$, $Y_2 \mapsto x_2 - x_1$, ..., $Y_N \mapsto x_N - x_{N-1}$, and $Y_i \mapsto 0$ for $i \ge N + 1$.
Different linear bases for quasisymmetric functions are known. The basis of monomial quasisymmetric functions of [MR95] is indexed by compositions of integers. Anticipating the multidimensional case, we write a composition $(k_1, \ldots, k_n)$ as a word $[1^{k_1}] \cdots [1^{k_n}]$ and obtain a correspondence between compositions and monomial quasisymmetric functions. Quasisymmetric functions are a refinement of symmetric functions and form a commutative unital algebra. The product is just the polynomial product in the power series representation. It amounts to a so-called quasi-shuffle product (see Section 2) in the representation as compositions. The quasi-shuffle of two compositions can be computed by a case distinction on the relative order of the indexing variables of the corresponding iterated sums, which amounts to a summation-by-parts formula; the terms in which parts are merged reflect the fact that multiplying sums requires the inclusion of sums over diagonal terms.
It is natural to store the iterated-sums invariants of the discrete time series x as a linear map DS(x) on the quasi-shuffle algebra of compositions, by defining the pairing
$$ \langle [1^{k_1}] \cdots [1^{k_n}], \mathrm{DS}(x) \rangle := \sum_{i_1 < \cdots < i_n} (\Delta x_{i_1})^{k_1} \cdots (\Delta x_{i_n})^{k_n}. $$
Here $\Delta x_i := x_i - x_{i-1}$ for $1 \le i \le N$, and, as above, we extend x constantly, so that $\Delta x_i := 0$ for $i \ge N + 1$. From the correspondence between the product of power series and the quasi-shuffle product of compositions mentioned above we deduce that DS(x), which we call the iterated-sums signature, is an algebra morphism (from the quasi-shuffle algebra to the underlying base field F). Since compositions form a linear basis, this answers Question (A) above, in the case d = 1. We will come back to Question (B) in Section 4.
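The pairing above is easy to evaluate directly, and the morphism property can be sanity-checked on a small example. The sketch below (ours; d = 1 only) verifies the character identity $\langle (1), \mathrm{DS}(x)\rangle^2 = 2\langle (1,1), \mathrm{DS}(x)\rangle + \langle (2), \mathrm{DS}(x)\rangle$, the quasi-shuffle $(1) \star (1) = 2(1,1) + (2)$ in composition form:

```python
from itertools import combinations

def ds_pair(composition, x):
    """<[1^k1]...[1^kn], DS(x)> = sum over i1 < ... < in of dx_{i1}^k1 ... dx_{in}^kn."""
    dx = [b - a for a, b in zip(x, x[1:])]
    total = 0.0
    for idx in combinations(range(len(dx)), len(composition)):
        term = 1.0
        for i, k in zip(idx, composition):
            term *= dx[i] ** k
        total += term
    return total

x = [0.0, 2.0, 1.0, 4.0, 3.0]
lhs = ds_pair((1,), x) ** 2
rhs = 2.0 * ds_pair((1, 1), x) + ds_pair((2,), x)
assert abs(lhs - rhs) < 1e-12
```

The identity is just $(\sum_i a_i)^2 = 2\sum_{i<j} a_i a_j + \sum_i a_i^2$ with $a_i = \Delta x_i$, i.e., the summation-by-parts formula mentioned above.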
The commutative algebra of quasisymmetric functions is the free quasi-shuffle algebra over one generator and it is, as we just saw, the correct framework for storing iterated sums of a one-dimensional time series. The appropriate generalisation of this algebra to arbitrary dimension d ≥ 1, that is, the free quasi-shuffle algebra over d generators, was carried out by Hoffman [Hof00].
The aforementioned amounts to saying that the iterated-sums signature DS(x) is an element of the dual space of the quasi-shuffle algebra over d generators. It can therefore be represented as an infinite word series with iterated sums of the time series x as coefficients. Its compatibility with the quasi-shuffle product, together with the fact that the latter can be seen as a deformation of the classical shuffle product [PFT16], suggests to consider DS(x) as a discrete analogue of Chen's iterated-integrals signature of continuous curves [Che57, Ree58]. The latter plays an important role in the theory of controlled ordinary differential equations (ODEs), stochastic analysis and Lyons' theory of rough paths [FV10, Lyo98]. Such a large spectrum of applications reflects the important property of iterated integrals to provide, in some sense, a complete representation of a curve, so that arbitrary functionals on curves should be well approximated by functions of its signature. There is a caveat though. Iterated integrals are tailor-made to approximate functionals that stem from controlled ODEs. But, as is quickly realised, this does not mean that the iterated-integrals signature is an optimal representation for other input-output systems. For example, since a controlled ODE, and hence also the signature, cannot see tree-like excursions, the iterated-integrals signature of a one-dimensional path reveals nothing about the path except for its increment. These limitations of the iterated-integrals signature with respect to tree-like paths prompt us to propose instead the "discrete time signature" DS(x), which gathers iterated sums instead of iterated integrals.
The paper is organised as follows. Section 2 recalls the notion of quasi-shuffle Hopf algebra and quasisymmetric functions. In Section 3 we introduce the iterated-sums signature and show its character property with respect to the quasi-shuffle Hopf algebra. Moreover, we show that Chen's property is satisfied, but that Chow's Theorem does not hold. Hence, while mirroring the setup of Chen's iterated-integrals signature to some extent, interesting differences emerge. It turns out that our description of the iterated-sums signature is nicely related to the work [NT10] on a "multidimensional" generalisation of quasisymmetric functions, and we dwell on this briefly in Remark 3.5. In Section 4 we show that the iterated-sums signature contains (almost) all time-warping invariants. In Section 5 we use a specific Hopf algebra isomorphism, known as Hoffman's exponential, to relate the iterated-sums signature to Chen's iterated-integrals signature (of an infinite-dimensional path). This includes in particular relating the continuous and discrete area operations.
We recall the inductive definition of the quasi-shuffle product following Hoffman [Hof00]. See also [BCEF18, EMPW15]. Our starting point is the alphabet A = {1, 2, ..., d}, which we augment to a free commutative semigroup by defining a commutative product denoted by square brackets, [−−] : A × A → A. For example, the product of the letters 1, 2 ∈ A is written [12] = [21]. Any iteration of the product in A can be simplified to an expression containing a single pair of brackets, that is, $[[a_1 a_2] a_3] = [a_1 [a_2 a_3]] =: [a_1 a_2 a_3]$. Elements in the tensor algebra T(A) over (the vector space spanned by) A are denoted by words, i.e., we denote the tensor product by concatenation, or juxtaposition, of basis elements. The neutral element for this product is the empty word, denoted by e. The augmentation ideal is defined by $T_+(A) := \bigoplus_{n>0} A^{\otimes n}$, such that $T(A) = \mathbb{F}e \oplus T_+(A)$.
The commutative quasi-shuffle product $\star : T(A) \otimes T(A) \to T(A)$ is introduced by inductively defining $e \star u := u =: u \star e$ for all $u \in T(A)$, and
$$ ua \star vb := (u \star vb)a + (ua \star v)b + (u \star v)[ab] \tag{2} $$
for $u, v \in T(A)$ and $a, b \in A$. For example, $2 \star 3 = 23 + 32 + [23]$. The tensor algebra is naturally graded by the length of words, $\ell(w)$. However, in light of the new product (2), which is not homogeneous with respect to the number of letters, we introduce the weight grading on T(A), denoted $|\cdot|$, by declaring that |e| = 0, |a| = 1 for all a ∈ A and |[pq]| = |p| + |q| for all p, q ∈ A.
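The recursion (2) translates directly into code. In the sketch below (ours), a word is a tuple of letters and a letter is a sorted tuple of generators, so that the bracket [23] is represented by `(2, 3)`; the three summands of (2) are the three recursive calls.

```python
from collections import Counter

def bracket(a, b):
    """Commutative semigroup product [ab] on letters."""
    return tuple(sorted(a + b))

def qsh(u, v):
    """Quasi-shuffle of two words, as a multiset of words (recursion (2))."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    us, a = u[:-1], u[-1]
    vs, b = v[:-1], v[-1]
    out = Counter()
    for w, c in qsh(us, v).items():   # (u * vb) a
        out[w + (a,)] += c
    for w, c in qsh(u, vs).items():   # (ua * v) b
        out[w + (b,)] += c
    for w, c in qsh(us, vs).items():  # (u * v) [ab]
        out[w + (bracket(a, b),)] += c
    return out

# the example 2 * 3 = 23 + 32 + [23]:
assert qsh(((2,),), ((3,),)) == Counter({((2,), (3,)): 1,
                                         ((3,), (2,)): 1,
                                         ((2, 3),): 1})
```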
Here C(n) is the set of all compositions of the integer n, i.e., tuples $(i_1, \ldots, i_p)$ of positive integers such that $i_1 + \cdots + i_p = n$. Here (as well as later) we use the convention that [a] := a for all a ∈ A.
Remark 2.2 (Shuffle Hopf algebra). If the semigroup product is trivial, i.e., if [ij] = 0 for any letters i, j ∈ A, then the quasi-shuffle product (2) reduces to Chen's commutative shuffle product on T(A):
$$ ua \,⧢\, vb := (u \,⧢\, vb)a + (ua \,⧢\, v)b $$
for u, v ∈ T(A) and a, b ∈ A. Observe that in this case $|w| = \ell(w)$ for any word, and $(T(A), ⧢, \delta)$ is the classical shuffle Hopf algebra over the alphabet A. From (5) it follows that the antipode on $H_{⧢}$ is given by $\alpha(w_1 \cdots w_n) = (-1)^n\, w_n \cdots w_1$.

Remark 2.3 (A remark on dimensions). There is a simple way of computing the Hilbert series of T(A), where $T(A)_n := \mathbb{F}\{w : |w| = n\}$ is the homogeneous (for the weight grading) component of degree n of the quasi-shuffle algebra. It is not hard to see that all such words consist of blocks of bracketed letters, one block of size $i_j$ for each part of some composition $I = (i_1, \ldots, i_p) \in C(n)$, with letters $a_1, \ldots, a_n \in A$, in the notation of Theorem 2.1. In each block of size $i_j$ we are allowed to put a symmetric monomial of length $i_j$, of which there are exactly $\binom{d-1+i_j}{i_j}$ — this is the dimension of the degree-$i_j$ part of the symmetric algebra S(A). Therefore
$$ \dim T(A)_n = \sum_{(i_1,\ldots,i_p) \in C(n)} \prod_{j=1}^{p} \binom{d-1+i_j}{i_j}. $$
A simple computation expresses the resulting generating series in terms of the Pochhammer symbol (rising factorial), whose exponential generating function is a hypergeometric function.

Define the scalar product $\langle -, - \rangle : T(A) \otimes T(A) \to \mathbb{F}$ by $\langle u, v \rangle := 1$ if u = v and zero else. It permits to identify the graded dual of $H_{qsh}$ with word series, i.e., $c = \sum_{w \in T(A)} \langle w, c \rangle\, w \in T((A)) =: H^*_{qsh}$, which is a noncommutative (topological) Hopf algebra with concatenation as convolution product and de-quasi-shuffling as coproduct [Hof00]. In more concrete terms, this means that given two such series $c, c' \in T((A))$, their convolution product $c * c' := m_{\mathbb{F}}(c \otimes c')\delta : H_{qsh} \to \mathbb{F}$ is again a word series. Of particular interest are characters, i.e., algebra morphisms $c \in H^*_{qsh}$. They satisfy $\langle e, c \rangle = 1$ and $\langle u \star v, c \rangle = \langle u, c \rangle \langle v, c \rangle$ for $u, v \in H_{qsh}$. The first property fixes the coefficient of the empty word, and the second is equivalent to c being group-like in $H^*_{qsh}$ with respect to the de-quasi-shuffling coproduct. The set of characters, denoted by G, forms a group with inverse $c^{-1} = c \circ \alpha$. The corresponding Lie algebra, $g \subset H^*_{qsh}$, consists of so-called infinitesimal characters, which map the empty word and any non-trivial product in $H_{qsh}$ to zero. One can define the exponential map as a power series with respect to the convolution product, which maps g bijectively to G, i.e., $\exp^*(f) := \varepsilon + \sum_{j>0} \frac{1}{j!} f^{*j} \in G$. Because T(A) is a graded connected Hopf algebra, this expression becomes a finite sum when evaluated on homogeneous elements of T(A), so we do not have to deal with convergence issues. Its inverse is the logarithm, $\log^*(c) := \sum_{j>0} \frac{(-1)^{j+1}}{j} (c - \varepsilon)^{*j}$. Again, the sum applied to any word $w \in T(A)$ terminates after $|w|$ terms, as $(c - \varepsilon)(e) = 0$.

Notation 2.4. We introduce a particular notation for words in T(A), which will be useful in the sequel. The convention to identify [a] := a, for a ∈ A, permits to write any word in T(A) as a concatenation of brackets, i.e., $w = [u_{11} \cdots u_{1k_1}][u_{21} \cdots u_{2k_2}] \cdots [u_{n1} \cdots u_{nk_n}]$ with letters $u_{ij} \in A$.

We come back to the setting of the introductory section with only a single letter, A = {1}. Then, in each degree n, T(A) has a single word of length one, $[1^n] \in A$, and any basis element (or word) is of the form $w = [1^{k_1}] \cdots [1^{k_n}]$. It is easy to see that the tuple $(k_1, \ldots, k_n)$ is then a composition of the integer |w| of length $n = \ell(w)$. In [Hof00] Hoffman describes a unital algebra isomorphism Σ between the quasi-shuffle algebra $H_{qsh}$, for A = {1}, and the algebra QSym of quasisymmetric functions in the ordered set of commuting variables $\{Y_i\}_{i \in \mathbb{N}_+}$ [Ges84], defined by taking a word in T(A) to an iterated sum:
$$ \Sigma([1^{k_1}] \cdots [1^{k_n}]) := \sum_{i_1 < \cdots < i_n} (Y_{i_1})^{k_1} \cdots (Y_{i_n})^{k_n} = M_{(k_1,\ldots,k_n)}. \tag{6} $$
Here $\Sigma(e) = M_0 = 1$. The correspondence of the introduction is then explicit; the product of two such iterated sums expands again as a linear combination of iterated sums, an instance of summation by parts.
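The isomorphism (6) can be probed numerically by evaluating monomial quasisymmetric functions at finitely many variables. The sketch below (ours) checks that the polynomial product of two monomial quasisymmetric functions expands as their quasi-shuffle, here $M_{(1)} \cdot M_{(2)} = M_{(1,2)} + M_{(2,1)} + M_{(3)}$:

```python
from itertools import combinations

def M(composition, Y):
    """Monomial quasisymmetric function M_(k1,...,kn) evaluated at
    (Y_1, ..., Y_N, 0, 0, ...), i.e., sum over i1 < ... < in of Y_{i1}^k1 ... Y_{in}^kn."""
    total = 0.0
    for idx in combinations(range(len(Y)), len(composition)):
        term = 1.0
        for i, k in zip(idx, composition):
            term *= Y[i] ** k
        total += term
    return total

Y = [0.5, -1.0, 2.0, 0.25]
# polynomial product = quasi-shuffle of compositions:
lhs = M((1,), Y) * M((2,), Y)
rhs = M((1, 2), Y) + M((2, 1), Y) + M((3,), Y)
assert abs(lhs - rhs) < 1e-12
```

The summand $M_{(3)}$ is the diagonal term ($i = j$ in $\sum_i Y_i \sum_j Y_j^2$), exactly the bracket term of the quasi-shuffle product (2).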
The $M_{(k_1,\ldots,k_n)}$ of (6) are the monomial quasisymmetric functions, which form a basis for QSym. The Hopf algebra QSym is a generalisation of the classical Hopf algebra Sym of symmetric functions. It was defined and studied by Gessel [Ges84], based on earlier work by Stanley, and plays a rather distinguished role in modern algebraic combinatorics, with ramifications into several other fields of mathematics. Its graded dual is the connected graded cocommutative Hopf algebra NSym of noncommutative symmetric functions. The iterated-sums signature corresponding to a one-dimensional discrete time series, alluded to in the first section, is an element of NSym. Further below, in Section 3, we consider the multidimensional generalisation of quasisymmetric functions (of level d in the terminology of [NT10]) and its corresponding iterated-sums signature. We close this section by mentioning that Malvenuto and Reutenauer's Hopf algebra of permutations [MR95] plays an important part in understanding the relation between Sym, QSym and NSym. The interested reader is referred to [AS05, ABS06] and to [LMvW13] for a readable introduction, including a brief historical overview.

Half-shuffles
Aiming at understanding the discrete analogue of the area operation (to be introduced further below), we take a more refined look at the quasi-shuffle product by observing that ⋆ may be split into three products, left and right half-shuffles and a third product:
$$ ua \prec vb := (u \star vb)a, \qquad ua \succ vb := (ua \star v)b, \qquad ua \cdot vb := (u \star v)[ab], \tag{7} $$
so that $ua \star vb = ua \prec vb + ua \succ vb + ua \cdot vb$. Noticing the relations $ua \succ vb = vb \prec ua$ and $ua \cdot vb = vb \cdot ua$, which together are equivalent to ⋆ being commutative, it is not hard to show that the quasi-shuffle algebra $H_{qsh} = (T(A), \star)$ becomes a commutative tridendriform algebra, $(T(A), \prec, \succ, \cdot)$, as defined by Loday and Ronco [LR04].
Remark 2.5. A similar splitting holds for the shuffle algebra of Remark 2.2. We can write the shuffle product ⧢ on T(A) as a sum of two half-shuffles, $ua \prec vb := (u \,⧢\, vb)a$ and $ua \succ vb := (ua \,⧢\, v)b$, so that $ua \,⧢\, vb = ua \prec vb + ua \succ vb$. Again, we check quickly that the commutativity of the shuffle product is equivalent to $ua \succ vb = vb \prec ua$. In fact, the triple $(T(A), \prec, \succ)$ is also known as a commutative dendriform or Zinbiel algebra.

Hoffman's exponential
Shuffle and quasi-shuffle Hopf algebras are more tightly related than Remark 2.2 may suggest. Indeed, Hoffman proved in [Hof00] that $H_{⧢} = (T(A), ⧢, \delta)$ and $H_{qsh} = (T(A), \star, \delta)$ are isomorphic as Hopf algebras. We briefly recall this result. Let T(A) be equipped with the commutative shuffle product ⧢. Hoffman's exponential is the map $\Phi_H : T(A) \to T(A)$ given on a word w by
$$ \Phi_H(w) := \sum_{I = (i_1,\ldots,i_p) \in C(\ell(w))} \frac{1}{i_1! \cdots i_p!}\, I[w], $$
where $I[w]$ denotes the word obtained from w by merging (via the semigroup bracket) consecutive groups of letters of sizes $i_1, \ldots, i_p$. Its inverse also admits an explicit expression, namely the Hoffman logarithm, which has the same form with coefficients $\frac{(-1)^{\ell(w)-p}}{i_1 \cdots i_p}$.
In the second example, the terms correspond to the compositions (1, 1, 1), (2, 1), (1, 2) and (3) of the integer 3, in that order. Recall that the particular Notation 2.4, writing words as concatenations of brackets of letters $u_1, \ldots, u_k \in A$, is in place. Also, note that the number of letters in each of the terms corresponds to the length of the composition. The reader is referred to [Hof00, HI17] for more details. See also [EMPW15] for an application in stochastic analysis.
In Section 5 we will show that $\Phi_H$ is nicely compatible with comparing the iterated-sums signature on one side with the iterated-integrals signature on the other. The following two lemmas will be used below, where we address the area operation in the context of the iterated-sums signature.
Lemma 2.7. The image of any nonempty word $w = w_1 \cdots w_n \in T(A)$ under Hoffman's isomorphism can be split into two parts: a leading term in which the last letter stays separate, and a remainder term in which the last letter is merged into the preceding bracket. The verification of the lemma is left to the reader. This splitting of Hoffman's isomorphism implies the following important result.
Lemma 2.8. Let u ∈ T(A) and a, b ∈ A. Then the identity (12) holds.

Proof. From Lemma 2.7 and linearity of $\Phi_H$, we deduce the corresponding splittings for the words ending in a and in b. Since the semigroup A is commutative, for any composition $I = (i_1, \ldots, i_p) \in C(n)$ with $i_p \ge 2$, the terms merging the final letters coincide for both orders of a and b. Therefore the two remainder terms agree, which implies the identity (12).

Iterated-sums signatures
We consider a discrete time series $x \in (\mathbb{F}^d)^{N+1}$ as an element of the space of infinite time series that are eventually constant, by extending it constantly.
In this section we will see that the appropriate algebraic setting for iterated-sums, combined into the map DS(x), is that of a character on the quasi-shuffle Hopf algebra H qsh = (T (A), , δ, ε, | • |) over the semigroup A corresponding to the alphabet A = {1, 2, . . ., d}, introduced in Section 2.
The following notation for entries of the time series x is put in place: $x_n = (x^1_n, \ldots, x^d_n) \in \mathbb{F}^d$. Next we define the corresponding time series $\Delta x = ((\Delta x)_1, (\Delta x)_2, \ldots, (\Delta x)_N)$ with increments $(\Delta x)_n := x_n - x_{n-1} \in \mathbb{F}^d$, for n ≥ 1, as entries. The new notation is extended to include all brackets in A by defining $\Delta x^{[a_1 \cdots a_k]}_n := \Delta x^{a_1}_n \cdots \Delta x^{a_k}_n$.

Definition 3.1. The iterated-sums signature of the time series x is the two-parameter family $(\mathrm{DS}(x)_{n,m} \mid 0 \le n \le m \in \mathbb{N})$ of linear maps from T(A) to F such that $\mathrm{DS}(x)_{n,n} = \varepsilon$, defined recursively by $\langle e, \mathrm{DS}(x)_{n,m} \rangle := 1$ and
$$ \langle wa, \mathrm{DS}(x)_{n,m} \rangle := \sum_{j=n+1}^{m} \langle w, \mathrm{DS}(x)_{n,j-1} \rangle\, \Delta x^{a}_j $$
for $w \in T(A)$ and $a \in A$. Hence, the iterated-sums signature is a word series in T((A)) with iterated sums over increments of x as coefficients:
$$ \langle [u_1][u_2] \cdots [u_k], \mathrm{DS}(x)_{n,m} \rangle = \sum_{n < i_1 < \cdots < i_k \le m} \Delta x^{[u_1]}_{i_1} \cdots \Delta x^{[u_k]}_{i_k}. $$
We extend this definition to all n, m ∈ N by setting $\langle w, \mathrm{DS}(x)_{n,m} \rangle = 0$ whenever m < n.
Remark 3.2. An easy consequence of this definition is that the coefficient $\langle w, \mathrm{DS}(x)_{n,m} \rangle$ vanishes whenever $\ell(w) > m - n$.
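The recursion in Definition 3.1 is short to implement. In the following sketch (ours), a letter of A is encoded as a tuple of 1-based coordinates, so the bracket [12] is `(1, 2)` and acts on an increment as the product of the corresponding coordinates; the asserts illustrate Remark 3.2.

```python
def ds(word, x, n, m):
    """<word, DS(x)_{n,m}> by the recursion of Definition 3.1.
    word: tuple of letters; each letter is a tuple of 1-based coordinates."""
    if not word:
        return 1.0
    w, a = word[:-1], word[-1]
    total = 0.0
    for j in range(n + 1, m + 1):
        dxj = 1.0
        for coord in a:  # bracket letter acts as a product of increment coordinates
            dxj *= x[j][coord - 1] - x[j - 1][coord - 1]
        total += ds(w, x, n, j - 1) * dxj
    return total

x = [(0.0, 1.0), (1.0, 0.5), (3.0, 2.0), (2.0, 2.0)]  # d = 2, N = 3
# Remark 3.2: the coefficient vanishes when the word is longer than m - n
assert ds(((1,), (2,)), x, 1, 2) == 0.0
assert ds(((1,), (2,), (1,)), x, 0, 2) == 0.0
# a single letter just sums increments of that coordinate
assert abs(ds(((1,),), x, 0, 3) - (x[3][0] - x[0][0])) < 1e-12
```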
The proof of the following lemma is straightforward.

Lemma 3.3. Let $x = (x_n)_{n \ge 0}$ and $x' = (x'_n)_{n \ge 0}$ be two time series, and denote by $xx' := (x_n x'_n)_{n \ge 0}$ their pointwise product. Then the increment of the product xx' is given by a generalised Leibniz rule:
$$ \Delta(xx')_n = \Delta x_n\, x'_{n-1} + x_{n-1}\, \Delta x'_n + \Delta x_n\, \Delta x'_n. $$
More importantly, we have the following.

Theorem 3.4. 1. (Quasi-shuffle identity) For each n ≤ m, the map $\mathrm{DS}(x)_{n,m} : H_{qsh} \to \mathbb{F}$ is a quasi-shuffle Hopf algebra character.
2. (Chen's property) For any three n < n′ < n′′ ∈ N we have $\mathrm{DS}(x)_{n,n'} * \mathrm{DS}(x)_{n',n''} = \mathrm{DS}(x)_{n,n''}$.

Remark 3.5. 1. Observe that point 1 in Theorem 3.4 amounts to a generalisation of the algebra isomorphism defined in (6) to the multidimensional case, i.e., for an alphabet A = {1, ..., d}. Indeed, defining the analogous map on words $[u_1] \cdots [u_n]$ with $u_1, \ldots, u_n \in A$, we obtain a quasi-shuffle algebra isomorphism onto the algebra of quasisymmetric functions of level d, as introduced by Novelli and Thibon in [NT10].

2. Specialising to F = R, Theorem 3.4 matches the corresponding result for the iterated-integrals signature S(X) of a curve of bounded variation in $\mathbb{R}^B$, where B is a (possibly countable) alphabet. Here, the underlying Hopf algebra is the shuffle Hopf algebra, and the two properties read: 1. (Shuffle identity) $S(X)_{s,t}$ is a shuffle Hopf algebra character; 2. (Chen's property) for s < u < t, $S(X)_{s,u}\, S(X)_{u,t} = S(X)_{s,t}$.
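Chen's property can be verified numerically from the explicit iterated-sum formula alone, since an increasing index tuple over (n, n′′] splits uniquely at any intermediate time n′. The sketch below (ours; d = 1, words encoded as compositions) checks the deconcatenation form of the identity:

```python
from itertools import combinations

def ds(comp, dx, n, m):
    """<[1^k1]...[1^kn], DS(x)_{n,m}> for a scalar series with increments dx."""
    total = 0.0
    for idx in combinations(range(n + 1, m + 1), len(comp)):
        term = 1.0
        for i, k in zip(idx, comp):
            term *= dx[i - 1] ** k  # dx[i-1] is the increment at time i
        total += term
    return total

dx = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]
w = (1, 2, 1)
lhs = ds(w, dx, 0, 6)
# Chen's property: sum over all splittings w = uv of <u, DS_{0,3}> <v, DS_{3,6}>
rhs = sum(ds(w[:i], dx, 0, 3) * ds(w[i:], dx, 3, 6) for i in range(len(w) + 1))
assert abs(lhs - rhs) < 1e-9
```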
Before proving Theorem 3.4 we need the following abstract result, which is a particular case of the setting presented in [NT10, Section 5.1].
We refrain from further elaborating on this lemma, and instead refer to [NT10] for details about multivariable generating series. Note, however, that after evaluating $\sigma(Y)$ at $Y^{[a]}_j = \Delta x^{[a]}_j$ we obtain DS(x), and the factorisation (16) then takes place in the convolution algebra (T((A)), ∗). We further remark that the expansion of the geometric series on the right-hand side of the first equality in (16) takes place in A, which explains the summation over A in the second equality.
Proof (of Theorem 3.4). 1. We need to show that for words $w, w' \in T(A)$,
$$ \langle w \star w', \mathrm{DS}(x)_{n,m} \rangle = \langle w, \mathrm{DS}(x)_{n,m} \rangle\, \langle w', \mathrm{DS}(x)_{n,m} \rangle. $$
We use the recursive definition of the quasi-shuffle product (2) and induction on $\ell(w) + \ell(w')$, the base case (i.e., w = e or w' = e) being trivial. For $u, v \in T(A)$ and $a, b \in A$, one introduces auxiliary time series whose entries are the partial coefficients of ua and vb up to time n + k, for 0 ≤ k ≤ m − n, and zero else, and observes that their increments reproduce the recursion of Definition 3.1. By the induction hypothesis, the analogous identities hold for the three words appearing on the right-hand side of (2). Finally, we sum these relations using Lemma 3.3 to conclude.

2. The proof of Chen's property can be pursued in a pedestrian way. However, it also follows from Lemma 3.6; indeed, we may split the product in the factorisation (16) at the intermediate time.

The desired identity follows upon evaluation at $Y^{[a]}_j = \Delta x^{[a]}_j$ as in the previous remark.
We note that the iterated-sums signature $\mathrm{DS}(x)_{n,m}$ introduced in this work is similar to the discrete Chen–Fliess series defined and studied in [GDEEF17] in the context of nonlinear control theory. This section is closed with an intriguing observation. Up to this point it may seem that iterated-sums signatures, $\mathrm{DS}(x)_{n,m}$, and Chen's signatures, $S(X)_{s,t}$ (see Remark 3.5), behave in the same way, but as the next example shows this is not at all the case. Recall that $\mathrm{End}_{\mathbb{F}}(H_{qsh})$, the space of linear maps on $H_{qsh}$, together with the convolution product $\psi * \gamma := m(\psi \otimes \gamma)\delta$, is a noncommutative algebra with unit $\iota := \eta \circ \varepsilon$, where $\eta : \mathbb{F} \to H_{qsh}$ is the unit map, $\eta(\lambda) := \lambda e$. Define the map e as in (17), where $J := \mathrm{id} - \iota \in \mathrm{End}_{\mathbb{F}}(H_{qsh})$ is the projection onto the augmentation ideal $T_+(A)$. It is the adjoint of the classical Eulerian Lie idempotent [Reu93], that is, the concatenation logarithm of the identity map, $\log^*(\mathrm{id})$. Observe that the sum (17) terminates when evaluated on homogeneous elements, since J(e) = 0; thus it is well defined for arbitrary elements of T(A). Then, for any character $c \in T((A))$ and word $u \in T(A)$ we have that $\langle u, \log^*(c) \rangle = \langle e(u), c \rangle$, where $\log^*(c) \in g$. Indeed, this follows by expanding the definitions, using in one step that c is a character and in another the reduced coproduct.

Now, if x is an arbitrary time series, its iterated-sums signature satisfies the corresponding identity coefficientwise. Therefore, the image under the logarithm of iterated-sums signatures only reaches a certain subset of the Lie algebra of infinitesimal characters on $H_{qsh}$. This is in contrast to Chen's iterated-integrals signature, for which Chow's Theorem [FV10, Theorem 7.28] holds, showing that any character over the shuffle Hopf algebra may be realised as the Chen signature of a piecewise linear path. The implications of this observation will be studied in a forthcoming paper. Still, the following positive statement on the linear span of iterated-sums signatures holds.
Lemma 3.7. For every n ≥ 1, $\mathrm{span}_{\mathbb{F}}\{\mathrm{proj}_{\le n}\, \mathrm{DS}(x) : x \in (\mathbb{F}^d)^{\mathbb{N}}_c\}$ is the full truncated dual space.

Remark 3.8. The corresponding result for iterated-integrals signatures was shown in [DR18, Lemma 3.4]; it is sometimes useful for proving statements about the underlying algebra that are easily verified when tested against signatures.
Proof. Fix n ≥ 1 and let $P_1, \ldots, P_L$ be the quasisymmetric monomial functions of degree smaller than or equal to n, in some order. By [NT10, Section 5.1] they are independent as elements of the space of formal power series. This implies that, evaluating at $Y_n = (y_1, \ldots, y_n, 0, 0, \ldots)$, the expressions $P_1(Y_n), \ldots, P_L(Y_n)$ are independent as elements of $\mathbb{F}[y_1, \ldots, y_n]$. Considering realisations $Y^{(\ell)}$, the independence of the $P_i$ implies independence of the rows of the matrix $(P_i(Y^{(\ell)}))_{i,\ell}$. A fortiori, the columns must also be independent for some choice of realisations $Y^{(\ell)}$. This finally implies that we can find time series $x^{(1)}, \ldots, x^{(L)}$ such that the vectors $(P_i(\Delta x^{(\ell)}))_i$, $\ell = 1, \ldots, L$, are independent. Here, as before, we extend each $x^{(\ell)}$ constantly to an eventually constant series.

Invariants
In the previous section, we defined the iterated-sums signature, following the introduction. We now return to our original motivation, and first put the concept of "time warping" in a precise mathematical framework. For each index n ≥ 0 we define an operator acting on sequences by repeating once the value at time n. More precisely, given a time series x, we define $\tau_n(x)$ as the time series given by
$$ \tau_n(x)_m := x_m \ \text{ for } m \le n, \qquad \tau_n(x)_m := x_{m-1} \ \text{ for } m > n. $$
Observe that with this definition we have $\tau_n(x)_n = \tau_n(x)_{n+1} = x_n$, and the rest of the values are unchanged save for a time shift after time n.
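The operator $\tau_n$ is one line of code, which makes the invariance claim directly testable: every iterated-sum coefficient is unchanged under every $\tau_n$. A sketch (ours; d = 1, words as compositions):

```python
from itertools import combinations

def tau(n, x):
    """Repeat once the value at time n: tau_n(x)_n = tau_n(x)_{n+1} = x_n."""
    return x[:n + 1] + x[n:]

def ds(comp, x):
    """<[1^k1]...[1^kn], DS(x)> as an iterated sum over increments of x."""
    dx = [b - a for a, b in zip(x, x[1:])]
    total = 0.0
    for idx in combinations(range(len(dx)), len(comp)):
        term = 1.0
        for i, k in zip(idx, comp):
            term *= dx[i] ** k
        total += term
    return total

x = [0.0, 1.0, 3.0, 2.0, 5.0]
for n in range(len(x)):
    for comp in [(1,), (2,), (1, 1), (2, 1)]:
        assert abs(ds(comp, x) - ds(comp, tau(n, x))) < 1e-12
```

The reason is visible in the code: $\tau_n$ inserts a single zero increment, and zero factors never contribute to any summand.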
Definition 4.1. We call a functional $F : (\mathbb{F}^d)^{\mathbb{N}}_c \to \mathbb{F}$ invariant to time warping if $F(\tau_n(x)) = F(x)$ for all n ≥ 0 and all time series x.

In applications to data analysis, such as, e.g., moment corrections, we are mostly interested in polynomial invariants, i.e., invariant functionals that can be expressed by considering only polynomial expressions in a time series.

Definition 4.2. We call $F : (\mathbb{F}^d)^{\mathbb{N}}_c \to \mathbb{F}$ polynomial if, for all N ≥ 1, $F(x_0, \ldots, x_N, 0, 0, \ldots)$ is a polynomial in the $x_i$, and the polynomial degree is uniformly bounded in N.
From the factorisation (16) in Lemma 3.6 it follows that for any word w ∈ T (A) the coefficient w, DS(x) is a polynomial invariant in this sense.It turns out these are all the polynomial invariants, if we additionally demand invariance with respect to space translation of the entire series.
Lemma 4.3. Let F be polynomial and invariant to both time warping and space translations. Then F is realised as a quasisymmetric function.

Hence, both sides coincide as polynomials, so that the corresponding coefficients must coincide. This finishes the proof.

Hoffman's isomorphism and signatures
In this section we relate the iterated-sums signature of a time series with the usual iterated-integrals signature of the piecewise linear interpolation of an associated infinite dimensional time series.
Starting again with the extended alphabet A, we build the tensor algebra T(A) and define the shuffle product ⧢ as in Remark 2.2. Recall Hoffman's isomorphism [Hof00] from Theorem 2.6, which shows that $H_{⧢} = (T(A), ⧢, \delta)$ and $H_{qsh} = (T(A), \star, \delta)$ are isomorphic as Hopf algebras. Next we compute explicitly the image under the iterated-integrals signature S of a linear path.
The following lemma is an immediate extension of [FV10,Example 7.21] to a countable index set.
Lemma 5.1. Consider a countable set B and let $z_t = z_0 + at$ for some $z_0, a \in \mathbb{R}^B$ and all t ∈ [0, 1].

At the level of the tensor algebra this simply means that $S(z)_{s,t} = \exp((t - s)a)$, the exponential taken with respect to the concatenation product. An analogue of this result holds for discrete signatures, which follows from Lemma 3.6, i.e., Chen's property.
Remark 5.4. We note that the iterated-integrals signature of the d-dimensional path consisting of the piecewise linear interpolation of x is not enough to obtain DS(x).
Instead, the theorem shows that the iterated-integrals signature of the piecewise linear interpolation of the infinite dimensional time series (18) is necessary.
Proof. First, we note that since $\Phi_H$ is a Hopf algebra map, the statement is equivalent to showing that $\Phi_H^*(\mathrm{DS}(x)) = S(X)$, where $\Phi_H^*$ is the adjoint of Hoffman's isomorphism. Next, using Chen's property, we observe that we can reduce the argument to the case where x has only one non-zero increment, or, equivalently, where X is a straight line.
Indeed, suppose that for each i = 1, ..., p the letter $a_i \in A$ is a bracket of letters from A; in other words, $K_m$ is the number of times the letter m ∈ {1, ..., d} appears in w. But the only term in $\Phi_H(w)$ containing a single letter is the full "contraction". By Lemma 5.2 this last expression also equals $\langle \Phi_H(w), \mathrm{DS}(x)_{j-1,j} \rangle$. Therefore, we have shown the claim for a single time step.
Finally, since $\Phi_H^*$ is an algebra morphism, the identity extends over consecutive time steps, so the result is valid for the full signature.

The area operation
It is well known that for the iterated-integrals signature certain linear combinations of the entries have a precise geometric interpretation. Indeed, for any pair of letters i ≠ j, the coefficient $\langle ij - ji, S(X)_{s,t} \rangle$ represents (two times) the signed area (or Lévy area) between the curve $u \mapsto (X^i_u, X^j_u)$, for u ∈ [s, t], and the chord between the points $(X^i_s, X^j_s)$ and $(X^i_t, X^j_t)$. We abstract this operation to the shuffle algebra by using the notion of half-shuffles introduced in Subsection 2.2.1. In fact, one verifies that at this level the area operation may be represented in terms of half-shuffle operations as $ij - ji = i \succ j - j \succ i =: \mathrm{area}(i, j)$, so that in particular $\mathrm{Area}(X^i, X^j)_{s,t} = \langle \mathrm{area}(i, j), S(X)_{s,t} \rangle$.
Definition 5.5. The area map $\mathrm{area} : T(A) \times T(A) \to T(A)$ is the antisymmetrisation of a half-shuffle, $\mathrm{area}(u, v) := u \succ v - v \succ u$.

Definition 5.6 (Discrete area). The discrete area map is defined analogously in terms of the first half-shuffle product in (7). The two areas differ in general, as the quasi-shuffle half-shuffles produce additional bracket (diagonal) terms. Both area and the discrete area can be iterated. We now make this precise: define $D_1 = \bar D_1 := \mathbb{F}A$, the vector space spanned by the set A. Then, inductively define vector spaces $D_{n+1} := \mathrm{area}(D_1, D_n)$ and $\bar D_{n+1} := \overline{\mathrm{area}}(D_1, \bar D_n)$, and finally set $D := \bigoplus_n D_n$. Neither the area nor the discrete area operation is associative. One can show, however, that area satisfies a fourth-order relation, known as tortkara, introduced by Dzhumadil'daev in the 2007 paper [Dzh07]. In [DIM18] the image of iterated applications of the area map is characterised.
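For two letters, the coefficient $\langle ij - ji, \mathrm{DS}(x) \rangle$ can be computed directly as the antisymmetrised iterated sum $\sum_{i_1 < i_2} (\Delta x^i_{i_1} \Delta x^j_{i_2} - \Delta x^j_{i_1} \Delta x^i_{i_2})$; the bracket (diagonal) contributions cancel under antisymmetrisation. A sketch of this elementary computation (ours):

```python
def discrete_area(xi, xj):
    """<ij - ji, DS(x)> = sum over i1 < i2 of
    dx^i_{i1} dx^j_{i2} - dx^j_{i1} dx^i_{i2}, via running prefix sums."""
    dxi = [b - a for a, b in zip(xi, xi[1:])]
    dxj = [b - a for a, b in zip(xj, xj[1:])]
    total, pi, pj = 0.0, 0.0, 0.0
    for di, dj in zip(dxi, dxj):
        total += pi * dj - pj * di
        pi += di
        pj += dj
    return total

xi = [0.0, 1.0, 1.0]
xj = [0.0, 0.0, 1.0]
assert discrete_area(xi, xj) == 1.0                     # the coordinates move in turn
assert discrete_area(xi, xj) == -discrete_area(xj, xi)  # antisymmetry
```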
Theorem 5.7 ([DIM18, Theorem 2.1]). The space D is spanned by iterated applications of the area map to letters. From Lemma 2.7 and Lemma 2.8 we deduce the following morphism property of Hoffman's isomorphism with respect to the two area operations.
Remark 5.9. Note that $\Phi_H$ is not a (quasi-)half-shuffle morphism. Only the antisymmetrisation to the area, respectively the discrete area, is nicely compatible with it.
Proof. By Dzhumadil'daev's theorem (Theorem 5.7) it suffices to prove the claim for iterated areas of letters. We first observe that in this case the area operation can be written more explicitly, and each of its terms can be further expanded into three terms, giving twelve terms in total. For each of these terms one finds exactly one other term such that their sum has the required form, using an identity that is easy to check. Applying a similar argument to all letters, we conclude that the image of an area under $\Phi_H$ equals the corresponding area of the images, $\mathrm{area}(\Phi_H(\varphi), \Phi_H(\psi))$.

Conclusion and outlook
In this work we have
• introduced a new set of features for multidimensional time series consisting of iterated sums (Section 3);
• shown that these features are invariant to time warping and that they are in fact all the (polynomial) invariants in this sense (Section 4);
• described a Hopf-algebraic framework to compute these features (Section 2);
• shown how this setting mirrors that of iterated integrals in some aspects and differs in others (Section 2).
There are several possible generalisations of our work.
• Let f, g : F → F be such that f(0) = g(0) = 0. Then iterated sums of the form $\sum_{i_1 < i_2} f(\Delta x_{i_1})\, g(\Delta x_{i_2})$ are also invariant to time warping (and analogously for higher-order iterated sums). These are, in general, no longer polynomial in the time series, but might still be relevant for certain applications. For smooth f, g this should be related to the expansion of nonlinear functionals on stochastic word series [CEFJMW19], but the non-smooth case (for example f(x) = x, g(x) = |x|) is particularly interesting.
• Multi-parameter data. Objects of interest are, for example, "images" I : [0, N] × [0, N] → R, where time-warping invariance becomes invariance to stretchings of the image.
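The first generalisation above is easy to test numerically. The sketch below (ours) evaluates $\sum_{i_1 < i_2} f(\Delta x_{i_1}) g(\Delta x_{i_2})$ with the non-smooth choice f(x) = x, g(x) = |x|, and checks invariance under a time warping; the condition f(0) = g(0) = 0 is what makes the zero increments of a warped series invisible.

```python
def nonlinear_sum(f, g, x):
    """sum over i1 < i2 of f(dx_{i1}) g(dx_{i2}); invariant to time warping
    provided f(0) = g(0) = 0, since warping only inserts zero increments."""
    dx = [b - a for a, b in zip(x, x[1:])]
    total, prefix = 0.0, 0.0
    for d in dx:
        total += prefix * g(d)
        prefix += f(d)
    return total

x = [0.0, 2.0, 1.0, 4.0]
x_warped = [0.0, 2.0, 2.0, 1.0, 4.0, 4.0]  # repeated values
v = nonlinear_sum(lambda t: t, abs, x)
assert abs(v - nonlinear_sum(lambda t: t, abs, x_warped)) < 1e-12
```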
We are also interested in exploring the possible applications of these invariants in data science.
• Retrieval of similar time series, invariant to time warping: see [YJF98] (and references therein), where it is stated that "the time warping distance ... does not lead to any natural features". The invariants presented in our work may provide those missing "natural" features.
• Statistical inference in problems involving unknown time warping, as in Example 1.2.
• Time series clustering: the features of this work can be used to cluster time series according to their "shape", i.e., independently of time warping. Sometimes a "prototype" for each cluster is sought, see for example [PKG11]. In this case, as in the previous point, reconstruction of a time series from an (averaged) iterated-sums signature would be necessary. A detailed study of this ostensibly hard problem is left for future research.
We close with some open questions.At the end of Section 3 we showed that an equivalent of Chow's theorem does not hold for the iterated-sums signature DS(x).
• Can we understand $\{\mathrm{DS}(x) : x \in (\mathbb{F}^d)^{\mathbb{N}}_c\}$ as a semi-algebraic set? (Compare [AFS19] for the investigation of the image of iterated-integrals signatures as algebraic sets.)
• For a time series x denote by $\overleftarrow{x}$ the time series run backwards. Then (as might surprise readers familiar with Chen's signature) in general $\mathrm{DS}(\overleftarrow{x}) * \mathrm{DS}(x) \ne \varepsilon$. What are the implications?
We have shown in Section 5 that the isomorphism known as Hoffman's exponential is compatible with piecewise linear interpolation of a time series. Are there other morphisms that correspond to other interpolation schemes? Do the (in general only linear) maps of Hoffman–Ihara [HI17] have an interpretation in this sense?
We end by remarking that the lead-lag procedure of [FHL16] lifts a discrete time series of dimension d to a piecewise smooth curve of dimension 2d. Since the resulting iterated-integrals signature is invariant to time warping and polynomial in the original time series, by Lemma 4.3 it must be contained in the iterated-sums signature DS(x). We believe that, conversely, the signature of the resulting 2d-dimensional curve should be enough to recover the iterated-sums signature, thus yielding a finite-dimensional smooth curve whose iterated-integrals signature contains the invariants presented in this paper (compare Theorem 5.3 for an infinite-dimensional smooth curve doing the job).

Figure 1: Example of time warping in the case of a discrete time series in d = 1 dimension.