Invariants of Multidimensional Time Series Based on Their Iterated-Integral Signature

We introduce a novel class of features for multidimensional time series that are invariant with respect to transformations of the ambient space. The general linear group, the group of rotations and the group of permutations of the axes are considered. The starting point for their construction is Chen’s iterated-integral signature.


Introduction
The analysis of multidimensional time series is a standard problem in data science. Usually, as a first step, features of a time series must be extracted that are (in some sense) robust and that characterize the time series. In many applications the features should additionally be invariant to a particular group acting on the data. In Human Activity Recognition for example, the orientation of the measuring device is often unknown. This leads to the requirement of rotation invariant features [37]. In EEG analysis, invariants to the general linear group are beneficial [12]. In other applications, the labeling of coordinates is arbitrary, which leads to permutation invariant features.
As any time series in discrete time can, via linear interpolation, be thought of as a multidimensional curve, one is naturally led to the search for invariants of curves. Invariant features of curves have been treated using various approaches, mostly focussing on two-dimensional curves. Among the techniques are Fourier series (of closed curves) [21,27,52], wavelets [6], curvature based methods [2,36] and integral invariants [13,35].
The usefulness of iterated integrals in data analysis has recently been realized, see for example [20,26,32,51] and the introduction in [5]. Let us demonstrate the appearance of iterated integrals on a very simple example. Let $X : [0,T] \to \mathbb{R}^2$ be a smooth curve. Say we are looking for a feature describing this curve that remains unchanged if one is handed a rotated version of $X$. Maybe the simplest one that one can come up with is the (squared) total displacement length $|X_T - X_0|^2$. Now,
$$|X_T - X_0|^2 = (X^1_T - X^1_0)^2 + (X^2_T - X^2_0)^2 = 2\int_0^T (X^1_r - X^1_0)\,\dot X^1_r\,dr + 2\int_0^T (X^2_r - X^2_0)\,\dot X^2_r\,dr$$
$$= 2\int_0^T\!\!\int_0^r \dot X^1_u\,du\,\dot X^1_r\,dr + 2\int_0^T\!\!\int_0^r \dot X^2_u\,du\,\dot X^2_r\,dr = 2\int_0^T\!\!\int_0^r dX^1_u\,dX^1_r + 2\int_0^T\!\!\int_0^r dX^2_u\,dX^2_r,$$
where we have applied the fundamental theorem of calculus twice and then introduced the notation $dX^i_r$ for $\dot X^i_r\,dr$. We see that we have expressed this simple invariant in terms of iterated integrals of $X$; the collection of which is usually called its signature. The aim of this work can be summarized as describing all invariants that can be obtained in this way. It turns out that, when formulated in the right way, this search for invariants reduces to classical problems in invariant theory. We note that already in the early work of Chen (see for example [4, Chap. 3]) the topic of invariants arose, although a systematic study was missing (see also [23]).
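The introductory example can be checked numerically. The following is a minimal sketch in plain Python, assuming a time series that is linearly interpolated; the helper names `level_two` and `rotate` are hypothetical and only serve this illustration. For a piecewise linear curve the second-level iterated integrals can be summed segment by segment.

```python
import math

def level_two(points):
    """Second-level iterated integrals of the piecewise linear curve through
    `points` (a list of 2D points). Returns {(i, j): integral of dX^i dX^j}."""
    S = {(i, j): 0.0 for i in (1, 2) for j in (1, 2)}
    start = points[0]
    for p, q in zip(points, points[1:]):
        for i in (1, 2):
            for j in (1, 2):
                di = q[i - 1] - p[i - 1]
                dj = q[j - 1] - p[j - 1]
                # Value of X^i - X^i_0 at the start of the segment times the
                # increment of X^j, plus the contribution from inside the segment.
                S[i, j] += (p[i - 1] - start[i - 1]) * dj + 0.5 * di * dj
    return S

def rotate(points, theta):
    """Rotate every point by the angle theta around the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

curve = [(0.0, 0.0), (1.0, 0.5), (0.3, 2.0), (-1.0, 1.2)]

S = level_two(curve)
feature = 2 * S[1, 1] + 2 * S[2, 2]
displacement = (curve[-1][0] - curve[0][0]) ** 2 + (curve[-1][1] - curve[0][1]) ** 2

S_rot = level_two(rotate(curve, 0.7))
feature_rot = 2 * S_rot[1, 1] + 2 * S_rot[2, 2]
```

Here `feature` agrees with `displacement`, and `feature_rot` agrees with `feature`, up to floating point error.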
The aim of this work is threefold. Firstly, we adapt classical results in invariant theory regarding non-commuting polynomials (or, equivalently, multilinear maps) to our situation. These results are spread out in the literature and sometimes need a little massaging. Secondly, we lay out the usefulness of the iterated-integral signature in the search for invariants of d-dimensional curves. We show, see Sect. 7, that certain "integral invariants" found in the literature are in fact found in the signature and that our approach simplifies their enumeration. Lastly, we present new geometric insights into some entries found in the signature, Sect. 3.3.
The paper is structured as follows. In the next section we introduce the iterated-integral signature of a multidimensional curve, as well as some algebraic language to work with it. Based on this signature, we present in Sect. 3 and Sect. 4 invariants to the general linear group and the special orthogonal group. Both are based on classical results in invariant theory. For completeness, we present in Sect. 5 the invariants to permutations, which have been constructed in [1]. In Sect. 6 we show how to use all these invariants if an additional (time) coordinate is introduced. In Sect. 7 we relate our work to the integral invariants of [13] and demonstrate that the invariants presented there cannot be complete. We formulate the conjecture of completeness for our invariants and point out open algebraic questions.
For readers who want to use these invariants without having to go into the technical results, we propose the following route. The required notation is presented in the next section. The invariants are presented in Proposition 3.11, Proposition 4.4 and Proposition 5.4. Examples are given in Sect. 3.1 (in particular Remark 3.14), Example 4.7 and Example 5.6. All these invariants are also implemented in the software package [9]. For calculating the iterated-integral signature in Python we propose using the package iisignature, as described in [40].

The Signature of Iterated Integrals
By a multidimensional curve $X$ we will denote a continuous mapping $X : [0,T] \to \mathbb{R}^d$ of bounded variation (see footnote 2). The aim of this work is to find features (i.e. complex or real numbers) describing such a curve that are invariant under the general linear group, the group of rotations and the group of permutations. Note that in practical situations one is usually presented with a discrete sequence of data points in $\mathbb{R}^d$, a multidimensional time series. Such a time series can be easily transformed into a (piecewise) smooth curve by linear interpolation.
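It was proven in [22], which extends the work of [4], that a curve $X = (X^1, .., X^d)$ is almost completely characterized by the collection of its iterated integrals (see footnote 3)
$$\int_0^T\!\!\int_0^{r_n}\!\!\cdots\int_0^{r_2} dX^{i_1}_{r_1} \cdots dX^{i_n}_{r_n}, \qquad n \ge 1,\ i_1, .., i_n \in \{1, .., d\}.$$
The collection of all these integrals is called the signature (see footnote 4) of $X$. In a first step, we can hence reduce the goal
Find functions Ψ : curves → R that are invariant under the action of a group G
to
Find functions Ψ : signature of curves → R that are invariant under the action of a group G.
By the shuffle identity (Lemma 2.1), any polynomial function on the signature can be rewritten as a linear function on the signature. Assuming that arbitrary functions are well approximated by polynomial functions, we are led to the final simplification, which is the goal of this paper:
Find linear functions Ψ : signature of curves → R that are invariant under the action of a group G.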

Algebraic Underpinning
Footnote 2: The reader might prefer to just think of a (piecewise) smooth curve.
Footnote 3: Since $X$ is of bounded variation the integrals are well-defined using classical Riemann-Stieltjes integration (see for example Chap. 6 in [43]). This generalizes the notation in the introduction above beyond smooth curves. It can be pushed much further though. In fact the following considerations are purely algebraic and hence hold for any curve for which a sensible integration theory (in particular: obeying integration by parts) exists. A relevant example is Brownian motion which, although being almost surely nowhere differentiable, nonetheless admits a stochastic (Stratonovich) integral.
Footnote 4: Also called the "rough path signature".

Let us introduce some algebraic notation in order to work with the collection of iterated integrals. Denote by $T((\mathbb{R}^d))$ the space of formal power series in $d$ non-commuting variables $x_1, x_2, .., x_d$. We can conveniently store all the iterated integrals of the curve $X$ in $T((\mathbb{R}^d))$, by defining the signature of $X$ to be
$$S(X)_{0,T} := \sum x_{i_1} \cdots x_{i_n} \int_0^T\!\!\int_0^{r_n}\!\!\cdots\int_0^{r_2} dX^{i_1}_{r_1} \cdots dX^{i_n}_{r_n}.$$
Here the sum is taken over all $n \ge 0$ and all $i_1, .., i_n \in \{1, 2, .., d\}$. For $n = 0$ the summand is, for algebraic reasons, taken to be the constant 1. The algebraic dual of $T((\mathbb{R}^d))$ is $T(\mathbb{R}^d)$, the space of polynomials in $x_1, x_2, .., x_d$. The dual pairing, denoted by $\langle \cdot, \cdot \rangle$, is defined by declaring all monomials to be orthonormal, so for example
$$\langle x_1 + 15\, x_1 x_2 - 2\, x_1 x_2 x_1,\ x_1 x_2 \rangle = 15.$$
Here, we write the element of $T((\mathbb{R}^d))$ on the left and the element of $T(\mathbb{R}^d)$ on the right. We can "pick out" iterated integrals from the signature as follows:
$$\langle S(X)_{0,T},\ x_{i_1} \cdots x_{i_n} \rangle = \int_0^T\!\!\int_0^{r_n}\!\!\cdots\int_0^{r_2} dX^{i_1}_{r_1} \cdots dX^{i_n}_{r_n}.$$
The space $T((\mathbb{R}^d))$ becomes an algebra by extending the usual product of monomials, denoted $\cdot$, to the whole space by bilinearity. Note that $\cdot$ is non-commutative.
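As an illustration of this bookkeeping, here is a minimal sketch in plain Python that stores the truncated signature of a linearly interpolated time series as a dictionary from words to numbers. The names `signature` and `pairing` are hypothetical helpers and not the interface of [9] or of iisignature. The sketch relies on two standard facts: the signature of a straight segment is the truncated exponential of its increment, and signatures of consecutive segments are multiplied with the product $\cdot$ (Chen's relation, Lemma 2.3).

```python
from itertools import product
from math import factorial, prod

def signature(points, depth):
    """Signature, truncated at `depth`, of the piecewise linear curve through
    `points`. Words are tuples of letters 1..d; the empty word has value 1."""
    d = len(points[0])
    words = [w for n in range(depth + 1)
             for w in product(range(1, d + 1), repeat=n)]
    sig = {w: float(w == ()) for w in words}
    for p, q in zip(points, points[1:]):
        delta = [b - a for a, b in zip(p, q)]
        # Signature of one straight segment: exp(delta), truncated.
        seg = {w: prod(delta[i - 1] for i in w) / factorial(len(w)) for w in words}
        # Multiply the running signature by the segment signature.
        sig = {w: sum(sig[w[:k]] * seg[w[k:]] for k in range(len(w) + 1))
               for w in words}
    return sig

def pairing(sig, phi):
    """<S, phi> for a polynomial phi given as {word: coefficient}."""
    return sum(c * sig[w] for w, c in phi.items())

# An L-shaped path: first along the x_1 axis, then along the x_2 axis.
sig = signature([(0, 0), (1, 0), (1, 1)], depth=2)
value = pairing(sig, {(1, 2): 1, (2, 1): -1})
```

For this path the entry for the word `(1, 2)` is 1, the entry for `(2, 1)` is 0, and `value` is 1.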
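On $T(\mathbb{R}^d)$ we often use the shuffle product $\sqcup\!\sqcup$ which, on monomials, interleaves them in all order-preserving ways, so for example
$$x_1 \sqcup\!\sqcup\, x_2 x_3 = x_1 x_2 x_3 + x_2 x_1 x_3 + x_2 x_3 x_1.$$
Note that $\sqcup\!\sqcup$ is commutative. Monomials, and hence homogeneous polynomials, have the usual concept of order or homogeneity. For $n \ge 0$ we denote the projection on polynomials of order $n$ by $\pi_n$, so for example
$$\pi_2\big(x_1 + 15\, x_1 x_2 - 2\, x_1 x_2 x_1\big) = 15\, x_1 x_2.$$
See [41] for more background on these spaces.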
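As mentioned above, every polynomial expression in terms of the signature can be rewritten as a linear expression in (different) terms of the signature. This is the content of the following lemma, which is proven in [39] (see also [41, Corollary 3.5]).

Lemma 2.1 (Shuffle identity) Let $X : [0,T] \to \mathbb{R}^d$ be a continuous curve of bounded variation. Then for every $a, b \in T(\mathbb{R}^d)$
$$\langle S(X)_{0,T}, a \rangle\, \langle S(X)_{0,T}, b \rangle = \langle S(X)_{0,T},\ a \sqcup\!\sqcup\, b \rangle.$$

Remark 2.2 We have used this fact already in the introduction, where we confirmed by hand that
$$\langle S(X)_{0,T}, x_i \rangle\, \langle S(X)_{0,T}, x_i \rangle = 2\, \langle S(X)_{0,T}, x_i x_i \rangle = \langle S(X)_{0,T},\ x_i \sqcup\!\sqcup\, x_i \rangle, \qquad i = 1, 2.$$

The concatenation of curves is compatible with the product on $T((\mathbb{R}^d))$ in the following sense (for a proof, see for example [16, Theorem 7.11]).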
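The shuffle product and the shuffle identity can be explored with a few lines of plain Python. This is only a sketch: `shuffle` and `signature` are hypothetical helper names, words are tuples of letters, and the curve is an arbitrary linearly interpolated time series.

```python
from itertools import product
from math import factorial, prod

def shuffle(u, v):
    """Shuffle product of two words (tuples); returns {word: multiplicity}."""
    if not u or not v:
        return {u + v: 1}
    out = {}
    # Either the first letter of u or the first letter of v comes first.
    for w, c in shuffle(u[1:], v).items():
        out[u[:1] + w] = out.get(u[:1] + w, 0) + c
    for w, c in shuffle(u, v[1:]).items():
        out[v[:1] + w] = out.get(v[:1] + w, 0) + c
    return out

def signature(points, depth):
    """Truncated signature of the piecewise linear curve through `points`."""
    d = len(points[0])
    words = [w for n in range(depth + 1)
             for w in product(range(1, d + 1), repeat=n)]
    sig = {w: float(w == ()) for w in words}
    for p, q in zip(points, points[1:]):
        delta = [b - a for a, b in zip(p, q)]
        seg = {w: prod(delta[i - 1] for i in w) / factorial(len(w)) for w in words}
        sig = {w: sum(sig[w[:k]] * seg[w[k:]] for k in range(len(w) + 1))
               for w in words}
    return sig

curve = [(0.0, 0.0), (1.0, 0.5), (0.3, 2.0), (-1.0, 1.2)]
sig = signature(curve, depth=4)

a, b = (1, 2), (2, 1)
product_of_entries = sig[a] * sig[b]
linear_in_signature = sum(c * sig[w] for w, c in shuffle(a, b).items())
```

The two numbers `product_of_entries` and `linear_in_signature` agree up to rounding, which is the shuffle identity for the words 12 and 21.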

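Lemma 2.3 (Chen's relation) For curves $X, Y : [0,T] \to \mathbb{R}^d$ denote by $X \sqcup Y : [0,2T] \to \mathbb{R}^d$ their concatenation, which is given by $X_t$ for $t \in [0,T]$ and by $X_T + Y_{t-T} - Y_0$ for $t \in [T,2T]$. Then
$$S(X \sqcup Y)_{0,2T} = S(X)_{0,T} \cdot S(Y)_{0,T}.$$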
We will use the following fact repeatedly, which also explains the commonly used name tensor algebra for T (R d ).

Lemma 2.4
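The space of all multilinear maps on $\mathbb{R}^d \times \cdots \times \mathbb{R}^d$ ($n$ times) is in a one-to-one correspondence with homogeneous polynomials of order $n$ in the non-commuting variables $x_1, .., x_d$ by the following bijective linear map
$$\psi \mapsto \operatorname{poly}(\psi) := \sum_{i_1, .., i_n \in \{1, .., d\}} \psi(e_{i_1}, .., e_{i_n})\ x_{i_1} \cdot .. \cdot x_{i_n},$$
with $e_i$ being the $i$-th canonical basis vector of $\mathbb{R}^d$.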
For example, with d = 2 and n = 3, we can consider the multilinear map ψ which takes
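The correspondence of Lemma 2.4 is easy to mirror in code. The sketch below, in plain Python, evaluates a multilinear map on all tuples of basis vectors to obtain its polynomial, and conversely evaluates the multilinear map belonging to a polynomial. The names `poly`, `evaluate` and `det2` are hypothetical helpers chosen for this illustration, and the 2 x 2 determinant is used as the test map.

```python
from itertools import product
from math import prod

def poly(psi, d, n):
    """Polynomial {word: coefficient} of the multilinear map psi, obtained by
    evaluating psi on all n-tuples of canonical basis vectors of R^d."""
    def e(i):
        return tuple(1.0 if k == i else 0.0 for k in range(1, d + 1))
    out = {}
    for word in product(range(1, d + 1), repeat=n):
        coefficient = psi(*(e(i) for i in word))
        if coefficient != 0:
            out[word] = coefficient
    return out

def evaluate(phi, *vectors):
    """Inverse direction: the multilinear map belonging to phi, evaluated
    on the given vectors (one vector per letter of each word)."""
    return sum(c * prod(v[i - 1] for v, i in zip(vectors, w))
               for w, c in phi.items())

def det2(v, w):
    """Determinant of the 2 x 2 matrix with columns v and w."""
    return v[0] * w[1] - v[1] * w[0]

phi = poly(det2, d=2, n=2)
round_trip = evaluate(phi, (2.0, 1.0), (0.5, 3.0))
```

Here `phi` is the polynomial $x_1 x_2 - x_2 x_1$, and `round_trip` reproduces `det2((2.0, 1.0), (0.5, 3.0))`.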

General Linear Group
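Let $GL(\mathbb{R}^d) = \{A \in \mathbb{R}^{d \times d} : \det(A) \neq 0\}$ be the general linear group of $\mathbb{R}^d$.

Definition 3.1 For $w \in \mathbb{N}$, we call $\varphi \in T(\mathbb{R}^d)$ a GL invariant of weight $w$ if
$$\langle S(AX)_{0,T}, \varphi \rangle = (\det A)^w\, \langle S(X)_{0,T}, \varphi \rangle$$
for all $A \in GL(\mathbb{R}^d)$ and all curves $X$.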

Lemma 3.3
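For all $A \in \mathbb{R}^{d \times d}$ and any curve $X$,
$$\langle S(AX)_{0,T}, \varphi \rangle = \langle S(X)_{0,T}, A^\top \varphi \rangle.$$
Proof It is enough to verify this on monomials $\varphi = x_{i_1} .. x_{i_m}$. Then, since the $r$-th component of the curve $AX$ is equal to $(AX)^r = \sum_{j} A_{rj} X^{j}$, we get
$$\langle S(AX)_{0,T}, x_{i_1} .. x_{i_m} \rangle = \sum_{j_1, .., j_m} A_{i_1 j_1} \cdots A_{i_m j_m}\, \langle S(X)_{0,T}, x_{j_1} .. x_{j_m} \rangle = \langle S(X)_{0,T}, A^\top x_{i_1} .. x_{i_m} \rangle.$$

We can simplify the concept of GL invariants further, using the next lemma. Owing to the shuffle identity, signatures of curves live in a nonlinear subset of the whole tensor algebra $T((\mathbb{R}^d))$, the set of "grouplike elements" (compare [41, Sect. 3.1]). It turns out though that they linearly span all of $T((\mathbb{R}^d))$.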

Lemma 3.4
For n ≥ 1 span π n S(X) 0,T : X curve = π n T R d . (1) Proof It is clear by definition that the left hand side of (1) is included in π n T ((R d )). We show the other direction and use ideas of [3,Proposition 4]. Let x in · . . . · x i 1 ∈ π n T ((R d )) be given. Let X be the piecewise linear path that results from the concatenation of the vectors t 1 e i 1 , t 2 e i 2 up to t n e in , where e i , i = 1, .., d is the standard basis of R d . Its signature is given by (see for example [16,Chap. 6]) where the exponential function is defined by its power series. Then Combining this with the fact that left hand side of (1) is a closed set we get that These elements span π n T ((R d )), which finishes the proof.
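Hence $\varphi$ is a GL invariant of weight $w$ in the sense of Definition 3.1 if and only if for all $A \in GL(\mathbb{R}^d)$
$$A^\top \varphi = (\det A)^w\, \varphi.$$
Since the action respects homogeneity, we immediately obtain that projections of invariants are invariants (take $B = (\det A)^{-w} A^\top$ in the following lemma):

Lemma 3.5 Let $\varphi \in T(\mathbb{R}^d)$ and $B \in GL(\mathbb{R}^d)$. If $B\varphi = \varphi$, then $B \pi_n \varphi = \pi_n \varphi$ for all $n \ge 1$.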
Proof By definition, the action of GL on T (R d ) commutes with π n .
In order to apply classical results in invariant theory, we use the bijection poly between multilinear functions and non-commuting polynomials, given in Lemma 2.4.

Lemma 3.6
For $\psi : (\mathbb{R}^d)^{\times n} \to \mathbb{R}$ multilinear and $A \in GL(\mathbb{R}^d)$,
$$\operatorname{poly}\big[\psi(A\,\cdot, .., A\,\cdot)\big] = A^\top \operatorname{poly}[\psi].$$
The simplest multilinear function satisfying $\Psi(Av_1, .., Av_n) = \det(A)\, \Psi(v_1, .., v_n)$ that one can think of is the determinant itself. That is, $n = d$ and
$$\Psi(v_1, .., v_d) = \det[v_1 v_2 .. v_d],$$
where $[v_1 v_2 .. v_d]$ is the $d \times d$ matrix with columns $v_i$. Up to a scalar this is in fact the only one, and it turns out that invariants of higher weight are built only using determinants as a building block.
To state the following classical result, we introduce the notion of Young diagrams, which play an important role in the representation theory of the symmetric group.
Let $\lambda = (\lambda_1, .., \lambda_r)$ be a partition of $n \in \mathbb{N}$, which we assume ordered as $\lambda_1 \ge \lambda_2 \ge .. \ge \lambda_r$. We associate to it a Young diagram, which is an arrangement of $n$ boxes into left-justified rows. There are $r$ rows, with $\lambda_i$ boxes in the $i$-th row. For example, the partition (4, 2, 1) of 7 gives the Young diagram

[ ][ ][ ][ ]
[ ][ ]
[ ]

A Young tableau is obtained by filling these boxes with the numbers 1, .., n. Continuing the example, the following is a Young tableau

2 3 7 1
5 4
6

A Young tableau is standard if the values are increasing in every row (from left to right) and in every column (from top to bottom). The previous tableau was not standard; the following is.

1 3 5 7
2 4
6

The following result is classical, see for example Dieudonné [10, Sect. 2.5], [50] and [18], none of which explicitly give a basis for the invariants though. See [47, Theorem 4.1.12] for a slightly different basis.

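Theorem 3.7 The space of multilinear maps $\psi : (\mathbb{R}^d)^{\times n} \to \mathbb{R}$ that satisfy
$$\psi(Av_1, .., Av_n) = (\det A)^w\, \psi(v_1, .., v_n) \qquad (2)$$
for all $A \in GL(\mathbb{R}^d)$ and $v_1, .., v_n \in \mathbb{R}^d$ is non-trivial if and only if $n = wd$ for some integer $w \ge 1$.
In that case, a linear basis is given by
$$\big\{ (v_1, .., v_n) \mapsto \det[v_{C_1}] \cdot .. \cdot \det[v_{C_w}] \big\},$$
where $C_i$ are the columns of $\Sigma$, and $\Sigma$ ranges over all standard Young tableaux corresponding to the partition $\lambda = (w, w, .., w)$ ($d$ times) of $n$. Here, for a column $C$ with entries $c_1 < .. < c_d$, $v_C$ denotes the $d \times d$ matrix with columns $v_{c_1}, .., v_{c_d}$.

Remark 3.8 A consequence of this theorem is the existence of identities between products of determinants. For example, for vectors $v_1, .., v_4 \in \mathbb{R}^2$, one can check by hand
$$\det[v_1 v_4] \det[v_2 v_3] = \det[v_1 v_3] \det[v_2 v_4] - \det[v_1 v_2] \det[v_3 v_4].$$
This is why the product on the left-hand side here is not part of the basis in the previous theorem for $d = 2$, $w = 2$ (compare Sect. 3.1).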
Identities of this type are called Plücker identities. They have a long history and are a major ingredient in the representation theory of the symmetric group. The procedure of reducing certain products of determinants to a basic set of such products is called the straightening algorithm [44,Sect. 2.6]. See also [30] and [48].
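For $d = 2$ and four vectors, the three possible products of two determinants are tied together by a single Plücker identity, which can be checked numerically. The following plain Python sketch uses a hypothetical helper `det2`, randomly drawn vectors and an arbitrarily chosen matrix `A`; it also checks that such a product picks up the factor $(\det A)^2$ when all vectors are transformed by $A$.

```python
import random

def det2(a, b):
    """Determinant of the 2 x 2 matrix with columns a and b."""
    return a[0] * b[1] - a[1] * b[0]

def apply(A, v):
    """Matrix-vector product for a 2 x 2 matrix given as nested tuples."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

random.seed(0)
v1, v2, v3, v4 = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]

# The product that does not belong to a standard Young tableau ...
non_standard = det2(v1, v4) * det2(v2, v3)
# ... expressed through the two products that do.
straightened = det2(v1, v3) * det2(v2, v4) - det2(v1, v2) * det2(v3, v4)

# Weight two: transforming all vectors multiplies the product by det(A)^2.
A = ((2.0, 1.0), (0.5, 3.0))
det_A = det2((A[0][0], A[1][0]), (A[0][1], A[1][1]))
w1, w2, w3, w4 = [apply(A, v) for v in (v1, v2, v3, v4)]
transformed = det2(w1, w3) * det2(w2, w4)
original = det2(v1, v3) * det2(v2, v4)
```

Up to rounding, `non_standard` equals `straightened`, and `transformed` equals `det_A ** 2 * original`.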

Remark 3.9
The only invariant for d = 2, w = 1 is a Lie polynomial. One can generally ask for invariant Lie polynomials [41,Sect. 8.6.2]. This seems to be of no relevance to the application of invariant feature extraction for curves though.
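The number of linearly independent invariants of weight $w$ is, by Theorem 3.7, the number of standard Young tableaux of shape $(w, w, .., w)$ ($d$ rows). By the hook formula [44, Theorem 3.10.2] this number equals
$$\frac{(wd)!}{\prod_{i=1}^{d} \prod_{j=1}^{w} (d - i + w - j + 1)}.$$
For example for $d = 2$, the number of invariants for weights $w = 0, 1, 2, 3, ..$ is $1, 1, 2, 5, ..$, the Catalan numbers.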
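The count can be reproduced with a short plain Python sketch that evaluates the hook formula for a rectangular shape and, as a cross-check, enumerates the standard Young tableaux directly. The function names `hook_count` and `standard_tableaux` are hypothetical; the enumeration is a simple backtracking search and is only meant for small $d$ and $w$.

```python
from math import factorial

def hook_count(d, w):
    """Number of standard Young tableaux with d rows of length w (hook formula)."""
    hooks = 1
    for i in range(d):
        for j in range(w):
            # boxes to the right + boxes below + the box itself
            hooks *= (w - j - 1) + (d - i - 1) + 1
    return factorial(d * w) // hooks

def standard_tableaux(d, w):
    """Enumerate the standard Young tableaux with d rows of length w by placing
    1, 2, .., d*w one at a time so that rows and columns stay increasing."""
    def extend(lengths, rows, k):
        if k > d * w:
            yield [list(r) for r in rows]
            return
        for i in range(d):
            # Row i may grow if it is not full and stays shorter than the row above.
            if lengths[i] < w and (i == 0 or lengths[i] < lengths[i - 1]):
                rows[i].append(k)
                lengths[i] += 1
                yield from extend(lengths, rows, k + 1)
                rows[i].pop()
                lengths[i] -= 1
    yield from extend([0] * d, [[] for _ in range(d)], 1)

counts_d2 = [hook_count(2, w) for w in range(5)]
tableaux_2x2 = list(standard_tableaux(2, 2))
```

For $d = 2$ this gives the counts 1, 1, 2, 5, 14 for $w = 0, .., 4$, and the two tableaux of shape (2, 2) have rows (1 2 / 3 4) and (1 3 / 2 4).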

Proof of Theorem 3.7 Write
clearly spans a one-dimensional irreducible representation of GL(V ). Hence we need to investigate all one-dimensional irreducible representation of GL(V ) contained in V ⊗n (and it will turn out that all of them satisfy (2)). The (diagonal) action of GL(V ) on V ⊗n is best understood by simultaneously studying the left action of S n on V ⊗n given by By Schur-Weyl duality, [29, Theorem 6.4.5.2], as S n × GL(V ) modules, where the sum is over integer partitions λ of n, the S λ are irreducible representations of S n , to be detailed below and the V λ are irreducible representations of GL(V ). The exact form of the latter is irrelevant here, we only need to know that V λ is one-dimensional if and only if λ = (w, .., w), d-times, for some integer w ≥ 1, [10, p. 21]. This gives the condition n = wd in the statement. We assume this to hold from now on.
We are hence left with understanding the unique copy of the "Specht module" S λ inside of V ⊗n . We sketch its classical construction. Let us recall that a tabloid is an equivalence class of Young tableaux modulo permutations leaving the set of entries in each row invariant [44,Chap. 2]. 6 For t a Young tableau denote {t} its tabloid, so for example The symmetric group S n acts on Young tableaux as For example It then acts on tabloids by τ · {t} := {τ · t}. Define for a Young tableau t where the sum is over all π ∈ S n that leave the set of values in each column invariant. For example with we get Then Irrep (w,..,w) := span e t : t Young tableau of shape (w, .., w) is an irreducible representation of S n and e t : t standard Young tableau of shape (w, .., w) , forms a basis [44,Theorem 2.5.2]. This concludes the reminder on representation theory for S n . Define the map ι from the space of tabloids of shape (w, .., w) into V ⊗n as follows, 6 One can also think of a tabloid as the following element of the vector space spanned by Young tableaux, Here the sum is over all permutations π that leave the elements of each row of t unchanged.
where e * i is the canonical basis of V and This is a homomorphism of S n representations. Indeed, On the other hand with p := r τ −1 ( ) and So indeed ι(τ · {t}) = τ · ι({t}), and ι is a homomorphism of S n representations. It is a bijection from the space of (w, .., w) tabloids into the space spanned by the vectors Restricting to Irrep (w,..,w) then yields an isomorphism of irreducible S n representations. Hence ι(Irrep (w,..,w) ) is the (unique) realization of S λ inside of V ⊗n in (3). We finish by describing its image. Consider the standard Young tableau t first of shape (w, w, .., w) obtained by filling the columns from left to right, i.e.
Clearly, for any (standard) Young tableau t there exists a unique σ t ∈ S n such that We claim Indeed, since ι is a homomorphism of S n representation, It remains to check Every π ∈ S n that is column-preserving for t first can be written as the product π 1 · .. · π w , with π j ranging over the permutations of the entries of the j -th column t first . Then as desired.
Applying Lemma 2.4 to Theorem 3.7 we get the invariants in T (R d ).

Proposition 3.11 A linear basis for the space of GL invariants of order $n = wd$ is given by
$$\operatorname{poly}\big( (v_1, .., v_n) \mapsto \det[v_{C_1}] \cdot .. \cdot \det[v_{C_w}] \big),$$
where $C_i$ are the columns of $\Sigma$, $\Sigma$ ranges over all standard Young tableaux corresponding to the partition $\lambda = (w, w, .., w)$ ($d$ times) of $n$, and the notation $v_C$ is as introduced in Theorem 3.7.
Remark 3.12 By Lemma 3.5, for any invariant φ ∈ T (R d ) and n ≥ 1 we have that π n φ is also invariant. Hence the previous theorem characterizes all invariants we are interested in (Definition 3.1), not just homogeneous ones.
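To make the passage from tableaux to polynomials concrete, the following plain Python sketch expands a product of determinants, one per column of a tableau, into a polynomial in non-commuting variables. The function names `sign` and `invariant` are hypothetical, and the columns are passed in by hand rather than generated from tableaux.

```python
from itertools import permutations, product

def sign(perm):
    """Sign of a permutation given as a tuple of 0, .., d-1 (inversion count)."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def invariant(columns, d):
    """Polynomial {word: coefficient} of the map v -> prod_i det[v_{C_i}].
    Each column is a tuple of d positions (1-based) in increasing order."""
    n = d * len(columns)
    out = {}
    # Expand every determinant by the Leibniz formula.
    for perms in product(permutations(range(d)), repeat=len(columns)):
        word = [0] * n
        coefficient = 1
        for column, perm in zip(columns, perms):
            coefficient *= sign(perm)
            for k, position in enumerate(column):
                # The k-th vector of this column contributes component perm[k] + 1.
                word[position - 1] = perm[k] + 1
        out[tuple(word)] = out.get(tuple(word), 0) + coefficient
    return out

# d = 2, w = 1: the single column (1, 2).
level_2 = invariant([(1, 2)], d=2)
# d = 2, w = 2: columns of the tableaux with rows (1 3 / 2 4) and (1 2 / 3 4).
level_4_first = invariant([(1, 2), (3, 4)], d=2)
level_4_second = invariant([(1, 3), (2, 4)], d=2)
```

The result `level_2` is $x_1 x_2 - x_2 x_1$, and the two level-four polynomials are the ones with words 1212, 1221, 2112, 2121 and 1122, 1221, 2112, 2211 respectively.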

Remark 3.13
Note that each of these invariants $\varphi$ consists only of monomials that contain every variable $x_1, .., x_d$ at least once. This implies that $\langle S(X)_{0,T}, \varphi \rangle$ consists only of iterated integrals that contain every component $X^1, .., X^d$ of the curve at least once. Hence, if at least one of these components is constant, the whole expression will be zero.
Since $\varphi$ is invariant, this implies that $\langle S(X)_{0,T}, \varphi \rangle = 0$ as soon as there is some coordinate transformation under which one component is constant, that is whenever the curve $X$ stays in a hyperplane of dimension strictly less than $d$.
One of the simplest curves in $d$ dimensions that does not lie in any hyperplane of lower dimension is the moment curve
$$t \mapsto (t, t^2, .., t^d).$$
We will come back to this example in Lemma 3.29.

Examples
We will use the following short notation: $i_1 i_2 .. i_n := x_{i_1} x_{i_2} \cdots x_{i_n}$, so that for example $12 = x_1 x_2$. We present the invariants of Proposition 3.11 for some special cases of $d$ and $w$.

The case d = 2, level 2 (w = 1)
12 − 21

Remark 3.14 Let us make clear that from the perspective of data analysis, the "invariant" of interest is really the action of this element in $T(\mathbb{R}^d)$ on the signature of a curve.
In this example, the real number $\langle S(X)_{0,T}, 12 - 21 \rangle$ changes only by the determinant of $A \in GL(\mathbb{R}^2)$ when calculating it for the transformed curve $AX$:
$$\langle S(AX)_{0,T}, 12 - 21 \rangle = \det(A)\, \langle S(X)_{0,T}, 12 - 21 \rangle.$$

The case d = 2, level 4 (w = 2)
1212 − 1221 − 2112 + 2121
1122 − 1221 − 2112 + 2211

Remark 3.15 This is a linear basis of invariants in the fourth level. If one takes algebraic dependencies into consideration, the set of invariants becomes smaller. To be specific, assume that one already has knowledge of the invariant of level 2 (i.e. $\langle S(X)_{0,T}, 12 - 21 \rangle$). If, say in a machine learning application, the learning algorithm can deal sufficiently well with nonlinearities, one should not be required to provide additionally the square of this number. In other words $|\langle S(X)_{0,T}, 12 - 21 \rangle|^2$ can also be assumed to be "known". But, by the shuffle identity (Lemma 2.1), this can be written as
$$|\langle S(X)_{0,T}, 12 - 21 \rangle|^2 = \langle S(X)_{0,T}, (12 - 21) \sqcup\!\sqcup\, (12 - 21) \rangle = \langle S(X)_{0,T}, 4 \cdot 1122 - 4 \cdot 1221 - 4 \cdot 2112 + 4 \cdot 2211 \rangle.$$
Now, seeing that 4 · 1122 − 4 · 1221 − 4 · 2112 + 4 · 2211 is invariant, there is only one "new" independent invariant in the fourth level, namely 1212 − 1221 − 2112 + 2121.
A similar analysis can also be carried out for the following invariants, but we refrain from doing so, since it can be easily done with a computer algebra system.
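The behaviour of these invariants under a linear map can be observed numerically. The sketch below, in plain Python, computes the truncated signature of a linearly interpolated planar time series and of its image under a matrix `A`, and pairs both with the invariants of level 2 and level 4 for $d = 2$. The helpers `signature`, `pairing` and `word`, the sample curve and the matrix are all hypothetical choices made for this illustration.

```python
from itertools import product
from math import factorial, prod

def signature(points, depth):
    """Truncated signature of the piecewise linear curve through `points`."""
    d = len(points[0])
    words = [w for n in range(depth + 1)
             for w in product(range(1, d + 1), repeat=n)]
    sig = {w: float(w == ()) for w in words}
    for p, q in zip(points, points[1:]):
        delta = [b - a for a, b in zip(p, q)]
        seg = {w: prod(delta[i - 1] for i in w) / factorial(len(w)) for w in words}
        sig = {w: sum(sig[w[:k]] * seg[w[k:]] for k in range(len(w) + 1))
               for w in words}
    return sig

def pairing(sig, phi):
    """<S, phi> for a polynomial phi given as {word: coefficient}."""
    return sum(c * sig[w] for w, c in phi.items())

def word(s):
    """Short notation: word("12") stands for x_1 x_2."""
    return tuple(int(ch) for ch in s)

area = {word("12"): 1, word("21"): -1}
level4_a = {word("1212"): 1, word("1221"): -1, word("2112"): -1, word("2121"): 1}
level4_b = {word("1122"): 1, word("1221"): -1, word("2112"): -1, word("2211"): 1}

A = ((2.0, 1.0), (0.5, 3.0))
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

curve = [(0.0, 0.0), (1.0, 0.5), (0.3, 2.0), (-1.0, 1.2), (0.4, -0.7)]
moved = [(A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y) for x, y in curve]

S, S_moved = signature(curve, 4), signature(moved, 4)
```

With these objects, `pairing(S_moved, area)` equals `det_A * pairing(S, area)`, both level-four pairings scale with `det_A ** 2`, and `pairing(S, area) ** 2` equals `4 * pairing(S, level4_b)`, all up to rounding.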

The Invariant of Weight One, in Dimension Two
Geometric Interpretation The invariant for $d = 2$, $w = 1$, namely $\varphi = x_1 x_2 - x_2 x_1$, has a simple geometric interpretation: it picks out (two times, see footnote 7) the area (signed, and with multiplicity) between the curve $X$ and the chord spanned between its starting point and endpoint (compare Fig. 1). For (smooth) non-intersecting curves, this follows from Green's theorem [43, Theorem 10.33]. For self-intersecting curves, the mathematically most convenient definition of "signed area" is the integral (in the plane) of its winding number. The claimed relation to the invariant $\varphi$ is for example proven in [34, Proposition 1].
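For a polygonal curve this interpretation can be compared with the shoelace formula for the polygon obtained by closing the curve with its chord. The plain Python sketch below uses two hypothetical helpers, `signed_area_from_signature` and `shoelace`, and assumes a linearly interpolated planar time series.

```python
def signed_area_from_signature(points):
    """(1/2) <S(X), x_1 x_2 - x_2 x_1> for the piecewise linear curve through
    `points`, summed segment by segment."""
    x0, y0 = points[0]
    total = 0.0
    for (x, y), (xn, yn) in zip(points, points[1:]):
        # On a straight segment the integral of (X - X_0) x dX is exactly
        # the cross product of (segment start - X_0) with the increment.
        total += (x - x0) * (yn - y) - (y - y0) * (xn - x)
    return 0.5 * total

def shoelace(points):
    """Signed area of the polygon obtained by closing the curve with its chord."""
    closed = points + points[:1]
    return 0.5 * sum(x * yn - xn * y for (x, y), (xn, yn) in zip(closed, closed[1:]))

# Three sides of the unit square, traversed counterclockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area_signature = signed_area_from_signature(square)
area_shoelace = shoelace(square)

# A less regular example, not starting at the origin.
curve = [(1.0, 2.0), (2.5, 2.0), (3.0, 4.0), (0.5, 3.5), (0.0, 1.0)]
```

For the square both numbers are 1, and reversing the direction of traversal flips the sign.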

Connection to Correlation
Assume that X is a continuous curve, piecewise linear between some time points t i , i = 0, . . . , n. 8 The area is then explicitly calculated as Here, for two vectors a, b of length n 7 The prefactor 1/2 is irrelevant, so we will speak of φ and also of 1 2 φ as picking out the area. 8 The standard example is a time series that is discretely observed at times t i and linearly interpolated in between.
the lag-one cross-correlation, which is a commonly used feature in signal analysis, see for example [38,Chap. 13.2]. 9 In particular, if the curve starts at 0, we have which is an antisymmetrized version of the lag-one cross-correlation.
Remark 3. 16 The antisymmetrized version of the lag τ cross-correlation, for each τ ≥ 2, is also a GL(R 2 ) invariant of the curve. In general these invariants cannot be found in the signature, and we thank the anonymous referee for pointing out the following example. Consider the treelike curve which linearly interpolates the following points Its signature is trivial, but

The Invariant of Weight One, in Any Dimension
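Whatever the dimension $d$ of the curve's ambient space, the space of invariants of weight 1 has dimension 1 and is spanned by
$$\mathrm{Inv}_d := \mathrm{Inv}_d(x_1, .., x_d) := \sum_{\sigma \in S_d} \operatorname{sign}(\sigma)\, x_{\sigma(1)} \cdots x_{\sigma(d)} = \det \begin{pmatrix} x_1 & \cdots & x_d \\ \vdots & & \vdots \\ x_1 & \cdots & x_d \end{pmatrix}. \qquad (4)$$
Here, for a $d \times d$ matrix $C$ of non-commuting variables, (compare [14, Definition 3.1])
$$\det C := \sum_{\tau \in S_d} \operatorname{sign}(\tau) \prod_{i=1}^{d} C_{i \tau(i)},$$
with the product taken in the order $i = 1, .., d$. This invariant is of homogeneity $d$. The following lemma tells us that we can write $\mathrm{Inv}_d$ in terms of expressions on lower homogeneities.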
To state it, we first define the operation InsertAfter(x i , r) on monomials of order n ≥ r, as the insertion of the variable x i after position r, and extend it linearly. For example where x j denotes the omission of that argument.

Remark 3.18
For completeness, we also note the related de Bruijn's formula. For d even, and the Pfaffian (with respect to the shuffle product), is For a proof see [7] and [33].
Proof The first statement follows from expressing the determinant in (4) in terms of minors with respect to the row r + 1 (since the x i are non-commuting, this does not work with columns!).
Regarding the second statement, since d is odd and then using the first statement InsertAfter(x j , r) Inv d−1 (x 1 , .., x j .., x d ) as claimed.
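The two statements used in this proof can be explored in plain Python. The sketch below rests on an assumption about their exact form, which is labeled in the code: the first is taken to be the row expansion $\mathrm{Inv}_d = (-1)^r \sum_j (-1)^{j+1}\, \mathrm{InsertAfter}(x_j, r)\, \mathrm{Inv}_{d-1}(x_1, .., \widehat{x_j}, .., x_d)$ and the second, for odd $d$, the same sum with $\mathrm{InsertAfter}(x_j, r)$ replaced by the shuffle product with $x_j$. All function names are hypothetical.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0, .., m-1 (inversion count)."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def inv(letters):
    """Inv over the given letters: sum over permutations of sign * permuted word."""
    return {tuple(letters[k] for k in perm): sign(perm)
            for perm in permutations(range(len(letters)))}

def insert_after(letter, r, phi):
    """InsertAfter(x_letter, r): insert the letter after position r in every word."""
    return {w[:r] + (letter,) + w[r:]: c for w, c in phi.items()}

def accumulate(target, phi, factor):
    for w, c in phi.items():
        target[w] = target.get(w, 0) + factor * c

def row_expansion(d, r):
    """ASSUMED form of the first statement (see lead-in)."""
    out = {}
    for j in range(1, d + 1):
        rest = [i for i in range(1, d + 1) if i != j]
        accumulate(out, insert_after(j, r, inv(rest)), (-1) ** (r + j + 1))
    return {w: c for w, c in out.items() if c != 0}

def shuffle_expansion(d):
    """ASSUMED form of the second statement; uses that shuffling a single
    letter into a word inserts it at every possible position."""
    out = {}
    for j in range(1, d + 1):
        rest = [i for i in range(1, d + 1) if i != j]
        for r in range(d):
            accumulate(out, insert_after(j, r, inv(rest)), (-1) ** (j + 1))
    return {w: c for w, c in out.items() if c != 0}

inv_3 = inv([1, 2, 3])
```

Under these assumptions `row_expansion(d, r)` reproduces `inv([1, .., d])` for every admissible `r`, and `shuffle_expansion(d)` reproduces it for odd `d`; for even `d` the shuffle expansion cancels to zero, which shows why the second statement needs `d` to be odd.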
An immediate consequence is the following lemma.

Lemma 3.19
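If the ambient dimension $d$ is odd and the curve $X$ is closed (i.e. $X_T = X_0$) then
$$\langle S(X)_{0,T}, \mathrm{Inv}_d \rangle = 0.$$
Proof By Lemma 3.17 and then by the shuffle identity (Lemma 2.1)
$$\langle S(X)_{0,T}, \mathrm{Inv}_d \rangle = \sum_{j=1}^{d} (-1)^{j+1}\, \langle S(X)_{0,T}, x_j \rangle\, \langle S(X)_{0,T}, \mathrm{Inv}_{d-1}(x_1, .., \widehat{x_j}, .., x_d) \rangle,$$
and $\langle S(X)_{0,T}, x_j \rangle = X^j_T - X^j_0$ is zero for all $j$ by assumption.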
In even dimension we have the phenomenon that closing a curve does not change the value of the invariant.
Letting the summands act on S(X) 0,t we get ±1 times Since the signature is also invariant to translation, we can assume p 0 = 0. Now both sides of the statement transform the same way under the action of GL(R d ) on the points p 1 , .., p d . It is then enough to prove this for .
Now, for this particular choice of points the right hand side is clearly equal to 1. For the left hand side, the only non-zero term is We now proceed to piecewise linear curves with more than d vertices.

Lemma 3.23
Let X be the piecewise linear curve through p 0 , .., p n ∈ R d , with n ≥ d. Then,

Remark 3.24
The number of indices is easily calculated. In the even case, we have B := d/2 "groups of two" to place, A := n − d "fillers" in between. This gives where r is the largest integer less than or equal to r.
In the odd case, we have B := (d − 1)/2 "groups of two" to place, with A := n − 1 − (d − 1) "fillers" in between. This gives Remark 3.25 Consider the case d = 2, and a curve X through the points p 0 , p 1 , .., p n ∈ R d , with p 0 = 0. Then We can express Inv d as a linear combination of the 2 × 2 minors P i,j of the 2 × n matrix (p 1 , p 2 , .., p n ). Generally, it is well-known that all invariants to GL(R 2 ) of a tuple of points are expressible in terms of these minors [47,Sect. 3.2]. So, for a piecewise linear curve through 0, p 1 , .., p n , all our integral invariants are-a fortiori-expressible in terms of them. In the simple case shown here, this expression is just a linear combination. Experimentally, for higher order invariants, polynomial combinations appear with a lot of structure. This poses the question on whether one can set up some kind of "GL invariant integration", where, instead of the classical Riemann integration that uses increments, one "integrates" using only these P i,j .  For the last equality we used that and that the increments of all curves X (i) are zero. Now by Lemma 3.20 we can omit the last straight line in every X (i) and hence by Lemma 3.21 which finishes the proof for d = 2. Now assume the statement is true for all dimensions strictly smaller than some d. We show it is true for d. d is odd As before we can assume p 0 = 0 and that p n lies on the x 1 axis. Every sequence summed over on the right-hand side of (5) is of the form i = (0, . . . , n). For each of those, we calculate Herep j ∈ R d−1 is obtained by deleting the first coordinate of p j , e 1 is the first canonical coordinate vector in R d and := (p 0 − p n ) 1 = S(X), x 1 is the total increment of X in the x 1 direction. Here we used that d is odd (otherwise we would get a prefactor −1). The last determinant is the expression for the summands of the right-hand side of (5), but with dimension d − 1 and points 0 =p 0 ,p 1 , ..,p n−1 . 
By assumption, summing up all these determinants gives whereX is the curve in R d−1 through the pointsp 0 , ..,p n−1 . Sincep n =p 0 = 0, we can attach the additional pointp n toX without changing the value here (Lemma 3.20). Hence the sum of determinants is equal to Since we arranged matters such that S(X), x i = 0 for i = 1, this is equal to where we used the shuffle identity, Lemma 2.1. By the second part of Lemma 3.17 this is equal to S(X), Inv d , which finishes the proof for odd d.

d is even
We proceed by induction on n. For n = d the statement follows from Lemma 3.21.
Let it be true for some n, we show it for a piecewise linear curve through some points p 0 , .., p n+1 . Write X = X X where X is the linear interpolation of p 0 , .., p n , X is the linear path from p n to p n+1 and we recall concatenation of paths from Lemma 2.3. By assumption, (5) is true for the curve X . Adding an additional point p n+1 , the sum on the right hand side of (5) gets additional indices of the form and where j 1 , .., j d−2 ranges over all possible increasing subsequences of 1, 2, .., n − 1 such that for odd j + 1 = j +1 .
Assume p n+1 − p n = · e 1 lies on the x 1 -axis. Then, summing over those j , HereX is the curve in R d−1 through the pointsp 0 , ..,p n , and we used the fact that the indices j here range over the ones used for (5) Here π = (0 = t π 0 , .., t π n π = T ) is a partition of the interval [0, T ] and |π| denotes its mesh size. The indices i are chosen as in Lemma 3.23. The previous theorem is almost a tautology, but there are relations to classical objects in geometry. For d = 2, as we have seen in Sect. 3.2, is equal to the signed area of the curve X. In general dimension, the value of the invariant is related to some kind of classical "volume" if the curve satisfies some kind of monotonicity. This is in particular satisfied for the "moment curve".

Lemma 3.29 Let X be the moment curve
Then for any T > 0 We deduce that (2) ..
In [24,Sect. 15], the value of this volume is determined, for T = 1, as We hence get the combinatorial identity (2) .. .
Proof For n ≥ d let 0 = t 0 < .. < t n ≤ T be time-points, let p i := X t i be the corresponding points on the moment curve and denote by X n the piecewise linear curve through those points. We will show First note that for any 1 since it is a Vandermonde determinant. We will decompose P := {p 0 , .., p n } into (overlapping) sets S with cardinality d + 1 and such that 10 Convex-Hull(p 0 , .., p n ) =

Convex-Hull(S ) .
A face of P is a subset F ⊂ P such that its convex hull Convex-Hull(F ) equals the intersection of Convex-Hull(P ) with some affine hyperspace. A face is a facet, if its affine span has dimension d − 1. The following is a fact that is true for any polytope spanned by some points P : up to a set of measure zero, for every point x in Convex-Hull(P ), the line connecting p 0 to x exits Convex-Hull(p 0 , .., p n ) through a unique facet of Convex-Hull(p 0 , .., p n ) contained in {p 1 , .., p n }. Hence Convex-Hull(p 0 , .., p n ) = We are looking for such {i j } such that i 1 ≥ 1. Those are exactly the indices with Together with i 0 := 0 these form the indices of Lemma 3.23. Convex-Hull(p i 0 , .., The determinant is in fact positive here, by (6). We can hence omit the modulus and get by Lemma 3.23. The statement of the lemma now follows by piecewise linear approximation of X using continuity of the convex hull, which follows from [11,Lemma 3.2], and of iterated integrals [16, Proposition 1.28, Proposition 2.7].

Rotations
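Let $SO(\mathbb{R}^d) = \{A \in \mathbb{R}^{d \times d} : A^\top A = \mathrm{id},\ \det(A) = 1\}$ be the group of rotations of $\mathbb{R}^d$.
We call $\varphi \in T(\mathbb{R}^d)$ an SO invariant if
$$\langle S(AX)_{0,T}, \varphi \rangle = \langle S(X)_{0,T}, \varphi \rangle$$
for all $A \in SO(\mathbb{R}^d)$ and all curves $X$. Alternatively, as explained in Sect. 3, $A^\top \varphi = \varphi$ for all $A \in SO(\mathbb{R}^d)$, where the action on $T(\mathbb{R}^d)$ was given in Definition 3.2.
Since $\det(A) = 1$, any GL invariant of weight $w \ge 1$ (Sect. 3) is automatically an SO invariant. But there are SO invariants that are not GL invariants (of any weight), for example,
$$x_1 x_1 + x_2 x_2 + .. + x_d x_d.$$
Switching to the perspective of multilinear maps, this is the map $(v_1, v_2) \mapsto \langle v_1, v_2 \rangle$. It is shown, see for example [50, Theorem 2.9.A], that all invariants are built from the inner product and the determinant.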
Recently, a linear basis for these invariants has been constructed. To formulate the result, we need to introduce some notation from [28]. Define Use the following partial order on these sequences: for a ∈ I (r, n), a ∈ I (r , n) a ≥ a if r ≤ r and a j ≥ a j for j ≤ r.

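Theorem 4.5 Define
$$z_1 := x_1 + i\, x_2, \qquad z_2 := x_1 - i\, x_2.$$
The space of SO invariants on level $n$ in $T(\mathbb{C}^2)$ is spanned freely by $z_{\mathbf{j}} := z_{j_1} \cdot .. \cdot z_{j_n}$ with $\#\{r : j_r = 1\} = \#\{r : j_r = 2\}$.

The space of SO invariants on level $n$ in $T(\mathbb{R}^2)$ is spanned freely by
$\mathrm{Re}[z_{\mathbf{j}}]$, $\mathrm{Im}[z_{\mathbf{j}}]$ with $\#\{r : j_r = 1\} = \#\{r : j_r = 2\}$ and $j_1 = 1$.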

Remark 4.6
In particular for $d = 2$ and $n$ even, the dimension of rotation invariants on level $n$ in $T(\mathbb{R}^2)$ is equal to $\binom{n}{n/2}$.
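The planar rotation invariants can be generated and tested with a short plain Python sketch. It assumes the convention $z_1 = x_1 + i x_2$, $z_2 = x_1 - i x_2$ and uses the hypothetical helpers `z_polynomial` and `signature`; the sample curve and the rotation angle are arbitrary.

```python
from itertools import product
from math import comb, cos, factorial, prod, sin

def z_polynomial(js):
    """Expand z_{j_1} .. z_{j_n} into {word: complex coefficient},
    ASSUMING z_1 = x_1 + i x_2 and z_2 = x_1 - i x_2."""
    phi = {(): 1 + 0j}
    for j in js:
        factor = {(1,): 1 + 0j, (2,): 1j if j == 1 else -1j}
        phi = {u + v: a * b for u, a in phi.items() for v, b in factor.items()}
    return phi

def signature(points, depth):
    """Truncated signature of the piecewise linear curve through `points`."""
    d = len(points[0])
    words = [w for n in range(depth + 1)
             for w in product(range(1, d + 1), repeat=n)]
    sig = {w: float(w == ()) for w in words}
    for p, q in zip(points, points[1:]):
        delta = [b - a for a, b in zip(p, q)]
        seg = {w: prod(delta[i - 1] for i in w) / factorial(len(w)) for w in words}
        sig = {w: sum(sig[w[:k]] * seg[w[k:]] for k in range(len(w) + 1))
               for w in words}
    return sig

n, theta = 4, 0.7
curve = [(0.0, 0.0), (1.0, 0.5), (0.3, 2.0), (-1.0, 1.2)]
rotated = [(cos(theta) * x - sin(theta) * y, sin(theta) * x + cos(theta) * y)
           for x, y in curve]
S, S_rot = signature(curve, n), signature(rotated, n)

# Index sequences with as many 1s as 2s.
balanced = [js for js in product((1, 2), repeat=n) if js.count(1) == js.count(2)]

# Largest change of <S, z_j> under the rotation; real and imaginary parts
# are covered at once by comparing complex numbers.
worst = max(abs(sum(c * S_rot[w] for w, c in z_polynomial(js).items())
                - sum(c * S[w] for w, c in z_polynomial(js).items()))
            for js in balanced)
```

There are `comb(4, 2) = 6` balanced sequences on level 4, and `worst` is zero up to rounding. On level 2 the real part of $z_1 z_2$ is $x_1 x_1 + x_2 x_2$.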

The elements z form a basis
Now x j 1 ..x jn : j ∈ {1, 2} is a basis of π n T (C 2 ) with respect to C. Hence z j 1 ..z jn is (the map (x 1 , x 2 ) → (z 1 , z 2 ) is invertible). By Step 1 we have hence exhibited a basis (with respect to C) for all invariants in π n T (C 2 ).

Real invariants
The space of SO invariants on level n in T (C 2 ) is spanned freely by the set of z j 1 · .. · z jn with #{r : j r = 1} = #{r : j r = 2}.
Because z 3−j 1 · .. · z 3−jn is the complex conjugate of z j 1 · .. · z jn , this means that the space of SO invariants on level n in T (C 2 ) is spanned freely by the set of Re(z j 1 · .. · z jn ) and Im(z j 1 · .. · z jn ) with #{r : j r = 1} = #{r : j r = 2} and j 1 = 1. This is an expression for a basis of the SO invariants in terms of real combinations of basis elements of the tensor space. They thus form a basis for the SO invariants for the free real vector space on the same set, namely π n T (R 2 ). Regarding the last point, note the following sequence of equivalences.
This proves the claim.
We are not aware of a general explicit formula for the number of partitions (i.e. the coefficients of the generating function).

An Additional (Time) Coordinate
Assume now that X = (X 0 , X 1 , .., X d ) : [0, T ] → R 1+d . Here X 0 plays a special role, in that we assume that it is not affected by the space transformations under consideration.
Adding an "artificial" 0-th component, usually keeping track of time, X 0 t := t , is a common trick to improve the expressiveness of the signature. In particular, if such an X 0 is monotonically increasing, the enlarged curve (X 0 , X 1 , .., X d ) never has any "tree-like" components (compare Sect. 7), no matter what the original (X 1 , .., X d ) was.
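Consider GL invariants for the moment. Let
$$GL_0(\mathbb{R}^d) := \left\{ \begin{pmatrix} 1 & 0 \\ 0 & A \end{pmatrix} : A \in GL(\mathbb{R}^d) \right\}$$
be the space of invertible maps of $\mathbb{R}^{1+d}$ leaving the first direction unchanged. We call $\phi \in T(\mathbb{R}^{1+d})$ a $GL_0$ invariant of weight $w$ if
$$\langle S(AX)_{0,T}, \phi \rangle = (\det A)^w \, \langle S(X)_{0,T}, \phi \rangle$$
for all $A \in GL_0(\mathbb{R}^d)$.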
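Consider the $GL(\mathbb{R}^2)$ invariant of weight 1
$$x_1 x_2 - x_2 x_1.$$
Since elements of $GL_0(\mathbb{R}^2)$ leave the variable $x_0$ unchanged, a straightforward way to produce $GL_0$ invariants presents itself: insert $x_0$ at the same position in every monomial. For example
$$x_1 x_0 x_2 - x_2 x_0 x_1$$
is a $GL_0(\mathbb{R}^2)$ invariant of weight 1. We now formalize this idea and show that we get every $GL_0$ invariant this way.

Define the linear map Remove of "removing instances of $x_0$" on monomials, as
$$\mathrm{Remove}(x_{i_1} \cdots x_{i_m}) := \prod_{r \,:\, i_r \neq 0} x_{i_r},$$
where the remaining letters keep their order. Define the linear map of restriction to $U \subset [m]$ on polynomials of order $m$ by defining on monomials
$$x_{i_1} \cdots x_{i_m}\big|_U := \begin{cases} x_{i_1} \cdots x_{i_m} & \text{if } \{r : i_r = 0\} = U, \\ 0 & \text{otherwise,} \end{cases}$$
so for example
$$(x_0 x_1 + x_1 x_0)\big|_{\{1\}} = x_0 x_1.$$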
For $z = (z_1, .., z_{m+1}) \in \mathbb{N}^{m+1}$ denote by $\mathrm{Insert}_z$ the linear operator on polynomials of order $m$ defined on monomials as follows. For a monomial $x_{i_1} \cdots x_{i_m}$ of order $m$, $\mathrm{Insert}_z$ inserts $z_1$ occurrences of $x_0$ before $x_{i_1}$, $z_2$ occurrences of $x_0$ before $x_{i_2}$, .., $z_m$ occurrences of $x_0$ before $x_{i_m}$ and $z_{m+1}$ occurrences of $x_0$ after $x_{i_m}$. For example
$$\mathrm{Insert}_{(2,1,4)} \, x_1 x_2 = x_0 x_0 \, x_1 \, x_0 \, x_2 \, x_0 x_0 x_0 x_0.$$
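The three operations on noncommutative polynomials can be sketched in a few lines. Here a monomial is represented as a tuple of letter indices and a polynomial as a dictionary from monomials to coefficients; this representation, the function names, and the 1-based position convention for $U$ are assumptions made for illustration.

```python
def remove(word):
    """Remove: delete every occurrence of the letter x_0 from a monomial."""
    return tuple(i for i in word if i != 0)


def insert(z, word):
    """Insert_z: put z[r] copies of x_0 before the r-th letter and z[-1] copies at the end."""
    if len(z) != len(word) + 1:
        raise ValueError("z needs exactly one more entry than the monomial has letters")
    out = []
    for count, letter in zip(z, word):
        out.extend([0] * count)
        out.append(letter)
    out.extend([0] * z[-1])
    return tuple(out)


def restrict(poly, positions):
    """Restriction to U: keep monomials whose x_0 sits exactly at the (1-based) positions in U."""
    positions = set(positions)
    return {w: c for w, c in poly.items()
            if {r + 1 for r, i in enumerate(w) if i == 0} == positions}


def apply_linear(op, poly):
    """Extend a map on monomials linearly to polynomials."""
    out = {}
    for word, coeff in poly.items():
        image = op(word)
        out[image] = out.get(image, 0) + coeff
    return out


area = {(1, 2): 1, (2, 1): -1}  # x1 x2 - x2 x1
lifted = apply_linear(lambda w: insert((0, 1, 0), w), area)
print(lifted)                        # {(1, 0, 2): 1, (2, 0, 1): -1}
print(insert((2, 1, 4), (1, 2)))     # (0, 0, 1, 0, 2, 0, 0, 0, 0)
```

Applying `remove` to any output of `insert` recovers the original monomial, which is the relation between the two maps used in the proof of Theorem 6.2.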

Theorem 6.2 A basis for the space of $GL_0$ invariants of weight $w$, homogeneous of degree $m$, is given by the polynomials
$$\mathrm{Insert}_z \, \psi$$
with $0 \le n \le m$, $\psi$ ranging over the basis for GL invariants of weight $w$ and homogeneity $n$ (Proposition 3.11) and $z \in \mathbb{N}^{n+1}$ such that $|z| := z_1 + .. + z_{n+1} = m - n$.
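Proof Let $n$, $\psi$, $z$ be as in the statement. Then, for $A \in GL_0(\mathbb{R}^d)$, the action of $A$ fixes $x_0$ and mixes only $x_1, .., x_d$ among themselves, so it commutes with $\mathrm{Insert}_z$. Therefore $\mathrm{Insert}_z \psi$ is $GL_0$ invariant of weight $w$.
On the other hand, let $\phi$ of order $m$ be a $GL_0$ invariant of weight $w$. Define for $U \subset [m]$
$$\phi_U := \phi\big|_U,$$
which collects all monomials having $x_0$ exactly at the positions in $U$. Then
$$\phi = \sum_{U \subset [m]} \phi_U.$$
Now, since $\phi$ is $GL_0$ invariant of weight $w$ and since $GL_0$ leaves $x_0$ invariant, we get that each $\phi_U$ is $GL_0$ invariant of weight $w$. Clearly, there are $0 \le n \le m$ and $z \in \mathbb{N}^{n+1}$ such that
$$\phi_U = \mathrm{Insert}_z \, \mathrm{Remove} \, \phi_U,$$
where $\mathrm{Remove}\, \phi_U$ is a GL invariant of weight $w$ and homogeneity $n$. Hence every invariant is in the span of the set given in the statement. They are linearly independent, and hence form a basis.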
The corresponding statements for rotations and permutations are completely analogous, so we omit them.
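The construction of Theorem 6.2 can be sanity-checked numerically for $d = 2$. The sketch below computes the signature of a piecewise-linear curve in $\mathbb{R}^{1+2}$ via Chen's identity and checks that $x_1 x_0 x_2 - x_2 x_0 x_1 = \mathrm{Insert}_{(0,1,0)}(x_1 x_2 - x_2 x_1)$ scales by the determinant under a block-diagonal map. The function names, the random test data, and the reading of $GL_0$ as block-diagonal matrices acting on the curve are assumptions made for illustration.

```python
import numpy as np


def segment_signature(v, depth):
    """Signature of a straight segment with increment v: the truncated tensor exponential."""
    levels = [np.array(1.0)]
    for k in range(1, depth + 1):
        levels.append(np.multiply.outer(levels[-1], v) / k)
    return levels


def chen_product(a, b):
    """Truncated tensor product; by Chen's identity it gives the signature of a concatenation."""
    return [sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
            for k in range(len(a))]


def signature(path, depth):
    """Signature levels 0..depth of the piecewise-linear curve through `path` (shape (N, dim))."""
    sig = segment_signature(np.zeros(path.shape[1]), depth)  # unit element
    for v in np.diff(path, axis=0):
        sig = chen_product(sig, segment_signature(v, depth))
    return sig


def phi(sig):
    """Pair the signature with x1 x0 x2 - x2 x0 x1."""
    return sig[3][1, 0, 2] - sig[3][2, 0, 1]


rng = np.random.default_rng(1)
path = rng.normal(size=(15, 3))   # columns: X^0, X^1, X^2
inner = rng.normal(size=(2, 2))   # acts on (X^1, X^2) only
A = np.eye(3)
A[1:, 1:] = inner                 # block-diagonal element of GL_0(R^2)

lhs = phi(signature(path @ A.T, 3))
rhs = np.linalg.det(inner) * phi(signature(path, 3))
print(lhs, rhs)                   # agree up to floating-point error
```

The check does not require the 0-th coordinate to be monotone; the invariance holds for any curve, and monotonicity only matters for excluding tree-like pieces.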

Discussion and Open Problems
We have presented a novel way to extract invariant features of d-dimensional curves, based on the iterated-integral signature. We have identified all those features that can be written as a finite linear combination of terms in the signature.

Fig. 2 The lemniscate of Gerono. Traversing it once from each of the two starting points indicated gives two distinct closed curves with distinct iterated-integral signatures, but which cannot be distinguished with the "signature" of [13]

These curves both trace a figure called the lemniscate of Gerono, which is illustrated in Fig. 2. Both these expressions are free from the symbols ± and ∓. Therefore these two curves have the same values on terms of the form (8).
Moreover, the algorithmic nature of the construction in [13] makes it difficult to proceed to invariants of higher order. In contrast, our method gives an explicit linear basis for the invariants under consideration up to any order.
Regarding the question of whether our invariants are complete, we propose the following conjecture. As shown in [22], if $S(X)_{0,T} = S(Y)_{0,T}$ for some curves $X$ and $Y$, then $X$ is "tree-like equivalent" to $Y$. For the concrete definition of this equivalence we refer to their paper, but let us give one example. Consider in $d = 2$ the constant path $X_t := (0, 0)$, $t \in [0, T]$, and the piecewise linear path $Y$ between the points $(0, 0)$, $(1, 0)$ and $(0, 0)$. One can check that $S(X)_{0,T} = S(Y)_{0,T} = 1$.
The signature has no chance of picking up this kind of "excursion" in a path; this concept is formalized in "tree-like equivalence". We suspect that the following holds true (with corresponding formulations for the other subgroups of $GL(\mathbb{R}^d)$).

In Proposition 3.11, Proposition 4.4 and Proposition 5.4 we have established a linear basis for invariants for every homogeneity. As already mentioned in Remark 3.15, owing to the shuffle identity, there are algebraic relations between elements of different homogeneity. An interesting open problem is then to find a minimal set of generators for the set of invariants, considered as a subalgebra of the shuffle algebra. This applies to all subgroups of $GL(\mathbb{R}^d)$ and their corresponding invariants.
Lastly, a word on (computational) complexity. We have seen in Remark 3.10 the dimensions of GL invariant elements (which is a lower bound on the dimensions of SO invariant elements). In Remark 5.5 we have seen the dimensions for the permutation invariant elements.
Computing the signature itself up to level $n$ has complexity $\Omega(d^n)$, since $d + .. + d^n$ integrals need to be calculated. So any method that calculates the invariant features of a curve $X$ by first calculating its signature and extracting them (see Remark 3.14) will have computational complexity dominated by the calculation of the signature. Furthermore, the calculation of the invariant elements is a computation that can be done offline (they do not depend on the curve $X$).
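To give a feeling for this growth, the following small sketch counts the iterated integrals on levels 1 to $n$; the function name is hypothetical.

```python
def signature_size(d, n):
    """Number of iterated integrals on levels 1..n of a d-dimensional curve: d + d^2 + .. + d^n."""
    return sum(d ** k for k in range(1, n + 1))


for d in (2, 3, 6):
    print(d, [signature_size(d, n) for n in range(1, 7)])
```

For $d > 1$ the sum equals $(d^{n+1} - d)/(d - 1)$, so the top level alone accounts for most of the terms.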
This leaves several directions of future research.
• Is it possible to apply kernelization techniques similar to the ones used for the entire (non-invariant) signature in [25]? These techniques, in the non-invariant setting, allow one to use information of the signature up to high levels and dimension for certain learning algorithms.
• We have studied in this paper linear expressions on the signature that are invariant to a group action. This was justified by using the shuffle identity (Lemma 2.1), which tells us that any polynomial functional on the signature can in fact be linearized. One can also consider a fixed level $n$ of the signature and look for all nonlinear expressions that are invariant under the group action. This is the classical problem of invariant theory for polynomial rings [47, Sect. 4]. On the one hand, this makes it possible to "peek ahead" in the signature, since one gets invariant information that would only be seen in linear expressions of higher levels than $n$. On the other hand, except for special cases, there are no explicit expressions for these nonlinear invariants. One has to proceed algorithmically (for example via Derksen's algorithm [8]), which only works for low dimension $d$ and low levels $n$. Since the calculation of those nonlinear invariant elements can also be done offline, it would nonetheless be nice to have a tabulation of nonlinear invariants (as far as existing algorithms can reach).
• For GL invariants, in Remark 3.25 we conjecture the existence of a "GL invariant" signature. This could improve computation time, since no non-invariant integrals have to be computed.