1 Motivation

Given a discrete time series

$$\begin{aligned} x = (x_{0}, x_{1}, \ldots , x_{N}) \in ({\mathbb{R}}^{d})^{N+1}, \end{aligned}$$

where \(N \ge 1\) is an arbitrary time horizon, our foremost, and original, motivation stems from the desire to extract features from \(x\) that are invariant to time warping.

The precise definition of the latter will be given in Section 4, but Figure 1 illustrates what we mean by time warping: the time series is allowed to “stand still” or to “stutter” (this term is used in [47]), which means that \(x\) has repetitions of values at consecutive time steps (here at time \(t=3\)).

Fig. 1: Example of time warping in the case of a discrete time series in \(d=1\) dimensions.

Remark 1.1

In this section we consider the notationally simpler case \(d=1\), that is, when \(x \in {\mathbb{R}}^{N+1}\).

Our interest is prompted, on the one hand, by the extensive literature on the dynamic time warping (DTW) distance [5], a distance on discrete time series that is invariant to time warping. In [47] it is stated that “the time warping distance …does not lead to any natural features”. Our work aims to provide those missing “natural” features.

On the other hand, the following example illustrates where such invariant features will become useful.

Example 1.2

Assume that there is a deterministic time series \(x \in {\mathbb{R}}^{N}\) which models some “prototype” evolution of a quantity, say the prototype heartbeat in a patient’s ECG. This prototype is unknown, but one records many samples of it, run at different speeds and contaminated by noise (compare [6]). A model for these observations is then

$$\begin{aligned} y^{(\ell )}_{n} = x_{h^{(\ell )}(n)} + w^{(\ell )}_{n}, \quad n = 1, \dots , M,\, \ell = 1, \dots , L. \end{aligned}$$

Here \(L\) is the number of observations, \(M \ge N\) is the time horizon we allow the prototype to be “spread out” over, \(h^{(\ell )}: [1,\ldots ,M] \to [1,\ldots ,N]\) are unknown non-decreasing, surjective time changes and \(w^{(\ell )}_{n}\) are independent and identically distributed (iid) random walks. The goal is to recover \(x\) (up to time warping).

The currently used method [6, 31, 33] consists in first trying to align the different samples, i.e., to estimate the time changes \(h^{(\ell )}\), and to average afterwards. This seems to work well in regimes where the noise \(w^{(\ell )}\) is small (large signal-to-noise ratio), but breaks down if this is not the case.

Guided by invariant methods in cryo-EM [4] we then propose the following procedure.

(1) Calculate features of \(y^{(\ell )}\) that do not see time warpings.

(2) Average those features over the independent samples, giving the law of large numbers a chance to cancel out the noise and obtaining an approximation of the features of \(x\).

(3) Invert the averaged features to arrive at a candidate for \(x\).

Our approach to Step (1) is new and will be presented in this paper. Step (2) and Step (3) will be addressed in future work.

A moment’s thought reveals that iterated-sums of the increments of \(x\) are invariant in the desired sense. For example, the simple sum \(\sum _{i} (x_{i} - x_{i-1})\) or the more complex expressions

$$ \sum _{i} (x_{i} - x_{i-1})^{2}, \quad \sum _{i_{1} < i_{2}} (x_{i_{1}} - x_{i_{1}-1}) (x_{i_{2}} - x_{i_{2}-1}), \quad \sum _{i_{1} \le i_{2}} (x_{i_{1}} - x_{i_{1}-1}) (x_{i_{2}} - x_{i_{2}-1}), $$
(1)

are features of the time series that do not change when warping time, i.e., when repetitions of points \(x_{i}=x_{i+1}=\cdots =x_{i+j}\) occur in \(x\).

Remark 1.3

To accommodate repetitions of points, we have conveniently written the sums over an unspecified set of time points. We can think of the sums as taken over \({\mathbb{N}_{+}}\), with \(x\) extended constantly as \(x_{N}\) after time \(N\).
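To make the invariance tangible, here is a minimal Python sketch (our own illustration, not taken from the paper) that evaluates the three sums in (1) on a short integer series and on a warped copy of it; it also confirms the linear relation between them that is raised in Question (A) below.

```python
from itertools import combinations

def increments(x):
    """Increments Δx_i = x_i - x_{i-1} of a one-dimensional series."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]

def warp(x, n):
    """Let the series 'stutter' once at time n, as in Figure 1."""
    return x[: n + 1] + x[n:]

def features(x):
    """The three iterated sums displayed in (1)."""
    d = increments(x)
    s1 = sum(di ** 2 for di in d)
    s2 = sum(d[i] * d[j] for i, j in combinations(range(len(d)), 2))
    s3 = sum(d[i] * d[j] for i in range(len(d)) for j in range(i, len(d)))
    return s1, s2, s3

x1d = [1, 2, 4, 7, 11]
assert features(x1d) == features(warp(x1d, 2))  # invariance to one stutter
s1, s2, s3 = features(x1d)
assert s1 + s2 == s3  # the linear dependence discussed in Question (A)
```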

However, two questions immediately emerge:

(A) The three expressions in (1) are already linearly dependent (adding the first and second sum gives the third). How can one store only linearly independent expressions?

(B) Do iterated sums of increments give all (polynomial) time warping invariants?

Regarding the first item, it turns out that the above iterated-sums expressions are reminiscent of quasisymmetric functions [39]. Consider the space \({\mathbb{R}}\langle Y_{1},Y_{2},Y_{3},\dots \rangle \) of formal power series in ordered commuting variables \(Y_{1}, Y_{2}, Y_{3}, \dots \). By definition, a power series (of finite degree) \(Q \in {\mathbb{R}}\langle Y_{1},Y_{2},Y_{3},\dots \rangle \) is a quasisymmetric function if for all \(n \ge 1\), all \(i_{1} < \cdots < i_{n}\), all \(j_{1} < \cdots < j_{n}\) and all \(\alpha _{1}, \dots , \alpha _{n} \ge 1\), the coefficient of the monomial \((Y_{i_{1}})^{\alpha _{1}} \cdots (Y_{i_{n}})^{\alpha _{n}}\) in \(Q\) is equal to the one of \((Y_{j_{1}})^{\alpha _{1}} \cdots (Y_{j_{n}})^{\alpha _{n}}\). First examples are

$$\begin{aligned} \sum _{i} Y_{i}, \qquad \sum _{i} (Y_{i})^{2}, \quad \sum _{i_{1} < i_{2}} Y_{i_{1}} Y_{i_{2}}, \quad \sum _{i_{1} \le i_{2}} Y_{i_{1}} Y_{i_{2}}, \end{aligned}$$

and we see that the invariants given above follow from the evaluation of these quasisymmetric functions at \(Y_{1} \mapsto x_{1} - x_{0}\), \(Y_{2} \mapsto x_{2} - x_{1}\), \(\dots , Y_{N} \mapsto x_{N} - x_{N-1}\), and \(Y_{i} \mapsto 0\) for \(i \ge N+1\).

Different linear bases for quasisymmetric functions are known. The basis of monomial quasisymmetric functions of [39] is indexed by compositions of integers. Anticipating the multidimensional case, we write a composition \(c_{1} + \cdots + c_{k} = n\) as \([\mathtt{1}^{c_{1}}] \cdots [\mathtt{1}^{c_{k}}]\), and obtain the correspondence

$$\begin{aligned}{} [\mathtt{1}^{c_{1}}] \cdots [\mathtt{1}^{c_{k}}] \longleftrightarrow M_{(c_{1}, \ldots , c_{k})}:= \sum _{i_{1} < \cdots < i_{k}} (Y_{i_{1}})^{c_{1}} \cdots (Y_{i_{k}})^{c_{k}}. \end{aligned}$$

Quasisymmetric functions are a refinement of symmetric functions and form a commutative unital algebra. The product is just the polynomial product in the power series representation. It amounts to a so-called quasi-shuffle product (see Section 2) in the representation as compositions. For example, the abstract quasi-shuffle product

$$\begin{aligned}{} [\mathtt{1}] * [\mathtt{1}^{3}][\mathtt{1}^{7}] &= [\mathtt{1}] [ \mathtt{1}^{3}][\mathtt{1}^{7}] + [\mathtt{1}^{3}][\mathtt{1}][ \mathtt{1}^{7}] + [\mathtt{1}^{3}][\mathtt{1}^{7}][\mathtt{1}] + [ \mathtt{1}^{4}][\mathtt{1}^{7}] + [\mathtt{1}^{3}][\mathtt{1}^{8}] \end{aligned}$$

corresponds to the concrete product of power series

$$\begin{aligned} &\left (\sum _{i} Y_{i}\right ) \cdot \left (\sum _{i_{1}< i_{2}} (Y_{i_{1}})^{3} (Y_{i_{2}})^{7} \right ) \\ &\quad = \sum _{i_{1} < i_{2} < i_{3}} Y_{i_{1}} (Y_{i_{2}})^{3} (Y_{i_{3}})^{7} + \sum _{i_{1} < i_{2} < i_{3}} (Y_{i_{1}})^{3} Y_{i_{2}} (Y_{i_{3}})^{7} \\ & \qquad{} + \sum _{i_{1} < i_{2} < i_{3}} (Y_{i_{1}})^{3} (Y_{i_{2}})^{7}Y_{i_{3}} + \sum _{i_{1} < i_{2}} (Y_{i_{1}})^{4} (Y_{i_{2}})^{7} + \sum _{i_{1} < i_{2}} (Y_{i_{1}})^{3} (Y_{i_{2}})^{8}. \end{aligned}$$

The latter equality follows by case distinction for sums over the three indexing variables, which amounts to a summation-by-parts formula. The last two terms in the above product reflect the fact that multiplying sums requires the inclusion of sums over diagonal terms.

It is natural to store the iterated-sums invariants of the discrete time series \(x\) as a linear map \(\operatorname{ISS}(x)\) on the quasi-shuffle algebra of compositions, by defining the pairing

$$\begin{aligned} \langle [\mathtt{1}^{c_{1}}] \cdots [\mathtt{1}^{c_{k}}], \operatorname{ISS}(x) \rangle := \sum _{i_{1} < \cdots < i_{k}} ( \Delta x_{i_{1}})^{c_{1}} \cdots (\Delta x_{i_{k}})^{c_{k}}. \end{aligned}$$

Here \(\Delta x_{i} := x_{i} - x_{i-1}\) for \(1 \le i \le N\), and as above we extend \(x\) constantly, so that \(\Delta x_{i} := 0\) for \(i \ge N+1\). From the correspondence between the product of power series and the quasi-shuffle product of compositions mentioned above we deduce that

$$\begin{aligned} \langle [\mathtt{1}^{p_{1}}] \cdots [\mathtt{1}^{p_{k}}], \operatorname{ISS}(x) \rangle \cdot \langle [\mathtt{1}^{q_{1}}] \cdots [\mathtt{1}^{q_{l}}], \operatorname{ISS}(x) \rangle &= \langle [\mathtt{1}^{p_{1}}] \cdots [\mathtt{1}^{p_{k}}] * [ \mathtt{1}^{q_{1}}] \cdots [\mathtt{1}^{q_{l}}], \operatorname{ISS}(x) \rangle . \end{aligned}$$

Hence, \(\operatorname{ISS}(x)\), which we call iterated-sums signature, is an algebra morphism (from the quasi-shuffle algebra to the underlying base field \(\mathbb{F}\)). Since compositions form a linear basis, this answers Question (A) above – in the case \(d=1\). We will come back to Question (B) in Section 4.

The commutative algebra of quasisymmetric functions is the free quasi-shuffle algebra over one generator and it is, as we just saw, the correct framework for storing iterated sums of a one-dimensional time series. The appropriate generalisation of this algebra to arbitrary dimension \(d \ge 1\), that is, the free quasi-shuffle algebra over \(d\) generators, was carried out by Hoffman [27].

The aforementioned amounts to saying that the iterated-sums signature \(\operatorname{ISS}(x)\) is an element of the dual space of the quasi-shuffle algebra over \(d\) generators. It can therefore be represented as an infinite word series with iterated sums of the time series \(x\) as coefficients. Its compatibility with the quasi-shuffle product, together with the fact that the latter can be seen as a deformation of the classical shuffle product [18], suggests considering \(\operatorname{ISS}(x)\) as a discrete analog of Chen’s iterated-integrals signature over continuous curves [10, 44]. The latter plays an important role in the theory of controlled ordinary differential equations (ODEs), stochastic analysis and Lyons’ theory of rough paths [19, 37]. Such a large spectrum of applications reflects the important property of iterated integrals of providing, in some sense, a complete representation of a curve, so that arbitrary functionals on curves should be well approximated by functions on the signature.

There is a caveat though. Iterated integrals are tailor-made to approximate functionals that stem from controlled ODEs. But, as is quickly realised, this does not mean that the iterated-integrals signature is an optimal representation for other input-output systems. For example, since a controlled ODE – and hence also the signature – cannot see tree-like excursions, the iterated-integrals signature of a one-dimensional path reveals nothing about the path except its increment. There are several procedures to circumvent this shortcoming, and to obtain information even about tree-like parts of a curve using the signature. These procedures usually consist of lifting the path to a higher-dimensional curve and calculating the signature of the lift. The aforementioned limitations of the iterated-integrals signature with respect to tree-like paths prompt us to propose instead the use of the “discrete-time signature” \(\operatorname{ISS}(x)\), which, instead of storing iterated integrals, gathers iterated sums.

Remark 1.4

For the precise definition of “tree-like” see [24]; but one can think of a curve that completely “tracks back”. In particular, in dimension 1 every curve with coinciding start and end points is tree-like.

The paper is organised as follows. Section 2 recalls the notion of quasi-shuffle Hopf algebra and quasisymmetric functions. In Section 3 we introduce the iterated-sums signature and show its character property with respect to the quasi-shuffle Hopf algebra. Moreover, we show that Chen’s property is satisfied, but that Chow’s Theorem does not hold. Hence, while mirroring the setup of Chen’s iterated-integrals signature to some extent, interesting differences emerge. It turns out that our description of the iterated-sums signature is nicely related to the work [42] on a “multidimensional” generalisation of quasisymmetric functions, and we dwell on this briefly in Remark 3.5. In Section 4 we show that the iterated-sums signature contains (almost) all time warping invariants. In Section 5 we use a specific Hopf algebra isomorphism, known as Hoffman’s exponential, to relate the iterated-sums signature to Chen’s iterated-integrals signature (of an infinite-dimensional path). This includes in particular relating the continuous and discrete area operations.

In the following all algebraic structures are defined over a base field \(\mathbb{F}\) of characteristic zero. The reader is invited to think of the field \(\mathbb{F}\) as the reals, \(\mathbb{F}={\mathbb{R}}\), or the complex numbers, \(\mathbb{F}=\mathbb{C}\), throughout.

We denote \({\mathbb{N}}:=\{0,1,2,\dots \}\) and \({\mathbb{N}_{+}}:=\{1,2,\dots \}\). All (co)algebras are (co)unital and (co)associative unless otherwise stated. For details on Hopf algebras the reader is referred to [9, 26, 36, 40, 46].

2 Quasi-Shuffle Hopf Algebra

The notion of quasi-shuffle product appeared first in a 1972 article by Cartier [8]. Its Hopf algebraic relevance was explored in the 1979 paper [41]. Two decades later, Hoffman [27] provided a comprehensive account of the quasi-shuffle product in a Hopf algebraic framework. Meanwhile, quasi-shuffle products appeared under different names, e.g., modified shuffle product [20, 34], sticky-shuffle [29, 30], overlapping shuffle [25], stuffle and harmonic product [48].

We recall the inductive definition of the quasi-shuffle product following Hoffman [27]. See also [7, 16]. Our starting point is the alphabet \(A=\{\mathtt{1},\mathtt{2},\ldots ,\mathtt{d}\}\), which we augment to a free commutative semigroup, \(\mathfrak{A}\), by defining a commutative product denoted by square brackets, \([- -]\colon \mathfrak{A} \times \mathfrak{A} \to \mathfrak{A}\). For example, the product between the letters \(\mathtt{1},\mathtt{2}\in A\) is written \([\mathtt{1} \mathtt{2}]=[\mathtt{2} \mathtt{1}]\). Any iteration of the product in \(\mathfrak{A}\) can be simplified to an expression containing a single pair of brackets, that is, \([\mathtt{i_{1}} \cdots \mathtt{i_{n}}] :=[\mathtt{i_{1}} [ \cdots [\mathtt{i_{n-1}} \mathtt{i_{n}}]]\cdots ]\). For instance, \([\mathtt{1}\mathtt{2}\mathtt{3}]=[\mathtt{1}[\mathtt{2} \mathtt{3}]]\) in \(\mathfrak{A}\). Elements in the tensor algebra \(T(\mathfrak{A})\) over (the vector space spanned by) \(\mathfrak{A}\) are denoted by words, i.e., we denote the tensor product by concatenation, or juxtaposition of basis elements. The neutral element for this product is the empty word, denoted by \(e\). The augmentation ideal is defined by \(T_{+}(\mathfrak{A}):=\oplus _{n>0} {\mathfrak{A}}^{\otimes n}\) such that \(T(\mathfrak{A})= \mathbb{F}e \oplus T_{+}(\mathfrak{A})\).

The commutative quasi-shuffle product \(m_{\star }\colon T(\mathfrak{A})\otimes T(\mathfrak{A})\to T( \mathfrak{A})\), \(u \star v:=m_{\star }(u,v)\), is introduced by inductively defining \(e\star u:=u =:u\star e\), for all \(u\in T(\mathfrak{A})\), and

$$ ua \star vb:=(u\star vb)a+(ua\star v)b+(u\star v)[ab], $$
(2)

for \(u,v\in T(\mathfrak{A})\) and \(a,b \in \mathfrak{A}\). For example, \(\mathtt{2} \star \mathtt{3}=\mathtt{2}\mathtt{3} +\mathtt{3} \mathtt{2}+[\mathtt{2} \mathtt{3}]\) and

$$ \mathtt{3} \star \mathtt{4} [\mathtt{12}] = \mathtt{34}[ \mathtt{12}] + \mathtt{43}[\mathtt{12}] +\mathtt{4} [\mathtt{12}] \mathtt{3} + [\mathtt{34}][\mathtt{12}] + \mathtt{4}[ \mathtt{123}]. $$
(3)
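The recursion (2) is easy to implement. The following minimal Python sketch (the tuple encoding of bracketed letters is our own convention, not from the paper) computes quasi-shuffle products of words over \(\mathfrak{A}\) and reproduces (3).

```python
def bracket(a, b):
    """Commutative semigroup product [ab] in 𝔄; letters are sorted tuples."""
    return tuple(sorted(a + b))

def qshuffle(u, v):
    """Quasi-shuffle product of two words, as a list of words (with
    multiplicity), following (2): ua ⋆ vb = (u ⋆ vb)a + (ua ⋆ v)b + (u ⋆ v)[ab].
    A word is a tuple of letters; a letter like [12] is encoded as (1, 2)."""
    if not u:
        return [v]
    if not v:
        return [u]
    ru, a = u[:-1], u[-1]
    rv, b = v[:-1], v[-1]
    return ([w + (a,) for w in qshuffle(ru, v)]
            + [w + (b,) for w in qshuffle(u, rv)]
            + [w + (bracket(a, b),) for w in qshuffle(ru, rv)])

# Reproducing (3):  3 ⋆ 4[12]
for word in qshuffle(((3,),), ((4,), (1, 2))):
    print(word)
# ((4,), (1, 2), (3,))  =  4[12]3
# ((4,), (3,), (1, 2))  =  43[12]
# ((3,), (4,), (1, 2))  =  34[12]
# ((3, 4), (1, 2))      =  [34][12]
# ((4,), (1, 2, 3))     =  4[123]
```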

The tensor algebra is naturally graded by the length of words, \(\ell (w_{1}\cdots w_{n})=n\) for \(w_{1}\cdots w_{n} \in T(\mathfrak{A})\). However, in light of the new product (2), which is not homogeneous with respect to the number of letters, we introduce the weight grading on \(T(\mathfrak{A})\), denoted \(|\cdot |\), by declaring that \(|e|=0\), \(|\mathtt{a}|=1\) for all \(\mathtt{a}\in A\) and \(|[pq]|=|p|+|q|\) for all \(p,q\in \mathfrak{A}\). Finally, for a word \(w=w_{1}\cdots w_{n}\in T(\mathfrak{A})\) we define its weight to be \(|w|=|w_{1}|+\cdots +|w_{n}|\).

Let \(\delta \colon T(\mathfrak{A})\to T(\mathfrak{A})\otimes T(\mathfrak{A})\) denote the deconcatenation coproduct defined on a nonempty word \(w=w_{1}\cdots w_{n} \in T(\mathfrak{A})\) by

$$ \delta (w):=w \otimes e + e \otimes w + \sum _{i=1}^{n-1} w_{1}\cdots w_{i}\otimes w_{i+1}\cdots w_{n}, $$
(4)

and \(\delta (e)=e \otimes e\). It turns \(T(\mathfrak{A})\) into a connected graded coalgebra, for both the length and the weight grading. For any word \(w \in T_{+}(\mathfrak{A})\) the reduced coproduct is defined by \(\delta '(w):=\delta (w)- w \otimes e - e \otimes w\). “Sweedler’s notation” will be employed for both coproducts: \(\delta (w)=:\sum _{(w)} w_{(1)} \otimes w_{(2)}\) and \(\delta '(w)=:\sum _{(w)} w'_{(1)} \otimes w'_{(2)}\). The canonical counit map \(\varepsilon \colon T(\mathfrak{A}) \to \mathbb{F}\) is defined to be \(\varepsilon (\lambda e)=\lambda \in \mathbb{F}\) and zero on \(T_{+}(\mathfrak{A})\). In [27] Hoffman showed the following.

Theorem 2.1

(Quasi-shuffle Hopf algebra)

1. \(H_{\mathrm{qsh}}=(T(\mathfrak{A}),\star ,\delta ,\varepsilon ,|\cdot |)\) is a graded, connected, commutative, non-cocommutative Hopf algebra.

2. The antipode \(\alpha \colon H_{\mathrm{qsh}} \to H_{\mathrm{qsh}}\) is given by

$$ \alpha (w_{1} \cdots w_{n}) = (-1)^{n} \sum _{I\in \mathcal{C}(n)}I[w_{n} \cdots w_{1}]. $$
(5)

Here \(\mathcal{C}(n)\) is the set of all compositions of the integer \(n\), i.e., tuples \((i_{1},\ldots , i_{p})\) of positive integers such that \(i_{1}+\cdots +i_{p}=n\). Given \(I=(i_{1},\ldots ,i_{p}) \in \mathcal{C}(n)\) and a word \(w=w_{1}\cdots w_{n} \in T(\mathfrak{A})\) of length \(\ell (w)=n>0\), we define a new word \(I[w]\in T(\mathfrak{A})\) by

$$ I[w]:=[w_{1}\cdots w_{i_{1}}][w_{i_{1}+1} \cdots w_{i_{1}+i_{2}}] \cdots [w_{i_{1}+\cdots +i_{p-1}+1}\cdots w_{n}]. $$

Here (as well as later) we use the convention that \([a]:=a\) for all \(a \in \mathfrak{A}\).

Remark 2.2

(Shuffle Hopf algebra)

If the semigroup \(\mathfrak{A}\) is trivial, i.e., if \([\mathtt{i} \mathtt{j}]=0\) for any letters \(\mathtt{i},\mathtt{j} \in A\), then the quasi-shuffle product (2) reduces to Chen’s commutative shuffle product ⧢ on \(T(A)\):

$$ v\mathtt{i} \mathbin{⧢} w\mathtt{j} := (v \mathbin{⧢} w\mathtt{j})\mathtt{i} + (v\mathtt{i} \mathbin{⧢} w)\mathtt{j}, $$

for \(v,w \in T(A)\) and \(\mathtt{i} , \mathtt{j} \in A\). Observe that in this case \(|w|=\ell (w)\) for any word, and \(H_{\mathrm{sh}}=(T(A),{⧢},\delta )\) is the classical shuffle Hopf algebra over the alphabet \(A\). From (5) it follows that the antipode on \(H_{\mathrm{sh}}\) is given by \(\alpha (\mathtt{i}_{1} \cdots \mathtt{i}_{n})=(-1)^{n} \mathtt{i}_{n} \cdots \mathtt{i}_{1}\). See [46] for a comprehensive account on \(H_{\mathrm{sh}}\).

Remark 2.3

(A remark on dimensions)

There is a simple way of computing the Hilbert series

$$ G(t):=\sum _{n\ge 0}t^{n}\dim T(\mathfrak{A})_{n} $$

of \(T(\mathfrak{A})\), where \(T(\mathfrak{A})_{n}:=\mathbb{F}\{w:|w|=n\}\) is the homogeneous (for the weight grading) component of degree \(n\) of the quasi-shuffle algebra. It is not hard to see that all such words are of the form \(I[\mathtt{a}_{1}\cdots \mathtt{a}_{n}]\) for some composition \(I=(i_{1},\ldots ,i_{p})\in \mathcal{C}(n)\) and letters \(\mathtt{a}_{1},\ldots ,\mathtt{a}_{n}\in A\), in the notation of Theorem 2.1. In each block of size \(i_{j}\), \(j=1,\ldots ,p\), we are allowed to put a symmetric monomial of length \(i_{j}\), of which there are exactly \(\binom{d-1+i_{j}}{i_{j}}\) – this is the dimension of the degree-\(i_{j}\) part of the symmetric algebra \(S(A)\). Therefore

$$ \dim T(\mathfrak{A})_{n} =\sum _{(i_{1},\ldots ,i_{p})\in \mathcal{C}(n)} \binom{d-1+i_{1}}{i_{1}}\cdots \binom{d-1+i_{p}}{i_{p}}. $$

A simple computation shows that in fact

$$ \binom{d-1+i}{i}=\frac{d(d+1)\cdots (d+i-1)}{i!}=\frac{1}{i!}(d)_{i}, $$

where the Pochhammer symbol (or rising factorial) appears on the righthand side. It is well known that their exponential generating function equals the hypergeometric function

$$ {}_{1}F_{0}(d;t)=1+\sum _{i=1}^{\infty }(d)_{i}\frac{t^{i}}{i!}=(1-t)^{-d}. $$

Therefore

$$\begin{aligned} G(t)=\sum _{n=0}^{\infty }t^{n}\sum _{(i_{1},\ldots ,i_{p})\in \mathcal{C}(n)} \frac{(d)_{i_{1}}\cdots (d)_{i_{p}}}{i_{1}!\cdots i_{p}!} &=1+\sum _{p=1}^{\infty }\left ( \sum _{i=1}^{\infty }(d)_{i}\frac{t^{i}}{i!} \right )^{p} \\ &=\sum _{p=0}^{\infty }\left ( (1-t)^{-d}-1 \right )^{p}= \frac{(1-t)^{d}}{2(1-t)^{d}-1}. \end{aligned}$$

The coefficients of these Hilbert series can be found in column \(d\) of the OEIS sequence A261780.
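As a sanity check, the following Python sketch (ours; it assumes nothing beyond the two formulas above) computes \(\dim T(\mathfrak{A})_{n}\) from the composition formula and compares it with the Taylor coefficients of \(G(t)\).

```python
from fractions import Fraction
from math import comb

def dim_T(n, d):
    """dim T(𝔄)_n = Σ over compositions (i_1,...,i_p) of n of
    Π_j C(d-1+i_j, i_j), via a recursion over the first part."""
    if n == 0:
        return 1
    return sum(comb(d - 1 + i, i) * dim_T(n - i, d) for i in range(1, n + 1))

def series_G(d, order):
    """Taylor coefficients of G(t) = (1-t)^d / (2(1-t)^d - 1) up to `order`,
    by long division of the numerator by the denominator."""
    p = [Fraction((-1) ** k * comb(d, k)) for k in range(min(d, order) + 1)]
    p += [Fraction(0)] * (order + 1 - len(p))
    q = [2 * c for c in p]
    q[0] -= 1
    g = []
    for n in range(order + 1):
        g.append((p[n] - sum(q[n - k] * g[k] for k in range(n))) / q[0])
    return g

d = 2
dims = [dim_T(n, d) for n in range(8)]
assert dims == [int(c) for c in series_G(d, 7)]
print(dims)  # per the text, column d of OEIS A261780
```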

Define the scalar product \(\langle - , -\rangle \colon T(\mathfrak{A}) \otimes T(\mathfrak{A}) \to \mathbb{F}\) for any words \(u,v \in T(\mathfrak{A})\) by \(\langle u , v\rangle :=1\) if \(u=v\) and zero else. It permits us to identify the graded dual of \(H_{\mathrm{qsh}}\) with the space of word series \(c=\sum _{w\in T(\mathfrak{A})}\langle w,c\rangle \,w\), which is a non-commutative (topological) Hopf algebra with concatenation as convolution product, denoted by \(m_{\centerdot }\colon H_{\mathrm{qsh}}^{*} \otimes H_{\mathrm{qsh}}^{*} \to H_{\mathrm{qsh}}^{*}\), \(c \centerdot c':=m_{\centerdot }(c \otimes c')\), and de-quasi-shuffling as coproduct [27]. In more concrete terms, this means that given two such series their convolution product \(c \centerdot c':=m_{\mathbb{F}}(c \otimes c')\delta : H_{ \mathrm{qsh}} \to \mathbb{F}\) may be written as

$$ c \centerdot c' =\sum _{w\in T(\mathfrak{A})}\sum _{uv=w}\langle u,c \rangle \langle v,c'\rangle w =\sum _{w\in T(\mathfrak{A})}\langle \delta (w), c \otimes c'\rangle w. $$

Of particular interest are characters, i.e., algebra morphisms \(c \in H_{\mathrm{qsh}}^{*}\). They satisfy \(\langle e,c\rangle =1\) and \(\langle u \star v,c\rangle =\langle u,c\rangle \langle v,c\rangle \), for \(u,v \in H_{\mathrm{qsh}}\). The first property says that \(c\) is unital, and the second is equivalent to \(c\) being group-like in \(H_{\mathrm{qsh}}^{*}\), which means that for \(u,v \in H_{\mathrm{qsh}}\)

$$ \langle u \star v , c\rangle =\langle u \otimes v , \Delta _{ \mathrm{qsh}}(c)\rangle =\langle u \otimes v, c \otimes c\rangle , $$

where the de-quasi-shuffling coproduct is defined on words by

$$ \Delta _{\mathrm{qsh}}(w):=\sum _{u,v \in T(\mathfrak{A})} \langle u \star v , w\rangle u \otimes v. $$

The set of characters, denoted by \(\mathcal{G}\), forms a group with the inverse \(c^{-1}=c\circ \alpha \). The corresponding Lie algebra, \(\mathfrak{g} \subset H_{\mathrm{qsh}}^{*}\), consists of so-called infinitesimal characters, which map the empty word and any non-trivial product in \(H_{\mathrm{qsh}}\) to zero. One can define the exponential map as a power series with respect to the convolution product which maps \(\mathfrak{g}\) bijectively to \(\mathcal{G}\), i.e., \(\exp ^{\centerdot }(f) := \varepsilon + \sum _{j > 0} \frac{1}{j!}f^{ \centerdot j} \in \mathcal{G}\). Because \(T(\mathfrak{A})\) is a graded connected Hopf algebra, this expression becomes a finite sum when evaluated on homogeneous elements of \(T(\mathfrak{A})\), so we do not have to deal with convergence issues. Its inverse is the logarithm, \(\log ^{\centerdot }(\varepsilon + (c-\varepsilon )) =\sum _{i \ge 1} \frac{(-1)^{i-1}}{i}(c-\varepsilon )^{\centerdot i} \in \mathfrak{g}\). Again, the sum applied to any word \(w \in T(\mathfrak{A})\) terminates after \(|w|\) terms, as \((c-\varepsilon )(e) =0\).

Notation 2.4

We introduce a particular notation for words in \(T(\mathfrak{A})\), which will be useful in the sequel. The convention to identify \([a]:=a\), for \(a \in \mathfrak{A}\), permits to write any word in \(T(\mathfrak{A})\) as a concatenation of brackets, i.e., \(w = [u_{1}] \cdots [u_{k}] \in T(\mathfrak{A})\), for \(u_{1}, \ldots , u_{k} \in \mathfrak{A}\).

We come back to the setting of the introductory section with only a single letter, \(A=\{\mathtt{1}\}\). Then, in each degree \(n\), \(T(\mathfrak{A})\) has a single word of length one, \([\mathtt{1}^{n}] \in \mathfrak{A}\), and any basis element (or word) is of the form \(w=[\mathtt{1}^{k_{1}}][\mathtt{1}^{k_{2}}]\cdots [\mathtt{1}^{k_{n}}]\) for some integers \(k_{1},\ldots ,k_{n}>0\). It is easy to see that the tuple \((k_{1},\ldots ,k_{n})\) is a composition of the integer \(|w|\) of length \(n=\ell (w)\). In [27] Hoffman describes a unital algebra isomorphism \(\Sigma \) between the quasi-shuffle algebra \(H_{\mathrm{qsh}}\), for \(A=\{\mathtt{1}\}\), and the algebra \(\mathrm{QSym}\) of quasisymmetric functions in the ordered set of commuting variables \(\{Y_{i}\}_{i \in \mathbb{N}_{+}}\) [21], defined by taking a word in \(T(\mathfrak{A})\) to an iterated sum

$$ \Sigma \left ([\mathtt{1}^{k_{1}}][\mathtt{1}^{k_{2}}]\cdots [ \mathtt{1}^{k_{n}}]\right ) :=\sum _{1 \le i_{1} < \cdots < i_{n}} (Y_{i_{1}})^{k_{1}} \cdots (Y_{i_{n}})^{k_{n}} =:M_{(k_{1}, \ldots ,k_{n})}. $$
(6)

Here \(\Sigma (e)=M_{0}=1\). Then, the correspondence of the introduction is explicitly given by

$$\begin{aligned} &{ \Sigma \left ( [\mathtt{1}] \right )\cdot \Sigma \left ( [\mathtt{1}^{3}][\mathtt{1}^{7}] \right ) = \left (\sum _{i} Y_{i}\right ) \cdot \left (\sum _{i_{1}< i_{2}} (Y_{i_{1}})^{3} (Y_{i_{2}})^{7} \right )} \\ &= \sum _{i_{1} < i_{2} < i_{3}} Y_{i_{1}} (Y_{i_{2}})^{3} (Y_{i_{3}})^{7} + \sum _{i_{1} < i_{2} < i_{3}} (Y_{i_{1}})^{3} Y_{i_{2}} (Y_{i_{3}})^{7} + \sum _{i_{1} < i_{2} < i_{3}} (Y_{i_{1}})^{3} (Y_{i_{2}})^{7} Y_{i_{3}} \\ &\qquad + \sum _{i_{1} < i_{2}} (Y_{i_{1}})^{4} (Y_{i_{2}})^{7} + \sum _{i_{1} < i_{2}} (Y_{i_{1}})^{3} (Y_{i_{2}})^{8} \\ &= \Sigma \left ( [\mathtt{1}] [\mathtt{1}^{3}][\mathtt{1}^{7}] + [ \mathtt{1}^{3}][\mathtt{1}][\mathtt{1}^{7}] + [\mathtt{1}^{3}][ \mathtt{1}^{7}][\mathtt{1}] + [\mathtt{1}^{4}][\mathtt{1}^{7}] + [ \mathtt{1}^{3}][\mathtt{1}^{8}] \right ) \\ &=\ \Sigma \left ( [\mathtt{1}] \star [\mathtt{1}^{3}][\mathtt{1}^{7}] \right ), \end{aligned}$$

where the second equality is an example of summation-by-parts for products of iterated sums.

The \(M_{(k_{1},\ldots ,k_{n})}\) of (6) are the monomial quasisymmetric functions, which form a basis for \(\mathrm{QSym}\). The Hopf algebra \(\mathrm{QSym}\) is a generalisation of the classical Hopf algebra \(\mathrm{Sym}\) of symmetric functions. It was defined and studied by Gessel [21], based on earlier work by Stanley, and plays a rather distinguished role in modern algebraic combinatorics, with ramifications into several other fields of mathematics. Its graded dual is known as the connected graded cocommutative Hopf algebra \(\mathrm{NSym}\) of noncommutative symmetric functions. The iterated-sums signature corresponding to a one-dimensional discrete time series, alluded to in the first section, is an element in \(\mathrm{NSym}\). Further below, in Section 3, we consider the multidimensional generalisation of quasisymmetric functions (of level \(d\) in the terminology of [42]) and its corresponding iterated-sums signature. We close this section by mentioning that Malvenuto and Reutenauer’s Hopf algebra of permutations [39] plays an important part in the understanding of the relations between \(\mathrm{Sym}\), \(\mathrm{QSym}\) and \(\mathrm{NSym}\). The interested reader is referred to [1, 2] and to [36] for a readable introduction, including a brief historical overview.

2.1 Half-Shuffles

Aiming at understanding the discrete analog of the \({\mathrm{area}}\) operation (to be introduced further below), we take a more refined approach to the quasi-shuffle product by observing that \(m_{\star }\) may be split into three products, i.e., left and right half-shuffles and a third product

$$ ua\mathbin{\dot{\succ }}vb:=(ua\star v)b, \enspace ua \mathbin{\dot{\prec }}vb:=(u\star vb)a, \enspace ua \diamond vb:=(u\star v)[ab], $$
(7)

so that \(u\star v=u\mathbin{\dot{\prec }}v+u\mathbin{\dot{\succ }}v+u \diamond v\). For instance (cf. (3))

$$ \begin{aligned} \mathtt{3}\mathbin{\dot{\succ }}\mathtt{4}[ \mathtt{12}] &=\mathtt{34}[\mathtt{12}]+\mathtt{43}[\mathtt{12}]+[ \mathtt{34}][\mathtt{12}] \\ \mathtt{3}\mathbin{\dot{\prec }}\mathtt{4}[\mathtt{12}] &= \mathtt{4}[\mathtt{12}]\mathtt{3} \\ \mathtt{3}\diamond \mathtt{4}[\mathtt{12}] &=\mathtt{4}[ \mathtt{123}]. \end{aligned} $$
(8)
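Continuing the Python sketch from earlier in this section (the encodings are ours), the three pieces of (7) are obtained by truncating the recursion; the check below reproduces (8).

```python
def half_succ(u, v):
    """u ≻̇ v = (u ⋆ v')b for v = v'b, cf. (7); reuses qshuffle and bracket."""
    return [w + (v[-1],) for w in qshuffle(u, v[:-1])]

def half_prec(u, v):
    """u ≺̇ v = (u' ⋆ v)a for u = u'a."""
    return [w + (u[-1],) for w in qshuffle(u[:-1], v)]

def diamond(u, v):
    """u ◇ v = (u' ⋆ v')[ab] for u = u'a, v = v'b."""
    return [w + (bracket(u[-1], v[-1]),) for w in qshuffle(u[:-1], v[:-1])]

u, v = ((3,),), ((4,), (1, 2))   # the words 3 and 4[12]
print(half_succ(u, v))           # 43[12], 34[12], [34][12]
print(half_prec(u, v))           # 4[12]3
print(diamond(u, v))             # 4[123]
```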

Noticing the particular relation \(ua \mathbin{\dot{\succ }}vb = vb \mathbin{\dot{\prec }}ua\), which is equivalent to \(m_{\star }\) being commutative, it is not hard to show that the quasi-shuffle algebra \(H_{\mathrm{qsh}}=(T(\mathfrak{A}),\star )\) becomes a commutative tridendriform algebra, \((T(\mathfrak{A}),{\mathbin{\dot{\prec }}},{\mathbin{\dot{\succ }}},{ \diamond })\), as defined by Loday and Ronco [35].

Remark 2.5

A similar splitting holds for the shuffle algebra in Remark 2.2. We can write the shuffle product on \(T(A)\) as a sum of two half-shuffles,

$$ u\mathtt{a} \succ v\mathtt{b} := (u\mathtt{a} \mathbin{⧢} v)\mathtt{b}, \qquad u\mathtt{a} \prec v\mathtt{b} := (u \mathbin{⧢} v\mathtt{b})\mathtt{a}, $$

so that \(u \mathbin{⧢} v = u\prec v + u\succ v\). Again, we quickly check that the commutativity of the shuffle product is equivalent to \(u\mathtt{a}\succ v\mathtt{b} = v \mathtt{b}\prec u\mathtt{a}\). In fact, the triple \((T(A),{\prec },{\succ })\) is also known as a commutative dendriform or Zinbiel algebra.

2.2 Hoffman’s Exponential

Shuffle and quasi-shuffle Hopf algebras are more tightly related than Remark 2.2 may adumbrate. Indeed, Hoffman proved in [27] that \((T(\mathfrak{A}),{⧢},\delta )\) and \(H_{\mathrm{qsh}}=(T(\mathfrak{A}),\star ,\delta )\) are isomorphic as Hopf algebras. We briefly recall this result. Let \(T(\mathfrak{A})\) be equipped with the commutative shuffle product inductively defined by \(ua \mathbin{⧢} vb := (u \mathbin{⧢} vb)a + (ua \mathbin{⧢} v)b\), for \(u,v\in T(\mathfrak{A})\) and \(a,b \in \mathfrak{A}\). The empty word, \(e\), is the unit for this product. Recall the notation \(I[w]\) introduced in Theorem 2.1.

Theorem 2.6

(Hoffman’s isomorphism)

[27] There exists a Hopf algebra isomorphism \(\Phi _{\mathrm{H}}\colon (T(\mathfrak{A}),{⧢},\delta ) \to H_{\mathrm{qsh}}\), given explicitly by the so-called Hoffman exponential

$$ \Phi _{\mathrm{H}}(w):=\sum _{(i_{1},\ldots ,i_{p})\in \mathcal{C}(\ell (w))} \frac{1}{i_{1}!\cdots i_{p}!}I[w] . $$
(9)

Its inverse also admits an explicit expression, namely the Hoffman logarithm

$$ \Phi _{\mathrm{H}}^{-1}(w):=\sum _{(i_{1},\ldots ,i_{p})\in \mathcal{C}(\ell (w))} \frac{(-1)^{\ell (w)-p}}{i_{1}\cdots i_{p}}I[w]. $$
(10)

Some examples: \(\Phi _{\mathrm{H}}([\mathtt{i}])=[\mathtt{i}]\), and for the words \([\mathtt{1}][\mathtt{2}] \in T(\mathfrak{A})\) and \([\mathtt{1}][\mathtt{23}][\mathtt{4}] \in T(\mathfrak{A})\) we find

$$\begin{aligned} \Phi _{\mathrm{H}}([\mathtt{1}][\mathtt{2}]) &= [\mathtt{1}][\mathtt{2}] + \frac{1}{2}[\mathtt{12}], \\ \Phi _{\mathrm{H}}([\mathtt{1}][\mathtt{23}][\mathtt{4}]) &= [\mathtt{1}][\mathtt{23}][\mathtt{4}] + \frac{1}{2}[\mathtt{123}][\mathtt{4}] + \frac{1}{2}[\mathtt{1}][\mathtt{234}] + \frac{1}{6}[\mathtt{1234}]. \end{aligned}$$

In the second example, the terms correspond to the compositions \((1,1,1)\), \((2,1)\), \((1,2)\) and \((3)\) of the integer 3, in that order. Recall that the particular Notation 2.4 for words \(w = [u_{1}] \cdots [u_{k}] \in T(\mathfrak{A})\), for \(u_{1}, \ldots , u_{k} \in \mathfrak{A}\), is in place. Also, note that the number of letters in each of the terms corresponds to the length of the composition. The reader is referred to [27, 28] for more details. See also [16] for an application in stochastic analysis.
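Formulas (9) and (10) translate into a short Python sketch (continuing the encodings of the quasi-shuffle sketch above, whose bracket function is reused); it also verifies the round trip \(\Phi _{\mathrm{H}}^{-1}\circ \Phi _{\mathrm{H}} = \operatorname{id}\) on the second example.

```python
from fractions import Fraction
from math import factorial

def compositions(n):
    """All compositions (i_1, ..., i_p) of the integer n."""
    if n == 0:
        yield ()
        return
    for i in range(1, n + 1):
        for rest in compositions(n - i):
            yield (i,) + rest

def I_of(word, comp):
    """The word I[w] of Theorem 2.1: merge consecutive letters of `word`
    into blocks whose sizes are the parts of `comp`."""
    out, pos = [], 0
    for size in comp:
        block = ()
        for letter in word[pos : pos + size]:
            block = bracket(block, letter)  # bracket from the sketch above
        out.append(block)
        pos += size
    return tuple(out)

def hoffman_exp(word):
    """Hoffman exponential Φ_H(w) of (9), as a dict {word: coefficient}."""
    out = {}
    for comp in compositions(len(word)):
        c = Fraction(1)
        for i in comp:
            c /= factorial(i)
        w = I_of(word, comp)
        out[w] = out.get(w, Fraction(0)) + c
    return out

def hoffman_log(word):
    """Hoffman logarithm Φ_H^{-1}(w) of (10)."""
    out = {}
    for comp in compositions(len(word)):
        c = Fraction((-1) ** (len(word) - len(comp)))
        for i in comp:
            c /= i
        w = I_of(word, comp)
        out[w] = out.get(w, Fraction(0)) + c
    return out

def apply_linear(phi, lin):
    """Extend a map on words linearly to dicts {word: coefficient}."""
    out = {}
    for w, c in lin.items():
        for w2, c2 in phi(w).items():
            out[w2] = out.get(w2, Fraction(0)) + c * c2
    return {w: c for w, c in out.items() if c != 0}

w = ((1,), (2, 3), (4,))  # the word [1][23][4]
assert apply_linear(hoffman_log, hoffman_exp(w)) == {w: Fraction(1)}
```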

In Section 5 we will show that \(\Phi _{\mathrm{H}}\) is nicely compatible with comparing the iterated-sums signature on one side with the iterated-integrals signature on the other. The following two lemmas are going to be used in Section 5.1, where we address the area operation in the context of the iterated-sums signature.

Lemma 2.7

The image of any nonempty word \(w=w_{1}\cdots w_{n} \in T(\mathfrak{A})\) under Hoffman’s isomorphism can be split into two parts as follows:

$$ \Phi _{\mathrm{H}}(w)=\Phi _{\mathrm{H}}(w_{1}\cdots w_{n-1})w_{n}+R_{ \mathrm{H}}(w), $$
(11)

where the remainder term

$$ R_{\mathrm{H}}(w) =\sum _{ \substack{I=(i_{1},\ldots ,i_{p})\in \mathcal{C}(\ell (w))\\i_{p}>1}} \frac{1}{i_{1}!\cdots i_{p}!}I[w]. $$

The verification of the lemma is left to the reader. This splitting of Hoffman’s isomorphism implies the following important result.

Lemma 2.8

Let \(u \in T(\mathfrak{A})\) and \(a,b\in \mathfrak{A}\). Then

$$ \Phi _{\mathrm{H}}\big(u([a][b]-[b][a])\big) =\Phi _{\mathrm{H}}(u[a])[b]- \Phi _{\mathrm{H}}(u[b])[a]. $$
(12)

Proof

From Lemma 2.7 and linearity of \(\Phi _{\mathrm{H}}\), we deduce that

$$ \Phi _{\mathrm{H}}\big(u([a][b]-[b][a])\big) =\Phi _{\mathrm{H}}(u[a])[b] -\Phi _{\mathrm{H}}(u[b])[a] + R_{\mathrm{H}}(u[a][b]) - R_{ \mathrm{H}}(u[b][a]). $$

Since the semigroup \(\mathfrak{A}\) is commutative, for any composition \(I=(i_{1},\ldots ,i_{p})\in \mathcal{C}(n)\) with \(i_{p}\geq 2\) we have that

$$\begin{aligned} I\big[u[a][b]\big] &=[u_{1}\cdots u_{i_{1}}][u_{i_{1}+1}\cdots u_{i_{1}+i_{2}}] \cdots [u_{i_{1}+\cdots +i_{p-1}+1}\cdots u_{n-2}ab] \\ &=[u_{1}\cdots u_{i_{1}}][u_{i_{1}+1}\cdots u_{i_{1}+i_{2}}]\cdots [u_{i_{1}+ \cdots +i_{p-1}+1}\cdots u_{n-2}ba] \\ &=I\big[u[b][a]\big]. \end{aligned}$$

Therefore, the equality \(R_{\mathrm{H}}(u[a][b])=R_{\mathrm{H}}(u[b][a])\) holds, which implies the identity (12). □
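Lemma 2.8 can likewise be checked mechanically. The snippet below (ours, reusing hoffman_exp and Fraction from the sketch above, with the single letters \(u=[\mathtt{2}]\), \(a=\mathtt{3}\), \(b=\mathtt{4}\) chosen for illustration) compares both sides of (12).

```python
def lin_minus(c1, c2):
    """Difference of two linear combinations of words."""
    out = {w: c1.get(w, Fraction(0)) - c2.get(w, Fraction(0))
           for w in set(c1) | set(c2)}
    return {w: c for w, c in out.items() if c != 0}

def concat_right(lin, letter):
    """Concatenate a letter on the right of every word, extended linearly."""
    return {w + (letter,): c for w, c in lin.items()}

u, a, b = ((2,),), (3,), (4,)
lhs = lin_minus(hoffman_exp(u + (a, b)), hoffman_exp(u + (b, a)))
rhs = lin_minus(concat_right(hoffman_exp(u + (a,)), b),
                concat_right(hoffman_exp(u + (b,)), a))
assert lhs == rhs  # the identity (12)
```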

3 Iterated-Sums Signatures

We consider a discrete time series \(x \in (\mathbb{F}^{d})^{N}\) as an element of

$$\begin{aligned} (\mathbb{F}^{d})^{{\mathbb{N}_{+}}}_{c} := \left \{ x: {\mathbb{N}_{+}} \to \mathbb{F}^{d}: \exists \, N \ge 1 \text{ such that }\, x_{N} = x_{n} \, \forall n \ge N \right \} , \end{aligned}$$

the space of infinite time series that are eventually constant, by extending it constantly. In this section we will see that the appropriate algebraic setting for iterated-sums, combined into the map \(\operatorname{ISS}(x)\), is that of a character on the quasi-shuffle Hopf algebra \(H_{\mathrm{qsh}}=(T(\mathfrak{A}),\star ,\delta , \varepsilon , | \cdot |)\) over the semigroup \(\mathfrak{A}\) corresponding to the alphabet \(A=\{\mathtt{1},\mathtt{2},\ldots ,\mathtt{d}\}\), introduced in Section 2.

The following notation for elements in the time series \(x\) is put in place:

$$ x_{j}=(x_{j}^{[\mathtt{1}]},\ldots ,x_{j}^{[\mathtt{d}]}) \in \mathbb{F}^{d}. $$

Next we define the corresponding time series

$$ \Delta x=((\Delta x)_{1}, (\Delta x)_{2}, \ldots , (\Delta x)_{N}) $$

with increments \((\Delta x)_{n} :=x_{n} - x_{n-1} \in \mathbb{F}^{d}\), for \(n\ge 1\), as entries. The new notation is extended to include all brackets in \(\mathfrak{A}\) by defining

$$ x_{j}^{[\mathtt{a}_{1}\cdots \mathtt{a}_{p}]} :=x_{j}^{[ \mathtt{a}_{1}]}\cdots x_{j}^{[\mathtt{a}_{p}]}. $$

Definition 3.1

The iterated-sums signature of the time series \(x\) is the two-parameter family \((\operatorname{ISS}(x)_{n,m}\mid 0\le n\leq m\in {\mathbb{N}})\) of linear maps from \(T(\mathfrak{A})\) to \(\mathbb{F}\) such that \(\operatorname{ISS}(x)_{n,n}=\varepsilon \), and defined recursively by \(\langle e,\operatorname{ISS}(x)_{n,m}\rangle :=1\), and for \([a_{1}] \cdots [a_{p}] \in T(\mathfrak{A})\) with \(a_{1},\ldots ,a_{p} \in \mathfrak{A}\)

$$ \langle [a_{1}] \cdots [a_{p}],\operatorname{ISS}(x)_{n,m}\rangle :=\sum _{j=n+1}^{m}\langle [a_{1}]\cdots [a_{p-1}], \operatorname{ISS}(x)_{n,j-1}\rangle \Delta x_{j}^{[a_{p}]}. $$

Hence, the iterated-sums signature is a word series in \(H_{\mathrm{qsh}}^{*}\)

$$\begin{aligned} \operatorname{ISS}(x)_{n,m} &= \sum _{[u_{1}] \cdots [u_{k}] \in T( \mathfrak{A})} \langle [u_{1}] \cdots [u_{k}],\operatorname{ISS}(x)_{n,m} \rangle [u_{1}] \cdots [u_{k}] \end{aligned}$$
(13)

with iterated sums over increments of \(x\) as coefficients, defined as

$$\begin{aligned} \langle [u_{1}] \cdots [u_{k}],\operatorname{ISS}(x)_{n,m}\rangle &= \sum _{n< i_{1}< i_{2} < \cdots < i_{k} \le m} \Delta x_{i_{1}}^{[u_{1}]} \Delta x_{i_{2}}^{[u_{2}]} \cdots \Delta x_{i_{k}}^{[u_{k}]}. \end{aligned}$$
(14)

For example

$$ \langle [\mathtt{1}][\mathtt{12}],\operatorname{ISS}(x)_{n,m} \rangle = \sum _{n< i_{1}< i_{2}\le m}\Delta x_{i_{1}}^{[\mathtt{1}]} \Delta x_{i_{2}}^{[\mathtt{1}]}\Delta x_{i_{2}}^{[\mathtt{2}]}. $$

We extend this definition to all \(n,m\in {\mathbb{N}}\) by setting \(\langle w,\operatorname{ISS}(x)_{n,m}\rangle =0\) whenever \(m< n\).
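Definition 3.1 translates directly into code. The sketch below is our own: a word is a tuple of brackets, a bracket a tuple of 0-based component indices (so the letter \(\mathtt{1}\) is encoded as 0, \(\mathtt{2}\) as 1, and so on), and the coefficient (14) is evaluated by brute force.

```python
from itertools import combinations

def iss_coeff(word, x):
    """⟨[u_1]⋯[u_k], ISS(x)_{0,N}⟩ as the iterated sum (14); x is a list
    of d-dimensional points given as tuples."""
    dx = [tuple(p - q for p, q in zip(x[i], x[i - 1]))
          for i in range(1, len(x))]
    total = 0.0
    for idx in combinations(range(len(dx)), len(word)):
        term = 1.0
        for pos, brk in zip(idx, word):
            for comp in brk:
                term *= dx[pos][comp]
        total += term
    return total

# ⟨[1][12], ISS(x)⟩ = Σ_{i<j} Δx_i^[1] Δx_j^[1] Δx_j^[2], for a 2-d series:
x = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (2.0, 2.0)]
print(iss_coeff(((0,), (0, 1)), x))  # -> -5.0
```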

Remark 3.2

An easy consequence of this definition is that the coefficient \(\langle w,\operatorname{ISS}(x)_{n,m}\rangle \) vanishes whenever \(\ell (w)>m-n\).

The proof of the following lemma is straightforward.

Lemma 3.3

Let \(x=(x_{n})_{n\ge 0}\) and \(x'=(x'_{n})_{n\ge 0}\) be two time series, and denote by \(xx':=(x_{n}x'_{n})_{n\ge 0}\) their entrywise product. Then the increment of the product \(xx'\) is given by a generalised Leibniz rule

$$ (\Delta xx')_{n} = x'_{n-1}(\Delta x)_{n} +x_{n-1}(\Delta x')_{n} +( \Delta x)_{n}(\Delta x')_{n}. $$

More importantly, we have the following:

Theorem 3.4

(1) (Quasi-shuffle identity) For each \(n\le m\), the map \(\operatorname{ISS}(x)_{n,m}:H_{\mathrm{qsh}} \to \mathbb{F}\) is a quasi-shuffle Hopf algebra character.

(2) (Chen’s property) For any three \(n< n'< n''\in {\mathbb{N}}\) we have

    $$ \operatorname{ISS}(x)_{n,n'}\centerdot \operatorname{ISS}(x)_{n',n''}= \operatorname{ISS}(x)_{n,n''}. $$
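Both statements can be tested numerically with the sketches introduced so far (qshuffle from Section 2, iss_coeff and the series x from above; the helper iss_coeff_range is ours, not from the paper).

```python
u, v = ((0,),), ((1,), (0, 1))  # the words [1] and [2][12]
lhs = iss_coeff(u, x) * iss_coeff(v, x)
rhs = sum(iss_coeff(w, x) for w in qshuffle(u, v))
assert abs(lhs - rhs) < 1e-9    # (1) quasi-shuffle identity

def iss_coeff_range(word, x, n, m):
    """⟨w, ISS(x)_{n,m}⟩: the iterated sum over n < i_1 < ... < i_k <= m."""
    return iss_coeff(word, x[n : m + 1])

w = ((0,), (0, 1))
chen = sum(iss_coeff_range(w[:j], x, 0, 2) * iss_coeff_range(w[j:], x, 2, 3)
           for j in range(len(w) + 1))
assert abs(chen - iss_coeff_range(w, x, 0, 3)) < 1e-9  # (2) Chen's property
```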

Remark 3.5

1. Observe that point (1) in Theorem 3.4 amounts to a generalisation of the algebra isomorphism defined in (6) to the multidimensional case, i.e., for an alphabet \(A=\{\mathtt{1}, \ldots ,\mathtt{d}\}\). Indeed, defining the map \(\Sigma _{d}\) on \(H_{\mathrm{qsh}}\) by

$$ \Sigma _{d}([u_{1}][u_{2}]\cdots [u_{n}]) :=\sum _{1 \le j_{1} < \cdots < j_{n}} Y^{[u_{1}]}_{j_{1}} \cdots Y^{[u_{n}]}_{j_{n}}, $$
(15)

where \(u_{1},\ldots ,u_{n} \in \mathfrak{A}\) and for \(u=[\mathtt{a}_{1} \cdots \mathtt{a}_{l}] \in \mathfrak{A}\) we have set

$$ Y^{[u]}_{j} := Y^{[\mathtt{a}_{1}]}_{j} \cdots Y^{[\mathtt{a}_{l}]}_{j}, $$

we obtain a quasi-shuffle algebra isomorphism into the algebra of quasisymmetric functions of level \(d\), as introduced by Novelli and Thibon in [42]. For the sake of brevity we only remark that

$$\begin{aligned} \langle w, \operatorname{ISS}(x) \rangle = \Sigma _{d}(w)\Big\rvert _{Y_{1}^{[ \mathtt{1}]} = \Delta x_{1}^{[\mathtt{1}]}, \dots , Y_{1}^{[ \mathtt{d}]} = \Delta x_{1}^{[\mathtt{d}]}, Y_{2}^{[\mathtt{1}]} = \Delta x_{2}^{[\mathtt{1}]}, \dots } \end{aligned}$$

2. Specialising to \(\mathbb{F}= \mathbb{R}\), Theorem 3.4 matches the corresponding result for the iterated-integrals signature \(S(X)\) of a curve of bounded variation in \({\mathbb{R}}^{B}\), where \(B\) is a (possibly countable) alphabet. The iterated-integrals signature is also called Chen’s signature, rough path signature, continuous-time signature or just signature in the literature.

Here, the underlying Hopf algebra is the shuffle Hopf algebra \(H_{\mathrm{sh}}=(T(B),{⧢},\delta )\). Indeed (see for example [23]),

(1) (Shuffle identity) For fixed \(s < t\), \(S(X)_{s,t}\) is a character on \(H_{\mathrm{sh}}\), that is,

$$ \langle v \mathbin{⧢} w, S(X)_{s,t} \rangle = \langle v, S(X)_{s,t} \rangle \, \langle w, S(X)_{s,t} \rangle $$

for all \(v,w \in T(B)\).

(2) (Chen’s property) For \(s < u < t\)

$$\begin{aligned} S(X)_{s,u} \centerdot S(X)_{u,t} = S(X)_{s,t}. \end{aligned}$$

Before proving Theorem 3.4 we need the following abstract result, which is a particular case of the setting presented in [42, Section 5.1].

Lemma 3.6

Let \(M_{[u_{1}]\cdots [u_{k}]}(Y):=\Sigma _{d}([u_{1}]\cdots [u_{k}])\) denote the level-\(d\) monomial quasisymmetric functions defined in (15). Then, the “generating series”

$$ \sigma (Y):=\sum _{w\in T(\mathfrak{A})} M_{w}(Y)\, w $$

admits the factorisation

$$ \sigma (Y)=\vec{\prod _{j\ge 1}}\left ( \varepsilon -\sum _{ \mathtt{a}\in A}Y_{j}^{[\mathtt{a}]}\, [\mathtt{a}] \right )^{-1} = \vec{\prod _{j\ge 1}}\left ( \varepsilon +\sum _{[u]\in \mathfrak{A}}Y_{j}^{[u]} \, [u] \right ). $$
(16)

Let us look at the first few terms in (16):

$$\begin{aligned} &{\sigma (Y)=\left ( \varepsilon +\sum _{[u]\in \mathfrak{A}}Y_{1}^{[u]}\,[u] \right ) \left ( \varepsilon +\sum _{[u]\in \mathfrak{A}}Y_{2}^{[u]}\,[u] \right ) \left ( \varepsilon +\sum _{[u]\in \mathfrak{A}}Y_{3}^{[u]}\,[u] \right )\cdots } \\ &= \varepsilon + \sum _{[u]\in \mathfrak{A} }\big(Y_{1}^{[u]} + Y_{2}^{[u]} + Y_{3}^{[u]} + \cdots \big)\,[u] + \sum _{[u][v] \in \mathfrak{A}^{ \otimes 2}}\big(Y_{1}^{[u]}Y_{2}^{[v]} + Y_{1}^{[u]}Y_{3}^{[v]} + \cdots \big)\,[u][v] + \cdots \end{aligned}$$

Instead of elaborating on this lemma, we refer to [42] for details about multivariable generating series. Note, however, that after evaluating \(\sigma (Y)\) at \(Y_{j}^{[\mathtt{a}]}=\Delta x_{j}^{[\mathtt{a}]}\), we obtain \(\operatorname{ISS}(x)\), and the factorisation (16) takes place in the convolution algebra \(H_{\mathrm{qsh}}^{*}\). We further remark that the expansion of the geometric series on the right-hand side of the first equality in (16) takes place in \(\mathfrak{A}\), which explains the summation over \(\mathfrak{A}\) in the second equality.

Remark 3.7

Equality (16) bears resemblance to [32, Definition 4.1] (cf. also [38, Theorem 32]; we would like to thank Harald Oberhauser (Oxford) for pointing us to these references). At first sight, though, only coefficients of words in letters of weight one are considered in the aforementioned reference (e.g., in our notation, \([\mathtt{1}],[\mathtt{2}],\ldots ,[\mathtt{d}],[\mathtt{1}][\mathtt{1}],[\mathtt{1}][\mathtt{2}],\dots ,[\mathtt{1}][\mathtt{1}][\mathtt{1}]\), …). Preprocessing the underlying time series through a nonlinear function (i.e., a kernel in the terminology of [32]), one can introduce additional polynomial expressions. Note, however, that in their setting sums of increments of polynomials appear, whereas in the iterated-sums signature (i.e., in (16) evaluated at \(Y_{i} = \Delta x_{i}\)) polynomials of increments show up.

The differences between the two approaches may be summarised by saying that increments of polynomials differ from polynomials of increments. That said, it is an interesting question how these two approaches could be combined fruitfully. In particular, we hope to investigate the application of kernelisation techniques to the iterated-sums signature.

Finally, we would like to mention that the work of Hoffman–Ihara (see Section 5 and [28], as well as [18]) permits one to define, for any positive integer, a linear automorphism of \(T(\mathfrak{A})\) which gives rise to a family of “feature maps” interpolating between the iterated-sums signature and the iterated-integrals signature. This relates to a modification of (16) in the spirit of [32, Appendix B]. These new feature maps define characters over Hopf algebras equipped with new quasi-shuffle type products. The corresponding family of linear automorphisms defines algebra maps between these quasi-shuffle type products and the quasi-shuffle product (2). We postpone the details of this construction to a follow-up paper, and would like to thank the anonymous referee for hinting at this direction.

Proof of Theorem 3.4

1. We need to show that for words \(w,w' \in T(\mathfrak{A})\)

$$ \langle w\star w',\operatorname{ISS}(x)_{n,m}\rangle =\langle w, \operatorname{ISS}(x)_{n,m}\rangle \langle w',\operatorname{ISS}(x)_{n,m} \rangle . $$

We use the recursive definition of the quasi-shuffle product (2) and induction on \(\ell (w)+\ell (w')\), the base case (i.e., \(w=e\) or \(w'=e\)) being trivial. If \(u,v\in T(\mathfrak{A})\) and \(a,b\in \mathfrak{A}\), define the auxiliary time series

$$ s_{k} :=\langle u[a],\operatorname{ISS}(x)_{n,n+k}\rangle , \quad s'_{k} :=\langle v[b],\operatorname{ISS}(x)_{n,n+k} \rangle $$

for \(0\le k\le m-n\), and zero else. Observe that the increments

$$ (\Delta s)_{k}=\langle u,\operatorname{ISS}(x)_{n,n+k-1}\rangle \Delta x_{n+k}^{[a]}, \quad (\Delta s')_{k}=\langle v, \operatorname{ISS}(x)_{n,n+k-1}\rangle \Delta x_{n+k}^{[b]}. $$

By the induction hypothesis we then get

$$\begin{aligned} s'_{k-1}(\Delta s)_{k} &=\langle u,\operatorname{ISS}(x)_{n,n+k-1} \rangle \langle v[b],\operatorname{ISS}(x)_{n,n+k-1}\rangle \Delta x_{n+k}^{[a]} \\ &=\langle u\star v[b],\operatorname{ISS}(x)_{n,n+k-1}\rangle \Delta x_{n+k}^{[a]}, \end{aligned}$$

and similarly

$$\begin{aligned} s_{k-1}(\Delta s')_{k} &=\langle u[a],\operatorname{ISS}(x)_{n,n+k-1} \rangle \langle v,\operatorname{ISS}(x)_{n,n+k-1}\rangle \Delta x_{n+k}^{[b]} \\ &=\langle u[a]\star v,\operatorname{ISS}(x)_{n,n+k-1}\rangle \Delta x_{n+k}^{[b]}. \end{aligned}$$

By a similar argument we also have

$$ (\Delta s)_{k}(\Delta s')_{k}=\langle u\star v,\operatorname{ISS}(x)_{n,n+k-1} \rangle \Delta x_{n+k}^{[a]} \Delta x_{n+k}^{[b]}. $$

Finally, we sum these relations using Lemma 3.3 to get

$$\begin{aligned} &{\langle u[a],\operatorname{ISS}(x)_{n,m}\rangle \langle v[b],\operatorname{ISS}(x)_{n,m}\rangle =\sum _{k=1}^{m-n}\Delta (ss')_{k}} \\ &=\langle (u\star v[b])[a]+(u[a]\star v)[b]+(u\star v)[ab], \operatorname{ISS}(x)_{n,m}\rangle \\ &=\langle u[a]\star v[b],\operatorname{ISS}(x)_{n,m}\rangle . \end{aligned}$$

2. The proof of Chen’s property can be pursued using a pedestrian approach. However, it also follows from Lemma 3.6. Indeed, we may split the product in the factorisation (16) as

$$ \sigma (Y)=\vec{\prod _{1\le j\le n'}}\left ( \varepsilon -\sum _{ \mathtt{a}\in A}Y_{j}^{[\mathtt{a}]}[\mathtt{a}] \right )^{-1} \centerdot \vec{\prod _{j>n'}}\left ( \varepsilon -\sum _{\mathtt{a} \in A}Y_{j}^{[\mathtt{a}]}[\mathtt{a}] \right )^{-1}. $$

The desired identity follows upon evaluation at \(Y_{j}^{[\mathtt{a}]}=\Delta x_{j}^{[\mathtt{a}]}\) as in the previous remark. □

We note that the iterated-sums signature, \(\operatorname{ISS}(x)_{n,m}\), introduced in this work is similar to the discrete Chen(–Fliess) series defined and studied in [22] in the context of nonlinear control theory.

We close this section with an intriguing observation. Up to this point it may seem that iterated-sums signatures, \(\operatorname{ISS}(x)_{n,m}\), and Chen’s signatures, \(S(X)_{s,t}\) (see Remark 3.5), behave in the same way, but as the next example shows this is not at all the case. Recall that \(\mathrm{End}_{\mathbb{F}}(H_{\mathrm{qsh}})\), the space of linear maps on \(H_{\mathrm{qsh}}\), together with the convolution product \(\psi \ast \gamma :=m_{\star }(\psi \otimes \gamma )\delta \), is a non-commutative algebra with unit \(\iota :=\eta \circ \varepsilon \), where \(\eta : \mathbb{F}\to H_{\mathrm{qsh}}\) is the unit map, \(\eta (\lambda ):=\lambda e\). Define

$$ \mathfrak{e}:=\log ^{*}(\operatorname{id})=J-\frac{1}{2}J*J+\frac{1}{3}J*J*J+ \cdots , $$
(17)

where \(J:=\operatorname{id}- \iota \in \mathrm{End}_{\mathbb{F}}(H_{\mathrm{qsh}})\) is the projection onto the augmentation ideal \(T_{+}(\mathfrak{A})\). It is the adjoint of the classical Eulerian Lie idempotent [46], that is, the concatenation logarithm of the identity map, \(\log ^{\centerdot }(\operatorname{id})\). Observe that the sum (17) terminates when evaluated on homogeneous elements, since \(J(e)=0\); thus it is well defined for arbitrary elements of \(T(\mathfrak{A})\). Then, for any character \(c \in \mathcal{G}\) and word \(u\in T(\mathfrak{A})\) we have that

$$ \langle u,\log ^{\centerdot }c\rangle =\langle \mathfrak{e}(u),c\rangle , $$

where \(\log ^{\centerdot }c \in \mathfrak{g}\). Indeed, by definition

$$\begin{aligned} \langle u,\log ^{\centerdot }c\rangle &=\sum _{i \ge 1}\frac{(-1)^{i-1}}{i}\langle u,(c-\varepsilon )^{\centerdot i}\rangle =\sum _{i \ge 1}\frac{(-1)^{i-1}}{i}\sum \langle u'_{(1)},c\rangle \cdots \langle u'_{(i)},c\rangle \\ &=\sum _{i \ge 1}\frac{(-1)^{i-1}}{i}\sum \langle u'_{(1)}\star \cdots \star u'_{(i)},c\rangle =\langle \mathfrak{e}(u),c\rangle . \end{aligned}$$

In the third equality we used that \(c\) is a character. In the second equality the iterated reduced coproduct is applied, with Sweedler notation \(\delta '^{(i-1)}(u)=\sum u'_{(1)}\otimes \cdots \otimes u'_{(i)}\).

Now, if \(x\) is an arbitrary time series, for its iterated-sums signature this means that

$$ \langle [\mathtt{1}^{2}],\log ^{\centerdot }\operatorname{ISS}(x) \rangle =\langle [\mathtt{1}^{2}],\operatorname{ISS}(x)\rangle = \sum _{j}\left (\Delta x_{j}^{[\mathtt{1}]}\right )^{2}\ge 0. $$

Therefore, the image of the logarithm of iterated-sums signatures only reaches a certain subset of the Lie algebra of infinitesimal characters on \(H_{\mathrm{qsh}}\). This is in contrast to Chen’s iterated-integrals signature, for which Chow’s Theorem [19, Theorem 7.28] holds, showing that any character over the shuffle Hopf algebra may be realised as the Chen signature of a piecewise linear path. The implications of this observation will be studied in a forthcoming paper.
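The obstruction is concrete enough to demonstrate with the earlier sketch: a single-letter word admits no splitting into two nonempty parts, so all higher convolution powers of \(\operatorname{ISS}(x)-\varepsilon \) vanish on \([\mathtt{1}^{2}]\) and the logarithm’s coefficient equals the signature coefficient, a sum of squares.

```python
# Reusing iss_coeff from Section 3's sketch; (0, 0) encodes the letter [1²].
x1 = [(0.0,), (1.0,), (0.5,), (2.0,)]   # an arbitrary 1-d series
assert iss_coeff(((0, 0),), x1) >= 0.0  # always nonnegative
```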

Still, the following positive statement on the linear span of iterated-sums signatures holds.

Lemma 3.8

For every \(n\ge 1\), \(\operatorname{span}_{\mathbb{F}}\{ \operatorname{proj}_{\le n} \operatorname{ISS}(x) : x \in (\mathbb{F}^{d})^{\mathbb{N}_{+}}_{c}\} = \operatorname{proj}_{\le n} H^{*}_{\mathrm{qsh}}\).

Remark 3.9

The corresponding result for iterated-integrals signatures was shown in [13, Lemma 3.4], which is sometimes useful for proving statements about the underlying algebra that are easily verified when tested against signatures.

Proof

Fix \(n \ge 1\) and let \(P_{1}, \dots , P_{L}\), ordered in some way, be the monomial quasisymmetric functions of degree at most \(n\).

By [42, Section 5.1] they are independent as elements of the space of formal power series.

This implies that, for some \(m\ge 1\) large enough, evaluating at \(Y_{m} = (y_{1}, \ldots , y_{m}, 0, 0, \ldots )\), the expressions \(P_{1}(Y_{m})\), …, \(P_{L}(Y_{m})\) are independent as elements of \(\mathbb{F}\left [ y_{1}, \ldots , y_{m} \right ]\). Denote

$$\begin{aligned} Y^{(\ell )}_{m} = \left (y_{1}^{(\ell )}, \dots , y^{(\ell )}_{m}, 0, 0, \ldots \right ),\quad \ell = 1,\ldots , L, \end{aligned}$$

copies in new variables of \(Y_{m}\), that is \(P_{i}\left ( Y^{(\ell )}_{m} \right ) \in R := \mathbb{F}\left [ y^{(1)}_{1}, \dots , y^{(1)}_{m}, \dots , y^{(L)}_{1}, \ldots , y^{(L)}_{m} \right ]\). Then, the independence of the \(P_{i}\) implies independence, in \(R\), of the rows of

$$\begin{aligned} \begin{pmatrix} P_{1}\left ( Y^{(1)}_{m} \right ) & P_{1}\left ( Y^{(2)}_{m}\right ) & \dots & P_{1}\left ( Y^{(L)}_{m} \right ) \\ P_{2}\left ( Y^{(1)}_{m} \right ) & P_{2}\left ( Y^{(2)}_{m}\right ) & \dots & P_{2}\left ( Y^{(L)}_{m} \right ) \\ \vdots & \vdots & \ddots & \vdots \\ P_{L}\left ( Y^{(1)}_{m} \right ) & P_{L}\left ( Y^{(2)}_{m}\right ) & \dots & P_{L}\left ( Y^{(L)}_{m} \right ) \end{pmatrix}. \end{aligned}$$

A fortiori, also the columns must be independent in \(R\). Hence, the columns must be independent for some realisation of the \(Y^{(\ell )}\). This finally implies that we can find \(x^{(\ell )} \in (\mathbb{F}^{d})^{m+1}\), \(\ell =1, \ldots , L\) such that the columns of

$$\begin{aligned} \begin{pmatrix} P_{1}\left ( \Delta x^{(1)} \right ) & P_{1}\left ( \Delta x^{(2)} \right ) & \dots & P_{1}\left ( \Delta x^{(L)} \right ) \\ P_{2}\left ( \Delta x^{(1)} \right ) & P_{2}\left ( \Delta x^{(2)} \right ) & \dots & P_{2}\left ( \Delta x^{(L)} \right ) \\ \vdots & \vdots & \ddots & \vdots \\ P_{L}\left ( \Delta x^{(1)} \right ) & P_{L}\left ( \Delta x^{(2)} \right ) & \dots & P_{L}\left ( \Delta x^{(L)} \right ) \end{pmatrix}, \end{aligned}$$

are independent. Here, as before, we extend the \(x^{(\ell )}\) constantly to elements of \((\mathbb{F}^{d})^{{\mathbb{N}_{+}}}_{c}\). □

4 Invariants

In the previous section we defined the iterated-sums signature, following the introduction. We now return to our original motivation and first put the concept of “time warping” into a precise mathematical framework. For each index \(n\ge 0\) we define an operator acting on sequences by repeating once the value at time \(n\). More precisely, given a time series \(x\), we define \(\tau _{n}(x)\) as the time series given by

$$ \tau _{n}(x)_{j}:=\textstyle\begin{cases} x_{j}&j\le n \\ x_{j-1}&j>n \end{cases}\displaystyle . $$

Observe that with this definition we have \(\tau _{n}(x)_{n}=\tau _{n}(x)_{n+1}=x_{n}\), and the rest of the values are unchanged save for a time shift after time \(n\).

Definition 4.1

We call a functional \(F: (\mathbb{F}^{d})^{{\mathbb{N}_{+}}}_{c} \to \mathbb{F}\) invariant to time warping if \(F\circ \tau _{n}=F\) for all \(n\ge 1\).

In view of applications to data analysis, such as, e.g., moment corrections, we are mostly interested in polynomial invariants, i.e., invariant functionals that can be expressed by considering only polynomial expressions in a time series.

Definition 4.2

We call \(F: (\mathbb{F}^{d})^{{\mathbb{N}_{+}}}_{c} \to \mathbb{F}\) polynomial if, for all \(N \ge 1\), \(F(x_{0},\ldots ,x_{N}, 0,0,\ldots )\) is a polynomial in the \(x_{i}\) whose degree is uniformly bounded in \(N\).

From the factorisation (16) in Lemma 3.6 it follows that for any word \(w\in T(\mathfrak{A})\) the coefficient \(\langle w,\operatorname{ISS}(x)\rangle \) is a polynomial invariant in this sense. It turns out that these are all the polynomial invariants, if we additionally demand invariance with respect to space translation of the entire series.
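As a quick illustration (reusing iss_coeff and the 2-d series x from Section 3’s sketch), the operator \(\tau _{n}\) leaves every coefficient of the iterated-sums signature unchanged:

```python
def tau(x, n):
    """τ_n: repeat the value at time n once, shifting the tail."""
    return x[: n + 1] + x[n:]

w = ((0,), (0, 1))  # the word [1][12]
assert abs(iss_coeff(w, x) - iss_coeff(w, tau(x, 1))) < 1e-12
```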

Lemma 4.3

Let \(F\) be polynomial and invariant to both time warping and space translations. Then \(F\) is realised as a quasisymmetric function.

Proof

We do the one-dimensional case, \(d=1\), to avoid notational clutter. By translation invariance, for any \(N \ge 1\),

$$\begin{aligned} F(x_{0}, x_{1}, \ldots , x_{N}, 0, 0, \ldots ) = F(0, x_{1}-x_{0}, x_{2} - x_{0},\ldots , x_{N} - x_{0}, 0, 0,\ldots ). \end{aligned}$$

Now, by assumption, this is a polynomial in \(x_{1}-x_{0}, x_{2} - x_{0},\ldots , x_{N}- x_{0}\), hence it is a (different) polynomial in \(x_{1}-x_{0}, x_{2}-x_{1}, x_{3}-x_{2},\ldots ,x_{N}-x_{N-1}\). Therefore, \(F\) can be realised as a formal power series of bounded degree: there is \(\hat{F} \in \mathbb{F}\langle Y_{1}, Y_{2}, \dots \rangle \) of bounded degree such that for \(x \in (\mathbb{F}^{d})^{{\mathbb{N}_{+}}}_{c}\) we have that \(F( x ) = \hat{F}( \Delta x )\).

It remains to show that \(\hat{F}\) is quasisymmetric. Let \(n \ge 1\), \(i_{1} < \cdots < i_{n}\) and \(\alpha _{1}, \dots , \alpha _{n} \ge 1\). We show that the coefficient of the monomial \(Y_{i_{1}}^{\alpha _{1}} \cdots Y_{i_{n}}^{\alpha _{n}}\) in \(\hat{F}\) is equal to the one of \(Y_{1}^{\alpha _{1}} \cdots Y_{n}^{\alpha _{n}}\).

Indeed, by repeatedly using the invariance to time warping, we get that for all \(x_{0},\ldots ,x_{n}\in \mathbb{F}\),

$$\begin{aligned} \hat{F}(\Delta x_{1},\Delta x_{2},\dots , \Delta x_{n},0,0,\ldots ) = \hat{F}(0,\ldots ,0,\underbrace{\Delta x_{1}}_{i_{1}},0,\dots ,0, \underbrace{\Delta x_{2}}_{i_{2}},0, \dots , 0, \underbrace{\Delta x_{n}}_{i_{n}}, 0, \ldots ). \end{aligned}$$

Hence both sides coincide as polynomials, so the coefficients of \(Y_{i_{1}}^{\alpha _{1}} \cdots Y_{i_{n}}^{\alpha _{n}}\) and \(Y_{1}^{\alpha _{1}} \cdots Y_{n}^{\alpha _{n}}\) must coincide. This finishes the proof. □

5 Hoffman’s Isomorphism and Signatures

In this section we relate the iterated-sums signature of a time series with the usual iterated-integrals signature of the piecewise linear interpolation of an associated infinite dimensional time series.

Starting again with the extended alphabet \(\mathfrak{A}\), we build the tensor algebra \(T(\mathfrak{A})\) and define the shuffle product inductively by \(ua \mathbin{⧢} vb := (u \mathbin{⧢} vb)a + (ua \mathbin{⧢} v)b\), for \(u,v\in T(\mathfrak{A})\) and \(a,b\in \mathfrak{A}\).

Recall Hoffman’s isomorphism [27] defined in Theorem 2.6, which shows that \((T(\mathfrak{A}),{⧢},\delta )\) and \(H_{\mathrm{qsh}}=(T(\mathfrak{A}),\star ,\delta )\) are isomorphic as Hopf algebras. Next we compute explicitly the image by the iterated-integrals signature \(S\) of a linear path.

The following lemma is an immediate extension of [19, Example 7.21] to a countable index set.

Lemma 5.1

Consider a countable set \(B\) and let \(z_{t}=z_{0}+at\) for some \(z_{0},a\in \mathbb{R}^{B}\) and all \(t\in [0,1]\). Then for \(w=w_{1} \cdots w_{n} \in T(B)\)

$$ \langle w,S(z)_{s,t}\rangle =\frac{(t-s)^{\ell (w)}}{\ell (w)!}\prod _{j=1}^{ \ell (w)}a^{w_{j}}. $$

At the level of the tensor algebra this simply means that \(S(z)_{s,t}=\exp ^{\centerdot }((t-s)a)\). An analogue of this result holds for discrete signatures, which follows from Lemma 3.6, i.e., Chen’s property.

Lemma 5.2

Let \(x=(0,v,v,\ldots )\) be a time series having a single non-zero increment \(v=(v^{[\mathtt{1}]},\ldots ,v^{[\mathtt{d}]})\in \mathbb{R}^{d}\). Then

$$ \langle w,\operatorname{ISS}(x)\rangle = \textstyle\begin{cases} (v^{[\mathtt{1}]})^{k_{1}}\cdots (v^{[\mathtt{d}]})^{k_{d}}, &w=[ \mathtt{1}^{k_{1}}\cdots \mathtt{d}^{k_{d}}] \\ 0, &\textit{else} \end{cases}\displaystyle . $$

Now we look for a relation between the iterated-integrals signature and the iterated-sums signature. For this, let \(x=(0,x_{1},x_{2},\ldots )\) be a time series and consider the (infinite dimensional!) path \(X=(X^{a}:a\in \mathfrak{A})\) where, for \(a=[\mathtt{1}^{k_{1}}\cdots \mathtt{d}^{k_{d}}]\in \mathfrak{A}\), the component path \(X^{a}\) is the linear interpolation of the time series

$$\begin{aligned} n\mapsto \sum _{j=1}^{n}\Delta x_{j}^{a} =\sum _{j=1}^{n}(\Delta x_{j}^{[ \mathtt{1}]})^{k_{1}}\cdots (\Delta x_{j}^{[\mathtt{d}]})^{k_{d}}. \end{aligned}$$
(18)

Theorem 5.3

We have \(\langle \Phi _{\mathsf{H}}(w),\operatorname{ISS}(x)\rangle = \langle w,S(X)\rangle \).

Remark 5.4

We note that the iterated-integrals signature of the \(d\)-dimensional path consisting of the piecewise linear interpolation of \(x\) is not enough to obtain \(\operatorname{ISS}(x)\). Instead, the theorem shows that the iterated-integrals signature of the piecewise linear interpolation of the infinite-dimensional time series (18) is sufficient.

Proof

Without loss of generality let the interpolation of (18) happen at the time points \(0,1,2,\dots , N\). Then, by Chen’s property,

$$ S(X) = \exp ^{\centerdot }(X_{1})\centerdot \exp ^{\centerdot }(X_{2}-X_{1}) \centerdot \cdots \centerdot \exp ^{\centerdot }(X_{N}-X_{N-1}). $$

We first investigate what happens for a single time step. Let a word \(w=[a_{1}]\cdots [a_{p}]\in T(\mathfrak{A})\) be given, and write \(a_{i} = [\mathtt{1}^{k_{1}^{i}}\cdots \mathtt{d}^{k_{d}^{i}}] \in \mathfrak{A}\), \(i=1,\ldots ,p\). According to Lemma 5.1,

$$\begin{aligned} \langle w,\exp ^{\centerdot }(X_{j}-X_{j-1})\rangle &=\frac{1}{p!} \Delta x_{j}^{[a_{1}]}\cdots \Delta x_{j}^{[a_{p}]} =\frac{1}{p!}( \Delta x_{j}^{[\mathtt{1}]})^{K_{\mathtt{1}}} \cdots (\Delta x_{j}^{[ \mathtt{d}]})^{K_{\mathtt{d}}}\\ & =\frac{1}{p!}\Delta x_{j}^{[ \mathtt{1}^{K_{\mathtt{1}}}\cdots \mathtt{d}^{K_{\mathtt{d}}}]}. \end{aligned}$$

Here \(K_{\mathtt{m}}:=k_{\mathtt{m}}^{1}+\cdots +k_{\mathtt{m}}^{p}\); in other words, \(K_{\mathtt{m}}\) is the total number of occurrences of the letter \(\mathtt{m}\in \{\mathtt{1},\ldots ,\mathtt{d}\}\) in \(w\).

Now the only term in \(\Phi _{\mathsf{H}}(w)\) containing a single letter is \(\frac{1}{p!}[\mathtt{1}^{K_{\mathtt{1}}}\cdots \mathtt{d}^{K_{ \mathtt{d}}}]\), i.e., the full “contraction”. Then, by Lemma 5.2,

$$\begin{aligned} \langle \Phi _{\mathsf{H}}(w), \operatorname{ISS}(x)_{j-1,j} \rangle &= \langle \frac{1}{p!}[\mathtt{1}^{K_{\mathtt{1}}}\cdots \mathtt{d}^{K_{\mathtt{d}}}], \operatorname{ISS}(x)_{j-1,j} \rangle \\ &= \frac{1}{p!}\Delta x_{j}^{[\mathtt{1}^{K_{\mathtt{1}}}\cdots \mathtt{d}^{K_{\mathtt{d}}}]} \\ &= \langle w, S(X)_{j-1,j} \rangle . \end{aligned}$$

Therefore, we have shown the claim for a single time step.

Now, since \(\Phi _{\mathsf{H}}\) is a Hopf algebra map, the statement of the theorem is equivalent to showing that \(\Phi _{\mathsf{H}}^{*}(\operatorname{ISS}(x))=S(X)\), where \(\Phi _{\mathsf{H}}^{*}\) is the adjoint of Hoffman’s isomorphism. Since \(\Phi _{\mathsf{H}}^{*}\) is an algebra morphism, we calculate

$$\begin{aligned} \Phi _{\mathsf{H}}^{*}(\operatorname{ISS}(x)_{0,N})&=\Phi _{ \mathsf{H}}^{*}(\operatorname{ISS}(x)_{0,1})\centerdot \cdots \centerdot \Phi _{\mathsf{H}}^{*}(\operatorname{ISS}(x)_{N-1,N})=S(X)_{0,1} \centerdot \cdots \centerdot S(X)_{N-1,N} \\ &=S(X)_{0,N}. \end{aligned}$$

So the result is valid for the full signature. □
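Theorem 5.3 can be checked numerically at low levels. The following sketch does this for the word \(w=[\mathtt{1}][\mathtt{2}]\), assuming the level-two instance of Hoffman’s exponential, \(\Phi _{\mathsf{H}}([\mathtt{1}][\mathtt{2}])=[\mathtt{1}][\mathtt{2}]+\frac{1}{2}[\mathtt{12}]\); the grid resolution and the test data are our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
dx = rng.normal(size=(N, 2))              # increments of a 2-dim time series
d1, d2 = dx[:, 0], dx[:, 1]

# left-hand side: <Phi_H([1][2]), ISS(x)> = <[1][2], ISS> + (1/2) <[12], ISS>
iss_12 = sum(d1[i] * d2[j] for i in range(N) for j in range(i + 1, N))
lhs = iss_12 + 0.5 * np.sum(d1 * d2)

# right-hand side: <[1][2], S(X)> for the piecewise linear interpolation of
# the lift (18); the components X^[1], X^[2] are fine-sampled and the
# iterated integral is taken as a left-point Riemann-Stieltjes sum.
K = 400                                    # samples per time step (our choice)
grid = np.linspace(0, N, N * K + 1)
X1 = np.interp(grid, np.arange(N + 1), np.concatenate([[0.0], np.cumsum(d1)]))
X2 = np.interp(grid, np.arange(N + 1), np.concatenate([[0.0], np.cumsum(d2)]))
rhs = np.sum(X1[:-1] * np.diff(X2))        # double integral over u1 < u2, as X1(0) = 0

print(lhs, rhs)                            # agree up to discretisation error
```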

Finally, we show a consistency result.

Proposition 5.5

Let \(X\colon [0,1]\to {\mathbb{R}}^{d}\) be a continuous path of finite variation, meaning that

$$ \sup _{\pi }\sum _{[s,t]\in \pi }\lVert X_{t}-X_{s}\rVert < \infty $$

where the supremum is taken over all partitions \(\pi \) of \([0,1]\).

Given such a partition \(\pi =\{t_{0}=0< t_{1}<\cdots <t_{N-1}<t_{N}=1\}\), define \(x(\pi )\) by \(x(\pi )_{j}=X_{t_{j}}\). Then

$$ \lim _{\lvert \pi \rvert \to 0}\langle w,\operatorname{ISS}(x(\pi ))_{0,N} \rangle = \textstyle\begin{cases} \langle w,S(X)_{0,1}\rangle , &w\in T(A) \\ 0, &w\not \in T(A) \end{cases}\displaystyle . $$

Proof

We use induction on the length \(\ell (w)\). If \(\ell (w)=1\) and \(w=\mathtt{i}\in A\), then

$$ \langle \mathtt{i},\operatorname{ISS}(x(\pi ))_{0,N}\rangle =\sum _{[s,t] \in \pi }( x(\pi )^{\mathtt{i}}_{t}-x(\pi )^{\mathtt{i}}_{s} )=X^{\mathtt{i}}_{1}-X^{\mathtt{i}}_{0}= \int _{0}^{1}\mathrm{d}X^{\mathtt{i}}_{s} $$

which is independent of \(\pi \). If, on the other hand, \(w=a\in \mathfrak{A}\setminus A\) then \(a=[\mathtt{1}^{k_{1}}\cdots \mathtt{d}^{k_{d}}]\) with \(k_{1}+\cdots +k_{d}\ge 2\). Therefore

$$\begin{aligned} \lvert \langle a,\operatorname{ISS}(x(\pi ))_{0,N}\rangle \rvert &= \left \lvert \sum _{j=1}^{N}\prod _{i=1}^{d}(\Delta x(\pi )_{j}^{i})^{k_{i}} \right \rvert \\ &\le \sum _{j=1}^{N}\prod _{i=1}^{d}\lvert \Delta x(\pi )_{j}^{i} \rvert ^{k_{i}} \\ &= \sum _{j=1}^{N}\prod _{i=1}^{d}\left \lvert X^{i}_{t_{j}}-X^{i}_{t_{j-1}} \right \rvert ^{k_{i}} \\ &\le \sum _{j=1}^{N}\lVert X_{t_{j}}-X_{t_{j-1}} \rVert ^{k_{1}+\cdots +k_{d}} \\ &\le \lVert X\rVert _{1}\sup _{j=1,\ldots ,N}\lVert X_{t_{j}}-X_{t_{j-1}} \rVert ^{k_{1}+\cdots +k_{d}-1} \end{aligned}$$

which vanishes in the limit since \(X\) is uniformly continuous on \([0,1]\).

Now suppose \(w=w'a\) for some \(w'\in T(\mathfrak{A})\) and \(a\in \mathfrak{A}\). We distinguish three cases:

  1. (1)

\(w'\notin T(A)\): in this case, no matter what \(a\) is, we have

$$\begin{aligned} \lvert \langle w,\operatorname{ISS}(x(\pi ))_{0,N}\rangle \rvert & \le \sum _{j=1}^{N}\lvert \langle w',\operatorname{ISS}(x(\pi ))_{0,j-1} \rangle \rvert \,\lvert \Delta x(\pi )_{j}^{a}\rvert \\ &\le \sup _{j=1,\ldots ,N}\lVert X_{t_{j}}-X_{t_{j-1}}\rVert ^{|a|} \sum _{j=1}^{N}\lvert \langle w',\operatorname{ISS}(x(\pi ))_{0,j-1} \rangle \rvert \to 0 \end{aligned}$$

    as \(\lvert \pi \rvert \to 0\), by the induction hypothesis.

  2. (2)

    \(w'\in T(A)\) and \(a\in \mathfrak{A}\setminus A\): the same argument as before gives that the corresponding entry in \(\operatorname{ISS}(x(\pi ))\) vanishes in the limit.

  3. (3)

    \(w'\in T(A)\) and \(a\in A\): again by definition we have

    $$ \langle w,\operatorname{ISS}(x(\pi ))_{0,N}\rangle =\sum _{j=1}^{N} \langle w',\operatorname{ISS}(x(\pi ))_{0,j-1}\rangle (X^{a}_{t_{j}}-X^{a}_{t_{j-1}}) $$

    which converges to the Young (or Riemann–Stieltjes) integral

    $$ \int _{0}^{1}\langle w',S(X)_{0,s}\rangle \,\mathrm{d}X^{a}_{s}= \langle w,S(X)_{0,1}\rangle . $$

Therefore, we have that

$$ \lim _{|\pi |\to 0}\langle w,\operatorname{ISS}(x(\pi ))_{0,N} \rangle =\langle w,S(X)_{0,1}\rangle $$

if \(w\in T(A)\), while the limit vanishes otherwise. □
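Proposition 5.5 is easy to observe numerically. In the following sketch (the test path \(X_{t}=(\cos 2\pi t,\sin 2\pi t)\) is our choice) the entry for \([\mathtt{1}][\mathtt{2}]\in T(A)\) converges to \(\langle [\mathtt{1}][\mathtt{2}],S(X)_{0,1}\rangle \), which one can check equals \(\pi \) for this path, while the entry for the contracted letter \([\mathtt{12}]\notin T(A)\) vanishes as the mesh is refined:

```python
import numpy as np

def iss_entries(N):
    """<[1][2], ISS(x(pi))> and <[12], ISS(x(pi))> for the uniform
    partition of [0, 1] into N steps, sampled from the test path."""
    t = np.linspace(0.0, 1.0, N + 1)
    X = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
    d1, d2 = np.diff(X[:, 0]), np.diff(X[:, 1])
    word_12 = np.sum(np.cumsum(d1)[:-1] * d2[1:])   # strict sum over i1 < i2
    letter_12 = np.sum(d1 * d2)                     # the contracted letter [12]
    return word_12, letter_12

for N in (10, 100, 1000, 10000):
    print(N, iss_entries(N))
# first entry converges to pi, second entry decays like O(1/N)
```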

5.1 The Area Operation

It is well known that for the iterated-integrals signature certain linear combinations of the entries have a precise geometric interpretation. Indeed, for any \(\mathtt{i},\mathtt{j}\in A\)

$$ \langle \mathtt{ij}-\mathtt{ji}, S(X)_{s,t}\rangle =\iint \limits _{s< u_{1}< u_{2}< t}( \mathrm{d}X^{\mathtt{i}}_{u_{1}}\mathrm{d}X^{\mathtt{j}}_{u_{2}} - \mathrm{d}X^{\mathtt{j}}_{u_{1}}\mathrm{d}X^{\mathtt{i}}_{u_{2}}) =:\operatorname{Area}(X^{\mathtt{i}},X^{\mathtt{j}})_{s,t}, $$

represents twice the signed area (or Lévy area) enclosed by the curve \(u\mapsto (X_{u}^{\mathtt{i}},X_{u}^{\mathtt{j}})\), \(u\in [s,t]\), and the chord connecting \((X_{s}^{\mathtt{i}},X_{s}^{\mathtt{j}})\) to \((X_{t}^{\mathtt{i}},X_{t}^{\mathtt{j}})\).

We abstract this operation to the shuffle algebra by using the notion of half-shuffles introduced in Section 2.1. In fact, one verifies that at this level the area operation may be represented in terms of half-shuffle operations as

$$ \mathtt{ij}-\mathtt{ji}=\mathtt{i}\succ \mathtt{j}-\mathtt{j} \succ \mathtt{i}=:\operatorname{area}(\mathtt{i},\mathtt{j}), $$

so that in particular \(\operatorname{Area}(X^{\mathtt{i}},X^{\mathtt{j}})_{s,t}=\langle \operatorname{area}( \mathtt{i},\mathtt{j}),S(X)_{s,t}\rangle \).

We extend this by defining area operations on \(H_{\mathrm{sh}}=(T(\mathfrak{A}),\shuffle ,\delta )\) and \(H_{\mathrm{qsh}}=(T(\mathfrak{A}),\star ,\delta )\).

Definition 5.6

The area map \(\operatorname{area}\colon H_{\mathrm{sh}}\otimes H_{\mathrm{sh}}\to H_{\mathrm{sh}}\) is defined by

$$ \operatorname{area}(u,v):=u\succ v-v\succ u. $$

Next, the discrete analogue is given in terms of the first half-shuffle product in (7).

Definition 5.7

(Discrete area)

The discrete area map \(\operatorname{\mathtt{area}}\colon H_{\mathrm{qsh}}\otimes H_{\mathrm{qsh}}\to H_{ \mathrm{qsh}}\) is defined by

$$ \operatorname{\mathtt{area}}(u,v):=u\mathbin{\dot{\succ }}v-v\mathbin{ \dot{\succ }}u. $$

We compare the two areas by considering the words \(u=[\mathtt{3}]\) and \(v=[\mathtt{4}][\mathtt{12}]\). Then

$$\begin{aligned} \operatorname{area}([\mathtt{3}],[\mathtt{4}][\mathtt{12}]) &=[\mathtt{3}][ \mathtt{4}][\mathtt{12}] +[\mathtt{4}][\mathtt{3}][\mathtt{12}] -[ \mathtt{4}][\mathtt{12}][\mathtt{3}], \\ \operatorname{\mathtt{area}}([\mathtt{3}],[\mathtt{4}][\mathtt{12}]) &=[\mathtt{3}][ \mathtt{4}][\mathtt{12}] +[\mathtt{4}][\mathtt{3}][\mathtt{12}] +[ \mathtt{34}][\mathtt{12}] -[\mathtt{4}][\mathtt{12}][\mathtt{3}] \end{aligned}$$

as follows from Example 8.
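The quasi-shuffle recursion behind the half-shuffle in (7) makes such computations mechanical. The following is a minimal sketch (letters are encoded as sorted tuples of indices, e.g. \((1,2)\) for \([\mathtt{12}]\); all names are ours) that reproduces the expansion of \(\operatorname{\mathtt{area}}([\mathtt{3}],[\mathtt{4}][\mathtt{12}])\) above:

```python
def qshuffle(u, v):
    """Quasi-shuffle of two words as {word: coefficient}; words are tuples
    of letters, a letter being a sorted tuple of indices, e.g. (1, 2) = [12]."""
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    def add(w, c):
        out[w] = out.get(w, 0) + c
    for w, c in qshuffle(u[:-1], v).items():      # last letter of u stays last
        add(w + (u[-1],), c)
    for w, c in qshuffle(u, v[:-1]).items():      # last letter of v stays last
        add(w + (v[-1],), c)
    merged = tuple(sorted(u[-1] + v[-1]))         # contraction [a . b]
    for w, c in qshuffle(u[:-1], v[:-1]).items():
        add(w + (merged,), c)
    return out

def half_qshuffle(u, v):
    """First half quasi-shuffle: the last letter of v stays last, unmerged."""
    return {w + (v[-1],): c for w, c in qshuffle(u, v[:-1]).items()}

def discrete_area(u, v):
    out = dict(half_qshuffle(u, v))
    for w, c in half_qshuffle(v, u).items():
        out[w] = out.get(w, 0) - c
    return out

# area([3], [4][12]) = [3][4][12] + [4][3][12] + [34][12] - [4][12][3]
print(discrete_area(((3,),), ((4,), (1, 2))))
```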

Both \(\operatorname{area}\) and \(\operatorname{\mathtt{area}}\) can be iterated. We now make this precise: define \(\mathsf{D}_{1}=D_{1} :=\mathbb{F}\,\mathfrak{A}\), the vector space spanned by the set \(\mathfrak{A}\). Then, inductively define vector spaces

$$\begin{aligned} D_{n+1} &:=\operatorname{span}_{\mathbb{F}}\{ \operatorname{area}(D_{n+1-m},D_{m}) : m \le n \}, \\ \mathsf{D}_{n+1} &:=\operatorname{span}_{\mathbb{F}}\{ \operatorname{\mathtt{area}}( \mathsf{D}_{n+1-m},\mathsf{D}_{m}) : m \le n \}. \end{aligned}$$

We finally set

$$ D:=\bigoplus _{n\ge 1}D_{n}, \qquad \mathsf{D}:=\bigoplus _{n\ge 1}\mathsf{D}_{n}. $$
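Iterated areas can likewise be computed mechanically. The following sketch (same tuple encoding of words as in the earlier sketches; the element \(\operatorname{area}([\mathtt{1}],\operatorname{area}([\mathtt{2}],[\mathtt{3}]))\in D_{3}\) is our example) expands an area-of-areas into its six constituent words:

```python
def shuffle(u, v):
    """Shuffle of two words (tuples of letters), as {word: coefficient}."""
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    for w, c in shuffle(u[:-1], v).items():
        out[w + (u[-1],)] = out.get(w + (u[-1],), 0) + c
    for w, c in shuffle(u, v[:-1]).items():
        out[w + (v[-1],)] = out.get(w + (v[-1],), 0) + c
    return out

def half_shuffle(p, q):
    """Half-shuffle on linear combinations {word: coeff}: the last letter
    of each word of q stays last."""
    out = {}
    for qw, qc in q.items():
        for pw, pc in p.items():
            for w, c in shuffle(pw, qw[:-1]).items():
                out[w + (qw[-1],)] = out.get(w + (qw[-1],), 0) + c * pc * qc
    return out

def area(p, q):
    out = half_shuffle(p, q)
    for w, c in half_shuffle(q, p).items():
        out[w] = out.get(w, 0) - c
    return out

one, two, three = {("1",): 1}, {("2",): 1}, {("3",): 1}
# an element of D_3: area([1], area([2], [3])) expands into six words
print(area(one, area(two, three)))
```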

Neither \(\operatorname{area}\) nor the discrete \(\operatorname{\mathtt{area}}\) operation is associative. One can show, however, that \(\operatorname{area}\) satisfies a fourth-order relation, the tortkara identity, introduced by Dzhumadil’daev in 2007 [14]. In [15] the image of iterated applications of the area map is characterised (compare also [45, Theorem 28]).

Theorem 5.8

([15, Theorem 2.1])

The space \(D\) is spanned by the set

$$ \mathfrak{A}\cup \{\,u([a][b]-[b][a]):a,b\in \mathfrak{A}, u\in T( \mathfrak{A})\,\}. $$

From Lemma 2.7 and Lemma 2.8 we deduce the following morphism property of Hoffman’s isomorphism with respect to \(\operatorname{area}\) and \(\operatorname{\mathtt{area}}\).

Theorem 5.9

\(\Phi _{\mathrm{H}}\colon D\to \mathsf{D}\) is a tortkara morphism, i.e., for \(\varphi , \psi \in D\)

$$ \Phi _{\mathrm{H}}(\operatorname{area}(\varphi ,\psi )) =\operatorname{\mathtt{area}}(\Phi _{\mathrm{H}}( \varphi ),\Phi _{\mathrm{H}}(\psi )). $$

Remark 5.10

1. Note that \(\Phi _{\mathrm{H}}\) is not a (quasi-)half-shuffle morphism; only the anti-symmetrised operations \(\operatorname{area}\) and \(\operatorname{\mathtt{area}}\), respectively, are compatible with it.

2. The set \(D\) (the set of “areas-of-areas”) is known to generate \(T(\mathfrak{A})\) as a shuffle-algebra, see [12]. Applied to iterated-integral signatures this means that all their information is already contained in areas-of-areas. The area operation \((X,Y) \mapsto \operatorname{Area}(X,Y)\) has an immediate geometric interpretation, whereas the operation of integration \((X,Y) \mapsto \int X \,\mathrm{d}Y\) does not. Moreover, the area operation is related to antisymmetrised lead-lag correlation in time series analysis, see [13, Section 3.2]. We refer to [12, Section 6] for more applications.

Proof

By Dzhumadil’daev’s theorem (Theorem 5.8) it suffices to prove the claim for the case when \(\varphi =u([a][b]-[b][a])\) and \(\psi =v([c][d]-[d][c])\). We first observe that in this case, using the half-shuffle identity \(x\succ yb=(x\shuffle y)b\), the \(\operatorname{area}\) operation can be written more explicitly:

$$ \operatorname{area}(\varphi ,\psi )=(\varphi \shuffle v[c])[d]-(\varphi \shuffle v[d])[c]-(\psi \shuffle u[a])[b]+(\psi \shuffle u[b])[a]. $$

Each of these terms can be further expanded into three terms by splitting off the last letter once more. For example, the first one equals

$$ (\varphi \shuffle v[c])[d]=(\varphi \shuffle v)[c][d]+(u[a]\shuffle v[c])[b][d]-(u[b]\shuffle v[c])[a][d]. $$

In total there are 12 terms; the remaining 9 are

$$\begin{aligned} -(\varphi \shuffle v[d])[c]&=-(\varphi \shuffle v)[d][c]-(u[a]\shuffle v[d])[b][c]+(u[b]\shuffle v[d])[a][c], \\ -(\psi \shuffle u[a])[b]&=-(\psi \shuffle u)[a][b]-(u[a]\shuffle v[c])[d][b]+(u[a]\shuffle v[d])[c][b], \\ (\psi \shuffle u[b])[a]&=(\psi \shuffle u)[b][a]+(u[b]\shuffle v[c])[d][a]-(u[b]\shuffle v[d])[c][a]. \end{aligned}$$

For each of these terms we can find exactly one other term such that their sum is of the form \(w([x][y]-[y][x])\), for \([x],[y]\in \{[a],[b],[c],[d]\}\), and thus by Lemma 2.8 the image of this sum has the form \(\Phi _{\mathrm{H}}(w[x])[y]-\Phi _{\mathrm{H}}(w[y])[x]\). To summarise, the image \(\Phi _{\mathrm{H}}(\operatorname{area}(\varphi ,\psi ))\) is a linear combination of 6 terms, each of them having the form \(\Phi _{\mathrm{H}}(w[x])[y]-\Phi _{\mathrm{H}}(w[y])[x]\). Now, if we pick any \([x]\in \{[a],[b],[c],[d]\}\) there are exactly three terms containing \([x]\) as the last letter. For example, for \([a]\) these terms are

$$ \Phi _{\mathrm{H}}\bigl((\psi \shuffle u)[b]\bigr)[a]+\Phi _{\mathrm{H}}\bigl((u[b]\shuffle v[c])[d]\bigr)[a]-\Phi _{\mathrm{H}}\bigl((u[b]\shuffle v[d])[c]\bigr)[a] =\Phi _{\mathrm{H}}(\psi \shuffle u[b])[a], $$

where the last identity is easy to check using that \(\psi =v([c][d]-[d][c])\). Applying a similar argument to all letters we see that

$$\begin{aligned} \Phi _{\mathrm{H}}(\operatorname{area}(\varphi ,\psi )) &=\Phi _{\mathrm{H}}(\varphi \shuffle v[c])[d]-\Phi _{\mathrm{H}}(\varphi \shuffle v[d])[c]-\Phi _{\mathrm{H}}(\psi \shuffle u[a])[b]+\Phi _{\mathrm{H}}(\psi \shuffle u[b])[a] \\ &=\Phi _{\mathrm{H}}(\varphi )\mathbin{\dot{\succ }}\Phi _{\mathrm{H}}(\psi )-\Phi _{\mathrm{H}}(\psi )\mathbin{\dot{\succ }}\Phi _{\mathrm{H}}(\varphi ) =\operatorname{\mathtt{area}}(\Phi _{\mathrm{H}}(\varphi ),\Phi _{\mathrm{H}}(\psi )), \end{aligned}$$

where the last step uses that \(\Phi _{\mathrm{H}}\) is an algebra morphism from \((T(\mathfrak{A}),\shuffle )\) to \((T(\mathfrak{A}),\star )\), together with \(p\mathbin{\dot{\succ }}q[y]=(p\star q)[y]\) and Lemma 2.8.

 □

6 Conclusion

In this work we have

  • introduced a new set of features for multidimensional time series consisting in iterated sums (Section 3);

• shown that these features are invariant to time warping and that they are in fact all the (polynomial) invariants in this sense (Section 4);

  • described a Hopf algebraic framework to compute these features (Section 2);

  • shown how this setting mirrors the one of iterated-integrals in some aspects and differs in others (Section 2).

There are several possible generalisations of our work.

  • Let \(f, g: \mathbb{F}\to \mathbb{F}\) be such that \(f(0) = g(0) = 0\). Then iterated-sums of the form

    $$\begin{aligned} \sum _{i_{1} < i_{2}} f\left ( \Delta x_{i_{1}} \right ) g\left ( \Delta x_{i_{2}} \right ), \end{aligned}$$

are also invariant to time warping (and analogously for higher order iterated-sums); see the sketch after this list. These are, in general, no longer polynomial in the time series, but might still be relevant for certain applications. For smooth \(f\), \(g\) this should be related to the expansion of nonlinear functionals on stochastic word series [11], but the non-smooth case (for example \(f(x) = x\), \(g(x) = |x|\)) is particularly interesting.

• Multi-parameter data. Objects of interest are, for example, “images” \(I: [0,N]\times [0,N] \to {\mathbb{R}}\), for which time warping invariance becomes an invariance to stretching of the image.
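To illustrate the first generalisation above, here is a small sketch (the helper name iterated_sum_fg and the test data are ours) checking that such an \(f,g\)-iterated sum is unchanged when the series stutters:

```python
import numpy as np

def iterated_sum_fg(x, f, g):
    """Sum over i1 < i2 of f(dx_{i1}) * g(dx_{i2}); our helper, not from the text."""
    dx = np.diff(x)
    fs, gs = f(dx), g(dx)
    return float(np.sum(np.cumsum(fs)[:-1] * gs[1:]))

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=10))       # a 1-dim time series

# warp time by "stuttering": repeat the value at position 4
x_warped = np.insert(x, 4, x[4])

f = lambda z: z                          # f(x) = x
g = lambda z: np.abs(z)                  # g(x) = |x|, non-smooth, g(0) = 0
print(iterated_sum_fg(x, f, g), iterated_sum_fg(x_warped, f, g))   # equal
```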

We are also interested in exploring the possible applications of these invariants in data science.

• Retrieval of similar time series, invariant to time warping: see [47] (and references therein), where it is stated that “the time warping distance …does not lead to any natural features”. The invariants presented in our work should provide those missing features, but a mathematically rigorous proof of this statement is left for future work.

  • Statistical inference in problems involving unknown time warping, as in Example 1.2.

• Time series clustering: the features of this work can be used to cluster time series according to their “shape”, i.e., independently of time warping. Sometimes a “prototype” for each cluster is sought, see for example [43]. In this case, as in the previous point, reconstruction of a time series from an (averaged) iterated-sums signature would be necessary. A detailed study of this ostensibly hard problem is left for future research.

We close with some open questions. At the end of Section 3 we showed that an analogue of Chow’s theorem does not hold for the iterated-sums signature \(\operatorname{ISS}(x)\).

  • Can we understand \(\{ \operatorname{ISS}(x) : x \in (\mathbb{F}^{d})^{\mathbb{N}_{+}}_{c} \}\) as a semi-algebraic set? (Compare [3] for the investigation of the image of iterated-integrals signatures as algebraic sets.)

  • For \(x \in (\mathbb{F}^{d})^{N}\) denote by \(\overleftarrow{x}\) the time series run backwards. Then (as might surprise readers familiar with Chen’s signature) \(\operatorname{ISS}(\overleftarrow{x}) \centerdot \operatorname{ISS}(x) \neq \varepsilon \). What are the implications?

• The lead-lag procedure of [17] lifts a discrete time series of dimension \(d\) to a piecewise smooth curve of dimension \(2d\). Since the resulting iterated-integrals signature is invariant to time warping as well as space translations, and is polynomial in the original time series, by Lemma 4.3 it must be contained in the iterated-sums signature \(\operatorname{ISS}(x)\). Conversely, is the signature of the resulting \(2d\)-dimensional curve enough to recover the iterated-sums signature? This would give a finite dimensional, piecewise smooth curve whose iterated-integrals signature contains the invariants presented in this paper (compare Theorem 5.3, where an infinite dimensional curve does the job).