In the general case, the problem of reconstructing an object of some nature from incomplete information about its “parts” can be treated as a pattern recognition problem [1]. Specifically, when the objects are words, i.e., sequences of symbols over an alphabet, the question arises as to whether a word can be reconstructed from a set or multiset of its subwords. This problem belongs to the area of discrete mathematics known as combinatorics on words [2–6], which studies the relationship between sequences of symbols and sets of their subsequences.

The problem of reconstructing words from subwords has a number of applications. In character encoding, arbitrary information is represented in the form of sequences of symbols and words [7]. This representation is used, for example, in data transmission through communication channels [8], where it is important either to transmit the encoded information reliably without loss or, if losses occur, to recover the original word from the available fragments [9]. Character encoding is also used in time series analysis [10] and in bioinformatics [11].

Estimates for the fragment length sufficient for an arbitrary word to be reconstructed from the multiset of all its subwords were obtained in [12, 13].

In this paper, we impose on the words to be reconstructed the constraint that they contain a periodic subword. This makes it possible to reduce the fragment length sufficient for word reconstruction. For example, a result concerning the reconstruction of a word with a periodic suffix and an aperiodic prefix was presented in [17]. Below, this result of [17] is strengthened in Theorem 3, which deals with the reconstruction of a word with a periodic suffix and a periodic prefix. Theorems 1 and 2 concern the reconstruction of periodic words and repeat their counterparts from [17]. Additionally, some further cases of words that are not periodic but contain periodic subwords are considered. Specifically, we state and prove theorems on the reconstruction of a word with an aperiodic subword contained between a periodic prefix and a periodic suffix (Theorem 4) and of a periodic word with a constraint on the repeated subword (Theorem 5). In turn, each of these two theorems contains Theorem 3 as a special case.

1 FORMULATION OF THE PROBLEM

We introduce the following notation and definitions. In what follows, Greek letters denote words, and lowercase Latin letters denote alphabetic symbols. Let \(E_{2}^{n}\) denote the set of words of length \(n\) over the alphabet \(\{0, 1\}\). For a word \(\alpha = (a_{1}a_{2} \ldots a_{n}) \in E_{2}^{n}\), the symbol \(\left| \alpha \right|\) denotes the sum of its elements: \(\left| \alpha \right| = a_{1} + a_{2} + \ldots + a_{n}\). Let \(\lambda\) denote the empty word, i.e., the word of length zero. The word \(\alpha\) raised to power \(s\) denotes the word consisting of \(s\) repeats of the word \(\alpha\), i.e., \(\alpha^{s} = \beta_{1}\beta_{2} \ldots \beta_{s}\), where \(\beta_{i} = \alpha\), \(i \in \{1, 2, \ldots, s\}\).

Given a word \(\alpha = \left( {{{a}_{1}}{{a}_{2}} \ldots {{a}_{n}}} \right) \in E_{2}^{n}\) and a reference vector \({v} = \left( {{{{v}}_{1}}{{{v}}_{2}} \ldots {{{v}}_{n}}} \right)\), where \({{{v}}_{i}} \in \left\{ {0,~1} \right\},~i \in \left\{ {1,~2, \ldots ,~n} \right\}\), the fragmentation operation \(\langle \alpha \cdot {v}\rangle \) constructs a word of length \(\left| {v} \right|\) according to the following rule:

$$\langle \alpha \cdot {v}\rangle = \left\{ {\begin{array}{*{20}{c}} {{{a}_{i}},\quad {{{v}}_{i}} = 1} \\ {\lambda ,\quad {{{v}}_{i}} = 0.} \end{array}} \right.$$
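The fragmentation operation can be illustrated by a minimal Python sketch. It assumes that words and reference vectors are stored as strings over `0` and `1`; the function name `fragment` is introduced here only for illustration.

```python
def fragment(alpha: str, v: str) -> str:
    """Return <alpha . v>: keep a_i where v_i = 1, drop it where v_i = 0."""
    if len(alpha) != len(v):
        raise ValueError("alpha and v must have the same length")
    # Dropping a symbol corresponds to substituting the empty word lambda.
    return "".join(a for a, bit in zip(alpha, v) if bit == "1")


# The reference vector 01101 selects positions 2, 3, and 5 of the word 10110.
print(fragment("10110", "01101"))
```

The length of the result equals the number of ones in the reference vector, in agreement with the definition.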

A fragment, or subword, of length \(k\) of a word \(\alpha = (a_{1}a_{2} \ldots a_{n}) \in E_{2}^{n}\) is a word \(\tilde{\alpha} \in E_{2}^{k}\) of the form

$$\tilde {\alpha } = \left( {{{a}_{{{{i}_{1}}}}}{{a}_{{{{i}_{2}}}}} \ldots {{a}_{{{{i}_{k}}}}}} \right),\quad 1 \leqslant {{i}_{1}} < ~\,\,{{i}_{2}} < \ldots < {{i}_{k}} \leqslant n.$$

Let \(f{\text{*}}\left( \alpha \right)\) be the smallest fragment length k for which the word α of length \(n\) is uniquely reconstructed from a multiset of its subwords of length k.
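For very short words, the quantity \(f{\text{*}}(\alpha)\) can be found by exhaustive search. The following Python sketch is only an illustration under the assumption that words are strings over `0` and `1`; the helper names `deck` and `f_star` are not part of the original notation, and the search is exponential in \(n\).

```python
from collections import Counter
from itertools import combinations, product


def deck(alpha: str, k: int) -> Counter:
    """Multiset of all length-k fragments (subsequences) of alpha."""
    return Counter(
        "".join(alpha[i] for i in idx)
        for idx in combinations(range(len(alpha)), k)
    )


def f_star(alpha: str) -> int:
    """Smallest k for which no other binary word of the same length
    has the same multiset of length-k fragments (brute force)."""
    n = len(alpha)
    others = ["".join(w) for w in product("01", repeat=n)]
    others.remove(alpha)
    for k in range(1, n + 1):
        target = deck(alpha, k)
        if all(deck(beta, k) != target for beta in others):
            return k
    return n  # only reached for the empty word


print(f_star("0001"))
```

For \(k = n\), the only fragment is the word itself, so the loop always terminates with some \(k \leqslant n\).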

Problem. In the general form, the problem of reconstructing a word from a multiset of its subwords is formulated as follows. Given

• a set of reference vectors V = \(\{ {{{v}}^{1}},~{{{v}}^{2}},~ \ldots ,~{{{v}}^{N}}\} ,{{{v}}^{i}} \in E_{2}^{n}\), \({\text{|}}{{{v}}^{i}}{\text{|}} = k,i \in \left\{ {1,~2,~ \ldots ,~N} \right\}\), and

• a set of words \(X = \{\chi^{1}, \chi^{2}, \ldots, \chi^{N}\}\), \(\chi^{i} \in E_{2}^{k}\), \(i \in \{1, 2, \ldots, N\}\),

the task is to check whether X is a set of fixed-length fragments of some unknown word \(\alpha \in E_{2}^{n}\) constructed with the help of fragmentation operations with vectors from V and to find all possible solutions.

Note that all estimates obtained in the case of the binary alphabet remain valid for an arbitrary alphabet [3], since, in the case of the alphabet \(\left\{ {0,1, \ldots ,~k} \right\}\), the problem is reduced to a set of \(k\) problems in the binary alphabet with the help of the mappings

$$\left\{ {\begin{array}{*{20}{c}} {{{\varphi }_{i}}:x \mapsto \left\{ {\begin{array}{*{20}{c}} {1,\quad x = i} \\ {0,\quad x \ne i,} \end{array}} \right.} \\ {i \in \left\{ {0,~1,~ \ldots ,~k} \right\}.} \end{array}} \right.$$

For this reason, in what follows, we consider only binary words \(\alpha = \left( {{{a}_{1}}{{a}_{2}} \ldots {{a}_{n}}} \right),~{{a}_{i}} \in \left\{ {0,~1} \right\}\). Moreover, we are interested in periodic words.
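The reduction from an arbitrary alphabet to the binary one can be sketched as follows. This is an illustrative Python fragment, assuming that a word is given as a list of symbols; the names `indicator_words` and `merge_indicators` are hypothetical.

```python
def indicator_words(word, alphabet):
    """Apply each mapping phi_i to the word: one binary word per symbol i."""
    return {
        i: "".join("1" if x == i else "0" for x in word)
        for i in alphabet
    }


def merge_indicators(indicators):
    """Inverse step: rebuild the original word from its indicator words."""
    n = len(next(iter(indicators.values())))
    word = []
    for pos in range(n):
        # Exactly one indicator word carries a 1 at each position.
        (symbol,) = [i for i, w in indicators.items() if w[pos] == "1"]
        word.append(symbol)
    return word


w = [0, 2, 1, 2, 0]
binary = indicator_words(w, range(3))
print(binary)
print(merge_indicators(binary) == w)
```

Once every binary indicator word has been reconstructed, the original word is recovered position by position.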

A word \(\alpha = \left( {{{a}_{1}}{{a}_{2}} \ldots {{a}_{n}}} \right),~{{a}_{i}} \in \left\{ {0,~1} \right\}\) is called periodic with a period p if

$$\alpha = {{({{a}_{1}}{{a}_{2}} \ldots {{a}_{p}})}^{l}},\quad {{a}_{i}} \in \left\{ {0,~1} \right\}.$$
(1)

Here, the subword \(\tilde {\alpha } = {{a}_{1}}{{a}_{2}} \ldots {{a}_{p}}\) is called the generating subword for a word α of form (1).
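As an illustration of definition (1), the following Python sketch finds the smallest period of a word and the corresponding generating subword. It assumes words are strings; the function name `smallest_period` is introduced only for this example.

```python
def smallest_period(alpha: str) -> int:
    """Smallest p dividing len(alpha) such that alpha = (a_1 ... a_p)^(n/p)."""
    n = len(alpha)
    for p in range(1, n + 1):
        if n % p == 0 and alpha[:p] * (n // p) == alpha:
            return p
    return n  # only reached for the empty word (n = 0)


alpha = "011011011"
p = smallest_period(alpha)
print(p, alpha[:p])  # period and generating subword
```

Every word satisfies (1) trivially with \(p = n\) and \(l = 1\), so the constraint is meaningful only when \(p\) is small compared with \(n\).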

2 STATE OF THE ART IN THE PROBLEM

It is known that the problem of word reconstruction from subwords can be reduced to verifying the uniqueness of solutions to Diophantine equations of certain type [2]. It has been shown that the problem of existence and uniqueness of a solution is NP-complete.

In the case of a complete multiset of fragments (\(V = E_{2}^{n}\)), it was established [14] that a word α of length \(n\) can be uniquely reconstructed if the fragment length satisfies \(k \geqslant {\text{exp}}\left( {\Omega \left( {\log n} \right)} \right)\).

For the same case of a complete multiset of fragments, the following estimates for the sufficient fragment length are known [3, 13, 12] (listed in the same order): \(k \geqslant \left\lfloor \frac{n}{2} \right\rfloor\), \(k \geqslant (1 + o(1))\sqrt{n \log n}\), and \(k \geqslant \left\lfloor \frac{16}{7}\sqrt{n} \right\rfloor + 5\).

Several special cases have also been studied. For example, it was shown in [15] that, for words consisting of \(l\) series (runs of identical symbols), it is sufficient to use fragments of length \(l\).

3 RESULTS

The first theorem is as follows.

Theorem 1. A periodic word of length \(n\) and period \(p\) can be uniquely reconstructed from the multiset of all its fragments of length k if

$$k \geqslant \left( {1 + o\left( 1 \right)} \right)\sqrt {p{\text{log}}p} .$$

Before proving the theorem, we introduce the following quantity.

Definition 1. For a word \(\alpha \), let \({{N}_{\beta }}\left( \alpha \right)\) denote the number of fragments equal to β in \(\alpha \).
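The quantity \(N_{\beta}(\alpha)\) can be computed without enumerating all index sets. The sketch below uses a standard dynamic-programming count of subsequence occurrences; it is an illustration under the assumption that words are strings, and the name `count_fragments` is hypothetical.

```python
def count_fragments(alpha: str, beta: str) -> int:
    """N_beta(alpha): number of index sets i_1 < ... < i_k
    for which the fragment a_{i_1} ... a_{i_k} equals beta."""
    # ways[t] = number of ways to obtain the first t symbols of beta so far
    ways = [1] + [0] * len(beta)
    for a in alpha:
        # Go right to left so that each symbol of alpha is used at most once.
        for t in range(len(beta), 0, -1):
            if beta[t - 1] == a:
                ways[t] += ways[t - 1]
    return ways[len(beta)]


print(count_fragments("0110", "01"))
```

The running time is proportional to the product of the two word lengths.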

It was shown in [2] that, given \({{N}_{\beta }}\left( \alpha \right)\) for all binary words β of length k, the numbers of fragments of all lengths shorter than k can be uniquely reconstructed. Next, we formulate and prove a lemma that directly follows from the system of equations in [2].

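Lemma 1. For any word \(\alpha\), the set of its moments of the form \(\sum\limits_{r = 1}^{n} a_{r} r^{j}\) is uniquely determined by the numbers of its fragments of the form \(x^{j}1\), where \(x\) stands for an arbitrary symbol of the alphabet, so that \(x^{j}1\) matches every fragment of length \(j + 1\) ending in 1.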

Proof. Let us derive formulas for \({{N}_{{{{x}^{j}}1}}}\left( \alpha \right)\) for various j:

$$\begin{array}{*{20}{c}} {{{N}_{1}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^n {{a}_{r}},} \\ \begin{gathered} {{N}_{{x1}}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^n \left( {r - 1} \right){{a}_{r}} \\ = \mathop \sum \limits_{r = 1}^n {{a}_{r}}r - {{N}_{1}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^n {{a}_{r}}r - {{f}_{1}}\left( {{{N}_{1}}\left( \alpha \right)} \right), \\ \end{gathered} \\ \ldots \\ \begin{gathered} {{N}_{{{{x}^{{k - 1}}}1}}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^n \left( {\begin{array}{*{20}{c}} {r - 1} \\ {k - 1} \end{array}} \right){{a}_{r}} = \frac{1}{{\left( {k - 1} \right)!}}\mathop \sum \limits_{r = 1}^n {{a}_{r}}{{r}^{{k - 1}}} \\ - \,{{f}_{{k - 1}}}\left( {{{N}_{1}}\left( \alpha \right),{{N}_{{x1}}}\left( \alpha \right),~ \ldots ,{{N}_{{{{x}^{{k - 2}}}1}}}\left( \alpha \right)} \right). \\ \end{gathered} \end{array}$$

The functions \({{f}_{2}},~{{f}_{3}},~ \ldots ,~{{f}_{{k - 1}}}\) are computed using combinatorial relations. For example, we find an expression for \({{f}_{p}}({{N}_{1}}(\alpha ),~ \ldots ,{{N}_{{{{x}^{{p - 1}}}1}}}(\alpha ))\), \(~2 \leqslant p \leqslant k - 1\), in terms of \({{f}_{1}},~ \ldots ,~{{f}_{{p - 1}}}\) obtained at the preceding steps (here, it is possible to set \({{f}_{0}} \equiv 0\)):

$$\begin{gathered} N_{x^{p}1}(\alpha) = \sum\limits_{r = 1}^{n} \binom{r - 1}{p} a_{r} = \sum\limits_{r = 1}^{n} \frac{(r - 1)!}{p!\,(r - 1 - p)!}\, a_{r} = \sum\limits_{r = 1}^{n} \frac{(r - 1)(r - 2) \cdots (r - p)}{p!}\, a_{r} \\ = \sum\limits_{r = 1}^{n} \frac{1}{p!}\, a_{r} \left\{ r^{p} + r^{p - 1} \sum\limits_{i_{1} = 1}^{p} (-i_{1}) + r^{p - 2} \sum\limits_{1 \leqslant i_{1} < i_{2} \leqslant p} (-i_{1})(-i_{2}) + \ldots + r^{0} \prod\limits_{i = 1}^{p} (-i) \right\} \\ \end{gathered}$$
$$\begin{gathered} = \frac{1}{p!} \sum\limits_{r = 1}^{n} a_{r} r^{p} + \frac{1}{p!} \sum\limits_{l = 1}^{p} \left( \sum\limits_{r = 1}^{n} a_{r} r^{p - l} \right) \sum\limits_{\substack{S \subseteq \{1, \ldots, p\} \\ |S| = l}} \prod\limits_{i \in S} (-i) \\ = \frac{1}{p!} \sum\limits_{r = 1}^{n} a_{r} r^{p} + \frac{1}{p!} \sum\limits_{l = 1}^{p} (p - l)! \left( N_{x^{p - l}1}(\alpha) + f_{p - l}\left( N_{1}(\alpha), \ldots, N_{x^{p - l - 1}1}(\alpha) \right) \right) \sum\limits_{\substack{S \subseteq \{1, \ldots, p\} \\ |S| = l}} \prod\limits_{i \in S} (-i) \\ = \frac{1}{p!} \sum\limits_{r = 1}^{n} a_{r} r^{p} - f_{p}\left( N_{1}(\alpha), \ldots, N_{x^{p - 1}1}(\alpha) \right). \\ \end{gathered}$$

Since, by assumption, all \({{f}_{1}},~ \ldots ,~{{f}_{{p - 1}}}\) are known, \({{f}_{p}}\left( {{{N}_{1}}\left( \alpha \right),~ \ldots ,{{N}_{{{{x}^{{p - 1}}}1}}}\left( \alpha \right)} \right)\) is known as well.
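The recursion behind Lemma 1 can be checked numerically. The Python sketch below is an illustration, not the authors' procedure: it computes the counts \(N_{x^{j}1}(\alpha)\) from the binomial formula and then recovers the moments \(\sum_{r} a_{r} r^{j}\) one after another by expanding \((r - 1)(r - 2) \cdots (r - p)\) in powers of \(r\). Words are assumed to be strings over `0` and `1`, and all function names are hypothetical.

```python
from math import comb, factorial


def tail_counts(alpha: str, k: int) -> list:
    """N_{x^j 1}(alpha) for j = 0..k-1 via N_{x^j 1} = sum_r C(r-1, j) a_r."""
    return [
        sum(comb(r - 1, j) * int(a) for r, a in enumerate(alpha, start=1))
        for j in range(k)
    ]


def falling_coeffs(p: int) -> list:
    """Coefficients c[t] of (x-1)(x-2)...(x-p) = sum_t c[t] x^t."""
    c = [1]
    for i in range(1, p + 1):
        # Multiply the current polynomial by (x - i).
        c = [
            (c[t - 1] if t > 0 else 0) - i * (c[t] if t < len(c) else 0)
            for t in range(len(c) + 1)
        ]
    return c


def moments_from_counts(counts: list) -> list:
    """Recover s_j = sum_r a_r r^j successively from the counts N_{x^j 1}."""
    s = []
    for p, n_p in enumerate(counts):
        c = falling_coeffs(p)
        # p! * N_{x^p 1} = s_p + sum_{t < p} c[t] * s_t, and c[p] = 1.
        s.append(factorial(p) * n_p - sum(c[t] * s[t] for t in range(p)))
    return s


alpha = "01101"
print(moments_from_counts(tail_counts(alpha, 4)))
```

Each moment is obtained from the current count and the previously recovered moments, which mirrors the inductive structure of the proof.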

Now we can prove Theorem 1.

Proof. The proof is based on the results of [13] concerning the lengths of fragments sufficient for unique word reconstruction in the case of arbitrary words. The estimate of f*(α) in [13] was obtained by analyzing the system of equations

$$\left\{ {\begin{array}{*{20}{c}} {\mathop \sum \limits_{r = 1}^n {{a}_{r}}{{r}^{j}} = {{s}_{j}}\left( \alpha \right)} \\ {0 \leqslant j \leqslant k - 1,} \end{array}} \right.$$
(2)

for which the solution is unique if \(k \geqslant (1 + o(1))\sqrt{n \log n}\); hence, \(f{\text{*}}(\alpha) \leqslant (1 + o(1))\sqrt{n \log n}\).

Suppose that the word \(\alpha\) consists of \(l = \frac{n}{p}\) periods: \(\alpha = (a_{1}a_{2} \ldots a_{p})^{l}\). Then the zeroth equation (\(j = 0\)), \(\sum\limits_{r = 1}^{n} a_{r} = s_{0}(\alpha)\), can be rewritten as \(\sum\limits_{r = 1}^{p} a_{r} = \frac{s_{0}(\alpha)}{l}\). For an arbitrary index \(k'\) ranging from 1 to \(k - 1\) inclusive, we have

$$\begin{gathered} \sum\limits_{r = 1}^{n} a_{r} r^{k'} = \sum\limits_{r = 1}^{p} a_{r} r^{k'} + \sum\limits_{r = p + 1}^{2p} a_{r} r^{k'} + \ldots + \sum\limits_{r = (l - 1)p + 1}^{lp} a_{r} r^{k'} \\ = \sum\limits_{r = 1}^{p} a_{r} r^{k'} + \sum\limits_{r = 1}^{p} a_{r} (p + r)^{k'} + \ldots + \sum\limits_{r = 1}^{p} a_{r} ((l - 1)p + r)^{k'} \\ = \sum\limits_{r = 1}^{p} a_{r} r^{k'} + \sum\limits_{r = 1}^{p} \sum\limits_{j = 0}^{k'} \binom{k'}{j} p^{j} r^{k' - j} a_{r} + \ldots \\ \end{gathered}$$
$$\begin{gathered} + \sum\limits_{r = 1}^{p} \sum\limits_{j = 0}^{k'} \binom{k'}{j} ((l - 1)p)^{j} r^{k' - j} a_{r} = l \cdot \sum\limits_{r = 1}^{p} a_{r} r^{k'} \\ + \sum\limits_{h = 1}^{l - 1} \sum\limits_{j = 1}^{k'} \sum\limits_{r = 1}^{p} \binom{k'}{j} (hp)^{j} r^{k' - j} a_{r} \\ = f_{k'}\left( \sum\limits_{r = 1}^{p} a_{r} r^{k'}, \ldots, \sum\limits_{r = 1}^{p} a_{r} \right), \\ \end{gathered}$$

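where \(f_{k'}\) is a linear function. Since the coefficient of \(\sum\limits_{r = 1}^{p} a_{r} r^{k'}\) in \(f_{k'}\) is \(l \ne 0\), the moments of the generating subword are determined successively from \(s_{0}(\alpha), \ldots, s_{k - 1}(\alpha)\), so system (2) reduces to a system of the same form for a word of length \(p\). The rest of the proof is the same as in [13].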
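The reduction used in the proof can be verified numerically: every moment of a periodic word is a linear combination of the moments of its generating subword. The following Python sketch is an illustration under the assumption that words are strings over `0` and `1`; the names `moment` and `periodic_moment` are hypothetical.

```python
from math import comb


def moment(word: str, j: int) -> int:
    """sum_r a_r r^j with positions numbered from 1."""
    return sum(int(a) * r**j for r, a in enumerate(word, start=1))


def periodic_moment(gen: str, l: int, kp: int) -> int:
    """Moment of order k' of the word gen^l, using only moments of gen."""
    p = len(gen)
    base = [moment(gen, t) for t in range(kp + 1)]  # moments of one period
    total = 0
    for h in range(l):              # the h-th copy is shifted by h*p
        for j in range(kp + 1):     # binomial expansion of (r + h*p)^k'
            total += comb(kp, j) * (h * p) ** j * base[kp - j]
    return total


gen, l = "011", 4
print([periodic_moment(gen, l, kp) for kp in range(4)])
print([moment(gen * l, kp) for kp in range(4)])
```

Both printed lists coincide, since each shifted copy of the generating subword is expanded by the binomial theorem.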

Let us formulate a stronger result.

Theorem 2. A periodic word of length \(n\) with period p is uniquely reconstructed from the multiset of all its fragments of length \(k\) if

$$k \geqslant \left\lfloor {\frac{{16}}{7}\sqrt p } \right\rfloor + 5.$$

Proof. It is based on [12], where the possibility of unique reconstruction of a word from its subwords is proved not by analyzing system (2) used in [13], but rather by analyzing a similar system of the form

$$\left\{ {\begin{array}{*{20}{c}} {\mathop \sum \limits_{r = 1}^n {{a}_{r}}{{r}^{j}} = \mathop \sum \limits_{r = 1}^n {{b}_{r}}{{r}^{j}}} \\ {0 \leqslant j \leqslant k - 1,} \end{array}} \right.$$
(3)

written for two words, \(\alpha = {{a}_{1}} \ldots {{a}_{n}}\) and \(\beta = {{b}_{1}} \ldots {{b}_{n}}\). It is proved that \(\alpha \) and \(\beta \) have identical multisets of subwords of length k if and only if system (3) has a nontrivial solution \(\left( {{{a}_{1}}, \ldots ,{{a}_{n}},{{b}_{1}}, \ldots ,{{b}_{n}}} \right)\). Thus, the proof is reduced to a search for a condition under which the system of Diophantine equations (3) has only the trivial solution. Next, the authors of [12] refer to [16], where the following result is obtained:

$$k \geqslant \left\lfloor {\frac{{16}}{7}\sqrt n } \right\rfloor + 5.$$

For the case of periodic words, the proof of Theorem 1 in the present paper shows how system (3) can be rewritten in terms of only \(a_{1}, \ldots, a_{p}, b_{1}, \ldots, b_{p}\), where \(p\) is the word period. Thus, the original word can be uniquely reconstructed from subwords if their length satisfies

$$k \geqslant \left\lfloor {\frac{{16}}{7}\sqrt p } \right\rfloor + 5.$$
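For short words, the connection between system (3) and multisets of subwords can be observed directly. The Python sketch below is only an illustration, assuming words are strings over `0` and `1`; the helpers `deck` and `same_moments` are hypothetical names. It uses the pair 0110 and 1001, whose fragments of length 2 coincide as multisets.

```python
from collections import Counter
from itertools import combinations


def deck(alpha: str, k: int) -> Counter:
    """Multiset of all length-k fragments of alpha."""
    return Counter(
        "".join(alpha[i] for i in idx)
        for idx in combinations(range(len(alpha)), k)
    )


def same_moments(alpha: str, beta: str, k: int) -> bool:
    """True if sum_r a_r r^j == sum_r b_r r^j for all 0 <= j <= k-1."""
    return all(
        sum(int(a) * r**j for r, a in enumerate(alpha, start=1))
        == sum(int(b) * r**j for r, b in enumerate(beta, start=1))
        for j in range(k)
    )


a, b = "0110", "1001"
print(deck(a, 2) == deck(b, 2), same_moments(a, b, 2))
print(deck(a, 3) == deck(b, 3), same_moments(a, b, 3))
```

For this pair, the equations of system (3) hold for \(k = 2\) and fail for \(k = 3\), exactly as the multisets of fragments coincide for length 2 and differ for length 3.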

Word reconstruction is also possible when the word itself is aperiodic, but consists of several subwords, at least one of which is periodic. Below, we consider several cases of such words. For each case, we state a separate theorem, which is proved relying on what was proved previously for a periodic word (Theorem 2).

Theorem 3. Suppose that a word \(\alpha \) consists of two subwords: a prefix and a suffix, which are both periodic:

$$\alpha = {{({{a}_{1}}{{a}_{2}} \ldots {{a}_{q}})}^{m}}{{({{a}_{{q + 1}}}{{a}_{{q + 2}}} \ldots {{a}_{{q + p}}})}^{l}}.$$
(4)

Then, for \(l \geqslant m{{q}^{{\left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5}}}\), where \(P \equiv {\text{max}}\left( {p,q} \right)\), word (4) can be uniquely reconstructed from fragments of length k if \(k \geqslant \left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5\).

Proof. Let us find \({{s}_{0}}\left( \alpha \right)\):

$$\mathop \sum \limits_{r = 1}^n {{a}_{r}} = m\mathop \sum \limits_{r = 1}^q {{a}_{r}} + l\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}} = {{s}_{0}}\left( \alpha \right).$$

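For \(mq < l\), we obtain \(\frac{m\left( a_{1} + \ldots + a_{q} \right)}{l} < 1\), so, with \(\left\lfloor \cdot \right\rfloor\) and \(\left\{ \cdot \right\}\) denoting the integer and fractional parts,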

$$\left\{ {\begin{array}{*{20}{c}} {\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}} = \left\lfloor {\frac{{{{s}_{0}}\left( \alpha \right)}}{l}} \right\rfloor } \\ {\mathop \sum \limits_{r = 1}^q {{a}_{r}} = \left\{ {\frac{{{{s}_{0}}\left( \alpha \right)}}{l}} \right\} \cdot \frac{l}{m}.} \end{array}} \right.$$
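The splitting of the equation for \(s_{0}(\alpha)\) amounts to a division with remainder. A minimal Python sketch, assuming integer inputs and the condition \(mq < l\); the function name `split_s0` is introduced only for illustration.

```python
def split_s0(s0: int, m: int, l: int):
    """Recover (prefix_sum, suffix_sum) from s0 = m*prefix_sum + l*suffix_sum,
    assuming m*prefix_sum < l (guaranteed when m*q < l)."""
    suffix_sum, remainder = divmod(s0, l)        # floor(s0/l) and l*{s0/l}
    prefix_sum, leftover = divmod(remainder, m)  # {s0/l} * l / m
    if leftover:
        raise ValueError("s0 is not of the form m*A + l*B with m*A < l")
    return prefix_sum, suffix_sum


# Prefix generator 101 repeated m = 2 times, suffix generator 01 repeated l = 7 times.
alpha = "101" * 2 + "01" * 7
print(split_s0(alpha.count("1"), m=2, l=7))
```

Integer arithmetic is used instead of floating-point fractional parts to avoid rounding errors.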

Consider the case \({{s}_{1}}\left( \alpha \right)\):

$${{s}_{1}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^{qm} {{a}_{r}}r + \mathop \sum \limits_{r = qm + 1}^{qm + pl} {{a}_{r}}r.$$
(5)

The first sum in the expression for \({{s}_{1}}\left( \alpha \right)\) (5) can be represented as follows:

$$\begin{gathered} \mathop \sum \limits_{r = 1}^{qm} {{a}_{r}}r = {{a}_{1}} \cdot \mathop \sum \limits_{j = 0}^{m - 1} \left( {qj + 1} \right) + \ldots + {{a}_{q}} \cdot \mathop \sum \limits_{j = 0}^{m - 1} \left( {qj + q} \right) \\ = \mathop \sum \limits_{r = 1}^q {{a}_{r}}\left( {mr + \frac{{qm(m - 1)}}{2}} \right)\, = \,m\mathop \sum \limits_{r = 1}^q {{a}_{r}}r\, + \,\frac{{qm(m - 1)}}{2}\mathop \sum \limits_{r = 1}^q {{a}_{r}}. \\ \end{gathered} $$

Similarly, for the second sum in (5), we obtain

$$\begin{gathered} \sum\limits_{r = qm + 1}^{qm + pl} a_{r} r = a_{q + 1} \sum\limits_{j = 0}^{l - 1} \left( qm + pj + 1 \right) \\ + \ldots + a_{q + p} \sum\limits_{j = 0}^{l - 1} \left( qm + pj + p \right) \\ = \sum\limits_{r = q + 1}^{q + p} a_{r} \left( lr + \frac{2q(m - 1)l + pl\left( l - 1 \right)}{2} \right) \\ = l \sum\limits_{r = q + 1}^{q + p} a_{r} r + \left( q(m - 1)l + \frac{pl\left( l - 1 \right)}{2} \right) \sum\limits_{r = q + 1}^{q + p} a_{r}. \\ \end{gathered}$$

Finally,

$${{s}_{1}}\left( \alpha \right) = m\mathop \sum \limits_{r = 1}^q {{a}_{r}}r + l\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}}r + {{f}_{1}}\left( {{{s}_{0}}} \right).$$

If \(mq(q + 1){\text{/}}2 < l\), then the equations for the prefix and the suffix split, as in the case with \({{s}_{0}}\left( \alpha \right)\). Now, we consider the case of an arbitrary \(k'\) such that \(2 \leqslant k' \leqslant k - 1\):

$${{s}_{{k'}}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^{qm} {{a}_{r}}{{r}^{{k'}}} + \mathop \sum \limits_{r = qm + 1}^{qm + pl} {{a}_{r}}{{r}^{{k'}}}.$$

The first sum is expanded as

$$\begin{gathered} \mathop \sum \limits_{r = 1}^{qm} {{a}_{r}}{{r}^{{k'}}} = \mathop \sum \limits_{r = 1}^q {{a}_{r}}({{r}^{{k'}}} + \ldots + {{(q\left( {m - 1} \right) + r)}^{{k'}}}) \\ = \mathop \sum \limits_{r = 1}^q {{a}_{r}}\mathop \sum \limits_{h = 1}^m {{(r + q\left( {m - h} \right))}^{{k'}}} \\ = \mathop \sum \limits_{r = 1}^q {{a}_{r}}\mathop \sum \limits_{h = 1}^m \mathop \sum \limits_{j = 0}^{k'} \left( {\begin{array}{*{20}{c}} {k'} \\ j \end{array}} \right){{r}^{j}}{{(q\left( {m - h} \right))}^{{k' - j}}} \\ = m\mathop \sum \limits_{r = 1}^q {{a}_{r}}{{r}^{{k'}}} + {{g}_{{k'}}}\left( {{{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}} \right). \\ \end{gathered} $$

In the last transition, the term with \(r\) raised to power \(k'\) (for \(j = k'\)) was separated from the others. The terms with \(r\) raised to lower powers were represented in the form of a linear function \({{g}_{{k'}}}\).

Similarly, for the second sum, we have

$$\begin{gathered} \sum\limits_{r = qm + 1}^{qm + pl} {{{a}_{r}}{{r}^{{k'}}}} = \sum\limits_{r = q + 1}^{q + p} {{{a}_{r}}({{{(q(m - 1) + r)}}^{{k'}}} + ...} \\ \, + {{(q(m - 1) + p(l - 1) + r)}^{{k'}}}) \\ = \sum\limits_{r = q + 1}^{q + p} {{{a}_{r}}\sum\limits_{h = 1}^l {\sum\limits_{j = 0}^{k'} {\left( \begin{gathered} k' \\ j \\ \end{gathered} \right)} {{{(q(m - 1) + r)}}^{j}}{{{(p(l - h))}}^{{k' - j}}}} } \\ = \sum\limits_{r = q + 1}^{q + p} {{{a}_{r}}\sum\limits_{h = 1}^l {\sum\limits_{j = 0}^{k'} {\sum\limits_{s = 0}^j {\left( \begin{gathered} k' \\ j \\ \end{gathered} \right)\left( \begin{gathered} j \\ s \\ \end{gathered} \right)} {{{(q(m - 1))}}^{s}}{{r}^{{j - s}}}{{{(p(l - h))}}^{{k' - j}}}} } } \\ = l\sum\limits_{r = q + 1}^{q + p} {{{a}_{r}}{{r}^{{k'}}} + {{h}_{{k'}}}({{s}_{{k' - 1}}},...,{{s}_{0}}).} \\ \end{gathered} $$

Here, in the last transition, the term with \(r\) raised to power \(k'\) (for \(j = k'\) and s = 0) was again separated from the others. The terms with \(r\) raised to lower powers were represented in the form of a linear function \({{h}_{{k'}}}\).

Thus, combining the results obtained for both sums, we can return to the computation of \({{s}_{{k'}}}\left( \alpha \right)\):

$$\begin{gathered} {{s}_{{k'}}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^{qm} {{a}_{r}}{{r}^{{k'}}} + \mathop \sum \limits_{r = qm + 1}^{qm + pl} {{a}_{r}}{{r}^{{k'}}} \\ = \left( {m\mathop \sum \limits_{r = 1}^q {{a}_{r}}{{r}^{{k'}}} + {{g}_{{k'}}}\left( {{{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}} \right)} \right) \\ + \left( {l\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}}{{r}^{{k'}}} + {{h}_{{k'}}}\left( {{{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}} \right)} \right) \\ = m\mathop \sum \limits_{r = 1}^q {{a}_{r}}{{r}^{{k'}}} + l\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}}{{r}^{{k'}}} + {{f}_{{k'}}}\left( {{{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}} \right), \\ \end{gathered} $$

where \({{f}_{{k'}}}\) denotes a linear function of \({{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}\) that is the sum of \({{g}_{{k'}}}\) and \({{h}_{{k'}}}\).
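The two binomial expansions used above can be checked numerically. The Python sketch below is an illustration, assuming words are strings over `0` and `1`; the names `s_direct` and `s_two_part` are hypothetical. It evaluates \(s_{k'}(\alpha)\) for a word of form (4) once directly and once through the expansions of the prefix and suffix sums.

```python
from math import comb


def s_direct(word: str, kp: int) -> int:
    """s_{k'}(word) = sum_r a_r r^{k'} with positions numbered from 1."""
    return sum(int(a) * r**kp for r, a in enumerate(word, start=1))


def s_two_part(pre: str, m: int, suf: str, l: int, kp: int) -> int:
    """s_{k'} of pre^m suf^l via the binomial expansions of both sums."""
    q, p = len(pre), len(suf)
    total = 0
    # Prefix: the position of a_r in copy h is r + q*h, h = 0..m-1.
    for r, a in enumerate(pre, start=1):
        for h in range(m):
            total += int(a) * sum(
                comb(kp, j) * r**j * (q * h) ** (kp - j) for j in range(kp + 1)
            )
    # Suffix: with r = q+1..q+p, the position in copy h is (q(m-1) + r) + p*h.
    for r, a in enumerate(suf, start=q + 1):
        for h in range(l):
            total += int(a) * sum(
                comb(kp, j) * (q * (m - 1) + r) ** j * (p * h) ** (kp - j)
                for j in range(kp + 1)
            )
    return total


pre, m, suf, l = "101", 2, "01", 5
print([s_two_part(pre, m, suf, l, kp) for kp in range(4)])
print([s_direct(pre * m + suf * l, kp) for kp in range(4)])
```

The two lists agree for every order \(k'\), which confirms the index shifts \(q(m - h)\) and \(q(m - 1) + p(l - h)\) used in the expansions.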

Since \(\sum\limits_{k = 1}^{n} k^{p} \sim \frac{n^{p + 1}}{p + 1}\) as \(n \to \infty\), it follows that, for \(m \cdot \frac{q^{k}}{k} < l\), all the equations split, and we obtain two systems, one for the prefix and one for the suffix. Therefore, according to what was proved earlier for the case of a periodic word (Theorem 2), for \(l \geqslant m q^{\left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5}\), where \(P \equiv \max\left( p, q \right)\), it is sufficient that the subword length satisfy \(k \geqslant \left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5\).
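The sufficient conditions of Theorem 3 are easy to evaluate for concrete parameters. The following Python sketch is an illustration with hypothetical function names; it computes \(\left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5\) in exact integer arithmetic, using the fact that \(\lfloor x/7 \rfloor = \lfloor \lfloor x \rfloor / 7 \rfloor\) for \(x \geqslant 0\).

```python
from math import isqrt


def k_star(P: int) -> int:
    """floor(16/7 * sqrt(P)) + 5, evaluated without floating point."""
    # 16*sqrt(P) = sqrt(256*P), and floor(x/7) == floor(floor(x)/7).
    return isqrt(256 * P) // 7 + 5


def theorem3_applies(q: int, m: int, p: int, l: int, k: int) -> bool:
    """Check the sufficient conditions of Theorem 3 for a word of form (4)."""
    ks = k_star(max(p, q))
    return l >= m * q**ks and k >= ks


print(k_star(3), theorem3_applies(q=2, m=1, p=3, l=2**8, k=8))
```

The conditions are only sufficient: a word that fails this check may still be uniquely reconstructible.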

Corollary 1. For m = 1, \(\alpha \) has the form of a word with an aperiodic prefix and a periodic suffix:

$$\alpha = {{a}_{1}}{{a}_{2}} \ldots {{a}_{q}}{{({{a}_{{q + 1}}}{{a}_{{q + 2}}} \ldots {{a}_{{q + p}}})}^{l}}.$$

If the suffix is longer than the prefix, namely, \(l\, \geqslant \,{{q}^{{\left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5}}}\), then the word \(\alpha \) can be uniquely reconstructed from subwords of length \(k \geqslant \left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5\) where \(P \equiv {\text{max}}\left( {p,q} \right)\).

Theorem 4. Suppose that a word \(\alpha\) consists of three subwords: a periodic prefix \(\bar{\alpha}\), a periodic suffix \(\hat{\alpha}\), and an aperiodic root \(\breve{\alpha}\):

$$\alpha = \bar{\alpha}\,\breve{\alpha}\,\hat{\alpha} = (a_{1}a_{2} \ldots a_{q})^{m}\, a_{q + 1}a_{q + 2} \ldots a_{q + s}\, (a_{q + s + 1}a_{q + s + 2} \ldots a_{q + s + p})^{l}.$$
(6)

Let \(k^{\star} = \left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5\), where \(P \equiv \max\left( p, q, s \right)\). Then, for \(m \geqslant s^{k^{\star}}\) and \(l \geqslant \left( m - 1 \right)q^{k^{\star}} + (q + s)^{k^{\star}}\), word (6) can be uniquely reconstructed from fragments of length \(k\) if \(k \geqslant k^{\star}\).

Proof. The proof is based on Theorems 2 and 3.

Consider \({{s}_{0}}\left( \alpha \right)\):

$${{s}_{0}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^{qm + s + pl} {{a}_{r}} = m\mathop \sum \limits_{r = 1}^q {{a}_{r}} + \mathop \sum \limits_{r = q + 1}^{q + s} {{a}_{r}} + l\mathop \sum \limits_{r = q + s + 1}^{q + s + p} {{a}_{r}}.$$

Then, for \(mq + s < l\), we can split the equations for two subwords: \(\bar{\alpha}\,\breve{\alpha}\) and \(\hat{\alpha}\).

Consider \(k'\) such that \(1 \leqslant k' \leqslant k - 1\). Omitting the detailed calculations, which are similar to those presented in the proof of Theorem 3, we obtain

$$\begin{gathered} {{s}_{{k'}}}\left( \alpha \right) = m\mathop \sum \limits_{r = 1}^q {{a}_{r}}{{r}^{{k'}}} + \mathop \sum \limits_{r = q + 1}^{q + s} {{a}_{r}}{{r}^{{k'}}} \\ + \,l\mathop \sum \limits_{r = q + s + 1}^{q + s + p} {{a}_{r}}{{r}^{{k'}}} + {{f}_{{k'}}}\left( {{{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}} \right), \\ \end{gathered} $$
(7)

where \(f_{k'}\) is a linear function. Thus, under the condition \(l \geqslant m q^{k_{1}^{\star}} + (q + s)^{k_{1}^{\star}} - q^{k_{1}^{\star}}\), where \(k_{1}^{\star} = \left\lfloor \frac{16}{7}\sqrt{P_{1}} \right\rfloor + 5\) and \(P_{1} = \max\left( p, qm + s \right)\), the equations for the words \(\bar{\alpha}\,\breve{\alpha}\) and \(\hat{\alpha}\) can be split, so the whole word can be reconstructed under the subword length constraint \(k \geqslant k_{1}^{\star}\).

However, the problem of reconstructing the word \(\bar{\alpha}\,\breve{\alpha}\) can be considered separately. Indeed, by Corollary 1 (applied to the word read in reverse order, so that the aperiodic root plays the role of the prefix), to reconstruct the word \(\bar{\alpha}\,\breve{\alpha}\) from the multiset of its subwords, it is sufficient that \(m \geqslant s^{k_{2}^{\star}}\), where \(k_{2}^{\star} = \left\lfloor \frac{16}{7}\sqrt{P_{2}} \right\rfloor + 5\) and \(P_{2} = \max\left( q, s \right)\). Then the length of the subwords is constrained as \(k \geqslant k_{2}^{\star}\).

Combining the conditions with \(k_{1}^{ \star }\) and \(k_{2}^{ \star }\), we conclude that the word \(\alpha \) can be reconstructed from subwords for \(m \geqslant {{s}^{{{{k}^{ \star }}}}}\) and \(l \geqslant \left( {m - 1} \right){{q}^{{{{k}^{ \star }}}}} + {{(q + s)}^{{{{k}^{ \star }}}}}\), where \({{k}^{ \star }} = \left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5\) and \(P \equiv {\text{max}}\left( {p,q,s} \right)\). Here, the subword length satisfies \(k \geqslant {{k}^{ \star }}\).
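The combined conditions of Theorem 4 can be evaluated in the same way as those of Theorem 3. The sketch below is illustrative Python with a hypothetical function name; the threshold \(k^{\star}\) is again computed in integer arithmetic.

```python
from math import isqrt


def theorem4_applies(q: int, m: int, s: int, p: int, l: int, k: int) -> bool:
    """Check the sufficient conditions of Theorem 4 for a word of form (6)."""
    # k* = floor(16/7 * sqrt(P)) + 5 with P = max(p, q, s)
    ks = isqrt(256 * max(p, q, s)) // 7 + 5
    root_small = m >= s**ks
    prefix_small = l >= (m - 1) * q**ks + (q + s) ** ks
    return root_small and prefix_small and k >= ks


# Single-symbol generators and a one-symbol root: k* = 7, so l >= 2**7 is needed.
print(theorem4_applies(q=1, m=1, s=1, p=1, l=128, k=7))
```

For \(s = 0\), the first condition holds trivially and the second becomes \(l \geqslant m q^{k^{\star}}\), which is the condition of Theorem 3.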

Remark. The root length s = 0 gives the case considered in Theorem 3.

Corollary 2. The root length s = 1 is equivalent to a single symbol placed between two periodic subwords in the word \(\alpha \). In this case, \(\alpha \) can be reconstructed from subwords if \(l \geqslant \left( {m - 1} \right){{q}^{{{{k}^{ \star }}}}} + {{(q + 1)}^{{{{k}^{ \star }}}}}\), where \({{k}^{ \star }} = \left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5\) and \(P \equiv {\text{max}}\left( {p,q} \right)\). The subword length \(k\) is constrained by the relation \(k \geqslant {{k}^{ \star }}\).

Corollary 3. Suppose that a word α of the form (6) is such that the generating subwords in the prefix and the suffix are identical:

$$\begin{array}{c} \alpha = (a_{1}a_{2} \ldots a_{q})^{m}\, a_{q + 1}a_{q + 2} \ldots a_{q + s}\, (a_{q + s + 1}a_{q + s + 2} \ldots a_{q + s + p})^{l}, \\ q = p,\quad a_{i} = a_{j},\quad \left( i, j \right) \in \left\{ \left( 1, q + s + 1 \right), \ldots, \left( q, q + s + p \right) \right\}. \end{array}$$

Assume that \({{k}^{ \star }} = \left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5\) and \(P \equiv {\text{max}}\left( {q,s} \right)\). Then, for \(l + m \geqslant {{s}^{{{{k}^{ \star }}}}}\) the word α can be uniquely reconstructed if \(k \geqslant {{k}^{ \star }}\).

Proof. For α having the indicated structure, Eq. (7) becomes

$${{s}_{{k'}}}\left( \alpha \right) = \left( {m + l} \right)\mathop \sum \limits_{r = 1}^q {{a}_{r}}{{r}^{{k'}}} + \mathop \sum \limits_{r = q + 1}^{q + s} {{a}_{r}}{{r}^{{k'}}} + {{f}_{{k'}}}\left( {{{s}_{{k' - 1}}}, \ldots ,{{s}_{0}}} \right).$$

Therefore, for \(m + l \geqslant {{(q + s)}^{{{{k}^{ \star }}}}} - {{q}^{{{{k}^{ \star }}}}}\), the equations for the generating subword of the prefix (suffix) and the root can be split.

Theorem 5. Suppose that a word α has the form

$$\alpha = {{({{({{a}_{1}}{{a}_{2}} \ldots {{a}_{q}})}^{m}}{{({{a}_{{q + 1}}}{{a}_{{q + 2}}} \ldots {{a}_{{q + p}}})}^{l}})}^{N}},$$
(8)

i.e., it consists of \(N\) repeats of a subword, which, in turn, consists of a periodic prefix and a periodic suffix. Then, for \(l \geqslant m q^{k^{\star}}\), where \(k^{\star} = \left\lfloor \frac{16}{7}\sqrt{P} \right\rfloor + 5\) and \(P \equiv \max\left( p, q \right)\), word (8) can be uniquely reconstructed if \(k \geqslant k^{\star}\).

Proof. We begin with computing \({{s}_{0}}\left( \alpha \right)\):

$${{s}_{0}}\left( \alpha \right) = \mathop \sum \limits_{r = 1}^{N\left( {qm + pl} \right)} {{a}_{r}} = Nm\mathop \sum \limits_{r = 1}^q {{a}_{r}} + Nl\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}}.$$

For \(mq < l\), the equations split:

$$\left\{ {\begin{array}{*{20}{c}} {\mathop \sum \limits_{r = q + 1}^{q + p} {{a}_{r}} = \left\lfloor {\frac{{{{s}_{0}}\left( \alpha \right)}}{{Nl}}} \right\rfloor } \\ {\mathop \sum \limits_{r = 1}^q {{a}_{r}} = \left\{ {\frac{{{{s}_{0}}\left( \alpha \right)}}{{Nl}}} \right\} \cdot \frac{l}{m}.} \end{array}} \right.$$
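As in the two-part case, this splitting is a division with remainder, now with the divisors scaled by \(N\). A minimal Python sketch with a hypothetical function name, assuming that \(m\) times the sum over the prefix generator is smaller than \(l\) (which holds whenever \(mq < l\)):

```python
def split_s0_repeated(s0: int, N: int, m: int, l: int):
    """Recover (prefix_sum, suffix_sum) from
    s0 = N*m*prefix_sum + N*l*suffix_sum, assuming m*prefix_sum < l."""
    suffix_sum, remainder = divmod(s0, N * l)        # floor(s0 / (N*l))
    prefix_sum, leftover = divmod(remainder, N * m)  # {s0 / (N*l)} * l / m
    if leftover:
        raise ValueError("s0 is not of the form N*(m*A + l*B) with m*A < l")
    return prefix_sum, suffix_sum


# Block 11 (10)^5 repeated N = 3 times: q = 2, m = 1, p = 2, l = 5.
alpha = ("11" * 1 + "10" * 5) * 3
print(split_s0_repeated(alpha.count("1"), N=3, m=1, l=5))
```

The number of repeats \(N\) is assumed to be known here, as it is in the statement of the theorem.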

Introducing the notation \(\tilde {n} \equiv qm + pl\), we express \({{s}_{{k'}}}\left( \alpha \right)\) for \(k'\) such that \(1 \leqslant k' \leqslant k - 1\):

$$\begin{gathered} s_{k'}(\alpha) = \sum\limits_{r = 1}^{N\tilde{n}} a_{r} r^{k'} \\ = \sum\limits_{j = 0}^{N - 1} \left[ \left( \sum\limits_{i = 0}^{m - 1} \sum\limits_{r = \tilde{n}j + qi + 1}^{\tilde{n}j + qi + q} a_{r - \tilde{n}j - qi}\, r^{k'} \right) \right. \\ \left. + \left( \sum\limits_{i = 0}^{l - 1} \sum\limits_{r = \tilde{n}j + qm + pi + 1}^{\tilde{n}j + qm + pi + p} a_{r - \tilde{n}j - qm - pi + q}\, r^{k'} \right) \right] \\ \end{gathered}$$
$$\begin{gathered} = \sum\limits_{j = 0}^{N - 1} {\left[ {\left( {\sum\limits_{i = 0}^{m - 1} {\sum\limits_{r = 1}^q {{{a}_{r}}{{{(r + \tilde {n}j + qi)}}^{{k'}}}} } } \right)} \right.} \\ \left. { + \left( {\sum\limits_{i = 0}^{l - 1} {\sum\limits_{r = q + 1}^{q + p} {{{a}_{r}}{{{(r + \tilde {n}j + qm + pi - q)}}^{{k'}}}} } } \right)} \right] \\ = Nm\sum\limits_{r = 1}^q {{{a}_{r}}{{r}^{{k'}}}} + Nl\sum\limits_{r = q + 1}^{q + p} {{{a}_{r}}{{r}^{{k'}}} + {{f}_{{k'}}}({{s}_{{k' - 1}}},...,{{s}_{0}}).} \\ \end{gathered} $$

Thus, for \(\frac{m}{l} \cdot \frac{{{{q}^{k}}}}{k} < 1\), all equations split and we obtain two systems. By Theorem 2 for a periodic word, for \(l\, \geqslant \,m{{q}^{{{{k}^{ \star }}}}}\), where \({{k}^{ \star }} = \left\lfloor {\frac{{16}}{7}\sqrt P } \right\rfloor + 5\) and \(P \equiv {\text{max}}\left( {p,q} \right)\), the word \(\alpha \) can be reconstructed if \(k \geqslant {{k}^{ \star }}\).

Remark. If N = 1, we obtain the result proved in Theorem 3 for a word consisting, in fact, of a single repeat of a subword with a periodic suffix and prefix.

4 CONCLUSIONS

It was found that a periodic word of length \(n\) with period \(p\) can be uniquely reconstructed from the multiset of its subwords of fixed length \(k\) if \(k \geqslant \left\lfloor \frac{16}{7}\sqrt{p} \right\rfloor + 5\). The subword length \(k\) was also estimated for words that are not periodic but contain periodic subwords. Specifically, for words consisting of a periodic prefix and a periodic suffix, the possibility of unique reconstruction was proved when the suffix is sufficiently long. It was shown that words consisting of a periodic prefix, a periodic suffix, and an aperiodic root can be uniquely reconstructed from the multiset of their fragments of length \(k\) under two conditions: the root is short compared with the prefix, and the prefix is short compared with the suffix. An estimate for \(k\) was also proved in the case of a periodic word generated by a subword consisting of two parts: a periodic prefix and a periodic suffix.