Erasure decoding of convolutional codes using first-order representations

It is well known that there is a correspondence between convolutional codes and discrete-time linear systems over finite fields. In this paper, we employ the linear systems representation of a convolutional code to develop a decoding algorithm for convolutional codes over the erasure channel. In this kind of channel, which is important due to its use for data transmission over the Internet, the receiver knows if a received symbol is correct. We study the decoding problem using the state space description of a convolutional code, and this provides in a natural way additional information. With respect to previously known decoding algorithms, our new algorithm has the advantage that it is able to reduce the decoding delay as well as the computational effort in the erasure recovery process. We describe which properties a convolutional code should have in order to obtain a good decoding performance and illustrate it with an example.


Introduction
In modern communication, especially over the Internet, the erasure channel is widely used for data transmission. In this type of channel, the receiver knows whether an arrived symbol is correct, as each symbol either arrives correctly or is erased.
For example, over the Internet, messages are transmitted in packets and each packet comes with a checksum. The receiver knows that a packet is correct when the checksum is correct. Otherwise, the packet is corrupted or was simply lost during transmission. An especially suitable class of codes for transmission over an erasure channel is the class of convolutional codes [12]. It is known that convolutional codes are closely related to discrete-time linear systems over finite fields; in fact, each convolutional code has a so-called input-state-output (ISO) representation via such a linear system [13,14]. This correspondence was also used in [4,5,6] to study concatenated convolutional codes. Moreover, the connection between linear systems and convolutional codes was investigated in a more general setup in [17], where multidimensional codes and systems over finite rings were considered.
Hence, decoding of a convolutional code can be viewed as finding the trajectory (consisting of input and output) of the corresponding linear system that is, in some sense, closest to the received data. The underlying distance measure one uses to identify the closest trajectory (i.e. the closest codeword) depends on the kind of channel that is used for data transmission. This decoding process can also be interpreted as minimizing a cost function attached to the corresponding linear system, which measures the distance of a received word to a codeword or the distance of a measured trajectory to a possible trajectory, respectively. For the Euclidean metric over the field of real numbers $\mathbb{R}$, this is nothing else than solving the classical LQ problem, i.e. minimizing the cost function $\sum_{i=0}^{N-1} \|u_i - \hat{u}_i\|^2 + \|y_i - \hat{y}_i\|^2$, where $\hat{u} \in (\mathbb{R}^m)^N$ and $\hat{y} \in (\mathbb{R}^p)^N$ are received and one wants to find an input $u \in (\mathbb{R}^m)^N$ and corresponding output $y \in (\mathbb{R}^p)^N$ of the linear system such that this cost function is minimized. This problem is relatively easy to solve, and it has been known for quite some time how to approach it; see e.g. [10].
However, for the setting of classical coding theory, where usually the Hamming metric over finite fields is used, it turns out to be in general a hard problem to minimize the corresponding cost function $\sum_{i=0}^{N-1} \mathrm{wt}(u_i - \hat{u}_i) + \mathrm{wt}(y_i - \hat{y}_i)$ with $\hat{u}, u \in (\mathbb{F}^m)^N$ and $\hat{y}, y \in (\mathbb{F}^p)^N$ for some finite field $\mathbb{F}$. The methods used to solve the LQ problem cannot be applied since the Hamming metric is not induced by a positive definite scalar product. However, the problem becomes much easier for transmission over an erasure channel, which is the setting considered for convolutional codes in this paper. In this setting, one introduces an additional symbol $*$ that stands for an erasure and considers $\mathbb{F} \cup \{*\}$ as the set of symbols for the decoding. The Hamming metric can easily be extended to this new symbol space and we are going to minimize the same cost function. The big advantage when decoding over an erasure channel is that we know that all received symbols, i.e. all symbols except $*$ in $\hat{u}$ and $\hat{y}$, are correct, and we only have to find a way to replace the unknowns $*$ by the original values to bring the cost function to its minimal value, which equals the number of erasures. It depends on the number of erasures whether unique decoding is possible or whether one gets a list of possible codewords. In this paper, we focus on unique decoding, i.e. we present an erasure decoding algorithm that skips part of the sequence if there are so many erasures that unique decoding is not possible. Our algorithm exploits the ISO representation of a convolutional code via linear systems to recover the erasures in the received sequence. With respect to other erasure decoding algorithms for convolutional codes that can be found in the literature [15,1], our systems theoretic approach has the advantage that the computational effort as well as the decoding delay can be reduced.
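The extended Hamming cost can be illustrated with a few lines of Python. This is only a sketch: the marker `ERASURE` and the function names are hypothetical and not taken from the paper. An erased position is treated as differing from every field element, so any candidate that agrees with all correctly received symbols attains the minimal cost, which is the number of erasures.

```python
ERASURE = "*"  # hypothetical marker for an erased symbol


def erasure_cost(received, candidate):
    """Extended Hamming distance between a received word and a candidate word.

    An erased position never matches a field element, so it always adds 1.
    """
    if len(received) != len(candidate):
        raise ValueError("words must have the same length")
    return sum(1 for r, c in zip(received, candidate) if r != c)


def count_erasures(received):
    """Number of erased positions in the received word."""
    return sum(1 for r in received if r == ERASURE)


received = [1, ERASURE, 0, ERASURE, 2]
consistent = [1, 4, 0, 3, 2]    # agrees with every received symbol
inconsistent = [1, 4, 1, 3, 2]  # contradicts the received symbol at index 2
```

Here `erasure_cost(received, consistent)` equals `count_erasures(received)`, while the inconsistent candidate has a strictly larger cost.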
The paper is structured as follows. In Section 2, we give the necessary background on convolutional codes. In Section 3, we explain the correspondence between discrete-time linear systems and convolutional codes. In Section 4, we present our decoding algorithm, describe which properties a convolutional code should have to perform well with our algorithm and illustrate it with an example. In Section 5, we describe the advantages of our algorithm and in Section 6, we conclude with some remarks.

Convolutional codes
In this section, we start with some basics on convolutional codes.
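Let $\mathbb{F}$ be a finite field. An $(n, k)$ convolutional code $\mathcal{C}$ is an $\mathbb{F}[z]$-submodule of $\mathbb{F}[z]^n$ of rank $k$. Since $\mathbb{F}[z]$ is a principal ideal domain, every submodule is free and hence, there exists a full column rank polynomial matrix $G(z) \in \mathbb{F}[z]^{n \times k}$ whose columns constitute a basis of $\mathcal{C}$, i.e. $\mathcal{C} = \{G(z)m(z) \mid m(z) \in \mathbb{F}[z]^k\}$.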
Such a polynomial matrix $G(z)$ is called a generator matrix of $\mathcal{C}$. A basis of a module, and therefore also a generator matrix of a convolutional code, is not unique. If $G(z)$ and $\tilde{G}(z)$ in $\mathbb{F}[z]^{n \times k}$ are two generator matrices of $\mathcal{C}$, then one has $\tilde{G}(z) = G(z)U(z)$ for some unimodular matrix $U(z) \in \mathbb{F}[z]^{k \times k}$ (a unimodular matrix is a polynomial matrix with a polynomial inverse).
Another important parameter of a convolutional code is its degree $\delta$, which is defined as the highest (polynomial) degree of the $k \times k$ minors of any generator matrix $G(z)$ of the code. An $(n, k)$ convolutional code with degree $\delta$ is denoted as an $(n, k, \delta)$ convolutional code. If $\delta_1, \dots, \delta_k$ are the column degrees of $G(z)$ (i.e. $\delta_i$ is the largest degree of any entry of the $i$-th column), then one has $\delta \le \delta_1 + \dots + \delta_k$.
Moreover, there always exists a generator matrix of C such that δ = δ 1 + ... + δ k and we call such a generator matrix column reduced.
Furthermore, for the use over an erasure channel, it is a crucial property of a convolutional code to be non-catastrophic. A convolutional code is said to be non-catastrophic if one (and therefore each) of its generator matrices is right prime, i.e. if it admits a polynomial left inverse. The following theorem shows why this property is so important.

Theorem 2.2. Let $\mathcal{C}$ be an $(n, k)$ convolutional code. Then $\mathcal{C}$ is non-catastrophic if and only if there exists a so-called parity-check matrix for $\mathcal{C}$, i.e. a full row rank polynomial matrix $H(z) \in \mathbb{F}[z]^{(n-k) \times n}$ such that $\mathcal{C} = \{v(z) \in \mathbb{F}[z]^n \mid H(z)v(z) = 0\}$.

Parity-check matrices are commonly used for decoding of convolutional codes over the erasure channel. Recall that, when transmitting over this kind of channel, each symbol is either received correctly or is not received at all.
The first decoding algorithm of convolutional codes over the erasure channel using parity-check matrices can be found in [15], variations of it in [1] or [11].
To investigate the capability of error correction of convolutional codes, it is necessary to define distance measures for these codes.
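To this end, we denote by $\mathrm{wt}(v)$ the Hamming weight of $v \in \mathbb{F}^n$, i.e. the number of its nonzero components. For $v(z) = \sum_{t=0}^{r} v_t z^t \in \mathbb{F}[z]^n$, we set $\mathrm{wt}(v(z)) := \sum_{t=0}^{r} \mathrm{wt}(v_t)$, where $v_t \in \mathbb{F}^n$ for $t \in \{0, \dots, r\}$. For $j \in \mathbb{N}_0$, we define the $j$-th column distance of a convolutional code $\mathcal{C}$ as
$$d_j^c(\mathcal{C}) := \min\Big\{ \sum_{t=0}^{j} \mathrm{wt}(v_t) \;\Big|\; v(z) \in \mathcal{C},\ v_0 \neq 0 \Big\}.$$
The erasure correcting capability of a convolutional code increases with its column distances, which are upper bounded as the following theorem shows.

Theorem 2.3. [8] Let $\mathcal{C}$ be an $(n, k, \delta)$ convolutional code. Then, it holds:
$$d_j^c(\mathcal{C}) \le (n-k)(j+1) + 1 \quad \text{for } j \in \mathbb{N}_0.$$

It is well known that the column distances of a convolutional code can reach this upper bound only up to $j = L := \lfloor \delta/k \rfloor + \lfloor \delta/(n-k) \rfloor$. A convolutional code whose column distances reach the bound for all $j \le L$ is called a maximum distance profile (MDP) convolutional code. If one has equality for some $j_0 \in \mathbb{N}$ in Theorem 2.3, then one also has equality for $j \le j_0$; see [8]. Hence, it is sufficient to have equality for $j = L$ to obtain an MDP convolutional code. The following theorem presents criteria to check if a convolutional code is MDP.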
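The parameters appearing in this bound are easy to tabulate. The following Python sketch assumes the standard formulas from [8], namely $L = \lfloor \delta/k \rfloor + \lfloor \delta/(n-k) \rfloor$ and the column distance bound $(n-k)(j+1)+1$; the function name `mdp_parameters` is hypothetical.

```python
def mdp_parameters(n, k, delta):
    """Return L and the upper bounds on the column distances d_0, ..., d_L.

    Assumes the bound (n - k)(j + 1) + 1 on the j-th column distance and
    L = floor(delta / k) + floor(delta / (n - k)).
    """
    if not 0 < k < n or delta < 0:
        raise ValueError("need 0 < k < n and delta >= 0")
    L = delta // k + delta // (n - k)
    bounds = [(n - k) * (j + 1) + 1 for j in range(L + 1)]
    return L, bounds
```

For the parameters $(5, 3, 2)$ used in the example of Section 4, `mdp_parameters(5, 3, 2)` gives $L = 1$ and the bounds 3 and 5 for the column distances $d_0^c$ and $d_1^c$.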
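Theorem 2.5. [8] Let $\mathcal{C}$ be an $(n, k, \delta)$ convolutional code with generator matrix $G(z) = \sum_{i=0}^{\mu} G_i z^i$ and parity-check matrix $H(z) = \sum_{i=0}^{\nu} H_i z^i$. The following statements are equivalent:
(i) $\mathcal{C}$ is an MDP convolutional code.
(ii) The matrix
$$G_L^c := \begin{pmatrix} G_0 & & 0 \\ \vdots & \ddots & \\ G_L & \cdots & G_0 \end{pmatrix},$$
where $G_i \equiv 0$ for $i > \mu$, has the property that every full size minor that is not trivially zero, i.e. zero for all choices of $G_0, \dots, G_L$, is nonzero.
(iii) The matrix
$$H_L^c := \begin{pmatrix} H_0 & & 0 \\ \vdots & \ddots & \\ H_L & \cdots & H_0 \end{pmatrix}$$
with $H_i \equiv 0$ for $i > \nu$ has the property that every full size minor that is not trivially zero is nonzero.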
The erasure decoding capability of an MDP convolutional code is stated in the following theorem.
Theorem 2.6.[15] If for an (n, k, δ) MDP convolutional code C, in any sliding window of length at most (L + 1)n at most (L + 1)(n − k) erasures occur, then full error correction from left to right is possible.
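The sliding window condition of Theorem 2.6 can be checked mechanically for a given erasure pattern. The sketch below is an illustration only: the function name and the representation of the pattern as a list of booleans (one per transmitted symbol, `True` for an erasure) are assumptions. It checks every window of $(L+1)n$ consecutive symbols, which also covers all shorter windows. The condition is sufficient for recovery, not necessary.

```python
def window_condition_holds(erased, n, k, L):
    """Check that every window of (L+1)*n consecutive symbols contains
    at most (L+1)*(n-k) erasures.

    erased: list of booleans in transmission order, True marks an erasure.
    """
    window = (L + 1) * n
    limit = (L + 1) * (n - k)
    count = sum(erased[:window])      # erasures in the first window
    if count > limit:
        return False
    for end in range(window, len(erased)):
        # slide the window by one symbol
        count += erased[end] - erased[end - window]
        if count > limit:
            return False
    return True


# (5, 3, 2) code with L = 1: windows of 10 symbols, at most 4 erasures each
ok_pattern = [True, True, False, False, False,
              True, True, False, False, False,
              False, False, False, False, False]
bad_pattern = [True] * 5 + [False] * 10
```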
This means that the knowledge of the input and output sequences is sufficient to determine the sequence of states.
(c) minimal if it is reachable and observable.
Recall the following well-known characterization of reachability and observability.
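The Kalman rank conditions, $\mathrm{rk}(B, AB, \dots, A^{s-1}B) = s$ for reachability and $\mathrm{rk}(C; CA; \dots; CA^{s-1}) = s$ for observability, can be tested with Gaussian elimination over the field. The following Python sketch assumes a prime field $\mathbb{F}_p$ with integer arithmetic modulo $p$ (the paper also works over extension fields, which this sketch does not cover); all function names are hypothetical.

```python
def mat_mul(X, Y, p):
    """Matrix product modulo the prime p (matrices as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*Y)]
            for row in X]


def rank_mod_p(M, p):
    """Rank of M over F_p by Gauss-Jordan elimination (needs Python 3.8+)."""
    M = [[x % p for x in row] for row in M]
    rows, cols = len(M), (len(M[0]) if M else 0)
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], -1, p)          # modular inverse of the pivot
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(rows):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank


def is_reachable(A, B, p):
    """Kalman test: rank of (B, AB, ..., A^(s-1) B) equals s."""
    s = len(A)
    blocks, cur = [], B
    for _ in range(s):
        blocks.append(cur)
        cur = mat_mul(A, cur, p)
    R = [sum((blk[i] for blk in blocks), []) for i in range(s)]
    return rank_mod_p(R, p) == s


def is_observable(A, C, p):
    """Kalman test: rank of (C; CA; ...; C A^(s-1)) equals s."""
    s = len(A)
    rows, cur = [], C
    for _ in range(s):
        rows.extend(cur)
        cur = mat_mul(cur, A, p)
    return rank_mod_p(rows, p) == s
```

A system passing both tests is minimal in the sense of item (c) above.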
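Next, we will explain how one can obtain a convolutional code from a linear system; see [14]. First, for $(A, B, C, D) \in \mathbb{F}^{s \times s} \times \mathbb{F}^{s \times k} \times \mathbb{F}^{(n-k) \times s} \times \mathbb{F}^{(n-k) \times k}$, we set
$$H(z) := \begin{pmatrix} zI - A & 0_{s \times (n-k)} & -B \\ -C & I_{n-k} & -D \end{pmatrix}.$$
Furthermore, there exist polynomial matrices $X(z) \in \mathbb{F}[z]^{s \times k}$, $Y(z) \in \mathbb{F}[z]^{(n-k) \times k}$ and $U(z) \in \mathbb{F}[z]^{k \times k}$ such that the kernel of $H(z)$ is the image of $(X(z)^\top, Y(z)^\top, U(z)^\top)^\top$ and $G(z) = (Y(z)^\top, U(z)^\top)^\top$ is a generator matrix of the corresponding convolutional code with $C(zI - A)^{-1}B + D = Y(z)U(z)^{-1}$, i.e. one is able to obtain a factorization of the transfer function of the linear system via the generator matrix of the corresponding convolutional code, and in the case that this convolutional code is non-catastrophic, one even obtains a coprime factorization of the transfer function.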
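On the other hand, for each $(n, k, \delta)$ convolutional code $\mathcal{C}$, there exists $(A, B, C, D) \in \mathbb{F}^{s \times s} \times \mathbb{F}^{s \times k} \times \mathbb{F}^{(n-k) \times s} \times \mathbb{F}^{(n-k) \times k}$ such that $\mathcal{C} = \mathcal{C}(A, B, C, D)$.

Remark 3.3. In the coding literature, state space descriptions were often given in a graph theoretic manner using so-called trellis representations; see e.g. [7]. However, especially over large finite fields, it is hard to describe a decoding algorithm algebraically in this way, and hence a state space description as above is preferred.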
The following theorems show how properties of a linear system are related to properties of the corresponding convolutional code.

Low-delay erasure decoding algorithm using the linear systems representation
In this section, we develop our erasure decoding algorithm based on the ISO representation of the convolutional code. Some first ideas on decoding via this representation can already be found in [16]. We adopt some of the ideas presented there and combine them with new ideas to obtain a complete decoding algorithm.
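Assume that we have a message $m = (m_0, \dots, m_\gamma)$ with $m_i \in \mathbb{F}^k$, where $m_i$ is sent at time step $i$. We write this message as $m(z) = \sum_{i=0}^{\gamma} m_{\gamma-i} z^i$ and encode it via a full rank, left prime, column reduced polynomial generator matrix $G(z) \in \mathbb{F}[z]^{n \times k}$ with largest column degree $\mu$, i.e. we send the coefficients $v_0, v_1, \dots, v_{\gamma+\mu} \in \mathbb{F}^n$ of the codeword $v(z) = G(z)m(z)$ one after the other, where $v_i = (y_i^\top, u_i^\top)^\top$ with $y_i \in \mathbb{F}^{n-k}$ and $u_i \in \mathbb{F}^k$.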
Remark 4.1. In principle, it would also be possible to encode the message via the linear system, i.e. to set $u(z) = m(z)$. In this case, one gets a rational generator matrix, which equals the transfer function of the linear system. But to make sure that the state and the output of the linear system have finite support, we would have to impose restrictions on the input, i.e. on the message. This is why we consider this option unsuitable.
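Let $(A, B, C, D)$ be the linear systems representation of the convolutional code generated by $G(z)$. Then, $(y_0, u_0, \dots, y_j, u_j)$ represents the beginning of a codeword if and only if
$$[-I \mid F_j] \, (y_0^\top, \dots, y_j^\top, u_0^\top, \dots, u_j^\top)^\top = 0 \qquad (2)$$
with $F_j$ as defined below. Moreover, one has for $i, j, l \in \mathbb{N}_0$:
$$x_{i+l} = A^l x_i + \begin{pmatrix} A^{l-1}B & \cdots & AB & B \end{pmatrix} \begin{pmatrix} u_i \\ \vdots \\ u_{i+l-1} \end{pmatrix}, \qquad \begin{pmatrix} y_{i+l} \\ \vdots \\ y_{i+l+j} \end{pmatrix} = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^j \end{pmatrix} x_{i+l} + F_j \begin{pmatrix} u_{i+l} \\ \vdots \\ u_{i+l+j} \end{pmatrix}. \qquad (3)$$
Define $F_0 := D$ and for $j \ge 1$
$$F_j := \begin{pmatrix} D & & & \\ CB & D & & \\ \vdots & \ddots & \ddots & \\ CA^{j-1}B & \cdots & CB & D \end{pmatrix}$$
as well as, for $w \in \mathbb{N}_0$,
$$E_w := \begin{pmatrix} CA^{\gamma+\mu}B & \cdots & CAB & CB \\ \vdots & & \vdots & \vdots \\ CA^{\gamma+\mu+w}B & \cdots & CA^{w+1}B & CA^{w}B \end{pmatrix}.$$
Furthermore, $u_i = y_i = 0$ for $i > \gamma + \mu$ implies
$$E_w \, (u_0^\top, \dots, u_{\gamma+\mu}^\top)^\top = 0, \qquad (4)$$
and we define $\tilde{E}_w$ as the submatrix of $E_w$ consisting only of the columns corresponding to components of $(u_0^\top, \dots, u_{\gamma+\mu}^\top)$ that are not known yet. We assume that the erasure recovery process has to be done within time delay $T$, i.e. it is necessary that $m_i$ can be recovered after one has received (with possible erasures) $v_0, \dots, v_i, \dots, v_{i+T}$.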
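Assume that $v_0, \dots, v_{i-1}$ are known and $v_i$ contains erasures. Then, one obtains
$$[-I \mid F_j] \, (y_i^\top, \dots, y_{i+j}^\top, u_i^\top, \dots, u_{i+j}^\top)^\top = \beta, \qquad (5)$$
where $\beta$ is a known vector depending on $v_0, \dots, v_{i-1}$.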
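To make the forward recovery step concrete, the following Python sketch builds the block lower triangular Toeplitz matrix with blocks $D, CB, CAB, \dots$ and solves a system of the form $[-I \mid F_j](y; u) = \beta$ for the erased components. It rests on several assumptions that are not from the paper: a prime field $\mathbb{F}_p$, erasures encoded as `None`, hypothetical function names, and the simplification that either all erasures in the window are recovered uniquely or nothing is returned (the algorithm below is more refined and may recover only $v_i$).

```python
def mat_mul(X, Y, p):
    """Matrix product modulo the prime p (matrices as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*Y)]
            for row in X]


def build_F(A, B, C, D, j, p):
    """Block Toeplitz matrix F_j with D on the diagonal and C A^(t-1) B below."""
    n_out, n_in = len(D), len(D[0])
    markov, cur = [D], B          # markov[t] = D (t = 0) or C A^(t-1) B
    for _ in range(j):
        markov.append(mat_mul(C, cur, p))
        cur = mat_mul(A, cur, p)
    F = [[0] * (n_in * (j + 1)) for _ in range(n_out * (j + 1))]
    for bi in range(j + 1):
        for bj in range(bi + 1):
            for r in range(n_out):
                for c in range(n_in):
                    F[bi * n_out + r][bj * n_in + c] = markov[bi - bj][r][c]
    return F


def solve_unique(M, b, p):
    """Unique solution of M x = b over F_p, or None (needs Python 3.8+)."""
    rows, cols = len(M), len(M[0])
    aug = [[v % p for v in row] + [bv % p] for row, bv in zip(M, b)]
    rank = 0
    for c in range(cols):
        piv = next((r for r in range(rank, rows) if aug[r][c]), None)
        if piv is None:
            return None           # free variable, no unique solution
        aug[rank], aug[piv] = aug[piv], aug[rank]
        inv = pow(aug[rank][c], -1, p)
        aug[rank] = [(v * inv) % p for v in aug[rank]]
        for r in range(rows):
            if r != rank and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[rank])]
        rank += 1
    if any(row[-1] for row in aug[rank:]):
        return None               # inconsistent system
    return [aug[r][-1] for r in range(cols)]


def recover_window(F, beta, y, u, p):
    """Fill the erased (None) entries of y and u using [-I | F](y; u) = beta."""
    unknown = [("y", r) for r, v in enumerate(y) if v is None]
    unknown += [("u", c) for c, v in enumerate(u) if v is None]
    M, rhs = [], []
    for r, row in enumerate(F):
        # move all known terms of row r to the right-hand side
        b = beta[r] + (y[r] if y[r] is not None else 0)
        b -= sum(f * uc for f, uc in zip(row, u) if uc is not None)
        M.append([(p - 1 if idx == r else 0) if kind == "y" else row[idx]
                  for kind, idx in unknown])
        rhs.append(b)
    sol = solve_unique(M, rhs, p)
    if sol is None:
        return None
    y, u = list(y), list(u)
    for (kind, idx), val in zip(unknown, sol):
        (y if kind == "y" else u)[idx] = val
    return y, u
```

Because every erased output component appears in exactly one equation with coefficient $-1$, erasures that occur only in the $y_i$ are read off directly without any elimination.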

2: If there exists $w \in \mathbb{N}_0$ such that $\tilde{E}_w$ has full column rank, go to 12. Otherwise, if $v_i$ contains erasures, go to 3, and if $v_i$ contains no erasures, set $i = i + 1$ and repeat step 2.
4: If $v_i$ can be recovered by solving the system of linear equations induced by $[-I \mid F_j]$ and $v_i, \dots, v_{i+j}$ (see (5)), go to 5; otherwise go to 6.
5: Recover the erasures in $v_i$ (and, if possible, also erasures in $v_{i+1}, \dots, v_{i+j}$) by solving the system of linear equations (5). Replace the erased symbols with the correct symbols and go back to 2.
6: If $j = T$, go to 7. Otherwise, set $j = j + 1$ and go back to 4.
9: If $x_{i+l}$ can be recovered by solving the system of linear equations induced by (3) with $x_{i+l}$ and the erased components of $v_{i+l}, \dots, v_{i+l+j}$ as unknowns, go to 10. Otherwise, go to 11.
10: Recover $x_{i+l}$ and as much as possible of $v_{i+l}, \dots, v_{i+l+j}$ with the help of (3). With the knowledge of $x_{i+l}$ and $u_0, \dots, u_{i-1}$ and with equation (3), obtain a system of linear equations for $u_i, \dots, u_{i+l-1}$, namely $x_{i+l} - A^l x_i = (A^{l-1}B, \dots, AB, B)(u_i^\top, \dots, u_{i+l-1}^\top)^\top$, where $x_i$ is determined by $u_0, \dots, u_{i-1}$. If $l \le \ell$, this equation allows us to recover $u_i, \dots, u_{i+l-1}$ and to use them to compute $y_i, \dots, y_{i+l-1}$ as well. If $l > \ell$, some values of $v_i, \dots, v_{i+l-1}$ are lost, but we can still restart the recovery process after these lost symbols. In either case, set $i = i + l - 1$ and go back to 2.
11: If $j = T - l$, set $l = l + 1$ and go back to 8. Otherwise, set $j = j + 1$ and go back to 9.
12: Use the system of linear equations given by the terminating condition $u_i = y_i = 0$ for $i > \gamma + \mu$ and the full column rank matrix $\tilde{E}_w$ to recover the components of $(u_0^\top, \dots, u_{\gamma+\mu}^\top)$ that are not known yet, and afterwards compute the remaining erased components of the $y_i$.

In steps 4 to 6, the algorithm recovers erasures forward within time delay $T$ as long as this is possible. If it reaches a point where this is not possible, it tries to recover the state of the corresponding linear system (steps 9 to 11) to be able to restart the decoding process (and also recovers symbols that had been lost in between, in case this is possible, even if these symbols are then recovered with a delay that is larger than $T$). After every successful recovery, it is checked in step 2 whether enough symbols are already known to recover the whole message with step 12. Note that, due to the Cayley-Hamilton theorem, one only has to check $\tilde{E}_w$ up to $w = \delta - 1$.
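The rank check in step 2 can be sketched as follows. The code assumes that the terminating matrix stacks, for $t = 0, \dots, w$, the block rows $(CA^{N+t}B, \dots, CA^{t+1}B, CA^{t}B)$ with $N = \gamma + \mu$, which is what $y_i = 0$ for $i > N$ yields for a system started in $x_0 = 0$. It further assumes a prime field $\mathbb{F}_p$; the function names are hypothetical.

```python
def mat_mul(X, Y, p):
    """Matrix product modulo the prime p (matrices as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*Y)]
            for row in X]


def rank_mod_p(M, p):
    """Rank of M over F_p by Gauss-Jordan elimination (needs Python 3.8+)."""
    M = [[x % p for x in row] for row in M]
    rows, cols = len(M), (len(M[0]) if M else 0)
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], -1, p)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(rows):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank


def build_E(A, B, C, N, w, p):
    """Terminating matrix: block (t, i) equals C A^(N - i + t) B."""
    powers = [B]                                  # powers[e] = A^e B
    for _ in range(N + w):
        powers.append(mat_mul(A, powers[-1], p))
    markov = [mat_mul(C, P, p) for P in powers]   # markov[e] = C A^e B
    E = []
    for t in range(w + 1):
        for r in range(len(C)):
            E.append([x for i in range(N + 1) for x in markov[N - i + t][r]])
    return E


def unknowns_recoverable(E, unknown_cols, p):
    """True if the columns of E belonging to unknown inputs are independent."""
    if not unknown_cols:
        return True
    sub = [[row[c] for c in unknown_cols] for row in E]
    return rank_mod_p(sub, p) == len(unknown_cols)
```

Since every block of this matrix factors through the state space, its rank is at most the state dimension, so at most $\delta$ unknown input components can be recovered this way for a minimal representation.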
In order to have a good performance for our algorithm, a convolutional code should fulfill the following properties as well as possible:
1. The nontrivial minors of $F_j$ are nonzero for $j = 1, \dots, T$.
3. For as many sets of columns of $E_w$ as possible, there exists $w \in \{1, \dots, \delta - 1\}$ such that these columns are linearly independent.
4. $\ell$ is as large as possible.
It is difficult to ensure that all these four properties are perfectly fulfilled. However, since these properties involve similar matrices, a reasonable approach seems to be to construct a convolutional code in such a way that some of the properties are fulfilled, and then to check how well the other properties are fulfilled.
Clearly, if 2. is perfectly fulfilled, then so is 1. Furthermore, there already exist constructions for matrices having all nontrivial minors nonzero (in the literature also referred to as superregular matrices); see e.g. [2], [16], [8]. Hence, to illustrate the performance of our algorithm with an example, we will construct a convolutional code such that 2. is perfectly fulfilled and then investigate how well 3. and 4. are fulfilled. Note that 4. is not so important for our algorithm, as it only helps to recover symbols that had to be declared as lost, with a larger delay than allowed by the delay constraint.
Example 4.3. We will construct a $(5, 3, 2)$ convolutional code for decoding with maximum delay $T = L = 1$. First note that property 4 can never be fulfilled for these parameters because $R_l$ has more columns than rows for all $l \in \mathbb{N}_0$. But as mentioned before, this property is only useful for the recovery of lost symbols with larger delay than originally prescribed, and thus it is no problem to neglect it. Hence, we want to construct $(A, B, C, D)$ such that
$$\begin{pmatrix} C & D & 0 \\ CA & CB & D \end{pmatrix}$$
has all nontrivial minors nonzero for a suitable finite field $\mathbb{F}$. We use the construction for superregular matrices from [3] as well as the fact that column permutation preserves superregularity to obtain that
$$\begin{pmatrix} a^{8} & a^{16} & a & a^{2} & a^{4} & 0 & 0 & 0 \\ a^{16} & a^{32} & a^{2} & a^{4} & a^{8} & 0 & 0 & 0 \\ a^{64} & a^{128} & a^{8} & a^{16} & a^{32} & a & a^{2} & a^{4} \\ a^{128} & a^{256} & a^{16} & a^{32} & a^{64} & a^{2} & a^{4} & a^{8} \end{pmatrix},$$
where $\mathbb{F} = \mathbb{F}_{p^N}$ with $N > 330$ and $a$ is a primitive element of $\mathbb{F}$, has the property that all nontrivial minors are nonzero. We immediately obtain
$$D = \begin{pmatrix} a & a^{2} & a^{4} \\ a^{2} & a^{4} & a^{8} \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} a^{8} & a^{16} \\ a^{16} & a^{32} \end{pmatrix}$$
and can compute $A$ and $B$ from the blocks $CA$ and $CB$, as $C$ is invertible. As $B$ is full rank, $(A, B, C, D)$ is a minimal ISO representation of a $(5, 3, 2)$ convolutional code $\mathcal{C}$, and since $F_1$ is superregular, $\mathcal{C}$ is an MDP convolutional code. Hence, in particular, it has to fulfill Theorem 2.5 (ii), which is not possible if $G_1$ has two columns that are identically zero. Hence, a generator matrix $G$ of $\mathcal{C}$ has at most one column degree that is equal to zero. Consequently, $G$ has column degrees $1, 1, 0$, since we assumed it to be a column reduced generator matrix and thus the column degrees of $G$ have to sum up to $\delta = 2$. Therefore, we obtain $\mu = 1$. Assume $\gamma = 3$ and that we receive the following erasure pattern, where $*$ symbolizes an erasure and $\surd$ a received symbol.
Since $\mathcal{C}$ is MDP, it can recover $n - k$ erasures out of $n$ symbols or $2(n - k)$ erasures out of $2n$ symbols (assuming that there are no erasures in front of this window of size $n$ or $2n$, respectively). The steps of our algorithm with $\mathcal{C}$ and the above erasure pattern would be the following. First, the algorithm uses (5) with $j = 0$ to recover $y_0$. Afterwards, one realizes that it is possible neither to recover $y_1$ and $u_1$ with (5) for $j = 0$ nor to recover $y_1, u_1, y_2, u_2$ with (5) for $j = 1$. The algorithm applies (3) with $i = l = 1$ to recover $x_2$ and $y_2$, but the erased components of $y_1$ and $u_1$ have to be declared as lost. Finally, as the matrix consisting of the first column of $\begin{pmatrix} CA^{3}B \\ CA^{4}B \end{pmatrix}$ and all columns of $\begin{pmatrix} CB \\ CAB \end{pmatrix}$ has nonzero determinant, one can use step 12 of the algorithm to recover the lost component of $u_1$ as well as $u_4$ before $u_4$ and $y_4$ were even sent, just with the knowledge of the already known symbols of $u_0, u_1, u_2, u_3$ and with the information that $\gamma = 3$, i.e. $u_i = y_i = 0$ for $i > 4$. Then, with the knowledge of $u_0, \dots, u_4$, it is also possible to compute the erased components of $y_1$ and $y_4$.
In summary, we are able to recover the whole sequence but part of it only with a larger delay than actually allowed.However, we were able to obtain u 4 , y 4 already one time interval before these vectors were sent, i.e. in some sense with delay −1.

Performance Analysis
In this section, we will explain the two main advantages of our systems theoretic decoding algorithm with respect to the (first) erasure decoding algorithm for convolutional codes that can be found in [15], namely the reduced decoding delay and the reduced computational effort.
Our algorithm tries to recover the occurring erasures with the smallest possible delay by first trying to do the recovery in a window of size $n$, afterwards in a window of size $2n$, and so on. In contrast to this approach, the decoding algorithm in [15] first tries to decode in the largest possible window of size $(L + 1)n$ and only decreases this window if it fails to recover all the erasures in the big window. This implies that the decoding delay is always at least $L$. Moreover, it is computationally less complex and less costly to do several decoding steps in small windows than one decoding step in a larger window whose size is the sum of the sizes of the smaller windows, since it is easier to solve several small systems of linear equations than one large one. In addition, by using the linear systems approach, the systems of equations we have to solve for erasure recovery are parts of linear systems that are already in echelon form; see (5). In particular, when we transmit over a channel whose statistics make erasures more likely in the $y_i$ than in the $u_i$, this is a considerable advantage, as one can obtain any erased component of any $y_i$ (that can be recovered at all) directly from (5) with very small computational effort.
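The claim about several small systems versus one large system can be quantified under a simple cost model. As an assumption for illustration (not a statement from the paper), suppose that solving a dense system with $m$ unknowns by Gaussian elimination costs on the order of $m^3$ field operations. Then $t$ windows with $m$ unknowns each compare with one window with $tm$ unknowns as follows:

```latex
% Assumed cost model: cost(m) \approx c\, m^3 for a dense system with m unknowns.
\[
  \underbrace{t \cdot c\, m^{3}}_{\text{$t$ small systems}}
  \quad\text{versus}\quad
  \underbrace{c\,(t m)^{3} = t^{3} \cdot c\, m^{3}}_{\text{one large system}},
  \qquad\text{so}\qquad
  \frac{c\,(tm)^{3}}{t \cdot c\, m^{3}} = t^{2}.
\]
% Example: t = 2 windows already give a factor of 4 under this model.
```

Under this model, the saving grows quadratically in the number of windows; the actual gain depends on the erasure pattern and on how much of the echelon structure of (5) can be exploited.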
Finally, as we already observed in our example, the use of the terminating equations in step 12 of the algorithm can make it possible to obtain symbols that have not even been sent yet, i.e. in some sense we are able to "look into the future" and terminate the decoding before the end of the transmission. This is of course an additional considerable reduction of the decoding delay.

Conclusion
In this paper, we presented an erasure decoding algorithm for convolutional codes employing their linear systems representation. We observed that this algorithm is able to reduce the decoding delay and the computational effort in comparison with previous algorithms.
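A discrete-time linear system over $\mathbb{F}$ is given by
$$x_{\tau+1} = Ax_\tau + Bu_\tau, \qquad y_\tau = Cx_\tau + Du_\tau \qquad (1)$$
with $(A, B, C, D) \in \mathbb{F}^{s \times s} \times \mathbb{F}^{s \times k} \times \mathbb{F}^{(n-k) \times s} \times \mathbb{F}^{(n-k) \times k}$, input $u \in \mathbb{F}^{k}$, state vector $x \in \mathbb{F}^{s}$, output $y \in \mathbb{F}^{n-k}$ and $s, \tau \in \mathbb{N}_0$. We identify this system with the matrix-quadruple $(A, B, C, D)$. The function $T(z) = C(zI - A)^{-1}B + D$ is called the transfer function of the linear system.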

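Theorem 3.2. (Kalman test) A linear system (1) is reachable if and only if the reachability matrix $R(A, B) := (B, AB, \dots, A^{s-1}B) \in \mathbb{F}^{s \times sk}$ satisfies $\mathrm{rk}(R(A, B)) = s$, and observable if and only if the observability matrix $O(A, C) := (C^\top, (CA)^\top, \dots, (CA^{s-1})^\top)^\top \in \mathbb{F}^{(n-k)s \times s}$ satisfies $\mathrm{rk}(O(A, C)) = s$.

The set of all $(y(z)^\top, u(z)^\top)^\top \in \mathbb{F}[z]^n$ for which there exists $x(z) \in \mathbb{F}[z]^s$ with $zx(z) = Ax(z) + Bu(z)$ and $y(z) = Cx(z) + Du(z)$ forms a submodule of $\mathbb{F}[z]^n$ of rank $k$ and thus an $(n, k)$ convolutional code, denoted by $\mathcal{C}(A, B, C, D)$. Moreover, if one writes $x(z) = x_0 z^\gamma + \dots + x_\gamma$, $y(z) = y_0 z^\gamma + \dots + y_\gamma$ and $u(z) = u_0 z^\gamma + \dots + u_\gamma$ with $\gamma = \max(\deg(x), \deg(y), \deg(u))$, it holds that $x_{\tau+1} = Ax_\tau + Bu_\tau$ and $y_\tau = Cx_\tau + Du_\tau$ with $x_0 = 0$ and $(x_\tau^\top, y_\tau^\top, u_\tau^\top) = 0$ for $\tau > \gamma$. If $\mathcal{C} = \mathcal{C}(A, B, C, D)$, then $(A, B, C, D)$ is called a linear systems representation or input-state-output (ISO) representation of $\mathcal{C}$. Besides, one can always choose $s = \delta$. In this case, $(A, B, C, D)$ is called a minimal representation of $\mathcal{C}$.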
Moreover, we set $\ell := \max\{l \mid R_l \text{ has full column rank}\}$ if $B$ has full column rank and $\ell := -1$ otherwise.

Theorem 4.2. [9] The quadruple $(A, B, C, D)$ is the linear systems representation of an MDP convolutional code if and only if each minor of $F_L$ which is not trivially zero is nonzero.