1 Introduction

In current 5G new radio (NR) and next-generation wireless networks, high reliability and low power consumption are two important metrics that must be carefully addressed. Typical applications include machine-to-machine (M2M) communications in the Internet of Things (IoT) [1], vehicle-to-vehicle communications in intelligent transportation systems (ITS) [2], and wireless sensor networks used in smart buildings [3]. These applications transmit messages in short packets with high-rate codes, while the wireless channels may be affected by a number of noise sources, for example impulse noise from point discharging, the movement of surrounding humans or robots, and other wireless devices deployed in the same area. In this regard, capacity-achieving decoding algorithms and efficient decoding architectures are at the core of any realization [4]. However, meeting the high reliability and low power consumption requirements remains a challenging task [5, 6], because a higher code rate often implies weaker error correction capability, and powerful decoding algorithms may require higher decoding complexity.

Among the schemes targeting high reliability for short packet transmission, the guessing random additive noise decoding (GRAND) algorithm and its derived variants have proven to be an effective solution [5, 7]. The original GRAND is a hard-decision decoder, and it has since been improved into a number of variants that perform well over idealized channel models. This paper concentrates on the GRAND-Markov order (GRAND-MO) variant for two reasons: i) in practical civil and industrial applications, noise sources in the physical channels inherently have time-dependent correlations, and these time-correlated additive noise effects usually produce errors that arrive in bursts at the receiver side; ii) in concatenated coding schemes, the outer code decoder receives a hard-decision packet, namely the output of the inner code decoder, so only hard-decision decoding algorithms can be used for the outer code. GRAND-MO provides an alternative solution for such cases, and it is therefore crucial to develop a generator that can produce noise error patterns (NEPs) efficiently.

Based on the successive addition and subtraction (SAS) operation for GRAND-MO [8], this paper goes further by converting the theoretical SAS-based operation into the hardware architecture of an NEPs generator, followed by its realization on a field programmable gate array (FPGA). First, we propose an efficient architecture that generates all permutations for given burst parameters, in which the “1" and “0" burst permutations are computed in parallel in the same way. Second, a dedicated processing module regularizes the generated “0" burst permutations; this technique enables a unified generation of the four types of NEPs. Third, we optimize the NEPs generator architecture through bit flipping: starting from an initialized all-zero sequence, the “1" bursts are obtained by flipping the 0 bits at the corresponding indexes. The proposed NEPs generator can generate all of the putative NEPs for given burst parameters, and can therefore be adapted to other hard-decision GRAND variants whenever the burst parameters are available.

This paper is organized as follows. Section 2 gives background knowledge of the GRAND-MO algorithm and related work on GRAND variants. Section 3 presents a unified construction technique for the four types of NEPs. Section 4 proposes the overall NEPs generation architecture, including the design of the key modules and the related mathematical derivations. Section 5 presents the decoding performance, hardware overhead and power consumption of the proposed NEPs generator to demonstrate its effectiveness. Finally, Section 6 concludes this work.

2 Background knowledge and related works of GRAND

2.1 Preliminaries of the GRAND-MO algorithm

Aiming to improve the error correction capability of short-packet, high-rate codes, the GRAND strategy was presented in [9]; the theoretical analysis and performance verification were then detailed in [5] and [6]. For a clear explanation, this paper expresses the vectors of the coded packet, the additive noise effects and the demodulated packet as \(\varvec{X}=x_{1} \cdots x_{i} \cdots x_{N}\), \(\varvec{Z}=z_{1} \cdots z_{i} \cdots z_{N}\) and \(\varvec{Y}=y_{1} \cdots y_{i} \cdots y_{N}\), respectively, where N is the vector length and \(x_{i}\), \(z_{i}\) and \(y_{i}\) are binary bits in the set \(\{0, 1\}\). Under these assumptions and using modulo-2 addition, the vector \(\varvec{Y}\) can be expressed by Eq. (1).

$$\begin{aligned} \varvec{Y} = \varvec{X} \oplus \varvec{Z} \end{aligned}$$
(1)

By transmitting known packets and recording the distribution of received error bits, the probability distribution of the noise effects can be obtained. In the basic GRAND algorithm, binary putative NEP vectors \(\varvec{E}=e_{1} \cdots e_{i} \cdots e_{N}\) are generated from the most likely to the least likely according to this statistical probability. The putative NEPs are then used to guess the additive noise effect vector \(\varvec{Z}\). The guessing operation is the following 3-step procedure:

  • Step 1: Calculate the guessed vector \(\hat{\varvec{X}}\) of \(\varvec{X}\) by Eq. (2).

    $$\begin{aligned} \hat{\varvec{X}} = \varvec{Y} \oplus \varvec{E} \end{aligned}$$
    (2)
  • Step 2: Since \(\varvec{X}\) is a codeword in the linear code space \(\varvec{C}\), the legality of \(\hat{\varvec{X}}\) is verified by computing the syndrome vector \(\varvec{S}\) through the matrix operation in Eq. (3).

    $$\begin{aligned} \varvec{S} = \hat{\varvec{X}} \cdot \varvec{H}^{T} \end{aligned}$$
    (3)

    where \(\varvec{H}\) is the parity check matrix of codeword space \(\varvec{C}\), and T stands for the transpose operation.

  • Step 3: If the syndrome vector \(\varvec{S}\) is the zero vector, then \(\varvec{E}\) equals \(\varvec{Z}\) and the legality of \(\hat{\varvec{X}}\) is verified, so \(\hat{\varvec{X}}\) is output as the decoding result. Otherwise, a new NEP \(\varvec{E}\) is generated to start the next round of the guessing operation.
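The 3-step guessing loop can be sketched in Python. This is our own illustration, not the paper's implementation: it uses the parity-check matrix of a Hamming(7,4) code as a stand-in, and queries error patterns in increasing Hamming-weight order as a simple stand-in for the likelihood ordering.

```python
from itertools import combinations

# Parity-check matrix of a Hamming(7,4) code (illustrative choice,
# not tied to the paper's coding scheme).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
N = 7

def syndrome(x):
    """S = x * H^T over GF(2), Eq. (3)."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

def grand_decode(y, max_weight=3):
    """Query putative NEPs E from most to least likely (here: by
    Hamming weight) until Y xor E passes the parity check."""
    for w in range(max_weight + 1):
        for idxs in combinations(range(N), w):
            e = [0] * N
            for i in idxs:
                e[i] = 1
            x_hat = [yi ^ ei for yi, ei in zip(y, e)]   # Eq. (2)
            if not any(syndrome(x_hat)):                # Eq. (3): S == 0
                return x_hat
    return None  # abandonment

# A noisy observation of the all-zero codeword with one flipped bit.
y = [0, 0, 0, 0, 1, 0, 0]
print(grand_decode(y))
```

The first NEP whose flip produces a zero syndrome is accepted, which recovers the all-zero codeword in this example.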

In real environments, communication channels are not memoryless and may cause clumps of errors. Theoretically, bursty errors in a binary symmetric channel can be modeled by a two-state Markov chain, where the transition probabilities from the good state to the bad state and from the bad state to the good state are b and g, respectively. An et al. [6, 10] proposed the GRAND-MO algorithm, in which the memory characteristic of a binary channel is scaled by the parameter \(\Delta l\), calculated by Eq. (4).

$$\begin{aligned} \Delta l = \left\lfloor \frac{\log (b/g)}{\log ((1-g)/(1-b))} \right\rfloor -1 \end{aligned}$$
(4)

For a class of putative NEPs \(\varvec{E}=e_{1} \cdots e_{i} \cdots e_{N}\) that contains m “1" bursts and \(l_{m}\) 1s in total, the “1" burst parameters are denoted \(\{m,l_{m}\}\) in this paper. For example, if \(\varvec{E}=0100111000\), then \(m=2\) and \(l_{m}=4\). Using the \(\Delta l\) parameter, the NEP classes with different “1" burst parameters \(\{m,l_{m}\}\) can be ordered from the most likely to the least likely. The first hardware implementation of GRAND-MO was proposed in [11]: with syndromes stored in memory for localizing the error bit positions, the NEPs are generated in a parallel manner. An enhanced version of this work was subsequently reported in [7]. In [8], the authors proposed a scheme based on a successive addition-subtraction (SAS) operation, which calculates all permutations corresponding to predefined “1" burst parameters \(\{m,l_{m}\}\).

2.2 Related works of GRAND

Inspired by the optimal decoding performance that the GRAND algorithm can offer [12, 13], researchers have proposed a number of improved variants to simplify practical application. In [5], GRAND with abandonment (GRANDAB) was proposed, which limits the number of corrupted bits in the generated NEPs. Recently, the GRANDAB algorithm was turned into a high-throughput engineering realization in [14]; by calculating the syndromes associated with the flipped error bits, a maximum of 3 error bits can be corrected. In [15], the average number of queries is significantly reduced by exploiting the code structure to facilitate the membership checking operation. As an extension of basic hard-decision GRAND, Duffy and Médard [16] proposed the soft GRAND (SGRAND) algorithm, which avails of soft-detection symbol reliability information and whose decoding complexity drops significantly as the code rate increases. In [17], the authors employed symbol reliability information to indicate whether a demodulated symbol is reliable or not; through this technique, they introduced the SRGRAND and SRGRANDAB variants, aiming to strike a balance between decoding performance and computational complexity.

To approach the decoding performance of SGRAND while retaining a parallelizable architecture like that of SRGRAND, the ordered reliability bits GRAND (ORBGRAND) variant was proposed in [18]. By sorting the bit reliabilities from the least to the most reliable, ORBGRAND generates NEPs from the most likely to the least likely, which is equivalent to ranking the NEPs by increasing logistic weight (LW). To reduce decoding complexity and hardware overhead, Condo et al. [19] proposed a sequential LW generation algorithm in which each NEP is derived from its predecessor. At high signal-to-noise ratio (SNR), Condo [20] observed that some NEPs with high LW but low Hamming weight appear more frequently, leading to a proposal to store these empirically observed NEPs in a table; this look-up-table-aided technique bounds the decoding process to a fixed latency. The latest enhancement of ORBGRAND is listed-GRAND (LGRAND) [21], in which a list is generated during decoding and the candidate NEP with the highest likelihood is selected.

3 Construction of the NEPs in GRAND-MO

As seen in Section 2.1, NEPs generation is the core of the GRAND-MO algorithm. This section starts by analyzing the NEPs construction in GRAND-MO. First, we illustrate the construction of NEPs in the conventional manner, which involves four types of NEPs. Then, we propose to regularize the “0" burst permutations, by which the generation of the four types of NEPs is unified to facilitate hardware implementation.

3.1 Conventional construction of the four types of NEPs

To generate the NEPs for given “1” burst parameters \(\{m,l_{m}\}\), the first step is to calculate the corresponding permutations \(\{A_{1}, \cdots , A_{i}, \cdots A_{M} \}\), where \(M=\binom{l_{m}-1}{m-1}\) [8], \(A_{i}=(a_{i,1}, \cdots , a_{i,k}, \cdots , a_{i,m})\), \(1 \le a_{i,k} \le l_{m}\), and \(a_{i,k}\) is the number of 1s in the \(k^{th}\) “1" burst. Each NEP contains \(l_{m}\) 1s, so \(\sum _{k=1}^{m}a_{i,k}=l_{m}\). Since the NEPs also include “0" bursts, this paper denotes the “0" burst parameters as \(\{n,l_{n}\}\), where n is the number of “0" bursts and \(l_{n}\) is the total number of 0 bits in the NEP. In the same way, we represent the “0" burst permutations as \(\{B_{1}, \cdots , B_{j}, \cdots , B_{P} \}\), where \(B_{j}=(b_{j,1}, \cdots , b_{j,k}, \cdots , b_{j,n})\), \(\sum _{k=1}^{n}b_{j,k}=l_{n}\), and \(P=\binom{l_{n}-1}{n-1}\). Classifying by the locations of the “1" and “0" bits, and to make the NEPs construction clear, we divide it into four cases, illustrated below, where N is the length of the NEPs and \(\{m,l_{m}\}\) are the “1" burst parameters.

  1. Case 1: For the NEPs with the form \(\varvec{E}=0 e_{2} \cdots e_{i} \cdots e_{N-1} 0\), the start and end bits are both 0; the “0" burst parameters of this type of NEPs are \(\{n,l_{n}\}=\{m+1,N-l_{m}\}\).

  2. Case 2: For the NEPs with the form \(\varvec{E}=1 e_{2} \cdots e_{i} \cdots e_{N-1} 1\), the start and end bits are both 1; the “0" burst parameters of this type of NEPs are \(\{n,l_{n}\}=\{m-1,N-l_{m}\}\).

  3. Case 3: For the NEPs with the form \(\varvec{E}=0 e_{2} \cdots e_{i} \cdots e_{N-1} 1\), the start bit is 0 and the end bit is 1; the “0" burst parameters of this type of NEPs are \(\{n,l_{n}\}=\{m,N-l_{m}\}\).

  4. Case 4: For the NEPs with the form \(\varvec{E}=1 e_{2} \cdots e_{i} \cdots e_{N-1} 0\), the start bit is 1 and the end bit is 0; the “0" burst parameters of this type of NEPs are \(\{n,l_{n}\}=\{m,N-l_{m}\}\).

For given parameters N, m and \(l_{m}\), the “0" burst parameters \(\{n,l_{n}\}\) of each case can be calculated. Two facts are worth noting: i) for each of the four cases, there are \(\binom{l_{m}-1}{m-1}\) “1" burst permutations and \(\binom{l_{n}-1}{n-1}\) “0" burst permutations; ii) each pair of “1" and “0" burst permutations \((A_{i}, B_{j})\) corresponds to one NEP. Summing over the four cases, the total number of NEPs is \(\binom{l_{m}-1}{m-1} \left[ \binom{N-l_{m}-1}{m}+\binom{N-l_{m}-1}{m-2}+2\binom{N-l_{m}-1}{m-1} \right]\).
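As a sanity check on the case parameters and the total count, the following sketch (our own illustration) compares the closed form against a brute-force enumeration of all length-N binary sequences whose “1" bursts match \(\{m, l_m\}\):

```python
from itertools import product
from math import comb

def zero_burst_params(N, m, l_m, case):
    """'0' burst parameters {n, l_n} for cases 1-4."""
    n = {1: m + 1, 2: m - 1, 3: m, 4: m}[case]
    return n, N - l_m

def total_neps(N, m, l_m):
    """Closed-form total number of NEPs over the four cases."""
    ln = N - l_m
    return comb(l_m - 1, m - 1) * (
        comb(ln - 1, m) + comb(ln - 1, m - 2) + 2 * comb(ln - 1, m - 1))

def brute_force(N, m, l_m):
    """Count sequences with exactly m '1' runs and l_m ones in total."""
    count = 0
    for bits in product([0, 1], repeat=N):
        runs = [r for r in ''.join(map(str, bits)).split('0') if r]
        if len(runs) == m and sum(len(r) for r in runs) == l_m:
            count += 1
    return count

print(total_neps(10, 2, 4), brute_force(10, 2, 4))
```

Both counts agree, confirming that the four cases partition the NEP set without overlap.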

Assume \(N=15\) and let one of the “1" burst permutations be \(A_{i}=(2,1,2)\); the corresponding “0" burst permutations are \(B_{j}=(2,4,1,3)\), \(B_{j}=(6,4)\), \(B_{j}=(2,4,4)\) and \(B_{j}=(2,4,4)\) for cases 1, 2, 3 and 4, respectively. Examples of the construction for the four cases are presented in Fig. 1(a)–(d); the NEPs are constructed by alternately piecing the “1" and “0" bursts together. The block diagram of the NEPs generator in this conventional manner is presented in Fig. 1(e). In the “1" and “0" burst generators, \(A_{i}\) and \(B_{j}\) are used to generate the m “1" bursts and n “0" bursts, which are output through a switch to form one NEP.

3.2 Unified NEPs generation technique

As shown by the NEP constructions in Fig. 1, the piecing order of the “1" and “0" bursts differs between the cases. As a result, the circuit realization of such an NEPs generator module would be very complicated. In the four cases, the “1" burst permutation \(A_{i}\) is the same, while the “0" burst permutations \(B_{j}\) differ from each other. Therefore, this paper proposes to regularize the “0" burst permutations. As shown in Eqs. (5a)–(5d), by prepending and/or appending a zero to the original permutations, the regularized permutations \(\bar{B}_{j}=(\bar{b}_{j,1}, \bar{b}_{j,2}, \cdots , \bar{b}_{j,k}, \cdots , \bar{b}_{j,m}, \bar{b}_{j,m+1})\) all have \(m+1\) elements, where \(C_{\alpha } \in \{1, 2, 3, 4\}\) is the case indicator for cases 1, 2, 3 and 4, respectively.

$$\begin{aligned} \bar{B}_{j}&=(b_{j,1}, b_{j,2}, \cdots , b_{j,k}, \cdots , b_{j,m}, b_{j,m+1}) \ \ \ C_{\alpha }=1 \end{aligned}$$
(5a)
$$\begin{aligned} \bar{B}_{j}&=(0, \ \ \ b_{j,1}, \cdots , b_{j,k}, \cdots , b_{j,m-1}, \ \ \ 0) \ \ \ C_{\alpha }=2 \end{aligned}$$
(5b)
$$\begin{aligned} \bar{B}_{j}&=(b_{j,1}, b_{j,2}, \cdots , b_{j,k}, \cdots , \ \ b_{j,m}, \ \ \ \ 0) \ \ \ \ C_{\alpha }=3 \end{aligned}$$
(5c)
$$\begin{aligned} \bar{B}_{j}&=(0, \ \ \ b_{j,1}, \cdots , b_{j,k}, \cdots , b_{j,m-1}, b_{j,m}) \ \ \ C_{\alpha }=4 \end{aligned}$$
(5d)

Under the same assumptions as in Fig. 1, examples of the unified NEP construction for the four cases are illustrated in Fig. 2. For each of the four cases, the first burst is a “0" burst with \(\bar{b}_{j,1}\) 0 bits, followed by a “1" burst with \(a_{i,1}\) 1 bits. After m pairs of “0" and “1" burst piecing operations, the last burst is a “0" burst with \(\bar{b}_{j,m+1}\) 0 bits. Consequently, each NEP is composed of \(m+1\) “0" bursts and m “1" bursts, and both the first and the last bursts are “0" bursts. Note that a “0" burst with zero 0 bits denotes an empty operation, represented by the \(\phi\) symbol in Fig. 2(b)–(d). In essence, the regularization in Eqs. (5a)–(5d) transforms the “0" burst permutations of the four cases into the same form, resulting in a unified NEPs generation module. As shown in Fig. 2(e), with the input \(A_{i}\) and \(\bar{B}_{j}\) permutations, the start index \(p_{start}\) and the end index \(p_{end}\) of each of the m “1" bursts are iteratively calculated (as detailed in Section 4.3). Let \(\varvec{E}_{0}\) be an N-bit all-zero sequence; by iteratively flipping to 1 the 0 bits at the indexes from \(p_{start}\) to \(p_{end}\), an NEP with m “1" bursts is generated.
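The regularization and piecing process can be sketched as follows. This is our own Python illustration of Eqs. (5a)–(5d) and Fig. 2, reproducing the \(N=15\) example above:

```python
def regularize(B, case):
    """Pad B_j with zeros per Eqs. (5a)-(5d) so it has m+1 elements."""
    return {1: list(B),
            2: [0] + list(B) + [0],
            3: list(B) + [0],
            4: [0] + list(B)}[case]

def build_nep(A, B_bar):
    """Piece alternating '0'/'1' bursts: B_bar has m+1 entries, A has m.
    A zero-length '0' burst is the empty operation (phi in Fig. 2)."""
    bits = []
    for k, a in enumerate(A):
        bits += [0] * B_bar[k] + [1] * a
    bits += [0] * B_bar[-1]
    return ''.join(map(str, bits))

A = (2, 1, 2)  # the '1' burst permutation from the example, N = 15
for case, B in [(1, (2, 4, 1, 3)), (2, (6, 4)),
                (3, (2, 4, 4)), (4, (2, 4, 4))]:
    print(case, build_nep(A, regularize(B, case)))
```

All four NEPs have length 15 and the correct start/end bits for their case, even though the piecing loop is identical.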

Fig. 1
figure 1

Explanations of the NEPs construction in the conventional manner

Fig. 2
figure 2

Explanations of the NEPs construction in the unified manner

4 Hardware design of the NEPs generator

This section first proposes the NEPs generator architecture. Then, its two key parts, i.e., the permutation generation module and the NEPs generation module, are analyzed and designed.

4.1 Diagram of the NEPs generator

As discussed in Section 3.2, NEPs generation can be viewed as a process in which the m “1" bursts are obtained by iteratively flipping the 0 bits at the corresponding indexes. Accordingly, this paper proposes the NEPs generation diagram illustrated in Fig. 3.

Fig. 3
figure 3

Diagram of the proposed NEPs generator

Figure 3 is mainly composed of two parts: the permutation generation module and the NEPs generation module. The NEPs generator operates as follows.

  • Step 1: With the input “1" burst parameters \(\{m,l_{m}\}\), packet length N and case indicator \(C_{\alpha }\), the “0" burst parameters \(\{n,l_{n}\}\) are calculated. This step provides the parameters for the “0" burst permutation generation.

  • Step 2: With the input “0" burst parameters \(\{n,l_{n}\}\), all P permutations \(\{\bar{B}_{1}, \cdots , \bar{B}_{j}, \cdots , \bar{B}_{P} \}\) are generated in the \(B_{j}\) permutation generation module; using Eqs. (5a)–(5d), each \(\bar{B}_{j}\) is obtained by regularizing \(B_{j}\). Simultaneously, the M permutations \(\{A_{1}, \cdots , A_{i}, \cdots A_{M} \}\) are generated from the “1" burst parameters \(\{m,l_{m}\}\) in the \(A_{i}\) permutation generation module.

  • Step 3: Each “1" burst permutation \(A_{i}\), \(i=1,2,\cdots ,M\), is paired with the P regularized “0" burst permutations \(\bar{B}_{j}\), \(j=1,2,\cdots ,P\), giving \(M \times P\) pairs in total. Each \((A_{i}, \bar{B}_{j})\) pair is then used to iteratively compute m pairs of the start index \(p_{start}\) and end index \(p_{end}\) (as detailed in Section 4.3).

  • Step 4: Initialize an N-bit all-zero sequence \(\varvec{E}_{0}\). With the m pairs of the start index \(p_{start}\) and end index \(p_{end}\), the m “1" bursts are obtained by flipping the 0 bits at the indexes from \(p_{start}\) to \(p_{end}\). When the bit flipping process is completed, \(\varvec{E}\) is output as one NEP.

In the benchmark of [7], syndromes related to the coding structure are used to localize the positions of the error bits; once the correctness of a guessed result is verified, the corresponding NEP is generated to calculate the codeword. Therefore, for different linear block coding schemes the syndromes may differ, and they must be stored in memory during the initialization stage whenever the coding scheme changes. As seen from Fig. 3, the proposed NEPs generator produces the NEPs using only the N, m, \(l_{m}\) and \(C_{\alpha }\) parameters. When a putative NEP is generated, it is output to calculate the guessed codeword and to verify the decoding result by Eqs. (2) and (3), respectively. As such, the proposed NEPs generator is independent of the coding structure.

4.2 Architecture design of the permutation generation module

To generate all of the NEPs, the precondition is to generate all of the “1" and “0" burst permutations. In Fig. 3, the “1" and “0" burst permutation generation modules have the same architecture; the only difference is that their input burst parameters are \(\{m,l_{m}\}\) and \(\{n,l_{n}\}\), respectively. Taking the “1" burst permutation generation module as an example, this subsection presents the design of its architecture.

Algorithm 1
figure g

\(A_{i} \xrightarrow {SAS} A_{j}\)

For convenience of analysis, this paper expresses the “1" burst permutations \(A_{i}\) by Eq. (6). Note that, in the SAS operation of Algorithm 1, a variable (\(c_{1}\) in Algorithm 1) is added to one of the trailing “1" elements of \(A_{i}\); the number of such trailing 1s is denoted f in Eq. (6).

$$\begin{aligned} A_{i}=(a_{i,1}, a_{i,2}, \cdots , \overbrace{1, \cdots , 1}^{f}) \end{aligned}$$
(6)

A burst permutation \(A_{i}\) is referred to as a mother permutation if \(a_{i,1} \ge 2\) and \(f \ge 1\). Taking \(A_{i}=(4,2,1,1,1)\) as an instance, \(a_{i,1}=4\) and \(f=3\), so it is a mother permutation.

For the input “1" burst parameters \(\{m,l_{m}\}\), we initialize the first permutation as \(A_{1}=(l_{m}-m+1, 1, \cdots , 1)\); thus the first element is \(a_{1,1}=l_{m}-m+1\) and the f parameter is \(m-1\). The remaining \(\binom{l_{m}-1}{m-1}-1\) permutations are calculated by the following steps.

  • Step 1: Set up a last-in first-out (LIFO) memory Mem, and initialize the address pointer Addr to 0. If \(A_{1}\) is a mother permutation, it is stored in Mem and Addr is updated as \(Addr=Addr+1\). Otherwise, \(A_{1}\) is the only permutation for the “1" burst parameters \(\{m,l_{m}\}\); since Addr equals 0, the permutation generation process is finished. In either case, \(A_{1}\) is output through a buffer.

  • Step 2: Read a mother permutation, say \(A_{i}\), from the LIFO memory Mem, and update the address pointer as \(Addr=Addr-1\). Then \(A_{i}\) is used to perform the SAS-based operation, presented as pseudo code in Algorithm 1. In this paper, the SAS-based operation is denoted \(A_{i} \xrightarrow {SAS} A_{j} \ (i<j)\).

  • Step 3: The SAS-based operation in Step 2 has two loops and generates \((a_{i,1}-1) \times f\) new permutations. If a newly generated permutation is a mother permutation, the address pointer is updated as \(Addr=Addr+1\) and this mother permutation is stored in the LIFO Mem. In addition, all of the new permutations are output to the buffer.

  • Step 4: Steps 2 and 3 are iterated. When \(Addr=0\), no mother permutations remain in Mem, and the whole permutation generation process is finished. In total, \(M=\binom{l_{m}-1}{m-1}\) permutations are generated [8] and output through a buffer to the NEPs generation module.
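The LIFO-based procedure of Steps 1–4 can be sketched in Python. The inner SAS operation below is our reconstruction of Algorithm 1 (not a copy of it): for each \(c_1 = 1, \ldots, a_{i,1}-1\) and each of the f trailing-1 positions \(c_2\), it subtracts \(c_1\) from the first element and adds it to position \(c_2\), which matches the count \((a_{i,1}-1)\times f\) and the mother-permutation counts of Eqs. (7)–(8).

```python
def trailing_ones(A):
    """f: number of trailing elements equal to 1."""
    f = 0
    for a in reversed(A):
        if a != 1:
            break
        f += 1
    return f

def is_mother(A):
    return A[0] >= 2 and trailing_ones(A) >= 1

def sas(A):
    """SAS-based operation A_i -> {A_j}: (a_{i,1}-1) * f new permutations."""
    m, f = len(A), trailing_ones(A)
    out = []
    for c1 in range(1, A[0]):            # amount moved from a_{i,1}
        for c2 in range(m - f, m):       # one of the f trailing-1 slots
            B = list(A)
            B[0] -= c1
            B[c2] += c1
            out.append(tuple(B))
    return out

def all_permutations(m, l_m):
    """Steps 1-4: LIFO of mother permutations, seeded with A_1."""
    A1 = (l_m - m + 1,) + (1,) * (m - 1)
    perms, lifo = [A1], []
    if is_mother(A1):
        lifo.append(A1)                  # Addr = Addr + 1
    while lifo:                          # loop until Addr = 0
        Ai = lifo.pop()                  # LIFO read, Addr = Addr - 1
        for Aj in sas(Ai):
            perms.append(Aj)
            if is_mother(Aj):
                lifo.append(Aj)
    return perms

print(sorted(all_permutations(3, 6)))
```

For \(\{m,l_m\}=\{3,6\}\) this yields all \(\binom{5}{2}=10\) compositions of 6 into 3 positive parts, with no duplicates.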

A closer look at the SAS-based operation in Algorithm 1 shows that when \(c_{1}=a_{i,1}-1\), the first element \(a_{j,1}\) of the f newly generated permutations \(A_{j}\) equals 1, so these \(A_{j}\) are not mother permutations. Similarly, when \(c_{2}=m\), the last element \(a_{j,m}\) of the \(a_{i,1}-1\) newly generated permutations is greater than 1 (\(f=0\)), so these permutations are not mother permutations either. The permutation with \(c_{1}=a_{i,1}-1\) and \(c_{2}=m\) belongs to both cases simultaneously, so in total \(f+(a_{i,1}-1)-1\) newly generated permutations are not mother permutations. Therefore, when a mother permutation \(A_{i}\) is input to Algorithm 1 to perform the SAS-based operation, the number \(K_{mem}\) of newly generated mother permutations is given by Eq. (7).

$$\begin{aligned} \begin{aligned} K_{mem}&=(a_{i,1}-1) \times f-(f+(a_{i,1}-1)-1) \\&=(a_{i,1}-2) \times (f-1) \end{aligned} \end{aligned}$$
(7)

As can be seen from Eq. (7), \(K_{mem}\) increases with both \(a_{i,1}\) and f. Two aspects are worth noting: i) for the mother permutation \(A_{1}\), \(a_{1,1}=l_{m}-(m-1)\) and \(f=m-1\), so \(K_{mem}\) reaches its maximum when \(A_{1}\) is read out to perform the SAS operation; ii) as seen from Algorithm 1, in each round of the SAS operation the mother permutation with \(a_{i,1}=2\) and \(f=1\) is the last one stored in the LIFO Mem, and in the next round it is the first one read out, generating no new mother permutations. To summarize, for the “1" burst parameters \(\{m,l_{m}\}\), at most \(K_{mem}^{max}\) mother permutations are calculated and stored in the LIFO Mem, as given by Eq. (8).

$$\begin{aligned} \begin{aligned} K_{mem}^{max}&=(a_{i,1}-2) \times (f-1)|_{a_{i,1}=l_{m}-(m-1), f=m-1} \\&=(l_{m}-m-1)(m-2) \end{aligned} \end{aligned}$$
(8)

Based on the above analysis, architecture design of the permutation generation module is presented in Fig. 4.

Fig. 4
figure 4

Architecture design of the permutation generation module

4.3 Architecture design of the NEPs generation module

This subsection presents the architecture of the NEPs generation module, which uses the generated permutations \(A_{i}\) and \(B_{j}\). To achieve an efficient realization, two techniques are adopted: i) typically, \(l_{m}\) is smaller than \(l_{n}=N-l_{m}\); therefore, an N-bit all-zero sequence \(\varvec{E}_{0}\) is initialized, and a putative NEP is obtained by simply flipping the 0 bits at the corresponding indexes of \(\varvec{E}_{0}\); ii) for the “0" burst parameters \(\{n,l_{n}\}\) of the four types of NEPs in Fig. 1, n differs between the cases, which would complicate the architecture. With the regularization of the “0" burst permutations \(B_{j}\) proposed in Section 3.2, the four types of NEPs can be generated in the same way.

Fig. 5
figure 5

Architecture design of the NEPs generation module

Figure 5 shows the architecture of the NEPs generation module, which proceeds by the following steps.

  • Step 1: \(B_{j}\) regularization. A permutation \(B_{j}\) is input into a shift register. Based on Eqs. (5a)–(5d), the regularized permutation \(\bar{B}_{j}\) is obtained by inserting a zero at the beginning and/or end of \(B_{j}\). Note that each \(\bar{B}_{j}\) is paired with all of the permutations in \(\{A_{1}, \cdots , A_{i}, \cdots , A_{M} \}\) to compute the m pairs of start and end indexes in Step 2.

  • Step 2: Iterative computation of the start and end indexes. First, the “1” burst start index \(p_{start}\) and end index \(p_{end}\) are initialized to 0 and \(\bar{b}_{j,1}\), respectively. Then, with the input \(A_{i}\) and \(\bar{B}_{j}\) permutations, the m pairs of \(p_{start}\) and \(p_{end}\) are iteratively calculated: i) index computation: \(p_{start}=p_{start}+\bar{b}_{j,k}+1\), \(p_{end}=p_{end}+a_{i,k}\); this pair of start and end indexes is used for bit flipping in Step 3. ii) index update for the next round of index computation: \(p_{start}=p_{end}\), \(p_{end}=p_{end}+\bar{b}_{j,k+1}\).

  • Step 3: Bit flipping. Let the all-zero sequence be \(\varvec{E}_{0}=e_{0}^{1} \cdots e_{0}^{p} \cdots e_{0}^{N}\), where p is the location index. For each pair of \(p_{start}\) and \(p_{end}\), the 0 bits at indexes \(p_{start}\) through \(p_{end}\) are flipped to 1.

  • Steps 2 and 3 are combined into a single-loop operation, whose pseudo code is shown in Algorithm 2. In total, \(M \times P\) NEPs are generated.

Algorithm 2
figure h

Permutation Regularization and Bit Flipping: \(B_{j} \rightarrow \bar{B}_{j}\), \(S_{0} \rightarrow S\)
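Steps 1–3 above can be sketched in Python. This is our own illustration of the index recursion in Step 2 (Algorithm 2's internals are not reproduced here); indexes are 1-based as in the text.

```python
def generate_nep(N, A, B_bar):
    """Iteratively compute m (p_start, p_end) pairs and flip bits of an
    all-zero sequence E_0, following Steps 2 and 3 (1-based indexes)."""
    E = [0] * N                       # E_0: N-bit all-zero sequence
    m = len(A)
    p_start, p_end = 0, B_bar[0]      # initialization
    for k in range(m):
        # i) index computation for the k-th '1' burst
        p_start = p_start + B_bar[k] + 1
        p_end = p_end + A[k]
        for p in range(p_start, p_end + 1):
            E[p - 1] = 1              # Step 3: bit flipping
        # ii) index update for the next round
        if k + 1 < m:
            p_start = p_end
            p_end = p_end + B_bar[k + 1]
    return ''.join(map(str, E))

# Case-1 example from Figs. 1-2: N = 15, A_i = (2,1,2), B_bar = (2,4,1,3)
print(generate_nep(15, (2, 1, 2), (2, 4, 1, 3)))
```

The case-1 example reproduces the NEP obtained by directly piecing the bursts together, confirming the index recursion.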

5 Performance evaluation

In this section, the benchmark in [7] is used for comparison. First, the proposed NEPs generator is used to generate NEPs for CRC decoding, through which the packet error rate (PER) performance is analyzed. Then, the proposed NEPs generator and the benchmark in [7] are realized on the same FPGA platform. Under the maximum clock frequency of the FPGA platform, the hardware overhead and power dissipation metrics are compared and discussed.

5.1 Decoding performance simulation

In the PER simulation, this paper assumes the coded packets are transmitted over a bursty channel modeled by a Markov chain, whose stationary bit-flip probability is \(b/(b+g)=Q(\sqrt{2RE_{b}/N_{0}})\) [6]. In the benchmark PER simulation, the burst number m is 2. For a fair comparison, we assume \(m \le 2\) and \(l_{m} \le 32\), which is expressed as \((m, l_{m})=(2,32)\) in this paper. For a class of NEPs, the maximum number of guessing operations is limited to \(2 \times 10^{5}\). The targeted coding scheme is CRC32, whose generator polynomial is shown in Eq. (9).

$$\begin{aligned} \begin{aligned} g(x)&=g_{0}x^{0}+g_{1}x^{1}+g_{2}x^{2}+ \cdots +g_{31}x^{31}+g_{32}x^{32} \\&=1+x+x^{2}+x^{4}+x^{5}+x^{7}+x^{8}+x^{10}+x^{11} \\&\quad +x^{12}+x^{16}+x^{22}+x^{23}+x^{26}+x^{32} \end{aligned} \end{aligned}$$
(9)
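The polynomial in Eq. (9) is the standard CRC-32 (IEEE 802.3) generator. As a quick sanity check, its exponent set can be recovered from the conventional hexadecimal representation 0x04C11DB7 (which encodes the coefficients of \(x^{31}\) down to \(x^{0}\), with the \(x^{32}\) term implicit):

```python
# CRC-32 generator: 0x04C11DB7 encodes coefficients of x^31..x^0;
# the leading x^32 coefficient is implicit.
poly = (1 << 32) | 0x04C11DB7
exponents = sorted(i for i in range(33) if (poly >> i) & 1)
print(exponents)
```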
Fig. 6
figure 6

PER performance simulation and comparison

To show the advantage of GRAND-MO in bursty channels, two other coding schemes are employed for performance comparison: rate-3/4 convolutional codes (CC) with the Viterbi decoding algorithm, and Reed-Solomon RS(15,11) codes with the Berlekamp-Massey (BM) decoding algorithm. In the bursty channels, the g parameter is set to 0.2 and 0.05, respectively. For the CC and CRC32 codes, the packet length is \(N=128\) and the code rate is \(3/4=0.75\); for the RS(15,11) code, the symbol length is 4 bits, the packet length is \(N=120\), and the code rate is \(11/15 \approx 0.73\).
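The stationary flip probability \(b/(b+g)\) used in the simulation setup follows from the two-state chain; a minimal numeric check (with hypothetical b and g values) is:

```python
def stationary_bad(b, g, steps=10_000):
    """Probability of the bad (error) state of a two-state Markov chain
    with good->bad probability b and bad->good probability g, obtained
    by iterating the state distribution to convergence."""
    p_good, p_bad = 1.0, 0.0          # start in the good state
    for _ in range(steps):
        p_good, p_bad = (p_good * (1 - b) + p_bad * g,
                         p_good * b + p_bad * (1 - g))
    return p_bad

b, g = 0.02, 0.2                      # hypothetical transition probabilities
print(stationary_bad(b, g), b / (b + g))
```

The iterated distribution converges to \(b/(b+g)\), the channel's average bit-flip rate.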

The comparison of PER performance is presented in Fig. 6. As expected, since CC is designed to correct randomly corrupted bits, this coding scheme shows the worst PER performance in a bursty channel. For the RS(15,11) code, the PER performance is only slightly better. Theoretically, the RS(15,11) code can correct error bursts spanning at most 2 symbols (8 bits); however, in a channel with memory, the error bursts are likely to span more than 8 bits. By using the GRAND-MO algorithm, the benchmark in [7] improves the PER performance significantly. The proposed NEPs generator produces all of the NEPs for the given “1" burst parameters; when \((m, l_{m})\) is also set to (2, 32), its PER performance is the same as that of the benchmark. As the m or \(l_{m}\) parameter increases, i.e., \((m, l_{m})=(3,32)\) or \((m, l_{m})=(2,48)\), the PER performance improves further.

Table 1 Number of clock cycles consumed for key operations in Fig. 3 (Clock frequency \(866 \ MHz\))
Table 2 Comparison of hardware overhead and power dissipation (Clock frequency \(866 \ MHz\))

5.2 Hardware overhead and power dissipation

Hardware overhead and power dissipation are two further important metrics for performance evaluation. The proposed architecture and the benchmark are realized in Verilog HDL on an FPGA platform with a Xilinx XC7Z020-2CLG400I device. The packet length is \(N=128\), the burst parameters are \(\{m,l_{m}\}=\{2,32\}\), and the highest clock frequency of the FPGA platform is 866 MHz. Under these assumptions, Table 1 shows the number of clock cycles consumed by the key operations in Fig. 3; for each operation, the results are measured with worst-case input parameters or vectors. For example, in the \(B_{j}\) permutation generation module, Algorithm 1 takes at most 334 clock cycles to complete one SAS-based operation, while in the NEPs generation module, one NEP is generated from an input \(A_{i}\) and \(\bar{B}_{j}\) permutation pair in at most 658 clock cycles.

Regarding the hardware overhead and power dissipation metrics, Table 2 compares the two architectures. The benchmark uses 32 rows of registers to store the syndromes associated with the noise bursts; as such, it is a highly parallel architecture that significantly improves throughput. The proposed architecture employs a LIFO memory to store the mother permutations and generates the NEPs in a serial manner. Practical testing shows that the throughput of the proposed architecture is 1/8 that of the benchmark, which highlights the need to extend the proposed architecture to a highly parallel version. In exchange, the hardware overhead and power consumption of the proposed architecture are 1/3 and 1/10 of the respective benchmark figures, demonstrating an area- and power-efficient design. Moreover, since the proposed architecture generates the NEPs without using coding-structure-related syndromes, it can easily be transplanted to GRAND-MO decoding of any linear block code.

6 Conclusion

For the GRAND-MO algorithm used in bursty channels, the key task is to generate NEPs efficiently. In this paper, the NEP construction is classified into four types. To facilitate the implementation of the NEPs generator architecture, we proposed to generate the “1” and “0” burst permutations from the burst parameters, and the generation of the four types of NEPs is unified by regularizing the “0" burst permutations. The generation procedure is then detailed step by step, including the engineering realization of the successive addition-subtraction operation, the analysis of the memory requirement in the permutation generation module, and the bit flipping of the “1" bursts via iteratively calculated indexes. FPGA implementation and comparison with an existing benchmark demonstrate the merits of the proposed NEPs generator architecture. Several issues remain to be addressed: i) for given \(\{m,l_{m}\}\) and \(C_{\alpha }\) parameters, how to generate the permutations and NEPs more efficiently; ii) how to develop a high-throughput version of the proposed NEPs generator to reduce decoding latency; iii) in complicated channel environments, how to balance decoding complexity and PER performance. These topics will be the focus of future research.