The channel of information transmission is called the channel for short. Commonly used channels include cables, optical fibers, radio-wave propagation media and carrier lines, as well as storage media such as tapes and optical disks. The channel constitutes the physical condition for social information to be exchanged across space and time. In addition, for a piece of social information, such as language information, picture information or data information, to be exchanged across time and space, information coding is the basic technical means. What is information coding? In short, it is the process of digitizing all kinds of social information. Digitization is not a simple numerical substitution of social information; it is full of profound mathematical principles and elegant mathematical techniques. For example, source coding, which is used for data compression and storage, uses probability and statistics to attach the required statistical characteristics to social information, so source codes are also called random codes. The other kind is channel coding, which is used to overcome channel interference; in order to improve the accuracy of information transmission, this kind of code draws on beautiful techniques from algebra, geometry and combinatorics, so channel codes are also called algebraic combinatorial codes. The main purpose of this chapter is to introduce the basic theory of channel coding. Source coding will be introduced in Chap. 3.

With the hardware support of the channel and the software technology of information coding, we can realize the long-distance exchange of various social information across time and space. Taking channel coding as an example, this process can be described by the following diagram (Fig. 2.1).

Fig. 2.1 Channel coding

In 1948, the American mathematician Shannon published his pioneering paper “A Mathematical Theory of Communication” in the Bell System Technical Journal, marking the advent of the era of electronic information. In this paper, Shannon used probability theory to prove the existence of “good codes” whose rate comes arbitrarily close to the channel capacity while the transmission error probability is arbitrarily small (see Theorem 2.10 in this chapter); on the other hand, if the transmission error probability is to be arbitrarily small, the code rate (transmission efficiency) cannot exceed an upper bound, the channel capacity (see the corresponding theorem in Chap. 3). This upper bound is called the Shannon limit and is regarded as the golden rule in the field of electronic communication engineering.

Shannon’s theorem is an existence proof rather than a constructive one. How to construct such good codes, which ensure communication efficiency (a code rate as large as possible) while controlling the transmission error rate, has been the unremitting goal ever since Shannon’s theory appeared. From Hamming and Golay to Elias, Goppa, Berrou and the Turkish mathematician Arikan, and from Hamming codes and Golay codes to convolutional codes, turbo codes and polar codes, electronic communication has over the past decades reached one peak after another, creating one technological miracle after another, up to today’s 5G era. In 1969, the U.S. Mars probe used a Hadamard code to transmit image information, and for the first time mankind was able to witness one beautiful picture after another from outer space. In 1971, the U.S. Jupiter and Saturn probe used the famous Golay code G23 to send hundreds of frames of color photos of Jupiter and Saturn back to Earth. Seventy years of exploration of channel coding form a magnificent chapter in the history of electronic communication.

The main purpose of this chapter is to define rigorously and prove the mathematical properties of general codes, so as to provide a solid mathematical foundation for the further study of coding technology and cryptography. This chapter covers the Hamming distance, the Lee distance, linear codes, some typical good codes, the MacWilliams theorem and the famous Shannon coding theorem. Having mastered the content of this chapter, the reader will have a basic and comprehensive understanding of channel coding theory (error-correcting codes).

2.1 Hamming Distance

In channel coding, the alphabet is usually taken to be a finite field \(\mathbb {F}_q\) with q elements, where q is a power of a prime, and sometimes a ring \(\mathbb {Z}_m\). Let \(n\geqslant 1\) be a positive integer; \(\mathbb {F}_q^n\) is an n-dimensional linear space over \(\mathbb {F}_q\), also called the codeword space:

$$ \mathbb {F}_q^n=\{x=(x_1, x_2, \ldots , x_n)| \forall x_i \in \mathbb {F}_q\}. $$

A vector \(x=(x_1, x_2, \ldots , x_n)\) in \(\mathbb {F}_q^n\) is called a codeword of length n. For convenience, we write a codeword x as \(x=x_1x_2 \ldots x_n\); each \(x_i \in \mathbb {F}_q\) is called a character, and the zero codeword is denoted by \(0=(0,0, \ldots , 0)\).

For two codewords \(x=x_1x_2\ldots x_n\) and \(y=y_1y_2\ldots y_n\), the Hamming distance d(x, y) is defined as the number of characters in which x and y differ, that is

$$\begin{aligned} d(x,y)=\#\{i|1\le i\le n, x_i \ne y_i\}. \end{aligned}$$
(2.1)

Obviously \(0\leqslant d(x,y)\leqslant n\) is a nonnegative integer. The weight function of a codeword \(x\in \mathbb {F}_q^n\) is defined as \(w(x)=d(x,0)\), that is, the Hamming distance between x and 0. The following properties are obvious.

Property 2.1

If x, \(y\in \mathbb {F}_{q}^n\), then

  1. (i)

    \(d(x,y)\ge 0\), \(d(x,y)=0\) if and only if \(x=y\).

  2. (ii)

    \(d(x,y)=d(y,x)\).

  3. (iii)

    \(w(-x)=w(x)\).

  4. (iv)

    \(d(x,y)=d(x-z,y-z)\), \(\forall ~ z\in \mathbb {F}_q^n\).

  5. (v)

    \(d(x,y)=w(x-y)\).

Property (i) is called nonnegativity, property (ii) symmetry and property (iv) translation invariance. These are the basic properties of a distance function in mathematics, and we may draw an analogy with the distance between two points in the plane or in Euclidean space.
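These definitions and properties translate directly into a few lines of code; the following minimal Python sketch (the alphabet size q = 3 and the sample codewords are illustrative choices) computes the Hamming distance (2.1) and the weight function and checks properties (ii), (iv) and (v):

```python
# Hamming distance and weight over F_q, with codewords represented as
# tuples of integers modulo q (illustrative parameters).
q = 3

def d(x, y):
    """Number of positions in which x and y differ, Eq. (2.1)."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def w(x):
    """Hamming weight: distance from x to the zero codeword."""
    return d(x, (0,) * len(x))

sub = lambda x, y: tuple((a - b) % q for a, b in zip(x, y))  # x - y in F_q^n

x, y, z = (1, 0, 2, 2), (1, 2, 2, 0), (0, 0, 2, 0)
assert d(x, y) == d(y, x)                    # (ii) symmetry
assert d(x, y) == d(sub(x, z), sub(y, z))    # (iv) translation invariance
assert d(x, y) == w(sub(x, y))               # (v) d(x, y) = w(x - y)
```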

Lemma 2.1

Let x, \(y\in \mathbb {F}_q^n\) be two codewords, then

$$ w(x \pm y)\leqslant w(x)+w(y). $$

Proof

Because \(w(-x)=w(x)\), we have \(w(x-y)=w(x+(-y))\), so it suffices to prove \(w(x+y)\leqslant w(x)+w(y)\). Let \(x=x_1\ldots x_n\), \(y=y_1 \ldots y_n\), then

$$ x+y=(x_1+y_1)(x_2+y_2)\ldots (x_n+y_n). $$

Obviously, if \(x_i+y_i\ne 0\), then \(x_i\ne 0\) or \(y_i\ne 0\) \((1\leqslant i\leqslant n)\). Thus \(w(x+y)\leqslant w(x)+w(y)\), and therefore

$$ w(x-y)=w(x+(-y))\leqslant w(x)+w(-y)=w(x)+w(y). $$

We have completed the proof.

Lemma 2.2

(Triangle inequality) If x, y, \(z\in \mathbb {F}_q^n\) are three codewords, then

$$ d(x,y)\leqslant d(x,z)+d(z,y). $$

Proof

By Lemma 2.1, for \(z\in \mathbb {F}_q^n\) we have

$$ w(x-y)\le w(x-z)+w(z-y). $$

Then by property (v), \(d(x,y)=w(x-y)\), we have

$$ d(x,y)\leqslant d(x,z)+d(z,y). $$

The Lemma holds.

The nonnegativity, symmetry and translation invariance of the Hamming distance, together with the triangle inequality of Lemma 2.2, show that the Hamming distance between two codewords behaves like the distance between two points in physical space; it is a genuine distance function in the mathematical sense. Similarly, we can define the concept of a ball. A Hamming ball with radius \(\rho \) centered at the codeword x is defined as

$$\begin{aligned} B_\rho (x)=\{y|y\in \mathbb {F}_q^n,d(x,y)\leqslant \rho \}, \end{aligned}$$
(2.2)

where \(\rho \) is a nonnegative integer. Obviously, \(B_0(x)=\{x\}\) contains only one codeword.

Lemma 2.3

For any \(x\in \mathbb {F}_q^n\), \(0\leqslant \rho \leqslant n\), we have

$$\begin{aligned} |B_\rho (x)|=\sum \limits _{i=0}^{\rho }\left( {\begin{array}{c}n\\ i\end{array}}\right) (q-1)^i, \end{aligned}$$
(2.3)

where \(|B_\rho (x)|\) is the number of codewords in Hamming ball \(B_\rho (x)\).

Proof

Let \(x=x_1x_2\ldots x_n\). For each given i with \(0\leqslant i\leqslant \rho \), let

$$ A_i=\#\{y\in \mathbb {F}_q^n|d(y,x)=i\}. $$

Obviously,

$$ A_i=\left( {\begin{array}{c}n\\ i\end{array}}\right) (q-1)^i, $$

so

$$ |B_\rho (x)|=\sum \limits _{i=0}^\rho A_i=\sum \limits _{i=0}^\rho \left( {\begin{array}{c}n\\ i\end{array}}\right) (q-1)^i. $$

Corollary 2.1

For \(\forall x\in \mathbb {F}_q^n\), we have

$$ |B_\rho (x)|=|B_\rho (0)|. $$

That is to say, the number of codewords in \(B_\rho (x)\) is a constant which depends only on the radius \(\rho \). This constant is usually denoted by \(B_\rho \).
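Lemma 2.3 and Corollary 2.1 are easy to check for small parameters; the following sketch (with the illustrative choice \(q=2\), \(n=5\), \(\rho =2\)) compares the binomial formula (2.3) with direct enumeration:

```python
# |B_rho(x)| by formula (2.3) versus direct enumeration (small parameters).
from itertools import product
from math import comb

q, n, rho = 2, 5, 2
x = (0,) * n  # by Corollary 2.1 the choice of center does not matter

formula = sum(comb(n, i) * (q - 1) ** i for i in range(rho + 1))
count = sum(1 for y in product(range(q), repeat=n)
            if sum(a != b for a, b in zip(x, y)) <= rho)
assert formula == count == 16  # 1 + 5 + 10
```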

Definition 2.1

If \(C\subsetneqq \mathbb {F}_q^n\), then C is called a q-ary code, or code for short, and |C| is the number of codewords in C. If \(|C|=1\), we call C a trivial code; all the codes we discuss are nontrivial.

For a code C, the following five mathematical quantities are of basic importance.

Definition 2.2

If C is a code, define

  • Code rate of C: \(R=R_C=\dfrac{1}{n}\log _q{|C|}\)

  • Minimum distance of C: \(d=\min \{d(x,y)|x,y\in C, x\ne y\}\)

  • Minimum weight of C: \(w=\min \{w(x)|x\in C, x\ne 0\}\)

  • Covering radius of C: \(\rho =\max \{\min \{d(x,c)|c\in C\}|x\in \mathbb {F}_q^n\}\)

  • Disjoint radius of C: \(\rho _1=\max \{r|0\leqslant r\leqslant n,~B_r(c_1)\cap B_r(c_2)=\phi ,~\forall c_1,~c_2\in C,~c_1\ne c_2\}\)
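For very small codes, all five quantities can be computed by brute force; the following sketch does so for the binary repetition code \(C=\{000,111\}\) (an illustrative choice, which already exhibits the relation \(d=2\rho _1+1\) proved below):

```python
# The five quantities of Definition 2.2 for C = {000, 111}, by brute force.
from itertools import product
from math import log

q, n = 2, 3
C = [(0, 0, 0), (1, 1, 1)]
space = list(product(range(q), repeat=n))
dist = lambda x, y: sum(a != b for a, b in zip(x, y))

R = log(len(C), q) / n                                 # code rate
d = min(dist(x, y) for x in C for y in C if x != y)    # minimum distance
w = min(dist(x, (0,) * n) for x in C if any(x))        # minimum weight
rho = max(min(dist(x, c) for c in C) for x in space)   # covering radius

def disjoint(r):  # are the balls B_r(c) pairwise disjoint?
    return all(not (dist(x, c1) <= r and dist(x, c2) <= r)
               for c1 in C for c2 in C if c1 != c2 for x in space)

rho1 = max(r for r in range(n + 1) if disjoint(r))     # disjoint radius
print(R, d, w, rho, rho1)  # 1/3, 3, 3, 1, 1: here d = 2*rho1 + 1
```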

The relationships among these five quantities are of basic importance in the study of codes. We begin by proving Lemma 2.4.

Lemma 2.4

Let d be the minimum distance of C and \(\rho _1\) the disjoint radius of C, then

$$ d=2\rho _{1}+1 \quad \text {or} \quad d=2\rho _{1}+2. $$

Proof

It suffices to prove \(2\rho _1+1\leqslant d\leqslant 2\rho _1+2\). If \(d\leqslant 2\rho _1\), then there are codewords \(c_1\in C\), \(c_2\in C\), \(c_1\ne c_2\) such that

$$ d(c_1,c_2)\leqslant 2\rho _1. $$

This means that \(c_1\) and \(c_2\) differ in at most \(2\rho _1\) characters. Without loss of generality, we may assume that the first \(2\rho _1\) characters of \(c_1\) and \(c_2\) are different, that is

$$ \left\{ \begin{array}{@{}l} c_1=a_1a_2\ldots a_{\rho _1}a_{\rho _1+1}\ldots a_{2\rho _1}**\cdots *\\ c_2=b_1b_2\ldots b_{\rho _1}b_{\rho _1+1}\ldots b_{2\rho _1}**\cdots *\\ \end{array}\right. $$

where the *’s denote positions in which \(c_1\) and \(c_2\) agree. Put

$$ x=a_1a_2\ldots a_{\rho _1}b_{\rho _1+1}\ldots b_{2\rho _1}*\cdots *, $$

this shows that

$$ d(x,c_1)\leqslant \rho _1,\quad d(x,c_2)\leqslant \rho _1. $$

That is

$$ x\in B_{\rho _1}(c_1)\cap B_{\rho _1}(c_2). $$

This contradicts \(B_{\rho _1}(c_1)\cap B_{\rho _1}(c_2)=\phi \). So we have \(d\geqslant 2\rho _1+1\). If \(d>2\rho _1+2=2(\rho _1+1)\), then we can prove the following formula, which contradicts the definition of the disjoint radius \(\rho _1\):

$$ B_{\rho _1+1}(c_1)\cap B_{\rho _1+1}(c_2)=\phi ,~\forall c_1,~c_2\in C,~c_1\ne c_2. $$

Indeed, if the above formula does not hold, then there exist \(c_1\), \(c_2\in C\), \(c_1\ne c_2\), such that \(B_{\rho _1+1}(c_1)\) intersects \(B_{\rho _1+1}(c_2)\); we may take

$$ x \in B_{\rho _1+1}(c_1)\cap B_{\rho _1+1}(c_2), $$

Then the triangle inequality of Lemma 2.2 gives

$$ d(c_1,c_2)\leqslant d(c_1,x)+d(c_2,x)\leqslant 2(\rho _1+1), $$

This contradicts the hypothesis \(d>2(\rho _1+1)\). So we have \(2\rho _1+1\leqslant d\leqslant 2\rho _1+2\). The Lemma holds.

In order to discuss the geometric meaning of the covering radius \(\rho \), we consider the set of balls \(\{B_\rho (c)|c\in C\}\) over the code C. If

$$ \bigcup \limits _{c\in C}B_\rho (c)=\mathbb {F}_q^n, $$

then \(\{B_\rho (c)|c\in C\}\) is called a cover of the codeword space \(\mathbb {F}_q^n\).

Lemma 2.5

Let \(\rho \) be the covering radius of C; then \(\rho \) is the smallest nonnegative integer r such that \(\{B_r(c)|c\in C\}\) covers \(\mathbb {F}_q^n\).

Proof

By the definition of \(\rho \), for all \(x\in \mathbb {F}_q^n\), there is

$$ \min \{d(x,c)|c\in C\}\leqslant \rho . $$

Therefore, for a given \(x\in \mathbb {F}_q^n\), there is a codeword \(c\in C\) with \(d(x,c)\leqslant \rho \), that is, \(x\in B_\rho (c)\); this shows that

$$ \bigcup \limits _{c\in C} B_{\rho }(c)=\mathbb {F}_q^n. $$

That is, \(\{B_\rho (c)|c\in C\}\) forms a cover of \(\mathbb {F}_q^n\). On the other hand, \(\{B_{\rho -1}(c)|c\in C\}\) cannot cover \(\mathbb {F}_q^n\): if

$$ \bigcup \limits _{c\in C}B_{\rho -1}(c)=\mathbb {F}_q^n. $$

then for any \(x\in \mathbb {F}_q^n\) there exists \(c\in C\) with \(x\in B_{\rho -1}(c)\), so

$$ \min \{d(x,c)|c\in C\}\leqslant \rho -1. $$

Thus

$$ \rho =\max \{\min \{d(x,c)|c\in C\}|x\in \mathbb {F}_q^n\}\leqslant \rho -1. $$

This contradiction shows that \(\rho \) is the smallest such integer. The lemma holds.

Lemma 2.6

Let d be the minimum distance of C and \(\rho \) be the covering radius of C, then

$$ d\leqslant 2\rho +1. $$

Proof

If \(d>2\rho +1\), let \(c_0\in C\) be given; then we have

$$ B_{\rho +1}(c_0)\cap B_{\rho }(c)=\phi ,~\forall c\in C,~c\ne c_0. $$

So we can choose \(x\in B_{\rho +1}(c_0)\) with \(d(x,c_0)=\rho +1\); then

$$ x\not \in B_\rho (c_0),~x\not \in B_\rho (c),~\forall c\in C. $$

That is, \(\{B_\rho (c)|c\in C\}\) cannot cover \(\mathbb {F}_q^n\), which contradicts Lemma 2.5. So we always have \(d\leqslant 2\rho +1\). The Lemma holds.

Combining the above three lemmas, we obtain the following simple but very important corollary.

Corollary 2.2

Let \(C\subset \mathbb {F}_q^n\) be an arbitrary q-ary code, and let d, \(\rho \), \(\rho _1\) be the minimum distance, covering radius and disjoint radius of C, respectively. Then

  1. (i)

    \(\rho _1\leqslant \rho \).

  2. (ii)

If the minimum distance of C is \(d=2e+1\), then \(e=\rho _1\).

Proof

(i) follows directly from \(2\rho _1+1\leqslant d\leqslant 2\rho +1\). (ii) If \(d=2e+1\) is odd, then by Lemma 2.4, \(d=2\rho _1+1=2e+1\), so \(e=\rho _1\).

Definition 2.3

A code C is called a perfect code if \(\rho =\rho _1\).

Corollary 2.3

(i) The minimum distance of any perfect code C is \(d=2\rho +1\).

(ii) Suppose the minimum distance of a code C is \(d=2e+1\). Then C is a perfect code if and only if for every \(x\in \mathbb {F}_q^n\) there exists exactly one ball \(B_e(c)\), \(c\in C\), with \(x\in B_e(c)\).

Proof

(i) follows directly from \(2\rho _1+1\leqslant d \leqslant 2\rho +1\). To prove (ii), if C is a perfect code with minimum distance \(d=2e+1\), then \(\rho _1=\rho =e\), and the stated condition follows from the disjointness of the balls \(B_e(c)\) together with Lemma 2.5. Conversely, if the stated condition holds, then the covering radius of C satisfies \(\rho \leqslant e=\rho _1\leqslant \rho \), so \(\rho _1=\rho \) and C is a perfect code.

In order to introduce the concept of error-correcting codes, we discuss the so-called decoding principle in electronic information transmission, commonly known as decoding to the “most similar” codeword. What does “most similar” mean? Suppose that after transmission through a channel with interference we receive a codeword \(x'\in \mathbb {F}_q^n\). If a codeword \(x\in C\) satisfies

$$ d(x,x')=\min \{d(c,x')|c\in C\}, $$

then x is the codeword in C most similar to \(x'\), and we decode \(x'\) to x. If the most similar codeword x is unique in C, then theoretically \(x'\) is the word received after transmitting x, so the decoding \(x'\rightarrow x\) is accurate.
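In code, the decoding principle is one line: decode the received word to a codeword at minimum Hamming distance. A minimal sketch, with the length-5 binary repetition code as an illustrative example:

```python
# "Most similar" decoding: pick a codeword at minimum Hamming distance.
dist = lambda x, y: sum(a != b for a, b in zip(x, y))

def decode(received, C):
    return min(C, key=lambda c: dist(c, received))

C = [(0,) * 5, (1,) * 5]            # repetition code, corrects e = 2 errors
print(decode((0, 1, 0, 1, 0), C))   # -> (0, 0, 0, 0, 0)
```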

Definition 2.4

A code C is called an e-error-correcting code \((e\geqslant 1)\) if for any \(x\in \mathbb {F}_q^n\), whenever there is a \(c\in C\) with \(x\in B_e(c)\), such c is unique.

An error-correcting code allows transmission errors without affecting correct decoding. For example, suppose that C is an e-error-correcting code and that a codeword \(c\in C\), transmitted through a channel with interference, is received as x. If at most e characters are in error, that is, \(d(c,x)\leqslant e\), then the codeword in C most similar to x must be c itself, so we can decode \(x\xrightarrow {decode}c\) correctly.

Corollary 2.4

A perfect code with minimum distance \(d=2e+1\) is an e-error-correcting code.

Proof

Because C is perfect, the disjoint radius \(\rho _1\) and the covering radius \(\rho \) satisfy \(\rho _1=\rho =e\). Therefore, for any received codeword \(x\in \mathbb {F}_q^n\), there exists exactly one \(c\in C\) with \(x\in B_e(c)\). That is, C is an e-error-correcting code.

Finally, we prove the main conclusion of this section.

Theorem 2.1

Suppose the minimum distance of a code C is \(d=2e+1\). Then C is a perfect code if and only if the following sphere-packing condition holds:

$$\begin{aligned} |C|\sum \limits _{i=0}^e\left( {\begin{array}{c}n\\ i\end{array}}\right) (q-1)^i=q^n. \end{aligned}$$
(2.4)

Proof

If the minimum distance of C is \(d=2e+1\) and C is a perfect code, then \(\rho =\rho _1=e\). So

$$ \bigcup \limits _{c\in C}B_e(c)=\mathbb {F}_q^n. $$

Since the balls \(B_e(c)\), \(c\in C\), are pairwise disjoint \((\rho _1=e)\), we have

$$ \left| \bigcup \limits _{c\in C}B_e(c)\right| =q^n, $$

thus

$$ |C|B_e=|C|\sum \limits _{i=0}^e\left( {\begin{array}{c}n\\ i\end{array}}\right) (q-1)^i=q^n. $$

Conversely, suppose the sphere-packing condition (2.4) holds. Because the minimum distance of C is \(d=2e+1\), Corollary 2.2 gives \(\rho _1=e\), so the balls \(B_e(c)\), \(c\in C\), are pairwise disjoint; by (2.4) and Lemma 2.3 their union contains exactly \(q^n\) codewords, so we have

$$ \bigcup \limits _{c\in C}B_e(c)=\mathbb {F}_q^n. $$

It follows that \(\rho \leqslant e=\rho _1\leqslant \rho \), thus \(\rho =\rho _1\) and C is a perfect code. The theorem holds.
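The sphere-packing condition (2.4) is a pure counting identity and can be checked numerically. The following sketch verifies it for three classical parameter sets that appear in this chapter:

```python
# Check of the sphere-packing condition (2.4) for a code with |C| = q^k.
from math import comb

def packs(q, n, k, e):
    return q ** k * sum(comb(n, i) * (q - 1) ** i for i in range(e + 1)) == q ** n

assert packs(2, 7, 4, 1)    # binary Hamming code [7,4]: 16 * (1 + 7) = 2^7
assert packs(2, 23, 12, 3)  # Golay code G23: 2^12 * 2048 = 2^23
assert packs(3, 11, 6, 2)   # 3-ary Golay code [11,6]: 3^6 * 3^5 = 3^11
```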

When \(q=2\), the alphabet \(\mathbb {F}_2\) is the finite field of two elements \(\{0,1\}\); in this case the code is called a binary code, and the transmission channel is called a binary channel. In binary channel transmission, the most important function is the binary entropy function \(H(\lambda )\), defined as

$$\begin{aligned} H(\lambda )= \left\{ \begin{array}{@{}ll} 0,&{}\text {when}~\lambda =0 ~\text {or}~\lambda =1,\\ -\lambda \log \lambda -(1-\lambda )\log (1-\lambda ),&{}\text {when}~0<\lambda <1. \end{array}\right. \end{aligned}$$
(2.5)

Obviously, \(H(\lambda )=H(1-\lambda )\) and \(0\leqslant H(\lambda )\leqslant 1\), with \(H\left( \dfrac{1}{2}\right) =1\); that is, the maximum is attained at \(\lambda =\dfrac{1}{2}\). For further properties of \(H(\lambda )\), please refer to Chap. 1.
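As a small helper, \(H(\lambda )\) can be transcribed directly from (2.5):

```python
# Binary entropy function H(lambda), Eq. (2.5), with logarithms to base 2.
from math import log2

def H(lam):
    if lam in (0, 1):
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

assert H(0.5) == 1.0                    # maximum at lambda = 1/2
assert abs(H(0.25) - H(0.75)) < 1e-12   # symmetry H(lam) = H(1 - lam)
```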

Theorem 2.2

Let C be a perfect code with minimum distance \(d=2e+1\), and let \(R_C\) be the code rate of C. Then

(i) \(1-R_C=\dfrac{1}{n}\log _2\sum \limits _{i=0}^e\left( {\begin{array}{c}n\\ i\end{array}}\right) \leqslant H\left( \dfrac{e}{n}\right) \).

(ii) As the codeword length \(n\rightarrow \infty \), if \(\lim \limits _{n\rightarrow \infty }R_C=a\), then

$$ \lim \limits _{n\rightarrow \infty } H\left( \dfrac{e}{n}\right) =1-a. $$

Proof

(i) Since C is a perfect code, the sphere-packing condition gives

$$ |C|\sum \limits _{i=0}^e\left( {\begin{array}{c}n\\ i\end{array}}\right) =2^n. $$

We have

$$ \dfrac{1}{n} \log _2 {|C|}+\dfrac{1}{n}\log _2\sum \limits _{i=0}^e\left( {\begin{array}{c}n\\ i\end{array}}\right) =1. $$

That is

$$ 1-R_C=\dfrac{1}{n}\log _2\sum \limits _{i=0}^e\left( {\begin{array}{c}n\\ i\end{array}}\right) \leqslant H\left( \dfrac{e}{n}\right) , $$

The last inequality is derived from Lemma 1.11 in Chap. 1, so (i) holds. If \(R_C\) has a limit as \(n\rightarrow \infty \), then again by Lemma 1.11 in Chap. 1, we have

$$ \lim \limits _{n\rightarrow \infty }H\left( \dfrac{e}{n}\right) =1-\lim \limits _{n\rightarrow \infty }R_C=1-a. $$

The Theorem 2.2 holds.

Finally, we give an example of perfect code.

Example 2.1

Let \(n=2e+1\) be an odd number. Then the repetition code \(A=\{0,1\}\) in \(\mathbb {F}_2^n\), where \(0=00\ldots 0\) and \(1=11\cdots 1\), is a perfect code of length n.

First, the repetition code \(A=\{0,1\}\subset \mathbb {F}_{2}^{n}\) contains only two codewords \(0=0 \ldots 0\in \mathbb {F}_{2}^{n}\) and \(1=1\ldots 1\in \mathbb {F}_{2}^{n}\). Since \(d(0,1)=n=2e+1\) is odd, Corollary 2.2 shows that the disjoint radius of A is \(\rho _{1}=e\). Let us prove that the covering radius of A is \(\rho =\rho _{1}=e\). For any \(x \in \mathbb {F}_{2}^{n}\), if \(d(0,x)>e\), that is, \(d(0,x)\ge e+1\), then at least \(e+1\) characters of \(x=x_{1}x_{2}\ldots x_{n}\in \mathbb {F}_{2}^{n}\) are 1, so at most e characters are 0, and thus \(d(1, x)\le e\). This shows that if \(x\notin B_{e}(0)\), then \(x\in B_{e}(1)\), that is,

$$\begin{aligned} B_{e}(0)\cup B_{e}(1)= \mathbb {F}_{2}^{n}, \end{aligned}$$

so \(\rho \le e= \rho _{1}\le \rho \), and we have \(\rho =\rho _{1}\); that is, A is a perfect code. Note that in this example e can be any positive integer, so the code rate of the repetition code has the limit value

$$\begin{aligned} \lim _{n\rightarrow \infty }R_{A}=0\Rightarrow \lim _{n\rightarrow \infty }H\left( \frac{e}{n}\right) =1. \end{aligned}$$

To end this section, we discuss and define the equivalence of two codes. Let \(C \subset \mathbb {F}_{q}^{n}\) be a code of length n and \(S_{n}\) the permutation group on n elements. Any \(\sigma \in S_{n}\) is a permutation of n letters; for \(x=x_{1}x_{2}\ldots x_{n}\in \mathbb {F}_{q}^{n}\), we define \(\sigma (x)\) as

$$\begin{aligned} \sigma (x)=x_{\sigma (1)}x_{\sigma (2)}\ldots x_{\sigma (n)} \in \mathbb {F}_{q}^{n}, \end{aligned}$$
(2.6)
$$\begin{aligned} \sigma (C)=\{\sigma (c)\mid c \in C\}. \end{aligned}$$
(2.7)

Definition 2.5

Let C and \(C_{1}\) be two codes in \(\mathbb {F}_{q}^{n}\). If there is \(\sigma \in S_{n}\) such that \(\sigma (C)=C_{1}\), we call C and \(C_{1}\) equivalent, denoted \(C\sim C_{1}\). Obviously, equivalence is an equivalence relation between codes: taking \(\sigma =1\) we have \(C\sim C\); if \(C\sim C_{1}\), that is, \(C_{1}= \sigma (C)\), then \(C= \sigma ^{-1}(C_{1})\), so \(C\sim C_{1} \Rightarrow C_{1}\sim C\); similarly, if \(C\sim C_{1}\) and \(C_{1}\sim C_{2}\), then \(C\sim C_{2}\), because \(C_{1}= \sigma (C), C_{2}=\tau (C_{1}) \Rightarrow C_{2}= \tau \sigma (C)\). Another obvious property is that the action of \(\sigma \) does not change the Hamming distance between two codewords, that is,

$$\begin{aligned} d(\sigma (x),\sigma (y))=d(x,y), \forall \sigma \in S_{n}. \end{aligned}$$
(2.8)

Lemma 2.7

Suppose \(C\sim C_{1}\) are two equivalent codes. Then they have the same code rate, the same minimum distance, the same covering radius and the same disjoint radius. In particular, if C is a perfect code, then every code \(C_{1}\) equivalent to C is a perfect code.

Proof

All the conclusions of the lemma are easily proved using Eq. (2.8).

2.2 Linear Code

Let \(C\subset \mathbb {F}_{q}^{n}\) be a code. If C is a k-dimensional linear subspace of \(\mathbb {F}_{q}^{n}\), then C is called a linear code, denoted \(C=[n,k]\). For a linear code C, we have

$$R_{C}=\frac{1}{n}\log _{q} |C|=\frac{k}{n}, ~\text {minimum distance}~d= \text {minimum weight}~w.$$

Let \(\{\alpha _{1},\alpha _{2},\ldots ,\alpha _{k}\}\subset C\) be a set of bases of linear code C, where

$$\alpha _{i}=\alpha _{i1}\alpha _{i2}\cdots \alpha _{in}\in \mathbb {F}_{q}^{n}, ~1\le i\le k.$$

Definition 2.6

If \(\{\alpha _{1},\alpha _{2},\ldots ,\alpha _{k}\}\) is a set of bases of the linear code \(C=[n,k]\), we form the \(k\times n\) matrix

$$\begin{aligned} G= \left[ \begin{array}{cccc} \alpha _{1}\\ \alpha _{2}\\ \vdots \\ \alpha _{k} \end{array}\right] = \left[ \begin{array}{cccc} \alpha _{11} &{} \alpha _{12} &{} \cdots &{} \alpha _{1n}\\ \alpha _{21} &{} \alpha _{22} &{} \cdots &{} \alpha _{2n}\\ \cdots &{} \cdots &{} \cdots &{} \cdots \\ \alpha _{k1} &{} \alpha _{k2} &{} \cdots &{} \alpha _{kn} \end{array}\right] . \end{aligned}$$

G is called a generator matrix of C. If G has the form

$$G=[I_{k},A_{k\times (n-k)}], \quad I_{k}~\text {the}~k\times k~\text {identity matrix},$$

it is called the standard form of G.

Lemma 2.8

Let \(C=[n,k]\) be a linear code with generator matrix G; then

$$C=\{aG|a\in \mathbb {F}_{q}^{k}\}.$$

Proof

Because \(\{\alpha _{1}, \alpha _{2},\ldots , \alpha _{k}\}\) is a set of bases of the linear code C, any \(x\in C\) can be written as

$$x=a_{1}\alpha _{1}+a_{2}\alpha _{2}+\cdots +a_{k}\alpha _{k}=(a_{1},a_{2},\ldots ,a_{k}) \left[ \begin{array}{cccc} \alpha _{1}\\ \vdots \\ \alpha _{k} \end{array}\right] =a\cdot G.$$

where \(a=(a_{1},a_{2},\ldots ,a_{k})\in \mathbb {F}_{q}^{k}\). The Lemma holds.
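As an illustration of Lemma 2.8, the following sketch encodes all messages \(a\in \mathbb {F}_{2}^{2}\) with a hypothetical standard-form generator matrix of a [4, 2] binary code:

```python
# Encoding a message a in F_q^k as the codeword x = aG (Lemma 2.8), q = 2.
from itertools import product

G = [(1, 0, 1, 1),
     (0, 1, 0, 1)]   # G = [I_2, A], an illustrative [4,2] example

def encode(a):
    # x = a*G over F_2, computed column by column
    return tuple(sum(ai * gi for ai, gi in zip(a, col)) % 2
                 for col in zip(*G))

C = sorted({encode(a) for a in product(range(2), repeat=2)})
print(C)  # the 4 codewords of the linear code [4,2] generated by G
```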

Define the inner product on \(\mathbb {F}_{q}^{n}\): for \(x=x_{1}\ldots x_{n}\), \(y=y_{1}\ldots y_{n}\in \mathbb {F}_{q}^{n}\), let \(<x,y>=\sum _{i=1}^{n}x_{i}y_{i}\). If \(<x,y>=0\), we say x and y are orthogonal, denoted \(x\perp y\).

Definition 2.7

Let \(C=[n,k]\) be a linear code whose orthogonal complement space \(C^{\perp }\) is

$$C^{\perp }=\{y\in \mathbb {F}_{q}^{n}|<x,y>=0,\forall ~x\in C\}.$$

Obviously, \(C^{\perp }\) is an \([n,n-k]\) linear code, called the dual code of C. The generator matrix H of \(C^{\perp }\) is called the check matrix of C.

Lemma 2.9

Let \(C=[n,k]\) be a linear code with check matrix H; then we have

$$xH'=0\Leftrightarrow x\in C.$$

where \(H'\) is the transpose of H.

Proof

It suffices to prove the conclusion when the generator matrix G of C is in standard form. Let

$$G=[I_{k},A_{k \times (n-k)}]=[I_{k},A],\quad \ A=A_{k\times (n-k)}.$$

Then the check matrix of C, that is, the generator matrix of the dual code \(C^{\perp }\), is

$$H=[-A',I_{n-k}], ~H'= \left[ \begin{array}{cccc} -A \\ I_{n-k} \end{array}\right] .$$

By Lemma 2.8, if \(x\in C\), then there exists \(a\in \mathbb {F}_{q}^{k}\) such that \(x=aG\); thus

$$xH'=aGH'=a[I_{k},A] \left[ \begin{array}{cccc} -A \\ I_{n-k} \end{array}\right] =0.$$

Conversely, suppose \(xH'=0\). Since H is the generator matrix of \(C^{\perp }\), by Lemma 2.8 again, for any \(y\in C^{\perp }\) there exists \(b\in \mathbb {F}_{q}^{n-k}\) such that \(y=bH\); thus

$$<x,y>=xy'=xH'b'=0 \Rightarrow x\in (C^{\perp })^{\perp }=C.$$

The Lemma holds.

By Lemma 2.9, for all \(x,y\in \mathbb {F}_{q}^{n}\),

$$xH'=yH'\Leftrightarrow x-y\in C.$$

because C is an additive subgroup of \(\mathbb {F}_{q}^{n}\). The quantity \(xH'\) is called the check value (syndrome) of the codeword x; thus two codewords have equal check values \(\Leftrightarrow \) they lie in the same additive coset of C. This leads to the following decoding principle for linear codes.

Decoding principle: Suppose the linear code \(C=[n,k]\) is used for encoding and the word received through an interfering channel is \(x\in \mathbb {F}_{q}^{n}\). Find a codeword \(x_{0}\) of least weight in the additive coset \(x+C\) of x, that is, \(x_{0}\) satisfying

$$x_{0}\in x+C, \text {and}~ w(x_{0})=\min \{w(\alpha )|\alpha \in x+C\}.$$

\(x_{0}\) is called a leader codeword of the coset \(x+C\). We then decode x into \(x-x_{0}\).
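For tiny parameters this principle can be carried out by brute force. The following sketch (using the illustrative [4, 2] code generated above) decodes one received word:

```python
# Coset-leader decoding over F_2: find a least-weight x0 in x + C,
# then decode x into x - x0 (over F_2 this equals x + x0).
C = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 0, 1), (1, 1, 1, 0)]
w = lambda v: sum(v)
add = lambda x, c: tuple((a + b) % 2 for a, b in zip(x, c))

def decode(x):
    coset = [add(x, c) for c in C]   # the additive coset x + C
    x0 = min(coset, key=w)           # a leader codeword of least weight
    return add(x, x0)

print(decode((1, 0, 0, 1)))  # -> (1, 0, 1, 1): one transmission error corrected
```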

Lemma 2.10

If the minimum distance of the linear code \(C=[n,k]\) is \(d=2e+1\), then in any additive coset \(x+C\) of C there is at most one codeword \(x_{0}\) with \(w(x_{0})\le e\).

Proof

If \(\alpha , \beta \in x+C\) with \(w(\alpha )\le e\) and \(w(\beta )\le e\), then \(\alpha -\beta \in C\) and \(w(\alpha -\beta )\le w(\alpha )+w(\beta )\le 2e\). But the minimum weight of C equals the minimum distance of C, namely \(2e+1\); this is a contradiction unless \(\alpha -\beta =0\), thus \(\alpha =\beta \). The Lemma holds.

Corollary 2.5

For a perfect linear code \(C=[n,k]\) with minimum distance \(d=2e+1\), every additive coset \(x+C\) of C contains exactly one codeword of weight \(\le e\). In other words, the leader codeword of each additive coset is unique.

Proof

For any \(x\in \mathbb {F}_{q}^{n}\), since C is perfect there exists \(c\in C\) such that \(x\in B_{e}(c)\), that is, \(d(c,x)\le e\). So \(w(x -c)\le e\), and \(x-c \in x+C\); by Lemma 2.10 this leader is unique. The corollary holds.

Definition 2.8

If any two column vectors of the generator matrix G of a linear code \(C=[n,k]\) are linearly independent, C is called a projective code.

In order to discuss the true meaning of projective codes, we consider the \((k-1)\)-dimensional projective space \(PG(k-1,q)\) over \(\mathbb {F}_{q}\).

In \(\mathbb {F}_{q}^{k}\), for any two vectors \(a=(a_{1},a_{2},\ldots , a_{k})\) and \(b=(b_{1},b_{2},\ldots ,b_{k})\), we say \(a\sim b\) if there exists \(\lambda \in \mathbb {F}_{q}^{*}\) such that \(a=\lambda b\). This is an equivalence relation on \(\mathbb {F}_{q}^{k}\). Obviously \(b\sim 0\Leftrightarrow b=0\). For any \(a\in \mathbb {F}_{q}^{k}\), let \(\overline{a}=\{\lambda a|\lambda \in \mathbb {F}_{q}^{*}\}\); the quotient set \(\mathbb {F}_{q}^{k}/_{\sim }\) is called the \((k-1)\)-dimensional projective space over \(\mathbb {F}_{q}\), denoted \(PG(k-1,q)\). Therefore

$$PG(k-1,q)=\mathbb {F}_{q}^{k}/_{\sim }=\{\overline{a}|a\in \mathbb {F}_{q}^{k}\}.$$

The number of nonzero points in \((k-1)\)-dimensional projective space \(PG(k-1,q)\) is

$$|PG(k-1,q)|=\frac{q^{k}-1}{q-1}=1+q+\cdots +q^{k-1}.$$

Consider a linear code \([n, n-k]\) whose check matrix H is a \(k\times n\) matrix in which any two column vectors are linearly independent. Writing \(H=[a_{1}, a_{2},\ldots , a_{n}]\), the columns \(\{a_{1}, a_{2}, \ldots , a_{n}\}\subset PG(k-1,q)\) are n different nonzero points. So the generator matrix of an [n, k] projective code consists of n different nonzero points of the projective space \(PG(k-1,q)\). Since \(n\le |PG(k-1,q)|\), the maximum value is reached when

$$n=|PG(k-1,q)|=\frac{q^{k}-1}{q-1}.$$

This leads to a perfect example of linear codes, called Hamming codes.

Definition 2.9

Let \(k>1, n=\frac{q^{k}-1}{q-1}\), a linear code \(C=[n,n-k]\) is called a Hamming code if any two column vectors of the check matrix H of C are linearly independent.

Since C is an \((n-k)\)-dimensional linear subspace, \(C^{\perp }\) is a k-dimensional linear subspace, and its generator matrix H is a \(k\times n\) matrix. If any two column vectors of H are linearly independent, they represent n different points of the projective space \(PG(k-1,q)\). Because \(n=\frac{q^{k}-1}{q-1}\), the columns of H exhaust all the points of \(PG(k-1,q)\), so the Hamming code has the largest possible length for this construction.

Theorem 2.3

Any Hamming code \(C=[n,n-k]\) is a perfect code with minimum distance \(d=3\); therefore, Hamming codes are perfect 1-error-correcting codes.

Proof

We first prove that the minimum distance of the Hamming code C is \(d\ge 3\). Since the minimum distance of a linear code equals its minimum weight, if \(d\le 2\) there is a nonzero codeword \(x=x_{1}x_{2}\ldots x_{n}\in C\) with \(w(x)\le 2\); that is, at most two characters \(x_{i}\) and \(x_{j}\) are nonzero.

Let \(H=(\alpha _{1},\alpha _{2},\ldots ,\alpha _{n})\) be the check matrix of C. Since \(x\in C\), we have \(xH'=0\), that is,

$$(x_{1}, x_{2},\ldots , x_{n}) \left[ \begin{array}{cccc} \alpha _{1}\\ \alpha _{2}\\ \vdots \\ \alpha _{n} \end{array}\right] =0.$$

It follows that \(\alpha _{i}x_{i}+\alpha _{j}x_{j}=0\), so \(\alpha _{i}\) and \(\alpha _{j}\) are linearly dependent, a contradiction. So \(d\ge 3\), and by Lemma 2.4 the disjoint radius of C is \(\rho _{1}\ge 1\).

On the other hand, for \(c\in C\), by Lemma 2.3 the number of elements in the ball \(B_{1}(c)\) is

$$|B_{1}(c)|=1+n(q-1)=q^{k}.$$

Because C is an \((n-k)\)-dimensional linear subspace, that is, \(|C|=q^{n-k}\), so

$$|\bigcup _{c\in C}B_{1}(c)|=|C|q^{k}=q^{n}=|\mathbb {F}_{q}^{n}|,$$
$$\Rightarrow \bigcup _{c\in C}B_{1}(c)=\mathbb {F}_{q}^{n}.$$

We have \(1\le \rho _{1}\le \rho \le 1\), so \(\rho _{1}=\rho =1\): C is a perfect code, and its minimum distance is \(d=2\rho +1=3\). The theorem holds.
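For q = 2 and k = 3, Definition 2.9 gives the classical [7, 4] binary Hamming code. The following sketch (an illustration; the column order is an arbitrary choice) builds its check matrix from the nonzero points of \(PG(2,2)\) and confirms the parameters of Theorem 2.3 by brute force:

```python
# The [7,4] binary Hamming code from its check matrix H (columns = all
# nonzero binary 3-tuples, i.e. the points of PG(2,2)).
from itertools import product

cols = [v for v in product(range(2), repeat=3) if any(v)]  # 7 distinct columns
H = list(zip(*cols))                                       # 3 x 7 matrix
n = len(cols)

def in_code(x):  # x is a codeword iff x H' = 0 (Lemma 2.9)
    return all(sum(a * b for a, b in zip(x, row)) % 2 == 0 for row in H)

C = [x for x in product(range(2), repeat=n) if in_code(x)]
d = min(sum(x) for x in C if any(x))
print(len(C), d)  # 16 3: a perfect 1-error-correcting code
```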

Next, we discuss the weight polynomial of a linear code C and prove the famous MacWilliams theorem.

For \(x\in C=[n, k]\), the value of the weight function w(x) runs from 0 to n; in fact, \(w(x)=0\Leftrightarrow x=0\in C\), and \(w(x)=n\Leftrightarrow x=x_{1}\ldots x_{n}\) with all \(x_{i}\ne 0\). For each \(i, 0\le i\le n\), define

$$A_{i}=\# \{x\in C|w(x)=i\}$$

and the weight polynomial of C as

$$A(z)=\sum _{i=0}^{n}A_{i}z^{i}, ~z~ \text {is a variable}.$$

Obviously, for any given \(c\in C\), the number of codewords in C whose Hamming distance to c is exactly i is \(A_{i}\), that is

$$A_{i}=\# \{x\in C|d(x,c)=i\}.$$

Codes with the above property are called distance-invariant codes; obviously, linear codes are distance-invariant.

The following result was proved by MacWilliams in 1963; it establishes the relationship between the weight polynomial of a linear code C and that of its dual code \(C^{\perp }\), and is one of the most basic achievements in coding theory.

Theorem 2.4

(MacWilliams) Let \(C=[n,k]\) be a linear code over \(\mathbb {F}_{q}\) and the weight polynomial be A(z), \(C^{\perp }\) is the dual code of C, the weight polynomial is B(z), then

$$B(z)=q^{-k}(1+(q-1)z)^{n}A(\frac{1-z}{1+(q-1)z}).$$

In particular, when \(q=2\),

$$2^{k}B(z)=(1+z)^{n}A(\frac{1-z}{1+z}).$$

Proof

Let \(\psi (a)\) be a nontrivial additive character on \(\mathbb {F}_{q}\); such a \(\psi \) can be constructed as follows:

$$\psi (a)=\exp \left( \frac{2\pi i\, tr(a)}{p}\right) , \quad tr:\mathbb {F}_{q}\rightarrow \mathbb {F}_{p}~\text {the trace map},~p=\mathrm {char}\,\mathbb {F}_{q}.$$

For any \(c\in C\), we define the polynomial \(g_{c}(z)\) as

$$\begin{aligned} g_{c}(z)=\sum _{x\in \mathbb {F}_{q}^{n}}z^{w(x)}\psi (<x,c>), \end{aligned}$$
(2.9)

therefore,

$$\begin{aligned} \sum _{c\in C}g_{c}(z)=\sum _{x\in \mathbb {F}_{q}^{n}}z^{w(x)}\sum _{c\in C}\psi (<x,c>), \end{aligned}$$
(2.10)

Let’s calculate the inner sum of (2.10). If \(x\in C^{\perp }\), then

$$\sum _{c\in C}\psi (<x,c>)=|C|.$$

If \(x\notin C^{\perp }\), let’s prove

$$\begin{aligned} \sum _{c\in C}\psi (<x,c>)=0. \end{aligned}$$
(2.11)

If \(x\in \mathbb {F}_{q}^{n}\), \(x\notin C^{\perp }\), let

$$T(x)=\{y\in C\mid <y,x>=0\} \subsetneq C,$$

so T(x) is a linear subspace of C. For \(c\in C\), consider the additive coset \(c+T(x)\); for any two codewords \(c+y_{1}\) and \(c+y_{2}\) in this coset, we have

$$<c+y_{1},x>=<c+y_{2},x>=<c,x>.$$

Conversely, for any two additive cosets \(c_{1}+T(x)\) and \(c_{2}+T(x)\), if \(<c_{1},x>=<c_{2},x>\), then \(<c_{1}-c_{2},x>=0\), that is, \(c_{1}-c_{2}\in T(x)\), so \(c_{1}+T(x)=c_{2}+T(x)\). Therefore, all codewords in a coset \(c+T(x)\subset C\) have the same inner product with x, and different additive cosets give different inner products with x. Because \(x\notin C^{\perp }\), there exists \(c_{0}\in C\) such that \(<c_{0},x>\ne 0\); let \(<c_{0},x>=a \ne 0\), then \(<a^{-1}c_{0},x>=1\). Let \(c_{1}=a^{-1}c_{0}\in C\), then \(<c_{1}, x>=1\), and therefore for every \(a\in \mathbb {F}_{q}\) we have \(<ac_{1},x>=a\). Thus \(<c,x>\) takes every value of \(\mathbb {F}_{q}\) as c runs over the cosets, so

$$\sum _{c\in C}\psi (<x,c>)=|T(x)|\sum _{a\in \mathbb {F}_{q}}\psi (a)=0.$$

That is, (2.11) holds. From (2.10), we can get that

$$\begin{aligned} \sum _{c\in C}g_{c}(z)=|C|\sum _{\begin{array}{c} x\in \mathbb {F}_{q}^{n}\\ x\in C^{\perp } \end{array}}z^{w(x)}=|C|B(z). \end{aligned}$$
(2.12)

Define the weight function on \(\mathbb {F}_{q}\) by \(w(a)=1\) if \(a\ne 0\) and \(w(0)=0\). For any \(x\in \mathbb {F}_{q}^{n}\) and \(c\in C\), write \(x=x_{1}x_{2}\ldots x_{n}\), \(c=c_{1}c_{2}\ldots c_{n}\); then by the definition (2.9) of \(g_{c}(z)\), we have

$$\begin{aligned} \begin{aligned} g_{c}(z)&=\sum _{\begin{array}{c} 1\le i\le n \\ x_{i}\in \mathbb {F}_{q} \end{array}}z^{w(x_{1})+w(x_{2})+\cdots +w(x_{n})}\psi (c_{1}x_{1}+\cdots +c_{n}x_{n})\\&=\prod _{i=1}^{n}\sum _{x\in \mathbb {F}_{q}}z^{w(x)}\psi (c_{i}x). \end{aligned} \end{aligned}$$
(2.13)

The inner layer sum of the above formula can be calculated as

$$\begin{aligned} \sum _{x\in \mathbb {F}_{q}}z^{w(x)}\psi (c_{i}x)={\left\{ \begin{aligned}&1-z,&&\text {if}~c_{i}\ne 0,\\&1+(q-1)z,&&\text {if}~c_{i}=0. \end{aligned} \right. } \end{aligned}$$

From (2.13), then we have

$$g_{c}(z)=(1-z)^{w(c)}(1+(q-1)z)^{n-w(c)}.$$

Thus

$$\begin{aligned} \begin{aligned} \sum _{c\in C}g_{c}(z)&=(1+(q-1)z)^{n}\sum _{c\in C}(\frac{1-z}{1+(q-1)z})^{w(c)}\\&=(1+(q-1)z)^{n}A(\frac{1-z}{1+(q-1)z}). \end{aligned} \end{aligned}$$

Finally, from (2.12), we have

$$\begin{aligned} \begin{aligned} B(z)&=\frac{1}{|C|}(1+(q-1)z)^{n}A(\frac{1-z}{1+(q-1)z})\\&=q^{-k}(1+(q-1)z)^{n}A(\frac{1-z}{1+(q-1)z}). \end{aligned} \end{aligned}$$

We have completed the proof of the theorem.
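The identity of Theorem 2.4 is easy to test numerically. The following sketch (our illustration, with q = 2) computes the weight polynomials of the [7, 4] Hamming code constructed above and of its dual by brute force and compares both sides of the identity at a few sample points:

```python
# Numeric check of the MacWilliams identity for q = 2 on the [7,4] Hamming
# code C and its dual Cp: 2^k B(z) = (1+z)^n A((1-z)/(1+z)).
from itertools import product

cols = [v for v in product(range(2), repeat=3) if any(v)]
H = list(zip(*cols))
n, k = 7, 4
C = [x for x in product(range(2), repeat=n)
     if all(sum(a * b for a, b in zip(x, row)) % 2 == 0 for row in H)]
Cp = [y for y in product(range(2), repeat=n)
      if all(sum(a * b for a, b in zip(y, c)) % 2 == 0 for c in C)]

A = lambda z: sum(z ** sum(c) for c in C)    # weight polynomial of C
B = lambda z: sum(z ** sum(c) for c in Cp)   # weight polynomial of the dual

for z in (0.1, 0.5, 0.9):
    assert abs(2 ** k * B(z) - (1 + z) ** n * A((1 - z) / (1 + z))) < 1e-9
```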

2.3 Lee Distance

Let \(m>1\) be a positive integer and \(\mathbb {Z}_{m}\) the residue class ring \({{\mathrm{mod}\,}}m\). If \(\mathbb {Z}_{m}\) is the alphabet and \(C\subset \mathbb {Z}_{m}^{n}\) is a proper subset, then C is called an m-ary code. In this case the Hamming distance is not the best tool for measuring errors; we use instead the Lee distance and the Lee weight. For \(i\in \mathbb {Z}_{m}\), represented by \(0\le i<m\), the Lee weight is defined as

$$\begin{aligned} W_{L}(i)=\min \{i,m-i\}. \end{aligned}$$
(2.14)

Obviously,

$$\begin{aligned} W_{L}(0)=0, ~W_{L}(-i)=W_{L}(m-i)=W_{L}(i). \end{aligned}$$
(2.15)

Suppose \(a=(a_{1}, a_{2}, \ldots , a_{n})=a_{1}a_{2}\ldots a_{n}\in \mathbb {Z}_{m}^{n}\) and \(b=b_{1}b_{2}\ldots b_{n}\in \mathbb {Z}_{m}^{n}\); the Lee weight and the Lee distance on an m-ary code C are defined as follows:

$$ {\left\{ \begin{array}{ll} \ W_{L}(a)=\sum \limits _{i=1}^{n}W_{L}(a_{i})\\ \ d_{L}(a,b)=W_{L}(a-b). \end{array}\right. } $$

From (2.15), we have

$$W_{L}(-a)=W_{L}(a),d_{L}(a,b)=d_{L}(b,a),\forall ~ a,b\in \mathbb {Z}_{m}^{n}.$$
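These definitions translate directly into code; a small sketch for m = 4 (the case studied at the end of this section):

```python
# Lee weight (2.14) and Lee distance on Z_m^n, here with m = 4.
m = 4
wL = lambda i: min(i % m, m - i % m)                       # weight of a character
WL = lambda a: sum(wL(ai) for ai in a)                     # Lee weight of a codeword
dL = lambda a, b: WL([(x - y) % m for x, y in zip(a, b)])  # Lee distance

a, b = (0, 1, 2, 3), (3, 1, 0, 2)
print(WL(a), dL(a, b))  # 4 4; e.g. wL(3) = min(3, 4 - 3) = 1
```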

Lemma 2.11

For all \(a,b,c\in \mathbb {Z}_{m}^{n}\), we have the triangle inequality

$$d_{L}(a,b)\le d_{L}(a,c)+d_{L}(c,b).$$

Proof

Suppose \(0\le i<m\) and \(0\le j<m\); we first prove

$$\begin{aligned} W_{L}(i+j)\le W_{L}(i)+W_{L}(j). \end{aligned}$$
(2.16)

If \(0\le i+j\le \frac{m}{2}\), then

$$W_{L}(i+j)=i+j=W_{L}(i)+W_{L}(j).$$

If \(\frac{m}{2}<i+j<m\), we distinguish three cases:

(1)  \(i\le \frac{m}{2}\), \(j\le \frac{m}{2}\), there is

$$W_{L}(i+j)=m-i-j<i+j=W_{L}(i)+W_{L}(j).$$

(2)  \(i\le \frac{m}{2}\), \(j>\frac{m}{2}\), there is

$$W_{L}(i+j)=m-i-j\le m-j=W_{L}(j)\le W_{L}(i)+W_{L}(j).$$

(3)  \(i>\frac{m}{2}\), \(j\le \frac{m}{2}\), there is

$$W_{L}(i+j)=m-i-j\le m-i=W_{L}(i)\le W_{L}(i)+W_{L}(j).$$

Finally, if \(i+j\ge m\), then \(W_{L}(i+j)=\min \{i+j-m,~2m-i-j\}\); if \(i>\frac{m}{2}\) and \(j>\frac{m}{2}\), this is at most \(2m-i-j=W_{L}(i)+W_{L}(j)\), while if, say, \(i\le \frac{m}{2}\) (so that \(j\ge m-i>\frac{m}{2}\)), it is at most \(i+j-m\le i=W_{L}(i)\le W_{L}(i)+W_{L}(j)\). So we always have (2.16). In \(\mathbb {Z}_{m}^{n}\), (2.16) extends componentwise to

$$W_{L}(a+b)\le W_{L}(a)+W_{L}(b), \forall ~a,b\in \mathbb {Z}_{m}^{n}.$$

So

$$\begin{aligned} \begin{aligned} d_{L}(a,b)&=W_{L}(a-b)=W_{L}((a-c)+(c-b))\\&\le W_{L}(a-c)+W_{L}(c-b)=d_{L}(a,c)+d_{L}(c,b). \end{aligned} \end{aligned}$$

The Lemma holds.

Next we take \(m=4\), so the alphabet is \(\mathbb {Z}_{4}\), and we discuss the Lee weight and Lee distance on 4-ary codes. Suppose \(a\in \mathbb {Z}_{4}^{n}\) and \(0 \le i \le 3\); let

$$\begin{aligned} n_{i}(a)=\# \{j|1\le j\le n,a=a_{1}a_{2}\ldots a_{n},a_{j}=i\}. \end{aligned}$$
(2.17)

\(n_{i}(a)\) is the number of characters of the codeword a equal to i. For a 4-ary code \(C\subset \mathbb {Z}_{4}^{n}\), the symmetric weight polynomial and the Lee weight polynomial of C are defined as

$$\begin{aligned} swe_{C}(w,x,y)=\sum _{c\in C}w^{n_{0}(c)}x^{n_{1}(c)+n_{3}(c)}y^{n_{2}(c)} \end{aligned}$$
(2.18)

and

$$\begin{aligned} Lee_{C}(x,y)=\sum _{c\in C}x^{2n-W_{L}(c)}y^{W_{L}(c)}. \end{aligned}$$
(2.19)

Lemma 2.12

Let \(C\subset \mathbb {Z}_{4}^{n}\) be a 4-ary code with codeword length n; then the symmetric weight polynomial and the Lee weight polynomial of C satisfy

$$Lee_{C}(x,y)=swe_{C}(x^{2},xy,y^{2}).$$

Proof

For \(a\in \mathbb {Z}_{4}^{n}\), by definition

$$n_{0}(a)+n_{1}(a)+n_{2}(a)+n_{3}(a)=n.$$

Let \(a=a_{1}a_{2}\ldots a_{n}\), then

$$W_{L}(a)=\sum _{i=1}^{n}W_{L}(a_{i})=n_{1}(a)+2n_{2}(a)+n_{3}(a).$$

So

$$\begin{aligned} \begin{aligned} Lee_{C}(x,y)&=\sum _{c\in C}x^{2n_{0}(c)}\cdot (xy)^{n_{1}(c)+n_{3}(c)}y^{2n_{2}(c)}\\&=swe_{C}(x^{2},xy,y^{2}). \end{aligned} \end{aligned}$$

The Lemma holds.

By using the Lee weight and Lee distance, we can extend the MacWilliams theorem to \(\mathbb {Z}_{4}\)-codes. We have

Theorem 2.5

Let \(C\subset \mathbb {Z}_{4}^{n}\) be a linear code, \(C^{\perp }\) its dual code and \(Lee_{C}(x,y)\) the Lee weight polynomial of C; then

$$Lee_{C^{\perp }}(x,y)=\frac{1}{|C|}Lee_{C}(x+y,x-y).$$

Proof

Let \(\psi \) be the nontrivial character of \(\mathbb {Z}_{4}\) given by

$$\psi (i)=(\sqrt{-1})^{i},i=0,1,2,3.$$

Let f(u) be a function defined on \(\mathbb {Z}_{4}^{n}\), and set

$$\begin{aligned} g(c)=\sum _{u\in \mathbb {Z}_{4}^{n}}f(u)\psi (<c,u>). \end{aligned}$$
(2.20)

As in Theorem 2.4, we have

$$\begin{aligned} \sum _{c\in C}g(c)=|C|\sum _{u\in C^{\perp }}f(u). \end{aligned}$$
(2.21)

Take

$$f(u)=w^{n_{0}(u)}x^{n_{1}(u)+n_{3}(u)}y^{n_{2}(u)}, ~u \in \mathbb {Z}_{4}^{n}.$$

Write \(u=u_{1}u_{2}\ldots u_{n}\in \mathbb {Z}_{4}^{n}\), then for each \(i, ~0 \le i \le 3\), we have

$$n_{i}(u)=n_{i}(u_{1})+n_{i}(u_{2})+\cdots +n_{i}(u_{n}).$$

Thus

$$f(u)=\prod _{i=1}^{n}f(u_{i}).$$

Let \(c=c_{1}c_{2}\ldots c_{n}\in \mathbb {Z}_{4}^{n}\), by (2.20),

$$\begin{aligned} g(c)=\prod _{i=1}^{n}(\sum _{u\in \mathbb {Z}_{4}}f(u)\psi (<c_{i},u>)). \end{aligned}$$
(2.22)

Now we calculate the inner sum on the right side of equation (2.22).

$$ \sum _{u\in \mathbb {Z}_{4}}f(u)\psi (<c_{i},u>)= {\left\{ \begin{array}{ll}w+2x+y,\quad \ \ &{} \text {if}~c_{i}=0,\\ w-y,\quad \ \ &{} \text {if}~c_{i}=1~\text {or}~3,\\ w-2x+y,\quad \ \ &{}\text {if}~c_{i}=2. \end{array}\right. }$$

By (2.22),

$$g(c)=(w+2x+y)^{n_{0}(c)}(w-y)^{n_{1}(c)+n_{3}(c)}(w-2x+y)^{n_{2}(c)}.$$

So

$$\sum _{c\in C}g(c)=swe_{C}(w+2x+y,w-y,w-2x+y).$$

By (2.21),

$$|C|swe_{C^{\perp }}(w,x,y)=swe_{C}(w+2x+y,w-y,w-2x+y).$$

By Lemma 2.12, after a change of variables, we obtain

$$Lee_{C^{\perp }}(x,y)=\frac{1}{|C|}Lee_{C}(x+y,x-y).$$

We have completed the proof.

2.4 Some Typical Codes

2.4.1 Hadamard Codes

In order to introduce Hadamard codes, we first define Hadamard matrices of order n. Let \(H=(a_{ij})\) be an \(n\times n\) matrix with \(a_{ij}=\pm 1\); if

$$HH'=nI_{n}= \left[ \begin{array}{cccc} n &{} 0 &{} \cdots &{} 0\\ 0 &{} n &{} \cdots &{} 0\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} n \end{array}\right] ,$$

then H is called a Hadamard matrix of order n. It is easy to verify that the following \(H_{2}\) is a Hadamard matrix of order 2:

$$H_{2}= \left[ \begin{array}{cccc} 1 &{} 1\\ 1 &{} -1 \end{array}\right] .$$

In order to obtain Hadamard matrices of higher order, a useful tool is the so-called Kronecker product. Let \(A=(a_{ij})_{m\times m}\), \(B=(b_{ij})_{n\times n}\); then the Kronecker product \(A\otimes B\) of A and B is defined as

$$A\otimes B= \left[ \begin{array}{cccc} a_{11}B &{} a_{12}B &{} \cdots &{} a_{1m}B\\ a_{21}B &{} a_{22}B &{} \cdots &{} a_{2m}B\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ a_{m1}B &{} a_{m2}B &{} \cdots &{} a_{mm}B\\ \end{array}\right] .$$

Obviously, \(A\otimes B\) is a square matrix of order mn. The following result is easy to prove.

Lemma 2.13

Let A be a Hadamard matrix of order m and B a Hadamard matrix of order n; then \(A\otimes B\) is a Hadamard matrix of order mn.

Proof

Let \(A=(a_{ij})_{m\times m}\), \(B=(b_{ij})_{n\times n}\), \(H=A\otimes B\), then

$$\begin{aligned} \begin{aligned} HH'&= \left[ \begin{array}{cccc} a_{11}B &{} a_{12}B &{} \cdots &{} a_{1m}B\\ a_{21}B &{} a_{22}B &{} \cdots &{} a_{2m}B\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ a_{m1}B &{} a_{m2}B &{} \cdots &{} a_{mm}B\\ \end{array}\right] . \left[ \begin{array}{cccc} a_{11}B' &{} a_{21}B' &{} \cdots &{} a_{m1}B'\\ a_{12}B' &{} a_{22}B' &{} \cdots &{} a_{m2}B'\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ a_{1m}B' &{} a_{2m}B' &{} \cdots &{} a_{mm}B'\\ \end{array}\right] \\&= \left[ \begin{array}{cccc} c_{11}BB' &{} c_{12}BB' &{} \cdots &{} c_{1m}BB'\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ c_{m1}BB' &{} c_{m2}BB' &{} \cdots &{} c_{mm}BB'\\ \end{array}\right] \\&= \left[ \begin{array}{cccc} mnI_{n} &{} 0 &{} \cdots &{} 0\\ 0 &{} mnI_{n} &{} \cdots &{} 0\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} mnI_{n}\\ \end{array}\right] \\&=mnI_{nm}. \end{aligned} \end{aligned}$$

The Lemma holds.

Since \(H_{2}\) is a Hadamard matrix of order 2, the matrices

$$H_{2}\otimes H_{2}=H_{2}^{\otimes 2},\qquad H_{2}^{\otimes n}=H_{2}\otimes H_{2}\otimes \cdots \otimes H_{2}\ (n~\text {factors})$$

are Hadamard matrices of order 4 and of order \(2^{n}\), respectively.
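With NumPy, the Kronecker construction is a few lines; the following sketch builds the order-32 Sylvester–Hadamard matrix (the order used by the Mars probe code mentioned later in this subsection) and checks \(HH'=nI_{n}\):

```python
# Sylvester construction: iterated Kronecker products of H_2 give Hadamard
# matrices of order 2^n; here we verify H H' = n I_n for n = 32.
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])

H = H2
for _ in range(4):          # five factors H_2 in total: order 2^5 = 32
    H = np.kron(H, H2)

n = H.shape[0]              # n = 32
assert np.array_equal(H @ H.T, n * np.eye(n, dtype=int))
```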

Let n be an even number and \(H_{n}\) a Hadamard matrix of order n with row vectors \(\alpha _{1},\alpha _{2},\ldots , \alpha _{n}\), i.e.,

$$H_{n}= \left[ \begin{array}{cccc} \alpha _{1}\\ \alpha _{2}\\ \vdots \\ \alpha _{n} \end{array}\right] , -H_{n}= \left[ \begin{array}{cccc} -\alpha _{1}\\ -\alpha _{2}\\ \vdots \\ -\alpha _{n} \end{array}\right] .$$

We obtain 2n row vectors \(\{\pm \alpha _{1},\pm \alpha _{2},\ldots , \pm \alpha _{n}\}\). In each row vector \(\pm \alpha _{i}\) we replace every component \(-1\) by 0; the row vector \(\alpha _{i}\) so transformed is denoted by \(\overline{\alpha _{i}}\), and \(-\alpha _{i}\) by \(\overline{-\alpha _{i}}\). Thus each \(\overline{\pm \alpha _{i}}\) is a vector of \(\mathbb {F}_{2}^{n}\). Write

$$C=\{\overline{\pm \alpha _{1}}, \overline{\pm \alpha _{2}},\ldots , \overline{\pm \alpha _{n}}\}\subset \mathbb {F}_{2}^{n}.$$

C is called a Hadamard code.

Theorem 2.6

The minimum distance of a Hadamard code C of length n (n an even number) is \(d=\frac{n}{2}\).

Proof

Let \(H_{n}\) be a Hadamard matrix of order n, \(H_{n}= \left[ \begin{array}{cccc} \alpha _{1}\\ \alpha _{2}\\ \vdots \\ \alpha _{n} \end{array}\right] ,\) where each \(\alpha _{i}\) is a row vector of \(H_{n}\). The substitution \(\alpha _{i}{\mathop {\longrightarrow }\limits ^{\sigma }}\overline{\alpha _{i}}\) turns each \(\overline{\alpha _{i}}\in \mathbb {F}_{2}^{n}\) into a binary codeword. This substitution does not change the corresponding Hamming distances, that is

$$ {\left\{ \begin{array}{ll} \ d(\alpha _{i},\alpha _{j})=d(\overline{\alpha }_{i},\overline{\alpha }_{j})\\ \ d(-\alpha _{i},-\alpha _{j})=d(\overline{\alpha }_{i},\overline{\alpha }_{j}), \end{array}\right. }$$

where \(i\ne j\). Let us prove that the minimum distance of C is \(\frac{n}{2}\). Let \(a=a_{1}a_{2}\ldots a_{n}\) and \(b=b_{1}b_{2}\ldots b_{n}\) be two different row vectors of the Hadamard matrix \(H_{n}\); then

$$ab'=0\Rightarrow \sum _{i=1}^{n}a_{i}b_{i}=0.$$

with \(a_{i}=\pm 1\), \(b_{i}=\pm 1\). Let the number of positions where a and b agree be \(d_{1}\), and the number of positions where they differ be \(d=d(a,b)\); then \(d_{1}-d=0\), that is, \(d_{1}=d\), and \(d_{1}+d=n\), so \(d=\frac{n}{2}\). The theorem holds.

Corollary 2.6

Let \(C=\{\overline{\pm \alpha _{1}},\overline{\pm \alpha _{2}},\ldots , \overline{\pm \alpha _{n}}\}\) be a Hadamard code. Then the Hamming distance between any two different codewords \(\overline{\pm \alpha _{i}}\) and \(\overline{\pm \alpha _{j}}\) with \(i\ne j\) is \(\frac{n}{2}\), while \(d(\overline{\alpha _{i}},\overline{-\alpha _{i}})=n\).

Proof

\(\{\pm {\alpha _{1}},\pm {\alpha _{2}},\ldots , \pm {\alpha _{n}}\}\) are the row vectors of \(\pm H_{n}\). Let \(a=\pm \alpha _{i}\), \(b=\pm \alpha _{j}\) \((i\ne j)\); then

$$ab'=\pm \sum _{i=1}^{n}a_{i}b_{i}=0\Rightarrow d(a,b)=\frac{n}{2}.$$

A code of length n, with M codewords and minimum distance d, is denoted by (n, M, d), in contrast with the notation [n, k] or [n, k, d] for linear codes. The Hadamard code is thus

$$C=(n,2n,\frac{n}{2}).$$

When \(n=8\), \(d=4\), and this code is an extension of the Hamming code. When \(n=32\), the code (32, 64, 16) was used by the U.S. Mars probe in 1969 to transmit the pictures taken on Mars.
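As a check of Theorem 2.6 and Corollary 2.6, the following sketch builds the Hadamard code from \(H_{8}\) by the substitution \(-1\rightarrow 0\) described above and verifies the parameters \((8, 16, 4)\):

```python
# The Hadamard code (8, 16, 4): rows of H and -H with -1 replaced by 0.
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H = np.kron(np.kron(H2, H2), H2)        # Hadamard matrix of order n = 8

rows = np.vstack([H, -H])               # the 2n = 16 vectors +-alpha_i
C = (rows + 1) // 2                     # -1 -> 0, +1 -> 1
d = min(int((x != y).sum()) for i, x in enumerate(C) for y in C[i + 1:])
print(len(C), d)                        # 16 4, i.e. d = n/2
```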

2.4.2 Binary Golay Codes

In the theory and applications of channel coding, the binary Golay code is among the most famous. In order to introduce the Golay code \(G_{23}\) completely, we first introduce the concept of a \(t-(m,k,\lambda )\) design.

Let S be a set of m elements, that is, \(|S|=m\); the elements of S are called points. Let \(\mathfrak {R}\) be a set of k-element subsets of S with \(|\mathfrak {R}|=M\), i.e.,

$$\mathfrak {R}=\{B_{1},B_{2},\ldots , B_{M}\},B_{i}\subset S,|B_{i}|=k,1\le i\le M.$$

The elements \(B_{i}\) of \(\mathfrak {R}\) are called blocks.

Definition 2.10

\((S,\mathfrak {R})\) is called a \(t-(m,k,\lambda )\) design if for any \(T\subset S\) with \(|T|=t\), there are exactly \(\lambda \) blocks B in \(\mathfrak {R}\) such that \(T\subset B\). If \((S, \mathfrak {R})\) is a \(t-(m,k,\lambda )\) design, we write \((S, \mathfrak {R})=t-(m,k,\lambda )\). If \(\lambda =1\), then a \(t-(m,k,1)\) design is called a Steiner system.

For a \(t-(m,k,\lambda )\) design \((S, \mathfrak {R})\), we introduce its incidence matrix. For any \(a\in S\), the characteristic function \(\chi _{i}(a)\) is defined as

$$ \chi _{i}(a)= {\left\{ \begin{array}{ll}1,\quad \ \ &{} \text {if}~ a\in B_{i},\\ 0,\quad \ \ &{} \text {if}~ a\notin B_{i}, \end{array}\right. } $$

Write \(S=\{a_{1}, a_{2}, \ldots , a_{m}\}\), \(\mathfrak {R}=\{B_{1},B_{2},\ldots , B_{M}\}\), \(|\mathfrak {R}|=M\). The matrix

$$A=(\chi _{j}(a_{i}))_{m\times M}= \left[ \begin{array}{cccc} \chi _{1}(a_{1}) &{} \chi _{2}(a_{1}) &{} \cdots &{} \chi _{M}(a_{1})\\ \chi _{1}(a_{2}) &{} \chi _{2}(a_{2}) &{} \cdots &{} \chi _{M}(a_{2})\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ \chi _{1}(a_{m}) &{} \chi _{2}(a_{m}) &{} \cdots &{} \chi _{M}(a_{m}) \end{array}\right] ,$$

A is called the incidence matrix of the \(t-(m,k,\lambda )\) design.

Let us now consider a concrete example, the \(2-(11,6,3)\) design: S contains 11 points, each block of \(\mathfrak {R}\) contains 6 points, and any two points of S are contained in exactly three blocks.

Lemma 2.14

The parameters of a \(2-(11,6,3)\) design are completely determined; that is, with \(S=\{a_{1},a_{2},\ldots , a_{11}\}\), there are exactly 11 blocks in \(\mathfrak {R}\),

$$\mathfrak {R}=\{B_{1},B_{2},\ldots , B_{11}\}.$$

And for any \(a\in S\), exactly 6 blocks \(B_{j}\) in \(\mathfrak {R}\) contain a.

Proof

Suppose each \(a \in S\) is contained in exactly l blocks \(B_{j}\). Count the pairs (b, B) with \(b\ne a\) and \(\{a,b\}\subset B\): the l blocks containing a contribute \(5l\) such pairs, while each of the other 10 points lies together with a in exactly 3 blocks, so \(5l=10 \times 3\) and \(l=6\). In addition, suppose \(|\mathfrak {R}|=M\); since each point is contained in exactly six blocks and each block contains six points, counting point-block incidences gives \(6 M= 11\times 6\), so \(M=11\).

By Lemma 2.14, the incidence matrix N of a \(2-(11,6,3)\) design is a square matrix of order 11:

$$N= \left[ \begin{array}{cccc} \chi _{1}(a_{1}) &{} \chi _{2}(a_{1}) &{} \cdots &{} \chi _{11}(a_{1})\\ \chi _{1}(a_{2}) &{} \chi _{2}(a_{2}) &{} \cdots &{} \chi _{11}(a_{2})\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ \chi _{1}(a_{11}) &{} \chi _{2}(a_{11}) &{} \cdots &{} \chi _{11}(a_{11}) \end{array}\right] .$$

Every row of N has exactly six 1’s and five 0’s, and so does every column of N.

Lemma 2.15

Let N be the incidence matrix of a \(2-(11,6,3)\) design, then

$$NN'=3I_{11}+3J_{11},J_{11}= \left[ \begin{array}{cccc} 1 &{} 1 &{} \cdots &{} 1\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ 1 &{} 1 &{} \cdots &{} 1 \end{array}\right] .$$

If N is regarded as a square matrix of order 11 over \(\mathbb {F}_{2}\), then

$$NN'=I_{11}+J_{11}.$$

Further, \(rank(N)=10\) over \(\mathbb {F}_{2}\), and the solutions of the linear system \(XN=0\) are exactly the two repetition codewords 0 and 1 \((0=(0,0, \ldots , 0)\), \(1=(1,1, \ldots , 1))\) in \(\mathbb {F}_{2}^{11}\).

Proof

Let \(NN'=(b_{ij})_{11\times 11}\); by definition,

$$b_{ij}=\sum _{k=1}^{11}\chi _{k}(a_{i})\chi _{k}(a_{j}).$$

When \(i\ne j\), \(b_{ij}=3\); when \(i= j\), \(b_{ij}=6\). So we have

$$NN'=3I_{11}+3J_{11}\equiv I_{11}+J_{11}({{\mathrm{mod}\,}}2).$$

Regard \(N({{\mathrm{mod}\,}}2)\), still denoted N, as a square matrix of order 11 over \(\mathbb {F}_{2}\). Over \(\mathbb {F}_{2}\) the kernel of \(I_{11}+J_{11}\) is \(\{0,1\}\), so \(rank(NN')=rank(I_{11}+J_{11})=10\) and hence \(rank(N)\ge 10\). On the other hand, each column vector of N has exactly six 1’s and five 0’s, so

$$(1,1,\ldots , 1)N=(0,0,\ldots , 0)\in \mathbb {F}_{2}^{11},$$

which gives \(rank(N)\le 10\); therefore \(rank(N)=10\), and the solution space of \(XN=0\) is a one-dimensional linear subspace of \(\mathbb {F}_{2}^{11}\). So there are exactly two solutions of \(XN=0\):

$$x=(0,0,\ldots , 0), ~ x=(1,1,\ldots , 1).$$

The Lemma holds.
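A \(2-(11,6,3)\) design can be written down explicitly: the translates mod 11 of \(D=\{0\}\cup \{\text {quadratic residues mod }11\}\) may be taken as blocks (a standard difference-set construction, used here only as an illustration). The sketch below builds the incidence matrix N this way and checks the identity \(NN'=3I_{11}+3J_{11}\) of Lemma 2.15:

```python
# A 2-(11,6,3) design from the difference set D = {0, 1, 3, 4, 5, 9}:
# blocks are the translates D + i (mod 11); N is its incidence matrix.
import numpy as np

D = {0} | {(x * x) % 11 for x in range(1, 11)}        # {0, 1, 3, 4, 5, 9}
blocks = [{(d + i) % 11 for d in D} for i in range(11)]
N = np.array([[int(a in B) for B in blocks] for a in range(11)])

target = 3 * np.eye(11, dtype=int) + 3 * np.ones((11, 11), dtype=int)
assert np.array_equal(N @ N.T, target)                # N N' = 3 I + 3 J
```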

Next, let us construct a \(12\times 24\) matrix \(G=(I_{12},P)\), where

$$P= \left[ \begin{array}{ccccc} 0 &{} 1 &{} 1 &{} \cdots &{} 1\\ 1 &{} &{} &{} &{}\\ 1 &{} &{} &{} &{}\\ \vdots &{} &{} N &{} &{}\\ 1 &{} &{} &{} &{} \end{array}\right] ,$$

N being the incidence matrix of the \(2-(11,6,3)\) design above, and where \(\alpha _{i}\in \mathbb {F}_{2}^{24}\) \((1\le i\le 12)\) denote the 12 row vectors of G. Obviously we have

$$\begin{aligned} w(\alpha _{1})=12, ~w(\alpha _{i})=8,~2\le i\le 12. \end{aligned}$$
(2.23)

Lemma 2.16

Let \(G= \left[ \begin{array}{cccc} \alpha _{1}\\ \alpha _{2}\\ \vdots \\ \alpha _{12} \end{array}\right] \); then \(\{\alpha _{1},\alpha _{2},\ldots ,\alpha _{12}\}\subset \mathbb {F}_{2}^{24}\) is a linearly independent set, and the weight of any nonzero linear combination is at least 8, that is,

$$\begin{aligned} w(a_{1}\alpha _{1}+a_{2}\alpha _{2}+\cdots +a_{12}\alpha _{12})\ge 8, ~a_{i}~\text {not all zero}. \end{aligned}$$
(2.24)

Proof

Let us first prove that \(\{\alpha _{i}\}_{i=1}^{12}\) is a set of mutually orthogonal vectors, that is, \(<\alpha _{i},\alpha _{j}>=\alpha _{i}\alpha _{j}'=0\) for \(i\ne j\). Obviously we have

$$<\alpha _{1},\alpha _{j}>=\alpha _{1}\alpha _{j}'=6\equiv 0({{\mathrm{mod}\,}}2), ~j\ne 1.$$

If \(i\ne 1\), \(j\ne 1\), \(i\ne j\), then

$$<\alpha _{i},\alpha _{j}>=1+\sum _{k=1}^{11}\chi _{k}(a_{i})\chi _{k}(a_{j})=4\equiv 0({{\mathrm{mod}\,}}2).$$

So \(<\alpha _{i},\alpha _{j}>=0\) whenever \(i\ne j\); moreover, since \(G=(I_{12},P)\) contains the identity block \(I_{12}\), the set \(\{\alpha _{1},\alpha _{2},\ldots , \alpha _{12}\}\) is linearly independent in \(\mathbb {F}_{2}^{24}\). If \(a_{i}\in \mathbb {F}_{2}\), not all zero, put \(a=a_{1}a_{2}\ldots a_{12}\); let us prove (2.24) by induction on w(a). If \(w(a)=1\), the claim holds by (2.23). When \(w(a)\ge 8\), the claim is immediate, since the \(I_{12}\) block alone contributes weight \(w(a)\ge 8\); for \(2\le w(a)\le 7\) one can still verify

$$w(a_{1}\alpha _{1}+a_{2}\alpha _{2}+\cdots +a_{12}\alpha _{12})\ge 8.$$

So the Lemma holds.

Definition 2.11

The linear code [24, 12] generated by the row vectors \(\{\alpha _{1},\alpha _{2},\ldots , \alpha _{12}\}\) of G in \(\mathbb {F}_{2}^{24}\) is called the extended Golay code, denoted \(G_{24}\). Removing the last component of each \(\alpha _{i}\), \(\alpha _{i}\rightarrow \overline{\alpha }_{i}\), we get \(\overline{\alpha }_{i}\in \mathbb {F}_{2}^{23}\); the linear code [23, 12] generated by \(\{\overline{\alpha }_{1},\overline{\alpha }_{2},\ldots , \overline{\alpha }_{12}\}\) in \(\mathbb {F}_{2}^{23}\) is called the Golay code, denoted \(G_{23}\).

Theorem 2.7

The Golay code \(G_{23}\) is a perfect [23, 12] code with minimum distance \(d=7\).

Proof

Because the minimum distance of a linear code equals its minimum weight, by Lemma 2.16,

$$w(a_{1}\overline{\alpha }_{1}+a_{2}\overline{\alpha }_{2}+\cdots +a_{12}\overline{\alpha }_{12})\ge w(a_{1}\alpha _{1}+a_{2}\alpha _{2}+\cdots +a_{12}\alpha _{12})-1 \ge 7.$$

On the other hand, \(w(\alpha _{i})=8\) for every \(i\ne 1\), and for at least one such i the last component of \(\alpha _{i}\) equals 1, so that \(w(\overline{\alpha }_{i})=w(\alpha _{i})-1=7\). Hence the minimum distance of \(G_{23}\) is exactly \(d=7\). Furthermore, we note that

$$|G_{23}|\sum _{i=0}^{3}\left( {\begin{array}{c}23\\ i\end{array}}\right) =2^{12}\sum _{i=0}^{3}\left( {\begin{array}{c}23\\ i\end{array}}\right) =2^{23}.$$

By the sphere-packing condition of Theorem 2.1, \(G_{23}\) is a perfect code. The theorem holds.

2.4.3 3-Ary Golay Code

In order to introduce the 3-ary Golay code, we first define the Paley matrix of order q. Let \(q\ge 3\) be odd (a power of an odd prime), and define the real-valued quadratic multiplicative character \(\chi (a)\) on the finite field \(\mathbb {F}_{q}\) as

$$\begin{aligned} \chi (a)={\left\{ \begin{aligned}&0, \ \ \text {if}~ a=0; \\&1,\ \ \text {if}~ a\in (\mathbb {F}_{q}^{*})^{2};\\&-1,\ \ \text {if}~a \not \in (\mathbb {F}_{q}^{*})^{2}. \end{aligned} \right. } \end{aligned}$$

Obviously, \(\chi \) is a character of \(\mathbb {F}_{q}^{*}\). Because \(\mathbb {F}_{q}^{*}\) is a cyclic multiplicative group of order \(q-1\), we have

$$\begin{aligned} \chi (-1)=(-1)^{\frac{q-1}{2}}={\left\{ \begin{aligned}&1,\ \ \ \ \ \text {if}~ q\equiv 1({{\mathrm{mod}\,}}4);\\&-1,\ \ \text {if}~q\equiv 3({{\mathrm{mod}\,}}4). \end{aligned} \right. } \end{aligned}$$

Write \(\mathbb {F}_{q}=\{a_{0}, a_{1}, \ldots , a_{q-1}\}\) , where \(a_{0}=0\), then Paley matrix \(S_{q}\) of order q is defined as

$$\begin{aligned} S_{q}=(\chi (a_{i}-a_{j}))_{q\times q}= \left[ \begin{array}{cccccc} 0 &{} \chi (-a_{1}) &{}\chi (-a_{2}) &{}\cdots &{} \chi (-a_{q-1})\\ \chi (a_{1}) &{}0&{}\chi (a_{1}-a_{2}) &{}\cdots &{} \chi (a_{1}-a_{q-1})\\ \chi (a_{2}) &{}\chi (a_{2}-a_{1}) &{}0&{}\cdots &{} \chi (a_{2}-a_{q-1})\\ \cdots &{}\cdots &{}\cdots &{}\cdots &{}\cdots \\ \chi (a_{q-1}) &{}\chi (a_{q-1}-a_{1}) &{}\cdots &{}\cdots &{} 0\\ \end{array} \right] . \end{aligned}$$

Lemma 2.17

The Paley matrix \(S_{q}\) of order q has the following properties:

  1. (i)

    \(S_{q}J_{q}=J_{q}S_{q}=0\).

  2. (ii)

    \(S_{q}S_{q}^{'}=qI_{q}-J_{q}\).

  3. (iii)

    \(S_{q}^{'}=(-1)^{\frac{q-1}{2}}S_{q}\).

Here, \(I_{q}\) is the unit matrix of order q and \(J_{q}\) is the square matrix of order q with all elements of 1.

Proof

Let \(S_{q}J_{q}=(b_{ij})_{q\times q}\); then for all \(0\le i \le q-1\), \(0\le j \le q-1\), we have

$$\begin{aligned} b_{ij}=\sum _{k=0}^{q-1}\chi (a_{i}-a_{k})=\sum _{c \in \mathbb {F}_{q}}\chi (c)=0. \end{aligned}$$

So (i) holds. To prove (ii), let \(S_{q}S_{q}'=(c_{ij})_{q\times q}\), then

$$\begin{aligned} c_{ij}=\sum _{k=0}^{q-1}\chi (a_{i}-a_{k})\chi (a_{j}-a_{k}). \end{aligned}$$

Obviously, we have

$$\begin{aligned} c_{ij}={\left\{ \begin{aligned}&q-1,\ \ \text {if}~ i=j;\\&-1,\ \ \ \ \text {if}~i\ne j. \end{aligned} \right. } \end{aligned}$$

So (ii) holds. To prove (iii), notice that \(\chi (-1)=(-1)^{\frac{q-1}{2}}\) and \(\chi (a_{j}-a_{i})=\chi (-1)\chi (a_{i}-a_{j})\), so

$$\begin{aligned} S_{q}=\chi (-1)S_{q}^{'}=(-1)^{\frac{q-1}{2}}S_{q}^{'}, \end{aligned}$$

the Lemma holds.
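The Paley matrix is easy to generate from the quadratic character. The following sketch (for the prime q = 5, so that \(\mathbb {F}_{q}=\mathbb {Z}_{5}\)) reproduces the matrix \(S_{5}\) displayed below and checks properties (ii) and (iii) of Lemma 2.17:

```python
# Paley matrix S_q from the quadratic character chi, for the prime q = 5.
import numpy as np

q = 5
squares = {(x * x) % q for x in range(1, q)}
chi = lambda a: 0 if a % q == 0 else (1 if a % q in squares else -1)

S = np.array([[chi(i - j) for j in range(q)] for i in range(q)])
assert np.array_equal(S @ S.T,
                      q * np.eye(q, dtype=int) - np.ones((q, q), dtype=int))
assert np.array_equal(S.T, (-1) ** ((q - 1) // 2) * S)  # S' = S since q = 1 mod 4
```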

Let \(q=5\) and consider the Paley matrix \(S_{5}\) of order 5; a direct calculation gives

$$\begin{aligned} S_{5}= \left[ \begin{array}{ccccc} 0 & 1 &-1 &-1 & 1\\ 1 &0&1 &-1& -1\\ -1 &1 &0&1& -1\\ -1&-1&1&0&1\\ 1 &-1 &-1 &1 & 0\\ \end{array}\right] . \end{aligned}$$
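For a prime q the construction of \(S_{q}\) is mechanical, since the quadratic character can be computed by Euler's criterion \(\chi (a)\equiv a^{(q-1)/2}~({{\mathrm{mod}\,}}q)\). The following Python sketch (the function names are ours, for illustration) rebuilds \(S_{5}\) and checks the three properties of Lemma 2.17:

```python
def chi(a, q):
    # Quadratic character on F_q (q an odd prime), by Euler's criterion.
    a %= q
    if a == 0:
        return 0
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1

def paley(q):
    # Paley matrix S_q = (chi(a_i - a_j)) with a_k = k for a prime q.
    return [[chi(i - j, q) for j in range(q)] for i in range(q)]

q = 5
S = paley(q)
print(S)  # matches the matrix S_5 displayed above

# (i) Every row and column sums to 0, hence S_q J_q = J_q S_q = 0.
assert all(sum(row) == 0 for row in S)
assert all(sum(S[i][j] for i in range(q)) == 0 for j in range(q))

# (ii) S_q S_q' = q I_q - J_q: diagonal entries q - 1, off-diagonal -1.
for i in range(q):
    for j in range(q):
        dot = sum(S[i][k] * S[j][k] for k in range(q))
        assert dot == (q - 1 if i == j else -1)

# (iii) S_q' = (-1)^((q-1)/2) S_q; here q = 5 ≡ 1 (mod 4), so S_5 is symmetric.
assert all(S[i][j] == S[j][i] for i in range(q) for j in range(q))
print("Lemma 2.17 verified for q =", q)
```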

In \(\mathbb {F}_{3}^{11}\), we consider a linear code C whose generator matrix G consists of the first 11 columns of the \(6\times 12\) matrix

$$\begin{aligned} \left[ \,I_{6}\;\Bigg |\; \begin{array}{cc} 0 & \mathbf {1} \\ -\mathbf {1}' & S_{5} \end{array}\,\right] ,\qquad \mathbf {1}=(1,1,1,1,1), \end{aligned}$$

where \(I_{6}\) is the unit matrix of order 6 and \(\mathbf {1}'\) is the transpose of \(\mathbf {1}\). So C is a six-dimensional linear subspace in \(\mathbb {F}_{3}^{11}\), that is \(C=[11,6]\). This code is called the 3-ary Golay code. In order to further discuss the 3-ary Golay code [11, 6], we introduce the concept of the extended code of a linear code.

If \(C\subset \mathbb {F}_{q}^{n}\) is a q-ary linear code of length n, the extended code \(\overline{C}\) of C is defined as

$$\begin{aligned} \overline{C}=\{(c_{1},c_{2},\ldots ,c_{n+1})|(c_{1},c_{2},\ldots ,c_{n}) \in C, ~\text {and} \sum _{i=1}^{n+1}c_{i}=0\}. \end{aligned}$$

Obviously, \(\overline{C}\subset \mathbb {F}_{q}^{n+1}\) is a linear code.

Lemma 2.18

If \(C\subset \mathbb {F}_{q}^{n}\) is a linear code with generator matrix G and check matrix H, then the extended code \(\overline{C}\subset \mathbb {F}_{q}^{n+1}\) has length \(n+1\), and its generator matrix \(\overline{G}\) and check matrix \(\overline{H}\) are

$$\begin{aligned} \overline{G}=[\,G\mid \beta \,],\qquad \overline{H}=\left[ \begin{array}{cc} H & \mathbf {0}' \\ \mathbf {1} & 1 \end{array}\right] , \end{aligned}$$

respectively, where \(\mathbf {1}=(1,1,\ldots ,1)\) is the all-ones row of length n, \(\mathbf {0}'\) is the zero column, and \(\beta \) is the column vector such that the sum of \(\beta \) and all the column vectors of G is 0. Further, let \(q=2\): if the minimum distance d of C is odd, then the minimum distance of \(\overline{C}\) is \(d+1\).

Proof

The generator matrix and check matrix of \(\overline{C}\) follow directly from the definition. For the last claim, let \(c=c_{1}c_{2}\ldots c_{n} \in C\) be a codeword of minimal weight \(w(c)=d\). Because \(q=2\), exactly d of the \(c_{i}\) are equal to 1; since d is odd, the extended coordinate is \(c_{n+1}=1\), so

$$\begin{aligned} c^{*}=c_{1}c_{2}\ldots c_{n+1} \in \overline{C} ~\text {and} ~w(c^{*})=d+1. \end{aligned}$$

Every codeword of \(\overline{C}\) has even weight (its coordinates sum to 0 in \(\mathbb {F}_{2}\)), and extension never decreases weight, so \(d+1\) is the minimal weight in \(\overline{C}\). The lemma is proved.

Consider the extended code \(\overline{C}=[12,6]\) of the 3-ary Golay code \(C=[11,6]\); its generator matrix is

$$\begin{aligned} \overline{G}=\left[ \,I_{6}\;\Bigg |\; \begin{array}{cc} 0 & \mathbf {1} \\ -\mathbf {1}' & S_{5} \end{array}\,\right] . \end{aligned}$$
(2.25)

Note that the sum of the components of each row vector of \(S_{5}\) is 0, the inner product of two different row vectors is \(-1\), and the inner product of a row vector with itself is \(4\equiv 1({{\mathrm{mod}\,}}3)\), so

$$\begin{aligned} \overline{G}\cdot \overline{G}^{'}=0. \end{aligned}$$

Therefore, the extended code \(\overline{C}\) is a self-dual code, that is \((\overline{C})^{\bot }=\overline{C}\).
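The self-duality and the weight distribution of \(\overline{C}\) can be confirmed by enumerating all \(3^{6}=729\) codewords. The Python sketch below assumes the bordered-Paley form of \(\overline{G}\) in (2.25); under that assumption it reports the weight set \(\{0,6,9,12\}\), anticipating the minimum distance 6 used in the next theorem:

```python
from itertools import product

def chi(a, q=5):
    # Quadratic character on F_5 (Euler's criterion).
    a %= q
    if a == 0:
        return 0
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1

S5 = [[chi(i - j) for j in range(5)] for i in range(5)]

# Bordered Paley block [[0, 1], [-1', S5]] as in (2.25), assumed form.
A = [[0] + [1] * 5] + [[-1] + S5[i] for i in range(5)]
# Generator of the extended ternary Golay code: G_bar = [ I_6 | A ] over F_3.
G = [[1 if r == c else 0 for c in range(6)] + [a % 3 for a in A[r]]
     for r in range(6)]

def codeword(coeffs):
    # Linear combination of the rows of G with the given F_3 coefficients.
    return [sum(c * g for c, g in zip(coeffs, col)) % 3 for col in zip(*G)]

weights = {sum(1 for x in codeword(c) if x != 0)
           for c in product(range(3), repeat=6)}
print(sorted(weights))  # expected: [0, 6, 9, 12], so minimum distance 6

# Self-duality: all rows of G_bar are pairwise orthogonal mod 3.
assert all(sum(G[i][k] * G[j][k] for k in range(12)) % 3 == 0
           for i in range(6) for j in range(6))
```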

Theorem 2.8

The 3-ary Golay code C is a perfect linear code [11, 6] with minimum distance 5, so it is a 2-error-correcting code.

Proof

The weight of each row vector of \(\overline{G}\) is 6, and a direct calculation shows that every nonzero linear combination of the row vectors of \(\overline{G}\) has weight at least 6. So the minimum distance of the extended code \(\overline{C}\) is \(6\Rightarrow \) the minimum distance of C is 5. So the disjoint radius of C is \(\rho _{1}=2\). And because

$$\begin{aligned} |C|=3^{6}, ~\sum _{i=0}^{2} \left( {\begin{array}{c}11\\ i\end{array}}\right) 2^{i}=3^{5}, \end{aligned}$$

the sphere-packing condition is satisfied:

$$\begin{aligned} |C|\sum _{i=0}^{2} \left( {\begin{array}{c}11\\ i\end{array}}\right) 2^{i}=3^{11}. \end{aligned}$$

Thus by Theorem 2.1, C is a perfect code, the Theorem holds.
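The arithmetic of this sphere-packing condition can again be checked directly; a short Python sketch:

```python
from math import comb

# Ternary Golay code C = [11, 6], e = 2: ball volume and packing identity.
ball = sum(comb(11, i) * 2**i for i in range(3))
print(ball, ball == 3**5)      # 243, True
print(3**6 * ball == 3**11)    # True
```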

Remark 2.1

It is worth noting that J. H. van Lint in 1971 (see [24] of Chap. 2) and A. Tietäväinen in 1973 (see [43] of Chap. 2) independently proved that the only nontrivial perfect codes with minimal distance greater than 3 over any finite field are the 2-ary Golay code \(G_{23}\) and the 3-ary Golay code.

2.4.4 Reed–Muller Codes

In 1954, Reed and Muller proposed a class of 2-ary linear codes based on finite geometry. In order to discuss the structure and properties of these codes, we first prove some results in number theory.

Lemma 2.19

Let p be a prime and let k, n be two nonnegative integers whose p-ary expansions are

$$\begin{aligned} n=\sum _{i=0}^{l}n_{i}p^{i}, ~k=\sum _{i=0}^{l}k_{i}p^{i}. \end{aligned}$$

Then

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) \equiv \prod \limits _{i=0}^{l}\left( {\begin{array}{c}n_{i}\\ k_{i}\end{array}}\right) ({{\mathrm{mod}\,}}{p}), ~\text {where} \left( {\begin{array}{c}n_{i}\\ k_{i}\end{array}}\right) =0,~\text {if}~k_{i}>n_{i}. \end{aligned}$$

Proof

If \(k=0\), then all \(k_{i}=0\), so the above formula holds. If \(n=k\), then \(n_{i}=k_{i}\), and the formula also holds. So we may assume \(1\le k <n\). Note the polynomial congruence

$$\begin{aligned} (1+x)^{p}\equiv 1+x^{p}( {{\mathrm{mod}\,}}{p}), \end{aligned}$$

so we have

$$\begin{aligned} {\begin{matrix} (1+x)^{n}&{}=(1+x)^{\sum _{i=0}^{l}n_{i}p^{i}}\\ &{}\equiv \prod \limits _{i=0}^{l}(1+x^{{p}^{i}})^{n_{i}}({{\mathrm{mod}\,}}{p}), \end{matrix}} \end{aligned}$$

Comparing the coefficients of the \(x^{k}\) terms on both sides: if some \(k_{j}>n_{j}\), then the term \(x^{k}\) does not appear on the right side, which means that the coefficient of \(x^{k}\) on the left side satisfies

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) \equiv 0 ({{\mathrm{mod}\,}}{p}). \end{aligned}$$

If \(k_{i}\le n_{i}\) , \(\forall ~ 0 \le i \le l\), then

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) \equiv \prod \limits _{i=0}^{l}\left( {\begin{array}{c}n_{i}\\ k_{i}\end{array}}\right) ({{{\mathrm{mod}\,}}{p}}). \end{aligned}$$

We complete the proof of Lemma.
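Lemma 2.19 (Lucas' congruence) translates into a digit-by-digit computation. A short Python sketch with a brute-force cross-check:

```python
from math import comb

def binom_mod_p(n, k, p):
    # C(n, k) mod p from the base-p digits of n and k (Lemma 2.19).
    result = 1
    while n or k:
        n, ni = divmod(n, p)
        k, ki = divmod(k, p)
        result = result * comb(ni, ki) % p   # comb(ni, ki) = 0 when ki > ni
    return result

p = 3
for n in range(60):
    for k in range(n + 1):
        assert binom_mod_p(n, k, p) == comb(n, k) % p
print("Lucas' congruence verified for p = 3, n < 60")
```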

Massey first defined the concept of polynomial weight in 1973. Over a finite field of characteristic 2 \((q=2^{r})\), the Hamming weight of a polynomial \(f(x)\in \mathbb {F}_{q}[x]\) is defined as

$$w(f(x))=\text {the number of nonzero coefficients of}~f(x).$$

Lemma 2.20

(Massey, 1973) Let \(f(x)=\sum _{i=0}^{l} b_{i}(x+c)^{i}\in \mathbb {F}_{q}[x]\) with \(b_{l}\ne 0\), and let \(i_{0}\) be the smallest subscript i with \(b_{i} \ne 0\); then

$$\begin{aligned} w(f(x))\ge w((x+c)^{i_{0}}). \end{aligned}$$

Proof

If \(l=0\), then \(i_{0}=0\) and the lemma holds trivially. Suppose the lemma holds for all \(l<2^{n}\); we consider \(2^{n} \le l<2^{n+1}\) and write f(x) as

$$\begin{aligned} \begin{aligned} f(x)&=\sum _{i=0}^{2^{n}-1}b_{i}(x+c)^{i}+\sum _{i=2^{n}}^{l}b_{i}(x+c)^{i}\\&=f_{1}(x)+(x+c)^{2^{n}}f_{2}(x)\\&=f_{1}(x)+c^{2^{n}}f_{2}(x)+x^{2^{n}}f_{2}(x), \end{aligned} \end{aligned}$$

where \(\deg f_{1}(x)<2^{n}, \deg f_{2}(x)<2^{n}\). There are two situations to discuss:

  1. (i)

    If \(f_{1}(x)=0\), then \(w(f(x))=2w(f_{2}(x))\). Because \(i_{0}\ge 2^{n}\), we have

    $$\begin{aligned} \begin{aligned} w((x+c)^{i_{0}})&=w((x^{2^{n}}+c^{2^{n}})(x+c)^{i_{0}-2^{n}})\\&=2w((x+c)^{i_{0}-2^{n}}). \end{aligned} \end{aligned}$$

    From the inductive hypothesis,

    $$\begin{aligned} w(f_{2}(x))\ge w((x+c)^{i_{0}-2^{n}}). \end{aligned}$$

    So we have

    $$\begin{aligned} w(f(x))=2w(f_{2}(x))\ge 2w((x+c)^{i_{0}-2^{n}})=w((x+c)^{i_{0}}). \end{aligned}$$
  2. (ii)

    If \(f_{1}(x)\ne 0\), let \(i_{1}\) be the smallest subscript of a nonzero coefficient of \(f_{1}(x)\) and \(i_{2}\) that of \(f_{2}(x)\) (both in the basis \(\{(x+c)^{i}\}\)). If a nonzero term of \(f_{1}(x)\) is cancelled by the corresponding term of \(c^{2^{n}}f_{2}(x)\), then \(x^{2^{n}}f_{2}(x)\) contributes a nonzero term in the corresponding position, so we always have

    $$\begin{aligned} w(f(x))\ge w(f_{1}(x)), ~w(f(x))\ge w(f_{2}(x)). \end{aligned}$$

    If \(i_{1}<i_{2}\), then \(i_{0}=i_{1}\); from the inductive hypothesis,

    $$\begin{aligned} w(f(x))\ge w(f_{1}(x))\ge w((x+c)^{i_{1}})=w((x+c)^{i_{0}}). \end{aligned}$$

    Similarly, if \(i_{2}<i_{1}\), then \(i_{0}=i_{2}\), there is

    $$\begin{aligned} w(f(x))\ge w(f_{2}(x))\ge w((x+c)^{i_{2}})=w((x+c)^{i_{0}}). \end{aligned}$$

    If \(i_{1}=i_{2}\), then \(i_{0}=i_{1}=i_{2}\) and either of the two estimates above applies. So in every case the Lemma holds.
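Lemma 2.20 can be checked exhaustively for small degrees over \(\mathbb {F}_{2}\) (taking \(c=1\)); a Python sketch:

```python
from math import comb
from itertools import product

L = 8  # polynomials f with all exponents i < L, so deg f < L

def x_plus_1_pow(i):
    # Coefficient vector of (x+1)^i over F_2: C(i, k) mod 2 for k = 0..L-1.
    return [comb(i, k) % 2 for k in range(L)]

# For every f(x) = sum b_i (x+1)^i with b != 0, check
# w(f) >= w((x+1)^{i0}), i0 the smallest index with b_i != 0.
for b in product(range(2), repeat=L):
    if not any(b):
        continue
    i0 = min(i for i, bi in enumerate(b) if bi)
    f = [0] * L
    for i, bi in enumerate(b):
        if bi:
            f = [(u + v) % 2 for u, v in zip(f, x_plus_1_pow(i))]
    assert sum(f) >= sum(x_plus_1_pow(i0))
print("Lemma 2.20 verified over F_2 for all f with deg f < 8")
```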

Next, we use Massey’s method to construct Reed–Muller codes. Let \(m\ge 1\) and let \(\mathbb {F}^{m}_{2}\) be an m-dimensional affine space, denoted AG(m, 2). A point \(\alpha \in AG(m,2)\) is written as an m-dimensional column vector; let \(\{u_{0}, u_{1}, \ldots , u_{m-1}\}\) be the standard basis of \(\mathbb {F}^{m}_{2}\), that is

$$\begin{aligned} \alpha = \left[ \begin{array}{cccccc} a_{0}\\ a_{1}\\ \vdots \\ a_{m-1} \end{array}\right] , u_{0}= \left[ \begin{array}{cccccc} 1\\ 0\\ 0\\ \vdots \\ 0 \end{array}\right] , \ldots , u_{m-1}= \left[ \begin{array}{cccccc} 0\\ 0\\ 0\\ \vdots \\ 1 \end{array}\right] , \end{aligned}$$

where \(a_{i}=0\) or 1. Let us establish a \(1-1\) correspondence between the integers \(\{j\mid 0\le j<2^{m}\}\) and the points of AG(m, 2). Let \(0\le j<2^{m}\), then

$$\begin{aligned} j=\sum _{i=0}^{m-1}a_{ij}2^{i}, a_{ij}\in \mathbb {F}_{2}. \end{aligned}$$

We define

$$\begin{aligned} x_{j}=\sum _{i=0}^{m-1}a_{ij}u_{i}= \left[ \begin{array}{cccccc} a_{0j}\\ a_{1j}\\ \vdots \\ a_{(m-1)j} \end{array}\right] \in \mathbb {F}_{2}^{m}, \end{aligned}$$

Because \(x_{j_{1}}\ne x_{j_{2}}\) whenever \(j_{1}\ne j_{2}\), the set \(\{x_{j}\mid 0\le j<2^{m}\}\) gives all the points in \(\mathbb {F}_{2}^{m}\). Write \(n=2^{m}\) and consider the matrix

$$\begin{aligned} E=[x_{0}, x_{1}, \ldots , x_{n-1}]= \left[ \begin{array}{cccc} a_{00} & a_{01} & \cdots & a_{0(n-1)}\\ a_{10} & a_{11} & \cdots & a_{1(n-1)}\\ \vdots & \vdots & & \vdots \\ a_{(m-1)0} & a_{(m-1)1} & \cdots & a_{(m-1)(n-1)}\\ \end{array}\right] _{m \times n}. \end{aligned}$$

Each row vector \(\alpha _{i}=(a_{i0},a_{i1},\ldots , a_{i(n-1)})~(0\le i \le m-1)\) of E is a vector in \(\mathbb {F}_{2}^{n}\), so E can be written as

$$\begin{aligned} E= \left[ \begin{array}{cccccc} \alpha _{0}\\ \alpha _{1}\\ \vdots \\ \alpha _{m-1} \end{array}\right] =(a_{ij})_{m\times n}(0\le i<m,0\le j< 2^{m}=n). \end{aligned}$$

For each i, \(0\le i < m\), define a linear subspace in \(\mathbb {F}_{2}^{m}\),

$$\begin{aligned} B_{i}=\{x_{j} \in \mathbb {F}_{2}^{m}|a_{ij}=0\}. \end{aligned}$$

Obviously, \(B_{i}\) is a linear subspace, and each additive coset of \(B_{i}\) is called an \((m-1)\)-dimensional flat in \(\mathbb {F}_{2}^{m}\). We consider \(A_{i}=B_{i}+u_{i}\),

$$\begin{aligned} A_{i}=\{x_{j} \in \mathbb {F}_{2}^{m}|a_{ij}=1,0\le j<n \} \Rightarrow |A_{i}|=2^{m-1}. \end{aligned}$$

We define the characteristic function \(\chi _{i}(\alpha )\) in \(\mathbb {F}_{2}^{m}\) according to \(A_{i}\),

$$\begin{aligned} \chi _{i}(\alpha )={\left\{ \begin{aligned}&1,\ \ \text {if}~ \alpha \in A_{i};\\&0,\ \ \text {if}~ \alpha \notin A_{i}. \end{aligned} \right. } \end{aligned}$$

where \(\alpha \in \mathbb {F}_{2}^{m}\). So each row vector \(\alpha _{i}(0\le i<m)\) in E can be expressed as

$$\begin{aligned} \alpha _{i}=(\chi _{i}{(x_{0})},\chi _{i}{(x_{1})}, \ldots , \chi _{i}{(x_{n-1})}). \end{aligned}$$

For any two vectors \(\alpha =(b_{0},b_{1},\ldots , b_{n-1}), \beta =(c_{0},c_{1},\ldots , c_{n-1})\) in \(\mathbb {F}_{2}^{n}\), define the product vector

$$\begin{aligned} \alpha \beta =(b_{0}c_{0},b_{1}c_{1},\ldots , b_{n-1}c_{n-1}) \in \mathbb {F}_{2}^{n}. \end{aligned}$$

So for \(0\le i_{1},i_{2}<m\), we have the product of row vectors of E

$$\begin{aligned} \alpha _{i_{1}}\alpha _{i_{2}}=(\chi _{i_{1}}(x_{0})\chi _{i_{2}}(x_{0}),\chi _{i_{1}}(x_{1})\chi _{i_{2}}(x_{1}),\ldots ,\chi _{i_{1}}(x_{n-1})\chi _{i_{2}}(x_{n-1})). \end{aligned}$$

So the j-th \((0\le j <2^{m})\) component of \(\alpha _{i_{1}}\alpha _{i_{2}}\) is

$$\begin{aligned} \chi _{i_{1}}(x_{j})\chi _{i_{2}}(x_{j})={\left\{ \begin{aligned}&1,\ \ \text {if}~ x_{j} \in A_{i_{1}}\cap A_{i_{2}};\\&0,\ \ \text {if}~ x_{j} \notin A_{i_{1}}\cap A_{i_{2}}. \end{aligned} \right. } \end{aligned}$$

From the definition of \(A_{i}\), obviously,

$$\begin{aligned} |A_{i_{1}}\cap A_{i_{2}}|=2^{m-2}. \end{aligned}$$

Lemma 2.21

Let \(i_{1},i_{2},\ldots , i_{s}\) be \(s~(0\le s<m)\) distinct indexes between 0 and \(m-1\), then

$$\begin{aligned} |A_{i_{1}}\cap A_{i_{2}}\cap \cdots \cap A_{i_{s}}|=2^{m-s}, \end{aligned}$$

and \(\alpha _{i_{1}} \alpha _{i_{2}} \cdots \alpha _{i_{s}} \in \mathbb {F}_{2}^{n}\) has weight

$$\begin{aligned} w(\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}})=2^{m-s}. \end{aligned}$$

Proof

The first conclusion is obvious. For the second conclusion, the vector

$$\begin{aligned} \alpha =\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}=(\chi _{i_{1}}{(x_{0})}\cdots \chi _{i_{s}}{(x_{0})}, \chi _{i_{1}}{(x_{1})}\cdots \chi _{i_{s}}{(x_{1})},\ldots , \chi _{i_{1}}{(x_{n-1})}\cdots \chi _{i_{s}}{(x_{n-1})}) \end{aligned}$$

has exactly \(2^{m-s}\) indices j with \(x_{j}\in A_{i_{1}}\cap A_{i_{2}} \cap \cdots \cap A_{i_{s}}\), so there are \(2^{m-s}\) components in \(\alpha \) equal to 1 and the others are 0, so

$$\begin{aligned} w(\alpha )=w(\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}})=2^{m-s}, \end{aligned}$$

the Lemma holds.

For \(0 \le l<2^{m}\), the indicator set I(l) is defined as

$$\begin{aligned} I(l)=\{\,i \mid a_{il}=0\,\},\qquad \text {where} ~l=\sum _{i=0}^{m-1}a_{il}2^{i},~a_{il}\in \mathbb {F}_{2}. \end{aligned}$$

The following properties of the indicator set I(l) are obvious:

  1. (i)

    If \(l_{1}\ne l_{2}\), then \(I(l_{1})\ne I(l_{2}).\)

  2. (ii)

     \(\bigcup _{0\le l< n} {I(l)}=\{0,1,2,\ldots , m-1\}.\)

  3. (iii)

      If \(l=n-1\), then \(I(n-1)\) is the empty set.

The above properties are easy to verify; for (iii), because \(l=n-1=2^{m}-1=1+2+\cdots +2^{m-1}\), no subscript i with \(a_{il}=0\) exists, that is \(I(n-1)=\varnothing \).

Sometimes we write the indicator set as \(I(l)=\{i_{1},i_{2},\ldots , i_{s}\}_{l}\).

Lemma 2.22

Let \(0\le l<n=2^{m}\) and \(I(l)=\{i_{1}, i_{2},\ldots , i_{s}\}\), and suppose

$$\begin{aligned} \alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}=(b_{l0},b_{l1},\ldots , b_{l(n-1)})\in \mathbb {F}_{2}^{n}, \end{aligned}$$

then in the ring \(\mathbb {F}_{2}[x]\), there is

$$\begin{aligned} (1+x)^{l}=\sum _{j=0}^{n-1}{b_{lj}x^{n-1-j}}. \end{aligned}$$
(2.26)

Proof

For \(0\le j<n\), write \(j=\sum _{i=0}^{m-1}a_{ij}2^{i}\), then

$$\begin{aligned} n-1-j=\sum _{i=0}^{m-1}c_{ij}2^{i}, \text {where}~c_{ij}=1-a_{ij}. \end{aligned}$$

By Lemma 2.19,

$$\begin{aligned} \left( {\begin{array}{c}l\\ n-1-j\end{array}}\right) \equiv \prod \limits _{i=0}^{m-1} \left( {\begin{array}{c}a_{il}\\ c_{ij}\end{array}}\right) ({{\mathrm{mod}\,}}{2}). \end{aligned}$$

If

$$\begin{aligned} \left( {\begin{array}{c}l\\ n-1-j\end{array}}\right) \equiv 1 ({{\mathrm{mod}\,}}{2}), \end{aligned}$$

then \(a_{il}=0 \Rightarrow c_{ij}=0 \Rightarrow a_{ij}=1\), that is to say

$$\begin{aligned} \left( {\begin{array}{c}l\\ n-1-j\end{array}}\right) \equiv 1({{\mathrm{mod}\,}}{2})\Leftrightarrow a_{ij}=1, ~\text {for}~\forall ~ i \in I(l). \end{aligned}$$

On the other hand, from Lemma 2.21,

$$\begin{aligned} b_{lj}=1\Leftrightarrow x_{j}\in A_{i_{1}}\bigcap A_{i_{2}}\bigcap \cdots \bigcap A_{i_{s}}\Leftrightarrow a_{ij}=1, \text {when}~ i \in I(l). \end{aligned}$$

Comparing the coefficients of \(x^{n-1-j}\) on both sides, the two equivalences show that the coefficient of \(x^{n-1-j}\) in \((1+x)^{l}\) is exactly \(b_{lj}\), so we have

$$\begin{aligned} (1+x)^{l}=\sum _{j=0}^{n-1}b_{lj}x^{n-1-j}. \end{aligned}$$

The Lemma holds.
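The correspondence of Lemma 2.22 between the vectors \(N_{l}\) and the polynomials \((1+x)^{l}\) can be verified exhaustively for small m; a Python sketch for \(m=3\):

```python
from math import comb

m = 3
n = 2 ** m  # = 8

# Row alpha_i of E: the i-th binary digit of j, for j = 0..n-1.
E = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

def N(l):
    # N_l = product of the rows alpha_i over I(l) = {i : digit_i(l) = 0};
    # the empty product (l = n-1) is the all-ones vector e.
    v = [1] * n
    for i in range(m):
        if not (l >> i) & 1:
            v = [a * b for a, b in zip(v, E[i])]
    return v

for l in range(n):
    b = N(l)
    # Lemma 2.22: (1+x)^l = sum_j b_{lj} x^{n-1-j} over F_2,
    # i.e. b_{lj} = C(l, n-1-j) mod 2.
    assert b == [comb(l, n - 1 - j) % 2 for j in range(n)]
print("Lemma 2.22 verified for m =", m)
```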

For any \(0\le l<n=2^{m}\), with index set \(I(l)=\{i_{1},i_{2},\ldots , i_{s}\}\), we define the vector in \(\mathbb {F}_{2}^{n}\)

$$\begin{aligned} N_{l}=\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}. \end{aligned}$$

Different values of l give different index sets I(l) and hence different vectors \(N_{l}\); since the index set corresponding to \(l=n-1\) is empty, the corresponding vector \(N_{n-1}\) is defined as

$$\begin{aligned} N_{n-1}=(1,1,\ldots , 1)=e. \end{aligned}$$

Let \(e_{0}=(1,0,\ldots , 0),\ldots , e_{n-1}=(0,0, \ldots , 1)\) be a set of standard bases of \(\mathbb {F}_{2}^{n}\).

Lemma 2.23

For \(0\le j<n\), we have

$$\begin{aligned} e_{j}=\prod \limits _{i=0}^{m-1}(\alpha _{i}+(1+a_{ij})e), \end{aligned}$$

where \(\alpha _{i}\) is the i-th row of matrix E.

Proof

For a vector \(\alpha \) in \(\mathbb {F}_{2}^{n}\), its complement vector \(\overline{\alpha }\) is obtained by replacing each component 1 of \(\alpha \) with 0 and each component 0 with 1. So we have

$$\begin{aligned} \alpha +\overline{\alpha }=e=(1,1,\ldots , 1), \forall ~ \alpha \in \mathbb {F}_{2}^{n}. \end{aligned}$$

When \(0\le j<n\) is given, we define the j-th complement of row vector \(\alpha _{i} (0 \le i<m)\) of matrix E as

$$\begin{aligned} \overline{\alpha _{i}{(j)}}={\left\{ \begin{aligned}&\alpha _{i},\ \ \text {if}~ a_{ij}=1;\\&\overline{\alpha _{i}},\ \ \text {if}~ a_{ij}=0. \end{aligned} \right. } \end{aligned}$$

Obviously, there is

$$\begin{aligned} \alpha _{i}+(1+a_{ij})e=\overline{\alpha _{i}(j)}, \end{aligned}$$

from the definition of index set I(l), we have

$$\begin{aligned} \overline{\alpha _{i}(j)}={\left\{ \begin{aligned}&\alpha _{i},\ \ \ \ \ \ \text {if}~i\not \in I(j);\\&e-\alpha _{i},\ \ \text {if}~ i\in I(j). \end{aligned} \right. } \end{aligned}$$

Now let’s calculate

$$\begin{aligned} \begin{aligned} \prod \limits _{i=0}^{m-1}(\alpha _{i}+(1+a_{ij})e)&=\prod \limits _{i=0}^{m-1}\overline{\alpha _{i}(j)}\\&=\prod \limits _{i\in I(j)}(e-\alpha _{i})\prod \limits _{i\not \in I(j)}{\alpha _{i}}=b. \end{aligned} \end{aligned}$$

where \(b\in \mathbb {F}_{2}^{n}\), let \(b=(b_{0},b_{1},\ldots , b_{n-1})\). Obviously, \(b_{j}=1\). If \(k\ne j\), then

$$\begin{aligned} b_{k}=\prod _{i \notin I(j)}{a_{ik}} \cdot \prod _{i \in I(j)}{(1-a_{ik})}=0. \end{aligned}$$

Thus \(b=e_{j}\). We have completed the proof of Lemma.

Lemma 2.24

\(\{N_{l}\}_{0\le l<n}\) constitutes a group of bases of \(\mathbb {F}_{2}^{n}\), where \(N_{n-1}=e=(1,1,\ldots ,1)\).

Proof

\(\{N_{l}\}_{0\le l<n}\) consists of exactly n different vectors; let us prove that they are linearly independent. Write

$$N_{l}=\alpha _{i_{1}} \alpha _{i_{2}} \cdots \alpha _{i_{s}}=(b_{l0},b_{l1},\ldots , b_{l(n-1)}),$$
$$\sum _{l=0}^{n-1}c_{l}N_{l}=(\sum _{l=0}^{n-1}c_{l}b_{l0},\sum _{l=0}^{n-1}c_{l}b_{l1},\ldots ,\sum _{l=0}^{n-1}c_{l}b_{l(n-1)})$$

for a linear combination with coefficient vector \(c=(c_{0},c_{1},\ldots ,c_{n-1})\ne 0\), and consider the polynomial

$$\begin{aligned} f(x)=\sum _{l=0}^{n-1}c_{l}(1+x)^{l} \in \mathbb {F}_{2}{[x]}. \end{aligned}$$

Since the polynomials \((1+x)^{l}~(0\le l<n)\) have distinct degrees, \(f(x)\ne 0\).

By Lemma 2.22, we have

$$\begin{aligned} f(x)=\sum _{j=0}^{n-1}(\sum _{l=0}^{n-1}c_{l}b_{lj})x^{n-1-j}. \end{aligned}$$

So some component \(\sum _{l=0}^{n-1}c_{l}b_{lj}\ne 0\), that is \(\sum _{l=0}^{n-1}c_{l}N_{l}\ne 0\). Hence \(\{N_{l}\}_{0\le l<n}\) is linearly independent and forms a group of bases. The Lemma holds.

Definition 2.12

Let \(0\le r<m\). The Reed–Muller code R(r, m) of order r is defined as the linear code

$$\begin{aligned} R(r,m)=L(\{\alpha _{i_{1}} \alpha _{i_{2}} \ldots \alpha _{i_{s}}|0\le s \le r\}) \subset \mathbb {F}_{2}^{n}, \end{aligned}$$

where \(L(\cdot )\) denotes the linear span and the vector corresponding to \(s=0\) is e.

Obviously, when \(r=0\), R(0, m) is the repetition code in \(\mathbb {F}_{2}^{n}\):

$$\begin{aligned} R(0,m)=\{(0,0,\ldots ,0),(1,1,\ldots ,1)\}. \end{aligned}$$

For general r, \(0\le r<m\), R(r, m) is a t-dimensional linear subspace in \(\mathbb {F}_{2}^{n}\), where

$$\begin{aligned} t=\sum _{s=0}^{r} \left( {\begin{array}{c}m\\ s\end{array}}\right) . \end{aligned}$$
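Definition 2.12 translates directly into a generator-matrix construction. The following Python sketch builds the spanning vectors of R(r, m) and, for \(r=1\), \(m=3\), confirms the dimension \(t=1+3=4\) and (anticipating Theorem 2.9) the minimum distance \(2^{m-r}=4\):

```python
from itertools import combinations, product

def rm_rows(r, m):
    # Spanning rows of R(r, m): all products alpha_{i1}...alpha_{is},
    # 0 <= s <= r, where alpha_i[j] = i-th binary digit of j (s = 0 gives e).
    n = 2 ** m
    alpha = [[(j >> i) & 1 for j in range(n)] for i in range(m)]
    rows = []
    for s in range(r + 1):
        for idx in combinations(range(m), s):
            v = [1] * n
            for i in idx:
                v = [a * b for a, b in zip(v, alpha[i])]
            rows.append(v)
    return rows

r, m = 1, 3
G = rm_rows(r, m)
print(len(G))   # dimension t = C(3,0) + C(3,1) = 4

# Brute-force the minimum distance: weight of every nonzero combination.
dmin = min(sum(sum(c * g for c, g in zip(coeffs, col)) % 2 for col in zip(*G))
           for coeffs in product(range(2), repeat=len(G)) if any(coeffs))
print(dmin)     # 4, so R(1, 3) is the [8, 4, 4] code
```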

Lemma 2.25

The dual code of the Reed–Muller code R(r, m) of order r is \(R(m-r-1,m)\).

Proof

The dimensions of R(r, m) and \(R(m-r-1,m)\) are

$$dim(R(r,m))=\sum _{s=0}^{r} \left( {\begin{array}{c}m\\ s\end{array}}\right) $$

and

$$dim(R(m-r-1,m))=\sum _{s=0}^{m-r-1} \left( {\begin{array}{c}m\\ s\end{array}}\right) .$$

Because

$$\begin{aligned} \begin{aligned}&\sum _{s=0}^{r} \left( {\begin{array}{c}m\\ s\end{array}}\right) +\sum _{s=0}^{m-r-1} \left( {\begin{array}{c}m\\ m-s\end{array}}\right) \\&=\sum _{s=0}^{r} \left( {\begin{array}{c}m\\ s\end{array}}\right) +\sum _{s=r+1}^{m} \left( {\begin{array}{c}m\\ s\end{array}}\right) \\&=\sum _{s=0}^{m} \left( {\begin{array}{c}m\\ s\end{array}}\right) =(1+1)^{m}\\&=2^{m}=n. \end{aligned} \end{aligned}$$

That is

$$\begin{aligned} dim(R(r,m))+dim(R(m-r-1,m))=n. \end{aligned}$$

Let \(\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}\) and \(\alpha _{j_{1}}\alpha _{j_{2}} \cdots \alpha _{j_{t}}\) be basis vectors of R(r, m) and \(R(m-r-1,m)\), respectively, and let

$$\begin{aligned} \alpha =\alpha _{i_{1}} \alpha _{i_{2}} \cdots \alpha _{i_{s}},~\beta =\alpha _{j_{1}} \alpha _{j_{2}}\cdots \alpha _{j_{t}}, \end{aligned}$$

by Lemma 2.21,

$$\begin{aligned} w(\alpha )=2^{m-s}, w(\beta )=2^{m-t},~ s\le r<m, ~t \le m-r-1, \end{aligned}$$

Because \(s+t<m\), the product \(\alpha \beta =\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}\cdot \alpha _{j_{1}}\alpha _{j_{2}}\cdots \alpha _{j_{t}}\) has weight

$$\begin{aligned} w(\alpha \beta )=w(\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}\cdot \alpha _{j_{1}}\alpha _{j_{2}}\cdots \alpha _{j_{t}})=2^{m-(s+t)}, \end{aligned}$$

which is even; since \(\langle \alpha , \beta \rangle \equiv w(\alpha \beta )({{\mathrm{mod}\,}}2)\), we get

$$\begin{aligned} \langle \alpha , \beta \rangle =0, \end{aligned}$$

Since the two dimensions add up to n, the dual code of R(r, m) is exactly \(R(m-r-1,m)\). The Lemma holds.

Theorem 2.9

The Reed–Muller code R(r, m) of order r has minimal distance \(d=2^{m-r}\); specially, when \(r=m-2\), \(R(m-2,m)\) is a linear code \([n,n-m-1]\).

Proof

From Lemma 2.21, we have

$$\begin{aligned} w(\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}})=2^{m-s}, \end{aligned}$$

so the minimum distance of R(r, m) satisfies \(d\le 2^{m-r}\). On the other hand, let \(I_{1}(r)\) be the set of all values l whose corresponding index set \(\{i_{1}, i_{2}, \ldots , i_{s}\}\) satisfies \(s\le r\), and let

$$\begin{aligned} \alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}}=(b_{l0},b_{l1},\ldots , b_{l(n-1)}), \end{aligned}$$

then

$$\begin{aligned} f(x)=\sum _{l \in I_{1}(r)}c_{l}{(1+x)^{l}}= \sum _{j=0}^{n-1}( \sum _{l \in I_{1}(r)}c_{l}b_{lj})x^{n-1-j}. \end{aligned}$$

Therefore, the weight of the corresponding linear combination satisfies

$$\begin{aligned} w(\sum _{l \in I_{1}(r)}c_{l}\alpha _{i_{1}}\alpha _{i_{2}}\cdots \alpha _{i_{s}})=w(f(x)). \end{aligned}$$

Define \(i_{0}\) as

$$\begin{aligned} i_{0}=\min \{l|l \in I_{1}(r)\}. \end{aligned}$$

Obviously,

$$\begin{aligned} i_{0}=1+2+\cdots +2^{m-r-1}=2^{m-r}-1, \end{aligned}$$

From Lemma 2.20, we then have

$$\begin{aligned} w(f(x))\ge w((x+1)^{i_{0}})=i_{0}+1=2^{m-r}. \end{aligned}$$

Because the combination numbers

$$\begin{aligned} \left( {\begin{array}{c}i_{0}\\ k\end{array}}\right) =\left( {\begin{array}{c}2^{m-r}-1\\ k\end{array}}\right) ~(0 \le k\le 2^{m-r}-1) \end{aligned}$$

are all odd, this is because

$$\begin{aligned} i_{0}=1+2+\cdots +2^{m-r-1}, ~k=k_{0}+k_{1}\cdot {2}+\cdots +k_{m-r-1}2^{m-r-1}, \end{aligned}$$

where each \(k_{i}\le 1\), so by Lemma 2.19 we deduce

$$\begin{aligned} \left( {\begin{array}{c}i_{0}\\ k\end{array}}\right) \equiv \prod \limits _{i=0}^{m-r-1}\left( {\begin{array}{c}1\\ k_{i}\end{array}}\right) ({{\mathrm{mod}\,}}2). \end{aligned}$$

So there is

$$\begin{aligned} \left( {\begin{array}{c}i_{0}\\ k\end{array}}\right) \equiv 1({{\mathrm{mod}\,}}2). \end{aligned}$$

In the end, we have \(d=2^{m-r}\). If \(r=m-2\), then the minimum distance is 4. The dimension of \(R(m-2,m)\) is

$$\begin{aligned} \begin{aligned} t&=\sum _{s=0}^{m-2}\left( {\begin{array}{c}m\\ s\end{array}}\right) =\sum _{s=0}^{m}\left( {\begin{array}{c}m\\ s\end{array}}\right) -\left( {\begin{array}{c}m\\ m-1\end{array}}\right) -\left( {\begin{array}{c}m\\ m\end{array}}\right) \\&=2^{m}-m-1\\&=n-m-1. \end{aligned} \end{aligned}$$

So \(R(m-2,m)\) is a linear code \([n,n-m-1]\). The theorem is proved.

Because \(R(m-2,m)\) is a linear code of the form \([n,n-m-1]\) with minimum distance 4, we may regard \(R(m-2,m)\) as a class of extended Hamming codes; although \(R(m-2,m)\) itself is not perfect, the Hamming codes are perfect linear codes.

2.5 Shannon Theorem

In channel transmission, a codeword \(x \in C\) may fail to be decoded correctly after it is sent because of channel interference; the probability of this error is denoted by p(x) and called the error probability of the codeword x. After the code C is selected, under the decoding principle of “look most like” (minimum Hamming distance), the error probability of a codeword \(x{\mathop {\longrightarrow }\limits ^{ \text {sending}}}x'\) satisfies

$$\begin{aligned} {\left\{ \begin{aligned}&p(x)=0,\ \ \text {if}~d(x,x')\le \rho _{1}<\frac{1}{2}n;\\&p(x)>0,\ \ \text {if}~d(x,x')>\rho _{1}. \end{aligned} \right. } \end{aligned}$$

where \(\rho _{1}\) is the disjoint radius of code C. Therefore, the error probability p(x) of a codeword x depends on the code C. The error probability of code C is defined as

$$\begin{aligned} p(C)=\frac{1}{|C|}\sum _{x \in C}p(x). \end{aligned}$$

It is difficult to calculate the error probability of a codeword exactly. We take the binary channel as an example: \(C \subset \mathbb {F}_{2}^{n}\) is a binary code of length n. To calculate the error probability p(x) of \(x \in C\), we agree that the transmission error probability of character 0 is p, \(p<\frac{1}{2}\), that is, the probability of receiving 0 as 1 after transmission, and that the error probability of character 1 is also p. Although the value of p is very small, the possibility of error exists because of channel interference. We further agree that every transmission of a character 0 or 1 independently has error probability p; such a channel is called a memoryless channel. In a memoryless binary channel, the transmission of a codeword \(x=x_{1}x_{2}\ldots x_{n} \in C\) constitutes an n-fold Bernoulli trial. This probability model provides a theoretical basis for calculating the error probability of codeword x; let us take the repetition code with two codewords as an example.

Lemma 2.26

Let \(A_{n}\) be the binary repetition code of length n, that is \(A_{n}=\{(0,0,\ldots ,0),(1,1,\ldots ,1)\} \subset \mathbb {F}_{2}^{n}\), and let \(p(A_{n})\) be its error probability; then

$$\begin{aligned} \lim _{n\rightarrow \infty }p(A_{n})=0. \end{aligned}$$

Proof

The transmission of the codeword \(0=(0,0,\ldots ,0)\) is regarded as an n-fold Bernoulli trial: after each transmission the character 0 results in 0 or 1, where 0 occurs with probability \(q=1-p\) and 1 occurs with probability \(p<\frac{1}{2}\). Let \(0\le k\le n\); then the probability that 0 appears exactly k times is

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) q^{k}p^{n-k}. \end{aligned}$$

If \(k>\frac{1}{2}n\), then more than half of the characters in the received word are 0. Suppose \(0\rightarrow \overline{0}\); then \(d(0,\overline{0})\le n-k <\frac{1}{2}n\). Because the disjoint radius of the repetition code is \(\lfloor \frac{n-1}{2}\rfloor \), according to the decoding principle we can always decode \( \overline{0} \rightarrow 0\) correctly; therefore, an error for the codeword \(0=(0,0,\ldots , 0) \in \mathbb {F}_{2}^{n}\) occurs if and only if \(k\le \frac{1}{2}n\), and the error probability is

$$\begin{aligned} p(0)=\sum _{0\le k\le \frac{n}{2}}\left( {\begin{array}{c}n\\ k\end{array}}\right) q^{k}p^{n-k}. \end{aligned}$$

Similarly, the error probability of codeword \(1=(1,1,\ldots , 1) \in \mathbb {F}_{2}^{n}\) is

$$\begin{aligned} p(1)=\sum _{0\le k\le \frac{n}{2}}\left( {\begin{array}{c}n\\ k\end{array}}\right) q^{k}p^{n-k}. \end{aligned}$$

Therefore, the error probability of the repetition code \(A_{n}\) is

$$\begin{aligned} p(A_{n})=\sum _{0\le k\le \frac{n}{2}}\left( {\begin{array}{c}n\\ k\end{array}}\right) q^{k}p^{n-k}. \end{aligned}$$

To calculate the limit of the above expression as \(n \rightarrow \infty \), note that

$$\begin{aligned} \sum _{0\le k\le \frac{n}{2}}\left( {\begin{array}{c}n\\ k\end{array}}\right) <\sum _{0\le k\le n}\left( {\begin{array}{c}n\\ k\end{array}}\right) =2^{n}. \end{aligned}$$

Because \(p<\frac{1}{2}\), we have \(p<q\), and when \(k\le \frac{n}{2}\),

$$\begin{aligned} k \log \frac{q}{p} \le \frac{n}{2} \log \frac{q}{p}. \end{aligned}$$

from which it follows directly that

$$\begin{aligned} q^{k}p^{n-k}\le (qp)^{\frac{n}{2}}. \end{aligned}$$

Thus

$$\begin{aligned} p(A_{n})\le 2^{n}(qp)^{\frac{n}{2}}=(4qp)^{\frac{n}{2}}. \end{aligned}$$

Because when \(p<\frac{1}{2}\),

$$\begin{aligned} p^{2}-p+\frac{1}{4}=(p-\frac{1}{2})^{2}>0, \end{aligned}$$

so

$$\begin{aligned} p(1-p)=pq<\frac{1}{4}, \text {that is}~ 4pq<1. \end{aligned}$$

Therefore,

$$\begin{aligned} 0\le \lim _{n\rightarrow \infty }p(A_{n})\le \lim _{n\rightarrow \infty }(4qp)^{\frac{n}{2}}=0. \end{aligned}$$

The Lemma holds.
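The exact error probability of \(A_{n}\) and the bound \((4qp)^{n/2}\) from the proof can be tabulated side by side; a Python sketch (the choice \(p=0.1\) is an arbitrary illustration):

```python
from math import comb

def p_repetition(n, p):
    # p(A_n) = sum_{0 <= k <= n/2} C(n, k) q^k p^(n-k), where k counts
    # the correctly received characters (Lemma 2.26).
    q = 1 - p
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1))

p = 0.1
for n in (11, 21, 51, 101):
    bound = (4 * p * (1 - p)) ** (n / 2)
    print(n, p_repetition(n, p), bound)
# Both columns tend to 0 as n grows: the repetition code is asymptotically
# reliable, at the price of a code rate 1/n -> 0.
```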

Below, we assume that the channel is a memoryless symmetric binary channel and that each code is a binary code; the error probability of each transmission of characters 0 and 1 is p, \(q=1-p\), \(p<\frac{1}{2}\). For a given codeword length n and number of codewords \(M=M_{n}\), we define Shannon's probability \(P^{*}(n,M_{n},p)\) as

$$\begin{aligned} P^{*}(n, M_{n}, p)=\min \{P(C)|C \subset \mathbb {F}_{2}^{n}, ~|C|=M_{n}\}. \end{aligned}$$

Shannon proved the following famous theorem in 1948.

Theorem 2.10

(Shannon) In a memoryless symmetric binary channel, let \(0< \lambda <1+p\log p+q \log q \) be a given real number and \(M_{n}=2^{[\lambda n]}\); then we have

$$\begin{aligned} \lim _{n\rightarrow \infty }P^{*}(n, M_{n}, p)=0. \end{aligned}$$

In order to understand the meaning of Shannon’s theorem and prove it, we need some auxiliary conclusions.

Lemma 2.27

Let \(0<\lambda <1+p \log p+q \log q\) be a given real number. For any binary code \(C \subset \mathbb {F}_{2}^{n}\) with \(|C|=2^{[\lambda n]}\), the code rate \(R_{C}\) of C satisfies

$$\begin{aligned} \lambda -\frac{1}{n}<R_{C}\le \lambda . \end{aligned}$$

In particular, when \(n\rightarrow \infty \), the rate of C approaches \(\lambda \).

Proof

$$\begin{aligned} |C|=2^{[\lambda n]}\Rightarrow \log _{2}|C|=[\lambda n]\le \lambda n. \end{aligned}$$

Therefore,

$$\begin{aligned} R_{C}=\frac{1}{n} \log _{2}|C|\le \lambda . \end{aligned}$$

From the properties of the square bracket (floor) function,

$$\begin{aligned} \lambda n<[\lambda n]+1, \end{aligned}$$

so

$$\begin{aligned} \lambda n-1<[\lambda n]=\log _{2}|C|. \end{aligned}$$

Hence

$$\begin{aligned} \lambda -\frac{1}{n}<\frac{1}{n}\log _{2}|C|=R_{C}. \end{aligned}$$

The Lemma 2.27 holds.

Combining Lemma 2.27, the significance of Shannon's theorem is as follows: as the code length n increases and tends to infinity, the code rate can be made arbitrarily close to the channel capacity \(1-H(p)\), while there exists a code C whose error probability is arbitrarily small; according to Shannon's understanding, this kind of code is called a “good code”. Shannon first proved the existence of “good codes” under more general conditions by the probability method; Theorem 2.10 is only a special case of Shannon's channel coding theorem. To prove Shannon's theorem, we must accurately estimate the error probability of a code with a given number of codewords under the decoding principle.

Lemma 2.28

In the memoryless binary channel, let the error probability of each transmission of characters 0 and 1 be p, \(q=1-p\), and let \(\omega \) be the number of character errors when a codeword \(x=x_{1}x_{2}\ldots x_{n} \in \mathbb {F}_{2}^{n}\) is transmitted. Then for any \(\varepsilon >0\), taking \(b=\sqrt{\frac{npq}{\varepsilon }}\), we have

$$\begin{aligned} P\{\omega >np+b\}\le \varepsilon . \end{aligned}$$

Proof

When any codeword \(x=x_{1}x_{2}\ldots x_{n} \in \mathbb {F}_{2}^{n}\) is transmitted in a memoryless binary channel, the transmission can be regarded as an n-fold Bernoulli trial, and the number \(\omega \) of errors in x is a discrete random variable taking the values \(0,1,2, \ldots , n\); the probability that \(\omega \) takes a given value is

$$\begin{aligned} b(\omega ,n,p)=\left( {\begin{array}{c}n\\ \omega \end{array}}\right) p^{\omega }q^{n-\omega }. \end{aligned}$$

Therefore \(\omega \) is a discrete random variable obeying the binomial distribution. From Lemma 1.18 of the first chapter, the expected value \(E(\omega )\) and variance \(D(\omega )\) of \(\omega \) are as follows:

$$\begin{aligned} E(\omega )=np, ~D(\omega )=npq. \end{aligned}$$

From the Chebyshev inequality of Corollary 1.2, for any \(k>0\),

$$\begin{aligned} P\{|\omega -E(\omega )|\ge k \sqrt{D{(\omega )}}\} \le \frac{1}{k^{2}}. \end{aligned}$$

Take \(k=\frac{1}{\sqrt{\varepsilon }}\); then \(k\sqrt{D(\omega )}=\sqrt{\frac{npq}{\varepsilon }}=b\), and we have

$$\begin{aligned} P\{\omega>np+b\}\le P\{|\omega -np| \ge b \}\le \varepsilon . \end{aligned}$$

The Lemma 2.28 holds.
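A numerical illustration of Lemma 2.28 (with arbitrarily chosen n, p, \(\varepsilon \)); the exact binomial tail is in fact far smaller than the Chebyshev bound \(\varepsilon \):

```python
from math import comb, sqrt

n, p, eps = 200, 0.1, 0.05
q = 1 - p
b = sqrt(n * p * q / eps)

# Exact tail P{omega > np + b} of the binomial distribution b(w, n, p).
tail = sum(comb(n, w) * p**w * q**(n - w)
           for w in range(n + 1) if w > n * p + b)
print(tail, "<=", eps)   # Chebyshev guarantees tail <= 0.05
```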

Lemma 2.29

Take \(\rho =[np+b]\), where \(b=\sqrt{\frac{np(1-p)}{\varepsilon }}\), then

$$\begin{aligned} \begin{aligned}&\frac{\rho }{n}\log \frac{\rho }{n}=p \log p+O(\frac{1}{\sqrt{n}}),\\&(1-\frac{\rho }{n})\log (1-\frac{\rho }{n})=q \log q+O(\frac{1}{\sqrt{n}}). \end{aligned} \end{aligned}$$

Proof

When \(\varepsilon >0\) is given, \(b=O(\sqrt{n})\), so \(\rho \) can be rewritten as

$$\begin{aligned} \rho =np+O(\sqrt{n}),~\frac{\rho }{n}=p+O(\frac{1}{\sqrt{n}}). \end{aligned}$$

Thus

$$\begin{aligned} \begin{aligned} \frac{\rho }{n}\log \frac{\rho }{n}&=(p+O(\frac{1}{\sqrt{n}}))\log (p+O(\frac{1}{\sqrt{n}}))\\&=(p+O(\frac{1}{\sqrt{n}}))(\log p+\log (1+O(\frac{1}{\sqrt{n}}))). \end{aligned} \end{aligned}$$

For a real number x with \(|x|<1\), we have the Taylor expansion

$$\begin{aligned} \log (1+x)=x-\frac{1}{2}x^{2}+\frac{1}{3}x^{3}-\frac{1}{4}x^{4}+\cdots . \end{aligned}$$

So when \(|x|<1\), we have

$$\begin{aligned} \log (1+x)=O(|x|), \end{aligned}$$

thus

$$\begin{aligned} \log (1+O(\frac{1}{\sqrt{n}}))=O(\frac{1}{\sqrt{n}}), \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned} \frac{\rho }{n}\log \frac{\rho }{n}&=(p+O(\frac{1}{\sqrt{n}}))(\log p+O(\frac{1}{\sqrt{n}}))\\&=p \log p+O(\frac{1}{\sqrt{n}}). \end{aligned} \end{aligned}$$

Similarly, we obtain the second asymptotic formula

$$\begin{aligned} (1-\frac{\rho }{n})\log (1-\frac{\rho }{n})=q \log q+O(\frac{1}{\sqrt{n}}), \end{aligned}$$

the Lemma 2.29 holds.

To prove Shannon's theorem, we define the following auxiliary function: for any two codewords \(x, y \in \mathbb {F}_{2}^{n}\) and \(\rho \ge 0\), define

$$\begin{aligned} f_{\rho }(x,y)={\left\{ \begin{aligned}&0,\ \ \text {if}~d(x,y)>\rho ;\\&1,\ \ \text {if}~d(x,y)\le \rho .\\ \end{aligned} \right. } \end{aligned}$$

Let \(C=\{x_{1},x_{2}, \ldots , x_{M}\} \subset \mathbb {F}_{2}^{n}\) be a binary code with \(|C|=M\), and define

$$\begin{aligned} g_{i}(y)=1-f_{\rho }(y,x_{i})+\sum _{j\ne i}f_{\rho }(y,x_{j}). \end{aligned}$$

Lemma 2.30

Let \(y \in \mathbb {F}_{2}^{n}\) be a given word; then

$$\begin{aligned} {\left\{ \begin{aligned}&g_{i}(y)=0,\ \ \text {if}~x_{i} \in C ~\text {is the only codeword so that}~ d(y,x_{i})\le \rho ,\\&g_{i}(y)\ge 1,\ \ \text {otherwise}.~\\ \end{aligned} \right. } \end{aligned}$$

Proof

If there is a unique \(x_{i} \in C\) such that \(d(y,x_{i})\le \rho \), then \(f_{\rho }(y,x_{i})=1\), but \(f_{\rho }(y, x_{j})=0(i\ne j)\), therefore

$$\begin{aligned} g_{i}(y)=1-f_{\rho }(y,x_{i})+\sum _{j \ne i}f_{\rho }(y,x_{j})=0. \end{aligned}$$

If \(d(y,x_{i})> \rho ,\) then \(f_{\rho }(y,x_{i})=0\), so

$$\begin{aligned} g_{i}(y)=1-f_{\rho }(y,x_{i})+\sum _{j\ne i}f_{\rho }(y,x_{j})=1+\sum _{j\ne i}f_{\rho }(y,x_{j})\ge 1. \end{aligned}$$

If \(d(y,x_{i}) \le \rho \), but there is at least one \(x_{k}\ne x_{i}\) such that \(d(y,x_{k})\le \rho \), then

$$\begin{aligned} \begin{aligned} g_{i}(y)&=1-f_{\rho }(y,x_{i})+\sum _{j\ne i}f_{\rho }(y,x_{j})\\&=1+\sum _{j\ne i,j \ne k}f_{\rho }(y,x_{j})\ge 1. \end{aligned} \end{aligned}$$

The Lemma 2.30 holds.

With the above preparation, we give the proof of Shannon’s theorem.

Proof

(The proof of Theorem 2.10) According to the assumptions of the theorem, we assume that \(0<\lambda < 1+p\log p+q \log q\) is a given positive real number \((p<\frac{1}{2})\), and

$$\begin{aligned} M=M_{n}=2^{[\lambda n]}, ~|C|=M. \end{aligned}$$

Let

$$\begin{aligned} C=\{x_{1},x_{2},\ldots , x_{M}\} \subset \mathbb {F}_{2}^{n}, \end{aligned}$$

\(\varepsilon >0\) is any given positive number,

$$ b=\sqrt{\frac{npq}{\varepsilon }}, ~\rho =[pn+b]. $$

Because \(p<\frac{1}{2}\), when n is sufficiently large we have \(\rho =pn+O(\sqrt{n})<\frac{1}{2}n.\) In order to estimate the error probability of a codeword \(x_{i} \in C\), suppose \(x_{i}{\mathop {\longrightarrow }\limits ^{ \text {transmit}}}y\). If \(d(x_{i},y)\le \rho \) and \(x_{i}\) is the unique codeword of C with \(d(y, x_{i})\le \rho \), then according to the decoding principle of “look most like”, \(x_{i}\) is the codeword most similar to y in C, so we decode correctly as \(y \rightarrow x_{i}\); in this case no decoding error occurs for \(x_{i}\). Otherwise, a real decoding error may occur. On the other hand, the probability that y is received when \(x_{i}\) is transmitted is the conditional probability \(p(y|x_{i})\), so by Lemma 2.30 the error probability of \(x_{i}\) is estimated as

$$\begin{aligned} \begin{aligned}&P_{i}=p(x_{i})\le \sum _{y \in \mathbb {F}_{2}^{n}} p(y|x_{i})g_{i}(y)\\&=\sum _{y \in \mathbb {F}_{2}^{n}} p(y|x_{i})(1-f_{\rho }(y,x_{i}))+\sum _{y \in \mathbb {F}_{2}^{n}}\sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^{M}p(y|x_{i})f_{\rho }(y,x_{j}). \end{aligned} \end{aligned}$$
(2.27)

According to the definition of \(f_{\rho }(y,x_{i})\), the first term of the above formula is the probability that the received word y (when \(x_{i}\) is sent) is not in the ball \(B_{\rho }( x_{i})\), i.e.

$$\begin{aligned} \sum _{y \in \mathbb {F}_{2}^{n}} p(y|x_{i})(1-f_{\rho }(y,x_{i}))=P \{ \text {received word} ~y \notin B_{\rho }( x_{i})\}. \end{aligned}$$

Because \(\omega =d(y,x_{i})\) is exactly the number of character errors in \(x_{i}\rightarrow y\), from the Chebyshev estimate of Lemma 2.28 we have

$$\begin{aligned} P\{\text {received word}~y \notin B_{\rho }( x_{i})\}=P\{\omega>\rho \}\le P\{\omega > np+b\}\le \varepsilon , \end{aligned}$$

from (2.27), we have

$$\begin{aligned} P_{i}=p(x_{i})\le \varepsilon +\sum _{y \in \mathbb {F}_{2}^{n}}\sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^{M}p(y|x_{i})f_{\rho }(y,x_{j}). \end{aligned}$$
(2.28)

By the definition of the error probability p(C) of code C, we have

$$\begin{aligned} p(C)=\frac{1}{M}\sum _{i=1}^{M}p(x_{i})\le \varepsilon +M^{-1}\sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}}p(y|x_{i})\sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^{M}f_{\rho }(y,x_{j}). \end{aligned}$$

Since C is randomly selected, we can regard p(C) as a random variable. Shannon's probability \(P^{*}(n,M_{n},p)\) is the minimum value of p(C), so it does not exceed the expected value of p(C), i.e.

$$\begin{aligned} \begin{aligned}&P^{*}(n, M_{n},p) \le E(p(C))\\&\le \varepsilon +M^{-1}\sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}} \sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^{M}E(p(y|x_{i}) \cdot f_{\rho }(y,x_{j})). \end{aligned} \end{aligned}$$

When i is given, the random variables \(p(y|x_{i})\) and \(f_{\rho }(y,x_{j})(j\ne i)\) are statistically independent, so

$$\begin{aligned} E(p(y|x_{i})\cdot f_{\rho }(y,x_{j}))=E(p(y|x_{i}))E(f_{\rho }(y,x_{j})). \end{aligned}$$

So there is

$$\begin{aligned} P^{*}(n,M_{n},p) \le \varepsilon +M^{-1}\sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}}\sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^{M}E(p(y|x_{i}))E(f_{\rho }(y, x_{j})). \end{aligned}$$
(2.29)

Let us calculate the expected value of \(f_{\rho }(y,x_{j})\): because y is selected in \(\mathbb {F}_{2}^{n}\) with equal probability,

$$\begin{aligned} \begin{aligned} E(f_{\rho }(y,x_{j}))&=\sum _{y \in \mathbb {F}_{2}^{n}}p(y)f_{\rho }(y,x_{j})\\&=\frac{1}{2^{n}}|B_{\rho }(x_{j})|\\&=\frac{1}{2^{n}}|B_{\rho }(0)|. \end{aligned} \end{aligned}$$

So there is

$$\begin{aligned} \begin{aligned} P^{*}(n,M_{n},p)&\le \varepsilon +M^{-1}\sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}}E(p(y|x_{i}))\sum _{\begin{array}{c} j=1\\ j\ne i \end{array}}^{M}E(f_{\rho }(y,x_{j}))\\&=\varepsilon +M^{-1}\sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}}E(p(y|x_{i}))\frac{(M-1)|B_{\rho }(0)|}{2^{n}}. \end{aligned} \end{aligned}$$
(2.30)

Now let us calculate the expected value of \(p(y|x_{i})\) (y fixed, \(x_{i}\) randomly selected in C):

$$\begin{aligned} E(p(y|x_{i}))=\sum _{i=1}^{M}p(x_{i})p(y|x_{i})=p(y), \end{aligned}$$

thus

$$\begin{aligned} \sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}}E(p(y|x_{i}))=\sum _{i=1}^{M}\sum _{y \in \mathbb {F}_{2}^{n}}p(y)=M. \end{aligned}$$

From (2.30),

$$P^{*}(n,M_{n},p) \le \varepsilon +\frac{M-1}{2^{n}}|B_{\rho }(0)|,$$
$$\log _{2}(P^{*}(n,M_{n},p)-\varepsilon )\le \log _{2}M+\log _{2}|B_{\rho }(0)|-n.$$

That is

$$\begin{aligned} \frac{1}{n}\log _{2}(P^{*}(n,M_{n},p)-\varepsilon )\le \frac{1}{n}\log _{2}M+\frac{1}{n}\log _{2}|B_{\rho }(0)|-1. \end{aligned}$$

From Lemma 1.11 of Chap. 1,

$$\begin{aligned} \frac{1}{n}\log _{2}|B_{\rho }(0)|=\frac{1}{n}\log _{2}\sum _{i=0}^{\rho }\left( {\begin{array}{c}n\\ i\end{array}}\right) \le H(\frac{\rho }{n}), \end{aligned}$$

where \(H(x)=-x \log x-(1-x)\log (1-x)~(0<x<\frac{1}{2})\) is the binary entropy function, so

$$\begin{aligned} \frac{1}{n}\log _{2}(P^{*}(n,M_{n},p)-\varepsilon )\le \frac{1}{n}\log _{2}M+ H(\frac{\rho }{n})-1. \end{aligned}$$

By hypothesis \(M=2^{[\lambda n]}\), \(\rho =[pn+b]\), \(b=O(\sqrt{n}),\) we have

$$\begin{aligned} \begin{aligned} \frac{1}{n}\log _{2}(P^{*}(n,M_{n},p)-\varepsilon )&\le \frac{[\lambda n]}{n}+ H(\frac{\rho }{n})-1\\&=\lambda + H(\frac{\rho }{n})-1+O(\frac{1}{n}). \end{aligned} \end{aligned}$$

By Lemma 2.29,

$$\begin{aligned} \begin{aligned} H(\frac{\rho }{n})&=-(\frac{\rho }{n}\log \frac{\rho }{n}+(1-\frac{\rho }{n})\log (1-\frac{\rho }{n}))\\&=-(p \log p+q \log q+O(\frac{1}{\sqrt{n}})). \end{aligned} \end{aligned}$$

So

$$\begin{aligned} \frac{1}{n}\log _{2}(P^{*}(n, M_{n},p)-\varepsilon )\le \lambda -(1+p \log p+q \log q)+O(\frac{1}{\sqrt{n}}). \end{aligned}$$

By hypothesis \(\lambda <1+p\log p+q \log q\), when n is sufficiently large, we have

$$\begin{aligned} \frac{1}{n}\log _{2}(P^{*}(n,M_{n},p)-\varepsilon )\le -\beta \quad \text {for some}~\beta >0. \end{aligned}$$

Therefore, \(0\le P^{*}(n,M_{n},p)\le \varepsilon +2^{-\beta n}\); taking the limit \(n\rightarrow \infty \) and noting that \(\varepsilon \) is arbitrary, we finally obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }P^{*}(n,M_{n},p)=0. \end{aligned}$$

We completed the proof of the theorem.
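The ball-volume estimate \(\frac{1}{n}\log _{2}|B_{\rho }(0)|\le H(\frac{\rho }{n})\) borrowed from Lemma 1.11 of Chap. 1 is the main analytic input of the proof; it can be illustrated numerically by a short Python sketch:

```python
from math import comb, log2

def H(x):
    # Binary entropy function H(x) = -x log2(x) - (1-x) log2(1-x).
    return -x * log2(x) - (1 - x) * log2(1 - x)

n = 100
for rho in (5, 20, 40, 49):
    lhs = log2(sum(comb(n, i) for i in range(rho + 1))) / n
    print(rho, round(lhs, 4), "<=", round(H(rho / n), 4))
```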

According to Shannon, a code whose rate is close to a given positive number \(\lambda \),

$$\begin{aligned} 0<\lambda < 1+p \log p+q \log q=1-H(p), \end{aligned}$$

and whose error probability is arbitrarily small is called a “good code”. We further analyze the construction of this kind of “good code”. (Shannon only proved the existence of “good codes” probabilistically.)

Theorem 2.11

For given \(\lambda \) with \(0<\lambda< 1+p \log p+q \log q~(p<\frac{1}{2})\) and \(M_{n}=2^{[\lambda n]}\), if there is a perfect code \(C_{n}\) with \(|C_{n}|=M_{n}\), then we have

$$\begin{aligned} \lim _{n\rightarrow \infty }p(C_{n})=0. \end{aligned}$$

Proof

If perfect code \(C_{n}\) exists, by Lemma 2.27,

$$\begin{aligned} \lambda -\frac{1}{n}\le R_{C_{n}}\le \lambda . \end{aligned}$$

Therefore, the code rate of \(C_{n}\) can be arbitrarily close to \(\lambda \); once we show that the error probability of \(C_{n}\) is arbitrarily small, \(C_{n}\) is a “good code” in the mathematical sense. To prove Theorem 2.11, note that since \(C_{n}\) is a perfect code, its minimum distance \(d_{n}\) can be written as

$$\begin{aligned} d_{n}=2e_{n}+1, ~e_{n}<\frac{n}{2}. \end{aligned}$$

Because \(\lim \limits _{n\rightarrow \infty }R_{C_{n}}=\lambda \), by Theorem 2.2 we have

$$\begin{aligned} \lim _{n\rightarrow \infty }H(\frac{e_{n}}{n})=1-\lambda >H(p). \end{aligned}$$

Because the binary entropy function H(x) is continuous and strictly increasing on \((0,\frac{1}{2})\), the limit \(\lim \limits _{n\rightarrow \infty }\frac{e_{n}}{n}\) exists and

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{e_{n}}{n}>p, ~\text {that is}~ \frac{e_{n}}{n}>p~ \text {when { n} is sufficiently large}. \end{aligned}$$

Now consider the error probability p(x) of a codeword \(x=x_{1}x_{2}\ldots x_{n}\in C_{n}\). Since \(C_{n}\) is an \(e_{n}\)-error-correcting code, when \(x\rightarrow x'\) with \(d(x,x')\le e_{n}\) we can always decode correctly, and in this case the error probability is 0. Therefore, a transmission error for x, that is, the case where \(x'\) cannot be decoded correctly, occurs only when \(d(x',x)=w_{n}>e_{n}\). Since \(\lim _{n\rightarrow \infty }\frac{e_{n}}{n}>p\), there exists \(\varepsilon >0\) such that, for sufficiently large n,

$$\begin{aligned} \frac{w_{n}}{n}>\frac{e_{n}}{n}>p+\varepsilon . \end{aligned}$$

So the error probability p(x) of \(x \in C_{n}\) is estimated as

$$\begin{aligned} \begin{aligned} p(x)&\le P\{\frac{w_{n}}{n}>p+ \varepsilon \}\\&\le P\{|\frac{w_{n}}{n}-p|> \varepsilon \}. \end{aligned} \end{aligned}$$

Because as \(n\rightarrow \infty \) the sequence of random variables \(\{w_{n}\}\) forms a Bernoulli process (i.e., for each n it is an n-fold Bernoulli trial), from Theorem 1.2 in Chap. 1 we have

$$\begin{aligned} \lim _{n\rightarrow \infty }p(x)\le \lim _{n\rightarrow \infty }P\{|\frac{w_{n}}{n}-p|>\varepsilon \}=0. \end{aligned}$$

This holds for all \(x\in C_{n}\), so

$$\begin{aligned} \lim _{n\rightarrow \infty }p(C_{n})=0. \end{aligned}$$

The Theorem 2.11 holds.

From the proofs of Theorems 2.10 and 2.11, it can be seen that Shannon randomly selects a code and randomly selects a codeword; this essentially regards the input information as a random event in a given probability space, and the transmission of information as a random process. The fundamental difference between Shannon and other mathematicians of his time is that he regards information, or a code, as a random variable: the mathematical model of information transmission is a dynamic probability model rather than a static algebraic model, and the most natural method to study a code is probability statistics rather than the algebraic and combinatorial methods of traditional mathematics. From the perspective of probability theory, Theorems 2.10 and 2.11 regard a code as a random variable of a very particular kind: its probability distribution obeys the Bernoulli binomial distribution, and the statistical characteristics of the code rate in particular are not clearly expressed. It is the core content of Shannon's information theory to study the relationship between random variables with general probability distributions and codes. One of the most basic concepts is information entropy, or code entropy; using the concept of code entropy, the statistical characteristics of a code are clearly displayed. Therefore, we see a basic framework and prototype of modern information theory. In the next chapter, we explain and prove these basic ideas and results of Shannon information theory in detail. One of the most important results is Shannon's channel coding theorem (see Theorem 3.12 in Chap. 3): Shannon used the probability method to prove that, for a general memoryless channel (whether symmetric or not), so-called good codes with code rate up to the transmission capacity and arbitrarily small error probability exist; conversely, the code rate of a code with arbitrarily small error probability cannot be greater than the capacity of the channel. This channel capacity is called Shannon's limit, which has long been pursued in the field of electronic communication engineering technology: people want to find a channel coding scheme with error probability in a controllable range (e.g., less than \(\varepsilon \)) and transmission efficiency (i.e., code rate) reaching Shannon's limit. In today's 5G era, this engineering problem seems to have been overcome. Returning to Theorem 2.10, we see that the upper limit \(1-H(p)\) of the code rate is exactly the channel capacity of the memoryless symmetric binary channel (see Example 2 in Sect. 8 of Chap. 3); from this example, we can get a glimpse of Shannon's channel coding theory.

Exercise 2

  1. 1.

    Please design a code of length 7 containing 8 codewords such that the Hamming distance of any two codewords is \(\ge 4\). The code is transmitted through a symmetric binary channel; assuming the error probability of characters 0 and 1 is p, calculate the success probability of codeword transmission.

  2. 2.

    Let C be a binary code of length 16 satisfying

    • (i) Each codeword has a weight of 6.

    • (ii) Any two codewords have Hamming distance of 8.

    Prove: \(|C|\le 16\). Does a binary code C with \(|C|=16\) exist?

  3. 3.

    Let C be a binary code of length n (n even) which corrects one character error; prove

    $$|C|\le \frac{2^{n}}{n+2}.$$
  4. 4.

    Let C be a binary perfect code of length n, and the minimum distance is 7. Prove: \(n=7\) or \(n=23\).

  5. 5.

    Let \(C\subset \mathbb {F}_{q}^{n}\) be a linear code \(C=[n,k]\) in which any k coordinates are symmetric; prove that the minimum distance of C is \(d=n-k+1\).

  6. 6.

    Suppose \(C=[2k+1,k]\subset \mathbb {F}_{2}^{2k+1}\) and \(C\subset C^{\perp }\); determine the difference set \(C^{\perp }\backslash C\).

  7. 7.

    Let \(x=x_{1}x_{2}\ldots x_{6}\in \mathbb {F}_{2}^{6}\); determine the size of the Hamming ball \(|B_{1} (x)|\). Can we find a code \(C\subset \mathbb {F}_{2}^{6}\) with \(|C|=9\) such that the Hamming distance of any two different codewords in C is \(\ge 3\)?

  8. 8.

    Let \(C=[n, k]\subset \mathbb {F}_{q}^{n}\) be a linear code with generator matrix G; if no column of G is all zero, prove

    $$\sum _{x\in C}w(x)=n(q-1)q^{k-1}.$$

    where w(x) is the weight of the codeword x.

  9. 9.

    Let \(C=[n, k]\) be a linear binary code containing a codeword of odd weight; prove that the codewords of even weight in C form a linear code \([n,k-1]\).

  10. 10.

    Let C be the linear binary code whose generator matrix G is

    $$ \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix},$$

    Please decode the following received words: \(y_{1}=(1101011),y_{2}=(0110111),y_{3}=(0111000).\)

  11. 11.

    Let p be a prime. Is there a self-dual linear code \(C=[8,4]\) over \(\mathbb {F}_{p}\)?

  12. 12.

    Let \(R_{k}\) be the rate of the binary Hamming codes; find \(\lim \limits _{k\rightarrow \infty }R_{k}\).

  13. 13.

    Let C be a linear binary code with weight distribution polynomial A(z); find the weight distribution polynomial B(z) of the dual code \(C^{\perp }\).

  14. 14.

    Let \(C=[n,k]\subset \mathbb {F}_{2}^{n}\) have weight distribution polynomial A(z). We transmit codewords through a binary symmetric channel in which characters 0 and 1 have error probability p, and we wish to detect codeword transmission errors; calculate the probability that a codeword transmission error is not detected.

  15. 15.

    Prove that there is no linear code \(C=[15,8]\) with minimum distance 5 over any finite field \(\mathbb {F}_{q}\).

  16. 16.

    Let \(n=2^{m}\); prove that the Reed–Muller code R(1, m) is a Hadamard code of length n.

  17. 17.

    Prove that the ternary Golay code has 132 codewords of weight 5. For a codeword x of weight 5, consider all pairs (x, 2x) with \(w(x)=5\), and take the set of coordinate positions where x is nonzero as a subset. Prove that there are 66 such subsets and that they form a \(4-(11,5,1)\) design.

  18. 18.

    If the minimum distance d of a binary code \(C=(n,M,d)\) is even, prove that there exists a binary code with the same parameters all of whose codewords have even weight.

  19. 19.

    Let H be a Hadamard matrix \(H_{12}\) and define

    $$A=H-I,G=(I,A), I~\text {is the unit matrix}.$$

    Prove that G is the generator matrix of a ternary code [24, 12] with minimum distance 9.

  20. 20.

    Let \(C=[4,2]\) be a ternary Hamming code with check matrix H; let I be the unit matrix of order 4 and J the square matrix of order 4 with all elements 1. Define

    $$G= \left[ \begin{array}{ccc} J+I & I & I\\ 0 & H & -H \end{array}\right] , $$

    prove that G generates a ternary code \(C=[12,6]\) with minimum distance 6.