1 Introduction

In many tensor applications, we only have an element-wise or a slice-wise description of our data/signal model. For instance, there exists only a slice-wise description of the PARATUCK2 decomposition and the PARAFAC2 decomposition corresponding to a certain unfolding of the overall tensor [1,2,3]. In the same way, some proposed tensor-based models for MIMO-OFDM communication systems have only an element-wise or a slice-wise representation [4]. Further examples include the slice-wise description of MIMO communication systems using two-way relaying [5, 6]. This description of the signal models does not reveal the tensor structure explicitly. Hence, the derivation of all tensor unfoldings is not always obvious. Therefore, we propose to express the slice-wise multiplication of two tensors in terms of the double contraction operator and use it to derive an explicit tensor structure of the received data tensor in the form of a CP-like, or Tucker-like, model in a systematic way. These explicit tensor models reveal all the possible generalized unfoldings at the same time and can subsequently be exploited to estimate the model parameters. One of our main contributions is to provide a systematic way to derive such an explicit tensor representation.

OFDM is the most widely used multi-carrier technique in current wireless communication systems. It is robust in multipath propagation environments and has a simple and efficient implementation [7, 8]. Using the fast Fourier transform (FFT), the complete frequency band is divided into smaller frequency subcarriers. Moreover, the use of the cyclic prefix mitigates the inter-symbol interference (ISI) and the inter-carrier interference (ICI). Typically, the OFDM receiver is implemented in the frequency domain based on a zero-forcing (ZF) filter. More advanced solutions are proposed in [9], and optimal training and channel estimation for OFDM systems are proposed in [10, 11].

Tensor-based signal processing offers improved identifiability, uniqueness, and more efficient denoising compared to matrix-based techniques. In [4], a MIMO multi-carrier system is modeled using tensor algebra and the PARATUCK2 tensor decomposition, resulting in a novel space, time, and frequency coding structure. Similarly, in [12], trilinear coding in space, time, and frequency is proposed for MIMO-OFDM systems based on the CP tensor decomposition. By exploiting tensor models, semi-blind receivers are introduced for multi-carrier communication systems in [13, 14]. All these works create the tensor structure by means of additional spreading, which significantly reduces the spectral efficiency. Moreover, previous publications on tensor models for multi-carrier communication systems [4, 12,13,14] do not exploit the channel correlation between the adjacent subcarriers. The previously mentioned publications rely on the subcarrier-wise description of the MIMO-OFDM system. This description of the signal models does not reveal the tensor structure explicitly. Hence, the derivation of all tensor unfoldings is not always obvious. In [15], a PARAFAC model and a Tucker model are proposed for joint channel, data, and phase noise estimation in MIMO-OFDM systems, taking into account the phase noise due to inter-carrier interference. The author of [16] also proposes a tensor model for filter bank-based multi-carrier (FBMC) communication systems. However, this model is derived from a PARATUCK2 decomposition, which is not based on tensor contractions. This derivation, although popular, is not as general as the derivation proposed in this paper. In other words, the tensor modeling approach of [16] is restricted to FBMC systems, while our approach is valid for any MIMO-OFDM system with orthogonal or nonorthogonal subcarriers.

In this paper, we propose a new approach to model MIMO-OFDM communication systems and to design semi-blind receivers. The idea is built upon a double contraction model that allows us to replace the slice-wise multiplication of two tensors so that the explicit tensor structure of the data model can be derived. We provide the mathematical tools to derive such an explicit tensor structure in general. The received data in a MIMO-OFDM system are derived from such an explicit tensor structure, which is efficiently exploited at the receiver for a joint channel and symbol estimation. More specifically, we first present the double contraction between an uncoded signal tensor and a channel tensor for OFDM systems, yielding the same spectral efficiency as matrix-based approaches (since no additional spreading is used) [17]. We then propose an application of the double contraction operator to Khatri–Rao-coded MIMO-OFDM systems [18]. Due to the Khatri–Rao coding, the signal tensor has a richer structure and can be recast as a constrained CP-like model. The Khatri–Rao space–time coding concept was introduced in [19] and later extended in [20] to Khatri–Rao space–time–frequency coding. In contrast to the state of the art [4, 13, 14, 20], in this work we exploit the structure of the channel and the contraction properties using the transmit signal tensor and the known coding matrix to propose a receiver based on the LS-KRF. In addition, we reduce the number of required pilot symbols by exploiting the correlation of the channel in the frequency domain, which has not been exploited in these previous works. Finally, we propose a more spectrally efficient cross-coding model for MIMO-OFDM systems. In this case, the known and fixed Khatri–Rao coding matrix is eliminated, and two useful symbol matrices are cross-coded by means of the Khatri–Rao product. By exploiting the CP-like tensor structure of the received signal, we also design two types of receivers for the cross-coded MIMO-OFDM systems.

This paper is organized as follows. In Sect. 2, we introduce the tensor algebra notation and provide the mathematical tools to derive an explicit tensor structure from the slice-wise multiplication of two tensors. Section 3 describes the system model using the double contraction formalism for the traditional MIMO-OFDM transmission. In Sect. 4, we recast the tensor signal model for the Khatri–Rao-coded MIMO-OFDM case and present the two closed-form receiver designs for this system, which are based on the Khatri–Rao factorization. In Sect. 5, we consider a cross-coded MIMO-OFDM system with enhanced spectral efficiency and derive the corresponding semi-blind receivers. A discussion on the computational complexity of the different receivers is also carried out. In Sect. 6, numerical results are presented, and the paper is concluded in Sect. 7.

2 Tensor algebra and notation

2.1 Notation

We use the following notation. Scalars are denoted either as capital or lower-case italic letters, \(A, a\). Vectors and matrices are denoted as bold-faced lower-case and capital letters, \({\varvec{{a}}}, {\varvec{{A}}}\), respectively. Tensors are represented by bold-faced calligraphic letters \({\varvec{{{\mathcal {A}}}}}\). The superscripts \(^{\mathrm{T}}\), \(^{\mathrm{H}}\), \(^{{-1}}\), and \(^+\) denote transposition, Hermitian transposition, matrix inversion, and Moore–Penrose pseudo-inversion, respectively. The outer product, Kronecker product, and Khatri–Rao product are denoted as \(\circ\), \(\otimes\), and \(\diamond\), respectively. Moreover, we denote the Hadamard product (element-wise multiplication) and the inverse Hadamard product (element-wise division) between two arrays of equal dimensions as \(\odot\) and \(\oslash\), respectively. The operators \(\left| \left| .\right| \right| _{\text {F}}\) and \(\left| \left| .\right| \right| _{\text {H}}\) denote the Frobenius norm and the higher-order norm of a tensor, which is defined as the square root of the sum of the squared absolute values of its elements, respectively. Moreover, the n-mode product between a tensor \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I_1\times I_2 \ldots \times I_N}\) and a matrix \({\varvec{{B}}} \in {\mathbb {C}}^{J \times I_n}\) is denoted as \({\varvec{{{\mathcal {A}}}}}\times _n{\varvec{{B}}}\), for \(n=1, 2, \ldots N\) [21]. The identity N-way tensor of dimension \(R\times R\cdots \times R\) is denoted as \({\varvec{{{\mathcal {I}}}}}_{N,R}\). Similarly, an identity matrix of dimension \({R\times R}\) is denoted as \({\varvec{{I}}}_R\) and we denote a vector of ones of length R as \({\varvec{{1}}}_R\).
The nth three-mode slice of a tensor \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I\times J\times N}\) is denoted as \({\varvec{{{\mathcal {A}}}}}_{(.,.,n)}\) and accordingly one element of this tensor is denoted as \({\varvec{{{\mathcal {A}}}}}_{(i,j,n)}\). The operator \(\mathrm{{diag}}(.)\) transforms a vector into a diagonal matrix and the operator \(\mathrm{{vec}}(.)\) transforms a matrix into a vector. Note that we distinguish between a super-diagonal or an identity tensor and a diagonal tensor. A diagonal tensor is a tensor that consists of diagonal slices along one dimension. For instance, a diagonal tensor \({{\varvec{{{\mathcal {D}}}}}_A \in {\mathbb {C}}^{M\times N\times N}}\) that is diagonal along the first dimension has diagonal one-mode slices, i.e., \({{\varvec{{{\mathcal {D}}}}}_A}_{(m,.,.)} = \mathrm{{diag}}({\varvec{{a}}}_m)\), for \(m = 1,\ldots ,M\), where \({\varvec{{a}}}_m\) is a vector of length \(N\). The concatenation of two tensors along their mth dimension is denoted as \(\sqcup _m\) [22]. For two tensors \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I\times I_2 \times I_3}\) and \({\varvec{{{\mathcal {B}}}}} \in {\mathbb {C}}^{J\times I_2 \times I_3}\), after the concatenation along the first dimension, we get \({\varvec{{{\mathcal {A}}}}} \sqcup _1 {\varvec{{{\mathcal {B}}}}} \in {\mathbb {C}}^{(I + J)\times I_2 \times I_3}\).

2.2 The CP decomposition and generalized tensor unfoldings

The CP tensor decomposition decomposes a given tensor into the minimum number of rank-one components. The CP decomposition of a four-way, rank R tensor \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I \times J \times M \times N}\) can be written as

$$\begin{aligned} {\varvec{{{\mathcal {A}}}}} = {\varvec{{{\mathcal {I}}}}}_{4,R}\times _1{\varvec{{F}}}_1\times _2{\varvec{{F}}}_2\times _3{\varvec{{F}}}_3\times _4{\varvec{{F}}}_4, \end{aligned}$$
(1)

where \({\varvec{{F}}}_1 \in {\mathbb {C}}^{I \times R}, {\varvec{{F}}}_2 \in {\mathbb {C}}^{J \times R}\), \({\varvec{{F}}}_3 \in {\mathbb {C}}^{M \times R}\), and \({\varvec{{F}}}_4 \in {\mathbb {C}}^{N \times R}\) are the factor matrices [21, 23]. In addition to the n-mode unfoldings, generalized matrix unfoldings can be defined by using two subsets of any of the N dimensions [24, 25]. For instance, the set of modes \((1, 2,\ldots ,N)\) of an N-way tensor \({\varvec{{{\mathcal {A}}}}}\) can be divided into two non-overlapping subsets with cardinality P and \(N-P\), \(\alpha ^{(1)}=[\alpha _1 \ldots \alpha _P]\) and \(\alpha ^{(2)}=[\alpha _{P+1} \ldots \alpha _N]\), respectively. This leads to the generalized unfolding \(\left[ {\varvec{{{\mathcal {A}}}}} \right] _{(\alpha ^{(1)},\alpha ^{(2)})}\), where the indices contained in \(\alpha ^{(1)}\) vary along the rows and the indices contained in \(\alpha ^{(2)}\) vary along the columns. Here, the index \(\alpha _1\) varies the fastest between the rows, the index \(\alpha _{P+1}\) varies the fastest between the columns, P is any number between one and N, and \(\alpha _{n}\) is any of the tensor dimensions. For instance, let us assume the four-way tensor \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I \times J \times M \times N}\) defined in Eq. (1). In the generalized unfolding \([{\varvec{{{\mathcal {A}}}}}]_{([1,2],[3,4])}\) the first mode varies faster than the second mode along the rows and the third mode varies faster than the fourth mode along the columns. Moreover, for a tensor with a CP structure, its unfoldings and generalized unfoldings can be expressed in terms of the factor matrices. For instance, the generalized unfolding \([{{\varvec{{{\mathcal {A}}}}}}]_{([1, 2],[3, 4])}\) of the tensor \({\varvec{{{\mathcal {A}}}}}\) satisfies [18, 25]

$$\begin{aligned}{}[{{\varvec{{{\mathcal {A}}}}}}]_{([1, 2],[3, 4])} =\left( {\varvec{{F}}}_2\diamond {\varvec{{F}}}_1\right) \cdot \left( {\varvec{{F}}}_4\diamond {\varvec{{F}}}_3\right) ^{\mathrm{T}}. \end{aligned}$$

In a similar way, the rest of the tensor unfoldings and generalized unfoldings can be defined.
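As a numerical sanity check of the generalized unfolding identity above, the following NumPy sketch builds a small four-way CP tensor from random factor matrices and compares both sides. The helper names `khatri_rao` and `unfold`, the chosen dimensions, and the use of NumPy are illustrative assumptions and not part of the original derivation; modes are 0-based in the code.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R)."""
    I, R = A.shape
    J, _ = B.shape
    # Row index is i*J + j, i.e., the index of B varies fastest.
    return np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

def unfold(T, rows, cols):
    """Generalized unfolding; the first listed mode varies fastest."""
    P = np.transpose(T, list(rows) + list(cols))
    n_rows = int(np.prod([T.shape[m] for m in rows]))
    return P.reshape(n_rows, -1, order='F')

rng = np.random.default_rng(0)
I, J, M, N, R = 2, 3, 4, 5, 3
F1, F2, F3, F4 = (rng.standard_normal((d, R)) for d in (I, J, M, N))

# Rank-R CP tensor: sum of R outer products of the factor columns.
A = np.einsum('ir,jr,mr,nr->ijmn', F1, F2, F3, F4)

lhs = unfold(A, (0, 1), (2, 3))                     # [A]_([1,2],[3,4])
rhs = khatri_rao(F2, F1) @ khatri_rao(F4, F3).T     # (F2 ⋄ F1)(F4 ⋄ F3)^T
cp_unfolding_ok = np.allclose(lhs, rhs)
print(cp_unfolding_ok)
```

The reversed order of the factors inside each Khatri–Rao product reflects the convention that the first listed mode varies fastest.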

2.3 Tensor contraction

The contraction \({\varvec{{{\mathcal {A}}}}}\bullet _n^m{\varvec{{{\mathcal {C}}}}}\) between two tensors \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I_1\times I_2 \ldots \times I_N}\) and \({\varvec{{{\mathcal {C}}}}} \in {\mathbb {C}}^{J_1\times J_2 \ldots \times J_N}\) represents an inner product of the nth mode of \({\varvec{{{\mathcal {A}}}}}\) with the mth mode of \({\varvec{{{\mathcal {C}}}}}\), provided that \(I_n = J_m\) [26]. Contraction along several modes of compatible dimensions is also possible and accordingly the contraction along two modes is denoted as \({\varvec{{{\mathcal {A}}}}}\bullet _{n,k}^{m,l}{\varvec{{{\mathcal {C}}}}}\). More specifically, the double contraction between the tensors \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{I \times J \times M \times N}\) and \({\varvec{{{\mathcal {C}}}}} \in {\mathbb {C}}^{M \times N \times K}\) is defined as [26],

$$\begin{aligned} ({\varvec{{{\mathcal {A}}}}}\bullet _{3,4}^{1,2}{\varvec{{{\mathcal {C}}}}})_{(i,j,k)} \triangleq \sum _{n=1}^{N}{\sum _{m=1}^{M}{\varvec{{{\mathcal {A}}}}}_{(i,j,m,n)}\cdot {\varvec{{{\mathcal {C}}}}}_{(m,n,k)}}={\varvec{{{\mathcal {T}}}}}_{(i,j,k)}. \end{aligned}$$

This example represents a contraction of the third and fourth mode of \({\varvec{{{\mathcal {A}}}}}\) with the first and second mode of \({\varvec{{{\mathcal {C}}}}}\), respectively.

Using the concept of the generalized unfoldings, it can be shown that the tensor contraction satisfies

$$\begin{aligned}{}[{\varvec{{{\mathcal {A}}}}}\bullet _{3,4}^{1,2}{\varvec{{{\mathcal {C}}}}}]_{([1,2],3)}&= [{\varvec{{{\mathcal {A}}}}}]_{([1,2],[3,4])}\cdot [{\varvec{{{\mathcal {C}}}}}]_{([1,2],3)}{,} \end{aligned}$$
(2)
$$\begin{aligned}{}[{\varvec{{{\mathcal {A}}}}}\bullet _{3,4}^{1,2}{\varvec{{{\mathcal {C}}}}}]_{([1,2],3)}&= [{\varvec{{{\mathcal {A}}}}}]_{([1,2],[4,3])}\cdot [{\varvec{{{\mathcal {C}}}}}]_{([2,1],3)}. \end{aligned}$$
(3)

In the generalized unfolding \([{\varvec{{{\mathcal {A}}}}}]_{([1,2],[3,4])}\) the first mode varies faster than the second mode between the rows and the third mode varies faster than the fourth mode between the columns.
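The relation between the double contraction and the generalized unfoldings in Eqs. (2) and (3) can be illustrated with a short NumPy sketch. It is only a sketch under assumed dimensions: the helper `unfold` is a hypothetical name, `np.tensordot` plays the role of the contraction operator, and modes are 0-based in the code.

```python
import numpy as np

def unfold(T, rows, cols):
    """Generalized unfolding; the first listed mode varies fastest."""
    P = np.transpose(T, list(rows) + list(cols))
    n_rows = int(np.prod([T.shape[m] for m in rows]))
    return P.reshape(n_rows, -1, order='F')

rng = np.random.default_rng(1)
I, J, M, N, K = 2, 3, 4, 5, 6
A = rng.standard_normal((I, J, M, N))
C = rng.standard_normal((M, N, K))

# Double contraction of modes 3,4 of A with modes 1,2 of C
# (0-based axes 2,3 and 0,1); the result has shape (I, J, K).
T = np.tensordot(A, C, axes=([2, 3], [0, 1]))

lhs = unfold(T, (0, 1), (2,))
eq2 = unfold(A, (0, 1), (2, 3)) @ unfold(C, (0, 1), (2,))   # Eq. (2)
eq3 = unfold(A, (0, 1), (3, 2)) @ unfold(C, (1, 0), (2,))   # Eq. (3)
print(np.allclose(lhs, eq2), np.allclose(lhs, eq3))
```

Both unfoldings give the same product because the contracted modes are ordered consistently in the columns of the first factor and the rows of the second.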

2.4 Contraction properties for element-wise and slice-wise multiplications

2.4.1 Hadamard product via tensor contraction

First, let us consider a Hadamard product (element-wise multiplication) between two vectors \({\varvec{{a}}} \in {\mathbb {C}}^{M \times 1}\) and \({\varvec{{b}}} \in {\mathbb {C}}^{M \times 1}\), \(c_{(m)}=a_{(m)}b_{(m)}\), \(\forall m=1,\ldots , M\) (\({\varvec{{c}}} \in {\mathbb {C}}^{M \times 1}\)). The Hadamard product can be expressed via the multiplication of a diagonal matrix and a vector, i.e., \({\varvec{{a}}} \odot {\varvec{{b}}} = \mathop {\mathrm{{diag}}}\left( {\varvec{{a}}}\right) {\varvec{{b}}} = \mathop {\mathrm{{diag}}}\left( {\varvec{{b}}}\right) {\varvec{{a}}}\). Using the fact that a matrix multiplication is equivalent to the contraction \({\bullet }_{2}^{1}\), we get \({\varvec{{a}}} \odot {\varvec{{b}}} = \mathop {\mathrm{{diag}}}\left( {\varvec{{a}}}\right) {\bullet }_{2}^{1}{\varvec{{b}}} = \mathop {\mathrm{{diag}}}\left( {\varvec{{b}}}\right) {\bullet }_{2}^{1}{\varvec{{a}}}.\)

Next, for the Hadamard product between two matrices \({\varvec{{A}}} \in {\mathbb {C}}^{M \times N}\) and \({\varvec{{B}}} \in {\mathbb {C}}^{M \times N}\), \({\varvec{{C}}}_{(m,n)}={\varvec{{A}}}_{(m,n)}{\varvec{{B}}}_{(m,n)}\), \(\forall m=1,\ldots , M\) and \(n=1,\ldots , N\), we can show that \({\varvec{{C}}} = {\varvec{{A}}} \odot {\varvec{{B}}} = {\varvec{{{\mathcal {D}}}}}_A{\bullet }_{2,4}^{1,2}{\varvec{{B}}} = {\varvec{{{\mathcal {D}}}}}_B{\bullet }_{2,4}^{1,2}{\varvec{{A}}}\). Here \({\varvec{{{\mathcal {D}}}}}_A \in {\mathbb {C}}^{M \times M \times N \times N}\) and \({\varvec{{{\mathcal {D}}}}}_B \in {\mathbb {C}}^{M \times M \times N \times N}\) are diagonal four-way tensors with nonzero elements \({{\varvec{{{\mathcal {D}}}}}_A}_{(m,m,n,n)} = {\varvec{{A}}}_{(m,n)}\) and \({{\varvec{{{\mathcal {D}}}}}_B}_{(m,m,n,n)} = {\varvec{{B}}}_{(m,n)}\), respectively. As an alternative, we also have

$$\begin{aligned} {\varvec{{C}}} = {\varvec{{A}}} \odot {\varvec{{B}}} = {\varvec{{{\mathcal {D}}}}}^{(A)}{\bullet }_{2,3}^{1,3}{\varvec{{{\mathcal {D}}}}}^{(B)}, \end{aligned}$$

where the diagonal three-way tensors have the following nonzero elements \({{\varvec{{{\mathcal {D}}}}}^{(A)}}_{(m,m,n)}={\varvec{{A}}}_{(m,n)}\) and \({{\varvec{{{\mathcal {D}}}}}^{(B)}}_{(m,n,n)}={\varvec{{B}}}_{(m,n)}\). Moreover, these diagonal three-way tensors can be defined either in terms of slices, \({{\varvec{{{\mathcal {D}}}}}^{(A)}}_{(.,.,n)}=\mathop {\mathrm{{diag}}}\left( {{\varvec{{A}}}_{(.,n)}}\right) , \forall n = 1,\ldots , N\), \({{\varvec{{{\mathcal {D}}}}}^{(B)}}_{(m,.,.)}=\mathop {\mathrm{{diag}}}\left( {{\varvec{{B}}}_{(m,.)}}\right) , \forall m = 1,\ldots , M\), or using tensor notation, \({{\varvec{{{\mathcal {D}}}}}^{(A)}} = {\varvec{{{\mathcal {I}}}}}_{3,M} \times _3 {\varvec{{A}}}^{\mathrm{T}}\) and \({{\varvec{{{\mathcal {D}}}}}^{(B)}} = {\varvec{{{\mathcal {I}}}}}_{3,N} \times _1 {\varvec{{B}}}\).
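The two contraction-based expressions for the Hadamard product of two matrices can be checked numerically. The sketch below is an illustration under assumed dimensions, with the diagonal tensors built via `np.einsum` and the contractions via `np.tensordot`; the variable names are hypothetical and modes are 0-based in the code.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 4
A = rng.standard_normal((M, N))
B = rng.standard_normal((M, N))

# Four-way diagonal tensor with D_A[m, m, n, n] = A[m, n].
D_A = np.einsum('mn,mp,nq->mpnq', A, np.eye(M), np.eye(N))
# Contract modes 2,4 of D_A with modes 1,2 of B (0-based: 1,3 and 0,1).
C_four_way = np.tensordot(D_A, B, axes=([1, 3], [0, 1]))

# Three-way diagonal tensors D^(A)[m, m, n] = A[m, n], D^(B)[m, n, n] = B[m, n].
D3_A = np.einsum('mn,mp->mpn', A, np.eye(M))
D3_B = np.einsum('mn,nq->mnq', B, np.eye(N))
# Contract modes 2,3 of D^(A) with modes 1,3 of D^(B) (0-based: 1,2 and 0,2).
C_three_way = np.tensordot(D3_A, D3_B, axes=([1, 2], [0, 2]))

# Tensor-notation definition D^(A) = I_{3,M} x_3 A^T.
I3 = np.einsum('ab,ac->abc', np.eye(M), np.eye(M))   # super-diagonal identity
D3_A_alt = np.einsum('abc,nc->abn', I3, A.T)

print(np.allclose(C_four_way, A * B), np.allclose(C_three_way, A * B))
```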

2.4.2 Slice-wise multiplication via tensor contraction

A slice-wise multiplication between two tensors \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{M \times N \times K}\) and \({\varvec{{{\mathcal {B}}}}} \in {\mathbb {C}}^{N \times J \times K}\) is defined as \({{\varvec{{{\mathcal {T}}}}}_1}_{(.,.,k)} = {\varvec{{{\mathcal {A}}}}}_{(.,.,k)}{\varvec{{{\mathcal {B}}}}}_{(.,.,k)}\), \(\forall k = 1,\ldots , K\). We depict this slice-wise multiplication in Fig. 1. To express this slice-wise multiplication we can diagonalize \({\varvec{{{\mathcal {B}}}}}\) to obtain

$$\begin{aligned} {\varvec{{{\mathcal {T}}}}}_1 = {\varvec{{{\mathcal {A}}}}}{\bullet }_{2,3}^{1,4}{\varvec{{{\mathcal {D}}}}}_B \in {\mathbb {C}}^{M\times J \times K}, \end{aligned}$$

where \({\varvec{{{\mathcal {D}}}}}_B\in {\mathbb {C}}^{N\times J \times K \times K}\) has nonzero elements \({{\varvec{{{\mathcal {D}}}}}_B}_{(n,j,k,k)} = {{\varvec{{{\mathcal {B}}}}}}_{(n,j,k)}\) or \({{\varvec{{{\mathcal {D}}}}}_{B}}_{(n,j,.,.)}=\mathop {\mathrm{{diag}}}\left( {{\varvec{{{\mathcal {B}}}}}}_{(n,j,.)}\right)\), for \(n =1,\ldots N\) and \(j =1,\ldots J\). Further combinations are also possible that lead to the same result, for instance, \({\varvec{{{\mathcal {T}}}}}_2 = {\varvec{{{\mathcal {D}}}}}_B{\bullet }_{1,4}^{2,3}{\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{J\times K \times M}\) or \({\varvec{{{\mathcal {T}}}}}_3 = {\varvec{{{\mathcal {D}}}}}_A{\bullet }_{2,4}^{1,3}{\varvec{{{\mathcal {B}}}}} \in {\mathbb {C}}^{M\times K \times J}\) with \({{\varvec{{{\mathcal {D}}}}}_A}_{(m,n,k,k)} = {{\varvec{{{\mathcal {A}}}}}}_{(m,n,k)}\) as diagonal elements (nonzero elements of \({\varvec{{{\mathcal {D}}}}}_A\)). Note that the tensors \({\varvec{{{\mathcal {T}}}}}_1\), \({\varvec{{{\mathcal {T}}}}}_2\), and \({\varvec{{{\mathcal {T}}}}}_3\) contain the same elements, but have permuted dimensions. However, the permuted order of the dimensions is not relevant, because we always explicitly declare which dimension is multiplied or unfolded.

Fig. 1 A slice-wise multiplication between two tensors \({\varvec{{{\mathcal {A}}}}} \in {\mathbb {C}}^{M \times N \times K}\) and \({\varvec{{{\mathcal {B}}}}} \in {\mathbb {C}}^{N \times J \times K}\)
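The slice-wise multiplication and its three contraction forms \({\varvec{{{\mathcal {T}}}}}_1\), \({\varvec{{{\mathcal {T}}}}}_2\), and \({\varvec{{{\mathcal {T}}}}}_3\) can be compared numerically. The following is a minimal sketch with assumed dimensions and hypothetical variable names; it uses a plain loop over the slices as the reference and `np.tensordot` for the contractions (0-based modes).

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, J, K = 2, 3, 4, 5
A = rng.standard_normal((M, N, K))
B = rng.standard_normal((N, J, K))

# Reference: multiply the k-th slices of A and B for every k.
T_ref = np.stack([A[:, :, k] @ B[:, :, k] for k in range(K)], axis=2)

# Diagonalized tensors: D_B[n, j, k, k] = B[n, j, k], D_A[m, n, k, k] = A[m, n, k].
D_B = np.einsum('njk,kq->njkq', B, np.eye(K))
D_A = np.einsum('mnk,kq->mnkq', A, np.eye(K))

# T1: modes 2,3 of A with modes 1,4 of D_B  -> shape (M, J, K)
T1 = np.tensordot(A, D_B, axes=([1, 2], [0, 3]))
# T2: modes 1,4 of D_B with modes 2,3 of A  -> shape (J, K, M)
T2 = np.tensordot(D_B, A, axes=([0, 3], [1, 2]))
# T3: modes 2,4 of D_A with modes 1,3 of B  -> shape (M, K, J)
T3 = np.tensordot(D_A, B, axes=([1, 3], [0, 2]))

print(np.allclose(T1, T_ref),
      np.allclose(T2, T_ref.transpose(1, 2, 0)),
      np.allclose(T3, T_ref.transpose(0, 2, 1)))
```

The transpositions in the last two comparisons make the permuted order of the dimensions explicit.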

2.4.3 Representation of diagonal matrices and diagonal tensors in terms of Khatri–Rao products

An explicit expression of the diagonalized tensor can be obtained by expressing its generalized unfolding in terms of a Khatri–Rao product with an identity matrix. First, let us consider the column vector \({\varvec{{a}}} \in {\mathbb {C}}^{M}\). It can be easily shown that

$$\begin{aligned} \mathop {\mathrm{{diag}}}\left( {\varvec{{a}}}\right) = {\varvec{{I}}}_M\diamond {\varvec{{a}}}^{\mathrm{T}}. \end{aligned}$$

Next, let us consider the reshaping of the matrix \({\varvec{{A}}}\in {\mathbb {C}}^{M\times N}\) into a diagonal tensor \({{\varvec{{{\mathcal {D}}}}}^{(A)}} = {\varvec{{{\mathcal {I}}}}}_{3,M} \times _3 {\varvec{{A}}}^{\mathrm{T}}\). By studying the resulting tensor structure, the tensor unfoldings, and the properties of the Khatri–Rao product, we get

$$\begin{aligned} {\left[ {\varvec{{{\mathcal {D}}}}}^{(A)} \right] _{([3,2],[1])}}= {\varvec{{I}}}_M \diamond {{{\varvec{{A}}}}}^{\mathrm{T}}. \end{aligned}$$

Likewise, for the tensor \({{\varvec{{{\mathcal {D}}}}}^{(B)}} = {\varvec{{{\mathcal {I}}}}}_{3,N} \times _1 {\varvec{{B}}}\in {\mathbb {C}}^{M\times N\times N}\) and the matrix \({\varvec{{B}}}\in {\mathbb {C}}^{ M \times N}\), we have \({\left[ {\varvec{{{\mathcal {D}}}}}^{(B)} \right] _{([1,3],[2])}}= {\varvec{{I}}}_N \diamond {{{\varvec{{B}}}}}\).
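The three Khatri–Rao identities of this subsection can be verified with a small NumPy sketch. The helpers `khatri_rao` and `unfold` are hypothetical names introduced for illustration, the dimensions are arbitrary, and modes are 0-based in the code.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R)."""
    I, R = A.shape
    J, _ = B.shape
    return np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

def unfold(T, rows, cols):
    """Generalized unfolding; the first listed mode varies fastest."""
    P = np.transpose(T, list(rows) + list(cols))
    n_rows = int(np.prod([T.shape[m] for m in rows]))
    return P.reshape(n_rows, -1, order='F')

rng = np.random.default_rng(4)
M, N = 3, 4
a = rng.standard_normal(M)
A = rng.standard_normal((M, N))
B = rng.standard_normal((M, N))

# diag(a) = I_M ⋄ a^T
diag_ok = np.allclose(np.diag(a), khatri_rao(np.eye(M), a[None, :]))

# D^(A)[m, m, n] = A[m, n]  and  [D^(A)]_([3,2],[1]) = I_M ⋄ A^T
D3_A = np.einsum('mn,mp->mpn', A, np.eye(M))
unf_A_ok = np.allclose(unfold(D3_A, (2, 1), (0,)), khatri_rao(np.eye(M), A.T))

# D^(B)[m, n, n] = B[m, n]  and  [D^(B)]_([1,3],[2]) = I_N ⋄ B
D3_B = np.einsum('mn,nq->mnq', B, np.eye(N))
unf_B_ok = np.allclose(unfold(D3_B, (0, 2), (1,)), khatri_rao(np.eye(N), B))

print(diag_ok, unf_A_ok, unf_B_ok)
```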

Table 1 Link between the diagonalized tensor structures and their generalized unfoldings

The expression of the diagonalized tensor in terms of its generalized unfoldings and the Khatri–Rao product with an identity matrix can also be obtained for N-way tensors. It is useful to note that there exists a link between the diagonalized tensor structures and their corresponding generalized unfoldings. The latter can always be expressed as a Khatri–Rao product between an identity matrix and a generalized unfolding of the tensor to be diagonalized, where the dimensions that are diagonalized are in the columns of the second matrix. This notation will be used later in this paper, and it is given in Table 1.

The element-wise or slice-wise multiplication between two arrays of the same order can be written in terms of a contraction if the unaffected mode vectors are transformed into a diagonal matrix (by adding an additional array dimension). This diagonalization can be performed using the Khatri–Rao product as shown in Table 1. As an example, please refer to the transformation of Eq. (4) to the equations at the beginning of Sect. 3.3 in this paper.

3 MIMO-OFDM

We assume a MIMO-OFDM system with \(M_{{\mathrm{T}}}\) transmit and \(M_{\mathrm{R}}\) receive antennas. One OFDM block consists of N samples, which equals the discrete Fourier transform (DFT) length, using the assumption that all N subcarriers are used for data transmission. If guard subcarriers are used, i.e., not all subcarriers are used for data transmission, the number of OFDM samples is smaller than the DFT length. All signals and equations used for the following derivation are in the frequency domain. Moreover, N is the number of subcarriers and K denotes the number of transmitted frames, where each frame consists of N symbol periods. The received signal in the frequency domain \(\tilde{{\varvec{{{\mathcal {Y}}}}}} \in {\mathbb {C}}^{N\times M_{\mathrm{R}} \times K}\) after the removal of the cyclic prefix is defined by means of the contraction operator

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {Y}}}}}} = \tilde{{\varvec{{{\mathcal {H}}}}}}\bullet _{2,4}^{1,2}\tilde{{\varvec{{{\mathcal {S}}}}}} + \tilde{{\varvec{{{\mathcal {N}}}}}} = \tilde{{\varvec{{{\mathcal {Y}}}}}}_0 + \tilde{{\varvec{{{\mathcal {N}}}}}}. \end{aligned}$$
(4)

We use \(\sim\) to distinguish the frequency domain from the time domain, i.e., \(\tilde{{\varvec{{{\mathcal {Y}}}}}}={{\varvec{{{\mathcal {Y}}}}}}\times _1{\varvec{{F}}}_N\), where \({\varvec{{F}}}_N\in {\mathbb {C}}^{N\times N}\) is the DFT matrix and \({\varvec{{{\mathcal {Y}}}}}\) is the received signal in the time domain. The transmit signal tensor is denoted as \(\tilde{{\varvec{{{\mathcal {S}}}}}} \in {\mathbb {C}}^{N \times M_{{\mathrm{T}}} \times K}\) and \(\tilde{{\varvec{{{\mathcal {N}}}}}} \in {\mathbb {C}}^{N \times M_{\mathrm{R}} \times K}\) represents the additive white Gaussian noise in the frequency domain. The tensor \(\tilde{{\varvec{{{\mathcal {Y}}}}}}_0 \in {\mathbb {C}}^{N\times M_{\mathrm{R}} \times K}\) represents the noiseless received signal in the frequency domain after the removal of the cyclic prefix. The frequency-selective propagation channel is represented by a channel tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}} \in {\mathbb {C}}^{N\times N \times M_{\mathrm{R}} \times M_{{\mathrm{T}}}}\), as we propose in [18]. Its structure is detailed in the following.

3.1 Channel tensor

We assume that the frequency-selective channel has an impulse response \({\varvec{{h}}}_L^{(m_{\mathrm{R}},m_{{\mathrm{T}}})} \in {\mathbb {C}}^{L \times 1}\), for each receive–transmit antenna pair, \((m_{\mathrm{R}},m_{{\mathrm{T}}})\), for \(m_{\mathrm{R}} = 1\ldots M_{\mathrm{R}}\) and \(m_{{\mathrm{T}}} = 1\ldots M_{{\mathrm{T}}}\), and a maximum of L taps. After the removal of the cyclic prefix, the channel matrix in the frequency domain is a diagonal matrix for each receive–transmit antenna pair, \(\tilde{{\varvec{{H}}}}^{(m_{\mathrm{R}},m_{{\mathrm{T}}})}=\mathrm{{ diag}}\left( {\varvec{{F}}}_{N\times L}\cdot {\varvec{{h}}}_L^{(m_{\mathrm{R}},m_{{\mathrm{T}}})}\right) \in {\mathbb {C}}^{N \times N}\) [10, 11]. Here, the matrix \({\varvec{{F}}}_{N\times L} \in {\mathbb {C}}^{N \times L}\) contains the first L columns of the DFT matrix of size \(N \times N\). Collecting all the channel matrices in a four-way channel tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\), we get

$$\begin{aligned}&\tilde{{\varvec{{{\mathcal {H}}}}}}_{(.,.,m_{\mathrm{R}},m_{{\mathrm{T}}})} = \mathrm{{ diag}}\left( {\varvec{{F}}}_{N\times L}\cdot {\varvec{{h}}}_L^{(m_{\mathrm{R}},m_{{\mathrm{T}}})}\right) =\mathrm{{ diag}}\left( \tilde{{\varvec{{h}}}}^{(m_{\mathrm{R}},m_{{\mathrm{T}}})}\right) . \end{aligned}$$
(5)

For each receive–transmit antenna pair, the channel transfer matrix is a diagonal matrix that is represented by the corresponding slice of the tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) as shown in (5). The vector \(\tilde{{\varvec{{h}}}}^{(m_{\mathrm{R}},m_{{\mathrm{T}}})} \in {\mathbb {C}}^{N\times 1}\) contains the frequency domain channel coefficients. An example of a MIMO system with \(M_{{\mathrm{T}}}=2\) transmit antennas and \(M_{\mathrm{R}}=3\) receive antennas and the corresponding channel vectors is depicted in Fig. 2. We assume that the channel stays constant during the K frames. Note that only in case of cyclic prefix OFDM the channel tensor in the frequency domain contains diagonal matrices for each receive–transmit antenna pair. In a general multi-carrier system, the frequency domain channel matrix is not necessarily diagonal. However, Eq. (4) is still satisfied which means that our general model is valid for any multi-carrier MIMO system (not only OFDM-based), including systems without orthogonality in the frequency domain and systems with different types of coding.
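To make Eqs. (4) and (5) concrete, the sketch below builds the four-way channel tensor from random impulse responses and checks that the double contraction reproduces the familiar subcarrier-wise MIMO multiplication in the noiseless case. The dimensions, the random data, the variable names, and the unnormalized DFT matrix obtained from `np.fft.fft` are assumptions made for illustration only; modes are 0-based in the code.

```python
import numpy as np

rng = np.random.default_rng(5)
N, L, M_R, M_T, K = 8, 3, 3, 2, 4

def crandn(*shape):
    """Complex Gaussian samples (illustrative data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

h_time = crandn(L, M_R, M_T)          # impulse responses, L taps per antenna pair
F = np.fft.fft(np.eye(N))             # N x N DFT matrix (unnormalized, assumed)

# Frequency-domain coefficients: first L columns of F times the impulse response.
h_freq = np.einsum('nl,lrt->nrt', F[:, :L], h_time)          # (N, M_R, M_T)

# Channel tensor, Eq. (5): H[:, :, m_R, m_T] = diag(h_freq[:, m_R, m_T]).
H = np.einsum('nrt,np->nprt', h_freq, np.eye(N))             # (N, N, M_R, M_T)

S = crandn(N, M_T, K)                 # transmit signal tensor

# Eq. (4) without noise: contract modes 2,4 of H with modes 1,2 of S.
Y0 = np.tensordot(H, S, axes=([1, 3], [0, 1]))               # (N, M_R, K)

# Reference: one M_R x M_T channel matrix per subcarrier.
Y_ref = np.stack([h_freq[n] @ S[n] for n in range(N)], axis=0)

print(np.allclose(Y0, Y_ref))
```

For a general multi-carrier system, the same contraction applies with non-diagonal slices in place of the diagonal ones constructed here.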

Fig. 2 A MIMO system with \(M_{{\mathrm{T}}}=2\) transmit antennas and \(M_{\mathrm{R}}=3\) receive antennas (left-hand side). Visualization of the generalized unfolding \([\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[2,4])}\) for the MIMO-OFDM (right-hand side)

In (5), we have defined the channel tensor. However, up to this point, we have not revealed the explicit tensor structure. In order to do so, let us first assume that all channel transfer matrices for the \(m_{{\mathrm{T}}}\)th transmit and all receive antennas are collected in a diagonal tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})} \in {\mathbb {C}}^{N\times N \times M_{\mathrm{R}}}\), i.e.,

$$\begin{aligned}&\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}(.,.,m_{\mathrm{R}})}^{(m_{{\mathrm{T}}})} = \mathrm{{diag}}\left( \tilde{{\varvec{{h}}}}^{(m_{\mathrm{R}},m_{{\mathrm{T}}})}\right) . \end{aligned}$$
(6)

Based on this diagonal structure, the tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}\) can be written as the following CP decomposition:

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})} = {\varvec{{{\mathcal {I}}}}}_{3,N}\times _1{\varvec{{I}}}_N\times _2{\varvec{{I}}}_N\times _3\tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}, \end{aligned}$$
(7)

where \(\tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})} =\begin{bmatrix} \tilde{{\varvec{{h}}}}^{(1,m_{{\mathrm{T}}})}&\tilde{{\varvec{{h}}}}^{(2,m_{{\mathrm{T}}})}&\ldots&\tilde{{\varvec{{h}}}}^{(M_{\mathrm{R}},m_{{\mathrm{T}}})} \end{bmatrix}^{\mathrm{T}}\in {\mathbb {C}}^{M_{\mathrm{R}} \times N}\).

The complete four-way channel tensor, defined in Eq. (5), can be obtained by concatenating the \(\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}\) tensors along the fourth dimension. Hence, the four-way channel tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) can be expressed as

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {H}}}}} }&=\begin{bmatrix} \tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(1)}&\sqcup _4&\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(2)}&\sqcup _4&\ldots&\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(M_{{\mathrm{T}}})} \end{bmatrix} = \sum _{m_{{\mathrm{T}}}=1}^{M_{{\mathrm{T}}}} \tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})} \circ {\varvec{{e}}}_{m_{{\mathrm{T}}}} \\&= \sum _{m_{{\mathrm{T}}}=1}^{M_{{\mathrm{T}}}} {\varvec{{{\mathcal {D}}}}}\times _1{\varvec{{I}}}_N\times _2{\varvec{{I}}}_N\times _3\tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}\times _4 {\varvec{{e}}}_{m_{{\mathrm{T}}}}. \end{aligned}$$
(8)

Note that \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) satisfies a very special block term decomposition (BTD), where \({\varvec{{{\mathcal {D}}}}}_{(.,.,.,1)}={\varvec{{{\mathcal {I}}}}}_{3,N} \in {\mathbb {R}}^{N\times N\times N\times 1}\) (\({\varvec{{{\mathcal {D}}}}}={\varvec{{{\mathcal {I}}}}}_{4,1}\otimes {\varvec{{{\mathcal {I}}}}}_{3,N}\)) and \({\varvec{{e}}}_{m_{{\mathrm{T}}}}\in {\mathbb {R}}^{M_{{\mathrm{T}}}\times 1}\) is a pinning vector. The BTD decomposes a tensor into block terms of smaller n-mode ranks [27]. We prove the BTD structure of the channel tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) in the Appendix. There, we also show that the ([1, 3], [2, 4]) generalized unfolding of the channel tensor can be expressed as

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[2,4])} = \tilde{{\varvec{{H}}}}\diamond ({\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}\otimes {\varvec{{I}}}_{N}) \in {\mathbb {C}}^{NM_{\mathrm{R}} \times NM_{{\mathrm{T}}}}, \end{aligned}$$
(9)

where \(\tilde{{\varvec{{H}}}} \in {\mathbb {C}}^{ M_{\mathrm{R}} \times NM_{{\mathrm{T}}}}\) is a matrix containing all nonzero elements of the tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) and it is defined as

$$\begin{aligned} \tilde{{\varvec{{H}}}}&= \left[ \begin{array}{c|c|c|c} \tilde{{\varvec{{h}}}}^{(1,1){\mathrm{T}}} &{} \tilde{{\varvec{{h}}}}^{(1,2){\mathrm{T}}} &{} \ldots &{} \tilde{{\varvec{{h}}}}^{(1,M_{{\mathrm{T}}}){\mathrm{T}}}\\ \vdots &{} \vdots &{}\vdots &{} \vdots \\ \tilde{{\varvec{{h}}}}^{(M_{\mathrm{R}},1){\mathrm{T}}} &{} \tilde{{\varvec{{h}}}}^{(M_{\mathrm{R}},2){\mathrm{T}}} &{} \ldots &{} \tilde{{\varvec{{h}}}}^{(M_{\mathrm{R}},M_{{\mathrm{T}}}){\mathrm{T}}} \end{array} \right] \\&= \left[ \begin{array}{cccc} \tilde{{\varvec{{H}}}}^{(1)}_{\mathrm{R}}&\tilde{{\varvec{{H}}}}^{(2)}_{\mathrm{R}}&\ldots&\tilde{{\varvec{{H}}}}^{(M_{{\mathrm{T}}})}_{\mathrm{R}} \end{array} \right] \in {\mathbb {C}}^{ M_{\mathrm{R}} \times NM_{{\mathrm{T}}}}. \end{aligned}$$
(10)

Figure 2 (right-hand side) depicts the structure of the generalized unfolding \([\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[2,4])}\) for a MIMO-OFDM system with parameters \(M_{{\mathrm{T}}} = 2\), \(M_{\mathrm{R}} = 3\), and \(N = 3\).
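The structure of the generalized unfolding in Eq. (9) can be reproduced numerically for the parameters of Fig. 2. The sketch below is an illustration only: the frequency-domain coefficients are random placeholders, the helpers `khatri_rao` and `unfold` are hypothetical names, and modes are 0-based in the code.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R)."""
    I, R = A.shape
    J, _ = B.shape
    return np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

def unfold(T, rows, cols):
    """Generalized unfolding; the first listed mode varies fastest."""
    P = np.transpose(T, list(rows) + list(cols))
    n_rows = int(np.prod([T.shape[m] for m in rows]))
    return P.reshape(n_rows, -1, order='F')

rng = np.random.default_rng(6)
N, M_R, M_T = 3, 3, 2
# Placeholder frequency-domain channel coefficients, indexed (n, m_R, m_T).
h_freq = rng.standard_normal((N, M_R, M_T)) + 1j * rng.standard_normal((N, M_R, M_T))

# Channel tensor with diagonal slices, Eq. (5).
H = np.einsum('nrt,np->nprt', h_freq, np.eye(N))              # (N, N, M_R, M_T)

# Eq. (10): blocks H_R^(m_T) of size M_R x N, stacked side by side.
H_mat = np.concatenate([h_freq[:, :, t].T for t in range(M_T)], axis=1)

# Eq. (9): [H]_([1,3],[2,4]) = H_mat ⋄ (1^T ⊗ I_N).
selector = np.kron(np.ones((1, M_T)), np.eye(N))              # (N, N*M_T)
lhs = unfold(H, (0, 2), (1, 3))
rhs = khatri_rao(H_mat, selector)
print(np.allclose(lhs, rhs))
```

Each of the \(M_{\mathrm{R}} \times M_{{\mathrm{T}}}\) blocks of the resulting matrix is an \(N \times N\) diagonal matrix, one per receive–transmit antenna pair.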

3.2 Data transmission

The signal tensor \(\tilde{{\varvec{{{\mathcal {S}}}}}}\) in Eq. (4) contains all data symbols in the frequency domain that are transmitted on N subcarriers, \(M_{{\mathrm{T}}}\) transmit antennas, and K frames. For notational simplicity, we define the following block matrix \(\tilde{{\varvec{{S}}}}\) as the transpose of the three-mode unfolding of \(\tilde{{\varvec{{{\mathcal {S}}}}}}\)

$$\begin{aligned} \tilde{{\varvec{{S}}}} = \tilde{{\varvec{{{\mathcal {S}}}}}}_{([1,2],[3])}^{\mathrm{T}}= \left[ \begin{array}{cccc} \tilde{{\varvec{{S}}}}^{(1)}&\tilde{{\varvec{{S}}}}^{(2)}&\ldots&\tilde{{\varvec{{S}}}}^{(M_{{\mathrm{T}}})} \end{array} \right] \in {\mathbb {C}}^{K \times NM_{{\mathrm{T}}}}, \end{aligned}$$
(11)

where \(\tilde{{\varvec{{S}}}}^{(m_{{\mathrm{T}}})} \in {\mathbb {C}}^{K \times N}\) contains the symbols transmitted via the \(m_{{\mathrm{T}}}\)th antenna.

Moreover, we assume that the symbol matrix consists of data and pilot symbols, \(\tilde{{\varvec{{S}}}}= \tilde{{\varvec{{S}}}}_{\mathrm{{d}}}+\tilde{{\varvec{{S}}}}_{\mathrm{{p}}}\). The matrices \(\tilde{{\varvec{{S}}}}_{\mathrm{{d}}}\) and \(\tilde{{\varvec{{S}}}}_{\mathrm{{p}}}\) represent the data symbols and the pilot symbols, respectively. The matrix \(\tilde{{\varvec{{S}}}}_{\mathrm{{d}}}\) contains zeros at the positions of the pilot symbols. Accordingly, the matrix \(\tilde{{\varvec{{S}}}}_{\mathrm{{p}}}\) contains nonzero elements only at the pilot positions. Typically, there are three ways of arranging the pilot symbols within the OFDM blocks (block, comb, and lattice-type) [7]. We assume a comb-type arrangement, where the pilot symbols are placed on non-consecutive positions with equidistant spacing in the time and the frequency domains, for each antenna. The spacing in the time domain is denoted by \(\Delta K\). Moreover, we assume a spacing in the frequency domain of \(\Delta F\) between two pilot symbols. Furthermore, there are positions where neither pilot symbols nor data symbols are allowed to be transmitted. These positions are reserved for the pilot symbols corresponding to the remaining antennas. This results in \(M_{{\mathrm{T}}}\lfloor \frac{N}{\Delta F}\rfloor\) pilot symbols per frame. In comparison, other publications such as [4, 12,13,14] use \(N{M_{{\mathrm{T}}}}\) pilot symbols per frame. By exploiting the channel correlation among adjacent subcarriers, a reduced number of pilot symbols can be used for channel estimation.
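The pilot arrangement described above can be illustrated with a small sketch. The exact pilot offsets are not specified in the text, so the code below makes an assumption: antenna \(m_{\mathrm{T}}\) uses the subcarriers \(m_{\mathrm{T}}, m_{\mathrm{T}}+\Delta F, \ldots\) in every \(\Delta K\)th frame, which requires \(M_{\mathrm{T}} \le \Delta F\). The function name `comb_pilot_masks` and its arguments are hypothetical.

```python
import numpy as np

def comb_pilot_masks(N, K, M_T, dF, dK):
    """Return boolean masks of shape (M_T, K, N).

    pilot[m, k, n]    : antenna m sends a pilot in frame k on subcarrier n
    reserved[m, k, n] : antenna m stays silent because another antenna sends a pilot
    """
    if M_T > dF:
        raise ValueError("this sketch needs M_T <= dF so that the offsets do not collide")
    n_pil = N // dF                          # floor(N / dF) pilots per antenna and pilot frame
    frames = np.arange(0, K, dK)             # assumed: pilots in every dK-th frame
    pilot = np.zeros((M_T, K, N), dtype=bool)
    for m in range(M_T):
        sub = m + dF * np.arange(n_pil)      # assumed: antenna offset equals antenna index
        pilot[m][np.ix_(frames, sub)] = True
    reserved = pilot.any(axis=0)[None] & ~pilot
    return pilot, reserved

pilot, reserved = comb_pilot_masks(N=16, K=8, M_T=2, dF=4, dK=4)
pilots_in_first_frame = int(pilot[:, 0, :].sum())   # M_T * floor(N / dF)
```

Data symbols would then occupy the positions where neither `pilot` nor `reserved` is set.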

3.3 Receiver design

Using the property of the generalized unfoldings in Eq. (2), the received signal in Eq. (4) becomes

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([1, 2],[3])} = [\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[2,4])}\tilde{{\varvec{{{\mathcal {S}}}}}}_{([1,2],[3])} + [\tilde{{\varvec{{{\mathcal {N}}}}}}]_{([1, 2],[3])}. \end{aligned}$$

Next, by substituting the corresponding tensor unfoldings in the above equation, we get

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([1, 2],[3])} = \left( \tilde{{\varvec{{H}}}}\diamond ({\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}\otimes {\varvec{{I}}}_{N})\right) \cdot \tilde{{\varvec{{S}}}}^{\mathrm{T}}+ [\tilde{{\varvec{{{\mathcal {N}}}}}}]_{([1, 2],[3])}. \end{aligned}$$

The above equation represents an unfolding of a noisy observation of a low-rank tensor with a CP structure. By applying the inverse unfolding to the received signal in the frequency domain after the removal of the cyclic prefix, we get the desired tensor description of the received data tensor

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {Y}}}}}} = {\varvec{{{\mathcal {I}}}}}_{3,NM_{{\mathrm{T}}}} \times _1 ({\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}\otimes {\varvec{{I}}}_{N}) \times _2 \tilde{{\varvec{{H}}}} \times _3 \tilde{{\varvec{{S}}}} + \tilde{{\varvec{{{\mathcal {N}}}}}} {\in {\mathbb {C}}^{N\times M_{\mathrm{R}} \times K}}. \end{aligned}$$
(12)
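As a sanity check of Eq. (12), the following sketch builds the noiseless constrained CP model with random factors and compares every frontal slice with the per-subcarrier product of a channel matrix and a symbol matrix. The column ordering (subcarrier index fastest, column \(m_{\mathrm{T}}N+n\) in zero-based indexing) is the one assumed from Eqs. (10) and (11); all variable names are illustrative.

```python
import numpy as np

M_T, M_R, N, K = 2, 3, 4, 5
rng = np.random.default_rng(1)

def crandn(*shape):
    """Complex Gaussian test data (illustrative only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H_tilde = crandn(M_R, N * M_T)                 # Eq. (10), column index = m_T * N + n
S_tilde = crandn(K, N * M_T)                   # Eq. (11), same column ordering
A = np.kron(np.ones((1, M_T)), np.eye(N))      # known one-mode factor, N x N*M_T

# Eq. (12) without noise: Y0[n, m_R, k] = sum_r A[n, r] * H[m_R, r] * S[k, r]
Y0 = np.einsum('nr,mr,kr->nmk', A, H_tilde, S_tilde)

# Slice-wise description: Y_n = H_n S_n on every subcarrier
err = 0.0
for n in range(N):
    H_n = H_tilde[:, n::N]                     # M_R x M_T, columns of subcarrier n
    S_n = S_tilde[:, n::N].T                   # M_T x K
    err = max(err, np.abs(Y0[n] - H_n @ S_n).max())
```

The maximum deviation `err` stays at machine precision, which shows that the constrained one-mode factor simply selects the \(M_{\mathrm{T}}\) rank-one terms that belong to the same subcarrier.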

Note that this model is a constrained CP-like model where the one-mode factor is a known constraint matrix. Our goal is to exploit (12) to jointly estimate the channel and the symbols, i.e., \(\tilde{{\varvec{{H}}}}\) and \(\tilde{{\varvec{{S}}}}\). The author of [16] proposes a similar model for the received signal of FBMC systems. In contrast to the model derived in this paper from contractions, the model in [16] is derived from the PARATUCK2 model. This means that the received signal should fit the PARATUCK2 decomposition in order to satisfy the received signal structure. On the other hand, the proposed derivation based on contractions in (4) is more general and it holds without such an assumption. More specifically, the proposed tensor contraction formalism that defines the signal model in Eq. (5) does not require the matrix slices \(\tilde{{\varvec{{H}}}}\) defined in Eq. (6) to be diagonal. Therefore, the proposed model and the derived algorithms remain valid for nonorthogonal multi-carrier systems with an arbitrary structure of the equivalent channel tensor in Eq. (4). This aspect is not captured by the tensor modeling approach of [16].

Using the prior knowledge of the pilot symbols and their positions, the channel in the frequency domain can be estimated. Naturally, the channel is estimated only at those subcarrier positions where the pilot symbols are located. Afterwards, an interpolation is applied to get the complete channel estimate. Alternatively, as shown in [10, 11] the channel can be first estimated in the time domain and then transformed into the frequency domain. Either way, this leads to a pilot-based channel estimate that we denote as \({\tilde{{\varvec{{H}}}}}_{\mathrm{{p}}}\), or \(\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{{p}}}\) (see Footnote 1). The pilot-based channel estimate is then used to estimate the data symbols. In the remainder of this section, we discuss different ways to estimate the symbols. We use the pilot-based channel estimate to initialize the proposed algorithms.

Traditionally, the estimate of the symbols is obtained in the frequency domain with a ZF receiver. In this case, the symbols are calculated by inverting the channel matrix for each subcarrier individually.

Alternatively, if we compute the one-mode unfolding of the tensor \({\tilde{{\varvec{{{\mathcal {Y}}}}}}}\) in Eq. (12), we get

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([1],[2, 3])} = ({\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}\otimes {\varvec{{I}}}_{N})\cdot \left( \tilde{{\varvec{{S}}}}\diamond \tilde{{\varvec{{H}}}}\right) ^{\mathrm{T}}+[\tilde{{\varvec{{{\mathcal {N}}}}}}]_{([1],[2, 3])}. \end{aligned}$$

Taking into account the structure of the matrices \(({\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}\otimes {\varvec{{I}}}_{N}) \in {\mathbb {R}}^{N \times NM_{{\mathrm{T}}}}\), \(\tilde{{\varvec{{H}}}}\) in (10), and \(\tilde{{\varvec{{S}}}}\) in (11), the one-mode unfolding becomes

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([1],[2, 3])} = \sum _{m_{{\mathrm{T}}}=1}^{M_{{\mathrm{T}}}}\left( \tilde{{\varvec{{S}}}}^{(m_{{\mathrm{T}}})}\diamond \tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}\right) ^{\mathrm{T}}+[\tilde{{\varvec{{{\mathcal {N}}}}}}]_{([1],[2, 3])}. \end{aligned}$$

After transposition and omitting the noise term, we get

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([2, 3],[1])} \approx \sum _{m_{{\mathrm{T}}}=1}^{M_{{\mathrm{T}}}}\left( \tilde{{\varvec{{S}}}}^{(m_{{\mathrm{T}}})}\diamond \tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}\right) \in {\mathbb {C}}^{M_{\mathrm{R}}K \times N}. \end{aligned}$$

This sum of Khatri–Rao products can be resolved in a column-wise fashion. Let \(\tilde{{\varvec{{y}}}}_n\in {\mathbb {C}}^{M_{\mathrm{R}}K \times 1}\) denote the nth column of \([\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([2, 3],[1])} \in {\mathbb {C}}^{M_{\mathrm{R}}K \times N}\). After reshaping this vector into the matrix \(\tilde{{\varvec{{Y}}}}_n \in {\mathbb {C}}^{M_{\mathrm{R}}\times K}\), such that \(\tilde{{\varvec{{y}}}}_n=\mathrm{{vec}}(\tilde{{\varvec{{Y}}}}_n)\), it is easy to see that this matrix satisfies

$$\begin{aligned} \tilde{{\varvec{{Y}}}}_n \approx \tilde{{\varvec{{H}}}}_n\cdot \tilde{{\varvec{{S}}}}_n, \end{aligned}$$
(13)

where \(\tilde{{\varvec{{H}}}}_n\) and \(\tilde{{\varvec{{S}}}}_n\) are the nth slices of \(\tilde{{\varvec{{{\mathcal {H}}}}}}_{(n,n,.,.)}\in {\mathbb {C}}^{M_{\mathrm{R}} \times M_{{\mathrm{T}}}}\) and \(\tilde{{\varvec{{{\mathcal {S}}}}}}_{(n,.,.)}\in {\mathbb {C}}^{M_{{\mathrm{T}}} \times K}\), respectively. Note that \(\tilde{{\varvec{{Y}}}}_n\) is the nth slice of \({\tilde{{\varvec{{{\mathcal {Y}}}}}}}_{(n,.,.)}\). Using the pseudo inverse of the channel, we get the traditional ZF receiver.
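A minimal sketch of the subcarrier-wise ZF receiver implied by Eq. (13) is given below. It assumes that the channel estimate is stored as one \(M_{\mathrm{R}} \times M_{\mathrm{T}}\) matrix per subcarrier and that \(M_{\mathrm{R}} \ge M_{\mathrm{T}}\); the function name `zf_receiver` and the BPSK test symbols are illustrative choices.

```python
import numpy as np

def zf_receiver(Y, H_p):
    """Zero-forcing symbol estimate on every subcarrier.

    Y   : (N, M_R, K) received tensor, Y[n] is the slice Y_n
    H_p : (N, M_R, M_T) channel estimate, H_p[n] is the slice H_n
    Returns an array of shape (N, M_T, K) with pinv(H_n) @ Y_n for every n.
    """
    return np.linalg.pinv(H_p) @ Y             # pinv and matmul act on the last two axes

rng = np.random.default_rng(2)
N, M_R, M_T, K = 4, 3, 2, 6
H = rng.standard_normal((N, M_R, M_T)) + 1j * rng.standard_normal((N, M_R, M_T))
S = rng.choice([-1.0, 1.0], size=(N, M_T, K)) + 0j   # BPSK symbols for illustration
Y = H @ S                                            # Eq. (13) without noise
S_zf = zf_receiver(Y, H)
```

With noise, the ZF output would additionally be mapped to the nearest constellation point before demodulation.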

Alternatively, the channel and the symbols on each subcarrier can be estimated by means of iterative or recursive LS algorithms. Similar algorithms were proposed in [28] and [29] for blind source separation on a single subcarrier. We extend two of the algorithms presented in [29] that are based on projection to our application. We have proposed an extension of these algorithms using enumeration in [17], namely iterative least squares with projection (ILSP) and recursive least squares with projection (RLSP). In this paper, our focus is on finite alphabet projection-based algorithms since they are computationally less expensive than the algorithms based on enumeration.
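The idea behind an iterative LS scheme with projection can be sketched for a single subcarrier as follows. This is a simplified illustration under assumptions, not the exact algorithm of [29] or [17]: it assumes a QPSK alphabet, a fixed number of iterations, and an initial channel estimate (for instance the pilot-based one); the names `project` and `ilsp` are hypothetical.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def project(S, alphabet=QPSK):
    """Map every entry of S to the nearest point of the finite alphabet."""
    idx = np.abs(S[..., None] - alphabet).argmin(axis=-1)
    return alphabet[idx]

def ilsp(Y_n, H_init, n_iter=10):
    """Alternate between an LS symbol estimate with projection and an LS channel update."""
    H = H_init
    for _ in range(n_iter):
        S = project(np.linalg.pinv(H) @ Y_n)   # LS estimate of the symbols, then projection
        H = Y_n @ np.linalg.pinv(S)            # LS estimate of the channel given the symbols
    return H, S

rng = np.random.default_rng(3)
M_R, M_T, K = 4, 2, 50
H_true = rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))
S_true = rng.choice(QPSK, size=(M_T, K))
Y_n = H_true @ S_true                          # noiseless version of Eq. (13)
H_0 = H_true + 0.01 * (rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T)))
H_hat, S_hat = ilsp(Y_n, H_0)
```

Because the projection step only needs a nearest-neighbor search per symbol, its cost grows linearly with the alphabet size, whereas an enumeration over all symbol vectors grows exponentially in \(M_{\mathrm{T}}\).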

The identifiability properties of the problem in Eq. (13) have already been studied in [29], where the authors present sufficient conditions for identifiability.

4 Khatri–Rao-coded MIMO-OFDM

In this section, we model a Khatri–Rao-coded MIMO-OFDM communication system as a double tensor contraction between a channel and a signal tensor that contains coded symbols. This double tensor contraction is essentially equivalent to the model in (4). However, we assume that the signal tensor contains Khatri–Rao-coded symbols.

As in Sect. 3, we assume a MIMO-OFDM communication system with \(M_{{\mathrm{T}}}\) transmit and \(M_{\mathrm{R}}\) receive antennas. One OFDM block consists of N samples, which equals the DFT length. Moreover, all N subcarriers are used for data transmission. Furthermore, we assume a frequency-selective channel model that stays constant over the transmission of P frames. In contrast to the model presented in Sect. 3, here, we assume that the P frames are divided into K groups of Q blocks (Q corresponds to the spreading factor), \(P = K\cdot Q\).

Accordingly, the received signal in the frequency domain is given by

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {Y}}}}}} = \tilde{{\varvec{{{\mathcal {H}}}}}}\bullet _{2,4}^{1,2}\tilde{{\varvec{{{\mathcal {X}}}}}} + \tilde{{\varvec{{{\mathcal {N}}}}}} = \tilde{{\varvec{{{\mathcal {Y}}}}}}_0+ \tilde{{\varvec{{{\mathcal {N}}}}}} \in {\mathbb {C}}^{N \times M_{\mathrm{R}}\times K\times Q}, \end{aligned}$$
(14)

where \(\tilde{{\varvec{{{\mathcal {H}}}}}} \in {\mathbb {C}}^{N \times N \times M_{\mathrm{R}} \times M_{{\mathrm{T}}}}\) is the channel tensor and \(\tilde{{\varvec{{{\mathcal {X}}}}}} \in {\mathbb {C}}^{N \times M_{{\mathrm{T}}} \times K \times Q}\) is the signal tensor. The tensor \(\tilde{{\varvec{{{\mathcal {N}}}}}}\in {\mathbb {C}}^{N \times M_{\mathrm{R}}\times K\times Q}\) contains additive white Gaussian noise and \(\tilde{{\varvec{{{\mathcal {Y}}}}}}_0\in {\mathbb {C}}^{N \times M_{\mathrm{R}}\times K\times Q}\) is the noiseless received signal.

4.1 Channel tensor

In this section, we use the model of the channel tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) defined in Eq. (8). Moreover, we have defined the generalized unfolding \(\left[ \tilde{{\varvec{{{\mathcal {H}}}}}} \right] _{([1,3],[2,4])}\) in Eq. (9). Using a permutation matrix, it can be shown that the generalized unfolding \([\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[4,2])}\) of the channel is equal to

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[4,2])} = \bar{{\varvec{{H}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}), \end{aligned}$$
(15)

where \(\bar{{\varvec{{H}}}} = \begin{bmatrix} \tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(1)}&\ldots&\tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(M_{{\mathrm{T}}})} \end{bmatrix}\cdot {\varvec{{P}}} = {\tilde{{\varvec{{H}}}}}\cdot {\varvec{{P}}} \in {\mathbb {C}}^{M_{\mathrm{R}} \times M_{{\mathrm{T}}}N}.\) The permutation matrix \({\varvec{{P}}} \in {\mathbb {R}}^{NM_{{\mathrm{T}}} \times M_{{\mathrm{T}}}N }\) reorders the columns such that the faster increasing index is \(M_{{\mathrm{T}}}\) instead of N, and it is defined through the relation \([\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[4,2])}=[\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[2,4])}\cdot {\varvec{{P}}}\). Recall that the matrices \(\tilde{{\varvec{{H}}}}\in {\mathbb {C}}^{M_{\mathrm{R}} \times NM_{{\mathrm{T}}}}\) and \(\tilde{{\varvec{{H}}}}_{\mathrm{R}}^{(m_{{\mathrm{T}}})}\in {\mathbb {C}}^{M_{\mathrm{R}} \times N}\) are defined in Eq. (10). The structure of the four-way channel tensor in the frequency domain \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) and its unfoldings are derived in “Appendix.”
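The role of the permutation matrix can be made concrete with a short sketch. It assumes zero-based column indices \(m_{\mathrm{T}}N+n\) before and \(nM_{\mathrm{T}}+m_{\mathrm{T}}\) after the reordering; the function name `column_permutation` is hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column j equals kron(A[:, j], B[:, j])."""
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

def column_permutation(N, M_T):
    """P such that X @ P moves column m_T*N + n of X to column n*M_T + m_T."""
    P = np.zeros((N * M_T, M_T * N))
    for n in range(N):
        for m in range(M_T):
            P[m * N + n, n * M_T + m] = 1.0
    return P

N, M_T, M_R = 3, 2, 4
rng = np.random.default_rng(4)
H_tilde = rng.standard_normal((M_R, N * M_T)) + 1j * rng.standard_normal((M_R, N * M_T))
P = column_permutation(N, M_T)
H_bar = H_tilde @ P                                   # antenna index now varies fastest

A_tilde = np.kron(np.ones((1, M_T)), np.eye(N))       # 1^T kron I_N, used in Eq. (9)
A_bar = np.kron(np.eye(N), np.ones((1, M_T)))         # I_N kron 1^T, used in Eq. (15)
lhs = khatri_rao(H_tilde, A_tilde) @ P                # unfolding of Eq. (9) times P
rhs = khatri_rao(H_bar, A_bar)                        # right-hand side of Eq. (15)
```

Since a column permutation acts on both Khatri–Rao factors in the same way, `lhs` and `rhs` coincide, which is the statement of Eq. (15).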

4.2 Data transmission

We can impose a CP structure on the transmit signal tensor if we assume Khatri–Rao-coded symbols [19, 20]. The coding is proportional to the number of transmit antennas if we use a spreading factor \(Q=M_{{\mathrm{T}}}\), for each subcarrier \(n = 1,2,\ldots , N\). Hence, the generalized unfolding of the signal tensor is

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],[4,3])}&= \begin{bmatrix} \tilde{{\varvec{{S}}}}_1\diamond {\varvec{{C}}}_1&\tilde{{\varvec{{S}}}}_2\diamond {\varvec{{C}}}_2&\ldots&\tilde{{\varvec{{S}}}}_N\diamond {\varvec{{C}}}_N \end{bmatrix}^{\mathrm{T}} \\&= {\varvec{{I}}}_{ M_{{\mathrm{T}}}N}(\bar{{\varvec{{S}}}}\diamond \bar{{\varvec{{C}}}})^{\mathrm{T}}, \end{aligned}$$
(16)

where the matrix \(\tilde{{\varvec{{S}}}}_n \in {\mathbb {C}}^{K \times M_{{\mathrm{T}}}}\) contains modulated data symbols and \({\varvec{{C}}}_n \in {\mathbb {C}}^{Q \times M_{{\mathrm{T}}}}\) is a Vandermonde coding matrix as defined in [19]. The matrices \(\bar{{\varvec{{S}}}} = \begin{bmatrix} \tilde{{\varvec{{S}}}}_1&\ldots&\tilde{{\varvec{{S}}}}_N \end{bmatrix} \in {\mathbb {C}}^{K\times M_{{\mathrm{T}}}N}\) and \(\bar{{\varvec{{C}}}} = \begin{bmatrix} {\varvec{{C}}}_1&\ldots&{\varvec{{C}}}_N \end{bmatrix} \in {\mathbb {C}}^{Q\times M_{{\mathrm{T}}}N}\) contain all symbol and coding matrices for each subcarrier, respectively. Note that \(\bar{{\varvec{{S}}}}=\tilde{{\varvec{{S}}}}\cdot {\varvec{{P}}}\), where the matrix \(\tilde{{\varvec{{S}}}}\) is defined in Eq. (11) and \({\varvec{{P}}} \in {\mathbb {R}}^{NM_{{\mathrm{T}}} \times M_{{\mathrm{T}}}N }\) is the above-mentioned permutation matrix that reorders the columns such that the faster increasing index is \(M_{{\mathrm{T}}}\) instead of N. Moreover, we assume that \(\tilde{{\varvec{{S}}}}\) contains pilot symbols as explained after Eq. (11). As shown in [19] and as directly follows from (16), the tensor \([\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],3,4)}\) satisfies the following CP decomposition:

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],3,4)} = {\varvec{{{\mathcal {I}}}}}_{3,M_{{\mathrm{T}}}N} \times _1 {\varvec{{I}}}_{M_{{\mathrm{T}}}N} \times _2 \bar{{\varvec{{S}}}} \times _3 \bar{{\varvec{{C}}}}. \end{aligned}$$

4.3 Receiver design

Using Eqs. (2), (3), and (14), the noiseless received signal can be expressed as

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}_0]_{([1,2],[4,3])}=[\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[4,2])}\cdot [\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],[4,3])}. \end{aligned}$$

Inserting the corresponding unfoldings of the channel and the signal tensor in Eqs. (15) and (16), respectively, the noiseless received signal in the frequency domain is given by

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}_0]_{([1,2],[4,3])}= \left( \bar{{\varvec{{H}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) \cdot (\bar{{\varvec{{S}}}}\diamond \bar{{\varvec{{C}}}})^{\mathrm{T}}. \end{aligned}$$

The above equation represents an unfolding of a four-way tensor with a CP structure. Therefore, the noiseless received signal tensor can be expressed as

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {Y}}}}}}_0 = {{\varvec{{{\mathcal {I}}}}}_{4,M_{{\mathrm{T}}}N}}\times _1 ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}) \times _2 \bar{{\varvec{{H}}}} \times _3 \bar{{\varvec{{S}}}}\times _4 \bar{{\varvec{{C}}}} { \in {\mathbb {C}}^{N \times M_{\mathrm{R}}\times K\times Q}}. \end{aligned}$$
(17)

Equation (17) represents the received signal in the frequency domain for all N subcarriers, \(M_{\mathrm{R}}\) receive antennas, and P frames after the removal of the cyclic prefix. Depending on the available a priori knowledge at the receiver side, channel estimation, symbol estimation, or joint channel and symbol estimation can be performed.

Let us compare the MIMO-OFDM tensor model and the Khatri–Rao-coded MIMO-OFDM tensor model in Eqs. (12) and (17), respectively. First, the factor matrices in these equations have different index orderings. In Eq. (12) the faster increasing index is N, whereas in Eq. (17) the faster increasing index is \(M_{{\mathrm{T}}}\) along the columns of the factor matrices. We use the accents \(\sim\) (tilde) and − (bar) to distinguish the different index orderings of the factor matrices. Recall that we have defined a permutation matrix \({\varvec{{P}}}\) that accounts for the reordering of the columns of the factor matrices. Moreover, Eq. (17) has an additional tensor dimension (the four-mode) corresponding to the coding technique and the spreading factor Q. Furthermore, taking into account the permutation matrix \({\varvec{{P}}}\), we get Eq. (12) from Eq. (17) for \(Q = 1\) and \(\bar{{\varvec{{C}}}}= {\varvec{{1}}}_{M_{{\mathrm{T}}}N}^{\mathrm{T}}\) (i.e., no coding and the spreading factor equals one).

Using Eq. (17), the channel and the data symbols can be jointly estimated from the ([1, 4], [3, 2]) generalized unfolding of the noise corrupted received signal

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([1,4],[3,2])}\approx \left( \bar{{\varvec{{C}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) \cdot (\bar{{\varvec{{H}}}}\diamond \bar{{\varvec{{S}}}})^{\mathrm{T}}. \end{aligned}$$

Under the assumption that \(Q=M_{{\mathrm{T}}}\), \(\left( \bar{{\varvec{{C}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) \in {\mathbb {C}}^{NQ \times M_{{\mathrm{T}}}N}\) is a block diagonal, left invertible matrix and known at the receiver. Using the properties of the coding matrices defined in [19], i.e., \({\varvec{{C}}}_n^{\mathrm{H}}{\varvec{{C}}}_n=M_{{\mathrm{T}}}{\varvec{{I}}}_{M_{{\mathrm{T}}}}\), we have

$$\begin{aligned} \bar{{\varvec{{Y}}}}\triangleq \frac{1}{M_{{\mathrm{T}}}}\left( \bar{{\varvec{{C}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) ^{\mathrm{H}}\cdot [\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([1,4],[3,2])} \approx (\bar{{\varvec{{H}}}}\diamond \bar{{\varvec{{S}}}})^{\mathrm{T}}. \end{aligned}$$
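The decoding step above relies on the Gram matrix of \(\bar{{\varvec{{C}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\) being a scaled identity. The sketch below checks this numerically. The coding matrices of [19] are not reproduced in this section, so the code assumes a DFT-type Vandermonde matrix with \(Q = M_{\mathrm{T}}\) that is reused on all subcarriers; this choice satisfies \({\varvec{{C}}}_n^{\mathrm{H}}{\varvec{{C}}}_n=M_{{\mathrm{T}}}{\varvec{{I}}}_{M_{{\mathrm{T}}}}\) but is only one possible example.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column j equals kron(A[:, j], B[:, j])."""
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

M_T, M_R, N, K = 2, 3, 3, 4
Q = M_T
q = np.arange(Q)[:, None]
m = np.arange(M_T)[None, :]
C_n = np.exp(2j * np.pi * q * m / M_T)          # assumed Vandermonde (DFT) code, Q x M_T
C_bar = np.tile(C_n, (1, N))                    # [C_1 ... C_N], size Q x M_T*N
B = np.kron(np.eye(N), np.ones((1, M_T)))       # I_N kron 1^T, size N x M_T*N

G = khatri_rao(C_bar, B)                        # size N*Q x M_T*N
gram = G.conj().T @ G                           # equals M_T * I for this code

# Decoding of a noiseless ([1,4],[3,2]) unfolding with random factors
rng = np.random.default_rng(5)
H_bar = rng.standard_normal((M_R, M_T * N)) + 1j * rng.standard_normal((M_R, M_T * N))
S_bar = rng.standard_normal((K, M_T * N)) + 1j * rng.standard_normal((K, M_T * N))
Y_unf = G @ khatri_rao(H_bar, S_bar).T
Y_bar = G.conj().T @ Y_unf / M_T                # matched filtering with the known code
```

Because the Gram matrix is diagonal, the matched filter coincides with the LS solution and no matrix inversion is needed.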

After transposition, \(\bar{{\varvec{{Y}}}}^{\mathrm{T}}\approx \bar{{\varvec{{H}}}}\diamond \bar{{\varvec{{S}}}}\) can be approximated by the Khatri–Rao product between the channel and the data symbols. Therefore, the channel and the data symbols can be jointly estimated based on the LS-KRF as in [30].
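A common way to realize a least squares Khatri–Rao factorization is to reshape every column into a matrix and compute its best rank-one approximation via the SVD. The following sketch illustrates this idea; it is meant as an illustration under this assumption rather than a reproduction of the implementation in [30], and the function name `ls_krf` is hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column j equals kron(A[:, j], B[:, j])."""
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

def ls_krf(X, I, J):
    """Factor X (I*J x R) as khatri_rao(A, B) with A of size I x R and B of size J x R."""
    R = X.shape[1]
    A = np.zeros((I, R), dtype=complex)
    B = np.zeros((J, R), dtype=complex)
    for r in range(R):
        M = X[:, r].reshape(I, J)               # equals outer(a_r, b_r) without noise
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        A[:, r] = np.sqrt(s[0]) * U[:, 0]       # dominant left singular vector
        B[:, r] = np.sqrt(s[0]) * Vh[0, :]      # dominant right singular vector (row of Vh)
    return A, B

rng = np.random.default_rng(6)
M_R, K, R = 3, 5, 6                             # R plays the role of M_T * N
H_bar = rng.standard_normal((M_R, R)) + 1j * rng.standard_normal((M_R, R))
S_bar = rng.standard_normal((K, R)) + 1j * rng.standard_normal((K, R))
X = khatri_rao(H_bar, S_bar)                    # noiseless version of Y_bar^T
H_hat, S_hat = ls_krf(X, M_R, K)
ratio = H_hat / H_bar                           # constant along every column
```

The array `ratio` has identical entries within each column, which is the per-column complex scaling ambiguity of the factorization.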

Using the LS-KRF, the matrices \(\bar{{\varvec{{H}}}}\) and \(\bar{{\varvec{{S}}}}\) can be identified up to one complex scaling factor ambiguity per column. Hence, the estimated matrices satisfy the following relations:

$$\begin{aligned}&\hat{\bar{{\varvec{{H}}}}} = \bar{{\varvec{{H}}}}\cdot {\varvec{{\Lambda }}} \quad \mathrm{and} \quad \hat{\bar{{\varvec{{S}}}}} = \bar{{\varvec{{S}}}}\cdot {\varvec{{\Lambda }}}^{-1}, \end{aligned}$$
(18)

where \({\varvec{{\Lambda }}} \in {\mathbb {C}}^{M_{{\mathrm{T}}} N\times M_{{\mathrm{T}}}N}\) is a diagonal matrix with diagonal elements equal to the \(M_{{\mathrm{T}}}N\) complex scaling ambiguities. The simplest way to resolve the scaling ambiguity is by assuming the knowledge of one row of the matrix \(\bar{{\varvec{{S}}}} \in {\mathbb {C}}^{K\times M_{{\mathrm{T}}}N}\). This corresponds to \(M_{{\mathrm{T}}}N\) pilot symbols, i.e., one pilot symbol per transmit antenna and subcarrier. Since traditional MIMO-OFDM communication systems use fewer pilot symbols than \(M_{{\mathrm{T}}}N\), we propose to use the same amount of pilot symbols and exploit the channel correlation between adjacent subcarriers in order to estimate the scaling matrix. We transmit pilot symbols on positions with equidistant spacing in the frequency and the time domain. With the prior knowledge of the pilot symbols and their positions, we can obtain an initial channel estimate as in traditional MIMO-OFDM systems (see Sect. 3). We denote this pilot-based channel estimate by \(\tilde{{\varvec{{{\mathcal {H}}}}}}_{\mathrm{{p}}}\) \(({\bar{{\varvec{{H}}}}}_p)\). The pilot-based channel estimate is then used to estimate the scaling ambiguity \({\varvec{{\Lambda }}}\) in Eq. (18) as

$$\begin{aligned} \hat{{\varvec{{\Lambda }}}} = \mathrm{{diag}}\left( \frac{1}{M_{\mathrm{R}}}\sum _{m_{\mathrm{R}}= 1}^{M_{\mathrm{R}}}\hat{\bar{{\varvec{{H}}}}}_{(m_{\mathrm{R}},.)}\oslash {\bar{{\varvec{{H}}}}}_{\mathrm{{p}}(m_{\mathrm{R}},.)}\right) . \end{aligned}$$

By multiplying the solution of the LS-KRF with the diagonal matrix \(\hat{{\varvec{{\Lambda }}}}\), the scaling ambiguity in Eq. (18) is resolved and the data symbols can be demodulated. Note that the proposed Khatri–Rao receiver estimates the channel and the symbols in a semi-blind fashion. First, the channel and the symbols are jointly estimated without any a priori information. The pilot-based channel estimate is then used to resolve the scaling ambiguity affecting the columns of \(\hat{\bar{{\varvec{{H}}}}}\) and \(\hat{\bar{{\varvec{{S}}}}}\). Therefore, the optimal length and repetition of the piloting sequences are identical to those of traditional OFDM systems. We summarize the steps of the proposed Khatri–Rao (KR) receiver in Algorithm 1.
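The ambiguity resolution step can be sketched in a few lines. The code assumes that the LS-KRF output obeys Eq. (18) exactly and, for the purpose of the check, that the pilot-based estimate equals the true channel; in practice both are noisy. The function name `resolve_scaling` is hypothetical.

```python
import numpy as np

def resolve_scaling(H_hat, S_hat, H_p):
    """Estimate the diagonal of Lambda and undo it.

    The element-wise ratios H_hat / H_p are averaged over the receive antennas (rows).
    According to Eq. (18), the symbols are corrected by multiplying with Lambda and
    the channel by dividing by it.
    """
    lam = np.mean(H_hat / H_p, axis=0)          # one complex factor per column
    return H_hat / lam, S_hat * lam, lam

rng = np.random.default_rng(7)
M_R, K, R = 3, 5, 6                             # R plays the role of M_T * N
H_bar = rng.standard_normal((M_R, R)) + 1j * rng.standard_normal((M_R, R))
S_bar = rng.choice([-1.0, 1.0], size=(K, R)) + 0j
lam_true = rng.standard_normal(R) + 1j * rng.standard_normal(R)

H_hat = H_bar * lam_true                        # Eq. (18): H_bar @ diag(lam_true)
S_hat = S_bar / lam_true                        # Eq. (18): S_bar @ inv(diag(lam_true))
H_corr, S_corr, lam_hat = resolve_scaling(H_hat, S_hat, H_p=H_bar)
```

Averaging over the \(M_{\mathrm{R}}\) rows reduces the influence of noise on a single ratio, since every row provides an independent observation of the same scaling factor.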

figure a

Furthermore, the channel estimate resulting from the KR receiver can be used for channel tracking in future transmission frames if the channel has not changed drastically. If the channel estimate is used for tracking, it can be improved by means of an additional LS estimate from \([\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([3,4,1],[2])}\) with the knowledge of the estimated symbols projected onto the finite alphabet \(\Omega\), i.e., \({Q({{\bar{{\varvec{{S}}}}}})}=\mathrm{{proj}}({{\bar{{\varvec{{S}}}}}})\). The finite alphabet \(\Omega\) depends on the modulation type and the modulation order \(M_o\). The row ordering of this unfolding matches the Khatri–Rao product below, and its columns correspond to the \(M_{\mathrm{R}}\) receive antennas, so that

$$\begin{aligned} \hat{\bar{{\varvec{{H}}}}}_{\mathrm{{LS}}} ^{\mathrm{T}}= \left( ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\diamond \bar{{\varvec{{C}}}}\diamond {Q({\bar{{\varvec{{S}}}}})}\right) ^{+}\cdot [\tilde{{\varvec{{{\mathcal {Y}}}}}}]_{([3,4,1],[2])}. \end{aligned}$$

However, we can also use this improved channel estimate to further improve the performance of the KR receiver. Using this updated channel estimate, an improved estimate of the diagonal scaling matrix \(\hat{{\varvec{{\Lambda }}}}\) can be calculated and, with that, an enhanced estimate of the symbols, \(\hat{\bar{{\varvec{{S}}}}}_{\mathrm{{LS}}}\), using Eq. (18). Note that, instead of just one LS estimate of the channel and the symbols, the performance can be further enhanced with additional iterations, leading to an iterative receiver. Note also that the symbol matrix \(\hat{\bar{{\varvec{{S}}}}}_{\mathrm{{LS}}}\) can be estimated in the least squares sense from the three-mode unfolding of Eq. (17), but the estimation of \(\hat{{\varvec{{\Lambda }}}}\) is computationally cheaper. The KR receiver with its enhancement via LS is summarized in Algorithm 2.

figure b

Due to the additional LS-based estimates, the KR+LS algorithm has higher computational complexity than the KR algorithm.

5 Khatri–Rao cross-coding MIMO-OFDM

In Sect. 4, we have proposed a tensor model for KR-coded MIMO-OFDM systems that introduces an additional CP-like structure to the signal tensor. The additional CP-like structure of the signal tensor is achieved by means of a simplified Khatri–Rao coding. However, using such a Khatri–Rao coding, we add additional spreading that reduces the spectral efficiency of the system. To overcome this issue, in this section we propose to keep the CP structure of the signal tensor proposed in Sect. 4, but introduce a cross-coding approach, where the known Khatri–Rao coding matrices \({\varvec{{C}}}_1,\ldots , {\varvec{{C}}}_N\) are replaced by symbol matrices containing useful information symbols to be transmitted.

As in Sect. 4, the received signal in the frequency domain after the removal of the cyclic prefix is given by

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {Y}}}}}} = \tilde{{\varvec{{{\mathcal {H}}}}}}\bullet _{4,2}^{2,1}\tilde{{\varvec{{{\mathcal {X}}}}}} + \tilde{{\varvec{{{\mathcal {N}}}}}} = \tilde{{\varvec{{{\mathcal {Y}}}}}}_0+ \tilde{{\varvec{{{\mathcal {N}}}}}} \in {\mathbb {C}}^{N \times M_{\mathrm{R}}\times K\times Q}, \end{aligned}$$
(19)

where, likewise, the \(P = KQ\) frames are divided into K groups of Q blocks (“spreading factor”). We model the channel tensor \(\tilde{{\varvec{{{\mathcal {H}}}}}}\) according to Eq. (8). Details regarding this model are also provided in “Appendix.” In this section, we make use of the generalized unfolding \([\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[4,2])} = \bar{{\varvec{{H}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\) that is defined in (15). The generalized unfolding ([2, 1], [4, 3]) of the transmit signal tensor is given by

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],[4,3])}&= \begin{bmatrix} {\bar{{\varvec{{S}}}}^{(1)}_1}\diamond \bar{{\varvec{{S}}}}^{(2)}_1&{\bar{{\varvec{{S}}}}^{(1)}_2}\diamond \bar{{\varvec{{S}}}}^{(2)}_2&\ldots&{\bar{{\varvec{{S}}}}^{(1)}_N}\diamond \bar{{\varvec{{S}}}}^{(2)}_N \end{bmatrix}^{\mathrm{T}} \\&= {\varvec{{I}}}_{ M_{{\mathrm{T}}}N}(\bar{{\varvec{{S}}}}^{(1)}\diamond \bar{{\varvec{{S}}}}^{(2)})^{\mathrm{T}}, \end{aligned}$$
(20)

where the matrices \({\bar{{\varvec{{S}}}}^{(1)}_n} \in {\mathbb {C}}^{K \times M_{{\mathrm{T}}}}\) and \(\bar{{\varvec{{S}}}}^{(2)}_n \in {\mathbb {C}}^{Q \times M_{{\mathrm{T}}}}\) are the first and second symbol matrices that carry information symbols. The first symbol matrix \(\bar{{\varvec{{S}}}}^{(1)}\) follows the structure of the symbol matrix in Sect. 4 and is composed of a pilot part and a data symbol part (cf. Eq. (11)). On the other hand, the second symbol matrix only contains data symbols, except for its first row, which contains known symbols (e.g., row vectors composed of 1’s). We refer to this transmission scheme as cross-coded MIMO-OFDM, due to the fact that \({\bar{{\varvec{{S}}}}^{(1)}_n}\) plays the role of a random KR coding with respect to \({\bar{{\varvec{{S}}}}^{(2)}_n}\) and vice versa. Let us define the block matrices \(\bar{{\varvec{{S}}}}^{(1)} = \begin{bmatrix} {\bar{{\varvec{{S}}}}^{(1)}_1}&\ldots&{\bar{{\varvec{{S}}}}^{(1)}_N} \end{bmatrix} \in {\mathbb {C}}^{K\times M_{{\mathrm{T}}}N}\) and \(\bar{{\varvec{{S}}}}^{(2)} = \begin{bmatrix} \bar{{\varvec{{S}}}}^{(2)}_1&\ldots&\bar{{\varvec{{S}}}}^{(2)}_N \end{bmatrix} \in {\mathbb {C}}^{Q\times M_{{\mathrm{T}}}N}\). From (20), the tensor \([\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],3,4)}\) satisfies the following CP decomposition

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],3,4)} = {\varvec{{{\mathcal {I}}}}}_{3,M_{{\mathrm{T}}} N} \times _1 {\varvec{{I}}}_{M_{{\mathrm{T}}}N} \times _2 \bar{{\varvec{{S}}}}^{(1)} \times _3 \bar{{\varvec{{S}}}}^{(2)}. \end{aligned}$$

Using Eqs. (2) and (19), the noiseless received signal is given by

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}_0]_{([1,2],[4,3])}=[\tilde{{\varvec{{{\mathcal {H}}}}}}]_{([1,3],[4,2])}\cdot [\tilde{{\varvec{{{\mathcal {X}}}}}}]_{([2,1],[4,3])}. \end{aligned}$$
(21)

Inserting (15) and (20) into (21), we obtain

$$\begin{aligned}{}[\tilde{{\varvec{{{\mathcal {Y}}}}}}_0]_{([1,2],[4,3])}= \left( \bar{{\varvec{{H}}}} \diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) \cdot (\bar{{\varvec{{S}}}}^{(1)}\diamond \bar{{\varvec{{S}}}}^{(2)})^{\mathrm{T}}, \end{aligned}$$

or, alternatively, using the n-mode product notation

$$\begin{aligned} \tilde{{\varvec{{{\mathcal {Y}}}}}}_0 = {{\varvec{{{\mathcal {I}}}}}_{4,M_{{\mathrm{T}}}N}}\times _1 ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}) \times _2 \bar{{\varvec{{H}}}} \times _3 \bar{{\varvec{{S}}}}^{(1)}\times _4 \bar{{\varvec{{S}}}}^{(2)} {\in {\mathbb {C}}^{N \times M_{\mathrm{R}}\times K\times Q}}. \end{aligned}$$
(22)

Depending on the available a priori knowledge at the receiver side, channel estimation, symbol estimation, or joint channel and symbol estimation can be performed. Unlike the KR-coded system, where a known coding matrix is used, the cross-coded MIMO-OFDM system has no such knowledge available, which makes the receiver design more challenging. A joint channel and symbol estimation now involves the estimation of three factor matrices from the noisy version of the four-way CP model (22). From the three-mode, four-mode, and two-mode unfoldings of \(\tilde{{\varvec{{{\mathcal {Y}}}}}}\) in (19), and using (22), we can obtain the LS equations for estimating \(\bar{{\varvec{{S}}}}^{(1)}\), \(\bar{{\varvec{{S}}}}^{(2)}\) and \(\bar{{\varvec{{H}}}}\), respectively:

$$\begin{aligned} \bar{{\varvec{{S}}}}^{(1)}&= \left[ \tilde{{{\varvec{{{\mathcal {Y}}}}}}} \right] _{(3)}\cdot \left[ {\left( \bar{{\varvec{{S}}}}^{(2)}\diamond \bar{{\varvec{{H}}}}\diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) ^{\mathrm{T}}}\right] ^{+}, \end{aligned}$$
(23)
$$\begin{aligned} \bar{{\varvec{{S}}}}^{(2)}&= \left[ \tilde{{{\varvec{{{\mathcal {Y}}}}}}} \right] _{(4)}\cdot \left[ \left( \bar{{\varvec{{S}}}}^{(1)}\diamond \bar{{\varvec{{H}}}}\diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) ^{\mathrm{T}}\right] ^{+} \end{aligned}$$
(24)
$$\begin{aligned} \bar{{\varvec{{H}}}}&= \left[ \tilde{{{\varvec{{{\mathcal {Y}}}}}}} \right] _{(2)}\cdot \left[ \left( \bar{{\varvec{{S}}}}^{(2)}\diamond \bar{{\varvec{{S}}}}^{(1)}\diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) ^{\mathrm{T}}\right] ^{+} \end{aligned}$$
(25)
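One sweep over Eqs. (23)–(25) can be sketched as follows. The code assumes n-mode unfoldings in which the lowest remaining mode varies fastest along the columns, which matches the order of the Khatri–Rao factors in Eqs. (23)–(25); the helper names `unfold`, `als_sweep`, and `rel_error` are hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column j equals kron(A[:, j], B[:, j])."""
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

def unfold(Y, mode):
    """n-mode unfolding (zero-based mode); the lowest remaining mode varies fastest."""
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1, order='F')

def cp4(A, H, S1, S2):
    """Four-way CP model of Eq. (22)."""
    return np.einsum('nr,mr,kr,qr->nmkq', A, H, S1, S2)

def als_sweep(Y, A, H, S1, S2):
    """One pass over the three LS updates; A = I_N kron 1^T is known and stays fixed."""
    S1 = unfold(Y, 2) @ np.linalg.pinv(khatri_rao(khatri_rao(S2, H), A).T)   # Eq. (23)
    S2 = unfold(Y, 3) @ np.linalg.pinv(khatri_rao(khatri_rao(S1, H), A).T)   # Eq. (24)
    H = unfold(Y, 1) @ np.linalg.pinv(khatri_rao(khatri_rao(S2, S1), A).T)   # Eq. (25)
    return H, S1, S2

def rel_error(Y, A, H, S1, S2):
    """Normalized squared reconstruction error used as a stopping criterion."""
    return np.linalg.norm(Y - cp4(A, H, S1, S2)) ** 2 / np.linalg.norm(Y) ** 2

rng = np.random.default_rng(8)
N, M_T, M_R, K, Q = 3, 2, 3, 4, 2
R = M_T * N
A = np.kron(np.eye(N), np.ones((1, M_T)))
H = rng.standard_normal((M_R, R)) + 1j * rng.standard_normal((M_R, R))
S1 = rng.standard_normal((K, R)) + 1j * rng.standard_normal((K, R))
S2 = rng.standard_normal((Q, R)) + 1j * rng.standard_normal((Q, R))
Y = cp4(A, H, S1, S2)                                   # noiseless received tensor
H_new, S1_new, S2_new = als_sweep(Y, A, H, S1, S2)      # the true factors are a fixed point
```

In a full receiver, `als_sweep` would be repeated until `rel_error` falls below a threshold, stops changing, or a maximum number of iterations is reached.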

We adopt a three-step ALS algorithm for estimating the symbol and channel matrices from the noisy versions of (23)–(25). However, it is known that there is no guarantee of convergence if we initialize the ALS algorithm randomly. To overcome this issue, we propose to use the pilot-based channel estimate \(\bar{{\varvec{{H}}}}_{\mathrm{{p}}}\) to obtain initial estimates of the matrices \(\bar{{\varvec{{S}}}}^{(1)}\) and \(\bar{{\varvec{{S}}}}^{(2)}\) based on LS-KRF. Such a channel estimate is obtained from the pilot symbols in \(\bar{{\varvec{{S}}}}^{(1)}\) and the first row of \(\bar{{\varvec{{S}}}}^{(2)}\) that has known symbols. From the ([3, 4], [1, 2]) generalized unfolding of the noisy received signal tensor \(\tilde{{\varvec{{{\mathcal {Y}}}}}}\), we get

$$\begin{aligned} \left[ \tilde{{\varvec{{{\mathcal {Y}}}}}} \right] _{([3,4],[1,2])} \approx \left[ \bar{{\varvec{{S}}}}^{(2)}\diamond \bar{{\varvec{{S}}}}^{(1)}\right] \cdot \left[ \bar{{\varvec{{H}}}}_{\mathrm{{p}}}\diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right] ^{\mathrm{T}}. \end{aligned}$$

Given \(\bar{{\varvec{{H}}}}_{\mathrm{{p}}}\) and \(M_{\mathrm{R}}\ge M_{{\mathrm{T}}}\), we apply the LS-KRF to \(\left[ {\tilde{{\varvec{{{\mathcal {Y}}}}}}} \right] _{([3,4],[1,2])}\cdot \left[ \left( \bar{{\varvec{{H}}}}_{\mathrm{{p}}}\diamond ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}})\right) ^{\mathrm{T}}\right] ^{+}\approx \left[ \bar{{\varvec{{S}}}}^{(2)}\diamond \bar{{\varvec{{S}}}}^{(1)}\right]\) to obtain \(\hat{\bar{{\varvec{{S}}}}}^{(1)}\) and \(\hat{\bar{{\varvec{{S}}}}}^{(2)}\). However, the matrices \(\hat{\bar{{\varvec{{S}}}}}^{(1)}\) and \(\hat{\bar{{\varvec{{S}}}}}^{(2)}\) are estimated only up to one complex scaling ambiguity per column. We exploit the first row of the matrix \({\bar{{\varvec{{S}}}}}^{(2)}\) to estimate this ambiguity (recall that the elements of the first row of the matrix \({\bar{{\varvec{{S}}}}}^{(2)}\) are set to one). After resolving the scaling ambiguity, we propose to iterate between Eqs. (23)–(25) to enhance the accuracy of the receiver. Hence, we propose two receivers for the cross-coded MIMO-OFDM systems, namely the cross-coded Khatri–Rao (CC-KR) receiver and the cross-coded Khatri–Rao+alternating least squares (CC-KR+ALS) receiver. These two algorithms are summarized in Algorithm 3 and Algorithm 4, respectively. The CC-KR receiver exploits the LS-KRF to compute an estimate of the symbol matrices \(\bar{{\varvec{{S}}}}^{(1)}\) and \(\bar{{\varvec{{S}}}}^{(2)}\), assuming that \(M_{\mathrm{R}}\ge M_{{\mathrm{T}}}\), that the first row of the matrix \(\bar{{\varvec{{S}}}}^{(2)}\) contains only ones, and that a pilot-based channel estimate \(\bar{{\varvec{{H}}}}_{\mathrm{{p}}}\) is already available. Note that the initial steps of the CC-KR+ALS and the CC-KR receivers are the same. In the subsequent steps of the CC-KR+ALS receiver, the channel matrix and both symbol matrices are estimated using ALS.
The algorithm is stopped if it exceeds the maximum number of iterations, which is set to 5, if it reaches a predefined minimum of the cost function \(\left\| \tilde{{\varvec{{{\mathcal {Y}}}}}} - {{\varvec{{{\mathcal {I}}}}}_{4,M_{{\mathrm{T}}}N}}\times _1 ({\varvec{{I}}}_N\otimes {\varvec{{1}}}_{M_{{\mathrm{T}}}}^{\mathrm{T}}) \times _2 \hat{\bar{{\varvec{{H}}}}} \times _3 \hat{\bar{{\varvec{{S}}}}}^{(1)}\times _4 \hat{\bar{{\varvec{{S}}}}}^{(2)}\right\| _{\mathrm{H}}^2/\left\| \tilde{{\varvec{{{\mathcal {Y}}}}}}\right\| _{\mathrm{H}}^2\), or if the value of the cost function has not changed within two consecutive iterations. The CC-KR+ALS algorithm has a higher computational complexity than the CC-KR algorithm due to the additional ALS iterations, as shown in Algorithm 4.
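The LS-KRF initialization with the scaling correction described above can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not a reproduction of Algorithm 3. It uses a row-major unfolding that matches the Khatri–Rao ordering in the code, it assumes that \(\bar{{\varvec{{S}}}}^{(1)}\) has K rows and \(\bar{{\varvec{{S}}}}^{(2)}\) has Q rows, and the function names `khatri_rao`, `ls_krf`, and `cc_kr` are hypothetical.

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product; rows of X vary slowest."""
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

def ls_krf(X, rows_b, rows_c):
    """Factor X ~ B (Khatri-Rao) C column by column via rank-one SVDs."""
    R = X.shape[1]
    B = np.empty((rows_b, R), dtype=complex)
    C = np.empty((rows_c, R), dtype=complex)
    for c in range(R):
        M = X[:, c].reshape(rows_b, rows_c)   # M ~ b_c c_c^T (rank one)
        U, s, Vh = np.linalg.svd(M)
        B[:, c] = np.sqrt(s[0]) * U[:, 0]
        C[:, c] = np.sqrt(s[0]) * Vh[0]
    return B, C

def cc_kr(Y, A, H_p):
    """Estimate S1 and S2 from Y (shape N x MR x K x Q) given a channel estimate."""
    N, MR, K, Q = Y.shape
    # ([3,4],[1,2])-type unfolding: rows (q, k), columns (r, n).
    Yu = np.transpose(Y, (3, 2, 1, 0)).reshape(Q * K, MR * N)
    X = Yu @ np.linalg.pinv(khatri_rao(H_p, A).T)   # X ~ S2 (Khatri-Rao) S1
    S2, S1 = ls_krf(X, Q, K)
    alpha = S2[0, :]      # the first row of the true S2 is all ones
    return S1 * alpha, S2 / alpha

# Toy, noise-free example with a perfect channel estimate (assumed dimensions).
rng = np.random.default_rng(0)
N, MT, MR, K, Q = 4, 2, 3, 3, 2
R = N * MT
A = np.kron(np.eye(N), np.ones((1, MT)))

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H, S1, S2 = crandn(MR, R), crandn(K, R), crandn(Q, R)
S2[0, :] = 1.0
Y = np.einsum('nc,rc,kc,qc->nrkq', A, H, S1, S2)
S1_hat, S2_hat = cc_kr(Y, A, H)
```

In this idealized setting the symbol matrices are recovered exactly up to numerical precision. With noise or an imperfect pilot-based channel estimate, the rank-one truncation gives a least squares fit for each column instead. The columns are processed independently, which is why this step can be parallelized.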

To ensure identifiability of the parameters estimated via Eqs. (23)–(25), the system parameters used with Algorithms 3 and 4 have to satisfy the following conditions:

$$\begin{aligned} M_{\mathrm{R}}Q \ge M_{{\mathrm{T}}}, \quad M_{\mathrm{R}}K \ge M_{{\mathrm{T}}}, \quad KQ \ge M_{{\mathrm{T}}}. \end{aligned}$$
(26)

These conditions establish trade-offs involving the space, time, and coding diversities to ensure a unique recovery of the channel and the symbols. More specifically, a decrease in the number of receive antennas can be compensated by an increase in the number of groups K or the number of blocks Q that define the cross-coding scheme in order to ensure joint channel and symbol identifiability.
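As a small illustration of this trade-off, the conditions in (26) can be checked for a given configuration. The function name `satisfies_conditions_26` is hypothetical, and the check only evaluates the three inequalities stated above.

```python
def satisfies_conditions_26(M_R: int, M_T: int, K: int, Q: int) -> bool:
    """Evaluate the three inequalities of (26) for a given parameter set."""
    return M_R * Q >= M_T and M_R * K >= M_T and K * Q >= M_T

# Four transmit antennas, one receive antenna, K = Q = 2:
# M_R * Q = 2 < 4, so the conditions are violated.
few_blocks = satisfies_conditions_26(M_R=1, M_T=4, K=2, Q=2)

# Keeping one receive antenna but increasing K and Q to 4 satisfies (26).
more_blocks = satisfies_conditions_26(M_R=1, M_T=4, K=4, Q=4)
```

In the second configuration, the missing receive antennas are compensated by larger values of K and Q.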

6 Simulation results

In this section, we evaluate the performance of the proposed receivers for MIMO-OFDM systems using Monte Carlo simulations. First, we compare the performance of ZF, ILSP, and RLSP, using 5000 realizations. We consider a \(2 \times 2\) OFDM system with K frames and \(N= 128\) subcarriers. The pilot symbols are transmitted on every third subcarrier, such that \(\Delta F = 3\), and only during the first frame, i.e., \(\Delta K = K\). Using these pilots, we obtain a pilot-based channel estimate with which we initialize all of the algorithms. The transmitted data symbols are independent and are drawn from a 4-QAM (quadrature amplitude modulation) constellation. The frequency selective propagation channel is modeled according to the 3rd Generation Partnership Project (3GPP) Pedestrian A channel (Ped A) [31]. The duration of the cyclic prefix is 32 samples, and the weighting factor for the recursive LS is \(\alpha = 1\). The maximum number of iterations for the iterative algorithm is set to 7.
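The basic ingredients of such a Monte Carlo SER evaluation can be sketched as follows. This is a simplified illustration, not the simulation chain used for the figures: it assumes a unit-energy 4-QAM constellation, pilots starting at subcarrier 0, and a plain additive noise channel in place of the Ped A channel and the receivers under test. All names and the noise level are hypothetical.

```python
import numpy as np

# Unit-energy 4-QAM constellation (assumed labeling).
CONST = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

def detect(x):
    """Nearest-neighbour decision: index of the closest constellation point."""
    return np.argmin(np.abs(x[..., None] - CONST), axis=-1)

def symbol_error_rate(tx_idx, rx):
    """Fraction of symbols whose decision differs from the transmitted index."""
    return float(np.mean(detect(rx) != tx_idx))

rng = np.random.default_rng(1)
N, delta_F = 128, 3
pilot_pos = np.arange(0, N, delta_F)      # every third subcarrier, from 0 (assumed)

tx_idx = rng.integers(0, 4, size=(N, 1000))   # 1000 realizations per subcarrier
tx = CONST[tx_idx]
noise = (rng.standard_normal(tx.shape) + 1j * rng.standard_normal(tx.shape)) / np.sqrt(2)

ser_clean = symbol_error_rate(tx_idx, tx)
ser_noisy = symbol_error_rate(tx_idx, tx + 0.5 * noise)
```

In a full simulation, `tx + 0.5 * noise` would be replaced by the symbol estimates produced by the receiver under test, and the SER would be averaged over channel realizations for each \(E_b/N_0\) value.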

Fig. 3 SER versus \(E_b/N_0\) for \(N=128\), \(Q=2\), \(M_{{\mathrm{T}}} =2\), \(M_{\mathrm{R}}=2\), \(\Delta K = K\), \(\Delta F = 4\) and different numbers of blocks K

In Fig. 3, we compare the SER performance of the traditional frequency domain ZF receiver, the proposed Khatri–Rao (KR) receiver (see Algorithm 1) and the proposed Khatri–Rao receiver with one additional LS iteration (see Algorithm 2) for different numbers of transmitted blocks. In this case, note that the KR and the KR+LS receivers benefit from the increased number of frames as the channel has been kept constant during the \(P=Q\cdot K\) frames. Moreover, as the number of frames increases, the advantages of the enhancement via LS become more pronounced.

Fig. 4 SER comparison for different numbers of transmit and receive antennas

Moreover, the SER comparison between the ZF and the Khatri–Rao-coded algorithms for \(N=128\), \(Q=M_{{\mathrm{T}}}\), \(K=2\), \(\Delta K =2\), \(\Delta F=4\), and different numbers of antennas is depicted in Fig. 4. The KR and KR+LS receivers benefit from an increased number of transmit antennas due to the increased spreading factor, \(Q = M_{{\mathrm{T}}}\). The performance enhancement with the additional LS estimate is achieved for \(K > 2\). However, the KR receiver outperforms the ZF one even without the LS enhancement. We observe that the performance of the Khatri–Rao-coded receivers, i.e., KR and KR+LS, improves as the number of antennas increases. However, as shown in Table 2, the computational complexity of the receiver then increases linearly, since more rank-one approximations must be computed.

Fig. 5 SER versus \(E_b/N_0\) for \(N=128\) subcarriers, \(K=10\) blocks, \(\Delta K = 10\), while \(P=10\) (\(K=5\) and \(Q=2\)) for the Khatri–Rao-coded MIMO-OFDM

In Fig. 5, we depict the SERs for the uncoded and the Khatri–Rao-coded MIMO-OFDM systems. The accuracy of the KR receiver is similar to that of the ILSP and RLSP algorithms [17] and improves with increasing SNR. The KR+LS receiver outperforms the ILSP algorithm and the KR algorithm in terms of SER. Recall that the KR-coded OFDM model in Eq. (17) has a richer tensor structure than the OFDM model in Eq. (12) due to the coding. The KR algorithm and the KR+LS algorithm effectively exploit this structure to estimate the channel and the symbols. Note that the KR+LS algorithm computes an improved estimate of the scaling matrix. Therefore, KR+LS leads to lower SER levels than the ILSP and KR algorithms.

Fig. 6 SER for a \(2\times 2\) cross-coded OFDM system with parameters \(N=128\), \(Q=2\), K, \(\Delta K\), \(\Delta F\), and the symbols are drawn from a 4-QAM modulation. The parameters K, \(\Delta K\) and \(\Delta F\) are indicated in the legend

In Fig. 6, we provide an SER comparison for two scenarios. For both scenarios, we assume \(Q=2\), and the symbols are drawn from a 4-QAM modulation. Moreover, \(K=5\), \(\Delta F=10\), and \(\Delta K= 5\), for the first scenario, whereas for the second scenario \(K=3\), \(\Delta F=5\), and \(\Delta K= 3\). Hence, in the first scenario we estimate more symbols than in the second scenario, using fewer pilot symbols. As expected, we achieve a lower SER if more pilot symbols are used because they lead to a more accurate initial pilot-based channel estimate. Moreover, in Fig. 6 we see that the CC-KR+ALS receiver outperforms the CC-KR receiver. Thus, we benefit from the additional iterations and from exploiting the complete tensor structure. In contrast to CC-KR, CC-KR+ALS also estimates the channel matrix. Furthermore, the accuracy gain of the CC-KR+ALS receiver is more pronounced if we initialize the CC-KR+ALS with a less accurate pilot-based channel estimate (the gain is more pronounced for the solid lines than for the dashed lines in Fig. 6).

Fig. 7 SER for \(4\times 4\) KR-coded OFDM, cross-coded OFDM, and traditional OFDM systems

Finally, in Fig. 7, we depict the SER performance for a \(4\times 4\) MIMO system, considering the following receivers: (i) ILSP receiver [17], (ii) RLSP receiver [17], (iii) KR receiver (Algorithm 1), (iv) KR+LS receiver (Algorithm 2), (v) CC-KR receiver (Algorithm 3), and (vi) CC-KR+ALS receiver (Algorithm 4). To ensure a fair comparison in terms of spectral efficiency, the following parameters were chosen for the different receivers. The KR-coded OFDM system assumes \(N=128\), \(\Delta F = 10\), \(K=2\), \(\Delta K = 2\), \(Q=4\), \(P=KQ=8\), and the symbols are modulated using 16-QAM. For the cross-coded OFDM system, we assume \(N=128\), \(\Delta F = 10\), \(K=2\), \(\Delta K = 2\), \(Q=4\), \(P=KQ=8\), and the symbols are drawn from a BPSK modulation. The OFDM system assumes \(N=128\), \(\Delta F = 10\), \(K=8\), \(\Delta K = 8\), and BPSK symbols. We see that the CC-KR receiver outperforms the ILSP and RLSP receivers from [17]. In addition, the SER curves of the KR and KR+LS receivers for KR-coded OFDM have different slopes than those of the uncoded OFDM and the cross-coded OFDM, exhibiting a better performance, as expected.

Table 2 Computational cost of compared algorithms

In Table 2, we show the computational complexity of the compared algorithms. We take into account the main computational efforts, i.e., the computation of matrix inverses. For a matrix \({\varvec{{A}}} \in {\mathbb {C}}^{N \times M}\), we consider a cost of \({\mathcal {O}}\left( M^3\right)\) for its inversion and \({\mathcal {O}}\left( NM^2 \right)\) for the computation of its rank-one SVD approximation. The Khatri–Rao factorization-based algorithms, i.e., the KR and the CC-KR algorithms, have the lowest computational effort. This is due to the fact that they compute \(NM_{{\mathrm{T}}}\) independent rank-one matrix approximations, while the remaining algorithms (ZF, RLSP, and ILSP) require iterations and/or the inversion of large matrices. Compared to the KR coding, the proposed CC-KR has similar complexity if \(Q = M_{\mathrm{R}}\). On the other hand, in the proposed CC-KR receiver, two data symbol matrices are transmitted (\({\varvec{{S}}}^{(1)}\) and \({\varvec{{S}}}^{(2)}\)), increasing the spectral efficiency of the MIMO-OFDM system.

7 Conclusion and discussion

In this paper, we have presented a tensor model for MIMO-OFDM systems using the double contraction between a channel tensor and a transmit signal tensor. The use of double contractions allows us to derive explicit CP-like, or Tucker-like, tensor models for the received signal, which are exploited for a joint channel and symbol estimation using semi-blind algorithms. The proposed model is a very general and flexible way of describing the received signal in MIMO-OFDM systems for all subcarriers jointly.

We have also proposed Khatri–Rao-coded MIMO-OFDM models and the corresponding semi-blind receivers based on the derived explicit CP-like tensor structure of the data model. In particular, the proposed KR-coded receivers, namely KR, KR+LS, CC-KR, and CC-KR+ALS, achieve a better performance in terms of the symbol error rate than the state-of-the-art schemes from the literature (ZF, ILSP, and RLSP). Also, the Khatri–Rao-based receivers (KR and CC-KR) can benefit from parallel processing, thus having a lower computational processing delay than the competitors. In addition, we have improved the performance of the Khatri–Rao-based receivers by means of an additional LS iteration (KR+LS) and an ALS procedure (CC-KR+ALS). Note that the Khatri–Rao coding strategy (KR and KR+LS) has a lower spectral efficiency than the uncoded MIMO-OFDM system. To overcome this limitation, we have proposed a cross-coded Khatri–Rao strategy (CC-KR and CC-KR+ALS algorithms), where the “coding matrix” contains useful data symbols. For this cross-coded system, two receivers have been proposed.

A natural extension of this work is to adapt the proposed semi-blind receivers to other multi-carrier techniques such as universal filtered multi-carrier (UFMC) and FBMC modulation [16], to relay-assisted systems, and to multi-user systems. In the case of a multi-user system, the proposed CC-KR algorithm, and possibly the CC-KR+ALS algorithm, can be used such that the transmitted data symbols of multiple users serve as “coding matrices” to improve the total spectral efficiency of the system.