Abstract
In this paper, an efficient channel estimation algorithm for orthogonal frequency division multiplexing (OFDM) systems based on projection wavelet weighted twin support vector regression (PWWTSVR) is proposed. Most channel estimation algorithms for OFDM systems are based on a linear assumption about the channel model. In the proposed algorithm, the OFDM system channel is assumed to be nonlinear and fading in both the time and frequency domains. The PWWTSVR utilizes pilot signals to estimate the response of the nonlinear wireless channel, which is the main application area of SVR. The projection axis in the optimal objective function of PWWTSVR is sought to minimize the variance of the projected points by exploiting a priori information about the training data. Different from the traditional support vector regression algorithm, training samples at different positions in the proposed PWWTSVR model are given different penalty weights determined by the wavelet transform. The weights are applied to both the quadratic empirical risk term and the first-degree empirical risk term to reduce the influence of outliers. The final regressor can avoid the overfitting problem to a certain extent and yields good generalization ability for channel estimation. The results of numerical experiments show that the proposed algorithm performs better than conventional pilot-aided channel estimation methods.
Introduction
Recently, with the rapid development of wireless communication technology, the demand for high-speed data transmission has increased rapidly. Broadband communication is an effective technology to provide high data rate transmission. However, increasing the bandwidth makes the sampling period shorter than the channel delay spread, especially in multipath scenarios, which gives rise to frequency-selective channels. Carrier frequency offset due to oscillator mismatch, together with high relative mobility between the transmitter and receiver, causes the transmission channel to change rapidly in time, which is referred to as the time-selectivity of the channel. A channel with both frequency-selectivity and time-selectivity is called doubly selective in wireless communications. Orthogonal frequency division multiplexing (OFDM) is an attractive multicarrier modulation scheme, which divides the whole bandwidth into multiple overlapping narrowband subchannels to reduce the symbol rate. OFDM has the advantages of high spectral efficiency and a simple single-tap equalizer structure. The bandwidth of each subcarrier in OFDM systems is small enough that each subcarrier is considered to experience flat fading in the frequency-selective channel. The narrowband nature of the subcarriers makes the signal robust against frequency-selectivity, and the inter-symbol interference (ISI) can be easily eliminated by inserting a cyclic prefix (CP) in front of each transmitted OFDM block. However, OFDM is relatively sensitive to the time-selectivity of the mobile channel. Time variation of the channel within an OFDM symbol duration results in inter-carrier interference (ICI) (Wu and Fan 2016) and leads to an irreducible error floor in conventional receivers, which further degrades the performance. OFDM is strongly robust in high delay spread environments, which can eliminate the need for equalizing the delay spread effect.
This feature allows for higher data rates and has led to the selection of OFDM as the standard for digital audio broadcasting, digital video broadcasting, some wireless local area networks, long-term evolution and next-generation cellular systems. It has also been proposed for high-speed train broadband communication systems (Yu et al. 2016). The performance of OFDM systems is affected by channel estimation, timing synchronization and mobility. The ICI may become more severe as the mobile speed, carrier frequency or OFDM symbol duration increases (Sheng et al. 2017). OFDM can be implemented using a coherent or noncoherent detection technique. Coherent detection methods usually provide a higher signal-to-noise ratio gain than noncoherent methods because they use channel state information. This implies, however, that the receiver is more complex, because channel state information is usually obtained through channel estimation. In order to achieve acceptable reception quality for applications with high latency and Doppler spread, it is essential to optimize the design of channel estimators.
In highly selective multipath fading channels, the channel response presents complex nonlinearity. If a linear method is used, the estimation accuracy will be reduced (Charrada and Samet 2016). Therefore, it is necessary to use a nonlinear channel estimation method to improve the estimation performance. Support vector regression (SVR), developed from the support vector machine (SVM), is suitable for regression of nonlinear systems. The SVM is based on statistical learning theory and Vapnik–Chervonenkis (VC) dimension theory, and is a promising machine learning approach that has been adopted in classification and regression (Vapnik 1995). The kernel trick extends its applications to nonlinear situations. It has become one of the most powerful tools for pattern classification and regression (Xu et al. 2017). So far, some SVR algorithms have been used in wireless channel estimation. Matilde et al. (2004) developed a multiple-input multiple-output channel estimation method based on SVR, but the channel was assumed to be nonselective. Djouama et al. (2014) and Charrada and Samet (2016) proposed OFDM channel estimation methods based on SVR for different application scenarios. However, the methods mentioned above are based on the basic SVR, and they still have shortcomings in computational complexity and performance. Along with continuous research progress, many modifications have been proposed in recent years. Peng (2010) proposed twin support vector regression (TSVR), which increases the computational speed by solving two small-size quadratic programming problems (QPPs). Xu et al. (2017) proposed asymmetric \(\nu\)-twin support vector regression, which is a kind of twin SVR suitable for dealing with asymmetric noise. Peng et al. (2014) introduced a pair of projection terms in the optimization problem, which has the advantage of embedding the structural information of the data into the learning process, resulting in a reduction of the empirical variance. Melki et al. (2017) studied multi-target regression and presented several models for problems with multiple outputs. Anand et al. (2019), Anagha et al. (2018) and Gupta and Gupta (2019) proposed improved SVR algorithms based on the pinball loss function, Balasundaram et al. (2014, 2016a) studied the problems of Lagrangian SVR, and Anand et al. (2018) proposed a generalized \(\varepsilon\)-loss function for regression.
However, in most methods all of the training samples are considered to have the same status and are given the same penalties, which may degrade performance due to the influence of noise or outliers. It is useful to give the training samples different weights depending on their importance. Xu and Wang (2014) proposed K-nearest neighbor (KNN)-based weighted twin support vector regression, which uses local information of the data to improve prediction accuracy. However, KNN-based methods are suitable for regression on clustered samples, but not for time series such as those in channel estimation. Recently, an efficient projection wavelet weighted twin support vector regression (PWWTSVR) was proposed in our work (Wang et al. 2019), which introduces a weight matrix based on the wavelet transform and is suitable for dealing with time-series data. PWWTSVR has good generalization performance and makes full use of data structure information. It is suitable for regression of nonlinear systems with sequence information as training samples. Applying the proposed algorithm to wireless channel parameter prediction, a system with strong nonlinearity, will improve the estimation performance. The idea of combining TSVR with weights based on the wavelet transform was also applied to the regression of a steelmaking model (Gao et al. 2019).
By exploiting the merits of PWWTSVR, the proposed channel estimation algorithm can achieve good performance. The contributions of this paper are summarized as follows.

1. The improved TSVR is adopted for the first time to estimate doubly selective wireless channel parameters in an OFDM system. This method addresses the problem that the performance of most traditional estimation methods is degraded by linear assumptions.

2. Aiming at the nonlinearity and the time-series characteristics of the fading channels of OFDM systems, we propose an improved TSVR, the PWWTSVR algorithm, based on the wavelet transform. Calculating the weight matrix by the wavelet transform offers a new preprocessing perspective. The wavelet transform is a kind of time-frequency representation for signals; therefore, the proposed method based on wavelet theory is suitable for dealing with time-series samples such as channel parameters. Additionally, compared with TSVR, the PWWTSVR improves the regression performance by adding projection terms to the objective function.

3. In order to make better use of the information in the received pilot signal and improve the channel estimation accuracy, a weight matrix is introduced into the objective functions for channel parameter regression. The weights are inserted into both the quadratic and first-degree terms to reduce the influence of outliers, which are likely to appear in a received pilot signal polluted by noise. Intuitively, the PWWTSVR can make full use of the prior information of the training channel response. The weight matrix and weight vector, which represent the distance between the noisy samples and their 'real positions', reflect the prior information of the training channel response. A larger weight is given to samples with smaller noise and a smaller weight is given to samples with larger noise.
This paper is organized as follows: Sect. 2 briefly describes the OFDM system. Section 3 proposes the projection wavelet weighted TSVR channel estimation. Experimental results are described in Sect. 4 to investigate the validity of the proposed algorithm, and Sect. 5 ends the paper with concluding remarks.
System model
Consider an OFDM system with N subcarriers experiencing a doubly selective fading channel. The sequence to be transmitted, X(k), drawn from a QPSK or QAM constellation, is parsed into blocks of N symbols and then transformed into a time-domain sequence using an N-point IFFT. To avoid inter-block interference (IBI), a cyclic prefix (CP) of length M that is equal to or larger than the channel order L (the number of discrete channel paths is \(L+1\)) is inserted at the head of each block. The time-domain signal x(n) is then serially transmitted over the fading channel and can be expressed as
\[x(n)=\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1}X(k)\exp (j2\pi nk/N),\qquad (1)\]
where \(n=-M,\ldots ,N-1,\) \(k=0,1,\ldots ,N-1.\) After the CP is removed at the receiver, the received signal in the time domain, y(n), can be expressed as
\[y(n)=\sum_{l=0}^{L}h_{l}(n)x(n-l)+\eta (n),\quad n=0,1,\ldots ,N-1,\qquad (2)\]
where \(\eta (n)\) is additive white Gaussian noise (AWGN) with zero mean and variance \(\sigma _{n}^{2}\), whose samples are independent of each other, i.e. \(E(\eta (n)\eta (m))=\delta (n-m)\sigma _{n}^{2}\). \(h_{l}(n)\) is the baseband-equivalent doubly selective channel impulse response of the lth path (\(l=0,1,\ldots ,L\)) at time n, which includes the physical channel as well as the filters at the transmitter and receiver.
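The transmit and receive steps described above can be sketched numerically. The following is a minimal illustration, assuming a unitary IDFT, a tapped-delay-line channel stored as an array `h[l, n]`, and circularly symmetric complex noise; the function names and the random tap values are hypothetical and are not taken from the paper.

```python
import numpy as np

def ofdm_transmit(X, cp_len):
    """Unitary N-point IDFT followed by a cyclic prefix of length cp_len."""
    N = len(X)
    x = np.fft.ifft(X) * np.sqrt(N)           # overall scaling 1/sqrt(N)
    return np.concatenate([x[-cp_len:], x])   # samples n = -M, ..., N-1

def doubly_selective_channel(x_cp, h, cp_len, noise_std, rng):
    """y(n) = sum_l h[l, n] * x(n - l) + eta(n) for n = 0..N-1 (CP removed)."""
    n_taps, N = h.shape
    y = np.zeros(N, dtype=complex)
    for n in range(N):
        for l in range(n_taps):
            # index cp_len + n - l addresses sample x(n - l) in the CP-extended block
            y[n] += h[l, n] * x_cp[cp_len + n - l]
    noise = noise_std / np.sqrt(2) * (
        rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return y + noise

rng = np.random.default_rng(0)
N, M, L = 64, 5, 5                            # subcarriers, CP length, channel order
bits = rng.integers(0, 2, size=(N, 2))
X = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)   # QPSK
# Hypothetical taps: independent complex Gaussian per path and per time sample.
h = (rng.standard_normal((L + 1, N)) + 1j * rng.standard_normal((L + 1, N)))
h /= np.sqrt(2 * (L + 1))
x_cp = ofdm_transmit(X, M)
y = doubly_selective_channel(x_cp, h, M, noise_std=0.1, rng=rng)
```

Because the CP is at least as long as the channel order, every delayed sample `x(n - l)` needed for `n >= 0` lies inside the transmitted block, which is why no inter-block term appears.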
The matrix form of (2) can be expressed as
\[\mathbf {y}=\mathbf {h}\mathbf {x}+{\eta }=\mathbf {h}\mathbf {F}^{H}\mathbf {X}+{\eta },\qquad (3)\]
where \(\mathbf {X}=[X(0),X(1),\ldots ,X(N-1)]^{T}\), \(\mathbf {x}=[ x(0),x(1),\ldots ,x(N-1)]^{T}\), \(\mathbf {y}=[y(0),y(1),\ldots ,y(N-1)]^{T}\), \({\eta }=[\eta (0),\eta (1),\ldots ,\eta (N-1)]^{T}\), (\(\cdot\))\(^{T}\) denotes the transpose operation; \(\mathbf {X},\mathbf {x},\mathbf {y},{ \eta }\in \mathbb {C}^{N}\), and the channel matrix \(\mathbf {h}\in \mathbb {C} ^{N\times N}\) can be expressed entry-wise as
\[[\mathbf {h}]_{n,m}=\begin{cases}h_{l}(n), & l=(n-m)\bmod N\le L,\\ 0, & \text{otherwise},\end{cases}\qquad n,m=0,1,\ldots ,N-1,\qquad (4)\]
where \(\mathbf {F}^{H}\) is the N-point inverse discrete Fourier transform (IDFT) matrix with entries \([\mathbf {F}^{H}]_{n,k}=(1/\sqrt{N})\exp (j2\pi nk/N)\), and \((\cdot )^{H}\) denotes the conjugate transpose. Also define the discrete Fourier transform (DFT) matrix \(\mathbf {F}\) with entries \([\mathbf {F}] _{n,k} =(1/\sqrt{N})\exp (-j2\pi nk/N)\).
Applying the Fourier transform to both sides of (3), the following equation is obtained,
\[\mathbf {Y}=\mathbf {F}\mathbf {y}=\mathbf {F}\mathbf {h}\mathbf {F}^{H}\mathbf {X}+\mathbf {F}{\eta }=\mathbf {H}\mathbf {X}+\mathbf {F}{\eta },\qquad (5)\]
where \(\mathbf {Y}=[Y(0),Y(1),\ldots ,Y(N-1)]^{T}\in \mathbb {C}^{N}\) is the received signal vector in the frequency domain. \(\mathbf {H}=\mathbf {FhF} ^{H}\in \mathbb {C}^{N\times N}\) is the frequency domain channel matrix with inter-carrier interference (ICI) induced by time variations of the channel, the elements of which can be described as
\[H(p,q)=\frac{1}{N}\sum_{n=0}^{N-1}\sum_{l=0}^{L}h_{l}(n)\exp (-j2\pi lq/N)\exp (-j2\pi n(p-q)/N),\qquad (6)\]
where \(p,q=0,1,\ldots ,N-1\). The off-diagonal elements of \(\mathbf {H}\) are the ICI response. \(\mathbf {H}\) can be divided into two parts: \(\mathbf {H }_{d}\) \(\in \mathbb {C}^{N\times N}\) retains only the principal diagonal elements, and \(\mathbf {H}_{n}\) \(\in \mathbb {C}^{N\times N}\) retains only the off-diagonal elements. Then (5) can be expressed as
\[\mathbf {Y}=\mathbf {H}_{d}\mathbf {X}+\mathbf {H}_{n}\mathbf {X}+\mathbf {F}{\eta }={\text {diag}}(\mathbf {X})\mathbf {H}_{d}^{^{\prime }}+\mathbf {H}_{n}\mathbf {X}+\mathbf {F}{\eta },\qquad (7)\]
where \(\mathbf {H}_{n}\mathbf {X}\) is the ICI component, \({\text {diag}}(\cdot )\) denotes the diagonal operator and \(\mathbf {H}_{d}^{^{\prime }}\in \mathbb {C}^{N}\) is a column vector whose elements are the principal diagonal elements of \({\mathbf {H}}_{d}\).
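The split of the frequency-domain channel matrix into a diagonal part and an ICI part can be checked numerically. The sketch below assumes unitary DFT matrices and a circulant-like time-domain channel matrix whose row n holds the taps at time n; the linearly drifting taps are a hypothetical stand-in for a time-varying channel and are not the simulation setup of the paper.

```python
import numpy as np

def time_domain_channel_matrix(h):
    """N x N matrix with entry [n, (n - l) mod N] = h[l, n] for l = 0..L, else 0."""
    n_taps, N = h.shape
    hm = np.zeros((N, N), dtype=complex)
    for n in range(N):
        for l in range(n_taps):
            hm[n, (n - l) % N] = h[l, n]
    return hm

def frequency_domain_channel(h):
    """H = F h F^H with F the unitary DFT matrix."""
    N = h.shape[1]
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)
    return F @ time_domain_channel_matrix(h) @ F.conj().T

def split_ici(H):
    """Return (H_d, H_n): the main-diagonal part and the off-diagonal (ICI) part."""
    H_d = np.diag(np.diag(H))
    return H_d, H - H_d

rng = np.random.default_rng(1)
N, L = 16, 3
base = (rng.standard_normal((L + 1, 1)) + 1j * rng.standard_normal((L + 1, 1)))
base /= np.sqrt(2 * (L + 1))
h = base * (1.0 + 0.3 * np.arange(N) / N)     # hypothetical slow drift within one symbol
H = frequency_domain_channel(h)
H_d, H_n = split_ici(H)
X = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, N) + 1))   # QPSK symbols
Y = H @ X                                      # noise-free received vector
ici_power = np.linalg.norm(H_n @ X) ** 2       # energy of the ICI component
```

For a channel that does not change within the symbol, the time-domain matrix is circulant and the off-diagonal part vanishes, so any nonzero `H_n` here comes purely from the time variation.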
Because of the Doppler effect, a subcarrier is interfered with by the ICI of adjacent subcarriers. Figure 1 shows the simulation results of the ICI power influence of adjacent subcarriers on one subcarrier. The horizontal axis is the distance between adjacent subcarriers and a certain subcarrier, and the vertical axis is the average normalized power of the ICI. In this simulation, the number of subcarriers is \(N=64\), the carrier frequency is \(f_{c}=2.15\) GHz, and the mobile speed is 10/120/350 km/h. As can be seen from Fig. 1, most energy is distributed over the diagonal and its neighbors, and the main ICI power is generated by adjacent subcarriers. The closer the subcarriers are, the greater the interference to a certain subcarrier, and the greater the moving speed is, the more serious the influence is.
Projection wavelet weighted TSVR channel estimation
Linear interpolation methods, which are often adopted in typical channel estimation, are inadequate for time-varying fading channel estimation, whereas TSVR is suitable for regression of nonlinear systems due to its kernel mapping technique; therefore, we use this method for nonlinear channel estimation. TSVR was proposed by Peng (2010) as an extension of the support vector machine classification tool to regression applications; the target of the regression problem is to acquire the relationship between inputs and their corresponding outputs. On the basis of Peng's work, many improved algorithms were proposed (Huang et al. 2014; Balasundaram and Meena 2016b; Khemchandani et al. 2016; Parastalooi et al. 2016; Rastogi et al. 2017). In this section, a novel projection wavelet weighted TSVR is proposed for deep fading channel estimation, which inherits the advantages of TSVR regression for nonlinear systems.
Given a training set \(\mathbf {Tr}=\{(\mathbf {t}_{1},z_{1}),(\mathbf {t} _{2},z_{2}),\ldots ,(\mathbf {t}_{m},z_{m})\}\), where \(\mathbf {t}_{i}\in \mathbb {R }^{2}\) and \(z_{i}\in \mathbb {R}\), \(i=1,2,\ldots ,m\), m is the number of training samples. Then the output vector of the training data can be denoted as \(\mathbf {Z}=(z_{1},z_{2},\ldots ,z_{m})^{T}\in \mathbb {R}^{m}\) and the input matrix as \(\mathbf {T}=(\mathbf {t}_{1},\mathbf {t}_{2},\ldots ,\mathbf {t} _{m})^{T}\in \mathbb {R}^{m\times 2}\). The first column of \(\mathbf {T}\) represents the time domain positions of training samples and the second column of \(\mathbf {T}\) represents the frequency domain positions of training samples. Let \(\mathbf {e}\) and \(\mathbf {I}\) be a ones vector and an identity matrix of appropriate dimensions, respectively.
In a high mobility wireless environment, the channel is selective in both the time and frequency domains, and the doubly selective channel exhibits very complex nonlinearity in the case of fast and deep fading. Therefore, linear channel estimation methods cannot achieve high performance. We adopt the nonlinear PWWTSVR algorithm to satisfy the estimation requirements of nonlinear channels, since TSVR is well suited to nonlinear, small-sample and high-dimensional pattern recognition problems. Similar to the classical TSVR model (Peng 2010), the PWWTSVR is constructed from two nonparallel hyperplanes, the down-bound \(f_{1}(\mathbf {t})\) and the up-bound \(f_{2}(\mathbf {t})\); each hyperplane determines an \(\epsilon\)-insensitive bound regressor, and the end regressor is \(f(\mathbf {t})=\frac{1}{2}(f_{1}( \mathbf {t})+f_{2}(\mathbf {t}))\).
Channel estimation algorithm: projection wavelet weighted twin support vector regression
Initially, SVR was used to solve linear regression problems, where a hyperplane is fitted to the training samples to regress the relationship between inputs and outputs. However, the hyperplane method is only applicable to linear problems. Based on Vapnik's theory (Vapnik 1995), the algorithm can be extended by kernel mapping to nonlinear cases, which make up the majority of real-world problems. The kernel trick is adopted to map the input into a higher-dimensional feature space, and the following kernel-generated functions are considered: down-bound \(f_{1}(\mathbf {t})=K(\mathbf {t},\mathbf { T}^{T})\mathbf {w}_{1}+b_{1}\) and up-bound \(f_{2}(\mathbf {t})=K(\mathbf {t}, \mathbf {T}^{T})\mathbf {w}_{2}+b_{2}\), where \(\mathbf {w}_{1},\mathbf {w} _{2}\in \mathbb {R}^{m}\), m is the number of training points, \(b_{1},b_{2}\in \mathbb {R}\), and K is an appropriately chosen kernel. Therefore, the end regressor is the average of \(f_{1}(\mathbf {t})\) and \(f_{2}(\mathbf {t}),\) i.e. \(f(\mathbf {t})=\frac{1}{2}(f_{1}(\mathbf {t})+f_{2}( \mathbf {t}))\). The optimization problems can be described as follows:
and
where \(c_{11},c_{12},c_{13},c_{21},c_{22},c_{23}>0\) are parameters chosen a priori by the user; \(\varepsilon _{1}\) and \(\varepsilon _{2}\) are the insensitivity parameters, and \({\xi }_{1}\) and \({\xi }_{2}\) are slack vectors that measure the errors of samples outside the "\(\varepsilon\)-tube". \(\mathbf {D}\in \mathbb {R}^{m\times m}\) is a weighting matrix, which will be discussed later.
The first term in the objective function of (8) is the sum of weighted squared distances from the training points to the down-bound function, which is called the empirical risk. Minimizing it causes the function \(f_{1}(\mathbf {t})\) to fit the training samples and avoids underfitting. The second term is a regularization term, which makes \(f_{1}(\mathbf {t})\) as smooth as possible. Structural risk minimization is implemented by minimizing the regularization term \(\frac{1}{2}(\mathbf {w} _{1}^{T}\mathbf {w}_{1}+b_{1}^{2})\). A small value of \(\frac{1}{2}(\mathbf {w} _{1}^{T}\mathbf {w}_{1}+b_{1}^{2})\) corresponds to the function \(f_{1}(\mathbf {t})\) in (8) being flat. The third term, the data structure term, minimizes the empirical variance of the points projected onto the down-bound function. The fourth term aims to minimize the sum of errors of the points lower than the down-bound \(f_{1}(\mathbf {t})\), which can possibly overfit the training points. The ratios of the four penalty terms in the objective function of (8) can be adjusted by the choice of \(c_{11},c_{12},\) and \(c_{13}\). The optimization problem (9) can be interpreted similarly.
In the third term, the projection axis \(\hat{\mathbf {w}}_{k}=[\mathbf {w} _{k};-1]\), \(k=1,2,\) is normal to the line of the bound regression functions. The projection axis is chosen to make the projected zone, or the variance of the projected noise, as small as possible, and the following formula can be obtained.
where
and \(\mathbf {z}_{i}=(\varphi ( \mathbf {t}_{i});\,y_{i})\), \(i=1,\ldots ,m\), is the training point, \(\mu _{\mathbf {z}}\) is the centroid of the \(\mathbf {z}_{i}\), \({\Sigma }_{\mathbf {z}}\) is the covariance matrix of the \(\mathbf {z}_{i}\), \({\Sigma }_{\varphi (\mathbf {t} )}\) is the empirical covariance matrix of the inputs, and \({\Sigma } _{\varphi (\mathbf {t})y}\) is the empirical correlation coefficient matrix between the inputs and the responses. \(\mu _{\varphi ( \mathbf {t})}\) and \(\mu _{y}\) are the centroids of the inputs and outputs, respectively. \(\varphi (\mathbf {t})\) is the operation of mapping the input \(\mathbf {t}\) to a high-dimensional feature space using the kernel trick for nonlinear applications, i.e., \(\mathbf {t}\rightarrow \varphi (\mathbf {t})\). However, \(\varphi (\mathbf {t})\) lacks an explicit formulation due to the higher dimensions, which prevents the computation of \(\Sigma _{\varphi ( \mathbf {t})}\). In the work of Peng et al. (2014), the eigenvalue decomposition method is adopted to map explicitly to the empirical feature space. Let \(K(\mathbf {T}, \mathbf {T}^{T})\) denote an \(m\times m\) matrix of rank r, where K is an appropriately chosen kernel. Since \(K(\mathbf {T},\mathbf {T}^{T})\) is a symmetric positive-semidefinite matrix, it can be decomposed as \(K(\mathbf {T} ,\mathbf {T}^{T})=\mathbf {P}_{m\times r}{\varvec{\Lambda }} \mathbf {P}_{m\times r}^{T},\) where \({\varvec{\Lambda }}\) is a diagonal matrix containing only the r positive eigenvalues of \(K(\mathbf {T},\mathbf {T}^{T})\) in decreasing order, and \(\mathbf {P}_{m\times r}\) consists of the eigenvectors corresponding to the positive eigenvalues. The mapping from the input data space to the kernel space is expressed as
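A minimal sketch of such an empirical feature map follows. It assumes the common construction in which a point is mapped to \(\Lambda^{-1/2}\mathbf{P}^{T}K(\mathbf{T},\mathbf{t})\), and it assumes a Gaussian kernel of the form \(\exp(-\Vert\mathbf{a}-\mathbf{b}\Vert^{2}/\rho^{2})\); both choices, together with all function names and the toy data, are illustrative assumptions rather than the paper's exact definitions.

```python
import numpy as np

def gaussian_kernel(A, B, rho):
    """Assumed kernel form: exp(-||a - b||^2 / rho^2) for all row pairs of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / rho ** 2)

def empirical_feature_map(K, tol=1e-10):
    """Eigendecompose K and keep the r positive eigenvalues in decreasing order.

    Returns Phi (m x r) with Phi @ Phi.T approximately equal to K, plus (P, lam).
    """
    lam, P = np.linalg.eigh(K)                 # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, P = lam[order], P[:, order]
    keep = lam > tol * lam[0]                  # drop zero or numerically negative ones
    lam, P = lam[keep], P[:, keep]
    Phi = K @ P / np.sqrt(lam)                 # row i is Lambda^{-1/2} P^T K(T, t_i)
    return Phi, P, lam

def map_new_points(T_train, T_new, P, lam, rho):
    """Map unseen inputs with the same P and Lambda."""
    return gaussian_kernel(T_new, T_train, rho) @ P / np.sqrt(lam)

def empirical_covariances(Phi, z):
    """Empirical covariance of the mapped inputs and input-output covariance."""
    Phi_c = Phi - Phi.mean(axis=0)
    z_c = z - z.mean()
    m = len(z)
    return Phi_c.T @ Phi_c / m, Phi_c.T @ z_c / m

rng = np.random.default_rng(2)
T = rng.uniform(0.0, 1.0, size=(20, 2))        # (time, frequency) positions, toy data
z = np.sin(2 * np.pi * T[:, 0]) + 0.1 * rng.standard_normal(20)
K = gaussian_kernel(T, T, rho=0.5)
Phi, P, lam = empirical_feature_map(K)
Sigma_phi, Sigma_phi_z = empirical_covariances(Phi, z)
```

Once the inputs have explicit coordinates `Phi`, the covariance terms that the implicit map made unavailable become ordinary matrix products.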
To solve the optimization problems (8) and (9), we convert the constrained problems to a pair of unconstrained problems by introducing the plus function (\(\cdot\))\(_{+}\) and substitute (10) into (8) and (9) as follows.
and
The plus functions \((\cdot )_{+}\) in (13) and (14) are not differentiable, but they can be replaced by smooth approximate functions \(\mathbf {p}(\cdot )\). In this paper, we adopt the integral of the sigmoid function as the smooth function; it is defined as
\[\mathbf {p}(x,\alpha )=x+\frac{1}{\alpha }\log (1+\exp (-\alpha x)),\qquad (15)\]
where \(\alpha\) is a positive real constant. Defining \(\mathbf {f}_{1}=(\mathbf { Y}+\varepsilon _{1}\mathbf {e})\), \(\mathbf {f}_{2}=(\mathbf {Y}-\varepsilon _{2} \mathbf {e})\), \(\mathbf {u}_{1}=[\mathbf {w}_{1}^{T},b_{1}]^{T}\), \(\mathbf {u} _{2}=[\mathbf {w}_{2}^{T},b_{2}]^{T}\), \(\mathbf {G}=[K(\mathbf {T},\mathbf {T} ^{T}),\mathbf {e}]\) and replacing the plus functions \((\cdot )_{+}\) in (13) and (14) by (15), we get
and
Note that \(L_{1}\) in (16) and \(L_{2}\) in (17) are convex (Wang et al. 2019). Global and unique solutions can be obtained, and the Newton iterative approach can be adopted to solve the minimization problems as follows.
The first- and second-order gradients of \(L_{1}\) in (16) and \(L_{2}\) in (17) are deduced as follows
and
where \(\mathbf {Q}_{1}=\left[ \begin{array}{ll} c_{11}\mathbf {I}+K(\mathbf {T},\mathbf {T}^{T})^{T}\mathbf {D}K(\mathbf {T}, \mathbf {T}^{T})+{\Sigma }_{\varphi (\mathbf {t})} & K( \mathbf {T},\mathbf {T}^{T})^{T}\mathbf {De} \\ \mathbf {e}^{T}\mathbf {D}K(\mathbf {T},\mathbf {T}^{T}) & c_{11}+ \mathbf {e}^{T}\mathbf {De} \end{array} \right]\), \(\mathbf {P}_{1}=\left[ \begin{array}{l} K(\mathbf {T},\mathbf {T}^{T})^{T}\mathbf {DY}+c_{12}{\Sigma }_{\varphi ( \mathbf {t})y} \\ \mathbf {e}^{T}\mathbf {DY} \end{array} \right]\), \(\mathbf {Q}_{2}=\left[ \begin{array}{ll} c_{21}\mathbf {I}+K(\mathbf {T},\mathbf {T}^{T})^{T}\mathbf {D}K(\mathbf {T},\mathbf {T}^{T})+ {\Sigma }_{\varphi (\mathbf {t})} & K(\mathbf {T},\mathbf {T} ^{T})^{T}\mathbf {De} \\ \mathbf {e}^{T}\mathbf {D}K(\mathbf {T},\mathbf {T}^{T}) & c_{21}+ \mathbf {e}^{T}\mathbf {De} \end{array} \right]\), and \(\mathbf {P}_{2}=\left[ \begin{array}{l} K(\mathbf {T},\mathbf {T}^{T})^{T}\mathbf {DY}+c_{22}{\Sigma }_{\varphi ( \mathbf {t})y} \\ \mathbf {e}^{T}\mathbf {DY} \end{array} \right] .\)
The iterative solutions of minimization problems (13) and (14) can be obtained by adopting the Newton method and using (18)–(21), as follows:
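The role of the smoothing step in a Newton iteration can be illustrated on a small scale. The sketch below assumes the sigmoid-integral smoothing \(p(x,\alpha)=x+\frac{1}{\alpha}\log(1+e^{-\alpha x})\) and applies Newton's method to a one-dimensional toy objective; the toy objective, its parameters `a`, `b`, `c`, and all function names are hypothetical and are not the PWWTSVR objective itself.

```python
import numpy as np

def plus(x):
    """Non-differentiable plus function (x)_+ = max(x, 0)."""
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha):
    """x + log(1 + exp(-alpha x)) / alpha, evaluated stably as log(1 + e^{alpha x}) / alpha."""
    return np.logaddexp(0.0, alpha * x) / alpha

def smooth_plus_grad(x, alpha):
    """First derivative of smooth_plus: the sigmoid 1 / (1 + exp(-alpha x))."""
    return 0.5 * (1.0 + np.tanh(0.5 * alpha * x))

def smooth_plus_hess(x, alpha):
    """Second derivative of smooth_plus: alpha * s * (1 - s) with s the sigmoid."""
    s = smooth_plus_grad(x, alpha)
    return alpha * s * (1.0 - s)

def newton_toy(a, b, c, alpha, u0=0.0, iters=50, tol=1e-12):
    """Newton iterations on the toy objective 0.5 (u - a)^2 + c * p(b - u, alpha)."""
    u = u0
    for _ in range(iters):
        grad = (u - a) - c * smooth_plus_grad(b - u, alpha)
        hess = 1.0 + c * smooth_plus_hess(b - u, alpha)   # always >= 1, so convex
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    return u

a, b, c, alpha = 0.0, 1.0, 1.0, 5.0
u_star = newton_toy(a, b, c, alpha)
```

The smoothed function differs from the plus function by at most \(\log(2)/\alpha\) (attained at zero), so a larger \(\alpha\) gives a tighter approximation at the cost of a sharper second derivative.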
Training samples
SVR is a supervised machine learning method, which requires the inputs and outputs of training samples for parameter training. The inputs of the training samples are the positions of the estimated channel response in the time and frequency domains, and the outputs are the channel responses estimated from the received pilots. In the proposed method, the pilots are inserted equidistantly in both the time and frequency domains. OFDM symbols that carry pilots are called pilot symbols, and those without pilots are called data symbols. The positions of the pilot symbols are expressed as \([n\Delta t]\), \(n=0,1,\ldots ,N_{t}-1,\) where \(\Delta t\) is the pilot symbol interval in the time domain and \(N_{t}\) is the number of pilot symbols. In an OFDM symbol, the transmitted pilot subcarrier positions are expressed as \([m\Delta f]\), \(m=0,1,\ldots ,N_{f}-1,\) where \(\Delta f\) is the pilot interval in the frequency domain and \(N_{f}\) is the number of pilots in an OFDM symbol. The pilot insertion scheme is shown in Fig. 2. Let the transmitted pilot matrix be \(\mathbf {X}_{P}={\text {diag}}(X(n\Delta t,m\Delta f))\in \mathbb {C} ^{N_{t}N_{f}\times N_{t}N_{f}}\); the channel frequency response estimated at the pilot subcarriers according to (7) is then
\[\hat{\mathbf {H}}_{P}=\mathbf {X}_{P}^{-1}\mathbf {Y}_{P},\qquad (24)\]
where \(\mathbf {Y}_{P}=\mathbf {Y}(n\Delta t,m\Delta f)\in \mathbb {C} ^{N_{t}N_{f}}\) is the received pilot vector and \(\hat{\mathbf {H}}_{P}= \hat{\mathbf {H}}(n\Delta t,m\Delta f)\in \mathbb {C}^{N_{t}N_{f}}\) is the estimated frequency response at the pilot positions \((n\Delta t,m\Delta f)\).
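Building the training set from an equidistant pilot grid can be sketched as follows. Since the pilot matrix is diagonal, the least-squares estimate reduces to an element-wise division at the pilot positions. The grid layout (rows as OFDM symbols, columns as subcarriers), the function name, and the toy channel are assumptions made for illustration; splitting the complex estimates into real and imaginary parts is likewise an assumption, motivated by the real-valued outputs \(z_{i}\) in the training set definition.

```python
import numpy as np

def pilot_training_set(Y, X, dt, df):
    """Return inputs T (m x 2: time index, subcarrier index) and LS estimates (m,).

    Y, X: (num_symbols, N) received and transmitted frequency-domain grids.
    Pilots are assumed at symbol indices 0, dt, 2dt, ... and subcarriers 0, df, 2df, ...
    """
    t_idx = np.arange(0, Y.shape[0], dt)
    f_idx = np.arange(0, Y.shape[1], df)
    tt, ff = np.meshgrid(t_idx, f_idx, indexing="ij")
    H_ls = (Y[tt, ff] / X[tt, ff]).ravel()     # X_P is diagonal: inverse is element-wise
    T = np.column_stack([tt.ravel(), ff.ravel()]).astype(float)
    return T, H_ls

rng = np.random.default_rng(3)
n_sym, N = 8, 16
H_true = np.exp(1j * 2 * np.pi * rng.uniform(size=(n_sym, N)))   # toy channel grid
X = np.ones((n_sym, N), dtype=complex)         # all-ones pilots for simplicity
Y = H_true * X                                 # noise-free, ICI ignored
T, H_ls = pilot_training_set(Y, X, dt=2, df=2)
z_real, z_imag = H_ls.real, H_ls.imag          # assumed: regress each part separately
```

With `dt = df = 2` on an 8 x 16 grid this yields 32 training samples, each a (time, frequency) position paired with one noisy channel estimate.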
After interpolation, the frequency response of data position can be calculated and the predicted frequency response of all subcarriers in an OFDM symbol can be expressed as
Weighting parameters
The parameter \(\mathbf {D}\) mentioned above is the weighting matrix, and it can be determined beforehand according to the importance of the training data. The samples used for training are all polluted by noise of different amplitudes. Intuitively, samples polluted by large noise should be given smaller weights. Therefore, all training data are given penalty weights through the weighting parameter \(\mathbf {D}\) in the proposed algorithm. The weighting parameters, which represent the distance between the noisy samples and their 'real positions', reflect the prior information of the training samples. In classical SVR algorithms such as TSVR, by contrast, no weights are applied to the samples, which means that the weights are all one and all samples are treated equally. Points with too much noise, such as outliers, will degrade the performance of the regressors. Different training samples should be given different weights, where a larger weight means that the sample is more important. Motivated by this idea and noting that the Gaussian function can reflect this trend, the weighting parameter \(\mathbf {D}\) is determined by the Gaussian function described as follows.
where A is the amplitude, \(\sigma\) represents the standard deviation, and \(\hat{\mathbf {Y}}=[\hat{y}_{1},\hat{y}_{2},\ldots ,\hat{y}_{N_{t}N_{f}}]^{T}\) is the vector of estimated values of the output \(\mathbf {Y}=[y_{1},y_{2},\ldots ,y_{N_{t}N_{f}}]^{T}\). In this paper, a wavelet transform based method is adopted to calculate \(\hat{\mathbf {Y}}\). The wavelet transform is a time-frequency analysis method, and wavelet filters are applicable to short-term signal processing, which matches the characteristics of most actual time-series signals. It inherits and develops the localization idea of the short-time Fourier transform and overcomes the shortcoming that the window size does not change with frequency: it provides a time-frequency window that changes with frequency. It is a relatively ideal tool for time-frequency analysis and signal processing. In addition, the inserted pilots are distributed in two dimensions, time and frequency; therefore two-dimensional wavelet filtering is adopted to calculate \(\hat{\mathbf {Y}}\) in three steps. Firstly, the signal sequence is decomposed: the wavelet transform may be considered a form of time-frequency representation for signals and is thus related to harmonic analysis, and for discrete data the discrete wavelet transform (DWT) uses discrete-time filter banks to decompose the signal sequence. Secondly, the decomposed sequence is processed by an appropriate algorithm to remove noise. In this paper, the high-frequency part of the decomposed signal is directly set to zero as the denoising algorithm. Thirdly, the estimated output \(\hat{y}\) is reconstructed from the denoised sequence, and the vector of estimated outputs \(\hat{\mathbf {Y}}\) is obtained. Substituting \(\hat{\mathbf {Y}}\) into (26), the weighting vector \(\mathbf {d}\) and matrix \(\mathbf {D}\) can be calculated.
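The three steps (decompose, zero the high-frequency part, reconstruct) and the subsequent weighting can be sketched as below. Two simplifications are assumed and should be read as such: a hand-written one-level Haar transform is used in place of the Db3 wavelet mentioned later in the paper, so that the example needs only NumPy, and the Gaussian weight is assumed to take the form \(d_{i}=A\exp(-|y_{i}-\hat{y}_{i}|^{2}/(2\sigma^{2}))\) with \(\mathbf{D}=\text{diag}(\mathbf{d})\). All function names are hypothetical.

```python
import numpy as np

def haar_dwt(x, axis):
    """One-level Haar analysis along `axis` (length must be even)."""
    x = np.moveaxis(x, axis, 0)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.moveaxis(approx, 0, axis), np.moveaxis(detail, 0, axis)

def haar_idwt(approx, detail, axis):
    """Inverse of haar_dwt along `axis`."""
    a = np.moveaxis(approx, axis, 0)
    d = np.moveaxis(detail, axis, 0)
    x = np.empty((2 * a.shape[0],) + a.shape[1:], dtype=a.dtype)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return np.moveaxis(x, 0, axis)

def denoise_2d(Y):
    """Step 1: decompose; step 2: zero all detail sub-bands; step 3: reconstruct."""
    lo, hi = haar_dwt(Y, axis=0)
    ll, lh = haar_dwt(lo, axis=1)
    lo_hat = haar_idwt(ll, np.zeros_like(lh), axis=1)
    return haar_idwt(lo_hat, np.zeros_like(hi), axis=0)

def gaussian_weights(Y, Y_hat, A=1.0, sigma=1.0):
    """Assumed weight form: d_i = A exp(-|y_i - yhat_i|^2 / (2 sigma^2)), D = diag(d)."""
    d = (A * np.exp(-np.abs(Y - Y_hat) ** 2 / (2 * sigma ** 2))).ravel()
    return d, np.diag(d)

Y = np.ones((8, 8))            # toy pilot grid (time x frequency), real part only
Y[3, 4] += 5.0                 # one outlier sample
Y_hat = denoise_2d(Y)
d, D = gaussian_weights(Y, Y_hat, A=1.0, sigma=1.0)
outlier_index = int(np.argmin(d))   # position of the smallest weight in the flat vector
```

Samples that agree with their denoised value keep a weight close to A, while a sample far from it, such as the injected outlier, receives a much smaller weight and therefore contributes less to the empirical risk terms.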
Weighting parameters were also introduced in other improved TSVR algorithms; for example, the K-nearest neighbor (KNN) algorithm is adopted in the work of Xu and Wang (2014). One of the differences between the PWWTSVR and the method proposed by Xu and Wang (2014) lies in the objects being processed. The KNN algorithm is suitable for dealing with clustered samples. The idea of KNN is that a sample point x is important if it has a large number of K-nearest neighbors, whereas it is not important if it is an outlier that has a small number of K-nearest neighbors. In contrast, the wavelet transform adopted in our proposed algorithm is a kind of time-frequency representation for signals, and the proposed method based on wavelet theory is suitable for dealing with time-series samples.
Algorithm summary
In this subsection, we illustrate the channel estimation algorithm.
Algorithm: Channel estimation based on PWWTSVR
Computational complexity analysis
Computational complexity is an important performance measure of an algorithm. Consider a calculation period in which the number of pilot symbols is \(N_{t}\) and the number of pilots in one pilot symbol is \(N_{f}\); the number of training samples is then \(l=N_{t}N_{f}\). In step 1 of the algorithm described in the previous subsection, the calculation of \(\hat{\mathbf {H}}_{P}\) requires O(l) complex multiplications, and step 2 requires O(l) real multiplications. In step 3, since the input \(\mathbf {t}\) is known at the receiver, \(\varphi (\mathbf {t})\) in (12) can be precomputed and stored at the receiver. Therefore, the calculations of \({\Sigma }_{\varphi (\mathbf {t})}\) and \({\Sigma }_{\varphi (\mathbf {t})y}\) each require O(l). Step 4 is an iterative process. The main calculation comes from the inversion of two matrices, \((\nabla ^{2}L_{1}(\mathbf {u}_{1}))^{-1}\) and \((\nabla ^{2}L_{2}(\mathbf {u}_{2}))^{-1}\) in (22) and (23) respectively, and one matrix inversion requires about O(\(l^{3}\)). Let the number of iteration steps be n (generally, \(n<10\)); then the computational complexity of this step is about O(2\(nl^{3}\)). In addition, the proposed algorithm computes the wavelet transform weighting matrix. A Db3 wavelet with a length of 6 is used in this paper, so the complexity of the wavelet transform is less than 12l. Compared with the matrix inversions, the complexity of computing the wavelet weighting matrix can be ignored. Step 5 also requires O(l). Step 6 repeats the previous calculation. If we retain only the inversion computations and ignore the others, the total computational complexity is about O(4\(nl^{3}\)).
Experimental results
In this section, we show some simulation results of the proposed channel estimation algorithm based on the novel PWWTSVR. TSVR is effective for channel estimation, and its regression performance has been verified by the works of Charrada and Samet (2016) and Matilde et al. (2004); therefore, we compare the performance of the proposed algorithm with the linear interpolation method and the TSVR (Peng 2010) based channel estimation algorithm. Consider an OFDM system with a doubly selective channel. The number of paths is \(L+1=6\), and the channel taps are assumed to be i.i.d. and correlated in time with a correlation function according to Jakes' model (Stuber 1996), \(E[h(n_{1},l_{1})h^{*}(n_{2},l_{2})]=\sigma _{h}^{2}J_{0}(2\pi f_{\max }T_{s}(n_{1}-n_{2}))\delta (l_{1}-l_{2}),\) where \(E(\cdot )\) denotes the expected value, (\(\cdot\))\(^{*}\) denotes the conjugate, \(n_{k}\) is the time index, \(l_{k}\) is the channel path index, \(J_{0}\) is the zeroth-order Bessel function of the first kind, \(T_{s}\) is the sampling time interval, and \(\sigma _{h}^{2}\) denotes the variance of the channel. We consider \(N=64\) subcarriers and a CP of length 5 (equal to the channel order \(L=5\)). The sampling time interval is \(T_{s}=72\,\mu\)s, the carrier frequency is \(f_{c}=2.15\) GHz, the mobile speed is 120/350 km/h, and 16QAM signaling is assumed.
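The Jakes correlation for these parameters can be evaluated with a short script. The sketch assumes the usual relation \(f_{\max}=v f_{c}/c\) for the maximum Doppler frequency, which the text does not state explicitly, and evaluates \(J_{0}\) through its integral representation so that only NumPy is required; the function names are hypothetical.

```python
import numpy as np

C_LIGHT = 299_792_458.0        # speed of light in m/s

def max_doppler(speed_kmh, fc_hz):
    """Assumed relation f_max = v * f_c / c, with v converted from km/h to m/s."""
    return speed_kmh / 3.6 * fc_hz / C_LIGHT

def bessel_j0(x, n=2000):
    """J0(x) = (1/pi) * integral_0^pi cos(x sin(theta)) d(theta), midpoint rule."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    x = np.asarray(x, dtype=float)
    return np.mean(np.cos(np.multiply.outer(x, np.sin(theta))), axis=-1)

def jakes_correlation(lag, f_max, Ts, sigma_h2=1.0):
    """sigma_h^2 * J0(2 pi f_max Ts (n1 - n2)) for tap samples `lag` apart."""
    return sigma_h2 * bessel_j0(2 * np.pi * f_max * Ts * np.asarray(lag, dtype=float))

Ts, fc = 72e-6, 2.15e9
for speed in (120.0, 350.0):
    f_max = max_doppler(speed, fc)
    corr = jakes_correlation(np.arange(0, 65, 16), f_max, Ts)
    print(f"{speed:.0f} km/h: f_max = {f_max:.1f} Hz, "
          f"correlation at lags 0,16,...,64 = {np.round(corr, 3)}")
```

A faster mobile gives a larger Doppler frequency, so the correlation between tap values decays more quickly with the lag, which is the time-selectivity the estimator has to track.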
To demonstrate the effects of multipath and moving speed on the channel frequency response, four scenarios are simulated. Figure 3 shows the channel frequency response at the subcarriers of an OFDM symbol for multipath numbers of 2 and 5 and mobile speeds of 0 and 350 km/h. From Fig. 3, we can see that multipath causes frequency-domain fading, and the larger the number of paths, the deeper the fading. At the same time, mobility causes ICI: the faster the movement, the larger the ICI. According to (7), the ICI is determined by the product of the channel response \(\mathbf {H}_{n}\) and the data to be transmitted \(\mathbf {X}\). Because the transmitted data are random, the resulting ICI behaves like noise, which is reflected in the simulations.
The computer simulations are implemented in a Matlab R2014a environment on a PC. Because the Gaussian kernel is an effective and frequently used kernel function in the TSVR research field, the Gaussian nonlinear kernel is adopted in this paper for the proposed PWWTSVR and for TSVR (Peng 2010), that is
where \(\rho\) determines the width of the Gaussian function. The choice of parameters is essential for the performance of the algorithms. In this paper, parameter values are chosen by the grid search method from the set \(\{10^{i}\mid i=-4,-3,\ldots ,5\}\).
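As a small illustration of the kernel and of the candidate parameter set, consider the following Python sketch. It assumes the common parametrization \(K(\mathbf {x},\mathbf {y})=\exp (-\Vert \mathbf {x}-\mathbf {y}\Vert ^{2}/(2\rho ^{2}))\); the exact scaling used in the paper may differ, and the function name is hypothetical.

```python
import math


def gaussian_kernel(x, y, rho):
    """Gaussian kernel between two equal-length vectors.

    The scaling 2 * rho**2 in the exponent is an assumption of this sketch.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * rho ** 2))


# Candidate parameter values 10^i for i = -4, -3, ..., 5.
grid = [10.0 ** i for i in range(-4, 6)]
print(grid)

# The kernel value between two fixed points grows towards 1 as rho increases.
x, y = (0.0, 0.0), (1.0, 1.0)
for rho in (0.1, 1.0, 10.0):
    print(rho, gaussian_kernel(x, y, rho))
```

In a grid search, each candidate value would be tried in turn and the one giving the best validation error retained; the validation procedure itself is not specified here.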
Figure 4 illustrates the regression of PWWTSVR. In this simulation experiment, pilots are inserted in the frequency and time domains with insertion intervals \(\Delta f=2\) and \(\Delta t=2\), SNR = 10 dB, number of multipaths \(L=5\) and mobile speed \(v=350\) km/h. We can see that the proposed PWWTSVR fits the channel frequency response and that the outliers (pilot samples with too much noise) are ignored, which shows the robustness of the proposed algorithm. The star points represent the channel response estimated at the pilot positions by the LS method, the solid line represents the interpolation obtained with the PWWTSVR algorithm, and the dashed line represents the perfect estimation result, which is not affected by noise in the simulation.
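The pilot layout and the LS estimates that serve as training samples can be sketched as follows. This is a minimal illustration that assumes a rectangular pilot grid starting at index 0 and the usual per-pilot LS estimate \(\hat{H}=Y/X\); the function names and the toy values are hypothetical.

```python
def pilot_positions(n_symbols, n_subcarriers, dt=2, df=2):
    """(symbol, subcarrier) index pairs of pilots on a rectangular grid."""
    return [(t, f)
            for t in range(0, n_symbols, dt)
            for f in range(0, n_subcarriers, df)]


def ls_estimate(received, transmitted):
    """Least-squares estimate H = Y / X at one pilot position."""
    return received / transmitted


# Toy example: 4 OFDM symbols, 8 subcarriers, pilots on every 2nd symbol and subcarrier.
pilots = pilot_positions(n_symbols=4, n_subcarriers=8)
print(len(pilots), pilots[:3])

# A known pilot X passing through a channel H, with noise ignored for clarity.
x_pilot = 1 + 1j
h_true = 0.8 - 0.3j
y_pilot = h_true * x_pilot
print(ls_estimate(y_pilot, x_pilot))
```

With noise present, each LS estimate deviates from the true response, and heavily corrupted pilots are the outliers that the weighting in PWWTSVR is meant to suppress.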
The performance of channel estimation based on PWWTSVR and on the other methods is evaluated by the following criteria. The number of testing samples is denoted by l, \(y_{i}\) denotes the real value of a testing sample point \(x_{i}\), and \(\hat{y}_{i}\) denotes the predicted value of \(x_{i}\).
SNR: Signal-to-noise ratio, defined as \({\text {SNR}}=10\log _{10}(\sigma _{x}^{2}/\sigma _{v}^{2})\), where the signal power is \(\sigma _{x}^{2}=E(\left| x(k)\right| ^{2})\) and \(\sigma _{v}^{2}\) is the variance of the AWGN.
SSE: Sum of squared errors of the testing samples, defined as \({\text {SSE}}= \sum _{i=1}^{l}\left| y_{i}-\hat{y}_{i}\right| ^{2}\).
BER: Bit error rate, defined as \({\text {BER}}=N_{e}/N\), where \(N_{e}\) is the number of erroneous bits and N is the total number of bits.
SSE represents the fitting precision, but too small an SSE value may indicate overfitting, that is, the regressor has fitted the noise.
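The three criteria translate directly into code. The following Python sketch mirrors the definitions above; the function names and the toy inputs are illustrative assumptions.

```python
import math


def snr_db(signal_power, noise_variance):
    """SNR in dB: 10 * log10(sigma_x^2 / sigma_v^2)."""
    return 10.0 * math.log10(signal_power / noise_variance)


def sse(y_true, y_pred):
    """Sum of squared errors; abs() also handles complex samples."""
    return sum(abs(a - b) ** 2 for a, b in zip(y_true, y_pred))


def ber(sent_bits, received_bits):
    """Fraction of bit positions that differ between the two sequences."""
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


print(snr_db(1.0, 0.1))
print(sse([1 + 1j, 2.0], [1 + 0j, 2.5]))
print(ber([0, 1, 1, 0], [0, 1, 0, 0]))
```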
To compare kernel functions, six kernels, including the Gaussian kernel adopted in the proposed method, are tested on the Sinc function (\(f(x)=\frac{\sin x}{x}\)). The kernel function expressions and the corresponding SSE performance are listed in Table 1. The Gaussian kernel achieves the best results.
Outliers may influence the regression. To test the robustness of the proposed algorithm to outliers, its performance on the Sinc function corrupted by additive Gaussian noise with zero mean and variance \(\varsigma ^{2}=0.1^{2}\) is studied. Figure 5 shows that the prediction performance of the proposed method is satisfactory.
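A noisy Sinc dataset of this kind can be generated as in the sketch below. The sampling interval, the number of samples and the random seed are assumptions made only for illustration; they are not taken from the experiments reported here.

```python
import math
import random


def sinc(x):
    """Unnormalised Sinc function sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def noisy_sinc_samples(n, lo, hi, noise_std=0.1, seed=0):
    """n evenly spaced samples of sinc on [lo, hi] plus zero-mean Gaussian noise."""
    rng = random.Random(seed)
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    ys = [sinc(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


# Interval and sample count are illustrative assumptions.
xs, ys = noisy_sinc_samples(n=101, lo=-10.0, hi=10.0)
noise_energy = sum((y - sinc(x)) ** 2 for x, y in zip(xs, ys))
print(len(xs), noise_energy)
```

The printed noise energy is the SSE that a perfect regressor would leave on these training points, which gives a reference level below which a small SSE starts to indicate fitting of the noise.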
Figures 6, 7, 8 and 9 show the BER performance as a function of the SNR for different pilot spacings and different moving speeds. We compare the BER of the proposed algorithm with that of TSVR estimation (Peng 2014), linear interpolation estimation, K-nearest TSVR (Gupta 2017) and perfect estimation in the presence of additive Gaussian noise. Table 2 shows the SSE performance of the channel frequency response for the various algorithms. It can be seen that the proposed method outperforms the TSVR estimation and the linear interpolation estimation. The BER performances of the proposed method and of the TSVR estimation are both better than that of linear interpolation estimation, which means that SVR is suitable for regression problems based on training samples, especially in nonlinear cases. At the same time, our PWWTSVR estimation outperforms the TSVR estimation, which shows the effectiveness of adopting the weighting parameter \(\mathbf {D}\) in the objective functions.
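For reference, the linear interpolation baseline can be sketched in a few lines. The version below is an assumption about that baseline rather than the exact procedure used in the simulations: it interpolates in one dimension across subcarriers and holds the nearest pilot value outside the pilot range. The function name is hypothetical.

```python
def interpolate_pilots(pilot_idx, pilot_vals, n_subcarriers):
    """Piecewise-linear interpolation of channel estimates across subcarriers.

    pilot_idx must be sorted in increasing order. Positions outside the
    first and last pilot take the value of the nearest pilot.
    """
    out = []
    for k in range(n_subcarriers):
        if k <= pilot_idx[0]:
            out.append(pilot_vals[0])
        elif k >= pilot_idx[-1]:
            out.append(pilot_vals[-1])
        else:
            # Index of the last pilot at or before subcarrier k.
            j = max(i for i, p in enumerate(pilot_idx) if p <= k)
            p0, p1 = pilot_idx[j], pilot_idx[j + 1]
            w = (k - p0) / (p1 - p0)
            out.append((1 - w) * pilot_vals[j] + w * pilot_vals[j + 1])
    return out


# Pilots on every 2nd subcarrier; values may be real or complex LS estimates.
print(interpolate_pilots([0, 2, 4], [1.0, 3.0, 5.0], 6))
```

Such an interpolator passes exactly through every pilot estimate, noisy or not, which is one reason a regression-based estimator can do better at low SNR.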
The mobile speed v and the pilot spacings \(\Delta f\) in the frequency domain and \(\Delta t\) in the time domain also affect the estimation performance. A lower moving speed leads to better BER and SSE performance, and a smaller pilot spacing achieves better estimation quality at the cost of increased computational complexity. The simulation results listed in Table 2 show that our proposed method obtains the best SSE values compared with TSVR estimation and linear interpolation estimation, which shows its good fitting ability. Furthermore, we notice that there are special cases due to randomness. For example, the SSE for \(\Delta f=4\), \(\Delta t=4\), SNR = 5 dB and \(v=350\) km/h in Table 2 is less than that for \(v=120\) km/h.
In summary, the simulation results confirm that the proposed channel estimation algorithm performs better than the conventional methods, and that introducing the projection term and the weighting parameters calculated by the wavelet transform improves the performance of the channel estimation.
Conclusions
In this paper, a novel PWWTSVR based channel estimator is proposed. Pilots are inserted on a grid in the time and frequency domains of the OFDM system, so the channel estimation is performed in both domains using the grid-inserted pilots. First, the channel frequency response is calculated at the pilot positions, which yields the training samples for the PWWTSVR. The channel frequency response at the data positions is then estimated by the proposed algorithm. Unlike TSVR, samples in different positions in the proposed PWWTSVR model are given different weights according to the distance between the samples and the results preprocessed by the wavelet transform. Computational comparisons between PWWTSVR and other existing methods show the better performance of the PWWTSVR based channel estimation, which also demonstrates the effectiveness of the proposed method, especially for nonlinear cases. Furthermore, since one of the theoretical bases of the proposed method is wavelet theory, which is a powerful denoising tool for time series signals, the proposed method is suitable for time series datasets such as those arising in the estimation of channel parameters. Additionally, SVR is suitable for small datasets, whereas a large number of training samples incurs a very high computational cost. For future work, algorithms that insert only a small proportion of pilots in the time and frequency domains, based on more accurate channel models, should be studied.
References
Anagha P, Balasundaram S, Meena Y (2018) On robust twin support vector regression in primal using squared pinball loss. J Intell Fuzzy Syst 35(5):5231–5239
Anand P, Rastogi R, Chandra S (2018) Generalized \(\varepsilon\)-loss function-based regression. In: Machine intelligence and signal analysis, pp 395–409
Anand P, Rastogi R, Chandra S (2019) A new asymmetric insensitive pinball loss function based support vector quantile regression model. arXiv:1908.06923
Balasundaram S, Meena Y (2016a) A new approach for training Lagrangian support vector regression. Knowl Inf Syst 49:1097–1129
Balasundaram S, Meena Y (2016b) Training primal twin support vector regression via unconstrained convex minimization. Appl Intell 44:931–955
Balasundaram S, Gupta D, Kapil (2014) Lagrangian support vector regression via unconstrained convex minimization. Neural Netw 5:67–79
Charrada A, Samet A (2016) Joint interpolation for LTE downlink channel estimation in very high-mobility environments with support vector machine regression. IET Commun 10(17):2435–2444
Djouama A, Lim MS, Ettoumi FY (2014) Channel estimation in long term evolution uplink using minimum mean square error-support vector regression. Wirel Pers Commun 79:2291–2304
Gao C, Shen M, Liu X, Wang L, Chu M (2019) Endpoint static control of basic oxygen furnace (BOF) steelmaking based on wavelet transform weighted twin support vector regression. Complexity 2019:1–16
Gupta D (2017) Training primal K-nearest neighbor based weighted twin support vector regression via unconstrained convex minimization. Appl Intell 47(3):962–991
Gupta U, Gupta D (2019) An improved regularization based Lagrangian asymmetric \(\nu\)-twin support vector regression using pinball loss function. Appl Intell 49:3606–3627
Huang XL, Shi L, Pelckmans K, Suykens JAK (2014) Asymmetric \(\nu\)-tube support vector regression. Comput Stat Data Anal 77:371–382
Khemchandani R, Goyal K, Chandra S (2016) TWSVR: regression via twin support vector machine. Neural Netw 74:14–21
Matilde SF, Mario PC, Jeronimo AG, Fernando PC (2004) SVM multiregression for nonlinear channel estimation in multiple-input multiple-output systems. IEEE Trans Signal Proc 52(8):2298–2307
Melki G, Cano A, Kecman V, Ventura S (2017) Multi-target support vector regression via correlation regressor chains. Inform Sci 415–416:53–69
Parastalooi N, Amiri A, Aliherdari P (2016) Modified twin support vector regression. Neurocomputing 211:84–97
Peng X (2010) TSVR: an efficient twin support vector machine for regression. Neural Netw 23(3):356–372
Peng X, Xu D, Shen J (2014) A twin projection support vector machine for data regression. Neurocomputing 138:131–141
Rastogi R, Anand P, Chandra S (2017) A \(\nu\)-twin support vector machine based regression with automatic accuracy control. Appl Intell 46:670–683
Sheng Z, Tuan HD, Nguyen HH, Fang Y (2017) Pilot optimization for estimation of high-mobility OFDM channels. IEEE Trans Veh Technol 66(10):8795–8806
Stuber GL (1996) Principles of mobile communication. Kluwer, Boston
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Wang LD, Gao C, Zhao NN, Chen XB (2019) A projection wavelet weighted twin support vector regression and its primal solution. Appl Intell. https://doi.org/10.1007/s10489-019-01422-7
Wu J, Fan P (2016) A survey on high mobility wireless communications: Challenges, opportunities and solutions. IEEE Access 4:450–476
Xu Y, Wang L (2014) K-nearest neighbor-based weighted twin support vector regression. Appl Intell 41:299–309
Xu Y, Li X, Pan X, Yang Z (2017) Asymmetric \(\nu\)-twin support vector regression. Neural Comput Appl 2:1–16
Yu X, Luo Y, Chen X (2016) An optimized seamless dual-link handover scheme for high-speed rail. IEEE Trans Veh Technol 65(10):8658–8668
Acknowledgements
This work was supported in part by the Fund of the Scientific Research Project of Liaoning Provincial Department of Education (Grant No. 2019LNJC02), in part by the Funds of the National Natural Science Foundation of China (Grant Nos. 71571091 and 71771112), and in part by the Fund of the University of Science and Technology Liaoning Talent Project (Grant No. 60101150703).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, L., Ma, Y., Chang, X. et al. Projection wavelet weighted twin support vector regression for OFDM system channel estimation. Artif Intell Rev 54, 469–489 (2021). https://doi.org/10.1007/s10462-020-09853-2
Keywords
 Twin support vector regression
 Wavelet transform
 Unconstrained minimization
 Channel estimation
 OFDM