1 Introduction

The portfolio selection problem has been one of the core issues of modern investment theory (Ao et al., 2019). How to construct an effective portfolio that improves out-of-sample performance is a focus of both academia and industry (Ma et al., 2019). In practice, an often ignored fact is that noise is an important factor affecting portfolio performance (Kondor et al., 2007; Dessaint et al., 2019; Peress and Schmidt, 2020). Some studies indicate that denoising can significantly improve investors’ returns (Aloui and Jammazi, 2015; Zhu et al., 2019, 2021). However, common denoising methods, especially empirical mode decomposition (EMD) denoising, have weaknesses in portfolio management, such as inadequate or excessive denoising (He et al., 2017; Helong et al., 2019). To address these weaknesses, we propose an EMD denoising strategy based on the correlation coefficient test criterion to improve portfolio performance.

Noise arises because individual investors have no access to inside information, do not follow buy-and-hold strategies, and tend to select stocks with strong past returns (Black, 1986; Odean, 1999). A result of this concentrated trading is that prices tend to deviate from their fundamental values (Odean, 1999). Black (1986) labels these deviations as "noise". Time series in financial markets are easily contaminated by noise, which may mislead model fitting (Kondor et al., 2007). As a result, portfolio models may provide inaccurate results, and investors who make decisions based on biased results will inevitably suffer losses. To eliminate noise interference, some researchers introduce data decomposition methods, such as the popular wavelet decomposition, into portfolio management. For example, Aloui and Jammazi (2015) and Zhu et al. (2019, 2021) propose different denoising methods to construct portfolio models based on the wavelet decomposition technique; their empirical results indicate that profitability, the Sharpe ratio, and model accuracy improve after the noise is filtered from the original data. Overall, theoretical and empirical studies that investigate portfolio performance from a denoising perspective remain limited.

Besides wavelet decomposition, EMD has also received extensive attention (Huang et al., 1998). Compared with wavelet decomposition, EMD does not require any prior assumptions about signal modes or system orders, and can directly decompose the original data into a finite number of intrinsic mode functions (IMFs) and a trend item. To date, it has shown outstanding advantages in decomposing financial data (Zhu et al., 2017; Yang et al., 2019). In this study, we use EMD instead of wavelet decomposition to construct different denoising strategies.

The key to EMD denoising is how to select the decomposed IMFs. It is generally accepted that different IMFs represent different fluctuation levels (Huang et al., 1998). The high-frequency IMFs are disordered and display minimal regularity; they are mainly caused by factors that have short-term effects, such as bad weather and strikes. Flandrin et al. (2004) consider these high-frequency components as noise and argue that the main information is concentrated in the low-frequency IMFs. Thus, there must be a key index: the IMFs from IMF\(_{index}\) onward are regarded as the dominant modes, and the earlier ones are considered noise. Numerous studies follow this framework to denoise different types of data in engineering, medicine, and other fields (Boudraa and Cexus, 2007; Nguyen and Kim, 2016). However, these denoising methods may not be suitable for financial data, since the optimal denoising strategy depends strongly on the data characteristics, i.e., different types of data have different optimal denoising strategies (Li et al., 2016; Nguyen and Kim, 2016; Zhu et al., 2019, 2021). In practice, the approach may suffer from weaknesses such as inadequate or excessive denoising.

Therefore, a new EMD denoising strategy based on the correlation coefficient test criterion is proposed to improve portfolio performance. In detail, we first prove theoretically that noise can cause the optimal portfolio weights and the effective frontier to deviate from their true positions. Thus, it is necessary to eliminate noise. Next, we apply EMD to decompose the original noisy price and perform a series of correlation coefficient tests to identify which IMFs are noise. If a test does not reject the null hypothesis, the corresponding IMF is considered noise. Otherwise, it is considered a non-noisy component. Finally, we sum the non-noisy components and the residual to construct the denoised price.

In the empirical analysis, the daily closing prices of 3180 trading days ranging from October 8, 2007 to October 30, 2020 are collected to test portfolio performance. Four quantitative indicators, namely the Sharpe ratio, Sortino ratio, upside potential ratio and tracking error ratio, are used to summarize out-of-sample performance. The empirical results show that the proposed denoising method outperforms common EMD, Ensemble EMD (EEMD) and wavelet denoising methods under the mean–variance framework. Besides, the portfolio performance is examined in four different subsamples, including bull and bear markets and two special periods, i.e., the 2007–2008 financial crisis and the coronavirus disease 2019 (COVID-19) pandemic in 2020. The results reconfirm the superiority of the proposed denoising method. A simulation study with different parameter settings validates the above conclusions. Overall, the proposed denoising method reduces noise interference and helps investors improve portfolio performance.

This paper contributes to portfolio management in the following two dimensions. First, we theoretically analyze the impact of noise on the portfolio, and prove that noise causes the optimal portfolio and effective frontier to deviate from their true positions. In this way, the theoretical basis of denoising is argued. Second, we point out the weaknesses of common denoising methods applied to portfolio management and construct an EMD denoising strategy based on the correlation coefficient test criterion, whose portfolio performance significantly outperforms other common denoising methods.

Figure 1 plots the framework of this paper. Section 2 theoretically analyzes the motivation of denoising. Section 3 introduces the proposed EMD denoising method based on the correlation coefficient test criterion. As a comparison, four common EMD denoising methods are also described. Section 4 compares the portfolio performance of different denoising methods under the mean–variance framework with different sample periods. Section 5 further evaluates the robustness of the proposed denoising method through simulated data. The last section concludes the paper.

Fig. 1 The framework of this paper

2 Portfolio Theory Under Noisy Environment

In this section, we decompose the noisy price into non-noisy component and noise, and further construct the mean–variance model under the noisy environment. By comparing the portfolio under non-noisy environment, we explain the impact of noise on portfolio and argue the necessity of denoising.

2.1 The Noisy Portfolio Returns

Due to the asymmetry and incompleteness of information, stock prices are generally noisy (Black, 1986; Odean, 1999). We assume that the price \( x_i(t)\) of stock \(i\,\,(i=1,\ldots ,k)\) at time \(t \,\,(t=1,\ldots ,T)\) is composed of a non-noisy component \(s_i(t)\) and noise \(n_i(t)\):

$$\begin{aligned} x_i(t)= s_i(t) + n_i(t),\,\, t=1,\ldots ,T \end{aligned}$$
(1)

where the noise \(n_i(t)\) and non-noisy component \(s_i(t)\) are uncorrelated, i.e., \(\text {cov}(s_i(t),n_i(t)) = 0\). Then, the return \(r_i(t)\) for stock i can be calculated as

$$\begin{aligned} \begin{array}{ll} r_i(t)&{}=\displaystyle \frac{x_i(t)-x_i(t-1)}{x_i(t-1)} =\displaystyle \frac{s_i(t)-s_i(t-1)+n_i(t)-n_i(t-1)}{x_i(t-1)} \\ &{}=\displaystyle \frac{s_i(t)-s_i(t-1)}{s_i(t-1)}\frac{s_i(t-1)}{x_i(t-1)}+ \frac{n_i(t)-n_i(t-1)}{n_i(t-1)}\frac{n_i(t-1)}{x_i(t-1)} \\ &{}=r_{i,s}(t)\displaystyle \frac{s_{i}(t-1)}{x_i(t-1)}+r_{i,n}(t)\frac{x_i(t-1)-s_i(t-1)}{x_i(t-1)} \\ &{}=\alpha _i(t-1) r_{i,s}(t)+(1-\alpha _i(t-1)) r_{i,n}(t) \end{array} \end{aligned}$$
(2)

where \(r_{i,s}(t)=(s_i(t)-s_i(t-1))/{s_i(t-1)}\) is the return of the non-noisy component. Similarly, \(r_{i,n}(t)=(n_i(t)-n_i(t-1))/{n_i(t-1)}\) is the return of the noise. \(\alpha _{i}(t-1)=s_{i}(t-1)/{x_{i}(t-1)}\) denotes the share of the non-noisy component in \(x_i(t-1)\). For reading convenience, the variables \(r_i(t), r_{i,s}(t), r_{i,n}(t), x_i(t),s _i(t), n_i(t)\) and \(\alpha _i(t-1)\) are denoted by \(r_{i},r_{i, s},r_{i, n},x_{i},s_{i},n_{i}\) and \(\alpha _{i}\), respectively. Furthermore, the noisy returns \({\varvec{r}}=(r_1,\ldots ,r_k)^{\tau }\) can be expressed as

$$\begin{aligned} \begin{array}{ll} {\varvec{r}} &{}={\varvec{\alpha \odot r_{s}+(1-\alpha ) \odot r_{n} }}\\ &{}={\varvec{R_{s}+R_{n}}} \end{array}\end{aligned}$$
(3)

where \((r_1,\ldots , r_k)^{\tau }\) denotes the transpose of \((r_1,\ldots ,r_k)\) and \({\varvec{\odot }}\) denotes the Hadamard product (Johnson, 1990). \({\varvec{r_s}}=(r_{1,s}, \ldots , r_{k,s})^{\tau }\) and \({\varvec{r_n}}=(r_{1,n},\ldots ,r_{k,n})^{\tau }\) represent the non-noisy and noise returns, and their shares in the noisy returns are \({\varvec{\alpha }}=(\alpha _1, \ldots , \alpha _k)^{\tau }\) and \({\mathbf {1}}-\varvec{\alpha }=(1-\alpha _1, \ldots , 1-\alpha _k)^{\tau }\), respectively. Besides, we let \({\varvec{R_{s}}}=(R_{1,s}, \ldots , R_{k,s})^{\tau }\) and \({\varvec{R_{n}}}=(R_{1,n}, \ldots , R_{k,n})^{\tau }\) denote \({\varvec{\alpha \odot r_s}}\) and \({\varvec{(1-\alpha )\odot r_n}}\), where \( R_{i, s}=\alpha _{i} r_{i, s}=r_{i, s}s_{i}/{x_{i}} \) and \(R_{i,n}=(1-\alpha _i) r_{i,n}=r_{i,n}n_i/x_i\).
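The decomposition in Eqs. (2) and (3) can be checked numerically. The following is a minimal sketch on simulated data, not the empirical data of this paper: the series `s` and `n` and all parameter values are hypothetical, and the noise is kept strictly positive only so that the noise return \(r_{i,n}\) is well defined.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200

# Hypothetical series (assumptions for illustration only).
s = 10.0 + np.cumsum(rng.normal(0.0, 0.05, T))  # non-noisy component s_i(t)
n = 0.5 + 0.1 * rng.random(T)                   # noise n_i(t), kept away from zero
x = s + n                                       # noisy price, Eq. (1)

r = x[1:] / x[:-1] - 1.0        # noisy return r_i(t)
r_s = s[1:] / s[:-1] - 1.0      # return of the non-noisy component
r_n = n[1:] / n[:-1] - 1.0      # return of the noise
alpha = s[:-1] / x[:-1]         # share alpha_i(t-1) of the non-noisy component

R_s = alpha * r_s               # element-wise (Hadamard) products, Eq. (3)
R_n = (1.0 - alpha) * r_n

# Largest deviation between r and R_s + R_n; it should be at rounding level.
max_gap = float(np.max(np.abs(r - (R_s + R_n))))
print(max_gap)
```

The identity holds term by term, so the gap is driven by floating-point rounding only.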

The price \(x_{i}\) is generally bounded, i.e., \(M_1\le x_{i}\le M_2\), where \(M_1\) and \(M_2\) are constants. Besides, it is deduced that \(\text {cov}(r_{i,s},r_{i,n})= \text {cov}(s_ir_{i,s}, n_ir_{i,n})=0\) based on \(\text {cov}(s_i,n_i)=0\). Finally, treating \(1/x_{i}\) as a coefficient term, the covariance \(\text {cov}(R_{i,s}, R_{i,n})\) satisfies the following inequality.

$$\begin{aligned} 0= \frac{1}{M_2^2}\text {cov}(s_{i} r_{i, s}, n_{i} r_{i, n}) \le \text {cov}(R_{i, s}, R_{i, n})=\text {cov}\left( \frac{s_{i}}{x_{i}} r_{i, s}, \frac{n_{i}}{x_{i}} r_{i, n}\right) \! \le \!\frac{1}{M_1^2}\text {cov}(s_{i} r_{i, s}, n_{i} r_{i, n})=0 \end{aligned}$$
(4)

Equation (4) shows that \(\text {cov}(R_{i,s},R_{i,n})=0\), which means that the return \(r_i\) is composed of two uncorrelated parts, the non-noisy component \(R_{i,s}\) and the noise \(R_{i,n}\). Besides, we can deduce that \(\text {cov}(R_{i,s},R_{j,n})=0,\ i\ne j\). In this way, the portfolio return \(r_{p}\) is

$$\begin{aligned} r_{p}={\varvec{w^{\tau }r}}={\varvec{w^{\tau }(R_s+R_n)}} \end{aligned}$$
(5)

where \({{\varvec{w}}}=(w_1,\ldots ,w_k)^{\tau }\) are the portfolio weights, and \(\text {cov}{\varvec{(R_s,R_n)=0}}\). Furthermore, the expectation and variance of the portfolio return \(r_p\) are

$$\begin{aligned} \begin{array}{rl} {\mathbb {E}}(r_{p}) &{}={\varvec{w^{\tau }(\mu _{s}+\mu _{n})}} \\ var(r_{p}) &{}={\varvec{w^{\tau }\Sigma _{s}w}}+{\varvec{w^{\tau }\Sigma _nw}} \end{array}\end{aligned}$$
(6)

where \({\varvec{\mu _s}}\) and \({\varvec{\mu _n}}\) denote the expectations of non-noisy component \({\varvec{R}}_{s}\) and noise \({\varvec{R}}_{n}\). Similarly, \({\varvec{\Sigma _s}}\) and \({\varvec{\Sigma _n}}\) denote the covariance matrices of \({\varvec{R_s}}\) and \({\varvec{R_n}}\), respectively.

2.2 Mean–Variance Model Under Noisy Environment

Following Markowitz’s portfolio optimization framework (Markowitz, 1952), the classical mean–variance portfolio model, which minimizes the portfolio variance for a given expected return \({\mathbb {E}} (r_p) = \mu _0\), can be expressed as

$$\begin{aligned} \begin{array}{ll} {\varvec{w}}(\mu _0)=\text {argmin} &{}{\varvec{w^{\tau } { \Sigma _s} w+w^{\tau } {\Sigma _n} w}}\\ \qquad \quad \quad \text {s.t.}&{} {\varvec{w^{\tau }(\mu _s+\mu _n)}} = \mu _0 \end{array} \end{aligned}$$
(7)

For computational convenience, we assume that an investor’s wealth may be partially allocated to a risk-free security and that short sales are allowed, so the restriction \({\varvec{w^{\tau }1}}=1\) is not included in Eq. (7). Using the Lagrange multiplier method, the optimal solution is obtained from the first-order conditions of the Lagrange function \(L({{\varvec{w}}},\lambda )\),

$$\begin{aligned} L({{\varvec{w}}},\lambda )={\varvec{ w^{\tau } {\Sigma _s} w+w^{\tau } {\Sigma _n} w}}-\lambda \left[ {\varvec{ w^{\tau }(\mu _s+\mu _n)}}- \mu _0\right] \end{aligned}$$
(8)

where \({\varvec{w}}\) is the optimal solution of Eq. (7) when the Lagrange function \(L({{\varvec{w}}},\lambda )\) satisfies

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \frac{{\partial L}}{{\partial {{\varvec{w}}}}} =2{\varvec{(\Sigma _s+\Sigma _n)w}}-\lambda {\varvec{(\mu _s+\mu _n)=0}}\\[3mm] \displaystyle \frac{{\partial L}}{{\partial \lambda }} ={\varvec{ w^{\tau }{\varvec{(\mu _s+\mu _n)}}}}- \mu _0 = 0 \end{array} \right. \end{aligned}$$
(9)

Then under the noisy environment, the optimal mean–variance portfolio weight vector \({\varvec{ w_{noise}^*}}\) is computed as

$$\begin{aligned} {\varvec{w_\mathrm{{noise}}^{*}}}=\mu _0{\varvec{\frac{(\Sigma _s+\Sigma _n)^{-1} (\mu _s+\mu _n)}{(\mu _s+\mu _n)^{\tau } (\Sigma _s+\Sigma _n)^{-1} (\mu _s+\mu _n)}}} \end{aligned}$$
(10)

Similarly, the optimal portfolio weight vector \({\varvec{ w_\mathrm{{nonnoise}}^*}}\) under the noise-free environment is calculated as follows:

$$\begin{aligned} {\varvec{ w_\mathrm{{nonnoise}}^{*}}}=\mu _0{\varvec{\frac{(\Sigma _s)^{-1} \mu _s}{\mu _s^{\tau } (\Sigma _s)^{-1} \mu _s}}} \end{aligned}$$
(11)

Equations (10) and (11) show that noise affects the portfolio weights not only through the covariance matrix but also through the expected return, which confirms that noise is an important factor affecting portfolio performance. In practice, what investors need is the portfolio weight \({\varvec{ w_\mathrm{{nonnoise}}^*}}\) under the non-noisy environment. However, due to the existence of noise, the portfolio weight they actually obtain is \({\varvec{ w_\mathrm{{noise}}^*}}\). As a result, it is difficult for investors to achieve effective diversification, and it is necessary to use appropriate denoising strategies to suppress the noise interference.

When focusing on noise, a common assumption in practice is that the mean of noise is 0, i.e., \({\varvec{\mu _n=0}}\) (Donoho and Johnstone, 1994). In this case, the optimal portfolio weight \({\varvec{w_{noise}^{\dag }}}\) under noisy environment is

$$\begin{aligned} {\varvec{w_{\mathrm {noise}}^{\dag }}}=\mu _0{\varvec{\frac{(\Sigma _s+\Sigma _n)^{-1} \mu _s}{\mu _s^{\tau } (\Sigma _s+\Sigma _n)^{-1} \mu _s}}} \end{aligned}$$
(12)

It is clear that noise affects portfolio performance only through the covariance matrix, which confirms the validity of previous studies to filter the covariance matrix (Daly et al., 2008; Tian and Zhao, 2020). However, when the assumption \({\varvec{\mu _n=0}}\) is not satisfied, only filtering the covariance matrix is not sufficient.
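To make the difference between Eqs. (11) and (12) concrete, the closed-form weights can be evaluated directly. The sketch below assumes \({\varvec{\mu _n=0}}\); the function name `mv_weights` and all numerical inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def mv_weights(mu, sigma, mu0):
    """Closed-form weights mu0 * Sigma^{-1} mu / (mu' Sigma^{-1} mu), as in Eqs. (10)-(12)."""
    z = np.linalg.solve(sigma, mu)   # Sigma^{-1} mu without forming the inverse explicitly
    return mu0 * z / (mu @ z)

# Hypothetical inputs (three assets); both matrices are positive definite.
mu_s = np.array([0.010, 0.006, 0.008])
sigma_s = np.array([[0.040, 0.006, 0.004],
                    [0.006, 0.025, 0.005],
                    [0.004, 0.005, 0.030]])
sigma_n = np.diag([0.010, 0.020, 0.005])
mu0 = 0.005

w_nonnoise = mv_weights(mu_s, sigma_s, mu0)           # Eq. (11)
w_noise = mv_weights(mu_s, sigma_s + sigma_n, mu0)    # Eq. (12), with mu_n = 0

print(w_nonnoise)
print(w_noise)
```

Both weight vectors meet the same target return \(\mu _0\), yet they differ, which is the deviation caused by the noise covariance \({\varvec{\Sigma _n}}\).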

2.3 Mean–Variance Effective Frontier

When analyzing the interference of noise on the portfolio variance, since the mean of returns is close to 0 in practice, we consider a simple scenario in which the assumption \({\varvec{\mu _n=0}}\) is satisfied. Substituting Eq. (12) into Eq. (6), the portfolio variance under the noisy environment is calculated as

$$\begin{aligned} \begin{array}{ll} \sigma ^2_{\mathrm {noise}}={\varvec{(w_{\mathrm {noise}}^{\dag })^{\tau }(\Sigma _s+ \Sigma _n) w_{\mathrm {noise}}^{\dag }}} =\displaystyle \frac{\mu _0^2}{{\varvec{\mu _s^{\tau } (\Sigma _s+\Sigma _n)^{-1} \mu _s}}} \end{array}\end{aligned}$$
(13)

If the portfolio variance \(\sigma ^2_{\mathrm {noise}}\) and the expected return \(\mu _0\) are taken as the axes, the mean–variance effective frontier is a parabola that opens to the right and passes through the origin. This shape results from the constraints imposed on the mean–variance model, such as \({\varvec{\mu _n=0}}\). Similarly, the portfolio variance under the non-noisy environment is computed as

$$\begin{aligned} \begin{array}{ll} \sigma ^2_\mathrm{{nonnoise}}={\varvec{(w_\mathrm{{nonnoise}}^{*})^{\tau }\Sigma _s w_\mathrm{{nonnoise}}^{*}}}= \displaystyle \frac{\mu _0^2}{{\varvec{\mu _s^{\tau } \Sigma _s^{-1} \mu _s}}} \end{array}\end{aligned}$$
(14)

Comparing Eqs. (13) and (14) shows that noise causes the portfolio variance to deviate from its true position, which is consistent with the results for the optimal portfolio weights. Besides, the relative magnitude of the portfolio variances under the noisy and non-noisy environments can be obtained from the following equation.

$$\begin{aligned} \begin{array}{ll} \displaystyle \frac{\mu _0^2}{\sigma ^2_{\mathrm {noise}}}-\frac{\mu _0^2}{\sigma ^2_\mathrm{{nonnoise}}}&{}={\varvec{\mu _s^{\tau } (\Sigma _s+\Sigma _n)^{-1} \mu _s}}-{\varvec{\mu _s^{\tau } \Sigma _s^{-1} \mu _s}}\\ [-1mm] &{}=|{\varvec{\mu _s^{\tau } (\Sigma _s+\Sigma _n)^{-1} \mu _s}}|-{\varvec{|\mu _s^{\tau } \Sigma _s^{-1} \mu _s|}}\\ &{}={\varvec{| (\Sigma _s+\Sigma _n)^{-1}|\cdot | \mu _s^{\tau }\mu _s|}}-{\varvec{|\Sigma _s^{-1}|\cdot |\mu _s^{\tau } \mu _s|}}\\ &{}={\varvec{[\,| (\Sigma _s+\Sigma _n)^{-1}|- |\Sigma _s^{-1}|\,]\cdot |\mu _s^{\tau } \mu _s|}} \end{array}\end{aligned}$$
(15)

where \({\varvec{|\mu _s^{\tau } \mu _s|}}\ge 0\) and the matrices \({\varvec{\Sigma _s}}\), \({\varvec{\Sigma _n}}\) and \({\varvec{\Sigma _s+\Sigma _n}}\) are positive definite. Since the inverse of a positive definite matrix is positive definite, the inverse matrices \({\varvec{\Sigma _s^{-1}}}\), \({\varvec{\Sigma _n^{-1}}}\) and \({\varvec{(\Sigma _s+\Sigma _n)^{-1}}}\) are also positive definite. Besides, it can be deduced that \({\varvec{|\Sigma _s+\Sigma _n|\ge |\Sigma _s|}}\) (see Footnote 1) and \({\varvec{|(\Sigma _s+\Sigma _n)^{-1}|\le |\Sigma _s^{-1}|}}\) (see Footnote 2). In this way, we obtain the following inequality.

$$\begin{aligned} \begin{array}{ll} \displaystyle \frac{\mu _0^2}{\sigma ^2_{\mathrm {noise}}}\le \frac{\mu _0^2}{\sigma ^2_\mathrm{{nonnoise}}}\Longleftrightarrow \sigma ^2_{\mathrm {noise}}\ge \sigma ^2_\mathrm{{nonnoise}} \end{array}\end{aligned}$$
(16)

Equation (16) implies that noise increases the portfolio variance and shifts the mean–variance effective frontier to the right. Therefore, denoising is equivalent to moving from a noisy environment to a non-noisy environment. As a consequence, the effective frontier shifts to the left compared with that obtained from the original price, and the higher the denoising degree is, the farther the frontier shifts to the left. Figure 2 summarizes the mean–variance effective frontier for different scenarios.
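The inequality in Eq. (16) can also be illustrated numerically. One standard way to see it, offered here as a supplementary remark rather than as the argument of this paper, is that \({\varvec{\Sigma _s+\Sigma _n}}\) exceeds \({\varvec{\Sigma _s}}\) by a positive definite matrix, so the quadratic form of its inverse is no larger than that of \({\varvec{\Sigma _s^{-1}}}\). The sketch below uses randomly generated, hypothetical covariance matrices and the helper name `frontier_variance`, which is an assumption.

```python
import numpy as np

def frontier_variance(mu, sigma, mu0):
    """Minimum variance mu0^2 / (mu' Sigma^{-1} mu), as in Eqs. (13) and (14)."""
    return mu0 ** 2 / (mu @ np.linalg.solve(sigma, mu))

rng = np.random.default_rng(1)
k = 5
a = rng.normal(size=(k, k))
b = rng.normal(size=(k, k))
sigma_s = a @ a.T + 0.1 * np.eye(k)   # hypothetical positive definite Sigma_s
sigma_n = b @ b.T + 0.1 * np.eye(k)   # hypothetical positive definite Sigma_n
mu_s = rng.normal(size=k)             # hypothetical expected returns

targets = np.linspace(0.0, 0.02, 5)   # target returns mu_0 along the frontier
var_nonnoise = np.array([frontier_variance(mu_s, sigma_s, m) for m in targets])
var_noise = np.array([frontier_variance(mu_s, sigma_s + sigma_n, m) for m in targets])

for m, v0, v1 in zip(targets, var_nonnoise, var_noise):
    print(f"mu0={m:.3f}  nonnoise={v0:.3e}  noise={v1:.3e}")
```

For every target return the noisy variance is at least as large as the non-noisy one, i.e., the noisy frontier lies to the right.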

Fig. 2 Mean–variance effective frontier

2.4 Measures of Portfolio Performance

In practice, investors are more concerned about the return they can achieve under a certain level of risk tolerance (Moura et al., 2020). Thus, four common quantitative indicators are considered to evaluate portfolio performance: the Sharpe ratio, Sortino ratio, upside potential ratio, and tracking error ratio. The higher these indicators are, the better the portfolio performance is.

The Sharpe ratio, abbreviated SR, is the most common indicator adopted by investors to measure risk-adjusted portfolio return.

$$\begin{aligned} SR= \frac{{\mathbb {E}}(r_p)}{\sqrt{{var}(r_p)} } \end{aligned}$$
(17)

Due to potential drawbacks of Sharpe ratio in evaluating portfolio performance, we apply the Sortino ratio, abbreviated SoR, to take account of the asymmetric pattern of financial volatility which cannot be captured via Sharpe ratio (Sortino and Van Der Meer, 1991).

$$\begin{aligned} SoR=\frac{{\mathbb {E}}(r_p)}{\sqrt{{\mathbb {E}}\left[ (\min (r_p, 0))^{2}\right] }} \end{aligned}$$
(18)

Additionally, as described by Sortino et al. (1999), we take into account the upside potential return, and use the upside potential ratio, abbreviated UPR, to study the information in the higher moment.

$$\begin{aligned} UPR=\frac{ {\mathbb {E}}(\max (r_p, 0))}{\sqrt{{\mathbb {E}}\left[ (\min (r_p, 0))^2\right] }} \end{aligned}$$
(19)

Also, in order to quantify the differences between competing portfolio strategies, the tracking error ratio, abbreviated TR, is used to evaluate the error-tracking ability (Berger and Czudaj, 2020).

$$\begin{aligned} TR = \frac{{\mathbb {E}}(r_p - r_b)}{\sqrt{var(r_p - r_b)}} \end{aligned}$$
(20)

where \(r_b\) denotes the return of the portfolio based on the original unfiltered returns, which is defined as the benchmark. TR relates the mean difference between the evaluated portfolio return and the benchmark return to the standard deviation of that difference. Thus, a higher TR denotes better performance relative to the benchmark.
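The four indicators in Eqs. (17) to (20) can be computed from a series of realized returns as sketched below. This is an illustrative implementation under stated assumptions: expectations are replaced by sample means, the sample standard deviation uses `ddof=1`, the downside term is read as \({\mathbb {E}}[(\min (r_p,0))^2]\), and the function names and the toy return series are hypothetical.

```python
import numpy as np

def sharpe_ratio(r_p):
    """SR, Eq. (17): mean return over sample standard deviation."""
    return np.mean(r_p) / np.std(r_p, ddof=1)

def downside_deviation(r_p):
    """Square root of the mean squared negative part of the returns."""
    return np.sqrt(np.mean(np.minimum(r_p, 0.0) ** 2))

def sortino_ratio(r_p):
    """SoR, Eq. (18)."""
    return np.mean(r_p) / downside_deviation(r_p)

def upside_potential_ratio(r_p):
    """UPR, Eq. (19): mean positive part over downside deviation."""
    return np.mean(np.maximum(r_p, 0.0)) / downside_deviation(r_p)

def tracking_ratio(r_p, r_b):
    """TR, Eq. (20): mean excess return over the benchmark, scaled by its volatility."""
    d = np.asarray(r_p) - np.asarray(r_b)
    return np.mean(d) / np.std(d, ddof=1)

# Hypothetical out-of-sample returns of a denoised portfolio and the benchmark.
r_p = np.array([0.02, -0.01, 0.03, -0.02, 0.01])
r_b = np.array([0.01, -0.02, 0.02, -0.01, 0.00])

sr = sharpe_ratio(r_p)
sor = sortino_ratio(r_p)
upr = upside_potential_ratio(r_p)
tr = tracking_ratio(r_p, r_b)
print(sr, sor, upr, tr)
```

For the toy series the downside deviation is 0.01, so SoR is about 0.6 and UPR about 1.2.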

3 EMD Denoising Methodology

Section 2 points out that noise is an important factor affecting portfolio performance. Taking a step forward, we construct a new EMD denoising method to improve portfolio performance. We prefer EMD because, compared with traditional denoising methods such as wavelet denoising, it is adaptive and does not require prior assumptions about the signal pattern or system order, such as the basis function or the decomposition level. These choices strongly affect the denoising results, and selecting the right parameters is a difficult task for investors. Besides, EMD shows better properties in dealing with nonlinear and non-stationary data (Huang et al., 1998), and has been widely applied to decompose financial data (Zhu et al., 2017; Yang et al., 2019). To illustrate the superiority of the proposed denoising method, we thoroughly compare several common denoising methods and test the portfolio performance under the mean–variance framework.

3.1 Empirical Mode Decomposition

The EMD proposed by Huang et al. (1998) decomposes the original noisy price x(t) into a series of IMFs, each of which needs to satisfy the following two conditions: (1) The number of extrema and the number of zero crossings must be equal or differ by at most one over the whole time series. (2) The mean value of the envelopes defined by the local maxima and the local minima is zero at any point. With this definition, the noisy price x(t) can be decomposed according to Table 1:

Table 1 EMD algorithm

Using the sifting procedure, the price x(t) can be expressed as the sum of IMFs and a residual,

$$\begin{aligned} {x(t)}=\sum _{j=1}^{C} I\!M\!F_{j}(t)+v(t) \end{aligned}$$
(21)

where v(t) is the residual, C is the number of IMFs.
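The sifting idea behind Eq. (21) can be sketched in a few lines. The code below is a simplified illustration and not necessarily the algorithm of Table 1: it interpolates the envelopes linearly with `np.interp` and pins them to the end points, whereas standard EMD implementations typically use cubic splines. The stopping rule, the function names and the test signal are assumptions.

```python
import numpy as np

def _local_extrema(h):
    """Indices of strict interior local maxima and minima of h."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def _envelope_mean(h, maxima, minima):
    """Mean of the upper and lower envelopes (linear interpolation, end points pinned)."""
    t = np.arange(len(h))
    last = len(h) - 1
    upper = np.interp(t, np.r_[0, maxima, last], np.r_[h[0], h[maxima], h[-1]])
    lower = np.interp(t, np.r_[0, minima, last], np.r_[h[0], h[minima], h[-1]])
    return 0.5 * (upper + lower)

def emd(x, max_imfs=10, max_sift=50, tol=1e-3):
    """Return (imfs, residual) such that x = imfs.sum(axis=0) + residual."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _local_extrema(residual)
        if len(maxima) + len(minima) < 3:       # too few extrema: treat as the trend
            break
        h = residual.copy()
        for _ in range(max_sift):
            maxima, minima = _local_extrema(h)
            if len(maxima) == 0 or len(minima) == 0:
                break
            m = _envelope_mean(h, maxima, minima)
            h = h - m                           # subtract the local mean
            if np.sum(m ** 2) <= tol * np.sum(h ** 2):
                break                           # envelope mean is negligible
        imfs.append(h)
        residual = residual - h
    return np.array(imfs), residual

# Hypothetical test signal: trend plus a slow and a fast oscillation.
t = np.linspace(0.0, 1.0, 500)
x = 10.0 + 2.0 * t + 0.5 * np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
imfs, v = emd(x)
print(imfs.shape, float(np.max(np.abs(imfs.sum(axis=0) + v - x))))
```

By construction the components add back to the input, which is the property expressed by Eq. (21); the quality of the individual modes depends on the envelope interpolation.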

3.2 Common EMD Denoising Methods

EMD decomposes the noisy data into several IMFs with frequencies ranging from high to low, representing periodic changes from highly time-variant to long-period behavior. Different IMFs represent different fluctuation levels of the noisy data. Generally, the high-frequency IMFs are disordered and display minimal regularity; they are mainly caused by factors that have short-term effects, such as bad weather and strikes. Flandrin et al. (2004) consider these high-frequency IMFs as noise and argue that the main information is concentrated in the low-frequency IMFs. Thus, there must be a key index: the IMFs from IMF\(_{index}\) onward are considered the dominant modes, and the earlier ones are considered noise. In this way, the denoised price \({\hat{s}}(t)\) can be expressed as

$$\begin{aligned} {\hat{s}}(t)=\sum _{j=index}^{C} I\!M\!F_{j}(t)+v(t) \end{aligned}$$
(22)

In practice, numerous studies follow the framework to construct denoising strategies in engineering and medical fields, etc (Boudraa and Cexus 2007; Nguyen and Kim, 2016). Following the previous approaches, four common criteria are considered to determine the index.

Criterion 1: As argued by Boudraa and Cexus (2007), An et al. (2013), and Chen et al. (2021), minimizing the mean square error (MSE) between s(t) and an approximation \({\hat{s}}_i(t)\) is a common selection criterion. The MSE is defined as

$$\begin{aligned} M\!S\!E(s(t),{\hat{s}}_i(t))=\frac{1}{T} \sum _{t=1}^T\left( s(t)-{\hat{s}}_i(t)\right) ^{2} \end{aligned}$$
(23)

where \({\hat{s}}_i(t)=\sum _{j=i}^{C} I\!M\!F_{j}(t)+v(t)\), C is the number of IMFs. However, the MSE cannot be calculated directly because s(t) is unknown. The consecutive MSE (CMSE) does not require any knowledge of s(t), which is

$$\begin{aligned} \begin{array}{ll} C\!M\!S\!E({\hat{s}}_i(t), {\hat{s}}_{i+1}(t)) &{}= \displaystyle \frac{1}{T} \sum _{t=1}^{T}\left( {\hat{s}}_i(t)-{\hat{s}}_{i+1}(t)\right) ^{2}, \quad i=1, \ldots , C-1 \\[4mm] &{}= \displaystyle \frac{1}{T} \sum _{t=1}^{T}\left( I\!M\!F_{i}(t)\right) ^{2} \end{array} \end{aligned}$$
(24)

Finally, the index is given by

$$\begin{aligned} index=\underset{1\le i\le C-1}{argmin}\,\,C\!M\!S\!E({\hat{s}}_i(t), {\hat{s}}_{i+1}(t)) \end{aligned}$$
(25)
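Since the CMSE in Eq. (24) reduces to the energy of IMF\(_i\), the index of Eq. (25) can be found from the IMF energies alone. The sketch below is illustrative: the function names and the toy IMF matrix are hypothetical, and rows of `imfs` are assumed to be ordered from IMF\(_1\) to IMF\(_C\).

```python
import numpy as np

def imf_energy(imfs):
    """Energy (1/T) * sum_t IMF_j(t)^2 of each row j."""
    return np.mean(np.asarray(imfs, dtype=float) ** 2, axis=1)

def cmse_index(imfs):
    """1-based index i in {1, ..., C-1} that minimizes the CMSE, Eqs. (24)-(25)."""
    e = imf_energy(imfs)
    return int(np.argmin(e[:-1])) + 1    # the last IMF is outside the search range

def partial_reconstruction(imfs, residual, index):
    """Denoised series of Eq. (22): IMF_index + ... + IMF_C + residual."""
    return np.asarray(imfs, dtype=float)[index - 1:].sum(axis=0) + residual

# Hypothetical IMFs (rows) and residual, for illustration only.
imfs = np.array([[1.0, -1.0, 1.0, -1.0],
                 [0.1, -0.1, 0.1, -0.1],
                 [2.0, 2.0, -2.0, -2.0],
                 [0.01, 0.01, 0.01, 0.01]])
residual = np.array([5.0, 5.0, 5.0, 5.0])

index = cmse_index(imfs)
s_hat = partial_reconstruction(imfs, residual, index)
print(index, s_hat)
```

In this toy case the energies are 1, 0.01, 4 and 0.0001, and the search over \(i=1,\ldots ,C-1\) selects the second IMF.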

Criterion 2: The change-point method proposed by Kokoszka and Leipus (1998) is a popular technique for identifying turning points. Instead of minimizing CMSE, we apply the change-point technique to find the index.

$$\begin{aligned} R(i)=\frac{i(C-i)}{C^{2}}\left( \frac{1}{i} \sum _{j=1}^{i} e_j-\frac{1}{(C-i)} \sum _{j=i+1}^{C} e_j\right) ,\quad i=1, \ldots , C-1 \end{aligned}$$
(26)

where \(e_j=\displaystyle \frac{1}{T} \sum _{t=1}^{T}\mathrm {IMF}^{2}_{j} (t)\). Finally, the index is given by

$$\begin{aligned} index=\underset{1\le i\le C-1}{argmax}\,\,|R(i)| \end{aligned}$$
(27)
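The change-point statistic of Eqs. (26) and (27) depends only on the IMF energies \(e_j\). A minimal sketch follows; the function name `change_point_index` and the toy energy vector are hypothetical.

```python
import numpy as np

def change_point_index(e):
    """Return (index, R) with index = argmax_i |R(i)|, i = 1, ..., C-1, Eqs. (26)-(27)."""
    e = np.asarray(e, dtype=float)
    C = len(e)
    i = np.arange(1, C)                          # candidate split points 1..C-1
    csum = np.cumsum(e)
    left_mean = csum[:-1] / i                    # mean energy of IMF_1..IMF_i
    right_mean = (csum[-1] - csum[:-1]) / (C - i)  # mean energy of IMF_{i+1}..IMF_C
    R = i * (C - i) / C ** 2 * (left_mean - right_mean)
    return int(np.argmax(np.abs(R))) + 1, R

# Hypothetical energies with a clear level shift after the third IMF.
e = [4.0, 4.0, 4.0, 1.0, 1.0, 1.0]
index, R = change_point_index(e)
print(index, R)
```

The weighting factor \(i(C-i)/C^2\) favors splits near the middle, and in this example the statistic peaks at the true level shift.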

Criterion 3: Komaty et al. (2013) and Nguyen and Kim (2016) suggest that the probability density function (PDF) of an IMF contains its complete information, so a PDF similarity measure can be used to identify the non-noisy modes.

$$\begin{aligned} P\!D\!F_{similarity} (i)=dist(P\!D\!F_{x(t)},P\!D\!F_{\mathrm{{IMF}}_{i}(t)}) \end{aligned}$$
(28)

where dist() is a distance metric used to compute the similarity.

Komaty et al. (2013) show that the similarity measures can be classified into two categories: (1) information-theoretic measures, such as the Kullback–Leibler divergence (KLD), and (2) distance measures between two PDFs, such as the Euclidean distance (ED). Therefore, we construct Criteria 3 and 4 based on these two metrics.

The KLD, which relies primarily on Shannon’s concept of probabilistic uncertainty, has been the most frequently used information-theoretic distance measure (Nguyen and Kim, 2016).

$$\begin{aligned} dist_{K\!L\!D}(P,Q)=\int _{-\infty }^{+\infty }P(u) \log \frac{P(u)}{Q(u)} d u \end{aligned}$$
(29)

where P and Q are PDFs. To eliminate the interference of asymmetric factors, we apply the symmetric version of KLD, which is

$$\begin{aligned} dist(P,Q)=\frac{dist_{K\!L\!D}(P,Q)+dist_{K\!L\!D}(Q,P)}{2} \end{aligned}$$
(30)

The index is given by

$$\begin{aligned} index=\underset{1\le i\le C-1}{argmax}\,\, P\!D\!F_{similarity}(i) \end{aligned}$$
(31)

Criterion 4: Euclidean distance is also a common method to measure PDF similarity (Komaty et al., 2013; Nguyen et al., 2015; Hao et al., 2017). Instead of KLD, criterion 4 applies the Euclidean distance to identify the relevant IMFs, which is

$$\begin{aligned} dist(P,Q)=\Vert P-Q\Vert _{2}=\left( \int _{-\infty }^{+\infty }(P(u)-Q(u))^{2} du\right) ^{\frac{1}{2}} \end{aligned}$$
(32)
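The two distances in Eqs. (30) and (32) require estimates of the PDFs. The sketch below uses histogram estimates on a shared grid, which is an assumption made for simplicity (kernel density estimates are another common choice); the number of bins, the smoothing constant `eps`, the function names and the simulated samples are all hypothetical.

```python
import numpy as np

def histogram_pdfs(a, b, bins=50):
    """Histogram density estimates of a and b on a shared grid; returns (p, q, bin_width)."""
    lo = min(np.min(a), np.min(b))
    hi = max(np.max(a), np.max(b))
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges, density=True)
    q, _ = np.histogram(b, bins=edges, density=True)
    return p, q, edges[1] - edges[0]

def symmetric_kld(p, q, width, eps=1e-12):
    """Discretized symmetric KLD, Eq. (30); eps guards against log(0) in empty bins."""
    p = p + eps
    q = q + eps
    kl_pq = np.sum(p * np.log(p / q)) * width
    kl_qp = np.sum(q * np.log(q / p)) * width
    return 0.5 * (kl_pq + kl_qp)

def euclidean_distance(p, q, width):
    """Discretized Euclidean distance between two PDFs, Eq. (32)."""
    return np.sqrt(np.sum((p - q) ** 2) * width)

# Hypothetical samples: a and b share a distribution, c is shifted.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 2000)
b = rng.normal(0.0, 1.0, 2000)
c = rng.normal(3.0, 1.0, 2000)

kld_ab = symmetric_kld(*histogram_pdfs(a, b))
kld_ac = symmetric_kld(*histogram_pdfs(a, c))
ed_ab = euclidean_distance(*histogram_pdfs(a, b))
ed_ac = euclidean_distance(*histogram_pdfs(a, c))
print(kld_ab, kld_ac, ed_ab, ed_ac)
```

In an application, `a` would be the noisy price x(t) and `b` an IMF, so that Eq. (28) is evaluated once per IMF. The KLD values are sensitive to `eps` when bins are empty, so they should be compared only under a fixed setting.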

3.3 The Proposed Denoising Method

Although the common EMD denoising methods mentioned in Sect. 3.2 have achieved great success in signal analysis (Komaty et al., 2013; Hao et al., 2017), engineering (Nguyen and Kim, 2016), and other fields, these denoising methods may not be suitable for financial data, since the optimal denoising strategy depends strongly on the data characteristics, i.e., different types of data have different optimal denoising strategies (Li et al., 2016; Nguyen and Kim, 2016; Zhu et al., 2019, 2021). In practice, these approaches may suffer from weaknesses such as inadequate or excessive denoising (Helong et al., 2019). To better adapt to financial data and improve investors’ portfolio return, we propose a new EMD denoising method based on the correlation coefficient test criterion, which is described as follows.

The noise n(t) and the non-noisy component s(t) are assumed to be uncorrelated, i.e., \(corr(s(t),n(t))=0\). Then, we can obtain

$$\begin{aligned} \begin{array}{rll} cov (x(t),n(t)) &{}= cov (s(t),n(t)) + cov(n(t),n(t)) &{}= \sigma _n^2 \\ cov (x(t),s(t)) &{}= cov(s(t),s(t)) + cov (s(t),n(t)) &{}= \sigma _s^2 \end{array}\end{aligned}$$
(33)

where \(\sigma _s^2\) and \(\sigma _n^2\) are the variances of the non-noisy component s(t) and the noise n(t), respectively. When denoising price series in the stock market, \(\sigma _s^2\) is generally very large, while \(\sigma _n^2\) is relatively small (Li et al., 2016). Therefore, we can judge which IMFs are noise based on their covariances with the noisy price x(t). However, the range of the covariance is not fixed, whereas the correlation coefficient ranges from \(-\,1\) to 1. Thus, it is better to replace the covariance with the correlation coefficient. The correlation coefficients of the noise n(t) and the non-noisy component s(t) with the noisy price x(t) are

$$\begin{aligned} \begin{array}{rl} corr(x(t),n(t))&{}=\displaystyle \frac{cov(x(t),n(t))}{\sigma _x\sigma _n} = \frac{\sigma _n^2}{\sigma _x\sigma _n}=\frac{\sigma _n}{\sigma _x}\\ corr(x(t),s(t))&{}=\displaystyle \frac{cov(x(t),s(t))}{\sigma _x\sigma _s}\, = \frac{\sigma _s^2}{\sigma _x\sigma _s}=\frac{\sigma _s}{\sigma _x} \end{array}\end{aligned}$$
(34)

where \(\sigma _x\) is the standard deviation of x(t). Based on the difference between \({\sigma _n}\) and \({\sigma _s}\), we can judge that an IMF is noise if it has a low correlation coefficient with the noisy price x(t); otherwise, it is a non-noisy component.

In this study, we use a hypothesis test to verify which IMFs are noise. Let \(\rho _j \,(j=1,\ldots , C)\) denote the correlation coefficient between the noisy price x(t) and the j-th IMF. Then, the null and alternative hypotheses are

$$\begin{aligned} {H_0}:\rho _j = 0,\quad {H_1}:\rho _j \ne 0 \end{aligned}$$
(35)

The test statistic is

$$\begin{aligned} \displaystyle \rho _j \sqrt{\frac{T-2}{1-\rho _j^{2}}}\sim t(T\!-2) \end{aligned}$$
(36)

If the test does not reject \(H_0\), we consider that the IMF has low or no correlation with the original price, and the IMF is regarded as noise. Conversely, the IMF is considered a non-noisy component. In this study, the test p-value (see Footnote 3) is used to identify the noise. In detail, the smaller the p-value, the stronger the evidence against the null hypothesis. Therefore, by setting the significance level \(\beta \), we determine that the IMFs with p-values higher than \(\beta \) are noise, and the remaining IMFs are non-noisy components. Table 2 summarizes the identification results.

Table 2 Noise identification based on correlation coefficient test

Based on the above information, the noisy price x(t) can be decomposed as

$$\begin{aligned} \begin{array}{ll} x(t) = \underline{~\sum \limits _{\{j: {p_j} > \beta \} } I\!M\!F{_j}(t)~} &{}+\ \ \ \underline{~\sum \limits _{\{ j: {p_j} \le \beta \} } I\!M\!F{_j}(t) + v(t)~}\\ \quad\quad\quad\quad{{\hat{n}}}(t) &{}+ \quad\quad\quad{\hat{s}}(t) \end{array}\end{aligned}$$
(37)

where \({{\hat{n}}}(t)\) and \({{\hat{s}}}(t)\) are the estimations of noise n(t) and non-noisy component s(t), respectively. Finally, the denoised price \({\hat{s}}(t)\) can be expressed as

$$\begin{aligned} {\hat{s}}(t)=\sum _{\{j: p_j\le \beta \}}I\!M\!F_{j}(t)+v(t) \end{aligned}$$
(38)

To verify the accuracy of the denoised price, we also test the correlation between the denoised price \({\hat{s}}(t)\) and the original price x(t) using the hypotheses in Eq. (35). If the test rejects \(H_0\), we obtain the final denoised price. It is notable that the significance level \(\beta \) determines the denoising degree: the lower the level is, the higher the denoising degree is. In the empirical section, we choose a low level \(\beta \) = 0.001 to fully remove the noise, which means that an IMF is retained as a non-noisy component only when its correlation with the original price is significant at the 0.1% level. Alternative values, such as 0.01 and 0.05, were also tried, but \(\beta \) = 0.001 was found to be more appropriate. The selected level may produce some deviations when denoising other financial data. Therefore, it should be treated with caution.
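The selection rule of Eqs. (35) to (38) can be sketched as follows. This is an illustration under stated assumptions, not the exact implementation used in the empirical section: the p-value is computed from the statistic in Eq. (36) with a standard normal approximation, which is reasonable only for large T, and the components `fast`, `slow` and `trend` are hypothetical stand-ins for IMFs and a residual rather than actual EMD outputs.

```python
import math
import numpy as np

def corr_test_pvalue(x, y):
    """Return (rho, p) for H0: rho = 0, using the statistic of Eq. (36).

    Assumption: T is large, so the reference distribution with T-2 degrees
    of freedom is approximated by a standard normal.
    """
    T = len(x)
    rho = float(np.corrcoef(x, y)[0, 1])
    if abs(rho) >= 1.0:
        return rho, 0.0
    stat = rho * math.sqrt((T - 2) / (1.0 - rho ** 2))
    return rho, math.erfc(abs(stat) / math.sqrt(2.0))   # two-sided p-value

def denoise_by_correlation(x, imfs, residual, beta=0.001):
    """Keep IMFs with p_j <= beta, Eq. (38); IMFs with p_j > beta are treated as noise."""
    imfs = np.asarray(imfs, dtype=float)
    pvals = np.array([corr_test_pvalue(x, imf)[1] for imf in imfs])
    keep = pvals <= beta
    s_hat = imfs[keep].sum(axis=0) + residual
    return s_hat, keep, pvals

# Hypothetical components (assumptions for illustration only).
T = 2000
t = np.arange(T)
fast = 0.05 * np.sin(2 * np.pi * t / 4)     # small high-frequency component
slow = 2.0 * np.sin(2 * np.pi * t / 500)    # large low-frequency component
trend = 10.0 + 0.002 * t                    # residual v(t)
x = fast + slow + trend

s_hat, keep, pvals = denoise_by_correlation(x, np.array([fast, slow]), trend)
print(keep, pvals)
```

With \(\beta \) = 0.001 the small high-frequency component is classified as noise and the low-frequency component is retained, so the denoised series equals `slow + trend`. Raising `beta` would retain more components, i.e., a lower denoising degree.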

4 Empirical Analysis

To illustrate the superiority of the proposed denoising method, abbreviated EMD\(_\rho \), we compare it with the four common EMD denoising methods discussed in Sect. 3.2, which combine EMD with CMSE, the change-point technique, the Kullback–Leibler divergence, and the Euclidean distance. For presentation purposes, they are abbreviated as EMD\(_\mathrm{MSE}\), EMD\(_{\text {CP}}\), EMD\(_\mathrm{KLD}\), and EMD\(_\mathrm{ED}\), respectively.

4.1 Data Resource

The dataset consists of the daily closing prices of the latest constituents of the SSE 50 index, which are traded on the Shanghai Stock Exchange. The SSE 50 index selects the top 50 stocks ranked by total market value and turnover as its constituents. Therefore, the index’s constituents are the most representative stocks in terms of transaction size and liquidity (Chen et al., 2020). Besides, these constituents have been widely used in portfolio management (Chen and Zhou, 2018; Ren et al., 2019). The dataset covers 3,180 trading days from October 8, 2007, to October 30, 2020, and is collected from the Wind website (www.wind.com.cn). To keep the data as continuous as possible, we exclude 20 stocks with missing values on more than 10 days, which leaves 30 stocks. The appendix reports the IDs and names of the selected SSE 50 index constituents.

Following common practice, we adopt an in-sample and out-of-sample test. The in-sample period is used to calculate portfolio weights and calibrate the model, while the out-of-sample period is used to evaluate portfolio performance. We therefore divide the full dataset into two subsets. The first 60% of the sample, which covers the period from October 8, 2007 to August 6, 2015, is used for in-sample estimation. The last 40% of the sample, which covers the period from August 7, 2015 to October 30, 2020, is used for the out-of-sample analysis.
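The chronological 60/40 split can be sketched as below. This is only an illustration: the helper name `chronological_split` is hypothetical, and the integer cut-off rule is an assumption, since the exact boundary in the study is defined by the calendar dates given above.

```python
def chronological_split(observations, in_sample_pct=60):
    """Split a time-ordered sequence without shuffling.

    Integer arithmetic avoids floating-point rounding at the cut-off
    (assumed rule; the study states the boundary by date).
    """
    cut = len(observations) * in_sample_pct // 100
    return observations[:cut], observations[cut:]


# Placeholder for 3,180 time-ordered trading days.
trading_days = list(range(3180))
in_sample, out_of_sample = chronological_split(trading_days)
```

The split preserves the time order, so every out-of-sample observation comes after every in-sample observation.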

4.2 Denoising Analysis

The proposed denoising method is built on the EMD technique. As an example, Fig. 3 shows the decomposition results for the price of Pudong Development Bank (ID: 600000). EMD splits the original price into a series of IMFs, with cycles ranging from short to long and frequencies ranging from high to low. The high-frequency IMFs, which are driven by factors with short-term effects, fluctuate sharply during the 2007–2008 financial crisis, because the market is sensitive during a crisis and even minor events may trigger large market panics or fluctuations (Erkens et al., 2012). The decomposition results for the other 29 constituents exhibit similar patterns and are not reported to save space.

Fig. 3 EMD decomposition for the price of Pudong Development Bank

To illustrate the rationale of the proposed denoising method, we again use the price of Pudong Development Bank as an example. Table 3 reports the descriptive statistics of the decomposed IMFs. The covariances and correlation coefficients between IMFs 1–4 and the original noisy price are close to 0, while those between IMFs 5–8, the residual, and the original noisy price are relatively high. These findings are consistent with the underlying assumption, which suggests that the proposed method is reasonable. The test results also indicate that IMFs 1–4 are noise at the given confidence level \(\beta =0.001\). Finally, we sum IMFs 5–8 and the residual to construct the denoised price of Pudong Development Bank.

Table 3 Descriptive statistics of decomposed IMFs

Figure 4 provides six heatmaps to visualize the correlation structures of the different denoised returns. EMD\(_\rho \) and EMD\(_\mathrm{CP}\) significantly increase the correlations between returns. The main reason is that denoising removes the short-term heterogeneous fluctuations and retains the long-term common trend of the noisy price. The correlation structure for EMD\(_\mathrm{CP}\) is completely different from that of the original return, which means that the denoising degree is too high to achieve a good portfolio performance. By contrast, EMD\(_\mathrm{MSE}\) and EMD\(_\mathrm{ED}\) have correlation structures similar to that of the original return, indicating that denoising is not sufficient. Thus, the portfolios based on EMD\(_\mathrm{MSE}\) and EMD\(_\mathrm{ED}\) hardly outperform the portfolio based on the original return. EMD\(_\rho \), in turn, has a relatively high denoising degree but does not completely alter the correlation structure.
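The effect described above, where removing short-term heterogeneous fluctuations raises the correlation between returns, can be illustrated with a small sketch. The two synthetic assets, their common trend, and their asset-specific high-frequency components are hypothetical, and the known clean series stands in for the output of a denoising step.

```python
import numpy as np


def log_return_correlation(prices):
    """Correlation matrix of log returns; `prices` has one column per asset."""
    returns = np.diff(np.log(prices), axis=0)
    return np.corrcoef(returns, rowvar=False)


t = np.arange(1000) / 1000
common = 10 * np.sin(2 * np.pi * t)                 # shared long-term movement
clean = np.column_stack([100 + common, 150 + common])
# Asset-specific short-term fluctuations (hypothetical noise stand-ins).
noise = np.column_stack([
    0.5 * np.sin(2 * np.pi * 50 * t),
    0.5 * np.cos(2 * np.pi * 73 * t),
])

corr_noisy = log_return_correlation(clean + noise)[0, 1]
corr_denoised = log_return_correlation(clean)[0, 1]
```

Here `corr_denoised` is much larger than `corr_noisy`, because only the common movement remains once the asset-specific components are removed.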

Fig. 4 Correlation between return series for different denoising methods

4.3 Optimal Portfolio Construction

The optimal portfolio is constructed from the efficient frontier. Specifically, we divide the range between the minimum and maximum average returns of the 30 stocks into 100 equal intervals, which yields 101 target returns \(E(r_p)=\mu _0\). The efficient frontier is then obtained according to Eq. (7) (see footnote 4).
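The frontier construction can be sketched as follows. Since Eq. (7) is not reproduced here, the sketch assumes the textbook mean–variance problem (minimize portfolio variance subject to a target return and a budget constraint, with short sales allowed), solved through its linear first-order conditions. The function name `efficient_frontier` and the three-asset inputs are hypothetical.

```python
import numpy as np


def efficient_frontier(mean_returns, cov, n_intervals=100):
    """Minimum-variance weights for equally spaced target returns.

    Assumed problem: min w'Σw  s.t.  w'μ = μ0,  w'1 = 1 (short sales allowed).
    """
    mu = np.asarray(mean_returns, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = mu.size
    # 100 equal intervals give 101 target returns.
    targets = np.linspace(mu.min(), mu.max(), n_intervals + 1)

    # First-order conditions: 2Σw + λ1 μ + λ2 1 = 0 plus the two constraints.
    kkt = np.zeros((k + 2, k + 2))
    kkt[:k, :k] = 2 * cov
    kkt[:k, k] = mu
    kkt[:k, k + 1] = 1.0
    kkt[k, :k] = mu
    kkt[k + 1, :k] = 1.0

    weights = np.array([
        np.linalg.solve(kkt, np.concatenate([np.zeros(k), [mu0, 1.0]]))[:k]
        for mu0 in targets
    ])
    risks = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))
    return targets, risks, weights


# Hypothetical inputs for three assets.
mu = np.array([0.05, 0.10, 0.15])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
targets, risks, weights = efficient_frontier(mu, cov)
```

Plotting `risks` against `targets` traces the frontier. A no-short-sale version would need a quadratic programming solver instead of the linear system.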

Figure 5 plots the mean–variance efficient frontiers for different denoising methods. The efficient frontiers based on denoised returns lie to the left of the frontier based on the original unfiltered return. Generally, the higher the denoising degree, the lower the risk that can be achieved, so the efficient frontier moves closer to the vertical axis. Accordingly, EMD\(_\mathrm{CP}\) (yellow dotted line marked by lower triangles) and EMD\(_\rho \) (green solid line marked by pentagrams) have a high denoising degree. The efficient frontier for EMD\(_\mathrm{KLD}\) (red solid line marked by upper triangles) is abnormal: it is a piecewise straight line, because EMD\(_\mathrm{KLD}\) removes too much effective information for a few stocks, which concentrates the portfolio weights in these few stocks. Table 13 in the appendix confirms that EMD\(_\mathrm{KLD}\) removes too much information for Zhongjin Gold (ID: 600489). These results imply that EMD\(_\mathrm{KLD}\) cannot diversify risk well or achieve satisfactory portfolio performance.

In practice, there are two challenges in constructing the optimal portfolio: (1) the input parameters have a large impact on the portfolio (Chen et al., 2020); (2) the efficient frontiers do not correspond to each other, i.e., the maximum and minimum average returns differ across methods. To eliminate the interference of human factors and overcome these challenges, we construct a return interval from the maximum Sharpe ratio and use this interval as a benchmark to search for the optimal portfolio weights. Table 4 shows the construction steps of the optimal portfolio.

Fig. 5 In-sample mean–variance efficient frontier

Table 4 Constructing the optimal portfolio

4.4 Portfolio Performance Evaluation

To illustrate the superiority of the proposed denoising method, we analyze the portfolio performance not only from the full sample, but also from four subsamples, including the bear market, bull market, the 2007–2008 financial crisis and COVID-19 pandemic periods.

4.4.1 Full Sample Analysis

Table 5 reports the performance statistics for different denoising methods. EMD\(_\rho \) outperforms its competitors on all the metrics, which demonstrates the superiority of the proposed method. By contrast, the other denoising methods perform poorly because the noise is not correctly removed. Specifically, EMD\(_\mathrm{MSE}\) and EMD\(_\mathrm{ED}\) perform poorly because the noise is not sufficiently removed, while EMD\(_\mathrm{CP}\) removes too much effective information. The weakness of EMD\(_\mathrm{KLD}\) is that it removes too much information for a single stock, which causes the portfolio weights to concentrate on that stock. Overall, the proposed denoising method addresses these weaknesses and is the best-performing denoising strategy among those compared, which can help investors improve their portfolio returns.
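For readers who wish to reproduce such comparisons, three of the return-risk ratios used in this study (Sharpe, Sortino, and upside potential ratios) can be sketched as below. The definitions are the commonly used textbook ones and are an assumption here; the exact definitions, the risk-free rate, and the minimum acceptable return `mar` used in the tables may differ. The sample returns are hypothetical.

```python
import numpy as np


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return excess.mean() / excess.std(ddof=1)


def downside_deviation(returns, mar=0.0):
    """Root mean squared shortfall below the minimum acceptable return."""
    shortfall = np.minimum(np.asarray(returns, dtype=float) - mar, 0.0)
    return np.sqrt(np.mean(shortfall ** 2))


def sortino_ratio(returns, mar=0.0):
    """Mean return above `mar` per unit of downside deviation."""
    return (np.mean(returns) - mar) / downside_deviation(returns, mar)


def upside_potential_ratio(returns, mar=0.0):
    """Average gain above `mar` per unit of downside deviation."""
    upside = np.maximum(np.asarray(returns, dtype=float) - mar, 0.0)
    return upside.mean() / downside_deviation(returns, mar)


# Hypothetical out-of-sample portfolio returns.
sample_returns = [0.02, -0.01, 0.03, -0.02]
metrics = {
    "sharpe": sharpe_ratio(sample_returns),
    "sortino": sortino_ratio(sample_returns),
    "upside_potential": upside_potential_ratio(sample_returns),
}
```

Unlike the Sharpe ratio, the Sortino and upside potential ratios penalize only returns below `mar`, which matters when comparing methods by their ability to limit losses.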

Table 5 Mean–variance portfolio performance based on EMD denoising methods

The ensemble EMD (EEMD) proposed by Wu and Huang (2009) is also a common data decomposition technique. By repeatedly adding Gaussian white noise to the signal to be decomposed, it effectively alleviates the mode mixing problem of EMD and has been widely used to decompose financial data (Nguyen and Kim, 2016; Yan et al., 2020). To further demonstrate the superiority of the proposed method, we replace EMD with EEMD in each of the denoising methods.

Table 6 Mean–variance portfolio performance based on EEMD denoising methods

Table 6 presents the performance metrics for different denoising methods. The more sophisticated EEMD denoising methods do not achieve satisfactory results. As argued by Yeh et al. (2010), EEMD introduces a new problem while solving the mode mixing problem: the decomposed IMFs retain some of the added white noise, which inevitably increases the model error and deteriorates the portfolio performance. Scheller and Auer (2018) show that simple methods usually achieve satisfactory results in portfolio management. This is why we use the simplest EMD to decompose the noisy price.

Table 7 Mean–variance portfolio performance based on wavelet soft threshold denoising methods

Wavelet denoising is a prevalent denoising method in portfolio management (Hamdi et al., 2019; Zhu et al., 2021). The key to wavelet denoising is the choice of the wavelet basis function. Following previous studies (Zhu et al., 2019), three common basis functions, sym8, haar and coif4, are chosen to examine the portfolio performance of wavelet denoising. Table 7 reports the corresponding portfolio results. Besides, DeMiguel et al. (2009) argue that the equal-weighted portfolio can achieve a better Sharpe ratio and turnover. For comparison, Table 7 also presents the equal-weighted portfolio results.
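The idea of soft threshold wavelet denoising can be sketched with a single-level Haar transform. This is a simplified illustration rather than the procedure used for Table 7: the decomposition level, the threshold `lam`, and the function names are assumptions, and the sym8 and coif4 bases would require a wavelet library.

```python
import numpy as np


def soft_threshold(coeffs, lam):
    """Shrink coefficients toward zero by lam; values within [-lam, lam] become 0."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)


def haar_soft_denoise(x, lam):
    """One-level Haar transform, soft-threshold the details, then invert."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("length must be even for a one-level Haar transform")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)                    # low-frequency part
    detail = soft_threshold((even - odd) / np.sqrt(2), lam)  # high-frequency part
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out


prices = np.array([1.0, 3.0, 2.0, 2.0])
unchanged = haar_soft_denoise(prices, lam=0.0)    # no shrinkage
smoothed = haar_soft_denoise(prices, lam=10.0)    # all details removed
```

With `lam=0` the input is reconstructed exactly, while a large `lam` replaces each pair of observations by its average.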

Table 7 confirms the superiority of the proposed denoising method over wavelet denoising. Except for the tracking error ratio, the performance metrics for EMD\(_\rho \) are far higher than those of wavelet denoising. Besides, the choice of wavelet basis function has a large impact on portfolio performance. For example, the portfolio performance of haar wavelet denoising is relatively poor, while wavelet denoising with the sym8 and coif4 wavelets achieves better portfolio performance. In practice, it is difficult for investors to pick the proper basis function in advance. The proposed denoising method avoids this difficulty. Lastly, Table 7 also confirms that the proposed denoising method outperforms the equal-weighted portfolio.

4.4.2 Subsamples Analysis

Considering the differences between bull and bear markets, the denoising performance is tested not only in the full sample but also in different subsamples. Besides, to test the sensitivity of different methods to extreme events, we consider two special periods within the bear and bull markets, i.e., the 2007–2008 financial crisis and the COVID-19 pandemic in 2020. The periods are identified according to the actual economic context and the trend of the SSE 50 index. Figure 6 plots the prices (dot-dash line in the upper panel) and returns of the SSE 50 index. The upper panel in Fig. 6 also plots the noise (yellow solid line) and the non-noisy component (black solid line) obtained from the correlation coefficient test criterion.

Fig. 6 The prices (upper panel) and returns (bottom panel) of SSE 50 index

Between 2007 and 2008, the global economy fell into recession with the outbreak of the financial crisis, and the prices and returns of the SSE 50 index fell sharply. Therefore, the data from October 8, 2007 to November 11, 2008 are used as the financial crisis subsample. To revive the economy, the Chinese government launched a 4 trillion yuan bailout plan, after which the economy gradually emerged from the financial crisis and experienced a short-term bull market. However, due to the ensuing European debt crisis and the continued deterioration of the global economy, the economy remained in a downward spiral. Therefore, the period from October 8, 2007 to November 2, 2014 is considered a bear market. After that, with the recovery of major economies and the transformation and upgrading of the domestic economy, China’s economy gradually emerged from the downturn. The prices of the SSE 50 index trended upward, rising by more than 100% from trough to peak, while the fluctuation in returns was relatively moderate. Therefore, the remaining data in the full sample are identified as a bull market. Finally, on the last day of 2019, a novel coronavirus was first detected in Wuhan. Since then, COVID-19 has continued to affect the global economy. Thus, the interval from January 1, 2020 to the end of the full sample is set as the COVID-19 pandemic period.

Table 8 shows the division of in-sample and out-of-sample periods for the different subsamples. As in the full sample, the first 60% of each subsample is set as the in-sample period, while the remaining 40% is used as the out-of-sample period to test portfolio performance.

Table 8 In-sample and out-of-sample subsample periods

Table 9 reports the subsample portfolio results for different denoising methods. The results reconfirm the superiority of the proposed denoising approach: EMD\(_\rho \) outperforms the other methods in both bear and bull markets. By comparison, the other EMD denoising methods hardly achieve satisfactory results in any subsample period, which implies that it is critical to remove the correct IMFs. Similarly, EEMD denoising rarely achieves a better portfolio performance because of the additional white noise. Notably, wavelet denoising achieves satisfactory results, indicating that it is a powerful denoising method. However, as noted above, wavelet denoising requires the basis function to be set in advance, and an inappropriate basis function may lead to poor performance.

Table 9 Mean–variance portfolio performance for different subsamples

Focusing on the financial crisis and COVID-19 pandemic periods, EMD\(_\rho \) is slightly less effective during the financial crisis, which indicates that the proposed method is slightly weaker at reducing extreme losses. However, the proposed method still outperforms the other EMD denoising methods. Besides, compared with the financial crisis, the COVID-19 pandemic had a relatively small impact on portfolio performance, because the Chinese government controlled the epidemic in a timely and effective manner through measures such as the lockdown of Wuhan and nationwide joint prevention and control.

5 Simulation Study

To further test the reliability of the conclusions, we generate a series of price matrices through Monte Carlo simulation. The simulated price \(x_i(t)\) of asset \(i,\,(i=1,\ldots ,k)\) at time \(t,\,(t=1,\ldots ,T)\) is composed of two parts: the non-noisy price \(s_i(t)\) and the noise \(n_i(t)\). The non-noisy price \(s_i(t)\) is generated by the Itô process \(ds_i(t) = \mu _i s_i(t)dt + \sigma _i s_i(t)dW\), where \(\mu _i\) and \(\sigma _i\) are the annualized rate of return and volatility, respectively, and W is a standard Brownian motion. The noise \(n_i(t)\) is obtained by sampling from a specific distribution. In this way, the simulated noisy price can be expressed as \(x_i(t)=s_i(t)+n_i(t)\). Table 10 reports the parameter settings and noise distributions considered.
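The simulation design can be sketched as follows, using the exact log-normal discretization of the process above. Several inputs are assumptions for illustration only, because Table 10 is not reproduced here: the initial price `s0`, the 252 steps per year, the Gaussian noise with standard deviation `noise_std`, the independence of the Brownian motions across assets, and the function name `simulate_noisy_prices`.

```python
import numpy as np


def simulate_noisy_prices(k=50, n_obs=1000, mu=0.1, sigma=0.5, s0=100.0,
                          noise_std=1.0, steps_per_year=252, seed=0):
    """Simulate x_i(t) = s_i(t) + n_i(t) for k assets and n_obs observations."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps_per_year
    z = rng.standard_normal((n_obs - 1, k))
    # Exact solution of ds = mu*s*dt + sigma*s*dW over one step of length dt.
    log_increments = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    log_path = np.vstack([np.zeros((1, k)), np.cumsum(log_increments, axis=0)])
    s = s0 * np.exp(log_path)                           # non-noisy price
    n = rng.normal(0.0, noise_std, size=(n_obs, k))     # assumed Gaussian noise
    return s + n, s, n


x_sim, s_sim, n_sim = simulate_noisy_prices()
```

Because `s_sim` is known in a simulation, the price produced by any denoising method can be compared directly with the true non-noisy price, which is not possible with market data.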

Table 10 Parameter setting for simulated price

We generate a price matrix of 1000 observations for each simulated sample. Table 11 reports the performance metrics for different denoising methods. In setting 1, since the results are similar across parameter values, panel A only reports the case \(\mu _i=0.1,\, \sigma _i = 0.5\,\,(i=1,\ldots ,k)\) and \(k=50\). Besides, to eliminate the influence of the sample period on the simulation results, additional simulations with 500 and 3000 observations are conducted for the different settings. Table 16 in the appendix reports the portfolio results. The overall conclusions remain consistent with the previous ones: the portfolio for EMD\(_\rho \) has the best performance, which illustrates the superiority and robustness of the proposed denoising method. The common EMD denoising methods perform poorly because the noise components are not correctly removed. The wavelet and EEMD denoising methods also have weaknesses, such as the choice of basis function and the interference of added noise. To sum up, the proposed method is the best-performing denoising strategy in the simulations, which can help investors significantly improve their out-of-sample portfolio performance.

Table 11 Mean–variance portfolio performance for different simulated samples

6 Conclusions

Noise is an important factor affecting portfolio performance. In this study, we theoretically prove that noise can cause the optimal portfolio weights and the efficient frontier to deviate from their true positions. Thus, it is necessary to eliminate noise. Besides, because the common denoising methods, especially EMD denoising, have weaknesses in portfolio management, such as inadequate or excessive denoising, we construct an EMD denoising strategy based on the correlation coefficient test criterion to improve portfolio performance. Specifically, EMD is used to decompose the original noisy price. Then, a series of correlation coefficient tests is performed to determine which IMFs are noise. If a test does not reject the null hypothesis, the IMF is considered noise. Otherwise, it is considered a non-noisy component.

In the empirical analysis, we apply the proposed method to denoise the SSE 50 index’s constituents and summarize the out-of-sample performance with four return-risk ratios: the Sharpe ratio, Sortino ratio, upside potential ratio and tracking error ratio. The empirical results show that the proposed method outperforms the four common EMD denoising methods, EEMD denoising and wavelet denoising under the mean–variance framework. Besides, the portfolio performance is examined in four subsamples, including the bull and bear markets and two special periods, i.e., the 2007–2008 financial crisis and the COVID-19 pandemic in 2020. The results indicate that the proposed method performs better in the bear market, bull market, and COVID-19 pandemic periods, while it is slightly weaker during the financial crisis. Simulation studies with different parameters and sample periods validate the above conclusions. The proposed denoising method can reduce noise interference and help investors improve their portfolio performance.