Introduction

The Fourier transform (FT) is one of the most widely used signal processing techniques. It enables a dual view of the time domain signal from the ‘frequency’ perspective. The two representations are exclusive in the sense that the frequency (or Fourier) representation makes no reference to time and vice versa. But while this exclusivity of time and frequency domain representations may seem mathematically powerful, it becomes prohibitive when processing wide-ranging (non-stationary) real-life signals that exhibit natural entanglements between the two domains.

The above shortcomings of the FT have led to a large and varied body of work, referred to as T–F analysis, that attempts to view a signal in the joint time-frequency (T–F) domain1. Here, the key goal is to track the time variations of signal frequencies, i.e., the instantaneous frequency. Interestingly, for many real-life signals, the trajectories of instantaneous frequencies appear as a finite number of ribbon-like coherent structures (ridges) in the T–F domain, revealing hidden inner structures of the data. This raises the question: can we also obtain the time domain representation of these organized structures appearing in the T–F domain? Indeed, such representations could further improve our understanding of underlying physical processes and facilitate the processing of non-stationary multi-component signals at the level of their constituent modes in the time domain2. We will refer to such time-domain representations of multi-component signals as signal decomposition (SD) in the sequel, though in the literature these are also referred to as mode retrieval, mode decomposition and signal separation.

The above tasks of obtaining signal decomposition (SD) and T–F representations for non-stationary data are directly related to this study and therefore it is pertinent to define them precisely. In relation to SD, let the input signal x(t) be a multi-component signal containing a finite number K of components or modes, i.e.,

$$\begin{aligned} x(t)=\sum _{k=1}^{K}c_k(t), \end{aligned}$$
(1)

where \(c_k(t)=a_k(t)\cos (\phi _k(t))\), for \(k=1,\ldots ,K\), denotes a family of amplitude- and frequency-modulated (AM–FM) components of x(t), which need to be recovered; \(a_k(t)\) and \(\phi _k(t)\) respectively represent the instantaneous amplitude and phase functions of \(c_k\). It is important to mention here that the choice of the AM–FM mode extraction model in (1) is governed by the properties of real-life signals, which naturally involve multiple rhythms and oscillations, e.g., circadian3 and cortical4 rhythms, heart rate5, respiratory variability6, speech and vibration signals7.
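The model in (1) can be illustrated with a minimal sketch that builds a multi-component signal from user-supplied amplitude and phase functions. The function names and the two example modes below are hypothetical choices made for illustration; they are not signals used in this study.

```python
import numpy as np

def am_fm_component(t, amplitude, phase):
    """Return c_k(t) = a_k(t) * cos(phi_k(t)) for callables a_k and phi_k."""
    return amplitude(t) * np.cos(phase(t))

def multicomponent_signal(t, components):
    """Sum K AM-FM components as in Eq. (1).

    `components` is a list of (a_k, phi_k) pairs of callables.
    Returns the composite signal x and the (K, len(t)) array of modes.
    """
    modes = np.stack([am_fm_component(t, a, phi) for a, phi in components])
    return modes.sum(axis=0), modes

# Hypothetical two-mode example (K = 2), chosen only for illustration.
fs = 256.0                       # sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)  # 2 seconds of samples
components = [
    # Slowly amplitude-modulated 10 Hz tone.
    (lambda t: 1.0 + 0.2 * np.cos(t), lambda t: 2 * np.pi * 10.0 * t),
    # Decaying linear chirp starting at 30 Hz.
    (lambda t: np.exp(-t / 5.0), lambda t: 2 * np.pi * (30.0 * t + 2.0 * t**2)),
]
x, modes = multicomponent_signal(t, components)
```

The SD task is the inverse of this construction: given only `x`, recover the rows of `modes`.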

Next, the related problem of obtaining T–F representation of x can be written as

$$\begin{aligned} x(t) \rightarrow \Phi x(t,f), \end{aligned}$$
(2)

where \(\Phi x(t,f)\) represents the energy density of the signal x(t) in the T–F domain. T–F analysis and signal decomposition have both become indispensable tools in a wide range of problems, with T–F analysis finding utility in audio and speech processing8,9, communication10, and biomedical engineering11,12, whereas signal decomposition is used in vibration analysis7, denoising13,14,15,16, data fusion17,18,19 and medical studies20, to name a few.

To illustrate the SD and T–F analysis tasks, we show an example of a composite signal x(t) with three components in Fig. 1a; the three components (modes) \(s_1\), \(s_2\) and \(s_3\) are shown separately in Fig. 1b. In the second row, the respective T–F plots are shown in Fig. 1c, d. Given x(t), the goal of SD is to recover its constituent oscillatory components (as given in Fig. 1b) whereas the T–F analysis entails obtaining the T–F plot of the whole signal, shown in Fig. 1c, or the T–F plots of individual components, shown in Fig. 1d.

Figure 1

Example of SD and T–F representation of a signal. The original composite signal is shown in the time domain in (a) and its separate components, \(s_1\), \(s_2\) & \(s_3\) are shown in (b). In (c) the T–F representation is shown for the composite signal. In (d), the time-frequency representation of each of the separate components of the original signal is shown.

The earliest attempts at providing a joint T–F representation (2) of non-stationary signals resulted in the intuitive and ad hoc procedure of the sliding (or windowed) Fourier transform and in the Wigner-Ville distribution, a quadratic functional of a signal which can be interpreted as a time-varying spectral density. The theory of wavelets represents another direction in harmonic signal analysis, replacing the notion of ‘frequency’ with ‘scale’ and formalizing the idea of local signal representation at multiple scales.
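As a rough illustration of the sliding (windowed) Fourier transform mentioned above, the following sketch computes an energy density on a T–F grid using NumPy only. The Hann window, the window length, the hop size and the test chirp are assumptions made for this example and are not settings used in this study.

```python
import numpy as np

def sliding_fourier_transform(x, fs, win_len=128, hop=16):
    """Windowed (sliding) Fourier transform with a Hann window.

    Returns frame-centre times (s), frequencies (Hz) and the
    energy density |STFT|^2 with shape (n_frames, n_freqs).
    """
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    # Each row is one windowed segment of the signal.
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    spectrum = np.fft.rfft(frames, axis=1)
    times = (starts + win_len / 2) / fs
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return times, freqs, np.abs(spectrum) ** 2

# Hypothetical test signal: a linear chirp whose instantaneous
# frequency is 20 + 10 t Hz, i.e. it rises from 20 Hz to 60 Hz in 4 s.
fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)
x = np.cos(2 * np.pi * (20.0 * t + 5.0 * t**2))

times, freqs, energy = sliding_fourier_transform(x, fs)
# Crude ridge estimate: the frequency bin with maximum energy per frame.
ridge = freqs[np.argmax(energy, axis=1)]
```

The fixed window length sets a single trade-off between time and frequency resolution for the whole signal, which is one motivation for the multi-scale and data-driven alternatives discussed in this article.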

The idea of decomposing a multicomponent signal into its constituent AM–FM components, i.e., the SD problem given in (1), was popularized after the emergence of an empirical data-driven approach known as empirical mode decomposition (EMD)21. Despite its inherent flaws that are mainly linked with the empirical nature of the algorithm, the EMD method has found interdisciplinary applications. This led to an explosion of new and improved techniques for SD as well as new application areas where such techniques have been useful. In addition to the EMD, other popular techniques for SD are synchrosqueezed transform (SST)22, variational mode decomposition (VMD)23 and its extension, named nonlinear chirp mode decomposition (NCMD)24, and sliding singular spectrum analysis (SSA)25. Following the SD, one way to obtain the T–F representation of a signal from its decomposed components, \(c_k(t)\), is through the application of the Hilbert transform, as in the case of EMD, VMD and SSA. In contrast, SST obtains the (squeezed) T–F representation first, followed by the separation of signal components in the time domain.
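The Hilbert-transform step mentioned above can be sketched as follows: the analytic signal of a decomposed component gives its instantaneous amplitude and, through the derivative of the unwrapped phase, its instantaneous frequency. This is a minimal NumPy-only sketch with hypothetical function names; it ignores end effects and is not the implementation used in this study.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (discrete Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC term
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist term
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def instantaneous_amplitude_frequency(c, fs):
    """Instantaneous amplitude and frequency (Hz) of one component."""
    z = analytic_signal(c)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Finite-difference phase derivative; one sample shorter than c.
    frequency = np.diff(phase) * fs / (2 * np.pi)
    return amplitude, frequency

# Hypothetical component: a 25 Hz tone with exactly 100 cycles in 4 s.
fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)
c = np.cos(2 * np.pi * 25.0 * t)
amplitude, frequency = instantaneous_amplitude_frequency(c, fs)
```

Plotting `frequency` against time, weighted by `amplitude`, for every decomposed component gives a T–F picture of the kind shown for the individual modes in the figures of this article.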

Regardless of their mode of operation, each of the above methods has its own strengths and weaknesses, making it suitable for certain classes of signals and applications. Further, almost all the methods are fairly sensitive to their operational parameters and it is challenging to tune these for optimal results for a given signal. Finally, the performance of all methods degrades, albeit to varying degrees, under noisy data. Given this, it is of considerable interest to gain a better understanding of the strengths and weaknesses of the popular SD approaches and to evaluate their performance in terms of (i) the accuracy of obtaining the constituent AM–FM oscillatory components; (ii) robustness against noise; (iii) sensitivity to changes in the algorithmic parameters.

To this end, we investigate and compare the performance of the popular SD approaches in the context of their accuracy of decomposition, their ability to operate under noisy data and their sensitivity to algorithmic parameter changes. We also make suggestions regarding the optimal choice of algorithmic parameters of the SD approaches. Our observations and suggestions are based on the empirical results of carefully designed experiments using synthetic signals as well as real-life data. We consider and examine the performance of both single-channel (or univariate) SD approaches and their multi-channel (multivariate) extensions in this study. Finally, it is emphasized that dozens if not hundreds of data-driven SD techniques have emerged over the last two decades and it is not within the scope of this work to assess or even review all those; instead, we focus only on a few largely popular classes of SD and T–F approaches in this work.

The article is organized as follows: Section 2 gives an overview of the state-of-the-art in SD along with some technical details of the methods considered in this study. Section 3 presents experiments and related observations regarding the accuracy of the SD approaches, while Section 4 examines the noise robustness of the considered methods. Section 5 evaluates the performance of the multivariate extensions of the popular SD methods in the presence of noise. Sections 3–5 also examine the sensitivity of the SD methods to parameter changes. The article is concluded with an overall performance assessment of the considered approaches along with related discussion.

Review of the popular SD approaches

In this section, we briefly review the well-established and popular SD techniques which have been considered for evaluation in this work. The goal is to give a general description of each technique without going into algorithmic details, though we do mention important algorithmic parameters that will be relevant in the subsequent analysis. Further details of the relevant algorithms can be found in the provided references.

Empirical mode decomposition (EMD) and variants

EMD is the first truly data-driven method for signal decomposition and T–F analysis21,26. It operates through a sifting process that iteratively extracts inherent oscillations from input data based on signal extrema and their interpolation. The downside of EMD is that it lacks a rigorous mathematical framework, which poses problems in guaranteeing its correctness, stability and performance assessment in general. Efforts to overcome those difficulties include approaches that replace the sifting process with the resolution of partial differential equations27,28, convex approximations of EMD29,30, local iterative filtering approaches31,32, sparse models to obtain data-driven modes33,34, Teager-Huang Transform35,36 and noise-assisted approaches37,38,39.

In this article, we consider the original EMD method by Huang et al.21. The important operational choices within the EMD method are: i) the extrema interpolation scheme; ii) the stopping criterion for the intrinsic mode functions. We employ the well-established cubic spline approach for extrema interpolation and use the stopping criterion, introduced in40, that is based on two threshold values.
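To make the role of extrema interpolation concrete, the sketch below performs a single sifting iteration: it locates the local extrema, interpolates upper and lower envelopes through them, and subtracts the envelope mean. Note the deliberate simplifications, all of which are assumptions of this example: linear interpolation (`np.interp`) is swapped in for the cubic splines used in this study, no stopping criterion is applied, and boundary effects are not treated.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima of a sampled signal."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift_once(x, t):
    """One sifting iteration with linearly interpolated envelopes.

    Simplification: EMD normally uses cubic-spline envelopes; linear
    interpolation is used here only to keep the sketch dependency-free.
    """
    maxima, minima = local_extrema(x)
    upper = np.interp(t, t[maxima], x[maxima])
    lower = np.interp(t, t[minima], x[minima])
    mean_envelope = 0.5 * (upper + lower)
    return x - mean_envelope, mean_envelope

# Hypothetical input: a 20 Hz oscillation riding on a slow linear trend.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
fast = np.sin(2 * np.pi * 20.0 * t)
trend = 0.5 * t
candidate, mean_envelope = sift_once(fast + trend, t)
```

Away from the signal boundaries, `mean_envelope` follows the slow trend and `candidate` approximates the fast oscillation. In the full algorithm this step is repeated until the stopping criterion is met, and the procedure is then applied again to the residual.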

Synchrosqueezed transform (SST) and variants

Founded on strong mathematical footing, SST is a powerful tool for nonstationary signal decomposition and T–F analysis2,41,42. The method operates by sharpening the T–F representation of a signal by using a synchrosqueezing operator, followed by the T–F ridge extraction and component retrieval. Several variants of SST have emerged in the last decade, including its adaptation to STFT22,43, S-transform44 and multitapered transform45; higher order variants46,47,48,49; improved signal separation operators50,51; and ridge extraction techniques52,53.

The performance of SD task via SST relies heavily on i) the ridge extraction technique; ii) the synchrosqueezed T–F (or time-scale) representation of input data. To create the initial T–F (or time-scale) representation, appropriate mother wavelet function and accompanying parameters must be chosen. An optimization based heuristic approach22 is typically used for ridge extraction within SST. The approach requires the tuning of two crucial parameters for its operation: i) the initial bandwidth \(\omega _i\); ii) the maximum step size \(\alpha\) which restricts the maximum change in center frequency along the time-axis.

Variational mode decomposition (VMD) and variants

VMD poses the signal decomposition problem within the mathematical framework of variational convex optimization23. Despite its popularity, VMD is applicable only to signals containing bandlimited components; a recent work, titled nonlinear chirp mode decomposition (NCMD), aims to address that problem but suffers from serious convergence issues24.

Both VMD and NCMD will be considered in our analysis. The important parameters affecting the operation of the VMD method are: i) the number of components K to be separated; ii) the bandwidth parameter \(\alpha\); and iii) \(\tau\), an internal parameter enforcing the exact reconstruction via Lagrangian multiplier. The precision of reconstruction can be changed by altering \(\tau\). For NCMD, in addition to K, \(\alpha\), and \(\tau\), tuning the increment parameter \(\mu\) is crucial as it affects the estimation of change in mode frequencies. More information about the parameters of VMD23 and NCMD24 can be found in the relevant articles.

Source separation based approaches

Source separation methods based on simple generative models bear close resemblance to non-stationary signal decomposition techniques due to their shared data-driven flavour54,55,56. Yet, this connection between the two classes of methods has largely remained unexplored. A few notable exceptions, however, exist, including singular spectrum analysis57, sliding SSA (SSA)25 and the low-rank T–F synthesis (LRTFS) approach58.

In this work, SSA25 is used as one of the assessed approaches. An important parameter that requires tuning within SSA is the embedding dimension L. Moreover, the desired number of classes/clusters is another user-defined parameter that affects the performance of the algorithm.
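The role of the embedding dimension L can be seen in a minimal sketch of basic (non-sliding) SSA: the signal is embedded into an L-row trajectory matrix, factorized by the SVD, and each rank-one term is mapped back to a time series by diagonal averaging. This is a simplified stand-in for the sliding SSA method assessed in this work; the grouping (clustering) of elementary components is omitted, and the function name and test signal are hypothetical.

```python
import numpy as np

def ssa_components(x, L):
    """Basic SSA: embed, SVD, and diagonal-average each rank-one term.

    Returns an array of shape (min(L, K), N) of elementary components,
    where K = N - L + 1; the components sum to the original signal.
    """
    N = len(x)
    K = N - L + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + L].
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = np.empty((len(s), N))
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])
        # Entries with row + col = n all estimate sample n. Flipping the
        # rows turns these anti-diagonals into ordinary diagonals.
        flipped = Xi[::-1]
        components[i] = [flipped.diagonal(k).mean() for k in range(-(L - 1), K)]
    return components

# Hypothetical input: two sinusoids, each of which spans two singular vectors.
fs = 200.0
t = np.arange(200) / fs
x = np.sin(2 * np.pi * 5.0 * t) + 0.5 * np.sin(2 * np.pi * 20.0 * t)
components = ssa_components(x, L=40)
```

For this noise-free example the first four elementary components already reconstruct `x`; assigning them to two modes is the grouping step, which is where the user-defined number of clusters enters.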

Multivariate extensions of the SD approaches

Specialized algorithms for multivariate data have become crucial in modern applications, owing to recent developments in sensing technology. To this end, the notable multivariate extensions of data-driven NSP methods include those for EMD59,60,61,62,63, local iterative filtering64, VMD65,66, SST67, empirical wavelet transform68 and sliding SSA69.

In this article, we will evaluate the performance of multivariate extensions of EMD, named MEMD62, and VMD, known as MVMD65. Importantly, the algorithmic parameters of both methods are mostly the same as those of their univariate counterparts.

Accuracy of signal decomposition

We evaluate the performance of the popular SD approaches (EMD, VMD, SST, VNCMD and SSA) in terms of accurately decomposing the constituent AM–FM oscillatory components of a multi-component signal. The accuracy of such methods is inherently dependent on the (complexity of the) input signal and so careful consideration has been given to the choice of test signals. In particular, we design two synthetic signals comprising narrow- and wide-band components respectively. Further, a real-life biomedical electroencephalogram (EEG) signal is also used to evaluate the performance of the methods.

To compare the accuracy of the SD methods quantitatively, we use a metric called quality of reconstruction factor (QRF)25. QRF of extracted signal component \({\hat{s}}\), relative to the reference ‘true’ component s, is defined as

$$\begin{aligned} {\textrm{QRF}}({\hat{s}}, s)=20 \times \log _{10}\left( \frac{\Vert s\Vert }{\Vert s-{\hat{s}}\Vert }\right) . \end{aligned}$$
(3)

Note from (3) that the greater (smaller) the error between the reconstructed signal component \({\hat{s}}\) and the ‘true’ component s, the smaller (greater) the value of QRF. Thus, higher values of QRF imply that the SD has been performed accurately. Clearly, the QRF metric can only be used when the ‘true’ signal components (or ground truth) are available.
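Equation (3) translates directly into a few lines of code. The sketch below assumes that \(\Vert \cdot \Vert\) denotes the Euclidean norm of the sampled signal; the function name and the example signals are hypothetical.

```python
import numpy as np

def qrf(s_hat, s):
    """Quality of reconstruction factor in dB, following Eq. (3).

    Assumes the Euclidean norm over the samples of each signal.
    """
    s_hat = np.asarray(s_hat, dtype=float)
    s = np.asarray(s, dtype=float)
    return 20.0 * np.log10(np.linalg.norm(s) / np.linalg.norm(s - s_hat))

# Hypothetical example: an estimate whose amplitude is 10% too small.
t = np.arange(0, 1.0, 1.0 / 256.0)
s = np.cos(2 * np.pi * 10.0 * t)
s_hat = 0.9 * s
value = qrf(s_hat, s)  # error norm is 0.1 * ||s||, i.e. about 20 dB
```

Because the error norm appears in the denominator, a perfect reconstruction gives an unbounded QRF, and every tenfold reduction of the error norm adds 20 dB.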

In this and the remaining sections, rather than showing the time-series plots of the decomposed components, we will show the T–F representation of the individual components obtained by using the Hilbert transform. This provides a more convenient and compact visual representation of the decomposed components.

Case study 1: narrow-band signal

The first input signal was created synthetically and consisted of 3 components. Each component was an AM–FM signal with narrow-band characteristics. This signal is a modified version of a signal employed in another study22, with a discontinuity in the T–F plane introduced in one of the components to increase the complexity of the signal. The 3 components of the composite signal are defined as \(s_{11} = (1 + 0.2\cos (t))\cos (30\pi (2t+0.3\cos (t)))\), \(s_{12} = (1 + 0.3\cos (2t)){\textrm{e}}^{-\frac{t}{15}}\cos (30\pi (2.4t+0.5t^{1.2}+0.3\sin (t)))\), and \(s_{13} = \cos (30\pi (5.3t+0.2t^{1.3}))\), which are added together to form the synthetic signal 1, \(s_1 = s_{11} + s_{12} + s_{13}\). The signal has a duration of 10 seconds with a sampling rate of 256 Hz. The ideal T–F representation of \(s_1\) along with each of its constituent modes is shown in Fig. 2a; notice the T–F discontinuity in mode #3.
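The three components defined above can be generated with the short sketch below. One caveat: the text does not give the location or length of the T–F discontinuity introduced in the third component, so it is left out here and the code reproduces only the closed-form expressions.

```python
import numpy as np

fs = 256.0                        # sampling rate in Hz
t = np.arange(0, 10.0, 1.0 / fs)  # 10 seconds of samples

# Closed-form components of synthetic signal 1.
s11 = (1 + 0.2 * np.cos(t)) * np.cos(30 * np.pi * (2 * t + 0.3 * np.cos(t)))
s12 = ((1 + 0.3 * np.cos(2 * t)) * np.exp(-t / 15)
       * np.cos(30 * np.pi * (2.4 * t + 0.5 * t**1.2 + 0.3 * np.sin(t))))
# Note: the T-F discontinuity described in the text is NOT applied to s13,
# because its position and duration are not specified.
s13 = np.cos(30 * np.pi * (5.3 * t + 0.2 * t**1.3))

s1 = s11 + s12 + s13
```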

The accuracy of the SD methods is evaluated on the synthetic signal (\(s_1\)) through the decomposed components and T–F plots of \(s_1\) along with the corresponding QRF values. The T–F plots obtained for each method and their respective QRF values are shown in Fig. 2b–f. The input signal is depicted in (a) with its three components shown on its right. The color bar shown to the right of the figure applies to both the composite signal and its 3 components. These results were obtained by manually tuning the algorithmic parameters to their optimal values for each method. The impact of changing the algorithmic parameters on the performance of different methods is discussed in section 5.

Figure 2

Performance evaluation of SD and T–F approaches on the narrow-band synthetic signal. Plot (a) shows the original input signal with each of its constituent components shown in the same row. The same layout is used for each method, EMD, VMD, VNCMD, SST and SSA, in plots (b–f), where the reconstructed signal is shown together with the respective extracted modes. QRF is calculated with respect to the corresponding original signal component and shown in the upper right corner of each T–F plot.

Figure 2 and the corresponding QRF values show that the EMD (b), SST (e) and SSA (f) are the worst performing methods for the narrow-band signal. For EMD, we clearly see mode-mixing within all of the extracted modes, resulting in a poor signal decomposition and hence lower values of QRF for all the 3 components. The discontinuity in the original component 3 results in mode 3 ‘closing’ this gap by taking signal content from mode 2. This indicates that the EMD does not perform well in the presence of a T–F discontinuity.

In the case of SST, we see mixed results across the different components. The first two components \(s_{11}\) and \(s_{12}\) are recovered very well by the SST method as illustrated by the high QRF values. The third component \(s_{13}\) was not recovered completely as a result of the ridge extraction being unable to handle the T–F discontinuity. However, it is important to note that these results were obtained by setting the number of extracted components \(K=3\). For \(K=4\), the SST method would obtain accurate T–F representation of the signal, with the fourth extracted mode containing the missing part of the \(s_{13}\).

SSA was performed with an embedding dimension of \(L = 110\), and the results were highly unstable. Small alterations in the embedding dimension resulted in widely different results, with \(L=110\) producing the best results. Similarly to EMD, we see mode mixing in the SSA decomposition. This, however, only occurs in the first 2 components because they are too close to each other in frequency. By shifting the first component down in the frequency domain, we observed improved results with the SSA. The component \(s_{13}\) was recovered well by the SSA method as illustrated by the high QRF value.

Figure 2d shows that the VNCMD performs especially well in terms of decomposing/reconstructing \(s_{11}\) and \(s_{12}\), delivering the best decomposition of these two components out of all the methods. This is highlighted by the corresponding QRF values. The method also reconstructed \(s_{13}\) fairly well with the corresponding \({\textrm{QRF}}=24.4\) dB. Despite its impressive performance, the VNCMD method comes with the caveat that the starting frequency of each component must be initialized close to the actual ‘ground truth’. For instance, in this experiment, the starting frequencies were chosen to be 30 Hz, 50 Hz and 85 Hz respectively to obtain these results. Choosing slightly different values for the starting center frequencies of the 3 modes resulted in sub-optimal results. Overall, we found the VNCMD to be highly sensitive to changes in the algorithmic parameters, as explained further in section 5.

Finally, VMD offered the best performance for the narrow band signal as depicted in Fig. 2c. The T–F plots and the corresponding QRF values for all 3 components confirm this observation. Moreover, the method was overall robust to the parameter changes e.g., the above results were unchanged despite varying the parameter \(\alpha\) in the range of 30–1000.

Case study 2: wide-band signal

The second test signal was a synthetic signal consisting of 2 components: a wide-band chirp signal within a frequency range of approximately 50–150 Hz and a narrow-band signal generated from a sinusoid modulated with a Gaussian-smoothed Brownian motion. This gives a signal which does not exhibit purely sinusoidal changes over time but instead varies randomly. The wide-band component is given by \(s_{21} = e^{0.8t} \cos (1.1\pi (0.8+50t-100t^2+416t^3-200t^4))\). The narrow-band signal is generated through Brownian motion with a drift rate \(\mu =-0.1\) and a volatility rate \(\sigma =0.1\). The resulting signal is smoothed with a Gaussian filter and then used to modulate a sinusoid, resulting in the component \(s_{22}\). The wide-band synthetic signal used in this study is generated by adding the two components, i.e., \(s_2 = s_{21} + s_{22}\). The T–F representation of \(s_{2}\) along with its two constituent modes in the same row is shown in Fig. 3a. This signal is challenging in two respects: i) due to its wide-band nature; ii) because of the closeness of its two components at the end of the signal. The signal has a duration of 1 second with a sampling rate of 512 Hz.
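A signal of this type can be sketched as follows. Only \(s_{21}\), the drift, the volatility, the duration and the sampling rate come from the description above. The carrier frequency, the Gaussian kernel width, the random seed and the choice to apply the smoothed Brownian path as a relative frequency modulation are all assumptions made for illustration, since the text does not specify them; the resulting \(s_{22}\) is therefore not the component used in this study.

```python
import numpy as np

def smoothed_brownian(n, dt, mu, sigma, kernel_std, rng):
    """Brownian motion with drift, smoothed by a Gaussian kernel.

    kernel_std is the kernel standard deviation in samples (assumed value).
    """
    steps = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    path = np.cumsum(steps)
    half = int(4 * kernel_std)
    k = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (k / kernel_std) ** 2)
    kernel /= kernel.sum()
    # Edge padding keeps the output the same length as the input.
    padded = np.pad(path, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

fs = 512.0
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Wide-band component, as defined in the text.
s21 = np.exp(0.8 * t) * np.cos(
    1.1 * np.pi * (0.8 + 50 * t - 100 * t**2 + 416 * t**3 - 200 * t**4))

# Narrow-band component: ASSUMED carrier and modulation scheme.
carrier_hz = 200.0  # hypothetical carrier frequency
b = smoothed_brownian(len(t), 1.0 / fs, mu=-0.1, sigma=0.1,
                      kernel_std=8.0, rng=rng)
inst_freq = carrier_hz * (1.0 + b)              # randomly wandering frequency
s22 = np.cos(2 * np.pi * np.cumsum(inst_freq) / fs)

s2 = s21 + s22
```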

Similar to the analysis of \(s_1\), we will evaluate the accuracy of the SD methods on \(s_2\) both qualitatively and quantitatively via the T–F plots of the decomposed components and the corresponding QRF measure respectively. In our experiments, various algorithmic parameters of different methods were optimized for \(s_2\). The T–F plots obtained for each method along with their respective QRF values are shown in Fig. 3b–f. The original signal is depicted in (a) with its respective components shown in the middle and right columns.

We start our analysis with the performance of the EMD algorithm: it performed poorly on \(s_2\) as reflected in both its T–F plots and the corresponding QRF values. In particular, the wide-band component \(s_{21}\) was split into two components with the narrow-band Brownian component being part of the second component. This was expected since by design EMD is suitable for the decomposition of signals with multiple narrow-band components. It is important to note that EMD introduced unwanted artifacts (around \(t=0.6 s\) in both modes) which were not part of the original signal.

Like EMD, the VMD also performed poorly on this wide-band signal. Again, the original wide-band component was decomposed in two separate modes by the VMD. Each decomposed mode therefore exhibits a low QRF value even though the overall reconstructed signal is close to the original. Unlike EMD though, there were no unwanted artefacts introduced by the VMD algorithm in its decomposed components, as shown in the Fig. 3c.

The SSA also produced sub-optimal results in terms of component decomposition or reconstruction, as depicted by the low QRF values for both the extracted components. While it appears that the T–F plots of the decomposed components are close to the original, closer inspection reveals that there is mixing of information in both modes.

The VNCMD performed very well on the wide-band signal but with the caveat that its performance was very unpredictable with respect to the initial conditions of the algorithm and its parameter settings. Initial frequency estimates for each decomposed component had to be close to the original ones for the method to work properly. Further, even small changes to the parameters and initial conditions considerably altered the final results. This highlights the serious convergence issues within the VNCMD method. Such limitations of the VNCMD method become pronounced in the presence of noise, as will be seen in section 4.

Finally, the SST produced overall good results as shown in Fig. 3e. Despite the lower QRF values as compared to VNCMD, the SST was considerably more stable to parameter changes. This is in contrast to the EMD and VMD methods which are by design applicable to the signals containing narrow-band components.

Figure 3

Performance evaluation of SD and T–F approaches on the wide-band synthetic signal. Plot (a) shows the original signal with each of its different modes separately. The same layout is used for each method, EMD, VMD, VNCMD, SST and SSA, in plots (b–f), where the reconstructed signal is shown together with the respective modes that are extracted. QRF is calculated with respect to the corresponding original signal component and shown in the upper right corner of each T–F plot.

Case study 3: electroencephalogram (EEG) signal

This case study examines the performance of different SD methods on real-life EEG data. EEG refers to a non-invasive technique that records the brain’s electrical activity over a period of time and is commonly used in clinical settings to diagnose epilepsy, sleep disorders and brain death. Here, we investigate human EEG data recorded in a resting state with eyes closed. During the rest state, there is pronounced EEG activity within the frequency range of 8–12 Hz, the so-called alpha-rhythm. On the other hand, opening the eyes is marked by a reduced alpha-rhythm in the EEG data. The data comprised EEG recordings from a single subject who remained in the relaxed state with his eyes closed for four seconds, as seen in Fig. 4a. The data was recorded at COMSATS University Islamabad and obtained through an OpenBCI Cyton board at a sampling rate of 250 Hz65.

Here, we applied different SD methods to the EEG data with an aim to extract the component corresponding to the alpha-rhythm. The smoothed spectra of the extracted components (named Imf1-Imf5 in the figure) obtained from EMD, VMD, VNCMD, SST and SSA methods are shown in Fig. 4b–f. In Fig. 4g, we show the plots of the extracted mode corresponding to the alpha-rhythm obtained from the different SD approaches.

Figure 4

Results of SD methods applied to real-life EEG data. The original time-series is shown in (a), while the spectra of the \(K=5\) decomposed components are shown for EMD (b), VMD (c), VNCMD (d), SST (e) and SSA (f). The spectra of the relevant modes (corresponding to the \(\alpha\) frequency range (8–12 Hz)) obtained from different methods are shown in (g).

The analysis of Fig. 4 suggests that all the methods were able to extract the component corresponding to the alpha-rhythm, albeit with varying degrees of accuracy. In the case of EMD, the relevant component, shown in orange in (b), demonstrates wide-band characteristics suggesting that the component contains artefacts in addition to the signal of interest, i.e., the alpha-rhythm. The spectra of the relevant components obtained from the VMD and the VNCMD methods, shown in green in (c) and (d) respectively, are relatively narrow-band and peak within the desired frequency range of 8–12 Hz, and are hence more accurate. In the case of SST, there appears to be significant overlap between the spectra of the adjacent components in addition to the spectrum of the relevant component being relatively wide-band. Both these observations suggest that the alpha-rhythm may not be extracted accurately by the SST. Finally, the best performance is delivered by the SSA method, as shown in (f). Not only are the extracted components separated from each other in the frequency domain but the relevant component (shown in orange) is narrow-band and centered within the desired frequency range of 8–12 Hz. The superiority of the SSA method is further confirmed by examining the plot in (g) that shows the spectra of the alpha-rhythm obtained from different methods. It is clear that the SSA produced the alpha-rhythm component with the narrowest spectrum among all the methods.

Robustness to noise

Noise is ubiquitous in most real-life signals, masking the desirable information content in the data. It is therefore imperative to either remove noise from data as a preprocessing step or to design methods that are inherently robust to noise. SD and T–F approaches are routinely used in applications involving noisy data sets. To this end, we will investigate the performance of different data-driven SD methods in the presence of noise.

We employ the QRF measure to assess the accuracy of signal decomposition methods under noisy inputs. The noisy input is generated by adding the white Gaussian noise (wGn) with varying powers, corresponding to a range of SNR = 24 dB – 3 dB, to \(s_1\) (narrow-band signal) and \(s_2\) (wide-band signal). We generated 50 realizations of wGn corresponding to each SNR value and obtained an ensemble of decomposed components from the relevant SD methods. Then, the QRF values were computed for those decomposed components (from multiple SD methods), for all 50 realizations of input noisy data. The mean QRF value along with its standard deviation is computed across all 50 realizations for each decomposed component obtained from multiple SD methods. Finally, the QRF values of all decomposed components from a particular SD method were summed to obtain a single QRF value corresponding to each method. We plot those values in Fig. 5a, b for \(s_1\) and \(s_2\) respectively, against a range of input SNR.
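The evaluation protocol described above can be sketched as a small Monte Carlo loop. The decomposition method is passed in as a callable; the `fft_band_split` function used in the example is a deliberately trivial stand-in (an ideal frequency split) and is not one of the SD methods assessed in this study. All function names, the seed and the two-tone test signal are assumptions made for illustration.

```python
import numpy as np

def add_white_gaussian_noise(x, snr_db, rng):
    """Add wGn whose power gives the requested input SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def qrf(s_hat, s):
    """Quality of reconstruction factor in dB, following Eq. (3)."""
    return 20.0 * np.log10(np.linalg.norm(s) / np.linalg.norm(s - s_hat))

def qrf_statistics(x, true_modes, decompose, snr_db, n_realizations=50, seed=0):
    """Mean and standard deviation of the QRF per component, plus the
    sum of the mean QRF values over all components."""
    rng = np.random.default_rng(seed)
    values = np.empty((n_realizations, len(true_modes)))
    for r in range(n_realizations):
        estimates = decompose(add_white_gaussian_noise(x, snr_db, rng))
        values[r] = [qrf(e, s) for e, s in zip(estimates, true_modes)]
    mean, std = values.mean(axis=0), values.std(axis=0)
    return mean, std, mean.sum()

# --- Hypothetical example -------------------------------------------------
fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)
true_modes = [np.cos(2 * np.pi * 10.0 * t), np.cos(2 * np.pi * 40.0 * t)]
x = true_modes[0] + true_modes[1]

def fft_band_split(y, edge_hz=25.0):
    """Stand-in decomposer: ideal low/high split at edge_hz (not an SD method)."""
    Y = np.fft.rfft(y)
    f = np.fft.rfftfreq(len(y), 1.0 / fs)
    low = np.fft.irfft(np.where(f < edge_hz, Y, 0.0), n=len(y))
    return [low, y - low]

mean, std, total = qrf_statistics(x, true_modes, fft_band_split, snr_db=12.0)
```

Repeating the call over a grid of `snr_db` values yields one curve per method, with `std` providing the error bars.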

For the noisy \(s_1\), VNCMD performed the best for the lower range of SNR values but exhibited large deviation in its performance as compared to all other techniques. This means that the performance of the VNCMD was erratic: performing very well on some noisy signals but significantly worse on others. VMD was found to be very robust to noise and demonstrated the lowest performance deviation across different noise realizations. SST also performed quite well across all the input SNR values. Note that the discontinuity in the T–F plane of \(s_1\) contributed to the low QRF values for SST as the method inherently is not designed to obtain components corresponding to discontinuous T–F ridges. By removing the discontinuity, we found that the SST performed on par with the VMD. The SST’s performance was also very stable across different noise realizations as depicted by its low standard deviation range. EMD and SSA performed considerably worse than the other methods as highlighted by the very low QRF values. This was somewhat expected as both these techniques also did not perform well in the absence of noise (see Figs. 2 and 3).

For the wide-band signal \(s_2\), the SST was the best performing method, not only in terms of higher QRF values but also the lower standard deviation of the QRF across all noise realizations. The VNCMD was again very unpredictable, showing very large performance deviations especially at high SNR values. This is in addition to the fact that the VNCMD requires initial component frequencies to be set very close to the ‘ground truth’ for any meaningful results. Not surprisingly, the performance of the EMD, VMD and SSA was poor owing to the wide-band nature of the input signal \(s_2\). With EMD and VMD, however, the performance variation across multiple noise realizations was minimal.

Figure 5

QRF versus SNR for synthetic signal 1 (a) and synthetic signal 2 (b), averaged over 50 different realizations of white Gaussian noise. The standard deviation is plotted as error bars for each method.

Overall, the SST was found to be the most robust technique against noisy data. This was expected since SST is the only SD approach that comes with theoretical guarantees regarding its robustness against noise22.

Sensitivity to changes in parameters

This section describes the dependence of the SD methods on their algorithmic parameters and how sensitive or robust they are to the changes in parameters.

The VMD method is dependent on the following parameters: (i) the number of components K to be extracted; (ii) \(\alpha\) that dictates the bandwidth of the extracted components; (iii) \(\tau\) that determines the signal reconstruction accuracy.

Choosing the right number of components K is important for the operation of VMD. Not doing so typically results in either mode-mixing or mode-splitting in the decomposed components. Several recent works address this issue: for instance, successive VMD70 obtains the components in an iterative fashion and therefore does not require the user-defined parameter K. Further, adaptive extensions of the VMD method71 have also recently surfaced.

The \(\alpha\) parameter also influences the result of the signal decomposition, but the range of its viable values is typically quite large. For instance, in our experiments, no discernible difference in the VMD output was found for the test signals \(s_1\) and \(s_2\) for \(\alpha\) in the range of 50 to 1000. For values above \(\alpha =2000\), the decomposition accuracy began to suffer for the two test signals. For the EEG signal, \(\alpha =2000\) produced the best output, as the narrow-band alpha-rhythm was the desired signal. It is noteworthy that the optimal value for \(\alpha\) will depend on the input signal and the application at hand. Higher values of \(\alpha\) reduce the bandwidth of the extracted components and may increase the risk of not capturing the correct center frequencies. On the other hand, lower values of \(\alpha\) tend to increase the bandwidth of the components and may introduce artefacts and mode-mixing in the output.
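The link between \(\alpha\) and bandwidth can be made concrete through the mode update of the original VMD formulation, in which each mode is obtained with a Wiener-like filter of gain \(1/(1+2\alpha (\omega -\omega _k)^2)\) around its center frequency \(\omega _k\). The sketch below evaluates where this gain drops to one half. It assumes frequencies normalized to cycles per sample; the sampling rate `fs` and the function names are hypothetical.

```python
import numpy as np

def vmd_filter_gain(freq_offset, alpha):
    """Gain of the VMD mode-update filter at a given offset from the center frequency."""
    return 1.0 / (1.0 + 2.0 * alpha * freq_offset ** 2)

def half_gain_offset(alpha):
    """Offset (in normalized frequency) at which the gain falls to 1/2."""
    return 1.0 / np.sqrt(2.0 * alpha)

fs = 1000.0  # hypothetical sampling rate in Hz, used only to express offsets in Hz
for alpha in (50, 1000, 2000):
    offset = half_gain_offset(alpha)
    print(f"alpha={alpha}: half-gain offset = {offset:.4f} cycles/sample "
          f"({offset * fs:.1f} Hz at fs={fs:.0f} Hz)")
```

Because the half-gain offset scales as \(1/\sqrt{2\alpha }\), a fortyfold increase in \(\alpha\) narrows the filter only by a factor of about six, which is consistent with a wide range of viable \(\alpha\) values.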

Lastly, setting \(\tau = 0\) is optimal for signals with noise as complete reconstruction is not needed in this case. For the non-noisy signals, setting \(\tau\) larger than zero but smaller than 1 yielded more accurate results.

Being an extension of the VMD method for wide-band signals, VNCMD shares some of the parameters of the VMD method, e.g., the number of components K and the bandwidth-related parameter. However, the method introduces several new user-defined parameters/initial conditions, proper tuning of which is critical to the operation of VNCMD. For example, the initial values for the center frequencies of the extracted components must be specified with care. In our experiments, for \(s_1\), the initial values were chosen to be 85 Hz, 50 Hz and 30 Hz for components 3, 2 and 1, respectively; changing any of those by just 5 Hz significantly deteriorated the results. As an example, changing the initial frequency of component 3 from 85 Hz to 90 Hz resulted in the corresponding QRF dropping from 24.4 to 4.4 dB.

Unlike VMD, the VNCMD was found to be sensitive to variations in the \(\alpha\) parameter that dictates the bandwidth. Further, the method needs tuning of another parameter, \(\mu\), that relates to the update of the instantaneous frequency (IF) within VNCMD. In our experiments, we found the VNCMD method to be very sensitive to changes in \(\mu\), so much so that in some cases the method even failed to converge. Overall, the VNCMD method suffers from serious convergence issues and, even in cases where it converges, it is hard to tune the algorithmic parameters to yield optimal results.

In SST, an important parameter is the number K of decomposed components. Further, since SST involves taking the wavelet (or short-time Fourier) transform of the signal, specifying the related parameters, e.g., the mother wavelet (e.g., Morlet), the number of scales (frequencies), and \(\mu\) that dictates the time-frequency trade-off, is an important step. Following the application of the time-scale (frequency) transform and squeezing of the resulting spectrum, the next step within SST is the curve extraction, which affects the signal decomposition results obtained via SST. The important parameters governing the curve extraction process include StartBand, which sets the initial width of the band around each center frequency, and MaxStepSize, which restricts the maximum amount of change in center frequency as the curve extraction goes forward in time. In our experiments, we obtained good results by choosing the above two parameters within the range of 5–30. For noise robustness, the SST utilizes a threshold parameter \(\gamma\); in our experiments involving noise robustness, we obtained optimal results by setting \(\gamma = 10^{-6}\).

The SSA was found to be very sensitive to parameter changes. Like VMD and SST, SSA also requires users to specify the number of decomposed components a priori. The embedding dimension L is another parameter that determines how well the components are separated. For the test signals used in our study, even small changes in L produced results of varying accuracy. We found that the range of \(L=30-150\) produced decent results, with higher values increasing the computation time but not necessarily the accuracy (QRF). The threshold parameter \(\epsilon\) that affects robustness against noise was set in the range of \(\epsilon = 10^{-5}\) to \(10^{-7}\) to provide good results.
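To show where the embedding dimension L enters, the following is a minimal sketch of basic (non-sliding) SSA: the signal is embedded into an L-row trajectory matrix, decomposed by SVD, and each rank-one term is mapped back to a series by anti-diagonal averaging. It is a simplified stand-in rather than the sliding SSA variant evaluated in this study; the function name and the test signal are hypothetical.

```python
import numpy as np

def basic_ssa(x, window_length, n_components):
    """Basic SSA: embedding, SVD and diagonal averaging of the leading components."""
    n = len(x)
    k = n - window_length + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window_length].
    trajectory = np.column_stack([x[j:j + window_length] for j in range(k)])
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    components = []
    for i in range(n_components):
        elementary = s[i] * np.outer(u[:, i], vt[i])
        # Entry (row, col) belongs to sample row + col, so averaging each
        # anti-diagonal maps the rank-one matrix back to a length-n series.
        flipped = elementary[::-1]
        series = [flipped.diagonal(d).mean() for d in range(-window_length + 1, k)]
        components.append(np.array(series))
    return np.array(components)

# Hypothetical test signal: two sinusoids, each spanning two singular components.
t = np.arange(400) / 400.0
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
components = basic_ssa(x, window_length=60, n_components=4)
reconstruction = components.sum(axis=0)
```

In this sketch, L (`window_length`) fixes the number of rows of the trajectory matrix and hence both the cost of the SVD and how cleanly oscillations fall into separate singular components. Grouping the elementary components into modes is a further user decision.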

The EMD method is advantageous in the sense that it does not require a priori specification of the number of decomposed components. An important consideration with the EMD method is the choice of the interpolation function for signal extrema. For that purpose, cubic spline interpolation is by far the most widely used technique within EMD. Another important factor in EMD operation is the stopping criterion for the sifting of decomposed components (also known as intrinsic mode functions or IMFs). We use a popular criterion, introduced in ref. 40, which uses two threshold values to stop the sifting process within EMD.
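A two-threshold stopping rule of this kind can be sketched as follows. The sketch assumes the commonly used form in which the ratio of the envelope mean to the envelope amplitude must stay below a small threshold for most of the signal and below a larger threshold everywhere. The function name, the default values of `theta1`, `theta2` and `tolerance`, and the toy envelopes are assumptions for illustration, not settings reported in this study.

```python
import numpy as np

def sifting_should_stop(upper_env, lower_env, theta1=0.05, theta2=0.5, tolerance=0.05):
    """Two-threshold test on the upper and lower envelopes of a candidate IMF."""
    mean_env = (upper_env + lower_env) / 2.0
    amplitude = (upper_env - lower_env) / 2.0
    # Evaluation function: local mean relative to local amplitude.
    sigma = np.abs(mean_env) / np.maximum(np.abs(amplitude), 1e-12)
    fraction_above_theta1 = np.mean(sigma > theta1)
    # Stop when sigma is small almost everywhere and never exceeds theta2.
    return bool(fraction_above_theta1 <= tolerance and np.all(sigma < theta2))

# Hypothetical envelopes: nearly symmetric about zero versus clearly offset.
t = np.linspace(0.0, 1.0, 500)
symmetric = sifting_should_stop(1.0 + 0.01 * np.sin(2 * np.pi * t),
                                -1.0 + 0.01 * np.sin(2 * np.pi * t))
offset = sifting_should_stop(np.full_like(t, 1.5), np.full_like(t, -0.5))
```

The first threshold enforces a globally small envelope mean, while the second tolerates locally larger excursions, which limits over-sifting.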

Multivariate signal decomposition

Multivariate signals comprise multiple data channels (time-series) that may be correlated with each other. While processing such signals, it is vital to consider the inherent correlation among multiple data channels. That requires representing and viewing a multivariate signal in the multidimensional space where it naturally resides. With this representation, the decomposition of multivariate signals amounts to the extraction of inherent rotational components (instead of oscillatory components for univariate signals) in multidimensional spaces. This principle forms the crux of two of the most popular classes of multivariate signal decomposition approaches, (bi-)multi-variate extensions of EMD (BEMD63, MEMD60) and VMD (MVMD)65, which will be evaluated in this section. The details of the operation of the two algorithms as well as the illustration of the multivariate rotational components obtained from those can be found in the relevant articles60,63,65.

In particular, we shall investigate and evaluate the mode-alignment property of BEMD and MVMD in this section. We shall also examine how the addition of noise in multivariate data affects the mode-alignment property. The mode-alignment refers to the matching of modes with similar frequency content across multiple channels in the same indexed components. This property is considered a prerequisite for a wide range of engineering applications involving non-stationary multivariate signals, e.g., data fusion17, denoising15 and biomedical signal classification11.

Mode-alignment

This section will explore the mode-alignment properties of multivariate extensions of EMD and VMD, and compare those against a univariate method (VMD) to highlight the superiority of the multivariate approaches. We will consider both synthetic and real-life biomedical signals to support our analysis.

Case study 1: multivariate synthetic signal

The first case study involved a synthetic bivariate (two-channel) signal containing a combination of two components in the first channel, a 2 Hz and a 50 Hz sinusoid, and a combination of three oscillatory components, comprising a 2 Hz, a 20 Hz and a 50 Hz sinusoid, in the second channel. The correct mode-alignment of the decomposed data should result in three decomposed components, with the first component containing the low-frequency 2 Hz sinusoid in both channels. The second decomposed component should contain a 20 Hz sinusoid in channel 2 with no signal in channel 1. Similarly, the third decomposed component should contain a 50 Hz sinusoid in both channels.
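The test signal and its ideally aligned decomposition can be written down explicitly, as in the sketch below. The sampling rate, duration, unit amplitudes and zero phases are assumptions chosen for illustration; they are not specified in the description above.

```python
import numpy as np

fs = 1000.0                      # assumed sampling rate in Hz
t = np.arange(0.0, 2.0, 1 / fs)  # assumed 2 s duration

def tone(frequency_hz):
    """Unit-amplitude sinusoid (amplitude and phase are assumptions)."""
    return np.cos(2 * np.pi * frequency_hz * t)

silence = np.zeros_like(t)

# Bivariate input: channel 1 has 2 Hz + 50 Hz, channel 2 has 2 Hz + 20 Hz + 50 Hz.
signal = np.stack([tone(2) + tone(50),
                   tone(2) + tone(20) + tone(50)])

# Ideal mode-aligned decomposition, indexed as (mode, channel, sample).
aligned_modes = np.stack([
    np.stack([tone(2), tone(2)]),     # mode 1: 2 Hz in both channels
    np.stack([silence, tone(20)]),    # mode 2: 20 Hz in channel 2 only
    np.stack([tone(50), tone(50)]),   # mode 3: 50 Hz in both channels
])
```

Summing `aligned_modes` over the mode axis recovers `signal`, so the array can serve as a reference against which an estimated decomposition is compared channel by channel.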

Figure 6a, b, c shows the signal decomposition results of the noisy multivariate synthetic signal at \(SNR=40dB\), obtained by applying the VMD (a), MVMD (b) and BEMD (c) methods. It is clear that the VMD could not properly align the sinusoids across different channels in components 2 and 3: specifically, the decomposed component in mode 2 contains two signals with different frequencies, i.e., 20 Hz and 50 Hz. On the other hand, the MVMD and the BEMD methods were both successful in decomposing the three modes with the correct sinusoids, showcasing their mode-alignment capability.

Figure 6

Signal decomposition of noisy multivariate synthetic signals, corresponding to the \(SNR=40dB\) (top row) and \(SNR=10dB\) (bottom row) with white Gaussian noise, obtained by applying the VMD (channel-wise), MVMD and BEMD algorithms.

We next increased the input noise strength in all channels of the multivariate signal to \(SNR=10dB\) and applied the three methods again to obtain the signal decomposition. The results are shown in Fig. 6d, e, f for VMD, MVMD and BEMD, respectively. It is apparent that at higher input noise levels, the performance of the BEMD deteriorates significantly, with no mode-alignment present in any of the three decomposed components. It was observed (although not shown here) that the BEMD failed to decompose multivariate signals properly even at an SNR level close to 30 dB. In contrast, MVMD was found to be robust to noise, delivering impressive results in terms of mode-alignment at \(SNR=10dB\), as shown in Fig. 6e. The decomposition results obtained by applying the VMD algorithm separately on multiple channels of the noisy multivariate synthetic signal are shown in Fig. 6d. It can be noticed that the mode-alignment is absent in this case, owing to the fact that the VMD does not operate in the multidimensional space where the signal resides.

Case study 2: Cardiotocographic data

The second case study for the qualitative evaluation of multivariate SD approaches in terms of mode-alignment involved bivariate cardiotocographic (CTG) data comprising Fetal Heart Rate (FHR) and Maternal Uterine Contraction (UC), as shown in Fig. 7. The recordings were taken from the freely available CTU-UHB intrapartum cardiotocography database by Physionet72.

Figure 8 shows the Fourier spectrum of the decomposed modes, extracted using the VMD (a), MVMD (b) and BEMD (c) methods. We set the total number of retrieved modes to \(K=8\). It can be noticed that the decomposed components from the VMD method (shown in the first column) are mostly free of artifacts related to mode-mixing, as each mode consists of a narrow-band signal. That said, VMD components show poor mode-alignment across channels in almost all the decomposed components, as exhibited by the misalignment of the spectra of the two channels in each mode.

The decomposed components obtained from MVMD, shown in Fig. 8b, exhibit accurate mode-alignment together with the narrow band signals – indicating minimal mode-mixing. Another highlight of the MVMD is that the multiple decomposed components are non-overlapping in the frequency domain. This is not the case with BEMD which obtains decomposed components having relatively large bandwidths with overlapping spectra. However, the mode-alignment across multiple channels is also apparent in the case of BEMD with the exception of mode 7 where spectra of the two channels are not well-aligned.
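The visual comparison of per-channel spectra can be complemented by a simple numerical proxy: the dominant frequency of each channel within each mode, with a mode flagged when the channels disagree by more than a tolerance. This is a hypothetical check introduced here for illustration, not a metric used in the experiments; the function names, the tolerance and the toy modes are assumptions.

```python
import numpy as np

def peak_frequencies(modes, fs):
    """Dominant frequency (Hz) per mode and channel; modes is (mode, channel, sample)."""
    spectra = np.abs(np.fft.rfft(modes, axis=-1))
    freqs = np.fft.rfftfreq(modes.shape[-1], d=1.0 / fs)
    return freqs[np.argmax(spectra, axis=-1)]

def misaligned_modes(modes, fs, tolerance_hz):
    """Indices of modes whose channels peak at frequencies further apart than the tolerance."""
    peaks = peak_frequencies(modes, fs)
    spread = peaks.max(axis=1) - peaks.min(axis=1)
    return np.flatnonzero(spread > tolerance_hz)

# Toy example: mode 0 is aligned (5 Hz in both channels), mode 1 is not (20 Hz vs 50 Hz).
fs = 200.0
t = np.arange(400) / fs
modes = np.array([
    [np.cos(2 * np.pi * 5 * t), np.cos(2 * np.pi * 5 * t)],
    [np.cos(2 * np.pi * 20 * t), np.cos(2 * np.pi * 50 * t)],
])
flagged = misaligned_modes(modes, fs, tolerance_hz=1.0)
```

The proxy is only meaningful when every channel carries appreciable energy in a mode; a channel that is close to silent in a given mode would need to be excluded before comparing peaks.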

Figure 7

Time plot of cardiotocographic (CTG) recording of Fetal Heart Rate (FHR) and Maternal Uterine Contraction (UC).

Figure 8

Demonstration of mode-alignment capability of multivariate SD approaches on real-life biomedical data. The mode-alignment is not present in the case of VMD (a) whereas the decomposition obtained via MVMD (b) and BEMD (c) shows proper mode-alignment, with MVMD showing better mode-separation than BEMD.

Sensitivity to changes in parameters

Like EMD, the most important parameters affecting the performance of (B)MEMD relate to the choice of the scheme used for extrema interpolation. We employed the widely used cubic spline interpolation in our experiments. For the decomposition of multivariate signals, (B)MEMD takes multiple uniform projections of the input signal in multidimensional space. The number of those projections is a user-defined parameter within (B)MEMD. Fortunately, we found that the (B)MEMD was rather insensitive to changes in that parameter: we used \(M=64\) projections in our experiments but found that any value of \(M\ge 8\) yielded similar results.
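For the bivariate case, the projection step can be sketched as follows: the two channels are treated as the real and imaginary parts of a complex signal, which is projected onto M directions spread uniformly over the unit circle. This is a simplified illustration under that assumption and covers only the projections, not the subsequent envelope estimation and sifting; the function name and test channels are hypothetical.

```python
import numpy as np

def uniform_projections(channel_x, channel_y, n_directions):
    """Project a bivariate signal onto n_directions uniformly spaced angles."""
    angles = 2 * np.pi * np.arange(n_directions) / n_directions
    z = channel_x + 1j * channel_y
    # Row m equals x*cos(angle_m) + y*sin(angle_m).
    projections = np.real(np.exp(-1j * angles)[:, None] * z[None, :])
    return angles, projections

# Hypothetical two-channel input.
t = np.arange(500) / 500.0
x = np.cos(2 * np.pi * 3 * t)
y = np.sin(2 * np.pi * 7 * t)
angles, projections = uniform_projections(x, y, n_directions=64)
```

Each row of `projections` is a univariate signal whose extrema can be located and interpolated, so M controls how finely the directions of the signal are sampled and scales the computational cost roughly linearly.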

The MVMD uses the same parameters as VMD, making the discussion in section 5 also relevant here. That said, in MVMD, we need to initialize the center frequencies \(\omega _c\) of all channels. In our experiments, we found that initializing those to zero worked well in all the experiments that were conducted in this study.

Table 1 Comparing the performance of the data-driven SD approaches in terms of task performed, accuracy, robustness to noise, robustness to changes in parameters, practical speed and data-driven nature.

Discussion

In this section, we comment on the overall strengths and weaknesses of the evaluated SD methods in light of the insights developed through the experiments conducted in this study. In particular, we will evaluate their performance in terms of accuracy in signal (mode) decomposition, robustness to noise, data-driven nature, sensitivity to parameter changes and their computational complexity. A summary is provided in Table 1, in light of our experimental observations regarding the performance of the most popular SD approaches. The table is self-explanatory; however, a few observations on the major drawbacks of the existing approaches are in order.

In terms of the tasks performed, most modern data-driven approaches accomplish both SD and T–F analysis, as highlighted in the table. However, some techniques are known to do better at a particular task: for instance, while SST is considered the state of the art in T–F analysis, it exhibits weakness in SD owing to the lack of a robust ridge extraction technique. EMD and VMD are powerful signal decomposition approaches, with T–F analysis capability coming from the application of the Hilbert transform to the decomposed components. VNCMD inherently obtains the signal decomposition as well as the T–F signatures of those components, while SSA has been mainly designed as a signal decomposition method.

Regarding the accuracy of SD and T–F analysis, SST and VNCMD were found to be the best in our experiments. This was mainly due to the ability of these approaches to cater for both narrow- and wide-band signals. The experiments in section 3 confirm our observation. For narrow-band signals, VMD was found to be superior to both EMD and SSA. It should also be noted that only SST comes with theoretical guarantees for accurate decomposition of signals in terms of AM–FM components. As with their univariate counterparts, MVMD is overall superior to MEMD in terms of avoiding the mode-mixing problem. Moreover, as illustrated in Fig. 8, MVMD better aligns similar frequency components across channels (mode-alignment) as compared to MEMD.

The SD methods considered in this work are all data-driven owing to the fact that they do not rely on the projection (or correlation) of input data along predefined fixed basis functions. Yet, the degree (or extent) of the data-driven nature of these methods varies significantly. In Table 1, the data-driven nature of the SD methods is measured through their dependence on prior information about input data in the form of user-defined parameters. In particular, EMD is a highly data-driven method that requires very few user-defined parameters for its operation. The EMD method is even able to find the number of signal modes, K, automatically. These useful attributes of the EMD algorithm are shared by its multivariate extension, MEMD. The VMD and VNCMD are only partially data-driven as they require prior information about the number of signal modes K as well as a rough idea of the bandwidth of the desired modes (needed to tune the \(\alpha\) parameter) for their operation. Similarly, SST relies on prior information about the number of modes K, the mother wavelet function and other parameters needed for the ridge-extraction procedure. To operate effectively, SSA also needs prior information about the optimal number of modes K along with an accurate estimate of the embedding dimension L of the input signal. Therefore, the SSA method can only be considered as partially data-driven at best.

In terms of noise robustness, SST and VMD performed comparatively well. Note that among all the methods that are considered here, SST is the only one that is complemented with theoretical guarantees for noise robustness22. While the VNCMD performed exceedingly well in some cases, we found that its performance was quite erratic on other noise realizations, so much so that in some cases the VNCMD even failed to converge. The performance of the EMD and SSA methods deteriorated even in the presence of small amounts of noise. The multivariate extensions of EMD and VMD inherit the noise robustness properties of their univariate counterparts, i.e., MVMD is more robust to noise than MEMD.

The VNCMD is highly sensitive to changes in its parameters and was found to be difficult to tune in all our experiments. Particularly, it required very accurate initial estimates of center frequencies to produce any meaningful results. The performance of VMD and EMD was generally stable to changes in their parameters. The EMD is the only method that did not require the number of components K to be defined a priori. The SST was also robust to changes in parameters but required proper tweaking of its ridge extraction scheme to deliver meaningful SD results. The SSA was found to be very sensitive to changes in its parameters. Finally, like their univariate counterparts, the multivariate extensions of EMD and VMD were found to be highly and moderately robust to parameter changes respectively.

While a detailed analysis of the computational complexity of the data-driven SD approaches is beyond the scope of this work, we comment on the complexity of the SD methods based on our experimental observations on the ensemble of \(s_1\) and \(s_2\). Only the time complexity was considered here and not the space/memory complexity. We found VMD to be the fastest of all the methods followed by EMD, SSA, VNCMD and SST. Among the multivariate extensions, the MVMD was found to be more computationally efficient than MEMD.

Based on these observations, which process should one follow to choose an appropriate SD method for a given input data and task? To answer this, it is important to know any prior information that we may have about the data, e.g., the number of inherent components K, whether those components are narrow- or wide-band and the expected noise in the data. To obtain this information, one could start with an exploratory data analysis of the input signal, e.g., by plotting its Fourier spectrum or spectrogram. Indeed, the number of peaks of the spectrum can provide a rough idea of the number of possible modes K of the data, whereas the spectrogram can give information about the presence of narrow- or wide-band signal modes. If prior information about the signal of interest is not known and cannot be adequately obtained from exploratory data analysis, highly data-driven approaches such as EMD could be a good first option. On the other hand, if some information about the signal of interest is available, partially data-driven approaches may be more appropriate: if K is known, VMD may yield more accurate results than EMD; additionally, for extracting wide-band signal modes, VNCMD and variants of SST46 should be used (instead of EMD, VMD and SSA). Finally, for noisy signals, SST could be a better option owing to its demonstrated practical and theoretical robustness against noise.
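The exploratory step of reading a rough K off the Fourier spectrum can be sketched as follows. The sketch assumes narrow-band modes; the function name, the Hann window, the relative height threshold and the test signal are hypothetical choices made for illustration.

```python
import numpy as np

def spectral_peak_frequencies(x, fs, relative_height=0.1):
    """Frequencies (Hz) of local maxima of the windowed magnitude spectrum.

    Only peaks above relative_height times the largest magnitude are kept.
    """
    windowed = x * np.hanning(len(x))  # the window reduces spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    interior = magnitude[1:-1]
    is_peak = ((interior > magnitude[:-2])
               & (interior >= magnitude[2:])
               & (interior > relative_height * magnitude.max()))
    return freqs[1:-1][is_peak]

# Hypothetical narrow-band test signal with three sinusoidal modes.
fs = 1000.0
t = np.arange(0.0, 2.0, 1 / fs)
x = sum(np.cos(2 * np.pi * f * t) for f in (2.0, 20.0, 50.0))
peaks = spectral_peak_frequencies(x, fs)
estimated_k = len(peaks)
```

For wide-band modes such as chirps, the energy is spread over many frequency bins and a peak count is no longer a reliable guide to K; a spectrogram is the more informative view in that case.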

Conclusion

A comparative analysis of modern data-driven signal decomposition and time-frequency methods has been performed. The methods include empirical mode decomposition (EMD), variational mode decomposition (VMD), variational nonlinear chirp mode decomposition (VNCMD), synchrosqueezed transform (SST) and sliding singular spectrum analysis (SSA). In addition, we have also compared the performance of two popular multivariate extensions of the data-driven signal decomposition methods, namely multivariate VMD (MVMD) and (bi-)multi-variate EMD (MEMD). The methods have been evaluated in terms of their accuracy in decomposing non-stationary signals into AM–FM like components, robustness to noise, robustness to changes in their parameters and alignment of similar frequency components along multiple channels of a multivariate signal (mode-alignment). This has been achieved via multiple experiments involving carefully designed synthetic signals (with narrow- and wide-band properties) and real-life biomedical signals. Our observations from those experiments have been summarized in Table 1, which shows that the SST performs best in terms of accuracy and robustness to noise, whereas VMD is superior with respect to robustness to parameter changes and speed while also being reasonably accurate and robust to noise. The performance of the SSA and EMD methods has been found to be relatively sub-optimal both in terms of accuracy and robustness to noise. Finally, it has been observed that VNCMD is quite erratic in its performance, performing well in some cases while not converging at all in others; in addition, it is highly difficult to tune its parameters to yield reasonable performance. Among multivariate approaches, we have observed MVMD to be superior to MEMD in terms of accuracy, noise robustness, mode-alignment and computational efficiency.

For future work in this area, the following avenues/directions could be considered: (i) no available signal decomposition technique is adequately robust to noise under simple conditions, calling for novel robust SD approaches; (ii) no existing SD technique provides complete guarantees for accuracy/correctness and robustness to noise under practical conditions; and (iii) all existing data-driven techniques are computationally expensive (when compared against wavelets and short-time Fourier transform), especially for medium- to large-sized (e.g., multivariate) data.