1 Introduction

Recent advances in DNA sensing technologies have enabled the collection of microarray genomic and metabolic data from various biological systems. These data are a valuable source of information about the biological systems they are collected from. For example, the availability of time-series genomic or metabolic data has made it possible to construct dynamic models that can not only be used to characterize the behavior of such biological systems and the interactions among different genes (Jong 2002; Chou et al. 2006; Gonzalez et al. 2007; Kutalik et al. 2007; Wang et al. 2009, 2010; Meskin et al. 2011; Zhou et al. 2012; Huang et al. 2009; Qiu and Plevritis 2011; Noor et al. 2012), but also to design intervention strategies for curing or managing major disease phenotypes (Ervadi-Radhakrishnan and Voit 2005; Meskin et al. 2011; Nounou et al. 2012; Meskin et al. 2012). Unfortunately, biological data are usually contaminated with measurement noise that can greatly degrade their usefulness (Hulse et al. 2012). For example, constructing a dynamic genetic regulatory network model from noisy time-series biological data degrades not only the accuracy of the estimated model, but also the effectiveness of any intervention technique developed based on that model (Kutalik et al. 2007). This means that measurement errors in biological data need to be filtered to reduce the noise content and enhance the usefulness of these data in practice. The applicability of genomic and metabolic data in the modeling of and intervention in biological phenomena is schematically illustrated in Fig. 1.

Fig. 1 A schematic diagram illustrating the applicability of metabolic data in practice

Generally, data filtering techniques can be classified into three main categories: filtering with a fundamental model, filtering with an empirical model, and filtering without a model. Model-based filtering techniques rely on minimizing the error between the measured and filtered data while requiring the filtered data to satisfy the available model. Methods in this category include Kalman filtering (Sorenson 1985), Moving Horizon Estimation, and particle filtering (PF) (Gustafsson et al. 2002; Arulampalam et al. 2002; Rawlings and Bakshi 2006). For nonlinear processes, different versions of the Kalman filter have been developed, which include the extended Kalman filter (EKF) (Simon 2006; Grewal and Andrews 2008; Julier and Uhlmann 1997; Ljung 1979; Kim and Park 1994) and the unscented Kalman filter (UKF) (Simon 2006; Grewal and Andrews 2008; Kim and Park 2000, 2001; Kim et al. 2007). In the EKF, the model describing the system is linearized at every time sample (to estimate the mean and covariance matrix of the state vector), and thus, the model is assumed to be differentiable. Unfortunately, for highly nonlinear or complex models, the EKF does not usually provide satisfactory performance. The UKF, on the other hand, instead of linearizing the model to approximate the mean and covariance matrix of the state vector, uses the unscented transformation to improve the approximation of these moments. In the unscented transformation, a set of samples (called sigma points) is selected and propagated through the nonlinear model, which provides more accurate approximations of the mean and covariance matrix of the state vector, and thus results in more accurate state estimation. Both the EKF and the UKF have been used in modeling biological systems and in designing intervention techniques that can be used in treating diseases (Meskin et al. 1979, 2011). The PF, on the other hand, is an implementation of a recursive Bayesian estimator (Gustafsson et al. 2002; Arulampalam et al. 2002), which relies on approximating the posterior (i.e., the density function of the unobserved state vector given a sequence of the observed data) by a set of random samples called particles. The advantage of the PF is that it is not restricted to linear and Gaussian processes, which makes it applicable in a wide range of applications. Of course, in all the model-based filtering techniques, the quality of the estimation depends on the accuracy of the models used.

A major challenge when dealing with biological systems is that models are usually not available a priori, because of the scarcity and limited quality of the available biological data. In the absence of a fundamental model and in the case of multivariate filtering, an empirical model that is extracted from the relationships between the measured variables can also be used in data filtering. Methods in this category include Principal Component Analysis (PCA) (Kramer and Mah 1994). Since accurate process models are usually not available, the most widely used filtering methods do not rely on fundamental or empirical models; instead, they rely on information about the nature of the errors or the smoothness of the underlying signal. Examples of model-free filters include the well-known low-pass filters, such as the Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters (Tham and Parr 1994). Examples of FIR and IIR filters include the mean filter (MF) and the exponentially weighted moving average (EWMA) filter, respectively. These filters are very popular as they are computationally efficient and can be implemented online. However, they are not very effective when used to filter data containing features with varying contributions over time and frequency, because low-pass filters define a frequency threshold above which all features are considered noise. Therefore, they may eliminate important features at frequencies higher than the threshold and retain noise components changing at frequencies lower than the threshold. An example of a high-frequency feature is a sharp change in the data, while an example of low-frequency noise is correlated (or colored) measurement errors. Thus, filtering practical biological data, which may contain features spanning wide ranges in time and frequency, requires multiscale filtering algorithms that account for this multiscale nature of the data.

Wavelet-based multiscale representation of data has been widely used in the analysis and investigation of various biological systems. For example, wavelets have been used to analyze genomic or DNA sequences to detect specific patterns (Arneodo et al. 1996, 1998; Dodin et al. 2000; Li 2003; Huang et al. 2008; Nguyen 2010). Also, wavelets have been applied to various aspects of protein structural investigations, including secondary and tertiary structure determination (Murray et al. 2002), refinement of X-ray crystallography (Main and Wilson 2000), and drug design and visualization (Mandell et al. 1998). Moreover, wavelets have been used in the analysis of microarray data and functional magnetic resonance imaging (fMRI) data (Dinov et al. 2005). Other applications of wavelets in biological systems include the analysis of 3-D biological shapes using conformal mapping and spherical wavelets. Wavelet-based filtering has also been used to denoise biological data: Prasad et al. (2008) developed a new thresholding filter for multiscale denoising of biological signals, and Üstündağ et al. (2012) used wavelet analysis and fuzzy thresholding in ECG data filtering.

In this work, various model-based as well as model-free filtering techniques will be compared when applied to filter time-series biological data representing dynamic metabolic measurements corresponding to four different genes. The model-based filtering techniques include the EKF, the UKF, and the PF. The model-free techniques include various low-pass filters as well as several multiscale filtering techniques. Examples of the low-pass filtering methods are MF and EWMA filtering. Examples of the multiscale filtering methods, on the other hand, are standard multiscale (SMS) filtering, online multiscale (OLMS) filtering, translation invariant (TI) filtering, and boundary corrected TI (BCTI) filtering.

The remainder of this paper is organized as follows. In Sect. 2, a statement of the model-based filtering problem is presented followed by descriptions of some of the state estimation techniques used to solve this filtering problem. Then, in Sect. 3, a brief description of some of the low-pass filtering techniques is presented, followed by a description of multiscale representation of data and some of the multiscale filtering methods. Then, in Sect. 4, the various model-based and model-free filtering methods are compared using simulated time-series metabolic data. Finally, concluding remarks are presented in Sect. 5.

2 Model-based filtering

In this section, a problem statement of model-based filtering of biological data is presented, followed by descriptions of some of the state estimation techniques that can be used to solve it.

2.1 Problem statement

Let the changes in the concentrations of various metabolites or proteins in a biological system be described by the following nonlinear state space model:

$$ \begin{aligned} \dot{x}&=g(x, u, \theta, w), \\ y&=l(x, u, \theta, v), \end{aligned} $$
(1)

where \({x\in \mathbb{R}^{n}}\) is a vector of the state variables (concentrations of certain metabolites inside the cell), \({u \in \mathbb{R}^p}\) is a vector of the input variables (any manipulated variables that can change the state variables), \({\theta \in \mathbb{R}^q}\) is a known parameter vector, \({y \in \mathbb{R}^m}\) is a vector of the measured variables (measured metabolite concentrations), \({w \in \mathbb{R}^{n}}\) and \({v \in \mathbb{R}^{m}}\) are the process and measurement noise vectors, respectively, and g and l are nonlinear differentiable functions that describe the changes in the state variables over time. Discretizing the state space model (1), the discrete model can be written as follows:

$$ \begin{aligned} x_k&={\mathfrak{F}}(x_{k-1}, u_{k-1}, \theta_{k-1}, w_{k-1}),\\ y_k&={\mathfrak{R}}(x_{k}, u_{k}, \theta_k, v_{k}), \end{aligned} $$
(2)

which quantifies the state variables at some time step (k) in terms of their values at the previous time step (k − 1). In Eq. (2), the two nonlinear functions \({\mathfrak{F}}\) and \({\mathfrak{R}}\) are the discrete-model equivalents of the functions g and l. Let the process and measurement noise vectors have the following properties: \(\mathbf{E}[w_k]=0, \mathbf{E}[w_k w_k^T]=\mathbf{Q}_k, \mathbf{E}[v_k]=0\) and \(\mathbf{E}[v_k v_k^T]=\mathbf{{R}}_k, \) where \(\mathbf{Q}_k\) and \(\mathbf{R}_k\) are the covariance matrices of the process and measurement noise vectors, respectively. The objective here is to estimate the state vector x k given the measurement vector y k . Some of the techniques that can be used to solve this state estimation (filtering) problem are presented next.

2.2 Description of state estimation techniques

In this section, the formulations as well as the algorithms of some of the state estimation techniques (which include EKF, UKF, and PF) will be presented.

  1.

    Extended Kalman filtering (EKF): As the name indicates, the EKF is an extension of Kalman filtering (KF) (Lee and Ricker 1994) in which the model is linearized to estimate the covariance matrix of the state vector. As in KF, the state vector x k is estimated by minimizing a weighted covariance matrix of the estimation error, i.e., \(\mathbf{E}[({x}_k-\widehat{x}_k)\mathbf{M}({x}_k-\widehat{x}_k)^T], \) where \(\mathbf{M}\) is a symmetric nonnegative definite weighting matrix. If all the states are equally important, \(\mathbf{M}\) can be taken as the identity matrix, which reduces the weighted covariance matrix to \(\mathbf{P}=\mathbf{E}[({x}_k-\widehat{x}_k)({x}_k-\widehat{x}_k)^T]. \) This minimization problem can be solved by minimizing the following objective function:

    $$ {\mathbf{J}}=\frac{1}{2}Tr\Bigl({\mathbf{E}}[({x}_k-\widehat{x}_k)({x}_k-\widehat{x}_k)^T]\Bigr), $$
    (3)

    subject to the model defined in Eq. (2). To minimize the objective function (3), the EKF estimates the state vector using a two-step algorithm consisting of prediction and estimation (or update) steps (refer to Algorithm 1). In Algorithm 1, \(\mathbf{P}_{{k}|{k-1}}\) is the predicted (a priori) state covariance matrix, \(\mathbf{K}_{{k}}\) is the near-optimal Kalman filter gain matrix, \(\mathbf{P}_{{k}|{k}}\) is the updated covariance matrix, and \(\widehat{x}_{{k}|{k}}\) is the updated state estimate. In addition, the state transition and observation matrices are defined as the following Jacobians of the linearized system model at every time step: \({\mathbf{A}_{k-1} \approx \frac{\partial \mathfrak{F}}{\partial x}|_{\widehat{x}_{{k-1}|{k-1}}}, \mathbf{C}_{k} \approx \frac{\partial \mathfrak{R}}{\partial x}|_{\widehat{x}_{{k}|{k-1}}}, \mathbf{G}_{k-1} \approx \frac{\partial \mathfrak{F}}{\partial w}|_{w_{k-1}}}\) and \({\mathbf{H}_{k} \approx \frac{\partial \mathfrak{R}}{\partial v}|_{v_{k}}. }\) A minimal numerical sketch of this prediction-update recursion is given at the end of this section.

  2.

    Unscented Kalman filtering: For highly nonlinear systems, the EKF may not provide satisfactory performance because it approximates the mean and covariance matrix of the state vector by linearizing the nonlinear model, which may not approximate these moments well. To enhance the estimation of the mean and covariance matrix of the state vector, the UKF, which relies on the unscented transformation, has been developed (Simon 2006; Grewal and Andrews 2008; Kim and Park 2000, 2001; Kim et al. 2007). The unscented transformation is a method for calculating the statistics of a random variable that undergoes a nonlinear mapping (Julier and Uhlmann 1997). Assume that a random variable \({x \in \mathbb{R}^{r}}\) with mean \(\overline{x}=\mathbf{E}[x]\) and covariance matrix \(\mathbf{P}_x\) is transformed by a nonlinear function, y = f(x). To find the statistics of y, 2r + 1 sigma vectors are defined as follows (Van Der Merwe et al. 2001):

    $$ \begin{aligned} &X_0=\overline{x} \\ &X_i=\overline{x}+(\sqrt{(r+\lambda){\mathbf{P}}_x})_i\quad i=1,...,r \\ &X_i=\overline{x}-(\sqrt{(r+\lambda){\mathbf{P}}_x})_i \quad i=r+1,...,2r \\ \end{aligned} $$
    (4)

    where λ = e^2(r + k) − r is a scaling parameter and \((\sqrt{(r+\lambda)\mathbf{P}_x})_{i}\) denotes the ith column of the matrix square root of \((r+\lambda)\mathbf{P}_x. \) The constant 10^{−4} < e < 1 determines the spread of the sigma points around \(\overline{x}, \) and the constant k is a secondary scaling parameter, which is usually set to zero or to 3 − r (Julier and Uhlmann 1997). Then, these sigma points are propagated through the nonlinear function, i.e.,

    $$ Y_i=f({X}_i)\quad i=0,...,2r $$
    (5)

    and the mean and covariance matrix of y can be approximated by the weighted sample mean and covariance of the transformed sigma points Y i as follows:

    $$ \begin{aligned} \overline{y} &\approx \sum_{i=0}^{2r}{W_i^{(m)}Y_i}, \\ \hbox {and } {\mathbf{P}}_y &\approx \sum_{i=0}^{2r}{W_i^{(c)}(Y_i-\overline{y})(Y_i-\overline{y})^T}, \\ \end{aligned} $$
    (6)

    where, the weights are given by:

    $$ \begin{aligned} &W_0^{(m)}= \frac{\lambda}{\lambda+r},\\ &W_0^{(c)}= \frac{\lambda}{\lambda+r}+(1-e^2+\xi), \\ \text {and}\; &W_i^{(m)}= W_i^{(c)}=\frac{1}{2(\lambda+r)},\quad i=1,...,2r. \end{aligned} $$
    (7)

    The parameter ξ is used to incorporate prior knowledge about the distribution of x. It has been shown that for Gaussian and non-Gaussian variables, the unscented transformation results in approximations that are accurate up to the third and second order, respectively (Wan and Van Der Merwe 2001). Based on this unscented transformation, the unscented Kalman filtering algorithm can be summarized as shown in Algorithm 2. In Algorithm 2, \(\widehat{x}_{0}\) is the initial estimate, \(\mathbf{P}_{{x}_{0}}\) is the initial covariance matrix, \(\widehat{x}_{k}^- \) is the predicted state, \(P_{{xk}}^{ - } \) is the predicted covariance matrix, X k|k−1 is the matrix of sigma points, Y k|k−1 is the matrix of sigma points projected through the observation function, \(\widehat{y}_{k}\) is the predicted output, P yk is the predicted output covariance matrix, P xk yk is the state-measurement cross-covariance matrix, K k is the UKF gain, \(\widehat{x}_{k}\) is the updated state vector, and P xk is the updated covariance matrix. A sketch of the unscented transformation is also given at the end of this section.

  3.

    Particle filtering: A PF is an implementation of a recursive Bayesian estimator (Gustafsson et al. 2002; Arulampalam et al. 2002). Bayesian estimation relies on computing the posterior p(x k |y 1:k ), which is the density function of the unobserved state vector, x k , given a sequence of the observed data \(y_{1:k} \equiv \{y_1,\; y_2,\;\cdots,\; y_k\}. \) However, instead of describing the required posterior distribution in a functional form, the PF represents it approximately as a set of random samples of the posterior distribution. These random samples, which are called the particles of the filter, are propagated and updated according to the dynamics and measurement models (Doucet and Johansen 2009; Arulampalam et al. 2002). The advantage of the PF is that it is not restricted by linearity and Gaussianity assumptions, which makes it applicable in a wide range of applications. The basic form of the PF is simple, but may be computationally expensive. Thus, the advent of cheap, powerful computers over the last decade has been key to the introduction and utilization of the PF in various applications.

For a given dynamical system describing the evolution of the states and parameters that we wish to estimate, the estimation problem can be viewed as an optimal filtering problem (Andrews et al. 2006), in which the posterior distribution, p(x k |y 1:k ), is recursively updated. Here, the dynamical system is characterized by a Markov state evolution model, p(x k | x 1:k−1) = p(x k | x k−1), and an observation model, p(y k |x k ). In a Bayesian context, the task of state estimation can be formulated as recursively calculating the predictive distribution p(x k |y 1:k−1) and the filtering distribution p(x k |y 1:k ) as follows,

$$ \begin{aligned} &p(x_{k}|y_{1:k-1}) = \int\limits_{{\mathbb{R}}^{n}} p(x_{k}|x_{k-1})p(x_{k-1}|y_{1:k-1}){\rm d}x_{k-1}, \\ &\text {and}, \quad p(x_{k}|y_{1:k})= \frac{p(y_k|x_{k})p(x_{k}|y_{1:k-1})}{p(y_k|y_{1:k-1})}, \\ \text{where}, \quad &p(y_k|y_{1:k-1}) = \int\limits_{{\mathbb{R}}^{n}} p(y_k|x_{k})p(x_{k}|y_{1:k-1}){\rm d}x_{k}. \\ \end{aligned} $$
(8)

The state vector x k is assumed to follow a Gaussian model, \(x_{k} \thicksim \mathcal{N}(\mu_{k},\lambda_{k}). \) Thus, the marginal state distribution is obtained by integrating over the mean and covariance matrix as follows,

$$ p(x_k | x_{k-1}) = \int {\mathcal{N}} (x_k | \mu_k, \lambda_k) p(\mu_k, \lambda_k | x_{k-1}) {\rm d} \mu_k {\rm d} \lambda_k , $$
(9)

where the integration with respect to the covariance matrix leads to the known class of scale mixture distributions introduced by Barndorff-Nielsen (1977) for the scalar case.

The nonlinear nature of the system model leads to intractable integrals when evaluating the marginal state distribution, p(x k |x k−1). Therefore, Monte Carlo approximation is utilized, where the joint posterior distribution, p(x 0:k |y 1:k ), is approximated by the point-mass distribution of a set of weighted samples (particles), \( \{x_{0:k}^{(i)}, \ell_{k}^{(i)}\}_{i=1}^{N} \), i.e.,

$$ \hat{p}_{N}(x_{0:k}|y_{1:k}) = \frac{\sum\nolimits_{i=1}^{N}\ell_{k}^{(i)}\,\delta_{x_{0:k}^{(i)}}({\rm d}x_{0:k})}{\sum\nolimits_{i=1}^{N}\ell_{k}^{(i)}}, $$
(10)

where \(\delta_{ x^{(i)}_{0:k}} (d x_{0:k})\) denotes the Dirac function, and N is the total number of particles. Based on the same set of particles, the marginal posterior probability of interest, p(x k |y 1:k ), can also be approximated as follows:

$$ \hat{p}_{N}(x_{k}|y_{1:k}) = \frac{\sum\nolimits_{i=1}^{N}\ell_{k}^{(i)}\,\delta_{x_{k}^{(i)}}({\rm d}x_{k})}{\sum\nolimits_{i=1}^{N}\ell_{k}^{(i)}}. $$
(11)

In this Bayesian importance sampling (IS) approach, the particles \( \{x_{0:k}^{(i)}\}_{i=1}^{N} \) are sampled from the following importance distribution,

$$ \pi( x_{0:k} | y_{1:k})= \int {\mathcal{N}} (x_k | \mu_k, \lambda_k) p(\mu_k, \lambda_k | x_{k-1}) {\rm d} \mu_k \,{\rm d} \lambda_k. $$
(12)

Then, the estimate of the state vector \(\widehat{x}_{k}\) can be approximated by a Monte Carlo scheme as follows:

$$ \widehat{x}_{k} = \sum_{i=1}^{N} \ell_{k}^{(i)}x_{k}^{(i)}, $$
(13)

where \( \ell _{k}^{{(i)}} \) are the corresponding importance weights, which are defined as follows:

$$ \ell^{(i)}_k \propto \frac{p(y_{1:k} | x^{(i)}_{0:k}) p( x^{(i)}_{0:k})}{\pi(x^{(i)}_{0:k} | y_{1:k})}. $$
(14)

A common problem with the sequential IS-based PF is the degeneracy phenomenon, where after a few iterations, all but one particle have negligible weights. It has been shown (Yang et al. 2005) that the variance of the importance weights can only increase over time, and thus, it is impossible to avoid the degeneracy phenomenon. This degeneracy implies that a large computational effort is devoted to updating particles whose contributions to the approximation of p(x k |y 1:k ) are almost zero. A suitable measure of degeneracy of the algorithm is the effective sample size N eff, which was introduced in (Gustafsson et al. 2002) and (Liu and Chen 1998), and is defined as,

$$ N_{\rm eff}= \frac{1}{\sum_{i=1}^N (\ell_k^{(i)})^2} $$
(15)

where \( \ell _{k}^{{(i)}} \) are the normalized weights obtained using (14). State estimation using PF can be summarized as shown in Algorithm 3. Minimal numerical sketches of the EKF recursion, the unscented transformation, and a bootstrap PF are given below.
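Since Algorithm 1 is referenced above but not reproduced here, the following minimal numpy sketch illustrates one EKF prediction-update cycle, assuming the common additive-noise special case of model (2), i.e., x_k = f(x_{k−1}) + w_{k−1} and y_k = h(x_k) + v_k. The handles f, h, F_jac, and H_jac are hypothetical user-supplied functions, not part of the original text.

```python
import numpy as np

def ekf_step(x_est, P, y, f, h, F_jac, H_jac, Q, R):
    """One EKF cycle for x_k = f(x_{k-1}) + w_{k-1}, y_k = h(x_k) + v_k."""
    # Prediction: propagate the estimate and its covariance using the
    # Jacobian A_{k-1} of f evaluated at the previous estimate
    x_pred = f(x_est)
    A = F_jac(x_est)
    P_pred = A @ P @ A.T + Q
    # Update: linearize the observation model at the predicted state and
    # correct the prediction with the measurement innovation
    C = H_jac(x_pred)
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # near-optimal Kalman gain
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x_new)) - K @ C) @ P_pred
    return x_new, P_new
```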
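The unscented transformation of Eqs. (4)-(7) is the core of Algorithm 2 and can be sketched as follows; the Cholesky factor serves as the matrix square root, and the parameters e, kappa, and xi mirror the constants e, k, and ξ above (their default values are illustrative assumptions).

```python
import numpy as np

def sigma_points(x_mean, P, e=1e-3, kappa=0.0):
    """Sigma points of Eq. (4) with lambda = e^2 (r + kappa) - r."""
    r = len(x_mean)
    lam = e**2 * (r + kappa) - r
    S = np.linalg.cholesky((r + lam) * P)   # columns give the matrix square root
    X = np.vstack([x_mean, x_mean + S.T, x_mean - S.T])  # rows are sigma points
    return X, lam

def unscented_transform(X, lam, f, e=1e-3, xi=2.0):
    """Propagate the sigma points through f (an r-vector to m-vector map)
    and form the weighted mean and covariance of Eqs. (5)-(7)."""
    r = (X.shape[0] - 1) // 2
    Y = np.array([f(x) for x in X])
    Wm = np.full(2 * r + 1, 1.0 / (2 * (r + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (r + lam)
    Wc[0] = lam / (r + lam) + (1.0 - e**2 + xi)
    y_mean = Wm @ Y
    dY = Y - y_mean
    P_y = (Wc[:, None] * dY).T @ dY
    return y_mean, P_y
```

In the full filter (Algorithm 2), this transform is applied to the state-transition and observation functions in turn, and the gain is formed as \(K_k = P_{x_k y_k} P_{y_k}^{-1}\).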
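A bootstrap implementation of the PF of Eqs. (10)-(15), with resampling triggered by the effective sample size, can be sketched as shown below. The handles f_sample (a draw from p(x k |x k−1)) and log_lik (the per-particle log-likelihood log p(y k |x k )) are hypothetical user-supplied functions, and the resampling threshold of N/2 is an assumed design choice.

```python
import numpy as np

def bootstrap_pf(y_seq, x0_particles, f_sample, log_lik, rng=np.random.default_rng()):
    """Bootstrap PF: propagate, weight, estimate (Eq. 13), and resample
    when N_eff (Eq. 15) drops below half the number of particles."""
    N = x0_particles.shape[0]
    particles = x0_particles.copy()
    weights = np.full(N, 1.0 / N)
    estimates = []
    for y in y_seq:
        # Propagate each particle through the state-transition density
        particles = np.array([f_sample(x) for x in particles])
        # Update the importance weights (Eq. 14) in the log domain for stability
        logw = np.log(weights) + log_lik(y, particles)
        logw -= logw.max()
        weights = np.exp(logw)
        weights /= weights.sum()
        estimates.append(weights @ particles)          # Eq. (13)
        if 1.0 / np.sum(weights**2) < 0.5 * N:         # Eq. (15)
            idx = rng.choice(N, size=N, p=weights)     # multinomial resampling
            particles, weights = particles[idx], np.full(N, 1.0 / N)
    return np.array(estimates)
```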

The state estimation techniques described earlier (EKF, UKF, and PF) utilize a model of the biological system to filter (or provide better estimates of) the measured metabolic or genomic data. When a model is not available, however, model-free filtering of these data can be used. Some of the univariate model-free data filtering techniques that can be used in this regard are described next.

3 Model-free filtering

In the absence of a model of the biological system, which is common due to the complexity of biological phenomena, measured biological data can be filtered using univariate filtering techniques that rely on information about the errors or assumptions about the nature of the noise-free data. Some of these filtering techniques include linear low-pass filtering and multiscale filtering, which are described next.

3.1 Linear data filtering

Linear filtering techniques filter the data by computing a weighted sum of previous measurements in a window of finite or infinite length. These techniques include FIR and IIR filters, which are popular because they are computationally efficient and can be implemented online (Tham and Parr 1994). A linear filter can be written as follows,

$$ \hat x_k = \sum\limits_{i = 0}^{N - 1} {b_i x_{k - i} } $$
(16)

where \(b_i\) is the ith filter coefficient with \(\sum_i b_i = 1\), N is the filter length, x k is the measured variable (such as a metabolite concentration), and \(\hat x_k\) is the filtered value of the variable. A well-known FIR filter is the MF, where \( b_i=\frac{1}{N}. \) An example of an IIR filter, on the other hand, is the EWMA filter, which can be expressed as follows,

$$ \hat x_k = \alpha x_k + \left( {1 - \alpha } \right)\hat x_{k - 1} $$
(17)

where α is a smoothing parameter between 0 and 1; a value of one corresponds to no filtering, and a value of zero corresponds to keeping only the first measured data point. A more detailed discussion of different types of linear filters is presented in (Strum and Kirk 1989). A minimal sketch of both filters follows.
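As an illustration, the MF of Eq. (16) with b_i = 1/N and the EWMA filter of Eq. (17) reduce to a few lines of numpy; the window length and smoothing parameter below are arbitrary example values.

```python
import numpy as np

def mean_filter(x, N=8):
    """Causal mean filter (FIR, Eq. 16 with b_i = 1/N); the first N-1
    points form a startup transient."""
    return np.convolve(x, np.ones(N) / N)[: len(x)]

def ewma_filter(x, alpha=0.3):
    """Exponentially weighted moving average filter (IIR, Eq. 17)."""
    xhat = np.empty(len(x))
    xhat[0] = x[0]
    for k in range(1, len(x)):
        xhat[k] = alpha * x[k] + (1.0 - alpha) * xhat[k - 1]
    return xhat
```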

In linear filtering, the basis functions representing raw measured data have a temporal localization equal to the sampling interval. This means that linear filters are single-scale in nature, since all the basis functions have the same fixed time–frequency localization. Consequently, these methods face a trade-off between accurate representation of temporally localized changes and efficient removal of temporally global noise. Therefore, simultaneous noise removal and accurate feature representation of the measured signals cannot be effectively achieved by single-scale filtering methods (Nounou and Bakshi 1999). Enhanced denoising can be achieved using multiscale filtering as will be described next.

3.2 Multiscale data filtering

In this section, several multiscale filtering methods are described. However, since multiscale filtering relies on multiscale representation of data using wavelets and scaling functions, a brief introduction to multiscale representation will be presented first.

(1) Multiscale representation of data: A signal can be represented at multiple resolutions by decomposing it using a family of wavelets and scaling functions. Consider the time-series measurements of the concentration of a particular metabolite in a biological system which are shown in Fig. 2a (see the description of the biological system in Sect. 4). The signals in Fig. 2b, d, f are at increasingly coarser scales compared to the original signal in Fig. 2a, and are called scaled signals. These scaled signals are determined by projecting the original signal on a set of orthonormal scaling functions of the form,

$$ \phi_{jk}(t)=\sqrt{2^{-j}}\phi(2^{-j}t-k) $$
(18)

or equivalently by filtering the signal using a low-pass filter of length r, \(h_{f}=\left[ {h_1 ,h_2 ,\ldots,h_r } \right], \) derived from the scaling functions. On the other hand, the signals in Fig. 2c, e, g, which are called the detail signals, capture the details between any scaled signal and the scaled signal at the finer scale. These detail signals are determined by projecting the original signal on a set of wavelet basis functions of the form,

$$ \psi_{jk}(t)=\sqrt{2^{-j}}\psi(2^{-j}t-k) $$
(19)

or equivalently by filtering the scaled signal at the finer scale using a high-pass filter of length r, \(g_{f}=[g_1, g_2, \ldots, g_r], \) derived from the wavelet basis functions. Therefore, the original signal can be represented as the sum of all detail signals at all scales and the scaled signal at the coarsest scale as follows,

$$ x(t)=\sum_{k=1}^{n2^{-J}}a_{Jk}\phi_{Jk}(t)+\sum_{j=1}^{J}\sum_{k=1}^{n2^{-j}}d_{jk}\psi_{jk}(t) $$
(20)

where j, k, J, and n are the dilation parameter, the translation parameter, the maximum number of scales (or decomposition depth), and the length of the original signal, respectively (Strang 1989; Daubechies 1988; Mallat 1989); a Jk is the kth scaling coefficient at the coarsest scale J, and d jk is the kth wavelet coefficient at scale j.

Fig. 2 A schematic diagram of data representation at multiple scales
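A decomposition of the kind sketched in Fig. 2 can be produced with the PyWavelets package; the synthetic signal, the 'db4' wavelet, and the depth J = 3 below are illustrative choices, not those of the original study.

```python
import numpy as np
import pywt

# Synthetic stand-in for a measured metabolite profile (dyadic length n = 2^7)
n = 128
t = np.arange(n)
x = np.sin(2 * np.pi * t / 64.0) + 0.1 * np.random.randn(n)

# Multiscale decomposition (Eq. 20): coeffs = [a_J, d_J, d_{J-1}, ..., d_1]
J = 3
coeffs = pywt.wavedec(x, "db4", level=J)
a_J, details = coeffs[0], coeffs[1:]

# The scaled signal at scale J plus all detail signals reconstruct x exactly
x_rec = pywt.waverec(coeffs, "db4")[:n]
assert np.allclose(x, x_rec)
```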

(2) Standard multiscale (SMS) filtering: Multiscale filtering using wavelets is based on the observation that random errors in a signal are present over all wavelet coefficients, while deterministic changes get captured in a small number of relatively large wavelet coefficients (Donoho and Johnstone 1994; Donoho et al. 1995; Bakshi 1999; Nounou and Bakshi 2000). Thus, stationary Gaussian noise may be removed by a three-step method (Donoho et al. 1995):

  1.

    Transform the noisy signal into the time–frequency domain by decomposing the signal onto a selected set of orthonormal wavelet and scaling basis functions.

  2.

    Threshold the wavelet coefficients by suppressing coefficients smaller than a selected value.

  3.

    Transform the thresholded coefficients back into the original time domain.

Donoho and coworkers have studied the statistical properties of wavelet thresholding and have shown that for a noisy signal of length n, the filtered signal will have an error within O(log n) of the error between the noise-free signal and the signal filtered with a priori knowledge about the smoothness of the underlying signal (Donoho and Johnstone 1994).

Selecting the proper value of the threshold is a critical step in this filtering process, and several methods have been devised. For good visual quality of the filtered signal, the Visushrink method determines the threshold as (Donoho and Johnstone 1994),

$$ t_j = \sigma _j \sqrt {2\log n} $$
(21)

where n is the signal length and σ j is the standard deviation of the errors at scale j, which can be estimated from the wavelet coefficients at that scale using the following relation,

$$ \sigma _j = \frac{1}{{0.6745}}{\rm median}\left\{ {\left| {d_{jk} } \right|} \right\}. $$
(22)

Other methods for determining the value of the threshold are described in (Nason 1996). A compact sketch of the SMS procedure follows.
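The three-step SMS procedure, combining the Visushrink threshold of Eq. (21) with the scale-dependent noise estimate of Eq. (22) and hard thresholding, can be sketched as follows (PyWavelets is assumed; the wavelet and depth are example choices).

```python
import numpy as np
import pywt

def sms_filter(x, wavelet="db4", level=3):
    """Standard multiscale filtering: decompose, threshold, reconstruct."""
    n = len(x)
    coeffs = pywt.wavedec(x, wavelet, level=level)
    out = [coeffs[0]]                              # keep the coarsest scaled signal
    for d in coeffs[1:]:                           # wavelet coefficients, scale by scale
        sigma = np.median(np.abs(d)) / 0.6745      # Eq. (22)
        t_j = sigma * np.sqrt(2.0 * np.log(n))     # Eq. (21)
        out.append(pywt.threshold(d, t_j, mode="hard"))
    return pywt.waverec(out, wavelet)[:n]
```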

(3) Translation invariant (TI) filtering: In SMS filtering, when the wavelet basis function used to represent a certain feature in the signal is not aligned with the feature itself, an artifact that is not present in the original signal can be created in the filtered signal. TI filtering is one approach, proposed in (Coifman and Donoho 1995), that deals with this problem by shifting the signal several times, filtering each shifted version, and then averaging all translations to suppress these artifacts and improve the smoothness of the filtered data (see Fig. 3; a cycle-spinning sketch follows the figure). This approach, however, suffers from end effects because TI filtering treats the signal as cyclic, which introduces artificial discontinuities when the two end points of the signal differ greatly in magnitude. This disadvantage can be overcome by the BCTI filtering method, which will be discussed later in this section.

Fig. 3 A schematic diagram of the translation mechanism used in TI filtering
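The cycle-spinning mechanism of Fig. 3 can be sketched as follows, reusing the sms_filter sketch above as the base filter; np.roll is what makes the signal cyclic, which is also the source of the end effects discussed above. The number of shifts is an example choice.

```python
import numpy as np

def ti_filter(x, base_filter, n_shifts=16):
    """TI filtering: shift, filter, unshift, and average the translations."""
    acc = np.zeros(len(x))
    for s in range(n_shifts):
        shifted = np.roll(x, s)                   # treats the signal as cyclic
        acc += np.roll(base_filter(shifted), -s)  # undo the shift before averaging
    return acc / n_shifts

# Example usage with the synthetic signal and SMS sketch above:
# x_ti = ti_filter(x, sms_filter)
```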

(4) OLMS filtering: The standard and TI multiscale filtering techniques described earlier are batch techniques, i.e., they require the entire data set (which has to be of dyadic length) a priori, and thus cannot be implemented online. When filtering time-series metabolic data, batch filtering is acceptable if the filtered data are needed, for example, to estimate the parameters of a genetic regulatory network. However, if the filtered data are used within an intervention framework, then the metabolic data need to be filtered online as they are measured. To deal with this problem, an OLMS filtering algorithm, which is based on multiscale filtering of data in a moving window of dyadic length, has been developed (Nounou and Bakshi 1999). The OLMS algorithm is summarized below (a minimal sketch follows the algorithm steps):

  1.

    Decompose the measured data within a window of dyadic length using a causal boundary corrected wavelet filter.

  2.

    Threshold the wavelet coefficients and reconstruct the filtered signal.

  3.

    Retain only the last data point of the reconstructed signal for online use.

  4.

    When new measured data are available, move the window in time to include the most recent measurement, while maintaining the maximum dyadic window length (as illustrated in Fig. 4).

    Fig. 4 A schematic diagram of the moving window mechanism used in OLMS and BCTI filtering

    Note from Fig. 4 that the moving window always seeks to keep the largest number of dyadic samples available, which would provide a more accurate estimation of the threshold value. Also, note that if a wavelet filter other than Haar is used in OLMS filtering, a boundary corrected version of the filter is needed to avoid inaccuracies at the boundaries. This is particularly important because OLMS only retains the last filtered data point from each window.
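A minimal sketch of this moving-window scheme, built on the sms_filter sketch above with the Haar wavelet (so that no boundary correction is needed), is given below; the maximum window length and decomposition depth are example choices.

```python
import numpy as np

def olms_filter(x, max_window=64, wavelet="haar"):
    """OLMS: filter the largest dyadic window ending at the current sample
    and retain only its last point for online use."""
    xhat = np.empty(len(x))
    for k in range(len(x)):
        m = 2 ** int(np.log2(min(k + 1, max_window)))  # largest dyadic length
        if m < 2:
            xhat[k] = x[k]                             # not enough samples yet
            continue
        window = x[k - m + 1 : k + 1]
        level = min(3, int(np.log2(m)))
        xhat[k] = sms_filter(window, wavelet=wavelet, level=level)[-1]
    return xhat
```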

(5) BCTI filtering: If all the filtered signals from all moving windows in OLMS (shown in Fig. 4) are averaged, a much smoother filtered signal is obtained. This is in fact similar to TI filtering, but since it does not treat the signal as cyclic, it overcomes the boundary effects encountered in TI, and thus is called BCTI. Another advantage is that BCTI averages a smaller number of data points when computing the final estimated signal (compare the mechanisms of TI and BCTI in Figs. 3 and 4, respectively), and thus is computationally less expensive than TI. The performances of all filtering techniques are illustrated and compared through a simulated case study.
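Before turning to the case study, the window averaging just described can be sketched as follows (same assumptions as the OLMS sketch above); because no window wraps around the signal ends, no cyclic end effects arise.

```python
import numpy as np

def bcti_filter(x, max_window=64, wavelet="haar"):
    """BCTI: average, at each time k, the estimates of x_k produced by
    every moving dyadic window that contains sample k."""
    n = len(x)
    acc, cnt = np.zeros(n), np.zeros(n)
    for k in range(n):
        m = 2 ** int(np.log2(min(k + 1, max_window)))
        if m < 2:
            acc[k] += x[k]
            cnt[k] += 1
            continue
        level = min(3, int(np.log2(m)))
        w = sms_filter(x[k - m + 1 : k + 1], wavelet=wavelet, level=level)
        acc[k - m + 1 : k + 1] += w                # accumulate the whole window
        cnt[k - m + 1 : k + 1] += 1
    return acc / cnt
```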

4 Case study: filtering of dynamic genomic data

The objective behind this case study is twofold: first, to highlight the applicability of data filtering to biological data; second, to compare the performances of the various model-based and model-free filtering algorithms discussed in this paper by applying them to simulated time-series metabolic data representing the concentrations of four metabolites that are related by the branched pathway shown in Fig. 5. These concentrations are the four dependent variables x 1, ..., x 4 in the following S-system (Voit and Almeida 2004), which also involves one independent variable x 5:

$$ \begin{aligned} \dot{x}_1&=0.2 x_3^{-0.8}x_5^{1}-0.1x_1^{{0.5}} \\ \dot{x}_2&=0.08 x_1^{0.5}-0.03x_2^{0.75} \\ \dot{x}_3&=0.03 x_2^{0.75}-0.05x_3^{0.5}x_4^{0.2} \\ \dot{x}_4&=0.02 x_1^{0.5}-0.06x_4^{0.8} \\ \end{aligned} $$
(23)
Fig. 5 Generic branched pathway with four dependent variables

The four differential equations in model (23) describe the changes in the concentrations of four different metabolites within the cell, and thus quantify the interactions among the genes responsible for producing these metabolites. We chose a biological phenomenon represented by an S-system model in this simulated example because the S-system has a canonical nonlinear model structure that can capture the behavior of a large class of biological systems, providing a good trade-off between accuracy and mathematical flexibility (Voit 1991; Gentilini 2005). The data used in this case study are generated by first discretizing the S-system (23) using a sampling time of 3.9 s, and then simulating the discretized model with the initial conditions x 1(0) = 5.6, x 2(0) = 3.1, x 3(0) = 2.9, and x 4(0) = 3.1, and with x 5 = 1. The simulated data, which are assumed to be noise-free, are then contaminated with additive zero-mean Gaussian noise, i.e., the noisy data are computed simply by adding the noise to the noise-free data. Two noise levels (variances), which correspond to signal-to-noise ratios (SNR) of 10 and 20, are used to provide a good comparison between the performances of the different filtering techniques. Here, the SNR is defined as the ratio of the variance of the noise-free data to the variance of the added noise. A sketch of this data-generation procedure is given below.
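The data generation just described can be reproduced along the following lines; a simple forward-Euler discretization with the stated sampling time is assumed here (the original discretization scheme and number of samples are not specified), and the noise standard deviation follows directly from the SNR definition above.

```python
import numpy as np

def s_system_rhs(x, x5=1.0):
    """Right-hand side of the S-system in Eq. (23)."""
    x1, x2, x3, x4 = x
    return np.array([
        0.2 * x3**-0.8 * x5 - 0.1 * x1**0.5,
        0.08 * x1**0.5 - 0.03 * x2**0.75,
        0.03 * x2**0.75 - 0.05 * x3**0.5 * x4**0.2,
        0.02 * x1**0.5 - 0.06 * x4**0.8,
    ])

# Forward-Euler simulation of the discretized model (assumed scheme and length)
dt, n_steps = 3.9, 256
x = np.array([5.6, 3.1, 2.9, 3.1])
X = np.empty((n_steps, 4))
for k in range(n_steps):
    X[k] = x
    x = x + dt * s_system_rhs(x)

# Contaminate the noise-free data with additive zero-mean Gaussian noise
snr = 10.0
noise_std = np.sqrt(X.var(axis=0) / snr)   # SNR = var(signal) / var(noise)
X_noisy = X + noise_std * np.random.randn(*X.shape)
```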

The model-based filtering techniques used in this case study include the EKF, the UKF, and the PF. The model-free filtering techniques, on the other hand, include MF, EWMA, OLMS, SMS, TI, and BCTI filtering. The filter parameters for all model-free filtering techniques are optimized using cross-validation (Nason 1996), where the data are split into two sets containing the odd and even samples. Each set is filtered separately, and the following cross-validation mean square error is minimized to determine the optimum filter parameters:

$$ \begin{aligned} {\rm CVMSE} &= \frac{1}{N/2} \sum\limits_{k = 1}^{N/2} \left\{ \left( \bar{x}_{{\rm odd},k} - x_{{\rm odd},k} \right)^2 \right. \\ & \quad \left. + \left( \bar{x}_{{\rm even},k} - x_{{\rm even},k} \right)^2 \right\} \end{aligned} $$
(24)

where \(\bar x_{{\rm odd},k}=\frac{1}{2}\left({\hat x_{{\rm even},k + 1}+\hat x_{{\rm even},k}}\right), \bar x_{{\rm even},k}= \frac{1}{2}\left({\hat x_{{\rm odd},k+1}+\hat x_{{\rm odd},k}}\right), \) and \({\hat x_{{\rm odd},k}}\) and \({\hat x_{{\rm even},k}}\) are the kth odd and even filtered data samples, respectively, and N is the total number of data samples. The performances of the various filtering techniques are compared by computing the mean square error with respect to the noise-free data, i.e., \({\rm MSE}=\frac{1}{N} \sum \nolimits_{k=1}^N {\left({\hat x_k-\tilde x_k} \right)^2},\) where \(\hat x_k\) and \(\tilde x_k\) are the kth filtered and noise-free data samples, respectively.
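The odd/even cross-validation of Eq. (24) can be sketched as follows; filter_fn stands for any of the model-free filters above, and the EWMA grid search in the usage comment is an illustrative assumption.

```python
import numpy as np

def cvmse(x, filter_fn):
    """Cross-validation MSE of Eq. (24): filter the odd and even
    subsequences separately and predict each raw subsample from the
    average of its two filtered neighbors in the other subsequence."""
    x_odd, x_even = x[0::2], x[1::2]
    f_odd, f_even = filter_fn(x_odd), filter_fn(x_even)
    xbar_odd = 0.5 * (f_even[1:] + f_even[:-1])
    xbar_even = 0.5 * (f_odd[1:] + f_odd[:-1])
    m = min(len(xbar_odd), len(xbar_even))
    return np.mean((xbar_odd[:m] - x_odd[:m]) ** 2
                   + (xbar_even[:m] - x_even[:m]) ** 2)

# Example: tune the EWMA smoothing parameter on one simulated metabolite
# best_alpha = min(np.linspace(0.05, 0.95, 19),
#                  key=lambda a: cvmse(X_noisy[:, 0], lambda s: ewma_filter(s, a)))
```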

To make statistically valid conclusions about the performances of all techniques, a Monte Carlo simulation using 1,000 realizations is performed. The results of the model-based filtering techniques are summarized in Table 1 and are illustrated in Fig. 6 for the case, where SNR = 10. The results of the model-free filtering techniques, on the other hand, are summarized in Table 2 and are illustrated in Figs. 7 and 8 also for the case, where SNR = 10. In all figures, x 1 is in blue, x 2 is in dark green, x 3 is in red, and x 4 is in light green.

Table 1 Comparison between the filtering mean square errors (MSE) for the model-based filtering techniques
Table 2 Comparison between the filtering mean square errors (MSE) for the various model-free (low-pass and multiscale) filtering techniques
Fig. 6 Comparison between the performances of various model-based filtering methods for the case where SNR = 10 (dashed lines noise-free, solid lines filtered)

Fig. 7 Comparison between the performances of various online model-free filtering methods for the case where SNR = 10 (dashed lines noise-free, solid lines filtered)

Fig. 8 Comparison between the performances of various batch multiscale filtering methods for the case where SNR = 10 (dashed lines noise-free, solid lines filtered)

Table 1 and Fig. 6 show that the UKF performs better than the EKF because the UKF provides more accurate estimates of the mean and covariance matrix of the state vector than the EKF, which estimates these moments using a linearized model. The results also show that the PF by far outperforms the other model-based filtering techniques because of its ability to handle nonlinear systems better than the EKF and UKF. These model-based filtering techniques can help within a model-based intervention framework, where a model is available. On the other hand, the results of the model-free filtering methods (summarized in Table 2) show that good noise removal can still be achieved without a model. The results show that OLMS filtering outperforms the conventional low-pass filters (MF and EWMA), all of which can be implemented online. This advantage can be helpful if the filtered data are utilized in model-free intervention, such as fuzzy intervention in biological systems (Nounou et al. 2012). The advantages of OLMS can be clearly seen in Fig. 7. If more time is available to filter the data, the batch multiscale filtering methods can perform even better. For example, the SMS technique provides a clear improvement over all online methods, including OLMS, as demonstrated in Fig. 8. Other batch multiscale filtering techniques, such as TI, can provide improved smoothness. However, due to the cycle-spinning of the data, TI introduces boundary effects, especially when the two ends of the data differ significantly in magnitude, as in the case of x 1 (in blue) and x 4 (in light green) in Fig. 8. This drawback of TI is remedied by BCTI, which provides similar smoothness without the boundary effects, and thus results in smaller mean square errors, as shown in Table 2. The advantages of these batch multiscale filtering techniques can be helpful if the filtered biological data are used to enhance the accuracy of biological models estimated from them.

5 Conclusion

Data filtering is important to enhance the quality and usefulness of noisy biological data, which can be used to model biological phenomena and to design intervention techniques for treating diseases. This paper presented a comparison between various model-based and model-free filtering techniques when used to denoise biological data contaminated with measurement errors. The model-based filtering techniques used in this work include the EKF, the UKF, and the PF. The model-free filtering techniques include various low-pass as well as multiscale filtering methods: examples of low-pass filters include the MF and the EWMA filter, while examples of multiscale filters include the OLMS, SMS, TI, and BCTI filtering techniques. The comparison is conducted using simulated time-series biological data contaminated by white noise; the simulated data physically represent the concentrations of four metabolites related by a dynamic model. The results of the comparative analysis clearly show an advantage of the model-based over the model-free filtering techniques due to the utilization of the model, and that the PF by far outperforms the Kalman filtering techniques, EKF and UKF, due to its ability to handle highly nonlinear systems. The results also show that good noise removal is still possible without a model, especially using multiscale filtering, which has a clear advantage over the conventional low-pass filters. It is demonstrated that if filtering has to be implemented online (such as in biological intervention), OLMS outperforms the online MF and EWMA filters. However, if the filtered data are to be used to model the biological system, where the biological data are available a priori, the batch multiscale filtering techniques (SMS, TI, and BCTI) can provide even better performance. TI and BCTI provide smoother filtered data than SMS because of the averaging they utilize, while BCTI gives more accurate filtering at the edges.