Introduction

The magnetotelluric (MT) method employs natural electromagnetic fields as its source and reaches prospecting depths of 600 km or more (Simpson and Bahr 2005; Garcia et al. 2015). It is a popular and indispensable method for deep mineral resource exploration and for probing the electrical conductivity structure of the Earth (Guo et al. 2019). However, natural MT signals are weak, non-stationary and broadband, and are therefore susceptible to cultural noise (Cai et al. 2009; Neukirch and Garcia 2014). As urbanization proceeds, the distribution of cultural noise becomes wider and wider, which severely limits the application of the MT method.

Methods to deal with MT noise mainly include the remote reference (RR) technique (Gamble et al. 1979), robust statistical estimation (Egbert 1986, 1997; Garcia and Jones 2008) and signal–noise separation in the time domain (Trad and Travassos 2000; Neukirch and Garcia 2014). With the development of urbanization, the construction of remote reference stations becomes more and more difficult. The remote reference technique often fails in noisy environments because it works only if the noise is incoherent between the local and remote reference stations. This assumption is increasingly violated by the large-scale construction of infrastructure (rail networks, pipelines, power systems, industrial and mining enterprises, etc.). Robust statistical estimation requires that most of the data be noise-free, whereas some types of noise are persistent and appear throughout the entire observation. In this case, the robust method may even lead to worse results (Escalas et al. 2013; Campanya et al. 2014; Larnier et al. 2016; Tang et al. 2018).

By removing noise in the time domain, time-series editing methods can directly and effectively improve the quality of MT data (Trad and Travassos 2000; Tang et al. 2013, 2018; Neukirch and Garcia 2014; Larnier et al. 2016). The most representative time-series editing method is the wavelet transform-based scheme. Mathematical morphological filtering, empirical mode decomposition and combinations of the two have also been used for MT time-series editing (Cai 2016; Li et al. 2017). However, these time-domain signal–noise separation methods risk losing effective signals, especially low-frequency signals (Li et al. 2020). In other words, whether the time-series segment being processed is clean or noisy, some components will be removed (Li et al. 2018).

To solve this problem, sparse representation was applied to MT signal processing (Tang et al. 2017; Li et al. 2017). By designing a redundant dictionary (also called an over-complete dictionary; see details later) that matches cultural noise but is insensitive to useful MT signals, cultural noise can be effectively removed while the useful signals are retained. However, the cultural noise in measured data is complex and diverse, and predesigned redundant dictionaries have obvious limitations. A simple redundant dictionary can deal effectively with only one or a few types of noise, while a complex redundant dictionary is time-consuming because the number of atoms (each column of the dictionary is an atom) becomes too large.

Olshausen and Field (1996) used self-learned dictionaries for sparse representations of natural images, and since then self-learned dictionaries have received widespread attention. Shift-invariant sparse coding (SISC) is a data-driven machine learning algorithm (Blumensath and Davies 2005, 2006). It learns feature structures (the so-called redundant dictionary) autonomously from a given sample set and uses convolution as a shift operator, so that one atom conveniently represents multiple features. In other words, SISC can acquire the regular patterns of the signal autonomously. SISC has been modified several times and successfully applied to heartbeat signal processing (Blumensath and Davies 2005), speech signal processing (Plumbley et al. 2006), and mechanical fault feature extraction (Liu et al. 2011; Zhu et al. 2016). In this paper, we extend the improved shift-invariant sparse coding (ISISC; Zhu et al. 2015; Wang et al. 2015) to noise attenuation of MT data and attempt to obtain better flexibility and efficiency by replacing the predesigned redundant dictionary with a self-learned one.

The rest of this paper is organized as follows. Section “Improved shift-invariant sparse coding” gives the theory of ISISC; Sect. “Synthetic case studies” presents the analysis of synthetic data; field experimental data and measured MT data are studied in Sect. “Real case studies”; the conclusions will be given in the final section.

Improved shift-invariant sparse coding

The shift-invariant sparse coding model

In the traditional model of sparse representation (Mallat and Zhang 1993), a signal or image is represented as a linear combination of the atoms of a redundant dictionary weighted by coefficients. Similar or identical feature structures at different locations in a time series therefore require multiple atoms to be represented. In addition, the redundant dictionary in traditional sparse representation is predefined manually, which is not flexible enough for complex signals (Jafari and Plumbley 2011; Chen 2017).

Shift-invariant sparse coding is a machine learning algorithm based on a data-driven framework. It employs convolution as a shift operator to obtain shift invariance and represents the signal as the convolution of the dictionary with the coefficients. This allows feature atoms to be translated, flipped, and scaled anywhere in the time series, so that one atom conveniently represents multiple features at different locations. Moreover, the redundant dictionary in SISC is learned adaptively from the raw time series. In other words, SISC learns the laws of the signal autonomously and is effective for all kinds of morphological features, which makes it highly suitable for processing complex time series.

For the discrete signal set \(Y = \left[ {y_{1} ,y_{2} , \ldots ,y_{K} } \right]^{T}\), shift-invariant sparse coding expresses each segment yk as the sum of convolutions of the atoms dm with the sparse coding coefficients sm,k:

$${\mathbf{y}}_{k} = \sum\limits_{m = 1}^{M} {{\mathbf{d}}_{m} * {\mathbf{s}}_{m,k} } + {\boldsymbol{\upvarepsilon}} ,$$
(1)

where \({\mathbf{y}}_{k} = \left[ {y_{1} ,y_{2} , \cdots ,y_{N} } \right]^{T}\) is a time-series segment with N sampling points. \(D = \left[ {d_{1} ,d_{2} , \cdots ,d_{M} } \right] \in R^{Q \times M}\) is the so-called redundant dictionary (or over-complete dictionary), in which dm is an atom of D and the number of atoms M is larger than N, in most cases much larger. In other words, the signal yk can be represented by the dictionary D in many ways, which is why D is called a redundant dictionary. \({\boldsymbol{\upvarepsilon}}\) stands for Gaussian white noise, and "*" denotes convolution. \({\mathbf{s}}_{m,k} \in R^{P}\) is sparse (most of its elements are zero). Q < N, P < N and Q + P − 1 = N, so that the full convolution has length N.
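The synthesis model of Eq. (1) can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' code: the atom count, lengths and random placeholder data are assumptions chosen only to show how the full convolution realizes Q + P − 1 = N.

```python
# Sketch of the SISC synthesis model of Eq. (1), NumPy only.
import numpy as np

rng = np.random.default_rng(0)
Q, P = 32, 97          # atom length and coefficient length
N = Q + P - 1          # full convolution length: Q + P - 1 = N
M = 4                  # number of atoms

D = rng.standard_normal((M, Q))
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms

S = np.zeros((M, P))                            # sparse codes: 5 nonzeros
S[rng.integers(0, M, 5), rng.integers(0, P, 5)] = rng.standard_normal(5)

# y_k = sum_m d_m * s_{m,k} + eps  (full convolution keeps length N)
y = sum(np.convolve(D[m], S[m], mode="full") for m in range(M))
y += 0.01 * rng.standard_normal(N)
assert y.shape == (N,)
```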

As shown in Eq. (1), both the atoms dm and the coefficients sm,k in the SISC model are unknown. The optimization problem is non-convex, and it is difficult to obtain a stable solution if dm and sm,k are solved for simultaneously. Many scholars therefore turn it into two convex subproblems and update the atoms dm and the coefficients sm,k alternately (Smith and Lewicki 2006; Plumbley et al. 2006; Aharon et al. 2006): when dm is fixed, sm,k can be obtained by convex optimization; correspondingly, when sm,k is fixed, dm can be solved by convex optimization. Sparsity is the common goal of the two subproblems, and the cost function evaluating the sparse representation of yk is (Liu et al. 2011; Wang et al. 2015):

$$\psi (\theta ) = \mathop {\hbox{min} }\limits_{{{\mathbf{d}},{\mathbf{s}}}} \sum\limits_{k = 1}^{K} {\left\| {{\mathbf{y}}_{k} - \sum\limits_{m = 1}^{M} {{\mathbf{d}}_{m} * {\mathbf{s}}_{m,k} } } \right\|}_{2}^{2} + \beta \cdot \sum\limits_{m,k} {\left\| {{\mathbf{s}}_{m,k} } \right\|_{1} } ,$$
(2)

where \(\left\| \cdot \right\|_{p}\) denotes the ℓp norm, and β is a constraint parameter used to balance reconstruction error and sparsity. The learned atoms dm usually need to be normalized, i.e., \(\left\| {d_{m} } \right\|_{2} = 1\).

Solving the sparse coding coefficients

Keeping the dictionary unchanged, the sparse representation coefficients can be obtained by the matching pursuit (MP) algorithm (Mallat and Zhang 1993) or the orthogonal matching pursuit (OMP) algorithm (Pati et al. 1993). OMP improves on MP by adding an orthogonalization step. We use OMP to solve for the coding coefficients because of its better convergence.

Assume yk is the signal to be represented and gi,u is the atom obtained by shifting the learned feature structure by u points; its length is the same as that of yk, and ||gi,u|| = 1. L denotes the iteration number, rL the residual after the Lth iteration, and ΨL the set of atoms selected after the Lth iteration. The steps of the OMP algorithm are as follows:

  1. Initialization: r0 = yk, Ψ0 = ∅, L = 1;

  2. Pursuit of the atom gi,u that satisfies the following equation:

     $$\left| {\left\langle {{\mathbf{r}}^{L} ,{\mathbf{g}}_{i,u}^{L} } \right\rangle } \right| = \mathop {\sup }\limits_{1 \le i \le M} \left( {\mathop {\sup }\limits_{0 \le u \le P} \left| {\left\langle {{\mathbf{r}}^{L} ,{\mathbf{g}}_{i,u}^{L} } \right\rangle } \right|} \right);$$
     (3)

  3. Update the selected set of atoms, \({\varvec{\Psi}}_{L} = {\varvec{\Psi}}_{L - 1} \cup \{ {\mathbf{g}}_{i,u}^{L} \}\);

  4. Calculate the projection coefficients by the least squares method, \({\mathbf{s}}_{L} = (\Psi_{L}^{T}\Psi_{L} )^{ - 1}\Psi_{L}^{T} {\mathbf{y}}_{k}\), and subsequently the reconstructed signal \(\hat{y}_{k} =\Psi_{L} {\mathbf{s}}_{L}\) and the residual \({\mathbf{r}}^{L} = {\mathbf{y}}_{k} - \hat{y}_{k}\);

  5. Set L = L + 1 and return to step 2 if L is smaller than the maximum number of iterations; otherwise, output the current reconstructed signal and the corresponding residual.
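As a concrete illustration of steps 1–5, the following NumPy sketch implements a brute-force version of the shift-invariant OMP. It is a minimal reading of the algorithm, not the authors' implementation: atoms are zero-padded and shifted explicitly, the best (atom, shift) pair is found by exhaustive search, and the stopping rule is a fixed iteration count.

```python
# Brute-force shift-invariant OMP sketch (NumPy only).
import numpy as np

def shift_atom(d, u, n):
    """Zero-pad atom d into a length-n vector starting at offset u; unit norm."""
    g = np.zeros(n)
    g[u:u + len(d)] = d
    return g / np.linalg.norm(g)

def omp_shift_invariant(y, atoms, n_iter):
    n = len(y)
    r = y.copy()
    Psi = []                                       # selected shifted atoms
    for _ in range(n_iter):
        # Step 2: pursue the (atom, shift) pair with largest |<r, g_{i,u}>|
        best, best_val = None, -1.0
        for d in atoms:
            for u in range(n - len(d) + 1):
                g = shift_atom(d, u, n)
                v = abs(np.dot(r, g))
                if v > best_val:
                    best, best_val = g, v
        Psi.append(best)                           # Step 3: update atom set
        A = np.column_stack(Psi)
        s, *_ = np.linalg.lstsq(A, y, rcond=None)  # Step 4: least squares
        y_hat = A @ s                              # reconstructed signal
        r = y - y_hat                              # residual
    return y_hat, r
```

For a signal built from two non-overlapping shifted copies of one atom, two iterations recover it exactly, since the selected atoms are orthogonal.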

Dictionary learning

Making full use of the idea of the K-SVD algorithm (Aharon et al. 2006), Wang et al. (2015) updated the atoms one by one rather than all at once. They termed the new method ISISC and showed that it is superior to gradient-based SISC (GSISC) in both accuracy and efficiency. In the dictionary learning stage, the atoms are updated while the coding coefficients stay unchanged. The optimization function can be simplified as:

$$\begin{aligned} \bar{\psi }\left( \theta \right) & = \mathop {\min }\limits_{d} \sum\limits_{k = 1}^{K} {\left\| {y_{k} - \sum\limits_{m = 1}^{M} {d_{m} *s_{m,k} } } \right\|_{2}^{2} } \\ & = \mathop {\min }\limits_{d} \sum\limits_{k = 1}^{K} {\left\| {\left( {y_{k} - \sum\limits_{m \ne i} {d_{m} *s_{m,k} } } \right) - d_{i} *s_{i,k} } \right\|_{2}^{2} } \\ & = \mathop {\min }\limits_{d} \sum\limits_{k = 1}^{K} {\left\| {E_{i,k} - d_{i} *s_{i,k} } \right\|_{2}^{2} } , \\ \end{aligned}$$
(4)

where Ei,k represents the recovery error of all the atoms except the ith with respect to the kth signal. The update of the ith atom can thus be translated into solving an equation for di. Since \({\mathbf{d}}_{i} * {\mathbf{s}}_{i,k} = {\mathbf{s}}_{i,k} * {\mathbf{d}}_{i}\), when only the kth signal is considered, the optimization of Eq. (4) is equivalent to solving the following equation (Zhu et al. 2016):

$$\left[ {\begin{array}{*{20}c} {s_{i,k}^{1} } & {} & {} & {} \\ {s_{i,k}^{2} } & {s_{i,k}^{1} } & {} & {} \\ \vdots & {s_{i,k}^{2} } & \ddots & {} \\ {s_{i,k}^{P} } & \vdots & \ddots & {s_{i,k}^{1} } \\ {} & {s_{i,k}^{P} } & \ddots & {s_{i,k}^{2} } \\ {} & {} & \ddots & \vdots \\ {} & {} & {} & {s_{i,k}^{P} } \\ \end{array} } \right] \cdot \left[ {\begin{array}{*{20}c} {d_{i}^{1} } \\ {d_{i}^{2} } \\ \vdots \\ {d_{i}^{Q} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {E_{i,k}^{1} } \\ {E_{i,k}^{2} } \\ \vdots \\ {E_{i,k}^{N} } \\ \end{array} } \right] .$$
(5)

Considering the matrix on the left side of Eq. (5) as a Toeplitz matrix built from the coefficients \({\mathbf{s}}_{i,k}\), the equation can be written as \({\text{Toep}}({\mathbf{s}}_{i,k} ) \cdot {\mathbf{d}}_{i} = {\mathbf{E}}_{i,k}\). Since \({\mathbf{s}}_{i,k}\) is sparse, many of the row vectors of \({\text{Toep}}({\mathbf{s}}_{i,k} )\) are zero vectors, and these have no effect on the result. Removing them from \({\text{Toep}}({\mathbf{s}}_{i,k} )\), together with the corresponding rows of Ei,k, Eq. (5) becomes \({\text{Toep}}(\tilde{s}_{i,k} ) \cdot {\mathbf{d}}_{i} = \widetilde{{\mathbf{E}}}_{i,k}\). When all K signals are taken into consideration, the optimization function can be expressed as:

$$\left[ {\begin{array}{*{20}c} {{\text{Toep}}(\tilde{s}_{{_{i,1} }} )} \\ {{\text{Toep}}(\tilde{s}_{{_{i,2} }} )} \\ \vdots \\ {{\text{Toep}}(\tilde{s}_{{_{i,K} }} )} \\ \end{array} } \right] \cdot {\mathbf{d}}_{i} = \left[ {\begin{array}{*{20}c} {\tilde{E}_{{_{i,1} }} } \\ {\tilde{E}_{{_{i,2} }} } \\ \vdots \\ {\tilde{E}_{{_{i,K} }} } \\ \end{array} } \right].$$
(6)

Simplifying Eq. (6) to Sdi = E, the feature atom follows from least squares as \({\mathbf{d}}_{i} = ({\mathbf{S}}^{T} {\mathbf{S}})^{ - 1} ({\mathbf{S}}^{T} {\mathbf{E}})\). Since \(({\mathbf{S}}^{T} {\mathbf{S}}) \in {\text{R}}^{Q \times Q}\) and, in most cases, Q ≪ N, this is a small-scale linear system whose solution can be obtained directly by Cholesky decomposition, LU decomposition or other methods. The atoms are updated in random order until all the feature structures, i.e., feature atoms, are obtained. The learned dictionary is obtained by normalizing all the feature atoms, i.e., \({\mathbf{d}}_{i} = {\mathbf{d}}_{i} /||{\mathbf{d}}_{i} ||_{2}\).
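The atom update described above can be sketched as follows (NumPy only). Names follow Eqs. (5)–(6): `toep` builds the convolution matrix Toep(s_{i,k}), zero rows are dropped, and the stacked small system is solved via the Q × Q normal equations. This is an illustrative sketch under the assumption that at least one of the codes s_{i,k} is nonzero.

```python
# ISISC-style update of one atom by stacked Toeplitz least squares.
import numpy as np

def toep(s, Q):
    """N x Q matrix with toep(s, Q) @ d == np.convolve(d, s) for len(d) == Q."""
    P = len(s)
    T = np.zeros((P + Q - 1, Q))
    for j in range(Q):
        T[j:j + P, j] = s
    return T

def update_atom(E_list, s_list, Q):
    """Update atom d_i from residuals E_{i,k} and its sparse codes s_{i,k}."""
    rows, rhs = [], []
    for E, s in zip(E_list, s_list):
        T = toep(s, Q)
        keep = np.any(T != 0, axis=1)       # drop the zero rows of Toep(s)
        rows.append(T[keep])
        rhs.append(E[keep])
    S = np.vstack(rows)
    E = np.concatenate(rhs)
    d = np.linalg.solve(S.T @ S, S.T @ E)   # small Q x Q normal equations
    return d / np.linalg.norm(d)            # normalize the learned atom
```

With exact data E = d * s, the update recovers the (unit-norm) atom exactly.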

The flow diagram of data processing

Here, we give the flow diagram of data processing for MT data:

Input: raw MT data Y.

Initialization: randomly initialize the dictionary D and the sparse representation coefficients S; z = 0.

Repeat the following:

{

z = z+1;

Solve for the sparse coding coefficients;

Update the dictionary;

}

Until z reaches the maximum number of iterations.

Output: the learned dictionary D, sparse coding coefficients S and the reconstructed signal \(\bar{Y}.\)

The reconstructed signals are the components with abnormally large amplitudes or obvious regularity. According to the characteristics of natural magnetotelluric signals, such components are noise. Therefore, the de-noised MT data are obtained by subtracting the reconstructed signal from the original signal.
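Putting the pieces together, the flow above can be sketched as an alternating loop, with the de-noised data obtained by subtracting the reconstruction from the raw data. This is a deliberately simplified single-atom stand-in, not the full ISISC: the coding step greedily places a fixed number of shifted copies of the atom instead of running full OMP, and the atom is initialized from the highest-energy data segment rather than at random.

```python
# Schematic single-atom version of the processing flow (NumPy only).
import numpy as np

def isisc_denoise(y, Q, n_outer=5, n_events=2):
    # initialize the atom from the highest-energy segment of the data
    energy = np.convolve(y ** 2, np.ones(Q), mode="valid")
    u0 = int(np.argmax(energy))
    d = y[u0:u0 + Q].copy()
    d /= np.linalg.norm(d)
    P = len(y) - Q + 1
    for _ in range(n_outer):
        # sparse coding: greedily place n_events shifted copies of d
        s = np.zeros(P)
        r = y.copy()
        for _ in range(n_events):
            c = np.correlate(r, d, mode="valid")   # <r, d shifted by u>
            u = int(np.argmax(np.abs(c)))
            s[u] += c[u]
            r[u:u + Q] -= c[u] * d
        # dictionary update: least squares for d given s (Toeplitz form)
        T = np.zeros((len(y), Q))
        for j in range(Q):
            T[j:j + P, j] = s
        d = np.linalg.lstsq(T, y, rcond=None)[0]
        d /= np.linalg.norm(d)
    recon = np.convolve(d, s, mode="full")         # reconstructed "noise"
    return y - recon, recon                        # de-noised data, noise
```

When the input consists only of shifted copies of one waveform, the reconstruction absorbs it entirely and the residual (the de-noised output) is essentially zero.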

Synthetic case studies

Square-wave noise and impulsive noise are two common types of noise in natural-source electromagnetic exploration. They usually have very large amplitudes and, in theory, an infinite number of harmonic components, thus affecting multiple frequency bands of the apparent resistivity and phase curves, which makes them very difficult to deal with. The pseudo-random square-wave signal contains square-wave structures of various widths. The charge–discharge-like waveform has both abrupt components similar to impulsive noise and stationary components similar to harmonics. In this section, pseudo-random square-wave noise and charge–discharge-like noise are used to verify the effectiveness and performance of different de-noising methods, since they can simulate the complex noise found in the real world.

Noisy MT data are obtained by adding simulated pseudo-random square-wave noise or charge–discharge-like noise to measured noise-free MT data. The signal-to-noise ratio (SNR), recovery error (E), normalized cross-correlation (NCC) and time consumption (T) are used to quantitatively evaluate the results of noise attenuation. SNR, E and NCC are defined as (Candès and Wakin 2008; Li et al. 2017; Zhang et al. 2019):

$${\text{SNR}} = 20\lg \frac{{\left\| {y(n)} \right\|_{2} }}{{\left\| {y(n) - r(n)} \right\|_{2} }} ,$$
(7)
$$E = \frac{{\left\| {y(n) - r(n)} \right\|_{2} }}{{\left\| {y(n)} \right\|_{2} }} ,$$
(8)
$${\text{NCC}} = \frac{{\sum\limits_{n = 1}^{N} {y(n) \cdot r(n)} }}{{\sqrt {\left( {\sum\limits_{n = 1}^{N} {y^{2} (n)} } \right) \cdot \left( {\sum\limits_{n = 1}^{N} {r^{2} (n)} } \right)} }},$$
(9)

where y(n) represents the original signal, r(n) the de-noised signal, and N the length of the signal. All data processing in this paper was completed on a ThinkPad W530 laptop (CPU: i7-3610, 2.3 GHz; RAM: 8.00 GB).
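The three metrics translate directly into code; a NumPy sketch of Eqs. (7)–(9):

```python
# Evaluation metrics of Eqs. (7)-(9): y is the original signal, r the
# de-noised signal.
import numpy as np

def snr(y, r):
    """Signal-to-noise ratio in dB, Eq. (7)."""
    return 20 * np.log10(np.linalg.norm(y) / np.linalg.norm(y - r))

def recovery_error(y, r):
    """Relative recovery error E, Eq. (8)."""
    return np.linalg.norm(y - r) / np.linalg.norm(y)

def ncc(y, r):
    """Normalized cross-correlation, Eq. (9)."""
    return np.sum(y * r) / np.sqrt(np.sum(y ** 2) * np.sum(r ** 2))
```

For example, a de-noised signal that is an exact copy scaled by 0.9 gives SNR = 20 dB, E = 0.1 and NCC = 1.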

Noise attenuation for square-wave noises

Figure 1a shows the original noise-free MT data, collected in the Qaidam Basin using an MTU (Phoenix Geophysics Ltd) with a sampling rate of 15 Hz. Figure 1b is the synthetic noisy data obtained by adding simulated pseudo-random square-wave noise to the noise-free MT data. Figure 1c is the signal de-noised by the predefined square-wave dictionary (SD) based method (Tang et al. 2017; Li et al. 2017), in which an improved OMP (IOMP) algorithm is used to improve efficiency; please refer to the related literature for details. Figure 1d is the signal de-noised by the new method proposed in this paper. Figure 1 demonstrates that both methods completely eliminate the noise. However, compared with the original spectrum, it is clear that the predefined dictionary-based method loses some of the low-frequency effective signal, whereas our new method yields no visible distortion.

Fig. 1
figure 1

Results of pseudo-random square-wave noise processing. a Original noise-free MT signal; b the synthetic noisy signal; c de-noised by square-wave dictionary-based method (IOMP-SD); d de-noised by ISISC. The left panels are time domain waveforms and the right panels are their spectra

As shown in Table 1, both the predefined dictionary and learned dictionary-based methods achieve good de-noising results; SNR, E and NCC are greatly improved over the noisy signal. However, the result obtained by our method has a higher signal-to-noise ratio, smaller recovery error and higher normalized cross-correlation. In particular, its time consumption is much lower than that of the predefined dictionary-based method.

Table 1 Quantitative evaluation of pseudo-random square-wave noise processing results

Noise attenuation for charge–discharge-like noises

Figure 2a is the original noise-free MT data, collected in the Qaidam Basin with a sampling rate of 150 Hz. Figure 2b is the noisy MT signal obtained by adding simulated charge–discharge-like noise to the noise-free data. Figure 2c is the signal de-noised by the predefined pulse dictionary (PD, containing charge–discharge-like atoms) based method (Wang et al. 2013), in which particle swarm optimization (PSO) is used to improve efficiency. Figure 2d is the signal de-noised by the ISISC-based method proposed in this paper. Both methods effectively remove the noise. However, damage to the low-frequency signal is easily seen in the spectrum obtained by the pulse dictionary-based method. In addition, the increase of the spectrum around 50 Hz indicates that the pulse dictionary-based method generates some new noise.

Fig. 2
figure 2

Results of the charge–discharge-like noise processing. a Original noise-free MT signal; b synthetic noisy signal; c de-noised by pulse dictionary-based method (PSO-PD); d de-noised by ISISC. The left panels are time domain waveforms and the right panels are their spectra

As shown in Table 2, the pulse dictionary-based method also obtains good NCC, E and SNR values because some atoms in its dictionary are very similar to the simulated charge–discharge-like noise. Nevertheless, its results are far from satisfactory because it takes 440.2 s. The ISISC-based method adaptively learns the dictionary from the noisy data and is therefore not limited by the type of noise. With ISISC, the NCC surged to 0.9827, the recovery error decreased from 57.0588 to 0.1849, the SNR increased from −35.1264 dB to 14.6597 dB, and the time consumption decreased from 440.2 s to 1.2 s. Obviously, the results achieved by the ISISC-based method are significantly better than those of the predefined dictionary-based method.

Table 2 Quantitative evaluation of the charge–discharge-like noise processing

Real case studies

Experimental data in Liangshan Prefecture, Sichuan Province

To verify the validity of the data-processing method and to evaluate the processing results reasonably, we conducted an MT observation experiment in Liangshan Prefecture, Sichuan Province, China, in 2014. The instruments were the magnetotelluric data acquisition system MTU-5A (Phoenix Geophysics Ltd) and the controlled-source electromagnetic method (CSEM) transmitter GGT-30 (Zonge International Inc.). The data obtained at this station are recorded as ASY0002B.

The experiment lasted 100 min. During the first 60 min, observation proceeded normally according to the standard procedure (the transmitter did not work); the data observed in this period are marked as D1. During the next 40 min, the transmitter continuously transmitted electric-dipole source signals at different frequencies. The maximum transmitting current was about 20 A. The electric dipole was 1.2 km long and parallel to the receiving dipole in the x-direction, and the transmitter was 1.2 km away from the observation station (the Rx–Tx distance was 1.2 km); the data observed in this period are marked as D2. Since the vicinity of the observation site was sparsely populated and there was almost no electromagnetic noise, the D1 data set is of high quality. The apparent resistivity and phase curves obtained from it can be regarded as the real geoelectric responses and serve as a reference for comparison. Note that the controlled-source signals in our experiment are treated as noise because they are not plane waves. The D2 data set is therefore noisy, being contaminated by strong controlled-source signals.

As shown in Table 3, the transmitter sent 7 square-wave signals of different frequencies in sequence. Due to the filters in the acquisition device and the low-pass filtering effect of the earth system, the received signals are not ideal square waves but resemble charge–discharge-like waveforms. The fundamental frequencies of the controlled-source signals range from 0.125 Hz to 1 Hz; however, both square-wave and charge–discharge-like signals carry a large number of odd harmonics, so the frequencies actually affected by the CSEM source are not limited to 0.125–1 Hz.
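The statement about odd harmonics is easy to verify numerically: for an ideal square wave sampled over an integer number of periods, all the energy above the fundamental sits at odd multiples of it, while the even-harmonic bins are numerically zero. The parameters below (sampling rate, fundamental) are chosen only for illustration.

```python
# FFT check: a square wave has only odd harmonics (NumPy only).
import numpy as np

fs, f0, n = 1000, 10, 1000            # sample rate, fundamental, samples
t = np.arange(n) / fs                 # exactly 10 full periods
# phase offset keeps the samples away from the zero crossings
x = np.sign(np.sin(2 * np.pi * f0 * t + 0.5))

spec = np.abs(np.fft.rfft(x)) / n
freqs = np.fft.rfftfreq(n, 1 / fs)

odd = spec[np.isin(freqs, [10, 30, 50])]    # odd harmonics of 10 Hz
even = spec[np.isin(freqs, [20, 40, 60])]   # even harmonics of 10 Hz
```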

Table 3 Fundamental frequencies and duration of the controlled-source signal

Figure 3 shows MT time-series segments from the D2 data set; the sampling rate is 150 Hz. Like the segments shown in Fig. 3, the entire D2 data set was contaminated by the CSEM source. These signals are morphologically standard CSEM noise and do not have the characteristics of natural magnetotelluric signals at all. Under such persistent noise pollution, it is difficult for traditional MT data-processing methods to obtain good results. Observing the time series, it can be seen that the noise (the CSEM source) is formed by shifting and flipping the same or similar structures, and thus has the shift-invariant characteristic. These structures are ideal for feature extraction by SISC.

Fig. 3
figure 3

Raw time-series segments from the data set D2 of the station ASY0002B, a sampling rate of 150 Hz. a Ex component; b Ey component; c Hx component; d Hy component

Figure 4 shows the feature structures that ISISC learned autonomously from the D2 data set. These feature structures are zero-padded to become atoms of the redundant dictionary, which can then accurately represent a particular type of signal. Since they are learned from the signal to be processed, they match the CSEM noise very well. As shown in Fig. 4, the learned feature structure is neither an ideal square-wave signal nor an ideal charge–discharge signal, but a complex signal that is difficult to describe; a regular zigzag waveform appears on its rising and falling edges. In fact, the sources of cultural noise in measured MT signals are complex and take different forms, and it is very difficult to design a predefined dictionary that exactly matches such complex and variable noise. The predefined dictionary-based method therefore inevitably reduces the decomposition accuracy or increases the time consumption.

Fig. 4
figure 4

Feature structures (atoms) learned from Ex (a), Ey (b), Hx (c) and Hy (d) components of the MT data set D2 (a sampling rate of 150 Hz) using ISISC

As shown in Fig. 5, the magnetic components (Hx and Hy) obtained by PSO-PD differ obviously between their first and second halves. The first halves of the Hx and Hy components may lose some useful signal, because natural MT signals usually contain useful spikes (refer to the noise-free signals shown in Figs. 1a, 2a). The electric signals (Ex and Ey) and the second halves of the magnetic signals indicate that some noise has not been removed by PSO-PD. The results obtained by ISISC are more reasonable: the noise is effectively removed and there is no obvious difference between the first and second halves.

Fig. 5
figure 5

De-noised time-series segments by ISISC (left) and PSO-PD (right). a Ex component; b Ey component; c Hx component; d Hy component

As shown in Fig. 6, the apparent resistivity and phase curves calculated from the noise-free MT data set D1 are continuous, smooth and vary slowly with frequency, except for some small bias below 1 Hz. All values are in the normal range. Overall, they show good data quality.

Fig. 6
figure 6

Apparent resistivity and phase curves of the station ASY0002B. The upper panels are apparent resistivity curves and the bottom panels are phase curves. The red curves with inverted triangles represent the results obtained by all raw data sets D1 and D2; the black solid lines represent the results obtained using noise-free data set D1; the green curves with triangles stand for the results achieved by data sets D1 and D2 but is de-noised by ISISC; the curves with blue circles stand for the results calculated from the data sets D1 and D2, but filtered by the pulse dictionary-based method

The apparent resistivity and phase curves obtained from the raw D1 and D2 data sets are severely distorted below 300 Hz because the D2 data set is contaminated by CSEM noise. There are a large number of outliers in both the resistivity and phase curves, and the phase values of the xy component are mostly distributed in an unreasonable range. This result differs significantly from the apparent resistivity and phase curves calculated from the noise-free data set.

The processing by PSO-PD yields an obvious improvement over the raw results. However, the MT response of the xy component still reveals obvious distortion around 100 Hz and below 1 Hz. Moreover, visible bias can be found in the yx component below 1 Hz.

By de-noising the D2 data set with our new method and calculating the MT responses from data sets D1 and D2, we obtain the curves shown as green triangles in Fig. 6. Most of the outliers disappear and the phase curves return to a reasonable range. The results obtained by our method are generally consistent with those obtained from the D1 data set alone, except for some small bias below 1 Hz.

Experimental data in Qaidam Basin, Qinghai Province

In June 2012, we conducted another experiment in the Qaidam Basin, in the northeastern part of the Qinghai-Tibet Plateau. The acquisition device was an MTU (Phoenix Geophysics Ltd), and the transmitting equipment was a pseudo-random signal transmitter developed by Central South University. A 1.5-km-long, y-oriented electric-dipole source was placed 2 km away from the receiving dipole. The maximum transmitting current was about 80 A, and the frequencies of the controlled-source signals ranged from 0.0117 Hz to 48 Hz. The observation station, named QH401504, is located in a very remote and sparsely populated area, so there is almost no human interference. The observation time was 20 h in total. The data collected in the first 1.5 h, marked as D1, are severely polluted by CSEM noise. The data collected in the following 18.5 h, marked as D2, are of high quality because the transmitter had stopped working during this time.

Figure 7 shows the atoms learned from data set D1 of site QH401504. The features of pseudo-random square-wave signals can easily be recognized in these atoms. In fact, each atom represents a periodic time-series segment. Each component contains three atoms, which means that there are three types of pseudo-random sources with different fundamental frequencies. Figure 8 shows the results of signal–noise separation by ISISC. Clearly, the raw time series shown in Fig. 8 can be represented by the atoms illustrated in Fig. 7. As shown in Fig. 8a, the raw data and the de-noised data have the same trend (a very low-frequency signal), which means that our new method has little risk of discarding the effective low-frequency signal. As shown in Fig. 8d, both the raw and de-noised signals have a spike around sampling point 220; this is evidence that our method can accurately identify the target noise while retaining useful signals. Similar evidence can be found near sampling point 12,000 in Fig. 8b. The processing of the MT data set QH401504 fully illustrates that our method can accurately identify and remove coherent and persistent noise.

Fig. 7
figure 7

Feature structures (atoms) learned from Ex (a), Ey (b), Hx (c) and Hy (d) components of the MT data set D1 (a sampling rate of 15 Hz) using ISISC

Fig. 8
figure 8

Raw (black) and de-noised (red) time-series segments from the data set D1 of the station QH401504, a sampling rate of 15 Hz. a Ex component. b Ey component. c Hx component. d Hy component

As shown in Fig. 9, the MT responses obtained from the high-quality data set D2 vary slowly with frequency, while the results obtained from the noisy D1 and noise-free D2 data sets together are severely distorted between 2 Hz and 0.002 Hz. The results obtained after applying our ISISC-based method are greatly improved over the whole band and are highly consistent with those obtained from the high-quality data.

Fig. 9
figure 9

Apparent resistivity and phase curves of the station QH401504. The upper panels are apparent resistivity curves and the bottom panels are phase curves. The red curves denote the results obtained by all the raw data sets D1 and D2; the solid lines represent the results obtained using noise-free data set D2; the green curves stand for the results achieved by data sets D1 and D2, but is de-noised by our new method

Real data in Lujiang-Zongyang ore district, Anhui Province

Because of the well-developed economy of the Lujiang-Zongyang ore district, the MT data collected in this region are more or less contaminated by cultural noise. There are many time series with abnormally large amplitudes or regular structures. According to the origin and characteristics of natural magnetotelluric signals, such anomalous and periodic structures are not effective MT signals and need to be removed. Nevertheless, this noise is coherent and persistent and is therefore difficult to eliminate effectively with traditional methods.

Figure 10 shows the atoms learned from the real data set EL22192. These structures have regular morphology and strong instantaneous energy, and are evidently generated by human activities.

Fig. 10

Feature structures (atoms) learned from Ex (a), Ey (b), Hx (c) and Hy (d) components of the MT data set EL22192 (a sampling rate of 150 Hz) using ISISC

As shown in Fig. 11, the raw time series is decomposed into seven independent components and a residual (the de-noised signal). The reconstructed components have different characteristics and may come from independent noise sources. After these reconstructed components are removed, the residual contains no regular components or abnormally large-amplitude structures. Judging from the time-series results, the ISISC-based method effectively removes strong cultural noises.

Fig. 11

Raw time-series segment (a), reconstructed components (b–h), and residual (i) of the Hx component from the real site EL22192. Each component is reconstructed from a learned atom. The residual is obtained by subtracting all reconstructed components from the raw signal
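The component-plus-residual decomposition of Fig. 11 can be sketched as follows: each component is an atom convolved with its sparse activation sequence, and the residual is the raw signal minus the sum of all components. The function name, signature, and demo data below are our own illustrative choices, not the authors' code.

```python
import numpy as np

def denoise_by_components(x, activations, atoms):
    """Rebuild each noise component as (sparse activation) convolved with
    (learned atom), then subtract all components from the raw signal,
    as in Fig. 11. Illustrative sketch only."""
    comps = [np.convolve(a, d)[:len(x)] for a, d in zip(activations, atoms)]
    residual = x - np.sum(comps, axis=0)
    return comps, residual

# Demo: one noise component riding on a slow linear trend
n = 40
trend = np.linspace(0.0, 1.0, n)          # stands in for the natural MT signal
act = np.zeros(n)
act[5] = 2.0                              # a single noise event at sample 5
atom = np.ones(3)                         # a crude stand-in for a learned atom
raw = trend + np.convolve(act, atom)[:n]
comps, residual = denoise_by_components(raw, [act], [atom])
```

Because the noise lives entirely in the reconstructed component, the residual recovers the slow trend, mirroring how the low-frequency MT signal survives the subtraction in Fig. 11i.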

As shown in Fig. 12, there are numerous outliers in the curves calculated from the raw data. When the frequency is lower than 40 Hz, the apparent resistivity varies linearly with frequency, rising at 45° on logarithmic coordinates, while the phase at the corresponding frequencies approaches 0 or ± 180°, similar to the CSAMT apparent resistivity and phase curves in the near-field. This is the so-called near-source effect (Wei and Pedersen 1991; Tang et al. 2013). It is clear that the data are contaminated by noises near the observation station, and these raw curves cannot accurately reflect the electrical structures of the subsurface.
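The near-source signature described above can be checked numerically: in the near field the apparent resistivity is proportional to period, so log10(ρa) versus log10(T) has slope +1, i.e. a 45° rise on log-log axes. The following diagnostic is a hypothetical helper of our own, not part of the authors' processing flow, shown on synthetic near-field data.

```python
import numpy as np

def log_slope(freq, rho_a):
    """Slope of log10(apparent resistivity) vs log10(period).
    A slope near +1, with phases near 0 or +/-180 deg, is the classic
    near-source signature. Illustrative diagnostic only."""
    log_T = np.log10(1.0 / np.asarray(freq, dtype=float))
    log_rho = np.log10(np.asarray(rho_a, dtype=float))
    slope, _ = np.polyfit(log_T, log_rho, 1)
    return slope

# Demo: synthetic near-field behaviour with rho_a proportional to 1/f
freq = np.logspace(-2, 1, 20)             # 0.01 Hz to 10 Hz
rho_near = 100.0 / freq                   # rises at 45 deg on log-log axes
slope = log_slope(freq, rho_near)
```

A fitted slope close to 1 over a band, as in the raw curves of Fig. 12 below 40 Hz, would flag that band as near-source contaminated.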

Fig. 12

Apparent resistivity and phase curves of the stations in the Lujiang-Zongyang ore district. The upper panels are apparent resistivity curves and the bottom panels are phase curves. The red curves with inverted triangles denote the results obtained using the raw noisy data; the black solid lines represent the results obtained by the remote reference method; the green curves with triangles stand for the results achieved by our new method

After processing with our method, the apparent resistivity and phase curves vary smoothly with frequency and the near-source effect is eliminated completely. The values of apparent resistivity and phase fall within a reasonable range. The results obtained by our method are consistent with those obtained by remote-reference processing, except for a few data points in frequency bands below 3 Hz. Case studies of MT data sets recorded in the Lujiang-Zongyang ore district show that our new scheme can effectively remove cultural noises and obtain results comparable to those from remote-reference processing.

Conclusions and discussion

To improve the quality of MT data, we propose a novel signal–noise separation method based on the improved shift-invariant sparse coding. The application of the proposed method to synthetic data and real MT data shows:

  1.

    The new scheme proposed in this paper can significantly improve the signal-to-noise ratio of MT data, and the MT responses obtained by our method are consistent with those from the remote reference method. The method can be used with or without a remote reference station and may provide superior results in two important cases: when there is no remote reference station, and when the noise is coherent between the local and remote reference stations.

  2.

    In our new scheme, the redundant dictionary is learned autonomously from the data to be processed rather than predefined manually. Compared with methods based on a predefined redundant dictionary, the adaptability of the newly proposed method is greatly enhanced.

  3.

    The number of atoms in the learned dictionary is greatly reduced since the dictionary is learned from the characteristics of the data itself. This greatly reduces the time consumed by the subsequent decomposition, so noise removal with our new procedure can be completed on a laptop or a desktop computer. Nevertheless, further improving the efficiency of our method is worthwhile, since the amount of data in the magnetotelluric method is very large.

The method proposed in this paper is based on single-site processing and does not take into account the regularity of cultural noise between different sites. In fact, cultural noises at different stations are likely to be correlated. It may be possible to improve the performance of the proposed method by using dictionary learning or other machine learning algorithms for multi-station processing. Besides, the new scheme proposed in this paper could be used together with remote-reference processing: ISISC may remove the high-amplitude, repetitive cultural noises, while RR processing of the cleaned time series may further remove faint incoherent noises.