Introduction

With the construction of the national digital detection network, the observation methods and technologies of the new Internet of Things have gradually been applied [1,2,3,4], basically realizing the digitization, networking and integration of seismic monitoring and exploration and greatly improving monitoring and exploration capability. The application of digital technology provides rich first-hand data for seismic scientific research and lays a good foundation for interpretation and prediction in seismic exploration. Noise attenuation is crucial for obtaining high-quality data from the seismic signals collected with geophone sensing equipment and networks [4]. However, these signals are usually badly interfered with and distorted by noise during seismic exploration, which has an important impact on oil and gas exploration.

In order to denoise the big data generated by the Internet of Things, many seismic denoising methods have been proposed. Among previously proposed methods [5,6,7,8,9,10,11,12,13], the most effective include dictionary learning schemes [12, 13] and sparse transform schemes [6, 7, 9,10,11]. Methods based on convolutional neural networks (CNNs) [14,15,16,17,18,19,20,21,22] show powerful learning ability and have been intensively applied in various fields, e.g., computer vision tasks including natural image super-resolution (SR) and fine-grained visual categorization [23, 24], and deep optimized MRI techniques [25] for clinical applications. Deep learning has even been used to solve datacenter-scale traffic optimization with outstanding performance. Aiming at high-precision seismic exploration, an appropriate CNN model for seismic signal denoising is established in the present work. Advanced deep learning can promote the development of geophysical technology and, vice versa, the task requirements of geophysics can accelerate the progress of artificial intelligence technology.

To remarkably enhance the signal-to-noise ratio (SNR) and adapt to high-precision seismic exploration, a novel transform-domain-based CNN architecture for denoising seismic data generated by the Internet of Things is presented in this paper. The primary contributions include:

  (i) To fully exploit features from seismic data, an MSRD block is proposed and used to build a network for restoring noisy seismic data.

  (ii) The denoising problem is formulated as predicting transform-domain coefficients, by which noise can be further removed with MSRDN while more detail is preserved compared with results obtained in the spatial domain.

  (iii) Our method is qualitatively and quantitatively evaluated on synthetic seismic records, the public SEG/EAGE salt and overthrust seismic model and real field seismic data. The proposed method obtains signals with higher PSNRs and preserves far more useful data than other leading-edge methods, and its denoising advantage is more pronounced at higher noise levels.

The remainder of the manuscript is structured as follows: Section 2 presents a brief review of related studies. Section 3 proposes the novel scheme, which is validated in Section 4. Lastly, Section 5 summarizes the work.

Related work

For seismic exploration data from the Internet of Things, many seismic denoising methods have been developed so far. A seismic denoising method suppressing random noise was proposed by Canales and achieved promising results [5]. Since then, several effective schemes for suppressing random noise have been proposed, such as sparse transform based methods, dictionary learning based procedures, and the nonlocal means algorithm [8]. Among them, sparse representation of seismic signals has become popular. Conventionally, almost all such denoising is performed in a transform domain [6, 7, 9,10,11]. For learning based methods [12, 13], an overcomplete dictionary, generally written as an explicit matrix, can be inferred from a series of examples; the matrix must be trained before it is adapted to new examples.

Recently, with the continuous expansion of Internet of Things and cloud platform data, data processing requirements have changed. Deep learning (DL) has become very attractive and demonstrates excellent ability in many areas, such as multimedia and machine learning, since it overcomes the shortcomings of common learning based schemes by leveraging the convolutional neural network (CNN) architecture. Since the CNN-based SRCNN was first proposed in [14] for low-level vision, many methods [14,15,16,17,18,19,20,21,22] with different network depths, e.g., the much deeper RCAN [22], have been developed. Network depth has a positive impact on SR performance, as evidenced by the growth in the number of convolutional layers from 3 to over 400. In some recent CNN-based SR models, several identical feature extraction modules are connected to build the entire network [19,20,21,22], and each module is essential in this structure. These CNN-based methods are clearly superior to traditional methods. Diversified SR network structures have also been proposed. Recently, attention-based SR network models have been introduced, which use an addition operation in the output layer to avoid the large amount of computation consumed by convolution kernel multiplication, so as to complete image SR efficiently.

These SR methods operate in the spatial domain, but SR in a transform domain can better preserve the context and texture information of an image across layers. Accordingly, a deep wavelet SR network was proposed by Guo et al. [26] for acquiring HR images, in which the "missing details" of the wavelet coefficients of LR images are predicted. Then, an orthogonally regularized deep network was suggested by the same team [27], in which the discrete cosine transform (DCT) is integrated into CNNs. Besides, a face SR method was constructed on the basis of wavelet transforms and CNNs to capture the local textural details and global topology information of faces [28].

Therefore, a novel transform-domain-based MSRDN architecture for seismic signal denoising is proposed; the method is detailed in Section 3.

Proposed method

In this paper, we propose and build an MSRD block to fully exploit features from seismic data for restoring noisy seismic data. Meanwhile, the denoising problem is formulated as predicting transform-domain coefficients, by which noise can be further removed with MSRDN. First, the proposed method for seismic data denoising is outlined. Then, the architectures of the proposed MSRDN and the MSRD block are briefly described. Lastly, the transform domain is introduced.

Overview

Figure 1(a) presents a flowchart of our method. The noisy seismic data is first upsampled to the target high-resolution size to produce resized noisy seismic data, from which one low-frequency (LF) sub-band and several high-frequency (HF) sub-bands are obtained by a specific transform. To predict the transform coefficients of the target clear seismic data, two deep residual networks are applied on top of the one LF sub-band and the four HF sub-bands, to preserve the global topology information and to collect the structure and texture information, respectively. Lastly, the target clear seismic data is obtained via the inverse transform.

Fig. 1

SeisDeNet architecture. a Transform domain coefficients prediction with MSRD network; b The architecture of MSRD network
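The decompose/predict/reconstruct pipeline of Fig. 1(a) can be sketched with a plain one-level 2-D Haar transform. Note this is only an illustrative stand-in: the paper's NSST produces one LF plus four directional HF sub-bands, while a separable Haar transform yields one LF (LL) and three HF (LH, HL, HH) sub-bands; the decompose-then-inverse flow is the same.

```python
import numpy as np

def haar2d(x):
    """One-level 2-D Haar transform: one LF sub-band (LL) and three
    HF sub-bands (LH, HL, HH). Illustrative stand-in for the NSST
    used in the paper (which yields four directional HF sub-bands)."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Inverse of haar2d (perfect reconstruction)."""
    a = np.zeros((LL.shape[0], LL.shape[1] * 2))
    d = np.zeros_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    x = np.zeros((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x
```

In the full method, the two residual networks of Fig. 1(a) would predict the clean-data counterparts of these sub-bands before the inverse transform is applied.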

MSRD network structure for seismic denoising

We aim to restore clear seismic data IClear from noisy seismic data INoisy. As shown in Fig. 1(b), our MSRDN has two parts: the shallow seismic signal feature extraction (SSFE) module and the deep seismic signal feature extraction (DSFE) module. We solve the following optimization problem over pairs of noisy and clean signals:

$$\hat{\theta}=\arg \underset{\theta }{\min}\frac{1}{N}\sum \limits_{i=1}^N{L}^{DE}\left({F}_{\theta}\left({I}_i^{Noisy}\right),{I}_i^{Clear}\right)$$
(1)

where θ = {W1, W2, …, Wp, b1, b2, …, bp} denotes the weights and biases of our p convolutional layers, INoisy and IClear denote the noisy and clean signals, respectively, LDE is the loss function minimizing the difference between the denoised output and the clean data, and N is the number of training samples.

The mean square error (MSE) function is the most popular objective function in image SR [15, 17, 18], but training with the MSE loss may not be the best option according to Lim et al. [29]. The mean absolute error (MAE) function LDE is used in this work to reduce computation and avoid unnecessary training tricks; it is defined by

$${L}^{DE}\left({F}_{\theta}\left({I}_i^{Noisy}\right),{I}_i^{Clear}\right)=\frac{1}{N}\sum \limits_{i=1}^N{\left\Vert {F}_{\theta}\left({I}_i^{Noisy}\right)-{I}_i^{Clear}\right\Vert}_1.$$
(2)
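As a minimal numerical sketch, the two candidate losses can be written directly in numpy; `mae_loss` corresponds to the L1 objective of Eq. (2) applied to the network output versus the clean data, and `mse_loss` is shown only for contrast.

```python
import numpy as np

def mae_loss(denoised, clean):
    """Mean absolute error (L1) of Eq. (2): mean of
    |F_theta(I^Noisy) - I^Clear| over all samples and elements."""
    return np.mean(np.abs(denoised - clean))

def mse_loss(denoised, clean):
    """Mean square error, the common SR objective, shown for contrast."""
    return np.mean((denoised - clean) ** 2)
```

In a training framework these would be replaced by the framework's own L1/L2 criteria; the arithmetic is identical.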

Particularly, the shallow feature F0 is extracted from the noisy seismic data by two convolutional layers as follows

$${F}_0={H}_{SSFE1}\left({H}_{SSFE2}\left({I}^{Noisy}\right)\right),$$
(3)

where HSSFE1 and HSSFE2 are the convolution operations of these two layers. The extracted F0 is then utilized in the DSFE module, which is composed of cascaded MSRD blocks, each aggregating the preceding features and extracting additional useful features. A 1 × 1 convolutional layer is then employed to fuse the output signals. This procedure can be expressed as

$${F}_{GF}={H}_{GF F}\left(\left[{F}_1,{F}_2,\dots, {F}_D\right]\right),$$
(4)

where Fi (i = 1, 2, …, D) are the feature maps generated by the MSRD blocks, [F1, F2, …, FD] is the concatenation of these feature maps, and HGFF is the composite function of the 1 × 1 convolutional layer. The output feature maps IOutput are then obtained with global residual learning,

$${I}_{Output}={F}_0+{F}_{GF}.$$
(5)
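The fusion of Eqs. (4) and (5), channel-wise concatenation, a 1 × 1 convolution, and a global residual addition, can be sketched in numpy, since a 1 × 1 convolution is just a per-pixel linear map across channels. The weight matrix `w` here is arbitrary and purely illustrative, not the trained H_GFF.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution on a (C_in, H, W) feature map: a per-pixel
    linear map across channels. w has shape (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def global_fusion(f0, block_outputs, w):
    """Eqs. (4)-(5): concatenate the D MSRD block outputs along the
    channel axis, fuse with a 1x1 convolution (H_GFF), then add the
    shallow feature F_0 via global residual learning."""
    f_concat = np.concatenate(block_outputs, axis=0)  # [F_1, ..., F_D]
    f_gf = conv1x1(f_concat, w)                       # F_GF = H_GFF(...)
    return f0 + f_gf                                  # I_Output = F_0 + F_GF
```

The global residual path lets the network learn only the correction to the shallow feature, which eases training of deep cascades.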

In the proposed MSRDN, except for the 128-filter convolutional layer in Eq. (4), each convolutional layer has c filters, where c is set to 32, 64 or 96.

Multi-scale residual dense (MSRD) block

Figure 2 shows the proposed MSRD block. Each MSRD block has m paths, which are used to exploit features at different scales. Different from the RDN model [21], a multi-bypass network is constructed in each module, with different convolutional kernels for different bypasses. The proposed model can thus adaptively measure characteristics at various scales along the m paths. This can be viewed as a wide and deep neural network model.

Fig. 2

The architecture of MSRD block

Suppose the input and output of the d-th MSRD block are Fd − 1 and Fd, and let \({F}_{d,c}^1\) and \({F}_{d,c}^2\) denote the outputs of the c-th layer of the 3 × 3 and 5 × 5 paths, respectively. We have

$${F}_{d,c}^1=\sigma \left[{F}_{d-1},{Y}_{3\times 3}^1\left({F}_{d,1}^1\right),\dots, {Y}_{3\times 3}^{c-1}\left({F}_{d,c-1}^1\right)\right],$$
(6)
$${F}_{d,c}^2=\sigma \left[{F}_{d-1},{Y}_{5\times 5}^1\left({F}_{d,1}^2\right),\dots, {Y}_{5\times 5}^{c-1}\left({F}_{d,c-1}^2\right)\right],$$
(7)

where \({Y}_{3\times 3}^i\) and \({Y}_{5\times 5}^i\) denote the functions of the i-th 3 × 3 and 5 × 5 convolutional layers, respectively, i = 1, …, c, …, C. By combining the previous information with the current multi-scale information, short-path information is retained:

$${F}_{d, LF}={H}_{LFF}^d\left(\sigma \left[{F}_{d-1},{Y}_{3\times 3}^1\left({F}_{d,1}^1\right),\dots, {Y}_{3\times 3}^c\left({F}_{d,c}^1\right),\dots, {Y}_{3\times 3}^C\left({F}_{d,C}^1\right)\right]+\sigma \left[{F}_{d-1},{Y}_{5\times 5}^1\left({F}_{d,1}^2\right),\dots, {Y}_{5\times 5}^c\left({F}_{d,c}^2\right),\dots, {Y}_{5\times 5}^C\left({F}_{d,C}^2\right)\right]\right),$$
(8)

where \({H}_{LFF}^d\) is the composite function of the 1 × 1 convolutional layer in the d-th MSRD block. σ is the ReLU function. [⋅] denotes the concatenation of feature maps by various convolutional kernels. Finally, the input information and combined multi-scale information are aggregated as follows:

$${F}_d={F}_{d-1}+{F}_{d, LF},$$
(8)

where Fd is the output of the d-th MSRD block.
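The data flow of Eqs. (6)-(8) can be illustrated with a deliberately simplified toy: the real MSRD block uses 2-D 3 × 3/5 × 5 convolutions on multi-channel feature maps with a learned 1 × 1 fusion, whereas this sketch works on a single-channel 1-D trace, uses fixed smoothing kernels, and replaces concatenation-plus-1 × 1-convolution with a scalar-weighted average. All names and weights here are illustrative assumptions, not the trained model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_same(x, k):
    """'Same'-padded 1-D convolution of each row (feature) of x with kernel k."""
    return np.stack([np.convolve(c, k, mode="same") for c in x])

def msrd_block(f_prev, k3, k5, w_fuse, C=3):
    """Toy 1-D sketch of one MSRD block (Eqs. (6)-(8)): two densely
    connected paths with a small (k3) and a large (k5) kernel, a
    scalar stand-in (w_fuse) for the 1x1 fusion H_LFF, and a local
    residual connection F_d = F_{d-1} + F_{d,LF}."""
    path3, path5 = [f_prev], [f_prev]
    for _ in range(C):
        # each layer sees (a reduction of) all previous outputs in its path
        path3.append(relu(conv1d_same(np.vstack(path3), k3)).mean(axis=0, keepdims=True))
        path5.append(relu(conv1d_same(np.vstack(path5), k5)).mean(axis=0, keepdims=True))
    fused = np.vstack(path3) + np.vstack(path5)        # combine the two scales
    f_lf = w_fuse * fused.mean(axis=0, keepdims=True)  # stand-in for H_LFF
    return f_prev + f_lf                               # local residual learning
```

The dense connections (each layer consuming all earlier outputs of its path) and the final residual addition are the structural points the equations describe; everything else is scaffolding.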

Transform-domain analysis for seismic data

Since wavelets can sparsely represent one-dimensional signals containing point singularities, they have been successfully used in representing digital signals. However, higher-dimensional functions containing curves and straight edges cannot be "optimally" represented with wavelet analysis [30]. Subsequently, sparser transforms [31,32,33] have been presented, such as the curvelet transform, the contourlet transform (CT), the non-subsampled CT (NSCT), the shearlet transform (ST), the non-subsampled ST (NSST) and the compactly supported ST (CSST), in which the anisotropic regularity of a surface along edges can be exploited. Among these, CSST is optimally sparse.

Generally, a sparse representation of signals benefits signal processing tasks. To improve the representation sparsity of signals, the CT was developed by Do and Vetterli [31]; it has the two primary features of directionality and anisotropy and is superior to curvelets, bandelets and other geometrically-driven representations in its comparatively easy and efficient wavelet-like implementation using iterative filter banks.

Next, their sparsity is analyzed. The denoising effect is determined by how well the decomposed effective signals are represented [32]; denoising performance improves as the sparsity of the transform increases. Figure 3 shows the reconstruction errors in the wavelet transform, curvelet transform, NSST and CSST domains for the data presented in Fig. 4(a). Clearly, NSST and CSST have the smallest approximation errors when the same percentage of coefficients is retained, and their errors are close to zero when only 6% of the coefficients are retained, indicating optimal sparsity. The literature [33] also indicates that compactly supported shearlets are optimally sparse.
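The sparsity comparison above follows a simple recipe: transform the data, keep only the largest fraction of coefficients, invert, and measure the relative reconstruction error. A minimal sketch of that recipe is shown below using the 2-D FFT purely as a stand-in transform (the paper compares wavelet, curvelet, NSST and CSST coefficients this way).

```python
import numpy as np

def sparse_approx_error(x, keep_frac):
    """Relative L2 reconstruction error when only the largest
    `keep_frac` fraction of transform coefficients is retained.
    The FFT is an illustrative stand-in for the wavelet/curvelet/
    NSST/CSST transforms compared in Fig. 3."""
    coeffs = np.fft.fft2(x)
    mags = np.abs(coeffs).ravel()
    k = max(1, int(keep_frac * mags.size))
    thresh = np.partition(mags, -k)[-k]          # k-th largest magnitude
    kept = np.where(np.abs(coeffs) >= thresh, coeffs, 0)
    approx = np.fft.ifft2(kept).real
    return np.linalg.norm(x - approx) / np.linalg.norm(x)
```

A sparser transform drives this error toward zero with a smaller retained fraction, which is exactly the behavior Fig. 3 reports for NSST and CSST at 6% retained coefficients.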

Fig. 3

Reconstruction errors comparison

Fig. 4

Comparison of CSST, NSST and WT coefficients on the synthetic seismic data: (a) The original synthetic seismic data; the fusion of HF coefficients of (b) CSST, (c) discrete NSST, and (d) WT

The HF coefficients of CSST, NSST and WT are compared in Fig. 4. CSST clearly captures the curvature more accurately. We note that these transforms can be applied in various denoising networks with improved performance. A further experiment evaluating the role of the transform is described in the Discussion section.

Experimental results

Experiments are conducted to qualitatively and quantitatively evaluate the performance of MSRDN against the following seismic denoising methods: traditional methods (wavelet-based and curvelet-based methods) and DL-based methods (VDSR [15], the multi-scale residual network (MSRN) [20] and the residual dense network (RDN) [21]).

Seismic datasets

The basic data is synthesized from many seismic records, containing linear and curvilinear events with various dip angles as well as fault events, with a 1000 Hz sampling frequency and 150 traces. A Ricker wavelet with the following expression is used as the seismic wavelet:

$$x(t)=\left(1-2{\pi}^2{f}^2{t}^2\right)\cdot {e}^{-{\pi}^2{f}^2{t}^2},$$
(9)

where t is time and f is the dominant frequency. Figure 5(a) shows partial synthetic seismic data. Besides, a migrated stack profile computed with the SEG/EAGE salt and overthrust model [34] is presented in Fig. 5(b). In addition, these seismic records are rotated by 45°, 90°, 135°, 180°, 270°, and 360°, respectively, following [17, 18]. Then, random noise of various levels is added to the original and rotated datasets to obtain additional expanded versions, of which 80% are selected for training and 20% for testing.
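Eq. (9) translates directly into code. The sketch below samples a Ricker wavelet at the paper's 1000 Hz sampling rate; the 30 Hz dominant frequency and the time window are illustrative choices, not values stated in the paper.

```python
import numpy as np

def ricker(t, f):
    """Ricker wavelet of Eq. (9): (1 - 2*pi^2*f^2*t^2) * exp(-pi^2*f^2*t^2),
    where f is the dominant (peak) frequency in Hz."""
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# sample an (assumed) 30 Hz Ricker wavelet at the paper's 1000 Hz rate
dt = 1.0 / 1000.0
t = np.arange(-0.1, 0.1 + dt, dt)
w = ricker(t, 30.0)
```

Convolving a reflectivity series with `w` yields synthetic traces of the kind described above.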

Fig. 5

Seismic data. a Partial synthetic seismic records; b Stacked profile measured by SEG/EAGE salt and overthrust model

Implementation details

Our MSRDN contains 10 MSRD blocks. For training, four HF sub-bands and one LF sub-band are produced by applying a 1-level NSST to the training seismic data, which is then cropped into 48 × 48 patches with an overlap of 24 pixels. The initial learning rate is 10−4 for all layers and it halves every 50 epochs. The batch size is set to 64. Our method is implemented under the Torch7 framework on an NVIDIA Tesla P100, and the ADAM optimizer is used for parameter updating. Training our model takes approximately 6 hours for 200 epochs.
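The patch preparation step can be sketched as follows: with a 48 × 48 patch size and a 24-pixel overlap, the crop stride is 48 − 24 = 24 pixels in each direction. This is a minimal sketch of that cropping, assuming patches that do not fit fully inside the sub-band are simply discarded.

```python
import numpy as np

def extract_patches(x, size=48, overlap=24):
    """Crop a 2-D sub-band into size x size training patches with the
    given overlap (stride = size - overlap); partial border patches
    are dropped."""
    stride = size - overlap
    patches = []
    for i in range(0, x.shape[0] - size + 1, stride):
        for j in range(0, x.shape[1] - size + 1, stride):
            patches.append(x[i:i + size, j:j + size])
    return np.stack(patches)
```

For a 96 × 96 sub-band this yields a 3 × 3 grid of nine overlapping patches.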

Comparison with traditional and leading edge methods

The denoising performance of our method is evaluated on synthetic seismic data in this section. The peak signal-to-noise ratio (PSNR) [35] is employed to quantitatively assess the reconstruction results:

$$PSNR\left({X}^{\prime },X\right)=10{\log}_{10}\frac{\sum_{i=1}^M{\sum}_{j=1}^N{{{\operatorname{MAX}}}_I}^2}{\sum_{i=1}^M{\sum}_{j=1}^N{\left({X}^{\prime}\left(i,j\right)-X\left(i,j\right)\right)}^2},$$
(10)

where X′ and X are the M × N denoised and clear seismic data, respectively, and MAXI is the largest possible pixel intensity value.
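Since the MN·MAXI² numerator and the summed squared error in Eq. (10) reduce to MAXI²/MSE, the metric can be computed in a few lines. Defaulting MAXI to the peak of the clean data is an assumption made here for floating-point seismic amplitudes (for 8-bit images it would be 255).

```python
import numpy as np

def psnr(denoised, clean, max_i=None):
    """PSNR of Eq. (10) in dB: 10*log10(MAX_I^2 / MSE)."""
    if max_i is None:
        # assumed convention for floating-point seismic data
        max_i = np.max(np.abs(clean))
    mse = np.mean((denoised - clean) ** 2)
    return 10.0 * np.log10(max_i ** 2 / mse)
```

For example, a uniform error of 0.1 against a signal with MAXI = 1 gives an MSE of 0.01 and hence a PSNR of 20 dB.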

The comparison is conducted with an identical training set for all models, using the released codes of the contrast models. Tables 1 and 2 present the PSNR (dB) values, with the best values in bold. The PSNRs of our method are significantly higher than those of the other schemes when evaluated on seismic data. Besides, Figs. 6 and 7 indicate that our method gives better qualitative results, with less residual coherent and incoherent noise.

Table 1 Comparison of PSNRs for different approaches on synthetic seismic data
Table 2 Comparison of PSNRs for various approaches on SEG/EAGE salt and overthrust model
Fig. 6

Synthetic seismic data denoising. a Clear seismic data. b Seismic data with added strong random noise (PSNR: 68.1319 dB). Denoised seismic data by (c) curvelet-based threshold denoising (PSNR: 83.1581 dB), (d) RDN (PSNR: 89.8995 dB) and (e) the proposed approach (PSNR: 91.6652 dB)

Fig. 7

Visual results of seismic signal denoising. a left panel: clean signals without noise, right panel: the signals with added white noise; b denoised signals (left panel) by RDN and its differences (right panel) with clean signals in (a); c and d same as (b) but for MSRN and the proposed method, respectively

In addition, noisy seismic data from the Liaohe depression, China, acquired in the same area with the same excitation and reception, are selected as the field data examples for validating the processing results of our method. To guarantee that no valid information is lost, these data are roughly processed with the traditional random-noise attenuation module of a large processing system to generate the target clear data. Random noise of various levels is then added to the target data so that the deep network can learn to recognize noise and effective signals. For the same reason, the real seismic data are rotated by 45°, 90°, 135°, 180°, 270°, and 360° to obtain expanded versions; 80% of the versions are used for training and the remainder for testing. As shown in Fig. 8, Fig. 8(a) is an original noisy section, and Fig. 8(b) presents the data denoised with the proposed scheme, showing some highlighted effective signals, especially in the red rectangle area, with a clearer interlayer structure and enhanced continuity of the events. Overall, our proposed method also achieves satisfactory results on real field seismic data.

Fig. 8

Migration profiles. a Original noisy section. b Denoised section by the proposed scheme

Discussion

In this section, we discuss the effectiveness of the transform domain through an ablation study. Specifically, prediction of CSST coefficients is introduced into the denoising of seismic data and its contribution is evaluated. Four methods (VDSR, MSRN, RDN and our MSRDN) are selected and integrated with CSST predictions, respectively. Figure 9(a) presents the PSNRs of MSRDN with and without CSST predictions on seismic data with noise at various levels. Figure 9(b) shows the averaged PSNRs of VDSR, MSRN and RDN at noise level 0.1, from left to right respectively. Significant and consistent improvements are observed across all networks and benchmarks when CSST is integrated, demonstrating that CSST predictions are more effective than spatial-domain processing.

Fig. 9

Effectiveness of CSST predictions. a PSNRs for spatial and CSST domains. b PSNRs for CSST predictions with various networks

Conclusions

We present a CNN-based seismic data denoising method in this work. To improve seismic denoising performance, an MSRDN consisting of a set of cascaded MSRD blocks is proposed to exploit the features of seismic data. Additionally, by applying a transform-domain operator to the network structure, richer detail information is preserved in noisy seismic signals and the denoising performance is improved further. The qualitative and quantitative experimental results demonstrate that our method is significantly superior to other state-of-the-art methods in seismic data restoration.