1 Introduction

Hydraulic fracturing is the core enabling technology for developing unconventional oil and gas reservoirs. Injecting large quantities of high-pressure liquid into the formation raises the rock pore pressure. When the pressure exceeds the elastic critical value of the rock, tensile or shear rupture forms a fracture network. Rock rupture produces microseismic events, and the microseismic wave signals are collected by geophones and analyzed to obtain information such as the location, magnitude and energy of the seismic source [1]. Microseismic monitoring technology is employed to monitor these signals during hydraulic fracturing. However, microseismic signals are characterized by low energy and complex noise. The collected signals are affected by the surrounding environment and are often accompanied by many kinds of noise (e.g., drilling interference, acoustic interference and strong pulse interference), so the microseismic signals are submerged in noise and cannot be used effectively. As a result, microseismic signal denoising is introduced to improve the recognition rate of microseismic events, which is significant for microseismic monitoring and for increasing production from unconventional oil and gas reservoirs. Microseismic signal denoising has been a recurring research topic since the pioneering work in [2], and numerous results have been reported in the literature [3,4,5,6,7,8,9,10]. In [8], a joint method of CEEMD and wavelet packet thresholding has been utilized in experimental analysis and engineering applications; its noise suppression is better than that of either the CEEMD method or the wavelet packet threshold method alone. In [9], an automated platform has been built for microseismic signal analysis. The system can quickly process large data sets of continuous seismic records, denoise the original seismic signals, detect seismic events, and then construct and select the best features for each event type. Finally, each event is assigned to a specific category.

In recent years, neural networks have attracted a great deal of research attention [11,12,13,14,15,16,17,18,19]. Meanwhile, with its rapid development, the convolutional neural network (CNN) has been widely used in image recognition [20, 21], speech signal processing [22], image denoising [23] and other fields, and has attracted the attention of many researchers, see e.g., [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39]. In [32], an image denoising framework based on a residual learning CNN has been designed to alleviate network degradation and improve the accuracy of verification without learning the identity mapping. The hierarchical residual learning network is capable of handling multiple general image denoising tasks. Nevertheless, it increases network complexity and relies heavily on batch normalization. In [36], a CNN image denoising method has been proposed that adapts to different image restoration tasks; however, it may become trapped in a local minimum. In [37], a complex-valued deep CNN has been designed for image denoising, which achieves good accuracy at the cost of heavy computation.

Classical supervised learning usually uses specific data to train a model for a single task in a given domain. When the domain changes, the model is often no longer accurate or even becomes invalid. Transfer learning refers to applying the knowledge learned in one field to another target field, and has attracted some initial research interest, see [40,41,42,43,44,45,46,47] and the references therein. For instance, in [45], the transfer learning method has been introduced to classify breast cancer in ultrasound images, achieving a higher AUC value than the CNN method.

High-quality microseismic signals not only guide the implementation of hydraulic fracturing engineering but also play a crucial role in oil and gas extraction, lithology discrimination, and geological exploration. To utilize microseismic signals more effectively, we build a transfer learning based microseismic denoising model of CNN (T-MCNN). The proposed model can successfully address the issue of small samples and raise the microseismic signal-to-noise ratio. The major contributions of this paper are as follows:

  1. Transfer learning is introduced to learn the complex features of image noise and apply that knowledge to the microseismic signal domain. The use of transfer learning addresses the problem of insufficient labeled training data and small samples.

  2. A deep learning model based on a CNN is designed for denoising and improving the signal-to-noise ratio. The 16-layer CNN, based on the VGG network and residual learning, produces excellent noise reduction results.

  3. Experiments on microseismic denoising are carried out to verify the performance of the designed method. Peak signal-to-noise ratio, mean square error and signal-to-noise ratio are adopted to evaluate the noise reduction effect of various methods. In addition, the time-frequency diagram is introduced to analyze the denoising effect of the proposed method.

The rest of this article is organized as follows. In Sect. 2, the transfer learning model for microseismic signal denoising is described in detail. Section 3 introduces the designed CNN model. In Sect. 4, the simulation experiments are conducted and analyzed. Finally, Sect. 5 concludes this paper.

2 A Transfer Learning Model for Microseismic Denoising

Most traditional deep learning algorithms are supervised and require abundant data to train network models. A CNN learns the underlying patterns in the data, so the training set must meet strict requirements. First, the samples in a data set need to be independent and identically distributed, that is, each sample is drawn from a feature space with a fixed probability distribution. In reality, it is hard to obtain large amounts of data that all conform to the same distribution. Second, the training data need to be labeled. Labeling requires considerable manpower and material resources, and the distribution of the data also changes with time and environment, so previously labeled data can no longer be used and must be relabeled. In this case, transfer learning offers a solution: the knowledge learned in one field can be applied to solve problems in a new field. Figure 1 compares the learning process of transfer learning with that of traditional deep learning algorithms.

Fig. 1 (a) Traditional learning method. (b) Transfer learning method

In the actual working environment, the microseismic signals collected by the geophone contain complex noise, and it is very difficult to separate the clean original microseismic signals from the actual noise signals. The same problem exists in the construction of image data sets, which are seriously disturbed by environmental noise. Besides, microseismic data sets do not have enough samples for training deep learning models, whereas image data sets contain plentiful training samples [48]. Therefore, knowledge from the image domain is used to solve the microseismic denoising task through transfer learning. Transfer learning mainly involves two concepts: domain D and task T. The domain D is described as

$$\begin{aligned} D=\left\{ \chi ,P\left( X \right) \right\} \end{aligned}$$
(1)

where \(\chi \) is the feature space or sample space, with sample data \(x=\left\{ x_{1},x_{2},\cdots ,x_{n} \right\} \in \chi \), and \(P\left( X \right) \) is the probability distribution over the feature space, which the data x of the sample space obey. That is to say, the samples in a domain obey the same probability distribution. Two different domains differ in their sample space or their probability distribution. Once a domain D is defined, a task T is represented by

$$\begin{aligned} T=\left\{ \gamma ,f\left( \cdot \right) \right\} \end{aligned}$$
(2)

where \(\gamma \) is the label space, \(f\left( \cdot \right) \) is the objective function, and T is a task in the domain. A domain D can contain multiple different tasks. The purpose of a deep learning model is to learn the objective function from a substantial number of data pairs \(\left\{ x_{i},y_{i} \right\} \):

$$\begin{aligned} y=f\left( \cdot \right) =f\left( x \right) ,x\in \chi ,y\in \gamma . \end{aligned}$$
(3)

For a given new sample x, the corresponding prediction \(f\left( x \right) \) can be obtained. In the denoising problem, the image domain is used as the source domain, and the data of the source domain are expressed as

$$\begin{aligned} D_{S}=\left\{ \left( x_{S_{1}},y_{S_{1}} \right) ,\left( x_{S_{2}},y_{S_{2}} \right) ,\cdots ,\left( x_{S_{n}},y_{S_{n}} \right) \right\} \end{aligned}$$
(4)

where \(x_{S_{i}}\in x_{S}\) is an image sample of the source domain, and \(y_{S_{i}}\in y_{S}\) is the corresponding noise label. The denoising problem of microseismic signals is taken as the target domain \(D_{O}\). Similarly, the sample data of the target domain are denoted by

$$\begin{aligned} D_{O}=\left\{ \left( x_{O_{1}},y_{O_{1}} \right) ,\left( x_{O_{2}},y_{O_{2}} \right) ,\cdots ,\left( x_{O_{n}},y_{O_{n}} \right) \right\} \end{aligned}$$
(5)

where \(x_{O_{i}} \in x_{O}\) is the sample of microseismic signals in the target domain, and \(y_{O_{i}} \in y_{O}\) is the noise label data corresponding to the sample. Next, the knowledge learned by the model in the image field is transferred to solve the denoising problem of microseismic signals. The process is illustrated in Fig. 2.

Fig. 2 Transfer learning from image denoising to microseismic denoising

The image domain and the microseismic signal domain are two different domains, so \(D_{S} \ne D_{O}\). This also means that the data probability distributions of the source and target domains are not the same, that is, \(P_{S}\left( X \right) \ne P_{O}\left( X \right) \). The tasks, however, coincide, \(T_{S} = T_{O}\): both the source task and the target task are denoising problems. Therefore, the knowledge learned in the image field is utilized to solve the denoising problem in the field of microseismic signals.

3 CNN Denoising Method Based on Transfer Learning

3.1 Noise Model

Image signal and microseismic signal models are defined respectively as follows:

$$\begin{aligned} Y_{m}\triangleq X_{m}+ N_{m} \end{aligned}$$
(6)
$$\begin{aligned} Y_{e}\triangleq X_{e}+ N_{e}. \end{aligned}$$
(7)

Here, \(Y_{m}\) (\(Y_{e}\), respectively) is the noisy image signal (microseismic signal, respectively), \(X_{m}\) (\(X_{e}\), respectively) is the clean image signal (microseismic signal, respectively), and \(N_{m}\) (\(N_{e}\), respectively) is the noise in the image signal (microseismic signal, respectively). Because noisy signals and denoised signals are strongly similar, it is easier to optimize the mapping from noisy signals to noise through residual learning and CNNs than to map directly to clean data. Therefore, a CNN model is constructed as follows to map the noisy signals \(Y_{m}\) and \(Y_{e}\) to the noise \(N_{m}\) and \(N_{e}\):

$$\begin{aligned} {\hat{n}}=Net\left( y;\theta \right) \end{aligned}$$
(8)

where \(Net\left( \cdot \right) \) is the constructed CNN model, \(\theta \) contains the weight parameter w and the bias parameter b. Define the loss function as follows:

$$\begin{aligned} J\left( \theta \right) \triangleq \frac{1}{2N}\sum _{i=1}^{N}\left\| Net\left( y_{i};\theta \right) -\left( y_{i}-x_{i}\right) \right\| ^{2}. \end{aligned}$$
(9)

In (9), N is the number of samples. Next, the image data set is used to pre-train the model. The image data pairs \(\left\{ y_{m},n_{m} \right\} \) are input into the network model \({\hat{n}}_{m}=Net\left( y_{m};\theta \right) \). By minimizing the loss function, the parameter \(\theta _{1}\) is obtained as follows:

$$\begin{aligned} \theta _{1}=\arg \min _{\theta }\frac{1}{2N}\sum _{i=1}^{N}\left\| Net\left( y_{m,i};\theta \right) -\left( y_{m,i}-x_{m,i}\right) \right\| ^{2}. \end{aligned}$$
(10)

Through pre-training on the image data set, the pre-trained model \(Net\left( \theta _{1} \right) \) is acquired.
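The residual-learning loss in (9) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function `residual_loss`, the two placeholder "networks" and the random patches are all hypothetical, and any callable that maps a noisy batch to a noise estimate could stand in for \(Net\).

```python
import numpy as np

def residual_loss(net, y, x):
    """J = 1/(2N) * sum_i ||net(y_i) - (y_i - x_i)||^2 over N samples."""
    residual_target = y - x              # true noise n_i = y_i - x_i
    predicted_noise = net(y)             # network estimate of the noise
    diff = (predicted_noise - residual_target).reshape(len(y), -1)
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(4, 40, 40))   # 4 hypothetical clean patches
n = rng.normal(0.0, 0.1, size=x.shape)        # additive Gaussian noise
y = x + n                                     # noisy observations

zero_net = lambda batch: np.zeros_like(batch) # placeholder: predicts no noise
oracle_net = lambda batch: batch - x          # placeholder: returns the exact noise

loss_zero = residual_loss(zero_net, y, x)
loss_oracle = residual_loss(oracle_net, y, x)

# With residual learning, the clean estimate is the input minus the predicted noise.
denoised = y - oracle_net(y)
```

The oracle drives the loss to zero, which shows why the target of the network is the noise \(y-x\) rather than the clean signal itself.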

The task of this paper is microseismic signal denoising. By taking advantage of transfer learning, the knowledge learned from image denoising is applied to this task. The denoising model obtained from image denoising is fine-tuned using the loss function and the microseismic data sets. T-MCNN is described as follows:

$$\begin{aligned} {\hat{n}}_{e}=Net\left( y_{e};\theta _{1},\theta _{2} \right) . \end{aligned}$$
(11)

The flow chart of T-MCNN model in this paper is shown in Fig. 3.

Fig. 3 The flow chart of T-MCNN model for microseismic signals denoising

3.2 Network Model

A comparison of \(3\times 3\) and \(5\times 5\) convolutions is illustrated in Fig. 4. A \(5\times 5\) convolution applied to a \(5\times 5\) receptive field yields a single output. Alternatively, a first \(3\times 3\) convolution can process the \(5\times 5\) receptive field to give a \(3\times 3\) output, and a second \(3\times 3\) convolution then reduces this to a single output, covering the same receptive field as the \(5\times 5\) kernel. For a fixed receptive field, stacking small convolution kernels in place of a large one adds nonlinear layers and thus increases the expressive ability of the network with fewer parameters. As displayed in Fig. 4, the output dimensions obtained by two \(3\times 3\) convolutions and one \(5\times 5\) convolution are the same, but the \(3\times 3\) convolution block adds an activation layer and increases the nonlinear expressive ability of the network. The structure of the VGG network is simple: by stacking small \(3\times 3\) convolution kernels and \(2\times 2\) max pooling layers, the network gains depth and improved performance.
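The receptive-field argument can be checked with a small numerical sketch. The helper `conv2d_valid` and the all-ones kernels are illustrative choices only, assuming no padding and a stride of 1; the weight counts are per input/output channel pair.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

field = np.arange(25, dtype=float).reshape(5, 5)  # one 5x5 receptive field
k3 = np.ones((3, 3))
k5 = np.ones((5, 5))

after_first = conv2d_valid(field, k3)         # 5x5 -> 3x3
after_second = conv2d_valid(after_first, k3)  # 3x3 -> 1x1
single_5x5 = conv2d_valid(field, k5)          # 5x5 -> 1x1

weights_two_3x3 = 2 * 3 * 3   # 18 weights for the stacked pair
weights_one_5x5 = 5 * 5       # 25 weights for the single large kernel
```

Both routes reduce the \(5\times 5\) field to one output value, while the stacked pair uses 18 weights instead of 25 and leaves room for an activation between the two convolutions.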

Fig. 4 Comparison of \(3\times 3\) convolution and \(5\times 5\) convolution

The transfer learning based CNN microseismic denoising network T-MCNN is built on the VGG network, and its modules combine convolution, activation and batch normalization operations. T-MCNN is divided into three modules: \(\left( 1\right) \) Conv + Leaky ReLU; \(\left( 2\right) \) Conv + BN + Leaky ReLU; and \(\left( 3\right) \) Conv. The number of layers of the CNN determines the performance of the model. Although adding more layers can improve the performance of the network, it also increases the amount of computation. If too few layers are used, the denoising effect cannot reach the ideal state. Therefore, choosing an appropriate number of network layers is the key to building the CNN denoising model. To this end, networks with 10, 12, 14, 16 and 18 layers are each trained for 20 epochs and compared experimentally. PSNR, SSIM and SNR are used to evaluate the denoising effect of the models of different depths. The experimental results are given in Table 1.

Table 1 Comparison of denoising effects of models with different layers at 20 epochs

It is seen from Table 1 that, among the five depths, the 16-layer network achieves the highest PSNR (30.72), an SSIM of 0.65, and the lowest MSE (54.98). Considering these results together, the number of network layers is set to 16, and the network structure is displayed in Fig. 5. The goal is to map a noisy signal to a noise signal, so the input and output of the network have the same dimensions. The first layer consists of Conv and Leaky ReLU components and has a convolution kernel size of \(3\times 3\times 1\times 64\), where \(3\times 3\) is the height and width of the kernel, 1 is the number of kernel channels, and 64 is the number of kernels. Layers 2–14 contain Conv, BN and Leaky ReLU components and have a kernel size of \(3\times 3\times 64\times 64\), that is, 64 channels per kernel and 64 kernels. The last layer has a kernel size of \(3\times 3\times 64\times 1\), that is, 64 channels per kernel and a single kernel, because the final output channel must be consistent with the data channel.
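As a rough worked example, the kernel sizes above translate into the following convolution weight counts. The sketch assumes that a 16-layer network has one first layer, 14 identical middle layers and one last layer; the text lists the middle block as layers 2–14, so the exact split is an assumption here. Biases and batch normalization parameters are ignored, and `conv_weights` is a hypothetical helper.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution layer (no bias)."""
    return kernel * kernel * in_channels * out_channels

DEPTH = 16                           # number of layers selected from Table 1

first = conv_weights(3, 1, 64)       # 3 x 3 x 1 x 64
middle = conv_weights(3, 64, 64)     # 3 x 3 x 64 x 64
last = conv_weights(3, 64, 1)        # 3 x 3 x 64 x 1

n_middle = DEPTH - 2                 # assumption: all layers between first and last
total = first + n_middle * middle + last
```

Under these assumptions the middle layers dominate the model size, which is consistent with the computational cost discussed for deeper networks.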

Fig. 5 The network structure of T-MCNN

Batch normalization (BN) layers can be embedded in the network to normalize the data, improve the generalization ability of the network and accelerate the convergence of the CNN. Leaky ReLU is a variant of the ReLU function that assigns a small nonzero gradient to inputs less than 0. This prevents too many neurons from falling into the “dead” state, solving the problem that “dead” ReLU neurons cannot be updated.
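The two operations can be illustrated with a short NumPy sketch. The negative slope of 0.01 and the value of `eps` are assumptions, since the text does not state them, and the batch normalization shown here is the training-mode normalization only, without the learned scale and shift.

```python
import numpy as np

def leaky_relu(z, negative_slope=0.01):
    """Identity for z >= 0, small linear slope for z < 0."""
    return np.where(z >= 0, z, negative_slope * z)

def leaky_relu_grad(z, negative_slope=0.01):
    """Gradient is 1 for z >= 0 and negative_slope otherwise, never zero."""
    return np.where(z >= 0, 1.0, negative_slope)

def batch_norm(z, eps=1e-5):
    """Normalize each feature over the batch axis (no learned scale/shift)."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    return (z - mean) / np.sqrt(var + eps)

z = np.array([[-2.0, 0.5],
              [1.0, -0.5],
              [4.0, 3.0]])           # batch of 3 samples, 2 features

activated = leaky_relu(z)
grads = leaky_relu_grad(z)
normalized = batch_norm(z)
```

Because the gradient for negative inputs is small but nonzero, a neuron that receives negative pre-activations can still be updated.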

Fig. 6 The network training flow chart of T-MCNN

With the number of layers fixed at 16 and the components of each layer determined, we proceed to the network training phase. The flowchart of network training is drawn in Fig. 6. First, image data are used to pre-train the network model and obtain the pre-trained model. Then the pre-trained parameters are used as the initial parameters of the network, the noisy microseismic data are applied as the input of the network, and the noise is used as the target output. The T-MCNN model is obtained by fine-tuning the neural network. Finally, the test data set is used to evaluate the denoising performance of the T-MCNN network.
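The two-stage procedure (pre-train on a large source set, then fine-tune from the pre-trained parameters on a small target set) can be sketched with a deliberately simple stand-in. Everything below is a toy under stated assumptions: a linear map trained by plain gradient descent replaces the CNN, sinusoids with different frequencies play the role of the image and microseismic domains, and all names and sizes are hypothetical.

```python
import numpy as np

def make_pairs(rng, n_samples, dim, freq, sigma):
    """Return noisy inputs y and noise labels n for sinusoidal clean signals."""
    t = np.arange(dim)
    phase = rng.uniform(0, 2 * np.pi, size=(n_samples, 1))
    clean = np.sin(2 * np.pi * freq * t / dim + phase)
    noise = rng.normal(0.0, sigma, size=clean.shape)
    return clean + noise, noise

def loss(W, y, n):
    """Residual loss 1/(2N) * sum_i ||y_i W - n_i||^2 for a linear model."""
    diff = y @ W - n
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))

def train(W, y, n, lr=0.01, steps=300):
    """Gradient descent on the residual loss, starting from W."""
    for _ in range(steps):
        grad = y.T @ (y @ W - n) / len(y)
        W = W - lr * grad
    return W

rng = np.random.default_rng(0)
dim = 16
y_src, n_src = make_pairs(rng, 2000, dim, freq=2, sigma=0.5)  # large source set
y_tgt, n_tgt = make_pairs(rng, 50, dim, freq=3, sigma=0.5)    # small target set

theta_1 = train(np.zeros((dim, dim)), y_src, n_src)   # stage 1: pre-training
loss_before = loss(theta_1, y_tgt, n_tgt)

theta_ft = train(theta_1, y_tgt, n_tgt, steps=100)    # stage 2: fine-tuning
loss_after = loss(theta_ft, y_tgt, n_tgt)
```

The key design point mirrored here is that fine-tuning starts from `theta_1` instead of a fresh initialization, so the small target set only has to adjust parameters that already encode a denoising mapping.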

4 Simulation Experiment and Analysis

4.1 Build the Training Data Set

To train the T-MCNN model, it is necessary to construct an image data set for pre-training and a microseismic data set for fine-tuning. According to the mapping relationship of the T-MCNN model, the input samples are the noisy microseismic signal data, and the output labels are the noise signal data. A total of 400 grayscale images are collected as the data set for pre-training, and Gaussian noise (\(\sigma =50\)) is applied. The data set for CNN training is organized as follows: when synthesizing image data, the noisy images (the corresponding noise data, respectively) are regarded as the input data (label data, respectively). Patch processing is conducted on the images, and the choice of patch size depends on the noise level. If the noise is complex, a larger patch may be selected to obtain more information for signal recovery. Following the settings of [49], the patch window size is selected as \(40\times 40\) and the sliding step size is 10. Patches are cropped from the original image data as the input of the T-MCNN network. Figure 7 presents some of the pre-training data.
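The patch construction described above can be sketched as follows. The \(40\times 40\) window, the step of 10 and \(\sigma =50\) come from the text; the \(100\times 100\) random image and the helper `extract_patches` are hypothetical, since the size of the training images is not stated.

```python
import numpy as np

def extract_patches(image, patch=40, stride=10):
    """Slide a patch x patch window over the image with the given stride."""
    rows = range(0, image.shape[0] - patch + 1, stride)
    cols = range(0, image.shape[1] - patch + 1, stride)
    return np.stack([image[r:r + patch, c:c + patch]
                     for r in rows for c in cols])

rng = np.random.default_rng(0)
clean = rng.uniform(0, 255, size=(100, 100))      # hypothetical grayscale image
noise = rng.normal(0.0, 50.0, size=clean.shape)   # Gaussian noise, sigma = 50
noisy = clean + noise

inputs = extract_patches(noisy)   # network inputs: noisy patches
labels = extract_patches(noise)   # network labels: the matching noise patches
```

For this image size the window fits at 7 positions along each axis, giving 49 input and label patches that are aligned by position.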

Fig. 7 Pre-training image data for T-MCNN

To meet the mapping requirements of the T-MCNN model, Ricker wavelet forward modeling is used to synthesize simulated microseismic signals. Gaussian noise is a commonly used simulation noise for microseismic signal denoising. When the noise type is unknown, Gaussian noise is used to simulate the actual noise, which is a simple approximation close to actual conditions. The Ricker wavelet is expressed as follows:

$$\begin{aligned} s\left( t \right) =\left[ 1-2\left( \pi f_{m}t \right) ^{2} \right] \exp \left[ -\left( \pi f_{m}t \right) ^{2} \right] \end{aligned}$$
(12)

where \(f_{m}\) is the dominant frequency of the Ricker wavelet, and t is the time. Figure 8 is the Ricker wavelet graph.
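Equation (12) can be evaluated directly, as in the sketch below. The 400 samples per trace match the text; the sampling interval `dt` of 1 ms and the centering of the wavelet in the window are assumptions made for illustration, and \(f_{m}=30\) is the value used later for the test data.

```python
import numpy as np

def ricker(t, f_m):
    """Ricker wavelet s(t) = [1 - 2(pi f_m t)^2] exp[-(pi f_m t)^2]."""
    arg = (np.pi * f_m * t) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

dt = 0.001                         # assumed sampling interval in seconds
t = (np.arange(400) - 200) * dt    # 400 samples, wavelet centered at t = 0
wavelet = ricker(t, f_m=30.0)

# The wavelet crosses zero where 2 (pi f_m t)^2 = 1.
t_zero = 1.0 / (np.sqrt(2.0) * np.pi * 30.0)
```

The wavelet peaks at 1 for \(t=0\) and has symmetric negative side lobes, which is the shape used to place synthetic events on each trace.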

Fig. 8 Ricker wavelet for microseismic signals synthesizing

A total of 8000 microseismic signals are generated, each with 400 sampling points. Every 80 channels of signals are combined to synthesize one microseismic image, giving a total of 100 microseismic images of size \(400\times 80\). A \(40\times 40\) sliding window with a step size of 10 is slid over the microseismic image data, and a total of 600 blocks of data are obtained as the training data set for fine-tuning the T-MCNN model. The microseismic signal data are shown in Fig. 9.

Fig. 9 Microseismic data for T-MCNN training

4.2 Experimental Training Process

The computer used for this experiment has an Intel® Core™ i7-4510U CPU running at 2.60 GHz, 8 GB of RAM, the Win10 64-bit operating system, and an NVIDIA GeForce 840M with 4 GB of memory. The software environment is Matlab R2018a. Figure 10 shows the convergence of the training loss for the SGD and Adam optimization algorithms. Adam converges faster than SGD, so Adam is chosen as the optimization algorithm in this experiment. As recommended by [50], we set \(\beta _{1}=0.9\), \(\beta _{2}=0.99\), \(\alpha =0.01\), and \(\varepsilon =10^{-8}\). The number of training epochs is set to 20, 12 grayscale images with added Gaussian noise (\(\sigma =50\)) are used as the image test set, and the PSNR and SSIM values are recorded at each epoch. As drawn in Figs. 11 and 12, the values of PSNR and SSIM first increase and then decrease as the number of training epochs grows. At epoch 3, PSNR and SSIM reach their highest values, 26.2396 and 0.7123, respectively. Therefore, the model trained with epoch = 3 is selected as the pre-training model.
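For reference, one Adam update with the hyperparameters quoted above can be written as the following sketch. It follows the standard Adam update rule with bias correction; the quadratic objective and the starting point are hypothetical and serve only to exercise the update, and the experiments themselves were run in Matlab rather than with this code.

```python
import numpy as np

def adam_step(theta, grad, m, v, step,
              alpha=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update; step counts from 1 for the bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** step)             # bias-corrected moments
    v_hat = v / (1 - beta2 ** step)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta0 = np.array([5.0, -3.0])                  # hypothetical starting point

# First step on f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta1, m1, v1 = adam_step(theta0, theta0, np.zeros(2), np.zeros(2), step=1)

# A short run of further steps on the same objective.
theta, m, v = theta1, m1, v1
for step in range(2, 101):
    theta, m, v = adam_step(theta, theta, m, v, step)
```

On the first step the bias-corrected ratio reduces to the sign of the gradient, so each coordinate moves by approximately \(\alpha \) regardless of the gradient scale.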

Fig. 10 Comparison of SGD and Adam loss of training

Fig. 11 Changes of PSNR curve during training

Fig. 12 Changes of SSIM curve during training

The microseismic data sets are then employed to fine-tune the model, and the resulting loss curve is exhibited in Fig. 13. In the pre-training stage, each epoch consists of 3313 iterations, and three epochs are trained in total. In the fine-tuning stage, each epoch consists of 600 iterations, and three epochs are trained. As is observed from the loss curve in Fig. 13, the model converges quickly in the pre-training stage. When the iteration count reaches 9939 (\(3313\times 3\)), pre-training finishes and fine-tuning starts; the loss continues to decline and finally converges.

Fig. 13 Change curve of model fine-tuning loss value

4.3 Experimental Results and Analysis

To verify the denoising effect of our proposed CNN model, the Ricker wavelet is used to generate 10 pairs of \(400\times 80\) synthetic microseismic data as a test set. As plotted in Fig. 14a, the microseismic signals contain two in-phase axes superimposed together, and the width of the center frequency wavelet (\(f_{m}=30\)) is 3. Gaussian noise (\(\sigma =50\)) is then added, as reflected in Fig. 14b. To demonstrate the benefit of transfer learning, we compare the transfer learning based microseismic denoising model (T-MCNN) with the microseismic denoising model without transfer learning (MCNN). MCNN is trained on 400 pieces of microseismic data without transfer learning, while T-MCNN uses only 100 pieces of microseismic data to fine-tune the pre-trained model. To evaluate the denoising effect of the two models objectively, the PSNR, MSE and SNR of the data denoised by each model are calculated. PSNR measures the similarity between the denoised signal and the original clean signal, and a higher PSNR indicates a better denoising effect. SNR represents the ratio of signal to noise, and a larger SNR indicates a better denoising effect. MSE measures the error between the denoised signal and the original signal, and a smaller MSE indicates a better denoising effect. The results are given in Table 2, which indicates that T-MCNN outperforms MCNN on all three indexes.
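The three indexes can be computed as in the sketch below. The formulas are the commonly used definitions and are an assumption here, since the text does not spell them out; in particular the peak value of 255 in `psnr` assumes 8-bit image scaling. The constant test arrays are hypothetical and chosen so the values are easy to verify by hand.

```python
import numpy as np

def mse(clean, estimate):
    """Mean square error between the clean reference and the estimate."""
    return float(np.mean((clean - estimate) ** 2))

def psnr(clean, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak value is an assumption)."""
    return 10.0 * np.log10(peak ** 2 / mse(clean, estimate))

def snr(clean, estimate):
    """Signal-to-noise ratio in dB: signal energy over residual energy."""
    residual_energy = np.sum((clean - estimate) ** 2)
    return 10.0 * np.log10(np.sum(clean ** 2) / residual_energy)

clean = np.full((4, 4), 100.0)   # hypothetical clean reference
estimate = clean + 10.0          # estimate with a constant error of 10

mse_value = mse(clean, estimate)     # 10^2 = 100
snr_value = snr(clean, estimate)     # 10 * log10(100^2 / 10^2) = 20 dB
psnr_value = psnr(clean, estimate)   # 10 * log10(255^2 / 100)
```

A better denoiser lowers `mse_value` and raises both `snr_value` and `psnr_value`, which is the direction of improvement described above.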

Table 2 Comparison of evaluation indexes of denoising effect between MCNN and T-MCNN

To further analyze the denoising effect of the proposed T-MCNN model, the traditional wavelet threshold denoising algorithm (Wavelet), the MCNN algorithm and the T-MCNN algorithm are compared; the results are shown in Fig. 14a–e. As is reflected in Fig. 14c, noise residue remains in both the signal part and the no-signal part, which leads to poor clarity, and a large amount of noise still disturbs the microseismic signals. The CNN model MCNN, which is not pre-trained but is trained for the same number of epochs as T-MCNN, performs better than the wavelet threshold denoising algorithm, but a small amount of noise residue remains and the edge positions are not handled ideally. Figure 14e shows the denoising effect of the CNN model based on transfer learning: the denoising effect is obvious, with almost no noise residue and clear microseismic signals. From the perspective of denoising effect, both MCNN and T-MCNN can effectively denoise microseismic data, which illustrates that the CNN is a powerful algorithm for this task.

Next, spectrum analysis of the microseismic signals before and after denoising is carried out. Spectrum analysis applies the Fourier transform to the signals and expresses the signal strength as a function of frequency. Figure 15a shows the spectrum of the original microseismic signals, which is concentrated in the low-frequency region. There are two wave peaks in the microseismic signals, corresponding to the brighter parts, while there is no microseismic signal elsewhere. In Fig. 15b, noise is added, and a considerable number of components at different frequencies appear, although their intensity is low. The signal part is also affected, and its intensity changes. Figure 15c shows the result of denoising by the wavelet threshold algorithm. Most of the high-frequency noise is removed, while much noise remains near the signal frequency. Figure 15d shows the result after denoising with the MCNN model. Compared with the wavelet threshold denoising algorithm, it denoises low-frequency noise better. However, a small part of the noise remains, and the first wave peak does not recover its main intensity. Figure 15e shows the time–frequency diagram of the denoising results of the proposed T-MCNN model, which eliminates almost all noise and clearly restores the amplitude information of the microseismic signals. T-MCNN is thus the method with the best denoising effect: it successfully eliminates the noise in the microseismic signals while retaining the main information of the signals.
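The idea behind the spectrum analysis can be illustrated on a single trace. The sketch below computes a one-dimensional amplitude spectrum with the FFT, which is simpler than the diagrams in Fig. 15; the sampling interval of 1 ms and the noise level of 0.2 are assumptions chosen for illustration.

```python
import numpy as np

dt = 0.001                          # assumed sampling interval in seconds
f_m = 30.0                          # dominant frequency of the Ricker wavelet
t = (np.arange(400) - 200) * dt     # 400 samples, wavelet centered at t = 0
arg = (np.pi * f_m * t) ** 2
trace = (1.0 - 2.0 * arg) * np.exp(-arg)    # clean Ricker trace

rng = np.random.default_rng(0)
noisy = trace + rng.normal(0.0, 0.2, size=trace.shape)  # hypothetical noise level

freqs = np.fft.rfftfreq(trace.size, d=dt)   # frequency axis in Hz
clean_spec = np.abs(np.fft.rfft(trace))     # amplitude spectrum, clean trace
noisy_spec = np.abs(np.fft.rfft(noisy))     # amplitude spectrum, noisy trace

peak_freq = freqs[np.argmax(clean_spec)]    # where the clean energy concentrates
high_band = freqs > 200.0                   # band with essentially no signal
```

The clean spectrum concentrates around the dominant frequency, while the added Gaussian noise spreads low-intensity energy over all frequencies, including the high band where the clean trace has almost none.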

Fig. 14 Comparison of denoising effects of synthetic microseismic signals

Fig. 15 Spectrum analysis of original microseismic signals and denoising microseismic signals from various models

According to Figs. 14, 15 and 16 (denoising effect diagram, spectrum analysis diagram and waveform analysis diagram, respectively), the microseismic denoising algorithm proposed in this paper removes the noise in the signal adequately and better protects the details of the signal. To assess the denoising results further, the peak signal-to-noise ratio (PSNR), mean square error (MSE) and signal-to-noise ratio (SNR) are adopted to evaluate the denoising results quantitatively. The microseismic data are one-dimensional signals, which we transform into an image by stacking the data of 80 channels. PSNR and MSE are commonly used evaluation indexes for image quality and are therefore taken as the main reference indexes. Noise with levels \(\sigma =50\), \(\sigma =30\) and \(\sigma =15\) is added to the microseismic signals, respectively, and the original microseismic signals are used as references. The statistical results of the evaluation indicators after denoising are given in Table 3. Taking the noise of \(\sigma =15\) as an example, the PSNR of the noisy microseismic signals is 17.3090. After applying the wavelet threshold denoising algorithm (the T-MCNN algorithm developed in this paper, respectively), the PSNR value increases by 3.5319. Compared to the wavelet threshold denoising method, the T-MCNN method significantly improves the PSNR of the processed microseismic signals: the PSNR after T-MCNN processing is 53.04\(\%\) higher than that after wavelet threshold processing. Although the wavelet threshold algorithm improves the PSNR and reduces the MSE of the microseismic signals, it does not effectively improve their SNR. In contrast, the T-MCNN method demonstrates effective denoising ability for high-level noisy signals by improving both the PSNR and the SNR while reducing the MSE. For low-level noise, the T-MCNN model still has a better denoising effect: compared to the signal processed by the wavelet threshold algorithm, the signal processed by T-MCNN exhibits an increase of 27.92\(\%\) in PSNR and 271.80\(\%\) in SNR, and a decrease of 92.78\(\%\) in MSE. Therefore, from both subjective visual analysis and objective quantitative evaluation, the denoising algorithm designed in this paper has clear advantages, including eliminating the noise in the microseismic signals to the greatest extent, restoring the amplitude of the original signal, and protecting edge detail information well.

Fig. 16 Comparison of original microseismic signals and denoising microseismic signals from various models

Table 3 Comparison of the wavelet threshold algorithm and T-MCNN method

Finally, we randomly select an 80-channel microseismic signal and a random signal for waveform comparison and analysis. As plotted in Fig. 16, there are two waveforms in the signal. After noise is added, the waveforms are distorted and their maximum value exceeds the amplitude of the original signal, but most of the original waveform remains. After denoising by the wavelet threshold algorithm, the amplitude of the waveform decreases but is still higher than that of the original signal. Where there is no microseismic signal, the wavelet threshold algorithm cannot eliminate the noise, and a large amount of noise remains. Both the MCNN algorithm without transfer learning and the T-MCNN algorithm based on transfer learning achieve effective denoising. However, the MCNN model cannot restore the amplitude to its original form, and its amplitude is lower than that of the original signal. In addition, its processing of edge positions is poor, and the initial amplitude differs from the original amplitude. After denoising with the T-MCNN algorithm, the waveform of the microseismic signal is well protected, with an amplitude basically the same as that of the original signal, and the denoising effect is quite obvious. In recent years, fuzzy learning has undergone rapid development, yielding fruitful results in methods such as fuzzy superior Mandelbrot sets [51], complex T-spherical fuzzy sets [52], and complex q-rung orthopair linguistic fuzzy sets [53]. To cope with uncertain and fuzzy data, we plan to enhance the capability of our model by combining transfer learning with fuzzy learning in the future, thus improving the data processing ability and robustness of the model.

5 Conclusion

In this paper, a transfer learning based CNN model has been proposed for microseismic signal denoising, offering a novel and efficient approach to noise reduction of microseismic signals. It is difficult to separate the microseismic data and the noise data in the actual working environment, and the synthetic data are not sufficient for network training. To address this problem, transfer learning has been introduced: image data sets are used to pre-train the CNN denoising model, and the learned knowledge is then used for microseismic signal denoising. The proposed T-MCNN successfully addresses the small sample problem and improves the microseismic signal-to-noise ratio. Experiments have indicated that the proposed method can increase the signal-to-noise ratio at different noise levels. However, the proposed method has certain drawbacks. Due to the huge amount of calculation and the high demand on computational resources, it is challenging to apply the designed model on constrained devices, such as embedded microcontrollers. In the future, we plan to compress the network, reduce the computation cost, and apply it on embedded microcontrollers.