1 Introduction

As one of the geometric attributes of seismic signal, the local slopes contain complete information of the reflection geometry, which is of great significance to the analysis of seismic data. An accurate estimation of the slope information can benefit many subsequent procedures such as horizons interpretation (Fomel 2010; Wu and Hale 2015), structure enhancement (Hale 2009; Liu et al. 2010), incoherent/coherent noise attenuation (Liu et al. 2015; Huang et al. 2017), deblending (Huang et al. 2018), seismic interpolation/reconstruction (Gan et al. 2016; Huang and Liu 2020), inversion (Yao et al. 2020; Liu et al. 2020a) and imaging (Fomel 2007; Zhang et al. 2019b). The introduction of local slopes into seismic data processing can be traced back to the work of Rieber (1936), in this basic, a method of controlled direction reception is developed, which can achieve excellent results in seismic processing and interpretation (Riabinkin 1957). After then, a number of researchers have used several techniques to estimate the local slopes. Ottolini (1983) proposed a picking method of local slopes by local slant stacks. Lambaré et al. (2004) extended Ottolini’s method further to make it simpler and more automatic. But it is extremely expensive to cast about the best direction by slant slacking with different slopes. Claerbout (1992) first delved into a plane-wave destruction filter to estimate seismic event slopes in local windows under an assumption of local plane-wave model. Such filter is constructed with the help of an implicit finite-difference scheme for the local plane-wave equation (Fomel 2002). Andersson and Duchkov (2013) described a structure tensor-based local slopes estimation technique and also provided a summary of previous studies in detail. Huang et al. (2017a) proposed an algorithm to track the largest values of the upper envelope in local windows to obtain the event slope.

The computer-assisted seismic data processing and interpretation become more popular and have demonstrated outstanding performance. Machine learning refers to the process that perceives the data environment and discovers as well as learns information from the data, and further makes strategic decisions that maximize the chance of successfully achieving specific goals (Murphy 2012; Valentine and Kalnins 2016; Chen 2018;). Compared with traditional technique, machine learning may achieve higher efficiency and precision (Chen et al. 2016; Wu et al. 2019b; Wu et al. 2020b; Zhang et al. 2019b). The machine learning-based technique has drawn a lot of attention from various kinds of fields, such as medical science, biomedical engineering, text processing, (electronic) commerce, internet engineering (Chan et al. 2002; Vafeiadis et al. 2015). In seismological community, machine learning has been successfully applied to noise attenuation (Chen et al. 2019; Zhu et al. 2019; Saad and Chen 2020), signal recognition (Huang 2019), earthquake detection (Jia et al. 2019; Mousavi et al. 2019b), arrival picking (Yu et al. 2018; Zhu and Beroza 2018; Zhao et al. 2019; Jiang and Ning 2019; Zhang et al. 2020), fault detection (Ping et al. 2018), geophysical inversion (Chen et al. 2018; Li et al. 2020; Wu et al. 2020a), traveltime parameters estimation (Liu et al. 2020b) and reservoir porosity prediction (Chen et al. 2020). As one of the machine learning methods, deep learning is composed of multiple processing layers to intelligently extract data features, which can dramatically improve the state-of-the-art in various fields (Lecun et al. 2015). It has become a powerful tool in the seismic processing and interpretation (Waldeland et al. 2018; Mousavi et al. 2019a). Lewis and Vigh (2017) adapted deep learning technique to full-waveform inversion to overcome challenges caused by complex geological stratification. Ross et al. (2018) trained a convolutional neural network using the hand-labeled data archives of the Southern California Seismic Network and used the trained model to detect seismic body-wave phases. Mousavi and Beroza (2019) designed a regressor (MagNet) composed of convolutional and recurrent neural networks to predict earthquake magnitude from raw waveforms recorded at single stations. Araya-Polo et al. (2018) obtained an accurate gridding or layered velocity model from shot gathers using deep neural networks.

To estimate the local slope information of a seismic profile, we use a deep neural network of six main layers to predict the local slope of the central pixel in each small data patches. To train and validate the neural network, we create 5,562,120 data patches by randomly convoluting 121 different wavelets with different reflectivity models. The neural network parameters are optimized by minimizing a cross-entropy function. The stochastic gradient descent with momentum method is used to achieve solve the optimization problem. Application of the trained neural network on both synthetic and field example shows the effectiveness.

2 Model introduction

2.1 Deep neural network architecture

Our designed network consists of six main layers including three 2D convolution layers and three fully connected layers, as shown in Fig. 1. The three convolution layers serve as a deep feature extractor which reduces the input images into smaller sets of attributes that hold relevant information in local structure. A single convolution layer can be formulated as:

$${\mathbf{Y}}_{j} = {\text{g}}\left( {f\left( {\sum\limits_{i} {{\mathbf{W}}_{j} } *{\mathbf{X}}_{i} + b_{j} } \right)} \right),$$
(1)

where Yj and Xi stand for the jth extracted features map and the ith input data, respectively. Wj and bj denote the weighting matrix and basis parameters which control the convoluting and horizontal shifting process, which slides over the input map with a specified stride to reduce the spatial dimensionality. f is a nonlinear activation function which can improve the universality of the deep neural network and its simulation performance of nonlinear process. For the seismic local slope estimation, we choose the state-of-the-art rectified linear unit (ReLU) (Nair and Hinton 2010) as the activation function \(f\) = max (0, x), which returns its argument x when it is greater than zero and returns zero otherwise. The ReLU activation maintains the nonlinearity and off–on characteristic which is similar to a biological neuron. g is the batch normalization operation which accelerates the training process and also reduce the risk of over fitting (Ioffe and Szegedy 2015). For each convolution layer, we pad the data with zeros prior to the convolution operation to keep the sizes of input and output same. To further boost the convergence rate and capacity of resisting disturbance and noise, we insert the max-pooling operation into the first two convolution layers as

$${\mathbf{Y}}_{j} = g\left( {f\left( {p\left( {\sum\limits_{i} {{\mathbf{W}}_{j} } *{\mathbf{X}}_{i} + b_{j} } \right)} \right)} \right),$$
(2)

where p represents the max-pooling operation taking the maximum of the neighborhood.

Fig. 1
figure 1

A demonstration of the neural network architecture

The three fully connected layers can be treated as a classifier to predict the class based on the extracted features. Instead of convoluting a local sub-image with a convolutional kernel, the neurons in a fully connected layer are completely interconnected with the next layer neurons, but not connected with neurons in the same layer or cross-layer. A single fully connected layer can be formulated as:

$$y_{j} = {\mathbf{w}}_{j}^{T} {\mathbf{x}} + b_{j} ,$$
(3)

where yj denotes jth output neuron. x is the reshaped 1D column vector from the previous layer neurons. wj represents the weighting vector and bj is the basis. To reduce the risk of gradient vanishing and over fitting, the first two fully connected layers are followed by a leaky ReLU activation l and a dropout operation d as

$$y_{j} = d\left( {l\left( {{\mathbf{w}}_{j}^{T} {\mathbf{x}} + b_{j} } \right)} \right),$$
(4)

The local slope labels are introduced in the last layer. For the last layer, we insert a soft-max operation to obtain the final prediction result (i.e., the local slope of the central point of the input seismic patch). The final prediction results are classified into 22 groups including a non-signal group and 21 groups of various slopes. One can of course define more output categories, but generally need a more complex or deeper neural network. The detailed parameters we use refer to Fig. 1.

2.2 Training data and labels

The main limitation of applying deep learning in seismic image processing remains in preparing rich training datasets and the corresponding labels (Wu et al. 2019a). Manually labeling a mass of seismic images is highly labor-intensive and tedious. Moreover, the manually labeling is highly related to human experience and the result often goes to be different with different interpreters. The incomplete or inaccurate labeling may mislead the training process and trained neural network will make unreliable predictions (Wu et al. 2019a). For this reason, we train the neural network using synthetically-created seismic patches, in which the local slope information is pre-given. The local slope of a seismic section is defined as the local slope of the corresponding reflection coefficients. In order to have realistic and varied seismic data, we generate 11 Ricker wavelets with equally increasing domain frequencies from 10 to 70 Hz (6 Hz interval). For each Ricker wavelet, we diversify it into 11 wavelets with different phases (0–180°, 18° interval), which leads to 11 × 11 = 121 different wavelets in total, as shown in Fig. 2.

Fig. 2
figure 2

121 synthetic wavelets for the generation of training data

We estimate the local slope information of each data point within its neighborhood. The size of the neighborhood patch can be varying for the local slope estimation depending on the user requirement. However, it is to be understood that a small size of neighborhood patch may provide insufficient details and result in an incorrect estimation, while a big one introduces some interference information (e.g., nonlocal information) which may misguide the training and predicting processes. In addition, the bigger the size, the higher demand on the memory and computation performance of the workstation. Take these into account, we set a 31 × 5 local window to estimate the slope of the central point. We randomly convolute the synthetically created wavelets with different reflectivity models to obtain 1,854,040 small seismic patches including 299,430 (16.15%) non-signal patches and 1,554,610 (83.85%) signal patches with different slopes. To make the synthetic seismic images even more realistic, we add different levels of random noise. We also augment the data sets by randomly decimating and dithering operations to make the trained neural network applicable to the seismic data with relatively complicated structure, which produces 5,562,120 patches (898,052 non-signal and 4,664,068 signal patches) in total. Figure 3 shows the statistical analysis of the training data. 1,112,424 (20%) of the patches are randomly selected as the validation data sets.

Fig. 3
figure 3

Statistical analysis of the training data. a Non-signal and signal patches distribution. b Slopes distribution. c SNRs distribution

2.3 Deep neural network training

The whole training process of the deep neural network can be considered as solving a complex nonlinear inverse problem using interactive forward propagation and back propagation, where the former calculates the output (prediction), and the latter updates the parameters (weights and biases). The process of updating weights and biases can be viewed as an optimization problem where a cost function is minimized. We choose the widely-used cross-entropy as the cost function to train our deep neural network:

$$F\left( {{\mathbf{p}},{\mathbf{q}}} \right) = \sum - {\mathbf{p}}log\left( {\mathbf{q}} \right),$$
(5)

where p and q are the true and predicted classification vectors, respectively. We use the stochastic gradient descent with momentum (momentum = 0.65) and set a 0.045 initial learning rate and apply a 0.02 dropping factor to it in each epoch to optimize the network parameters. We feed the deep neural network in batches with the size of 1024 patches. Figure 4 presents the loss curves of the first ten thousand iterations on training and validation sets. The final training and validation accuracies are 97.1% and 96.67%, respectively.

Fig. 4
figure 4

Loss curves on training set and validation set

Figure 5 shows some randomly-chosen examples of the testing seismic patches (not included in the training data sets) displayed with the true (blue line) and predicted (red line) slopes of the central point (green dot). The last two columns belong to the non-signal category because the signals do not pass the central point. It can be seen that the synthetically-trained network can predict the local slope precisely with only minor error for two patches as marked by the blue frame boxes.

Fig. 5
figure 5

A demonstration of some testing seismic patches displayed with the true (blue line) and predicted (red line) slopes of the central point (green dot). The blue frameworks mark two wrong predictions

3 Results and discussion

3.1 Synthetic example

We then test the synthetically-trained network on three synthetic seismic data sets which are more realistic, as shown in Fig. 6a–c. The clean signal in each data consists of twelve hyperbolic events with different domain frequencies (20− 50 Hz) and initial phases (0°–90°). To test the prediction performance of our network, we added some Gaussian noise to the last two data (Fig. 6b, c). The rate of maximum amplitudes between noise and signal of Fig. 6b and c are 0.75 and 1.5, respectively. Figure 6e and f demonstrates the estimated local slopes maps using our synthetically-trained network. It can be observed that, when the input data are clean (Fig. 6a) or moderately corrupted by noise (Fig. 6b) the network can captures information from the seismic image and predicts the local slopes precisely. Although the performance goes downhill as the increase of noise (Fig. 6c), the local slope map (Fig. 6f) shows the slope variation tendency of seismic events and the main structure. A by-product i.e., the signal recognition, of each local slope estimations is presented in Fig. 6g–i, where the blue and white areas indicate the spatial–temporal locations of signal and non-signal components of the seismic data. It can be observed that although not very perfect for the last data, all the recognition results graphs the shape of seismic structure.

Fig. 6
figure 6

Synthetic data example. a-c Three synthetic seismic images with different levels of added Gaussian noise. d-f The estimated local slope maps from Fig. 6a, b and c. g-i The corresponding signal recognition results

We need to mention here that the proposed technique is a patch-based network. In other words, for each pixel a neighboring section will be involved in calculation. Therefore, it takes a large amount of time in the network training process. For example, it takes approximate 2 hours for the first ten thousand iterations in the training process. Compared with training process, the prediction step is much faster. For the synthetic data, it takes about 40 s to get the predication result.

In order to compare the performances of the proposed and traditional methods, we also apply the plane-wave destruction method (Claerbout 1992) to estimate the local slope map of the data. The plane-wave destructor originates from a local plane-wave model for characterizing seismic data (Fomel 2002). It is constructed as finite-difference stencils for the plane-wave differential equation and acts as a spatial–temporal analog of the spatial-frequency prediction-error filters. Figure 7 presents three groups of estimated local slope maps of the three synthetic seismic images (Fig. 6a–c) using plane-wave destructor with different local windows. From the top to bottom, the sizes of processing windows are 7 × 3, 25 × 5 and 73 × 7, respectively. In overall, the results are good. As the size increases, the local slope map become smoother but is with lower resolution to distinguish those events. The small size of windows also depresses the performance of plane-wave destructor in low SNR case (Fig. 7c). We can also observe that plane-wave destructor has difficulty in estimating steep slopes as indicated by the red arrows. Comparing with the deep neural network, the plane-wave destruction method cannot directly obtain the spatial–temporal locations of signal like our trained network does. Figure 8 demonstrates the statistical analysis of the performance of local slope estimation. Because the events are curve continuously, the reference local slope is determined as the first order central difference quotient of the corresponding reflection coefficients. Although not very accurate, it can statistically evaluate the performance to some extent. Figure 8a–l are the statistical analysis of the three synthetic seismic data sets, respectively. The first column corresponds to the proposed deep-learning method, and the second to the last columns correspond to three plane-wave destructors. It can be observed that the deep-learning method obtains the smallest error.

Fig. 7
figure 7

Local slope estimations of Fig. 6a–c by plane-wave destructor with different sizes of local windows. a-c 7 × 3 windows. d-f 25 × 5 windows. g-i 73 × 7 windows

Fig. 8
figure 8

A statistical analysis of the performance of local slope estimation. From the top to bottom: the statistical analysis of the three synthetic seismic data sets, respectively. From the left to right: the statistical analysis corresponds to the proposed destructors

3.2 Field example

To demonstrate how the neural network works in practice, we apply it to two real-reflection seismic data sets. The first real-data set is post-stack and shown in Fig. 9a. This data set is with a relatively high SNR and complex structure, e.g., a steep zone and a fault zone as highlighted by the green and blue frame boxes, respectively. To obtain the dip information of seismic events, we apply the trained neural network to calculate the local slope map (Fig. 9b). It can be seen that the neural network obtains a satisfactory performance and the change in color suggests the dip variation. Figure 9c–f presents the magnified sections marked by the green and blue frame boxes in Fig. 9a, where a better view is held to observe the performance of slope estimation in detail. In Fig. 9c, e the local slopes are plotted as the red bars to give us a more intuitive presentation. Figure 9c, d correspond to the blue frame box, which show us the details of the fault zone. We can observe that the color conflicts highlight the location of the faults. The data and local slope information corresponding the green frame box are shown in Fig. 9e, f. It can be seen that the almost all the red bars follow the direction of both steep and complanate seismic events spreading indicating a successful performance. Figure 10 demonstrates a comparison of the local slope estimation by the plane-wave destructor technique. We observe that although most slope information is picked accurately, there some disorganized and very likely false picked slopes as indicated by the red ellipse in Fig. 10a and the blue ellipses in Fig. 10b, d.

Fig. 9
figure 9

The field post-stack data example. a Input data. b Local slope map obtained by the learned network. c and d Magnified sections correspond to the blue frame box. e and f Magnified sections correspond to the green frame box

Fig. 10
figure 10

The field post-stack data example. a Local slope map obtained by the plane-wave destructor. b and c Magnified sections correspond to the blue frame box in Fig. 9a. d and e Magnified sections correspond to the green frame box in Figure

The second example explores the performance of the synthetically-trained network on a pre-stack data set. The original data set is shown in Fig. 11a. Similarly, we apply the trained neural network to estimate the local slope map and recognize the position of signals. Figure 11b shows the local slope estimation, which presents a distinct color piece change from blue to green and yellow (from left to right) in the shallow (0–2000 ms), but shows a somehow disorganized image in the deep (2000–4000 ms) due to the poor SNR. The seismic data and corresponding slope information in the blue and green frame boxes in Fig. 11a are presented in Fig. 11c–f. Figure 12 presents the local slope estimation by the plane-wave destructor technique where we can find some inaccurate slopes by visual inspection as highlighted by the blue ellipses in Fig. 12b, d. The slope estimation by the synthetically-trained network can be further used to facilitate the subsequent seismic data processing and underground structure analysis/interpretation.

Fig. 11
figure 11

The field pre-stack data example. a Input data. b Local slope map obtained by the learned network. c and d Magnified sections correspond to the blue frame box. e andf Magnified sections correspond to the green frame box

Fig. 12
figure 12

The field pre-stack data example. a Local slope map obtained by the plane-wave destructor. b and c Magnified sections correspond to the blue frame box in Fig. 11a. d and e Magnified sections correspond to the green frame box in Fig. 11a

4 Conclusions

We have designed a deep learning-based neural network to automatically estimate the local slope and spatial–temporal position of seismic events. In the designed network, three convolution and three fully connected layers serve as the feature extractor and classifier, respectively. Our neural network is trained by using only synthetic seismic data sets which greatly simplifie the process of training data collection because no manual labeling is needed. The application of the trained model to multiple synthetic and field seismic data sets recorded at varied surveys demonstrate an encouraging performance. The estimated local slopes and spatial–temporal position can be further utilized for many subsequent procedures such as signal detection, normal move-out correction, time-domain imaging and structural interpretation. Although this technique is currently tested on 2D seismic data but can be easily extended to 3D cases.