1 Introduction

Discriminating between earthquakes and explosions poses a significant challenge in seismology. When the source of seismic signals is unidentified, seismic stations and seismometers capture a range of local earth vibrations. Therefore, it is essential to classify the source of the signals as a preliminary step before engaging in seismic signal processing and analysis. Misidentifying artificial seismic events, such as quarry blasts, mining-induced earthquakes, and underground nuclear tests (Gitterman et al., 1998; Tibi et al., 2018; Zhao et al., 2015) can have serious implications. The conventional approach involves a visual examination of earthquake and explosion data or computation of specific record features to achieve seismic discrimination. However, this manual process demands a significant amount of effort and time from earthquake analysts, leading to potential errors in discrimination. To mitigate these challenges, employing a machine learning classifier for seismic discrimination can streamline the workload and reduce the occurrence of discrimination errors.

The primary focus of early seismology applications of machine learning (ML) was on analyzing the amplitude spectra of seismic waveforms from both natural earthquakes and nuclear explosions (Dowla et al., 1990; Dysart & Pulli, 1990). Successful utilization of ML techniques has been observed in distinguishing natural earthquakes from various other seismic events such as underground nuclear explosions (UNEs), submarine explosions, and volcano-tectonic events at local or regional distances.

ML techniques like support vector machines (SVM), random forests, self-organizing maps, and naive Bayes classification have demonstrated efficacy in discriminating between natural earthquakes and explosions (Pyle and Walter, 2019; Kim et al., 2020; L. Dong et al., 2014).

Several proposed SVM variants use feature vectors built from the frequency content estimated in the frequency domain and the amplitude calculated in the time domain (Ahn, H. et al., 2022). We therefore used these machine learning techniques along with three other nonlinear models (decision tree, K-nearest neighbors (KNN), and extreme gradient boosting) and one more linear model (logistic regression).

These models were chosen for inclusion because, according to Abdalzaher et al. (2021), they are utilized to identify artificial earthquake sources. For this goal, building an ideal model requires careful consideration of both ML architecture and training data choices.

In artificial intelligence (AI) systems, enlarging the training dataset generally improves model performance. However, records of nuclear explosions are scarce worldwide, so we used all of the nuclear data available to us. The number of earthquake records could be increased almost at will, but doing so would unbalance the classes and harm the models: the resulting overfitting biases the model towards one class and leaves it unable to recognize the other. Several studies have shown that useful models can be built from very little data. For example, "Nonlinear methodologies for identifying seismic event and nuclear explosion using random forest, support vector machine, and naive Bayes classification" used data from just 47 records (L. Dong et al., 2014).

In this research, we discriminate between earthquakes and nuclear explosions using statistical approaches that examine P-wave and S-wave spectra relative to a standard earthquake source model (Allmann et al., 2008), complexity and spectral ratio parameters, and body-wave (mb) and surface-wave (Ms) magnitudes (Holt et al., 2019; Koper et al., 2016; Wang et al., 2021). As a first step, six distinctive features are extracted from the event waveforms. Seven ML models are then applied to these features for five regions around the world (China, USA, North Korea, Pakistan, and India), using IRIS data, to improve on the accuracy of manual feature extraction.

The proposed approach is intended for online automatic event detection, where it identifies whether an event is natural or artificial, a critical decision in earthquake early warning (EEW). To this end, we evaluated the effectiveness of supervised machine learning algorithms in distinguishing between nuclear explosions and earthquakes.

The remainder of this paper is organized as follows. The Data Collection and Analysis section describes the data collected from broadband and long-period seismic stations for 35 events in the five aforementioned regions. The Methodology section presents the feature extraction methods and the machine learning techniques employed in this investigation.

The Results and Discussion section reports the accuracy of these models when applied to both the extracted features and the time-domain waveforms, as well as the optimal settings for each model.

2 Data Collection and Analysis

Broadband digital waveform data for both earthquakes and nuclear explosions were acquired from IRIS. The dataset comprises 35 seismic events: 16 natural earthquakes and 19 nuclear explosions. The events were selected from diverse test sites in China, India, Pakistan, North Korea, and the USA, and occurred between 1945 and 2017. The data were recorded by 236 broadband and long-period seismic stations operating at sample rates of 20 and 40 Hz, and the events span a magnitude range of 4 ≤ mb ≤ 6.5. We focused on natural earthquakes that were similar in magnitude to the nuclear explosions and occurred close to the test sites. Because nuclear explosion data from 1945 to 1990 are scarce worldwide, we concentrated on nuclear explosions after 1990. For the China region, we used data from 7 earthquakes and 7 explosions; for the India region, one earthquake and one explosion; for the Pakistan region, one earthquake and one explosion; for the North Korea region, 3 earthquakes and 6 explosions; and for the USA, 4 earthquakes and 4 explosions. The spatial distribution of these events and the stations used is shown in Fig. 1. Fig. 2 gives an example of the waveform of an earthquake recorded at the BJT station on April 30, 2014 (left), together with the waveform of the nuclear explosion recorded at the same station on September 3, 2017.

Figure 1

Epicentral location of seismic events (natural earthquakes and nuclear explosions) and seismic stations used for discrimination in five regions around the world

Figure 2

Waveform samples recorded at the BJT station: the earthquake of April 30, 2014 (left) and the nuclear explosion of September 3, 2017

Exploratory data analysis is a crucial first step in understanding complex data and in developing effective algorithms, irrespective of the data type. Multiple discrimination methods in the time and frequency domains were employed to distinguish natural events from nuclear explosions across the five zones: complexity (C), spectral ratio (Sr), the spectral characteristics given by the P-wave and S-wave corner frequencies (FcP, FcS), and the body-wave and surface-wave magnitudes (mb, Ms). To assess how well these features separate the two classes, Fig. 3 shows the distribution of each feature for all earthquakes and explosions in the same region, drawn as a continuous probability density curve for each class.

Figure 3

Distributions of the features (complexity “C,” spectral ratio “Sr,” corner frequencies “FcP” and “FcS,” and body-wave and surface-wave magnitudes “mb” and “Ms”) of all earthquakes and explosions in the same region

3 Methodology

Automatic classification in this study relies on two classifiers: one that works on time-domain waveforms and one that works on features extracted from the collected seismograms. During feature extraction from the compiled waveform datasets, we derived four groups of seismic source discrimination attributes, which together yield six features (C, Sr, FcP, FcS, mb, and Ms).

(1) Complexity (\(C\)): The complexity parameter compares, in the time domain, the energy content of the first five seconds of the P-wave (from \({t}_{0}\) to \({t}_{1}\)) with that of the following thirty seconds, for both natural and artificial events. It is determined by the equation of Ari and Yosida (2004):

$$C=\frac{\int_{t_{0}+5}^{t_{2}}s^{2}(t)\,dt}{\int_{t_{0}}^{t_{0}+5}s^{2}(t)\,dt},$$
(1)

where \(s(t)\) is the signal amplitude as a function of time \(t\). \(C\) is thus the ratio of the integrated power of the vertical-component velocity seismogram, \({s}^{2}(t)\), in two time windows delimited by \({t}_{0}\), \({t}_{1}\), and \({t}_{2}\): \({t}_{0}\) is the onset time of the P-wave, \({t}_{1}\) marks the end of the first five seconds, and \({t}_{2}\) marks the end of the following thirty seconds.
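The complexity ratio of Eq. 1 can be sketched numerically as follows. This is a minimal illustration rather than the processing code used in the study: the function name `complexity`, its arguments, and the synthetic trace are hypothetical, and the integrals are approximated by sums of squared samples.

```python
import numpy as np

def complexity(s, fs, p_onset, first_window=5.0, second_window=30.0):
    """Energy in the 30 s after the first 5 s, divided by energy in the first 5 s (Eq. 1).

    s: vertical-component velocity samples; fs: sample rate in Hz;
    p_onset: P-wave onset time t0 in seconds from the start of the trace.
    """
    s = np.asarray(s, dtype=float)
    i0 = int(round(p_onset * fs))              # index of t0
    i1 = i0 + int(round(first_window * fs))    # index of t1 = t0 + 5 s
    i2 = i1 + int(round(second_window * fs))   # index of t2 = t1 + 30 s
    if i0 < 0 or i2 > s.size:
        raise ValueError("trace does not cover the requested windows")
    dt = 1.0 / fs                              # rectangle-rule integration step
    early = np.sum(s[i0:i1] ** 2) * dt
    late = np.sum(s[i1:i2] ** 2) * dt
    return late / early

# Synthetic trace (assumption): amplitude 1.0 for 5 s after onset, then 0.5 for 30 s.
fs = 20.0
onset = 10.0
trace = np.zeros(int(60 * fs))
i0 = int(onset * fs)
trace[i0:i0 + int(5 * fs)] = 1.0
trace[i0 + int(5 * fs):i0 + int(35 * fs)] = 0.5
c_value = complexity(trace, fs, onset)
print(c_value)  # approximately 1.5 (7.5 / 5.0)
```

An impulsive source concentrates its energy in the first window and therefore gives a small ratio, while a long coda pushes the ratio up.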

(2) Spectral ratio (\({S}_{r}\)): This is another parameter for discriminating underground explosions from natural earthquakes, computed in the frequency domain. \({S}_{r}\) is defined as the ratio of the integrated spectral amplitudes \(a(f)\) of the P-coda in two selected frequency bands, a high-frequency band (\({h}_{1}\), \({h}_{2}\)) and a low-frequency band (\({l}_{1}\), \({l}_{2}\)), which are selected visually from the spectra of the earthquake and the explosion at the same station. \({S}_{r}\) is calculated by the following equation (Gitterman & Shapira, 1993):

$${S}_{r}=\frac{\int_{h_{1}}^{h_{2}}a(f)\,df}{\int_{l_{1}}^{l_{2}}a(f)\,df}.$$
(2)

To calculate the spectral ratio (\({S}_{r}\)), the waveforms are first converted from the time domain to the frequency domain. The limits of the high-frequency band (\({h}_{1}\), \({h}_{2}\)) and the low-frequency band (\({l}_{1}\), \({l}_{2}\)), which serve as the integration limits, are chosen from the spectra of both the explosion and the earthquake by testing a number of frequency bands in order to find the best discriminating bands.
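A minimal sketch of Eq. 2, assuming the amplitude spectrum is taken from a real FFT of the windowed P-coda. The function name `spectral_ratio`, the band limits, and the two-tone test signal are hypothetical; the bands used in the study were chosen by inspecting the spectra.

```python
import numpy as np

def spectral_ratio(s, fs, high_band, low_band):
    """Integrated spectral amplitude in high_band divided by that in low_band (Eq. 2).

    high_band and low_band are (f_min, f_max) tuples in Hz.
    """
    s = np.asarray(s, dtype=float)
    amp = np.abs(np.fft.rfft(s))                  # amplitude spectrum a(f)
    freqs = np.fft.rfftfreq(s.size, d=1.0 / fs)   # frequency of each bin in Hz
    df = freqs[1] - freqs[0]                      # bin width used as integration step

    def band_integral(band):
        f_min, f_max = band
        mask = (freqs >= f_min) & (freqs <= f_max)
        return np.sum(amp[mask]) * df

    return band_integral(high_band) / band_integral(low_band)

# Two-tone test signal (assumption): unit amplitude at 1 Hz, half amplitude at 6 Hz.
fs = 40.0
t = np.arange(400) / fs
signal = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 6.0 * t)
sr_value = spectral_ratio(signal, fs, high_band=(5.0, 8.0), low_band=(0.5, 2.0))
print(sr_value)  # approximately 0.5
```

Because both tones complete a whole number of cycles in the window, there is no spectral leakage and the ratio reduces to the ratio of the two tone amplitudes.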

(3) Corner frequency of P and S waves (Fc): A characteristic feature of the displacement spectra Ω (P) of seismic body phases is the corner frequency (Fc), which can be related to the source dimension through the equation:

$$Fc = cv/r.$$
(3)

where v is the elastic wave velocity of the body phase, r is the source radius, and c is a constant of order 1 which depends on the source model assumed (Sharpe, 1942; Kasahara, 1957; Archambeau, 1964; Berchkhemer and Jacob, 1968; Brune, 1970 and 1971).
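As a small worked illustration of Eq. 3, the relation can be evaluated in both directions. The function names and the numerical values below are hypothetical, and `c=1.0` is only a placeholder for the model-dependent constant of order 1.

```python
def corner_frequency(velocity_km_s, radius_km, c=1.0):
    """Corner frequency in Hz from Eq. 3: Fc = c * v / r."""
    return c * velocity_km_s / radius_km

def source_radius(fc_hz, velocity_km_s, c=1.0):
    """Source radius in km, obtained by inverting Eq. 3: r = c * v / Fc."""
    return c * velocity_km_s / fc_hz

# Illustrative values (assumptions): 6 km/s wave speed and a 0.5 km source radius.
fc = corner_frequency(6.0, 0.5)
print(fc)                       # 12.0
print(source_radius(fc, 6.0))   # 0.5
```

For a fixed wave speed, a smaller source radius gives a higher corner frequency, which is why compact sources tend to show higher corner frequencies than earthquakes of similar magnitude.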

(4) Body-wave magnitude (mb) and surface-wave magnitude (Ms): mb was developed to cover teleseismic and deeper shocks. It is determined from broadband, vertical-component seismograms at teleseismic distances, using one-half of the largest peak-to-peak motion of the P-wave at maximum amplitude. In this study, the body-wave magnitudes (mb) were calculated using the IASPEI (2013) formula:

$$\text{mb} = \log\left(A/T\right) + q\left(\Delta, h\right) - 3.0,$$
(4)

where A is the maximum ground amplitude in micrometers (µm) recorded by the WWSPBN (Wood Anderson Short Period Seismograph); T is the period in seconds, with T < 3 s; and q(∆, h) is a calibration function that corrects the amplitude decay for epicentral distance and focal depth, where ∆ is the epicentral distance in degrees (20° ≤ ∆ ≤ 100°) and h is the focal depth in km (Gutenberg, 1945a).

Ms, in contrast, was introduced to cover larger shocks and is one of the most widely used magnitude scales for large damaging earthquakes. It is determined from the amplitude of Rayleigh waves with a period of 20 s recorded at teleseismic distances, using Eq. 5 (Vanĕk et al., 1962):

$$\text{Ms} = \log\left(A/T\right) + 1.66\log\Delta + 0.3,$$
(5)

where A is the maximum ground amplitude in micrometers (µm) recorded by the WWLPBN (Wood Anderson Long Period Seismograph), T is the period in seconds, so that A/T is in microns per second, and ∆ is the epicentral distance from the event to the station in degrees, with 20° ≤ ∆ ≤ 160°.
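Eqs. 4 and 5 can be written as two short functions. This is a sketch under stated assumptions: the function names are hypothetical, the logarithms are taken as base 10, the calibration value q(∆, h) is supplied by the caller because the calibration table is not reproduced here, and the additive constants are those of the equations as written (they are exposed as arguments because such constants depend on the amplitude unit convention).

```python
import math

def body_wave_magnitude(amplitude, period, q, constant=-3.0):
    """mb from Eq. 4: log10(A/T) + q(delta, h) + constant, with T < 3 s."""
    if not 0.0 < period < 3.0:
        raise ValueError("Eq. 4 is stated for periods T < 3 s")
    return math.log10(amplitude / period) + q + constant

def surface_wave_magnitude(amplitude, period, delta_deg, constant=0.3):
    """Ms from Eq. 5: log10(A/T) + 1.66 * log10(delta) + constant."""
    if not 20.0 <= delta_deg <= 160.0:
        raise ValueError("Eq. 5 is stated for 20 <= delta <= 160 degrees")
    return math.log10(amplitude / period) + 1.66 * math.log10(delta_deg) + constant

# Illustrative inputs (assumptions, not values from the dataset).
mb_value = body_wave_magnitude(amplitude=1000.0, period=1.0, q=6.9)
ms_value = surface_wave_magnitude(amplitude=2000.0, period=20.0, delta_deg=100.0)
print(round(mb_value, 2), round(ms_value, 2))  # 6.9 5.62
```

The difference between the two magnitudes is what the mb-Ms discriminant uses, since explosions tend to excite surface waves less efficiently than earthquakes of the same mb.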

Subsequently, ML models were applied to these output features to improve classification accuracy. The models are grouped below as either linear or nonlinear, and the proposed technique built on them is then presented.

3.1 Linear Models

  1. Logistic regression (LR) is a method for binary decision-making that estimates the posterior probability of each class from the features using a logit function, which maps a linear combination of the features to a probability.

  2. Support vector machine (SVM) identifies the optimal hyperplane for K features by processing the input data within a K-dimensional space. The most effective hyperplane is the one for which the data points of the two classes are maximally separated. The support vectors are the data points closest to the hyperplane, and they determine its position.

  3. Naive Bayes (NB): In general, the NB classifier is considered nonlinear. However, if the likelihood factors belong to the exponential family, the NB classifier is a linear classifier. Here, NB uses a Gaussian model to handle the continuous feature values (Abdalzaher et al., 2021) and is therefore referred to as Gaussian Naive Bayes (GNB), a probabilistic technique that predicts the class from the following probability distribution:

    $$p\left(a = p \mid c\right)=\frac{1}{\sqrt{2\pi v^{2}}}\, e^{-\frac{\left(p-\mu\right)^{2}}{2v^{2}}},$$
    (6)

where \(p(\cdot \mid c)\) is the class-conditional probability density, “a” is a continuous input feature taking the value “p”, “c” denotes the class of the event source (earthquake or explosion), \(v^{2}\) is the variance, and “μ” is the mean of the feature within class “c”.
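The Gaussian class-conditional density of Eq. 6 and the resulting class posterior can be sketched in a few lines. The function names, class statistics, and priors below are hypothetical and chosen only to make the arithmetic easy to follow; they are not fitted to the dataset.

```python
import math

def gaussian_density(value, mean, variance):
    """Class-conditional Gaussian density of Eq. 6 for a single feature."""
    return math.exp(-((value - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )

def gnb_posterior(features, class_stats, priors):
    """Posterior probability of each class, treating the features as independent."""
    joint = {}
    for label, stats in class_stats.items():
        likelihood = priors[label]
        for value, (mean, variance) in zip(features, stats):
            likelihood *= gaussian_density(value, mean, variance)
        joint[label] = likelihood
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}

# One feature with per-class (mean, variance); all values are assumptions.
class_stats = {"earthquake": [(0.0, 1.0)], "explosion": [(2.0, 1.0)]}
priors = {"earthquake": 0.5, "explosion": 0.5}
posterior = gnb_posterior([0.0], class_stats, priors)
print(posterior)  # earthquake is favoured, at about 0.88
```

With several features, the per-feature densities are simply multiplied, which is the "naive" independence assumption that gives the method its name.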

3.2 Nonlinear Models

  1. K-nearest neighbors (KNN): The KNN technique assigns an input to a class according to the classes of its nearest neighbors in the training set. The number of neighbors is a crucial hyperparameter of the classification, and the distance metric determines which training samples count as nearest. In weighted variants of KNN, the votes of the training samples are weighted by their similarity to the input, for example their cosine similarity.

  2. Decision tree and random forest (RF): Random forest is an ensemble method built on decision trees, as proposed by Breiman (2001). It comprises a group of base learners that together reduce variance and bias, and its output is given by the following equation:

    $$\widehat{Y}\left(V\right)=\frac{1}{N}\sum_{j=1}^{N}{C}_{j}\left(V\right),$$
    (7)

where V is the input vector, N denotes the number of trees, and \({C}_{j}\left(V\right)\) represents the \(j\)th classifier.

The essence of the RF algorithm lies in randomly selecting subsets of samples, with replacement, from the training dataset. A decision tree is then trained on each of these subsets.
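The bootstrap-and-average idea behind Eq. 7 can be sketched with scikit-learn decision trees. This is an illustrative reconstruction, not the implementation used in the study (which would normally rely on `RandomForestClassifier` directly); the helper names and the synthetic two-cluster data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Train one decision tree per bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap indices, with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1 << 31))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def forest_average(trees, X):
    """Average of the individual tree outputs C_j(V), as in Eq. 7."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

# Synthetic six-feature data (assumption): two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 6)), rng.normal(5.0, 1.0, (40, 6))])
y = np.array([0] * 40 + [1] * 40)
trees = fit_bagged_trees(X, y)
scores = forest_average(trees, X)       # fraction of trees voting for class 1
labels = (scores >= 0.5).astype(int)    # majority vote
print((labels == y).mean())
```

Each tree sees a slightly different resampling of the data and a random subset of features at each split, so averaging their outputs smooths out the errors of any single tree.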

3.2.1 Ensemble Voting

Ensemble modeling is the practice of creating several different models to predict a result, using either different modeling techniques or different training datasets. The ensemble model then combines the predictions of the base models into a single final estimate for unseen data. In this investigation, we used different modeling techniques: all of the linear and nonlinear models were combined with different weighting ratios, since the RF technique performed best in the time-domain classifier.
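A weighted voting ensemble of this kind can be sketched with scikit-learn's `VotingClassifier`. The choice of base models, the soft-voting mode, the weights, and the synthetic data are all assumptions made for illustration; the actual weighting ratios used in the study are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic six-feature binary problem standing in for the real feature table.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
    ],
    voting="soft",         # average the predicted class probabilities
    weights=[1, 2, 1, 1],  # hypothetical weights that favour the random forest
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X[:5])
print(proba)
```

Soft voting averages class probabilities, so a confident base model can outweigh several uncertain ones; hard voting would instead count one vote per model.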

In this work, discriminating earthquakes from nuclear explosions was treated as a two-class prediction problem (binary classification), with outcomes labelled as positive or negative. A binary classifier produces four possible outcomes. If the predicted outcome is positive and the actual value is positive, it is a true positive (TP); if the actual value is negative, it is a false positive (FP). A true negative (TN) occurs when both the predicted outcome and the actual value are negative, whereas a false negative (FN) occurs when the predicted outcome is negative but the actual value is positive. The ROC curve (Fig. 5) shows the trade-off between the false positive rate (FPR) and the true positive rate (TPR) for each class. The true positive rate (TPR) gives the fraction of the positive class that was correctly classified and is calculated by:

$$TPR=\frac{TP}{TP+FN}.$$
(8)

The false positive rate (FPR) gives the fraction of the negative class that was wrongly classified as positive and is calculated by:

$$FPR=\frac{FP}{FP+TN}.$$
(9)
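The two rates can be computed directly from the four confusion-matrix counts, using the standard definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN). The helper names and the small label vectors below are hypothetical; the label convention ('1' for earthquakes, '0' for explosions) follows the one stated for the classifiers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, and FN for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def tpr_fpr(y_true, y_pred, positive=1):
    """True positive rate and false positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn), fp / (fp + tn)

# Toy labels (assumption): 1 = earthquake, 0 = explosion.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)  # 2/3 and 1/4
```

Sweeping the decision threshold of a classifier and plotting the resulting (FPR, TPR) pairs traces out the ROC curve.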

4 Proposed System Model

The proposed methodology, which blends linear and nonlinear ML approaches, is shown in Fig. 4. The proposed models fall into two types of classifiers. The first type operates on six features (C, Sr, mb, Ms, FcP, FcS), which were computed for 236 Z-component waveforms, comprising 125 earthquake and 111 explosion records. The second type is a binary classifier that works on the waveform of each event, with '0' representing nuclear explosions and '1' representing earthquakes. To expand the dataset, the number of waveforms was increased to 1001 (622 earthquake and 379 nuclear explosion records) by gathering data from both horizontal components (N and E) in addition to the vertical component (Z) for each region; however, the low signal-to-noise ratio at these stations prevented us from computing the abovementioned features for them. We employed a diverse range of linear and nonlinear ML models, implemented in Python with the Scikit-Learn library. Achieving optimal model performance required parameter adjustments and careful model setup.

Figure 4

Proposed system model

To evaluate these models thoroughly, two distinct training-to-test dataset split ratios (20:15% and 20:25%) were employed, which allowed both the features and the waveforms to be examined. The outcome of this evaluation guided the configuration parameters used to optimize the best model for both the feature and the waveform classifiers. Initially, we retained all parameters, considering them valuable for training. Hyperparameters were fine-tuned during the training phase. Subsequently, for each ratio, we computed the error rate of every ML classification model evaluated.

In this study, the ideal settings of each machine learning model were found by trial and error: each model's parameter values were adjusted until the best accuracy was achieved, and the settings were selected on the basis of these best results (Abdalzaher et al., 2021; Ahn, H. et al., 2022). We used the ROC (receiver operating characteristic) curve and the AUC (area under the curve) as evaluation metrics for classifier performance.
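One systematic way to carry out this kind of trial-and-error tuning is a cross-validated grid search scored by AUC. The sketch below is illustrative only: the synthetic data, the 25% test fraction, the parameter grid, and the five-fold cross-validation are assumptions and do not reproduce the search performed in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic six-feature data standing in for the 236 feature records.
X, y = make_classification(
    n_samples=236, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid of candidate settings to try.
param_grid = {
    "n_estimators": [10, 50, 100],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    RandomForestClassifier(max_features="sqrt", random_state=0),
    param_grid,
    scoring="roc_auc",  # select the settings with the best cross-validated AUC
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, test_auc)
```

Keeping the test split out of the search means the reported AUC is not biased by the tuning itself.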

5 Results and Discussion

Our study concentrated on nuclear explosions in five distinct global regions, together with significant earthquakes that matched the nuclear explosions in magnitude and location, and on extracting six distinct features from these events. In contrast, the majority of articles in this field concentrate on two or four features. Furthermore, our goal is to identify the best model for event waveforms, one that can be used in an online application without requiring the extraction of event features.

Notably, nonlinear ML models showed better classification capability than linear models across the scenarios we investigated. Figure 5 shows the ROC curves obtained for classification between earthquakes and explosions, for all models and both datasets. The AUC score, the area under the ROC curve, is a single summary metric of a model's overall ability to predict the classes correctly. Figure 6 compares each model's AUC score across both datasets.

Figure 5

ROC curves for all models: (a) time-domain waveforms (left side); (b) feature calculations (right side)

Figure 6

Accuracy comparison of the utilized feature and waveforms

Based on the ROC and AUC analysis, the feature extraction classifiers outperformed the waveform classifiers. Table 1 lists the accuracy of each model for both the waveform and the feature extraction classifiers. The LR, RF, KNN, and ensemble models performed best, attaining 100% accuracy based on the six distinctive features used to discriminate between earthquakes and explosions. For time-domain waveforms, the RF and ensemble models performed best, with accuracies of 75.5% and 76.2%, respectively.

Table 1 The accuracy of all models for the time-domain waveform and feature extraction classifiers

As described above, each model's parameters were tuned by trial and error, and the settings giving the best accuracy were retained (Abdalzaher et al., 2021; Ahn, H. et al., 2022). The best RF model used 50 estimators, the 'entropy' criterion, and max_features set to 'sqrt'. The best LR model used the 'lbfgs' solver, which supports only L2 (ridge) regularization or no regularization, with 50 iterations. The best KNN model used the 'euclidean' metric.
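For reference, the reported settings map onto scikit-learn constructors roughly as follows. This is a sketch of the configuration only: any parameter not reported in the text (for example the number of neighbors for KNN) is left at the library default, which is an assumption rather than a statement of what was used.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Random forest: 50 trees, entropy split criterion, sqrt(n_features) per split.
rf = RandomForestClassifier(
    n_estimators=50, criterion="entropy", max_features="sqrt"
)

# Logistic regression: lbfgs solver limited to 50 iterations.
lr = LogisticRegression(solver="lbfgs", max_iter=50)

# K-nearest neighbors: Euclidean distance; n_neighbors left at its default.
knn = KNeighborsClassifier(metric="euclidean")

for name, model in [("RF", rf), ("LR", lr), ("KNN", knn)]:
    print(name, model.get_params())
```

Each object can then be fitted with `fit(X, y)` on the six-feature table or on the waveform samples.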

The comparison of each model's accuracy in our results with that of each model's accuracy in previous work is shown in Table 2.

Table 2 The comparison of each model’s accuracy in our results with that of each model’s accuracy in previous work

Our findings show that the RF model achieved 100% accuracy with feature extraction and 75.5% accuracy with the time-domain waveforms. By comparison, this model obtained accuracies of 98% and 97.5% based on distinctive features, as reported by Abdalzaher et al. (2021) and L. Dong et al. (2014), respectively.

The LR model attained 100% accuracy with feature extraction and 74% accuracy with the time-domain waveforms. Based on distinctive features, this model achieved accuracies of 73% and 86%, as reported by Ahn, H. et al. (2022) and Abdalzaher et al. (2021), respectively.

The SVM model achieved 99.4% accuracy with feature extraction and 66% accuracy with the time-domain waveforms. According to Ahn, H. et al. (2022), Kim et al. (2020), Abdalzaher et al. (2021), and L. Dong et al. (2014), this model's accuracy based on distinctive features was 93.7%, 95.5%, 86%, and 96.3%, respectively.

The KNN model obtained 100% accuracy with feature extraction and 53% accuracy with the time-domain waveforms. By comparison, this model's accuracy based on distinctive features was 96%, according to Abdalzaher et al. (2021).

The DT model attained 97.3% accuracy with feature extraction and 59% accuracy with the time-domain waveforms. By comparison, Abdalzaher et al. (2021) report an accuracy of 96% for this model based on distinctive features.

The Naive Bayes model achieved 99.8% accuracy with feature extraction and 45% accuracy with the time-domain waveforms. According to Ahn, H. et al. (2022) and L. Dong et al. (2014), this model's accuracy based on distinctive features was 91.11% and 95.6%, respectively.

6 Conclusions

Several linear and nonlinear ML models were used in the proposed technique. Extensive experiments led to the selection of the best models, which rely on six features to provide the best classification between the two groups. For the feature extraction dataset, the best models (LR, RF, KNN, and the voting classifier) recorded 100% classification accuracy based on 35 events (natural earthquakes and nuclear explosions) observed by 236 seismic stations in five regions worldwide. For the time-domain waveform dataset, the best models (RF and the ensemble model) recorded classification accuracies of 75.5% and 76.2%, respectively, based on 1001 waveforms of the same events. In future studies, we aim to increase the accuracy on the waveform dataset by extending the proposed model.