1 Introduction

Determining the structure of the Earth is one of the main aims of geophysical research. In this context, the crustal velocity structure is very important for accurately locating earthquakes, inverting seismic source moment tensors, and rapidly evaluating earthquake intensity distributions. A layered one-dimensional (1D) velocity structure can be used as an initial model for two-dimensional (2D) and three-dimensional (3D) inversions, and regional seismic waveform inversion is an effective tool for determining such a 1D structure. However, seismic waveform inversion problems are usually highly nonlinear and have non-unique solutions (Maurice et al. 2003; Li and Lei 2014a). Linearized and gradient-based methods have traditionally been applied because of their computational efficiency, but they often rely on the initial model and can converge on local minima. Alternative approaches that do not depend on gradient-based searching within the model space can be used instead to improve the probability of convergence on the global minimum.

Among currently available search methods, one class of optimization algorithms inspired by biological evolution has been widely applied to geophysical inversions. Li et al. (2007), for example, used a genetic algorithm (GA) for the inversion of regional seismic waveforms to obtain the crustal velocity structure in Italy, while Abdelwahed and Zhao (2014) applied a micro-GA waveform inversion method in their study of crustal structure in northeastern Japan. Similarly, Maurice et al. (2003), Li et al. (2012), and Li and Lei (2014b) all applied a niching GA (NGA) in crustal velocity inversions to obtain teleseismic and regional velocity structures, while Li et al. (2016) applied an NGA to the inversion of waveforms from ocean-bottom seismometers in order to determine the structure of the oceanic crust.

The fireworks algorithm (FWA) is a recently proposed swarm intelligence-based method for nonlinear global optimization. This algorithm was inspired by firework explosions and is applied to global optimization problems with complex objective functions. Numerical tests of the FWA applied to the inversion of crustal velocity structures have been carried out using synthetic models; in these tests, the approach achieved higher accuracy and better convergence than the NGA (Ding et al. 2015). In this study, we apply the FWA to the inversion of regional seismic waveforms to determine the 1D crustal velocity structure in central Gansu on the northeastern margin of the Qinghai-Tibet plateau. An \(M_{\text{W}}\) 5.4 earthquake took place near Minxian in Gansu Province on July 22, 2013, and was recorded by densely distributed permanent recording stations across the province. These three-component seismic waveform recordings provide valuable data for investigations of regional crustal velocity structure in central Gansu. Several geophysical studies have also been carried out in this region, including the work of Shen et al. (2013) and He et al. (2014), who applied receiver function techniques to measure crustal thickness. In this study, we utilize regional seismic waveform inversion to obtain velocity structures within the crust, which augment our geophysical understanding of this region.

2 The FWA

Tan and Zhu (2010) proposed the FWA for the global optimization of complex objective functions. The FWA is a search method similar to the GA, but its search process mimics the explosion of fireworks. When a firework is detonated, a shower of sparks is generated in the local space; in the FWA, these sparks search the model space in the vicinity of a specific location (Tan et al. 2013).

We applied an FWA variant called the Attract-Repulse FWA (AR-FWA) in this study (Ding et al. 2013). The core components of this variant are the FWA search and attract-repulse mutation, as illustrated by the flowchart in Fig. 1. The main steps of the FWA are as follows:

Fig. 1 Flowchart of the FWA (after Tan and Zhu 2010)

  1. In the initialization step, n fireworks are randomly generated within the search range of a parameter.

  2. A misfit value for each firework is then calculated using the objective function in Eq. (3).

  3. The radius of explosion of each firework, \(A_{i}\), is then calculated using the misfit value in Eq. (1).

  4. Fireworks are detonated, with each generating m sparks randomly distributed within an explosion radius \(A_{i}\). Misfit values are then calculated for each spark, with each firework replaced by its current best-fit (i.e., minimum misfit value) spark.

  5. Attract-repulse mutation is then randomly applied to non-best-fit fireworks to increase their swarm diversity.

  6. Mutated and best-fit fireworks are then chosen for use in the next generation, and this work flow is repeated from (2) until a predetermined termination criterion is satisfied.

The radius of a given firework explosion, \(A_{i}\), is as follows:

$$A_{i} = \hat{A} \cdot \left( {\frac{{f\left( {\varvec{x}_{i} } \right) - y_{\hbox{min} } + \xi }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {f\left( {\varvec{x}_{i} } \right) - y_{\hbox{min} } } \right) + n \cdot \xi }}} \right) + \delta .$$
(1)

In this expression, \(\hat{A}\) denotes the maximum explosion radius, n is the number of fireworks, \(\varvec{x}_{i}\) indicates the ith firework, and \(f(\varvec{x}_{i})\) is the objective function value for the ith firework. The quantity \(y_{\hbox{min}} = \hbox{min}(f(\varvec{x}_{i}),\; i = 1, 2, \ldots, n)\) is the minimum (best fit) objective function value among the n fireworks, and ξ denotes machine precision, which is added to avoid division by zero. A small number, δ, is also introduced to guarantee a nonzero radius and to avoid stalls in the search process.

Thus, according to Eq. (1), a firework with a better (lower) misfit value will generate sparks within a smaller radius, and vice versa. This strategy concentrates the search around promising positions (exploitation), while fireworks with worse misfit values continue to search more widely (exploration).
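The behavior of Eq. (1) can be illustrated with a minimal sketch in Python. The misfit values, the maximum radius, and the value of δ below are hypothetical and chosen only for illustration; the function name `explosion_radii` is likewise not taken from the cited works.

```python
import sys

def explosion_radii(misfits, a_hat, delta=1e-3, xi=sys.float_info.epsilon):
    """Evaluate Eq. (1) for every firework.

    misfits : objective function values f(x_i), one per firework
    a_hat   : maximum explosion radius
    delta   : small constant guaranteeing a nonzero radius (assumed value)
    xi      : machine precision, avoids division by zero
    """
    n = len(misfits)
    y_min = min(misfits)
    denom = sum(f - y_min for f in misfits) + n * xi
    return [a_hat * (f - y_min + xi) / denom + delta for f in misfits]

# Hypothetical misfit values for four fireworks.
misfits = [0.20, 0.35, 0.50, 0.95]
radii = explosion_radii(misfits, a_hat=1.0)
for f, a in zip(misfits, radii):
    print(f"misfit = {f:.2f}  radius = {a:.4f}")
```

In this sketch the best-fit firework receives a radius close to δ, and the fractional parts of the radii sum to the maximum radius, so the worst-fit firework explores the widest neighborhood.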

As fireworks are set off, m sparks are randomly generated within the neighboring space of each firework. The value of the kth dimension of the jth spark generated by the ith firework, \({\text{spark}}_{i,j,k}\), is a random displacement from \({\text{firework}}_{i,k}\): \({\text{spark}}_{i,j,k} = {\text{firework}}_{i,k} + {\text{rand}}\left( { - 1,1} \right) \times A_{i,k}\), where \((i = 1,2, \ldots ,n;\;\;j = 1,2, \ldots ,m;\;\;k = 1,2, \ldots ,D)\). Because the sparks exploit neighboring solution spaces, this process is referred to as an FWA search. Once the misfit value of each spark has been calculated, each firework is replaced by its current best-fit spark (Ding et al. 2015).
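A minimal sketch of this spark generation step is given below. It assumes that the same radius is used for every dimension of a firework and uses a hypothetical two-parameter toy misfit in place of the waveform objective function; the names `explode` and `toy_misfit` are illustrative only.

```python
import random

def explode(firework, radius, m, misfit, rng):
    """Generate m sparks around one firework and return the best-fit spark.

    Each dimension k of a spark is firework[k] + rand(-1, 1) * radius.
    Assumption: one scalar radius is shared by all dimensions.
    """
    sparks = [
        [x + rng.uniform(-1.0, 1.0) * radius for x in firework]
        for _ in range(m)
    ]
    # The firework is replaced by the spark with the minimum misfit value.
    return min(sparks, key=misfit)

# Hypothetical toy problem: squared distance to a target model.
target = [6.0, 15.0]

def toy_misfit(model):
    return sum((a - b) ** 2 for a, b in zip(model, target))

rng = random.Random(42)          # fixed seed for reproducibility
firework = [5.5, 12.0]
best_spark = explode(firework, radius=1.0, m=10, misfit=toy_misfit, rng=rng)
print(best_spark, toy_misfit(best_spark))
```

Following the description above, the firework is replaced by its best-fit spark; whether the parent firework is retained when no spark improves on it is not specified here, so the sketch does not compare against the parent.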

Attract-repulse mutation maintains FWA search swarm diversity, and a further parameter, c, is introduced to control the mutation probability of each non-best-fit firework. During mutation, each dimension, d, of the ith firework has a 50% probability of being moved to a new position according to a random number, s, and the distance between the non-best-fit and best-fit fireworks. This random number, s, has a uniform distribution within the range \((1 - \delta s, \;\;1 + \delta s\)); thus, if s is less than 1, non-best-fit fireworks will be ‘attracted’ by the best-fit firework toward the currently optimal position, while if s is greater than 1, these non-best-fit fireworks will be ‘repulsed’ to explore further spaces. This choice between ‘attract’ and ‘repulse’ reflects the balance between exploitation and exploration, and \(\delta s\) was set to 0.8 to enable faster convergence, as suggested by Ding et al. (2013). In order to ensure that all fireworks and sparks remain within the defined space, a mapping strategy was applied if mutated fireworks or generated sparks fell outside this range. Thus, if \(\varvec{x}_{i,d}\) lies outside the search boundary of dimension d, it is mapped back onto the search range using Eq. (2), as follows:

$$\varvec{x}_{i,d} = \varvec{b}_{d}^{l} + \left| {\varvec{x}_{i,d} - \varvec{b}_{d}^{l} } \right|\% \left( {\varvec{b}_{d}^{u} - \varvec{b}_{d}^{l} } \right).$$
(2)

In this expression, \(\varvec{b}_{d}^{l}\) and \(\varvec{b}_{d}^{u}\) denote the lower and upper boundaries of the search range of dimension d, while % refers to the modulo operation.
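The mapping in Eq. (2) can be sketched as follows. The function name `map_to_range` is hypothetical, and the example bounds of 5.5 and 6.6 km/s are used purely as an illustration of a search range.

```python
def map_to_range(x, lower, upper):
    """Map an out-of-range value back into [lower, upper) using Eq. (2)."""
    if lower <= x <= upper:
        return x                      # already inside the search range
    return lower + abs(x - lower) % (upper - lower)

# Illustrative search range of 5.5 to 6.6 km/s.
too_high = map_to_range(6.9, 5.5, 6.6)   # about 5.8
too_low = map_to_range(5.2, 5.5, 6.6)    # about 5.8
print(round(too_high, 3), round(too_low, 3))
```

Because the absolute value is taken before the modulo operation, values that overshoot the upper boundary and values that undershoot the lower boundary are both folded back into the range measured from the lower boundary.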

We chose mutated and best-fit fireworks for use in the next generation, calculating the misfit value and explosion radius, \(A_{i}\), of each new firework using Eqs. (3) and (1), respectively.
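The attract-repulse mutation described above can be sketched as follows. The text does not give the update formula explicitly, so this sketch assumes that a mutated dimension is placed at best + (x − best) × s, which moves the firework toward the best-fit position when s < 1 and away from it when s > 1. It also assumes that s is drawn once per firework; the function name `attract_repulse` is hypothetical.

```python
import random

def attract_repulse(firework, best, c=0.9, delta_s=0.8, rng=random):
    """Attract-repulse mutation of one non-best-fit firework (a sketch).

    c       : probability that this firework is mutated at all
    delta_s : half-width of the uniform range (1 - delta_s, 1 + delta_s) for s
    Assumed update rule: new = best + (x - best) * s for each mutated dimension.
    """
    if rng.random() >= c:
        return list(firework)             # firework left unchanged
    s = rng.uniform(1.0 - delta_s, 1.0 + delta_s)
    mutated = []
    for x, b in zip(firework, best):
        if rng.random() < 0.5:            # each dimension mutates with 50% probability
            mutated.append(b + (x - b) * s)
        else:
            mutated.append(x)
    return mutated

rng = random.Random(1)
firework = [4.0, 10.0]
best = [5.0, 8.0]
mutated = attract_repulse(firework, best, rng=rng)
print(mutated)
```

Any mutated value that falls outside the search range would then be mapped back using Eq. (2).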

3 Robustness

In order to validate the performance of the FWA, Tan and Zhu (2010) carried out a series of experimental comparisons against standard and clonal particle swarm optimization (PSO) using nine benchmark test functions. These experimental results show that the FWA outperforms the other two methods in terms of both optimization accuracy and speed of convergence. In addition, Ding et al. (2015) applied both PSO and AR-FWA approaches to the inversion of regional waveforms using synthetic models and data; the results of that study show that both PSO and AR-FWA are more efficient at avoiding premature convergence in numerical tests compared to widely used evolutionary algorithms, including GA, NGA, and differential evolution. These experimental results also show that PSO and AR-FWA approaches generate models that converge very closely to the synthetic examples.

In this study, we numerically test the robustness of AR-FWA in determining crustal velocity structure. To do this, we used a series of synthetic seismograms as observational data; these were calculated using the reflectivity method (Fuchs and Müller 1971) and a given velocity model. The focal depth used for these seismograms is 18.8 km, and the nine stations lie between 146 and 437 km from the epicenter. We constructed a synthetic four-layered isotropic medium model on top of a half-space for this study; the four layers correspond to the sedimentary layer and the upper, middle, and lower crust, while the half-space corresponds to the uppermost mantle. The P-wave velocity, \(v_{\text{P}}\), and the thickness, h, of each layer are presented in Table 1, while the density, ρ, of each layer was obtained using PREM (Dziewonski and Anderson 1981) and the CRUST 2.0 model (Bassin et al. 2000); these are 2.65, 2.75, 2.8, 3.1, and 3.37 g/cm³, respectively.

Table 1 True and FWA inversion results for models A and B

We determined nine 1D velocity model parameters for FWA inversion, including \(v_{\text{P}}\) and h for the four crustal layers and \(v_{\text{P}}\) for the upper mantle. The search range considered in this study is delimited by the lower and upper dashed line boundaries in Fig. 2. We inverted two velocity structure models in this test, one in which the velocity of layers roughly increases with depth (model A), and another that contains a low-velocity middle crust (model B). Together, these two models test whether the inversion remains robust when the true model contains a low-velocity layer.

Fig. 2 Comparison between the FWA inverted model and a given true model in numerical tests. The solid black line denotes the given true model, while the black dotted and dashed lines mark the upper and lower boundaries of the velocity and depth search range. The red solid line denotes the inverted average velocity model, and the light red dashed line denotes ±1 SD of each model parameter. Upper left: given true model A without a low-velocity layer; upper right: given true model B incorporating a low-velocity layer. Lower: the average misfit convergence value of model A (solid line) and model B (dashed line) with increasing generation

We set c to 0.9 in each numerical robustness test, while the numbers of fireworks, n, and sparks, m, were both fixed at ten in each generation, following previous FWA (Ding et al. 2015) and NGA inversions (Li et al. 2012, 2016) and taking computing efficiency into account. Because there are n × m forward computations and objective function evaluations in each generation, larger values of n or m require a longer computation time. Since a total of 100 (10 × 10) sparks are generated in each generation and there are 500 generations in each FWA inversion, the optimizer conducts up to \(5 \times 10^{4}\) function evaluations in each run. We ran our FWA inversions five times separately using different random seed values; because each inversion finds one optimal model, our approach generates five models after five inversions (runs). A final inverted model was then obtained by averaging the five optimal models and calculating their standard deviations as the final model error.
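The evaluation budget and the averaging of the five runs can be illustrated with a short sketch. The five best-fit models listed below are hypothetical placeholder values for two parameters only, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# Evaluation budget per run, using the settings quoted in the text.
n_fireworks, m_sparks, generations = 10, 10, 500
evaluations = n_fireworks * m_sparks * generations
print(evaluations)  # 50000

# Hypothetical best-fit models from five runs: [vP (km/s), h (km)].
runs = [
    [6.05, 15.2],
    [6.10, 15.9],
    [6.02, 15.5],
    [6.08, 16.1],
    [6.10, 15.8],
]

# Average each parameter over the runs; the spread is taken as the model error.
final_model = [mean(col) for col in zip(*runs)]
model_error = [stdev(col) for col in zip(*runs)]   # sample SD (assumption)
print(final_model, model_error)
```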

The mean values and standard deviations for each parameter in our final model inversion are listed in Table 1 and illustrated in Fig. 2. The results show that mean values for inverted parameters are close to true preset values, while standard errors are relatively small, between 0.03% and 3.3%. The difference in misfit convergence between the two models (Fig. 2) is also very small. These accurate and stable numerical test results show that the FWA is suitable for the inversion of waveforms.

4 Application of the FWA in central Gansu Province

The region of China investigated in this study lies to the northeast of the Qinghai-Tibet plateau (34°–39°N, 100°–106°E). The \(M_{\text{W}}\) 5.4 earthquake took place near Min County in Gansu Province, China, on July 22, 2013. The epicenter of this event was adjacent to the Lintan-Tanchang fault, on the boundary between the Qilian and Qaidam blocks (Fig. 3). Shen et al. (2013) obtained the crustal thickness of this region using the receiver function method; according to their results, the average crustal thickness beneath the same five stations that we used is 46.76 km. According to the seismic tomography inversion results published by Zhou et al. (2006), the crustal structure of this region comprises four layers. The first of these corresponds to sediments at depths down to 3 km, while the second to fourth layers comprise the upper (between 3 and 17 km), middle (between 17 and 36 km), and lower crust (between 36 km and the depth of the Moho).

Fig. 3 Study area. The focal mechanism beach ball denotes the epicenter of the \(M_{\text{W}}\) 5.4 Minxian earthquake, the blue solid squares mark the positions of nine measurement stations, and black solid lines indicate faults. Abbreviations: I, Qilian block; II, Qaidam block

The parameters of the \(M_{\text{W}}\) 5.4 Minxian earthquake are shown in Table 2. The focal depth of this event is 18.8 km. We used the reflectivity method developed by Fuchs and Müller (1971) for forward modeling. This widely applied method is known for its high degree of accuracy and speed (Jia and Zhang 2008; Li et al. 2012). Our inversion model comprises four layers on top of a half-space: a sedimentary layer and the upper, middle, and lower crust, with the half-space representing the uppermost mantle. Each layer is characterized by a P-wave velocity, \(v_{\text{P}}\), an S-wave velocity, \(v_{\text{S}}\), a density, ρ, and a thickness, h. We fixed the velocity ratio between the P- and S-waves at \(v_{\text{P}}/v_{\text{S}} = 1.73\), while layer densities were calculated using PREM and the CRUST 2.0 model as 2.65, 2.75, 2.8, 3.1, and 3.37 g/cm³, respectively. We determined a total of nine parameters via waveform inversion, including four layer thicknesses and five P-wave velocities (Fig. 4).

Table 2 Earthquake parameters used in this study (http://www.globalcmt.org)
Fig. 4 Left: average inversion models. The solid line in this figure denotes the final crustal velocity model, while the two dashed lines represent ±1 SD for model parameters. Right: convergence of the average model misfit value with increasing generation

The \(v_{\text{P}}\) search ranges we applied to the sedimentary layer and to the upper, middle, and lower crust were between 4.0 and 6.4 km/s, 5.5 and 6.6 km/s, 5.75 and 7.0 km/s, and 6.0 and 7.45 km/s, respectively. The \(v_{\text{P}}\) search range we applied to the upper mantle was between 7.3 and 8.4 km/s, while the search ranges for the thicknesses of the four layers were between 1.0 and 15.0 km, 5.0 and 20.0 km, 5.0 and 25.0 km, and 5.0 and 25.0 km, respectively. We randomly generated an initial inversion model within each search range.

Instrument responses and trends were removed from the observed seismic waveform data in each case, and the data were filtered with a 0.02 to 0.1 Hz Butterworth band-pass filter to remove high frequencies influenced by small-scale lateral heterogeneity. The objective function we used for waveform inversion consists of a weighted combination of root mean square residuals (RMR) and the correlation of observed and synthetic waveforms in the time domain, as follows:

$$F = \left( {1 - \lambda } \right) \times \frac{{\sqrt {\mathop \sum \nolimits_{i} \mathop \sum \nolimits_{j} \left( {{\text{Obs}}_{ij} - {\text{Syn}}_{ij} } \right)^{2} } }}{{N_{\text{w }} \times \sqrt {\mathop \sum \nolimits_{i} \mathop \sum \nolimits_{j} {\text{Obs}}_{ij}^{2} } }} + \lambda \times \left[ {1 - \frac{1}{{N_{\text{w }} }} \times \mathop \sum \limits_{i} \frac{{{\hbox{max}} \left( {{\text{Obs}}_{i} \times {\text{Syn}}_{i} } \right)}}{{\sqrt {{\text{Obs}}_{i} \times {\text{Obs}}_{i} } \sqrt {{\text{Syn}}_{i} \times {\text{Syn}}_{i} } }}} \right].$$
(3)

Here, (Obs × Syn) denotes the cross-correlation of the observed and synthetic waveforms:

$$\left( {{\text{Obs}} \times {\text{Syn}}} \right)\left( \tau \right) = \int {{\text{Obs}}\left( t \right){\text{Syn}}\left( {t - \tau } \right){\text{d}}t}$$

In this expression, \(N_{\text{w }}\) refers to the number of waveforms, \(\lambda\) is the weight parameter used to balance the RMR and waveform correlation error, while \({\text{Obs}}_{ij}\) and \({\text{Syn}}_{ij}\) refer to the amplitudes of the ith component of the observed and synthetic waveforms at the jth time point, respectively. In the objective function, the first term is the RMR, which measures the amplitude difference between synthetic and observed data, while the second term measures the similarity between observed and synthetic waveforms, including phase information. The weight parameter, \(\lambda\), was set to 0.5 in this inversion so as to give the same weight to the RMR and the waveform correlation coefficient (Li et al. 2016).
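A minimal sketch of Eq. (3) is given below for short, discretely sampled waveforms. It assumes that every observed and synthetic trace has the same length and is nonzero, and it replaces the integral in the cross-correlation by a sum over samples; the function names and the example traces are hypothetical.

```python
import math

def max_xcorr(obs, syn):
    """Maximum over all lags of sum_t obs[t] * syn[t - lag]."""
    n = len(obs)
    best = -math.inf
    for lag in range(-(n - 1), n):
        total = 0.0
        for t in range(n):
            u = t - lag
            if 0 <= u < n:
                total += obs[t] * syn[u]
        best = max(best, total)
    return best

def waveform_misfit(obs, syn, lam=0.5):
    """Eq. (3): weighted sum of the RMR term and the correlation term.

    obs, syn : lists of traces (one list of samples per waveform)
    lam      : weight parameter lambda
    """
    n_w = len(obs)
    residual = math.sqrt(sum((o - s) ** 2
                             for ow, sw in zip(obs, syn)
                             for o, s in zip(ow, sw)))
    energy = math.sqrt(sum(o * o for ow in obs for o in ow))
    rmr = residual / (n_w * energy)
    corr = 0.0
    for ow, sw in zip(obs, syn):
        norm = math.sqrt(sum(o * o for o in ow)) * math.sqrt(sum(s * s for s in sw))
        corr += max_xcorr(ow, sw) / norm
    return (1.0 - lam) * rmr + lam * (1.0 - corr / n_w)

# Hypothetical two-trace example; the synthetics are the observations delayed by one sample.
obs = [[0.0, 1.0, 2.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0, 0.0, 0.0]]
syn = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0, -1.0, 0.0]]
print(waveform_misfit(obs, obs), waveform_misfit(obs, syn))
```

In this example the delayed synthetics have the same shape as the observations, so the correlation term vanishes and the misfit comes entirely from the RMR term, which penalizes the sample-by-sample amplitude difference caused by the time shift.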

As in the numerical tests described above, we set the number of fireworks, n, and the number of sparks, m, to ten for each generation, and ran FWA inversions for 500 generations using five different random seed values. This approach meant that the optimizer conducted up to \(5 \times 10^{4}\) function evaluations in each run, and a final inverted velocity model was obtained by averaging the five optimal models.

5 Results and discussion

Utilizing regional waveform inversions of seismic data, we determined an average 1D velocity structural model for central Gansu Province. Our average inversion model including standard errors is shown in Table 3. According to this inversion model, the thicknesses of the sedimentary layer and the upper, middle, and lower crust are 3.15, 15.69, 13.08, and 14.83 km, respectively, while the corresponding \(v_{\text{P}}\) values are 4.47, 6.07, 6.12, and 6.87 km/s, respectively, and the inverted \(v_{\text{P}}\) of the uppermost mantle is 8.18 km/s. In a previous study, Liu et al. (2006) used \(v_{\text{P}}\) intervals to determine the crustal composition of the northeastern margin of the Qinghai-Tibet plateau. Interpreted in terms of these intervals, our results indicate that both the upper and middle crust contain felsic material, with a total thickness of 28.77 km, while intermediate material is present in the lower crust, which has a thickness of 14.83 km.

Table 3 A 1D average inverted crustal velocity model of central Gansu Province (including standard errors)

The results of our study show that the thickness of the crust in central Gansu Province is 46.75 km, consistent with previously published receiver function results (Shen et al. 2013). That study calculated crustal thicknesses of 50.4 km at station BYT, 51.6 km at station LED, 46.2 km at station LXA, 49.2 km at station LZH, and 43.2 km at station YDT, translating to an average crustal thickness for the region covered by these five stations of 46.76 km. In this inversion, the velocity of the upper mantle is 8.18 km/s, consistent with the upper mantle value used in the CRUST 1.0 model (Laske et al. 2013).

Comparisons of the synthetic waveforms generated by our final inverted 1D velocity model with those observed at the nine measurement stations are presented in Fig. 5. The synthetic and observed waveforms show a good overall fit. For the purposes of discussion, the nine seismic stations in this study can be divided into three groups based on their azimuths from the epicenter.

Fig. 5 Comparison of synthetic waveforms (red) from our final average inversion model with those observed (black) at nine stations in central Gansu Province. Station names and epicentral distances are listed at the top of each panel. The vertical component for the LXA station, as well as the transverse and radial components for the LZH station, were not used in the inversion because the observational records for these components contain high levels of noise

Measurement stations LZH, YDT, SGT, and HYS all have a similar azimuth of about 340° from the epicenter. In these cases, both synthetic P- and S-waves for three components match the observed waveforms very closely, indicating that our inverted crustal velocity model accurately reproduces them in this orientation.

However, stations BYT and JTA are orientated at azimuths of about 355° from the epicenter; in these cases, while both the vertical and radial components match well with observations, there are obvious differences in the transverse component compared to observed waveforms. These differences may be the result of anisotropy and lateral heterogeneity in the crust along the BYT and JTA profiles with respect to the epicenter.

Stations LXA, LED, and MEY all have similar azimuths of about 319° with respect to the epicenter. Although the P-waves fit well at these stations, phase shifts between synthetic and observed S-waves are clearly seen in the vertical and radial components. He et al. (2014) obtained a Moho depth map for China using the receiver function technique; the results of their study suggest that the depth of the Moho increases from about 45 km in the northeastern part of our study area to about 55 km in the southwest. Thus, the observed phase shift in S-waves may be due to the Poisson solid, isotropy, and homogeneous medium assumptions that are inherent to the 1D velocity model. Research in South Korea (Chang and Baag 2006) and the northeastern USA (Viegas et al. 2010) has also noted crustal anisotropy, which is beyond the scope of the 1D velocity model. This phenomenon should be further considered in future 2D and 3D crustal structural inversions.

The focal mechanism and location of an earthquake can also affect a final model. A station located on a nodal plane of the focal mechanism will record a seismogram with a smaller amplitude, so it is preferable to avoid the use of data from such stations in inversions. The distance from the epicenter can also affect models: seismograms recorded at short epicentral distances provide poor constraints on deeper parts of models because the turning point of the ray path is shallower. Because we utilized seismograms recorded simultaneously at nine stations, our model can be considered to be well constrained.

6 Conclusions

We present a 1D crustal velocity model for central Gansu Province that is based on regional waveform inversion using the FWA. The three-component seismic waveform data used in this study were recorded by the regional seismic station network in Gansu Province. Numerical tests, the accuracy of our average inversion results, and the standard deviations show that the FWA is suitable for regional waveform inversion. This inverted crustal velocity model indicates that the thicknesses of the sedimentary layer and the upper, middle, and lower crust are 3.15, 15.69, 13.08, and 14.83 km, respectively, while the \(v_{\text{P}}\) values for these layers are 4.47, 6.07, 6.12, and 6.87 km/s, respectively. The inverted \(v_{\text{P}}\) of the uppermost mantle is 8.18 km/s, while the crustal thickness is 46.75 km. The synthetic waveforms produced from this 1D model are in close overall agreement with observations, although the differences that remain may be due to crustal anisotropy and lateral heterogeneity that were not taken into account during inversion. We suggest that these effects should be taken into account in future 2D and 3D velocity structural inversions.