Applying unsupervised machine-learning algorithms and MUSIC back-projection to characterize 2018–2022 Hualien earthquake sequence

The city of Hualien in eastern Taiwan is located at the junction of the Luzon arc–Eurasian continent collision and the oblique subduction of the Philippine Sea plate along the Ryukyu Trench. This complex tectonic setting creates one of the most intense seismic hazards in the world. Since the mid-twentieth century, three earthquake sequences (1951, 1986–1992, and 2018–2022) have caused severe damages and revealed a diversity of focal mechanisms, demonstrating a complex seismogenic tectonic interactions of collisions and subductions. In this study, we used the two-dimensional back-projection method to examine rupture characteristics of three strong earthquakes during the latest sequence (2018–2022). We also studied the same earthquake sequence using unsupervised machine-learning via density-based spatial clustering of applications with noise (DBSCAN) and principal component analysis (PCA). Our results revealed two westward dipping seismogenic structures. The deeper one lies beneath the eastern flank of the Central Range and the shallower one rotates clockwise along the coastline from south to north. Overall, focal mechanisms within these two structures are consistent with planar geometries obtained using PCA. The deeper structure beneath the Central Range is likely an imbricated fault system. The coastline structure is characterized by a transition of strike and dip angles from south to north. We suggest that the subhorizontal nodal planes in earthquakes offshore Hualien are related to the shallow northward dipping patch in the northern part of the coastal structure. Our Coulomb stress change results revealed strong linkages between these two structures. We used unsupervised machine-learning algorithms DBSCAN and PCA to study the 2018–2022 Hualien earthquake sequence. A deep westward-dipping and a shallow rotation structure system are revealed from earthquake clusters close to the northernmost Longitudinal Valley. Coulomb stress change is used to ascertain cascaded triggering on these two structures. We used unsupervised machine-learning algorithms DBSCAN and PCA to study the 2018–2022 Hualien earthquake sequence. A deep westward-dipping and a shallow rotation structure system are revealed from earthquake clusters close to the northernmost Longitudinal Valley. Coulomb stress change is used to ascertain cascaded triggering on these two structures.


Introduction
On Feb. 6, 2018, a destructive earthquake (M W 6.4) struck Hualien city in eastern Taiwan, close to the junction between the Longitudinal Valley Suture and the Ryukyu Subduction Zone (Suppe 1981;Shyu et al. 2005b;Byrne et al. 2011). This tectonic junction is among the most seismically active areas of the world, in which the Philippine Sea Plate (PSP) moves west-northwestward (311° N) at a rate of ~ 8.8 cm/year relative to the Eurasian Plate (EP) (Argus et al. 2011). North of Hualien, the PSP subducts obliquely beneath the EP and the northeastern continuation of the Central Range (CR) along the E-W running Ryukyu Trench. South of Hualien, the Luzon volcanic arc is colliding with the passive Eurasian continental margin along the N-S orientated Longitudinal Valley (Fig. 1). North of the junction between the Ryukyu Trench and the Longitudinal Valley, the back-arc rift of the Okinawa Trough terminates in an area close to the northeastern Taiwan, with the block south of the trough rotates clockwise offshore northeastern Taiwan Chen et al. 2018). This complex tectonic environment generates intense seismic events and significant seismic hazards in northeastern and eastern Taiwan (Cheng et al. 1997;Kao et al. 1998;Chen et al. 2004a;Theunissen et al. 2010). In the vicinity of Hualien city, three strong and catastrophic earthquake sequences (i.e., 1951, 1986-1992, and 2018-2022 sequences) have occurred since the mid-twentieth century (Fig. 1c, Additional file 1: Fig. S1). Among Keywords: Hualien earthquakes, Unsupervised machine learning, DBSCAN, PCA, Coastal rotation structure, Central Range fault Fig. 1 Tectonic setting of northeastern Taiwan and focal mechanisms for three Hualien earthquake sequences: a map of tectonic conditions of collision between Luzon Arc and Eurasian Plate (EP) with oblique subduction of Philippine Sea Plate (PSP) beneath EP along Ryukyu Trench, where motion of PSP relative to the EP is indicated by black arrow; b focal mechanisms of three earthquake sequences for 1951 (Cheng 1997), 1986-1992(Kao et al. 1998Jian et al., 2018a, b), where blue lines indicate active faults (Shyu et al., 2020) Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 them, the 1951 sequence includes three strong earthquakes (M ~ 7) struck the Hualien region on the same day (Cheng et al. 1997). These earthquakes also triggered another two major events one hundred kilometers to the south in November 1951 (Cheng et al. 1997;Chen et al. 2008;Chung et al. 2008). The 1986 earthquake (M W 7.3) off the coast of Hualien was followed by ten moderateto-strong earthquakes within the following six years (Kao et al. 1998). The most recent 2018-2022 sequence caused substantial damage and took a high death toll in Hualien city, despite the fact that the total seismic moment during this sequence was the least among the three earthquake sequences.
The frequent occurrence of earthquakes in the vicinity of Hualien raises important questions about active structural and fault geometries at the junction between the Longitudinal Valley Suture and the Ryukyu Subduction Zone. Note that in this region, the west-northwestward oblique collision between the Central Range, and the Luzon Arc in the south is gradually transitions to the northwestward subduction of the Luzon Arc and the PSP beneath the EP in the north (Teng 1990;Shyu et al. 2005aShyu et al. , 2016. Seismologically, this tectonic transition manifests as a variety of focal mechanisms observed between Hualien and offshore northeastern Taiwan (Cheng et al. 1997;Kao et al. 1998;Hsu 2001;Chen et al. 2004b;Wu et al. 2008b;Jian et al. 2018a). South of Hualien, earthquakes are dominated by thrust and reverse faulting indicative of oblique collision with strikes parallel to the Longitudinal Valley Suture. In contrast, focal mechanisms beyond the north of the Ryukyu Trench present a series of fault planes parallel to the subduction interface with P-axes perpendicular to the trench (Fig. 1b) (Kao et al. 1998;Jian et al. 2018b). In an area to the offshore Hualien, where the buoyant Luzon Arc submerges beneath the EP near the Hsinchen ridge (Font et al. 2001;Lallemand et al. 2013), the buoyant subducting Luzon Arc also contributes to the complexity of seismogenic structures, as evidenced by additional strike-slip events and unclassified faults (Zoback 1992) with sub-horizontal or sub-vertical nodal planes adjacent to the junction (Kao et al. 1998;Jian et al. 2018a) (Figs. 1, 2). These diverse fault geometries had a profound influence on the 2018-2022 Hualien earthquake sequence Page 4 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) (Fig. 1b). Given the temporal and spatial distribution of these events (Fig. 2) (Shyu et al. 2016), most of these focal mechanisms do not share the slip characteristics of known seismogenic structures near the surface. The discrepancy between slip characteristics of known seismogenic structures near the surface and focal mechanisms motivate us to develop alternative approaches by which to study the spatial distribution of ruptured faults in this region. When the causative fault rupture did not reach to the surface, identifying the geometry of the fault structure largely depends on the distribution and accuracy of earthquake hypocenters, together with their focal mechanisms. In the traditional approach, several iterations of earthquake relocation and tomography inversion are generally required to obtain satisfactory results (Wu et al. 2008a). Among different earthquake relocation methods, the 3D earthquake relocation with high-resolution 3D velocity model have been shown to provide valuable insight into the rupture fault's geometry with the cost of daunting computational overhead. The HypoDD is an alternative approach that could effectively confine the distribution of aftershocks relative to the master event (or mainshock) (Waldhauser and Ellsworth 2000). However, the diversity of focal mechanisms associated with the 2018-2022 earthquake sequence and the complexity of rupture patterns make it challenging to select proper spatial and temporal extension of aftershocks that could be employed in the HypoDD method (Fig. 2).
In order to avoid these aforementioned issues from hypocenter relocations, this study tends to develop an alternative and efficient solution to constrain the structural geometry from a complex and dense-distributed seismicity. This study began with the use of two-dimensional (2D) back projection (BP) to examine rupture characteristics associated with the mainshock of 2019. We also combined BP results of the 2018 foreshock and mainshock from Jian et al. (2018b) to identify areas associated with major ruptures during the 2018-2022 sequence. In addition to the 2D BP analysis, we employed unsupervised machine learning (ML) algorithms to explore the earthquake distributions that could reflect the geometry of potential seismogenic structures (Carena et al. 2002;Kim et al. 2016). These unsupervised ML algorithms include density-based spatial clustering of applications with noise (DBSCAN) to cluster earthquakes and the dimensionality reduction principal component analysis (PCA) to characterize the 2D planar orientation of each cluster. Both DBSCAN and PCA are important unsupervised ML algorithms to retrieve patterns or features from large volumes of data. Unlike supervised ML which uses the labeled training data to make predictions for new inputs, the labeled training data are not applicable for the study of searching "blind" seismogenic structures and thus makes the application of unsupervised ML for seismogenic structure analysis challenging (Zaliapin and Ben-Zion 2013;Li et al. 2018). Nevertheless, without prior selection of temporal or spatial extension for mainshock sequence and complex inversion parameter settings, the DBSCAN algorithms can efficiently complete clustering numerous earthquakes (e.g., more than 3000 events larger than M L 2.7) within seconds in a modern personal computer (PC). For each cluster identified by DBSCAN, the structural orientation (strike, dip angles) are obtained by PCA and compared with summation of moment tensors (MT) in the same cluster. We also perform control test to both the 2019 Hualien foreshockmainshock-aftershock sequence (Huang and Wang 2022) and CWB catalog, to test the reliability of DBSCAN and PCA methods. Finally, we calculated Coulomb stress change associated with earthquakes (M W ≥ 5.0) in chronological order to examine the static stress transferring during the 2018-2022 earthquake sequence.

Back projection and teleseismic data
High-resolution BP imaging via high-frequency teleseismic waves is widely used to characterize earthquake ruptures (Ishii et al. 2005;Kiser et al. 2011;Meng et al. 2011Meng et al. , 2012Yao et al. 2011;Jian et al. 2017). The process begins with the alignment of P-waves in a dense array to minimize the effects of local heterogeneous structures. Source images along candidate source grids are then derived by back-projecting coherent source wavetrains using beamforming in the time domain (Kiser et al. 2011), compressed sensing in the frequency domain (Yao et al. 2011), or subspace-based MUltiple SIgnal Classification (MUSIC) (Meng et al. 2011;Jian et al. 2017Jian et al. , 2018b. The MUSIC method involves locating the strongest source radiators by projecting pseudospectrum power into Page 5 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 noisy subspaces. In the current study, MUSIC BP was implemented using vertical velocity waveforms from the Australian seismic network, which includes apertures with azimuth coverage sufficient for the imaging of rupture properties associated with earthquakes of moderate intensity. In this study, we applied the same 2D BP method to the 2018 foreshock and mainshock (Jian et al. 2018b) to search for energy bursts along the 40 × 40 km 2 fault plane using a grid with cells measuring 2 × 2 km 2 . The pre-defined fault planes were selected from two nodal planes of the AutoBATS MT solution.

Empirical estimation of rupture area
The rupture area of significant earthquakes (M W ≥ 6.0) can be mapped by the finite fault inversion (Kikuchi and Fukao 1987;Ji et al. 2003;Lee et al. 2018) or BP imaging (Ishii et al. 2005;Kiser et al. 2011;Meng et al. 2011;Yao et al. 2011;Jian et al. 2017) via teleseismic body waves and Geodetic data. When dealing with moderate earthquakes (M W 5.0-5.9), examining the distribution of aftershocks is the most effective way to estimate rupture dimensions. The later mentioned method, however, is not very useful to estimate the rupture distributions for each moderate event in the 2018-2022 Hualien earthquake sequence, because the sequence was too complex to identify individual rupture planes associated with specific earthquakes. An alternative approach to the rupture zone estimation is utilizing the empirical relation between the earthquake's magnitude (M W ), rupture length (L) and rupture area (A). The magnitude-dependent rupture area relations are widely applied for the static stress change calculation (King et al. 1994;Toda et al. 2005;Chen et al. 2008). Leonard (2010) compiled several earthquake rupture datasets for events larger than 5.5 to derive self-consistent equations describing the scaling between seismic moment, rupture area, length, width, and average displacement. The relation between moment magnitude and rupture length is M w = 1.67log(L) + b , where constant "b" are 4.24 and 4.17 for dip-slip and strike-slip earthquakes, respectively. As the relation between rupture area and magnitude is M w = log(A) + a , where constant "a" is 4.0 for dip-slip events and 3.99 for strike-slip faults. According to above empirical relations and moment magnitude and rectangle rupture area assumption, we can also extract the rupture width from estimates of rupture area (A) and length (L) from the moment magnitude. For events of magnitude 5.5 to 6.5, the aspect ratio of rupture length to width ranges from 0.97 to 0.62. Since most of these statistics were derived using a limited number of earthquakes of magnitude 5.5 to 6.0, we decided to use the aspect ratio 0.75 estimated from magnitude 6 events, to estimate the rupture width for moderate earthquakes in this study.

Earthquake clustering and catalog
The DBSCAN and PCA are two fundamental unsupervised ML algorithms we used to clustering earthquakes and to analyzing fault rupture geometries from the unmodified CWB earthquake catalog. Figure 3a shows a simple workflow of clustering 3D epicenters (earthquakes) with DBSCAN and retrieving orientations of 2D planes with PCA from clusters. Both algorithms are from open package of Scikit-learn (Pedregosa et al. 2011).
Although the combination of these two unsupervised ML algorithms are not widely used in the seismological research, both DBSCAN and PCA are important and fundamental unsupervised ML algorithms which has been widely applied to many ML applications, such as data sampling and facial recognition (Seiffert et al. 2008;Balaban 2015;Zhou et al. 2017;Kaur et al. 2020). In the following sections, we will address the event catalog and DBSCAN and PCA methods in details.

Catalog
The CWB 3D event catalog from Feb-2018 to Dec-2021 is used in this study. The earthquake epicenters in this catalog are initially obtained by auto picking algorithm and then been examined manually (Chang et al. 2019 (Wu et al. 2008a) also suggests the average differences are 1.565 ± 1.479 km and − 0.353 ± 1.633 km in horizontal and vertical directions for earthquakes larger than 2.4 and shallower than 30 km (Additional file 1: Fig.  S3). Both results suggest the epicenter locations do not have significant difference between the on-land and nearshore earthquake events in the current CWB catalog.

Density-based spatial clustering of applications with noise
DBSCAN is a density-based non-parametric clustering algorithm (Ester et al. 1996) which groups data points that are spatially close to each other. In DBSCAN, the number of clusters does not have to be set beforehand and outliers are automatically omitted. The pros of DBSCAN make it suitable to spot concentrated distributions of earthquakes and identify potential seismogenic structures. DBSCAN requires only two parameters: the minimum number of samples in a cluster and the maximum distance between two samples for them to be considered neighbors (i.e., neighborhood radius parameter, epsilon (ε)). Data points are neither the core points nor directly reachable from core points are considered noises or outliers. Nonetheless, parameter scanning must be tailored to specific earthquake distribution patterns. The neighborhood radius plays an important role in determining the optimal cluster size and number of clusters; therefore, we examined the clustering results with ε set between 0.85 and 4 km with a minimum number of neighbors set at 10 for analysis of the 2018-2022 earthquake sequence. To conservatively eliminate the effect of magnitude incompleteness and the mislocation of small earthquakes, we set the M L thresholds for DBSCAN over a range of 2.5-3.0. Overall, the cluster results were similar in all cases where DBSCAN was implemented under the same neighborhood radius settings to which it is highly sensitive. In the current study, we selected an M L value exceeding 2.7 to make sure the earthquake used in this study is well above the magnitude of completeness of this area.

Principal component analysis
PCA is defined as an orthogonal linear transform that transforms the dataset into a new coordinate system. The first principal component (PC1) in the new system is the direction of data point projection that provides the greatest variance. The next principal component (PC2) is orthogonal to PC1 and provides the second-largest variance. PCA is widely used in machine learning to reduce the dimensionality of a dataset without losing important features (Jolliffe and Cadima 2016). Generally, PCA represents an M-dimensional dataset using N-dimensional subspaces, where N < M. This means that the 3D distribution of a group of earthquakes can be converted to one or two dimensions based on the ratio of the three eigenvalues of PCA. For earthquakes with a linear distribution extending to a large distance, the first eigenvalue is far larger than the other two. If the ratio of the three eigenvalues is close to 1:1:1, there is no preferred planar geometry applicable to the events in the cluster. For earthquakes with a planar distribution, the third eigenvalue is far smaller than the other two, and the corresponding eigenvector represents the normal vector of the PCA plane (Fig. 3). To present the PCA plane as conventional fault plane geometry, we further obtain the strike and dip angles from the relationship between the fault plane and normal vector (Stein and Wysession 2009). In order to depict the planar structure for DBSCAN clusters, we take the largest eigenvalue to approximate the half of PCA plane length. The same aspect ratio of 0.75 as empirical rupture area for moderate earthquakes was set to represent width of PCA plane.

BP imaging of 2019 mainshock
We first examine the rupture process of the second strongest earthquake (M W 6.2) in the 2018-2022 sequence. Triggered by the 2018 MW 6.4 Hualien mainshock 14 months later, the Mw 6.2 earthquake in 2019 occurred 15 km west of the 2018 mainshock. After performing 2D BP imaging for both nodal planes, we selected the BP results along the WNW-ward dipping plane over the other plane due to the stability of the BP seismic energy distribution (Additional file 1: Fig. S4). Based on coherently-aligned stacking P-waveforms from the AU seismic network, it appears that a weak initial amplitude was followed by strong seismic energy arriving after 3 s and lasting for about 3 s (Additional file 1: Fig. S4a). According to the high-frequency BP result at 1-2 Hz, the initial small-amplitude P waveforms were back-projected up toward a point roughly 5 km south of the epicenter. The major rupture propagated downward over a distance of roughly 15 km to the north. The rupture length over time leads to an average rupture speed of 2.4 km/s (Fig. 4b). The estimated 15-km length for the rupture from BP image is consistent with the prediction of a 14-km fault length based on the empirical relationship between fault length and magnitude (Leonard 2010). Our BP results indicate that the 2019 mainshock started the rupture at a shallower depth of 15 km, and the rupture extended for 12 km along the strike direction Page 7 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 and 10 km along the dip direction (Fig. 4). The stacking P-wave BP images verifies the overall rupture area (Additional file 1: Fig. S4b). The distribution of aftershocks dipping to the northwest (occurring over a period of one month) appears to support our preferred fault plane geometry. Although a small discrepancy (2-3 km) between the mainshock and aftershock cluster can be attributed to errors in epicenters, a systematic offset between the mainshock and aftershocks is uncommon for inland earthquakes (Wu et al. 2008a); therefore, we suggest the gap between BP imaging and the aftershock distribution is resulted from a structural system with parallel faults.
Overall, the length of ruptures estimated using BP imaging for the three largest events of the 2018-2022 earthquake sequence are comparable to the empirical predictions of fault length based on their magnitudes (Leonard 2010) (Fig. 4e). Given the time-varied pseudospectrum power functions and the rupture lengths, we estimated the rupture speeds as follows: 2018 foreshock (1.8 km/s), 2018 mainshock (2.0 km/s), and 2019 mainshock (2.4 km/s). All of these estimates fell within a reasonable range of 62-64 percent of shear wave velocity (Huang et al. 2014).

Ruptures due to 2018 Hualien mainshock
Based on both the regional (AutoBATS) and global CMT solutions, it appears that the 2018 Hualien mainshock was characterized by a strong CLVD component (~ 40%), which was likely the result of four subevents with different focal mechanisms, as described by Jian et al. (2018b) ( Table 1). Using multiple point source inversion (MPSI) (Kikuchi and Kanamori 1982;Kikuchi and Fukao 1985), they determined that the 2018 mainshock was initiated by an M W 5.9 subevent similar to the 2018 foreshock with either sub-horizontal or sub-vertical nodal planes (Fig. 4e). A NE-SW strike thrust event (M W 6.0) occurred 4 s later followed by the third and largest (M W 6.2) subevent along a high-angle sinistral fault. The fault geometry of the M W -6.2 subevent was similar to that of the Milun Fault (Chen et al. 2014;Shyu et al. 2016  . At 17 s after the rupture began, a final subevent occurred with a focal mechanism similar to the first one. Nonetheless, we doubt the reliability of the focal mechanism pertaining to the last subevent, due to high noise levels in coda waves from the three earlier subevents. Based on their MPSI results pertaining to the first three subevents, we used empirical relationship to estimate of the rupture area to represent complex ruptures related to the 2018 mainshock. According to BP imaging results pertaining to the 2018 foreshock and the Milun fault geometry, we posit that the first subevent originated with an E-W striking subhorizontal fault, whereas the third subevent originated in an N-S striking fault, likely to be the Milun fault. Our PCA results indicate that the second subevent originated in the NW-ward dipping fault plane, as outlined in the following section. Note that we expected certain uncertainty in the centroid depth of these subevents due to limited spatial resolution of teleseismic long-period waveforms used for MPSI.

Clustering and PCA results
As described in the method section on DBSCAN, neighborhood radius ε determines the size and number of clusters. When dealing with a larger ε, DBSCAN creates larger clusters, indicating a larger number of earthquakes grouped within a smaller number of clusters, and vice versa. Selection of parameter ε involves ad hoc assumptions pertaining to seismogenic conditions. For example, by setting ε = 4, the 2018-2022 Hualien earthquake sequence is grouped to two major clusters: one covering most of deeper earthquake at east of Hualien and another one merging coastal shallow and deeper Central Range earthquakes (Additional file 1: Fig. S5a). The PCA of the later cluster is characterized with major eigenvalue (= 381) and two small eigenvalues. DBSCAN using inappropriate large ε merge two spatial separable seismogenic structures into only one westward dipping plane of strike and dip angles in 214° and 39° (Additional file 1: Fig. S5b).
On the other hand, by selecting small ε at 0.85, DBSCAN produces several small clusters and treats most of earthquakes as noises (Additional file 1: Fig. S5c). Before searching a proper ε value to the 2018-2022 earthquake catalog from CWB, we examine the result of DBSCAN and PCA methods to a conventional foreshock-mainshock-aftershock (FMA) sequence of the 2019 mainshock (Fig. 5a) (Huang and Wang 2022) This 2019 FMA sequence starting from April to June, 2019 is extracted by double-link declustering method (Huang and Wu 2022). By setting ε to 1.5, the 2019 FMA sequence (M L ≥ 1.5) is divided to two clusters (Fig. 5a). From north to south, the PCA planes for both clusters dip to west in dip angles of 55° and 72°, respectively. The map and cross-sections of the 2019 FMA sequence confirm our PCA result of strike and dipping angle variation from north to south. By applying the same magnitude threshold and parameter ε to the same period of CWB catalog, the DBSCAN obtains the same northern and southern clusters. Their orientations are similar with the results obtained from the 2019 FMA sequence and the differences of PCA strike and dip angles between them are smaller than 3° and 9°, respectively (Fig. 5). Note that a clear 2 ~ 3 km offset of the aftershocks and 2019 mainshock are both observed in the 2019 FMA sequence and CWB catalog (Fig. 5). Therefore, the 2019 FMA sequence clarifies the uncertainty suspicious for inland event catalog reported by the CWB. The eigenvalue ratios for two clusters are 80:14:5 and 68:26:6 respectively. Conventionally, the 2019 FMA sequence would be described as an overall NNE-SSW striking structure which dips to northwest steeply. The DBSCAN and PCA methods proposed in the current study deliver mathematic estimations of structure orientations and show the structure transition from north to south for the 2019 FMA sequence. Among the DBSCAN results using different value of ε, the favored cluster results depended largely on additional seismogenic constraints pertaining to the focal mechanisms. We tend to choose the DBSCAN and PCA result when the planar geometry of PCA agree with most of the focal mechanisms (AutoBATS MT catalog) within the cluster. With these constraints in mind, the earthquakes beneath the Central Range were grouped into two clusters (M1 and M2) by setting ε to 1.5 in accordance with their relatively low seismic density distribution (Fig. 2). Note that we don't have another northern cluster as shown in control test (Fig. 5), because most of events in that cluster are smaller than our magnitude threshold of 2.7 used for this study. For earthquakes along the Hualien coastline, we selected a smaller ε (1.05-1.1) to create a larger number of clusters (C1-C5) emphasizing the curved pattern of their seismicity distributions. In the area north of Hualien (near the 2018 foreshock), a sparse flat distribution of earthquakes was grouped in the C0 cluster using ε = 1.4. We set ε = 1.2 to group four other small clusters (L1-L4) located from 23.8°N to 23.9°N between coastal cluster C4 and Central-Range cluster M2. The final DBSCAN and PCA results are summarized in Fig. 6b and Table 2. Among these PCA results, only the southern cluster C5 presented an SE-ward dipping structure (Fig. 6b). All of the other clusters dipped westward to northwestward. The coastal clusters C0 to C4 respectively present a gradual transition of strike and dip angles from roughly 250° to 180° and 10° to 65° (Fig. 6b and Table 2). We assessed the consistency between PCA-derived fault planes and their corresponding moment tensor summation. MT summation is typically dominated by the largest earthquake. For a group of comparable magnitude earthquakes with different focal mechanisms, MT summation reveals overall stress patterns under current tectonic conditions. Additional file 1: Figs. S6-S8 present the focal mechanisms and corresponding MT summations for each cluster for comparison with the PCA results. In the following section, we summarize the PCA results and empirical rupture areas for earthquakes exceeding 5.0 with the aim of revealing the overall seismogenic structures involved in the 2018-2022 Hualien earthquake sequence (Figs. 6, 7).

Earthquake sequence in Central Range
Most of the focal mechanisms included in clusters M1 and M2 involved thrusts with high-angle westward-dipping fault planes (Additional file 1: Fig. S6). These focal mechanisms are consistent with the corresponding PCA plane geometries. Just east of the M1 cluster, the 2019 mainshock occurred on a 15-km deep and is featured by a steeply west-dipping fault plane parallel to the M1 cluster. Along the shallower part of the M2 cluster, all of the focal mechanisms of the 2021 doublet and other nearby are presented as reference, and red to orange color bar indicates formation of rupture propagation; b DBSCAN and PCA results with cluster events (circles), corresponding PCA planes (rectangles), and larger earthquakes (stars) (note the number of cluster events did not meet our threshold (25) for clusters L1-L4, and MT summations for each cluster are presented beside PCA planes); c, d dip and strike angles of PCA planes respective to latitude Page 11 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 moderate earthquakes (M W ≥ 5.0) were similar to that of the 2019 mainshock. The consistency of the PCA result pertaining to M1 and M2 clusters and the close spacing of the focal mechanisms of large earthquakes suggests a west-dipping fault system with imbricated faults beneath the Central Range (Figs. 6, 7). The shallower cluster L1 at a depth of ~ 10 km located east of the M1 cluster presented qualitatively similar focal mechanisms, which means that it is likely associated with a shallower portion of the CR westward fault system (Additional file 1: Fig.  S8). In summary, the west-dipping fault system appears to begin from 24.1° N extending SSW roughly 30 km to 23.85° N with a dip angle of 60°−70° (Fig. 6). Around the L1 cluster, we do not associate clusters L2−L4 to this CR fault system, because most of the focal mechanisms in the latter three clusters do not agree with the steep westdipping fault geometry (Additional file 1: Fig. S8).

Coastal earthquake sequence and the Milun Fault
PCA analysis of the northernmost cluster C0 revealed a nearly horizontal plane. In the region south of C0, a large number of offshore focal mechanisms from the 2018-2022 sequence present nodal planes, either subhorizontal or subvertical (Fig. 2). Here, we preferred for  Page 12 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 subhorizontal fault plane solutions due to the lack of vertically distributed aftershock sequences in the DBSCAN and PCA results. Considering the PCA-derived lowangle plane of cluster C0 in the context of subhorizontal rupture earthquakes, it appears that there is a shallow planar structure dipping northward offshore from Hualien from latitude 24° N to 24.3° N at a depth of 5 to 15 km (Figs. 6, 7). When we consider earthquakes within this planar structure, the high degree of variability in the strike azimuths implies an overall uneven northward (NE-, N-or NW-ward) or bumpy subhorizontal seismogenic structure. The second subevent (thrust-type) of the 2018 mainshock was surprisingly similar to the PCA-derived geometry of the C2 cluster (Tables 1, 2; Figs. 6, 7). Taking into account the considerable uncertainty pertaining to centroid depth when using the MPSI method, we posit that the thrusting second subevent of the 2018 mainshock and the cluster C2 events occurred on an NNE-SSW striking fault, which dips northwestward (Figs. 6, 7). Between clusters C0 and C2, cluster C1 features a NW-ward dipping plane, perhaps indicative of a transition structure connecting the NW-ward C2 plane to the overall N-ward flattening structure. The transition is supported by several inconsistent focal mechanisms within cluster C2 (Additional file 1: Fig. S7). Our PCA results revealed two other westward dipping structures from the clusters C3 and C4, south of the C2, from 23.85° N to 24.0° N. Among these clusters (C0-C4), MT summation revealed dip angles that were consistent with the PCA planes except for cluster C4. We suspect that earthquakes associated with other structures were included in the C4 cluster because two largest earthquakes showing very different focal mechanisms at nearly identical epicenters (Cluster C4 panel in Additional file 1: Fig. S7). The orientation of planes with a shallow dip angle revealed considerable uncertainty indicative of inconsistencies in the strike angle between the MT summation and PCA results for clusters C0 and C1.
The PCA results pertaining to coastline clusters (C0 to C4) indicate that these clusters are associated with a gradual transition in strike and dip angles from south to north (Fig. 6c). The transitions in strike and dip angles are reliable by comparing them with the small uncertainties revealed in the control test of applying DBSCAN and PCA to CWB catalog and 2019 FMA sequence. This continuous seismogenic structure includes five coastline PCA planes and several empirically-determined rupture zones derived from moderate earthquakes, which cannot be separated spatially. When assembled, these zones suggest a blind seismogenic structure extending at a shallow depth (5-15 km) for roughly 50 km. The structure appears to be progressively rotated 60° clockwise from south to north, which extends from a steep west-dipping (60°-70°) structure running nearly N-S in the south to horizontal north-dipping (10°) structure orientated ENE-WSW in the north (Fig. 6). This continuous transition of structure can be also observed through conventional approach of profiling earthquakes as shown in Additional file 1: Fig. S9.
The major surface fault system in Hualien is the Milun Fault, which is sinistral and oblique (Shyu et al. 2005a). Rupturing of the Milun Fault during the 2018 mainshock was verified via field investigation and geophysical observations Yen et al. 2018;Huang et al. 2019). Jian et al. (2018b) claimed that the fault geometry of the third subevent associated with the 2018 mainshock was similar to that of the Milun Fault at the surface; however, its estimated centroid depth appears to be too deep because of limited resolution of using longperiod (10 ~ 50 s) teleseismic waveforms (Fig. 7c). A lack of left-lateral strike-slip aftershocks near the Milun Fault implies the 2018-2022 Hualien earthquake sequence involved other structures in Hualien area (Fig. 2). The limited distribution of strike-slip seismicity also suggests that the deeper part of the Milun Fault may intersect the west-dipping structure with a complex structural geometry slightly offshore from Hualien (Fig. 7c).

Cluster C5 and Longitudinal Valley Fault
We excluded the southernmost coastal cluster C5 from the coastal system due to its unique spatial geometry. PCA results indicate that C5 is an NE-SW striking structure that dips steeply to the east (strike = 56°, dip = 68°). This assertion is supported by focal mechanisms in the C5 cluster (Additional file 1: Fig. S7). The updip projection of this estimated PCA surface place the plane close to the Longitudinal Valley Fault on the surface-the prominent active fault along the western flank of the Coastal Range. Earthquakes within small cluster C5 extend to a depth of 5-10 km (Figs. 6, Additional file 1: Fig. S7) and likely represent a small patch on the Longitudinal Valley Fault triggered by the four-year-long Hualien earthquake sequence, and particularly by the 2019 mainshock (Fig. 2a).

Coulomb stress change
The densely-distributed earthquake cluster in the 2018-2022 sequence provides an excellent opportunity to examine fault interactions in the vicinity of Hualien. The regional earthquake interactions have been discussed in previous studies. For instance, Chen et al. (2008) explored the leaping trigger behavior of the 1951 sequence based on static Coulomb stress changes. They concluded that the 1951 strike-slip mainshock generated positive stress Page 13 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 change leading to the subsequent Yuli earthquake in the south. However, for the nearby second and third events their results showed negative stress change from their fault geometries settings. Clearly, the geometry of the receiver fault is crucial to the Coulomb stress change pattern (Toda et al. 2005;Mildon et al. 2016). For blind fault systems without surface rupture traces or a clear aftershock distribution, the lack of seismological and geological constraints may lead to the erroneous assignment of source-receiver fault combinations for Coulomb stress change evaluation. In this study, we used ML to analyze the 2018-2022 sequence, the results of which were used to posit possible fault geometries describing the process of stress transfer between larger events. We set a typical value of friction coefficient to 0.4 for general earthquakes (King et al. 1994;Toda et al. 2005) and the friction coefficient is not the key parameter for the overall stress change pattern . Figure 8a summarizes 14 fault geometry configurations (M W ≥ 5 events). Three of the subevents of the 2018 mainshock are included as event numbers (#3 to #5). For earthquakes with nearly identical epicenters or overlapping rupture areas, we excluded smaller events (e.g., the 2021 doublet). We favored fault planes with dip directions corresponding to nearby PCA results. For example, we selected subhorizontal ruptures pertaining to earthquakes close to cluster C0, including the 2018 foreshock, the first subevent of the 2018 mainshock, and several eastern offshore events. We selected the left-lateral fault plane as the third (and largest) subevent of the 2018 mainshock, based on the geometry of the Milun Fault. Based on the configuration of the regional fault systems and the chronological order of earthquakes in the 2018-2022 sequence, our model revealed that all earthquakes were associated with the fault plane receiving positive accumulated Coulomb stress change from earlier ruptures. Here, we present the results related to three important earthquakes: the largest subevent of the 2018 mainshock (event #5), the 2019 mainshock (event #9), and the 2021 doublet (event #11). In Fig. 8b, the shallower part of the largest subevent of the 2018 mainshock (i.e., the Milun Fault) experienced a positive change in stress of more than 1 bar. After Feb. 7, 2018, eight ruptured events had already produced stress perturbation exceeding 1 bar on the northern section of the CR fault (event #9 in Fig. 8c). However, the 2019 mainshock ruptured 14 months later to response the stress increase. The southern CR fault (event #11) inherited stress increase less than 0.5 bar from earlier 8 earthquakes in 2018 and the 2019 mainshock (event #9) (Fig. 8d). The small amount of stress perturbation explains another two-year-delay triggering for the 2021 doublet. The April-July, 2021 earthquake swarm (event #12 to #14) was also triggered by earlier ruptures involving positive stress change. All these modelling results indicate that the 2018-2022 earthquake sequence was a cascading rupture process beginning with 2018 foreshock in the northward dipping structure to the north, culminating in catastrophic earthquakes of the 2018 mainshock and delayed triggered 2019 mainshock, 2021 doublet, and the 2021 earthquake swarm.

Similarities among three Hualien earthquake sequences
Our result suggests the recent 2018-2022 earthquake sequence is similar to earlier earthquake sequences in this area (i.e., 1951 and 1986-1992). The 1951 Hualien earthquake sequence presented rapid triggering and longdistance interactions among multiple fault systems. The first strong earthquake (M L ~ 7.3) was followed within a few hours by several strong earthquakes, including two nearby events (M L ~ 7.1), as well as three distant events one month later (M L 6.0 ~ 7.3) more than 100 km to the south (Additional file 1: Fig. S1). The first motion solution associated with the 1951 mainshock (Cheng et al. 1997) was a strike-slip event, probably associated with the Milun Fault rupture (Hsu 1962;Bonilla 1975Bonilla , 1977. North of the 1951 mainshock epicenter, the second 1951 event presented a subhorizontal focal mechanism similar to the 2018 foreshock. These types of subhorizontal focal mechanisms have also been observed in the 1986-1992 earthquake sequence in similar locations. Hence, the subhorizontal section at the northern part of the coastal structure is likely to be a major seismogenic structure in this area (Fig. 1, Additional file 1: Fig. S1). These three earthquake sequences also appear to be associated with subduction zone interface events. The focal mechanism of the third strong 1951 earthquake (Cheng et al. 1997) featuring a shallow northward-dipping thrust with strike parallel to the Ryukyu Trench could be considered a subduction zone interface event. During the 1986-1992 sequence, three interface events (M W > 5.5) were triggered by the 1986 mainshock near Hualien (M W 7.3) (Fig. 1). Similar triggering occurred during the 2018-2022 sequence, in which the 2018 interface doublet (M W 5.9 and 6.1) and the 2022 earthquake (M W 6.2) were respectively triggered beneath the Nanao basin and Nanao Rise (Fig. 1). The rapidity with which changes in static stress decrease as a function of distance suggests that long-distance and delayed triggering may be related to the postseismic stress redistribution, such as viscoelastic relaxation, poroelastic rebound, or afterslip (Freed 2005). The similarity between the three Hualien earthquake sequences in terms of focal mechanisms is a clear indication of interactions between the Hualien Page 14 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022)  coastal structure, the Milun Fault, and the Ryukyu subduction interface.

Tectonic implications
The indentation of the Luzon arc in the oblique EP-PSP convergent system plays an important role in shaping the mountains of Taiwan and the tectonic rotation of northeastern Taiwan (Lu and Malavieille 1994;Shyu et al. 2005b;Wallace et al. 2005). Our seismological observations at the northernmost part of the Coastal Range provide further evidence of indentation and rotation in this oblique collision system. The structures that responsible for the coastline and Central Range cluster presented a clockwise rotation pattern consistent with the change of orientation of the Central Range between Hualien and Ilan (Fig. 6). We also observed similar rotation patterns in the simplified 3D oblique collision model north of the rigid indenter, in which a west-dipping oblique-reverse fault developed at the northwestern margin of the rigid indenter and gradually transitioned to the sub-horizontal faults north of the indenter (Lu and Malavieille 1994 Above the PSP mantle, rotation of the S H can perhaps be attributed to our coastal rotating structure. Figure 9 illustrates the two fault systems (Central Range fault system and the coastline fault system) as summarized from MPSI results of the 2018 mainshock, PCA results, and rupture zones associated with the 2018-2022 Hualien earthquake sequence. It is highly likely that the aforementioned west-dipping coastline structure is a shallow eastward extension of the Central Range fault, steeply dipping westward under the Central Range. It appears that the coastline fault system developed just south of the small clusters of L2-L4 (from 23.85°N), and gradually extending northeastward until reaching the Coastal Range. The L2-L4 clusters may be a manifestation of the transition connecting the west-dipping Page 16 of 18 Jian and Wang Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:28 shallow coastline fault to the high-angle, deeper, and more complex Central Range fault system at a depth of 10 km. To the north, the coastline structure gradually rotates clockwise roughly 60°, whereupon it changes to a north-dipping sub-horizontal reverse fault system in the offshore region northeast of Hualien. Although not illustrated by the earthquake clusters, this northern shallow dipping structure may eventually connect to the fault system associated to the Ryukyu subduction zone which is indented to by the Hsinchen Ridge and its underlying subducted Luzon Arc. Similar pattern of curved reverse fault system can be found around the subducting seamounts in many subduction zones, including the subducting Gagua Ridge east of Taiwan (Dominguez et al. 2000). Our interpretation of this fault connection may also help to explain the long-distance earthquake interaction between the Hualien area and the Ryukyu subduction zone, which appeared in the last three major sequences. Nonetheless, detailed marine geophysical surveys and seismological analysis north of Hualien will be required to verify this hypothesis.

Conclusions
The 2018 foreshock initiated the 2018-2022 Hualien earthquake sequence and helped to unveil the complex fault systems in the collision-subduction junction area. In the current study, we applied DBSCAN and PCA to the entire earthquake sequence. We also combined BP imaging of three strong earthquakes and rupture areas associated with moderate earthquakes. Our results revealed two seismogenic systems, including one beneath the Central Range and a coastal structure. The former is a 30-km westward-dipping Central Range fault system with imbricate faults in the north. The latter is a spatially continuous bending system characterized by rotating strike and dip angles. It appears that this structure depicts an indented Luzon Arc with a westward reverse fault developed in the southwest under the Central Range, and a northward-dipping fault in the north presumably above the subducted Luzon Arc. Finally, we characterized the static stress interaction between these two structures, showing clear sign of positive stress triggering between these events. Based on the similarity of earthquake events in this region during the last three earthquake sequences (1951, 1986-1992, and 2018-2022), we suggest shallowdipping northward plane found by our machine learning approach is a primary seismogenic structure offshore Hualien. This gentle northward dipping fault may eventually connect to the structures within the Ryukyu Subduction system, forming a curved reverse fault system that can be found around other subducted sea mounts in front of subduction trenches.
Additional file 1: Figure S1. Hualien earthquake sequence distributions of (a) 1951, (b) 1986-1992, and (c) 2018-2022, where earthquakes are colored by year of occurrence to show spatiotemporal rupture evolution of three sequences, stars show major events (M W ≥ 5.5), the 1951 event catalog is from Cheng et al., (1997), earthquakes after 1985 are from CWB catalog, moment magnitude for 1986-1992 and 2018-2022 earthquake sequences are from Kao et al. (1998) and final AutoBATS catalog, respectively. Figure S2. (a) horizontal and (b) vertical errors of the CWB event catalog. Figure S3. Comparison of CWB event catalog (circles) with the catalog reported by Wu et al. (2008a) (diamonds). Figure S4. (a) Aligned vertical waveforms from Australian seismic network as a function of epicentral distance for P-wave BP imaging. (b)-(c) P-wave BP stacking images along NW-and SE-dipping fault planes, respectively. Figure S5. DBSCAN results by setting parameter (ε) of (a) 4 and (b) 0.8, respectively.  Figure S7. MT summation and AutoBATS solutions for mountain C-system (C0-C5). Symbols and explanations are the same as Fig. S6. Figure S8. MT summation and AutoBATS solutions for L-system (L1-L4). Symbols and explanations are the same as Fig. S4. Figure S9. The map (a) and cross-sections (b) showing the clustered (color) and excluded (light blue) earthquakes by the DBSCAN.