1 Introduction

Italy is a region with high seismicity due to the collision process between the African and Eurasian plates. In the last century, Italy has been hit by 8 earthquakes with a magnitude greater than 6.5 (Rovida et al. 2020, 2022), causing more than 120,000 victims. For most of the strong events that have occurred in Italy recently, a Strong Subsequent Event (SSE) with a comparable or higher magnitude was observed. These include M 6.1 1968 Belice (Monaco et al. 1996), Mw 6.4 1976 Friuli (Console 1976), MS 6.9 1980 Irpinia (Bernard and Zollo 1989), Mw 5.9 1984 Val Comino (Milano and Di Giovambattista 2011), Mw 5.7 1997 Umbria-Marche (Di Giovambattista and Tyupkin 2000), Mw 6.1 2009 L’Aquila (Di Luccio et al. 2010), Mw 5.9 2012 Emilia (Ventura and Di Giovambattista 2013) and Mw 6.5 2016 Amatrice earthquake (Gentili et al. 2017). This behaviour can be related to the complexity of the fault systems (Gentili and Di Giovambattista 2017) and the stress transfer on the fault corresponding to the mainshock or neighbouring earthquakes (Catalli et al. 2008). When a fault segment ruptures, it can trigger nearby segments, resulting in a series of closely spaced SSEs that can have a higher magnitude than the mainshock (Shcherbakov et al. 2005). Furthermore, when analysing fault-trapped waves recorded during the 2009 L’Aquila earthquake (central Italy), Calderoni et al. (2012) found that two fault segments, even if mapped as separate segments, are part of a longer and continuous fault system. This observation may have a strong impact on the seismic hazard of the region, since the sequential rupture of contiguous fault segments is likely to produce earthquakes with much larger magnitudes than the MW 6.1 6 April 2009 event.

Compared to the effects of a strong mainshock, SSEs can lead to the collapse of already damaged structures and a further increase in the number of fatalities. Furthermore, they can have severe impacts on our globalised society and cause high economic losses. Therefore, forecasting of a SSE is of strategic importance to reduce the seismic risk during the occurrence of a seismic sequence (Gentili and Di Giovambattista 2017). In recent years, several methods have been proposed for California, Taiwan, Japan, Greece and Italy to forecast a SSE with a similar magnitude of the main strong earthquake. They focus on a pure Epidemic-Type Aftershock Sequence model (ETAS, Shcherbakov et al. 2019; Zhuang et al. 2002, 2004, 2005, 2008; Zhuang and Ogata 2006), the Omori-Utsu law of sequence (Shcherbakov 2014; Shcherbakov et al. 2018), its b value trend (Shcherbakov and Turcotte 2004; Helmstetter and Sornette 2003; Gulia and Wiemer 2019, 2021), its mainshock characteristics (Persh and Houston 2004; Rodríguez-Pérez and Zúñiga 2016; Tahir et al. 2012; Gentili and Di Giovambattista 2017) and the seismicity associated with its aftershocks (Vorobieva and Panza 1993; Vorobieva 1999; Gentili and Di Giovambattista 2017, 2020, 2022; Anyfadi et al. 2023; Gentili et al. 2024b). In particular, Gentili and Di Giovambattista (2017) proposed the machine learning-based algorithm NESTORE (Next STrOng Related Earthquake), which calculates seismicity parameters (features) at different time intervals after a first strong earthquake and provides the forecasting of a strong aftershock. Being DM the difference in magnitude between the first strong earthquake and its strongest aftershock in a sequence, accordingly with a classification of Vorobieva and Panza (1993), NESTORE distinguishes the cases where DM ≤ 1 (type A) from the others (type B) and provides a probability that an ongoing seismic cluster is of type A. The value of DM = 1 to discriminate between classes is not too far from the mean value of this difference proposed by Båth’s law (Båth 1965). This law states that the mean value of DM is approximately equal to 1.2 and does not depend on the magnitude of the first strong event. Båth’s law has been debated starting a few years later, e.g. Utsu (1969) found a correlation between DM and mainshock magnitude. More recent studies have found that DM can deviate significantly from 1.2 value (Shcherbakov and Turcotte 2004).

First versions of the NESTORE algorithm have been successfully applied to seismicity in California and Italy (Gentili and Di Giovambattista 2017, 2020, 2022). Recently, a new and online released version of NESTORE (NESTOREv1.0) was made available on GitHub (Gentili et al. 2023).

In this paper, we present the application of NESTOREv1.0 to Italian seismicity by considering both a nationwide and a regional approach. For this purpose, we used the data recorded in the last 40 years by the national seismic network of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) and the regional network of the Istituto Nazionale di Oceanografia e Geofisica Sperimentale–OGS. Specifically, for both approaches, we trained the program and evaluated its performance using data up to 2020 and then simulated its use in near-real-time mode using data after 2020.

In general, a regional approach is able to better identify particular characteristics of seismicity of a given region than a national approach. On the other hand, the number of available clusters in a regional approach can be very small compared to a national approach, which leads to a low significance of the analysis results. However, the high density of the OGS network in north-eastern Italy allowed us to lower the value of the minimum magnitude for the first strong earthquake used at the national level, resulting in a similar number of database clusters for both approaches.

The paper is structured as follows: Sect. 2 describes the seismotectonics of the areas under study; Sect. 3 describes the available catalogues at national and regional scale and their comparison at regional level; Sect. 4 describes the NESTOREv1.0 code, the improvements it has undergone compared to the previous application to the same area, and the impact of these changes in terms of parameter values and parameter stability. Sections 5, 6 and 7 describe the application of the algorithm at national and regional level, the general characteristics of type A and type B clusters in the two areas, and the comparison of their characteristics. Section 7 also describes a new training with a dataset containing all available and reliable data before 2021. Section 8 is the main result of this paper as it presents the application of the algorithm in near real-time during the ongoing clusters. In the “Discussion and Conclusions” section, the results are summarised and discussed.

2 Seismotectonic framework

Italy, located at the boundary between the African and Eurasian tectonic plates, is prone to earthquakes due to the interaction at these plate edges. The collision of these plates produced a complex seismotectonic setting constituted by areas having different faulting mechanisms and geophysical characteristics. In particular, the Alps exhibit both normal and reverse faulting, the northern Apennines and the Po Valley show a behaviour mainly of reverse type, the central and southern Apennines are dominated by normal fault type and the Calabrian Arc and the Ionian Sea are mainly characterised by strike-slip faulting. This section provides a concise overview of the tectonic characteristics of these four different regions of Italy.

2.1 Alps

Seismic activity in the Alps mainly takes place in the upper crust, with the south-eastern and western regions of the Alps showing higher concentrations of seismic activity than the central Alps. According to Kiratzi and Papazachos (1995), Becker (2000) and Vannucci et al. (2004), the Eastern Alps exhibit active compression normal to the mountain belt, while the Western Alps exhibit extension perpendicular to the trend of the mountain belt, as noted by Fréchet (1978), Mathey et al. (2021) and Montone et al. (2004). In the last five decades, the Eastern Alps have been hit by two major earthquakes: the 1976 ML 6.5 Friuli earthquake, and the 1998 ML 5.6 Kobarid earthquake (Gentili and Franceschina 2011). In contrast, no earthquake with a magnitude larger than 6 was recorded in the Western Alps during the same period.

2.2 Northern Apennines and Po Plain

Northern Apennines Neogene-Quaternary northeast-verging belt developed during the Euro-African convergence and involved the westward continental collision with the Adriatic lithosphere (Alvarez 1972; Doglioni 1991). During convergence, the rollback of the W-subducting Adriatic plate triggered the eastward migration of thrust fronts and foredeep basins. The youngest of these basins is the Po Valley. The Northern Apennines thrust fronts’ tectonic activity is supported by (a) historical and instrumental seismicity (CPTI 2004; CSTI 1.0, 2001; Castello et al. 2006), the latter being characterized by reverse focal mechanism (Pondrelli et al. 2006), (b) geological and geomorphological evidence of faulting and folding of recent deposits at the surface (Vannoli et al. 2004; Boccaletti et al. 2011) and (c) subsurface geophysical profiles showing very recent growth strata developed across buried anticlines (e.g. Scrocca et al. 2007). The seismic activity in this region is characterized by shallow and moderate-depth earthquakes.

2.3 Central and Southern Apennines

It is assumed that the Apennines were formed by the combined effect of the opening of the Tyrrhenian Sea, the eastward movement of a compressive front and the retreat of the lithospheric plate beneath the Italian peninsula. The Apennines are subject to frequent seismic activity, with earthquakes generally occurring at a depth of 10–15 km (Italian Seismological Instrumental and Parametric Database, ISIDe Working Group 2007). The presence and movement of fluids in the Earth’s crust can influence the stress and strength of fault zones, leading to changes in seismic activity. Active NW–SE striking normal faults separate the CO2-releasing western region, originating from the mantle, from the non-degassing eastern area. Recent studies have identified an anomalous region of more than 100 mW m2 with significant heat flow values over Tuscany and Lazio (up to 400 mW m2) (Della Vedova et al. 2001; Di Luccio et al. 2022).

2.4 Calabrian Arc and Ionian Sea

The region of the Calabrian Arc and the Ionian Sea, where the African and Eurasian plates collide, has a complex seismotectonic setting characterized by active fault systems, volcanic activity and earthquakes. The most important factor influencing seismotectonic activity in this area is the subduction of the eastern Mediterranean oceanic lithosphere beneath the Eurasian plate. This subduction process leads to a wide range of earthquakes, including frequent quakes of varying magnitude, both onshore and offshore, from micro-earthquakes to severe earthquakes having magnitude larger than 7. Earthquakes in the Calabrian Arc vary greatly in depth, ranging from a few kilometers to over 400 km. The predominant mechanism observed in this area is strike-slip faulting, which is associated with the subduction process and reveals an extensional style of deformation both in the direction parallel and perpendicular to the arc (Frepoli and Amato 2000).

3 Data

To evaluate the national approach of NESTOREv1.0, we examined the seismicity recorded by INGV stations since 1980. In particular, we used the Lolli and Gasperini (2006) and “Italian Seismological Instrumental and Parametric Data-Base” [ISIDe, ISIDe Working Group (2007), http://iside.rm.ingv.it/iside/] catalogs, covering respectively the periods 1980–2004 and 2005–2020 (Fig. 1a). The first results from the integration of three catalogs, namely the “Catalogo Strumentale dei Terremoti Italiani” (CSTI, CSTI Working Group Version 1.1), which covers the period 1981–1996, the “Catalogo della Sismicità Italiana” (CSI), which covers the period 1997–2002 (Castello et al. 2006) and the Italian Seismic Bulletin (http://bollettinosismico.rm.ingv.it/), which covers the period 2003 to the end of 2004. The ML of the merged catalogue was estimated by orthogonal regression to be compatible with the ISIDe data (Lolli and Gasperini 2006). Most of these data come from the Italian Telemetered Seismic Network (ITSN), whose waveforms were recorded from 1980 to 1984 with analog instruments and later with digital systems. This network was reinforced after the MS 6.9 1980 Irpinia earthquake (Boschi et al. 1990; Barba et al. 1995; Marchetti et al. 2004) and currently comprises more than 470 stations (Fig. 1a).

Fig. 1
figure 1

a INGV and b OGS catalogs used respectively for the NESTOREv1.0 nationwide and regional scale application. The purple rectangle shows the area of analysis at regional level by Gentili and Di Giovambattista (2020)

To investigate the regional application of NESTOREv1.0 on Italian territory, we have analysed the seismicity in an area covering the northeast of Italy (Fig. 1b). This area is monitored by both the regional OGS and the national INGV seismic network. The OGS network has been operational since May 6, 1977 and, except for a short period when it was not operational (December 4, 1990-May 21, 1991), provides a seismic bulletin for the Friuli Venezia Giulia region and surrounding areas, which we refer to as the OGS bulletin for simplicity (Snidarcig et al. 2020; Friuli Venezia Giulia Seismometric Network Bulletin 2020; http://www.crs.ogs.it/bollettino_new/). During the operational period of the OGS network, the temporal improvement of instruments and data acquisition and analysis systems, the increase of installed seismic stations and the sharing of data with other nearby seismic networks led to a lowering of the detection threshold for earthquakes in north-eastern Italy and to an extension of the monitored area to the east and west (Priolo et al. 2005; Gentili et al. 2011; Peruzza et al. 2015; Bragato et al. 2021). The OGS network currently consists of 43 seismic stations (Bragato et al. 2021).

Figure 1b shows the epicentral distribution of OGS Bulletin data between 1977 and 2020 and the distribution of the OGS network. Figure 2 shows the comparison between completeness magnitude Mc of the catalogues of OGS and INGV in the period 1980–2020 within a rectangular study area for north-eastern Italy (Fig. 1b) considered by Gentili and Di Giovambattista (2020). The longitude interval of this area is 11°–14° before 2008 and 11°–14.5° after 2008, while the latitude interval does not change over time and is 45.6°– 46.75°. The Mc is estimated by using the software Zmap and the maximum curvature method (Wiemer & Wyss 2000). In the INGV catalogue the Mc estimation starts later, due to the small number of recorded earthquakes in 1980. Due to the higher density of stations within the area, the Mc of OGS catalogue is generally smaller than the INGV one. This finding is particularly evident before 1985 and after 2010. This result can be easily explained by comparing Fig. 1 of Marchetti et al. (2004) with Fig. 2b of Peruzza et al. (2015). Figure S1 shows the local magnitudes provided by the INGV and OGS networks for the common events in the same period. The reference magnitude of the OGS catalogue is the duration magnitude MD, and to avoid both the incorrect estimation of the Gutenberg–Richter law (Bragato and Tento 2005; Gentili et al. 2011) and to align the magnitude type with the national scale, we used the conversion law of Gentili et al. (2011) to transform it to ML.

Fig. 2
figure 2

Comparison between the completeness magnitude of the OGS network (black continuous line) and the INGV network (green continuous line) between 1980 and 2020 in the rectangular area covering northeastern Italy shown in Fig. 1b. The dashed lines represent the error estimated by the bootstrap method

Figure S1 also shows the orthogonal regression of the data for ML ≥ 1.7 for the two catalogues OGS and INGV compared to the 1:1 line. Even if the difference between the two lines is small (maximum difference 0.1), the dispersion of the data is large; for this reason, we prefer not to merge the two catalogues and have decided to use the OGS catalogue provided by its network in the 1977–2020 period as part of the application at regional level.

4 Methodology: NESTOREv1.0

For our analysis, we used the machine learning MATLAB-based toolbox NESTOREv1.0 (Gentili et al. 2023) to evaluate the ability to forecast a Strong Subsequent Event (SSE) with magnitude MSSE on Italian territory after a first strong earthquake that has a magnitude value equal or larger than a fixed threshold Mth. In the following, we refer to the latter event as the First Strong Event (FSE) or operative mainshock and indicate its magnitude as MFSE. NESTOREv1.0 distinguishes clusters in two typologies according to the difference in magnitude between the FSE and SSE (DM). If DM is equal or smaller than 1, the cluster is defined as type A, otherwise type B. NESTOREv1.0 is a software freely available on GitHub (see Gentili et al. 2023 for more details) and consists of four modules that were used in this study. These are the cluster identification, training, testing and near-real-time classification modules. The flowchart of how the four NESTOREv1.0 modules work is shown in Fig. 3. Using the cluster identification module, NESTOREv1.0 identifies clusters from an initial seismic catalogue using a window-based method (Gardner and Knopoff 1974; Uhrhammer 1986; Kagan 2002; Lolli and Gasperini 2003; Gentili and Bressan 2008; Gentili and Di Giovambattista 2020). Then, it is necessary to divide the cluster database into two parts: one part is used for training and the other for testing to avoid bias in the estimation of the performances in case of overfitting. In the training procedure, NESTOREv1.0 measures a set of seismic parameters (features) at increasing time interval Ti on clusters from the occurrence of the FSE to learn to distinguish between type A and type B cluster populations. For each time interval, the classification is performed by a simple one-node decision tree for each feature, so that a threshold is set for the features for which the training converges to a reliable result. The threshold is chosen so that most clusters of type A have a feature value above the threshold and most clusters of type B below the threshold. The features are based on the number of events, their energy, their source area, their distribution in space and on the changes in time of the previous physical quantities. The main purpose of using these features is to detect changes in seismic activity, especially in the form of increased intensity and irregularity in terms of space, time and magnitude. This shift has been interpreted as a sign of instability of the nonlinear system associated with earthquake-generating faults (Vorobieva 1999) and has been detected in the past before significant earthquakes (Keilis-Borok and Rotwain 1990; Keilis-Borok and Kossobokov 1990) or during seismic clusters before stronger events (Vorobieva 1999; Vorobieva and Panza 1993).

Fig. 3
figure 3

Flowchart of the functioning of NESTOREv1.0

In the test procedure, for each cluster, the threshold values obtained during the training are compared with the feature values measured in the same intervals Ti of the training; the overall probability of type A P(A) is calculated by an approach based on Bayes’ theorem by combining the probabilities supplied by each feature classifier. For both test and training processing, we opted for an initial period of 6 h in order to have sufficient data for the analysis and to limit the effect of the increase in Mc at the beginning of the cluster. According to Gentili et al. (2023), if the number of events was at least 80, the Mc of each cluster is calculated using the maximum curvature method (MAXC) adding 0.2. Otherwise, NESTOREv1.0 assumes a Mc value equal to a default value McDEF. After the testing procedure verifies that the training performances are reliable, the near-real-time classification procedure allows to derive the probability that an ongoing cluster is type A using the training information. A more detailed description of the features used in the analysis (Table S1), the statistical parameters used in the evaluation of the threshold performances (Accuracy, Informedness, Precision, Recall, Good Interval Range) and the functioning of the four NESTOREv1.0 modules can be found in Section S1 of the Supplementary Material.

After the first version of the “NESTORE” algorithm (Gentili and Di Giovambattista 2017, 2020), already applied in Italy, several changes were made in the following years, making it more robust and transferable in different regions and improving the performance evaluation. In particular:

  1. 1.

    We used a minimum MFSE equal to Mc + 2 (Gentili and Di Giovambattista 2022) for the clusters instead of Mc + 3 in the previous versions of the algorithm, in order to maintain the reliability of the results and obtain more data for the analysis.

  2. 2.

    We used a Bayesian method to merge and combine independent classifiers (Gentili and Di Giovambattista 2020), which allows us to weight the classifiers of a single feature depending on their hit and false alarm rate and to deal with the typical imbalances between A and B classes.

  3. 3.

    We assumed a common starting time based on the equation of Helmstetter et al. (2006), which considers an increase in Mc due to the superposition of waveforms after the FSE (Gentili and Di Giovambattista 2022).

  4. 4.

    We improved the reliability of the cluster class definition by excluding the cases in which the DM is between 0.8 and 1.2 from the analysis, to take into account an uncertainty of magnitude estimation of at least 0.1 on MFSE and MSSE.

  5. 5.

    We ended the analysis for A-type clusters with the time of the first aftershock with a magnitude ≥ MFSE-1 instead of the time of the strongest aftershock.

  6. 6.

    We improved the estimation of training performance by using an independent database instead of a Leave One Out (LOO) method for the same training database.

The previous changes had several effects on the feature values, so that the thresholds obtained in the classification were not readily comparable with those of previous works. We summarise the effects below:

  1. 1.

    The use of clusters with smaller MFSE (point 1) as well as the need to use earthquakes with magnitude above Mc for all analysed clusters, forced a change in the feature definition so that the minimum magnitude of the events considered for their evaluation was ≥ MFSE-2 (instead of ≥ MFSE-3 as in the older versions of the software); on the other hand, point 5 implied the further condition that the analysed events had magnitude < MFSE-1. These two conditions together reduced the range of magnitudes compared to previous applications of the code and, as most features are cumulative, generally had the effect of reducing the values of the thresholds.

  2. 2.

    The change of the start time of the analysis (point 3) increases or decreases the number of events involved, depending on whether it starts earlier or later. In particular, the start time of this version of the code was generally smaller than in the previous case. This choice therefore had the effect of increasing the number of events considered and, in contrast to points 1 and 5, increasing the threshold.

  3. 3.

    Points 2, 4 and 6 only had the effect of increasing the stability of the method limiting the misclassification of clusters.

The resultant modification of thresholds compared to the previous software version cannot be predicted due to the opposite effect of points 1 and 5 on the one hand and 3 on the other.

Besides the more robust analysis of the region provided by the 6 changes, the main innovation of this work is the application of the near-real-time classification module to Italy, which allows the forecasting of A-type clusters during seismic crises.

5 Nationwide application of NESTOREv1.0 for Italy

5.1 Cluster identification and analysis region definition

Following Gentili and Di Giovambattista (2017), we identified clusters through a window-based approach using the laws of Uhrhammer (1986) and Lolli and Gasperini (2003), as presented in Section S2 of the Supplementary Material. Thanks to change 1 in Sect. 4, we were able to lower the threshold for minimum FSE magnitude (Mth) from ML 4.5 (as in Gentili and Di Giovambattista 2017) to ML 4. The analysis region was selected in two steps: first, we excluded off-shore areas to avoid clusters with high location errors due to poor azimuthal seismic station coverage. We also excluded the Etna area and events deeper than 30 km to avoid events related to the volcanic or subduction regime in the southern Tyrrhenian Sea (Lanzano et al. 2019). However we did not exclude the Vesuvius area as we did not have volcanic clusters in the region; the margins of the northern Italian area need to be less precisely defined due to data sharing with seismic networks of different countries. Even if Sardinia is located on Italian territory, its seismicity is too low to be considered in this analysis. Figure 4a shows the first selection boundaries in red together with the location of the FSE epicenters of the 47 detected clusters within the region. Type A clusters (red circles in Fig. 4a) account for 30% of the total number of clusters. In agreement with Gentili and Di Giovambattista (2017), we found that the central Apennines are dominated by A-type clusters (Fig. 4a). In addition, the B-type population is dominant in the northern Italian area and widespread in the southern Italian regions. According to Sect. 4, the A-type clusters of L’Aquila (2009), Emilia (2012) and Central Italy (2016) were not included in the analysis because a strong aftershock with magnitude or equal than MFSE-1 occurs before the 6 h following the FSE. In a future implementation of the algorithm, we plan to change this limitation by shortening the time interval between the FSE and the first strong aftershock. The second step in the region selection was to understand if there was a region of anomalous seismicity that should not be analyzed along with the others.

Fig. 4
figure 4

Clusters identified for INGV catalogs reported in terms of a their typology and b the classification performance made by a preliminary self-test of NESTOREv1.0

As described in Sect. 4, the NESTOREv1.0 method assumes that A-type clusters are characterised by higher values of seismicity-related parameters than B-type clusters at incremental intervals after the FSE. These parameters are the number of aftershocks with magnitude M ≥ MFSE-2 (N2), their spatial distribution (Z), their cumulative magnitude change (Vm), their normalised cumulative source area (S), the change in S with incremental (SLCum) and sliding windows (SLCum2), their normalised radiated energy (Q) and the change in Q over incremental (QLCum) and sliding windows (QLCum2). With NESTOREv1.0, a threshold value can be found for each of these features, so that the values corresponding to most A clusters are above and the values corresponding to most B clusters are below it. For example, when applying NESTOREv1.0 to the Greek territory, it has been shown that 6 h after the FSE, the Q threshold = 0.012 (ratio between the aftershocks’ and the mainshock’s energy) distinguishes the two cluster populations well (92% of correct classifications—Anyfadi et al. 2023). Of course, such thresholds may vary due to the specificities of seismicity of a given region, although the previously described different behaviour of the two cluster types has been confirmed in several studies and different regions of the world (Gentili and Di Giovambattista 2017, 2020, 2022; Gentili et al. 2023; Anyfadi et al. 2023).

To perform an initial validation of our model on the Italian territory, we applied the training and testing modules to the entire 1980–2020 cluster catalog, assuming a McDEF of 2.5, in agreement with Schorlemmer et al. (2010).

Figure 4b shows that NESTOREv1.0 achieves a correct cluster classification in 81% of the cases 6 h after the occurrence of the FSE. We found that 5 incorrectly forecasted clusters (56% of the misclassifications) are located in a zone in the northwestern Apennines, which includes the areas of Mugello, Garfagnana, Lunigiana (Tuscany) and the southwestern part of the Emilia-Romagna region. This region, outlined in purple in Fig. 4a, b, is mainly populated by type B clusters which exhibit anomalous seismic productivity in terms of the number of events and the amount of seismic energy released in a short time (Fig. 4b). This particular behaviour suggests that a separate training procedure would be more appropriate for this region. Since the clusters of this area are almost exclusively type B, which are of the order of ten, we decided not to include this area in the following national-level analysis. We then defined a national area for Italy without the previous anomalous area (ITA) and named the corresponding cluster catalogue C_ITA. It is important to note that we selected the purple region based on the available clusters. Some evidence of smaller sequences suggests that the selected region may be larger, but this will be the subject of future more detailed independent work on the characteristics of seismicity in Italy.

Since the area identified by the previous preliminary analysis of NESTOREv1.0 proves to be very particular and interesting in terms of geological characteristics and seismotectonic context according to recent studies, in the next section we have analysed in more detail the main geophysical features observed for the area in order to look for possible causes of the anomalous behaviour of the clusters in this area.

5.2 The anomalous area of the North Central Apennines

The anomalous behaviour of the previously discovered area can be attributed to its geological and structural complexity, as several studies have shown. This region acts as a transitional zone between the extensional-transtensional Tyrrhenian domain and the compressional Adriatic domain (Carmignani and Kligfield 1990; Barchi et al. 1998; Frepoli and Amato 1997, 2000; Montone et al. 1999; Carminati et al. 1998; Doglioni 1991; Doglioni et al. 1999; Negredo et al. 1997; Eva et al. 2005). The NNE movement and rotation of the Adria plate relative to the European plate has led to extension in the Tyrrhenian domain (Bassi et al. 1997; Anderson and Jackson 1987; Ward 1994; Devoti et al. 2002; Eva et al. 2005). Eva et al. (2005) identified transtensional stress at depths up to 30 km and transpressive stress at greater depths. Barani et al. (2010) linked high strain rates in the northern Apennines to the rotation of Adria and gravitational settling, particularly in the Etrurian Fault System. Bonini et al. (2016) attributed high seismic energy release during strong earthquakes to Coulomb stress coupling between the normal faults of the northern Apennines and surrounding thrust fronts. Di Luccio et al. (2022) found that CO2-rich fluids in the crust regulate seismicity and the evolution of intense sequences in the Apennine region, with higher CO2 outgassing areas corresponding to lower S-wave velocity (Vs) anomalies and low seismicity in Tuscany. Their analysis also revealed that most seismic events occurred near zones of rapid transition in Vs, similar to observations in the Parkfield area of the San Andreas Fault (Agostinetti et al. 2022). These regions, characterised by rapid material changes and supra-hydrostatic conditions, are prone to dynamic ruptures. It is hypothesised that high seismicity in these clusters originates from rigid structures near fluid-rich areas, influenced by tectonic stress and overpressure from crustal fluids. Further analysis with more extensive data sets will be conducted to investigate cluster behaviour in greater detail.

5.3 Application to Italy

Analyzing the C_ITA dataset, which covers a large time period, allowed us to better observe some characteristics of A-type clusters compared to B-type clusters at the national level. In particular, the Fig. 5a shows the duration of the clusters normalized by the time length determined by the window method and using the law of Lolli and Gasperini (2003). Furthermore, Fig. 5b, c show the number of aftershocks with magnitude greater than or equal to MFSE-2 and their cumulative seismic moment M0 released and normalized to that of the FSE. In this regard, we used the relationship proposed by Malagnini and Munafò (2018) to obtain M0 from ML. It is interesting to note that while the distribution of the normalized cluster duration for clusters B took values in the range between 0 and 1, the values for clusters A were all above 0.8 (Fig. 5a). In agreement with the results of Gentili and Di Giovambattista (2017) and Gentili et al. (2023), Fig. 5b, c show that A clusters were on average more productive than B clusters in terms of the number of aftershocks with a magnitude equal to or greater than MFSE-2 calculated over the entire duration of the cluster and their cumulative seismic moment. From this result it can be deduced that in 92% of B clusters no more than 8 events with M ≥ MFSE-2 were recorded, while for the 92% of type A clusters the number of events was greater than or equal to 8. Figure 5c shows that for the vast majority of A clusters (85%), events with M ≥ MFSE-2 released a cumulative seismic moment greater than half the value corresponding to the FSE and comparable to it in 62% of cases. In contrast, for Bs, the contribution of events with M ≥ MFSE-2 was in all cases less than one fifth of the value of the FSE.

Fig. 5
figure 5

a Distribution of the cluster duration normalized to the time window width given by the window-based method (Lolli and Gasperini 2003), b distribution of the number of aftershocks with M ≥ MFSE-2, and c the corresponding cumulative seismic moment normalized to that of the FSE for the C_ITA catalog

In machine learning applications, the available data is split into two separate data sets to obtain reliable training and performance evaluation: the first is used to train the algorithm, the second to verify performance. The percentage of data used in each dataset is variable. Some studies suggest that the best model performance is achieved when the training and testing databases account for 70% and 30% of the original dataset, respectively (Liu et al. 2019); other authors concluded that the percentage of data corresponding to the training database can be as high as 80% (Gholamy et al. 2018) or between 50 and 70% (Xu and Goodacre 2018). In this paper, for the NESTOREv1.0 application, we considered a ratio of 3 to 1 for the time periods corresponding to the training and test catalogues, respectively, at both national and regional levels. In particular, we split C_ITA into two parts, training the algorithm for the first 30 years (C_ITAtrain) and testing the results on the clusters of the last 10 years (C_ITAtest). Due to the higher number of available clusters in more recent times, which is a consequence of the better coverage of the network and the lower Mc, 63% of the clusters were used in the training analysis, while the remaining 37% were used in the test procedure. The spatial distribution of the two databases along the ITA area is shown in Fig. 6a, b.

Fig. 6
figure 6

a Training and b testing databases for ITA shown as a function of their cluster typology and local magnitude ML

5.3.1 Training for ITA area

Figure 6a shows the distribution of C_ITAtrain clusters, consisting of a set of 24 clusters occurring on the Italian territory between 1980 and 2009. During training, for each time interval and applying a LOO method to each feature classifier separately, NESTOREv1.0 estimates the performances of each classifier based on a set of performance evaluators. Features whose classifier fails the test with all evaluators for a time interval are not used for classification in that time interval. Figure 7a, b show two of these evaluators: Accuracy and Informedness (Powers 2011; Gentili and Di Giovambattista 2017; Gentili et al. 2023). The blue area corresponds to the region where NESTOREv1.0 considers the performance unreliable for the evaluator. Each symbol corresponds to a different classifier and indicates in which interval the threshold performance is good (ΔTG). Figure 7a, b show that no feature provides reliable performance for Ti greater than one day. This is due to the time elapsed between the o-mainshock and the occurrence of the first strong aftershock with M ≥ MFSE-1 for A-type clusters in the training database. In almost 43% of the cases, these earthquakes occur before the second day after the FSE. Consequently these clusters are no longer included in the training set on the second day following the o-mainshock (see Sect. 4), resulting in poorer statistics for the calculation of thresholds for longer time intervals.

Fig. 7
figure 7

Performance of C_ITAtrain single feature classifiers during the training evaluated in terms of a Accuracy, b Informedness, c ROC graph at 6 h after FSE and d ROC graph at 1 day after the FSE

Figure 7c shows the Receiving Operating Characteristics (ROC, Egan 1975; Sweets et al. 2000; Fawcett 2006; Gentili and Di Giovambattista 2017; Gentili et al. 2023) graph, which was determined using the LOO method for threshold values 6 h after the FSE. Compared to the results of Gentili and Di Giovambattista (2017), we observe a better performance for all features at 6 h of observation, while at 1 day (Fig. 7d) we observe an improvement in terms of False Positive Rate and a decrease in terms of Recall (True Positive Rate). The values of the feature thresholds found for C_ITAtrain are listed in Table S2 in section S3 of the Supplementary Material.

5.3.2 Testing for ITA area

We applied the testing procedure of NESTOREv1.0 to C_ITAtest, which consists of 14 clusters and is divided into 6 A-type and 8 B-type cases. These clusters correspond to the period 2010–2020 and their spatial distribution is shown in Fig. 6b. Figure S2a shows the Bayesian probability that a cluster belongs to the A-type as a function of the analysis periods Ti corresponding to C_ITAtest. In particular, 6 h after the FSE, 79% of the clusters are correctly classified, and the percentage of successful forecasting increases to 86% after the first day (Fig. S2a-8). Indeed, the ML 4.4 cluster that occurred in Sicily in 2011 was classified as B-type due to the initial lack of aftershocks with a magnitude of ML 2.4 or more before one day and correctly forecasted as A-type afterwards.

The Recall-Precision graph after 6 h shows that all type A forecastings are correct (i.e. no false alarm) and that 50% of type A clusters are recognized by almost all features and their Bayesian combination (Fig. S3). The latter percentage increases to 67% 1 day after the FSE, due to the change in classification of the ML 4.4 (2011) Sicily cluster (Fig. 8a). In addition, the ROC diagram shows a zero value for the false positive rate after one day (Fig. 8b), indicating that the B-type clusters are all correctly classified. This performance does not change in the following periods (see Fig. S2a). The two misclassifications (Fig. 9) correspond to the A-type clusters ML 4.2 (2017) Central Italy and ML 4.7 (2018) Molise, both characterized by the absence of aftershocks between the FSE and the SSE. In the first case, we speculate that the high seismicity in the Central Italy region since August 2016 has led to an anomalous behavior of the subsequent clusters, which cannot be considered isolated. As for the second case, due to the low density of seismic stations in the Molise region, Mc cannot be guaranteed for the entire duration of the cluster; a more detailed analysis of the seismicity of the 2018 Molise cluster’s seismicity, which is beyond the scope of this article, is the subject of the paper Gentili et al. (2024a).

Fig. 8
figure 8

NESTOREv1.0 performance in terms of cluster typology forecasting for C_ITAtest at 1 day from the FSE occurrence

Fig. 9
figure 9

Performance of C_ITAtrain thresholds on the C_ITAtest at 1 day after the FSE in terms of a Precision-Recall and b ROC graph

6 Regional scale application of NESTOREv1.0 on North East Italy

6.1 Cluster identification analysis

According to Gentili and Di Giovambattista (2020), the OGS bulletin has a Mc smaller than that of the national INGV network in correspondence with northeastern Italy. This finding made it possible to lower Mth in the first application of the NESTORE method in Northeast Italy (Gentili and Di Giovambattista 2020). In our analysis at the regional level, we considered the same time-dependent area, mainly covering north-eastern Italy (hereafter NEI), using the OGS bulletin in the period 1977–2020. Similar to the ITA area, only FSEs with a depth of less than 30 km were considered. In accordance with Fig. 2 and consistent with Gentili and Di Giovambattista (2020), we assumed a McDEF of 2.0 before 1994 and between 1.5 and 1.7 in the following years, set Mth for NEI at 3.7 and used the relationships proposed by the authors for the identification of clusters in space and time based on Gentili and Di Giovambattista (2020) (see section S2 in the Supplementary Material). We obtained a catalogue of 32 clusters for NEI (C_NEI), of which 8 belonged to the A-type (Fig. 10).

Fig. 10
figure 10

Clusters identified for OGS catalog reported in terms of their typology. Blue rectangle: study area

In contrast to ITA, the spatial distribution of the A-type population did not appear to be concentrated in specific areas, while the eastern part was dominated by B-type clusters. Similar to the catalog study of the national approach (Sect. 5.1), we analyzed the C_NEI catalog in terms of the effective cluster duration normalized to the window based method duration obtained from Gentili and Di Giovambattista (2020), as well as the number of aftershocks with magnitude greater than or equal to MFSE-2 and the corresponding cumulative seismic moment normalized to that of the FSE (Fig. 11).

Fig. 11
figure 11

a Distribution of the cluster duration normalized to the time window width given by the window-based method (Gentili and Di Giovambattista 2020), b distribution of the number of aftershocks with M ≥ MFSE-2, and c the corresponding cumulative seismic moment normalized to that of the FSE for the C_NEI catalog

The plot of normalized cluster durations shows that—in good agreement with the national approach—88 percent of type-A clusters had a value greater than 0.8 (Figs. 5a, 11a). Figure 11b shows that, as in the national approach, the expected number of events with a magnitude greater than or equal to MFSE-2 was less than 8 in clusters B. Furthermore, it can be observed that the A clusters were characterized by a lower average number of events with M ≥ MFSE-2 than in the national approach (Figs. 5b, 11b). In fact, although the majority of the A clusters were still characterized by a higher number of aftershocks than the B clusters, almost one third of the A cases had a number of aftershocks with M ≥ MFSE-2 of less than 3. Interestingly, the distribution of the B clusters in terms of the seismic moment released by the aftershocks (Fig. 11c) was even more peaked than that of the B clusters in the national approach (Fig. 5c). The cumulative value of the seismic moment of aftershocks, normalized to the value of the FSE, was below 0.2 for all B clusters and above this value for all A clusters. Moreover, in the regional case, the percentage of A clusters in which the cumulative seismic moment release of events with M ≥ MFSE-2 was comparable to that of the FSE (38%) is almost two times smaller than in the national case (Fig. 5c).

6.2 Training procedure and application for NEI

In the case of C_NEI, we chose almost the same temporal split as for C_ITA by considering data between 1977 and 2009 as the training catalogue (C_NEItrain) and between 2010 and 2020 as the test catalogue (C_NEItest). In particular, since we found that the ML 3.8 1999 Kobarid (Slovenia) A-type cluster has anomalous seismic properties compared to the other data from C_NEI, probably due to its proximity to an earlier cluster with a strong main earthquake in the same area, we did not include this cluster in the C_NEItrain training database.

Figure 12a, b show the spatial distribution of C_NEItrain and C_NEItest, respectively. In this case, 42% of the C_NEI clusters are used for the training of NESTOREv1.0 and the remaining 58% for the test procedure. Even if this temporal division of the database differs slightly (8%—see Sect. 5.3) from that given by Xu and Goodacre (2018), it makes it possible to limit the imbalance between A and B typologies in the two datasets.

Fig. 12
figure 12

a Training and b testing databases for NEI shown as a function of their cluster typology and local magnitude

6.2.1 Training for NEI area

In the NESTOREv1.0 training for the NEI regional area, we analysed 13 clusters consisting of 4 A-type and 9 B-type clusters (Fig. 12a). According to the Accuracy values obtained for C_NEItrain, the thresholds correctly identify the cluster typology in more than 75% of the cases. This percentage is 77% for N2 and S and 92% for Vm and Z after the first 6 h (Fig. 13a). At this period, the Informedness values of the features show that the rate of correct identification of type A clusters is between 53 and 75% (Fig. 13b). 12 h after the FSE, Accuracy and Informedness are both 1, showing that most features correctly identify all clusters of C_NEItrain (Fig. 13a, b). As in the case of C_ITAtrain, 50% of the type A clusters were found to have the SSE before the first 18 h after the FSE. This means that the periods corresponding to the good interval time are always less than 1 day. As for the previous national case, the thresholds cannot be compared with those of Gentili and Di Giovambattista (2020) due to the changes in Sect. 4. The determined thresholds are listed in Table S3 in the Supplementary Material. Comparing the ROC diagram obtained after 6 h (Fig. 13c) with the results of Gentili and Di Giovambattista (2020), the estimated performances during the training procedure are better, but this could be related to the exclusion of the 1999 Kobarid outlier from the training set. Comparing the results of C_ITAtrain with those of C_NEItrain, we find that the percentage of A-type clusters correctly identified for NEI with all features is similar to that of ITA. Conversely, for type B misclassification (False Positive Rate, FPR), the percentage for NEI is slightly higher than for ITA for some features (Fig. 13c).

Fig. 13
figure 13

Performance of C_NEItrain single feature classifiers during the training evaluated in terms of a Accuracy, b Informedness and c ROC graph at 6 h after the FSE

6.2.2 Testing for NEI area

The test catalogue for NEI (C_NEItest) consists of 18 clusters, which are shown in Fig. 12b. 40% of the C_NEItest clusters are located in eastern Slovenia and belong to the B-type class. The 3 A-type clusters of C_NEItest occurred in the Veneto region and in the north-central part of Friuli. Figures S2b and 14 show that the 6-h forecast of NESTOREv1.0 for C_NEItest is correct in 94% of the cases. The only misclassification in this period concerns the ML 3.7 (2015) Valdobbiadene earthquake, which corresponds to an anomalous A-type cluster (Fig. 14). After the FSE of magnitude ML 3.7 on May 12, no aftershocks with a magnitude greater than ML 1.6 were detected before the SSE (ML 3.6) on May 15. Since the Valdobbiadene earthquake is located in an area for which there was no training, the incorrect forecasting could be due to differences between the local seismicity and the Friulian-Slovenian seismicity. The Precision-Recall and ROC plots show that 6 h after the FSE, 67% of the A-type clusters are correctly identified by N2, S, Q, the simple and the Bayesian combination of feature probabilities (Fig. 15a, b). Similar to the C_ITAtest, the ROC plot shows that all B-type cases are correctly identified (Fig. 15b). This finding was to be expected due to the large population of type B clusters in C_NEItrain.

Fig. 14
figure 14

NESTOREv1.0 performance in terms of cluster typology forecasting for C_NEItest at 6 h from the FSE occurrence

Fig. 15
figure 15

Performance of C_NEItrain thresholds on the C_NEItest at 6 h after the FSE in terms of a Precision-Recall and b ROC graph

7 Thresholds characteristics for Italian territory

To better explore both the properties of feature thresholds for national and regional areas and to evaluate the performance of the near-real-time classification module of NESTOREv1.0 on ITA and NEI, we used the entire C_ITA and C_NEI catalogues. As described in section S5.1 of the Supplementary Material, we first checked the temporal stability of the feature thresholds for the Italian territory and their possible dependence on the 1980–1990 period, when the national network was characterised by a poorer performance in terms of Mc, location and magnitude calculation. We found that the thresholds for all features are stable in time to a first approximation, except in the case of S and Z (Figs. S4, S5). To use the refined thresholds for the near-real-time procedure in both the ITA and NEI areas, we performed an autotest for the two areas and excluded the clusters of C_ITA and C_NEI that were outliers in at least one period Ti. The resulting refined catalogues were named C_ITAR and C_NEIR and were used to perform a second training and obtain new thresholds for ITA and NEI (Tables S4, S5 in the Supplementary Material). This last process is described in more detail in section S5.2 of the Supplementary Material. Taking the period Ti = 6 h, an interval in which the performance results of both C_ITAtest and C_NEItest are significant, it was found that the ITA thresholds (Table S4) are two or three times higher than the NEI thresholds (Table S5) for all available features. This result seems to indicate that the characteristics of NEI seismicity differ from those of ITA seismicity in the early stages of cluster occurrence. In particular, NEI appears to be characterised by lower productivity in terms of number of events, source area and energy than those measured at the national scale. Further details on this observation can be found in section S5.2 of the Supplementary Material. In order to use a refined training corresponding to the longest possible time period, we decided to use the training corresponding to C_ITAR and C_NEIR for the near-real-time application to ITA and NEI described in the following section.

8 Near-real-time analysis for Italian territory

To obtain a first simulation of NESTOREv1.0’s ability to classify a cluster in the hours immediately following the occurrence of its FSE, we evaluated the performance of the near-real-time module in both the ITA and NEI area from 2021. Since the first period of feature calculation of NESTOREv1.0 is currently 6 h after the occurrence of the FSE, the forecasting of a SSE is limited to those clusters that do not have an event with a magnitude M ≥ MFSE-1 in the first 6 h after the FSE. Subsequently, 6 clusters for the ITA area and 3 clusters for the NEI area were analyzed, corresponding to the available cases with MFSE ≥ 4 and MFSE ≥ 3.7 in the case of ITA and NEI, respectively. As the final forecasting period, we chose the one corresponding to the best threshold performance obtained for the C_ITAtest and the C_NEItest, which corresponds to one day after the FSE in the case of ITA and 6 h after the FSE in the case of NEI. We have shown in Fig. 16a the type of clusters analyzed and in Fig. 16b the performance of NESTOREv1.0 corresponding to the last forecasting period.

Fig. 16
figure 16

a Clusters used in the period 2021–2024 for the near-real-time analysis and b NESTOREv1.0 correspondent forecasting performance on them at 1 day after the FSE for ITA and 6 h after the FSE for NEI; c Probability to be an A-type cluster vs time for A-type Gambettola and d B-type Montagano clusters

8.1 Application to ITA

For the ITA area, we considered the ML 4.4 Gambettola (type A), ML 4.6 Montagano (type B), ML 4.4 Catania (type B), ML 4.2 Ceneselli (type A), ML 4.0 Carfizzi (type A) clusters that occurred in the period 2023–2024 using INGV data (http://terremoti.ingv.it/) (Fig. 16). In addition, we included in the analysis the B-type cluster corresponding to the ML 3.9 Moltelparo (2023) event, whose MFSE value was reported as 4.0 in the first week after the FSE (Fig. 16). In such a case, we can test the classification in near-real-time, even when the magnitude of the FSE fluctuates around the given Mth threshold.

In the case of the Gambettola cluster (northern Italy), the FSE with magnitude ML 4.1 occurred at 10:45:41 UTC on January 26, 2023, followed by the SSE at 05:32:51 on January 28 with the same magnitude value. In the first six hours, there are four events with a magnitude of ML 2.1 or more, and the voting of the S, Z, Vm, Q and N2 features after six hours is of type A with a probability P(A) equal to 1. In the following periods, the additional A-type voting of SLCum, QLCum at 12 h and SLCum2, QLCum2 at 18 h after the FSE leads to a Bayesian probability of type A P(A) equal to 1 (Fig. S6) and then to a correct classification of the cluster for all periods considered (Fig. 16c).

The Montagano B-type cluster is characterised by a FSE of ML 4.6 that occurred on 28 March 2023 at 21:52:42 UTC. The earthquake was clearly felt in the Molise region (central Italy) and neighbouring regions, and the strongest aftershock occurred the following day with a ML of 2.6 at a time interval of just over 6 h. The latter event turns out to be the only aftershock analysed by NESTOREv1.0 that leads to a zero P(A) probability for S, Z, Q, Vm, N2 at 6 h and for SLCum at 12 h (Fig. S7). The resulting Bayesian probability that the cluster is type A is correctly equal to 0 from 6 h to one day after the FSE (Fig. 16d).

The third case analysed concerns a B-type cluster whose FSE had an epicentre about 10 km southeast of the coast of Catania (southern Italy) on April 21, 2023 with ML 4.4. This event, which was about 17 km deep, was widely resonant in the south-eastern part of Sicily and was followed by only two events with ML below 2. Since there is therefore no event with a magnitude greater than MFSE-2, all the features vote for type B from 6 h to 1 day. It follows that, as in the previous case, the classification of the cluster based on the Bayesian method is correctly type B for all increasing periods from the FSE occurrence onwards.

The fourth case analyzed is the type A cluster, whose FSE with ML 4.2 occurred on October 28, 2023 in the southern part of the Veneto region and was followed three days later by an event of the same magnitude. No event with a magnitude greater than 2.1 occurred in the time window between these events. Consequently, the features and their Bayesian combination result in a P(A) = 0 for each time period leading to a misclassification of the cluster.

The fifth case corresponds to the type B cluster in Montelparo (Marche region) having a FSE that occurred on November 14, 2023. Its estimated magnitude was ML 4.0 in the first two weeks, reducing to ML 3.9 thereafter. In the first six hours after the FSE, an event with ML greater than 2 occurs, and at 6 h the P(A) values provided by N2, Vm, Z, Q are equal to 0 and are inherited in the following periods. Even if another event with ML > 2.0 occurs after the 6 h and the features S, SLCum to SLCum2 vote for a cluster of type A, the resulting Bayesian probability P(A) is equal to 0 up to one day, which leads to a correct classification of the cluster.

The sixth and last case analyzed refers to the type A cluster that occurred in May 2024 in the eastern part of the Calabria region (southern Italy). Specifically, it was located near the village of Carfizzi, 30 km north of the city of Crotone. In this cluster, the FSE occurred on May 24 with magnitude ML 4.0, followed by three SSEs that occurred on May 28, May 29 and June 1 with magnitude ML 3.1, ML 3.9 and ML 3.2, respectively. Since the occurrence of the FSE, three events with a magnitude of more than 1.9 have occurred on the first day, two of them within the first two hours. All features of NESTOREv1.0 except QLCum and QLCum2 indicate the occurrence of a cluster A in each period, resulting in a correct cluster classification from 6 h to one day.

8.2 Application to NEI

In the case of NEI, we analysed in near-real-time the ML 3.9 Zuglio A-type cluster (2021), the ML 4.5 Klana B-type cluster (2023) and ML 4.6 Socchieve B-type cluster (2024) using the data from the OGS network and bulletin (http://www.crs.inogs.it/bollettino_new/). In all cases, a report was sent to the head of the OGS Seismological Research Centre (CRS).

The first cluster occurred near Zuglio, about 80 km northwest of Udine town. The FSE occurred on October 21, 2021 (ML 3.9) and its SSE (ML 3.0) occurred almost a day apart. Two aftershocks were recorded between these events, one with a magnitude of ML 2.1 and another with a magnitude of ML 1.1, which occurred two and five hours after the FSE respectively. The first of them was considered in the NESTOREv1.0 analysis, which correctly provides a Bayesian probability P(A) for the cluster equal to 1 after 6 h. In particular, the features S, Q and N2 vote for a cluster A with a P(A) = 1 in the case of S and Q and P(A) = 0.8 in the case of N2.

The second case considered refers to a type B cluster whose FSE with a ML of 4.5 occurred on July 29, 2023 in Klana (Croatia) near the border between Croatia and Slovenia. This event, which was felt in Croatia, Slovenia and north-eastern Italy, was followed by several aftershocks, the strongest of which had a ML of 2.4. As there were no events with a magnitude greater than 2.5, the available N2, S and Q features at 6 h gave a P(A) = 0, resulting in a correct classification of the cluster in near-real-time.

The third and last case analyzed corresponds to a type B cluster that occurred near Socchieve, about 60 km northeast of the city of Udine. The FSE with a magnitude of ML 4.6 was one of the strongest events recorded in this area in the last 20 years and was followed on April 5 by the strongest SSE with ML 3.3. Within the first six hours after the FSE, an event with M ≥ MFSE-2 (ML 2.7) occurred. This means that feature N2 indicates the occurrence of a cluster A with a probability of 0.86, with the N2 threshold value for NEI after 6 h being equal to 0.5 (Sects. 7-S5.2). In contrast, S and Q indicate the occurrence of a B cluster with a probability of 1. It follows that the overall Bayesian probability P(A) after 6 h is equal to 0 and that the forecasting of the cluster type is correct.

Since ITA and NEI coincide in the Northeast Italy-Western Slovenia area and MFSE for the Klana (2023) and Socchieve (2024) B-type clusters is greater than 4, the near-real-time performance of these clusters should also be evaluated with C_ITAR training. However, since the Klana (2023) and Socchieve (2024) clusters are characterized by a number of events with M ≥ MFSE-2 equal to 0 and 1, respectively, and since the feature thresholds for ITA are higher than those for NEI in most time periods, we can conclude that the classifications of these two clusters remain correct even if we consider the ITA area and the corresponding training of C_ITAR.

8.3 Final remarks

Although our analysis provides a correct near-real-time estimate of the cluster type for the most intense clusters that occurred in ITA and NEI after 2020 (Fig. 16b), it should be noted that our methodology is currently not applicable for periods of less than 6 h after the occurrence of the FSE. Future research will explore the possibility of further shortening this minimum period aiming to provide information on clusters where the next strong event occurs in the first hours after the FSE. It should also be noted that for a reliable evaluation of the performance of the near-real-time module of NESTOREv1.0, it is necessary to test the algorithm on a much larger independent database of clusters than the one considered so far. The analysis is promising, and we hope it can be of significant help for civil protection purposes in the future, after further investigations and applications by the scientific community, thanks to the sharing of the NESTOREv1.0 open-source software.

9 Discussion and conclusions

This paper presents the results of applying the new machine learning package NESTOREv1.0 (Gentili et al. 2023) to the seismicity data of Italy recorded by the INGV and OGS networks in the last 40 years. In this work we:

  1. 1.

    enhanced forecasting of A-type clusters compared to previous applications in Italy in terms of feature calculation, definition of clusters, estimation of training performances and near-real time classification

  2. 2.

    performed both a regional and national scale studies for Italy lowering Mth in the case of national-wide approach

  3. 3.

    identified an area of type B anomalous clusters in the northwestern Apennines characterized by remarkable frequency of events after the FSE that could be explained by several complex geophysical factors

  4. 4.

    trained NESTOREv1.0 in the first 30 years of two available catalogs and, by testing it in the 2010–2020 period, obtained a correct cluster forecasting in 94% of cases six hours after the FSE for regional approach and in 86% of cases one day after the FSE in the national approach

  5. 5.

    exploited the entire catalogs both to verify the stability of the feature thresholds in time and to perform a near-real time cluster classification on the period 2021–2024 that provides performance similar to the 2010–2020 testing analysis

According to the results shown in Figs. 5 and 11, additional information in terms of cluster seismic productivity, cluster duration and cumulative release of seismic moment can further refine the forecasting provided by NESTOREv1.0. We observed that in terms of cluster duration, while type B clusters have a variable value, type A clusters have a value almost always close to the duration defined by the window-based method (Lolli and Gasperini 2003; Gentili and Di Giovambattista 2020). This result is to be expected, since in the case of type A clusters, where the magnitude of the SSE is comparable to that of the FSE, the end of the time window provided by the window-based method is determined by both the aftershocks associated with the FSE and the ones (starting later) associated with the SSE. In contrast, in the case of the B types, the duration of the window-based method should in most cases cover the entire temporal decay of the aftershocks associated with the FSE, as the results in Figs. 5a and 11a show. In terms of the number of aftershocks with magnitude greater than MFSE-2, most clusters A and clusters B turn out to be well distinguished. In particular, such distinction appears to be most effective in the case of the Italian approach (Table 1). When comparing Figs. 5b and 11b, we hypothesise that the lower boundary between the two classes of clusters in the regional approach, compared to the national approach, could be due to the lower seismicity in the NEI area relative to the mean rate of seismicity in the ITA area. This result would be confirmed by the different behavior of features in the two areas in the first hours following the FSE (Sects. 7-S5.2). In the case of the cumulative seismic moment of aftershocks, the distinction between cluster A and B distributions turns out to be even more evident, especially in the case of the regional approach (Fig. 11c). Interestingly, for a relevant part of A-type clusters the cumulative seismic moment of aftershocks turns out to equal or exceed the seismic moment corresponding to the FSE for both approaches. Particularly in the case of the national approach, this occurs at a rate greater than 60 percent (Fig. 5c).

Table 1 Statistical parameters for clusters A and B derived in the case of C_ITA and C_NEI: effective cluster duration ΔTC , related to the duration of the window-based method ΔT, number of aftershocks with magnitude M ≥ MFSE-2, and their cumulative seismic moment release, related to the seismic moment M0 of the FSE

These findings can be valuable for predicting cumulative structural damage to buildings in the epicenter zone once the cluster type is forecasted. Specifically, for cluster A, we anticipate not only a longer duration and a higher number of intense aftershocks but also a greater cumulative seismic moment release compared to cluster B. As a result, structures may be subjected to cumulative energy content that is comparable to or even exceeds that of the FSE.

In detail, based on the summarized data in Table 1, we can conclude that once a cluster has been classified by NESTOREv1.0 as type A, we can further forecast the minimum cluster duration. Indeed it is 0.8 times the duration determined by the window method in the case of the national and regional approaches, with a probability of 1 and 0.88, respectively. The NESTOREv1.0 classification gives us some information also on the expected number of aftershocks with M ≥ MFSE-2: if the classification is “type B” we can estimate a probability of 0.92 that the maximum number is 7 in the national case and 2 in the regional case. Conversely, if the classification is type-A, we can expect a minimum number of events equal to 8 in the national case with the same probability value, while for the regional case this value is equal to 2 with an associated probability equal to 0.75. Finally, if regarding the cumulative seismic moment for aftershocks with M ≥ MFSE-2, based on the past data, we can estimate with a probability 1 that in the case of a type B cluster it does not exceed a value of 0.2 and 0.1 times the seismic moment released by the FSE in the national and regional cases, respectively. For type A clusters, on the other hand, it can be expected that half of the seismic moment released by the FSE will be reached or exceeded with a probability of 0.85 in the national case and 0.2 times the seismic moment released by the FSE with a probability of 1 in the regional case. These preliminary results and considerations could be highly useful for civil protection efforts following a first strong earthquake. However, due to the critical nature of decision-making through extensive validation is essential.

In our database the number of the cluster with potentially dangerous FSEs (MFSE ≥ 5.0) is small (30%). From a statistical point of view, a small database can lead to an inaccurate estimate of probabilities. On the other hand, from a machine learning perspective, small datasets offer advantages like faster training times, simpler models, and better outlier detection. However, they also risk overfitting, reducing generalisation (Safonova et al. 2023). To obtain a database that is as statistically significant as possible, we also consider clusters with weaker FSEs. This choice, assuming self-similarity between clusters with different MFSEs and taking into account the magnitude of completeness of the catalogs, gives us a case number in the order of a few dozen for the national and regional approach. To better validate the method, a larger dataset is necessary. This can be achieved by (a) extending the data collection period or (b) expanding the geographical area analysed.

Extending the period is more robust but time-consuming; for instance, it would take 10 years to collect a reliable dataset of approximately 50 clusters in the ITA region. Analyzing smaller clusters over time could expedite this process. Expanding the area could also increase the dataset. However, in Sect. 7 of this document, the regional differences in thresholds were pointed out, which also become more relevant when considering different countries (see e.g. Anyfadi et al. 2023). One possible approach could be the one used in Anyfadi et al. (2023) for different regions of Greece: the training is conducted in one area and a testing is performed to check whether the performance is still reliable in another area. If this is the case, the two areas can be merged into a single population region. A possible extension of the NEI area could, for example, eastwards towards central Slovenia or southwards towards Croatia. However, in order to maintain the level of completeness of the national catalogues, this analysis cannot be performed using world scale catalogues and a preliminary magnitude scale conversion is required to merge different national catalogues. Approach (a) is currently being explored in different regions. Regarding approach (b), we are enhancing techniques that facilitate the integration of existing catalogs to increase the number of clusters for analysis (Si et al. 2024). Additionally, increasing data through synthetic datasets modelled on seismicity (e.g., ETAS Console et al. 2015) or using oversampling techniques are both under study.

The most interesting applications from the ground shaking point of view are the ones in which during the occurrence of a cluster a strong SSE is expected. In this case the maximum expected level of resentment for the SSE could be calculated by using GMPEs. When the magnitude and the epicentral location of the SSE are known, a ground motion prediction equation (GMPE) can be used to calculate the estimated Peak Ground Velocity (PGV) in the epicentral area and define the area characterised by a PGV value ≥ 2.4 cm/s. According to Faenza and Michelini (2010, 2011), this area would experience at least moderate potential damage and very strong perceived ground motion.

In our case, NESTOREv1.0 is able to define a maximum magnitude for cluster B (MFSE-1.1) and a minimum magnitude for cluster A (MFSE-1). Furthermore, it supplies in output the circular area defined by the window method and the MFSE where the location of the SSE within is considered equiprobable. Thus, as a way of forecasting perspective within that area, using the GMPE for Italy of Lanzano et al. (2019), we could calculate the maximum PGV value (PGVMSSE) by taking as reference the maximum expected magnitude for SSE in the case of B-type clusters and the minimum expected magnitude for SSE in the case of A-type clusters. If PGVMSSE turns out to be equal to or greater than 2.4 cm/s, we can combine the forecasting of the cluster type with a warning of expected damage in the area. More specifically, in the case that damage is expected in the area, we could compare the PGVMSSE value with the law of Faenza and Michelini (2010) to define the maximum damage level in the case of cluster B and the minimum damage level in the case of cluster A. Regarding the area where the SSE is expected, this information may be further refined. Calderoni et al. (2017) in an application to the central Italy seismic sequence revealed that the early distribution of along-strike seismicity is notably asymmetrical. According to Rubin and Gillard (2000) and Zaliapin and Ben-Zion (2011), this asymmetry may suggest a correlation between the rupture propagation direction and the immediate post-main shock seismicity. Specifically, there is a higher probability of significant magnitude earthquakes occurring in the direction of rupture propagation. This was evidenced by the greater number of early-stage aftershocks towards the NNW and the azimuth of the strongest aftershock, which followed the main shock, aligning with the rupture propagation direction observed during the 24 August 2016, MW 6.0 Amatrice earthquake. Further analyses on this topic may in future improve NESTOREv1.0 software output.