Estimation of MCS intensity for Italy from high quality accelerometric data, using GMICEs and Gaussian Naïve Bayes Classifiers

Macroseismic intensity provides a qualitative description of seismic damage. It can be associated with Ground Motion Parameters (GMPs), which are extracted in near real-time from instrumental recordings during an earthquake. Several formulations of this empirical association exist in literature for Italy, mainly focusing on the relationship between intensity expressed on the Mercalli-Cancani-Sieberg (MCS) scale and peak ground acceleration or velocity. They are usually in the form of Ground Motion to Intensity Conversion Equations (GMICEs), which treat intensity as a continuous quantity. We propose an alternative approach, the Gaussian Naïve Bayes (GNB) classifiers, which allows to correctly treat intensity according to its ordinal definition. As a comparison, we also implement a modified version of the standard GMICE approach. We expand the existing database of GMP/MCS-intensity points with new, high-quality accelerometric data recorded in Italy in the period from 2002 to 2016 and resample the database by treating the intermediate intensities with half integer values (which are not meaningful in the MCS description) as both belong to the above and below full integer classes with an assigned weight. As a result, we estimate a new set of regression relations and GNB probability distributions between integer MCS intensity classes and eight GMPs (peak acceleration, velocity, displacement, Arias and Housner intensities, spectral acceleration at 0.3, 1.0 and 3.0 s). Results based on PGA and PGV are the most stable on the whole intensity scale. GNB models score better than GMICEs in terms of performance on unseen data and classification scores.


Introduction
After the occurrence of a large earthquake, the aim of the civil protection unit is to rapidly assess spatial distribution of damage levels with special attention to highest degrees. Macroseismic intensity can be used to provide a qualitative description of such consequences of a seismic event. Reliable and updated tools to immediately associate the ground motion information with expected macroseismic intensity can thus play an essential role in the timely implementation of civil defence emergency plans.
By definition, macroseismic intensity is an ordinal quantity, which means that it has natural, ordered categories and the distances between the categories are not known (Agresti 2013; Kuehn and Scherbaum 2010). Intensity classes are expressed as roman integer values and are always defined as a "collective" measurement, coming from the observation of many factors which are not linearly dependent on any single, directly measurable value. The first main implication of this fact is that the classes are not proportional to one another, meaning that there is no assurance that the effects observed for a degree II are two times those of a degree I (while for example we can exactly define the proportionality between the energy released by a M W = 4.0 and a M W = 5.0 earthquake). This also implies that intensity measures have a high error content, pushing strongly towards the impossibility of interpreting decimal values as an improvement in the actual intensity estimate.
Even so, Ground Motion to Intensity Conversion Equations (GMICEs) are the most common choice in defining instrumental intensity as a function of Ground Motion Parameters (GMPs). Many GMICEs have been estimated for Italy in the past (e.g., Margottini et al. 1992;Faccioli and Cauzzi 2006;Faenza and Michelini 2010;Caprio et al. 2015;Gomez-Capera et al. 2020). They are usually obtained in a simple linear regression form: where I is the intensity and X is the GMP. In GMICE-related literature, it is common practice to use this functional form as it is, which implies treating intensity as a continuous quantity and obtaining decimal intensity values as a forecast.
In the light of the very definition of macroseismic intensity classes, the current formulation in (1) is not appropriate. With this reasoning in mind, we wanted to obtain GMICEs for Italy that are more compliant to the intensity scale by applying pre-and post-processing to the data, in order to use integer classes only. We also decided to propose a radically alternative description of intensity in terms of a direct probabilistic estimate through the Naïve Bayes classification. This approach has the benefit of treating intensity as a discrete variable throughout the whole definition procedure. It is also a machine learning oriented procedure that can be easily updated as more and new data become available.
In fact, the increase in the number of seismological stations and in the instrumentation quality and the occurrence of the latest destructive earthquakes have already provided new high quality data for a better definition of instrumental intensity for the Italian case. In particular, in this study, we re-elaborated the Faenza and Michelini (2010) dataset with the addition of 82 new data points related to 18 events which occurred in the time-span from 2002 to 2016 in Italy, using high quality accelerometric data. Such data points consist in GMP/MCS-intensity data couples obtained by coupling each expert-assessed intensity value with the nearest available waveform in a 3-km radius. The choice of using only high-quality macroseismic data mitigates the possible bias introduced with the use of the nearest value (cf. Lesueur et al. 2013). (1) We tested a set of eight ground motion parameters. For each one we present both our improved version of the GMICE, as a comparison, and the Naïve Bayes classification, as a suggested best practice.

Input data
The accelerometric input database consists of two parts, according to the availability and ownership of the data.
The first part of the dataset is composed of 72 analysed events from 1972 to 2004 (Table  SI1 in the Online Resource), with the lowest local magnitude of 3.4, corresponding to 193 GMP-intensity pairs. These are part of the set of 87 events (266 GMP-intensity pairs) used by Faenza and Michelini (2010) to derive the GMICEs used to date for the Italian territory. Since part of these older recordings generally shows low quality, originally being analog traces with no information on the starting time, we conducted a thorough analysis in order to discard all the recordings with multiple events, or evident artificial signals arising from analog-to-digital conversion or with no clearly identifiable peaks in acceleration, velocity or displacement. Data for the remaining 72 events are taken from the ITACA 2.0 database (ITalian ACcelerometric Archive, version 2.0; Luzi et al. 2008;Pacor et al. 2011), and now all belong to the RAN network (Rete Accelerometrica Nazionale; Gorini et al. 2010;Costa et al. 2015), managed by the DPC (Italian National Civil Protection).
The second part of the database consists of a selection of 18 events (82 GMP-intensity couples) from 2002 to 2016, with 3.4 chosen as the lowest local magnitude, and which comes from the dataset used by Tiberi et al. (2018). These high-quality accelerometric data were collected by the CE3RN (Central Eastern European Earthquake and Research Network; Costa et al. 2010;Bragato et al. 2014) and RAN stations.
The associated macroseismic intensity data-points are taken from the 2015 version of the Italian Macroseismic Database (DBMI15; Locati et al. 2016), except for data related to the two 2016 events (Table SI1) that comes from QUEST reports taken for the Ibleo (Azzaro et al. 2016) and Amatrice (Galli et al. 2016; Tertulliani and Azzaro 2016) earthquakes. All intensity values are expressed in MCS scale. In order to exclude cumulative damage, we only took into consideration the main shocks.
The scope of the work is to improve intensity forecasts also for medium-high damage levels ( I ≥ VI ), where they become of interest for civil defence purposes. For this reason, and to guarantee the homogeneity of the database, we only considered macroseismic intensity measures issued from expert surveys. Furthermore, the inclusion of crowdsourced intensity data is not a trivial process and would go beyond the scope of this work.
The investigated set of ground motion parameters consists in peak ground acceleration (PGA), velocity (PGV), displacement (PGD), Arias intensity (I A ), Housner intensity (I H ), spectral acceleration at 0.3 s (PSA03), 1.0 s (PSA10) or 3.0 s (PSA30). The ground motion parameters for all events were calculated using a near-real-time procedure developed at the Department of Mathematics and Geosciences of the University of Trieste , in order to process the signals in an as homogeneous as possible way. Its main features include a Butterworth filtering between 0.1 and 50 Hz, with range automatically selected based on the signal to noise ratio, and a trend removal used to compute PGV and PGD values. For older waveforms, when the pre-event noise trace was not available, we used fixed filter frequency values taken from the ITACA database.
GMPs were taken from the maximum between the two horizontal component values. The parameters and the observed intensity values were associated using the minimum distance criterion, with a maximum distance limit of 3 km. The complete database thus counts 90 events ( Fig. 1) in the time-window from 1972 to 2016, corresponding to 275 associated GMP-intensity pairs (Table SI1 of the Online Resource). Intensity values range between II and X; the epicentral distances range between 1.6 and 150.7 km (Fig. 2) and are well distributed, especially for the central values of intensities (IV-VI).
It is common practice to treat intensity as a continuous value, for example by using half integer classes in assigning uncertain MCS intensity values, so this kind of data is widely present in dedicated macroseimic surveys and is also found in the Italian Macroseismic Database. Following Kuehn and Scherbaum (2010), in order to be consistent with the class definitions given by the MCS scale, we included this uncertainty information in the data by re-assigning the half integer values to the nearby integer classes, with the use of some weights, so that integer classes only will be used in the calculations. In particular, all data originally corresponding to half integer classes were assigned both to Fig. 1 Data set used for the definition of instrumental intensity: the red stars are the epicentral locations of the analysed events; the cyan triangles are the station sites for which the GMPs are estimated, and with an associated observed intensity value the above and below integer classes, with a weight w = 0.5 , whereas all data originally corresponding to integer classes were assigned a weight w = 1.
The resulting weighted data distribution in terms of intensity classes is shown in Fig  and I A are explicitly shown in the figures, while we provide results for all studied parameters in the Online Resource). For the peak parameters (PGA, PGV, PGD) there are 376 points in the weighted database (of which, 174 with w = 1 ); for Arias intensity, PSA03 and PSA10, 220 points (100 with w = 1 ); for Housner intensity, 200 points (94 with w = 1 ), and for PSA30, 195 points (91 with w = 1 ). The difference in the number of available points for each parameter mainly comes from the fact that, in the case of partial or cut recordings, the extracted parameters were limited to peak amplitudes (PGA, PGV, PGD) and all integral quantities were discarded. The main reason behind this choice is the risk of underestimating the integral parameters due to missing part of the record. In the case of Housner intensity and PSAs, moreover, we also discarded the cases for which the used high-pass filter was so high that it would filter out the frequency values used in the parameter calculation.

GMICEs
Linear regressions are the most common tool to define instrumental intensity. Even so, they treat intensity as a continuous numerical value; therefore, predicted outcomes are not directly meaningful and either have to be rounded to the nearest integer value, or to be interpreted as reflecting an uncertainty between two intensity classes. For this reason, we calculated an updated version of GMICEs for Italy in the most intensity-compliant way, to confront the results with those obtained via the more rigorous GNB methodology. We decided to keep the log-linear functional form itself (Eq. 1) in performing the regression, but we applied some pre-and post-processing in order to take into consideration the caveats discussed so far. The first part of the pre-processing is described in detail in the Input Data section; as a form of post-processing, we rounded up the resulting forecast values to the nearest integer.
In order to consider both the dependent and the independent variables as affected by sampling variability, which is more correct given the nature of our data, we calculated the GMICEs by using the ODR methodology (Boggs et al. 1988; odr in scipy.org). ODR is a common technique for fitting data to models and we used it to extract the intercept and gradient parameters (a, b in Eq. 1). This algorithm minimizes the weighted orthogonal distances from the curve, taking into consideration both the vertical ( y ) and horizontal ( x ) uncertainties. We used ODR in its simplest form, assuming that the ratio of the standard deviation of the errors on dependent and independent data ( y ∕ x ) is known and fixed. This also makes it possible to directly invert the relation, so the regression coefficients could be likewise used to express the GMPs as a function of intensity.

Application
Both intensity and strong motion data are characterised by an intrinsically high spatial variability. As for intensity, this fact has been addressed by defining macroseismic classes as a collective measurement, subtracting to the meaningfulness of "punctual" measures. On the other side, instrumental data are intrinsically punctual and strongly connected to specific, local geological conditions. The polar character of intensities also adds to the disequilibrium in the input dataset, mostly concentrated in the lower-central classes (IV-VI). A possible solution to address this variability is to perform a preliminary smoothing of data to filter out effects related to regional variability, random components, and geological conditions. We decided to follow the approach proposed by Faenza and Michelini (2010) and to bin the GMP-intensity couples into integer intensity classes as a form of smoothing of the instrumental data. The regressions are thus performed on the mean of the logarithmic GMP values in each bin. This choice also depends on the distribution of the GMP standardised values with zero mean and a unit standard deviation, represented in Fig. 4 (cf. Fig. SI3 of the Online Resource). It is clearly visible that after the application of the logarithm in base 10 the data follow the normal Gaussian curves, allowing us to estimate the intrinsic variability of ground motion data for each class.
We calculated the mean values and the standard deviations of these distributions and used them in the GMICE inversions. In our case, in particular, we estimated the mean values * k as the weighted arithmetic mean of the logarithm in base 10 of the parameter ( log X ), for each intensity class k between II and X: where w jk is the weight assigned to the j-th point among the N k data points with I k = k . As for the associated errors, we must take into consideration that there is an evident lack of data for some classes with respect to others (e.g. classes IX and X versus class V; cf. Figure 3). For this reason, it was not possible to provide a robust estimate of the regular standard deviation for those classes, which would turn out very low or even null and would not reflect the actual distribution of the underlying data. The use of such standard deviation values would also excessively push the noise filtering resulting from the binning, leading to an artificial increase in the statistical parameters related to the goodness of the fit. Following the approach proposed by Kuehn and Scherbaum (2010), we thus estimated a standard deviation CSD common to all intensity classes for each GMP, as the square root of the pooled variance: where we divided by the total number of samples ( N tot ) minus the number of different classes in which data were binned, nine in our case. Values of * k and CSD for each parameter are reported in the Online Resource (cf.  1 3 corresponding intensity values I are also assigned an error I to account for the dispersion of data. We tested different binning on the GMP data in order to check the corresponding discrete distribution of intensity values and decided to adopt a conservative value of I = 1.0 as a common standard deviation associated with all intensity classes and all parameters. For each parameter, GMICEs were thus calculated on nine data couples ( * k , I k = k ), with associated errors ( CSD , I ) and intensity classes ranging from II to X (cf. Fig. SI4 for a graphic representation). Regression parameters are reported in Table 1. To allow for a qualitative comparison of our results, for each equation we calculated the R squared value ( R 2 ), representing the proportion of explained variance of I, and the standard deviation of the bins ( ) and of the data ( d ). The standard deviation of the bins is defined as: where k = I k −Î k is the residual between predicted intensity value ( Î k ) and true intensity value ( I k ) corresponding to * k . Due to the low sample population, we used a reduced form where the number of intensity points used in the regression (nine) is reduced by the number of fitted parameters (a and b in Eq. 1). Just like R 2 , depends on residuals calculated on the binned dataset k . For this reason, it does not fully catch the actual underlying variability in I, and its values are way lower than the prior ones assigned to the input data ( I = 1.0 ). Following Gomez-Capera et al. (2020), we also calculated the standard deviation of the data d : where I n −Î n is the residual calculated for the n-th input point. Values of R 2 , and d are reported in Table 1. Obtained d values are close to 1 and provide a better measure of the variability in I for a given GMP value. Even so, since I is an ordinal variable, they cannot be used as-are and require some degree of interpretation. One possibility is to define a probability associated to each Î n , in the form of a Gaussian distribution centred on the forecasted intensity and with d as standard deviation (cf. Sect. 5.1).

Gaussian Naïve Bayes Classifiers
Linear regressions treat intensity as a continuous numerical value instead of an ordinal one; therefore, predicted outcomes are not directly meaningful and either have to be rounded to the nearest integer value, or be interpreted as reflecting an uncertainty between two intensity classes. To correctly handle ordinal data throughout the whole inversion process it is possible to apply a different method, the Gaussian Naive Bayes classification (GNB), which estimates a discrete conditional probability distribution Pr (I|X) linking the (ordinal) intensity I to any GMP X (Pedregosa et al. 2011). The goal is to obtain an alternative way to express forecasts in the form of an ordinal instrumental intensity value, with a known associated probability.
GNB is part of a set of supervised learning algorithms based on applying Bayes' theorem in the Naïve form, that is, with the assumption of conditional independence between every pair of features given the value of the class variable (Zhang 2004). In our particular case, having considered only one feature at a time, it coincides with the full Bayes theorem. We hereby give a synthetic overview of the procedure; for more details on the procedure itself and on the underlying statistics, we refer the reader to Lancieri et al. (2015) and references therein.
For any variable X taken among the eight selected GMPs, and the categorical variable I which is dependent on variable X, a Naïve Bayes classifier predicts the conditional probability distribution of I given log X by using Bayes' rule: According to Bayes' rule, Pr(log X|I) is the conditional probability of observing log X on class I, and Pr (I) and Pr (log X) are the a priori probabilities for I and log X , respectively. In this specific context, the probability of having intensity class k when the variable X takes the value x i can be expressed as: where summation over j covers the whole event space, i.e. all possible intensity classes. We should stress how the name Bayesian classifiers comes from the use of the Bayes' theorem, but does not automatically imply the use of Bayesian inference. In principle, it would be possible to define prior distributions and estimate the parameters using Bayesian inference; in fact, following Kuehn and Scherbaum (2010), all parameters were empirically estimated by maximum likelihood. Pr(I = k) was learnt from the data as the relative frequency of the classes observed on the dataset: where N k is the number of data in class I = k and N tot is the total number of data. As for the conditional probability Pr(log X|I) , we used the normal distribution already estimated from the dataset for each intensity class k, with a mean value of * k and common standard deviation CSD (cf. Sect. 2): By using GNB, we fit the probabilities on the whole dataset to obtain a discrete conditional probability distribution on all intensity classes for each input GMP value. We then chose to select the class with the highest associated probability value as the best estimate of I.

Application
We applied the Python algorithm pomegranate (Schreiber 2018) to perform GNB classification on the whole dataset. As opposed to the regression procedure, we performed the calculations without binning the data and only used information from the binned database in the form of the parameters * k and CSD to be inputted in Eq. 9. Figure 5 shows an example of the corresponding conditional probability distribution Pr(log X = log x i |I = k) for each intensity class for the PGA parameter.
The resulting intensity predictions are plotted in Fig. 6 (cf. Fig. SI5 of the Online Resource), colour-coded from lower (white) to higher (black) associated probabilities. They are obtained by applying the model to a linear space covering all values of the input parameters; for each value on the x-axis, the corresponding colour-coded probability values along the vertical (intensity) axis sum up to one. For each parameter, the resulting ODR equation is reported for comparison.

Performance on unseen data
The best way to assess the performance and reliability of the resulting intensity predictions, both from ODR and GNB, would be to test them on an 'unseen' dataset, different from the one used to extract them. In our case, the available database itself does not contain enough data to properly build both a training set and a testing set, so we resorted to Leave-One-Out cross-validation (LOOCV) as a proxy to assess the equations performance on unseen data.
LOOCV works by repeatedly dividing the whole dataset into two subset: the one used to train the equations, containing N-1 points, plus a single point which is left out to be used Fig. 5 Example of the conditional probability distribution Pr(log X = log x i |I = k) for the PGA parameter, for each intensity class, used in the GNB Classification fitting for validation. We constrained our LOO system so that only points associated to integer intensity values (i.e. weight equal to one) would be left out as a test case. We dropped each of these points in turn, performed the regressions on the remaining data, and used them to estimate intensity on the left-out data point ( Î ). The classification ability with respect to the actual values ( I ) was then scored using the Cross-Entropy loss function L (C.E.; also called log-loss): where o,c is 1 if the intensity value of observation o belongs to class c and 0 otherwise, and P o,c is the predicted probability that observation o has intensity class c. We chose to use the Cross-Entropy loss as it takes into consideration the probability associated to each intensity class, which should reflect the intrinsic variability in intensity values for a given ground-motion input. This allows to compare models not only on their average classification ability, but also on how well they capture the uncertainty. C.E. loss score increases as the predicted probability deviates from the actual label; it would be 0 for The ODR equation with associated ±2 error is reported for comparison (red lines), together with the mean GMP values used to derive the equations (white diamonds) and the underlying dataset (black circles). For each value on the x-axis, the corresponding probability values along the vertical axis sum up to one a perfect model. Notice that, in the case of ODR forecasts, the predicted probability was estimated by integrating the normal density centred on Î over the interval [I − 0.5 , I + 0.5].
The resulting C.E. loss scores are reported in Table 2. GNB classifier models score better than ODR regressions for all ground motion parameters, indicating an overall better performance of the GNB models. Note that among other parameters the equations regarding PGV (in agreement with Kuehn and Scherbaum 2010), PSA03, I A and I H provide the best performances.

Spectral parameters
We should stress that, even if both methodologies have a way to address the weakness arising from less populated classes, models obtained from spectral parameters are still resenting the lack of data in high intensity classes (I > VII). As explained in Sect. 2, our database included many older, triggered waveforms, for which only the peak amplitude parameters could be extracted without risk of underestimation. This led us to using a less populated database for the case of spectral parameters. We can see from the distribution of such data (e.g. Fig. 6c) that it particularly lacks in high intensity classes, which could lead to inconsistencies in the related forecasts. This holds true for both methodologies, and is simply more evident in the case of GNB where it translates into not well resolved probability values. For this reason, we advise that only the resulting models for PGA and PGV should be used for forecasts.

Sensitivity study
We tested both GNB and ODR models on the training dataset (described in Sect. 2) by comparing the predicted classes with the observed ones, to check in which data ranges each performed better. Results are shown in form of weighted confusion matrices in Fig. 7: the elements distributed along the highlighted diagonal are the number of data correctly categorized, while the off-diagonal elements are the misclassified data. GNB models provide more realistic outcomes for all classes with respect to ODR models, which also tend to a class overestimation (more elements on the right side of the diagonal). In both cases, PGV-based classification is more robust than the PGA-based one.

Application of GNB forecasts
In order to be directly applicable into shaking intensity maps, GNB classification models have to be converted to GMICE-like objects. We assigned a single instrumental intensity value to each input ground motion parameter value in the database range. The forecast is chosen as the class with the highest associated probability (corresponding to the darkest colour in Fig. 6). The result is a linear trend that is comparable to the regression output and that can be used as a guide in defining parameter ranges for each instrumental intensity class (cf. Table 3). Results for the PGA and PGV cases are reported in Fig. 8.

Comparison of GMICEs for Italy
We compared the empirical GMICEs obtained in this study (which use integer classes only and are somehow more compliant to the MCS intensity scale) with the relationships reported by Gomez-Capera et al. (2020), Caprio et al. (2015), Faenza and Michelini (2010), and Faccioli and Cauzzi (2006). A summary of the characterizing parameters is reported in Tables 4 and 5, for the PGV and PGA cases respectively. The relations are consistent to each other inside the common standard deviation values estimated with our dataset (Fig. 9). The main difference is in the reliability and range of Fig. 7 Method classification on the training dataset in the form of weighted confusion matrices: on the y-axis the true (observed) class label and on the x-axis the predicted one are reported. The elements distributed along the highlighted diagonal are the number of data correctly categorized. Results refer to the ODR a and GNB b methods based on the PGA parameter (upper panels) and for the ODR c and GNB d methods based on the PGV parameter (lower panels)  validity of these laws, those estimated in this study having higher values of R 2 and a wider range of validity. In fact, the resulting equations present high R 2 values for all the studied GMPs (over 0.90), rendering it impossible to indicate which of the parameters provides a better estimate of intensity. The lowest standard deviation of data is associated to the regression line for PGV ( d = 1.19). However, it should be kept in mind that the GNB models should be preferred to the ODR ones in any case, as results from cross validation and sensitivity test confirm.

Conclusions
The aim of this study was to provide an updated and more rigorous definition of instrumental macroseismic intensity for Italy. We used integer MCS intensity classes in the range II-X, together with high quality accelerometric data. Data was pre-processed in order to use integer classes only. For each investigated ground motion parameter (PGD, PGV, PGA, I A , I H , PSA03, PSA10 and PSA30) we provided both the GMICE formulation, which is  The dotted red lines are the ± 2σ error associated to the ODR GMICEs more used but less appropriate, and the GNB formulation, which correctly treats intensity as an ordinal quantity. Out of the eight tested parameters, models based on spectral parameters proved to be too unstable at higher intensity levels; PGA-and PGV-based equations should be used instead.
Overall, the GNB approach should be preferred as more rigorous in treating intensity as a discrete variable throughout its whole procedure. It goes beyond providing a single-valued intensity estimate, as it calculates a full discrete probability distribution for the MCS intensity classes. As a result, GNB-based models show better performance on unseen data and more capability in capturing the uncertainty than GMICEs. Overall, GNB models perform better than ODR ones on the whole considered intensity range, in terms of classification scores.
The possibility to increase the estimate accuracy with respect to the 'standard' GMICEs might be extremely useful in some applications, such as shaking intensity maps. In fact, GMICEs are the default choice in generating ShakeMaps with the USGS-ShakeMap software (Wald et al. 1999). We propose a conversion of GNB models to GMICE-like objects that can be substituted in the ShakeMap procedure.
The GNB-based methodology is a machine learning oriented procedure that can be easily updated as more data is collected. In the era of big data, it can be included in the effort to efficiently analyse incoming data in near-real time. Future work includes testing and calibration of this procedure for both the south-eastern Alps region, following Moratto et al. (2009), and for the Italian territory, as soon as new, independent intensity data on new events becomes available. In particular, it is fundamental to point out that in order to explain the damage in the near fault areas, a more focussed study is needed to expand the research to a combination of ground motion parameters.