The compilation and maintenance of databases, in conjunction with big-data management methodologies, allow for relevant advances in research. The treatment and processing of databases are indeed gaining increasing importance in modern probabilistic methodologies. The availability of databases is beneficial for the evaluation of statistical properties and for the development of empirical correlations. In addition, reliable databases provide reference benchmarks for validation and verification purposes.

Strong motion databases are common in applied seismology and geotechnical earthquake engineering (e.g. Chiou et al. 2008; Pitilakis et al. 2013; Ancheta et al. 2014). However, there are only a few examples of databases dedicated to site characterization for seismic hazard analyses. Stewart et al. (2014a) pointed out the necessity of reliable databases for the definition of new statistical models for modulus reduction and damping curves. In addition, the authors recommended the compilation of well-documented and accessible databases of shear wave velocity (VS) profiles to investigate their statistical properties within seismic ground response studies. Moreover, these databases would also be useful to study the reliability of geophysical methods for VS determination. The scientific literature includes some examples of such databases.

EPRI (1993) presented stochastic analyses on a database of over 350 VS profiles (mainly from sites in the United States) to develop guidelines for site investigations. Later, Toro (1995) compiled a database for the development of a geostatistical model for the management of uncertainties in VS profiles, using the PEA (Pacific Earthquake Analysis) database, with 745 VS profiles (including the ones of EPRI (1993) database). Only VS profiles that were measured in situ (i.e. 557) were included in the final data collection and were also partially used in a study by Wills et al. (2000). These VS profiles were also classified according to GeoMatrix (Chiou et al. 2008) and to the thirty-meter harmonic shear wave velocity (VS,30) proxies.

Moss (2008) and Comina et al. (2011) presented analyses for the quantification of measurement uncertainty propagated on the VS,30, but their databases were limited in size. Stewart et al. (2014b) proposed a database compiled for Greece collecting information in the open literature, research reports, professional engineering reports and personal communications. Each site in this database had geophysical measurements and was categorized according to typical classification proxies, as discussed by Toro (1995). The main scope was a method for the extrapolation of VS,z (harmonic average shear-wave velocity profile down to depth z) to VS,30 in case of lack of experimental data at increasing depths.

More recently, Sadiq et al. (2018) and Ahdi et al. (2018) compiled VS databases following similar examples presented over the last few years (Kayen et al. 2004; Ahdi et al. 2017). The main goal of these works was to share information worldwide with the technical and scientific communities. Another VS database is presented in Aimar et al. (2019) to validate simplified approaches included in the Italian building code (NTC 2018) for the soil class amplification factors. In this case, the authors also extended their database with additional synthetic profiles generated with a geostatistical model (Passeri et al. 2020).

The above-mentioned databases are affected by some limitations. First, most of the databases do not include specific information and details on the processing and interpretation methods of the experimental data. In some of them, empirical correlations with other geotechnical tests were used to determine the VS profiles. Therefore, these databases cannot be used to assess the statistical properties of the results for each specific seismic method. Second, most of the databases contain a limited number of profiles; the larger databases present a substantial spatial variability between the sites (e.g. ranging from California to Italy), and sites are grouped by distance or other proxies only in a few cases. Third, some of the databases include profiles with very limited investigation depths.

For inter-method comparisons between invasive and non-invasive methods, the former (e.g. down-hole DH, cross-hole CH, suspension logging, seismic cone SCPT, seismic dilatometer SDMT) are usually assumed as the ground truth. Boore and Brown (1998) performed some comparisons, also looking at the dynamic response of the sites in terms of surface-to-bedrock theoretical transfer functions (TTFs). They reported an underestimation of VS for shallow layers and an overestimation for deep layers by surface wave tests with respect to invasive tests. The different volumes investigated by the methods were suggested as a possible reason for the observed differences. Later, Brown et al. (2002) considered ten examples of inter-method comparisons, showing large inter-method differences only where considerable lateral variability was present at the sites. For laterally homogeneous sites, lower velocity values (around 15%) from surface wave tests than from invasive tests were observed close to the surface. Nowadays, it is accepted that invasive tests are also subject to a non-negligible uncertainty, particularly at shallow depths (Garofalo et al. 2016b). Several inter-method studies showed that the agreement between the results is good if both invasive and non-invasive tests are conducted and interpreted using state-of-the-art methods (Foti et al. 2011; Kim et al. 2013; Piatti et al. 2013b; Cox et al. 2014; Garofalo et al. 2016a; Garofalo et al. 2016b).

The modeling of uncertainties for a specific seismic method requires their identification, quantification, and management (Passeri 2019) and a neat separation (if possible) of uncertainty sources (i.e. epistemic uncertainties and aleatory variabilities). From the surface wave test perspective, uncertainties are mainly associated with the acquisition and inversion of the Experimental Dispersion Curve (EDC) (Foti et al. 2019). Uncertainties of the EDC can be estimated with repeated acquisitions and/or numerical simulations. For the SASW (Spectral Analysis of Surface Waves) method, Marosi and Hiltunen (2004) showed experimental values of the VR (Rayleigh wave phase velocity) Coefficient of Variation (\({\mathrm{COV}}_{{V}_{R}}\)) typically around 1.5%. They also found that VR values were normally (i.e. Gaussian) distributed for frequencies in the 20–150 Hz range. O'Neill (2004) analyzed numerical simulations and experimental results showing that the \({\mathrm{COV}}_{{V}_{R}}\) increases nonlinearly for decreasing frequencies (i.e. from 1% to around 30%) and proposed a Lorentzian distribution for the low-frequency band. According to Lai et al. (2005), the EDC could be subdivided into a low-frequency zone with high uncertainties and a high-frequency zone with low uncertainties, as a result of the natural loss of resolution of the Rayleigh waves with depth and of the experimental challenges in generating low-frequency waves. They also confirmed the normal distribution of VR. More recently, Olafsdottir et al. (2018) confirmed the hypothesis of VR normal distribution with a dataset in Iceland using the probabilistic theories illustrated by Shapiro and Wilk (1965) and Ross (2014).

The EDC uncertainties are then propagated to VS models in the solution of the surface wave inverse problem. Given the ill-posedness of the inverse problem and the consequent non-uniqueness of the solution, several different VS profiles can honor the EDC equally well. It is therefore possible to identify a set of VS profiles which are intrinsically equivalent, i.e. their associated misfit is small compared to the EDC uncertainties. The analysis of equivalent VS profiles is usually undertaken by global search inversion methods with inferential statistical tests (e.g. Socco and Boiero 2008).

In this framework, this paper presents a flat-file database compiled for the assessment of statistical properties of surface wave tests (mostly active) at different sites: the Polito Surface Wave flat-file Database (PSWD). The flat-file database also includes passive surface wave data and experimental results from invasive methods at several sites, for inter-method comparison. The general characteristics of the PSWD are presented and discussed herein. The main novelty of the PSWD is its homogeneity and consistency in terms of processing and interpretation methods. Specifically, a common processing strategy and a consistent inversion approach were applied to guarantee the statistical reliability of the results. This approach is novel with respect to available databases and allows for statistical analyses also with respect to the comparisons with invasive tests, when available. The PSWD is publicly available as an electronic supplement to this paper for validation and/or verification benchmark studies.

Polito Surface Wave flat-file Database (PSWD)

The primary attention of the Polito Surface Wave flat-file Database (PSWD) is devoted to surface wave tests (mostly active). However, the database structure is flexible and allows for the storage of further information, including results of additional geophysical tests (for inter-method comparisons) and/or geotechnical tests (for correlation studies). The PSWD was compiled with surface wave tests performed by Politecnico di Torino (sometimes in collaboration with Università di Torino) in the last 25 years. For each site, a representative EDC was obtained following state-of-the-art processing approaches, homogeneous through the years (Foti et al. 2007). When possible, a quantification of experimental uncertainties was also performed.

The ill-posedness of the surface wave inverse problem, which causes the non-uniqueness of the solution, was investigated through a specific statistical sampling of the model parameter space. The EDCs collected over this wide time interval were indeed systematically reinterpreted adopting a two-step inversion procedure (Passeri 2019) with an improved Monte Carlo inversion algorithm (Socco and Boiero 2008). In both steps, layer thicknesses, number of layers, shear wave velocities and Poisson’s ratios were assumed as random variables of the problem. This inversion methodology led, for each site, to a homogeneous set of equivalent solutions. This set was determined after the application of a one-tail statistical test (Socco and Boiero 2008) on the inversion results, propagating the experimental uncertainties into the VS profiles. As a consequence, the VS profiles in the database may be slightly different from those reported in the literature for the same sites.

The PSWD is distributed as an electronic supplement of this paper. For each site the data shared in the electronic supplement include the EDC (with experimental uncertainties, when available), the best fit (i.e. lowest misfit) VS profile from surface wave tests and the VS profile from invasive tests (when available). Note that the data shared in the open-access version of the flat-file database only include the lowest-misfit solution. The entire statistical sample of equivalent profiles is stored in the private version of the database and can be made available upon request. Hereafter further details on the experimental information, processing approach and inversion procedure are presented.

Experimental information and processing approach

Seventy-one Italian sites are included in the PSWD version discussed in this paper (Fig. 1). Surface wave tests provided only the fundamental mode of the EDC for these sites. The geographic location of these fundamental-mode sites is shown in Fig. 1, where each point has a specific color that will be consistently used throughout the paper. The higher densities of sites are in Central (i.e. Abruzzo and Marche regions), Northeast (i.e. Friuli region), and Northwest (i.e. Piedmont region) Italy. Figure 1 also shows a zoom of the Central Italy area that was struck by the L’Aquila earthquake in 2009 (Monaco et al. 2012) and by the Central Italy 2016 seismic sequence (Stewart et al. 2018). This area was widely characterized during the seismic microzonation projects led by the Italian Civil Protection Department (Hailemikael et al. 2020).

Fig. 1

Spatial distribution of the fundamental-mode sites included in the PSWD and zoom on Central Italy with the highest density of sites (the colors are consistent within the entire paper, except for the areas with multiple sites that are marked with a single black dot)

Table 1 contains general information about the sites with respect to: their geographical location, the surface wave test setup, the properties of the EDCs and the presence of independent invasive tests (available for forty-four sites). Active surface wave data were acquired for all sites using a linear array of receivers, whose characteristics are also reported in Table 1, in the usual Multichannel Analysis of Surface Waves (MASW) approach. For seventeen sites, passive surface wave data were also acquired with a 2D circular array of receivers, whose dimensions are also indicated in Table 1. The geographical location given for each site represents the location around which both active and passive surveys were deployed. Each site therefore represents a single measuring point for which multiple active data acquisitions along the same line (usually not less than 10 shots) and multiple passive data recordings were performed. Literature references about the sites, containing more details on the performed surveys and on the adopted processing technique for EDC extraction, are also included, when available. These references also provide further details on the geological setting of the sites, which allow for possible physical explanations of the resulting VS profiles based on the geology of the sites. The expected geological setting at the sites is also briefly summarized in the electronic supplement. More geological information can be obtained from available geological maps near the geographical locations of the sites, and further specific geological information can be made available upon request.

Table 1 Fundamental-mode sites included in the PSWD

For all the tests a homogeneous processing procedure was adopted. Both active and passive data were processed through a frequency-wavenumber (f-k) transform of seismic records. For active data the code SWAT (Surface Wave Analysis Tool), developed in the MATLAB® environment (Matlab 2018) at Politecnico di Torino, was used. For passive data a beamforming technique (Zywicki 1999) was adopted. The EDC was estimated by automatic picking of energy maxima in the f-k image within a preselected high-energy area. This yields the f-k pairs associated with the propagation of the Rayleigh waves. When active and passive tests were available at a site, the information from the two surveys was merged (e.g. Foti et al. 2007).

Forty-nine of the sites also have an experimental evaluation of the Rayleigh velocity standard deviation (\({\sigma }_{{V}_{R}}\)). This evaluation was obtained by processing several acquisitions along the same survey line or several passive data recordings. This procedure allows for the evaluation of uncertainties related to background non-stationary noise, but other effects, such as errors in the geometry, cannot be assessed. Therefore, the measured uncertainty is a partial estimate of the overall one.
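The evaluation of \({\sigma }_{{V}_{R}}\) from repeated acquisitions can be sketched as follows. This is a minimal illustration and not the processing code used for the PSWD: the function name, the data layout and the numerical picks are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption.

```python
import statistics

def edc_statistics(repeated_velocities):
    """Per-frequency mean, sample standard deviation and COV of V_R.

    repeated_velocities: list of lists, one inner list per frequency,
    holding the phase velocities (m/s) picked from each repeated record.
    """
    results = []
    for velocities in repeated_velocities:
        mean_vr = statistics.mean(velocities)
        sigma_vr = statistics.stdev(velocities)  # sample std, n - 1 (assumed)
        results.append((mean_vr, sigma_vr, sigma_vr / mean_vr))
    return results

# Hypothetical picks at two frequencies from three repeated shots.
picks = [[200.0, 210.0, 190.0], [300.0, 300.0, 300.0]]
stats = edc_statistics(picks)
```

Each tuple in `stats` holds the mean velocity, its standard deviation and the corresponding coefficient of variation at one frequency. As noted above, such an estimate only captures the scatter between repetitions.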

Figure 2 shows the experimental wavelengths (\(\lambda\)) associated with the EDCs at each site (the IDs reported in the first column of Table 1 identify the sites). Given the link between maximum wavelength and penetration depth of surface waves, Fig. 2 offers direct information on the depth of investigation obtainable at the different sites. Most wavelength intervals range from 2–3 m to 50–60 m. However, several experimental measurements cover a wider wavelength band (from 1 to 100 m). A few EDCs (most significantly at La Salle, Mathi and Tarvisio sites, see Fig. 2) show experimental wavelengths above 100 m, thanks to the use of passive surveys. Within the PSWD the only constraint to limit the wavelength validity domain of the EDCs was related to the lowest frequency limit at which the EDCs were evaluated. This was established as a function of the geophone natural frequencies (i.e. 4.5 Hz for active data and 2 Hz for passive data). No other criteria with respect to the array length or aperture were used since several different ranges are proposed in the literature and a unique criterion is still not available (e.g. Cornou et al. 2006).
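The wavelength associated with each EDC point follows from \(\lambda = V_R / f\). The short sketch below computes it and discards points below a lowest frequency limit. It assumes, for illustration only, that the limit coincides with the geophone natural frequency; the function name and the EDC values are hypothetical.

```python
def wavelengths(frequencies_hz, phase_velocities_ms, f_min_hz):
    """Return (frequency, wavelength) pairs at or above the lowest usable frequency."""
    pairs = []
    for f, vr in zip(frequencies_hz, phase_velocities_ms):
        if f >= f_min_hz:
            pairs.append((f, vr / f))  # lambda = V_R / f
    return pairs

# Hypothetical active-source EDC; 4.5 Hz is the natural frequency quoted in the text.
edc = wavelengths([3.0, 5.0, 10.0, 50.0], [450.0, 400.0, 300.0, 150.0], f_min_hz=4.5)
```

In this example the 3 Hz point is dropped and the remaining wavelengths span 3 to 80 m.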

Fig. 2

Experimental wavelengths associated with each experimental dispersion curve (EDC) of the fundamental-mode sites included in the PSWD

Inversion procedure and shear wave velocity profiles

The inversion of the EDCs in the PSWD followed a two-step process (Passeri 2019). First, the inversion was performed with a fixed number of layers and \(2 \times 10^{5}\) trial models. This inversion aimed at producing a preliminary population of solutions and at reducing the size of the model space for the second step. However, a complete statistical sample of solutions can be obtained only if the number of layers is also allowed to vary in the inversion process (Cox and Teague 2016). Therefore, a further randomization for the second inversion step was implemented with \(2 \times 10^{5}\) additional trial models, considering also the number of layers as a random variable. The assumption of the Poisson’s ratio (\(\nu\)) variability (together with layer thicknesses, number of layers and shear wave velocities) in both inversion steps is a further strength of the methodology, compared to usual approaches that assume a-priori values for this parameter. The only a-priori assumption in the inversion was related to the layer densities, which were estimated from available information on the site, given the limited sensitivity of the EDC to density variations. From the final population of solutions, the equivalent profiles (i.e. VS profiles that equally fit the EDC) were determined after the application of a one-tail statistical test (Socco and Boiero 2008). The reader can refer to Passeri (2019) for further details on the proposed inversion methodology.
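The random sampling of the model space can be illustrated with the sketch below. It is not the Monte Carlo algorithm of Socco and Boiero (2008) nor the two-step procedure of Passeri (2019): uniform distributions, the parameter ranges and all names are assumptions made for illustration, and the forward modeling and statistical test are omitted.

```python
import random

def sample_model(rng, n_layers_range, thickness_range, vs_range, nu_range):
    """Draw one layered model; the last layer is the half-space (no thickness)."""
    n_layers = rng.randint(*n_layers_range)  # number of layers as a random variable
    return {
        "thickness": [rng.uniform(*thickness_range) for _ in range(n_layers - 1)],
        "vs": [rng.uniform(*vs_range) for _ in range(n_layers)],
        "nu": [rng.uniform(*nu_range) for _ in range(n_layers)],
    }

rng = random.Random(0)  # fixed seed for reproducibility
# Hypothetical ranges: 3-8 layers, 1-20 m thick, VS 100-1200 m/s, nu 0.20-0.49.
trial_models = [
    sample_model(rng, (3, 8), (1.0, 20.0), (100.0, 1200.0), (0.2, 0.49))
    for _ in range(1000)  # far fewer than the number of trial models used per step
]
```

Fixing `n_layers_range` to a single value, e.g. `(5, 5)`, mimics a first step with a fixed number of layers, while a wider range mimics the second step.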

Examples of the inversion results are reported in Figs. 3 and 4 for La Salle-1 (ID 22) and Torre Pellice-1 (ID 65) sites, respectively. These two sites show different characteristics in terms of geological and mechanical properties of the subsurface. La Salle-1 site lies on a wide alluvial fan, mainly composed of alluvial Quaternary deposits, medium to coarse grain sized, with a thickness of around 100 m, over compact glacial deposits. These characteristics were confirmed by stratigraphic logs, down to 50 m depth, reporting the typical chaotic sequence of gravelly soils of alpine alluvial fans, with absence of a properly layered structure. At the Torre Pellice-1 site, a velocity inversion is expected in the shallowest 30 m. This is caused by the presence of a shallow formation of fluvial sediments over softer lacustrine sediments. The thickness of the upper fluvial sediments is expected to be variable between 10 and 50 m. The bedrock is likely located at more than 100 m depth in the central part of the valley. However, it is shallower on the lateral valley portions where tests were performed.

Fig. 3

Results of the inversion for La Salle-1 site (ID 22) in terms of equivalent solutions, in grey, and minimum misfit solution (i.e. the solution provided in the flat-file database), in red: a interval velocity profiles; b harmonic average velocity profiles; c experimental (black dots) and equivalent theoretical dispersion curves; and d theoretical surface-to-bedrock transfer functions (TTFs)

Fig. 4

Results of the inversions for Torre Pellice-1 site (ID 65) in terms of equivalent solutions, in grey, and minimum misfit solution (i.e. the solution provided in the flat-file database), in red: a interval velocity profiles; b harmonic average velocity profiles; c experimental (black dots) and equivalent theoretical dispersion curves; and d theoretical surface-to-bedrock transfer functions (TTFs)

The examples reported in Figs. 3 and 4 show the equivalent interval velocity VS profiles (Figs. 3a and 4a) and the equivalent harmonic average VS,z profiles (Figs. 3b and 4b), in grey. The VS,z was calculated for each profile with the formula:

$$V_{S,z} = \frac{\sum_{i=1}^{n} h_{i}}{\sum_{i=1}^{n} \frac{h_{i}}{V_{S,i}}}$$

where n is the number of layers down to the depth z, hi is the thickness of the ith layer and VS,i is its shear wave velocity. The solution with the lowest misfit (i.e. the solution reported in the database) is also reported, in red, both in terms of VS profiles (Figs. 3a and 4a) and VS,z profiles (Figs. 3b and 4b). The misfit (M) was calculated, following Socco and Boiero (2008) and Wathelet et al. (2004), with the formula:

$$M = \frac{\sum_{i=1}^{l} \left( V_{ti} - V_{ei} \right)^{2} \sigma_{ei}^{-2}}{l - \left( 2n - 1 \right)}$$

where Vti and Vei are respectively the theoretical and the experimental phase velocities, σei are the experimental uncertainties, l is the number of data points in the EDC and n is the number of layers of the profile.
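The two formulas above can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the half-space is represented by one extra velocity entry, z is assumed positive, and the misfit sum is taken over the l data points of the EDC, which is an interpretation of the formula.

```python
def harmonic_average_vs(thicknesses, velocities, z):
    """V_S,z: depth z divided by the vertical shear-wave travel time down to z.

    thicknesses: layer thicknesses (m) of the layers above the half-space.
    velocities: V_S (m/s) of each layer plus the half-space (one more entry).
    """
    travel_time, top = 0.0, 0.0
    for i, vs in enumerate(velocities):
        # The half-space (last entry) extends to any depth.
        h = thicknesses[i] if i < len(thicknesses) else float("inf")
        used = min(h, z - top)  # portion of this layer lying above z
        travel_time += used / vs
        top += used
        if top >= z:
            break
    return z / travel_time

def misfit(v_theoretical, v_experimental, sigma_experimental, n_layers):
    """Normalized misfit M with l - (2n - 1) degrees of freedom."""
    l = len(v_experimental)
    dof = l - (2 * n_layers - 1)
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    chi2 = sum(
        ((vt - ve) / s) ** 2
        for vt, ve, s in zip(v_theoretical, v_experimental, sigma_experimental)
    )
    return chi2 / dof

# Hypothetical profile: 10 m at 200 m/s and 20 m at 400 m/s over an 800 m/s half-space.
vs30 = harmonic_average_vs([10.0, 20.0], [200.0, 400.0, 800.0], 30.0)
```

For this hypothetical profile the travel time to 30 m is 0.1 s, so `vs30` is 300 m/s, lower than the thickness-weighted arithmetic mean because slow layers dominate the travel time.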

An extremely limited variability in VS and VS,z can be observed at shallow depths for both sites. This is typical when performing global inversion strategies over surface wave data and it is related to the reduced uncertainty of the EDC in the higher frequency range and to the sensitivity of the EDC to the properties of the shallow layers, resulting in a well-constrained inversion in the shallower portion of the profile. At greater depths, the relatively high VS variability is strongly reduced when expressed in terms of VS,z profiles (Socco et al. 2015). Note that the dynamic response of a deposit is highly dependent on the VS,z (Kramer 1996), therefore the comparison should always be conducted in terms of VS,z (Passeri et al. 2020). In addition, the VS,z profile provides initial information about the investigated site if compared to the experimental dispersion curve (Socco et al. 2017; Passeri 2019).

The comparison of the EDC with the theoretical dispersion curves of the equivalent profiles is reported in Figs. 3c and 4c, where consistency between the experimental and the theoretical site signatures can be observed. Finally, the one-dimensional, linear Theoretical Transfer Functions (TTFs) are reported (Figs. 3d and 4d). They confirm that the geophysical equivalence in the VR-f and VS,z-z spaces implies the equivalence in terms of the dynamic response of the deposit. Indeed, at least the first two resonance peaks (which are also the most relevant for seismic response evaluations) are very consistent among the whole set of equivalent solutions, consistent with similar findings in the literature (Foti et al. 2009; Comina et al. 2011; Griffiths et al. 2016; Teague and Cox 2016; Teague et al. 2018; Passeri et al. 2019a). The higher resonance frequencies are more dispersed in the case of La Salle-1 site (Fig. 3d) because of the complex stratigraphy in the shallow layers, which makes the role of uncertainty in the determination of layer boundaries more relevant.

A summary of the inversion results in terms of minimum achieved misfit (Mmin) and number of accepted equivalent profiles is reported in the electronic supplement of the paper for all the sites. Note that the solutions included in the PSWD can be slightly different from the ones previously proposed in the literature (see the last column of Table 1 for references). This is due to the new two-step inversion process that was systematically adopted for the entire flat-file database. However, there is a general accordance between the two interpretations. Note also that the statistical significance of the resulting number of equivalent profiles is adequate for most of the sites, indicating that the solution space was consistently sampled with the proposed inversion approach.

Statistical analysis of test results

The study of uncertainties and variabilities (Budnitz et al. 1997) is nowadays crucial for statistical analyses performed within a probabilistic framework (Bommer 2003). Hereafter, first the uncertainties and variabilities in the EDCs of the PSWD are discussed. Then, their influence on the solution is presented. Finally, shear wave velocity proxies from EDC are evaluated, discussed and validated.

Uncertainties in the EDC

The identification and quantification of uncertainties associated with the EDC is not straightforward. Indeed, the influence of different uncertainty sources converges into the EDC and a precise distinction between them is most often unfeasible (Teague and Cox 2016; Passeri et al. 2019a). For this reason, the uncertainties in the EDCs are usually jointly modelled via the overall \({\mathrm{COV}}_{{V}_{R}}\) as a function of frequency (Lai et al. 2005; Foti et al. 2014). However, the experimental evaluation of \({\mathrm{COV}}_{{V}_{R}}\) is often prevented in standard applications, as a statistical population of test repetitions (i.e. multiple shots at different locations) is not always available (Foti et al. 2018). Therefore, a single, deterministic EDC is often obtained and used for the inversion. Each sampled frequency is then associated with a single, deterministic VR value, with no information on uncertainties and variabilities in the measurements (Foti et al. 2019). Hereafter, we propose a formulation for the estimate of uncertainties and variabilities in the EDC when multiple measurements are not available. Note that the uncertainties in the EDC are always relevant as they are propagated in the surface wave inverse problem and therefore to the estimated shear wave velocity profiles.

Several possible functional forms were investigated to study the \({\mathrm{COV}}_{{V}_{R}}\) variability with frequency within the PSWD for the forty-nine sites having an experimental evaluation of \({\mathrm{COV}}_{{V}_{R}}\) (Passeri 2019). Also, the dependency of \({\mathrm{COV}}_{{V}_{R}}\) on wavelength, VR and different combinations of these two parameters was studied (Passeri 2019). In addition, various mathematical formulations and fitting algorithms were compared. In the end, the \({\mathrm{COV}}_{{V}_{R}}\) was confirmed to be mainly dependent on frequency (f) as already recognized in literature (Marosi and Hiltunen 2004, O'Neill 2004, Lai et al. 2005, Olafsdottir et al. 2018).

A preliminary analysis of \({\mathrm{COV}}_{{V}_{R}}(f)\) was obtained by calculating the moving average and the standard deviation of all the experimental data (Fig. 5a). Each of the forty-nine sites contributed to this analysis with its EDC. The number of data points of each EDC is reported in the electronic supplement of the paper; the total number of available data points for the forty-nine sites is above 2600. This analysis suggests a \({\mathrm{COV}}_{{V}_{R}}\) ≅ 0.05 at high frequencies, increasing up to 0.15–0.2 at low frequencies. These values are in agreement with other examples in the literature (Marosi and Hiltunen 2004; O'Neill 2004; Lai et al. 2005; Foti et al. 2009; Comina et al. 2011; Cox et al. 2014; Garofalo et al. 2016a; Garofalo et al. 2016b; Olafsdottir et al. 2018; Teague et al. 2018).

Fig. 5

Coefficient of variation (COV) of the Rayleigh wave velocity as a function of frequency: a entire set of experimental values along with the calculated moving average; b selected best fitting model and suggested precautionary choice with related parameters

The functional form selected for the regression is a double exponential power law:

$${\text{COV}}_{{V_{R} }} \left( f \right) = a_{1} e^{{a_{2} f}} + a_{3} e^{{a_{4} f}}$$

where ai are the regression coefficients and f is the independent variable.

The fitting of the experimental \({\mathrm{COV}}_{{V}_{R}}\) was performed using the LAR (i.e. least absolute residuals) robust regression algorithm (Dumouchel and O'Brien 1992; Huber 2011), obtaining an adjusted R-squared value of 0.98.

The proposed functional form has a finite value at f = 0 Hz (\({\mathrm{COV}}_{{V}_{R}}\) = 0.3 = a1 + a3) and a slight increase of \({\mathrm{COV}}_{{V}_{R}}\) for high frequencies (Fig. 5b). Indeed, the increased uncertainty of surface wave tests is associated with both very low frequencies (due to the lack of penetration at large depths and near-field effects) and very high frequencies (due to local small-scale heterogeneities, spatial aliasing and attenuation). The minimum \({\mathrm{COV}}_{{V}_{R}}\) ≅ 0.03 is obtained for frequencies between 15 and 35 Hz, where almost any array setup usually provides reliable results. The values of the fitting parameters are reported in Fig. 5b along with an additional set of parameters provided to obtain a conservative estimate, i.e. an upper bound, to be used in case of low confidence in the experimental data (dashed red line in Fig. 5b). This upper bound is obtained as the 90th percentile (i.e. 10% of the experimental values are above the fitting line).
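The behaviour of the functional form can be explored with the sketch below. The coefficients are HYPOTHETICAL and were chosen only so that the value at f = 0 Hz equals a1 + a3 = 0.3 and the minimum falls in the band described above; the fitted values are those reported in Fig. 5b and are not reproduced here.

```python
import math

def cov_vr(f, a1, a2, a3, a4):
    """Two-term exponential model: a1*exp(a2*f) + a3*exp(a4*f)."""
    return a1 * math.exp(a2 * f) + a3 * math.exp(a4 * f)

# HYPOTHETICAL coefficients for illustration only (not the fitted parameters).
coeffs = dict(a1=0.28, a2=-0.25, a3=0.02, a4=0.012)

# Evaluate only inside the frequency range of the regression (about 2.5 to 75 Hz).
frequencies = [0.5 * k for k in range(5, 151)]
cov_values = [cov_vr(f, **coeffs) for f in frequencies]
f_at_min = frequencies[cov_values.index(min(cov_values))]
```

With these illustrative coefficients the first (decaying) term controls the low-frequency increase, the second (slowly growing) term controls the slight high-frequency increase, and `f_at_min` falls between 15 and 35 Hz.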

Note that the fitting is valid in the frequency range in which the regression was formulated (i.e. from about 2.5 to 75 Hz). The formulation should be adopted with caution outside this range. Also, note that the suggested relationship regards only the uncertainties for the fundamental mode of the EDC. Further analyses are needed for an estimation of the uncertainties associated with higher modes in the EDCs.

Statistical properties of the solutions

Through the EDC inversion, a set of equivalent solutions for each site was obtained. These results can be processed using a systematic statistical inference method (Ang and Tang 1984). The inferential analysis allows for the determination of the main statistical properties of the solutions.

Figure 6a shows, for each site in the PSWD, the mean \(\stackrel{-}{{V}_{S,z}}\) of all the equivalent profiles as a function of depth z, up to the depth with a minimum of fifty equivalent profiles. It can be observed that the PSWD includes a wide range of subsoil conditions. The \(\stackrel{-}{{V}_{S,z}}\) are reported in natural values, although their calculation was conducted on the logarithmic values. Figure 6a also includes a red dotted line highlighting the depth of 30 m, where the \(\stackrel{-}{{V}_{S,30}}\) ranges from 163 m/s (Catania-1) to 787 m/s (Tarcento-4). Figure 6b shows the logarithmic standard deviation \({\sigma }_{\mathrm{ln}\left({V}_{S,z}\right)}\) corresponding to each \(\stackrel{-}{{V}_{S,z}}\). The \({\sigma }_{\mathrm{ln}\left({V}_{S,z}\right)}\) profiles show a typical range of values roughly between 0.01 and 0.05. Higher values are observed for a reduced number of sites (12%) in the uppermost 3 m, reflecting the wavelength content of the EDCs. For increasing depths, most of the sites show a \({\sigma }_{\mathrm{ln}\left({V}_{S,z}\right)}\) lower than 0.025. Also, it can be observed that \({\sigma }_{\mathrm{ln}\left({V}_{S,z}\right)}\) can be approximated as constant with depth. Local peaks represent the residual influence of the position of the interfaces. However, the consistent global behaviour is maintained and can be modelled as depth independent. Note that a constant logarithmic standard deviation is equivalent to a constant coefficient of variation (i.e. \({\mathrm{COV}}_{{V}_{S,z}}(z)\)). This means that the Gaussian standard deviation (i.e. \({\sigma }_{\left({V}_{S,z}\right)}(z)\)) increases with depth, as the harmonic average velocity typically increases. This observation is in line with the gradual loss of resolution of the surface wave tests with depth (Socco et al. 2010; Foti et al. 2014).
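The statistics on logarithmic values and the link between a constant logarithmic standard deviation and a constant coefficient of variation can be sketched as follows. The function names and sample values are hypothetical, and the back-transformation of the mean as exp(mean of ln) is an assumption about how the natural values were obtained. The relation COV = sqrt(exp(σ_ln²) − 1) is the standard result for a lognormal variable.

```python
import math
import statistics

def log_statistics(values):
    """Return (exp of the mean of ln, sample std of ln) for positive values."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.mean(logs)), statistics.stdev(logs)

def cov_from_sigma_ln(sigma_ln):
    """Exact COV of a lognormal variable with logarithmic std sigma_ln."""
    return math.sqrt(math.exp(sigma_ln ** 2) - 1.0)

# Hypothetical V_S,z values (m/s) of equivalent profiles at one depth.
mean_vs_z, sigma_ln = log_statistics([290.0, 300.0, 310.0])

# A constant sigma_ln gives a constant COV, so the standard deviation in
# natural units (about COV times the mean) grows as V_S,z grows with depth.
cov = cov_from_sigma_ln(0.025)
sigma_shallow, sigma_deep = cov * 200.0, cov * 600.0
```

For small values such as those reported above, the COV is numerically very close to the logarithmic standard deviation itself.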

Fig. 6

Parameters of the equivalent profiles for each site (the statistics of the sample of equivalent profiles are computed with a minimum number of 50 models): a mean of the harmonic average shear wave velocity profiles; b logarithmic standard deviation of the harmonic average shear wave velocity profiles; c mean of the interval shear wave velocity profiles; d logarithmic standard deviation of the interval shear wave velocity profiles

The same calculations were performed for the interval velocity profiles. In this case (Fig. 6c-d), the logarithmic standard deviation of the shear wave velocity is strongly influenced by the position of the interfaces, and peaks are present at various depths. These peaks correspond to the depths of the interfaces, where a lognormal distribution cannot describe the interval velocity profile. Indeed, the solution non-uniqueness mostly affects the position of the interfaces between layers in the interval velocity profiles. The resulting values presented in Fig. 6d mostly range from 0.02 to 0.2 and are in accordance with other evidence in the literature, which is comprehensively compared in Passeri (2019).

To clarify the above finding, Fig. 7 shows the inference method applied to the VS,z profiles presented in Fig. 3b and to the VS profiles presented in Fig. 3a for the La Salle-1 site. The same analysis was conducted on the remaining sites in the PSWD, with similar results. Specifically, Fig. 7 shows both the histograms (upper panels) and the quantile–quantile plots (Q-Q plots) (lower panels) assuming a lognormal distribution (Rasmussen 2004) for a standard lognormal random variable (i.e. VS,z or VS) at specific depths. The lognormal distribution is a common assumption for shear wave velocity, as it is usually preferred over the normal (i.e. Gaussian) distribution when modeling non-negative quantities.

Fig. 7 Histograms and Q-Q plots of the population of equivalent profiles at La Salle-1 (ID 22) for verification of the lognormal distribution: harmonic average shear wave velocity profiles: a, b, c, d, and e; interval shear wave velocity profiles: f, g, h, i, and l
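
A lognormal Q-Q check of the kind described above can be sketched as follows. This is an illustrative implementation, not the procedure used for Fig. 7: the plotting positions (i − 0.5)/n, the function names and the use of the correlation of the Q-Q points as a straightness measure are all assumptions of this sketch.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_qq(values):
    """Return (theoretical, observed) standard-normal quantile pairs.

    ln(values) is standardised with its own mean and standard deviation;
    for a lognormal sample the points fall close to the 1:1 line.
    Plotting positions (i - 0.5) / n are an assumption of this sketch.
    """
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    mu, sigma = mean(logs), stdev(logs)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    observed = [(x - mu) / sigma for x in logs]
    return theoretical, observed


def qq_correlation(theoretical, observed):
    """Pearson correlation of the Q-Q points (1.0 means a straight line)."""
    mt, mo = mean(theoretical), mean(observed)
    num = sum((t - mt) * (o - mo) for t, o in zip(theoretical, observed))
    den = math.sqrt(sum((t - mt) ** 2 for t in theoretical)
                    * sum((o - mo) ** 2 for o in observed))
    return num / den


# Hypothetical bimodal sample, mimicking interval velocities near an interface
bimodal = [200.0] * 100 + [400.0] * 100
r_bimodal = qq_correlation(*lognormal_qq(bimodal))
```

A sample split between two velocities, as happens for the interval VS when the interface depth varies among equivalent profiles, yields Q-Q points that depart markedly from a straight line.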

The lognormal distribution for VS,z is confirmed, showing an excellent approximation at different depths, including those close to the interfaces. However, the same lognormal assumption cannot be adopted for the interval velocity (i.e. VS), as proposed by Toro (1995), Kottke and Rathje (2009), and Li and Asimaki (2010). Indeed, VS cannot be modeled as lognormally distributed close to the interfaces between layers, because the uncertainty in the position of the interfaces introduces discontinuities in the distribution. This is particularly clear in Fig. 7f and 7j, where both the histograms and the Q-Q plots show a bimodal shape. This is related to the presence of sharp interfaces at around 5 and 100 m depth in the profile. This effect is due to the intrinsic superposition of the random variables time and space in the interval velocity profile. On the other hand, the harmonic average profiles are calculated as the ratio of depth to cumulative travel time. Therefore, the separation of the two random variables provides a continuous function (Passeri 2019).
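
The ratio of depth to cumulative travel time can be sketched as below for a layered model. The function name, the treatment of the last layer as a half-space and the example models are assumptions made for illustration only.

```python
def harmonic_average_vs(thicknesses, velocities, z):
    """Harmonic average shear wave velocity V_S,z down to depth z.

    thicknesses: layer thicknesses in m, same length as `velocities`; the
                 last layer is treated as a half-space and its thickness is
                 ignored (assumption of this sketch)
    velocities:  interval V_S of each layer in m/s
    z:           target depth in m (z > 0)
    """
    if z <= 0:
        raise ValueError("z must be positive")
    travel_time = 0.0
    top = 0.0
    last = len(velocities) - 1
    for i, (h, v) in enumerate(zip(thicknesses, velocities)):
        bottom = float("inf") if i == last else top + h
        travelled = min(z, bottom) - top  # path length inside this layer
        if travelled <= 0:
            break
        travel_time += travelled / v
        top = bottom
    return z / travel_time


# Two hypothetical equivalent models that differ only in the interface depth
model_a = ([4.5, 25.5, 0.0], [150.0, 300.0, 600.0])
model_b = ([5.5, 24.5, 0.0], [150.0, 300.0, 600.0])
vs5_a = harmonic_average_vs(*model_a, z=5.0)  # about 158 m/s
vs5_b = harmonic_average_vs(*model_b, z=5.0)  # 150 m/s
```

At 5 m depth the interval velocity of these two models is either 300 or 150 m/s, whereas the harmonic average changes only by a few percent, which is consistent with the continuity argument above.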

Shear wave velocity proxies from the EDC

Several authors (Brown et al. 2000; Martin and Diehl 2004; Socco et al. 2017; Foti et al. 2018; Passeri 2019) suggest using the VR corresponding to a specific wavelength to estimate the VS,30 without the need for a formal solution of the Rayleigh inverse problem. This concept can be extended by plotting the VR vs. \(\lambda\) and the VS,z vs. \(z\) profiles of the statistically equivalent solutions. Indeed, a remarkable similarity can be observed between the two curves, as suggested in Socco et al. (2017). A verification of the most widely adopted \(\lambda -z\) relationship at the single depth of 30 m by means of the PSWD is discussed hereafter. In this specific case, most of the abovementioned authors suggest that the VR associated with the wavelength of 42 m be compared with \({V}_{S,30}\).
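
As an illustration, the VR at a wavelength of 42 m can be read from an experimental dispersion curve given as frequency and phase velocity pairs, using \(\lambda = V_R/f\). The sketch below assumes linear interpolation in the wavelength domain, which is a choice made here and not stated in the text. The function name and the sample curve are hypothetical.

```python
def rayleigh_velocity_at_wavelength(frequencies, velocities, target_wavelength=42.0):
    """Interpolate the EDC at a target wavelength (lambda = V_R / f).

    frequencies: Hz; velocities: Rayleigh phase velocities in m/s (same length).
    Linear interpolation in the wavelength domain is an assumption of this sketch.
    Returns None if the target wavelength is outside the measured range.
    """
    points = sorted((v / f, v) for f, v in zip(frequencies, velocities))
    for (lam0, v0), (lam1, v1) in zip(points, points[1:]):
        if lam0 <= target_wavelength <= lam1:
            if lam1 == lam0:
                return v0
            w = (target_wavelength - lam0) / (lam1 - lam0)
            return v0 + w * (v1 - v0)
    return None


# Hypothetical dispersion curve: wavelengths of 60, 25 and 10 m
v_r42 = rayleigh_velocity_at_wavelength([5.0, 10.0, 20.0], [300.0, 250.0, 200.0])
```

Returning None when 42 m lies outside the measured wavelength band avoids extrapolating the EDC beyond the experimental data.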

For the whole PSWD, Fig. 8a shows the mean of the \({V}_{S,30}\) of all the equivalent profiles, along with the error bars associated with their logarithmic standard deviations, versus the \({V}_{R,42}\). This figure provides a validation of the empirical relationships presented in the literature, since most of the sites fall very close to the 1:1 line.

Fig. 8 Validation of the empirical relationship to estimate the harmonic average shear wave velocity at a depth of 30 m (VS,30): (a) mean of the VS,30 along with the associated standard deviation of the statistical sample of equivalent solutions vs. the VR,42; (b) error index to estimate the goodness of the proposed empirical relationship as a function of VS,30

In addition, Fig. 8b quantifies the error in the estimation of the \({V}_{S,30}\) from the EDC alone, without performing any inversion, by reporting an error index in the form of a normalized residual calculated as:

$$\eta = \sqrt {\frac{{\left( {V_{R,42} - \overline{{V_{S,30} }} } \right)^{2} }}{{\left( {\sigma_{{V_{S,30} }} } \right)^{2} }}}$$

The error index estimated by Eq. 5 is reported in Fig. 8b as a function of the average stiffness of the deposit, evaluated by means of \(\stackrel{-}{{V}_{S,30}}\). It can be observed that for most of the sites the error index is low, with an average value across all sites of 3.6. A few profiles show a larger error, mainly related to the reduced uncertainty of the VS,30, which weights the proposed index.
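
The error index is simply the absolute misfit between \({V}_{R,42}\) and the mean \({V}_{S,30}\), expressed in units of the standard deviation of \({V}_{S,30}\). A minimal sketch follows; the function name and the numerical values are hypothetical.

```python
def error_index(v_r42, vs30_mean, vs30_sigma):
    """Normalised residual eta = |V_R,42 - mean(V_S,30)| / sigma(V_S,30).

    The square root of the squared ratio in the defining equation reduces
    to this absolute value. All inputs are in m/s.
    """
    if vs30_sigma <= 0:
        raise ValueError("vs30_sigma must be positive")
    return abs(v_r42 - vs30_mean) / vs30_sigma


# Same 10 m/s misfit, two hypothetical levels of uncertainty on V_S,30
eta_wide = error_index(310.0, 300.0, 10.0)   # 1.0
eta_narrow = error_index(310.0, 300.0, 2.0)  # 5.0
```

The two hypothetical cases show how the same misfit in m/s produces a larger index when the standard deviation of the VS,30 is small.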

Inter-method comparisons

Inter-method comparisons are usually reported in the literature in terms of interval velocity VS profiles. A better understanding of the inter-method differences related to the dynamic characteristics of the deposit can be achieved by comparing the VS,z profiles. Many authors have already suggested this comparison (Moss 2008; Bergamo et al. 2011; Comina et al. 2011; Kim et al. 2013; Garofalo et al. 2016b), generally finding good results for all soil categories. In this context, Fig. 9 presents a comparison by means of an inter-method comparison index (\(IMC\)), defined as the percent error between the tests (in decimal form):

$$IMC\left( z \right) = \frac{{\left( {V_{{S,z \left( {surface \,waves} \right)}} \left( z \right) - V_{{S,z \left( {invasive} \right)}} \left( z \right)} \right)}}{{V_{{S,z \left( {invasive} \right)}} \left( z \right)}}$$

where \({V}_{S,z \left(surface\,waves\right)}\left(z\right)\) is the harmonic average shear wave velocity profile obtained from the surface wave test and \({V}_{S,z \left(invasive\right)}(z)\) is the same quantity calculated for the invasive test.

Fig. 9 Inter-method comparison index (IMC) with depth for (a) DH tests, (b) CH tests, and (c) SDMT tests
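
The IMC defined above can be evaluated depth by depth as in the following sketch. It assumes that both VS,z profiles are sampled at the same depths and that the 0.2 threshold is applied to the absolute value of the index. Both are assumptions of this example, as are the function names and the sample values.

```python
def imc_profile(vsz_surface, vsz_invasive):
    """IMC(z) at common depths, as a decimal fraction.

    Negative values mean that the surface wave test gives the lower V_S,z.
    Both inputs are sequences sampled at the same depths (assumption).
    """
    if len(vsz_surface) != len(vsz_invasive):
        raise ValueError("profiles must be sampled at the same depths")
    return [(s - i) / i for s, i in zip(vsz_surface, vsz_invasive)]


def within_tolerance(imc_values, tolerance=0.2):
    """True if |IMC| stays below `tolerance` over the whole profile."""
    return all(abs(x) < tolerance for x in imc_values)


# Hypothetical V_S,z values (m/s) at three common depths
imc = imc_profile([180.0, 270.0, 330.0], [200.0, 300.0, 320.0])
ok = within_tolerance(imc)
```

In this hypothetical case the index is negative in the shallow part and turns slightly positive at depth, while remaining within the tolerance.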

Generally, surface wave tests often provide a lower velocity than invasive tests in the shallow portion of the profile, especially with respect to DH and CH tests (Fig. 9a-b). This difference may be related to the strain hardening due to the grouting operations required to prepare the hole. These near-surface effects are recognized for invasive methods, which tend to show measurement errors in the uppermost few meters (e.g. Moss 2008). The different volumes investigated by the two methodologies could also play a role in this difference. Figure 9 also shows that 18 of the 47 profiles (note that three sites have multiple invasive tests) present a globally lower VS,z from the surface wave tests over the whole profile depth (i.e. consistently negative \(IMCs\)). Nevertheless, most of the profiles (34/47) remain within an IMC lower than 0.2 over the entire profile.

Figure 10 presents an inter-method comparison focusing on the top 30 m of the deposit, which is considered a proxy for amplification in most seismic codes (Borcherdt 1994, 2012). Modern building codes also consider specific categories for sites where the bedrock is at a depth of less than 30 m (e.g. NTC 2018). In this case, the analyst should calculate a specific VS,H, where H is the bedrock depth. For these reasons, Fig. 10 shows the comparison of the harmonic average VS,z for the depths of 5 m (Fig. 10a), 10 m (Fig. 10b), 15 m (Fig. 10c), 20 m (Fig. 10d), 25 m (Fig. 10e) and 30 m (Fig. 10f). Here the values from the statistical sample of surface wave tests are displayed as mean and logarithmic standard deviation, converted to natural values so that they can be displayed consistently as points and error bars in the graph. A substantially higher velocity is estimated by the invasive tests for the shallow part of the deposit, with a general agreement for the deepest portions. Figure 10g completes the charts by comparing the VR,42 with the VS,30 obtained from the invasive tests. Compared with Fig. 8a, the results are more scattered, and the invasive tests generally show higher harmonic average velocities, as expected from Fig. 10f. The values of VS,30, from both invasive and surface wave tests, and of VR,42 are also reported in the electronic supplement of the paper for direct numerical comparison. Note that these parameters are not available for all sites, given the absence of invasive tests at some sites or the reduced penetration depth of the surface wave tests at a few sites.

Fig. 10 Inter-method comparison of the VS models in terms of harmonic average shear wave velocity profiles: (a) VS,5, (b) VS,10, (c) VS,15, (d) VS,20, (e) VS,25, (f) VS,30. Panel (g) reports a comparison of the Rayleigh wave velocity for a wavelength of 42 m against the VS,30 from invasive methods

Discussions and conclusions

The Polito Surface Wave flat-file Database (PSWD) includes several sites that have been investigated in Italy during the past 25 years by the Politecnico di Torino (sometimes in collaboration with the Università di Torino). The main objectives of the flat-file database are: (i) the assessment of the statistical properties of the test results; (ii) the development of empirical correlations; (iii) inter-method comparisons and (iv) the calibration of empirical predictive models. In addition, the PSWD can be useful as a benchmark for the validation and verification of new inversion algorithms.

Some examples of the use of the flat-file database are reported in the paper. The first is a formulation for the estimation of the uncertainties associated with the experimental dispersion curve (EDC). This formulation can be used when a statistically significant population of measurements is not available for a specific site. It offers the opportunity to propagate uncertainties into VS profiles when a direct estimation of experimental uncertainties in the field is not possible. Caution must be exercised when applying the proposed formulation outside the frequency range for which it is proposed (i.e. from 2.5 to 75 Hz).

The flat-file database is also useful for the analysis of the statistical properties of the solutions. Specifically, in the paper it is shown that the harmonic average shear wave velocity profiles can be assumed as lognormally distributed. On the other hand, the results showed that the interval velocity profiles can be used only as an engineering schematization of the subsurface but are not adequate to model the uncertainties.

The analysis of the flat-file database confirmed the physical correlation between the wavelength of the Rayleigh velocity and the depth of the associated harmonic average shear wave velocity model. This physical correlation was already shown by other authors: Aung and Leong (2015) evaluated the contribution of different layers to the surface wave propagation velocity at certain wavelengths with specific weighting factors; Haney and Tsai (2015) proposed a Dix-type relationship to obtain a depth profile directly from the experimental dispersion curve. The analysis of the PSWD shows that at 30 m depth, a direct correlation (i.e. \(\lambda =42 \,{\rm m}-z=30 \,{\rm m}\)) can be used to estimate the soil class of the deposit without the need for a solution of the inverse problem. More generally the experimental dispersion curve can be directly transformed, at any depth, through the use of a simple data transform, based on an appropriate wavelength/depth relationship, into a VS,z profile. This last transformation has been demonstrated to match both the VS,z profile obtainable from a specific inversion of the EDC and the one from independent invasive tests (Socco and Comina 2015; Socco et al. 2017).

Some inter-method comparisons between invasive and non-invasive methods are also presented. The results show a general tendency for surface wave tests to yield lower velocities than invasive tests, particularly in the shallow part of the deposit. This may be related to soil modifications induced by borehole execution or to the different volumes investigated by invasive surveys and surface wave tests. In both cases, the surface wave tests, which provide average information over a wide investigation volume and are inherently non-invasive, could prove more effective for the overall seismic characterization of a site.

In future research, further efforts may be devoted to the assessment of the experimental uncertainties associated with higher modes of the dispersion curve. In this respect, twenty-one additional sites, where higher modes were influential in the dispersion curves, are already available. These sites were not included in the current version of the PSWD, in order to focus on the fundamental-mode sites. Also, the assumption of the lognormal distribution for the harmonic average shear wave velocity profile should be adopted by limiting the statistical sample to ± two standard deviations, as usually proposed for other physical random variables. Finally, the VS,z–VR,λ relationship could be expanded beyond the correlation for the specific depth of 30 m (i.e. VS,30).