An assessment of uncertainties in VS profiles obtained from microtremor observations in the phased 2018 COSMOS blind trials

Site response is a critical consideration when assessing earthquake hazards. Site characterization is key to understanding site effects as influenced by seismic site conditions of the local geology. Thus, a number of geophysical site characterization methods were developed to meet the demand for accurate and cost-effective results. As a consequence, a number of studies have been administered periodically as blind trials to evaluate the state-of-practice on-site characterization. We present results from the Consortium of Organizations for Strong Motion Observation Systems (COSMOS) blind trials, which used data recorded from surface-based microtremor array methods (MAM) at four sites where geomorphic conditions vary from deep alluvial basins to an alpine valley. Thirty-four invited analysts participated. Data were incrementally released to 17 available analysts who participated in all four phases: (1) two-station arrays, (2) sparse triangular arrays, (3) complex nested triangular or circular arrays, and (4) all available geological control site information including drill hole data. Another set of 17 analysts provided results from two sites and two phases only. Although data from one site consisted of recordings from three-component sensors, the other three sites consisted of data recorded only by vertical-component sensors. The sites cover a range of noise source distributions, ranging from one site with a highly directional microtremor wave field to others with omni-directional (azimuthally distributed) wave fields. We review results from different processing techniques (e.g., beam-forming, spatial autocorrelation, cross-correlation, or seismic interferometry) applied by the analysts and compare the effectiveness between the differing wave field distributions. We define the M index as a quality index based on estimates of the time-averaged shear-wave velocity of the upper 10 (VS10), 30 (VS30), 100 (VS100), and 300 (VS300) meters and show its usefulness in quantitative comparisons of VS profiles from multiple analysts. Our findings are expected to aid in building an evidence-based consensus on preferred cost-effective arrays and processing methodology for future studies of seismic site effects.


Introduction
Microtremor observations are in frequent use for earthquake hazard studies, for building quasi-3D models of surficial geology, and for depth of cover studies in hydrology or mineral exploration surveys. Most modern earthquake building codes are based on a seismic site response parameters known as the time-averaged shear-wave velocity V S to the depth of 30 m (V S30 ) (BSSC 2003;EC8 2004). Although indirect proxy-based V S30 methods were applied (Yong, 2016), site seismic recordings are preferably the basis for modeling the one-dimensional (1D) V S profile and to derive V S30 .
Since it was introduced by Borcherdt (1994), V S30 has been the most common ground motion model site index to account for seismic site conditions. Slowness (S S ), the inverse of velocity, is a lesser known measure that has been known to be useful when highlighting detailed layers of the near surface (Brown et al. 2003;Boore and Asten 2008;Mital et al. 2021). Traditionally, invasive borehole (downhole, DH) methods were the standard practice for characterizing seismic site conditions and proved to be quite costly; some practitioners have ill-advisably considered DH methods to be the most accurate approach, and Socco and Strobbia (2004) and Thompson et al. (2009) (amongst others) have cautioned against the perceived superiority of DH methods (e.g., data contamination from reflected or downward wave propagation). In recent years, an impressive number of noninvasive geophysical site characterization methods were developed to meet the demand to accurately and costeffectively provide these critical seismic parameters. Consequently, a number of studies have administered blind trials to evaluate different array and processing methodologies for extraction of V S profile information, or the equivalent inverse parameter shear-wave slowness S S . Examples of past notable blind trials that placed substantial emphases on interpretation and modeling of passive seismic (microtremor or ambient noise) site characterization data include the 2004 Coyote Creek (CA, USA) blind test (Boore and Asten 2008), the blind tests for the 2006 3rd International Symposium on the Effects of Surface Geology (Grenoble, France) , and more recently, the 2015 Inter-comparison of methods for site PArameter and veloCIty proFIle charaCterization (InterPACIFIC) Workshop (Turin, Italy) (Garofalo et al. 2016). These trials commonly used a very large number of arrays and stations to provide information to produce the best-possible results for estimations of V S versus depth profiles from microtremor and other active-source methods. The participants of the aforementioned blind trial efforts were almost always exclusively comprised of globally recognized experts or practitioners with notable specialties in the limited methods tested. They, however, did not considered the issue of which basic arrays with a limited number of sensors might be practical and under what circumstances.
In this study, we address the important issue of how useful basic or sparse arrays can be in practical surveys for evaluation of site parameters. The study was conducted through the Consortium of Organizations for Strong Motion Observation Systems (COSMOS) using a set of blind trials which were conducted from June-November 2018. This COSMOS trial strategically employed a number of key ingredients (Asten et al. 2018a(Asten et al. , b, 2019a. One such component was that the study used microtremor array data recorded from four sites in order to consider limitations imposed by differing geologies, sparse array geometries, and interpretation methodologies. Another important element was that these sites differ in the azimuthal coverage of microtremor ambient noise at each location. The COSMOS trial also deliberately used a four-phase approach in order to evaluate changes in blind interpretations as each phase introduced additional array data. Furthermore, this trial also incorporated results from a range of different geophysical methods, including processing techniques (algorithms) coded in software packages (or customized in-house computer programs) which were independently selected for analyses by the 1 3 Vol:. (1234567890) invited participants. The range of their experiences with microtremor array methods was broad as they were comprised of graduate-level students, commercial practitioners, and topical experts-some were also developers of the open-source and commercial software packages used in this trial. To encourage and sustain the voluntary contributions in as many of the phases as possible and to avoid the undesirable effects of competition, identities of the participants were anonymized (except to the COSMOS administrators). For each of the four sites, a data supplier and established user of the methods supplied not only the recorded microtremor data for each site, but a reference V S model constructed from all available geophysical, borehole, and geological data. The reference models were only disclosed to participants in phase 4 of the blind trial.
In this paper, we summarize the 2018 trial and results from blind interpretations by 17 analysts who analyzed all sites and phases (group 1 in Table 1). We also present limited results from a separate subset of 17 analysts (group 2) who participated in phase 3 only, by providing analyses for only two of the four sites (site 2 and site 4).

Goals of the trials
The primary goal of the COSMOS trials was to evaluate the efficiency of microtremor methods to estimate V S profiles using surface-based arrays by progressively increasing site coverage through the densification of recording locations, from two-station arrays (N = 2) to sparse arrays (typically N = 3 or 4) and full arrays (N ≥ 6). An associated goal was to evaluate different techniques through multiple independent interpretations.
In order to achieve these goals, the trials were divided into four phases, with each of the four sites evaluated/re-evaluated in each of the phases: • Phase 1-two-station arrays (single pair of seismometers extracted from a larger array) • Phase 2-sparse arrays (four-station triangle) • Phase 3-full array data (nested triangles or circles of different diameters) • Phase 4-re-evaluation using geological and borehole data, as well as a reference model as provided by the data supplier for each site  (Aki 1957) and the frequency-wave number (FK) (Capon, 1969); a few analysts applied approach based on noise cross-correlation (CC) (Campillo and Paul 2003;Shapiro and Campillo 2004;Shapiro et al. 2005) or seismic noise interferometry techniques (SI) (Wapenaar 2004;Wapenaar et al. 2010a;2010b) (Table 1). It should be noted that CC and SI are frequently used interchangeably because the approach is fundamentally defined as the cross-correlation of microtremors recorded at two stations (Bensen et al. 2007;Campillo and Roux 2015). For accuracy, the SPAC approach relies on spatial azimuthal averaging of coherent plane waves propagating at a single scalar velocity at each frequency. The extended SPAC (ESAC) method was also applied by analysts of this study and as the name implies, ESAC is a variant of the SPAC method that fits estimates of phase velocity to multiple regular values of station spacings for a selected set of frequencies. FK array processing approaches (e.g., beam-forming and maximum likelihood methods) have four essential differences from SPAC; FK is capable of the following: • Performing at its optimum when unidirectional wave propagation is known to exist • Remaining effective when the array stations are irregularly spaced • Resolving (subject to limitations of the array response function) multiple velocities (or higher modes) of wave propagation • Providing robust estimates of wave velocity when incoherent noise is present in additional to the propagating wave signal Advantages and disadvantages of either the SPAC and FK approaches were recently outlined in Foti et al. (2018) as well as in , and the two were demonstrated to have possible trade-offs in reliability between the them. For example, the SPAC approach fundamentally averages the propagation coherent waves; thus, it is not necessary to know a priori about the unidirectional nature of the propagating wave field as would when using the FK approach. To do so, the SPAC method requires regular (or multiple regular) station spacing, which limits its applicability at locations where space is limited due to either access related issues or complexity in terrain (or both combined). By comparison, FK remains effective when the array is irregularly spaced but its reliability is reduced when the number of optimally located recordings are limited.
The frequency-or time-domain CC (or SI) method is often considered a development from the original frequency-domain SPAC method of Aki (1957). Ekstrom et al. (2009) further developed CC in the frequency domain, which was applied recently by Pastén et al. (2016). Tsai and Moschetti (2010) notably investigated the exact equivalence between the CC approach and the SPAC approach. They found both approaches allow results from each to be used in the other. For example, noise tomography as derived from CC can be improved by inclusion of results from the fundamental process of azimuthal averaging used in the SPAC approach and SPAC can be improved from the time windowing of tomography as derived in the CC approach. Tsai and Moschetti (2010) suggested that the merger of the two methods are complementary thus will improve results.  corroborated the findings of Tsai and Moschetti (2010); Foti et al. (2018) provided similar observations; thus, we also limit our descriptions about the CC method herein and refer the reader to these publications for additional information about the CC (or SI) approach. for phases 1, 2, and 3 of the blind trials. In addition to the array data previously described, each site has a substantial history of multiple independent seismic noninvasive (both passive and active surface wave methods) and invasive investigations (DH methods: seismic cone penetrometer test (SCPT) or P-S suspension logging). Figure 2 shows layered earth V S reference models constructed by data suppliers familiar with the full range of available data at the respective sites. These reference models were not disclosed to analysts until phase 4 of the project and were used to judge the quality of interpretations produced by analysts in these blind trials. Figure 3 shows the corresponding Rayleigh wave phase velocity dispersion curves for the four sites, computed using forward modeling methods and codes from Herrmann (2013). Dispersion curves plotted include the fundamental mode (R 0 ), two higher modes (R 1 and R 2 ), and the effective mode (R e ), where the R e assumes a power partition between fundamental and higher modes based on theoretical response of a layered earth to a vertical impact (Ikeda et al. 2012). Vertical shifts in the modeled dispersion curves in Fig. 3 are a consequence of changes in power partition between modes as noted by Ikeda et al. (2012). Tabular versions of the reference models are included in the phase 4 section of accompanying data release (Asten et al. 2021). Detailed information of the four sites are as follows: 2.3.1 Site 1: Guadalupe River, San Jose, CA, USA Site 1 overlaps with the location of drill hole 006S001W26Q001 (longitude − 121.93505°, latitude 37.37770°) (Wentworth et al. 2015;Mankine and Wentworth 2016). Wentworth et al. (2015) and Mankine and Wentworth (2016) interpreted data from the drill hole and showed that a 407-m-thick strata of Quaternary sediments overlies a bedrock of late-Mesozoic Franciscan Group consisting of metamorphic sandstones. The upper 300 m of sediments shows eight cycles of coarse through fine sediments; a coarse unit of gravels at base of cycle 3 (depth range 103-116 m) serves as a marker separating slower V S above and faster V S below, as seen on the shear-wave borehole log of Fig. 2a. The base of the drill hole is at approximately 407 m, which corresponds to data from both geology and the shear-wave log.

Sites investigated
The site was surveyed using broadband three-component (3C) intermediate sensors (flat response from 30 s to 50 Hz) deployed in three triangular arrays as shown in Fig. 1a. Inter-station spacings ranged from 17.3 to 300 m. Recording time ranged from 75 min for the small triangle to 30 min for the large triangle, with a sampling interval of 5 ms. Interpretations of microtremor and active surface wave data at this site and similar sites in Central California were published by Boore and Asten (2008).

Site 2: Trois Rivieres, Quebec, Canada
Site 2 has 30 m of soft clays over 30 m of firm overconsolidated clays. A nested triangular array consisting of ten receivers was used in the measurement (Fig. 1b). The maximum inter-station spacing was 50 m. The center of the array was located at longitude − 72.979202° and latitude 46.242401°. A cone penetrating test (CPT) and a seismic cone penetrating test (SCPT) to 29 m depth were performed inside of the array (Hayashi and Ito 2008). The site was surveyed using a 24-channel engineering seismograph and 2 Hz vertical-component geophones. Recording time was approximately 30 min.

Site 3: La Salle, Italy
Site 3 is located in a glacial valley in the foothills of the Alps, containing more than 200 m of firm sands and gravels (Socco et al., 2008). Location of the Fig. 1 Layout of three-component seismometers (3C) or single-component (1C) geophones for four sites used in the blind trials. a Site 1, Guadalupe, CA, USA; the drill hole with PS log is located adjacent to point B2 on eastside of the array. b Site 2, Trois Rivieres, Quebec, Canada; the drill hole with SCPT log is located about 20 m northwest of the array center. c Site 3, La Salle, Italy; two drill holes located 300 m north and 500 m northwest of the array center are shown in Fig. S4. d Site 4, Dolphin Park, Carson, CA, USA. Drill hole Carson-1 used to acquire PS data is located 150 m east of the array center. All drill hole data remained blind to analysts in this project until the phase 4 review. Yellow symbols in plots a-c show stations used for two-station SPAC in phase 1. Blue lines show arrays ("sparse triangles" of 3 or 4 stations) used in phase 2. For site 4, phase 1 data consisted of three pairs of stations with separations at 15, 30, and 60 m, where pairs were independent and specified only by the separation distance. For site 4, phase 2 data from three independent four-station triangles were supplied without additional data (e.g., location) other than the triangle side lengths of 15, 30, and 60 m. The gray arrow in c shows approximate direction of dominant wave propagation at only this site ◂ Fig. 2 Reference V S models for the four sites in this study. Left column: V S versus depth. Right column: slowness (S S ) versus depth. Black is reference model determined by data supplier for each site using all available data. See Asten et al. (2021) for tabular versions of reference models. a Site 1 (Guadalupe River, San Jose, CA, USA) has a V S borehole (downhole, DH) log (red curve) determined by Wentworth et al. (2015) and discretized into 5 m intervals by this study. b Site 2 (Trois Rivieres, Quebec, Canada) has an SCPT log (red curve) (Hayashi and Ito 2008). c Site 3 (La Salle, Italy) has logs determined from borehole (downhole, DH) seismic surveys in holes DH1 (red curve) and DH2 (yellow curve), located about 300 m north and 500 m northwest of the array (site E in Socco et al. 2008). d Site 4 (Dolphin Park, Carson, CA, USA) includes a borehole V S log (red curve) (Reichard et al. 2003) Site 3 Site 4 Site 4 array center is latitude 45.743119° and longitude 7.077813°. It lies on a wide triangular alluvial fan lying on the west side of the Dora Baltea River. The fan is mainly composed of alluvial deposits (sand, gravel, and stones), slivers, pebbles, and blocks from multiple origins. The general geological evolution can be summarized as a succession of alluvial fan deposits of medium to coarse grain size, overlaid on deposits of a glacial environment. Two drill holes (DH1 and DH2) located respectively 600 m northwest and 400 m north-northwest of the array center provide geological data. The stratigraphic logs, conducted to a depth of about 50 m, show the typical chaotic sequences of gravely soils of alpine alluvial fans, with no marked layering. Geophones were connected by cabling in two circular evenly-spaced arrays, each consisting of 24 geophones, at radii of 20 m and 37.5 m. The maximum inter-station spacing provided for this blind test is 64 m. Data were collected using a multichannel acquisition system and 48 vertical geophones with a natural frequency of 4.5 Hz. A sampling interval of 16 ms was adopted while recording files with lengths of 32,625 samples. Multiple files provided a total recording time of approximately 1 h.
Phase velocity dispersion curves computed for the reference models provided for sites 1-4. Red, yellow, green, and blue lines show (respectively) Rayleigh wave fundamental mode (R 0 ), Rayleigh wave higher modes (R 1 and R 2 ) and effective mode (R e ). Curves for R e overwrite R 0 for much of the spectra 1 3 Vol:. (1234567890)

Site 4: Dolphin Park, Carson, California
Site 4 (near Los Angeles) is situated on geology with a thickness of more than 400 m Holocene sediments. Drill hole Carson-1 (Well 4S/13 W-9H9; longitude − 118.2409°, latitude 33.8369°) and is also known as ROSRINE Site 26 (Reichard et al. 2003). The borehole was previously surveyed with a borehole P-S suspension logger to a depth of 340 m (Fig. 2d). Depth to basement is not known. Microtremor data were acquired using a nested triangular array, with maximum station spacing of 60 m. Sensors were 4.5 Hz vertical geophones that operated at a 4-ms sample rate with approximately 11 min of recording time. Note that although the drill hole and P-S log provided data to 340 m depth, the maximum depth of investigation provided by this microtremor survey is estimated from the lowest usable frequency (1.2 Hz in Fig. 12) and the corresponding Rayleigh wave phase velocity (450 m/s in Fig. 2) is approximated at the 190 m wavelength (using the half-wavelength approximation of Asten and Hayashi 2018). As a consequence of this limitation, analysis of site 4 data in these blind trials is constrained to ~ 100 m maximum resolvable depth (via the V S100 parameter) rather than the maximum geologically known depth of 340 m.

Defining accuracy relative to a reference model
We use the quality factor or "M index" (M) as estimates of the time-averaged shear-wave velocity of the upper 10, 30, 100, and 300 m. M is an empirical number (range 0 to 9, or 0 to 12), indicating the combined quality of the blind results for M VS10 , M VS30 , and M VS100 (and M VS300 where applicable) in terms of the fit against the site reference models, such as those representing V S10 , V S30 , and V S100 (and V S300 if available). To calculate the fit, we use the following equation: where N may be 30, 100, 100, or 300 m We define the M index to be: A perfect score for an interpreted V S profile matching the reference model is given by where terms in square brackets apply only for V S profiles extending to a depth of 300 m or greater.
Theoretically, an interpretation at sites 2 to 4 yielding scores of 3 each for M VS10 , M VS30 , and M VS100 will result in an overall perfect score of M = 9. For site 1 only, the greater depth of penetration allows estimation of V S300 and M VS300 and hence a perfect score is M = 12.

Choice of processing and techniques by analysts
Each of the 34 analysts who contributed to this study used their preferred MAM processing and analysis technique, as well as software (Table 1). The righthand column of Table 1 indicates whether analysis was made assuming only the fundamental Rayleigh mode (R 0 ) or the effective mode (R e ). The majority heavily favored use of SPAC (equivalently, spatially averaged coherency methods) and their various extensions (ESAC-1 and ESAC-2 in Table 1). The terms ESAC-1 and ESAC-2 denote similar methods but implemented in independent software packages. See section Data and Resources for software availability. Only two analysts reported use of frequency wavenumber (FK) methods, an observation not surprising because FK methods are most applicable for arrays with larger numbers of sensors than were provided in phases 1 and 2 of this study. Of the two analysts who used the CC method, one used the time-domain approach while the other did not specify time-or frequency-domain implementation. Although Tsai and Moschetti (2010) suggested that the merger of the SPAC and CC methods will improve results, no analysts reported the application of their recommendation.
where the direction was approximately normal to the pair of array geophones. This wave propagation information was not provided to participants. The issue of using SPAC methods when the azimuthal spread of wave propagation is narrow was considered by Cho et al. (2008). The basic SPAC method uses only the real part of inter-station complex coherencies but in practice the imaginary part contains information which can assist in recognition of situations where the assumptions behind the SPAC method are not fulfilled. Asten (2006) noted the value of the imaginary part as a quality control tool. Cho (2020) developed a quantitative method for utilizing the imaginary part. However, a priori recognition of narrow-azimuth wave propagation remains difficult, and correction when observations are limited to a single pair of geophones is near-impossible.
Phase 1 site 3 data demonstrated the difficulty of this challenge. Only analyst 1 was close in the attempt to estimate a velocity profile similar to the reference model for phase 1 at this site. However, the analyst did not provide any indication in the form of a discussion item, during the phase 4 re-evaluation of results, as to how this was achieved. Seven of 17 analysts in group 1 submitted results for this phase/site, and Fig. 4 shows three representative interpretations using three different software packages. It is clear that all three were severely in error due to the directional nature of the source. However, the same three representative analysts obtained acceptable (M = 8 or 9) estimates of the V S profile when using data from two nested triangular arrays as supplied for phase 2 at site 3 (see Fig. 5). Figures 6, 7, and 8 show selected V S and S S profiles for phase 1 sites 1, 2, and 4. The results are useful, despite the limitations imposed by the use of simple two-station pairs of sensors. Improvements achieved with 2D arrays are discussed in the next section.

Phase 2 and 3 results for arrays on four blind test sites
Phases 2 and 3 of these trials evaluate microtremor methods as a tool for interpreting V S profiles using sparse and dense array data, respectively. Figures 9, 10, and 11 show three of the analysts' results with the highest M indices (range 8 to 12) for phase 2, where data from a sparse triangular array is interpreted for each of sites 1, 2, and 4. Figure 5 (discussed previously) has the corresponding results for site 3 (M

Soft alluvials
Hard gravels Phase1 Site3 Phase1 Site3 The site 2 reference model (Fig. 2b) shows a mild low-velocity layer (LVL) between the 25 and 35 m depth. It is significant that no analyst succeeded in resolving this LVL. This result is in accordance with experience elsewhere where inversion of surface wave data is found to be limited in its ability to resolve shear-wave LVLs. For example, Garofalo et al. (2016;Fig. 13) show similar results for a Grenoble site where a 10-m layer of soft clays underlie a surficial 25 m of harder glacial sediments. Asten and Hayashi (2018) and Hayashi et al. (2021) also provide detailed discussions on the difficulty of resolving LVLs. Farrugia et al. (2016) give examples where the thickness of the LVL (clays) is of similar or greater thickness than the higher-velocity overburden (hard limestones in their case); for such cases, surface wave data do prove to be effective in resolving the LVL. Site 4 (Fig. 11) shows large discrepancies in V S estimates below the 100-m depth. As noted in the site description, the sensors used at this site were 4.5 Hz geophones and the most optimistic estimate for depth penetration in this blind study is on the order of 190 m deep. In order to make useful comparisons with the reference model, we restrict depth estimates to 100 m, while noting that the adjacent drill hole and P-S log provided independent V S data to the 340 m depth.
A clear outcome from the phase 1 and phase 2 results for site 3 was the demonstration of a major risk with the two-station method when energy is directional; it is difficult (or impossible) to recognize the problem of narrow-azimuth microtremor wave propagation oblique to the seismic line when data are limited to a single pair of seismometers. In such cases, interpretations are likely to be very misleading.
Interpreted V S -and S S -depth profiles for phase 3 interpretations at sites 1 to 4 are included as supplementary items (Figs. S5 to S12) in the electronic Supplementary Information of this article. Accuracy of interpretations relative to the reference models (combining experience from all analysts) is discussed in a later section.

Frequency bandwidth achieved with different processing techniques
Bandwidths achieved in Rayleigh wave dispersion analyses are an important factor in determining V S data over a wide range of depths. Figure 12 shows a summary of bandwidths achieved for phases 1, 2, and 3 at sites 1-4. As expected, the bandwidth generally increased as the number of seismometers/geophones in the array increased but equally important is the fact that bandwidth was affected by the choice of processing methodology. Orange bands in Fig. 12 highlight the analyst numbers which achieved optimum high and low frequencies. In principle, the low-frequency cut-off will be influenced by the geophone's natural frequency but for sites 2-4 where geophones were used, and the useful low-frequency limit achieved was one or two octaves below the geophone resonant frequency. A wide range of public-domain, commercial, and in-house software packages were used by the 17 analysts of group 1, but if we apply a test of consistency over at least three of the four sites, two unrelated techniques proved most effective in terms of bandwidth achieved: the ESAC and the multimode SPAC (MMSPAC) techniques for direct fitting of coherency spectra (orange bands on Fig. 12). CC (SI) methods also proved very effective for obtaining high-frequency data, but less so for low-frequency data. This limitation at low frequencies is a consequence of the fact that the time-domain approach imposes a restriction whereby velocities of propagating waves cannot be determined for wavelengths greater than half (or in practice a third) of the maximum inter-station distance (Bensen et al. 2007;Luo et al. 2015Luo et al. , 2016. By contrast, frequency-domain methods such as SPAC and its variants are effective where wavelengths are as long as some multiple of the array aperture .

Accuracy of interpretation by phase and site
The M index provides a useful basis for comparison of a range of differing interpretations by a group of analysts. Figure 13 shows the quality of analyst results, Phase1 Site4 Phase1 Site4 averaged over all analysts, for each of the three phases and each of the four sites. A perfect score is M = 9 for sites 2-4 (depths of interest to approximately 100 m), or M = 12 for site 1 (depths of interest to 400 m). For sites 2-4, it is apparent that there is little improvement in the M indices when using arrays of greater complexity than the phase 2 sparse triangular arrays.
Results for site 1 show improvement using the phase 3 arrays, illustrated by comparing Fig. 9 (phase 2 site 1) with Fig. S5 (phase 3 site 1) where it is obvious that the uncertainty ranges of slowness S S estimates decrease when using the phase 3 data. This improvement is probably because resolution of the range of depths of interest from 5 to 400 m requires the full Phase2 Site2 Phase2 Site2 While the M index is useful, it does not detect discrepancies between the reference and analysts' models at depths below 100 m (sites 2-4) or below 300 m (site 1). Figure 11 shows where two curves describe departures from the reference curve below 100 m depth but still retain high M indices. Such discrepancies are especially evident where cross-correlation methods are used, which prove effective at high frequencies but are limited at low frequencies (long wavelengths) as discussed in the previous section. We note such limitations in V S and S S versus depth curves for results by analysts #8 and #7 shown in Figs. S6 and S12, respectively, in the electronic Supplementary Information of this article.
We have used the M index because it reflects soil properties over thicknesses from 10 to 100 m (or 300 m). However, as an example of comparison with the simpler compilation of V S30 and classification into  site class (BSSC 2003), the Supplementary section contains Tables S1 to S4 which are compilations of phase 3 V S30 and site class for the four sites 1 to 4. These tables show results for all participating group 1 and group 2 analysts. For sites 2 and 3, the proportion of analysts estimating a V S30 within 10% of the reference model is about two thirds; for sites 1 and 4, the corresponding proportion is about one third of estimates within 10% of the reference model. There is no obvious reason in terms of site characteristics or analyst grouping for this systematic difference between site results.
6.2 Accuracy of interpretation by phase and analyst Figure 14 shows the quality of analysts' results averaged over all sites, for each of the most successful ten analysts; it also indicates the range of software packages used and does not indicate any packages performing superior than another. For site 1 only, the quality of fit also includes estimates of V S300 ; thus, analysts #3 (dark blue line) and #4 (orange line) were able to achieve indices > 9. For the top six analysts, it is again apparent that there is little improvement in accuracy relative to the reference models from phase 2 (sparse triangular arrays on all sites) to phase 3 (denser arrays).

Accuracy of interpretation of depth to layer interfaces
The quality of results discussed previously (shown in Figs. 13 and 14) is determined by the M index which describes accuracy of estimates of V S averaged as V S10 , V S30 , V S100 , and V S300 . These parameters are of particular importance in earthquake hazard studies (BSSC 2003;EC8 2004). Also, a proposed new earthquake hazard code for Europe requires estimates of depth to basement as well as averaged V S (Pitilakis et al. 2019). Site 1 has a borehole P-S suspension log which provides clear evidence of two important interfaces (V S contrasts) in the depth range from 100 to 120 m and at the basement depth of 407 m. This site thus provides opportunity for an assessment of the utility of MAM for depth estimation of interfaces having moderate V S contrasts.
We consider an acceptance criterion on the of order 40% misfit for the upper interface and 15% for the deep interface (pre-Quaternary basement), shown with the V S reference model ( Fig. 15 and Table 2). The V S borehole log for site 1 (Fig. 2a) and the geological log in Wentworth et al. (2015) show that the boundary is not a clear change in rock type but is probably spread over depths between 90 to 160 m. Detailed sensitivity studies by the supplier of the microtremor data using the reference model suggests the effective V S boundary to be 120 ± 40 m. The deep boundary of the geological contact in the drill hole is 407 m, but sensitivity analyses of the passive surface wave data suggest the effective boundary to be 390 ± 50 m. Note that because this deeper boundary lies below 300 m, success in achieving a high M index, as shown in Figs. 6 and 9 and in Supplementary Information Figs. S5-S6, is not necessarily reflected in the successful estimation of the boundary depth.
We use these boundaries and uncertainties to tabulate the rate of success by analysts in picking one or both V S contrasts. Table 2 shows the results compiled from interpretations of phase 1, 2, and 3 data submitted by up to 16 analysts. The three analysts in phase 3 that show major layer boundaries within the acceptable ranges shown on Fig. 15, for both interfaces, used three different software packages. These were analyst #4 using ESAC-2 for phase 2 and analysts #5, #10, and #3 (in-house SPAC; SPAC Genetic Algorithm, GA; and ESAC-1) for phase 3. Thus, there is no obvious indication of a preferred software package. It is Fig. 13 Quality of blind results by site, expressed as summed M index values for phases 1, 2, and 3, averaged over all group 1 analysts providing an interpretation. Site 1 is Guadalupe, CA, USA. Site 2 is Trois Rivieres, Quebec, Canada. Site 3 is La Salle, Italy. Site 4 is Dolphin Park, CA, USA notable that the success rate in correctly interpreting these two known boundaries from passive microtremor data at site 1 is on the of order 20% with phase 3 data and a poor 10% for the phase 1 and 2 sparse array data. It is reasonable to draw the conclusion that where depths to V S boundaries are important, as distinct from average velocities, then the use of dense arrays such as multiple triangles is a necessary part of the survey design.
The role of microtremor horizontal-to-vertical spectral ratios (mHVSR) was not discussed by most analysts in these trials, presumably because only one site of the four provided 3C data (Molnar et al. 2018). Two analysts noted using mHVSR on site 1 to determine the fundamental site frequency which was then used to guide the inversion process. Others may have done so without comment. Figure 16 shows the modeled ellipticity of Rayleigh modes for the reference model of site 1 as used in Figs. 2 and 3. This modeled ellipticity is similar to the mHVSR for observed data except that mHVSR curves are in general shifted   Table 2 Success rate for analysts providing depth estimates of upper and lower major interfaces for site 1. The number of analysts providing depth estimates was 10, 16, and 14 for phases 1, 2, and 3, respectively. The notation 3/14 indicates that there were 14 analysts who submitted a result for site 1 phase 3 and who provided depth estimates; of this group, there were three analysts who met the acceptable depth error criteria in Fig to higher values due to the inclusion of Love wave energy in observed microtremor signals. It is clear that the metamorphic basement at model depth 390 m (or based on drilled depth of 407 m) produces a strong mHVSR peak at 0.35 Hz in observed phase 1-3 data, which to some analysts represented a strong indicator of a layer boundary implying a high contrast in V S . (Analysts were not given the geological data until phase 4.) The frequency of the peak is clearly sensitive to basement depth on any modeling study of Rayleigh wave ellipticity. Multiple case histories have likewise used strong observed mHVSR peaks to guide the choice of a basement or other strong V S contrast (Asten et al. 2014;Macau et al. 2015). Inversion of mHVSR data combined with SPAC data was first described by Arai and Tokimatsu (2005) (Wathelet et al. 2020). Full 3C array processing of microtremor data using SPAC methods has been described by Kohler et al. (2007) and Puglia et al. (2011) andFK methods used by Fäh et al. (2008) and Poggi et al. (2017). In this study, two analysts reported experimenting with 3C processing with site 1 data. One used 3C processing in each phase of the site 1 study, with inclusion of both Love and Rayleigh wave dispersion data in the inversion for V S models. This analyst was one of the six in Table 2 who obtained a correct estimation of depth to the shallower interface of Fig. 15, but not for the deeper interface. The other analyst who used 3C processing noted some issues with orientation of seismometers to magnetic north versus position coordinates referenced to geographic north (magnetic declination at the site is 14° east). Additional processing by the analyst, described during phase 4 discussions, was successful in extracting both Love and Rayleigh wave dispersions curves through FK processing, but the recording time of 30 min for the 300-m array were apparently too short to allow definitive results.
We draw the conclusion (limited to site 1 only) in these blind trials that full 3C array processing did not demonstrate an obvious practical advantage over conventional SPAC processing with or without mHVSR information. Given the limitation that only one of the four sites had available 3C data, this cannot be presented as a general conclusion. However, it is consistent with a comment by  regarding the blind study described by Garofalo et al. (2016), which noted that four out of 14 analysts in that study made use of 3C array processing but no obvious improvement in the inversion to a V S model was noted for results from those four. It therefore appears that the merits of 3C array processing in such microtremor studies remain a question for further study.

Observations from phase 4 discussion by analysts
Site 4 and site 3 are regarded as the easiest and most difficult sites, respectively. This can be attributed to the fact that site 4 is near-ideal in array design and distribution of microtremor sources, while site 3 has a highly directional wave source. Two recurring majority themes in analyst feedback for all sites are (a) the advantages of the use of multiple arrays (i.e., dense arrays or multiple triangular or circular arrays) for higher confidence in microtremor-based interpretations and (b) highlights of the dangers of using two-station SPAC analysis where the character of azimuthal distribution of wave energy is not known. By contrast, there are opposite conclusions by some analysts to the effect that a sparse array can be reliable, and two-station SPAC is demonstrated to be useful (except for site 3). It should be noted that the view of a majority of analysts on the need for multiple arrays is not consistent with the observation in Fig. 13, which shows that at three out of SITE 1 REFERENCE MODEL Fig. 16 Rayleigh wave ellipticity curves for site 1, modeled using the reference model of Figs. 2a and 3. Red, yellow, and green colors show fundamental (R 0 ) and higher (R 1 and R 2 ) modes, respectively four sites, the addition of multiple arrays (phase 3) produces a change in the all-analyst average M index of less than one unit. The exception (site 1) was a rare instance where the additional array was a small array which gives significantly improved resolution of layers in the upper 100 m. It is reasonable to conclude that where the basic requirement is for V S10 , V S30 , or V S100 information, the sparse arrays are sufficient for these time-averaged types of V S calculations. It follows that where earthquake hazard site class specification (without layer thickness detail) is required, sparse arrays are sufficient. Three additional salient points from the phase 4 analyst comments are as follows: • A sparse array will typically deliver a narrower band of frequencies in the interpreted dispersion curve, which makes identification of the dominant Rayleigh wave mode more difficult. This can lead to errors in mode identification and subsequent errors in V S profile estimation. • 3C data at sites 2, 3, and 4 would allow mHVSR analyses and improve V S estimates at depth. Use of FK methods tends to overestimate phase velocities at low frequencies, and mHVSR data can assist in obtaining correct estimates of V S at the greater depths. • Active-source surface wave data (e.g., MASW described by Park et al. 1999 andXia et al. 2000), which is often richer in high-frequency information than passive data, should be included in order to improve resolution of nearsurface data.
Regarding the second point (above), these blind trials did not produce any example where the use of 3C array processing (for both Rayleigh and Love waves) improved the interpretation, although advantages of 3C processing have been previously discussed by multiple authors (Poggi and Fäh 2010;Wathelet et al. 2018). The third point is not necessarily valid for all interpretation methodologies; when in consideration of Fig. 12 and associated discussions contributed by the analysts, the take-away point seems to indicate that the useful high-frequency limit of dispersion curve data may be algorithm dependent. It is therefore arguable that optimum processing of passive microtremor data may access high-frequency data and thus provide sufficient resolution so as to negate the need for supplementary active seismic surface wave surveys.

Use of the blind trial for graduate student training
This COSMOS blind trial exercise was also used as a graduate student training exercise at the University of Western Ontario (UWO) by Sheri Molnar. A team of four was assembled with each member designated a particular site. The team composition included Molnar as the expert-trainer (site 4), one PhD student (site 2), and two MSc students (Sites 1 and 3). All three students had been trained by their supervisor (S. Molnar) in field data acquisition and processing of ambient vibration and surface wave array data using the Geopsy software during a 2-week training course. Sites were assigned blindly to each team member by the supervisor based on their experience, using the assumption that site complexity may increase sequentially from site 1 to site 4. The team began the blind trial at phase 2 which included the three-sensor array data and comprised the datasets of two team members. Each team member analyzed the blind trial phase 2 array data independently for their assigned site, including picking of fundamental-mode Rayleigh wave phase velocity estimates and inversion. The Geopsy software was used by all team members to calculate modified SPAC (MSPAC) dispersion estimates (SPAC curves are reviewed in selecting dispersion estimates but not used directly in the inversion) which were validated by comparisons with high-resolution FK dispersion estimates. Student team members then discussed their phase 2 dispersion picks, trials in model parameterization (not submitted to the trial), and inverted models with their supervisor prior to submission of an optimal V S profile. For site 1, the mHVSR was also calculated and inverted in phase 2 (no dispersion estimates were obtained). In phase 3, dispersion estimates were jointly inverted with the mHVSR data. Each team member repeated their independent dispersion assessments and inversions for their trial site with full array phase 3 data, in which results were reviewed by the supervisor prior to trial submission. The full array phase 3 data provided dispersion estimates at site 1, expanded the selected dispersion frequency bandwidth for site 3, and only provided 1 3 Vol.: (0123456789) confidence through consistency in the selected phase 2 dispersion estimates for sites 2 and 4.
Following the release of phase 4 site location and independent geophysical information, each team member evaluated their dispersion estimates and inverted V S model(s) in comparison with the phase 4 "ground truth" data. The four-member team discussed results in comparison to phase 4 information and shared lessons learned in regard to accuracy and frequency bandwidth of their dispersion estimates. Team members consistently obtain correct dispersion estimates in the mid-frequency dispersion curve range but tend to under-or overestimate dispersion estimates in the bottom (low frequencies) or top (high frequencies) of the curve. Overall, the blind trial provided empirical datasets with external expertreviewed answers. This formed an important empirical benchmark exercise for the UWO team to test their skills in dispersion curve interpretation.

Discussion and conclusions
We conducted a blind trial to estimate V S profiles from data recorded by MAM and found that, subject to a sufficient azimuthal distribution of seismic noise sources, the use of sparse arrays is sufficient for accurate estimates of V S10 , V S30 , V S100 , or V S300 . Guidelines on the choice of array dimensions and desirable sensor bandwidth are given in reviews by Foti et al. (2018), , and Hayashi et al. (2021). We have introduced the M index (based on time-averaged shear-wave velocities over multiple depth ranges) as a tool for quantitatively comparing results from multiple analysts. No single processing or analytical technique is identified as optimal and no software packages were found to be superior. We found that the best six analyst results in Fig. 14 used five different techniques/software packages.
The two-station method does not directly address the critical factor relating to whether the energy source for microtremor wave propagation is directional. Thus, the two-station method should only be used when azimuthally distributed sources are known to exist. This cautionary point will also apply if linear or quasi-linear arrays are used. For the majority of sites in this study, azimuthal distribution of sources was in fact sufficient such that a two-station sparse array proved adequate for reliable estimation of averaged shear-wave velocities V S10 , V S30 , and V S100 .
With respect to processing techniques, the widest usable bandwidths of Rayleigh wave dispersion curves of microtremor data, as determined in these blind trials, was obtained with the ESAC method and with the MMSPAC method based on direct fitting of coherency spectra.
Estimation of depth to known interfaces is a significantly greater challenge than estimation of V S10 , V S30 , and V S100 . We find that the M index is not a useful measure of accuracy for estimates of depth to interfaces. A comparison of 16 analyst's estimations of depth to known interfaces at site 1 showed that only about 50% of analysts were successful in estimating depth of one or both of the two known interfaces and only 10% of analysts in phases 1 and 2; only 20% in phase 3 succeeded in estimating both depths correctly. There were four successful estimations of depth for the pair of interfaces (Table 2, phase 2 plus phase 3) and these estimates used four different software packages. It is reasonable to conclude that analyst skill and experience may be a stronger factor than software choice in these comparisons. Where quantitative depth to interfaces is required, it is clear that arrays denser than the simple sparse arrays are to be recommended.
Analyses using mHVSR were found to assist in defining depth to the deep interface at one site (site 1). Where depth to interfaces (as distinct from averaged shear-wave velocities) is required, it is clear that acquisition of some 3C data for allowing HVSR processing, is to be recommended. The low success rate in defining depths to interfaces in this limited study suggests there is a need for development of clear guidelines for interpreters.
Only one of the sites included 3C data but only two analysts reported use of full 3C array processing, including Love wave dispersion analysis. We thus cannot draw any defensible conclusions from these blind trials regarding the usefulness of combined Rayleigh and Love wave dispersion analysis as applied to arrays of the types used in this study, and the subject remains an area for further research.
identified the need for these blind trials, for these guidelines for best-practices and provided funding and encouragement to facilitate the project. This material is also based upon work supported by the US Geological Survey under Cooperative Agreement No. G17AC00058. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the US Geological Survey. Mention of trade names or commercial products does not constitute their endorsement by the US Geological Survey. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the US Government, the Canadian Government, or COSMOS. This paper is Natural Resources of Canada contribution number 20200665.
Open Access publication fees for this paper were directly funded by COSMOS.
Array data for site 1 was supplied by the US Geological Survey. Data for site 2 was acquired and supplied by the Port and Airport Research Institute, Japan.
Data for site 3 was acquired for the Interreg III B-Alpinespace-Sismovalp project: "Seismic hazard and alpine valley response analysis" funded by the European Union.
The borehole survey at site 4 was conducted as part of the ROSRINE project funded by the US National Science Foundation, Pacific Earthquake Engineering Center, California Department of Transportation, and other organizations. The microtremor array data was acquired in 2003 as an internally funded research project by GEOVision, Inc.