Benchmarking the software packages to model and assess the seismic response of unreinforced masonry existing buildings through nonlinear static analyses

Seismic modelling of unreinforced masonry (URM) buildings is addressed worldwide according to different approaches, not only at research level but also in current engineering practice. Analysts have so many possible choices in interpreting the response of the examined structure and in transferring it into the model for the assessment that the achievable results may be widely scattered, as testified by various comparative studies already available in the literature. Within this context, this paper gives an overview of a wide research activity addressed to the benchmarking of software packages for the modelling and seismic assessment of URM buildings through nonlinear static analyses. The activity pooled the effort of many experts from various Italian universities and was funded by the Italian Department of Civil Protection within the context of the ReLUIS projects. The main objective of the research is the critical analysis and systematic comparison of the results obtained by using several modelling approaches and software packages on selected benchmark examples, in order to provide a useful and qualified reference to the engineering and scientific community. To this aim, different benchmark examples of increasing complexity, ranging from the single panel to 3D existing buildings, have been specifically designed. While other papers from the teams involved in the research project delve into the specific results achieved on each of these case studies, this paper gives an overview of the benchmark structures, their purpose and the standardized criteria adopted to compare the results. Moreover, the whole set of benchmark case studies is made available in this paper through their detailed input data, so that they can be replicated by other researchers and analysts.


Introduction
The outcomes of seismic vulnerability evaluations have significant repercussions in the engineering practice, since they influence the design of strengthening interventions, at the scale of the single existing building, or the plan of mitigation policies carried out by administrations, at a territorial scale.
Since the 1970s (Tomazevic 1978), nonlinear analyses have become increasingly widespread in the professional community for the seismic assessment of existing unreinforced masonry (URM) buildings, as a consequence of the evolution of design/assessment codes (progressively oriented to the performance-based approach) and thanks also to the increasing availability of commercial software packages. Nowadays, the latter are among the essential tools adopted by analysts and engineers involved in such a process, which at the same time raises the problem of their reliability and correct use. Thus, the benchmarking of practitioner-oriented software packages is an important task, as also confirmed by the increasing attention paid to this issue by associations and codes. To mention just one example, the Italian Technical Code (NTC 2018), at §10.2-Analyses and verifications carried out with software packages, makes explicit reference to the need to verify the reliability and suitability of the adopted software package, as well as to the need for a motivated judgement by the professional on the acceptability of the results achieved.
However, the comparative studies available in the literature on this topic show how tricky and challenging the issue is: they document a large scatter in the achievable results, due to the many possible choices in defining the numerical model and in interpreting the results, especially when these are used to finalize the seismic assessment. For this purpose, in fact, many assumptions are needed, for example to define limit states and to compare the seismic input with the structural capacity. These comparative studies include both:
• Research works specifically addressed to test different modelling strategies (e.g. Salonikios et al. 2003, Giamundo et al. 2014), considering also commercial software packages (e.g. Lourenço 2011, 2014; Calderoni et al. 2015; De Falco et al. 2017; Siano et al. 2018; Aşıkoğlu et al. 2020; Malcata et al. 2020);
• Blind predictions involving a large number of research teams called to predict the seismic response of the same benchmark prototype, either within the context of correlated experimental campaigns able to provide also the actual "reference solution" (e.g. in Mendes et al. 2017; Esposito et al. 2019) or in purely research experiences (e.g. Bartoli et al. 2017; Parisse et al. 2021).
Some of these experiences are illustrated in more detail in Sect. 2 in order to clarify the motivations that inspired the wide research project described in this paper. These studies highlight how the issue concerns not only professionals but also the scientific community, and all possible failure mechanisms that may affect URM buildings (i.e. both the in-plane and the out-of-plane response, as discussed in Sect. 2).
The large documented scatter can be ascribed to several issues inherent to:
• The intrinsic complexity of existing URM buildings, which involves many uncertainties (e.g. Bracchi et al. 2015) that, if investigated through various surveying/testing techniques (e.g. Kržan et al. 2015) at different levels of thoroughness/completeness (e.g. Rota et al. 2014; Haddad et al. 2019), may lead to different assumptions in the modelling and, consequently, to scattered results in the analysis/verification phase.
• The modelling process, which involves many possible approaches to describe the seismic behaviour of URM structures (Lourenço 2002; Roca et al. 2010; D'Altri et al. 2020). The choice among these different options and their consistent use must be based on a solid knowledge of the recurring failure modes that may occur and of the factors causing them. This knowledge is one of the key points for a proper use and a consistent selection of the software packages, depending on the specific building under examination (sometimes also highlighting their possible limits in describing it exhaustively).
• The way in which the software packages themselves are used: their relative ease of use cannot replace the knowledge needed to make the appropriate modelling choices that any software requires, but it can give the false impression that reliable results are easily achieved without solid expertise.
• The analysis/verification procedures: as mentioned above, in the case of existing URM buildings the use of nonlinear approaches is particularly common, which adds the potential scatter deriving from the non-uniqueness of the solution and its dependence on the convergence algorithms (e.g. as discussed in Cattari et al. 2021).
It is worth highlighting that, although the paper is focused on the URM modelling through nonlinear analyses, the first three aforementioned issues concern the linear analyses as well. As an example, in Lagomarsino et al. (2020) it is discussed how different choices of various analysts may lead to equally scattered results also in seismic assessment performed using linear static methods.
Within this general context, it was felt that pooling the effort of many experts in the critical analysis of selected benchmark examples of different complexity, analysed with several common modelling approaches and nonlinear software packages, would make it possible to provide a useful reference to the engineering and scientific community. To this aim, a wide research program was carried out, starting in 2014, by several Italian universities involved in the ReLUIS projects ("Rete dei Laboratori Universitari di Ingegneria Sismica"-Italian Network of University Seismic Laboratories), as commissioned and funded by the Italian Department of Civil Protection (DPC). In the following sections, this wide research activity is referred to for brevity as the "URM nonlinear modelling-Benchmark project". To the Authors' knowledge, the only similarly extensive experience in the civil engineering literature is in the field of flood and coastal risk management research and was promoted by the UK Environment Agency (Néelz and Pender 2010).
The primary goal of the "URM nonlinear modelling-Benchmark project" was to define a set of Benchmark Structures (BSs) to be adopted as reference for verifying and validating the proper use of software packages used by professionals and researchers for the nonlinear modelling and seismic assessment of URM buildings. Up to now, the attention has been focused only on the global response governed by the in-plane response of walls, i.e. the modelling of out-of-plane collapse modes has not been considered yet. The set of BSs, which is described in detail in Sect. 3, involves case studies of increasing complexity, ranging from simple panels to 3D buildings representative of actual complex structures. Some of the latter are permanently monitored by the DPC (Dolce et al. 2017) and were hit by the Central Italy 2016-2017 earthquake sequence, thus providing interesting data to be used also for validating the reliability of the achieved results. All the considered BSs have been specifically designed providing, where possible, analytical solutions as reference and procedures for checking the results or estimating the expected range of variation through simplified approaches. Moreover, all these BSs can be replicated by other researchers and analysts, since all the input data are provided in this paper as supplementary electronic material (Annex I-Benchmark Structures Input Data).
The researchers involved in the project have analysed the BSs by using different modelling strategies that are commonly adopted not only at research level but also in current engineering practice, i.e.: equivalent frame (EF) models; and continuum and discrete-macroelement models based on two- and three-dimensional elements (hereinafter referred to for brevity as "refined models"). An overview and critical comparison of the features of the EF models adopted in the research program is provided in Sect. 4 of this paper, while that of the refined models is presented in D'Altri et al. (2021). When possible, the models were applied using the same assumptions, for an easier interpretation of the differences in the results obtained by the different software packages (Sect. 5). Moreover, standardized criteria have been adopted to address the comparison of results more effectively (Sect. 6): this represents an additional valuable outcome of the research, since only one or two computer programs are used in many scientific papers (see Sect. 2) and the question often arises of how the epistemic modelling uncertainties affect the achieved dispersion of results.
The analysis of each benchmark structure gave the whole research group the opportunity to examine in depth various critical issues in the modelling and interpretation of the seismic response of URM buildings. While the present paper aims to provide an overview of the conceived BSs and the methodological approach adopted in the whole research project, further papers by the other research teams involved are more specifically devoted to presenting the results achieved on each BS (… for BS3; Manzini et al. 2021 and Cannizzaro et al. 2021 for BS4; Ottonelli et al. 2021 and Castellazzi et al. 2021 for BS5; Degli Abbati et al. 2021 for BS6) or to discussing in more depth critical issues in the modelling (D'Altri et al. 2021).

Motivations emerged from other benchmarking studies available in literature
Among the comparative studies available in the literature, three interesting blind predictions are discussed hereinafter, as documented in Mendes et al. (2017), Parisse et al. (2021) and Esposito et al. (2019), respectively. The first two involved research teams (RTs) from various universities at international scale, while the third involved nine engineering companies working on the seismic assessment of the Groningen building stock, which is located in the northern part of the Netherlands and was subjected in the past decade to human-induced shallow earthquakes. The blind predictions discussed in Mendes et al. (2017) and Esposito et al. (2019) were carried out within the context of two experimental campaigns (a shaking table test in the first case and a quasi-static cyclic test in the second). Conversely, the one described in Parisse et al. (2021) refers to a scientific exercise proposed within the 16th European Conference on Earthquake Engineering in Special Session 18, for which feedback on the actual seismic response of the examined benchmark structure is missing. Moreover, it has to be highlighted that, although the "URM nonlinear modelling-Benchmark project" has so far been essentially focused on the in-plane response, the experience illustrated in Mendes et al. (2017) mainly refers to the activation of out-of-plane mechanisms. Nevertheless, it has been analysed to provide a more comprehensive overview of the issue and to demonstrate that it is not limited to the global response; this is also useful to address possible developments of the research.
As shown in Fig. 1, two different configurations have been analysed in Mendes et al. (2017) and Parisse et al. (2021), with analogous geometric dimensions but characterized by two different masonry typologies (i.e. irregular stone masonry and solid brick masonry); the prototype tested in Esposito et al. (2019), instead, consisted of a full-scale two-story specimen resembling a modern terraced house built after 1980.
Among the three prototype structures, the one proposed in Magenes et al. (2018) (and analysed in Parisse et al. 2021) is the most complex, since it consists of a three-story building with an idealized geometrical layout conceived to be representative of URM building units in historical centers of the Mediterranean and Central European countries (as based on the typological studies presented in Carocci and Circo 2014). The two proposed configurations vary not only in the types of masonry walls, but also in the horizontal diaphragms, namely: (A) stone masonry and flexible diaphragms, comparable to a traditional non-engineered building; (B) brick masonry and rigid diaphragms, comparable to a modern code-based structure.
In all three blind prediction experiences, the analysts involved were left free to choose the modelling approach, the method of analysis (static or dynamic, linear or nonlinear) and the assessment criteria necessary to finalize the seismic assessment of the structure under examination.
For such blind predictions, Table 1 highlights the synthetic features of the examined prototypes and the number of available simulations carried out by the RTs, together with a brief summary of the main results. In particular, a synthetic parameter (P) representative of the collapse condition is reported (as clarified in the legend of Table 1, depending on the blind prediction). It consists of: the Peak Ground Acceleration of the seismic input producing the collapse, in the case of Mendes et al. (2017) and Parisse et al. (2021); or the Capacity/Demand ratio computed according to the Capacity Spectrum Method (Freeman 1998), in Esposito et al. (2019) (according to the recommendations of NPR 9998:2018, NEN 2018). When the target provided by the experimental test was available, the same parameter is summarized for both the actual evidence (P_exp) and the estimate from the blind prediction (P_bp). The latter corresponds to the mean value of all available numerical simulations; in this case, the corresponding coefficient of variation (COV) has also been computed in order to quantify the scatter of results.

Table 1 Overview on results achieved in some blind prediction experiences documented in literature
*The number of blind predictions deriving from a different modelling approach or analysis method
**Not present/not available
# P_exp indicates, where the evidence from the actual response was available, the synthetic parameter representative of the collapse condition. In the case of Mendes et al. …
The results show a significant dispersion in all cases, denoting a criticality that is generalized and not limited to specific typologies (i.e. of masonry; of failure mode, e.g. whether associated with the in-plane or out-of-plane response; or of diaphragm type, e.g. whether flexible or stiff). Concerning the reliability of predictions, those discussed in Mendes et al. (2017) are on average on the safe side; however, this result is not confirmed in Esposito et al. (2019). Moreover, although in the first case the result is conservative on average, the high COV reveals that, within the set of analysts, some were extremely on the safe side while others were completely on the unsafe side. To clarify the issue, Table 1 also reports the minimum and maximum values from the blind predictions, together with the number of simulations that exceeded the experimental outcome (see also Fig. 2b for the experience illustrated in Esposito et al. 2019). From an engineering point of view, when the synthetic parameter P becomes the starting point for designing strengthening interventions, this means that one analyst may choose not to strengthen the structure at all while another may design a very heavy strengthening intervention. Furthermore, a more accurate comparison of results highlights that the safety indexes may also correspond to a very large variety of failure modes (see Fig. 2a). This means that, in some cases, analyses that yield comparable values of the safety indexes can nevertheless lead to interventions on different structural elements and/or addressed to solve different structural criticalities.
Fig. 2 a Experimental collapse mechanism and variety of those simulated in the blind prediction in Mendes et al. (2017), for the brick mock-up. b, c Scattering of the simplified capacity curves and capacity/demand ratio estimated by the engineering companies in Esposito et al. (2019)
In this respect, even when attention is restricted to the predictions consistent with the failure mechanism that actually occurred, the scatter did not significantly decrease (see the results in Table 1 referring to Mendes et al. 2017). Finally, even when the same modelling approach is considered (see Fig. 2c for the case of Esposito et al. 2019), the COV does not significantly decrease, denoting once again that the critical issue is generalized; the same outcome was confirmed by the results discussed in Parisse et al. (2021).
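The summary statistics reported in Table 1 (mean of the blind predictions, COV, minimum and maximum values, and number of simulations exceeding the experimental outcome) can be reproduced with a few lines of code. The following is a minimal sketch, not the procedure used by the cited authors: the function name `summarize_predictions`, the use of the sample standard deviation for the COV, and the numerical values are assumptions made only for illustration.

```python
from statistics import mean, stdev


def summarize_predictions(p_bp, p_exp=None):
    """Summarize a set of blind-prediction values of the parameter P.

    p_bp  : list of predicted values (e.g. collapse PGA or capacity/demand ratio)
    p_exp : experimental value of the same parameter, if available
    """
    avg = mean(p_bp)
    summary = {
        "n": len(p_bp),
        "mean": avg,
        # Assumption: COV based on the sample standard deviation (n - 1).
        "cov": stdev(p_bp) / avg,
        "min": min(p_bp),
        "max": max(p_bp),
    }
    if p_exp is not None:
        # Predictions above the experimental value are on the unsafe side.
        summary["n_exceeding_exp"] = sum(p > p_exp for p in p_bp)
        summary["mean_over_exp"] = avg / p_exp
    return summary


# Hypothetical values, NOT taken from any of the cited blind predictions.
predictions = [0.5, 1.0, 1.5, 2.0]
summary = summarize_predictions(predictions, p_exp=1.2)
print(summary)
```

A mean close to the experimental value can therefore coexist with a high COV and with several individual predictions on the unsafe side, which is the situation discussed above.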
Even the studies carried out by a single research team, which should ensure a higher cross-consistency among the hypotheses assumed across the adopted software packages or the different modelling approaches investigated, show in many cases a non-negligible dispersion of results. By way of example, Fig. 3 summarizes some results presented in Marques and Lourenco (2011) and Aşıkoğlu et al. (2020) through a view of the examined prototype structures (Fig. 3a, b) and the pushover curves (Fig. 3c, d) obtained by using various commercial software packages, i.e.: 3Muri (distributed by S.T.A. DATA and based on the work of Lagomarsino et al. 2013) and 3DMacro (Caliò et al. 2012), to which SAM II (Magenes and Della Fontana 1998; Magenes et al. 2006) and DIANA-FEA (2017) are then added in the comparisons discussed in Marques and Lourenco (2011) and Aşıkoğlu et al. (2020), respectively. DIANA-FEA follows a continuum modelling approach, 3Muri and SAM II work according to the equivalent frame approach, whereas 3DMacro is based on the formulation of a plane discrete macro-element.
Besides confirming a potential dispersion of results, these two studies are interesting because they better clarify the sensitivity of pushover curves to various modelling hypotheses when the same software package is used, e.g.: varying the tensile strength in DIANA-FEA and 3DMacro (in Aşıkoğlu et al. 2020) (Fig. 3d) or inhibiting one shear failure mode, namely bed joint sliding, in SAM II (in Marques and Lourenco 2011) (Fig. 3c). The mentioned examples represent only two of the possible different choices made by professionals in assessing the seismic response of an existing building that can produce dissimilar outcomes (as more extensively discussed in Bracchi et al. 2015).
It should be noted that all the software packages used in the aforementioned studies were first validated against the results of experimental tests available in the literature: in Marques and Lourenco (2011) by referring to the URM wall tested under lateral static loading by Cappi et al. (1975), and in Aşıkoğlu et al. (2020) by referring to the concrete block masonry building tested on a shaking table by Avila et al. (2018). In general, the comparisons made by these authors showed a reasonable agreement and highlighted how, when significant differences were found, they were essentially due to limitations of the modelling approaches in describing all the activated failure modes (e.g. the activation of out-of-plane mechanisms in equivalent frame models, which capture only the in-plane response). This corroborates the hypothesis that the critical issue lies in the proper and aware use of the software packages rather than in deficiencies of the software itself.
Finally, an interesting experience on a definitely more complex existing structure, the Bonet building (the oldest part of the National Palace of Sintra in Portugal), is described in Malcata et al. (2020). Here, two models, one set up with the 3Muri program and the other with the ABAQUS software package (which follows the finite element approach), are first calibrated by using as target the frequencies estimated from ambient vibration measurements and by accounting for various uncertainties related to the boundary conditions (the building is actually part of an aggregate) and to the characterization of the material. With these calibrated models, the results show quite good agreement in the simulated damage pattern and reasonable differences in the pushover curves.
The comparative studies available in the literature do not allow a conclusive quantification of the uncertainty associated with the software-to-software variability, either when the software packages are used under the same hypotheses or when the hypotheses are left free. In particular, the state of the art on the topic reveals the following limits:
• The available comparative studies have mostly been carried out on very simple structures (usually two-story single units or even single walls), while only in a few cases is there an attempt to analyse more complex buildings (i.e. in Marques and Lourenco 2014 and Malcata et al. 2020). These experiences do not make it possible to assess exhaustively whether the scatter increases or decreases when passing from simple to complex structures.
• From one study to another, the examined prototype changes together with the set of software packages adopted. This makes it difficult to compare the results and to extend the conclusions of each single work to a general perspective.
• In various works, when modelling approaches at different scales of discretization have been adopted (e.g. at the material scale, like the continuum models, or at the structural component scale, like the equivalent frame ones), a preliminary calibration of mechanical parameters or other uncertainties involved in the modelling process has first been carried out in order to guarantee consistency across the models. However, such calibration has often been based on different approaches, such as, for example: matching the slope of the pushover curve in the initial elastic range by involving in the calibration process only the Young's modulus of masonry (Aşıkoğlu et al. 2020); first simulating an experimental campaign on simple mock-up specimens, such as diagonal compression tests (Betti et al. 2014); adopting as target the dynamic properties estimated by ambient vibration tests (Malcata et al. 2020). Indeed, as discussed in D'Altri et al. (2021) and Cattari et al. (2021), even if only the cross-consistency between mechanical parameters is addressed, such a calibration phase is particularly tricky, especially when the wide variety of material constitutive laws formulated in the literature is considered. Thus, general recommendations for such calibration would be very useful.
• The comparison of results is usually focused on the pushover curves and the damage pattern, while in very few cases it is extended to local parameters such as the evolution of internal generalized forces in specific structural elements (Marques and Lourenco 2014). This makes it more difficult to pinpoint the reasons for the differences.
The objectives of the "URM nonlinear modelling-Benchmark project", as already outlined in Sect. 1, are to attempt to fill these gaps in order to provide a qualified contribution to the scientific and professional community and to enrich the dataset of structures examined, making them available for other researchers and professionals for comparison with these results when using different software packages.

Overview on the proposed benchmark structures
Within the "URM nonlinear modelling-Benchmark project", six Benchmark Structures (BSs) have been designed. While the main motivations that led to their selection and their main features are briefly reported below, all the data that allow third parties to reproduce them are documented in Annex I-Benchmark Structures Input Data (provided as supplementary electronic material). Figure 4 provides an overview of the BSs, which consist of: (a) BS1-single panel; (b) BS2-portal wall; (c) BS3-2D multi-story wall; (d) BS4-3D two-story single unit building; (e) BS5-3D complex URM existing building with irregular plan, inspired by the school "P. Capuzi" in Visso (MC, Italy); (f) BS6-3D complex URM existing building, inspired by the Town Hall of Pizzoli (AQ, Italy).
The growing complexity of the BSs is designed to guide the analyst in progressively developing a more advanced awareness of the hypotheses on which the software package under examination is based.
BS1 and BS2 may appear trivially simple, but they make it possible to explore the calibration of masonry parameters (see also D'Altri et al. 2021 for BS1) and the modelling of the interaction between the spandrel and the architrave or another tensile-resistant element coupled to it, both crucial issues particularly when refined models are used (see also Cattari et al. 2021; Occhipinti et al. 2021). Some of the proposed BSs are inspired by case studies analysed in previous experimental (BS1 by Anthoine et al. 1995 and BS4 by Calvi and Magenes 1994) or numerical (BS3 by Liberatore et al. 2000) research, or for which evidence on the actual seismic response was available (BS5 and BS6 by Cattari et al. 2019). From the scientific viewpoint, this requisite is important because it also makes it possible to assess the reliability of the achieved results against the actual seismic response. Moreover, as better clarified in Sect. 3.1, starting from given geometric dimensions, the BSs have been parametrically varied.
The first benchmark structure (BS1) consists of an isolated masonry pier. Besides supporting the abovementioned calibration process, it is also useful for professionals to become aware of the role of the different parameters on which the simplified strength criteria proposed in the codes are based (namely mechanical parameters, axial load, static scheme, aspect ratio). The proposed parametric configurations highlight the sensitivity of results not only in terms of maximum base shear but also of failure mode and ultimate displacement. The cases BS1_S1/M2 and BS1_S2/M2 (see Sect. 3.1 for the notation) refer to two panels tested by Anthoine et al. (1995), for which data on the mechanical characterization of the masonry components are also available.
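As a hedged illustration of how these four parameters interact, the sketch below evaluates one common code-type flexural (rocking) criterion for a pier, M_u = (l² t σ0 / 2)(1 − σ0 / (0.85 f_d)), with the corresponding shear obtained by dividing M_u by the height of zero moment (h for a cantilever, h/2 for a fixed-fixed pier). The choice of this criterion, the function name `rocking_shear_capacity` and all numerical values are assumptions made for illustration only; they are not taken from the benchmark input data, and shear failure modes are not considered.

```python
def rocking_shear_capacity(l, t, h, sigma0, fd, fixed_fixed=True):
    """Shear force (N) at flexural failure of a rectangular masonry pier.

    l, t, h : pier length, thickness and height (mm)
    sigma0  : mean vertical compressive stress (MPa)
    fd      : masonry compressive strength (MPa)
    fixed_fixed : static scheme; True for fixed-fixed, False for cantilever
    """
    if not 0.0 <= sigma0 < 0.85 * fd:
        raise ValueError("sigma0 must lie in [0, 0.85 * fd)")
    # Ultimate moment with a rectangular stress block (0.85 * fd).
    m_u = (l ** 2 * t * sigma0 / 2.0) * (1.0 - sigma0 / (0.85 * fd))
    # Height of zero moment depends on the static scheme.
    h0 = h / 2.0 if fixed_fixed else h
    return m_u / h0


# Hypothetical pier: slenderness h/l = 1.35, illustrative material values.
pier = dict(l=1000.0, t=250.0, h=1350.0, sigma0=0.6, fd=6.2)
v_fixed = rocking_shear_capacity(**pier, fixed_fixed=True)
v_cantilever = rocking_shear_capacity(**pier, fixed_fixed=False)
print(f"fixed-fixed: {v_fixed / 1e3:.1f} kN, cantilever: {v_cantilever / 1e3:.1f} kN")
```

Under this criterion, changing the static scheme from cantilever to fixed-fixed doubles the shear at flexural failure, while a larger aspect ratio reduces it; whether flexure actually governs also depends on the shear strength criteria, which are not sketched here.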
The second benchmark structure (BS2) consists of a portal wall, meant to examine the effect of the variation of the axial load on the maximum strength of panels, as well as the interaction phenomena between the pier and spandrel elements.
The geometry of the 2D multi-story wall (BS3) is inspired by the internal wall of a building in Catania dating back to the early 1950s (Fig. 5), originally selected for the aims of the "Catania Project" (Liberatore et al. 2000). This BS makes it possible to examine the effects of the pier-spandrel interaction at the scale of a multi-story wall system, where they are expected to be more pronounced than in the portal wall.
The fourth benchmark structure (BS4) consists of a 3D two-story single unit URM building with rigid diaphragms. Starting from this 3D structure and then moving to the other complex buildings, it is possible to explore many issues that can affect the seismic response of existing buildings, such as torsional effects, the coupling effectiveness between intersecting orthogonal walls and the diaphragm stiffness, all of which can affect the redistribution of the seismic action among the bearing walls. The geometry of the Type I-wall (see the notation introduced in Table 2) and the masonry type of BS4 are consistent with those of the "Door wall" tested by Calvi and Magenes (1994). Finally, BS5 and BS6 are inspired by the plan configuration, geometry and masonry type of two strategic buildings permanently monitored by the DPC through the OSS (acronym of the Italian name "Osservatorio Sismico delle Strutture"). These buildings were already analysed, for different purposes, within another ReLUIS project (ReLUIS-Task 4.1 Workgroup 2018; Cattari et al. 2019), which collected very accurate information on both buildings; this information is now available also to other researchers interested in simulating their seismic response (via the website http://www.protezionecivile.gov.it). Both buildings were selected because they are mainly associated with a box-type behavior, on which the attention of the "URM nonlinear modelling-Benchmark project" was focused (at least in the first phase, already completed). Starting from their original configuration, a few simplifications have been adopted in the corresponding benchmark structures, as described in Annex I-Benchmark Structures Input Data.
Fig. 4 Overview on the benchmark structures studied in the "URM nonlinear modelling-Benchmark project"
More specifically, BS5 replicates the geometry of the "P. Capuzi" school in Visso (province of Macerata, Marche, Italy), which consisted of two stories above ground and an attic covered by a pitched timber roof. It is characterized by an irregular T-shaped plan and load-bearing walls consisting of two-leaf stone masonry with a rather regular bond scheme; the external walls present a quite regular pattern of openings. The school was so severely damaged by the seismic sequence that hit Central Italy in 2016 that the municipality decided to demolish it. The building essentially exhibited a global in-plane response with cracks concentrated in piers and spandrels (Fig. 6). The activation of this type of response was already clearly evident after the first shock of 24th August 2016, while the local …

Table 2 Overview of the parametric configurations (the parameters that change are marked in bold in the cells) and fixed parameters proposed for the BSs
*Each BS is identified by a sequence of tags that specifies, when necessary, the parametric configuration analysed, i.e.: BS1_S1/M1/BC1 identifies the panel characterized by slenderness equal to 1.35, stone ashlar and fixed-fixed condition; BS4_P1 identifies the BS4 characterized by the plan configuration P1 as defined in the Annex-BS Input Data; BS5_C identifies the complex building inspired by the P. Capuzi school characterized by RC tie beams coupled to spandrels
**Four structural details are considered: Type A-with spandrels not coupled to any tensile-resistant element; Type B-with spandrels coupled to steel tie rods; Type C-with spandrels coupled to RC tie beams; Type D-equivalent to the shear-type idealization
***For a more detailed description, see …

… at the end of the sequence, thanks to the data acquired in a survey made on 8th December 2016: the increasing thickness of the lines corresponds to increasing crack severity; the dashed fill corresponds to the collapsed portion.
The clear evidence of the concentration of cracks in piers and spandrels and an accurate survey of their extension (Fig. 7) represent a precious (and rare) reference, also useful to address the rules to be adopted in defining the geometry of the structural elements in equivalent frame models. Indeed, this is one of the modelling uncertainties that can produce appreciable dispersion in results, as discussed in Bracchi et al. (2015) and Quagliarini et al. (2017) and shown by quantitative evaluations in Cattari et al. (2021), Manzini et al. (2021) and Ottonelli et al. (2021).
Finally, BS6 is inspired by the Pizzoli town hall (province of L'Aquila, Abruzzo, Italy), with two floors above the ground level and a non-habitable attic. The plan can be assimilated to an elongated rectangle and the masonry walls are built with stone ashlars. Externally, the structure shows a certain regularity in the arrangement of the openings, which are evenly distributed along the walls and vertically aligned. The building was mainly struck by the shock of 18th January 2017, attaining a nonlinearity level lower than that of the "P. Capuzi" school in Visso. Cracks, from slight to moderate (limited to a few cases), occurred mainly in piers, as shown in Fig. 8, and no evidence of activation of local mechanisms was found.
Besides being representative of more complex structures than the other BSs, these two structures have the added value of providing data essential for validation purposes and useful for comparisons with the actual response, both in qualitative terms (e.g. damage pattern) and in quantitative terms (e.g. thanks to the dynamic parameters identified from ambient noise measurements and the acceleration recordings under the mainshocks). Although the validation through an accurate calibration of the model (Ferrero et al. 2020; Sivori et al. 2021) or the simulation of the actual seismic response through more refined analyses, such as nonlinear dynamic ones (Brunelli et al. 2021; Miraglia et al. 2020), is outside the primary scope of the "URM nonlinear modelling-Benchmark project", the availability of the dynamic properties and of an accurate reconstruction of the damage pattern makes it possible to provide a first assessment of the reliability of the response predicted by commercial software packages.
In line with this, in Degli Abbati et al. (2021) the availability of dynamic parameters (in terms of frequencies and mode shapes) for the Pizzoli town hall has been used to verify ex-post the consistency of the linear response forecast by the blind predictions carried out on five models of BS6_C (where C indicates the presence of r.c. tie beams at floor level, as clarified in Sect. 3.1) set up through commercial equivalent frame software packages. In a first phase of the blind prediction, the elastic properties were set according to reference values proposed in the literature (MIT 2019) for a masonry type analogous to that of the building; then, the available experimental data were used to refine the calibration of the models and validate them. Figure 9 shows the results of the blind prediction, in which the comparison with the target experimental values is made through the MAC index (Allemang and Brown 1982) for the mode shapes and the percentage error for the periods. Results show a very good agreement (with values of the MAC index very close to 1).
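The two comparison measures mentioned above can be sketched as follows. This is only a minimal illustration, assuming real-valued mode shapes sampled at the same degrees of freedom in the model and in the experiment; the function names and the numerical values are hypothetical and are not taken from the project.

```python
def mac(phi_a, phi_b):
    """Modal Assurance Criterion between two real mode shapes.

    Both shapes must be sampled at the same degrees of freedom.
    The result lies in [0, 1]; 1 means the shapes are proportional.
    """
    if len(phi_a) != len(phi_b):
        raise ValueError("mode shapes must have the same number of components")
    cross = sum(a * b for a, b in zip(phi_a, phi_b))
    norm_a = sum(a * a for a in phi_a)
    norm_b = sum(b * b for b in phi_b)
    return cross ** 2 / (norm_a * norm_b)


def period_error_percent(t_model, t_experimental):
    """Signed percentage error of a numerical period with respect to the measured one."""
    return 100.0 * (t_model - t_experimental) / t_experimental


# Hypothetical values, for illustration only
phi_experimental = [0.35, 0.72, 1.00]
phi_numerical = [0.33, 0.70, 1.00]
print(mac(phi_experimental, phi_numerical))
print(period_error_percent(0.21, 0.20))
```

Because the MAC is insensitive to the scaling of the mode shapes, it is usually read together with the period (or frequency) error, which carries the information on stiffness and mass.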
On the other hand, nonlinear static analyses performed on BS5_C, as discussed in Ottonelli et al. (2021) and Castellazzi et al. (2021), have offered the possibility to check the consistency of the global failure mode forecast by nine models (set up with software packages belonging to the equivalent frame or to refined approaches) against the actual one. Although it is evident that nonlinear static monotonic analyses are too rough to exactly reproduce the cyclic and damage accumulation phenomena that affected the "P. Capuzi" school in Visso, the comparison was nonetheless useful to check the agreement in terms of concentration of damage in piers and spandrels and of failure mode type (e.g. whether prevailing flexural, shear or hybrid).

Outlined parametric configurations
Starting from a set of parameters kept fixed for each BS, further parametric configurations have been designed by varying the masonry typology, the boundary conditions or the structural details. Table 2 provides an overview of the whole set of configurations, for which all the necessary details are given in Annex I-Benchmark Structures Input Data.
In particular, the single panel (BS1) is proposed in two different configurations. The first is characterized by stone ashlar masonry (tagged as M1) with fixed slenderness (tagged as S1), varying two boundary condition (tagged as BC) schemes (i.e. fixed-fixed, BS1_S1/M1/BC1, and cantilever, BS1_S1/M1/BC2). The second is characterized by brick masonry with mortar joints (tagged as M2) with a fixed static scheme (fixed-fixed, tagged as BC1), varying two slenderness ratios (BS1_S1/M2/BC1 and BS1_S2/M2/BC1). The panels are first subjected to an axial load at their top, followed by the application of an in-plane horizontal shear while the axial load is kept constant. For each configuration, different values of the axial load applied at the top of the panel have been considered in order to test different regions of the strength domain.
Moving to the benchmark structures from BS2 onwards, they are specifically conceived to be affected by the coupling effect provided by spandrels to piers as a function of their stiffness and strength properties. To this aim, four parametric configurations are proposed, namely:
• A-spandrels not coupled to any tensile resistant horizontal element at floor level. Only the presence of an effective architrave is assumed, while the contribution of other factors that can produce an equivalent tensile strength in spandrels (such as the interlocking with the adjacent masonry region, as discussed in Beyer and Mangalathu 2013) is neglected.
• B-spandrels in presence of horizontal steel tie rods.
• C-spandrels coupled to reinforced concrete (r.c.) tie beams.
• D-piers coupled by beams characterized by an infinite axial stiffness and restrained against rotation in order to simulate the so-called "shear-type" ideal scheme.
When the structures are subjected to in-plane horizontal loading, it is expected that, starting from configuration A (weak spandrel-strong pier behaviour type) and moving to the ideal shear-type one (D, strong spandrel-weak pier behaviour type) through configuration C, both the global stiffness (K_s) and the base shear (V) progressively increase in the pushover curves, while the ultimate displacement capacity (d_u) decreases. This effect may be more or less evident as a function of the geometric configuration of the structure, the number of openings and their alignment (regular or not). Figure 10 illustrates, by way of example, the pushover curves obtained across the various BSs proposed (from BS3 to BS6) by one of the software packages (SW) adopted by the RTs involved in the "URM nonlinear modelling-Benchmark project"; the SW belongs to the equivalent frame approach and all the pushover curves refer to the analyses performed in the +X direction with a load pattern proportional to masses. Although the magnitude of the variation of the aforementioned parameters (K_s, V, d_u) may vary, the general trend was confirmed also by the other SWs used in the project and working with the same modelling approach, as testified in Fig. 11 in the case of BS4. This trend, and in particular the quite large difference passing from configuration A to the others, is not always found in more refined modelling approaches. This can be ascribed to the many additional modelling choices that can influence the outcome of the A-configuration for these models, whose higher accuracy also corresponds to a very large variety of possible choices. Just to name a few factors: the calibration of the parameters used for simulating the flexural and shear response of spandrels; the modelling of the local interaction with the architrave; the interaction with the diaphragms.
As an example, Fig. 12 shows the results achieved in the case of BS4 by a finite element model that uses a nonlinear isotropic material model as constitutive law, in which different values of the tensile strength parameter (f_t) have been adopted (where f_t regulates both the flexural and the shear behaviour of the masonry panel, as discussed in detail in D'Altri et al. 2021). Starting from a value of f_t calibrated to be consistent with the reference one proposed for piers in the Annex I-Benchmark Structures Input Data (i.e. equal to 0.1 MPa), various alternatives have been explored by changing the f_t of spandrels (in particular reducing it) in an attempt to reproduce the "weak spandrel" configuration. Of course, this example cannot be exhaustive of the problem; it just aims to give an idea of the complexity of the issue and of the consequences that different choices of analysts may produce on results. Additional issues on the topic are discussed in Cattari et al. (2021) and Occhipinti et al. (2021).
More simplified models, like those based on the equivalent frame approach, apparently seem less problematic in modelling the A-configuration, since the simplified hypotheses usually adopted by the commercial software packages are standardized oversimplifications of this complex problem. In fact, the current practice is to simulate the A-configuration either by assuming for spandrels the same strength criteria as for piers (which leads, in the absence of other tensile resistant elements, to a null flexural strength of the spandrel) or by directly modelling the spandrels as axially rigid rods connecting the masonry piers. The use of "apparently" is justified by the fact that the results on the BSs studied by the RTs involved in the "URM nonlinear modelling-Benchmark project" showed that the A-configuration is systematically associated with the highest dispersion of results, even when only the equivalent frame models are considered (as discussed in more detail in Manzini et al. 2021).

Finally, in the case of BS4_C, the epistemic uncertainty associated with the effective length of r.c. tie beams has been considered as an additional parametric configuration. Such effective length aims to account for the more or less effective coupling with the masonry portions, as also testified by the experimental work of Beyer and Dazio (2012). To this aim, two possible configurations were analysed, namely r.c. tie beams with: an effective length equal to the total distance between the two incidence nodes of piers (separated by the opening) (long tie beams, RC1); or an effective length equal to the net width of the corresponding opening (short tie beams, RC2).

Fig. 12 BS4: Example of sensitivity of results to different choices of parameters adopted for spandrels and piers in attempting to simulate the weak-spandrel configuration by using a continuum model

Overview on the software packages used in the "URM nonlinear modelling-Benchmark project"
Different modelling strategies and software packages have been adopted up to now by the RTs involved in the "URM nonlinear modelling-Benchmark project", namely:
• Continuous finite element models: ABAQUS (2017) by adopting the constitutive law developed by Lee and Fenves (1998) and Lubliner et al. (1989); MIDAS FEA (2017) by using the constitutive law originally proposed in Vecchio and Collins (1986) and Selby and Vecchio (1993); LUSAS (2001)

While a more in-depth discussion of the hypotheses underlying the SWs that work according to a more refined approach is presented in D'Altri et al. (2021), focusing in particular on the basics of the constitutive laws adopted, an overview of the different modelling options offered by the SWs based on the equivalent frame approach is provided below. A state of the art of the different options involved in the modelling process for this approach is provided in Quagliarini et al. (2017), while a discussion of the repercussions of some of them on the seismic assessment is provided in Bracchi et al. (2015), Cattari et al. (2021), Ottonelli et al. (2021) and Manzini et al. (2021). With a different perspective, an overview is given hereinafter of the alternative ways in which the SWs implement the many different choices that the analysts involved in the seismic assessment of URM buildings are called to make. Although not exhaustive, the set of analysed SWs reflects the tools available to professionals in Italy nowadays; moreover, many of the selected SWs are used also at international level, as partially testified by the subset of scientific studies discussed in Sect. 2. Since a detailed analysis of the features of each single software package is outside the scope of the work, the data are discussed in an aggregate way as a function of the various factors that intervene in the modelling (FMi). In particular, Figs. 13, 14 and 15 report the recurrence of the alternative options that the SWs allow to be managed, together with those assumed by default (highlighting also whether the default option may be modified or not).

[Figure legend, partially recovered]
FM1 - Equivalent frame idealization: A: automatic (only one rule), but editable by the user; B: automatic (more rules), but editable by the user; C: free, always up to the user
FM2 - Wall-to-wall connection and flange effect: A: perfect kinematic coupling assumed by default, but editable by the user in C-option; B: perfect kinematic coupling assumed by default, editable only into a null wall-to-wall connection; C: coupling regulated by a proper calibration of the stiffness of equivalent beams

Figure 13 provides an overview of some general issues related to: how the equivalent frame idealization of walls is carried out (FM1); how the wall-to-wall connection and the composite action of intersecting walls ("flange effect") are managed (FM2); how the transfer of the loads applied on the floor diaphragms to the walls is modelled (equivalent forces in nodes) (FM3); how the diaphragms are modelled (FM4); how the out-of-plane contribution of piers (in terms of additional stiffness and strength) is accounted for (FM5); how the convergence algorithms are implemented (FM6).
Regarding FM1, the definition of the geometry of the elements where the nonlinearity is allocated constitutes the first step of the EF modelling approach. All the considered SWs leave users free to define the geometry to be adopted, thus guaranteeing a certain flexibility in describing complex configurations (like those in which the opening pattern is irregular, as is often the case for URM buildings). However, whereas in some SWs this phase is completely managed by the user (2 out of 6), other SWs (4 out of 6) implement some rules proposed in the literature, suggesting a first tentative idealization that may be subsequently edited by the analyst.

[Figure legend, partially recovered]
FM7 - Modelling of piers (constitutive law): A: LP whose properties are directly calculated by the software; B: LP whose properties are directly calculated by the user; C: LP that may be converted into a fiber model (only for describing the flexural/axial response)
LP: nonlinear behavior (usually elasto-plastic) with "zero-length" lumped plasticity

As far as FM2 is concerned, the flexibility of the software on this aspect is really relevant for describing, on the one hand, various degrees of wall-to-wall connection and, on the other, the so-called flange effect. In most cases (4 out of 6), the SWs assume by default a perfect kinematic coupling among incident walls, which may be obtained by different solutions, such as the use of kinematic constraints and the consequent condensation of the degrees of freedom (Fig. 14a), or very stiff equivalent beams (Fig. 14b). In all these cases, the option can then be edited, allowing: in three cases, to pass from the perfect kinematic coupling to the use of beams of finite stiffness (Fig. 14c); in one case, to delete the rigid link (thus downgrading the full coupling to a null wall-to-wall connection). In most cases (5 out of 6), the effectiveness of the wall-to-wall connection is managed through a proper calibration of the aforementioned beams made by the user, whereas only in one case it is simulated by directly defining the dimension of the collaborating portion of the orthogonal pier (i.e. the flange width, as illustrated in Fig. 14d). In this last case, different models have to be adopted depending on the main direction in which the horizontal forces are applied. Additional details on this modelling factor and its repercussions on results are discussed in Cattari et al. (2021) and Ottonelli et al. (2021), respectively.
Moving to FM3, for half of the examined SWs (3 out of 6) the conversion is automatically operated by the computer program on the basis of the information provided by the analyst (i.e. load per unit floor area and main orientation); for two SWs the tributary floor area of each structural element is requested; and, finally, in one case both options are managed as a function of the diaphragm typology (mono- or bi-directional).
Regarding the diaphragm modelling (FM4), most SWs (4 out of 6) idealize diaphragms as linear elastic orthotropic membranes, even if in some cases (2 out of 6) the default modelling option assumes infinitely rigid floors. In two cases, rigid diaphragms are simulated by a kinematic coupling among the nodes pertaining to the diaphragm, and the only alternative is to remove the coupling, completely neglecting the membrane stiffness of the diaphragm.
In the case of FM5, the majority of SWs (5 out of 6) allow the out-of-plane stiffness and strength of piers to be considered in the global response. When included, in two cases this option is directly managed by the software after selection by the analyst; in the other cases, the user is in charge of inserting specific end releases when the out-of-plane stiffness is to be neglected.
Finally, for FM6, most SWs adopt the Newton-Raphson strategy, usually implemented with the arc-length method to describe the softening phase, while only in one case the event-to-event approach is implemented.

Figure 15 depicts the solutions adopted by the examined SWs to model the URM panels in terms of: the constitutive law adopted and the dependency of the shear strength on the axial load N acting on the panel (FM7 for piers and FM8 for spandrels); geometry and interaction with other tensile resistant elements, in the case of spandrels (FM9); computation of the angular deformation or drift (FM10).
Considering FM9, and in particular the geometry of spandrels, the alternative options consist in splitting the spandrel into two elements at the floor level or keeping one single masonry spandrel. The split option may turn out useful when the masonry typology varies from one level to the next. In most cases, there is complete flexibility in adopting one or the other solution; only in one SW are spandrels always split. Regarding the interaction of the spandrel with other tensile resistant elements, and more specifically its possible influence on the spandrel strength, in most SWs the strength is evaluated by computing the equivalent maximum axial force transferred to the spandrel (H_p). According to the criteria proposed in MIT (2019) and discussed also in Beyer and Mangalathu (2013), such axial force may be computed as the minimum between the maximum tensile strength of the coupled element and the limit equal to 0.4 f_mh A_sp (where f_mh and A_sp are the masonry compressive strength in the horizontal direction and the cross-sectional area of the spandrel, respectively). This approach assumes that an equivalent strut behavior is likely to occur in the spandrel when a tensile resistant element is coupled to it. In 3 out of 6 SWs, the value of H_p is automatically computed by the software package on the basis of the data inserted by the user for the other modelled coupled elements (e.g. an r.c. tie beam or a steel tie rod), whereas in the other half the value used in the strength criteria is directly defined by the analyst.
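The rule for H_p described above can be written in a few lines. The sketch below is only illustrative: the function name, the argument names and the numerical values are hypothetical, and consistent units are assumed (here N, MPa and mm²).

```python
def spandrel_axial_force(tensile_capacity, f_mh, a_sp, strut_coefficient=0.4):
    """Equivalent maximum axial force H_p transferred to the spandrel.

    tensile_capacity: maximum tensile strength of the coupled element (N)
    f_mh: masonry compressive strength in the horizontal direction (MPa)
    a_sp: cross-sectional area of the spandrel (mm^2)
    """
    strut_limit = strut_coefficient * f_mh * a_sp
    return min(tensile_capacity, strut_limit)


# Hypothetical example: steel tie rod (314 mm^2, 235 MPa yield) coupled to
# a spandrel with a 400 mm x 1000 mm cross-section and f_mh = 1.0 MPa
tie_rod_capacity = 314.0 * 235.0
print(spandrel_axial_force(tie_rod_capacity, f_mh=1.0, a_sp=400.0 * 1000.0))
```

In this hypothetical case the tie rod governs; with a much stronger coupled element (for instance a heavily reinforced r.c. tie beam) the limit 0.4 f_mh A_sp would govern instead.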
Regarding FM10, the main issue is whether the rigid body motion is subtracted from the angular deformation demand on the pier. Half of the SWs compute the angular deformation (sometimes called "drift"). In the other cases, different approaches are adopted for the drift computation, that is: the simple ratio between the difference of the horizontal displacements at the end sections and the panel height; the chord rotation; its equivalence with the plastic component of the rotation. Indeed, this is a quite intricate issue, since most codes recommend thresholds for the "drift" of a wall element to check the attainment of collapse without clarifying the criteria to compute it. Since these criteria are then autonomously defined by the software packages without giving users the possibility to change them, the result is a potentially high scatter of the ultimate displacement capacity on pushover curves (as highlighted in all the BSs examined in the project and documented in Manzini et al. 2021), which usually decreases only for specific idealized conditions (e.g. in the shear-type idealization, see also Cattari et al. 2021).
Finally, Fig. 16 compares the hypotheses assumed for the modelling of r.c. tie beams (FM11), which concern: the constitutive law adopted (linear or nonlinear); the effective length. All the SWs considered in the "URM nonlinear modelling-Benchmark project" allow these elements to be modelled as nonlinear, in most cases by assuming nonlinear beams with lumped plasticity and in a few of them with a fiber approach. In 2 out of 6 SWs, a nonlinear behavior is implicitly adopted by the software package when comprehensive data on the reinforcement are provided by the user. In most cases (5 out of 6), the element consists of an intermediate deformable effective length with nonlinear behaviour and rigid end segments, and the definition of the effective length is editable by the user; only in one case does the SW adopt a length equal to the node-to-node distance (no rigid ends) without allowing any modification by the analyst.

Fig. 16 Overview on the options/modelling assumptions adopted by the software packages based on the equivalent frame approach used in the project: r.c. tie beams modelling
FM11 - r.c. tie beams (constitutive law): A: nonlinear constitutive law (or linear) depending on whether the reinforcement properties are (or are not) assigned as input by the user; B: nonlinear constitutive laws (with concentrated or distributed plasticity)
FM11 - r.c. tie beams (Leff): A: node-to-node length by default, editable; B: length equal to the opening by default, editable; C: node-to-node length by default, NOT editable; D: manual

Shared modelling criteria adopted for analysing the proposed BSs
In order to reduce the influence of both the different characteristics of the software and the arbitrariness of analysts in the definition of the models, in the first phase of the "URM nonlinear modelling-Benchmark project" the involved RTs undertook the modelling process by adopting, when possible, the same common assumptions, so as to minimize the scatter of results and allow an easier and clearer interpretation of the differences obtained using the various SWs. The overview provided in Sect. 4 showed that the software packages offer a certain flexibility in the alternative options they manage, which made it possible to proceed in that way. More specifically, in the "URM nonlinear modelling-Benchmark project" the following shared modelling criteria have been adopted:
• Good wall-to-wall connection.
• Same gravity loads (and shared criteria for their distribution among walls).
• Rigid diaphragms (as regards membrane action).
• Same mechanical parameters for masonry; this implied a calibration process for establishing a consistency between more refined and equivalent frame models, as exemplified in D'Altri et al. (2021).
• Same geometry and mechanical parameters for structural elements coupled to spandrels, if present.

Moreover, in the case of equivalent frame models:
• The same criteria for the idealisation of URM walls in equivalent frames have been adopted by assigning the same geometry for piers and spandrels (as clarified in the Annex I-Benchmark Structures Input Data).
• When possible, the contribution in terms of stiffness and strength associated with the out-of-plane response of the walls was neglected.
Obviously, different hypotheses can be adopted by third parties, but it is important to be aware of those adopted by the RTs when a comparison is carried out with the results presented in Occhipinti et al. (2021) for BS3, Manzini et al. (2021) and Cannizzaro et al. (2021) for BS4, Ottonelli et al. (2021) and Castellazzi et al. (2021) for BS5, and Degli Abbati et al. (2021) for BS6.

Standardized criteria and anonymous format adopted for comparing the results
Given the complexity of the nonlinear analysis of URM structures, a unique and rigorous analytical (reference) solution cannot be evaluated. It is still possible, however, to adopt, on the one hand, analytical tools aimed at avoiding gross errors resulting from a wrong input of the model parameters into the software and, on the other hand, a methodological approach for the critical interpretation of the obtained results. As for the analytical tools, fairly simple and definitely useful controls are, for example, the check of the consistency of the total mass of the structure, easily computable from the input data, and the check of the equilibrium between the gravity loads and the vertical reactions at the base of the structure. Regarding the methodological aspect, the execution of numerical analyses on various benchmark case studies through various software packages by the RTs involved in the project, together with the critical and qualified verification of the output data, has made it possible to estimate the expected variation intervals of the results in terms of significant parameters of the structural response (referred to below as SRPs, Structural Response Parameters). These could be adopted by professional engineers and other researchers as a reference in the critical evaluation of the results obtained by reproducing the proposed examples with a different software package. In general, for the BSs proposed in the project, the analytical solution of the problem under the hypothesis of "shear-type" behaviour has been evaluated with the aim of providing an upper bound for the actual solution in terms of stiffness and strength. The simplified assumption of restrained nodal rotations at each story makes the solution independent of the interaction between masonry piers and spandrels: in this case, in fact, the static scheme of the panels is known a priori (with the point of contraflexure at half the effective height of the element).
Although in the case of complex 3D structures (such as BS4, BS5 and BS6) the application of such a simplified approach can still be demanding and require conventional assumptions on the distribution of forces among walls, it may constitute a useful reference. In Manzini et al. (2021) such a calculation is exemplified in the case of BS4.
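As an illustration of the stiffness side of the shear-type check, the sketch below computes the elastic lateral stiffness of a pier with both ends restrained against rotation and sums the piers of one story. It is only a sketch under stated assumptions, not the procedure used in the project: a Timoshenko beam with rectangular cross-section and shear factor 1.2 is assumed, and all names and numerical values are hypothetical.

```python
def pier_stiffness_fixed_fixed(e_mod, g_mod, length, thickness, height, shear_factor=1.2):
    """Elastic lateral stiffness of a pier with rotations restrained at both ends.

    Flexural and shear flexibilities are summed (Timoshenko beam, rectangular section).
    Units must be consistent, e.g. MPa and mm give N/mm.
    """
    area = length * thickness
    inertia = thickness * length ** 3 / 12.0
    flexural_flexibility = height ** 3 / (12.0 * e_mod * inertia)
    shear_flexibility = shear_factor * height / (g_mod * area)
    return 1.0 / (flexural_flexibility + shear_flexibility)


def story_stiffness(piers, e_mod, g_mod):
    """Sum of the pier stiffnesses of one story (piers act in parallel under rigid floors).

    piers: iterable of (length, thickness, effective_height) tuples.
    """
    return sum(pier_stiffness_fixed_fixed(e_mod, g_mod, l, t, h) for l, t, h in piers)


# Hypothetical story with three piers (dimensions in mm, moduli in MPa)
piers = [(1500.0, 300.0, 2000.0), (2000.0, 300.0, 2000.0), (1500.0, 300.0, 2000.0)]
print(story_stiffness(piers, e_mod=1500.0, g_mod=500.0))
```

The strength side of the same check would follow analogously, by summing the pier strengths evaluated with the adopted criteria and the known static scheme.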
Regarding the SRPs adopted in the comparison of the numerical results, some of them refer to the global response in terms of capacity curves both of the whole structure and of each one of the structural walls that compose it, for the more complex BSs; then, additional more detailed checks refer to local quantities associated to single structural elements. A complete list of SRPs is illustrated in Sect. 6.1.
Moreover, since the aim of the research was not to express a judgment on the reliability of any specific software package adopted by the RTs involved in the project, results were represented in anonymous form by assigning a random colour and a random tag to each software package (those belonging to the EF approach are labelled from SW1 to SW7, while SW8 and beyond have been assigned to the other, more refined approaches).

Synthetic structural parameters of the nonlinear response
The SRPs adopted for the systematic comparison of the results achieved on the set of BSs are, in detail:
• The axial force distribution in the masonry piers at the base of the structure after the application of the gravity loads to the model;
• The variation of the axial force at the base of the masonry piers at the ground level when the seismic forces are applied (starting from BS2);
• The global pushover curve of the structural system and those of the single walls that compose it (in the case of 3D BSs); the average displacement of the nodes located on the top level, weighted by their associated mass, has been assumed as reference;
• The parameters describing the equivalent bi-linear curve (K_s, d_u and V_y) which can be associated with the global capacity curves.
Concerning this last point, among the possible choices (ASCE/SEI 41 2017; EC8-3 2005; MIT 2019), the following criteria have been adopted (see Fig. 17a):
• The equivalent stiffness, K_s, is evaluated at a base shear level equal to 0.7 times the maximum value.
• The ultimate displacement, d_u, corresponds to an overall base shear post-peak decay equal to 0.2 times the maximum value.
• The equivalent yield base shear, V_y, is evaluated by imposing the equivalence of the areas under the capacity curve and the equivalent bi-linear curve up to d_u.
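The three criteria above can be sketched in code as follows. This is a minimal illustration, not the implementation used in the project: it assumes a pushover curve given as two lists starting from the origin, linear interpolation between points, K_s taken as the secant stiffness at 0.7 times the maximum base shear, and an elastic-perfectly plastic bi-linear curve; the function name and the sample curve are hypothetical.

```python
def bilinearize(d, v, stiffness_level=0.7, decay=0.2):
    """Return (K_s, d_u, V_y) of the equivalent elastic-perfectly plastic curve."""
    v_max = max(v)
    if v_max <= 0.0 or d[0] != 0.0 or v[0] != 0.0:
        raise ValueError("the curve must start at the origin and reach a positive base shear")
    i_peak = v.index(v_max)

    # K_s: secant stiffness at the first crossing of 0.7 V_max on the rising branch
    target = stiffness_level * v_max
    i = next(j for j in range(1, i_peak + 1) if v[j] >= target)
    d_target = d[i - 1] + (target - v[i - 1]) / (v[i] - v[i - 1]) * (d[i] - d[i - 1])
    k_s = target / d_target

    # d_u: first post-peak point where the base shear has decayed by 0.2 V_max
    # (if the decay is never reached, the last point of the curve is used)
    limit = (1.0 - decay) * v_max
    d_cut, v_cut = list(d), list(v)
    for j in range(i_peak + 1, len(v)):
        if v[j] <= limit:
            frac = (v[j - 1] - limit) / (v[j - 1] - v[j])
            d_cut = list(d[:j]) + [d[j - 1] + frac * (d[j] - d[j - 1])]
            v_cut = list(v[:j]) + [limit]
            break
    d_u = d_cut[-1]

    # Area under the capacity curve up to d_u (trapezoidal rule)
    area = sum(0.5 * (v_cut[j] + v_cut[j - 1]) * (d_cut[j] - d_cut[j - 1])
               for j in range(1, len(d_cut)))

    # V_y from area equivalence: V_y * d_u - V_y**2 / (2 * K_s) = area
    discriminant = d_u ** 2 - 2.0 * area / k_s
    if discriminant < 0.0:
        raise ValueError("no bi-linear curve with this K_s matches the area under the curve")
    v_y = k_s * (d_u - discriminant ** 0.5)
    return k_s, d_u, v_y


# Hypothetical pushover curve (displacements in mm, base shear in kN)
d = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0, 14.0]
v = [0.0, 150.0, 250.0, 320.0, 340.0, 300.0, 250.0]
print(bilinearize(d, v))
```

Note that the choice of K_s and of d_u both affect V_y through the area equivalence, which is one reason why the three parameters are compared together.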
The evaluation of the ultimate displacement d u may turn out more challenging in the case of more refined models when a slowly progressing softening phase is assessed (as discussed more in detail in Cattari et al. 2021). As a consequence, this condition has been alternatively identified by considering the point in which a single pier or a sub-set of piers (such to activate a soft-story mechanism in a wall/level) have attained given drift thresholds. This condition is evaluated ex-post from the analyses by assuming as reference the target sections corresponding to the effective height in the corresponding equivalent frame models. To this aim, also the drift thresholds are consistent to those adopted in the EF models to define the collapse condition of piers. See Castellazzi et al. (2021) for an exemplification of such a procedure.
Moreover, as far as the dynamic properties are concerned, for the complex 3D BSs a systematic comparison of periods, mode shapes and participating masses has been carried out (see Ottonelli et al. 2021 for BS5 and Degli Abbati et al. 2021 for BS6).
For the scalar SRPs, the ratio between the value obtained from each software and the corresponding average value, calculated by considering the results provided by all the software packages belonging to the same modelling approach, was evaluated (see Fig. 17b/c).
This last type of representation constitutes one of the synthetic ways adopted in the project to establish the reference values expected on the scattering of results.
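The normalisation of a scalar SRP with respect to the average of the same modelling approach can be sketched as below. The data structure, the approach labels and the numerical values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def normalise_by_approach(results):
    """Ratio between each software result and the average of its modelling approach.

    results: dict mapping a software tag to a tuple (approach, value).
    """
    values_by_approach = {}
    for approach, value in results.values():
        values_by_approach.setdefault(approach, []).append(value)
    averages = {approach: mean(values) for approach, values in values_by_approach.items()}
    return {tag: value / averages[approach] for tag, (approach, value) in results.items()}


# Hypothetical yield base shear values V_y (kN) from anonymised software packages
v_y_results = {
    "SW1": ("EF", 310.0),
    "SW2": ("EF", 290.0),
    "SW3": ("EF", 345.0),
    "SW8": ("refined", 330.0),
    "SW9": ("refined", 360.0),
}
for tag, ratio in normalise_by_approach(v_y_results).items():
    print(tag, round(ratio, 3))
```

A ratio of 1 means that the software result coincides with the average of its group; the spread of the ratios around 1 gives a direct reading of the scatter.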
The conversion of the pushover curve into an equivalent bilinear curve constitutes one of the preliminary steps required by most of the nonlinear static procedures proposed in the literature for computing the expected seismic demand according to the performance-based assessment (PBA). While Marino et al. (2019) give an overview of the reliability of the various PBA procedures available when applied to URM buildings, in the "URM nonlinear modelling-Benchmark project" conventional reference was made to the use of the N2 method (as originally proposed in Fajfar 2000; recommended in EC8-3 2005). That allowed a further SRP to be computed, consisting in the peak ground acceleration compatible with the attainment of specific performance levels of the BS under examination (PGA_PL), e.g. the attainment of the yielding base shear or that of the ultimate displacement capacity.

Fig. 17 Exemplification of the procedure adopted for the: a conversion of the pushover curve into an equivalent bilinear curve; b computation of the average values of the SRPs (K_s, V_y, d_u); c computation, for each software, of the scatter with respect to the average values of the SRPs

Simulated damage pattern
In this paragraph, the criteria and representations adopted to compare the results of the pushover analyses in terms of damage pattern are illustrated. The damage is compared at given points of the pushover curves associated with consistent states of the structure simulated by the SWs (namely, after the application of gravity loads, at 0.5 times the maximum base shear, at the maximum base shear or at the ultimate displacement capacity).
Actually, a standardized comparison among the predictions of the SWs has been systematically carried out only in the case of equivalent frame models. In fact, while for these models the attribution of a specific failure mode to each element (i.e. whether it is associated with a prevailing flexural or shear response) is in most cases straightforward, for more refined models it becomes more conventional and difficult. More precisely, in equivalent frame models that idealize masonry panels as nonlinear beams with lumped plasticity, the activation of one or the other failure mode is usually attributed on the basis of the minimum value provided by the corresponding analytical strength criteria, without defining hybrid mechanisms (which very few SWs introduce). As an example, Fig. 18a shows the strength domain obtained for BS1_S1/M1/BC2 by assuming the criteria proposed in Turnsek and Sheppard (1980) and in the Italian Technical Code (NTC 2018) for interpreting the diagonal cracking shear failure mode and the flexural response, respectively. For the flexural strength criterion, two alternative options are plotted, both based on an equivalent rectangular stress block of normal stresses of height k·f_m, where f_m is the compressive strength of masonry and k is the stress-block equivalence coefficient: one with k = 0.85 and one with k = 1. The same figure also shows the point-by-point numerical simulation of the strength domain obtained by a continuum model calibrated to be consistent with the mechanical parameters adopted in the equivalent frame model (as discussed in more detail in D'Altri et al. 2021); the numerical results refer to the maximum base shear achieved in the nonlinear analyses carried out with the refined model. Figure 18b instead shows the damage pattern simulated by the continuum model after the attainment of the maximum base shear and once the failure mode has been completely activated.

Fig. 18 a Analytical strength criteria adopted for interpreting the actual response of BS1_S1/M1/BC2 and points obtained by the numerical simulation through a continuum refined model; b damage pattern simulated by the refined model for increasing axial loads applied on the top (adapted from D'Altri et al. 2021)

It is observed that (Fig. 18b):
• When the value of the applied axial load ratio is low (σ/f_c = 3%), the response of the panel is mainly flexural and characterized by rocking, with an evident partialization of the end sections;
• When an intermediate applied axial load is considered (σ/f_c = 12%), the behaviour observed in the simulation can be described as mixed (or hybrid), since both the partialization of the end sections and the development of a diagonal crack are observed at failure;
• Finally, for a higher value of the applied axial load (σ/f_c = 30%), the response of the panel is dominated by shear: the partialization of the end sections is negligible with respect to the previous case and the failure of the panel is caused by the propagation of a typical shear crack starting from the centre of the panel.
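The minimum-strength attribution of the failure mode used by lumped-plasticity equivalent frame models can be illustrated with the sketch below. It assumes the commonly quoted forms of the two criteria: the stress-block flexural strength M_u = (l² t σ0 / 2)(1 − σ0 / (k f_m)) and the Turnsek and Sheppard diagonal cracking strength V = l t (1.5 τ0 / b) √(1 + σ0 / (1.5 τ0)), with b = h/l limited between 1 and 1.5. The panel geometry, the material parameters and the function names are hypothetical and do not reproduce the BS1 input data or any specific software implementation.

```python
import math


def v_flexure(l, t, h0, sigma0, f_m, k=0.85):
    """Shear at the flexural limit, V = M_u / h0, with
    M_u = (l^2 * t * sigma0 / 2) * (1 - sigma0 / (k * f_m))."""
    m_u = 0.5 * l**2 * t * sigma0 * (1.0 - sigma0 / (k * f_m))
    return m_u / h0


def v_diagonal(l, t, h, sigma0, tau0):
    """Diagonal cracking strength,
    V = l * t * (1.5 * tau0 / b) * sqrt(1 + sigma0 / (1.5 * tau0)),
    with b = h / l limited to the range [1, 1.5]."""
    b = min(max(h / l, 1.0), 1.5)
    return l * t * 1.5 * tau0 / b * math.sqrt(1.0 + sigma0 / (1.5 * tau0))


def failure_mode(l, t, h, h0, sigma0, f_m, tau0, k=0.85):
    """Attribute the failure mode from the minimum of the two strengths.
    No hybrid mechanism is defined: the result is either 'F' or 'DC'."""
    vf = v_flexure(l, t, h0, sigma0, f_m, k)
    vs = v_diagonal(l, t, h, sigma0, tau0)
    return ("F", vf) if vf <= vs else ("DC", vs)


# Hypothetical panel (units: N, mm, MPa); NOT the BS1 input data.
L, T, H = 1000.0, 250.0, 2000.0
H0 = H / 2.0           # assumed fixed-fixed boundary condition
F_M, TAU0 = 6.0, 0.1   # assumed compressive and shear strength parameters

modes = [
    failure_mode(L, T, H, H0, ratio * F_M, F_M, TAU0)[0]
    for ratio in (0.03, 0.12, 0.30)
]
```

With these assumed values the attribution switches from flexure to diagonal cracking between the low and the intermediate axial load ratio, which shows why a small change in axial load near the intersection of the two criteria can change the predicted failure mode.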
It is evident that, for intermediate values of the axial load, attributing a univocal failure mode is more conventional, as various experimental campaigns have also shown (Vanin et al. 2017). In fact, the strength domain of the panel shows that the applied axial load falls in a transition zone between the prevalence of flexural failure and that of shear failure. As discussed in Manzini et al. (2021), when the axial load acting on the panel is in this transition region, more discrepancies are likely to occur also in the predictions made by equivalent frame models; in fact, slight differences in the variation of axial load (e.g. produced by a different way of implementing the flange effect or by other redistributions of actions carried out by diaphragms) may directly determine whether the panel lies to the left or to the right of the point at which the two analytical strength criteria provide the same value. This can be easily verified ex post by analysts. Additional difficulties in associating the damage simulated by refined models with a single failure mode arise from the fact that, usually, information from various damage variables has to be integrated (as also depicted in Fig. 18b). This is why, in the case of refined models, the comparison has been made only in qualitative and general terms (i.e. verifying whether the most severe damage is concentrated in the same main panels, whether piers or spandrels, paying attention also to their position, checking whether diagonal cracking is activated or not, …).
Conversely, in the case of equivalent frame models, two comparisons have been adopted: (I) one able to exhaustively show the damage localization in each element and to interpret the global failure mode activated at the scale of each wall; (II) another aimed at providing, in an aggregate way, a synthetic overview of the consistency of the simulated damage across the SWs. The second has been adopted in particular for the more complex BSs (BS5 and BS6), where many walls are present. In both cases, the anonymous format of comparisons has been adopted. Figure 19 depicts an example of type I comparison in the case of BS3_C1, varying the configuration of structural details from A to C and adopting seven SWs.
As the figure clarifies, this type of comparison makes it possible to verify effectively, on the one hand, the general agreement among the SWs and, on the other, the expected change in the response. In fact, the response passes from a prevailing flexural failure mode in piers concentrated at ground level in the A configuration to a shear damage of spandrels and a more widespread damage of piers along the wall height in the C configuration. Moreover, Fig. 19 also reports the legend adopted for the failure modes, where: the tags -P and -C stand for plastic (i.e. after the yielding condition) and collapse (i.e. after the attainment of the ultimate drift) conditions of the element; the tags E and T refer to the elastic and in-tension states; the tags F, DC and C refer to the activation of a prevailing flexural, diagonal shear and purely compressive response, respectively.
Finally, Fig. 20 shows an example of type II comparison for the same BS. In this case, for each element, the number of SWs associated with the same prediction in terms of failure mode is counted. Obviously, in the case of perfect agreement among the SWs, the number on the ordinate axis exactly corresponds to the number of software packages.
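The counting behind the type II comparison can be sketched as follows. The element names and the predicted failure modes are hypothetical; the tags simply follow the legend of Fig. 19 (e.g. F-P for a flexural response in the plastic phase).

```python
from collections import Counter
from typing import Dict, List


def agreement_counts(predictions: Dict[str, List[str]]) -> Dict[str, Counter]:
    """For each element, count how many SWs predict each failure mode."""
    return {element: Counter(modes) for element, modes in predictions.items()}


# Hypothetical predictions of four anonymous SWs for three elements.
predictions = {
    "pier_1": ["F-P", "F-P", "F-P", "DC-P"],
    "pier_2": ["DC-C", "DC-C", "DC-C", "DC-C"],
    "spandrel_1": ["E", "DC-P", "DC-P", "F-P"],
}

counts = agreement_counts(predictions)
# Largest number of SWs sharing the same prediction, element by element.
best = {element: c.most_common(1)[0][1] for element, c in counts.items()}
```

An element for which `best` equals the number of SWs (here `pier_2`) corresponds to the perfect-agreement case mentioned above.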

Final remarks and future developments
This paper intends to give a global introduction to the research activity defined as "URM nonlinear modelling-Benchmark project", carried out since 2014 by several Italian universities involved in the Italian Network of Seismic Laboratories (ReLUIS) projects. The main aim of this research was to benchmark software packages, used also in engineering practice, to model and assess the seismic response of URM buildings. With this aim, particular attention was devoted to the use of nonlinear static analyses, of widespread use in the context of performance-based assessment, and only to the global response dominated by the in-plane action of walls, by assuming a box-type behaviour of the examined 3D structures. The motivation behind this research is the evidence of the large scatter of results produced by the variety of different possible choices that analysts have to face preliminarily in the modelling and assessment of existing structures, as also emerged from other benchmarking experiences available in the literature.

Fig. 19 Damage comparison of type I among SWs belonging to the equivalent frame approach. Results of nonlinear static analyses performed on BS3_C1 passing from A to C configuration. In the legend: the tags -P and -C stand for plastic and collapse conditions of the element; the tags E and T refer to the elastic and in-tension states; the tags F, DC and C refer to the activation of a prevailing flexural, diagonal shear and purely compressive response, respectively
In particular, the paper presents an overview of the designed benchmark structures, the rationale behind the choices made and the standardized criteria adopted to compare the results obtained with several common modelling approaches and software package tools in terms of pushover curves, damage pattern and local parameters. So far, six benchmark structures of increasing complexity have been designed, ranging from the single panel to 3D buildings inspired by real existing URM structures. All of them are specifically conceived to investigate some critical aspects related to the modelling of masonry buildings. The paper provides a critical overview of the different modelling options offered by software based on the equivalent frame approach, by comparing the alternative ways in which the software packages implement the many different choices that can be made by analysts. Although not exhaustive, the set of analysed SWs reflects the tools currently available to professionals and researchers in Italy and at the international level.
The results achieved on each specific BS are discussed in more detail in the other papers that complete the special issue to which this paper belongs. They aim to provide a useful reference to the engineering and scientific community. Moreover, since all the data needed by other analysts to replicate the BSs are provided in this paper as supplementary material, in the future the results could be corroborated by other software packages and possibly by modelling strategies other than those used by the research teams involved in the project.

Fig. 20 Damage comparison of type II among SWs belonging to the equivalent frame approach. Results of nonlinear static analyses performed on BS3_C1 passing from A to C configuration. Results are differentiated for piers and spandrels
The future steps of this activity, which is still in progress, will aim to extend the set of benchmark structures, the number and typology of the software considered and the modelling issues being studied, i.e.: the role of stiff/flexible diaphragms (until now assumed as rigid); further aspects of the modelling of spandrels, both through refined models and through equivalent frame models, by exploring additional strength criteria able to account more accurately for the interaction with the architrave element and the adjacent masonry portion; the possible activation of out-of-plane mechanisms (until now assumed to be prevented).

Supplementary information
The online version contains supplementary material available at https://doi.org/10.1007/s10518-021-01078-0.