HiggsSignals-2: Probing new physics with precision Higgs measurements in the LHC 13 TeV era

The program HiggsSignals confronts the predictions of models with arbitrary Higgs sectors with the available Higgs signal rate and mass measurements, resulting in a likelihood estimate. A new version of the program, HiggsSignals-2, is presented that contains various improvements in its functionality and applicability. In particular, the new features comprise improvements in the theoretical input framework and the handling of possible complexities of beyond-the-SM Higgs sectors, as well as the incorporation of experimental results in the form of Simplified Template Cross Section (STXS) measurements. The new functionalities are explained, and a thorough discussion of the possible statistical interpretations of the HiggsSignals results is provided. The performance of HiggsSignals is illustrated for some example analyses. In this context the importance of public information on certain experimental details like efficiencies and uncertainty correlations is pointed out. HiggsSignals is continuously updated to the latest experimental results and can be obtained at https://www.gitlab.com/higgsbounds/higgssignals .


Introduction
Elucidating the mechanism that controls electroweak symmetry breaking (EWSB) is one of the main goals of the LHC. The spectacular discovery of a Higgs boson with a mass around 125 GeV by the ATLAS and CMS experiments [1,2] marks a milestone of an effort that has been ongoing for almost half a century and has opened a new era of particle physics. In the eight years since the discovery of the new particle at the LHC, its mass has been measured with a few-per-mil accuracy, M obs H = 125.09 ± 0.24 GeV [3]. 1 The measured properties are, within current experimental and theoretical uncertainties, in agreement with the predictions of the Standard Model (SM) [4]. Together with the limits on beyond-the-SM (BSM) particles that were obtained at the LHC with center-of-mass energies of up to 13 TeV and elsewhere, the requirement that the particle spectrum should include an essentially SM-like Higgs boson at about 125 GeV imposes important constraints on the parameter space of possible extensions of the SM.
In order to test the predictions of BSM models with arbitrary Higgs sectors consistently against all the available experimental data, on the one hand the compatibility with existing BSM Higgs-boson searches has to be checked. This can be done with the public tool HiggsBounds [5][6][7][8][9]. On the other hand, any BSM model should furthermore be tested against the measured mass and rates of the observed scalar state. Confronting the predictions of an arbitrary Higgs sector with the observed Higgs signal 2 (and potentially with other, future, signals of additional Higgs states) is the purpose of the public computer program HiggsSignals [10]. Here we present the new version HiggsSignals-2 and update the description of the program w.r.t. version 1.0 as presented previously in Ref. [10].
HiggsSignals evaluates a χ 2 measure to provide a quantitative answer to the statistical question of how compatible the Higgs data (in particular, measured signal rates and masses) is with the model predictions. The χ 2 evaluation can be performed with various methods [10]. Among those the peak-centered χ 2 method (see Section 2) is the default method implemented in HiggsSignals. In this χ 2 method the (neutral) Higgs signal rates and masses predicted by the model are tested against the various signal rate measurements published by the experimental collaborations for a fixed hypothetical Higgs mass. This hypothetical Higgs mass is typically motivated by the signal peak seen in the channels with high mass resolution, i.e. the searches for H → γγ and H → ZZ ( * ) → 4 . In this way, the model is tested at the observed peak's mass position.
With theoretical input provided by the user in form of Higgs masses, production cross sections, and decay rates in the same format as used in HiggsBounds, the code HiggsSignals evaluates the corresponding χ 2 to test the compatibility of any BSM model with the full set of experimental Higgs-boson data. The experimental data from LHC (and -if needed -Tevatron) Higgs analyses is provided with the program, so there is no need for the user to include these values manually. However, it is possible for the user to modify or add to the dataset at will.
The usefulness of a public tool such as HiggsSignals 3 has become apparent in the past years, given the intense work by theorists to use the latest Higgs measurements as constraints on the SM and theories for new physics. The χ 2 output of HiggsSignals can directly be used as input to global fits. Some example applications can be found in Refs. [14][15][16][17][18][19][20][21]. Performance tests of the HiggsSignals implementation of experimental results within selected benchmark models have occasionally revealed shortcomings in the presentation of the publicly available experimental data. These have been reported to the experimental collaborations, which -in a joint effort between experiment and theory -have often led to an improved usability of the experimental results.
We begin Section 2 with a short overview of the relevant changes made to the shared Higgs-Bounds and HiggsSignals theoretical input framework. We then discuss the calculation of the χ 2 measure in HiggsSignals including a review of the peak-centered χ 2 method. We also describe in this context the χ 2 contributions arising from the LHC Run-1 ATLAS and CMS Higgs combination and the newly added simplified template cross section (STXS) observables. Furthermore, we discuss the handling of potential issues that can arise in BSM Higgs sectors, in particular effects from non-SM-like kinematical properties of the Higgs candidate, theoretical uncertainties of the mass predictions, and the possibility of overlapping signals from multiple Higgs bosons. We provide a short overview of the technical user operation instructions in Section 3 and refer to the online documentation [22] for details. We then discuss the options for exploiting the χ 2 result returned by HiggsSignals for several different applications in BSM model testing. In Section 4 we present detailed performance test of the HiggsSignals implementations for several analyses in comparison with official results from the experimental collaborations. In this context we point out the importance of sub-channel and correlation information that should be provided by the experimental collaborations in order to allow an accurate reinterpretation of their results. We close the section with a list of recommendations for the publication of Higgs signal measurements. We conclude in Section 5. The Appendix includes details on the implementation and the format of STXS observables as well as a listing of all currently included experimental measurements.  Table 1.: Production and decay mode identifiers, p and d, for the various Higgs production and decay processes included in HiggsSignals-2.
channel ID (c) collider process Table 2.: Examples for processes encoded by the HiggsSignals channel IDs.
Ref. [9]), which allows treating cases where the narrow width approximation is not applicable (as e.g. relevant in the case of a hypothetical or future signal in the high mass range).

The χ 2 calculation and individual contributions
In the previous version, HiggsSignals-1, three different run modes were supported: (1) the peak-centered χ 2 method which uses the signal strength measurements,μ, performed for one specific Higgs boson mass value,m (e.g. atm = 125 GeV); (2) the mass-centered χ 2 method which uses the signal strength measured as a function of a hypothesized Higgs boson mass as experimental input, thus enabling a χ 2 -test using the signal strength measured at the model-predicted Higgs boson mass, m; (3) a combination of both methods. Furthermore, an additional, separate χ 2 contribution arises from the comparison of the mass measurements with the model-predicted Higgs boson mass.
However, soon after the Higgs boson signal was established at a Higgs mass 125 GeV, the LHC experiments ceased to provide signal strength measurements for an extended Higgs boson mass interval. Without updated experimental input, the mass-centered χ 2 method quickly became irrelevant, and the peak-centered χ 2 method became the main run mode of HiggsSignals. Hence, the current version HiggsSignals-2 only uses the peak-centered χ 2 method while the other run modes have been removed. Yet, different contributions to the total χ 2 arise according to the experimental input the user chooses to apply. The different types of χ 2 contributions are listed in Table 3.
The χ 2 calculation using the so-called peak observables (first row in Table 3) closely follows the peak-centered χ 2 method employed in HiggsSignals-1. A brief review and a description of relevant modifications will be given in Section 2.3. The experimental input to this calculation is specified by the user by selecting an "observable set" which may contain measurements performed by the Tevatron experiments, as well as by the LHC experiments at center-of-mass energies of 7, 8 and 13 TeV. Predefined observable sets are provided with the HiggsSignals program, but the users can also create their own observable sets with selected observables. Simplified Template Cross Section (STXS) observables Table 3.: Individual χ 2 contributions in HiggsSignals-2. The total χ 2 (left column) is defined as the sum of the χ 2 contribution from the rate measurements (second column) and the χ 2 contribution from the mass measurements (third column).
We stress, however, that the user needs to avoid a statistical overlap with other observables contributing to the total χ 2 (see below ) when compiling an observable set that differs from the ones that are predefined in HiggsSignals.
After the completed LHC runs at 7 and 8 TeV the two experiments ATLAS and CMS released combined measurements of the signal rates [4] and the Higgs boson mass [3]. The signal rates were presented as unfolded inclusive measurements in 20 pure channels, i.e. one production process combined with one decay process (see Tab. 8 and Fig. 7 of Ref. [4]), accompanied with a 20 × 20 correlation matrix. The Higgs boson mass was determined tom LHC Run-1 = 125.09 ± 0.21(stat.) ± 0.11(syst.) GeV [3]. These experimental results give rise to a separate χ 2 contribution in HiggsSignals-2 (second row in Table 3); more details will be given in Section 2.4.
Another experimental input format that became available during Run-2 of the LHC are the simplified template cross section (STXS) measurements [23] (third row in Table 3). Higgs-Signals-2 features a new Fortran module to handle this input. While the treatment of these measurements in HiggsSignals is similar to the usual peak observables, the new module allows for a more versatile handling and additional features, as will be discussed in Section 2.5.
In the long run, the HiggsSignals framework handling the STXS observables will supersede the peak-centered χ 2 method also when the conventional (inclusive) signal strength measurements are used. In the meantime, STXS and peak observables can be used in parallel within one "observable set", each type giving rise to a separate χ 2 value which can be added if there is no statistical overlap in the corresponding measurements. As the Higgs mass measurements are always implemented in association with a signal rate measurement of the relevant experimental channel within the HiggsSignals program, the new STXS observables can also handle accompanying mass measurements.

The peak-centered χ 2 method in HiggsSignals-2
Within a given BSM model, the Higgs boson signal strength µ predicted for a specific bin or category of an experimental Higgs analysis is defined as where the sums run over all contributing channels i, which by definition consist of one Higgs production and one decay mode (see above). The numerator contains the experimentally observable signal rate predicted by the model, where i is the experimental signal efficiency 5,6 and [σ × BR] i is the model-predicted signal rate (i.e., in the narrow width approximation, the production cross section, σ, times the branching ratio, BR) for channel i. The denominator contains the corresponding quantities for a SM Higgs boson. Hence, the signal strength µ is a SM-normalized signal rate. The signal efficiencies in the model, i , can be different than those in the SM, SM,i , if the Higgs candidate has different kinematical properties, e.g., arising from higher-dimensional operators. Per default, we assume i = SM,i , but the user can directly set i if desired, see Section 2.6.1 for further details.
On the experimental side, in an analysis of the Higgs signal at 125 GeV, the measurement of the signal strength µ in a specific bin or category is performed by assuming SM Higgs properties for the observed particle, i = i,SM . The signal strength is determined by rescaling the SM-predicted signal rate for all involved channels i by a universal factorμ, The scale factorμ that best fits the observation is the central value of the measurement, which we shall denote asμ =μ| best-fit . Furthermore, the fitting procedure provides the upper and lower 68 % C.L. uncertainties, ∆μ up and ∆μ low , respectively.
In some analyses the measurements in experimental bins or categories are unfolded onto pure production and decay channels by fitting the corresponding signal strengths to all measurements simultaneously. This results in signal strength measurements of pure channels, however, with often significant correlations induced by the unfolding process. These types of measurements are special cases of the above definition, Eq. (1) (without a sum over i), and can 5 Experimentally, one distinguishes between the efficiency (related to the detector performance and object reconstruction) and signal acceptance A (i.e. the analysis-specific fraction of signal events passing the full signal selection). Here, the signal efficiency refers to the product of the experimental efficiency and signal acceptance. 6 Without loss of generality, the normalization of the signal rate to the SM expectation in Eq. (1) allows us to redefine all i and SM,i such that SM,1 ≡ 1. In this way, the i and SM,i describe the relative model and SM signal efficiency, respectively, of the channel i with respect to the first (i = 1) channel in the SM. be treated analogously in this method. However, it is then very important that a correlation matrix is provided by the experiments and properly included in the χ 2 calculation in HiggsSignals. 7 More details and discussions will be given in the following subsections and in Section 4.
As it will be useful in the following, we decompose µ, Eq. (1), as with relative efficiency modifiers ζ i ≡ i / SM,i , as well as the SM channel weights, ω SM,i , and individual channel signal strengths, c i , given by As explained in the HiggsSignals-1 documentation [10] the aim of the peak-centered χ 2 method is to perform a χ 2 test for the hypothesis that a local excess, "signal" (or "peak observable"), in the observed data at a specified mass is generated by the model. The experimental input for this method are the signal strength measurements performed at a specified mass,m, as well as measurements of the Higgs boson mass from the γγ and ZZ ( * ) → 4 final states.
Of course, ideally, the mass measurements and the specified mass of the signal strength measurements should at least approximately coincide (assuming the signals originate from the same Higgs boson). In this way, each signal strength measurementμ is intertwined with a specific Higgs mass value, which plays either the role of just a reference point or an actual measurement (that will contribute to the χ 2 test). Thus, a peak observablep is defined as onê µ measurement at a mass valuem (with mass resolution ∆m). Ifm ± ∆m is furthermore treated as a measurement, we call it a mass-sensitive peak observable. 8

The χ 2 calculation from the signal rates
We now briefly review the χ 2 evaluation in the peak-centered χ 2 method of HiggsSignals. As already mentioned in Section 2.2, Table 3, the total χ 2 value obtained in this method is composed of a χ 2 part from the signal strength observables and a χ 2 part from the Higgs mass 7 Even in the case of signal strength measurements in experimental bins or categories there are correlated systematic uncertainties which can be incorporated in HiggsSignals. 8 Observables handled via the new HiggsSignals-2 module for STXS measurements are treated analogously, i.e. among STXS observables we again distinguish between rate measurements at a hypothesized mass value and measurements of both the rate and mass (see Section 2.5). However, none of the STXS observables implemented in the current version HiggsSignals-2.6.0 includes a mass measurement. observables, χ 2 peak = χ 2 peak,µ + χ 2 peak,m . For N peak observables, the signal strength part is given by whereμ and µ are N -dimensional vectors of the measured and predicted signal strength, respectively. The signal strength covariance matrix C µ describes the signal rate uncertainties and incorporates correlations of the major uncertainties between the peak observables using publicly available information from the experimental analyses and theory predictions. For all peak observables, we include the correlation of the luminosity uncertainty (in % of the measured µ) for each experiment and center-of-mass energy, as well as the correlations of theoretical rate uncertainties. For the latter, we use the predictions from the LHC Higgs cross section work group (LHC HXSWG) Yellow report 4 [23] for the parametric and theoretical rate uncertainties of each production and decay process. Assuming a SM Higgs boson with mass 125.09 GeV, we construct relative covariance matrices for the production cross sections and branching ratios of the processes listed in Table 1, denoted as C SM σ and C SM BR , respectively. 9 For instance, correlations are induced by common error sources, e.g. the uncertainty in the strong coupling constant α s , or -for the branching fractions -through the division of the partial decay widths by the total decay width. As the theoretical rate uncertainties in the tested model can be different, we define analogous matrices, C model σ and C model BR , for the tested model. Per default, these are assumed to be the same as in the SM, but can be changed by the user. 10 In the construction of the signal strength covariance matrix C µ the theoretical rate uncertainties enter as where the notation is as follows (see also Ref. [10]): The index a (b) runs over the k α (k β ) contributing channels of the peak observablep α (p β ). The index mappings p(x) and d(x) project onto the production and decay mode identifier (cf.  9 We neglect correlations of theoretical/parametric uncertainties between production and decay rates. These correlations cannot be unambiguously reconstructed from the information given in Ref. [23]. Furthermore, these correlations are expected to be subdominant, at least for the main experimental channels. See also Ref. [24] for a discussion. 10 Instructions for how these matrices can be evaluated and incorporated in HiggsSignals are given in the online documentation [22]. i.e. the relative contribution of the channel i to the total signal rate, as predicted by the model. Through the multiplication with the channel weight and the signal strength modifier, the relative squared theory uncertainties encoded in C model σ and C model BR are converted into absolute squared signal strength uncertainties, as required for the covariance matrix. The theoretical rate uncertainties in the SM, which are typically already included in the signal strength measurement and therefore need to be subtracted from theμ uncertainty beforehand (see Ref. [10] for more details), are evaluated in a similar way using C SM σ and C SM BR .
In addition to the luminosity and theoretical rate uncertainties, other correlations between the peak observables can be included in HiggsSignals if the relevant information is available. This can be done by providing a corresponding correlation matrix within the observable set which is then loaded into HiggsSignals 11 . Ideally such matrices are published by the experiment along with the measurements. Indeed, as we will discuss in Section 4, including the correlation matrix in general drastically improves the accuracy of the χ 2 test, in particular if the µ measurements correspond to pure channels obtained after an unfolding process.

The χ 2 calculation from the Higgs mass
In the calculation of the χ 2 contribution from the Higgs boson mass measurements, χ 2 peak,m , HiggsSignals allows three different choices to model the probability density function (pdf) of the Higgs boson mass: (1 ) as a uniform (box) distribution; (2 ) as a Gaussian, or (3 ) as a box with Gaussian tails. For the Gaussian pdf, the theoretical mass uncertainty is treated as fully correlated among mass sensitive peak observables to which the same Higgs boson has been assigned.
If a Higgs boson h i with mass m i and theoretical mass uncertainty ∆m i is assigned to a masssensitive peak observablep α with mass measurementm α ± ∆m α , its χ 2 contribution is given by 12 for a uniform (box) pdf, and 11 Such correlation matrices are loaded automatically with the observable set during the initialization of the program. Each matrix should be given as three column data (index1, index2, correlation coefficient) in a separate file with extension .corr in the observable set directory. 12 We suppress in the following the subscript 'peak, m' for the individual χ 2 contributions from the peak observables.
for a box-shaped pdf with Gaussian tails. In both cases, the total χ 2 contribution from the Higgs mass measurements is given by χ 2 peak,m = M α χ 2 α , with α running over the M masssensitive peak observables. The parameter Λ appearing in Eq. (9) is the Higgs assignment range which will be discussed in detail in Section 2.6.3. Note that the uniform (box) pdf is a rather poor description of the Higgs mass pdf (in particular, of the experimental uncertainty) and is included mostly for illustrative purposes. In the (default) case of a Gaussian pdf, the total χ 2 contribution from the Higgs mass reads Here,m and m are M -dimensional vectors and contain the measured and predicted Higgs mass for the mass-sensitive peak observables, respectively. The diagonal elements of the Higgs mass covariance matrix, C m , contain the squared experimental uncertainty, (∆m α ) 2 , while the squared theory mass uncertainty, (∆m i ) 2 , enters all matrix elements (C m ) αβ (including the diagonal) where the Higgs boson h i is assigned to both peak observablesp α andp β .

The χ 2 contribution from the LHC Run-1 combination
In Ref. [4] the ATLAS and CMS collaborations have published a combined analysis of their datasets at √ s = 7 and 8 TeV center-of-mass energy. In its most general form, the results were presented as signal rate measurements for the five dominant production modes (pp → H, VBF, W H, ZH, ttH) times the five dominant decay modes (H → γγ, ZZ, W W, τ + τ − , bb). Five channels -for which the sensitivity for a measurement had not been reached in Run-1 -were omitted resulting in a total of 20 measured channels. The analysis was performed for a Higgs mass of m H = 125.09 GeV. The correlations of all theoretical uncertainties (assuming the SM Higgs boson) and experimental systematic uncertainties were provided in the form of a 20 × 20 correlation matrix. In addition to the signal rate measurements, the Higgs mass was determined to m H = 125.09 ± 0.21(stat.) ± 0.11(syst.) GeV from the combined ATLAS and CMS Run-1 data [3].
HiggsSignals incorporates the 20 Run-1 signal rate measurements (including their correlations) via a dedicated run routine (see online documentation [22]). Furthermore, a χ 2 contribution from the Higgs mass measurement, taken to be 125.09 ± 0.24 GeV in HiggsSignals, arises, if a Higgs boson in the tested model is assigned to these measurements. The resulting χ 2 contributions from the LHC Run-1 signal rates and mass measurements can then be added to other χ 2 values obtained by the HiggsSignals runs (e.g. using 13 TeV results).
As such, the ATLAS and CMS combined analysis represents the benchmark of √ s = 7 TeV and 8 TeV results. For the validation of the HiggsSignals methodology, reproducing these results from the signal strength measurements of the individual analyses at 7 and 8 TeV provides an excellent cross-check, as will be shown in Section 4.1.

Simplified Template Cross Section (STXS) measurements
During Run 2 of the LHC the experimental collaborations started to employ the STXS framework [23] for the presentation of Higgs rate measurements. These measurements comprise a different type of input observables in HiggsSignals, resulting in an additional χ 2 contribution (see Table 3) that is evaluated in a new module. Although the basic information given for STXS measurements is very similar to the conventional peak (or "µ") measurements, we choose to treat them separately in order to provide and enable a more flexible handling and additional features for the STXS observables. It should be noted that -although we denote the new module and the observables as STXS -any signal rate measurement can be handled via this framework, regardless whether this measurement fulfills the official definitions of an STXS bin. For instance, inclusive measurements or fiducial differential cross section measurements can be treated, as long as the necessary information is provided. Furthermore, if the peak-centered χ 2 method and the STXS method are simultaneously used, each measurement can only be implemented either as peak or STXS-observable in order to avoid double-counting.
The STXS framework was designed in order to (i) provide commonly-defined exclusive phasespace regions for Higgs production processes facilitating the incorporation of information on differential cross sections, (ii) maximize the measurement's sensitivity to the underlying physical process(es) while minimizing its dependence on theory assumptions, (iii) isolate possible BSM contributions by defining bins that have an enhanced sensitivity to BSM effects. By addressing these targets, the STXS framework has the goal to simplify the procedure of BSM model testing against the experimental data.
The defined exclusive regions of phase space, called "bins" for simplicity, are specific to the different Higgs production modes. With increasing amounts of data, measurements of differential distributions of the various Higgs processes become possible. Therefore, STXS bins have been defined for three stages ("stage 0", "stage 1", "stage 2", and substages thereof) to allow a transition from more inclusive to more differential measurements, see Ref. [23] for details. This transition can be performed independently for each production mode. In order to maximize the sensitivity of the current data, various decay modes can be combined in the determination of the STXS bins, assuming the SM Higgs boson as a kinematic template.
Within HiggsSignals each experimental measurement of an STXS bin enters the χ 2 calculation as an individual observable. It should be noted, however, that many of the STXS bins (in particular at Stage 0) do not represent "pure" Higgs signal channels but instead are comprised of different production channels (similar to the conventional "peak" observables, see Section 2.3). This is because the STXS definitions are driven by the particle level objects produced in association with the Higgs boson, whereas the signal channels within HiggsSignals are distinguished by the topology and Higgs coupling dependence. For instance, the two distinct production channels of vector boson fusion, pp → qqh, and Higgs-strahlung with a hadronically decaying vector boson, pp → V h (with V → qq), are considered together in one Stage-0 STXS bin, as they lead to the same final state particles. Only at a later stage the STXS framework aims to separate these processes into exclusive bins by employing dedicated cuts. Therefore, HiggsSignals in general treats the STXS observables as multi-channel measurements, and -if no further information is given -assumes that the combined processes have similar signal efficiency (using the SM Higgs boson as a kinematic template). Furthermore, the STXS measurements are implemented in HiggsSignals preferably as absolute rates, along with corresponding rates for the SM prediction as an important reference value (see below). This avoids the complication or peak observables that the implemented SM-normalized rates (i.e. signal strengths µ) have to be adjusted in case an updated SM prediction for the signal rate becomes available.
The STXS framework provides a smooth transition from more inclusive to more exclusive, i.e. differential, Higgs rate measurements, as more data is collected. In HiggsSignals however, the standard user input -handled via the HiggsBounds framework -are the predictions for the inclusive (SM-normalized) cross sections and decay rates. In order to enable tests of non-trivial new physics effects on differential distributions of Higgs boson processes, the user can specify independent predictions individually for every STXS observable. This is done by providing rate modification factors r j i for each STXS bin, defined as where the index i labels the signal channels of the STXS bin, and the index j labels the neutral Higgs bosons in the model. The prediction for the exclusive STXS bin rate (inclusive rate) for channel i and Higgs boson h j in the model is denoted σ j i (STXS-bin) (σ j i (incl.)), whereas the corresponding SM predictions are given by σ SM,i (STXS-bin) (σ SM,i (incl.)). Using the inclusive signal strength modifier from the conventional input, µ j i = σ j i (incl.)/σ SM,i (incl.), the model-predicted signal rate for the STXS bin is internally calculated as This quantity then enters the χ 2 test. Note that the knowledge of the SM prediction for the STXS-bin is essential here and is taken from the experimental analysis along with the corresponding measurement. The rate modification factors can be set using the subroutine STXS::assign_modification_factor_to_STXS (see online documentation [22]). Per default, r i i = 1, i.e. the SM Higgs boson is used as a kinematic template. Note that the quantities σ SM,i (STXS bin) and σ SM,i (incl.) in Eq. (12) can in principle be extracted from HiggsSignals, i.e. only the model-specific numerator of Eq. (12) needs to be evaluated externally. However, we recommend to also evaluate the denominator (i.e., the SM predictions) of Eq. (12) with the same computing setup in order to provide predictions for both the model and SM at the same level of accuracy.
The STXS functionality in HiggsSignals and the associated χ 2 test can be employed with measurements of individual cross sections, or branching ratios, or cross section times branching ratios, or even ratios of branching ratios, either given in absolute numbers, or in terms of a SM-normalized quantity. In particular, as mentioned above, any peak observable could in principle be implemented as an STXS observable (despite of generally not being consistent with the definition of STXS bins). More information on the technical implementation of STXS observables in HiggsSignals can be found in Appendix A.

Handling the potential complexity of BSM Higgs sectors
When testing the Higgs sector predictions of a BSM theory with Higgs measurements some issues that affect the calculation of the predicted signal strength µ, Eq. (1), warrant special consideration.
2.6.1. Non-trivial signal efficiencies i = i,SM Per default, HiggsSignals assumes i ≡ i,SM for all experimental analyses and channels. If significant modifications of the signal efficiencies i with respect to the SM expectation i,SM are anticipated in the model, these should be evaluated externally by the user for every relevant experimental analysis and every contributing channel i, for instance, by running a Monte-Carlo simulation of the BSM Higgs signal and processing it through the experimental analysis. In addition, the SM Higgs boson signal should also be simulated and processed for comparison, such that relative efficiency modifiers ζ i ≡ i / i,SM can be determined. These can then be fed into HiggsSignals in order to account for the modified signal efficiencies in the χ 2 test (through the subroutine assign_modelefficiencies_to_peak, see also Section 3). The usage of the relative efficiency modifiers ζ i has the advantage that uncertainties related to the recasting method largely cancel, as they typically affect both the model's Higgs signal and the SM Higgs signal in a similar way. For the STXS observables these effects are captured by the usage of rate modification factors, r j i , described in Section 2.5. For a recent study of the CP properties of the Higgs-top-quark interaction in which these features have been employed see Ref. [20].

Different predicted and observed mass with theory uncertainties
If the Higgs signal candidate's mass m is a free parameter of the model, it can be chosen to exactly match the observed Higgs massm, for which the signal strength measurements have been performed. However, this is not always possible for all observables simultaneously, as often subsets of signal strength measurements have been performed at slightly different hypothesized Higgs masses by ATLAS and CMS. If m =m, the mass dependence should be taken into account in the signal strength calculation, Eq. (1), by evaluating the model predicted signal rate at the predicted mass m, while evaluating the SM signal rate atm.
In contrast, if the Higgs signal candidate's mass m is not a free parameter, but instead a modelprediction with a non-negligible theoretical uncertainty ∆m, there are two reasonable choices: First, one could neglect the uncertainty of the mass prediction and do the same as above by taking into account the mass dependence of the model-predicted signal rate in Eq. (1); second, one could factor out the mass dependence of the rates and calculate both the model-and SMpredicted signal rates at the predicted mass value m. 13 In HiggsSignals-2 the default option is a combination of both: If the difference between observed and predicted mass is less than the theoretical uncertainty, |m −m| ≤ ∆m, HiggsSignals evaluates both the numerator and denominator in Eq. (1) at m. Otherwise, if m >m + ∆m or m <m − ∆m, it takes into account the mass dependence by evaluating the numerator at m and the denominator at m + ∆m orm − ∆m, respectively. In this way, the mass value chosen for the reference SM rate is considered to be the nearest mass value to m allowed by the theoretical mass uncertainty. Furthermore, this choice leads to a smooth transition between the different mass regimes. This behavior can be controlled through the setup_rate_normalization subroutine, and the theoretical mass uncertainties ∆m can be given as optional input by the user, see the online documentation [22] for details.

Multiple overlapping Higgs-boson candidates
A prime application of HiggsSignals are BSM models with an extended Higgs sector, i.e. scenarios where more than one neutral Higgs boson can potentially contribute to the observed Higgs signal. HiggsSignals therefore employs a dedicated algorithm -the assignment procedure -in order to determine which Higgs bosons are contributing in the signal strength evaluation, Eq. (1), and thus enter the χ 2 test against the observed signal. If a Higgs boson is assigned to an observable, then its signal rate enters Eq. (1). If no Higgs boson is assigned to an observable, the χ 2 contribution from the signal rate measurement is evaluated for zero predicted signal rate, µ = 0, which typically results in a large χ 2 penalty. The assignment procedure of a Higgs boson h i to an observablep α depends on the experimental mass resolution, ∆m α , as well as on the theoretical Higgs mass uncertainty, ∆m i . As a general rule, if 2 for a Gaussian Higgs mass pdf, then the Higgs boson h i is assigned to the observablep α . Here, Λ is a control parameter called assignment range, with default value Λ = 1. A few exceptions to this rule exist in the case where the observablep α contains a mass measurement that enters the χ 2 contribution from the Higgs mass, or in the case that several peak observables are collected in an assignment group [10]. In particular, for the former case, a Higgs boson assignment is also possible even if Eq. (14) is not fulfilled. This happens when the total χ 2 contribution from the peak observable in the case of assignment is lower than the χ 2 contribution in the case where the Higgs boson is not assigned (see Ref. [10] for details). Two different assignment range parameters are introduced for the two kinds of observables: Λ for all observables (peak, STXS and Run-1) that do not have an associated mass measurement, and Λ m for mass sensitive observables (which are currently available in the γγ and ZZ → 4 channels). This provides the user with more flexibility to specify the mass range that is regarded as phenomenologically viable using the setup_assignmentrange and setup_assignmentrange_massobs subroutine, respectively (see online documentation [22] for details). The default choice is Λ = 1 and Λ m = 2.
Before we move on to cases with multiple Higgs bosons, we illustrate the assignment procedure with a simple example. Figure 1 displays the total χ 2 as a function of the mass of a Higgs boson that has couplings that are identical to those of the SM Higgs boson, for the three possible choices of the Higgs mass pdf (box, Gaussian, box+Gaussian). The left panels (right panels) assume zero (a 1 GeV) theoretical mass uncertainty. The step-like shape originates from the Higgs assignment procedure, where at every step the assignment of the Higgs boson to some observables changes. For instance, in the top left panel, the χ 2 distribution for all mass pdfs features a jump at m = 124.61 GeV and at 125.57 GeV. Here, the assignment to the LHC Run-1 observables changes due to the implemented Higgs mass measurement of m = 125.09 ± 0.24 GeV and the default setting Λ m = 2. If ∆m = 0 the mass interval in which the Higgs boson is assigned to the LHC Run-1 observables is larger. Outside this range, the χ 2 rises steeply, rendering these mass values as highly disfavored. Two more features are illustrated in this example. First, the differences between the three Higgs mass pdf choices, Eqs. (9) to (11), becomes apparent, in particular in the case of non-zero theoretical mass uncertainty. Second, the plots display the treatment of the mass dependence of the signal rate prediction, as discussed above, resulting in an asymmetric χ 2 shape around the minimum for all three Higgs mass pdf choices. The way such plots can be obtained from HiggsSignals is illustrated in the HS_mass example program.
We now continue with the case of multiple neutral Higgs bosons. If N > 1 Higgs bosons are  assigned to an observablep α , Eq. (1) is generalized to µ α = j µ α,j , where j runs over all N assigned Higgs bosons h j . 14,15 If the observable contains a mass measurement,m α ± ∆m α , the predicted mass value and its uncertainty are determined from the masses of the assigned Higgs bosons, m j , and their uncertainties, ∆m j , through a signal-strength weighted average, respectively. These averaged quantities in Eq. (15) then enter the χ 2 test against the Higgs mass measurement. In other words, we assume that multiple Higgs bosons overlapping within the experimental mass resolution (according to Eq. (14)) would show up as a single signal peak approximately located at m. In case of a Gaussian pdf the theoretical mass uncertainties are still treated as fully correlated uncertainties among the observablesp α andp β , if the same Higgs boson h j has been assigned (denoted by the symbol ' ' below). The diagonal and off-diagonal entries in the covariance matrix are then given by respectively, where the relative signal strength contributions µ α,j /µ α of the assigned Higgs boson act as uncertainty weights in Eq. (17).
The described procedure is well-motivated and accurate in the case that the individual signals cannot even partially be resolved by the experimental analysis, or in other words, the predicted mass differences between the assigned Higgs bosons h i and h j are small compared to the experimental mass resolution, |m i −m j | ∆m α (for all i, j). However, the procedure becomes when the individual signals are at least partially resolved.
In order to accommodate this limitation, we introduce another χ 2 contribution for the mass separation of the assigned Higgs bosons with respect to the mass average, which for the three mass pdf choices reads as (20) where Θ is the Heaviside function. In order to avoid artificial penalties in the box-pdf case, Eq. (18), only h j that contribute at least a fraction of c µ (with a default value of 1 %) to the total signal strength µ α are assigned to the peak observablep α .
As a result, the total χ 2 value is composed of the signal rate part, χ 2 µ , the averaged Higgs mass part, χ 2 m , and the Higgs mass separation part, χ 2 sep . Note that the above formulas include the "standard" case with only one assigned Higgs boson. Moreover, due to the signal strength weighted averaging, a (technically) assigned Higgs boson with zero signal strength does not contribute to the χ 2 .  Figure 2.: Individual χ 2 contributions from the averaged mass, χ 2 m , the mass separation, χ 2 sep , the signal rates, χ 2 µ , and the resulting total χ 2 for a toy example with two Higgs bosons of masses M 1,2 , theoretical mass uncertainties ∆M 1,2 = 0.5 GeV, and universal signal strengths µ α;1,2 = 0.5. A Gaussian pdf is used, and all Λ are at their default values.
A simple example for the total χ 2 and its individual contributions is shown in Fig. 2 for a toy example with two Higgs bosons H 1,2 with masses M 1,2 , theoretical mass uncertainties ∆M 1,2 = 0.5 GeV and universal signal strengths µ α;j = 0.5 (for all α and j = 1, 2). The current default observable set is used as experimental input, i.e., the only mass measurement included is 125.09 ± 0.24 GeV from the LHC Run-1 combination. The χ 2 contribution from the averaged massm (top left panel ) shows four distinct regions. In the corner squares neither of H 1,2 is assigned, and the χ 2 m is zero. In the central square both scalars are assigned, and χ 2 m has a flat direction along the diagonal of constantm. In the four rectangular regions at the sides only one of the scalars is assigned, and the χ 2 m profile corresponds to the one-particle case discussed in Fig. 1. The flat direction along constantm is resolved by the χ 2 sep contribution (top right panel ). This penalizes large mass splittings and is only non-zero in the region where both scalars are assigned. The χ 2 µ contribution from the signal rates (bottom left panel ) is minimal and almost constant, ∆χ 2 ≈ 0, as long as both scalars are assigned since their rates sum to the SM rates. As soon as one of the scalars drops from assignment, the rates are only half the SM rates, or zero if neither of H 1,2 is assigned. In both cases ∆χ 2 µ incurs a very large penalty. The resulting total ∆χ 2 is shown in the bottom right panel with the 68 % C.L.
(95 % C.L.) contour indicated by a black solid (dashed) line. The resulting profile has no flat direction and favors M 1 ≈ M 2 ≈m α . Figure 2 was generated with the example program HS_2Higgses that is provided with the HiggsSignals package.

User operating instructions
HiggsSignals-2 incorporates a number of modernizations to the source code with respect to HiggsSignals-1. Furthermore, the project has moved the HiggsSignals-2 development to the GitLab repository https://gitlab.com/higgssbounds/higgssignals where the source code is available. Additionally, we are now using CMake as build system such that HiggsSignals is now compiled by mkdir build && cd build cmake .. make This will compile the HiggsSignals library, the main executable, as well as a number of example programs that illustrate different use cases. HiggsSignals depends on the HiggsBounds library which has to be available on the system and will be automatically found and used by CMake. More detailed information on building and linking HiggsSignals can be found in the Readme on the above-mentioned web-page.

Fortran subroutines
The Fortran subroutine interface provides access to all of the functionality in HiggsSignals. Up-to-date, detailed documentation for the subroutines is available online at https://higgsbounds.gitlab.io/higgssignals.
The most important interface change in HiggsSignals-2 stems from the separation of the observable set into three parts -the peak observables, the STXS observables, and the LHC Run-1 combination -as discussed in the previous section. These can be separately accessed through the subroutines run_HiggsSignals (for the 13 TeV LHC peak observables), run_HiggsSignals_STXS, and run_HiggsSignals_LHC_Run1_combination that return the corresponding χ 2 values defined in Table 3, the number of observables, and the p-value. In most applications, a combined result including all of these observable sets is required. In this case the subroutine run_HiggsSignals_full can be used or the three subroutines can be run separately and the resulting χ 2 values can be summed.
HiggsSignals-2 also includes a C interface to all of the Fortran subroutines to make them more accessible from C or C++ codes. This interface automatically handles the necessary type conversions and is included in the online documentation.

Command-line version
Compiling HiggsSignals generates a main executable that can be run as ./HiggsSignals <expdata> <pdf> <whichinput> <nHzero> <nHplus> <prefix> where the arguments specify the following: expdata selects which experimental dataset of peak and STXS observables is used. The default value is latestresults which refers to the latest available experimental results included in your version of HiggsSignals. Alternatively, any of the folders in data/Expt_tables/ can be named, and the observables in that folder will be used. The parameter <pdf> specifies the probability density function as discussed in Section 2.3.2. The input method is selected by <whichinput>, while <nHzero> and <nHplus> specify the number of neutral and charged Higgs bosons in the model, respectively. These parameters are passed to the HiggsBounds framework in order to handle the theoretical input, see the HiggsBounds manual for further details [9]. The <prefix> path specifies the longest common path to the input data files.
HiggsSignals can use files in the SLHA format for input and output. The input blocks are described in detail in the HiggsBounds manual [9]. In SLHA format HiggsSignals produces an output block called HiggsSignalsResults containing the resulting χ 2 values.

Interpretations of the χ 2 for model testing
The main application of the HiggsSignals χ 2 result is the statistical discrimination between models on the basis of Higgs boson signal observables (mass and rates, partly in kinematical bins). This is achieved by statistically testing either a particular hypothesis or two hypotheses against each other. Three typical applications can be distinguished: 1) Within a given model with model parameters p = p i (i = 1, 2, . . . , N ) one may wish to determine the parameter regions that are preferred at a specific confidence level (C.L.) by the observation. Or, as a related issue, one may consider the question which parameter regions are excluded at a specific C.L., given the fact that the model contains other regions that are in better agreement with the data. This can be answered by calculating two-sided confidence intervals (C.I.) for the parameters p. These C.I. are typically presented in one or two dimensions (with the remaining parameters profiled or marginalized), but can in general be of higher dimension D ≤ N . Examples for typical applications of this kind can be found in Refs. [8,16,17,19,20,25]. These global fit studies often combine the HiggsSignals χ 2 with other relevant constraints and observables. This hypothesis test will be described in Section 3.3.1; 2) If one is interested only in the lower or upper boundary of the C.I. of the model parameter(s), the calculation of one-sided confidence intervals can be appropriate. This is called limit setting. It may be of use if the parameter(s) represent so-far unobserved phenomena. For instance, one could ask the following questions: How large can the branching ratio for Higgs boson decays into invisible final states be? What is the maximally allowed value of the CP-violating phase of the Higgs-top-quark interaction? How low can the masses of the additional MSSM Higgs bosons be? How large can a Higgs doublet-singlet mixing be? While a lower or upper limit on a single parameter can be derived unambiguously, this is no longer true in a higher dimensional parameter space. However, in such cases a suitable mapping onto a one-dimensional parameter (e.g., a common signal strength modifier) may still be found to enable the derivation of a one-sided C.I.. Details on this hypothesis test are given in Section 3.3.2.
In both applications the calculation of the two-sided or one-sided C.I. is a hypothesis test based on a likelihood ratio (LR), which quantifies the (dis-)agreement of two competing statistical models based on the ratio of their likelihoods. Two categories need to be distinguished in the hypothesis test: First, which models are actually compared, and second, for which model comparison is the choice of LR optimal? For reasons beyond the scope of this discussion, these two categories do not necessarily coincide. In the analyses of the LHC experiments, both for limit setting and for fitting, the LR is constructed such that one of these models is typically determined by maximizing the likelihood over the entire parameter space, defining the best-fit (BF) scenario, and the other represents a specific model parameter point under study. If the likelihood of the alternative hypothesis (i.e. the non-BF model point under study) is significantly lower than the likelihood of the null hypothesis (i.e. the BF point), the alternative hypothesis can be rejected (see Sections 3.3.1 and 3.3.2 for details). The Neyman-Pearson lemma [26] shows that the LR test is the most powerful test among all possible statistical tests in this case.
The full likelihood of the model parameters p given the observed data x, denoted as L( p| x), as evaluated by the LHC experiments, is not publicly available. Based on the publicly available information on the measurements the HiggsSignals χ 2 approximates the full log-likelihood, The log-likelihood ratio (LLR) can therefore be constructed as Here,ˆ p denotes the parameter point at which the χ 2 is minimized (or the likelihood L( p| x) is maximized), the "BF point". The likelihood ratio is also denoted as a test statistics t( p). In the presence of additional nuisance parameters θ, corresponding to all possible systematic or parametric uncertainties, the test statistics generalizes to In the numerator the nuisance parameters are optimized for each tested parameter point p in the nuisance parameter space, with the optimum denoted byˆ θ, while in the denominator p and θ are optimized simultaneously to find the global likelihood maximum at the pointˆ p andˆ θ.
In a few cases, likelihood distributions are given for single observables or model parameters by the LHC experiments (see e.g. Ref. [27,28] for one-dimensional likelihood functions of κ parameters, the Higgs decay rate to invisible final states, the total decay width, and Higgs CP-sensitive parameters). However, this is not generally the case for many Higgs signal rate measurements, which are usually only presented in terms of central values and 1σ uncertainties, ideally accompanied by a correlation matrix. Based on this information, the χ 2 calculated by HiggsSignals is the closest possible approximation to the full likelihood for many observables.
3) The third application is the goodness-of-fit test which is different to the above choices in that it typically tests only one statistical model, i.e. it does not compare two different model parameter points, but is evaluated for a single parameter point. Typically, this test is performed for specific scenarios, e.g., the BF scenario that has been identified in a preceding global model fit (case 1). Certain caveats apply, however, if the user wants to exclude a model purely on grounds of a goodness-of-fit test (see below).
An approximation to the hypothesis test using the LR (as discussed above) can be constructed based on the goodness-of-fit test. This simpler implementation can be used to compare, for instance, the BF scenarios of different models. In particular, with a so-called F -test, one can determine whether model A leads to a significantly better description of the observation than model B, despite the fact that model A has more free parameters than model B. The F test is recommended to analyze which of the models under study delivers a better fit, but it is not recommended as a means to exclude models or model parameter ranges. More details on goodness-of-fit test are given in Section 3.3.3.
HiggsSignals can be used for all three applications. The most common hypothesis test employed in phenomenological studies with HiggsSignals is certainly the model fit (case 1).  In the following sections we elaborate on the three choices and discuss how in each case the χ 2 value calculated by HiggsSignals is used in practice. A decision chart for choosing the appropriate statistical treatment for a variety of common applications is given in Fig. 3. The chart also includes the possible attempt to exclude the SM hypothesis in favor of an alternative model ("discovery mode"). Here, the user must be aware of some important caveats in such an interpretation before making any far-reaching claims. These will be discussed in Section 3.3.4.

Usage in a fit
When fitting the parameters of a given model to the observation one employs the calculation of two-sided confidence intervals, as one is generally interested in the lower and upper boundaries of the allowed parameter intervals. This is illustrated in the left panel of Fig. 4. The plot shows a one-dimensional probability density function f (p) of a parameter p, the best-fit valuê p and the lower and upper boundaries of the C.I. of probability 1 − β, denoted p low 1−β and p up 1−β , respectively. 16 The dark-shaded area (i.e. the integral over the pdf f (p)) corresponds to the probability of 1 − β. For instance, for a 1σ C.I. in a one-dimensional parameter p, we choose β ≈ 0.32, and p low 0.68 (p up 0.68 ) is the corresponding lower (upper) boundary of the 68 % C.I.. 17 16 For illustration the pdf is Gaussian-shaped, but can in general have an arbitrary shape as long as it is properly normalized, f (p)dp = 1. 17 The other two panels of Fig. 4   It is well-known that in the so-called Gaussian limit -where near the best fit point the uncertainties of the observables are approximately independent of the parameters, the relation between parameters and observables are approximately linear, and the uncertainties are approximately Gaussian -for one-dimensional parameter spaces the two-sided confidence interval corresponding to a 1σ (2σ) C.L. is given by the ensemble of model parameter points with ∆χ 2 ( p) ≤ 1 (4) above the best fit point at ∆χ 2 (ˆ p) = 0. In two-dimensional parameter spaces, the 1σ and 2σ confidence regions are found at ∆χ 2 ( p) ≤ 2.30 and ∆χ 2 ( p) ≤ 6.18, respectively. For these values, it does not matter how many additional parameters of interest or nuisance parameters are free in the fit, as long as all other parameters apart from those for which the C.I. is determined are profiled (marginalized) in the frequentist (bayesian) interpretation. An overview of the ∆χ 2 values for different C.L. and different values of the number of parameters under study (also called degrees of freedom ν) are given in Table 4. While there is no logical necessity to restrict the number of parameters for whose common variation a C.I. is given to one or two, it is uncommon to determine the C.I. for more than two parameters at a time, as this becomes difficult to visualize. If it is important to convey information about the relation between many different parameters, one usually provides a covariance matrix based on linear correlation factors and on the 1σ one-dimensional uncertainties. These confidence intervals and covariances based on profiling can be established numerically using tools like MIGRAD in MINUIT [29] or by using methods like Markov Chain Monte Carlo techniques or other numerical optimizers.
Caution is required for cases where the requirements for the Gaussian limit are not reached, i.e. where the χ 2 profile around the minimum is not parabolical. This situation appears in many examples of model fits. It is common practice to use the ∆χ 2 ranges given here also in such a situation, albeit it cannot be guaranteed that they still correspond to the quoted frequentist coverage of e.g. 68 % for the one-dimensional 1σ C.I. Furthermore, linear correlations might  Table 4.: ∆χ 2 as function of confidence level (C.L.) and degree of freedom (ν). The values for ν > 2 are rarely used and only given for completeness.
not describe the interdependence of the parameters properly. We emphasize that the user should treat these cases with caution and discuss possible implications, or instead use a Toy-Monte-Carlo based technique. Ref. [15] gives such an example for determining the parameter range for the actual C.L. instead of using the profiling technique.
It is strongly recommended to use the likelihood-ratio based fitting technique to find the uncertainty range of the model parameters. We discourage from calculating the (approximate) goodness-of-fit for each model parameter point and then combine those parameter points with acceptable goodness-of-fit to form an "allowed region". The reason for this recommendation is explained in Section 3.3.3.

Usage in limit setting
The main difference between limit setting and fitting is the choice of a one-sided confidence interval for the former. As discussed above, limit setting and fitting are subclasses of hypothesis testing, which can be used for both discoveries or exclusions. For discoveries, hypothesis testing addresses the question "How unlikely is the observed pattern of data if the null hypothesis is true?" where the null hypothesis is typically the assumption of the SM without any additional signal, s = 0. In contrast limit setting is a hypothesis test where exclusions of an alternative hypothesis at a pre-defined C.L. are sought. Exclusions are related to the question "How unlikely is the data if the alternative hypothesis is true?". It is practically often implemented using a one-dimensional parametrization of the strength of the model predictions. In the HiggsSignals case, each model prediction of observables x( p) can be parametrized as Here, we introduced s as a global rescaling parameter of the model-predicted deviations in the observables x from the SM prediction, which we further abbreviate as d( p) ≡ x( p) − x SM . The true model prediction at p is retained when the rescaling parameter -also called strength of BSM effects -is s = 1. Under a change of parameters p → p two possible changes of d need to be distinguished: First, the case where the parameter transformation leads to a different pattern of deviations in the observables x from the SM prediction, i.e.
with a scalar constant α. Second, the case where the parameter transformation leaves the pattern of deviations invariant, Here the scalar multiplicative factor α can simply be absorbed into the transformed strength of deviations, i.e. only the strength of BSM effects s changes to s = αs under the change of parameters.
Using these definitions, limits on model parameter points can be derived from measurements of the Higgs mass and rates in HiggsSignals by employing a limit setting procedure on s.
Depending on the two possible transformation properties under a change of parameters described above, two statistical tests are often performed in experimental and phenomenological analyses in the absence of a significant deviation of the data from the SM: 1. Is the alternative model with parameters p excluded at a given C.L.? In this case, the C.L. for s = 1 needs to be calculated. If the C.L. falls below a pre-defined cut, the model point p is regarded as excluded. This is the recommended application for all models where the parameters of interest cannot be mapped onto a single rescaling parameter s that only affects the strength of the model-predicted deviation from the SM, i.e. for the first case of parameter transformation properties, Eq. (24). This is typically the case in relatively complicated parameter spaces, e.g. in the (M A , tan β) parameter plane of the 2HDM or MSSM.
2. What is the maximum signal strength s up at which the pattern of deviations from the SM defined by x( p) is allowed? In this case, s is varied until the set of predictions of observables x(s up , p) is found which is excluded exactly at the pre-defined C.L.. This is the recommended application for models where a (sub)set of parameters leaves -at least to a sufficiently accurate approximation -the shape of the pattern of deviations invariant and only affects the strength s according to Eq. (26). This applies, for instance, when constraining an additional Higgs branching ratio into an invisible final state, or a singlet-doublet-mixing angle in a Higgs singlet extension model. In this case, only the parameters which do affect the shape of the pattern of deviations need to be tested separately, while the (sub)set of parameters only affecting the strength can directly be tested against s up . For illustration we discuss an explicit example in Fig. 5 below.
In the following, we will describe the two tests on the C.L. of s = 1 and of the determination of s up .
The C.L. is chosen by defining the desired power of the test, β. In particle physics, for exclusions typically β = 0.05 is used, such that the required C.L. is 1 − β = 95 %. In order to find the observed C.L., the test statistics t is chosen in complete analogy to Eq.
which optimally distinguishes the alternative hypothesis s from the best-fit hypothesisŝ, under consideration of all possible uncertainties (systematic, parametric etc.) using the vector of nuisance parameters θ. The same convention for the definition ofˆ θ andˆ θ as in Eq. (22) applies. The second approximation, omitting the nuisance parameters θ, relates to the fact that the nuisance parameters are not fitted explicitly in HiggsSignals, since the experimental systematic errors are not reported separately by their source in the experimental input to HiggsSignals. Hence, correlated and uncorrelated systematic errors are only treated in the covariance matrix entering the χ 2 calculation. This similarity in the choice of test statistics between limit setting and fitting exemplifies the convergence of the consistent statistical treatment of one-sided confidence limits and two-sided confidence limits in the LHC era. This approach goes back to the ideas discussed in Ref. [30], but was not used in high energy physics at earlier major experiments. The discussion of the likelihood ratio based hypothesis test in this section follows the discussion in Ref. [31].
A detailed discussion of model exclusions based on model-independent likelihood values, as employed in HiggsBounds, has been presented in Ref. [9]. 18 The difference between the HiggsBounds and HiggsSignals approach lies on the one hand in the approximation to the likelihood, which in HiggsBounds is based on published observed and expected likelihood values, while in HiggsSignals the likelihood ratio is approximated from the ∆χ 2 . On the other hand, the approaches differ in the parametrization of the model, where physical parameters are applied for HiggsBounds, and the signal strength s is used in the HiggsSignals example discussed here. In the following, we will focus on the aspect of limit setting as we strongly discourage from using HiggsSignals for an attempted discovery. See Section 3.3.3 for detailed considerations on this issue.
When setting a limit on s, Eq. (27) is not used directly. Instead, several separate cases are considered: First, in order not to hold an overshoot of the data above the signal hypothesis against the signal, and still allowing s < 0, the test statistics from Eq. (27) is further modified in the following way: where we have introduced the notation q χ 2 s to distinguish this modified test statistics from the test statistics t( p) employed in the fits described in Section 3.3.1 as given in Eq. (22). The superscript χ 2 refers to the χ 2 approximation of the LR. The difference between the two test statistics further lies in the case separation of Eq. (28): forŝ > s, q χ 2 s = 0 is used in order to prevent the signal hypothesis s from being excluded if an even larger signalŝ would actually better describe the experimental results.
Often, s > 0 is required from physical considerations 19 , in which case a negative best-fit-valuê s should be interpreted as "The no-signal hypothesis fits data the best". Then, is used, 20 where χ 2 (0) denotes the χ 2 obtained for the null hypothesis, typically the SM.
In order to derive the desired C.L. for s = 1 or find s up , the calculation of a probability density function f (q χ 2 s |s) is needed to compare the range of possible expected outcomes for q χ 2 s (the random variable, i.e. the quantity which would show a different outcome for each simulated repetition of all measurements) with the observed q χ 2 s in the case of an assumed signal strength s. For calculating the C.L. at s = 1, only the pdfs f (q χ 2 s |1) and f (q χ 2 s |0) are required. For the calculation of s up , s needs to be varied, and eventually pdfs need to be calculated for several choices of s, until the desired C.L. at 1 − β is found at s up . Details are given in Ref. [31]. In this reference, for the case discussed here, the correspondence between the observed test statistics q χ 2 s and the observed C.L. of the signal plus background hypothesis CL s+b 21 is approximately found using where Φ is the cumulative normal distribution function.
As discussed above, the power 1−β is typically chosen such that β = 0.05 holds for a one-sided confidence level which is usually called a "2 σ" C.L. 22 . This case is exemplified in the right panel of Fig. 4. The desired upper limit on the allowed signal rate s up (corresponding to p up Fig. 4) is adjusted such that its CL s+b corresponds to the chosen C.L. of 1 − β = 0.95. Then, Φ −1 (0.05) = 1.64 can be used to determine the 95 % C.L. upper limit The standard deviation σ s on the signal hypothesis s (as constrained by the test statistics) can be determined numerically from the Wald approximation [33] using the same minimization and profiling techniques as in case of a fit (see Section 3.3.1). The approximate uncertainty σ s on the parameter s in a simple one-dimensional fit is σ s = |ŝ − s(∆χ 2 = 1)|, where s(∆χ 2 = 1) is the signal strength below or aboveŝ for which ∆χ 2 = 1 is obtained. This uncertainty connects the two-sided limit calculated for a fit to the one-sided limit discussed here. This connection can be shown as follows: In the Wald approximation, near the optimum the ∆χ 2 forms a parabola and hence can be parametrized as Using 1.64 = Φ −1 (0.05) = q χ 2 sup = ∆χ 2 (s up ) = (s up −ŝ)/σ s in the allowed range ofŝ, we can solve for s up and find Eq. (31). In this case, the model parameter point p can be regarded as excluded if s up ≤ 1 is found, and as allowed if s up > 1 is observed.
It is important to note that the above confidence level construction yields CL s+b , i.e. the confidence level of the signal plus background hypothesis at the signal rate s up . In most experimental applications of signal exclusions, CL s = CL s+b /CL b is used in order to avoid accidental exclusion of parameter points for which the search has no sensitivity [34]. Based on Ref. [31] and the example in Ref. [9], we find for the notation of the latter reference q exp s (s) ≡ ((s −ŝ)/σ s ) 2 and q obs s (ŝ) ≡ ((ŝ −ŝ)/σ s ) 2 = 0. Here, q exp s (s) in the notation from 21 It should be noted that in the brief discussion presented in this paper, every C.L. and C.I. is described as an observed result corresponding to the observed data. In order to obtain the expected quantities, the statistical treatment is exactly the same, apart from replacing the observed data with the measurements expected under the s = 0 hypothesis. 22 While β = 0.05 is not the exact value corresponding to a 2 σ exclusion, it is so close to it that it has become a commonly used term.
Ref. [9] is the square of the separation between the tested value s and the best fitŝ in units of the uncertainty σ s , while q obs s (ŝ) is the test statistics of the observed best fit point,ŝ, which in the definition of this section is always 0. 23 For the confidence levels, from inverting Eq. (30) and inserting into Eqs. (25) and (26) of Ref. [9], we thus find where the symmetry stems from the symmetry of the parabolic approximation to the likelihood. In order to find the limit s up for the CL s case, the above procedure has to be repeated iteratively by varying s until it is adjusted to the value s up such that CL s = 0.05 (for a 95 % confidence level) is found. For testing the exclusion of a model point at s = 1, no iteration is required. CL s can be calculated using the equations above, and if CL s < β is found, the parameter point p in the model can be regarded as excluded. Figure 5 illustrates the difference between a limit setting and a fitting interpretation in a toy model with a single SM-like Higgs boson at 125.09 GeV that is modified by a global coupling scale factor κ ≤ 1 and an additional decay into invisible final states with a branching ratio BR(h 125 → inv). For instance, this toy model can directly represent parts of the parameter space of models with additional scalar singlet(s) [35,36]. The figure shows three different ways to determine the allowed parameter region. The dashed line indicates the 2σ favored region in a fit (compare Table 4). The dotted line is the limit derived from CL s+b using s up from Eq. (31), and the solid line indicates the 95 % C.L. limit based on CL s . Both of the limit setting approaches lead to a less aggressive limit than the fit since they use one-sided instead of two-sided confidence intervals. For this simple example the difference between CL s+b and CL s is small, but in general the CL s based limit should be used whenever possible. This example is implemented in the HSLimitEffC example code. 23 The equivalence of using the squared relative separation of s fromŝ as an "expected" value, which may seem surprising at first, arises from the fact that only differences between χ 2 values have a meaning in this interpretation, and not absolute values. This is analogous to the definition of the likelihood, where only ratios of likelihoods (or differences of log-likelihoods) have a physical meaning. Fundamentally, s/σs parametrizes the expected distance (in units of σs) of the median of the pdf of the test statistics of the signal plus background hypothesis from the median of the pdf of the s = 0 hypothesis, andŝ/σs is the relative observed distance from s = 0. However, we define q χ 2 s such that q χ 2 s (s =ŝ) = 0. Thus, in contrast to an implementation as in Ref. [9], where the test statistics q obs at the best fit point is not defined to be 0, the χ 2 based input to the limit calculation applied here is shifted byŝ/σs, and thus s/σs −ŝ/σs parametrizes the relative distance of the expected result from the background-only result, andŝ/σs −ŝ/σs = 0 always holds for the observed result. L. allowed regions derived from a fit (dashed) and a limit setting procedure using CL s+b (dotted), which corresponds to s up = 1, and CL s (solid).

Usage in a goodness-of-fit test
The third typical application of the HiggsSignals χ 2 is the goodness-of-fit test. Fundamentally, it aims to determine the consistency of the model compared to the data only, in contrast to the comparisons between different (parameter) hypotheses in the above two cases. In practice, there are only very few cases where a goodness-of-fit test is preferential over a hypothesis test. One example is a case where the null hypothesis is deprecated by the data in a similar way as the alternative hypothesis, which contains the null hypothesis within its parameter range. This seemingly paradox situation can happen for example when comparing a supersymmetric model to the SM (see e.g. Ref. [15]). The SM as a null hypothesis is so well established that a very tight requirement on its rejection is placed -typically 5 σ significance corresponding to p 0 < 2.7 × 10 −7 . On the other hand, the alternative hypothesis shall be rejected at the level p s < 0.05. In this case, if a hypothesis test is performed following Section 3.3.2, in this hypothetical example the alternative hypothesis can never be refuted because in the decoupling limit it would fit the data at least as well as the SM, even though the SM would provide a bad fit to the data at the level of 2.7 × 10 −7 < p 0 < 0.05. In such a case a careful goodness-of-fit test can be used to refute a model.
The simplest application of a goodness-of-fit test is to calculate the probability P (χ 2 min |ndf) of the observed χ 2 min under the assumption that it follows a χ 2 distribution after the minimization in the fit, as described in Section 3.3.1, for the given number of degrees of freedom (ndf). In the construction of the HiggsSignals χ 2 it is assumed that the uncertainties are Gaussian, all correlations are linear and that the relation between observables and parameters is approximately linear around the minimum. In this case, the analytic form of the χ 2 distribution can be used to calculate the χ 2 probability, and the user can require a certain confidence level for an allowed model. Often, the ratio χ 2 min /ndf is used to estimate the goodness-of-fit, and it is expected that it is around 1 for a model that provides a good fit to the data. The variance of the χ 2 distribution V χ 2 = √ 2 ndf can be taken as a rough guideline for the expected range of deviation from 1 for a good fit.
However, this popular approach, while not generally invalid, comes with two caveats. First, in many fits of BSM models, two different parameter regimes are within the allowed parameter ranges in 1 or 2 σ C.I.: parameter ranges where the observables effectively do not depend on the parameters (the decoupled regime) as well as regions where the observables depend strongly on the same parameters. In such a case, the construction of the analytic χ 2 probability in the Gaussian approximation is invalid because it assumes a purely linear relation between parameters and observables. The user should handle such a situation with care, e.g. by using toy Monte Carlo simulations to obtain the true probability density function of the χ 2 .
The second caveat is that there is a notable difference between the goodness-of-fit and the confidence interval calculation (see Sections 3.3.1 and 3.3.2): By construction, the confidence interval cannot increase (or in other words, the constraint on the allowed parameter space cannot get looser) if the data is presented in an increased number of subchannels. 24 Splitting the measurement in many individual measurements can only decrease the confidence interval or leave it unchanged. However, if the measurements are split up in many individual results which the model can only vary together, but not individually, the goodness-of-fit in the average of all statistical tests in an ensemble of possible experimental outcomes increases if the separate measurements are statistically consistent with each other. This statistical increase of the goodness-of-fit on average buries the tension between the physical variations and the data under the expected statistical variations of the individual measurements. Therefore, using many experimental subchannels as implemented in HiggsSignals can artificially increase the goodness-of-fit. The consequences of this effect will also clearly appear in the validation of the LHC Run 1 combination in Section 4.1. Note that this statement about the probable outcome of such a test assumes statistical consistency amongst all observables. The situation is different if separate measurements of the same or a similar physical quantity do not agree with each other at a significant level, and thus are not at all or only marginally consistent with each other. A notable occurrence of such a case is the electroweak fit [37], where different individual measurements for the forward-backward and left-right asymmetries at the Z pole are in disagreement with each other while their average agrees with the SM prediction. In this case, the goodness-of-fit of course increases when fitting to physically meaningful averages instead of fitting to the separate measurements of the physical quantity which the SM cannot 24 Assuming that systematic uncertainties do not grow with smaller subchannel size.
vary independently. As a consequence, the result for the goodness-of-fit depends on whether individual measurements are directly used as input for the fit or whether instead certain averages of individual measurements are used as input. For a discussion of both of these caveats see Ref. [15].
Due to the latter caveat we further discourage using the goodness-of-fit on multiple parameter points in order to find an allowed area. The result would strongly depend on the structure of the experimental data and not only on the genuine predictions of a model parameter point. The goodness-of-fit should be evaluated with care, and the effort needed to do so is typically only warranted for the best-fit point. If it has an acceptable goodness-of-fit, the preferred parameter regions are much more robustly determined using the fit procedure described in Section 3.3.1. Generally, for all the above reasons, we advise to treat all goodness-of-fit test results as acceptable as long as each of them lies within the range deemed acceptable by the analyzer, and we advise against excluding models based on a comparison of the goodness-of-fit between models. However, while models should not be excluded based on a relative comparison of their goodness-of-fits, it is a valid question to ask whether models with different numbers of degrees of freedom provide a better or worse description of the data. This is often of interest if a more general model with a large number of parameters contains a more constrained model with less parameters.
In order to achieve this comparison between models of different complexity, a generalization, called the F -test, can be derived [38]. In practice, the F -test might be easier to implement than the hypothesis test described in detail in the limit setting description of Section 3.3.2. However, it is not as statistically stringent.
The F -test is a possibility to quantify how much a model B improves over the description of the data compared to model A with a different number of free parameters. This is particularly useful if one of the two models is a special case of the other model. If B contains A, as in extensions of the SM, B will always provide a better or equal fit to the data than A. The F -test weighs the higher complexity of B over A against the improvement in the fit. The test statistic f is calculated as with the number of degrees of freedom ν A,B = n obs − n par,A,B , where n obs denotes the number of measurements, and n par,A,B corresponds to the number of parameters in the two models. The χ 2 values refer to the minimal χ 2 value found in the parameter space (i.e. the best-fit value The cumulative probability F quantifies the significance of the χ 2 improvement found in the more general model, while accounting for the larger number of parameters. It is found by integrating the probability density function of the test statistic f from zero to the f -value determined by the data via the fit. The null hypothesis, which is that model B does not provide a significantly better fit to the data than model A (i.e. the SM), can be rejected, for instance, at the 68% (95%) confidence level if F > 0.68 (0.95).

Caveats on rejections of the Null Hypothesis
The decision chart in Fig. 3 also contains the case where a combination of measurements available in HiggsSignals could be used to reject the SM hypothesis in favor of a specific model of New Physics. We urge for a lot of caution in such an application. For instance, consider the combination of ATLAS and CMS results via HiggsSignals. On a purely statistical basis, this test is more sensitive than a test using experimental results from a single experiment, and thus it is possible that such a phenomenological analysis might be the first to claim a rejection of the null hypothesis (i.e. the hypothesis that the observed Higgs boson has precisely the properties predicted by the SM) in favor of a specific alternative hypothesis (e.g. a model where the 125 GeV Higgs boson has a modified coupling structure with respect to the SM). As long as the model of New Physics is not constructed to specifically fit the observed deviations of the data from the SM prediction, it is in principle statistically viable to perform such a test. However, if the sensitivity of the single experiments is not sufficient to reject the null hypothesis, it is very likely that the exclusion derived in a combination of measurements from both experiments is statistically marginal, making the numerical outcome strongly dependent on factors like the treatment of systematics and their correlations. While these are implemented with care in HiggsSignals, they contain the relevant correlations only for the theory uncertainties and the luminosity, and the published experimental correlations within one analysis. Experimental correlations between different analyses, which could arise e.g. from particle reconstruction effects or common selection cuts are usually not published and hence cannot be taken into account in HiggsSignals. Therefore, it is not advisable to claim a statistically meaningful rejection of the SM based on HiggsSignals alone. In addition, as explained in Section 3.3.3, one should also consider the custom to place a much more stringent criterion of 5 σ significance, corresponding to p 0 < 2.7 × 10 −7 , on the C.L. required for the exclusion of the null hypothesis.
Furthermore, the rejection of the null hypothesis always depends -to a varying degree, depending on the definition of the likelihood ratio -on the signal hypothesis. This is also true for the profile likelihood technique employed at the LHC [31], where a discovery is claimed if the likelihood of the background only (or SM) hypothesis compared to the best fit hypothesis of an alternative model is sufficiently small. It is conceivable that a model of New Physics is tested as an alternative hypothesis which was specifically and intentionally constructed to describe known deviations of the measurements from the SM. Testing such a model against the same data as was used for its construction in order to claim a rejection of the SM is obviously not statistically meaningful.

Performance tests
In this section we discuss the performance of HiggsSignals for a few selected experimental analyses by comparing official and HiggsSignals-reproduced results for various models that parametrize Higgs couplings or certain production rates. We first present the HiggsSignals performance for the ATLAS and CMS Run-1 combination [4] of Higgs boson measurements, either using the officially combined measurements or measurements from the individual Run-1 analyses. We then discuss example results for Run-2 analyses, either using signal strength (µ) measurements or STXS measurements as input. We conclude this section with a recommendation for the presentation of future Higgs signal rate measurements.

Reproduction of the ATLAS and CMS Run-1 combination
ATLAS and CMS published results for the production cross sections and decay rates of the observed Higgs boson from a combined analysis of the LHC pp collision data at √ s = 7 and 8 TeV [4]. For the reproduction of the results with HiggsSignals we use two different experimental inputs: 1. The combined ATLAS and CMS results for production cross section, σ i , times decay branching ratio, BR(H → f ), of the various signal channels together with the provided correlation matrix from Ref. [4]. This input and the associated χ 2 calculation is described in Section 2.4. [39][40][41][42][43][44] and CMS [45][46][47][48][49][50][51][52][53].

The signal strength measurements published in individual Run-1 analyses by ATLAS
We consider several generic parametrizations based on Higgs coupling scale factors and compare the HiggsSignals results for both input methods to the official experimental fit.

Parametrization through coupling scale factors
In the κ framework [54] we parametrize BSM effects through seven independent Higgs coupling scale factors -the generation-independent fermionic scale factors for up-type and down-type quarks κ u and κ d , respectively, for leptons κ , as well as the heavy gauge boson scale factors κ Z , κ W , and the loop-induced gluon and photon scale factors κ g and κ γ . Since the total width of the Higgs boson cannot be constrained at the LHC with sufficient precision in a modelindependent way, an additional assumption is necessary. Possible assumptions to overcome the degeneracy induced by the unknown total width are the following [14]: 1. There are no BSM decays of the Higgs boson, i.e. BR(H → NP) = 0.
2. New physics decays are allowed, i.e. BR(H → NP) ≥ 0 but the scale factors for the Higgs-gauge boson couplings are required to be |κ W,Z | ≤ 1. This assumption breaks the degeneracy introduced by the unknown total width by limiting the VBF and V H Higgs production channels [55,56].
3. All additional Higgs decay modes are required to yield an invisible final state, BR(H → NP) = BR(H → inv.). Such decay models can then be constrained by direct searches at the LHC which exploit the Higgs boson recoil when produced in association with other objects (Z or W boson, quarks, etc.).
In the following, we only consider the first two scenarios. The second one is compatible with a wide range of BSM physics. In particular, it is valid for models that contain only singlet and doublet Higgs fields. By including the branching fraction into states of new physics as additional free parameter in the fit we have eight free parameters in the second scenario. In this equation a new modifier, κ H , defined as is introduced to characterize modifications to the sum of the partial widths of decays of the Higgs boson into SM final states. Here, BR j SM = Γ j SM /Γ H SM is the branching ratio as predicted for a SM Higgs boson. In both fits, it is assumed that the coupling scale factor κ u is positive, without loss of generality. Figure 6 shows the fit results for the two scenarios. The HiggsSignals-2 fit results with the ATLAS and CMS combination input and the results with the individual signal strength input are shown in blue and red, respectively. The results obtained from HiggsSignals with the different input datasets agree well with each other, and both results are consistent with the official results shown in black. Differences in the sign of the best fit point in the HiggsSignals results are typically due to χ 2 distributions that are exactly (e.g. for κ W in the BR(H → NP) = 0 case) or almost insensitive to the sign.
For BR(H → NP) = 0 (Fig. 6, left panel ) in the case of the individual signal strength input the best-fit point has a goodness-of-fit of χ 2 min /ndf = 52.6/69 which leads to a pvalue compatibility between the data and the predictions of the SM of 92.87 %. In the case of the ATLAS and CMS combination input we find χ 2 min /ndf = 14.95/13 and a p-value estimate of 31.0 %. The reason for this large difference is the sensitivity of the goodness-of-fit to the structure of the experimental data discussed in Section 3.3.3. When many separate measurements are included, the p-value mostly measures the agreement between the separate measurements instead of quantifying the agreement of the model with the data [15]. For this reason we refrain from quoting the goodness-of-fit for the remainder of this section.
For the |κ V | ≤ 1 assumption in Fig. 6 (right panel ), the largest differences between the HiggsSignals-2 and the official results appear in the contributions of BSM decays. The HiggsSignals analysis returns BR(H → NP) < 0.37 (< 0.28) as the 68 % C.L. region using the combined (individual) ATLAS and CMS signal strength input. Both are notably larger than the official results BR(H → NP) < 0. 16. We suspect that the assumption of Gaussian uncertainties is not fully applicable in all parts of the parameter space and therefore causes the observed discrepancy.

Parameterization using ratios of coupling scale factors
We now turn to a Higgs coupling parametrization in terms of ratios of κ scale factors. The gg → H → ZZ channel serves as reference channel for the normalization because its overall uncertainties and background contamination are very small. It is parameterized as a function of κ gZ = κ g · κ Z /κ H , with κ H defined in Eq. (36). 25 The ratios λ Zg = κ Z /κ g and λ tg = κ t /κ g are probed by taking the ratios of measurements of VBF and ZH production and measurements of ttH production, respectively, to the measured rate for the gg → H → ZZ channel. Similarly, the ratios obtained from gluon fusion production with the decay modes H → W W , H → τ τ , H → bb and H → γγ probe the four ratios λ W Z = κ W /κ Z , λ γZ = κ γ /κ Z , λ τ Z = κ τ /κ Z and λ bZ = κ b /κ Z . Without loss of generality, κ Z and κ g are assumed to have the same sign, constraining λ Zg and κ gZ to be positive. Figure 7 compares the official fit results with the ones obtained from HiggsSignals-2, displaying the corresponding best-fit values for the different λ parameters. We find very good agreement between the results obtained from HiggsSignals-2 and the official results, though the HiggsSignals fits tend to have slightly larger 68 % and 95 % C.L. intervals. There are two reasons for these differences. Firstly, the Gaussian approximation may not be valid for the experimental uncertainties in all regions of parameter space. Secondly, the parameterization in terms of ratios should lead to partial cancellations of common theoretical uncertainties which are not expected to be entirely captured in the HiggsSignals results. While our result with the individual signal strength input reproduces the positive sign of the best-fit value for λ tg , the HiggsSignals-2 fit with the ATLAS and CMS combination input appears to prefer negative values. However, the ∆χ 2 between the negative and the positive best-fit value is tiny and thus there is no clear preference for one particular sign.

Parametrization using ratios of cross sections and branching fractions
Using the narrow-width approximation, the signal strength µ f i can be decomposed into the signal strength for production µ i = σ i /σ i,SM and the signal strength for the decay, µ f = BR f /BR f SM , with the short-hand notation BR f ≡ BR(H → f ).
Choosing the gg → H → ZZ channel as reference, the product of the production cross section σ i and the branching fraction BR f can be expressed as In accordance with the ATLAS and CMS analysis [4], we assume that the gluon fusion (ggF) and the bbH production signal strengths are equal, µ ggF = µ bbH , the H → Zγ and H → γγ decay signal strengths are equal, µ Zγ = µ γγ , and the H → gg, H → cc, and H → bb decay signal strengths are equal, µ gg = µ cc = µ bb .  HiggsSignals-2 using the combined input agrees well with the official result, we observe some larger discrepancies for various parameters when using individual measurements, most strikingly for σ ZH /σ ggF and BR bb /BR ZZ which are strongly anti-correlated. Furthermore the central values of σ ttH /σ ggF and σ VBF /σ ggF are shifted towards the SM prediction. We mainly relate these features to the fact that some analyses -in particular in CMS -have been improved for the combined result, but no updated individual measurements have been released. Figure 9 shows a comparison between the official ATLAS-only [57] (left panel ) and CMSonly [58] (right panel ) fit results and the corresponding results obtained with HiggsSignals-2 when using the individual signal strengths from ATLAS and CMS. HiggsSignals-2 reproduces the official ATLAS-only fit result very well. However, for the CMS-only fit we observe similar discrepancies as for Fig. 8, namely smaller values for σ VBF /σ ggF , σ ZH /σ ggF , and σ ttH /σ ggF as well as a larger value for BR bb /BR ZZ . In summary, the performed comparisons in all three model parametrizations have demonstrated very good agreement between the HiggsSignals implementation of the LHC Run-1 measurements -both using the individual and the combined experimental input -and the official ATLAS/CMS fit results. The agreement between the two possible HiggsSignals implementations is on the one hand a successful closure test of the HiggsSignals peakcentered χ 2 method, and on the other hand motivates our choice of using the LHC-Run-1 combined experimental measurements as default input for the LHC Run-1 legacy χ 2 evaluation in HiggsSignals-2, as described in Section 2.4. Computationally, this implementation is much faster. However, for very specific applications where the assumptions underlying the LHC Run-1 combination are not fulfilled, the experimental input from the individual Run-1 analyses is still available as the LHC7+8 observable set in the HiggsSignals package.

Examples for Run 2 Analyses in HiggsSignals-2
During Run 1 of the LHC, Higgs rate measurements were mainly represented in terms of signal strengths, µ = σ/σ SM , and coupling modifiers, κ i . For LHC Run 2 the experimental collaborations increasingly made use of the STXS framework to present their results (see Section 2.5). In some analyses, both STXS measurements and conventional signal strengths measurements in various event categories were presented, along with the correlation matrices, which are necessary to allow a comparison of the performance of the two experimental input formats.
In this section we discuss the performance of HiggsSignals-2 on the provided experimental input for a selection of LHC Run-2 analyses. These examples are primarily chosen to illustrate the level of agreement of the reconstructed HiggsSignals result with official results from ATLAS and CMS, and to highlight difficulties in the usage of the experimental results, which are often related to incomplete information in the public documentation of the experimental analysis. With the increasing amount of data during Run-2 the statistical uncertainty can be assumed to be Gaussian to very good approximation in most Higgs boson search channels. However, a decreasing statistical uncertainty also entails the fact that systematic uncertainties and their correlations among different measurements become more relevant. Therefore, it has become common practice for ATLAS and CMS to provide a correlation matrix of the experimental (statistical and systematic) uncertainties for the Run-2 measurements.
In the following we discuss the performance of the two input types (signal strength modifiers and STXS measurements) for a few selected Run  Table 6 and Table 7, respectively.

Input given in terms of signal strength modifiers
We discuss the CMS measurements in the H → W + W − channel at the LHC Run-2 as an example for analyses implemented in terms of signal strength modifiers. We find reasonable agreement between the reconstructed and official intervals and the corresponding best-fit point for µ ggF . However, the size of the allowed µ V H/VBF intervals and the observed anti-correlation between µ ggF and µ V H/VBF is not reproduced. The reason for this discrepancy is the lack of public information on signal efficiencies for the sub-channel rate measurements. The anti-correlation observed by CMS in the left panel of Fig. 10 indicates that these sub-channels are composed of signal contributions from both fermionic and bosonic production modes.
Information on sub-channel signal efficiencies was made available in the CMS H → W + W − analysis at 35.9 fb −1 [60]. The right panel of Fig. 10 shows the performance comparison for this analysis using the sub-channel signal strength results and the corresponding efficiencies. We find very good agreement between the reconstructed ∆χ 2 and the official likelihood results. This demonstrates the importance of publicly available detailed sub-channel information on signal strengths and signal efficiencies. The analysis performed by CMS with 35.9 fb −1 also includes first results for H → W + W − in the stage-0 STXS framework. However no information on the correlations between the STXS bins was provided for this analysis, which severely limits the usefulness of those STXS results. The performance achieved using this partial STXS input (not shown in the figure) is significantly worse than for the signal strength results in the right panel of Fig. 10.
A further update in the H → W + W − channel to 137 fb −1 [61] has since been released by CMS and is implemented in the current HiggsSignals datasets. The results of this analysis are given in terms of n-jet differential cross sections in the STXS framework. Per-bin signal efficiencies and inter-bin correlations are available as well. This analysis provides the most complete input to date, and we expect its implementation in HiggsSignals to be the best performing one. However, the analysis presented in Ref.
[61] does not include any interpretations that could be used for a performance comparison with HiggsSignals.

Input given in terms of Simplified Template Cross Sections (STXS)
We have seen that detailed information on sub-channel signal strengths and signal efficiencies are important when Higgs coupling measurements are given in terms of signal strengths. We now discuss some example applications for which the input to HiggsSignals-2 is given in the STXS framework instead. The diamonds indicate the SM prediction.
As a first example we study the HiggsSignals performance for ATLAS measurements in the H → ZZ * → 4 channel with 139 fb −1 of data [62] as STXS observables. These experimental results were given in 12 reduced Stage-1.1 STXS bins along with the correlation matrix for the experimental uncertainties ( Fig. 10 of Ref. [62]). The HiggsSignals ∆χ 2 distribution in the (κ V , κ F ) parameter plane based on this input, neglecting correlations of theoretical uncertainties on the STXS bin predictions, is shown in the left panel of Fig. 11 in comparison to the official ATLAS result (shown as gray contours). The agreement at lower values of κ V and κ F with the ATLAS results is very good. However, at larger values we find a small mismatch between the reproduced and official confidence region contours. In these regions the agreement can be improved if correlations of theoretical uncertainties on the gluon fusion STXS bin predictions are included in the χ 2 calculation, as shown in the right panel of Fig. 11. These correlations were evaluated by the ggF-subgroup of the LHC HXSWG and have been taken from Ref. [63] ("2017 scheme"). The evaluation of similar correlations for the STXS bins of other production modes is still in progress. As can be seen these correlations lead to a flattening of the likelihood at large coupling scale factors, i.e. in the regions where the corresponding cross sections (and thus their uncertainties) are larger than the SM prediction.
The next analysis that we consider here is the CMS measurement in the H → τ τ decay channel at 77 fb −1 [64]. CMS provides cross section measurements for nine different kinematic regions together with the expected acceptances, the SM predictions and the correlations between the bins. Figure 12 shows the comparison between the official and the reproduced ∆χ 2 contours in the (µ ggF,bbH , µ V H,VBF ) (left panel ) and (κ V , κ F ) (right panel ) parameter planes. We find reasonable agreement in the former but a substantial disagreement in the latter. The source of this discrepancy can be traced back to the fact that CMS included the contribution from H → W W to the eµ final state to remove an unconstrained direction along κ V . As the H → W W contribution is not accounted for in the presented STXS measurements it is not possible to properly implement it in HiggsSignals. We therefore find this open direction. However, in a global picture where H → τ τ and H → W W are simultaneously taken into account in HiggsSignals, the flat direction is lifted. This is already the case when adding only the measurements in the eµ final state e.g. from the dedicated CMS H → W W analysis [60].
As a final example in this context we discuss the ATLAS combination of Higgs analyses in the γγ, ZZ * , W W * , τ + τ − , bb and µ + µ − final states based on up to 79.8 fb −1 of Run-2 data [65]. Within the STXS framework -under the assumption that the observed signals are associated with a single particle -the various measurements are combined to determine the cross sections in various STXS bins. These bins represent a production process in a specific kinematic regime, e.g. gluon fusion in association with one additional jet, and a Higgs boson transverse momentum of 60 GeV ≤ p H T ≤ 120 GeV, times the branching fraction of the Higgs boson to Z bosons, BR(H → ZZ * ). In addition, ratios of branching ratios are determined for the various final states, with BR(H → ZZ * ) taken in the denominator. Here we use these measurements (given in Fig. 9 of Ref. [65]) and the corresponding correlation matrix as experimental input for HiggsSignals.
The performance of HiggsSignals for the results of the ATLAS Higgs combination is shown in Fig. 13, where the left panels display the results in the (κ V , κ F ) parameter plane and the right panels display the results in the (κ g , κ γ ) plane. For illustration, the top panels show the resulting likelihood profile if correlations both between experimental and theoretical uncertainties are neglected. We observe a clear mismatch in size, shape and location of the allowed regions with respect to the official ATLAS result (shown as gray contours). Once we include the correlations of experimental uncertainties the agreement of the reproduced and official confidence regions strongly improves, as shown in the middle panels of Fig. 13. Finally, the bottom panels show the result when correlations of theoretical uncertainties in the gg → H STXS bins [63] are also included (see above for details). From the comparison of the middle and bottom panels in Fig. 13, we find that these correlations have a rather small impact, giving rise to a further slight improvement of the agreement between the HiggsSignals result and the official result.
It should again be noted here that combination results with a separate determination of production and (ratios of) decay rates rely on the assumption that the signals are associated with a single Higgs boson. Thus, these combined experimental results cannot be directly treated as peak or STXS observables in HiggsSignals, where per default a signal can be composed Figure 13.: Performance test in the (κ F , κ V ) plane (left panels) and the (κ g , κ γ ) plane (right panels) using the STXS measurements of the ATLAS Run-2 Higgs combination with 80 fb −1 as HiggsSignals input. The contours and best fit points are indicated as in Figs. 10 to 12. Correlations of experimental uncertainties are neglected in the top panels but included in the middle and bottom panels. Theoretical rate uncertainties for the gg → H process are treated as uncorrelated in the top and middle panels and as correlated in the bottom panels, see text for further details.
of any superposition of Higgs bosons in the model (see Section 2.6.3). Therefore, in the officially provided observable sets we instead use the individual (uncombined) measurements as default experimental input. If only one Higgs boson is present in the model (or the other Higgs bosons have masses far away from 125 GeV), the combined measurements can still be safely used as experimental input (provided as observable set ATLAS_combination_Run2 in the HiggsSignals package).

Recommendations for the presentation of future Higgs signal rate measurements
Based on our past experiences and the observations in the above performance tests we summarize in this section the essential experimental information needed to enable an accurate reproduction of the model interpretations of the data (e.g. in the κ framework) provided by ATLAS and CMS and thus a reliable application of the experimental measurements in BSM model tests. This concurs with and partly extends recent recommendations on the presentation of LHC measurements provided in a joint effort by the experimental and theory community [66], as well as similar recommendations that we recently provided in the context of BSM Higgs limits [9]. We furthermore discuss the issue of the remaining model dependence in Higgs signal rate measurements, and some ideas for a possible reduction of the model dependence.
Early LHC results on the Higgs signal rates have been presented as signal strength modifiers (µ) in specific signal channels -often binned in various experimental categories -targeted by the experimental analysis. These measurements assume the kinematic properties of the SM Higgs boson, relying on a SM MC simulation of the inclusive signal process. The result -the signal strength modifier, µ -is defined as the observed signal rate normalized to the expected signal rate for a SM Higgs boson.
The complexity of calculating the expected signal rate in a BSM model depends on whether the Higgs boson candidate has -exactly or to a sufficiently accurate approximation -the same kinematic properties as the SM Higgs boson. If so, the signal rate can be predicted within the BSM model easily without MC simulation if the signal efficiencies (or signal compositions) for the corresponding experimental measurement are known. This is the approach pursued per default in HiggsSignals, see Eq. (1). It is therefore essential that the signal efficiencies (or signal compositions) expected for a SM Higgs boson are provided for all µ measurements. In contrast, if the Higgs boson candidate has significantly different kinematic properties compared to the SM Higgs boson, the signal needs to be processed through the full experimental analysis, which includes a full MC simulation (including detector simulation) of the signal and the application of analysis cuts. Given the high complexity of most experimental analyses (using, for instance, machine learning techniques) this is often impossible for theorists (who do not have access to full simulations for the LHC detectors) and should preferentially be carried out by the experimental collaborations.
In order to ameliorate this situation, i.e. to allow a proper application of measurements also to Higgs boson candidates with different kinematic properties than in the SM, the STXS framework has been introduced. An STXS observable is defined on a specific region in phase space at the MC particle level. The experimental unfolding process only relies on the kinematic properties of the SM Higgs boson within this specific phase space. 26 The model-dependence is therefore reduced, which, in turn, allows a wider application of these measurements to BSM models. At the moment, STXS observables are typically binned in jet multiplicity and/or Higgs boson transverse momentum (p H T ), depending on the Higgs boson production process. As more LHC data is accumulated a finer STXS binning as well as binning in additional kinematic variables (e.g. angular observables) becomes possible, hence various STXS stages have been defined (or are still being defined). In this way, the model assumptions can be further reduced with increasing amounts of data.
As the STXS observables are defined for specific particle level topologies of the Higgs boson production process, these can incorporate several production processes that depend on different Higgs couplings. For instance, gg → H(+jets) STXS observables target the gluon fusion production mode including gg-induced EW corrections. These are composed of virtual EW corrections to the gg → H form factor as well as real EW corrections, corresponding to gg → (Z → qq)H. In HiggsSignals, gluon fusion and gg → ZH are, however, treated as separate processes, as their dependences on the Higgs coupling properties are different. Another example are the STXS observables of the class "EW qqH" composed of the vector boson fusion (VBF) as well as qq → (V → qq)H (with V = W, Z) processes, which all three are treated as separate processes in HiggsSignals. While higher STXS stages aim at separating these subprocesses, the earlier stages must be regarded as inclusive in these processes. 27 For such STXS measurements it would be beneficial to publicly release the signal efficiencies (or signal compositions) for the involved processes, analogous to the case of µ measurements, see above. Unfortunately, this has not been done by the experiments so far.
For both the inclusive (µ) and STXS measurements, the correlations of uncertainties -preferably given separately for experimental and theoretical sources -need to be included in the χ 2 test in order to achieve a good performance. As the unfolding to pure production channels 26 While this is true for the central measurement, uncertainty correlations also depend on the SM Higgs boson predictions (and thus the assumed SM Higgs boson kinematics) in other phase space regions. 27 The general claim for early STXS stage measurements is that, at the present level of precision, these processes cannot be resolved. However, we want to remark that this claim relies on the assumption of SM signal strengths for all involved processes, and may not be true if a BSM model predicts a strong enhancement in one or more of these processes. and/or STXS bins often induces large correlations, experimental measurements of this kind are nowadays always accompanied by the corresponding correlation matrix. However, even µ measurements in experimental categories (which are therefore composed of various Higgs processes) have correlated experimental and theoretical uncertainties. Therefore it would be useful if the experiments could provide correlation matrices also for such cases, which so far has rarely happened.
We would furthermore like to encourage the experiments to accompany all signal strength and STXS measurements with a reference value for the signal rate expected for a SM Higgs boson. In case the measurement is quoted as a normalized signal strength, this allows one to recalculate the absolute observed signal rate. Furthermore, the signal strengths predicted in BSM models can in many cases be approximated by a simple rescaling of the SM reference value.
Lastly, we want to emphasize that the two-dimensional toy model interpretations -(κ V , κ F ) and (κ g , κ γ ) in the κ framework, as well as the production rate rescaling models -have proven extremely valuable for validation checks. Unfortunately, we have recently experienced that these interpretations are sometimes not performed when new experimental results are released, see e.g. Ref.

Summary
We have presented a new version of the public computer program HiggsSignals for confronting the predictions of arbitrary BSM models with the measured mass and rates of the Higgs boson that has been detected at about 125 GeV. The description of HiggsSignals-2 provided in the present paper has focussed on the improvements of the functionality and applicability of the program with respect to version 1.0 that was presented in Ref. [10]. Besides the confrontation with the properties of the detected Higgs signal, the phenomenological viability of BSM Higgs sectors should also be tested against the limits that have been obtained from the searches for additional Higgs bosons -a task that can be performed using the related public tool HiggsBounds [5][6][7][8][9].
On the basis of theoretical model input from the user -in the form of predicted Higgsboson masses, production cross sections, and decay rates in the HiggsBounds input format -HiggsSignals evaluates a χ 2 measure to quantify the compatibility of the measured Higgs properties with the model predictions as discussed in Section 2. The peak-centered χ 2 method is used to evaluate the χ 2 contribution from signal strength (µ) measurements, while STXS measurements give rise to a χ 2 contribution that is separately calculated in a newly introduced module. In addition, the LHC Run-1 results give rise to another χ 2 contribution, employing the legacy ATLAS and CMS combination of the Run-1 results. Lastly, the Higgs mass measurement(s) lead to another contribution to the total χ 2 . In total, the current version HiggsSignals-2.6.0 includes 86 Run-2 signal rate measurements, 20 Run-1 measurements, and one (combined) Higgs mass measurement as default experimental input.
The description in this paper has addressed the calculation of the various χ 2 parts, comprising as new feature the incorporation of experimental results in the form of STXS measurements as well as further improvements that have been highlighted. We have discussed possible features of BSM Higgs sectors that can have an important impact in the comparison with the data. This includes cases where the signal efficiencies differ significantly from the ones of a SMlike Higgs boson, the treatment of theoretical uncertainties in the Higgs mass predictions, and scenarios where the signal is composed of overlapping contributions from several Higgs bosons.
In order to give some guidance on possible applications of HiggsSignals-2 we have thoroughly discussed the different interpretations of the χ 2 output provided by HiggsSignals and spelled out the involved assumptions. In particular, we have explained the application of the obtained χ 2 output in fits, for limit setting, and for goodness-of-fit tests. We furthermore discussed possible attempts of using HiggsSignals for ruling out the SM on the basis of possible deviations between the measurements and the SM predictions, and stressed important caveats in this context. The experimental results used by HiggsSignals-2 are either signal strength modifiers,μ, in the various search channels and experimental categories, or measured signal rates provided in the simplified template cross section (STXS) framework. In both cases the results have to be supplemented by their experimental uncertainties, ∆μ and ∆σ, respectively. As performance tests of HiggsSignals-2 we have presented several detailed validations against published AT-LAS and CMS results in Section 4. These comparisons illustrate that the achievable agreement with the official results published by ATLAS and CMS strongly depends on the information made available by the experiments. As a first step, we compared to the ATLAS/CMS combination of the full Run-1 data within the κ framework for different assumptions regarding the total width of the Higgs boson and found very good agreement. Second, in our validations of Run-2 results against κ fits performed by ATLAS or CMS, we found the best agreement if the sub-channel signal strength modifiers are given together with the corresponding signal efficiencies. If signal strengths are given in terms of the targeted production processes, the best-fit point is typically well reconstructed by HiggsSignals-2 while the correlations cannot be correctly reproduced. Therefore, we expect further improvements from additional information about correlations of experimental uncertainties. In case of STXS measurements we have seen that information about experimental correlations is crucial in order to obtain reconstructed results that are close to the ones published by ATLAS and CMS. If STXS measurements are not given in terms of pure signal channels, for example by using intermediate stage-1 bins, additional information about production processes could further improve the reconstruction (e.g. the W H and ZH composition in the leptonic V H bin for the considered ATLAS H → γγ analysis). On the basis of those findings we have formulated some recommendations for the presentation of future Higgs signal rate measurements.
The HiggsSignals source code is available at https://gitlab.com/higgsbounds/higgssignals together with continuously updated technical documentation of the program's subroutines, compilation and interfacing procedures.

A. Implementation of STXS observables in HiggsSignals
Each STXS observable is defined by an individual file with the extension '.stxs' located in the observable set folder. The observable set can contain signal strength (µ) observables and STXS observables simultaneously, however, the user should avoid statistical overlap of the included measurements. The structure of the .stxs files is exemplified in Table 5. The integer observable ID has to be unique among all STXS observables. As mentioned earlier, the STXS framework also allows one to evaluate a χ 2 contribution from potential Higgs mass measurements. The mass measurement must be associated with one STXS observable and is specified on line 7 of the .stxs file. It is activated by setting the mass-observable flag on line 6 to 1. Independently of this mass measurementm and its quoted 1σ uncertainty ∆m, line 8 and 9 of the .stxs file always specify a reference mass, for which the rate measurement has been performed, as well as an estimate of the mass resolution. The latter is decisive on whether a Higgs boson in the model is assigned to the observable, i.e. whether its signal rate is assumed to contribute to the observable or not (see Section 2.6.3). Line 10 of the .stxs file specifies the number of relevant signal channels, N c , and another reference mass for the quoted efficiencies (which usually coincides with the previous reference mass for the measurement). Line 11 lists the N c channel IDs (separated by white spaces), see Table 1, and line 12 the corresponding relative channel efficiencies in the SM, SM,i . The last two lines (13,14) quote the observed and SM-predicted signal rate and the lower and upper 1σ range, as quoted in the experimental result. If the rate is given in SM-normalized form (SM-normalized-rate flag set to 1), the last line is ignored.
The STXS correlation matrix should be given in an additional file (.stxscorr) in the same folder. The file contains three columns, observable ID 1 observable ID 2 correlation coefficient, such that each line encodes the correlation between two observables. Tables 6 and 7 list all LHC Run-2 Higgs signal measurements from ATLAS and CMS, respectively, which are implemented as the default observable set in HiggsSignals-2.6.0.   Table 7.: CMS Higgs rate measurements from LHC Run-2 included in the default observable set LHC13_Apr2020 in HiggsSignals-2.6.0.