Searching for degenerate Higgs bosons - A profile likelihood ratio method to test for mass-degenerate states in the presence of incomplete data and uncertainties

Using the likelihood ratio test statistic, we present a method which can be employed to test the hypothesis of a single Higgs boson using the matrix of measured signal strengths. This method can be applied in the presence of incomplete data and takes into account uncertainties on the measurements. The p-value against the hypothesis of a single Higgs boson is defined from the expected distribution of the test statistic, generated using pseudo-experiments. The applicability of the likelihood-based test is demonstrated using numerical examples with uncertainties and missing matrix elements.


Introduction
A new resonance, consistent with the standard model (SM) Higgs boson and with a mass of approximately 125 GeV, has been observed by ATLAS and CMS collaborations at Large Hadron Collider (LHC) [1][2][3]. For each production tag and decay mode of the boson, the signal strength can be been defined in terms of the observed production cross section and branching ratio relative to the SM expectation as: where the indexes i and j stand for the different production tags and decay modes, respectively. The available experimental information can be arranged in a matrix, M , of signal strengths: ggH µ ggH,γγ µ ggH,WW µ ggH,ZZ µ ggH,ττ µ ggH,bb VBF µ VBF,γγ µ VBF,WW µ VBF,ZZ µ VBF,ττ µ VBF,bb VH µ VH,γγ µ VH,WW µ VH,ZZ µ VH,ττ µ VH,bb ttH µ ttH,γγ µ ttH,WW µ ttH,ZZ µ ttH,ττ µ ttH,bb The rows of M represent the four main production modes: gluon fusion (ggH), vector boson fusion (VBF), associated production with a Z or W boson (VH), and associated production with a top quark pair (ttH). The columns represent five decay modes 1 : γγ, WW, ZZ, ττ and bb.
It has been suggested that there may be more than one state nearly-degenerate in mass, which could couple differently to the SM particles and produce the observed signal. If more than one state is present, it may be possible to use the rank of M to test for multiple states without having to directly resolve the resonant peak(s), an approach limited by the experimental mass resolution to few GeV [4][5][6]. To that end, there are two main challenges that need to be overcome in order to make use of all the experimental information encoded in M : 1. Some µ i, j have not yet been measured, such as µ ttH,ZZ , and are therefore missing, raising the issue of censored data. 2. The precision to which different µ i, j have been measured varies greatly, with the more precise measurements usually being related to the processes with largest production cross-sections.
Inspired by the work presented in Ref. [4], we noticed that while their approach tackled the different precision of the measurements, it was not able to take into account the missing data aspect. The method presented here addresses the question of whether the experimental observations are compatible with a single Higgs boson using the full information encoded in M , i.e. coping with missing µ i, j and taking into account the uncertainties on the µ i, j .
2 The 2 × 2 case Consider a 2 × 2 sub-matrix of M , namely the one with the ggH and VBF production modes and the γγ and WW decay channels: Ignoring the uncertainties on the µ i, j elements, if there is only one Higgs boson the determinant of this 2 × 2 matrix is zero and consequently the matrix has rank 1.
However, if there are two particles involved, each signal strength in the matrix can be written as the sum of two terms: where the superscripts 1 and 2 concern each of the two states. When replacing all four matrix elements with sums of form Eq. (4), the determinant becomes, in general, not equal to zero and in that case the matrix rank will be 2. Instead of the determinant, when studying the rank of a 2 × 2 matrix, a double ratio can be used without loss of generality: This is equivalent to the double ratios introduced in Ref. [7] and the expectation is that ρ = 1 if there is only one state. If there are two states, one has and ρ can take a range of values, implying the rank of the matrix is 2. The rank is a discrete quantity that can be exactly computed if the elements of the matrix are also exactly known. If there are independent uncertainties on the matrix elements, the rank cannot be precisely determined. Nevertheless, it is still possible to assess the statistical compatibility of the matrix with one of rank 1. In previous works [4,7], it was proposed to assess the statistical compatibility via quantities such as ρ which can be assigned an uncertainty obtained by propagating the uncertainties on the matrix elements.

The profile likelihood ratio test
In order to evaluate the statistical compatibility of the matrix of observations having rank 1, a model of the matrix elements must be assumed. Such a model defines the correlations between the different µ i, j , which are parameters of interest in the fit to data, but may also include all other parameters that affect the measurements, such as systematic uncertainties. The latter can be treated as nuisance parameters using the profile likelihood method and their values are obtained from a fit to the data [8].
The likelihood is the probability to observe the data assuming that a given model is true. Using the likelihood function L(data|α, β ), where α are parameters of interest and β are nuisance parameters, we define the test statistic q = q(α) based on the profile likelihood ratio [8][9][10][11][12][13]: whereβ is the value of β that maximizes the likelihood function for the given values of α, whereasα andβ are the values that maximize the likelihood function overall.

Application to the 2 × 2 case
Let's consider again a 2 × 2 matrix and no nuisance parameters. In this case, the test statistic q(ρ) can be constructed for the double ratio ρ from Eq. (5): When testing for compatibility with rank 1, the relevant quantity is q(ρ = 1), where the numerator is the likelihood to observe the data assuming a rank 1 matrix, whereas the denominator is the likelihood to observe the data assuming the most general 2 × 2 matrix.

Application to M
The likelihood ratio defined in Eq. (8) can be generalized to any matrix size, which includes M , a 5 × 4 matrix. In that case, the denominator is the profile likelihood for the model of the most general 5×4 matrix while the numerator uses the model of a general rank 1 matrix, which we compose as the tensor product of two vectors: (µ γγ , µ WW , µ ZZ , µ ττ , µ bb ) and (1, λ VBF , λ VH , λ ttH ), where µ j = µ ggH, j is the signal strength in for gluon fusion production and decay mode j, and λ i = µ i, j /µ ggH, j are the relative scaling factors between the signal strength in production mode i and that for gluon fusion. Note that, as expected, the rank 1 model entails that the λ i are the same for all decay modes.
The matrix under each of the alternative hypotheses is: 1. A general rank 1 matrix with eight parameters µ j , λ VBF , λ VH , and λ ttH : The most general 5×4 matrix with twenty parameters µ j , λ j VBF , λ j VH , and λ j ttH : If the matrix has rank 1, the rows in these two models are the same, i.e. λ VBF ≡ λ j VBF , λ VH ≡ λ j VH , and λ ttH ≡ λ j ttH . The parameters of interest are λ VBF (λ j VBF ), λ VH (λ j VH ), and λ ttH (λ j ttH ), while the signal strength parameters are profiled as they have no bearing on the rank of the matrix.
The test statistic q λ is defined using ratio between the profile likelihood of the two aforementioned models: .
The numerator is the profile likelihood of data assuming the most general rank 1 matrix and the denominator is the profile likelihood of data assuming the most general 5 × 4 matrix.
As noted above, if there is only one Higgs boson the λ i do not depend on the decay mode and the value of the q λ is not very large. However, if the rank is not equal to 1, the most general 5 × 4 matrix model will fit the data better than the general rank 1 matrix model and the value of q λ will be large.
When dealing with more general parameters than the double ratio ρ, the expected distribution of the test statistic is generated using the Monte Carlo simulation. The p-value for the observation is evaluated from the expected test statistic distribution under the hypothesis of the SM Higgs boson.

Numerical examples
For ease of comparison, and even if more recent data are available [6,[14][15][16][17][18][19][20][21][22][23][24][25][26][27], the following numerical examples are carried out using the same signal strength matrix presented in Ref. [4], namely: In the following sub-sections we will consider, in turn, sub-matrices of Eq. (10), building up to the analysis of all the information it provides. While any correlated uncertainties between the matrix elements have not been taken into account, experiments can and should take them into account when using this method.

(ggH, VBF) × (γγ, WW, ττ)
As a first example, consider the 2 × 3 sub-matrix spanned by (ggH, VBF) × (γγ, WW, ττ) with signal strengths taken from the Eq. (10) above. As an approximation, each signal strength is modeled by a Gaussian distribution using the likelihood function for a counting experiment with large event yields. The counting experiment is set up as follows: the expected number of background events B exp is fixed to 1000 and the number of observed events N obs and the expected number of signal events S exp are chosen to reproduce the observed signal strength µ i, j and its uncertainty. Using this approach, the observed value of the test statistic is q obs λ = 4.02.
Pseudo-experiments are used to estimate the expected distribution of the test statistic under the hypothesis of a single particle. The pseudo-data are generated according to the expectation for the SM Higgs boson hypothesis, where µ j = λ VBF = λ j VBF = 1. The expected test statistic distribution is shown in Fig. 1. Small negative values of q λ are expected from the limited numerical precision when evaluating q λ , when q λ ∼ 0. On the other hand, large negative values are due to the failure to fit the pseudo-data with the more general model (in the denominator of q λ ) while the rank 1 model fit converges. These are rare cases, present in this particular scenario which has large input uncertainties. This is in contrast with the scenario considered in the following sub-section, where these large negative values are absent, as expected from the reduced uncertainties.
The p-value for the observed outcome can be derived using the distribution of the test statistic in pseudo-data shown in Fig. 1 by counting how many outcomes have q λ ≥ q obs λ : This means that if the matrix has rank 1 we would expect an outcome as extreme, or more extreme, than the observation in (20.90 ± 0.03)% of the cases, where the uncertainty quoted is purely statistical and is calculated from the 68.2% Clopper-Pearson interval [28]. The p-value can be translated to into a z score and for that conversion we use the one-tail convention. The z score for the observed p-value above is 0.81 standard deviations (σ ) and represents a (low) significance against the rank 1 hypothesis. This result is in contrast with the findings of Ref. [4], namely 0.4σ .
As would be expected, there is no change in the results when extending the matrix to include also the ZZ information from Eq. (10). The reason is that having one single element in any given row or column makes no statement about the rank-related parameters, λ i .

Luminosity extrapolation
For a different type of numerical example, let us examine the same 2 × 3 sub-matrix as above, but scaling the signal strength uncertainties with 1/ √ L = 1/ √ 10, a naive way of simulating a 10-fold increase in the amount of data, and also done in Ref. [4]. Because of the decreased uncertainties, the observed value of the test statistic would become q high−lumi λ = 38.7. The expected test statistic distribution can be seen in Fig. 2 and the p-value is found to be: which would correspond to a 4.15 ± 0.04σ significance for rank larger than unity.
From the comparison of the two test statistic distributions in Figs. 1 and 2, one notes the effect that the reduction of the uncertainties on the signal strengths has in decreasing the number of outcomes with q λ < 0, as previously described.
The observed value of the test statistic is found to be q 3×3 λ = 4.28. The expected test statistic distribution can be seen in Fig. 3 and the p-value is: which corresponds to a 0.4σ significance for rank larger than unity. 4.4 (ggH, VBF, VH) × (γγ, WW, ττ) with removed elements To demonstrate way in which the proposed technique deals with censored data, two different test with absent elements are presented.
Two elements removed In a first test, the two elements with large uncertainties, namely µ VBF,γγ and µ VH,WW , have been removed. The observed value of the test statistic is found to be q 3×3 λ ,mis.2 = 0.28 and the p-value, defined from the test statistic distribution in Fig. 4, is which corresponds to a −0.86σ significance for rank larger than unity.  Fig. 4 The distribution of the test statistic, generated using the 3 × 3 sub-matrix of the Eq. (10), spanned by (ggH, VBF, VH) × (γγ, WW, ττ), with the signal strengths µ VBF,γγ and µ VH,WW removed, and assuming the SM Higgs boson hypothesis. The observed value of q λ is q 3×3 λ ,mis.2 = 0.28 and, under the SM Higgs boson hypothesis, the fraction of pseudo-experiments for which q λ ≥ 0.28 is 0.81.
In this case, a signal strength measurement µ ττ is absent, implying that the parameterization for the most general rank 1 matrix has more free parameters than there are measurements. This presents a technical difficulty for the numerical minimization used in calculating the profile likelihood. Nevertheless, and without loss of generality, the missing µ j can be set to unity in the denominator of q λ , Eq. (9).
The observed value of the test statistic is found to be q 3×3 λ ,mis.3 = 0.31. The test statistic distribution under the SM Higgs hypothesis can be seen in Fig. 5 and the p-value is p SM,3×3,mis.3 =P(q λ ≥ q 3×3 λ ,mis.3 |µ j ≡ 1, λ j i ≡ 1) =0.44, which corresponds to a 0.15σ significance for rank larger than unity.  Fig. 5 The distribution of the test statistic, generated using the 3 × 3 sub-matrix of the Eq. (10), spanned by (ggH, VBF, VH) × (γγ, WW, ττ), after removing three elements, and assuming the SM Higgs boson hypothesis. The observed value of q λ is q 3×3 λ ,mis.3 = 0.31 and, under the SM Higgs boson hypothesis, the fraction of pseudoexperiments for which q λ ≥ 0.31 is 0.44.

Conclusion
We have developed a method that can be used to test for the presence of multiple Higgs bosons. The method can be directly used by the ATLAS and CMS collaborations as it builds upon the profile likelihood techniques already in use for Higgs measurements at the LHC. The main features of method are that it can test for (in)dependence in arbitrarilysized linear systems in the presence of uncertainties and of missing elements of data.