8D Likelihood Effective Higgs Couplings Extraction Framework in the Golden Channel

In this paper we build a comprehensive analysis framework to perform direct extraction of all possible effective Higgs couplings to neutral electroweak gauge bosons in the decay to electrons and muons, the so called `golden channel'. Our framework is based on a maximum likelihood method constructed from analytic expressions of the fully differential cross sections for $h \rightarrow 4\ell$ and for the dominant irreducible $q\bar{q} \rightarrow 4\ell$ background, where $4\ell = 2e2\mu, 4e, 4\mu$. Detector effects are included by an explicit convolution of these analytic expressions with the appropriate transfer function over all center of mass variables. Using the full set of decay observables, we construct an unbinned 8-dimensional detector-level likelihood function which is continuous in the effective couplings and includes systematic uncertainties. We consider all possible $ZZ$, $Z\gamma$ and $\gamma\gamma$ couplings, allowing for general CP odd/even admixtures and any possible phases. We describe how the convolution is performed and demonstrate the validity and power of the framework with a number of supporting checks and example fits. The framework can be used to perform a variety of multi-parameter extractions, including their correlations, to determine the Higgs couplings to neutral electroweak gauge bosons using data obtained at the LHC and other future colliders.


I. INTRODUCTION
The recent discovery of the Higgs boson at the LHC [1,2] with properties resembling those predicted by the Standard Model, shifts our focus to the determination of its precise nature. It is important to establish whether or not the Higgs boson possesses any anomalous couplings to Standard Model particles. In this study we focus on couplings to neutral electroweak gauge bosons. Since these 'anomalous effects' are expected to be small if at all present, constraining or measuring of these couplings should preferably be done through direct parameter extraction with minimal theoretical assumptions. The vast literature  on Higgs decays to four charged leptons (electrons and muons) through neutral electroweak gauge bosons, suggests that the so called 'golden channel', can be a powerful process towards accomplishing this goal.
A number of frameworks have been presented utilizing the Matrix Element Method to study the golden channel aiming to determine these potentially anomalous couplings. These primarily rely on Monte Carlo generators such as the JHU generator [13,17,32] or on Madgraph implementations [22,31]. They have the advantage of flexibility to include various Higgs production and decay channels and are especially useful for constructing kinematic discriminators to distinguish between competing hypotheses. The work we present here focuses on the golden channel as well, but we propose a novel analysis framework largely based on an analytic implementation. It is designed to maximize the information contained in each event towards direct extraction of the various effective Higgs couplings. It is generally acknowledged in the literature that analytic methods are optimal for performing direct multi-parameter extraction within practical and reasonable computational processing resources [13,17,32]. In this work, we demostrate that within an analytic framework one can readily include the relevant detector effects in order to obtain a detector-level likelihood function in terms of the full set of observables available in the four lepton final state.
Our framework is based on a maximum likelihood method constructed from analytic expressions of the fully differential cross sections for the h → 4 decay (this includes the ZZ, Zγ, and γγ effective couplings and 4 = 2e2µ, 4e, 4µ) as well as the dominant irreducible qq → 4 background. These expressions have been computed and validated in [19,35,36]. Using the full set of decay observables available in the golden channel we build the complete 8-dimensional detector level likelihood in order to perform parameter extraction of the various arXiv:1401.2077v1 [hep-ex] 9 Jan 2014 possible Higgs couplings, including general CP odd/even admixtures and any possible phases. The method does not rely on hypothesis testing or on the construction of kinematic discriminants, but instead on a maximization of the full likelihood to find the value of the parameters for which one obtains the global maximum. Pseudoexperiments are performed in order to determine the precision of our framework in extracting or constraining the values of these couplings.
Detector effects are included by the explicit convolution of a transfer function, encapsulating the relevant detector effects, with the generator (truth) level probability density formed out of the signal and background differential cross sections. After performing this 12-dimensional convolution integral and its normalization, we are left with a probability density function (pdf ) from which we construct an unbinned detector-level likelihood which is a continuous function of the effective couplings (or Lagrangian parameters) and takes as its input, reconstructed (detector-level) center of mass observables. For our observables to be used in the likelihood we use the full set of eight center of mass variables available in the h → 4 decay (after averaging over the four production variables) thus optimizing the power of the golden channel final state. Finally, systematic uncertainties are included in the likelihood via the use of nuisance parameters.
As discussed in [13,17,32] the advantage of this approach is that the likelihood can be maximized for a large set of parameters in the most optimal way without losing information. In constructing this likelihood we have overcome many of the technical challenges which in the past have made it impossible to use the fully multidimensional detector level likelihood in multi-parameter fits. In addition to taking into account detector and systematic effects as mentioned above, these challenges include: i) parametrizing both the signal and background in a multi-dimensional space, ii) efficient and accurate normalization of the pdfs and iii) achieving a high degree of convergence as well as a high degree of stability in locating the global maximum of the likelihood in the multi-parameter fit procedure. Our construction eliminates almost entirely the need to build multi-dimensional templates when including detector effects, which can require 'smoothing' and large computing time, and thus is not dependent on Monte Carlo statistics. Once this likelihood is constructed for a given phase space, we perform a variety of multi-parameter extractions including all correlations within minimal computing time (on the order of seconds for a relevant dataset) and with the same degree of flexibility as was done at generator-level in [35].
We present the overview of our analysis framework, the procedure for how the convolution with the transfer function is performed as well as show various validation and consistency checks. We also demonstrate the validity and potential of our machinery by performing example fits to pseudodata and extracting the Higgs couplings to neutral electroweak gauge bosons via maximization of the likelihood. Since we are only seeking to demonstrate the viability of our framework, we perform only a few example 'toy' fits and make a small number of simplifying assumptions to be discussed below which minimally affect our results. A more thorough and detailed examination with precise quantification of our ability to extract the various couplings is done separately as part of an ongoing study [37]. The full details and validation studies of our framework can also be found in accompanying studies [19,35] as well as a detailed technical note [36] to which we refer the interested reader for more information. We expect this framework will be useful for performing parameter extraction and determining the Higgs couplings to neutral electroweak gauge bosons using the LHC data and in future collider studies. This framework will be useful if additional scalars are discovered with couplings to electroweak gauge bosons.
The organization of this paper is as follows: in Sec. II we briefly review the kinematics of the four lepton final state as well as the Higgs to V V (V V = ZZ, Zγ, γγ) tensor couplings and discuss how the analytic expressions of the differential cross sections are obtained. In Sec. III we present the strategy for including the relevant detector effects and constructing the detector level pdfs. In Sec. IV we present how the final detector-level likelihood is obtained and the inclusion of systematic uncertainties. In Sec. V we discuss the fit and statistical procedure used to perform the parameter extraction. Finally in Sec. VI we present results obtained from fits to pseudodata. In the Appendix we present some additional results as well as details on the transfer function.

II. PARAMETRIZATION OF DIFFERENTIAL CROSS SECTIONS
Analytic expressions have been shown to be useful in likelihood methods where the full kinematics of an event can be exploited. This is especially true for the golden channel as has been demonstrated in numerous studies [13-15, 17, 18, 29, 32, 35]. In this section we briefly discuss the parametrization of the differential cross sections for the h → 4 signal process as well as the qq → 4 background process. For a detailed description of the analytic calculations for the signal and background fully differential cross sections as well as their validation we refer the reader to accompanying studies [19,35] and an accompanying technical note [36].

A. Decay Observables
Here we describe the various center of mass variables which will be used as our set of observables when constructing the likelihood. The kinematics of four lepton events are illustrated in Fig. 1. The invariant masses are defined as: The invariant mass of the four lepton system or the Higgs mass in case of signal.
• M 1 -The invariant mass of the lepton pair system which reconstructs closest to the Z mass.
• M 2 -The invariant mass of the other lepton pair system and interpreted as M 2 < M 1 .
These variables are all independent subject to the constraint (M 1 + M 2 ) ≤ √ŝ . Note also that the 4e/4µ final state can be reconstructed in two different ways due to the identical final state interference. This is a quantum mechanical effect that occurs at the amplitude level and thus both reconstructions are valid. The definitions M 1 and M 2 remain unchanged however.
The angular variables are defined as: • Θ -The production angle between the momentum vectors of the lepton pair which reconstructs to M 1 and the total 4 system momentum.
• θ 1,2 -Polar angle of the momentum vectors of e − , µ − in the lepton pair rest frame.
• Φ 1 -The angle between the plane formed by the M 1 lepton pair and the 'production plane' formed out of the momenta of the incoming partons and the momenta of the two lepton pair systems.
• Φ -The angle between the decay planes of the final state lepton pairs in the rest frame of the 4 system.
We group the angular variables as follows Ω = (Θ, cos θ 1 , cos θ 2 , Φ 1 , Φ). There is a sixth angle, referred to here as the offset angle φ, defining a global rotation of the event which has a flat distribution and thus is not shown. However, it is important to keep track of, when performing the convolution to include detector effects. There are also the production variables associated with the initial partonic state four momentum. This four momentum defines the invariant mass of the center of mass system ( √ŝ ), as well as the rapidity (Y ) defined as the motion along the longitudinal direction, and the momentum in the transverse direction ( p T ). In principal the inclusion of Y and p T as observables would increase the discriminating power of the golden channel, but as we are interested primarily in parameter extraction and these variables introduce additional systematic uncertainties, we average over them (and φ) and thus do not consider them in our set of observables. As with φ however, it is important to include them when performing the convolution with the transfer function.

B. Parametrization of Scalar-Tensor Couplings
Assuming only Lorentz invariance, the general couplings of a spin 0 particle to ZZ, Zγ, or γγ pairs can be parametrized by the following vertex 1 , where ij = ZZ, Zγ, or γγ and k and k represent the four momentum of the intermediate vector bosons with v the Higgs vacuum expectation value (vev) which we have chosen as our overall normalization. The A ij 1,2,3 are dimensionless arbitrary complex form factors with possible momentum dependence. For the purposes of this study we approximate the couplings as constant as is done in other similar analysis [13,17,22,31,32]. Our framework can easily include the full momentum dependence of the form factors. For the case of a scalar coupling to Zγ or γγ, electromagnetic gauge invariance requires A Zγ,γγ 1 = 0, while for ZZ it can be generated at tree level as in the SM or by higher dimensional operators. We note that the particular parametrization in Eq.(1) is not a necessary feature of our analysis framework and can easily be changed to match those found in recent studies [30][31][32] depending on the particular interpretation that one wants to subscribe to. We choose Eq.(1) simply for concreteness in what follows.
The more general parametrization in Eq.(1) can be mapped for example on the Lagrangian given by set of Hermitian operators, where we have taken ϕ real and allowed only up to dimension five operators and Z µ is the Z field while V µν = ∂ µ V ν −∂ ν V µ is the usual bosonic field strengths. The dual field strengths are defined as V µν = 1 2 µνρσ V ρσ . Thus for this Lagrangian we would have all couplings real with ≡g Z and similarly for Zγ and γγ. This makes Eq.(1) a convenient parametrization for fitting to Lagrangian parameters that might be generated in various models at dimension five or less. For a purely Standard Model Higgs we have A ZZ 1 ≡ g h = 2, while all other coefficients are ≈ 0. The parameterization in Eq.(1) can of course be mapped onto Lagrangians with dimension greater than five with appropriate translation of the parameters. We work explicitly with the vertex in Eq.(1) used to calculate the fully differential cross section for h → 4 and when performing parameter extraction, but again this can easily be changed in our framework. We also define the full set of parameters as, which will be used for the remainder of this study.

C. Signal and Background Differential Cross Sections
In the case of signal we have computed analytically the fully differential cross section in the observables described in Sec.II A for the process h → ZZ + Zγ + γγ → 4 using the parametrization in Eq.(1). We have included all possible interference effects between tensor structures as well as identical final states in the case of 4e/4µ. For the irreducible background we have computed analytically the process qq → ZZ + Zγ + γγ → 4 which includes the s-channel (resonant) 4 process as well as the t-channel (diboson production) 4 process and again includes all possible interference effects. All vector bosons are allowed to be on or off-shell and we do not distinguish between them in what follows. The details of these calculations can be found in [19,35,36] along with the validation procedures and detailed studies of the distributions as well as the various interference effects. We have combined these analytic expressions with functions parametrizing the production spectra and implemented them into our analysis framework.
We note that it is important to include all possible Higgs couplings including the Zγ and γγ contributions in the signal differential cross section since the Higgs appears to be mostly Standard Model-like [40] and we are primarily searching for small anomalous deviations from the Standard Model prediction. Thus when attempting to extract specific couplings we must be sure that one small effect is not being mistaken for another. This is particularly relevant because we find many of the couplings are correlated. Including all possible couplings and doing a simultaneous fit ensures that we minimize the possibility of misinterpretation or of introducing a bias when attempting to extract these couplings. Searching for these small effects is also why it is important to include the interference effects between the identical final state leptons as well as the relevant detector effects and background.

III. CONSTRUCTION OF THE PDF
To be able to perform a fit for the effective Higgs couplings, we must first obtain the probability density function (pdf ) for the observables as a function of the undetermined parameters ( A). This pdf consists of two components which we assume to be factorized: the parton level ('decay') differential cross section as discussed in Section II C, and the production spectrum. This can be expressed as, The parton level fully differential cross section is treated as being at fixedŝ where one obtains the inputŝ value from the production spectrum W prod . The production spectrum for the signal and background depend on the parton distribution functions and can not be computed analytically. For the signal in which we assume decays on-shell, theŝ spectrum is taken to be a delta function centered at m 2 h . We discuss in more detail how W prod is obtained for the signal and background in Sec. III D.
We explicitly assume that the decay process can be factorized from the production mechanism and as mentioned previously will eventually average over p T , Y and φ. Of course the expression in Eq.(4) represents the generator level pdf, while a realistic treatment involves the pdf after taking into account detector effects. We study this in more detail below and discuss the basic procedure for obtaining the detector level pdf via an explicit integration over all of the center of mass variables. The particular details of the various steps as well as a number of validations can be found in an accompanying technical note [36].

A. Obtaining pdf in Terms of Detector Observables
A realistic treatment of the signal and background requires obtaining the pdfs in terms of detector level observables. This can be done by a convolution of the generator level pdf introduced in Eq.(4) with a transfer function which parametrizes the effects of the lepton selec-tion efficiency and the imperfect momentum measurement resolution of the detector. This can be represented schematically as follows, Here we take X to represent the full set of center of mass variables, including production and the flat offset angle φ, as X ≡ ( p T , Y, φ,ŝ, M 1 , M 2 , Ω). The transfer function T ( X R | X G ) is loosely based on the approximate performance of the CMS detector. It takes us from generator (G) level to reconstructed (R) (detector level) observables and is described in more detail in the Appendix. It represents the probability of reconstructing the observables X R given the generator level observable X G and is treated as a function of X R which takes X G as input. The set of variables X exhausts the twelve degrees of freedom (note that p T has 2 components and Ω contains 5 angles) available to the four (massless) final state leptons. The differential volume element is given by Upon integration over all X G variables one obtains a pdf which encapsulates the relevant detector effects.
The integral in Eq.(5) is the main result of this paper and we emphasize that it has not been obtained via Monte Carlo methods. Instead we have explicitly performed the integration by utilizing various change of variables and well-established numerical techniques (see [31,[41][42][43] for new studies that perform similar convolutions using Monte Carlo methods). This ensures that (arbitrarily) high precision is maintained at each step, producing what is effectively an 'analytic function' in terms of detector level variables once the convolution has been performed. After averaging over the production variables ( p T , Y, φ), this allows us to ultimately construct a complete unbinned detector level likelihood, which utilizes the full set of eight reconstructed decay observables and is a continuous function of the effective couplings. Having the detector level likelihood as a continuous function of all the effective Higgs couplings allows us to easily perform multi-parameter extraction with great speed and flexibility as was done at generator level [35]. By obtaining the 8-dimensional detector level likelihood explicitly we avoid the need to fill large multi-dimensional templates that an impractical amount of computing time; we also thus avoid the collateral binning and often 'smoothing' side-effects.
While conceptually simple the convolution integral is operationally challenging and in fact is most easily done with a different set of variables than those in the center of mass frame. Since this step is crucial for performing the convolution we describe below an overview of the necessary change of variables. The explicit details of these transformations and their validations are given in [36]. We note for now that the manner in which the qq → 4 and h → 4 expressions are calculated, as a sum of the individual contributions [19,35], makes the con-volution feasible since one can perform the integration on each smaller piece and then simply sum the separate contributions. This is much more practical to do computationally than to integrate the entire expressions at once.

B. Changing Variables for Background pdf
We first discuss the construction of the background detector level pdf and continue with the construction of the signal as there is a subtle difference between these two cases. Since there are no undetermined parameters in the background the generator-and detector-level pdfs are given simply by P B ( X G ) and P B ( X R ) respectively. In order to perform the convolution with the transfer function we first transform to a more convenient set of variables in which the detector smearing is parametrized before performing the integration.
To begin, we transform from the twelve center of mass variables to the three momentum for the four final state leptons. This can be represented as follows, where the differential volume element is now given by, and p G i is the generator level three momentum of the i'th lepton. The |J P G | is the Jacobian associated with the twelve dimensional change of variables from X G → P G in the differential volume element. The |J P R | arises from the change of variables X R → P R in the transfer function (remembering T ( X R | X G ) is treated as a function of X R ) which we loosely also refer to as a Jacobian, as we will do for all subsequent change of variables to follow. Ideally to find these Jacobian factors one should construct the 12×12 matrix associated with these transformations and then calculate the determinant, but this is untenable analytically since it must be constructed for each point in phase space. We therefore implement a straightforward numerical algorithm to calculate these factors for each phase space point. This procedure is described in detail and validated in [36].
Since we make the assumption that detector smearing will only affect the component of the lepton momentum parallel to the direction (p i|| ) of motion and not the two components perpendicular to the direction of motion ( p i⊥ ) (which are zero at generator level) we find it convenient to decompose the lepton three momenta p i in terms of p i|| and p i⊥ . Note that this assumption is equivalent to assuming angular resolution effects due to detector smearing can be neglected, which is an excellent approximation for the LHC detectors. In the (p i|| , p i⊥ ) basis only the transfer function associated with p i|| is non-trivial while the one associated with the perpendicular components can be represented simply as a delta function for each perpendicular direction, thus allowing for trivial integration over the eight p i⊥ variables.
The differential volume element can now be written as We then use the property of the transfer function that it is explicitly parametrized in terms of the ratio of reconstructed and generator level momentum components along the direction of motion to again change variables as follows, where The Jacobian factors |J c R | and |J c G | are obtained analytically [36] and take us from p i R || → c i and p i G || → c i variables in the transfer function (which is now a function of c) and the differential volume element respectively.
Finally, we use the fact that to eliminate c 2 and c 4 and make a final change of variables to the basis in which we perform the explicit four dimensional integration. This gives the final background pdf in terms of reconstructed variables, where in the first line in Eq.(10) we have implicitly used the delta functions in the transfer function to perform the eight dimensional integration over p i G ⊥ . The Jacobian |J M B | is obtained again analytically [36] from the change of variables c 2 , c 4 → M 2 1 G M 2 2 G in the differential volume element when going from the first line to the second line in Eq. (10). We can write Eq.(10) more compactly as, where the total background Jacobian factor is given by, We thus see in the end that what started out as a twelve dimensional integral has been reduced to a much more manageable integration over four variables. The details and validation of this four dimensional integration, which is done using a recursive numerical integration technique, along with the change of variables including the background Jacobian |J B | in Eq. (12) are shown explicitly in [36]. Further details on the transfer function T ( c | P G ) can be found in the Appendix.

C. Changing Variables for Signal pdf
To construct the detector level signal pdf, which is now a function of the effective couplings A, we follow the same procedure as for the background through the second line in Eq.(9) to obtain, In contrast to the background however, we now perform the following change of variables, where again we have implicitly used the delta functions in the transfer function to perform the eight dimensional integration over p i G ⊥ . Here |J M S | is the Jacobian obtained analytically [36] in changing from c 2 , c 3 , c 4 → s G , M 2 1 G , M 2 2 G variables. As mentioned below Eq.(4) thê s spectrum for the signal is ∝ δ(ŝ G − m 2 h ) (where m h is the generated Higgs mass), enabling us to perform the integration over dŝ G . Thus, we have for the final signal detector level pdf, where the total signal Jacobian is given by, We note that the integration in dŝ G involves an integral over a delta function which is computationally nontrivial when including detector resolution effects. The delta function inŝ G also places an additional constraint when performing the M 2 1 G M 2 2 G integration which further complicates matters and must be properly taken into account. Explicit details of this integration and its validation along with the derivation of the signal Jacobian |J S | in Eq.(16) are given in [36].

D. Production Spectra and Extra Backgrounds
Here we discuss how the W prod (ŝ, p T , Y ) production spectrum in Eq.(4) is obtained (ignoring the 'flat' φ distribution). This function involves higher order effects as well as parton distribution functions and thus can not be computed analytically. To include them in the total pdfs there are various options. One can in principal generate enough Monte Carlo events to accurately fill the full spectrum in the (ŝ, Y, p T ) variables. As this is computationally intensive we take an approximate approach in which we interpolate analytic functions for the signal and background from the one dimensional projections generated from the Madgraph [44] and POWHEG [45] Monte Carlo generators. Having an analytic parametrization for these functions also allows for faster integration when implementing them into the convolution procedure described above. This procedure of interpolating the one dimensional projections neglects any correlations between the production variables, but as we are explicitly assuming factorization between production and decay in addition to averaging over Y , and p T the effects of this approximation on our analysis are small. A more complete and accurate multi-dimensional interpolation of the parton distribution functions is presented in [32], which we are currently implement into our framework [37].
For the background there are also the higher order contributions such as the gg → 4 and Z + X processes. These make up a small component compared to the dominant qq → 4 background which itself is small compared to the signal after selection cuts have been applied around the signal region. Averaging over Y , and p T acts to further suppress the sensitivity of the coupling parameter measurement to the details of these higher order effects which mainly affect the production variables. We thus expect these contributions to minimally affect our analysis so we neglect them. However, these additional backgrounds can be included in our framework either as part of the systematics of the production function (W prod ) or by explicitly including the gg → 4 and Z + X processes in the pdf. Work is currently underway to include these contributions.
Since we explicitly fit to ratios of couplings and do not extract the overall normalization while using the experimentally measured Higgs mass, the dependence on these small contributions is further reduced. The overall normalization and Higgs mass can in priciple be extracted in our framework, but as this requires careful treatment of the production spectra and extra backgrounds we defer this to future work. Here we continue with our approximated production functions for the (ŝ, Y, p T ) spectra. We note that NLO contributions to the h → 4 decay processes are also expected to be small at ∼ 125 GeV [46,47], but a careful study of these effects is ongoing.

E. Validation of Convolution Integral
As validation of the convolution integral we first show in Figs. 2-5 projections for signal and background. We compare in these plots the distributions for a Madgraph sample which has had detector smearing and acceptance effects applied to it versus projections generated from our detector level pdfs obtained after the convolution described above. These plots should not be taken as validation of the complete detector-level pdf which must be validated with full simulation and data. They are meant only to show the validation of the convolution procedure as well as the construction of the generator level pdfs including the analytic computations.
We have obtained the signal and background production spectrum for the (ŝ, p T , Y ) variables from POWHEG and boosted the Madgraph events and those from our projections accordingly. We have used the interpolation procedure described in Sec. III D to build the production spectra for the signal and background pdfs and combined them with the analytic expressions for the h → 4 and qq → 4 processes. For the signal we show the Standard Model point where A ZZ 1 = 2 and all other couplings are set to zero while for both signal and background we show only the 2e2µ final state.
A further validation beyond these projections however is to look at the likelihoods (the pdf evaluated for a set of observables) for both the signal and background which contain the full correlations between the different variables. We show these in Fig. 6 for a CMS-like phase space and a very large number of events. To obtain these likelihoods we have evaluated our detector-level pdf with the Madgraph sample which has had detector smearing and acceptance effects applied and plotted it on top of the result of evaluating our detector-level pdf with events generated from the pdf itself. We find the agreement between the two results to be very good. Further details are found in the accompanying technical note [36], while likelihoods for other phase spaces and final states as well as different signal hypothesis will be included in the near future at [48].

IV. CONSTRUCTION OF THE LIKELIHOODS
With the detector level pdfs obtained in Eq.(11) and Eq.(15) in hand we can then go on to construct the full likelihood for a particular dataset. Before doing so, we must properly normalize the background and signal pdfs by performing the full integration over all reconstructed X variables where from now on we drop the superscript R since we only deal with detector level observables in what follows. In this section we present an overview of the normalization procedure. We also at this stage dis- cuss the averaging over the production variables Y and p T and the implementation of systematic uncertainties through the use of nuisance parameters in the likelihood functions. Further details can be found in [36].

A. Normalization of Background and Signal pdfs
We first perform the averaging procedure over the detector level production variables Y and p T for the background pdf by the following integration, An overall volume factor is not shown because for the purpose of likelihood maximization this constant factor is not relevant. What matters is that the relative normalization between all components in the likelihood is done consistently. With this pdf in terms of the eight center of mass decay observables we can obtain the overall normalization via Monte Carlo integration [36], which gives for our final normalized background pdf, We have calculated the qq → 4 expression as a sum of the separate individual contributions [19,35] making it possible to easily perform the integration on each smaller piece to obtain each normalization and then simply sum over them to obtain the overall normalization. Similarly for the signal we have for the averaging over (Y, p T ) variables, To obtain the overall normalization in the signal case we first note that it is a function of the underlying parameters A defined in Eq.(1). However, from the calculation of the parton level differential cross section presented in [19,35] or from considering Eq.(1) it is clear that P S (ŝ, M 1 , M 2 , Ω| A) is a sum over terms each of which is proportional to A ij n Aīj * n (where barred indices are treated as distinct from un-barred). Thus we can write, where P S (ŝ, M 1 , M 2 , Ω) ijīj nn is the parton level differential cross sections with the couplings factored out. The separate normalizations for each term can now easily be obtained via, from which we can now obtain the total overall normalization for the signal pdf as, This gives finally for the normalized signal pdf, Since each N ijīj Snn is computed, one does not need to compute the normalization each time a new hypothesis for A is constructed.
Note also that if we choose to fit for ratios of parameters and not their overall normalization, we do not need the absolute normalization of the pdfs either; it suffices to have the relative normalization between the different components correct when performing the maximization. This fact greatly reduces the computational complexity. Instead of propagating the full normalization and aligning units correctly so that when one integrates over all 8 dimensions unity is obtained, it is sufficient to do a Monte Carlo integration using a fixed sample size in a consistent and sufficiently large range. The meaning of the log likelihood difference remains unchanged with this construction. Further details of the normalization procedure for both signal and background are found in [36].

B. Generator Level vs. Reconstructed Level
With the normalized signal pdf in Eq.(24) we can perform a comparison between the generator level and reconstructed level pdfs. This gives an idea of the effects due to detector smearing and reconstruction efficiency on the parameter extraction. In Fig. 7 we show the results for a generator level fit (black) compared to the results after detector smearing and efficiencies have been applied (green). In our example we fit to the ratio R γγ 3 = A γγ 3 /A ZZ 1 = 0 where the 'true' value is indicated by the dashed-dotted line. To ensure a consistent comparison while isolating the effects of smearing and acceptance we show the results for a signal only sample containing 200 events with no systematics included. One can see that the effects of smearing and efficiency are small, but still important to take into account to ensure an optimal analysis.

C. Signal + Background pdf and Final Likelihood
With Eqs. (19) and (24) in hand we can now build the signal plus background pdf from which the total likelihood will be constructed. The signal plus background pdf can be written as, where O ≡ (ŝ, M 1 , M 2 , Ω) is our final set of observables to be used in the construction of the likelihood and F B is the background fraction, which must also be extracted. We can now write the likelihood of obtaining a particular dataset containing N events as, In the case of multiple final states (for example 4e, 4µ and 2e2µ), we build the likelihood function and implement the appropriate systematic uncertainties for each one separately. We now briefly discuss the implementation of the systematic uncertainties.

D. Systematic Uncertainties with Nuisance Parameters
Systematic uncertainties must be accounted for given our imperfect knowledge of various aspects of the analysis procedure. The lepton momentum resolution, the size of the backgrounds, and the exact production spectra are some important examples. For each of these systematic uncertainties we can associate an undetermined parameter which parametrizes our ignorance of the corresponding effect. Since we are not directly interested in these parameters, but only use them to estimate our systematic uncertainties, they are deemed nuisance parameters. One can then include systematic uncertainties into the machinery with alternative pdfs using varying values of the nuisance parameters. This is done by generating alternative pdfs using different values for the nuisance parameter of interest. To give one important example, we generate pdfs with narrower or wider lepton response functions to parameterize our knowledge of the lepton momentum resolution. If we define the nominal pdf to be P 0 (O) and the alternative as P 1 (O), one can parameterize the dependence of the likelihood on a nuisance parameter n by interpolating between the nominal and the alternative pdfs as follows: It is instructive to observe that, for all values of n, the normalization of the total pdf stays the same. Given the asymmetric nature of many systematic uncertainties, it is more appropriate to generate many "check-points" along the axis of n and to do piece-wise interpolation without the need of worrying about the normalization. Noncentral values of n are a priori disfavored, therefore one can impose a prior on top of the interpolated likelihood: where G(n) is typically a Gaussian centered at the central value of n. In the case of multiple systematic uncertainties, one can replace n by a vector of nuisance parameters n, and the prior G(n) by G( n). In general G( n) is a multivariate Gaussian-like function with primary axes which are some combination of different nuisance parameter directions. However one can carefully define the nuisance parameters such that correlations between them are negligible. In this limit G( n) can be written as the product of many Gaussian-like functions.
In this paper, we have included the dominant systematic uncertainties resulting from imperfect knowledge of the lepton momentum scale and resolution. Future work will incorporate a more exhaustive list of systematic uncertainties, including those resulting from uncertainties in the production spectra, uncertainties in the Higgs boson mass, and uncertainties on sub-dominant backgrounds.

V. FIT AND STATISTICAL PROCEDURE
Here we discuss the maximization procedure used to extract the undetermined parameters and the use of pseudo experiments to quantify the uncertainty as well as present our fit definition. To perform the maximization of the likelihood we have incorporated the MINUIT [49] function minimization code into our framework. Further details of these procedures can be found in [36].

A. Maximization Procedure
One important feature of the procedure is that the computationally intensive component of evaluating the likelihood only needs to be done for the events in the final dataset used in the fit for a given experiment. Therefore the computationally expensive pieces can be calculated on the computing grid prior to the analysis of the data, and the fit for parameter extraction itself is then completed within a few seconds. This allows for a great deal of flexibility when fitting the undetermined parameters.
Once the likelihood L( A) for a particular dataset is obtained, a simple maximization procedure to find the global maximum is performed to obtain the value of the parameters which maximizes the likelihood,Â. Thuŝ A represents the most likely value of A for a given dataset. Schematically we have, To quantify the uncertainty on the extracted valueÂ we perform a large number of pseudo-experiments N each containing N events and perform the maximization for each pseudo-experiment. A distribution forÂ is obtained with a spread σ and average valueĀ. The true value A o will sit within some interval of the extracted valueÂ for a given pseudo experiment and as the number of pseudo experiments is taken to infinity the average value ofÂ will converge to the true value; i.e.Ā → A o as N → ∞. The results to be shown in Sec. VI represent a rough estimate of the final precision of the analysis, while a precise quantification of the measurement precision including all sub-dominant backgrounds and systematic uncertainties are left to an ongoing study [37].

B. Finding the Global Maximum
In practice the maximization in Eq.(29) is done by a simple scan of the likelihood function starting from some random initial point in the parameter space. Of particular importance in this step is ensuring that the point in parameter space that this procedure converges to is actually the global maximum and not simply a local maximum, as the statistical fluctuations of a particular dataset can lead to the appearance of multiple local maxima in the likelihood. This can lead to biases or imprecise estimations of the undetermined parameters.
We illustrate this effect in Fig. 8 where we show 'arrow plots' for an example two-dimensional fit to two different datasets containing the same number of events and same 'true' value for the undetermined parameters. We show a large number of arrows whose tails begin at some initial point in a two dimensional parameter space and whose heads point to the final point reached in the maximization scan. On the left we see the same endpoint is reached regardless of the initial starting point indicating there is a clear global maximum. On the right we see two separate accumulations to which the arrow heads point indicating two local maxima. We have carefully accounted for this effect in our maximization procedure and find a very high convergence rate in general ( 99%) to the global maximum of the likelihood. More details of this procedure can be found in [36]. showing the convergence to the point which maximizes the likelihood starting from a random initial point. In these the tail of the arrow is at the initial point while the head is at the end point to which the fit converges. On the left we see that only one end point is found for all initial values. On the right we see the appearance of two endpoints which depend on the initial value indicating the appearance of multiple maxima.

C. Fit Definition
To extract the effective Higgs couplings, we take as our hypothesis the vertex in Eq.(1). We can use an overall phase rotation to make one of the parameters real and as discussed above, we can avoid the need for the absolute normalization if we instead fit to ratios of couplings. Which parameter to make real and which ratios to construct explicitly is a matter of choice, the most convenient of which depends on the fit being performed. Thus, in terms of the vertex as defined in Eqs.(1), we are explicitly fitting to, where R ij n are complex ratios defined as R ij n = A nij /|A| and |A| is some normalization to be chosen depending on the particular fit. Since one of the R ij n can always be made real there are in principal twelve undetermined parameters to fit for when neglecting the overall normalization (note R Zγ 1 = R γγ 1 = 0). Fitting to ratios also makes any dependence on the production variables, p T and Y minimal since they mainly only affect selection efficiencies.

VI. EXAMPLE EXTRACTION OF EFFECTIVE HIGGS COUPLINGS
With Eq.(26) we now have a likelihood which is a continuous function of the underlying effective couplings A (as well as the background fraction F B ) and which includes the relevant detector and systematic effects. With this in hand we can go on to explore a variety of multiparameter fits for a number of combinations of parameters. In this section we present an example parameter fit to the Standard Model point where only the R ZZ We note that although we consider only scenarios where the true values of R Zγ,γγ 2 and R Zγ,γγ 3 are zero in this study, it is still important to account for the effects of floating these parameters in the fit since there are potentially important correlations between them and the ZZ couplings [35], as well as the background. This ensures that one is able to distinguish between effects coming from the Zγ or γγ couplings from those coming from the ZZ sector or perhaps a background fluctuation. Fitting to all parameters simultaneously thus allows us to build an analysis framework that is as model independent as possible without a priori assuming a certain value for the various couplings. We do however assume for simplicity that all couplings are real and therefore do not float their phases. The effects of complex phases in these parameters and the possible degeneracies that they generate will be explored in future work.
Of course even in the Standard Model R Zγ 2 , R γγ 2 = 0 and detailed explorations of the CP properties of the γγ [50] and Zγ couplings is also very interesting. An investigation of the sensitivity of the golden channel to these couplings is ongoing [51].

A. Phase Space Definition
Before constructing the likelihood and performing parameter extraction, we must first define our phase space. We perform all fits for cuts similar to those used by the CMS collaboration [2]: • p T > 20, 10, 7, 7 GeV for the ordered lepton p T • |η | < 2.4 for the lepton rapidity In reconstructing M 1 and M 2 we always take M 1 to be the reconstructed invariant mass closer to the Z mass as well as M 1 > M 2 . We show for simplicity only results for the 2e2µ final state in what follows.

B. Results for 'SM-like' ZZ Point
For an example parameter extraction we examine a Standard Model like point where A = (2, 0, 0; 0, 0; 0, 0) with A defined in Eq.(3). We will normalize the parameters by |A| = |A ZZ 1 |, leaving six parameters to be extracted in the fit under the assumption that all parameters are real. We treat R ZZ 1 = 1 in Eq.(30) as fixed and float the remaining six R ij n . As background is present we must also fit for the background fraction F B . Thus in the end we have seven parameters which are floated simultaneously in the fit procedure.
To estimate the effect of systematic uncertainties on the fit precision, we perform the 7-dimensional fit with and without including systematic uncertainties and compare their results. In Fig. 9 we show the result of this comparison and show the extracted background fraction as well as the six parameters for a large set of pseudo experiments each containing 200 signal and 120 background events. The fit without systematics is shown in green while the fit with systematics shown in red. The true value is indicated by the dashed-dotted line.
One can see from the bottom plot in Fig. 9 the fit to F B which is slightly off-center from the true value. This is a consequence of the fact that we are floating the Zγ couplings as well. This can be understood from the fact that at ∼ 125 GeV the background is primarily composed of the Zγ contribution [19] leading to a correlation between the F B and R Zγ 2,3 parameters, in particular for small datasets. We illustrate this in Fig. 10 where we show the correlation between F B and |R Zγ | = |R Zγ 2 | 2 + |R Zγ 3 | 2 for 200 signal and 120 background events. However, as the number of events in the dataset is increased (or the Zγ couplings are held fixed) this effect is diminished and we find the distribution of fits is centered at the true value, but this emphasizes the point that one must interpret results with caution when dealing with small datasets. We note however that even in the case of small datasets the true value is still within the uncertainty as can be seen in Table I where we show results for the 68% range of the extracted values for the same data sets as those shown in Fig. 9.
We examine some of the potential correlations between various pairs of parameters as shown in Fig. 11 for a few of the possible combinations. We show the results of the fits including systematic effects for a large set of pseudoexperiments with 200 signal and 120 background events. The true value is indicated by the intersection of the two dashed-dotted lines. The colors indicate the density of pseudoexperiments returning a particular value for the extracted parameters as indicated on the x and y axis. We can see that in the majority of pseudoexperiments the fit returns values close to the true ones. One can also see in these plots some of the potential correlations between the various parameters though the full set of correlations between the seven parameters which are contained in the fit can not be displayed easily. This also demonstrates the importance of including all possible couplings in the differential cross section.
We also study how the spread of the distribution changes as a function of the numbers of events. In Fig. 12 we have plotted the results for various number of signal events ranging from 25 to 1600 per pseudoexperiment with a 37.5% background fraction. The color indicates the density of pseudoexperiments which return a value of the parameter as indicated on the y-axis. We can easily see by the color that the spread roughly decreases with the expected 1/ √ N scaling as the number of events is increased. We observe that most of the pseudoexper- We compare results of the fit to events generated with detector smearing and acceptance effects applied without systematic effects (green) versus to events generated with detector smearing and acceptance effects applied with systematic effects (red).
iments return a value close to the true value, indicated by the solid black line. Quantifying more precisely how the uncertainty changes as a function of number of events is beyond the scope of this study, and is left to ongoing work.

VII. CONCLUSIONS AND OUTLOOK
In this study we build upon an earlier study [35] to construct a comprehensive analysis framework aimed at extracting as much information as possible from the Higgs golden channel. Our framework is based on a maximum likelihood method constructed from analytic expressions of the fully differential cross sections for the h → 4 decay as well as the dominant irreducible qq → 4 background which were computed in [19,35]. As our main result, we construct an 8-dimensional detector level likelihood using the full set of decay observables available in the golden channel. This allows us to perform parameter extraction of the various possible Higgs couplings, including general CP odd/even admixtures and any possible phases. Correlations between certain pairs of parameters for the Standard Model point described in text. We conduct a large set of pseudoexperiments with 200 signal and 120 background events. The true value is indicated by the intersection of the two dashed-dotted lines and again we have fit to the ratios R ij n = Anij/|A| and take the normalization to be |A| = |A1ZZ |. We have included detector smearing and acceptance effects as well as systematics.
The detector-level likelihood is obtained by the explicit convolution of a transfer function, encapsulating the relevant detector effects, with the generator-level probability density formed out of the signal and background differential cross sections. After performing this 12-dimensional convolution integral and its normalization we obtain a probability density function from which we construct an unbinned detector-level likelihood which is a continuous function of the effective couplings. We average over the four production variables to construct the likelihood in terms of the full set of eight center of mass variables available in the h → 4 decay. Systematic uncertainties are included in the likelihood via the use of nuisance parameters.
We present the steps involved in performing the convolution integral as well as various validations and discuss the maximization procedure and the inclusion of systematic effects. We present various example parameter extraction cases to demonstrate the flexibility and power of our framework. We have included in the likelihood the dominant signal and background contributions as well as detector and systematic effects. In follow-up work we will include the sub-dominant backgrounds, gg → ZZ and Z + X, and the sub-dominant systematic uncertainties.
In summary, we have constructed a comprehensive optimized analysis framework to perform multi-parameter extraction of the various Higgs couplings to electroweak gauge bosons in the golden channel. This framework can be readily adapted to analyses to be performed at the LHC as well as at future colliders.  We fit to the ratios R ij n = Anij/|A| and take the normalization to be |A| = |A1ZZ |. We have included detector smearing and acceptance effects as well as systematics.

VIII. APPENDIX
In this Appendix we show results for a second example parameter extraction of an 'exotic' ZZ point. We also give further details on the parametrization of the transfer function discussed in Eqs. (11) and (15).

A. Results for 'Exotic' ZZ Point
For our second example parameter extraction we examine an exotic ZZ point where A = (2, 7.7, 10.1; 0, 0; 0, 0) with A defined in Eq.(3). We normalize the parameters again by |A| = |A ZZ 1 |, leaving 6 parameters to be extracted in the fit, under the assumption that they are real. Thus we treat R ZZ 1 = 1 in Eq.(30) as fixed and float the remaining six R ij n . As background is present we must also fit for the background fraction F B . Thus in the end we have seven parameters which are floated simultaneously in the fit procedure.
Again, we estimate the effect of systematic uncertainties on the fit precision by performing the 7-dimensional fit with and without including systematic uncertainties, shown in subsequent figures in green and red respectively, and comparing their results. In Fig. 13 we show the result of this comparison and show the extracted background fraction as well as the six parameters for a large set of pseudo experiments each containing 200 signal and 120 background events. The true value is indicated by the dashed-dotted line. We also show in Table II numerical values of the result for the 68% range of the extracted values of parameters.
We can also examine some of the potential correlations between various pairs of parameters as shown in Fig. 14 for a few of the possible combinations. We show the results of the fits including systematic effects for a large set of pseudo experiments with 200 signal and 120 background events. The true value is indicated by the intersection of the two dashed-dotted lines. The colors indicate the density of pseudo experiments returning a particular value for the extracted parameters as indicated on the x and y axis. We can see that in the majority of pseudo experiments the fit returns values close to the true ones. One can also see in these plots some of the potential correlations between the various parameters though of course the full set of correlations between the six parameters which are contained in the fit can not be displayed easily.
It is also important to study how the spread of the distribution changes as a function of the numbers of events. In Fig. 15 we have plotted the results for various number of signal events ranging from 25 to 1600 per pseudo experiment with a 37.5% background fraction. The color indicates the density of pseudo experiments which return a value of the parameter as indicated on the y-axis. Here we again fit to R ij n = A nij /|A| where |A| = |A 1ZZ |. We can easily see by the color that the spread roughly decreases with the expected 1/ √ N scaling as the number of events is increased. We observe that most of the pseudo-experiments return a value close to the true value, indicated by the solid black line. Quantifying more precisely how the uncertainty changes as a function of number of events is beyond the scope of this study, and is left to ongoing work.  = 1 and where R ij n = Anij/|A1ZZ |. We compare results of the fit to events generated with detector smearing and acceptance effects applied without systematic effects (green) versus to events generated with detector smearing and acceptance effects applied with systematic effects (red)

B. Modeling the Detector Response
The main challenge is to propagate the effects of the lepton selection efficiency and the imperfect momentum measurement resolution on the pdf without excessive requirement on computational time. Due to the excellent performance of the tracking detectors at the LHC experiments, we make the assumption that the effect of the imperfect measurements of lepton direction is negligible compared to the effect of imperfect momentum measurements. Therefore, all detector response effects including effects from selection inefficiency may be parameterized into transfer functions in the following way, where c i = p i R || /p i G || . Here we have decomposed the lepton three momenta p i in terms of p i|| and p i⊥ where p i|| is the component of the lepton momentum parallel to the direction of motion, while p i⊥ are the two components perpendicular to the direction of motion (which are zero of course). The response function S, parameterizes the probability for a lepton with actual momentum p G i to be reconstructed with momentum p R i , while δ is the Dirac delta function, and is the selection efficiency. With typical lepton selection criteria employed by the LHC experiments, it is a good approximation that each lepton is independent. Thus, the full transfer function for the event may be written as: where we have defined, c = (c 1 , c 2 , c 3 , c 4 ), P G = ( p 1 G , p 2 G , p 3 G , p 4 G ). (33) We treat T ( c | P G ) as a function of c which takes the generator level momenta P G as input. The only effect FIG. 14. Correlations between certain pairs of parameters. We conduct a large set of pseudoexperiments with 200 signal and 120 background events for the exotic ZZ point described in text. The true value is indicated by the intersection of the two dashed-dotted lines and again we have fit to the ratios R ij n = Anij/|A| and take the normalization to be |A| = |A1ZZ |. We have included detector smearing and acceptance effects as well as systematics.
of imperfect momentum measurement on the production spectra is to provide a small smearing of the p T spectrum for the four lepton system. Since we average over this spectrum in the measurement fit, this small smearing can be neglected to current precision.  15. Distribution of extracted parameters as a function of the numbers of events for 25 to 1600 signal events per pseudoexperiment with a 37.5% background fraction for the exotic ZZ point described in text. The color indicates the density of pseudo experiments which return a value of the parameter as indicated on the y-axis while the true value is now indicated by the solid black line. We fit to the ratios R ij n = Anij/|A| and take the normalization to be |A| = |A1ZZ |. We have included detector smearing and acceptance effects as well as systematics.