Measurements of Higgs boson properties in the diphoton decay channel in proton-proton collisions at $\sqrt{s} =$ 13 TeV

Measurements of Higgs boson properties in the H $\to\gamma\gamma$ decay channel are reported. The analysis is based on data collected by the CMS experiment in proton-proton collisions at $\sqrt{s} =$ 13 TeV during the 2016 LHC running period, corresponding to an integrated luminosity of 35.6 fb$^{-1}$. Allowing the Higgs mass to float, the measurement yields a signal strength relative to the standard model prediction of 1.18 $^{+0.17}_{-0.14} =$ 1.18 $^{+0.12}_{-0.11}$ (stat) $^{+0.09}_{-0.07}$ (syst) $^{+0.07}_{-0.06}$ (theo), which is largely insensitive to the exact Higgs mass around 125 GeV. Signal strengths associated with the different Higgs boson production mechanisms, couplings to bosons and fermions, and effective couplings to photons and gluons are also measured.


Introduction
The standard model of particle physics (SM) [1][2][3] has been very successful in explaining the interactions between elementary particles. During the Run 1 period (2010-2012) of the CERN LHC, with proton-proton collisions at centre-of-mass energies of 7 and 8 TeV, a new particle was discovered by the ATLAS [4] and CMS [5,6] Collaborations. The discovery was followed by a comprehensive set of studies of the properties of this new boson in the decay channels and production modes accessible with the LHC Run 1 data set. Measurements from ATLAS and CMS [7,8] have shown that the properties of the new boson are consistent with expectations for the SM Higgs boson [9][10][11][12][13][14].
Despite the small branching fraction predicted by the SM (≈0.2%), the H → γγ decay channel provides a clean final state with an invariant mass peak that can be reconstructed with high precision. As a consequence, H → γγ was one of the most important channels for the discovery of the Higgs boson and first measurements of its properties [15,16]. In Run 2, with protonproton collisions at √ s = 13 TeV, this channel remains one of the most sensitive to continue the precise characterization of the Higgs boson.
In this paper, measurements of the Higgs boson production rates with respect to the SM prediction (signal strength modifiers) are presented, along with measurements of the coupling modifiers to fermions and bosons, and effective coupling modifiers to photons and gluons, in the so-called κ framework [17]. Improved precision on these parameters constrains possible deviations in the Higgs sector of the SM. The analysis is based on proton-proton collision data collected at √ s = 13 TeV by the CMS experiment in 2016, corresponding to an integrated luminosity of 35.9 fb −1 .

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid, 13 m in length and with an inner diameter of 6 m, which provides an axial magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
Charged-particle trajectories are measured by the silicon pixel and strip tracker, with full azimuthal coverage within |η| < 2.5. The ECAL and HCAL surround the tracking volume and cover the region |η| < 3.0. The ECAL barrel extends to |η| < 1.48, while the endcaps cover the region 1.48 < |η| < 3.0. A lead/silicon-strip preshower detector is located in front of the ECAL endcap in the region 1.65 < |η| < 2.6. The preshower detector includes two planes of silicon sensors measuring the x and y coordinates of the impinging particles. A steel/quartzfibre Cherenkov forward calorimeter extends the calorimetric coverage to |η| < 5.0. In the region |η| < 1.74, the HCAL cells have widths of 0.087 in both pseudorapidity and azimuth (φ). In the (η, φ) plane, and for |η| < 1.48, the HCAL cells map on to 5×5 ECAL crystal arrays to form calorimeter towers projecting radially outwards from points slightly offset from the nominal interaction point. In the endcap, the ECAL arrays matching the HCAL cells contain fewer crystals. The calibration of the ECAL uses the azimuthal symmetry of the energy flow in minimum-bias events, π 0 → γγ, η → γγ, W → eν, and Z → e + e − decays. Changes in the response of the ECAL crystals due to irradiation during the LHC running periods and their subsequent recovery are monitored continuously and corrected for, using light injected from a laser system. More details on the methods employed are given in Ref. [18].
The global event reconstruction algorithm, also called particle-flow event reconstruction [19], attempts to reconstruct and identify individual particles using an optimized combination of information from the various elements of the CMS detector. The energy of photons is directly obtained from the ECAL measurement with a procedure described in greater detail in Section 5.1. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy.
Hadronic jets are clustered from these reconstructed particles using the infrared-and collinearsafe anti-k T algorithm [20], with a distance parameter of 0.4. The jet momentum is determined as the vectorial sum of all particle momenta in the jet. An offset correction is applied to jet energies to take into account the contribution from additional proton-proton interactions within the same or nearby bunch crossings. Jet energy corrections are derived from simulation, and are confirmed with in situ measurements of the energy balance in dijet, multijet, photon + jet, and leptonically decaying Z + jets events [21]. The jet momentum is found from simulation to be within 5 to 10% of the true momentum over the entire jet transverse momentum (p T ) spectrum and detector acceptance. Additional selection criteria are applied to each event to remove spurious jet-like features originating from isolated noise patterns in certain HCAL regions.
To identify jets originating from the hadronization of bottom quarks, the combined secondary vertex (CSV) b tagging algorithm is used [22,23]. The algorithm tags jets from b hadron decays by their displaced decay vertex, providing a numerical discriminant value that is higher for jets likely to be initiated by b quarks. Two tagging algorithm working points, medium and loose, are used in this analysis: the medium (loose) point provides an efficiency for identifying bquark jets of about 70% (85%) and a misidentification probability for jets from light quarks and gluons of about 1% (10%).
The missing transverse momentum vector is taken as the negative vector sum of all reconstructed particle candidate transverse momenta in the event reconstruction, and its magnitude is referred to as p miss T . A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [24].

Analysis strategy
The dominant Higgs boson production mechanism in proton-proton collisions is gluon-gluon fusion (ggH), with additional contributions from vector boson fusion (VBF), and production in association with a vector boson (VH) or with a top quark pair (ttH).
To maximise the sensitivity of the analysis, specific production modes with reduced background contamination are targeted. Events are categorized by requiring specific features in the final state: forward jets for VBF, top decay products such as muons, electrons, missing transverse energy from neutrinos, jets arising from the hadronization of b quarks for ttH, and vector-boson decay products such as muons, electrons, missing transverse energy, or dijets with a characteristic invariant mass for VH production. The events with no specific features, mostly coming from ggH, are categorized according to their expected probability to be signal rather than background.
Several multivariate techniques are used in the analysis. An initial set is used to improve the event reconstruction, and particularly the photon energy estimate, the photon identification, the identification of the diphoton primary vertex and the estimate of its probability of being the true diphoton vertex. In the subsequent steps of the analysis, the event classification benefits from multivariate techniques to categorize ggH events, to enhance the identification of forward jets in VBF events and the separation of such events from ggH events, to enhance the b tagging and the separation of ttH jets in events with multiple jets.
Measurements are extracted by a simultaneous maximum-likelihood fit to the diphoton invariant mass distributions in all event categories. Simulated samples are used to derive the signal model, while the background is obtained from the fit to the data. The latter aspect is particularly important, as it makes the use of simulated samples only relevant to the optimization of the multivariate classifiers used in the different steps of the analysis. While imperfect simulation might induce suboptimal performance, the use of multivariate inputs uncorrelated with the diphoton invariant mass ensures that no bias is introduced. The impact of the choice of the event generator on the multivariate discriminators has also been checked and found to be negligible.

Data sample and simulated events
The events used in this analysis were selected by diphoton triggers with asymmetric transverse energy (E T ) thresholds of 30 and 18 GeV. The trigger selection requires a loose calorimetric identification using the shape of the electromagnetic showers, a loose isolation requirement, and a selection on the ratio of the HCAL and ECAL deposits of the photon candidates. The R 9 shower shape variable is used in the trigger to identify photons that convert to an e + e − pair in the tracker material before reaching the ECAL surface. The R 9 variable is defined as the energy sum of the 3 × 3 crystals centred on the most energetic crystal in the candidate electromagnetic cluster divided by the energy of the candidate. The electromagnetic showers from photons that convert before reaching the calorimeter have wider transverse profiles and lower values of R 9 than those of unconverted photons. The trigger efficiency is measured from Z → e + e − events using the tag-and-probe technique [25]. Efficiencies in simulation are corrected to match those measured in data.
Simulated signal events are generated using MADGRAPH5 aMC@NLO v2.2.2 at next-to-leading order (NLO) [26] in perturbative quantum chromodynamics (QCD) with FxFx merging [27], the parton level samples being interfaced to PYTHIA8.205 [28] for parton showering and hadronization. The CUETP8M1 PYTHIA underlying event tune parameter set is used [29]. Events produced via the gluon fusion mechanism are weighted as a function of the Higgs boson p T and the number of jets in the event, to match the prediction from the NNLOPS program [30]. Parton distribution functions (PDFs) are taken from the NNPDF3.0 [31] set. The signal cross sections and branching fraction recommended by the LHC Higgs cross section working group are used [32].
The dominant background to H → γγ consists of the irreducible prompt diphoton production, and the reducible backgrounds from γ + jet and dijet events where the jets are misidentified as isolated photons. Background events, used for the trainings of multivariate discriminants and for category optimization, have been simulated using various event generators. The diphoton background is modeled with the SHERPA v.2.2.1 [33] generator. It includes the Born processes with up to 3 additional jets as well as the box processes at leading order. Multijet and γ + jet backgrounds are modeled with PYTHIA, with a filter applied to enhance the production of jets with a large fraction of electromagnetic energy. The Wγ and Zγ samples are generated with MADGRAPH5 aMC@NLO at leading order, while Drell-Yan events are simulated with the same generator at NLO precision.
The detailed response of the CMS detector is simulated using the GEANT4 [34] package. This includes the simulation of the multiple proton-proton interactions taking place in each bunch crossing, referred to as pileup. These can occur at the nominal bunch crossing (in-time pileup) or at the crossing of previous and subsequent bunches (out-of-time pileup), and the simulation accounts for both. Simulated events are weighted to reproduce the distribution of the number of interactions in data. The average number of pileup interactions measured in data amounts to 23, with a root-mean-square (RMS) of about 6.

Photon reconstruction and identification
Photon candidates are reconstructed as part of the global event reconstruction, as described in Section 2. Photons are identified as ECAL energy clusters not linked to the extrapolation of any charged-particle trajectory to the ECAL. The clustering algorithm allows an almost complete collection of the energy of the photons, even for those converting in the material upstream of the calorimeter. First, cluster "seeds" are identified as local energy maxima above a given threshold. Second, clusters are grown from the seeds by aggregating crystals with at least one side in common with a clustered crystal and with an energy in excess of a given threshold. This threshold represents about two standard deviations of the electronic noise in the ECAL and amounts to 80 MeV in the barrel and, depending on |η|, up to 300 MeV in the endcaps. The energy of each crystal can be shared among adjacent clusters assuming a Gaussian transverse profile of the electromagnetic shower. Finally, clusters are merged into "superclusters", to allow good energy containment, accounting for geometrical variations of the detector along η, and optimizing robustness against pileup.

Photon energy
The energy of photons is computed from the sum of the energy of the clustered crystals, calibrated and corrected for changes in the response over time [18] and considered in the clustering procedure. The preshower energy is added to that of the superclusters in the region covered by this detector. To optimize the resolution, the photon energy is corrected for the containment of the electromagnetic shower in the superclusters and the energy losses from converted photons [35]. The correction is computed with a multivariate regression technique that estimates simultaneously the energy of the photon and its uncertainty. This regression is trained on simulated photons using as the target the ratio of the true photon energy and the sum of the energy of the clustered crystals. The inputs are shower shapes and position variables -both sensitive to shower containment and possible unclustered energy -preshower information, and global event observables sensitive to pileup.
A multistep procedure has been implemented to correct the energy scale in data, and to determine the additional smearing to be applied to the reconstructed photon energy in simulated events so as to reproduce the energy resolution observed in data. First, the energy scale in data is equalized with that in simulated events, and residual long-term drifts in the response are corrected, using Z → e + e − decays in which the electron showers are reconstructed as pho-tons. Then, the photon energy resolution predicted by the simulation is improved by adding a Gaussian smearing determined from the comparison between the Z → e + e − line-shape in data and simulation (Fig. 1). The corrections to the energy scale are extracted differentially in time, |η| (two categories in the barrel and two in the endcaps) and R 9 (two categories). They range from about 0.1 to about 0.3%, depending on the category. The amount of smearing required is extracted differentially in the same |η| and R 9 categories as the energy scale corrections and ranges from about 0.1 to about 2.7%, depending on the category.

Photon preselection
The photons considered further in this analysis are required to satisfy preselection criteria similar to, but slightly more stringent than, the trigger requirements. The preselection requirements consists of: • p γ1 T > 30 GeV and p γ2 T > 20 GeV, where p γ1 T and p γ2 T are the transverse momenta of the leading (in p T ) and subleading photons, respectively; • |η| < 2.5, excluding the barrel-endcap transition region 1.44 < |η| < 1.57, where the photon energy reconstruction is affected by a suboptimal containment of the electromagnetic shower; • a selection on the R 9 variable and on σ ηη -the lateral extension of the shower, defined as the energy-weighted spread within the 5×5 crystal matrix centred on the crystal with the largest energy deposit in the supercluster -to reject ECAL energy deposits incompatible with a single isolated electromagnetic shower, such as those coming from neutral mesons; • a selection on the ratio of the energy in the HCAL cells behind the supercluster to the energy in the supercluster (H/E), to reject hadrons; • an electron veto, which rejects the photon candidate if its supercluster is matched to an electron track with no missing hits in the innermost tracker layers; • a requirement on the photon isolation (I ph ), defined as the sum of the transverse energy of the particles identified as photons and falling inside a cone of radius R = (∆η) 2 + (∆ϕ) 2 = 0.3 around the photon candidate direction; the sum is corrected for the contribution of the pileup estimated from the median energy density in the event [36]; • a requirement on the track isolation in a hollow cone (I tk ), the sum of the transverse momenta of all tracks in a cone of radius R = 0.3 around the photon candidate direction (with tracks in an inner cone of size R = 0.04 not included in the sum); the cone is hollow to use the same isolation definition also for electrons; • a loose requirement on charged-hadron isolation (I ch ), the sum of the transverse momenta of charged particles inside a cone of radius R = 0.3 around the photon candidate; this requirement is added to the one on track isolation to match the selection applied to photon candidates as part of data reconstruction; • a loose requirement on the photon identification (as described in Section 5.3).
The selection thresholds are reported in Table 1. Additionally, both photons must satisfy either (a) R 9 > 0.8 and I ch < 20 GeV, or (b) I ch /p γ T < 0.3. The efficiency of all preselection criteria, except the electron veto requirement, is measured with a tag-and-probe technique using Z → e + e − events. The efficiency for photons to satisfy the electron veto requirement, which cannot be measured with Z → e + e − events, is obtained from Z → µ + µ − γ events, in which the photon is produced by final-state radiation and provides a sample of prompt photons with purity higher than 99%. Table 2 shows the preselection efficiencies measured in data, data , and simulation, MC , along with their ratio data / MC . Statistical and systematic uncertainties are included both in the efficiencies and in their ratio. The measured ratios are used to correct the signal efficiency in simulated signal samples and the associated uncertainties are propagated to the expected signal yields.

Photon identification
A boosted decision tree (BDT) is used to separate prompt photons from photon candidates that arise from misidentified jet fragments, but which satisfy the preselection. This photon identification BDT is trained using simulated γ + jet events where prompt photons are considered as signal and non-prompt photons as background.
The photon identification BDT is trained with the following input variables: • shower shape observables, corrected to mitigate data and simulation discrepancies; • isolation variables, I ph and I ch ; two kinds of I ch are computed, including hadrons associated with the chosen primary vertex (described in Section 6), and including hadrons associated with the vertex providing the largest isolation sum; the latter is effective in rejecting misidentified photon candidates originating from jets coming from a vertex other than the chosen one; • photon η and energy, which are correlated with the shower topology and isolation variables; • the median energy density per unit area in the event, ρ, to minimize the impact of pileup on the above inputs.   The photon identification BDT score is also shown in Fig. 2 (right) for electrons reconstructed as photons in Z → e + e − events, in data and simulation. The systematic uncertainty in the photon identification score, represented by the hashed region, is conservatively assigned to cover the largest observed discrepancy between data and simulation for electrons in the ECAL endcaps.

Diphoton vertex
The determination of the primary vertex from which the two photons originate has a direct impact on the diphoton invariant mass resolution. If the position along the beam axis (z) of the interaction producing the diphoton is known to better than about 10 mm, the invariant mass resolution is dominated by the photon energy resolution. For comparison, the distribution in z of the position of the vertices reconstructed from the observed tracks has an RMS spread of about 3.4 cm.
The diphoton vertex assignment relies on a BDT (the vertex identification BDT) whose inputs are observables related to tracks recoiling against the diphoton system: distribution of the primary vertices is the same as the beam spot width in data.
The efficiency of correctly assigning the diphoton vertex to be within 10 mm of the true vertex in H → γγ simulated events is shown in Fig. 4 as a function of the p T of the diphoton pair and as a function of the number of primary vertices in the event, and compared with the average estimated vertex probability BDT. The overall efficiency is about 81%.

Event classification
The event selection requires two preselected photon candidates with p γ1 T > m γγ /3 and p γ2 T > m γγ /4, in the mass range 100 < m γγ < 180 GeV. The use of p T thresholds scaled by m γγ prevents a distortion of the low end of the invariant mass spectrum. The requirement on the photon p T is applied after the vertex assignment.
To improve the sensitivity of the analysis, events are classified targeting different production mechanisms and according to their mass resolution and predicted signal-to-background ratio. In each category, the selections are optimized to maximize the significance of the expected signal with respect to the background. As the first step of the classification, exclusive event categories are defined by dedicated selections on additional reconstructed objects to select Higgs boson production mechanisms other than ggH: VBF, VH or ttH.
All objects are reconstructed as described in Section 2 and (for photons) Section 5. In addition, electrons are required to be within |η| < 2.5 and outside the barrel-endcap transition region. Muons are required to be within |η| < 2.4.
A dedicated diphoton BDT is used in the event categorization. The diphoton BDT assigns a high score to events with photons showing signal-like kinematics, good mass resolution, and high photon identification BDT score. The input variables to the classifier are: • p γ T /m γγ for each photon; • the pseudorapidity of the two photons; • the cosine of the angle between the two photons in the transverse plane; • photon identification BDT scores for both photons; • two per-event relative mass resolution estimates, one under the hypothesis that the mass has been reconstructed using the correct primary vertex, and the other under the hypothesis that the mass has been reconstructed using an incorrect vertex; • the per-event probability estimate that the correct primary vertex has been assigned to the diphoton.
The relative mass resolution is computed from the propagation of the photon energy resolution estimates, assuming the functional forms of the photon resolutions are Gaussian. Figure 5 (left) shows the transformed score from the diphoton multivariate classifier for data and simulated signal and backgrounds, for events with two photons satisfying the preselection requirements. The classifier score has been transformed such that the sum of signal events from all the production modes has a uniform distribution. A validation of the score from the diphoton multivariate classifier obtained in Z → e + e − events, where the electrons are reconstructed as photons, is shown in Fig. 5 (right) for data and simulation.

Event categories for ttH production
Events produced in association with a top quark pair feature two b quarks from the decay of the top quarks, and may be accompanied by charged leptons or additional jets. In the latter case, to enhance the tagging of ttH multijet events, a multivariate discriminant is built upon the following inputs: • the number of jets with p T > 25 GeV; • the leading jet p T ; • the two highest scores from the btag CSV discriminator.   The output of this discriminant is shown in Fig. 6. The threshold on the discriminant is optimized jointly with the requirement on the diphoton BDT score by maximizing the expected sensitivity to ttH production.
To cross-check the performance of this BDT observable, a control sample in data is defined by selecting events with a pair of photons, one of which passes the preselection and photon identification requirements, while the other has no preselection applied and an inverted criterion on the score from the photon identification BDT. As the efficiency for selecting such photons is not the same as for the signal region, events in the control samples are weighted according to the η and p T of the photons so as to obtain a control sample with similar kinematic properties as the signal region, but statistically independent.  Figure 6: Score distribution of the jet multivariate discriminant used to enhance jet tagging in the ttH multijet category. The points show the distribution for data in the signal region sidebands, m γγ < 115 GeV or m γγ > 135 GeV; the histogram shows the distribution for events in the data control sample; the filled histogram shows the distribution for simulated signal events. The distributions in the simulated and control samples are scaled as to match the integral of that from the data sidebands.
Depending on the type of the top quark decay, the following categories are defined: • semileptonic top quark decays (ttH Leptonic): • leading photon p T > m γγ /2, subleading photon p T > m γγ /4; • diphoton classifier BDT score greater than 0.11; • at least one lepton with p T > 20 GeV; electrons must satisfy loose requirements on the same observables as described in Ref. [37]; muons are required to pass a tight selection based on the quality of the track, the number of hits in the tracker and muon system, and the longitudinal and transverse impact parameters of the track with respect to the muon vertex, and to satisfy a requirement on the relative isolation (after correction for pileup) based on transverse energy of the charged hadrons, neutral hadrons, and photons, in a cone around the muon with a radius between 0.05 and 0.2, depending on the p T of the muon; • all selected leptons are required to have R( , γ) > 0.35, where R is the distance between the objects in the η − φ plane; • specifically for electrons: |m e,γ − m Z | > 5 GeV, where m e,γ is the invariant mass of any pair of electron and photon and m Z refers to the mass of the Z boson; • at least two jets in the event with p T > 25 GeV, |η| < 2.4, and R(jet, γ) > 0.4 and R(jet, ) > 0.4; • at least one of the jets in the event identified as a b jet according to the CSV tagger medium requirement; • hadronic top quark decays (ttH Hadronic): • leading photon p T > m γγ /3, subleading photon p T > m γγ /4; • diphoton classifier BDT score greater than 0.58; • no leptons, defined according to the criteria of the ttH Leptonic category; • at least three jets in the event with p T > 25 GeV and |η| < 2.4; • at least one of the jets in the event identified as a b jet according to the CSV tagger loose requirement; • score from the ttH Hadronic multivariate discriminant greater than 0.75.

Event categories for VH production
The selection criteria targeting the associated production of the Higgs boson with a vector W or Z boson exploit the presence of leptons, missing transverse momentum, and jets. To reduce contamination from Drell-Yan events with an electron misreconstructed as a photon, or with photons radiated in the final state, photon candidates are required to be separated in angle from the closest lepton. The criteria are the following: • leptonic Z decays (ZH Leptonic): • leading photon p T > 3m γγ /8, subleading photon p T > m γγ /4; • diphoton classifier BDT score greater than 0.11; • two same-flavour leptons within the fiducial region, p T > 20 GeV; electrons and muons are required to satisfy the same identification criteria as for the ttH Leptonic category; • dilepton invariant mass m in the range 70 < m < 110 GeV; • R(γ, e) > 1.0, R(γ, µ) > 0.5, for each of the leptons; • in addition, a conversion veto is applied to the electrons to reduce the number of electrons originating from photon conversions, by requiring that, when an electron and photon candidate share a supercluster, the electron track is well separated from the centre of the supercluster: R(supercluster, e-track) > 0.4. • leptonic W decays (WH Leptonic): • leading photon p T > 3m γγ /8, subleading photon p T > m γγ /4; • diphoton classifier BDT score greater than 0.28; • at least one lepton with p T > 20 GeV; electrons and muons are required to satisfy the same identification criteria as for the ZH Leptonic category; • R(γ, ) > 1.0 and conversion veto as in the ZH Leptonic category; • missing transverse momentum p miss T > 45 GeV; • up to two jets each satisfying p T > 20 GeV, |η| < 2.4, R(jet, ) > 0. 4 • dijet invariant mass in the range 60 < m jj < 120 GeV; • |cos θ | < 0.5, where θ is the angle that the diphoton system makes, in the diphoton-dijet centre-of-mass frame, with respect to the direction of motion of the diphoton-dijet system in the lab frame. The distribution of this variable is rather uniform for VH events, while it is strongly peaked at 1 for background and events from ggH production.

Event categories for VBF production
Events produced via the VBF process feature two jets in the final state separated by a large rapidity gap. A multivariate discriminant is trained to tag the distinctive kinematics of the VBF jets, considering as background the production process of ggH + jets. This discriminant is given as input to an additional multivariate classifier (VBF combined BDT) along with the score from the diphoton BDT, and the ratio p γγ T /m γγ . Figure 7 (left) shows the transformed score from the VBF combined BDT for data in the mass sideband regions from 105-115 GeV and 135-145 GeV, along with the predicted VBF and ggH distributions. The VBF combined BDT score has been transformed such that the signal events from the VBF production mode has a uniform distribution. A validation of the score from the combined multivariate classifier obtained in Z → e + e − + jets events, where the electrons are reconstructed as photons and at least two jets satisfy the requirements listed below to enter the VBF category, is shown in Fig. 7 (right) for data and simulation.
The selections targeting the VBF production mechanism are the following: • leading photon p T > m γγ /3, subleading photon p T > m γγ /4; • photon identification BDT score greater than −0.2, to provide additional rejection against background events whose kinematics yield a high diphoton BDT score despite one reconstructed photon with a relatively low photon identification BDT score; • one jet with p T > 40 GeV and one with p T > 30 GeV, both with |η| < 4.7 and with a tight requirement on the pileup jet identification; • invariant mass of the two jets m jj > 250 GeV; • VBF combined multivariate discriminant greater than 0.43.
Three categories are defined using the score from the combined discriminant, and are optimized to maximize the expected signal significance in the VBF production channel.  Figure 7: Score distribution from the VBF combined BDT for (left) ggH and VBF signal distributions, compared to background taken from data in the mass sideband regions, and (right) Z → e + e − + jets events. On the left, the signal region selection is applied to the simulated ggH and VBF events; these are compared to points representing the background, as determined from data using the signal region selection in mass sidebands. On the right, the signal selection is applied to electrons reconstructed as photons, with points showing the distribution for data and the histogram showing the distribution for simulated Drell-Yan events, including statistical and systematic uncertainties (pink band). In both plots, dotted lines delimit the three VBF categories, while the grey region is discarded from the analysis.

Event categories for ggH production
Events not passing any exclusive category are classified using the multivariate discriminator described in the introduction of this section. The score from this classifier is used to select and divide the events into four "untagged" categories according to the diphoton mass resolution and predicted signal over background ratio. The number of categories is determined by maximizing the expected signal significance. The boundaries of these categories are shown in Fig. 5.

Final classification
Each event is classified exclusively by applying the category selections in order and choosing the highest-priority category satisfied by the event. Category selections targeting specific production processes are applied first, ranked by expected signal significance, then untagged categories. The final ordering is thus ttH Leptonic, ttH Hadronic, ZH Leptonic, WH Leptonic, VH LeptonicLoose, VBF categories, VH MET, VH Hadronic, and untagged. The fraction of events with multiple diphoton pairs satisfying one or more category selections is less than 2 × 10 −4 . In this case, the diphoton in the highest-priority category is selected or, in case of ambiguities, the diphoton pair with the highest sum of photon p T is selected.

Signal model
The signal shape for the diphoton invariant mass distribution in each category and for a nominal Higgs boson mass m H is constructed from simulation using events from the different production modes.
The simulation includes the tuning of the photon shower variables to the data, and accounts for trigger, reconstruction and identification efficiencies as measured with data-driven techniques (as discussed in Section 5). It also weights the events so that the distribution of the number of interactions and the primary vertex location reproduce those observed in data, as explained in Sections 4 and 6.
Since the shape of the m γγ distribution changes considerably depending on whether the vertex associated with the candidate diphoton was correctly identified within 10 mm, distributions for the correct vertex and wrong vertex assignments are fit separately when constructing the signal model. For each process, category, and vertex scenario, the m γγ distributions are fitted using a sum of at most five Gaussian functions.
For each process, category, and vertex scenario, a simultaneous fit of signal samples at mass values in the range from 120 to 130 GeV is performed to obtain parametric variations of the Gaussian function parameters used in the signal model fit. Polynomials are used to describe these variations.
The final fit function for each category is obtained by summing the functions for all production modes normalized to the expected signal yields in that category. Figure 8 shows the signal model corresponding to m H = 125 GeV for the best resolution category and also for all categories combined together, weighted by the S/(S + B) ratio, where S is the number of signal events, and B the number of background events in a window around the signal peak, in each category.
The product of efficiency and acceptance of the signal model as a function of m H for all categories combined is shown in Fig. 9.

Background model
The model used to describe the background is extracted from data with the discrete profiling method [38] as implemented in Ref. [15]. This technique was designed as a way to estimate the systematic uncertainty associated with choosing a particular analytic function to fit the background m γγ distribution. The method treats the choice of the background function as a discrete nuisance parameter in the likelihood fit to the data.
No assumptions are made about the particular processes composing the background nor the functional form of their smoothly falling diphoton invariant mass distribution. A large set of candidate function families is considered, including exponentials, Bernstein polynomials, Laurent series, and power law functions. For each family of functions, an F-test [39] is performed to determine the maximum order to be used, while the minimum order is determined by requiring a reasonable fit to the data.
When fitting these functions to the background m γγ distribution, the value of twice the negative logarithm of the likelihood (2NLL) is minimized. A penalty of n p is added to 2NLL to take into account the number of floating parameters n p in each candidate function and avoid favouring functions with a greater number of free parameters. When making a measurement of a given parameter of interest, the discrete profiling method determines the minimal 2NLL by considering all allowed functions for each value of the parameter.

Systematic uncertainties
The systematic uncertainties are treated differently depending on how they affect the m γγ signal distribution. The parameters of the signal model shape are allowed to vary, within the constraints set by the measurements described in Section 5.1, to account for systematic uncertainties in the photon scale and resolution. Additional nuisance parameters are included to account for systematic uncertainties which affect the overall rate and migration of signal events between the categories, and are log-normal constrained. For cases where the systematic uncertainty has an effect on the input to one of the classification discriminants, the uncertainty takes the form of a variation in the category yield, representing event migration between categories.

Theoretical uncertainties
Theoretical uncertainties in the signal yield associated with QCD calculations typically have an overall normalization uncertainty, taken from Ref. [32], along with an additional uncertainty accounting for the migration of events between the analysis categories. The category migration uncertainties are factorized from the overall yield uncertainty by scaling them appropriately so that the overall yield (including events outside the acceptance of the analysis) is unchanged. The uncertainties computed in this way are: • QCD scale uncertainty: related to variations of the renormalization and factorization scales, has two nuisance parameters affecting the overall normalization uncertainty and depending on the number of jets in the event. Variations are found to be typically less then 5%.
• PDF uncertainties: have an overall normalization from the PDF4LHC prescription [32,40], while the bin-to-bin migrations are calculated from the NNPDF3.0 [31] PDF set using the MC2HESSIAN procedure [41]. The category migrations are found to be typically less than 1%, depending on the category.
• α s uncertainty: the uncertainty in the value of the strong force coupling constant α s is evaluated following the PDF4LHC prescription. The overall variation in the relative event yield due to the α s uncertainty is at most 2.6%.
Further theoretical uncertainties are: • Underlying event and parton shower uncertainty: is obtained using samples where the choice and tuning of the generator has been modified. This systematic uncertainty is treated as an event migration systematic as it will mainly affect the jets in the analysis. The possibility that an event could move from one VBF category to another or from any VBF category to an inclusive category is assigned a systematic uncertainty of 7 and 9%, respectively.
• Gluon fusion contamination in the ttH tagged categories: the theoretical predictions for gluon fusion are less reliable in a regime where the Higgs boson is produced in association with a large number of jets. The systematic uncertainty in the gluon fusion contamination in the ttH tagged categories has been estimated taking into account several contributions: • uncertainty due to the limited size of the simulated sample: 10%.
• uncertainty from the parton shower modelling. This uncertainty is estimated as the observed difference in the jet multiplicity between MAD-GRAPH5 aMC@NLO predictions and data in tt + jets events (which are dominated by gluon fusion production gg → tt), with fully leptonic tt decays. This uncertainty is about 35% in the bins with the largest discrepancy (N jets ≥ 5). • uncertainty in the gluon splitting modelling. This is estimated by scaling the fraction of events from gluon fusion with real b jets by the observed difference between data and simulation in the ratio σ(ttbb)/σ(ttjj) at 13 TeV [42]. This uncertainty implies a variation of about 50% in the yield of gluon fusion events.
• Gluon fusion contamination in categories with additional jets and a high-p T Higgs boson: particularly important for estimating the yield in the VBF categories. A total of seven nuisance parameters account for different systematic effects: • uncertainties in jet multiplicities: two nuisance parameters account for missing higher-order corrections and two for migrations between categories with different jet multiplicity. These are based on the STWZ [43] and BLPTW [43][44][45] predictions. • uncertainties in the Higgs boson p T modelling: two nuisance parameters include migrations between regions with p T in the range between 60 and 120 GeV and above 120 GeV. A third nuisance parameter accounts for the impact of top quark mass effects, which are negligible for a Higgs boson p T below 150 GeV and rise to about 35% at 500 GeV; these impact primarily the tightest untagged and VBF categories, where the resulting uncertainty in the predicted gluon fusion yield is 6-8%. • uncertainties in the acceptance of gluon fusion events in the VBF categories, due to missing higher-order QCD effects in the calculations: these are estimated by variations of the renormalization and factorization scales in MCFM 5.8 [46]. Two nuisance parameters account for the uncertainty in the overall normalizations of Higgs boson events with 2 extra jets, or with 3 or more extra jets, allowing one to propagate the impact of jet suppression from the kinematic selections in the VBF BDT scores. An exten-sion of the Stewart-Tackmann method [47,48] is used. The impact on the yield of gluon fusion events in VBF categories is 8-13%.
• Uncertainty in the H → γγ branching fraction: is estimated to be about 2% [32].

Experimental uncertainties in the photon energy scale
The experimental uncertainties in the photon energy scale and resolution are propagated through to the signal model in the final statistical fit, allowing the shape to vary. These uncertainties are: • Energy scale and resolution: The uncertainties in the overall photon energy scale and resolution corrections are assessed with Z → e + e − events and applied to photons. These uncertainties account for varying the R 9 distribution, the regression training (using electrons instead of photons) and the electron selection used to derive the corrections. The uncertainty in the additional energy smearing is assigned propagating the uncertainties in the various |η| and R 9 bins to the Higgs boson signal phase space. In both cases dedicated nuisance parameters are included as additional systematic terms in the signal model and amount to a 0.15 to 0.5% effect on the photon energy depending on the photon category. The effect on the measurement of the inclusive signal strength modifier is found to be about 2.5%.
• Nonlinearity of the photon energy: An additional uncertainty accounts for the possible residual differences in the linearity of the energy scale between data and simulation. This effect is studied using Lorentz-boosted Z boson dielectron decays. The effect is found to be at most 0.1% on the photon energy in all categories, except in the untagged category with highest signal-to-background ratio, for which it is 0.2%.
Additional uncertainties are assigned based on studies accounting for differences between electrons and photons on the following points.
• Nonuniformity of the light collection: The uncertainty in the modelling of the fraction of scintillation light reaching the photodetector as a function of the longitudinal depth in the crystal at which it was emitted. The uncertainty has been slightly increased with respect to Run 1 to account for the larger loss in transparency of the ECAL crystals. The size of the effect on the photon energy scale for 2016 data is estimated to be 0.07%.
• Electromagnetic shower modelling: A further small uncertainty is added to account for imperfect electromagnetic shower simulation in GEANT4. A simulation made with a previous version of the shower description, not using the Seltzer-Berger model for the bremsstrahlung energy spectrum [49], changes the energy scale for both electrons and photons. Although mostly consistent with zero, the variation is interpreted as a limitation on our knowledge of the correct simulation of the showers, leading to a further uncertainty of 0.05% in the photon energy.
• Modelling of the material budget: The uncertainty in the material budget between the interaction point and the ECAL, which affects the behaviour of electron and photon showers, is estimated with specially simulated samples where the material budget is uniformly varied by ±5 %. This accounts for the difference in the estimate of the material budget between data and simulation, using methods based on electron bremsstrahlung, multiple scattering of pions, and energy flow in ECAL. The effect on the energy scale is at most 0.24%.
• Shower shape corrections: The uncertainty deriving from the imperfect shower shape modelling in simulation. It is estimated using simulation with and without the cor-rections on the shower shape variables applied to mitigate discrepancies between data and simulation (as described in Section 5.3). This uncertainty in the energy scale is at most 0.01-0.15%, depending on the photon category.

Additional experimental uncertainties
Other experimental uncertainties are accounted for by propagating the uncertainties in the efficiencies, scale factors, and selection variables through the analysis and applying them to the per-category signal yield: • Trigger efficiency: the trigger efficiency is measured from Z → e + e − events using the tag-and-probe technique; the impact on the event yields is at most 0.1%.
• Photon preselection: the systematic uncertainty is taken as the uncertainty in the ratio between the efficiency measured in data and in simulation; it ranges from 0.1 to 0.7%, according to the photon category, and results in an event yield variation from 0.2 to 0.5%, depending on the event category.
• Photon identification BDT score: to cover the observed discrepancies between data and simulation, the uncertainty in the signal yields in the different categories of the analysis is estimated conservatively by propagating the uncertainty described in Section 5 through the diphoton BDT and categorization.
• Per-photon energy resolution estimate: this uncertainty is parameterized conservatively as a rescaling of the resolution by ±5% about its nominal value, to cover all differences between data and simulation in the output distribution of the estimator. The variation is propagated through the diphoton BDT and categorization procedure.
• Jet energy scale and smearing corrections: this uncertainty is implemented as migration within VBF categories, within ttH categories, within VH categories, and from tagged to untagged categories. Jet energy scale corrections account for an 8 to 18% migration between the VBF categories and 11% from the VBF to untagged categories. The migration due to the energy scale is about 5% in ttH categories and up to about 15% in VH categories. The jet energy resolution has an impact on the event migration of less than 3% in all categories except VH, for which the effect can be as large as 20%.
• Missing transverse energy: this uncertainty is computed by shifting the reconstructed p T of the particle candidates entering the computation of p miss T within the momentum scale and resolution uncertainties appropriate to each type of reconstructed object, as described in Ref. [50]. It results in a 10 to 15% migration from the ggH categories into the VH MET category.
• Pileup jet identification: this uncertainty is estimated by comparing in data and simulation the identification score of jets in events with a Z boson and one balanced jet. The full discrepancy between data and simulation is used to estimate the event migration, which is of the order of 1% or less.
• Lepton isolation and identification: for both electrons and muons the uncertainty is computed by varying the ratio of the efficiency measured in data and simulation by its uncertainty. The measurement is done using the tag-and-probe technique on Z events. The resulting differences in the selection efficiency are less than 1% for the ttH Leptonic category, 1.5% for the WH Leptonic category, and 3% for the ZH Leptonic category.
• b tagging efficiency: uncertainties have been evaluated by comparing data and simulated distributions for the CSV b tagging discriminant, as described in Section 2.
The uncertainties include the statistical component in the estimate of the fraction of heavy-and light-flavour jets in data and simulation, and the corresponding mutual contaminations. These are propagated differently for the hadron-tagged category and the lepton-tagged category, because the former uses the b tagging discriminant distribution as input to a specialized ttH BDT, whereas the latter uses a fixed working point, as described in Section 7. For the lepton-tagged category, the uncertainty is evaluated by varying the measured b tagging efficiencies in data and simulation within their uncertainties [22]. For the hadron-tagged category, the uncertainty is evaluated by modifying the shape of the b tagging discriminant in the simulation. The resulting uncertainty in the signal yields is about 2% in the lepton-tagged category and less than 5% in the hadron-tagged category.
• Vertex finding efficiency: the largest contribution to this uncertainty comes from the modelling of the underlying event, plus the uncertainty in the ratio of data and simulation obtained using Z → µ + µ − events. It is handled as an additional nuisance parameter built into the signal model, which allows the fraction of events in the correct and wrong vertex scenario to change. The size of the uncertainty in the vertex selection efficiency is 2%.
• Integrated luminosity: it amounts to a 2.5% uncertainty in the signal yield [51].
The choice of the background parametrization is handled using the discrete profiling method, described in Section 9, which propagates the uncertainty on the choice of function through the fits.
The dominant systematic uncertainties on the signal strengths and couplings are the photon shower shape modelling (which affects the photon identification and per-photon energy resolution estimate), the photon energy scale and smearing, the jet energy scale, the integrated luminosity. The most important theoretical uncertainties are the branching fraction, and the renormalization and factorization scale uncertainties. Each of these uncertainties has an impact of a few percent on the overall signal strength, with some dependence on the targeted production mechanism, as shown in Fig. 10.

Results
To extract the results, binned maximum-likelihood fits are performed to the m γγ distributions of all categories, in the range 100 < m γγ < 180 GeV, with a single overall signal strength modifier and a single value of m H free to vary in the fit (profiled). Binned fits are used for speed of computation, and the chosen bin size of 250 MeV is sufficiently small compared to the mass resolution that no information is lost. The signal strength modifier µ is defined as the ratio of the observed Higgs boson rate in the H → γγ decay channel to the SM expectation. The data and the signal-plus-background model fit for each category are shown in Figs. 11-13. The m γγ distribution for the sum of all the categories is shown in Fig. 14. The one (green) and two (yellow) standard deviation bands shown for the background component of the fit include the uncertainty in the fitted parameters.   Table 3 and Fig. 15 show the expected number of signal events for each category. The total number is broken down by the contribution (in percent) of each production mode to any particular event category. The σ eff and σ HM are also listed: the former is defined as the smallest interval       Figure 15: Expected fraction of signal events per production mode in the different categories. For each category, the σ eff and σ HM of the signal model, as described in the text, are given. The ratio of the number of signal events (S) to the number of signal-plus-background events (S+B) is shown on the right hand side.

fb
A likelihood scan of the signal strength modifier is performed, with other parameters of the signal and background models allowed to vary. Systematic uncertainties are included in the form of nuisance parameters and the results are obtained using an asymptotic approach [52][53][54] with a test statistic based on the profile likelihood ratio (q) [55]. The individual contributions of the statistical and systematic uncertainties are separated by performing a likelihood scan removing the systematic uncertainties to determine the statistical uncertainty. The systematic uncertainty is then taken as the difference in quadrature between the total uncertainty and the statistical uncertainty. The results can be found in Fig. 16. The best fit signal strength modifier measured for all categories combined using this method is µ = 1.18 +0. 17 −0.14 = 1.18 +0.12 −0.11 (stat) +0.09 −0.07 (syst) +0.07 −0.06 (theo). The best fit mass is found at m H = 125.4 ± 0.3 GeV = 125.4 ± 0.2 (stat) ± 0.2 (syst) GeV. A precise determination of the systematic uncertainties affecting the best fit mass is not within the scope of this analysis. The maximum relative variation of µ for m H within a range of ±1 GeV around 125 GeV is less than 2%.
The results of a fit to the signal strength modifier for each production mode, defined analogously to the overall µ above, are shown in Fig. 17. The observed rates of the VBF, ttH, and VH production modes correspond respectively to p-values of 4.2, 0.074, and 0.47%, with respect to the absence of the considered production mode. The expected p-values are 1.8, 7.3, and 12%, respectively, for an SM Higgs boson, with the current data set.
A similar fit is performed to extract the ratios of observed cross sections to the SM prediction in the stage 0 of the simplified template cross section (STXS) framework [32]. These cross sections are for a reduced fiducial volume, defined by requiring the Higgs boson rapidity to be less than 2.5. Outside of this volume the analysis has a negligible acceptance. The ratios are measured for the ggH, VBF, ttH, and VH production processes. VH is further split considering the decay of the associated boson into WH leptonic, ZH leptonic, and VH hadronic, which groups hadronic decays of both the W and Z bosons. The STXS approach differs from the signal strength mod-ifier measurements in the splitting of the production modes, and reduces the dependence of the measurements on the theoretical uncertainties in the SM predictions, by avoiding the sizeable uncertainty associated with the extrapolation to the full phase space. The measured cross section ratios, where the SM prediction [32] is denoted as σ theo , are shown in Fig. 18. A two-dimensional likelihood scan of the signal strength modifier µ ggH,ttH for fermionic production modes (ggH and ttH) and µ VBF,VH for vector boson production modes (VBF, ZH, WH), with the value of the parameter m H profiled in the fit, is performed. Figure 19 shows the 68 and 95% confidence level (CL) contours. The best fit values for each modifier are µ ggH,ttH = 1.19 +0.22 −0.18 and µ VBF,VH = 1.21 +0.58 −0.51 . Deviations from the SM expectation in the couplings of the Higgs boson can be parameterized using coupling modifiers in the so-called κ framework [17]. Two-dimensional likelihood scans of the Higgs boson coupling modifiers are produced: κ f versus κ V , the coupling modifiers to fermions and bosons; and κ g versus κ γ , the effective coupling modifiers to gluons and photons. The κ parameters other than those varied are fixed to 1 in each case. Figure 20 shows the test statistic q and the 68% and 95% CL contours for each scan. The point (κ V , κ f ) = (1, −1) has an observed (expected) q value of 35.2 (53.7), inconsistent with the observed (expected) best fit value at the level of 5.8 (7.0) standard deviations.

Summary
We report measurements of the production cross section and couplings of the Higgs boson using its diphoton decay: the overall signal strength modifier; the signal strength modifier for each production mode separately; cross section ratios for the stage 0 simplified template cross section framework; the best fit rates in the µ VBF,VH -µ ggH,ttH plane with VBF and VH production, and ggH and ttH production, varied together; and the best fit coupling modifiers in the κ f -κ V and κ g -κ γ planes. The analysis is based on proton-proton collision data collected at √ s = 13 TeV by the CMS experiment at the LHC in 2016, corresponding to an integrated luminosity of 35.9 fb −1 . The best fit signal strength modifier obtained after profiling m H is µ = 1.18 +0. 17 −0.14 =      Figure 19: The two-dimensional best fit (black cross) of the signal strength modifiers for fermionic (ggH, ttH) and bosonic (VBF, ZH, WH) production modes compared to the SM expectation (red diamond). The Higgs boson mass is profiled in the fit. The solid (dashed) line represents the 68 (95)% confidence region. plane are µ ggH,ttH = 1.19 +0. 22 −0.18 and µ VBF,VH = 1.21 +0.58 −0.51 . When µ ttH is considered separately, the best fit value is µ ttH = 2.2 +0.9 −0.8 , corresponding to a p-value of 0.074% with respect to the absence of ttH production. Stage 0 simplified template cross sections are compatible with the standard model.