Measurements of Higgs Bosons Decaying to Bottom Quarks from Vector Boson Fusion Production with the ATLAS Experiment at $\sqrt{s}=$ 13 TeV

The paper presents a measurement of the Standard Model Higgs Boson decaying to $b$-quark pairs in the vector boson fusion (VBF) production mode. A sample corresponding to 126 fb$^{-1}$ of $\sqrt{s} =$ 13 TeV proton-proton collision data, collected with the ATLAS experiment at the Large Hadron Collider, is analyzed utilizing an adversarial neural network for event classification. The signal strength, defined as the ratio of the measured signal yield to that predicted by the Standard Model for VBF Higgs production, is measured to be $0.95^{+0.38}_{-0.36}$, corresponding to an observed (expected) significance of 2.6 (2.8) standard deviations from the background only hypothesis. The results are additionally combined with an analysis of Higgs bosons decaying to $b$-quarks, produced via VBF in association with a photon.


Introduction
Approximately 4.5 million Higgs bosons [1][2][3][4][5][6][7][8] have decayed into bottom quarks in the nearly 140 fb −1 of 13 TeV LHC collisions collected for analysis by the ATLAS experiment during Run 2. While this number is three times larger than that from any other decay mode, this decay channel remains the most poorly measured major Higgs boson decay channel. In the dominant production mode, where the Higgs boson is produced in a virtual top-quark loop connecting two interacting gluons (ggF), the signature of this decay is overwhelmed by the strong production of quarks and gluons, except in cases where the Higgs boson is highly boosted [9]. The ATLAS and CMS experiments have observed Higgs boson decays into -quarks, →¯, with most of the sensitivity coming from cases where the Higgs boson is produced in association with a vector boson ( , = , ), which provides sufficient discrimination against QCD background processes despite its small cross-section. The measurement by the ATLAS experiment of the signal yield relative to the Standard Model expectation, →¯, is 1.02 ± 0.12(stat.) ± 0.14(syst.), corresponding to a significance of 6.7 [10] relative to the background-only hypothesis. The measurement by the CMS experiment is →¯= 1.04 ± 0.14(stat.) ± 0.14(syst.), corresponding to a significance of 5.6 [11].
Vector-boson fusion (VBF), wherein the Higgs boson is produced when quarks from each proton radiate weak vector bosons that fuse to form the Higgs boson, is the second most frequent Higgs boson production mechanism. Its signature, shown in Figure 1(a), is characterized by the presence of jets from each of the quarks with a large rapidity gap between them. Because there is no coloured connection between the two protons, radiative hadronic activity between the two forward jets is suppressed. VBF Higgs boson production has been measured by the ATLAS experiment in several decay channels, and the combined result for the signal strength is VBF = 1.21 ± 0.18(stat.) ± 0.15(syst.) [12]. The most recent CMS measurement yields VBF = 0.73 ± 0.23(stat.) ± 0.16(syst.) also through the combination of several decay channels [13].
Studying Higgs boson decays into -quarks in the vector-boson fusion channel provides an avenue to pursue this challenging signature. Previous measurements of this process by the ATLAS experiment using 31 fb −1 of √ = 13 TeV data yielded combined results for VBF production with and without a photon in the final state [14]. The resulting signal strength of →¯was →¯= 2.5 +1.4 −1.3 , corresponding to an observed (expected) significance of 1.9 (0.8 ). The observed signal strength for VBF production of →( electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range | | < 4.9. Within the region | | < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering | | < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within | | < 1.7, and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimized for electromagnetic and hadronic measurements respectively.
The muon spectrometer comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroids. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. A set of precision chambers covers the region | | < 2.7 with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range | | < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected to be recorded by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [20]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger reduces in order to record events to disk at about 1 kHz.
The data used in this analysis were collected in 2016, 2017 and 2018 [21]. After applying event-cleaning requirements which ensure that the data are of good quality for triggering and reconstructing jets containing -quarks, these datasets correspond to integrated luminosities of 24.6 fb −1 , 43.6 fb −1 , and 57.7 fb −1 , respectively, for a total integrated luminosity of 125.9 fb −1 . During the 2016 data-taking, roughly one-fifth of the data taken by ATLAS which was good for physics was affected by an inefficiency in the vertex reconstruction in the high-level trigger, which reduced the efficiency of the algorithms used to identify jets originating from -hadron decays; those events were not retained for further analysis and are not included in the 24.6 fb −1 .

Monte Carlo Samples
Simulated events with a Higgs boson mass of 125 GeV are used for the signal modelling. The signal models include all major Higgs boson production modes: VBF and ggF, as well as and associated production with a pair of top quarks (¯). Simulated VBF signal events were generated at next-to-leading order (NLO) in QCD with P -B v2 [22][23][24][25], using the NNPDF3.0 [26] parton distribution functions (PDFs). NLO electroweak corrections are calculated using M G 5_aMC@NLO v3.0.1 [27] and applied as a function of the generated Higgs boson transverse momentum ( T ). Simulated ggF samples were generated at next-to-next-to-leading order (NNLO) in QCD with P -B v2 [28][29][30] using the NNPDF3.0 PDFs. Both the VBF and ggF samples use P 8.212 [31] for parton showering and fragmentation with the A3 tuned parameter set and the NNPDF2.3LO PDF set [32]. For the VBF sample, the P dipole-recoil scheme [33] is used to improve the modelling of radiation in the central region. P -B v2 interfaced to NNPDF3.0 was used to generate events that were showered with P 8.212 to model contributions from [34][35][36][37] and¯ [38]. All samples were interfaced to E G v1.6.0 [39] for heavy-hadron decays. The and non-resonant backgrounds are derived from the data.
Multiple collisions were simulated with the soft QCD processes of P 8.212 using the A3 tuned parameter set and the NNPDF2.3LO PDFs. These additional interactions are overlaid on the hard-scatter interaction of the signal and background samples according to the luminosity profile of the recorded data to model contributions from interactions in both the same bunch crossing and neighbouring bunch crossings (pile-up). The response of the ATLAS detector to the generated events is then modelled with a detailed ATLAS detector simulation software [40] based on G 44 [41].

Event reconstruction
This analysis uses an all-hadronic final state, and therefore the primary objects of interest are jets. Jets are reconstructed by applying the anti-jet clustering algorithm [42,43] with a radius parameter of = 0.4 to particle flow [44] objects which are constructed from calorimeter energy clusters and tracks. In order to remove jets originating from additional interactions, the likelihood-based 'jet vertex tagger' [45] algorithm is applied to all jets with T < 60 GeV and | | < 2.5. Quality and cleaning cuts are applied to ensure well-measured jets. In addition, -and T -dependent scale factors are applied to correct the jet energy to the hadronic level [46], and a pile-up subtraction algorithm is applied to reduce effects of pile-up contributions to the jet energy.
A multivariate -tagging algorithm, MV2c10 [47], is used to identify jets which contain -hadrons. This algorithm is a boosted decision tree which combines several features of impact parameter distributions of associated tracks with properties of secondary vertices, as well as jet T and . A similar algorithm is used by the trigger, differing only in the implementation of the jet, track and primary vertex definitions used, and, in 2016 data-taking, by the flavour-tagging discriminant. Two different identification working points of the algorithm are used, corresponding to 77% and 85% efficiency for -jets as measured in as ample. In the trigger system, the 60%, 70% and 85% working points are used. In order to improve the resolution of the invariant mass of -jet pairs, additional corrections are applied to the energy of -tagged jets. These corrections account for semileptonic decays of the -hadron (the muon-in-jet correction) and energy resolution effects specific to -jets (the PtReco correction). They improve the dĳet mass resolution by up to 20% [48].
Photons, electrons and muons are also identified in order to veto events which overlap with other →ā nalysis channels [16,49]. Muons are additionally used in the embedding procedure described in Section 6. Photons are reconstructed from energy clusters in the electromagnetic calorimeter and calibrated to account for energy losses upstream of the calorimeter as well as leakage outside of the clusters [50]. Photons which convert in the detector material are reconstructed separately, utilizing tracks to find the conversion vertex. A tight, cut-based selection is applied, providing good rejection of hadronic jets where a neutral meson carries most of the jet energy. To further suppress the jet backgrounds, the photons are required to be isolated, meaning that after corrections for pile-up the sum of the transverse energy in a cone of Δ = 0.4 around the photon candidate must not exceed a threshold which depends on the photon's T . Electrons are also reconstructed utilizing energy clusters in the electromagnetic calorimeter [50]. Electron candidates must satisfy a loose likelihood-based electron identification criterion, including a loose isolation requirement.
Muons [51] are reconstructed using inner detector, calorimeter and muon spectrometer information. In this analysis they satisfy loose identification requirements, including a loose isolation requirement. Of particular importance for this analysis is a requirement that both electrons and muons are prompt, i.e. their impact parameter significance is required to be small. An overlap removal procedure is applied in order to ensure that jets, electrons, muons and photons are not double counted. Events with identified photons with transverse energy ( T ) > 15 GeV, or electrons or muons with T > 7 GeV after overlap removal are vetoed. The objects have identical selection requirements to Refs [16,49] in order to ensure orthogonality.

Event selection
There are two channels in this analysis, the forward and central channels, which are distinguished by the presence or absence of a high-T forward (3.2 < | | < 4.5) jet in the event. The specific selections are noted below and summarized in Table 1. In the following, 1 and 2 refer to the -tagged jets forming the highest-T -tagged jet pair in the event, and 1 and 2 are two additional jets referred to as VBF jets. Generally, 1 refers to the highest-T and most forward VBF jet, and 2 refers to the second-highest-T VBF jet. Additional requirements on the jets are channel specific and detailed below.
The trigger for the forward channel targets events with at least two -tagged jets, one with trigger level T > 80 GeV and a second with T > 60 GeV, and at least one forward jet with T > 45 GeV. The corresponding event selection requires at least one -tagged jet passing the 77% -tagging working point to have T > 85 GeV and | | < 2.5 ( 1 ), and a second -tagged jet passing the 85% -tagging working point to have T > 65 GeV and | | < 2.5 ( 2 ). One forward jet is required to have T > 60 GeV ( 1 ) and an additional jet is required with T > 30 GeV and | | < 4.5 ( 2 ). Reconstructed jets in the event are required to be matched to trigger jets.
The central channel trigger targets events with at least three central jets, two of which are -tagged. The trigger jet thresholds varied over the run period. The corresponding event selection requires at least two -tagged jets passing the 77% -tagging working point to have T > 65 GeV and | | < 2.5 ( 1 , 2 ). Two additional jets are required, one jet having T > 160 GeV and | | < 3.1 ( 1 ) and a second requiring T > 30 GeV and | | < 4.5 ( 2 ). Furthermore, events are rejected if they have any jets with T > 60 GeV and 3.2 < | | < 4.5 to maintain orthogonality with the forward channel. As in the forward channel, reconstructed jets in the event are required to be matched to trigger jets.
In cases where there is ambiguity when identifying the two VBF jets (e.g. when more than two jets not selected as the signal -tagged jets satisfy the T and requirements), the pair giving the highest invariant mass of the dĳet system, , is chosen.
To remove a shaping of the invariant mass of the -tagged dĳet system, , by the high leading-jet thresholds, a cut on the transverse momentum of the -tagged dĳet system, T, , is required. Both channels require T, > 150 GeV. The central channel additionally requires to be greater than 800 GeV. The efficiency of the combined selection of both channels for simulated VBF signal events with true Higgs boson rapidity | | < 2.5 is 0.8%, with roughly half of the acceptance coming from each channel. With respect to simulated VBF signal events with | | < 2.5 and T > 200 GeV, this selection is 10% efficient, with roughly 4% and 6% contributed from the forward and central channels, respectively.

Adversarial Neural Network for Event Classification
The sensitivity of the analysis is boosted by using an ANN to divide events into regions of varying signal to background composition. This type of multivariate classifier can be constructed such that the network Table 1: Event selection criteria for the two channels used in this analysis. The T and | | requirements on the jets are used to match trigger selections and flavour-tagging requirements. is penalized for learning a feature of the dataset [52,53]. Because the Higgs boson signal is extracted from a fit to , the ANN used in this analysis is penalized for learning the distribution. Using this construction, the non-resonant background shape is independent of the classifier score and therefore the same for each region, which allows for extra constraints in the shape of the non-resonant background.
Twelve variables are used as inputs to the ANN:

•
: the invariant mass of the VBF jet pair. This is typically larger in the signal than in the background.
• T, : the transverse momentum of the VBF jet pair. This is typically larger in the signal than in the background, due to both the harder T, distribution and the lower jet multiplicity in the signal.
• balance T : the ratio of the vectorial and scalar sums of the transverse momenta of 1 , 2 , 1 and 2 . Generally, in the signal the four jets tend to be balanced, so the quantity is smaller in the signal than in the background.
• ( 1 T − 2 T )/( 1 T + 2 T ): the asymmetry in the VBF jet transverse momenta, which tends to be smaller in the signal than in the background.

•
( , ): the separation in between the -tagged jet pair and the VBF jet pair. Due to the dominant (→¯) (→ ) background, where the gluons can have a large separation in , this quantity tends to be larger in the background than in the signal.

•
( , ): the separation in between the -tagged jet pair and the VBF jet pair. In the signal, the¯and systems tend to have a larger separation than in the background.
• tan −1 (tan( ( )/2)/tanh( ( )/2)): the measure of the relative angle of Δ and Δ between the two -tagged jets. This variable takes on higher values in the signal than in the background.
• Jets : the number of jets with T > 20 GeV and | | < 4.5. This quantity is, on average, larger in QCD-produced events, such as the background, than those arising from electroweak production, such as the signal.
• min ( 1(2) ): the minimum separation in Δ between the (sub)leading VBF jet and any jet in the event which is not a part of the -tagged jet pair or VBF jet pair. If there are none of these jets, this quantity takes the default value of 0.
• 1 (2) trk : the number of tracks matched to the (sub)leading VBF jet. Signal VBF jets are typically quark-initiated, while background events have a higher gluon composition, which produces higher track-multiplicity jets. This variable is only valid for jets with T > 50 GeV and | | < 2.1. Jets outside this kinematic region take specific values depending on their T and . This information is useful as jet quark versus gluon composition varies as a function of T and .
The ANN consists of a classifier and an adversary. The classifier's role is to determine if the event is signal-or background-like. The adversary's role is to determine the value of in terms of a binned distribution. Then the two are combined such that the overall network discriminates between signal and background but is penalized if the value is learned, i.e. if it can accurately determine the bin. To achieve this, a three-step training procedure is used [52]. First the classifier is pre-trained with binary cross-entropy loss, while keeping the adversary weights frozen. Next, the adversary is pre-trained with categorical cross-entropy loss, keeping the classifier weights fixed. Third, the classifier and adversary are trained together with a combined loss function, . The combined training proceeds in two sub-steps for each epoch. First the classifier is trained with = classifier − adversary , keeping the adversary weights frozen, and then the adversary is trained with loss function = adversary . The configurable parameter controls to what extent the adversary impacts the overall loss function, i.e. how much the network is penalized for learning . As a post-training step, the classifier scores are scaled to quantiles of the signal MC distribution with the output values ranging from 0 to 1.
One ANN for each channel is trained in Theano through Keras [54,55]. The Adam optimizer is used with standard parameters from Ref. [53]. Signal events for training data are taken from the VBF MC sample described in Section 3. The background events are taken from data sidebands in the range 70 GeV < < 100 GeV (low mass sideband) and 140 GeV < < 200 GeV (high mass sideband). This range is slightly larger than the mass range used in the signal extraction fit (see Section 7) to avoid edge effects in the spectrum. Multi-jet MC events are used for determining the network performance and architecture, but data sidebands are used in the final discriminant training due to the fact that the MC sample does not fully capture the data complexity. For events in the low (high) mass sideband, the adversary bins have a width of 5 (10) GeV, leading to six bins per sideband and a roughly equal number of events per bin in each sideband.
The network architecture is optimized for both discrimination of signal from background and decorrelation of the ANN output score with . Because a reduction in the correlation between and the classifier score is particularly important in the most sensitive signal regions, both the bulk decorrelation and the decorrelation for events with scores in the top 1% of ANN outputs are used to select the network architecture. The network is constructed with = 10 and two epochs of pre-training for both the adversary and classifier, and the adversary training utilizes only the background events. Four-fold cross validation, in which the data is divided into four random subsamples, or folds, and each fold is tested on an ANN trained with the other three folds, is used to verify that there is no overtraining. This procedure ensures that the data are always categorized with an independently trained ANN. Additional uncertainties due to possible bias arising from using data sidebands in the training in which the Higgs boson mass window cannot be directly checked are assessed in Section 8.3. Figure 2 shows the distributions of the ANN scores for the VBF signal, ggF events and the low and high mass data sidebands. It can be seen that good separation is achieved between the VBF signal and data sidebands. Additionally it is noted that the ggF events have a similar score distribution to the data sidebands and that the low and high mass sidebands are in good agreement with each other, as shown in the ratio panel. The variables which contribute most to the ANN's discrimination power are , Jets , balance T , Δ ( , ), and min Δ ( 1(2) ). The ANN score is used to divide the data into signal regions for the signal extraction fit. Both the number of regions which are used in the fit and the boundaries of those regions are optimized in order to maximize the overall signal sensitivity. The optimization metric is where is the number of regions, is the number of signal events in the Higgs boson mass window (100 GeV < < 140 GeV) in region , is the number of background events in the Higgs boson mass window in region , and ( ) is the uncertainty on the number of background events.
is determined from the signal MC and is estimated from a multi-jet MC normalized to the dataside bands. The uncertainty on the background, ( ), is taken to be 10% of . The significance scan starts from = 2 because the lowest classifier score region ( = 1) is used primarily to constrain the background shape in the high score regions. This optimization results in five signal regions for each channel, with region boundaries, signal yields, boson yields, and data sideband yields for each channel shown in Table 2. The four highest-score regions are named SR1 to SR4; the lowest-score region is named RegLow.

Resonant Background Determination
Candidate →¯events contribute significantly to the selected event sample in the range < 120 GeV. Because of the kinematic requirements of the event selection, simulated samples do not accurately reflect the data in the phase space probed by this analysis. Additionally, the →¯ANN score distribution is similar to the data sidebands, and consequently there is little constraining power of the resonant contribution in the fit, which is described in Section 7. Therefore, the yield of the resonant component is constrained with a data-driven estimate from → + − events. The → + − candidate data events are selected by requiring a dimuon pair in the mass window (81 GeV < < 101 GeV) recorded by single-muon and double-muon triggers which do not have an isolation requirement. The reconstructed muon pair in the event is replaced with particle-flow objects obtained by showering, hadronizing and reconstructing an MC-generated -quark pair with four-vectors matching the muon pair four-vectors. The -quark pairs are showered and hadronized with P 8.226 and run through the ATLAS fast simulation [40], digitization and reconstruction chain. Jet finding is run on the modified events and the event selection cuts are applied. The events are first weighted to correct for the muon trigger and offline reconstruction efficiencies. Then the forward and central channel trigger requirements are emulated by applying data-driven trigger efficiency maps as a function of the VBF jet T and . The resulting sample is called the embedded sample. The events are processed through the full event selection and region categorization to yield the number of resonant background events in each region.
The process is validated by applying the embedding method in MC events and then comparing embedded and non-embedded events utilizing the analysis selections. Two samples are used for the validation. Embedded and non-embedded →¯MC events are compared; however, the samples are rather small after analysis selections are applied. Therefore, the process is also validated using embedded and non-embedded VBF →¯events. The embedded and non-embedded MC samples agree to within 20% for the variables used in the ANN and for the and T, distributions. In the signal extraction fit described in Section 7, this 20% residual non-closure is used as an uncertainty in the resonant component.

Signal Extraction Method
The Higgs boson signal strength is determined from an extended binned maximum-likelihood fit to the distribution in data. The two channels are combined in a joint fit of the distribution where all ten signal regions share the same signal-strength parameter, . The negative log-likelihood is constructed as shown in Equation 1, where denotes the yield of ℎ bin of ℎ signal region of ℎ channel. Nuisance parameters, which have external constraints ( ), are penalized in the negative log-likelihood.
The normalizations of the non-resonant background in each signal region, denoted with , are free parameters, whereas the signal normalizations VBF are set to the Standard Model expectation from simulation. VBF, →¯r epresents the observed signal strength and is an extracted parameter. The contributions from ggF,¯and production are fixed to their Standard Model predictions, ggF+VH+ttH . The measurement is primarily sensitive to differences in the signal contribution relative to the background among the regions, and therefore it is only weakly sensitive to the ggF contribution, which has approximately the same ANN distribution as the backgrounds. The¯and contributions are negligible. The normalizations of the Z events, , are taken from the embedding process. The shapes of Higgs boson, Z boson, and non-resonant background are , and , respectively. The Higgs boson signal shape is determined from a binned fit of a Bukin function [56] to MC signal events. The boson shape is also taken from a binned fit of a Bukin function to a sample of →¯events obtained by the embedding process. The normalization of the contribution in each region is taken from the number of embedded events passing that region's selection cuts. For each channel, the non-resonant background is fit to a binned distribution of arbitrary shape which is subject to the constraint that it is the same in all regions, hence the absence of a subscript in equations 2. The shape is primarily constrained by each channel's RegLow region, which has approximately 50 times more events than the other regions in each channel. The normalization is allowed to float in each region. The fit range is 80 GeV < < 200 GeV. The bin width is 4 GeV. Data in the Higgs boson mass window are kept blinded until all the elements of the analysis are finalized.
Systematic uncertainties affecting the signal and simulation-based background components, described in Section 8, are included as nuisance parameters and most are constrained in the likelihood using Gaussian or log-normal probability density functions. All experimental uncertainties, described in Section 8.1, are treated as fully correlated across all regions. Uncertainties related to different -tagging working points are taken as uncorrelated with each other; however, it was checked that the impact on the sensitivity of treating them as fully correlated or fully uncorrelated is negligible. Theoretical uncertainties, described in Section 8.2, are treated as correlated across regions. The uncertainty in the resonant background normalization and width related to the embedding process is taken as uncorrelated between regions. Uncertainties covering possible biases in the non-resonant background shape are included as an additional uncertainty in the signal normalization per region and are uncorrelated between regions. This bias uncertainty is determined using dedicated control regions described in Section 8.3.
The parameter of interest in the fit is the signal strength of the vector-boson fusion production channel, VBF . Tests of Asimov data with signal injected at strengths of VBF, →¯= 0, 1, 2 confirmed the linearity of the fit with no bias in VBF, →¯. As a cross-check verifying that the analysis is only weakly sensitive to the overall ggF contribution, a 100% correlated uncertainty was added to the ggF uncertainties. The expected VBF, →¯u ncertainty changed by only 0.01. The fit is also performed with the inclusive Higgs boson signal strength →¯a s the parameter of interest. For this parameter of interest, the ggF, VBF, and contributions are fixed to their relative contributions according to the Standard Model values and a single signal strength parameter is applied to all contributions. Signal injection tests also confirm the linearity of the fit with no bias in →¯.

Systematic Uncertainties
Systematic uncertainties are divided into three categories: experimental, theoretical and data-driven background uncertainties. The first two categories impact the signal and simulated component of the resonant background estimation. All of these uncertainties except the luminosity uncertainty are propagated to the kinematic event variables prior to signal region classification, such that the uncertainty is estimated both for the ANN classification and for the signal and resonant background shape modelling. The impact of the uncertainties on both the signal region classification and the shape are fully correlated, as are the uncertainties affecting both the signal and the simulated component of the resonant background estimation.

Experimental systematic uncertainties
The most prominent experimental uncertainty is due to uncertainties in the -jet trigger scale factors which account for differences between data and MC events in the efficiency of the triggers. The per-jet online -tagging efficiency calculated with respect to the offline -tagging efficiency is measured in¯events in both data and MC events. A scale factor is then applied to the leading simulated -tagged jet as a function of the leading jet's T . For the 2016 dataset, an additional scale factor binned in the leading jet's was applied. The total uncertainty in the -jet trigger scale factor is typically 2-5%. These uncertainties are applied to signal events and the embedded sample which uses simulated -jets. Additional jet trigger scale-factor uncertainties, which are determined by taking the difference between the leading jet trigger efficiencies as determined in data and MC events using the same procedure, are approximately 1%.
The second leading experimental uncertainty impacts the per-jet track multiplicity distribution used for quark-gluon separation [57] in the ANN. This quantity is affected by the modelling of the charged-particle multiplicity in jet fragmentation models as well as the track reconstruction efficiency in jets. The modelling uncertainties are determined through a measurement of the per-jet charged-particle multiplicity in data which is then compared with various models. The experimental uncertainties are evaluated through standard track-reconstruction efficiency techniques, and their estimation is fully described in Ref. [57].
Jet energy scale (JES) and jet energy resolution (JER) uncertainties comprise the next largest group of experimental uncertainties. The JES uncertainties are primarily determined using data-based -boson-jet, photon-jet and multi-jet T -balancing techniques [46]. Additional uncertainties are applied for the energy scale of jets containing -quarks. The impact of the uncertainty in the JES is estimated by scaling the jet energies within their uncertainties. JER uncertainties are also determined from in situ measurements of -boson-jet , photon-jet and dĳet T balancing [46]. The systematic uncertainty due to the JER is calculated by increasing the resolution within its uncertainties, smearing the jet energy by the resulting change in resolution, and comparing the result with the nominal shape and normalization in simulation.
Offline reconstructed -tagging scale factors and their associated uncertainties are determined on a per-jet basis through comparisons of data and MC¯, + , * , and multi-jet events [47, 58,59]. For -jets the uncertainty in the scale factors is approximately 2%, for -jets it is 10%, and for light-quark and gluon jets it is 30%.
The uncertainty in the integrated luminosity is 1.7% [60], obtained using the LUCID-2 detector [61] for the primary luminosity measurements. This uncertainty is applied to the signal process. Other uncertainties include the uncertainty from the jet vertex tagging requirement, which is measured on a per-jet basis in → ℓ + ℓ − events with one jet, and the MC pile-up reweighting uncertainty.

Theoretical systematic uncertainties
Theoretical uncertainties primarily affect the signal modelling and are divided into several categories. The value of the →¯branching ratio and its uncertainty are calculated by the HDECAY program [62] using the LHC Higgs Cross Section Working Group recommendations [63] with = 125 GeV. To estimate the effect of missing higher-order terms on the QCD calculations used to predict the VBF and ggF cross-section and acceptance, the chosen renormalization and factorization scales are independently varied by factors of 0.5 and 2.0. Additional acceptance uncertainties are applied to ggF events in which additional radiation produces a VBF-like topology by having an additional jet pair with an invariant mass greater than 400 GeV [63]. There is a 20% uncertainty for all events in this category, and additional uncertainties for events with a fifth jet. The impact of missing electroweak corrections on the signal cross-section and acceptance is calculated using the recommendations from the LHC Higgs Cross Section Working Group, namely that the uncertainty is calculated from the ratio of the cross-section calculated with NLO electroweak corrections using HAWK 2.0 to the leading-order cross-section plus the contributions from photon-induced processes [64]. The uncertainty is applied as a function of the Higgs boson T . Uncertainties in the cross-section and acceptance due to the choice of PDF and s are evaluated using the error eigenvectors of the nominal PDF. Additionally, the overall PDF+ s cross-section uncertainty is applied as a uniform uncertainty. In order to estimate the uncertainty due to the choice of parton shower and underlying-event model, the difference between generator-level signal samples showered with P 8.212 and H 7.0 [65,66] is determined as a ratio in the variables and jets which is applied to the nominal signal weights. These variables are chosen as they show the largest differences at the truth level. When calculating the significance and the signal strength, all uncertainties are used. When calculating the cross-sections, the cross-section uncertainties are removed and only acceptance uncertainties are considered. Contributions from¯and are small and a conservative 100% uncertainty is assigned to their cross-section and acceptance.

Data-driven background systematic uncertainties
Additional uncertainties arise from potential biases in the non-resonant background shape in the Higgs boson mass window due to the fact that the mass window is not used in the ANN training. Studies of multi-jet MC confirm that training on the sidebands does not induce any bias in the Higgs boson mass window, however these samples do not have sufficient statistics to derive an uncertainty. Therefore these uncertainties are assessed with shape-bias control regions (CRs). The CRs are constructed by inverting one or more event selection cuts to produce an orthogonal event sample which has similar kinematics to the SR. Then the CR events undergo a translation and reweighting, described below, to correct the CR kinematics back to the SR kinematics. Lastly, the ANN is applied, generating one control region for each region in each channel.
The translation is done by smearing the jet of the CR VBF jets with a Gaussian width of 200 GeV/ T,jet . The new jet is called . The smearing of the original jet value is repeated until the modified event meets the signal region selections. Table 3 presents a summary of the event selection in the CRs. The forward channel CR uses a trigger requiring four central jets, at least two of which are -tagged, and the event selection requires four jets with | | < 2.5, at least two of which are -tagged. In the central channel CR, the trigger and event selection is the same as in the SR except that the CR requires Δ > /4 and the cut is inverted, i.e. < 800 GeV. While the cut is sufficient to ensure orthogonality, the Δ cut improves agreement between CR and SR kinematic distributions. In order to improve agreement between the SR and CR in the ANN output score distributions, the forward channel CR events are sequentially reweighted to match SR kinematics in Jets , , and T, , and the central channel CR events are sequentially reweighted in Δ ( , ), Jets , and balance T . Table 3: Event selection criteria for the forward and central channel CRs. The T and | | requirements on the offline jets are used to match trigger selections and flavour-tagging requirements. The variables and refer to the and of the VBF jets after the smearing procedure described in the text. To determine the bias uncertainty, the CR distribution is reweighted to match the SR using a reweighting factor derived from RegLow in each channel. This reweighting is performed by first doing a simultaneous fit of each CR and its RegLow to the and non-resonant background components. As in the fit described in Section 7, the non-resonant background has arbitrary shape and the normalization and shape is determined using constraints from the embedding. Next, the component is subtracted from the CRs such that only the non-resonant background remains. Then the embedded component is subtracted from the SR RegLow, leaving only the SR RegLow non-resonant background, as the contamination from the signal is negligible. The ratio of the -subtracted SR and CR RegLow shapes is determined. This ratio varies smoothly from 0.95 to 1.1 (1.05) in the forward (central) channel. A third-order polynomial fits it well in the range 80 GeV < < 180 GeV, with a 2 probability of 0.92 (0.19) in the forward (central) region. The ratio is applied to all of the -subtracted CRs for each channel. Figure 3 shows the distribution of ANN scores for the SR events, the CR events prior to reweighting, the CR events after kinematic reweighting, and the CR events after kinematic and reweighting. It can be seen that after kinematic reweighting the SR and CR ANN distributions agree well. There is good agreement between the reweighted CR and SR sidebands for all regions, with 2 probabilities ranging from 0.20 to 0.91. Lastly, the component is added back to each CR, and each CR is fit for a bias signal simultaneously with RegLow for that channel's control region. The shape of the bias signal is the same as the Higgs boson signal shape. The mean of the bias signal is scanned from 100 to 140 GeV and the largest bias signal found in this range is taken as an additional uncertainty in the number of Higgs boson events in that region for the Higgs boson signal-strength fit. True signal contamination in the control regions are negligible. The maximum observed bias ranges from 60% of the expected signal in the least sensitive signal regions to 14% of the expected signal in the most sensitive signal region. The bias uncertainties are uncorrelated between channels and regions because they are dominated by statistical effects. Assuming the bias uncertainties to be correlated leads to a modest increase in the overall uncertainty and no change in the central value of the fit. Several modifications to the CR definitions, smearing procedure and reweighting schemes, including using no reweighting, were tested and no significant change in the bias uncertainty was found.
The uncertainty in the resonant background normalization related to the embedding process is determined from the 20% residual non-closure of the method seen in the variables used in the ANN. Studies show that the uncertainty in the Higgs boson production signal-strength parameter is not sensitive to variations in the resonant background width uncertainty for a large range of non-zero values, and therefore a conservative 20% is taken as the width uncertainty. These uncertainties are in addition to the experimental uncertainties related to jet reconstruction, triggering and -tagging described above.

Results
The fits to the full mass region are shown in Figures 4 and 5 for all signal regions in the forward and central channels, respectively. A summary plot, with all signal regions summed, weighted by ln(1 + / ), is shown in Figure 6. The values of and are the integrated numbers of signal and background events in a region centred on the Higgs boson mass and containing 68% of the signal events. This weighting approximates the relative contribution each region has in the overall significance of the measurement. The quality of the fit was checked by performing an eight-parameter-of-interest fit to separately determine the signal strength in each of the four signal regions of each channel. The probability of compatibility of the signal strengths with a single value is 88% and the results for the two most sensitive signal regions are consistent, well within one standard deviation of each other.  Table 5 summarizes the uncertainties in the VBF, →¯m easurement. The measurement is statistically limited, with the largest uncertainties coming from the data statistics and the non-resonant background bias uncertainty. The experimental systematic uncertainties, which are about a third as large as the statistical uncertainties, are dominated by the trigger and jet energy scale and resolution uncertainties. Taken together, the theory uncertainties are comparable to the largest individual experimental uncertainties, as is the uncertainty due to the embedded sample constraint. Due to the requirement on T, , the measurement is primarily sensitive to higher-T Higgs boson production. The Higgs boson cross-section at high T is sensitive to many BSM models, motivating the definition of dedicated high-T (true Higgs boson T > 200 GeV) regions in the simplified template cross-section framework [67]. The observed signal strength of inclusive Higgs boson production with a true T > 200 GeV is 0.93 +0. 38 −0.38 (stat.) +0.24 −0.20 (syst.), for a significance of 2.2 , compared with 2.3 expected. For this measurement, the contribution from events with true T < 200 GeV is fixed to its Standard Model expectation. Results of two-dimensional likelihood scans of the signal strength for Higgs boson production with true boson T > 200 GeV versus the signal strength for Higgs boson production with true boson T < 200 GeV are shown in the Appendix, where the relative constraining power can be seen. Results of two-dimensional likelihood scans of the signal strength for Higgs boson production in the VBF production mode versus the signal strength for Higgs boson production in the ggF production mode are also shown in the Appendix. It can be seen that the measurement has little power to constrain ggF production separately from VBF production.       (e) Figure 4: Fit results for the forward channel. Top left is SR1, top right is SR2, middle left is SR3, middle right is SR4, bottom is RegLow. The data are the black points, the blue line is the non-resonant background shape (continuum), grey is the resonant background ( →¯), red is the fitted VBF Higgs boson signal (VBF →¯), and maroon are events attributed to the other Higgs boson production mechanisms (ggF +¯+ ). The bottom panel of each plot shows the difference between the data and the non-resonant background. The hashed band shows the fitted background and signal uncertainties. For RegLow, the dominant background uncertainty is the statistical uncertainty of the shape of the background template. This uncertainty is mainly driven by the statistical uncertainty of data in RegLow, such that the background uncertainty is almost fully correlated with the data uncertainty in RegLow.
Weighted by Higgs Boson ln(1+s/b) Figure 6: The distribution after subtraction of non-resonant background, with signal regions 1 through 4 for both channels weighted by ln(1 + / ) within the range 112 < < 136 GeV and summed. This range corresponds to roughly 68% of the total Higgs boson signal template width. The value of is calculated from the observed Higgs boson signal (including all →¯production processes) within this range for each region, and is calculated from the post-fit background yields in each region. The data are the black points, the contribution is in grey, the fitted VBF Higgs boson signal is in red, and maroon are events attributed to the other Higgs boson production mechanisms (ggF +¯+ ). The hashed band shows the fitted background uncertainties.   [68], including corrections at NLO from electroweak and photon processes determined with HAWK 2.0, is 3.77 ± 0.08 pb. A fiducial cross-section is determined by requiring | | < 2.5 and is measured to be VBF-fid = 3.31 +1.12 −1.12 (stat.) +0.74 −0.60 (syst.) fb.
These results are combined with the photon analysis [16] described at the end of Section 1. The photon analysis yields a signal strength of 1.3 ± 1.0 with a significance of 1.3 , compared with an expectation of 1.0 for both inclusive and VBF production. The two analyses are orthogonal, as the measurement described in this paper vetoes events with photons. They are combined in a joint likelihood fit, sharing the Higgs boson signal strength ( VBF, →¯o r →¯) as a common parameter of interest. All experimental systematic uncertainties are correlated with the exception of the jet energy resolution and offline flavour-tagging uncertainties, as the two analyses use different jet definitions. The jet energy scale is partially correlated because both jet definitions use similar datasets to derive the uncertainties. Theoretical uncertainties are correlated except for the parton shower uncertainties, which are determined using a variation of the H HardScale parameter in the photon analysis, and a comparison between P and H in this analysis. The combined measured signal strength for VBF production is 0.99 +0. 30  ), corresponding to an observed (expected) significance of 3.0 (3.0 ). The combination significances and signal strengths are summarized in Table 6.

Conclusion
This paper presents a measurement of Standard Model VBF Higgs boson production in the¯decay mode using 126 fb −1 of 13 TeV proton-proton data collected with the ATLAS detector at the LHC. Significant improvements relative to the previous analysis using 31 fb −1 lead to a measurement of VBF Higgs boson production with a significance of 2.6 relative to the background-only hypothesis, with an observed signal strength of 0.95 +0.32 −0.32 (stat.) +0.20 −0.17 (syst.) compared to an expected value of 1.00 +0.32 −0.32 (stat.) +0.21 −0.17 (syst.). The improvements include the usage of an ANN to decorrelate from the event classifier, boosting the statistical power of the fit by allowing the same background shape to be used in all signal regions. Additionally, the usage of embedding to estimate the boson contribution directly from data results in significantly less uncertainty on the boson contribution to the fit. The event selection and classifier provide sensitivity primarily to VBF production at high Higgs boson T

Appendix
This analysis is particularly sensitive to high Higgs boson T where true Higgs boson T , T, , is greater than 200 GeV, but is unable to separately constrain the cross-sections at high and low T . In the forward channel, 54% of events have both T, and T, > 200 GeV, 34% have both T, and T, < 200 GeV and 8 (4)% have T, < (>) 200 GeV and T, > (<) 200 GeV. In the central channel, 79% of events have both T, and T, > 200 GeV, 13% have both T, and T, < 200 GeV, and 7 (2)% have T, < (>) 200 GeV and T, > (<) 200 GeV. The two-dimensional likelihood scan of high-and low-T Higgs boson production is shown in Figure 7, where the relative size of the ranges for each T bin gives a measure of the relative sensitivity to high and low Higgs boson T . Because the total VBF contribution is rather well-constrained, the production at high and low T is anti-correlated.
This analysis does not strongly constrain the ggF Higgs boson production cross-section, as the ggF kinematics are similar to the kinematics of the SM background, resulting in a similar ANN score distribution as the background. The VBF production cross-section cannot be measured precisely if the ggF cross-section is unconstrained, particularly due to large uncertainties for ggF events with at least two extra jets and high . The two-dimensional likelihood scan of VBF and ggF production signal strengths is shown in Figure 8. A large range in the ggF signal strength is observable in the figure, and the impact of the lack of ggF constraint on the VBF production signal strength is only found to be large when the ggF production signal strength is far from the SM expectation. Best fit 68% CL 95% CL Figure 8: Likelihood scan as a function of the signal strength for VBF Higgs boson production versus the signal strength for ggF Higgs boson production. Shown is the best-fit value (blue plus sign), and the 68% (solid blue) and 95% (dashed blue) confidence level (CL) contours. The asymmetric shape arises from the fit's inability to separately constrain the subset of ggF events which have a VBF-like topology.