Prospects for a measurement of the W boson mass in the all-jets final state at hadron colliders

Precise measurements of the mass of the W boson are important to test the overall consistency of the Standard Model of particle physics. The current best measurements of the W boson mass come from single production measurements at hadron colliders in its decay mode to a lepton (electron or muon) and a neutrino and pair production of W bosons at lepton colliders, where both the leptonic and hadronic decay modes of the W boson have been considered. In this study, prospects for a measurement of the W boson mass in the all-jet final state at hadron colliders are presented. The feasibility of this measurement takes advantage of numerous recent developments in the field of jet substructure. Compared to other methods for measuring the W mass, a measurement in the all-jets final state would be complementary in methodology and have systematic uncertainties orthogonal to previous measurements. We have estimated the main experimental and theoretical uncertainties affecting a measurement in the all-jet final state. With new trigger strategies, a statistical uncertainty for the measurement of the mass difference between the Z and W bosons of 30 MeV could be reached with HL-LHC data corresponding to 3000 fb$^{-1}$ of integrated luminosity. However, in order to reach that precision, the current understanding of non-perturbative contributions to the invariant mass of $W \to q\bar{q}'$ and $Z \to b\bar{b}$ jets will need to be refined. Similar strategies will also allow the reach for generic boosted resonance searches in hadronic channels to be extended.


Introduction
In the Standard Model (SM) of particle physics, electroweak interactions are mediated by the photon and the W and Z bosons [1][2][3]. While at lowest order in electroweak theory the mass of the W boson can be expressed solely as a function of the Z boson mass, the fine-structure constant and the Fermi constant, this statement is modified by higher order corrections, most prominently from other heavy particles in the SM [4,5] and potentially also from particles beyond the SM. Global fits to the SM parameters [6], constraining physics beyond the SM, are currently limited by the precision of the W boson mass measurement. Precise measurements of the mass of the W boson are therefore important to test the overall consistency of the SM. The current best measurements of the W boson mass come from single production measurements at hadron colliders [7][8][9][10][11][12] in its decay mode to a lepton (e or µ) and a neutrino, with a branching ratio of (21.34 ± 0.31)% [13], and from pair production of W bosons at lepton colliders [14][15][16][17], where both the leptonic decay mode and the decay mode of the W boson to a $q\bar{q}'$ pair, with a branching ratio of (67.41 ± 0.27)%, have been considered. The current world average (not yet considering LHC measurements) is (80.385 ± 0.015) GeV [13].
In this paper, we explore the feasibility of a new channel, namely single production at a hadron collider in the decay mode with the highest branching ratio, to a $q\bar{q}'$ pair. At hadron colliders, the W boson mass cannot be fully reconstructed in the lepton plus neutrino decay mode. Only its transverse mass can be extracted by estimating the neutrino transverse momentum from the measured missing transverse momentum in the event. The $q\bar{q}'$ decay mode allows for the reconstruction of the full 4-momentum of the W boson through exclusively visible particles. The hadronic decay mode, compared to the lepton plus neutrino decay mode, has the potential to avoid the experimental systematic uncertainties related to the measurement of the missing transverse momentum and theoretical uncertainties related to the transverse mass [18,19]. Additionally, the absence of missing transverse energy yields a narrower peak that is predominantly invariant across a broad kinematic regime of W boson transverse momentum. Single production of $W \to q\bar{q}'$ results in a rather clean final state compared to, e.g., $t\bar{t}$ production, because the quarks originating from the W boson can form jets of hadrons without color reconnection to other quarks in the event not originating from the W boson.
The dominant background to the production of $W \to q\bar{q}'$ at hadron colliders is quantum chromodynamics (QCD) multijet production. This background can be significantly suppressed by requiring jets with high transverse momenta $p_T$ in the event. We therefore propose a measurement of the mass of $W \to q\bar{q}'$ bosons produced in association with a high momentum jet, as depicted in figure 1. For such high momentum W bosons, the shower of hadrons originating from the quark anti-quark pair merges into a single large radius jet of particles. Multiple techniques have been proposed to analyze the substructure of such jets [20][21][22][23][24][25][26][27] in order to distinguish jets from $W \to q\bar{q}'$ from jets from the multijet background. For a review of recent theory and experimental progress in jet substructure, see [28,29]. Additionally, such techniques have been shown to reduce theoretical and experimental uncertainties related to the reconstruction of the W boson mass by removing contributions from non-perturbative effects and additional pp interactions happening in the same bunch crossing, so-called pileup interactions. These techniques have been extensively validated by the ATLAS and CMS experiments [30][31][32][33] and it was successfully demonstrated that they can be used to extract the $W \to q\bar{q}'$ mass peak on top of the multijet background [34][35][36].
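As a rough illustration of why the decay products merge, the angular separation of the two quarks from a boosted two-body decay is commonly approximated by ΔR ≈ 2m/p_T. The sketch below applies this rule of thumb. The approximation itself, the function names and the default jet radius of 0.8 (the distance parameter used later in this paper) are assumptions made for illustration and are not part of the analysis.

```python
M_W = 80.385  # GeV, world average quoted in the introduction


def decay_opening_angle(mass, pt):
    """Rule-of-thumb angular separation of two-body decay products: 2 m / pT."""
    return 2.0 * mass / pt


def is_merged(mass, pt, jet_radius=0.8):
    """True if both decay products are expected to fall inside one jet."""
    return decay_opening_angle(mass, pt) < jet_radius


if __name__ == "__main__":
    for pt in (100.0, 300.0, 500.0):
        print(pt, round(decay_opening_angle(M_W, pt), 3), is_merged(M_W, pt))
```

Under this approximation, W bosons at the transverse momenta considered here (300 GeV and above) are expected to be contained in a single R = 0.8 jet, while W bosons at 100 GeV are not.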
JHEP02(2019)003
Very similar strategies have been used by both ATLAS [37] and CMS [34,35] to place the leading bounds on hadronically decaying resonances beyond the SM in most of the 50-300 GeV range. The dominant systematic uncertainties in these searches result from the selection efficiency and misidentification probability calibrations of the substructure variables used to reduce the multijet backgrounds. These uncertainties are closely correlated with equivalent issues for the W and Z bosons themselves. Further, measurements of the Higgs boson decaying to b quarks at high transverse momentum [38] are sensitive to similar systematic effects. By providing several strategies in which the uncertainties can be studied and quantified in the more well-understood SM channels, improvements can also propagate to analogous measurements and beyond the SM searches, independent of the ultimate precision on the W boson mass that can be achieved. We therefore believe that the extraction of the W boson mass in the all hadronic final state represents a concrete goal that will benefit the field of jet substructure more broadly.
In this paper, we quantify the potential of a measurement of the W boson mass in this new channel at the LHC and the HL-LHC [39], which are expected to deliver proton-proton collision data corresponding to integrated luminosities of 300 fb$^{-1}$ and 3000 fb$^{-1}$, respectively. In section 2 we describe our simulated samples and simplified detector simulation. In section 3 we present the expected statistical uncertainties on the W boson mass at LHC and HL-LHC as well as trigger strategies. The leading experimental and theoretical uncertainties are discussed in section 4. Finally, we conclude on the feasibility of such a measurement in section 5.

Simulation setup
Monte Carlo (MC) samples of W + jets and Z + jets events, where the W and Z decay into quark anti-quark pairs, as well as multijet events are simulated at a proton-proton center of mass energy of 13 TeV with the leading-order (LO) mode of MadGraph5_aMC@NLO v5.2.2.2 [40,41] combined with Pythia version 8.212 [42] for parton showering with the Monash 2013 tune [43]. Additionally, the NNPDF 3.0 [44] parton distribution functions (PDF) are used. For cross checks, we use W + jets and Z + jets events produced with MadGraph5_aMC@NLO combined with Herwig++ v2.7.1 [45,46] and its default tune. Precise predictions of the W and Z boson $p_T$ spectra from ref. [47] are included in our simulation. Cross sections are computed at a center of mass energy of 13 TeV. In future runs of the LHC and HL-LHC, energies up to 14 TeV are foreseen. Conclusions drawn based on the 13 TeV simulation will also hold at 14 TeV, as the expected cross section changes from 13 to 14 TeV are only at the 10% level for signals and backgrounds.
We employ a detector simulation that reproduces the main resolution effects relevant for jet substructure reconstruction and is representative of current and future detector concepts that employ particle-flow-based reconstruction, such as the CMS [48] or ATLAS [49] detectors at the LHC.
Due to isospin considerations, jets on average consist of 60% charged hadrons, 30% photons (including $\pi^0 \to \gamma\gamma$) and 10% neutral hadrons, although these fractions are subject to large jet-by-jet fluctuations [48,49]. In the simulation, we first categorize the generated particles into charged particles (tracks), photons and neutral hadrons. Tracking inefficiencies occur at high particle momenta and within high momentum jets, where the tracking detector granularity is not sufficient to reconstruct highly collimated particles. Since both inefficiencies are correlated within high momentum jets, they are simulated together by treating charged particles with momenta above a threshold $p_{T,\mathrm{track}}^{\max} = 220$ GeV as neutral hadrons. For jet $p_T$ of 100-500 GeV, the tracking (2-5%) and HCAL resolutions (5-10%) are of similar order at CMS. In this range, a generic particle flow algorithm may promote the HCAL measurement over the tracker one. The threshold $p_{T,\mathrm{track}}^{\max}$ is chosen such that it matches the jet mass resolution of the current CMS detector [50] at high momenta, and is increased by a factor 2 for the HL-LHC Phase-II upgrade of the CMS tracker [51]. The improvement comes from a higher granularity tracking detector which will better distinguish hits from nearby high $p_T$ tracks. The generated neutral hadrons are then discretized to simulate the spatial resolution of the electromagnetic ($\sigma_\eta^{\mathrm{ECAL}} = \sigma_\phi^{\mathrm{ECAL}} = 0.0175$) and hadronic calorimeters ($\sigma_\eta^{\mathrm{HCAL}} = \sigma_\phi^{\mathrm{HCAL}} = 0.022$). Finally, all particles are smeared according to parametrized resolutions $\sigma_E^{\mathrm{particle}}$ for each particle type ($\sigma_{p_T}^{\text{charged particles}} = 0.00025\, p_T/\mathrm{GeV} \oplus 0.015$, $\sigma_E^{\text{photons}} = 0.021/\sqrt{E/\mathrm{GeV}} \oplus 0.094/(E/\mathrm{GeV}) \oplus 0.005$, $\sigma_E^{\text{neutral hadrons}} = 0.45/\sqrt{E/\mathrm{GeV}} \oplus 0.05$). The resolutions and granularities have been chosen to match the performance of the CMS detector [52].
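The per-particle smearing described above can be sketched as follows. This is a minimal illustration, not the simulation code used for the study: it assumes that the quoted resolutions are relative resolutions (σ/p_T or σ/E), that ⊕ denotes a sum in quadrature, and that the stochastic terms scale with the square root of the energy. All function names are hypothetical.

```python
import math
import random

PT_MAX_TRACK = 220.0  # GeV, threshold quoted in the text (current detector)


def quad_sum(*terms):
    """Sum in quadrature, the usual meaning of the 'oplus' symbol."""
    return math.sqrt(sum(t * t for t in terms))


def relative_resolution(kind, value):
    """Assumed relative resolution; value is pT (charged) or E (neutrals) in GeV."""
    if kind == "charged":
        return quad_sum(0.00025 * value, 0.015)
    if kind == "photon":
        return quad_sum(0.021 / math.sqrt(value), 0.094 / value, 0.005)
    if kind == "neutral_hadron":
        return quad_sum(0.45 / math.sqrt(value), 0.05)
    raise ValueError("unknown particle kind: " + kind)


def reclassify(kind, pt, pt_max_track=PT_MAX_TRACK):
    """Treat charged particles above the tracking threshold as neutral hadrons."""
    if kind == "charged" and pt > pt_max_track:
        return "neutral_hadron"
    return kind


def smear(value, kind, rng):
    """Draw a smeared pT or energy from a Gaussian around the true value."""
    return value * rng.gauss(1.0, relative_resolution(kind, value))


if __name__ == "__main__":
    rng = random.Random(42)
    kind = reclassify("charged", 300.0)
    print(kind, smear(300.0, kind, rng))
```

Doubling `pt_max_track` would correspond to the HL-LHC scenario described in the text.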
The resolution in jet mass and substructure variables for $W \to q\bar{q}'$ and single parton jets with $p_T$ in the range from 300 GeV to 3.5 TeV has been compared to CMS public results and found to be compatible [32,33,53]. The average $W \to q\bar{q}'$ and single parton jet selection efficiencies match those of CMS [32,33], which are similar to those of ATLAS [30,31]. As such, while we present results using simulation generated with the CMS detector configuration, the study is representative of the current and future performance of both experiments.

Measurement strategy
After introducing the observables used for the measurement, in this section we present two separate strategies to measure the W mass. The first approach is to measure only the W boson jet mass peak position $m_W$. The ultimate uncertainty of this approach is constrained experimentally by the jet constituent energy scale calibration. Currently, it would require a significantly better jet constituent calibration than that achieved by the LHC experiments. The second and more feasible approach is to measure the mass peak position difference between the Z boson and the W boson, $\Delta m = m_Z - m_W$, because many systematic uncertainties can cancel when using the Z boson mass as a standard candle. Finally, a measurement relying on a recently developed trigger strategy is proposed.

Observables
Jets are clustered from the detector-simulated particles using the anti-$k_T$ [54] algorithm with a distance parameter of $R = 0.8$. Before computing the invariant mass of the jet, soft radiation is removed iteratively with the modified mass-drop algorithm (mMDT) [20,23], also known as the soft drop algorithm [25] with $\beta = 0$. The mMDT procedure reduces the mass of quark and gluon jets and improves the mass resolution of W and Z boson jets. Soft drop with the angular exponent $\beta = 0$, soft cutoff threshold $z_{\mathrm{cut}} = 0.1$, and characteristic radius $R_0 = 0.8$ is applied using the FastJet software package [55]. In addition to the soft drop algorithm, we have considered a set of alternative grooming algorithms, namely recursive soft drop [56] with the angular exponent $\beta = 1$, soft cutoff threshold $z_{\mathrm{cut}} = 0.1$, characteristic radius $R_0 = 0.8$, and the number of iterations $N$ set to infinity; trimming [57] with subjet size $R_{\mathrm{sub}} = 0.2$ and $f_{\mathrm{cut}} = 0.03$; and pruning [58] with the soft threshold parameter $z_{\mathrm{cut}} = 0.1$ and angular separation threshold of $\Delta R > m_{\mathrm{jet}}/p_{T,\mathrm{jet}}$.
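The soft drop condition of ref. [25] keeps a splitting into two subjets if min(p_T1, p_T2)/(p_T1 + p_T2) > z_cut (ΔR_12/R_0)^β, and otherwise discards the softer branch and continues along the harder one. The sketch below encodes only this condition with the parameters quoted above. It is not the FastJet implementation: it operates on a hypothetical, precomputed list of splittings, and the function names are illustrative.

```python
def soft_drop_condition(pt1, pt2, delta_r, z_cut=0.1, beta=0.0, r0=0.8):
    """True if the splitting passes z > z_cut * (delta_r / r0) ** beta."""
    z = min(pt1, pt2) / (pt1 + pt2)
    return z > z_cut * (delta_r / r0) ** beta


def first_passing_splitting(splittings, **params):
    """Index of the first splitting that passes, following the harder branch.

    `splittings` is a hypothetical list of (pt_hard, pt_soft, delta_r) tuples,
    ordered from the widest declustering step to the narrowest.
    Returns None if no splitting passes, i.e. the jet is groomed away entirely.
    """
    for index, (pt_hard, pt_soft, delta_r) in enumerate(splittings):
        if soft_drop_condition(pt_hard, pt_soft, delta_r, **params):
            return index
    return None


if __name__ == "__main__":
    # A soft wide-angle emission followed by a balanced two-prong splitting.
    history = [(480.0, 20.0, 0.7), (300.0, 180.0, 0.3)]
    print(first_passing_splitting(history))            # beta = 0 (mMDT)
    print(first_passing_splitting(history, beta=1.0))  # beta = 1
```

With β = 0 the condition does not depend on the opening angle, which is the mMDT limit used for the default jet mass in this paper.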
An additional discriminator relying on substructure information is used to further suppress the multijet background. Among the most discriminating observables, the N-subjettiness observables $\tau_N$ [21] and the energy correlation function ratios $N_i^\beta$ [27] are considered. Here $\tau_i$ takes on low values if a jet has $i$ or fewer subjets. The ratio $\tau_2/\tau_1$ therefore discriminates $W \to q\bar{q}'$ jets, which contain the shower of a quark anti-quark pair, from single quark or gluon jets. Similarly, $N_i^\beta$ attempts to identify N-prong jet substructure using information about the energies and pair-wise angles of particles within a jet without requiring a subjet finding procedure. The $N_i^\beta$ are ratios of multi-point correlation functions, where $\beta$ is again an angular exponent of the pairwise particle distances. $N_2^\beta$ with $\beta = 1$ is the observable that best distinguishes quark and gluon jets from intrinsically two-prong jets from W/Z bosons. Its performance in terms of quark and gluon jet background rejection power vs. W jet selection efficiency is summarized in figure 2, showing similar performance for all considered observables.
We also study variants of $N_2^\beta$ that have been decorrelated [59] with jet $p_T$ and $m_{\mathrm{mMDT}}$ for QCD multijet events. Figure 3 shows the correlation of $N_2^{\beta=1}$ with $\rho = \log(m_{\mathrm{mMDT}}^2/p_T^2)$ for different bins of jet $p_T$ in simulation of QCD multijet production, from which the 1% quantile of the $N_2^{\beta=1}$ distribution is derived. The decorrelated $N_2^{\beta=1}$ 1% tagger selects jets with $N_2^{\beta=1}$ below this quantile, estimated from multijet simulation, rather than below a constant value across $p_T$ and $m_{\mathrm{mMDT}}$. From figure 2 it can be seen that the performance in terms of background rejection power vs. W jet selection efficiency is similar to that of the taggers without decorrelation. The performance is also similar for different choices of the quantile of multijet reduction used in the decorrelation. However, by construction the decorrelated $N_2^{\beta=1}$ 1% tagger guarantees a smoothly falling $m_{\mathrm{mMDT}}$ spectrum for the QCD multijet background for any $p_T$, simplifying the W signal extraction procedure, as demonstrated in figure 4 (left) and explained in more detail in the next section.
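A quantile-based decorrelation of this kind can be sketched as below. The sketch is a simplified illustration under stated assumptions: the threshold map is binned in ρ only (the text also bins in jet p_T), the quantile is taken as a simple order statistic, and all function names and the binning are hypothetical.

```python
import math


def rho(mass, pt):
    """rho = log(m^2 / pT^2) for a groomed jet mass and jet pT."""
    return math.log(mass * mass / (pt * pt))


def find_bin(value, edges):
    """Index of the bin [edges[i], edges[i+1]) containing value, or None."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def quantile(values, q):
    """Simple order-statistic estimate of the q-quantile."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def build_threshold_map(multijet_jets, rho_edges, q=0.01):
    """Per-rho-bin N2 threshold from simulated multijet jets (mass, pt, n2)."""
    per_bin = [[] for _ in range(len(rho_edges) - 1)]
    for mass, pt, n2 in multijet_jets:
        i = find_bin(rho(mass, pt), rho_edges)
        if i is not None:
            per_bin[i].append(n2)
    return [quantile(v, q) if v else None for v in per_bin]


def passes_decorrelated_tagger(mass, pt, n2, rho_edges, thresholds):
    """Select jets whose N2 lies below the background quantile of their rho bin."""
    i = find_bin(rho(mass, pt), rho_edges)
    return i is not None and thresholds[i] is not None and n2 < thresholds[i]
```

Because the threshold follows the background quantile in each bin, the background efficiency is approximately constant across bins, which is what keeps the tagged multijet mass spectrum smooth.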

Measurement of $m_W$
We first consider an approach to obtain the W boson mass by measuring the peak position of the W boson alone on top of the smoothly falling QCD multijet background, treating Z boson production as a background. Since top quark production can be measured in lepton+jet events, we assume that its all-hadronic contribution can be estimated precisely and subtracted, and thus do not consider it in this feasibility study. It contributes ∼5% to the sample of W bosons selected here.
The current lowest unprescaled trigger thresholds on the leading large radius jet $p_T$ collecting multijet events plateau at 100% efficiency for a reconstructed jet $p_T$ of 500 GeV for the CMS [60] and ATLAS experiments [61]. Due to the foreseen increase in luminosity, it will become challenging to preserve the online jet momentum selection at its current value until the end of LHC and HL-LHC running. In the following, we assume that we will succeed in retaining events with the leading jet passing $p_T > 500$ GeV and the decorrelated $N_2^{\beta=1}$ 1% tagger. As demonstrated in figure 4 (left), the leading jet mass $m_{\mathrm{mMDT}}$ for the multijet background after this selection can be described by a smooth functional form, enabling a signal plus background fit to extract the signal parameters. The background is parametrized by a logistic function with 3 free parameters. The signal is parametrized by a Gaussian function fitted to simulated signal samples. To estimate the expected statistical uncertainty of a W boson mass measurement, we first generate pseudo-data from the signal plus background functional forms for the expected number of events corresponding to 30 and 300/fb of integrated luminosity at the LHC and 3000/fb at the HL-LHC. With these pseudo-data distributions, we perform signal plus background fits. The result of a fit for the HL-LHC scenario is shown in figure 4 (left). The estimated statistical uncertainties for W mass measurements at the LHC and HL-LHC are summarized in table 1.
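The pseudo-data procedure can be illustrated with a toy example. The sketch below is not the fit used in the paper: the logistic form a/(1 + exp((m − b)/c)) is one possible three-parameter choice assumed here, all yields and shape parameters are invented for illustration, Poisson fluctuations are approximated by Gaussians, and only the peak position is scanned while every other parameter is held at its true value.

```python
import math
import random

BIN_WIDTH = 1.0  # GeV


def logistic_background(m, a, b, c):
    """Assumed three-parameter logistic shape; falls smoothly with m for c > 0."""
    return a / (1.0 + math.exp((m - b) / c))


def gaussian_signal(m, n_sig, mean, width):
    """Expected signal events per bin for a Gaussian peak."""
    norm = n_sig * BIN_WIDTH / (width * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((m - mean) / width) ** 2)


def expected_counts(m, mass):
    """Signal plus background with hypothetical yields and shapes."""
    return (logistic_background(m, 1.0e6, 90.0, 30.0)
            + gaussian_signal(m, 1.0e6, mass, 8.0))


def make_pseudo_data(centers, true_mass, rng):
    """Fluctuate each bin with a Gaussian approximation to Poisson statistics."""
    data = []
    for m in centers:
        mu = expected_counts(m, true_mass)
        data.append(rng.gauss(mu, math.sqrt(mu)))
    return data


def chi2(centers, data, mass):
    total = 0.0
    for m, d in zip(centers, data):
        mu = expected_counts(m, mass)
        total += (d - mu) ** 2 / mu
    return total


def scan_mass(centers, data, lo, hi, step):
    """Grid scan; returns the best mass and half the delta-chi2 <= 1 interval."""
    n_steps = int(round((hi - lo) / step))
    points = [lo + i * step for i in range(n_steps + 1)]
    values = [chi2(centers, data, p) for p in points]
    best = min(range(len(points)), key=values.__getitem__)
    inside = [p for p, v in zip(points, values) if v <= values[best] + 1.0]
    return points[best], 0.5 * (max(inside) - min(inside))


centers = [50.5 + i for i in range(80)]  # bin centers from 50 to 130 GeV
pseudo_data = make_pseudo_data(centers, 80.4, random.Random(1))
fit_mass, fit_err = scan_mass(centers, pseudo_data, 79.4, 81.4, 0.005)
print("fitted mass [GeV]:", fit_mass, "+/-", fit_err)
```

A full treatment would float the background and signal shape parameters as well, which generally increases the uncertainty relative to this one-parameter scan.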
In figure 4 (right), we demonstrate how the same approach performs without decorrelating $N_2^{\beta=1}$. While a similar statistical uncertainty can be achieved as for the decorrelated tagger, the procedure is subject to larger systematic uncertainties because it requires a background functional form that, unlike for the decorrelated tagger, is not smoothly falling. The difference of 756 MeV in peak position between the correlated and decorrelated tagger on top of the background can be taken as an indication of the size of background systematic effects without decorrelation. Further, when lowering the $p_T$ threshold below 500 GeV without decorrelating $N_2^{\beta=1}$, the maximum of the background jet mass spectrum would peak close to the W mass, making the extraction of the W boson peak even more challenging.

For each tagger, we have considered working points with 0.5%, 1%, 2%, 5% and 10% quark and gluon jet efficiency, jet mass ranges from 20 to 200 GeV and various background functional forms. We quote in table 1 the results that minimize the statistical uncertainty on the W boson mass, while maintaining a good fit of the background functional form to QCD multijet simulation. We have also studied an alternative decorrelated observable, $N_2^{\beta=2}$ 2%, which yields a slightly worse statistical uncertainty, but is found to be within 30% of the best variable. This result is representative of the variance over the taggers considered in this study.

Measurement of ∆m
The above approach to measure exclusively the W mass is highly sensitive to uncertainties related to the absolute calibration of the jet mass. We consider an alternative approach where the Z boson mass peak is used as a standard candle to constrain experimental uncertainties related to the jet mass calibration and theoretical uncertainties related to the jet mass spectrum prediction.
The data sample is split into a category enriched in Z bosons and a category enriched in W bosons by using a b-tagging algorithm to exploit the higher branching fraction of the Z boson to b quarks (15.12% for $b\bar{b}$) compared to that of the W boson (0.06% to $bc$). A b-tagger makes use of the fact that b quarks form B hadrons that have a longer lifetime than lighter hadrons and can be identified by secondary decay vertices made of tracks with large impact parameters with respect to the primary vertex, and by several observables that characterize B hadron flight directions in relation to the jet substructure. We consider an efficiency of 45% for $Z \to b\bar{b}$ jets, an efficiency of 1% for light quark jets and 3% for $Z \to c\bar{c}$ jets. These efficiency values are similar to the performance of double b-taggers employed by both the CMS [62] and ATLAS [63] experiments for large radius jets. Figure 5 shows the resulting expected W and Z signal peaks in the Z-enriched and the W-enriched categories. By measuring the mass difference, no absolute calibration of the jet mass is needed; instead, only the relative jet mass scale of light and b-flavor enriched jets needs to be calibrated.
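The category split can be sketched by applying the quoted tagging efficiencies to per-flavor jet yields. The efficiencies below are those given in the text; the input yields, the dictionary keys and the function name are hypothetical and serve only to illustrate the bookkeeping.

```python
# Tagging efficiencies quoted in the text for a double b-tagger working point.
TAG_EFFICIENCY = {"Z_bb": 0.45, "Z_cc": 0.03, "light": 0.01}


def split_by_b_tag(yields, efficiency=TAG_EFFICIENCY):
    """Split per-flavor yields into b-tagged and untagged categories."""
    tagged = {key: n * efficiency[key] for key, n in yields.items()}
    untagged = {key: n - tagged[key] for key, n in yields.items()}
    return tagged, untagged


if __name__ == "__main__":
    # Hypothetical yields, not taken from the paper.
    example = {"Z_bb": 1000.0, "Z_cc": 1000.0, "light": 1000.0}
    tagged, untagged = split_by_b_tag(example)
    print(tagged)
    print(untagged)
```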
The result of the simultaneous fit to the two categories to pseudo-data for the HL-LHC is shown in figure 5. The statistical uncertainty of the $m_Z - m_W$ measurement at the LHC and HL-LHC is given in table 2. The statistical uncertainty is larger than for the case of the $m_W$ measurement, because the sample of Z bosons decaying to b quarks is significantly smaller than that of W bosons decaying to quarks. We have considered double b-tagger selections corresponding to 10, 30, 45, 75, and 85% $Z \to b\bar{b}$ jet efficiency to divide the data into $Z \to b\bar{b}$-enriched and $W \to q\bar{q}'$-enriched categories. We report the result for the selection corresponding to 45% $Z \to b\bar{b}$ jet efficiency, which yields the best statistical precision for the $m_Z - m_W$ measurement. However, it should be noted that yet better precision may be achievable by splitting the sample further into single-b-enriched, double-b-enriched and light-quark-enriched samples. Correlating or not correlating the background shape parameters between the two categories does not change the resulting statistical uncertainty on the $m_Z - m_W$ measurement, which indicates that the measurement uncertainty is mainly driven by the W and Z peak modeling rather than by the modeling of the smoothly falling background shape.

…, $p_T > 300$ GeV, 3000/fb, 32

Table 3. Statistical uncertainty of the W mass measurement for different strategies and trigger selections. The $p_T > 300$ GeV and $p_T > 400$ GeV selections will require new trigger strategies.

Measurement with new triggers
With existing triggers, the best estimated statistical uncertainty from both approaches at the HL-LHC is a factor 2 worse than the precision reached by existing measurements of the W boson mass. Though a moderate improvement is expected from combining this measurement with the existing ones, owing to largely uncorrelated uncertainties, we propose to make use of new trigger strategies for this measurement to increase the number of $W \to q\bar{q}'$ events. These follow the data scouting approaches explored by CMS [64], ATLAS [65] and LHCb [66], where only limited event information is stored to allow data storage at a higher rate, yielding a lower high-level trigger jet $p_T$ threshold. The data rate after the high-level trigger is the limiting factor for the jet $p_T$ threshold at the moment and will likely remain so even at the HL-LHC, because hardware-based triggers using jet substructure information [67] may become feasible to maintain low hardware-based jet $p_T$ trigger thresholds. Though trigger-level jets currently have larger associated systematic uncertainties than offline jets, we assume here that the advances in the trigger system for the HL-LHC will make it possible to achieve the same systematic uncertainties as for offline jets. Table 3 presents scenarios with jet $p_T$ trigger thresholds lowered to $p_T > 300$ GeV and $p_T > 400$ GeV at the LHC and HL-LHC. The statistical uncertainties are significantly reduced with lower trigger thresholds, though the achievable statistical uncertainty with the integrated luminosity of 3000/fb and $p_T > 500$ GeV still remains lower than that of 300/fb and $p_T > 300$ GeV. The ultimate statistical uncertainty on the W boson mass measurement that can be achieved is 13 MeV for the $m_W$ approach and 32 MeV for the $\Delta m = m_Z - m_W$ approach.
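For a fixed selection, a purely statistical uncertainty is expected to scale with the inverse square root of the integrated luminosity. The sketch below applies this scaling as a back-of-the-envelope estimate. It assumes an unchanged selection and signal-to-background ratio, so it is not a substitute for the fitted values in the tables, and the function name is hypothetical.

```python
import math


def scale_stat_uncertainty(sigma_ref, lumi_ref, lumi_new):
    """Scale a statistical uncertainty as 1/sqrt(L) for an unchanged selection."""
    return sigma_ref * math.sqrt(lumi_ref / lumi_new)


if __name__ == "__main__":
    # Starting from the 32 MeV quoted for the m_Z - m_W approach at 3000/fb,
    # estimate the same selection at one tenth of the luminosity.
    print(scale_stat_uncertainty(32.0, 3000.0, 300.0))
```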

Systematic uncertainties
Rather than estimating the uncertainty based on the current knowledge of LHC detectors and theory [7], we provide an estimate of the experimental and theoretical precision in various sources of uncertainty that would be needed to achieve a systematic uncertainty of $\sigma_{m_W} = 10$ MeV in a W mass measurement. The signal to background ratio and the continuity of the background distribution when selecting events with a decorrelated substructure observable are more than sufficient to unambiguously separate the signal contribution from the background. Uncertainties in the modeling of backgrounds are therefore not discussed, as their contribution to the mass measurement is expected to be subdominant. Unless stated otherwise, a selection of $p_T > 300$ GeV and $N_2^{\beta=1} < 0.2$ is applied when studying the signal systematic uncertainties, where no decorrelation is applied to $N_2^{\beta=1}$ to ease comparison to (future) theoretical computations of the observable.

Experimental uncertainties
If the W mass is measured without the use of b-tagging and the Z mass peak, the jet mass needs to be calibrated with high precision. In table 4, we quantify the precision that is needed for the energy scale measurement of charged particles, photons (and $\pi^0$), and neutral hadrons. We quote the precision for each particle type, assuming a perfect description of the other particle types. The necessary precision for charged particles of 0.03% and photons of 0.06% is within a factor of 2 of what is currently achieved by the CMS [48] and ATLAS [49] detectors. The precision needed for neutral hadrons of 0.1% is, however, an order of magnitude better than what is currently achieved, e.g., in jet calibration [68,69]. For an overall precision of $\sigma_{m_W} = 10$ MeV, each particle type needs to be calibrated such that $\sqrt{\sigma^2_{m_W,\text{charged particles}} + \sigma^2_{m_W,\text{photons}} + \sigma^2_{m_W,\text{neutral hadrons}}} < 10$ MeV. Unless this precision can be achieved with the large HL-LHC dataset, the measurement of $m_Z - m_W$ using b-tagging and the Z mass peak would be the only feasible approach, although consequent improvements to generic boosted light resonance searches are not tied to such a specific benchmark, as discussed in the introduction.
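The combined requirement on the three particle types can be checked with a short calculation, assuming the individual contributions are uncorrelated and therefore add in quadrature. The function names and the example numbers are hypothetical; only the 10 MeV target comes from the text.

```python
import math


def total_uncertainty(*components):
    """Combine uncorrelated uncertainties (in MeV) in quadrature."""
    return math.sqrt(sum(c * c for c in components))


def within_budget(components, budget=10.0):
    """True if the quadrature sum stays below the target precision."""
    return total_uncertainty(*components) < budget


def equal_share(budget, n_sources):
    """Per-source uncertainty if the budget is shared equally among n sources."""
    return budget / math.sqrt(n_sources)


if __name__ == "__main__":
    print(total_uncertainty(6.0, 6.0, 5.0), within_budget([6.0, 6.0, 5.0]))
    print(equal_share(10.0, 3))
```

The equal-share helper shows that when three sources contribute equally, each must be controlled well below the overall 10 MeV target.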
For a measurement of $m_Z - m_W$, an additional uncertainty arises from the understanding of the difference in detector response for b-, c-, and light quark-initiated showers. The effects of hadronization on the $W \to q\bar{q}'$ and $Z \to b\bar{b}$ mass distributions are discussed in section 4.2 and will require an improvement in understanding to the 5-10% level for a precision of 10 MeV in $m_Z - m_W$. Previous measurements of the difference between b and light jet energy response using a Z + b-jet balancing method achieved a precision of 0.5% [48], thus well below the corresponding theoretical uncertainty.

Another important experimental effect comes from additional pp interactions happening in the same bunch crossing, so-called pileup interactions. Particles from pileup interactions enter the reconstructed jets of the main interaction and increase their mass. At the HL-LHC, up to 200 such interactions can happen simultaneously. Dedicated suppression techniques have been developed to remove such contamination, among which we exploit the pileup per particle identification (PUPPI) algorithm [70]. We estimate the shift in jet mass expected with 200 pileup interactions, applying the PUPPI algorithm, and quote how precisely the pileup contribution would need to be estimated to model this shift on average. To reach a W mass precision of 10 MeV, the modeling of the extra jet mass from 200 pileup interactions needs to be at the level of 1.4% (see table 4), which may seem feasible given the achieved 1%-level modeling of the fraction of energy from pileup interactions in jets with $p_T > 300$ GeV [48]. Whether this level of understanding of pileup can be maintained, however, remains to be seen. Both the ATLAS and CMS experiments plan to upgrade their detection capabilities by introducing better tracking and timing detection systems. The suppression and modeling of up to 200 pileup interactions will nevertheless remain a major experimental challenge.

Theoretical uncertainties
W and Z boson kinematics. In order to extract the W boson mass from the measured jet mass, sufficient theoretical precision in the prediction of the jet mass of a W boson is also required. Figure 6 shows the kinematic distributions of the leading jet in W and Z production. The jet $p_T$, mass and $\eta$ distributions are subject to multiple theoretical uncertainties, including parton density functions, factorization and renormalization scales, and hadronization models. To demonstrate the dependence of the W jet mass on the prediction of the jet kinematics, we compute the mass difference between $W^+$ and $W^-$ jets, which have different $p_T$ and $\eta$ distributions due to the parton composition of the colliding protons, to be 170 MeV. The uncertainty on the W mass due to parton density functions and factorization and renormalization scales used for the prediction of the $p_T$ spectra of the W and Z is evaluated by reweighting them to NNLO predictions and varying them according to the NLO PDF, NLO QCD and NLO EW uncertainties computed in ref. [47]. The resulting uncertainties quoted in table 5 show that current predictions of $p_T$ spectra for the W and Z have sufficient precision for a 10 MeV W mass measurement, considering only these uncertainty sources. When taking into account that other experimental or theoretical uncertainties of similar size can contribute, small improvements over the current precision are desirable.
Table 5. List of systematic effects (columns: quantity, effect, size of effect, and understanding needed for $\sigma_{m_W} = 10$ MeV). The understanding needed for $\sigma_{m_W} = 10$ MeV is the ratio of 10 MeV to the quoted size of effect. It should be noted that yet better precision is needed to achieve a sum in quadrature of all systematic uncertainties of $\sigma_{m_W} = 10$ MeV. Unless stated otherwise, a selection of $p_T > 300$ GeV and $N_2^{\beta=1} < 0.2$ is applied. Theoretical systematic uncertainties are estimated using particle-level simulations.
Non-perturbative effects and jet substructure. Additionally, the non-perturbative effects and the impact of jet substructure selection on the W jet mass must be theoretically understood. Figure 7 shows the dependence of the W jet mass distribution on various effects. The non-perturbative effects are studied by disabling hadronization and underlying event in Pythia and computing the jet mass from partons after showering. As a cross check, we also consider the difference between Herwig and Pythia, which are based on different models for non-perturbative effects and parton showering. Excluding non-perturbative effects, a significant difference is observed between the jet mass with and without a jet substructure selection of $N_2^{\beta=1} < 0.2$. For a W mass measurement, it would require an understanding of the observed difference at a 3%-level for a W mass precision of 10 MeV. The W jet mass distribution with and without non-perturbative effects after $N_2^{\beta=1} < 0.2$ selection shows a difference of similar order, which would require an understanding of this difference at a 9% (0.9%)-level for a W mass precision of 10 MeV with (without) the use of the Z mass peak. Other grooming choices may further reduce these uncertainties. The difference in $m_Z - m_W$ ($m_W$) between Herwig and Pythia after $N_2^{\beta=1} < 0.2$ selection is also of similar order, ranging from 50-500 MeV (200-1000 MeV) depending on the grooming algorithm used. It would thus require an understanding of their difference at the 2-20% (1-5%)-level for a W mass precision of 10 MeV with (without) the use of the Z mass peak. In all cases, a significant improvement in the understanding of these non-perturbative and showering effects needs to be reached to make this measurement feasible. One should also note that the $m_{\mathrm{mMDT}}$ mass distribution with non-perturbative effects as shown in figure 7 is no longer similar to a Gaussian distribution, but rather asymmetric, which may complicate the definition of the mass peak position. This jet mass distribution shape depends strongly on the choice of grooming algorithm and differs between Herwig and Pythia. Appendix A shows examples of such distributions. We find that the jet mass distribution becomes increasingly asymmetric when going from $p_T > 500$ GeV to $p_T > 300$ GeV. At a jet $p_T > 500$ GeV, the difference in $m_Z - m_W$ between Herwig and Pythia is strongly reduced from that of lower $p_T$ jets, to 23 MeV for mMDT. This variation ranges from 10-50 MeV for different groomers. By making use of the $p_T$ dependence of the jet mass, there is thus potential to constrain the contributions from non-perturbative effects. Thinking beyond the HL-LHC, one should note that this measurement will also benefit from higher center of mass energies at future colliders (e.g. HE-LHC [71] or FCC [72]), because non-perturbative effects are significantly reduced for higher W boson $p_T$. Figure 8 shows the dependence of the W and Z jet mass on jet $p_T$ and $N_2^{\beta=1}$ selection. The variation of the W and Z jet mass as a function of $p_T$ is minimal with $N_2^{\beta=1}$ selection applied and no non-perturbative effects considered. When non-perturbative effects are taken into account, the dependence of the W and Z jet mass on $p_T$ is enhanced.
However, since W and Z show similar trends as a function of $p_T$, the contribution from non-perturbative effects is reduced for an $m_Z - m_W$ measurement, as can also be seen in table 5. Similarly, the $N_2^{\beta=1}$ dependence shows the largest shifts when non-perturbative effects are added, but again a similar trend is observed for W and Z bosons. For a very loose selection on $N_2^{\beta=1}$, a small deviation in perturbative effects is also observed.

b-quark hadronization. When considering a W mass measurement using the Z mass peak, an additional uncertainty arises from the theoretical knowledge of the jet mass difference at particle level between the W and Z peaks. In particular, the hadronization of b quarks leads to larger uncertainties, related to the b-hadron decay, because of the possible presence of neutrinos in the final state. We quantify the impact of this modeling by computing the difference between the jet mass arising from $Z \to q\bar{q}$ and $Z \to b\bar{b}$ decays in table 5. The addition of the jet substructure selection significantly reduces this difference. The measurement thus benefits from the jet substructure selection not only through the reduction of background, but also through the reduction of systematic effects.
To cross check the size of this effect, we compare Herwig and Pythia in terms of the difference between the $Z \to q\bar{q}$ and $Z \to b\bar{b}$ jet masses for different grooming algorithms. This difference would need to be understood at the 2-20% level for a precision of 10 MeV, indicating that a significant improvement in the understanding of non-perturbative and showering effects on b-hadronization would be needed to make this measurement feasible. At the same time, in Pythia alone, a W mass precision of 10 MeV for jets with substructure selection can be achieved if the hadronization of b quarks is understood at the 7% level, which is comparable to current b-hadronization uncertainties in dedicated analyses [73]. This indicates that the heavy-flavor fragmentation tuning of Herwig and Pythia currently has room for improvement. For c quarks, only a 13%-level understanding of the difference between $W \to q\bar{q}'$ and $W \to c\bar{s}$ is needed, as quoted in table 5. The contribution of $Z \to c\bar{c}$ is suppressed by more than a factor of 10 compared to $Z \to b\bar{b}$ with a typical double-b tagger.
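The "understanding needed" percentages quoted in this section follow from dividing the 10 MeV target by the size of each effect. The short sketch below illustrates that arithmetic for the Herwig-Pythia ranges quoted in the text; the function and variable names are our own and purely illustrative.

```python
# Target precision on the W mass quoted in the text, in MeV.
TARGET_PRECISION_MEV = 10.0


def understanding_needed(effect_size_mev, target_mev=TARGET_PRECISION_MEV):
    """Fraction to which an effect must be understood so that the
    residual uncertainty equals target_mev (illustrative helper)."""
    if effect_size_mev <= 0:
        raise ValueError("effect size must be positive")
    return target_mev / effect_size_mev


def as_percent(fraction):
    """Convert a fraction to a percentage rounded to one decimal place."""
    return round(100.0 * fraction, 1)


# Herwig-Pythia differences quoted in the text (range over groomers), in MeV.
quoted_ranges_mev = {
    "m_Z - m_W (with Z peak)": (50.0, 500.0),
    "m_W (without Z peak)": (200.0, 1000.0),
}

for label, (smallest, largest) in quoted_ranges_mev.items():
    # The largest effect requires the tightest (smallest) fractional control.
    tightest = as_percent(understanding_needed(largest))
    loosest = as_percent(understanding_needed(smallest))
    print(f"{label}: {tightest}% to {loosest}%")
```

With these inputs the helper reproduces the 2-20% and 1-5% ranges stated above.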
In table 6, we summarize the most important uncertainties that would contribute to this measurement.
Prospects for theoretical uncertainties. In this section we highlight several theoretical issues related to achieving an accurate description of the $N_2$ and jet mass spectra. The resolution of these issues is well beyond the scope of the current paper, and our goal is therefore to emphasize where progress can be made and what issues must be overcome. From figure 8, we see that perturbatively there is an extremely weak dependence of the jet mass on the $N_2$ cut, and that the jet mass aligns well with the W or Z world average. The small negative offset is due to radiation that is not captured in the jet because of the finite jet radius. These effects can be calculated analytically and are incorporated in all standard jet substructure calculations. We therefore believe that perturbative effects can be kept under good theoretical control.

Table 6. Summary of uncertainties for an $m_Z - m_W$ measurement. The understanding needed for $\sigma_{m_W} = 10$ MeV is the ratio of 10 MeV to the estimated size of effect. It should be noted that yet better precision is needed to achieve a sum in quadrature of all systematic uncertainties of $\sigma_{m_W} = 10$ MeV. Unless stated otherwise, a selection of $p_T > 300$ GeV and $N_2^{\beta=1} < 0.2$ is applied. Theoretical systematic uncertainties are estimated using particle-level simulations.
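The remark that yet better precision is needed for the quadrature sum can be illustrated with a small sketch. It assumes independent (uncorrelated) systematic sources, and the number of sources used below is hypothetical rather than taken from table 6.

```python
import math


def quadrature_sum(uncertainties_mev):
    """Combine independent uncertainties (in MeV) in quadrature."""
    return math.sqrt(sum(u * u for u in uncertainties_mev))


def per_source_budget(total_mev, n_sources):
    """Equal share each of n independent sources may contribute so that
    their quadrature sum equals total_mev (illustrative helper)."""
    if n_sources < 1:
        raise ValueError("need at least one source")
    return total_mev / math.sqrt(n_sources)


# Hypothetical example: four independent sources, each controlled to 10 MeV.
sources_mev = [10.0, 10.0, 10.0, 10.0]
print("combined:", quadrature_sum(sources_mev), "MeV")
print("budget per source for a 10 MeV total:",
      per_source_budget(10.0, len(sources_mev)), "MeV")
```

Under these assumptions, controlling each source to 10 MeV individually is not sufficient: the per-source budget shrinks as the square root of the number of sources.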

More concerning are non-perturbative effects, which, as can be seen in figure 8, dominate the offset of the jet mass from the W and Z world average and, furthermore, exhibit a dependence on the $N_2$ cut. In fact, there are two distinct non-perturbative effects that would need to be understood in order to completely understand this measurement. The first is non-perturbative corrections to the $N_2$ distribution on which the cut is applied. Non-perturbative effects for the groomed $D_2$ observable [24,26] (which is closely related to $N_2$) were recently studied in [74][75][76], where it was shown that they take a relatively simple form and can be modeled by a single-parameter shape function. The second is non-perturbative corrections to the jet mass distribution itself. Non-perturbative corrections to the groomed top quark mass distribution were studied in [77], where they were also found to take a simple form. However, non-perturbative effects for the mass distribution for the decay of a color singlet have not, to our knowledge, been studied in the literature. We believe that this deserves further attention. Ideally, these corrections could also be described by a universal shape function that could then be self-consistently extracted with the mass measurement itself.
As a cause for cautious optimism, we would like to point out that for the $m_Z - m_W$ measurement strategy, it is not the non-perturbative corrections themselves that will need to be understood at the 10% level, but rather the difference in how they act on Z and W bosons given particular selection criteria. Thus, the O(100 MeV) effects quoted should not be interpreted as requiring control over absolute hadronization and underlying event corrections at the single hadron level, which is unrealistic. Ideally, one could therefore prove a statement on the universality of the non-perturbative corrections for hadronic W and Z decays, which would place this measurement on a firmer theoretical footing. This universality would be violated by, for example, b-quark mass effects, but this should be a much smaller effect than the overall shift due to hadronization, and could perhaps be accounted for. Therefore, while this measurement seems challenging from a theoretical perspective, it points to a number of theoretical issues that deserve further thought and whose resolution would have wider applicability in a number of jet substructure measurements.
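The cancellation argument above can be written schematically as follows. The additive split into perturbative and non-perturbative shifts, and the symbols $\Delta^{\mathrm{pert}}_V$ and $\Delta^{\mathrm{NP}}_V$, are our own illustrative notation and an assumption, not a result of the analysis.

```latex
\begin{align}
m^{\mathrm{jet}}_V &= m_V + \Delta^{\mathrm{pert}}_V + \Delta^{\mathrm{NP}}_V,
  \qquad V \in \{W, Z\}, \\
m^{\mathrm{jet}}_Z - m^{\mathrm{jet}}_W &= (m_Z - m_W)
  + \left(\Delta^{\mathrm{pert}}_Z - \Delta^{\mathrm{pert}}_W\right)
  + \left(\Delta^{\mathrm{NP}}_Z - \Delta^{\mathrm{NP}}_W\right).
\end{align}
```

In this form only the difference $\Delta^{\mathrm{NP}}_Z - \Delta^{\mathrm{NP}}_W$ enters the measured mass difference. A universality statement for hadronic W and Z decays would constrain exactly this term, with effects such as the b-quark mass appearing as a residual.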

Conclusions and outlook
A feasibility study for a first measurement of the W boson mass in the all-jets final state at the LHC and HL-LHC has been presented. Compared to the lepton plus neutrino final state, a measurement in the all-jets final state could avoid experimental systematic uncertainties related to the measurement of the missing transverse momentum and the theoretical uncertainties related to the transverse mass. While a measurement of the W mass itself seems unrealistic, since it would require a significantly better understanding of the jet energy calibration than has been reached by the current LHC experiments, a measurement of the mass difference between the W and Z bosons is more feasible. New trigger strategies will need to be exploited to reach a statistical uncertainty of 30 MeV with HL-LHC data corresponding to 3000 fb$^{-1}$ of integrated luminosity. The measurement is, however, limited by the understanding of non-perturbative contributions to the invariant masses of $W \to q\bar{q}'$ and $Z \to b\bar{b}$ jets, which would need to improve significantly to reach a precision below 100 MeV.
More generally, we believe that progress towards the extraction of the W mass from the all-hadronic final state using jet substructure represents a concrete goal that can drive progress in jet substructure, much like the extraction of $\alpha_s$ from the jet mass distribution at hadron colliders [78]. We have highlighted areas for improvement on both the theoretical and experimental sides. An improved understanding of these areas will have a much broader impact on jet substructure, most importantly for improving searches for light hadronically decaying resonances, which utilize many of the same techniques, and almost certainly for other applications unforeseen at the current time.

A W and Z jet mass distributions in Pythia and Herwig

In figure 9 and figure 10 we provide the predictions from Pythia and Herwig for different grooming algorithms, transverse momentum thresholds and substructure observable selections. They demonstrate how the dependence of the shape of the jet mass observable on non-perturbative effects, parton shower and hadronization can be influenced by the choice of grooming algorithm, transverse momentum threshold and substructure observable selection. It can be seen that the peak positions, the symmetry of the distributions and the differences between Pythia and Herwig depend clearly on the choice of grooming algorithm.

Figure 9. Jet mass distribution of W, Z and $Z \to b\bar{b}$ jets in Pythia and Herwig with $p_T$ larger than 300 GeV and $N_2^{\beta=1} < 0.2$ for four different grooming algorithms. (top left) mMDT with the angular exponent $\beta = 0$, soft cutoff threshold $z_{\mathrm{cut}} = 0.1$, and characteristic radius $R_0 = 0.8$. (top right) Recursive soft drop [56] with the angular exponent $\beta = 1$, soft cutoff threshold $z_{\mathrm{cut}} = 0.1$, characteristic radius $R_0 = 0.8$, and the number of iterations $N$ set to infinity. (bottom left) Trimming [57] with subjet size $R_{\mathrm{sub}} = 0.2$ and $f_{\mathrm{cut}} = 0.03$. (bottom right) Pruning [58] with the soft threshold parameter $z_{\mathrm{cut}} = 0.1$ and angular separation threshold $\Delta R > m_{\mathrm{jet}}/p_{T,\mathrm{jet}}$.
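To illustrate why an asymmetric jet mass shape complicates the definition of the peak position, the toy sketch below compares three common estimators on a skewed distribution. The shape (a Gaussian core with an exponential low-mass tail), its parameters and all names are hypothetical choices of ours; it is not a model of the simulated distributions in figure 9.

```python
import math
import random
import statistics
from collections import Counter

random.seed(12345)  # fixed seed so the toy is reproducible

# Hypothetical toy parameters (GeV): core position near the W mass,
# Gaussian resolution, and mean length of a low-mass exponential tail.
CORE_GEV, RESOLUTION_GEV, TAIL_GEV = 80.4, 3.0, 4.0
N_JETS = 20000

masses = [
    CORE_GEV + random.gauss(0.0, RESOLUTION_GEV)
    - random.expovariate(1.0 / TAIL_GEV)
    for _ in range(N_JETS)
]


def histogram_mode(values, bin_width):
    """Center of the most populated bin: a crude peak-position estimate."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    fullest_bin = max(counts, key=counts.get)
    return (fullest_bin + 0.5) * bin_width


mean_mass = statistics.fmean(masses)
median_mass = statistics.median(masses)
mode_mass = histogram_mode(masses, bin_width=1.0)

print(f"mean   = {mean_mass:.2f} GeV")
print(f"median = {median_mass:.2f} GeV")
print(f"mode   = {mode_mass:.2f} GeV (1 GeV bins)")
```

For a symmetric Gaussian the three estimators would agree within statistical fluctuations. In this skewed toy they differ by far more than the 10 MeV target, so the choice of peak definition would itself need to be fixed and modeled consistently.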
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.