Studies of jet mass in dijet and W/Z + jet events

Invariant mass spectra for jets reconstructed using the anti-kt and Cambridge-Aachen algorithms are studied for different jet "grooming" techniques in data corresponding to an integrated luminosity of 5 inverse femtobarns, recorded with the CMS detector in proton-proton collisions at the LHC at a center-of-mass energy of 7 TeV. Leading-order QCD predictions for inclusive dijet and W/Z+jet production combined with parton-shower Monte Carlo models are found to agree overall with the data, and the agreement improves with the implementation of jet grooming methods used to distinguish merged jets of large transverse momentum from softer QCD gluon radiation.


Introduction
The variables most often used in analyses of jet production are jet directions and transverse momenta (p T ). However, as jets are composite objects, their invariant masses (m J ) provide additional information that can be used to characterize their properties. One motivation for investigating jet mass is that, at the Large Hadron Collider (LHC), massive standard model (SM) particles such as W and Z bosons and top quarks are often produced with large Lorentz boosts, and, when such particles decay into quarks, the masses of the evolved jets can be used to discriminate them from lighter objects generated in quantum-chromodynamic (QCD) radiative processes. The same argument also holds for any new massive particles produced at the LHC. For sufficiently large boosts, all the decay products tend to be emitted as collimated groupings into small sections of the detector, and the resulting particles can be clustered into a single jet. Jet "grooming" techniques are designed to separate such merged jets from background. These new techniques have been found to be very promising for identifying decays of highly-boosted W bosons and top quarks, and in searches for Higgs bosons and other massive particles [1]. The main advantage of these grooming techniques is their ability to distinguish high p T jets that arise from decays of massive, possibly new, particles. In addition, their robust performance is valuable in the presence of additional interactions in an event (pileup), which is likely to provide an even greater challenge to such analyses in future higher-luminosity runs at the LHC.
Only a few of these promising approaches have been studied in data at the Tevatron [2] or at the LHC [3]. To understand these techniques in the context of searches for new phenomena, the jet mass must be well-modeled through leading-order (LO) or next-to-leading-order (NLO) Monte Carlo (MC) simulations. Much recent theoretical work in QCD has focused on the computation of jet mass, including predictions using advances in an effective field theory of jets (soft collinear effective theory, SCET) [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. Studies of the kind reported in the present analysis can provide an understanding of the extent to which MC simulations that match matrix-element partons with parton showers can model the observed internal jet structure. Results of these studies can also be used to compare data with theoretical computations of jet mass, and to provide benchmarks for the use of these algorithms in searches for highly-boosted Higgs bosons, or new objects beyond the SM, especially by investigating some of the background processes expected in such analyses.
We present a measurement of jet mass in a sample of dijet events, and the first study of such distributions in V+jet events, where V refers to a W or Z boson. The data correspond to an integrated luminosity of 5.0 ± 0.2 fb −1 , collected by the Compact Muon Solenoid (CMS) experiment at the LHC in pp interactions at a center-of-mass energy of 7 TeV. The analysis of these two types of final states provides complementary information because of their different parton-flavor content, since the selected dijet events are dominated by gluon-initiated jets, and the V+jet events often contain quark-initiated jets. We focus on measuring the jet mass after applying several jet grooming techniques involving "filtering" [24], "trimming" [25], and "pruning" [26,27] of jets, as discussed in detail below. This work also presents the first attempt to measure the mass of trimmed and pruned jets.
To study the dependence of the differential distributions in m J on jet p T , we measure the distributions in intervals of jet transverse momentum. Formally, this can be expressed in terms of a double-differential cross section for jet production (d 2 σ/dp T dm J ) that is examined as a function of m J for several nonoverlapping intervals in p T :

$$\frac{d\sigma_i}{dm_J} = \int_{p_{T,i}^{\min}}^{p_{T,i}^{\max}} \frac{d^2\sigma}{dp_T\,dm_J}\,dp_T, \qquad (1)$$

where i = 1, 2, 3, . . . refers to the i th interval in p T , and the sum of contributions over all i is equal to the total observed cross section ∑ i σ i = σ. The differential probability density as a function of m J for each p T interval can therefore be written as

$$\frac{1}{\sigma_i}\,\frac{d\sigma_i}{dm_J}. \qquad (2)$$

The distributions in reconstructed jet mass of Eq. (2) include corrections used to unfold jets to the "particle" level; the p T intervals are defined for ungroomed jets, following energy corrections for the response of the detector.
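The per-interval normalization can be illustrated with a short sketch. It assumes that each p T interval is handled independently and that the density in a mass bin is the count divided by the total count in that p T interval and by the mass-bin width. The function name, bin edges, and toy values are hypothetical and are not taken from the analysis.

```python
from bisect import bisect_right

def mass_density_per_pt_bin(pt_values, mass_values, pt_edges, mass_edges):
    """Return one normalized m_J histogram (a density) per p_T interval."""
    n_pt = len(pt_edges) - 1
    n_m = len(mass_edges) - 1
    counts = [[0] * n_m for _ in range(n_pt)]
    for pt, m in zip(pt_values, mass_values):
        # Half-open bins [low, high); entries outside the edges are skipped.
        i = bisect_right(pt_edges, pt) - 1
        k = bisect_right(mass_edges, m) - 1
        if 0 <= i < n_pt and 0 <= k < n_m:
            counts[i][k] += 1
    densities = []
    for row in counts:
        total = sum(row)
        # Divide by (entries in this p_T interval) x (mass-bin width),
        # so that each interval integrates to unity.
        densities.append([
            c / (total * (mass_edges[k + 1] - mass_edges[k])) if total else 0.0
            for k, c in enumerate(row)
        ])
    return densities

# Toy input (hypothetical values, in GeV).
pt_edges = [220, 300, 450]
mass_edges = [0, 25, 50, 75, 100]
dens = mass_density_per_pt_bin(
    [230, 250, 320, 400, 410], [10, 30, 20, 60, 70], pt_edges, mass_edges
)
```

Because each interval is normalized separately, the shapes can be compared across p T intervals even when the event counts differ greatly.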
For the dijet analysis, p T and m J correspond to the average transverse momentum and average jet mass of the two leading jets (i.e., of highest p T ): p AVG T = (p T 1 + p T 2 )/2 and m AVG J = (m J 1 + m J 2 )/2. For the V+jet analysis, we use the m J and p T of the leading jet. Both quantities depend on the nature of the jet grooming algorithm, as discussed in Section 2.
This paper is organized as follows. To introduce the subject, we first discuss jet clustering algorithms in Section 2, focusing mainly on grooming techniques. After a brief description of the CMS detector and the MC samples in Section 3, we provide information pertaining to the collected data and a description of event reconstruction in Section 4. Selection of events is then described in Section 5, and the effect of pileup on jet mass is investigated in Section 6. This is followed in Section 7 by the correction and unfolding procedures that are applied to the m J spectra and their corresponding systematic uncertainties. In Sections 8 and 9, we present the results of the dijet and V+jet analyses, respectively. Finally, observations and remarks on the presented results are summarized in Section 10.
The distributions shown are also stored in HEPData format [28].

Jet clustering algorithms and grooming techniques

Sequential jet clustering algorithms
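Jets are defined through sequential, iterative jet clustering algorithms that combine four-vectors of input pairs of particles until certain criteria are satisfied and jets are formed. For the jet algorithms considered in this paper, for each pair of particles i and j, a "distance" metric between the two particles (d ij ), and the so-called "beam distance" for each particle (d iB ), are computed:

$$d_{ij} = \min\left(p_{Ti}^{2n},\, p_{Tj}^{2n}\right)\frac{\Delta R_{ij}^{2}}{R^{2}}, \qquad d_{iB} = p_{Ti}^{2n}, \qquad (3)$$

where p T i and p T j are the transverse momenta of particles i and j, respectively, "min" refers to the lesser of the two p T values, the integer n depends on the specific jet algorithm, $\Delta R_{ij} = \sqrt{(y_i - y_j)^2 + (\phi_i - \phi_j)^2}$ is the distance between particles i and j in rapidity (y) and azimuth (φ), and R is the distance parameter of the algorithm.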
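A minimal sketch of such a sequential algorithm is given here for illustration. It assumes the usual generalized-k T form, d ij = min(p Ti^2n, p Tj^2n) ∆R ij^2 / R^2 and d iB = p Ti^2n, with n = 1, 0, and −1 for the k T , Cambridge-Aachen, and anti-k T algorithms, and four-vector addition for recombination. It is a naive implementation for small inputs, not the FASTJET code used in the analysis. All function names are hypothetical.

```python
import math

def pt(p):
    return math.hypot(p[0], p[1])

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def phi(p):
    return math.atan2(p[1], p[0])

def delta_r2(a, b):
    dphi = abs(phi(a) - phi(b))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    dy = rapidity(a) - rapidity(b)
    return dy * dy + dphi * dphi

def from_pt_y_phi(pt_, y, phi_):
    """Massless four-vector (px, py, pz, E) from p_T, rapidity, azimuth."""
    return (pt_ * math.cos(phi_), pt_ * math.sin(phi_),
            pt_ * math.sinh(y), pt_ * math.cosh(y))

def cluster(particles, R, n):
    """Naive sequential clustering; returns jets ordered by decreasing p_T.

    Assumes every input has p_T > 0 and E > |pz|.
    """
    objs = [tuple(p) for p in particles]
    jets = []
    while objs:
        best_d, best_i, best_j = None, None, None
        for i, a in enumerate(objs):
            d_ib = pt(a) ** (2 * n)          # beam distance
            if best_d is None or d_ib < best_d:
                best_d, best_i, best_j = d_ib, i, None
            for j in range(i + 1, len(objs)):
                b = objs[j]
                d_ij = (min(pt(a) ** (2 * n), pt(b) ** (2 * n))
                        * delta_r2(a, b) / R ** 2)
                if d_ij < best_d:
                    best_d, best_i, best_j = d_ij, i, j
        if best_j is None:
            # Smallest distance is a beam distance: promote object to a jet.
            jets.append(objs.pop(best_i))
        else:
            # Smallest distance is pairwise: merge the pair (j > i, pop j first).
            b = objs.pop(best_j)
            a = objs.pop(best_i)
            objs.append(tuple(x + y for x, y in zip(a, b)))
    return sorted(jets, key=pt, reverse=True)

# Toy event: two nearby particles and one in the opposite hemisphere.
toy = [from_pt_y_phi(100.0, 0.0, 0.0),
       from_pt_y_phi(50.0, 0.0, 0.1),
       from_pt_y_phi(80.0, 0.0, 3.0)]
ak7_jets = cluster(toy, R=0.7, n=-1)
```

In this toy event the two nearby particles are merged into one jet and the third particle forms a jet on its own.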

Filtering algorithm
The "mass-drop/filtering" procedure aims to identify symmetric splitting of jets of large p T that have large m J values. It was proposed initially for use in searches for the Higgs boson [24], but we consider just the filtering aspects of this algorithm for grooming jets.
For each jet obtained in the initial clustering procedure, the filtering algorithm defines a new, groomed jet through the following algorithm: (i) the constituents of each jet are reclustered using the CA algorithm with R = 0.3, thereby defining n new subjets s 1 , . . . , s n , ordered in descending p T , and (ii) the four-momentum of the new jet is defined by the four-vector sum over the three subjets of hardest p T , or in the rare case that n < 3, just these remaining subjets define the new jet.
The new jet has fewer particles than the initial jet, thereby reducing the contribution from effects such as underlying event and pileup, and the new m J and p T values are therefore smaller than those of the initial jet. As will be demonstrated in Section 2.5, with this choice of parameters, filtering removes the fewest jet constituents, and is therefore the least aggressive of the investigated jet grooming techniques.
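Step (ii) of the filtering procedure can be sketched as follows, assuming that the subjets from the R = 0.3 reclustering are already available as (px, py, pz, E) tuples. The reclustering itself is not shown, and the function names and toy four-vectors are hypothetical.

```python
import math

def pt(p):
    return math.hypot(p[0], p[1])

def mass(p):
    m2 = p[3] ** 2 - p[0] ** 2 - p[1] ** 2 - p[2] ** 2
    return math.sqrt(max(m2, 0.0))

def filtered_jet(subjets, n_keep=3):
    """Four-vector sum of the n_keep hardest subjets (all of them if fewer)."""
    hardest = sorted(subjets, key=pt, reverse=True)[:n_keep]
    return tuple(sum(component) for component in zip(*hardest))

# Toy subjets (hypothetical, massless, in GeV).
subjets = [(100.0, 0.0, 0.0, 100.0), (0.0, 50.0, 0.0, 50.0),
           (30.0, 0.0, 40.0, 50.0), (3.0, 0.0, 4.0, 5.0)]
ungroomed = tuple(sum(c) for c in zip(*subjets))
groomed = filtered_jet(subjets)
```

Here the softest of the four toy subjets is dropped, so both the mass and the p T of the groomed jet are smaller than those of the ungroomed sum.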

Trimming algorithm
Trimming ignores particles within a jet that fall below a dynamic threshold in p T [25]. It reclusters the jet's constituents using the k T algorithm with a radius R sub , accepting only the subjets that have p T sub > f cut λ hard , where f cut is a dimensionless cutoff parameter, and λ hard is some hard QCD scale chosen to equal the p T of the original jet. The R sub and f cut parameters of the algorithm are taken to be 0.2 and 0.03, respectively. As will be demonstrated, with this choice of parameters, trimming removes more jet constituents than the filtering procedure, but fewer jet constituents than pruning, and corresponds therefore to a moderately aggressive jet grooming technique.
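The trimming requirement p T sub > f cut λ hard can be sketched in the same spirit, with λ hard set to the p T of the original jet and f cut = 0.03 as quoted in the text. The subjets are assumed to come from a k T reclustering with R sub = 0.2 that is not shown. Function names and toy values are hypothetical.

```python
import math

def pt(p):
    return math.hypot(p[0], p[1])

def trimmed_jet(subjets, jet_pt, f_cut=0.03):
    """Sum the subjets whose p_T exceeds f_cut times the original jet p_T."""
    kept = [s for s in subjets if pt(s) > f_cut * jet_pt]
    if not kept:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(sum(component) for component in zip(*kept))

# Toy subjets (hypothetical, massless, in GeV).
subjets = [(100.0, 0.0, 0.0, 100.0), (0.0, 50.0, 0.0, 50.0),
           (30.0, 0.0, 40.0, 50.0), (3.0, 0.0, 4.0, 5.0)]
original = tuple(sum(c) for c in zip(*subjets))
trimmed = trimmed_jet(subjets, jet_pt=pt(original))
```

With these toy values the threshold is about 4.3 GeV, so only the 3 GeV subjet is discarded.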

Pruning algorithm
Following the clustering of jets using the original algorithm (either AK7, CA8, or CA12), the pruning algorithm [26,27] reclusters the constituents of the jet through the CA algorithm, using the same distance parameter, but additional conditions beyond those given in Eq. (3). In particular, the softer of the two particles i and j to be merged is removed when the following conditions are met:

$$z_{ij} \equiv \frac{\min(p_{Ti},\, p_{Tj})}{p_{T,i+j}} < z_{\text{cut}}, \qquad \Delta R_{ij} > D_{\text{cut}} \equiv \alpha\,\frac{m_J}{p_T},$$

where m J and p T are the mass and transverse momentum of the originally-clustered jet, and z cut and α are parameters of the algorithm, chosen to be 0.1 and 0.5, respectively. In our particular choice of parameters, we have chosen to divide the jet into two "exclusive" subjets (similarly to the exclusive k T algorithm [29], where one clusters constituents until the jets are all separated by the parameter R in Eq. 3). As will be demonstrated, with this choice of parameters, pruning removes the largest number of jet constituents, and can therefore be regarded as the most aggressive jet grooming technique investigated. It was previously used in the CMS search for tt̄ resonances [34].
Figure 1 shows a comparison of distributions in the dijet sample for the ratio of groomed AK7 jet mass to the mass of the matched ungroomed AK7 jet, for our three grooming techniques, for data and for PYTHIA6 MC simulation [35], using the Z2 tune. Three distributions are shown for each grooming technique: (i) the reconstructed data ("data RECO"), (ii) the reconstructed simulated PYTHIA6 data ("PYTHIA RECO"), and (iii) the generated particle-level jets from PYTHIA6 ("PYTHIA GEN"). These three grooming techniques involve different jet algorithms for grooming (CA for filtering and pruning, k T for trimming) once the jets are found with AK7. The data and the simulation exhibit similar behavior. In general, the filtering algorithm is the least aggressive grooming technique, with groomed jet masses close to the ungroomed values. The trimming algorithm is moderately aggressive, and the pruning algorithm is the most aggressive of the three. With pruning, a bimodal distribution begins to appear, which is typical of our implementation of this algorithm as we require clustering into two exclusive subjets.
In cases where the pruned jet mass is small, jets usually have most of their energy configured in "core" components, with little gluon radiation, which leads to narrow jets. When the pruned jet mass is large, the jets are split more symmetrically, which can be realized in events with gluons splitting into two nodes that fall within ∆R = 0.7 of the original parton.
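The pruning veto applied at each recombination step can be sketched as a single test. The sketch assumes the standard pruning conditions, z ij = min(p Ti , p Tj )/p T,i+j < z cut and ∆R ij > D cut = α m J /p T , with z cut = 0.1 and α = 0.5 as quoted in the text. The function names and toy inputs are hypothetical, and the surrounding CA reclustering loop is omitted.

```python
import math

def pt(p):
    return math.hypot(p[0], p[1])

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def delta_r(a, b):
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(rapidity(a) - rapidity(b), dphi)

def from_pt_y_phi(pt_, y, phi_):
    """Massless four-vector (px, py, pz, E)."""
    return (pt_ * math.cos(phi_), pt_ * math.sin(phi_),
            pt_ * math.sinh(y), pt_ * math.cosh(y))

def should_prune(a, b, jet_mass, jet_pt, z_cut=0.1, alpha=0.5):
    """True if the softer of a and b should be dropped instead of merged."""
    merged = tuple(x + y for x, y in zip(a, b))
    z = min(pt(a), pt(b)) / pt(merged)      # momentum sharing of the pair
    d_cut = alpha * jet_mass / jet_pt       # angular threshold
    return z < z_cut and delta_r(a, b) > d_cut

hard = from_pt_y_phi(100.0, 0.0, 0.0)
soft_wide = from_pt_y_phi(5.0, 0.0, 0.6)    # soft and far from the core
soft_near = from_pt_y_phi(5.0, 0.0, 0.1)    # soft but collinear
balanced = from_pt_y_phi(50.0, 0.0, 0.6)    # wide but not soft
```

Only a branch that is both soft and at wide angle is removed, which is why symmetric two-prong splittings survive pruning.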

The CMS detector and simulation
The CMS detector [36] is a general-purpose device with many features suited for reconstruction of energetic jets, specifically, the finely segmented electromagnetic and hadronic calorimeters and charged-particle tracking detectors.
CMS uses a right-handed coordinate system, with origin defined by the center of the CMS detector, the x axis pointing to the center of the LHC ring, the y axis pointing up, perpendicular to the plane of the LHC ring, and the z axis along the direction of the counterclockwise beam. The polar angle θ is measured relative to the positive z axis and the azimuthal angle φ relative to the x axis in the x-y plane.
Charged particles are reconstructed in the inner silicon tracker, which is immersed in a 3.8 T axial magnetic field. The CMS tracking detector consists of an inner silicon pixel detector composed of three concentric central layers and two sets of disks arranged forward and backward of the center, and up to ten silicon strip central layers and three inner and nine outer strip disks forward and backward of the center. This arrangement provides full azimuthal coverage for |η| < 2.5, where η = − ln[tan(θ/2)] is the pseudorapidity. The pseudorapidity approximates the rapidity y and equals y for massless particles. Since many of the reconstructed jets are not massless, we use the rapidity y for characterizing jets in this analysis.
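The difference between pseudorapidity and rapidity for a massive object can be checked numerically. The sketch below uses η = −ln[tan(θ/2)] and the standard definition y = ½ ln[(E + p z )/(E − p z )]. The function names and the toy momentum are hypothetical.

```python
import math

def pseudorapidity(px, py, pz):
    theta = math.atan2(math.hypot(px, py), pz)   # polar angle from +z axis
    return -math.log(math.tan(theta / 2.0))

def rapidity(energy, pz):
    return 0.5 * math.log((energy + pz) / (energy - pz))

px, py, pz = 3.0, 4.0, 12.0                      # |p| = 13
eta = pseudorapidity(px, py, pz)
y_massless = rapidity(13.0, pz)                  # E = |p|
y_massive = rapidity(math.sqrt(13.0 ** 2 + 5.0 ** 2), pz)   # mass of 5
```

For the massless case the two quantities coincide, while for the massive case |y| is smaller than |η|.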
A lead tungstate crystal electromagnetic calorimeter (ECAL) and a brass/scintillator hadronic calorimeter (HCAL) surround the tracking volume and provide photon, electron, and jet reconstruction up to |η| = 3. The ECAL and HCAL cells are grouped into towers projecting radially outward from the center of the detector. In the central region (|η| < 1.74), the towers have dimensions of ∆η = ∆φ = 0.087 that increase at larger |η|. ECAL and HCAL cell energies above some chosen noise-suppression thresholds are combined within each tower to define the tower energy. Muons are measured in gas-ionization detectors embedded in the steel return yoke outside the solenoid. To improve reconstruction of jets, the tracking and calorimeter information is combined in a "particle-flow" (PF) algorithm [37], which is described in Section 4.4.
For the analysis of dijet events, samples are simulated with PYTHIA6.4 (Tune Z2) [35,38], PYTHIA8 (Tune 4c) [39], and HERWIG++ (Tune 23) [40], and propagated through the simulation of the CMS detector based on GEANT4 [41]. Underlying event (UE) and pileup (PU) are included in the simulations, which are also reweighted to have the simulated PU distribution match the observed PU distribution in the data.
For the V+jet analysis, events with a vector boson produced in association with jets are simulated using MADGRAPH 5.1 [42]. This matrix element generator is also used to simulate tt̄ events. The MADGRAPH events are subsequently subjected to parton showering, simulated with PYTHIA6 using the Z2 Tune [38]. To compare hadronization in different generators, we generate V+jet samples in which parton showering and hadronization are simulated with HERWIG++. Diboson (WW, WZ, and ZZ) events are also generated with PYTHIA6. Single-top-quark samples are produced with POWHEG [43], and the lepton enriched dijet samples are produced with PYTHIA6 using the Z2 Tune. CTEQ6L1 [44] is the default set of parton distribution functions used in all these samples, except for the single-top-quark MC, which uses CTEQ6M.

Dijet trigger selection
Events are collected using single-jet triggers, which are based on jets reconstructed only from calorimetric information. This procedure yields inferior resolution to jets reconstructed offline with PF constituents, but provides faster reconstruction that meets trigger requirements. As the instantaneous luminosity is time-dependent, the specific jet-p T thresholds change with time. The triggers used to select dijet events have partial overlap. Those with lower-p T thresholds have high prescale settings to accommodate the higher data-acquisition rates, and some events selected with these lower-p T triggers are also collected at higher thresholds.
To avoid double counting of phase space, each event is assigned to a specific trigger. To do this, we compute the trigger efficiency as a function of reconstructed p AVG T , select an interval in trigger efficiency where the efficiency is maximum (>95%) for that range of p AVG T , and assign that trigger to the appropriate p AVG T interval. The assignment is based on the jet p T values reconstructed offline (but not groomed). Table 1 shows the p T thresholds for each of the jet triggers used in the analysis, and the corresponding intervals of p T to which the triggered events are assigned.

V+jet trigger selection
Several triggers are also used to collect events corresponding to the topology of V+jet events, where the V decays via electrons or muons in the final state. For the W+jet channels, the triggers consist of several single-lepton triggers, with lepton identification criteria applied online. To assure an acceptable event rate, leptons are required to be isolated from other tracks and energy depositions in the calorimeters. For the W(µν µ ) channel, the trigger thresholds for the muon p T are in the range of 17 to 40 GeV. The higher thresholds are used at higher instantaneous luminosity. The combined trigger efficiency for signal events that pass offline requirements (described in Section 5) is ≈92%.
For the W(eν e ) events, the electron p T threshold ranges from 25 to 65 GeV. To enhance the fraction of W+jet events in the data, the single-electron triggers are also required to have minimum thresholds on the magnitude of the imbalance in transverse energy (E miss T ) and on the transverse mass (m T ) of the (electron + E miss T ) system, where $m_T = \sqrt{2\,p_T^{e}\,E_T^{\text{miss}}\,(1 - \cos\phi)}$, and φ is the angle between the directions of p e T and E miss T . The combined efficiency for electron W+jet events that pass the offline criteria is ≈99%. The Z(µµ) channel uses the same single-muon triggers as the W(µν µ ) channel. The Z(ee) channel uses dielectron triggers with lower thresholds for p T (17 and 8 GeV), and additional isolation requirements. These triggers are 99% efficient for all Z+jet events that pass the final offline selection criteria.
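As a numerical illustration, the transverse mass can be computed from the standard definition m T = √[2 p T E miss T (1 − cos φ)], where φ is the azimuthal angle between the lepton and the missing transverse energy. The function name and input values are hypothetical.

```python
import math

def transverse_mass(lepton_pt, met, dphi):
    """m_T of the lepton + missing-E_T system; dphi in radians."""
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))

mt_back_to_back = transverse_mass(40.0, 40.0, math.pi)   # maximal m_T
mt_collinear = transverse_mass(40.0, 40.0, 0.0)          # vanishing m_T
```

A lepton and E miss T of 40 GeV each give m T = 80 GeV when back to back, close to the W mass, and m T = 0 when collinear, which is why a minimum m T requirement suppresses multijet background.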

Binning jets as a function of p T
The jet p T bins introduced in Eq. (1) are given in Table 2 for V+jet and dijet events. The jet p T is re-evaluated for each grooming algorithm. Because there are large biases due to jet misassignment in the dijet events, especially at small p T (when three particle-level jets are often reconstructed as two jets in the detector, or vice versa), the p T intervals for these events begin at 220 GeV. Furthermore, the smaller number of events in the V+jet samples precludes the study of these events beyond p T = 450 GeV.

Event reconstruction
As indicated above, events are reconstructed using the particle-flow algorithm, which combines the information from all subdetectors to reconstruct the particle candidates in an event. The algorithm categorizes particles into muons, electrons, photons, charged hadrons, and neutral hadrons. The resulting PF candidates are passed through each jet clustering algorithm of Section 2, as implemented in FASTJET (Version 3.0.1) [45,46].
The reconstructed interaction vertex characterized by the largest value of ∑ i (p T trk i ) 2 , where p T trk i is the transverse momentum of the i th charged track associated with the vertex, is defined as the leading primary vertex (PV) of the event. This vertex is used as the reference vertex for all PF objects in the event. A pileup interaction can affect the reconstruction of jet momenta and E miss T , as well as lepton isolation and b-tagging efficiency. To mitigate these effects, a track-based algorithm is used to remove all charged hadrons that are not consistent with originating from the leading PV.
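The choice of the leading primary vertex can be sketched as follows, with each vertex represented simply as a list of the transverse momenta of its associated tracks. This data layout and the function name are assumptions made for illustration.

```python
def leading_primary_vertex(vertices):
    """Index of the vertex with the largest sum of squared track p_T."""
    return max(range(len(vertices)),
               key=lambda i: sum(track_pt ** 2 for track_pt in vertices[i]))

# Toy vertices: track p_T values in GeV (hypothetical).
vertices = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # many soft tracks: sum of squares = 6
    [10.0, 2.0],                       # few hard tracks: sum of squares = 104
    [3.0, 3.0],                        # sum of squares = 18
]
leading = leading_primary_vertex(vertices)
```

Squaring the track p T favors vertices with a few hard tracks over pileup vertices with many soft ones.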
Electron reconstruction requires the matching of an energy cluster in the ECAL with a track extrapolated from the silicon tracker [47]. Identification criteria based on the energy distribution of showers in the ECAL and consistency of tracks with the primary vertex are imposed on electron candidates. Additional requirements remove any electrons produced through conversions of photons in detector material. The analysis considers electrons only in the range of |η| < 2.5, excluding the transition region 1.44 < |η| < 1.57 between the central and endcap ECAL detectors because of poorer resolution for electrons in this region. Muons are reconstructed using two algorithms [48]: (i) in which tracks in the silicon tracker are matched to signals in the muon chambers, and (ii) in which a global fit is performed to a track seeded by signals in the external muon system. The muon candidates are required to be reconstructed through both algorithms. Additional identification criteria are imposed on muon candidates to reduce the fraction of tracks misidentified as muons, and to reduce contamination from muon decays in flight. These criteria include the number of hits detected in the tracker and in the outer muon system, the quality of the fit to a muon track, and its consistency of originating from the leading PV.
Charged leptons from V-boson decays are expected to be isolated from other energy depositions in the event. For each lepton candidate, a cone with radius 0.3 for muons and 0.4 for electrons is chosen around the direction of the track at the event vertex. When the scalar sum of the transverse momenta of reconstructed particles within that cone, excluding the contribution from the lepton candidate, exceeds ≈10% of the p T of the lepton candidate, that lepton is ignored. The exact isolation requirement depends on the η, p T , and flavor of the lepton. Muons and electrons are required to have p T > 30 GeV and > 80 GeV, respectively. The large threshold for electrons ensures good trigger efficiency. To avoid double counting, isolated charged leptons are removed from the list of PF objects that are clustered into jets.
After removal of isolated leptons and charged hadrons from pileup vertices, only the neutral hadron component from pileup remains and is included in the jet clustering. This remaining component of pileup to the jet energy is removed by applying a correction based on a mean p T per unit area of (∆y × ∆φ) originating from neutral particles [30,49]. This quantity is computed using the k T algorithm, and corrects the jet energy by the amount of energy expected from pileup in the jet cone. This "active area" method adds a large number of soft "ghost" particles to the clustering sequence to determine the effective area subtended by each jet. This procedure is done for all grooming algorithms just as for the ungroomed jets. The active area of a groomed jet is smaller than that of an ungroomed jet, and the pileup correction is therefore also smaller. Different responses in the endcap and central barrel calorimeters necessitate using η-dependent jet corrections. The amount of energy expected from the remnants of the hard collision (the underlying event) is estimated from minimum-bias data and MC events, and is added back into the jet.
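The area-based correction can be illustrated with a simplified sketch. It assumes that the pileup p T density ρ is estimated as the median of p T /area over a set of k T jets, a common choice in area-based subtraction that may differ in detail from the estimator used in the analysis, and that the corrected p T is p T − ρA. Function names and toy values are hypothetical, and the η dependence and the underlying-event term described in the text are omitted.

```python
from statistics import median

def estimate_rho(jet_pts, jet_areas):
    """Pileup p_T density: median of p_T / area over the supplied jets."""
    return median([p / a for p, a in zip(jet_pts, jet_areas)])

def subtract_pileup(jet_pt, jet_area, rho):
    """Remove the expected pileup p_T inside the jet area (floored at zero)."""
    return max(jet_pt - rho * jet_area, 0.0)

# Toy k_T jets: three soft pileup-like jets and one hard jet (hypothetical).
rho = estimate_rho([5.0, 6.0, 7.0, 200.0], [1.0, 1.0, 1.0, 1.5])
ungroomed_pt = subtract_pileup(300.0, 1.5, rho)   # larger active area
groomed_pt = subtract_pileup(300.0, 0.5, rho)     # smaller active area
```

The median is insensitive to the single hard jet, and the jet with the smaller area receives the smaller correction.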
In addition, the pileup-subtracted jet four-momenta in data are corrected for nonlinearities in η and p T by using a p T -and η-dependent correction to account for the difference between the response in MC-simulated events and data [50]. The jet corrections are derived for the ungroomed jet algorithms but are also applied to the groomed algorithms, thereby adding additional systematic uncertainty in the energy of groomed jets.

Event selection
We apply several other selection criteria to minimize instrumental background and electronic noise. In particular, accepted events must have at least one good primary vertex (Section 4.4). Backgrounds from additional beam interactions are reduced by applying a variety of requirements on charged tracks. Finally, calorimeter noise is minimized through restrictions on timing and electronic pulse shapes expected for signals.
Dijet events are required to have at least two AK7 jets, each with p T > 50 GeV and |y| < 2.5, and each jet must satisfy the jet quality criteria discussed in Ref. [37]. No third-jet veto is applied.
Reconstruction of W and Z bosons in V+jet events begins with identification of charged leptons and a calculation of E miss T , described in the previous section. Candidates for Z → ℓ + ℓ − (ℓ = e or µ) decays are reconstructed by combining two isolated electrons or muons and requiring the dilepton invariant mass to be in the 80 < M ℓℓ < 100 GeV range. An accurate measurement of E miss T is essential for distinguishing the W signal from background processes. The E miss T in the event is defined using the PF objects, and this analysis requires E miss T > 50 GeV. Candidate W → ℓν decays are identified primarily through the presence of a significant E miss T and a single isolated lepton of large p T , with p T and m T of the W candidate obtained by combining the lepton and the E miss T vectors.
The analysis of V+jet events is mainly of interest for the regime of p V T > 120 GeV, in which the opposing jet tends to have large p T as well, because of momentum conservation. In fact, the leading jet in each event (independent of clustering algorithm and jet radius) is required to have p T > 125 GeV and |y| < 2.5. A back-to-back topology between the vector boson and the leading jet is ensured by the additional selection of ∆φ(V, jet) > 2 and ∆R(ℓ, jet) > 1. Requiring such highly boosted jets, in addition to the tight isolation criteria for the leptons, greatly suppresses the background from multijet production. In the W → ℓν+jet analysis, additional rejection of multijet background is achieved by requiring m T (W) > 50 GeV. No subleading-jet veto is applied.
Figures 2(a) and (b) show the p T distributions for the leading AK7 jet selected in Z+jet and W+jet candidate events, respectively. Given the unique signature for highly-boosted vector bosons recoiling from jets, the selections suffice to define very pure samples of V+jet events. In the Z(ℓℓ)+jet analysis, the additional constraint on dilepton mass removes almost all background contributions, yielding a purity of ≈99% for Z+jet events, with ≈1% contamination from diboson production. The W+jet candidate sample contains ≈82% W+jet events, with small background contributions from tt̄ (13%), single top-quark (3%), and diboson and Z+jet (1% each) events based on MC simulation. The small number of events expected from these processes are subtracted using MC predictions for the jet mass from the W+jet candidate events, before correcting the data for detector effects. Similarly, the small number of events expected from diboson production are subtracted from the Z+jet candidates.

Influence of pileup on jet grooming algorithms
During the data taking the instantaneous LHC luminosity exceeded ≈3.0 × 10 33 cm −2 s −1 , or an average of ten interactions per bunch crossing. Such pileup collisions are not correlated with the hard-scattering process that triggers an interesting event, but present a background from low-p T interactions that can affect the measured energies of jets and their observed masses. Methods to mitigate these effects are part of standard event reconstruction, as discussed in Section 4.4, and are essential for extracting correct jet multiplicities and energies. The jet mass is expected to be particularly sensitive to pileup [1] for jets of large angular extent that contain many particles. Grooming techniques are designed to reduce the effective area of such jets and thereby minimize sensitivity to pileup. We examine this issue through studies of jet mass in the presence of pileup.
The mean jet mass m J for AK jets is presented for size parameters R = 0.5, 0.7, and 0.8, as a function of the total number of reconstructed primary vertices (N PV ) in Fig. 3(a), for data and MC simulation. The mean mass for N PV = 1 increases linearly with the jet radius from 0.5 to 0.8. A measure of the dependence of m J on pileup is given by the slope of a linear fit to the jet mass versus N PV . The ratios of these slopes (s R ) are found to be roughly consistent with the ratio of the third power of the jet radius, as summarized in Table 3. This is in agreement with predictions for scaling of the mean mass [51]. The R 3 dependence can be understood in terms of the increase of the jet area as R 2 . Simultaneously, the contribution of these particles to the jet mass scales with the distance between them, or ≈R/2, yielding another power of R.
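The R 3 expectation can be made concrete with a short sketch that computes the expected slope ratios relative to R = 0.5 and extracts a slope from a straight-line least-squares fit. The fit helper, its name, and the toy points are hypothetical and do not reproduce the measured values.

```python
def fit_slope(x, y):
    """Slope of an unweighted least-squares straight-line fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def expected_slope_ratio(radius, reference_radius):
    """Slope ratio expected if the pileup dependence scales as R^3."""
    return (radius / reference_radius) ** 3

ratio_07 = expected_slope_ratio(0.7, 0.5)   # (1.4)^3 = 2.744
ratio_08 = expected_slope_ratio(0.8, 0.5)   # (1.6)^3 = 4.096

# Toy points lying exactly on a line (not measured data).
n_pv = [1, 2, 3, 4, 5]
mean_mass = [2.0 + 0.5 * n for n in n_pv]
slope = fit_slope(n_pv, mean_mass)
```

Under this scaling, going from R = 0.5 to R = 0.8 would increase the pileup slope by roughly a factor of four.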
In Fig. 3(b) we show the dependence of m J on N PV , for AK7 jets, for different grooming algorithms. The grooming significantly reduces the impact of pileup on m J , as reflected by the decrease of the slope of the linear fit to the groomed-jet data points. The observed agreement between data and simulation in Fig. 3 provides support for our characterization of jet grooming and pileup, and the decrease in slopes suggests that grooming is indeed an effective tool for suppressing the impact of pileup on jets with large R parameters.

Corrections and systematic uncertainties
Before comparison of the jet mass distributions with QCD predictions, the data are corrected to the particle level for detector effects, such as resolution and acceptance. The simulated particle-level jets are reconstructed with the same algorithm and with the same parameters as the PF jets. We use the unfolding procedure described in Refs. [52][53][54][55][56] to correct the jet mass, through an iterative technique for finding the maximum-likelihood solution of the unfolding problem. The detector response matrix is obtained in MC studies of jets. In general, the number of iterations must be tuned to minimize the impact of statistical fluctuations on the result. In practice, however, the procedure is largely insensitive to the precise settings and binning of events and four iterations usually suffice. A larger number of iterations was found to provide the same results except for small fluctuations in the tails of distributions. A simpler bin-by-bin unfolding is used as a cross-check, and is found to provide similar results, with fluctuations in the tails of the distributions. The jet transverse momenta are not unfolded.
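A generic iterative unfolding of this kind can be sketched in a few lines. The sketch assumes the widely used iterative Bayesian update, in which the current estimate is folded through the response matrix and rescaled by the ratio of data to folded prediction. It is not the implementation of Refs. [52][53][54][55][56], and the function name, response matrix, and spectra are hypothetical.

```python
def iterative_unfold(data, response, prior, n_iter=4):
    """Iteratively unfold `data` given response[j][i] = P(reco j | true i)."""
    n_reco = len(response)
    n_true = len(prior)
    truth = [float(t) for t in prior]
    # Efficiency of each true bin: probability to be reconstructed anywhere.
    eff = [sum(response[j][i] for j in range(n_reco)) for i in range(n_true)]
    for _ in range(n_iter):
        folded = [sum(response[j][i] * truth[i] for i in range(n_true))
                  for j in range(n_reco)]
        ratio = [data[j] / folded[j] if folded[j] > 0 else 0.0
                 for j in range(n_reco)]
        truth = [truth[i] / eff[i]
                 * sum(response[j][i] * ratio[j] for j in range(n_reco))
                 for i in range(n_true)]
    return truth

# Toy example: 20% migration between two mass bins (hypothetical numbers).
response = [[0.8, 0.2],
            [0.2, 0.8]]
true_spectrum = [100.0, 50.0]
data = [90.0, 60.0]               # response applied to true_spectrum
unfolded = iterative_unfold(data, response, prior=[75.0, 75.0], n_iter=4)
```

Starting from a flat prior, the estimate moves toward the true spectrum at each iteration while the total yield is preserved. Stopping after a few iterations acts as a regularization against statistical fluctuations.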
Systematic uncertainties are estimated by modifying the response matrix for each source of uncertainty by ±1 standard deviation, and comparing the mass distribution to the nominal results, based on simulated PYTHIA6 events. The difference in the unfolded mass spectrum from such a change is taken as the uncertainty arising from that source.
The experimental uncertainties that can affect the unfolding of the jet mass include the jet energy scale (JES), jet energy resolution (JER), and jet angular resolution (JAR). The uncertainty from JES is estimated by raising and lowering the jet four-momenta by the measured uncertainty as a function of jet p T and η [50], which typically corresponds to 1-2% for the jets in this analysis. Two additional p T -and η-independent uncertainties are included: a 1% uncertainty to account for differences observed between the measured and predicted W mass for high-p T jets in a tt̄-enriched sample, and a 3% uncertainty to account for differences in the groomed and ungroomed energy responses found in MC simulation [34].
The impact of uncertainties in JER and JAR on m J are evaluated by smearing the jet energies, as well as the resolutions in η and φ, each by 10% in the MC simulation relative to the particle-level generated jets [50]. These estimated uncertainties on JER and JAR are found to be essentially the same for all jet grooming techniques in MC studies. Since this analysis uses jets constructed from PF constituents, the charged particles have excellent energy and angular resolutions, but their use induces a dependence on tracking uncertainties, e.g., tracking efficiency. This dependence is accounted for implicitly in the ±10% changes in jet energy and angular resolutions, since such changes would lead to a difference between expected and observed values of these quantities. The same is true for the neutral electromagnetic component of the jet (primarily from π 0 → γγ decays).
The remaining sources of uncertainty are estimated from MC simulation. The tracking information is not sensitive to the neutral hadronic component of jets, and this small contribution is taken directly from simulation. We estimate this remaining uncertainty by comparing the unfolded data using PYTHIA6 and using HERWIG++, and assign the difference as a systematic uncertainty. This also accounts for the uncertainty from modeling parton showers. The latter effect often comprises the largest uncertainty in the unfolded jet mass distributions as described below. Other theoretical ambiguities that can affect the unfolding of the jet mass include the variation of the parton distribution functions and the modeling of initial and final-state radiation (ISR/FSR). The former was investigated and found to be much smaller than the difference between the unfolding with PYTHIA6 and the unfolding with HERWIG++, and hence is neglected. The latter is included implicitly in the uncertainty between PYTHIA6 and HERWIG++.
As described in Section 4.4, the jets used in this analysis are reconstructed after removing the charged hadrons that appear to emanate from subleading primary vertices. This procedure produces a dramatic (≈60%) reduction in the pileup contribution to jets. The residual uncertainty from pileup is obtained through MC simulation, estimated by increasing and decreasing the cross section for minimum-bias events by 8%.
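The text does not say how the 8% variation is propagated to the simulation. One common approach, assumed here, is to reweight simulated events according to their number of pileup interactions: since the mean number of interactions scales linearly with the minimum-bias cross section, an 8% change in the cross section shifts the mean by 8%. The sketch below models the pileup multiplicity as a single Poisson distribution with a hypothetical mean, which is a simplification.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of observing n interactions for a given mean."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def pileup_weights(nominal_mean, rel_shift, n_max=60):
    """Per-multiplicity weights that move a Poisson pileup profile to a shifted mean."""
    shifted_mean = nominal_mean * (1.0 + rel_shift)
    return [
        poisson_pmf(n, shifted_mean) / poisson_pmf(n, nominal_mean)
        for n in range(n_max + 1)
    ]

nominal_mean = 9.0  # hypothetical mean number of interactions per crossing
weights_up = pileup_weights(nominal_mean, +0.08)    # cross section raised by 8%
weights_down = pileup_weights(nominal_mean, -0.08)  # cross section lowered by 8%
```

Applying `weights_up[n]` or `weights_down[n]` to each simulated event with n pileup interactions gives shifted jet mass distributions, and their spread around the nominal result can be taken as the residual pileup uncertainty.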
In the dijet analysis, there can be incorrect assignments of leading reconstructed jets relative to the generator level, e.g., two generator-level jets can be matched to three reconstructed jets, or vice versa. This effect causes a bias in the unfolding procedure, which becomes greater at small p T . This bias is corrected through MC studies of matching of particle-level jets to reconstructed jets, and the magnitude of the bias correction is also added to the overall systematic uncertainty. Such misassignments are negligible in the V+jet analysis.

Results from dijet final states
The differential probability distributions of Eq. (2) for m AVG J of the two leading jets in dijet events, corrected for detector effects in the jet mass, are displayed in Figs. 4-7 for seven bins in p AVG T along with the HERWIG++ predictions. The p AVG T is not corrected to the particle level, because the correction is expected to be negligible for the momenta considered. Results are shown for ungroomed jets and for the three categories of grooming. Each distribution is normalized to unity. The ratios of the MC simulations used in Figs. 4-7 to the results for data, for PYTHIA6, PYTHIA8, and for HERWIG++ are given in Figs. 8-11, respectively.
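The normalization to unity and the MC-to-data ratios can be sketched as follows. This assumes that each distribution is expressed as a density per unit jet mass whose integral over all bins is one; the bin edges and event counts are hypothetical.

```python
def normalize_to_unity(counts, bin_edges):
    """Convert bin counts to a density (per GeV) that integrates to one."""
    widths = [hi - lo for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    if len(widths) != len(counts):
        raise ValueError("need exactly one more bin edge than bins")
    total = sum(counts)
    if total <= 0:
        raise ValueError("distribution has no entries")
    return [c / (total * w) for c, w in zip(counts, widths)]

def ratio_to_data(mc_density, data_density):
    """Per-bin MC/data ratio; bins with no data are returned as NaN."""
    return [m / d if d > 0 else float("nan") for m, d in zip(mc_density, data_density)]

# Hypothetical jet-mass bin edges (GeV) and event counts.
edges = [0.0, 10.0, 30.0, 70.0]
data_density = normalize_to_unity([10, 30, 60], edges)
mc_density = normalize_to_unity([12, 28, 60], edges)
mc_over_data = ratio_to_data(mc_density, data_density)
```

Because both spectra are normalized before the ratio is taken, the ratios compare shapes only and are insensitive to differences in the overall cross section.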
The largest systematic uncertainty is from the choice of parton-shower modeling used to calculate detector corrections, with smaller, but still significant, uncertainties arising from jet energy scale and resolution, and small contributions from jet angular resolution and pileup. In the 220-300 GeV and 300-450 GeV jet-p T bins, the m J < 50 GeV region is dominated by uncertainties from unfolding (50-100%), which are negligible for p AVG T > 450 GeV. For m J > 50 GeV, the JES, JER, JAR, and pileup uncertainties each contribute ≈10%. For the 450-1000 GeV p T bins, the parton-shower uncertainty dominates: it is around 50-100% below the peak of the m J distribution and 5-10% for the rest of the distribution. For p T > 1000 GeV, the statistical uncertainty dominates over the entire mass range.

Results from V+jet final states
This section provides the probability density distributions as functions of the mass of the leading jet in V+jet events. These distributions are corrected for detector effects in the jet mass, and are compared to MC expectations from MADGRAPH (interfaced to PYTHIA6) and HERWIG++. The jet mass distributions are studied in different ranges of p T between 125 and 450 GeV, as given in Table 2. (Just as in the dijet results, p T is not corrected to the particle level.) For jets reconstructed with the CA algorithm (R = 1.2), we study only the events with p T > 150 GeV, which is most interesting for heavy particle searches in the highly-boosted regime, where all decay products are contained within R = 1.2 jets [24].
For clarity, the distributions are also truncated at large mass values where few events are recorded. Jet-mass bins with relative uncertainties > 100% are also ignored to minimize overlap with more precise measurements in other p T bins.
Figures 12-15 show the distributions in m J for AK7 jets in Z+jet events for the ungroomed, filtered, trimmed, and pruned clustering algorithms, respectively. Both PYTHIA6 and HERWIG++ show good agreement with data for all p T bins, but especially so for p T > 300 GeV. As in the case of the dijet analysis, the data at small jet mass are not modeled satisfactorily, but show modest improvement after applying the grooming procedures. To investigate several popular choices of jet grooming at CMS, Figs. 16-17 show the distributions in m J for pruned CA8 and filtered CA12 jets in Z+jet events. For groomed CA jets, both PYTHIA6 and HERWIG++ provide good agreement with the data, with some possible inconsistency for m J < 20 GeV and at large m J for p T < 300 GeV for the ungroomed and filtered jets. Figures 18-21 show the corresponding distributions for the mass of the leading jet accompanying the W boson for AK7 jets in W(ℓν)+jet events for the ungroomed, filtered, trimmed, and pruned clustering algorithms, and Figs. 22-23 show the distributions for pruned CA8 and filtered CA12 jets. For CA8 and CA12 jets, only particular grooming algorithms and p T bins are chosen for illustration. The MC simulation shows good agreement with data, just as observed for Z+jet events.
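The bin selection described in this section, in which jet-mass bins with relative uncertainties above 100% are dropped, can be sketched as a simple filter. The function name and input values are hypothetical, and empty bins are assumed to be dropped as well.

```python
def select_bins(values, uncertainties, max_rel_unc=1.0):
    """Indices of bins whose relative uncertainty does not exceed max_rel_unc."""
    kept = []
    for index, (value, unc) in enumerate(zip(values, uncertainties)):
        # Bins with no content have an undefined relative uncertainty and are dropped.
        if value > 0 and unc / value <= max_rel_unc:
            kept.append(index)
    return kept

# Hypothetical bin contents and total uncertainties in the same units.
bin_values = [10.0, 4.0, 0.5, 0.0]
bin_uncertainties = [1.0, 5.0, 0.4, 0.1]

kept_bins = select_bins(bin_values, bin_uncertainties)
```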

Summary
We have presented the differential distributions in jet mass for inclusive dijet and V+jet events, defined through the anti-k T algorithm for a size parameter of 0.7 for ungroomed jets, as well as for jets groomed through filtering, trimming, and pruning. In addition, similar distributions for V+jet events were given for pruned Cambridge-Aachen jets with a size parameter of 0.8, as well as for filtered Cambridge-Aachen jets with a size parameter of 1.2. The impact of pileup on jet mass was also investigated.
Higher-order QCD matrix-element predictions for partons, coupled to parton-shower Monte Carlo programs that generate jet mass in dijet and V+jet events, are found to be in good agreement with data. A comparison of data with MC simulation indicates that both PYTHIA6 and HERWIG++ reproduce the data reasonably well, and that the HERWIG++ predictions for more aggressive grooming algorithms, i.e., those that remove larger fractions of contributions to the original ungroomed jet mass, agree somewhat better with observations. It is also observed that the more aggressive grooming procedures lead to somewhat better agreement between data and MC simulation.
In comparing the results from the V+jet analysis with those for the two leading jets in multijet events, the predictions provide slightly better agreement with the V+jet data. Since jets in V+jet events are more often initiated by quarks than those in dijet events, this observation suggests that the simulation of quark jets is better than that of gluon jets. Differences between data and simulation are larger at small jet mass values, which also correspond to the region more affected by pileup and soft QCD radiation.
These studies represent the first detailed investigations of techniques for characterizing jet substructure based on data collected by the CMS experiment at a center-of-mass energy of 7 TeV. For the trimming and pruning algorithms, these studies mark the first publication on this subject from the LHC, and provide an important benchmark for their use in searches for massive particles. Finally, the intrinsic stability of these algorithms to pileup effects is likely to contribute to a more rapid and widespread use of these techniques in future high-luminosity runs at the LHC.