Jet mass and substructure of inclusive jets in √ s = 7 TeV pp collisions with the ATLAS experiment

Abstract: Recent studies have highlighted the potential of jet substructure techniques to identify the hadronic decays of boosted heavy particles. These studies all rely upon the assumption that the internal substructure of jets generated by QCD radiation is well understood. In this article, this assumption is tested on an inclusive sample of jets recorded with the ATLAS detector in 2010, which corresponds to 35 pb −1 of pp collisions delivered by the LHC at √ s = 7 TeV. In a subsample of events with single pp collisions, measurements corrected for detector efficiency and resolution are presented with full systematic uncertainties. Jet invariant mass, k t splitting scales and N-subjettiness variables are presented for anti-k t R = 1.0 jets and Cambridge-Aachen R = 1.2 jets. Jet invariant-mass spectra for Cambridge-Aachen R = 1.2 jets after a splitting and filtering procedure are also presented. Leading-order parton-shower Monte Carlo predictions for these variables are found to be broadly in agreement with data. The dependence of mean jet mass on additional pp interactions is also explored.


Introduction
The ATLAS experiment observes proton-proton (pp) collisions provided by the Large Hadron Collider (LHC). The outcome of these collisions is frequently the production of large numbers of hadrons. In order to understand these collisions, studies usually group hadrons into jets defined by one of a number of standard algorithms [1][2][3][4][5][6][7]. The variables most often used in analyses are the jet direction and momentum transverse to the beam (p T ). However, the jets remain composite objects, and their masses and internal substructure contain additional information.
One strong motivation for studies of the internal substructure of jets is that at the LHC particles such as W and Z bosons and top quarks are produced abundantly with significant Lorentz boosts. The same may also be true for new particles produced at the LHC. When such particles decay hadronically, the products tend to be collimated in a small area of the detector. For sufficiently large boosts, the resulting hadrons can be clustered into a single jet. Substructure studies offer a technique to extract these single jets of interest from the overall jet background. Such techniques have been found promising for boosted W decay identification, Higgs searches and boosted top identification amongst others [8]. However, many of these promising approaches have never been tested with collision data and rely on the assumption that the internal structure of jets is well modelled by parton-shower Monte Carlo approaches. It is therefore important to measure some of the relevant variables in a sample of jets to verify the expected features.
JHEP05(2012)128
In this paper, measurements are made with an inclusive sample of high-transverse momentum jets produced in proton-proton collisions with a centre-of-mass energy ( √ s) of 7 TeV. This is a natural continuation of the studies in previous experiments [9][10][11][12][13]. It also complements previous ATLAS studies [14] probing the shape of jets reconstructed with the anti-k t algorithm [5] with smaller radius parameters R = 0.4 and 0.6.
This study focuses on two specific jet algorithms that are likely to be of interest for future searches: anti-k t jets with an R-parameter of 1.0 and Cambridge-Aachen [3,4] jets with R = 1.2. Jets are required to have high transverse momentum (p T > 200 GeV) and to be central in rapidity (|y| < 2). The normalised cross-section as a function of jet mass, taken from the jet four-momentum, is measured for both these algorithms. In addition to the mass, two sets of substructure variables, k t splitting scales [15] and N-subjettiness ratios [16], are measured. For the Cambridge-Aachen jets, the mass distribution after a substructure splitting and filtering procedure [17] is also presented.
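As a minimal illustration of the kinematic quantities used in this selection, the sketch below computes the transverse momentum, rapidity and invariant mass of a jet from its four-momentum. The tuple layout (E, px, py, pz) in GeV, the function names and the two example constituents are assumptions made for this example; they are not taken from the ATLAS software.

```python
import math

def jet_pt(p4):
    """Transverse momentum from a four-momentum (E, px, py, pz)."""
    _, px, py, _ = p4
    return math.hypot(px, py)

def jet_rapidity(p4):
    """Rapidity y = 0.5 * ln((E + pz) / (E - pz))."""
    e, _, _, pz = p4
    return 0.5 * math.log((e + pz) / (e - pz))

def jet_mass(p4):
    """Invariant mass; a slightly negative m^2 from rounding is clipped to zero."""
    e, px, py, pz = p4
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))

def passes_selection(p4, pt_min=200.0, y_max=2.0):
    """Kinematic selection quoted in the text: pT > 200 GeV and |y| < 2."""
    return jet_pt(p4) > pt_min and abs(jet_rapidity(p4)) < y_max

# Hypothetical example: two massless constituents summed into one jet.
constituents = [(150.0, 120.0, 90.0, 0.0), (130.0, 50.0, 120.0, 0.0)]
jet = tuple(sum(component) for component in zip(*constituents))
```

Although each constituent here is massless, the summed four-momentum has a non-zero invariant mass because the constituents are not collinear, which is the sense in which the jet mass carries substructure information.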

The splitting and filtering procedure
The "splitting and filtering" procedure aims to identify relatively hard, symmetric splittings in a jet that contribute significantly to the jet's invariant mass. This procedure is taken from recent Higgs search studies [17,19]. The parameters are tuned to maximise sensitivity to a Standard Model Higgs boson decaying to b b̄, but the procedure is generally suitable for identifying two-body decay processes. The effect of the procedure is to search for jets where clustering the constituents with Cambridge-Aachen combines two relatively low-mass objects to make a much more massive object. This indicates the presence of a heavy particle decay. The procedure then attempts to retain only the constituents believed to be related to the decay of this particle. Because the procedure itself uses the Cambridge-Aachen algorithm, it is most natural to apply it to jets originally found with this algorithm.
Each stage in the clustering combines two objects j 1 and j 2 to make another object j. The procedure uses the definitions
$$v = \frac{\min(p_{T j_1}^2,\, p_{T j_2}^2)}{m_j^2}\,\delta R_{j_1,j_2}^2 \quad\text{and}\quad \delta R_{j_1,j_2} = \sqrt{\delta y_{j_1,j_2}^2 + \delta\phi_{j_1,j_2}^2},$$
where δy and δφ are the differences in rapidities and azimuthal angles respectively. The procedure takes a jet to be the object j and applies the following:
1. Undo the last clustering step of j to get j 1 and j 2 . These are ordered such that their mass has the property m j1 > m j2 . If j cannot be unclustered (i.e. it is a single particle) or δR j1,j2 < 0.3 then it is not a suitable candidate, so discard this jet.
The algorithm parameters µ and v cut are taken as 0.67 and 0.09 respectively [19].
The µ cut attempts to identify a hard structure in the distribution of energy in the jet, which would imply the decay of a heavy particle. The cut on v further helps by suppressing very asymmetric decays of the type favoured by splittings of quarks and gluons. A notable modification of the original procedure [17] in this paper is the addition of the δR j1,j2 cut in step 1. This cut is applied because with current techniques the correction for detector resolution at angular scales below 0.3 is not well controlled. Steps 3 and 4 filter out some of the particles in the candidate jet, the aim being to retain particles relevant to the hard process while reducing the contribution from effects like underlying event and pile-up. The 4-vector after step 4 can be treated like a new jet. This new jet has a p T and mass less than or equal to those of the original jet.
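The procedure can be illustrated with a short sketch. Only step 1 is spelled out in the text above, so the remaining steps are assumptions here, following the mass-drop and filtering approach of ref. [17]: a splitting is accepted if m j1 < µ m j and v > v cut (otherwise j is replaced by j 1 and step 1 is repeated), and the accepted object is reclustered with a radius min(0.3, δR j1,j2 /2), keeping the three hardest subjets. The form of v, the brute-force pairwise clustering and all function names are likewise assumptions of this example, standing in for a real jet-clustering library.

```python
import math

def massless(pt_, y, phi):
    """Massless four-momentum (E, px, py, pz) from pT, rapidity and azimuth."""
    return (pt_ * math.cosh(y), pt_ * math.cos(phi), pt_ * math.sin(phi), pt_ * math.sinh(y))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def pt(p):
    return math.hypot(p[1], p[2])

def mass(p):
    return math.sqrt(max(p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2, 0.0))

def rapidity(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def delta_r(a, b):
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(rapidity(a) - rapidity(b), dphi)

class Node:
    """A clustered object: its four-momentum and the two objects merged to form it."""
    def __init__(self, p4, parents=()):
        self.p4, self.parents = p4, parents

def constituents(node):
    if not node.parents:
        return [node.p4]
    return constituents(node.parents[0]) + constituents(node.parents[1])

def cluster_ca(p4s, radius=None):
    """Cambridge-Aachen-style clustering: merge the angularly closest pair until
    no pair is closer than radius (radius=None merges down to a single object)."""
    nodes = [Node(p) for p in p4s]
    while len(nodes) > 1:
        dist, i, j = min((delta_r(nodes[a].p4, nodes[b].p4), a, b)
                         for a in range(len(nodes)) for b in range(a + 1, len(nodes)))
        if radius is not None and dist >= radius:
            break
        merged = Node(add(nodes[i].p4, nodes[j].p4), (nodes[i], nodes[j]))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes

def split_and_filter(p4s, mu=0.67, v_cut=0.09, dr_min=0.3):
    """Return the filtered four-momentum, or None if the jet is discarded."""
    j = cluster_ca(p4s)[0]
    while True:
        if not j.parents:                       # step 1: cannot be unclustered
            return None
        j1, j2 = sorted(j.parents, key=lambda n: mass(n.p4), reverse=True)
        dr = delta_r(j1.p4, j2.p4)
        if dr < dr_min:                         # step 1: angular requirement
            return None
        v = min(pt(j1.p4), pt(j2.p4)) ** 2 / mass(j.p4) ** 2 * dr ** 2
        if mass(j1.p4) < mu * mass(j.p4) and v > v_cut:   # assumed step 2
            break
        j = j1
    r_filt = min(0.3, dr / 2.0)                 # assumed step 3
    subjets = sorted(cluster_ca(constituents(j), r_filt), key=lambda n: pt(n.p4), reverse=True)
    out = subjets[0].p4
    for s in subjets[1:3]:                      # assumed step 4: three hardest subjets
        out = add(out, s.p4)
    return out
```

For a toy jet made of two hard, well-separated constituents plus one soft wide-angle constituent, the first unclustering peels off the soft constituent, the second finds the symmetric splitting, and the returned four-momentum contains only the two hard constituents.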

k t splitting scales, d ij
The k t splitting scales are defined by reclustering the constituents of the jet with the k t recombination algorithm [1,2]. The k t -distance of the final clustering step can be used to define a splitting scale variable √ d 12 :
$$\sqrt{d_{12}} = \min(p_{T1},\, p_{T2}) \times \delta R_{1,2},$$
where 1 and 2 are the two jets before the final clustering step [15]. The ordering of clustering in the k t algorithm means that in the presence of a two-body heavy particle decay the final clustering step will usually be to combine the two decay products. The parameter √ d 12 can therefore be used to distinguish heavy particle decays, which tend to be more symmetric, from the largely asymmetric splittings of quarks and gluons. The expected value for a heavy particle decay is approximately m/2, whereas inclusive jets will tend to have values ∼ p T /10, although with a tail extending to high values. The variable √ d 23 is defined analogously but for the two objects combined in the penultimate clustering step.
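The reclustering described above can be sketched as follows. This is a simplified illustration rather than the analysis code: it assumes the scale of a merge is min(p T1 , p T2 ) × δR, ignores any beam distance so that all constituents of the jet are merged pairwise down to one object, and uses hypothetical helper names.

```python
import math

def massless(pt_, y, phi):
    """Massless four-momentum (E, px, py, pz) from pT, rapidity and azimuth."""
    return (pt_ * math.cosh(y), pt_ * math.cos(phi), pt_ * math.sin(phi), pt_ * math.sinh(y))

def pt(p):
    return math.hypot(p[1], p[2])

def rapidity(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def delta_r(a, b):
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(rapidity(a) - rapidity(b), dphi)

def sqrt_dij(a, b):
    """Square root of the kt distance between two objects (assumed form)."""
    return min(pt(a), pt(b)) * delta_r(a, b)

def kt_splitting_scales(constituents):
    """Recluster with kt ordering and return (sqrt_d12, sqrt_d23).

    sqrt_d12 is the scale of the final merge and sqrt_d23 that of the
    penultimate merge; a missing merge is reported as 0.0.
    """
    objs = list(constituents)
    merge_scales = []
    while len(objs) > 1:
        scale, i, j = min((sqrt_dij(objs[a], objs[b]), a, b)
                          for a in range(len(objs)) for b in range(a + 1, len(objs)))
        merge_scales.append(scale)
        merged = tuple(x + y for x, y in zip(objs[i], objs[j]))
        objs = [o for k, o in enumerate(objs) if k not in (i, j)] + [merged]
    sqrt_d12 = merge_scales[-1] if merge_scales else 0.0
    sqrt_d23 = merge_scales[-2] if len(merge_scales) > 1 else 0.0
    return sqrt_d12, sqrt_d23
```

Because the k t distance is weighted by the softer transverse momentum, soft constituents are merged first and the final merge combines the hardest two objects, which is why √ d 12 is sensitive to a hard two-body splitting.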

N-subjettiness
The N-subjettiness variables τ N [16] are designed to be smooth, continuous observables related to the subjet multiplicity. Intuitively, the variables can be thought of as answering the question: "How much does this jet look like N different subjets?" The variable τ N is calculated by clustering the constituents of the jet with the k t algorithm and requiring N subjets to be found. These N subjets define axes within the jet around which the jet constituents may be concentrated. The variables τ N are then defined as the following sum over all constituents k of the jet:
$$\tau_N = \frac{1}{d_0} \sum_k p_{Tk} \times \min(\delta R_{1,k},\, \delta R_{2,k},\, \ldots,\, \delta R_{N,k}), \quad\text{with}\quad d_0 \equiv \sum_k p_{Tk} \times R,$$
where δR i,k is the distance from the subjet i to the constituent k and R is the R-parameter of the original jet algorithm. Using this definition, τ N describes how well the substructure of the jet is described by N subjets by assessing the degree to which constituents are localized near the axes defined by the k t subjets. For two- and three-body decays, respectively, the ratios τ 2 /τ 1 and τ 3 /τ 2 have been shown to provide excellent discrimination for hadronic decays of W-bosons and boosted top quarks [20]. These ratios will be referred to as τ 21 and τ 32 respectively. These variables mostly fall within the range 0 to 1. As an example, τ 21 ≃ 1 corresponds to a jet which is narrow and without substructure; τ 21 ≃ 0 implies a jet which is much better described by two subjets than one. Similarly, low values of τ 32 imply a jet which is much better described by three subjets than two. However, as can be seen from the definition, adding an additional subjet axis will tend to reduce the value of τ N and therefore even narrow jets tend to have values of τ 21 and τ 32 slightly less than 1.
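A compact sketch of this calculation is given below. It assumes τ N is the p T -weighted sum of distances from each constituent to its nearest subjet axis, normalised by the scalar p T sum times R, and it obtains the N axes from a simplified pairwise k t reclustering that stops when N objects remain. The helper names and the neglect of any beam distance are assumptions of this example.

```python
import math

def massless(pt_, y, phi):
    """Massless four-momentum (E, px, py, pz) from pT, rapidity and azimuth."""
    return (pt_ * math.cosh(y), pt_ * math.cos(phi), pt_ * math.sin(phi), pt_ * math.sinh(y))

def pt(p):
    return math.hypot(p[1], p[2])

def rapidity(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def delta_r(a, b):
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(rapidity(a) - rapidity(b), dphi)

def exclusive_kt_axes(constituents, n):
    """Merge the pair with the smallest kt distance until n objects remain."""
    objs = list(constituents)
    while len(objs) > n:
        _, i, j = min((min(pt(objs[a]), pt(objs[b])) * delta_r(objs[a], objs[b]), a, b)
                      for a in range(len(objs)) for b in range(a + 1, len(objs)))
        merged = tuple(x + y for x, y in zip(objs[i], objs[j]))
        objs = [o for k, o in enumerate(objs) if k not in (i, j)] + [merged]
    return objs

def tau_n(constituents, n, radius):
    """N-subjettiness of a jet with R-parameter radius."""
    axes = exclusive_kt_axes(constituents, n)
    d0 = sum(pt(c) for c in constituents) * radius
    return sum(pt(c) * min(delta_r(axis, c) for axis in axes) for c in constituents) / d0
```

For a jet with two hard, well-separated constituents and little else, `tau_n(parts, 2, R)` is much smaller than `tau_n(parts, 1, R)`, so the ratio τ 21 is close to zero, in line with the interpretation given above.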

The ATLAS detector
The ATLAS detector [21] provides nearly full solid angle coverage around the collision point with tracking detectors, calorimeters and muon chambers. Of these subsystems the most relevant to this study are the inner detector, the barrel and endcap calorimeters, and the trigger system.
The inner detector is a tracking detector covering the range |η| < 2.5 and with full coverage in φ. It is composed of a silicon pixel detector, a silicon microstrip detector and a transition radiation tracker. The whole system is immersed in a 2 T magnetic field. The information from the inner detector is used to reconstruct tracks and vertices.
The barrel and endcap calorimeters cover the regions |η| ≲ 1.5 and 1.5 ≲ |η| < 3.2, respectively. Electromagnetic measurements are provided by a liquid-argon (LAr) sampling calorimeter. The granularity of this detector ranges from δη×δφ = 0.025×0.025 to 0.1×0.1. Hadronic calorimetry in |η| < 1.7 is provided by a scintillating-tile detector, while in the endcaps, coverage is provided by a second LAr system. The granularity of the hadronic calorimetry ranges from 0.1 × 0.1 to 0.2 × 0.2.
The trigger system [22] is composed of three consecutive levels. Only the Level-1 (L1) trigger is used in this study, with higher levels not rejecting any events. The L1 trigger is based on custom-built hardware that processes events with a fixed latency of 2.5 µs. Events in this analysis are selected based on their L1 calorimeter signature. The L1 calorimeter trigger uses coarse detector information to identify interesting physics objects above a given transverse energy (E T ) threshold. The jet triggers use a sliding window algorithm taking square δη × δφ = 0.2 × 0.2 jet elements as input. The window size is 0.8 × 0.8.
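The idea of a sliding window over jet elements can be sketched as below. This is a deliberately simplified picture and not the L1 hardware algorithm: it sums E T in every 4 × 4 block of 0.2 × 0.2 elements (a 0.8 × 0.8 window), treats φ as periodic, and reports every window above threshold without any local-maximum or overlap-removal logic. The grid layout and function name are assumptions of this example.

```python
def sliding_window_jets(et_grid, threshold, window=4):
    """Return (i_eta, i_phi, et_sum) for every window with et_sum above threshold.

    et_grid[i_eta][i_phi] holds the transverse energy of one jet element;
    window is the window size in elements (4 elements of 0.2 give 0.8).
    """
    n_eta, n_phi = len(et_grid), len(et_grid[0])
    found = []
    for i in range(n_eta - window + 1):
        for j in range(n_phi):  # phi is periodic, so windows may wrap around
            total = sum(et_grid[i + di][(j + dj) % n_phi]
                        for di in range(window) for dj in range(window))
            if total > threshold:
                found.append((i, j, total))
    return found
```

Several overlapping windows generally fire on the same energy deposit, which is why a practical trigger needs an additional rule to pick one window per jet candidate.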

Dataset and reconstruction
The data analysed here come from the 2010 √ s = 7 TeV pp dataset. Data are used in this study only if the detector conditions were stable, there was a stable beam present in the LHC, the luminosity was reliably monitored and the trigger was operational. The selected data set corresponds to an integrated luminosity of 35.0 ± 1.1 pb −1 [23,24].
Events in this analysis are first selected by the L1 calorimeter trigger system. The efficiency of this trigger was evaluated in data and found to contain no significant biases for the selection used here. For the lowest p T bin (200-300 GeV) a trigger is used which was only available for part of the dataset. As a result some plots are presented with the lower integrated luminosity of 2.0 ± 0.1 pb −1 .
To reject events that are dominated by detector noise or non-collision backgrounds, events are required to contain a primary vertex consistent with the LHC beamspot, reconstructed from at least five tracks with p T > 150 MeV. Additionally, jets are reconstructed with the anti-k t algorithm using an R-parameter of 0.6. Events are discarded if any such jet with transverse momentum greater than 30 GeV fails to satisfy a number of quality criteria, including requirements on timing and calorimeter noise [25]. This selection removes approximately 3% of events in this dataset.

Additional proton-proton collisions (pile-up) can have a significant impact on quantities like jet mass and substructure [8]. The primary results in this paper are therefore presented only in events where the number of reconstructed primary vertices (N PV ) composed of at least five tracks is exactly one. This requirement selects approximately 22% of events in the 2010 dataset. As vertex finding is highly efficient, this approach is expected to be very good at rejecting pile-up, and no additional systematic uncertainties as a result of this requirement are considered. The effects of pile-up are discussed in more detail in section 10.
Calorimeter cells are clustered using a three-dimensional topological algorithm. These clusters provide a three-dimensional representation of energy depositions in the calorimeter with a nearest neighbour noise suppression algorithm [26]. The resulting clusters are made massless and then classified as either electromagnetic or hadronic in origin based on their shape, depth and energy density. Cluster energies are corrected with calibration constants, which depend on the cluster classification to account for calorimeter non-compensation [25]. The clusters are then used as input to a jet algorithm.
As part of this study, specific calibrations for these jet algorithms have been devised. Calibrations for the mass, energy and η of jets are derived from Monte Carlo (specifically Pythia [27]). Hadron-level jets (excluding muons and neutrinos) are matched to jets reconstructed in the simulated calorimeter. The matched pairs are used to define functions for these three variables, dependent on energy and η, which on average correct the reconstructed quantities back to the true scale. This correction is of the order 10-20% for mass and energy and 0.01 for η.
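A schematic version of such a Monte Carlo-based calibration is sketched below. It assumes the correction is the inverse of the mean response (reconstructed value divided by true value) in bins of energy and |η|; the binning, the tuple layout of the matched pairs and the function names are illustrative assumptions and do not reproduce the functional form used in the analysis.

```python
from bisect import bisect_right
from collections import defaultdict

def bin_index(edges, value):
    """Index of the bin [edges[i], edges[i+1]) containing value, or None."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None

def derive_response(pairs, e_edges, eta_edges):
    """Mean reco/true response per (energy, |eta|) bin from matched MC jets.

    pairs: iterable of (reco_value, true_value, energy, eta) for matched jets.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for reco, truth, energy, eta in pairs:
        key = (bin_index(e_edges, energy), bin_index(eta_edges, abs(eta)))
        if None in key or truth <= 0.0:
            continue
        sums[key][0] += reco / truth
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def calibrate(reco, energy, eta, response, e_edges, eta_edges):
    """Divide by the mean response; bins without a derived response are left uncorrected."""
    key = (bin_index(e_edges, energy), bin_index(eta_edges, abs(eta)))
    return reco / response.get(key, 1.0)
```

The same two functions can be applied in turn to mass and energy, since only the meaning of the reconstructed and true values changes.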
Jets constructed from tracks are used for systematic studies in this paper. These track-jets are constructed using the same algorithms as calorimeter jets. The input constituents are inner-detector tracks originating only from the selected pp collision of interest as selected by the criteria p T > 500 MeV, |η| < 2.5, |z 0 | < 5 mm and |d 0 | < 1.5 mm [28]. Here z 0 and d 0 are the longitudinal and transverse impact parameters of the track at closest approach to the z-axis, relative to the primary vertex.
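The track selection quoted above amounts to four simple cuts, sketched here for concreteness. The dictionary keys and the choice of units (GeV for p T , mm for impact parameters) are assumptions of this example.

```python
def select_tracks(tracks, pt_min=0.5, eta_max=2.5, z0_max=5.0, d0_max=1.5):
    """Keep tracks passing pT > 500 MeV, |eta| < 2.5, |z0| < 5 mm and |d0| < 1.5 mm.

    Each track is a dict with keys "pt" (GeV), "eta", "z0" (mm) and "d0" (mm),
    with impact parameters measured relative to the primary vertex.
    """
    return [t for t in tracks
            if t["pt"] > pt_min
            and abs(t["eta"]) < eta_max
            and abs(t["z0"]) < z0_max
            and abs(t["d0"]) < d0_max]
```

The selected tracks would then be passed to the same jet algorithm as the calorimeter clusters to build track-jets.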
The measurements presented in this paper are for jets that have |y| < 2 in four 100 GeV p T bins spanning 200 to 600 GeV. This selection is not biased by trigger effects and the jets it selects are contained entirely within the barrel and end-cap subdetectors.

Monte Carlo samples
Samples of inclusive jet events were produced using several Monte Carlo (MC) generators including Pythia 6.423 [27] and Herwig++ 2.4 [29]. These programs implement leading-order (LO) perturbative QCD (pQCD) matrix elements for 2 → 2 processes. Additionally, use the AMBT1 tune [28]. In some figures the Perugia2010 Pythia tune is used [37], which has been found to describe jet shapes more accurately at ATLAS [14]. Leading-order parton density functions are taken from the MRST2007 LO* set [38,39], unless stated otherwise. No pile-up was included in any of these samples.
The MC generated samples are passed through a full simulation [40] of the ATLAS detector and trigger, based on GEANT4 [41]. The Quark Gluon String Precompound (QGSP) model is used for the fragmentation of nuclei, and the Bertini cascade (BERT) model for the description of the interactions of the hadrons in the medium of the nucleus [42].

Detector-level distributions
Detector-level distributions for jet p T , η, mass, √ d 12 , √ d 23 , τ 21 and τ 32 are shown in figures 1-6. The statistical uncertainty represented in ratios is that from Monte Carlo and data added in quadrature. Representative distributions of the substructure variables are shown for the 300-400 GeV bin only. The Monte Carlo is normalised to the data separately in each plot. The properties of these jets are observed to be reasonably well modelled by leading-order parton-shower Monte Carlo. There are approximately four times fewer split and filtered jets (e.g. figure 3) because many jets fail the splitting criteria described above.

Systematic uncertainties
The modelling of the calorimeter response is the biggest systematic uncertainty for this analysis. The key issue therefore is to validate the Monte Carlo-based jet calibration described in section 4. As the results here use jet algorithms with larger R-parameters, the ATLAS jet energy scale uncertainty [25] for anti-k t R = 0.4 and 0.6 jets cannot be applied. The primary systematic uncertainties considered in the present study are those relating to scales and resolutions, such as jet p T scale (JES) and jet p T resolution (JER). For each substructure variable, the scale and resolution of the variable itself are also considered, for example the jet mass scale (JMS) and jet mass resolution (JMR). The scale uncertainties are primarily constrained by in-situ validation using track-jets. The inner detector and calorimeter have largely uncorrelated systematic effects, therefore comparison of variables such as jet mass and energy between the two sub-detectors allows for some separation of physics and detector effects. This technique is limited to a precision of around 3-5% by systematic uncertainties arising from the inner-detector tracking efficiency and confidence in Monte Carlo modelling of the relative behaviour of the charged and neutral components of jets.
Jets composed from tracks are matched to calorimeter-jets if they are within δR < 0.3 of each other. The split and filtered calorimeter-jets are matched to Cambridge-Aachen R = 1.2 track-jets. Ratios are defined between track- and calorimeter-jets for each variable X, where X can be p T , mass or any of the substructure variables:
$$r^{X} = \frac{X_{\text{calorimeter-jet}}}{X_{\text{track-jet}}}.$$
Example distributions of some of the ratio variables are shown in figure 7. It can be seen that the ratios are in broad agreement between data and Monte Carlo. To quantify the level of agreement, double ratios are defined:
$$\rho_{X} = \frac{r^{X}_{\text{data}}}{r^{X}_{\text{MC}}},$$
where again, X can be p T , mass or any of the substructure variables. The distributions of the variables X calorimeter−jet themselves are not necessarily expected to be correctly modelled by Monte Carlo. However, if the simulation correctly models the effect of the detector on these variables, the double ratios ρ X are expected to be consistent with unity. Figure 7 also shows below each plot the corresponding double ratio. In order to account for possible uncertainties due to different fragmentation and hadronisation models, these double ratios are also calculated with a variety of Monte Carlo programs.
Final scale uncertainties are determined by adding in quadrature the estimated uncertainty on the inner-detector measurement with the deviation from unity observed in the double ratios. The resulting scale uncertainties on p T , mass and substructure variables are in the range 3-6%. The highest p T bins contain fewer events and therefore suffer from statistical fluctuations when calculating the double ratio deviation. These scale uncertainties tend to dominate the systematic uncertainties on the final measurements.
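The arithmetic of this scale-uncertainty estimate can be sketched as follows. The sketch assumes that the ratio of calorimeter-jet to track-jet values is summarised by its mean over matched jets, that the double ratio is the data mean divided by the Monte Carlo mean, and that the final uncertainty is the quadrature sum of the tracking uncertainty and the deviation of the double ratio from unity. The function names and the use of a simple mean are assumptions of this example.

```python
import math

def mean_ratio(calo_values, track_values):
    """Mean of calorimeter-jet / track-jet over matched jet pairs."""
    ratios = [c / t for c, t in zip(calo_values, track_values)]
    return sum(ratios) / len(ratios)

def double_ratio(calo_data, track_data, calo_mc, track_mc):
    """Ratio of the data and Monte Carlo mean ratios; unity means perfect modelling."""
    return mean_ratio(calo_data, track_data) / mean_ratio(calo_mc, track_mc)

def scale_uncertainty(rho, tracking_uncertainty):
    """Quadrature sum of the tracking uncertainty and |rho - 1|."""
    return math.hypot(rho - 1.0, tracking_uncertainty)
```

For instance, a double ratio of 1.04 combined with a 3% tracking uncertainty gives a 5% scale uncertainty, within the 3-6% range quoted above.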
As an additional cross-check, Monte Carlo-based tests are used to determine the dependence of the detector response on a number of different variables. These include samples produced with modified detector geometry, different GEANT hadronic physics models and different Monte Carlo generators. These tests indicate variations of a similar order of magnitude to those observed in the in-situ studies. The in-situ track-jet study is limited by inner-detector acceptance and only extends as far as |η| < 1.0, which corresponds to 75% of the jets in the measured distributions. However, the Monte Carlo-based tests also indicate no strong η-dependence from any of the different possible types of mismodelling examined. Based on this, the systematic uncertainty is applied to the entire sample.
In-situ tests of the JER [43] for anti-k t jets with R = 0.4 and 0.6 indicate that the jet p T resolution predicted by simulation is in good agreement with that observed in the data. Here, the resolution uncertainties are taken from the Monte Carlo tests described above only, primarily because the mass and substructure variable resolutions are difficult to validate in-situ with this dataset. From studying the variations in resolution created by varying the detector geometry, GEANT hadronic physics model and Monte Carlo generator, resolution uncertainties of around 20% are conservatively estimated, except for τ 21 and τ 32 where they are around 10%.

Data correction
To compare the measurements directly to theoretical predictions the final distributions in this study are corrected for detector resolution and acceptance effects. The procedure here is a matrix-based unfolding technique called Iterative Dynamically Stabilised (IDS) unfolding [44,45].
In this procedure truth jets and reconstructed jets in Monte Carlo simulated events are matched using the criterion δR < 0.2, which leads to a match for > 99% of jets. Matched pairs of jets are used to construct a transfer matrix corresponding to the effect of the detector. A true jet can be matched with a reconstructed jet that fails the p T cut and vice-versa. As such, the efficiency for matching a true jet to a reconstructed jet in the same p T bin is recorded as a function of the variable of interest. The reverse quantity is also defined for reconstructed jets. The data are then scaled by the reconstructed matching efficiency, multiplied by the transfer matrix and finally divided by the truth matching efficiency. There is also an iterative optimisation step, where the rows of the matrix are scaled to match the corrected result. Pythia is used to provide the central value. Each p T bin is unfolded independently. The systematic uncertainty is assessed by repeating the procedure using Sherpa samples.
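The non-iterative core of this correction can be sketched as below. This is not an implementation of IDS unfolding: it omits the iterative re-scaling and regularisation and performs only one pass, in which the data are scaled by the reconstructed matching efficiency, mapped to truth bins using the Monte Carlo transfer matrix normalised per reconstructed bin, and divided by the truth matching efficiency. The matrix orientation and the function name are assumptions of this example.

```python
def unfold_once(data, transfer, eff_reco, eff_truth):
    """Single-pass matrix correction of a binned distribution.

    data[r]        : measured entries in reconstructed bin r
    transfer[t][r] : matched MC jets with truth bin t and reconstructed bin r
    eff_reco[r]    : fraction of reconstructed MC jets in bin r with a truth match
    eff_truth[t]   : fraction of truth MC jets in bin t with a reconstructed match
    """
    n_truth, n_reco = len(transfer), len(data)
    column_sums = [sum(transfer[t][r] for t in range(n_truth)) for r in range(n_reco)]
    unfolded = []
    for t in range(n_truth):
        total = 0.0
        for r in range(n_reco):
            if column_sums[r] > 0:
                # Probability that a matched jet reconstructed in bin r came from truth bin t.
                total += transfer[t][r] / column_sums[r] * data[r] * eff_reco[r]
        unfolded.append(total / eff_truth[t])
    return unfolded
```

A useful closure check is that feeding the Monte Carlo reconstructed distribution through the function returns the Monte Carlo truth distribution; the iterative step in the full method then reduces the dependence of the result on the shape assumed in the simulation.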

Results
Using the analysis techniques outlined above, measured not include the data statistical uncertainty. Although in some cases the Monte Carlo predictions are not in agreement with the data, the shapes of the distributions are correctly reproduced. For jet mass the distributions produced by Pythia tend to be too soft, while those from Herwig++ are too hard. Notably, the Cambridge-Aachen jet mass after splitting and filtering, as shown in figure 9, is the only variable for which the Monte Carlo predictions are in agreement to within statistical uncertainties, both with each other and the data. The substructure variables exhibit generally better agreement with Monte Carlo predictions than mass, with all but a few bins correctly described by both Pythia and Herwig++. In the higher p T bins statistical fluctuations begin to limit the precision of the measurements, but the level of agreement in all variables appears to remain approximately constant between p T bins.
The unfolding technique used introduces correlations between the bins. The statistical uncertainty in these results represents the diagonal element of the covariance matrix only; therefore, comparison to alternative predictions requires use of the full covariance matrices. These matrices are available, along with the full results presented here, in HepData [46].

Mean mass with multiple proton-proton interactions
The results presented so far have been for events containing only one pp interaction; however even in this early period of running, the data contain events with multiple simultaneous pp interactions (pile-up) [47]. These additional collisions are uncorrelated with the hard-scattering process that typically triggers the event. They therefore present a background of soft, diffuse radiation that offsets the energy measurement of jets and will impact jet-shape and substructure measurements. It is essential that future studies involving jet-substructure variables, such as those investigated here, be able to understand and correct for the effects of pile-up. Methods to mitigate these effects will be essential for jet multiplicity and energy scale measurements. Substructure observables are expected to be especially sensitive to pile-up [8]. This is true in particular for the invariant mass of large-size jets. Techniques such as the splitting and filtering procedure used in this study reduce the effective area of large jets and are therefore expected to reduce sensitivity to pile-up.
The sensitivity of mean jet mass to pile-up is tested in this dataset. The correlation of the mean jet mass of anti-k t jets with the number of reconstructed primary vertices is presented in figure 17 (left). All jets with a p T of at least 300 GeV in the rapidity range |y| in good agreement with the ratio of the third power of the jet R-parameter. This is in agreement with predictions of scaling of the mean mass [48,49]. This behaviour can also be qualitatively explained by two factors. Firstly, the jet area in the y − φ plane grows roughly as R 2 . Secondly, the contribution of these particles to the jet mass scales with the distance between them approximately as R/2, giving another power of R.
Figure 17 (right) shows the dependence on N PV of the mean jet mass before and after the splitting and filtering procedure for Cambridge-Aachen jets. Since the angular requirement R jj > 0.3 is imposed, the splitting steps of this procedure naturally select more massive jets. Since the splitting procedure selects a kinematically biased subset of jets, a third line shows the mean mass prior to filtering of jets that pass the splitting. The filtering step significantly reduces the impact of pile-up on mean jet mass. In fact, the slope of the straight line fitted to the filtered jet data points is statistically consistent with zero.
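The two ingredients of this comparison, a straight-line fit of mean jet mass against N PV and the expectation that the slope scales with the third power of the R-parameter, can be sketched as follows. The unweighted least-squares fit and the function names are assumptions of this example; the measurement itself would use the uncertainties on each point.

```python
def fit_line(xs, ys):
    """Unweighted least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def expected_slope_ratio(r_a, r_b):
    """Expected ratio of pile-up slopes for R-parameters r_a and r_b under R^3 scaling."""
    return (r_a / r_b) ** 3
```

Under this scaling, doubling the R-parameter increases the expected pile-up slope by a factor of eight, which illustrates why large-R jet masses are especially sensitive to additional interactions.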
Altogether, this demonstrates that the pile-up dependence of mean jet mass in real LHC conditions matches expectations. Additionally, jet substructure techniques that reduce the area of jets are promising for suppressing the effects of pile-up.
Figure 14. Normalised cross-sections as functions of τ 32 of Cambridge-Aachen jets with R = 1.2 in four different p T bins.

Conclusions
Jet mass and several jet substructure variables have been measured. This is the first particle-level measurement of these variables at the LHC and in many cases the first at any experiment. There is broad agreement between data and leading-order parton-shower Monte Carlo predictions from Pythia and Herwig++, although there is some scope to improve this. Jet mass has generally been found to exhibit the largest disagreements with Monte Carlo simulations. However, in contrast to this, the masses of jets after the Cambridge-Aachen splitting and filtering procedure display good agreement both with and between Monte Carlo simulations. The substructure variables √ d 12 , √ d 23 , τ 21 and τ 32 are all reasonably well reproduced by Monte Carlo predictions. Additionally, the effects of pile-up on mean jet mass have been found to match phenomenological expectations for R-parameter dependence. Splitting and filtering has also been found to reduce the impact of pile-up significantly.
Generally these results show that jet mass and substructure quantities can be successfully reproduced by leading-order parton-shower Monte Carlo. This result bodes well for future analyses aiming to make use of jet substructure techniques.

[26] ATLAS collaboration, Calorimeter clustering algorithms: description and performance, ATL-LARG-PUB-2008-002 (2008).