Jet Substructure Without Trees

We present an alternative approach to identifying and characterizing jet substructure. An angular correlation function is introduced that can be used to extract angular and mass scales within a jet without reference to a clustering algorithm. This procedure gives rise to a number of useful jet observables. As an application, we construct a top quark tagging algorithm that is competitive with existing methods.


Introduction
In preparation for the LHC, the past several years have seen extensive work on various aspects of collider searches. With the excellent resolution of the ATLAS and CMS detectors as a catalyst, one area that has undergone significant development is jet substructure physics. The use of jet substructure techniques, which probe the fine-grained details of how energy is distributed in jets, has two broad goals. First, measuring more than just the bulk properties of jets allows for additional probes of QCD. For example, jet substructure measurements can be compared against precision perturbative QCD calculations or used to tune Monte Carlo event generators. Second, jet substructure allows for additional handles in event discrimination. These handles could play an important role at the LHC in discriminating between signal and background events in a wide variety of particle searches. For example, Monte Carlo studies indicate that jet substructure techniques allow for efficient reconstruction of boosted heavy objects such as the W ± and Z 0 gauge bosons [1][2][3][4], the top quark [5][6][7][8][9][10], and the Higgs boson [11][12][13][14][15][16].
At least two broad classes of jet substructure techniques have been developed. The first class employs jet shape observables to probe energy distribution in jets. The second class makes use of the clustering tree of a jet as constructed by the Cambridge-Aachen (CA) [17] or k T [18] sequential jet clustering algorithms to identify and characterize subjets within the jet.
Jet shape observables offer a measure of how energy is distributed within a jet. The energy distribution of a jet is determined by a variety of factors, including heavy particle decays, color flow, and the dynamics of the parton shower. Different jet shape observables have been constructed to quantify these [19][20][21][22][23][24] and other aspects of jet substructure. Infrared and collinear (IRC) safe observables can in principle be computed in perturbation theory or modeled with Monte Carlo simulations and then compared to experimental results. Combining different jet shape observables has been shown to provide for effective discrimination in a variety of different scenarios (see e.g. [25]). A disadvantage of jet shape observables is that, because they can only be computed once the constituents of the jet have been defined, they cannot be used to determine how to most effectively select jets within a given event. In particular a jet shape observable is only as good as the choice of particles that define the jet. As a result jet shape observables do not offer a way of selectively removing likely contamination from underlying event or pile-up † .
The CA and k T sequential jet algorithms are defined by metrics d ij that have been chosen with the goal of constructing clustering trees that closely approximate the perturbative QCD parton shower. The first few branches of the clustering tree can be used to decompose a jet into subjets. This unclustering procedure has seen a wide variety of phenomenological applications, especially in the context of tagging jets that result from boosted heavy particle decays, e.g. filtering in boosted Higgs searches [11]. A closely related procedure, referred to as pruning [27], vetoes on QCD-like branches with the goal of sharpening jet mass resolution. This family of procedures offers a number of tunable parameters, allowing the user to control how much and what kind of substructure is identified. A disadvantage of these procedures is that, in order for them to be most effective, the clustering tree must accurately reconstruct the parton shower history of the jet. In practice the CA and k T algorithms reconstruct the most probable shower history, which need not coincide with the actual shower history. In addition, the parameters which define the unclustering typically impose a hard line between QCD-like behavior and non-QCD-like behavior that can fail to accommodate jets that deviate too much from "most probable" jets.
The goal of this paper is to explore an alternative procedure for identifying and characterizing substructure within jets. The discussion is organized as follows. In Section 2, we introduce the "angular correlation function" G(R) and discuss how structure in G(R) can be used to construct IRC safe jet observables. In particular we use G(R) to extract angular scales R * and mass scales m * directly from the constituents of a jet without use of a clustering tree. These angular and mass scales correspond to the angular separations and invariant masses of pairs of hard substructure in the jet. In Section 3, we present an application of these ideas to the tagging of boosted top quarks. We find that the resulting top tagging algorithm is competitive with other methods in the literature. Given the straightforward approach we take in applying G(R) to top tagging, this good performance 'out of the box' is encouraging. In Section 4 we discuss other possible applications of the methods introduced in this paper.

Angular Correlation Function
To characterize substructure in a jet J we define the angular correlation function G(R) as

$$G(R) \equiv \frac{\sum_{i<j} p_{Ti}\, p_{Tj}\, \Delta R_{ij}^{2}\, \Theta(R - \Delta R_{ij})}{\sum_{i<j} p_{Ti}\, p_{Tj}\, \Delta R_{ij}^{2}} \tag{1}$$

where the sum runs over all pairs of constituents of J and Θ(x) is the Heaviside step function. Here $p_{Ti}$ is the transverse momentum of constituent i, and $\Delta R_{ij}$ is the Euclidean distance between i and j in the pseudorapidity (η) and azimuthal angle (φ) plane:

$$\Delta R_{ij}^{2} = (\eta_i - \eta_j)^{2} + (\phi_i - \phi_j)^{2}$$

On the right-hand side of Eq. (1) the dependence on transverse momenta is fixed by collinear safety. Provided that $\Delta R_{ij}$ is raised to a positive power, the entire expression is IRC safe. We choose $\Delta R_{ij}^{2}$ in Eq. (1) so that G(R) has a clear physical interpretation: G(R) is the (fractional) mass contribution from constituents separated by an angular distance of R or less. An important point here is that R does not mark the distance with respect to any fixed center.
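As an illustration of the definition above, the following is a minimal sketch of how G(R) could be evaluated for a list of constituents. The function names, the `(pT, eta, phi)` tuple layout, the wrapping of the azimuthal difference, and the toy jet are assumptions made for this example and are not taken from the analysis code used in the paper.

```python
import math

def delta_r(a, b):
    """Distance in the (eta, phi) plane; the phi difference is wrapped into [-pi, pi)."""
    deta = a[1] - b[1]
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def angular_correlation(constituents, R):
    """G(R) for a list of (pT, eta, phi) tuples, summing over pairs i < j."""
    num = 0.0
    den = 0.0
    for i in range(len(constituents)):
        for j in range(i + 1, len(constituents)):
            a, b = constituents[i], constituents[j]
            dr = delta_r(a, b)
            w = a[0] * b[0] * dr * dr   # pT_i * pT_j * dR_ij^2
            den += w
            if dr <= R:                 # step function Theta(R - dR_ij)
                num += w
    return num / den if den > 0.0 else 0.0

# Toy jet with three hard constituents (illustrative values only).
toy_jet = [(200.0, 0.0, 0.0), (150.0, 0.3, 0.0), (100.0, 0.0, 0.6)]
for R in (0.1, 0.35, 0.65, 1.0):
    print(R, angular_correlation(toy_jet, R))
```

For this toy jet G(R) rises in three steps, one at each pairwise separation, which is the cliff structure discussed in the text.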
For a jet with no substructure, G(R) is featureless. In contrast, if a jet has significant substructure at an angular scale R = R * , G(R) exhibits a discontinuous cliff at R = R * , see Fig. 1. Such a cliff corresponds to two or more hard subjets separated by a distance R * from one another, with the cliff height determined by the invariant mass of the subjets. Notice that these cliffs are closely related to mass drops as exploited in a variety of jet substructure studies [8][9][10][11][12]. We expect that a typical QCD jet will have an angular correlation function that is more or less smoothly varying without any sharp cliffs, while for a jet with significant substructure G(R) will have one or more sharp cliffs at angular scales R = R * corresponding to distinct separations between hard subjets in the jet. This suggests several jet observables that can be defined from G(R). Given a procedure for finding cliffs in G(R), we can consider: (i) the total number of cliffs; (ii) the angular scales R = R * at which cliffs are found; and (iii) the cliff heights at each R = R * . We will see that, once suitably defined, each of the resulting observables proves useful in characterizing substructure within jets.
In effect, G(R) defines a continuous family of jet shape observables. Each G(R 0 ) for a given R 0 differs from most jet shape observables in that: (i) it does not contain any preferred or reference four-vectors (e.g. the energy center of the jet); and (ii) it involves a sum over two-particle correlations. For example, the radial jet energy profile ψ(R) as in [28,29] quantifies the fraction of a jet's energy that is contained within an angular distance R of the center of the jet. Although ψ(R) for a top jet will exhibit discontinuous cliffs at particular angular scales, these scales are not useful for characterizing the substructure of the jet. This is because the resulting angular scales, which are defined with respect to the jet center, cannot be used to reconstruct the separations between the three top subjets. In addition, the invariant masses of pairs of subjets are not accessible from ψ(R). The angular correlation function G(R) is closer in spirit to factorial moments as in [30], which were introduced to quantify scaling behavior in multi-particle production.
In order for the observables derived from G(R) to be useful, care must be taken in defining them. We find that, instead of directly finding cliffs in G(R), it is preferable to find peaks in a suitably chosen derivative of G(R). In particular, because we are interested in ratios of mass scales, we should look for structure in log G(R) ‡ . Because QCD is approximately scale invariant, structure in log G(R) should be identified by calculating derivatives with respect to log R. Since d/d log R = R d/dR, this choice ensures that noise in log G(R) at small R does not result in extraneous peaks. This suggests that the quantity of interest is d log G(R)/d log R. A concern with d log G(R)/d log R is that the derivative produces a delta function $\delta(R - \Delta R_{ij})$; as a consequence, d log G(R)/d log R defines a noisy function of R. Therefore, to identify structure in log G(R) we define an "angular structure function" ∆G(R) by replacing the delta function in d log G(R)/d log R with a smooth kernel K(x):

$$\Delta G(R) \equiv R\, \frac{\sum_{i<j} p_{Ti}\, p_{Tj}\, \Delta R_{ij}^{2}\, K(R - \Delta R_{ij})}{\sum_{i<j} p_{Ti}\, p_{Tj}\, \Delta R_{ij}^{2}\, \Theta(R - \Delta R_{ij})} \tag{2}$$

In the following we choose a Gaussian $K(x) = e^{-x^{2}/dR^{2}} / \sqrt{\pi\, dR^{2}}$ with dR = 0.06. We find that this choice reduces noise substantially. This value of dR was selected after scanning a range dR ∈ [0.02, 0.12] and choosing dR to maximize the performance of the top tagging algorithm presented in Sec. 3.

‡ The normalization in G(R) has been chosen with this logarithm in mind: G(R) increases monotonically from 0 to 1 as R increases from R = 0 to R = max $\Delta R_{ij}$.

Figure 3: An illustration of how prominence requirements, by selecting peaks that stand out above background noise, prevent angular scales from being double-counted.
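The smoothing step can be sketched as follows. This is a minimal illustration, assuming the pair weights $p_{Ti}\, p_{Tj}\, \Delta R_{ij}^{2}$ and separations $\Delta R_{ij}$ have already been computed; the function name, the input layout, and the choice to return zero where the denominator vanishes are assumptions of this example.

```python
import math

def angular_structure(pairs, R, dR=0.06):
    """Delta G(R) from a list of (weight, delta_R) entries, one per pair i < j,
    where weight = pT_i * pT_j * delta_R_ij**2."""
    norm = math.sqrt(math.pi) * dR
    # Numerator: the delta function is replaced by a Gaussian kernel of width dR.
    num = sum(w * math.exp(-((R - d) / dR) ** 2) / norm for w, d in pairs)
    # Denominator: unsmoothed sum over pairs with delta_R <= R.
    den = sum(w for w, d in pairs if d <= R)
    return R * num / den if den > 0.0 else 0.0

# A single pair separated by 0.5 gives a single peak centred at R = 0.5.
single_pair = [(1.0, 0.5)]
print(angular_structure(single_pair, 0.5))
```

In practice one would evaluate this function on a grid of R values and pass the resulting curve to a peak finder.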
To identify angular scales R = R * in the jet that correspond to distinct hard substructure in the event, it is important to find peaks in ∆G(R) in a way that is robust against noise. § For this purpose we borrow a concept from geography called (topographic) prominence [31]. The prominence of the highest peak is defined as its height. In the mountaineering analogy, the prominence of any lower peak P is defined as the minimum vertical descent that is required in descending from P before ascending a higher, neighboring peak P′, where P′ can lie to either side of P. Fig. 2(b) illustrates this concept for two different peaks. In Fig. 3 we illustrate how using prominence instead of height to identify physical peaks can eliminate extraneous peaks that are artifacts of the detector's finite angular resolution. The pictured jet has two distinct hard subjets separated by a single angular scale ∆R. Since one of the subjets has its energy deposited in two neighboring calorimeter cells, the angular structure function ∆G(R) exhibits two distinct peaks in the neighborhood of R = ∆R. Only one of the two peaks has a large prominence, and so using prominence to select peaks in ∆G(R) ensures that only a single angular scale near R = ∆R is identified.
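The prominence definition given above can be turned into a short routine for a sampled curve. The sketch below follows the wording of the text (the highest peak gets its height; a lower peak gets the smallest descent needed to reach higher ground on either side). The function name, the treatment of plateaus, and the sample values are assumptions of this example, and library peak finders may use different conventions at the edges of the sampled range.

```python
def peak_prominences(y):
    """Return {index: prominence} for each local maximum of the sampled curve y."""
    out = {}
    for i in range(1, len(y) - 1):
        if not (y[i - 1] < y[i] >= y[i + 1]):
            continue  # not a local maximum
        descents = []
        for step in (-1, 1):
            low = y[i]
            j = i + step
            # Walk away from the peak until higher ground or the edge is reached,
            # remembering the lowest point passed on the way.
            while 0 <= j < len(y) and y[j] <= y[i]:
                low = min(low, y[j])
                j += step
            if 0 <= j < len(y):
                descents.append(y[i] - low)
        # No higher ground on either side: this is the highest peak.
        out[i] = min(descents) if descents else y[i]
    return out

# Two nearby peaks (5 and 4.5) mimic a subjet split across neighboring cells.
curve = [0, 5, 4, 4.5, 1, 8, 0, 2, 0]
prom = peak_prominences(curve)
h0 = 3.0
prominent = sorted(i for i, p in prom.items() if p > h0)
print(prom, prominent)
```

Of the two neighboring peaks only the taller one survives the prominence requirement, which is the behavior illustrated in Fig. 3.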
In the following we will identify a peak in ∆G(R) by demanding that its prominence exceeds a minimum value $h_0$. So far we have described how to define two different jet observables from prominent peaks in ∆G(R). The first is $n_p$, the number of prominent peaks in ∆G(R). The second is the set of angular scales $R_{i*}$ at which prominent peaks are located. It remains to define a jet observable that corresponds to cliff heights in G(R). The magnitude of a cliff's height in G(R) will map onto the height of the corresponding peak in ∆G(R). This height is determined by the invariant mass of (typically) two hard subjets separated by an angular distance $R = R_{i*}$. For each prominent peak in ∆G(R) with height $\Delta G(R_{i*})$ we define the partial mass $m(R_{i*}) \equiv m_{i*}$ as

$$m_{i*}^{2} \equiv \frac{\sqrt{\pi}\, dR}{R_{i*}}\, \Delta G(R_{i*})\, G(R_{i*}) \sum_{j<k} p_{Tj}\, p_{Tk}\, \Delta R_{jk}^{2} = \sqrt{\pi}\, dR \sum_{j<k} p_{Tj}\, p_{Tk}\, \Delta R_{jk}^{2}\, K(R_{i*} - \Delta R_{jk}) \tag{3}$$

where we have used Eq. 2 to extract the (appropriately normalized) numerator of the angular structure function. Here

$$\sum_{j<k} p_{Tj}\, p_{Tk}\, \Delta R_{jk}^{2}$$

is the denominator of G(R) in Eq. 1 and is approximately equal to the squared jet mass $m_J^{2}$. To see the physics that is encoded in the partial mass consider a jet with two infinitely narrow, hard subjets separated by an angular distance ∆R and with transverse momenta $p_{T1}$ and $p_{T2}$. This jet will exhibit a single prominent peak in ∆G(R) at R = ∆R. The corresponding partial mass $m_*$ will be given by $m_*^{2} = p_{T1}\, p_{T2}\, \Delta R^{2} \approx 2 p_1 \cdot p_2$. ¶ Thus the partial mass is a measure of the mass at a particular angular scale. For a jet whose substructure is determined by a heavy particle decay, the partial masses will be fixed by the kinematic constraints of the decay. This observation will be explored further in Sec. 3 in the context of top tagging. Now that we have defined $n_p$, $R_{i*}$, and $m_{i*}$, we can ask how these jet observables characterize the substructure of a jet.
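The two-subjet limit quoted above can be checked numerically. The sketch below is an illustration only: the normalization of `partial_mass` is an assumption, chosen so that a single pair at the peak position returns $p_{T1}\, p_{T2}\, \Delta R^{2}$ as stated in the text, and the function names and subjet values are hypothetical. The result is compared with the exact invariant mass of two massless particles.

```python
import math

def partial_mass(pairs, R_star, dR=0.06):
    """Partial mass at R_star from (weight, delta_R) entries, one per pair,
    with weight = pT_j * pT_k * delta_R_jk**2. The factor sqrt(pi)*dR cancels
    the Gaussian kernel normalization, leaving a bare Gaussian weight."""
    total = sum(w * math.exp(-((R_star - d) / dR) ** 2) for w, d in pairs)
    return math.sqrt(total)

def exact_mass(pt1, pt2, deta, dphi):
    """Invariant mass of two massless particles."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(deta) - math.cos(dphi)))

# Two narrow subjets with pT = 300 and 200 GeV separated by delta_R = 0.5 in eta.
pt1, pt2, sep = 300.0, 200.0, 0.5
m_star = partial_mass([(pt1 * pt2 * sep ** 2, sep)], sep)
m_true = exact_mass(pt1, pt2, sep, 0.0)
print(m_star, m_true)
```

For these values the small-angle estimate and the exact two-body mass agree at the percent level.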
First, for an idealized jet composed of $n_s$ hard, narrow subjets with each pair of subjets separated by distinct angular scales $R_{i*}$, we expect the number of peaks $n_p$ to be given by

$$n_p = n_p^{\max} \equiv \binom{n_s}{2} = \frac{n_s (n_s - 1)}{2}$$

In general this equality becomes an inequality $n_p \le n_p^{\max}$ for jets whose substructure is less clean. For example, if some of the $n_s$ subjets are wide or if some of the angular separations are approximately degenerate, then ∆G(R) may exhibit fewer than $n_p^{\max}$ prominent peaks. When a prominent peak is resolvable, however, the resulting angular scale $R_{i*}$ corresponds to an angular separation between two or more hard substructures in the jet. For a QCD jet, the distribution of prominent peaks should be roughly uniform in R, since QCD is approximately scale invariant. For a jet that is initiated by a heavy particle decay, the angular scales $R_{i*}$ will be peaked at values characteristic of the decay kinematics of the heavy particle. The corresponding partial masses will be correlated to mass scales intrinsic to the heavy particle decay. In contrast, for QCD jets the partial masses will be peaked at small values, as determined by the soft and collinear singularities of QCD.
Some of the foregoing discussion is illustrated in Figs. 2 and 4. In Fig. 2 we show a boosted top jet with a clean three-pronged substructure. In the p T plot in Fig. 2(a) the distances R i * between the three hardest cells are indicated. From Fig. 2(b) we see that it is these same three angular scales that show up as prominent peaks in the angular structure function ∆G(R). Less prominent peaks correspond to soft-hard correlations in the jet. The substructure of the QCD jet in Fig. 4(a) is quite different, with a single hard core surrounded by soft diffuse radiation. The mass of the jet is largely due to these soft, wide-angle emissions, and the most prominent peak in ∆G(R) corresponds to correlations between the hard core of the jet and one such emission. Prominent peaks in ∆G(R) for this QCD jet are distributed approximately uniformly in R, as expected.
The close correspondence between structure in the p T plots apparent by eye and the structure identified by the angular structure function ∆G(R) is encouraging. To investigate the effectiveness of this procedure more thoroughly will require testing it against a concrete application, where the characteristics of the observables n p , R i * , and m i * can be explored in greater detail. A good testbed will involve jets with complex substructure. For this reason we choose to construct a top tagging algorithm as a first application.

Top tagging
If every top jet had the clean three-pronged structure apparent in Fig. 2(a) then constructing an efficient top tagger would be straightforward. In practice, reconstruction of the top is complicated by a number of factors, including: (i) the finite resolution of the detector, which degrades mass and angular resolution; (ii) collinear radiation, which can make it difficult to resolve subjets initiated by hard partons that are close together; and (iii) the boost from the top rest frame to the lab frame, which can result in decay products that are soft or overlap with one another. As a consequence, many top jets will have fewer than three prominent peaks in their angular structure functions. For example, in Fig. 5 we show an example of a top jet in which the W ± decay products do not exhibit a clean two-pronged structure. As a result ∆G(R) only has a single prominent peak corresponding to mass correlations between the W ± and the b subjet. Constructing a tagger with high signal efficiencies will therefore require considering top jets with fewer than three prominent peaks in their angular structure functions.
This suggests that the following procedure could result in an efficient top tagging algorithm. Fix a minimum prominence h 0 . For each candidate jet, calculate the angular structure function and identify the number of peaks n p with prominences exceeding h 0 . Reject candidate jets with n p = 0 or n p > 3 and sort the rest into bins with n p = 1, 2, 3. Then apply separate sets of cuts to the R i * and m i * in each bin. This procedure has the advantage that candidate jets are being sorted with respect to their observed topologies. For example, top jets in which the decay products of the W ± are merged will be treated differently from top jets that exhibit a clean three-pronged substructure. In each bin cuts will be applied to the observables available from the identified substructure, and the cuts can be separately optimized to reflect the diversity of actual tops. By not requiring candidate jets to have the substructure of an idealized top jet with three distinct prongs, the top tagger can be more accommodating towards "ugly duckling" tops and thus attain higher signal efficiencies .
The outline of this section is as follows. In Sec. 3.1 we discuss distributions of the observables R i * and m i * for top jets and QCD jets. In Sec. 3.2 we present the details of our top tagging algorithm. In Sec. 3.3 we describe the Monte Carlo used to test the top tagger as well as the performance of the algorithm.

Observables
To set the stage for the top tagging algorithm defined in the next section, we first discuss what sort of top jet discrimination is available from the observables R i * and m i * . In Fig. 6 we illustrate distributions for these observables in the n p = 3 bin. For top jets the kinematic constraints of the top decay in conjunction with the boost to the lab frame account for the basic features (see appendix A for details). Identifying the smallest R * , i.e. R 1 * , with the angle between the b subjet and the closer of the W ± subjets, we expect that R 1 * ∼ 0.25 for this 500 GeV ≤ p T ≤ 600 GeV bin. Similarly, identifying R 2 * with the angle between the two W ± subjets and R 3 * with the angle between the b subjet and the further of the W ± subjets, we expect that R 2 * ∼ 0.50 and R 3 * ∼ 0.75. With these identifications for the three peaks, the predictions for the partial masses become m 1 * ∼ 50 GeV, m 2 * ∼ m W , and m 3 * ∼ 140 GeV. These predictions for the R i * and m i * match up well with the distributions in Fig. 6, although in practice the corresponding identifications only hold on the average. Note that the kinematic constraints of the top quark decay imply strong correlations between R i * and m i * for each i. This is illustrated in Fig. 7, where R 2 * has been plotted against m 2 * in the n p = 3 bin. For QCD jets R 2 * and m 2 * are uncorrelated.
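As a rough consistency check on the quoted partial masses (an illustration added here, assuming three massless, narrow subjets and taking $m_W \approx 80.4$ GeV), the squared invariant mass of the three-subjet system is the sum of the three pairwise invariant masses squared, each of which is approximated by a partial mass:

```latex
\begin{align*}
m_t^{2} &= (p_1 + p_2 + p_3)^{2} = \sum_{i<j} 2\, p_i \cdot p_j
        \approx m_{1*}^{2} + m_{2*}^{2} + m_{3*}^{2} \\
        &\approx (50~\mathrm{GeV})^{2} + (80.4~\mathrm{GeV})^{2} + (140~\mathrm{GeV})^{2}
        \approx (169~\mathrm{GeV})^{2}
\end{align*}
```

The result lies within a few percent of the top quark mass, which is as close as can be expected given that the three input values are themselves approximate.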
In contrast to top jets, QCD jets have no intrinsic scales. Since QCD is approximately scale invariant and the derivative in ∆G(R) is with respect to log R, we expect the R * distributions to be approximately uniform. Imposing the ordering R 1 * ≤ R 2 * ≤ R 3 * then has the consequence that the R 1 * distribution should peak at R = 0, the R 2 * distribution should peak at intermediate R, and the R 3 * distribution should peak towards large R. This is consistent with what is seen in Fig. 6, up to edge effects at large R in the R 3 * distribution. The partial masses of QCD jets are peaked towards small m i * , as we expect given that the physics of m i * is qualitatively …

The features of the distributions in the n p = 1, 2 bins are qualitatively similar, see Fig. 9. Here it is less clear what identifications to make for the different peaks, and it is likely that there is a fair amount of mixing between different decay topologies. In any case the observables derived from ∆G(R) in the n p = 1, 2 bins make effective discriminants between top jets and QCD jets, although more discrimination is available in the n p = 3 bin. The distributions for R 1 * and m 1 * in the n p = 1 bin are consistent with correlations between the W ± subjets j W 1 and j W 2 ; one possibility is that for these top jets the b subjet is too soft to yield prominent peaks. The distributions for the n p = 2 bin are consistent with correlations between the b subjet and each of the two W ± subjets; one possibility is that for these top jets the W ± subjets j W 1 and j W 2 are nearly merged so that correlations between j W 1 and j W 2 do not result in any prominent peaks.
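The ordering argument for QCD jets in the n p = 3 bin can be made explicit in an idealized setting. As an assumption made purely for illustration, take the three angular scales to be independent and uniformly distributed on $[0, R_{\max}]$ and write $x = R / R_{\max}$. The densities of the ordered values are then the standard order statistics of three uniform variables:

```latex
\begin{align*}
f_{1}(x) &= 3\,(1 - x)^{2}, & &\text{maximal at } x = 0, \\
f_{2}(x) &= 6\,x\,(1 - x),  & &\text{maximal at } x = 1/2, \\
f_{3}(x) &= 3\,x^{2},       & &\text{maximal at } x = 1 .
\end{align*}
```

These reproduce the qualitative pattern described in the text: the smallest scale piles up at small R, the middle scale at intermediate R, and the largest scale at large R. Real QCD jets will deviate from this idealization, for example through the edge effects mentioned above.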

An algorithm
The distributions in Figs. 6-9 suggest that imposing cuts on m J , R i * , and m i * could lead to effective discrimination between top jets and QCD jets. To test this we employ the following top tagging algorithm. Using the CA algorithm, cluster the event into fat jets with R = 1.5. Although a more advanced version of the tagger could benefit from using variable R (or a filtered jet mass m filt ), we leave the value of R fixed for simplicity. Before applying any cuts, first presort the candidate jets into p T bins of width 100 GeV. Then for each candidate jet calculate ∆G(R) and identify the number of peaks n p whose prominence exceeds a fixed minimum prominence h 0 = 4.0. This value of h 0 has been selected by scanning over a range h 0 ∈ [1.0, 10.0] and choosing h 0 to minimize the background efficiency over a wide range of p T and signal efficiencies. Within each p T bin further sort the candidate jets into three peak bins (n p = 1, 2, 3), throwing out jets with n p = 0 or n p > 3. This n p cut removes a sizable fraction (∼ 15%) of QCD jets, while rejecting only ∼ 3% of top jets, see Fig. 10. For discrimination between top jets and QCD jets to be most effective one would like to disentangle the correlations between the observables as much as possible; for simplicity, however, we choose to make rectangular cuts in the space of observables. In particular, in the n p = 3 bin we choose to impose cuts on six of the seven available observables, excluding m 1 * , which is the least discriminating observable. More specifically, we impose the following cuts:

3. $m_{2*} > m_{2*}^{\min}$, $m_{3*} > m_{3*}^{\min}$

A candidate jet that passes this set of cuts is tagged as a top jet. In the n p = 1, 2 bins we employ the corresponding set of cuts, except in contrast to the n p = 3 bin, we make use of all of the observables. Also, we impose an additional cut m J < m t max in the n p = 1 bin only, since the smaller number of observables in the n p = 1 bin (three) means that imposing this cut does not substantially increase the computer time needed to find optimal cuts. For the moment we leave the values of the cuts unspecified; this will be addressed in the next section.
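The sorting into peak bins followed by rectangular cuts can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout, the use of two-sided windows for every observable, and all numerical cut values are placeholders chosen for the example and are not the optimized values of Table 1.

```python
import math

def observables(m_jet, peaks):
    """Map the jet mass and the R-ordered peaks onto named observables."""
    obs = {"m_J": m_jet}
    for k, (r_star, m_star) in enumerate(sorted(peaks), start=1):
        obs["R%d" % k] = r_star
        obs["m%d" % k] = m_star
    return obs

def tag_top(m_jet, peaks, cuts_by_np):
    """peaks: list of (R_star, m_star) for prominent peaks.
    cuts_by_np: {n_p: {observable name: (low, high)}}."""
    n_p = len(peaks)
    if n_p not in cuts_by_np:          # n_p = 0 or n_p > 3 is rejected
        return False
    obs = observables(m_jet, peaks)
    return all(lo <= obs[name] <= hi
               for name, (lo, hi) in cuts_by_np[n_p].items())

INF = math.inf
example_cuts = {  # placeholder numbers for illustration only
    1: {"m_J": (140.0, 210.0), "R1": (0.2, 1.0), "m1": (60.0, INF)},
    2: {"m_J": (140.0, INF), "R1": (0.1, 0.8), "R2": (0.3, 1.2),
        "m1": (30.0, INF), "m2": (60.0, INF)},
    3: {"m_J": (140.0, INF), "R1": (0.1, 0.5), "R2": (0.3, 0.9),
        "R3": (0.5, 1.3), "m2": (50.0, INF), "m3": (90.0, INF)},  # no cut on m1
}

top_like = [(0.25, 50.0), (0.50, 80.0), (0.75, 140.0)]
qcd_like = [(0.05, 10.0), (0.40, 20.0), (1.20, 30.0)]
print(tag_top(172.0, top_like, example_cuts), tag_top(150.0, qcd_like, example_cuts))
```

Keeping one cut table per peak bin mirrors the idea in the text that each observed topology is treated with its own set of cuts; an upper jet mass cut appears only in the n p = 1 entry.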

Results
We use two different event samples for evaluating the performance of the top tagger. These event samples (from pp collisions with center of mass energy of 7 TeV) belong to a set of benchmark event samples that have been made publicly available by participants of the BOOST 2010 workshop [32]. The first event sample is generated by HERWIG 6.510 [33] with the underlying event simulated by JIMMY [34], which has been configured with a tune used by ATLAS. The second is generated by PYTHIA 6.4 [35] with Q 2 -ordering and the 'DW' tune for the underlying event. See [36] for more details. Unless noted otherwise, all results presented in this paper make use of the HERWIG event samples; the PYTHIA event samples were used as crosschecks. For signal jets we use the hardest jet in each event of a Standard Model hadronic tt sample, excluding jets with |η| > 2.5. For background jets we use the hardest jet in each event of a Standard Model dijet sample, again excluding jets with |η| > 2.5. For both event samples there are O(10 4 ) events in each p T bin of width 100 GeV. For jet clustering we use the CA algorithm [17] with R = 1.5 as implemented by FastJet 2.4.2 [37]. In order to simulate the finite resolution of the ATLAS or CMS calorimeters, particles in each event are clustered into 0.1 × 0.1 cells in (η, φ) and then combined into massless four-vector pseudoparticles that are fed into FastJet. For each p T window the cuts are chosen to yield the smallest background efficiency B at each fixed signal efficiency S . This optimization is performed by a custom Monte Carlo code that finely samples the space of cuts. Some sample values for the different cuts are given in Table 1.
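The mock calorimeter step described above can be sketched as follows. The text does not specify how particles within a cell are combined, so this example assumes that energies are summed and that the resulting massless pseudoparticle points at the cell center; the function name and the `(E, eta, phi)` input layout are likewise assumptions.

```python
import math
from collections import defaultdict

def to_cells(particles, size=0.1):
    """Group (E, eta, phi) particles into size x size cells in (eta, phi) and
    return one massless pseudoparticle (pT, eta, phi) per occupied cell."""
    energy = defaultdict(float)
    for e, eta, phi in particles:
        phi = phi % (2.0 * math.pi)
        key = (math.floor(eta / size), math.floor(phi / size))
        energy[key] += e
    cells = []
    for (i_eta, i_phi), e in energy.items():
        eta_c = (i_eta + 0.5) * size
        phi_c = (i_phi + 0.5) * size
        # For a massless four-vector, pT = E / cosh(eta).
        cells.append((e / math.cosh(eta_c), eta_c, phi_c))
    return cells

# Two particles share a cell; the third lands in the neighboring cell in eta.
cells = sorted(to_cells([(10.0, 0.01, 0.02), (5.0, 0.09, 0.08), (7.0, 0.11, 0.02)]))
print(cells)
```

Since 2π is not an integer multiple of 0.1, the last cell in φ is narrower than the others in this simple scheme; a more careful mock-up would choose a cell width that tiles the full azimuth.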
In Fig. 11(a) and Fig. 11(b) we illustrate the performance of the top tagger. The performance is comparable to other top taggers in the literature [6-8, 27, 38-42], with B ∼ 5% for S = 50% and B ∼ 0.5% for S = 20% [36]. For a fixed signal efficiency, the background efficiency is approximately flat across the p T range we have tested, 200 GeV ≤ p T ≤ 800 GeV. In Table 1 we see that in the n p = 2 and especially n p = 3 bins, where correspondingly more observables are available for discrimination, the top tagger is able to attain large signal efficiencies. Because the net signal and background efficiencies are obtained by combining all three n p bins, the largest contribution to S is actually from the n p = 2 bin, since the plurality of top jets land in the n p = 2 bin for h 0 = 4.0 (see Fig. 10). For example, at S = 50% and for 500 GeV ≤ p T ≤ 600 GeV about 55% of tagged top jets come from the n p = 2 bin, while about 20% and 25% come from the n p = 1 and n p = 3 bins, respectively. Similarly, the background efficiency is lowest in the n p = 1 bin; only QCD jets with two or three prominent peaks do a good job of faking the substructure of a top jet. For example, at S = 50% and for 500 GeV ≤ p T ≤ 600 GeV about 32%, 54%, and 14% of tagged QCD jets come from the n p = 1, n p = 2, and n p = 3 bins, respectively, even though only about 31% of QCD jets fall in the n p = 2 or 3 bins.

Figure 11: … [36] with the results from our tagger added. Here the candidate jets have transverse momenta 500 GeV ≤ p T ≤ 600 GeV. For Fig. (a) only, candidate jets have been clustered with the anti-kT algorithm with R = 1.0, as was done in the BOOST study. As a consequence the performance in (a) is better than in (b), where the large jet radius degrades top mass resolution. In (b) the background efficiency is plotted as a function of p T for signal efficiencies of S = 50% (black), 40% (blue), 30% (green) and 20% (red). Efficiencies at a given p T 0 are calculated from a p T window of 100 GeV centered at p T 0 . Note that, as a consequence, each point is not statistically independent. Error bands are statistical.

Table 1: Sample optimized cut parameters at a (total) signal efficiency of S = 50% for two different p T bins. In the rightmost column we show the signal and background efficiencies obtained within each n p bin taken separately; i.e. these numbers do not take into account what fraction of candidate jets end up in each n p bin. Signal efficiency increases substantially with n p .
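The way the per-bin numbers combine into the net efficiencies can be written out explicitly. The symbols below are introduced here for illustration and do not appear in the text: $f_{n_p}$ is the fraction of all candidate jets of a given type (signal or background) that land in peak bin $n_p$, $\epsilon^{(n_p)}$ is the efficiency of the cuts within that bin (the quantity quoted per bin in Table 1), and $\epsilon$ is the net efficiency (written S or B in the text).

```latex
\begin{align*}
\epsilon &= \sum_{n_p = 1}^{3} f_{n_p}\, \epsilon^{(n_p)},
&
\text{fraction of tagged jets from bin } n_p &= \frac{f_{n_p}\, \epsilon^{(n_p)}}{\epsilon} .
\end{align*}
```

Jets with $n_p = 0$ or $n_p > 3$ are rejected outright and so do not contribute to the sum. A bin can therefore dominate the tagged sample either because its cuts are efficient or because many jets populate it, which is why the n p = 2 bin contributes most to the signal even though the per-bin signal efficiency is highest for n p = 3.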
As a crosscheck in Fig. 12 we compare the performance of the top tagging algorithm between the HERWIG and PYTHIA event samples. We see that the background efficiency is generally lower for HERWIG than it is for PYTHIA. One possible reason for this is that although the cut parameters have been separately optimized for both event generators, the parameters h 0 = 4.0 and dR = 0.06 were optimized on the basis of the HERWIG event samples. The HERWIG and PYTHIA event samples already disagree at the level of the n p distributions, and this disagreement persists in the absence of the underlying event. This means that the typical prominence of peaks in ∆G(R) differs between the two event samples. It would be interesting to understand in detail which features of the two event generators (the parton shower description, the underlying event model, etc.) contribute to this disagreement. Going further in this direction, however, lies outside the scope of this paper.
Given the large number of cut parameters that enter into the top tagging algorithm, overtraining is a concern. By training the cut parameters on a subset A of the event samples and testing the resulting cuts on subsets B i disjoint from A, we can get some idea of how susceptible the quoted efficiencies are to overtraining. We find that the variation in the background efficiency B (at fixed S ) that results from this validation procedure is comparable to the quoted statistical uncertainties. This additional uncertainty should be kept in mind when considering the absolute performance of the top tagger. Since precise estimates for background efficiencies are made difficult by other uncertainties, such as those which enter the modeling of QCD backgrounds or detector mock-up, we do not consider overtraining any further.
Our simple mock calorimeter does not account for a variety of detector effects. Recent studies at the LHC (see e.g. [43]) suggest that Monte Carlo tools provide a fair description of the performance of jet substructure algorithms. Since the algorithm discussed in this paper relies on kinematic observables, we suspect that the performance of the algorithm will not be exceedingly sensitive to detector effects. As a consequence the tagging efficiencies quoted above should be fair estimates of what can be expected in the absence of pile-up. Sensitivity to pile-up requires further study, and a full detector simulation would be required to better understand the expected performance of the top tagging algorithm. Aspects of the algorithm may also be amenable to a sideband analysis. In particular, by looking at regions of the R i * -m i * plane (see Fig. 7) away from the signal region the shape of the background distributions can be extrapolated into the signal region.

Discussion
By sorting jets according to the number of prominent peaks identified in their angular structure functions ∆G(R) and making rectangular cuts on the angular and mass scales R i * and m i * , we have been able to construct an efficient top tagging algorithm. Since the focus of this paper has been to demonstrate that ∆G(R) can be used to identify angular and mass scales in jets, the particular algorithm we have described was chosen for its simplicity. A number of possible improvements to the algorithm suggest themselves, however, even leaving aside modifications that are unrelated to the use of ∆G(R). One possible concern is the large number of cut parameters that result from using three peak bins. Given the strong correlations between the R i * and m i * (see Fig. 7), one way to reduce the total number of free parameters would be to consolidate some of the variables. For example, one could replace separate cuts on R i * and m i * with a single cut on m i * /R i * . One could also investigate different schemes for binning identified peaks in ∆G(R). For example, the expected substructure of a top might be better captured by sorting into bins {n p0 , n p1 }, where bin {n p0 , n p1 } contains n p0 peaks with prominence P ≥ h 0 and n p1 peaks with prominence h 1 ≤ P < h 0 . The definition of the partial mass in Eq. 3, which is most accurate for narrow subjets, could be improved to better capture the invariant mass of wide subjets. The particular way in which we organize the observables R i * and m i * according to their ordering in R as well as the use of topographic prominence to identify peaks could also be revisited. Since ∆G(R) defines a continuous number of observables, this list of possible modifications could go on indefinitely, and it is interesting to ask whether our simple procedure makes efficient use of the information available from G(R). Going further in this direction, however, lies outside the scope of this paper.
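Two of the modifications suggested above, the {n p0 , n p1 } binning and the single cut on m i * /R i * , are simple enough to sketch. The following is only an illustration under stated assumptions: the function names, the value of h 1, and the ratio window are hypothetical, and only h 0 = 4.0 is taken from the text.

```python
def prominence_bin(prominences, h0, h1):
    """Return (n_p0, n_p1): the number of peaks with prominence P >= h0 and
    the number with h1 <= P < h0. Peaks with P < h1 are ignored."""
    if h1 > h0:
        raise ValueError("expected h1 <= h0")
    n_p0 = sum(1 for p in prominences if p >= h0)
    n_p1 = sum(1 for p in prominences if h1 <= p < h0)
    return n_p0, n_p1

def passes_ratio_cut(m_star, r_star, ratio_min, ratio_max):
    """Single cut on m*/R* replacing separate cuts on m* and R*."""
    ratio = m_star / r_star
    return ratio_min <= ratio <= ratio_max

# Hypothetical peak prominences for one jet; h1 = 2.0 is an assumed value.
peaks = [5.2, 4.0, 3.1, 1.0]
jet_bin = prominence_bin(peaks, h0=4.0, h1=2.0)
print(jet_bin)  # (2, 1)

# Hypothetical scales and ratio window, in arbitrary but consistent units.
print(passes_ratio_cut(m_star=80.0, r_star=0.5, ratio_min=100.0, ratio_max=250.0))  # True
```

The ratio cut trades two window parameters per variable pair for one, at the cost of no longer constraining R i * and m i * individually; whether this loses discriminating power would have to be checked against the correlations visible in Fig. 7.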
Although we have explored the use of the angular correlation function G(R) and the angular structure function ∆G(R) for the particular application of top tagging, the generality of the resulting procedure suggests that it could be useful in a variety of different contexts. It seems likely that procedures that make use of ∆G(R) will be most effective when accurate reconstruction of angular scales is valuable. Some interesting possibilities include:
• using observables defined from ∆G(R) to probe QCD; for example, measurements of R * or n p distributions for QCD jets could be compared against Monte Carlo calculations
• using R * distributions to search for new physics (angular bumps instead of mass bumps); this is attractive, since accurate mass reconstruction is difficult
• calculating ∆G(R) for the event as a whole and using the identified angular scales to determine an appropriate jet radius parameter R event-by-event
• using ∆G(R) to access helicity/spin information in jetty cascades
• generalizing G(R) to some kind of n-particle correlation function, which might prove to be useful in the context of n-body decays
• using ∆G(R) to zoom in on the prominent angular scales within a jet and defining some kind of 'angular filtering' procedure to improve mass resolution
• using G(R) to study correlations in the underlying event
By performing what is essentially an 'angular Fourier transform' on the constituents of a jet, ∆G(R) provides a convenient way of accessing angular and mass scales within jets. These angular and mass scales can be used to characterize the substructure of a jet. Further work will be needed to determine the extent to which the ideas explored in this paper can be applied more generally.
Peskin, and Jay Wacker for helpful feedback on the manuscript. We would like to thank Steve Ellis for suggesting the term 'cliffs' where we had previously (and confusedly) had 'ledges.' A.L. thanks Steve Ellis, Matt Strassler and Jon Walsh for an introduction to jets and motivation for studying jet substructure when the field was still in its infancy. This work is supported by the US Department of Energy under contract DE-AC02-76SF00515. M.J. receives partial support from the Stanford Institute for Theoretical Physics and A.L. is also supported by an LHC Theory Initiative Travel Award.