Measuring Boosted Tops in Semi-leptonic $t\bar t$ Events for the Standard Model and Beyond

We present a procedure for tagging boosted semi-leptonic $t\bar t$ events based on the Template Overlap Method. We introduce a new formulation of the template overlap function for leptonically decaying boosted tops and show that it can be used to compensate for the loss of background rejection due to reduction of b-tagging efficiency at high $p_T$. A study of asymmetric top pair production due to higher order effects shows that our approach improves the resolution of the truth level kinematic distributions. We show that the hadronic top overlap is weakly susceptible to pileup up to 50 interactions per bunch crossing, while leptonic overlap remains impervious to pileup to at least 70 interactions. A case study of Randall-Sundrum Kaluza-Klein gluon production suggests that the new formulation of semi-leptonic template overlap can extend the projected exclusion of the LHC $\sqrt{s}$ = 8 TeV run to Kaluza-Klein gluon masses of 2.7 TeV, using the leading order signal cross section.


I. INTRODUCTION
Boosted massive jets are becoming increasingly important at the LHC, both in searches for new physics (NP) and in measurements related to the Standard Model (SM). With the LHC pushing the energy frontier forward, and no physics beyond the SM showing up at O(1 TeV), experimental searches are entering a kinematic regime in which a significant fraction of heavy SM particles are produced at ultra-high p_T. Of particular interest are boosted top quarks, as many models of new physics that address the hierarchy problem predict resonances or top partners with masses of O(1 TeV) and a large decay rate to top quark pairs (see for example Refs. [1][2][3][4][5][6][7]). Boosted top quarks are also significant for measurements of the SM differential cross sections at high transverse momentum or at high invariant tt mass, as well as for precision measurements of the total top production cross section.
Traditional jet reconstruction techniques are inadequate to fully describe the decays of heavy boosted objects at high transverse momentum. Small angular scales in the lab frame, which characterize the decays of massive boosted particles, make it difficult to distinguish them from the background of light parton QCD jets or electro-weak events using only jet mass and p T . Additional information about energy distribution within the jet, commonly referred to as jet substructure, allows for more efficient identification of heavy boosted objects. The leading order (LO) three prong decay structure of a boosted top and the correlations therein can be employed to distinguish top quark jets from, say, light parton QCD jets, which typically have a two prong topology.
A myriad of techniques to reconstruct and identify boosted massive jets have been developed over the past decade (see e.g. Refs. [8][9][10][11][12][13][14][15][16] for reviews), many of which can be grouped into two broad categories. The first category employs jet shape observables to probe the energy flow inside jets. These include the angular correlation functions studied in Ref. [17], as well as the sphericity tensors or planar flow of Refs. [18][19][20].
The second category makes use of the fact that signal events come from decays of genuine massive particles and are thus characterized by spikes of energy which, to leading order, correspond to the massive particle's daughter products. This category can be broken down into methods that incorporate Filtering [21] (see also Refs. [22,23]) and the Template Overlap Method [24]. Filtering algorithms act on the list of jet constituents by removing the soft components based on some measure which defines the "hard" part of the jet. The remaining constituents are then reclustered into the "filtered" jet. The Template Overlap Method, to be further exploited in this paper, does not manipulate the jet constituent list, nor does it require a special clustering algorithm for substructure analysis. Instead, the method compares the jet to a set of parton level states, called templates, built according to a fixed-order distribution of signal jets. The comparison makes use of an "overlap function" which evaluates the level of agreement between each measured jet and a set of templates.
Methods that employ elements from both categories or use other ingredients also exist. Jet dipolarity [25] and N-subjettiness [26,27] are examples of hybrid jet shapes which study jet energy flow with respect to the directions of candidate subjets identified by the above mentioned techniques. More recently, the shower deconstruction method [15,16] has appeared as a variant of the matrix element method [28][29][30], classifying jets with the help of approximations to hard matrix elements and the parton shower.
Here we discuss the performance of the Template Overlap Method (TOM) as a tagger of semi-leptonic tt events.
TOM is a jet substructure tool which aims to match the energy distribution of a jet to the parton level structure of a heavy particle decay, with all the relevant kinematic constraints. Reference [24] showed that TOM is a powerful hadronic top tagger, with rejection power over the QCD background of O(100) being possible when the method is used in conjunction with other jet substructure correlations. A subsequent ATLAS study [31] validated TOM under experimental conditions at √s = 7 TeV in the fully hadronic channel. The results were used to set a useful bound on the Randall-Sundrum (RS) [32] Kaluza-Klein (KK) gluon mass.
References [33,34] also studied TOM in the context of boosted Higgs decays to bb. The results showed that a combination of leading order (two-body) and beyond leading order (three-body) template analysis could significantly improve the signal to background ratio, at a cost to signal efficiency. Reference [34] also demonstrated that TOM is robust against pileup contamination. The study with twenty interactions per bunch crossing showed that pileup has little effect on TOM, with impact on rejection power being a 10% effect, thus reducing the need for pileup correction or subtraction.
In our current work, we consider the task of boosted top tagging in two ways. On one hand, we are interested in quantifying the capacity of TOM to both tag and measure boosted tops. The ability to efficiently tag boosted tops and improve the top sample purity is important both in measurements of the tt differential distributions and in discriminating against the non-top backgrounds in BSM physics searches. On the other hand, we study the ability of TOM to determine whether a tt event as a whole originated from an interesting BSM signal (e.g. an s-channel resonance that decays to tt), or whether it came from the SM. Tagging the boosted tt event as a whole is of great importance in searches for BSM physics, as it provides a discriminant against SM tt events which would otherwise be considered an irreducible background. To that end, we introduce a formulation of TOM for top decays with large missing energy, whereby we derive the "leptonic top overlap," Ov_3^lep, from the standard definition of the peak template overlap. The leptonic overlap function differs from the hadronic top overlap in two ways. First, Ov_3^lep requires keeping track of the identities of the template partons, although identical template sets can be used for both the hadronic and leptonic top analyses. Second, since only the transverse component of the missing energy is available, we define the neutrino overlap function only in the transverse plane.
In addition to extending the TOM algorithm, we address several challenges relevant to jet substructure studies both with the recent √ s = 8 TeV LHC data, as well as the future runs. First, we are interested in the ability of TOM to accurately resolve the kinematic parameters of boosted tops. At high energies, the top pairs are often not produced back to back. Higher order effects become prominent at high p T , with gluon splitting to tt and hard gluon emission becoming non-negligible contributions to the total tt cross section. We show that TOM is able to distinguish back to back tt events from configurations in which the hadronic top does not recoil against the leptonically decaying top. As an illustration, we show that the resulting p T resolution of the top jet improves compared to the ATLAS-d 12 tagger [35]. The ability to reject the "asymmetric" tt events comes with the additional benefit of improving the signal to background ratio in heavy resonance searches, where we expect the fraction of asymmetric top events coming from new physics to be significantly lower than for SM tops.
Next, we study the capability of TOM to tag SM semi-leptonic tt events and reject the relevant backgrounds over a wide range of fat jet p T . For the purpose of measuring the tt system within the SM, our main background channel consists of W + jets events, while multijet QCD background does not contribute significantly after requiring that a "mini-isolated" lepton exists [36]. In the later sections which deal with BSM searches, we also consider the SM tt channel as one of the dominant backgrounds. We consider data from both MadGraph/MadEvent [37] showered with Pythia [38], and Sherpa [39] to illustrate the effects of different showering algorithms and matching procedures on the analysis. For the signal we also provide comparison with the next to leading order (NLO) results from POWHEG [40].
Our analysis shows that hadronic template overlap, Ov had 3 , properly tags about 10 signal jets for every 1 fake event at 60% top tagging efficiency and p T ∼ 500 GeV, with no additional cuts on the jet mass or b-tagging. The ability of Ov had 3 to reject background events slowly decreases with p T , due to the higher order effects becoming more prominent.
Adding Ov lep 3 to our analysis proves to be rewarding, as leptonic overlap's potential to reject background events can compensate for the reduction of b-tagging efficiency at high p T (assuming a tentative b-tagging efficiency of 50%).
Pileup and the underlying event are a significant nuisance for jet substructure observables, and here we extend the study of TOM's susceptibility to pileup contamination. In order to reduce the difficulty of estimating the fat jet p_T in a pileup environment, we introduce a method of selecting template p_T bins based on the scalar sum of the leptonic top decay products and the kinematics of top pair events. In accord with the study in Ref. [34], we show that TOM is only mildly sensitive to pileup up to ∼ 50 interactions per bunch crossing. At higher levels of pileup (i.e. ∼ 70 interactions per bunch crossing), the signal tagging remains unaffected, whereas the increase in the number of fake events becomes important.
A case study on the discovery potential of an RS KK gluon serves to illustrate the performance of semi-leptonic TOM in new physics searches. We analyze the common benchmark KK gluon model, which features a large coupling to tt, in order to typify a resonance search, while an effective theory serves to illustrate the performance of TOM in searches where the signal m tt distribution is characterized by depletion of the tt spectrum from its SM expectation.
We show that a combination of Ov had 3 and Ov lep 3 can improve the analysis of the 20 fb −1 of data collected during the √ s = 8 TeV run, extending the projected limits to KK gluon masses of ≈ 2.7 TeV.
Finally, we discuss technical aspects of TOM relevant for the experimental implementation of the method, both for SM and NP measurements. Subjets of highly boosted tops (p_T ∼ 1 TeV) are characterized by sizable differences between the p_T of the hardest and softest partons (typically greater than a hundred GeV). In order to adequately capture the radiation pattern of all three leading subjets over a wide range of fat jet p_T, while at the same time not affecting the shape of the peak overlap distribution, we vary the template sub-cone radii according to their p_T. The variation is inspired by the jet-shape data [41], and we follow the scaling rule for the sub-cone radii from the boosted Higgs study of Ref. [34]. This results in a stable signal efficiency for a fixed cut on template overlap over a wide range of top jet p_T. A comparison with several values of fixed template sub-cone radii reveals the non-trivial fact that no single fixed radius provides stable signal efficiency for a fixed overlap cut.
We further show that missing energy resolution has little effect on the results of the overlap analysis, as well as demonstrate that TOM is insensitive to the angular resolution of the template momenta, with 50 steps in η, φ and beyond providing adequate template phase space coverage.
We organized the paper in seven sections addressing the above-mentioned novelties and issues. In Section II we define the hadronic top template overlap following the longitudinally boost invariant notation of Ref. [34], as well as introduce the leptonic top template overlap. Section III addresses our data generation and describes the pre-selection cuts we use to define our data sets. In Section IV we address the issues of higher order effects and the ability of TOM to reject asymmetric tt events. In Section V we present our results on the rejection power of TOM for SM tt events over a wide range of fat jet p T values for both hadronic and leptonic overlap. Section VI is dedicated to pileup studies, ranging from 0 − 70 interactions per bunch crossing. Finally, Section VII shows an example study of a search for new physics in a tt channel and illustrates the improvements TOM can provide for the analysis. The technical details of Template Overlap, such as the adequate number of templates, effects of missing energy resolution and template sub-cone scaling can be found in the Appendix.

A. Hadronic Top Template Overlap
Following the notation of Ref. [34], here we consider the definition of hadronic peak template overlap in terms of longitudinally boost invariant quantities:

$Ov_3^{\mathrm{had}} = \max_{\{f\}} \exp\left[ -\sum_{a=1}^{3} \frac{1}{2\sigma_a^2} \left( \epsilon\, p_{T,a} - \sum_{i \in j} p_{T,i}\, F(\hat{n}_i, \hat{n}_a) \right)^2 \right] ,$   (1)

where p_{T,a} is the transverse momentum of the a-th template parton and p_{T,i} is the transverse momentum of the i-th jet constituent. The functional is maximized over f, a set of kinematically allowed decay configurations of the boosted top (templates). The weight σ_a defines the energy resolution of the peak template overlap, which we set to σ_a = p_{T,a}/3, while the coefficient ε = 0.8 serves to compensate for the radiation which falls outside the template sub-cones. We define the kernel F(n̂_i, n̂_a) as a step function,

$F(\hat{n}_i, \hat{n}_a) = \theta\left( r_a - \Delta R(\hat{n}_i, \hat{n}_a) \right) ,$   (2)

where n̂_{i,a} is the position vector of a jet constituent (i) or template parton (a) in the (η, φ) plane, and ΔR is the plain distance in (η, φ) between the i-th jet constituent and the a-th template parton. We determine the size of the template sub-cone, r_a, according to a polynomial fit to the scaling rule of Ref. [34] (see Appendix A 2 for details), in addition to requiring that the template partons be isolated,

$\Delta R(\hat{n}_a, \hat{n}_b) > r_a + r_b ,$   (3)

for any two template partons a and b.
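As an illustration of how the peak overlap is evaluated, the computation can be sketched as follows. This is a minimal sketch under our own naming conventions and data layout; the example template and sub-cone radii are illustrative, whereas in practice the radii follow the scaling rule discussed in the Appendix.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Plain distance in the (eta, phi) plane."""
    dphi = abs(phi1 - phi2)
    dphi = min(dphi, 2.0 * math.pi - dphi)
    return math.hypot(eta1 - eta2, dphi)

def ov3_had(constituents, templates, eps=0.8):
    """Hadronic peak template overlap in the spirit of Eq. (1).

    constituents: list of (pT, eta, phi) for the fat jet constituents.
    templates:    list of templates, each a list of three partons
                  (pT, eta, phi, sub-cone radius r_a).
    Returns the overlap maximized over the template set."""
    best = 0.0
    for template in templates:
        chi2 = 0.0
        for (pt_a, eta_a, phi_a, r_a) in template:
            # Step-function kernel F: collect constituent pT inside the sub-cone
            collected = sum(pt_i for (pt_i, eta_i, phi_i) in constituents
                            if delta_r(eta_i, phi_i, eta_a, phi_a) < r_a)
            sigma_a = pt_a / 3.0  # energy resolution weight, sigma_a = pT_a / 3
            chi2 += (eps * pt_a - collected) ** 2 / (2.0 * sigma_a ** 2)
        best = max(best, math.exp(-chi2))
    return best
```

A jet whose energy deposits match ε times the template parton momenta inside each sub-cone scores an overlap of 1, while a jet with a very different radiation pattern scores close to 0.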

B. Leptonic Top Template Overlap
So far, TOM has only been discussed in the context of fully hadronic decays of massive objects. It is also possible to define template overlap on heavy particle decays with missing energy, such as the leptonically decaying boosted top.¹ The missing information about the longitudinal component of the missing energy makes the "canonical" overlap function definition of Eq. (1) inappropriate for describing a leptonic top decay. We begin instead by defining the leptonic three body overlap function, Ov_3^lep, as a product of the overlap functions for the b jet, the lepton and the neutrino:

$Ov_3^{\mathrm{lep}} = \max_{\{f\}} \exp\left[ -\frac{1}{2\sigma_b^2}\left( \epsilon_b\, k_{T,b} - \sum_{i \in j} p_{T,i}\, F(\hat{n}_i, \hat{n}_b) \right)^2 \right] \exp\left[ -\frac{1}{2\sigma_\ell^2}\left( \epsilon_\ell\, k_{T,\ell} - p_{T,\ell} \right)^2 \right] \exp\left[ -\frac{1}{2\sigma_\nu^2}\left( \epsilon_\nu\, k_{T,\nu} - E_T^{\mathrm{miss}}\, F_\nu \right)^2 \right] .$   (4)

The first exponential in Eq. (4) is the familiar overlap function of Eq. (1) for a single template parton, the second exponential refers to the lepton, while the third exponential is associated with missing energy. We introduce coefficients ε_i to include the effects of energy reconstruction efficiency of the top decay products, as in the case of Ov_3^had. Other than ε_b = 0.8, here we use ε_ℓ = ε_ν = 1. We also find that σ_{b,ℓ,ν} = k_T^{b,ℓ,ν}/3 provides sufficient background rejection, while keeping the signal efficiency comparable to the fully hadronic case.² The optimization of the overlap parameters is relatively straightforward; however, it requires experimental input which is beyond the scope of our current work.
The maximization in Eq. (4) is performed over a full set of templates, in the same fashion as Ov had 3 , and with the same sets of templates.
We keep the kernel function F for the b template the same as in Eq. (1), while we define the neutrino kernel as

$F_\nu = \theta\left( r_\nu - \Delta\phi \right) ,$

where Δφ is the azimuthal distance between the template parton and the total missing transverse energy, and r_ν = 0.2 is the neutrino azimuthal bin size.

¹ Leptonic overlap can be used on both muons and electrons with no loss of generality.
² Notice that since the detector-level corrections to the lepton energy scale are much smaller than the template width chosen above, one can in principle improve upon the above definition by reducing σ_ℓ. However, since we qualitatively expect the template resolution to be controlled by the missing energy resolution, which is much worse than the leptonic one, we leave these potential modifications of the leptonic template and their implications for future work.
The main difference between Ov_3^lep and Ov_3^had is that the leptonic overlap takes into account only the azimuthal component of the missing energy. Since our overlap algorithm requires us to rotate the templates into the fat jet frame on an event by event basis, the absence of the longitudinal component of the missing energy does not allow for a sufficiently good reconstruction of the top axis. We choose instead to rotate the templates so that the second template parton is always aligned with the lepton, while the first template parton is always the neutrino and the third the b-quark.
Anchoring template states to the lepton also eliminates the need for a lepton kernel function. In addition, the fact that leptonic overlap deals with three different species of particles forces us to keep track of the identities of template partons on a template by template basis, a requirement which is absent in the case of the fully hadronic overlap.
Since the identities of the reconstructed objects are matched to the identities of the template partons, we also do not impose the non-overlapping template sub-cone criterion of Eq. (3) for Ov_3^lep.
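The factorized structure of the leptonic overlap described above can be sketched for a single template as follows. This is a sketch under our own naming conventions; the maximization over the full template set, performed as in the hadronic case, is omitted for brevity.

```python
import math

def ov3_lep(bjet_pt_in_cone, template, lepton_pt, met_pt, met_phi,
            eps_b=0.8, eps_l=1.0, eps_nu=1.0, r_nu=0.2):
    """Leptonic top overlap for a single template, in the spirit of Eq. (4).

    template: dict with the template parton momenta k_T for the b quark,
    the lepton and the neutrino, plus the azimuthal angle of the neutrino
    template. The lepton needs no kernel because the template is rotated
    so that its lepton parton coincides with the measured lepton."""
    # b-quark factor: Eq. (1)-style Gaussian on the pT collected in the b sub-cone
    sigma_b = template["kt_b"] / 3.0
    chi_b = (eps_b * template["kt_b"] - bjet_pt_in_cone) ** 2 / (2.0 * sigma_b ** 2)
    # lepton factor: direct comparison of template and measured lepton pT
    sigma_l = template["kt_l"] / 3.0
    chi_l = (eps_l * template["kt_l"] - lepton_pt) ** 2 / (2.0 * sigma_l ** 2)
    # neutrino factor: transverse plane only, azimuthal step-function kernel
    dphi = abs(met_phi - template["phi_nu"])
    dphi = min(dphi, 2.0 * math.pi - dphi)
    f_nu = 1.0 if dphi < r_nu else 0.0
    sigma_nu = template["kt_nu"] / 3.0
    chi_nu = (eps_nu * template["kt_nu"] - met_pt * f_nu) ** 2 / (2.0 * sigma_nu ** 2)
    return math.exp(-(chi_b + chi_l + chi_nu))
```

Note that a misaligned missing energy vector zeroes the neutrino kernel and strongly suppresses the overlap, mirroring the role of the sub-cone kernel in the hadronic case.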

III. EVENT GENERATION AND PRE-SELECTION
Before we begin to discuss the performance of TOM in a semi-realistic experimental setting, we take a moment to define our samples and the kinematic constraints we use in the event pre-selection. Our current analysis focuses on tagging semi-leptonic tt events and rejecting the W + jets background at √s = 8 TeV. The MadGraph-generated samples serve as a benchmark dataset in all sections, with the exception of Section IV where, as mentioned, we use samples generated with the BOX [44] version of POWHEG-hvq [40] in order to capture the NLO effects in top quark pair production more accurately. Our MadGraph and Sherpa samples assume the CTEQ6L1 [45] parton distribution function sets, while for POWHEG we use CTEQ6M. We perform jet clustering using the FastJet [46] implementation of the anti-k_T algorithm [47].
Our event selection begins with the requirement of exactly one mini-isolated lepton, where mini-ISO is the lepton isolation observable of Ref. [48] and p_T^cone is the scalar sum of all the charged tracks with p_T > 1 GeV, including the hard lepton, inside a cone whose radius follows the scaling convention of Ref. [49]. We label this lepton as coming from the leptonically decaying top. Next, we define the hardest anti-k_T r = 0.4 jet within a distance ΔR_{jℓ} < 1.5 from the lepton as the b-jet of the leptonically decaying top. For the purpose of this analysis we define the transverse missing energy vector to be the vector sum of all the neutrino transverse momenta in the event, while we postpone a detailed study of the effects of missing energy resolution to the Appendix. We identify the hardest anti-k_T "fat" jet using three different large effective cone sizes R, defined on an event-by-event basis as a function of h_T, where h_T is the scalar p_T sum of the leptonic top decay products; it serves as an estimator of the top fat jet p_T with a weak susceptibility to pileup contamination. We find that for fat jet p_T > 500 GeV, the p_T of the jet is well correlated with the h_T of the leptonic top. For more details on the criteria for correlating the fat jet parameters with the leptonically decaying top see Appendix A 3, while we present a detailed discussion of the NLO effects on the correlation in Section IV.
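The h_T estimator and the resulting template bin choice amount to a simple bookkeeping step, sketched below. The 50 GeV bin width follows the choice quoted later in the text, while the 300-1000 GeV binning range is an illustrative assumption of ours.

```python
def leptonic_ht(lepton_pt, bjet_pt, met_pt):
    """Scalar pT sum of the leptonic top decay products: a pileup-robust
    estimator of the pT of the opposite (hadronic) top fat jet."""
    return lepton_pt + bjet_pt + met_pt

def template_pt_bin(h_t, bin_width=50.0, pt_min=300.0, pt_max=1000.0):
    """Pick the template pT bin from h_T rather than from the
    (pileup-sensitive) fat jet pT. The binning range is an assumed
    example; only the 50 GeV width is taken from the text."""
    h_t = min(max(h_t, pt_min), pt_max - 1e-9)  # clamp into the binned range
    lo = pt_min + bin_width * int((h_t - pt_min) // bin_width)
    return (lo, lo + bin_width)
```

Selecting the template bin from h_T rather than from the fat jet itself is what makes the procedure weakly susceptible to pileup, since the lepton, the b-jet and the missing energy are far less affected by soft contamination than a large-radius jet.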
Continuing, and including the previous requirements, namely the mini-isolated lepton and ΔR_{jℓ} < 1.5, all events are subject to the following Basic Cuts (BC): where p_T^{j,R} is the transverse momentum of the fat jet with radius R, N_j^out is the number of r = 0.4 jets with ΔR_{jℓ} < 1.5, ℓ refers to the lepton with the selection criteria of Eq. (6) in addition to p_T > 25 GeV, Δφ_{jℓ} is the azimuthal distance between the mini-isolated lepton and the fat jet, and η_{j,ℓ} is the rapidity of the fat jet / mini-isolated lepton.
Here we only consider W +jets as the dominant background to the semileptonic tt events at high transverse momentum. The multijet QCD contribution becomes negligible upon the mini-isolation requirement on the lepton (see Ref. [48] for instance), while the single top cross section is already highly sub-leading compared to tt at pre-selection level [50].
For the overlap analysis in the following sections of this paper (both in the context of hadronically and leptonically decaying tops) we use the TOM implementation of the TemplateTagger code [51].

IV. HIGHER ORDER EFFECTS AND ASYMMETRIC tt EVENTS

The fraction of events in which the top-antitop system is not back to back is not only significant but increases with the H_T of the event (here we define H_T = Σ_j p_T^j, where j runs over all final state particles in the event). This effect poses a challenge for new physics searches at high p_T, due to difficulties in estimating the various tt differential distributions. The pre-selection of the "top candidate" as the hardest fat jet in the event, combined with the selection criteria for the "leptonic top" object, can result in mis-identifying a hard light-quark QCD jet as a top. Moreover, in the context of TOM, the imbalance in the transverse momenta of t and t̄ could lead to an inaccurate estimate of the top jet p_T (based on the h_T of the leptonically decaying top), and thus result in the use of a template p_T bin which does not match the transverse momentum of the hadronically decaying top.
In order to systematically study the NLO effects on the performance of TOM, we first classify the SM top/anti-top events into three different categories [52], depicted in Fig. 1: (i) events in which the top and the anti-top are produced approximately back to back, (ii) events with a hard gluon emission recoiling against the tt pair, and (iii) events in which the tt pair originates from a gluon splitting, so that the two tops are nearly collinear. We quantify the top/anti-top p_T imbalance by the following asymmetry between the vector sum and the scalar sum of the top transverse momenta:

$A_{t\bar t}^{SV} = \frac{\left| \vec{p}_T^{\;t} + \vec{p}_T^{\;\bar t} \right|}{\left| \vec{p}_T^{\;t} \right| + \left| \vec{p}_T^{\;\bar t} \right|} ,$

where p⃗_T^{t,t̄} are the transverse momentum vectors of the top and the anti-top respectively, and we choose to study A_tt^SV at the truth level. The asymmetry vanishes for kinematic configurations in which the di-top system is back to back (i.e. large m_tt), whereas the maximum occurs when the tops are parallel (i.e. m_tt → 2 m_t). Hence, the events belonging to category (i) are characterized by small asymmetry, roughly A_tt^SV ≲ 0.2. It is important to note that the events belonging to classes (ii) and (iii) in SM tt production come both as a blessing and a curse. For instance, if one is interested in measuring the SM top differential p_T distribution, the rejection of asymmetric events due to top-tagging will come at the cost of excluding a portion of the relevant events. Yet, including the asymmetric events in the event sample might lead to mis-identification of the hadronic top and an inaccurate reconstruction of the event. Furthermore, top quark pairs produced in heavy resonance decays are typically symmetric. Hence, rejecting asymmetric events implies that SM tt is no longer an irreducible background, and a further improvement in signal to background can be achieved.
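The asymmetry between the vector and scalar sums of the top transverse momenta can be computed as follows; this is a sketch with our own function name and input layout, consistent with the limiting behavior described above.

```python
import math

def asv_ttbar(pt_t, pt_tbar):
    """Top/anti-top pT imbalance: |vector sum| / (scalar sum) of the two
    top transverse momentum vectors, given as (px, py) tuples.
    Vanishes for back-to-back tops and reaches 1 for parallel tops."""
    vec_sum = math.hypot(pt_t[0] + pt_tbar[0], pt_t[1] + pt_tbar[1])
    scalar_sum = math.hypot(pt_t[0], pt_t[1]) + math.hypot(pt_tbar[0], pt_tbar[1])
    return vec_sum / scalar_sum
```

A back-to-back pair with equal momenta gives 0, a collinear pair (gluon-splitting-like) gives 1, and a back-to-back pair with a pT imbalance interpolates between the two.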
A SM tt sample generated at NLO with POWHEG and showered with Pythia serves as a benchmark for studies of A_tt^SV in the context of TOM. We apply the same pre-selection cuts as in Section III, but with the requirement on the p_T of the fat jet lowered to p_T > 300 GeV. We use template p_T bins of 50 GeV for the overlap analysis, a choice which has little effect on the ability of TOM to tag jets but improves the p_T resolution of the fat jet.

Figure 2 (left panel) shows the truth-level A_tt^SV for a series of H_T bins, with two main features of the SM tt sample evident. First, the peak at A_tt^SV → 0 is mainly due to the LO contribution, while the peak at A_tt^SV → 1 corresponds for the most part to events from class (iii). The main contribution to the region of A_tt^SV in between the two peaks, which spreads over a large range of angles between the top and the anti-top, comes from category (ii) and is not seen as a peak. Second, it is evident that the fraction of asymmetric events increases with the H_T of the event sample, as both the phase space for hard gluon emission and the gluon splitting to a tt pair increase with energy.
How well can TOM distinguish the back to back tt events from the events with large A_tt^SV? The ability of TOM to reject asymmetric events is highly correlated with the ability to reject light parton QCD jets, which we discuss in detail in Section V. For the purpose of comparison, here we also include results using the ATLAS-d_12 tagger [35], which consists of cuts on the trimmed fat jet mass m^trim (see Ref. [22] for more details) and on d_12, the k_T measure at the last step of large-R jet clustering with a k_T algorithm:

$d_{12} = \min\left( p_{T,1},\, p_{T,2} \right) \Delta R_{12} .$

The values p_{T,i} appearing in the last equation are the transverse momenta of the two subjets at the last step of fat jet clustering, and ΔR_12 is the plain distance between them. Boosted top quark decays are characterized by symmetric splittings, d_12 ≈ m_t/2, whereas background QCD jets tend to have much smaller d_12.
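The splitting scale above can be computed directly from the two subjets at the last clustering step; a sketch with our own data layout:

```python
import math

def d12(subjet1, subjet2):
    """kT splitting scale at the last step of large-R jet clustering:
    d12 = min(pT1, pT2) * DeltaR12. Subjets are (pT, eta, phi) tuples."""
    dphi = abs(subjet1[2] - subjet2[2])
    dphi = min(dphi, 2.0 * math.pi - dphi)
    dr12 = math.hypot(subjet1[1] - subjet2[1], dphi)
    return min(subjet1[0], subjet2[0]) * dr12
```

A symmetric, top-like splitting yields d_12 of order m_t/2, while an asymmetric QCD-like splitting, where one subjet is much softer, yields a much smaller value.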
The right panel of Fig. 2 shows the comparison of TOM and d 12 in their ability to reject asymmetric events. The blue points represent the fraction of asymmetric events, A SV tt > 0.2, which remain after applying various cuts on Ov had 3 as a function of the peak template p T . The green triangles show the analogous fraction of asymmetric events after the ATLAS-d 12 tagger, as a function of the trimmed fat-jet p T . Our analysis shows that TOM is able to reject the asymmetric events over a wide range of p T , by a factor of 2 better than the default ATLAS-d 12 tagger.
Higher order effects can have a significant impact on the ability to experimentally resolve the underlying parton level distributions of the top kinematic observables. The issue of resolution is inseparable from the problem of signal purity, as misidentifying a light parton QCD jet as a top will lead to an incorrect estimate of the kinematic properties of the truth level objects. Figure 3 shows the resolution of the truth level top p_T obtained with TOM and with the ATLAS-d_12 tagger, and the distribution of the m_tt resolution parameter displays a similar behavior. Hence, we find that TOM is able to resolve the p_T and m_tt of the truth level tops for events which pass the overlap selection criteria better than the ATLAS-d_12 tagger. This finding is in accord with the right panel of Fig. 2, as TOM is more efficient at rejecting events in which a light jet is pre-selected as the top candidate.
For completeness, Fig. 3 also shows the p_T and m_tt resolution for symmetric events only (right panels). In both cases, we find that the resolution obtained from TOM is comparable to the ATLAS-d_12 tagger, with d_12 slightly overestimating the p_T and the m_tt compared to TOM. Events with large A_tt^SV tend to over-estimate both m_tt and the p_T of the fat jets. It is then reasonable that TOM results in distributions which resolve the truth level kinematic parameters to an improved degree, as TOM is more efficient at rejecting the events with large A_tt^SV.

V. BACKGROUND REJECTION POWER
Previous work in Ref. [24] showed that TOM is able to efficiently reject the QCD background in cases where both the top and the anti-top decay hadronically at p_T ∼ 1 TeV. Tagging boosted tops in events with a hard lepton and missing transverse energy constitutes a separate problem from the fully hadronic decays of tt, due to differences in the background composition. Namely, the dominant background to semi-leptonic decays of tt comes from W +jets, while the multijet contribution is already sub-leading after the lepton mini-isolation.
In this section we focus on the performance of Ov had 3 and Ov lep 3 in rejecting W +jets with no contamination from soft radiation of minimum bias events, and postpone the discussion of effects of pileup and underlying event until Section VI.
To quantify the ability of TOM to tag boosted tops against the W +jets background, we study the signal and background efficiencies

$\epsilon_{S,B} = N_{S,B}(\mathrm{cuts}) / N_{S,B}(\mathrm{BC}) ,$

where "cuts" denotes all selection cuts including overlap, and BC denotes the Basic Cuts of Eq. (10). We then define the background rejection power (RP) relative to the Basic Cuts as

$\mathrm{RP} = 1/\epsilon_B ,$

quoted at a fixed signal efficiency. We do not include an explicit b-tag in our analysis of RP, due to the experimental challenges of b-tagging at high p_T and high luminosity. Instead, we study Ov_3^lep as an alternative, and compare the rejection power obtained from a tentative b-tagging benchmark point to our results using the leptonic top overlap.
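The two observables reduce to simple event counting relative to the Basic Cuts; a sketch with hypothetical event counts:

```python
def efficiency(n_after_cuts, n_after_bc):
    """Efficiency of the full selection relative to the Basic Cuts."""
    return n_after_cuts / n_after_bc

def rejection_power(n_bkg_after_cuts, n_bkg_after_bc):
    """Background rejection power relative to the Basic Cuts: the inverse
    of the background efficiency, quoted at a fixed signal efficiency."""
    return n_bkg_after_bc / n_bkg_after_cuts
```

For example, a working point that keeps 60% of signal events while letting through 10% of the background corresponds to a rejection power of 10.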

A. Rejection Power for Hadronically Decaying Tops at √ s = 8 TeV
We perform the template overlap analysis on hadronically decaying tops according to the prescription of Eq. (1). The peak at Ov_3^had ≈ 0 in the signal distribution deserves some attention. The event pre-selection allows many events in which one of the decay products of the top was not captured by the fat jet cone, as well as the asymmetric events discussed in Sec. IV, to pass the cuts. These events will likely have a low overlap score, due to having the wrong jet mass and/or substructure, resulting in the peak at Ov_3^had ≈ 0 in the signal distribution. A cut on hadronic overlap efficiently removes such events in a systematic manner, without the need for additional customized cuts. As an example, consider mass filtering, an intrinsic feature of TOM. A cut on the hadronic peak overlap efficiently removes the low mass regions in both the signal and the background distributions, as evident in Fig. 5. Implementing a mass cut via a cut on Ov_3^had has a further advantage in a high pileup environment, as TOM is much less susceptible to pileup contamination than the jet mass (see Section VI for more details).
We proceed to discuss the rejection power achievable with TOM at √ s = 8 TeV over a wide range of fat jet p T .
Note that in the following, we assume signal events to be the SM tt events, including the events characterized by a large A_tt^SV. Figure 6 shows the rejection power of Ov_3^had compared to the ATLAS-d_12 tagger. The left panel illustrates the dependence of the rejection power on the fat jet p_T at a fixed signal efficiency of 60%. We find that a rejection power of ∼ 10 is possible at p_T ≈ 500 GeV, while the ability to reject W +jets events diminishes at higher p_T. We have checked that the decrease in rejection power with the increase in jet p_T (dashed blue curve) is almost entirely due to the asymmetric events discussed in Section IV, since the proportion of tt events with large A_tt^SV increases with the H_T of the event. The right panel of Fig. 6 shows more complete information on the ability of TOM and d_12 to reject background events. The curves represent the W +jets fake rate as a function of signal efficiency, while the overlap cut runs along the curves. Each curve is limited to a range of fat jet p_T values. Notice that TOM clearly outperforms d_12 for most efficiencies and the entire considered p_T range, by roughly a factor of two. Table I summarizes the rejection power for several cuts on Ov_3^had.

B. Rejection Power for Leptonically Decaying Tops

The rejection power achievable with Ov_3^lep is lower than in the fully hadronic case. The reason is that the kinematics of the object we construct from a jet, a lepton and missing energy in the W +jets events is, by construction, more similar to a boosted top decay at the pre-selection level. The object Ov_3^lep is trying to distinguish from the leptonically decaying top is typically of higher mass than a light jet, in addition to the missing energy and the lepton already coming from a W decay. The templates, which are designed to tag a W and reconstruct the correct mass of the top quark (among other things), thus have a higher probability of mis-tagging such an object as a top. However, the overlap analysis is extremely efficient in removing the pure QCD background, which is of much higher rate; hence the analysis will result in a better sensitivity and reach.
The leptonic top implementation of TOM is able to reject W +jets events with RP ≈ 2.5 at h T = 500 GeV, with the rejection power increasing to ≈ 4 at higher values of h T , as the pre-selection cuts are sufficient to relieve Ov lep 3 of the higher order effects which plague the fat jet analysis. For completeness, we also summarize the rejection power for several cuts on Ov lep 3 in Table II.

C. Leptonic Top Overlap as a b-tagging Alternative
Tagging of b-quarks at high p T (i.e. > 300 GeV) is an experimentally challenging task. Any alternative method which could at least compensate for the background rejection power provided by the b-tagging procedure could be a valuable asset in boosted top analyses. In the previous section we already discussed the rejection power which can be achieved by Ov lep 3 . Here, we ask whether the achievable rejection power is sufficient to compensate for the reduction in the b-tagging efficiency.
The details of b-tagging involve an elaborate analysis of detector level data (including both tracking and calorimeter information), which is beyond the scope of this analysis. Here, we use a semi-realistic b-tagging procedure, whereby the parton level information from the Monte Carlo hard process provides a "tag" for the showered jets. If an r = 0.4 anti-k T jet is within ∆R = 0.4 of a hard-process b or c quark, we assign the corresponding heavy flavor tag to the jet; otherwise, the jet is tagged as a light jet. We then weight the number of b, c and light jets by the efficiencies for identifying each category as an actual b-jet. For the purpose of this analysis, we use the benchmark point (ε b , ε c , ε l ) = (0.5, 0.3, 0.1), where ε b , ε c , ε l are the efficiencies with which a jet of flavor b, c or light is identified as a b-jet. Properly tagging the b-quark at high p T hence results in a rejection power of roughly 5 for light jets and 1.7 for charm. In reality, the ability to properly tag b quarks deteriorates with increasing energy, while the leptonic overlap rejection power increases. Hence we find that Ov lep 3 could provide a useful substitute for the rejection power lost due to the reduction of b-tagging efficiency in an analysis. In addition, the information contained in Ov lep 3 is complementary to b-tagging, and the combination of the two can be used to further increase the RP.
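As a rough illustration, the flavor weighting above can be sketched as follows. The benchmark efficiencies are the ones quoted in the text; the event-level weight for requiring a single b-tag is our own assumption, not part of the analysis as written.

```python
# Sketch of the truth-based flavor weighting described above. The benchmark
# efficiencies reproduce the quoted rejection powers of ~5 (light) and
# ~1.7 (charm) relative to b-jets.
def btag_weight(jet_flavor, eps_b=0.5, eps_c=0.3, eps_l=0.1):
    """Probability that a jet of the given truth flavor is identified as a b-jet."""
    return {"b": eps_b, "c": eps_c, "l": eps_l}[jet_flavor]

def event_weight(jet_flavors):
    """Assumed event-level weight for requiring at least one b-tag:
    1 - P(no jet in the event is tagged)."""
    p_none = 1.0
    for flavor in jet_flavors:
        p_none *= 1.0 - btag_weight(flavor)
    return 1.0 - p_none
```

For example, a b-jet plus a light jet gives an event weight of 1 − (1 − 0.5)(1 − 0.1) = 0.55.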

VI. EFFECTS OF PILEUP CONTAMINATION ON TOM
The high instantaneous luminosity characteristic of the LHC poses a serious problem for jet substructure physics.
The current LHC run at √ s = 8 TeV recorded an average of N vtx ≈ 20 interactions per bunch crossing, with projections that future runs may result in as many as N vtx ∼ 100 [53]. Contamination due to diffuse radiation from pileup can significantly shift and broaden the jet kinematic distributions, sparking a need for methods to either subtract, or correct for, large pileup effects. Figure 8 shows an example of the effects of pileup on the boosted top and light quark QCD jet mass distributions. Pileup not only shifts the mass peak to the right, but significantly broadens the distributions as well. Imposing a fixed mass window on the fat jet distribution would thus result in decreased efficiency with increasing pileup. The statement remains true even after estimating the relative shift of the mass peak due to pileup, as the widening of the mass distribution is difficult to correct for. Algorithms such as Jet Trimming [22] and Jet Pruning [23] aim to remove the contamination of soft radiation from the underlying event or pileup, which is important to improve the mass resolution for large jets. The data driven method of Ref. [54] focused on pileup corrections for jet-shape variables at the differential level, say as a function of the jet mass, angularity and planar flow. Subsequent studies by the CDF [55] and ATLAS [56] collaborations provided qualitative validation of the method. In addition, the CMS collaboration employs track information to subtract pileup contamination coming from secondary vertices [57]. Reference [58] uses a jet area based method for pileup correction, whereby the effects of pileup are subtracted from jet observables such as p T and mass based on data driven estimates of the pileup contamination per unit area. More recently, the authors of Ref. [59] proposed a method of subtracting the effects of pileup from jet shape variables using jet areas. The results were numerically shown to hold up to N vtx = 60.
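For orientation, the jet-area subtraction of Ref. [58] can be sketched as follows. The jet list and the area values are illustrative stand-ins for the output of FastJet's active-area calculation, not the actual implementation.

```python
# Sketch of jet-area pileup subtraction: estimate the event's median pT
# density rho from the jet ensemble, then correct each jet as
# pT -> pT - rho * A_jet.
def median_pt_density(jets):
    """rho = median over jets of pT / area (a data-driven pileup estimate)."""
    densities = sorted(j["pt"] / j["area"] for j in jets)
    n = len(densities)
    mid = n // 2
    return densities[mid] if n % 2 else 0.5 * (densities[mid - 1] + densities[mid])

def subtract_pileup(jet, rho):
    """Area-corrected jet pT, floored at zero."""
    return max(jet["pt"] - rho * jet["area"], 0.0)
```

The median (rather than the mean) keeps the estimate robust against the few hard jets in the event, which would otherwise bias rho upwards.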
Reference [34] showed that TOM is weakly affected by pileup, with boosted Higgs distributions of template overlap (and other template based observables) remaining mostly impervious to pileup at N vtx = 20 interactions per bunch crossing. The relative insensitivity of TOM to pileup comes from the fact that the template sub-cone radii are typically of O(10 −1 ) of the fat jet cone, so that the relative pileup contamination is only a few percent of its effect on jet observables such as the fat jet mass or transverse momentum.
In this section, we study the effects of pileup on top template overlap, at various top energies and levels of pileup contamination. For the purpose of our study, we choose to omit as many pileup sensitive observables as possible (such as the fat jet p T and mass). Instead, we focus on the intrinsic, pileup insensitive mass filtering property of TOM, as well as present the results in terms of h T instead of fat jet transverse momentum where appropriate.
To simulate the effects of pileup we add minimum bias events to each event we wish to analyze, whereby the number of pileup events added is determined on an event-by-event basis, by drawing a random number from a Poisson distribution with mean N vtx .
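A minimal sketch of this overlay, assuming a hypothetical pre-generated pool of minimum bias particle lists:

```python
import math
import random

# Sketch of the pileup overlay described above: for each hard-scatter event,
# draw the number of minimum bias events from a Poisson with mean N_vtx and
# append their particles. 'min_bias_pool' is a hypothetical pre-generated
# list of per-event particle lists.
def poisson_draw(mean, rng):
    """Knuth's multiplication algorithm; adequate for N_vtx up to ~100 here."""
    threshold, k, p = math.exp(-mean), 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def overlay_pileup(event_particles, min_bias_pool, n_vtx, rng=random):
    n_pu = poisson_draw(n_vtx, rng)
    merged = list(event_particles)
    for _ in range(n_pu):
        merged.extend(rng.choice(min_bias_pool))
    return merged
```

Drawing N_vtx per event (rather than fixing it) reproduces the bunch-to-bunch fluctuations of the instantaneous luminosity.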

We begin with the study of pileup effects on Ov had 3 . The benchmark points of N vtx = 20, 50, 70 pileup events per bunch crossing serve to illustrate the performance of TOM in a pileup environment, while several h T bins serve to illustrate the effects of pileup at various jet transverse momenta.
The top left panel of Fig. 9 shows the signal efficiency at a fixed Ov had 3 cut, following Ref. [34]. The result shows that TOM can perform well without significant pileup subtraction or correction on the current 8 TeV data set, while alternative ways of dealing with pileup are likely to be needed beyond N vtx > 50; such contamination can be efficiently corrected [60].

VII. SEARCHING FOR NEW PHYSICS WITH TOM

Many models provide possibilities for new interactions with enhanced couplings to top quarks. To demonstrate the effectiveness of TOM, we present a simple search for a massive spin-one, color octet tt resonance in the lepton plus jets channel for the LHC run at √ s = 8 TeV. We further study the case of heavy NP characterized by an effective field theory (EFT). Specifically, we add a four-fermion uūtt operator capable of accommodating the discrepancy in the Tevatron tt forward-backward asymmetry.
Our analysis focuses on the kinematic range in which the tt system has sufficient energy for the decay products of each top quark to be fully merged into fat jets of R ∼ 1, leading to di-jet topologies in which the events have one fully merged hadronically decaying top-quark candidate and one fully merged leptonically decaying top-quark candidate. Event reconstruction follows the same steps as in Sec. III. For events passing the Basic Cuts of Eq. (10), we further demand that the semi-leptonic tt candidate contain two top tags and satisfy an extra cut on the template m tt consistent with the decay of a heavy resonance.

A. Benchmark models
We consider new physics in two specific benchmark models. In the first case, we consider Kaluza-Klein (KK) gluons from the bulk Randall-Sundrum (RS) model [3,5] with Γ KK /M KK = 15%. Neglecting effects related to electroweak symmetry breaking (EWSB), the left-handed (g L ) and right-handed (g R ) couplings of the KK gluon to the light quarks q = u, d, c, s are suppressed relative to the SM SU (3) C gauge coupling g s , while its coupling to the top quark is enhanced. Masses below ≈ 2 TeV for KK gauge particles are disfavored by precision tests [61,62], while direct constraints from CMS require the KK gluon to be heavier than roughly 2.5 TeV [63], assuming a signal K-factor of 1.3 derived from color singlet NLO analyses [64]. As we consider a color octet resonance here, we conservatively do not apply this K-factor. We consider two specific KK gluon masses: M KK = 2.5 TeV and M KK = 3 TeV. In this mass regime, KK gluons decay dominantly to tt with a branching ratio of ≈ 95%.
As a second example, we consider a non-resonant top-philic NP model. Assuming that new physics is heavy enough, one can take an EFT approach to describe the NP by means of higher dimensional interactions among the SM fields.
For simplicity, we focus on the four-fermion operator O = (g A /Λ 2 ) (ū γ μ γ 5 T a u)(t̄ γ μ γ 5 T a t), where Λ is the scale of the new interaction, T a is an SU (3) generator (a = 1...8) and g A is the "axigluon" coupling. The presence of this operator can be motivated by the anomalous top forward-backward asymmetry at the Tevatron (see e.g. [65][66][67][68][69][70][71]). As a reference point, we choose g A /Λ 2 ∼ 1.4/TeV 2 , which can account for the observed asymmetry. Since Λ is relatively low, we expect a strong enhancement of the differential tt production cross section.
It is worth noting that the heavy NP described by the above EFT is already in tension with the recent CMS search for anomalous tt production of Ref. [63].

B. LHC signals
We simulate the signal and background samples using the procedure described in Sec. III. The events are required to satisfy the Basic Cuts described in Eq. (10), with the fat jet transverse momentum p T > 500 GeV. In Table V, we summarize the cross sections after the Basic Cuts. While the signal cross sections are computed at LO, the background cross sections are obtained with MadGraph, normalized to the theoretical cross sections of Ref. [72] (for tt at NNLO) and Ref. [73] (for W jj at NLO). For events passing the basic reconstruction, we further demand that the semi-leptonic tt candidates satisfy the overlap cuts. The top-tagging algorithm for the hadronic and leptonic top-quark candidates was described in Sec. II. To optimize the choice of the Ov 3 cut for the NP search, we note that the significance scales as S/ √ B ∝ ε sig √ RP, where RP is the background rejection power and ε sig is the signal efficiency of the Ov 3 cut relative to the Basic Cuts of Eq. (10). The left panel of Fig. 12 shows the template m tt distributions for the SM backgrounds, the KK gluon signals, and the EFT described in Eq. (22) (red). Both the SM tt and W + jets production rates fall steeply as a function of the tt mass. From the left panel of Fig. 12 it is clear that, in the absence of top tags, the dominant background in this analysis is W +jets events rather than SM tt production.
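Since S/√B scales as ε sig √RP, optimizing the overlap cut amounts to a one-dimensional scan over working points. A minimal sketch with illustrative numbers (these are not our measured efficiencies):

```python
import math

# Sketch of the cut optimization described above: S/sqrt(B) scales as
# eps_sig * sqrt(RP), so we scan candidate Ov3 cuts and keep the one that
# maximizes this figure of merit. The working points below are illustrative.
def best_cut(working_points):
    """working_points: list of (cut_value, eps_sig, rejection_power).
    Returns the working point maximizing eps_sig * sqrt(RP)."""
    return max(working_points, key=lambda wp: wp[1] * math.sqrt(wp[2]))

points = [(0.5, 0.80, 2.0), (0.6, 0.70, 3.5), (0.7, 0.55, 6.0)]
cut, eps_sig, rp = best_cut(points)
```

Note that a tighter cut can win even at lower signal efficiency, provided the rejection power grows fast enough.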
Further purification of the signal can be achieved by applying cuts on the hadronic and leptonic top jets. The right panel of Fig. 12 shows that the SM non-tt background is significantly reduced once an overlap cut is applied to the top jet candidates. We see that SM tt dominates over W +jets for m tt < 2.5 TeV, while the long tail in the invariant mass distribution of the W +jets background is comparable to SM tt for m tt > 2.5 TeV. The absence of b-tagging in our current study does not allow direct comparisons with the ATLAS study [50], where b-tagging significantly reduces the W +jets background. However, it is worth noting that the background composition in our study and the one performed by ATLAS are similar for m tt < 2.5 TeV, with SM tt being the dominant background.
Note that most of the events from the high mass KK gluon resonances do not appear as a sharp resonance, but are instead smeared over a wide range of the m tt distribution. This effect is due to the fact that the KK gluon is rather broad and, more importantly, due to the convolution of the rapidly falling parton distribution functions with the Born cross section. Furthermore, the contributions from the EFT operator, being an irrelevant (higher-dimensional) interaction, grow with energy. Therefore, the tt spectrum tends to be harder in the presence of new physics. To improve the signal to background ratio, we apply a sliding lower cut m tt > m min for each resonance, adjusted to give an approximately flat S/B ratio.
In order to determine the reach, we apply a simplified Bayesian approach, using a flat prior distribution and neglecting systematic uncertainties [74]. The probability of measuring n events is given by a Poisson distribution, P (n|S + B) = (S + B) n e −(S+B) /n!, where B and S ≡ σ sig ε sig L are the numbers of expected background and signal events, respectively, and we regard the signal cross section σ sig as the parameter of interest. We assume that n is equal to the integer closest to B, and solve Eq. (26) for the cross section σ CL excluded at confidence level 1 − α, assuming α = 0.05 (95% exclusion). Figure 13 shows the results for the projected 95% CL exclusion of the KK gluon search at √ s = 8 TeV and L = 14 fb −1 , 20 fb −1 . We find that KK gluon masses up to ≈ 2.6 TeV can be excluded with L = 14 fb −1 , and masses up to ≈ 2.7 TeV with L = 20 fb −1 , assuming no b-tagging, no pileup, no detector effects and no signal K-factor.

VIII. CONCLUSIONS

In this paper we introduced a tagger for semi-leptonic tt events based on the Template Overlap Method (TOM).
We demonstrated that at large boost the leptonic-top tagger leads to an additional rejection power of roughly 4. The tagger may serve to compensate or complement the rejection power lost due to the reduction of b-tagging efficiency.
We showed that the semi-leptonic tt TOM tagger is by itself robust against pileup up to 50 interactions per bunch crossing, without the use of additional pileup correction techniques. The relative insensitivity of TOM to pileup may thus serve to study the systematic effects of other pileup correction techniques.
Furthermore, we demonstrated that TOM is able to efficiently reject events in which tt pairs are produced in association with a hard gluon, and hence to single out the back to back tt events. Our results show that Ov had 3 is able to provide an improvement of a factor of 2 in back to back tt signal purity compared to the ATLAS tagger based on cuts on the k T splitting scale and the trimmed jet mass selection. Our method is able to resolve the kinematic distributions of high energy top quark events to a reasonable degree, and better than the above-mentioned ATLAS tagger. The improvement in resolution is due to the fact that conventional approaches will often tag the extra hard jet as a hadronic-top candidate. The hadronic TOM rejects W +jets events at the rate of ≈ 10 with a SM tt efficiency of 60% at p T ∼ 500 GeV. The rejection power decreases with energy, due to the aforementioned higher order effects, characterized by hard, wide-angle gluon emission and by gluon splitting to a top quark pair.
We performed a detailed study of pileup effects on TOM. To illustrate the performance of TOM in a high luminosity environment, we chose not to subtract pileup from our events. Instead, we introduced a simple approach to damp the effects of soft contamination on the results of the overlap analysis. We introduced a method to estimate the p T of the hadronic top template based on the scalar sum of the leptonic top decay products' transverse momenta, as a pileup insensitive alternative. In addition, we omitted the cut on the fat jet mass and instead relied only on TOM's intrinsic mass filtering ability. Our results revealed that the hadronic formulation of TOM is fairly robust against soft contamination up to ≈ 50 interactions per bunch crossing, with the leptonic top TOM remaining weakly affected up to at least 70 interactions.
As a case study, we investigated the performance of TOM in the context of KK gluon resonance and non-resonant top-philic searches at the 8 TeV LHC, with the major backgrounds consisting of the SM tt continuum and W + jets. The additional rejection power provided by our semi-leptonic top template tagger suggests that an analysis based on TOM can achieve a better sensitivity than previous analyses. In particular, we found that a KK gluon could be excluded up to masses of ≈ 2.7 TeV with 20 fb −1 of data at 95% CL. Non-resonant new physics contributions to tt production could in principle be excluded with the same efficiencies.
Finally, we discussed many technical and experimental aspects of TOM in the Appendix. We showed that covering a wide range of top transverse momenta requires the use of some sort of template sub-cone scaling rule, as no single fixed value of the sub-cone radius is adequate to provide a fixed efficiency for a fixed Ov had 3 cut.

Template Generation

In order to speed up the overlap calculation, we generate template states at fixed jet p T in the boosted frame. This requires us to generate several sets of templates and dynamically determine which set to use on an event-by-event basis. In this paper, unless otherwise noted, we use twelve template libraries starting at p T = 550 GeV, in increments of δp T = 100 GeV. The 3-particle top templates are determined by two four-momenta, p 1 and p 2 , subject to the constraints of Eq. (A1): the template momenta are massless, they sum to the total top momentum P with P 2 = m t 2 , and the pair assigned to the W reconstructs m W , where m t and m W are the top and W masses, respectively. The third four-momentum, p 3 , is determined by momentum conservation. By solving Eq. (A1), we can generate the templates with a sequential scan over the η, φ of the first two template momenta.
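Under the assumption that the constraints take the form (P − p 1 ) 2 = m W 2 and p 3 2 = 0 for massless template partons, the energies along any pair of scanned directions follow in closed form. A sketch:

```python
import math

# Hedged sketch of the template construction: given the top four-momentum P
# and the directions (eta, phi) of the first two massless partons, the
# constraints (P - p1)^2 = mW^2 and p3^2 = 0 fix the energies E1 and E2;
# p3 then follows from momentum conservation.
MT, MW = 173.0, 80.4  # GeV (illustrative mass values)

def unit_vector(eta, phi):
    return (math.cos(phi) / math.cosh(eta),
            math.sin(phi) / math.cosh(eta),
            math.tanh(eta))

def dot3(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve_template(P, dir1, dir2):
    """P = (E, px, py, pz) of the top; dir1, dir2 = (eta, phi) of partons 1, 2.
    Returns (p1, p2, p3) as four-vectors, or None if the configuration
    is unphysical (negative energy solution)."""
    n1, n2 = unit_vector(*dir1), unit_vector(*dir2)
    e_top, p_top = P[0], P[1:]
    # (P - p1)^2 = mt^2 - 2 P.p1 = mW^2  =>  E1 fixed by the direction n1.
    e1 = (MT**2 - MW**2) / (2.0 * (e_top - dot3(p_top, n1)))
    if e1 <= 0:
        return None
    p1 = (e1,) + tuple(e1 * c for c in n1)
    W = tuple(a - b for a, b in zip(P, p1))  # W-boson four-momentum
    # p3^2 = (W - p2)^2 = mW^2 - 2 W.p2 = 0  =>  E2 fixed by n2.
    e2 = MW**2 / (2.0 * (W[0] - dot3(W[1:], n2)))
    if e2 <= 0:
        return None
    p2 = (e2,) + tuple(e2 * c for c in n2)
    p3 = tuple(a - b for a, b in zip(W, p2))
    return p1, p2, p3
```

Scanning dir1 and dir2 over a grid in η, φ then enumerates the template library for a given top momentum P.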

Template Subcone Scaling
The shape of both the signal and background overlap distributions depends on the choice of the template sub-cone size. However, Ref. [34] showed that there typically exists a wide region of template sub-cone radii over which the overlap results are stable. Naturally, one should expect the radiation pattern of, say, a p T = 100 GeV quark to be wider than the radiation pattern of a 1 TeV quark. Hence, a template sub-cone which is "adequate" to match a higher energy subjet could be too small to accurately capture most of the showering pattern of a lower energy one. In addition, how will the change of the adequate template sub-cone size affect the shape of the overlap distributions at different p T ? What effect will the change in shape of the distributions have on the signal efficiency and the rejection power of a fixed Ov had 3 cut?
A true understanding of the dependence of the adequate template sub-cone size on the energy of the subjet is a problem in non-perturbative QCD and is beyond the scope of our analysis. We instead turn to a more data-driven approach, whereby we compare the properties of fixed template sub-cones over a wide range of fat jet p T to a polynomial fit to the template sub-cone scaling rule of Ref. [34], in which the sub-cone radius shrinks with the transverse momentum p T, a of the corresponding template parton. We limit the template sub-cone sizes to the range [0.05, 0.3], where the lower limit takes into account the detector resolution, while the upper limit is set to the value beyond which no data points exist (see Ref. [34] for more details).
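As an illustration of such a clamped scaling rule (the 1/p T form and the normalization below are assumptions standing in for the fitted polynomial of Ref. [34]; only the [0.05, 0.3] window is taken from the text):

```python
# Illustrative sketch of a clamped sub-cone scaling rule. The inverse-pT
# form and the 60 GeV normalization are assumptions; the [0.05, 0.3]
# window matches the limits quoted in the text.
def subcone_radius(pt_a, scale=60.0, r_min=0.05, r_max=0.3):
    """Sub-cone radius for a template parton of transverse momentum pt_a (GeV)."""
    return min(max(scale / pt_a, r_min), r_max)
```

The clamping means that soft partons saturate at the detector-motivated maximum radius, while very hard partons never shrink below the resolution floor.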
Varying sub-cones, while not necessarily providing an increase in rejection power at a fixed signal efficiency, have clear advantages over fixed template sub-cones. Figure 14 shows an example.
In our current work we opt not to use the fat jet transverse momentum as the estimator of the template p T , because of the susceptibility of the jet p T to pileup. Instead, we define the observable h T = Σ i p T i , where the p T i are the transverse momenta of the leptonic top components (i.e. the hardest lepton outside the fat jet with mini-ISO > 0.95, the hardest anti-k T , r = 0.4 jet within R = 1.5 from the lepton, and the total E T / ). h T is correlated with the fat jet p T , especially in events in which the tops decay back to back; Figure 15 shows an example. The high degree of correlation between h T and the fat jet p T allows us to replace the template set selection criterion based on the pileup sensitive jet p T with the more pileup robust h T . Notice that the Ov had 3 distribution in Fig. 15 remains unaffected by the choice of the template selection rule.
The template p T estimation based on the h T of the leptonic top provides an additional discriminant against the asymmetric tt events discussed in Section IV. If an event is characterized by a large A SV tt , the h T of the leptonic top will often not match the p T of the fat jet, even when the hadronic top jet is correctly pre-selected. The resulting peak overlap score will thus tend towards small values, due to the mismatch between the p T of the template and the transverse momentum of the hardest fat jet.
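The h T based template library selection can be sketched as follows, using the twelve libraries starting at p T = 550 GeV in 100 GeV increments described above:

```python
# Sketch of the pileup-robust template-set selection. h_T is the scalar pT
# sum of the leptonic-top components (lepton, b-candidate jet near the
# lepton, and MET); the library binning matches the twelve sets described
# in the text.
def h_t(lepton_pt, lep_jet_pt, met):
    """Scalar pT sum of the leptonic top components."""
    return lepton_pt + lep_jet_pt + met

def template_set_index(ht, first_bin=550.0, step=100.0, n_sets=12):
    """Index of the template library whose pT bin matches h_T (clamped)."""
    idx = int((ht - first_bin) // step)
    return min(max(idx, 0), n_sets - 1)
```

Because h T is built only from the lepton, one narrow jet and the missing energy, the diffuse pileup contribution entering the selection is far smaller than for the full fat jet p T.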

Effects of MET Resolution on Template Selection Criteria
So far, we have taken a simplified approach to estimating E T / , assuming that the missing transverse momentum is simply the sum of the transverse components of all the neutrino four-momenta in an event. Here we explore the effects of a proper reconstruction of the missing energy, and of the E T / resolution, on the TOM analysis.
We follow the ATLAS prescription of Ref. [75] for the reconstruction and smearing of the missing energy. We begin by calculating the missing energy components E x,y as (minus) the sum of the x and y momentum components of all final state particles in the event which are not neutrinos and which satisfy the criteria of Eq. (A5). The requirement on pseudo-rapidity guarantees that the particle does not disappear down the beam line, while the p T requirement takes into account the bending of charged particle tracks in the strong magnetic field.
Next, we smear E x,y individually by drawing a random number from a Gaussian centered at E x,y , with a width given by the E T / resolution, which scales as the square root of Σ i E T i , where i runs over the non-neutrino event constituents satisfying the requirements of Eq. (A5).
Finally, we calculate the missing energy from the smeared components as E T / = √(E x 2 + E y 2 ). The leptonic overlap shows a somewhat more pronounced susceptibility to the missing energy resolution. Figure 17 shows example Ov lep 3 distributions for tt and W +jets events with 500 GeV < p T < 600 GeV, after the Basic Cuts of Eq. (10). We find that the Ov lep 3 distributions are shifted slightly towards lower values of overlap if the smearing of the missing energy is included. This is not surprising, given that the missing energy enters directly into the computation of Ov lep 3 . Nonetheless, the effects are too small to be of concern for the overall performance of the overlap analysis.
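A compact sketch of the reconstruction and smearing chain above; the resolution form σ = a √(Σ E T ) with a = 0.5 GeV^1/2 is an assumed stand-in for the ATLAS parametrization of Ref. [75]:

```python
import math
import random

# Sketch of MET reconstruction and smearing: sum the transverse momenta of
# the accepted non-neutrino constituents, smear each component with a
# Gaussian of width a * sqrt(sum E_T), and take the magnitude.
def smeared_met(particles, a=0.5, rng=random):
    """particles: list of (px, py) for non-neutrino constituents passing the
    acceptance cuts of Eq. (A5). Returns the smeared |MET|. The parameter
    a is an assumed resolution constant, not the official ATLAS value."""
    ex = -sum(px for px, _ in particles)
    ey = -sum(py for _, py in particles)
    sum_et = sum(math.hypot(px, py) for px, py in particles)
    sigma = a * math.sqrt(sum_et)
    return math.hypot(rng.gauss(ex, sigma), rng.gauss(ey, sigma))
```

Setting a = 0 disables the smearing and recovers the exact vector sum, which is a convenient cross-check of the implementation.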

Determining the Adequate Number of Templates
How is the template analysis affected by the number of templates used in the calculation? In the limit N templates → ∞, we expect a perfect coverage of the top decay phase space. However, technical difficulties and processing time limit us to a finite number of template states, where a large number of templates (typically of O(10 6 )) is declared "adequate." So far, there has been no detailed analysis of the actual sensitivity of TOM to the number of template states used, as the problem requires a dedicated study at several template transverse momenta and various numbers of templates. For instance, a typical case covering a template range of p T = 500 − 1500 GeV in steps of 100 GeV, with 5 different template sets (differing in the number of steps in the angular variables, N η,φ ), would require 50 runs of both the signal and background channels.
In order to test the dependence of the overlap results on the number of templates used, we vary the number of steps in η, φ used to generate templates from 50 to 90 in each variable, in increments of 10 steps. This gives template sets with roughly twice as many templates in each consecutive case. In the interest of time, we consider only two template p T ranges: Case 1: 500 GeV < p T < 600 GeV; Case 2: 1400 GeV < p T < 1500 GeV.
This particular choice of case studies looks at the extrema of the p T range of interest for the boosted top analyses of the near future, and to first approximation we consider the results valid for in-between values of template p T . Figure 18 shows the result. The background fake rate as a function of signal efficiency remains unaffected in all considered cases, signaling that even N η,φ = 50 adequately covers the top decay phase space. The effects of varying the number of templates are noticeable only in the high overlap region of the signal distribution, where adding more templates naturally improves the resolution of subjets and thus slightly improves the peak overlap score. Notice, however, that the region of low overlap (i.e. Ov had 3 < 0.7) remains unaffected, verifying that increasing the number of templates does not lead to an increase in the mis-tag rate of events which do not match the topology and substructure of a boosted semi-leptonic tt decay.