Resonance Searches with an Updated Top Tagger

The performance of top taggers, for example in resonance searches, can be significantly enhanced through an extended set of variables, with a special focus on final-state radiation. We study the production and the decay of a heavy gauge boson in the upcoming LHC run. For constant signal efficiency, the multivariate analysis achieves an increased background rejection by up to a factor 30 compared to our previous tagger. Based on this study and the documentation in the Appendix we release a new HEPTopTagger2 for the upcoming LHC run. It now includes an optimal choice of the fat-jet size, N-subjettiness, and different modes of Qjets.

After the discovery of the Higgs boson, a keystone of the Standard Model, one main task for the upcoming LHC runs will be searches for physics beyond the Standard Model. Several open experimental and theoretical questions point to additional particles or structures at energies above the electroweak scale [1]. A very generic feature of many extensions of the Standard Model is the presence of additional heavy particles which preferentially decay to a pair of top quarks [2]. One example of such a resonance could be a heavy neutral Z′ gauge boson with a TeV-scale mass. Historically, such states were only searched for using semileptonically decaying top pairs. There, the kinematic reconstruction relies on an approximate reconstruction of the missing neutrino momentum through a W-mass or top-mass condition. In the last LHC run this search channel was supplemented by resonance searches based on boosted, hadronically decaying top pairs. In the corresponding ATLAS analysis [3] the HEPTopTagger [4,5] and the template tagger [6] each showed a similar reach, comparable with the semileptonic channel. This experimental success is based on rapid progress in the field of jet substructure, both experimentally and theoretically, which will gain even more momentum during the 13 TeV LHC run.
The field of top and Higgs tagging [7] started essentially as a Gedankenexperiment to illustrate recombination jet algorithms [8]. After some early attempts, for example to tag hadronically decaying tops [9], it took off with the development of the BDRS Higgs tagger with its mass drop condition [10] and a filtering step targeting underlying event and pile-up [11]. The first top taggers were simple, deterministic algorithms which could identify and reconstruct hadronically decaying top quarks, including subjet b-tagging [4,12–14]. They were based on deliberately simple structures and algorithms, to firmly establish subjet methods in ATLAS and CMS. After the experimental success of these completely new analysis tools in the first run of the LHC, the upcoming run will benefit from more advanced top tagging methods. Those include multivariate taggers [15], template taggers [6], as well as shower deconstruction [16] or event deconstruction [17]. For these specialized tools the challenge will be to still provide a universal top tagging approach, which on the one hand allows for optimal experimental results, but on the other hand identifies and reconstructs boosted top quarks independently of the specialized analysis framework.
Over time, the original HEPTopTagger [4] has gone through several rounds of improvements. The first modification included a re-formulation of the algorithm, leading to the trademark A-shaped kinematic cuts [5]. One of the key observations leading to these cuts is that in the absence of a b-tag it is not helpful to uniquely identify the two W-decay jets, because in typical top decays there will be two jet-jet combinations which reconstruct to an invariant mass around 80 GeV [19]. The first set of new, additional variables [20] then included a combination of the usual filtered top mass [11] with a pruned top mass [21]. In this upgrade we introduce a fat jet radius up to R = 1.8 for moderately boosted tops and allow for a choice of Cambridge-Aachen [22] and k_T [23,24] jet algorithms in all internal clustering and filtering steps except for the mass drop condition. This improves the tagging performance for highly boosted tops [20]. Recently, the algorithm was slightly changed to avoid background shaping [15]. In the same study we added a low-p_T mode based on Fox-Wolfram moments [25] to incorporate angular correlations, extending the tagging coverage down to p_{T,t} = 150 GeV.
In this paper we present a detailed study of the HEPTopTagger2, collecting all previous modifications as well as a whole range of new features targeted at multivariate analyses and statistical approaches to single events [26,27]. The main body of the paper will focus on Z′ searches, where final-state jet radiation turns out to be the limiting factor of the original tagger. After resolving the issue with final-state radiation we will improve the tagging algorithm step by step, defining and including additional kinematic information. Finally, we will compare the multivariate tagging performance with the leading projections based on event deconstruction [17].
The main background in fully hadronic Z′ → tt̄ searches is QCD multi-jet production, which allows us to directly translate all our findings into a performance study based on tagging tt̄ pairs in the Standard Model. We will show these results, together with a review of the complete HEPTopTagger2 algorithm and the code interface, in the Appendix.

II. RESONANCE RECONSTRUCTION
The key challenge of any top tagger is its broad range of applications and the related optimization of the algorithms and codes. For example, the HEPTopTagger was developed to solve the combinatorial problems in tt̄H searches [4]. The first public tagging code was presented for supersymmetric top partner searches in semi-leptonic top decays [5]. Its proposed applications include single top production, to experimentally separate the s-channel and t-channel production processes [28]. However, its experimental application during the first LHC run was the search for heavy resonances decaying to hadronic tt̄ pairs [3]. For such a resonance search the kinematic top tagger in combination with a b-tag showed a similar performance as the usual, approximate reconstruction of semileptonic tt̄ pairs. In this paper we will present a set of improvements towards the HEPTopTagger2 for a Z′ search at the 13 TeV LHC. Many of these improvements can be applied to other LHC processes, as will be discussed in the Appendix.
Using all available information from a pair of boosted top quarks, event deconstruction currently gives the leading performance estimates for heavy resonance searches [17]. For the analysis in the main body of this paper we follow the analysis framework of Ref. [17], to eventually allow for a comparison in Sec. IV. For the signal we therefore use Pythia8 [29] to generate Z′ → tt̄ events with m_{Z′} = 1500 GeV at 13 TeV collider energy. Assuming the same couplings as for the Standard Model Z boson would yield a width of Γ(Z′) = 47 GeV; to be consistent with the assumed experimental resolution in Ref. [17] we increase the width to 65 GeV and only simulate the vector couplings. However, we will see that this choice of the physical Z′ width does not affect our results, which are based on the reconstructed fat jet kinematics. For the Z′ decay we assume a 100% branching ratio to top pairs. The two backgrounds are continuum tt̄ production, which we simulate assuming p_{T,t} > 400 GeV, and QCD di-jet production, also requiring p_{T,j} > 400 GeV. Again, we rely on Pythia8, keeping in mind that for the pure QCD background our di-jet rate might not be a conservative estimate. All top quarks are forced to decay hadronically. Unless explicitly mentioned, our simulations for the main body of the paper include underlying event but do not account for pile-up or detector effects. For a completely realistic study of the signal and background efficiencies of the new HEPTopTagger2 we will have to rely on upcoming experimental studies. For our multivariate tagging analyses we optimize the background rejection with respect to the pure QCD background, because it is by far dominant.

Decay kinematics
On the analysis level we first select events with at least two fat jets with p_{T,fat} > 400 GeV and |y_fat| < 2.5, reconstructed using the C/A algorithm [22] with radius parameter R = 1.5, as implemented in FastJet [24]. We limit ourselves to the two hardest fat jets in each event for the Z′ search. The corresponding cut flow is given in Tab. I. Using the old default HEPTopTagger setup [5] we find a double top tagging efficiency of ε_{2tags} = 14% in the signal, as shown in Tab. I. If we apply a fixed invariant mass window m_{tt̄} ∈ [1200, 1600] GeV on the tagged and reconstructed top quarks, the Z′ tagging efficiency is ε_{Z′} = 10.2%. For the tt̄ background we find ε_{2tags} = 13.7% and ε_{Z′} = 3.3%. For the QCD background sample the double mistag rates are ε_{2tags} = 6.6 · 10^−4 and ε_{Z′} = 1.5 · 10^−4. After all cuts, the QCD jets background exceeds the continuum top pair production by a factor five.

Figure 1: Left: ROC curves for the dominant QCD background vs. the Z′ signal after including additional kinematic information shown in Eq. (2). As in all figures the asterisk corresponds to the original HEPTopTagger described in Ref. [5]. Right: |Δy| distribution of the reconstructed top quarks for signal and backgrounds.
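The event selection and the fixed mass window can be sketched in a few lines of Python. This is a minimal illustration, assuming the fat jets and the tagged tops are already available as plain four-vectors; the `FourVector` class and the helper names are hypothetical stand-ins for FastJet's `PseudoJet`, and the HEPTopTagger tagging step itself is not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class FourVector:
    """Hypothetical minimal (px, py, pz, E) four-vector in GeV."""
    px: float
    py: float
    pz: float
    e: float

    def __add__(self, other: "FourVector") -> "FourVector":
        return FourVector(self.px + other.px, self.py + other.py,
                          self.pz + other.pz, self.e + other.e)

    @property
    def pt(self) -> float:
        return math.hypot(self.px, self.py)

    @property
    def rapidity(self) -> float:
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def mass(self) -> float:
        m2 = self.e ** 2 - self.px ** 2 - self.py ** 2 - self.pz ** 2
        return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding


def two_hardest_fat_jets(fat_jets, pt_min=400.0, y_max=2.5):
    """Apply pT > pt_min and |y| < y_max; return the two hardest survivors or None."""
    passing = [j for j in fat_jets if j.pt > pt_min and abs(j.rapidity) < y_max]
    if len(passing) < 2:
        return None
    passing.sort(key=lambda j: j.pt, reverse=True)
    return passing[0], passing[1]


def in_mass_window(top1, top2, lo=1200.0, hi=1600.0):
    """Fixed window on m_tt of the two tagged and reconstructed tops (GeV)."""
    return lo <= (top1 + top2).mass <= hi
```

In a full analysis the two selected fat jets would first be passed to the top tagger, and `in_mass_window` would then be applied to the returned top 4-momenta rather than to the fat jets themselves.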
A straightforward improvement of the basic analysis shown in Tab. I should be to replace the mass window by a boosted decision tree (BDT) analysis, as implemented in Tmva [30], based on the reconstructed invariant mass m_{tt̄}. In the left panel of Fig. 1 we first show the results as a receiver operating characteristic (ROC) curve, which gives the best background efficiency achievable for each signal efficiency with a given set of kinematic observables. This approach has been used to improve and benchmark the general performance of the HEPTopTagger [15]. Because the QCD jet background is dominant, we always set up our multivariate analyses based on the Z′ signal and the QCD background sample. Compared to the working point of the original public HEPTopTagger tool [5] with a fixed mass window m_{tt̄} ∈ [1200, 1600] GeV, the new HEPTopTagger2 including m_{tt̄} in a multivariate analysis looks slightly worse. The reason is the change in the order of steps in the algorithm described in the Appendix. It significantly reduces the background sculpting, but at the expense of some background rejection at constant signal efficiency. On the other hand, the reduced background sculpting removes a major source of systematic uncertainty when we need to interpret an m_{tt̄} distribution which shows a peak that could be due either to a signal or to a sculpted background. Moreover, it turns out that the difference between the old and new taggers vanishes once both of them are used in a fully flexible multivariate framework.
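For readers less familiar with ROC curves, the following sketch shows how signal and background efficiencies follow from scanning a cut on a classifier output, and how the background rejection 1/ε_B at fixed signal efficiency is read off. It assumes per-event BDT scores are already available (for example from Tmva); the functions are illustrative and are not Tmva's interface.

```python
def roc_points(signal_scores, background_scores):
    """(eps_S, eps_B) for every cut 'score >= threshold' on a classifier output."""
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    n_s, n_b = len(signal_scores), len(background_scores)
    points = []
    for cut in thresholds:
        eps_s = sum(s >= cut for s in signal_scores) / n_s
        eps_b = sum(b >= cut for b in background_scores) / n_b
        points.append((eps_s, eps_b))
    return points


def rejection_at(points, eps_s_target):
    """Best background rejection 1/eps_B among working points with eps_S >= target."""
    eps_b_best = min(eps_b for eps_s, eps_b in points if eps_s >= eps_s_target)
    return float("inf") if eps_b_best == 0.0 else 1.0 / eps_b_best
```

Comparing two variable sets at constant signal efficiency then amounts to comparing `rejection_at` for the two corresponding ROC curves.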
For a better discrimination between signal and background we should include additional variables in our multivariate analysis. The deterministic structure of the HEPTopTagger still allows for a particularly clear separation of the actual tagging and reconstruction from a subsequent kinematic analysis based on the reconstructed top momenta. The first additional variable we include is the rapidity difference between the two reconstructed top quarks, |Δy|. The corresponding signal and background distributions are shown in the right panel of Fig. 1. While this variable is not very efficient in removing the tt̄ continuum background, QCD jet events are visibly less central. The differences can hardly be translated into efficient kinematic cuts, but they help as part of a multivariate analysis. In the left panel of Fig. 1 we show the corresponding improvement in terms of ROC curves. In particular for low signal efficiencies ε_S < 0.1 we find a significant reduction of the background mistag rates, going beyond the working point of the first HEPTopTagger.
An obvious extension of our set of kinematic observables are the transverse momenta of the reconstructed top quarks. Note that as part of the ROC analysis we do not have to ensure that the different kinematic variables are independent of each other, which would be problematic for a combination of m_{tt̄} and the p_{T,t} distributions. Again, the improvement from the transverse momentum spectra is shown in the left panel of Fig. 1. All this illustrates that the kinematic information on the tagged and reconstructed tops can increase the background rejection by 50% to 100% for fixed signal tagging efficiency. We also see that once we include the top-pair invariant mass and the transverse momenta, the additional improvement from |Δy| vanishes, because the 2-particle final state is essentially fully described. As kinematic observables in our multivariate analysis we therefore choose

$$\{\, m_{t\bar t},\ p_{T,t_1},\ p_{T,t_2} \,\} \qquad \text{(decay kinematics)}. \qquad (2)$$

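As an illustration, the decay-kinematics observables m_{tt̄}, p_{T,t1} and p_{T,t2}, together with |Δy| for comparison, can be computed from the two reconstructed top 4-momenta as follows. The tuple layout (px, py, pz, E) and the function names are assumptions made for this sketch.

```python
import math


def pt_and_rapidity(p):
    """(pT, y) of a (px, py, pz, E) tuple in GeV."""
    px, py, pz, e = p
    return math.hypot(px, py), 0.5 * math.log((e + pz) / (e - pz))


def invariant_mass(p, q):
    px, py, pz, e = (a + b for a, b in zip(p, q))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def decay_kinematics(top1, top2):
    """m_tt and the pT-ordered top transverse momenta, plus |Delta y| for comparison."""
    (pt_a, y_a), (pt_b, y_b) = pt_and_rapidity(top1), pt_and_rapidity(top2)
    return {
        "m_tt": invariant_mass(top1, top2),
        "pT_t1": max(pt_a, pt_b),  # harder top
        "pT_t2": min(pt_a, pt_b),  # softer top
        "abs_dy": abs(y_a - y_b),
    }
```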
QCD jets
In purely hadronic searches for new physics, QCD effects beyond fixed order are a major issue in trying to theoretically understand the signal and backgrounds. Before we devise strategies to deal with final-state radiation and initial-state radiation in heavy-resonance searches we can estimate their effect on the naive tagger-based analysis.
On the Monte Carlo level it is possible to separately remove initial-state radiation and final-state radiation from all signal events. For the QCD jets background this is not sensible, because we need both mechanisms to generate a sufficient jet multiplicity. The ROC curves in Fig. 2 show the expected improvements in the absence of additional signal jets. We see that the leading effect spoiling the signal extraction is final-state radiation (FSR). Initial-state radiation (ISR) affects top tagging in two ways. First, the additional QCD jets can mimic, for example, the softer W-decay jet and degrade the tagging efficiency through combinatorics. Second, ISR jets recoil against the Z′, affecting the p_T spectrum of the top quarks. In particular the tagging of the softer top decay can benefit from this recoil, which means that for large signal efficiency the results without ISR become worse than those with all jet activity included.
As a whole, the results shown in Fig. 2 indicate potentially significant improvements of top taggers when we target the different effects of QCD jet radiation. We will show in the following subsection how a deterministic top tagger is limited by final-state radiation and how the new HEPTopTagger2 can avoid these issues. Combinatorial problems related to initial-state radiation will then be one of the key topics in Sec. III.

Final-state radiation
Final-state radiation (FSR) turns one of the key advantages of our top tagger into a significant problem: unlike some other top tagging approaches, the HEPTopTagger returns the 4-momentum of the tagged top, including a cut on the reconstructed top mass m_rec ∈ [150, 200] GeV [15]. This allows us to trivially reconstruct m_{Z′}. Final-state radiation off the top decay products will be captured by the jet clustering and contribute to the correct filtered top mass value [11]. As a result, it does not pose a problem as long as the Z′ decays to on-shell tops. However, if the Z′ decays to slightly off-shell tops, which become on-shell by radiating a gluon, this final-state radiation off the intermediate top misaligns the actual Z′ with the Z′ as reconstructed from the top quarks at the moment they decay. Because the hard radiated gluon does not enter the top reconstruction, the top tag will pass, but lead to an underestimated m_{Z′} value. In the left panel of Fig. 3 we indeed see that the m_{tt̄} distribution for the top-tagged signal correctly peaks around m_{Z′}, but develops a sizeable asymmetric tail towards smaller m_{tt̄} values. While the details of this asymmetric tail from Pythia8 should be subject to a detailed Monte Carlo study, we simply confirm that turning off final-state radiation by hand gets rid of it almost entirely. The remaining slight broadening, as well as a minimal tail towards smaller m_{tt̄} values, is due to small losses in the top 4-momentum reconstruction of the tagger. At higher values of m_{Z′} the asymmetric tail is further enhanced.
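The origin of the low-mass tail can be checked with a toy configuration in the resonance rest frame: an on-shell top, a hard gluon radiated off the intermediate top, and a recoiling second top. The momenta below are arbitrary illustrative values, not Pythia8 output.

```python
import math

M_TOP = 173.0  # GeV, illustrative


def four_vector(m, px, py, pz):
    """(px, py, pz, E) for a particle of mass m."""
    return (px, py, pz, math.sqrt(m * m + px * px + py * py + pz * pz))


def add(*ps):
    return tuple(sum(c) for c in zip(*ps))


def mass(p):
    px, py, pz, e = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


# Toy configuration in the resonance rest frame (3-momenta sum to zero):
t1 = four_vector(M_TOP, 0.0, 60.0, 600.0)   # on-shell top after emitting the gluon
g = four_vector(0.0, 0.0, -60.0, 150.0)     # hard FSR gluon off the intermediate top
t2 = four_vector(M_TOP, 0.0, 0.0, -750.0)   # recoiling second top

m_full = mass(add(t1, g, t2))  # resonance mass, recovered if the gluon is included
m_tops = mass(add(t1, t2))     # two-top mass when the gluon is lost
```

For a massless gluon in the rest frame one has m_tops² = m_full² − 2 m_full E_g, so dropping the gluon always shifts the reconstructed mass downwards, which is why the tail is one-sided.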
The problem with large asymmetric tails from final-state radiation is that they cannot simply be corrected for in a universal top tagger. The basic structure of the HEPTopTagger has to identify and reconstruct top quarks, rather than the decay products of a heavy Z′ resonance. Therefore, we do not modify the actual tagger, but we account for final-state radiation through an additional set of kinematic observables.
Following the brief discussion above, including the kinematics of the fat jet in addition to the reconstructed top 4-momentum should remove the broad asymmetric tail in the reconstructed m_{Z′} values. Again, we first select events with two tagged tops, including the top mass condition. Instead of using the 4-momenta of the tagged tops, we now reconstruct the Z′ mass from the 4-momenta of the two fat jets of size R = 1.5 which lead to the top tags. In the presence of underlying event and initial-state radiation the naive m_ff distribution peaks roughly at the correct Z′ mass and shows symmetric tails. To use the invariant mass of the two fat jets we need to apply filtering [11]. In the right panel of Fig. 3 we compare the filtered invariant mass from the two fat jets [11] and its pruned value [21], both as implemented in FastJet [24]. As a reference we also show the m_{tt̄} distribution from the left panel of the same figure. Unlike the reconstructed m_{tt̄} distribution, both the filtered and the pruned m_ff distributions give symmetric peaks around the correct m_{Z′} value.
To be able to use the filtered m_ff values in our HEPTopTagger analysis we confirm that filtering and pruning give stable numerical results for the invariant mass of the two fat jets. Results for different parameter settings are listed in Tab. II. We give the peak positions, which would be subject to a proper calibration, the fitted Breit-Wigner widths of the symmetric peaks, and the tagging performances for a fixed mass window |m_ff − m_{Z′}| < 150 GeV. Replacing the Breit-Wigner fit by a Gaussian would make no difference, but would give a poorer modelling of the tails. Typical widths of the reconstructed Z′ mass peak range around 145 GeV, roughly twice the assumed particle width of 65 GeV. Even in the absence of detector effects, this resolution replaces the assumed particle width of 65 GeV in all of the following analysis. The constant numbers in Tab. II confirm that the m_ff criterion is stable for different filtering parameters as well as for pruning.
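To see why the fit shape matters mainly in the tails, one can compare what fraction of an idealized peak falls inside the |m_ff − m_{Z′}| < 150 GeV window for a Breit-Wigner (Cauchy) shape and for a Gaussian with the same full width at half maximum of 145 GeV. This is a back-of-the-envelope sketch that treats the quoted width as the FWHM and ignores background and the actual line shape.

```python
import math


def breit_wigner_fraction(half_window, fwhm):
    """Fraction of a non-relativistic Breit-Wigner (Cauchy) peak with full width
    fwhm that lies within +-half_window of the peak position."""
    return (2.0 / math.pi) * math.atan(2.0 * half_window / fwhm)


def gaussian_fraction(half_window, fwhm):
    """Same fraction for a Gaussian with the same FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.erf(half_window / (sigma * math.sqrt(2.0)))


bw = breit_wigner_fraction(150.0, 145.0)  # about 0.71
ga = gaussian_fraction(150.0, 145.0)      # about 0.985
```

The Breit-Wigner shape leaves almost 30% of the idealized peak outside the window, against about 1.5% for the Gaussian, so the two shapes agree near the maximum but differ strongly in how much weight they assign to the tails.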
On the other hand, the results shown in Tab. II also indicate that simply replacing the m_{tt̄} window by a filtered m_ff window will not improve the Z′ extraction. In Fig. 4 we show that the steeply falling QCD jets background now has a maximum around m_ff = 1.3 TeV, while for the reconstructed top quarks there is a much more pronounced maximum around m_{tt̄} = 900 GeV. The reason is that top tagging removes events with many hard QCD jets in two steps: first by requiring the correct top mass value from three assumed top decay products, and second by applying the Z′ mass window. If we remove the first step, the second one has to deal with larger backgrounds at high m_ff values.
If we want to include final-state radiation and at the same time benefit from its additional information, we need to keep m_ff as well as m_{tt̄} in our analysis, and not apply a simple mass window on the m_{tt̄} distribution. The kinematics of the Z′ decay is then described by

$$\{\, m_{t\bar t},\ m_{ff},\ p_{T,t_1},\ p_{T,t_2},\ p_{T,f_1},\ p_{T,f_2} \,\} \qquad \text{(filtered fat jets)}. \qquad (3)$$
All default settings of the HEPTopTagger are listed in the Appendix. We filter the fat jets using R = 0.3 and keep the N = 5 hardest substructures. In the left panel of Fig. 5 we show the corresponding ROC curves. Unlike in the rest of the paper, we study the tt̄ and QCD jets backgrounds separately. The improvement of the full multivariate tagger including the fat jet information of Eq. (3) is obvious for both backgrounds. In the right panel of Fig. 5 we first show the same improvement, but using a BDT trained on the QCD jets background only. Compared to the original HEPTopTagger we achieve an improvement of up to a factor 2 in 1/ε_B for constant signal efficiency. We note that for the QCD background the combination of mistagged top kinematics and fat jet kinematics goes beyond the description of the hard process. For example, initial-state radiation, which is sensitive to the color structure of the signal and the background, will be captured in this combination of observables. On the other hand, because the fat jets are defined using the standard jet algorithms and show a stable filtering performance, we do not envision major experimental problems, provided pile-up subtraction works as well as expected.

Figure 5: Left: performance of the multivariate analysis including the information on the fat jet, as given in Eq. (2), Eq. (3) and Eq. (4). Only in this plot do we optimize for tt̄ and QCD backgrounds separately. Right: performance curve for the full analysis only accounting for the dominant QCD jet background.
The set of kinematic observables listed in Eq. (3) still relies on the deterministic HEPTopTagger output. This means that the identification of a Z′ signal event is limited by the efficiency of two top tags. The choice of a working point in the top tagging algorithm will therefore limit our overall efficiency. On the other hand, we already know that for hadronic Z′ searches the QCD jets background is dominant and will only be reduced through a combination of top tags and Z′ mass reconstruction.
In addition, we drop the fixed mass window for the reconstructed top mass m_rec. Instead, we widely open the top mass and W-mass constraints in the tagging algorithm. For each of the tops the corresponding m_rec value then becomes an output of the tagger. We provide the multivariate Z′ analysis with the smaller and larger of these two output m_rec values, which we label as m_rec^min and m_rec^max, respectively. Similarly, we avoid a fixed window for the ratio of the W-mass to the top mass, parametrized as f_W in the tagging algorithm. Its deviation from the true value is given by the value of f_rec defined in the Appendix. In the multivariate analysis we include the maximum of the two f_rec values corresponding to each tagged top, f_rec^max. Together with the variables of Eq. (3) this gives

$$\{\, m_{t\bar t},\ m_{ff},\ p_{T,t_1},\ p_{T,t_2},\ p_{T,f_1},\ p_{T,f_2},\ m_{\text{rec}}^{\min},\ m_{\text{rec}}^{\max},\ f_{\text{rec}}^{\max} \,\} \qquad \text{(variable masses)}. \qquad (4)$$
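A minimal sketch of how the per-top outputs are condensed into the event-level inputs m_rec^min, m_rec^max and f_rec^max. The `TopTag` container is hypothetical, and f_rec is taken as a given tagger output because its definition lives in the Appendix.

```python
from dataclasses import dataclass


@dataclass
class TopTag:
    """Hypothetical container for the per-top output of an opened-up tagger."""
    m_rec: float  # reconstructed top mass in GeV, no fixed window applied
    f_rec: float  # W/top mass-ratio deviation, as defined in the Appendix


def variable_mass_features(tag1: TopTag, tag2: TopTag) -> dict:
    """Event-level inputs m_rec^min, m_rec^max and f_rec^max for the BDT."""
    return {
        "m_rec_min": min(tag1.m_rec, tag2.m_rec),
        "m_rec_max": max(tag1.m_rec, tag2.m_rec),
        "f_rec_max": max(tag1.f_rec, tag2.f_rec),
    }
```

Ordering by size rather than by top-p_T keeps the inputs symmetric under exchange of the two tags, so the BDT does not have to learn that symmetry itself.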
The result is shown in the right panel of Fig. 5, where the range of accessible signal efficiencies eventually extends to 56%. Altogether, the analysis based on the set of kinematic variables shown in Eq. (4) gives us an improvement of up to a factor 5 in background rejection for a constant Z′ signal efficiency.

III. UPDATED TAGGER
Fat jets with a geometric size of R = 1.5 or even R = 1.8 have proven to be powerful new analysis objects at the LHC. The radius of the fat jet is directly related to the energy or boost of the heavy particles which can be captured: the larger the fat jet, the less boosted the tops whose decay products it can still contain. This means that a multi-purpose top tagger should be based on fat jets as large as possible. However, to realize their potential such large jets require additional treatment linked to their large geometric size. Without a dedicated analysis step, underlying event and pile-up will almost entirely wash out any structure inside the fat jet. Filtering [11], an integral part of all versions of the HEPTopTagger [4,5], effectively reduces the geometric size of the fat jet used to reconstruct the top 4-momentum by introducing a second clustering stage with higher resolution. This solves the problem with underlying event and pile-up, but there remains a combinatorial problem caused, for example, by initial-state radiation. In particular the softer of the two subjets from the W decay can easily be faked by a typical QCD jet inside the fat jet. This leads to a wrong reconstruction of the top 4-momentum, which we can only counter by applying harder tagging requirements and hence reducing the tagging efficiency. These so-called type-2 tags [20], where only two of the three top decay jets can be identified with a parton-level decay quark, have been the focus of HEPTopTagger studies at moderate boost [15,20,34]. In the reconstruction of heavy resonances we can solve the problem of (too) large fat jets by adapting the size of the fat jet to the kinematics of the tagged top. It turns out that this adaptive size of the fat jet also gives us another powerful kinematic variable for the multivariate analysis. Finally, we will show how this optimalR modification of our tagging algorithm can be further improved by including N-subjettiness variables.

OptimalR mode
There have been different attempts to adjust the size of the fat jet, for example based on the transverse momentum of the fat jet [12,27,31], but none of them led to a dramatic effect in the performance of taggers. We instead choose a purely algorithmic way of determining the minimum size of the fat jet [35]. Assuming that the three top decay jets are captured by the fat jet, we can run the standard HEPTopTagger algorithm to determine the top mass from the three leading subjets [15]. For a large fat jet size, typically R = 1.5 or R = 1.8, we compute a reference value of m_rec, which should be around the top mass. In the usual tagging algorithm, this computation of m_rec from filtered subjets takes into account final-state radiation off the on-shell top. We then reduce the size of the fat jet in steps of ΔR = 0.1 and compute the corresponding values of m_rec(R). In case of several possible triplets, this includes the step of choosing the one closest to the physical top mass, as described in step (5) in the Appendix. As a function of the decreasing jet size R, the reconstructed mass m_rec(R) will form a stable plateau until the reduced fat jet is too small to capture all three top decay jets. At this point m_rec(R) leaves the plateau and shows a significant drop. For R = 1.5, which is sufficient for the Z′ mass in our study, we define this drop as a significant deviation of m_rec(R) from the reference value m_rec(R = 1.5). Once the shrinking fat jet passes this condition we go back one step to the last R value on the plateau and define this value as R_opt. The smallest value we allow in this study is R_opt = 0.5, but for p_{T,t} ≳ 1 TeV this value can be adjusted in the tagger setup. Such small values could challenge the calorimeter resolution, so the corresponding results are subject to tests based on a full detector simulation in ATLAS and in CMS. In this paper we typically arrive around R_opt = 0.6. The tagging result for this R_opt value will be the output of the top tagger.
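The shrinking procedure can be sketched as a simple scan. Here `m_rec_of_r` is a hypothetical callable standing in for running the full HEPTopTagger on the fat jet reclustered with radius R, including the triplet choice, and the 20% drop criterion is a placeholder assumption, not necessarily the condition used in the tagger.

```python
def optimal_r(m_rec_of_r, r_max=1.5, r_min=0.5, step=0.1, threshold=0.2):
    """Return R_opt: the last radius on the m_rec(R) plateau when shrinking from r_max.

    m_rec_of_r: callable giving the reconstructed top mass for radius R,
                or None if no top candidate is found (hypothetical interface).
    threshold:  assumed fractional drop relative to m_rec(r_max) that marks
                the end of the plateau.
    """
    m_ref = m_rec_of_r(r_max)
    if m_ref is None:
        return None  # no top candidate even in the largest fat jet
    n_steps = round((r_max - r_min) / step)
    r_opt = r_max
    for i in range(1, n_steps + 1):
        r = round(r_max - i * step, 10)  # integer stepping avoids float drift
        m = m_rec_of_r(r)
        if m is None or m < (1.0 - threshold) * m_ref:
            break  # left the plateau: keep the previous radius
        r_opt = r
    return r_opt
```

For a toy m_rec(R) that stays at the top mass down to R = 0.7 and collapses at R = 0.6, the scan returns R_opt = 0.7; if the plateau never ends, it stops at the smallest allowed radius.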
Measuring R_opt defines another useful variable for the top tagger, because we can also predict R_opt from the fat jet kinematics. A similar reasoning is used in the original HEPTopTagger algorithm, where a consistency condition on the reconstructed top momentum, p_{T,t} > 200 GeV, ensures that the reconstructed top can actually be captured in the fat jet. In the optimalR mode we first determine the transverse momentum of the filtered fat jet, p_{T,f}, as described in the previous section. Including up to the ten hardest subjets after a filtering step with R_filt = 0.2 turns out to give the best estimate of p_{T,f} for this purpose. Reducing this number to five subjets has no measurable effect on the width of the reconstructed p_{T,f} distribution, but slightly shifts its maximum to smaller values [35]. The final number will be subject to an independent optimization in ATLAS and CMS.
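The filtered transverse momentum p_{T,f} can be sketched as the vector sum of the hardest filtered subjets. The filtering itself (reclustering with R_filt = 0.2) is assumed to have been done already; subjets are given as (px, py) pairs, and the function name is illustrative.

```python
import math


def filtered_pt(subjets, n_keep=10):
    """pT of the vector sum of the n_keep hardest filtered subjets.

    subjets: list of (px, py) pairs in GeV from a preceding filtering step.
    """
    hardest = sorted(subjets, key=lambda p: math.hypot(*p), reverse=True)[:n_keep]
    return math.hypot(sum(p[0] for p in hardest), sum(p[1] for p in hardest))
```

Lowering `n_keep` from ten to five drops only the softest subjets, consistent with the small downward shift of p_{T,f} described above.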
For p_{T,f} > 200 GeV we derive a closed form by fitting a function R_opt^(calc) ∝ 1/p_{T,f} to simulated data, as described in the Appendix. The kinematic variables in our multivariate tagger now read

$$\{\, m_{t\bar t},\ m_{ff},\ p_{T,t_1},\ p_{T,t_2},\ p_{T,f_1},\ p_{T,f_2},\ m_{\text{rec}}^{\min},\ m_{\text{rec}}^{\max},\ f_{\text{rec}}^{\max},\ R_{\text{opt}} - R_{\text{opt}}^{(\text{calc})} \,\} \qquad \text{(optimalR)}. \qquad (6)$$

Figure 6: Performance of the optimalR mode based on the kinematic variables in Eq. (6), including N-subjettiness variables as defined in Eq. (8), and including Qjets. As described in the text, for Qjets we need to require a finite calorimeter resolution (0.1 × 0.1 cells), while all other curves do not include any detector effects. We only consider the dominant QCD background.
For this case of two top tags we use the value of R_opt − R_opt^(calc) with the larger deviation among the two tagged tops. In this form all subsequent kinematic variables linked to the top tags will be evaluated with the fat jet size R_opt. For the Z′ search R_opt^(calc) will be strongly correlated with other kinematic variables listed in Eq. (6). We nevertheless include it in the BDT, because the general multivariate HEPTopTagger2 described in the Appendix will not include the top momenta in the tagging. The increase of the tagging performance from the optimalR mode is shown in the left panel of Fig. 6. While for small signal efficiencies the curves for optimalR and for the variable mass setup of Eq. (4) are identical within numerical fluctuations, we observe a significant improvement for larger signal efficiencies.
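Fitting R_opt^(calc) = a/p_{T,f} is a one-parameter least-squares problem with the closed-form solution a = Σ_i (R_i/p_{T,i}) / Σ_i (1/p_{T,i}²). The sketch below, with hypothetical function names and toy numbers, fits that coefficient and then picks, of the two tagged tops, the R_opt − R_opt^(calc) with the larger magnitude; the fit actually used is described in the Appendix.

```python
def fit_inverse_pt(pt_f, r_opt):
    """Least-squares coefficient a of R_opt = a / pT_f (a in GeV)."""
    num = sum(r / pt for r, pt in zip(r_opt, pt_f))
    den = sum(1.0 / pt ** 2 for pt in pt_f)
    return num / den


def r_opt_deviation(tops, a):
    """R_opt - R_calc of the tagged top with the larger |deviation|.

    tops: list of (R_opt, pT_f) pairs, one per tagged top.
    """
    return max((r - a / pt for r, pt in tops), key=abs)
```

Keeping the sign of the larger deviation preserves whether the measured radius lies above or below the kinematic expectation.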

N-subjettiness
The arguably simplest question we can ask as part of a top tagger is the number of hard subjets inside a fat jet with a given jet mass. This number of subjets can be defined through an observable similar to event shapes like thrust, called N-subjettiness [32,33]. It is based on N reference axes which are required to match the hard substructures k,

$$\tau_N = \frac{\sum_k p_{T,k}\,\min\{\Delta R_{1,k},\ \Delta R_{2,k},\ \dots,\ \Delta R_{N,k}\}}{\sum_k p_{T,k}\,R_0}, \qquad (7)$$

where ΔR_{i,k} is the geometric separation between the axis i and the substructure k. In this form N-subjettiness parametrizes the deviation of the energy flow away from N jets, not only related to an integer number of subjets, but also reflecting the color structure and the related radiation pattern.
Relative to the original definition [32] we fix the exponent to β = 1. R_0 is an intrinsic cone size, chosen such that τ_N < 1. Small values τ_N → 0 indicate that the complete substructure is described by N axes, i.e. that there are at most N relevant substructures. The ratio τ_N/τ_{N−1} will therefore become small for a fat jet with N hard subjets. For top tagging the ratio τ_3/τ_2 is the most useful and can even be used as a tagger by itself. The τ_N with higher N contribute to a multivariate analysis of N-subjettiness, describing the jet radiation pattern around the assumed three partonic top decay momenta.
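A direct transcription of the τ_N definition with β = 1, as a sketch: constituents are (p_T, y, φ) triplets and the N axes are supplied by the caller. How the axes are found (for example by exclusive clustering or by a minimization) is not shown and is left as an assumption.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Geometric separation in the (y, phi) plane with phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def tau_n(constituents, axes, r0=1.0):
    """N-subjettiness with beta = 1 for N = len(axes).

    constituents: list of (pT, y, phi); axes: list of (y, phi).
    """
    d0 = sum(pt for pt, _, _ in constituents) * r0
    total = sum(pt * min(delta_r(y, phi, ay, aphi) for ay, aphi in axes)
                for pt, y, phi in constituents)
    return total / d0
```

For two equal-p_T constituents separated by ΔR = 0.4, one axis gives τ_1 = 0.2 while two axes on the constituents give τ_2 = 0, so τ_2/τ_1 → 0 flags a two-prong structure in the same way τ_3/τ_2 flags a three-prong top decay.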
We will use N-subjettiness as an additional variable in our multivariate HEPTopTagger. Originally, this combination did not lead to a significant improvement when added to the A-shaped cuts [15]. However, when we open the cut f_W on the reconstructed ratio m_W/m_t we observe a significant improvement for the extended set of kinematic variables. The complete set of relevant kinematic variables, now including N-subjettiness variables before and after filtering, is

$$\{\, m_{t\bar t},\ m_{ff},\ p_{T,t_1},\ p_{T,t_2},\ p_{T,f_1},\ p_{T,f_2},\ m_{\text{rec}}^{\min},\ m_{\text{rec}}^{\max},\ f_{\text{rec}}^{\max},\ R_{\text{opt}} - R_{\text{opt}}^{(\text{calc})},\ \tau_{1,N},\ \tau_{2,N},\ \tau^{\text{filt}}_{1,N},\ \tau^{\text{filt}}_{2,N} \,\} \qquad \text{($N$-subjettiness)}. \qquad (8)$$

As in Eq. (6), all kinematic variables linked to the top tag will be evaluated with the fat jet size R_opt. The details of the implementation of the N-subjettiness variables are discussed in the Appendix.

Qjets
The main limitation even of the deterministic multivariate HEPTopTagger is the aim to identify a unique set of subjets from the top decay as part of the tagging procedure, which allows us to reconstruct the 4-momentum of the tagged top and, for example, compare it to Monte Carlo truth. If the kinematic selection identifies a wrong set of subjets as the best candidates for the top decay products, an actual top decay can easily fail the tagging procedure. To avoid this loss in signal efficiency we can allow for more than one set of candidate subjets to be tested. One approach that not only covers several candidate subjet combinations, but even allows for a statistical analysis of many such assignments, is Qjets [26].
During the clustering of the fat jet the standard recombination algorithms combine the closest pair of pre-jets according to a given measure. For the C/A algorithm this measure is the geometric separation d_ij = ΔR_ij² of the pre-jets i and j. Qjets generalizes this deterministic choice to a likelihood measure. For each pair of pre-jets (i, j) it computes the weight

$$\omega_{ij}^{(\alpha)} = \exp\left\{ -\alpha\, \frac{d_{ij} - d_{\min}}{d_{\min}} \right\}, \qquad (9)$$

where d_min is the smallest d_ij among all pairs, and then chooses the two pre-jets to cluster by drawing a random number distributed according to the weights ω_ij^(α). For this study we choose α = 0.1, to balance the convergence of the algorithm with our aim of generating alternative subjet assignments for the top tagger. The standard jet algorithm corresponds to the limit α → ∞. The global weight for a clustering history is defined as the product of the weights of all its merging steps,

$$\Omega^{(\alpha)} = \prod_{\text{merging steps}} \omega_{ij}^{(\alpha)}. \qquad (10)$$

The universal limiting case Ω^(α) → 1 for a perfect clustering history indicates that in searching for the largest global weight Ω the choice of α should not make a major difference. The Qjets clustering procedure can be repeated many times; in this study we typically rely on 100 clustering histories. They can be ranked by their global weights Ω^(α) instead of the independent local weights used by a deterministic jet algorithm. For each history we apply the unclustering and top tagging algorithm. As long as the deterministic jet algorithm picks a reasonable merging history for a signal event, we expect the outcome of the deterministic tagger and of the tagger acting on the clustering history with the highest global weight to be close.
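To make the random clustering concrete, here is a toy Qjets-style clustering in pure Python. At each step it draws a pair of pre-jets with probability proportional to the weight ω_ij^(α) = exp[−α (d_ij − d_min)/d_min] with the C/A measure d_ij = ΔR_ij², recombines by adding 4-vectors, and multiplies the chosen weights into the global weight Ω^(α). It clusters all inputs into a single jet and applies no tagging; it is a sketch of the idea, not the Qjets code used in this study.

```python
import math
import random


def _y_phi(p):
    px, py, pz, e = p
    return 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)


def _d_ca(p, q):
    """C/A distance measure d_ij = DeltaR_ij^2."""
    (y1, f1), (y2, f2) = _y_phi(p), _y_phi(q)
    dphi = (f1 - f2 + math.pi) % (2.0 * math.pi) - math.pi
    return (y1 - y2) ** 2 + dphi ** 2


def qjets_history(particles, alpha, rng):
    """Randomly cluster (px, py, pz, E) tuples into one jet; return (jet, Omega)."""
    jets = list(particles)
    omega = 1.0
    while len(jets) > 1:
        pairs = [(i, j) for i in range(len(jets)) for j in range(i + 1, len(jets))]
        dists = [_d_ca(jets[i], jets[j]) for i, j in pairs]
        d_min = min(dists)
        if d_min == 0.0:
            # exactly collinear pre-jets: merge one of those pairs deterministically
            weights = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            weights = [math.exp(-alpha * (d - d_min) / d_min) for d in dists]
        k = rng.choices(range(len(pairs)), weights=weights)[0]
        omega *= weights[k]  # the closest pair has weight 1, all others < 1
        i, j = pairs[k]
        merged = tuple(a + b for a, b in zip(jets[i], jets[j]))  # E-scheme recombination
        jets = [p for n, p in enumerate(jets) if n not in (i, j)] + [merged]
    return jets[0], omega
```

With a very large α every non-minimal weight underflows to zero, the history reproduces the deterministic C/A sequence and Ω = 1; with α = 0.1 repeated calls with different random seeds produce alternative histories with Ω < 1.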
The first advantage of Qjets appears when, during an early clustering step, the deterministic measure $d_{ij}$ identifies the wrong merging, in the sense that the remaining history cannot be described well by QCD. This deterministic history will by definition receive the maximum global weight $\Omega^{(\alpha)} = 1$. However, an alternative history in better agreement with QCD could reach a similarly large global weight. Because Qjets provides many alternative clustering histories, we can search for a set of top tags with comparably large global weights. For example, we can use the two positively tagged Qjets histories with the highest global weights in the multivariate analysis. This way, a possibly misleading deterministic result can be corrected. This should improve the performance in particular when we enforce high signal efficiencies, where the tagger is most vulnerable to a wrong clustering input. It turns out that already this simple modification gives a sizeable improvement in the signal efficiency.
The second improvement to the usual top tagger is based on the HEPTopTagger output for the full set of 100 clustering histories. First, we include the fraction $\varepsilon_\text{Qjets}$ of the 100 Qjets histories that give a positive top tag with the default HEPTopTagger settings, as introduced in the Appendix. Next, we extract statistical information from distributions over the Qjets histories, for example of the reconstructed top mass $m_\text{rec}$. This distribution is defined for the $\varepsilon_\text{Qjets} \times 100$ tagged histories. For signal events it will peak strongly around the top mass, with a possible secondary peak around the $W$-mass, while QCD background events will instead show a smooth decrease. The two most relevant observables of the $m_\text{rec}$ distribution are its mean and its variance, symbolically denoted as $\{m_\text{rec}^\text{Qjets}\}$.
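A minimal sketch of these summary observables, assuming the tagger has already been run on each Qjets history and returns the reconstructed top mass, or `None` when the history gives no top tag. The function name and the use of the population variance are our assumptions for illustration.

```python
import statistics


def qjets_summary(m_rec_per_history):
    """Return (eps_Qjets, mean, variance) of m_rec over the Qjets histories.

    m_rec_per_history holds one entry per history: the reconstructed top mass
    in GeV if that history gives a top tag with the default settings, else None.
    The mean and (population) variance use only the tagged histories.
    """
    tagged = [m for m in m_rec_per_history if m is not None]
    eps_qjets = len(tagged) / len(m_rec_per_history)
    if not tagged:
        return eps_qjets, None, None
    return eps_qjets, statistics.mean(tagged), statistics.pvariance(tagged)


# Toy input: 60 of 100 histories tagged, half at 172 GeV and half at 168 GeV.
example = [172.0] * 30 + [168.0] * 30 + [None] * 40
eps, mean, var = qjets_summary(example)
```

A signal-like fat jet gives a large $\varepsilon_\text{Qjets}$ and a narrow distribution near $m_t$, while a QCD fat jet tends to give few tags with a broad spread.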
We base our multivariate analysis on the second approach. We start with the top-tagged Qjets history with the highest global weight and run the tagging algorithm on this history only. In addition, we include the statistical information from the $m_\text{rec}$ distribution of the subset of the 100 Qjets histories which define a top candidate. The complete list of observables including the Qjets information, Eq. (11), adds $\varepsilon_\text{Qjets}$ and $\{m_\text{rec}^\text{Qjets}\}$ to the $N$-subjettiness set of Eq. (8); here $\{\tau_N\}$ represents the appropriate set of filtered and unfiltered $N$-subjettiness variables (for example $N = 1, 2, 3$ for each of the two tops). For the two tags in the $Z'$ analysis we use the smaller of the two $\varepsilon_\text{Qjets}$ values. All variables from the tagger are evaluated for the optimized $R$ size and the clustering history with the largest global weight.
In Fig. 6 we show the effect of the Qjets histories in addition to the other improvements. A key difference between the previous discussion and the Qjets approach is that we now need to include some kind of detector resolution, to limit Qjets to a manageable number of significantly different merging histories. For that reason we divide the calorimeter into $\eta \times \phi$ cells of size $0.1 \times 0.1$ and pre-cluster the entire set of calorimeter entries before applying any jet algorithm. Because this detector resolution effect is not included in the previous results, the Qjets ROC curve does not consistently exceed the $N$-subjettiness curve without Qjets. On the other hand, we still observe the expected improvement towards large signal efficiencies. The only moderate drop at small signal efficiencies gives us confidence that a full detector simulation will not significantly degrade our results.
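The pre-clustering into calorimeter cells can be sketched as follows. This is a simplified stand-in, not the procedure used for Fig. 6: particles are toy `(pT, eta, phi)` tuples, the transverse momenta in each $0.1 \times 0.1$ cell are summed, and each cell is represented by a massless pseudo-particle at its center. Because $2\pi$ is not a multiple of 0.1, the last $\phi$ cell is narrower in this sketch.

```python
import math
from collections import defaultdict


def pixelize(particles, cell=0.1):
    """Sum the pT of all particles falling into the same eta x phi cell.

    particles: iterable of (pt, eta, phi); phi may lie in any 2*pi interval.
    Returns one (pt, eta_center, phi_center) entry per non-empty cell.
    """
    cells = defaultdict(float)
    for pt, eta, phi in particles:
        phi = phi % (2.0 * math.pi)  # map phi into [0, 2*pi)
        key = (math.floor(eta / cell), math.floor(phi / cell))
        cells[key] += pt
    return [(pt, (ieta + 0.5) * cell, (iphi + 0.5) * cell)
            for (ieta, iphi), pt in cells.items()]


toy = [(10.0, 0.01, 0.02), (5.0, 0.05, 0.08), (7.0, 0.15, 0.02)]
cells = pixelize(toy)
```

The first two toy particles share a cell and merge into one 15 GeV entry, which is exactly the effect that limits the number of distinct Qjets merging histories.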

IV. FULL EVENT INFORMATION
Going back to the discussion in Sec. II, the remaining question is how the performance of the new HEPTopTagger2 compares to other approaches designed for the upcoming LHC run. The benchmark for such a comparison is event deconstruction, or more specifically its projections for a $Z'$ resonance search [17]. As mentioned in our discussion of jet radiation in Sec. II, the borders between the hard process or the $Z'$ decay on the one side and QCD jet radiation, with its sensitivity to the signal and background color structure, on the other side are washed out when we include, for example, filtered subjets or $N$-subjettiness information. We therefore start with a brief discussion of the additional information from jets in the entire event and then move on to the comparison with the leading benchmark among proposed $Z'$ analyses.

Additional jets
To determine to what degree the jet structure of purely hadronic $Z' \to t\bar{t}$ events helps to extract the signal from the $t\bar{t}$ and QCD multi-jet backgrounds, we first study the number and kinematic distribution of small C/A jets with $R = 0.2$ and $p_{T,j} > 10$ GeV in addition to the fat jets fulfilling Eq. (1). We choose these very small jets in order to test information which might be available from the so-called microjets in shower deconstruction. Our discussion should not be applied one-to-one to an LHC analysis; instead, it aims at capturing as much information as possible. Without any major cuts, the jets will consist of three decay jets per top quark, FSR jets, and ISR jets. For an inclusive event sample we should be able to tell apart the different processes from the number of jets and the kinematics of the individual jets [36].
After a first level of cuts we see in Fig. 7 that the $Z'$ signal and the $t\bar{t}$ background both peak at 10 microjets, i.e. four jets from ISR and FSR combined in addition to the six top decay jets. For the QCD background the number is slightly larger, because the scale of the hard process can also be generated through a large number of jets. We also see that the transverse momentum of the hardest jet is slightly larger for the signal. We could include these jet patterns in a multivariate analysis, but at this stage this information would be very heavily correlated with the variables from the top tagger.
In a second step we focus on the jet activity which does not contribute to the top tagging. Inside the fat jets we know that the top tagger includes information based on subjets with typically $R = 0.3$ and $p_T \gtrsim 20$ GeV after filtering. After two tags we remove all calorimeter entries associated with the filtered triplets of the two top candidates and re-cluster the remnants into microjets with $R = 0.2$ and $p_{T,j} > 10$ GeV. In the lower panels of Fig. 7 we see that after removing the top decay jets the remaining number of jets peaks around two ISR or FSR jets. For the QCD background this number is higher, because it takes a larger number of equally distributed jets in the detector to fake a boosted massive top inside each fat jet. The transverse momentum of the hardest remaining jet peaks at very small values for the signal and the $t\bar{t}$ background, as one would expect, for example, for a small number of ISR jets. For the QCD background the bulk of the hardest remaining jets per event has transverse momenta around $p_{T,j} = 50 - 200$ GeV, still small compared to the hard scale imprinted on the multi-jet background through the kinematic selection of Eq. (1). We should be able to use this additional information in our BDT analysis to improve the signal extraction. In the right panel of Fig. 7 we see the corresponding ROC curve. It turns out that almost all of the information available through the extra jet radiation is already included in our combined analysis of top tags and subjet kinematics.
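The removal and re-clustering step can be sketched in a few lines of Python. This is a naive $\mathcal{O}(n^3)$ Cambridge/Aachen implementation for illustration only (an actual analysis would use FastJet); particles are massless four-vectors `(px, py, pz, E)` built from toy `(pT, y, phi)` values, and the constituents of the filtered top triplets are simply passed in as a set of indices, a hypothetical interface.

```python
import math


def four_vector(pt, y, phi):
    """Massless four-vector (px, py, pz, E) from pT, rapidity and azimuth."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))


def pt_of(p):
    return math.hypot(p[0], p[1])


def rapidity_phi(v):
    return 0.5 * math.log((v[3] + v[2]) / (v[3] - v[2])), math.atan2(v[1], v[0])


def delta_r2(p, q):
    """Delta R^2 in the (y, phi) plane, with the phi difference wrapped into [0, pi]."""
    (y1, f1), (y2, f2) = rapidity_phi(p), rapidity_phi(q)
    dphi = abs(f1 - f2)
    dphi = min(dphi, 2.0 * math.pi - dphi)
    return (y1 - y2) ** 2 + dphi ** 2


def cambridge_aachen(particles, radius):
    """Inclusive C/A clustering with E-scheme (four-vector) recombination.

    C/A merges the closest pair while Delta R_ij < R; since the beam distance is
    the same for all pseudo-jets, the remaining ones are then the final jets.
    """
    active = list(particles)
    while len(active) > 1:
        pairs = [(i, j) for i in range(len(active)) for j in range(i + 1, len(active))]
        i, j = min(pairs, key=lambda ij: delta_r2(active[ij[0]], active[ij[1]]))
        if delta_r2(active[i], active[j]) >= radius ** 2:
            break
        merged = tuple(a + b for a, b in zip(active[i], active[j]))
        active = [p for n, p in enumerate(active) if n not in (i, j)] + [merged]
    return active


def remaining_microjets(particles, triplet_indices, radius=0.2, pt_min=10.0):
    """Drop the constituents of the filtered top triplets, re-cluster the rest."""
    rest = [p for n, p in enumerate(particles) if n not in triplet_indices]
    jets = [j for j in cambridge_aachen(rest, radius) if pt_of(j) > pt_min]
    return sorted(jets, key=pt_of, reverse=True)


event = [four_vector(30.0, 0.0, 0.0), four_vector(20.0, 0.05, 0.05),
         four_vector(15.0, 1.0, 2.0), four_vector(5.0, -1.0, -2.0),
         four_vector(200.0, 2.0, 1.0)]
extra = remaining_microjets(event, triplet_indices={4})
```

In this toy event the two nearby particles form one microjet of about 50 GeV, the isolated 15 GeV particle forms a second one, and the 5 GeV particle falls below the threshold; the number and $p_T$ of such jets are the observables discussed above.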
Based on this piece of information we assume that additional jet information inside and outside the fat jets hardly changes the stable results of the updated top tagger, so we can compare the new HEPTopTagger2 to other multivariate methods.

Figure 8: Comparison of the multivariate HEPTopTagger2 analysis presented in this paper with the event deconstruction approach of Ref. [17]. All HEPTopTagger2 curves correspond to Fig. 6 (variable masses (4), optimalR (6), N-subjettiness (8), Qjets (11, 0.1×0.1 cells)), but now with a collider energy of $\sqrt{s} = 14$ TeV instead of 13 TeV. This comparison, in the absence of an experimental validation, should be taken as a first estimate.

Comparison with other approaches
The most promising projections for boosted top identification, and specifically for searches for $t\bar{t}$ resonances during the upcoming LHC runs, are available for shower deconstruction [16] and event deconstruction [17]. This method is based on the construction of likelihoods representing possible shower histories for a jet or a fat jet. The underlying objects are so-called C/A [22,24] microjets with $R = 0.2$ and $p_T > 10$ GeV [17]. They are slightly softer and smaller than the subjets in a typical top tagger, but we have seen that the additional information from these jets should not make a big difference. Unlike general template methods, shower deconstruction relies on the soft and/or collinear approximation of QCD to compute the likelihood of a given shower history in terms of splitting probabilities and Sudakov factors (non-splitting probabilities). Based on the possible shower histories, the likelihood ratio for a fat jet coming from a boosted top quark or from the QCD jet background acts as a measure for the top tag. One problem with shower deconstruction, as with any probabilistic approach, is that we cannot separate the identification and the reconstruction of the boosted top quark. This means we cannot, for example, show the quality of the reconstructed 4-momentum compared to Monte Carlo truth.
The $Z'$ analysis using event deconstruction starts with two fat jets of size $R = 1.5$ and the acceptance cuts given in Eq. (1). The number of microjets is limited to 9 per fat jet. In addition to the likelihood separating the top or QCD origin of each of the two fat jets, the event likelihood measure now also includes a likelihood describing the resonant or non-resonant production of the pair of fat jets given their 4-momenta. At the level of the hard process this part is not very different from the established matrix element method [37], and it largely replaces an analysis of the $m_{t\bar{t}}$ and $p_{T,t}$ distributions defining the multivariate analysis of Eq. (2). In Ref. [17] the observable width of the $m_{t\bar{t}}$ resonance is assumed to be around 65 GeV, an assumption we follow. In our analysis the precise resolution, for example after detector effects, only plays a secondary role, because the resolution of the HEPTopTagger2 is limited to 145 GeV, as shown in Tab. II.
In Fig. 8 we compare the performance of the analysis developed in this paper with the recent event deconstruction benchmark. One difference to the HEPTopTagger results shown in Fig. 6 is that we now show $Z'$ efficiencies up to 68%, confirming that Qjets indeed gives us a major improvement for very large signal efficiencies. Another difference is that for a direct comparison we now assume a collider energy of 14 TeV. Event deconstruction and the new HEPTopTagger2 show a comparable performance for the upcoming run. The final verdict on both methods can only come from experimental studies including data.

V. CONCLUSION
We demonstrated how the updated HEPTopTagger2 performs in searches for $Z'$ bosons or other heavy resonances decaying to top pairs in the upcoming LHC run. Based on the original HEPTopTagger [5] we modify the tagging algorithm and add several kinematic variables to a multivariate analysis:
- fat jet kinematics, to account for final-state radiation in resonance searches;
- an algorithmically optimized size of the original fat jet, combined with its predicted value (optimalR mode);
- N-subjettiness, probing the more general subjet structures inside the fat jet;
- Qjets, with a global picture of the most likely clustering histories giving a top tag.
Each of these improvements can be added to the top tagging individually. Altogether, for the specific $Z'$ resonance search we achieve an increase in the background rejection by a factor of 30 for a constant $Z'$ signal efficiency of 10%. Compared to the original tagger [5], the background sculpting in the invariant mass of the top pair is significantly reduced [15]. These updated results are at least competitive with the leading estimates for other tagging methods.
Because the multivariate $Z'$ analysis includes several layers of improvement not necessarily linked to the actual top tagging, we also show in the Appendix the corresponding improvements for top tagging in $t\bar{t}$ events. There, we test the updated tagger for moderate ($p_{T,t} > 200$ GeV) and sizeable ($p_{T,t} > 600$ GeV) boost and find a significant improvement, in particular for larger boost. The limiting factor for moderate boost is still capturing all three top decay jets inside a fat jet, which has to be targeted by a dedicated low-$p_T$ mode [20]. The corresponding HEPTopTagger2 described in the Appendix will be made publicly available [5,38]. In particular for Qjets there exist different modes which need to be tested on data.
Comparing the improvement in the $Z'$ analysis with that in the individual top tags shows that the benefits for the full $Z'$ case are significantly larger than those from the top tags alone. One lesson is that it is useful to optimize top tagging not only in its own right, but also in the context of full search analyses.

Appendix: HEPTopTagger2
In the past it has proven useful to publish details about the HEPTopTagger algorithm. In this Appendix we describe the new structure, reflecting all changes in Refs. [5,15,20]. Because the main body of the paper focuses on the performance in resonance searches, we then present benchmark results based on purely hadronic $t\bar{t}$ events in the Standard Model. They can be directly translated, for example, to semi-leptonically decaying $t\bar{t}$ pairs. Finally, the enhanced capabilities of the HEPTopTagger2 have led to enough complexity in the actual code that we briefly describe the run modes, the input parameters, and the available output information from the tagger.

Algorithm
The basic HEPTopTagger2 algorithm largely follows the original algorithm described in Ref. [5], but is based on FastJet3 [24] and includes a number of new features:
1. define a C/A fat jet with $R_\text{fat} = 1.8$ and determine its splitting history through the default clustering.
2. identify all hard subjets using a mass drop criterion: undo the last clustering of the jet $j$ into two subjets $j_1, j_2$ with $m_{j_1} > m_{j_2}$; if $m_{j_1} < f_\text{drop}\, m_j$ with $f_\text{drop} = 0.8$, keep both, otherwise keep only $j_1$; further decompose each kept subjet $j_i$ with $m_{j_i} > m_\text{min} = 30$ GeV, otherwise add it to the list of relevant substructures. The global soft cutoff $m_\text{min}$ can be adjusted †.
3. iterate through all triplets of hard subjets: filter them with resolution $R_\text{filt} = \min(0.3, \Delta R_{jk}/2)$; use the $N_\text{filt} = 5$ hardest filtered constituents and calculate their combined jet mass; re-cluster these five subjets into three assumed top decay jets; reject all triplets outside $m_{123} \equiv m_\text{rec} \in [150, 200]$ GeV; keep the event if at least one such triplet exists. For the multivariate analysis this window is opened to $m_\text{rec} < 1$ TeV, which allows us to use $m_\text{rec}$ as a kinematic output of the tagger.
This set of re-clustering and filtering steps by default uses the C/A jet algorithm [22]. However, to guarantee infrared safety and to enhance the performance at large boosts [20], it can be switched to $k_T$ jets [23].
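4. order the three subjets $j_1, j_2, j_3$ by $p_T$; if the masses $(m_{12}, m_{13}, m_{23})$ satisfy one of the following three criteria, accept them as a top candidate:
$$
\begin{aligned}
&0.2 < \arctan\frac{m_{13}}{m_{12}} < 1.3 \quad \text{and} \quad R_\text{min} < \frac{m_{23}}{m_{123}} < R_\text{max} , \\
&R_\text{min}^2 \left[1 + \left(\frac{m_{13}}{m_{12}}\right)^2\right] < 1 - \left(\frac{m_{23}}{m_{123}}\right)^2 < R_\text{max}^2 \left[1 + \left(\frac{m_{13}}{m_{12}}\right)^2\right] \quad \text{and} \quad \frac{m_{23}}{m_{123}} > 0.35 , \\
&R_\text{min}^2 \left[1 + \left(\frac{m_{12}}{m_{13}}\right)^2\right] < 1 - \left(\frac{m_{23}}{m_{123}}\right)^2 < R_\text{max}^2 \left[1 + \left(\frac{m_{12}}{m_{13}}\right)^2\right] \quad \text{and} \quad \frac{m_{23}}{m_{123}} > 0.35 ,
\end{aligned}
$$
with $R_\text{min} = (1 - f_W)\, m_W/m_t$ and $R_\text{max} = (1 + f_W)\, m_W/m_t$, where $f_W = 0.15$ by default;
5. of all triplets passing the above criteria in a given fat jet, choose the one with $m_{123} \equiv m_\text{rec}$ closest to $m_t$. This selection has been shown to be the most efficient, and applying it after all kinematic cuts minimizes the background sculpting. The $m_\text{rec}$ and $f_\text{rec}$ values supplied to the multivariate analysis are those corresponding to this triplet.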
6. for consistency, require the reconstructed $p_{T,t}$ to exceed 200 GeV.
7. in the low-$p_T$ mode [20], reduce this threshold to $p_{T,t} > 150$ GeV; compute the Fox-Wolfram moments [25] of the subjets relative to each other and relative to the reconstructed top momentum. This mode is not part of the usual tagger and relies on the external GSL library [39] for Legendre polynomials.
8. in the optimalR mode, repeat steps 1 to 3 with a fat jet radius decreasing in steps of $\Delta R = 0.1$; based on the condition $m_\text{rec}^{(1.8)} - m_\text{rec} > 0.2\, m_\text{rec}^{(1.8)}$, determine the minimum radius $R_\text{opt} > 0.5$; follow steps 4 to 6 with this modified fat jet. We also parametrize the expected value $R_\text{opt}^\text{(calc)}$ in terms of $p_{T,f}$, based on the numerical simulation of the top decay kinematics illustrated in Fig. 9.
9. in the $N$-subjettiness mode [15], compute the $\tau_j$ [32] as defined in Eq. (7) from the filtered and unfiltered subjets, as described below. Again, this mode is not part of our tagger code and relies on the FastJet Contrib [24,38] add-on for $N$-subjettiness [32].

Figure 10: Performance of the HEPTopTagger2 for $t\bar{t}$ production in the Standard Model. We show the incremental improvements from the extended multivariate analyses for top quarks with $p_{T,t} > 200$ GeV and $p_{T,t} > 600$ GeV (curves labeled by their observable sets: Eq. (17), optimalR (18), N-subjettiness (20), Qjets (21)).
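The three mass-plane criteria of step 4, as inherited from the original HEPTopTagger [5], translate directly into code. The following Python sketch is an illustration, not the HEPTopTagger2 implementation; the reference masses $m_W = 80.4$ GeV and $m_t = 172.3$ GeV are assumed values. For massless subjets $m_{123}^2 = m_{12}^2 + m_{13}^2 + m_{23}^2$, so the three criteria select $m_{23}$, $m_{12}$, or $m_{13}$ close to $m_W$, respectively.

```python
import math


def mass_plane_accept(m12, m13, m23, m123, f_w=0.15, m_w=80.4, m_t=172.3):
    """A-shaped mass-plane cuts for pT-ordered subjets j1, j2, j3.

    Returns True if at least one of the three criteria is satisfied.
    m_w and m_t are assumed reference values in GeV.
    """
    r_min = (1.0 - f_w) * m_w / m_t
    r_max = (1.0 + f_w) * m_w / m_t
    r23 = m23 / m123
    x = 1.0 - r23 ** 2
    # Criterion 1: m23 close to m_W, plus a cut on the ratio m13/m12.
    c1 = 0.2 < math.atan(m13 / m12) < 1.3 and r_min < r23 < r_max
    # Criterion 2: for massless subjets x = (m12/m123)^2 (1 + (m13/m12)^2), i.e. m12 ~ m_W.
    k2 = 1.0 + (m13 / m12) ** 2
    c2 = r_min ** 2 * k2 < x < r_max ** 2 * k2 and r23 > 0.35
    # Criterion 3: the same with m12 <-> m13, i.e. m13 ~ m_W.
    k3 = 1.0 + (m12 / m13) ** 2
    c3 = r_min ** 2 * k3 < x < r_max ** 2 * k3 and r23 > 0.35
    return c1 or c2 or c3


# Top-like triplet of massless subjets: m12 = m_W, total mass m_t.
m23_top = math.sqrt(172.3 ** 2 - 80.4 ** 2 - 120.0 ** 2)
top_like = mass_plane_accept(80.4, 120.0, m23_top, 172.3)
# Symmetric QCD-like triplet: all pair masses equal, no W candidate.
qcd_like = mass_plane_accept(100.0, 100.0, 100.0, math.sqrt(3.0) * 100.0)
```

The cuts only involve mass ratios, so they are scale invariant; the absolute scale enters through the separate $m_\text{rec}$ window of step 3.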
Following this description, the low-$p_T$ (7) and $N$-subjettiness (9) modes simply add kinematic observables to the tagger output. These observables can be included in a multivariate analysis or can be cut on in the deterministic top tagging decision. The improvement in the low-$p_T$ mode is illustrated in detail in Ref. [15], while the impact of $N$-subjettiness variables on the resonance search is illustrated in Fig. 6. In contrast, the optimalR mode and the Qjets mode modify the clustering history (1) underlying the mass drop search (2). Depending on the modified fat jet size or on the Qjets weight, they return a set of tagging outputs. For the optimalR mode it is straightforward to choose the smallest reasonable fat jet size $R_\text{opt}$ for the actual tagging. The Qjets histories can be evaluated in a range of possible ways.
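The radius scan of the optimalR mode (step 8) can be sketched as follows, assuming a hypothetical callable `m_rec_of_r(R)` that re-runs steps 1 to 3 for a fat jet of radius $R$ and returns the reconstructed mass, or 0 if no triplet is found. The scan runs in integer tenths to avoid floating-point drift in the radius steps; it stops at the first radius where the mass drops by more than 20% relative to $R = 1.8$, returns the last stable radius, and never goes below 0.6, so that $R_\text{opt} > 0.5$.

```python
def optimal_r(m_rec_of_r, r_start=18, r_stop=6, max_drop=0.2):
    """Return R_opt for a fat jet, scanning R = 1.8, 1.7, ... in steps of 0.1.

    m_rec_of_r: callable R -> reconstructed top mass in GeV (0 if no triplet).
    Radii are handled in integer tenths; the smallest radius tested is r_stop/10.
    """
    m_ref = m_rec_of_r(r_start / 10)
    r_opt = r_start
    for tenths in range(r_start - 1, r_stop - 1, -1):
        m_rec = m_rec_of_r(tenths / 10)
        if m_ref - m_rec > max_drop * m_ref:
            break  # a top decay jet was lost: keep the previous radius
        r_opt = tenths
    return r_opt / 10


# Toy fat jet whose decay products are captured down to R = 0.9 only.
r_opt = optimal_r(lambda r: 175.0 if r >= 0.9 else 90.0)
```

Comparing such an $R_\text{opt}$ with the predicted $R_\text{opt}^\text{(calc)}$ is what provides the additional discrimination between top and QCD fat jets.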

Performance
The main body of this paper focuses on $t\bar{t}$ resonance searches using the HEPTopTagger described above. While the combination of tagged top kinematics and fat jet kinematics in Sec. II does not directly translate into a universal top tagger, the multivariate aspects discussed in Sec. III, namely optimalR, $N$-subjettiness, and Qjets, do. Here, we show efficiencies for extracting $t\bar{t}$ events from the QCD multi-jet background.
Our analyses are based on fully hadronic $t\bar{t}$ signal and QCD dijet background samples generated with Pythia8 [29]. For the general top tagger analysis in this Appendix we include underlying event in the event generation and mimic the limited detector resolution by clustering the hadronic activity into $\eta \times \phi$ cells of size $0.1 \times 0.1$, similar to the Qjets results shown in Fig. 6. Instead of the hard acceptance cuts in Eq. (1) we now allow for softer fat jets. Two multivariate BDT analyses focus on $t\bar{t}$ samples with $p_{T,t} > 200$ GeV and $p_{T,t} > 600$ GeV, where the top momenta are evaluated at the Monte Carlo truth level. We select events with fat C/A jets of radius $R_\text{fat} = 1.8$ and $|y_\text{fat}| < 2.5$, constructed with FastJet.
Background efficiencies $\varepsilon_B$ are defined relative to the number of these fat jets. For the signal efficiencies we require that the fat jets can be matched to a parton-level top quark within $\Delta R < 0.8$. Using the original version of the HEPTopTagger [5] we find for the $p_T > 600$ GeV samples a signal efficiency of $\varepsilon_S = 35.6\%$ and a mis-tagging rate of $\varepsilon_B = 2.7\%$. The first change in the algorithm addresses the signal efficiency and the background sculpting. In the original algorithm the triplet of subjets closest to the true top mass is selected, and only afterwards are the mass plane cuts applied. Therefore, the tagger fails if this triplet does not pass the mass plane constraints, because no alternative triplet is analyzed. To eliminate this limitation, we first apply the mass plane constraints and then pick the triplet closest to the top mass, as described above.
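The difference between the original and the updated selection order can be illustrated with a toy example. In this Python sketch, triplets are dictionaries with a reconstructed mass and a precomputed mass-plane decision; the field names, the helper functions, and the value $m_t = 172.3$ GeV are assumptions for illustration.

```python
def select_original(triplets, passes_mass_plane, m_t=172.3):
    """Original HEPTopTagger order: closest to m_t first, mass-plane cuts afterwards."""
    best = min(triplets, key=lambda t: abs(t["m_rec"] - m_t))
    return best if passes_mass_plane(best) else None


def select_updated(triplets, passes_mass_plane, m_t=172.3):
    """HEPTopTagger2 order: mass-plane cuts first, then the triplet closest to m_t."""
    passing = [t for t in triplets if passes_mass_plane(t)]
    return min(passing, key=lambda t: abs(t["m_rec"] - m_t)) if passing else None


def passes(triplet):
    return triplet["mass_plane"]


# The triplet closest to m_t fails the mass plane, a second one passes.
triplets = [{"m_rec": 173.0, "mass_plane": False}, {"m_rec": 165.0, "mass_plane": True}]
old, new = select_original(triplets, passes), select_updated(triplets, passes)
```

The original ordering loses this fat jet entirely, while the updated ordering still returns a valid top candidate.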
As in the main text we study further improvements of the tagger based on ROC curves. To allow for such improvements we loosen the cuts of the tagger to $m_\text{rec} < 1$ TeV and $f_W = 0.3$. The initial set of BDT parameters is defined in analogy to Eq. (4). The large cone size of $R = 1.8$ is not always appropriate, so the optimalR mode optimizes the radius of each fat jet. Starting from the initial cone size, we stepwise reduce the size of the fat jet until the criterion of Eq. (5) indicates that we miss a top decay jet. For the last stable $R$ size we run the usual tagging algorithm. We can calculate the expected value $R_\text{opt}^\text{(calc)}$ of the critical radius based on the transverse momentum of the filtered fat jet. For a fat jet originating from a top decay this prediction should agree with the measured value, while for a background fat jet the two are only strongly correlated when the entire subjet kinematics is a perfect match to a top decay. For the optimalR mode we set up a corresponding BDT analysis, in which all tagging observables are evaluated for a fat jet with size $R_\text{opt}$. In Fig. 10 we show the improvement from the optimized size of the fat jet. As expected, it is more impressive for larger boost, while for $p_{T,t} > 200$ GeV the optimalR mode hardly leads to a reduction in fat jet size.
The $N$-subjettiness variables are best applied separately to fat jets which would pass and which would not pass the initial tagging criterion. The optimalR working point corresponding to the signal efficiency $\varepsilon_S = 0.22$ ($0.27$) in Fig. 10 defines these two categories through Eq. (19). Fat jets passing Eq. (19) can be assumed to include a complete set of top decay products. The two categories use separate sets of $N$-subjettiness observables; the set for the failing fat jets, Eq. (20), includes among others $R_\text{fat}$, $\tau_1$, $\tau_3$, and the ratios $\tau_3/\tau_2$ and $\tau_2/\tau_1$. We analyze the two categories separately and later combine them into one ROC curve. In the main text this category-dependent condition is represented by the more generic Eq. (8).
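For reference, a minimal sketch of the $N$-subjettiness observable, assuming the standard definition of Ref. [32] with angular exponent $\beta = 1$, $\tau_N = \sum_k p_{T,k} \min_j \Delta R_{j,k} / \sum_k p_{T,k} R_0$. The subjet axes are passed in by hand here, whereas practical implementations such as the FastJet Contrib add-on determine them algorithmically; the function names are hypothetical.

```python
import math


def delta_r(a, b):
    """Angular distance between two (y, phi) points, with phi wrapped."""
    dphi = abs(a[1] - b[1]) % (2.0 * math.pi)
    dphi = min(dphi, 2.0 * math.pi - dphi)
    return math.hypot(a[0] - b[0], dphi)


def tau_n(constituents, axes, r0):
    """N-subjettiness with beta = 1 for N = len(axes) given axes.

    constituents: list of (pt, y, phi); axes: list of (y, phi); r0: jet radius.
    """
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    dist = sum(pt * min(delta_r((y, phi), axis) for axis in axes)
               for pt, y, phi in constituents)
    return dist / d0


jet = [(100.0, 0.0, 0.0), (50.0, 0.5, 0.0), (50.0, 0.0, 0.5)]
tau1 = tau_n(jet, [(0.0, 0.0)], r0=1.0)
tau3 = tau_n(jet, [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)], r0=1.0)
```

For this idealized three-prong toy jet $\tau_3 = 0$, so ratios such as $\tau_3/\tau_2$ become small for top-like fat jets, which is what makes them useful BDT inputs.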
In Fig. 10 we show the corresponding ROC curves for a successively improved tagger.
Finally, we can replace the deterministic clustering history of the usual jet algorithm with a set of Qjets histories with large global weights $\Omega^{(\alpha)}$, defined in Eq. (10), for $\alpha = 0.1$. This way we avoid cases where the deterministic clustering history entering the top tagging algorithm is misled by the independent evaluation of each splitting in the usual jet algorithm. When jets are defined as analysis objects for a hard process this does not pose a problem, but for subjet analyses it can have an effect.
Our analysis is based on 100 Qjets histories per fat jet. In Tab. III we show the signal and background efficiencies when individual Qjets histories are required to give a top tag. As the reference we use the default HEPTopTagger with fixed mass windows. Based on the 100 Qjets histories we then define the fraction $\varepsilon_\text{Qjets}$ of histories which lead to a top tag with the default tagging setup. We see that for moderately boosted tops the deterministic signal tagging efficiency can be reproduced by requiring 30% of the Qjets histories to deliver a positive tag. The corresponding mis-tag probability is slightly reduced compared to the deterministic tagger. For harder tops the corresponding value is around $\varepsilon_\text{Qjets} > 20\%$, with no improvement in the background rejection.
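The choice of the $\varepsilon_\text{Qjets}$ threshold that reproduces a given deterministic signal efficiency can be sketched as a simple scan. In these hypothetical helpers each fat jet is represented by its number of tagged histories out of 100, so integer counts avoid floating-point comparisons; the toy numbers are invented for illustration and do not reproduce Tab. III.

```python
def efficiency(n_tagged, k_min):
    """Fraction of fat jets with at least k_min tagged Qjets histories."""
    return sum(n >= k_min for n in n_tagged) / len(n_tagged)


def matching_threshold(n_tagged_signal, target_eff, n_histories=100):
    """Largest k_min (tagged histories out of n_histories) with signal eff >= target."""
    best = None
    for k_min in range(n_histories + 1):
        if efficiency(n_tagged_signal, k_min) >= target_eff:
            best = k_min  # efficiency falls with k_min, so the last hit is the largest
    return best


signal = [0, 10, 25, 35, 50, 80]
background = [0, 0, 5, 12, 40, 2]
k_min = matching_threshold(signal, target_eff=0.5)  # eps_Qjets >= k_min / 100
```

The background mis-tag rate at the matched working point is then simply `efficiency(background, k_min)`, which is the comparison shown in Tab. III.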
As discussed in Sec. III, Qjets offers two strategies to improve the top tagger. To maximize the improvement in the tagging performance and to limit the CPU time, we base the multivariate analysis on the tagged history with the largest global weight. As additional parameters we include the value of $\varepsilon_\text{Qjets}$ as well as the mean and variance of the $m_\text{rec}$ distribution over the tagged Qjets histories, symbolically denoted as $\{m_\text{rec}^\text{Qjets}\}$. These variables are added to the BDT analysis.

Figure 11: Performance of the HEPTopTagger2 for $t\bar{t}$ production in the Standard Model. For $p_{T,t} > 200$ GeV and $p_{T,t} > 600$ GeV we focus on different Qjets setups, based on a more basic multivariate tagger without optimalR and $N$-subjettiness.
As usual, all variables from the tagger are evaluated for the optimized R size and the clustering history with the largest global weight. The additional improvement is shown in Fig. 10.
Because Qjets offers a variety of improvements to the tagger, we study different setups in Fig. 11, starting from the stage with multivariate mass windows. We start by replacing the deterministic C/A output with the most likely Qjets history and including $\varepsilon_\text{Qjets}$ in the multivariate analysis. This leads to a moderate improvement of the tagger at large transverse momenta and at large signal efficiencies. Adding the statistical information from the $\varepsilon_\text{Qjets} \times 100$ entries of the $m_\text{rec}$ distribution leads to a sizeable improvement over a wide range of signal efficiencies. This is the mode we use for the $Z'$ analysis as well as in Fig. 10.
Next, we add the second-best Qjets history to the tagger, such that the multivariate tagger (including $\varepsilon_\text{Qjets}$) is free to construct a criterion based on one or two tags in the two best Qjets histories. For most of the ROC curves this comparably simple approach is as successful as the full statistical information. Finally, adding the statistical information on the $m_\text{rec}$ distribution leads to a mild further improvement.

Interface
To apply the HEPTopTagger algorithm to a fat C/A jet constructed with FastJet3 [24], the only necessary steps are calling the constructor HEPTopTagger(fastjet::PseudoJet jet) and then running the tagger with void run(). This analyzes the fat jet using the optimalR procedure with the default settings given in Tab. V. The available operation modes are shown in Tab. IV, and all configurable parameters are listed in Tab. V. Functions to retrieve the results are presented in Tab. VI.
QHTT() sets up the Qjets mode. It is applied to a fully configured HEPTopTagger by void run(HEPTopTagger htt). All configurable parameters are given in Tab. VII. A list of functions to access the results is presented in Tab. VIII.
In addition, we provide a framework for the calculation of Fox-Wolfram moments that relies on an existing installation of GSL [39]. While the constructor FWM(vector<fastjet::PseudoJet> jets) calculates the Fox-Wolfram moments for a given set of jets, FWM(HEPTopTagger htt, unsigned selection) uses the $b$, $W_1$, and $W_2$ momenta from the HEPTopTagger run and calculates the Fox-Wolfram moments in the top rest frame. The boost axis $a$ itself can be included [15]. Subsets of these four vectors are selected via unsigned selection, as a sequence of the digits 0 or 1 in the order $a\,b\,W_1\,W_2$. In Tab. IX we show how to extract the Fox-Wolfram moment for a given order of the Legendre polynomials.
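As an illustration of the quantity the FWM class computes, here is a self-contained Python sketch of the Fox-Wolfram moments $H_\ell = \sum_{i,j} |\vec p_i|\,|\vec p_j|\, P_\ell(\cos\theta_{ij}) / E_\text{tot}^2$ for massless momenta, with the Legendre polynomials from the Bonnet recursion instead of GSL. It assumes the momenta are already given in the desired frame (the boost to the top rest frame is omitted), and the normalization to $E_\text{tot} = \sum_i |\vec p_i|$ is one common convention that may differ from the one used in the tagger.

```python
import math


def legendre(ell, x):
    """Legendre polynomial P_ell(x) via (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    if ell == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, ell):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def fox_wolfram(momenta, ell):
    """H_ell for massless 3-momenta (px, py, pz), normalized to E_tot = sum |p_i|."""
    norms = [math.sqrt(px * px + py * py + pz * pz) for px, py, pz in momenta]
    e_tot = sum(norms)
    h = 0.0
    for a, na in zip(momenta, norms):
        for b, nb in zip(momenta, norms):
            cos_ab = sum(u * v for u, v in zip(a, b)) / (na * nb)
            cos_ab = max(-1.0, min(1.0, cos_ab))  # guard against rounding
            h += na * nb * legendre(ell, cos_ab)
    return h / e_tot ** 2


back_to_back = [(0.0, 0.0, 50.0), (0.0, 0.0, -50.0)]
moments = [fox_wolfram(back_to_back, ell) for ell in range(3)]
```

For a back-to-back pair the odd moments vanish and the even moments equal one, a quick sanity check for any implementation of these observables.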