Investigating top tagging with Y$_{\text{m}}$-Splitter and N-subjettiness

We study top-tagging from an analytical QCD perspective, focusing on the role of two key steps therein: a step to find three-pronged substructure and a step that places constraints on radiation. For the former we use a recently introduced modification of Y-Splitter, known as Y$_{\text{m}}$-Splitter, and for the latter we use the well-known N-subjettiness variable. We derive resummed results for this combination of variables for both signal jets and background jets, also including pre-grooming of the jet. Our results give new insight into the performance of top tagging tools, in particular with regard to the role of the distinct steps involved.

While the rapid development of substructure methods often resulted in novel powerful techniques, many of which are still in use, some key questions also emerged about the robustness of the methods being employed. Such questions concerned, for instance, the accuracy with which Monte Carlo event generators describe substructure observables, and the dependence of tagger performance on poorly understood physics aspects such as non-perturbative effects in QCD. This led to a parallel effort to better understand jet substructure, as relevant to tagging and grooming of jets originating from boosted particles, from the first principles of QCD theory [25-37]. As a consequence it was possible to identify flaws in existing tools [25,26], design superior tools which remove some of the main flaws thus identified [25,28], and shed light on the factors that influence performance, including the role of non-perturbative effects [25,31-33,38,39]. The more recent advent of machine-learning (ML) tools to study jets has also yielded impressive performance gains, with ML-based taggers often shown to significantly outperform standard ("QCD-based") tagging algorithms [39-49]. Nevertheless, the questions raised for earlier tagging methods, namely the exclusive reliance on parton showers to study performance and the issue of performance gains originating in non-perturbative effects, remain in the ML case and are indeed potentially reinforced. Here one can mention studies that have investigated the resilience of Lund-plane based ML [38,39] against non-perturbative effects, finding that eliminating the non-perturbative region results in a marked decrease in performance.
Furthermore, new research on parton showers has revealed flaws in the perturbative structure of dipole showers, including a failure to reproduce the QCD double-emission matrix element for soft emissions strongly ordered in angle [50,51], in principle a crucial regime for meaningfully describing jet substructure.
Given all of the above, it remains important to continue developing the program of understanding jet substructure taggers via perturbative QCD. While much success has been obtained in the analytic understanding of the impact of taggers and groomers on signal and background for two-pronged decays, there is a more limited understanding of top tagging, which is a somewhat more complicated problem owing in part to the coloured parton initiating the signal jet. In terms of tools, various methods for finding three-pronged jet substructure have been introduced in the literature, including the early ATLAS top tagger [18] based on Y-Splitter, as well as the CMS top tagger [14,15,17], which is conceptually related to the mMDT/Soft Drop procedure [25,28]. Other widely used methods for top tagging include the Johns Hopkins top tagger [12], the HEP top tagger [13], shower deconstruction [52] and template tagging [53].
Amongst methods aiming at constraining radiation around three hard prongs, the N-subjettiness ratio $\tau_{32}$ [54] and Energy Correlation Function ratios [55] have been actively studied. Combinations of these tools with grooming have also been studied and exploited in experimental analyses. For example, Refs. [56,57] make use of trimmed jets with a top tagging procedure involving a combination of Y-Splitter and the N-subjettiness ratios $\tau_{32}$ and $\tau_{21}$, while Ref. [58] reports, amongst other studies, combinations of the CMS top tagger with a $\tau_{32}$ cut. Such combinations are similar in essence to the combinations we shall study in the present article, though various details differ.
A first analytical study of the impact of prong-finding methods for top tagging supplemented with grooming, in the high-$p_T$ limit, i.e. with $p_T$ in the TeV range, was carried out in Ref. [36]. This work included the study of IRC safe extensions of the IRC unsafe CMS top tagger, as well as an adaptation of the Y-Splitter method, Y$_{\text{m}}$-Splitter.
In this article we extend the work of Ref. [36] by combining prong-finding with Y$_{\text{m}}$-Splitter with an additional radiation constraint coming from a $\tau_{32}$ cut. Further, we account for the impact of pre-grooming with Soft Drop (SD) and mMDT. We begin with a set of Monte Carlo studies that motivate the use of this particular combination of methods and indicate optimal values for the $\tau_{32}$ cut, τ ∼ 0.2. Next we obtain resummed analytic results for QCD background jets for Y$_{\text{m}}$-Splitter with the $\tau_{32}$ cut, in the small τ limit. These results are obtained in a modified leading-logarithmic approximation where, in addition to capturing all leading-logarithmic (LL) terms, one also retains important classes of next-to-leading-logarithmic (NLL) terms such as those from hard-collinear emission. Following the treatment of Ref. [37] we then extend our results to include finite τ effects, which are in general non-negligible even for our typical value of τ ∼ 0.2. We study the un-groomed case as well as pre-grooming with Soft Drop (β = 2) and the mMDT. Despite this rather complex combination of methods leading to a highly non-trivial observable, we find that our results are in broad agreement with those from parton shower studies, with remaining moderate differences consistent with the expected size of omitted (beyond LL) terms.
Next we study signal jets on a similar footing. We start with a simple situation with only a mass window cut and compare the resulting Sudakov form factor to results from Pythia, finding excellent agreement. This step is useful in order to test the validity of our simplifying assumptions about radiation in a top initiated jet. We then extend our studies to include Y$_{\text{m}}$-Splitter in addition to the mass window cut, also accounting for pre-grooming using both mMDT and SD (β = 2). Our results here improve upon previous work by accounting for a previously neglected situation where one of the prongs found by Y$_{\text{m}}$-Splitter can be a soft gluon rather than one of the top decay products. The inclusion of this correction term brings our results for the signal into substantially better agreement with Pythia simulations than was seen in previous studies [36]. We then account for the impact of the τ cut in the signal case. Although our treatment of finite τ corrections for the signal is not as accurate as the corresponding treatment for the QCD background, we obtain a good description of the τ dependence of the result, especially for the un-groomed case and for pre-grooming with SD (β = 2).
The layout of this paper is as follows. We start in section 2 by recalling the definitions of the Y$_{\text{m}}$-Splitter tagger, N-subjettiness including our choice of axes, and Soft Drop grooming. In section 3 we report results from initial Monte Carlo studies which help lay the ground for our subsequent analytical investigation. In section 4 we carry out our calculations for Y$_{\text{m}}$-Splitter with a $\tau_{32}$ cut for QCD background jets. Here we discuss in detail the small τ limit as well as accounting for finite τ effects, studying both the differential distribution and the cumulant. We close this section by including grooming with both mMDT and SD (β = 2) and comparing our results to those from Herwig and Pythia showers. Section 5 is devoted to signal jets, where we first study the effect of a mass window cut alone, followed by studies of Y$_{\text{m}}$-Splitter including grooming and finally the inclusion of a $\tau_{32}$ cut. Section 6 discusses our analytical results in terms of the understanding gained for the performance of the Y$_{\text{m}}$-Splitter, $\tau_{32}$ and grooming combinations, in particular the interplay between the τ and mass window cuts, and reports further comparisons to parton showers. Our conclusions are summarised in section 7.

Tagger definitions
The primary step involved in top-tagging is the identification of three-pronged jet substructure that characterises top-decay. There are various methods that have been suggested in the literature for the identification of three-pronged substructure within a fat jet, some of which have also been used for phenomenology. Examples of prong finding methods include the early CMS and ATLAS top taggers [17,59,60] and Y$_{\text{m}}$-Splitter, an adaptation of Y-Splitter introduced in Ref. [36], which we shall use for our analytical studies here. Additionally, jet shape variables such as N-subjettiness [54], which we also use here, are known to be powerful methods that quantify the N-pronged nature of a jet by placing constraints on radiation from N identified prongs within a fat jet. Techniques combining prong-finding methods with jet shape variables are also known to give rise to important performance gains; they have been used in experimental studies [56-58,61] and motivate our desire to better understand such combinations. We define in more detail below all the specific methods that we use in this article.

Y$_{\text{m}}$-Splitter
The Y$_{\text{m}}$-Splitter method for top tagging [36] takes a jet clustered with the gen-$k_t$ ($p = 1/2$) algorithm (referred to as gen-$k_t$ from here on) and performs the following steps:
(a) Undo the last clustering, to give two sub-jets, both of which are examined for the condition $p_{t,i} > \zeta p_{t,\text{jet}}$. If either sub-jet fails the ζ condition, the jet is rejected.
(b) Check which sub-jet produces the larger gen-$k_t$ distance when de-clustered, and undo the last clustering of this sub-jet. Check whether the resulting sub-jets from this de-clustering pass the ζ condition. If either the de-clustering or the ζ condition fails, the jet is rejected.
(c) Find the pairwise masses of the three final sub-jets, and require that $\min(m_{12}, m_{13}, m_{23}) > m_{\min}$. If this condition is not met, the jet is rejected.
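The three steps above can be sketched in code. The following is a minimal toy illustration, not the implementation used for the studies in this article: the `PseudoJet` class, the brute-force `cluster_genkt` routine (which ignores the beam distance and the overall $1/R^2$ normalisation, since only the ordering of distances matters here) and the function names are all hypothetical stand-ins for what a package such as FastJet would provide.

```python
import math
from itertools import combinations


class PseudoJet:
    """Toy four-momentum that remembers its last clustering (hypothetical helper)."""

    def __init__(self, px, py, pz, e, kids=(), d=0.0):
        self.px, self.py, self.pz, self.e = px, py, pz, e
        self.kids = kids  # the two sub-jets merged to form this object (empty for a particle)
        self.d = d        # gen-kt distance at which the two sub-jets were merged

    @classmethod
    def from_pt_y_phi(cls, pt, y, phi):
        # Massless particle built from transverse momentum, rapidity and azimuth.
        return cls(pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def rap(self):
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def phi(self):
        return math.atan2(self.py, self.px)


def genkt_distance(a, b, p=0.5):
    """gen-kt distance min(pt_a, pt_b)^(2p) * DeltaR^2 (overall 1/R^2 dropped)."""
    dphi = abs(a.phi - b.phi)
    dphi = min(dphi, 2.0 * math.pi - dphi)
    return min(a.pt, b.pt) ** (2 * p) * ((a.rap - b.rap) ** 2 + dphi ** 2)


def cluster_genkt(particles, p=0.5):
    """Cluster all particles into a single jet, recording the clustering history."""
    jets = list(particles)
    while len(jets) > 1:
        i, j = min(combinations(range(len(jets)), 2),
                   key=lambda ij: genkt_distance(jets[ij[0]], jets[ij[1]], p))
        a, b = jets[i], jets[j]
        merged = PseudoJet(a.px + b.px, a.py + b.py, a.pz + b.pz, a.e + b.e,
                           kids=(a, b), d=genkt_distance(a, b, p))
        jets = [jet for k, jet in enumerate(jets) if k not in (i, j)] + [merged]
    return jets[0]


def pair_mass(a, b):
    m2 = (a.e + b.e) ** 2 - (a.px + b.px) ** 2 - (a.py + b.py) ** 2 - (a.pz + b.pz) ** 2
    return math.sqrt(max(m2, 0.0))


def ym_splitter(jet, zeta, m_min):
    """Return the three prongs if the jet passes steps (a)-(c), otherwise None."""
    if not jet.kids:
        return None
    j1, j2 = jet.kids                                        # step (a)
    if min(j1.pt, j2.pt) <= zeta * jet.pt:
        return None
    parent, other = (j1, j2) if j1.d >= j2.d else (j2, j1)   # step (b)
    if not parent.kids:
        return None
    s1, s2 = parent.kids
    if min(s1.pt, s2.pt) <= zeta * jet.pt:
        return None
    prongs = (other, s1, s2)                                 # step (c)
    if min(pair_mass(a, b) for a, b in combinations(prongs, 2)) <= m_min:
        return None
    return prongs
```

A jet made of three well-separated hard particles is accepted by this sketch, while a jet whose third prong carries less than a fraction ζ of the jet transverse momentum is rejected at step (b).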

N-subjettiness
We will use the N-subjettiness ratio variable
$$\tau_{32}^{(\beta)} = \frac{\tau_3^{(\beta)}}{\tau_2^{(\beta)}}, \qquad \tau_N^{(\beta)} = \frac{1}{p_T R^{\beta}} \sum_i p_{t,i} \min\left(\Delta R_{i1}^{\beta}, \ldots, \Delta R_{iN}^{\beta}\right),$$
with the sum over i running over the jet constituents, $\Delta R_{ij} = \sqrt{(\Delta y_{ij})^2 + (\Delta\phi_{ij})^2}$, and the N partition axes labelled 1 ⋯ N. There are various options for defining the partition axes, for instance finding the axes which minimise $\tau_N$ (optimal axes). Throughout this work we use β = 2, as this facilitates our analytical studies, and we make use of the gen-$k_t$ axes with $p = 1/2$. These axes are obtained by clustering the jet with the gen-$k_t$ ($p = 1/2$) algorithm and identifying the axes with the N exclusive sub-jets resulting from N − 1 de-clusterings. For $\tau_2$ with β = 2, these have been shown to be very close to the optimal axes [37]. For $\tau_3$ these axes are exactly the three prongs returned by Y$_{\text{m}}$-Splitter, which is helpful in facilitating the resummation of the tagged fraction of events.
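As an illustration of how the N-subjettiness ratio is evaluated once the axes are fixed, the sketch below computes $\tau_N$ for a list of constituents and a list of axes. It assumes the common normalisation by $p_T R^{\beta}$ and represents constituents as plain `(pt, y, phi)` tuples; the function names are hypothetical, and in practice the N-subjettiness fastjet contrib would be used instead.

```python
import math


def delta_r2(a, b):
    """Squared rapidity-azimuth distance between two (y, phi) points."""
    dphi = abs(a[1] - b[1])
    dphi = min(dphi, 2.0 * math.pi - dphi)
    return (a[0] - b[0]) ** 2 + dphi ** 2


def tau_n(constituents, axes, jet_pt, R=1.0, beta=2.0):
    """N-subjettiness for constituents [(pt, y, phi), ...] and axes [(y, phi), ...].

    Assumed normalisation: division by jet_pt * R**beta.
    """
    total = 0.0
    for pt, y, phi in constituents:
        # Each constituent contributes its pt times the distance to the nearest axis.
        total += pt * min(delta_r2((y, phi), axis) ** (beta / 2.0) for axis in axes)
    return total / (jet_pt * R ** beta)


def tau_32(constituents, axes3, axes2, jet_pt, R=1.0, beta=2.0):
    """Ratio tau_3 / tau_2 for given three-axis and two-axis choices."""
    return (tau_n(constituents, axes3, jet_pt, R, beta)
            / tau_n(constituents, axes2, jet_pt, R, beta))
```

With gen-$k_t$ axes, `axes3` would be the directions of the three exclusive sub-jets and `axes2` those obtained after a single de-clustering.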

Soft Drop
Soft Drop takes a jet, re-clusters it with the Cambridge/Aachen (C/A) algorithm [62,63] and performs the following steps:
(a) Undo the last clustering, to give two sub-jets.
(b) Examine the lower-$p_T$ sub-jet for the condition $p_{t,i} > z_{\text{cut}} \left(\frac{\Delta R}{R}\right)^{\beta} (p_{t,i} + p_{t,j})$.
(c) If this condition is not met, this sub-jet is removed from the jet and the groomer goes back to step (a). If it is met, the groomer stops and this is the final jet.
Throughout this work we set $z_{\text{cut}} = \zeta$, the Y$_{\text{m}}$-Splitter parameter.
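The Soft Drop loop can be sketched as follows. To keep the example short it assumes, purely for illustration, that the C/A de-clustering sequence along the harder branch is already available as a list of `(pt_soft, pt_hard, delta_r)` triples ordered from the widest angle inwards; this input format and the function names are hypothetical.

```python
def passes_soft_drop(pt_soft, pt_hard, delta_r, z_cut, beta, R=1.0):
    """Soft Drop condition for the softer sub-jet of one de-clustering."""
    return pt_soft > z_cut * (delta_r / R) ** beta * (pt_soft + pt_hard)


def soft_drop(declusterings, z_cut, beta, R=1.0):
    """Index of the first de-clustering that passes the condition, or None.

    declusterings: [(pt_soft, pt_hard, delta_r), ...] along the harder branch,
    widest angle first (assumed input format). Every earlier entry corresponds
    to a softer sub-jet that is groomed away.
    """
    for index, (pt_soft, pt_hard, delta_r) in enumerate(declusterings):
        if passes_soft_drop(pt_soft, pt_hard, delta_r, z_cut, beta, R):
            return index
    return None
```

For example, with $z_{\text{cut}} = 0.05$ a sub-jet carrying 3% of the transverse momentum at ∆R = 0.5 is removed by mMDT (β = 0) but retained by SD with β = 2, since the angular factor relaxes the threshold at small angles.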

Monte-Carlo study
In this section we investigate the performance of various tagging procedures based around the N-subjettiness variable $\tau_{32}$, as well as how they are impacted by hadronisation, ISR, and MPI. The tagging procedures considered all require the jet mass to lie between 160 GeV and 225 GeV, corresponding to a window around the top mass, and are studied as a function of the cut on $\tau_{32}$. After examining N-subjettiness cuts with and without pre-grooming, we combine these cuts with the Y$_{\text{m}}$-Splitter method, which we again investigate with and without pre-grooming. Two pre-grooming options are considered, SD (β = 2) and mMDT (equivalent to SD with β = 0). We start by generating 1 million $t\bar{t}$ and $q\bar{q}$ events with Pythia. ISR, MPI and hadronisation were initially deactivated, and a generation cut of $p_t > 1600$ GeV was applied. Jets were clustered with the Cambridge/Aachen algorithm with R = 1 and $p_{t,\min} = 2$ TeV using Fastjet 3 [64], as was the case for the studies in Ref. [36]. Where jets are groomed we use $z_{\text{cut}} = 0.05$. The ratio $\tau_{32}$ is calculated using the N-subjettiness fastjet contrib [65], and where Y$_{\text{m}}$-Splitter is used we choose $m_{\min} = 50$ GeV and ζ = 0.05. This information is then used to construct the tagged fraction of events and the signal to square-root-background ratio as a function of a cut on $\tau_{32}$. The same procedure is then repeated with only hadronisation activated, and with hadronisation, ISR and MPI all activated, to assess their impact.
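The construction of the tagged fraction and of the signal to square-root-background ratio as a function of the $\tau_{32}$ cut can be sketched as below. This is only an illustration under stated assumptions: the inputs are taken to be the $\tau_{32}$ values of those jets that already pass the mass window (and, where applied, the Y$_{\text{m}}$-Splitter and grooming steps), together with the total number of jets in each sample; the function names are hypothetical.

```python
import math


def tagged_fraction(tau_values, n_total, tau_cut):
    """Fraction of all jets that pass the other cuts and have tau32 < tau_cut."""
    return sum(1 for tau in tau_values if tau < tau_cut) / n_total


def signal_significance(sig_taus, n_sig, bkg_taus, n_bkg, tau_cut):
    """Signal efficiency divided by the square root of the background efficiency."""
    eps_s = tagged_fraction(sig_taus, n_sig, tau_cut)
    eps_b = tagged_fraction(bkg_taus, n_bkg, tau_cut)
    # With no surviving background the ratio is undefined; return infinity here.
    return eps_s / math.sqrt(eps_b) if eps_b > 0 else float("inf")
```

Scanning `tau_cut` over a grid of values then gives curves of the kind discussed in this section.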
To discuss the features that emerge from our Monte Carlo studies, let us first examine the top row of Fig. 1, i.e. Figs. 1a and 1b, which show the signal significance as a function of the $\tau_{32}$ cut, τ, without any grooming, without Y$_{\text{m}}$-Splitter on the left and with Y$_{\text{m}}$-Splitter on the right. It is clear that in the absence of a grooming step ISR and MPI significantly damage performance in each case, although the inclusion of Y$_{\text{m}}$-Splitter results in a higher signal significance after all effects are considered.
Next we come to the plots involving the application of grooming, i.e. Figs. 1c and 1d for SD (β = 2) pre-grooming and Figs. 1e and 1f in the bottom row for the mMDT. From these one notes that grooming, especially with mMDT, is an effective method to significantly mitigate ISR and MPI. When combining grooming with Y$_{\text{m}}$-Splitter we observe that the effects of both hadronisation and ISR+MPI are significantly reduced, resulting in high performance with an optimal value of τ ∼ 0.2 emerging for mMDT pre-grooming and τ ∼ 0.3 for SD (β = 2). The best performance, i.e. the highest signal significance, comes with mMDT pre-grooming and Y$_{\text{m}}$-Splitter applied in addition to the τ cut, as shown in Fig. 1f. This combination is also more resilient to ISR and all non-perturbative effects at the same time. In contrast, although pre-grooming jets and cutting on $\tau_{32}$ without Y$_{\text{m}}$-Splitter (see Fig. 1e) gives good performance at hadron level, the discrepancy with the parton level result indicates that the performance of this procedure cannot necessarily be understood from perturbative QCD arguments alone and may be more susceptible to mis-modelling of non-perturbative effects in parton showers.$^3$
In summary, applying Y$_{\text{m}}$-Splitter to pre-groomed jets with cuts on $\tau_{32}$ and the jet mass is a high performing method for tagging hadronically decaying high-$p_T$ top quarks.$^4$ The performance is also well described by parton level predictions and is therefore reasonably robust against effects which are less well theoretically understood in this context. These observations provide some of the main motivation for detailed theoretical studies using perturbative QCD, which will be the subject of the next two sections.
$^3$ A possible reason for this might be that a pure $\tau_{32}$ cut is not IRC safe and is instead only Sudakov safe [66,67], while the application of Y$_{\text{m}}$-Splitter prior to the subjettiness cut prevents $\tau_2$ from vanishing, resulting in an IRC safe quantity.
$^4$ We find that, for comparable signal significance, these methods appear, in the high-$p_T$ region, to outperform the dense neural net and boosted decision tree used by ATLAS in [68], although it should be noted that the two studies are perhaps not equivalent, as no attempt was made here to examine detector effects, which were included in the ATLAS study.

Y$_{\text{m}}$-Splitter with a $\tau_{32}$ cut: QCD jets
We start by examining the impact of a $\tau_{32}$ cut on QCD jets after applying Y$_{\text{m}}$-Splitter.
Analytical studies for Y$_{\text{m}}$-Splitter as applied to top-tagging, with and without pre-grooming, have already been carried out in Ref. [36]. These studies derived results for the jet mass distribution, and consequently the efficiency for QCD jets tagged with Y$_{\text{m}}$-Splitter, using the technique of QCD resummation. Resummation is required in order to address the multi-scale nature of the problem. Crucially, the highly boosted limit implies that the invariant jet mass satisfies $m^2 \ll p_T^2$, with $m^2 \sim m_t^2$ and $p_T$ values in the TeV range, which leads to large logarithms in $\rho = m^2/(R^2 p_T^2)$. A good description of the jet-mass distribution then requires resummation of the logarithms in ρ. Additionally, for Y$_{\text{m}}$-Splitter we have $\rho_{\min} = m_{\min}^2/(p_T^2 R^2) \ll 1$ and a further small scale $\zeta p_T$, the minimum energy of an emission that passes the ζ condition, with $\zeta \ll 1$. Large logarithms are then expected, and do arise, in ρ, $\rho_{\min}$, ζ and $\rho_{\min}/\rho$. In Ref. [36] a modified leading logarithmic resummation was performed which included all double-logarithmic terms and a subset of single-logarithmic terms such as those arising from hard-collinear emissions. The logarithms that are most crucial to resum are those in the smallest parameters ρ and $\rho_{\min}$. Typical values of ζ ∼ 0.05 and $\rho_{\min}/\rho \sim m_W^2/m_{\text{top}}^2$ are larger, and hence we only aim to retain logarithms in these parameters at leading double-logarithmic accuracy.
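To give a feel for the hierarchy of scales involved, the short sketch below evaluates ρ and $\rho_{\min}$ for illustrative inputs. The values $p_T = 2$ TeV, R = 1 and $m_{\min} = 50$ GeV are those quoted for the Monte Carlo studies, while the jet mass of 173 GeV is an assumption made here simply to represent a jet near the top mass; the function name is hypothetical.

```python
import math


def boosted_scales(m_jet, m_min, pt, R=1.0):
    """Return (rho, rho_min) = (m_jet^2, m_min^2) / (R^2 pt^2)."""
    norm = (R * pt) ** 2
    return m_jet ** 2 / norm, m_min ** 2 / norm


# Illustrative inputs in GeV; m_jet = 173 GeV is an assumed value near the top mass.
rho, rho_min = boosted_scales(m_jet=173.0, m_min=50.0, pt=2000.0)
log_rho = math.log(1.0 / rho)          # logarithm resummed in the jet mass
log_rho_min = math.log(1.0 / rho_min)  # logarithm associated with the m_min cut
log_zeta = math.log(1.0 / 0.05)        # logarithm associated with the zeta cut
```

For these inputs ρ ≈ 0.0075 and $\rho_{\min}$ ≈ 0.0006, so that ln(1/ρ) and ln(1/$\rho_{\min}$) are roughly 4.9 and 7.4, compared with ln(1/ζ) ≈ 3.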
Here, relative to previous work [36], we shall additionally include the $\tau_{32}$ cut, considering the possibility that $\tau_{32}$ is not small. In doing so we shall closely follow the treatment of Ref. [37] for resummation of jet mass with a $\tau_{21}$ cut.

Leading-order result
We start with the leading-order result, computed in the soft and collinear approximation, which yields the leading logarithmic terms. For Y$_{\text{m}}$-Splitter this starts at order $\alpha_s^2$ for QCD jets, since one requires at least two emissions within the jet (i.e. at least three partons) in order to be accepted by Y$_{\text{m}}$-Splitter. Since for three partons $\tau_3$ vanishes, a cut requiring $\tau_{32} < \tau$ is trivially satisfied. Therefore the leading-order result is unchanged from the pure Y$_{\text{m}}$-Splitter case of Ref. [36]. For the case of a quark initiated jet and in the abelian $C_F^2$ channel it is given by an expression written in terms of $\bar{\alpha} = C_F \alpha_s/\pi$. In deriving this result we have taken a configuration strongly ordered in angle, with $\theta_2 \ll \theta_1$, made a leading logarithmic approximation that the jet mass is dominated by the emission that makes the larger contribution, imposed the tagger conditions by requiring both emissions to pass the ζ cut, and implemented the $\rho_{\min}$ condition in the strongly-ordered limit where $\theta_{12} \sim \theta_1$.
A similar result is obtained for the $C_F C_A$ colour factor, while in the $C_F T_R n_f$ channel the result is one logarithm down due to the lack of a soft enhancement in the $p_{qg}$ splitting function. For future convenience we note that the leading-order result can also be expressed in terms of the highest-mass emission $\rho_a$ and the next-highest-mass emission $\rho_b$, where in the $\rho_{\min}$ condition we use strong angular ordering to replace $\theta_{ab}^2$ by $\max(\theta_a^2, \theta_b^2)$. Finally, we note that beyond double logarithmic accuracy a more precise result at order $\alpha_s^2$ can be achieved by considering three collinear partons within a jet without imposing strong ordering between the partons. Such configurations are described by triple-collinear splitting functions, and calculations implementing the triple-collinear result were included in the studies of Y$_{\text{m}}$-Splitter carried out in Ref. [36].

Resummed results
Now we turn to the resummed result. We first consider the case where $\tau_{32} < \tau \ll 1$. Then we shall lift the requirement that $\tau \ll 1$, i.e. we shall account for finite τ effects.

The small τ limit
For the case of Y$_{\text{m}}$-Splitter one considers, as in Ref. [36], two real emissions that pass the tagger cuts, accompanied by an ensemble of soft and collinear emissions which are constrained to set a smaller gen-$k_t$ distance (i.e. mass) than either of the two leading emissions. This constraint on real emissions produces a Sudakov form factor. In the current case the emissions are additionally constrained by the τ cut. Here we shall derive the Sudakov form factor at leading-logarithmic (LL) accuracy, capturing all double-logarithmic terms including those in τ and running coupling effects, and also include some important single logarithmic effects such as those from hard-collinear radiation.
For the two emissions accounted for at leading-order, Eq. (4.2), we shall again label $\rho_a$ as the emission that sets the larger mass and $\rho_b$ the smaller mass. Consider first all subsequent primary emissions, i.e. emissions from the hard parton initiating the jet. These emissions must not give rise to larger mass (gen-$k_t$) values than the first two emissions de-clustered by Y$_{\text{m}}$-Splitter, and they must set a value of $\tau_{32} < \tau$. Recall that the contribution of an emission i to $\tau_N$ is given by $z_i \min(\theta_{i1}^2, \ldots, \theta_{iN}^2)$. As was the case for $\tau_2$ [30], the limit of strong angular-ordering ensures that, for emissions coming from a leg lying along one of the N-subjettiness axes, the smallest of the $\theta_{ia}$ angles is either the angle between the emission and its emitter, or can be approximated by this angle to LL accuracy. For a primary emission this implies that the contribution to $\tau_3$ is $\tau_{3i} = z_i \theta_i^2$, where $z_i$ is the energy fraction and $\theta_i$ is the angle of the emission with respect to the hard initial parton. The value of $\tau_2$, on the other hand, is dominated to LL accuracy by the second highest mass emission $\rho_b$, due to the strong ordering in masses relevant at LL accuracy. The condition on primary emissions then reads $\Theta(z_i\theta_i^2 < \tau \rho_b)\,\Theta(z_i\theta_i^2 < \rho_b)$. The first step function reflects the condition on $\tau_{32}$, while the second reflects the constraint on mass which gives the primary emission Sudakov form factor for Y$_{\text{m}}$-Splitter in Ref. [36], i.e. that by assumption none of the emissions i has a gen-$k_t$ distance larger than $\rho_b$.
Since τ < 1 the second condition is automatically satisfied, and the condition on primary emissions is just given by the stronger constraint $\Theta(z_i\theta_i^2 < \tau\rho_b)$. The primary emission Sudakov factor then arises from a veto on any emissions violating this condition. More precisely it takes the form $S = e^{-R^{(\text{primary})}}$, with a "radiator" $R^{(\text{primary})}$ in which $C_R$ is a colour factor that depends on the identity of the parton initiating the jet, i.e. $C_F$ for a quark and $C_A$ for a gluon jet, and p(z) is the QCD splitting function describing collinear emission from a quark ($p(z) = p_{gq}(z)$) or gluon ($p(z) = p_{gg}(z)$). For the argument of the running coupling we have used the $k_t$ of the emission (in terms of z and θ), as required in the soft and collinear limit.
As well as vetoing primary emissions from the parton initiating the jet, the overall Sudakov factor must also account for a veto on secondary emissions, from either of the two emissions included in the leading-order pre-factor, which would set a value of $\tau_{32}$ larger than τ. In the case of soft secondary emissions the angle of emission $\theta_i$ is limited by angular-ordering to be less than the angle of the parent, $\theta_a$ or $\theta_b$. Apart from this constraint, for emissions off parton a we have the same constraint as for primary emissions, and hence we obtain an analogous radiator for emissions off parton a, in which z represents the fraction of parton a's energy carried by the soft secondary emission. We also note that for secondary emissions the gen-$k_t$ distance entering the veto condition differs from the mass even in the soft limit, involving one less factor of $z_a$. A similar equation gives the result for emissions off parton b, with the obvious replacement of $z_a$ and $\theta_a$ by $z_b$ and $\theta_b$.
The overall result can be written as a Sudakov form factor weighting the leading-order, order $\alpha_s^2$, result, which serves as a pre-factor. For simplicity, if one retains just the leading-logarithmic expression for the pre-factor reported in Eq. (4.3), we can obtain the resummed result by inserting the factor $S = e^{-R}$ in the integrand in Eq. (4.3), where $R = R^{(\text{primary})} + R^{(\text{secondary},a)} + R^{(\text{secondary},b)}$. While our results include the full running coupling and hard-collinear effects, we also report a simplified result for the Sudakov factor S, Eq. (4.7), in the limit of a fixed coupling and retaining only the soft-collinear behaviour, i.e. replacing p(z) and $p_{gg}(z)$ by the soft limit expression 2/z. In this result the term involving $\ln^2 \frac{1}{\rho_b \tau}$ comes from primary emissions, the term involving $\ln^2 \frac{\rho_a}{\tau\rho_b}$ comes from vetoing emissions from emission a, and finally the suppression involving just $\ln^2 \frac{1}{\tau}$ comes from vetoing emissions from emission b. The difference between primary and secondary emissions arises entirely from angular-ordering and the ensuing limitation on emission angle mentioned previously. Although the logarithms present in S are written in terms of $\rho_a$ and $\rho_b$, these will eventually be related to logarithms of ρ, $\rho_{\min}$ and ratios thereof once the integrals in the pre-factor are carried out. In the limit τ → 1 we recover the pure Y$_{\text{m}}$-Splitter result of Ref. [36].
A couple of further remarks are in order concerning the result in Eq. (4.7). First of all, the result captures leading double logarithms in τ in addition to the logarithms involved in the resummation of plain Y$_{\text{m}}$-Splitter [36]. Including hard-collinear emission by using the full splitting functions, rather than just their soft limit, and using the running coupling helps to improve the result beyond double-logarithmic accuracy. The result indicates that the effect of the N-subjettiness cut is to produce an extra suppression relative to the case of Y$_{\text{m}}$-Splitter [36], both by changing the scale $\rho_b$ to the smaller scale $\tau\rho_b$ and through the extra secondary suppression factor from emissions off parton b.
This suppression of the background is of course desirable, but choosing a small τ value potentially also suppresses the signal, which in this case is a coloured particle, namely the top quark. Also, as is well known from several prior applications [54,65] and as additionally emerges from the Monte Carlo studies reported in section 3, optimal values of τ do not necessarily satisfy $\tau \ll 1$, so that finite τ effects generally need to be considered [37] in addition to the resummation of logarithms of τ. The inclusion of finite τ corrections is thus the topic of the next subsection.
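The structure of the simplified fixed-coupling Sudakov factor described in this subsection can be illustrated numerically. The sketch below is a reconstruction under explicit assumptions rather than a transcription of Eq. (4.7): it takes each of the three double logarithms named in the text, multiplies it by $\alpha_s/(2\pi)$ times a colour factor (as follows from integrating the soft limit 2/z of the splitting function over the vetoed region), and assumes $C_R = C_F$ for primary emissions off a quark and $C_A$ for secondary emissions off the gluons a and b. The function name is hypothetical.

```python
import math

CF, CA = 4.0 / 3.0, 3.0  # QCD colour factors


def sudakov_small_tau(rho_a, rho_b, tau, alpha_s, c_r=CF):
    """Fixed-coupling, soft-collinear sketch of the small-tau Sudakov factor.

    Assumed form: exp(-alpha_s/(2 pi) * [c_r ln^2(1/(tau rho_b))
                                         + CA ln^2(rho_a/(tau rho_b))
                                         + CA ln^2(1/tau)]).
    """
    prefactor = alpha_s / (2.0 * math.pi)
    primary = c_r * math.log(1.0 / (tau * rho_b)) ** 2     # veto on primary emissions
    secondary_a = CA * math.log(rho_a / (tau * rho_b)) ** 2  # veto on emissions off a
    secondary_b = CA * math.log(1.0 / tau) ** 2              # veto on emissions off b
    return math.exp(-prefactor * (primary + secondary_a + secondary_b))
```

In this form the term from emission b vanishes at τ = 1 and the remaining exponent depends on the jet-mass scales only, while lowering τ strengthens the suppression, in line with the discussion above.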

Finite τ corrections
To obtain an insight into the role of the $\tau_{32}$ cut in a phenomenological context, one has to address values of τ ∼ 1. From the viewpoint of resummation this has implications identical to those first pointed out in the $\tau_{21}$ case [37]. The small τ limit resummation of the previous subsection is designed to fully capture double logarithmic terms of the form $\alpha_s^n L^{2n}$ where, for power counting purposes, we use the symbol $L^{2n}$ to denote double logarithms in any of $\ln\rho$, $\ln\rho_{\min}$, $\ln(\rho/\rho_{\min})$, $\ln\tau$ or any combination of them. From the fixed-coupling Sudakov form factor, Eq. (4.7), written in terms of $\rho_a$, $\rho_b$ and τ, we note that we obtain terms that are single logarithmic in jet masses (and jet mass ratios) but double logarithmic overall due to the role of $\ln\tau$, i.e. terms of the form $\alpha_s \ln\rho_b \ln\tau$ and $\alpha_s \ln\frac{\rho_a}{\rho_b} \ln\tau$. Beyond the small τ limit we need to account for such terms beyond just their $\ln\tau$ dependence, i.e. obtain the full function f(τ) that multiplies single logarithms in jet mass. However, given that single logarithms in jet mass ratios, i.e. $\alpha_s \ln(\rho_b/\rho_a)$, are smaller, we do not attempt to obtain the finite τ corrections for such terms, which is substantially more involved, and accordingly retain their small τ form only. In the Sudakov form factor with inclusion of running coupling, we therefore wish to control terms of the form $\alpha_s^n L_\rho^n f_n(\tau)$ (where $L_\rho$ generically denotes logarithms in jet masses but not ratios), while the small τ resummation accounts only for terms that approximate $f_n(\tau)$ by its leading small τ behaviour $\sim \ln^n\tau$.
In the small τ limit resummation of the previous subsection we assumed that emissions were strongly ordered in terms of their contribution to the jet mass (in addition to strong ordering in angle). In order to achieve resummation of terms $\alpha_s^n L_\rho^n$ with their accompanying τ dependence, we no longer assume strong ordering in jet masses. In particular, in the small τ limit we assumed that $\tau_2$ was set by a single emission b, and hence its value was taken to be $\rho_b$. Beyond the small τ limit, we must account for the fact that $\tau_2$ receives a contribution from all emissions in the jet except emission a. Since we still desire terms that are at least single logarithmic in jet masses, we continue to assume that emissions are strongly ordered in angle. This approximation of emissions ordered in angle but not in mass is the same as was made in the case of $\tau_{21}$ studies for two-pronged decays [37], to obtain the finite τ correction to the small τ results [30].
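We can then write
$$\tau_{32} = \frac{\tau_3}{\tau_2} = \frac{\sum_{i \neq a,b} \rho_i}{\rho_b + \sum_{i \neq a,b} \rho_i} = \frac{\rho - \rho_a - \rho_b}{\rho - \rho_a}, \qquad (4.8)$$
where the numerator comes from $\tau_3$ being given by the sum of jet masses contributed by all emissions except a and b, while the denominator is $\tau_2$, which is given by the sum of jet masses contributed by all emissions except emission a. The sum of all emissions' contributions to the jet mass, including those of a and b, just gives the total jet mass ρ.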
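The relation between $\tau_{32}$ and the individual mass contributions can be checked with a small numerical sketch. It assumes only what is stated above: $\tau_3$ is the sum of the contributions $\rho_i$ from all emissions other than a and b, $\tau_2$ additionally includes $\rho_b$, and the total is ρ. The function names are hypothetical.

```python
def tau32_from_masses(rho_b, others):
    """tau_32 = (sum of rho_i, i != a, b) / (rho_b + that sum)."""
    rest = sum(others)
    return rest / (rho_b + rest)


def rho_b_for_tau(rho, rho_a, tau):
    """Invert tau_32 = (rho - rho_a - rho_b) / (rho - rho_a) for rho_b."""
    return (1.0 - tau) * (rho - rho_a)
```

For instance, with $\rho_a = 0.005$, $\rho_b = 0.002$ and two further emissions contributing 0.0003 and 0.0002, the total is ρ = 0.0075 and $\tau_{32} = 0.2$; inverting the relation at fixed ρ, $\rho_a$ and τ returns $\rho_b = 0.002$.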

Differential distribution in τ
We begin by presenting a result for the joint distribution in jet mass ρ and τ, i.e. the quantity $\frac{\rho\tau}{\sigma}\frac{d^2\sigma}{d\rho\,d\tau}$, where τ is a set value of $\tau_{32}$. To begin with we shall consider primary emissions only, since it is straightforward to account for secondary emissions in the final result.
We first write the result for the cross-section differential in both τ and ρ, which accounts for the two emissions a and b included in the leading-order formula, but now accompanied by an infinite number of additional emissions which are strongly ordered in angle. The strong ordering in angle ensures that these emissions are emitted independently from the hard initial parton, which leads to the standard factorised formula for any number of emissions, Eq. (4.9). This result is written using a fixed-coupling approximation for the emission of partons a and b, though we shall account for the running of the coupling, with the $k_t$ of those emissions, in the pre-factor for our final results. It involves considering p factorised real emissions, with a sum over all p, alongside a sum over all virtual corrections included via the exponential form factor. The factor R, appearing in both real and virtual terms, stems from the integral over the emission probability for a single emission in the soft and collinear limit, at a fixed mass ρ. The RHS of its defining equation gives the fixed-coupling result, in which we have replaced the splitting functions p(z) by their soft piece ∝ 1/z and incorporated the effect of hard-collinear emissions by introducing the $B_i$ terms, corresponding to inclusion of the hard-collinear piece of the splitting function to our accuracy. For quark and gluon jets respectively, $B_q = -3/4$ and $B_g = (-11 C_A + 4 n_f T_R)/(12 C_A)$.
Integrating over $\rho_b$ in Eq. (4.9) using the delta function constraint involving τ allows us to set $\rho_b = (1 - \tau)(\rho - \rho_a)$, which then leads to the result for $\frac{\rho\tau}{\sigma}\frac{d^2\sigma}{d\rho\,d\tau}$, assuming τ < 1/2. In this result we use the shorthand notation $\Theta_{\rho_{\min}}$ to denote the $\rho_{\min}$ condition, and the $\rho_b$ that occurs as an upper limit in the $\rho_i$ integral is understood to be the value set by the delta function, i.e. $(\rho - \rho_a)(1 - \tau)$. The other step function constraint on $\rho_a$ derives from the condition that $\rho_a > \rho_b$ and the value of $\rho_b$ set by the delta function integral we have performed. We are then left to evaluate the multiple emission contribution, where the sum over the $\rho_i$ is constrained to be equal to $(\rho - \rho_a)\tau$. Additionally each emission i is constrained so that $\rho_i < \rho_b = (1 - \tau)(\rho - \rho_a)$; however, for τ < 1/2 this condition is automatically met if the stronger condition on the sum of the $\rho_i$ is satisfied. In what follows we restrict our attention to τ < 1/2, as this region is sufficient given the optimal value of τ that emerged from the Monte Carlo studies in section 3.
Finally, we note that one can evaluate the multiple emission contribution on the second line of Eq. (4.9) simply by using known results for the standard jet mass [69], since the constraint on multiple emissions is the same as for the plain jet mass ρ but with ρ replaced by $(\rho - \rho_a)\tau$. Hence the result can be written down without needing to perform any further explicit calculation. This result accounts for configurations where the three prongs tagged by Y$_{\text{m}}$-Splitter are the hard parton which initiates the jet along with two gluons emitted independently from it; however, our final result also contains configurations where a gluon emitted from the hard parton branches, with the resulting three particles corresponding to the three Y$_{\text{m}}$-Splitter prongs.
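The statement that the per-emission constraint is automatically satisfied for τ < 1/2 follows from a one-line inequality, spelled out here for convenience. It uses only the non-negativity of the individual contributions $\rho_i$ and the two delta-function constraints quoted above.

```latex
\sum_{i \neq a,b} \rho_i = \tau\,(\rho - \rho_a), \quad \rho_i \ge 0
\;\;\Longrightarrow\;\;
\rho_i \;\le\; \tau\,(\rho - \rho_a) \;<\; (1 - \tau)\,(\rho - \rho_a) \;=\; \rho_b
\quad \text{whenever } \tau < \tfrac{1}{2}.
```

For τ > 1/2 the middle inequality fails, so a single additional emission could exceed $\rho_b$ and the per-emission constraint would have to be imposed explicitly.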
For the region τ > 1/2 one could in principle follow the same method as outlined for the τ 21 calculation in Ref. [37], though given our immediate motivation we do not consider this region further in the present study. We could also have initially integrated over ρ a instead of ρ b to obtain the result in the equivalent form ρτ σ . (4.13) A key feature of our results is the presence of an overall 1/(1−τ ) factor as was also the case in the τ 21 result of Ref. [37]. Taking the small τ limit of Eq. (4.12) we should return to the small τ result we derived in the previous subsection, which is indeed the case up to subleading terms in the orderᾱ 2 pre-factor. To be more precise, in the previous subsection we had evaluated the pre-factor taking ρ a to dominate the jet mass, by using the condition δ(ρ − ρ a )Θ(ρ a > ρ b ), which correctly captures all double logarithmic terms in the pre-factor. If instead one uses the more accurate condition δ(ρ − ρ a − ρ b )Θ(ρ a > ρ b ), then after integration over ρ b we obtain the same result as the small τ limit of Eq. (4.12). Relative to the strong ordering of emissions ρ a and ρ b , using the exact jet mass conditions affects only single logarithmic termsᾱ 2 L 2 in the pre-factor where L denotes logarithms in ρ/ρ min or ζ. Such terms are two logarithms below the leadingᾱ 2 L 4 terms in the pre-factor and only involve more modest logarithms than those in the jet mass. We can therefore consider such terms as negligible and hence the small τ limit of Eq. (4.12) is equivalent to the result of the previous subsection. For an explicit calculation demonstrating the argument above, we refer the reader to Appendix A.
Beyond the small τ limit the most crucial feature of the result is the overall 1/(1 − τ) factor in Eq. (4.12). While there is additionally a τ dependence in the step functions in Eq. (4.12), this is again responsible only for introducing τ dependent terms of order $\bar{\alpha}^2 L^2$ in the pre-factor, and hence can be neglected to our accuracy as illustrated in Appendix A. In what follows we shall use the freedom to set τ to zero in the step function conditions to obtain an analytic form for the cumulative distribution.

Cumulative distribution
It is of direct interest to also obtain the result for the differential distribution in the jet mass with a cut on $\tau_{32}$, $\tau_{32} < \tau$, rather than fixing a value for $\tau_{32}$ as required for the double differential distribution above. In order to do this one can integrate the differential distribution, Eq. (4.12), between 0 and τ. Setting τ to zero in the step functions of the pre-factor, which has only a sub-leading impact as discussed before, we can write Eq. (4.14), which corresponds to integrating the distribution up to some maximum value τ for $\tau_{32}$. To single-logarithmic accuracy, we can expand the radiator about some point $\tau_0$ to write: where $R'(x) = -\partial R/\partial \ln x$. With $\tau_0$ chosen such that in the small τ limit $\tau_0$ is of order τ, and given that the integral is dominated by values $\tau' \sim \tau$, terms of order $R''$ and beyond can be neglected as they are beyond single-logarithmic accuracy and we may replace $\tau'$ by $\tau_0$ in the $R'$ terms to obtain: Upon evaluating the integral over $\tau'$ we obtain We then have the result for the cumulative distribution given by Eq. (4.16) with finite τ effects encoded in the hypergeometric function of Eq. (4.18) precisely as for the $\tau_{21}$ case [37]. The origin of the hypergeometric factor is simply the extra overall factor of $1/(1-\tau')$ in the finite τ differential distribution. Without this factor we simply obtain the usual result for the cumulative (integrated) distribution up to terms involving $R''$ beyond our accuracy, as long as $\tau_0 \sim \tau$. In what follows we shall simply choose $\tau_0 = \tau$ while noting that varying this choice by an O(1) factor will correspond to an effective resummation scale uncertainty on our results.
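The origin of the hypergeometric factor can be checked numerically. The sketch below assumes that the factor takes the form ${}_2F_1(1, R'; 1+R'; \tau)$, which is what one obtains from the identity $\int_0^\tau d\tau'\, \tau'^{a-1}/(1-\tau') = (\tau^a/a)\, {}_2F_1(1, a; 1+a; \tau)$ with $a = R'$ treated as a constant. Whether this coincides term by term with Eq. (4.18) is an assumption, and the function names are hypothetical. Only the Python standard library is used.

```python
import math

def hyp2f1_1_a_ap1(a, x, tol=1e-14, max_terms=10000):
    """Series for 2F1(1, a; 1 + a; x) = sum_n a / (a + n) * x^n, for |x| < 1."""
    total, x_pow = 0.0, 1.0
    for n in range(max_terms):
        term = a / (a + n) * x_pow
        total += term
        if abs(term) < tol:
            break
        x_pow *= x
    return total

def finite_tau_factor_direct(a, tau, n_steps=2000):
    """Compute (a / tau^a) * int_0^tau dt t^(a-1) / (1 - t) by Simpson's rule.

    The substitution u = t^a removes the integrable endpoint singularity:
    the integral becomes (1/a) * int_0^{tau^a} du / (1 - u^(1/a)).
    n_steps must be even.
    """
    upper = tau ** a
    h = upper / n_steps
    f = lambda u: 1.0 / (1.0 - u ** (1.0 / a))
    s = f(0.0) + f(upper)
    for k in range(1, n_steps):
        s += (4 if k % 2 else 2) * f(k * h)
    return (s * h / 3.0) / upper

for a in (0.3, 0.5, 1.0):
    tau = 0.4
    print(a, hyp2f1_1_a_ap1(a, tau), finite_tau_factor_direct(a, tau))
```

In the small τ limit the factor tends to one, which recovers the usual cumulative result without the $1/(1-\tau')$ enhancement.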
To obtain an alternate form of Eq. (4.16) we could have integrated Eq. (4.13) over τ instead. Again, as before, we can drop any τ dependence in the pre-factor other than the overall $1/(1-\tau)$, which leads to a factor $\rho/(\rho - \rho_b)$, rather than $\rho/\left(\rho - \frac{\rho_b}{1-\tau}\right)$. This again leads one to consider only the overall 1/(1 − τ) factor together with the τ dependence in the exponent. Then integrating over τ using the same steps that gave Eq. (4.16) we obtain where we have again used the freedom to neglect factors of τ in the pre-factor which only introduce terms of order $\bar{\alpha}_s^2 L^2$ and set $\tau_0$ to τ. While so far we have worked with a fixed-coupling approximation in our pre-factor, we now introduce the running of the coupling for "emissions" a and b. In order to do so we replace the $\bar{\alpha}^2$ term with $\bar{\alpha}(z_a \rho_a p_T^2 R^2) \times \bar{\alpha}(z_b (\rho - \rho_a) p_T^2 R^2)$ inside the integral of Eq. (4.16). This corresponds to using the $k_t$ of each emission in the argument of the corresponding coupling factor, with neglect of a factor 1 − τ in the coupling associated to emission $\rho_b$, i.e. using $\rho_b = (\rho - \rho_a)$ instead of $(\rho - \rho_a)(1 - \tau)$. The 1 − τ factor only results in sub-leading terms involving logarithms of 1 − τ which we neglect, consistent with our general treatment of the pre-factor.
Finally to include secondary emissions we use the full radiator including the secondary emission terms i.e. replace

Pre-grooming with Soft Drop
It is known that the Y-Splitter and Y m -Splitter methods need to be supplemented by some form of grooming in order to yield good performance for the signal significance (signal to square-root of background ratio) [31,32,36]. In Ref. [32] it was found, in the context of W/Z/H tagging, that pre-grooming jets with Soft Drop was optimal in terms of increasing performance while minimising the sensitivity to non-perturbative effects. Furthermore, in the context of top-tagging there is another advantage to pre-grooming, namely that the pregrooming procedure leads to a Sudakov form factor inherited from the groomer [32]. In other words for mMDT pre-grooming we obtain the mMDT Sudakov structure while for Soft Drop with non-zero β we obtain the Soft Drop Sudakov for both signal and background jets. Given that a modest rather than strong Sudakov suppression was found to be beneficial for signal significance in top-tagging [36], pre-grooming with mMDT which has only a single-logarithmic Sudakov form factor, followed by Y m -Splitter , emerged as the most performant method as well as being resilient to non-perturbative effects.
Here we consider QCD jets pre-groomed with mMDT as well as Soft Drop for β = 2 . In Ref. [36] a result was obtained for the jet mass distribution with Soft Drop pre-grooming followed by the application of Y m -Splitter i.e. without the additional τ cut involved here. As described in detail in Ref. [36], three situations can arise : a) the largest gen-k t emission, i.e. a in the present paper, stops the groomer, b) the next largest gen-k t emission, i.e. b stops the groomer and c) another emission stops the groomer. For the first situation the result obtained for the primary emission radiator, with mMDT grooming, was shown to be of the form: This corresponds to the usual mMDT Sudakov at the scale ρ b but modified by the addition of an extra piece, R angle mMDT that arises because emissions with angle below θ a are not examined by the groomer and hence need to be vetoed (if they have mass above ρ b ) even if they have z < ζ. This extra contribution, at fixed-coupling and leading logarithmic accuracy, is given by [36]: In case b), where emission b stops the tagger, one obtained instead just the standard mMDT result R mMDT (ρ b ), while for case c) where an emission other than a or b stops the tagger, there is a complete cancellation against virtual corrections and hence no contribution. For our current work where we apply also a τ cut, situation a) yields the result reported in Eq. (4.21) but now the mass scale ρ b is replaced by τ (ρ − ρ a ) in both terms of Eq. (4.21). In the case b) where emission b stops the tagger we now have to also account for the fact that while emissions with z < ζ and θ < θ b can never set a mass, or equivalently gen-k t distance, above ρ b , they can set a mass larger than τ (ρ − ρ a ). This is disallowed by the τ cut and hence such emissions have to be vetoed which leads to the appearance of a term R angle mMDT (θ b , τ (ρ − ρ a )) , in addition to R mMDT (τ (ρ − ρ a )), also in case b). 
Taking into account hard-collinear emissions and the running of the coupling we can write our result in the form where the first line is just the standard mMDT result [25], the second line is the extra $R^{\text{angle}}$ contribution and $\theta_1 = \max(\theta_a, \theta_b)$ is the angle of the emission which stops the groomer. The basic form of the result is then that of the mMDT Sudakov evaluated at the scale $(\rho - \rho_a)\tau$, which corresponds to a single-logarithmic Sudakov suppression. In a fixed-coupling leading log approximation, the $R^{\text{angle}}$ term can be written as where the logarithm involves a ratio of two small quantities, similar to the behaviour obtained for secondary emission contributions. Overall therefore we retain the feature that pre-grooming with mMDT results in a reduced Sudakov suppression factor relative to the un-groomed case. The step function in eq. (4.24) switches off the $R^{\text{angle}}$ contribution when  One can also consider pre-grooming with Soft Drop. Identical considerations to the mMDT case apply, with the only difference being in the grooming condition i.e. for an emission to pass the grooming one needs $z > \zeta\theta^{\beta}$. We then obtain a result along similar lines to that for the mMDT above, but with the Soft Drop Sudakov (i.e. radiator) replacing that for the mMDT and a corresponding $R^{\text{angle}}$ contribution whose fixed-coupling leading-log form is explicitly reported in Ref. [36].
Secondary emissions are unaffected by grooming so the only change to the radiator, relative to the un-groomed case, arises from the primary emission term discussed above. The inclusion of finite τ effects is also unchanged relative to our previous discussions so that we still have the result Eq. (4.12) for the differential distribution and Eq. (4.16) for the cumulant but with the primary emission radiator replaced by that for the groomed case Eq. (4.23) for mMDT and its analogue for Soft Drop.

Numerical implementation and parton shower studies
For the rest of this section we focus on quark initiated jets, as in the jet p T range under consideration, these are the dominant background to top jets, though most of what follows could equally be applied to gluon initiated jets with minimal modifications. The form of Eq. (4.19) is that of the leading O α 2 s result multiplied by a factor accounting for further emissions. We now perform a type of matching to improve the accuracy with which we calculate this leading order pre-factor. While we have mentioned in section 4.1 that a more precise calculation of the leading order pre-factor based around the triple collinear splitting functions is possible, it was shown in [36] that the numerical difference between such a calculation and one using a product of 1 → 2 splitting functions, but the full phase-space, is slightly less than 10% for a jet mass of 175 GeV and m min = 50 GeV. Further to this, when a pair of collinear emissions are strongly ordered in angle, as we have considered them to be throughout this work, the appropriate matrix element is a product of 1 → 2 splitting functions. We therefore choose to match our resummed calculation on to a pre-factor calculated by taking the matrix element to be a product of 1 → 2 splitting functions but still using the full three-particle phase-space in the collinear limit. This particular matching procedure also potentially serves to bring the effects included in our calculations more in line with what is captured by the parton showers which we will compare our calculations to, as while these may be expected to contain elements of the phase-space, they do not include the full triple collinear splitting functions.
We now re-calculate the LO pre-factor at this higher level of accuracy, before showing how it is matched to the full resummation. As before, we use the C 2 F channel for illustrative purposes, although our final results contain contributions from the C F C A and C F n f colour channels where similar modifications can be made to those listed below. In what follows, the parton initiating the jet is labelled as parton 3, with the emission at the widest angle to this parton labelled with 1 and the smaller angle emission labelled 2. So as to ensure that the variables appearing as the arguments of the factorised 1 → 2 splitting functions are defined appropriately, we work with the energy fraction variables z and z p defined so that z 1 = 1 − z where the Gram determinant is given by and Θ Ym-Splitter = Θ(min(ρ 12 , ρ 13 , ρ 23 ) > ρ min )Θ(min(z 1 , z 2 , z 3 ) > ζ), (4.27) where ρ ij = z i z j θ 2 ij , encapsulating the conditions imposed by Y m -Splitter without approximating any particles as soft. Similarly, without approximating any particles as soft Θ clust. = i<j =k Θ(θ ij < min(θ ik , θ jk ))Θ(θ ij < R)Θ(θ ij,k < R). . We must first specify how the quantities appearing in the Sudakov factor of Eq. (4.19), which are defined in the soft and collinear limit, are related to the kinematic variables appearing in our improved LO pre-factor (Eq. 4.25). For the C 2 F channel we make the following prescription: which we note that there is some freedom in choosing, the only constraint being that the correct result must be recovered in the soft and strongly-ordered limit.
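The Y$_{\text{m}}$-Splitter condition of Eq. (4.27) is simple to state in code. The following Python sketch evaluates it for three prongs, using $\rho_{ij} = z_i z_j \theta_{ij}^2$ as defined above. The function name, the argument layout and the numerical inputs are hypothetical and chosen only for illustration, with angles taken to be already expressed in the units appropriate to $\rho_{\min}$.

```python
from itertools import combinations

def ym_splitter_pass(z, theta, rho_min, zeta):
    """Evaluate Theta(min rho_ij > rho_min) * Theta(min z_i > zeta).

    z     : sequence of three energy fractions (z_1, z_2, z_3)
    theta : dict mapping index pairs (i, j), i < j, to pairwise angles theta_ij
    The pairwise masses use the collinear form rho_ij = z_i z_j theta_ij^2.
    """
    rho = [z[i] * z[j] * theta[(i, j)] ** 2 for i, j in combinations(range(3), 2)]
    return min(rho) > rho_min and min(z) > zeta

# Illustrative (made-up) three-prong configuration
z = (0.5, 0.3, 0.2)
theta = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 0.5}
print(ym_splitter_pass(z, theta, rho_min=0.01, zeta=0.05))  # passes both cuts
print(ym_splitter_pass(z, theta, rho_min=0.01, zeta=0.25))  # fails the zeta cut
```

No particle is approximated as soft here, which mirrors the way the condition enters the improved leading-order pre-factor.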
Replacing the O(α 2 s ) part of equation (4.19) with Eq. (4.25) and using the matching prescription given in Eq. (4.29) we can write: where the quantities ρ b , k t1 , and k t2 are as defined in Eq. (4.29) 7 . Eq. (4.19) can be recovered from Eq. (4.30) by replacing s 123 → ρ a + ρ b , neglecting the hard collinear part of the splitting functions, carrying out the θ 12 integral (equivalent to an azimuthal integral) and changing phase-space variables back to ρ a , ρ b , z a and z b . For the sake of brevity, the above result is given only for the C 2 F colour channel, however, our final results include the C F C A and C F n f colour channels, where a single gluon is emitted and then decays as opposed to the two independent emissions shown above. We also include secondary Sudakov factors in our final result exactly as before. Our results for pre-groomed jets are obtained by replacing the primary radiator with the groomed variant as discussed in section 4.3.3.
Eq. (4.30) is evaluated numerically using the Suave numerical integrator [70] interfaced to Mathematica [71]. As the cut on τ 32 restricts emissions down to very low transverse momenta, we freeze the running coupling at k t = 1.5 GeV to prevent divergences due to the Landau pole. The tagged background fraction is constructed from Eq. (4.30) by integrating ρ over the mass window. This is shown in figure 4 along with the same quantity derived from parton shower simulations using both Pythia and Herwig [72] for three variations on our calculation: no grooming, Soft Drop with β = 2 pre grooming, and pre-grooming with mMDT.
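The freezing of the coupling can be illustrated with a short Python sketch. It assumes a one-loop running coupling with a fixed number of flavours $n_f = 5$ and a reference value $\alpha_s(M_Z) = 0.118$. These inputs, the one-loop form and the function names are assumptions made for illustration and are not taken from the calculation described above; only the freezing scale of 1.5 GeV comes from the text.

```python
import math

M_Z = 91.1876        # GeV, assumed reference scale
ALPHA_S_MZ = 0.118   # assumed reference value of the coupling

def beta0(n_f=5):
    """One-loop beta-function coefficient, (33 - 2 n_f) / (12 pi)."""
    return (33 - 2 * n_f) / (12 * math.pi)

def alpha_s_frozen(kt, kt_freeze=1.5, n_f=5):
    """One-loop alpha_s(kt), held constant below kt_freeze (in GeV).

    Freezing keeps the argument of the coupling away from the Landau pole,
    where the one-loop denominator would vanish.
    """
    mu = max(kt, kt_freeze)
    return ALPHA_S_MZ / (1.0 + ALPHA_S_MZ * beta0(n_f) * math.log(mu**2 / M_Z**2))

for kt in (0.1, 1.5, 10.0, 100.0):
    print(f"kt = {kt:6.1f} GeV  alpha_s = {alpha_s_frozen(kt):.4f}")
```

Below the freezing scale the coupling is constant, so integrals over very soft emissions allowed by the $\tau_{32}$ cut remain finite.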
In all cases one notes that our results are in reasonable agreement with parton shower predictions, given the uncertainties of the calculations and the shower predictions due to subleading terms present in each case, and also reflected in the difference between Herwig and Pythia showers. In the case of no pre-grooming or grooming with β = 2 Soft Drop, our finite τ calculation is clearly an improvement of the small τ calculation over a wide range of τ values. Where jets are pre-groomed with mMDT, the finite τ effects we include still have a sizeable impact and improve agreement with the parton showers as τ → 1/2, however, at smaller values of τ it is not clear that agreement with the parton showers is improved by their inclusion. This is potentially due to the fact that the leading logs in this case are single logs and we do not include any sources of next-to-leading logarithms (or their interplay with the τ dependence), other than the finite τ corrections we introduced here. Figure 4 also shows increasing differences between results from parton showers as the level of grooming decreases. Hence for the mMDT result, which involves more aggressive grooming, the two shower descriptions agree better over a wider range in τ, while the un-groomed case shows the largest differences. This is likely due to the differences in the modelling of soft gluon effects between the two showers, which is ameliorated by grooming.
7 In deriving equation (4.30) all emissions are considered to contribute to the jet mass as shown in eq. (4.29). As well as allowing us to capture the function of τ multiplying single logarithms in mass scales, this also generates τ dependent terms which are beyond our accuracy. These terms are removed, as discussed in section 4.3, by neglecting the τ dependence in the pre-factor beyond the leading 1/(1 − τ) term which leads to the hypergeometric function in eq. (4.30). Specifically, we have set τ to zero inside the delta function, which would otherwise be written as δ ρ − s 123

Signal jets
Here we consider the action of the Y m -Splitter method with a τ 32 cut on the top quark initiated signal jet. In Ref. [36] studies were carried out for top jets with a range of tagging methods including Y m -Splitter both with mMDT and Soft Drop pre-grooming. Here one has to account, in principle, for gluon radiation in both the top production and top decay processes. In the highly boosted limit the top quark is similar to a light quark and the role of soft gluon radiation and its resummation therefore becomes as important as for the background QCD case. In particular in the boosted limit one can ignore the dead-cone effect [73], which does not affect our logarithmic accuracy. We shall also consider soft gluon energies well above the top width where we can neglect additional details of the soft gluon emission pattern studied for instance in [74]. In the region relevant to our studies we can therefore consider soft emissions as arising from a single fast moving colour charge aligned with the initial top quark direction. 8 In spite of these simplifying dynamical assumptions, for top jets, the resummation of large logarithms for the tagging and grooming combinations we consider is more complicated than for the case of background jets. In particular the three-pronged structure of the jet can arise in multiple ways including from the electroweak decay of the top system as well as from soft gluon emission effects. Therefore as in Ref. [36] our targeted accuracy will be lower for the signal case and shall omit double logarithms in ζ, ρ/ρ min and other similar ratios. We shall mainly aim at capturing leading logarithms in m 2 /p 2 T where m is a mass-scale which is at most of the order of the top mass.

Jet mass distribution for top jets
We start by computing the fraction of top jets tagged by simply requiring the invariant mass to be within some mass window. Radiation produced by the virtual top quark emerging from the hard process can be recombined with the final top decay products to form the final jet.
Placing an upper limit on the jet mass therefore directly constrains this radiation and results in a Sudakov form factor precisely as for a light quark jet. We restrict ourselves to the case where the lower edge of the mass window is below the top mass, so that jets containing all of the top decay products will have mass larger than this. Of course, there will be some fraction of events where not all of the top decay products are reconstructed as a single jet, however such configurations are suppressed by a power of $m_t/p_T$ [31] and hence can be neglected to our accuracy. We can then write the tagged fraction of events as where $|M_{t\to bq\bar{q}}|^2$ is the squared matrix-element for the top decay, $d\Phi_3$ is the three-body phase-space in the collinear approximation and $\Theta_{\text{Clust}}$ is the jet clustering condition as for the background case (see Eqs. (4.25) and (4.28)). The normalisation factor $\sigma_0$ is just the result without considering QCD corrections i.e. the squared matrix-element for top decay integrated over the final state phase-space with the jet clustering requirement. The factor $S_{\text{QCD}}$ takes into account the constraint on QCD radiation through limiting the jet mass. Given that the jet mass can be expressed in terms of multiple soft gluon emissions such that $\rho = \rho_t + \sum_i \rho_i$, with $\rho_t = m_t^2/(R^2 p_T^2)$, the constraint on $\rho_i$ just produces a Sudakov form factor which factorises from the integral over the top-decay phase-space to give: where $R(\rho - \rho_t)$ is the standard jet mass Sudakov evaluated to NLL accuracy [75] at the shifted scale $\rho - \rho_t$.$^{9}$ In order to test this result and the approximations inherent in deriving it, we compare our result to expectations from Pythia 8. For our Pythia 8 study we choose the lower edge of the mass window to be 10 GeV below the top mass which serves to further reject events where the top decay constituents are not recombined into the final jet. Effects contributing at the lower edge of the mass range should thus only differ from our result by numerically small effects. Pythia 8 was used to create a sample of 1 million $t\bar{t}$ events, with UE, MPI and hadronisation deactivated, and FastJet [64] was then used to find CA jets with R = 1. Figure 5 shows the integrated jet mass distribution as the upper limit on the mass range is varied. Our analytical estimate is in good agreement with the distribution obtained with Pythia and FastJet. As m approaches the top mass the agreement between our calculation and Pythia slightly worsens which is to be expected as effects which we neglect, including non-perturbative effects, become relevant for values of m very close to the top mass.
9 Although we use the full heavy jet mass radiator evaluated to NLL accuracy the result is only accurate to modified LL accuracy for our case. In particular we neglect non-global [76] and clustering logarithms [77] that are relevant here at NLL level.

Top jets with Y m -Splitter
Next we consider the application of Y m -Splitter to the tagging of top jets. This was already studied in Ref. [36] where it was noted that the signal case had a number of additional complications relative to the description of QCD background jets, which made the attainment of leading logarithmic accuracy in each of the parameters ρ, ρ min /ρ and ζ substantially harder. For this reason only basic leading logarithmic accuracy in ρ (or equivalently in ρ min ) was targeted which allowed for a simplified treatment of the Sudakov form factor. Consistently with the accuracy goal of that article, complications including the possibility of soft gluon emissions giving one of the three prongs found by the tagger, and the interplay with the mass window constraint were neglected. The results, broadly speaking, gave a reasonable description of the main behaviour seen with parton showers, but the agreement was not as good as seen for QCD background jets.
Here, prior to discussing N-subjettiness, we shall attempt to at least partially address some of the complications that are mentioned above for pure Y m -Splitter . In particular we now consider in the soft-collinear limit, the situation where a single soft emission can be de-clustered as one of the prongs found by the tagger in addition to the case where the de-clustered prongs arise from the electroweak top decay process. Let us start by considering the result at leading-order i.e. neglecting all QCD radiative corrections. Then we can write where Θ Ym-Splitter = Θ(Min(ρ 12 , ρ 13 , ρ 23 ) > ρ min )Θ(Min(z 1 , z 2 , z 3 ) > ζ), (5.4) and 1, 2, 3 refer to the three prongs identified by Y m -Splitter . As there is no soft enhancement to the top decay matrix element, we use only a collinear approximation for the pairwise invariant masses, ρ ij = x i x j θ 2 ij in our calculations of the leading-order top decay. Next we consider QCD radiative corrections in the soft and collinear limit. We first take into account the situation that no soft gluon emissions are de-clustered as a prong by Y m -Splitter . This imposes a constraint on real emissions in addition to the constraint on jet mass, which comes from the requirement that the soft emission must set a smaller gen-k t distance than those set by the three-pronged top system. Labelling the soft emission by i we then have that min(d i1 , d i2 , d i3 ) < min(d 12 , d 13 , d 23 ). This complicated constraint simplifies in the soft and strongly-ordered limit responsible for the leading double logarithms we seek. To be more precise, the three-pronged top decay results in relatively energetic particles owing to the lack of soft enhancement in the electroweak decay. 
For a soft gluon emission to set a comparable gen-$k_t$ distance it must be emitted at a relatively large angle compared to the opening angle between the top decay products, $1 \gg \theta_i^2 \gg \theta_{ij}^2$, where $\theta_i$ is the angle wrt the jet axis or equivalently the emitting top quark. In this region we can approximate the angle made by the soft emission with any given prong from the top decay simply by the angle wrt the jet axis which allows us to write the gen-$k_t$ distance for the gluon as $z_i \theta_i^2$. In addition to the gen-$k_t$ distance, the soft emissions are also subject to the jet mass constraint as before. Therefore the argument of the Sudakov corresponds to whichever is the tighter constraint which gives where by $\Sigma^{(0)}$ we mean the contribution where we enforce that no soft gluons can give one of the 3 prongs found by the tagger. Next we correct this picture by allowing a soft emission to form one of the prongs found by Y$_{\text{m}}$-Splitter, a situation that can first arise at order $\alpha_s$. Consider a single gluon emerging from the de-clustering process before one of the top decay products, and thus being identified as a prong. This gluon is constrained so that it has energy fraction z > ζ and sets a minimum pairwise mass with the other prongs (labelled 1 and 2) of $m_{\min}$, i.e. $\min(\rho_{1g}, \rho_{2g}) > \rho_{\min}$, where g labels the gluon. The gluon must also not set a jet mass which pushes the jet outside of the mass window. The emission of a single soft gluon factorises from the top decay process and gives an order $\alpha_s$ contribution to the pre-factor. Subsequent gluon emissions are constrained by the requirement of not being de-clustered as a prong as well as being subject to the jet mass constraint and again give rise to a Sudakov suppression.
Hence we obtain the result: In the above result the first line gives the pre-factor which, aside from the usual squared matrix-element and phase-space integration for top decay, now also has the QCD pre-factor coming from real emission of the soft gluon. The three prongs are given by the soft gluon, a clustered pair of particles (ij) from the top decay and the remaining particle k arising from the top decay. The condition $\Theta(z\theta^2 > d_{ij})$ alongside the requirement that z > ζ ensures that the soft gluon is de-clustered as a prong.$^{10}$ The condition $\Theta(\mathrm{Min}(\rho_{k(ij)}, z\theta^2) > \rho_{\min})$ is the $\rho_{\min}$ condition where again we used the fact that at our accuracy we can replace the gluon angle wrt a given prong by that wrt the jet axis. Finally we discuss the Sudakov which has as argument $\mathrm{Min}(d_{k(ij)}, z\theta^2, \rho - \rho_t - z\theta^2)$, reflecting the competing constraints on subsequent soft emissions. Firstly we have that emissions must not set a gen-$k_t$ distance larger than the smallest gen-$k_t$ distance amongst the 3 prongs found by Y$_{\text{m}}$-Splitter, given by $\mathrm{Min}(d_{k(ij)}, z\theta^2)$. Secondly we have that the soft emissions must not push the jet out of the mass window, i.e. the jet mass should be below ρ. Taking into account the additional soft emission we now have as a prong, this condition implies that for multiple subsequent emissions i we must have $\sum_i \rho_i < \rho - \rho_t - z\theta^2$. Taken together these conditions, on gen-$k_t$ and mass, produce the Sudakov in Eq. (5.6). It is additionally possible for two soft emissions to be resolved i.e. form two of the prongs found by Y$_{\text{m}}$-Splitter. This occurs at order $\alpha_s^2$ with only modest logarithmic enhancements$^{11}$ and hence such contributions are suppressed relative to the terms we include. We therefore omit them here.
10 Note that here we used the same leading-logarithmic simplification for the gen-$k_t$ distance for soft gluon emissions that led to the result in Eq. (5.3).
We also note that we have ignored soft emissions from the qq system produced by the splitting of the W boson. Soft emissions from this dipole are restricted in angle, by virtue of angular ordering, to have an angle less than that of the qq pair. Since they are part of the top system they also do not contribute to a shift in mass. Hence to our leading logarithmic accuracy they can also be ignored.
Our results are compared to Pythia 8 in Fig. 6, where we plot the signal efficiency as a function of $m_{\min}$ (c.f. similar plots in Ref. [36]). We show our results for both cases with (red crosses) and without (blue dots) a resolved gluon prong. Our analytics agree in both cases with the general behaviour seen with Pythia and we note an improved agreement with Pythia when the $\Sigma^{(1)}$ contribution, amounting to an O(15%) correction, is included. As before we choose the lower limit of the mass window to be 10 GeV below the top mass.
11 We remind the reader that resolved emissions are constrained in several ways. They need to have energy larger than ζ as well as a mass large enough to satisfy the $\rho_{\min}$ condition but not large enough to push the jet out of the mass window. These constraints lead to the appearance of only modest logarithmic contributions.

Y m -Splitter with grooming for signal jets
Next we examine the impact of pre-grooming with Soft Drop on our results for Y m -Splitter applied to top jets. Relative to results from previous studies [36] here we also account for the possibility of a resolved gluon prong as in the previous subsection. The result of pre-grooming with mMDT or Soft Drop is again to essentially replace the Sudakov for the un-groomed case by the Sudakov for the groomer i.e. we make the following replacements in the Σ (0) and Σ (1) terms of the un-groomed results (see Eqs. (5.5) and (5.6)): where the suffix mMDT or SD is used to indicate the grooming variant. We note that unlike the case of the QCD background jets, we have not included R angle terms in the signal case. Although such terms would in principle be present, the angular scales involved are of the order of the opening angles between top decay products. At such angular scales the radiation pattern becomes more complicated as one also needs to account for radiation from the qq dipole produced by the colour singlet W decay. Given that the terms produced are logarithms in the ratio of two small scales, i.e. of the same level of significance as ln ρ/ρ min terms, they are beyond the accuracy we aim for in the case of signal jets. The tagged signal fraction, with our usual choice of parameter values, is compared to a Pythia simulation in Figure 7, again showing the results with and without a resolved gluon prong and for grooming with SD (left) and mMDT(right). We see that except for the extreme region, where the tagged signal fraction is very small, our analytic results, especially after inclusion of the resolved gluon case, are in good overall agreement with the behaviour seen with Pythia.

Y m -Splitter with τ 32 and grooming for signal jets
We now wish to understand the effect of adding a cut on τ 32 to the tagged signal distribution after application of Y m -Splitter . We shall first consider the un-groomed case and then include the effects of grooming. We begin with the configuration where all three of the LO top decay products are identified as prongs by Y m -Splitter . With no additional emissions τ 3 vanishes and hence a cut on τ 32 has no impact. Adding a set of soft and collinear emissions, one has to consider how these emissions are constrained by the τ cut, the mass-window cut and the requirement that they should not give a resolved prong on applying Y m -Splitter .
We first introduce an approximation into our definition of $\tau_2$ which is valid to within the overall accuracy we can obtain with our current calculations for the signal case, i.e. LL accuracy in $\rho$ with neglect of logs in ratios of mass scales and $\zeta$. Consider the region of phase space where, say, $d_{12} < \min(d_{13}, d_{23})$, so that the first de-clustering will lead to two gen-$k_t$ axes lying along $p_3$ and $p_1 + p_2$. In this region of phase space, to leading order where there are no additional emissions, $\tau_2 = z_1 \theta^2_{1,12} + z_2 \theta^2_{2,12}$. As the $p_1 + p_2$ direction will be aligned more with the harder of partons 1 and 2, we make the approximation that the gen-$k_t$ axis is aligned with this parton, so that to LO we can approximate $\tau_2 = \min(d_{12}, d_{13}, d_{23})$. As there is no logarithmic enhancement associated with the leading order decay of the top, this approximation will introduce an $\mathcal{O}(1)$ rescaling of the argument of the Sudakov factor, which is consistent with an NLL correction and hence beyond our LL accuracy. When considering the role of additional soft emissions let us first consider, as in section 5.3, primary emissions at a large angle to the opening angles of the top decay system. Regardless of which of the gen-$k_t$ axes these emissions are closer to, their contribution to $\tau_3$ and $\tau_2$ may always be approximated by $\sum_i \rho_i$, where $\rho_i = z_i \theta_i^2$ and $\theta_i$ is the emission angle with respect to the emitting top quark direction. The constraint on emissions due to the $\tau_{32}$ cut is then
$$\tau_{32} \approx \frac{\sum_i \rho_i}{\min(d_{12}, d_{13}, d_{23}) + \sum_i \rho_i} < \tau,$$
which gives the constraint $\sum_i \rho_i < \min(d_{12}, d_{13}, d_{23})\,\frac{\tau}{1-\tau}$. For $\tau < 1/2$ this subjettiness constraint overcomes the constraint from Y$_{\text{m}}$-Splitter, $\rho_i < \min(d_{12}, d_{13}, d_{23})$, and hence the argument of the primary emission Sudakov depends only on the competing subjettiness and jet mass constraints.
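The algebra behind the $\tau_{32}$ constraint can be checked numerically. The following is a minimal sketch, assuming the approximations stated in the text ($\tau_3 \approx \sum_i \rho_i$ and $\tau_2 \approx \min(d_{12}, d_{13}, d_{23}) + \sum_i \rho_i$); the function names and the value of `d_min` are hypothetical and chosen only for illustration.

```python
def tau32_estimate(d_min, rho_sum):
    """Approximate tau_32 for soft large-angle emissions.

    d_min   : stands for min(d_12, d_13, d_23) of the LO top decay
    rho_sum : stands for sum_i rho_i over the additional emissions
    """
    return rho_sum / (d_min + rho_sum)


def max_rho_sum(d_min, tau_cut):
    """Largest sum_i rho_i compatible with tau_32 < tau_cut."""
    if not 0.0 < tau_cut < 1.0:
        raise ValueError("tau_cut must lie in (0, 1)")
    return d_min * tau_cut / (1.0 - tau_cut)


if __name__ == "__main__":
    d_min = 0.2  # illustrative value, not taken from the paper
    for tau_cut in (0.2, 0.4, 0.5, 0.7):
        bound = max_rho_sum(d_min, tau_cut)
        # The tau cut is the tighter constraint whenever bound < d_min,
        # which happens exactly for tau_cut < 1/2.
        print(f"tau={tau_cut}: bound={bound:.4f}, tighter than d_min: {bound < d_min}")
```

Saturating the bound reproduces the cut value, and the bound only drops below `d_min` for $\tau < 1/2$, which is the statement that the subjettiness constraint supersedes the Y$_{\text{m}}$-Splitter one in that range.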
Until now we have neglected the role of secondary radiation from the $q\bar{q}$ system (arising from W decay) since these emissions bring only enhancements in ratios of similar mass scales. If we wish to obtain a good description of the signal with a $\tau$ cut, including also the region where $\tau \ll 1$, we need to consider all sources of double-logarithmic corrections in $\tau$. Secondary emissions are a source of such double-logarithmic terms and hence we include them here. The secondary emission terms are given by taking into account soft and collinear emissions from the $q$ and $\bar{q}$ with the constraint that the emission angle is smaller than $\theta_{q\bar{q}}$, the opening angle of the $q\bar{q}$ dipole. This leads to results which have the same form as the corresponding results for the background case (see Eq. (4.6)), with $z_a$ replaced by $z_q$ and $\theta_a$ by $\theta_{q\bar{q}}$ for emission from $q$, and similarly for emission from the $\bar{q}$. We note that secondary emissions are part of the decaying top system and hence do not contribute to a shift in mass, so that the jet mass constraint is irrelevant here.
Thus we can write the result, in which $\rho_{\max}$ is the upper limit on the jet mass. Finally we account for the effect of grooming.
To take this into account one makes the usual replacement of the primary emission radiator by its groomed counterpart. An additional subtlety that is present here is the existence of $R_{\text{angle}}$ terms (see Eq. (4.22)), which originate from emissions that are not visible to the groomer as they are shielded by larger angle emissions that stop the grooming. Such terms have been ignored for the signal since they are complicated to account for and produce only logarithms of mass ratios, which we neglect. However, in the presence of a $\tau$ cut such terms also induce double logarithms in $\tau$, as described by Eq. (4.23). A consistent description of the double logs in $\tau$ should also include the double logarithm originating here, while we can neglect all other details associated with this term. Grooming is therefore included through the replacement of the radiator given in Eq. (5.10), where, at fixed coupling, $R_{\text{angle}}(\tau) = \frac{C_F \alpha_s}{2\pi} \ln^2 \tau$. We have thus far not considered the case where a soft gluon is resolved as a Y$_{\text{m}}$-Splitter prong, which we took into account in the previous subsections. For such a configuration, the effect of the $\tau$ cut is actually to constrain the phase space of partons arising from the LO top decay. As the electroweak top decay is not logarithmically enhanced, the restriction from the $\tau$ cut leads to a suppression proportional to $\tau$. Given that the configuration with a resolved gluon prong is already suppressed by a power of $\alpha_s$, a further suppression with $\tau$ implies that we may ignore this term while still retaining a reasonable description of the overall behaviour. Equation (5.8) is evaluated and compared to the same distribution derived from simulations using Pythia in figure 8.
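To get a feel for the size of the fixed-coupling term $R_{\text{angle}}(\tau) = \frac{C_F \alpha_s}{2\pi} \ln^2 \tau$, the sketch below evaluates it for a few values of $\tau$. The value $\alpha_s = 0.1$ is an illustrative assumption, not the paper's choice, and the factor $e^{-R_{\text{angle}}}$ is shown under the assumption that this term simply adds to the exponent of the Sudakov factor.

```python
import math

C_F = 4.0 / 3.0  # QCD colour factor for emission from a quark


def r_angle(tau, alpha_s=0.1):
    """Fixed-coupling R_angle(tau) = C_F * alpha_s / (2 pi) * ln^2(tau).

    alpha_s = 0.1 is an illustrative assumption.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return C_F * alpha_s / (2.0 * math.pi) * math.log(tau) ** 2


def angle_suppression(tau, alpha_s=0.1):
    """exp(-R_angle), assuming R_angle adds to the Sudakov exponent."""
    return math.exp(-r_angle(tau, alpha_s))


if __name__ == "__main__":
    for tau in (0.05, 0.1, 0.3, 0.5, 1.0):
        print(f"tau={tau}: R_angle={r_angle(tau):.4f}, "
              f"exp(-R_angle)={angle_suppression(tau):.4f}")
```

The term vanishes at $\tau = 1$ and grows as $\ln^2 \tau$ towards small $\tau$, which is why it matters for a consistent treatment of double logarithms in $\tau$ even when other details of the same contribution are dropped.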
Given the accuracy of the shower and of the analytic calculations (each of which is leading-logarithmic, albeit with inclusion of some key NLL effects), one would expect the moderate level of difference that can be observed in the figure. It is nevertheless noticeable that the behaviour in $\tau$ is well captured by the analytics, especially for the un-groomed case and for pre-grooming with Soft Drop. For grooming with mMDT there is good agreement at smaller $\tau$ and a deviation at larger values of $\tau$. Here, given that the leading logarithms are single logarithms, the analytics and the shower would each only contain (at best) a correct leading-logarithmic description, but with potentially larger differences from spurious NLL effects in the shower and their interplay with $\tau$. Moreover, our neglect of configurations where a gluon is one of the resolved prongs from Y$_{\text{m}}$-Splitter would also lead to differences at larger values of $\tau$, where the power suppression with $\tau$, which was a factor in our neglecting this configuration, will be less pronounced. Neglect of such configurations may have more of an impact on the distributions where jets are pre-groomed, as they can allow the jet to be tagged even if one of the electroweak top decay products is groomed away.
Figure 8: Comparison between our analytic calculation (crosses) and Pythia for the tagged signal distribution as a function of the cut on $\tau_{32}$ for jets without pre-grooming (left), with pre-grooming using Soft Drop (centre) and with pre-grooming using the mMDT (right).
We note that Eq. (5.8) for the signal case reflects a few features that are different to the corresponding results for the QCD background. In particular, for signal jets there is a lack of soft and collinear enhancements in the pre-factor, resulting in the absence of the Hypergeometric function. Also, to our accuracy, the jet mass constraint does not affect the distribution for small enough $\tau$ cuts, or large enough $\rho_{\max}$, as a result of the fixed invariant mass of the leading-order system. This is clear from the argument of the Sudakov factor in equation (5.8), which contains a competition between the $\tau$ cut and the mass-window. For a given $\tau_{32}$ cut we can estimate the threshold below which $m_{\max}$ should be taken if varying it is to have an effect on the tagged signal fraction. For top jets, where $\min(d_{12}, d_{13}, d_{23})\, p_T^2$ may be roughly approximated by the W boson mass squared, we estimate that, for $\tau = 0.3$ and $m_t = 173$ GeV, the jet mass constraint will not significantly affect the signal efficiency unless $m_{\max} \lesssim 181$ GeV. In reality there will not be a hard threshold but some range of parameters over which the Sudakov suppression transitions from being due to the cut on $\tau_{32}$ to being due to the jet mass constraint. The application of this will be discussed further in the next section.
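The quoted value of about 181 GeV can be reproduced with a short calculation. The sketch below assumes that the threshold follows from equating the mass shift allowed by the $\tau$ cut, $\min(d_{12}, d_{13}, d_{23})\, p_T^2\, \frac{\tau}{1-\tau}$, with $m_{\max}^2 - m_t^2$, and that $\min(d_{12}, d_{13}, d_{23})\, p_T^2 \approx m_W^2$ as stated in the text. This form is inferred from the surrounding discussion rather than copied from the paper, and the value $m_W = 80.4$ GeV and the function name are assumptions made for illustration.

```python
import math


def mmax_threshold(tau_cut, m_top=173.0, m_w=80.4):
    """Estimate (in GeV) of the m_max below which the mass cut starts to matter.

    Assumed form: m_max^2 = m_top^2 + m_w^2 * tau / (1 - tau),
    with min(d_ij) * pT^2 approximated by m_w^2.
    """
    if not 0.0 < tau_cut < 1.0:
        raise ValueError("tau_cut must lie in (0, 1)")
    shift_sq = m_w ** 2 * tau_cut / (1.0 - tau_cut)
    return math.sqrt(m_top ** 2 + shift_sq)


if __name__ == "__main__":
    for tau_cut in (0.2, 0.3, 0.4):
        print(f"tau={tau_cut}: m_max threshold ~ {mmax_threshold(tau_cut):.1f} GeV")
```

For $\tau = 0.3$ this gives a threshold that rounds to 181 GeV, in line with the estimate in the text, and the threshold moves to larger masses as the $\tau$ cut is loosened.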

Exploiting jet mass cuts
In this section we discuss a notable feature of our calculations in terms of the differences between signal and background jets. As suggested by Eq. (5.11), one can reduce the cut on jet mass, $m_{\max}$, without impacting the signal until we reach a critical value depending on $\tau$. Until we reach this point, reducing $m_{\max}$ results in a decrease in the background tagging rate and hence an increase in performance. While our analytic studies are somewhat simplified and in particular neglect subleading terms, it is interesting to study the extent to which our observations may apply to parton shower studies when subleading effects are present. Figure 9 shows, using both analytic calculations (left) and parton level MC simulations (right), how the signal tagging rate varies with $m_{\max}$ for several fixed $\tau$ cuts, both without grooming and with grooming via Soft Drop and the mMDT.
For the signal distribution the overall shape and dependence on $\tau$ is well described by our calculation, although, as before, there is some difference in the overall normalisation. The difference between our calculation and the distribution derived from MC worsens for smaller values of $m_{\max}$, which should be expected, as non-perturbative effects, which cannot be completely removed from parton shower simulations, will start to play more of a role in this region. While the signal tagging rate derived from MC simulations does not flatten off to the same extent as the analytic calculations do as $m_{\max}$ is increased, it is clear that beyond a certain value of $m_{\max}$ the signal efficiency depends only very weakly on $m_{\max}$. Figure 10 shows similar plots for the case of quark jets. Our analytic predictions are again seen to be in overall good agreement with the Pythia shower, capturing the $m_{\max}$ and $\tau$ dependences. It is notable that the jet mass constraint affects the background tagging rate in the same way for any cut on $\tau_{32}$, as there are not two competing scales in the Sudakov factor. This opens up the possibility to improve the performance of the tagging procedure by reducing $m_{\max}$ so that the signal tag rate remains approximately constant whilst removing a significant portion of the background.
One may wonder, given the effectiveness of a tight cut on the jet mass, what improvement is gained by cutting on $\tau_{32}$ in these circumstances. Figure 11 also shows a curve generated by varying $m_{\max}$ over the range 173 GeV to 225 GeV, but with no cut on $\tau_{32}$. In this case the signal significance is higher than when cutting on $\tau_{32}$ with $m_{\max} = 225$ GeV, but cutting on $\tau_{32}$ with $m_{\max} = 180$ GeV is still the highest performing tagging procedure.
Figure 11: Signal significance against efficiency for three variations on the tagging procedure (varying $\tau$ with $m_{\max} = 180$ GeV; varying $\tau$ with $m_{\max} = 225$ GeV; varying $m_{\max}$ with $\tau_{32} < 1$). All jets are groomed with mMDT and tagged with Y$_{\text{m}}$-Splitter. Either $\tau$ or $m_{\max}$ is varied with a fixed cut placed on the other. The samples were produced using Pythia with hadronisation and UE activated.
We now investigate the impact of non-perturbative corrections on this tagging procedure as $m_{\max}$ is varied. Figure 12 shows the resilience [78] to non-perturbative effects, defined in terms of $\Delta\epsilon$, the difference between the parton and hadron level tagging efficiency, and $\langle\epsilon\rangle$, the mean of the two, for jets pre-groomed with mMDT, as $m_{\max}$ is varied, for three different values of $\tau$. To construct the resilience, 10 million qq events and 1 million $t\bar{t}$ events were generated at both parton and hadron level using Pythia. From figure 12 we see that the resilience to non-perturbative effects does not strongly depend on $m_{\max}$ in the range considered, even with $m_{\max}$ as low as 180 GeV. By contrast, reducing the cut on $\tau_{32}$ from 0.4 to 0.2 results in a marked drop in resilience. It would therefore be beneficial, in terms of reducing the impact of non-perturbative effects, to take $\tau$ not too small, say $\tau = 0.4$, while imposing a rather tight cut on the jet mass to provide the discriminating power. These cuts provide a signal significance of around 6 with a signal efficiency around 0.35. This is both a higher signal efficiency and a higher significance than was reported in section 3 with $m_{\max} = 225$ GeV and $\tau = 0.2$, the highest significance achieved with the higher value of $m_{\max}$.
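As an illustration of how a resilience value is built from parton and hadron level efficiencies, the sketch below assumes the commonly used form $\zeta = \left(\Delta\epsilon_S^2/\langle\epsilon\rangle_S^2 + \Delta\epsilon_B^2/\langle\epsilon\rangle_B^2\right)^{-1/2}$, with $S$ and $B$ labelling signal and background. Whether this matches the exact definition of Ref. [78] as used here is an assumption, and the efficiencies in the usage example are hypothetical numbers, not results from the paper.

```python
import math


def relative_shift_sq(eff_parton, eff_hadron):
    """(Delta eps / <eps>)^2 for one sample (signal or background)."""
    delta = eff_parton - eff_hadron
    mean = 0.5 * (eff_parton + eff_hadron)
    return (delta / mean) ** 2


def resilience(sig_parton, sig_hadron, bkg_parton, bkg_hadron):
    """Assumed form: (rel. shift_S^2 + rel. shift_B^2)^(-1/2).

    Larger values mean the tagging efficiencies change less
    between parton level and hadron level.
    """
    total = (relative_shift_sq(sig_parton, sig_hadron)
             + relative_shift_sq(bkg_parton, bkg_hadron))
    if total == 0.0:
        return math.inf  # no change at all between the two levels
    return 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical efficiencies, for illustration only.
    zeta = resilience(0.36, 0.34, 0.0035, 0.0033)
    print(f"resilience ~ {zeta:.2f}")
```

With this form, relative shifts of roughly 6% in both the signal and background efficiencies give a resilience of about 12, and the measure is symmetric under exchange of the parton and hadron level inputs.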

Conclusions
In this article we have studied top-tagging from first principles of QCD, as part of a larger program to understand the features of tagging and grooming methods in a model-independent fashion. We chose a combination of methods, starting from the application of a prong-finding step aimed at tagging three-pronged decays and rejecting background, followed by a radiation constraining step. We also pre-groom jets with both the mMDT and with Soft Drop with $\beta = 2$ to reduce non-perturbative contributions. For prong finding we have used Y$_{\text{m}}$-Splitter, an adaptation of Y-Splitter introduced in Ref. [32], while as a radiation constraining shape variable we have applied the N-subjettiness ratio $\tau_{32}$ with $\beta = 2$. While our specific choices (use of Y$_{\text{m}}$-Splitter for prong-finding and $\beta = 2$ for $\tau_{32}$) are helpful in somewhat simplifying analytical studies, combinations similar to the ones used here have commonly been employed, including for experimental studies involving top-tagging [56][57][58].
We started by carrying out Monte Carlo studies which provided some of the motivation for what followed in terms of yielding information on performance, resilience to non-perturbative effects, and optimal parameter choices for our combination of methods. Next we turned to studying QCD background jets. Here we have built on previous work [36] on understanding top-tagging, and in particular Y$_{\text{m}}$-Splitter, by including the constraint from $\tau_{32}$. We have derived results for the double differential distribution in jet mass and $\tau_{32}$ as well as for the cumulant where we integrate over $\tau_{32}$ with the condition $\tau_{32} < \tau$. We obtained a result in the limit of small $\tau$ and then included finite $\tau$ corrections along similar lines to the studies in Ref. [37]. We also performed studies both with and without pre-grooming with mMDT and Soft Drop ($\beta = 2$). We compared our results to those from both Herwig and Pythia showers and in all cases we saw that our analytical calculations are able to capture the essential impact of the tagging, shape-variable and grooming steps.
We then turned to studying signal jets. We found that in the highly boosted limit a mass window constraint gives rise to a simple Sudakov form factor which is in good agreement with Pythia results. We then added Y$_{\text{m}}$-Splitter as in Ref. [36], but improved upon previous calculations by also considering a situation, at order $\alpha_s$, where a soft gluon can be one of the prongs resolved by Y$_{\text{m}}$-Splitter. Including this contribution we found the results to be in significantly better agreement with Pythia than was the case with previous results where such a correction was not considered [36]. We then studied the impact of a $\tau$ cut, including finite $\tau$ effects and also considering pre-grooming with mMDT and Soft Drop with $\beta = 2$. In all cases, in spite of the complexity of the problem, our simplifying approximations were sufficient to capture the basic behaviour, i.e. the $\tau$ dependence over a wide range in $\tau$, seen also with parton showers. Remaining differences with parton showers were at a level that was consistent with our expectations from missing subleading terms. One immediately exploitable outcome of our analytic results was the suggestion that using a tighter mass cut than our default choice (for a given $\tau$) would reduce the background rather than the signal while not significantly affecting the resilience to non-perturbative effects. This finding was used to show how a highly performant and resilient method could be developed using our combination of tools.
Finally, although the combination of methods we have considered gives rise to a highly non-trivial observable, we have demonstrated that analytical methods can still give substantial insight into the basic physics mechanisms that control the performance of such tool combinations. Further systematic improvements on the results we have obtained are possible, with the inclusion of subleading logarithmic terms being one avenue to pursue. Also, while our specific choice of tools is helpful for analytical studies and was taken mainly for convenience, combinations similar to the ones here have been in widespread use, and have not been analytically understood thus far. We believe that our studies should therefore encourage analytical investigations of other similar combinations, including for example variants using $\tau_{32}^{(\beta=1)}$, and help develop a more complete picture of the distinct role played by the different elements and/or steps that form part of a number of top tagging methods.
limits as in section 4.3.2. Enforcing the condition $\rho_b > \rho_{\min}$, which is embodied in $\Theta_{\rho_{\min}}$, gives an upper limit on $\rho_a$ of $\rho - \frac{\rho_{\min}}{1-\tau}$, which, within our accuracy, we can approximate as $\rho$. To carry out this integral within single logarithmic accuracy, we can expand the radiator about some fixed $\rho_b$, which we take as $(\rho - \rho_a)(1 - \tau_0)$, where $\tau_0$ should be chosen close to $\tau$, as values of $\rho_b$ close to $(\rho - \rho_a)(1 - \tau)$ are expected to dominate the integral. The integral can then be carried out to give Eq. (7.12), in perfect agreement with Eq. (4.16). Although less convenient for making contact with the result reported in Eq. (4.16), we could equally well have integrated over $\rho_a$, leaving the $\rho_b$ integral to be done numerically, as we could have done in section 4.3.1.
To do this one would expand $R(\rho - \rho_a - \rho_b)$ around $\rho_a = \rho - \frac{\rho_b}{1-\tau}$, which would lead to Eq. (7.13), where again $\tau_0$ should be taken close to $\tau$, and any $\tau$ dependence in the leading order prefactor has been neglected.