Framing energetic top-quark pair production at the LHC

Top-quark pair production is central to many facets of LHC physics. At leading order, the top and anti-top are produced in a back-to-back topology; however, this topology accounts for only a minority of $t \bar t$ events with TeV-scale momentum transfer. The remaining events instead involve the splitting of an initial or final-state gluon to $t \bar t$. We provide simple quantitative arguments that explain why this is the case and examine the interplay between different topologies and a range of variables that characterise the event hardness. We then develop a method to classify the topologies of individual events and use it to illustrate our findings in the context of simulated events, using both top partons and suitably defined fiducial tops. For events with large $t \bar t$ invariant mass, we comment on additional features that have important experimental and theoretical implications.


Introduction
Top quarks are among the most central objects in collider physics today. As the heaviest known fundamental particle, the top quark has a unique place in the standard model (SM): it is the only particle with a Yukawa coupling close to unity, making it a key player in many beyond-standard-model (BSM) scenarios and in discussions of the stability of the SM vacuum [1]. Top-quark pair production is an increasingly important input to fits of parton distribution functions (PDFs) [2][3][4][5][6][7][8]. It plays a crucial role in effective field theory (EFT) studies [9][10][11][12][13] and provides an avenue to learning about the top-quark Higgs Yukawa coupling [14]. More generally, top quarks are omnipresent in BSM searches, both as signal objects and as backgrounds (e.g. as a source of leptons and b quarks).
Early studies of top-quark production at Fermilab's Tevatron and CERN's Large Hadron Collider (LHC) were restricted to configurations where the top quarks had a transverse momentum that was comparable to the top-quark mass. Today, at each of the ATLAS and CMS collision points, the LHC has produced over a hundred thousand events with top quarks having a transverse momentum p T > 500 GeV and a couple of thousand events with p T > 1 TeV (all decay channels combined), with those numbers expected to increase by a factor of 25−30 at the high-luminosity LHC [15]. These large numbers of events provide an opportunity for percent-level precision across a wide kinematic range, with corresponding experimental measurements well underway [16][17][18][19][20][21]. This measurement programme will benefit a wide range of physics areas.
To reap these benefits, it will be necessary to achieve a corresponding percent-level understanding of theoretical predictions, and one may expect to leverage the impressive ongoing progress in perturbative calculations (for a review see Ref. [22]). The reliability of this approach depends, however, on the assumption that the perturbative series is well behaved. A prerequisite is that the event topology that appears at LO, i.e. a 2 → 2 process with a back-to-back top pair, dominates over topologies that first arise at higher orders, for example a boosted tt system recoiling against a hard jet. As we shall see, the extent to which this is true depends on the choice of observable used to characterise the event hardness. For some choices the hierarchy holds as expected, but for other widely used choices of event-hardness scale it does not (e.g. for any observable that sums the transverse momenta of both the jets and the top quarks). Accordingly, it is crucial to develop an understanding of the interplay between various widely used measures of the event hardness and the underlying topology of tt production. This is one of the main goals of this manuscript.
There are multiple potential benefits to having this understanding. For applications where precision is crucial, it can inform the choice of measurement observables. More generally, we will develop an approach for identifying the topology of any given event, so as to be able to separate out subsets of events that may probe different underlying physics in the specific application being considered, whether a PDF fit, an EFT fit or a BSM search.
This article is structured as follows. In Section 2, we examine a range of variables used to characterise the hardness of tt events, examine the event topologies that can be produced at leading order (LO) and next-to-leading order (NLO), give a simple analysis of our expectations for their relative sizes, and finally consider the interplay between these topologies and the event-hardness variables. In Section 3 we introduce a procedure for identifying the topologies of individual events given the top quarks and a list of the other event particles, apply the procedure to simulated tt events, and compare the results to the expectations of Section 2. In Section 4 we combine this analysis with a fiducial reconstructed top-quark definition designed to identify top-quark candidates from the full set of final-state particles across a wide range of transverse momenta. In Section 5, we conclude with a discussion of the implications of our results. Throughout this paper we will consider the semi-leptonic tt → bb µν_µ jj decay channel, though the arguments that we make apply generally to any tt decay channel.

Table 1: Variables that may be used to characterise a hard kinematic scale in events with a semi-leptonically decaying tt pair. All observables within a given group are identical to each other at leading order. The j_t,i jets correspond to R = 0.4 non-top jets, while the J_i jets correspond to large-R jets (whose clustering inputs include the top quarks). Further details about the jet finding are given in Sections 3 and 4.

Theoretical considerations
In this section we review various event-hardness measures, and discuss some basic expectations about their behaviour for events that involve large momentum-transfer and contain a tt pair.

Event hardness variables and their leading-order behaviour
We start by examining measures to characterise large momentum transfer in tt events, i.e. the event hardness, including a discussion of their leading-order distributions. A wide variety of such measures is used in the literature and we provide an illustrative selection of them in Table 1, organised into groups that are identical at LO, i.e. for events that consist of just a single back-to-back tt pair.
The first set of observables simply measures the top-quark transverse momentum. They differ only in terms of which top quark is used, which is why they are identical at LO (order α_s²). We also have an all-order relation between some of the observables, specifically

\[ \frac{d\sigma}{dp_T^{\rm top,had}} \;=\; \frac{d\sigma}{dp_T^{\rm top,lep}} \,. \]

Note that we have chosen to define the "max/min" based on the value of m_T rather than p_T, but results would be essentially unchanged if we instead used p_T. In the LO calculations that we report below, it will be convenient to have the shorthand p_{T,t} for the transverse momentum of either of the top quarks. The next set of observables provides measures of the hard scale of the event that at LO include the top-quark mass and transverse momentum. The H_T^{tt} variable is identical to the H_T of Czakon, Heymes and Mitov [23] and of Catani et al. [24]. The H_T^{tt+jets} observable provides a democratic evaluation of the event hardness across all objects. At high scales it is very similar to the m_eff variable used in supersymmetry searches, which is the scalar sum of the transverse momenta of all jets, leptons and missing momenta, see e.g. Ref. [25] (footnote 2). The m_T^{J,avg} quantity is based on large-R jets, with the details of the jet finding discussed in more detail below. It is not part of the standard set of tt event-hardness measures, but we include it here because it gives a faithful reflection of the hardness of the main 2 → 2 scattering in the event, regardless of the precise role played by the top quarks in that scattering.
At leading order, in the limit where p_{T,t} grows much larger than the top-quark mass (m_top), all three observables in this group become identical to the p_{T,t}-type observables of the first group. Structurally, for p_{T,t} ≫ m_top but still significantly smaller than the collider centre-of-mass energy √s, the LO cross section is given schematically by

\[ \frac{d\sigma}{dp_{T,t}^2} \;\simeq\; \frac{\alpha_s^2}{p_{T,t}^4}\left[ c_{gg}\, L_{gg} + c_{q\bar q}\, L_{q\bar q} \right] . \]

Here the c_gg and c_qq̄ are numerical constants, of the order of 0.1, which depend on how steeply the PDFs fall with increasing x. They are discussed in Appendix A.2, together with our specific definition of the partonic luminosities L_gg and L_qq̄. Since this is a LO calculation, for simplicity we have left out explicit renormalisation and factorisation scale dependence. The next observable in Table 1 is ½m_tt; its LO distribution, Eq. (2.3), involves a logarithmic factor multiplying the gluon-gluon luminosity. One can understand the origin of this logarithm by considering fixed m_tt and examining the distribution of ∆y_tt, the difference in rapidity between the top and anti-top. At large ∆y_tt, the gluon-fusion contribution is dominated by a t-channel top-quark exchange diagram and the cross section becomes a constant, independent of ∆y_tt. Integrating over ∆y_tt up to its kinematic limit then yields the logarithmic factor seen in Eq. (2.3), cf. Appendix A.1. The large-∆y_tt enhancement of the gluon-fusion versus qq̄ contributions provides a potentially powerful handle for separately constraining the qq̄ and gg luminosities in PDF fits.
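The kinematic origin of this logarithmic factor can be sketched in a few lines (a schematic argument; the precise treatment is the subject of Appendix A.1). At LO the two tops have equal transverse momenta, so m_tt² = 2m_top² + 2p_{T,t}² + 2m_{T,t}² cosh ∆y_tt, and at large ∆y_tt one has m_{T,t} ≈ m_tt e^{−∆y_tt/2}; the kinematic boundary, where m_{T,t} → m_top, therefore sits at ∆y_max ≈ 2 ln(m_tt/m_top):

```latex
% Schematic origin of the ln(m_tt/m_top) factor in Eq. (2.3):
% an (approximately) flat distribution in Delta y_tt, integrated
% up to the kinematic boundary.
\frac{d\sigma}{dm_{t\bar t}}
  \;\sim\; \int_0^{\Delta y_{\max}} \! d\Delta y_{t\bar t}\,
      \frac{d\sigma}{dm_{t\bar t}\, d\Delta y_{t\bar t}}
  \;\propto\; \Delta y_{\max}
  \;\simeq\; 2\ln\frac{m_{t\bar t}}{m_{\rm top}}
```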
Note that at the large ∆y_tt values that dominate the gluon-fusion contribution to Eq. (2.3), the top-quark transverse momentum p_{T,t} is much smaller than its ∆y_tt = 0 value of approximately m_tt/2. This is why one should be wary of using m_tt as a renormalisation and factorisation scale for calculating the m_tt distribution, and would expect the use of µ = m_tt to lead to poor stability, as observed in Ref. [23] (footnote 3). In practice, once one uses a dynamical scale, the LO result is no longer flat in ∆y_tt, but is sensitive to the varying scale of both α_s and the PDF, and typically results in a ∆y_tt distribution that is peaked at large ∆y_tt (footnote 4). Some plots illustrating these points are given at LO in Appendix A and at NLO in Section 3.2 and Appendix B.
The importance of large ∆y_tt values makes the m_tt observable subtle both theoretically and experimentally. The theoretical subtleties are discussed briefly in Appendix B and have two main facets: firstly, for t-channel top-quark exchange a rich structure of logarithmically enhanced terms, α_s^n ln^m(m_tt/m_top), develops beyond LO. Secondly, contributions from 4-top final states with t-channel gluon exchange, and EW bb → tt diagrams with t-channel W exchange, both scale as 1/(m_top² m_tt²) rather than 1/(m_tt⁴). For sufficiently large m_tt values they dominate over other contributions. At the LHC, they bring only a small contribution, because of suppression by phase space and couplings, but this would no longer be the case at a 100 TeV pp collider.
Experimentally there are also at least two facets to the subtleties for the m tt observable. Firstly, large ∆y tt implies large rapidities for at least one of the two tops, which may then be beyond the detector acceptance, notably for b-tagging and lepton identification. Secondly, the spread of p T,t values across the range of ∆y tt adds the complication that standard measurement approaches, which use either resolved or boosted top reconstruction techniques, cannot reconstruct tops across the whole range of ∆y tt . We will return to this issue below.
The last two observables that we show in Table 1, p tt T and p j t,1 T , are identically zero at order α 2 s . In the absence of p T and rapidity acceptance cuts for jets, the two observables are identical at order α 3 s , as long as one has a concrete scheme that separates the top decay products from other event particles.

Topologies beyond LO
Having reviewed the key characteristics of the LO distributions, we can now turn to the main topic of this paper, the question of topologies beyond LO and the interplay between topologies and event hardness scales.
The topologies that we will consider are illustrated in Fig. 1 and will be familiar to some readers from older discussions of b-quark production. The flavour creation (FCR) configuration is the dominant mechanism for top production at low transverse momentum. It is the only topology that is present at leading order (LO) in a strong-coupling perturbative expansion. In flavour excitation (FEX), a tt pair can be produced by an initial-state splitting, with one of the pair undergoing a large momentum-transfer scattering with a light parton. Gluon splitting (GSP) involves production of a tt pair during jet fragmentation. Both FEX and GSP start at next-to-leading order (NLO). Finally, some events do not readily fall into any of these categories, for example two high-transverse-momentum light-flavour jets plus a (relatively) soft additional gluon that splits to tt. These arise only at NNLO and beyond. Relative to LO, the FEX and GSP topologies involve a factor α_s ln(p_T/m_top), where p_T is generally the transverse momentum of the hardest object in the event. The ln(p_T/m_top) factor that arises at the LHC is typically not large: e.g. for p_T ∼ 1 TeV, it is of the order of 2, which would not be expected to compensate for the extra power of α_s relative to LO, and one might expect FEX and GSP to be small compared to FCR (footnote 5). As we shall see, this intuition misses important considerations. To help understand this, Table 2 shows the different factors that come into the calculation of the cross section for the FCR, FEX and GSP topologies. We consider a 2 → 2 hard-scattering energy of 2 TeV and take the case of 90-degree scattering in the centre of mass, which dominates high-p_T production. This corresponds to each outgoing object from the 2 → 2 scattering having a transverse momentum of 1 TeV and identical rapidity.

Footnote 3: Similar considerations apply to the measurement of the running top-quark mass as a function of m_tt [26]. Indeed, at high m_tt the sensitivity of the cross section to the top-quark mass will come predominantly from the region close to the kinematic limit of ∆y_tt, where the largest virtuality of the produced or exchanged top quarks is much closer to m_top than to m_tt.
Footnote 4: This feature was observed numerically in the context of FCC-hh studies in section 12.3 of Ref. [27].
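As a quick numerical check of the size of this factor (using the values p_T = 1 TeV, m_top = 173 GeV and α_s(1 TeV) = 0.089 that appear elsewhere in this section):

```python
import math

p_T = 1000.0     # GeV, transverse momentum of the hardest object
m_top = 173.0    # GeV
alpha_s = 0.089  # alpha_s(1 TeV), value quoted in the text

log_factor = math.log(p_T / m_top)  # ln(p_T/m_top)
nlo_factor = alpha_s * log_factor   # relative alpha_s ln(p_T/m_top) factor

print(f"ln(p_T/m_top) = {log_factor:.2f}")   # ~1.75, i.e. "of the order of 2"
print(f"alpha_s * ln  = {nlo_factor:.3f}")   # ~0.16: naively a small correction
```

This makes the naive expectation quantitative: an α_s ln(p_T/m_top) ≈ 0.16 suppression would suggest FEX and GSP are small, which is precisely the intuition that the rest of this section shows to be incomplete.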
The first point that we highlight is that the underlying 2 → 2 matrix elements for the FCR process are an order of magnitude smaller than for FEX and GSP. To illustrate the origin of this analytically in one simple case, consider 90° scattering in the limit p_T ≫ m_top, and compare for example the squared matrix element relevant for the q_i q̄_i → tt channel of FCR (cf. [34] or [35]),

\[ \overline{|M|^2}_{q_i \bar q_i \to t\bar t} = \frac{4}{9}\,\frac{\hat t^2 + \hat u^2}{\hat s^2}\,, \qquad (2.5) \]

to that involved in the qt → qt channel of FEX,

\[ \overline{|M|^2}_{qt \to qt} = \frac{4}{9}\,\frac{\hat s^2 + \hat u^2}{\hat t^2}\,. \qquad (2.6) \]

Table 2: Factors contributing to the top-production cross section for a variety of partonic scattering channels. In each case the 2 → 2 squared matrix element (|ME|², with a g⁴ = (4πα_s)² factor stripped off as in Eqs. (2.5), (2.6)) is given in the massless limit (valid when p_T ≫ m_t), for 90° scattering in the partonic centre-of-mass frame. The partonic luminosities, defined as in Eq. (A.2), are given for a proton-proton centre-of-mass energy of √s = 13 TeV and for producing a partonic system mass of √ŝ = 2 TeV. We set the factorisation scale to µ = 1 TeV. Σ denotes a sum over all (non-top) quark and antiquark flavours. The luminosities have been evaluated with the PDF4LHC15_nnlo_mc [28] set, re-evolved in a six-flavour scheme with HOPPET [29] using NNLO splitting and threshold-matching functions [30][31][32][33]. The final-state splitting probability P_{g→tt} is obtained using Eq. (2.9). The results in the final column are to be taken as order-of-magnitude estimates, illustrating the commensurate sizes of different channels.
The Mandelstam invariants are ŝ = 4p_T² and t̂ = û = −2p_T², and as a result the FEX channel has a squared matrix element that is ten times larger than the FCR channel.
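This factor of ten can be verified numerically from the standard massless 2 → 2 expressions for these two channels (the overall g⁴ normalisation cancels in the ratio):

```python
# 90-degree scattering: Mandelstam invariants in units of p_T^2
s_hat, t_hat, u_hat = 4.0, -2.0, -2.0

# |ME|^2 / g^4 for q qbar -> t tbar (s-channel gluon), massless limit
me2_fcr = (4.0 / 9.0) * (t_hat**2 + u_hat**2) / s_hat**2   # = 2/9

# |ME|^2 / g^4 for q t -> q t (t-channel gluon exchange), massless limit
me2_fex = (4.0 / 9.0) * (s_hat**2 + u_hat**2) / t_hat**2   # = 20/9

print(me2_fcr, me2_fex, me2_fex / me2_fcr)  # ratio = 10
```

The t-channel pole of the FEX matrix element is the source of the enhancement: t̂² in the denominator is four times smaller than ŝ², while the numerator is 2.5 times larger.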
A second factor that is relevant is the partonic luminosity. For the FEX channels, the incoming top is produced by an initial-state g → tt splitting, so ultimately the cross section is driven by the gg and gΣ luminosities, where Σ is the sum of all light (anti-)flavours. The top-quark luminosity then involves a factor α_s ln(p_T/m_top), which gives a smaller luminosity than either the gg or q_i q̄_i luminosities that were relevant for the FCR case. Ultimately the larger matrix element compensates for the reduced luminosities and the FEX process has a cross section that is comparable to that for FCR.
A similar set of features emerges for the GSP case. Here the α_s ln(p_T/m_top) factor appears for a final-state splitting rather than an initial-state one. It is straightforward to use massive splitting functions [36] to evaluate the leading-order probability P_{g→tt} for g → tt splitting with the tt pair separated by a distance ∆R_tt < R, where ∆R_tt² = (y_t − y_t̄)² + (φ_t − φ_t̄)², and y_t and φ_t are respectively the rapidity and azimuth of the top. For a gluon transverse momentum of p_T, and with the conditions p_T R ≫ m_top and R ≪ 1, the result takes a compact asymptotic form. In practice, the regime of p_T = 1 TeV is not sufficiently asymptotic for this expression to hold, as one can see by substituting R = 1 and observing that the result is negative. To obtain a more reliable estimate, we maintain the conditions p_T ≫ m_top and R ≪ 1, but relax the constraint on p_T R/m_top. The resulting expression is a little cumbersome (footnote 6), but the parametrisation of Eq. (2.9) reproduces the correct result to better than 1% for all relevant values. In other small-R calculations, corrections associated with finite values of R have often been found to go as R² with a small coefficient [37]. At this stage, there is some freedom in the R value that we choose in order to define the gluon splitting. Insofar as we are interested mainly in an order-of-magnitude estimate of g → tt, we evaluate Eq. (2.9) with R = 1, ignoring potential R²-suppressed corrections. Substituting p_T = 1 TeV, m_top = 173 GeV and α_s(1 TeV) = 0.089, this yields the result for P_{g→tt} shown in Table 2. We see that, like FEX, the GSP topology is also comparable to the FCR one.
Were we to consider significantly harder events (e.g. at a 100 TeV collider) or b-quarks instead of top quarks, the logarithmic factors would start to become large, further enhancing the FEX and GSP contributions relative to FCR. This is consistent with earlier findings of large relative FEX and GSP contributions to high-p_T b-jet production [38].
The analysis shown in Table 2 is not intended to give precise predictions for the relative sizes of different topologies. Nevertheless it shows that, despite being suppressed by a power of α_s, the (NLO) FEX and GSP topologies are numerically comparable to the LO FCR topology (footnote 7). By framing the discussion in terms of an asymptotic limit where p_T ≫ m_top, we avoided providing a rigorous definition of the FCR, FEX and GSP topologies. If one wishes to study actual events, whether in fixed-order QCD or at particle level in experiments, a precise definition becomes necessary. This will be the topic of Section 3.

Interplay between topologies and hardness characterisation variable
Before turning to detailed topology definitions, we discuss the interplay between the topologies of Fig. 1 and the event-hardness variables of Table 1. While all three main topologies have comparable cross sections for comparable hardness of the underlying 2 → 2 scattering, their relative contributions to the differential distribution of a specific event-hardness variable depend significantly on the choice of variable. A key principle to remember in the discussion is that each of the topologies has a cross section that falls steeply as a function of the underlying 2 → 2 transverse momentum p_T^{2→2}, say as ∼ 1/(p_T^{2→2})^k with some positive power k. Consider a specific value V of a given hardness variable. If p_T^{2→2} is significantly larger than V in some topology, its contribution to the bin around V will be suppressed relative to another topology for which p_T^{2→2} is comparable to V. Equivalently, a topology where V ends up being significantly smaller than p_T^{2→2} will be suppressed relative to a topology where V is similar to p_T^{2→2}. On this basis we can work out which topologies will be relevant for which hardness variable, and the conclusions are summarised in Table 3.

Footnote 6: Starting from the massive splitting function, one introduces µ = m_top/(p_{T,t} ∆R) (where ∆R is the separation between the top and anti-top), performs the logarithmic integral over µ and, after rewriting the expression in terms of y, the solution of µ² = −y²/(1 − y)⁴, integrates over z to obtain the result.
Footnote 7: This finding is reminiscent of the observation of giant K-factors discussed for example in Ref. [39] for vector-boson plus jet production, though the K-factors in the tt case are less extreme.
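This suppression principle can be made concrete with a toy numeric (the spectrum ∝ 1/(p_T^{2→2})^k is taken from the text; the value k = 5 below is purely illustrative, not a number from the text): a topology for which the hardness variable only reaches V = p_T^{2→2}/2 must draw on 2 → 2 scatterings at twice the scale, and is therefore suppressed by 2^{−k}.

```python
def topology_weight(p_over_V, k):
    """Relative suppression of a topology whose underlying 2->2 scale must be
    p_over_V times larger than the hardness-variable value V, for a spectrum
    falling as 1/(p_T^{2->2})^k."""
    return float(p_over_V) ** (-k)

# Illustrative k = 5: a topology needing p_T^{2->2} = 2V is suppressed by 1/32
print(topology_weight(2.0, 5))  # 0.03125
print(topology_weight(1.0, 5))  # 1.0: no suppression when V ~ p_T^{2->2}
```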

Specifically, we see that the first group of hardness variables in Tables 1 and 3, the p_T^{top} set of variables, splits into two sub-groups. The first three variables, p_T^{top,had}, p_T^{top,lep} and p_T^{top,max}, share the characteristic that they can be commensurate with p_T^{2→2} if at least one of the two tops is hard. Therefore we expect the distributions of these variables to receive significant contributions from the FCR and FEX topologies (footnote 8), but not from GSP (because neither of the tops carries the full p_T of the underlying hard process). In contrast, for p_T^{top,min} and p_T^{top,avg} to be commensurate with p_T^{2→2}, both tops need to be hard, and so we expect significant contributions mainly from FCR.
The next set of variables in Tables 1 and 3 also splits into two groups. The H_T^{tt+jets} and m_T^{J,avg} variables are commensurate with p_T^{2→2} regardless of the underlying topology, and so we expect contributions from FCR, FEX and GSP; for H_T^{tt}, in contrast, the FEX and GSP topologies yield values that fall short of the FCR value at the same p_T^{2→2}, and we expect FCR to dominate.
The ½m_tt variable is special, as discussed in Section 2.1 and Appendix B. We do not expect significant FEX or GSP contributions associated with NLO matrix elements that are larger than the LO one, and on that basis would expect it to be dominated by FCR. However, if log(m_tt/m_top) becomes large enough, the discussion of Appendix B implies that the largest log-enhanced contributions (e.g. α_s³ ln³(m_tt/m_top) terms) would include FEX-like topologies. At LHC energies, logarithms are not yet quite large enough to override the main perturbative hierarchy, and accordingly we remain with the expectation that the m_tt distribution should mainly involve the FCR topology.
The last two variables that we consider are p tt T and p j t,1 T , which are identical at NLO. They are commensurate with p 2→2 T only for the topologies with a hard non-top jet, i.e. for the FEX and GSP topologies.
The issue of the relevant topologies is not the only aspect that contributes to the size of the final cross sections for different hardness variables. We will comment on other aspects as they arise in the sections below.

Parton-level (truth-top) analysis
If we are to explore the relevance of different topologies in actual (simulated or experimentally observed) tt events, it becomes necessary to develop techniques to identify the tt event topologies based on the momenta of the top quarks and the other event particles. Such techniques need to be applicable even outside the asymptotically high-scale limit that formed the conceptual basis of the discussion in Section 2.
As a first step, we imagine a situation where we have full kinematic information about the top and anti-top quarks and that we can separate out all particles that do not stem from the tt decays. In Section 3.1 we outline a simple algorithm to assign a classification of the topology for any given event. Then in Section 3.2 we apply the algorithm to simulated parton-level events and compare the results to the expectations from Section 2.

Identification of topologies with identified tops
We consider a procedure for events with exactly one tt pair, and follow a two-stage reconstruction procedure, set out as Algorithm 1.
In the first stage we obtain R = 0.4 jets from all objects other than the top quarks. This is the output of step 2 of the algorithm. The R = 0.4 jet radius ensures that low-momentum particles from the underlying event (UE) and pileup do not too significantly affect the momenta of genuinely hard jets. The p_T,min cut on the jets ensures that jets composed primarily of low-momentum particles from UE and pileup do not significantly affect variables that sum over multiple jets, such as H_T^{tt+jets}. In this section we imagine a perfect detector, and apply no rapidity acceptance cuts, neither on the top quarks nor on the jets.

Algorithm 1 Event analysis algorithm, given t,t partons, and other partons
Require: two undecayed tops, t, t̄, and the set of partons not from top decay, {P_t}
1: Cluster the non-top partons {P_t} with the anti-k_t algorithm [40], using a jet radius of R = 0.4.
2: Apply a transverse momentum threshold p_T,min to obtain the set of non-top R = 0.4 jets, {j_t}, ordered in decreasing p_T.
3: Cluster the set {j_t, t, t̄} with a jet algorithm with radius of order 1. Here we take the anti-k_t algorithm with radius R_J = 1. Refer to the resulting set of large-R jets as {J}, sorted in order of decreasing m_T².

Step 3 of the algorithm then clusters the top quarks and the R = 0.4 jets together using a jet radius of 1. When there are jets or top quarks at high transverse momentum, this step effectively treats them democratically, reflecting a view that the top quarks are akin to light partons and should be included in the clustering on the same footing as other partons. Taking R ∼ 1 is the natural choice for separating initial-state and final-state perturbative-QCD radiation [41]. The use of R = 0.4 jets (with a p_T,min threshold) as the input to the large-R clustering ensures that the large-R jets are kept relatively free of underlying-event and pileup contamination, in the same way as for observables such as H_T^{tt+jets} that sum over multiple jets. This clustering of smaller-radius jets into a larger-radius system is reminiscent of CMS's radiation-recovery procedure in dijet resonance searches [42] and bears similarities also to the use of filtering [43] or trimming [44] with large-R jets for resonance reconstruction in Ref. [45].
Sorting the large-R jets {J} in order of decreasing m_T² ensures that for low-p_T events, the first two large-R jets always contain the top quarks. At large p_T, the difference between m_T and p_T ordering is immaterial.
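The m_T ordering can be sketched as follows (a minimal illustration with hypothetical four-momenta; in the algorithm itself, the jets come from the clustering of step 3). It uses the identity m_T² = m² + p_T² = E² − p_z²:

```python
def m_T2(jet):
    """Squared transverse mass m^2 + p_T^2 = E^2 - p_z^2 of a (px, py, pz, E) tuple."""
    px, py, pz, E = jet
    return E * E - pz * pz

# hypothetical large-R jets as (px, py, pz, E) tuples, in GeV
jets = [
    (50.0, 0.0, 400.0, 440.0),   # forward jet: modest m_T despite large momentum
    (300.0, 10.0, 50.0, 350.0),  # central jet: large m_T
]
jets_sorted = sorted(jets, key=m_T2, reverse=True)
print([round(m_T2(j)) for j in jets_sorted])  # [120000, 33600]
```

The forward jet illustrates why m_T ordering is used: despite its larger total momentum, its small transverse mass correctly ranks it below the central, genuinely hard jet.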
Given the output of Algorithm 1 for a specific event, we then identify the topology as follows:
• If each of J 1 and J 2 contains one top (anti-)quark, we declare the event topology to be FCR.
• If one of J 1 and J 2 contains a single top, and the other does not contain a top, we declare the event topology to be FEX.
• If one of J 1 and J 2 contains both tops, we declare the event topology to be GSP.
• Otherwise, we define the event topology as "other".
One advantage of this simple approach is that it is straightforward to implement and gives a definite answer about the topology of each event.
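The classification rules above translate directly into code. A minimal sketch (the clustering of Algorithm 1 is assumed to have been performed already, e.g. with FastJet; the inputs here are simply the number of top (anti)quarks clustered into each of the two leading large-R jets J1 and J2):

```python
def classify_topology(n_tops_J1, n_tops_J2):
    """Classify a tt event from the number of top (anti)quarks contained in
    each of the two highest-m_T large-R jets, following the rules in the text."""
    counts = sorted((n_tops_J1, n_tops_J2))
    if counts == [1, 1]:
        return "FCR"    # one top in each of J1 and J2
    if counts == [0, 1]:
        return "FEX"    # a single top in one jet, none in the other
    if counts == [0, 2]:
        return "GSP"    # both tops inside a single large-R jet
    return "other"      # e.g. neither leading jet contains a top

print(classify_topology(1, 1))  # FCR
print(classify_topology(0, 2))  # GSP
print(classify_topology(1, 0))  # FEX
```

Since the event contains exactly two top (anti)quarks, these four branches are exhaustive, which is what guarantees the definite per-event answer noted above.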

Results
Let us now examine what happens when we analyse tt events according to the procedure outlined so far. We will consider events at a pp centre-of-mass energy of √ s = 13 TeV, simulated with the hvq process [46] of POWHEG Box v2 [47], revision 3660, using the PDF4LHC15 nnlo mc PDF set [28] via LHAPDF [48], and a top mass of 173 GeV, interfaced to Pythia 8.240 [49], Monash13 tune [50], with multiple interactions turned off. All jet clustering is performed with FastJet [51], version 3.3.2. In the future it would also be interesting to carry out similar studies to NNLO accuracy [52][53][54], for example taking advantage of recent developments in combining parton showers and tt production at NNLO [55]. As inputs to Algorithm 1, we take the truth tops from the event record, together with all partons not coming from the top decay. Since the hvq process is NLO for tt production, we expect to have FCR topologies accurate to NLO, and FEX and GSP accurate to LO, while other topologies are at best generated by the shower, so do not have any formal perturbative accuracy. For observables and topologies that start only at NLO (α 3 s ), we have cross-checked the hvq results against the NLO calculation for tt + jet (i.e. up to α 4 s ) in its POWHEG implementation [56] and found good agreement.
In this section, even though we consider top quarks before their decay, we will still label one of them as leptonic and the other as hadronic. Explicit cross sections will include the branching ratio for one top to decay muonically and the other hadronically.
For each of the event-hardness characterisation scales from Table 1, Figure 2 shows the fractions of FCR, FEX and GSP, as a function of the hardness scale (other topologies are negligible). The expectations from Table 3 for the dominant topologies at high scales are given on each plot. Those expectations are all well borne out for sufficiently hard events, V ≳ 2m_top, and it is striking to what extent different event-hardness characterisation scale choices lead to very different proportions of the various topologies. In particular, the simple analysis of Table 2, which suggested that all three topologies could potentially be of the same size, is clearly reflected in the plots for the two hardness scales that are insensitive to the particular topology, H_T^{tt+jets} and m_T^{J,avg}. Within groups of observables that have the same dominant topologies according to Table 3, there remain some differences. To explore them, we consider two further variables: z, which measures the hardness of the softer top relative to the underlying 2 → 2 event hardness, and ∆φ_tt, the azimuthal distance between the two top quarks. In FCR topologies we expect the softer top to balance against the harder top (z = 1) and to be back-to-back in azimuth (∆φ_tt = π). In FEX topologies, because the softer top is produced through initial-state radiation, we expect z to take on predominantly low values and ∆φ_tt to be spread out between 0 and π. In GSP topologies, where the softer top is in the same jet as the harder top, we expect z ≲ 1/2 and ∆φ_tt < R_J = 1. These features are broadly observed in the plots, though for the finite values of m_T^{J,avg} that we use, the exact limits on z are affected by the contributions of the top-quark mass to the variables that enter its definition.
Figure 2: Fractions of the different event topologies (cf. Fig. 1), as a function of the variable used to characterise the hardness of the event, cf. Table 1. The expectations are those shown in Table 3.

Let us now apply the understanding that we have obtained to investigate differential cross sections that are commonly studied experimentally. Fig. 4 shows differential cross sections for a subset of observables, choosing at least one from each of the groupings of Table 1. The left-hand plot shows the results without any topological classification. Among the features of the plot that are surprising at first sight is that the p_T^{tt} distribution, which starts at α_s³, is larger at high scales than the p_T^{top,lep} distribution, which starts at α_s². Based on the analysis of Section 2, this is however not a surprise, because of large FEX and GSP contributions to the p_T^{tt} distribution. If one considers only events with an FCR topology, as done in the right-hand plot, the p_T^{tt} distribution ends up being substantially suppressed relative to p_T^{top,lep}. We have also checked the six other event-hardness scales from Table 1 and the patterns observed are in line with the analysis given above.
Our final comment in this section concerns the observation that the V = m_tt/2 distribution is 12−14 times larger than the p_T^{top,lep} distribution. A significant ratio is expected because of the log(m_tt/m_top) enhancement that is present in Eq. (2.3), associated with the integral over ∆y_tt up to its kinematic boundary, Eq. (2.4). Fig. 5 (left) shows the ∆y_tt distribution in a bin of either the m_tt/2 or the p_T^{top,avg} hardness scale. In the lowest bin of ∆y_tt, the results are independent of the choice of hardness scale. However, at larger ∆y_tt the difference between the two histograms is striking, with the m_tt/2 case dominated by values of ∆y_tt close to the kinematic limit, a consequence not just of the LO distribution covering rapidities up to the kinematic boundary, but also of further apparent logarithmic enhancements for large ∆y_tt at NLO and beyond (cf. Appendix B). It is important to be aware that the events at large ∆y_tt involve low transverse momenta for the top quarks. This is illustrated in Fig. 5 (right), which shows the average top-quark transverse momentum as a function of ∆y_tt for the same bin of m_tt/2 as shown on the left. Close to the kinematic boundary of ∆y_tt, where the cross section is largest, the top quarks have transverse momenta of the order of m_top, which is to be expected given the basic kinematic relations that hold at LO.

Particle-level (reconstructed top) analysis
In this section, we examine whether it is feasible to carry out an analysis of tt topologies in actual collider events. Standard experimental tt analyses take either a resolved or a boosted approach [57][58][59] to identifying the top-quark decay products. However, this strategy breaks down for the FEX topology, because one top is boosted while the other may have only a moderate p T . It breaks down also for the GSP topology, because a single fat jet contains both a top and an anti-top, and typical boosted-top tagging algorithms are not designed to identify both. Finally, we have seen that even in FCR topologies, a single bin of m tt /2 ≫ m top receives contributions from tops at low as well as high p T , and so it is not sufficient to apply only boosted-top tagging algorithms in measurements of the (high) m tt /2 distribution.
The strategy that we develop here is to adopt a resolved top identification algorithm as our baseline, but to provide inputs such that the algorithm continues to function even for tops at high p T . We will restrict our study to semi-leptonic tt events, and work with the assumption that in a full experimental environment, lepton tagging, double b-tagging and a missing-energy threshold would be sufficient to reduce backgrounds to a manageable level. 9

Event analysis
The event analysis algorithm that we develop as a proof of concept is given as Algorithm 2. It is intended to function across the full range of top transverse momenta accessible with large LHC statistics, i.e. from low p T up to p T ∼ 1 TeV, and to be capable of reconstructing tt pairs that lie within a single large-R jet, thus addressing the issues raised in the introduction to this section. Some of the analysis steps involve a certain amount of choice. When choosing between methods that are best at very high p T (very many times the top mass) and methods that are simple, we have generally chosen the latter.
Note that the declustering of step 3 is essential when the hadronic top quark is at high p T , because it resolves the decay products even when they have been clustered into fewer than three R = 0.4 jets j. 10 The approach that we use is similar to the early jet-substructure work of Seymour [62] and also coincides with the approach adopted in the proposal [63] to use tops to characterise the time evolution of the heavy-ion medium. If one were studying top quarks with very high p T , it would probably be better to develop an approach based on the Cambridge/Aachen algorithm, so as to reduce the high-p T top-tagging's sensitivity to the underlying event and pileup. One might also wish to impose a constraint on the separation between candidate top decay products that depends on the p T of the top itself, similar to the variable-R approach used in Refs. [64,65], or alternatively a kinematically analogous fractional momentum cut on individual prongs, as used in Soft Drop [66,67] and a range of other taggers.

Algorithm 2 Event analysis algorithm at hadron (particle) level
Require: at least one lepton (we require it to have a transverse momentum of at least 25 GeV), missing transverse momentum, and hadrons.
1: Cluster the hadronic part of the event with the anti-k t algorithm with R = 0.4 and discard any jets below some p T threshold, p T,min , as one would normally (we take p T,min = 30 GeV).
2: Optionally, e.g. if subject to finite detector acceptance, exclude jets and leptons with an absolute rapidity beyond some y max . The remaining set of jets is referred to as {j} and the hadrons contained within that set of jets as {H}.

At very high p T , instead of the R J = 1 anti-k T algorithm used in step 3 of Algorithm 1, it might make more sense to adopt an algorithm such as flavour-k T [38,68], and possibly to apply it directly to the hadrons {H t } and tops, i.e. to the set {H t , t, t}. The flavour-k T algorithm suppresses the clustering of lone soft quarks within a hard jet, a situation which would contaminate the flavour of a hard jet. 11
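As a rough illustration of step 1 (anti-k t clustering with R = 0.4 and a 30 GeV threshold), the following toy implementation sketches the anti-k t recombination sequence. It is our own minimal encoding for illustration; a real analysis would use FastJet:

```python
import math

def antikt(particles, R=0.4, ptmin=30.0):
    """Minimal illustrative anti-kt clustering with E-scheme recombination.
    particles: list of (px, py, pz, E) with E > |pz|.  Returns jets above ptmin.
    A toy sketch of step 1 of Algorithm 2, not a substitute for FastJet."""
    def pt2(p): return p[0]**2 + p[1]**2
    def rap(p): return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))
    def phi(p): return math.atan2(p[1], p[0])
    def dij(a, b):
        dy = rap(a) - rap(b)
        dphi = abs(phi(a) - phi(b))
        if dphi > math.pi:
            dphi = 2.0 * math.pi - dphi
        # anti-kt distance: min(1/pt_i^2, 1/pt_j^2) * dR^2 / R^2
        return min(1.0 / pt2(a), 1.0 / pt2(b)) * (dy**2 + dphi**2) / R**2

    pseudo = [tuple(p) for p in particles]
    jets = []
    while pseudo:
        # smallest beam distance diB = 1/pt^2 is the reference
        best_pair, best = None, min(1.0 / pt2(p) for p in pseudo)
        for i in range(len(pseudo)):
            for j in range(i + 1, len(pseudo)):
                d = dij(pseudo[i], pseudo[j])
                if d < best:
                    best, best_pair = d, (i, j)
        if best_pair is None:
            # a beam distance is smallest: promote the hardest particle to a jet
            i = min(range(len(pseudo)), key=lambda k: 1.0 / pt2(pseudo[k]))
            jets.append(pseudo.pop(i))
        else:
            i, j = best_pair
            merged = tuple(pseudo[i][k] + pseudo[j][k] for k in range(4))
            pseudo[j] = merged
            pseudo.pop(i)
    return [j for j in jets if math.sqrt(pt2(j)) >= ptmin]
```

Because of the min(1/pt²) weighting, hard particles accrete softer neighbours within R before being promoted to jets, which is what makes anti-k t jets cone-like.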

Top reconstruction
The top reconstruction that we use is a so-called "resolved" algorithm, i.e. one that takes advantage of the fact that the top decay products should map to separate jets. The declustering procedure in step 3 of Algorithm 2 helps ensure that this is true even for high-p T tops.

11 These configurations should be assigned to the "other" category of Fig. 1, and this does not always occur with the anti-k T algorithm. The effects start only at order α 2 s ln p T /m top relative to LO, and are practically negligible at the p T values that we study here, hence our choice to retain the simplicity of the anti-k T algorithm. The effects are conceptually interesting when L = ln p T /m top ≫ 1, because higher-order logarithms have a BFKL [69,70] structure, as pointed out by Marchesini and Mueller [71].
There are many resolved top reconstruction algorithms in use for semi-leptonic tt events, i.e. those with a lepton, missing energy and jets, some of them b-tagged. The procedure we adopt is largely based on the fiducial top definition proposed in Ref. [72] and it makes use of invariant masses in order to choose which jets to cluster together to form top candidates. Our version has one small modification concerning the neutrino treatment.
First we reconstruct the neutrino momentum from the missing transverse momentum p miss T and the lepton 4-momentum p lep , setting p ν T = p miss T and determining p ν z from the constraint M (p lep + p ν ) = m W , where M (X) refers to the invariant mass of an object X and m W is the mass of the W boson. Solving the resulting quadratic equation generally presents two solutions for p ν z , the component of the neutrino's momentum along the beamline. In cases where the two solutions are complex, we take their common real part as the physical p ν z ; where both are real, we take the root with smaller |p ν z | (this is the one small point where we differ from Ref. [72], which takes the root with larger |p ν z |).

With this estimate of the kinematics of the leptonically decaying W boson, we attempt to identify a subset of the reclustered (sub)jets {j d } with the remaining decay products. We define a semi-leptonic tt pair candidate by assigning one b-tagged jet to the leptonically decaying top-quark candidate (t l ), another to the hadronically decaying top candidate (t h ), and a pair of non-b-tagged jets as the decay products of the hadronically decaying W boson candidate (W h ). We then calculate the quantity K 2 for each combination of jets satisfying 140 GeV ≤ M (t h/l ) ≤ 190 GeV, where m top = 173 GeV is our top-quark mass choice (we do not place any direct requirements on M (W h )). If no such combination of jets exists, we deem the reconstruction to have failed; otherwise the tt candidate pair with the lowest K 2 is taken to most closely describe the kinematics of the full decay chain.
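For a massless lepton, the W-mass constraint reduces to a quadratic in p ν z whose roots can be written in closed form. A self-contained sketch (the encoding is ours, and the m W value is an assumed PDG-like choice, since the paper's exact value is not stated):

```python
import math

M_W = 80.4  # GeV; assumed PDG-like W mass (the paper's exact choice is not stated)

def neutrino_pz(lep, met):
    """Solve M(lep + nu) = m_W for the neutrino longitudinal momentum.
    lep = (E, px, py, pz) of a (massless) charged lepton;
    met = (px, py) of the missing transverse momentum, assigned to the neutrino.
    Returns the real part if the two roots are complex, otherwise the root
    with smaller |pz| (the modification relative to Ref. [72])."""
    el, lx, ly, lz = lep
    nx, ny = met
    ptl2 = lx * lx + ly * ly
    # a = m_W^2/2 + pT(lep).pT(nu); the quadratic is
    # ptl2 * pz^2 - 2 a lz pz + el^2 ptn2 - a^2 = 0
    a = 0.5 * M_W**2 + lx * nx + ly * ny
    disc = a * a - ptl2 * (nx * nx + ny * ny)
    if disc < 0.0:
        return a * lz / ptl2          # common real part of the complex pair
    root = el * math.sqrt(disc)
    sols = [(a * lz + root) / ptl2, (a * lz - root) / ptl2]
    return min(sols, key=abs)         # smaller-|pz| root
```

When the roots are real, either one reproduces M (lep + ν) = m W exactly; the smaller-|p ν z | choice simply selects the more central of the two kinematically valid neutrinos.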

Validation of reconstruction performance
To verify the performance of our reconstruction approach we consider events at parton level, where we include the decays of the top quarks and their subsequent showering. The use of parton-level events here allows us to unambiguously identify the source of each particle in the event, for example whether it came from a W decay. We will show results as a function of m J,avg T , because this event-hardness characterisation scale yields events with a mix of all three topologies, cf. Fig. 2. Fig. 6 (left) shows the overall efficiency of the top reconstruction and topology identification. Consider a topology that we wish to test, say FCR. We identify all events where, using the Monte Carlo truth top quarks, the topology was classified as FCR. For a given bin of Monte Carlo truth m J,avg T , we then determine the fraction of those events satisfying the following conditions:
1. the top reconstruction described above should succeed, yielding a tt candidate pair;
2. additionally, the reconstructed top candidates should predominantly contain the corresponding truth top decay products; specifically, the b quarks should be correctly assigned in each candidate, and the two jets that make up the W candidate should each have received at least 50% of their p T from genuine W decay products;
3. finally, the event topology based on the reconstructed top quarks should also be FCR.
One sees that the efficiency is about 10% for low values of m J,avg T , rising to 30% at large m J,avg T , with the FEX and GSP efficiencies being slightly lower than for FCR, which is to be expected given that the FEX and GSP topologies are made more difficult to reconstruct by the lower transverse momenta and/or potential proximity of the top decay products.
We also verify the purity of the reconstruction, separating out the study of the purity of the top reconstruction and of the topology identification. Fig. 6 (middle) shows the former. For a given reconstructed topology, it shows the fraction of the events in the given reconstructed m J,avg T bin for which the reconstructed top candidates predominantly contain the corresponding truth top decay products (condition 2 above). The top purity is in the range 50−80%, increasing with m J,avg T . Fig. 6 (right) shows the purity for the topology identification. Here we consider all events reconstructed as being in a given topology, and examine the fraction for which the truth topology is the same as the reconstructed one (irrespective of whether the top candidates match the truth ones). This purity rapidly tends to 1 with increasing m J,avg T . Overall, the results of Fig. 6 give us confidence that the reconstruction approach proposed here can be successfully applied to realistic events.
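The topology-identification purity of Fig. 6 (right) reduces to simple counting over reconstructed events. A minimal sketch (the function and event encoding are ours, for illustration):

```python
from collections import defaultdict

def topology_purity(events):
    """events: iterable of (reco_topology, truth_topology) pairs for
    successfully reconstructed events.  Returns, for each reconstructed
    topology, the fraction whose truth topology agrees, i.e. the purity
    of the topology identification."""
    totals = defaultdict(int)
    agree = defaultdict(int)
    for reco, truth in events:
        totals[reco] += 1
        if reco == truth:
            agree[reco] += 1
    return {topo: agree[topo] / totals[topo] for topo in totals}
```

In practice this counting would be performed per bin of the reconstructed m J,avg T , giving the purity curves of Fig. 6 (right).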

Results
We close this article by repeating the main truth-level analyses of Section 3.2 on hadron-level events (with multi-parton interactions switched on), and imposing a realistic detector rapidity acceptance, i.e. considering only jets and muons at rapidities below 2.5 in step 2 of Algorithm 2. Fig. 7 is the analogue of Fig. 2 using top-quark candidates as reconstructed from particle (hadron) level events. The two sets of plots are strikingly similar, which should not be surprising given the validation results shown above. Where modest differences arise, these can be understood as a consequence of the variations in reconstruction efficiencies across different topologies. For example, one sees a slightly larger FCR contribution in the hadron-level reconstructed m J,avg T plot than in the parton-level one, reflecting the higher efficiencies for FCR reconstruction. Similarly, we have checked that the hadron-level reconstructed analogue of Fig. 3 is close to the truth-level results.

Figure 9: Comparison of the truth (partonic-top) ∆y tt distribution with the distribution obtained for fully reconstructed top quarks in hadron (particle) level events (√s = 13 TeV, POWHEG hvq + Py8, tt → bbjjµ±ν), showing reconstructed tops for acceptances |y j/l | < 4 and |y j/l | < 2.5. The (truth or reconstructed) tt pair satisfies the constraint 800 < m tt /2 < 1000 GeV. The histograms include all topologies.

Fig. 8 shows hadron-level differential cross sections. Broadly speaking the results are similar to those with truth tops in Fig. 4. There is an overall reduction in the cross sections, which is to be expected given the 10−30% reconstruction efficiencies shown in Fig. 6.
One additional striking difference is that the m tt /2 distribution no longer shows the strong enhancement relative to other hardness scales at high values of m tt /2, for example being only 4 times larger than the p top,lep T distribution at around 1 TeV, rather than the 12−14 times larger seen in Fig. 4. This can be understood from Fig. 9, which compares results for ∆y tt at large m tt /2 using truth tops and using tops as reconstructed from the final particles (in this case final partons). 12 The reconstruction procedure has the largest impact at large values of ∆y tt , where the top quarks have relatively low p T (which tends to reduce reconstruction efficiencies) and where additionally some of their decay products may fall beyond the rapidity acceptance. Fig. 9 shows results for a current LHC acceptance of |y| < 2.5 and for an HL-LHC type acceptance of |y| < 4. The latter is almost identical to results with full rapidity acceptance. Since the enhancement of the truth m tt /2 distribution is precisely due to the contributions from large ∆y tt , it is now evident why that enhancement is reduced with the reconstructed tops in Fig. 8. This suggests that for any kind of precision study with the m tt /2 variable it would be wise to apply an upper limit on |∆y tt |, e.g. |∆y tt | < 2. On one hand, within this region, Fig. 9 shows that measurements would then be largely unaffected by the experimental rapidity acceptance cuts on the input jets and leptons. On the other hand, the |∆y tt | < 2 rapidity window is still large enough, for example, to exploit the different ∆y tt dependencies of the qq and gg-induced channels to separately constrain gluon and (anti)-quark PDFs, cf. Fig. 10 of Appendix A.1.

Conclusions
The core observation of this paper is that energetic tt production involves far more than the simple leading-order back-to-back topology (flavour creation, FCR). If one selects events with a large momentum transfer and a tt pair, one finds roughly equal contributions from the FCR topology and from each of the two topologies that start only at NLO: events with an initial-state g → tt splitting (flavour excitation, FEX) followed by a hard scattering of one of the tops, and events with a non-top hard scattering that is followed by a final-state g → tt splitting (gluon splitting, GSP). The H tt+jets T and m J,avg T panels of Fig. 2 illustrate that all three of these topologies have similar cross sections. The reason for this surprising pattern is that the underlying hard 2 → 2 scattering channels that dominate at NLO involve t-channel gluon exchange, and so have squared matrix elements that are an order of magnitude larger than those of the LO scattering channels, which involve either an s-channel gluon exchange or a t-channel quark exchange. That enhancement numerically compensates for the extra factor of α s that appears from the g → tt splitting in the FEX and GSP channels, as can be seen from Table 2. The specific mix of topologies depends critically on the choice of observable used to characterise the hardness of tt events, as anticipated in Section 2.3 and verified in Fig. 2.
An awareness of the role played by different topologies is important in experimental measurements. Identifying the top quarks in FEX and GSP topologies brings additional challenges as compared to the FCR topology, for example because the two tops may have significantly different transverse momenta, or because they may end up within a single jet. This requires the use of top reconstruction techniques that transcend the usual resolved-versus-boosted paradigm, and we proposed a suitable fiducial-top definition in Section 4, which combines the approaches of Refs. [63] and [72]. The use of techniques that can successfully identify the tt pair in all topologies is especially critical for measurements that correct to the level of top partons: if a measurement is blind to a certain topology, then the contribution from that topology to the final cross section risks being estimated entirely from simulation rather than data. 13 As we saw in Sections 3 and 4, it is possible to classify the topology of individual events. Given that these topologies involve different PDFs and different underlying 2 → 2 hard-scattering processes, separating them can help in extracting the most physics information from the data. This is potentially relevant in any use of energetic tt production for precision physics, whether PDF fits, EFT studies, searches for small direct BSM effects, or validation of simulation tools. A separation by topology would also seem wise when studying the order-by-order convergence of perturbative predictions. We therefore encourage future measurements and theoretical studies to further explore the rich topological structure of energetic tt production.
One consideration that we have not explored in any depth is that of enhancements of perturbative contributions by logarithms of the hardness scale divided by the top mass (e.g. FONLL [73,74], BFKL in both t-channel quark [75][76][77][78] and gluon exchange [69,70], BFKL in EW exchange [79], double logarithms [80][81][82][83] and BFKL logarithms [71,84,85] in final state fragmentation, and double-log small-x non-singlet enhancements [86,87]). At LHC energies, such logarithms only start to become relevant for tt production (cf. Eq. (2.7)). However, theoretically, the breadth of different classes of logarithmic enhancement would make for a compelling study. Such a study would probably be called for at higher collider energies (e.g. the 100 TeV of the FCC-hh), and could potentially also be of interest for bb production at the HL-LHC.

A Leading-order distributions
For reference, we present and comment on some analytic formulas for leading-order tt cross sections, in a limit where one kinematic variable is much larger than the top mass. The results are essentially textbook level, and can straightforwardly be derived from matrix elements to be found, e.g., in Ref. [35]. They help provide some of the background to Section 2.1.

A.1 Distributions differential in m tt
While in the main text we have used m tt /2 for distributions, to keep the notation more compact here we use the tt mass rather than half the mass, and write it as m tt . We also write m t ≡ m top . We consider the region m tt ≫ m top . We start with a distribution that is double differential in m tt and the rapidity difference between the top and anti-top quarks, ∆y tt = y t − yt,

$$
\frac{d\sigma}{dm_{t\bar t}^2 \, d\Delta y_{t\bar t}}
= \frac{\pi\alpha_s^2}{2\, m_{t\bar t}^4}
\left[
\frac{(8\cosh\Delta y_{t\bar t}-1)\cosh\Delta y_{t\bar t}}{24\,(1+\cosh\Delta y_{t\bar t})^2}\,
\mathcal{L}_{gg}\!\left(m_{t\bar t}^2/s\right)
+ \frac{4}{9}\,\frac{\cosh\Delta y_{t\bar t}}{(1+\cosh\Delta y_{t\bar t})^2}\,
\mathcal{L}_{q\bar q}\!\left(m_{t\bar t}^2/s\right)
\right],
\tag{A.1}
$$

where zf i (z) is the distribution of partons of flavour i and momentum fraction z, and L qq sums over qq flavours. In Eq. (A.1), we have neglected corrections of the form m 2 top /(m t T ) 2 . For ∆y tt ≲ 1, the condition m tt ≫ m t implies a large transverse mass m t T ≫ m t , and so these corrections can be neglected. For large ∆y tt one can have m t T ∼ m t ; however, one can verify that the relative contribution of the m 2 t /(m t T ) 2 terms is suppressed by a power of cosh ∆y tt . In Eq. (A.1), for large ∆y tt , the term proportional to the gluon luminosity becomes independent of ∆y tt , i.e. one obtains a flat distribution in ∆y tt . In contrast, the term proportional to the quark-antiquark luminosity vanishes. This difference in behaviour is a consequence of the fact that the gg channel includes a t-channel quark-exchange diagram, whereas the qq channel only involves s-channel exchanges.
The result in Eq. (A.1) is valid up to the kinematic limit given in Eq. (2.4). It is illustrated for fixed scale in the coupling and PDFs in Fig. 10 (left), separated into the gg and qq-induced components. However, it is physically inappropriate to use a fixed scale, even for a single value of m tt , because different rapidities involve substantially different transverse momenta and associated momentum transfers. The impact of using a physically motivated scale choice, µ 2 R = µ 2 F = (H tt T /2) 2 = m 2 tt /(2(1 + cosh ∆y tt )), is shown in the right-hand plot. 14 This choice has a major impact on the shape of the distribution, with the plateaus in the gg-induced distribution acquiring a strong quasi-linear dependence on ∆y tt . This dependence arises from the scaling violations in the coupling and PDFs, a consequence of ln µ 2 ≈ ln m 2 tt − ∆y tt at large ∆y tt . The precise slope depends on the x values being probed in the PDF.
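The approximately linear ∆y tt dependence of ln µ 2 can be made explicit: for large ∆y tt one has 2(1 + cosh ∆y tt ) ≃ e^{∆y tt}, so that

```latex
\mu^2 \;=\; \frac{m_{t\bar t}^2}{2(1+\cosh\Delta y_{t\bar t})}
      \;\simeq\; m_{t\bar t}^2\, e^{-\Delta y_{t\bar t}},
\qquad
\ln \mu^2 \;\simeq\; \ln m_{t\bar t}^2 \;-\; \Delta y_{t\bar t}.
```

Each unit of ∆y tt thus lowers ln µ 2 by roughly one unit, and the slow logarithmic running of α s and of the PDFs then translates into the quasi-linear variation of the cross section with ∆y tt seen in Fig. 10 (right).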
The significant difference in ∆y tt dependence between qq- and gg-induced production has the potential to provide a valuable handle on the gluon and quark parton distributions separately.
We can also integrate over ∆y tt to obtain the single-differential distribution,

A.2 Distributions differential in the top transverse momentum
For large p T,t ≫ m t , the leading-order top-quark distribution doubly-differential in p T,t and ∆y tt is given by

$$
\frac{d\sigma}{dp_{T,t}^2 \, d\Delta y_{t\bar t}}
= \frac{\pi\alpha_s^2}{4\, p_{T,t}^4}
\left[
\frac{(8\cosh\Delta y_{t\bar t}-1)\cosh\Delta y_{t\bar t}}{24\,(1+\cosh\Delta y_{t\bar t})^3}\,
\mathcal{L}_{gg}\!\left(2(1+\cosh\Delta y_{t\bar t})\,p_{T,t}^2/s\right)
+ \frac{4}{9}\,\frac{\cosh\Delta y_{t\bar t}}{(1+\cosh\Delta y_{t\bar t})^3}\,
\mathcal{L}_{q\bar q}\!\left(2(1+\cosh\Delta y_{t\bar t})\,p_{T,t}^2/s\right)
\right],
\tag{A.4}
$$

neglecting terms relatively suppressed by powers of m 2 t /p 2 T,t . The structure here is very similar to that of Eq. (A.1), and indeed it is a trivial rewriting of that result, since at LO any combination of p T,t and ∆y tt can be mapped to a combination of m tt and ∆y tt . Still, some features change: there is an overall factor of 1/p 4 T,t rather than 1/m 4 tt , and inside the square bracket the denominators carry three powers of (1 + cosh ∆y tt ) rather than two. These are trivial consequences of the LO relation between m tt and p T,t and of the Jacobian in the differential cross section. Importantly, for a given bin of p T,t the luminosities are evaluated at a mass scale that depends on the rapidity separation, whereas in a given bin of m tt they did not.
The distributions for the rapidity difference are illustrated in Fig. 11. While the gg-induced term still has a broader ∆y tt distribution than the qq-induced term, the difference in shapes is much smaller than for fixed m tt . In particular, both distributions are now concentrated around ∆y tt = 0 and quite strongly peaked there: this is partly because of the extra power of (1 + cosh ∆y tt ), and partly because the partonic luminosities drop off rapidly with increasing rapidity separation. The kinematic limit at fixed p T,t is reached when the argument of the luminosities, 2(1 + cosh ∆y tt )p 2 T,t /s, approaches unity.

Those effects correspond to an α 3 s ln 2 m tt /m top contribution to the total cross section, i.e. a single-logarithmic enhancement. However, Fig. 12 shows further strong dependence of the NLO/LO K-factor as a function of ∆y tt , which one may take as evidence of further sources of logarithmic enhancement.
A full discussion of the different potential sources of logarithmic enhancement is significantly beyond the scope of this paper. However, we believe that it is still informative to outline the different classes of term that can contribute. Loss of top momentum through fragmentation can contribute logarithms at small ∆y tt (i.e. large p t T ), and is traditionally accounted for in the FONLL formalism [73,74]. At large ∆y tt , for t-channel top-quark exchange there are single logarithmic t-channel-fermion analogues of BFKL enhancement [75][76][77][78], associated with integrals over the rapidity of emitted gluons between the two final-state top quarks. Integrating over ∆y tt , we also expect double logarithmic α n s ln 2n m tt /m top enhancements, which relate to double-logarithmic non-singlet structure functions at small-x [86,87], whose formalism can be used [89] to predict the α n s ln 2n−2 x terms in the non-singlet P + NS (x) splitting functions [30,90,91].
To understand their origin in the context of tt production, one may examine the NLO case: consider an ISR g → tt splitting followed by a harder tg → gt scattering (which proceeds mainly through t-channel top exchange). We are interested in a situation where, in the ISR splitting, the anti-top and top have transverse momenta equal to some value p t1 . Take the anti-top to be emitted into the final state, and the top to be the particle that initiates the tg → gt hard scattering. The anti-top and top carry longitudinal momentum fractions 1 − z and z respectively, with z ≪ 1. The tg → gt scattering itself involves large s, |u| ≫ |t| ≳ p 2 t1 . The zP g→tt (z) splitting function goes as α s z, and the reduced cross section dσ/d ln m tg for the tg → gt scattering goes as (α 2 s /m 2 tg ) ln(m tg /p t1 ), where m 2 tg = zm 2 tt . The z factor from the splitting function compensates the 1/z from the 1/m 2 tg = 1/(zm 2 tt ) factor in the reduced cross section, resulting in a logarithmic integral over z. There is also a logarithmic integral over p t1 . Thus, after all integrations, we have a total cross section that goes as α 3 s ln 3 m tt /m top , i.e. a double-logarithmic enhancement relative to the LO cross section.
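This power counting can be checked schematically. Writing u = ln(m tt /p t1 ) for the logarithmic p t1 integral and v = −(1/2) ln z for the logarithmic z integral (so dz/z = −2 dv), the weight ln(m tg /p t1 ) becomes u − v; the integration boundaries below (0 ≤ v ≤ u ≤ L, with L = ln(m tt /m top )) are our illustrative choice:

```python
import math

def nested_log_integral(L, n=500):
    """Midpoint-rule evaluation of  int_0^L du  int_0^u  2 (u - v) dv,
    a schematic version of the nested logarithmic p_t1 and z integrals with
    the ln(m_tg/p_t1) = u - v weight.  The boundaries are illustrative
    assumptions, not a precise phase-space treatment."""
    du = L / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du          # midpoint in the outer (p_t1) log variable
        dv = u / n
        inner = sum(2.0 * (u - (j + 0.5) * dv) for j in range(n)) * dv
        total += inner * du
    return total
```

The result grows as L³/3 for these boundary choices: three powers of ln(m tt /m top ) accompany the single power of α 3 s , i.e. two more logarithms than the LO cross section, matching the double-logarithmic enhancement quoted in the text.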
Starting from NNLO, additional contributions arise that involve t-channel gluon exchange, associated with four-top production. At sufficiently large m tt these will dominate the m tt distribution, because dσ/dm 2 tt scales as α 4 s /(m 2 top m 2 tt ) instead of the α 2 s /m 4 tt seen in Eq. (2.3). 15 There are also electroweak contributions from qq → tt through a t-channel W exchange, which is dominated by qq = bb incoming flavours. These scale as α 2 EW /(m 2 top m 2 tt ), with a further α 2 s ln 2 m top /m b suppression factor coming from the requirement of two b-PDFs rather than gluon PDFs. These t-channel vector-boson exchange contributions, whether they involve gluons or electroweak bosons, will additionally be enhanced by QCD [69,70] and EW [79] BFKL-type logarithms. We have verified the size of the 4-top and EW processes using Alpgen [92] v. 2.14 and MadGraph5_aMC@NLO [93] v. 2.8.2 respectively. At the LHC, even at large m tt values of up to 4 TeV, both processes give small corrections relative to the QCD LO tt cross section, at most at the few-percent level. At a 100 TeV collider, which is beyond the scope of this article, considering m tt ≥ 20 TeV, the 1/(m 2 top m 2 tt ) scaling results in the 4-top and EW processes becoming comparable to normal QCD tt production.
We have also explicitly verified the stability of the ∆y tt distribution from NLO to NNLO using the MiNNLO event sample of Ref. [55] across a range of m tt /2 values. 16 Relative to a fixed-order (MCFM) NLO calculation with a scale choice of µ = m tt /2 that is similar to that of the MiNNLO sample, we see a further substantial correction from the NNLO terms at large ∆y tt . It would be interesting to further investigate this with a scale choice such as H tt+jets T /2 or m J,avg T that tracks the transverse momenta of the tops across the full ∆y tt range.
We encourage further investigation of all the issues discussed in this Appendix.