Top Quark Mass Calibration for Monte Carlo Event Generators -- An Update

We generalize and update our former top quark mass calibration framework for Monte Carlo (MC) event generators based on the $e^+e^-$ hadron-level 2-jettiness $\tau_2$ distribution in the resonance region for boosted $t\bar t$ production, that was used to relate the PYTHIA 8.205 top mass parameter $m_t^{\rm MC}$ to the MSR mass $m_t^{\rm MSR}(R)$ and the pole mass $m_t^{\rm pole}$. The current most precise direct top mass measurements specifically determine $m_t^{\rm MC}$. The updated framework includes the addition of the shape variables sum of jet masses $\tau_s$ and modified jet mass $\tau_m$, and the treatment of two more gap subtraction schemes to remove the ${\cal O}(\Lambda_{\rm QCD})$ renormalon related to large-angle soft radiation. These generalizations entail implementing a more versatile shape-function fit procedure and accounting for a certain type of $(m_t/Q)^2$ power corrections to achieve gap-scheme and observable independent results. The theoretical description employs boosted heavy-quark effective theory (bHQET) at next-to-next-to-logarithmic order (N$^2$LL), matched to soft-collinear effective theory (SCET) at N$^2$LL and full QCD at next-to-leading order (NLO), and includes the dominant top width effects. Furthermore, the software framework has been modernized to use standard file and event record formats. We update the top mass calibration results by applying the new framework to PYTHIA 8.205, HERWIG 7.2 and SHERPA 2.2.11. Even though the hadron-level resonance positions produced by the three generators differ significantly for the same top mass parameter $m_t^{\rm MC}$ value, the calibration shows that these differences arise from the hadronization modeling. Indeed, we find that $m_t^{\rm MC}$ agrees with $m_t^{\rm MSR}(1\,\mbox{GeV})$ within $200$ MeV for the three generators and differs from the pole mass by $350$ to $600$ MeV.


Introduction
The top quark mass m t is one of most important parameters of the Standard Model (SM). Due to its large size, it plays an important role in many quantitative and conceptual aspects of the SM [1][2][3][4][5][6]. Its value also becomes increasingly important as an input in constraining the potential effects of physics beyond the SM [7]. The most precise determinations of this parameter are based on so called "direct measurements" where kinematical observables depending on the momenta of the top decay products (jets and/or charged leptons) in tt events are measured and compared to the corresponding predictions obtained from Monte Carlo (MC) event-generator simulations. Even though these MC event generators (MCs) are based on first principles, due to conceptual as well as practical limitations (and to gain generality), their main ingredients -parton shower and hadronization models -use approximations. Modeling assumptions in the hadronization process lead to a large set of free parameters which partly affect the parton showering description (e.g. the shower cut parameter). These parameters are fixed by tuning the MCs to standard observables in e + e − facilities and also the large hadron collider (LHC) to achieve an optimal reproduction of experimental measurements. Even though an adequate data description can be achieved, the physical meaning of the MCs inherent QCD parameters including the top quark mass m MC t , which is determined in direct measurements, becomes partly uncontrolled. The current particle data group (PDG) world average for direct measurements reads m MC t = (172.69 ± 0.30) GeV [8] and uses, among others, the respective combinations by CMS m MC t = (172.44 ± 0.48) GeV [9], ATLAS m MC t = (172.69 ± 0.48) GeV [10] and Tevatron m MC t = (174.30 ± 0.65) GeV [11]. Recently, there has been a very precise direct measurement not yet included in the world average m MC t = (171.77 ± 0.37) GeV from CMS [12]. Future projections for the HL-LHC indicate that uncertainties as small as 200 MeV for individual measurements may eventually be reached [13]. The basis of the direct measurements are reconstructed observables defined on the top quark decay product momenta, highly sensitive to the top quark mass, based on the idealization of considering the top quark as a physical particle. The approximation of on-shell top quarks with a factorized decay is also the foundation of state-of-the-art MCs. These observables are, however, strongly affected by soft gluon radiation as well as non-perturbative effects, where currently no consistent theoretical predictions based on systematic analytic methods exist. The direct top mass measurements are therefore solely based on MCs, and even though they have reached a high level of sophistication concerning the treatment of top quark decay products, the result for m MC t must be interpreted with some care when used as an input for theoretical predictions [13][14][15].
At this time, a number of first-principle insights have been obtained concerning the theoretical interpretation of the top quark MC mass parameter m MC t , which is, as a matter of principle, tied to the precision and implementation of the parton shower. The latter is the essential perturbative component of the MCs. At the purely partonic level, it can be shown for the coherent branching parton shower algorithm and inclusive shape variables (where coherent branching is NLL precise), that m MC t differs from the pole mass by a term proportional to Q 0 × α s (Q 2 0 ), where Q 0 is the transverse-momentum shower cut [16]. It has been suggested that a similar relation applies to any parton shower [17,18], and evidence supporting this view has been provided in Ref. [19] by numerical analyses for the dipole shower. However, an analytic proof for the dipole shower, comparable to that of coherent branching in Ref. [16], is still missing. Conceptually, the shower cut Q 0 acts like an infrared factorization scale that can be controlled by a renormalization group equation that is linear rather than logarithmic [16]. Physically, the shower cut Q 0 is also a resolution scale, below which real and virtual soft radiation are unresolved and cancel. It is therefore reasonable to associate m MC t with a low-scale short distance mass such as the MSR mass m MSR t (R = Q 0 ) [14,20,21] where the scale R acts as an IR resolution scale for self-energy corrections as well. Using the MSR mass also avoids the appearance of the pole-mass renormalon which would add an additional uncertainty between 110 MeV [22] and 250 MeV [23]. However, in practical MCs, where the shower cut is treated as a tuning parameter, the meaning of m MC t may also be influenced by details of the hadronization models [14]. This latter source of uncertainty has not yet been investigated quantitatively up to now, as it is non-trivial to disentangle their effects from the dynamics of the parton showers. The insights just described have been obtained in the context of e + e − collisions. They should in principle also apply for hadron colliders, but initial-state radiation processes such as multi parton interactions and underlying event, for which no systematic theoretical description exists at this time, make concrete quantitative statements on the precise theoretical interpretation of m MC t more difficult. It was stated in Ref. [14] that for the time being one may identify m MC t with the MSR mass at the scale R = 1.3 GeV with an uncertainty of 0.5 GeV. This quantification should be scrutinized through explicit phenomenological analyses.
Alternatively to the conceptual insights just mentioned, a number of studies to numerically relate m MC t to the top quark mass in a well-defined renormalization scheme have been carried out. In Ref. [24] a simultaneous measurement of m MC t and the inclusive tt cross section at the LHC was suggested, intended for a m MC t -independent measurement of the top quark mass from fixed-order cross section theoretical calculations. The method also yielded a quantification of the relation between m MC t and the pole and MS masses with an uncertainty of 2 GeV which, however, depends on the set of parton distribution functions employed for the analysis. A more precise direct calibration method was developed in Ref. [25], where hadron-level N 2 LL resummed and NLO matched theoretical predictions for the 2-jettiness distribution in the highly top-mass sensitive resonance region for boosted top production in e + e − annihilation were fitted to Pythia 8.205 [26] pseudo-data samples. The theoretical factorization framework to determine the 2-jettiness distribution was de-veloped in Refs. [27,28] and is based on soft-collinear effective theory (SCET) [29][30][31] and boosted heavy-quark effective theory [27,28]. Since the 2-jettiness distribution is an inclusive event-shape closely related to thrust, apart from a systematic resummation of soft, collinear and ultra-collinear QCD corrections, also a first-principle parametrization of the hadronization effects was employed. This yields a systematic hadron-level prediction depending on QCD parameters, such as the top mass (in any renormalization scheme) and the strong coupling, as well as the parameters of a non-perturbative shape function which was originally developed for inclusive B-meson decays in the endpoint region [32]. Furthermore, using a low-scale short-distance mass such as the MSR mass m MSR t (R) and the gap subtraction formalism [33], all O(Λ QCD ) renormalon effects, which arise from ultracollinear and large-angle soft radiation, can be removed systematically while at the same time avoiding the appearance of large logarithms. All these ingredients were combined to obtain a hadron-level cross section for the 2-jettiness distribution at N 2 LL+NLO in Ref. [34] 1 . These theoretical predictions were used in the calibration analysis of Ref. [25] and the following numerical relations were found: m MC t = m pole t + (0.57 ± 0.29) GeV and m MC t = m MSR t (1 GeV) + (0.18 ± 0.23) GeV. A similar analysis in the context of the LHC was performed by the ATLAS collaboration in Ref. [36] using soft-drop groomed [37] boosted top jet mass distributions, based on the NLL hadron-level theoretical description developed in Refs. [38,39], which are compatible with the e + e − calibration results, but have much larger uncertainties.
In this article, an update and a generalization of the calibration analysis of Ref. [25] is presented. The work is improved in several aspects: (i) In order to study observable independence, in addition to the 2-jettiness τ 2 distribution two additional shape variables, namely the sum of jet masses τ s and the modified jet mass τ m , are considered. The conceptual subtlety is that these three shape variables are affected differently bŷ (massive) power corrections which can be larger than the precision achieved in Ref. [25]. We study these power corrections and provide a well-motivated prescription to tame them.
(ii) To test the dependence on the gap subtraction scheme (to treat O(Λ QCD ) renormalons stemming from large-angle soft radiation), we implement and study two additional gap subtraction schemes, one of which was already employed in Ref. [35]. To deal with these two additional gap schemes we improve significantly the flexibility of the shape-function fit parameters. (iii) While the calibration analysis in Ref. [25] was solely for Pythia 8.205, here we also calibrate m MC for all theoretical ingredients that were employed in the original calibration analysis [25], but not displayed there due to lack of space. Within the theoretical uncertainties of our theoretical N 2 LL + NLO description we find observable and gap-scheme independence for the m MC t calibration, and reconfirm the numerical results obtained in the original analysis of Ref. [25]. The probably most interesting outcome is that, while the hadron-level distributions for the three shape variables differ considerably between Pythia, Herwig and Sherpa for the same m MC t input value, the calibration results for the relation of this parameter to the MSR mass are compatible within uncertainties of about 200 MeV. It turns out that the bulk of the differences observed for the hadron-level cross sections is associated to different modeling of hadronization effects among the three MCs.
The content of this article is as follows: In Sec. 2 we introduce the three shape variables used in our calibration analysis and show the corresponding predictions for the cross section using Pythia, Herwig and Sherpa for boosted top production in e + e − annihilation. These MC pseudo-data are used as the input for the top quark mass calibrations carried out in the subsequent sections. In Sec. 3 a detailed description of the N 2 LL+NLO differential cross section for the shape variables in the resonance region used for the calibration analysis is provided. Here we also discuss the generalizations concerning the gap subtraction schemes and them 2 t power corrections that were not considered in Ref. [25]. The fit procedure, data processing and our approach to determine uncertainties are explained in Sec. 4. Section 5 focuses on a first application of the updated calibration framework, namely reproducing the results given in Ref. [25], which were based on the original calibration setup. Here we also introduce the graphical representation of the calibration results used in the following sections of the article. In Sec. 6 we discuss the generalizations of the calibration framework needed to reliably carry out fits in the two additional gap subtraction schemes. Since performing these is in general quite costly and cumbersome, we introduce a minimal modification of the scale setting procedure that translates into a faster χ 2 minimization that we also use in the final calibration analysis. The role ofm 2 t power corrections and the necessity to partially account for them within the singular bHQET cross section to achieve observable-independent calibration fits are discussed in Sec. 7. In Sec. 8 we present the final results and Sec. 9 contains our conclusions. We added four appendices showing the NLO fixed-order QCD results for the three shape distributions needed for the matching calculations and providing some basic formulae concerning the renormalization-group evolution factors, the three gap subtraction schemes and the definition of distributions. In Appendix D we provide the relevant entries for the input files we used to generate the Pythia 8.305, Herwig 7.2 and Sherpa 2.2.11 shape distributions.

Shape Observables
In the calibration analyses carried out in this article we consider three e + e − inclusive event shapes. They are equivalent in the dijet limit concerning the dominant singular QCD effects, but differ at O(m 2 t /Q 2 ), which constitute the most relevant subleading power corrections to the factorized and resummed treatment of the singular contributions.
The first observable is 2-jettiness τ 2 defined as [45] where the sum runs over all final-state particles with momenta ⃗ p i . The maximum defines the thrust axis ⃗ n t and Q is the center of mass energy. If the masses of the final-state particles are neglected τ 2 agrees with thrust [46]. Since the event shapes are computed with the momenta of the top-quark decay products (which can be considered as light) τ 2 is numerically close to thrust for unstable top-pair production. The τ 2 distribution has a distinguished peak at its lower endpoint region that is very sensitive to the top mass, which we call the resonance region. For Q ≫ m t this region is dominated by dijet-like events where the top quarks are boosted and decay inside narrow back-to-back cones. This kinematic situation is the basis for the factorized treatment of the peak region, where the dominant large-angle soft QCD dynamics is analogous to that of e + e − thrust at LEP energies. For a stable top quark the lower endpoint is illustrating the strong top mass sensitivity. In fact, at tree-level and for stable tops, the distribution is proportional to a Dirac delta function peaking at τ 2,min . The expression for τ 2,min also shows the importance of O(m 2 t ) power corrections since them 4 t term in the expanded expression corresponds to a shift in the top quark mass of 2 to 5 GeV for Q in the range of 700 to 1400 GeV. It is quite obvious that, at the level of precision of our calibration analysis, besides the power corrections in τ 2,min shown above (which can be accounted for in a trivial manner), also other more subtle sources ofm 2 t power corrections need to be considered. By construction, apart from a broadening due to the finite top-quark width, 2-jettiness is insensitive to the details of the decay products dynamics as long as the finalstate kinematics does not affect the direction of ⃗ n t . For Q ≫ m t the out-of-hemisphere decays arem 2 t -suppressed, but (for unpolarized electron-positron beams) the top quarks, in their rest frame, decay to a good approximation isotropically such that this effect only modifies the overall normalization and not the resonance peak location [27]. This class of power corrections is therefore not considered in our theoretical description. In the resonance region, the fact that the top quark decays thus only leads to Breit-Wigner type smearing of the distribution, also setting the power counting of the peak region size to where Γ t ≈ 1.4 GeV is the top quark width. The peak location is, however, also strongly affected by perturbative and non-perturbative QCD corrections. The second observable we consider is the sum of jet masses (sJM) τ s , also referred to as the hemisphere mass sum. The plane perpendicular to the thrust axis ⃗ n t defines the top and antitop hemispheres, called a and b. This plane is used to define the normalized (squared) invariant masses where the sum runs over all final-state particles in either hemisphere a or b. The sum of jet masses is therefore defined as For a stable top quark its lower endpoint is 6) and the differential distribution shows the same features as 2-jettiness. If allm 2 t -suppressed power corrections are neglected, τ 2 and τ s are equivalent in the lower endpoint region, so that the dominant singular QCD effects are equivalent as well. However, as we shall show in the course of our analysis, them 2 t power corrections in the measurement function related to (perturbative as well as non-perturbative) large-angle soft radiation are particularly sizable compared to 2-jettiness (for which they are absent). This is discussed in detail in Sec. 3.4.2.
The third observable we consider is called modified jet mass (mJM) τ m , and defined from sJM by It has the important feature that the previously mentionedm 2 t power corrections to the large-angle soft radiation effects are absent as is also the case for 2-jettiness. We use the modified jet mass variable τ m as an important diagnostic tool for our treatment of power corrections. In fact, as we shall show, in contrast to sJM, 2-jettiness is the observable least sensitive tom 2 t power corrections in our implementation to account for them. Note that in the context of having massive particles in the final state, different schemes exist specifying precisely how the energies and momenta of the final-state particles enter the shape-variable definition. The scheme we have adopted for the three shape variables τ 2 , τ s and τ m has been called "massive scheme" in Ref. [47] and ensures that the leading nonperturbative correction (encoded quantitatively in the moment Ω 1 , see Sec. 3.1) is universal with respect to the effects of non-zero hadron masses [47,48]. When the "massive scheme" is used for stable heavy quarks, the sensitivity to their mass is increased as compared to other choices [49,50].
The three event-shape distributions in the peak region generated by Pythia 8.305 [42], [41] and Herwig 7.2 [40] (using their standard settings) for m MC t = 173 GeV and boosted-top pair production at center of mass (c.m.) energies Q = 700, 1000 and 1400 GeV are displayed in Fig. 1 as a function of the jet mass variable M J = Q τ /2, where τ stands for τ 2 , τ s and τ m . The scaling of M J with respect to τ visualizes directly the top mass sensitivity of the three shape variables since M J would be equal to the input top mass at tree-level for Γ t = 0 and neglectingm t power corrections. The differences in the peak positions between the shape variables and for different Q values visualizes the sizable impact of them 2 t power corrections. In addition, the shift of the peak positions to values much larger than 173 GeV is due to collinear and soft radiation, and in particular non-perturbative effects which are Q-dependent as well. It is also conspicuous that there  are considerable differences in the shape and the peak locations generated by the three MC event generators. While Pythia predicts a quite narrow and distinct peak shape, Herwig and Sherpa yield a broader resonance region with Herwig showing the widest peak distribution. Furthermore, the peak positions for Herwig and Sherpa are located at significantly larger M J values. One of the most interesting conceptual aspects of the analysis presented in this article is showing how all these differences affect the result for the m MC t calibration, since the theoretical framework must be capable of disentangling the perturbative radiation and the non-perturbative effects at the observable hadron level in order to provide reliable results for the top quark mass. For the framework presented here, it is essential that the calibration fits involve MC pseudo data from different Q values.
We finally note that in principle also C-parameter [51,52], in the modified version introduced in Refs. [50,53], could be a good candidate as a top-mass sensitive shape variable for the calibration. The singular QCD effects are closely related to the ones for the thrust-like shape variables above (see Refs. [54,55]). However, as was shown by a thorough N 2 LL + NLO analysis in Ref. [56], the C-parameter is highly sensitive to the way in which top-quark decay products are emitted, which causes a considerable broadening of the distribution in the resonance region that depends on the dynamics of the decay process and flattens the peak distribution in a way which cannot be accounted for with the Breit-Wigner smearing. This effect strongly reduces the top quark mass sensitivity and is so sizable that the C-parameter is not suitable for top mass calibration at the intended precision.
3 Resummed Cross Section at N 2 LL+NLO with Power Corrections

Factorization Formula in the Peak Region for the Singular Cross Section
A factorization theorem that resums large QCD logarithms in the resonance region of the 2-jettiness τ 2 distribution for e + e − → tt + X was derived in Refs. [27,28] using a sequence of effective field theories (EFTs). The factorization formula also applies to the sum of jet masses τ s and the modified jet mass τ m distributions in the resonance region. In the following subsection, for the convenience of the reader, we briefly review the basic theoretical ingredients at N 2 LL order precision, which have already been discussed at N 3 LL in Ref. [35].
Here we use the same notations as in Ref. [35], and generically refer to the shape variable as τ .
The factorization formula in the resonance region is derived in two steps [27,28]. The first one is matching QCD to SCET in order to integrate out fluctuations at the production scale Q, leading to an expansion in τ ∼m 2 t ≪ 1, and resums logarithms of combinations of τ andm 2 t . At leading power, the three shape variables τ 2 , τ s and τ m are equivalent. The resulting factorization formula exhibits the separation of large-angle soft and collinear dynamics known from massless quark event shapes (with λ ∼ √ τ ∼m t ≪ 1 the SCET power counting parameter) and is valid in the tail region of the distribution where there is no hierarchy between τ − τ min andm 2 t . The collinear modes (which contain the topquark decay products with four-momentum q µ ) exhibit invariant mass fluctuations scaling as (q 2 − m 2 t )/m t ∼ m t while the soft modes have a much lower virtuality. This SCET factorization formula may be formulated in the context of a 6-flavor QCD theory. In the resonance region defined by τ − τ min ∼m 2 t Γ t /m t ≪m 2 t , one has (q 2 − m 2 t )/m t ∼ Γ t , enforcing an additional factorization using bHQET. Off-shell (mass-mode) fluctuations of the top quark are integrated out such that the collinear dynamics only contains radiation involving momenta scaling like k rest,µ uc ∼ Γ t in the top-quark rest frame, denoted as ultracollinear modes. In the peak region the virtuality of the large-angle soft radiation is also lowered and involves momenta scaling like k µ s ∼m t Γ t ≳ Λ QCD in the e + e − c.m. frame. Here the ultra-collinear and large-angle soft dynamics are described in a 5-flavor scheme (treating all other quarks as massless). The fixed-order perturbative description of this process exhibits large logarithms of ratios of these momentum scales yielding the hierarchy Q ≫ m t ≫ Γ t >m t Γ t ≳ Λ QCD . The dominant (also called singular) tower of these logarithms with respect to ratios of scales is summed in the SCET/bHQET framework, see Tab. 1 for the naming convention of the logarithmic resummation orders. order log terms cusp non-cusp matching ) and fixed-order N k−1 LO matrix element and matching corrections [57]. Cusp and non-cusp anomalous dimensions and β-function coefficients are given in App. B.1. The R-anomalous dimension γ R and the renormalon subtraction series δ refer to both soft-gap and pole-mass renormalons.
The factorization formula in the resonance region τ − τ min ∼m 2 t Γ t /m t ≪m 2 t has the form where σ C 0 stands for the vector (C = V ) and axial-vector (C = A) massless quark Born cross sections, see Eqs. (A.3), and the factorization formula shown on the RHS is the same for V and A. The superscripts (6) and (5) of the various functions indicate the number of active flavors, and we have defined the off-shellness variablê is the leading term of the on-shell top quark Lorentz factor for the boost that relates the c.m. and top/antitop rest frames in the resonance region. It is tied to the definition of the velocity labels [35] of the heavy quarks in bHQET. These labels are controlled by a reparametrization invariance when subleading power corrections are included. The term m t appearing in ϱ is therefore not tied to a particular renormalization scheme, and should in practice be set to a kinematic mass compatible with the invariant mass of the top (or antitop) system [35] such as the pole mass m pole t or the MSR mass m MSR t (R ∼ 1 − 2 GeV). Possible variations of m t in ϱ are of order Γ t [35] and lead to tiny effects which are irrelevant in our analysis, and for this quantity we use the pole mass determined from the MS mass m t (m t ) at three-loops. The term τ min is the lower endpoint τ value for stable top quarks for which we always use the exact expressions quoted in Sec. 2. This already provides the treatment of the most importantm t power corrections, but is not yet sufficient for the precision of our analysis, as we discuss in Sec. 3.4.2.
The term H (6) Q is the SCET hard function, which is the modulus squared of the Wilson coefficient obtained by matching the QCD and SCET top-antitop currents at leading order inm t . It contains the short-distance dynamics at the scale Q = E cm ≫ m t that are integrated out in SCET and reads [28] The natural scaling for its renormalization scale is µ H ∼ Q, so that no large logarithms arise.
The term H (6) m is the current matching coefficient between SCET and bHQET. It contains top quark fluctuations that are off-shell in the resonance region and therefore integrated out [28,59]. It has the form where L m ≡ log m 2 t /µ 2 m . The 2-loop term, which is enhanced by a so-called rapidity logarithm, is formally counted as α 2 s logm 2 t ∼ O(α s ) and is therefore included at N 2 LL. This term appears since there are two types of fluctuations at the mass scale, collinear and soft mass modes, which have the same invariant mass but different rapidities with respect to the top-antitop axis. The N 2 LL rapidity logarithms can be resummed to all orders [59] (see also Ref. [60]), but the numerical effect is negligible and therefore not included here. The natural scaling for the renormalization scale is the top quark mass, µ m ∼ m t , also called the mass-mode scale. For H (6) m one may also use the 5-flavor scheme for the strong coupling at the order we consider. Numerically, the difference of the two choices is orders of magnitude smaller than our perturbative uncertainties [35]. Note that the scheme choice for the top mass m t appearing in L m is not relevant at this order either. Here we use the pole mass as obtained for ϱ. Using a different scheme leads to tiny effects as well.
The bHQET jet function J B,τ describes the ultra-collinear dynamics of the decaying top-antitop system. For stable top quarks it has the form where L i are the standard plus distributions defined in Eq. (C.4). The bHQET jet function accounts for the leading double-resonant contributions in the peak region. The top quark finite-width effects, which we treat in the leading double resonant approximation as well, are 2 In Ref. [58] the hard and jet functions in SCET and bHQET have been computed to all orders in the large-β0 approximation, which entails that terms proportional to α n+1 s n n ℓ are known for any n ≥ 0. In the same reference, it was also found that the SCET-bHQET current matching function H (3.8) The factors of 2 in G(ŝ, Γ t ) arise because J B,τ accounts for the top and antitop quarks. As was shown in Ref. [28], this treatment is equivalent to having an explicit imaginary width term in the (anti)top quark HQET propagator, ∼ 1/(v · k + iΓ t /2) with v µ the top quark velocity label. The natural scaling for the bHQET jet function renormalization scale is µ J ∼ŝ τ = Q 2 (τ − τ min )/m t , which is linearly increasing with τ to the right of the peak and of order Γ t on the resonance region and below. The residual mass term δm t ≡ m t − m pole t specifies the renormalization scheme that is used for the top mass m t , and enters through the replacementŝ →ŝ − Q 2 mt dτ min dmt δm t . In the pole mass scheme we have δm t = 0. In general δm t is a series starting at O(α s ) and one has to consistently expand to O(α s ) to obtain the bHQET jet function in any other top quark mass scheme. The mass schemes used in this work are explained in Sec. 3.2.1.
The soft functionŜ (5) τ accounts for the effects of large-angle soft radiation with respect to the top-antitop jet axis at parton-level. It has the form with the natural scaling µ S ∼ µ J m t /Q for its renormalization scale. In the resonance region µ S ∼ Γ t m t /Q = Γ t /ϱ, but the renormalization scale must be chosen such that µ S still remains sufficiently perturbative. This also implies that µ J is always set larger than the top quark width. The large-angle soft radiation also has a non-perturbative component featuring scales of order Λ QCD , which arise from hadronization effects related to the soft exchange between the two hemispheres. In the resonance region they are implemented through the convolution ofŜ (5) τ with a non-perturbative model function F (k) [33], referred to as the shape function, where the shift parameter∆ accounts for the average minimum hadronic energy deposit in each hemisphere originating from hadron masses and is also referred to as the "gap" [33]. More details on the gap and the concrete treatment of the dependence onδ are given in Sec. 3.2.2. The form of Eq. (3.10) with the convolution of the partonic soft and shape functions provides a first-principle QCD description of the hadronization effects associated to the large-angle soft radiation tied to the hemisphere prescription of the shape variables we consider in our analysis. It has the advantage that the partonic component of the cross section, which is obtained setting F (k) = δ(k), is not modified. This entails in particular that all infrared properties of the parton-level cross section such as its renormalon structure remain intact and that the treatment of subleading power corrections is straightforward. In this context the shape function F (k) has a form that peaks at k ∼ Λ QCD and is normalized to unity. The model character of Eq. (3.10) arises from the particular form of the ansatz (including the gap parameter∆) and the parametrization of the shape function in practical applications. We use the parametrization developed in Ref. [32], which has support for k ≥ 0 and has the following form: where P n are the Legendre polynomials and the normalization is fixed by i c 2 i = 1. We truncate the sum over basis functions f n at N = 3, which is sufficient to describe corrections to the peak shape due to non-perturbative effects. The function f 0 appearing in Eq. (3.11) is positive definite and has one peak, while the functions f n≥1 have n zeros. The latter are less important for the shape of the cross section's peak, because the details of the shape function are smeared by the convolution. The width of the region where the f n functions have a sizable contribution is determined by the parameter λ, which is adjusted such that the series in n converges rapidly and truncation in N still allows to describe all relevant nonperturbative features in the resonance region. The most important quantity specifying the impact of the shape function on the peak distribution is the shape function's first moment and provides a quantitative measure of where the shape function peaks. We stress that the shape function, Ω 1 and all other moments have a rigorous non-perturbative matrix element definition in QCD and are not model parameters [61,62]. It is only the parametrization of the shape function with the truncation order N that introduces model character in practical applications. As can be seen from the form of the factorization formula (3.1), the shape function shifts the peak location of the τ distribution by an amount ∆τ ∼ Ω 1 /Q. For the top quark mass dependence this corresponds to a shift of ∆m t ∼ Ω 1 Q/m t , which increases with Q. 3 Next to the top quark mass, the first moment of the shape function is therefore the other essential parameter that needs to be accounted for in the calibration fits. This dependence also illustrates the need to include MC samples produced for different Q values in order to lift the degeneracy of the peak location concerning its dependence on m t and Ω 1 . We also note that away from the resonance peak, in the tail region of the τ distribution where ℓ ≫ Λ QCD , it is in principle sufficient to use an operator product expansion (OPE) where the leading non-perturbative correction is related to Ω 1 . However, we always describe the non-perturbative effects through the convolution with the shape function, since this is fully compatible with the OPE description. The renormalization group (RG) evolution factor U B and U  describes the (5-flavor) evolution of the top-antitop production current matching in bHQET, which compensates the combined µ dependence of the bHQET jet and soft functions. These evolution factors sum up large logarithms of ratios of the different physical scales arising in the resonance region. Due to RG consistency relations [28] not all of them are independent quantities. In Eq. (3.1), the (6-flavor) SCET current evolution only proceeds until the mass mode scale µ m where the top quark off-shell mass modes are integrated out. The global scale µ should therefore be formally chosen below µ m . However, the dependence on µ cancels exactly and its specific value is irrelevant. The concrete expressions for the evolution factors are for convenience collected in App. B. Overall, we determine all evolution factors at N 2 LL order using the inputs indicated in Tab. 1. Here we use 4-loop running and 3-loop matching of α s for the evolution of the strong coupling provided by the REvolver library [63].

MSR Mass Scheme
For the top quark mass in our calibration analysis we employ the pole m pole (R), which is a short-distance mass free of the O(Λ QCD ) renormalon, leads to a higher level of stability and smaller theoretical uncertainties [25]. The MSR mass [20,21] is defined from the perturbative series for the difference between m pole t and the renormalon-free MS mass at the MS mass scale m t ≡ m s (m t )/(4π)] n , where the coefficients a MS n (n ℓ , n h ) are known up to 4-loops [64][65][66][67][68][69]. The scale-dependent top MS mass m (6) t (µ) is a 6-flavor quantity. Here n ℓ stands for number of massless flavors appearing in closed fermion loops and n h for those with mass m t . The MSR mass (which is called 'natural' MSR mass in Ref. [21]) is a 5-flavor quantity defined by integrating out all virtual top mass loops, The appearance of the scale R, which yields a linear RG R-evolution in contrast to the logarithmic µ evolution of the MS mass, is essential at low virtualities in the resonance bHQET region, where all radiation effects are governed by momentum scales much smaller than m t . In the bHQET jet function J (5) B,τ this R scaling is crucial since the absence of large logarithms implies the natural scale choice R ∼ŝ τ that cannot be realized for the MS mass. Sinceŝ τ ∼ Γ t is small in the resonance region, the MSR mass m MSR t (R) with some scale R ∼ Γ t is numerically close to the pole mass and therefore constitutes a kinematic mass like m pole t . 4 Note that for a complete cancellation of the O(Λ QCD ) renormalon it is mandatory to expand δm MSR t (R) in powers of α  (1 GeV) as the input reference mass, following the convention used in the original calibration [25]. Note that the MSR mass renormalization scale R used in the theoretical description is tied to the jet function scale µ J , see Eq. (3.35), which is typically in the range of 10 to 20 GeV. Therefore, the choice of reference scale does not have any particular physical meaning and results at a different reference scale can be obtained using R-evolution at 3-loops. The form of the R-evolution equation can be found up to 4-loops in Ref. [21], see also the App. F of Ref. [35] as well as Tab. 3. We use the REvolver library [63] for all RG evolution and the conversion between different mass schemes. REvolver also provides routines to convert to all other common top quark mass short-distance renormalization schemes used in the literature. 5 We note that the MS mass m (6) t (µ) is also close to the pole mass for scales around µ = 80 GeV (see e.g. Fig. 5 in Ref. [21]). This may erroneously be interpreted as a fact supporting the use of the MS mass as a low-scale short distance mass in the bHQET jet function. However, the unphysical logarithmic µ-dependence of m (6) t (µ) for these low scales is much stronger than the linear m MSR t (R) evolution for R ∼ Γ t , which at the practical level makes it hard to achieve high precision when scale variations are accounted for. At the conceptual level, the fact m   t (µ)α (6) s (µ)/(3π) cannot be consistently used in the bHQET jet function as its size by far exceeds that of dynamical QCD corrections in the peak region, no matter which choice of µ is adopted. This is related to the fact that the logarithms that are summed in m (6) t (µ) for µ < m t are not compatible with the low-scale bHQET dynamics in the heavy top quark rest frame. 4 Kinematic top quark mass schemes are sometimes also referred to as "schemes consistent with the top quark's Breit Wigner line shape" [15]. 5 Note that in the original calibration analysis [25] the so-called 'practical' MSR mass definition was employed where top quark loop corrections are not fully integrated out. The difference to the 'natural' MSR mass is at the level to 10 MeV [21] which is insignificant at the level of precision of our calibration framework.

Soft Gap Subtraction Schemes
The parton level soft functionŜ (5) τ (ℓ,δ = 0, µ S ) in Eq. (3.9) has a leading O(Λ QCD ) renormalon similar to the bHQET jet function in the pole mass scheme which also leads to instabilities of the partonic threshold. While the pole mass O(Λ QCD ) renormalon can be removed by a quark mass scheme change (and is therefore an artificial theoretical issue) the renormalon in the partonic soft function is physical and related to a non-perturbative effect. If we do not deal with this renormalon, eventually, at high orders, we would find instabilities in our calibration fits for the shape function's first moment Ω 1 in Eq. (3.14). Due to the linear dependence of Ω 1 on the non-perturbative gap parameter∆ we can associate its renormalon instability to∆. Thus, given a perturbative seriesδ(R s , µ S ) in powers of α (5) s (µ S ) that precisely reproduces the soft function O(Λ QCD ) renormalon asymptotic behavior, called the gap subtraction series, we can remove this renormalon. This is achieved using the gap formalism [33] which starts from the combined perturbative and non-perturbative soft function where both the partonic soft functionŜ (5) τ (ℓ,δ = 0, µ S ) and the shape function where ∆ is strictly scale-independent (in analogy to the pole mass). Since ∆ has dimensions of energy and the soft function renormalon in ∆ scales with ℓ ∼ŝ τ m t /Q =ŝ τ /ϱ, the gap subtraction series has dimensions of energy as well through an overall factor R s with the natural scale choice R s ∼ŝ τ /ϱ [33]: The scale R s and the renormalon-free gap parameter ∆(R s , µ S ) play roles in close analogy to the scale R and the MSR mass m MSR t (R), where ∆(R s , µ S ) also satisfies a linear RG equation in R s . We keep the argument µ S in ∆(R s , µ S ) since it, depending on the gap choice, may not be RG invariant with respect to µ S . The gap subtraction series can now be shifted into the partonic soft function in the convolution of Eq. (3.10) yielding [33] where the last equality, together with Eqs. (3.25) and (3.26) given below, define the form for the soft function shown in Eq. (3.1). Note that for the gap subtraction different schemes can be adopted (which are discussed in the following). This scheme dependence is suppressed in this notation. As for the residual mass term, δ(R s , µ S ) needs to be consistently expanded out together with the soft function in powers of α (5) s (µ S ) such that the corresponding renormalon is removed order-by-order. The renormalon-free gap parameter ∆(R s , µ S ), which depends on the scheme choice forδ(R s , µ S ) and obeys a RG evolution equation in R s (and potentially also in µ S ) remains in the shape function. Since R s and µ S are in general τ -dependent in order to properly sum all logarithms, see Sec. 3.3, we adopt ∆ 0 ≡ ∆(R ∆ , R ∆ ) at the reference scale R ∆ = 2 GeV as the specified input and determine ∆(R s , µ S ) through its R s (and potentially µ S ) evolution equation(s).
A general parametrization for suitable subtraction schemes, collectively referred to as R-gap schemes, has been introduced in Ref. [35] by imposing a general condition on the soft function at a point in position space, The solution is given bȳ A relation to obtain the coefficients s ij in terms of s i0 , the coefficients of the cusp and noncusp partonic soft function anomalous dimensions, and the QCD β-function is provided in App. C.2 of Ref. [35]. The switch A turns the non-trivial anomalous dimension in µ S on or off. When A = on the scale of the strong coupling in the subtraction series is µ S by construction, such that ∆(R s , µ S ) and the gap seriesδ(R s , µ S ) satisfy RG equations in R s and µ S . For A = off a gap subtraction series is defined such that it only depends on R s so that ∆(R s , µ S ) andδ(R s , µ S ) satisfy an RG equation in R s , but are µ S -invariant. In this work we employ three different gap subtraction schemes to test the gap scheme dependence of the calibration results:δ Scheme 1 was the first realization of a gap subtraction and originally devised in Ref. [70]. It was then applied for strong coupling determinations from e + e − event-shape data in Refs. [54,55,57,71]. It was also used in the original Pythia top mass calibration of Ref. [25]. The gap subtraction series reads where L R ≡ ln(µ S /R s ). Explicit results forδ (1) and the R s as well as µ S evolution equations can e.g. be found in Section 2.F of Ref. [57]. The choice n = 1 concerning the number of y-derivatives in Eq. (3.21) sets the non-logarithmic coefficient to zero since s 11 = 0, so that d This implies that at O(α s ) the gap subtraction in scheme 1 is zero for the choice R s = µ S . A subtraction with the proper sign is only achieved if R s < µ S . Therefore, in this scheme R s has to be strictly set below the soft renormalization scale µ S to achieve a useful subtraction term with the proper sign at O(α s ) in the peak region.
Gap scheme 3 was devised in Ref. [35] in a phenomenological analysis of the bHQET factorization formula (3.1) at N 3 LL to allow for the setting R s = µ S , since using R s < µ S in the peak region can lead to an unstable behavior of the N 3 LL corrections due to larger values of α s (µ S ). This is achieved by using the position-space partonic soft function in Eq. (3.21) without any y-derivative (i.e. n = 0). The subtraction series has the form The gap subtraction series ofδ (3) has a sizable O(α s ) term d 1 (R s , µ S ) = −8.35669, see Eq. (3.17). Gap scheme 3 is µ S -invariant, but retains a residual dependence on the soft scale µ S at any finite order once the strong coupling is expanded in powers of α (5) s (µ S ) as required by renormalon cancellation. We have noticed in our numerical studies that gap scheme 3 can yield some unphysical behavior of the τ distribution in the transition from the resonance peak to the tail region when paired together with the pole mass scheme and using profile functions with fast changing scales. This is caused by the sizable constant For a strongly increasing profile for R s (τ ) = µ S (τ ) to the right of the peak region this can give rise to a severe cancellation of the τ -dependence inŝ τ and ∆ 3 (R s (τ ), R s (τ )) in the factorization formula (3.1), so that the distribution does not show any more a falling tail. As we show in Secs. 6, 7 and 8, this can result in larger calibration uncertainties and instabilities for the top quark pole mass which are, however, an artifact of gap scheme 3. If the MSR mass scheme is adopted, this feature is absent, since the τ dependence of m MSR t (R(τ )) through its profile R(τ ) partly cancels the τ dependence of ∆ 3 (R s (τ ), R s (τ )), see also Sec. 5.B of Ref. [35]. Even though one may argue that this is yet another argument that disfavors the use of m pole t , we do not adopt this point of view because this feature does not arise in general.
The problematic feature of gap scheme 3 in the pole mass scheme motivates the introduction of gap scheme 2 which differs from gap 3 by setting ξ to e 5γ E instead of 1. For this ξ value the nonlogarithmic O(α s ) term d 1 (R s , µ S ) = −3.9363 is substantially smaller than for gap 3 such that the glitch mentioned above does not arise. One can consider gap scheme 2 to be halfway between gap schemes 1 and 3, which also motivates our numbering. Nevertheless, for R s = µ S gap scheme 2 is very effective in removing the soft function renormalon and will therefore be the gap scheme we use for quoting the final calibration results. Complete formulae forδ (i) (R s , µ S ) for the three gap schemes and the resulting R s -evolution equations for ∆ 1,2,3 (R s , µ S ), which we employ at 2-loops, are given in App. B.2. The subtraction series δ (i) are only needed to one-loop at N 2 LL + NLO order.
Through the shape function's dependence on ∆ (i) (R s , µ S ), where i stands for the gap scheme, the gap parameter∆ in the shape function in Eqs. (3.1) and (3.10) gains scheme dependence and evolves with R s and (potentially) µ S , which themselves are τ -dependent as well. The concrete expression for∆ reads [35,54,55,57,71] where ∆ 0 is a free parameter that agrees with the reference value This also results in a scale-dependent first shape-function moment where the expression for Ω 1 (λ, ∆, 3) is given in Eq. (3.14). We note that the term ∆ 0 represents an additional parameter of the shape function besides λ and the coefficients c i , see Eq. (3.11). Both parameters are in principle redundant if the coefficients c i provide sufficient flexibility in the calibration fits. For a large value of N this would be automatically ensured, but in phenomenological applications N must be chosen sufficiently small to be practical. In Refs. [35,54,55,57,71] and the original m MC t calibration analysis [25], where gap scheme 1 was employed, ∆ 0 = 0.05 GeV and λ = 0.5 GeV were used (i.e. they were not fit parameters), and it was checked that the coefficients c i with a proper choice of N provide sufficient flexibility for carrying out phenomenologically meaningful fits. For other gap schemes, this flexibility needs to be reinvestigated, which is the topic of Sec. 6. We also note that Ω 1 without any soft function renormalon subtraction (i.e. forδ = 0) was referred to as Ω 1 in Refs. [35,54,55,57,71].
It is the first moment at the reference scale R ∆ = 2 GeV, namely which we quote in the presentation of the results for the m MC t calibration. To show the outcome of our analyses in the different gap schemes, and to visualize the gap-scheme independence of the calibration, it is useful to convert the results for the Ω (i) 1 (R ∆ , R ∆ ) to a common reference scheme. Since gap scheme 1 was the first available in the literature, we pick it as our reference. The corresponding conversion formulae are obtained from the relation Ω

Profile Functions
The bHQET τ distribution in the resonance region depends on the natural renormalization scales µ H , µ m , µ J and µ S of the hard, mass-mode, bHQET jet and partonic soft functions, as well as on the soft renormalon subtraction scale R s and, if applicable, the MSR top mass scale R. Formally, at the all-order level, these scale dependences would vanish, but at any finite order a residual dependence remains, which we utilize as a quantification for the theoretical uncertainty of our N 2 LL + NLO description. While all scales can be considered as τ -independent directly on the peak, where the scale hierarchy is the largest, only µ H and µ m are also constant away from the peak. The scales µ J , µ S , R s and R, on the other hand, are in general τ -dependent as already explained in Sec. 3.1. While these scales should be varied to obtain an adequate theory uncertainty estimate, they also need to obey some physical correlations so that the natural scaling hierarchy is not upset. This is achieved by profile functions for all renormalization scales. For the differential distribution for massive quark production in the entire τ spectrum, an efficient parametrization of these profile functions was designed in Ref. [34], which is a generalization of the profile functions used for massless event-shape distributions designed and employed earlier in Refs. [54,55,57,71]. This profile parametrization applies to top and bottom quark production. The formulae for the profile functions of Ref. [34] in the resonance region, which we need for the calibration analysis, were also presented in Ref. [35]. Here, we review some basic aspects of these profile functions in the resonance region and point out some differences concerning the range of variations of the profile function parameters used in this article compared to the original calibration work of Ref. [25] and to the N 3 LL analysis of Ref. [35]. The general form of the τ -dependent jet and soft profile functions are given by piecewise functions, which describe the non-perturbative (τ < t 0 ), resummation (t 1 < τ < t 2 ) and fixed-order (τ > t s ) regions, where t 0 < t 1 < t 2 < t s . In the non-perturbative region the scales are frozen at a low but still perturbative value. In the resummation region the profiles grow steadily and in the fixed-order region they merge with the hard function scale µ H . These three regions are connected by transition regions, which allow the piece-wise functions F (τ < t a ) and G(τ > t b ) to be smoothly connected by a double quadratic function ζ(F (τ ), G(τ ), t a , t b , τ ) for t a < τ < t b , which has been given e.g. in Eq. (74) of Ref. [54]. Since the calibration only concerns the resonance region, where the bHQET description is sufficient, we only need the profile functions in the non-perturbative and the transition to resummation regions, so that only t 0 and t 1 are relevant. The boundary t 0 is located to the right of the peak position and the condition τ > t 0 roughly indicates the region where the OPE description with the first moment Ω 1 and the effects of the shape function agree to better than 2%. The boundary t 1 is located in the tail, where the distribution reaches about half of the peak height. They read [34] where τ min (m 2 t /Q 2 ) refers to the minimal stable quark τ values for the different shape variables given in Sec. 2 as a function of the top mass. This introduces two additional profile parameters d 0,1 which are varied in the interval [−0.05, +0.05], with zero as their default value.
The soft function scale profile is given by It is built on the generic bHQET jet scale functionμ J (τ ), which encodes the natural relation of the hard, jet and soft scales, with modulations controlled by the parameter e J ∈ [−3, 0] with default value e J = −1.5, that is constructed to have no effect in the fixed-order region far above the resonance. We refer to Ref. [34] for more details. The additional fixed-order region parameter n s ∈ [0.375, 0.425] has very little impact, and its default value is n s = 0.4. The soft function renormalon subtraction scale R s has to be close to the soft scale µ S , but we need two different prescriptions, one for gap scheme 1, where one should use R s < µ S , and another one for gap schemes 2 and 3, where we use R s = µ S . For gap parameters default value range of values n s 0.4 0.375 to 0.425 while for gap scheme 2 and 3 we use The renormalization scale of the MSR mass m MSR t (R) is always set to the jet scale: The renormalization scale for the remaining fixed-order QCD corrections at NLO that are not accounted for in the bHQET and SCET factorization formula, see Sec. 3.4, is denoted by µ ns . It is set to a weighted average of the hard and jet scales, The profile function formulae and parameters employed here are identical to the ones used for the original calibration [25] and in the analyses of Ref. [34], except for the gap 2 and 3 renormalon subtraction scales R (2,3) S since there only gap 1 was considered. In Ref. [34] the parameter ranges have been tested extensively at N 2 LL + NLO, where also the SCET and QCD non-singular corrections were accounted for. In the analysis of Ref. [35] the different variations , e J ∈ [−1.5, 1.5] and n s ∈ [0.475, 0.525] were adopted.
For gap 1, which was not analyzed in Ref. [35], the larger µ 0 variation is not suitable since R (1) S in Eq. (3.33) can become too low. Furthermore, in the analysis of Ref. [35] the singular bHQET factorization formula of Eq. (3.1) was determined and analyzed at N 3 LL order, but did not account for the non-singular SCET or QCD corrections. The different variation ranges for e J and n s used there yielded better convergence for these singular contributions. The difference is associated to the non-singular corrections, which, as we show in the subsequent section, are not small.

Non-singular Corrections
The bHQET factorization formula for the resummed singular τ distribution valid in the resonance region discussed in Sec. 3.1 and shown in Eq. (3.1) contains the leading distributional and non-perturbative corrections in a expansion inm t = m t /Q, Λ QCD /Q and Γ t /m t [27,28]. For reliable phenomenological applications, however, formally subleading power corrections need to be accounted for since they are not negligible. These can be included by recovering contributions that have been integrated out in the two-step matching from QCD to SCET at the scale Q and then from SCET to bHQET at the scale m t . The procedure to recover and include these subleading power corrections, which are called non-singular or matching corrections, is in general not unique since one may absorb some of them already in the singular bHQET factorization formula. At this point we remind the reader that using the term "non-singular" is somewhat misleading for the case of massive quark production, since the distributional terms contained in the leading singular bHQET cross section do not encode the entire singular distributional terms (i.e. delta-functions and plus-distributions) which have coefficients containingm 2 t power corrections. Since the difference to an approach where the singular cross section is treated in a strict power counting approach, where no subleading power contributions are absorbed, is associated to the resummation of formally power-suppressed logarithms of certain types of massive power corrections, any absorption prescription should be based on physical arguments. An essential guiding principle is that fixed-order final matched formulae reproduce the fixed-order full QCD result.
In the factorization formula (3.1) one such absorption prescription has been applied by using the exact kinematic stable-top quark expression for the minimal τ value τ min . This prescription resums kinematicm t power corrections beyond a strict power counting approach to all orders and is crucial for the phenomenological reliability of the factorization theorem, as we already mentioned in Sec. 2. It is physically sensible since the higher power m 2 t terms contained in τ min represent a global shift with respect to which the singular dynamical QCD effects unfold in a universal and observable-independent way. It is therefore physically unreasonable to treat the higher-powerm t terms in τ min in an expansion. Beyond the absorption concerning τ min , however, the factorization formula (3.1) applies strict power counting. We therefore label it with the subscript 'strict'. In the original 2-jettiness calibration analysis of Ref. [25] the same strict approach was applied and the non-singular corrections were included in two steps by first matching back to SCET and then to full QCD. In Sec. 3.4.1 we review the 'strict' approach of Ref. [25]. Since this approach does not yield consistent calibration results for the three observables 2-jettiness τ 2 , sJM τ s and mJM τ m , as we shall show in Sec. 7, we discuss an improved procedure in Sec. 3.4.2. Note that the presentations in this subsections still use the generic shape variable τ which can stand for τ 2 , τ s or τ m . The shape-variable dependent NLO fixed-order results, which are used to determine the QCD non-singular contributions are given in App. A. We also note that much more details on the matching procedure to achieve a reliable description for all values of τ can be found in Ref. [34].

QCD and SCET non-singular Distributions: Strict Power Counting
The full parton-level, stable-top, pole-mass and non-renormalon subtracted SCET and QCD matched resonance region cross section in the strict approach has the form The SCET non-singular cross section dσ C nsb / dτ | strict is defined from the fact that the bHQET factorization theorem emerges from the SCET factorization theorem valid for (q 2 − m 2 t )/m t ∼ m t when the off-shellness (q 2 − m 2 t )/m t reaches values below m t [27,28]. As already explained at the beginning of Sec. 3.1, apart from the resulting modified RG evolution factors in the 5-flavor scheme, this only affects the collinear sector, where the SCET jet function J SCET (s, µ) splits in the mass-mode matching function H m times the bHQET jet function J (5) B,τ plus a contribution that is power suppressed, nonsingular and also integrable in where q 2 = s + m 2 t is the inclusive invariant mass of the collinear radiation described by the SCET jet function. The NLO non-singular jet function J At NLO, the specification of the flavor-number scheme for the strong coupling in Eq. (3.38) is not yet relevant, but we indicate the choice implemented in our numerical code. Note that the SCET massive primary quark jet function has recently been computed at 2 loops in Ref. [60]. The SCET non-singular cross section in the resonance region is given by which means that the bHQET jet function is simply replaced by the non-singular SCET function with the analogue scale setting. This implies that the contributions in the nonsingular SCET jet function are treated as low-scale dynamical fluctuations. In the original calibration analysis [25] the scale setting J nsb (m tŝ , m t , µ m ) was used, such that the nonsingular SCET jet function was treated as an off-shell contributions. The difference is numerically insignificant since the overall contribution of the SCET non-singular cross section turns out to be tiny, and the difference concerning the resummed logarithms is irrelevant as well. Since J (5) nsb is a pure O(α s ) contribution all other fixed-order matrix elements in dσ C nsb / dτ are taken at tree-level. Therefore, the NLO expanded SCET non-singular cross section simply reads σ 0 Q 2 J (5) nsb (m tŝτ , m t , µ). The QCD non-singular cross section dσ C ns / dτ | strict is obtained by subtracting the bHQET and non-singular SCET cross sections expanded at O(α s ) from the NLO full QCD fixed-order cross section dσ C QCD / dτ | strict , all evaluated at the non-singular renormalization scale µ ns : Apart from the expression for τ min appearing in the bHQET singular cross section, only the QCD non-singular cross section is observable dependent. The functions A C,ns τ (m t ), B C,ns plus (m t ) and F NS,C,ns τ (τ,m t ) are obtained from the corresponding QCD functions shown in Eq. (A.1) upon the subtractions from the expanded singular bHQET and non-singular SCET cross sections. The NLO expanded singular bHQET cross section reads where A bHQET τ (m t ) and B bHQET plus (m t ) are given in Eqs. (3.56) for L s = 0. This yields the following results for the QCD non-singular functions and The NLO fixed-order functions R C 0 , A C τ , B C plus and F NS,C τ (τ,m t ) are defined in Eq. (A.1).

Absorption ofm 2 t Power Corrections
As we demonstrate in Sec. 7, the strict approach to define the bHQET cross section (including the exact expression for τ min ) and to construct the non-singular cross sections still yields a sizable residual observable dependence on the top quark mass calibration results, which arise fromm 2 t power corrections not contained in τ min . This motivates the absorption of additionalm 2 t power corrections in the singular bHQET differential distribution. In this section we discuss three kinds of absorption prescriptions, which remove the observable dependence for the calibration results. We emphasize that the discussions presented in this subsection do not constitute a comprehensive and complete treatment ofm 2 t power corrections. However, we believe that we have identified the ones most relevant for phenomenological applications and implemented a reasonable way to estimate the remaining uncertainties due tom 2 t power corrections that are not yet accounted for. We also mention that in the context of our analysis it turns out that the 2-jettiness distribution, which was used in the original calibration analysis [25], is largely insensitive to the treatment ofm 2 t corrections indicating its robustness with respect to power-suppressed effects.
We start the discussion concerning them 2 t power corrections with the observation that the non-perturbative shape function has a sizable impact on the location of the resonance peak position. This sensitivity on non-perturbative effects parametrized by the shape function is encoded in the measurement delta function δ(ŝ τ −ŝ−ϱℓ) appearing in the factorization formula (3.1). This corresponds to a generic modification of the kinematic variable of order δŝ τ ∼ (Q/m t )Ω 1 , which implies that the resonance peak position (with respect to the top mass) is shifted by the shape function effects by an amount ∆m t ∼ δŝ τ /2 ∼ QΩ 1 /(2m t ).
For Ω 1 in the range of 0.5 GeV to 1 GeV, which covers the typical values we obtain for Ω 1 from our calibration analysis, this corresponds to a contribution to the fitted top quark mass of around 1 to 2 GeV for Q in the range of 600 to 1400 GeV. This means thatm 2 t power corrections to the measurement delta function of the form δ[ŝ τ −ŝ − r τ,s (m t )ϱℓ] with r τ,s (m t ) = (1 + const ×m 2 t ) can still lead to shifts at the level of 250 to 300 MeV, larger than the uncertainties expected for the top quark mass at N 2 LL + NLO order [25]. It is therefore reasonable to include the rescaling factor r τ,s (m t ) for the shape variables we consider.
To that end, let us consider generic soft momenta k s and ks arising from large-angle soft radiation in the top (n) and antitop (n) hemispheres, respectively. In the absence of any ultra-collinear radiation one has for the four-momenta flowing in each hemisphere the following expressions: where t are the (stable) top and antitop velocities without large-angle soft radiation, which we assume to be in the z-direction. For the 2-jettiness variable τ 2 defined in Eq. (2.1) it is easy to see that soft momenta may modify the thrust axis which is along the z-direction in the absence of soft radiation, but this modification is of order k s ∼ ks leading to effects quadratic in k s,s . Let us now define n µ = (1, 0, 0, 1) andn µ = (1, 0, 0, −1), with the thrust axis pointing in the z-direction and use the usual light-cone decomposition of momenta p µ = p +n µ 2 + p − n µ 2 + p µ ⊥ . As a result we obtain so thatŝ τ 2 = ϱ(k + s + k − s ). We see that there are no O(m 2 t ) power corrections to the soft rescaling factor, and we therefore have For the sum of jet masses variable (sJM) τ s defined in Eq. (2.5) the situation is more complicated since invariant masses exhibit a non-linear dependence on the soft momenta k s,s . We apply the following heuristic consideration, neglecting again any soft modification of the thrust axis along with contributions quadratic in k s,s . We obtain that We can now write the p − n (p + n ) momentum components in terms of p + n (p − n ) using the relations which arise from energy conservation, and where ∆E = k + s + k − s = −k + s − k − s represents the soft energy imbalance between the two hemispheres. Together with Eq. (3.45) this yields where the linear soft contribution ∝ ∆E cancels between the two hemispheres and we have neglected all contributions quadratic in soft momenta or energies. As a result we havê s τs = r τs,s (m t )ϱ(k + s + k − s ) with (3.51) Note that the large-angle soft momenta k s,s appearing in Eqs. (3.49) and (3.50) are not exclusively related to on-shell gluons, but also account for the recoil effects on the top and antitop quarks, so that ∆E can have any sign. The result for r τs,s (m t ) thus accounts for the effects that radiation in one hemisphere has on the entire event.
The modified jet mass variable (mJM) τ m = τ s + τ 2 s /2 is designed such that the soft rescaling factor does not have a quadraticm 2 t term. Using the result on the second line of Eq. (3.50) we obtain such that we arrive atŝ τm = r τm,s (m t )ϱ(k + s + k − s ) with We use mJM as a diagnostic shape variable to cross check that the sizablem 2 t power corrections associated to r τs,s (m t ), which are present in the sJM variable are indeed absent in mJM.
The second absorption prescription is related to the observation that, as was observed in Ref. [50], the NLO fixed-order results given in App. A exhibit a universal observableindependent coefficient B plus (m t ) multiplying the plus-distribution term [1/(τ −τ min )] + once the tree-level cross section term R C 0 (m t ) is factored out, see Eq. (A.1). The plus distribution coefficient B plus (m t ) is also universal concerning vector (V) or axial-vector induced topantitop production. This universality does not only concern the three shape variables considered here, but applies to any global and infrared-safe event-shape variable [50]. It is therefore reasonable to assume that including the tree-level cross section term R C 0 (m t ) as a global factor multiplying the singular bHQET factorization formula resums another set of important power corrections. Together with the soft rescaling factor this motivates the following modified form of the parton-level, stable-top, pole-mass and non-renormalonsubtracted bHQET factorization formula which differs from the strict formula of Eq. (3.1) concerning the overall factor R C 0 (m t ) and the additional factor r τ,s (m t ) in the measurement delta-function. Expanded to O(α s ), which we need to determine the non-singular cross section this yields Note that L s is not a large logarithm, but O(m 2 t ) power-suppressed or zero. Since the modification of the measurement delta-function also applies in the context of SCET, the SCET non-singular cross section adopts the form Note that for the SCET non-singular cross section we do not factor out the tree-level factor R C 0 (m t ) since it leaves the structure of Eq. (3.44) intact, given our parametrization of the non-singular contribution F NS,C τ (τ,m t ) in the NLO fixed-order full QCD distribution shown in Eq. (A.1). The numerical impact is, however, tiny anyway, as we have already mentioned above in Sec. 3.4.1.
If we would stop here, the coefficient of the delta-function, h C τ (m t ), and the coefficient of the plus-distribution b(m t ) in the QCD non-singular cross section (with the phase space We now adopt a third prescription where these contributions are absorbed into the singular bHQET cross section as well. However, since we do not have any compelling physical argument supporting this prescription, we implement it with scaling factors which we vary in our calibration fits to estimate the uncertainty concerning our treatment of them 2 t power corrections. The final form of the bHQET cross section with all three absorption prescription implemented reads B,τ →J The scaling parameters ξ A1 and ξ B1 determine the fractions of the coefficients b(m t ) and h C τ (m t ) being absorbed into the bHQET cross section, where ξ A1 = ξ B1 = 1 refers to full absorption and ξ A1 = ξ B1 = 0 refers to the treatment where b(m t ) and h C τ (m t ) are fully contained in the QCD non-singular cross section. In our calibration fits we vary ξ A1 and ξ B1 independently in the interval [0, 2]. The scaling parameters ξ J and ξ B reflect how the deltafunction coefficient h C τ (m t ) is redistributed into the constant non-logarithmic contributions of the hard, mass-mode and jet bHQET functions. In our calibration fits they are varied independently in the interval [0, 1] with the constraint ξ J + ξ B ≤ 1. For the calibration fits, in order to quantify the uncertainty of our treatment of them 2 t power corrections, the values of ξ J , ξ B , ξ A1 and ξ B1 are chosen randomly in the ranges given above. Specifically, we pick the points } to be uniformly distributed on the unit-sphere in the first octant. This ensures a symmetrical distribution among the three coefficients. For ξ A1 and ξ B1 we independently use the Beta distribution N (x/2) −0.5 (1 − x/2) −0.5 in the interval x ∈ [0, 2], which conservatively enhances the population of the boundary regions close to 0 and 2. When the absorption prescription for the treatment ofm 2 t power corrections is used, the random variations of the ξ parameters is implemented in parallel to the 501 random profile function parameter variations. Thus, the variation of both types of parameters combined constitutes our estimate of the perturbative uncertainties.
Overall, the NLO expanded expression for the modified bHQET factorization formula with the three absorption prescriptions reads Note thatÃ C,bHQET τ (m t , ξ A1 , ξ B1 ) andB bHQET plus (m t , ξ B1 ) do not depend on the scaling parameters ξ J and ξ B since these only specify how h C τ (m t ) is distributed among the hard, mass-mode and bHQET jet functions. The QCD non-singular cross section then adopts the form We remind the reader that dσ C bHQET /dτ and dσ C nsb /dτ depend on the τ -dependent profiles for the renormalizations scales µ H , µ m , µ J and µ S . Furthermore, dσ C bHQET /dτ depends on the scaling parameters ξ A1 , ξ B1 , ξ J and ξ B , and dσ C ns /dτ depends on the scaling parameters ξ A1 and ξ B1 . This dependence is suppressed in the arguments to avoid cluttering.

Combining Ingredients
In Sec. 3.4 we have derived the full parton-level resonance τ distributions including the singular bHQET and non-singular cross sections in the limit of a stable top quark and without any renormalon subtractions. So the formulae for dσ C full,strict (τ )/dτ in Eq. (3.37) and for dσ C full,absorb (τ )/dτ in Eq. (3.66) are in the pole mass scheme and without any soft gap subtraction. For the event-shape distributions used in the calibration fits, the nonperturbative effects parametrized in the shape function F (k), the top quark width effects and the renormalon subtractions still need to be implemented. This is achieved by the following additional convolutions involving the shape function F (k − 2∆) of Eq. (3.11) and the Breit-Wigner function G(ŝ, Γ t ) of Eq. (3.8): where the residual mass δ m and the gap subtractionδ terms (for the three gap schemes we use) are discussed in Sec. 3.2. The mass m t appearing in the argument on the RHS refers to m MSR t (R) in the MSR mass scheme and to m pole t in the pole mass scheme. The same is true also for the top mass appearing in the denominator ofŝ τ in Eq. (3.2). Note that for the top mass appearing in the soft rescaling factor r τ,s (m t ) we always adopt the MSR mass m MSR t (5 GeV). The MSR-mass and gap subtractions are expanded strictly in α s at the same respective renormalization scales together with the bHQET jet and soft functions to guarantee a correct order-by-order cancellation of the renormalons.
We stress that the finite top width and non-perturbative corrections as well as the renormalon subtractions also affect the non-singular cross sections through the global convolution in Eq. (3.67). This implementation is important, since the final cross section can otherwise show severe instabilities when the singular delta-function or plus-distribution terms are not fully absorbed into the bHQET cross section. We finally mention that for the final expressions entering the calibration analysis the vector-(V) as well as axial-vector-(A) induced cross sections are added up: (3.68)

Fitting and Data Processing
In this section we provide details on the fit procedure and the data handling, as well as the theory grid we use in order to carry out the fits in a timely manner. They have been carried out as described in the original calibration analysis [25] and realized in the same way in this update. All routines, however, have been coded anew to replace the customwritten in-house calibration software framework of [25] by a workflow that supports current state-of-the-art libraries and data formats.

Basic Fit Procedure
We use a standard χ 2 fit procedure for the top quark mass m t (in either pole or MSR mass schemes) and the non-perturbative model parameters {c 0 , c 1 , c 2 , c 3 } (and in principle also ∆ 0 and λ), which we outline in the following. The shape function coefficients c i , see Eq. (3.11), are restricted by 3 i=0 c 2 i = 1, so the actual fit parameters are three euclidean angles {a} = (a 0 , a 1 , a 2 ).
The reference data are binned distributions of either 2-jettiness, sJM or mJM, which we simply refer to as τ , obtained from the MCs for the process e + e − → tt, where the top quarks decay through all allowed leptonic or hadronic channels, and each histogram contains 10 7 events. We use three different fit ranges around the peak of the distribution. These are denoted by (x, y), with the minimum and maximum value τ fit min and τ fit max defined as the position where the distribution drops to a fraction x and y, respectively, of the maximal peak height: The three ranges used are (0.6, 0.8), (0.7, 0.8) and (0.8, 0.8). To break the degeneracy of the peak position with respect to the top quark mass and the shape function (mostly due to the top mass independent value of Ω 1 for the latter) it is necessary to simultaneously include distributions at multiple c.m. energies Q. We used five different sets of Q values. In GeV units they read: (700, 1000, 1400), (800, 1000, 1400), (700 − 1400), (600, 1000, 1400) and (600 − 1400), where the ranges are in steps of 100 GeV. This gives 3 (ranges around peak) × 5 (Q sets) = 15 different "fit settings" (labeled with the subscript s below) of bins included in the χ 2 analyses. For a perfect theoretical description (and assuming that the MC data is equally perfect) these settings should have no influence on the outcome of the fits. The spread of the fit results for the various settings is therefore a quantification for the "incompatibility" between theory and MC. Since the theoretical (perturbative and power correction) uncertainties are already estimated through the variations of the profile-function and power-correction ξ-parameters -see below Eqs. (3.36) and (3.61), and Tab. 2 -the variation of the fit results with the choice of fit setting quantifies the uncertainty of the MC event generator. We therefore include the fit setting dependence as a separate source of uncertainty in addition to the perturbative one. We use the following procedure to obtain a central value and uncertainties for the top mass m t (and analogously for the shape function's first moment Ω 1 ): 1. For one fit setting, labeled by s, remove 1.5% of the upper and 1.5% of the lower m t values of the 501 best-fit values from the variation over the profiles (and ξ parameters when the absorption prescription form 2 t power corrections is used) to remove potential outliers. Let us call this cleaned up set of masses {m t } s . When quoting final combined uncertainties we quadratically add the perturbative and incompatibility uncertainties. Note that for Ω 1 the removal of outliers described in bullet point 1 is carried out independently.
The best-fit value for a single profile and one fit setting is obtained by minimizing the χ 2 function with respect to the fit parameters using the program Minuit [72], with The theory bin f theo Q,i (m t ; {a}, ∆ 0 , λ) at observable value τ i is defined as the differential cross section integrated between τ i and τ i+1 , which we callf theo Q,i (m t ; {a}, ∆ 0 , λ), divided by the norm if theo Q,i (m t ; {a}, ∆ 0 , λ), where the sum is over the τ range of the fit setting: Likewise, the MC generator bin f MC Q,i is defined as the sum of events with τ i < τ < τ i+1 , which we callf MC Q,i , divided by the norm N MC Q = if MC Q,i . So theory and MC histograms are normalized to 1 across the fit range (τ min , τ max ). The uncertainty σ Q,i is the statistical error of the event generator bin f MC Q,i obtained by naively dividing the bin errors ∆f MC Q,i of the unnormalized binsf MC Q,i by the norm N MC Q . This "naive" bin error σ Q,i ignores correlations between bins that are introduced by using histograms normalized to the fit range. We also tested the strict statistical treatment of performing the fits with the χ 2 values obtained by using the full covariance matrix for the normalized bins. The differences to the naive treatment of Eq. (4.2) for the fitted mass are at the sub-MeV level for individual profile fits. In light of the negligible differences, we adopt the naive treatment. We note that the size of the resulting numerical values of χ 2 do by themselves not have any physical meaning since MC modeling uncertainties are not included in the χ 2 -function. However, the relative size of the resulting minimal fit values, χ 2 min , quantifies the quality of the respective fit. In our results we therefore quote the mean and the standard deviation of χ 2 /dof over all 501 profiles and the 15 fit settings.
In principle, also the strong coupling α s may be fitted as a theoretical parameter (with the same fundamental meaning as the top quark mass) in the calibration fits. However, as was already pointed out in the original calibration analysis of Ref. [25], the χ 2 function has a very flat dependence on α s so that the strong coupling cannot be constrained in the calibration. The value of α (5) s (m Z ) thus has to taken as an input. In fact, using variations in the value of the input strong coupling value in the range α (5) s (m Z ) = 0.1181 ± 0.0013, which is substantially more conservative than the current world average [73], leads to changes in the top mass results from the calibration at the level of 20 MeV, which are negligible in comparison to the uncertainties obtained from the calibration at N 2 LL + NLO order. For the calibration fits we therefore adopt the value α The top quark width was fixed for theory and event generator to Γ t = 1.4 GeV. The generators use a tree-level e + e − → tt matrix element, which goes through their respective internal standard decayer, parton shower and hadronization model. Initial state radiation has been turned off. For Pythia 8.305 [42] we use the default setting and the standard Monash e + e − tune (7). For Herwig 7.2 [40] and Sherpa 2.2.11 [41] we use the default settings and tunes.

Details on Data Processing and Theory Evaluations
The MC pseudo data are generated with the standard setting of Pythia 8.305 [42], Sherpa 2.2.11 [41] and Herwig 7.2 [40] using the input files given in App. D. We use the program Rivet [43] paired with a python [74] custom-written analysis tool to convert per event kinematic information into histograms in the format yoda for our observables. This workflow works with all state-of-the-art MCs that support Rivet directly or the event record format HepMC [44]. The MC produces events across the full shape-variable range and the choice of the bin specification has no impact on the MC runtime. It is therefore safer to keep a large range and use narrow bins, since wider bins can always be produced by merging smaller ones without loosing information. For the histograms corresponding to a given Q value we use 10000 evenly spaced bins between 0.0 and 0.5 for each of our observables. This is also the width of the bins we use for the χ 2 function in Eq. (4.2). The results for the three shape distributions τ 2 , τ s and τ m and the three MCs in the peak region are shown in Fig. 1 exemplarily for m MC t = 173 GeV and for Q = 700, 1000 and 1400 GeV as a function of the jet mass variable M J = Q τ 2,s,m /2.

The theory cross section is based on the fortran-2008 [75] object-oriented program
Caliper [76], which we modified whenever necessary. For the concrete numerical evaluation at the partonic level we compute the bHQET factorization formula in Fourier space since all convolutions turn into easily manageable multiplications. We multiply out all matrix elements appearing in the factorization theorem, along with the gap and MSR mass renormalon subtraction series, and strictly truncate at O(α s ). On the contrary, resummation factors are fully multiplied to each of the terms resulting from the expansion and not expanded in any way with the matrix elements. The final result is then transformed back into momentum space using analytic formulae. All necessary expressions have already been given in Ref. [35] (see Sec. V A and the appendices) and shall not be repeated here.
The integration over the Breit-Wigner function is also carried out analytically, while the convolution with the shape function is done numerically in the peak region using the quadpack package [77]. The RG-evolution of the SCET non-singular contribution involves the evaluation of 3 F 2 and 2 F 1 hypergeometric functions, that in the resonance region can be efficiently computed as a Taylor series around the origin, keeping as many terms as necessary to achieve machine precision. The convolution of the QCD and SCET non-singular partonic distributions with the shape function is carried out numerically with quadpack. Since the theory cross section cannot be evaluated from scratch during the fits due to performance and speed constraints, extensive grids need to be implemented. We keep track of the dependence on the shape function coefficients c i exactly: the hadron-level cross section is written as a double sum over distribution functions f kℓ (τ, m t , Q, . . .) since one can factor out the quadratic double sum dependence on the c i of the shape function over the basis functions given in Eq. (3.11). We can therefore treat the dependence on the c i analytically and only generate grids for the distribution functions f kℓ (τ, m t , Q, . . .) which satisfy f kℓ = f ℓk . The ellipses stand for the dependence on the other parameters and will be suppressed from now on. Due to the normalization 3 i=0 c 2 i = 1 we express the c i in terms of euclidean angles a 1,2,3 , such that the cross section will depend on sines and cosines of those. We note that one has to sample multiple starting values to reliably find the true minimum in the χ 2 minimization procedure. For each choice of the 501 sets of profile functions (including the random values for the power correction ξ-parameters) and for fixed values of ∆ 0 , Q and λ, we generate grids for all the f kℓ functions in m t and τ .
The τ nodes of the grid lie in a range between 0 and t 1 (m t = 177 GeV, Q, d 1 = 0.25) which is defined in Eq. (3.29). The value of t 1 with the given arguments is larger than any of the upper boundary τ max of our fit ranges, defined by the smallest y parameter given below Eq. (4.1). The range of our MC histogram is also chosen such that t 1 always lies within. The τ values of the theory grid do not have to coincide with the MC histogram bin boundaries, since we compute the integrated bins from the interpolated distribution functions. To determine appropriate τ values for our grid we first find the peak t peak of the f 00 distribution with the fortran routine compass_search [78] using the tree-level and stable-top threshold τ min as starting value. We then generate 15 evenly spaced points in the range [0, t peak −0.4(t 1 −t peak )]. The next interval [t peak −0.4(t 1 −t peak ), t peak +0.4(t 1 −t peak )] is filled with 75 evenly spaced points, and the third interval [t peak + 0.4(t 1 − t peak ), t 1 ] has 10 evenly spaced points. We checked, by testing finer τ grids, that this setting provides an adequate interpolation quality in the actual peak region (with all f kℓ functions included) for the fits. The other dimension of the grids is the top quark mass value (for the pole mass for several R values using the 3-loop R-evolution and α (5) s (m Z = 91.188 GeV) = 0.117, 0.118 and 0.119 using the REvolver library [63]. The values for ∆m MSR t (R) depend to a very good approximation linearly on α is then sampled by Minuit.
We remind the reader that the procedure just described in this subsection applies for fixed values of ∆ 0 and λ.

Calibration Consistency Test with Previous Results for Pythia and Graphical Representation
The top mass calibration implementation and the results presented in this article constitute an update and generalization of the study carried out in Ref. [25] for Pythia 8.205. Thus, before we enter the discussion of the new analyses, a comparison with the results of Ref. [25] is in order. This also gives us the opportunity to introduce and explain the graphical scheme we employ to represent the results of the different calibration analyses in the following sections. In Ref. [25] the observable 2-jettiness τ 2 was used for the calibration and the gap scheme 1 defined in Eq. (3.22) was employed for the renormalon subtraction concerning large-angle soft radiation. As already explained in Sec. 3.2.2, see paragraph below Eq. (3.26), in Ref. [25] ∆ 0 = 0.05 GeV and λ = 0.5 GeV were adopted for the parametrization of the shape function and it was checked that these values provide sufficient flexibility for the shape function fits through the coefficients c i . The analysis was carried out in the pole and MSR mass schemes, adopting m MSR t (R = 1 GeV) as the quoted reference mass for the latter. We note that in the resonance region of the cross section the MSR mass is evaluated at much higher R scales described by the profile function for R(τ ) given in Eq. (   We remind the reader that the R-evolution of the MSR mass is mass-independent so that the difference On the other hand, there is a significant discrepancy in the pole mass analysis at both orders. The perturbative uncertainties decrease substantially at N 2 LL+NLO in comparison to NLL+LO, but the incompatibility uncertainties, which quantify the disagreement of the MC event generator remain comparable. The pole mass calibration results exhibit a large correction between orders, which is associated to the fact that the NLO corrections are larger in the pole mass scheme. The more stable MSR mass results illustrate that in this scheme, and with the proper choice of the MSR mass scale R, a sizable fraction of the higher-order QCD corrections related to the top mass sensitivity of the τ 2 distribution in the peak region are absorbed in the mass. Due to the absence of the pole-mass infrared renormalon in this short-distance scheme, the MSR results are expected to be more stable also at higher orders than those of the pole mass.
The right half of Tab. 4 shows the results of the calibration fits with our new setup and for Pythia 8.305. Up to small differences they are equivalent to the results quoted in Ref. [25]. We have also carried out a calibration with our new setup for Pythia 8.205 which yields numbers that are within 10 MeV equivalent to the ones shown in the right half of the table. The agreement between the new results and those from Ref. [25] means that the differences between the old and new fit setup only have a marginal effect. The features in the new setup which have been changed compared to the one of Ref. [25] are: 1. The renormalization scale of SCET non-singular term in Eq. (3.40) is µ J , the renormalization scale of the distributional terms in the bHQET jet function. In the old setup that scale was frozen at the mass mode matching scale µ m . The effect of this change is tiny because the contribution of this non-singular is small.
2. In the new setup, the interpolation over m t is at the bin level and a simultaneous fit of all parameters is carried out. The approach of the old setup was to first minimize with respect to the shape function parameters at fixed m t giving χ 2 (m t , {a min (m t )}), then interpolating this marginalized χ 2 over m t and finding the minimum with respect to m t . Both methods are in principle equivalent if the grid in m t is fine enough, but the new fit procedure is in general more robust.
With the new setup, which allows for more freedom in the parametrization of the shape function, it turns out that these two Q sets, which are quite restricted in the range of Q values, are not able to break the degeneracy between m t and Ω 1 . They have therefore been dropped in the new setup for efficiency reasons. We have checked that the removal of these two Q sets only has small effects on the final results quoted in Ref. [25].
We note that all results in Tab. 4, like those quoted in Refs. [25], are based on the strict approach for the treatment ofm  to the right of the graphical representation, where the perturbative and compatibility uncertainties appear in first and second place, respectively. In parentheses we also show the average minimal χ 2 /dof value and standard deviation of all the fits (from the different profile functions and Q sets after removing outliers as described in Sec. 4.1). We emphasize that the size for χ 2 /dof is arbitrary due to the ad-hoc treatment of the uncertainties entering the denominators in where the gap subtraction series on the RHS are evaluated at the scale 2 GeV and truncated at O(α s ). We note that the difference between the values for Ω

Refinement for Shape Function Fits
As we have already mentioned in Sec. 3.2.2, see paragraph below Eq. (3.26), employing a fit for the model function coefficients c 0 to c 3 while fixing the shape-function parameters ∆ 0 = 0.05 GeV and λ = 0.5 GeV, is adequate only for gap scheme 1. In this section we investigate the modifications needed to carry out reliable shape-function fits for gap schemes 2 and 3, and we explain the fast shape-function fit procedure we adopt for our final calibration analysis.

Gap Dependent Fits
In the upper part of Fig. 3 we display the calibration results for the setup discussed in Sec. 5 for all three gap schemes, based on the 2-jettiness distribution τ 2 and the strict treatment ofm 2 t power corrections. The blue bars are the results already displayed in Fig. 2, while the orange and green bars refer to gap schemes 2 and 3, respectively. The results for these two schemes differ strongly from one another, but also from gap scheme 1. However, we also observe, that the values for χ 2 /dof are significantly larger for gap 2 and even more for gap 3, indicating a much worse fit for these two schemes. The differences in the fit results for Ω 1 for the three gap schemes (even after conversion to gap scheme 1) are furthermore similar to the scheme differences themselves (prior to the conversion to scheme 1) as can be seen from the vertical red lines. This shows that the parametrization of the shape function we used for gap scheme 1 with the fixed values ∆ 0 = 0.05 GeV and λ = 0.5 GeV, and using c 0 -c 3 as fit parameters is not adequate for gap schemes 2 and 3.
The shape function parameters that are naturally connected to the first moment of the shape function Ω 1 in Eq. (3.13) and the effects of the gap scheme, are the renormalon free gap parameter∆ =∆ (i) (R s , µ S ) and more specifically ∆ 0 defined in Eq. (3.25). Recall that at the reference scale R s = µ δ ≡ R ∆ = 2 GeV we have∆ = ∆ 0 . In the limit N → ∞ in Eq. (3.11) any form of the shape function F (k; λ, {c i }, N ) could be accurately parametrized in terms of the infinite sequence of coefficients {c i } for any value of ∆ 0 and λ. The results shown in the upper part of Fig. 3 indicate that for the truncation value N = 3 we adopt this is not any more the case. From the mathematical perspective this means that for the values ∆ 0 = 0.05 GeV and λ = 0.5 GeV the quadratic polynomial in Eq. (3.14) is bounded too tightly on the hyper-sphere 3 i=0 c 2 i = 1 for gap schemes 2 and 3. A resolution is to treat ∆ 0 as an additional fit parameter.
To efficiently perform the fits we can add an additional ∆ 0 -dimension to the grid and interpolation procedure described in Sec. 4.2. This additional ∆ 0 dependence can be handled in the same way as the dependence on the top quark mass. We use steps of the size δ∆ 0 = 0.05 GeV within the interval [−1.00 GeV, 1.90 GeV] which safely covers all gap and mass schemes at NLL+LO and N 2 LL+NLO. The generalization of Eq. (4.6) then readŝ where MINUIT is now able to smoothly sample in {a}, m t , and ∆ 0 . The outcome of the calibration fits within this extended framework is shown in the three lower sections of Fig. 3 for λ = 0.5, 1.1 and 1.5 yielding good fits with equivalent top mass and Ω 1 best-fit values within their uncertainties and χ 2 /dof values for all settings. The larger uncertainties we observe for Ω 1 for the pole mass fit results in gap scheme 3 are caused by its large subtraction coefficient, as we already anticipated in Sec. 3.2.2 in the text after Eq. (3.23). Similar observations for gap scheme 3 are also made in the subsequent fit results, and we emphasize that this is an artifact of this gap scheme. The independence concerning the width parameter λ indicates that there is a strong degeneracy concerning ∆ 0 and λ, and that λ can be safely fixed within a broad interval. The results with ∆ 0 as a fit parameter also agree with the original fit setup with fixed values ∆ 0 = 0.05 GeV and λ = 0.5 GeV for gap scheme 1, reassuring that the original fit setup is perfectly adequate for this gap scheme.

Fast Fit Procedure with ∆ 0 Dependent Profiles
Using ∆ 0 as a general and independent fit parameter comes with the downside that the size of the interpolation grid increases substantially. This makes the general setup for a floating ∆ 0 fit as described in the previous section very costly and time intensive. For calibration studies this setup is only practical if the ∆ 0 grid dimension is based on much smaller gapscheme-dependent ranges and if the one-dimensional spline interpolation that was applied to the m t -dimension before is now replaced by a two-dimensional spline interpolation in the top mass and ∆ 0 directions, which leads to lower interpolation precision. For detailed and extended calibration studies that approach turns out to be too slow and extensive. For producing the final results we therefore adopt a physically equivalent, but much faster version of the floating ∆ 0 fit approach which, however, also requires setting suitable values for λ. This fast approach is described in the following subsection.
The fast version of the floating ∆ 0 fit procedure is based on the observation that the ∆ 0 dependence of the theoretical τ distributions is formally related to a trivial Q dependent shift in τ , see Eq. (3.1). This trivial ∆ 0 dependence would, however, only arise if the theory distributions were strictly renormalization-scale independent. In practice, the presence of the profile functions µ i (τ ) yields a much more complicated dependence on ∆ 0 . On the other hand, this complication diminishes at increasing orders due to a smaller dependence of the τ -distribution on the renormalization scales.
For the fast version of the floating ∆ 0 calibration fits we make use of this observation and generate our τ -grids with one fixed ∆ (i) 0,grid value adequate for each gap scheme i and obtain the distribution for any other ∆ 0 by sampling shifted points: This implies that all profile functions are also shifted accordingly: For gap scheme 1 we use ∆ 0,grid on the τ i grid values 7 and determine 2-D spline interpolations of the F kℓ (τ, m t , Q) over m t and τ . The binned distribution functions that enter Eq. (6.1) are then determined from the formula 6) which yields very accurate results due to the small size of our bins. This approach provides a substantial speed gain, since we can use a standard interpolator routine (Python class scipy.interpolate.RectBivariateSpline) that supports vectorization for parallelized evaluation. The creation of the grids as just described is substantially faster and relies on much smaller data files due to the removal of the ∆ 0 grid dimension. In addition, this reduces the time required to distribute the grids to each node of the computer cluster needed to carry out the fits.
The fast floating ∆ 0 fit approach just described reproduces within errors the results of the general and more flexible but very slow floating ∆ 0 fit procedure of Sec. 6.1. But it also reintroduces a dependence on the value of λ and the gap scheme in the uncertainties when the calibration is carried out for the pole mass. In Fig. 4 the results for the fast floating ∆ 0 calibration fits in the MSR and pole mass schemes for the 2-jettiness distribution at N 2 LL+NLO order are shown for λ between 0.5 GeV and 1.5 GeV for Pythia 8.305. We see that the results stabilize and yield smaller values for χ 2 /dof only for λ ≥ 1.1 GeV. Compared to the results shown in Fig. 3  errors have increased and smaller values for χ 2 /dof can be reached, but the results are fully compatible with those of Fig. 3. For our final analyses we therefore adopt the fast floating ∆ 0 fit procedure with λ = 1.1 GeV for the calibration fits for Pythia 8.305. As already anticipated (see also Sec. 3.2.2), the renormalization scale uncertainties for the pole mass fit results in gap scheme 3 are generally larger than for the other gap schemes. We remind the reader that this is an artifact of gap scheme 3.
We have carried out analogous comparative analyses for Herwig 7.2 and Sherpa 2.2.11. The results for the fast floating ∆ 0 fit approach for Herwig 7.2 and Sherpa 2.2.11 are shown in Figs. 5 and 6, respectively. We observe again a stabilization of the results and improved fits for larger λ values, but a much stronger dependence on λ than for Pythia. For Herwig and Sherpa using a large value for λ is even more important than for Pythia in order to obtain reliable results with the fast floating ∆ 0 fit procedure. This can be understood from the fact that the hadron-level distributions generated by Herwig and Sherpa are much broader than those from Pythia, as can be clearly seen in Fig. 1. As we show in the discussion of our final results in Sec. 8 this must be attributed to the fact that for the standard tunes we have employed, the hadronization effects (i.e. the values for Ω 1 ) are substantially larger for Herwig and Sherpa than for Pythia. When we apply the fast floating ∆ 0 fit procedure for Herwig 7.2 we use λ = 1.5 GeV while for Sherpa 2.2.11 we adopt λ = 1.3 GeV. As for the Pythia fits, shown in Fig. 4, we observe particularly sizable uncertainties for the pole mass fits in gap scheme 3, and to a lesser extent also in gap

Observable Universality and Power Corrections
In the preparatory calibration analyses carried out in Secs. 5 and 6 based on the 2-jettiness distribution we have used the strict treatment ofm 2 t = (m t /Q) 2 power corrections where, apart from incorporating the exactm t -dependent expression for τ min , the leading singular bHQET cross section is defined strictly excluding any formally subleadingm 2 t power corrections. This strict treatment ofm 2 t power corrections has been explained in Sec. 3.4.1 and was employed in the original calibration analysis of Ref. [25]. In Sec. 3.4.2 we have provided conceptual arguments explaining why the strict treatment may not suffice at the precision achieved at N 2 LL+NLO which yields uncertainties of around 200 MeV, as it may lead to a discrepancy for shape observables with different sensitivity tom 2 t power corrections. In the following we confirm these arguments by carrying out top mass calibration analyses for all three shape variables, 2-jettiness τ 2 , the sum of jet masses (sJM) τ s and the modified jet mass (mJM) τ m . We demonstrate that the strict power correction treatment does not suffice to achieve observable independence and that the absorption prescription laid out in Sec. 3.4.2 is mandatory.
In Fig. 7 the results for the top mass calibration for Pythia 8.305 in the strict power correction treatment is shown for all three shape variables using gap schemes 1, 2 and 3,   for the pole as well as the MSR mass and at N 2 LL+NLO and NLL+LO. Here and in all subsequent calibration fits we employ the fast floating ∆ 0 fit procedure described in Sec. 6.2. It is conspicuous that all top mass results for the sJM variable are systematically lower by around 400 MeV compared to the outcome for the 2-jettiness and mJM variables. At the same time, the sJM fit results for Ω 1 are systematically larger by around 200 GeV than for 2-jettiness and mJM. On the other hand, the results for 2-jettiness and mJM differ only slightly and are in agreement. The consistency of the results for 2-jettiness and mJM and the discrepancy with the sJM results strongly support the conceptual arguments given in Sec. 3.4.2 emphasizing the practical relevance of them 2 t power corrections and in particular the important role of the soft rescaling factors r τ,s (m t ) from Eqs. (3.47), (3.51) and (3.53) in the measurement δ-function to achieve observable-independent calibration results.
In Fig. 8 we now show the calibration results when the absorption prescriptions for thê m 2 t power corrections given in Sec. 3.4.2 are employed. Here we only provide the results for gap scheme 2 since the observations for gap schemes 1 and 3 are very similar. In the upper portion of Fig. 8 the results for the strict power correction treatment already given in Fig. 7 are shown as a reference. The middle portion shows the results of our absorption prescription, but setting the soft rescaling factor for all shape variables to unity,   GeV. The different sections are: "strict pc" uses strict treatment of (m t /Q) 2 power corrections, "absorb (r s = 1)" absorbs coefficients of distributions from the non-singular contribution into the resummed cross section, and "absorb" additionally includes the correct measurement power correction r s . r τ,s (m t ) = 1. We observe a small increase of about 100 to 150 MeV for the top quarks masses (except the NLL+LO order pole-mass results) and a comparable decrease for Ω 1 . The uncertainties at N 2 LL+NLO are in general a bit larger as a result of the additional ξ parameter variations. However, the discrepancy between the sJM and the 2-jettiness as well as mJM calibrations results remains similar to the strict power correction treatment. In the lower portion of Fig. 8 we use the complete absorption prescription including also the soft rescaling factors as shown in Eqs. (3.47), (3.51) and (3.53). Compared to the middle portion, the 2-jettiness results are unchanged since r τ 2 ,s (m t ) = 1. The mJM results only move slightly since r τm,s (m t ) = 1 + O(m 4 t ). The sJM results for the top quark masses, on the other hand, increase substantially by around 400 MeV and are now fully consistent with the 2-jettiness and mJM calibration results. Likewise, also the Ω 1 results are now in agreement for all three shape variables. Interestingly, we also find that the absorption prescription leads to a general reduction of the perturbative uncertainties at NLL+LO order for the MSR and pole mass calibration fits. We have analyzed this behavior in great detail [79] and found that it is a general feature of floating ∆ 0 fits in combination with the absorption prescription for them 2 t power corrections visible for gap schemes 1 and 2. We believe that this is related to an accidental interplay between both procedures that leads to an artificial reduction of the profile (and ξ) parameter dependence at NLL+LO order where the QCD corrections are entirely encoded in renormalization-group evolution factors. These smaller NLL+LO perturbative uncertainties should therefore not be considered realistic. At N 2 LL+NLO this effect does not arise. A second feature visible in Fig. 8 and worth noticing is that the NLL+LO values for the pole mass increase by around 350 to 400 MeV when the floating ∆ 0 fits are combined with the absorption prescription. Since the pole mass uncertainties at NLL+LO order are about 400 MeV this is not a point of concern. Still, we have analyzed this behavior as well [79] and found that half of this shift is caused by using the floating ∆ 0 fit and that this only happens for the NLL+LO pole-mass fits.
Overall, when using the full absorption prescription for them 2 t power corrections we find gap scheme and observable independence. We therefore use this prescription for our final calibration analysis which we discuss in the following section.

Final Results
With all theoretical tools at hand we are now ready to discuss the final results of the NLL+LO and N 2 LL+NLO top mass calibration fits for Pythia 8.305, Herwig 7.2 and Sherpa 2.2.11 for the pole and MSR masses based on the three shape variables 2-jettiness τ 2 , sJM τ s and mJM τ m , and using the gap subtraction schemes 1, 2 and 3. The fits are based on the updated calibration framework, laid out in detail in the previous sections, which includes an updated shape-function fit procedure and a more sophisticated treatment ofm 2 t power corrections. The results for Pythia 8.305 are an update for the results presented in Ref. [25] for Pythia 8.205, where we have checked (see Sec. 5) that, as far as the shape variables we use in our analysis are concerned, the two Pythia versions are fully equivalent.
The final top mass calibration results for Pythia 8.305 and m MC t = 173 GeV are displayed in Fig. 9. We observe nicely consistent results for all shape variables in both mass schemes yielding uncertainties of about 200 MeV for the MSR mass m MSR t (1 GeV) and around 300 MeV for the pole mass at N 2 LL+NLO order. The smaller uncertainties at NLL+LO for the pole mass results are accidental as we have pointed out in Sec. 7 and do not reflect the true uncertainties at this order. The rather large uncertainties (and instabilities for Ω 1 ) visible for the pole mass calibration results in gap scheme 3 are an artifact of the sizable O(α s ) subtraction in this gap scheme, as we have discussed in Sec. 3.2.2. The results in gap schemes 1 and 2 are very similar, apart from a glitch in the N 2 LL+NLO result for Ω 1 in the MSR scheme fit for sJM, which is caused by some numerical outliers that could not be removed by the procedure described in Sec. 4.1 (see bullet point 1). This was the only incident in our analysis where our prescription to remove outliers did not suffice. We use the results obtained for the 2-jettiness shape variable and in gap scheme 2 at N 2 LL+NLO when quoting the final numbers for the results of our calibration analyses.   We remind the reader that the results for Ω 1 (2 GeV) we present are always converted to gap scheme 1 via Eq. (5.2). At this point, a comparison to the original calibration analysis of Ref. [25] carried out in the strict power correction approach and displayed (based on our own reanalysis) in Tab. 4 and Fig. 2, is in order. The N 2 LL+NLO results for the MSR and pole masses obtained in Ref. [25] [25]. Within uncertainties, all N 2 LL+NLO results are still fully compatible with those of the original calibration analysis, but the updated results presented here should be considered as more reliable.
The final top mass calibration results for Herwig 7.2 and m MC t = 173 GeV are displayed in Fig. 10. As for the Pythia analysis we observe nice consistency for the three shape variables 2-jettiness, sJM and mJM and for the three gap schemes albeit with larger uncertainties in gap schemes 2 and 3, particularly in the pole mass scheme. We have again carried out the same analysis for m MC mass results are visualized in the central panel of Fig. 13 and can be summarized as: A comparison between the Herwig 2-jettiness distributions with the N 2 LL+NLO theory cross section using the best MSR-mass fit result for Q = 700, 800 and 1000 GeV is shown in the central panels of Fig. 15. The N 2 LL+NLO pole-mass fits, visualized in the middle panel of Fig. 14 The rather large uncertainty of 460 MeV for the pole mass calibration is caused by a particularly strong dependence on the ξ parameter variations and is even larger for gap scheme 3, compared to a much smaller uncertainty for gap scheme 1. We believe this is caused by the broadness of the Herwig shape distributions shown in Fig. 1 which makes the fits more unstable for larger gap subtractions at low orders due to the stronger infrared-sensitivity in the cross sections in the pole mass scheme. We make a similar observation for the  Fig. 11. As for the Pythia and Herwig analyses, we observe nice consistency for the three shape variables 2-jettiness, sJM and mJM, and for the three gap schemes. Compared to the Pythia results, the uncertainties in gap schemes 2 and 3 are again larger, particularly in the pole mass scheme, but they are not as sizable as for Herwig. This is correlated with the fact that the broadness of the Sherpa peak shown in Fig. 1  A comparison between the Sherpa 2-jettiness distributions with the N 2 LL+NLO theory cross section using the best MSR mass fit result for Q = 700, 800 and 1000 GeV is shown in the lower panels of Fig. 15. The N 2 LL+NLO pole mass fits, visualized in the right panel of Fig. 14 0.05 ± 0.24 ± 0.09 (7.9±10.8) Figure 11: Summary of final top mass calibration results for Sherpa 2.2.11 with m MC t = 173 GeV for three gap subtraction schemes and the shape variables 2-jettiness, sum of jet masses (sJM) and modified jet mass (mJM).
A comparison between the calibration results for Pythia 8.305, Herwig 7.2 and Sherpa 2.211 for all three shape variables for gap scheme 2 is shown in Fig. 12. The most interesting aspect of the calibration results for the top quark masses is that they are fully compatible among all three MCs. At the same time, the calibration results for Ω 1 , which we find to be m MC t -independent, are around 250 MeV larger for Herwig and Sherpa compared to Pythia. This means the visible discrepancy in the position and the broadness of the peaks for all shape variables shown in Fig. 1 must be attributed to a difference in the modeling of the hadronization effects between Pythia and Herwig, while the conceptual meaning of their top quark mass parameters is within uncertainties (at N 2 LL+NLO) equivalent. While there are general arguments that the exact field-theoretic meaning of m MC t depends on the parton shower implementation and is therefore different for coherent branching and dipole based parton-shower implementations [16][17][18], this important observation can be interpreted as evidence that these differences may be small numerically at least concerning the meaning of the top quark mass parameter. We emphasize, however, that such statements can be made strict only in the context of observables where all showers are NLL precise and under the assumption that the hadronization models do not interfere in an uncontrolled way. That latter aspect has so far not been investigated yet in the literature and remains an issue that has to be studied carefully.

Conclusions
In this article we have updated and generalized the Monte Carlo (MC) top quark mass calibration framework of Ref. [25] that was based on the 2-jettiness distribution for boosted top pair production in e + e − annihilation and applied to relate the Pythia 8.205 top quark mass parameter m MC t to top quark masses in unambiguously defined renormalization schemes. The calibration approach uses binned hadron-level distributions generated by the MC for a given m MC We have generalized the original framework of Ref. [25], which is based on a bHQET factorization formula -matched to SCET and full QCD -in several ways: (i) including two more shape variables, namely the sum of (squared) hemisphere jet masses τ s and the newly designed modified jet mass τ m , and (ii) accounting for two additional gap subtraction schemes that remove the O(Λ QCD ) renormalon effects coming from large-angle soft radiation. The treatment of different gap subtraction schemes requires a more general fit  (m t /Q) 2 power corrections already in the singular bHQET factorization formula to achieve observable-independent results. Furthermore, we have updated the calibration framework to use standard file and event record formats, and presented all theoretical ingredients in great detail, which was missing in Ref. [25] due to lack of space.
We applied the updated calibration framework to Pythia 8.305, Herwig 7.2 and it is generator dependent and varies between 350 and 600 MeV. The probably most instructive result of our analysis is that, even though Pythia 8.305, Herwig 7.2 and Sherpa 2.2.11 with their standard tunes produce resonance shape distributions that are visibly different as far as the peak position and shape are concerned, the interpretation of their top quark mass parameters m MC t agree with each other within 200 MeV. We find from the fit results for Ω 1 , which are m MC t -independent, that the differences are associated to the different hadronization modeling used by the generators.
While the calibration framework presented in this article provides concrete numerical relations between m MC t and the top quark mass in well-defined renormalization schemes, it is not capable of testing the physical aspects of the interplay between the parton-level description and the hadronization modeling contained in the MCs. These two components are usually blended together in state-of-the-art MCs within the tuning procedure where the shower cut is treated as a tuned parameter. The next important conceptual step towards a better understanding in the interpretation of the MC top quark mass parameter m MC t is to carefully study the hadronization models. This shall be addressed in future work. In this context, the calibration framework presented in this article will play an important numerical diagnostic tool.

A.1 Notation and Tree-Level Results
The full QCD, NLO fixed-order calculation for the different event-shape distributions in e + e − annihilation to a stable, massive quark-antiquark pair is required for the treatment of the m 2 t /Q 2 power corrections and to obtain the QCD non-singular contributions mandatory for a full N 2 LL+NLO prediction. The result for a generic even shape variable τ can be written in the form (m t = m t /Q) where C stands for either the vector (V) or axial-vector (A) current induced massive quarkantiquark production. The quark mass m t is defined in the pole renormalization scheme. The minimal (and tree-level) event shape values τ min are τ s,min = 2m 2 t (jet mass sum, sJM) , τ m,min = 2m 2 t + 2m 4 t (modified jet mass, mJM) .
All distributive terms for τ → τ min are encoded in the coefficients A C τ (m t ) and B plus (m t ) such that the functions F NS,C τ (τ,m t ) are non-singular, which means that they are integrable at τ = τ min . The terms σ C 0 stand for the Born cross section for massless quark production: with N c the number of colors, α em the electromagnetic coupling, Q q the quark electric charge,m Z = m Z /Q the reduced Z-boson mass, Γ Z the finite width of the Z boson, and v i = (T i 3 − 2Q i sin 2 θ W )/ sin(2θ W ) and a i = T i 3 / sin(2θ W ) the vector and axial-vector couplings of the electron or quark to the Z boson, respectively. The coefficients R V 0 (m t ) show the quark mass dependence of the tree-level total cross section and read

A.2 NLO Results
A full generic analytic method to determine the NLO fixed-order corrections to massive quark event-shape distribution was developed in Ref. [50] and earlier calculations were already provided in Ref. [34,56]. Here we use the notation of Ref. [50] to write down the results, where also the ingredients needed for the computation can be found. The NLO delta function coefficients read (A. 6) and the only event-shape dependent contribution is encoded in the term I τ (m t ). For sJM it has the form I τs (m t ) = 1 24 π 2 (v 2 + 1) − 12(v 2 + 1)Li 2 v + 1 2 − 6{v[v(2 + log 2 2) + 2 − 4 log 2] + log 2 2} + 6(v 2 − 1) log and agrees with the case of the heavy jet mass distribution already given in Ref. [50]. The results for 2-jettiness and mJM read The coefficient of the plus distribution is universal for any event shape distribution and whether we consider vector or axial-vector current induced quark pair production: This fact motivates factoring out the tree-level mass correction terms R C 0 (m t ) in Eq. (A.1). Our treatment of power corrections concerning the overall factor R C 0 (m t ) in Sec. 3.4.2 is based on the assumption that this universality is not accidental and also valid for the singular QCD corrections beyond NLO, which are assumed to be event-shape independent as well.
The integrable functions F NS,C τ (τ,m t ) can be obtained computing the quark-antiquark plus gluon phase space for a given event shape value τ > τ min in four dimensions, see Eq. (4.16) of Ref. [50]. The result for the full distribution for τ > τ min , which is referred to as F C τ (τ,m t ), receives contributions where either only the quark, only the antiquark or only the gluon are populating one of the two hemispheres. We call these phase space regions quark (qu), antiquark and gluon (gl) regions, and we find that the quark and antiquark region results are identical. The result for the F C τ (τ,m t ) can then be written in the form F C τ (τ,m t ) = F C τ,qu (τ,m t ) + F C τ,gl (τ,m t ) .
The solution for the evolution is therefore , which can be generated by [35] s mn = s The anomalous dimensions are listed in the previous section and the relevant non-logarithmic terms are given by [28,85]  The gap subtraction series coefficients as defined in Eq. The gap R-evolution is thereforē Gap 2 and 3 are µ independent, but gap 1 inherits a non-trivial µ-anomalous dimension from the soft function and hence requires an additional µ evolution. This µ-RGE reads [34] µ d dµ∆ (1)

C Distributions
The plus function with a fractional exponent 1 + ω and ω < 1 is defined by [28] Θ(x) (x) 1+ω Expanding this equation for small ω defines plus distributions for positive integer n, Integrating plus distributions with a test function f (x) giveŝ Plus distributions appear in the jet and soft function and their evolutions. We use the following shorthand notation for them, where the exponent j is the mass dimension of the variable ℓ. In the case of a dimensionless argument we will also use the notation 1 e + ≡ Θ(e) e + . (C.5) The rescaling identity for plus distribution arguments reads κ θ(x) log n (κx) κx

D MC Simulation Settings
In the following subsections we will give all the relevant MC settings that are sufficient to describe the process e + e − → tt. Any other standard instructions/settings (random seeds) that might be necessary for the operation of the MC, which do not change the statistical population of the final state, can be found in the respective manuals or example input files and they have been left out below.

D.1 PYTHIA
The following flags were set in our Pythia 8.305 main program. We kept the default Monash 2013 tune, Tune:ee = 7 We select the process e + e − → tt and turn off initial state radiation (ISR). The center of mass energy is set to Q = Q/GeV, We select the process e + e − → tt (processes){ Process 11 -11 -> 6 -6; Order (*,2); End process; }(processes)