A theoretical study of top-mass measurements at the LHC using NLO+PS generators of increasing accuracy

In this paper we study the theoretical uncertainties in the determination of the top-quark mass using next-to-leading-order (NLO) generators interfaced to parton showers (PS) that have different levels of accuracy. Specifically we consider three generators: one that implements NLO corrections in the production dynamics, one that includes also NLO corrections in top decay in the narrow width approximation, and one that implements NLO corrections for both production and decay including finite-width and interference effects. Since our aim is to provide an assessment of the uncertainties of purely theoretical origin, we consider simplified top-mass related observables that are broadly related to those effectively used by experiments, eventually modelling experimental resolution effects with simple smearing procedures. We estimate the differences in the value of the extracted top mass that would occur due to the use of the three different NLO generators, to the variation of scales, to the choice of parton distribution functions and to the matching procedure. Furthermore, we also consider differences due to the shower and to the modelling of non-perturbative effects by interfacing our NLO generators to both Pythia8.2 and Herwig7.1, with various settings. We find very different results depending upon the adopted shower model. While with Pythia8.2 we find moderate differences between the different NLO+PS generators, with Herwig7.1 we find very large ones. Furthermore, the differences between Pythia8.2 and Herwig7.1 generators are also remarkably large.


Introduction
In Ref. [1] we considered three NLO+PS generators for top-pair production, hvq [2], ttdec [3], and bb4 [4], implemented in the POWHEG BOX [5-8] and interfaced with either Pythia8.2 (Py8.2) [9] or Herwig7.1 (Hw7.1) [10,11]. We focused particularly on an observable that mimics those used in direct top-mass measurements, but also included in our study the proposed top-mass measurements from the peak energy of the b jet [12] and from the class of leptonic observables suggested in Ref. [13]. We found large differences between predictions obtained using the two parton shower programs. In particular, while results obtained with the three NLO+PS generators interfaced to Py8.2 are fairly consistent with each other, large differences are found if they are interfaced to Hw7.1.
In this addendum we discuss the results obtained with the older, Fortran-based versions of the Pythia and Herwig codes. Our purpose is to see whether the effects that we have seen are specific to the new implementations or were already present in the old ones. We briefly recall the characteristics of the older generators:
- Pythia6.4 (Py6.4) [14]: implements a p_T-ordered shower, making use of the same algorithm adopted in Py8.2. Both the older and the newer code interleave initial-state radiation with the multi-parton interactions (MPI). In Py8.2, final-state radiation is also interleaved, and different models of colour reconnection are offered.
- Herwig6.5 [15] with Jimmy 4.31 [16] (Hw6.5): implements an angular-ordered shower. However, the showering variables are different from those adopted in Hw7.1. In the latter code, a boost-invariant set of showering variables was introduced, as described in Ref. [17]. Thus the older and newer schemes are fully equivalent only in the strict collinear limit. The two versions of Herwig implement the PS and the perturbative part of the MPI in a similar manner. The non-perturbative part of the MPI, instead, has been completely redesigned [18]. Similarly to Pythia, colour-reconnection effects are properly included only in the recent versions of Herwig [19].
By including Herwig6.5 and Pythia6.4 we exhaust all possible shower generators that can be interfaced to our NLO ones, since these are the only ones that implement the Les Houches Interface for User Processes [20]. The purpose of Ref. [1] was to understand and estimate uncertainties in top-mass measurements by comparing generators of different formal accuracy, i.e. the bb4, ttdec and hvq ones. In doing so, it was found that switching the shower program to which the three NLO generators are interfaced yields large differences in the results, in spite of the fact that the different shower programs have fairly similar formal accuracy. These differences must be ascribed to the fact that different shower Monte Carlo programs may differ widely in their modelling of subleading effects, such as non-collinear radiation, the colour-reconnection schemes and the models for hadronization and multi-parton interactions. It is thus natural to extend the study of Ref. [1] with the inclusion of other shower generators, in order to further explore the impact of these differences.
We are aware that the C++ and Fortran versions of the generators we are considering undoubtedly share some similarities, since the latter are the ancestors of the former. In spite of this, we found non-negligible differences, which we discuss in the following.
In our previous work, we have seen that the two generators bb4 and ttdec yield fairly consistent results for the mass of the reconstructed top and the b-jet energy. In the case of leptonic observables, the differences between bb4 and ttdec within the same shower model are generally much smaller than the differences between the different shower models for the same NLO generator. The largest difference between bb4 and ttdec appears in association with Herwig7.1, and is around 1.5 GeV, while the difference between Herwig7.1 and Pythia8.2 in ttdec is about 2.5 GeV (see Fig. 17 of Ref. [1]). For these reasons, we only consider the hvq and bb4 generators in this addendum.

Interface to POWHEG BOX
In this section we briefly describe the matching of bb4 and hvq to both Py6.4 and Hw6.5. The matching to Py8.2 and Hw7.1 is detailed in Ref. [1].
Py6.4 implements both a p_T-ordered and a virtuality-ordered PS. Here, we employ the p_T-ordered shower with the Perugia tune (PYTUNE(320)) [21].
We set up Py6.4 in such a way that the p_T of radiation in the shower is limited by the scalup parameter of the Les Houches Interface for User Processes [20], as is usually done in POWHEG. This is at variance with the Perugia tune settings, which require p_T to be smaller than scalup divided by √2. The matching of shower emissions in the production process relies on the default behaviour of POWHEG, i.e. the shower evolution starts at scalup. In the decays, a different scale must be adopted, and this requires a custom veto prescription in bb4. We implement it using two methods, both analogous to what we did in order to match Py8.2 to bb4 in Ref. [1]:
1. Each time Pythia6.4 generates an emission off the top (or anti-top), we compute its transverse momentum according to the POWHEG definition. If it is larger than the transverse momentum of the emission generated by the POWHEG BOX, we abandon the current shower and restart a shower from the same Les Houches event. This represents our default method. We label it as the "FSR" veto, in full analogy with the notation adopted for Py8.2.
2. Since we employ a p_T-ordered shower, we can also simply require the shower to start at a given transverse momentum, which we set equal to the transverse momentum of the corresponding POWHEG emission. This veto procedure will be referred to as the "SR" method, as we did with the analogous method that we adopted in Py8.2.
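The logic of the two veto prescriptions can be illustrated with a minimal Python sketch. It is not the actual Py6.4 interface: `toy_shower` is a hypothetical stand-in for a p_T-ordered shower off the top decay products, the emission p_T is assumed to coincide with the POWHEG definition, and the starting scale and cut-off are illustrative values.

```python
import random

def toy_shower(start_scale, rng, cutoff=1.0):
    """Hypothetical p_T-ordered shower: returns a decreasing list of emission p_T values (GeV)."""
    emissions = []
    scale = start_scale
    while True:
        scale *= rng.random()          # next, softer trial emission
        if scale < cutoff:             # shower cut-off reached
            return emissions
        emissions.append(scale)

def shower_with_fsr_veto(start_scale, pt_powheg, rng, max_tries=1000):
    """'FSR'-like veto: reshower the same event until no emission is harder than the POWHEG one."""
    for _ in range(max_tries):
        emissions = toy_shower(start_scale, rng)
        if all(pt <= pt_powheg for pt in emissions):
            return emissions           # shower accepted
        # otherwise the whole shower is abandoned and restarted
    raise RuntimeError("no acceptable shower found")

def shower_with_sr_veto(pt_powheg, rng):
    """'SR'-like method: start the p_T-ordered evolution directly at the POWHEG p_T."""
    return toy_shower(pt_powheg, rng)

rng = random.Random(12345)
fsr = shower_with_fsr_veto(start_scale=172.5, pt_powheg=20.0, rng=rng)
sr = shower_with_sr_veto(pt_powheg=20.0, rng=rng)
print("FSR-veto emissions:", fsr)
print("SR emissions:      ", sr)
```

In both cases the accepted shower contains no emission harder than the POWHEG one. The two methods differ in how this is achieved: by rejection of complete showers in the first case, and by the choice of the starting scale in the second.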

Herwig6.5
For Hw6.5+Jimmy we adopted the ATLAS AUET2 tune [22]. The Herwig shower is ordered in angle and not in p_T. Therefore all the emissions with transverse momentum larger than that of the POWHEG emission must be vetoed. Both Herwig versions already enforce this veto for the production part of the process. Similarly to Py6.4, extra care is required for emissions from the top-decay products when interfaced with bb4. In our previous work, two procedures were devised to veto extra Hw7.1 emissions. Both of them use the p_T of the POWHEG emission as an upper bound, either on the p_T of each branching at the end of the showering phase (FullShowerVeto), or on the shower evolution scale during the showering phase (ShowerVeto). Unfortunately, the Hw6.5 event record (as for Py6.4) does not contain information regarding the branching of the partons, i.e. it is not possible to reconstruct the emission history after the shower is completed, in contrast to the new version of the code. Therefore, we only implemented the analogue of the Hw7.1 ShowerVeto method, which proceeds as follows: when an emission off a top resonance is generated, if its p_T (defined in terms of Herwig variables) is larger than that of the POWHEG emission, the branching is discarded and the evolution continues from the scale of this discarded emission.
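The ShowerVeto-like prescription described above differs from a full restart: a single branching is discarded and the evolution simply continues. A minimal Python sketch follows, under loudly stated assumptions: the evolution variable `q`, the splitting variable `z` and the relation `pt = z * (1 - z) * q` are a toy model chosen for illustration, not the Herwig definitions, and the cut-off is arbitrary.

```python
import random

def toy_angular_shower_with_veto(q_start, pt_powheg, rng, q_cut=1.0):
    """Toy angular-ordered evolution with an in-shower p_T veto.

    Returns the accepted branchings as (q, pt) pairs, ordered in the toy
    angular variable q (hypothetical), not in p_T.
    """
    accepted = []
    q = q_start
    while True:
        q *= rng.random()              # next trial branching at a lower scale
        if q < q_cut:                  # shower cut-off reached
            return accepted
        z = rng.random()               # toy momentum fraction
        pt = z * (1.0 - z) * q         # toy transverse momentum of the branching
        if pt > pt_powheg:
            continue                   # veto: discard, keep evolving from this q
        accepted.append((q, pt))

rng = random.Random(2024)
branchings = toy_angular_shower_with_veto(q_start=172.5, pt_powheg=5.0, rng=rng)
for q, pt in branchings:
    print(f"q = {q:7.2f}  pt = {pt:6.2f}")
```

Because the ordering variable is not p_T, a hard emission can appear late in the evolution, which is why the check has to be applied to every trial branching rather than only to the starting scale.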

Hadronic observables: NLO+PS results
In this section we compare predictions for hadronic observables at the NLO+PS level, i.e. without the inclusion of MPI and of hadronization effects. Our aim is to assess differences of perturbative origin and, in particular, due to the NLO+PS matching.

Pythia6.4 versus Pythia8.2
We begin by comparing the predictions obtained with Py6.4 and Py8.2, which both implement a dipole-like algorithm for final-state showers. In Ref. [1] we made use of a smearing procedure to simulate experimental resolution effects. Here we first examine results obtained without applying any smearing.
The distributions of the reconstructed-top mass and of the b-jet energy using hvq matched to the two versions of Pythia are shown in the upper and lower panes of Fig. 1, respectively. The two curves for the reconstructed-top mass are almost indistinguishable. Also the peak positions of the b-jet energy spectra agree remarkably well, despite some small differences in shape, leading to a displacement of the extracted top-mass for this observable of ≈ 200 MeV.
In Fig. 2 we plot the distributions obtained using the bb4 generator. The results for the m_Wbj spectrum obtained with Py6.4 show an enhancement in the low-mass region with respect to the Py8.2 distribution, irrespective of the veto scheme used (upper pane). Nevertheless, there is no appreciable shift in the peak position.
The shape of the b-jet energy spectrum in the proximity of the peak is instead different for Py8.2 compared to the two results obtained with Py6.4: the maximum of the b-jet energy for Py8.2 is shifted by approximately +0.5 GeV with respect to the two Py6.4 results. This shift induces a displacement in the extracted top mass (m_t) of ≈ 1 GeV.
In Tables 1 and 2 we summarize the m_Wbj and E_bj peak positions, respectively, obtained for values of the jet radius R between 0.4 and 0.6. Table 1 also shows the m_Wbj peak positions when the smearing is applied. An excellent agreement is found between hvq+Py6.4 and hvq+Py8.2 for m_Wbj^max, even after the smearing is applied, and the E_bj^max differences are small, nearly consistent with zero within their statistical errors for all values of R.
The low-mass enhancement in the m_Wbj spectrum of the bb4+Py6.4 generator, with respect to the bb4+Py8.2 generator, leads to quite large displacements of the peak position once smearing is applied. For our default FSR-veto procedure, the differences between Py8.2 and Py6.4 are roughly 250-300 MeV. The differences in E_bj^max for the two showers used with bb4 are even larger, of the order of 0.5 GeV for all values of the jet radius. It is interesting to notice that bb4+Py6.4 and bb4+Py8.2 can yield such large differences, in spite of the fact that they should implement the same shower model and that hadronization and MPI effects are not included here.
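The mechanism by which a low-mass enhancement moves the smeared peak, while leaving the unsmeared one unchanged, can be illustrated with a toy calculation. The sketch below is only indicative: the Breit-Wigner core, the Gaussian low-mass tail and its weight, and the smearing width of 15 GeV are all assumptions made for illustration and are not taken from the simulated samples.

```python
import math

M0, WIDTH = 172.5, 1.3          # assumed peak position and width (GeV)
SIGMA = 15.0                    # assumed Gaussian resolution (GeV)
STEP = 0.25
GRID = [100.0 + STEP * i for i in range(581)]   # 100-245 GeV, centred on M0

def line_shape(m, tail_weight):
    """Breit-Wigner core plus a hypothetical broad low-mass tail."""
    core = 1.0 / ((m - M0) ** 2 + (WIDTH / 2.0) ** 2)
    tail = tail_weight * math.exp(-0.5 * ((m - 160.0) / 10.0) ** 2)
    return core + tail

def smear(values, step, sigma):
    """Discrete convolution with a Gaussian kernel truncated at 4 sigma."""
    half = int(4 * sigma / step)
    kernel = [math.exp(-0.5 * (k * step / sigma) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < n:
                acc += values[j] * kernel[k + half]
        out.append(acc / norm)
    return out

def peak_position(grid, values):
    """Grid point at which the distribution is largest."""
    return grid[max(range(len(values)), key=values.__getitem__)]

symmetric = [line_shape(m, 0.0) for m in GRID]
with_tail = [line_shape(m, 0.02) for m in GRID]

peak_sym = peak_position(GRID, symmetric)
peak_tail = peak_position(GRID, with_tail)
peak_sym_smeared = peak_position(GRID, smear(symmetric, STEP, SIGMA))
peak_tail_smeared = peak_position(GRID, smear(with_tail, STEP, SIGMA))

print("unsmeared peaks:", peak_sym, peak_tail)
print("smeared peaks:  ", peak_sym_smeared, peak_tail_smeared)
```

In this toy model the narrow core fixes the unsmeared maximum, whereas after smearing the maximum is sensitive to the shape of the distribution over a range comparable to the resolution, so that even a modest tail pulls it towards lower masses.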
The differences in m_Wbj^max and E_bj^max between the bb4 and hvq generators for R = 0.5 are reported in Table 3.
We notice that the level of agreement of the m_Wbj^max predictions obtained using bb4 and hvq gets worse in Py6.4 as compared to Py8.2, while the opposite is true for E_bj^max.
Turning to the two Herwig versions, in the upper panes of Figs. 3 and 4 we plot the results for m_Wbj obtained with hvq and bb4. The cross section under the peak is mildly suppressed in Hw6.5 with respect to Hw7.1. This is compensated by enhancements in the low-tail and, to a smaller extent, high-tail regions. A small bump is also present roughly 1 GeV below the peak position when using the bb4 generator with Hw7.1, and to a smaller extent when using Hw6.5 (footnote 3). These differences, present already at the shower level, could be ascribed to the fact that the two versions of Herwig adopt slightly different ordering variables (footnote 4). Despite these differences, the unsmeared peak position is the same in Hw6.5 and Hw7.1, for both hvq and bb4.

(Table fragment: Py6.4 (FSR) 1 2± 2 −265 ± 2 −147 ± 106)
In the lower panes of Figs. 3 and 4 we show the results for the b-jet energy spectrum. When hvq is used, the peak position is 250 MeV higher when showering with Hw6.5 than with Hw7.1, while in the case of bb4 the difference has the same magnitude but opposite sign. This affects the extracted top mass by 0.5 GeV.
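The text quotes a displacement in the extracted top mass of about twice the shift in the b-jet energy peak. As a rough cross-check, and not as the calibration used in the analysis, one can use leading-order two-body kinematics for t → W b, where the b energy in the top rest frame is E_b = (m_t^2 − m_W^2 + m_b^2)/(2 m_t). The values of m_W and m_b below are illustrative assumptions, and the function names are hypothetical.

```python
def b_energy_rest_frame(m_t, m_w=80.4, m_b=4.75):
    """LO b-quark energy (GeV) in the top rest frame for the decay t -> W b."""
    return (m_t ** 2 - m_w ** 2 + m_b ** 2) / (2.0 * m_t)

def mass_shift_from_energy_shift(delta_e, m_t=172.5, m_w=80.4, m_b=4.75):
    """Top-mass shift (GeV) corresponding to a small shift delta_e of the energy peak."""
    # dE_b/dm_t = 1/2 * (1 + (m_W^2 - m_b^2) / m_t^2)
    de_dm = 0.5 * (1.0 + (m_w ** 2 - m_b ** 2) / m_t ** 2)
    return delta_e / de_dm

energy = b_energy_rest_frame(172.5)
shift = mass_shift_from_energy_shift(0.5)
print(f"E_b at m_t = 172.5 GeV: {energy:.2f} GeV")
print(f"m_t shift for a 0.5 GeV peak shift: {shift:.2f} GeV")
```

With these inputs the conversion factor between the two shifts is about 1.6, of the same order as the factor of roughly two quoted in the text, which is obtained from the simulated b-jet spectra rather than from this parton-level formula.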
In Tables 4 and 5 we quote the differences between the two Herwig showers for several values of the jet radius.
We notice that the differences between the C++ and Fortran versions of Herwig for m_Wbj^max and E_bj^max are considerably smaller than in the Pythia case, in spite of the fact that the two implementations of the angular-ordered shower in Herwig are completely different.
In contrast to the Pythia case, in Herwig the differences between bb4 and hvq are quite large, as shown in Table 6.

Footnote 3: Further studies suggest that this bump is a symptom of a minor shower cut-off mismatch between Hw7.1 and bb4.

Footnote 4: In Hw6.5 the variable z is interpreted as the energy fraction of the emitter after the emission, while in Hw7.1 it represents the light-cone momentum fraction. In both, the ordering variable in the collinear limit becomes ∼ Eθ, E being the energy of the emitting parton and θ the angle between the two radiated partons. See [10] for further details.

The shifts for m_Wbj^max, without any smearing, are small and comparable when using Hw7.1 or Hw6.5. These are not reported in the figures, and can be obtained from the tables in the appendix.
When the smearing is applied, Hw7.1 and Hw6.5 with bb4 give comparable negative shifts, around 1 GeV. With hvq, instead, the displacements of the peak position (with respect to the reference values) are between −100 and −200 MeV for Hw7.1, and between 0 and −150 MeV for Hw6.5, for the different jet radii R. Since no significant difference between the two Herwig versions was observed in the bb4 case (where POWHEG generates the hardest emission both in production and decay), and since hvq does not handle radiation in decay, this behaviour is likely to be due to a different treatment of radiation in decay in the two Herwig versions with respect to Pythia.
As for the E_bj^max predictions in Fig. 6, we find minor differences between Hw6.5 and Hw7.1 for R ≥ 0.5, which tend to amplify the difference with respect to our reference generator. As for m_Wbj^max, also in this case the discrepancies between bb4 and hvq interfaced to the same shower generator are larger for Herwig than for Pythia, both for the older and newer versions.
We interpret the relative consistency of the Hw7.1 and Hw6.5 predictions with the bb4 generator as a validation of our veto procedures and of the results presented in Ref. [1].

Hadronic observables: full results
We now summarize the results obtained by showering hvq and bb4 with the four PS programs at the full level, that is with the MPI and hadronization switched on. The bb4+Py6.4 results shown here and in the following sections are obtained using the FSR veto.
For the hvq generator (see Fig. 7) we find that Py6.4 and Py8.2 yield very similar results. However, we find an appreciable disagreement between Hw7.1 and Hw6.5. We attribute it to different implementations of MPI in the two versions of Herwig, since the predictions agreed rather well at the NLO+PS level for R ≥ 0.5 (footnote 5). If the bb4 generator is employed (see Fig. 8) the same reasoning applies, but with one important difference: the dis-

Footnote 5: We stress that, among other improvements over Hw6.5, Hw7.1 implements a model for the treatment of colour reconnection.

(Table fragment: Hw6.5 13 ± 2 −829 ± 2 −1220 ± 102)

(Figure caption fragment: "Table 8. The square/round dots refer to bb4/hvq results, while the colours correspond to given shower generators")

The m_Wbj and E_bj shifts in peak positions obtained considering several values of the jet radius R, with and without smearing in the case of the m_Wbj distribution, are summarized in Figs. 9 and 10. We notice a non-negligible R dependence in the difference between Py6.4 and Py8.2, both in the hvq and bb4 case. Something similar is observed for the difference between Hw7.1 and Py8.2. A large R dependence is also observed in the case of Hw6.5, but with an opposite slope. The largest difference with respect to our reference result is given by Hw7.1, which represents a major cause of concern. We stress that, in the smeared case, these large differences arise from the mass distribution away from the peak, i.e. they cannot be considered an irreducible uncertainty on the extracted mass.
In Fig. 9, we also see a rather striking difference between Hw7.1 and Hw6.5 interfaced to the bb4 generator, represented by the blue and orange square dots in the figure. The two shower generators yield differences larger than 1 GeV for the largest value of R. Furthermore, the R dependence in the two cases is opposite, in spite of the fact that, in the similar plot without hadronization and MPI (see Fig. 5), the two generators yield rather consistent results.

Fig. 7 Reconstructed-top mass (upper pane) and b-jet energy distribution (lower pane) obtained for the hvq generator matched to Py8.2 (red), Py6.4 (green), Hw7.1 (blue) and Hw6.5 (orange). The hadronization and the underlying event are included.
Overall, we find that bb4 and hvq showered with Pythia exhibit more consistency than those showered with either version of Herwig. This is perhaps not surprising. Matrix-element corrections (MEC) have a large impact on hvq predictions, since this generator implements top decay only at LO. As implemented in angular-ordered parton showers (i.e. in Herwig), they are technically quite different from the way in which the hardest top radiation is generated in bb4, unlike MEC in transverse-momentum-ordered showers (i.e. in Pythia). We find that it is difficult to use this difference to dismiss the Hw7.1 result, since the

Leptonic observables
The last class of observables we consider are the leptonic ones. In Ref. [1] we found that these observables are only mildly affected by non-perturbative effects (i.e. the hadronization and the MPI), thus we present only the results obtained at the full level and with jet radius R = 0.5. However, they are likely to be strongly affected by the parton shower, since the W boson, and thus the leptons arising from its decay, must absorb the radiation recoil to ensure four-momentum conservation.

(Figure caption fragment: "Table 9. The square/round dots refer to bb4/hvq results, while the colours correspond to given shower generators")
We extract the top mass value from the following observables:

The results are presented in Table 7 and their graphical display is given in Fig. 11. As before, our pseudo-data sample was generated with bb4+Py8.2, and we used all combinations of NLO+PS generators and shower programs to extract a corresponding top-mass value. We remark that the mass extraction performed with the bb4+Py8.2 generator has been carried out using the same sample generated as pseudo data, so that the central value of the extracted mass is identical to the input mass in this case.
We have included the standard theoretical uncertainties as described in Ref. [1], and averaged the results obtained for the different leptonic observables also considering the statistical correlation among them, as suggested in Ref. [13].
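One standard way of averaging correlated determinations is the minimum-variance linear combination, with weights proportional to C^{-1}·1, where C is the covariance matrix of the individual results. The Python sketch below illustrates this under stated assumptions: it is not necessarily the exact procedure of Ref. [13], the function names are hypothetical, and the numerical inputs are invented purely for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x

def correlated_average(values, covariance):
    """Minimum-variance average of correlated values.

    Returns (mean, uncertainty, weights), with weights = C^{-1} 1 / (1^T C^{-1} 1).
    """
    u = solve(covariance, [1.0] * len(values))   # u = C^{-1} 1
    norm = sum(u)                                # 1^T C^{-1} 1
    weights = [ui / norm for ui in u]
    mean = sum(w * v for w, v in zip(weights, values))
    return mean, (1.0 / norm) ** 0.5, weights

# Illustrative inputs only (GeV): two determinations with 50% correlation.
values = [172.0, 173.0]
covariance = [[1.0, 0.5],
              [0.5, 1.0]]
mean, sigma, weights = correlated_average(values, covariance)
print(f"average = {mean:.3f} +- {sigma:.3f} GeV, weights = {weights}")
```

Positive correlations reduce the gain from combining several observables: in the example above the combined uncertainty is larger than the value that would be obtained by treating the two inputs as independent.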
The Py6.4 predictions always give m_t values roughly 1 GeV larger (1.2 GeV for bb4 and 0.8 GeV for hvq) than the corresponding Py8.2 ones. This variation is of the same order as the total uncertainty on the extracted m_t.

Conclusions
In this work we have extended the study performed in Ref. [1] by also considering the Py6.4 and Hw6.5 generators.
We find that, at the NLO+PS level, the Py6.4 and Py8.2 generators (both based upon a p_T-ordered shower) are quite consistent with each other, and the same holds for Hw6.5 and Hw7.1 (both based upon an angular-ordered shower). When non-perturbative effects are included, we find larger differences between the old and the new Herwig versions, with the old Herwig version in better agreement with both Pythia versions (see Fig. 9).
If we compare predictions for the leptonic observables, we see that the old Herwig version is further away from our reference result than the new version.
Overall, the inclusion of the older versions of the shower generators supports what was found in Ref. [1], i.e. an indication of a large sensitivity to the shower generator in the extraction of the top mass. The fact remains that Herwig7 yields the most disturbing difference with respect to the other generators as far as the most important observable that we have considered, m_Wbj^max, is concerned. On the other hand, we believe that the Herwig6.5 result, which is more in line with the Pythia ones, cannot be used to dismiss the Herwig7 one. In fact, it supports Herwig7 when only shower effects are considered, and only the inclusion of hadronization and MPI effects, thanks to an accidental cancellation, brings the final result into better agreement with the Pythia ones.

(Figure caption fragment: "The horizontal band represents the weighted average of the results, and the black horizontal line corresponds to m_t = 172.5 GeV, which is the top mass value used in the bb4+Py8.2 reference sample")

Fig. 12 Results for the difference of m_Wbj^max at the Monte Carlo truth level (i.e. with no smearing) with respect to our reference generator (i.e. bb4+Py8.2) using hvq or bb4, showered by Pythia and Herwig, for different values of the jet radius R. Hadronization and MPI effects are included. The numerical values are reported in Table 9.
Since we have now compared four different shower and hadronization models, it is worth asking what estimate we can give of irreducible non-perturbative effects, potentially due to the different implementations of the shower cut-off and of the matching hadronization model. We thus consider the spread of the m_Wbj^max values obtained with all generators as a crude estimate of non-perturbative effects. Looking at Fig. 12, we see that the unsmeared results from the bb4 generators, taking R = 0.5 to avoid too large hadronization effects (for small R) and too large MPI contamination (for large R), cover a range of roughly 200 MeV when switching among our four shower generators. If we take this range as an estimate of non-perturbative and subleading shower effects, we can conclude that these effects are well below the errors presently quoted by the experimental collaborations for direct measurements.