Reducing the top quark mass uncertainty with jet grooming

The measurement of the top quark mass has large systematic uncertainties coming from the Monte Carlo simulations that are used to match theory and experiment. We explore how much that uncertainty can be reduced by using jet grooming procedures. We estimate the inherent ambiguity in what is meant by Monte Carlo mass to be around 530 MeV without any corrections. This uncertainty can be reduced by 60% to 200 MeV by calibrating to the W mass and a further 33% to 140 MeV by applying soft-drop jet grooming (or by 20% more to 170 MeV with trimming). At e+e- colliders, the associated uncertainty is around 110 MeV, reducing to 50 MeV after calibrating to the W mass. By analyzing the tuning parameters, we conclude that the importance of jet grooming after calibrating to the W mass is to reduce sensitivity to the underlying event.


Introduction
The top quark mass is a fundamental parameter in the Standard Model (SM). Its value, and the associated uncertainty, are of great importance for predictions at the Large Hadron Collider (LHC). In the top quark discovery papers from 1995, the CDF collaboration [1] measured $m_t = 176 \pm 12.8$ GeV and DØ [2] measured $m_t = 199 \pm 29.7$ GeV. Since then measurements have come a long way, with a recent CMS combination [3] using 7 and 8 TeV data giving $m_t = 172.44 \pm 0.48$ GeV and a recent ATLAS combination [4] giving $m_t = 172.84 \pm 0.70$ GeV. Further reducing the uncertainty on the top quark mass is important both for checking self-consistency of the SM and for new physics searches. For example, because of its order-one coupling to the Higgs, the top quark is a dominant contributor to the Higgs effective potential, with implications for baryogenesis and vacuum stability. Indeed, the top quark mass uncertainty is currently the limiting factor in determining whether the SM is stable or meta-stable [5][6][7]. If $m_t > 171.22$ GeV, our universe is unstable; if $m_t > 177$ GeV, it is rapidly unstable. For intermediate values, the universe should last at least as long as it currently has. If $m_t$ is measured precisely enough to confidently claim that the Standard Model is in the unstable region, this would be compelling evidence for physics beyond the Standard Model.
For a precision measurement of the top quark mass, we need a precision definition of the top quark mass. Since the quark carries color charge, we cannot observe an isolated top quark and measure its mass directly. Instead we have to construct observables that depend on a top-quark mass parameter in a particular scheme, such as the pole mass, $\overline{\rm MS}$ mass, 1S mass [8,9], potential-subtracted mass [10], Monte-Carlo mass, MSR mass [11], etc. (for reviews, see [12][13][14]). Then we can fit the experimental data to a theoretical calculation. Some of these schemes, like the $\overline{\rm MS}$ mass, are short-distance mass schemes, meaning they are free of renormalon ambiguities and are more stable to the order in perturbation theory at which they are used. For the precision in the top-quark mass measurements to continue to improve, understanding the interplay between scheme choice and experimental uncertainty will be crucial.
The most theoretically sensible way to measure the top quark mass is through an inclusive quantity, like the total $t\bar t$ cross section [15] or the $t\bar t$ cross section differential in the top $p_T$ [16]. Such calculations can be performed in perturbative QCD using an unambiguous short-distance mass scheme like $\overline{\rm MS}$. Unfortunately, extractions using cross sections are unlikely to produce a top-quark-mass uncertainty below 1 GeV, even at the high-luminosity LHC [17]. Another approach under good theoretical control is to look at the production cross section scanning over the energy of the incoming particles, as in $e^+e^- \to t\bar t$ [18][19][20]. This method requires a new collider. Using the Large Hadron Collider, the best top quark mass extractions will come from measurements involving the top quark's hadronic decay products, and therefore it is imperative to get an accurate assessment of the uncertainty on these methods.
So far, the most precise measurements of the top quark mass have involved fitting the reconstructed top decay products to a theoretical curve. These curves are usually produced using Monte Carlo (MC) event generators, so that the mass scheme used is a Monte-Carlo mass, $m_t^{\rm MC}$. This Monte-Carlo mass is by definition the value of a parameter in the simulation. It is often assumed to be the same as the pole mass. To make a precision top-mass measurement, one cannot just assume that $m_t^{\rm MC} = m_t^{\rm pole}$, and indeed these two schemes cannot be the same, since $m_t^{\rm MC}$ depends on which Monte-Carlo is used and which tune, while $m_t^{\rm pole}$ has a precise field-theoretic definition (up to a renormalon ambiguity of around 70 MeV [21]). Early estimates put the uncertainty in translating from $m_t^{\rm MC}$ to a well-defined short-distance mass scheme like $\overline{\rm MS}$ at order 1 GeV [11], although it seems that the uncertainty may in fact be reducible, perhaps below 100 MeV [22,23].
One approach to translating the MC mass into a short-distance mass scheme was proposed in [23]. The idea in that paper is to do a precision calculation of an observable related to the top-quark decay products, such as the mass of a highly-boosted top-jet. The calculation should involve a short-distance scheme, and the MSR scheme was preferred [22,23]. Then one can fit the distributions from the MC event generators to the theory curves and extract the map from $m_t^{\rm MSR}$ to $m_t^{\rm MC}$. Ideally, one could do these fits in a relatively clean environment, like $e^+e^- \to t\bar t$ events, and the extracted relation between $m_t^{\rm MC}$ and $m_t^{\rm MSR}$ could be applied to values of $m_t^{\rm MC}$ extracted from fits to data at hadron colliders. That is, the program involves two maps:
$$\text{data} \;\xrightarrow{\;1\;}\; m_t^{\rm MC} \;\xrightarrow{\;2\;}\; m_t^{\rm MSR}.$$
The second map seems to be under systematically improvable theoretical control, assuming that the first map exists, that is, that $m_t^{\rm MC}$ is well-defined. In order to use $m_t^{\rm MC}$ for precision mass measurements, one must understand the inherent ambiguity in the definition of $m_t^{\rm MC}$. This ambiguity, related to tuning and limitations of the Monte Carlo programs, contributes to the uncertainty on the extracted top mass and may be the limiting factor in top mass measurements. In this paper, we explore how the uncertainty on $m_t^{\rm MC}$ can be reduced, particularly with the use of the jet grooming techniques trimming [24] and soft-drop [25].
The uncertainty we are concerned with is that the extracted value of $m_t^{\rm MC}$ can depend on the various parameters of the simulation. The MC generator has to simulate not only the top quark production and decay, but also initial- and final-state radiation (ISR/FSR), hadronization, and secondary interactions in the colliding protons, known as either underlying event (UE) or multiparton interactions (MPI). There is the additional problem of contamination from collisions of other nearby hadrons, known as pileup. Pileup is a stochastic process, uncorrelated with tunings related to $m_t^{\rm MC}$, so we do not consider it here. When the MC tuning parameters associated with these effects are varied, the same curve (either experimental or theoretical) matches to different values of $m_t^{\rm MC}$; thus we can estimate the uncertainty on $m_t^{\rm MC}$ by varying the tunes. Most experimental top mass measurements provide some estimate of this uncertainty. For example, the 7 TeV ATLAS top quark mass measurement in the lepton-plus-jets channel [4] has $\Delta m_t^{\rm MC} = 530$ MeV, a substantial part of their 1030 MeV total systematic uncertainty from this run. In [3], combining 7 TeV and 8 TeV data, CMS estimates an analogous uncertainty of around 300 MeV. There has also been some theoretical work on understanding how different MC parameters, such as the color-reconnection model, affect $m_t^{\rm MC}$ [26].
In recent years, a number of jet grooming algorithms have been developed to help clean up jets or events in some way. Some example groomers are mass-drop filtering [27], trimming [24], pruning [28], modified mass drop [29] and soft drop [25]. A typical application is to help resolve subjets in a highly-boosted decay object, like a boosted top quark [30,31] or boosted W boson [32,33]. Another application is to remove radiation from underlying event or pileup so that peaks or shapes in invariant mass distributions are sharper [34,35]. These techniques have been shown to be successful in improving signal over background significance. Such applications do not require precision theory: one can find a bump in groomed data without theory input. Recently, there has been progress in understanding what the groomers are doing from perturbative QCD [29,36,37], and it seems promising that precision groomed jet observables might be compared directly to theory without using MC simulations at all [38,39].
In this paper, we explore the interplay of jet grooming and the uncertainty on $m_t^{\rm MC}$. In Section 2 we describe the setup of our analysis and the method used for estimating the uncertainty on the top mass. In Section 3 we show that forcing the reconstructed W mass to be exactly $m_W$ is extremely helpful in stabilizing $m_t^{\rm MC}$ over different tunes. Then, in Section 4, we study how the grooming techniques trimming and soft-drop can further help reduce the uncertainty. In Section 4.2, we explore the parameter space of the groomers and try to get a feel for which parameters are most sensitive to grooming. Our conclusions and a brief discussion are presented in Section 5.

Monte Carlo Top Mass Extraction
The basic idea for how we extract the uncertainty on $m_t^{\rm MC}$ is to generate events for each tune for different values of $m_t^{\rm MC}$. Then we fit those distributions to extract a fit mass $m_t^{\rm fit}$. The fit mass will not be the same as the MC mass, but the two are linearly related to an excellent approximation in the regimes we fit: $m_t^{\rm fit} = \kappa\, m_t^{\rm MC}$ for some $\kappa$. Different tunes give different values of $\kappa$, which then translate to an uncertainty on $m_t^{\rm MC}$. More details on the simulation and this extraction procedure are given in this section.
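As a rough illustration of how tune-dependent values of κ could translate into a spread on the MC mass, the sketch below inverts the linear relation for a fixed fitted mass and takes the width of the resulting envelope. The function name, the tune labels and the κ values are hypothetical placeholders chosen for illustration; they are not results from this study, and the actual extraction may differ in detail.

```python
def mc_mass_spread(m_fit, kappa_by_tune):
    """Invert m_fit = kappa * m_MC for each tune and return (masses, envelope width)."""
    masses = {tune: m_fit / kappa for tune, kappa in kappa_by_tune.items()}
    spread = max(masses.values()) - min(masses.values())
    return masses, spread


# Hypothetical kappa values for the two variations of one tune group.
kappas = {"VAR1+": 0.990, "VAR1-": 0.992}
masses, spread = mc_mass_spread(172.5, kappas)  # fitted mass in GeV
print(masses, spread)
```

A single fitted mass is mapped back to one MC mass per tune, so the spread is directly the ambiguity in the MC mass implied by that pair of tunes.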

Generation of Events
For our simulated top quark mass measurement we have used the pythia 8.219 [40,41] event generator to generate lepton-plus-jets top events, $pp \to t\bar t \to l\nu b\bar b jj$ at $\sqrt{s} = 13$ TeV, where $l = e, \mu$. All final state particles (except the neutrinos) with pseudorapidity $|\eta| < 4.5$ are clustered using FastJet 3.2.1 [42] with anti-$k_T$ [43] with R = 0.5 (as used by CMS [3]). We require exactly one isolated lepton and at least 4 jets with $p_T > 30$ GeV, and that the two b-tagged jets are among the 4 jets with highest $p_T$. Only jets with $|\eta| < 2.4$ are included in the top reconstruction. The lepton and b-jets are tagged by matching the four-momentum after the hard interaction to the four-momentum of the jet. If the distance $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ between the four-momentum of the jet and that of the hard interaction is larger than 0.3, or if one jet is tagged multiple times, the event is thrown out. The events are generated such that $t \to (W \to l\nu)\, b$ and $\bar t \to (W \to jj)\, \bar b$. With the $b$ and $\bar b$ jets tagged, we iterate over all pairs of untagged jets to find the pair with invariant mass closest to $m_W = 80.4$ GeV. Only events with the reconstructed W mass in the range 75 GeV < $m_W$ < 85 GeV are kept. The invariant mass of the four-momenta of this pair together with the $\bar b$-jet gives us our reconstructed top quark mass.
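The W-candidate selection and top reconstruction described above can be sketched as follows. This is a minimal illustration, not the analysis code: jets are represented as plain (E, px, py, pz) tuples in GeV, and the function names are hypothetical. Jet clustering, lepton isolation and the ΔR tagging step are assumed to have happened upstream.

```python
import math
from itertools import combinations

M_W = 80.4  # GeV, reference W mass quoted in the text


def invariant_mass(*momenta):
    """Invariant mass of the summed four-momenta, each given as (E, px, py, pz)."""
    e, px, py, pz = (sum(p[i] for p in momenta) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))


def reconstruct_top(untagged_jets, b_jet, window=(75.0, 85.0)):
    """Return (m_W_reco, m_top_reco), or None if the event fails the W-mass window."""
    if len(untagged_jets) < 2:
        return None
    # Pick the pair of untagged jets whose invariant mass is closest to M_W.
    best = min(combinations(untagged_jets, 2),
               key=lambda pair: abs(invariant_mass(*pair) - M_W))
    m_w = invariant_mass(*best)
    if not (window[0] < m_w < window[1]):
        return None
    # The top candidate is the W candidate plus the tagged b-jet.
    return m_w, invariant_mass(best[0], best[1], b_jet)
```

For example, two back-to-back massless jets of 40.2 GeV each form a W candidate at exactly 80.4 GeV, while a pairing with a soft third jet falls outside the window and is rejected.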
For each run, we generate $10^7$ events, of which around 4% pass our cuts. The reconstructed top quark mass for each event passing the cuts is then put in a histogram with bin size 0.5 GeV that is used for fitting. One such histogram is shown in Fig. 1.

Fitting
The fitting procedure we used is similar to that implemented by Skands and Wicke in [26]. We fit the simulated data $d\sigma/dm$ to a 3-parameter $(N, \sigma, m_t^{\rm fit})$ Gaussian,
$$\frac{d\sigma}{dm} = N \exp\left[-\frac{(m - m_t^{\rm fit})^2}{2\sigma^2}\right].$$
We use a fit range of $|m - m_t^{\rm fit}| \le \sigma$. This relatively narrow window is chosen to avoid sensitivity to the tails of the distribution. The fit is done multiple times, each time changing the central value and window, until the fit is symmetric around the peak. A typical distribution and fit are shown in Fig. 1. The fit is clearly not perfect, but it does not have to be. One could get better fits with more parameters, but doing so does not improve the extracted uncertainty on $m_t^{\rm MC}$. Indeed, after trying more complicated examples, we concluded that a simpler fit gives equivalent results with less statistical variation.
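The iterated, windowed Gaussian fit can be illustrated with a small dependency-free sketch. It makes one deliberate simplification that is not taken from the text: instead of a nonlinear least-squares fit, it fits a parabola to the logarithm of the bin contents (the log of a Gaussian is quadratic), re-centering the window on each pass until the peak position and width stop changing. Function names, starting values and the tolerance are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system with Gaussian elimination and partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, 3):
            factor = m[r][i] / m[i][i]
            for c in range(i, 4):
                m[r][c] -= factor * m[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][c] * x[c] for c in range(i + 1, 3))) / m[i][i]
    return x


def fit_gaussian_window(centers, counts, m_start, sigma_start, n_iter=50, tol=1e-9):
    """Iteratively fit N*exp(-(m-m_fit)^2/(2 sigma^2)) inside |m - m_fit| <= sigma."""
    m_fit, sigma = m_start, sigma_start
    for _ in range(n_iter):
        # Bins inside the current window, shifted so x = 0 at the current peak guess.
        pts = [(m - m_fit, math.log(y)) for m, y in zip(centers, counts)
               if abs(m - m_fit) <= sigma and y > 0]
        if len(pts) < 3:
            raise ValueError("fit window contains fewer than three bins")
        # Least-squares parabola ln(y) = a + b*x + c*x^2 via the normal equations.
        s = [sum(x ** k for x, _ in pts) for k in range(5)]
        t = [sum(ly * x ** k for x, ly in pts) for k in range(3)]
        a, b, c = solve3([[s[0], s[1], s[2]],
                          [s[1], s[2], s[3]],
                          [s[2], s[3], s[4]]], t)
        if c >= 0:
            raise ValueError("no peak inside the fit window")
        new_m, new_sigma = m_fit - b / (2 * c), math.sqrt(-1 / (2 * c))
        converged = abs(new_m - m_fit) < tol and abs(new_sigma - sigma) < tol
        m_fit, sigma = new_m, new_sigma
        if converged:
            break
    norm = math.exp(a - b * b / (4 * c))
    return norm, sigma, m_fit
```

On a real histogram the bins would normally be weighted by their statistical errors; the unweighted version here is only meant to show the window-and-refit loop.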

Tunes
The parameters in pythia are not all independent. In fact, changing parameters separately can result in much more unrealistic events than changing a handful of parameters in a coordinated way. The recommended way to change simulation parameters is to vary the tune. Each tune in pythia represents values of the simulation parameters coordinated to give realistic events.
Choosing which tunes to vary to get a realistic estimate of the MC uncertainty is notoriously subjective. One can choose a subset of tunes and take the envelope of those variations, or one can include the variations from 30 tunes and add the uncertainties in quadrature. The first procedure might underestimate the uncertainty, and the latter probably overestimates it. It is not even clear if all the available tunes span the possible forms that events could have [44]. Alternatively, one could vary the simulation itself, comparing pythia to herwig or to sherpa to estimate uncertainties.
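The two combination procedures mentioned above differ only in a line of arithmetic, which the following sketch makes explicit. The function names and the numerical inputs are hypothetical and are not values from this study.

```python
import math


def envelope(mass_shifts):
    """Envelope width: largest minus smallest mass shift across the chosen tunes."""
    return max(mass_shifts) - min(mass_shifts)


def quadrature(uncertainties):
    """Quadrature sum of per-group uncertainties, treated as independent."""
    return math.sqrt(sum(u * u for u in uncertainties))


# Hypothetical inputs in MeV.
print(envelope([-100.0, 50.0, 200.0]))   # 300.0
print(quadrature([300.0, 400.0]))        # 500.0
```

The envelope depends only on the two most extreme tunes, while the quadrature sum grows with every group that is added, which is one way to see why the former can undershoot and the latter overshoot.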
In addition to using the A14 tunes, we also look at tunes Tune:pp = 14-18. In some plots, we will show the uncertainty from the envelope over these tunes. We include this as a cross check only; tunes 14-18 are not used to calculate our overall uncertainty. We find the relative reduction in uncertainty using grooming is fairly insensitive to which set of tunes is used, although obviously the absolute size of the reduction does depend on which tunes are chosen. We did not look at the comparison with herwig or any other generator, since the procedure for combining the herwig uncertainty with the pythia one is arbitrary.
For $e^+e^-$, we estimate the uncertainty by looking at the envelope over tunes 1, 3 and 7.
To be clear, our main concern is the relative improvement in the uncertainty from using grooming. This relative improvement is largely independent of the absolute size of the uncertainty (e.g. soft-drop reduces the uncertainty by 26%). We quote absolute uncertainties for concreteness, but a proper estimate must be done in the context of the experimental measurement which is beyond the scope of, and not the point of, this paper.

W-calibration
One of the biggest systematic uncertainties in top mass measurements is due to jet energy scale (JES). For this paper, we define JES as the uncertainty on how much energy and momentum is in a jet given a particular detector response, although other definitions are sometimes used. One way to calibrate JES is through a standard reference whose energy is known. For events with top quarks a natural reference is the W-boson mass, which is known to a precision of a few MeV. Thus one can demand on an event-by-event basis that the W boson is always reconstructed correctly by rescaling the energy of all particles by some factor [26,44]. We call this W-calibration.
W-calibration corrects for a lot of issues associated with detector response, so it is commonly used as a JES correction in experiment. Note however that W-calibration also corrects for contamination in the W decay products coming from underlying event, pileup, ISR going into the W decay products and FSR going out of the W decay products. Thus by putting the reconstructed W exactly at the right mass, more than just JES is corrected for. It is therefore meaningful, and indeed very useful as we will see, to use W-calibration even for MC-only top mass studies, as we are doing here.
For our implementation of W-calibration, we calculate $m_W^{\rm fit}$ from the invariant mass of the W decay products, and then we rescale the fit top quark mass by
$$m_t^{\rm fit} \to m_t^{\rm fit}\, \frac{m_W}{m_W^{\rm fit}}.$$
Adding the A14 tune uncertainties in quadrature, the uncertainty on $m_t^{\rm MC}$ goes from around 530 MeV without W-calibration to around 200 MeV with it. In other words, the combined uncertainty from the A14 tunes is reduced by 62% by including the W-calibration.
In addition to the W-calibration, we also tried applying jet area corrections [51]. This did not lead to any additional improvement.
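A minimal sketch of the rescaling is given below. It assumes, as an interpretation of the description above rather than a statement of the exact implementation, that the reconstructed top mass is multiplied by the ratio of the reference W mass to the reconstructed W mass; the function name is hypothetical.

```python
M_W = 80.4  # GeV, reference W mass


def w_calibrate(m_top_reco, m_w_reco, m_w_ref=M_W):
    """Rescale the reconstructed top mass so that the W candidate sits at m_w_ref."""
    if m_w_reco <= 0:
        raise ValueError("reconstructed W mass must be positive")
    return m_top_reco * m_w_ref / m_w_reco


# A common 3% mis-scaling of all energies cancels in the calibrated mass.
print(w_calibrate(173.0 * 1.03, 80.4 * 1.03))
```

The example shows why the procedure absorbs more than detector effects: any contamination that shifts the W candidate and the top candidate by the same fraction drops out of the calibrated top mass.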
In Fig. 2 we also show the uncertainty coming from the envelope over five other pp tunes. This uncertainty is smaller than the envelope over the A14 tunes.
We also show in Fig. 2 the variations of three $e^+e^-$ tunes. The uncertainty at $e^+e^-$ colliders is significantly smaller than the largest uncertainties from the pp tunes (by a factor of 3 without W-calibration and a factor of 2 with W-calibration). Numerically, the $e^+e^-$ uncertainty is 110 MeV without W-calibration and 50 MeV with W-calibration. Keeping in mind that we have estimated around a 50 MeV uncertainty in our fitting procedure, the W-calibration has saturated the improvement we can expect for $m_t^{\rm MC}$ at $e^+e^-$ colliders without a more comprehensive study (involving detector simulation, systematic uncertainty and so on, all of which are well beyond the scope of our study).
Footnote 1: Skands and Wicke [26] found that W-calibration (which they call JES corrections) gave a slight increase of $\Delta m_t^{\rm MC}$. This is in contradiction to our findings. Details of the variations being studied and improvements in the Monte Carlo simulations over the last ten years make it difficult to reproduce their analysis exactly and may explain the difference.

Grooming
In a top mass measurement based on hadronic decay products of the top quark, the reconstructed four-momentum of the top is sensitive to the underlying event and initial- and final-state radiation. More underlying event activity will typically give a larger contribution to the top quark four-momentum, which will directly affect the reconstructed top mass. To mitigate these effects, many different jet grooming algorithms have been introduced to remove wide-angle and/or soft radiation, as mentioned in the introduction. In this section we study how the application of jet grooming techniques can reduce the uncertainty on $m_t^{\rm MC}$. We focus our attention on two groomers, trimming [24] and soft drop [25]. Based on the improvements on the systematic uncertainty with W-calibration, as seen in the previous section, we will consider groomed jets both with and without the calibration applied.

Optimizing Groomer Parameters
Every grooming algorithm is defined in terms of some set of parameters that we can optimize based on our application. Trimming reclusters each jet using the $k_T$ algorithm [52,53] with characteristic radius $R_{\rm sub}$, and it discards contributions from subjets which carry less than a fraction $f_{\rm cut}$ of the transverse momentum of the original jet. Soft drop reclusters the jet using the Cambridge-Aachen (C/A) algorithm [54,55], and depends on two parameters, the soft threshold $z_{\rm cut}$ and an angular exponent $\beta$. It breaks the jet into two subjets (labeled 1 and 2) by undoing the last stage of the C/A clustering, then checks the soft drop condition
$$z \equiv \frac{\min(p_{T1}, p_{T2})}{p_{T1} + p_{T2}} > z_{\rm cut} \left(\frac{\Delta\theta_{12}}{R}\right)^{\beta}.$$
If the subjets pass this condition, the jet is the final soft-dropped jet; otherwise the subjet with smaller $p_T$ is thrown out, and the procedure is iterated.
For both trimming and soft drop, we would like to know which grooming parameters minimize $\Delta m_t^{\rm MC}$ as we look at the variations within the A14 tunes. As in Section 3, we will consider the 6 subgroups of the A14 tunes: PDF set variations, VAR1, VAR2, VAR3a, VAR3b and VAR3c. For each group we calculate $\Delta m_t^{\rm MC}$ for each set of groomer parameters, and the uncertainties from the six groups are added in quadrature and plotted in Fig. 3. Without W-calibration we find trimming does not help, and for soft drop $(z^*_{\rm cut}, \beta^*) = (0.05, 0.5)$ is optimal. With W-calibration we find that for trimming the optimum is at $(f^*_{\rm cut}, R^*_{\rm sub}) = (0.02, 0.2)$, while for soft drop the optimum is at $(z^*_{\rm cut}, \beta^*) = (0.1, 1.0)$. We will call these values our optimized parameters in the rest of this paper.
After optimizing the grooming parameters, we study the effect of grooming for each of the A14 groups of tunes. In Fig. 4 we show a comparison of the calculated $\Delta m_t^{\rm MC}$ with soft drop, trimming and no grooming, both with and without W-calibration. Our results are summarized in Table 2. In Fig. 4 we also include the uncertainty coming from the envelope over tunes Tune:pp = 14-18 (using the A14 optimized groomer parameters). That the uncertainty is in the range of the other tunes indicates that improvements from grooming do not crucially depend on fine tuning of groomer parameters. We also show the envelope over tunes Tune:ee = 1, 3, 7 for $e^+e^- \to t\bar t$ events.
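The soft drop declustering loop described above can be sketched on a toy clustering tree. The sketch uses the standard soft drop condition of [25], z = min(pT1, pT2)/(pT1 + pT2) > z_cut (Δθ12/R)^β. The `Node` class is a hypothetical stand-in for a C/A clustering history, each node carries only a scalar pT and the angular separation of its two children, and a real analysis would use full four-momenta from a jet-clustering library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Node of a toy (hypothetical) C/A clustering tree."""
    pt: float                        # transverse momentum of this (sub)jet
    delta: float = 0.0               # angular distance between the two children
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def soft_drop(jet, z_cut=0.1, beta=1.0, radius=0.5):
    """Decluster until the soft drop condition holds; return the groomed (sub)jet."""
    node = jet
    while node.left is not None and node.right is not None:
        a, b = node.left, node.right
        z = min(a.pt, b.pt) / (a.pt + b.pt)
        if z > z_cut * (node.delta / radius) ** beta:
            return node                   # condition satisfied: groomed jet found
        node = a if a.pt >= b.pt else b   # drop the softer subjet and iterate
    return node                           # single constituent left
```

With the default parameters, a 2 GeV wide-angle emission attached to a 98 GeV two-prong core fails the condition and is removed, after which the core itself passes and is returned.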
For trimming, we see that without W-calibration, trimming only makes the uncertainty worse. After W-calibration, trimming helps in almost all of the tunes. Adding the A14 tune uncertainties in quadrature, we find that in conjunction with W-calibration, the uncertainty is reduced by 68% (compared to 62% using only W-calibration).
Soft drop helps even without W-calibration. With W-calibration, it gives an improvement in all variations except VAR2 and VAR3b, whose uncertainties are small anyway. Adding the uncertainties in quadrature, we find that soft drop gives an improvement of 26% and 74% without and with W-calibration, respectively. These results are summarized in Table 2.
As a cross check, it is informative to look at the shapes of the reconstructed mass distribution in the different cases. These are shown in Fig. 5. The W-calibration seems to clean up the tails of the distribution. The additional improvement from soft drop seems to improve the peak region slightly, although it is hard to see by eye the origin of the improvement.
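As a quick arithmetic check, the quoted 62%, 68% and 74% reductions can be compared with the rounded absolute uncertainties of roughly 200, 170 and 140 MeV. The sketch assumes that all three percentages are measured relative to the same 530 MeV baseline (no grooming, no W-calibration); that reading is an assumption of this example.

```python
def relative_reduction(before, after):
    """Fractional reduction in going from `before` to `after`."""
    return 1.0 - after / before


def apply_reduction(before, fraction):
    """Value left after reducing `before` by the given fraction."""
    return before * (1.0 - fraction)


BASELINE = 530.0  # MeV, uncertainty without grooming or W-calibration

print(round(100 * relative_reduction(BASELINE, 200.0)))  # 62
print(round(apply_reduction(BASELINE, 0.68)))            # 170
print(round(apply_reduction(BASELINE, 0.74)))            # 138
```

The last value, 138 MeV, is consistent with the roughly 140 MeV quoted elsewhere in the text after rounding.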

Changing Individual Parameters
The A14 tunes contain systematic variations based on the A14 NNPDF tune. The parameters that are changed in each tune and the full range of each setting used by these tunes are listed in Table 3. In Fig. 6 we show the calculated value for $\Delta m_t^{\rm MC}$ when we compare the maximum vs. minimum value for each individual setting in Table 3. Since the variation tunes are based on the A14 NNPDF tune, we set all other parameters to the values described by this tune. Note that changing each setting separately does not give us a good physics description, but it will give us a direct measure of how sensitive the top quark mass measurement is to each of the MC parameters of interest.
Looking at the results in Fig. 6 we find that the dominant uncertainty is coming from the variations of $\alpha_s$ in the multiparton interactions (underlying event), timelike shower (final state radiation) and spacelike shower (initial state radiation).
By comparing the results in Figs. 4 and 6 while referencing Table 3 to see which parameters were varied for each tune, we can understand which settings $\Delta m_t^{\rm MC}$ is most sensitive to. It is straightforward to see which tuning parameters dominate the uncertainty on the different tunes:
• VAR1 is very clearly dominated by MultipartonInteractions:alphaSvalue.
• VAR2, VAR3a and VAR3b are dominated by TimeShower:alphaSvalue, and the size of $\Delta m_t^{\rm MC}$ for each pair of tunes is nicely correlated with the absolute variation of TimeShower:alphaSvalue for the corresponding pair.
• VAR3c is dominated by SpaceShower:alphaSvalue, since this is the only parameter changed.
In Fig. 7 we show the calculation of $\Delta m_t^{\rm MC}$ (without W-calibration) obtained by varying MultipartonInteractions:alphaSvalue and TimeShower:alphaSvalue between the minimum and maximum values listed in Table 3 for different grooming parameters. For both trimming and soft drop, the grooming is more aggressive as we move towards the lower right corner. Trimming will create many small subjets, and with a higher $f_{\rm cut}$ it will throw out more and more of them. Looking at the soft drop criterion $z > z_{\rm cut} (\Delta\theta_{12}/R)^{\beta}$, we see that higher $z_{\rm cut}$ makes it harder to pass the test, and more particles will be thrown out. Also, since $\Delta\theta_{12}/R < 1$, smaller $\beta$ will similarly increase the factor multiplying $z_{\rm cut}$, which will make the soft drop condition more difficult to pass.
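The dependence of the soft drop threshold on β can be made concrete with a few numbers. The sketch evaluates the right-hand side of the criterion, z_cut (Δθ12/R)^β, at a fixed angular ratio; the function name and the chosen value Δθ12/R = 0.25 are illustrative assumptions.

```python
def soft_drop_threshold(z_cut, beta, delta_over_r):
    """Minimum momentum fraction z that passes soft drop at the given angle."""
    return z_cut * delta_over_r ** beta


# For delta_over_r < 1, a smaller beta gives a higher (more aggressive) threshold.
for beta in (0.5, 1.0, 2.0):
    print(beta, soft_drop_threshold(0.1, beta, 0.25))
```

At Δθ12/R = 0.25 and z_cut = 0.1 the threshold falls from 0.05 at β = 0.5 to 0.025 at β = 1 and 0.00625 at β = 2, so fewer soft emissions are removed as β grows.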
First consider the case without W-calibration. To explain the difference between the MPI and FSR plots, consider the case where we increase $\alpha_s$. In the final state shower, a particle is more likely to split into two (as determined by the splitting functions), and with more aggressive grooming we are more likely to throw out one (or maybe both) of these particles, hence giving a less accurate reconstruction of the top quark four-momentum. For the multiparton interactions, higher $\alpha_s$ gives more particles produced in the underlying event, which in turn contaminate our reconstructed top quark four-momentum. More aggressive grooming (to a certain degree) will hence remove more of the contamination. Fig. 7 therefore confirms our intuition about grooming: aggressive grooming helps remove contamination from the underlying event, but it comes at the expense of throwing out particles that came from the actual decay of the top quark. The optimized grooming parameters strike a balance between the two effects to give the maximal overall improvement of $\Delta m_t^{\rm MC}$.
[Figure caption fragment: ... Table 3, starting out with the A14 NNPDF tune, for trimming, soft drop and no grooming with the optimized grooming parameters $(f^*_{\rm cut}, R^*_{\rm sub}) = (0.02, 0.2)$ and $(z^*_{\rm cut}, \beta^*) = (0.1, 1.0)$. The SpaceShower:pTmaxFudge result has been omitted as it gave exactly zero variation of the top mass.]
We can see how things change when including W-calibration from Fig. 6. By putting the W on-shell, problems with aggressive grooming are automatically compensated for. For example, when we increase $\alpha_s$ for FSR, so that a particle is more likely to split into two smaller subjets and get removed by the groomer, the W mass will also be reduced. Thus the W-calibration will compensate for the aggressive groomer in estimating the top mass. Indeed, looking at Fig. 6, we see that with W-calibration included, sensitivity to TimeShower:alphaSvalue is essentially removed, whether or not grooming is additionally applied. Sensitivity to MultipartonInteractions:alphaSvalue, on the other hand, is somewhat reduced, but it is in no way eliminated by the additional W-calibration. Thus, Fig. 6 shows that with W-calibration, the importance of grooming is to correct for contamination by the underlying event.

Conclusions
In this paper we have studied the systematic uncertainty of the definition of the top quark Monte-Carlo mass, $m_t^{\rm MC}$. This is the parameter extracted from experimental fits which so far has given the best top-quark mass measurements. In order to convert $m_t^{\rm MC}$ to a short-distance mass scheme, it has been suggested that in fact $m_t^{\rm MC}$ should be identified with the MSR mass, $m_t^{\rm MSR}$, at a particular scale [11,22]. Independent of the conversion to a short-distance scheme, there is the question of the simulation dependence of $m_t^{\rm MC}$. It is the uncertainty on this simulation dependence that we address in this paper.
Although $m_t^{\rm MC}$ corresponds to a parameter in the Monte Carlo event generator, its extracted value depends on what generator is used and what tune is used within that generator. By varying the tunes, we found that $m_t^{\rm MC}$ fluctuates by around 530 MeV. A standard experimental procedure to reduce the jet-energy-scale uncertainty is to rescale the energies of the particles so that the W mass is reconstructed exactly. We call this W-calibration. In addition to mitigating experimental uncertainties associated with detector response, W-calibration also removes theoretical uncertainties, such as sensitivity to the amount of final-state radiation and underlying event in an event. We find that by calibrating to the W mass, the uncertainty on $m_t^{\rm MC}$ shrinks to 200 MeV. To reduce the uncertainty further, we considered two grooming methods, trimming and soft drop. We find that on top of W-calibration, trimming reduces the uncertainty to 170 MeV while soft drop reduces it to 140 MeV. By looking at the parameters in the different tunes, we saw that the dominant effect corrected by the groomers, but not by the W-calibration, is contamination from underlying event. That is, W-calibration largely eradicates sensitivity to a dominant source of uncertainty, the amount of final-state radiation, even before grooming. In addition, we estimate around a 50 MeV ambiguity on our uncertainties due to the fitting procedure.
Our estimates were based on adding in quadrature the uncertainties from a set of pythia tunes developed by ATLAS, the A14 tunes. The procedure for calculating theoretical uncertainty is always subjective. Using a different set of tunes, or taking the envelope over the variations rather than adding them in quadrature, or using different MC generators, will all give different absolute numbers. Nevertheless, we believe the relative improvement from W-calibration, reducing the uncertainty by about 60%, and from grooming, an additional 15-30% improvement, should be fairly insensitive to the absolute size of the uncertainties. An absolute error estimate is only possible in the context of a particular measurement, including experimental systematic uncertainties, detector effects, and other issues beyond the scope of our study.
We also looked at the analogous uncertainty estimate at $e^+e^-$ colliders. We find that without any correction, the uncertainty is around 110 MeV, and with W-calibration it reduces to 50 MeV. Since 50 MeV is the same as our estimate of the ambiguity on our fitting procedure, there is no need to consider the effect of grooming on top of W-calibration.
There are two implications of our work. First, we recommend that experimental top mass measurements consider jet grooming in addition to their jet-energy-scale corrections. This has the potential to reduce the uncertainty on $m_t^{\rm MC}$ by an additional 30%. Second, in the pursuit of understanding how to convert $m_t^{\rm MC}$ to a short-distance scheme, like $\overline{\rm MS}$, it will be important to understand the effect of W-calibration on theoretical predictions.