1 Introduction

Top quark pair production in association with a bottom quark pair (\(t\bar{t}b\bar{b}\)) is an important process for probing the fundamental interactions of the Standard Model and for extracting information about its properties [1,2,3,4]. At the Large Hadron Collider (LHC), \(t\bar{t}b\bar{b}\) production is a significant background in many analyses and has a strong impact on the precise determination of the top quark Yukawa coupling from experimental data. Specifically, \(t\bar{t}b\bar{b}\) production is a notable background to the associated production of a Higgs boson with a top quark pair [5,6,7,8,9], followed by the Higgs decay into bottom quarks, as well as to the production of four-top-quark final states [10, 11].

In the simulation of the \(t\bar{t}b\bar{b}\) process, there are two primary theoretical frameworks: the four-flavour scheme (4FS) and the five-flavour scheme (5FS). The 4FS treats bottom quarks as massive particles, decoupled from the parton density functions (PDFs) and renormalised on-shell. While in principle straightforward to apply at fixed order [12,13,14,15,16,17,18,19], this scheme faces challenges due to the large mass difference between the top and bottom quarks. Indeed, large logarithms can appear in perturbative calculations for such multi-scale processes, which is reflected in the difficulty of choosing optimal renormalisation and factorisation scales. Moreover, when this scheme is combined with parton shower simulations [20, 21], radiation generated by the parton shower can produce additional bottom quarks. It is currently poorly understood how this radiation should be constrained such that the leading bottom quarks in the events are accounted for by the short-distance hard matrix element calculations, and only the subleading bottom quarks are of parton-shower origin.

Alternatively, the 5FS includes massless bottom quarks in the PDFs and adopts the \(\overline{\text {MS}}\) renormalisation scheme. In this scheme one has to generate an inclusive \(t\bar{t}+\text {jets}\) sample [22,23,24,25] and select b-jets only after parton showering. Because the bottom quarks are treated as massless partons, no large logarithms of the small quark mass appear in the short-distance calculation. Moreover, potentially large scale hierarchies between the top quarks and the jets can be resummed and taken into account by a multi-jet merging procedure. In addition, in multi-jet merging approaches the jets generated by the parton shower are always softer than a “merging scale” (except for jets in the highest-multiplicity sample), which is itself smaller than the softest jets generated by the matrix elements, so that the parton-shower approximation is accurate for all jets. This is not necessarily the case for additional jets in the 4FS approach. In practice, parton shower simulations [26,27,28] incorporate non-zero bottom quark mass effects in their splitting functions, ensuring reasonable modelling accuracy, especially in the infrared (IR) regions, where the resummation of logarithms is particularly important. Consequently, the 5FS approach offers a superior description of the \(t\bar{t}b\bar{b}\) process at the LHC compared to the 4FS.

However, implementing the 5FS approach can be computationally demanding, particularly when event generation involves multi-jet merging at next-to-leading order (NLO) accuracy. Indeed, generating \(t\bar{t}+\text {jets}\) events with up to 2 jets at NLO accuracy [29,30,31] requires substantial computing resources. Moreover, since b-jets can only be selected after parton showering and a large fraction of the generated events do not contain additional b-jets, the event selection efficiency is extremely low, leading to an enormous computational overhead. To address these challenges, this study proposes a novel method to enhance the b-jet selection efficiency in the 5FS approach, making it a competitive alternative to the 4FS framework.

An alternative approach, known as the “fusion” method [32, 33], combines aspects of both the 4FS and 5FS calculations. This method can be seen as correcting a 5FS calculation by including mass effects directly in the short-distance calculation. This method would also benefit from an increased efficiency in selecting events with b-jets from the 5FS sample. Note that in Ref. [33] the 5FS component was computed only at LO accuracy (apart from the \(t\bar{t}+0\,\text {jets}\) contribution), so the selection efficiency was of less importance for the overall CPU cost.

The paper is organised as follows. In the next section, we detail our proposed method for enhancing the b-jet selection efficiency in the 5FS simulation. Section 3 presents predictions for the \(t\bar{t}b\bar{b}\) process in the 5FS at the 13 TeV LHC and compares them to the 4FS. Finally, in Sect. 4, we summarise our findings.

2 Enhancing b flavour in Matrix Elements and Parton Showers

Predicting b-jet associated top quark pair production in the 5FS at the next-to-leading order plus parton shower (NLO+PS) level is challenging because bottom quarks have two sources: the short-distance matrix elements and the parton shower. Since the majority of events in the inclusive sample contain no bottom quarks from either source, most generated events fail the b-jet selection, resulting in enormous inefficiencies.

To improve the efficiency of generating bottom quarks at the level of the short-distance matrix elements, we proceed as follows. During phase-space integration and unweighting, the weight of each contribution to an event that contains external bottom quarks is multiplied by \(w_{\mathrm {enh.}}\), irrespective of whether these bottom quarks are in the initial or final state. This skews the event generation in favour of events with external bottom quarks. To compensate, the weights of the selected events with external bottom quarks are multiplied by \(1/w_{\mathrm {enh.}}\), while all other events retain their original weight. This increases the number of events with bottom quarks without changing any of the physics, provided the event weights are taken into account.
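The reweighting argument above can be illustrated with a toy Monte Carlo. The following Python sketch is not the MadGraph5_aMC@NLO implementation: it assumes strictly positive weights (NLO samples also contain negative weights) and a simple hit-or-miss unweighting, and the helper names `biased_unweighting` and `expected_b_fraction` are hypothetical. It shows that, once the accepted events with bottom quarks are divided by \(w_{\mathrm {enh.}}\), the cross-section estimate is unchanged in expectation, while the fraction of unweighted events with bottom quarks rises to \(w_{\mathrm {enh.}}\sigma_b/(w_{\mathrm {enh.}}\sigma_b+\sigma_{\text{no-}b})\).

```python
import random

def biased_unweighting(weights, has_b, w_enh, rng):
    """Hit-or-miss unweighting with a bottom-flavour bias (illustrative sketch).

    weights: positive weights from phase-space integration
    has_b:   True if the contribution has external bottom quarks
    w_enh:   enhancement factor (cf. bflav_enhancement in the text)
    Returns (weight, has_b) pairs for the accepted events.
    """
    # Step 1: skew generation by multiplying b-quark contributions by w_enh.
    biased = [w * w_enh if b else w for w, b in zip(weights, has_b)]
    w_max = max(biased)
    accepted = []
    for wb, b in zip(biased, has_b):
        # Step 2: standard unweighting against the biased maximum weight.
        if rng.random() < wb / w_max:
            # Step 3: compensate by dividing by w_enh for b-quark events only.
            accepted.append((w_max / w_enh if b else w_max, b))
    return accepted


def expected_b_fraction(sigma_b, sigma_no_b, w_enh):
    """Expected fraction of unweighted events with bottom quarks."""
    return w_enh * sigma_b / (w_enh * sigma_b + sigma_no_b)


# Toy sample: about 5% of contributions contain external bottom quarks.
rng = random.Random(1)
n = 200_000
has_b = [rng.random() < 0.05 for _ in range(n)]
weights = [rng.uniform(0.5, 1.5) for _ in range(n)]

events = biased_unweighting(weights, has_b, w_enh=20, rng=rng)
sigma_true = sum(weights) / n
sigma_est = sum(w for w, _ in events) / n        # unbiased estimate
frac_b = sum(1 for _, b in events if b) / len(events)
print(f"sigma: {sigma_true:.4f} (true) vs {sigma_est:.4f} (biased sample)")
print(f"fraction of events with b quarks: {frac_b:.3f}")
```

With \(w_{\mathrm {enh.}}=100\) and, say, 5% of the cross section coming from contributions with bottom quarks (an illustrative number), the formula gives about 84% of unweighted events with bottom quarks.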

We have implemented this procedure in the MadGraph5_aMC@NLO [34] code. The enhancement factor \(w_{\mathrm {enh.}}\) can be set by a new parameter, bflav_enhancement, in the run_card.dat file. This feature will become part of an upcoming release of the MadGraph5_aMC@NLO framework. In the meantime, a version of MadGraph5_aMC@NLO incorporating this feature is available from the authors upon request.

A similar biasing strategy can be applied in the parton shower. Starting from version 8.311, Pythia8 [26] offers a built-in mechanism for enhancing the splitting probabilities of certain types of particles. While this can be used to increase the production of bottom quarks, in practice we have found significant trade-offs. Even with a modest increase of the \(g\rightarrow b\bar{b}\) splitting probability, the event weights vary widely, which significantly reduces the statistical power of the event sample and nullifies the improvement in the b-jet selection efficiency.
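The loss of statistical power from widely varying weights can be quantified with the Kish effective sample size, \(N_{\text{eff}}=(\sum_i w_i)^2/\sum_i w_i^2\), a standard diagnostic for weighted samples that we use here purely for illustration. It is not taken from the Pythia8 documentation, and the weight distribution below is made up rather than a model of the enhanced \(g\rightarrow b\bar{b}\) splitting.

```python
def effective_sample_size(weights):
    """Kish effective sample size (sum w)^2 / sum w^2 of a weighted sample.

    It equals len(weights) for uniform weights and decreases as the
    weights spread out, i.e. as the sample loses statistical power.
    """
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2


uniform = [1.0] * 1000
# Illustrative spread: 10% of events carry ten times the weight of the rest.
spread = [1.0] * 900 + [10.0] * 100

print(effective_sample_size(uniform))  # 1000.0
print(effective_sample_size(spread))   # about 331
```

Since the relative statistical error of a weighted sample scales roughly as \(1/\sqrt{N_{\text{eff}}}\), a threefold drop in \(N_{\text{eff}}\) roughly cancels a threefold gain in b-jet selection efficiency.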

Alternatively, since showering an event is much faster than generating it at short distance, a more effective approach is to shower events without short-distance bottom quarks multiple times (\(N_{\textrm{PS}}\)) with different random number seeds and to scale the weights of these events by \(1/N_{\textrm{PS}}\). This mitigates the issue of the large weights carried by short-distance events without bottom quarks (which, unlike events with bottom quarks, are not reduced by \(1/w_{\mathrm {enh.}}\)), while increasing the number of events containing at least one b-jet. Care must be taken to choose \(N_{\textrm{PS}}\) judiciously to avoid excessive correlations among copies of the same short-distance event that pass the selection multiple times with different shower histories.
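The multiple-showering procedure can be sketched as follows. In this Python toy, `toy_shower` is a hypothetical stand-in for showering plus the b-jet selection, with made-up pass probabilities of 60% for events with short-distance bottom quarks and 2% (from \(g\rightarrow b\bar{b}\) splittings) for the others; `shower_and_select` is likewise a hypothetical name. Because each of the \(N_{\textrm{PS}}\) copies carries weight \(1/N_{\textrm{PS}}\), the selected cross section is unchanged in expectation, while roughly \(N_{\textrm{PS}}\) times more selected events originate from short-distance events without bottom quarks.

```python
import random

def shower_and_select(events, n_ps, shower, rng):
    """Reshower short-distance events without bottom quarks n_ps times.

    events: list of (weight, has_sd_b) pairs from the short-distance sample
    shower: stand-in for parton shower plus b-jet selection; returns True
            if the showered event passes the selection
    Returns (weight, has_sd_b) pairs for the showered events that pass.
    """
    selected = []
    for w, has_sd_b in events:
        # Only events without short-distance bottom quarks are reshowered.
        n = 1 if has_sd_b else n_ps
        for _ in range(n):  # each pass plays the role of a new random seed
            if shower(has_sd_b, rng):
                # Every copy carries 1/n of the original weight.
                selected.append((w / n, has_sd_b))
    return selected


def toy_shower(has_sd_b, rng):
    """Made-up selection probabilities; not a model of Pythia8."""
    return rng.random() < (0.6 if has_sd_b else 0.02)


rng = random.Random(7)
events = [(1.0, rng.random() < 0.1) for _ in range(100_000)]
for n_ps in (1, 10):
    sel = shower_and_select(events, n_ps, toy_shower, rng)
    sigma = sum(w for w, _ in sel) / len(events)
    n_from_no_b = sum(1 for _, b in sel if not b)
    print(n_ps, round(sigma, 4), n_from_no_b)
```

All copies of one short-distance event share the same hard kinematics and are therefore statistically correlated, which is why \(N_{\textrm{PS}}\) should not be chosen too large.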

3 Results

The 5FS calculation for \(t\bar{t}b\bar{b}\) production at the 13 TeV LHC has been performed using the FxFx merging scheme [30] as implemented in the MadGraph5_aMC@NLO event generator [34]. This involves merging \(t\bar{t}+\text {jets}\) processes with up to 2 jets at NLO accuracy and matching them to the Pythia8 parton shower [26]. We set the bottom-flavour enhancement factor to \(w_{\mathrm {enh.}}=100\) to increase the number of short-distance events with bottom quarks. Central values for the renormalisation and factorisation scales are taken from the FxFx merging procedure. We use a merging scale of 40 GeV (with 70 and 100 GeV as alternatives) and a default shower starting scale based on \(H_T/2\), with an alternative based on \(H_T/4\). Uncertainties around the central value are estimated as the envelope of the predictions from the usual 7-point scale variation, the alternative merging scales, and the alternative shower starting scale. In practice, for most observables the uncertainty is dominated by the renormalisation and factorisation scale dependence.
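The uncertainty envelope described above can be sketched in Python as a per-bin minimum and maximum over all variation histograms. We assume, for illustration, that each variation is a separate histogram (six non-central 7-point variations, two alternative merging scales, and one alternative shower starting scale); the bin contents below are invented, and `SEVEN_POINT` and `envelope` are hypothetical names.

```python
from itertools import product

# Seven (mu_R, mu_F) rescaling factors: all combinations of {1/2, 1, 2}
# except the two in which the scales are varied in opposite directions.
SEVEN_POINT = [(kr, kf) for kr, kf in product((0.5, 1.0, 2.0), repeat=2)
               if {kr, kf} != {0.5, 2.0}]


def envelope(central, variations):
    """Per-bin (lower, upper) envelope of the central and varied histograms."""
    lo = [min([c] + [v[i] for v in variations]) for i, c in enumerate(central)]
    hi = [max([c] + [v[i] for v in variations]) for i, c in enumerate(central)]
    return lo, hi


central = [10.0, 5.0, 1.0]
variations = [
    [12.0, 5.5, 1.3],   # e.g. mu_R = mu_F = 2 x central (invented numbers)
    [8.5, 4.6, 0.8],    # e.g. mu_R = mu_F = 1/2 x central
    [10.4, 5.1, 1.1],   # e.g. merging scale 70 GeV
    [9.8, 4.9, 0.95],   # e.g. shower starting scale H_T/4
]
lo, hi = envelope(central, variations)
print(lo, hi)  # [8.5, 4.6, 0.8] [12.0, 5.5, 1.3]
```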

The 5FS predictions are compared to an NLO+PS prediction in the 4FS using the MC@NLO matching [35] as implemented in the MadGraph5_aMC@NLO event generator. Central values of the renormalisation and factorisation scales are determined as follows:

$$\begin{aligned} \mu _R= & {} \big (E_{T,t}E_{T,\bar{t}}E_{T,b}E_{T,\bar{b}}\big )^{1/4} \\ \mu _F= & {} \tfrac{1}{2}\big (E_{T,t}+E_{T,\bar{t}}+E_{T,b}+E_{T,\bar{b}}\big ), \end{aligned}$$

respectively, with \(E_T=\sqrt{m^2+p_T^2}\), following Ref. [36]. For the 4FS we show the scale dependence through a 7-point variation of the renormalisation and factorisation scales. Note that we include neither shower-scale nor matching-scheme uncertainties. The latter in particular are expected to be sizeable, based on the differences found among the various approaches considered in Ref. [36], but are non-trivial to assess.
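As a worked example of the 4FS central scales, the Python sketch below evaluates \(\mu_R\) and \(\mu_F\) for a single hypothetical configuration: top quarks with \(m_t=172.5\) GeV at zero transverse momentum and both b quarks with \(E_T=50\) GeV. These inputs, and the function names, are illustrative assumptions.

```python
import math


def transverse_energy(m, pt):
    """E_T = sqrt(m^2 + p_T^2)."""
    return math.sqrt(m * m + pt * pt)


def scales_4fs(et_t, et_tbar, et_b, et_bbar):
    """Central 4FS scales: geometric mean for mu_R, half the sum for mu_F."""
    mu_r = (et_t * et_tbar * et_b * et_bbar) ** 0.25
    mu_f = 0.5 * (et_t + et_tbar + et_b + et_bbar)
    return mu_r, mu_f


m_t = 172.5                           # assumed top quark mass in GeV
et_top = transverse_energy(m_t, 0.0)  # top quarks at zero p_T
et_b = 50.0                           # assumed E_T of each b quark in GeV
mu_r, mu_f = scales_4fs(et_top, et_top, et_b, et_b)
print(round(mu_r, 1), mu_f)  # 92.9 222.5
```

In this configuration \(\mu_F\) is more than twice \(\mu_R\): the sum entering \(\mu_F\) is dominated by the heavy top quarks, whereas the geometric mean entering \(\mu_R\) is pulled down by the softer b quarks.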

Both 5FS and 4FS events are matched to the Pythia8 parton shower. To simplify the analysis and focus on the differences between the 4FS and 5FS setups, we include neither hadronisation nor the underlying event, and we keep the top quarks stable. Events that do not contain short-distance bottom quarks have been showered \(N_{\textrm{PS}}=10\) times.

Jets are reconstructed from all final-state partons (excluding the top quarks) using the anti-\(k_T\) algorithm [37, 38] with radius parameter \(R=0.4\). Jets must have transverse momentum \(p_T>25~\text {GeV}\) and pseudo-rapidity \(|\eta |<2.5\). Jets containing at least one bottom quark are identified as b-jets. We consider two scenarios: the one-b-jet setup, in which at least one b-jet is required for an event to pass the selection cuts, and the two-b-jet setup, in which at least two b-jets are required.
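The jet definition and b-jet selection above can be illustrated with a self-contained Python toy. It implements a naive \(\mathcal{O}(N^3)\) anti-\(k_T\) clustering (distances \(d_{ij}=\min(p_{T,i}^{-2},p_{T,j}^{-2})\,\Delta R_{ij}^2/R^2\) and \(d_{iB}=p_{T,i}^{-2}\), four-momentum recombination) and flags a jet as a b-jet if any constituent is a bottom quark. It is not the Rivet analysis used for the results; in practice one would use a dedicated library such as FastJet. All function names and the example event are hypothetical.

```python
import math


def massless(pt, eta, phi):
    """Four-momentum (E, px, py, pz) of a massless parton."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))


def kinematics(p):
    """(p_T, rapidity, phi) of a four-momentum (E, px, py, pz)."""
    e, px, py, pz = p
    return math.hypot(px, py), 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)


def delta_r2(a, b):
    """Squared distance in (rapidity, phi), with phi difference wrapped to [0, pi]."""
    dphi = abs(a[2] - b[2])
    dphi = min(dphi, 2 * math.pi - dphi)
    return (a[1] - b[1]) ** 2 + dphi ** 2


def anti_kt(particles, r=0.4):
    """Naive anti-k_T clustering; particles are (four_momentum, is_b) pairs."""
    objs = [(tuple(p), is_b) for p, is_b in particles]
    jets = []
    while objs:
        kin = [kinematics(p) for p, _ in objs]
        best, bi, bj = math.inf, -1, -1
        for i, ki in enumerate(kin):
            if ki[0] ** -2 < best:                   # beam distance d_iB
                best, bi, bj = ki[0] ** -2, i, -1
            for j in range(i + 1, len(kin)):        # pair distance d_ij
                kj = kin[j]
                d = min(ki[0] ** -2, kj[0] ** -2) * delta_r2(ki, kj) / r ** 2
                if d < best:
                    best, bi, bj = d, i, j
        if bj < 0:
            jets.append(objs.pop(bi))                # promote to a final jet
        else:
            (p_i, b_i), (p_j, b_j) = objs[bi], objs[bj]
            # Merge the pair; the result is a b-jet if either input is.
            objs[bi] = (tuple(a + c for a, c in zip(p_i, p_j)), b_i or b_j)
            objs.pop(bj)                             # bj > bi, so bi stays valid
    return jets


def count_b_jets(particles, pt_min=25.0, eta_max=2.5):
    """Number of b-jets with p_T > pt_min and |pseudo-rapidity| < eta_max."""
    n_b = 0
    for p, is_b in anti_kt(particles):
        pt = math.hypot(p[1], p[2])
        eta = math.asinh(p[3] / pt)                  # pseudo-rapidity
        if is_b and pt > pt_min and abs(eta) < eta_max:
            n_b += 1
    return n_b


event = [
    (massless(60.0, 0.5, 0.0), True),    # b quark
    (massless(30.0, 0.6, 0.1), False),   # gluon, collinear to the b quark
    (massless(50.0, -1.0, 2.5), True),   # anti-b quark
    (massless(40.0, 0.0, -2.0), False),  # light quark
    (massless(10.0, 1.0, 1.5), True),    # soft b quark, fails the p_T cut
    (massless(80.0, 3.0, 1.0), True),    # forward b quark, fails the |eta| cut
]
n_b = count_b_jets(event)
one_b_jet, two_b_jet = n_b >= 1, n_b >= 2
print(n_b, one_b_jet, two_b_jet)  # 2 True True
```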

Fig. 1

Predictions in the 5FS and 4FS in the one-b-jet scenario for the invariant mass of the \(t\bar{t}\) pair (top left); the transverse momentum of the \(t\bar{t}\) pair (top right); the number of b-jets (middle left); the transverse momentum of the hardest b-jet (middle right); the pseudo-rapidity of the hardest b-jet (bottom left); and the transverse momentum of the hardest light jet (bottom right)

In Figs. 1 and 2, we present a selection of representative predictions in the one-b-jet and two-b-jet scenarios, respectively. All plots, created with the Rivet analysis toolkit [40], follow the same format. The main panel displays the absolute predictions, i.e., the cross section per bin, in the 5FS and 4FS as red and blue histograms, respectively. For the 5FS, the coloured band represents the envelope of the renormalisation, factorisation, merging, and shower starting scale variations. The lower panel shows the ratio to the central 5FS prediction.

We begin with the results for the one-b-jet scenario. The upper-left plot of Fig. 1 shows the invariant mass of the top quark pair. The difference between the 5FS and the 4FS is small, reaching approximately 20% for \(m^{t\bar{t}}\gtrsim 1~\text {TeV}\), where the central 5FS spectrum is harder than the 4FS one. This difference lies within the 5FS scale uncertainty band. In contrast, the transverse momentum of the top quark pair (upper-right plot) exhibits a clear difference, with the 5FS prediction notably harder than the 4FS one. At small transverse momenta, \(p_T^{t\bar{t}}\lesssim 100~\text {GeV}\), the 4FS yields a larger prediction, exceeding the 5FS by more than a factor of 2 at \(p_T^{t\bar{t}}\lesssim 15~\text {GeV}\). At large transverse momenta, \(p_T^{t\bar{t}}\gtrsim 300~\text {GeV}\), the 5FS predicts more than twice as many events as the 4FS. Although the 5FS uncertainty band widens at large transverse momenta, the central value of the 4FS lies outside this band across almost the entire range considered. When the renormalisation and factorisation scale dependence of the 4FS is taken into account, the bands almost touch over the whole range. Given that the true 4FS uncertainty is expected to be larger than its renormalisation and factorisation scale dependence alone, one can conclude that the two schemes are in agreement for this observable. However, the difference in shape between the two central values is notable, and we investigate this observable further in Appendix A.

Turning our attention to the b-jets (which do not stem from top quark decays, since the top quarks are kept stable in the predictions), we show the number of b-jets in the middle-left plot, the transverse momentum of the hardest b-jet (i.e., the one with the highest transverse momentum) in the middle-right plot, and the pseudo-rapidity of the hardest b-jet in the lower-left plot of Fig. 1. The cross sections for exactly 1 and 2 b-jets are similar in the 5FS and 4FS calculations, with each central value well within the uncertainty band of the other. For 3 or 4 b-jets, by contrast, the 4FS cross section is somewhat larger than the 5FS one, although the uncertainties in these bins are significant. The central value of the 4FS prediction for the transverse momentum of the hardest b-jet lies at the edge of the 5FS uncertainty band, with the 5FS yielding a harder spectrum. For the pseudo-rapidity of this jet, the two predictions are in good agreement.

The transverse momentum distribution of the hardest light jet (i.e., the jet with the highest transverse momentum that does not contain any bottom quarks) is shown in the lower-right plot of Fig. 1. The 5FS prediction yields a harder spectrum for this jet; the central value of the 4FS spectrum lies outside the 5FS uncertainty band, but the two bands show some overlap over the entire range considered.

Fig. 2

Predictions in the 5FS and 4FS in the two-b-jet scenario for transverse momentum of the \(t\bar{t}\) pair (top left); invariant mass of the two leading b-jets (top right); transverse momentum of the hardest b-jet (middle left); transverse momentum of the softest b-jet (middle right); \(\Delta R\) separation between the two leading b-jets (bottom left); and the transverse momentum of the hardest light jet (bottom right)

As in the one-b-jet scenario, Fig. 2 shows that in the two-b-jet scenario the transverse momentum of the top quark pair is significantly harder in the 5FS than in the 4FS (top-left plot). Again, the central value of the 4FS curve lies outside the 5FS uncertainty band, with the bands themselves barely touching; see also Appendix A. For the invariant mass of the two leading b-jets, i.e., the two b-jets with the highest transverse momenta, the 5FS and 4FS predictions agree, as seen in the upper-right plot of Fig. 2: the 4FS prediction lies within the 5FS uncertainty band. The transverse momentum distributions of the hardest and the softest b-jet, shown in the middle-left and middle-right plots, respectively, also demonstrate agreement between the 5FS and the 4FS, with the central value of the 4FS lying at the edge of the 5FS uncertainty band. The 5FS uncertainty band is relatively large, reaching \(\pm 40\%\) for transverse momenta above \(100~\text {GeV}\) and \(50~\text {GeV}\) for the hardest and softest b-jets, respectively. The uncertainty band of the 4FS prediction is smaller, but it only includes the renormalisation and factorisation scale dependence and is therefore a lower bound on the true uncertainty.

Fig. 3

Transverse momentum of the top quark pair in the one-b-jet (top row) and two-b-jet (bottom row) scenarios. Predictions in the 5FS are shown on the left and in the 4FS on the right. The distribution is split into events in which the hardest jet is a b-jet and events in which it is a light jet

The lower-left plot shows the \(\Delta R\) separation between the two leading b-jets. For small \(\Delta R_{bb}\), the 5FS uncertainties are large, and the 4FS prediction lies within the 5FS uncertainty band. For \(\Delta R_{bb}\gtrsim 1.5\), the 5FS uncertainty band shrinks to approximately \({}^{+10\%}_{-20\%}\), but the 4FS scale-dependence band grows to approximately \({}^{+25\%}_{-30\%}\). Owing to this larger 4FS uncertainty, the two schemes are in agreement for this observable. Finally, the lower-right plot of Fig. 2 shows the transverse momentum of the hardest light jet. The ratio between the central values of the 4FS and the 5FS predictions for this observable is very similar in the one-b-jet and two-b-jet scenarios. However, the 5FS uncertainty band is considerably larger in the two-b-jet scenario than in the one-b-jet scenario, resulting in better agreement between the two flavour schemes in the former.
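For the two-b-jet observables, the Python sketch below computes the invariant mass \(m_{bb}\) and the separation \(\Delta R_{bb}\) of the two leading b-jets from their four-momenta. We assume \(\Delta R\) is built from pseudo-rapidity and azimuth; the text does not specify whether rapidity or pseudo-rapidity is used, and the function names and example jets are hypothetical.

```python
import math


def four_momentum(pt, eta, phi):
    """Massless four-momentum (E, px, py, pz) from (p_T, eta, phi)."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))


def leading_bb_observables(bjets):
    """Return (m_bb, Delta R_bb) for the two highest-p_T b-jets."""
    def kin(p):
        pt = math.hypot(p[1], p[2])
        return pt, math.asinh(p[3] / pt), math.atan2(p[2], p[1])

    # The two leading b-jets are those with the highest transverse momenta.
    b1, b2 = sorted(bjets, key=lambda p: kin(p)[0], reverse=True)[:2]
    e, px, py, pz = (a + b for a, b in zip(b1, b2))
    m_bb = math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
    (_, eta1, phi1), (_, eta2, phi2) = kin(b1), kin(b2)
    dphi = abs(phi1 - phi2)
    dphi = min(dphi, 2 * math.pi - dphi)  # wrap the azimuthal difference
    return m_bb, math.hypot(eta1 - eta2, dphi)


bjets = [four_momentum(30.0, 1.0, 1.0),      # third b-jet, ignored
         four_momentum(50.0, 0.0, 0.0),
         four_momentum(50.0, 0.0, math.pi)]  # back-to-back with the previous
m_bb, dr_bb = leading_bb_observables(bjets)
print(round(m_bb, 6), round(dr_bb, 6))  # 100.0 3.141593
```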

4 Conclusions

In this study, we have presented a calculation of the \(t\bar{t}b\bar{b}\) process in the 5FS at NLO accuracy for the 13 TeV LHC. Our approach involved computing the inclusive \(t\bar{t}+\text {jets}\) process using the FxFx merging prescription with up to 2 jets at NLO accuracy and matching these predictions to the Pythia8 parton shower. We examined the phase-space region with at least one additional b-jet alongside the top quark pair, as well as the region with at least two additional b-jets. To improve the efficiency of selecting these jets within the inclusive sample, we implemented in the MadGraph5_aMC@NLO code a method that enhances the probability of producing short-distance events with additional bottom quarks and compensates for this enhancement by reducing the weights of these events. This approach enabled us to produce distributions in the one-b-jet and two-b-jet phase-space regions with only modest statistical uncertainties, starting from just 5 million \(t\bar{t}+\text {jets}\) short-distance events. Our results demonstrate the viability of simulating the \(t\bar{t}b\bar{b}\) process in the 5FS at NLO accuracy, yielding the most accurate predictions for this process to date.

Compared to NLO+PS predictions in the 4FS, we observed sizeable differences in the 5FS. Notably, events in the 5FS are harder: the transverse momentum of the top quark pair and, to a somewhat lesser extent, the transverse momentum of the hardest light jet display a harder spectrum in the 5FS. When the uncertainty bands are taken into account, for all observables considered the bands either overlap or touch. Moreover, the uncertainty estimate for the 5FS is more reliable than the one for the 4FS, since the latter is expected to receive significant contributions from the matching scheme, which are not taken into account.

Based on our findings, we advocate the use of the 5FS for predicting the \(t\bar{t}b\bar{b}\) process at the LHC. The 5FS approach offers enhanced accuracy at a modest cost in efficiency, providing valuable insights into the physics of top quark pair production in association with bottom quarks. Finally, the efficiency improvements discussed in this work can also be applied to the “fusion” method [32, 33], allowing its 5FS component to be computed at NLO accuracy for up to 2 jets.