1 Introduction

The standard model (SM) of particle physics has successfully explained a plethora of experimental observations involving weak, electromagnetic, and strong interactions over the last few decades. However, as experiments probe new questions and higher energies, observations indicate that the SM is incomplete and might be a low-energy remnant of a more complete theory. A multitude of theoretical models has been proposed to overcome the SM limitations. Although the initial motivations and resulting implications of these models can vary, a common characteristic is the manifestation of new particles that can be probed in proton–proton (pp) collisions at CERN’s Large Hadron Collider (LHC).

Numerous ideas have been proposed to probe physics beyond the SM, motivating a large volume of searches at the LHC. Nonetheless, extensive searches have found no firm indication of new phenomena, largely constraining theories and setting exclusion limits up to multi-TeV on the masses of new particles predicted by those theories [1,2,3,4,5,6,7,8]. Possible explanations for the lack of evidence point to either new particles being too massive or having too low a production rate in existing colliders, or new physics having different features compared to what is traditionally assumed in many beyond SM theories and searches, thus remaining concealed in processes not yet investigated. In particular, many searches conducted so far at the LHC rely heavily on the assumption that these hypothesized new particles have similar couplings to all generations of fermions, including couplings to the partons inside the proton, thus favoring LHC production modes through light quarks. Therefore, if new phenomena are within the reach of the LHC, both in energy and production rate, they might manifest with different features compared to what is assumed in searches at high energy colliders, thus requiring new efforts and experimental quests.

In this paper, we consider a different scenario in which new particles have non-universal fermion couplings that favor higher-generation fermions; we refer to these as anogenophilic particles. In particular, we consider a new neutral vector gauge boson, \(\text {Z}^{\prime }\), that couples only to third-generation fermions, referred to as tritogenophilic. This physics case is interesting both theoretically and in light of recent results in precision measurements, and it offers a new physics phase space not yet fully explored at the LHC.

An anogenophilic resonance is predicted in several theories that extend the SM and in different contexts such as in minimal \(\textrm{U}(1)_{X}\) extensions [9], top-assisted technicolor models [10], Randall–Sundrum models with Kaluza–Klein excitations of the graviton [11,12,13], two-Higgs doublet models that address the naturalness of the electroweak symmetry breaking scale [14,15,16,17], left-right extensions of the SM [18], models with a color-sextet or color-octet [19,20,21,22,23,24], with composite particles [25,26,27,28,29,30,31,32,33,34,35,36,37,38], or with dark matter mediators [39,40,41,42,43,44,45].

Besides well-motivated theoretical extensions of the SM, it is interesting to speculate how new physics with non-equal fermion couplings may reconcile with some experimental results from precision measurements that, although not yet confirmed, appear to indicate that such a hypothesis is viable and worth exploring at the energies accessible at the LHC. In this perspective, recent results seem to confirm long-standing tensions with the SM in the measurement of the muon anomalous magnetic moment [46] and measurements related to the branching ratios \(R_{D}\), \(R_{D^{(*)}}\) [47,48,49,50,51,52,53,54,55], all showing significant deviations from the SM expectation [46, 56]. The scenario of a tritogenophilic new particle is also interesting in light of recent results of the measured cross sections for pp\(\rightarrow \textrm{t} \bar{\textrm{t}} + \textrm{b} \bar{\textrm{b}}\) (\(\textrm{t} \bar{\textrm{t}} \textrm{t} \bar{\textrm{t}}\)) production from the ATLAS [57, 58] [59] and CMS [60, 61] collaborations, which are found to be higher than the expectations from the SM.

Lastly, direct searches for additional Higgs bosons have recently reported some tensions with the SM expectations at about 95 GeV in the diphoton [62] final state, and 100 GeV in the ditau [63] final state, which are consistent with each other within the resolution of the reconstructed invariant mass of the \(\tau \tau \) system, and that urge new experimental quests for particles coupling to higher generation fermions (such as top and bottom quarks investigated in this work). Interestingly, the analysis in [63] also reports a deviation at the TeV scale in a search for leptoquarks coupling to higher generation fermions, which also appears in a separate dedicated search at CMS [64].

Fig. 1 Representative Feynman diagram for the production of a \(\text {Z}^{\prime }\) boson through the fusion of a top quark pair, where the \(\text {Z}^{\prime }\) decays to a pair of bottom quarks and the two spectator top quarks decay semi-leptonically

The mass, quantum numbers, and couplings of new hypothetical mediators can be open parameters to be determined experimentally, making the new physics phase space broadly defined. Thus, initial ATLAS/CMS searches for these new types of particles were conducted considering models with democratic couplings to all fermion families, and focused on Drell–Yan production mechanisms with light quarks (e.g., \(\textrm{q}\bar{\textrm{q}}\rightarrow \textrm{Z}^{\prime }\)), and on final states with muons and electrons with high signal acceptance and a narrow “bump” in the reconstructed invariant mass spectrum of lepton pairs sitting above a smooth and steeply falling background distribution [65, 66]. However, from the phenomenological point of view, when couplings to light quarks are suppressed relative to those to higher-generation fermions, new production mechanisms become dominant for generating and discovering beyond-SM resonances at pp colliders. In these mechanisms the resonances are produced in association with other SM particles and give rise to rare and peculiar signatures. The phenomenology of purely top-philic \(\text {Z}^{\prime }\) [67,68,69] scenarios, as well as of models with a \(\text {Z}^{\prime }\) that couples to top quarks and tau/muon [70, 71] leptons, has already been studied in the literature. Furthermore, a CMS search has been performed for a neutral resonance coupling to top quarks and decaying to muons or electrons [72].

In this paper, we perform a previously unexamined feasibility study on the production of a more general tritogenophilic \(\text {Z}^{\prime }\) produced through the fusion of a \(\textrm{t} \bar{\textrm{t}}\) pair (\(\textrm{t} \bar{\textrm{t}}\text {Z}^{\prime }\)) and decaying to a pair of \(\textrm{b}\) quarks (\(\text {Z}^{\prime }\rightarrow \textrm{b}\bar{\textrm{b}}\)), as in Fig. 1. We consider the final state where one of the two remaining top quarks from the fusion process, referred to as spectator top quarks, decays to \(\textrm{bW}\) and the \(\textrm{W}\) boson subsequently decays to an electron or muon plus its neutrino. Such a choice balances the lower \(\textrm{W}\rightarrow \ell \nu \) branching fraction compared to \(\textrm{W}\) boson decays into two quarks, with a cleaner final state. This has the double advantage of mitigating the large background from fully hadronic SM quantum chromodynamics (QCD) processes, and of overcoming the otherwise overwhelming event rate that is outside the typical trigger bandwidth at the LHC, rendering the search sensitive to a wide range of \(\text {Z}^{\prime }\) masses. For \(\text {Z}^{\prime }\) masses below the \(2m_{\textrm{t}}\) kinematic production threshold, where \(\text {Z}^{\prime } \rightarrow \textrm{t}\bar{\textrm{t}}\) decays are not permitted, the \(\text {Z}^{\prime }\) decay to \(\textrm{b} \bar{\textrm{b}}\) is the dominant discovery mode. Furthermore, the analysis strategy proposed in this paper provides enhanced sensitivity compared to other approaches already used in searches at the LHC [1, 4, 73,74,75]. Above \(2m_{\textrm{t}}\), the reduced jet multiplicity of the \(\text {Z}^{\prime } \rightarrow \textrm{b}\bar{\textrm{b}}\) final state, in comparison to \(\text {Z}^{\prime } \rightarrow \textrm{t}\bar{\textrm{t}}\), favors the experimental reconstruction of the \(\text {Z}^{\prime }\) mass. In this work, machine learning techniques are used to maximize the experimental sensitivity.

2 Samples and simulation

Signal and background samples are generated with MadGraph5_aMC (v2.6.3.2) [76] considering \(\text {pp}\) beams colliding with a center-of-mass energy of \(\sqrt{s}=13\) TeV and \(\sqrt{s}=14\) \(\textrm{TeV}\). All samples are generated using the NNPDF3.0 NLO [77] set for parton distribution functions (PDFs). Parton level events are then interfaced with the PYTHIA (v8.2.05) [78] package to include parton fragmentation and hadronization processes, while DELPHES (v3.4.1) [79] is used to simulate detector effects, using the CMS detector geometric configurations and parameters for the performance of particle reconstruction and identification. At parton level, jets are required to have a minimum transverse momentum (\(p_{T}\)) of 20 \(\textrm{GeV}\) and pseudorapidity (\(\eta \)) \(|\eta | < 5.0\). The cross sections in this paper are obtained with the aforementioned parton-level selections. The MLM algorithm [80] is used for jet matching and jet merging. The xqcut and qcut variables of the MLM algorithm, related to the minimal distance between partons and the energy spread of the clustered jets, are set to 30 and 45, respectively, as a result of an optimization process requiring the continuity of the differential jet rate as a function of jet multiplicity.

The signal samples are generated considering the production of a \(\textrm{Z}^{\prime }\) and two associated top quarks (\(\textrm{pp}\rightarrow \textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}\)), inclusive in \(\alpha \) and \(\alpha _{\text {s}}\). For our benchmark signal scenario, we consider the simplified model in Ref. [81] where the \(\textrm{Z}^{\prime }\) masses and couplings to the SM particles are free parameters, and defined as variations of the SM \(\textrm{Z}\) boson couplings (i.e., variations of the so-called Sequential Standard Model, SeqSM). The \(\textrm{Z}^{\prime }\) coupling to the first and second generation SM quarks is defined as \(g_{\textrm{Z}^{\prime }q\bar{q}} = g_{q} \times g_{\textrm{Z}q\bar{q}}\), where \(g_{\textrm{Z}q\bar{q}}\) is the SM \(\textrm{Z}\) boson coupling to first and second generation quarks and \(g_{q}\) is a “modifier” for the coupling. Similarly, the \(\textrm{Z}^{\prime }\) coupling to the third generation SM quarks is defined as \(g_{\textrm{Z}^{\prime },\textrm{b}/\textrm{t},\bar{\textrm{b}}/\bar{\textrm{t}}} \times g_{ \textrm{Z},\textrm{b}/\textrm{t},\bar{\textrm{b}}/\bar{\textrm{t}}}\), where \(g_{\textrm{Z}^{\prime },\textrm{b}/\textrm{t},\bar{\textrm{b}}/\bar{\textrm{t}}}\) is the modifier to the SeqSM coupling. We refer to this model as “simplified phenomenological model 1” (SPM1). In all cases considered, the modifiers for the \(\textrm{Z}^{\prime }\) couplings to \(\textrm{t}\bar{\textrm{t}}\) and \(\textrm{b}\bar{\textrm{b}}\) are equal to each other, and thus for simplicity we henceforth refer to those modifiers as \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}\). Therefore, a scenario with \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} = 1\) has similar \(\textrm{Z}^{\prime }\) couplings to top/bottom quarks as the SeqSM. Signal samples were created for \(m(\textrm{Z}^{\prime })\) ranging from 250 to 2000 GeV. 
Table 1 lists the production cross sections for different \(\text {Z}^{\prime }\) masses, considering \(\text {pp}\) collisions at \(\sqrt{s}=13\) \(\text {TeV}\) and 14 \(\text {TeV}\), and for two representative \(g_{q}\) coupling scenarios with \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} = 1\). The \(g_{q} = 0\) case is a proxy for the tritogenophilic scenarios, where the couplings of the \(\textrm{Z}^{\prime }\) to light quarks are suppressed. The \(g_{q} = 1\) case allows for non-negligible couplings to light quarks, and thus other \(\textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime }\) production processes can contribute, such as initial state radiation of a \(\textrm{Z}^{\prime }\) from a light quark.

Table 1 Signal cross sections, calculated with MadGraph, for different \(\text {Z}^{\prime }\) masses and couplings to first and second generation quarks. The values in this table are calculated with \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} = 1\)

In addition to our primary signal benchmark model described above, we also consider a tritogenophilic scenario where the \(\textrm{Z}^{\prime }\) is a color singlet vector particle whose effective couplings are not suppressed by factors of the electroweak mixing angles (as in the SeqSM) and whose relevant interactions to top/bottom quarks are given by the following renormalizable Lagrangian: \(\mathcal {L}_{\textrm{int}} = \bar{\textrm{t}}\gamma _{\mu }(c_{L}P_{L}+c_{R}P_{R})\textrm{t}\textrm{Z}^{\prime \textrm{ }\mu } = c_{\textrm{eff}}\bar{\textrm{t}}\gamma _{\mu }(\textrm{cos}\theta P_{L} + \textrm{sin}\theta P_{R})\textrm{t}\textrm{Z}^{\prime \textrm{ }\mu }\), where \(P_{R/L} = (1 \pm \gamma _{5})/2\) are the projection operators, \(c_{\textrm{eff}} = \sqrt{c_{L}^{2} + c_{R}^{2}}\) is the \(\textrm{Z}^{\prime }\) coupling to top/bottom quarks, and \(\textrm{tan}\theta = c_{R}/c_{L}\) is the tangent of the chirality angle. We consider the case where the \(\textrm{Z}^{\prime }\) couplings to top and bottom quarks are equal to each other, and thus for simplicity we henceforth refer to those couplings as \(c_{\textrm{t}}\). This type of simplified model, which we refer to as “simplified phenomenological model 2” (SPM2), has been studied in Refs. [67,68,69], and it has been shown that \(\textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime }\) production is independent of \(\theta \). We have checked that this is indeed the case. Thus, we only consider \(\theta = \pi /2\). Although the signal kinematic distributions for this particular model are similar to those of SPM1, the \(\textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime }\) production cross sections for SPM2 are larger than those of SPM1, when \(c_{\textrm{t}} = g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}\), since the SPM2 Lagrangian does not contain suppression terms from the electroweak mixing angles. 
Our primary motivation in using SPM2 is to compare the projected discovery reach of the proposed analysis strategy in this paper, with other strategies, such as those in Ref. [68], which considers the \(\textrm{Z}^{\prime }\rightarrow \textrm{t}\bar{\textrm{t}}\) decay mode.
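
As a worked special case of the SPM2 Lagrangian, the choice \(\theta = \pi /2\) can be written out explicitly. The short derivation below uses only the definitions already given (\(c_{L} = c_{\textrm{eff}}\cos \theta \), \(c_{R} = c_{\textrm{eff}}\sin \theta \), and \(P_{R} = (1+\gamma _{5})/2\)) and shows that this choice corresponds to a purely right-handed coupling; it is meant as an illustration rather than an additional model assumption.

```latex
\begin{aligned}
c_{L} &= c_{\textrm{eff}}\cos\theta = 0, \qquad
c_{R} = c_{\textrm{eff}}\sin\theta = c_{\textrm{eff}}, \\
\mathcal{L}_{\textrm{int}}\Big|_{\theta=\pi/2}
  &= c_{\textrm{eff}}\,\bar{\textrm{t}}\gamma_{\mu}P_{R}\,\textrm{t}\,\textrm{Z}^{\prime\,\mu},
  \qquad P_{R} = \frac{1+\gamma_{5}}{2}.
\end{aligned}
```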

For each signal model (SPM1 and SPM2 with \(g_{q} = 0\) and 1), we generate signal samples for \(\textrm{Z}^{\prime }\) mass values between 250 GeV and 2000 GeV, in steps of 25 GeV between 250 and 500 GeV, and steps of 250 GeV between 500 and 2000 GeV. The considered \(\textrm{Z}^{\prime }\)-\(\text {t}\bar{\text {t}}\) coupling values are between 1 and 4, in steps of 0.5. In total there are 476 \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}, m(\textrm{Z}^{\prime }), \text {SPM model} \}\) signal scenarios simulated, and for each of these scenarios two sets of samples are generated, each with one million simulated events, which are used separately for the training and testing of the machine learning algorithm.
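
The scenario count can be cross-checked by enumerating the grid. The sketch below is illustrative only: it assumes that the 500 GeV point is shared between the two mass ranges and counted once, and that both models are scanned over both \(g_{q}\) values; the variable names are ours. Under these assumptions the enumeration gives 17 mass points, 7 coupling values, and 476 scenarios.

```python
from itertools import product

# Z' mass points in GeV: 25 GeV steps from 250 to 500, then 250 GeV steps up to 2000.
# Assumption: 500 GeV belongs to both ranges and is counted once.
masses = list(range(250, 501, 25)) + list(range(750, 2001, 250))

# Coupling values from 1 to 4 in steps of 0.5.
couplings = [1.0 + 0.5 * i for i in range(7)]

# Assumption: each model is scanned for both light-quark modifier values.
models = ["SPM1", "SPM2"]
g_q_values = [0, 1]

# One tuple per {g_q, coupling, mass, model} signal scenario.
scenarios = list(product(g_q_values, couplings, masses, models))
n_scenarios = len(scenarios)
```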

Several sources of background are considered for our studies, including production of top quark pairs (\(\textrm{t}\bar{\textrm{t}}\)), \(\text {Z/W}\) bosons with associated jets (\(\text {V}+\)jets), QCD multijet, associated production of a Higgs (\(\textrm{h}\)) or a \(\mathrm {Z/\gamma ^{*}}\) boson from \(\textrm{t}\bar{\textrm{t}}\) fusion processes (denoted \(\textrm{t}\bar{\textrm{t}} \textrm{h}\) and \(\textrm{t}\bar{\textrm{t}} \textrm{X}\)), and associated production of four \(\textrm{t}\) quarks (\(\textrm{t}\bar{\textrm{t}}\textrm{t}\bar{\textrm{t}}\)). Since our signal topology targets final states with four bottom quarks (\(\textrm{Z}^{\prime }\rightarrow \textrm{b}\bar{\textrm{b}}\) and \(\textrm{t}\bar{\textrm{t}}\rightarrow \textrm{bWbW}\)), the \(\textrm{t}\bar{\textrm{t}}\), \(\text {V}+\)jets, and QCD multijet backgrounds do not meaningfully contribute to our studies (\(\ll 1\)% of the total background). The \(\textrm{t}\bar{\textrm{t}}\textrm{h}\), \(\textrm{t}\bar{\textrm{t}}\textrm{X}\), and \(\textrm{t}\bar{\textrm{t}}\textrm{t}\bar{\textrm{t}}\) processes are the dominant sources of background events. The \(\textrm{t}\bar{\textrm{t}}\textrm{h}\) and \(\textrm{t}\bar{\textrm{t}}\textrm{X}\) processes become important backgrounds when \(\textrm{h}\) and \(\textrm{Z}/\gamma ^{*}\) decay to a pair of bottom quarks. Table 2 shows the production cross sections for the dominant backgrounds, at \(\sqrt{s}=13\) \(\text {TeV}\) and 14 \(\text {TeV}\).

Table 2 Cross sections calculated with MadGraph for the dominant background processes

The total event rates are determined using \(N = \sigma \times \text {L} \times \epsilon \), where N represents the total yield of events, \(\text {L}\) the integrated luminosity considered (for this study, 150 fb\(^{-1}\), 300 fb\(^{-1}\), and 3000 fb\(^{-1}\)), and \(\epsilon \) represents any efficiencies which might reduce the total event yield (e.g., particle identification efficiencies). The \(\text {L} = 150\) fb\(^{-1}\) scenario represents an estimate for the amount of data already collected by the ATLAS and CMS experiments, while the other luminosity scenarios are the expectations for the next decade of \(\text {pp}\) data taking at the LHC. All production cross sections are computed at tree level. Since the k-factors associated with higher-order corrections to QCD production cross sections are typically greater than one, our estimates of the sensitivity are conservative.
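
The yield formula \(N = \sigma \times \text {L} \times \epsilon \) can be illustrated with a few lines of code. The sketch below is a minimal example: the cross section and efficiency are hypothetical placeholders chosen for illustration and are not taken from Table 1 or Table 2; only the three luminosity scenarios come from the text.

```python
def expected_yield(cross_section_fb, luminosity_fb_inv, efficiency):
    """Return N = sigma * L * epsilon, with sigma in fb and L in fb^-1."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return cross_section_fb * luminosity_fb_inv * efficiency


# Hypothetical inputs for illustration only (not values from this paper).
sigma_fb = 10.0
total_efficiency = 0.10

# Luminosity scenarios considered in the text, in fb^-1.
yields = {
    lumi: expected_yield(sigma_fb, lumi, total_efficiency)
    for lumi in (150.0, 300.0, 3000.0)
}
```

Because the yield is linear in \(\text {L}\), moving from 150 fb\(^{-1}\) to 3000 fb\(^{-1}\) scales every process by the same factor of 20.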

Following Ref. [82], we consider three possible “working points” for the identification of the b-jet candidates in DELPHES: (i) the “Loose” working point of the DeepCSV algorithm, which gives an 85% b-tagging efficiency and 10% light quark mis-identification rate; (ii) the “Medium” working point of the DeepCSV algorithm, which gives a 70% b-tagging efficiency and 1% light quark mis-identification rate; and (iii) the “Tight” working point of the DeepCSV algorithm, which gives a 45% b-tagging efficiency and 0.1% light quark mis-identification rate. The choice of \(\textrm{b}\)-tagging working points is determined through an optimization process which maximizes discovery reach. The “Medium” working point was ultimately shown to provide the best sensitivity and therefore chosen for this study. For muons (electrons), the assumed identification efficiency is 95% (85%), with a 0.3% (0.6%) mis-identification rate [83,84,85].

3 Data analysis using the gradient boost algorithm

The analysis of signal and background events is performed using a machine learning event classifier, namely gradient boosted decision trees (BDTs) [86]. Machine learning offers advantages over traditional event classification methods. In particular, machine learning models consider all kinematic variables in tandem, efficiently traversing the high-dimensional space of event kinematics, thereby enabling them to enact complicated selection criteria that incorporate that high-dimensional space in its entirety.

This method iteratively trains decision trees, each of which learns the residuals between the expected values and the predictions yielded by the trees trained before it, thereby greedily minimizing the error at each iteration. BDTs have previously been employed to great effect in classification problems arising in collider physics (e.g., [87,88,89,90,91,92,93]). We note that although neural networks have also been successfully used for similar tasks, e.g., in Refs. [94, 95], the complex nature of the studies in this work (particle objects considered, experimental constraints in a high luminosity LHC, etc.) motivates the use of a BDT because of its usefulness, efficiency, and simplicity in understanding the machine learning output and underlying nature of the samples being analyzed.
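
The residual-fitting idea can be illustrated with a deliberately simplified toy. The sketch below is not the classifier used in this analysis: it assumes a squared-error loss, a single input feature, and depth-one trees (stumps), and all names and data are hypothetical. It only shows how each new tree is fitted to the residuals left by the current ensemble.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimizes squared error."""
    best = None
    # Skip the largest value so that the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build an additive model of stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps, losses = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
        # Mean squared error after this boosting round.
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, losses


# Toy data: label 1 is "signal-like", label 0 is "background-like".
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
base, stumps, losses = boost(xs, ys)
```

In this toy the training loss never increases from one round to the next, which is the greedy minimization described above; production libraries add regularization, deeper trees, and a classification loss on top of the same scheme.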

Simulated signal and background events are initially filtered, before being passed to the BDT algorithm, requiring at least four well reconstructed and identified \(\textrm{b}\)-jet candidates, at least two jets not tagged as \(\textrm{b}\) jets, and exactly one identified light lepton (\(\ell \)), which can be either an electron \((\textrm{e})\) or a muon (\(\mu \)). The filtering selections are motivated by experimental constraints, such as the geometric constraints of the CMS/ATLAS detectors, the typical kinematic thresholds for reconstruction of particle objects, and the available lepton triggers which also drive the minimal kinematic thresholds. Selected jets must have \(p_{\textrm{T}} > 30\) \(\text {GeV}\) and \(|\eta (j)| < 5.0\), while \(\textrm{b}\)-jet candidates with \(p_{\textrm{T}} > 30\) \(\text {GeV}\) and \(|\eta (\textrm{b})| < 2.5\) are chosen. The \(\ell \) object must pass a \(p_{\textrm{T}} > 25\) \(\text {GeV}\) threshold and be within \(|\eta (\ell )| < 2.5\). Overlapping objects in \(\eta -\phi \) space are removed using a minimum \(\Delta R\) among all particle candidates (\(p_{i}\)) above 0.3, where \(\Delta R (p_{i}, p_{j}) = \sqrt{ (\Delta \phi (p_{i}, p_{j}))^{2} + (\Delta \eta (p_{i}, p_{j}) )^{2} }\). These filtering criteria will be henceforth referred to as pre-selections. The efficiency of the pre-selections depends on \(m(\textrm{Z}^{\prime })\), but is typically about 10%. Table 3 summarizes these pre-selections for the analysis.
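
The pre-selections can be sketched in code as follows. This is a minimal illustration, not the analysis implementation: the event layout (dictionaries with `pt`, `eta`, `phi`, and `btag` keys) is a hypothetical choice, and the \(\Delta R\) requirement is applied here as an event-level condition on all selected pairs, which simplifies the overlap removal described above.

```python
import math
from itertools import combinations


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def delta_r(a, b):
    """Delta R = sqrt(dphi^2 + deta^2) between two objects."""
    return math.hypot(delta_phi(a["phi"], b["phi"]), a["eta"] - b["eta"])


def passes_preselection(event):
    """Apply the object thresholds and multiplicities quoted in the text."""
    jets = event["jets"]
    bjets = [j for j in jets if j["btag"] and j["pt"] > 30.0 and abs(j["eta"]) < 2.5]
    light = [j for j in jets if not j["btag"] and j["pt"] > 30.0 and abs(j["eta"]) < 5.0]
    leptons = [l for l in event["leptons"] if l["pt"] > 25.0 and abs(l["eta"]) < 2.5]
    selected = bjets + light + leptons
    # Simplification: require every selected pair to be separated by Delta R > 0.3.
    separated = all(delta_r(a, b) > 0.3 for a, b in combinations(selected, 2))
    return len(bjets) >= 4 and len(light) >= 2 and len(leptons) == 1 and separated


def obj(pt, eta, phi, btag=False):
    """Hypothetical helper that builds one reconstructed object."""
    return {"pt": pt, "eta": eta, "phi": phi, "btag": btag}


# Toy event with four b jets, two untagged jets, and one lepton (illustrative values).
event = {
    "jets": [
        obj(120.0, 0.1, 0.0, btag=True),
        obj(90.0, -0.5, 1.0, btag=True),
        obj(60.0, 1.2, 2.0, btag=True),
        obj(45.0, -1.5, -2.0, btag=True),
        obj(50.0, 2.8, -1.0),
        obj(35.0, -3.0, 3.0),
    ],
    "leptons": [obj(40.0, 0.3, -3.0)],
}
selected_event = passes_preselection(event)
```

Note the wrapping in `delta_phi`: two objects at \(\phi = 3.0\) and \(\phi = -3.0\) are close in azimuth, not separated by 6 radians.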

Events passing this pre-selection are used as input for the BDT algorithm, which classifies them as signal or background, using a probability factor. We implement the BDT algorithm using the canonical scikit-learn [96] and xgboost [97] libraries. In particular, we employed the XGBClassifier class in the latter library with 250 iterations, a max depth of 7, a learning rate of 0.1, and default parameters otherwise, although we note that model performance was found to be largely independent of hyperparameters.

Figures 2, 3, 4 and 5 show relevant kinematic distributions for two SPM1 signal points and the dominant backgrounds, each normalized to unit area. The distributions correspond to the \(p_{\textrm{T}}\) of the \(\textrm{b}\)-jet candidate with the highest \(p_{\textrm{T}}\) (\(\mathrm {b_{1}}\)), the \(p_{\textrm{T}}\) of the \(\textrm{b}\)-jet candidate with the second highest \(p_{\textrm{T}}\) (\(\mathrm {b_{2}}\)), the \(\Delta R\) separation between the \(\mathrm {b_{1}}\) and \(\mathrm {b_{2}}\) candidates, and the reconstructed invariant mass of the \(\mathrm {b_{1}}\) and \(\mathrm {b_{2}}\) pair, \(m(\textrm{b}_{1}, \textrm{b}_{2})\), respectively. These variables are among those identified by the BDT algorithm as having the highest signal-to-background discrimination power.

Table 3 Preliminary event selection criteria used to filter events that are passed to the gradient boosting algorithm. A \(\Delta R(p_{i},p_{j}) > 0.3\) requirement is applied to all the particle candidate pairs \(p_{i},p_{j}\)

Fig. 2 Transverse momentum distributions for the \(\textrm{b}\) quark jet with the highest transverse momentum, for two signal points with masses of 350 \(\textrm{GeV}\) and 1000 \(\textrm{GeV}\) and dominant backgrounds

Fig. 3 Transverse momentum distributions for the \(\textrm{b}\) quark jet with the second highest transverse momentum, for two signal points with masses of 350 \(\textrm{GeV}\) and 1000 \(\textrm{GeV}\) and dominant backgrounds

As can be seen from Figs. 2 and 3, for \(m(\textrm{Z}^{\prime })\) values beyond the electroweak scale, the relatively large leading and subleading \(\textrm{b}\)-jet \(p_{\text {T}}\) is a key feature attributed to the large \(\textrm{Z}^{\prime }\) mass relative to the mass of the bottom quarks, thus resulting in an average \(p_{\text {T}}(\textrm{b}_{1,2})\) of approximately \(m(\textrm{Z}^{\prime })/2\). This kinematic feature provides a useful handle to discriminate high \(m(\textrm{Z}^{\prime })\) signal events amongst the large SM backgrounds, which have lower average \(p_{\text {T}}(\textrm{b}_{1,2})\) constrained by the top quark and/or Higgs boson masses. The \(\Delta R\) separation between \(\textrm{b}_{1}\) and \(\textrm{b}_{2}\) is determined by the amount of momentum transfer to the resonant particles in each process (\(\textrm{Z}^{\prime }\), \(\textrm{h}\), or \(\textrm{t}\)), which in turn depends on the masses of those particles. Therefore, Fig. 4 shows greater discrimination between background and signal processes as \(m(\textrm{Z}^{\prime })\) becomes larger. Finally, as noted previously, an advantage of the \(\textrm{Z}^{\prime }\rightarrow \textrm{b}\bar{\textrm{b}}\) final state in comparison to \(\textrm{Z}^{\prime }\rightarrow \textrm{t}\bar{\textrm{t}}\) is the experimental reconstruction of the \(\textrm{Z}^{\prime }\) mass, which is observed as a peak in the \(m(\textrm{b}_{1}, \textrm{b}_{2})\) signal distributions in Fig. 5 near the true \(m(\textrm{Z}^{\prime })\) value. On the other hand, the background \(m(\textrm{b}_{1}, \textrm{b}_{2})\) distributions show a peak near \(m(\textrm{h}) = 125\) GeV for the \(\textrm{t}\bar{\textrm{t}}\textrm{h}\) background, or a broad distribution for the other backgrounds, indicative of the combination of two b jets from different decay vertices. 
We note that the \(\textrm{Z}^{\prime }\rightarrow \textrm{b}\bar{\textrm{b}}\) decay width depends on \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}^{2}\times \frac{m_{\textrm{b}}^{2}}{m(\textrm{Z}^{\prime })^{2}}\) and is thus suppressed by the relatively small bottom quark mass with respect to the \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}\) and \(m(\textrm{Z}^{\prime })\) values considered in these studies. Therefore, the width of the \(m(\textrm{b}_{1}, \textrm{b}_{2})\) signal distributions is driven by the experimental resolution in the reconstruction of the \(\textrm{b}\)-jet momenta, as well as the probability that the two leading \(\textrm{b}\) jets are the correct pair from the \(\textrm{Z}^{\prime }\) decay.
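
The relation between \(p_{\text {T}}(\textrm{b}_{1,2}) \approx m(\textrm{Z}^{\prime })/2\) and the peak in \(m(\textrm{b}_{1}, \textrm{b}_{2})\) can be checked with a small four-vector calculation. The sketch below assumes an idealized configuration (a \(\textrm{Z}^{\prime }\) at rest decaying to two massless, back-to-back b jets at \(\eta = 0\), with no detector resolution); the function names are ours.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) from collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(p1, p2):
    """Invariant mass of the two-object system."""
    energy, px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = energy * energy - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Idealized decay of a 1000 GeV Z' at rest: each b jet carries pT = m/2.
m_zprime = 1000.0
b1 = four_vector(m_zprime / 2.0, 0.0, 0.0)
b2 = four_vector(m_zprime / 2.0, 0.0, math.pi)
m_bb = invariant_mass(b1, b2)
```

In this idealized case `m_bb` reproduces the generated mass; in the simulated samples the peak is broadened by the jet momentum resolution and by events in which the two leading b jets are not the \(\textrm{Z}^{\prime }\) decay pair.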

Fig. 4 Distributions for the \(\Delta R\) angular separation between the highest (\(\mathrm {b_{1}}\)) and second highest (\(\mathrm {b_{2}}\)) transverse momentum \(\textrm{b}\) quark pair, for two signal points with masses of 350 \(\textrm{GeV}\) and 1000 \(\textrm{GeV}\) and dominant backgrounds

Fig. 5 Invariant mass distributions for the highest (\(\mathrm {b_{1}}\)) and second highest (\(\mathrm {b_{2}}\)) transverse momentum \(\textrm{b}\) quark pair, for two signal points with masses of 350 \(\textrm{GeV}\) and 1000 \(\textrm{GeV}\) and dominant backgrounds

Fig. 6 Output of the gradient boosting algorithm for a \(\textrm{Z}^{\prime }\) signal with mass of 350 \(\textrm{GeV}\) and \(g_{q} = 0\) coupling, and the dominant backgrounds. The distributions are normalized to unity

In addition to the variables shown in Figs. 2, 3, 4 and 5, a variety of other kinematic variables were included as inputs to the BDT algorithm. In particular, 47 such variables were used in total, and these included the momenta of \(\textrm{b}\) and light quark jets (not tagged as \(\textrm{b}\) jets); invariant masses of pairs of \(\textrm{b}\) jets and of the two leading light jets; angular differences between \(\textrm{b}\) jets, between light quark jets, and between the lepton and \(\textrm{b}\) jets; and transverse masses derived from the lepton-\(p^{miss}_{\textrm{T}}\) pair and lepton-\(p^{miss}_{\textrm{T}}\)-\(\textrm{b}\) triplets. The variables \(m(\textrm{b}_{i}, \textrm{b}_{j})\) for \(i, j \ne 1\) provide some additional discrimination between signal and background when the leading \(\textrm{b}\)-jets are not a \(\textrm{Z}^{\prime }\) decay candidate. The transverse mass variables are designed to be sensitive to a leptonic decay of the \(\textrm{W}\) boson and \(\textrm{t}\) quark (i.e., \(m_{jj}\) and \(m_{\textrm{T}}(\ell ,p_{\textrm{T}}^{miss})\) should be near \(m_{\textrm{W}}\), and \(m_{\textrm{T}}(\ell ,\textrm{b},p_{\textrm{T}}^{miss})\) near \(m_{\textrm{t}}\)), as this is an important feature in our signal (Fig. 1). A trained BDT can return the discriminating power of each of its inputs: we found that the plotted kinematic variables (i.e., \(p_{\textrm{T}}(\textrm{b}_{1})\), \(p_{\textrm{T}}(\textrm{b}_{2})\), \(\Delta R(\textrm{b}_{1}, \textrm{b}_{2})\), and \(m(\textrm{b}_{1}, \textrm{b}_{2})\)) were among the most productive variables from this standpoint, producing about 60–75% of signal significance (depending on \(\textrm{Z}^{\prime }\) mass), but the inclusion of all 47 variables does provide a non-trivial enhancement.
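
As an illustration of the lepton-\(p^{miss}_{\textrm{T}}\) transverse mass, the sketch below assumes the standard definition for a massless lepton and neutrino, \(m_{\textrm{T}} = \sqrt{2 p_{\textrm{T}}^{\ell } p_{\textrm{T}}^{miss} (1 - \cos \Delta \phi )}\); the text does not spell out the formula used, so this choice, the function name, and the approximate \(\textrm{W}\) mass of 80.4 GeV are assumptions made for the example.

```python
import math


def transverse_mass(pt_lepton, phi_lepton, pt_miss, phi_miss):
    """Transverse mass of a massless lepton and the missing transverse momentum."""
    cos_dphi = math.cos(phi_lepton - phi_miss)
    return math.sqrt(2.0 * pt_lepton * pt_miss * (1.0 - cos_dphi))


# Idealized W decay in the transverse plane: lepton and neutrino are
# back to back, each carrying pT = m_W / 2 (m_W taken as roughly 80.4 GeV).
m_w = 80.4
mt_back_to_back = transverse_mass(m_w / 2.0, 0.0, m_w / 2.0, math.pi)

# Collinear lepton and missing momentum give a vanishing transverse mass.
mt_collinear = transverse_mass(40.0, 1.0, 40.0, 1.0)
```

The back-to-back configuration gives the largest value for fixed momenta, which is why the \(m_{\textrm{T}}\) distribution of genuine \(\textrm{W}\rightarrow \ell \nu \) decays has an edge near \(m_{\textrm{W}}\).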

Fig. 7 Output of the gradient boosting algorithm for a \(\textrm{Z}^{\prime }\) signal with mass of 500 \(\textrm{GeV}\) and \(g_{q} = 1\) coupling, and for the most relevant backgrounds. The distributions are normalized to unity

Fig. 8 True positive rate versus false positive rate of the BDT algorithm, for two different signal benchmark scenarios

Table 4 Event yields for the main backgrounds and the signal point for \(m(\text {Z}^{\prime }) = 1.0\) \(\textrm{TeV}\), for some of the bin entries for the output of the gradient boosting algorithm. The events correspond to 14 \(\textrm{TeV}\), \(g_{q} = 0\), and 3000 \(\textrm{fb}^{-1}\) luminosity scenario

Figure 6 shows the distributions for the output of the BDT algorithm for a SPM1 signal benchmark point with \(m(\textrm{Z}^{\prime }) = 350\) GeV and \(\{g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}\} = \{0, 1\}\), and the dominant backgrounds. Figure 7 shows the BDT output for \(m(\textrm{Z}^{\prime }) = 500\) GeV and \(\{g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}\} = \{1, 1\}\). The output of the BDT algorithm is a value between 0 and 1, which quantifies the likelihood that an event is either signal-like (BDT output near 1) or background-like (BDT output near 0). The distributions in Figs. 6 and 7 are normalized to an area under the curve of unity. Figure 8 shows the true positive rate (TPR), defined as the probability with which signal events are selected using the BDT output, as a function of the false positive rate (FPR), defined as the probability with which background events are selected. For example, for \(m(\textrm{Z}^{\prime }) = 500\) GeV, when signal events are selected at 50% probability, the background is selected at \(2\times 10^{-4}\) probability. Table 4 shows the expected total event rates for each process, for a particular choice of bin ranges of the BDT output, assuming an integrated luminosity of 3000 fb\(^{-1}\). The values in Table 4 are determined using \(N = \sigma \times \text {L} \times \epsilon \), where \(\epsilon \) contains the efficiency of the pre-selection criteria times the probability that a given process will have a BDT output in a particular bin range. The bins are counted from 1 to 100, going from left to right, such that bin 1 is the leftmost bin near BDT output of 0, and bin 100 is the rightmost bin near a BDT output of 1. The backgrounds dominate over the SPM1 benchmark signal yields in a large part of the BDT output spectrum, especially near zero, where the background yields are about six orders of magnitude larger. 
The presence of signal will be observed as an enhancement in the yields near a BDT output of unity.
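
The TPR and FPR definitions can be made concrete with a short example. The sketch below uses hypothetical BDT scores invented for illustration (they are not outputs of the classifier in this work) and counts the fraction of signal and background events selected above a given threshold.

```python
def rates_at_threshold(signal_scores, background_scores, threshold):
    """Return (TPR, FPR) for the selection 'BDT output >= threshold'."""
    tpr = sum(s >= threshold for s in signal_scores) / len(signal_scores)
    fpr = sum(b >= threshold for b in background_scores) / len(background_scores)
    return tpr, fpr


# Hypothetical BDT outputs in [0, 1] for illustration only.
signal_scores = [0.95, 0.90, 0.80, 0.60, 0.30]
background_scores = [0.70, 0.20, 0.10, 0.05, 0.02]

# Scanning the threshold traces out points on a curve like the one in Fig. 8.
roc_points = [
    (threshold, *rates_at_threshold(signal_scores, background_scores, threshold))
    for threshold in (0.25, 0.50, 0.75)
]
```

Raising the threshold lowers both rates; a useful classifier is one for which the FPR falls much faster than the TPR.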

4 Results

Using the BDT distributions normalized to cross section times pre-selection efficiency times luminosity, we calculate the expected experimental signal significance of the proposed search methodology, for different signal models, LHC operation conditions, and integrated luminosity scenarios. As noted earlier, we consider three values for the total integrated luminosity at the LHC: (i) 150 fb\(^{-1}\), which is approximately the amount of \(\textrm{pp}\) data already collected by the ATLAS and CMS experiments; (ii) 300 fb\(^{-1}\), expected in the next few years; and (iii) 3000 fb\(^{-1}\), expected by the end of the High Luminosity LHC era. The significance is calculated using the expected bin-by-bin yields of the BDT output distribution in a profile likelihood fit, using the RooFit [98] package developed by CERN. Similar to Refs. [81, 99,100,101,102,103], the signal significance \(Z_{sig}\) is determined using the probability of obtaining the same test statistic with the background-only hypothesis and the signal plus background hypothesis, defined as the local p-value. The value of \(Z_{sig}\) corresponds to the point where the integral of a Gaussian distribution between \(Z_{sig}\) and \(\infty \) results in a value equal to the local p-value.
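
The conversion between the local p-value and \(Z_{sig}\) can be sketched with the Python standard library. The example assumes a unit-width Gaussian centered at zero, so that the p-value is the upper-tail integral from \(Z_{sig}\) to \(\infty \); it covers only this last conversion step, not the profile likelihood fit itself, and the function names are ours.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_value_from_significance(z_sig):
    """Upper-tail integral of the standard Gaussian from z_sig to infinity."""
    # By symmetry, the tail above z equals the cumulative probability below -z.
    return STANDARD_NORMAL.cdf(-z_sig)


def significance_from_p_value(p_value):
    """Invert the upper-tail integral to recover z_sig."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    return -STANDARD_NORMAL.inv_cdf(p_value)


# Round trip for the conventional evidence (3 sigma) and discovery (5 sigma) levels.
p_three_sigma = p_value_from_significance(3.0)
p_five_sigma = p_value_from_significance(5.0)
z_recovered = significance_from_p_value(p_five_sigma)
```

Working with the lower tail of `-z_sig` rather than `1 - cdf(z_sig)` avoids loss of precision for the very small p-values associated with large significances.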

Systematic uncertainties are incorporated into the significance calculation as nuisance parameters, using a log-normal prior for normalization and a Gaussian prior for shape-related uncertainties. The systematic uncertainties are based on both experimental and theoretical constraints. A 3% systematic uncertainty is used to account for experimental errors on the estimation of the integrated luminosity collected by experiments. This is a reasonable and conservative choice based on Ref. [104]. A systematic uncertainty is included due to the choice of PDF, with respect to the default set used to produce the simulated signal and background samples. The PDF uncertainties were calculated following the PDF4LHC prescription [98], and result in up to 5% systematic uncertainty, depending on the process. The effect of the chosen PDF set on the shape of the BDT output distribution is negligible. Other theoretical uncertainties were considered, such as the absence of higher-order contributions to the signal cross sections, which can alter the pre-selection efficiency and the shapes of the kinematic distributions that are fed into the BDT algorithm. This uncertainty is calculated by varying the renormalization and factorization scales by a factor of two with respect to the nominal value, and by considering the full change in the bin-by-bin yields of the BDT output distribution. The resulting variations are found to be at most 3% in a given bin. For experimental uncertainties related to the reconstruction and identification of bottom quarks, Ref. [105] reports a systematic uncertainty of 1–5%, depending on \(p_{\textrm{T}}\) and \(\eta \) of the b-jet candidate. However, we assume a conservative 5% uncertainty per b-jet candidate, independent of \(p_{\textrm{T}}\) and \(\eta \), which is correlated between signal and background processes with genuine bottom quarks, and correlated across BDT bins for each process. 
The electron and muon reconstruction, identification, and isolation requirements have an uncertainty of 2%, while a conservative 3% systematic uncertainty is set on the variation of the electron and muon energy/momentum scale and resolution [106, 107]. We assumed 2–5% jet energy scale uncertainties, depending on \(\eta \) and \(p_{\textrm{T}}\), resulting in shape-based uncertainties on the BDT output distribution that range from 1 to 4%, depending on the BDT bin. Finally, we consider a 10% systematic uncertainty associated with possible errors on the background predictions, which are uncorrelated between background processes.
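To illustrate how a log-normal normalization uncertainty acts on an expected yield, the sketch below uses the common convention in which a relative uncertainty \(\delta \) scales the yield by \((1 + \delta )^{\theta }\), with \(\theta \) the nuisance parameter in units of standard deviations. This convention and the function names are assumptions made for illustration; the actual treatment is handled inside the fitting package and is not reproduced here.

```python
def lognormal_scale(rel_uncertainty, theta):
    """Multiplicative factor (1 + delta)**theta for a log-normal nuisance.

    theta = 0 gives the nominal yield; theta = +1 (-1) gives the
    up (down) one-standard-deviation variation. The factor is always
    positive, which is why log-normal priors suit normalizations.
    """
    return (1.0 + rel_uncertainty) ** theta


def varied_yield(nominal, rel_uncertainties, thetas):
    """Scale a nominal yield by every normalization nuisance.

    rel_uncertainties: dict mapping a source name to its relative size
    thetas:            dict mapping a source name to its pull (default 0)
    """
    factor = 1.0
    for name, delta in rel_uncertainties.items():
        factor *= lognormal_scale(delta, thetas.get(name, 0.0))
    return nominal * factor
```

For example, with the 3% luminosity and 10% background-normalization uncertainties quoted above, `varied_yield(100.0, {"lumi": 0.03, "bkg_norm": 0.10}, {"lumi": 1.0})` shifts a placeholder yield of 100 events up by the luminosity uncertainty only.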

Fig. 9

Expected signal significance as a function of reconstructed mass, at \(\sqrt{s} = 13\) \(\textrm{TeV}\) and \(150\, \textrm{fb}^{-1}\) integrated luminosity, for the \(g_{q} = 0,1\) and \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} =1\) benchmark coupling scenarios. The \(1.69 \sigma \) reference point for exclusion, and the \(3 \sigma \) and \(5 \sigma \) points for discovery sensitivity are shown as red-dashed lines

Figure 9 shows the SPM1 signal significance as a function of \(\text {Z}^{\prime }\) mass, for the \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 0,1 \}\) and \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 1,1 \}\) coupling scenarios, assuming \(\sqrt{s} = 13\) \(\textrm{TeV}\) and 150 \(\textrm{fb}^{-1}\). A signal significance of 1.69\(\sigma \) is our threshold to define expected exclusion at 95% confidence level, while 3\(\sigma \) (5\(\sigma \)) significance defines evidence (discovery) of new physics. For the \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 1,1 \}\) scenario, the analysis shows potential to exclude masses below 1.0 TeV, and achieve greater than 3\(\sigma \) (5\(\sigma \)) signal sensitivity for \(\textrm{Z}^{\prime }\) masses below 800 (675) GeV. For the SPM1 scenario with \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 0,1 \}\), the expected exclusion range is \(m(\textrm{Z}^{\prime }) < 780\) GeV, and the 3\(\sigma \) (5\(\sigma \)) reach is \(m(\textrm{Z}^{\prime }) < 600\) (500) GeV. Figure 10 shows the results for the same scenarios, but considering \(\textrm{pp}\) collisions at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and integrated luminosities of 300 \(\textrm{fb}^{-1}\) and 3000 \(\textrm{fb}^{-1}\). For the \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 1,1 \}\) scenario and assuming an integrated luminosity of 3000 \(\textrm{fb}^{-1}\), the expected exclusion bound goes up to \(m(\textrm{Z}^{\prime }) < 1.7\) \(\textrm{TeV}\), while the 3\(\sigma \) reach improves to \(m(\textrm{Z}^{\prime }) < 1.45\) \(\textrm{TeV}\). We note that the \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 1,1 \}\) scenario is a useful benchmark to compare the sensitivity to existing family-universal \(\textrm{Z}^{\prime }\) searches at CMS/ATLAS. 
The projected sensitivity obtained in this work for the \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} \} = \{ 1,1 \}\) case is surpassed by that of current, traditional searches at the LHC, such as searches targeting Drell–Yan production of \(\textrm{Z}^{\prime } \rightarrow \text {jj}\). However, even if a new gauge boson with family-universal couplings is first discovered via other analysis strategies, the proposed \(\text {pp} \rightarrow \textrm{t}\bar{\textrm{t}} \textrm{Z}^{\prime }(\rightarrow \textrm{b}\bar{\textrm{b}})\) search strategy remains important to measure the \(\textrm{Z}^{\prime }\) couplings to third-generation fermions. On the other hand, the search strategy proposed in this work can provide the best mode for discovery in the case of a \(\textrm{Z}^{\prime }\) coupling dominantly to the third generation.

Fig. 10

Expected signal significance as a function of reconstructed mass, at \(\sqrt{s} = 14\) TeV and \(300\, \textrm{fb}^{-1}\) (\(3000\, \textrm{fb}^{-1}\)) integrated luminosity, for the \(g_{q} = 0,1\) and \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}} =1\) benchmark coupling scenarios. The \(1.69 \sigma \) reference point for exclusion, and the \(3 \sigma \) and \(5 \sigma \) points for discovery sensitivity are shown as red-dashed lines

We also estimate the expected signal significance for different SPM1 coupling scenarios of the \(\textrm{Z}^{\prime }\) boson to \(\textrm{t}\)/\(\textrm{b}\) quarks. Figure 11 shows the signal significance for different \(g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}\) and \(m(\textrm{Z}^{\prime })\) scenarios, with suppressed couplings to first and second generation quarks (\(g_{q} = 0\)), assuming \(\sqrt{s} = 13\) \(\textrm{TeV}\) and 150 \(\textrm{fb}^{-1}\). Figure 12 shows the corresponding results for the same \(\{ g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}, m(\textrm{Z}^{\prime }) \}\) combinations, but using \(g_{q} = 1\). The results for \(\sqrt{s} = 14\) \(\textrm{TeV}\), assuming 300 \(\textrm{fb}^{-1}\) and 3000 \(\textrm{fb}^{-1}\), are presented in Figs. 13, 14, 15 and 16 for different \(\{ g_{q}, g_{\textrm{Z}^{\prime }\textrm{t}\bar{\textrm{t}}}, m(\textrm{Z}^{\prime }) \}\) combinations.

Fig. 11

Projected signal significance for the \(g_{q} = 0\) benchmark model for different \(g_{tt}\) coupling scenarios and \(\text {Z}^{\prime }\) masses. The estimates are performed at \(\sqrt{s} = 13\) TeV and \(150\, \textrm{fb}^{-1}\)

Fig. 12

Projected signal significance for the \(g_{q} = 1\) benchmark model for different \(g_{tt}\) coupling scenarios and \(\text {Z}^{\prime }\) masses. The estimates are performed at \(\sqrt{s} = 13\) \(\textrm{TeV}\) and \(150\, \textrm{fb}^{-1}\)

Fig. 13

Projected signal significance for the \(g_{q} = 0\) benchmark model for different \(g_{tt}\) coupling scenarios and \(\text {Z}^{\prime }\) masses. The estimates are performed at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and \(300\, \textrm{fb}^{-1}\)

Fig. 14

Projected signal significance for the \(g_{q} = 1\) benchmark model for different \(g_{tt}\) coupling scenarios and \(\text {Z}^{\prime }\) masses. The estimates are performed at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and \(300\, \textrm{fb}^{-1}\)

Fig. 15

Projected signal significance for the \(g_{q} = 0\) benchmark model for different \(g_{tt}\) coupling scenarios and \(\text {Z}^{\prime }\) masses. The estimates are performed at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and \(3000\, \textrm{fb}^{-1}\)

Fig. 16

Projected signal significance for the \(g_{q} = 1\) benchmark model for different \(g_{tt}\) coupling scenarios and \(\text {Z}^{\prime }\) masses. The estimates are performed at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and \(3000\, \textrm{fb}^{-1}\)

Table 5 Projected signal significance for our second simplified model, considering the \(c_{\textrm{t}} = 1\) coupling scenario with varying \(\textrm{Z}^{\prime }\) masses. The calculations are performed at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and assuming both 300 \(\textrm{fb}^{-1}\) and 3000 \(\textrm{fb}^{-1}\)

Table 5 shows the SPM2 signal significance as a function of \(m(\text {Z}^{\prime })\) and integrated luminosity, for the \(\{ c_{\textrm{t}}, \theta \} = \{ 1,\pi /2 \}\) scenario, assuming \(\sqrt{s} = 14\) \(\textrm{TeV}\). The expected SPM2 exclusion range is \(m(\textrm{Z}^{\prime }) < 1.5\) TeV at \(\textrm{L} = 300\) fb\(^{-1}\), while the 5\(\sigma \) discovery reach is \(m(\textrm{Z}^{\prime }) < 1.5\) TeV for the 3000 fb\(^{-1}\) expected by the end of the High Luminosity LHC era.

5 Discussion

As the LHC continues to run with pp collisions at the highest energy, and with the increase in luminosity expected from the high-luminosity program of the accelerator, it is important to ponder why certain searches for new physics have not provided strong evidence for discovery, and to consider unexplored possibilities. In this work, we examine the phenomenology of a \(\textrm{Z}^{\prime }\) boson favoring higher-generation fermions (anogenophilic), in particular coupling to third generation fermions (tritogenophilic). This scenario is well motivated and arises in many theories that extend the SM [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37, 39,40,41,42,43,44,45, 108]. It also appears as a possible, although not yet confirmed, pattern in precision measurements of the B-physics sector [47,48,49,50,51,52,53,54,55, 109,110,111,112,113,114,115,116,117] and in the measurement of the muon anomalous magnetic moment [46]. An anogenophilic \(\textrm{Z}^{\prime }\) has already been investigated phenomenologically or experimentally for the case in which the new boson is produced in association with two top quarks and decays to two top quarks (top-philic [67,68,69]), tau/muon leptons [70, 71], or muon/electron leptons [72]. Here we have presented a feasibility study for the \(\textrm{Z}^{\prime }\) decay into two \(\textrm{b}\) quarks. The study has been performed in the context of \(\textrm{pp}\) collisions at the LHC, at \(\sqrt{s} = 13\) \(\textrm{TeV}\) and \(\sqrt{s} = 14\) \(\textrm{TeV}\), using a BDT algorithm to optimize the signal to background separation and maximize exclusion or discovery potential. Various coupling scenarios for the \(\textrm{Z}^{\prime }\) have been considered, including suppressed couplings to light flavour quarks (\(g_{q} = 0\)), enhanced couplings to third generation fermions, and preferential couplings to top and bottom quarks (\(g_{\textrm{Z}^{\prime }t\bar{t}}\)). 
Under the SPM1 \(g_{q} = 1\) (\(g_{q} = 0\)) scenario, at \(\sqrt{s} = 13\) \(\textrm{TeV}\) and integrated luminosity of 150 \(\textrm{fb}^{-1}\), \(\textrm{Z}^{\prime }\) masses up to 1.0 \(\textrm{TeV}\) (780 \(\textrm{GeV}\)) can be excluded at 95% confidence level, while 5\(\sigma \) discovery potential exists for masses below 675 \(\textrm{GeV}\) (500 \(\textrm{GeV}\)). For the high luminosity era of the LHC with \(\sqrt{s} = 14\) \(\textrm{TeV}\) and integrated luminosity of 3000 \(\textrm{fb}^{-1}\), \(\textrm{Z}^{\prime }\) masses up to 1.70 \(\textrm{TeV}\) (1.25 \(\textrm{TeV}\)) can be excluded for the SPM1 \(g_{q} = 1\) (\(g_{q} = 0\)) scenario, while the 5\(\sigma \) discovery reach is \(m(\textrm{Z}^{\prime }) < 1.25\) \(\textrm{TeV}\) (900 \(\textrm{GeV}\)). For the SPM2 benchmark scenario with \(c_{\textrm{t}} = 1\) and \(\theta = \pi / 2\), the discovery (exclusion) reach is 1.5 (1.7) TeV at \(\sqrt{s} = 14\) \(\textrm{TeV}\) and integrated luminosity of 3000 \(\textrm{fb}^{-1}\). As noted previously, the projected sensitivity using the SPM2 scenario serves as a good comparison with other search strategies. For example, the authors of Ref. [68] examined the high luminosity LHC sensitivity to these anogenophilic scenarios using the \(\textrm{pp}\rightarrow \textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime }\rightarrow \textrm{t}\bar{\textrm{t}}\textrm{t}\bar{\textrm{t}}\) final state with boosted top tagging algorithms, and reported a projected \(2\sigma \) reach of approximately \(m(\textrm{Z}^{\prime }) < 1.5\) TeV for the same coupling scenario of \(c_{\textrm{t}}=1\), assuming an integrated luminosity of 3000 fb\(^{-1}\). That result is to be compared with the stronger projected significance of \(> 5.41\sigma \) for \(m(\textrm{Z}^{\prime }) < 1.5\) TeV in Table 5, using the strategy presented in this paper. Additionally, Ref. 
[68] reports that a \(> 5\sigma \) discovery reach is attainable for \(m(\textrm{Z}^{\prime }) = 1.5\) TeV if \(c_{\textrm{t}} > 1.65\). For comparison, Table 5 already shows a significance of \(5.41\sigma \) for \(m(\textrm{Z}^{\prime }) = 1.5\) TeV with a smaller coupling of \(c_{\textrm{t}} = 1\). We also point out that these comparisons are conservative since the studies outlined in Ref. [68] assume a 100% branching ratio of \(\textrm{Z}^{\prime } \rightarrow \textrm{t}\bar{\textrm{t}}\), which would not be the case if \(\textrm{Z}^{\prime }\) couples to both top and bottom quarks.

The main result of this paper is that probing heavy neutral gauge bosons produced in association with spectator top quarks, and decaying to a pair of bottom quarks, can be a key search methodology. It represents the most important anogenophilic/tritogenophilic mode for discovery at \(m(\textrm{Z}^{\prime }) < 2 m_{\textrm{t}}\) where the \(\textrm{Z}^{\prime }\rightarrow \textrm{t}\bar{\textrm{t}}\) decay is kinematically forbidden, and remains competitive with the \(\textrm{Z}^{\prime }\rightarrow \textrm{t}\bar{\textrm{t}}\) decay mode at \(\textrm{TeV}\) scale masses, benefiting from the possibility to reconstruct the \(\textrm{Z}^{\prime }\) mass from the two highest-\(p_{\textrm{T}}\) \(\textrm{b}\) jets and resulting in events with reduced jet multiplicity. Furthermore, even if a \(\textrm{Z}^{\prime }\) boson is discovered in other search channels when \(m(\textrm{Z}^{\prime })\) is large, a \(\textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime } \rightarrow \textrm{t}\bar{\textrm{t}} \textrm{b}\bar{\textrm{b}}\) search remains a key part of the search program at the LHC in order to establish the couplings of the \(\textrm{Z}^{\prime }\) to all fermions. In particular, whereas a \(\textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime } \rightarrow \textrm{t}\bar{\textrm{t}} \textrm{t}\bar{\textrm{t}}\) search can measure the \(\textrm{Z}^{\prime }\) mass and coupling to top quarks, the proposed \(\textrm{t}\bar{\textrm{t}}\textrm{Z}^{\prime } \rightarrow \textrm{t}\bar{\textrm{t}} \textrm{b}\bar{\textrm{b}}\) search can additionally measure the \(\textrm{Z}^{\prime }\) coupling to bottom quarks.
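The two kinematic statements in the paragraph above, namely the \(\textrm{Z}^{\prime }\rightarrow \textrm{t}\bar{\textrm{t}}\) threshold at \(2 m_{\textrm{t}}\) and the reconstruction of the \(\textrm{Z}^{\prime }\) mass from the two highest-\(p_{\textrm{T}}\) \(\textrm{b}\) jets, can be sketched as follows. The jet representation as \((p_{\textrm{T}}, \eta , \phi , m)\) tuples, the function names, and the top-quark mass of 172.5 GeV are assumptions made for illustration, not details of the analysis code used in this work.

```python
import math

M_TOP_GEV = 172.5  # assumed reference top-quark mass, in GeV


def tt_decay_open(m_zprime_gev, m_top_gev=M_TOP_GEV):
    """True if Z' -> t tbar is kinematically allowed, i.e. m(Z') > 2 m_t."""
    return m_zprime_gev > 2.0 * m_top_gev


def four_vector(pt, eta, phi, mass):
    """Convert (pT, eta, phi, m) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def leading_bb_mass(b_jets):
    """Invariant mass of the two highest-pT b jets.

    b_jets: list of (pT, eta, phi, m) tuples, in GeV and radians.
    """
    if len(b_jets) < 2:
        raise ValueError("need at least two b-jet candidates")
    lead, sublead = sorted(b_jets, key=lambda jet: jet[0], reverse=True)[:2]
    e1, px1, py1, pz1 = four_vector(*lead)
    e2, px2, py2, pz2 = four_vector(*sublead)
    m_squared = ((e1 + e2) ** 2 - (px1 + px2) ** 2
                 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))
```

For instance, the \(m(\textrm{Z}^{\prime }) = 350\) GeV benchmark sits just above the assumed threshold of 345 GeV, while any mass below it can only be probed in this production mode through decays other than \(\textrm{t}\bar{\textrm{t}}\).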

The proposed data analysis represents a competitive alternative to complement searches already being conducted at the LHC. Those searches are based on the analysis of the mass distribution of two \(\textrm{b}\)-quark jets, in the resolved or boosted regime, using events whose triggers require high-\(p_{\textrm{T}}\) jets [1, 4, 75], \(\textrm{b}\)-quark jets [74], or a photon [73]. In the analysis strategy considered here, we can instead rely on the presence of an electron or muon originating from the decay of a spectator top quark, which allows an unbiased selection of \(\textrm{b}\)-quark jets originating from the \(\textrm{Z}^{\prime }\), or on the possibility to define a trigger using both a light lepton and jets, in order to select particles with lower energy and thus probe lower values of \(m(\textrm{Z}^{\prime })\).

For the reasons above, we propose that this analysis strategy be considered in future \(\textrm{Z}^{\prime }\) searches at the LHC by both the ATLAS and CMS Collaborations.