1 Introduction

The long-awaited discovery of the Higgs boson at the LHC experiments [1] in 2012 completed the experimental validation of the standard model (SM). However, there are some well-known issues that are not addressed by the SM, such as the mass hierarchy problem, the unification of the fundamental interactions, the origin of the baryon asymmetry of the Universe, and a plausible explanation for dark matter. To address these issues, extensions of the SM into a more complete theory have been proposed. In general, candidate extensions predict the existence of new fundamental particles and interactions. The Large Hadron Collider (LHC) experiments are conducting a wide variety of searches for these new particles and interactions. The results of all these searches have so far been consistent with the SM predictions. The forthcoming High Luminosity Large Hadron Collider (HL-LHC) and Future Circular Collider (FCC) machines, with their higher luminosity and energy and their better detector acceptance and efficiency, will increase the sensitivity of these searches and extend them to more difficult scenarios, enabling access to higher particle masses and lower effective cross sections.

One class of candidate extensions to the SM consists of Grand Unified Theories (GUTs) based on a gauge group larger than that of the SM. The GUT models merge strong and electroweak interactions in a single gauge group, thereby allowing a solution to at least two of the above mentioned problems, namely, the complete unification of the fundamental interactions (except gravity) and the baryon asymmetry of the observed Universe. Specifically, when unifying gravity with other interactions both within the contexts of the superstring and supergravity theories, the exceptional Lie group \(E_6\) has been shown to be the gauge symmetry group which can be compactified from 10 (or 11) dimensions down to the \(3+1\) that we observe [2].

The GUT model using the exceptional Lie group \(E_6\) as the gauge symmetry group is referred to as the \(E_6\) model. It predicts the existence of iso-singlet quarks (denoted in the literature by D, S, and B) having charge \(Q=-1/3\). The discovery potential of the ATLAS experiment for the down-type iso-singlet quark D of the first SM family has been previously investigated in [3, 4]. A phenomenological study performed before the start of LHC data taking estimated the discovery reach for D quarks to be 950 GeV for \(100\ \hbox {fb}^{-1}\) of integrated luminosity, using the combination of all D decay channels [4].

Dedicated searches in the LHC data for the down-type iso-singlet quarks predicted by the model described in this paper are currently ongoing in the ATLAS experiment. In the meantime, the closest estimates of sensitivity come from searches for vector-like quarks (VLQs), which have similar production mechanisms. However, almost all existing VLQ searches are designed exclusively to target the third generation vector-like partners B and T of the bottom and top quarks. The most stringent limits to date come from an ATLAS combination of 7 VLQ searches performed with 13 TeV data, looking at different final states [5,6,7,8,9,10,11], which excluded T (B) masses below 1.31 (1.03) TeV for any combination of decays into SM particles [12]. Several CMS studies also searched for third generation VLQs. One search in the single lepton channel excluded T masses below 1.295 TeV in exclusive decays to tW [13]. A different, fully hadronic search excluded T and B quark masses between 0.74 and 1.37 TeV [14], while a leptonic search excluded T quarks with masses below 1.14–1.30 TeV and B quarks with masses below 0.91–1.24 TeV [15] for various branching fraction combinations. However, those limits do not directly apply to the down-type iso-singlet quark scenario studied here, as the searches mainly focus on third generation final states that contain b quarks. For the specific case of light-flavor VLQs, a CMS search with at least one lepton excluded pair-produced VLQs with masses below 845 and 685 GeV for branching ratios \(B(W) = 1\) and \(B(W) = 0.5\), \(B(Z) = B(H) = 0.25\), respectively [16].

In this study, we investigate the possibility of observing the pair production of first generation down-type iso-singlet quarks D in the decay channel \(D \rightarrow Zd \rightarrow \ell ^+\ell ^-d\) (where \(\ell = e, \mu \)), using the 4 leptons plus 2 jets final state at the HL-LHC and in the proton–proton scenario for the FCC. Due to its low effective cross section, this process cannot be observed under current LHC conditions. With their higher luminosity, energy and detector acceptances, HL-LHC and FCC are expected to significantly improve sensitivity in this channel. Despite its low effective cross section, exploring this channel is critical, as it provides the most precise reconstruction of the D quark mass in case of discovery.

Additionally, this work aims to test the feasibility of a new and practical analysis writing approach for high energy physics. The search method in this study is implemented and performed using an analysis description language and its runtime interpreter CutLang, which allows quick analysis prototyping and histogramming [17, 18].

The paper starts by introducing the down-type iso-singlet quark model in Sect. 2 followed by a description of the HL-LHC and FCC colliders and relevant experimental conditions in Sect. 3, and the analysis description language and runtime interpreter CutLang in Sect. 4. Detailed explanation of the search for D quarks and the search results are presented in Sect. 5 followed by the conclusions in Sect. 6.

2 Down-type iso-singlet quark model

If the group structure of the SM, \(SU_C (3) \times SU_W (2) \times U_Y (1)\), originates from the breaking of the \(E_6\) group at the GUT scale, then the quark sector of the SM is extended by the addition of an iso-singlet quark per family as:

$$\begin{aligned}&\begin{pmatrix} u_L\\ d_L \\ \end{pmatrix} , \quad u_R, d_R, D_L, D_R ;\nonumber \\&\begin{pmatrix} c_L\\ s_L \\ \end{pmatrix} ,\quad c_R, s_R, S_L, S_R ;\nonumber \\&\begin{pmatrix} t_L\\ b_L \\ \end{pmatrix} , \quad t_R, b_R, B_L, B_R . \end{aligned}$$
(1)

In the considered model, the S and B quarks are assumed to be heavy and decoupled from the spectrum, leaving the D quark as the only one accessible for searches at present and near-future colliders. A second assumption, following from the general behavior of the CKM (Cabibbo, Kobayashi, Maskawa) matrix, is that mixing inside a given family is stronger than mixing between different families. Therefore, we only consider the Lagrangian relevant for the weak interaction of d and D quarks as given in [19]:

$$\begin{aligned} L_D&= \frac{\sqrt{4\pi \alpha _{em}}}{2\sqrt{2}\sin {\theta _W}} \big [\bar{u}^{\theta } \gamma _\alpha (1 - \gamma _5) d \cos {\phi }\nonumber \\&\quad + \bar{u}^{\theta } \gamma _\alpha (1 - \gamma _5) D \sin {\phi }\big ] W^\alpha \nonumber \\&\quad - \frac{\sqrt{4\pi \alpha _{em}}}{4\sin {\theta _W}}\left[ \frac{\sin {\phi } \cos {\phi }}{\cos {\theta _W}} \bar{d}\gamma _\alpha (1 - \gamma _5) D\right] Z^\alpha \nonumber \\&\quad - \frac{\sqrt{4\pi \alpha _{em}}}{4\cos {\theta _W}\sin {\theta _W}} \nonumber \\&\quad \times \big [\bar{D} \gamma _\alpha (4 \sin ^2{\theta _W} - 3 \sin ^2{\phi }(1 - \gamma _5)) D\nonumber \\&\quad + \bar{d}\gamma _\alpha (4 \sin ^2{\theta _W} - 3 \sin ^2{\phi }(1 - \gamma _5)) d\big ] Z^\alpha \nonumber \\&\quad + h.c. , \end{aligned}$$
(2)

where the superscript \(\theta \) represents the usual CKM mixings taken to be in the up sector for simplicity of calculation, \(\theta _W\) is the weak mixing angle and \(\phi \) is the mixing angle between the d and D quarks, which is responsible for the decay of the D quark. The limits on \(\phi \) can be obtained from the current precision measurements for the \(3\times 3\) CKM matrix elements, assuming that its \(3\times 4\) extension has the sum of the squares of the elements of a row equal to 1.

The evaluation of the presently measured values and their errors yields \({|\sin {\phi }|} \le 0.035\) (0.043), allowing a 1 (2) sigma variation on the first row elements [20]. The cross section calculation results for FCC are essentially insensitive to \(\sin (\phi )\), since the studied pair production proceeds mostly via gluon exchange. However, at HL-LHC, especially for large values of the D quark mass, the production is mostly via the \(q\bar{q}\) channel, whose cross section has a slight \(\sin (\phi )\) dependence due to the t-channel sub-process propagating via a W boson, as shown in Fig. 1, sub-figure (d). Since this sub-process contributes with an opposite sign, reducing the mixing angle effectively increases the \(D\bar{D}\) production cross section.
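The unitarity argument used above to bound \(\sin \phi \) can be illustrated with a short Python sketch. It makes the simplifying assumption that the fourth element of the extended first row is identified with \(\sin \phi \), so that the deficit of the measured first-row elements from unity equals \(\sin ^2\phi \). The function name and the numerical inputs are hypothetical: the inputs are synthetic values constructed to reproduce a deficit of \(0.035^2\), not the measurements of [20].

```python
import math


def sin_phi_bound(v_ud, v_us, v_ub):
    """Return |sin(phi)| implied by unitarity of the extended first row.

    Assumption (simplified): |V_ud|^2 + |V_us|^2 + |V_ub|^2 + sin^2(phi) = 1.
    """
    deficit = 1.0 - (v_ud**2 + v_us**2 + v_ub**2)
    # A negative deficit means the inputs already saturate unitarity,
    # leaving no room for d-D mixing.
    return math.sqrt(max(deficit, 0.0))


# Synthetic inputs (not measured values), built so the deficit is 0.035^2.
v_us, v_ub = 0.2250, 0.0040
v_ud = math.sqrt(1.0 - 0.035**2 - v_us**2 - v_ub**2)
print(round(sin_phi_bound(v_ud, v_us, v_ub), 3))
```

In a realistic evaluation, the uncertainties of the measured elements would be propagated to obtain the 1 and 2 sigma limits quoted above.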

The branching fractions for the three possible D decay modes, \({D\rightarrow Wu}\), \({D\rightarrow Zd}\) and \({D\rightarrow hd}\) are about \(50\%\), \(25\%\) and \(25\%\) respectively for masses above \(\sim 800~\hbox {GeV}\) [21]. In this study, we consider the pair production of D quarks and their subsequent decay in the \(D\rightarrow Zd\) channel to explore the discovery prospects of two possible future collider scenarios.

3 Considered collider scenarios

3.1 High-luminosity LHC

The LHC reached its design peak luminosity of \(10^{34}\ \mathrm{cm}^{-2} \, \mathrm{s}^{-1}\) in June 2016. The High-Luminosity Large Hadron Collider (HL-LHC) project aims to improve the performance of the LHC in order to increase the potential for discoveries after 2027 [22,23,24]. To achieve this, HL-LHC will rely on several cutting-edge technologies, such as 11–12 T superconducting magnets; very compact superconducting cavities with ultra-precise phase control for beam rotation; new technology for beam collimation; and long high-power superconducting links with zero energy dissipation. HL-LHC is expected to reach a peak luminosity of \(5 \times 10^{34}\ \mathrm{cm}^{-2} \, \mathrm{s}^{-1}\), allowing an integrated luminosity of \(250\, \mathrm{fb}^{-1}\) per year. This corresponds to an integrated luminosity of \(3000\, \mathrm{fb}^{-1}\) over an operation period of about a dozen years after the upgrade, which is ten times the amount the LHC is expected to collect after 12 years of operation.

To meet the challenges brought by the higher luminosity at the HL-LHC, such as higher radiation dose, particle rate, pileup, and event rate, the ATLAS and CMS detectors will undergo an extensive upgrade (i.e. the “Phase 2” upgrade). The ATLAS inner tracker (ITk) is being completely rebuilt for Phase 2, as a result of which the pseudorapidity coverage will extend up to \(|\eta | = 4\). Moreover, new front-end electronics and a new readout system in the calorimeters will allow triggering on higher resolution objects at the lowest trigger level at an increased rate, and will lead to improved reconstruction. In addition, new inner barrel chambers will be installed in the muon detector system for increased coverage. The CMS detector will similarly undergo major upgrades, which include a replacement of the silicon strip and pixel components in the tracking detector, increasing the coverage up to \(|\eta |= 4\). The hadronic calorimeter will be read out by silicon photomultipliers. The endcap electromagnetic and hadron calorimeters will be replaced with a new combined sampling calorimeter that will provide highly segmented spatial information in both the transverse and longitudinal directions, as well as high-precision timing information. The muon system will be extended with new chambers in the forward region, bringing the coverage up to \(|\eta | = 2.8\). Additionally, both ATLAS and CMS envisage adding timing detectors so that timing information can be used in reconstruction [25, 26].

3.2 Future Circular Collider

The Future Circular Collider (FCC) was launched as a world-wide international collaboration hosted at CERN in response to the 2013 Update of the European Strategy for Particle Physics (EPPSU) [27, 28]. In the 2020 Update of EPPSU, it has been proposed to investigate the technical and financial feasibility of FCC [29]. FCC scenarios are studied for three different types of particle collisions, namely hadron (proton–proton and heavy ion), electron–positron and proton–electron collisions. The proposed energy frontier proton–proton collider, FCC-hh, which is considered in this study, is designed to provide proton–proton collisions with a centre-of-mass energy of 100 TeV and an integrated luminosity of \(20~\mathrm{ab}^{-1}\) for 25 years of operation. The FCC-hh collider layout has two high luminosity interaction points for general purpose detectors. The factor of 7 increase in energy over the present LHC requires a vast modification compared to the designs of current general purpose LHC detectors. The detectors for 100 TeV should be able to measure multi-TeV jets, leptons and photons from heavy resonances with masses up to 50 TeV, while at the same time measuring the known SM processes with high precision and remaining sensitive to a broad range of BSM signatures with moderate momentum. In addition, future detectors will need to operate at about 1000 pileup events per bunch crossing. The detector acceptance is targeted to increase up to \(|\eta | = 4.4\) in order to improve sensitivity to vector boson fusion processes.

4 CutLang analysis description language and runtime interpreter

As mentioned earlier, one goal of this study is to test the feasibility of the new “analysis description language” approach in analysis writing and running in phenomenological studies. An analysis description language is a domain-specific, declarative language designed to express the physics contents of an analysis in a standard and unambiguous way. In this approach, the description of the analysis components is decoupled from the software framework that runs the analysis.

This study uses the language ADL [30,31,32], which consists of a plain text file containing blocks with a keyword-value structure. The blocks make clear the separation of analysis components such as object definitions, variable definitions, and event selections while the keywords specify analysis concepts and operations. The syntax includes mathematical and logical operations, comparison and optimization operators, reducers, four-vector algebra and common HEP-specific functions (e.g. \(\delta \phi \), \(\delta R\), etc.). ADL files can refer to self-contained functions encapsulating variables with complex algorithms (e.g. \(M_{T2}\), aplanarity, etc.) or non-analytic variables (e.g. efficiency tables, machine learning discriminators, etc.).

ADL can be used for performing an analysis by any framework capable of interpreting and running it. Here, we use CutLang [17, 18], a runtime interpreter that operates directly on events without the need for compilation. CutLang is written in C++ and is based on ROOT [33] classes for Lorentz vector operations and histogramming. It uses automatically generated dictionaries and grammar rules based on the Unix tools Lex and Yacc [34]. The typical output of an analysis in CutLang is a file containing surviving events and histograms, which can be used for statistical analysis.

The absence of any need to write or compile code, combined with the simple, human-readable nature of the ADL syntax, makes it a very practical tool for quickly performing phenomenological analyses such as the one in this study.

5 Search for down-type iso-singlet quarks

5.1 Signal and background processes

The main tree level Feynman diagrams for the pair production of D quarks at hadron colliders are presented in Fig. 1. The model Lagrangian in Eq. (2) was implemented in the tree level event generator CompHEP [35, 36]. The resulting pair production cross sections at generator level for HL-LHC and FCC-hh for the gg and \(q\bar{q}\) channels and their sum are shown in Fig. 2 as a function of D quark mass. The pair production cross section is somewhat smaller than the single production cross section: for example, for a D quark of 1 TeV the former is 38.6 fb, whereas the latter is 94.5 fb. However, since the single production results depend heavily on the mixing angle and the SM background is especially large due to QCD jets, this paper focuses on pair production.

Fig. 1 Tree level Feynman diagrams for the process \(pp\rightarrow D\bar{D}\)

Fig. 2 \(pp \rightarrow D\bar{D}\), \(q\bar{q} \rightarrow D\bar{D}\) and \(gg \rightarrow D\bar{D}\) cross sections vs D quark mass for HL-LHC and FCC-hh energies, calculated using CompHEP. The \(d-D\) mixing angle is taken as \(\sin {\phi } = 0.035\)

The \(E_6\) GUT model does not predict the masses of the iso-singlet quarks. Therefore, this study scans some plausible values for the D quark mass (up to \(2500~\hbox {GeV}\)) to estimate the experimental reach at both HL-LHC and FCC-hh machines. The iso-singlet quarks are expected to immediately decay into SM particles due to their large masses. In this analysis, we have considered the decay process \(D\bar{D}\rightarrow ZZd\bar{d}\), with subsequent leptonic decays of both Z bosons, \(Z \rightarrow \ell ^+\ell ^-\).
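To give a feeling for how small the effective cross section of this decay chain is, the following Python sketch multiplies a pair production cross section by the squared branching fractions of both decay steps and by an integrated luminosity. The helper name is hypothetical. The default \(D\rightarrow Zd\) branching fraction is the 25% quoted in Sect. 2, while the \(Z\rightarrow \ell ^+\ell ^-\) (\(\ell = e, \mu \)) value of about 6.73% is an approximate, externally taken number and should be treated as an assumption. Pairing the 38.6 fb quoted above for a 1 TeV D quark with \(3000\ \hbox {fb}^{-1}\) is done purely for illustration, and no detector acceptance or selection efficiency is included.

```python
def expected_events(sigma_fb, lumi_fb_inv, br_d_to_zd=0.25, br_z_to_ll=0.0673):
    """Expected number of D Dbar -> ZZ d dbar -> 4l + 2j events before selection.

    sigma_fb     : pair production cross section in fb
    lumi_fb_inv  : integrated luminosity in fb^-1
    br_d_to_zd   : branching fraction of D -> Zd (applied to both D quarks)
    br_z_to_ll   : branching fraction of Z -> ee or mumu (applied to both Z bosons)
    """
    effective_sigma_fb = sigma_fb * br_d_to_zd**2 * br_z_to_ll**2
    return effective_sigma_fb * lumi_fb_inv


# Illustrative inputs only: roughly 33 events before any selection.
print(round(expected_events(38.6, 3000.0), 1))
```

The two squared branching fractions reduce the cross section by a factor of a few thousand, which is why high integrated luminosity is essential for this channel.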

The main SM background to the signal process is \(pp \rightarrow ZZjj\) production, with subsequent leptonic decays of both Z bosons. The SM cross-section of \(pp \rightarrow ZZjj\) is calculated using MadGraph5_aMC@NLO [37] considering up to 4 QED and QCD interaction vertices and found to be 2.918 pb and 68.04 pb for HL-LHC and FCC-hh, respectively.

Processes with a Higgs boson decaying to two Z bosons also provide final states resembling that of the signal. However, they are not considered significant backgrounds in this study, both because of their relatively low effective cross sections and because one of the Z bosons from the Higgs boson decay is virtual. At 14 TeV, the Higgs production cross sections are estimated as 54.6 pb from gluon fusion, 4.3 pb from VBF, 1.5 pb from WH, 0.98 pb from ZH and 0.55 pb from bbH production channels. To obtain an estimate for ZZjj final states, these numbers are multiplied by the \(h\rightarrow ZZ\) branching fraction and the hadronic branching fraction of W and Z bosons. Moreover, the gluon fusion cross section is corrected to account for multi-jet events [38]. Extrapolating linearly from 8 and 13 TeV results, the \(h+2j\) cross section from gluon fusion is estimated as 5.4 pb at NLO. Folding in the appropriate branching fractions, the total effective cross section for the Higgs-related backgrounds becomes \(\sim 0.31~\hbox {pb}\), which is a small fraction of the direct ZZjj production cross section. The approximate estimate of these processes for 100 TeV is \(\sim 5~\hbox {pb}\), which is similarly small compared to the SM ZZjj cross section. Since one of the Z bosons originating from the Higgs decay would be virtual, the majority of such events would be rejected by the requirement of two reconstructed Z bosons having an invariant mass of 91.2 GeV in our analysis.

The \(E_6\) model signal events with D quarks decaying to SM particles and the SM background events were generated using CompHEP and MadGraph5_aMC@NLO, respectively. The CompHEP setup was adjusted to impose a generator level requirement of 10 GeV on the transverse momenta of the SM d quarks originating from the \(D\rightarrow Zd\) decay. The NNPDF 3.1 parton distribution function set [39], which is the most up-to-date set available, has been used for both 14 and 100 TeV. Further decays, showering and hadronization processes were simulated using Pythia6 [40]. Pythia was set up to only allow electron and muon decays of the Z bosons. Subsequently, the detector effects were modelled with the fast detector simulation program Delphes [41] using the configurations [42, 43] for generic HL-LHC and FCC-hh detectors.

5.2 Object and event reconstruction and selection

The complete object and event reconstruction and selection algorithm for the analysis is given in ADL format in Table 1. This is, in fact, the exact ADL code run in CutLang to produce the results presented in this paper.

The analysis is performed in the \(4\ell +2j\) channel, and thus uses leptons and jets. Both for HL-LHC and FCC-hh cases, leptons considered are electrons and muons, which are both required to have transverse momentum \(p_T > 20~\hbox {GeV}\) and pseudorapidity \(|\eta | < 4\). Electrons (muons) are required to have an isolation of 0.1 (0.2) within a cone of \(dR < 0.3\). Jets are reconstructed with the anti-\(k_T\) algorithm with a radius of \(R=0.5\), and are required to have \(p_T > 50~\hbox {GeV}\) (which is higher than generator level requirement) and \(|\eta | < 4\). Increased pseudorapidity acceptance at the HL-LHC and FCC-hh detectors compared to LHC will provide an increased sensitivity for the analysis. Events are required to have at least 4 leptons and at least 2 jets as defined above.
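The kinematic part of the object and event selection described above can be sketched in Python as follows. This is only an illustration of the logic, not the ADL code of Table 1: the `Particle` class and the function names are hypothetical, and the lepton isolation requirement is omitted because it depends on detector-level quantities.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity


def select_leptons(leptons, pt_min=20.0, eta_max=4.0):
    """Keep electrons and muons with pT > 20 GeV and |eta| < 4."""
    return [l for l in leptons if l.pt > pt_min and abs(l.eta) < eta_max]


def select_jets(jets, pt_min=50.0, eta_max=4.0):
    """Keep jets with pT > 50 GeV and |eta| < 4."""
    return [j for j in jets if j.pt > pt_min and abs(j.eta) < eta_max]


def passes_preselection(leptons, jets):
    """Require at least 4 selected leptons and at least 2 selected jets."""
    return len(select_leptons(leptons)) >= 4 and len(select_jets(jets)) >= 2


# Toy event: one lepton is too soft and one jet is too forward.
leptons = [Particle(45, 0.3), Particle(38, -1.2), Particle(25, 2.9),
           Particle(22, -3.5), Particle(12, 0.1)]
jets = [Particle(320, 0.4), Particle(150, -2.0), Particle(80, 4.3)]
print(passes_preselection(leptons, jets))
```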

Table 1 Analysis description using the ADL/CutLang syntax. This description can be directly processed with CutLang over events

5.2.1 Leptonic Z boson reconstruction

The two Z boson candidates from the D decay are reconstructed from the selected leptons. For an efficient Z boson reconstruction, we consider the following criteria:

  1. Mass of the reconstructed Z boson candidate should be as close as possible to 91.2 GeV,

  2. the Z boson candidate should be flavour and charge neutral (i.e., reconstructed from an \(e^+e^-\) or a \(\mu ^+ \mu ^-\) pair).

In this analysis, we focus on final states with Z bosons of moderate momentum, which decay to non-collimated leptons that can be independently reconstructed. However, especially at the FCC-hh energies, higher mass D quarks yield a Z boson \(p_T\) spectrum with a larger component of boosted Z bosons that decay to collimated lepton pairs. Such collimated lepton pairs would partially fail to be identified as two individual leptons due to the lepton isolation requirement and would be counted as a single lepton, resulting in the event failing the 4 lepton criterion. A more effective treatment of the boosted final states would require explicit tagging of the boosted Z bosons via collimated lepton jets. These boosted channels can be added when collimated lepton jet tagging performance or simulation for the FCC-hh conditions becomes available, and they would increase the analysis sensitivity.

For the resolved final state, leptons are paired to reconstruct both Z bosons simultaneously in the \(\chi ^2\) expression below, which both selects the dilepton combinations with masses as close as possible to the measured Z mass of 91.2 GeV and ensures the same flavor requirement on dileptons in a candidate:

$$\begin{aligned} \nonumber \chi ^2_{ZZ}\equiv & {} (m_{Z1} - 91.2)^2 + (m_{Z2} - 91.2)^2\nonumber \\&+ \left( 999\times PdgID\left[ Z_1\right] \right) ^{2} + \left( 999 \times PdgID\left[ Z_2\right] \right) ^{2}.\nonumber \\ \end{aligned}$$
(3)

More information on technical implementation of Z reconstruction and the \(\chi ^2_{ZZ}\) in ADL and CutLang is given in Appendix A. The reconstructed Z candidates are additionally required to have a total electric charge of 0. Mass distributions of both Z candidates reconstructed from \(e^+e^-\) and \(\mu ^+\mu ^-\) pairs are shown in Fig. 3 for different D quark masses for HL-LHC and FCC-hh.
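The simultaneous pairing expressed by Eq. 3 can be sketched in Python as below. This is an independent illustration, not the ADL/CutLang implementation of Appendix A. Two assumptions are made: each lepton is represented by a hypothetical tuple `(E, px, py, pz, pdg_id)`, and \(PdgID[Z]\) is taken to be the sum of the PDG identifiers of the two constituent leptons, so that it vanishes only for same-flavour, opposite-charge pairs.

```python
import math
from itertools import combinations

Z_MASS = 91.2  # GeV, the value used in Eq. 3


def pair_mass(a, b):
    """Invariant mass of two leptons given as (E, px, py, pz, pdg_id)."""
    e, px, py, pz = (a[k] + b[k] for k in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))


def chi2_zz(leptons, pair1, pair2):
    """Eq. 3: mass terms plus a large penalty for a non-zero PDG id sum."""
    total = 0.0
    for i, j in (pair1, pair2):
        mass = pair_mass(leptons[i], leptons[j])
        pdg_sum = leptons[i][4] + leptons[j][4]  # assumed meaning of PdgID[Z]
        total += (mass - Z_MASS) ** 2 + (999.0 * pdg_sum) ** 2
    return total


def best_z_pairing(leptons):
    """Return (chi2, pair1, pair2) minimising Eq. 3, or None if < 4 leptons."""
    best = None
    for a, b, c, d in combinations(range(len(leptons)), 4):
        # Three distinct ways to split four leptons into two pairs.
        for p1, p2 in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            value = chi2_zz(leptons, p1, p2)
            if best is None or value < best[0]:
                best = (value, p1, p2)
    return best


# Toy event: one Z -> e- e+ and one Z -> mu- mu+, each decaying at rest.
leptons = [(45.6, 45.6, 0.0, 0.0, 11), (45.6, -45.6, 0.0, 0.0, -11),
           (45.6, 0.0, 45.6, 0.0, 13), (45.6, 0.0, -45.6, 0.0, -13)]
best = best_z_pairing(leptons)
print(best[1], best[2])
```

The factor 999 makes any mixed-flavour or same-sign pairing far more costly than a realistic mass mismatch, so such pairings are selected only if no valid combination exists.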

Fig. 3 Invariant mass distribution for both reconstructed Z boson candidates for HL-LHC (top) and FCC-hh (bottom) conditions. Candidates are reconstructed from both \(e^+e^-\) and \(\mu ^+\mu ^-\) pairs

5.2.2 D quark reconstruction

Each D quark candidate (\(D_1\) and \(D_2\)) is reconstructed from a Z boson candidate and a jet. Once again, the reconstruction is based on a \(\chi ^2\) optimization which takes into account the following conditions:

Fig. 4 Transverse momentum distribution for both jets used in D quark reconstruction for HL-LHC (top) and FCC-hh (bottom) conditions. The jets are selected by minimizing the condition defined in Eq. 4

Fig. 5 Distribution of angular distance between the two reconstructed D quark candidates \(D_1\) and \(D_2\) for HL-LHC (top) and FCC-hh (bottom) conditions. The jets are selected by minimizing the condition defined in Eq. 4

  1. D quark mass is presumed unknown. However, masses of the two reconstructed D quark candidates should be as close as possible to each other. We express this condition as:

    $$\begin{aligned} \chi ^2_{m_D} \equiv ((m_{D_1} - m_{D_2})/m_D )^2 , \end{aligned}$$
    (4)

    where \(m_{D_1}\) and \(m_{D_2}\) are the invariant masses of the two D quark candidates and \(m_D = (m_{D_1} + m_{D_2})/2\).

  2. Transverse momentum of the jets directly originating from the D quark decay is expected to be high. To ensure selecting jets with high momentum, we use the Heaviside step function with a weight factor:

    $$\begin{aligned} \nonumber \chi ^2_{p_{T,j}}\equiv & {} H(p_{T,j}^{cut} - p_{T,j_1}) \times ((p_{T,j}^{cut} / p_{T,j_1}) - 1.0) \\&+ H(p_{T,j}^{cut} - p_{T,j_2}) \times ((p_{T,j}^{cut} / p_{T,j_2}) - 1.0),\nonumber \\ \end{aligned}$$
    (5)

    where \(p_{T,j_1}\) and \(p_{T,j_2}\) are the transverse momenta of the jets and \(p_{T,j}^{cut}\) is the selection threshold to be applied to the jet transverse momenta. To determine the optimal value for this threshold which would obtain the best signal-background separation, we show the \(p_T\) distributions of the candidate jets in Fig. 4 for signals with different \(m_D\) and the background at HL-LHC and FCC-hh. The jets are selected by minimizing the condition defined in Eq. 4. Due to its much higher center-of-mass energy, FCC-hh yields a much harder jet \(p_T\) spectrum. Based on these distributions, we select \(p_{T,j}^{cut}= 300\) and \(500~\hbox {GeV}\) as thresholds for HL-LHC and FCC-hh respectively.

  3. Angular separation between the two D quarks,

    $$\begin{aligned} dR_{DD} = \sqrt{(\eta _{D_1} - \eta _{D_2})^2 + (\phi _{D_1} - \phi _{D_2})^2} , \end{aligned}$$
    (6)

    should reflect that the D quarks are centrally produced, with negligible Lorentz boost. The most characteristic configuration would correspond to D quarks having \(|\eta | \simeq 0\) and being back-to-back on the transverse plane, which gives \(\delta \phi \simeq \pi \), where \(\delta \phi \) represents the \(\phi \) difference of the two particles. As a result, dR is expected to be dominated by \(\delta \phi \) and peak around 3.14. This can be seen in Fig. 5, which shows the dR distributions for signals and the background for HL-LHC and FCC-hh, after applying a minimization based on Eq. 4. Both signals and the background peak around 3.14, but the backgrounds display a wider distribution. Based on this information, we define a variable that can be minimized to zero:

    $$\begin{aligned} \chi ^2_{dR_{DD}} \equiv (dR_{DD}/3.14 - 1.0)^2. \end{aligned}$$
    (7)

We then combine the three conditions in Eqs. 4, 5 and 7 to obtain a \(\chi ^2\) and select the D candidates by running a minimization based on the sum:

$$\begin{aligned} \chi ^2_{DD} \equiv \chi ^2_{m_D} + \chi ^2_{p_{T,j}} + \chi ^2_{dR_{DD}} \simeq 0 . \end{aligned}$$
(8)

Here, we tried different relative weightings of \(\chi ^2_{m_D}\), \(\chi ^2_{p_{T,j}}\) and \(\chi ^2_{dR_{DD}}\), but the equal weighting above gave the best result.
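The three terms of Eqs. 4, 5 and 7 and their sum in Eq. 8 can be written compactly in Python, as in the sketch below. The function names are hypothetical, and the sketch evaluates the \(\chi ^2\) for one given assignment of jets to Z candidates; the actual analysis minimizes it over all assignments. The helper `delta_r` wraps the azimuthal difference into \([-\pi , \pi ]\), which is the usual convention but is an assumption here, since Eq. 6 is written without it.

```python
import math


def heaviside(x):
    """Step function: 1 for positive arguments, 0 otherwise."""
    return 1.0 if x > 0.0 else 0.0


def chi2_mass(m_d1, m_d2):
    """Eq. 4: relative mass difference of the two D candidates, squared."""
    mean = 0.5 * (m_d1 + m_d2)
    return ((m_d1 - m_d2) / mean) ** 2


def chi2_jet_pt(pt_j1, pt_j2, pt_cut):
    """Eq. 5: penalise only jets whose pT is below the threshold."""
    return sum(heaviside(pt_cut - pt) * (pt_cut / pt - 1.0)
               for pt in (pt_j1, pt_j2))


def delta_r(eta1, phi1, eta2, phi2):
    """Eq. 6, with the phi difference wrapped into [-pi, pi] (assumption)."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def chi2_dr(dr_dd):
    """Eq. 7: deviation of the D-D angular distance from 3.14."""
    return (dr_dd / 3.14 - 1.0) ** 2


def chi2_dd(m_d1, m_d2, pt_j1, pt_j2, dr_dd, pt_cut):
    """Eq. 8: unweighted sum of the three terms."""
    return chi2_mass(m_d1, m_d2) + chi2_jet_pt(pt_j1, pt_j2, pt_cut) + chi2_dr(dr_dd)


# Signal-like toy candidate with the HL-LHC threshold of 300 GeV.
print(chi2_dd(1000.0, 980.0, 420.0, 350.0, 3.05, pt_cut=300.0))
```

Each term is dimensionless and vanishes for an ideal signal configuration, which is what allows the three terms to be added without further normalization.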

5.2.3 Final selection on \(\chi _{DD}^2\)

Figure 6 shows the distribution of \(\chi ^2_{DD}\) values obtained after minimization for HL-LHC (top) and FCC-hh (bottom) conditions for signals with different \(m_D\) and background. As expected, the signals exhibit a distribution much closer to zero compared to the background. A selection of \(\chi _{DD}^2 < 0.5\) was applied to further reduce the SM contamination. The threshold value was chosen to ensure a high signal significance.

Fig. 6 Distribution of \(\chi ^2_{DD}\) values obtained after minimization for HL-LHC (top) and FCC-hh (bottom) conditions

Table 2 Percentage selection efficiencies for various signals and background for the HL-LHC selection
Table 3 Percentage selection efficiencies for various signals and background for the FCC-hh selection
Fig. 7 Distribution of average reconstructed D quark invariant mass \((m_{D_1} + m_{D_2})/2\) for HL-LHC conditions for background and signals with \(m_D = 600~\hbox {GeV}\) (top), \(800~\hbox {GeV}\) (middle) and \(1000~\hbox {GeV}\) (bottom). Results of the fit to the sum of a Gaussian and Crystal Ball functions are also shown

Fig. 8 Distribution of average reconstructed D quark invariant mass \((m_{D_1} + m_{D_2})/2\) for FCC-hh conditions for background and signals with \(m_D = 800~\hbox {GeV}\) (top), \(1600~\hbox {GeV}\) (middle) and \(2500~\hbox {GeV}\) (bottom). Results of the fit to the sum of a Gaussian and Crystal Ball functions are also shown

5.3 Results

The percentage selection efficiencies for signal and background events for the event selection criteria described above are given in Tables 2 and 3 for HL-LHC and FCC-hh. Overall signal selection efficiency is seen to increase as D mass increases.

The distributions of the average reconstructed D quark invariant mass \((m_{D_1} + m_{D_2})/2\) in the signal and background events that remain after selection are shown in Figs. 7 and 8 for different generated D quark masses for HL-LHC and FCC-hh, respectively. Signal events are seen to peak visibly over the falling background distributions. In order to reduce the effect of statistical fluctuations due to the limited number of simulated events, the signal and background distributions are modelled with a Gaussian function and a Crystal Ball function, respectively. The signal and background yields are obtained from the distribution of all events by fitting it to the sum of these two functions. The initial fit parameters for the Crystal Ball and Gaussian functions were determined by performing independent fits to the signal and background distributions. The resulting fits are also shown in the same figures.
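For readers unfamiliar with the fit model, the two shapes can be written down explicitly as in the Python sketch below. It uses the conventional, unnormalized Crystal Ball form with a Gaussian core and a power-law tail on the low side; the tail orientation, the parametrization and all parameter values used in the actual fits are not specified here, so everything numerical in this sketch is a placeholder.

```python
import math


def gaussian(x, norm, mean, sigma):
    """Unnormalised Gaussian, used here as the signal shape."""
    t = (x - mean) / sigma
    return norm * math.exp(-0.5 * t * t)


def crystal_ball(x, norm, mean, sigma, alpha, n):
    """Unnormalised Crystal Ball: Gaussian core, power-law tail for t <= -|alpha|."""
    t = (x - mean) / sigma
    a = abs(alpha)
    if t > -a:
        return norm * math.exp(-0.5 * t * t)
    # Coefficients chosen so the function is continuous at t = -|alpha|.
    coeff = (n / a) ** n * math.exp(-0.5 * a * a)
    base = n / a - a - t
    return norm * coeff * base ** (-n)


def total_model(x, signal_pars, background_pars):
    """Sum of the signal Gaussian and the background Crystal Ball."""
    return gaussian(x, *signal_pars) + crystal_ball(x, *background_pars)


# Placeholder parameters: (norm, mean, sigma) and (norm, mean, sigma, alpha, n).
signal_pars = (5.0, 1000.0, 50.0)
background_pars = (2.0, 600.0, 150.0, 1.5, 2.0)
print(total_model(1000.0, signal_pars, background_pars))
```

In practice such a model would be passed to a fitting package, with the independent signal-only and background-only fits providing the starting values.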

The fit results are then used for estimating the final signal and background yields, denoted as S and B. These are obtained by integrating the fitted Gaussian and Crystal Ball functions over a mass window of two standard deviations around the Gaussian mean. The obtained values for each D quark mass are then used for calculating the signal significance \(\sigma _{DD}\) defined as:

$$\begin{aligned} \sigma _{DD} \equiv \sqrt{2\times \left[ \left( S+B\right) \ln {\left( 1+\frac{S}{B}\right) } -S \right] } . \end{aligned}$$
(9)
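Eq. 9 is straightforward to evaluate numerically. The following Python sketch (the function name is hypothetical) implements it and compares the result with the simpler \(S/\sqrt{B}\) estimate, to which Eq. 9 reduces when \(S \ll B\). The yields used are arbitrary illustrative numbers, not values from the tables of this paper.

```python
import math


def significance(s, b):
    """Signal significance of Eq. 9 for signal yield s and background yield b."""
    if b <= 0.0:
        raise ValueError("background yield must be positive")
    if s <= 0.0:
        return 0.0
    # log1p keeps precision when s/b is small.
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))


# Arbitrary illustrative yields.
s, b = 10.0, 100.0
print(round(significance(s, b), 3), round(s / math.sqrt(b), 3))
```

For larger \(S/B\) the two estimates diverge, with \(S/\sqrt{B}\) overestimating the significance, which is why the full expression is preferable for low-background searches such as this one.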

The yields S and B along with the significance obtained for each simulated mass point are shown in Tables 4 and 5 for HL-LHC and FCC-hh, respectively. Signal significance values are also shown in Fig. 9, plotted against the D quark mass. A linear function is fitted to the plot to estimate the dependence of significance on D quark mass. The D quark mass values, for which it would be possible to make an observation (\(3\sigma \)) or a discovery (\(5\sigma \)), are then calculated from the linear function obtained from the fit, and are shown in Table 6 for HL-LHC and FCC-hh. Finally, the integrated luminosities required for \(3\sigma \) observation and \(5\sigma \) discovery at HL-LHC and FCC-hh are plotted versus D quark mass in Fig. 10.
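The extraction of the mass reach from a linear fit can be sketched in Python as follows. The mass and significance values below are synthetic placeholders lying on an exact straight line, not the contents of Tables 4 and 5. The last helper shows one plausible way to obtain the required luminosity: since S and B both scale linearly with the integrated luminosity, the significance of Eq. 9 scales with its square root. Whether Fig. 10 was produced in exactly this way is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mass_reach(slope, intercept, target_sigma):
    """Mass at which the fitted line crosses the target significance."""
    return (target_sigma - intercept) / slope


def required_luminosity(z_ref, lumi_ref, target_sigma):
    """Luminosity needed for target_sigma, given z_ref obtained at lumi_ref."""
    return lumi_ref * (target_sigma / z_ref) ** 2


# Synthetic placeholder points (GeV, significance).
masses = [600.0, 800.0, 1000.0, 1200.0]
sigmas = [7.0, 6.0, 5.0, 4.0]
slope, intercept = fit_line(masses, sigmas)
print(mass_reach(slope, intercept, 5.0), mass_reach(slope, intercept, 3.0))
print(required_luminosity(2.5, 3000.0, 5.0))
```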

6 Conclusions

In this paper, we studied the feasibility of discovering pair-produced down-type iso-singlet quarks D at the High Luminosity LHC and in the hadronic scenario for the Future Circular Collider. The search was designed in the \(4\ell + 2j\) channel, targeting the \(D \rightarrow Zd \rightarrow \ell ^+\ell ^- d\) decay mode, which is not accessible at the LHC. Despite its relatively low sensitivity, this channel is expected to provide the most precise reconstruction of the D quark mass. Furthermore, in case of a D quark discovery through a higher sensitivity channel, the \(ZZ \rightarrow 4\ell \) channel would help to estimate relative branching ratios, thus leading to a preliminary understanding of the underlying model properties. However, extracting further information on the model would require observing and measuring iso-singlet partners of different quark types.

The analysis consisted of a basic event selection followed by a two-step reconstruction of the D quark masses, where the Z bosons were reconstructed in the first step. A \(\chi ^2\) optimization was used for finding the combination giving the best D quark candidates. A further selection was applied on the \(\chi ^2\) to discriminate signal events from the background. Finally, a fit was performed on the average D quark invariant mass distribution to obtain event yields and sensitivity.

The study showed that, considering only the \(4\ell + 2j\) decay channel, a \(5\sigma \) discovery at HL-LHC is possible for D quark masses up to around 730 GeV for the full run period, while FCC-hh can reach up to 2980 GeV. It also demonstrated that FCC-hh requires about two orders of magnitude less integrated luminosity than HL-LHC for discovering D quarks at a given mass. Therefore, searches for \(E_6\) GUT models using the \(4\ell + 2j\) channel would benefit from FCC-hh. The sensitivity of FCC-hh could be further enhanced by the addition of final states with boosted Z bosons decaying to collimated lepton jets.

As a side note, this study showed an example of how extensively the analysis description language (ADL) concept and its runtime interpreter implementation, CutLang, can be used to benefit particle physics analyses. This approach allows performing the analysis algorithm steps (e.g. object definitions, object reconstructions, histogramming) in an easy and descriptive way.

Table 4 Signal and background yields and significance for different D quark masses at HL-LHC
Table 5 Signal and background yields and significance for different D quark masses at FCC-hh
Fig. 9 Signal significance as a function of D quark mass for HL-LHC and FCC-hh

Fig. 10 The integrated luminosity needed for \(3\sigma \) observation and \(5\sigma \) discovery as a function of D quark mass for HL-LHC (top) and FCC-hh (bottom)

Table 6 Upper limit on D quark masses for \(3\sigma \) observation and \(5\sigma \) discovery for HL-LHC and FCC-hh