1 Introduction

Precision measurements of standard model (SM) parameters are key objectives of the physics program of future lepton and hadron machines [1,2,3,4,5,6]. In particular, the measurement of the Higgs couplings to bottom (b) and charm (c) quarks, and gluons (g) [7,8,9,10,11,12,13], the Higgs self-coupling [14] and the precise characterisation of top quark properties, such as the top quark mass [15] and its electroweak couplings [16, 17] require an efficient reconstruction and identification of hadronic final states. Being able to efficiently identify the flavour of the parton that initiated the formation of a jet, known as jet flavour tagging, is therefore critical for the success of the physics program of future electroweak factories [18]. The large statistics of hadronic Z boson decays (\(>10^{11}\)) at future lepton and hadron machines would provide copious control samples to calibrate jet tagging algorithms in data.

Jets originating from b and c quarks contain a b or c hadron that typically travels a macroscopic distance before decaying into lighter hadrons. Compared to b, c, and up or down jets (the latter collectively referred to as ud or light jets in what follows), strange quark (s) jets contain a larger fraction of strange mesons and baryons. Gluons (g) carry a larger colour charge than quarks and thus tend to produce jets with a larger particle multiplicity. Quarks have a harder fragmentation function than gluons, which results in a larger fraction of the jet momentum being carried by a smaller fraction of the constituents. Jet flavour tagging algorithms aim at identifying these characteristic end-products of the fragmentation and hadronization of the initial parton.

The first b and c quark tagging algorithms were developed at LEP [19, 20] and the Tevatron [21, 22]. These algorithms typically rely on the detector capability to identify and measure charged tracks with a significant displacement (\(c\tau \sim 500~(150)~\upmu {\mathrm {m}}\)) from the beam axis, originating from the weak decays of long-lived B (D) mesons. In contrast, tracks from the charged hadrons produced in ud jets feature a small distance of closest approach to the interaction point. Therefore, in the case of b and c tagging, tracks are typically clustered to reconstruct possible secondary vertices (SVs). However, c tagging is more challenging than b tagging, because the properties of c jets lie between those of b jets and those of ud or g jets. The track multiplicity and the mass of the SV (expected to be large for heavy flavour jets), together with the presence of a non-isolated electron or muon indicating a semi-leptonic heavy flavour decay, are also used as discriminating variables in traditional heavy quark tagging algorithms. Such taggers, widely used in the early days of current LHC experiments [23,24,25,26] and future \(e^+e^-\) experiments [27, 28], are implemented by directly applying a selection on a combination of the track and SV properties, or by constructing a likelihood ratio or a multi-variate discriminant based on a set of jet-level properties.

Recently, a new generation of advanced machine learning based jet tagging algorithms has been developed [29,30,31,32], bringing more than an order of magnitude improvement in background rejection compared to the traditional approaches in heavy flavour and g tagging. There are three primary reasons for this success. First, significant advancements have been achieved in the architecture of the neural networks used, as well as in new jet representations that better capture the jet properties. Second, unlike traditional methods, these algorithms directly exploit low-level information, e.g., from reconstructed particles (as in the Particle-Flow algorithm [33]) or even reconstructed hits. This makes it possible to explore in much more depth the true potential of the detectors and the event reconstruction, and to capture the jet properties better than algorithms relying on jet-level observables. Moreover, the nature of each of the jet constituents, determined via particle identification (PID) techniques, is expected to provide an additional useful handle in discriminating between different jet species. Powerful particle identification capabilities based on ionisation energy loss (via dE/dx or cluster counting), or on precise time-of-flight measurements, are expected to be highly beneficial for jet flavour tagging, in particular for s tagging where the identification of charged kaons is crucial [34,35,36]. Finally, the developments in computing, e.g., graphics processing units, and the availability of very large Monte Carlo simulated and collision data samples were critical for the development of these advanced methods.

In this paper we present a general framework for building a jet flavour tagger at future colliders using fast detector simulation and state-of-the-art machine learning techniques. A major goal of the present work has been to allow for the evaluation of the impact of specific detector design options on the jet flavour tagging performance (and in turn on the physics potential) in an efficient yet precise way. To this end, we have implemented two key additions to the official \(\textsc {Delphes}\) fast simulation framework. The \(\texttt {TrackCovariance}\) module [37], described in Sect. 2, allows for a simple definition of a tracker geometry, and for the fast simulation and reconstruction of the parameters and covariance matrix of charged-particle tracks. The \(\texttt {TimeOfFlight}\) [38] and \(\texttt {ClusterCounting}\) [39] modules, described in Sect. 3, open up the possibility to model particle identification in \(\textsc {Delphes}\). Section 4 describes the input observables and the implementation of the jet flavour identification algorithm. The tagging algorithm is based on \(\textsc {ParticleNet}\) [40], using a state-of-the-art jet representation and a graph neural network (GNN) architecture. The performance of the algorithm is evaluated using one of the FCC-ee/CEPC baseline detector concepts, the IDEA [1, 41, 42] detector. Variations around the baseline are discussed, using Higgs decays taken from a Higgsstrahlung sample at \(\sqrt{s} = 240\) GeV. Finally, a discussion of the results, together with limitations of the current approach and perspectives for future work, is presented in Sect. 5.

2 Fast tracking simulation

The tracking system is a major part of modern detectors for high energy physics experiments and arguably the most relevant for jet flavour tagging, since it is responsible for reconstructing and identifying charged particles. The design of this system, its optimization, and the evaluation of its performance on many specific physics benchmarks are fundamental steps in the planning of future experiments. To this end, we have developed and included in \(\textsc {Delphes}\) a versatile and modular framework to easily study different detector configurations and to provide, for each of them, a fast simulation of the tracking performance. The corresponding module is named \(\texttt {TrackCovariance}\) [37]. In this section we present the general implementation of the algorithm, while technical details on the speed optimisation and randomisation can be found in Appendix A1.

While various attempts to calculate the track resolution analytically have been made (see for instance [43]), they usually rely on highly simplifying assumptions, such as equal spacing and equal detector resolution, which make them unsuitable for a realistic combined tracking system. The tracking system geometry is described in terms of layers. Only two types of layer geometries are considered: cylinders coaxial with the beam axis and planar disks orthogonal to the beam axis (z-direction). Each layer can either be associated with a measurement with a given resolution or be included only to describe passive material in the system. An accurate description of the material inside the tracking volume is important to properly estimate the contribution of multiple Coulomb scattering to the track resolution. Several measurement geometries are allowed: axial or stereo strips and wires, and pixels.

The tracking system is located inside the solenoid magnet generating a constant field, B, directed parallel to the z-direction. With these assumptions, charged tracks follow a helix trajectory that is described with a set of five parameters: \(\vec {\alpha }=(D,\, \varphi _0,\,C,\,z_0,\,\lambda )\). These parameters are defined at the point of closest approach (PCA) of the track to the z-axis: D is the signed transverse distance of the PCA from the z-axis, \(\varphi _0\) is the track azimuthal angle, C is the signed half curvature, \(z_0\) is the z-coordinate of the PCA and \(\lambda \) is the cotangent of the track polar angle. Given a charged particle originating at \(\vec {x}\) with momentum \(\vec {p}\) and charge Q, the parameters \(\vec {\alpha }\) are uniquely defined, as is the associated trajectory.
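As a rough illustration of how two of the five parameters follow from the particle kinematics, the sketch below evaluates the signed half curvature and \(\lambda \) for a given momentum, charge and field. It relies on the standard relation \(p_{\text {T}}\,[\text {GeV}] = 0.2998\,B\,[\text {T}]\,R\,[\text {m}]\) for a unit charge. The function names and the overall sign convention for C (taken here to follow the sign of Q) are assumptions made for this example and are not taken from the \(\texttt {TrackCovariance}\) code.

```python
import math

# pT [GeV] = K * B [T] * R [m] for a unit-charge particle.
K = 0.299792458

def half_curvature(px, py, charge, b_field):
    """Signed half curvature C = Q / (2R) in 1/m.

    Assumption: the sign of C follows the sign of the charge; other
    conventions differ by an overall sign.
    """
    pt = math.hypot(px, py)
    radius = pt / (K * b_field)      # bending radius in metres for |Q| = 1
    return charge / (2.0 * radius)

def cot_theta(px, py, pz):
    """lambda = cotangent of the polar angle = pz / pT."""
    return pz / math.hypot(px, py)
```

For a fixed field, C scales as \(1/p_{\text {T}}\), so the relative momentum resolution is directly tied to the resolution on C.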

During its motion the charged particle will cross some of the layers described in the geometry and is reconstructed as a track provided that it produces at least 6 hits. At each crossing the particle will undergo small random changes of its direction due to multiple scattering and, in the case of measurement layers, a generalized coordinate, \(d^*\), will be measured with an uncertainty given by the specific detector resolution. The track parameters are reconstructed from the measured coordinates by minimizing a \(\chi ^2\) with respect to the track parameters \(\vec {\alpha }\), defined as:

$$\begin{aligned} \chi ^2 = ~(\vec {d}-\vec {d}^*)^t\,S^{-1}(\vec {d}-\vec {d}^*), \end{aligned}$$
(2.1)

where \(\vec {d}^*\) is the array of measured coordinates and \(\vec {d}\) that of predicted coordinates, that can be computed from the track parameters \(\vec {\alpha }\) and the geometry of each measurement layer. S is the covariance matrix of all the measurements and includes contributions from the detector resolution and from the multiple scattering. The superscript t indicates the transpose of a vector or a matrix. Assuming \(\vec {d}_0\) to be the array of predicted coordinates for which the \(\chi ^2\) is minimized, then for small variations of the track parameters relative to this minimum, \(\delta \vec {\alpha }\), we have:

$$\begin{aligned} \vec {d} \simeq \vec {d}_0+\frac{\partial \vec {d}}{\partial \vec {\alpha }}\delta \vec {\alpha } = \vec {d}_0+A\delta \vec {\alpha }, \end{aligned}$$
(2.2)

where A is the derivative matrix. Including Eq. (2.2) in Eq. (2.1) we obtain:

$$\begin{aligned} \chi ^2 = (\vec {d}_0-\vec {d}^*+A\delta \vec {\alpha })^t\,S^{-1}(\vec {d}_0-\vec {d}^*+A\delta \vec {\alpha }). \end{aligned}$$
(2.3)

Differentiating twice with respect to the track parameters we obtain the inverse of the track parameter covariance matrix, C:

$$\begin{aligned} C^{-1}=\frac{1}{2}\frac{\partial ^2 \chi ^2}{\partial \vec {\alpha } \partial \vec {\alpha }} = A^t\,S^{-1}A. \end{aligned}$$
(2.4)
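The step from Eq. (2.3) to Eq. (2.4) can be made explicit as follows. This is a short worked expansion, assuming that S is symmetric and does not depend on \(\vec {\alpha }\); the symbol \(\chi ^2_0\) for the value at the minimum is introduced here only for convenience. Expanding Eq. (2.3) gives

$$\begin{aligned} \chi ^2 = \chi ^2_0 + 2\,\delta \vec {\alpha }^t A^t S^{-1}(\vec {d}_0-\vec {d}^*) + \delta \vec {\alpha }^t A^t S^{-1} A\,\delta \vec {\alpha }, \qquad \chi ^2_0 = (\vec {d}_0-\vec {d}^*)^t\,S^{-1}(\vec {d}_0-\vec {d}^*). \end{aligned}$$

The linear term vanishes because \(\vec {d}_0\) corresponds to the minimum, i.e. \(A^t S^{-1}(\vec {d}_0-\vec {d}^*) = 0\). Only the quadratic term survives the second derivative, which leaves \(C^{-1} = A^t\,S^{-1}A\).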

This equation highlights the key ingredients needed to estimate the track covariance matrix: the derivative matrix and the covariance matrix of the measurements. The former is straightforward and can be derived for every type of measurement given the track equation. The latter requires the combination of two elements: the intrinsic detector resolution and the multiple scattering contribution, as shown in the following equation:

$$\begin{aligned} S_{ij} = \sigma _i^2\,\delta _{ij}+M_{ij}, \end{aligned}$$
(2.5)

where the indices i and j identify the measurement layers, \(\sigma _i\) is the detector resolution for layer i and \(M_{ij}\) is the multiple scattering contribution. The \(M_{ij}\) term includes contributions from all scattering layers below the smaller of the two indices, as shown in the following equation:

$$\begin{aligned} M_{ij} = \sum _{1\le k<\text {min}(i,j)} (L_i-L_k)(L_j-L_k) \theta _k^2(i,j), \end{aligned}$$
(2.6)

where \(L_i\) is the distance traveled by the track to the layer i and \(\theta _k(i,j)\) the standard deviation of the multiple scattering angle generated by layer k after correcting for projection factors specific for layers i and j.
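Equations (2.5) and (2.6) can be sketched in a few lines of code. The example below is a simplified illustration, not the \(\texttt {TrackCovariance}\) implementation: it assumes that every layer is both a scattering and a measurement layer, that layers are ordered along the track, and that the scattering angle \(\theta _k\) does not depend on (i, j), i.e. the projection factors are ignored. All function names are hypothetical.

```python
def multiple_scattering_matrix(path_lengths, thetas):
    """M_ij = sum over k < min(i, j) of (L_i - L_k)(L_j - L_k) theta_k^2 (Eq. 2.6).

    Assumption: theta_k is independent of (i, j); layers use 0-based
    indices and are ordered by increasing path length.
    """
    n = len(path_lengths)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j)):
                m[i][j] += ((path_lengths[i] - path_lengths[k])
                            * (path_lengths[j] - path_lengths[k])
                            * thetas[k] ** 2)
    return m

def measurement_covariance(path_lengths, sigmas, thetas):
    """S_ij = sigma_i^2 delta_ij + M_ij (Eq. 2.5)."""
    m = multiple_scattering_matrix(path_lengths, thetas)
    n = len(path_lengths)
    return [[m[i][j] + (sigmas[i] ** 2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]
```

The off-diagonal entries of S are generated entirely by multiple scattering: a deflection in an inner layer shifts all downstream measurements coherently, which is why the measurements cannot be treated as independent.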

Once A and S have been determined for a given track, the parameter covariance matrix can be computed analytically [44] using Eq. (2.4). The resolution of the track parameters (\(p_{\text {T}}\), D, \(z_0\), \(\theta \)) obtained as a function of momentum for two different reference detector configurations proposed for a future \(e^+e^-\) collider is shown in Fig. 1. In this case, it clearly appears that the detector transparency is more important than the single point detector resolution, in particular for heavy flavour tagging tasks, which involve reconstructing and identifying tracks of mostly a few GeV.

Fig. 1 Track parameter resolution for the IDEA and CLD detector concepts for FCC-ee [1]. The dashed lines in the top left plot show the multiple scattering contribution

3 Particle identification

Particle identification techniques can play a major role in the identification of the jet flavour. In particular, as will be discussed in Sect. 4, s jets contain a significant fraction of charged kaons (\(K^{\pm }\)) compared to u or d jets that are mostly composed of charged pions (\(\pi ^{\pm }\)). Given that the performance of such algorithms heavily depends on explicit detector design choices, it is crucial to be able to first simulate appropriately the detector response and then to implement such particle identification algorithms.

Two complementary particle identification techniques have been included in the \(\textsc {Delphes}\) fast-simulation. The precise measurement of the time of arrival of tracks in the outermost part of the tracking volume, together with the momentum and the path length, provide an indirect measurement of the particle mass via the well known time-of-flight method. This method has been implemented in the \(\texttt {TimeOfFlight}\) module [38]. The cluster counting method, dN/dx, implemented in the \(\texttt {ClusterCounting}\) module [39], consists in counting the multiplicity of the primary ionization clusters produced along the track in gaseous detectors, which together with the particle momentum, can also be used to infer the particle mass. In this section we discuss the implementation of these two methods within the simulation framework.

3.1 Time-of-flight

The time-of-flight (\(t_\text {flight}\)) of a particle can be expressed as:

$$\begin{aligned} t_\text {flight} \equiv t_\text {F}- t_\text {V} = \frac{L}{\beta } = \frac{L \sqrt{p^2 + m^2}}{p} = \frac{L E}{\sqrt{E^2 - m^2}}, \end{aligned}$$
(3.1)

where \(t_\text {F}\) is the measured time after propagation, \(t_\text {V}\) is the particle time of production at the vertex, L is the total path length, and p, E and m are the momentum, energy and mass of the particle, respectively (natural units with \(c=1\) are used). Provided that the quantities L and p (or E) and \(t_\text {V}\) can be measured, the measurement of \(t_\text {flight}\) provides an estimate for the particle mass and thus a powerful handle for particle identification.

For charged particles the reconstructed mass is given by:

$$\begin{aligned} m_{\text {t.o.f.}} ^{(c)} = p \sqrt{\left( \frac{t_\text {flight}}{L}\right) ^2 - 1}. \end{aligned}$$
(3.2)
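A minimal sketch of Eq. (3.2) in laboratory units is given below. The factor c is written explicitly because the formula above is in natural units. The function name and the choice to clamp a negative radicand to zero (which can occur once resolution effects are included) are assumptions made for this example.

```python
import math

C_LIGHT = 0.299792458  # speed of light in m/ns

def tof_mass_charged(p, t_flight_ns, path_m):
    """Eq. (3.2): m = p * sqrt((c t / L)^2 - 1), with p in GeV.

    Smearing can push the radicand below zero; it is clamped to zero
    here (a choice made for this sketch).
    """
    inv_beta = C_LIGHT * t_flight_ns / path_m
    return p * math.sqrt(max(inv_beta ** 2 - 1.0, 0.0))
```

For instance, a pion with \(p=1\) GeV travelling 2 m has \(1/\beta \approx 1.0097\), so a sub-percent precision on \(t_\text {flight}/L\) is needed before the mass becomes measurable.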

The initial position (and therefore L) and the particle momentum p are reconstructed by means of the inner/outer tracking system, and simulated with the procedure described in Sect. 2. The production time of a particle at the vertex, \(t_\text {V}\), can be estimated indirectly with the following procedure. Assuming that the beamspot has a small time (\(\sigma _\text {B,t}\)) and longitudinal (\(\sigma _\text {B,z}\)) spread compared to the precision of the timing measurement device, the time of the primary vertex can be simply taken as \(t_\text {PV} =0\). However, if the particle originates from a highly displaced vertex (e.g. from \(K_\text {S}\) or \(\Lambda \) decays), assuming \(t_\text {V} =0\) can lead to a severe over-estimate of \(t_\text {flight}\). A more accurate estimate for the vertex time corresponds to \(t_\text {V} = \frac{r_\text {V}}{\beta _\text {V}}\), where \(r_\text {V}\) is the distance of the vertex to the origin and \(\beta _\text {V}\) is the vertex velocity, computed from its outgoing particles. In the current study we assume that the initial time at the vertex can be reconstructed perfectly and therefore take it from the Monte Carlo simulation. We define the significance (in number of standard deviations) of the \(K/\pi \) hypothesis separation, in terms of the mean \(\mu \) and resolution \(\sigma \) of the discriminating observable for each particle species, as:

$$\begin{aligned} N_\sigma = 2 \; \frac{\mu _\pi - \mu _K}{\sigma _\pi + \sigma _K}. \end{aligned}$$
(3.3)

The time-of-flight distribution of charged kaons and pions emitted at 90\(\,^\circ \) is shown in Fig. 2a, assuming a 30 ps timing resolution, which allows for an efficient 3\(\sigma \) \(K/\pi \) discrimination for momenta \(p<3.5\) GeV. For reference, a 3 ps timing resolution leads to 3\(\sigma \) separation for momenta \(p<10\) GeV.
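The momentum dependence of the separation can be explored with the following sketch, which combines Eqs. (3.1) and (3.3). It assumes a straight path of 2 m (a placeholder value, not the actual IDEA geometry), the same timing resolution for both hypotheses, and takes the absolute value of the difference since kaons arrive later than pions. The numbers it produces are therefore not expected to reproduce Fig. 2 exactly.

```python
import math

C_LIGHT = 0.299792458   # speed of light in m/ns
M_PION = 0.13957        # charged pion mass in GeV
M_KAON = 0.49368        # charged kaon mass in GeV

def t_flight_ns(p, mass, path_m):
    """Eq. (3.1) with explicit c: t = L * sqrt(p^2 + m^2) / (p c)."""
    return path_m * math.sqrt(p * p + mass * mass) / (p * C_LIGHT)

def k_pi_separation(p, path_m=2.0, sigma_t_ns=0.030):
    """Eq. (3.3) with mu = mean time-of-flight and sigma_pi = sigma_K = sigma_t.

    path_m = 2.0 is a placeholder for this sketch, not the IDEA geometry.
    """
    dt = t_flight_ns(p, M_KAON, path_m) - t_flight_ns(p, M_PION, path_m)
    return 2.0 * abs(dt) / (sigma_t_ns + sigma_t_ns)
```

At high momentum the time difference falls approximately as \(1/p^2\), so a tenfold better timing resolution extends the momentum reach by roughly a factor \(\sqrt{10}\).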

Fig. 2 a Time-of-flight for \(K^{\pm }\) and \(\pi ^{\pm }\) tracks at \(\theta = 90\,^\circ \) as a function of momentum in the IDEA detector drift chamber. b Reconstructed \(m_{\text {t.o.f.}}\) for \(K^{\pm }\), \(\pi ^{\pm }\), \(K_{L}\), protons and neutrons with momenta \(p=1\) GeV

For neutral particles the mass can be reconstructed from the energy measurement provided by the calorimeters:

$$\begin{aligned} m_{\text {t.o.f.}} ^{(n)} = E \sqrt{ 1 - \left( \frac{L}{t_\text {flight}}\right) ^2}. \end{aligned}$$
(3.4)

At low momenta, where the time-of-flight method is expected to provide good identification capabilities, the calorimetric energy measurement is sub-optimal and leads to poor \(m_{\text {t.o.f.}}\) resolution for neutral particles compared to charged particles. Moreover, the vertex time determination is inaccessible for neutral particles. The assumption \(t_\text {PV} =0\) for all neutral particles leads to an additional uncertainty on the \(m_{\text {t.o.f.}}\) estimate. As an example, the reconstructed \(m_{\text {t.o.f.}}\) for \(K^{\pm }\), \(\pi ^{\pm }\), \(K_{L}\), protons and neutrons with momenta \(p=1\) GeV, where separation is close to optimal, is shown in Fig. 2b.

3.2 Cluster counting

The cluster counting technique is expected to provide improved particle identification relative to the more commonly used dE/dx methods in large drift chambers or TPCs [45, 46]. In addition, it does not require the tuning of truncated mean algorithms to suppress the large Landau tails present in the dE/dx distribution. The number of ionization clusters per unit length is obtained from a very detailed simulation program, \(\textsc {Heed++}\) [47], now fully integrated into \(\textsc {Garfield++}\) [48]. An array of the number of ionization clusters per unit length for several values of \(\beta \gamma \) is obtained from \(\textsc {Garfield++}\) and used to interpolate the average cluster density. The total mean number of clusters is found by multiplying this density by the track length in the chamber. Finally, the observed number of clusters is obtained by sampling from a Poisson distribution with that mean. Four common gas options are available: pure Helium, pure Argon, He 90% + Isobutane 10%, and Argon 50% + Ethane 50%. This library can easily be extended, if needed, to a larger collection of gas mixtures.
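The three steps described above (interpolation in \(\beta \gamma \), scaling by the track length, Poisson sampling) can be sketched as follows. The table values are placeholders invented for this example and are not \(\textsc {Garfield++}\) output; the linear interpolation, the clamping at the table edges and all function names are likewise assumptions rather than the \(\texttt {ClusterCounting}\) implementation.

```python
import bisect
import math
import random

# Hypothetical (beta*gamma, clusters per cm) table: placeholder numbers only.
BG_GRID = [1.0, 3.0, 10.0, 100.0]
DENSITY = [20.0, 12.0, 13.0, 15.0]

def cluster_density(beta_gamma):
    """Linear interpolation of the mean cluster density, clamped at the table edges."""
    if beta_gamma <= BG_GRID[0]:
        return DENSITY[0]
    if beta_gamma >= BG_GRID[-1]:
        return DENSITY[-1]
    hi = bisect.bisect_right(BG_GRID, beta_gamma)
    lo = hi - 1
    frac = (beta_gamma - BG_GRID[lo]) / (BG_GRID[hi] - BG_GRID[lo])
    return DENSITY[lo] + frac * (DENSITY[hi] - DENSITY[lo])

def poisson(mean, rng):
    """Knuth's method applied in chunks of mean <= 30 to avoid exp() underflow.

    A sum of independent Poisson variables is Poisson distributed, so the
    chunking does not change the distribution.
    """
    total, remaining = 0, mean
    while remaining > 0:
        chunk = min(remaining, 30.0)
        limit = math.exp(-chunk)
        prod = rng.random()
        while prod > limit:
            total += 1
            prod *= rng.random()
        remaining -= chunk
    return total

def observed_clusters(p, mass, track_length_cm, rng):
    """Sample the observed cluster count for a track of given momentum and mass."""
    mean = cluster_density(p / mass) * track_length_cm
    return poisson(mean, rng)
```

Because the count is Poisson distributed, its relative resolution improves as \(1/\sqrt{N}\) with the mean number of clusters N, so longer tracks in the chamber give better separation.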

In Fig. 3a the potential for \(K/\pi \) separation is shown for a He 90% + Isobutane 10% mixture over a wide range of momenta (\(2<p<30\) GeV). The combination of the cluster counting and time-of-flight techniques is displayed in Fig. 3b and shows an efficient \(K^{\pm }\)/\(\pi ^{\pm }\) separation (\(\ge 3\sigma \)) for momenta \(p<30\) GeV.

Fig. 3 a Distribution of the number of clusters for charged pions and kaons for 90\(\,^\circ \) tracks in the IDEA detector drift chamber as a function of momentum. b \(K/\pi \) separation in number of \(\sigma \) as a function of the particle momentum using the dN/dx and time-of-flight methods

4 Jet flavour identification

In this section a novel jet tagging algorithm is presented. The jet flavour discrimination uses reconstructed observables at the level of the jet constituents. For simplicity, the jet flavour discriminant is built and evaluated using \(e^+e^-\) collisions reconstructed with the IDEA detector concept and will thus be referred to as \(\textsc {ParticleNetIdea}\). While the obtained performance is specific to the clean \(e^+e^-\) environment and the explicit detector specifications, the inputs and the construction of the discriminant itself are general. We first discuss the event generation and reconstruction details, then introduce the particle-level input observables and the architecture of the neural network discriminant. Finally, we address the tagger performance and its robustness with respect to different detector choices.

Table 1 Set of input variables

4.1 Simulated data

The simulated sample consists of \(e^+e^- \rightarrow ZH\) events produced at a center of mass energy \(\sqrt{s} = 240\) GeV. The Higgs bosons decay to \(H \rightarrow g g\) or \(H \rightarrow q \bar{q}\), where \(q=(u,d), s, c, b\) with relative fractions as expected for a SM Higgs boson with \(m=125\) GeV, whereas the Z bosons always decay to a pair of neutrinos. The hard scattering process is generated with \(\textsc {MadGraph5}\)_aMC@NLO [49], while \(\textsc {Pythia8}\) [50] is used for modeling the decay, parton-shower and hadronisation processes. Five different samples, corresponding to each jet flavour category (ud, s, c, b, g) and containing \(10^6\) events each (or equivalently \(2 \times 10^6\) jets), are used for the training. Final state particles are reconstructed with the \(\textsc {Delphes}\) PF algorithm. In particular, charged particles are reconstructed using the latest \(\texttt {TrackCovariance}\) module described in Sect. 2, and the time-of-flight and number of ionisation clusters per unit length (\(dN/dx\)) are reconstructed using the \(\texttt {TimeOfFlight}\) and \(\texttt {ClusterCounting}\) modules, described in Sects. 3.1 and 3.2, respectively. A charged particle is reconstructed provided that it produces at least 6 hits within the tracking volume. Neutral particles (photons and neutral hadrons) are reconstructed by the PF algorithm implemented in the \(\texttt {DualReadoutCalorimeter}\) module [51]. The time-of-flight (and corresponding reconstructed mass \(m_{\text {t.o.f.}}\)) of neutral hadrons is also included and assumes a 100 ps resolution, as opposed to 30 ps assumed for charged particles. The baseline simulation setup assumes the nominal IDEA detector concept [41, 42]. Jets are clustered with the \(\textsc {FastJet-3.3.4}\) [52] package using the \(e^+e^-\) generalized \(k_{\text {T}}\) algorithm [53, 54] with parameter \(p=-1\) (for infrared safety) and \(R=1.5\) to maximise the energy collected in the jet. This set of parameters leads to an optimal Higgs di-jet invariant mass resolution.
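To make the role of the parameters p and R concrete, the following toy implementation clusters four-vectors sequentially. It assumes the distance measure commonly quoted for the \(e^+e^-\) generalized \(k_{\text {T}}\) algorithm, \(d_{ij} = \min (E_i^{2p}, E_j^{2p})\,(1-\cos \theta _{ij})/(1-\cos R)\) and \(d_{iB} = E_i^{2p}\), with four-momentum addition as the recombination scheme. It is a slow, illustrative sketch and not the \(\textsc {FastJet}\) code; the function names are hypothetical.

```python
import math

def _one_minus_cos(a, b):
    """1 - cos(angle) between the three-momenta of four-vectors (E, px, py, pz)."""
    dot = a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
    norm = math.sqrt((a[1] ** 2 + a[2] ** 2 + a[3] ** 2)
                     * (b[1] ** 2 + b[2] ** 2 + b[3] ** 2))
    return 1.0 - dot / norm

def ee_genkt(particles, radius=1.5, power=-1.0):
    """Toy inclusive clustering; assumes E > 0 and non-zero three-momenta."""
    objs = [tuple(p) for p in particles]
    jets = []
    scale = 1.0 - math.cos(radius)
    while objs:
        weights = [o[0] ** (2.0 * power) for o in objs]   # beam distances d_iB
        beam = min(range(len(objs)), key=weights.__getitem__)
        best, pair = weights[beam], None
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d = min(weights[i], weights[j]) * _one_minus_cos(objs[i], objs[j]) / scale
                if d < best:
                    best, pair = d, (i, j)
        if pair is None:
            # The smallest distance is a beam distance: promote the object to a jet.
            jets.append(objs.pop(beam))
        else:
            # Merge the closest pair by adding four-momenta.
            i, j = pair
            merged = tuple(x + y for x, y in zip(objs[i], objs[j]))
            del objs[j]   # j > i, so remove the later element first
            del objs[i]
            objs.append(merged)
    return jets
```

With \(p=-1\) the most energetic objects have the smallest distances and therefore act as seeds around which softer particles within an angle of about R are collected.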

4.2 Input features

The jet constituents in the form of PF candidates are used as inputs to the \(\textsc {ParticleNetIdea}\) algorithm. For each PF candidate we define a set of input observables (features) that are summarized in Table 1. The first set of inputs, denoted as kinematics, uses features derived from the 4-momentum of each jet constituent. These include the energy measurement of the constituent relative to the jet energy and the direction of the jet constituents relative to the jet momentum. The second set of features, labelled as displacement, includes observables related to the longitudinal and transverse displacement of the jet constituents which are more relevant to identify jets originating from the hadronization of the b and c quarks. Finally, the third set of inputs, labelled as identification, refers to the nature of each particle using the PF reconstruction and the particle identification (PID) algorithms presented in Sect. 3.
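As an illustration of the kinematics category, the sketch below computes the relative energy and the angular separation of a constituent with respect to the jet. The exact variable definitions are those of Table 1; the feature names and the specific choice of \(\Delta \theta \) and \(\Delta \phi \) used here are assumptions made for this example only.

```python
import math

def _angles(vec):
    """Polar and azimuthal angle of a four-vector (E, px, py, pz)."""
    _, px, py, pz = vec
    return math.atan2(math.hypot(px, py), pz), math.atan2(py, px)

def kinematic_features(constituent, jet):
    """Relative energy and direction of a constituent with respect to the jet."""
    theta_c, phi_c = _angles(constituent)
    theta_j, phi_j = _angles(jet)
    # Wrap the azimuthal difference into [-pi, pi).
    dphi = (phi_c - phi_j + math.pi) % (2.0 * math.pi) - math.pi
    return {
        "e_rel": constituent[0] / jet[0],
        "dtheta": theta_c - theta_j,
        "dphi": dphi,
    }
```

Expressing the inputs relative to the jet removes the dependence on the absolute jet direction, so that the network only needs to learn the internal structure of the jet.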

The total number of reconstructed jet constituents, shown in Fig. 4a, is typically larger for g jets compared to quark jets due to their different color factor. We note that the particle multiplicity is shown here for illustrative purposes only: it is not used directly as input to \(\textsc {ParticleNetIdea}\) since it is a jet-based variable, while only particle-level observables are used. The remaining distributions of Fig. 4 correspond to particle-level observables and are calculated using the charged constituent with the largest displacement. Figure 4b displays the relative energy of the jet constituent with respect to the jet energy. Gluon jets populate lower values of this observable, indicating that the jet energy is more democratically distributed among the constituents. The remaining panels of Fig. 4 display observables relevant for b and c quark identification, such as \(\text {SIP}_{\text {2D}}\) (left) and its significance \(\text {SIP}_{\text {2D}}/ \sigma _{\text {2D}}\) (right) as defined in Table 1. As expected, in b jets, and to a smaller extent in c jets, a significantly larger displacement is observed compared to the other jet flavours. Displaced particles can also be present in other jet flavours, e.g. from long-lived \(K_{\text {S}}^{0}\) or \(\Lambda \) hadron decays, but represent a much smaller fraction.

Fig. 4 Shape comparison of a set of representative observables relevant for jet flavour identification. The different colors correspond to different jet flavours. The FCC-ee IDEA detector concept is used. In these histograms, the first and last bins correspond to the underflow and overflow entries

4.3 The flavour tagging algorithm

The \(\textsc {ParticleNetIdea}\) algorithm is based on the \(\textsc {ParticleNet}\) jet tagging algorithm [40]. \(\textsc {ParticleNet}\) uses an advanced network architecture, based on Graph Neural Networks (GNN), which was first developed in the context of proton–proton collisions at the LHC. A novel jet representation was utilized in \(\textsc {ParticleNet}\), where jets are represented as an un-ordered set of particles. As shown in Refs. [40, 55,56,57,58,59,60,61,62,63,64,65,66], this provides a more natural jet representation compared to alternative approaches based on jet images [67,68,69,70,71,72,73,74,75,76] or ordered lists of jet constituents [77,78,79,80,81,82,83,84,85] and translates to an improved tagging performance. A hierarchical learning approach using convolution operations [86] is adopted. Different convolutional layers are used to learn features at different scales: the shallower layers explore local neighborhood information, whereas more global structures are learned by deeper layers. The jet constituents are represented as a graph, where each node of the graph is a jet constituent, and relationships between the particles are the edges of the graph. Each node has a set of features related to constituent properties. The graph is not static, however: it is updated after each convolutional operation. The ultimate goal is to group jet constituents according to their proximity in the multi-dimensional space defined by the learned features.
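The idea of a graph that is rebuilt after each convolution can be illustrated with a toy example. The sketch below assumes a k-nearest-neighbour graph and an edge-convolution step that averages a function of \((x_i,\, x_j - x_i)\) over the neighbours of each node; the argument `edge_fn` stands in for the learned network and all names are hypothetical. It is not the \(\textsc {ParticleNet}\) or \(\textsc {Weaver}\) implementation.

```python
import math

def knn_indices(points, k):
    """For each point, the indices of its k nearest neighbours (Euclidean, self excluded)."""
    result = []
    for i, p in enumerate(points):
        dist = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dist[:k]])
    return result

def edge_conv(features, coords, k, edge_fn):
    """One edge-convolution-like step.

    For node i, edge_fn(x_i, x_j - x_i) is evaluated for each of the k nearest
    neighbours j (found in `coords`) and the results are averaged.
    Requires k >= 1 and at least two nodes.
    """
    neighbours = knn_indices(coords, k)
    out = []
    for i, x_i in enumerate(features):
        messages = [edge_fn(x_i, [b - a for a, b in zip(x_i, features[j])])
                    for j in neighbours[i]]
        out.append([sum(col) / len(messages) for col in zip(*messages)])
    return out
```

A dynamic graph is obtained by feeding the output of one step back as the coordinates of the next, e.g. `h1 = edge_conv(x, positions, k, f)` followed by `h2 = edge_conv(h1, h1, k, g)`, so that neighbours in deeper layers are defined in the learned feature space rather than in the original angular coordinates.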

The current \(\textsc {ParticleNetIdea}\) implementation uses up to 75 constituents per jet, sorted by decreasing momentum, which typically account for more than 99% of the total jet momentum. The algorithm is designed to discriminate between five orthogonal jet classes: ud, s, c, b, and g jets. The training is performed using the \(\textsc {Weaver}\) package [87] on 10M jets (2M per category) over 30 epochs on NVIDIA GTX 1080Ti GPUs. The network outputs 5 real numbers \(D_{i}\) (\(i = g,\,\ell (ud),\, s,\,c,\,b\)) between 0 and 1 (discriminants), one for each jet category. Approximately 1M jets are used to evaluate the \(\textsc {ParticleNetIdea}\) performance. For every jet flavour pair (ij), the binary discriminant is constructed as:

$$\begin{aligned} D_{i,j} = \frac{D_{i}}{D_{i} + D_{j}}, \end{aligned}$$
(4.1)

where \(D_{i(j)} \) are the output scores of the classes i and j. For example, \(D_{b,c} \) represents the binary discriminant for tagging b quark jets against c quark jets. The efficiency of tagging flavour i as function of the probability of mis-identifying the jet as flavour j (mistag rate) can be constructed by computing the probability of selecting jets that satisfy \(D_{i,j} > \alpha \), for \(\alpha \in [0,1]\). The receiver operating characteristic (ROC) curve, i.e. the mistag rate as a function of the tagging efficiency (for every \(\alpha \)), is used as a figure of merit for evaluating the tagger performance for every jet flavour.
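The construction of the binary discriminant and of the ROC points can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that \(D_i + D_j > 0\) for every jet and that the per-jet scores for the signal and background samples are already available.

```python
def binary_discriminant(d_i, d_j):
    """Eq. (4.1): D_ij = D_i / (D_i + D_j). Assumes d_i + d_j > 0."""
    return d_i / (d_i + d_j)

def roc_points(signal_scores, background_scores, thresholds):
    """For each threshold alpha, return (tagging efficiency, mistag rate) for D_ij > alpha."""
    points = []
    for alpha in thresholds:
        eff = sum(s > alpha for s in signal_scores) / len(signal_scores)
        mistag = sum(b > alpha for b in background_scores) / len(background_scores)
        points.append((eff, mistag))
    return points
```

Here `signal_scores` would hold \(D_{i,j}\) evaluated on true flavour-i jets and `background_scores` the same quantity evaluated on true flavour-j jets; scanning \(\alpha \) from 0 to 1 traces the curve from high efficiency to high purity.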

4.4 Results

The nominal \(\textsc {ParticleNetIdea}\) flavour tagging performance is shown in Fig. 5 for different jet flavours. The b tagging performance is shown in Fig. 5a. The most effective discrimination is observed against ud jets since these contain mostly tracks with no displacement. For high b tagging efficiency, g jet rejection is more effective than c jet rejection (with both being less effective than u, d or s jet rejection). Conversely, at small tagging efficiencies (i.e. for high tagging purity), c jet rejection becomes more effective than g jet rejection due to a sizeable probability for g to produce \(b \bar{b}\) splittings. For c tagging, at high efficiencies, b jet discrimination is the most effective (due to a large difference of lifetime between B and D mesons), followed by ud and g jet rejection. For large c tagging purity (i.e. at low efficiency and high background rejection), we observe that b jet rejection becomes more challenging than ud jet rejection, which is expected since a fraction of B mesons inherently have a decay length comparable to that of D mesons. We also observe that in this regime \(g \rightarrow c \bar{c}\) splittings make g rejection more challenging. In Fig. 5c the s tagging performance is shown. The most effective discrimination is observed against b jets, followed by c jets, due to the large displacement of their tracks. The mistag rate against g and ud jets is substantially larger since displacement observables are not discriminating, and the algorithm relies mainly on PID-related variables. Rejection of g jets is more effective than that of ud jets since s and ud jets have similar particle multiplicities. Finally, the g tagging performance is displayed in Fig. 5d. Rejection of ud jets is the most challenging, due to similar particle displacement and nature, followed by s, c and b jet rejection.

Fig. 5 Evaluation of \(\textsc {ParticleNetIdea}\) performance in terms of a receiver operating characteristic (ROC) curve for the identification of different jet flavours, i.e., b quarks (upper left), c quarks (upper right), s quarks (lower left), and g (lower right). The different jet flavours considered as background are indicated on the labels. The IDEA detector configuration is used

The modularity of the framework enables the study of the algorithm performance for different detector design choices. In this work, we report two representative examples of possible detector design variations. Figure 6a shows the importance of particle identification information in discriminating s jets from other jet flavours. Exploiting PID information with the nominal \(dN/dx\) and \(t_\text {flight}\) resolutions reduces the ud jet mistag rate by approximately an order of magnitude for the same s tagging efficiency. A timing detector providing an improved \(t_\text {flight}\) resolution of 3 ps for charged particles yields a small but detectable improvement compared to the more realistic scenario of 30 ps. The performance obtained using MC truth information for PID (“ideal PID”) is also shown for reference. In that case only a marginal improvement in performance is observed, suggesting that the existing detector configurations and PID algorithms are very close to optimal. We also note that the improvement brought by neutral particle timing is limited due to a much worse nominal timing resolution (100 ps) from the calorimetric timing measurement and, most importantly, because the contribution coming from neutral massive particles is at most 10% of the total jet energy. We also observe that the usage of PID information brings only a modest improvement in the other jet flavour tagging tasks.

The distance of the first vertex detector layer to the interaction point is the most important parameter for achieving optimal transverse impact parameter resolution, and hence b and c tagging performance. While the nominal IDEA vertex detector already provides an excellent resolution (three layers, with the innermost layer located at 1.5 cm), we study the impact on c jet identification of introducing an additional fourth layer in the pixel detector, closer to the beam pipe, located at 1 cm from the interaction point. The corresponding performance is displayed in Fig. 6b. The largest improvement is observed in the discrimination against ud jets, where for the same \(\epsilon _{\mathrm {S}}\), \(\epsilon _{\mathrm {B}}\) is reduced by almost a factor of two. A smaller, yet important, improvement is seen in the discrimination against other jet flavours. The impact of an additional pixel layer was also studied with other jet flavours treated as signal, without significant improvement in the performance.
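The dependence of the impact parameter resolution on the innermost radius can be illustrated with a simplified two-layer estimate. The sketch below is not the fast tracking module of the framework: it assumes a straight-line extrapolation from two hits of equal resolution, multiple scattering only in the first layer at normal incidence, and the Highland formula for the scattering angle of a unit-charge particle. The outer radius (2.5 cm), hit resolution (3 \(\upmu {\mathrm {m}}\)), material budget (0.3% of a radiation length) and momentum (1 GeV) are hypothetical values, as are all function names.

```python
import math


def geometric_d0_resolution_um(r1_cm, r2_cm, sigma_hit_um):
    """Extrapolate a straight line through two hits back to the beam line.

    With equal hit resolutions, sigma_d0 = sigma_hit * sqrt(r1^2 + r2^2) / (r2 - r1).
    """
    return sigma_hit_um * math.hypot(r1_cm, r2_cm) / (r2_cm - r1_cm)


def highland_theta0_rad(p_gev, x_over_x0, beta=1.0):
    """Highland multiple-scattering angle for a unit-charge particle."""
    return (0.0136 / (beta * p_gev)) * math.sqrt(x_over_x0) * (
        1.0 + 0.038 * math.log(x_over_x0)
    )


def scattering_d0_resolution_um(r1_cm, p_gev, x_over_x0):
    """Lever-arm estimate: a kink theta0 at radius r1 shifts d0 by r1 * theta0."""
    return r1_cm * 1.0e4 * highland_theta0_rad(p_gev, x_over_x0)


# Hypothetical configuration, for illustration only.
for r1_cm in (1.5, 1.0):
    geo = geometric_d0_resolution_um(r1_cm, r2_cm=2.5, sigma_hit_um=3.0)
    ms = scattering_d0_resolution_um(r1_cm, p_gev=1.0, x_over_x0=0.003)
    total = math.hypot(geo, ms)  # the two terms are added in quadrature
    print(f"r1 = {r1_cm} cm: geometric {geo:.1f} um, scattering {ms:.1f} um, total {total:.1f} um")
```

In this toy model the scattering term is proportional to the innermost radius, which is why a layer closer to the beam pipe helps most for low-momentum tracks.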

Fig. 6

Evaluation of the \(\textsc {ParticleNetIdea}\) jet flavour identification performance for various detector assumptions. a Impact of particle identification on s tagging performance. b Impact of the inner tracker geometry (3 vs 4 layers) on c tagging performance

5 Conclusion and perspectives

Jet flavour tagging will be a crucial tool for maximising the physics potential of future colliders. This work builds on the design of a fast detector simulation framework and provides an efficient way to study the impact of different detector design options on jet flavour tagging. A fast tracking module was developed that allows the user to easily configure a full tracking geometry, including material effects, and to compute both the charged-particle track parameters and the track covariance matrix. Two algorithms for particle identification, time-of-flight and cluster counting, with configurable time resolution and gas composition respectively, have also been added. The framework is designed to provide flexibility for further studies, such as the exploration of alternative clustering algorithms, beam energies and final states.

Deep learning techniques based on GNNs have proven very effective for classification problems such as jet flavour tagging and boosted jet tagging at the LHC, but have not yet been explored in the context of future experiments. This paper presents the first algorithm for jet flavour tagging at future \(e^+e^-\) colliders using a state-of-the-art jet representation and a GNN architecture. At such future machines, where statistics for Higgs processes are moderate, flavour taggers will be required to perform well in the high tagging efficiency regime while still providing excellent background rejection. In this study we have investigated the impact of MIP timing resolution and of an additional inner tracking layer on the tagging performance. More studies are possible and should be pursued: the interplay of MIP and calorimeter timing in PID performance; the impact of the tracker design on the reconstruction of displaced tracks and of \(K_{S}\) and \(\Lambda \) decays, and hence on s tagging; and the impact of secondary vertex reconstruction on b, c and s tagging. Another area for future studies is the calibration of the algorithm. The algorithm is designed to have very little dependence on the jet kinematics, and therefore a calibration strategy relying on the Z boson sample of unprecedented statistical power expected at \(e^+e^-\) experiments seems a promising avenue.

We stress that this study has possible limitations, given the inherently optimistic nature of fast simulation. In particular, the tracking simulation includes a simplistic description of particle-matter interactions, in which multiple scattering is taken into account to derive track parameter resolutions but no secondary emissions are simulated (i.e. electron bremsstrahlung, photon conversions and hadronic interactions are neglected). A natural next step is to assess the limitations of the fast detector simulation framework by validating the results with events produced using full simulation. Nevertheless, the set of tools presented in this article should provide a robust means of assessing an upper limit on the achievable tagging performance, as well as the relative performance of alternative detector design choices at future \(e^+e^-\) colliders. We also point out that the presented framework should allow for similar optimisations at any future machine, including high energy proton–proton or muon colliders, acknowledging however that further caution is required since the larger background levels at such machines are not simulated.