Introduction

An overarching issue of Large Hadron Collider (LHC) experiments is the need for massive numbers of simulated collision events to estimate the rates of expected processes in very restricted regions of phase space. To mitigate this difficulty, a commonly used approach is the event weighting technique, which replaces selection cuts with event weights. Assuming a set of N events before selection cuts that yield \(N_{\rm f}\) events after the selection, the estimated relative statistical uncertainty on the number of selected events will be \(1/\sqrt{N_{\rm f}}\). If, instead of applying selection cuts, a weight \(w_i\) corresponding to the selection efficiency is applied to each event indexed by i, then an estimate of the variance will be \(\sum w_i^2\), thus yielding a relative statistical uncertainty on the estimated number of selected events of \(\sqrt{\sum _{i\le N} w_i^2}\,/\sum _{i\le N} w_i\). For the method to be effective, the variance of the weights needs to be small compared to the statistical uncertainty on \(N_{\rm f}\), which is typically the case.
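The statistics gain can be illustrated with a short numerical sketch; the sample size and the 2% selection efficiency below are hypothetical values, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100_000   # simulated events before selection (hypothetical sample size)
eff = 0.02    # per-event selection efficiency (hypothetical value)

# Cut-based estimate: each event passes the selection with probability `eff`.
n_f = int((rng.random(N) < eff).sum())
rel_unc_cut = 1.0 / np.sqrt(n_f)

# Weighting: every event enters the estimate with weight w_i = eff.
w = np.full(N, eff)
rel_unc_weight = np.sqrt(np.sum(w**2)) / np.sum(w)  # = 1/sqrt(N) for equal weights

print(f"cuts: {rel_unc_cut:.4f}  weights: {rel_unc_weight:.4f}")
```

With equal weights the weighted estimate recovers the full \(1/\sqrt{N}\) precision of the generated sample instead of the \(1/\sqrt{N_{\rm f}}\) precision of the selected one.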

So far, weights have been defined from binned efficiency maps. The difficulty with these methods is that the range of applicability of the efficiency maps is limited by their number of dimensions (typically two); consequently, they fail to capture more subtle effects that appear in specific regions of phase space. Accounting for these dependencies requires a multidimensional mapping, which implies large statistical fluctuations in the map itself and defeats the original purpose of the method.

A common example of the usage of event weighting techniques is given by analyses relying on the identification of jets originating from b-quarks (b-tagging) [1,2,3]. Applying a weight corresponding to the expected identification efficiency of a jet, i.e. the probability of it being identified as a b-jet, instead of a direct selection cut can provide large gains in statistics (especially when percent-level efficiencies are to be applied to several jets in an event). However, obtaining universally applicable maps requires accounting for a large number of parameters, some of which are typically not known or difficult to take into account with the binned approach.

The goal of the proposed method is to provide higher dimensional parametrizations of efficiencies that can capture non-trivial dependencies while making optimal use of the available statistics, and that can therefore be applied in any analysis context considered. A parameterization achieving this goal will be referred to as universal. Multidimensional reweighting techniques have been proposed in the context of HEP experiments for BDTs and neural networks [4,5,6,7]. We propose an approach based on Graph Neural Networks (GNNs) [8, 9]. Compared to other non-equivariant deep-learning algorithms, GNNs can naturally cope with variable-size datasets that have no inherent order while optimally exploiting the pair-wise dependencies between different objects in the event.

The case study used is the b-tagging performance in the analysis of Higgs boson decays to b-quarks.

The strength of the proposed method lies in its ability to model high dimensional correlations between jets. These jet-by-jet dependencies are not given explicitly as input variables to the neural network; rather, they are inferred from single-jet properties during the training of the network. If multiple jets in the event are b-tagged, the jet efficiencies provided by the NN can be combined to derive an unbiased estimator of the event tagging efficiency. A toy model is built to probe the capability of the Machine Learning (ML) approach to provide a robust parameterization of the \(b\)-tagging efficiency.

The paper is organized as follows. Section “Event Weighting Technique” introduces the event weighting technique and describes the main challenges and goals of the method. Section “Simulated samples” describes the simulation technique used to generate the toy data-set. Section “Efficiency Map Techniques” describes a map-based technique that is commonly used to estimate the event weight based on a parameterization of the \(b\)-tagging classifier performance. Section “Truth Tagging with Neural Networks” describes the GNN model, whose results are compared to the ones of the map-based technique in Section “Results”. In Section “Discussion” some considerations about the usage of the proposed methodology in real experiments are presented. Conclusions are drawn in Section “Conclusions”.

Event Weighting Technique

In high energy physics (HEP) experiments, estimating a background rate or a signal efficiency from a selection cut is most accurately achieved by a full simulation of the event. However, the precision of such an estimate can be heavily affected by the limitation in the number of events that can be simulated in a given region of phase space. If, instead of selecting events based on a classification cut, a weight corresponding to the classifier efficiency is applied, significant improvements in sensitivity can be gained. This procedure is also known as the Tag-Rate-Function (TRF) method or Truth Tagging (TT) [10,11,12].

Selections can be interpreted as a classification depending on a vector of input variables \({\mathbf {x}}\). The classifier can be represented by a function \(f({\mathbf {x}})\) and the classification by a simple selection cut on the classifier above a given threshold \(T_{\rm f}\). The classifier can represent simple cuts or a multivariate method. Typically the variables \({\mathbf {x}}\) depend on several underlying variables which will be denoted by \({\varvec{\theta }}\).

In the case of heavy-flavor tagging, \({\varvec{\theta }}\) is typically defined as the jet transverse momentum \(p_\text {T}\) and pseudo-rapidity \(\eta\) [10], while \({\mathbf {x}}\) includes the reconstruction of secondary vertices and a combination of track impact parameter information estimated from the properties of a set of reconstructed charged-particle tracks. This information is then combined to produce a multivariate jet-based classifier \(f({\mathbf {x}})\). Figure 1 schematically shows the usage of the efficiency for event weighting to reduce statistical uncertainties on simulated Monte-Carlo (MC) samples.

A parametrized classifier efficiency can be defined as:

$$\begin{aligned} \epsilon _{\mathrm{jet}}\left( {\varvec{\theta }}\right) = \frac{N(f({\mathbf {x}}) > \mathrm{T}_{\rm f}| {\varvec{\theta }})}{N({\varvec{\theta }}) } \end{aligned}$$
(1)

where \(T_{\rm f}\) is the operating working point threshold of the classifier; the numerator is the number of selected jets of a given flavor at this working point; and the denominator is the total number of jets of the same flavor.

To achieve a parametrization of the efficiency applicable to a large number of analyses, a set of relevant variables \({\varvec{\theta }}\) must be defined such that the conditional probability of the classifier inputs \({\mathbf {x}}\) at a given value of \({\varvec{\theta }}\), \(p({\mathbf {x}}|{\varvec{\theta }})\), will be identical between samples or different regions of phase space, as illustrated in Fig. 2.

This motivates the efficiency maps approach, where an attempt is made to parametrize \(\epsilon _{\text {jet}}\) binned in \({\varvec{\theta }}\). Efficiency maps are a commonly used tool in collider experiments. However, taking into account the full dependencies of the classifier efficiency is often impractical using efficiency maps, since a small enough set of variables that fully captures these dependencies might not be available.

In the case of \(b\)-tagging it was found that, while \(p_\mathrm{T}\) and \(\eta\) are indeed the most dominant variables in determining \(\epsilon _{\text {jet}}\), there are other variables that affect the efficiency and could be accounted for were they known, e.g. the angular separation and flavor of the adjacent jets [2, 13].

Fig. 1

Usage of event weighting to reduce MC statistical uncertainties of some observable distribution. The plot on the top shows a classifier \(f({\mathbf {x}})\) used to select events. The events which pass the classification requirement are represented in green while the rejected events are shown in red. The bottom panel shows the event weighting where the classifier efficiency \(\epsilon ({\varvec{\theta }})\) is used to weight the events rather than rejecting them. \({\mathbf {x}}\) are the variables used by the classifier. For \(b\)-tagging, \({\mathbf {x}}\) includes variables such as the secondary vertex information while \({\varvec{\theta }}\) is the set of relevant variables used for the parametrization of the efficiency, such as the jet \(p_\text {T}\) and \(\eta\)

We propose a different approach to estimate \(\epsilon _{\text {jet}}\) based on a GNN. The neural network takes as input a set of jet variables \({\varvec{\theta }}_{j_{e}}\) for each jet j in the event e. The input variables are the jet (\(p_\text {T}\), \(\eta\), \(\phi\), \(\text {flavor}\)) and the neural network model is trained to predict the per-jet efficiency \(\epsilon _{\text {jet}}\). Since the true \(\epsilon _{\text {jet}}\) is conditional on the jet environment, i.e. its proximity to other jets, the neural network should learn to model that dependence, even if it is not explicitly given as an input variable.

Fig. 2

Illustration of a universal parametrization of the classifier efficiency. The joint distribution of (\({\mathbf {x}},{\varvec{\theta }}\)) is generally different between two samples. The top right plot shows the overall probability distribution of the input variables of the classifier, \(P({\varvec{x}})\), for two different samples. Different \(P({\mathbf {x}})\) distributions lead to different overall efficiencies between the two samples. The bottom right plot shows the conditional probability distributions, \(P({\mathbf {x}}|{\varvec{\theta }})\), between the two samples. The set of relevant variables \({\varvec{\theta }}\) is defined to provide a \(P({\mathbf {x}}|{\varvec{\theta }})\) which is sample independent. Under this condition, the parametrized classifier efficiency \(\epsilon ({\varvec{\theta }})\) is expected to be universal

Simulated Samples

The samples employed in this study consist of toy pp collision events with multiple jets generated with generic kinematic and flavor properties. We assume a cylindrical coordinate system where particle beams collide on the z axis, xy is denoted as the transverse plane, \(\phi\) is the azimuthal angle, \({\theta }\) the polar angle, and pseudo-rapidity \(\eta\) is defined as \(\eta =-\log \tan (\theta /2)\).

The generated events are sampled using an exponential function to fix the number of jets in the event and Gaussians or polynomial distributions to sample the jet kinematics variables and the angular distance between two jets \(\Delta R(i,j) =\sqrt{(\eta _i-\eta _j)^2 + (\phi _i-\phi _j)^2}\). More details about the event generation can be found in “Appendix A”.

Three separate samples of four-momenta representing b-, c- and light-jets are generated. The \(b\)-tagging efficiency is modeled using an ad-hoc parameterization based on a multivariate Gaussian distribution in \(p_\text {T}\) and \(\eta\), modified by a multiplicative correction factor depending on the angular distance \(\Delta R(i,j)\) to the other jets in the event as well as their flavor. This efficiency is chosen to mimic the \(b\)-tagging performance of ATLAS and CMS [1, 14] and it is expressed as:

$$\begin{aligned} {\epsilon _{\text {jet}}}_{i} = \epsilon _{f_{i}}(p_\text {T},\eta ) \cdot \prod _{j} {\hat{\epsilon }}_{ij} \left( \Delta R(i,j), f_j \right) , \end{aligned}$$
(2)

where \(\epsilon _{f_{i}}(p_\text {T},\eta )\) is the two-dimensional parameterisation of the efficiency to tag a jet of a given flavor \(f_i\), and \({\hat{\epsilon }}_{ij}\left( \Delta R(i,j), f_j \right)\) is the one-dimensional correction factor which accounts for the effect of any close-by jet j of flavor \(f_j\) in the event. The efficiencies \(\epsilon _{f_i}(p_\text {T},\eta )\) and the correction factors \({\hat{\epsilon }}_{ij}\left( \Delta R(i,j), f_j \right)\) are shown in Fig. 3.
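The structure of Eq. 2 can be sketched in a few lines of code. The functional shapes below are hypothetical stand-ins chosen only to be qualitatively similar (suppression at small \(\Delta R\), flavor-dependent base efficiency); the actual parameterizations shown in Fig. 3 are not reproduced:

```python
import numpy as np

def base_eff(flavor, pt, eta):
    """Hypothetical 2D base efficiency eps_f(pt, eta) per flavor."""
    peak = {"b": 0.75, "c": 0.25, "light": 0.01}[flavor]
    return peak * np.exp(-((pt - 80.0) / 150.0) ** 2) * np.exp(-((eta / 5.0) ** 2))

def dr_correction(dr, flavor_j):
    """Hypothetical close-by-jet factor eps_hat(dR, f_j): suppression at small dR."""
    scale = {"b": 0.6, "c": 0.8, "light": 0.95}[flavor_j]
    return 1.0 - (1.0 - scale) * np.exp(-dr / 0.4)

def jet_efficiency(i, jets):
    """Eq. (2): base efficiency times the product of per-neighbour corrections."""
    pt, eta, phi, fl = jets[i]
    eps = base_eff(fl, pt, eta)
    for j, (ptj, etaj, phij, flj) in enumerate(jets):
        if j == i:
            continue
        dr = np.hypot(eta - etaj, phi - phij)  # Delta R(i, j); no phi wrapping (toy)
        eps *= dr_correction(dr, flj)
    return eps

jets = [(60.0, 0.3, 0.1, "b"), (45.0, 0.5, 0.4, "c")]  # (pt, eta, phi, flavor)
eps0 = jet_efficiency(0, jets)
```

A close-by jet lowers the efficiency through the multiplicative correction, which is the effect the map-based approach of the next section cannot capture.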

Fig. 3

The parameterized efficiencies used to emulate the performance of the flavor tagging algorithms. The efficiencies for each flavor as a function of \(p_\mathrm{T}\) and \(\eta\), \(\epsilon _{f_i}(p_\text {T},\eta )\), are shown in the top three panels. The multiplicative correction factor \({\hat{\epsilon }}_{ij}\left( \Delta R(i,j), f_j \right)\), which accounts for the proximity (\(\Delta R(i,j)\)) and flavor \(f_j\) of the close-by jet, is shown at the bottom of the figure

The true \(b\)-tagging efficiency of each individual jet in the event is computed using Eq. 2. This efficiency value \({\epsilon _{\text {jet}}}_i\) is used to emulate \(b\)-tagging by assigning to each jet a boolean value \(\texttt {istag}\), set according to a random score \(s_i\) sampled from a uniform distribution: if \(s_i < {\epsilon _{\text {jet}}}_{i}\), the i-th jet in the event is considered to be b-tagged (\(\texttt {istag}\)=1). In many physics analyses, multiple jets in the event are required to pass \(b\)-tagging selections, hence the single-jet efficiencies need to be combined into a per-event efficiency. In this toy analysis the event selection is based on the two jets with highest \(p_\text {T}\) in the event (“leading jets”, labeled as 1 and 2), and it is defined depending on the number of b-tagged jets, \(n_\text {tag}\):

$$\begin{aligned} \epsilon _{\text {event}} = {\left\{ \begin{array}{ll} \quad (1-\epsilon _1) (1-\epsilon _2)&{}\qquad \hbox { if}\ n_\text {tag}=0,\\ \quad \epsilon _1 (1-\epsilon _2)+(1-\epsilon _1) \epsilon _2&{} \qquad \hbox { if}\ n_\text {tag}=1,\\ \quad \epsilon _1 \epsilon _2&{}\qquad \hbox { if}\ n_\text {tag}=2. \end{array}\right. } \end{aligned}$$
(3)
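Eq. 3 translates directly into code; a minimal sketch (the efficiency values used in the check are arbitrary):

```python
def event_efficiency(eps1, eps2, n_tag):
    """Eq. (3): per-event weight from the efficiencies of the two leading jets."""
    if n_tag == 0:
        return (1 - eps1) * (1 - eps2)
    if n_tag == 1:
        return eps1 * (1 - eps2) + (1 - eps1) * eps2
    if n_tag == 2:
        return eps1 * eps2
    raise ValueError("n_tag must be 0, 1 or 2")

# The three exclusive categories cover all outcomes, so their weights sum to one.
total = sum(event_efficiency(0.7, 0.4, n) for n in (0, 1, 2))
```

The sum-to-one property is a useful sanity check when extending the combination to more than two tagged jets.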

Efficiency Map Techniques

The estimation of \(\epsilon _{\text {event}}\) in the case of \(b\)-tagging in real experiments is commonly based on the binned two-dimensional efficiency maps in the jet \(p_\text {T}\)-\(\eta\) plane [12, 15], \({\tilde{\epsilon }}\), derived from MC simulation separately for \(b\)-jets, \(c\)-jets and light-jets, which are used to approximate the per-jet \(b\)-tagging efficiency of Eq. 2 as:

$$\begin{aligned} \epsilon _{\text {jet}} \approx {\tilde{\epsilon }} _i = {\tilde{\epsilon }} _{f_i}(p_\text {T},\eta ). \end{aligned}$$
(4)

The choice of the variables used to parameterize \({\tilde{\epsilon }}\) is motivated by the expected dependency of the \(b\)-tagging performance. For example, as the transverse momentum of a \(b\)-jet increases, the dilation of its lifetime in the laboratory frame results in secondary decay vertices which are reconstructed further from the interaction point of the primary collision. The reconstruction efficiency of secondary vertices is not constant as a function of their distance to the primary vertex and this affects the response of the \(b\)-tagging classifier. Similarly, the typical configuration of multi-purpose detectors produces a dependency of track reconstruction performance on detector geometry, which in turn propagates into a dependency of the \(b\)-tagging performance on \(\eta\).

From the per-jet efficiency maps \({\tilde{\epsilon }}\) the event weight \(\epsilon _{\text {event}}\) is computed factorizing the contribution from the various jets, similarly to what is shown in Eq. 3.
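A binned \(p_\text {T}\)-\(\eta\) efficiency map of this kind can be sketched with a pair of 2D histograms; the binning, the jet sample, and the flat 70% true efficiency below are toy assumptions:

```python
import numpy as np

def build_efficiency_map(pt, eta, istag, pt_edges, eta_edges):
    """Binned map eps~(pt, eta) = N_tagged / N_total per (pt, eta) bin.
    Jets of a single flavor are assumed; empty bins yield efficiency 0."""
    total, _, _ = np.histogram2d(pt, eta, bins=[pt_edges, eta_edges])
    tagged, _, _ = np.histogram2d(pt[istag], eta[istag], bins=[pt_edges, eta_edges])
    return np.where(total > 0, tagged / np.maximum(total, 1), 0.0)

rng = np.random.default_rng(1)
pt = rng.uniform(20, 200, 10_000)
eta = rng.uniform(-2.5, 2.5, 10_000)
istag = rng.random(10_000) < 0.7          # toy: flat 70% true efficiency
eff_map = build_efficiency_map(pt, eta, istag,
                               np.linspace(20, 200, 10),
                               np.linspace(-2.5, 2.5, 6))
```

Each jet is then weighted by the value of the map in its bin; the statistical fluctuations of the per-bin ratios illustrate why adding more map dimensions quickly becomes untenable.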

The main limitation of this map-based approach is the assumption that correlations between jets can be neglected and that the efficiency of \(b\)-tagging a single jet only depends on its \(p_\text {T}\) and \(\eta\). The dependency of the efficiency on residual observables is marginalized out when deriving \({\tilde{\epsilon }}\) from MC samples, introducing a bias that is particularly significant for final states with large jet multiplicities or events where close-by or overlapping jets are reconstructed from the decay of boosted resonances. A dedicated \(\Delta R(i,j)\) reweighting was derived and used to correct for this effect in previous \(H\rightarrow b{\bar{b}}\) and \(H\rightarrow c{\bar{c}}\) analyses [2, 13]. Given the uncertain nature of this correction and the limited statistics of the sample used to derive it, a large systematic uncertainty equal to half of the correction was assigned to the relevant MC templates. The overall uncertainty related to the statistics of the MC templates constitutes a contribution of up to around 20% to the total background uncertainty [3, 16].

Additional limitations come from the binning of the two-dimensional maps. To reduce discontinuities, smoothing techniques need to be employed. However, these techniques often require a non-trivial interplay between the bin sizes and the parameters of the smoothing model, which makes their implementation impractical compared to a single unbinned neural network training. Finally, the NN technique provides a simultaneous estimate of the efficiency for each jet flavor, in contrast to the map-based approach which requires a dedicated parametrization for each of the flavors independently.

Truth Tagging with Neural Networks

Taking into account the full dependency of the jet-tagging probability on all event observables would be impractical with a map-based approach. ML techniques, on the other hand, provide the possibility to scale the problem to higher dimensionality and, therefore, to more challenging physics topologies.

In principle, a standard feedforward neural network could be used to perform the task. However, these models are not able to optimally cope with inputs of variable size, and thus the overall correlations between jets in the event cannot be easily exploited during the training. The technique we propose uses a GNN to efficiently capture these correlations. A GNN also offers a more natural representation of the data by exploiting pair-wise relationships between the jets. In our toy experiment, each jet is represented by a set of variables corresponding to \((p_\text {T},\eta ,\phi ,\text {flavor})\). The neural network takes as input these variables for each jet in the event e, \({\varvec{\theta }}_{e}\) = \((({p_\text {T}}_{1},{\eta }_{1},{\phi }_{1},\text {flavor}_{1})\), ..., \(({p_\text {T}}_{n_\mathrm{jets}}\), \({\eta }_{n_\mathrm{jets}}\), \({\phi }_{n_\mathrm{jets}}\),\(\text {flavor}_{n_\mathrm{jets}})\)) and learns to approximate the efficiency given in Eq. 2 for each of these jets. Note that the inputs to the neural network do not include the \(\Delta R\) between neighboring jets, which is the variable that determines the correction applied in Eq. 2; rather, this dependency is inferred directly during the training.

Model Architecture The model, referred to as NN in the following, consists of two components: a GNN [8] and a jet efficiency network. The flow of information between the different parts is illustrated in Fig. 4.

Fig. 4

Schematic representation of the neural network structure

The GNN takes as input the \(n_\mathrm{jets}\times 4\) matrix of jet features, and outputs an \(n_\mathrm{jets}\times d_\mathrm{hidden}\) matrix of jet hidden representations.Footnote 1 The hidden representation for each jet is based on the information of the other jets in the event. The jet efficiency network then operates on each jet individually. It takes as input the jet variables and the jet hidden representation, and returns as output the predicted \(\epsilon _{\text {jet}}\) for every jet in the event. More details about the model architecture can be found in “Appendix B”.
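The flow of information can be sketched with a single round of message passing over a fully connected jet graph. This is a schematic of the permutation-invariant structure only, with random untrained weights and an arbitrary hidden size; it is not the paper's architecture (see Appendix B for that):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8                      # 4 jet features -> hidden size (toy choice)
W_self = rng.normal(0.0, 0.1, (d_in, d_hidden))
W_nbr = rng.normal(0.0, 0.1, (d_in, d_hidden))
w_out = rng.normal(0.0, 0.1, (d_in + d_hidden,))

def gnn_forward(X):
    """One message-passing round plus a per-jet efficiency head.
    X has shape (n_jets, 4): the (pt, eta, phi, flavor) features of each jet."""
    n = X.shape[0]
    # Mean of the *other* jets' features for each jet (permutation invariant).
    nbr = (X.sum(axis=0, keepdims=True) - X) / max(n - 1, 1)
    H = np.maximum(X @ W_self + nbr @ W_nbr, 0.0)   # hidden reps, (n_jets, d_hidden)
    z = np.concatenate([X, H], axis=1) @ w_out      # per-jet efficiency head
    return 1.0 / (1.0 + np.exp(-z))                 # sigmoid -> one value in (0, 1) per jet

eps = gnn_forward(rng.normal(size=(5, 4)))          # 5 jets in one event
```

Because the aggregation is a mean over the other jets, the same weights handle any jet multiplicity and any ordering of the inputs.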

Training Procedure The network is trained to predict the \(n_\mathrm{jets}\times 1\) vector of efficiencies. The loss function used for training is the weighted binary cross-entropy (BCE), which for a single event can be written as:

$$\begin{aligned} \text {BCE}_{e} = - \frac{1}{N_\mathrm{jets}}\sum ^{N_\mathrm{jets}}_{i} \left[ {\texttt {istag}_{i}} \log (\epsilon _{{NN}}({\varvec{\Theta }}_{e})_{i}) + \mu \, (1-{\texttt {istag}}_{i}) \log (1-\epsilon _{{NN}}({\varvec{\Theta }}_{e})_{i})\right] , \end{aligned}$$
(5)

where the sum runs over the set of \(N_\mathrm{jets}\) jets in the event e which pass (\(\texttt {istag}\)=1) or do not pass (\(\texttt {istag}\)=0) \(b\)-tagging, and \(\epsilon _{{NN}}({\varvec{\Theta }}_{e})_{i}\) is the i-th component of the output of the NN, a vector of variable size representing the predicted efficiency of tagging each jet in an event. The loss function being minimized is the sum of \(\text {BCE}_e\) over all the events in a batch. The factor \(\mu\) controls the weight of the non-tagged jets and can be used to balance the number of tagged and non-tagged jets to facilitate the training. This approach could be useful for light-jets, where the number of non-tagged jets is \({\mathcal {O}}(100)\) larger than the tagged ones. Although this factor was found to be helpful in tests conducted with feedforward networks, for GNNs it was found to have a negligible impact on the final results. Therefore, \(\mu\)=1 is assumed in the following discussions.
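Eq. 5 for a single event amounts to a few lines; the predicted efficiencies and tag flags below are arbitrary illustration values:

```python
import numpy as np

def weighted_bce(eps_pred, istag, mu=1.0, clip=1e-7):
    """Eq. (5): weighted binary cross-entropy averaged over the jets of one event."""
    p = np.clip(eps_pred, clip, 1.0 - clip)   # guard the logarithms
    per_jet = -(istag * np.log(p) + mu * (1 - istag) * np.log(1.0 - p))
    return float(per_jet.mean())

loss = weighted_bce(np.array([0.8, 0.1, 0.6]), np.array([1.0, 0.0, 1.0]))
```

Setting `mu` above 1 upweights the non-tagged jets, which is the balancing handle discussed in the text.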

By a well-known result, the output of a neural network trained with the BCE loss converges to the following ratio [17]:

$$\begin{aligned} \begin{aligned} \epsilon _{{NN}}({\varvec{\Theta }}_{e})_{i} \approx \frac{p_{\text {tag}}({\varvec{\Theta }}_{e})_{i}}{p_{\text {tag}}({\varvec{\Theta }}_{e})_{i} + p_{\text {non-tag}}({\varvec{\Theta }}_{e})_{i}} \approx {\epsilon _{\text {jet}}}_{i}, \end{aligned} \end{aligned}$$
(6)

where \(\epsilon _{{NN}}({\varvec{\Theta }}_{e})_{i}\) is the output of the network for the i-th jet in the event e, which approximates the true efficiency \({\epsilon _{\text {jet}}}_{i}\) given in Eq. 2.

It is worth noticing that the NN computes the efficiency \(\epsilon _{{NN}}({\varvec{\Theta }}_{e})_{i}\) directly, without regressing \(p_{\text {tag}}({\varvec{\Theta }}_{e})_{i}\) and \(p_{\text {non-tag}}({\varvec{\Theta }}_{e})_{i}\) independently.
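The convergence stated in Eq. 6 can be checked numerically in the degenerate case where all jets share the same \({\varvec{\Theta }}\): minimizing the BCE with a single constant output drives the prediction to the sample tag fraction. The learning rate, sample size, and 30% true efficiency are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
istag = (rng.random(5000) < 0.3).astype(float)  # identical jets, true efficiency 0.3

z = 0.0                                  # one logit -> a constant predicted efficiency
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-z))
    z -= 0.1 * (p - istag.mean())        # gradient of the mean BCE w.r.t. the logit
p = 1.0 / (1.0 + np.exp(-z))             # converges to the sample tag fraction
```

With features included, the same mechanism applies bin by bin in \({\varvec{\Theta }}\), which is how the network recovers the conditional efficiency.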

Additional details on the training procedure can be found in “Appendix C”.

Results

In this section, the results of approximating \(\epsilon _{\text {jet}}\) and \(\epsilon _{\text {event}}\) using the jet \(b\)-tagging efficiencies calculated from the NN are presented and compared to the results obtained with the map-based technique discussed in Section “Efficiency Map Techniques”. Three main aspects are discussed: the modeling of single-jet distributions after jet weighting, the capability of the NN technique to provide an unbiased estimation of \(\epsilon _{\text {event}}\), and the independence of the GNN performance from the choice of the sample used for training.

Fig. 5

Violin plot illustrating the distributions of the true and predicted efficiency as a function of different kinematic variables for b-jets

Fig. 6

Relative residuals distributions as predicted by the NN and the map-based approach for each individual jet in the event. The mean and RMS of the distributions are outlined in the plot

The true and predicted efficiencies are shown as a function of the set of relevant parameters \({\varvec{\theta }}\) in Fig. 5 for b-jets.Footnote 2 The relative residuals between \(\epsilon _{\text {true}}\) and \(\epsilon _{\text {predicted}}\) for all jets in the dataset are shown in Fig. 6. For both figures, \(\epsilon _{\text {true}}\) is computed during the generation of the data-set following Eq. 2. While, as expected, the map-based approach is unable to provide good modeling of the \(\Delta R(i,j)\) distribution, the NN predictions are in good agreement with the distributions obtained when jets pass the tagging selection (direct tagging) and with true efficiency weights. These results give us confidence in the ability of the GNN to build an internal representation capable of capturing additional jet-to-jet information relevant to estimating the true tagging efficiency.

Results of the reweighting procedure are further studied when both the leading and sub-leading jets are classified as \(b\)-jets, and compared to those from direct tagging. In this case, the event weight is simply computed as the product of the efficiencies of \(b\)-tagging each of the two jets, \(\epsilon _{\text {event}} = \epsilon _{1} \cdot \epsilon _{2}\). It is therefore important to study the modeling of distributions that capture correlations among individual jet observables, once event weights are applied.

The invariant mass distribution computed from the leading and subleading jets in each event is shown in Fig. 7. The figures are further sub-divided based on the true flavors of the two jets. The uncertainty on the efficiency prediction is estimated using a bootstrap procedure. This uncertainty originates from the limited size of the training data-set and the inherent randomness of the training process. A more detailed discussion of the uncertainty bands can be found in Appendix 10. Similarly to the single-jet case, the NN predictions show good agreement with the true efficiency, while the map-based approach is unable to properly capture the effect of close-by jets on \(b\)-tagging. It can also be noted that the reweighting procedure based on NN predictions improves the statistical uncertainty compared to direct tagging.
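In the paper the bootstrap resamples the training data-set and repeats the training; the resampling idea itself can be sketched on a set of per-event weights. The weights below are toy values and the function is not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.uniform(0.1, 0.9, 1000)   # toy per-event weights eps_event

def bootstrap_yield_unc(weights, n_replicas=200):
    """Spread of the weighted yield over resampled-with-replacement replicas."""
    n = len(weights)
    totals = [weights[rng.integers(0, n, n)].sum() for _ in range(n_replicas)]
    return float(np.std(totals))

unc = bootstrap_yield_unc(weights)
```

Replacing "recompute the yield" with "retrain the network and recompute the predictions" per replica gives the uncertainty bands shown in the figures.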

Fig. 7

Distribution of the invariant mass of the two leading jets, when the events are weighted by the product of true efficiencies, as calculated in Eq. 2 (grey). Also shown is the distribution for events where both jets are \(b\)-tagged (direct tagging, black), or when the events are weighted using the estimated efficiency \({\tilde{\epsilon }}\) from the map-based approach (blue) or using the NN output (red). The lower pad shows the ratio between all distributions and the one obtained with true weights. Events are split into categories based on the true flavor of the two leading jets

Finally, the generality of the method is probed by using the same network to reweight events from a separate sample with different jet \(p_\text {T}\), \(\eta\) and \(\Delta R(i,j)\) distributions compared to the training sample. More details about this sample can be found in “Appendix A”. Figure 8 shows the results for the angular separation between the two decay products as well as for the reconstructed invariant mass of the generated boson. An overall good agreement is found between the NN results and direct tagging, similarly to the previous cases. This gives confidence about the universality of the proposed approach: as long as the phase space is sampled adequately during training, the efficiency estimated using the neural network is expected to be independent of the chosen sample.

Fig. 8

Distribution of the \(\Delta R(i,j)\) (top) and invariant mass (bottom) of the leading-subleading jet system, obtained for events where these jets are classified as \(b\)-tagged (black), compared to the same distributions obtained when these jets are instead weighted with their probability of passing \(b\)-tagging, calculated using the true weight \(\epsilon\) from Eq. 2 (grey), using the efficiency \({\tilde{\epsilon }}\) from the map-based approach (blue) or using the NN output (red). The lower pad shows the ratio between the two latter distributions and the one obtained with true weights

Discussion

In this section we summarize some of the main considerations aimed at generalizing the proposed approach for use cases beyond the toy model presented in this paper.

  • The size of \({\varvec{\theta }}\): In the toy data-set we used a relatively small number of variables that control the efficiency; the network was required only to infer the “hidden” variable \(\Delta R(i,j)\). In more realistic applications, \({\varvec{\theta }}\) may include more variables and the function \(\epsilon ({\varvec{\theta }})\) may be more complicated. To cope with this, the input features \({\varvec{\theta }}\) may need to be extended with additional variables. The number of learnable parameters of the model also needs to be large enough so that the model is sufficiently expressive to describe \(\epsilon ({\varvec{\theta }})\). Any variables potentially correlated with the tagging decision could be used to ensure that all correlations are captured. Neural networks are a particularly suitable tool to perform this task due to their flexibility in coping with higher dimensions.

  • The functional form of \(\epsilon ({\varvec{\theta }})\) : We assumed a relatively simple efficiency in Eq. 2. In principle, the neural network can learn any function, no matter how complex the functional form is, as shown in Ref. [18]. The method can be used in scenarios where the form of \(\epsilon ({\varvec{\theta }})\) may present more complex dependencies between the efficiency and the relevant variables \({\varvec{\theta }}\).

  • Systematic uncertainties: In applications of the simple efficiency maps, the insufficient capture of the existing underlying correlations requires the introduction of a systematic uncertainty. This method aims at avoiding this systematic error; it will, however, require thorough checks to ensure that its estimates are accurate.

  • Generalization of the method: In the proposed approach we have focused our studies on approximating efficiencies, i.e. density ratios between two complementary classes. The method can also be generalized to approximate ratios between two separate classes.Footnote 3 A multidimensional ratio between two classes could be used in a variety of different applications, such as to derive multi-dimensional scale factors from data to correct the tagging efficiency in Monte Carlo simulation.
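The class-ratio generalization mentioned in the last point follows the same identity as Eq. 6: a classifier trained to separate two samples converges to \(p_A/(p_A+p_B)\), from which the ratio is recovered. A minimal sketch, assuming balanced training classes (the scores are illustration values, not a trained model's output):

```python
import numpy as np

def density_ratio(score):
    """Convert a classifier score s = p_A / (p_A + p_B), obtained from
    balanced training on samples A and B, into the ratio p_A / p_B."""
    s = np.clip(score, 1e-7, 1.0 - 1e-7)  # guard against exact 0 or 1
    return s / (1.0 - s)

ratios = density_ratio(np.array([0.5, 0.8]))  # ratios of approximately 1 and 4
```

If the two classes are not balanced, the prior ratio of the training sample has to be divided out of `ratios`.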

Conclusions

The parametrization of classifier efficiencies can play an important role in mitigating the limitations in the number of simulated events at LHC experiments. To be effective, parametrized classifier efficiencies need to be accurate in any context and therefore need to capture the dependencies on event properties that are used in analyses and which entail variations of efficiencies. A new technique that optimally exploits these dependencies is proposed. This technique is based on graph neural networks that provide an estimate of ratios between multidimensional local densities. We use the case of the identification of heavy-flavor jets as a topical example, building a toy model based on ad-hoc parameterizations of the classifier efficiency inspired by the observed dependencies of \(b\)-tagging performance in the ATLAS and CMS experiments. A Graph Neural Network is used to exploit correlations between jets in the event to provide a less biased parametrization compared to the canonical map-based method.

A toy example is used to probe the performance of the method, which takes as input the true flavors and momenta of reconstructed jets, and returns the \(b\)-tagging efficiency of each. These efficiencies are used to build the per-event weights in a sample of simulated events with multiple b-tagged jets. We use the estimated efficiency for the event reweighting technique, which is used to reduce the statistical fluctuations of Monte Carlo samples after classification.

Results show good compatibility between per-jet and per-event kinematic distributions obtained with the proposed approach and the distributions expected from the direct application of \(b\)-tagging. We also show that the proposed technique can generalize to samples with input distributions differing significantly compared to the training sample while covering the same phase space.