1 Introduction

The tagging of boosted jets is an essential tool in searches for heavy new physics beyond the Standard Model (SM) that involves one or more coloured particles in the final state. After the first works on jet substructure [1,2,3], many jet substructure variables [4,5,6,7,8] have been proposed to discriminate boosted Higgs bosons, top quarks and weak W/Z bosons from the QCD background composed of quark and gluon jets. (See also Refs. [10,11,12].) Multivariate taggers based on jet substructure variables [13, 14] or directly using jet images (see Ref. [15] for a review) have also been employed. These taggers can be described as ‘dedicated’, namely they are designed to discriminate jets corresponding to a specific type of signal (Higgs, top, W/Z) from the background. Despite achieving good discrimination (especially multivariate taggers), dedicated taggers have the drawback that they may be insensitive to types of jets other than those they are designed for. This was made clear in Ref. [16]: applying the wrong tagger to a four-pronged jet can make the signal significance plummet, even below its value without the tagger.

A first attempt to develop a ‘generic’ tagger, sensitive to a variety of complex jets, was made in Ref. [17] by training a neural network (NN) with two-, three- and four-pronged jets as signals against the QCD background. A complementary approach is provided by weakly-supervised and unsupervised methods, where little or no theoretical input is given and the NN learns to discriminate potential signals in different ways, for example by comparing different kinematical regions [18,19,20] or by reducing the dimensionality of the space with an autoencoder [21,22,23,24]. Although all these multivariate methods discriminate very effectively, they have two inherent drawbacks:

  • Often, there is no obvious interpretation of how the multivariate methods separate signal from background, because the interpretability of the results generally decreases sharply as the complexity of the machine learning (ML) model grows.

  • More complex ML models require more complex implementations, which makes the results harder to test and reproduce, especially when the literature does not provide all the necessary details of the implementation (which, unfortunately, is typically the case).

As a complement to these methods, it is our purpose here to develop and test a set of simple taggers for multi-pronged jets, based on a set of variables proposed in Ref. [13],

$$\begin{aligned} \left\{ \tau _1^{(1/2)}, \tau _1^{(1)}, \tau _1^{(2)}, \dots , \tau _{M-2}^{(1/2)}, \tau _{M-2}^{(1)}, \tau _{M-2}^{(2)}, \tau _{M-1}^{(1)}, \tau _{M-1}^{(2)} \right\} \,, \end{aligned}$$
(1)

with \(M > 1\) an integer. The subjettiness variables \(\tau _n^{(\beta )}\) measure the extent to which the radiation in a jet is clustered around n axes, with \(\beta \) an angular exponent (their precise definition is given in the next section). As shown in Ref. [13], the set of variables (1) allows one to reconstruct the phase space of M partons within a jet, up to a global rotation.
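The bookkeeping behind set (1) can be made concrete with a short Python sketch. The function name `subjettiness_basis` is hypothetical (it is not taken from any code accompanying this work); it simply lists the (n, β) pairs of set (1), with β ∈ {1/2, 1, 2} for n = 1, …, M − 2 and β ∈ {1, 2} for n = M − 1, so that the set contains 3M − 4 variables.

```python
def subjettiness_basis(M: int) -> list[tuple[int, float]]:
    """Return the (n, beta) pairs of set (1) for an integer M > 1.

    For n = 1, ..., M-2 the exponents beta = 1/2, 1, 2 are used;
    for n = M-1 only beta = 1, 2 are used, giving 3M - 4 variables.
    """
    if M < 2:
        raise ValueError("M must be an integer larger than 1")
    basis = [(n, beta) for n in range(1, M - 1) for beta in (0.5, 1.0, 2.0)]
    basis += [(M - 1, 1.0), (M - 1, 2.0)]
    return basis


# Sizes of the input sets for the M values used for the 2P, 3P and 4P taggers
for M in (5, 7, 9):
    print(M, len(subjettiness_basis(M)))  # prints 5 11, then 7 17, then 9 23
```

For M = 5, 7 and 9, the values used for the 2P, 3P and 4P taggers, the taggers therefore have 11, 17 and 23 inputs, respectively.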

Our main goal in this work is simplicity: the taggers obtained are simple functions of the subjettiness variables (1), with an approximate mass decorrelation à la DDT [25] that suffices to largely maintain the shape of the jet mass distribution after the taggers are applied. The taggers are developed using Logistic Regression Design (LoRD) to find the numerical parameters of the function that achieve the best discrimination, as described in Sect. 2 and Appendix A. Taggers based on optimised products of subjettiness variables already exist in the literature: a method to develop taggers with a scan over the parameter space was introduced earlier in Ref. [26], and complex neural networks were used in Ref. [27], for the development of taggers discriminating two-pronged jets from QCD jets and for quark/gluon jet identification. Compared with those, the design of our taggers is much simpler, and it also addresses the two issues mentioned above concerning interpretability and reproducibility.

We find that the discrimination power of our simple taggers in some cases largely surpasses that of the simple variables used by the ATLAS and CMS Collaborations in searches for new physics using jet substructure. Results are presented in Sect. 3. We also compare the LoRD results with those of NNs using the same architecture as in Ref. [17]. In general, the NNs perform better, except for some signals for which neither method is trained.

An important point in the design of the taggers is the kinematical region (i.e. jet mass and \(p_T\)) used for the optimisation. We address this issue in Sect. 4. While the dependence on \(p_{TJ}\) is marginal, the dependence on jet mass becomes more noticeable as one moves away from the design region. Our conclusions are presented in Sect. 5. In Appendix B we estimate the variance of the taggers obtained with the LoRD. Appendix C contains a qualitative discussion of the interpretability and the intrinsic dimension of the datasets. In Appendix D we compare different options for the design of the taggers, regarding the grooming (or not) of the jet mass, momentum and subjettiness variables. In Appendix E we summarise a few results for taggers without mass decorrelation, and in Appendix F we investigate the performance of the more complex taggers (those designed for four-pronged jets) on jets with fewer prongs.

2 LoRD of the taggers

The input to the taggers is given by a set of subjettiness variables (1) with \(M\le 9\),Footnote 1 where

$$\begin{aligned} \tau _n^{(\beta )} = \frac{1}{p_{TJ}} \sum _{i} p_{Ti} \; \text {min} \left\{ \Delta R_{1i}^\beta , \Delta R_{2i}^\beta , \dots , \Delta R_{ni}^\beta \right\} \,, \end{aligned}$$
(2)

with i labelling the particles in the jet, \(p_{Ti}\) their transverse momenta, \(\Delta R_{Ki} \) their lego-plot distance to the axis \(K=1,\dots ,n\) and \(p_{TJ}\) the jet transverse momentum. As in Ref. [13], in the computation of these variables we use the axes defined by the exclusive \(k_T\) algorithm [28, 29] with standard E-scheme recombination [30].
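Eq. (2) translates almost line by line into code. The following Python sketch (function names hypothetical) evaluates \(\tau_n^{(\beta)}\) for a list of constituents given as (p_T, y, φ) and a list of n axes given as (y, φ). Two assumptions are made: the axes are supplied by the caller (in this work they come from exclusive \(k_T\) clustering in FastJet, which the sketch does not reproduce), and the lego-plot distance is taken in the rapidity-azimuth plane with the azimuthal difference wrapped into [−π, π).

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Lego-plot distance in the (rapidity, azimuth) plane, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def tau_n(particles, axes, beta, pt_jet):
    """Subjettiness of Eq. (2).

    particles: list of (pt, y, phi) for the jet constituents
    axes:      list of n (y, phi) axes, assumed given (e.g. from exclusive kT)
    beta:      angular exponent
    pt_jet:    jet transverse momentum used as normalisation
    """
    total = 0.0
    for pt, y, phi in particles:
        # each particle contributes with its distance to the closest axis
        total += pt * min(delta_r(y, phi, ay, aphi) ** beta for ay, aphi in axes)
    return total / pt_jet
```

With a single axis this reduces to the p_T-weighted angular spread of the jet; adding axes can only lower τ_n, since each particle is assigned to its nearest axis.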

The proposed functional form for the taggers is

$$\begin{aligned} T = {\bar{T}} - b \rho -a \,, \end{aligned}$$
(3)

with

$$\begin{aligned} {\bar{T}} = \sum _{n,\beta } c_n^\beta \log \tau _n^{(\beta )} \end{aligned}$$
(4)

and \(\rho =\log ( m_{J}^2 / p_{TJ}^2 )\). The coefficients \(c_n^\beta \) are determined by Logistic Regression to optimise the discrimination between the signal(s) and the background (see Appendix A for a detailed description of this implementation); b is a parameter chosen to achieve an approximate mass decorrelation; and, for convenience, the tagger output is shifted by subtracting a fixed quantity a so that its average \(\langle T \rangle \), evaluated on a reference background sample, vanishes. Notice that the sum (4) is the logarithm of the product

$$\begin{aligned} \prod _{n,\beta } \left( \tau _n^{(\beta )} \right) ^{c_n^\beta } \,, \end{aligned}$$
(5)

that includes and generalises the commonly used ratios \(\tau _{21} = \tau _2^{(1)} / \tau _1^{(1)}\) and \(\tau _{32} = \tau _3^{(1)} / \tau _2^{(1)}\). Several comments and clarifications regarding our procedure are now in order.

  • Regardless of the precise method used to determine \(c_n^\beta \), a range of jet mass and \(p_T\) has to be specified for the signal(s) and background, because for each of these processes the distributions of the variables \(\tau _n^{(\beta )}\) depend on \(m_J\) and \(p_{TJ}\). The coefficients \(c_n^\beta \) are then obtained to optimise the discrimination of signal and background within a given interval of jet mass and \(p_T\).

  • Using ungroomed jet mass and \(p_T\) for the determination of \(c_n^\beta \) reduces the dependence of the resulting taggers on the intervals chosen, and slightly improves the mass decorrelation. It also avoids tying the taggers to a particular grooming algorithm. We have also tried using the groomed mass and \(p_T\), and the discrimination power is similar. We also use \(\tau _n^{(\beta )}\) of the ungroomed jets, since we find that they give better discrimination between signal and background. These options are compared in Appendix D.

  • For the mass decorrelation, evaluation of the tagger performance, etc., namely in all calculations except the tagger design itself, we use \(m_J\) and \(p_{TJ}\) of the groomed jets. The recursive soft drop [31] algorithm with parameters \(\beta = 1\), \(z_\text {cut} = 0.05\), \(N=3\) is found to work very well for multi-pronged jets, avoiding the peak distortions and shifts that other algorithms produce [32].

  • The parameter a is chosen so that \(\langle T \rangle = 0\) for a reference background sample with (groomed) \(p_{TJ}\ge 250\) GeV and no restriction on \(m_J\). For larger \(p_T\), or within a narrow \(m_J\) interval, the average is slightly shifted. This residual dependence may be accounted for by varying the tagger threshold, as done for example in Ref. [33], or, equivalently, by varying a as a function of \(m_J\) and \(p_{TJ}\). This refinement, however, is not required for our discussion.
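Evaluating a tagger of the form (3) and (4) only requires the coefficients and the subjettiness values of the jet. The Python sketch below (function name and dictionary layout hypothetical; the actual coefficient values are those of Table 2 and are not reproduced here) computes T, and illustrates that the product (5) contains the τ-ratios: with \(c_1^1=+1\), \(c_2^1=-1\) and a = b = 0 one gets \(T=-\log\tau_{21}\), which, like the LoRD taggers, is larger for more two-pronged-like jets. All τ values are assumed to be strictly positive.

```python
import math


def tagger_output(taus, coeffs, m_jet, pt_jet, a, b):
    """T = Tbar - b*rho - a, with Tbar = sum of c_n^beta * log tau_n^(beta) (Eqs. (3) and (4)).

    taus, coeffs: dicts keyed by (n, beta); every tau used must be > 0
    m_jet, pt_jet: (groomed) jet mass and transverse momentum, same units
    """
    tbar = sum(c * math.log(taus[key]) for key, c in coeffs.items())
    rho = math.log(m_jet**2 / pt_jet**2)
    return tbar - b * rho - a


# tau_21 as a special case of the product (5): exponent +1 for tau_1, -1 for tau_2
taus = {(1, 1.0): 0.40, (2, 1.0): 0.10}
coeffs = {(1, 1.0): 1.0, (2, 1.0): -1.0}
t = tagger_output(taus, coeffs, m_jet=80.0, pt_jet=1000.0, a=0.0, b=0.0)
print(t, math.exp(t))  # -log(tau_21) and 1/tau_21
```

Working with logarithms keeps the optimisation linear in the coefficients, which is what makes a logistic regression directly applicable.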

With the LoRD method we optimise the discrimination between quark/gluon jets (background) and multi-pronged decays of boosted colour-singlet particles (signal). Quark and gluon jets are obtained by generating the parton-level processes \(p p \rightarrow Zg\), \(p p \rightarrow Zq\), with decay \(Z \rightarrow \nu \nu \), using MadGraph5 [34] and Pythia 8 [35]. In all cases the centre-of-mass energy is set to 13 TeV. The detector response is simulated with Delphes 3.4 [36] using the CMS detector card. Jets are reconstructed using the anti-\(k_T\) algorithm [37] with radius \(R=0.8\). FastJet 3.2 [38] is used for jet reconstruction, grooming and calculation of the \(\tau _n^{(\beta )}\) variables. For the signal we use fat jets resulting from the decay of neutral, colour-singlet particles into four, three and two quarks, considering the six processes

$$\begin{aligned}&p p \rightarrow Z' \rightarrow S \, Z(\rightarrow \nu \nu ) \,,&S \rightarrow u {\bar{u}} u {\bar{u}} ~~ \text { and } ~~ S \rightarrow b \bar{b} b \bar{b}, \nonumber \\&p p \rightarrow Z' \rightarrow F \,\nu \, Z ( \rightarrow \nu \nu ) \,,&F \rightarrow u d d ~~ \text { and } ~~ F \rightarrow u b b, \nonumber \\&p p \rightarrow Z' \rightarrow S \, Z(\rightarrow \nu \nu ) \,,&S \rightarrow u {\bar{u}} ~~ \text { and } ~~ S \rightarrow b {\bar{b}} , \end{aligned}$$
(6)

with S a scalar and F a fermion. These processes are generated at parton level with Protos [39] and subsequently passed through the parton shower, hadronisation and fast simulation chain. In order to remain as model-agnostic as possible, we implement the decays of S and F with a flat matrix element, so that the weight of each decay configuration is determined only by the two-, three- or four-body phase space. These signal Monte Carlo data are dubbed Model Independent (MI) data, and their use is motivated by the need to sample phase space without model prejudice [17]. This choice is very effective at making the taggers learn prongness rather than other, undesired features. Consequently, the taggers T obtained can be used outside the interval they have been designed for, though not very far from it.
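Because the decays of S and F use a flat matrix element, their kinematics follow pure n-body phase space. As an illustration only (this is not the Protos implementation), the Python sketch below samples massless n-body phase space uniformly in the rest frame of the decaying particle with the RAMBO algorithm of Kleiss, Stirling and Ellis, for which every massless configuration carries the same weight. Treating all quarks, including b quarks, as massless is an assumption of the sketch, and the function name is hypothetical.

```python
import math
import random


def rambo_massless(n, mass, rng=random):
    """Sample n massless four-momenta (E, px, py, pz), uniformly distributed in
    n-body phase space, in the rest frame of a particle with the given mass."""
    qs = []
    for _ in range(n):
        cos_t = 2.0 * rng.random() - 1.0
        sin_t = math.sqrt(1.0 - cos_t**2)
        phi = 2.0 * math.pi * rng.random()
        # 1 - random() lies in (0, 1], so the logarithm is always finite
        e = -math.log((1.0 - rng.random()) * (1.0 - rng.random()))
        qs.append((e, e * sin_t * math.cos(phi), e * sin_t * math.sin(phi), e * cos_t))
    # total momentum of the auxiliary vectors and its invariant mass
    Q = [sum(q[k] for q in qs) for k in range(4)]
    m_q = math.sqrt(Q[0] ** 2 - Q[1] ** 2 - Q[2] ** 2 - Q[3] ** 2)
    # conformal transformation (boost plus rescaling) to total momentum (mass, 0, 0, 0)
    bvec = [-Q[k] / m_q for k in (1, 2, 3)]
    gamma = Q[0] / m_q
    a = 1.0 / (1.0 + gamma)
    x = mass / m_q
    momenta = []
    for q in qs:
        bq = sum(bvec[k] * q[k + 1] for k in range(3))
        energy = x * (gamma * q[0] + bq)
        vec = [x * (q[k + 1] + bvec[k] * q[0] + a * bq * bvec[k]) for k in range(3)]
        momenta.append((energy, *vec))
    return momenta
```

For instance, `rambo_massless(4, 80.0)` returns four massless momenta summing to (80, 0, 0, 0) GeV, a flat four-body decay of an 80 GeV scalar at rest; boosting to the laboratory frame and showering would be separate steps.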

We build taggers for four-pronged (4P), three-pronged (3P) and two-pronged (2P) signals using the corresponding set of signal processes in the first, second and third line of Eq. (6) as signal, versus the QCD background. For 4P taggers we select \(M=9\) in (1), while for 3P and 2P taggers it is enough to use \(M=7\) and \(M=5\), respectively.Footnote 2 We select three different kinematical regions (for ungroomed quantities) for the design of the taggers, labelled as follows:

  • hi80: \(p_{TJ}\ge 1\) TeV, \(m_J \in [60,100]\) GeV.

  • hi200: \(p_{TJ}\ge 1\) TeV, \(m_J \in [170,230]\) GeV.

  • lo80: \(p_{TJ}\ge 500\) GeV, \(m_J \in [60,100]\) GeV.

For the backgrounds we apply a parton-level cut on the jet, \(p_T \ge 1\) TeV (for hi80 and hi200) or \(p_T \ge 500\) GeV (for lo80), to increase the efficiency of the event generation. After simulation, the cuts on \(p_{TJ}\) and \(m_J\) are applied. For the signals we set \(M_{Z'} = 2.2\) TeV for hi80 and hi200, and \(M_{Z'} = 1.1\) TeV for lo80, so that the transverse momentum distribution is close to the one subsequently required by the cut on \(p_{TJ}\). The mass of the intermediate particles S, F is set to 80 GeV for hi80 and lo80, and to 200 GeV for hi200. After the simulation, the cut on \(m_J\) is applied to select the corresponding kinematical region.
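The three design regions and the associated generator settings can be summarised as a small configuration table. The dictionary layout and the helper `in_region` below are hypothetical, and treating the mass-window edges as inclusive is an assumption; the cuts refer to ungroomed \(p_{TJ}\) and \(m_J\) in GeV, as in the text.

```python
# Design regions (ungroomed p_TJ and m_J, in GeV) and the signal generator settings
REGIONS = {
    "hi80": {"pt_min": 1000.0, "m_window": (60.0, 100.0), "m_intermediate": 80.0, "m_zprime": 2200.0},
    "hi200": {"pt_min": 1000.0, "m_window": (170.0, 230.0), "m_intermediate": 200.0, "m_zprime": 2200.0},
    "lo80": {"pt_min": 500.0, "m_window": (60.0, 100.0), "m_intermediate": 80.0, "m_zprime": 1100.0},
}


def in_region(pt_jet: float, m_jet: float, region: str) -> bool:
    """True if an (ungroomed) jet passes the p_TJ and m_J cuts of a design region."""
    cfg = REGIONS[region]
    m_lo, m_hi = cfg["m_window"]
    return pt_jet >= cfg["pt_min"] and m_lo <= m_jet <= m_hi
```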

In total, we develop nine taggers (2P, 3P and 4P for each of the three kinematical regions defined above), plus three alternate versions of the hi80 taggers for testing purposes. The sizes of the signal and background datasets used in the optimisation are collected in Table 1. The background events are approximately evenly divided between quark and gluon jets. The signal events for each (4P, 3P, 2P) signal class are approximately evenly divided between the two contributions listed in the corresponding line of Eq. (6). Finally, the two classes (signal and background) are balanced with each other as well. The resulting coefficients for the different taggers are collected in Table 2. The variance of the results is addressed in Appendix B, and Appendix C contains a qualitative discussion of the interpretability and the intrinsic dimension of the datasets.
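The logistic-regression step can be sketched generically. The actual LoRD implementation is described in Appendix A and is not reproduced here; the Python code below is a plain batch-gradient-descent logistic regression (all names hypothetical), with the logarithms \(\log\tau_n^{(\beta)}\) as features. Its linear score \(w\cdot x + w_0\) has exactly the form of \(\bar T\) in (4) plus a constant, so the fitted weights play the role of the coefficients \(c_n^\beta\), while the constant is irrelevant because a is fixed afterwards by the condition \(\langle T\rangle = 0\). The toy data in the usage example are synthetic Gaussian clouds, not simulated jets.

```python
import math
import random


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=0.1, epochs=500):
    """Minimise the mean binary cross-entropy by batch gradient descent.

    features: list of feature vectors (e.g. log tau_n^beta values)
    labels:   1 for signal, 0 for background (classes assumed balanced)
    Returns (weights, intercept).
    """
    n_feat = len(features[0])
    n = len(features)
    w = [0.0] * n_feat
    w0 = 0.0
    for _ in range(epochs):
        grad = [0.0] * n_feat
        grad0 = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w0 + sum(wk * xk for wk, xk in zip(w, x))) - y
            for k in range(n_feat):
                grad[k] += err * x[k]
            grad0 += err
        w = [wk - lr * gk / n for wk, gk in zip(w, grad)]
        w0 -= lr * grad0 / n
    return w, w0


# Toy usage: two balanced Gaussian clouds standing in for signal and background
rng = random.Random(0)
sig = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(200)]
bkg = [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(200)]
X, y = sig + bkg, [1] * 200 + [0] * 200
w, w0 = fit_logistic(X, y)
```

For the real taggers the feature vectors would have 11, 17 or 23 components, and standardising them before the fit would usually speed up convergence.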

Table 1 Number of events used in the optimisation and test of the taggers with the LoRD method
Table 2 Numerical coefficients in (3) and (4) corresponding to the different taggers

For the mass decorrelation and test of the tagger performance we use a sample of QCD dijet production generated with MadGraph in 100 GeV intervals of \(p_T\), starting at [200, 300] GeV and with the last one having \(p_T \ge 2.2\) TeV. The different samples are hadronised and passed through the detector simulation, and then combined with a weight that corresponds to the cross section. For each interval we generate \(2 \times 10^5\) events and keep both jets (leading and sub-leading) for the analysis, therefore our QCD sample comprises 8.4 million jets, spanning a very wide range of mass and \(p_T\).
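The bookkeeping of the binned QCD sample can be checked with a few lines of Python. Weighting each event of bin k by \(\sigma_k/N_k\), with \(\sigma_k\) the bin cross section and \(N_k\) the number of generated events, is the usual way to combine such samples and is assumed here; the cross sections themselves are not quoted in the text, so they enter only as a parameter. The function names are hypothetical.

```python
def qcd_pt_bins(start=200.0, width=100.0, last_min=2200.0):
    """Generator-level p_T bins in GeV: [200, 300], ..., [2100, 2200], then p_T >= 2200."""
    bins = []
    lo = start
    while lo < last_min:
        bins.append((lo, lo + width))
        lo += width
    bins.append((last_min, float("inf")))
    return bins


def event_weight(sigma_bin, n_generated):
    """Per-event weight sigma_k / N_k, so that summed weights reproduce the bin cross section."""
    return sigma_bin / n_generated


bins = qcd_pt_bins()
n_events_per_bin = 200_000
n_jets = len(bins) * n_events_per_bin * 2  # leading and sub-leading jet of every event
print(len(bins), n_jets)  # prints 21 8400000
```

The 21 generation bins, with two jets kept per event, reproduce the 8.4 million jets quoted above.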

Applying a tight cut on the value of \({\bar{T}}\) noticeably modifies the lineshape of the QCD background as a function of the jet mass, as seen in Appendix E. This is a serious drawback in experimental searches for a bump in this distribution. For other searches that do not use the jet mass as final discriminator, maintaining the shape is still desirable in order to use sidebands for background estimation. We therefore perform an approximate mass decorrelation following the DDT [25] proposal. For each tagger we select the parameter b by fitting the average slope of \(\langle {\bar{T}} \rangle \) versus \(\rho \) in the interval \(\rho \in [-6,-2]\) for the QCD background sample. Because this average also depends on \(p_{TJ}\), we select \(p_{TJ}\in [500,600]\) GeV, which gives good results also when the dependence on \(\rho \) is not linear and the averages show some spread with \(p_{TJ}\). Finally, the parameter a is adjusted so as to have \(\langle T \rangle = 0\) in the inclusive QCD sample with \(p_{TJ}\ge 250\) GeV. The values of b and a for each tagger are collected in the last two lines of Table 2.
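The two-step choice of b and a can be sketched as follows. This Python code (names hypothetical) fits a straight line to per-jet pairs \((\rho, \bar T)\) in \(\rho \in [-6,-2]\), with optional event weights such as the cross-section weights of the binned QCD sample; fitting the binned profile \(\langle \bar T\rangle(\rho)\) instead would be a close alternative, and which variant is used here is not specified. The offset a is then the weighted mean of \(\bar T - b\rho\) on the reference sample, so that \(\langle T\rangle=0\) there. In practice the slope would be computed on jets with \(p_{TJ}\in[500,600]\) GeV and the offset on jets with \(p_{TJ}\ge 250\) GeV, as described above.

```python
def ddt_slope(rhos, tbars, weights=None, rho_range=(-6.0, -2.0)):
    """Weighted least-squares slope b of Tbar versus rho, using only jets inside rho_range."""
    if weights is None:
        weights = [1.0] * len(rhos)
    sel = [(r, t, w) for r, t, w in zip(rhos, tbars, weights)
           if rho_range[0] <= r <= rho_range[1]]
    sw = sum(w for _, _, w in sel)
    mean_r = sum(w * r for r, _, w in sel) / sw
    mean_t = sum(w * t for _, t, w in sel) / sw
    cov = sum(w * (r - mean_r) * (t - mean_t) for r, t, w in sel)
    var = sum(w * (r - mean_r) ** 2 for r, _, w in sel)
    return cov / var


def ddt_offset(rhos, tbars, b, weights=None):
    """Offset a such that the weighted average of T = Tbar - b*rho - a vanishes."""
    if weights is None:
        weights = [1.0] * len(rhos)
    sw = sum(weights)
    return sum(w * (t - b * r) for r, t, w in zip(rhos, tbars, weights)) / sw
```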

Fig. 1

Left: average \(\langle T \rangle \) of the three mass-decorrelated hi80 taggers evaluated for the QCD background in three \(p_{TJ}\) bins, as a function of \(\rho \). For comparison, the average \(\langle {\bar{T}} \rangle \) in the inclusive QCD sample is shown in red. Right: Jet mass distribution for the QCD background, after increasingly tighter cuts on the tagger output

For illustration, we show in the left panels of Fig. 1 the average \(\langle T \rangle \) for the QCD background in several \(p_{TJ}\) bins, as a function of \(\rho \). The top, middle and bottom panels correspond to the 4P, 3P and 2P hi80 taggers, respectively. For comparison, we also show \(\langle {\bar{T}} \rangle \) for the non-decorrelated tagger in the inclusive sample. For the 4P tagger the mass decorrelation is very good: the three lines for \(\langle T \rangle \) are almost coincident and horizontal. For the 3P and 2P taggers the decorrelation achieved with this simple prescription is poorer. The same trend is found with the hi200, lo80 and alternate hi80 taggers. The jet mass distributions for the QCD background after increasingly tighter cuts on T are presented in the right panels of Fig. 1. The background lineshape is very well preserved, with results comparable to the best decorrelation methods examined in Ref. [40]. Two minor features can be noticed:

  (i) When the background is suppressed by a factor of around 100 (e.g. the green curve relative to the black one in the top-right panel of Fig. 1), a small increase appears in the first bin, [20, 40] GeV.

  (ii) For the 3P and 2P taggers the slope of the QCD distribution (right panels) decreases slightly after the cuts on T are applied (coloured lines). This reflects the fact that the \(\langle T \rangle \) curves in the left panels are not as flat as for the 4P tagger.

We remark that this simple decorrelation with fixed b in (3) can easily be improved by taking b as a function of \(m_J\) and \(p_{TJ}\). As the numerical calculation of \({\bar{T}}\) is very simple, this is a computationally inexpensive task. Our purpose here is to show that even a rough mass decorrelation with fixed b does most of the job of maintaining the profile of the QCD jet mass distribution after the tagger is applied, even for very tight cuts on the tagger output. Refinements are always possible, see Ref. [25].

Fig. 2

Distributions of T for the QCD background (black) and selected signals (blue), for the hi80 taggers as in Table 2 (v1, solid lines) and alternate versions (v2, dashed lines) obtained with a different random seed

A final point of interest for practical applications is the stability of the results obtained by the LoRD. The stability of the classifier performance on test samples is very good, as shown in Appendix B. However, this does not guarantee that the T distributions for the background and signals are similar. To check this, we have used an alternate set of hi80 taggers, obtained with different random seeds, to examine to what extent the T distributions are alike. The results, presented in Fig. 2, show a remarkable similarity between the alternate versions (v1 and v2) of the same tagger. No cut on jet mass is applied, and \(p_{TJ}\ge 250\) GeV is required on signal and background for consistency with the parton-level cut in the background generation. The details of the signals can be found in the next section. This stability is useful for building, if necessary, a set of taggers that covers a very wide range of jet masses. Notice also that, by design, the signals are expected to have higher values of \({\bar{T}}\) than the background, so the background can be reduced with a lower cut on \({\bar{T}}\), i.e. by keeping jets above a threshold.

3 Performance

We evaluate the performance of our taggers by selecting several new physics signals yielding multi-pronged jets,

$$\begin{aligned}&p p \rightarrow Z' \rightarrow S S \,,\quad S \rightarrow AA \rightarrow b {\bar{b}} b {\bar{b}}\,, \nonumber \\&p p \rightarrow Z' \rightarrow S S \,,\quad S \rightarrow WW \rightarrow q {\bar{q}} q {\bar{q}} \,, \nonumber \\&p p \rightarrow Z' \rightarrow A A \,,\quad A \rightarrow b {\bar{b}} \,, \nonumber \\&p p \rightarrow Z' \rightarrow W W \,,\quad W \rightarrow q {\bar{q}} \,, \end{aligned}$$
(7)

with \(q=u,d,s,c\) light quarks other than b. We generically denote the scalars with cascade decay into four quarks as S, and the scalars decaying into \(b {\bar{b}}\) as A. The decays \(Z' \rightarrow SS\), \(Z' \rightarrow AA\) can take place in any SM extension with an additional \(\text {U}(1)'\) group and extra scalars, with the minimal consistent implementation given in Ref. [41]. The decays \(Z' \rightarrow WW\) can take place in left-right models [42, 43]. We select \(M_{Z'} = 2.2\) TeV and different values of \(M_S\) and \(M_A\) to test the hi80 and hi200 taggers. For three-pronged jets from \(Z' \rightarrow t {\bar{t}}\) we have not found a significant improvement over the simple ratio \(\tau _{32}\), and we omit the results for brevity. The generated signal samples have a minimum of \(10^5\) events, and for each event we use both jets, therefore the samples used have a minimum of \(2 \times 10^5\) jets. The background sample is the same one, with \(8.4 \times 10^6\) jets, used for the mass decorrelation.

A meaningful assessment of the performance of the taggers can only be made within a given interval of jet mass and \(p_T\). (Some anomaly detection methods [18] report combined performances that also use the jet mass as a discriminator.) In all cases we require \(p_{TJ}\ge 1\) TeV, and we do not apply an upper cut on this variable because the distributions are always concentrated towards smaller transverse momentum. The jet mass interval selected is \(m_J \in [60,100]\) GeV for hi80 taggers and \(m_J \in [160,240]\) GeV for hi200 taggers. These jet mass window requirements reduce the QCD background by factors of 6.5 and 7.1, respectively.

Fig. 3

ROC curves for the hi80 and hi200 taggers applied on selected signals giving multi-pronged jets, compared to \(\tau \)-ratios and dedicated NNs (see the text)

We present in Fig. 3 the receiver operating characteristic (ROC) curves for signal efficiency versus background rejection of the hi80 taggers (left column) and hi200 taggers (right column), evaluated on the signals indicated in each plot. For comparison, we include the ROC curves for the simple ratios \(\tau _{nm} \equiv \tau _n^{(1)} / \tau _m^{(1)}\) used in the literature. We also include the ROC curves for NN taggers trained on the same (groomed) mass and \(p_T\) interval, using the same architecture as in Ref. [17], with two fully connected hidden layers of 512 and 32 nodes, respectively.Footnote 3 For the NN taggers we do not perform any mass decorrelation. The results shown here for NNs are not fully comparable to those in Ref. [17] because here we choose to apply the taggers on fixed intervals of groomed mass. The substructure of QCD jets with groomed mass e.g. \(m_J \in [60,100]\) GeV is not the same as for jets with ungroomed mass \(m_J \in [60,100]\) GeV. This can also be noticed by comparing with the results in Appendix B, obtained for jets with ungroomed mass \(m_J \in [60,100]\) GeV.

To better illustrate the effect of the tagging on the signal-to-background significance \(S/\sqrt{B}\) (with S standing for signal and B for background) we define the significance improvement as

$$\begin{aligned} s = \frac{\varepsilon _S}{\sqrt{\varepsilon _B}} \,, \end{aligned}$$
(8)

with \(\varepsilon _S\), \(\varepsilon _B\) the tagger efficiencies for signal and background, respectively. This is precisely the factor by which the tagging multiplies the luminosity-dependent ratio \(S/\sqrt{B}\). Gray lines indicate several values of the significance improvement s. Notice that, for the mass intervals selected, the jet mass cut brings an additional improvement by a factor of 2, which might be even larger for more stringent cuts on \(m_J\).
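Eq. (8) and the ROC curves can be computed directly from tagger outputs. The Python sketch below (names hypothetical, unweighted jets for simplicity) counts signal and background efficiencies for a cut T ≥ threshold, evaluates s, and scans the threshold to build ROC points; expressing background rejection as \(1/\varepsilon_B\) is a common convention assumed here. As a worked example of the factor-of-2 remark, a mass window that keeps a fraction 1/6.5 of the background and, hypothetically, 80% of the signal gives \(s = 0.8\sqrt{6.5} \approx 2.0\).

```python
import math


def efficiencies(sig_scores, bkg_scores, threshold):
    """Fractions of signal and background jets passing the cut T >= threshold."""
    eps_s = sum(t >= threshold for t in sig_scores) / len(sig_scores)
    eps_b = sum(t >= threshold for t in bkg_scores) / len(bkg_scores)
    return eps_s, eps_b


def significance_improvement(eps_s, eps_b):
    """Eq. (8): s = eps_S / sqrt(eps_B)."""
    return eps_s / math.sqrt(eps_b)


def roc_points(sig_scores, bkg_scores):
    """(eps_S, 1/eps_B) pairs for every distinct observed score used as threshold."""
    points = []
    for thr in sorted(set(sig_scores) | set(bkg_scores)):
        eps_s, eps_b = efficiencies(sig_scores, bkg_scores, thr)
        if eps_b > 0:  # rejection is undefined once no background survives
            points.append((eps_s, 1.0 / eps_b))
    return points


# Worked example for the jet mass window (signal efficiency 0.8 is hypothetical)
print(significance_improvement(0.8, 1 / 6.5))
```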

The first row of Fig. 3 shows the performance for \(S \rightarrow AA \rightarrow 4b\), giving four-pronged jets with b quarks. A scalar undergoing this type of decay has been dubbed a ‘stealth boson’ because of its elusive nature [16]. We observe that \(\tau \)-ratios, especially for \(M_S = 80\) GeV, fail to improve the signal significance, while the LoRD tagger can improve it by a factor of two. (For a \(S \rightarrow AA \rightarrow 4u\) signal the performance is similar.) The NN tagger reaches a higher significance improvement, \(s=3\).

In the second row of Fig. 3 we study two-pronged jets from \(A \rightarrow b {\bar{b}}\), which are harder to identify than W/Z bosons using jet substructure. The LoRD tagger performs better than the commonly used ratio \(\tau _{21}\) but, again, worse than the NN tagger.

In the third row of Fig. 3 we examine two signals without b quarks. On the left panel we have \(W \rightarrow q {\bar{q}}\), for which, as noted above, the tagger \(T_{2P}\) discriminates better than for \(A \rightarrow b {\bar{b}}\) with the same mass. The performance of \(T_{2P}\) is half-way between the simplest \(\tau _{21}\) ratio and the more complex NN. On the right panel we show \(S \rightarrow WW \rightarrow 4q\), giving a four-pronged jet with four light quarks. The performance of \(T_{4P}\) on this signal is quite good (better than for the analogue with four b quarks, top right panel), but it is still surpassed by the NN by a factor of two in terms of significance improvement.

We also examine the performance of our taggers for jets containing electrons or photons as ‘prongs’, for which the taggers are not designed. We consider

$$\begin{aligned}&p p \rightarrow Z' \rightarrow S S \,,\quad S \rightarrow AA \rightarrow b {\bar{b}} \gamma \gamma \,, \nonumber \\&p p \rightarrow Z' \rightarrow N N \,,\quad N \rightarrow e q \bar{q} \,. \end{aligned}$$
(9)

The decays \(S \rightarrow AA \rightarrow b {\bar{b}} \gamma \gamma \) can take place for example in the model of Refs. [41, 44]. N is a heavy neutral lepton such as the right-handed neutrinos introduced in left-right models, which undergoes a three-body decay mediated by an off-shell \(W_R\) boson. We set \(M_{Z'} = 2.2\) TeV as before, and show our results in Fig. 4. The left column corresponds to hi80 taggers and the right column to hi200 taggers.

Fig. 4

ROC curves for the hi80 and hi200 taggers applied on selected signals giving multi-pronged jets containing electrons or photons, compared to \(\tau \)-ratios and dedicated NNs (see the text)

In the top panels of Fig. 4 we show the performance of the \(T_{3P}\) and \(T_{4P}\) taggers for jets containing two b quarks and two photons. These conspicuous jets have an approximately four-pronged shape, and the \(\tau _{42}\) ratio works well to distinguish them from QCD jets. (The ratio \(\tau _{43}\) is comparable, and other \(\tau \)-ratios are worse; we do not show the corresponding lines for clarity.) The LoRD taggers also discriminate these signals well from the background. Remarkably, for higher jet masses the LoRD taggers provide better discrimination than dedicated NNs, although their performance is similar to that of \(\tau _{42}\). As can be observed, \(T_{3P}\) discriminates better for \(M_S = 80\) GeV (top left panel) and \(T_{4P}\) is better for \(M_S = 200\) GeV (top right panel).

In the bottom panels of Fig. 4 we show the performance of the \(T_{3P}\) taggers for a signal that is not properly three-pronged, since one of the ‘prongs’ is an electron rather than a jet. The taggers perform well for this signal for which they are not specifically designed, and better than the simple ratio \(\tau _{32}\). Most surprisingly, the LoRD taggers perform better than the NN taggers, especially at \(m_N = 200\) GeV.

Overall, we find that the highest benefit of the LoRD taggers is achieved for four-pronged signals, for which they largely surpass the performance of simple \(\tau \)-ratios and capture a good deal of the potential of a complex NN. We also note that the taggers work remarkably well for signals they are not designed for: (a) jets containing two b quarks plus two photons; (b) jets containing two light quarks plus an electron.

The performance of the LoRD taggers remains stable under moderate variations of the mass of the particle originating the fat jet. In the comparisons shown in Figs. 3 and 4, this mass was taken as 80 GeV for hi80 taggers and 200 GeV for hi200 taggers. We now investigate the results for four-pronged signals \(S \rightarrow AA \rightarrow 4b\) of different masses.

Fig. 5

ROC curves for the hi80 (left) and hi200 (right) \(T_{4P}\) taggers applied to \(S \rightarrow AA \rightarrow 4b\) signals of different masses, see the text

In the top left panel of Fig. 5 we show the performance of the hi80 \(T_{4P}\) tagger for masses \(M_S = 65\), 80, 95 and 110 GeV. We keep the ratio \(M_A : M_S \sim 30 : 80\) as in the example with \(M_S = 80\) GeV, \(M_A = 30\) GeV shown in Fig. 3, since the jet shape depends on this ratio. The jet mass cut applied is \(M_S - 20~\text {GeV} \le m_J \le M_S + 20\) GeV. We observe that the tagger can be used for a wide range of masses, even when the mass window for the cut has little overlap with the tagger design region [60, 100] GeV. The bottom left panel shows the performance for the same values of \(M_S\) but with \(M_A\) halved, which makes the jet shape more two-pronged-like and increases the tagger performance.

In the top right panel of Fig. 5 we select masses \(M_S = 160\), 200 and 240 GeV with \(M_A : M_S \sim 80 : 200\) and in the bottom right panel with \(M_A : M_S \sim 40 : 200\). The jet mass cuts applied are \(M_S - 40~\text {GeV} \le m_J \le M_S + 40\) GeV. Again, we observe that the performance of the hi200 \(T_{4P}\) tagger remains quite stable under moderate variations of the jet mass.

Fig. 6

Comparison of taggers designed on different kinematical regions (hi80, hi200, lo80) and applied to signals and backgrounds on reference \(m_J\) and \(p_{TJ}\) intervals

4 Dependence on input data kinematics

We address in this section the variation of the tagger performance for signals (and backgrounds) with masses or transverse momentum quite different from the ones used in the design. We select the two reference mass and \(p_T\) intervals used in Figs. 3 and 4 of the previous section: (a) \(m_J \in [60,100]\) GeV, \(p_{TJ}\ge 1\) TeV; (b) \(m_J \in [160,240]\) GeV, \(p_{TJ}\ge 1\) TeV. We apply the hi80, hi200 and lo80 taggers to selected 4P, 3P and 2P signals. The results are shown in Fig. 6. The left column corresponds to the different taggers applied on signals in the kinematical region (a), and the right column to different taggers applied on signals in the region (b). From the comparison we can observe that:

  • The performance is quite stable with \(p_{TJ}\) (solid versus dotted lines in the left column). Here we apply taggers designed with \(p_{TJ}\ge 1\) TeV (hi80, solid lines) and \(p_{TJ}\ge 500\) GeV (lo80, dotted lines), on jets with \(p_{TJ}\ge 1\) TeV and we observe that the performance is quite similar for 4P taggers (top panel) and basically the same for 3P and 2P taggers.

  • For 4P taggers, the performance degrades when the wrong tagger is applied, i.e. the hi200 tagger in the region (a) or the hi80 tagger in the region (b). For 3P taggers, however, the hi200 tagger is good in both kinematical regions (a) and (b). For 2P taggers there is no appreciable difference in any case.

These results suggest that, if one is interested in a very wide range of jet masses, a set of two or three LoRD taggers designed for different masses can be used to cover that range without loss of performance. The stability of the taggers resulting from the optimisation procedure, shown in Fig. 2, and the stability of the performance under small jet mass variations, shown in Fig. 5, ensure that this procedure is feasible and would lead to smooth tagging efficiencies across the whole jet mass interval.

5 Conclusions

In this work we have used a logistic regression design (LoRD) to obtain simple taggers for multi-pronged jets based on jet substructure variables. These taggers can be approximately decorrelated from the jet mass, and applying them largely preserves the shape of the jet mass distribution for the QCD background.

The best results are achieved by the taggers for four-pronged (4P) signals, which are precisely the least covered by available tools. In this case the mass decorrelation is very good (see Fig. 1), and the signal to background discrimination largely surpasses that of the simple \(\tau \)-ratios used in current new physics searches and is not far from that of dedicated NNs. For boosted top quarks, the three-pronged (3P) taggers do not bring much improvement over the ratio \(\tau _{32}\).Footnote 4 However, for boosted heavy neutrinos, which give jets that are not properly three-pronged, the LoRD taggers perform very well, even better than NNs trained on the same set of signals and backgrounds. A significance improvement by a factor of up to 8 can be achieved. This is quite interesting, since current searches for boosted heavy neutrinos [45] do not use any type of jet substructure analysis. We have also tested two-pronged (2P) taggers on a variety of signals, finding that their performance is half-way between the ratio \(\tau _{21}\) and a dedicated NN. LoRD \(T_{3P}\) and \(T_{4P}\) taggers are also sensitive to jets containing two b quarks plus two photons, a signature which is not experimentally covered [44].

We envisage two possible situations where the LoRD taggers may be very useful. The first one is when the development of a full-fledged multivariate tagger with mass decorrelation is not feasible. In this case, a LoRD tagger (or a handful of them) can easily be used, obtaining results that are not far from the ones that a multivariate method could bring.

The second situation is as a complement and cross-check of results obtained with more complex taggers, such as those based on deep neural networks. These methods are often a ‘black box’ whose results are difficult to test independently. Because the performance of the LoRD taggers is not far from that of NNs, they can be very useful as a robust test, especially in case any new physics signal involving fat jets is found at the LHC.