Jet tagging made easy

We develop taggers for multi-pronged jets that are simple functions of jet substructure (so-called `subjettiness') variables. These taggers can be approximately decorrelated from the jet mass in a quite simple way. Specifically, we use a Logistic Regression Design (LoRD) which, even being one of the simplest machine learning classifiers, shows a performance which surpasses that of simple variables used by the ATLAS and CMS Collaborations and is not far from more complex models based on neural networks. Contrary to the latter, our method allows for an easy implementation of tagging tasks by providing a simple and interpretable analytical formula with already optimised parameters.


Introduction
The tagging of boosted jets is an essential tool for the search of heavy new physics beyond the Standard Model (SM) that involves one or more coloured particles as final state. After the first works on jet substructure [1][2][3] many jet substructure variables [4][5][6][7][8] have been proposed to discriminate boosted Higgs bosons, top quarks and weak W/Z bosons from the QCD background composed of quark and gluon jets. (See also Refs. [10][11][12].) Multivariate taggers based on jet substructure variables [13,14] or directly using jet images (see Ref. [15] for a review) have also been employed. These taggers can be qualified as 'dedicated', namely they are designed to discriminate jets corresponding to a specific type of signal (Higgs, top, W/Z) from the background. Despite achieving a good discrimination (especially for multivariate taggers), a drawback of dedicated taggers is the fact that they may not be sensitive to other types of jets different from those which they are designed for. This was made clear in Ref. [16]: the application of a wrong tagger to a four-pronged jet can cause the signal significance to plummet, even below the value when the tagger is not applied.
A first attempt to develop a 'generic' tagger, sensitive to a variety of complex jets, was done in Ref. [17], by training a neural network (NN) with two, three and fourpronged jets as signals, against the QCD background. Another complementary approach is brought by weakly-supervised and unsupervised methods, where little or no theoretical input is given and the NN learns to discriminate potential signals in different ways, for example by comparing different kinematical regions [18][19][20] or by a reduction of the space dimensionality using an autoencoder [21][22][23][24]. Despite all these multivariate methods are very effective in the discrimination, they have two inherent drawbacks: • Often, there is no obvious interpretation of how the multivariate methods work in separating signal from background. This is because the interpretability of the results greatly decreases in general with the complexity of the machine learning (ML) models.
• More complex ML models require more complex implementations, which translates into more difficult testing and reproducibility of the results, especially when the literature does not provide all the necessary details of the implementation (which is typically the case, unfortunately).
As a complement to these methods, it is our purpose here to develop and test a set of simple taggers for multi-pronged jets, based on an extended set of substructure variables, called 'subjettiness' [13] τ (1/2) 1 , τ 1 , τ with M > 1 an integer representing the number of axes within a jet determining the clustering. Our main goal is simplicity: the taggers obtained are simple functions of the subjettiness variables (1), with an approximate mass decorrelationà la DDT [25] which suffices to maintain the shape of the jet mass distribution to a large extent, after the application of the taggers. The taggers are developed using Logistic Regression Design (LoRD) to find the numerical parameters in the function that achieve the best discrimination, as described in Sect. 2 and Appendix A. There exist already in the literature taggers based on optimised products of subjettiness variables: a method to develop taggers with a scan over the parameter space was earlier introduced in Ref. [26], and complex neural networks were used in Ref. [27], for the development of taggers to discriminate two-pronged jets versus QCD jets, and for quark/gluon jet identification. With respect to those, the design of our taggers is much simpler, and also addresses the two issues mentioned above about interpretability and reproducibility.
We find that the discrimination power of our simple taggers in some cases largely surpasses the simple variables used by the ATLAS and CMS Collaborations in searches for new physics using jet substructure. Results are presented in Sect. 3. We also compare the results using LoRD with NNs using the same architecture as in Ref. [17]. In general, the NNs perform better, except for some signals which neither are trained for.
An important point in the design of the taggers is the kinematical region (i.e. jet mass and p T ) used for the optimisation. We address this issue in Sect. 4. While the dependence on p T J is marginal, the dependence on jet mass is more noticeable when one gets away from the design region. Our conclusions are presented in Sect. 5. In appendix B we summarise a few results for taggers without mass decorrelation.

LoRD of the taggers
The input to the taggers is given by a set of subjettiness variables (1) with M ≤ 9, 1 where with i labelling the particles in the jet, p T i their transverse momenta, ∆R Ki their legoplot distance to the axis K = 1, . . . , N and p T J the jet transverse momentum. As in Ref. [13], in the computation of these variables we use the axes defined by exclusive k T algorithm [28,29] with standard E-scheme recombination [30].
The proposed functional form for the taggers is and ρ = log m 2 J /p 2 T J . The coefficients c β n are determined by Logistic Regression to optimise the discrimination between the signal(s) and the background (see appendix A for a detailed description of this implementation); b is a parameter chosen to achieve an approximate mass decorrelation, and for convenience the tagger output is shifted by subtracting a fixed quantity a so that its average T , when evaluated on a reference background sample, vanishes. Notice that the sum (4) is equivalent to a product n,β 1 In Ref. [17] it was shown that for the topologies considered in this work (2-, 3-and 4-pronged) going beyond M = 7 does not contribute further to the discrimination performance of a NN. With Logistic Regression we find that the discrimination power saturates at M = 9.
which includes and generalises the commonly used ratios τ 21 = τ 2 . Several comments and clarifications regarding our procedure are now in order.
• Independently of the precise method used to determine c β n , a range of jet mass and p T has to be specified for the signal(s) and background. For any of these processes the distributions of the variables τ (β) n depend on m J and p T J . The coefficients c β n are then obtained to optimise the discrimination of signal and background within a given interval of jet mass and p T .
• Using ungroomed jet mass and p T for the determination of c β n reduces the dependence of the resulting taggers on the intervals chosen, and slightly improves the mass decorrelation. Also, this is desirable in order not to stick to a particular grooming algorithm. We have also tried using the groomed mass and p T , and the discrimination power is similar. We also use τ (β) n of the ungroomed jets, since we find that the discrimination between signal and background is better.
• For the mass decorrelation, evaluation of the tagger performance, etc., namely in all calculations except the tagger design itself, we use m J and p T J of the groomed jets.
The recursive soft drop [31] algorithm with parameters β = 1, z cut = 0.05, N = 3 is found to work very well for multi-pronged jets, avoiding the peak distortions and shifts that other algorithms produce [32].
• The parameter a is chosen to adjust T = 0 for a reference background sample with (groomed) p T J ≥ 250 GeV, and without any restriction on m J . For larger p T or when considering a narrow m J interval, the average is slightly shifted. This residual dependence may be accounted for by varying the tagger threshold, as done for example in Ref. [33] or, equivalently, by varying a as a function of m J and p T J . This sophistication however is not required for our discussion.
With the LoRD method we optimise the discrimination between quark/gluon jets (background) and multi-pronged decays of boosted colour singlet particles (signal). Quark and gluon jets are obtained by generating the parton-level processes pp → Zg, pp → Zq, with decay Z → νν, using MadGraph5 [34], and Pythia 8 [35]. In all cases the centreof-mass energy is set to 13 TeV. The detector response is simulated with Delphes 3.4 [36] using the CMS detector card. Jets are reconstructed using the anti-k T algorithm [37] with radius R = 0.8. FastJet 3.2 [38] is used for jet reconstruction, grooming and calculation of the τ (β) n variables. For the signal we use fat jets resulting from the decay of neutral, colour-singlet particles into four, three and two quarks, considering the six processes pp → Z → S Z(→ νν) , S → uūuū and S → bbbb, with S a scalar and F a fermion. These processes are generated with Protos [39] and, in order to remain as model-agnostic as possible, we implement decays of S and F with a flat matrix element, so that the decay weight of the different kinematical configurations only corresponds to the two-, three-or four-body phase space. These signal Monte Carlo data are dubbed as Model Independent (MI) data, and its use is motivated by the need to sample phase space without model prejudice [17]. This choice is very effective to make the taggers learn prongness rather than other undesired feature. Consequently, the obtained taggers T can be used outside -though not very far from -the interval they have been designed for.
We build taggers for four-pronged (4P), three-pronged (3P) and two-pronged (2P) signals using the corresponding set of signal processes in the first, second and third line of Eqs. (6) as signal, versus the QCD background. For 4P taggers we select M = 9 in (1), while for 3P and 2P taggers it is enough to use M = 7 and M = 5, respectively. 2 We select three different kinematical regions (for ungroomed quantities) for the design of the taggers, labelled as follows: • hi80: p T J ≥ 1 TeV, m J ∈ [60, 100] GeV.
For the backgrounds we set at the parton level a cut on the jet, p T ≥ 1 TeV (for hi80 and hi200) and p T ≥ 500 GeV (for lo80) in order to increase the efficiency of the event generation. After simulation, the cuts on p T J and m J are applied. For the signals we set M Z = 2.2 TeV for hi80 and hi200, and M Z = 1.1 TeV for lo80, in order to have a transverse momentum distribution close to the one that is subsequently required by the cut on p T J . The mass of the intermediate particles S, F is set to 80 GeV for hi80 and lo80 and 200 GeV for hi200. After the simulation, the cut on m J is applied to select the corresponding kinematical region.
In total, we develop nine taggers (2P, 3P and 4P for the three kinematical regions defined above), plus three alternate versions of the hi80 taggers for testing purposes. The size of the signal and background datasets used in the optimisation is collected in Table 1. The background events are approximately evenly divided among quark and gluons. The signal events for each (4P, 3P, 2P) signal class are approximately evenly divided among the two contributions listed in each line of Eqs. (6). Finally, the two classes (signal and background) are mutually balanced among each other as well. The results for the coefficients for the different taggers are collected in Table 2. By design, the signals are expected to have higher values ofT than the background, therefore the background can be reduced with a lower cut onT . For the mass decorrelation and test of the tagger performance we use a sample of QCD dijet production generated with MadGraph in 100 GeV intervals of p T , starting at [200, 300] GeV and with the last one having p T ≥ 2.2 TeV. The different samples are hadronised and passed through the detector simulation, and then combined with a weight that corresponds to the cross section. For each interval we generate 2 × 10 5 events and keep both jets (leading and sub-leading) for the analysis, therefore our QCD sample comprises 8.4 million jets, spanning a very wide range of mass and p T .
The application of a tight cut on the value ofT produces a noticeable modification in the lineshape of the QCD background versus the jet mass, as seen in appendix B. This is a serious inconvenient in experimental searches for a bump in this distribution. For other searches that do not use the jet mass as final discriminator, maintaining the shape is still desirable in order to be able to use sidebands for the estimation of the background. We therefore perform an approximate mass decorrelation following the DDT [25] proposal. For each tagger we select the parameter b by fitting the calculating the average slope of T versus ρ, in the interval ρ ∈ [−6, −2], for the QCD background sample. Because this average also depends on p T J , we select p T J ∈ [500, 600] GeV, which gives good results when the dependence on ρ is not linear and the averages show some spread with p T J . Finally, the parameter a is adjusted so as to have T = 0 in the inclusive QCD sample with p T J ≥ 250 GeV. The values of b and a for each tagger are collected in the last two   (3) and (4) corresponding to the different taggers.
lines of Table 2.
For illustration, we show in the left panels of Fig. 1 the average T for the QCD background in several p T J bins, as a function of ρ. The top, middle and bottom panels correspond to the 4P, 3P and 2P hi80 taggers, respectively. For comparison, we also show T for the non-decorrelated tagger, for the inclusive sample. For the 4P tagger the mass decorrelation is very good: the three lines for T are almost coincident and horizontal.
For the 3P and 2P taggers the decorrelation achieved with this simple prescription is    poorer. The same trend is found with the hi200, lo80 and the alternate hi80 taggers. The jet mass distributions for the QCD background after increasingly tighter cuts on T are presented on the right panels of Fig. 1. We observe that the background lineshape is very well preserved, with results comparable to the best decorrelation methods examined Ref. [40]. One can notice two minor features:  Table 2 (v1, solid lines) and alternate versions (v2, dashed lines) obtained with a different random seed.
(i) When the background is suppressed by a factor around 100 (e.g. green curve with respect to black curve in Fig. 1-top right), a small increase is produced in the first bin [20,40] GeV.
(ii) For the 3P and 2P taggers the slope of the QCD distribution (see right plots) slightly decreases after application of the cuts on T (coloured lines). This corresponds to the fact that the T distributions on the left panels are not as flat as for the 4P tagger.
We remark that this simple decorrelation with fixed b in (3) can easily be improved by taking b as a function of m J and p T J . As the numerical calculation ofT is very simple, this is a rather computer-inexpensive task. Our purpose here is to show that even a rough mass decorrelation with fixed b does most of the job to maintain the profile of the QCD jet mass distribution after the application of the tagger, even for very tight cuts on the tagger output. Refinements are always possible, see Ref. [25].
A final point that is of interest for practical applications is the stability of the results obtained by the LoRD. More concretely, we refer to the stability of the performance of the classifier on the test samples, for different initialisation of the parameters (see appendix A for details). With this purpose, we have used an alternate set of hi80 taggers obtained with different random seeds to check to which extent the T distributions are alike. Of course, with the optimisation procedure it is expected that the overall discrimination between signal and background is similar; however, this does not guarantee that the T distributions for the background and signals are alike.
The results, presented in Fig. 2, show a remarkable similarity between the alternate versions (v1 and v2) of the same tagger. No cut on jet mass is applied, and p T J ≥ 250 GeV is required on signal and background for consistency with the parton-level cut in the background generation. The details on the signals can be found in the next section. This stability is useful to be able to build a set of taggers that cover a very wide range of jet masses, if necessary.

Performance
We evaluate the performance of our taggers by selecting several new physics signals yielding multi-pronged jets, with q = u, d, s, c light quarks other than b. We generically denote the scalars with cascade decay into four quarks as S, and the scalars decaying into bb as A. The decays Z → SS, Z → AA can take place in any SM extension with an additional U(1) group and extra scalars, with the minimal consistent implementation given in Ref. [41]. The decays Z → W W can take place in left-right models [42,43]. We select M Z = 2.2 TeV and different values of M S and M A to test the hi80 and hi200 taggers. For three-pronged jets from Z → tt we have not found significant improvement over the simple ratio τ 32 and we omit the results for brevity. The generated signal samples have a minimum of 10 5 events, and for each event we use both jets, therefore the samples used have a minimum of 2 × 10 5 jets. The background sample is the same one with 8.4 × 10 6 events used for the mass decorrelation.
A meaningful assessment of the performance of the taggers can only be made within a given interval of jet mass and p T . (Other anomaly detection methods [18] report combined performances using the jet mass too as discriminator.) In all cases we require p T J ≥ 1 TeV, and we do not apply an upper cut on this variable because the distributions are in all cases peaked towards smaller transverse momentum. The jet mass interval selected for hi80 taggers is m J ∈ [60, 100] GeV, and for hi200 taggers it is m J ∈ [160, 240] GeV. These jet mass window requirements reduce the QCD background by factors of 6.5 and 7.1, respectively.
We present in Fig. 3 the receiver operating characteristic (ROC) curves for signal efficiency versus background rejection of the hi80 taggers (left column) and hi200 taggers (right column), evaluated on different signals indicated in each plot. For comparison, we  include the ROC curves for simple ratios τ nm ≡ τ (1) n /τ (1) m used in the literature. We also include the ROC curves for NN taggers trained on the same (groomed) mass and p T interval, using the same architecture of Ref. [17], with two fully connected hidden layers of 512 and 32 nodes, respectively. 3 For the NN taggers we do not perform any mass decorrelation. The results shown here for NNs are not fully comparable to those in Ref. [17] because here we select to apply the taggers on fixed intervals of groomed mass. The substructure of QCD jets with groomed mass e.g. m J ∈ [60, 100] GeV is not the same as for jets with ungroomed mass m J ∈ [60, 100] GeV.
To better illustrate the effect of the tagging on the signal-to-background significance S/ √ B (with S standing for signal and B for background) we define the significance improvement as with ε S , ε B the tagger efficiencies for signal and background, respectively. This is precisely the factor multiplying the luminosity-dependent ratio S/ √ B due to the tagging. We plot the lines (in gray) that correspond to several values of the significance improvement s. Notice that, for the mass intervals selected, an additional improvement by a factor of 2 is brought by the jet mass cut, which might even be larger for more stringent cuts on m J .
The first row of Fig. 3 shows the performance for S → AA → 4b, giving four-pronged jets with b quarks. A scalar undergoing this type of decay has been dubbed as 'stealth boson' because of its elusive nature [16]. We observe that τ -ratios, especially for M S = 80 GeV, fail to improve the signal significance, while the LoRD tagger can improve it by a factor of two. (For a S → AA → 4u signal the performance is similar.) The NN tagger reaches a higher significance improvement s = 3.
In the second row of Fig. 3 we study two-pronged jets from A → bb, which are harder to identify than W/Z bosons using jet substructure. The LoRD tagger performs better than the commonly used ratio τ 21 but, again, worse than the NN tagger.
In the third row of Fig. 3 we examine two signals without b quarks. On the left panel we have W → qq, for which as said the tagger T 2P has a better discrimination than for A → bb with the same mass. The performance of T 2P is half-way between the simplest τ 21 ratio and the more complex NN. On the right panel we show S → W W → 4q, giving a four-pronged jet with four light quarks. The performance of T 4P on this signal is quite good (better than for the analogue with four b quarks, top right panel) but it is still surpassed by the NN by a factor of two in terms of significance enhancement.
We also examine the performance of our taggers for jets containing electrons or photons as 'prongs', for which the taggers are not designed. We consider performance.  The decays S → AA → bbγγ can take place for example in the model of Refs. [41,44]. N is a heavy neutral lepton such as the right-handed neutrinos introduced in left-right models, which undergoes a three-body decay mediated by an off-shell W R boson. We set M Z = 2.2 TeV as before, and show our results in Fig. 4. The left column corresponds to hi80 taggers and the right column to hi200 taggers.
In the top panels of Fig. 4 we show the performance of the T 3P and T 4P taggers for jets containing two b quark and two photons. These conspicuous jets have a shape that is approximately four-pronged, and the τ 42 ratio works well to distinguish them from QCD jets. (The ratio τ 43 is comparable, and other τ -ratios are worse; we do not show the corresponding lines for clarity.) The LoRD taggers also work well to discriminate these signals from the background. Remarkably, for higher jet masses the LoRD taggers provide a better discrimination than dedicated NNs, although their performance is similar to that of τ 42 . As it can be observed, T 3P has a better discrimination for M S = 80 GeV (top, left panel) and T 4P is better for M S = 200 GeV (top, right panel).
In the bottom panels of Fig. 4 we show the performance of the T 3P taggers for a signal that is not properly three-pronged, since one of the 'prongs' is an electron rather than a jet. The taggers perform well for this signal for which they are not specifically designed, and better than the simple ratio τ 32 . Most surprisingly, the LoRD taggers perform better than the NN taggers, especially at m N = 200 GeV.
Overall, we find that the highest benefit of the LoRD taggers is achieved for fourpronged signals, for which they largely surpass the performance of simple τ -ratios and capture a good deal of the potential of a complex NN. Also, we remark that the taggers work remarkably well for signals for which they are not designed: (a) jets containing two b quarks plus two photons; (b) jets containing two light quarks plus an electron.
The performance of the LoRD taggers remains stable under moderate variations of the masses of the particle originating the fat jet. In the comparisons shown in Figs

Dependence on input data kinematics
We address in this section the variation of the tagger performance for signals (and backgrounds) with masses or transverse momentum quite different from the ones used in the design. We select the two reference mass and p T intervals used in Figs. 3 and 4 of the previous section: (a) m J ∈ [60, 100] GeV, p T J ≥ 1 TeV; (b) m J ∈ [160, 240] GeV, p T J ≥ 1 TeV. We apply the hi80, hi200 and lo80 taggers to selected 4P, 3P and 2P signals. The results are shown in Fig. 6. The left column corresponds to the different taggers applied on signals in the kinematical region (a), and the right column to different taggers applied on signals in the region (b). From the comparison we can observe that: • The performance is quite stable with p T J (solid versus dotted lines in the left column). Here we apply taggers designed with p T J ≥ 1 TeV (hi80, solid lines) and p T J ≥ 500 GeV (lo80, dotted lines), on jets with p T J ≥ 1 TeV and we observe that the performance is quite similar for 4P taggers (top panel) and basically the same for 3P and 2P taggers.
• For 4P taggers, the tagger efficiency is degraded when applying the wrong one, i.e. the hi200 tagger in the region (a) or the hi80 tagger in the region (b). For 3P taggers, nevertheless, the hi200 tagger is good in both (a) and (b) kinematical regions. For 2P taggers there is no appreciable difference in any case.
These results suggest that, if one is interested on a very wide range of jet masses, a set of two or three LoRD taggers with different masses can be used to cover that region without performance loss. The stability of the taggers resulting from the optimisation procedure, shown in Fig. 2, and the stability of the performance with small jet mass variations, shown in in Fig. 5, ensure that this procedure is feasible and would lead to smooth tagging efficiencies across the whole jet mass interval.

Conclusions
In this work we have used a logistic regression design (LoRD) to obtained simple taggers for multi-pronged jets based on jet substructure variables. These taggers can be approximately decorrelated from the jet mass. The application of the taggers keeps the shape of the jet mass distribution for the QCD background to a large extent.
The best results are achieved for taggers for four-pronged (4P) signals, which precisely are the least covered in terms of available tools. In this case, the mass decorrelation is very good (see Fig. 1), the signal to background discrimination largely surpasses simple τ -ratios used in current new physics searches, and is not far from dedicated NNs. For boosted top quarks, the three-pronged (3P) taggers do not bring much improvement over the ratio τ 32 . However, for boosted heavy neutrinos -which give jets that are not properly three-pronged -the LoRD taggers perform very well, even better than NNs trained on the same set of signals and backgrounds. A significance improvement by a factor up to 8 can be achieved. This fact is quite interesting since current searches for boosted heavy neutrinos [45] do not use any type of jet substructure analysis. We have also tested two-pronged (2P) taggers on a variety of signals, finding that the performance is half-way between the ratio τ 21 and a dedicated NN. LoRD T 3P and T 4P taggers are also sensitive to jets containing two b quarks plus two photons, a signature which is not experimentally covered [44].
We envisage two possible situations where the LoRD taggers may be very useful. The first one is when the development of a full-fledged multivariate tagger with mass decorrelation is not feasible. In this case, a LoRD tagger (or a handful of them) can easily be used, obtaining results that are not far from the ones that a multivariate method could bring.
The second situation is as a complement and cross-check of results obtained with more complex taggers as the ones based on deep neural networks. These methods are often a 'black box' whose results are difficult to test independently. Because the performance of the LoRD taggers is not far from NNs, they can be very useful as a robust test, especially in case any new physics signal involving fat jets is found at the LHC.

A Logistic Regression implementation
In this appendix we describe in detail our implementation of the Logistic Regression classifier for distinguishing signal from background.
, where t i are the binary labels of each event (signal or background) and x i are the input variables, in our concrete case x = lnτ , where τ is the vector of subjettiness variables as defined in (1). We preprocess the data such that signal and background events are equal in number for the training set and validation sets. For the latter, we use 20% of the dataset, while the remaining 80% is used for training (note that the test data in our study are completely independent samples analysed a posteriori).
The Logistic Regression is a simple ML model for classification where the different classes of data points are assumed to be linearly separable in the space of input variables x. As a probabilistic discriminative model, its goal is to model the conditional probability p(C k |x i ) of a given class C k , with (k = 0, 1), given an input x i , and it does so with the following parametric form: where we identify C 0 as the signal class. In the above equation, σ is the sigmoid function, σ(z) = (1 + e −z ) −1 , and w, w 0 are the weights to be optimised. In our analysis we have dropped the 'intercept' w 0 , since we wanted to make a direct correspondence with the product defined in Eq. (5), which has shown to be a good signal-vs-background discriminator in the literature [26] for simple jet topologies. A non-zero intercept amounts to a global multiplicative constant in (5). We have checked in any case that the inclusion of the intercept does not have a visible impact in the classification performance. On the other hand, the weights w are identified as the coefficients c β n in Eq. (4). The optimisation of the w parameters has been done as usual by minimising the so-called cross-entropy cost function: The implementation of the above procedure has been performed with the automatic differentiation tool Tensorflow v1.14 [46]. We have used the Adam stochastic gradient descent optimiser with learning rate 0.01, and otherwise default parameters, and using a mini-batch of size n = 200. Convergence is typically achieved at around 200 epochs, the training process taking around 3 seconds in an Intel(R) Core i5-5200U CPU @ 2.20GHz with 8GB of RAM.
It is worth noting that as a result of the above optimisation method, different random initialisations 4 of the weights w will lead to optimal weights having O(1) differences. Even if such differences will not have an important impact in the error rate of the validation test, they will lead to slightly different distributions for the background and the signal when applying the trained model to the test samples. An example of these variations is shown in Fig. 2.

B Taggers without mass decorrelation
The application of the taggers without prior mass decorrelation produce a significant shaping of the jet mass spectrum for the QCD background. This is illustrated in Fig. 7 for two taggersT 4P designed in two different mass intervals. The peak-like structure produced is near 100 GeV in both cases, therefore the location of the bump is not related to the design mass interval, [60, 100] GeV for hi80 and [170, 230] GeV for hi200. These examples highlight the benefit of the mass decorrelation even if it is not perfect, see Fig. 1. A drawback of the decorrelation procedure is a slightly worse discrimination between signal and background, as it can be seen in Fig. 8, for four-pronged (left), three-pronged (middle) and two-pronged (right) taggers, designed in the same kinematical region hi80.