Mass Unspecific Supervised Tagging (MUST) for boosted jets

Jet identification tools are crucial for new physics searches at the LHC and at future colliders. We introduce the concept of Mass Unspecific Supervised Tagging (MUST) which relies on considering both jet mass and transverse momentum varying over wide ranges as input variables - together with jet substructure observables - of a multivariate tool. This approach not only provides a single efficient tagger for arbitrary ranges of jet mass and transverse momentum, but also an optimal solution for the mass correlation problem inherent to current taggers. By training neural networks, we build MUST-inspired generic and multi-pronged jet taggers which, when tested with various new physics signals, clearly outperform the discriminating variables commonly used by experiments.

Introduction: The high-energy frontier of particle physics has been and will continue to be explored in the decades to come at the Large Hadron Collider (LHC), a machine designed to unveil the intricate dynamics of the Standard Model (SM) and search for new physics signals. Being a proton-proton collider, the LHC abundantly produces sprays of hadronized quarks and gluons (jets), stemming mainly from pure Quantum Chromodynamics (QCD) processes. When sufficiently boosted, the hadronic decay products of SM particles like the W, Z and Higgs bosons and the top quark become highly collimated, yielding single 'fat' jets. This could also happen for new particles decaying hadronically. Indeed, multijet signals originating from direct or cascade decays of yet unseen particles are predicted in a plethora of theoretical frameworks beyond the SM, ranging from left-right symmetric models [1] to scenarios with warped extra dimensions [2, 3]. The complexity of the various possible jet topologies, and the importance of their identification, fostered the development of discrimination techniques to distinguish (signal) jets produced in collimated decays of heavy particles from the QCD ones (background). Those methods have been extensively used, for instance, in searches for new gauge bosons, scalar and spin-2 particles [4-13], vector-like quarks [14-17] and dark matter [18], as well as in SM measurements [19, 20].
Identification of jets requires (i) quantifying their mass $m_J$, usually after applying some 'grooming' [21-24] to remove soft collinear radiation, and (ii) inferring the number of quarks or gluons clustered inside them (prongs). The latter procedure, commonly known as tagging, relies either on a single jet substructure variable, like an N-subjettiness [25] or an energy correlation function [26, 27], or on a multivariate method that takes as input a set of those variables [28] or jet images [29]. Since quark and gluon jet masses arise mostly from soft radiation, which also modifies jet substructure, mass and substructure variables turn out to be correlated. Their decorrelation is crucial in several experimental searches, as it prevents artificial peaks from appearing in the $m_J$ distribution of the SM background after imposing jet substructure constraints, and provides a way of improving its normalization by using sidebands. Moreover, mass decorrelation is a must in new physics searches looking for bumps in jet mass spectra. Given the relevance of this matter, several mass decorrelation methods have been proposed [30-33] (see [34] for a comparison of different methods and [35] for a review) and subsequently applied in a variety of experimental analyses [7, 10-12, 18-20].
Beyond specific tools designed to identify a certain type of signal (e.g. weak or Higgs bosons and top quarks), more generic ones can also be developed. Supervised taggers use Monte Carlo (MC) simulations of two-, three- and four-pronged jets as signal, and QCD jets as background. Taking a complete set of substructure variables [28] for both types of jets within some range of $m_J$ and transverse momentum $p_T$, a multivariate tagging tool such as a neural network (NN) [32] or a simpler logistic regression [36] can be designed, such that the tagger learns to identify multi-pronged jets as well as new physics objects for which it has not been trained. Alternative proposals focus on unsupervised or weakly-supervised methods, trained directly on data rather than on simulation. Broadly, unsupervised tools are able to distinguish multi-pronged jets from background either by training on samples with different signal and background proportions [37-43], or by using autoencoders trained on background regions [44-49]. Overall, supervised and unsupervised methods have different strengths and weaknesses; however, supervised tools would certainly be essential to claim a new physics discovery if a 5σ excess is found in data: extraordinary claims require extraordinary evidence!
Mass decorrelation, as implemented so far in supervised generic taggers, has the disadvantage that the results retain a residual dependence on the $m_J$ and $p_T$ training ranges. This makes the tagger performance drop when applied to kinematical regions different from the ones used for training, as will be explicitly shown later. To overcome this problem, one could think of assembling an array of taggers in a two-dimensional grid of $m_J$ and $p_T$ to cover the whole kinematical region. But this ad hoc solution, besides being quite complex, could lead to potential problems with boundary effects.
arXiv:2008.12792v1 [hep-ph] 28 Aug 2020
Up to now, classifiers based on jet substructure have either not taken $m_J$ as an input variable [25-28, 32, 36], or have fixed it around some value suitable to tag a specific particle (e.g. a top quark [50]). In contrast, in this Letter we place both $m_J$ and $p_T$ on an equal footing with the substructure observables, by considering them as training inputs varying over wide kinematical ranges. This novel approach, which we dub Mass Unspecific Supervised Tagging (MUST), not only removes the dependence of the tagger efficiency on $m_J$ and $p_T$, but also solves the mass correlation problem in the best possible way, by preserving the shape of the $m_J$ distribution after applying the tagger. The taggers built upon MUST cover wide ranges of $m_J$ and $p_T$ (in principle, as wide as desired) with excellent discrimination performance across all those ranges. The nontrivial challenge of such tools is generating signal multi-pronged jets with continuous $m_J$ and $p_T$ distributions. This will be accomplished by means of a dedicated MC generator. A powerful multivariate method, such as the NN used here, is also required to correctly disentangle mass and $p_T$ effects on jet substructure variables from differences between the QCD background and the various multi-pronged signals.
Jet sample generation: We generate signal and background jets as follows. QCD jets are generated with MadGraph [51], in the inclusive process $pp \to jj$. Event samples are generated in 100 GeV bins of parton-level $p_T$, from [200, 300] GeV to [2.1, 2.2] TeV. This guarantees coverage of the entire $p_T$ range up to 2.2 TeV (of course, this arbitrarily chosen domain can be extended). Even though within each bin the events mainly populate the lower end of the interval, the bins are narrow enough to provide a smooth $p_T$ dependence. As for jet mass, the $m_J$ distribution for QCD jets is continuous, and we select for our analysis the range [50, 250] GeV.
The signal generation is considerably more demanding and is carried out with a dedicated MC generator. We implement in Protos [52] the process $pp \to ZS$, with $Z \to \nu\nu$ and S a scalar, for which we consider the six decay modes in (1) to generate multi-pronged jets (F is a fermion). To remain as model-agnostic as possible, the S and F decays are implemented with a flat matrix element, so that the decay weight of the different kinematical configurations only corresponds to the four-, three- or two-body phase space. These signal MC data are dubbed Model Independent (MI); their use is motivated by the need to sample phase space without model prejudice [32]. As for the background, signal jet samples are generated in 100 GeV $p_T$ bins. To cover different jet masses, the mass of S (and of F for three-pronged decays) is randomly chosen event by event within the interval [30, 400] GeV, with an upper limit of $p_T R/2$ to ensure that all decay products are contained in a jet of radius R = 0.8. The parton-level event samples generated with MadGraph and Protos are passed through Pythia [53] for hadronization and Delphes [54] for a fast detector simulation, using the CMS card. Jets are reconstructed with FastJet [55] applying the anti-$k_T$ algorithm [56] with R = 0.8, and groomed with Recursive Soft Drop [57].
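The mass sampling described above can be illustrated with a minimal sketch. It is not the Protos implementation: the uniform distribution, the use of the lower edge of each $p_T$ slice in the cap, and the helper names `pt_bins` and `sample_mass` are assumptions made here for illustration only.

```python
import random

R = 0.8  # jet radius quoted in the text

def pt_bins(low=200, high=2200, width=100):
    """Edges (in GeV) of the 100 GeV p_T slices, from [200, 300] to [2100, 2200]."""
    return [(edge, edge + width) for edge in range(low, high, width)]

def sample_mass(p_t, rng, m_min=30.0, m_max=400.0, radius=R):
    """Draw a mass (GeV) in [m_min, m_max], capped at p_T * R / 2.

    A uniform distribution is an assumption; the text only says 'randomly chosen'.
    """
    upper = min(m_max, p_t * radius / 2.0)
    if upper < m_min:
        raise ValueError("p_T too low for the requested mass window")
    return rng.uniform(m_min, upper)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
bins = pt_bins()
# One illustrative mass per slice, using the lower slice edge as p_T (assumption).
masses = [sample_mass(low, rng) for low, _ in bins]
```

With R = 0.8 the cap $p_T R/2$ reaches 400 GeV only at $p_T$ = 1 TeV, so in this sketch the lower slices populate only the lower part of the mass window.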
Building the generic taggers: The 'model-agnostic' signals in (1) allow building supervised generic taggers. In this Letter we develop
• a fully-generic tagger GenT, using the full set of samples as signal, and
• multi-pronged taggers GenT 4P, GenT 3P and GenT 2P, which take only the four-, three- and two-pronged jets as signal, respectively.
Jet substructure is characterized by the set of N-subjettiness variables $\tau_n^{(\beta)}$ proposed in [28], computed for ungroomed jets. We have verified that including higher-order $\tau_n^{(\beta)}$ does not improve tagger discrimination.
The training set is obtained by splitting the considered $m_J$ range [50, 250] GeV into four 50 GeV bins. For each of the six types of signal jets in (1) and each simulated sample (which, as mentioned above, correspond to different 100 GeV slices of parton-level $p_T$) we extract $N_0 = 5000$ events from each of the four $m_J$ bins. In the lower $p_T$ samples we drop the higher mass bins, considering the full $m_J$ range only for the $p_T$ bins above 800 GeV. For the GenT tagger we take $6N_0$ background events from each simulated sample and $m_J$ bin, while for the multi-pronged taggers we take $2N_0$, in order to train the NNs with a balanced sample. We have also explored the possibility of using unbalanced samples with more background than signal events, but we find no improvement in the discrimination power. In total, the training sets for the GenT and multi-pronged taggers contain $N = 4.14 \times 10^6$ and $N/3$ events, respectively. The validation sets used to monitor the NN performances are similar to the training ones.
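The quoted sample sizes can be cross-checked with a few lines of arithmetic. The sketch below assumes that each multi-pronged tagger uses two of the six signal types (which is what the balanced $2N_0$ background suggests); the number of populated ($p_T$ sample, $m_J$ bin) cells is inferred from the quoted total, not stated in the text.

```python
N0 = 5000                # events per signal type, p_T sample and m_J bin (from the text)
N_SIGNAL_TYPES = 6       # six signal decay modes
N_TOTAL_GENT = 4.14e6    # quoted GenT training-set size N

# GenT: 6*N0 signal plus 6*N0 background events per (p_T sample, m_J bin) cell.
per_cell_gent = N_SIGNAL_TYPES * N0 + 6 * N0

# Multi-pronged taggers: 2*N0 signal (assumed two signal types) plus 2*N0 background.
per_cell_multi = 2 * N0 + 2 * N0

# Number of populated cells implied by the quoted total (inferred, not quoted).
n_cells = N_TOTAL_GENT / per_cell_gent

# Resulting training-set size of each multi-pronged tagger.
n_multi = n_cells * per_cell_multi
```

Under these assumptions the total corresponds to 69 populated cells out of the 20 × 4 = 80 possible ones, consistent with the higher mass bins being dropped in the lower $p_T$ samples, and the multi-pronged size comes out as N/3.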
As anticipated above, we follow a novel approach to train the NNs by considering $m_J$ and $p_T$, varying over a very wide range, together with the 17 substructure observables as inputs. By means of a principal component analysis (PCA), we verified that the number of physically relevant combinations is actually smaller; however, since the computational speed is not jeopardized, we keep the full input set. A standardisation of the 19 inputs, based on the SM background distributions, is performed. The NN for GenT contains two hidden layers of 2048 and 128 nodes, with Rectified Linear Unit (ReLU) activation for the hidden layers and a sigmoid function for the output one. The NN optimization relies on the binary cross-entropy loss function, using the Adam optimizer (other generalized loss functions, such as the one proposed in [60], do not lead to appreciable improvements). The NNs for the multi-pronged taggers are similar but with hidden layers of 1024 and 64 nodes. We have found no relevant performance improvements of either tagger when using more hidden layers or layers with more nodes.
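To make the ingredients above concrete, the following sketch spells out the background-based standardisation, the activations and loss named in the text, and the size of the two architectures. It is a plain-Python illustration rather than the training code used for the taggers; a single output node and all function names are assumptions.

```python
import math

# Layer widths from the text: 19 inputs (17 substructure variables, m_J, p_T),
# two hidden layers, and one output node (a single sigmoid output is assumed).
GENT_LAYERS = [19, 2048, 128, 1]
MULTI_LAYERS = [19, 1024, 64, 1]

def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def background_standardiser(background_rows):
    """Return a function that shifts and scales inputs by the background mean and std."""
    n = len(background_rows)
    dims = len(background_rows[0])
    means = [sum(row[j] for row in background_rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in background_rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return lambda row: [(x - m) / s for x, m, s in zip(row, means, stds)]

def relu(x):
    """Hidden-layer activation."""
    return max(0.0, x)

def sigmoid(x):
    """Output-layer activation, mapping to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(label, prediction, eps=1e-12):
    """Per-event loss for a label in {0, 1} and a predicted score in (0, 1)."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

With these layer widths, `dense_parameter_count(GENT_LAYERS)` gives 303361 trainable parameters and `dense_parameter_count(MULTI_LAYERS)` gives 86145. The same standardiser, built once from background jets, would be applied to signal and background alike.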
Tagger testing and performance analysis: Our taggers are tested with a variety of multi-pronged jet signals from W bosons, top quarks and new scalars of various masses. These particles are assumed to be produced with a high boost from the decay of a heavy $Z'$ resonance, for which we choose representative masses $M_{Z'}$ = 1.1, 2.2, 3.3 TeV. As background, we use quark and gluon jets generated in $pp \to Zq$, $pp \to Zg$, with $Z \to \nu\nu$, in a 1:1 ratio. All these processes are generated with MadGraph, and passed through the simulation and reconstruction chain described above. The tagger performances are evaluated by comparing the efficiencies for signal ($\varepsilon_{\rm sig}$) and background ($\varepsilon_{\rm bkg}$) within a narrow $m_J$ interval and with a lower cut on $p_T$, so as to isolate the jet substructure discrimination power from that obtained with any other variable, such as $m_J$ and $p_T$. (An upper cut on $p_T$ is not necessary since both signal and background concentrate towards lower $p_T$ values.) In particular, for signals with $M_{Z'}$ = 1.1, 2.2, 3.3 TeV, we set $p_T \geq$ 0.5, 1.0, 1.5 TeV, respectively, for both signal and background.
Besides explicitly showing the receiver operating characteristic (ROC) curve for each signal, we use the area under the ROC curve (AUC) in the ($\varepsilon_{\rm sig}$, $\varepsilon_{\rm bkg}$) plane to quantify the discriminant power with a single quantity. For reference, we compare our results with those obtained with the commonly used ratios $\tau_{mn} \equiv \tau_m/\tau_n$.
The top left panel of Fig. 1 shows the GenT and GenT 2P ROCs for W bosons. We also include the results obtained with $\tau_{21}$, often used as a discriminator by the CMS Collaboration [5, 11, 14]. We observe that the performance of GenT and GenT 2P is remarkable and improves with jet $p_T$, i.e. with increasing $M_{Z'}$, in contrast to $\tau_{21}$. Therefore, our taggers provide an excellent alternative to those used, for example, in diboson resonance searches [11].
New scalars A decaying into $b\bar{b}$ are also looked for at the LHC [8, 10]. The top right panel of Fig. 1 shows the results for the signal $Z' \to AA$, $A \to b\bar{b}$ with different $M_{Z'}$ and $M_A$. The performance is very good across the whole $m_J$ and $p_T$ range, also improving with $p_T$.
The results for top quarks are shown in the bottom left panel of Fig. 1 for a couple of $Z'$ masses, and compared with the subjettiness ratio $\tau_{32}$ often used as a discriminant [4, 15-17]. GenT and GenT 3P perform very well on top quarks, although fully-dedicated taggers [50] perform better (in contrast with [50], our ROCs do not include the additional signal discrimination from $m_J$). Still, it is worth noting that generic searches using either of those taggers would not miss top signals.
In the bottom right panel of Fig. 1 the four-pronged jet results are shown. For comparison we show $\tau_{42}$, which happens to be the $\tau_{4n}$ with the highest AUC. Again, the GenT and GenT 4P performances are excellent, exhibiting, for instance, a background rejection better than that of $\tau_{42}$ by a factor of ten, for $\varepsilon_{\rm sig} = 0.5$. As expected, in all cases we observe that the multi-pronged taggers provide a higher discrimination power than the generic one for their corresponding multi-pronged signals, but of course they are less general. The sensitivity to signals for which the taggers have not been trained will be presented elsewhere.
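The figures of merit used in this comparison (ROC curve, AUC, and background rejection at fixed signal efficiency) can be computed from tagger scores as in the sketch below. Two conventions are assumed here and are not taken from the text: larger scores are treated as more signal-like, and the AUC is the area under $\varepsilon_{\rm sig}$ as a function of $\varepsilon_{\rm bkg}$, so that a perfect tagger has AUC = 1. The scores are toy numbers, not results.

```python
def roc_points(signal_scores, background_scores):
    """(eps_bkg, eps_sig) pairs for the cuts score >= t, scanning all observed scores."""
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        eps_sig = sum(s >= t for s in signal_scores) / len(signal_scores)
        eps_bkg = sum(b >= t for b in background_scores) / len(background_scores)
        points.append((eps_bkg, eps_sig))
    return points

def auc(points):
    """Trapezoidal area under eps_sig as a function of eps_bkg."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def background_rejection(points, eps_sig_target=0.5):
    """1 / eps_bkg at the loosest-background point reaching the target eps_sig."""
    for eps_bkg, eps_sig in points:
        if eps_sig >= eps_sig_target:
            return float("inf") if eps_bkg == 0 else 1.0 / eps_bkg
    raise ValueError("target signal efficiency is never reached")

# Toy scores for illustration only.
toy_signal = [0.9, 0.8, 0.7, 0.4]
toy_background = [0.6, 0.3, 0.2, 0.1]
toy_roc = roc_points(toy_signal, toy_background)
toy_auc = auc(toy_roc)
```

For a tagger whose score convention is reversed, the scores can simply be negated before calling `roc_points`. In this toy example 15 of the 16 signal-background pairs are correctly ordered, so `toy_auc` is 0.9375.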
We are now in a position to compare our results with those obtained with PCA-decorrelated taggers [32] trained on narrow $m_J$ and $p_T$ intervals. We consider signals $S \to AA \to b\bar{b}b\bar{b}$ with $M_{Z'}$ = 2.2 TeV, $M_S$ = 80 GeV (as in Fig. 1), and $S \to WW \to q\bar{q}q\bar{q}$ with $M_{Z'}$ = 2.2 TeV, $M_S$ = 200 GeV (not shown in the figure). Following [32], we build the PCA-decorrelated taggers std1000 80 and std1000 200 for $p_T \geq 1$ TeV and $m_J \in [60, 80]$ GeV, $m_J \in [160, 240]$ GeV, respectively. Their AUC is compared with that of GenT in Table I, showing that the taggers trained on a narrow interval close to the signal perform slightly better there, but are much worse when used away from the training region.
Mass decorrelation: Since our taggers are sensitive to multi-pronged jet signals across the whole $m_J$ and $p_T$ ranges, the SM background shape can be preserved by the simple method of varying the event selection threshold, as done by the CMS Collaboration e.g. in [10]. Let us show this explicitly with an example using GenT with a two-pronged (W) and a four-pronged (S) signal. We define the variable $\rho = 2 \log(m_J/p_T)$ and consider a two-dimensional grid ($\rho$, $p_T$) with $\rho \in [-9, 0]$ in bins of width 0.2, and $p_T \in [0.25, 2.2]$ TeV in bins of 50 GeV. Within each bin, we compute the 5th, 25th and 50th percentiles of the NN score X, which we label as $X_{0.05}$, $X_{0.25}$ and $X_{0.5}$, respectively. Figure 2 shows the resulting jet mass distribution of the SM background for $p_T \geq 1$ TeV plus the W and S injected signals with $M_{Z'}$ = 2.2 TeV, after applying the event selections $X \leq X_{0.5}$, $X_{0.25}$, $X_{0.05}$ (the uncut distribution is labelled as $X_{1.0}$). By construction, the varying-threshold scheme preserves the shape of the background distribution after selection, and the injected signals show up when the cut is sufficiently tight. Our generic taggers therefore provide a perfect solution to the mass correlation problem of jet substructure observables.
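The varying-threshold selection can be sketched as follows. The grid ranges and the selection $X \leq X_q$ follow the text; the natural logarithm in $\rho$, the nearest-rank quantile, and the function names are assumptions, and the jets are toy numbers chosen so that the score is strongly correlated with mass.

```python
import math
from collections import defaultdict

RHO_LOW, RHO_WIDTH = -9.0, 0.2   # rho in [-9, 0], bins of width 0.2
PT_LOW, PT_WIDTH = 250.0, 50.0   # p_T in [250, 2200] GeV, bins of 50 GeV

def rho(m_j, p_t):
    """rho = 2 log(m_J / p_T); the natural logarithm is assumed here."""
    return 2.0 * math.log(m_j / p_t)

def cell(m_j, p_t):
    """Indices of the (rho, p_T) grid cell containing the jet."""
    return (math.floor((rho(m_j, p_t) - RHO_LOW) / RHO_WIDTH),
            math.floor((p_t - PT_LOW) / PT_WIDTH))

def threshold_map(background, q):
    """Per-cell quantile X_q of the score, from (m_J, p_T, score) background jets."""
    scores = defaultdict(list)
    for m_j, p_t, score in background:
        scores[cell(m_j, p_t)].append(score)
    thresholds = {}
    for key, values in scores.items():
        values.sort()
        rank = max(1, math.ceil(q * len(values) - 1e-9))  # nearest-rank quantile
        thresholds[key] = values[rank - 1]
    return thresholds

def passes(jet, thresholds):
    """Selection X <= X_q, with X_q looked up in the jet's own grid cell."""
    m_j, p_t, score = jet
    key = cell(m_j, p_t)
    return key in thresholds and score <= thresholds[key]

# Toy background: heavier jets get systematically larger scores.
light = [(80.0, 1000.0, i / 100) for i in range(20)]
heavy = [(200.0, 1000.0, 0.5 + i / 100) for i in range(20)]
x_025 = threshold_map(light + heavy, 0.25)
kept = [jet for jet in light + heavy if passes(jet, x_025)]
```

In this toy example a single global cut on the score would remove one mass region entirely, whereas the per-cell threshold keeps 25% of the background in each cell, which is what leaves the $m_J$ shape unchanged.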
Concluding remarks: We have developed novel generic mass unspecific supervised taggers (MUSTs) for multi-pronged jets that keep an excellent performance across a very wide jet mass and $p_T$ range. Mass decorrelation can be easily implemented by the varying-threshold method. Overall, the excellent discrimination power (which increases with jet $p_T$) and the simplicity of their implementation make our taggers ideal for the exploration of multi-TeV scales in a wide variety of LHC searches that rely on jet tagging.