Exploring the universality of hadronic jet classification

Cheung, Kingman; Chung, Yi-Lun; Hsu, Shih-Chieh; Nachman, Benjamin

doi:10.1140/epjc/s10052-022-11084-4

Exploring the universality of hadronic jet classification

Regular Article - Theoretical Physics
Open access
Published: 22 December 2022

Volume 82, article number 1162, (2022)
Cite this article

Download PDF

You have full access to this open access article

The European Physical Journal C Aims and scope Submit manuscript

Exploring the universality of hadronic jet classification

Download PDF

Kingman Cheung^1,5,
Yi-Lun Chung ORCID: orcid.org/0000-0001-8641-2601¹,
Shih-Chieh Hsu² &
…
Benjamin Nachman^3,4

920 Accesses
2 Citations
2 Altmetric
Explore all metrics

Abstract

The modeling of jet substructure significantly differs between Parton Shower Monte Carlo (PSMC) programs. Despite this, we observe that machine learning classifiers trained on different PSMCs learn nearly the same function. This means that when these classifiers are applied to the same PSMC for testing, they result in nearly the same performance. This classifier universality indicates that a machine learning model trained on one simulation and tested on another simulation (or data) will likely be optimal. Our observations are based on detailed studies of shallow and deep neural networks applied to simulated Lorentz boosted Higgs jet tagging at the LHC.

Towards machine learning analytics for jet substructure

Article Open access 30 September 2020

Quark jet versus gluon jet: fully-connected neural networks with high-level features

Article 26 June 2019

Neural network-based top tagger with two-point energy correlations and geometry of soft emissions

Article Open access 17 July 2020

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Deep learning is becoming widely used for various classification tasks in collider physics (see e.g., Refs. [1,2,3,4,5,6]). One of the core benefits of deep learning over traditional analysis techniques is that it is able to identify patterns in very high-dimensional feature spaces. At the Large Hadron Collider (LHC), such low-level inputs are dominated by hadronic activity. Most machine learning approaches are trained using Parton Shower Monte Carlo (PSMC) simulations that produce exclusive final states with the same complexity as real data [7]. However, there are significant variations between PSMCs due to the large number of perturbative and non-perturbative modeling assumptions.

These variations lead to potential biases and suboptimal sensitivity in data analyses [8]. A bias occurs when the simulation model used for inference (given an analysis strategy) is not the same as nature. There is a large and growing literature on methods to reduce biases from PSMC model variations through decorrelation [9,10,11,12] and other approaches [13,14,15]. A key challenge with modeling uncertainties in contrast to experimental uncertainties is that they are often estimated by comparing two simulations. This difference does not have a statistical origin and may not be the full uncertainty, so caution is required to reduce the uncertainty through automated approaches [16]. A general solution to estimating (and then reducing) systematic uncertainties from PSMC variations is still an active area of research and development.^{Footnote 1}

In principle, the same challenge exists when quantifying suboptimal performance due to PSMC variations. Suboptimal performance occurs when the simulation model used for training a machine learning model is different than nature. While not directly a source of systematic uncertainty, this suboptimality has important consequences for the physics program of the LHC. To quantify the suboptimality, one could compare different PSMC models, as is done for determine the systematic uncertainty. This has the same unsatisfying properties as described above.

However, there have been a number of hints in the literature that the suboptimality due to PSMC variations may actually be small. For example, Ref. [19] observed that training a quark versus gluon jet classifier with the Herwig [20] PSMC and then applying it to jets simulated with the Pythia [21] PSMC has nearly the same performance as training with Pythia and also testing on Pythia (with a statistically identical, but independent dataset). This small difference in performance is contrasted to the large difference in performance when testing on jets from Herwig. A similar result was observed in the context of signal jets in Ref. [22]. From these observations, we conjecture that the deep learning models are learning universal properties of quantum chromodynamics (QCD). We hypothesis that the performance gaps present when the test sets differ simply reflects variations in the amount of QCD radiation, but not the type of information that is useful for discrimination.

To build intuition for this conjecture, consider the case of quark versus gluon jet tagging. At leading logarithmic (LL) order and considering only infrared and collinear safe observables, the optimal classifier is simply iterated-soft-drop multiplicity inside the jet [23]. This statement is true independent of the strong coupling constant, $\alpha _s$. However, common metrics of performance such as the Area Under the Curve (AUC) depend on $\alpha _s$;^{Footnote 2} when there are more emissions (higher $\alpha _s$), the quark and gluon perturbative multiplicity distributions are more separable. In particular, at LL, perturbative multiplicity is a Poisson random variable with a mean that is proportional to a color factor multiplied by $\alpha _s$. As $\alpha _s$ grows, the gluon distribution grows significantly faster than the quark one:

$$\begin{aligned} \frac{\mu _g-\mu _q}{\sqrt{\sigma _g^2+\sigma _q^2}}&\sim \frac{\alpha _s(C_F-C_A)}{\sqrt{\alpha _s C_F+\alpha _s C_A}}\propto \sqrt{\alpha _s}\,, \end{aligned}$$

(1.1)

where $C_F=4/3$ ($C_A=3$) is the quark (gluon) color factor. Imagine that two PSMCs had the same physics approximations, but different values of $\alpha _s$. They would find the same classifier and thus if the test set is the same, the performance would be the same.

Our goal is to test the universality hypothesis in detail using the important benchmark problem of Lorentz boosted Higgs boson jet versus QCD jet tagging. In this context, universality means that the learned classifiers are the same up to a monotonic re-scaling, which means that they result in the same decision boundaries. We consider both shallow and deep learning models as well as a variety of PSMC models.

This paper is organized as follows. A concrete example are introduced in Sect. 2. Architectures of deep-learning classifiers are in Sect. 3. The results are provided in Sect. 4. The paper ends with conclusions and outlook in Sect. 5.

2 Numerical examples

Lorenz-boosted Higgs tagging, focusing on the $b\bar{b}$ final state, is the example in this study. High-level features and low-level inputs are used to train shallow and deep-learning classifiers.

2.1 Monte Carlo samples

This study considers Lorenz-boosted Higgs tagging, focusing on the $b\bar{b}$ final state. The signal is high $p_T$ Higgs bosons and the background is generic quark and gluon jets. The hard-scatter reactions are common to all parton shower models and are generated with MadGraph5_aMC@NLO 2.7.3 [24] for modeling pp collisions at $\sqrt{s}$ = 14 TeV. The PDF4LHC15_nnlo_mc [25] parton distribution function and the NNPDF30_nlo_as_0118 [26] parton distribution function are used for signal and background, respectively.

The hard-scattering events are passed to Pythia 8.303 [21] to simulate the parton shower, using three different complete parton-shower frameworks. The first one is default setting, where evolution variable is virtuality of the off-shell propagator. The second framework is Virtual Numerical Collider with Interleaved Antennae (Vincia) shower [27,28,29], where the evolution variable is transverse momentum for QCD + EW/QED showers based on the antenna formalism. The last framework is Dipole resummation (Dire), which is a transverse-momentum ordered dipole shower. The PYTHIA family uses the string model [30, 31] for hadronization. The string model is based on string fragmentation function to break string to form hardons. Herwig 7.2.2 [20] with angularly-ordered showers is also used to model the parton shower. The cluster model [32, 33] is implemented in Herwig 7.2.2. The cluster hadronization model is based on preconfinement. This model forcibly decays gluons into quark-antiquark pairs and form neutral clusters. Pyjet [34, 35] and the anti-$k_t$ [36] algorithm with radius parameter R = 1.0 are used to define the jets.

An event preselection similar to Ref. [37] is used to reject most background events. The Higgs-like jet is required to satisfy 300 GeV < $p^J_T$ < 500 GeV, 110 GeV < invariant mass of the jet ($M_J$) < 160 GeV and to be double b-tagged. Jets are declared double b-tagged if they have two or more ghosted-associated [38, 39] B hadrons. After the preselection, the high-level jet features and low-level features are used to probe the universality of discriminating boosted Higgs jets from QCD jets.

Since the goal of this paper is to investigate the universality of hadronic jet classification, there are a number of simplifying assumptions. The background in the study is only generic quark and gluon jets. The relatively smaller $t\bar{t}$ background is ignored. For each PSMC setup, the default parameters are used.

2.2 High-level features

In order to distinguish Higgs jets via Gradient Tree Boosting (BDT) and a fully connected / dense neural network, the following six commonly-used high-level features are considered:

1.
$M_J$: invariant mass of the leading jet;
2.
$\tau _{21}=\tau _2/\tau _1$: n-subjettiness ratio [40, 41];
3.
$D_2^{(\beta )}=e_3^{(\beta )}/(e_2^{(\beta )})^3$ with $\beta =1,2$: energy correlation function ratios [42];
4.
$C_2^{(\beta )}=e_3^{(\beta )}/(e_2^{(\beta )})^2$ with $\beta =1,2$: energy correlation function ratios [43];

where $e_i$ is the normalized sum over doublets ($i=2$) or triplets ($i=3$) of constituents inside jets, weighted by the product of the constituent transverse momenta and pairwise angular distances. For this analysis, $\beta $ is considered to be 1 and 2.

The distributions of these six variables are shown in Fig. 1, in which the capability of each observable to discriminate between signal and background is demonstrated. The salient features of these histograms are described below.

The jet invariant mass distribution peaks near the Higgs boson mass of 125 GeV [44] for the signal and has a broad distribution for the background. In the setup of this study, Herwig 7.2.2 with angularly-ordered showers leads to slightly higher and broader signal peak due to different underlying event structure compared to Pythia 8.303. Similarly, the distributions of $\tau _{21}$, $D_2^\beta $, and $C_2^\beta $ show similar position and shape of the peak among the Pythia PSMC’s, but somewhat different for the Herwig Angular. The two-prong structure due to the decay of massive objects into two hard QCD partons in the case of the signal jets results in low $\tau _{21}$, $D_2$ and $C_2$.

2.3 Low-level features

The low-level inputs to the CNN are images of Higgs-like jet [45, 46]. The resolution is 40$\times $40 pixels and in 1R$\times $1R range, where R is the jet radius. The images consist of three channels, analogous to the Red-Green-Blue (RGB) channels of a color image [19]. The pixel intensity for the three channels correspond to the sum of the charged particle $p_T$, the sum of the neutral particle $p_T$, and the number of charged particles in a given region of the image. The Higgs-like jet images are rotated to align along two-subject’s axis. The leading subjet is at the origin and the subleading subjet is directly below the leading subjet. If there is a third-leading subjet, the image will be reflected. All images are normalized so that the intensities all sum to unity.^{Footnote 3} After normalization, the pixel intensities are standardized so that their distribution has mean zero and unit variance. Figure 2 shows the average Higgs-like jet images in the charged $p_T$ channel. The patterns in the charged $p_T$ channel are similar to the other two channels.

Figure 3 shows the difference between the four PSMC algorithms with respect to Pythia 8.303 default showering, referred to as the nominal simulation. The substructure in jets are different among the other three PSMC simulations with respect to the nominal sample due to different approximations made in the final state radiation and other QCD effects. This diversity of the PSMC approaches may effect the performance of jet classifiers trained on low-level features. Therefore, we train a convolutional neural network-based jet classifier to explore this generator-dependence of classification performance.

3 Classifier architectures

The BDT has a fixed number of estimators (1400) with maximum depth 5. The minimum number of samples is fixed at 5% as required to split an internal node and 1% as required to be at a leaf node. This BDT model is trained on the high-level features of the jet using the scikit-learn library [48]. KerasTuner [49] is used to get the best configuration of hyperparameters.

The dense neural network has four full connected layers. There are 224, 928, 288 and 1024 neurons, respectively. Rectified linear unit (ReLU) activation functions are used for all layers of this neural network. Before the output layer, Dropout [50] regularization is added to reduce overfitting with a dropout rate = 0.01. For this two-class problem, the activation function of the output layer is a sigmoid function. The binary cross entropy loss function is optimized during the training. The Adam optimizer [51] with a learning rate of 6.5428$\times 10^{-5}$ is used to select the network weights. The KerasTuner [49] is used to get the best configuration of hyperparameters. The Keras-2.4.0 library is used to train the dense neural network models with the TENSORFLOW-2.4.1 [52] backend, on a NVIDIA A100 SXM 80GB Graphical Processing Unit (GPU).

Details of the CNN are as follows. The convolution filter is 5$\times $5, the maximum pooling layers are 2$\times $2, and the stride length is 1. ReLU activation functions are used for all intermediate layers of the neural network. The first convolution layer has 96 filters and the second convolution layer in each stream has 32 filters. A flatten layer is used after the second maximum pooling layer. Two dense layers are connected to the flatten layer with 350 and 400 neurons, respectively. Before the output layer, Dropout regularization is added with a dropout rate = 0.01. As for the dense network, the last activation is a sigmoid function and binary cross entropy is optimized during training. The AdaDelta optimizer [53] with learning rate 6.0216$\times 10^{-3}$ is used to select the network weights. The KerasTuner [49] is used to get the best configuration of hyperparameters. The same setup as for the dense network is used to run the CNN.

4 Results

In this study, the receiver operating characteristic curve (ROC), the area under the ROC curve (AUC), the maximum significance improvement characteristic (SIC) and rejection (inverse background efficiency) at 50% signal efficiency are used to be metrics to quantify the universality. The AUC is between 0.5 (poor classification performance) and 1 (maximum classification performance). The SIC is the signal efficiency divided by the square root of the background efficiency and represents by how much (as a multiplicative factor) the significance would improve with a given threshold on the classifier score. The maximum SIC is simply the maximum SIC attained across all thresholds. In order to quantify the variation from classifier training itself, the performance is evaluated by k-fold cross-validation technique with $k=50$. In this procedure, the datasets are randomly partitioned into 50 parts and for each one, the other 49 sets are used for constructing the classifier. The mean and spread over the folds is used to quantify the model performance.

Figure 4 shows four classifiers trained on various simulations and then tested on the same Herwig dataset. Overall, the CNN has the best performance and the DNN is marginally better than the BDT. The DNN and BDT are trained on the same features and given the relatively low-dimensionality of the problem, it is unsurprising that the two models have a similar performance. Overall, the performance is nearly identical for all training sets. This is even true for the CNN, which has access to low-level substructure information inside the jets. The insensitivity to the training set is in stark contrast to the sensitivity of the test set, as summarized in detail below. Additional results can be found in Appendix A.

The performance of Fig. 4 for all combinations of train and test sets for the three machine learning models are summarized in Figs. 5, 6 and 7. Starting with Fig. 5, we observe that there is a significant spread in performance across test sets (rows). The difference between Higgs jets and QCD jets is smaller for Herwig compared with Pythia by almost 10%. However, the spread in performance for a given test set is about 1%. Similar trends are present for the rejection at a fixed efficiency (Fig. 6) and maximum SIC (Fig. 7) plots, albeit with larger sensitivities to the machine learning training.

5 Conclusions and outlook

We have explored the universality of classifiers trained on hadronic jet tagging. In particular, we have studied the sensitivity of the learned classifier to the Parton Shower Monte Carlo program used during training. While the modeling of the hadronic structure differs significantly among PSMCs, we find that the actual function learned is nearly independent of the training set. This gives us confidence that a classifier trained on one PSMC and tested on another (or data) will likely still be optimal. Although it is not directly a source of uncertainty for physics analysis, this observation has important implications for making the best use of our data. The classifier universality does not mean that the systematic uncertainty from hadronic modeling is small as bias and optimality are separate concepts (see e.g., Ref. [8]).

The universality not only has important experimental implications, but also motivates further theoretical studies. As in the quark versus gluon jet example referenced in Sect. 1, the universality of the classifiers suggests that a theoretical explanation of the classification performance may be attainable as it should be insensitive to the detailed modeling assumptions of a particular PSMC program. We look forward to studies in this direction.

Uncertainty quantification is a critical component of any analysis at the LHC and this task is particularly challenging for analysis strategies like machine learning that are sensitive to low-level hadronic modeling. While determining systematic uncertainties on the potential bias of a result from hadronic modeling is still an active area of research and development, we have shown that at least the optimality of machine learning classifiers is relatively insensitive to hadronic modeling. While we have observed this disconnect between bias and optimality for Higgs jet tagging, we conjecture that this is a generic feature of QCD and it may also be present in other systems at the LHC and beyond.

Data Availability

This manuscript has associated data in a data repository. [Authors’ comment: The data are available upon reasonable request.]

Notes

See Refs. [17, 18] for the possibility of using machine learning to bound these uncertainties.
If Casimir scaling were holding the AUC would have been independent of $\alpha _s$. However, multiplicity breaks the Casimir scaling such that the AUC depends on $\alpha _s$.
This may remove useful discriminating information; however, it significantly improves the stability of the machine learning training [47].

References

M. Feickert, B. Nachman, A living review of machine learning for particle physics. arXiv:2102.02770
A.J. Larkoski, I. Moult, B. Nachman, Jet substructure at the large hadron collider: a review of recent advances in theory and machine learning. Phys. Rep. 841, 1–63 (2020). arXiv:1709.04464
Article ADS Google Scholar
D. Guest, K. Cranmer, D. Whiteson, Deep learning and its application to LHC physics. Ann. Rev. Nucl. Part. Sci. 68, 161–181 (2018). arXiv:1806.11484
Article ADS Google Scholar
A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, T. Wongjirad, Machine learning at the energy and intensity frontiers of particle physics. Nature 560(7716), 41–48 (2018)
Article ADS Google Scholar
D. Bourilkov, Machine and Deep learning applications in particle physics. Int. J. Mod. Phys. A 34(35), 1930019 (2020). arXiv:1912.08245
Article ADS Google Scholar
G. Karagiorgi, G. Kasieczka, S. Kravitz, B. Nachman, D. Shih, Machine learning in the search for new fundamental physics. arXiv:2112.03769
A. Buckley et al., General-purpose event generators for LHC physics. Phys. Rep. 504, 145–233 (2011). arXiv:1101.2599
Article ADS Google Scholar
B. Nachman, A guide for deploying deep learning in LHC searches: how to achieve optimality and account for uncertainty. SciPost Phys. 8, 090 (2020). arXiv:1909.03081
Article ADS MathSciNet Google Scholar
G. Louppe, M. Kagan, K. Cranmer, Learning to pivot with adversarial networks, in Advances in Neural Information Processing Systems, vol. 30, ed. by I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Curran Associates, Inc., 2017). arXiv:1611.01046
C. Englert, P. Galler, P. Harris, M. Spannowsky, Machine learning uncertainties with adversarial neural networks. Eur. Phys. J. C 79(1), 4 (2019). arXiv:1807.08763
Article ADS Google Scholar
S. Wunsch, S. Jórger, R. Wolf, G. Quast, Reducing the dependence of the neural network function to systematic uncertainties in the input space. arXiv:1907.11674
J.M. Clavijo, P. Glaysher, J.M. Katzy, Adversarial domain adaptation to reduce sample bias of a high energy physics classifier. arXiv:2005.00568
P. De Castro, T. Dorigo, INFERNO: inference-aware neural optimisation. Comput. Phys. Commun. 244, 170–179 (2019). arXiv:1806.04743
Article ADS Google Scholar
A. Ghosh, B. Nachman, D. Whiteson, Uncertainty-aware machine learning for high energy physics. Phys. Rev. D 104(5), 056026 (2021). arXiv:2105.08742
Article ADS Google Scholar
N. Simpson, L. Heinrich, neos: End-to-end-optimised summary statistics for high energy physics, in 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded—Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 3 (2022). arXiv:2203.05570
A. Ghosh, B. Nachman, A cautionary tale of decorrelating theory uncertainties. arXiv:2109.08159
B. Nachman, C. Shimmin, AI safety for high energy physics. arXiv:1910.08606
A. Stein, X. Coubez, S. Mondal, A. Novak, A. Schmidt, Improving robustness of jet tagging algorithms with adversarial training. arXiv:2203.13890
P.T. Komiske, E.M. Metodiev, M.D. Schwartz, Deep learning in color: towards automated quark/gluon jet discrimination. JHEP 01, 110 (2017). arXiv:1612.01551
Article ADS MATH Google Scholar
J. Bellm et al., Herwig 7.0/Herwig++ 3.0 release note. Eur. Phys. J. C 76(4), 196 (2016). arXiv:1512.01178
Article ADS Google Scholar
T. Sjostrand, S. Mrenna, P.Z. Skands, A brief introduction to PYTHIA 8.1. Comput. Phys. Commun. 178, 852–867 (2008). arXiv:0710.3820
Article ADS MATH Google Scholar
J.A. Aguilar-Saavedra, Taming modeling uncertainties with mass unspecific supervised tagging. Eur. Phys. J. C 82(3), 270 (2022). arXiv:2201.11143
Article ADS Google Scholar
C. Frye, A.J. Larkoski, J. Thaler, K. Zhou, Casimir meets Poisson: improved quark/gluon discrimination with counting observables. JHEP 09, 083 (2017). arXiv:1704.06266
Article ADS Google Scholar
J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H.S. Shao, T. Stelzer, P. Torrielli, M. Zaro, The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations. JHEP 07, 079 (2014). arXiv:1405.0301
Article ADS MATH Google Scholar
J. Butterworth et al., PDF4LHC recommendations for LHC Run II. J. Phys. G 43, 023001 (2016). arXiv:1510.03865
Article ADS Google Scholar
NNPDF Collaboration, R.D. Ball et al., Parton distributions for the LHC Run II. JHEP 04, 040 (2015). arXiv:1410.8849
P. Skands, R. Verheyen, Multipole photon radiation in the Vincia parton shower. Phys. Lett. B 811, 135878 (2020). arXiv:2002.04939
Article Google Scholar
H. Brooks, C.T. Preuss, Efficient multi-jet merging with the Vincia sector shower. Comput. Phys. Commun. 264, 107985 (2021). arXiv:2008.09468
Article MathSciNet Google Scholar
N. Fischer, S. Prestel, M. Ritzmann, P. Skands, Vincia for hadron colliders. Eur. Phys. J. C 76(11), 589 (2016). arXiv:1605.06142
Article ADS Google Scholar
T. Sjöstrand, Jet fragmentation of multiparton configurations in a string framework. Nucl. Phys. B 248(2), 469–502 (1984)
Article ADS Google Scholar
B. Andersson, G. Gustafson, G. Ingelman, T. Sjöstrand, Parton fragmentation and string dynamics. Phys. Rep. 97(2), 31–145 (1983)
Article ADS Google Scholar
B. Webber, A qcd model for jet fragmentation including soft gluon interference. Nucl. Phys. B 238(3), 492–528 (1984)
Article ADS Google Scholar
J.-C. Winter, F. Krauss, G. Soff, A modified cluster hadronization model. Eur. Phys. J. C 36, 381–395 (2004). arXiv:hep-ph/0311085
N. Dawe, E. Rodrigues, H. Schreiner, B. Ostdiek, D. Kalinkin, M.R.S. Meehan, aryan26roy, and domen13, scikit-hep/pyjet: Version 1.8.2, Jan (2021)
M. Cacciari, G.P. Salam, G. Soyez, yFastJet user manual. Eur. Phys. J. C 72, 1896 (2012). arXiv:1111.6097
Article ADS MATH Google Scholar
M. Cacciari, G.P. Salam, G. Soyez, The anti-$k_t$ jet clustering algorithm. JHEP 04, 063 (2008). arXiv:0802.1189
Article ADS MATH Google Scholar
J. Lin, M. Freytsis, I. Moult, B. Nachman, Boosting $H\rightarrow b{\bar{b}}$ with Machine Learning. JHEP 10, 101 (2018). arXiv:1807.10768
Article ADS Google Scholar
M. Cacciari, G.P. Salam, G. Soyez, The catchment area of jets. JHEP 04, 005 (2008). arXiv:0802.1188
Article ADS Google Scholar
A. Buckley, C. Pollard, yQCD-aware partonic jet clustering for truth-jet flavour labelling. Eur. Phys. J. C 76(2), 71 (2016). arXiv:1507.00508
Article ADS Google Scholar
J. Thaler, K. Van Tilburg, Identifying boosted objects with N-subjettiness. JHEP 03, 015 (2011). arXiv:1011.2268
Article ADS Google Scholar
J. Thaler, K. Van Tilburg, Maximizing boosted top identification by minimizing N-subjettiness. JHEP 02, 093 (2012). arXiv:1108.2701
Article ADS Google Scholar
A.J. Larkoski, I. Moult, D. Neill, Power counting to better jet observables. JHEP 12, 009 (2014). arXiv:1409.6298
Article ADS Google Scholar
A.J. Larkoski, G.P. Salam, J. Thaler, Energy correlation functions for jet substructure. JHEP 06, 108 (2013). arXiv:1305.0007
Article ADS MathSciNet MATH Google Scholar
Particle Data Group, Review of particle physics. Progr. Theor. Exp. Phys. 2020(08), 083C01 (2020)
J. Cogan, M. Kagan, E. Strauss, A. Schwarztman, Jet-images: computer vision inspired techniques for jet tagging. JHEP 02, 118 (2015). arXiv:1407.5675
Article ADS Google Scholar
L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, A. Schwartzman, Jet-images—deep learning edition. JHEP 07, 069 (2016). arXiv:1511.05190
Article Google Scholar
L. de Oliveira, M. Paganini, B. Nachman, Learning particle physics by example: location-aware generative adversarial networks for physics synthesis. arXiv:1701.05927
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet MATH Google Scholar
T. O’Malley, E. Bursztein, J. Long, ß. Chollet, H. Jin, L. Invernizzi, et al. KerasTuner. (2019). https://github.com/keras-team/keras-tuner
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)
MathSciNet MATH Google Scholar
D.P. Kingma, J. Ba, Adam: a method for stochastic optimization (2014). https://doi.org/10.48550/ARXIV.1412.6980; arXiv:1412.6980
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org
M.D. Zeiler, ADADELTA: an adaptive learning rate method, CoRR. (2012). arXiv:1212.5701

Download references

Acknowledgements

We thank J. Thaler for useful feedback on the manuscript. K. Cheung and Y.-L.Chung are supported by the Taiwan MoST with the grant number MOST-110-2112-M-007-017-MY3. S.-C. Hsu is supported by the U.S. Department of Energy, Office of Science, Office of Early Career Research Program under Award number DE-SC0015971. B. Nachman is supported by the U.S. Department of Energy, Office of Science under contract DE-AC02-05CH11231.

Author information

Authors and Affiliations

Department of Physics and Center for Theory and Computation, National Tsing Hua University, Hsinchu, 300, Taiwan
Kingman Cheung & Yi-Lun Chung
Department of Physics, University of Washington, Seattle, WA, 98195, USA
Shih-Chieh Hsu
Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Benjamin Nachman
Berkeley Institute for Data Science, University of California, Berkeley, CA, 94720, USA
Benjamin Nachman
Division of Quantum Phases and Devices,School of Physics, Konkuk University, Seoul, 143-701, Republic of Korea
Kingman Cheung

Authors

Kingman Cheung
View author publications
You can also search for this author in PubMed Google Scholar
Yi-Lun Chung
View author publications
You can also search for this author in PubMed Google Scholar
Shih-Chieh Hsu
View author publications
You can also search for this author in PubMed Google Scholar
Benjamin Nachman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yi-Lun Chung.

Appendix A: Remaining results

See Tables 1, 2, 3, 4, 5, 6, 7, Figs. 8, 9 and 10.

Table 1 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement when testing on Herwig for each trained classifier. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Table 2 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement when testing on Pythia default for each trained classifier. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Table 3 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement when testing on Pythia VINCIA for each trained classifier. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Table 4 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement when testing on Pythia dipole for each trained classifier. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Table 5 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement for the BDT model. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Table 6 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement for the DNN model. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Table 7 Area under the curve, rejection at 50% signal efficiency and maximum significance improvement for the CNN model. The last rows are the average and standard deviation over the mean values from the other rows

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Funded by SCOAP³. SCOAP³ supports the goals of the International Year of Basic Sciences for Sustainable Development.

Reprints and permissions

About this article

Cite this article

Cheung, K., Chung, YL., Hsu, SC. et al. Exploring the universality of hadronic jet classification. Eur. Phys. J. C 82, 1162 (2022). https://doi.org/10.1140/epjc/s10052-022-11084-4

Download citation

Received: 21 April 2022
Accepted: 29 November 2022
Published: 22 December 2022
DOI: https://doi.org/10.1140/epjc/s10052-022-11084-4

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Exploring the universality of hadronic jet classification

Abstract

Similar content being viewed by others

Towards machine learning analytics for jet substructure

Quark jet versus gluon jet: fully-connected neural networks with high-level features

Neural network-based top tagger with two-point energy correlations and geometry of soft emissions

1 Introduction