Abstract
Ensemble learning is a technique in which multiple component learners are combined through a protocol. We propose an Ensemble Neural Network (ENN) that uses the combined latent-feature space of multiple neural network classifiers to improve the representation of the network hypothesis. We apply this approach to construct an ENN from Convolutional and Recurrent Neural Networks to discriminate top-quark jets from QCD jets. Such an ENN provides the flexibility to improve the classification beyond simple prediction-combining methods by linking different sources of error correlations, thereby improving the correspondence between data and hypothesis. In combination with Bayesian techniques, we show that it can reduce epistemic uncertainties and the entropy of the hypothesis by simultaneously exploiting various kinematic correlations of the system, which also makes the network less susceptible to limitations in training-sample size.
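The abstract describes the architecture only at a high level. As a concrete illustration, the sketch below builds a latent-feature ensemble in TensorFlow/Keras: a CNN branch over jet images and an LSTM branch over constituent four-momentum sequences are merged in their latent spaces rather than at the prediction level, and Monte Carlo dropout stands in for the Bayesian treatment of epistemic uncertainty. All input shapes, layer widths, the dropout rate, and the helper `mc_predict` are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a latent-feature ensemble (ENN), assuming a
# TensorFlow/Keras stack. Shapes and layer sizes are illustrative.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# CNN branch: latent features from a calorimeter jet image.
img_in = layers.Input(shape=(40, 40, 1), name="jet_image")
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
cnn_latent = layers.Dense(32, activation="relu")(x)

# RNN branch: latent features from a sequence of jet constituents,
# each represented by an assumed four-vector (pT, eta, phi, E).
seq_in = layers.Input(shape=(30, 4), name="constituents")
y = layers.LSTM(32)(seq_in)
rnn_latent = layers.Dense(32, activation="relu")(y)

# Ensemble head: concatenate the two latent spaces instead of the two
# per-network predictions, so the classifier can exploit correlations
# between the branches' error sources.
z = layers.Concatenate()([cnn_latent, rnn_latent])
z = layers.Dropout(0.2)(z)  # kept active at test time for MC dropout
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid", name="top_vs_qcd")(z)

enn = Model([img_in, seq_in], out)
enn.compile(optimizer="adam", loss="binary_crossentropy")

# Monte Carlo dropout as a stand-in for the Bayesian treatment: sampling
# the network with dropout enabled yields a predictive distribution whose
# spread estimates the epistemic uncertainty.
def mc_predict(model, inputs, n_samples=50):
    preds = np.stack([model(inputs, training=True).numpy()
                      for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
```

Merging latent vectors rather than averaging the branch outputs is what lets the shared head learn correlations between the branches' errors, which is the mechanism the abstract attributes to the ENN over simple prediction-combining methods.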
ArXiv ePrint: 2102.01078
Rights and permissions
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Araz, J.Y., Spannowsky, M. Combine and conquer: event reconstruction with Bayesian Ensemble Neural Networks. J. High Energ. Phys. 2021, 296 (2021). https://doi.org/10.1007/JHEP04(2021)296