MadMiner: Machine Learning-Based Inference for Particle Physics

Abstract

Precision measurements at the LHC often require analyzing high-dimensional event data for subtle kinematic signatures, which is challenging for established analysis methods. Recently, a powerful family of multivariate inference techniques that leverage both matrix element information and machine learning has been developed. This approach neither requires the reduction of high-dimensional data to summary statistics nor any simplifications to the underlying physics or detector response. In this paper, we introduce MadMiner, a Python module that streamlines the steps involved in this procedure. Wrapping around MadGraph5_aMC and Pythia 8, it supports almost any physics process and model. To aid phenomenological studies, the tool also wraps around Delphes 3, though it is extendable to a full Geant4-based detector simulation. We demonstrate the use of MadMiner in an example analysis of dimension-six operators in ttH production, finding that the new techniques substantially increase the sensitivity to new physics.

Notes

  1. The issue of likelihood-free inference, the inference techniques discussed here, and MadMiner apply equally well in a Bayesian setting; see for instance Ref. [56].
  2. Note that this approach is similar in spirit to the Matrix Element Method, which also uses parton-level likelihoods and aims to estimate \(r(x | \theta _0, \theta _1)\) by calculating approximate versions of the integral in Eq. (3). But unlike the Matrix Element Method, our machine learning-based approach supports realistic shower and detector simulations and can be evaluated very efficiently.
  3. In fact, the score vector is a generalization of the concept of Optimal Observables [27,28,29] from the parton level to the full statistical model including shower and detector simulation.
  4. The Fisher information defines a metric on the parameter space, giving rise to the field of information geometry [9, 73, 74]. In that formalism, we can also define “global” distances measured along geodesics, which are equivalent to the expected log likelihood ratio even beyond the local approximation of small \(\Delta \theta\) [75].
  5. Fundamentally, the presented inference techniques also support new physics effects that affect, e.g., the probabilities of shower splittings, but this is currently not supported in MadMiner.
  6. Similarly, important phase-space regions can also be identified using the log likelihood ratio directly [105,106,107].
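
As a reminder of the local approximation mentioned in note 4, the relation between the Fisher information and the expected log likelihood ratio can be written as follows. This is a sketch using the standard definition of the Fisher information; the symbols \(I_{ij}\) and \(d_{\text{local}}\) are notation assumed here and may differ from the notation in the main text.

```latex
I_{ij}(\theta) = \mathbb{E}\left[
  \frac{\partial \log p(x | \theta)}{\partial \theta_i}\,
  \frac{\partial \log p(x | \theta)}{\partial \theta_j}
  \,\middle|\, \theta \right],
\qquad
d_{\text{local}}^2(\theta; \theta_0)
  = I_{ij}(\theta_0)\, \Delta\theta_i\, \Delta\theta_j ,
\qquad
\Delta\theta = \theta - \theta_0 ,

-2\, \mathbb{E}\left[ \log r(x | \theta, \theta_0) \,\middle|\, \theta_0 \right]
  = d_{\text{local}}^2(\theta; \theta_0) + \mathcal{O}(\Delta\theta^3) .
```

Summation over the repeated indices \(i, j\) is implied. The global distances of note 4 replace the quadratic form by a length measured along geodesics of this metric.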

References

  1. Brehmer J, Cranmer K, Espejo I, Kling F, Louppe G, Pavez J (2019) Effective LHC measurements with matrix elements and machine learning. arXiv:1906.01578
  2. Cranmer KS (2001) Kernel estimation in high-energy physics. Comput Phys Commun 136:198
  3. Cranmer K, Lewis G, Moneta L, Shibata A, Verkerke W (2012) HistFactory: a tool for creating statistical models for use with RooFit and RooStats (ROOT)
  4. Frate M, Cranmer K, Kalia S, Vandenberg-Rodes A, Whiteson D (2017) Modeling smooth backgrounds and generic localized signals with Gaussian processes. arXiv:1709.05681
  5. Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Statist 12(4):1151
  6. Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics 162(4):2025
  7. Alsing J, Wandelt B, Feeney S (2018) Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology. arXiv:1801.01497
  8. Charnock T, Lavaux G, Wandelt BD (2018) Automatic physical inference with information maximizing neural networks. Phys Rev D97(8):083004
  9. Brehmer J, Cranmer K, Kling F, Plehn T (2017) Better Higgs boson measurements through information geometry. Phys Rev D95(7):073002
  10. Brehmer J, Kling F, Plehn T, Tait TMP (2018) Better Higgs-CP tests through information geometry. Phys Rev D97(9):095017
  11. Kondo K (1988) Dynamical likelihood method for reconstruction of events with missing momentum. I. Method and toy models. J Phys Soc Jpn 57:4126
  12. Abazov VM et al (2004) A precision measurement of the mass of the top quark. Nature 429:638 (DØ)
  13. Artoisenet P, Mattelaer O (2008) MadWeight: automatic event reweighting with matrix elements. PoS CHARGED2008:025
  14. Gao Y, Gritsan AV, Guo Z, Melnikov K, Schulze M, Tran NV (2010) Spin determination of single-produced resonances at hadron colliders. Phys Rev D81:075022
  15. Alwall J, Freitas A, Mattelaer O (2011) The matrix element method and QCD radiation. Phys Rev D83:074010
  16. Bolognesi S, Gao Y, Gritsan AV et al (2012) On the spin and parity of a single-produced resonance at the LHC. Phys Rev D86:095031
  17. Avery P et al (2013) Precision studies of the Higgs boson decay channel \(H \rightarrow ZZ \rightarrow 4l\) with MEKD. Phys Rev D87(5):055006
  18. Andersen JR, Englert C, Spannowsky M (2013) Extracting precise Higgs couplings by using the matrix element method. Phys Rev D87(1):015019
  19. Campbell JM, Ellis RK, Giele WT, Williams C (2013) Finding the Higgs boson in decays to \(Z \gamma\) using the matrix element method at Next-to-Leading Order. Phys Rev D87(7):073005
  20. Artoisenet P, de Aquino P, Maltoni F, Mattelaer O (2013) Unravelling \(t\overline{t}h\) via the Matrix Element Method. Phys Rev Lett 111(9):091802
  21. Gainer JS, Lykken J, Matchev KT, Mrenna S, Park M (2013) The matrix element method: past, present, and future. In: Proceedings of community summer study on the future of U.S. particle physics: snowmass on the Mississippi (CSS2013): Minneapolis, MN, USA, July 29–August 6 2013. arXiv:1307.3546
  22. Schouten D, DeAbreu A, Stelzer B (2015) Accelerated matrix element method with parallel computing. Comput Phys Commun 192:54
  23. Martini T, Uwer P (2015) Extending the matrix element method beyond the born approximation: calculating event weights at next-to-leading order accuracy. JHEP 09:083
  24. Gritsan AV, Röntsch R, Schulze M, Xiao M (2016) Constraining anomalous Higgs boson couplings to the heavy flavor fermions using matrix element techniques. Phys Rev D94(5):055023
  25. Martini T, Uwer P (2017) The Matrix Element Method at next-to-leading order QCD for hadronic collisions: single top-quark production at the LHC as an example application. arXiv:1712.04527
  26. Kraus M, Martini T, Uwer P (2019) Predicting event weights at next-to-leading order QCD for jet events defined by \(2\rightarrow 1\) jet algorithms. arXiv:1901.08008
  27. Atwood D, Soni A (1992) Analysis for magnetic moment and electric dipole moment form-factors of the top quark via \(e^+ e^- \rightarrow t \bar{t}\). Phys Rev D45:2405
  28. Davier M, Duflot L, Le Diberder F, Rouge A (1993) The Optimal method for the measurement of tau polarization. Phys Lett B306:411
  29. Diehl M, Nachtmann O (1994) Optimal observables for the measurement of three gauge boson couplings in \(e^+ e^- \rightarrow W^+ W^-\). Z Phys C62:397
  30. Soper DE, Spannowsky M (2011) Finding physics signals with shower deconstruction. Phys Rev D84:074002
  31. Soper DE, Spannowsky M (2013) Finding top quarks with shower deconstruction. Phys Rev D87:054012
  32. Soper DE, Spannowsky M (2014) Finding physics signals with event deconstruction. Phys Rev D89(9):094005
  33. Englert C, Mattelaer O, Spannowsky M (2016) Measuring the Higgs-bottom coupling in weak boson fusion. Phys Lett B756:103
  34. Fan Y, Nott DJ, Sisson SA (2012) Approximate Bayesian computation via regression density estimation. arXiv:1212.1479
  35. Dinh L, Krueger D, Bengio Y (2014) NICE: Non-linear Independent Components Estimation. arXiv:1410.8516
  36. Germain M, Gregor K, Murray I, Larochelle H (2015) MADE: masked autoencoder for distribution estimation. arXiv:1502.03509
  37. Cranmer K, Pavez J, Louppe G (2015) Approximating likelihood ratios with calibrated discriminative classifiers. arXiv:1506.02169
  38. Cranmer K, Louppe G (2016) Unifying generative models and exact likelihood-free inference with conditional bijections. J Brief Ideas
  39. Louppe G, Cranmer K, Pavez J (2016) carl: a likelihood-free inference toolbox. J Open Source Softw 1(1):11
  40. Dinh L, Sohl-Dickstein J, Bengio S (2016) Density estimation using Real NVP. arXiv:1605.08803
  41. Papamakarios G, Murray I (2016) Fast \(\epsilon\)-free inference of simulation models with Bayesian conditional density estimation. arXiv:1605.06376
  42. Dutta R, Corander J, Kaski S, Gutmann MU (2016) Likelihood-free inference by ratio estimation. arXiv:1611.10242
  43. Uria B, Côté M-A, Gregor K, Murray I, Larochelle H (2016) Neural autoregressive distribution estimation. arXiv:1605.02226
  44. Gutmann MU, Dutta R, Kaski S, Corander J (2017) Likelihood-free inference via classification. Stat Comput 1–15
  45. Tran D, Ranganath R, Blei DM (2017) Hierarchical implicit models and likelihood-free variational inference. arXiv:1702.08896
  46. Louppe G, Cranmer K (2017) Adversarial variational optimization of non-differentiable simulators. arXiv:1707.07113
  47. Papamakarios G, Pavlakou T, Murray I (2017) Masked autoregressive flow for density estimation. arXiv:1705.07057
  48. Lueckmann J-M, Goncalves PJ, Bassetto G, Öcal K, Nonnenmacher M, Macke JH (2017) Flexible statistical inference for mechanistic models of neural dynamics. arXiv:1711.01861
  49. Huang C-W, Krueger D, Lacoste A, Courville A (2018) Neural autoregressive flows. arXiv:1804.00779
  50. Papamakarios G, Sterratt DC, Murray I (2018) Sequential neural likelihood: fast likelihood-free inference with autoregressive flows. arXiv:1805.07226
  51. Lueckmann J-M, Bassetto G, Karaletsos T, Macke JH (2018) Likelihood-free inference with emulator networks. arXiv:1805.09294
  52. Chen TQ, Rubanova Y, Bettencourt J, Duvenaud DK (2018) Neural ordinary differential equations. arXiv:1806.07366
  53. Kingma DP, Dhariwal P (2018) Glow: generative flow with invertible 1x1 convolutions. arXiv:1807.03039
  54. Grathwohl W, Chen RTQ, Bettencourt J, Sutskever I, Duvenaud D (2018) FFJORD: free-form continuous dynamics for scalable reversible generative models. arXiv:1810.01367
  55. Dinev T, Gutmann MU (2018) Dynamic likelihood-free inference via ratio estimation (DIRE). arXiv:1810.09899
  56. Hermans J, Begy V, Louppe G (2019) Likelihood-free MCMC with approximate likelihood ratios. arXiv:1903.04057
  57. Alsing J, Charnock T, Feeney S, Wandelt B (2019) Fast likelihood-free cosmology with neural density estimators and active learning. arXiv:1903.00007
  58. Greenberg DS, Nonnenmacher M, Macke JH (2019) Automatic posterior transformation for likelihood-free inference. arXiv:1905.07488
  59. Brehmer J, Louppe G, Pavez J, Cranmer K (2018) Mining gold from implicit models to improve likelihood-free inference. arXiv:1805.12244
  60. Brehmer J, Cranmer K, Louppe G, Pavez J (2018) Constraining effective field theories with machine learning. Phys Rev Lett 121(11):111801
  61. Brehmer J, Cranmer K, Louppe G, Pavez J (2018) A guide to constraining effective field theories with machine learning. Phys Rev D98(5):052004
  62. Stoye M, Brehmer J, Louppe G, Pavez J, Cranmer K (2018) Likelihood-free inference with an improved cross-entropy estimator. arXiv:1808.00973
  63. Alwall J, Frederix R, Frixione S et al (2014) The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations. JHEP 07:079
  64. Sjöstrand T, Mrenna S, Skands PZ (2008) A Brief Introduction to PYTHIA 8.1. Comput Phys Commun 178:852
  65. de Favereau J, Delaere C, Demin P et al (2014) DELPHES 3, A modular framework for fast simulation of a generic collider experiment. JHEP 02:057 (DELPHES 3)
  66. Agostinelli S et al (2003) GEANT4: A Simulation toolkit. Nucl Instrum Meth A506:250 (GEANT4)
  67. Cranmer K (2015) Practical Statistics for the LHC. In: Proceedings, 2011 European School of High-Energy Physics (ESHEP 2011): Cheile Gradistei, Romania, September 7–20, 2011, pp 267–308. arXiv:1503.07622
  68. Baldi P, Cranmer K, Faucett T, Sadowski P, Whiteson D (2016) Parameterized neural networks for high-energy physics. Eur Phys J C76(5):235
  69. Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9(1):60
  70. Wald A (1943) Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans Am Math Soc 54(3):426
  71. Cowan G, Cranmer K, Gross E, Vitells O (2011) Asymptotic formulae for likelihood-based tests of new physics. Eur Phys J C71:1554 (Erratum: Eur Phys J C73:2501, 2013)
  72. Alsing J, Wandelt B (2018) Generalized massive optimal data compression. Mon Not R Astron Soc 476(1):L60
  73. Efron B (1975) Defining the curvature of a statistical problem (with applications to second order efficiency). Ann Stat 3(6):1189
  74. Amari S-I (1982) Differential geometry of curved exponential families-curvatures and information loss. Ann Statist 10(2):357
  75. Brehmer J (2017) New ideas for effective Higgs measurements. Ph.D. thesis, U. Heidelberg. http://www.thphys.uni-heidelberg.de/~plehn/includes/theses/brehmer_d.pdf
  76. Radhakrishna Rao C (1945) Information and the accuracy attainable in the estimation of statistical parameters. Bull Calcutta Math Soc 37:81
  77. Cramér H (1946) Mathematical methods of statistics. Princeton University Press, ISBN 0691080046
  78. Edwards TDP, Weniger C (2018) A fresh approach to forecasting in astroparticle physics and dark matter searches. JCAP 1802(02):021
  79. Degrande C, Duhr C, Fuks B, Grellscheid D, Mattelaer O, Reiter T (2012) UFO - The Universal FeynRules Output. Comput Phys Commun 183:1201
  80. Mattelaer O (2016) On the maximal use of Monte Carlo samples: re-weighting events at NLO accuracy. Eur Phys J C76(12):674
  81. Aad G et al (2015) A morphing technique for signal modelling in a multidimensional space of coupling parameters. Physics note ATL-PHYS-PUB-2015-047. http://cds.cern.ch/record/2066980 (ATLAS)
  82. Alsing J, Wandelt B (2019) Nuisance hardened data compression for fast likelihood-free inference. arXiv:1903.01473
  83. Heinrich L, Feickert M, Stark G, Turra R, Forde J (2018) diana-hep/pyhf v0.0.15. https://doi.org/10.5281/zenodo.1464139
  84. Frederix R, Frixione S, Hirschi V, Maltoni F, Pittau R, Torrielli P (2012) Four-lepton production at hadron colliders: aMC@NLO predictions with theoretical uncertainties. JHEP 02:099
  85. Paszke A, Gross S, Chintala S et al (2017) Automatic differentiation in PyTorch. In: NIPS-W
  86. Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Netw 12(1):145
  87. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
  88. Reddi SJ, Kale S, Kumar S (2018) On the convergence of Adam and beyond. In: International conference on learning representations
  89. Lakshminarayanan B, Pritzel A, Blundell C (2016) Simple and scalable predictive uncertainty estimation using deep ensembles. arXiv:1612.01474
  90. Brehmer J, Kling F, Espejo I, Cranmer K (2019) MadMiner code repository. https://doi.org/10.5281/zenodo.1489147
  91. Brehmer J, Kling F, Espejo I, Cranmer K (2019) MadMiner technical documentation. https://madminer.readthedocs.io/en/latest/
  92. Espejo I, Brehmer J, Cranmer K (2019) MadMiner Docker repositories. https://hub.docker.com/u/madminertool
  93. Šimko T, Heinrich L, Hirvonsalo H, Kousidis D, Rodríguez D (2018) REANA: a system for reusable research data analyses. Technical Report CERN-IT-2018-003, CERN, Geneva. https://cds.cern.ch/record/2652340
  94. Espejo I, Brehmer J, Kling F, Cranmer K (2019) MadMiner Reana deployment. https://github.com/irinaespejo/workflow-madminer
  95. The HDF Group (2000–2010) Hierarchical data format version 5. http://www.hdfgroup.org/HDF5
  96. Dobbs M, Hansen JB (2001) The HepMC C++ Monte Carlo event record for High Energy Physics. Comput Phys Commun 134:41
  97. Rodrigues E, Marinangeli M, Pollack B et al (2019) scikit-hep/scikit-hep: scikit-hep-0.5.1. https://doi.org/10.5281/zenodo.3234683
  98. Oliphant T (2006) NumPy: A guide to NumPy. Trelgol Publishing, USA. http://www.numpy.org/
  99. Butterworth J et al (2016) PDF4LHC recommendations for LHC Run II. J Phys G43:023001
  100. de Florian D et al (LHC Higgs Cross Section Working Group) (2016) Handbook of LHC Higgs cross sections: 4. Deciphering the Nature of the Higgs Sector. arXiv:1610.07922
  101. Giudice GF, Grojean C, Pomarol A, Rattazzi R (2007) The strongly-interacting light Higgs. JHEP 06:045
  102. Alloul A, Fuks B, Sanz V (2014) Phenomenology of the Higgs Effective Lagrangian via FEYNRULES. JHEP 04:110
  103. Maltoni F, Vryonidou E, Zhang C (2016) Higgs production in association with a top-antitop pair in the standard model effective field theory at NLO in QCD. JHEP 10:123
  104. Cepeda M et al (Physics of the HL-LHC Working Group) (2019) Higgs physics at the HL-LHC and HE-LHC. arXiv:1902.00134
  105. Plehn T, Schichtel P, Wiegand D (2014) Where boosted significances come from. Phys Rev D89(5):054002
  106. Kling F, Plehn T, Schichtel P (2017) Maximizing the significance in Higgs boson pair analyses. Phys Rev D95(3):035026
  107. Gonçalves D, Han T, Kling F, Plehn T, Takeuchi M (2018) Higgs boson pair production at future hadron colliders: From kinematics to dynamics. Phys Rev D97(11):113004
  108. Merkel D (2014) Docker: Lightweight Linux containers for consistent development and deployment. Linux J 2014:239
  109. Kluyver T, Ragan-Kelley B, Pérez F et al (2016) Jupyter notebooks - a publishing format for reproducible computational workflows. In: ELPUB
  110. Hunter JD (2007) Matplotlib: A 2D graphics environment. Comput Sci Eng 9(3):90
  111. Heinrich L (2018) lukasheinrich/pylhe v0.0.4. https://doi.org/10.5281/zenodo.1217032
  112. Sjöstrand T, Ask S, Christiansen JR et al (2015) An Introduction to PYTHIA 8.2. Comput Phys Commun 191:159
  113. Van Rossum G, Drake FL Jr (1995) Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands
  114. Rodrigues E (2019) The Scikit-HEP Project. In: 23rd International conference on computing in high energy and nuclear physics (CHEP 2018) Sofia, Bulgaria, 9–13 July 2018. arXiv:1905.00002
  115. Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825
  116. Pivarski J, Das P, Smirnov D et al (2019) scikit-hep/uproot: 3.7.2. https://doi.org/10.5281/zenodo.3256257
  117. Heinrich L, Cranmer K (2017) diana-hep/yadage v0.12.13. https://doi.org/10.5281/zenodo.1001816

Acknowledgements

We would like to thank Zubair Bhatti, Lukas Heinrich, Alexander Held, and Samuel Homiller for their important contributions to the development of MadMiner. We are grateful to Joakim Olsson for his help with the ttH data generation. We also thank Pablo de Castro, Sally Dawson, Gilles Louppe, Olivier Mattelaer, Duccio Pappadopulo, Michael Peskin, Tilman Plehn, Josh Rudermann, and Leonora Vesterbacka for fruitful discussions. Last but not least, we are grateful to the authors and maintainers of many open-source software packages, including Delphes 3 [65], Docker [108], Jupyter notebooks [109], MadGraph5_aMC [63], Matplotlib [110], NumPy [98], pylhe [111], Pythia 8 [112], Python [113], PyTorch [85], REANA [93], scikit-hep [114], scikit-learn [115], uproot [116], and yadage [117]. This work was supported by the U.S. National Science Foundation (NSF) under the awards ACI-1450310, OAC-1836650, and OAC-1841471. It was also supported through the NYU IT High Performance Computing resources, services, and staff expertise. JB and KC are grateful for the support of the Moore–Sloan data science environment at NYU. KC is also supported through the NSF grant PHY-1505463, while FK is supported by NSF grant PHY-1620638 and U.S. Department of Energy grant DE-AC02-76SF00515.

Author information

Corresponding author

Correspondence to Johann Brehmer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Frequently Asked Questions

Here, we collect questions that are asked often, hoping to avoid misconceptions:

  • Does the whole event history not change when I change parameters?

    No. In probabilistic processes such as those at the LHC, any given event history is typically compatible with different values of the theory parameters, but might be more or less likely. By “event history” we mean the entire evolution of a simulated particle collision, ranging from the initial-state and final-state elementary particles through the parton shower and detector interactions to observables. The joint likelihood ratio and joint score quantify how much more or less likely one particular such evolution of a simulated event becomes when the theory parameters are varied.

  • If the network is trained on parton-level matrix element information, how does it learn about the effect of shower and detector?

    It is true that the “labels” that the networks are trained on, the joint likelihood ratio and joint score, are based on parton-level information. However, the inputs into the neural network are observables based on a full simulation chain, after parton shower, detector effects, and the reconstruction of observables. It was shown in Refs. [59,60,61] that the joint likelihood ratio and joint score are unbiased, but noisy, estimators of the true likelihood ratio and true score (including shower and detector effects). A network trained by minimizing a suitable loss function on these labels will, therefore, average out the noise and learn the effect of shower and detector. We illustrate this mechanism in Sect. 5.1 in a one-dimensional problem.
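
The averaging argument above can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from MadMiner: a Gaussian latent variable z stands in for the parton level, Gaussian smearing stands in for shower and detector, and a histogram average in bins of x stands in for the neural network (the bin mean is the least-squares fit of a piecewise-constant function).

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, theta1 = 0.5, 0.0   # numerator and denominator hypotheses (toy values)
n = 200_000

# Toy simulator run at theta1: latent z ("parton level"), observed x ("after detector").
z = rng.normal(theta1, 1.0, size=n)
x = z + rng.normal(0.0, 1.0, size=n)


def log_normal_pdf(v, mean, var):
    """Log density of a Gaussian with the given mean and variance."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (v - mean) ** 2 / var


# Joint likelihood ratio: depends only on the latent z, the smearing term cancels.
r_joint = np.exp(log_normal_pdf(z, theta0, 1.0) - log_normal_pdf(z, theta1, 1.0))

# True likelihood ratio in x, tractable only because this toy has x ~ N(theta, 2).
r_true = np.exp(log_normal_pdf(x, theta0, 2.0) - log_normal_pdf(x, theta1, 2.0))

# Histogram regression of the noisy labels r_joint on the observable x.
edges = np.linspace(-2.0, 2.0, 9)
bin_index = np.digitize(x, edges) - 1          # 0..7 inside the range
r_hat = np.array([r_joint[bin_index == b].mean() for b in range(8)])
r_ref = np.array([r_true[bin_index == b].mean() for b in range(8)])

max_rel_dev = np.max(np.abs(r_hat / r_ref - 1.0))
print("spread of the joint ratio labels:", r_joint.std())
print("largest relative deviation of the regression from the truth:", max_rel_dev)
```

Although each individual label scatters widely around the true ratio, the per-bin averages should agree with the bin-averaged true ratio up to statistical fluctuations, which is the behavior a regression network exploits in higher dimensions.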

  • Can this approach be used for signal-background classification?

    Yes. In the simplest case, where the signal and background hypotheses do not depend on any additional parameters, the Carl, Rolr, or Alice techniques can be used to learn the probability of an individual event being signal or background. If there are parameters of interest such as a signal strength or the mass of a resonance, the score becomes useful and techniques such as Sally, Rascal, Cascal, and Alices can be more powerful.

    The techniques that use the joint likelihood ratio or score require less training data when the signal and background processes populate the same phase-space regions. If this is not the case, these methods still apply, but will not offer an advantage over the traditional training of binary classifiers.
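
The link between a binary classifier and a likelihood ratio can be sketched as follows. This is a minimal toy illustration of the classifier-based idea, not MadMiner code: the two Gaussian samples, the hand-written logistic regression, and the labeling convention (label 1 for the denominator hypothesis, so that the ratio is (1 - s)/s) are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Toy samples: numerator hypothesis (label 0) and denominator hypothesis (label 1).
x0 = rng.normal(1.0, 1.0, size=n)
x1 = rng.normal(0.0, 1.0, size=n)
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Logistic regression s(x) = sigmoid(w x + b), trained by gradient descent
# on the binary cross-entropy.
w, b = 0.0, 0.0
for _ in range(2000):
    s = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((s - y) * x)
    b -= 0.5 * np.mean(s - y)


def ratio_estimate(x_eval):
    """Turn the classifier output into an estimate of p(x|hyp 0) / p(x|hyp 1)."""
    s_eval = 1.0 / (1.0 + np.exp(-(w * x_eval + b)))
    return (1.0 - s_eval) / s_eval


x_test = np.array([-1.0, 0.0, 1.0, 2.0])
r_exact = np.exp(x_test - 0.5)   # analytic ratio for unit Gaussians at 1 and 0
print("estimated ratio:", ratio_estimate(x_test))
print("exact ratio:    ", r_exact)
```

In this toy the log likelihood ratio is linear in x, so a linear classifier is sufficient; for realistic event data a neural network plays the role of the logistic regression.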

  • What if the simulations do not describe the physics accurately?

    No simulator is perfect, but many of the techniques used for incorporating systematic uncertainties from mismodeling in the case of multivariate classifiers can also be used in this setting. For instance, the effect of mismodeling can often be corrected with simple scale factors and the residual uncertainty incorporated with nuisance parameters. MadMiner can handle such systematic uncertainties as discussed above. If only particular phase-space regions are problematic, for instance those with low-energy jets, we recommend excluding these phase-space regions with suitable selection cuts. If the kinematic distributions are trusted, but the overall normalization is less well known, a data-driven normalization can be used.

    Of course, there is no silver bullet, and if the simulation code is not trustworthy at all in a particular process and the uncertainty cannot be quantified with nuisance parameters, these methods (and many more traditional analysis methods) will not provide accurate results.

  • Is the neural network a black box?

    Neural networks are often criticized for their lack of explainability. It is true that the internal structure of the network is not directly interpretable, but in MadMiner, the interpretation of what the network is trying to learn is clearly connected to the matrix element. In practical terms, one of the challenges is to verify whether a network has been successfully trained. Many cross-checks and diagnostic tools are available for that purpose:

    • checking the loss function on a separate validation sample;

    • training multiple network instances with independent random seeds, as discussed above;

    • checking the expectation values of the score and likelihood ratio against their known true values, see Ref. [61];

    • varying the reference hypothesis in the likelihood ratio, see Ref. [61];

    • training classifiers between data reweighted with the estimated likelihood ratio and original data from a new parameter point, see Ref. [61];

    • validating the inference techniques in low-dimensional problems with histograms, see Sect. 5.1;

    • validating the inference techniques on a parton-level scenario with tractable likelihood function, see Sect. 5.2; and

    • checking the asymptotic distribution of the likelihood ratio against Wilks’ theorem [69,70,71].

    Finally, when limits are set based on the Neyman construction with toy experiments (rather than using the asymptotic properties of the likelihood ratio), there is a coverage guarantee: the exclusion contours constructed in this way will not exclude the true point more often than allowed by the chosen confidence level (for example, in at most 5% of repeated experiments for limits at 95% confidence level). No matter how wrong the likelihood, likelihood ratio, or score function estimated by the neural network is, the final limits might lose statistical power, but will never be too optimistic.
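
The last cross-check in the list, the comparison with Wilks' theorem, can be sketched with toy experiments. The example below is an illustration under simplifying assumptions and does not use MadMiner: the model is a unit-width Gaussian with a tractable likelihood, whereas in a real analysis the log likelihood ratio would come from the trained estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, n_events, n_toys = 0.3, 50, 20_000

# Each toy experiment: n_events observations drawn at the true parameter point.
x = rng.normal(theta_true, 1.0, size=(n_toys, n_events))
theta_hat = x.mean(axis=1)   # maximum-likelihood estimate in each toy


def log_likelihood(theta):
    """Per-toy log likelihood of a unit Gaussian, up to a theta-independent constant."""
    return -0.5 * np.sum((x - theta[:, None]) ** 2, axis=1)


# Test statistic q = -2 log [ L(theta_true) / L(theta_hat) ] for each toy.
q = -2.0 * (log_likelihood(np.full(n_toys, theta_true)) - log_likelihood(theta_hat))

# Wilks' theorem: for one parameter of interest, q follows a chi-squared
# distribution with one degree of freedom (mean 1, 95% quantile about 3.841).
threshold_95 = 3.841
fraction_above = np.mean(q > threshold_95)
print("mean of q:", q.mean())
print("fraction of toys above the 95% threshold:", fraction_above)
```

If the fraction of toys above the threshold differs significantly from 5%, either the asymptotic regime has not been reached or the estimated likelihood ratio is miscalibrated; in both cases limits based on toy experiments remain valid, as stated above.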

  • Are you trying to replace PhD students with a machine?

    As a preemptive safety measure against scientists being made redundant by automated inference algorithms, we have implemented a number of bugs in MadMiner. It will take skilled physicists to find them, ensuring safe jobs for a while. More seriously, just as MadGraph automated the process of generating events for an arbitrary hard scattering process, MadMiner aims to contribute to the automation of several steps in the inference chain. Both developments enhance the productivity of physicists.

Cite this article

Brehmer, J., Kling, F., Espejo, I. et al. MadMiner: Machine Learning-Based Inference for Particle Physics. Comput Softw Big Sci 4, 3 (2020). https://doi.org/10.1007/s41781-020-0035-2