MoMEMta, a modular toolkit for the Matrix Element Method at the LHC

The Matrix Element Method has proven to be a powerful approach to optimally exploit the information available in detector data. Its widespread use is nevertheless impeded by its complexity and the associated computing time. MoMEMta, a C++ software package to compute the integrals at the core of the method, provides a versatile implementation of the Matrix Element Method to both the theory and experiment communities. Its modular structure covers the needs of experimental analysis workflows at the LHC without compromising ease of use on simpler and smaller simulated samples used for phenomenological studies. With respect to existing tools, MoMEMta improves on usability and flexibility. In this paper, we present version 1.0 of MoMEMta, together with examples illustrating the wide range of applications at the LHC accessible for the first time with a single tool.



Keywords: Analysis and statistical methods

Introduction
The discovery of the Higgs boson by the ATLAS and CMS experiments in 2012 [1,2] opened a new era in particle physics. More than just a new particle, a new set of interactions needs to be characterised. The LHC physics program therefore includes precision measurements of standard model (SM) processes (in particular in the top and scalar sectors) and the search for rare production mechanisms or rare decay channels. The absence so far of any obvious sign of physics beyond the standard model further increases the need to look in places where the backgrounds are large and the effects of new physics subtle. In all these studies, it is of the utmost importance to fully exploit the potential of the large data set collected. For most of the analyses performed in high energy physics (HEP), obtaining an optimal result implies the treatment of multiple correlated quantities in a multivariate setting. The most popular methods for multivariate analysis in HEP are machine learning techniques, such as boosted decision trees and neural networks. These approaches require large training data sets (usually obtained by Monte Carlo techniques) in order to learn the internal structure of the data. By contrast, the Matrix Element Method (MEM) directly uses our theoretical knowledge of a process to assign to each event a probability that measures the compatibility of the experimental data with a given hypothesis. There is no training, since the underlying Lagrangian, from which the matrix element of the partonic process is derived, is known.
The MEM, originally designed at the Tevatron experiments DØ and CDF [3-10] for top quark mass measurements in tt production, is nowadays a common technique in particle physics. Recent examples of its use at the LHC are searches for ttH [11-18] and single top quark production [19], and a measurement of spin correlations in tt production [20]. Nevertheless, while it can be used for a wide variety of studies, the practical application of the MEM has been impeded by its complexity and by the associated computing time. In order to evaluate the probability of a given experimental event under a given theoretical hypothesis, a difficult convolution of the theoretical information on the hard scattering (i.e. the matrix element squared) with the experimentally available information on the final state (encoded in the so-called transfer functions) has to be performed. The corresponding integrand varies by several orders of magnitude across different regions of the phase space, which implies the use of adaptive numerical integration techniques together with a smart choice of integration variables. A general algorithm has been proposed in Ref. [21], which involves optimised phase-space mappings designed to remove as much as possible the peaks in the integrand. However, the corresponding implementation (MadWeight) is not supported anymore and suffers from a lack of flexibility that prevents, or significantly limits, its use in large-scale analyses of LHC data by the collaborations. Furthermore, the tool is very rigid and does not allow the user to implement simplifying assumptions. In this paper, we present MoMEMta, a modular C++ software package to compute the convolution integrals at the core of the method. Its modular structure covers the needs of experimental analysis workflows at the LHC without compromising ease of use on simpler and smaller simulated samples used for phenomenological studies. It relies on the same approach as MadWeight to parameterise the phase space but leaves more freedom to the user. MoMEMta is designed to be fast, to adapt to any process, and to integrate in any C++ or Python analysis workflow.
In the following, we first review the MEM, with an emphasis on the assumptions made in MoMEMta, before briefly presenting the philosophy of the implementation. We then concentrate on a few concrete use cases that illustrate the variety of problems that can be tackled using MoMEMta, and how its modularity can best be exploited to adapt to these problems.

The matrix element method
The MEM is a technique to calculate the conditional probability density P(x|α) to observe an experimental event x, given a specific theoretical hypothesis α. Details about the method can be found for example in Ref. [22]. We concentrate here on the main aspects.
The likelihood for a partonic final state y to be produced in the hard-scattering process is proportional to the differential cross section dσ_α of the corresponding process, given by

    dσ_α(q_1, q_2, y) = (2π)^4 |M_α(q_1, q_2, y)|^2 / (q_1 q_2 s) dΦ(y)    (2.1)

where q_1 and q_2 stand for the initial-state parton momentum fractions, s for the square of the hadronic centre-of-mass energy, and y for the kinematics of the final state. The central element in that expression is the squared matrix element for process α, denoted |M_α(q_1, q_2, y)|^2, where the summation over spin and colour states is understood. It can be obtained either analytically or numerically through packages like MG5_aMC@NLO [23] or MCFM [24]. Because of the intrinsic theoretical difficulty of identifying final-state particles with partons at NLO, the leading-order matrix element is used in most applications. The n-body phase-space density dΦ(y) must also be considered in the calculation, as it plays an important role in any change of variables needed for the integration of the differential cross section.
To obtain the differential cross section dσ_α(y) in hadron collisions, (2.1) is convoluted with the parton density functions (PDF) and summed over all possible flavour compositions of the colliding partons:

    dσ_α(y) = Σ_{a_1, a_2} ∫ dq_1 dq_2 f_{a_1}(q_1) f_{a_2}(q_2) dσ_α(q_1, q_2, y)    (2.2)

where f_{a_1}(q_1) and f_{a_2}(q_2) are the parton density functions for a given flavour a_i and momentum fraction q_i.
The evolution of the parton-level configuration y into a reconstructed event x in the detector is modeled by a transfer function T(x|y), normalised as a probability density over x, that describes how the partonic final state y is reconstructed as x. This includes the effects of the parton shower, hadronisation, and the limited detector resolution. The efficiency ε(y), i.e. the probability to reconstruct and select a specific partonic configuration y, also needs to be taken into account; it includes geometrical acceptance effects.
The transfer function and efficiency are assumed to factorise into contributions from each measured final-state particle, and each of these contributions is often assumed to further factorise into simple direction- and momentum-dependent terms. For most applications, it is then realistic to assume that the direction is perfectly reconstructed. In that case the transfer function is a Dirac delta function in the angular variables, and takes a non-trivial form only for the energy (or transverse momentum) degree of freedom. These assumptions may not be valid in cases where objects, especially jets, are close to each other or reconstructed together, as in boosted topologies, and will then result in a less accurate result.
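A transfer function of this factorised kind can be sketched in a few lines. The toy below assumes perfectly measured angles and a Gaussian energy smearing whose width is a fixed fraction of the parton energy; the 15% relative resolution and the list-of-energies interface are illustrative assumptions, not MoMEMta's actual parameterisation:

```python
import math

def transfer_function(x_energies, y_energies, resolution=0.15):
    """Toy factorised transfer function T(x|y): angles are assumed perfectly
    measured, and each particle's energy is smeared by a Gaussian whose width
    is a fixed fraction of the parton energy (both toy assumptions).
    Inputs are lists of reconstructed / parton-level energies."""
    t = 1.0
    for e_reco, e_parton in zip(x_energies, y_energies):
        sigma = resolution * e_parton  # width grows with parton energy
        # Gaussian density, normalised over the reconstructed energy e_reco
        t *= math.exp(-0.5 * ((e_reco - e_parton) / sigma) ** 2) \
             / (math.sqrt(2.0 * math.pi) * sigma)
    return t
```

Because the function factorises, its value for a two-particle event is exactly the product of the single-particle values, mirroring the factorisation assumption described above.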
Neutrinos (or other invisible particles) cannot be measured directly, but the corresponding volume of phase space over which to integrate can be narrowed by exploiting mass constraints from intermediate resonances assumed to be on-shell, or by relying on momentum conservation in the transverse plane, which yields an estimate of the total transverse momentum p_T^miss of the missing particles in the event. However, it is not possible to trivially factorise the transfer function on the two components of p_T^miss from the terms relative to the visible particles in the final state, since the experimental error in measuring p_T^miss is correlated with the errors made in measuring all the other particles in the event.
Aspects to be considered in the transfer function and efficiency are the measurement of the momentum of a particle as well as its (mis-)identification. The latter point may be relevant for b quarks and τ leptons, and in principle allows different final states to be combined for the same partonic event.
After convolution, the full expression reads

    P(x|α) = (1/σ_α^vis) Σ_{a_1, a_2} ∫ dq_1 dq_2 dΦ(y) f_{a_1}(q_1) f_{a_2}(q_2) (2π)^4 |M_α(q_1, q_2, y)|^2 / (q_1 q_2 s) T(x|y) ε(y)    (2.3b)

where σ_α^vis is a normalisation factor that ensures P(x|α) is a probability density over x. Finally, one also has to take into account the fact that some of the particles measured in the detector cannot be assigned unambiguously to specific final-state partons. All possible combinations must be considered and their contributions averaged.
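To make the structure of this convolution concrete, the following sketch evaluates a one-dimensional analogue of the weight, W(x) = ∫ dy |M(y)|^2 T(x|y), by sampling the parton-level variable from a Gaussian transfer function. The width, sample size and |M|^2 are illustrative assumptions; real weights involve the full phase space, PDFs and flux factor:

```python
import random

def toy_weight(x, me_sq, sigma=10.0, n=100_000, seed=1):
    """One-dimensional toy analogue of the MEM weight
        W(x) = integral dy |M(y)|^2 T(x|y),
    with T(x|y) a Gaussian of width sigma. Because the Gaussian is symmetric
    in x and y, sampling y ~ N(x, sigma) turns the integral into a simple
    Monte Carlo average of |M(y)|^2."""
    rng = random.Random(seed)
    return sum(me_sq(rng.gauss(x, sigma)) for _ in range(n)) / n
```

For a constant |M|^2 the weight reduces to that constant, and for |M|^2 = y it returns, up to Monte Carlo uncertainty, the observed value x, as expected since the transfer function is centred on x.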
The information contained in (2.3b) can be exploited in different ways, from the extraction of the most probable value of the theory parameters through a likelihood maximisation method [25], for which the normalisation constant σ_α^vis has to be properly taken into account, to the bare use of the integral result without the normalisation σ_α^vis, referred to in the literature as the matrix element weight, W(x|α).
The integral defined in (2.3b) is typically a small number that varies over several orders of magnitude from event to event. It is therefore common to use instead the event information defined by I_α ≡ − log P(x|α). When computed from the weight instead of the probability, the information is only modified by an additive constant, with no consequence in many applications; we denote this quantity I_α ≡ − log W(x|α).
In the limit where all the quantities and functions in (2.3b) are known with perfect accuracy, P(x|α) is a likelihood. By the Neyman-Pearson lemma, the ratio between the likelihoods obtained under two different hypotheses α and α′ is the most powerful test statistic to discriminate one from the other [26]. Hence, if it can be implemented, the MEM should provide optimal experimental sensitivity. In practice, we are limited by the use of leading-order matrix elements, or by the assumptions made in constructing the transfer function. The quantity (2.3b) is therefore not a true likelihood, and the Neyman-Pearson lemma does not strictly apply. For discrimination purposes, it is then common to use the event information as input to another multivariate method (typically a boosted decision tree or a neural network).
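Since the weights themselves easily underflow double precision, likelihood-ratio discriminants are in practice best computed directly from the event informations. A minimal helper (an illustrative function, not part of MoMEMta's API) showing the numerically stable form of D = W_sig / (W_sig + W_bkg):

```python
import math

def discriminant(info_sig, info_bkg):
    """Likelihood-ratio discriminant D = W_sig / (W_sig + W_bkg), computed
    from the event informations I = -log W. Rewriting the ratio as a
    logistic function of the information difference avoids underflow when
    the weights themselves are far below double precision."""
    return 1.0 / (1.0 + math.exp(info_sig - info_bkg))
```

Equal informations give D = 0.5, and the result agrees with the direct ratio of weights whenever the latter is representable.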

Implementation
The MEM is used in HEP by both the theoretical and experimental communities, with different purposes and levels of complexity, ranging from the evaluation of a matrix element on a reconstructed event (which does not require any integration) to the precise evaluation of model parameters (e.g. the top quark mass) through the use of the properly normalised likelihood derived from (2.3b).
In order to adapt to these very different use cases, a modular design has been adopted for MoMEMta. The core library is written in C++ and provides modules for various purposes: to represent and evaluate the matrix element and parton density functions, to represent and evaluate transfer functions, to perform changes of variables, to handle the combinatorics of the final state, etc. That way, every term of (2.3b) is treated as a module that can be configured by the user. Weights are computed for a given process by calling and linking the proper set of modules in a configuration file written in the Lua scripting language [27]. The resulting object can be called from any C++ or Python code, which means that it seamlessly integrates into the complex analysis environment of the large experimental collaborations, but can also be used within small programs reading events from files in any format (e.g. a custom text file, or a file in the ROOT [28], HepMC [29], LHE [30] or StdHEP [31] format).
The computation of the weights requires, in most cases, the evaluation of multidimensional integrals via adaptive Monte Carlo techniques. The efficiency in computing these integrals depends on the parameterisation of the phase-space measure used in the integration. In order to map all the structures in the integrand in an efficient way, MoMEMta follows the philosophy introduced by MadWeight [21]. In this approach, a standard parameterisation of the phase-space measure is optimised by using a finite number of analytic transformations over subsets of the integration variables, called "blocks". A list of the blocks available in MoMEMta, along with the event topologies addressed and the integration variables removed and introduced by the changes of variables, is shown in Tabs. 1 and 2. The first table lists "Main Blocks", i.e. changes of variables that allow one to integrate out the four-dimensional Dirac delta function present in the phase-space density term dΦ(y), which enforces conservation of momentum between the initial and final states. The second table lists "Secondary Blocks", i.e. simple changes of variables that do not remove any degree of freedom. Main and Secondary Blocks are implemented as dedicated MoMEMta modules, which can be chained to perform the change of integration variables best suited for the problem at hand. Examples are given in Sec. 4.
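The effect of such a transformation can be illustrated on a single propagator: substituting s = m^2 + mΓ tan(t) maps a Breit-Wigner enhancement onto an exactly flat integrand, so a uniform sampler integrates it without wasting points away from the peak. The sketch below is toy code in the spirit of these blocks, with an illustrative mass, width and integration range, not MoMEMta's actual module:

```python
import math
import random

def integrate_bw(m, gamma, s_min, s_max, n=10_000, seed=2):
    """Integrate the propagator enhancement 1/((s - m^2)^2 + (m*gamma)^2)
    over s in [s_min, s_max] via the substitution s = m^2 + m*gamma*tan(t),
    a MadWeight-style change of variables that flattens the peak."""
    t_min = math.atan((s_min - m * m) / (m * gamma))
    t_max = math.atan((s_max - m * m) / (m * gamma))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.uniform(t_min, t_max)
        s = m * m + m * gamma * math.tan(t)
        jacobian = m * gamma / math.cos(t) ** 2  # ds/dt
        # After the change of variables the integrand is constant: 1/(m*gamma)
        total += jacobian / ((s - m * m) ** 2 + (m * gamma) ** 2)
    return (t_max - t_min) * total / n
```

Since every sample contributes the same amount after the substitution, the Monte Carlo estimate reproduces the analytic arctan result to machine precision, whatever the sample size.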
Table 1. Set of MoMEMta Main Blocks. Each block performs a specific change of integration variables, and removes four degrees of freedom by enforcing momentum conservation between the initial and final states. The third and fourth columns show the integration variables respectively removed and introduced in each block definition. Block G is introduced in this work, in addition to those originally defined in MadWeight. We denote by q_i the Bjorken fractions of the initial-state partons, and by p_i the 4-momentum of a final-state particle, parameterised in polar coordinates by |p_i|, θ_i and φ_i. Furthermore, we set s_{i...j} = (p_i + ... + p_j)^2.
In block E, y denotes the rapidity of the total partonic system. Removing a particle p_i means removing all three degrees of freedom associated with that particle. Variables that are not explicitly mentioned are understood to remain present as in the standard phase-space parameterisation.

MoMEMta ships with matrix elements for a few processes, but any leading-order process handled by MG5_aMC@NLO [23] can be added using the Matrix Element Exporter plugin provided [32]. Native support for other matrix element generators is planned for future releases, but the modular structure already enables the user to wrap any C++ code that computes a matrix element to be used with MoMEMta. Parton density functions are obtained from LHAPDF6 [33], and the integration is done using the Cuba library [34], which offers a choice of four independent routines for multidimensional numerical integration: Vegas [35], Suave [34], Divonne [36], and Cuhre [37,38].
The MoMEMta implementation [39] is publicly available, is licensed under the GPLv3, and comes with online documentation [40] and tutorials [41].

MEM use cases
Nowadays, applications of the MEM in high-energy physics are most often restricted to computing weights W(x|α) under several hypotheses, to discriminate a signal from one or several backgrounds. In complex situations with several reconstructed objects and unconstrained degrees of freedom, there is no unique solution to the problem of efficiently and precisely computing W(x|α), and the user has to play an active role in defining how to evaluate W(x|α) and to which accuracy.
In this section we describe a few use cases of the MEM for signal extraction in LHC analyses. The examples illustrate various levels of complexity, from the simplest case with a precisely reconstructed final state, for which no integration is needed, to complex final states including six reconstructed and two unobserved objects. The Lua configuration for each of the examples can be found together with the MoMEMta tutorials [41]. In all cases, simulated events are generated using MG5_aMC@NLO [23], Pythia [42] and Delphes [43]. The computation times vary by several orders of magnitude among the different use cases, and strongly depend on the choice of parameters governing the integration procedure. Indicative performance figures are given in Sec. 5.

Discovery and characterisation of the Higgs boson
The MEM was instrumental for the CMS collaboration in the discovery of the Higgs boson in the H → ZZ* → 4l channel [2]. Likewise, the characterisation by ATLAS [44,45] and CMS [46,47] of the discovered resonance in terms of coupling structure, spin and parity has relied on matrix-element techniques, as suggested in Refs. [48,49]. In this channel, all final-state particles can be detected and there are no unobserved degrees of freedom to integrate over. Given the good experimental resolution on the muon and electron directions and momenta, it is reasonable to approximate the transfer function in (2.3b) by T(x|y) = δ(x − y). Hence, the integral reduces to a simple evaluation of the matrix element squared and the PDFs using the measured momenta in the event. In this framework, dubbed matrix element likelihood analysis (MELA), it is straightforward to build a variable discriminating between the signal and the qq → ZZ → 4l background by considering the matrix elements of these two hypotheses:

    D_bkg(x) = P(x|sig) / [P(x|sig) + P(x|bkg)]    (4.1)

Similar variables can be constructed to discriminate, for instance, a SM Higgs boson (J^P = 0^+) from a resonance of the same mass but different spin and/or opposite parity:

    D_{0^-}(x) = P(x|sig) / [P(x|sig) + P(x|0^-)]    (4.2)

Although MoMEMta was designed to handle more complex final states, its flexibility allows the user to easily implement a MELA-like analysis. To illustrate this, we have simulated events for the gg → H → ZZ* → 4µ and qq → ZZ*/Zγ* → 4µ processes. The SM Higgs sample, as well as the production and decay of a resonance of spin/parity J^P = 0^−, were generated using the Higgs characterisation framework [50]. With MoMEMta's plugin for MG5_aMC@NLO, the same matrix elements used for event generation can be exported in a format suitable for MoMEMta. The configuration of MoMEMta in this use case only requires a single module, which returns the product of the matrix element and the PDFs evaluated on a given event. The phase-space density term present in (2.3b) does not need to be included, since it cancels in the ratios in (4.1) and (4.2). With P(x|bkg), P(x|sig) and P(x|0^−) computed by MoMEMta, the discriminant variables D_bkg and D_{0^-} can be built. The normalised distributions of these variables, for the different processes considered, are shown in Fig. 1 after an event selection closely following the analysis in Ref. [47]. The discrimination power between the concurrent hypotheses is comparable to what is obtained in Refs. [46,47].

Charge identification in tW production
The MEM has been extensively used in the study of single top quark production processes at the Tevatron [51,52], and was instrumental in the most sensitive search for s-channel single top production at the LHC [19]. Incidentally, single top and W boson associated production (tW) provides a good opportunity to showcase MoMEMta's abilities. This process features three propagators in the matrix element and, in the case where both the top quark and the W decay leptonically (dilepton channel), missing information due to the presence of two neutrinos in the final state.
We consider the charge-conjugate processes, tW^- and tW^+, which yield the same visible final state and have practically the same rate at the LHC. It has been suggested to measure the CKM matrix element |V_td| at the LHC using the charge asymmetry between these two processes [53]. This requires the ability to efficiently disentangle them, a task made difficult by the system not being entirely reconstructible in the dilepton channel. We thus suggest constructing a MEM-based observable as:

    D_±(x) = log[ W(x|tW^+) / W(x|tW^-) ]    (4.3)

The charge asymmetry can then be defined by counting the number of events for which either D_±(x) < 0 or D_±(x) > 0. The implementation of the computation of the weights W(x|tW^+) and W(x|tW^-) is straightforward in MoMEMta. The directions of all "visible" objects (b quark, leptons) are assumed to be perfectly reconstructed, and the transfer function reduces to factorised parameterisations of the resolution on their energies. We choose as integration variables the invariant masses of the top quark and both W boson propagators, the energies of the b quark and of the two charged leptons, and the azimuthal direction of the neutrino from the W boson that does not originate from the decay of a top quark (denoted W). This can be achieved in MoMEMta by pairing the "Secondary Block B" with the "Main Block B", to handle the phase space pertaining to the t → b(W → ℓν) and W → ℓν decay chains, respectively. In addition, the enhancements in the matrix element due to the top and W propagators can be removed by a further transformation applied on their invariant masses.
The distribution of the D_± discriminant is shown in Fig. 2. About 75% of events from either process can be retained on each side of D_± = 0, which by symmetry leads to a corresponding mistag rate of 25%. Depending on the analysis needs, the purity can be further improved at the cost of efficiency (e.g. we obtain a 1.5% mistag rate for 25% efficiency).

ttH production
One of the most successful uses of the MEM at the LHC can be found in the searches for ttH production. The CMS and ATLAS collaborations have applied the MEM in final states with H → bb [11-15], and in multi-lepton final states with either H → VV, where V = W or Z, or H → ττ [16-18].
Here we demonstrate MoMEMta's ability to efficiently handle processes as complex as ttH, featuring a large final-state multiplicity, several propagator enhancements in the matrix element, missing information due to neutrinos, and many possible jet-parton assignments. We consider the channel where the Higgs boson decays to bb and both top quarks decay leptonically, for which the main irreducible background consists of tt + bb associated production. The relevance of the MEM in this channel was first demonstrated in Ref. [54]. We generate samples for the signal and background processes and select events with two opposite-charge leptons and at least four b-tagged jets.
Weights are computed with MoMEMta under two hypotheses, ttH(bb) and ttbb, using the following parameterisations of the phase space. The assumptions related to the transfer function are the same as those considered in Sec. 4.2, with the exception of the charged leptons, assumed to be perfectly reconstructed. For both hypotheses, the energies of the b quarks coming from the decays of the top quarks are retained as integration variables. To reduce the number of dimensions to integrate over, we work in the narrow-width approximation (NWA), by which the W boson and top quark propagators are approximated by Dirac delta functions. The momenta of the unobserved neutrinos are then fixed by the constraints on the top quark and W boson invariant masses, as well as by the observed missing transverse momentum in the event. This choice can be implemented by applying the change of variables "Block D" on the standard phase-space parameterisation for the decay products of the top quarks, and fixing the invariants associated with the top quark and W boson propagators (s_134, s_256, s_13 and s_25 in Tab. 1) to their respective pole masses. The remaining degrees of freedom are handled differently, depending on the hypothesis:
• ttbb: The standard polar phase-space parameterisation for the two extra b quarks is retained, i.e. we integrate over both their energies.
• ttH: We integrate over the energy of one of the b quarks from the Higgs decay. The other quark's energy is fixed by the requirement that their invariant mass be equal to the Higgs mass (NWA). In MoMEMta, this is achieved by applying the transformation of the "Secondary Block C/D", and fixing s_12 to m_H.
Using the above parameterisation, the peaks in the integrand remain mapped to the integration variables, and the unobserved degrees of freedom due to the two neutrinos in the final state are effectively removed. Finally, the integrand needs to be averaged over each of the 4! = 24 possible assignments between jets and partons. By using an additional integration variable controlling which assignment should be considered, the weights are obtained from a single call to the integration routine, and adaptive integration algorithms automatically focus on those assignments yielding large contributions to the final answer. Figure 3 shows the normalised distributions of the event information I_ttH and of a discriminating variable defined as

    D_sig(x) = W(x|ttH) / [W(x|ttH) + W(x|ttbb)]    (4.4)

By applying a requirement on D_sig such that 50% of the ttH signal is retained, 83% of the tt + jets background can be rejected. As a comparison, using the invariant masses of b-tagged jets in the events would only reject 65% of the background for the same signal efficiency.
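The trick of folding the jet-parton combinatorics into the integration can be sketched as follows: an extra variable u ∈ [0, 1) selects one of the n! assignments, so an adaptive algorithm that learns the distribution of u automatically concentrates on the dominant permutations. This is illustrative code under that assumption; MoMEMta's actual combinatorics module has a different interface:

```python
import itertools

def assignment_from_u(u, jets):
    """Map an extra integration variable u in [0, 1) onto one of the n!
    possible jet-parton assignments. An adaptive integrator sampling u then
    focuses on the permutations contributing most to the weight."""
    perms = list(itertools.permutations(range(len(jets))))
    index = min(int(u * len(perms)), len(perms) - 1)  # guard against u == 1.0
    return [jets[i] for i in perms[index]]
```

For four jets, equal-width slices of u map onto the 24 distinct permutations, so the average over assignments becomes an ordinary one-dimensional integral over u.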

Indicative performance figures
We give approximate performance figures observed when computing weights for the different use cases presented above. It should be clear that these numbers are indicative only, as the computation time strongly depends on the considered hypothesis and on the parameters of the numerical integration algorithm. Furthermore, these results were obtained using the functionalities available out-of-the-box in MoMEMta, and with matrix elements generated by our plugin for MG5_aMC@NLO, which means no attempt whatsoever was made towards optimising the computation for these particular cases. MoMEMta has been designed with the aim of being flexible, enabling users to implement simplifications or optimisations fit for their needs. Note that tuning the parameters of the algorithms used for the numerical integration of the weights can have a strong impact on both the precision p of the resulting integrals (which in turn impacts the power of the discriminant built from the weights) and the overall computation time T. Generally, all other things being equal, the evaluation time scales roughly as T ∝ p^-2. The computation of the weights used in Sec. 4.3 was carried out using two different integration algorithms available in the Cuba library: Vegas and Divonne. The latter was found to yield substantially shorter completion times, without compromising the discrimination between signal and background with respect to the former.
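The T ∝ p^-2 scaling follows from the 1/√N behaviour of plain Monte Carlo uncertainties: halving the target precision requires four times more samples, hence four times more integrand evaluations. A toy check on ∫₀¹ x² dx (the integrand and sample sizes are arbitrary choices for illustration):

```python
import math
import random

def mc_relative_precision(n, seed=3):
    """Estimate the relative Monte Carlo uncertainty on integral_0^1 x^2 dx
    from n uniform samples. The error scales as 1/sqrt(n), so the cost of
    reaching a relative precision p scales as p^-2."""
    rng = random.Random(seed)
    values = [rng.random() ** 2 for _ in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n) / mean  # standard error over the mean
```

With 100 times more samples the estimated relative precision drops by roughly a factor of 10, i.e. 100 times more computing time buys one extra decimal digit.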
In Tab. 3 we give the average per-event computation times for the weights used in the examples of Sec. 4, along with the average relative precision on these weights reported by the integration algorithm. These results were obtained on a computer cluster with an average per-core HS06 score of 9.1. Table 4 shows how much time is spent on the main elements of the computation. These fractions are indicative and vary from event to event, but show that for complex hypotheses such as those considered in Sec. 4.3, the bottleneck in the computation is the evaluation of the matrix element.
The memory consumption of MoMEMta is strongly linked to the way the integration algorithm is configured.In practice, for the examples shown here, memory consumption was observed never to exceed 200 MB.

Summary
We have presented MoMEMta, a modular software package to compute the convolution integrals at the core of the MEM. Its modular structure covers the needs of experimental analysis workflows at the LHC without compromising ease of use on simpler and smaller simulated samples used for phenomenological studies. The MEM is used in HEP by both the theoretical and experimental communities, with purposes and levels of complexity ranging from the evaluation of a matrix element on a reconstructed event to the precise evaluation of model parameters through the use of the properly normalised likelihood. We have described a few use cases of the MEM for signal extraction in LHC analyses, showcasing these different levels of complexity. From the simplest implementation, for Higgs boson characterisation in the H → ZZ* → 4l channel, to complex final states such as ttH production, MoMEMta has proven to perform as well as previous implementations.
MoMEMta is designed to offer a flexible and reusable framework for a wide range of applications of the MEM. It aims at implementing a wide range of possibilities while leaving the user in control of the subset of features actually used for a particular application. It also allows the user to substitute optimised code for part of the integration while profiting from the other facilities. A typical example would be the use of an optimised matrix element implementation while keeping the benefits of the I/O, configuration, and integration framework.
The modular structure of MoMEMta lets us envisage a rich set of future developments. To cite a few, we consider adding an interface to matrix element libraries such as MCFM [24] or Sherpa [55], extending the integration over missing (unreconstructed) objects or, conversely, over additional objects present in the reconstructed event, and extending the transfer function to include an efficiency term. Other developments could further enhance the performance of the integration itself, through the use of a vector integrand with modified transfer functions to evaluate the effect of systematic uncertainties, or through the use of machine-learning inspired integration algorithms [56].

Figure 1 .
Figure 1. Normalised distributions of the D_bkg (left) and D_{0^-} (right) variables for the gg → H → ZZ* → 4µ and qq → ZZ*/Zγ* → 4µ processes, where the resonance H is taken to be the SM Higgs boson or a pseudo-scalar of the same mass (0^-). For the right-hand figure, we require D_bkg > 0.5.

Figure 2 .
Figure 2. Normalised distribution of the D_± asymmetry observable, as defined in (4.3), for the charge-conjugate processes tW^- and tW^+. Backgrounds such as tt production are expected to be distributed symmetrically around zero.

Figure 3 .
Figure 3. Left: event information under the signal hypothesis (ttH). Right: discriminating variable built from the weights under the signal and background hypotheses. All distributions are normalised to unity.

Table 3 .
Average computation time of the weights under the different hypotheses used in Sec. 4. The average relative precision on the resulting weights is also given.

Table 4 .
Indicative shares of computation time due to the different elements entering the evaluation of weights under different hypotheses. The elements shown are the evaluation of the matrix element, the parton distribution functions and the transfer functions, the generation of the phase space (including the changes of variables introduced in Sec. 3), and the various remaining tasks.