Setting limits on Effective Field Theories: the case of Dark Matter

The usage of Effective Field Theories (EFT) for LHC new physics searches is receiving increasing attention. It is thus important to clarify all the aspects related to the applicability of the EFT formalism in the LHC environment, where the large available energy can produce reactions that exceed the maximal range of validity, i.e. the cutoff, of the theory. We show that this does not prevent setting rigorous limits on the EFT parameter space through a modified version of the ordinary binned likelihood hypothesis test, which we design and validate. Our limit-setting strategy can be carried out in its full-fledged form by the LHC experimental collaborations, or performed externally to the collaborations, through the Simplified Likelihood approach, by relying on certain approximations. We apply it to the recent CMS mono-jet analysis and derive limits on a Dark Matter (DM) EFT model. DM is selected as a case study because the limited reach on the DM production EFT Wilson coefficient and the structure of the theory suggest that the cutoff might be dangerously low, well within the LHC reach. However, our strategy can also be applied, if needed, to EFT's parametrising the indirect effects of heavy new physics in the Electroweak and Higgs sectors.


1 Introduction
The preliminary LHC results made clear that new physics does not assume the "vanilla" form we had imagined for it, leaving us uncertain on how to proceed in the search for new phenomena. The problem is that no single explicit new physics model, through which complete signal predictions could be made, stands out at present as compelling or particularly motivated. Not having at our disposal a sharp signal hypothesis to be compared with the data is obviously a complication. The problem has been addressed, and solved in most cases, by developing comprehensive phenomenological descriptions aimed at parametrising a wide set of new physics models or scenarios in terms of a few free parameters. The transition from the constrained MSSM to phenomenological parametrisations of the signal topologies broadly expected in SUSY, advocated in ref. [1], is the paradigmatic example of this approach.
JHEP08(2017)074
Phenomenological models are useful, but they must be employed with care. It must be kept in mind that they owe their simplicity and generality to the fact that they are designed to describe only a specific set of physical processes. Their few free parameters indeed correspond to those combinations of the underlying theory parameters that control the processes we decided to focus on. If used outside their domain of validity, their predictions might be incompatible with the new physics scenario they aim at describing, or they might correspond to a very specific model and lose their generality. For instance, it is obvious that one should not try to study Dark Matter (DM) in SUSY through a model designed to describe the gluino pair-production topology. Or, one should not use a model for vector resonances production [2] to describe the shape of the signal outside the resonant peak. Making proper use of phenomenological models is simple in most cases: it suffices to select for the analysis only those events that are within the domain of validity of the model. However this might not be possible, for instance because the events to which the model applies are not experimentally distinguishable from the others. When this occurs, the situation is more complicated and a special treatment is needed. Addressing this issue for DM searches based on Effective Field Theories (EFT) is the purpose of the present paper. The strategy we will develop can also be applied, if needed, to other EFT-based analyses.
The formalism of low-energy EFT's allows us to produce phenomenological models in a rigorous and systematic way. EFT's provide a complete description of the low-energy dynamics of the lightest particles of a given microscopic (UV) theory, under the sole assumption that the rest of the spectrum sits at an energy scale M cut which is much above the light particles' mass. The validity domain of EFT-based phenomenological models is thus not restricted to a limited set of topologies, but rather to a limited range of energy. All reactions with a center of mass energy E cm below M cut , called the EFT cutoff, are accurately described by the EFT in terms of a few leading EFT interaction operators. The coefficients of these operators, called Wilson coefficients, are free parameters of the EFT. Any UV theory with appropriate low-energy spectrum corresponds to a given value of the Wilson coefficients. By varying the latter one can thus effectively explore all the UV models with the light-particle content (and the symmetries, which restrict the allowed interaction operators) we assumed in the construction of the EFT. One can of course also reverse the logic and, starting from experimental results (e.g., from exclusion limits) given in terms of the Wilson coefficients, translate them into constraints on any specific model, in which the Wilson coefficients can be calculated. All this however only works if the EFT is properly employed, i.e. if the comparison of the theory with data relies in no way on the EFT predictions for reactions with E cm at or above the cutoff M cut . As E cm approaches M cut , indeed, the EFT description obtained with the leading operators becomes less and less accurate and more and more sub-leading interactions should be added. At E cm ≥ M cut , infinitely many operators become relevant and the leading-order EFT prediction loses any resemblance to the one of the original theory.
In the high energy region the EFT predictions should be replaced with those of the underlying UV theory, which however is precisely what we want to be agnostic about.
EFT's have been employed for new physics searches in a multitude of different contexts. Applications range from flavour and electroweak (EW) precision physics to DM direct and indirect searches and, more recently, to LHC studies of DM [3][4][5][6][7][8][9][10] and of the EW-plus-Higgs sector [11][12][13][14][15][16][17]. The limited range of validity of the EFT is never an issue for non-LHC applications, because the experimental conditions automatically forbid E cm to exceed M cut .

For instance in flavour measurements, E cm < few GeV, while the new physics scale one probes by the EFT (i.e., M cut ) is easily above or much above the TeV. A lot of energy is instead available at the LHC and the EFT validity problem must be carefully addressed. This has been studied in refs. [18][19][20][21][22][23][24][25][26] in the context of DM and in refs. [27,28] for EW-plus-Higgs sector EFT's. The cases discussed in ref. [27], and in ref. [28] for the neutral Drell-Yan analysis, are very easy to deal with, as E cm can be measured experimentally.
By an E cm < M cut selection cut one can thus restrict the data to the region where the EFT predictions are reliable. With this cut in place, data can be compared with the EFT prediction with the standard statistical tools (see e.g. [29][30][31][32]) producing measurements of the Wilson coefficients or, more often, exclusion limits. 1 Obviously E cm cannot be measured in DM searches because the DM particles escape from the detector. Still we will show that a fully rigorous limit-setting strategy can be set up and used to interpret LHC "mono-X" DM searches in the EFT context. Our approach is instead not suited to characterise a discovery. We will comment on this point in the Conclusions.
Mono-X DM searches consist in measuring the detector-level Missing Transverse Energy (E / T ) distribution in events characterised by the presence of one visible object X. Concretely, X can be a jet [33,34], a photon [35], a vector or Higgs boson [36][37][38] or even a top pair [39]. The production of DM contributes to the E / T distribution, or better to its binned version measured in N bins, by an amount
$$ n^{\rm DM}_i = \int_{E\!\!\!/_T \in\, {\rm bin}_i} dE\!\!\!/_T \int_0^{\sqrt{S}} dE_{\rm cm}\, \frac{d^2 n^{\rm DM}}{dE\!\!\!/_T\, dE_{\rm cm}} \,, \qquad (1.1) $$
where $d^2 n^{\rm DM}$ is the detector-level DM+X production cross-section times the integrated luminosity, doubly differential in E / T and in the center of mass energy. 2 The equation has been written in a format that outlines how the DM production signal in each E / T bin receives contributions from all the E cm energy scales up to the total collider energy √ S = 13 TeV at the LHC run-2. Notice instead that a lower bound (depending on E / T , see e.g. eq. (3.5)) can be set on the E cm integration domain. This is not relevant for the discussion, which focuses on high-energy contributions, and thus it has not been reported in the equation.
The problem with the EFT is that it does not allow us to compute eq. (1.1) fully, but only part of it. Namely, splitting the E cm integral in the regions below and above the cutoff, we have
$$ n^{\rm DM}_i = \int_{E\!\!\!/_T \in\, {\rm bin}_i} dE\!\!\!/_T \int_0^{M_{\rm cut}} dE_{\rm cm}\, \frac{d^2 n^{\rm DM}}{dE\!\!\!/_T\, dE_{\rm cm}} + \int_{E\!\!\!/_T \in\, {\rm bin}_i} dE\!\!\!/_T \int_{M_{\rm cut}}^{\sqrt{S}} dE_{\rm cm}\, \frac{d^2 n^{\rm DM}}{dE\!\!\!/_T\, dE_{\rm cm}} \equiv n^{\rm EFT}_i + \Delta n_i \,. \qquad (1.2) $$
The first term accounts for DM production occurring at E cm < M cut , i.e. within the domain of validity of the EFT. We can thus compute it by using the EFT, as a function of its free parameters (including the cutoff), obtaining the prediction n EFT i . The second contribution, ∆n i , is instead not known from the viewpoint of the EFT because it accounts for "hard" DM production, originating from energy scales where the EFT description no longer applies. All we know for sure is that it is necessarily non-negative, ∆n i ≥ 0 ∀i. This being so is obvious from the definition, and physically due to the fact that n EFT i and ∆n i emerge from two kinematically different (and thus quantum-mechanically distinguishable) processes. As such they cannot interfere, producing two positive contributions that add up to each other. In what follows, we will denote n EFT i as "the signal", in spite of the fact that it is only one of the two contributions to the total DM production rate. We will refer to the second contribution, ∆n i , as "unknown additional signal".
1 Notice that the limits depend on M cut , which is a free parameter to be scanned over. Limits are thus effectively set in the enlarged parameter space defined by the Wilson coefficients plus the EFT cutoff M cut . This is perfectly correct because the cutoff is indeed one of the parameters of the EFT, as extensively discussed in [24].
2 An unambiguous definition of E cm is postponed to section 3.
The statistical problem of setting limits in the presence of an unknown additional signal has never been seriously addressed in the literature. A simple but unsatisfactory solution (adopted for instance in ref. [24]) is to turn the experimental analysis into a cut-and-count experiment, i.e. to measure the distribution in a single E / T bin. Together with the estimate of the SM background, this single measurement produces an upper limit, call it n DM exc. , on the total DM production signal (or, equivalently, on the DM production cross-section) one can tolerate in the search region with a given (typically, 95%) Confidence Level (CL). The limit is fully model-independent and it is readily interpreted in the EFT by writing
$$ n^{\rm EFT} \le n^{\rm EFT} + \Delta n = n^{\rm DM} < n^{\rm DM}_{\rm exc.} \,. \qquad (1.3) $$
This method is rigorous, but largely unsatisfactory. First, because the choice of the search region needs to be optimised not only for each given EFT model, but also for each point in the EFT parameter space. This was shown in ref. [24], where the four search regions considered by ATLAS in ref. [40] were found to have different limit-setting performances in different regions of the EFT parameter space. Moreover, the cut-and-count approach is never fully optimal as it does not make use of the whole available experimental information. For that, one would need to employ the entire measured distribution, but this is not straightforward in the presence of additional signal. The standard statistical methods to compare a binned histogram with the data crucially rely on the ability of the model to provide a definite prediction for (at least) the shape of the distribution. The EFT model instead gives us only a lower limit on the distribution and tells us nothing about the shape, which is sensitive to the one of the additional signal component. A slight variation of the standard hypothesis test methodology is needed in order to deal with this situation. The rest of the paper is organised as follows.
In section 2 we discuss several strategies to address the problem of limit-setting in the presence of additional signal. We start from a complete approach, to be adopted by the LHC experimental collaborations that have access to all the details of the analysis, and simplify it down to a level that allows it to be performed outside the collaborations, relying on the minimal amount of experimental information. The usage of simplified likelihoods [41] (see also [42]) emerges as a valid strategy, but alternatives could also be considered. In section 3 we return to DM EFT and set limits on a specific d = 6 EFT operator by re-interpreting the CMS mono-jet analysis in ref. [34], for which the simplified likelihood is available. On top of serving as an illustration of the limit-setting method, this example will give us the opportunity to introduce and to validate the operative definition of the center of mass energy E cm to be employed for the E cm < M cut restriction in the EFT signal simulation. In section 4 we report our conclusions and comment on the applicability of our study outside the domain of DM searches. A validation of our limit-setting strategy and some algebraic derivations are reported in appendices A and B, respectively.
2 Hypothesis test with unknown additional signal

2.1 The problem
Consider measuring N observables O = {O i }, i = 1, . . . N , whose joint p.d.f. (the probability model) depends on N "known" quantities S i (the signal), plus N unknown additional signal components ∆ i ≥ 0. Namely, the likelihood function is
$$ \mathcal{L}(S, \Delta; O) = {\rm p.d.f.}(O \,|\, S, \Delta) \,. \qquad (2.1) $$
In what follows we will mostly be interested in interpreting O as event countings in N bins, S as the expected yields as predicted by the EFT and ∆ as the additional signal yield from reactions above the EFT cutoff, in direct correspondence with eq. (1.2). However O might also represent parton-level differential cross-section measurements, in which case S and ∆ are the cross-section predictions from low and high energy processes, respectively. Other parameters ν, among which the backgrounds, also influence the probability model. These are routinely treated as nuisance parameters (see e.g., [31,32]) and constrained by means of auxiliary measurements. Nuisance parameters play no role for the conceptual point we are making here, so we momentarily ignore their presence. In order to construct a hypothesis test one defines in some way (typically out of the likelihood function) a test statistic
$$ t = t(S, \Delta; O) \,, \qquad (2.2) $$
which is such that a large value of t signals tension of the model with observations. The test statistic is itself a random variable, because of its dependence on O. Its distribution follows from the probability model for O and therefore it depends, a priori, on S and ∆ in a highly non trivial manner. Namely, denoting by f the p.d.f. of t,
$$ t \sim f(t \,|\, S, \Delta) \,. \qquad (2.3) $$
The probability for the model to produce an experimental result which is equally or more incompatible than the one observed is quantified by the p-value
$$ p(S, \Delta) = \int_{t(S, \Delta; O)}^{\infty} dt'\, f(t' \,|\, S, \Delta) \,. \qquad (2.4) $$
The model is said to be excluded at a given CL α if p < 1 − α, it is not excluded otherwise. The hypothesis test defined by eq. (2.4) is of course not what we want. It is a test for the probability model of O, which requires both S and ∆ to be specified.
Our goal is instead to test a physics model that only predicts S and leaves complete freedom on what the additional signal ∆ could be. We thus want our test to be such that the model is excluded only if its predictions are incompatible with the data for any value of ∆, provided of course ∆ i ≥ 0 ∀i. 3 In practice, we want the model to be excluded only if it fails the "ordinary" statistical test (i.e., p < 1 − α in eq. (2.4)) for any possible choice of the ∆ i 's. We would also like our test to exclude as much as possible, namely we want the model not to be excluded only if there exists at least one point in the ∆-space that would pass the ordinary test. Both these requirements are fulfilled by defining the p-value as
$$ p_{\rm max}(S) = \sup_{\Delta_i \ge 0}\, p(S, \Delta) \,, \qquad (2.5) $$
where p(S, ∆) is the ordinary p-value of eq. (2.4), and using it to set a limit in the S-space depending on whether p max ≶ 1 − α.
Eq. (2.5) is in principle all we need to set limits in the presence of unknown additional signal, however it is not applicable in practice. The main obstruction is the need of determining the p.d.f. of the test statistic, which in general cannot be computed analytically and must be obtained numerically by running a toy Monte Carlo. The p.d.f. depends on ∆, therefore one toy Monte Carlo would be needed at each point in the ∆-space, resulting in a too demanding numerical procedure. Fortunately this problem is easy to address because the standard test statistic variables employed in LHC analyses are defined in such a way that their p.d.f.s assume a specific form (typically, a χ 2 distribution) under suitable conditions, namely in the so-called Asymptotic Limit (AL). The AL approximation relies on the assumption that a large set of data is employed in the analysis. Further qualifications and checks of its validity are reported in the following section and in appendix A. It suffices here to notice that in the AL the p.d.f. of t becomes independent of ∆ (and of S) so that eq. (2.5) becomes
$$ p_{\rm max} = 1 - {\rm CDF}_{\chi^2}\big(t_{\rm min}\big) \,, \qquad t_{\rm min} = \inf_{\Delta_i \ge 0}\, t(S, \Delta; O) \,. \qquad (2.6) $$
The problem thus reduces to the one of minimising t in the ∆-space. We will show in the next section how to treat it by a combination of analytical and numerical techniques.
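As a minimal sketch of the asymptotic shortcut just described, the Python snippet below turns a minimised test statistic into p max through the χ 2 survival function, written with its standard closed-form series so that only the standard library is needed. The function names `chi2_sf` and `is_excluded`, and the choice of passing the number of degrees of freedom by hand, are illustrative assumptions and not part of the analysis.

```python
import math

def chi2_sf(t, ndof):
    """Survival function 1 - CDF of a chi-square with integer ndof >= 1."""
    if t <= 0.0:
        return 1.0
    x = 0.5 * t
    if ndof % 2 == 0:
        # Even ndof: exp(-x) * sum_{j < ndof/2} x^j / j!
        term = math.exp(-x)
        total = term
        for j in range(1, ndof // 2):
            term *= x / j
            total += term
        return total
    # Odd ndof: erfc(sqrt(x)) + exp(-x) * sum_{j=1}^{k} x^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * math.sqrt(x) / math.gamma(1.5)
    for j in range(1, (ndof - 1) // 2 + 1):
        total += term
        term *= x / (j + 0.5)  # Gamma(j + 3/2) = (j + 1/2) * Gamma(j + 1/2)
    return total

def is_excluded(t_min, ndof, cl=0.95):
    """Hypothetical helper: excluded at confidence level cl if p_max < 1 - cl."""
    p_max = chi2_sf(t_min, ndof)
    return p_max < 1.0 - cl, p_max

if __name__ == "__main__":
    print(is_excluded(12.0, 3))  # large t_min: tension with the data
    print(is_excluded(2.0, 3))   # small t_min: compatible with the data
```

The only input needed from the analysis is the minimised test statistic, which is why no toy Monte Carlo over the ∆-space is required in the AL.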
Before concluding this section it is worth noticing that it would be legitimate to regard the ∆'s as additional nuisance parameters. Exactly like the ν's, indeed, they are parameters whose values we are not interested in testing that however enter in the probability model for the observables. We decided not to adopt this interpretation and we introduced the terminology of "unknown additional signal" because ordinary nuisance parameters are constrained by auxiliary measurements (or by theoretical considerations, for theory-driven errors) while there is no way to constrain the ∆'s.

2.2 Possible solutions
We now discuss how to deal with eq. (2.5) (actually with its much simpler version (2.6)) in practice. We begin with the full-fledged treatment, to be carried out directly by the experimental collaborations starting from the actual data and from the detailed knowledge of the nuisance parameters, and we progressively simplify it.

2.2.1 The full story
In an LHC experimental analysis, the observations O are event countings in N disjoint bins. Their p.d.f. is thus the product of N independent Poisson distributions and it is fully specified by the mean values M = {M i }. The M's consist of three terms
$$ M_i(s, \nu, \Delta) = S_i(s, \nu) + B_i(\nu) + \Delta_i \,, \qquad (2.7) $$
where S is the signal, B is the background, ∆ is the additional signal and ν = {ν_â} (â = 1, . . . , κ) represents the nuisance parameters. The signal is the component of M that emerges from the new physics model we aim at testing, i.e. from the EFT with the E cm < M cut restriction as in eq. (1.2). 4 It depends of course on nuisance parameters such as the uncertainties on the luminosity, on the efficiency for trigger and final state reconstruction, etc. Notice that the signal S is taken here to depend on N parameters s = {s i }, one for each histogram bin. Those are meant to be N quantities of purely theoretical nature, calculable within our model with no reference to experimental effects that would induce a dependence on the nuisance parameters. For definiteness, we will think of s as the parton-level EFT cross-section in each bin, however we will never need to specify its exact definition because S will be directly obtained from simulations performed within the model of interest, with no need of computing it as a function of s as an intermediate step. Notice that our approach is different from the standard one [30][31][32]. In the latter, one takes into account from the very beginning that S is actually a function of a smaller set of parameters, namely of the free parameters of the BSM model under consideration. One actually goes even further than that, and treats all the BSM parameters as fixed, aside from the total cross-section normalisation µ (the so-called signal-strength). What we are doing here is to first construct a hypothesis test for a generic model in which the cross-section in each bin is a free parameter, and next restrict it to the model of interest in which s (and in turn S) is a function of µ only.
The concrete implications of this different approach are described below and the reason why we did not adopt the standard strategy is explained in section 2.2.6. The other terms in eq. (2.7) are easier to understand. B is the "ideally reducible" SM background (namely, the one that does not interfere with the signal, see section 2.2.5), some of whose components are obtained from simulations while others are measured in control regions. The background is affected by nuisance parameters which are not necessarily the same ones appearing in the signal prediction. For instance, the background components estimated from data are themselves nuisance parameters, which obviously do not affect the signal prediction. Our notation, in which all the nuisance parameters are collected in a single vector ν, is general enough to account for all possibilities. The last term in eq. (2.7) is the additional signal component ∆ and it is independent of the nuisance parameters. One might find this confusing if thinking of the additional signal as emerging from an additional term in the parton-level cross-section. A detector simulation would be needed to relate this additional cross-section to the additional signal and a dependence on the nuisance parameters would be introduced. However the additional cross-section is completely arbitrary (but positive) and therefore it produces a completely arbitrary (but positive, ∆ i ≥ 0) additional signal. We can thus just take the latter, instead of the additional cross-section, to specify our probability model.

The complete likelihood function reads
$$ \mathcal{L}(s, \nu, \Delta; O) = \prod_{i=1}^{N} {\rm Poisson}\big[ O_i \,|\, M_i(s, \nu, \Delta) \big] \times \mathcal{L}_\nu(\nu) \,, \qquad (2.8) $$
where L ν is the likelihood for the nuisance parameters obtained by a set of auxiliary measurements, such as those performed in control regions or in detector calibration studies. Auxiliary measurements being by assumption insensitive to the putative signal, L ν is exclusively a function of ν. Out of the complete likelihood in eq. (2.8), the test statistic is defined as the Profile Log Likelihood Ratio
$$ t(s, \Delta; O) = -2 \log \frac{\sup_{\nu} \mathcal{L}(s, \nu, \Delta; O)}{\sup_{s', \nu} \mathcal{L}(s', \nu, \Delta; O)} \,. \qquad (2.9) $$
In the numerator, the likelihood is maximised with respect to the nuisance parameters only, while keeping s fixed. In the denominator, one maximises with respect to both s and ν, leading to the absolute maximum of the likelihood in the entire parameter space of the s's and of the ν's. No maximisation is instead performed over the ∆'s that are treated, at this stage, as fixed constants. Our first goal is indeed to define a test for each individual ∆ hypothesis, to be eventually turned into a ∆-independent test with the strategy outlined in the previous section.
The test statistic in eq. (2.9) perfectly complies with the general definition of Profile Log Likelihood Ratio given in [30,31,43,44], however it differs from the standard definition employed in LHC analyses [31,32]. In the standard case, as previously explained, the signal is restricted from the very beginning to be the one predicted by the model at hand and the signal-strength µ is taken as the only "parameter of interest". This means that the supremum in the denominator of the likelihood ratio formula is taken over µ, rather than over the N parameters s i as in our case. This difference has two implications. The first is that the denominator can be computed analytically in our case. Given that we now have N free parameters to maximise over, we can use them to set M i = O i in each bin, reaching in this way the absolute maximum of the Poisson likelihood. 5 At the same time one of course sets ν to whatever value (call it ν 0 ) maximises L ν . We can thus rewrite eq. (2.9) as
$$ t(s, \Delta; O) = \min_{\nu} \left\{ 2 \sum_{i=1}^{N} \left( M_i - O_i + O_i \log \frac{O_i}{M_i} \right) - 2 \log \frac{\mathcal{L}_\nu(\nu)}{\mathcal{L}_\nu(\nu_0)} \right\} \,, \qquad (2.10) $$
with M i = S i + B i + ∆ i as in eq. (2.7). This formula is the generalisation, including nuisance, of the maximum likelihood goodness-of-fit test for Poisson countings (see e.g. [29]). The second difference with the ordinary test is that t is now distributed, in the AL, as a χ 2 with N degrees of freedom, unlike the ordinary "signal-strength-based" test statistic that is distributed as a χ 2 with 1 degree of freedom independently of the number of bins. This result straightforwardly follows from Wilks' theorem [43,44], according to which the Profile Log Likelihood Ratio follows a χ 2 distribution with a number of degrees of freedom equal to the number of parameters of interest. The parameters of interest are the vector s, of length N , therefore
$$ t \,\overset{\rm AL}{\sim}\, \chi^2_N \,. \qquad (2.11) $$
The similarity of eq. (2.10) with the maximum likelihood goodness-of-fit, which is also (in the AL) a χ 2 with N degrees of freedom, is apparent also in this respect. The AL formula (2.11) is, as previously explained, essential for our construction.
We must thus discuss in detail the conditions for its validity. The AL is the one in which the data sample is large, a condition that we interpret in a restrictive manner by asking for O i ≫ 1 in each bin. 6 Of course one cannot check the validity of the latter condition before the experiment is performed. Therefore for the design of the hypothesis test, prior to the experiment, we interpret the AL condition as M i ≫ 1 for the expected countings in each bin. The idea is that if O i will eventually turn out to be much different from M i , for instance if M i ≫ O i ∼ 1, the hypothesis will be excluded anyhow and the possible inaccuracy of the AL formula in this regime will not be an issue. Of course M i depends on ν, however the nuisance parameters are normally well constrained by the auxiliary measurements so that their variation within a few sigmas around the central value ν 0 does not change M i radically. The M i ≫ 1 condition can thus be enforced at ν = ν 0 . The M i 's also depend on the unknown additional signal components ∆ i , and clearly we need the AL formula to hold for any ∆ i . Since ∆ i ≥ 0 can only increase M i ,
$$ S_i(s, \nu_0) + B_i(\nu_0) \gg 1 \quad \forall i \qquad (2.12) $$
is the necessary and sufficient condition for the validity of the AL formula in the entire ∆-space. One might ask how large S i + B i concretely needs to be for the AL to hold with good accuracy. In appendix A we address this question quantitatively in a toy example and we find that values as small as 3 are sufficient. Thousands of countings are expected in the mono-jet DM search we study in the next section, therefore the number of events is never an issue. The situation might be different for other applications of our method. In order to construct a ∆-independent hypothesis test we define p max like in eq. (2.5) and we use the AL formula (2.11) to express it, similarly to eq. (2.6), as
$$ p_{\rm max} = 1 - {\rm CDF}_{\chi^2_N}\big(t_{\rm min}\big) \,, \qquad t_{\rm min} = \min_{\Delta_i \ge 0}\, t(s, \Delta; O) \,, \qquad (2.13) $$
with t given in eq. (2.10). Notice that t is itself the result of a minimisation, with respect to ν. We must further minimise, with respect to ∆, in order to obtain t min . However it is convenient to invert the order of the two minimisations, because the one with respect to ∆, if performed first, can be done analytically. The point is that, as a function of M i , each Poisson term M i − O i + O i log(O i /M i ) of eq. (2.10) is non-negative, vanishes at M i = O i and grows monotonically as M i moves away from O i . Given that ∆ i ≥ 0 restricts M i ≥ S i + B i , the minimum is attained at M i = O i if O i ≥ S i + B i and at ∆ i = 0 otherwise, so that
$$ t_{\rm min} = \min_{\nu} \left\{ 2 \sum_{i\,:\; O_i < S_i + B_i} \left( S_i + B_i - O_i + O_i \log \frac{O_i}{S_i + B_i} \right) - 2 \log \frac{\mathcal{L}_\nu(\nu)}{\mathcal{L}_\nu(\nu_0)} \right\} \,. \qquad (2.14) $$
The result is trivial for its simplicity. Over-fluctuating bins do not contribute to t min because for them we set M i = O i . The very simple reason for this is that O i ≥ S i + B i is perfectly compatible with the hypothesis, the over-fluctuation of the observed countings being possibly due to some amount of additional signal. Under-fluctuating bins do instead contribute to t min , and their contribution is conservatively evaluated at ∆ i = 0. Notice that in spite of the fact that they do not contribute to t min , over-fluctuating bins are implicitly taken into account in our procedure. All the bins are indeed counted in the number of degrees of freedom N of the χ 2 distribution by which compatibility or incompatibility is established.
An over-fluctuating (and thus perfectly compatible with the hypothesis, as explained) bin does not contribute to t min but contributes to N and thus it increases the p-value making the model more compatible with data as it should.
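To make the structure of t min concrete, here is a minimal Python sketch for the special case in which the nuisance parameters are ignored (ν fixed to ν 0 ), so that the minimisation over ν drops out and only the sum over under-fluctuating bins survives. The function name `t_min_no_nuisance` and the toy numbers are illustrative assumptions.

```python
import math

def t_min_no_nuisance(observed, expected):
    """Sum over bins with O_i < S_i + B_i of 2*(M - O + O*log(O/M)), with M = S_i + B_i.

    Over-fluctuating bins (O_i >= S_i + B_i) are absorbed by Delta_i >= 0
    and contribute zero.
    """
    total = 0.0
    for o, m in zip(observed, expected):
        if o >= m:
            continue  # Delta_i = O_i - (S_i + B_i) gives M_i = O_i
        # Under-fluctuating bin, evaluated at Delta_i = 0.
        # The O*log(O/M) piece tends to zero for O = 0.
        log_term = o * math.log(o / m) if o > 0 else 0.0
        total += 2.0 * (m - o + log_term)
    return total

if __name__ == "__main__":
    # Toy example: the first bin over-fluctuates, the second under-fluctuates.
    print(t_min_no_nuisance([10, 5], [8.0, 7.0]))
```

Only the second bin contributes in the toy example, yet both bins count towards the number of degrees of freedom N used to convert t min into a p-value.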
Having defined the test for a generic model, with arbitrary parton-level cross-section in each bin, we can now straightforwardly restrict it to the model of interest, in which s depends on a single signal-strength parameter µ as s i (µ) = µ s i . 7 We just have to set, in eq. (2.14),
$$ S_i(s, \nu) \;\to\; \mu\, S_i(\nu) \,, \qquad (2.15) $$
where S i (ν) on the right-hand side denotes the signal yield at the benchmark point µ = 1. We stress once again that this step does not require us to first compute S as a function of s and next compute and plug in s as a function of µ. Everything can be done in one step by simulating the model with the benchmark value for its parameters, which corresponds to µ = 1, computing S i and eventually rescaling by µ. Once this is done, the substitution in eq. (2.15) makes t min become a function of µ (and of the observations) only and the exclusion limit on µ, call it µ exc , is set by solving the equation
$$ p_{\rm max}(\mu_{\rm exc}) = 1 - {\rm CDF}_{\chi^2_N}\big( t_{\rm min}(\mu_{\rm exc}; O) \big) = 1 - \alpha \,, \qquad (2.16) $$
like for an ordinary hypothesis test.
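The determination of µ exc can be sketched numerically as follows, again ignoring the nuisance parameters so that t min is an explicit, non-decreasing function of µ that can be inverted by bisection. The function names, the toy yields and the convention of passing the χ 2 quantile `t_crit` (the value of t min at which p max = 1 − α) as an input are assumptions made for illustration only.

```python
import math

def t_min(mu, observed, signal, background):
    """t_min without nuisance parameters, for expected yields mu*S_i + B_i."""
    total = 0.0
    for o, s, b in zip(observed, signal, background):
        m = mu * s + b
        if o < m:  # only under-fluctuating bins contribute
            total += 2.0 * (m - o + (o * math.log(o / m) if o > 0 else 0.0))
    return total

def mu_excluded(observed, signal, background, t_crit, mu_hi=1.0, tol=1e-10):
    """Solve t_min(mu) = t_crit by bisection; t_min is non-decreasing in mu."""
    # Enlarge the bracket until its upper end is excluded.
    while t_min(mu_hi, observed, signal, background) < t_crit:
        mu_hi *= 2.0
        if mu_hi > 1e12:
            raise ValueError("no exclusion found: is the signal non-zero?")
    mu_lo = 0.0
    while mu_hi - mu_lo > tol * mu_hi:
        mid = 0.5 * (mu_lo + mu_hi)
        if t_min(mid, observed, signal, background) < t_crit:
            mu_lo = mid
        else:
            mu_hi = mid
    return 0.5 * (mu_lo + mu_hi)

if __name__ == "__main__":
    # One bin at 95% CL: the chi-square quantile for one degree of freedom is about 3.841.
    print(mu_excluded([100], [10.0], [100.0], t_crit=3.841))
```

With nuisance parameters included, each evaluation of t min would additionally require the numerical minimisation over ν, but the outer one-dimensional root finding would be unchanged.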
We described the steps needed to apply our method in a rather pedantic manner, with the purpose of outlining that it does not require more work than the standard LHC limit-setting strategy. It requires signal simulations at different points of the ν-space, in order to assess the impact of nuisance on the signal, the determination of the backgrounds in control regions and the numerical minimisation over the nuisance parameters. All these steps are equally necessary for a standard analysis, which on the other hand cannot deal with additional signal. One additional technical difficulty might actually emerge in our case, due to the fact that whether or not one bin over-fluctuates, and should thus be included in the sum or not, might depend on the value of the nuisance parameters. This occurs when one of the bins has O i nearly equal to S i + B i at the central value ν = ν 0 of the nuisance likelihood. In this case nuisance variations around the central value can turn the bin from under-fluctuating to over-fluctuating. The condition of summing over under-fluctuating bins in eq. (2.14) is practically implemented by a step function for each bin in front of the term in the round bracket. These step functions produce discontinuities where their arguments change sign. Actually we do not have discontinuities in the function or in its first derivative, because the term in the round bracket vanishes, together with its derivative, at the singular point S i + B i = O i . Discontinuities appear in the second and higher derivatives. Such discontinuities might be impossible to treat for certain minimisation algorithms, in which case they should be regularised and this would result in a more demanding numerical procedure. We are not in the position to assess how severe the problem would actually be in a concrete LHC analysis (we easily dealt with it in the toy example in appendix A), however we are confident that it can be circumvented.
Nevertheless in the next section, in the process of simplifying our test, we will describe an approximate strategy that does not require minimising a non doubly-differentiable function.

2.2.2 A first simplification
A simple approximate version of our test is obtained by relying on three assumptions. The first one is that nuisance parameters (ν_â, â = 1, . . . , κ) have Gaussian likelihood, i.e.
$$ \mathcal{L}_\nu(\nu) \propto \exp\left[ -\frac{1}{2} (\nu - \nu_0)^T V^{-1} (\nu - \nu_0) \right] \,, \qquad (2.17) $$
with constant (ν-independent) covariance matrix V . The second assumption is that nuisance affects S and B mildly, so that it is legitimate to Taylor-expand them around ν = ν 0 . Since ∆ does not depend on ν, this amounts to
$$ M_i(\nu) \simeq M_i^0 + \delta M_i \,, \qquad \delta M_i \equiv \sum_{\hat a} \partial_{\hat a} M_i \big|_{\nu_0} (\nu - \nu_0)_{\hat a} \ll M_i^0 \,, \qquad (2.18) $$
where the superscript 0 denotes quantities evaluated at ν = ν 0 . This is of course necessarily true if ν is very close to ν 0 . The assumption we make is that the linear expansion holds when ν is a few sigmas (in terms of the covariance matrix V ) away from ν 0 , which is the relevant region for the minimisation of eq. (2.10). Eq. (2.18) thus expresses the condition that the errors on the nuisance are small, in such a way that the few sigma region around ν 0 is small and the linear expansion is justified. Notice that few sigma variations of ν resulting in a mild change of M, i.e. δM ≪ M, does not imply that the uncertainties on the nuisance are going to have a mild impact on the limit. Whether or not this is the case depends on how large or small M is in absolute value, which in turn determines the magnitude of the statistical error. In appendix A we will come back to this point and to a quantitative assessment of how small the relative error due to nuisance needs to be for the validity of the approximation.
The third assumption is that the countings are approximately Gaussian distributed, with mean and variance $M_i$, so that the $\Delta$-dependent test statistic in eq. (2.10) takes the Gaussian form of eq. (2.19). Poisson countings automatically become Gaussian in the large-$M_i$ limit we already assumed in eq. (2.12) in order to obtain a $\chi^2$-distributed test statistic as in eq. (2.11). However the Gaussian approximation for the Poisson distribution becomes accurate enough only for values of $M_i$ which are larger (above around a few tens, see appendix A) than those needed for the validity of eq. (2.11). A few straightforward manipulations, reported for completeness in appendix B, allow us to compute eq. (2.19) analytically in the $\delta M_i \ll M_i$ limit of eq. (2.18). The result is the familiar $\chi^2$ formula

$$t(s, \Delta; O) = \sum_{i,j}\,(O_i - M_i^0)\,[\Sigma_{\rm tot}^{-1}]_{ij}\,(O_j - M_j^0), \qquad \Sigma_{\rm tot} = \Sigma_M + \Sigma_\nu, \qquad (2.20)$$

where the total covariance matrix $\Sigma_{\rm tot}$ is the sum of statistical ($\Sigma_M$) and systematical ($\Sigma_\nu$) errors. Let us discuss them in turn. The statistical error is the variance of the

JHEP08(2017)074
Poisson distribution in each bin, evaluated at the central value of the nuisance parameters

$$[\Sigma_M]_{ij} = M_i^0\,\delta_{ij}. \qquad (2.21)$$

Obviously it is diagonal, since statistical errors are uncorrelated. The systematical component depends on the nuisance covariance matrix $V$ and on how sensitive the $M_i$'s are to nuisance departures from the central value. It is given by

$$[\Sigma_\nu]_{ij} = \partial_a M_i\,V_{ab}\,\partial_b M_j, \qquad (2.22)$$

a formula which seems, at first sight, complicated to apply in a concrete analysis where often neither $V$ nor the functional dependence of $M$ on $\nu$ are known explicitly. However by the standard error propagation formula we can rewrite it as

$$[\Sigma_\nu]_{ij} = {\rm E}\left[(M_i - M_i^0)(M_j - M_j^0)\right], \qquad (2.23)$$

where "E" denotes the expectation value over the nuisance parameters, treated here as random variables with p.d.f. $L_\nu$. By sampling the $\nu$-space with weight $L_\nu$, computing $M$ by means of simulations and taking averages, $\Sigma_\nu$ is easily obtained with no need of determining $V$ and $\partial_a M_i$ as an intermediate step.$^{8}$

Having expressed $t(s, \Delta; O)$ analytically by the $\chi^2$ formula, we should now minimise it over the additional signal components $\Delta_i \geq 0$, as prescribed by eq. (2.13). While this minimisation will in general be performed numerically, it is important to get an analytical understanding of the dependence of $t$ on $\Delta$, the way it emerges from the various elements that compose the $\chi^2$ formula (2.20). The first element is $M_i^0$, which is given by

$$M_i^0 = \mu\,S_i^0 + B_i^0 + \Delta_i, \qquad (2.24)$$

having made use of eq. (2.15) to restrict the signal to the one predicted by our model in terms of the unique signal-strength parameter $\mu$. The $M_i$'s depend on $\Delta$, but in a trivial (additive) manner. The terms $S_i^0$ and $B_i^0$ are of course independent of $\Delta$ and thus they can be computed once and for all by setting the nuisance parameters to their central values. For the nuisance parameters that correspond to instrumental effects (e.g., trigger or reconstruction), this amounts to performing one single signal and background simulation with the nuisance parameters set to their nominal values.
For nuisance associated with backgrounds, it amounts to computing the background at the central value obtained from the fit in the control regions. The dependence on $\mu$ of the signal is included by rescaling the benchmark simulation. The second element we need in the $\chi^2$ formula is $\Sigma_\nu$ which, importantly enough, is independent of $\Delta$. This is obvious from its definition, but also from rewriting eq. (2.23) explicitly as

$$[\Sigma_\nu]_{ij} = \mu^2\,{\rm E}[\delta S_i\,\delta S_j] + \mu\left({\rm E}[\delta S_i\,\delta B_j] + {\rm E}[\delta B_i\,\delta S_j]\right) + {\rm E}[\delta B_i\,\delta B_j], \qquad (2.25)$$

where $\delta S_i = S_i - S_i^0$ and $\delta B_i = B_i - B_i^0$ are the nuisance-induced departures of signal and background from their central values. This equation allows us to compute $\Sigma_\nu$ as a function of the signal strength parameter by sampling the nuisance space as previously explained. Together with eq. (2.24), it provides us with $t$ as a function of $\mu$ and $\Delta$. After minimising over $\Delta$ we obtain $t_{\rm min}(\mu; O)$ and in turn $\mu_{\rm exc}$ by solving eq. (2.16).

The minimisation of $t$ will have to be performed numerically, but this does not pose any conceptual difficulty as the $\chi^2$ is an infinitely differentiable function of $\Delta$. The procedure can however be rather slow because each function call requires the inversion of $\Sigma_{\rm tot}$, which depends on $\Delta$ through $\Sigma_M$. A considerable simplification is obtained by replacing eq. (2.20) with the "modified" $\chi^2$ formula

$$\chi^2_{\rm mod} = \sum_{i,j}\,(O_i - M_i^0)\,[(\Sigma_O + \Sigma_\nu)^{-1}]_{ij}\,(O_j - M_j^0), \qquad [\Sigma_O]_{ij} = O_i\,\delta_{ij}, \qquad (2.26)$$

where $\Sigma_O$ is the statistical error matrix computed with the observed countings rather than with the expected ones. The logic behind this simplification is that $M_i$ cannot be much different from $O_i$, in the limit of high statistics, for configurations that are not trivially excluded. The modified $\chi^2$ is a simple quadratic function of $\Delta$, which can be minimised either analytically (though with some complication due to the condition $\Delta_i \geq 0$) or numerically with a fast procedure, since no $\Delta$-dependent matrix inversion is involved in the evaluation of the function. The modified $\chi^2$ is known to be a poor approximation of the exact one, which requires large statistics to become accurate. We verify this fact explicitly with our toy example in appendix A. However eq. (2.26) is useful also when the statistics is not large, in order to get a first estimate of the location of the minimum in the $\Delta$-space, to be used as a convenient starting point for the minimisation of the exact $\chi^2$ formula.
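The two numerical ingredients of this section can be sketched in a few lines of Python. The sketch assumes that $\Sigma_\nu$ is estimated as the sample average of $(M_i - M_i^0)(M_j - M_j^0)$ over Gaussian-distributed nuisance parameters, and that the modified $\chi^2$ has the form $(O - M^0)^T(\Sigma_O + \Sigma_\nu)^{-1}(O - M^0)$ with $\Sigma_O = {\rm diag}(O_i)$. The function names and the user-supplied `expected_counts` callable are hypothetical, and the projected-gradient loop is one simple choice among many for the constrained minimisation, not the method used for the results of this paper.

```python
import numpy as np

def sample_sigma_nu(expected_counts, nu0, V, n_samples=20000, seed=0):
    """Estimate Sigma_nu = E[(M - M0)(M - M0)^T] by sampling nu ~ N(nu0, V).

    `expected_counts(nu)` is a hypothetical user-supplied function that
    returns the expected countings M_i for given nuisance parameters.
    """
    rng = np.random.default_rng(seed)
    nus = rng.multivariate_normal(nu0, V, size=n_samples)
    m = np.array([expected_counts(nu) for nu in nus])
    dm = m - expected_counts(np.asarray(nu0, dtype=float))
    return dm.T @ dm / n_samples

def minimise_chi2_mod(observed, m0_no_delta, sigma_nu, n_iter=5000):
    """Minimise (r - Delta)^T A (r - Delta) over Delta_i >= 0.

    r = O - (mu*S0 + B0) and A = (Sigma_O + Sigma_nu)^{-1}. The matrix A
    does not depend on Delta, so it is inverted only once.
    """
    observed = np.asarray(observed, dtype=float)
    r = observed - np.asarray(m0_no_delta, dtype=float)
    a = np.linalg.inv(np.diag(observed) + np.asarray(sigma_nu, dtype=float))
    step = 1.0 / (2.0 * np.linalg.eigvalsh(a).max())  # 1 / Lipschitz constant
    delta = np.clip(r, 0.0, None)                     # starting point
    for _ in range(n_iter):
        grad = -2.0 * a @ (r - delta)
        delta = np.clip(delta - step * grad, 0.0, None)  # project on Delta >= 0
    resid = r - delta
    return float(resid @ a @ resid), delta
```

The fixed iteration count is adequate for well-conditioned matrices only; a dedicated non-negative least-squares or quadratic-programming solver would be a more robust choice in a real analysis.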

Simplified likelihood
The limit-setting strategies outlined above require complete knowledge of the data and full control of the sources of systematic uncertainty. Thus they must be carried out internally by the experimental collaborations. Alternative approaches, in which the limit is set externally to the collaborations, are highly desirable because they avoid the presentation of the experimental results being committed to a specific (though quite general, like an EFT) new physics model. The way to proceed is, ideally, rather obvious. The experiments should select a set of quantities calculable with ease by theorists that effectively parametrise a generic signal hypothesis, and report the likelihood as a function of these quantities. If the likelihood is Gaussian, it is of course sufficient to report central values and covariance. In the case of DM, valid quantities to be considered are certainly the parton-level DM plus X production cross-sections in each $\slashed{E}_T$ bin, call them $\sigma_i$.$^{9}$ If $L(\sigma)$ was given, one could straightforwardly apply the logic we outlined in section 2.1 and carry out limit-setting with no need of further experimental information. One would do so by adopting for eq. (2.1) a slightly different interpretation than the one of sections 2.2.1 and 2.2.2. The signal $S$ would be the contribution to $\sigma$ (call it $\sigma_{\rm EFT}$) coming from the EFT truncated by the $E_{\rm cm} < M_{\rm cut}$ restriction. $\Delta$ would be the additional parton-level DM production cross-section rather than the additional detector-level signal as in sections 2.2.1 and 2.2.2. The test statistic would be defined as the Log Likelihood Ratio and would be distributed as a $\chi^2$ with $N$ degrees of freedom in the AL. After computing $\sigma_{\rm EFT}$ at parton level as a function of the signal strength $\mu$, plugging it in $t$ and minimising over $\Delta$ we would end up with eq. (2.16) and use it to set the limit.

$^{9}$ These should not be confused with the $s_i$'s we defined in section 2.2.1. The latter are the EFT contribution to the cross-section, while $\sigma_i$ is the total cross-section emerging from the sum of the EFT and of the unknown UV contribution.
The above procedure has limitations, the first one being that it requires quite a bit of additional experimental work. Computing $L(\sigma)$ requires taking the benchmark model simulation, rescaling it with one signal-strength parameter for each histogram bin and studying the dependence of the likelihood on these parameters. The second difficulty is that it exposes us to errors that are difficult to quantify, associated with the fact that other kinematical distributions (other than $\slashed{E}_T$) of the benchmark simulation might be different from those of the EFT signal to which the analysis will eventually be applied. Such a difference might affect the experimental efficiencies and produce an inaccurate result. Even more worrisome is the fact that the kinematical distributions of the additional signal are completely unknown. Notice that this is not an issue with the approaches of sections 2.2.1 and 2.2.2 because $\Delta$ there represents the additional signal yield, whose connection with the additional cross-section need not be specified.
Another approach to limit reinterpretation, which does not suffer from the issues above, is based on a "Simplified Likelihood" [41,42]. The way it works is nicely described by looking at eq. (2.25). Assume the approximations we made in section 2.2.2 are justified, so that $t$ becomes the $\chi^2$ in eq. (2.20), with $\Sigma_\nu$ as in eq. (2.25). Suppose also, and this needs to be checked case-by-case, that none of the relevant sources of nuisance affect the signal and the background at the same time, or that nuisance effects on the signal are negligible. In the mono-jet DM search we discuss in section 3 this is quite a reasonable approximation, since the data-driven background estimate is the major source of uncertainty. In the hypothesis that signal and background nuisance parameters are disjoint, the mixed terms in eq. (2.25) drop and one obtains

$$\Sigma_\nu = \mu^2\,\Sigma_S + \Sigma_B, \qquad (2.27)$$

where $\Sigma_S$ and $\Sigma_B$ are the nuisance covariance matrices of the signal and of the background, respectively. The simplified likelihood proposal [41,42] is that the experimental collaborations report $\Sigma_B$, leaving to theorists the determination of $\Sigma_S$, if needed. The central value of the background in each bin, $B_i^0$, will also be reported, while the evaluation of $S_i^0$ is again left to theory estimates. The potential limitation of this method is that it relies on an accurate simulation of the detector-level signal $S_i^0$ and, even worse, of $\Sigma_S$, if not negligible. In section 3 we will apply the simplified likelihood method to the CMS mono-jet search, and we will argue that it should be reasonably accurate for our purpose. Validation from a full-fledged EFT experimental analysis would however be welcome.
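As an illustration of how the reported quantities enter the test, the following Python sketch evaluates the $\chi^2$ of eq. (2.20) from $O_i$, $B_i^0$ and $\Sigma_B$ (published by the experiment) and from $S_i^0$ and, optionally, $\Sigma_S$ (estimated by theorists). It assumes $M_i^0 = \mu S_i^0 + B_i^0 + \Delta_i$, $\Sigma_M = {\rm diag}(M_i^0)$ and $\Sigma_\nu = \mu^2\Sigma_S + \Sigma_B$; the function name and argument names are hypothetical.

```python
import numpy as np

def chi2_simplified(mu, delta, observed, s0, b0, sigma_b, sigma_s=None):
    """chi^2 with Sigma_tot = diag(M0) + mu^2 Sigma_S + Sigma_B.

    M0 = mu*S0 + B0 + Delta is the expected yield at the central value
    of the nuisance parameters; Sigma_S is neglected if not provided.
    """
    observed, s0, b0, delta = (
        np.asarray(x, dtype=float) for x in (observed, s0, b0, delta)
    )
    m0 = mu * s0 + b0 + delta
    sigma_tot = np.diag(m0) + np.asarray(sigma_b, dtype=float)
    if sigma_s is not None:
        sigma_tot = sigma_tot + mu**2 * np.asarray(sigma_s, dtype=float)
    resid = observed - m0
    # Solve the linear system instead of inverting Sigma_tot explicitly.
    return float(resid @ np.linalg.solve(sigma_tot, resid))
```

Since $\Sigma_M$ depends on $\Delta$, this function has to be minimised numerically over $\Delta_i \geq 0$ for each value of $\mu$.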

Cross-section measurements
The last case which is worth discussing is the one in which the experimental collaborations report, as the result of the analysis, measurements of unfolded parton-level differential cross-sections. When accurate theoretical predictions of the SM backgrounds are available, this makes setting limits on the EFT a very simple task. Clearly this approach is not applicable to DM searches, where the background cannot be predicted.$^{10}$ However it could be useful for other EFT studies such as the ones proposed in refs. [28,45]. The logic outlined in section 2.1 straightforwardly applies to cross-section measurements. The signal $S$ in eq. (2.1) represents now the parton-level EFT cross-sections $\sigma_{\rm EFT}$, in $N$ bins, computed within the EFT with the habitual $E_{\rm cm} < M_{\rm cut}$ restriction. $\Delta$ is the additional signal cross-section from reactions above the EFT cutoff and the observations $O$ are the measured cross-sections $\sigma_m$. The test statistic is just the $\chi^2$

$$t = \sum_{i,j}\,(\sigma_m - \sigma_{\rm EFT} - b - \Delta)_i\,[\Sigma^{-1}]_{ij}\,(\sigma_m - \sigma_{\rm EFT} - b - \Delta)_j, \qquad (2.28)$$

where $\Sigma$ is the covariance matrix of the measurements. The SM background $b$ is taken here to be predicted with infinite accuracy, but of course it would not be hard to include the theory uncertainties on the background prediction in the $\chi^2$ formula by proceeding like in the previous section. The test statistic is distributed as a $\chi^2$ with $N$ d.o.f., and it is very easy to minimise over $\Delta_i \geq 0$, being just a quadratic function. The limit is thus set as a trivial application of eq. (2.6).
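The minimisation over $\Delta_i \geq 0$ becomes fully analytic in the special case of uncorrelated measurements. The Python sketch below makes this assumption (a diagonal $\Sigma$, which is not required by the discussion above) purely for illustration; the function name is hypothetical.

```python
def chi2_min_diagonal(sigma_meas, sigma_eft, sm_bkg, variances):
    """Minimum over Delta_i >= 0 of the cross-section chi^2, diagonal Sigma.

    Bins where the measurement exceeds the prediction are fitted exactly
    by Delta_i = sigma_m - sigma_EFT - b, so that only bins with a
    deficit with respect to the prediction contribute.
    """
    total = 0.0
    for meas, eft, bkg, var in zip(sigma_meas, sigma_eft, sm_bkg, variances):
        resid = meas - eft - bkg
        if resid < 0.0:
            total += resid * resid / var
    return total
```

With a non-diagonal $\Sigma$ the bins no longer decouple and the constrained quadratic minimisation has to be solved jointly.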

Dealing with interference
Until now we described our limit-setting strategy having implicitly (or even explicitly, see eq. (2.15)) in mind the case in which the EFT produces a phenomenon (e.g., DM production) that does not occur in the SM. If this is the case, no quantum mechanical interference is present between the SM and the EFT Feynman diagrams and a clear distinction can be made between the "background" and the "signal", to be further split into the proper EFT signal and the unknown additional signal from high-energy reactions. Specifically, in DM mono-X searches the background is the complete SM contribution to the $\slashed{E}_T$ distribution, emerging from SM processes with arbitrarily high $E_{\rm cm}$. The signal is instead DM production from $E_{\rm cm} < M_{\rm cut}$, to be computed starting from the EFT diagrams evaluated with the $E_{\rm cm} < M_{\rm cut}$ restriction on the phase space. The additional signal is the UV contribution to DM production.

Other interesting EFT's are those that describe BSM effects in the EW plus Higgs sector. These EFT's normally produce BSM corrections to processes that do occur also in the SM, and as such they do interfere with the SM diagrams. In the presence of interference our method should be applied as follows. The background must be interpreted as the sum of all SM processes that do not interfere with the EFT because they are characterised by a different parton-level final or initial state. We can call it the "ideally reducible" background, as one might get rid of it by an ideal detector capable of reconstructing all the particles (including, say, the neutrinos) with infinite accuracy. The "signal" is all the rest.

It contains the pure EFT contribution, the SM-EFT interference and the SM prediction for the relevant final state. This is further split into the "proper" low-energy EFT signal with the $E_{\rm cm} < M_{\rm cut}$ restriction, plus the "additional signal" from the UV. Specifically this means that the signal also contains SM terms (namely, the square of SM diagrams), which are also truncated by $E_{\rm cm} < M_{\rm cut}$. Similarly, the unknown additional signal is the physical contribution to the distribution that comes from $E_{\rm cm} > M_{\rm cut}$ processes and as such it also includes high-energy SM contributions. Notice that with this definition the additional signal $\Delta_i$ is positive (and the signal as well) because the distinction between the EFT signal and the additional one is made on a physical (kinematical) basis. This would not be the case if we had left the SM out of the truncation and regarded it as part of the background, as was done in ref. [46].
With this interpretation in mind one can easily go through the previous sections and check that all the considerations we made remain valid. Our method can thus straightforwardly deal with interference, eq. (2.15) being the only formula that concretely needs to be modified. It assumes the signal to be proportional to the signal-strength parameter $\mu$, which is interpreted as the total signal cross-section. In the presence of interference $\mu$ should be better viewed as the coefficient of the EFT operator and eq. (2.15) is replaced by eq. (2.29), whose three terms correspond respectively to the square of the EFT diagram, to the interference and to the SM contribution. All the previous formulas that rely on eq. (2.15) can be easily modified according to eq. (2.29).
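A possible way to handle the three terms in practice is sketched below in Python. It assumes that the signal in each bin is a quadratic polynomial in the operator coefficient, $S_i(\mu) = c_{0,i} + c_{1,i}\,\mu + c_{2,i}\,\mu^2$, with $c_0$ the SM term, $c_1$ the interference and $c_2$ the square of the EFT diagram, and that simulated yields are available at $\mu = 0, 1, 2$. The notation and the function names are hypothetical and are not taken from eq. (2.29).

```python
def fit_quadratic_signal(yield_mu0, yield_mu1, yield_mu2):
    """Per-bin coefficients (c0, c1, c2) from yields simulated at mu = 0, 1, 2."""
    coefficients = []
    for y0, y1, y2 in zip(yield_mu0, yield_mu1, yield_mu2):
        c2 = 0.5 * (y2 - 2.0 * y1 + y0)  # square of the EFT diagram
        c1 = y1 - y0 - c2                # SM-EFT interference
        coefficients.append((y0, c1, c2))
    return coefficients

def signal_at(mu, coefficients):
    """Signal yield in each bin for a generic operator coefficient mu."""
    return [c0 + c1 * mu + c2 * mu * mu for c0, c1, c2 in coefficients]
```

Three simulations per point of the remaining parameter space are then sufficient to obtain the signal for any value of $\mu$, in the same way as a single benchmark simulation is rescaled in the absence of interference.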

Why not a signal-strength-based test?
We extensively discussed in section 2.2.1 that our test is slightly "unusual" because it is first constructed for a generic model in which the signal cross-section in each bin is a free parameter, and later restricted to the model of interest where the signal-strength $\mu$ is the only parameter. Ordinary LHC tests are instead "signal-strength-based" from the very beginning, namely they are constructed having in mind that $\mu$ is the only parameter of the probability model. Focussing for simplicity on the case in which the interference is absent, the restricted signal is the one in eq. (2.15). In the construction of signal-strength-based tests, this restriction is applied already in eq. (2.9), both in the numerator and in the denominator of the likelihood ratio. This makes no difference for the numerator, where we also eventually make use of eq. (2.15), but it is a big change for the denominator, where now the maximisation is performed over $\mu$ only rather than over $s$. The signal-strength-based test has typically a stronger expected limit than the one we used, and one might thus wonder why we did not take this direction in our construction.

The point is that what the signal-strength-based test actually does is to compare the hypothesis we want to test with the most favourable hypothesis (i.e., the $\mu$ that maximises the likelihood) that is present in the restricted set defined by eq. (2.15). If all the hypotheses in the set are far from the data, it returns an artificially high p-value. This is precisely the situation we encounter if we consider very large (positive) values for the additional signal $\Delta_i$, much above the $\mu S_i$ contribution from the EFT and the observed $O_i$. In this configuration, the total expected yield is much larger than the observed one and the model is in tension with observations for any value of $\mu$. Moreover the likelihood is nearly constant in $\mu$ because $\mu S_i \ll \Delta_i$, therefore the likelihood divided by its maximum is nearly 1, i.e. $t \simeq 0$, and the signal-strength-based test returns perfect compatibility. Thus a signal-strength-based test cannot be applied in the presence of additional signal because the minimisation of $t$ over $\Delta$ would always return $t_{\rm min} = 0$, with the minimum reached at $\Delta \to \infty$.

CMS mono-jet search reinterpretation
As a simple example of DM EFT, we consider DM being a Majorana particle $\chi$ in the singlet of the EW group. If DM (with mass $m_{\rm DM}$) is the lightest particle of its sector, and for energies below the mass "$M_{\rm med}$" of the other new particles (where $M_{\rm med} \gg m_{\rm DM}$), its interactions with the quarks are universally described by the set of $d = 6$ effective operators of eq. (3.1), classified in ref. [10]. The effective operator Wilson coefficients are conveniently parametrised in terms of dimensionless parameters $c_i$ and of an overall interaction scale $M_*$. The latter scale should not be confused with the EFT cutoff $M_{\rm cut}$ or with the mass $M_{\rm med}$ of the heavy particles mediating the quark-DM interaction. In order to avoid confusion it would have been convenient to trade it for a parameter $G_* \equiv 1/M_*^2$, analogous to the Fermi constant. We will nevertheless adopt the standard notation and use $M_*$ instead. A comprehensive study of all the operators in eq. (3.1) would be interesting, and straightforward with our methodology. However for the sake of simplicity we restrict here to the single axial-axial operator of eq. (3.2), where $q$ runs over the six species of SM quarks. This specific operator is not particularly motivated from a BSM perspective, still it has been extensively studied in the literature and several mediator models have been proposed for its microscopic origin. References can be found in [24].

The EFT we will study is thus endowed with a 3-dimensional parameter space, the parameters being the DM mass $m_{\rm DM}$, the interaction scale $M_*$ and the cutoff of the EFT $M_{\rm cut}$. The latter should be regarded and treated, for all practical purposes, as one of the free parameters of the EFT [24]. Notice that there is no way to define $M_{\rm cut}$ rigorously, nor to get a hint of its value, before the UV-completion is specified. This is why it is important to treat it as a free parameter and show how the limits change as a function of $M_{\rm cut}$.
Qualitatively, $M_{\rm cut}$ is of the order of the mass of the mediator particles, $M_{\rm med}$, but it does not necessarily coincide with it. $M_{\rm cut}$ is the maximal energy at which the EFT predictions resemble those of the UV model, and thus it will typically have to be taken slightly below $M_{\rm med}$. Deciding how much below, and thus choosing the relevant $M_{\rm cut}$ exclusion contour among the ones we will draw, is left to model-builders aiming to use our results to set limits on their specific UV model. On top of displaying limits in the $m_{\rm DM}$-$M_*$ plane at fixed $M_{\rm cut}$, one can also visualise exclusions at fixed $g_*$, with $g_*$ defined as

$$g_* \equiv \frac{M_{\rm cut}}{M_*}. \qquad (3.3)$$

The advantage is that $g_*$ ranges in a finite domain because basic perturbativity considerations require $g_* \lesssim 4\pi$, and $g_* \sim 1$ is expected for WIMP-like DM [24]. In what follows, limits are derived on this specific EFT by re-interpreting the CMS 13 TeV mono-jet search [34], for which the Simplified Likelihood is available, as an application of the methodology developed in the previous section.

Signal simulation
We simulated the DM production signal with MadGraph 5 [47], interfaced with PYTHIA 6 [48] for showering and hadronization and with Delphes [49] for the simulation of the detector response. DM pair-production $pp \to \chi\chi$ is simulated with up to two parton-level jets in the final state and the resulting event samples are combined by the MLM showering/parton-level matching implemented in MadGraph. The effective operator scale has been set to $M_* = 1$ TeV and seven simulations have been performed at the $m_{\rm DM}$ points listed in table 1. The values reported in the table represent the total production cross-section, inclusive in the number of jets, with no cuts.
In a conventional BSM search, the only cuts to be imposed on the generated event sample would be those related with trigger, acceptance and selection that define the search region employed in the experimental analysis.$^{11}$ For an EFT-based signal instead, the $E_{\rm cm} < M_{\rm cut}$ restriction must also be put in place and the question arises of how the center of mass energy $E_{\rm cm}$ should be concretely defined. If the signal was uniquely associated to a single parton-level hard process we would naturally define $E_{\rm cm}$ as the center of mass energy of that partonic process. If for instance the signal was entirely produced by $pp \to \chi\chi j$, $E_{\rm cm}$ would be the total DM-pair plus parton-level jet invariant mass. Extra low-$p_T$ jets emitted by parton showering must be excluded from the calculation of $E_{\rm cm}$ because those emissions are low-virtuality QCD processes, whose occurrence does not invalidate the accuracy of the EFT prediction.

While $pp \to \chi\chi j$ is indeed the dominant process in our sample, after restricting it to the mono-jet search region, the contribution from other partonic configurations is not completely negligible and a more refined definition of $E_{\rm cm}$ is needed. This is constructed by noticing that the purpose of matching algorithms is precisely to distinguish hard jet emissions, which do contribute to $E_{\rm cm}$, from soft ones, which do not. The definition of $E_{\rm cm}$ is particularly straightforward to construct within the MLM matching algorithm [50], because this algorithm removes all the events in which soft QCD emissions are generated at parton level, so that the soft emissions are exclusively generated by showering in all the events that compose the final sample. The parton-level configuration that produced each event, stored in the output file, is thus the proper hard process to be associated with the event, and $E_{\rm cm}$ is computed out of that.
Once the detailed implementation of the algorithm (MLM Kt-jet as implemented in MadGraph 5, in our case) and its parameters ("xqcut = 30 GeV", "ptj=xqcut", "etaj = 7", "maxjetflavor = 5" and "QCUT = 100") are specified, this definition of $E_{\rm cm}$ is unambiguous and fully reproducible.
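Once the parton-level final state of each event is available, the $E_{\rm cm} < M_{\rm cut}$ restriction reduces to a cut on an invariant mass. A minimal Python sketch is given below. It assumes that the hard-process final-state four-momenta (the two DM particles plus the matrix-element jets) have already been read from the event record as $(E, p_x, p_y, p_z)$ tuples in GeV; how they are extracted from the generator output is not addressed, and the function names are hypothetical.

```python
import math

def invariant_mass(momenta):
    """Invariant mass of a set of four-momenta given as (E, px, py, pz)."""
    e = sum(p[0] for p in momenta)
    px = sum(p[1] for p in momenta)
    py = sum(p[2] for p in momenta)
    pz = sum(p[3] for p in momenta)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against rounding below zero

def passes_ecm_cut(hard_final_state, m_cut):
    """True if the event belongs to the EFT signal, i.e. E_cm < M_cut."""
    return invariant_mass(hard_final_state) < m_cut
```

Shower emissions never enter `hard_final_state`, so they do not affect the decision, as required by the definition above.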
The $E_{\rm cm}$ definition given above, based on the MLM method, is theoretically accurate and easy to implement; it is thus the one we will use in what follows. However it is interesting to compare it with the alternative reasonable definitions given below.

• Leading jet: the total invariant mass of the system composed of the two DM particles and of the $p_T$-leading jet after the jet reconstruction.

• Multiple jets: the jets of an event are selected and ordered in $p_T$. Then the leading jet's 4-momentum is summed to the one of the DM particles. If the transverse component of the total 4-momentum calculated this way is at least 90% of the event's $\slashed{E}_T$, $E_{\rm cm}$ is calculated as the total 4-momentum invariant mass. If this does not happen, the second $p_T$-ordered jet is considered, its 4-momentum is summed to the previous one and then the transverse component is evaluated again. The procedure goes on until the $\slashed{E}_T$ value is balanced at 90% by the jets.
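The "Multiple jets" procedure can be sketched as follows in Python. The stopping condition is read here as a requirement on the summed transverse momentum of the jets included so far, in line with the statement that the $\slashed{E}_T$ must be balanced at 90% by the jets; this reading, the treatment of events where the threshold is never reached (all jets are used) and the function names are assumptions of the sketch. Four-momenta are $(E, p_x, p_y, p_z)$ tuples.

```python
import math

def pt(p):
    """Transverse momentum of a four-momentum (E, px, py, pz)."""
    return math.hypot(p[1], p[2])

def add(p, q):
    """Component-wise sum of two four-momenta."""
    return tuple(a + b for a, b in zip(p, q))

def mass(p):
    """Invariant mass of a four-momentum."""
    m2 = p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2
    return math.sqrt(max(m2, 0.0))

def ecm_multiple_jets(dm_momenta, jets, met, fraction=0.9):
    """Add pT-ordered jets to the DM pair until they balance `fraction` of MET."""
    total = (0.0, 0.0, 0.0, 0.0)
    for p in dm_momenta:
        total = add(total, p)
    jet_sum = (0.0, 0.0, 0.0, 0.0)
    for jet in sorted(jets, key=pt, reverse=True):
        total = add(total, jet)
        jet_sum = add(jet_sum, jet)
        if pt(jet_sum) >= fraction * met:
            break
    return mass(total)
```

The "Leading jet" definition corresponds to stopping unconditionally after the first jet of the loop.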
On the left panel of figure 1, the $\slashed{E}_T$ distribution is shown for $m_{\rm DM} = 200$ GeV, after the cuts specified in footnote 11 and the $E_{\rm cm} < M_{\rm cut} = 2$ TeV restriction, with the three different definitions of $E_{\rm cm}$ given above. The distribution without any $E_{\rm cm}$ cut is also shown for comparison. The three definitions give quite similar results, showing however appreciable differences. The "Leading jet" definition tends to overestimate the EFT signal because it does not count hard extra jet emissions in the calculation of $E_{\rm cm}$, thus producing a lower $E_{\rm cm}$ and in turn a higher signal. The "Multiple jets" definition instead underestimates the EFT signal, showing that showering emissions give a significant contribution to the transverse momentum imbalance of the event.
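After the $E_{\rm cm} < M_{\rm cut}$ restriction and with the cuts in footnote 11, the total EFT signal cross-section takes the form of eq. (3.4), in terms of an efficiency $\epsilon$ that depends on $m_{\rm DM}$ and $M_{\rm cut}$. The highest value of $M_{\rm cut}$ is of course equivalent to $M_{\rm cut} = \infty$, namely it assumes that the EFT is valid at all the energy scales the LHC can probe.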
It is interesting to inspect the dependence of $\epsilon$ on $m_{\rm DM}$ and $M_{\rm cut}$, shown on the right panel of figure 1. In the Naive EFT, the efficiency monotonically increases with $m_{\rm DM}$ because heavier DM is produced in more energetic reactions, for which it is easier to produce large enough $\slashed{E}_T$ and jet $p_T$ to contribute to the signal region. The efficiency instead decreases with $m_{\rm DM}$ when the $E_{\rm cm}$ cut is in place, because asking for $E_{\rm cm} < M_{\rm cut}$ forbids hard reactions and reduces the allowed phase space. The efficiency goes exactly to zero above the kinematical threshold of eq. (3.5).

Setting limits
The CMS mono-jet analysis [34] contains all the information we need to set limits on the EFT as a straightforward application of section 2.2.3. Namely it reports (see table 1 and figure 14 of [34]) the observed countings $O_i$ and the expected backgrounds $B_i^0$ with their covariance matrix $\Sigma_B$ (see eq. (2.27)), in 22 $\slashed{E}_T$ bins. However we considered the number of countings in the high-$\slashed{E}_T$ bins slightly too small for a safe application of the $\chi^2$ formula (see appendix A), therefore we aggregated bins number 14 to 16 and 17 to 22 into two single bins and we employed an $N = 15$ bins histogram to set the limit. The covariance matrix for the aggregated bins is obtained as in ref. [41]. The simplified likelihood approach prevents a full-fledged treatment of nuisance, but still it allows us to introduce, through $\Sigma_S$, nuisance parameters that affect only the signal such as the luminosity and the trigger uncertainties. We verified that those effects do not change our limit appreciably, therefore we set $\Sigma_S = 0$ in what follows.

All that is missing to compute the $\chi^2$ in eq. (2.20) are the signal central values $S_i^0$, which we obtain by simulations and parametrise by the obvious generalisation of eq. (3.4), given in eq. (3.6), where $L = 12.9~{\rm fb}^{-1}$ is the nominal integrated luminosity and $\epsilon_i$ is the efficiency in each bin. Efficiencies are calculated as interpolating functions in the $m_{\rm DM}$-$M_{\rm cut}$ plane, using the $7 \times 7$ grid of simulations described above. This allows a fast exploration of the parameter space. The last step is the minimisation of the $\chi^2$ over the unknown additional signal components $\Delta_i \geq 0$. Out of $\chi^2_{\rm min} = t_{\rm min}$ one obtains the $\alpha = 95\%$ limit by solving eq. (2.16). What plays the role of the signal strength $\mu$ here is $1/M_*^4$, therefore the upper exclusion limits on $\mu$ will be reported as lower limits, $M_*^{\rm exc}$, on the effective interaction scale $M_*$.
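The bin aggregation can be sketched in Python as follows. The sketch uses the standard rule for the covariance of sums of correlated quantities, $\Sigma' = A\,\Sigma\,A^T$ with $A$ the matrix that sums the bins of each group, which we assume to be equivalent to the prescription of ref. [41]; the function name and the 0-based index convention are choices of the sketch.

```python
import numpy as np

def aggregate_bins(values, covariance, groups):
    """Merge histogram bins and propagate the covariance matrix.

    `groups` lists, for each new bin, the 0-based indices of the
    original bins that are summed into it.
    """
    values = np.asarray(values, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    a = np.zeros((len(groups), len(values)))
    for row, indices in enumerate(groups):
        a[row, indices] = 1.0
    return a @ values, a @ covariance @ a.T

# Bins 1-13 kept as they are, bins 14-16 and 17-22 merged (1-based labels).
CMS_GROUPS = [[i] for i in range(13)] + [list(range(13, 16)), list(range(16, 22))]
```

The same function applies to the observed countings, to the background central values and, with the signal covariance, to $\Sigma_S$.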
The minimisation of $\chi^2$ is performed numerically, using as starting point of the minimisation algorithm the minimum of the modified $\chi^2$ in eq. (2.26). Being a quadratic function, the location of the minimum of $\chi^2_{\rm mod}$ can be obtained analytically and it is found to be a very good approximation of the true one. The result of this procedure is shown in figure 2 in the $m_{\rm DM}$-$M_*$ plane at fixed $M_{\rm cut}$ (left panel) and at fixed $g_*$ (right panel). On the left plot, all the points below the curves are excluded, while on the right plot what is excluded is the interior of the curves. This peculiar behaviour [24] is due to the fact that at fixed $g_*$ (see eq. (3.3)) low values of $M_*$ correspond to small $M_{\rm cut}$ and the analysis is not sensitive to very small $M_{\rm cut}$ because of the $\slashed{E}_T > 200$ GeV cut (see eq. (3.5)) on the signal region.
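The translation of a signal-strength limit into a limit on the interaction scale, and the relation used for the fixed-$g_*$ curves, can be written as two one-line Python helpers. They assume that $\mu$ is normalised to the benchmark simulation, $\mu = (M_*^{\rm ref}/M_*)^4$ with $M_*^{\rm ref} = 1$ TeV, and that $g_* = M_{\rm cut}/M_*$; the function names are hypothetical.

```python
def m_star_exclusion(mu_exc, m_star_ref=1.0):
    """Lower limit on M* (same units as m_star_ref) from an upper limit on mu.

    Assumes mu = (m_star_ref / M*)^4, with the benchmark simulation
    generated at M* = m_star_ref.
    """
    return m_star_ref * mu_exc ** -0.25

def cutoff_at_fixed_coupling(g_star, m_star):
    """EFT cutoff M_cut = g* M* for a given coupling strength g*."""
    return g_star * m_star
```

At fixed $g_*$ the cutoff, and hence the signal efficiency, changes with $M_*$ itself, so that the exclusion has to be determined by scanning $M_*$ rather than by a single application of the first helper.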
The meaning of the black curve on the right panel of figure 2, labeled as "Pure EFT", needs to be explained. The statistical treatment of the unknown additional signal, which eventually led us to our limit-setting strategy, aims at taking rigorously into account the possible contamination of the EFT signal due to high-energy DM production, occurring at $E_{\rm cm} > M_{\rm cut}$. However it is legitimate to assume that high-energy contributions are small or absent, for instance because the mass of the mediator particles, and in turn the EFT cutoff, is above the total LHC energy of 13 TeV.$^{12}$ Under this restrictive assumption, no unknown additional signal is present and the ordinary limit-setting strategy can be carried out. The result, obtained with the "$q_\mu$" test statistic of ref. [32] and with the simplified likelihood, is dubbed "Pure EFT" limit in figure 2, since it assumes no contamination in the signal region from non-EFT reactions.

The Pure EFT limit is stronger than the one obtained with $M_{\rm cut} = 13$ TeV using our procedure. This was expected because the $q_\mu$ test is signal-strength-based while ours is not, for the reasons explained in section 2.2.6. Physically this is due to the fact that the Pure EFT test employs the full theoretical information about the shape of the signal distribution, which is partially lost in our analysis because of the additional signal. The difference between the Pure EFT and the 13 TeV lines in figure 2 gives a measure of this effect. Notice that the Pure EFT limit is in some sense more correct than the $M_{\rm cut} = 13$ TeV one, because assuming $M_{\rm cut} = 13$ TeV guarantees that no additional signal is present. The limit on the $M_{\rm cut} = 13$ TeV configuration should thus be set with a signal-strength-based approach, which has a stronger expected limit. For lower $M_{\rm cut}$ instead additional signal must be taken into account and our limit-setting strategy is the only viable one.

Validation
Our results rely on two main approximations: the Simplified Likelihood approach and the usage of Delphes for detector simulation. Both these tools have been validated in the literature, however it is worth cross-checking. We do so by reproducing, using the same tools, the limits on a conventional DM benchmark model (see [34] for details) where DM is a Dirac fermion coupled to a spin-1 $Z'$-like mediator. The results, obtained with the "$q_\mu$" test statistic [32], are reported as grey triangles in figure 3, showing perfect agreement.

Conclusions
The absence so far of direct discoveries makes the indirect exploration of heavy new physics by means of EFT's one of the priorities of the LHC experimental program. Significant progress in this direction is expected from run-2 and run-3 data, and even more from the High-Luminosity LHC upgrade. It is thus important to clarify all the aspects related with the usage of EFT's in the LHC environment. In this paper we addressed the issue related with the limited range of validity of EFT's and we found a rigorous and simple procedure to set limits on the EFT parameter space, duly enlarged to include the EFT cutoff parameter. A full-fledged treatment, to be carried out by the experimental collaborations, is possible and not more complicated than ordinary hypothesis tests. A Simplified Likelihood procedure can also be constructed, relying on approximations that we analysed in great detail.

$^{12}$ Actually it is legitimate to assume $M_{\rm cut} > 13$ TeV only if $M_* \gtrsim 13~{\rm TeV}/4\pi \simeq 1$ TeV, since $M_{\rm cut} \lesssim 4\pi M_*$ by perturbativity. Given that $M_* = 1$ TeV is right at the boundary of the region that can be excluded by the analysis, $M_{\rm cut} > 13$ TeV is a rather marginal configuration. However the considerations that follow also apply to mediators lighter than 13 TeV but still in the multi-TeV range, such that they are too heavy to be produced.

The procedure has been applied to the CMS mono-jet search [34], which we reinterpreted as limits on the axial-axial DM EFT. The resulting plots, shown in figure 2, can be employed in two ways. First, they provide a semi-quantitative assessment of how heavy the mediator particles (i.e., how large $M_{\rm cut}$) need to be for the "Pure EFT" limit to hold to good approximation. For instance we conclude that the Pure EFT constraint approximately applies for mediator masses above around 2 TeV. Second, the plots can be quantitatively reinterpreted in specific UV DM models, where the appropriate value of $M_{\rm cut}$ can be worked out.
Concrete examples of this reinterpretation are given in ref. [24]. Notice that the bounds one obtains in this way might not be the strongest experimental limits on the UV model at hand. Especially at low M cut , i.e., low mediator mass, mediators are likely to be efficiently constrained by direct searches. UV models will be constrained by combining patches, the EFT limit in figure 2 being one of those patches. The right panel of figure 2 gives another important semi-quantitative information. Namely that WIMPlike DM models, where g * 1, are poorly constrained by mono-jet DM searches. Direct mediator searches are likely to be more effective in that regime.

JHEP08(2017)074
The limit-setting strategy we developed can be straightforwardly applied to EFT's aimed at describing the effects of heavy new physics in the EW-plus-Higgs sector. However, whether our approach is really needed or not is a question that needs to be addressed case-by-case. For concreteness, consider the EFT study proposed by one of us in ref. [28], namely the search, in neutral (l⁺l⁻) and charged (lν) Drell-Yan at high mass, of two d = 6 operators that induce "oblique" corrections to the SM vector boson propagators. For the neutral Drell-Yan analysis there is obviously no need for our statistical procedure because the center of mass energy of the events can be measured experimentally. When this is the case, the limit on the EFT can be set by using only events with E_cm < M_cut, employing the standard statistical tools, as explained in the Introduction. This is not the case for the charged Drell-Yan analysis, where the relevant distribution is the one in transverse mass, which can receive contributions from arbitrarily high E_cm. In principle, charged Drell-Yan should be treated with our strategy, in full analogy with the DM mono-X searches. There is however a substantial quantitative difference. The charged Drell-Yan analysis has such a powerful reach on the EFT Wilson coefficients that it is conceivable to assume that the mediator particles are as heavy as 10 TeV (in "plausible" Universal models; they could be even much heavier if other microscopic origins were considered for the EFT operators). It is thus expected (and could be checked) that the mediator particles cannot be produced at a sufficient rate to contribute significantly to the signal region. The "Pure EFT" limit-setting procedure, in which one assumes vanishing additional signal, is thus justified in this case. On the other hand, one might want to study what happens if the mediators are light (but still heavy enough not to be directly seen). This would indeed require applying our limit-setting strategy.
Having in mind limit-setting, in this paper we completely ignored the possibility that a significant excess is observed with respect to the SM prediction. If this occurs, our procedure will obviously return a much larger p-value than expected, and the result might be used to exclude the SM by computing the probability of such a high p-value in the SM hypothesis. However, differently from what happens for a "normal" signal, the dependence of the p-value on the signal-strength µ will not tell us much about its true value. The behaviour of the p-value in the case of a discovery is easily understood by looking at eq. (2.14) and assuming that the observed countings are those predicted by the EFT with true signal-strength µ̄, plus a certain amount of positive additional signal ∆̄_i in each bin. If the hypothesis µ = 0 is considered in eq. (2.14), and if the observed countings are sufficiently above the SM backgrounds B_i, a small value of t_min will be found, and thus a large p-value, because most of the bins are over-fluctuating and do not contribute to the test statistic. The µ = 0 hypothesis would thus be found to be fully consistent with the data, which is correct because the over-fluctuation could very well be due to the additional signal rather than to the EFT contribution. The p-value will stay constant and large as µ increases, until µ = µ̄. At that point the minimisation over ∆_i will of course return ∆_i = ∆̄_i and perfect compatibility will be found. The p-value will start decreasing for µ > µ̄, with a slope that depends on the O_i's, and thus in turn on the true values ∆̄_i of the additional signal components. It is only when µ is large enough for S_i + B_i to overcome O_i in several bins that a large t_min, and thus a small p-value, will be obtained. Unlike a "normal" p-value, ours does not show a peak at µ = µ̄, but rather a (not necessarily sharp) end point. It can thus be used to set an upper limit on µ, not to measure its value. Therefore, our statistical analysis is not suited to characterise a discovery in terms of the measurement of the EFT Wilson coefficients. Still, the EFT can be used to characterise the discovery under the "Pure EFT" assumption ∆_i = 0. If this assumption is not self-consistent, or if it is not consistent with data, explicit UV models should be employed to characterise the discovered signal.
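The end-point behaviour described above can be illustrated with a minimal sketch. It assumes that eq. (2.14), which is not reproduced in this section, is the Poisson log-likelihood ratio minimised over ∆_i ≥ 0, so that a bin contributes only when its expectation µ S_i + B_i exceeds the observed counting. That form, the function name `t_min` and all the numbers below are illustrative assumptions, not the inputs of the analysis.

```python
import math


def t_min(mu, signal, background, observed):
    """Assumed form of eq. (2.14): Poisson log-likelihood ratio minimised over
    the additional signal Delta_i >= 0, so that only bins whose expectation
    mu*S_i + B_i exceeds the observed counting contribute."""
    total = 0.0
    for s, b, o in zip(signal, background, observed):
        expected = mu * s + b
        if expected > o:  # step function: over-fluctuating bins drop out
            log_term = o * math.log(o / expected) if o > 0 else 0.0
            total += expected - o + log_term
    return 2.0 * total


# Hypothetical inputs: normalised signal shape, background and true values.
SIGNAL = [0.40, 0.33, 0.27]
BACKGROUND = [47.0, 32.0, 21.0]
MU_TRUE = 20.0
DELTA_TRUE = [5.0, 3.0, 2.0]

# Observation without fluctuations: EFT signal at MU_TRUE plus additional signal.
OBSERVED = [MU_TRUE * s + b + d for s, b, d in zip(SIGNAL, BACKGROUND, DELTA_TRUE)]

for mu in (0.0, 10.0, 20.0, 40.0, 80.0):
    print(f"mu = {mu:5.1f}   t_min = {t_min(mu, SIGNAL, BACKGROUND, OBSERVED):.3f}")
```

In this sketch t_min vanishes for every µ up to the true value and grows only beyond it, which is the end-point (rather than peak) structure discussed in the text.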

A Toy example
In order to illustrate the validity of the limit-setting strategies developed in sections 2.1, 2.2.1 and 2.2.2, we consider a histogram with N = 3 equally spaced bins, constructed for a variable x ∈ [0, 3]. The signal contribution to the expected countings, i.e. the signal from the EFT, is taken to be exponentially distributed with total cross-section µ. The total SM background is called B and the background distribution is also an exponential, but with slightly different slope. Namely

\[ \frac{dS}{dx} = \mu\,\frac{\lambda_s\,e^{-\lambda_s x}}{1-e^{-3\lambda_s}}\,, \qquad \frac{dB}{dx} = B\,\frac{\lambda_b\,e^{-\lambda_b x}}{1-e^{-3\lambda_b}}\,, \qquad \text{(A.1)} \]

with λ_s = 1/5 and λ_b = 2/5. The slopes are chosen in such a way that the signal-over-background ratio is comparable in the three bins. This avoids the limit being dominated by a single bin, in which case our limit-setting strategy would effectively reduce to the one in eq. (1.3). We momentarily assume that the signal and the background are exactly known. We will study the impact of nuisance parameters later. We consider background values B = 3 and B = 10, which by integrating eq. (A.1) in the bins lead to B_i ≃ {1.4, 0.95, 0.64} and B_i ≃ {4.7, 3.2, 2.1}, respectively. In each case we generated 100 toy Monte Carlos, i.e. a set of observed O vectors distributed according to the background-only hypothesis, and studied the performance of our tests on each of them. The "ideal" test defined by eq. (2.5) is obviously the most difficult one to implement, because it requires the determination of the distribution of the test statistic point-by-point in the ∆-space. Rather than using the Monte Carlo method, we found it more convenient to compute the p-value directly by summing the probabilities of the possible outcomes of the experiment that are more incompatible than the actual observation O. Namely, we compute, for each µ, ∆ and O,

\[ p(\mu,\Delta;O) = \sum_{o}\,\Theta\big[t(o;\mu,\Delta)-t(O;\mu,\Delta)\big]\,\prod_{i=1}^{3}\mathrm{Poisson}\big(o_i\,|\,\mu S_i+\Delta_i+B_i\big)\,, \]

where Θ is the step function and the test statistic t is the standard Log Likelihood Ratio, i.e. the argument of the curly bracket in eq. (2.10). The normalised signal S_i ≃ {0.40, 0.33, 0.27} is defined in eq. (2.15) and is obtained by integrating eq. (A.1) in the bins.
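The bin contents quoted above can be cross-checked with a few lines of code. The sketch below assumes that eq. (A.1) describes exponentials normalised on the interval x ∈ [0, 3], which is what reproduces the quoted numbers; the function name `bin_fractions` is hypothetical.

```python
import math


def bin_fractions(lam, edges=(0.0, 1.0, 2.0, 3.0)):
    """Fraction of an exponential exp(-lam*x), truncated to
    [edges[0], edges[-1]], that falls in each bin."""
    survival = [math.exp(-lam * edge) for edge in edges]
    total = survival[0] - survival[-1]
    return [(survival[i] - survival[i + 1]) / total for i in range(len(edges) - 1)]


LAMBDA_S, LAMBDA_B = 1 / 5, 2 / 5

signal_shape = bin_fractions(LAMBDA_S)                     # normalised signal S_i
background_3 = [3 * f for f in bin_fractions(LAMBDA_B)]    # B = 3
background_10 = [10 * f for f in bin_fractions(LAMBDA_B)]  # B = 10

print([round(v, 2) for v in signal_shape])
print([round(v, 2) for v in background_3])
print([round(v, 1) for v in background_10])
```

With these assumptions the three lists agree, after rounding, with S_i ≃ {0.40, 0.33, 0.27}, B_i ≃ {1.4, 0.95, 0.64} and B_i ≃ {4.7, 3.2, 2.1}.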
The sum over o_i is of course truncated at a maximal value in the calculation, and this maximum is increased in suitably designed steps through an iterative procedure. The procedure stops when the relative change of the p-value with respect to the previous step is below 10^−3. Afterwards, p_max(µ; O) needs to be computed by maximising p over ∆_i ≥ 0. This step is greatly facilitated by taking as starting point of the algorithm the minimum of t, which is provided by eq. (2.14). The result of this procedure is reported in figure 4 as a function of µ, together with the "full" p-value (continuous line) defined in section 2.2.1, the one obtained by the χ² formula in eq. (2.20) (dashed line) and the "modified" χ² in eq. (2.26) (dotted line). The full p-value and the χ² are obtained as explained in the main text and do not pose any computational issue. Notice in particular that the full p-value can be computed analytically in the present example, where no nuisance parameters are included, thanks again to eq. (2.14). The modified χ² formula (2.26) is not applicable for B = 3, because the expected countings are so low that the observed ones are very likely to have one vanishing entry. Figure 4 shows the expected p-value, obtained as the median over the possible outcomes, as a function of µ. The agreement of the full version of the test with the ideal one is remarkable, in spite of the fact that the expected background countings are of order one for B = 3, while the full test is supposed to hold in the AL where the number of countings is large. This confirms the common lore according to which the statistical distribution of the Log Likelihood Ratio is well-described by the AL formula in eq. (2.11) even if the number of countings is not large. Other AL formulas are much less accurate. For instance the Gaussian approximation for the Poisson distribution is known to become accurate for a number of countings well above 5.
Indeed we see in the figure that the χ² approximation of the p-value is completely inaccurate for B = 3 and that it is not yet satisfactory (though the accuracy improves) for B = 10. The modified χ² is found to be completely off, consistent once again with generic expectations. Notice that the agreement of the expected p-value is not, strictly speaking, sufficient to establish the accuracy of our approximations, which we need to hold for each individual possible outcome of the experiment. We inspected the sample of 100 toy Monte Carlos and verified that for each of them (including those that display large under- and over-fluctuations with respect to the background-only hypothesis) the level of agreement of the different p-value calculations is similar to the one we found for the expected one.
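The direct summation used for the "ideal" test, with the truncation of the sum enlarged until the p-value is stable at the 10^−3 relative level, can be sketched as follows. The sketch assumes that the Log Likelihood Ratio is taken with respect to the saturated Poisson model and that the step function includes the observed outcome itself; `exact_p_value` and its `start`, `step` and `cap` parameters are hypothetical names and choices, not those of the original implementation.

```python
import itertools
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing k counts for a given mean."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def llr(counts, means):
    """Assumed test statistic: 2 * sum_i [M_i - o_i + o_i log(o_i / M_i)]."""
    total = 0.0
    for o, m in zip(counts, means):
        total += m - o + (o * math.log(o / m) if o > 0 else 0.0)
    return 2.0 * total


def exact_p_value(observed, means, rel_tol=1e-3, start=10, step=5, cap=60):
    """Sum the probabilities of all outcomes at least as incompatible as the
    observation, enlarging the truncation o_i <= o_max until the relative
    change of the p-value drops below rel_tol (or o_max reaches cap)."""
    t_obs = llr(observed, means)
    o_max = max(start, max(observed))
    previous = None
    while True:
        p = 0.0
        for outcome in itertools.product(range(o_max + 1), repeat=len(means)):
            if llr(outcome, means) >= t_obs:
                p += math.prod(poisson_pmf(o, m) for o, m in zip(outcome, means))
        if previous is not None and abs(p - previous) <= rel_tol * p:
            return p
        if o_max >= cap:
            return p
        previous, o_max = p, o_max + step


# Expected countings M_i = mu*S_i + Delta_i + B_i for mu = 0, Delta = 0, B = 3.
MEANS = [1.4, 0.95, 0.64]
p_typical = exact_p_value((1, 1, 1), MEANS)
p_extreme = exact_p_value((5, 4, 3), MEANS)
print(p_typical, p_extreme)
```

Maximising this p-value over ∆_i ≥ 0 at fixed µ, which is the remaining step of the "ideal" test, is not included in the sketch.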
Next, we complicate our probability model in eq. (A.1) by adding five sources of nuisance ν = {L, β_1, β_2, β_3, R}, corresponding respectively to the integrated luminosity normalised to the nominal one, to the determination of the background in the 3 bins from the control region and to the transfer factor needed to relate the latter measurements to the expected background in the signal region. Namely, the signal and background expected countings are rescaled by the nuisance parameters, with s_i (previously denoted as S_i) and b_i (previously, B_i) the signal and background true values obtained by integrating eq. (A.1) in the bins. The true value of each nuisance parameter is equal to 1 and its "measurement", i.e. the central value of its likelihood, ν^0_i, as obtained by auxiliary measurements, is taken to be Gaussian-distributed with standard deviation δ. In order to mimic the possible outcome of repeated measurements we generated 1000 points in the ν^0 space, with the previously described distribution, and 1000 toy Monte Carlos for O distributed around the background true values b_i. The covariance matrix V that defines the likelihood of the nuisance parameters in eq. (2.17) is 1/δ² times the identity and it is taken not to fluctuate in the repeated experiments. For each point in the ν^0 and O space we evaluate the full test in eq. (2.14) and the ones based on the χ² and the modified χ² in eqs. (2.20) and (2.26), as a straightforward application of the formulas in sections 2.1, 2.2.1 and 2.2.2. The presence of step functions in eq. (2.14) does not pose any computational issue to the automatic minimisation algorithm implemented in Mathematica. We did not try to implement the calculation of the ideal p-value, which would probably be too demanding already in our toy example.
We consider the results obtained in the no-nuisance case sufficient to conclude that the full p-value is an accurate estimate of the ideal one, for B ≥ 10, therefore in what follows we will consider the former as reference to assess the validity of the other methods. Results are obtained for different values of B and δ, starting from B = 10 with δ = 0.1 and δ = 0.01, shown in figure 5. The level of agreement is comparable to the one in the right panel of figure 4 and it does not improve substantially when δ is lowered. This shows that the inaccuracy of the χ² and the modified χ² approximations is dominated by the low statistics, while the expansion around ν = ν^0 we performed in section 2.2.2 to obtain the χ² formula does not introduce much additional error. The agreement improves for B = 100 (see figure 6), because with larger statistics the Gaussian approximation for the Poisson countings becomes more accurate. The inaccuracy of the χ² formula due to the ν = ν^0 expansion starts becoming visible in this case, resulting in an improvement of the χ² approximation (dashed line) when δ is lowered from 0.1 to 0.01. The behaviour of the modified χ² p-value (dotted line) requires further explanation. For δ = 0.1 it provides a rather accurate approximation, but this is an accident due to the fact that for low B (see B = 10, δ = 0.1 in figure 5) the modified χ² underestimates the true p-value while for large B (see B = 1000, δ = 0.1 in figure 7) it overestimates it. The point B = 100 happens to be close to the transition between the two regimes. Notice also that for δ = 0.01 (both at B = 100 and B = 1000) the performance of the modified χ² is inferior to that of the χ², as expected. We thus recommend the latter for a more accurate (and conservative, in all cases we studied) estimate of the p-value. Finally, the value B = 1000 is considered in figure 7. In this case the statistics are large enough for the Gaussian approximation to work extremely well and the expansion around the nuisance central value is the leading source of inaccuracy. The χ² approximation indeed becomes essentially exact when δ is lowered from 0.1 to 0.01.
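The generation of the pseudo-experiments for the nuisance study can be sketched as follows, using only the ingredients stated in the text: Gaussian central values ν^0 with mean 1 and standard deviation δ for the five nuisances, and Poisson-distributed observations around the background true values b_i. Pairing each ν^0 point with one O vector, the fixed seed and the names `generate_toys` and `sample_poisson` are assumptions made for illustration; the original study was carried out in Mathematica.

```python
import math
import random

NUISANCE_NAMES = ("L", "beta1", "beta2", "beta3", "R")


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method
    (adequate for the small means of this toy)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_toys(background, delta, n_toys=1000, seed=1):
    """Return n_toys nuisance central values nu0 (Gaussian around 1 with
    width delta) and n_toys observed vectors O (Poisson around b_i)."""
    rng = random.Random(seed)
    nu0 = [[rng.gauss(1.0, delta) for _ in NUISANCE_NAMES] for _ in range(n_toys)]
    observed = [[sample_poisson(b, rng) for b in background] for _ in range(n_toys)]
    return nu0, observed


B_TRUE = [4.7, 3.2, 2.1]  # b_i for B = 10, as quoted in the text
NU0, OBSERVED = generate_toys(B_TRUE, delta=0.1)

mean_lumi = sum(point[0] for point in NU0) / len(NU0)
mean_first_bin = sum(o[0] for o in OBSERVED) / len(OBSERVED)
print(mean_lumi, mean_first_bin)
```

Each (ν^0, O) pair would then be passed to the full, χ² and modified χ² tests, which are not reproduced here.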

B Derivation of the χ² formula
The derivation basically consists in expanding the argument of the "inf" in eq. (2.19) up to quadratic order in the nuisance parameters around ν = ν^0, as prescribed by eq. (2.18). Afterwards the minimisation over ν can be performed analytically. The only subtlety is that we expand to second order only the (M_i − O_i)² numerator, while treating the denominator at zeroth order, i.e. setting 1/M_i = 1/M^0_i. This is justified by the fact that the numerator is much more sensitive to the nuisance fluctuations than the denominator. Technically, the contributions to the Taylor series from the expansion of the denominator