Digging deeper for new physics in the LHC data

In this paper, we describe a novel, model-independent technique of “rectangular aggregations” for mining the LHC data for hints of new physics. A typical (CMS) search now has hundreds of signal regions, which can obscure potentially interesting anomalies. Applying our technique to the two CMS jets+MET SUSY searches, we identify a set of previously overlooked ∼ 3σ excesses. Among these, four excesses survive tests of inter-and intra-search compatibility, and two are especially interesting: they are largely overlappingbetween the jets+MET searches and are characterized by low jet multiplicity, zero b-jets, and low MET and HT . We find that resonant color-triplet production decaying to a quark plus an invisible particle provides an excellent fit to these two excesses and all other data — including the ATLAS jets+MET search, which actually sees a correlated excess. We discuss the additional constraints coming from dijet resonance searches, monojet searches and pair production. Based on these results, we believe the wide-spread view that the LHC data contains no interesting excesses is greatly exaggerated.


Introduction
The Large Hadron Collider (LHC) has recently achieved major milestones. At the ICHEP 2016 [1] and Moriond 2017 [2, 3] conferences, the ATLAS and CMS collaborations presented the results of many searches for new physics using ∼ 10/fb and ∼ 35/fb of data, respectively, at 13 TeV. At these integrated luminosities, the sensitivity to new physics begins to truly outstrip what was previously achieved in the ∼ 20/fb of 8 TeV data collected in Run I. When analyzed by the experimental collaborations, no evidence for new physics has emerged from this data.
It would be fair to say that the lack of new physics in the ATLAS and CMS data has reinforced a growing sense of unease among particle physicists. While new physics around the weak scale remains as theoretically well-motivated as ever, there is a general feeling that it should have showed up by now. For example, although natural regions of supersymmetric parameter space remain, the simplest versions of supersymmetry (which were the pre-LHC expectations) are excluded by the null results [4][5][6][7][8][9][10][11]. If solutions to dark matter or the hierarchy problem have failed to manifest in 35/fb of 13 TeV data, why should we expect them to appear in the next 35 or 100/fb? -1 -

JHEP11(2017)194
In this paper, we wish to push back on the characterization of the LHC data as clearly devoid of interesting signals with the potential to be new physics. We believe this is overly pessimistic. It is entirely possible that signatures of new physics are present in the existing data. At the very least, this possibility cannot be excluded without significantly more work by both theorists and experimentalists.
The issue is that the LHC searches, especially those from CMS, now typically contain hundreds of signal regions (SRs) categorized by various bins in topology (number of leptons, jets, b-jets, etc.) and kinematics (H T , M eff , / E T , etc.). Slicing the data this finely is a potentially powerful, model-independent approach, and it allows the searches to be sensitive to a much wider variety of models than the small set of benchmarks that have been studied so far. Indeed, we are very grateful to the experimental collaborations for providing such a wealth of information. However, having so many SRs also results in a noisier dataset. Adding in the presence of non-trivial correlations in the background predictions, it can be very challenging to get a sense of the presence or absence of statistically significant anomalies in the data.
The conclusion that there is no evidence of new physics in the data is primarily based on the study of a handful of benchmark "simplified" models by the experimental collaborations. These typically consist of a few particles, with pre-determined cross sections and simple branching ratios -pair produced gluinos decaying 100% to qqχ, for example. These models only ever populate a small subset of the many SRs in the LHC searches. Interesting excesses could exist in the data and yet be completely overlooked by these analyses of simplified models. Even within a given simplified model topology, one could have an excess at lower masses, where the simplified model is naively excluded, provided the cross section were somehow reduced. 1 Showing limit plots for just a narrow set of simplified models paints a potentially misleading picture of the data -they are simply not an adequate basis set to cover data as complex as what the LHC is now providing.
More generally, while it is straightforward to use the full set of SRs (and their correlations) to test for the presence of a specific new physics scenario, this presupposes knowledge of the new physics model to be tested. Imagine that new physics is present in the LHC data, but with a drastically different set of signatures and kinematics than realized in the set of popular simplified models. The new physics would result in statistical excesses in some subset of SRs, but with so many SRs in the analysis (and so many more chances for random fluctuations), this would not be immediately apparent. Likelihood calculations using signal templates derived from the current set of simplified models would likewise miss the new physics, as they populate very different SRs.
What we need then is some method which provides a more comprehensive "basis set" of signal templates than the existing simplified models. This will allow us to take a more data-driven approach towards discovering new physics at the LHC: rather than asking "is this particular model of new physics present in the LHC data?" we can instead ask "what potentially interesting excesses are present in the data? And what type of new physics JHEP11(2017)194 models are compatible with them?" In our opinion, this approach is better suited to the situation particle physics currently finds itself in.
In this work, we will attempt to provide such a method by scanning over all possible "rectangular aggregations" (RAs) of individual SRs. We are motivated by the fact that a true signal (as opposed to a statistical fluctuation) would tend to populate a set of kinematically and topologically neighboring SRs. Moreover, given how finely the SRs are sliced now, the combination of detector resolution and underlying physics (such as angular distributions and ISR/FSR) would tend to spread the signal over multiple SRs. Not knowing more about the distribution of events within neighboring SRs, we choose to simply aggregate together signal and background counts in rectangular regions in the (multi-dimensional) space of cut variables (for example, one rectangle might be 2 ≤ N j ≤ 3, N b = 0, 500 ≤ H T ≤ 1000 GeV, 300 ≤ / E T ≤ 500 GeV). By performing a full profilelikelihood analysis including correlations within and outside of each RA, we can generate a model-independent list of possible excesses over the background, which can then be more carefully examined to determine if they are possibly consistent with a new-physics interpretation.
Our approach should be contrasted with the alternative approach taken by some analyses (see e.g. [12,13]) to define a small set (typically O(10)) of "aggregate signal regions", 2 coarser selections than the individual SRs that are motivated by various signal topologies. While these aggregate SRs can be potentially useful, they are still too signature-dependent and too few in number to adequately assess whether there are any interesting excesses in the data. Also, any choice of aggregate SRs is prone to wash out underlying excesses, which we refer to as over-aggregation: this will be the case any time a bin (or a set thereof) with a significant excess is combined with bins consistent with the background.
Clearly, our method is only possible if the SRs are non-overlapping (i.e. exclusive) in the space of kinematic variables. While this is the case for the CMS searches, most of the ATLAS searches released to date have a small set of overlapping signal regions 3 (see, for example, [14]). We concentrate therefore on the CMS results, and will consider the corresponding ATLAS data only after potentially interesting anomalies are identified in particular CMS channels. We will further focus on the two CMS jets+ / E T searches [12,13] which have 174 and 213 SRs, respectively. These searches provide full covariance matrices, 4 are well documented, and are thus recastable. We consider them to be a good testing ground for our approach. Obviously, it would be interesting to continue in this vein with all of the other viable CMS searches.
Our rectangular aggregation technique results in ∼ 7,000 and ∼ 33,000 possible aggregations in the jets+ / E T searches [12,13], respectively. Within these, we find 10 and 14 2 Sometimes also called "super signal regions" or "combined regions." 3 The ATLAS searches also tend to have much higher kinematic thresholds, for unclear reasons. This makes them much less powerful than their CMS counterparts. 4 As we will discuss in detail later, the background estimates can be highly correlated in these searches, so the covariance matrices play a critical role in our rectangular aggregation technique, and in performing accurate statistical calculations more generally. Without them, our list of interesting excesses would have been completely different. We thank CMS for providing the full covariance matrices and encourage them to continue.

JHEP11(2017)194
minimal rectangles that contain statistical excesses with a local SM-only p-value below 1% (N σ > 2.57); these rectangles are minimal in the sense that they cannot be decreased in size without significantly lowering the size of the excess. These rectangles can be further grouped into three (five) clusters for [13] ( [12]) which share similar kinematic and topological features, which we will refer to as regions of interest (ROI). Finally, we dig deeper into these potentially interesting rectangles to determine whether they are more likely to be statistical fluctuations, or whether they are compatible with a new physics interpretation. We examine distributions of N j , N b , H T , etc. to make sure they look sensible, and we also test for compatibility between the two searches. Of the eight total statistically-significant ROIs in the two analyses that our aggregation technique revealed, we found only four that appeared consistent with a new physics interpretation (two in [13] and two in [12]), all with local statistical preference at approximately the 3σ-level. (As we will discuss in more detail later, we estimate a global significance of ∼ 2σ.) We consider these excesses to be very interesting and deserving of further study.
In particular, one of the two excesses in [12] is broadly consistent with one of the excesses in [13]. In section 3 we consider this anomaly in more detail. It is characterized by low jet multiplicity, no b-tagged jets, and relatively low / E T or M T 2 , and so we refer to this as a "monojet" excess. Having identified this excess through our rectangular aggregation technique, we can now attempt to analyze it using a more conventional approach: constructing simplified models and performing full fits to their parameter spaces using all of the SRs. We can also use the CMS exotica "monojet" search [15] and the ATLAS jets+ / E T search [14] to further refine our calculations.
After considering several simplified models, we find that the following provides an excellent fit to the data: resonant production of a colored mediator that decays promptly to a quark plus an invisible particle. Combining the CMS and ATLAS jets+ / E T searches, we find a local significance of 3.5σ for this model. (Interestingly, the ATLAS jets+ / E T search also shows an excess in its most relevant SR, so including it actually increases the preference for signal.) On the other hand, the best fit cross section is in some tension with the results of the CMS monojet search [15]; including the constraints from this reduces the local significance to 3σ. This model is also consistent with the limits from dijet searches 5 [16], which would be implied by the associated decay of the mediator back to a pair of colored particles.
The stated 3σ excess is, of course, only the local significance. While one might expect that the global significance to drop significantly after the application of the look-elsewhereeffect (after all, our rectangular aggregation technique covers some 33,000 rectangles), this is in fact not what occurs. The look-elsewhere-effect (LEE) (for a nice discussion, see e.g. [17]) is rigorously defined only in terms of a specific model, and as we will demonstrate in section 4, for the resonant colored particle model we find the global significance is ∼ 2σ after the LEE is applied. In essence, even though there are a very large number of rectangular aggregations which we scan over, they are highly correlated (rectangles overlap Figure 1. The rectangular aggregation method with two kinematic variables on a particular set of 10 SRs from [13]. The colored rectangles represent individual signal regions, while the dotted gray lines illustrate the narrowest binning in each kinematic variable. The color-coding of each SR shows the pulls (in units of standard deviations) of a mock dataset which includes statistical fluctuations and an injected signal, with red (blue) regions having upward (downward) fluctuations with respect to the background. The black rectangles (solid and dashed) delimit two of the many possible choices of RAs. As described in the text, the SRs overlapping with a rectangle will be aggregated to form an RA. other in both topology and the kinematic variables. Lacking a more sophisticated-yetmodel-independent template for how events are distributed across SRs, we choose to simply aggregate together SRs in a "rectangular" fashion. 7 By considering all possible rectangular aggregations (RAs) of any size in a given search, we ensure sensitivity to a wide range of possible signals.
We illustrate the general idea behind this method in figure 1, where two kinematic variables (H T and / E T ) are used in the SR definitions. (For searches that have more than two kinematic variables -as is generally the case -our rectangular aggregation scheme is extended in the obvious way.) Each bin is color-coded according to the statistical pull with respect to the background in a mock dataset, while the black rectangles show a few possible RAs. In particular, the solid rectangle shows a small aggregation (SRs 5-6) resulting in large local significances, while the dashed rectangle exemplifies an over-aggregated RA (SRs 4-8) which washes out the excesses in the underlying bins. Figure 1 also illustrates an important complication: the SRs are typically not uniformly spaced in the kinematic variables. In this case we make all possible rectangles given by the finest possible binning (denoted by dotted gray lines in figure 1) in each kinematic variable. For each rectangle, we then add the SRs that overlap with that rectangle due to the non-uniform binning.
To compute the local significance of a given RA containing an excess, we make use of the (asymptotic) profile likelihood method described in [18]. Specifically, we compute the JHEP11(2017)194 test statistic q 0 , which is the difference of the profile log-likelihoods of the background-only hypothesis and the background plus best-fit signal-strength hypothesis. As shown in [18], in SRs with large event counts, N σ = √ q 0 can be translated to a p-value through the Gaussian distribution, e.g. N σ = 2 corresponds to a p-value of 0.05. (The full details of our statistical technique are discussed in appendix A.) Here our key assumption about the signal hypothesis is that the signal populates only the given RA and nowhere else. 8 In the calculation of the profile likelihood, correlated uncertainties in the background estimates of individual SRs play an essential role. These correlations are often sizable, especially between nearby bins whose background expectations are inferred from correlated control samples. If these correlations are not included in the calculation of q 0 , the resulting significances can be wildly off.
Fortunately, the CMS collaboration has recently started releasing full correlation and covariance matrices for their searches. To incorporate them into our RA procedure, we should sum the entries from the aggregate bins in the full covariance matrix V to form a reduced matrix V R . Explicitly, we construct a new vector of observed and expected events, as well as a new covariance matrix, given by: is the vector of expected background (observed data) for all bins, (i, j) refer to bins that are being aggregated in the rectangle R, and (I, J) refer to bins that are not included in the rectangle R. The signal hypothesis (with arbitrary normalization) is then the vector S R = (1, 0, . . . , 0) in this basis.
As an example of the importance of correlations, in ROI 1b constructed from the SRs of [12] (to be described below), the full calculation including correlations yields 2.95σ, while neglecting correlations reduces the significance to just 1.8σ. The reason can ultimately be traced back to the fact that the correct measure of the error is 1/(V −1 R ) 11 (see eq. (A.2)), and this is very different (and in this case much smaller) than (V R ) 11 , due to the presence of correlations.
Using our method of rectangular aggregations, we can generate in any search a modelindependent list of RAs with locally high statistical significance. We can then follow this up with a more detailed study of the excesses in these RAs, and whether they are compatible with any actual new physics models. In the next subsection, we will apply this method to the CMS jets+ / E T searches.

Application: jets plus missing energy searches
Currently, CMS has two jets+ / E T searches using the full 36/fb Run II dataset: [13] and [12], which we will also refer to via their PAS identifiers: CMS033 and CMS036, respectively. The kinematic variables used in CMS033 are the number of jets N j with p T > 30 GeV,

JHEP11(2017)194
number of b-tagged jets N b , the scalar sum of jet p T (H T ), and the missing transverse momentum H miss T = / E T . 9 CMS036 uses N j , N b , H T , and the stransverse mass variable M T 2 [19][20][21]. Apart from small triggering differences and the use of / E T vs. M T 2 , the selections are very similar, so that events in both searches are largely overlapping.
In fact, it is even possible to directly and rigorously map SRs of CMS036 onto a set of SRs of CMS033, using a simple inequality between M T 2 and / E T . We will make use of this fact to test the compatibility of any excesses in the former with the latter. M T 2 is calculated in CMS036 by iteratively grouping all jets into two pseudo-jets, and then computing the transverse mass using each pseudojet and the missing momentum. The key observation is that for two (pseudo)jets, the stranverse mass can actually be calculated analytically as M 2 T 2 = 2p T j1 p T j2 (1 + cos θ 12 ) [22] (which is the same expression as the contransverse mass M CT [23,24]). Meanwhile, / E 2 T = p 2 T j1 + p 2 T j2 + 2p T j1 p T j2 cos θ 12 , making it clear that M T 2 ≤ / E T in every event. There are 174 individual SRs in CMS033 and 213 in CMS036, which when combined in four-dimensional rectangles result in roughly 7,000 and 33,000 possible aggregations, respectively. After scanning over all rectangular collections of signal regions, we found several deviations from the background-only hypothesis; we summarize the most statistically significant (with a local p-value below 1%, or equivalently, N σ > 2.57) discrepancies in table 1 for CMS036 [12] and table 2 for CMS033 [13]. In order to avoid double-counting aggregations where adding nearby SRs does not appreciably increase the significance, we only list aggregations which do not contain a smaller RA with N σ greater than 0.9 of the larger region's significance.
We find 14 and 10 aggregations in CMS036 and CMS033, respectively, that are above our threshold of 1% local p-value. However, they are not all independent from each other. Rather, they form distinct clusters or "hot-spots" in parameter space, with nested and overlapping aggregations. We will refer to these clusters as "regions of interest" (ROIs) in what follows. Altogether, we find five ROIs in table 1 for CMS036 and three in table 2 for CMS033.

Discriminating statistical fluctuations from viable excesses
Obviously, we do not expect all of these excesses to be due to new physics, and at the very least several, if not all, of them should be caused by statistical fluctuations of the SM background. While in some cases the only way to determine whether an excess is a real signal of beyond-the-Standard Model (BSM) physics is to wait for more data, there are two tests we can apply in order to guide our reasoning with the information on hand: • Incompatibility with nearby bins in the same search, due to general properties of kinematic variables.
For example, a signal populating an RA with high jet multiplicity and a narrow H T , M T 2 or / E T range can be disfavored if nearby bins see deficits or no sizable excesses, 9 H miss T is defined as the missing energy formed from jets only. However, given the lepton veto in these searches, in this case the distinction from / E T is negligible. Thus, we will refer to it as / E T .

JHEP11(2017)194
because it is unlikely for these distributions to be so localized if they come from a realistic many-jet signal. Similarly, we expect distributions of N j to be smeared around some underlying partonic value due to ISR/FSR. Finally, we expect N b distributions to be consistent with b-tagging rates (or c-mistagging rates).
In tables 1 and 2, we denote with a symbol those statistical excesses we have identified in CMS036 and CMS033 which we believe are not compatible with the signal regions in the same search.
• Incompatibility with similar bins in other searches.
As discussed above, CMS033 and CMS036 are highly overlapping -the kinematic variables defining the SRs in the two searches are largely identical, except that CMS036 uses M T 2 and CMS033 uses / E T . Thus an excess in one search will usually populate analogous bins in the other. We can make this more precise using the inequality M T 2 ≤ / E T derived above: a signal generating an excess in a particular RA of CMS036 will show up in specific SRs of CMS033 (the converse is not always true, in particular CMS036 would not be sensitive to a model with low / E T and M T 2 / E T , which would only populate CMS033 SRs).
In table 1, we denote with a symbol those statistical excesses we have identified in CMS036 which we believe are not compatible with the signal regions of CMS033.
Applying these arguments to the excesses listed in tables 1 and 2 marks roughly half of the anomalies as unlikely to be anything other than statistical fluctuations. As the detailed listing is rather tedious, we point the reader to appendix C, and in particular to figures 13 and 14 for histograms illustrating the incompatibility. Of course, it is possible that some of these disfavored excesses could be due to a combination of new physics events and an upward fluctuation in background. So while we do not spend time constructing models for them, tracking their evolution with more data will still be useful and important.

Promising excesses
We now focus on the anomalies which we believe have the most potential to be new physics. In figures 2 and 3, we show the kinematic distributions of the residuals (difference between observed and expected event counts) for the viable groups of excesses in both searches. We highlight the location of the excess in each kinematic variable (N j , N b , H T , M T 2 or / E T ) with a gray shading. The error bars represent (approximated) uncertainties on the background expectations: the error on the i-th bin is taken to be (E i +1/(V −1 ) ii ) 1/2 , where E is the vector of expected backgrounds and V is the covariance matrix (possibly reduced after aggregating, as in eq. (2.1)). As discussed in section 2.1, because of the correlations, 1/(V −1 ) ii = V ii , and the inverse of the covariance matrix is the correct measure of the uncertainty, as it enters the likelihood calculation, see eq. (A.2). It can be seen that for each excess and kinematic variable, the neighboring regions can accomodate tails of a BSM signal (as opposed to the other non-viable aggregations in appendix C), either due to deviations from the background, or large error bars. Obs.-Exp. Obs.
-Exp. Obs. -Exp. Obs. -Exp.   table 1 and table 2, with error bars denoting the uncertainty, as explained in the text. The left column shows kinematic distributions for CMS036 ROI #2 while the right column displays CMS033 ROI #2. Within each column, from top to bottom we show the N j , N b , H T and M T 2 ( / E T for CMS033) distributions of the significant aggregation (shaded in gray) and the neighboring bins in that direction in kinematic space. Solid and dashed lines show different components of each aggregation, as labeled in the legends. See text for more details.
Each column in figures 2 and 3 shows a viable cluster of excesses as listed in tables 1 and 2. Obs.-Exp. Obs.
-  second bin of the N j = 1 distribution (p j 0 T = 350 − 450 GeV) which only marginally increases the significance. Finally, the bottom plot shows the M T 2 distribution of the N j ≥ 2 bins (note that M T 2 is not defined for N j = 1).
• In the right column, we show the kinematic distributions for the very similar CMS033 ROI #2, which is formed by signal regions with low H T ∼ / E T . Four separate significant aggregations are possible, of which two are shown in solid blue (#2a), dashed yellow (#2b, which for presentation purposes is rescaled by a factor of 10). We do not plot aggregation #2c and #2d to avoid cluttering the figure: #2c is mostly degenerate with #2a, differing only by the ∼ 100 events in the N j = 5 − 6 bins, while #2d drops the highly populated N j = 2 bin, but has similar shapes as #2a for the other distributions. In the N j , N b plots, we separately show the excess location -12 -

JHEP11(2017)194
with overlapping gray shading: for example, aggregation #2b has N j = 5 − 6 and N b = 0 − 1 which is represented by a gray shaded area delimited by a dashed line. It should be noted that in the H T = 300 − 500 GeV range, CMS033 does not have bins at / E T > 500 GeV, which is why there are no data points above 500 GeV in the bottom plot.
• We now turn our attention to the remaining excesses in figure 3. The left column of figure 3 illustrates CMS036 ROI #3, which has low jet multiplicity, moderate H T and low M T 2 : we note that the core of the excess (#3b) has N j = 2 − 3, with the N j = 1 bin increasing the significance only slightly (as seen in the N j plot on top). We therefore show separately these two bins as solid and dashed. In particular, we note that aggregation #3a (which includes N j = 1) requires a narrower H T range, shown in darker gray delimited by dashed vertical line.
• Finally, in the right column, we show the remaining viable CMS033 excess, #1b. This excess has relatively low jet multiplicity, one or two b jets, low missing energy and high H T . While this might be hard to reproduce in a specific model, it is not clearly excluded according to our criteria.
Of the excesses listed above, CMS036 #2b and CMS033 #2c are particularly interesting, as they both have low jet multiplicity, no b-jets, and low H T , M T 2 and / E T . This opens the possibility that both searches are observing the same events due to new physics. In the rest of this work, we will discuss possible BSM explanations of this pair of excesses. While we focus on this excess for the remainder of the paper, we encourage model-building efforts for the other significant aggregations listed above, as they could just as well be due to new physics. In any case, even at this point we think it is interesting and noteworthy that several ∼ 3σ anomalies can be identified in the experimental data, which is not the commonly received wisdom in the community at this point in time.

Analysis of the mono-jet excess
In this section, we try to fit the ∼ 3σ anomaly corresponding to CMS033 #2b and CMS036 #2c to a BSM model. For definiteness, we repeat here the kinematic properties of the two RAs: As our calculation of the statistical preference for signal over background relied crucially on the covariance matrix, and this is only an approximation provided by the CMS Collaboration, we confirmed with the experimentalists directly that their full calculation for signal preference in these aggregated rectangles matches our results [25]. Given this final state, we also make sure to include any search that is expected to have good sensitivity. In particular, we also reinterpret the ATLAS 2-6 jets + / E T search [14] -13 -

JHEP11(2017)194
and the CMS mono-jet search [15]. The ATLAS search defines large overlapping SRs (using the variable M eff = H T + / E T ), of which the first one (2j-Meff-1200) has some sensitivity to (the tail of) the N j = 2 component of our excess. 10 Meanwhile, the CMS monojet+ / E T search (denoted as CMS048 in the following) has a significant overlap with the events of CMS033 and CMS036. This search has very loose requirements (p j 0 T > 100 GeV for the leading jet and / E T > 250 GeV), with any number of jets allowed, and its SRs are simply / E T bins. In both cases, we do not apply our aggregation technique to these searches, as an excess is easily identified by eye; we simply use these additional datasets to constrain the excess found in CMS033 and CMS036.
For all the BSM models considered, we generate parton-level LHC events with Mad-Graph5v2.5.3 [26], after which initial and final state radiation, as well as hadronization, are handled by Pythia8.219 [27]. We then simulate the detector response with Delphes3.4 [28] tuned to the ATLAS and CMS detectors (depending on the relevant analysis). Each recasted analysis is validated against the simplified models considered by the collaboration, see appendix B for validation plots and more details. We then compute efficiencies by taking the fraction of events populating each bin, and quantify the significance of each model with the test statistic q 0 (described in appendix A) as a function of the model parameters (usually masses of the particles in the decay chain). It should be noted that the putative signal model for the RA method is in general different than for a defined BSM model, in two important ways: first, our method aggregated several bins into one RA, while a BSM model can individually populate different bins within each aggregation, and potentially reach higher significance if its differential distributions are shaped like the excess events. Second, a full model typically has non-negligible tails populating nearby bins, which can both lower or increase the significance. Therefore, although the aggregations described in section 2.2 pointed us to this particular final state, we now use the full set of underlying bins (including their full correlations) to test the significance of different models of new physics.
As noted previously, in this work we neglect the possibility of control region (CR) contamination for our hypothetical signal models. We believe this is unlikely to be an issue for the following reasons. For the jets+ / E T searches, the main background sources are (W → ν)+jets (where the lepton is not reconstructed), (Z → νν)+jets and QCD multijet events where the missing energy comes from mismeasurements of the visible jets. In the first two cases, leptonic CRs are defined, while for multijets different methods are used, including inverting the ∆φ requirement between the jets and the missing energy. Since we will only consider purely-hadronic signal models, there should be no risk in contaminating the leptonic CRs. The multijet background is typically at most a few percent of the whole background, so CR contamination should not be problematic. In any case, we expect that signal contamination would increase the number of measured events in the control regions, therefore overestimating the backgrounds in the SRs, which means that our significance estimate could be even higher. 10 However, the M eff > 1200 GeV cut is too hard and greatly reduces the effectiveness of the ATLAS search. This is a prime example of the difference in the approaches of the CMS and ATLAS SUSY groups to designing their analyses, and how the many-exclusive-SR approach of CMS is much more powerful. Figure 4. Representative Feynman diagrams of the prospective models: (a) squark-neutralino associated production, (b) resonant colored scalar φ decaying to a quark and a singlet fermion ψ, and (c) a resonant singlet vector V decaying to singlet fermions χ.

Possible explanations
As the significance of this excess is driven by the N j = 1 SRs, we focus here on final states with at most one parton and missing energy in the hard process, as we expect additional jets from ISR to populate the higher N j bins. In particular, we compare: • A squark-neutralino simplified model, where a squark is produced in association with a neutralino LSP. The squark then decays to a quark and the LSP.
• A simplified model where a particle φ is resonantly produced and decays to a jet and an invisible fermion ψ, resulting in missing energy. Note that φ needs to be a color triplet, as an octet cannot have a renormalizable operator leading to a two-body decay into a gluon and a color-singlet. We will discuss this model in more detail below, but here we comment that the ψ particle can decay back to three quarks, and so there must be a hidden sector into which it can also decay with significant branching ratios. For the purpose of fitting the kinematics of the observed excesses, we assume the branching ratio of ψ to the hidden sector is 100%.
• A simplified model with a vector mediator V decaying to dark matter, χ. The only jets in the event are due to ISR, with the mediator and thus the missing energy recoiling against it.
Feynman diagrams for the three models are shown in figure 4. These models provide a representative (though not exhaustive) set of possible topologies that could fit the excess. Each model produces quite different kinematics: in the first case the squark and the first neutralino momenta are set by the proton parton distribution functions and there is a continuum choice of initial momenta leading to the production of the pair, resulting in broad distributions for the final states. In the second case, φ is resonantly produced at rest, therefore the leading jet p T and the missing energy are set by the φ−ψ mass difference, with additional jets from ISR. In the last case the mediator is resonantly produced at rest and both the jet momentum and the missing energy are distributed like the ISR, which is just a steeply falling power law dictated by QCD, with no characteristic scale.
We illustrate the difference between these models in figure 5, which shows the distributions of N j , H T , / E T , and M T 2 (for events passing the CMS036 triggers) at a benchmark point in the mass plane. To fit the excess, the hardest jet should have p j 0 T ∼ 300 − 400 GeV with comparable missing momentum. We choose the masses accordingly, withq, φ and V at 1.2 TeV while the invisible particle is at 850 GeV forq and φ and 600 GeV for V (for the vector mediator case the distributions are largely insensitive to the invisible particle mass). While the distributions peak at the excess, it is clear that the squark-neutralino model has no chance in populating only the excess in eq. (3.1), in particular because the H T tails at H T > 500 GeV are much wider than for the other models. The difference between the vector mediator and mono-φ models is also evident, the former having fatter tails and the latter sharply peaked: in particular, as the dark matter model relies on ISR to generate both the jet momentum and the missing energy, it is distributed like the background (mostly Z → νν with a ISR jets). If it were to populate the N j = 1 bins in CMS036, this model would also generate a consistent excess across a large fraction of bins where no deviation was observed in the data. Hence, neither the squark-neutralino or the dark matter model reach a significance above 1.5σ across their mass planes. On the other hand, the resonant φ model seems to fit well. 11 11 As the number of jets and their pT distributions are among the primary features characterizing this excess, it is important to be confident in their modeling. In particular, a more correct procedure would be to generate the hard events in MadGraph5 and Pythia8 matched to extra jets using the MLM scheme [29]. For the squark-neutralino and vector mediator models, this can be done without issue, and the differences in the jet distributions were slight. However, due to a bug in the Pythia8 color-connection algorithm, this was not possible for our resonantly produced color-triplet. As the specific issue was with the triplet-triplettriplet vertex, we generated a resonantly produced color-octet decaying to a gluon and an invisible particle at both the matched and unmatched level. This set of color-assignments is extremely difficult to justify in any reasonable new physics model, which is why we do not use it as our benchmark scenario. However, no significant difference was seen in the experimental acceptances due to matching, which we believe allows us to ignore (for now) matching in the color-triplet model.  In the top row of figure 6, we show the significance for the mono-φ model in the φ − ψ mass plane (because the jet momentum is set by the φ − ψ mass difference, we set the vertical axis to m φ − m ψ ) for each individual search. In particular, it can be seen that the same region of parameter space generates a significant excess in both CMS036 [12] and CMS033 [13], which is exactly how a first glimpse of new physics would appear. In the bottom row, we show the significance achieved combining the independent ATLAS and CMS datasets (left), which brings a slight increase to the likelihood (due to a ∼ 1.5σ excess in the ATLAS search), and the cross section necessary to achieve that significance (right). Note that the best-fit value of the cross section varies between different searches, so that the same model with given parameter values cannot achieve 3σ in both CMS036 and CMS033. In particular, the latter would prefer a higher cross section (by a factor of two), but its significance still reaches N σ = 2.5 if the signal cross section is set to the best-fit value of CMS036. Given the overlapping datasets, we cannot combine the significance of the two CMS searches.  We see that the mono-φ model is preferred with respect to the Standard Model at more than 3σ (local significance) in the broad range m φ ∼ 800 − 1400 GeV, m φ − m ψ ∼ 400 GeV and a cross section of order 0.2 − 0.4 pb. At the best fit point (m φ , m ψ ) = (1300, 900) GeV, we get N σ = 3.5 with a best-fit cross section of σ = 0.4 pb.
Finally, we include limits from the CMS048 monojet search [15]: this search also shows a modest excess of events in the low / E T bins, but because the bin width is much finer than for the N j = 1 bins in CMS036, it gives a strong discriminatory power for this model, whose / E T peaks near 300 GeV if it is to explain the excess in CMS036. As the datasets are overlapping between CMS036 and CMS048, we cannot compute a joint likelihood, as we did with ATLAS022. We show the effect of the limits in two ways in figure 7: on the left, for each mass point we set the cross section to be the best-fit cross section for the combined ATLAS and CMS excesses (as in figure 6) and then see if that signal is excluded by the monojet search at the 95% C.L: the resulting exclusion is shaded in gray. On the right, we set the cross section to its best-fit value unless it is excluded by the monojet search, in which case we set it to the 95% C.L. upper limit given by that search. We see that, while the best fit value of the mono-φ model is ruled out, a local significance of nearly 3σ is still allowed by all the present data.
Additional signatures of this simplified model are: • Dijet resonance: as φ is resonantly produced, it will also decay back to jets. The cross section shown in figure 6 is then σ(pp → φ) × BR(φ → j + / E T ), accompanied by a model-dependent dijet cross section σ(pp → φ) × BR(φ → jj) (depending on the coupling to ψ).  The combined ATLAS+CMS036 significance is above 3σ in most of the plane. The shaded gray area is excluded by the observed dijet limits [16], with the dashed line showing the dijet expected limits. In purple, we show exclusions from φ pair production followed by mixed decays according to the branching ratio on the vertical axis, set by CMS033 [13].
• Two jets and / E T : given that φ must be color-charged, it is also pair-produced, so that the signal must be accompanied by a 2j + / E T signature. In addition, depending on the branching ratio to dijets, there will be final states with 3j + / E T as well as 4j.
• If the branching ratio of ψ to three jets is non-negligible (a possibility we do not consider in this paper for simplicity), resonant φ production will form a four-jet resonance with a nested three-jet subresonance, which is currently unconstrained at the LHC. In addition, pair production can generate eight-jet final states (4j + 4j), six-jet final states (4j + 2j), or five jets + / E T .
While the branching ratio of φ into dijets depends on the relative size of the couplings (which will be discussed in the context of a full model in the next section), we can still show model-independent limits as in figure 8. Here, we set m φ − m ψ = 400 GeV (which maximizes the significance of the excess), set the cross section to the best-fit cross section σ (as given in figure 6), and then vary the scalar mass and the branching ratios. In the presence of only two decay channels, we have Br(φ → jj) = 1 − Br(φ → qψ), and having fixed σ(pp → φ)Br(φ → qψ) =σ, we find the observable dijet cross section as where A is the acceptance of the CMS dijet search [16]. We compute it at the parton level as recommended in [16], requiring |∆η jj | < 1.3, and H T > 250 GeV, m jj > 0.49 TeV for the low mass range considered there, or H T > 900 GeV, m jj > 1.25 TeV in the high-mass range. We find acceptances between 0.5 and 0.6, in agreement with the quoted value of 0.6. Finally, we compare σ jj in eq. (3.2) to the dijet limits on narrow quark-quark resonances

JHEP11(2017)194
from [16], as a function of the mass m φ and the branching ratio Br(φ → qψ), and show the excluded region in gray in figure 8. 12 We also set limits from pair-production of φ decaying to two or three jets and / E T , purple shading in figure 8 (the strongest limits are set by CMS033 [13]). The limits are more constraining when the decays are mixed, as on average there are more jets in the final state, while still retaining some missing energy. As the branching ratio into dijets increases, the limits are weaker as fewer events have missing energy at all: for Br(φ → qψ) = 0, values of m φ < 400 GeV are excluded from the ATLAS paired dijet resonance search [30].

Full model
The simplified model described is so far incomplete: in the absence of other fields, the decay ψ → 3j would happen on collider timescales (due to the large coupling needed for resonant production), resulting in a 4j final state. While currently there are no direct limits on this final state, in order to describe the excess the (dominant) ψ decay channel should be to invisible particles, suggesting a rich hidden sector. In addition, to avoid potentially dangerous baryon-number-violating processes at low energy such as dinucleon decay (as present in RPV SUSY, see e.g. [31]), ψ cannot be a Majorana fermion, and a Dirac mass term is needed. The Dirac partner of ψ could couple exclusively to the hidden sector which would easily explain the missing energy signature of the excess.
The minimal Lagrangian for the mono-φ model is the following: where q c i are right-handed quarks, N,Ñ are neutral, hidden sector fields. The scalar φ is a right-handed color-triplet and its electric charge can be either + i q c j combination is dominant. The requirement that ψ decays mostly to the hidden sector can easily be achieved either kinematically (e.g. if m ψ > mÑ + m N the hidden decay is two-body, while the SM decay mode is three-body), and/or if g > g. For the model to fit the excess, see eq. (3.1), it is imperative that the final state jet is not tagged as a b-jet. We have checked that even a c quark with a roughly 20% mistagging rate would generate too many events in signal regions with N b = 1, and would not reach 3σ significance as in figure 6.
The production cross section is set by the parton luminosity of the initial state flavors (we use the MSTW2008lo PDF set [32]) and the coupling λ, while the branching ratio 12 We also show the expected limits as a dashed line, with the 1σ (2σ) bands in green (yellow). As the dijet and other CMS data samples are independent, it would be possible to compute the combined loglikelihood of the two searches and possibly reach even higher significances. For example, note that near m φ = 1.2 TeV, there is a ∼ 2σ deviation from the expected limits, which if naively added in quadrature to our significance could reach a combined 4σ. Unfortunately, the event counts in the dijet mass distribution from the preliminary results are not public (!!), so it is not possible to reinterpret the data for assessing the significance of a particular model, and we can only use the quoted limits. into the excess, Br(φ → qψ), also depends on the coupling g. In figure 9, we illustrate the dependence on the production mode and the couplings, by showing the best-fit value of λ as a function of the φ mass, again fixing the mass splitting m φ − m ψ = 400 GeV to reach the highest significance in CMS036. In each plot, the value of g is fixed at reference values of 0.1 (a), 0.3 (b) and 1 (c). We then vary the initial state from ud (solid lines), ds (dashed) and bs (dotted), with the red line being the value of λ that reproduces the best-fit cross section. 13 At each point on the mass plane, the dijet cross section is fixed and it can be seen if it is allowed or excluded by the dijet search [16]: we show limits from dijet resonances on different initial states with the same line-style as for the best-fit λ (for example, dijet limits exclude the best-fit cross section for ds initial state when the black 13 In total there are nine possible qiqj initial state combinations: for the most part, the parton luminosities depend on the number of valence vs. sea quarks in the initial state, so that the results for ds apply to us, ub, cd, db, while the results for bs apply also for cs, cb initial states. Also note that while for simplicity we refer to qiqj initial states, we are including both qiqj → φ * andqiqj → φ processes.

JHEP11(2017)194
solid line is above the colored dashed line). As mentioned earlier, near m φ = 1.2 TeV the CMS dijet limits show a 2σ fluctuation, which could fit naturally in this model. In purple, we show limits on pair-produced φ's decaying to qq or qχ depending on the given branching ratios at each point.
In the limit λ g, σ × Br(φ → qψ) → const., which is the reason for some of the lines to quickly get out of the plot range: in that limit, above a certain mass no coupling can reproduce the excess, as the desired cross section is too high.
We note that this model is very similar to a previously proposed model of baryogenesis in the context of Twin Higgs, dubbed "Twin baryogenesis" [33]. There, the hidden sector was formed of twin quarks and ψ decays resulted in the same particle-antiparticle asymmetry in both sectors (due to the Dirac nature of ψ), therefore explaining the baryon asymmetry as well as the coincidence between matter and dark matter densities.
Another implementation would be a non-minimal version of RPV SUSY, where φ can be identified with a right-handed squark and ψ with a bino. Because the model needs Dirac neutralinos as well as a hidden sector, we do not try to pursue a full SUSY implementation, but mention that anomaly-mediated contributions to the neutralino Majorana mass bring back baryon number violation to an unacceptable level [34], so that a SUSY model faces many obstacles.
To conclude, we find that for this model to reproduce the excess, it must have g > 0.1 for any initial state, g > 0.3 for ds initial state and g 1 for bs. The typical values of the λ couplings are 0.05 − 1 depending on the initial states. If we require the absence of Landau poles at nearby scales, we predict that φ couples preferentially to (at least one of the) light quarks, namely to either ud, us, ub, cd or db. With such large couplings, one could have new diagrams contributing to φ pair-production (via a t-channel quark and two λ insertions, or a t-channel ψ and two g insertions), and to associated φ − ψ production (as in figure 4(a)). 14 We neglect those, as even for O(1) couplings, the cross sections only go up to O(10) fb, and will not significantly affect our limits, as well as our best-fit estimates. Finally, we mention that in a complete model, we would expect φ to couple to all three quark generations, possibly with flavor-dependent couplings, in which case flavor-changing neutral currents could become a strong constraint. We leave this aspect to future work.

Comments on the look-elsewhere effect
In the previous section, we have attempted to fit the "mono-jet excess" in the CMS jets+ / E T searches to a model consisting of resonant color-triplet production. We saw that the best fit point not excluded by other searches rose to ∼ 3σ local significance. However, as with any excess, we should also be interested in the global significance of the observed statistical fluctuation. What are the odds of seeing an excess of this size anywhere in the data set from the Standard Model alone? That is, what is the statistical significance after the look-elsewhere-effect (LEE) is applied?
This question is especially important given the novel method we have used to identify the excess: the rectangular aggregation technique. Given the extremely large number of

JHEP11(2017)194
RAs that we have iterated over, it is certainly tempting to believe that the LEE should reduce a 3σ anomaly to insignificance. After all, if we have scanned over 33,000 rectangles, have we not looked elsewhere 33,000 times?
As a naive upper bound on the LEE, we first calculate the local significance in all the 33,000 RAs of CMS036 with 1,000 pseudo-experiments. In 15% of these pseudoexperiments, we see at least one RA with local significance above N σ = 3.5 (the highest local significance in the real data). This is already far less than a trials factor of 33,000 would imply. Obviously, the 33,000 rectangles are not all independent -as each rectangle overlaps with many others, an excess in one would typically appear in many.
In fact, we expect the true LEE to be much less severe because, as we saw in section 2.2.1, many fluctuations have kinematic and topological characteristics that make them unlikely to be well-fit by any plausible new physics model. Quantifying this rigorously without resorting to a specific model would require formulating a complete set of signal templates, which is beyond the scope of this work. Instead, we will limit ourselves to demonstrating in the rest of this section that, in the context of the mono-φ model, the LEE reduces the 3σ local significance to 2σ global, which is in line with expectations for the LEE in a traditional "bump-hunt" type anomaly [17]. 15 To calculate the LEE for the mono-φ model, we first generate 10,000 pseudoexperiments for CMS036, taking into account the full covariance matrix, assuming only SM contributions. (Note that we cannot quantify the likelihood of the same pseudo-experiment to also give a fluctuation in CMS033 without actually generating Monte Carlo background events, as the searches are largely overlapping.) Using the number of generated "observed" events in each SR, we calculate the statistical preference for the mono-φ model anywhere in the full parameter space of the model (i.e. (m φ , m ψ , g, λ) plus the choice of initial state). In practice this amounts to allowing the cross sectionσ to be a free parameter, and fitting in the mass plane. We then ask how many pseudo-experiments contain a statistical deviation from the background-only hypothesis at least as significant as the excess seen in the real data (N σ ≈ 3). Figure 10 summarizes the look-elsewhere effect for the model in section 3.2. It illustrates the fraction of the 10,000 pseudo-experiments that have a local excess above a certain local significance threshold. We also indicate the global p-value where the parameter g has been restricted to a couple of special values, g = 0.1, 0.3, with initial state ud. The choice of this parameter might be motivated by particular frameworks such as SUSY. We find that there are 518 pseudo-experiments with an excess at least as significant as the excess in the real data. Of these, 364 (272) have an achievable cross section and are not ruled out by CMS dijets limits [16] with g = 0.3 (g = 0.1). This corresponds to a p-value of 5.2% (or 3.6% and 2.7% for the fixed values of g = 0.3 or 0.1) from which an equivalent 1.95 Gaussian N σ is inferred (2.1 and 2.2 for g = 0.3, 0.1 respectively). Thus, the LEE removes ∼ 1σ from the 3σ local excess.   Figure 10. Fraction of CMS036 pseudo-experiments (generated from background counts reported in [12]) with a global significance (anywhere on the mass grid of the mono-φ model) above a specified local significance threshold. The axis on the right shows this fraction, while the axis on the left indicates its equivalent standard normal distribution significance. The green line denotes the fraction of 10,000 pseudo-experiments that have a given local significance in the full parameter space of the model, while the orange and blue lines represent the global significance for particular choices of g. The excess in the real data lies on the vertical line at σ local ≈ 3.

JHEP11(2017)194
As expected, the look-elsewhere effect, when applied to a specific model, has brought down the significance of the anomaly; but no more than is typically seen for any experimental excess. Even with that taken into account, we are still left with a non-trivial deviation from the background-only predictions, over 2σ globally. If this excess is actually a window to new physics, its significance will go up as the LHC dataset increases.

Conclusions
The LHC continues to provide vast amounts of high-quality data across many distinct final states. The sheer volume of information makes it difficult to assess the presence of potential deviations from the Standard Model background predictions. With so many signal regions, statistical fluctuations in individual bins are expected, and some method must be used to combine the signal regions to identify which excesses are interesting potential signals of new physics.
The experimental and phenomenology communities have traditionally addressed this problem by using pre-defined signal templates, often cast in the language of supersymmetry or simplified models. However, this approach is very limiting, and provides a view of the data biased by pre-LHC theoretical assumptions. These expectations are not the only forms new physics can take, and given the lack of evidence for theories such as minimal supersymmetry in the data so far, a more flexible approach is needed.

JHEP11(2017)194
Our method of rectangular aggregations provides a systematic approach to the data that allows us to identify a list of interesting excesses, with the only theoretical prior being that new physics should populate a compact set of kinematic and topological variables. As we have demonstrated using the CMS jets+ / E T searches, anomalies at the 3σ level (2σ including the look-elsewhere effect) exist in these searches. We have pursued new physics explanations of just one of the identified excesses in this paper, which appears to be shared by both CMS jets+ / E T searches as well as possibly the ATLAS jets+ / E T search. However the other excesses we identified may also yield interesting results.
Applying rectangular aggregations to the rest of the current LHC analyses is also an immediate and obvious follow-up to this work. This requires the collaborations to construct their searches in terms of non-overlapping signal regions covering as much of the kinematic space as possible, and make public the background correlation matrices of these regions -as CMS has already done for many (but not all) searches. We strongly encourage the ATLAS collaboration to consider this approach as well.
Identifying and categorizing the statistical deviations in the current data provides a first step in the long-term project to monitor the LHC data for interesting anomalies. Early identification of these regions allows for theoretical work (model-building) that can point to better-optimized search strategies. For example, the mono-φ model in the monojet channel has distinctive jet kinematics (as one jet comes from the decay of a heavy resonance), which the current jets+ / E T searches are not optimizing for. Also, a given model will tend to predict correlated signatures (such as dijets, paired dijet resonances, and 2j/3j+ / E T signatures in the mono-φ model), whose presence in the data would give much greater confidence in the new physics interpretation.
Early identification of anomalies also allows the experimental collaborations to freeze the relevant selection criteria. This point is especially important: as the rate of LHC data collection continues to increase, there will be pressure to raise trigger and selection thresholds. However, this runs the risk of blinding the experiments to interesting physicsidentifying potentially interesting regions will provide another piece of evidence to consider in this process. Already, the high thresholds used in some of the experimental analyses can make new physics searches at low particle masses difficult, and these thresholds should be lowered where possible.
Our rectangular aggregation method can potentially be improved by refining the templates used to aggregate signal regions. In this paper, we used simple rectangles, which are certainly model-independent, but are insensitive to new physics which populate signal regions in more complicated patterns. For example, if two kinematic variables are highly correlated, the correct aggregation template would be one that moves along a diagonal. The signal templates could possibly be further improved by incorporating information about the effects of initial-and final-state radiation on the topology and kinematics of the hard event.
Though a great deal of discovery potential remains at the LHC, we are entering a slower phase of progress -at least as measured in absolute increase in the mass of particles. For example, it will take some 20 years to achieve the ultimate reach of ∼ 3 TeV for gluinos (see [9]). Given this phase of steady data acquisition, we believe that now is the time to -25 -

JHEP11(2017)194
chase ambulances -as the data comes in at a dependable rate and at fixed energy, it is less productive to just wait around for an excess to grow or shrink. It becomes more and more well motivated to fully explore the data and attempt to fit new physics models to it. We believe that the prevailing view of the LHC data as containing no interesting anomalies is, at best, premature, and a great deal of work remains to fully explore the data set.

A Statistics
In this appendix, we review the profile likelihood ratio methods used to quantify the significance of an excess or to set exclusions on a model. We do not aim to give a complete analysis, for which we refer the reader to [18,35], but simply go through the essential concepts.
The likelihood ratio test compares two competing hypotheses, usually referred to as the null hypothesis H 0 , and the alternative hypothesis H 1 , and can be related to a p-value, that is, the probability of finding a greater or equal test statistic than the observed one if the null hypothesis is true. In a typical LHC search, the data is separated into multiple SRs or bins, If there were no theoretical or experimental uncertainties on the predicted backgrounds, the probability of observing n i events in the i-th bin, given a SM expectation b i and a BSM signal s i , would be given by the Poisson distribution: where µ is a signal strength modifier [35]: µ = 0 stands for no signal beyond the SM and µ = 1 refers to the fiducial signal. It is useful to keep explicit the dependence on µ in order to be able to test how well the data is described by the particular BSM topology under study with other values of µ, which for example can be achieved by changing the branching ratios or the particle multiplicity.
In real experiments, both the SM backgrounds and the BSM signal have systematic uncertainties arising from many sources (theory errors, MC and control region statistics, jet energy scale, fake rates, etc.) In general, the uncertainties are treated as nuisance parameters, and as outlined in [35], they can be well-approximated in many instances by -26 -

JHEP11(2017)194
zero-mean Gaussian variables θ i , which are added to the background, b i → b i + θ i , together with a covariance matrix V . The likelihood function for all bins is then defined as: We then minimize (profile) the likelihood function with respect to the nuisance parameters θ (and the signal strength µ), and define a likelihood ratio as: whereθ µ is a θ vector that maximizes the likelihood in eq. (A.2) for a given µ and (μ,θ) are the µ and the θ vector that globally maximize the likelihood.λ(µ) is a measure of how far away a given signal (µ) is from being the best model to explain the observed data. Larger values ofλ(µ) (notice 0 λ 1) imply a better compatibility between the signal and the observed data [18]. We assume a signal only increases the event count in each signal region (therefore neglecting cases where interference effects would be important). Thus,μ < 0 implies that µ = 0 has the best agreement with the data while still being a physical value for µ (this is reflected in the second line in eq. (A.3)).
In the limit of large sample size, it can be found [36,37] that −2 lnλ(µ) follows a chisquare distribution with one degree of freedom. For smaller sample sizes, one can either find its distribution by generating toy experiments, or by using the asymptotic formulae in [18].
We now define two test statistics suitable for our studies: • The test statistic for discovery of a positive signal: In this case, the goal is to rule out the Standard Model (the null hypothesis), while the alternative hypothesis is the positive BSM signal. While the presence of underfluctuations, where one would findμ < 0 in eq. (A.3), means that the SM is not a good fit, it should not be automatically taken as a sign of new physics but rather point to possible errors in the SM background estimation. In this case we have q 0 = 0.
In the large N limit, one can find the p-value p 0 , and the equivalent Gaussian significance Z 0 , as: where Φ(x) = 1 2 erfc(−x/ √ 2) is the cumulative Gaussian distribution.
• The test statistic for setting upper limits:

JHEP11(2017)194
Here we are testing the compatibility between the data and the BSM signal with a signal strength µ, and a largerq µ representing increasing incompatibility. The null hypothesis we aim to reject is therefore the BSM signal. In the case thatμ > µ, the best-fit signal contribution is larger than the signal strength we are testing, and we should not reject the signal in favor of the SM (the alternative hypothesis) by setting an upper limit; therefore,q µ is set to zero in that range.
In the large N limit, the p-value p µ and the Gaussian significance Z µ are simply given by: In particular, when the p-value is below a certain threshold α we say that the signal is excluded at a confidence level of 1 − α. Results are usually quoted at the 95% confidence level, corresponding to α = 0.05 or Z = 1.96. We find this value numerically by varying µ until we findq µ = 4 (we here gloss over the small difference between 2σ exclusions (Z = 2) and 95% C.L. exclusions).
The quantities q 0 andq µ can be calculated for either the full set of SRs of each search, or for the reduced set of SRs found after aggregations, eq. (2.1). In section 2.2, we calculate q 0 with the input signal populating only that aggregated region (which is treated as a single new bin). We are then quantifying how excluded the background-only hypothesis is compared to a hypothetical BSM model that only populates that rectangular aggregation. This number is reported for the RAs in tables 1-2. Since we are only studying the fluctuations localized to the aggregated bin, the significance we obtain is local.
Because multiple searches involve hundreds of exclusive signal regions, each with its nuisance parameter θ i , the definition ofλ(µ) in eq. (A.3) involves maximizing the likelihood function L(µ, θ) with respect to hundreds of variables. While in the absence of correlations, L is simply a sum of terms that can be individually maximized, in general this is not an easy task: in this work, we use the powerful Minuit routines [38], interfaced to Python via the iminuit package.

B Recasting pipeline and validation
In this work, we have reinterpreted several ATLAS and CMS searches. In this section, we validate each search by reproducing the exclusion plots on simplified models present in each experimental paper.
We generate hard events in MadGraph5 aMC@NLO 2.5.3 [26], with additional hard jets in the events if needed. For the SUSY simplified models used for validations, we use the MSSM module included in Madgraph, while for the dark matter simplified models used in [15] we use the DMsimp UFO model [39]. For the mono-φ model discussed in section 3, we use the RPVMSSM UFO model [40], which was generated with FeynRules. While for the validation plots we do not find it necessary to generate more than 10,000 MC events, for the significance plots shown in section 3 we generate 100,000 MC events to avoid statistical fluctuations in low-efficiency bins. We use the leading-order cross sections as calculated by MadGraph. Parton-level events are showered and hadronized with Pythia8.219 [27]. If necessary, we match the matrix-element and parton shower events with the MLM technique [29]. The resulting particles are reconstructed in a ATLAS-or CMS-like simulated detector using Delphes3.4 [28], depending on the search, with efficiencies for particle reconstruction taken from the experimental papers. Jets are reconstructed with the FastJet package [41], using the anti-k t algorithm [42]; to validate [15], we also use pruning techniques [43] on large-R jets, on which n-subjettiness variables [44] are also calculated. Finally, cuts and SR definitions in each experimental search are simulated with pyROOT. For the CMS searches, efficiencies from all the signal regions are used to compute the likelihood, while for ATLAS we use the SR with the best-expected exclusion to set limits (or the bestexpected discovery reach for positive significance).
In order to make our work most useful to the community, we also release auxiliary material containing our source code for each analysis as well as the ATLAS and CMS detector cards used in Delphes. With this material, our results can be reproduced as well as extended to different scenarios. We stress that our aim is not to release a public recasting tool, but to make it easier for existing public codes to incorporate the searches that we used.
In the following, we show our validation plots, comparing the experimental exclusion region with the exclusion region we find from our pipeline. In order to show what the uncertainties are in our pipeline, we also show a shaded band around our recasted limits, which are obtained by multiplying or dividing our efficiencies by a factor of 1.5. Using the likelihood analysis described above, and the data (observed event counts, expected backgrounds with relative uncertainties and covariance matrices when available) from each search, we compute the test statisticsq µ as described in eq. (A.6), and exclude a point whenq µ=1 ≥ 4, that is when the nominal cross section is excluded at the 95% C.L. Of all the simplified models studied in [12][13][14] we only show validation plots for the ones most similar to the topologies studied in section 3, namely pp →gg,g → qqχ 0 1 (left column) and pp →qq * ,q → qχ 0 1 (right column) with either one or all eight first and second generation squarks in the spectrum. For [15], we show the simplified dark matter models with either a vector or an axial-vector mediator.

C Identified excesses inconsistent with new physics
We here discuss the excesses in tables 1 and 2 that we think are more likely due to statistical fluctuations of the background, according to the criteria outlined in section 2.2.1. To map a CMS036 excess into corresponding CMS033 bins, we recall that / E T ≥ M T 2 .
• ROI #1 of CMS036 is in tension with the lack of any excesses in the corresponding SRs of CMS033. Because the core of the excess is from bins 126, 127 (which by themselves provide 3.1σ), we will focus on these. In figure 13(a), we show the residuals of the CMS033 / E T distribution for the bins corresponding to the N j , N b , H T ranges of this excess in CMS036 (shaded grey region). Because the N j bins do not align in the two searches, we include N j = 3 − 4 of CMS033. The best-fit value for the number -29 -JHEP11(2017)194 Figure 11. The validation plots for [13] (first row), [12] (second row) and [14] (third row), for theg → qqχ 0 (left) andq → qχ 0 (right) simplified models. The blue and purple lines denote the 95% C.L. limit calculated using the likelihood analysis described in appendix A. The shaded regions denote the same limit as the solid line with 50% error included in our signal strength in each direction (to take the possible recasting errors into account). The red and orange lines are the official observed limits. Figure 12. The validation plots for [15]. The color-coding is the same as in figure 11. Obs. -Exp.
CMS036 5: N j =4-6, N b =1, H T =575-1000 GeV (d) Figure 13. Kinematic distributions which suggest various CMS036 excesses (see table 1) are unlikely to be new physics. The residuals (observed minus expected) are shown in blue, (with error bars denoting the error). In each plot, the shaded grey region denotes the region with an observed excess. See text for details. of signal events in the CMS036 excess is 71 events: from the / E T distribution in the corresponding CMS033 bins, we see that the only place for these extra events is the bin 500 < / E T < 750, which has an excess of ∼ 40 events. Therefore, one would have needed a ∼ 1σ downward fluctuation of the backgrounds in that bin of CMS033 to accomodate the full CMS036 excess. On the other hand, if we reduce the signal strength to ∼ 70% of the best-fit value, the CMS036 significance is reduced from 3.1σ down to 2.9σ, but the events could be responsible for the small excess in CMS033. In either case, the signal should have a highly peaked / E T distribution, and / E T ≈M T 2 , which seems implausible.
• In the same spirit, in figure 13(b), we show the / E T distribution of the CMS033 events corresponding to the CMS036 ROI #2. As the excess has 200 < M T 2 < 300 GeV, any missing energy is in principle allowed. The excess features events with both 0 and 1 b-tagged jets: we could conjecture a signal with one true b quark in the hard process, in which case we would expect additional ISR jets to populate N j ≥ 2 bins: in that case, the CMS036 excess should often be seen in the N j = 2 bins of CMS033. While there is a large excess in the N b = 0 bins (corresponding to CMS033 excesses    (see table 2) are unlikely to be new physics. The residuals (observed minus expected) are shown in blue, (with error bars denoting the error). In each plot, the shaded grey region denotes the region with an observed excess. See text for details. #2a,c,d), there is a deficit in the N b = 1 bins, so that we deem the N b = 1 RAs of this ROI to be inconsistent with the corresponding CMS033 bins. On the other hand, the aggregation with the N b = 0 bins is highly significant and mirrors an excess in CMS033, which we have investigated in section 3.
• In figure 13(c), we show the H T distribution of CMS036 ROI #4: the nearby bins in H T have deficits, so that a putative signal would have to have an extremely narrow H T distribution, which is unlikely for events with up to six jets.
• In figure 13(d), we study CMS036 ROI #5 and show the M T 2 distribution of the neighboring bins. For N j > 2, the M T 2 distributions are generally much wider than 100 GeV, so that a signal would contaminate nearby bins, all of which are consistent with backgrounds, especially at lower M T 2 .
• Now, we turn to the excesses in CMS033, table 2: first, aggregation #1a includes the peculiar combination of N j = 2 and N b = 3 bins, which we deem unlikely to come from a model with a set decay topology. On the other hand, turning to smaller -32 -

JHEP11(2017)194
sub-aggregations results in a viable excess (#1b, as shown in section 2.2.1), as well as less viable ones which we discuss next.
• in figure 14(a), we show the N j distribution of CMS033 aggregation #1c: the excess spans a large range of N j , but it can be seen that the residual distribution is approximately flat. A signal with a large number of jets in the hard process will be peaked around that number, with tails given by ISR or jets overlapping with each other, while a signal with low jet multiplicity should be peaked at low N j and have tails from ISR. As the residuals do not look like any of these cases, we think that the N j distribution of this excess is only compatible with background fluctuations.
• In figures 14(b)-14(c), we show the N j and N b distributions of the CMS033 aggregation #1d. While the N j distribution should be sharply peaked at N j ≤ 4, the aggregation requires N b = 2, 3. As b-jets from ISR are rare, adding ISR to the b-jets would likely overpopulate the high-multiplicity N j ≥ 4 bins. This excess is incompatible with surrounding bins. Note that the N b = 0 bin is out of scale in this plot and overlaps with the viable aggregation #2c.
• In figure 14(d), we show the N j distribution of CMS033 ROI #3. We here only plot the distribution of bin 126 which is the core of the excess. It has N j = 7, 8 and high H T , but nearby bins at both higher and lower jet multiplicities do not show any deviation. Such high jet multiplicities require at least 4 − 5 jets at the parton level with the rest coming from ISR/FSR, resulting in wide jet distributions which would populate the nearby bins. We exclude this aggregation due to the N j distribution.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.