Artificial Proto-Modelling: Building Precursors of a Next Standard Model from Simplified Model Results

We present a novel algorithm to identify potential dispersed signals of new physics in the slew of published LHC results. It employs a random walk algorithm to introduce sets of new particles, dubbed "proto-models", which are tested against simplified-model results from ATLAS and CMS (exploiting the SModelS software framework). A combinatorial algorithm identifies the set of analyses and/or signal regions that maximally violates the SM hypothesis, while remaining compatible with the entirety of LHC constraints in our database. Demonstrating our method by running over the experimental results in the SModelS database, we find as currently best-performing proto-model a top partner, a light-flavor quark partner, and a lightest neutral new particle with masses of the order of 1.2 TeV, 700 GeV and 160 GeV, respectively. The corresponding global p-value for the SM hypothesis is approximately 0.19; by construction no look-elsewhere effect applies.


Introduction
Inverse problems [1] are the task of inferring causal factors and general rules from observational data. They are ubiquitous in all sciences and notoriously hard to tackle. In particle physics, we typically refer to the problem of constructing the fundamental Lagrangian from our observations as our field's inverse problem. In the context of the Standard Model, the precise theoretical predictions essentially turn this problem into classical hypothesis tests, cf. for instance the determination of the properties of the Higgs boson.
Should experimental evidence for physics beyond the Standard Model (BSM) arise at the LHC or elsewhere, this mapping from signature space to the parameter space of the underlying theory will, however, be much less clear-cut [2,3]; see also [4][5][6][7][8]. The reasons are twofold. First, there is the large, and still growing, number of proposed extensions of the Standard Model (see, e.g., [3,9] for concise overviews), many of which have a large number of free parameters and can come in multiple, non-minimal variants. Second, very likely not enough information (i.e., too few observables) will be available for a direct connection between experiment and theory, making it necessary to "relate incomplete data to incomplete theory" [2].
At the same time, the LHC has been providing new experimental results at an enormous pace, making it a highly non-trivial task to determine which theories or scenarios survive experimental scrutiny and which do not. Testing a BSM model or inferring its preferred region of parameter space typically requires the construction of a likelihood, which encapsulates the information from the relevant LHC results and/or from other experiments. Building such a likelihood can be a daunting task, due to the limited amount of information available outside the experimental collaborations 1 and the computational resources required for exploring large parameter spaces. Thus, even when possible, the statistical inference obtained is naturally limited to the concrete BSM model under investigation. Owing to the vast number of proposed models (and new ones still to come) it is not a given that this top-down approach will direct us towards the hypothetical Next Standard Model (NSM) [12].
Figure 1: Overall strategy of how we may envisage to construct an NSM from LHC data: the outcomes of specific searches, which are communicated via simplified model results, are used to construct proto-models. These may be scrutinized in dedicated analyses and eventually help to infer the NSM; see also [4,12,13].
In this work, we therefore aim to tackle the inference of the NSM through a bottom-up approach, which strongly relies on LHC data and contains only minimal theoretical bias. The idea itself is not new: systematic, bottom-up approaches to the LHC inverse problem were envisaged previously with the Bard algorithm in [4], the MARMOSET framework in [12], and the characterisation of new physics through simplified models in [13]. Our approach is different in the sense that, given the absence of any clear signal of new physics so far, we focus on potential dispersed signals that may have been missed in the common analysis-by-analysis interpretations of the data.
Concretely, we introduce so-called "proto-models", which can be viewed as sets or "stacks" of simplified models, where the number of BSM particles as well as their masses, production cross sections and decay branching ratios are taken as free parameters. Through the use of simplified-model constraints via the SModelS [14] package, we construct approximate likelihoods for the proto-models, allowing us to perform Markov Chain Monte Carlo (MCMC)-type walks in their parameter space.
These MCMC walks consist of random changes in the BSM number of particles, their masses, production cross sections and decay branching ratios, with the goal of finding proto-models which evade all available constraints and at the same time explain potential dispersed signals in the data. We stress that proto-models are not intended to be UV complete nor to be a consistent effective field theory. Nonetheless we hope they can be useful to guide future experimental and theoretical efforts, and perhaps even serve as a first step toward a construction of the NSM, as illustrated in Fig. 1.
The rest of this paper is organized as follows. Section 2 discusses in detail our definition of proto-models, their parameters and the assumptions they are based on. Section 3 describes the method used for computing the LHC constraints for a given proto-model as well as how we use this information to construct an approximate likelihood. The MCMC walk and the algorithm used for building new proto-models are detailed in Section 4. Finally, in Section 5, we apply the MCMC walk to the full SModelS database and discuss the results obtained. We conclude in Section 6 with a short summary and an outlook on future developments and extensions of this work.

Proto-Models
As mentioned in the Introduction, proto-models are defined by their BSM particle content, the particle masses, production cross sections and decay branching ratios.
In this work, we assume that all particles either decay promptly or are fully stable at detector scales. Since proto-models are not intended to be fully consistent theoretical models, their properties are not bound by higher-level theoretical assumptions, such as representations of the SM gauge groups or higher symmetries. Nonetheless, since we will make extensive use of simplified-model results from searches for supersymmetry (SUSY), we impose the following constraints:
1. all BSM particles are odd under a Z2-type symmetry, so they are always pair produced and always cascade decay to the lightest state;
2. the Lightest BSM Particle (LBP) is stable and electrically and color neutral, and hence is a dark matter candidate;
3. except for the LBP, all particles are assumed to decay promptly;
4. only particles with masses within LHC reach are considered part of a specific proto-model.
In the current version of the algorithm, we allow proto-models to consist of up to 20 BSM particles as described in the next subsection.

Particle Spectrum
Unlike "full" models, the BSM particle content of a proto-model is not fixed. Since the number of degrees of freedom (including spin and multiplicity of states) for each particle is mostly relevant for the production cross section, which is treated as a free parameter, we do not specify the spin or multiplicity of the BSM particles. In the current version, we consider the following pool of 20 (SUSY inspired) particles: • Light quark partners X q (q = u, d, c, s): we allow for a single partner for each light-flavor quark. Unlike SUSY models, we do not consider two independent particles (left and right-handed squarks) for each flavor. As mentioned above, a possible multiplicity of (degenerate) states is accounted for by rescaling the production cross section.
• Heavy quark partners X i b , X i t (i = 1, 2): unlike the light quark partners, we consider two independent particles for each flavor. Since these particles have so extensively been searched for at the LHC, we include two states in order for the proto-models to have enough degrees of freedom to accommodate the data.
• Gluon partner X g : we introduce one new color-octet particle, analogous to a gluino in SUSY.
• Electroweak partners X i W , X j Z (i = 1, 2; j = 1, 2, 3): we allow for two electrically charged and three neutral states. These might correspond to charginos and neutralinos in the MSSM (with the neutral higgsinos being exactly massdegenerate), or to the scalars of an extended Higgs sector with a new conserved parity. The lightest neutral state (X 1 Z ) is assumed to be the LBP.
• Charged lepton partners X ( = e, µ, τ ): like for light-flavor quarks, we consider a single partner for each lepton flavor.
As mentioned above, the lightest BSM particle is required to be the X_Z^1, hence all masses must satisfy m(X) ≥ m(X_Z^1). In a given proto-model only particle masses below 2.4 TeV are considered, so that they are within LHC reach. States which appear in two- or three-fold multiplicities are mass ordered, e.g., m(X_t^2) > m(X_t^1). Some additional requirements are necessary for the masses of the colored new particles, to avoid that our machinery "discovers" light states in regions which are poorly covered in the database. Concretely, for the X_g and X_q we disallow masses below 310 GeV, since most of the simplified-model results we employ do not consider masses below this value. Also, for X_t masses below 280 GeV, we disallow the region where the X_t-X_Z^1 mass difference is close to the top-quark mass. The reason is that many CMS analyses do not make any statements for this "corridor" region, as events in there become too similar to tt events, see e.g. Ref. [15].
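These kinematic requirements can be summarized in a small validity check like the sketch below. The particle labels, the dictionary representation, and in particular the ±25 GeV corridor window around the top-quark mass are illustrative choices of ours; only the qualitative corridor condition is specified above.

```python
M_TOP = 173.0  # GeV

def spectrum_is_allowed(masses, corridor_width=25.0):
    """Check the proto-model mass requirements described above (schematic).

    masses : dict of particle name -> mass in GeV, e.g.
             {"XZ1": 163.0, "Xt1": 1166.0, "Xd": 735.0}
    """
    m_lbp = masses["XZ1"]
    for name, m in masses.items():
        if m < m_lbp or m > 2400.0:          # LBP must be lightest; stay within LHC reach
            return False
        if name in ("Xg", "Xu", "Xd", "Xc", "Xs") and m < 310.0:
            return False                     # "mass wall" for gluon and light-quark partners
        if name.startswith("Xt") and m < 280.0:
            # illustrative "corridor" veto: mass gap to the LBP close to the top mass
            if abs(m - m_lbp - M_TOP) < corridor_width:
                return False
    return True

# example usage with the spectrum of the highest-score proto-model quoted in Section 5
print(spectrum_is_allowed({"XZ1": 163.0, "Xt1": 1166.0, "Xd": 735.0}))  # -> True
```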

Decay Channels and Branching Ratios
The decay modes of each new particle must of course be consistent with its quantum numbers. At present, we restrict the BSM particle decays to the channels shown in Table 1. Note that not all possibilities are considered in this work. In particular, X_W and X_Z decays into lepton partners plus a lepton or neutrino, as well as decays of the type X_{W,Z} + γ, are not yet taken into account.
Specific decay modes are turned off if one of the daughter particles is not present in the proto-model, or if the decay is kinematically forbidden. The branching ratios (BRs) of the allowed channels are taken as free parameters, which however must add up to unity.

Production Cross Sections
In addition to the masses and BRs of the BSM particles, their production cross sections are also allowed to vary freely. However, in order to have sensible starting values, the cross sections are first computed assuming the BSM particles to be MSSM-like, and then allowed to be rescaled freely by signal strength multipliers κ.
For instance, the pair production of X_g is given by σ(pp → X_g X_g) = κ_{X_g X_g} · σ(pp → g̃g̃), with the mass of the gluino g̃ set to the X_g mass. The rescaling factors κ_{X_i X_j} are taken as free parameters of the proto-model.
In practice, we use a template SLHA input file for the MSSM to define the masses and BRs of the proto-model (with q̃_R, χ̃_4^0 and the heavy Higgses decoupled); the reference SUSY cross sections are then computed with Pythia 8.2 [16] and NLLfast 3.1 [17-23]. We note that this way cross sections which are naturally suppressed or theoretically not allowed (such as pp → X_e^- X_e^-) are automatically neglected. While this adds a certain bias, as processes which do not occur in the SUSY case (like the production of two different top partners, pp → X_t^1 X_t^2) will also be absent in the proto-model, this is not a problem, because there are currently no simplified-model results available for such cases.

LHC Results
In order to quickly confront individual proto-models with a large number of LHC results, we make use of the SModelS software framework [14, 24-29] and its database of simplified-model results. Key to this is the construction of an approximate likelihood for the signal, L_BSM(µ, θ) = P(D|µ, θ) p(θ), which describes the plausibility of the data D, given a signal strength µ. Here, θ denotes the nuisance parameters describing systematic uncertainties in the signal (s) and background (b), while p(θ) corresponds to their prior. 2 In the following two subsections, we explain the two main steps to arrive at L_BSM: computing the individual likelihoods for each applicable analysis, and combining the individual likelihoods into a global one.
2 Note that µ is a global signal strength; the signal cross sections are therefore given by µ times the cross sections predicted by the proto-model.

Likelihoods and Constraints from Individual Analyses
The extent to which likelihoods can be computed crucially depends on the information available from the experimental collaboration.
1. If only observed upper limits (ULs) are available, the likelihood becomes a constraint in the form of a step function at the observed 95% CL exclusion limit. This is in fact not useful for constructing L_BSM per se, but will be used to determine the maximal allowed signal strength µ_max (see Section 3.2 below).
2. If the expected ULs are available in addition to the observed ones, following [30] we approximate the likelihood as a truncated Gaussian,
L(µ) = c/(√(2π) σ_obs) exp[−(µ σ_ref − σ_max)²/(2 σ_obs²)] for µ ≥ 0, and L(µ) = 0 otherwise.
Here, the likelihood is a function of the signal strength multiplier µ for any reference cross section σ_ref (in our case the cross section predicted for the given proto-model), while σ_obs is approximated with the standard deviation of the expected Gaussian likelihood, σ_obs ≈ σ_exp = σ^UL_exp/1.96, with σ^UL_exp the expected 95% CL upper limit on the signal cross section. In addition, σ_max is chosen such that the approximate truncated Gaussian likelihood correctly reproduces the 95% CL observed limit on the production cross section, σ^UL_obs. Finally, c is a normalization constant, c ≥ 1, which ensures that the integral of the likelihood over the physical region is unity.
However, we cannot expect this approximation to hold for large excesses. Therefore, if the observed UL differs from the expected one by more than two standard deviations, we "cap" the likelihood by artificially setting the observed UL to σ^UL_obs = σ^UL_exp + 2σ_exp. With this procedure we avoid overly optimistic interpretations of excesses. Again, this is a crude approximation, which could be avoided with EM results, see below.
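For illustration, the following Python sketch evaluates such a truncated-Gaussian likelihood from a pair of observed and expected upper limits. The function name and the root-finding used to fix σ_max are our own choices and do not reproduce the actual SModelS implementation; the built-in normalization of the truncated distribution plays the role of the constant c.

```python
import numpy as np
from scipy import stats, optimize

def ul_likelihood(mu, sigma_ref, ul_obs, ul_exp):
    """Truncated-Gaussian likelihood L(mu) built from observed/expected upper limits.

    mu        : signal strength multiplier
    sigma_ref : reference (predicted) signal cross section [fb]
    ul_obs    : observed 95% CL upper limit on the cross section [fb]
    ul_exp    : expected 95% CL upper limit on the cross section [fb]
    """
    sigma_exp = ul_exp / 1.96                # width of the assumed Gaussian

    def cl95_mismatch(sigma_max):
        # truncated Gaussian in the signal cross section, cut off at zero
        a = -sigma_max / sigma_exp
        dist = stats.truncnorm(a, np.inf, loc=sigma_max, scale=sigma_exp)
        return dist.cdf(ul_obs) - 0.95       # zero when the observed limit is reproduced

    # choose the central value sigma_max such that the 95% quantile equals ul_obs
    sigma_max = optimize.brentq(cl95_mismatch, -20.0 * ul_exp, ul_obs)
    a = -sigma_max / sigma_exp
    dist = stats.truncnorm(a, np.inf, loc=sigma_max, scale=sigma_exp)
    return dist.pdf(mu * sigma_ref)          # likelihood of the cross section mu * sigma_ref

# example with hypothetical numbers: prediction of 10 fb, observed (expected) UL of 20 (15) fb
print(ul_likelihood(1.0, 10.0, 20.0, 15.0))
```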
3. EM results contain more information to construct a proper likelihood. In the absence of a full statistical model (see point 4 below), we assume p(θ) to follow a Gaussian distribution centered around zero and with a variance of δ², whereas P(D) corresponds to a counting variable and is thus properly described by a Poissonian. The likelihood is then given by
L(µ, θ) = Pois(n_obs | µs + b + θ) × (1/√(2π) δ) exp[−θ²/(2δ²)],
where n_obs is the number of observed events in the signal region under consideration, s is the number of signal events predicted for µ = 1, b is the number of expected background events, and δ² = δ_b² + δ_s² is the combined signal+background uncertainty. The nuisance parameter θ can be profiled or marginalized over; our default is profiling. This is often referred to as a simplified likelihood [31-33]. While n_obs, b and δ_b are directly taken from the experimental publications, we assume a default 20% systematic uncertainty on the signal. See [24] for more details.
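A minimal rendering of this simplified likelihood, with the nuisance parameter treated numerically, could look as follows; the function and argument names are illustrative and do not correspond to the SModelS API.

```python
import numpy as np
from scipy import stats, optimize
from scipy.integrate import quad

def simplified_llhd(mu, n_obs, n_signal, bg, delta_b, delta_s_rel=0.20, profile=True):
    """Poisson x Gaussian-nuisance likelihood for a single signal region.

    n_obs       : observed number of events
    n_signal    : predicted number of signal events for mu = 1
    bg          : expected background
    delta_b     : absolute uncertainty on the background
    delta_s_rel : relative systematic uncertainty on the signal (20% default)
    """
    s = mu * n_signal
    delta = np.sqrt(delta_b**2 + (delta_s_rel * s)**2)   # combined uncertainty

    def llhd_of_theta(theta):
        lam = max(s + bg + theta, 1e-9)                   # expected event count
        return stats.poisson.pmf(n_obs, lam) * stats.norm.pdf(theta, 0.0, delta)

    if profile:  # maximize over the nuisance parameter theta
        res = optimize.minimize_scalar(lambda t: -llhd_of_theta(t),
                                       bounds=(-5 * delta, 5 * delta), method="bounded")
        return llhd_of_theta(res.x)
    # otherwise marginalize (integrate) over theta
    return quad(llhd_of_theta, -5 * delta, 5 * delta)[0]
```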
Generally, in each analysis we use only the signal region with the highest σ^UL_obs/σ^UL_exp ratio. A large ratio indicates that an excess has been observed and allows us to identify potential dispersed signals. Although it is not a given that this is the best criterion when combining results from distinct analyses, it allows us to drastically reduce the number of possible signal region combinations across analyses.
4. The best case is to have EM results together with a statistical model, which describes the correlations of uncertainties across signal regions [10,11]. In this context, CMS sometimes provides covariance matrices, which allow for the combination of signal regions as discussed in [32]. (N.b., this is still a simplified likelihood assuming Gaussian uncertainties.) So far, however, a covariance matrix together with simplified-model EMs is available for only one analysis [34].
ATLAS, on the other hand, has recently started to publish full likelihoods using a plain-text (JSON) serialization [35] of the likelihood, which describes statistical and systematic uncertainties, and their correlations across signal regions, at the same fidelity as used in the experiment. 3 At present, EM results with JSON likelihoods are available in the SModelS database for three ATLAS SUSY analyses [38-40] for the full Run 2 luminosity (139 fb^-1), leading to clear improvements in the statistical evaluation for these analyses [29]. 4

Building a Global Likelihood

Combining the information from several analyses into a global likelihood requires knowing which analyses can be treated as (approximately) statistically independent. In this work, we treat such correlations in a binary way: a given pair of analyses is either considered to be approximately uncorrelated, in which case it may be combined with others, or correlated, in which case at most one of the two may enter a given combination. These assumptions are summarized in the analysis correlation matrices shown in Fig. 2. We understand that, although well motivated, this is but an educated guess from our side. Future runs of our algorithm will be based on more reliable estimates of the results' mutual statistical independence. Indeed, work on systematically constructing such analysis correlation matrices is underway, see [44] and contribution 16 in [45]. However, while a good number of analyses can be combined in our approach, we also see from the white bins in Fig. 2 that about half (one third) of the 13 (8) TeV results are observed ULs only, and thus no proper likelihood can be computed for them. Nonetheless, under the above assumptions it is possible to construct an approximate combined likelihood for subsets of LHC results. We will refer to each of these subsets as a combination of results. Each combination must satisfy:
• any pair of results in the subset must be considered as uncorrelated, and
• any result which is allowed to be added to the combination is in fact added.
Likelihoods from uncorrelated analyses can simply be multiplied, L_BSM(µ) = ∏_{i=1}^{n} L_i(µ), where the product is over all n uncorrelated analyses in the combination and µ is the global signal strength. Information from all the other analyses, which are not included in the combination, is accounted for as a constraint on the global signal strength µ.
Given the upper limit on the signal cross section obtained from the most sensitive analysis/signal region, we compute an upper limit on µ as µ_max = 1.3 × σ^UL_obs/σ, where σ is the relevant signal cross section and the factor 1.3 allows for a 30% violation of the 95% CL observed limit. We accept such mild transgressions in order to account for the fact that, when simultaneously checking limits from a large number of analyses, a few are statistically allowed to be violated. Imposing the above limit on the global signal strength µ corresponds to truncating the likelihood at µ_max and ensures that constraints from correlated analyses, as well as analyses for which no proper likelihood can be computed, are approximately taken into account.
The above procedure is illustrated in Fig. 3. In this case we truncate the likelihood for "Result A" at the µ_max obtained from "Result B". Hence, the value of µ̂ for the truncated likelihood shifts slightly with respect to the full result. In this work we apply this approximation whenever it is not possible to construct a full likelihood.
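To make the combination step concrete, the sketch below multiplies the likelihoods of the results in a combination and truncates at µ_max following the 1.3 × σ^UL_obs/σ prescription above. The data structures and function names are illustrative, not the actual SModelS/protomodels code.

```python
import numpy as np

def mu_max(sigma_signal, ul_obs_most_sensitive):
    """Upper limit on the global signal strength from the most sensitive result."""
    return 1.3 * ul_obs_most_sensitive / sigma_signal

def combined_llhd(mu, likelihoods, mu_max_value):
    """Product of likelihoods of uncorrelated results, truncated at mu_max.

    likelihoods : list of callables L_i(mu), one per result in the combination
    """
    if mu < 0.0 or mu > mu_max_value:
        return 0.0                       # outside the allowed signal-strength range
    return float(np.prod([L(mu) for L in likelihoods]))

def mu_hat(likelihoods, mu_max_value, n_grid=500):
    """Signal strength maximizing the combined likelihood (simple grid scan)."""
    grid = np.linspace(0.0, mu_max_value, n_grid)
    vals = [combined_llhd(m, likelihoods, mu_max_value) for m in grid]
    return grid[int(np.argmax(vals))]
```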

The Walker
Let us now turn to the algorithm for performing an MCMC-type walk in the proto-model's parameter space in order to identify the models that best fit the data. This algorithm, which we dub the walker, is composed of several building blocks, or "machines", that interact with each other in a well-defined fashion:
1. Starting with the Standard Model, a builder creates proto-models, randomly adding or removing particles and changing any of the proto-model parameters (see Appendix A.1 for details).
2. A critic confronts each proto-model with the applicable results in the database and determines the maximal allowed signal strength µ_max (see Section 3.2).
3. A combiner constructs the possible combinations of (approximately) uncorrelated results described in Section 3.2; the combination yielding the highest test statistic is referred to as the best combination.
4. Using the combinations provided by the combiner, the walker computes a test statistic K for the proto-model (see Section 4.1 below). If the K value is higher than the one obtained in the last step, the new proto-model is kept. If it is lower than the previous K, the step is reverted with a certain probability, so that worse-performing models are occasionally retained, in the spirit of MCMC sampling.
By many iterations of the above procedure, the walker is able to identify proto-models which evade all simplified-model limits and at the same time can explain dispersed signals in the data. This will be illustrated by means of a toy walk in Section 4.2 and by walking over the full SModelS database in Section 5.
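A bare-bones version of this accept/revert loop is sketched below. The `compute_K` and `randomly_modify` callables stand for the likelihood machinery and the builder described above, and the Metropolis-like acceptance rule is our own placeholder, since the exact revert probability is not spelled out here.

```python
import copy
import math
import random

def walk(protomodel, compute_K, randomly_modify, n_steps=1000):
    """Schematic accept/revert loop of the walker.

    compute_K       : callable returning the test statistic K of a proto-model
    randomly_modify : callable applying one random change (the 'builder')
    """
    K_old = compute_K(protomodel)
    best_K, best_model = K_old, copy.deepcopy(protomodel)
    for _ in range(n_steps):
        candidate = randomly_modify(copy.deepcopy(protomodel))
        K_new = compute_K(candidate)
        # keep improvements; otherwise keep the worse model only occasionally
        # (placeholder Metropolis-like rule, not the paper's exact prescription)
        if K_new >= K_old or random.random() < math.exp(K_new - K_old):
            protomodel, K_old = candidate, K_new
        if K_old > best_K:
            best_K, best_model = K_old, copy.deepcopy(protomodel)
    return best_K, best_model
```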

Computing a Proto-model's Test Statistic
As mentioned above, during the MCMC-type walk, the algorithm aims to maximize the proto-model test statistic K. Naturally, K must be designed such that it increases for models which better satisfy all the constraints (which includes better fitting potential dispersed signals). Furthermore, it is desirable to reduce the test statistic of models with too many degrees of freedom in order to enforce the law of parsimony, that is, Occam's razor [46]. Given a proto-model, we define for each combination of results c ∈ C the auxiliary quantity
K_c = 2 ln [ L^c_BSM(µ̂) π(BSM) / ( L^c_SM π(SM) ) ].
Here L^c_BSM is the likelihood for a combination c of experimental results given the proto-model, evaluated at the signal strength value µ̂, which maximizes the likelihood and satisfies 0 ≤ µ̂ < µ_max. L^c_SM is the corresponding SM likelihood, given by L^c_BSM(µ = 0). Finally, π(SM) and π(BSM) denote respectively the priors for the SM and the proto-model. The total set of combinations of results, C, is determined as explained in Section 3. We shall use this auxiliary quantity to define the test statistic K for a proto-model simply as K := max_{c∈C} K_c. Some explanations are in order regarding the choice of the prior. Since π(SM) is a common factor for all combinations and does not affect the comparison between distinct proto-models, we wish to define the prior such that π(SM) := 1. (8)
Moreover, as mentioned above, the proto-model prior π(BSM) should penalize the test statistic for newly introduced particles, branching ratios, or signal strength multipliers. We therefore heuristically choose the prior to be of the form
π(BSM) = exp[ −( n_particles/a_1 + n_BRs/a_2 + n_SSMs/a_3 ) ], (9)
where n_particles is the number of new particles present in the proto-model, n_BRs is the number of non-trivial branching ratios 6, and n_SSMs the number of signal strength multipliers. 7 In order to favor democratic decays, if two decay channels for a given particle have BRs differing by less than 5%, they are counted only once. The parameters a_1, a_2, and a_3 are chosen to be 2, 4, and 8, respectively. This way, one particle with one non-trivial decay and two production modes is equivalent to one free parameter in the Akaike Information Criterion (AIC) [47]. For the SM, n_particles = n_BRs = n_SSMs = 0 and the prior in Eq. (9) satisfies the normalization set by Eq. (8). Note, however, that it is not normalized in the space of all proto-models.
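Putting the prior and the likelihood ratio together, the test statistic of a proto-model can be coded up in a few lines, as in the sketch below; the likelihood values and parameter counts are assumed to come from the machinery of Section 3, and the function names are ours.

```python
import math

def prior_bsm(n_particles, n_brs, n_ssms, a1=2.0, a2=4.0, a3=8.0):
    """AIC-motivated complexity penalty, normalized such that pi(SM) = 1."""
    return math.exp(-(n_particles / a1 + n_brs / a2 + n_ssms / a3))

def K_of_combination(L_bsm_muhat, L_sm, n_particles, n_brs, n_ssms):
    """K_c = 2 ln [ L_BSM(muhat) * pi(BSM) / ( L_SM * pi(SM) ) ] for one combination."""
    return 2.0 * math.log(L_bsm_muhat * prior_bsm(n_particles, n_brs, n_ssms) / L_sm)

def K_protomodel(combinations):
    """Test statistic of the proto-model: maximum over all allowed combinations.

    combinations : list of (L_bsm_muhat, L_sm, n_particles, n_brs, n_ssms) tuples
    """
    return max(K_of_combination(*c) for c in combinations)
```

With the values above, one new particle with one non-trivial BR and two signal strength multipliers reduces K by 2, i.e. exactly the AIC cost of one free parameter, in line with the statement above.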
In Table 2 we illustrate how the prior affects the model test statistic K for a few choices of parameters. It should be noted that, with our choice of prior, K ceases to have a clear, probabilistic interpretation. K can even become negative.
We interpret such an outcome as a preference of the SM over the BSM model.

Toy Walk
In order to illustrate the walker algorithm, we first apply it to an extremely reduced version of the database with only three 13 TeV analyses: the ATLAS search for stops in the 1-lepton channel (ATLAS-SUSY-2016-16) [48], the CMS search for hadronically decaying tops (CMS-SUS-16-050) [49], and the generic CMS search in the multi-jet plus E_T^miss channel (CMS-SUS-19-006) [15]. All three present results for simplified models with 2 and 4 tops or b-jets in the final state. 8 As discussed previously, the aim of the algorithm is to find proto-models which can fit dispersed signals and at the same time satisfy the other constraints. In this example, the potential dispersed signals appear in the ATLAS and CMS stop searches, where ∼1-2σ excesses have been observed in some signal regions. The third analysis (CMS-SUS-19-006), on the other hand, has seen an under-fluctuation in the data and will play the role of the constraining result, or critic. Indeed, since the two CMS searches included cannot be considered as uncorrelated (see Fig. 2), they are not allowed to be combined into a single likelihood. The ATLAS analysis, however, can be combined with either one of the CMS searches, and since the small excesses are found in ATLAS-SUSY-2016-16 and CMS-SUS-16-050, these will correspond to the best combination discussed at the beginning of Section 4.
In Fig. 4 we show the evolution of the walker over 200 steps. In the first steps, the candidate proto-models do not improve over the SM; the corresponding K values are negative and hence are set to zero. Around step 20, a proto-model with a top partner (X_t^1) is created, thus resulting in an increase in the test statistic. The following steps modify the X_Z^1 + X_t^1 model by randomly changing its parameters and adding or removing particles. Since the addition of new states only reduces the model test statistic due to the prior (see Eq. (9)), the walker usually reverts back to the minimal scenario with one X_Z and one X_t. We also see that for a few steps, such as step 70, the top partner is replaced by another state, resulting in a drastic reduction in K. After 200 steps, the highest scoring proto-model is that of step 94 and only contains the X_Z^1 and X_t^1 with masses of roughly 160 GeV and 1.1 TeV, respectively. The behavior of the likelihoods at a few relevant steps is illustrated in Fig. 5.
Figure 4: Evolution of the walker during 200 steps for the toy model of Section 4.2. Shown are the particle content and masses every 10 steps. The scaling of each point is proportional to the proto-model's test statistic K at the corresponding step, so bigger points represent larger K values.
The left panel of Fig. 5 shows a proto-model containing a X_µ, X_t^1 and X_Z^1 with masses around 1.8 TeV, 1 TeV and 160 GeV, respectively. Since only the top partner contributes to fitting the excesses in the best combination, X_µ adds irrelevant degrees of freedom, thus reducing the proto-model score due to the penalty from the prior. In the following step, shown in the center panel, X_µ is removed, without impact on the likelihood ratio, as expected. Nonetheless, due to the change in prior, the test statistic K increases by about one unit. The next step, shown in the right panel, modifies the X_t mass, which gives a better fit to the excess and an increase of the likelihood ratio. With K = 3.65 this turns out to be the highest-score model of the toy walk.
Although better results are likely to be found with a larger number of steps, this example illustrates how the algorithm walks in the proto-model parameter space to find models consistent with dispersed signals and the constraints. In the following section, we will present more meaningful results, found using the same algorithm but for the full SModelS database and with a much larger number of steps.

Results using the full SModelS Database
In this section we present the results of running the walker algorithm over the full SModelS database.

Walking over "fake" Standard Model Data
In order to compute the significance of the test statistic and, ultimately, its global p-value, we produce "MC toy" versions of the database, in which we replace the observed data by values sampled from the SM background models. More specifically,
• for EM-type results we sample a normal distribution with the background estimate as mean and the squared error of the background estimate as variance. The sampled value is then used as the lambda parameter of a Poissonian distribution. Finally, the value drawn from the Poissonian is used as the "fake" observation, i.e. the fake event yield;
• for UL-type results, for which expected upper limits are known to SModelS, we estimate the SM uncertainty on the expected limit assuming a large number of observed events and small systematic uncertainties [30]. The observed upper limits are then sampled from a Gaussian distribution centered on the expected limit, with width δσ^UL_exp = σ^UL_exp/1.96;
• UL-type results for which only observed upper limits are available enter the fake database as they are, since it is not possible to estimate the SM uncertainty in this case.
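This two-step sampling for EM-type results (Gaussian for the background expectation, then Poisson for the fake yield) and the Gaussian sampling of UL-type results can be written down in a few lines; the code below is a schematic rendering of the procedure, with illustrative function names.

```python
import numpy as np

rng = np.random.default_rng()

def fake_em_observation(bg, delta_b):
    """Fake event yield for an EM-type signal region under the SM hypothesis."""
    lam = max(rng.normal(bg, delta_b), 0.0)   # sampled background expectation
    return rng.poisson(lam)                   # fake observed number of events

def fake_ul_observation(ul_exp):
    """Fake observed upper limit for a UL-type result with a known expected limit."""
    delta_ul = ul_exp / 1.96                  # estimated SM width of the expected limit
    # (negative draws would have to be redrawn or clipped in practice)
    return rng.normal(ul_exp, delta_ul)
```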
Using the sampled values described above, we produce 50 fake databases and apply the walker algorithm to all of them. The proto-models with the highest K values from each of these runs are then used to estimate the density of the test statistic K under the SM hypothesis via a kernel density estimator,
ρ(K) = (1/N) Σ_{i=1}^{N} kern_w(K − K^i_fake). (11)
Here, N = 50 is the number of runs with fake databases, and K^i_fake the test statistic for the i-th run; kern_w denotes the choice of kernel in the kernel density estimation. We choose a Gaussian with a width w determined by Scott's rule [50]. Figure 6 shows the K^i_fake values generated for each run as well as the result obtained for ρ(K). As we can see, under the SM hypothesis, values up to K ≈ 11 are expected, with the density peaking at K ≈ 4.
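The density estimate itself can be obtained, for instance, with scipy's Gaussian kernel density estimator, which uses Scott's rule for the bandwidth by default; the input file name below is hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

# one highest-K value per fake-database run (hypothetical input file)
K_fake = np.loadtxt("K_fake_runs.txt")

# Gaussian kernel density estimate; Scott's rule sets the bandwidth w
rho = gaussian_kde(K_fake, bw_method="scott")

K_grid = np.linspace(0.0, K_fake.max() + 2.0, 200)
density = rho(K_grid)     # estimated density of K under the SM hypothesis
```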
As discussed in Section 4, the walker algorithm looks for dispersed signals, with K computed as the maximum over all possible combinations of results. Therefore, large K values are expected to be found even when considering the SM hypothesis, where upward background fluctuations can mimic dispersed signals. In addition, it has been suggested [51] that the experimental results tend to be conservative and overestimate the background uncertainties.
We can strengthen this claim by considering the p-values computed for all signal regions of the EM results contained in the database. If the observed data is well described by the estimated backgrounds and the uncertainties thereon, one would expect a flat distribution of p-values. In fact, for the fake databases generated under the SM hypothesis, this is the case, as shown by the red histogram in Fig. 7. When we consider the real data, however, the distribution is not flat, as seen from the blue histogram in the same figure. This is expected if the background uncertainties are overestimated. As a consequence, the density ρ(K) of the SM hypothesis in Fig. 6 is conservative; we can safely assume that the actual density would be shifted towards lower values of K and that our results generally lie on the conservative side.

Walking over the LHC Data
We are now ready to apply the walker algorithm to the actual SModelS database with the real LHC results. We perform 10 runs, each employing 50 walkers and 1,000 steps/walker. The results are summarized in Fig. 8, where we display the proto-models with the highest K value from each run. Besides the X_Z^1 LBP, all models include one top partner, X_t^1, and one light-flavor quark partner, X_d or X_c, and their test statistics are at K = 6.76 ± 0.08, thus showing the stability of the algorithm. The X_µ particle introduced in run 5 is due to small ≈ 1σ excesses in the data.
Figure 7: Distribution of p-values of the signal regions of EM-type results in the database. The red histogram shows the distribution for a few "fake" databases (but normalized to a single database), where the observed data has been obtained by sampling the background expectations as explained in the text. The solid blue histogram displays the same distribution, but for the real data. In order to avoid distortions due to small numbers of events, only signal regions with at least 3.5 expected events were included.
It is instructive to discuss the absolute "winning" proto-model, i.e. the one with K = 6.90 generated in step 582 of the 29th walker in run #9, in some more detail. It has X_t^1, X_d and X_Z^1 masses of 1166, 735 and 163 GeV, respectively, and produces signals in the tt + E_T^miss and jets + E_T^miss final states with SUSY-like cross sections.
Concretely, the effective signal strength multipliers are µ̂ × κ_{X_t^1 X_t^1} ≈ 1.2 and µ̂ × κ_{X_d X_d} ≈ 0.5, corresponding to σ(pp → X_t^1 X_t^1) ≈ 2.6 fb and σ(pp → X_d X_d) ≈ 24 fb at √s = 13 TeV, and both the X_t^1 and X_d directly decay to the lightest state (X_Z^1) with 100% BR. The analyses which contain possible dispersed signals and drive this model (as well as the other ones in Fig. 8) include, in particular, the stop searches [48,49], which lead to the introduction of a top partner (X_t^1) with a mass around 1.2 TeV. Despite corresponding to small excesses, identifying the presence of such potential dispersed signals is one of the main goals of the algorithm presented here. Furthermore, the fact that these excesses appear in distinct ATLAS and CMS analyses and can be explained by the introduction of a single top partner is another interesting outcome of the whole procedure.
It is worth noting that the signal injected by the winning proto-model is typically smaller than the one favored by the excesses, as seen when comparing the Obs and the Signal columns in Table 3. This is due to a tension with the constraints imposed by the other results in the database; the most constraining ones are listed in Table 4.
Table 4: List of the most constraining results for the highest-score proto-model. The second column displays the constrained production mode, while the third column shows the respective cross section value. We also show the corresponding observed and expected upper limits and the ratio r_obs = σ_XX/σ^UL_obs.
Some comments are in order on the significance of the excesses and K values found for the models in Fig. 8. In Fig. 9 we compare the density ρ(K) obtained under the SM-only hypothesis (see the discussion in Section 5.1) with the K values for the 10 highest-scoring proto-models obtained with the "real" database (green stars). As we can see, despite being at the tail of the distribution, the values still seem compatible with the SM-only hypothesis. In order to quantify this compatibility, we compute the p-value given by the relative frequency of the K values under the SM hypothesis which were found above the observed K values (K_obs),
p = (1/N) Σ_{i=1}^{N} 1_{[K̄_obs, ∞)}(K^i_fake) ≈ ∫_{K̄_obs}^{∞} ρ(K) dK.
Here N = 50 is the number of fake SM-only runs, 1_X(x) denotes the indicator function that equals unity for all x ∈ X and zero otherwise, K̄_obs is the average of K_obs over the ten runs on the "real" database, and ρ(K) refers to the estimated density of the test statistic as defined in Eq. (11). The resulting value corresponds to the shaded area under the curve shown in Fig. 9. Using the K values in Fig. 8 we obtain a global p-value for the SM hypothesis of approximately 0.19. Our results thus indicate a very mild disagreement with the SM hypothesis. As already pointed out in Section 5.1, we expect this to be conservative due to potentially overestimated background uncertainties. Furthermore, this result requires no look-elsewhere correction, since the SM density ρ(K) was derived through the exact same procedure as applied for the real database.
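Numerically, the p-value is simply the fraction of fake-run K values above K̄_obs, or equivalently the tail integral of the estimated density; a minimal sketch, with illustrative names:

```python
import numpy as np
from scipy.stats import gaussian_kde

def global_p_value(K_fake, K_obs_runs):
    """Compatibility of the observed K values with the SM-only hypothesis.

    K_fake     : highest-K values from the fake (SM-only) database runs
    K_obs_runs : highest-K values from the runs over the real database
    """
    K_obs_mean = np.mean(K_obs_runs)                       # average observed K
    p_frequency = np.mean(np.asarray(K_fake) >= K_obs_mean)
    # equivalent estimate from the kernel density: tail integral above K_obs_mean
    rho = gaussian_kde(K_fake)
    p_density = rho.integrate_box_1d(K_obs_mean, np.inf)
    return p_frequency, p_density
```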
Complementary information on the distributions of the test statistic K and posterior densities can be found in appendix A.2.

Walking over "fake" Signals
As discussed above, the K values obtained for the real database are in a mild disagreement with the SM hypothesis. It is relevant, however, to investigate whether the procedure proposed in this work would be able to reproduce the underlying model. For this kind of closure test, we employ a scheme analogous to the one used to produce fake databases under the SM-only hypothesis in Section 5.1. This time, however, using the highest-score model from Fig. 8, we create fake observed data under the "BSM hypothesis". Concretely, we sample from the signal plus background model, assuming that the corresponding uncertainties are dominated by the errors on the background. However, when producing the fake "BSM hypothesis" databases, we scale the background uncertainties by a "fudge factor" of 0.65, aiming to produce fake data closely resembling those observed in the real database. This is motivated by the fact that, as discussed in Section 5.1, the observed data in the real database point to an overestimation of the background uncertainties. We generate 10 such "fake signal" databases and perform a run with each of them. The resulting distribution of the test statistics K is shown in Fig. 9. As can be seen, under the BSM hypothesis (the fake signals) K varies between K ≈ 5.8 and K ≈ 12.3. Figure 10 shows the particle spectra of the high-score models of these fake signal runs and compares them to the injected signal. We see that X_t^1 was reconstructed nicely at around the right mass scale in 8 out of 10 runs. The quark partner, X_q, was found in 9 out of 10 runs, though once at a significantly lower mass. Finally, runs 5, 6, and 9 introduced spurious lepton partners, all due to various background fluctuations created in the statistical sampling.
All in all, the differences between the injected and reconstructed signals shown in Fig. 10 are small, and the injected proto-model is recovered well, including the masses. The result from injecting the high-score model from Fig. 8 is shown in Fig. 11. It nicely confirms (i) the robustness of our procedure, and (ii) the conclusion that the small deviations in Fig. 10 are due to fluctuations generated by the statistical sampling.

Conclusions and Outlook
In view of the null results (so far) in the numerous channel-by-channel searches for new particles, it becomes increasingly relevant to change perspective and attempt a more global approach to find out where BSM physics may hide. To this end we presented in this paper a novel statistical learning algorithm that is capable of identifying potential dispersed signals in the slew of published LHC analyses. The best-performing proto-models found by the algorithm feature a top partner of about 1.2 TeV and a light-flavor quark partner of about 700 GeV together with a neutralino-like lightest particle X_Z^1 below about 400 GeV, while staying in agreement with constraints from other searches. The highest-score proto-model has a test statistic of K = 6.9, while the global p-value for the SM hypothesis is approximately 0.19. We stress that, while interesting per se, these results are intended mostly as a proof-of-principle. Indeed the current realisation of our proto-model builder is still limited by the variety and type of simplified-model results available. First, the mass planes of simplified-model results need to extend far enough to high and low masses to allow a good coverage of all types of new particles and coupling strengths. 11 Wider mass ranges than is current practice would be very useful in this respect.
Second, and perhaps more importantly, only EM-type results (i.e. A×ε maps) allow for the computation of a proper likelihood. Analyses for which only 95% CL limits on the signal cross sections are available force us to make crude approximations, which can significantly distort the statistical evaluation. So far, however, merely one third of the available simplified-model results are EMs. Third, as also stressed in [10], correlation data enabling the combination of signal regions are essential for avoiding either overly conservative or over-enthusiastic interpretations, and for stabilising global fits. 12 So far, however, appropriate correlation data in combination with EMs are available for only four analyses.
The first and second issues above can in principle be resolved to some extent by ourselves, through the development of "home-grown" EMs by means of simulation-based recasting. This comes however with the caveat that "home-grown" EMs will always be less precise than "official" ones from the experimental collaborations.
Moreover, information on background correlations can only be provided by the experiments. We therefore strongly encourage ATLAS and CMS to systematically 11 For example, fermionic partners have larger production cross sections than scalar ones; simplified model results developed for the scalar hypothesis (squarks, stops, etc.) therefore often do not extend to high enough masses to fully cover fermionic partners. 12 In this respect we also refer to Ref. [62], which found that the use of single best-expected signal regions was numerically unstable as well as statistically suboptimal. Furthermore, any approach forced to conservatively use single best-expected signal regions invalidates the interpretation of the profile log-likelihood-ratio via Wilks' Theorem, necessitating the uptake of approximate methods.
provide EM results together with correlation data, following the recommendations in [10]. The pyhf JSON likelihoods, as provided for some ATLAS analyses, are particularly useful: in addition to communicating the full statistical model, which allows for a much more accurate evaluation of the likelihood, they may also allow for a better estimate of cross-analysis correlations than currently possible.
Also on the technical side our current procedure, while good enough for a proof-of-principle, can be improved in several aspects. For example, in the current version an Akaike-type information criterion has been used to judge the quality of a proto-model. It will be interesting and relevant to try other information criteria, such as the Bayesian information criterion or the deviance information criterion, systematically comparing their performances [63]. Furthermore, we plan to work towards making the global, combined likelihood differentiable. This will allow for using gradient-based methods, which will be a major performance boost. Note here that pyhf also aims at full differentiability, which is another argument in favor of communicating full statistical models via this scheme. In this respect it will also be interesting to machine-learn the SModelS database, e.g. via neural networks or symbolic regression, as these methods automatically come with a gradient.
Regarding the statistical interpretation, future work will concern, for instance, a more quantitative assessment of the fitted proto-models, including estimations of posterior distributions for the relevant parameters. Information from LHC measurements may be added to the game in a Contur-like approach [64]. Evidently, the project will also profit from all developments of SModelS itself, since any generalization of the SModelS formalism allows for an ever larger space of proto-models.
Last but not least, going beyond the concept of proto-models, it will be highly interesting to investigate how the high score models generated by our algorithm can be mapped to effective BSM field theories or even to full UV complete theories. All in all, there is much exciting work to be done.
Code and data management: while not a published tool so far, the code of the proto-model builder is publicly available at https://github.com/SModelS/protomodels, together with the detailed data files of the results presented in this paper.

A.1 Building Proto-Models
As discussed in Section 2, proto-models are defined by their particle content, masses, decay modes and signal strength multipliers. In order to perform an MCMC-type walk over this parameter space, in each step of the walk the following random changes are made (a schematic code sketch of one such step is given at the end of this appendix section):
• Add a new particle: one of the BSM particles listed in Table 1 not yet present in the model can be randomly added. Once added, the mass of the new particle is drawn from a uniform distribution between the LBP mass and 2.4 TeV. The new particle is initialized with random branching ratios (for the corresponding decays listed in Table 1) and signal strength multipliers set to one. Adding a particle is programmed to occur more often for models with low test statistics and/or with a small number of particles.
• Remove an existing particle: one particle present in the model is randomly selected and removed. All the production cross sections and decays involving the removed particle are deleted and the remaining branching ratios are normalized, so they add up to 1. Removing a particle is set to occur more often for models with low test statistics and/or with a large number of particles.
• Change the mass of an existing particle: the mass of a randomly chosen particle is changed by an amount δ_m according to a uniform distribution whose exact interval depends on the test statistic and number of unfrozen particles in the model, with better performing models making smaller changes. This change is always performed if no other changes have been made in the proto-model in a given step.
• Change the branching ratios: the branching ratio of a randomly chosen particle is changed. This change can occur in three distinct ways: i) a random decay channel can have its BR set to 1 and all other channels are closed, ii) a random decay channel can be closed and iii) each decay channel can have its BR modified by a distinct random amount δ_BR drawn from a uniform distribution between −a and a, where a = 0.1/(number of open channels).
After any of these changes, the branching ratios are normalized to make sure they add up to unity.
• Change the signal strength multipliers: the signal strength multiplier (SSM) can be randomly changed in three ways: i) a specific production cross section for a randomly selected final state can have its SSM re-scaled by a value drawn from a Gaussian distribution centered around 1 and with width 0.1, ii) a random process can have its SSM set to zero, one, or to the SSM of another process and iii) all the processes involving a randomly chosen particle can have their SSMs re-scaled by a random number between 0.8 and 1.2.
In addition to the above changes, we also include the following changes in the proto-model at each step of the MCMC walk: • Check for particles below the mass wall or within the mass corridor: any changes that would violate the conditions on proto-models outlined at the end of Section 2.1 are reverted.
• Remove redundant particles: particles which do not contribute to a signal entering the likelihood for the best combination of results are removed.
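As a rough illustration of the kinds of moves listed above, a single random modification step could be sketched as follows. The proto-model is represented here by a plain dictionary, and the move probabilities and step sizes are purely illustrative; the actual builder ties them to the test statistic and particle content, as described above.

```python
import random

POOL = ["Xu", "Xd", "Xc", "Xs", "Xb1", "Xb2", "Xt1", "Xt2",
        "Xg", "XW1", "XW2", "XZ2", "XZ3", "Xe", "Xmu", "Xtau"]  # LBP "XZ1" always present
M_LBP, M_MAX = 160.0, 2400.0   # illustrative LBP mass and upper mass bound [GeV]

def random_step(model):
    """Apply one random change to a proto-model {particle: mass} (schematic)."""
    move = random.choice(["add", "remove", "mass"])
    if move == "add":
        candidates = [p for p in POOL if p not in model]
        if candidates:
            model[random.choice(candidates)] = random.uniform(M_LBP, M_MAX)
    elif move == "remove":
        others = [p for p in model if p != "XZ1"]
        if others:
            model.pop(random.choice(others))
    else:  # change the mass of a randomly chosen particle
        p = random.choice(list(model))
        model[p] = min(max(model[p] + random.uniform(-50.0, 50.0), M_LBP), M_MAX)
    return model

model = {"XZ1": M_LBP}          # start from the LBP only
for _ in range(10):
    model = random_step(model)
print(model)
```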

A.2 Distributions and Posteriors
Distributions of K
In order to illustrate the convergence of the walker algorithm, we show in Fig. 12 the behavior of the test statistic close to the highest-score proto-model (cf. run 9 in Fig. 8) as a function of the masses of the BSM particles (X_d, X_t and X_Z^1). As we can see, the proto-model generated by the walker is very close to the maximum of the 1D distributions, showing that the best score obtained during the walk has indeed converged to the (local) maximum. The same behavior is seen in Fig. 13, where we display the K distribution as a function of the signal strength multipliers. All the curves in Fig. 13 display a sharp cutoff in µ̂κ at large κ values. This behavior is due to the limit imposed on the signal strength by the critic. Hence, as κ increases, µ̂ must decrease in order to satisfy the constraints on the total signal cross section, which is proportional to µ̂κ. This explicitly shows the tension between the critic, which limits µ, and the proto-model builder, which would tend to increase κ in order to maximize K.

Posteriors of Particle Masses
Another interesting question is that of the mutual compatibility of the excesses.
To address this question, we show in Fig. 14 the Bayesian credible regions (BCRs) of the particle masses obtained from the individual results entering the best combination. The combined BCR is shown by the solid black curve. As we can see, the dependence on the masses is rather flat, making all the BCRs compatible with each other.
We also see that the preferred points are at the edge of the region allowed by the critic, which once more displays the tension between the critic and the excesses. We remind the reader that, when only observed ULs are available, no proper likelihood can be constructed; instead, only a step function at the 95% CL limit can be used.