A strategy for a general search for new phenomena using data-derived signal regions and its application within the ATLAS experiment

This paper describes a strategy for a general search used by the ATLAS Collaboration to find potential indications of new physics. Events are classified according to their final state into many event classes. For each event class an automated search algorithm tests whether the data are compatible with the Monte Carlo simulated expectation in several distributions sensitive to the effects of new physics. The significance of a deviation is quantified using pseudo-experiments. A data selection with a significant deviation defines a signal region for a dedicated follow-up analysis with an improved background expectation. The analysis of the data-derived signal regions on a new dataset allows a statistical interpretation without the large look-elsewhere effect. The sensitivity of the approach is discussed using Standard Model processes and benchmark signals of new physics. As an example, results are shown for 3.2 fb$^{-1}$ of proton–proton collision data at a centre-of-mass energy of 13 TeV collected with the ATLAS detector at the LHC in 2015, in which more than 700 event classes and more than $10^5$ regions have been analysed.
No significant deviations are found and consequently no data-derived signal regions for a follow-up analysis have been defined.


Introduction
Direct searches for unknown particles and interactions are one of the primary objectives of the physics programme at the Large Hadron Collider (LHC). The ATLAS experiment at the LHC has thoroughly analysed the Run 1 pp collision dataset (recorded in 2010-2012) and roughly a quarter of the expected Run 2 dataset (2015-2018). No evidence of physics beyond the Standard Model (SM) has been found in any of the searches performed so far. Searches that have been performed to date do not fully cover the enormous parameter space of masses, cross-sections and decay channels of possible new particles. Signals might be hidden in kinematic regimes and final states that have remained unexplored. This motivates a model-independent analysis to search for physics beyond the Standard Model (BSM) in a structured, global and automated way, where many of the final states not yet covered can be probed.
General searches without an explicit BSM signal assumption have been performed by the DØ Collaboration [1-4] at the Tevatron, by the H1 Collaboration [5,6] at HERA, and by the CDF Collaboration [7,8] at the Tevatron. At the LHC, preliminary versions of such searches have been performed by the ATLAS Collaboration at √s = 7, 8 and 13 TeV, and by the CMS Collaboration at √s = 7 and 8 TeV. This paper outlines a strategy employed by the ATLAS Collaboration to search in a systematic and (quasi-)model-independent way for deviations of the data from the SM prediction. This approach assumes only generic features of the potential BSM signals. Signal events are expected to have reconstructed objects with relatively large momentum transverse to the beam axis. The main objective of this strategy is not to finally assess the exact level of significance of a deviation with all available data, but rather to identify with a first dataset those phase-space regions where significant deviations of the data from the SM prediction are present for a further dedicated analysis. The observation of one or more significant deviations in some phase-space region(s) serves as a trigger to perform dedicated and model-dependent analyses where these 'data-derived' phase-space region(s) can be used as signal regions. Such an analysis can then determine the level of significance using a second dataset. The main advantage of this procedure is that it allows a large number of phase-space regions to be tested with the available resources, thereby minimizing the possibility of missing a signal for new physics, while simultaneously maintaining a low false discovery rate by testing the data-derived signal region(s) on an independent dataset in a dedicated analysis. The dedicated analysis with data-derived signal regions also allows an improved background prediction.
In this approach, events are first classified into different (exclusive) categories, labelled with the multiplicity of final-state objects (e.g. muons, electrons, jets, missing transverse momentum, etc.) in an event. These final-state categories are then automatically analysed for deviations of the data from the SM prediction in several BSM-sensitive distributions using an algorithm that locates the region of largest excess or deficit. Sensitivity tests for specific signal models are performed to demonstrate the effectiveness of this approach. The methodology has been applied to a subset of the √s = 13 TeV proton-proton collision data as reported in this paper. The data were collected with the ATLAS detector in 2015, and correspond to an integrated luminosity of 3.2 fb$^{-1}$.
The paper is organized as follows: the general analysis strategy is outlined in Sect. 2, while Sect. 3 provides specific details about its application to the ATLAS 2015 pp collision dataset. Conclusions are given in Sect. 4.

Strategy
The analysis strategy assumes that a signal of unknown origin can be revealed as a statistically significant deviation of the event counts in the data from the expectation in a specific data selection. A data selection can be any set of requirements on objects or variables needed to define a signal region (e.g. an event class or a specific range in one or multiple observables). In order to search for these signals a large variety of data selections needs to be tested. This requires a high degree of automation and a categorization of the data events according to their main features. The main objective of this analysis is to identify selections for which the data deviate significantly from the SM expectation. These selections can then be applied as data-derived signal regions in a dedicated analysis to determine the level of significance using a new dataset. This has the advantage of a more reliable background expectation, which should allow an increase in signal sensitivity compared to a strategy that only relies on Monte Carlo expectations with a typically conservative evaluation of uncertainties. The strategy is divided into the seven steps described below.

Step 1: Data selection and Monte Carlo simulation
The recorded data are reconstructed via the ATLAS software chain. Events are selected by applying event-quality and trigger criteria, and are classified according to the type and multiplicity of reconstructed objects with high transverse momentum (p T). Objects that can be considered in the classification are those typically used to characterize hadron collisions such as electrons, muons, τ-leptons, photons, jets, b-tagged jets and missing transverse momentum. More complex objects, which were not implemented in the example described in Sect. 3, could also be considered. Examples are resonances reconstructed by a specific decay (e.g. Z or Higgs bosons decaying into two or four isolated leptons respectively, or decaying hadronically and giving rise to large radius jets with substructure) and displaced vertices. Event classes (or channels) are then defined as the set of events with a given number of reconstructed objects for each type, e.g. two muons and a jet.
Monte Carlo (MC) simulations are used to estimate the expected event counts from SM processes. To allow the investigation of signal regions with a low number of expected events it is important that the equivalent integrated luminosity of the MC samples significantly exceeds that of the data, and that all relevant background processes are included, in particular rare processes which might dominate certain multi-object event classes.
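The luminosity requirement can be made concrete with a small sketch. It assumes the simplest possible normalization, in which every generated event carries unit weight and no filter efficiency or higher-order correction factor is applied. The function names and the numerical inputs are illustrative assumptions and are not taken from the analysis.

```python
def equivalent_luminosity(n_generated, cross_section_fb):
    """Integrated luminosity (fb^-1) that a sample of n_generated events corresponds to."""
    return n_generated / cross_section_fb


def event_weight(n_generated, cross_section_fb, data_luminosity_fb):
    """Per-event weight that normalizes the MC sample to the data luminosity."""
    return cross_section_fb * data_luminosity_fb / n_generated


# Illustrative numbers (assumptions): a 100 fb process with 10^6 generated events,
# compared with a 3.2 fb^-1 dataset.
lumi_mc = equivalent_luminosity(1_000_000, 100.0)
weight = event_weight(1_000_000, 100.0, 3.2)
ratio = lumi_mc / 3.2  # how many times the MC sample exceeds the data luminosity
```

A ratio well above one means that the statistical uncertainty of the MC prediction in a sparsely populated region stays small compared with the statistical uncertainty of the data.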

Step 2: Systematic uncertainties and validation
The particular nature of this analysis, in which a large number of final states are explored, makes the definition of control and validation regions difficult. In searches for BSM physics at the LHC, control regions are used to constrain MC-based background predictions with auxiliary measurements. Validation regions are used to test the validity of the background model prediction with data.
The simplest way to construct a background model is to obtain the background expectation from the MC prediction including the corresponding theoretical and experimental uncertainties. This approach, which is applied in the example in Sect. 3, has the advantage that it prevents the absorption of BSM signal contributions into a rescaling of the SM processes. Another possible approach is to automatically define, for each data selection and algorithmic hypothesis test, statistically independent control selections. The data in the control selections can be used to rescale the MC background predictions and to constrain the systematic uncertainties. This comes at the price of reduced sensitivity for the case in which a BSM model predicts a simultaneous effect in the signal region and control region, which would be absorbed in the rescaling.
To verify the proper modelling of the SM background processes, several validation distributions are defined using inclusive selections for which observable signals for new physics are excluded. If these validation distributions show problems in the MC modelling, either corrections to the MC backgrounds are applied or the affected event class is excluded.
Uncertainties in the background estimate arise from experimental effects, and the theoretical accuracy of the prediction of the (differential) cross-section and acceptance of the MC simulation. Their effect is evaluated for all contributing background processes as well as for benchmark signals.

Step 3: Sensitive variables and search algorithm
Distributions of observables in the form of histograms are investigated for all event classes considered in the analysis. Observables are included if they have a high sensitivity to a wide range of BSM signals. The total number of observables considered is, however, restricted to a few to avoid a large increase in the number of hypothesis tests, as the latter also increases the rate of deviations from background fluctuations. In high-energy physics this effect is commonly known as the 'trial factor' or 'look-elsewhere effect'. Examples of such observables are the effective mass m eff (defined as the sum of the scalar transverse momenta of all objects plus the scalar missing transverse momentum), the total invariant mass m inv (defined as the invariant mass of all visible objects), the invariant mass of any combination of objects (such as the dielectron invariant mass in events with two electrons and two muons), event shape variables such as thrust [9,10] or even more complicated variables such as the output of a machine-learning algorithm.
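The two mass-like observables defined above can be sketched as follows. The representation of an object as a (p T, η, φ, mass) tuple and the function names are assumptions made for illustration; they do not correspond to any ATLAS software interface.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) for an object given in collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def effective_mass(objects, met=0.0):
    """m_eff: scalar sum of the object pT plus the magnitude of the missing transverse momentum."""
    return sum(obj[0] for obj in objects) + met


def invariant_mass(objects):
    """m_inv: invariant mass of the summed four-momenta of all visible objects."""
    e_tot = px_tot = py_tot = pz_tot = 0.0
    for obj in objects:
        energy, px, py, pz = four_vector(*obj)
        e_tot += energy
        px_tot += px
        py_tot += py
        pz_tot += pz
    m_squared = e_tot**2 - px_tot**2 - py_tot**2 - pz_tot**2
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding errors


# Two back-to-back massless objects with pT = 50 GeV at eta = 0 (illustrative values)
objects = [(50.0, 0.0, 0.0, 0.0), (50.0, 0.0, math.pi, 0.0)]
m_eff = effective_mass(objects, met=20.0)
m_inv = invariant_mass(objects)
```

For this back-to-back configuration the invariant mass equals the sum of the two transverse momenta, while the effective mass additionally includes the missing transverse momentum.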
A statistical algorithm is used to scan these distributions for each event class and quantify the deviations of the data from the SM expectation. The algorithm identifies the data selection that has the largest deviation in the distribution of the investigated observable by testing many data selections to minimize a test statistic. An example of a possible test statistic, which has also been used in the analysis described in Sect. 3, is the local p 0 -value, which gives the expected probability of observing a fluctuation that is at least as far from the SM expectation as the observed number of data events in a given region, if the experiment were to be repeated:
$$p_0 = 2 \min\left[ P(n \le N_\text{obs}),\, P(n \ge N_\text{obs}) \right], \tag{1}$$
$$P(n \le N_\text{obs}) = \int_0^\infty \mathrm{d}x\, G(x; N_\text{SM}, \delta N_\text{SM}) \sum_{n=0}^{N_\text{obs}} \frac{e^{-x} x^n}{n!} + \int_{-\infty}^{0} \mathrm{d}x\, G(x; N_\text{SM}, \delta N_\text{SM}), \tag{2}$$
$$P(n \ge N_\text{obs}) = \int_0^\infty \mathrm{d}x\, G(x; N_\text{SM}, \delta N_\text{SM}) \sum_{n=N_\text{obs}}^{\infty} \frac{e^{-x} x^n}{n!}, \tag{3}$$
where n is the independent variable of the Poisson probability mass function (pmf), N obs is the observed number of data events for a given selection, P(n ≤ N obs ) is the probability of observing no more than the number of events observed in the data and P(n ≥ N obs ) is the probability of observing at least the number of events observed in the data. The quantity N SM is the expectation for the number of events with its total uncertainty δN SM for a given selection. The convolution of the Poisson pmf (with mean x) with a Gaussian probability density function (pdf), G(x; N SM , δN SM ) with mean N SM and width δN SM , takes the effect of both non-negligible systematic uncertainties and statistical uncertainties into account.$^{2}$ If the Gaussian pdf G is replaced by a Dirac delta function δ(x − N SM ) the estimator p 0 results in the usual Poisson probability. The selection with the largest deviation identified by the algorithm is defined as the selection giving the smallest p 0 -value. The smallest p 0 for a given channel is defined as p channel , which therefore corresponds to the local p 0 -value of the largest deviation in that channel. Data selections are not considered in the scan if large uncertainties in the expectation arise due to a lack of MC events, or from large systematic uncertainties.
To avoid overlooking potential excesses in these selections the p 0 -values of selections with more than three data events are monitored separately. Single outstanding events with atypical object multiplicities (e.g. events with 12 muons) are visible as an event class. Single outstanding events in the scanned distributions are monitored separately.
The result of scanning the distributions for all event classes is a list of data selections, one per event class containing the largest deviation in that class, and their local statistical significance. Details of the procedure and the statistical algorithm used for the 2015 dataset are explained in Sect. 3.3.
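A minimal numerical sketch of the local p 0 -value and of a scan over one histogram is given below. Several choices in it are assumptions made for illustration and are not taken from the analysis: the convolution of the Poisson pmf with the Gaussian pdf is evaluated with a simple trapezoidal rule, the result is clipped at one, the scan considers all contiguous bin ranges, and the uncertainties of the bins in a range are added linearly (i.e. treated as fully correlated). The function names are hypothetical, and the expectation N SM is assumed to be positive and small enough for exp(−x) not to underflow. The algorithm actually used for the 2015 dataset is the one described in Sect. 3.3.

```python
import math


def poisson_cdf(k, mean):
    """P(n <= k) for a Poisson pmf with the given mean (k >= 0, mean >= 0)."""
    term = math.exp(-mean)
    total = term
    for n in range(1, k + 1):
        term *= mean / n
        total += term
    return min(total, 1.0)


def p0_value(n_obs, n_sm, delta_sm, steps=400):
    """Local p0: twice the smaller of P(n <= n_obs) and P(n >= n_obs), clipped at 1."""
    if delta_sm <= 0.0:
        # Dirac-delta limit: plain Poisson probabilities
        p_le = poisson_cdf(n_obs, n_sm)
        p_ge = 1.0 - poisson_cdf(n_obs - 1, n_sm) if n_obs > 0 else 1.0
        return min(1.0, 2.0 * min(p_le, p_ge))
    lo = max(0.0, n_sm - 8.0 * delta_sm)
    hi = n_sm + 8.0 * delta_sm
    step = (hi - lo) / steps
    norm = 1.0 / (delta_sm * math.sqrt(2.0 * math.pi))
    p_le = p_ge = 0.0
    for i in range(steps + 1):  # trapezoidal rule over the positive part of the Gaussian
        x = lo + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        gauss = norm * math.exp(-0.5 * ((x - n_sm) / delta_sm) ** 2)
        cdf = poisson_cdf(n_obs, x)
        cdf_below = poisson_cdf(n_obs - 1, x) if n_obs > 0 else 0.0
        p_le += weight * gauss * cdf * step
        p_ge += weight * gauss * (1.0 - cdf_below) * step
    # Gaussian probability of a negative expectation, for which no events are observed
    negative_tail = 0.5 * math.erfc(n_sm / (delta_sm * math.sqrt(2.0)))
    p_le += negative_tail
    if n_obs == 0:
        p_ge += negative_tail  # the term only survives for n_obs = 0
    return min(1.0, 2.0 * min(p_le, p_ge))


def scan_distribution(observed, expected, uncertainty):
    """Return (smallest p0, (first_bin, last_bin)) over all contiguous bin ranges."""
    best_p0, best_region = 1.0, None
    for first in range(len(observed)):
        for last in range(first, len(observed)):
            n_obs = sum(observed[first:last + 1])
            n_sm = sum(expected[first:last + 1])
            d_sm = sum(uncertainty[first:last + 1])  # assumption: fully correlated bins
            p0 = p0_value(n_obs, n_sm, d_sm)
            if p0 < best_p0:
                best_p0, best_region = p0, (first, last)
    return best_p0, best_region


# Toy histogram (illustrative numbers) with an excess in the third bin
observed = [52, 28, 15, 19, 11]
expected = [50.0, 30.0, 5.0, 20.0, 10.0]
uncertainty = [5.0, 3.0, 0.5, 2.0, 1.0]
p_channel, region = scan_distribution(observed, expected, uncertainty)
```

In this toy example the scan selects the single bin containing the excess, and the returned p 0 -value plays the role of p channel for the event class.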

Step 4: Generation of pseudo-experiments
The probability that for a given observable one or more deviations of a certain size occur somewhere in the event classes considered is modelled by pseudo-experiments. Each pseudo-experiment consists of exactly the same event classes as those considered when applying the search algorithm to data. However, the data counts are replaced by pseudo-data counts which are generated from the SM expectation using an MC technique. Pseudo-data distributions are produced taking into account both statistical and systematic uncertainties by drawing pseudo-random data counts for each bin from the convolved pmf used in Eqs. (1)-(3) to compute a p 0 -value. Correlations in the uncertainties of the SM expectation affect the chance of observing one or more deviations of a given size. The effect of correlations between bins of the same distribution or between distributions of different event classes is therefore taken into account when generating pseudo-data for pseudo-experiments. Correlations between distributions of different observables are not taken into account, since the results obtained for different observables are not combined in the interpretation.
2 The second term in Eq. (2) gives the probability of observing no events given a negative expectation from downward variations of the systematic uncertainties. It can be derived as follows: $$\lim_{\mu \to 0} \frac{e^{-\mu} \mu^{n}}{n!} = \delta_{n0},$$ where μ is the mean of the Poisson pmf and δ n0 = {1 if n = 0, 0 if n ≠ 0} is the Kronecker delta. In Eq. (3) this term vanishes for N obs > 0.
Eur. Phys. J. C (2019) 79:120
Fig. 1 The fractions of pseudo-experiments (P exp,i (p min )) in the m inv scan, which have at least one, two or three p channel -values smaller than a given threshold (p min ). Pseudo-datasets are generated from the SM expectation. Dotted lines are drawn at P exp,i = 5% and at the corresponding − log 10 (p min )-values
The search algorithm is then applied to each of the distributions, resulting in a p channel -value for each event class. The p channel distributions of many pseudo-experiments and their statistical properties can be compared with the p channel distribution obtained from data to interpret the test statistics in a frequentist manner. The fraction of pseudo-experiments having one of the p channel -values smaller than a given value p min indicates the probability of observing such a deviation by chance, taking into account the number of selections and event classes tested.
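The generation of pseudo-data for one distribution can be sketched as below. The sketch draws each bin count from a Poisson pmf whose mean is itself drawn from a Gaussian around the SM expectation, and returns zero events for a negative mean. Two simplifications are assumptions of this illustration: correlations are modelled with a single common Gaussian shift shared by all bins (the `correlated` flag), and the Poisson sampler is Knuth's multiplication method, which is only adequate for small means. All names are hypothetical.

```python
import math
import random


def draw_poisson(rng, mean):
    """Knuth's multiplication sampler for a Poisson count (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_pseudo_data(expected, uncertainty, rng, correlated=True):
    """Draw one pseudo-data histogram from the SM expectation and its uncertainty."""
    common_shift = rng.gauss(0.0, 1.0)  # one nuisance shift shared by all bins
    counts = []
    for n_sm, d_sm in zip(expected, uncertainty):
        shift = common_shift if correlated else rng.gauss(0.0, 1.0)
        mean = n_sm + shift * d_sm
        # A negative expectation yields zero events
        counts.append(draw_poisson(rng, mean) if mean > 0.0 else 0)
    return counts


rng = random.Random(12345)  # fixed seed so that the sketch is reproducible
pseudo_data = draw_pseudo_data([10.0, 5.0, 0.5], [1.0, 0.5, 0.1], rng)
```

Applying the search algorithm to many such histograms, one per event class and pseudo-experiment, yields the p channel distributions that are compared with the data.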
To illustrate this, Fig. 1 shows three cumulative distributions of p channel -values from pseudo-experiments. The number of event classes (686) and the m inv distributions used to generate these pseudo-experiments coincide with the example application in Sect. 3. The distribution in Fig. 1 with circular markers is the fraction of pseudo-experiments with at least one p channel -value smaller than p min . For example, about 15% of the pseudo-experiments have at least one p channel -value smaller than p min = 10$^{-4}$. Therefore, the estimated probability (P exp,i ) of obtaining at least one p channel -value (i = 1) smaller than 10$^{-4}$ from data in the absence of a signal is about 15%, or P exp,1 (10$^{-4}$) = 0.15. To estimate the probability of observing deviations of a given size in at least two or three different event classes, the second or third smallest p channel -value of a pseudo-experiment is compared with a given p min threshold. From Fig. 1 it follows for instance that 2% of the pseudo-experiments have at least three p channel -values smaller than 10$^{-4}$. Consequently, the probability of obtaining a third smallest p channel -value smaller than 10$^{-4}$ from data in the absence of a signal is about 2%, or P exp,3 (10$^{-4}$) = 0.02. In Fig. 1 a horizontal dotted line is drawn at a fraction of pseudo-experiments of 5% and corresponding vertical dotted lines are drawn at the three p min thresholds. The observation of one, two or three p channel -values in data below the corresponding p min threshold, i.e. an observation with a P exp,i < 0.05, promotes the selections that yielded these deviations to signal regions that can be tested in a new dataset.
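The quantity P exp,i (p min ) can be sketched as a simple counting exercise. The function name and the toy p channel -values below are assumptions chosen for illustration; a realistic estimate would use many pseudo-experiments with several hundred event classes each.

```python
def p_exp(pseudo_experiments, p_min, i=1):
    """Fraction of pseudo-experiments whose i-th smallest p_channel-value is below p_min."""
    hits = 0
    for p_channels in pseudo_experiments:
        ordered = sorted(p_channels)
        if len(ordered) >= i and ordered[i - 1] < p_min:
            hits += 1
    return hits / len(pseudo_experiments)


# Four toy pseudo-experiments with three event classes each (illustrative values)
toy = [
    [0.5, 1e-5, 0.2],
    [0.3, 0.4, 0.9],
    [5e-5, 2e-5, 0.7],
    [0.01, 0.6, 0.8],
]
p_exp_1 = p_exp(toy, 1e-4, i=1)  # at least one p_channel below 1e-4
p_exp_2 = p_exp(toy, 1e-4, i=2)  # at least two p_channel-values below 1e-4
p_exp_3 = p_exp(toy, 1e-4, i=3)  # at least three p_channel-values below 1e-4
```

Evaluating such a function on a grid of p min values gives curves of the kind shown in Fig. 1, from which the p min thresholds corresponding to P exp,i = 5% can be read off.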

Step 5: Evaluation of the sensitivity
The sensitivity of the procedure to a priori unspecified BSM signals can be evaluated with two different methods that either use a modified background estimation through the removal of SM processes or in which signal contributions are added to the pseudo-data sample.
In the first method, a rare SM process (with either a low cross-section or a low reconstruction efficiency) is removed from the background model. The search algorithm is applied again to test the data or 'signal' pseudo-experiments generated from the unmodified SM expectation, against the modified background expectation. The data samples would be expected to reveal excesses relative to the modified background prediction.
In the second method, pseudo-experiments are used to test the sensitivity of the analysis to benchmark signal models of new physics. The prediction of a model is added to the SM prediction, and this modified expectation is used to generate 'signal' pseudo-experiments. The search algorithm is applied to the pseudo-experiments and the distribution of p channel -values is derived.
To provide a figure of merit for the sensitivity of the analysis, the fraction of 'signal' pseudo-experiments with P exp,i < 5% for i = 1, 2, 3 is computed.
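This figure of merit can be sketched as follows. The sketch first derives a p min threshold from background-only pseudo-experiments and then counts the 'signal' pseudo-experiments that fall below it. The way the threshold is extracted from a finite sample, the function names and the toy numbers are assumptions of this illustration.

```python
def ith_smallest(p_channels, i):
    """i-th smallest p_channel-value of one pseudo-experiment (1.0 if fewer than i classes)."""
    ordered = sorted(p_channels)
    return ordered[i - 1] if len(ordered) >= i else 1.0


def p_min_threshold(background_experiments, i, level=0.05):
    """Threshold such that at most a fraction `level` of the background-only
    pseudo-experiments have an i-th smallest p_channel-value strictly below it."""
    values = sorted(ith_smallest(p, i) for p in background_experiments)
    return values[int(level * len(values))]


def sensitivity(signal_experiments, background_experiments, i, level=0.05):
    """Fraction of 'signal' pseudo-experiments that would be flagged for follow-up."""
    threshold = p_min_threshold(background_experiments, i, level)
    flagged = sum(ith_smallest(p, i) < threshold for p in signal_experiments)
    return flagged / len(signal_experiments)


# Toy inputs (illustrative values): 20 background-only and 4 'signal' pseudo-experiments
background = [[(j + 1) / 40, 0.5, 0.9] for j in range(20)]
signal = [[0.01, 0.6], [0.04, 0.3], [0.2, 0.7], [0.06, 0.9]]
fraction_flagged = sensitivity(signal, background, i=1)
```

A fraction close to one indicates that the benchmark signal would almost always produce a deviation large enough to trigger a dedicated analysis.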

Step 6: Results
Finding one or more deviations in the data with P exp,i < 5% triggers a dedicated analysis that uses the data selection in which the deviation is observed as a signal region (step 7). If no significant deviations are found, the outcome of the analysis technique includes information such as: the number of events and expectation per event class, a comparison of the data with the SM expectation in the distributions of observables considered, the scan results (i.e. the location and the local p 0 -value of the largest deviation per event class) and the comparison with the expectation from pseudo-experiments.

Step 7 (only in the case of P exp,i < 5%): Dedicated analysis of deviation
Dedicated analysis on original dataset
Deviations are investigated using methods similar to those of a conventional analysis. In particular, the background prediction is determined using control selections to control and validate the background modelling. Such a procedure further constrains the background expectation and uncertainty, and reduces the dependence on simulation. If such a re-analysis of the region results in an insignificant deviation, it can be inferred that the deviation seen before was due to mismodellings or not well-enough understood backgrounds.
Dedicated analysis on an independent dataset
If a deviation persists in a dedicated analysis using the original dataset, the data selection in which the deviation is observed defines a data-derived signal region that is tested in an independent new dataset with a similar or larger integrated luminosity. At this point, a particular model of new physics can be used to interpret the result of testing the data-derived signal region. Since the signal region is known, the corresponding data can be excluded ('blinded') from the analysis until the very end to minimize any possible bias in the analysis. Additionally, since only a few optimized hypothesis tests are performed on the independent dataset, the large look-elsewhere effect due to the large number of hypothesis tests performed in step 3 is not present in the dedicated analysis of the signal region(s). The assumptions of Gaussian uncertainties for the background models can also be tested in the dedicated analysis. If the full LHC data yields a significant deviation, the LHC running time may need to be increased, or the excess may have to be followed up at a future collider.

Advantages and disadvantages
The features of this strategy lead to several advantages and disadvantages that are outlined below.
Advantages:
• It can find unexpected signals for new physics due to the large number of event classes and phase-space regions probed, which may otherwise remain uninvestigated.
• A relatively small excess in two or three independent data selections, each of which is not big enough to trigger a dedicated analysis by itself (P exp,1 > 5%), can trigger one in combination (P exp,2,3 < 5%).
• The approach is broad, and the scanned distributions can be used to probe the overall description of the data by the event generators for many SM processes.
• The probability of a deviation occurring in any of the many different event classes under study can be determined with pseudo-experiments, resulting in a truly global interpretation of the probability of finding a deviation within an experiment such as ATLAS.
Disadvantages:
• The outcome depends on the MC-based description of physics processes and simulations of the detector response. Event classes in which the majority of the events contain misreconstructed objects are typically poorly modelled by MC simulation and might need to be excluded from the analysis. Although step 2 validates the description of the data by the MC simulation, there is still a possibility of triggering false positives due to an MC mismodelling in a corner of phase space. Step 7 aims to minimize this by reducing the dependence on MC simulations in a dedicated analysis performed for each significant deviation. In future implementations a better background model could be constructed with the help of control regions or data-derived fitting functions. This might allow the detection of excesses that are small compared to the uncertainties in the MC-based description of the SM processes.
• Since this analysis is not optimized for a specific class of BSM signals, a dedicated analysis optimized for a given BSM signal achieves a larger sensitivity to that signal. The enormous parameter space of possible signals makes an optimized search for each of them impossible.
• The large number of data selections introduces a large look-elsewhere effect, which reduces the significance of a real signal. Step 7 circumvents this problem since the final discovery significance is determined with a dedicated analysis of one or a few data selection(s) and a statistically independent dataset. This can yield an improved signal sensitivity if the background uncertainty can be constrained in the dedicated analysis.
• Despite being broad, the procedure might miss a certain signal because it does not show a localized excess in one of the studied distributions. This might be overcome with better observables, better event classification or modified algorithms, which may then be sensitive to such signals.

Application of the strategy to ATLAS data
This section describes the application of the strategy outlined in the previous section to the 13 TeV pp collision data recorded by the ATLAS experiment in 2015.

ATLAS detector and dataset
The ATLAS detector [11] is a multipurpose particle physics detector with a forward-backward symmetric cylindrical geometry and a coverage of nearly 4π in solid angle.$^{3}$ The inner tracking detector (ID) consists of silicon pixel and microstrip detectors covering the pseudorapidity region |η| < 2.5, surrounded by a straw-tube transition radiation tracker which enhances electron identification in the region |η| < 2.0. Between Run 1 and Run 2, a new inner pixel layer, the insertable B-layer [12], was inserted at a mean sensor radius of 3.3 cm. The inner detector is surrounded by a thin superconducting solenoid providing an axial 2 T magnetic field and by a fine-granularity lead/liquid-argon (LAr) electromagnetic calorimeter covering |η| < 3.2. A steel/scintillator-tile calorimeter provides hadronic coverage in the central pseudorapidity range (|η| < 1.7). The endcap and forward calorimeter coverage (1.5 < |η| < 4.9) is completed by LAr active layers with either copper or tungsten as the absorber material. An extensive muon spectrometer with an air-core toroid magnet system surrounds the calorimeters. Three layers of high-precision tracking chambers provide coverage in the range |η| < 2.7, while dedicated fast chambers provide a muon trigger in the region |η| < 2.4. The ATLAS trigger system consists of a hardware-based level-1 trigger followed by a software-based high-level trigger [13]. The data used in this analysis were collected by the ATLAS detector during 2015 in pp collisions at the LHC with a centre-of-mass energy of 13 TeV and a 25 ns bunch crossing interval. After applying quality criteria for the beam, data and detector, the available dataset corresponds to an integrated luminosity of 3.2 fb$^{-1}$. In this dataset, each event includes an average of approximately 14 additional inelastic pp collisions in the same bunch crossing (pile-up).
Candidate events are required to have a reconstructed vertex [14], with at least two associated tracks with p T > 400 MeV. The vertex with the highest sum of squared transverse momenta of the tracks is considered to be the primary vertex.
3 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point in the centre of the detector. The positive x-axis is defined by the direction from the interaction point to the centre of the LHC ring, with the positive y-axis pointing upwards, while the beam direction defines the z-axis. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity η is defined in terms of the polar angle θ by η = − ln tan(θ/2). The angular distance is defined as $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$. Rapidity is defined as y = 0.5·ln[(E + p z )/(E − p z )] where E denotes the energy and p z is the component of the momentum along the beam direction.
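The coordinate definitions in the footnote translate directly into a few helper functions. The sketch below is illustrative; the function names are assumptions, and the wrapping of the azimuthal difference into [−π, π) is a standard convention that the footnote does not spell out.

```python
import math


def pseudorapidity(theta):
    """eta = -ln tan(theta / 2) for a polar angle theta in (0, pi)."""
    return -math.log(math.tan(theta / 2.0))


def rapidity(energy, pz):
    """y = 0.5 * ln[(E + pz) / (E - pz)]."""
    return 0.5 * math.log((energy + pz) / (energy - pz))


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt((delta eta)^2 + (delta phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


eta_central = pseudorapidity(math.pi / 2.0)    # perpendicular to the beam axis
distance = delta_r(0.0, 3.0, 0.0, -3.0)        # two objects on either side of phi = pi
```

The second example shows why the wrapping matters: the two objects are separated by 2π − 6 ≈ 0.28 in azimuth, not by 6.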

Monte Carlo samples
Monte Carlo simulated event samples [15] are used to describe SM background processes and to model possible signals. The ATLAS detector is simulated either by a software system based on Geant4 [16] or by a faster simulation based on a parameterization of the calorimeter response and Geant4 for the other detector systems. The impact of detector conditions on the simulation is typically corrected for as part of the calibrations and scale factors applied to the reconstructed objects.
To account for additional pp interactions from the same or nearby bunch crossings, a set of minimum-bias interactions generated using Pythia 8.186 [17], the MSTW2008LO [18] parton distribution function (PDF) set and the A2 set of tuned parameters (tune) [19] was superimposed onto the hard-scattering events to reproduce the observed distribution of the average number of interactions per bunch crossing.
Any further study of time-dependent detector variations would be part of the dedicated search following any interesting deviation.
In all MC samples, except those produced by Sherpa [20], the EvtGen v1.2.0 program [21] was used to model the properties of the bottom and charm hadron decays. The SM MC programs are listed in Table 1 and a detailed explanation can be found in Appendix A.1.
In addition to the SM background processes, two possible signals are considered as benchmarks. The first benchmark model considered is the production of a new heavy neutral gauge boson of spin 1 (Z′), as predicted by many extensions of the SM. Here, the specific case of the sequential extension of the SM gauge group (SSM) [22,23] is considered, for which the couplings are the same as for the SM Z boson. This process was generated at leading order (LO) using Pythia 8.212 with the NNPDF23LO [24] PDF set and the A14 tune [25], as a Drell-Yan process, for five different resonant masses, covering the range from 2 TeV to 4 TeV, in steps of 0.5 TeV. The considered decays of Z′ bosons are inclusive, covering the full range of lepton and quark pairs. Interference effects with SM Drell-Yan production are not included, and the Z′ boson is required to decay into fermions only.
The second signal considered is the supersymmetric [26-31] production of gluino pairs through strong interactions. The gluinos are assumed to decay promptly into a pair of top quarks and an almost massless neutralino via an off-shell top squark, $\tilde{g} \to t\bar{t}\tilde{\chi}^{0}_{1}$. Samples for this process were generated at LO with up to two additional partons using MG5_aMC@NLO 2.2.2 [32] with the CTEQ6L1 [33] PDF set, interfaced to Pythia 8.186 with the A14 tune. The matching with the parton shower was done using the CKKW-L [34] prescription, with a matching scale set to one quarter of the pair-produced resonance mass. The signal cross-sections were calculated at next-to-leading order (NLO) in the strong coupling constant, adding the resummation of soft gluon emission at next-to-leading-logarithm (NLL) accuracy [35-37].
Table 1 A summary of the MC samples used in the analysis to model SM background processes. For each sample the corresponding generator, matrix element (ME) accuracy, parton shower, cross-section normalization accuracy, PDF set and tune are indicated. Details are given in Appendix A.1. Samples with 'data' in the 'cross-section normalization' column are scaled to data as described in Sect. 3

Object reconstruction
Reconstructed physics objects considered in the analysis are: prompt and isolated electrons (e), muons (μ) and photons (γ ), as well as b-jets (b) and light (non-b-tagged) jets ( j) reconstructed with the anti-k t algorithm [38] with radius parameter R = 0.4, and large missing transverse momentum (E miss T ). Table 2 lists the reconstructed physics objects along with their p T and pseudorapidity requirements. Jets and electrons misidentified as hadronically decaying τ -leptons are difficult to model with the MC-based approach used in this analysis. Therefore, the identification of hadronically decaying τ -leptons is not considered; they are mostly reconstructed as light jets. Details of the object reconstruction can be found in Appendix B.
After object identification, overlaps between object candidates are resolved using the distance variable $\Delta R_y = \sqrt{(\Delta y)^2 + (\Delta\phi)^2}$. If an electron and a muon share the same ID track, the electron is removed. Any jet within a distance $\Delta R_y = 0.2$ of an electron candidate is discarded, unless the jet has a value of the b-tagging MV2c20 discriminant [39,40] larger than that corresponding to approximately 85% b-tagging efficiency, in which case the electron is discarded since it probably originated from a semileptonic b-hadron decay. Any remaining electron within $\Delta R_y = 0.4$ of a jet is discarded. Muons within $\Delta R_y = 0.4$ of a jet are also removed. However, if the jet has fewer than three associated tracks, the muon is kept and the jet is discarded instead to avoid inefficiencies for high-energy muons undergoing significant energy loss in the calorimeter. If a photon candidate is found within $\Delta R_y = 0.4$ of a jet, the jet is discarded. Photons within a cone of size $\Delta R_y = 0.4$ around an electron or muon candidate are discarded.
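As an illustration of the distance variable and of one overlap-removal step, the following minimal sketch computes the rapidity-azimuth distance and applies only the jet-electron rule described above. The dictionary keys (`y`, `phi`, `b_tagged_85`) and the function names are hypothetical and are not taken from the ATLAS software.

```python
import math

def delta_r_y(y1, phi1, y2, phi2):
    """Distance in rapidity-azimuth space, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)

def resolve_jet_electron(jet, electron):
    """Return which object to discard ('jet' or 'electron'), or None if they do not overlap.

    Only the jet-electron step with the 0.2 distance is sketched here;
    'b_tagged_85' is an assumed flag for the ~85% efficiency b-tagging working point.
    """
    if delta_r_y(jet["y"], jet["phi"], electron["y"], electron["phi"]) >= 0.2:
        return None
    return "electron" if jet["b_tagged_85"] else "jet"
```

The remaining steps (electron-jet at 0.4, muon-jet with the track-multiplicity exception, photon-jet and photon-lepton) would follow the same pattern, applied in the order given in the text.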
The missing transverse momentum (with magnitude E miss T ) is defined as the negative vector sum of the transverse momenta of all selected and calibrated physics objects (electrons, photons, muons and jets) in the event, with an additional soft-term [41]. The soft-term is constructed from all tracks that are not associated with any physics object, but are associated with the primary vertex. The missing transverse momentum is reconstructed for all events; however, separate analysis channels are constructed for events with E miss T > 200 GeV. These events are taken exclusively from the E miss T trigger.

Event selection and classification
The events are divided into mutually exclusive classes that are labelled with the number and type of reconstructed objects listed in Table 2. The division can be regarded as a classification according to the most important features of the event. The classification includes all possible final-state configurations and object multiplicities, e.g. if a data event with seven reconstructed muons and no other objects is found, it is classified in a '7-muon' event class (7μ). Similarly an event with missing transverse momentum, two muons, one photon and four jets is classified and considered in the corresponding event class denoted E miss T 2μ1γ 4 j. All events contributing to a particular event class are also required to be selected by a trigger from a corresponding class of triggers by imposing a hierarchy in the event selection. This avoids ambiguities in the application of trigger efficiency corrections to MC simulations and avoids variations in the acceptance within an event class. The flow diagram in Fig. 2 gives a graphical representation of the trigger and offline event selection, based on the class of the event. Since the thresholds for the single-photon and single-jet triggers are higher than the p T requirements in the photon and jet object selection, an additional reconstruction-level p T cut is imposed to avoid trigger inefficiencies. For the other triggers, the p T requirements in the object definitions exceed the trigger thresholds by a sufficient margin to avoid additional trigger inefficiencies. Electrons are considered before muons in the event selection hierarchy because the electron trigger efficiency is considerably higher compared to the muon trigger efficiency.
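The labelling of event classes can be sketched as follows. The ordering of object types in the label is inferred from the examples quoted in the text (such as 7μ and E miss T 2μ1γ 4 j) and is an assumption, as are the plain-text symbol `ETmiss` and the function name.

```python
from collections import Counter

# Assumed ordering of object types within a label, inferred from the examples in the text.
ORDER = ["mu", "e", "gamma", "b", "j"]
SYMBOL = {"mu": "μ", "e": "e", "gamma": "γ", "b": "b", "j": "j"}

def event_class(objects, has_large_met=False):
    """Build an exclusive event-class label from the list of reconstructed object types."""
    counts = Counter(objects)
    parts = ["ETmiss"] if has_large_met else []
    for kind in ORDER:
        if counts[kind]:
            parts.append(f"{counts[kind]}{SYMBOL[kind]}")
    return "".join(parts)
```

Because the label is a pure function of the object multiplicities and the E miss T flag, every event falls into exactly one class, which is what makes the classes mutually exclusive.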
Events with E miss T > 200 GeV are required to pass the E miss T trigger, which becomes fully efficient at 200 GeV; otherwise they are rejected and not considered for further event selection. If the event has E miss T < 200 GeV but contains an electron with p T > 25 GeV, it is required to pass the single-electron trigger. However, events with more than one electron with p T > 25 GeV or with an additional muon with p T > 25 GeV can be selected by the dielectron trigger or electron-muon trigger, respectively, if the event fails to pass the single-electron trigger. Events with a muon with p T > 25 GeV but no reconstructed electrons or large E miss T are required to pass the single-muon trigger. If the event has more than one muon with p T > 25 GeV and fails to pass the single-muon trigger, it can additionally be selected by the dimuon trigger. Remaining events with a photon with p T > 140 GeV or two photons with p T > 50 GeV are required to pass the single-photon or diphoton trigger, respectively. Finally, any remaining event with no large E miss T , leptons, or photons, but containing a jet with p T > 500 GeV is required to pass the single-jet trigger. In addition to the thresholds imposed by the trigger, a further selection is applied to event classes with E miss T < 200 GeV containing one lepton or one electron and one muon and possibly additional photons or jets (1μ + X , 1e + X and 1μ1e + X ), to reduce the overall data volume. In these event classes, one lepton is required to have p T > 100 GeV if the event has fewer than three jets with p T > 60 GeV.
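The trigger hierarchy described above can be summarised as a function that returns, for a given event, the triggers by which it may be selected. This is a simplified sketch: the trigger names are hypothetical labels, the inputs are lists of object p T values in GeV, the additional p T > 100 GeV lepton requirement is not included, and the treatment of an event satisfying both photon conditions is an assumption.

```python
def allowed_triggers(met, electrons, muons, photons, jets):
    """Return the list of triggers that may select the event, following the hierarchy.

    An empty list means the event is not selected by any trigger class.
    """
    e = [pt for pt in electrons if pt > 25]
    mu = [pt for pt in muons if pt > 25]
    if met > 200:
        return ["met"]
    if e:
        triggers = ["single_electron"]
        if len(e) > 1:
            triggers.append("dielectron")      # fallback if single-electron fails
        if mu:
            triggers.append("electron_muon")   # fallback if single-electron fails
        return triggers
    if mu:
        return ["single_muon"] + (["dimuon"] if len(mu) > 1 else [])
    triggers = []
    if any(pt > 140 for pt in photons):
        triggers.append("single_photon")
    if sum(pt > 50 for pt in photons) >= 2:
        triggers.append("diphoton")
    if triggers:
        return triggers
    if any(pt > 500 for pt in jets):
        return ["single_jet"]
    return []
```

An event would then be kept only if at least one of the triggers that actually fired appears in the returned list, which ensures that each event class is associated with a single class of triggers.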
To suppress sources of fake E miss T , additional requirements are imposed on events to be classified in E miss T categories. The ratio of E miss T to m eff is required to be greater than 0.2, and the minimum azimuthal separation between the E miss T direction and the three leading reconstructed jets (if present) has to be greater than 0.4, otherwise the event is rejected.
The luminosity measurement was calibrated during dedicated beam-separation scans, using the same methodology as that described in Ref. [47]. The uncertainty of this measurement is found to be 2.1%.
In total, 35 sources of experimental uncertainties are identified pertaining to one or more physics objects considered. For each source the one-standard-deviation (1σ ) confidence interval (CI) is propagated to a 1σ CI around the nominal SM expectation. The total experimental uncertainty of the SM expectation is obtained from the sum in quadrature of these 35 1σ CIs and the uncertainty of the luminosity measurement.
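The combination described above amounts to a sum in quadrature. A minimal sketch, assuming that each source has already been propagated to an absolute 1σ uncertainty on the SM expectation and that the luminosity uncertainty is applied as a relative 2.1% of the nominal expectation (the function and argument names are hypothetical):

```python
import math

def total_experimental_uncertainty(source_uncertainties, nominal, lumi_rel=0.021):
    """Sum in quadrature of the per-source 1-sigma uncertainties and the luminosity term."""
    lumi = lumi_rel * nominal
    return math.sqrt(sum(u * u for u in source_uncertainties) + lumi * lumi)
```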
Theoretical modelling uncertainties
Two different sources of uncertainty in the theoretical modelling of the SM production processes are considered. A first uncertainty is assigned to account for our knowledge of the cross-sections for the inclusive processes. A second uncertainty is used to cover the modelling of the shape of the differential cross-sections. In order to derive the modelling uncertainties, either variations of the QCD factorization, renormalization, resummation and merging scales are used or comparisons of the nominal MC samples with alternative ones are used. For some SM processes additional modelling uncertainties are included. Appendix A.2 describes all theoretical uncertainties considered for the various SM processes. The total uncertainty is taken as the sum in quadrature of the two components and the statistical uncertainty of the MC prediction.

Validation procedures
The evaluated SM processes, together with their standard selection cuts and the studied validation distributions, are detailed in Table 3. These validation distributions rely on inclusive selections to probe the general agreement between data and simulation and are evaluated in restricted ranges where large new-physics contributions have been excluded by previous direct searches.
There are some cases in which the validation procedure finds modelling problems and MC background corrections are needed (multijets, γ (γ ) + jets). In other cases, the affected event classes are excluded from the analysis as their SM expectation dominantly arises from object misidentification (e.g. jets reconstructed as electrons) which is poorly modelled in MC simulation. The excluded classes are: 1e1 j, 1e2 j, 1e3 j, 1e4 j, 1e1b, 1e1b1 j, 1e1b2 j, 1e1b3 j. Event classes containing a single object, as well as those containing only E miss T and a lepton are also discarded from the analysis due to difficulties in modelling final states with one high energy object recoiling against many soft (non-reconstructed) ones.

Corrections to the MC background
The MC samples for multijet and γ + jets production, while giving a good description of kinematic variables, predict an overall cross-section and a jet multiplicity distribution that disagree with data. Following step 2, correction procedures were applied.
In classes containing only j and b the multijet MC samples are scaled to data with normalization factors ranging between approximately 0.8 and 1.2. The normalization factors are derived separately in each exclusive jet multiplicity class by equating the expected total number of events to the observed number of events. Multijet production in other channels is not rescaled and is found to be described by the MC samples within the theoretical uncertainties. If a channel contains fewer than four data events, no modifications are made.
For γ + jets event classes the same rescaling procedure is applied to classes with exactly one photon, no leptons or E miss T , and any number of jets.
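The rescaling procedure can be sketched as follows: in each exclusive jet-multiplicity class the factor is the ratio of observed to expected events, and classes with fewer than four data events are left unchanged. The function name and the dictionary layout (jet multiplicity mapped to event counts) are hypothetical, and the subtraction of other background contributions is ignored in this simplified version.

```python
def normalisation_factors(n_obs_by_njet, n_mc_by_njet, min_data_events=4):
    """Per-multiplicity scale factors that equate the MC expectation to the observed yield."""
    factors = {}
    for njet, n_mc in n_mc_by_njet.items():
        n_obs = n_obs_by_njet.get(njet, 0)
        if n_obs < min_data_events or n_mc <= 0:
            factors[njet] = 1.0  # too few data events: no modification
        else:
            factors[njet] = n_obs / n_mc
    return factors
```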
Table 3 (caption fragment) […] is the number of (b-)jets in an event. For the distance variables ΔR and Δφ, the two instances of the objects with the minimum distance between them are used. The H T (jets), m T ( , E miss T ), p T ( ) and m inv validation distributions are evaluated in restricted ranges where large new-physics contributions have been excluded by previous direct searches. Same-flavour opposite-charge sign lepton pairs are referred to as SFOS pairs. Table columns: Physics process, Event selection, Additional validation distributions. Surviving table entries: Same selection as Z (→ ) + jets; Same distributions as Z (→ ) + jets; tt → 1 lepton, at least 2 b-jets and at least 2 light jets; 1 electron and 1 muon of opposite charge and no jets; No leptons or photons; E miss T /m eff.
The Sherpa 2.1.1 MC generator has a known deficiency in the modelling of E miss T due to excessive forward jet activity. This results in a visible mismodelling of the E miss T distribution in event classes with two photons, which also affects the m eff distribution. To correct for this mismodelling a reweighting [48] is applied to the background events containing two real photons (γ γ + jets). The diphoton MC events are reweighted as a function of E miss T and of the number of selected jets to match the respective distributions in the data for the inclusive diphoton sample in the range […]. The application of scale factors outside the region in which data and MC simulation are compared would be cross-checked in the dedicated reanalysis of any deviation.

Comparison of the event yields with the MC prediction
After classification, 704 event classes are found with at least one data event or an SM expectation greater than 0.1 events. The data and the background predictions from MC simulation for these classes are shown in Fig. 3 and Appendix C. Agreement is observed between data and the prediction in most of the event classes. In event classes having more than two b-jets and where the SM expectation is dominated by tt production, the nominal SM expectation is systematically slightly below the data. Data events are found in 528 out of 704 event classes. These include events with up to four leptons (muons and/or electrons), three photons, twelve jets and eight b-jets. There are 18 event classes with an SM expectation of less than 0.1 events; no more than two data events are observed in any of these, and they are not considered further in the analysis. No outstanding event was found in those channels. The remaining 686 classes are retained for statistical analysis.

Step 3: Sensitive variables and search algorithm
In order to quantitatively determine the level of agreement between the data and the SM expectation, and to identify regions of possible deviations, this analysis uses an algorithm for multiple hypothesis testing. The algorithm locates a single region of largest deviation for specific observables in each event class. In the following, an algorithm derived from the algorithm used in Ref. [5] is applied to the 2015 dataset.

Choice of variables
For each event class, the m eff and m inv distributions are considered in the form of histograms. The invariant mass is computed from all visible objects in the event, with no attempt to use the E miss T information. These variables have been widely used in searches for new physics, and are sensitive to a large range of possible signals, manifesting either as bumps, deficits or wide excesses. Several other commonly used kinematic variables have also been studied for various models, but were not found to significantly increase the sensitivity. The approach is however not limited to these variables, as discussed in Sect. 2.
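For illustration, the invariant mass of all visible objects can be computed from their summed four-momenta. The sketch below assumes each object is given as a tuple (E, p x, p y, p z) in GeV; the function name is hypothetical and the missing transverse momentum is not included, in line with the description above.

```python
import math

def invariant_mass(objects):
    """Invariant mass of the summed four-momenta; each object is (E, px, py, pz) in GeV."""
    e = sum(o[0] for o in objects)
    px = sum(o[1] for o in objects)
    py = sum(o[2] for o in objects)
    pz = sum(o[3] for o in objects)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against small negative values from rounding
```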
For each histogram, the bin width h(x) at abscissa x is determined from three ingredients: N objects , the number of objects in the event class; k, the width of the bin in standard deviations; and σ i (x/2), the expected detector resolution in the central region for the p T of object i, evaluated at p T = x/2 to roughly approximate the largest p T -scale in the event. An exception to this is the missing transverse momentum resolution (σ E miss T ), which is a function of E T , where E T is approximated by the effective mass minus the E miss T object requirement. The E miss T object is only considered in the binning of the effective mass histograms. A ±1σ interval is used for the bin width (k = 2) for all objects except for photons and electrons, for which a ±3σ interval is used (k = 6) to avoid having too finely binned histograms with few MC events. This results in variable bin widths with values ranging from 20 GeV to about 2000 GeV. For a given event class, the scan starts at a value of the scanned observable larger than two times the sum of the minimum p T requirement of each contributing object considered (e.g. 100 GeV for a 2μ class). This minimises spurious deviations which might arise from insufficiently well modelled threshold regions.
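A resolution-driven binning of this kind can be sketched as follows. The combination of the per-object resolutions in quadrature, $h(x) = \sqrt{\sum_{i=1}^{N_{\mathrm{objects}}} k_i^2\,\sigma_i^2(x/2)}$, is an assumption made for this illustration, as are the resolution functions passed in, the `min_width` guard and all function names.

```python
import math

def bin_width(x, resolutions, ks):
    """Assumed form: quadrature sum over objects of k_i * sigma_i(x / 2)."""
    return math.sqrt(sum((k * sigma(x / 2.0)) ** 2 for sigma, k in zip(resolutions, ks)))

def bin_edges(x_start, x_max, resolutions, ks, min_width=20.0):
    """Build variable-width bin edges from x_start until x_max is covered.

    min_width is a hypothetical guard that keeps the loop from stalling
    if a resolution function returns a very small value.
    """
    edges = [x_start]
    while edges[-1] < x_max:
        edges.append(edges[-1] + max(bin_width(edges[-1], resolutions, ks), min_width))
    return edges
```

For a 2μ class with a (purely illustrative) constant resolution of 10 GeV per muon and k = 2, `bin_edges(100.0, 150.0, [lambda pt: 10.0] * 2, [2, 2])` starts at the 100 GeV scan threshold quoted in the text and yields bins of width 20√2 GeV.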

Algorithm to search for deviations of the data from the expectation
The algorithm identifies the single region with the largest upward or downward deviation in a distribution, provided in the form of a histogram, as the region of interest (ROI). The total number of independent bins is 36,936, leading to 518,320 combinations of contiguous bins (regions) with an SM expectation larger than 0.01 events. For each region with an SM expectation larger than 0.01, the statistical estimator p 0 is calculated as defined in Eqs. (1)-(3). Here, p 0 is to be interpreted as a local p 0 -value. The region of largest deviation found by the algorithm is the region with the smallest p 0 -value. Such a method is able to find narrow resonances and single outstanding bins, as well as signals spread over large regions of phase space in distributions of any shape.
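The scan over contiguous regions can be sketched as follows. This is a simplified illustration, not the estimator of Eqs. (1)-(3): the p 0 -value used here is a plain Poisson tail probability that ignores the uncertainty of the SM expectation, and all function names are hypothetical.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu (returns 0 for n < 0)."""
    term = math.exp(-mu)
    total = 0.0
    for k in range(n + 1):
        total += term
        term *= mu / (k + 1)
    return min(total, 1.0)

def p0_simplified(n_obs, n_sm):
    """Poisson-only p-value: upper tail for an excess, lower tail for a deficit."""
    if n_obs >= n_sm:
        return 1.0 - poisson_cdf(n_obs - 1, n_sm)
    return poisson_cdf(n_obs, n_sm)

def find_roi(data, sm, min_sm=0.01):
    """Scan all contiguous bin ranges; return (p0, first_bin, last_bin) of the smallest p0."""
    best = None
    for lo in range(len(data)):
        n_obs, n_sm = 0, 0.0
        for hi in range(lo, len(data)):
            n_obs += data[hi]
            n_sm += sm[hi]
            if n_sm <= min_sm:
                continue  # regions with too small an SM expectation are skipped
            p = p0_simplified(n_obs, n_sm)
            if best is None or p < best[0]:
                best = (p, lo, hi)
    return best
```

A histogram with n bins has n(n + 1)/2 contiguous regions, so the double loop visits every candidate once. Subtracting the cumulative probability from one limits the precision of very small upper-tail p-values to roughly the floating-point epsilon, which is acceptable for a sketch.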
To illustrate the operation of the algorithm, six example distributions are presented. Figure 4a shows the invariant mass distribution of the event class with one photon, three light jets and large missing transverse momentum (E miss T 1γ 3 j), which has the smallest p channel -value in the m inv scan. Figure 4b shows the effective mass distribution of the event class with one muon, one electron, four b-jets and two light jets (1μ1e4b2 j), which has the smallest p channel -value in the m eff scan. Figure 4c shows the invariant mass distribution of the event class with one electron, one photon, two b-jets and two light jets (1e1γ 2b2 j). Figure 4d shows the effective mass distribution of the event class with six light jets (6 j). Figure 4e shows the invariant mass distribution of the event class with two muons, a light jet and large missing transverse momentum (E miss T 2μ1 j) and Fig. 4f shows the effective mass distribution of the event class with three light jets and large missing transverse momentum (E miss T 3 j). The regions with the largest deviation found by the search algorithm in these distributions, an excess in Fig. 4a-c, f, and a deficit in Fig. 4d, e, are indicated by vertical dashed lines.
To minimize the impact of few MC events, 213,992 regions where the background prediction has a total relative uncertainty of over 100% are discarded by the algorithm. Discarding a region forces the algorithm to consider a different or larger region in the event class, or if no region in the event class satisfies the condition, to discard the entire event class. For all discarded regions with N obs > 3 a p 0 -value is calculated. If the p 0 -value is smaller than the p channel -value (or if there is no ROI and hence no p channel -value), it is evaluated manually by comparing it with the distribution of p channel -values from the scan. This is done for 27 event classes among which the smallest p 0 -value observed in a discarded region is 0.01. To model the analysis of discarded regions in pseudo-experiments, regions are allowed to have larger uncertainties if they fulfil the N obs > 3 criterion.
In addition to monitoring regions discarded due to a total uncertainty in excess of 100%, regions discarded due to N SM < 0.01 but with N obs > 3 would also be monitored individually; however, no such region has been observed.
Tables 4 and 5 list the three event classes with the largest deviations in the m inv and m eff scans respectively. The largest deviation reported by a dedicated search using the same dataset was observed in an inclusive diphoton data selection at a diphoton mass of around 750 GeV with a local significance of 3.9σ [49]. Due to the different event selections and background estimates the excess has a lower significance in this analysis. The excess was not confirmed in a dedicated analysis with 2016 data [50].

Step 4: Generation of pseudo-experiments
As described in Sect. 2.4, pseudo-experiments are generated to derive the probability of finding a p 0 -value of a given size, for a given observable and algorithm. The p channel -value distributions of the pseudo-experiments and their statistical properties can be compared with the p channel -value distribution obtained from data. Correlations in the uncertainties of the SM expectation affect this probability and their effect is taken into account in the generation of pseudo-data as outlined in the following.
For the experimental uncertainties, each of the 35 sources of uncertainty is varied independently by drawing a value at random from a Gaussian pdf. This value is assumed to be 100% correlated across all bins and event classes. The uncertainty in the normalization of the various backgrounds is also considered as 100% correlated. Likewise, theoretical shape uncertainties, including those estimated from scale variations or the differences with alternative generators, are assumed to be 100% correlated, with the exception of the uncertainties which are used for some SM processes with small cross-sections. The latter uncertainties are assumed to be uncorrelated, both between event classes and between bins of the same event class. Scale variations are applied in the generation of pseudo-experiments by varying the renormalization, factorization, resummation and merging scales independently. The values for each scale of a given pseudo-experiment are 100% correlated between all bins and event classes. The scales are correlated between processes of the same type which are generated with a similar generator setup, i.e. scales are correlated among the W/Z /γ + jets processes, among all the diboson processes, among the tt +W/Z processes, and among the single-top processes.
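The treatment of fully correlated uncertainties in the pseudo-data can be sketched as follows: one Gaussian random number is drawn per source and applied coherently to every bin, after which each bin is fluctuated according to a Poisson distribution. The data layout, the linear scaling of the 1σ shifts and the simple Poisson sampler are assumptions of this illustration; uncorrelated components would instead require one random number per bin.

```python
import math
import random

def poisson(rng, mu):
    """Knuth's algorithm; adequate for the small means used in this sketch."""
    limit = math.exp(-mu)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def pseudo_experiment(nominal, shifts, rng):
    """Generate pseudo-data for a list of bins.

    nominal: SM expectation per bin.
    shifts:  one list per uncertainty source, holding the 1-sigma shift per bin.
    """
    # One random value per source, shared by all bins (100% correlation).
    thetas = [rng.gauss(0.0, 1.0) for _ in shifts]
    counts = []
    for b, n in enumerate(nominal):
        mean = n + sum(t * s[b] for t, s in zip(thetas, shifts))
        counts.append(poisson(rng, max(mean, 0.0)))
    return counts
```

Calling `pseudo_experiment([5.0, 0.0, 2.0], [[0.5, 0.0, 0.2]], random.Random(1))` returns one set of pseudo-data counts for three bins with a single correlated source.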
Changing the size of the theoretical uncertainties by a factor of two leads to a change of less than 5% in the − log 10 ( p min ) thresholds at which a dedicated analysis is triggered. The correlation assumptions in the theoretical uncertainties were also tested. Figure 5 shows the effect of changing the correlation assumption for all theoretical shape uncertainties that are nominally taken as 100% correlated. This test decorrelates the bin-by-bin variations due to the theoretical shape uncertainties in the pseudo-data while retaining the correlation when summing over selected bins in the scan, thus testing the impact of an incorrect assumption in the correlation model. By comparing the nominal assumption of 100% correlation with a 50% correlated component, and a fully uncorrelated assumption, the threshold at which a dedicated analysis is triggered is changed by a negligible amount.

Sensitivity to Standard Model processes
The sensitivity of the procedure is evaluated with two different methods: either SM processes are removed from the background estimation, or signal contributions are added to the pseudo-data sample. As a figure of merit, the fraction of 'signal' pseudo-experiments with P exp,i < 5% for i = 1, 2, 3 is computed. Figure 6 shows how removing the WZ process from the background prediction affects the three smallest expected p channel -values. In Fig. 6a, b, the dashed curves show the nominal expected p channel distribution obtained from pseudo-experiments. These define the p min thresholds for which P exp,i < 5% and vertical dotted lines are drawn at the threshold values. The solid lines show the p channel distributions obtained by testing pseudo-experiments generated from the SM prediction against the modified background prediction which has the WZ diboson process removed. It can be observed that in this case the m eff scan is more sensitive.
Additionally, in Fig. 6a, b, the three smallest p channel -values observed in the data are shown by arrows, both when tested against the full SM prediction (dashed) and when tested against the modified prediction (solid). For all three cases (i = 1, 2, 3), P exp,i < 5% is found again. This means that a dedicated analysis would be performed for the three event classes in which the p channel -values are observed, i.e. 3μ, 1μ2e1 j, and 2μ1e1 j, likely resulting in the discovery of an unexpected signal due to WZ production. Figure 6c, d shows the m eff distributions of the data with the full SM prediction and the modified prediction respectively. This test uses the conclusion from Sect. 3.6 and is performed in retrospect. In the case of a significant deviation, this test would be performed with pseudo-data to assess the sensitivity of the search to a missing background. Figure 7 shows the effect of removing the tt + γ process.
Again the m eff scan is slightly more sensitive, and about 70% of 'signal' pseudo-experiments have P exp,i < 5% in all three cases i = 1, 2, 3. In the data, P exp,i < 5% is found again for all three cases (i = 1, 2, 3). A dedicated analysis would be performed for the three classes 1μ1γ 2b1 j, 1e1γ 2b2 j, and 1μ1γ 1b3 j, likely resulting in the discovery of an unexpected signal due to tt + γ production.
Fig. 8 The fraction of pseudo-experiments in which a deviation is found with a p channel -value smaller than a given p min . Distributions are shown for pseudo-experiments generated from the SM expectation (circular markers), and after injecting signals of (a) inclusive Z′ decays or (b) gluino pairs with $\tilde{g} \to t\bar{t}\tilde{\chi}^0_1$ decays and various masses. The lines corresponding to the injection of a Z′ boson with a mass of 4 TeV and a gluino with a mass of 1600 GeV overlap with the line obtained from the SM-only pseudo-experiments due to the small signal cross-section.
It is interesting to note that these discoveries would have been made without a priori knowledge of the existence of these processes. Figure 8 shows the sensitivity for the two benchmark signals considered as a function of the mass of the produced particle. For the Z′ model, where the mass of the resonance can be reconstructed from its decay products, the sensitivity to the signal is found to be the largest in the scan of the m inv distribution. Gluinos undergo a cascade decay process to the lightest neutralino, which is undetected and leads to missing transverse momentum. It is not possible to fully reconstruct an event from gluino pair production due to the presence of neutralinos in the final state. The sensitivity to the gluino signal is therefore found to be the largest in the m eff scan, where a broad excess at large values of this quantity is expected.
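The figure of merit used in this section can be sketched as follows. For each pseudo-experiment the i-th smallest p channel -value is taken; the p min threshold is the value below which 5% of the SM-only pseudo-experiments fall, and the sensitivity is the fraction of 'signal' pseudo-experiments at or below that threshold. The function names and the simple empirical-quantile convention are assumptions of this illustration.

```python
def ith_smallest(p_channels, i):
    """The i-th smallest p_channel-value of one pseudo-experiment (i = 1, 2, 3)."""
    return sorted(p_channels)[i - 1]

def p_min_threshold(sm_experiments, i, fraction=0.05):
    """Empirical threshold such that about `fraction` of SM-only pseudo-experiments
    have an i-th smallest p_channel-value at or below it."""
    values = sorted(ith_smallest(p, i) for p in sm_experiments)
    index = max(int(fraction * len(values)) - 1, 0)
    return values[index]

def sensitive_fraction(signal_experiments, threshold, i):
    """Fraction of 'signal' pseudo-experiments at or below the threshold."""
    hits = sum(ith_smallest(p, i) <= threshold for p in signal_experiments)
    return hits / len(signal_experiments)
```

By construction the SM-only pseudo-experiments pass the threshold in about 5% of cases, so any larger fraction obtained for modified pseudo-data measures the sensitivity to the removed or injected process.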

Sensitivity to new-physics signals
Exclusion and discovery sensitivity have to be carefully distinguished when the results of this search are compared with model-based searches. An exclusion sensitivity at the 95% CL in a dedicated search roughly corresponds to a single class having a p 0 -value for a discovery test smaller than 0.05. Consequently, the sensitivity to a benchmark signal corresponding to a given particle mass should be compared with the discovery sensitivity of other searches for a Z′ boson or gluino.
As previously described, a deviation for which P exp,i < 5% promotes the selection to a signal region for a dedicated analysis. By applying this sensitivity criterion, it can be seen in Fig. 8 that this search is sensitive to a Z′ boson with a mass of about 2.5 TeV as more than 90% of the signal-injected pseudo-experiments show a deviation for which P exp,1 < 5%. Similar sensitivity is expected for a gluino with a mass of about 1 TeV. The probability of discovering a new-physics signal in a new dataset with a dedicated search in the selected event classes is estimated in the next section.

Sensitivity of a second independent dataset
In step 7 a dedicated analysis of a deviation is performed on an independent dataset. The sensitivity of step 7 is evaluated with pseudo-experiments.
A first pseudo-experiment emulates the original dataset on which this analysis is performed. The scan algorithm is applied after which eight different cases can be distinguished where P exp,i is either larger or smaller than 5% for i = 1, 2, 3. In seven cases at least one P exp,i < 5% and a new independent pseudo-experiment is generated to emulate a new independent dataset with the same integrated luminosity. The one, two or three data selections for which P exp,i < 5% are applied to the second pseudo-experiment to obtain the p 0 -values for these selections. Although the systematic uncertainties may be reduced by applying data-driven estimates of the background, they were assumed to have the same size in the second pseudo-experiment to make a conservative estimate. The systematic uncertainties are also expected to be partially correlated between two datasets but here they were assumed to be uncorrelated.
In four of the seven cases P exp,1 < 5% and these cases are grouped together into a 'one signal region' class. This class shows the sensitivity when there would be only a single data-derived signal region. The case where only P exp,3 < 5% is called the 'three signal region' class. The two remaining cases where P exp,2 < 5% define the 'two signal region' class. These classes show cases when a data-derived signal region is found only by a combination of two and three regions.
Fig. 9 The fraction of cases in which the first pseudo-experiment has P exp,i < 5% and triggers a second pseudo-experiment which yields a value for $-\sum_{k=1}^{n}\log_{10}(p_{0k})$ smaller than a given value ($-\sum_{k=1}^{n}\log_{10}(p_{\mathrm{min},k})$). Here n denotes the minimum number of event classes (1, 2 or 3) for which P exp,i < 0.05. The distribution is shown for pseudo-experiments generated from the SM expectation, and from an SM-plus-signal expectation of (a) inclusive decays of a Z′ boson with mass $m_{Z'} = 2.5$ TeV or (b) gluino pairs with mass $m_{\tilde{g}} = 1.0$ TeV and $\tilde{g} \to t\bar{t}\tilde{\chi}^0_1$ decays. The signal region tested in the follow-up pseudo-experiment is defined by the preceding pseudo-experiment. The 5σ thresholds are obtained by extrapolating the SM-only fractions to $5.7 \times 10^{-7}$ and are indicated at the top of the figure for n = 1 (left), n = 2 (middle) or n = 3 (right) event classes.
For each of the three classes (n = 1, 2, 3) the statistical estimator $\prod_{k=1}^{n} p_{0k}$ is computed. Figure 9 shows, as a function of the estimator in the logarithmic form $-\sum_{k=1}^{n}\log_{10}(p_{0k})$, the fraction of cases in which the first pseudo-experiment has P exp,i < 5% and triggers a second pseudo-experiment which yields a value for $-\sum_{k=1}^{n}\log_{10}(p_{0k})$ above a threshold given by the value on the horizontal axis. This is done for pseudo-experiments generated from the SM expectation (SM-only) and for pseudo-experiments generated from the SM expectation plus Z′ or gluino signal contributions. The 5σ lines are derived from the fractions given by the SM-only lines, as these correspond to the probability of false positives which defines the level of significance. It should be noted that the SM-only lines with circular markers start at a fraction of 0.05 by construction of the P exp,i < 5% definition. The n = 2 and n = 3 lines show the gain in sensitivity when a deviation in one or two channels, respectively, is not large enough to define a data-derived signal region. It does not show the gain in sensitivity from considering multiple channels when a single channel defines a signal region. Signals which produce one or more large deviations therefore lower the number of cases in the n = 2 and 3 categories, while signals producing deviations close to the P exp,i < 5% threshold (e.g. for higher Z′ masses) would raise the number of cases in the n = 2 and 3 categories. A Z′ boson with a mass of 2.5 TeV would yield a discovery in almost all cases.
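The grouping into n = 1, 2, 3 classes and the logarithmic estimator can be sketched as follows. The function names are hypothetical. The last function reflects the observation that the value 5.7 × 10⁻⁷ quoted for the 5σ thresholds is numerically equal to the two-sided Gaussian tail probability beyond five standard deviations; reading it that way is an interpretation made for this illustration.

```python
import math

def signal_region_class(passes):
    """passes[i-1] is True if P_exp,i < 5%; return n = 1, 2 or 3, or None if no case passes."""
    for n, flag in enumerate(passes, start=1):
        if flag:
            return n
    return None

def combined_estimator(p0_values):
    """-sum_k log10(p0k), i.e. minus the base-10 logarithm of the product of the p0-values."""
    return -sum(math.log10(p) for p in p0_values)

def two_sided_p_value(n_sigma):
    """Two-sided Gaussian tail probability beyond n_sigma standard deviations."""
    return math.erfc(n_sigma / math.sqrt(2.0))
```

Enumerating the eight possible outcomes with `signal_region_class` reproduces the grouping described in the text: four cases with n = 1, two with n = 2, one with n = 3 and one in which no follow-up pseudo-experiment is generated.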
In the case of a 1.0 TeV gluino the sensitivity is about 5σ . The sensitivity increases to about 1.1 TeV if the integrated luminosity of the two datasets combined is increased to about 10 fb −1 by doubling the size of the second dataset to 6.4 fb −1 . ATLAS has determined the discovery sensitivity of the dedicated searches for gluinos decaying to quarks, a W boson and a neutralino. This dedicated search estimates a local significance (i.e. not corrected for trial factors of the dedicated searches) of 5σ with a luminosity of 10 fb −1 for gluinos with a mass of 1.35 TeV assuming a systematic uncertainty of 25% [51].
It should be noted that, with this strategy, these signals are found without any a priori assumptions about the model, including the mass and the decay chain of the gluinos or the Z′ boson. It can therefore be concluded that this procedure could also be sensitive to possible unexpected signals for new physics.

Step 6: Results
In step 6 the p channel -values found in the analysis of the 2015 ATLAS data are interpreted by comparing them with the p channel -values found in the pseudo-experiments. Figure 10 shows the fractions of pseudo-experiments that have at least one, two or three p channel -values below a given threshold ( p min ) in the scans of the m inv and the m eff distributions. The statistical tests in both distributions for the three leading p channel -values are all consistent at the P exp,i > 50% level with the SM expectation of p channel -values obtained from pseudo-experiments. Changing the size of the theoretical shape uncertainties by a factor of two leads to a change in the three smallest p channel -values of a factor of two. It therefore does not lead to an appreciable change in the result.
In conclusion, no significant deviations are found in the 2015 dataset and consequently no dedicated analysis using data-derived signal regions (step 7) is initiated.

Conclusions
A strategy for a model-independent general search to find potential indications of new physics is presented. Events are classified according to their final state into many event classes. For each event class an automated search algorithm tests whether the data are compatible with the Monte Carlo simulated Standard Model expectation in several distributions sensitive to the effects of new physics. For each distribution the search algorithm is repeated on many pseudo-experiments to make a frequentist estimate of the statistical significance of the three largest deviations. A data selection in which a significant deviation is observed defines a data-derived signal region which will be tested on a new dataset in a dedicated analysis with an improved background model.
The strategy has been applied to the data collected by the ATLAS experiment at the LHC during 2015, corresponding to a total of 3.2 fb −1 of 13 TeV pp collisions. In this dataset, exclusive event classes containing electrons, muons, photons, b-tagged jets, non-b-tagged jets and missing transverse momentum have been scanned for deviations from the MC-based SM prediction in the distributions of the effective mass and the invariant mass. Sensitivity studies with various toy signals (tt + γ , WZ, gluino, and Z′ production) have shown that the strategy could discover signals for new physics without an a priori knowledge of the existence of the processes.
No significant deviations are found in the 2015 dataset and consequently no dedicated analysis using data-derived signal regions is performed. The strategy discussed in this paper will be useful to search for signals of unknown particles and interactions in the subsequent Run 2 datasets.

Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors' comment: "All ATLAS scientific output is published in journals, and preliminary results are made available in Conference Notes. All are openly available, without restriction on use by external parties beyond copyright law and the standard conditions agreed by CERN. Data associated with journal publications are also made available: tables and data from plots (e.g. cross section values, likelihood profiles, selection efficiencies, cross section limits, ...) are stored in appropriate repositories such as HEPDATA (http://hepdata.cedar.ac.uk/). ATLAS also strives to make additional material related to the paper available that allows a reinterpretation of the data in the context of new theoretical models. For example, an extended encapsulation of the analysis is often provided for measurements in the framework of RIVET (http://rivet.hepforge.org/)." This information is taken from the ATLAS Data Access Policy, which is a public document that can be downloaded from http://opendata.cern.ch/record/413.]
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Funded by SCOAP$^{3}$.

A.1 Monte Carlo programs and settings
Samples of multijet production were simulated with 2 → 2 matrix elements (ME) at leading order (LO) using the Pythia 8.186 generator [17]. The A14 [25] set of shower and multiple-parton-interaction parameters (tune) was used together with the NNPDF2.3LO PDF set [24]. Alternative multijet samples with 2 → 2 ME at LO were generated with Herwig++ 2.7.1 [53] with the UEEE5 underlying-event tune and the CTEQ6L1 [33] PDF set, and with Sherpa 2.1.1 [20] with ME for 2 → 2 and 2 → 3 partons at LO merged using the ME+PS@LO prescription. All Sherpa samples use the CT10 [54] PDF set and the Sherpa parton shower [55] with a dedicated shower tuning developed by the Sherpa authors.
Events containing leptonic decays of a $W$ or $Z$ boson with associated jets ($W/Z$ + jets) were simulated using the Sherpa 2.1.1 generator [56]. Matrix elements were calculated using the Comix [57] and OpenLoops [58] generators. They include up to two partons at NLO and four partons at LO, merged using the ME+PS@NLO prescription [59]. Samples with $W$ and $Z$ bosons decaying hadronically were also generated with Sherpa 2.1.1, including up to four partons at LO. The $W/Z$ + jets events were normalized to their inclusive next-to-next-to-leading-order (NNLO) cross-sections [60,61]. Simulated samples of massive vector bosons produced in association with one or two real photons were also generated with Sherpa 2.1.1 with a ME calculated at LO for up to three partons. They are scaled to their NLO cross-sections computed with MCFM [62,63].
Samples of prompt photon production in association with jets ($\gamma$ + jets) were generated using Sherpa 2.1.1. For these samples up to four real parton emissions are included at LO. Events containing two prompt photons ($\gamma\gamma$ + jets) were also generated with Sherpa 2.1.1. Matrix elements were calculated with up to two partons at LO. The gluon-induced box process is also included. These samples were scaled to data following the procedure described in Sect. 3.2.3.
Top-quark pair production events, and single top quarks in the $Wt$- and $s$-channels, were simulated using the Powheg-Box v2 [64] generator with the CT10 PDF set, as detailed in Ref. [65]. The top quark mass was set to 172.5 GeV. The $h_{\text{damp}}$ parameter, which regulates the transverse momentum of the first extra emission beyond the Born configuration and thus controls the $p_{\text{T}}$ of the $t\bar{t}$ system, was set to the mass of the top quark. Electroweak $t$-channel single-top-quark events were generated using the Powheg-Box v1 generator. This generator uses the four-flavour scheme for the NLO matrix-element calculations together with the fixed four-flavour PDF set CT10f4. For all top-quark processes, top-quark spin correlations are preserved (for the single-top $t$-channel, top quarks were decayed using MadSpin [66]). An alternative sample of $t\bar{t}$ was generated with the Sherpa 2.1.1 generator, including up to one additional parton at NLO and up to four additional partons at LO accuracy, interfaced to the parton shower using the ME+PS@NLO prescription. The parton shower (PS), fragmentation, and the underlying event of the Powheg-Box samples were simulated using Pythia 6.428 [67] with the CTEQ6L1 PDF set and the corresponding Perugia 2012 tune (P2012) [68]. The $t\bar{t}$ and single-top-quark events were normalized to the NNLO cross-section including the resummation of soft gluon emission at next-to-next-to-leading-logarithm (NNLL) accuracy [69][70][71] using Top++2.0 [72]. Both the default and the alternative $t\bar{t}$ samples were corrected to reproduce the NNLO prediction [73,74] of the top quark $p_{\text{T}}$ and the $p_{\text{T}}$ of the $t\bar{t}$ system. The contribution of $t\bar{t} + b\bar{b}$ was generated separately with Sherpa 2.1.1 at NLO; the calculation was performed in the four-flavour scheme and with the CT10f4 PDF set.
Diboson samples were generated with the Sherpa 2.1.1 generator, and are described in Ref. [75]. The matrix elements contain the $WW$, $WZ$ and $ZZ$ processes and all other diagrams with four or six electroweak vertices (such as same-electric-charge $W$ boson production in association with two jets, $W^{\pm}W^{\pm}jj$). Fully leptonic triboson processes ($WWW$, $WWZ$, $WZZ$ and $ZZZ$) with on-shell bosons and up to six charged leptons were also simulated using Sherpa 2.1.1. The ME for the $ZZ$ processes were calculated at NLO for up to one additional parton; final states with two and three additional partons were calculated at LO. The $WZ$ and $WW$ processes were calculated at NLO with up to three extra partons at LO using the ME+PS@NLO prescription. The $WW$ final states were generated without bottom quarks in the hard-scattering process, to avoid contributions from top-quark-mediated processes. The triboson processes were calculated with the same configuration and with up to two extra partons at LO. The generator cross-sections were used for the normalization of these backgrounds.
Samples of top quark production in association with vector bosons [76] ($W$, $Z$, $\gamma$ and $WW$, including the non-resonant $\gamma^{*}/Z$ contributions) were generated at LO with MG5_aMC@NLO 2.2.2 [32] interfaced to Pythia 8.186, with up to two ($t\bar{t}W$), one ($t\bar{t}Z$) or no ($t\bar{t}WW$, $t\bar{t}\gamma$) extra partons included in the matrix element. The A14 tune was used together with the NNPDF2.3LO PDF set. The $t\bar{t}\gamma$ sample uses a fixed QCD renormalization and factorization scale of $2m_{t}$, and the top decay was performed in MG5_aMC@NLO to account for hard photon radiation from the top decay products. The same generator was also used to simulate the $tZ$, 3-top and 4-top quark processes. The $t\bar{t}W$, $t\bar{t}Z$, $t\bar{t}WW$ and 4-top samples were normalized to their NLO cross-sections [32], while the LO cross-section from the generator was used for $tZ$ and 3-top quarks.
The Higgs boson mass was set to 125 GeV and all SM Higgs boson decay modes were considered. The production of the SM Higgs boson in the gluon-gluon fusion (ggF) and vector-boson fusion (VBF) channels was modelled using the Powheg-Box v2 generator with the CT10 PDF set. It was interfaced to Pythia 8.186 with the CTEQ6L1 PDF set and the AZNLO tune [77]. Production of a Higgs boson in association with a pair of top quarks was simulated using MG5_aMC@NLO 2.2.2 interfaced to Herwig++ 2.7.1 [78] for showering and hadronization. The UEEE5 underlying-event tune was used together with the CT10 (matrix element) and CTEQ6L1 (parton shower) PDF sets. Simulated samples of SM Higgs boson production in association with a $W$ or $Z$ boson were produced with Pythia 8.186, using the A14 tune and the NNPDF2.3LO PDF set. Events were normalized to their most accurate cross-section calculations (typically NNLO) [79].
To avoid double counting, events with a hard photon from final-state radiation were removed from the multijet, $t\bar{t}$ and $W/Z$ + jets samples.

A.2 Theoretical uncertainties
The inclusive $W$ and $Z$ cross-sections are known at NNLO, with an uncertainty of about 5% [60,61]. Modelling uncertainties for $W$ + jets and $Z$ + jets are determined by varying the renormalization, factorization and resummation scales in the ME by factors of 0.5 and 2, together with a change of the merging scale from 20 GeV to 15 GeV or 30 GeV.
For top quark pair or single-top production, processes known to NNLO+NNLL [72] or approximate NNLO [69][70][71], respectively, the cross-section uncertainty is 7%. The modelling uncertainty for $t\bar{t}$ is determined by comparing the nominal Powheg+Pythia NLO+PS sample with an alternative sample generated with Sherpa including up to two partons at NLO and four at LO accuracy in the ME. The single-top-quark uncertainty is estimated by varying the renormalization and factorization scales, and by changing the $h_{\text{damp}}$ parameter and the shower tune of the Powheg+Pythia sample. An uncertainty in the interference between $Wt$ and $t\bar{t}$ production is estimated by comparing the nominal $Wt$ sample, where all doubly resonant NLO $Wt$ diagrams are removed, with a sample where the cross-section contribution from Feynman diagrams containing two top quarks is subtracted [80].
Diboson cross-sections ($WW$, $WZ$ and $ZZ$) are calculated at NLO, and a 6% uncertainty, evaluated with the MCFM program [62,63], is applied to their cross-sections. Their modelling uncertainty is evaluated analogously to $V$ + jets by varying the scales used to perform the calculation. For $W + \gamma$ and $Z + \gamma$ samples, which are computed at LO, a 20% uncertainty in the cross-section is assumed, with a further 20% modelling uncertainty assigned in accord with the measurement in Ref. [81].
The cross-sections for top-quark pair production in association with one or two vector bosons are calculated at NLO and an uncertainty of 15% is used [32]. Their modelling uncertainty is evaluated from variations of the renormalization and factorization scales, together with a change in the merging scale. For $t\bar{t} + \gamma$, an additional 12% uncertainty in the normalization is assumed, while a uniform 30% uncertainty is assigned to the modelling [76].
Multijet and $\gamma$ + jets processes are scaled to data following the procedure described in Sect. 3.2.2. Therefore, no uncertainty is applied to their normalization. For multijets the maximum bin-by-bin difference between the Pythia 8 nominal sample and alternative samples generated with Sherpa and Herwig++ is taken as a shape uncertainty. In addition, the standard deviation of the 100 replica sets of the NNPDF2.3LO PDF is used. The modelling uncertainty for $\gamma$ + jets is estimated from scale variations with the same methodology as for the $V$ + jets samples. The uncertainty in the $\gamma\gamma$ + jets modelling is instead taken to be 30% from parton-level comparisons of samples with varied scales.
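The two multijet shape uncertainties described above can be sketched as follows. This is a minimal illustration that assumes the nominal, alternative and PDF-replica predictions are available as binned yields with identical binning. The function names and toy yields are hypothetical, and the use of the unbiased (N − 1) estimator for the replica standard deviation is an assumption.

```python
import numpy as np


def max_binwise_difference(nominal, alternatives):
    """Per-bin shape uncertainty: largest absolute difference between the
    nominal histogram and any of the alternative histograms."""
    nominal = np.asarray(nominal, dtype=float)
    alternatives = np.atleast_2d(np.asarray(alternatives, dtype=float))
    return np.abs(alternatives - nominal).max(axis=0)


def replica_std(replica_histograms):
    """Per-bin standard deviation over PDF replica histograms.

    ddof=1 (unbiased estimator) is an assumption of this sketch.
    """
    return np.asarray(replica_histograms, dtype=float).std(axis=0, ddof=1)


# Hypothetical toy yields in three bins of one distribution.
nominal_yield = [100.0, 50.0, 10.0]
sherpa_yield = [110.0, 45.0, 12.0]
herwig_yield = [95.0, 58.0, 9.0]
generator_shape_unc = max_binwise_difference(
    nominal_yield, [sherpa_yield, herwig_yield]
)

# Hypothetical toy replicas (3 instead of 100) in two bins.
toy_replicas = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
pdf_unc = replica_std(toy_replicas)
```

How the two components are combined into the total uncertainty band is not specified in this appendix, so the sketch returns them separately.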
A conservative uncertainty of 20% [79] is assumed for Higgs production in the ggF, VBF and $VH$ channels. A further uncertainty of 20% is assigned as a shape uncertainty.
For the subdominant triboson (including $V + \gamma\gamma$), $t\bar{t}H$, 3-top, 4-top and $tZ$ production processes a 50% uncertainty is assigned to the event yields, similarly to Ref. [82]. Uncertainties associated with PDFs are found to be small in all channels compared to the modelling uncertainties of the MC simulations.

B Details of the object reconstruction
Electron candidates are reconstructed from an isolated electromagnetic calorimeter energy deposit matched to an ID track and are required to have $|\eta| < 2.47$, a transverse momentum $p_{\text{T}} > 10$ GeV, and to pass a loose likelihood-based identification requirement [44,83]. The likelihood input variables include measurements of calorimeter shower shapes and measurements of track properties from the ID. The candidate electrons are selected if the matched tracks have a transverse impact parameter significance relative to the reconstructed primary vertex of $|d_{0}|/\sigma(d_{0}) < 5$. Candidates within the transition region between the barrel and endcap electromagnetic calorimeters, $1.37 < |\eta| < 1.52$, are removed.
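The electron candidate requirements listed above can be summarised in a short predicate. This is a minimal sketch: the `ElectronCandidate` container and its field names are hypothetical, and the loose likelihood decision is assumed to be supplied as a precomputed flag rather than evaluated here.

```python
from dataclasses import dataclass


@dataclass
class ElectronCandidate:
    """Hypothetical container for the quantities used in the selection."""
    pt_gev: float                  # transverse momentum in GeV
    eta: float                     # pseudorapidity
    d0_significance: float         # |d0| / sigma(d0) w.r.t. the primary vertex
    passes_loose_likelihood: bool  # precomputed identification flag (assumed)


def is_electron_candidate(e: ElectronCandidate) -> bool:
    """Apply the kinematic, identification and impact-parameter requirements."""
    abs_eta = abs(e.eta)
    in_acceptance = abs_eta < 2.47
    in_transition_region = 1.37 < abs_eta < 1.52  # barrel-endcap gap, removed
    return (
        in_acceptance
        and not in_transition_region
        and e.pt_gev > 10.0
        and e.passes_loose_likelihood
        and abs(e.d0_significance) < 5.0
    )
```

For example, a candidate with $p_{\text{T}}$ = 30 GeV, $\eta$ = 1.45 and a passing identification flag would be rejected by this sketch because it falls in the transition region.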
Muon candidates are reconstructed in the region $|\eta| < 2.7$ from muon spectrometer tracks matched to ID tracks. The muon candidates are selected if they have a transverse momentum above 10 GeV and pass the medium identification requirements defined in Ref. [43], based on selections on the number of hits in the different ID and muon spectrometer subsystems, and the significance of the charge-to-momentum ratio $q/p$.
All candidate leptons (electrons and muons) are used for the object overlap removal, as discussed in Sect. 3.1.3. Tighter requirements are then imposed on the lepton candidates; those that pass are referred to as 'signal' electrons or muons and are used further in the analysis, i.e. to establish the accuracy of the background modelling processes or to classify the events. Signal electrons must satisfy a tight likelihood-based identification requirement [44,83]. Signal muons must fulfil the requirement of $|d_{0}|/\sigma(d_{0}) < 3$. The track associated with a signal lepton must have a longitudinal impact parameter relative to the reconstructed primary vertex, $z_{0}$, satisfying $|z_{0} \cdot \sin\theta| < 0.5$ mm. Isolation requirements are applied to both the signal electrons and muons. The calorimeter isolation is computed as the sum of the energies of calorimeter energy clusters in a cone of size $\Delta R = 0.2$ around the lepton. Track isolation is defined as the scalar sum of the $p_{\text{T}}$ of tracks within a variable-size cone around the lepton: the cone size is $\Delta R = 0.2$ (0.3) for electron (muon) transverse momenta $p_{\text{T}} < 50$ GeV ($p_{\text{T}} < 33$ GeV) and $\Delta R = 10\ \text{GeV}/p_{\text{T}}$ for $p_{\text{T}} > 50$ GeV ($p_{\text{T}} > 33$ GeV). The efficiency of these criteria increases with the lepton transverse momentum, reaching 95% at 25 GeV and 99% at 60 GeV, as determined in a control sample of $Z$ decays into leptons selected with a tag-and-probe technique [43,44]. Corrections are applied to the MC samples to match the leptons' trigger, reconstruction and isolation efficiencies in data.
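The variable-size cone used for the track isolation can be written compactly as the smaller of a fixed maximum cone size and 10 GeV divided by the lepton transverse momentum. The sketch below illustrates this; the function names are hypothetical, and the treatment of the input tracks (a list of transverse momenta and angular distances, with the lepton's own track already excluded) is an assumption.

```python
def track_isolation_cone(pt_gev: float, flavour: str) -> float:
    """Cone size for track isolation: min(max cone, 10 GeV / pT).

    The maximum cone size is 0.2 for electrons and 0.3 for muons, so the
    cone starts to shrink above 50 GeV (electrons) or about 33 GeV (muons).
    """
    max_cone = {"electron": 0.2, "muon": 0.3}[flavour]
    return min(max_cone, 10.0 / pt_gev)


def track_isolation(lepton_pt_gev, flavour, tracks):
    """Scalar sum of track pT inside the variable-size cone.

    tracks: iterable of (track_pt_gev, delta_r_to_lepton) pairs; the lepton's
    own track is assumed to have been removed beforehand.
    """
    cone = track_isolation_cone(lepton_pt_gev, flavour)
    return sum(pt for pt, delta_r in tracks if delta_r < cone)
```

For a 100 GeV electron the cone shrinks to 0.1, so a nearby track at an angular distance of 0.15 that would count for a 25 GeV electron no longer contributes to the isolation sum.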
Photon candidates are reconstructed from an isolated electromagnetic calorimeter energy deposit and are required to satisfy the tight identification criteria described in Refs. [46,84]. Furthermore, photons are required to have $p_{\text{T}} > 25$ GeV and $|\eta| < 2.37$, excluding the barrel-endcap calorimeter transition in the range $1.37 < |\eta| < 1.52$. Photons must further satisfy isolation criteria based on both track and calorimeter information [46]. After correcting for contributions from pile-up, the energy within a cone of $\Delta R = 0.4$ around the cluster barycentre is required to be less than 2.45 GeV + 0.022 × $p_{\text{T}}$ …
See Figs. 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.
Fig. 16 The number of events in data, and for the different SM background predictions considered, for classes with three or four leptons and (b-)jets (no photons or $E_{\text{T}}^{\text{miss}}$). The classes are labelled according to the multiplicity and type ($e$, $\mu$, $\gamma$, $j$, $b$, $E_{\text{T}}^{\text{miss}}$) of the reconstructed objects for the given event class. The hatched bands indicate the total uncertainty of the SM prediction
Fig. 20 The number of events in data, and for the different SM background predictions considered, for classes with large $E_{\text{T}}^{\text{miss}}$, at least one photon, leptons and (b-)jets. The classes are labelled according to the multiplicity and type ($e$, $\mu$, $\gamma$, $j$, $b$, $E_{\text{T}}^{\text{miss}}$) of the reconstructed objects for the given event class. The hatched bands indicate the total uncertainty of the SM prediction