Boosting mono-jet searches with model-agnostic machine learning

We show how weakly supervised machine learning can improve the sensitivity of LHC mono-jet searches to new physics models with anomalous jet dynamics. The Classification Without Labels (CWoLa) method is used to extract all the information available from low-level detector information without any reference to specific new physics models. For the example of a strongly interacting dark matter model, we employ simulated data to show that the discovery potential of an existing generic search can be boosted considerably.


Introduction
The LHC is a designated discovery machine. It delivered the discovery of a Standard Model-like Higgs particle [1,2] almost ten years ago and has also provided a wealth of precision measurements. However, new physics beyond the Standard Model (SM) has not yet been found in direct searches. This is in contrast to our hope that there is a link between LHC physics and dark matter. The LHC experiments will increase the amount of collected data in the coming years, and even more so in the high-luminosity phase of the LHC. This vast amount of data has to be scrutinized in all possible ways, in particular also expecting the unexpected.
Model-agnostic Machine Learning (ML) techniques have been shown to provide sensitivity to various new physics signatures. Completely unsupervised methods based on learning a representation of the data to find anomalies in a completely data-driven way have been introduced in Refs. [3][4][5][6][7][8][9]. Guiding ML methods with simulations while keeping an open mind to what new physics might look like has been used in Refs. [10][11][12][13][14]. It is the ultimate vision to have a model-agnostic ML algorithm that uncovers new physics even if it has never been considered by any human physicist. However, there is a long way to go. Being able to tag potential signal events, e.g. with autoencoders or any other anomaly search algorithm, does not mean that one can make a statistically meaningful discovery [15] if there is no strategy to compare to the SM expectation. Only recently, first steps have been made to develop such a general, complete strategy [16].
It is at the heart of any LHC new physics search to compare the background expectation with a measurement. If the background expectation can be measured in a control region (usually involving some theory assumptions), only data needs to be examined for potential discoveries. ML methods have been proposed to increase the sensitivity of searches employing control regions [17][18][19][20][21][22][23][24][25][26][27][28] and applied to experiment [29]. In a traditional resonance search or bump hunt, one simply counts events in side bands and signal regions and looks for deviations from a smooth distribution. In such a setup, model-agnostic ML is particularly promising since it can go far beyond counting events. ML can discover patterns in the signal region which are absent in the control regions, e.g. investigating jet substructure in the dijet invariant mass spectrum. To examine the potential of ML in this context, two community challenges have recently collected many interesting ideas [30,31].
In this work we employ the Classification Without Labels (CWoLa) method [18], which is designed to uncover any difference between a control and a signal region. A supervised classifier is trained to tag any event as belonging to the signal or the control region. If the control and the signal region have identically distributed features in the absence of new physics, this is supposed to be an impossible task. The events tagged as most signal-like will then be equally distributed between the control and the signal region. However, if there is a different signature in the signal region due to new physics, the supervised classifier will recognize it and tag it as most signal-like. Moreover, the method can provide a statistically meaningful discovery. Labels for new physics events, which are unavailable in real data, are not needed. Only a control region is needed which contains fewer new physics events than the signal region.
CWoLa has been successfully used for bump hunts [20,24] and combined with density estimation to improve the sensitivity of the corresponding searches [22]. In the present work, we go beyond bump hunting. We show how to improve mono-jet searches at the LHC using CWoLa. The dominant background in a mono-jet search stems from invisibly decaying weak vector bosons. Hence, events with visibly decaying vector bosons naturally provide a suitable control region. A standard mono-jet search is a cut-and-count search, where features such as the structure of the observed jets do not play a role. Looking for any difference in the jet structure can boost the sensitivity to models with modified jet dynamics. This has been shown for a specific model using supervised ML for example in Ref. [32]. Here, we show how the use of CWoLa improves the sensitivity to differences in jet structure in a model unspecific way. We demonstrate the general idea following the most recent ATLAS mono-jet search [33]. The discovery potential for new physics is illustrated employing a specific dark matter model with a strongly interacting dark sector [32,34], highlighting the potential as well as the limitations of the method. In particular, we demonstrate that signal regions which are already limited by systematic uncertainties in a standard search, gain sensitivity using model-agnostic ML and will profit in particular from the high luminosity phase at the LHC.
This work is organised as follows: In Sec. 2, we introduce the ATLAS mono-jet search and describe the simulation of all the required data to be used in the following investigations, i.e. the SM backgrounds and the strongly interacting dark-matter model used as an example. Sec. 3 summarizes the general CWoLa idea, before the specific CWoLa setup for the mono-jet search is described in Sec. 4. In Sec. 5, we highlight the discovery potential of the method under simplifying assumptions, before we scrutinize and reinforce our findings in Sec. 6. We conclude in Sec. 7 and refer to Appendices A and B for more details on the setup and the results, respectively.

Setup
The strategy to search for new physics as outlined in this paper is not tied to a specific mono-jet analysis, and it might also be promising for searches with a well-defined control region beyond a mono-jet analysis. However, for concreteness, we follow the ATLAS mono-jet search in Ref. [33]. In this search, events with $E_T^{\mathrm{miss}} > 200$ GeV and a leading anti-$k_T$ jet with jet radius 0.4, $p_T^{\mathrm{jet}} > 150$ GeV and $|\eta^{\mathrm{jet}}| < 2.4$ are analysed using an integrated luminosity of 139 fb$^{-1}$ of LHC Run 2 in several inclusive and exclusive $E_T^{\mathrm{miss}}$ regions. At most three additional jets with $p_T^{\mathrm{jet}} > 30$ GeV and $|\eta^{\mathrm{jet}}| < 2.8$ are allowed in an event. Events with identified leptons are vetoed. We focus on the inclusive signal region IM1 with $E_T^{\mathrm{miss}} > 250$ GeV and $\Delta\phi(p_T^{\mathrm{jet}}, p_T^{\mathrm{miss}}) > 0.4$ for all jets. For this region more than $10^6$ events have been recorded. The choice of signal region IM1 is motivated by the fact that the CWoLa method benefits from large amounts of training data and will in particular improve searches which are systematics limited. In the high-luminosity phase of the LHC, our machine-learning approach will become more and more applicable also at higher $E_T^{\mathrm{miss}}$ thresholds. Other machine learning applications for improving the sensitivity of mono-jet searches can be found in Refs. [35][36][37].

SM backgrounds
The SM background in the signal region IM1 consists mainly of Z+jet production where the Z-boson decays invisibly into neutrinos (61%), followed by W+jet production where the W-boson decays invisibly, i.e. it decays leptonically and the charged lepton is not identified (31%). Smaller backgrounds are associated with top-quark production where one top quark decays leptonically (3.5%) and diboson production where one boson decays invisibly (2%).
To reduce the systematic errors of the theory prediction, the background estimate of the ATLAS search [33] uses dedicated control samples where charged leptons from the Z-boson or W-boson decays are identified. The control region for IM1 is defined by the same cuts as the signal region, but the cut on $E_T^{\mathrm{miss}}$ is replaced by a cut on the recoil momentum $p_T^{\mathrm{recoil}}$, where $p_T^{\mathrm{recoil}} = p_T^{\ell} + p_T^{\mathrm{miss}}$ and $p_T^{\ell}$ is the transverse momentum of the identified leptons (one for W-boson decays, two for Z-boson decays). Whether an event belongs to the signal or the control region therefore only depends on the decay products of the vector boson. We make use of this control sample in our CWoLa approach as discussed in Sec. 4.
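As an illustration of the recoil variable, the following minimal sketch computes the magnitude of the recoil transverse momentum, assuming that the sum in its definition is a vector sum in the transverse plane (the usual convention; the text above does not spell this out). The function name `recoil_pt` and its arguments are hypothetical and not taken from the ATLAS analysis code.

```python
import numpy as np

def recoil_pt(met_pt, met_phi, lep_pt, lep_phi):
    """Magnitude of the transverse vector sum of the missing momentum
    and all identified charged leptons (one for W, two for Z decays).

    Assumption: the recoil is a 2D vector sum in the transverse plane.
    """
    lep_pt = np.atleast_1d(np.asarray(lep_pt, dtype=float))
    lep_phi = np.atleast_1d(np.asarray(lep_phi, dtype=float))
    # Sum the x and y components of the missing momentum and the leptons.
    px = met_pt * np.cos(met_phi) + np.sum(lep_pt * np.cos(lep_phi))
    py = met_pt * np.sin(met_phi) + np.sum(lep_pt * np.sin(lep_phi))
    return float(np.hypot(px, py))
```

For an event with collinear missing momentum (100 GeV) and one lepton (150 GeV), `recoil_pt(100.0, 0.0, [150.0], [0.0])` gives 250 GeV, so the event would sit right at the control-region threshold.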
Since we do not have access to LHC data, we investigate our CWoLa based approach using Monte-Carlo simulations. All events are simulated at 13 TeV center-of-mass energy using MadGraph5 [38] for the hard process, Pythia8.2 [39] as a parton shower, Delphes3 [40] for a basic detector simulation using the ATLAS Card, and FastJet [41] for jet-clustering, with the default settings for all tools.
The CWoLa method is sensitive to differences in jet structure, comparing the jets in the control and the signal region. The leading QCD jets in the Z+jet and W+jet background processes are generated through initial-state radiation (ISR), so their structure is determined by QCD dynamics and independent of the decay channel and the decay dynamics of the vector boson. If the Z boson in the Z+jet process decays into neutrinos or charged leptons, the corresponding ISR QCD jet belongs to the signal or the control region, respectively. Also for W+jet production, the leading jets from ISR can populate the signal or the control region, depending on whether the charged lepton from the W decay is identified or not. As the structure of the ISR jets is the same for all Z+jet and W+jet background processes, we simulate only Z+jet production followed by a Z decay into neutrinos and use the corresponding jets for both the signal and the control region. These events are the basis for the discussion in Sec. 5 where we show the applicability of the CWoLa method in a simplified setup.
As discussed above, the signal region also contains smaller admixtures from top-quark and diboson production. Fat jets (with a jet radius $R = 0.8$, see Sec. 4) in those processes might have a multi-prong structure since they can contain more than one subjet emerging from the decay products of a top quark or a vector boson. Since the CWoLa method tags those jets as different if they are not present in the control region as well, they have to be treated with some care as discussed in Sec. 6. To keep things simple, we only simulate $t\bar{t}$ production in the semi-leptonic channel, where the transverse momentum of the leptonically decaying W-boson has to fulfill $p_T^{\mathrm{recoil}} > 250$ GeV. The leading jets from those $t\bar{t}$ events can populate the signal and the control region, depending on whether the charged lepton is identified or not. Concerning the diboson events, we simulate $W(\to \mathrm{jets})\,Z(\to \nu\bar{\nu})$ events in the IM1 region. Again, we assume that a more realistic composition of the diboson events in the signal and control regions would not qualitatively change the jet characteristics of the sample we are interested in. Other backgrounds are strongly suppressed and are thus ignored in this work for the sake of simplicity.

New physics example: Strongly interacting dark matter model
The CWoLa strategy outlined in Sec. 3 and Sec. 4 is completely model agnostic. However, to show the discovery potential, we employ a specific new physics model. Since we are sensitive to modified jet dynamics, we use the strongly interacting dark matter model introduced in Ref. [34]. How the modified jet dynamics of this model can be used to improve a mono-jet analysis in a supervised setup has been discussed in Ref. [32]. Other machine learning applications to strongly interacting dark matter models can be found in Refs. [9,42,43]. For our choice of parameters, the model contains a heavy vector boson $Z'$ with mass $m_{Z'} = 2$ TeV interacting with Standard Model quarks with a coupling $g_d = 0.1$. It couples the Standard Model to a sector of dark quarks $q_d$ with a coupling $e_d = 0.4$. These quarks are also charged under a dark SU(3) gauge group with a confinement scale $\Lambda_d = 5$ GeV. Hence, being pair-produced in a $Z'$ decay, they shower and hadronize to form dark pions $\pi_d$ and dark rho mesons $\rho_d$ with masses $m_{\pi_d} = m_{\rho_d} = 5$ GeV. While the neutral rho mesons $\rho_d^0$ mix with the $Z'$ and decay back to SM quarks promptly, the other mesons are stable dark-matter candidates and escape the detector. Hence, the jets in these models are usually called semi-visible jets. On average the invisible fraction of the jet energy amounts to $r_{\mathrm{inv}} = 0.75$, leading to a specific modified jet structure. Furthermore, the dark jets differ from ordinary QCD jets due to the different running of the dark gauge coupling, the absence of heavy quarks in the shower and the presence of substructure from the decays of dark mesons. Because of the invisible fraction, the $Z'$ decays into dark quarks do not lead to a resonance in the dijet invariant mass. Moreover, a certain fraction of jets turns out to be completely invisible and populates the signal region of our mono-jet search.
We use the UFO [44,45] implementation of the model to simulate the pair-production of dark quarks starting with MadGraph5 and using the same tool chain as discussed in Sec. 2.1. The hidden-valley module of Pythia [46] is used to handle the dark showering, hadronisation and the decay of the $\rho_d^0$ mesons into SM quarks. The subsequent showering and hadronisation of these quarks as well as the detector simulation are performed as for the SM backgrounds. Here, we simulate events in the IM1 region. The same model parameters and the same tool chain have also been used to produce the Aachen benchmark data set introduced in Ref. [47].

The CWoLa method
Classification Without Labels (CWoLa) is based on the typical setup in high energy physics experiments: One defines a signal region (SR) and a control region (CR). The signal region is chosen to optimize the fraction $f^{\mathrm{SR}} = N_A^{\mathrm{SR}}/N_B^{\mathrm{SR}}$ for a certain class of models, where $N_A^{\mathrm{SR}}$ is the number of new physics or anomalous events/data instances and $N_B^{\mathrm{SR}}$ is the number of background events/data instances. The control region should be free of anomalous events or should at least contain a smaller fraction $f^{\mathrm{CR}} = N_A^{\mathrm{CR}}/N_B^{\mathrm{CR}}$ of anomalous events, i.e. $f^{\mathrm{CR}} < f^{\mathrm{SR}}$. The expected number of background events $N_B^{\mathrm{SR}}$ measured in the signal region is assumed to be known up to a certain relative error $\sigma$ which includes statistical as well as systematic uncertainties. The control region is used to minimize the systematic uncertainty using data-driven methods. For $f^{\mathrm{SR}} > \sigma$ the measurement is sensitive, e.g. for $f^{\mathrm{SR}} > 5\sigma$ one would expect a discovery.
There are features in each event that are used for the definition of the signal and the control region. CWoLa is based on the crucial assumption that there is an additional set of features such that background events from the signal and the control region are indistinguishable using only those features, i.e. the events in the signal and the control region are drawn from an identical probability distribution concerning this restricted feature space. These features will be called CWoLa features in the following.
In the CWoLa setup, a standard supervised binary classifier is trained on the CWoLa features which tags each event as belonging to the signal or the control region. These labels are available for real experimental data. No reference is made to labeling events as background or new physics events, which are the labels one would actually like to know, hence the name Classification Without Labels.
In the absence of new physics events and taking the above assumption for granted, the classifier has to fail because the task is impossible. The predicted labels cannot be better than random guessing. However, if there are anomalous events which are drawn from a different probability distribution and $f^{\mathrm{CR}} < f^{\mathrm{SR}}$, the classifier should assign a higher score for those anomalous events to belong to the signal region. Thus, for a given score threshold, the classifier will predominantly select anomalous events.
It is worth stressing the model-agnostic nature of this approach. Searching for a specific new physics model, a difference between the new physics and the background events in some feature will already be used to define the signal region to boost the sensitivity of a simple cut-and-count analysis. However, in this case one has sensitivity to this feature only. CWoLa instead is sensitive to all potential differences in the CWoLa features.
It has been shown that this setup leads to an optimal classifier [18], i.e. it can have the same performance as a supervised classifier working with events labeled as being anomalous or belonging to the background. However, this finding does not imply that the method is guaranteed to work in practice.
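The optimality statement can be made plausible with a short, standard argument, sketched here under the assumption that both regions are mixtures of the same two densities. The symbols $p_A$, $p_B$, $\phi^{\mathrm{SR}}$, $\phi^{\mathrm{CR}}$ and $L$ are introduced only for this illustration; $\phi$ denotes the fraction of anomalous events among all events of a region, which is a monotonic function of the ratio of anomalous to background events.

```latex
% Densities of the CWoLa features x in the two regions (mixture assumption):
\begin{align}
  p_{\mathrm{SR}}(x) &= \phi^{\mathrm{SR}}\, p_A(x) + (1-\phi^{\mathrm{SR}})\, p_B(x), \\
  p_{\mathrm{CR}}(x) &= \phi^{\mathrm{CR}}\, p_A(x) + (1-\phi^{\mathrm{CR}})\, p_B(x).
\end{align}
% The optimal SR-vs-CR classifier is a monotonic function of the likelihood ratio,
% which can be written in terms of L(x) = p_A(x)/p_B(x):
\begin{equation}
  \frac{p_{\mathrm{SR}}(x)}{p_{\mathrm{CR}}(x)}
  = \frac{\phi^{\mathrm{SR}} L(x) + 1 - \phi^{\mathrm{SR}}}
         {\phi^{\mathrm{CR}} L(x) + 1 - \phi^{\mathrm{CR}}},
  \qquad
  \frac{\partial}{\partial L}\,\frac{p_{\mathrm{SR}}}{p_{\mathrm{CR}}}
  = \frac{\phi^{\mathrm{SR}} - \phi^{\mathrm{CR}}}
         {\left(\phi^{\mathrm{CR}} L + 1 - \phi^{\mathrm{CR}}\right)^{2}} .
\end{equation}
% For phi^SR > phi^CR the derivative is positive, so ranking events by the
% SR-vs-CR likelihood ratio is equivalent to ranking them by L(x).
```

Since the derivative is positive whenever the signal region is richer in anomalous events, thresholding the ideal region classifier selects the same events as thresholding the ideal anomaly-versus-background classifier. The argument concerns the ideal classifier only and says nothing about how well a finite network trained on finite data approaches it.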
CWoLa has been shown to improve the sensitivity of bump-hunt searches [20]. In this context, the control region consists of the side bands of a specified signal region. For concreteness, consider a resonance search using dijet events. An interval in the dijet invariant mass is selected as signal region. It would be populated by new physics events if there was a resonance with a mass in the signal region decaying into a dijet final state. If the structure of new physics jets differs from QCD jets, the CWoLa method can boost the sensitivity of the search by only taking into account events with a classifier score beyond a certain threshold. However, the CWoLa features for the classifier have to be chosen with care since they might be correlated with the dijet mass which defines the signal region. Otherwise the assumption that background jets from the signal and the control region are indistinguishable using the CWoLa features is violated. The corresponding decorrelation of observables has been discussed in Ref. [24] and methods to improve on those limitations for bump hunts have also been suggested [22].

CWoLa for anomalous mono-jets
To improve an existing or a future mono-jet search, we suggest using the CWoLa setup in the following way: the signal region is defined as usual, being mainly based on missing transverse energy. As discussed in Sec. 2, we use the IM1 region of the most recent ATLAS mono-jet search [33] as a concrete example. The control region is defined by events where neutrinos are replaced by charged leptons (see also Sec. 2). Data in the control regions has been recorded in past experimental analyses for most of those backgrounds to control systematics. Details can also be found in Ref. [33].
As CWoLa features for the classifier, low-level information of the leading fat jet in each event is used. To find the leading fat jet, the constituents of the events in the IM1 region are reclustered into anti-$k_T$ jets with a jet radius $R = 0.8$. In our simulation-based studies, we use the 40 jet constituents with largest transverse momentum in the constituents branch of the Delphes output (see Sec. 2 for the simulation details). The assumption that the CWoLa features are uncorrelated with the definition of the signal and control regions, i.e. the leptonic decays of weak bosons, is physically sound: the evolution of the jets from initial-state radiation is driven by QCD dynamics and not influenced by the decay properties of a leptonically decaying weak boson which recoils against the jet. We have verified the independence of the jet structure with respect to the weak-boson decays in our simulation and simplify the simulation accordingly as detailed in Sec. 2. Replacing simulated by experimentally recorded data should be straightforward.
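The selection of the 40 hardest constituents can be sketched as follows. This is only an illustration: the feature layout (transverse momentum, pseudorapidity, azimuth), the zero padding for jets with fewer constituents, and the name `leading_constituents` are assumptions made here; the actual preprocessing is the one described in Appendix A.

```python
import numpy as np

def leading_constituents(pt, eta, phi, n_max=40):
    """Return the n_max highest-pT constituents of one jet as an
    (n_max, 3) array of (pt, eta, phi), zero-padded, plus a validity mask.

    Assumption: zero padding and this feature layout are illustrative only.
    """
    pt, eta, phi = (np.asarray(a, dtype=float) for a in (pt, eta, phi))
    # Indices of the constituents sorted by decreasing transverse momentum.
    order = np.argsort(-pt)[:n_max]
    feats = np.zeros((n_max, 3))
    feats[:len(order)] = np.stack([pt[order], eta[order], phi[order]], axis=1)
    # The mask marks real constituents so padded rows can be ignored later.
    mask = np.zeros(n_max, dtype=bool)
    mask[:len(order)] = True
    return feats, mask
```

A mask of this kind is a common way to let a point-cloud or graph network ignore padded entries.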
As the binary classifier, we use a Dynamic Graph Convolutional Neural Network (DGCNN) [48] which is based on the ParticleNet architecture [49] and has proven to be an extremely powerful jet tagger [50]. Architecture, preprocessing, and training are discussed in Appendix A. Note that instead any powerful supervised classification algorithm could be used, e.g. the recently proposed graph-based LundNet [51] or LorentzNet [52].
The score s ∈ [0, 1] of the trained classifier for each event can be interpreted as the probability of the event to belong to the signal region (see Appendix A for details). For few anomalous events, the output distribution for background events is expected to peak close to s = 0.5. As discussed in Sec. 3, anomalous events are expected to be tagged as belonging to the signal region.
Given the classifier score $s$ for each event, we choose a threshold $t$. For $s > t$, the event is selected as being potentially anomalous. The threshold $t$ is chosen such that one permille of the events in the control region is selected, i.e. $n^{\mathrm{CR}} = \epsilon^{\mathrm{CR}} N^{\mathrm{CR}}$ with $\epsilon^{\mathrm{CR}} = 0.001$. We will comment on this choice below. Here we always assume that there is only background in the control region. Hence, we work with a background rejection $1/\epsilon_B^{\mathrm{CR}} = 1/\epsilon^{\mathrm{CR}} = 1000$. For a given $1/\epsilon_B^{\mathrm{CR}}$, corresponding to a specific threshold $t$, the classifier selects a number of anomalous events $n_A^{\mathrm{SR}} = \epsilon_S^{\mathrm{SR}} N_A^{\mathrm{SR}}$ in the signal region, where $\epsilon_S^{\mathrm{SR}}$ is the signal efficiency and depends on $t$. In our weakly supervised setup, $\epsilon_S^{\mathrm{SR}}$ is unknown. In analogy, the classifier selects a certain number of background events from the signal region, $n_B^{\mathrm{SR}} = \epsilon_B^{\mathrm{SR}} N_B^{\mathrm{SR}}$. If the CWoLa assumptions hold, we have $\epsilon_B^{\mathrm{SR}} = \epsilon_B^{\mathrm{CR}} = 0.001$. If there are no anomalies in the signal region, the fraction of selected events from the signal region will be identical to that of the control region, i.e. the expected number of selected events is $n_{\mathrm{exp}}^{\mathrm{SR}} = \epsilon_B^{\mathrm{CR}} N^{\mathrm{SR}}$. This is the Null hypothesis. If $n_{\mathrm{exp}}^{\mathrm{SR}}$ and $n^{\mathrm{SR}}$ differ only as expected from statistical fluctuations, no indication of new physics is observed. If there are anomalous events in the signal region and the classifier is successful in identifying them with some $\epsilon_S^{\mathrm{SR}} > \epsilon_B^{\mathrm{SR}}$, the fraction of selected events from the signal region will be larger. If it exceeds the expectation for statistical fluctuations, the Null hypothesis can be excluded in the usual way. Due to the model-agnostic nature of the method, there are no exclusion limits to be derived for any models. The CWoLa method is a discovery tool.
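The selection step described above can be sketched in a few lines. The function `cwola_counts` and its arguments are hypothetical names chosen for this illustration; it takes classifier scores for control-region and signal-region events, fixes the threshold from the control region alone, and returns the observed and expected numbers of selected signal-region events.

```python
import numpy as np

def cwola_counts(scores_cr, scores_sr, eps_cr=1e-3):
    """Fix the threshold t so that a fraction eps_cr of control-region
    events has score s > t, then count selected events in both regions."""
    scores_cr = np.asarray(scores_cr, dtype=float)
    scores_sr = np.asarray(scores_sr, dtype=float)
    # Threshold from the control region only: the (1 - eps_cr) quantile.
    t = np.quantile(scores_cr, 1.0 - eps_cr)
    n_cr = int(np.sum(scores_cr > t))
    n_sr = int(np.sum(scores_sr > t))
    # Null hypothesis: the same selected fraction in the signal region.
    n_sr_exp = n_cr / len(scores_cr) * len(scores_sr)
    return t, n_cr, n_sr, n_sr_exp
```

With one million events per region and `eps_cr=1e-3`, this reproduces the setup of about 1000 selected control-region events; an excess of `n_sr` over `n_sr_exp` beyond statistical fluctuations is what would disfavour the Null hypothesis.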
The choice for the background rejection $1/\epsilon_B^{\mathrm{CR}}$ is to a certain extent arbitrary. Our choice $1/\epsilon_B^{\mathrm{CR}} = 1000$ is driven by the following considerations: Although $\epsilon_S^{\mathrm{SR}}$ is unknown, the signal-to-background ratio $n_A^{\mathrm{SR}}/n_B^{\mathrm{SR}}$ is usually a monotonically growing function of the background rejection $1/\epsilon_B^{\mathrm{CR}}$, which favours a large value of $1/\epsilon_B^{\mathrm{CR}}$. In particular, the signal-to-background ratio for a discovery should not be too small, such that one is not too sensitive to small unknown systematics concerning the validity of the CWoLa assumptions. On the other hand, for increasing $1/\epsilon_B^{\mathrm{CR}}$, a classifier is often not powerful enough to improve not only the signal-to-background ratio but also the significance $n_A^{\mathrm{SR}}/\sqrt{n_{\mathrm{exp}}^{\mathrm{SR}}}$ of an observed excess. Hence, the minimal acceptable signal-to-background ratio for a discovery is a good guide for $1/\epsilon_B^{\mathrm{CR}}$. As a default, we use one million events in both the signal and the control region, which is close to the actual number of observed events in the IM1 signal region of the most recent ATLAS search [33]. For $\epsilon_B^{\mathrm{CR}} = 0.001$, this corresponds to $n^{\mathrm{CR}} = 1000$ selected events. Although $n^{\mathrm{CR}}$ is fixed by choosing the threshold $t$ for our data set, $n^{\mathrm{CR}}$ for the same $t$ is distributed with the usual statistical uncertainty $\sqrt{n^{\mathrm{CR}}}$ for independent data sets. Therefore the corresponding relative statistical uncertainty due to the limited size of the control region for $n^{\mathrm{SR}}$ is roughly $\sigma^{\mathrm{CR}} = 3\%$. Having a signal sample of the same size ($n_{\mathrm{exp}}^{\mathrm{SR}} = n^{\mathrm{CR}}$) and adding the corresponding relative statistical uncertainty $\sigma^{\mathrm{SR}}$ in quadrature, a $5\sigma$ discovery needs at least $5\sqrt{(\sigma^{\mathrm{CR}})^2 + (\sigma^{\mathrm{SR}})^2}\; n_{\mathrm{exp}}^{\mathrm{SR}} = 5\sqrt{2}\,\sigma^{\mathrm{CR}}\, n_{\mathrm{exp}}^{\mathrm{SR}} = 5\sqrt{2\, n_{\mathrm{exp}}^{\mathrm{SR}}} \sim 224$ additional events. In principle, one could also scan the background rejection. However, here we do not follow this idea in order to avoid discussions about the look-elsewhere effect. The chosen value has not been tuned to the success of finding our example signal.
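The arithmetic behind the quoted numbers can be checked with a short sketch. It assumes purely Poisson (square-root) uncertainties on the selected counts in both regions, as in the estimate above; the function names are hypothetical, and the significance estimator is the simple one implied by that estimate, which is not necessarily the one used for the tables.

```python
import math

def min_excess_for_discovery(n_exp, n_cr=None, n_sigma=5.0):
    """Smallest excess over n_exp that reaches n_sigma, adding the relative
    Poisson uncertainties of control and signal region in quadrature."""
    n_cr = n_exp if n_cr is None else n_cr
    sigma_cr = 1.0 / math.sqrt(n_cr)   # relative uncertainty from the CR
    sigma_sr = 1.0 / math.sqrt(n_exp)  # relative uncertainty from the SR
    return n_sigma * math.sqrt(sigma_cr**2 + sigma_sr**2) * n_exp

def naive_significance(n_sr, n_exp, n_cr=None):
    """Excess in units of the combined statistical uncertainty."""
    return (n_sr - n_exp) / min_excess_for_discovery(n_exp, n_cr, n_sigma=1.0)

print(round(1.0 / math.sqrt(1000), 4))        # relative CR uncertainty, about 3%
print(round(min_excess_for_discovery(1000)))  # about 224 additional events
```

Under the same assumptions, an observation of 1666 selected events against an expectation of 1000 would correspond to `naive_significance(1666, 1000)`, i.e. roughly 15 standard deviations.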
Moreover, as we will see in the following sections, a discovery using our CWoLa setup will most likely be an iterative process which is driven not so much by statistical considerations but by iterated efforts to understand the quality of the control region.
In this setup we are sensitive to modified jet structures which occur in events with large missing transverse energy. A physically well motivated example for such a model is discussed in Sec. 2.2 and used in the following sections to demonstrate the sensitivity of the method.

Proof of principle
In this section, we take the CWoLa assumptions for granted. The simulated background events in the signal and the control region are indeed drawn from the same probability distribution since we use the same simulation setup to generate them. We only use Z+jet events which are simulated as discussed in Sec. 2.1. Therefore, we investigate the performance of the CWoLa setup under ideal circumstances. The problems which might arise for more complicated samples of background jets or in defining a suitable control sample are discussed in Sec. 6.
As a default, we use $N^{\mathrm{CR}} = N^{\mathrm{SR}} = 10^6$ events in the control region and in the signal region. We use a fraction $f^{\mathrm{CR}} = 0$ of anomalous events in the control region, i.e. we have $N_B^{\mathrm{CR}} = N^{\mathrm{CR}}$ and only Z+jet events. In the signal region, we have $N_B^{\mathrm{SR}} = (1 - f^{\mathrm{SR}}) N^{\mathrm{SR}}$ Z+jet events and $N_A^{\mathrm{SR}} = f^{\mathrm{SR}} N^{\mathrm{SR}}$ new physics events, simulated as detailed in Sec. 2.2. We take the number of new physics events $N_A^{\mathrm{SR}}$ as a free parameter. Note that not all new physics events have anomalous leading fat jets since initial-state radiation QCD jets can also be leading. This makes the task of identifying new physics even harder.
The results are summarized in Tab. 1. To obtain them, we train five classifiers on the same data and average their scores. Most importantly, the Null test works fine. If there are no anomalous events in the data ($f^{\mathrm{SR}} = 0$), the selected number is, within statistical fluctuations, in agreement with the expected value $n_{\mathrm{exp}}^{\mathrm{SR}} = 1000$. CWoLa does not provide any false indication for new physics. Hence, overfitting is no issue. Moreover, a signal rate $f^{\mathrm{SR}} = 1\%$, which is still consistent with constraints from the latest ATLAS mono-jet search, leads to $n^{\mathrm{SR}} = 1666$. Without statistical doubts, such a finding would indicate that there is something to be understood about the data. Ideally, a thorough investigation of the selected jets will uncover the unexpected jet structure and hint towards a suitable new physics model.
Table 1. Number of events $n^{\mathrm{SR}}$ selected from the signal region for $N^{\mathrm{CR}} = N^{\mathrm{SR}} = 10^6$ and several signal fractions $f^{\mathrm{SR}}$. We have used the mean score of five classifiers trained on the same data. We also show the number of events expected to be selected in the absence of a new physics signal, $n_{\mathrm{exp}}^{\mathrm{SR}}$, and the number of selected anomalous and background events ($n_A^{\mathrm{SR}}$ and $n_B^{\mathrm{SR}}$). The latter two numbers are not known for real data. The last column shows an estimate of the statistical significance of a possible discovery.
Around a signal fraction $f^{\mathrm{SR}} = 0.6\%$, the statistical significance rapidly drops below $5\sigma$. Even under the ideal circumstances which are assumed in this section, the classifier is then not able to identify the new physics events as anomalous. Note that only $N_A^{\mathrm{SR}} = 6000$ anomalous events are in the training sample with $N^{\mathrm{SR}} + N^{\mathrm{CR}} = 2 \cdot 10^6$ events. It is extremely challenging for a classifier to efficiently learn the anomalous structures under these circumstances. In a supervised setup, the data instances would be weighted according to their abundance to help the classifier. In our weakly supervised setup, however, this is not possible.
Moreover, our studies show that the absolute number of anomalous events is an essential parameter. If the number of background events is increased for a fixed number of anomalous events in the signal region, the performance of the CWoLa method is relatively stable, although the signal fraction is decreasing. In Tab. [...] $f^{\mathrm{SR}} = 0.01$ if the collected data increase. This is good news for the high-luminosity phase of the LHC. Whether improved training strategies or more powerful classifiers can improve the overall performance is left for future research. We have also investigated the effect of the control region being smaller than the signal region; in the analysis we have considered as an example, the smaller branching ratio of the Z boson into charged leptons leads to a smaller number of events in the control sample. Weighting the events accordingly, we do not observe a significant loss of discovery power beyond the increased statistical error.
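One common way to weight events when the control sample is smaller than the signal sample is to give both classes the same total weight in the training loss. The sketch below illustrates this choice; it is an assumption made for illustration, since the weighting scheme actually used is not specified above, and `balanced_weights` is a hypothetical helper.

```python
import numpy as np

def balanced_weights(labels):
    """Per-event weights such that control-region (label 0) and
    signal-region (label 1) events carry equal total weight.

    Assumption: both classes are present; the weights sum to len(labels).
    """
    labels = np.asarray(labels).astype(int)
    counts = np.bincount(labels, minlength=2).astype(float)
    # Each class gets total weight len(labels) / 2, shared among its events.
    per_class = len(labels) / (2.0 * counts)
    return per_class[labels]
```

Such weights could be passed as per-sample weights to the binary cross-entropy loss, so that a classifier cannot reduce the loss simply by favouring the more populated region.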

Reality checks
In Sec. 5, we have shown the sensitivity of the CWoLa method under idealized conditions. In this section, we study how a non-trivial and more realistic composition of the signal and control samples impacts the performance. Here, we assume that the background events in the signal region contain $r_{t\bar{t}}^{\mathrm{SR}} = 3.5\%$ $t\bar{t}$ events and $r_{VV}^{\mathrm{SR}} = 2\%$ diboson events (see Sec. 2.1). In particular, we investigate how well this composition has to be understood and reflected in the control region.
In the absence of any new physics events ($f^{\mathrm{CR}} = f^{\mathrm{SR}} = 0$), we show the results of the CWoLa tagging for several values of $r_{t\bar{t}}^{\mathrm{CR}}$ and $r_{VV}^{\mathrm{CR}}$ in Tab. 4. For $r_{t\bar{t}}^{\mathrm{CR}} = r_{VV}^{\mathrm{CR}} = 0$, i.e. the additional backgrounds are absent from the control region, the CWoLa method correctly identifies the different jet structures from the top-quark and weak-boson decays in the signal region. However, this is of course not a sign of new physics but a consequence of our poor modeling of the control region. Although this is a naive example, it highlights the challenge of the CWoLa approach: tagging more signal-region events than expected can always either be due to new physics or due to a mismodeling of the control region. For real data, the leading jets for the control region have to be measured from different event topologies and then combined to form a proper control region that matches the expected rates of the different backgrounds in the signal region, where input from theory and Monte Carlo is required. Therefore, it is a relevant question to which extent the control region needs to be understood. The results in Tab. 4 show that the understanding and the inclusion of the backgrounds are crucial at the percent level, as is already the case for the standard searches. However, variations of backgrounds between the control and signal region at the level of a few permille (i.e. a relative understanding at the level of 10%) are tolerable. Hence, one does not need to be perfect. In particular, the CWoLa setup is not too sensitive to overestimating small background components. Moreover, in a certain sense, we propose a self-correcting setup. Tagging more events from the signal region than expected in real data would first of all prompt more efforts to better understand the control region. The tagged events might also help in understanding which backgrounds are mismodeled. Only after all those studies would one want to pursue an interpretation of the selected signal events in terms of new physics, also aided by the investigation of the tagged jets' structure.
Table 4. Number of events selected by the classifier from the different background classes in the signal and control region for different compositions of the control region. There are no anomalous events ($f^{\mathrm{SR}} = 0$) and we have $r_{t\bar{t}}^{\mathrm{SR}} = 3.5\%$ and $r_{VV}^{\mathrm{SR}} = 2\%$.
In Tab. 5 we again study if the exemplary semi-visible jets can be tagged, using a more realistic background sample in the signal as well as in the control region. We see that excesses in $n^{\mathrm{SR}}$ over the expected number of observations are dominated by these semi-visible jets, even for slightly mismodeled control regions. Completely neglecting additional backgrounds in the control region, however, results in an excess dominated by these backgrounds and only a minor enhancement of the fraction of signal jets.
In Appendix B, we further discuss the classifier score using the Monte Carlo labels of the events, which are not available for real data.

Conclusion
We have demonstrated that the Classification Without Labels (CWoLa) method [18] is a powerful tool to boost the LHC discovery potential for new physics models with anomalous jet dynamics and a mono-jet signature. There are no model-specific assumptions, and the proposed setup can be implemented directly using collected data. The CWoLa method relies on a background-dominated control sample, which is in general available for LHC searches. Using Monte Carlo simulation and a well-motivated specific new physics model with a strongly interacting hidden sector [34], we show that a fraction of less than 1% of new physics events in inclusive signal regions would be sufficient to discover signs of new physics. In contrast, the corresponding traditional search [33] is not even sensitive enough to exclude the model at 95% confidence level (the systematics-dominated error on the SM prediction for signal region IM1 is 1.2%). We consider the hidden sector model to be a rather challenging test case since the modified jet dynamics of the model is difficult to recognize by unsupervised methods [47] and by dedicated supervised taggers [32].
Table 5. Number of events selected by the classifier from the different classes in the signal and control region for different compositions of the control region. There is a fixed fraction of anomalous events ($f_{\mathrm{SR}} = 0.01$) and we have $r^{\mathrm{SR}}_{t\bar{t}} = 3.5\%$ and $r^{\mathrm{SR}}_{VV} = 2\%$.
The CWoLa setup, as a weakly supervised method, is not as sensitive as a dedicated model-specific and Monte Carlo-based approach using supervised methods. Moreover, as a discovery tool, it will not provide exclusion limits for specific model parameters. On the other hand, we emphasise again that the CWoLa method is model agnostic and can be applied directly to data. We have also demonstrated that the CWoLa method provides a useful data-driven tool to improve the understanding of the signal and control regions.
We have shown that the CWoLa discovery potential increases with more data, while traditional searches might already be limited by systematic uncertainties. The method should therefore continue to gain importance, in particular with the large amounts of data expected in the high-luminosity phase of the LHC. Given that the CWoLa method is model-agnostic and data-driven and that it only requires a background-dominated control sample, we consider CWoLa-assisted searches for new physics and reanalyses of data already collected by the LHC experimental collaborations to be very promising.
The authors gratefully acknowledge the computing time granted by the NHR4CES Resource Allocation Board and provided on the supercomputer CLAIX at RWTH Aachen University as part of the NHR4CES infrastructure. The calculations for this research were conducted with computing resources under the project rwth0934.

A The DGCNN classifier
As input for our Dynamic Graph Convolutional Neural Network (DGCNN) we use the 40 constituents of each jet with the highest $p_T$, zero padded when needed. From the 4-momentum of each jet constituent we construct the set of seven input features $\{\Delta\eta, \Delta\phi, \log(p_T), \log(p_T/p_T^{\mathrm{jet}}), \log(E), \log(E/E^{\mathrm{jet}}), \Delta R\}$ with $\Delta\eta = \eta - \eta^{\mathrm{jet}}$, $\Delta\phi = \phi - \phi^{\mathrm{jet}}$ and $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$; $\eta$, $\phi$, $p_T$ and $E$ refer to the rapidity, the azimuthal angle, the transverse momentum (in GeV), and the energy (in GeV) of the constituent, respectively. The quantities with superscript "jet" refer to the respective characteristics of the jet.
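The feature construction described above can be sketched with NumPy as follows. This is an illustrative sketch rather than the code used for the paper: the function name `constituent_features` and its argument names are hypothetical, and wrapping $\Delta\phi$ into $[-\pi, \pi)$ is an assumption that is not stated in the text.

```python
import numpy as np

def constituent_features(pt, eta, phi, energy,
                         jet_pt, jet_eta, jet_phi, jet_energy, n_max=40):
    """Return an (n_max, 7) array of per-constituent features, zero padded."""
    pt = np.asarray(pt, dtype=float)
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    energy = np.asarray(energy, dtype=float)

    # Keep the n_max constituents with the highest pT, ordered by decreasing pT.
    order = np.argsort(-pt)[:n_max]
    pt, eta, phi, energy = pt[order], eta[order], phi[order], energy[order]

    d_eta = eta - jet_eta
    # Assumption: wrap the azimuthal difference into [-pi, pi).
    d_phi = (phi - jet_phi + np.pi) % (2.0 * np.pi) - np.pi
    d_r = np.hypot(d_eta, d_phi)

    feats = np.stack([d_eta, d_phi,
                      np.log(pt), np.log(pt / jet_pt),
                      np.log(energy), np.log(energy / jet_energy),
                      d_r], axis=1)

    # Zero padding for jets with fewer than n_max constituents.
    padded = np.zeros((n_max, 7))
    padded[:len(order)] = feats
    return padded

# Toy jet with two constituents (hypothetical numbers).
x = constituent_features(pt=[10.0, 50.0], eta=[0.1, -0.2], phi=[3.1, -3.1],
                         energy=[12.0, 55.0],
                         jet_pt=60.0, jet_eta=0.0, jet_phi=3.0, jet_energy=70.0)
print(x.shape)
```

The padded rows are exactly zero, so a downstream network can mask them if desired. How the padding is treated inside the network is not specified in the text.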
The DGCNN constructs a k-nearest-neighbors (knn) graph with these particles as nodes. We use $k = 16$. The initial graph is constructed using the Euclidean distances of the particles in $\Delta\eta$ and $\Delta\phi$. The network's architecture is almost identical to the one used in Ref. [32] for supervised classification on similar data sets, i.e. we use three EdgeConv blocks with three convolutions each. The number of features is increased successively from the seven input features to 64, 128 and finally 256. The graph is dynamically updated as a knn graph after the first and second block using the Euclidean distances between nodes/particles based on all features. The outputs of each block are concatenated with the input, resulting in 7 + 64 + 128 + 256 = 455 features per particle. After global average pooling the 455 features are fed into a fully connected network with 256, 128 and 2 nodes. We regularize the fully connected network using dropout with a fraction of 0.1 after the first two layers. We apply softmax activation to the last layer, allowing for a probability interpretation of the output. The two outputs then correspond to the probability of the input to belong to the control region or the signal region. We use the probability for the signal region as score $s$ (see Sec. 4). The only difference to Ref. [32] is that we use Leaky ReLU with $\alpha = 0.1$ instead of ReLU as activation function, since we observed that it leads to more stable results.
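The graph construction and the input to an EdgeConv operation can be illustrated with a short NumPy sketch. It is not the network implementation: the learned convolutions are omitted, random numbers stand in for the jet constituents, zero-padded particles are not treated specially, and excluding a node from its own neighbor list is an assumption. The helper names `knn_indices` and `edge_features` are hypothetical.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest neighbours (Euclidean) of each node."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)   # assumption: a node is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]

def edge_features(x, neighbours):
    """EdgeConv-style edge input: centre features and (neighbour - centre)."""
    centre = np.repeat(x[:, None, :], neighbours.shape[1], axis=1)
    return np.concatenate([centre, x[neighbours] - centre], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 7))          # stand-in for 40 constituents, 7 features

# Initial graph: distances in the first two features (d_eta, d_phi) only.
nbrs = knn_indices(x[:, :2], k=16)
edges = edge_features(x, nbrs)        # shape (40, 16, 14)

# Later blocks would rebuild the graph from all current features, e.g.
# knn_indices(block_output, k=16).

# Per-particle feature count after concatenating the input and block outputs.
n_concat = 7 + 64 + 128 + 256
print(edges.shape, n_concat)
```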
We implement the network using TensorFlow 2.6.0 [53] and the built-in version of Keras [54]. We use the Adam optimizer [55] with its default settings to minimize the categorical cross entropy. During training we reduce the learning rate by a factor of 0.1 when the training loss does not improve for eight epochs. If the loss still does not improve for an additional four epochs, we stop the training. We set the maximum number of training epochs to 75, which is hardly ever reached. Note that we evaluate the network on the same data that we use to train. This corresponds to the procedure one would use on experimental data. The results in Sec. 5 show that overfitting is not a problem for such a large dataset, as we see no excess in selected events in the signal region in the absence of signal (see Tab. 1).
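The learning-rate schedule and stopping rule can be illustrated with a framework-independent sketch. It is one possible reading of the description above, not the authors' training code: a single counter tracks the epochs since the last improvement of the training loss, the learning rate is reduced once the counter reaches eight, and training stops when it reaches twelve. The function `run_schedule` and its parameter names are hypothetical; the starting value of 0.001 is the default learning rate of the Keras Adam optimizer.

```python
def run_schedule(losses, lr=1e-3, factor=0.1, patience=8, extra=4, max_epochs=75):
    """Return the learning rate used in each epoch and the number of epochs run."""
    best = float("inf")
    wait = 0                      # epochs since the last improvement
    lrs = []
    for loss in losses[:max_epochs]:
        lrs.append(lr)
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait == patience:
                lr *= factor      # plateau: reduce the learning rate
            elif wait == patience + extra:
                break             # still no improvement: stop training
    return lrs, len(lrs)

# Toy loss curve (hypothetical): improves twice, then stays on a plateau.
lrs, n_epochs = run_schedule([1.0, 0.9] + [0.95] * 20)
print(n_epochs, lrs[9], lrs[10])
```

In Keras, comparable behavior would typically be configured with the `ReduceLROnPlateau` and `EarlyStopping` callbacks monitoring the training loss. Whether the paper uses exactly these callbacks is not stated.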

B Classifier output
In this appendix we show and analyze the classifier score for the training sample. Here, we make use of the Monte Carlo labels for anomalous/new physics events and background events of the different types. This information is not available for data collected at the LHC, but it is nevertheless instructive to investigate it.
In Fig. 1, we show the signal score $s$ for the simplified setup used in Sec. 5. The scores peak at $s = 0.5$, as this minimizes the loss function if all jets in the control and the signal region are drawn from identical probability distributions. For a well-defined control region there are no jets which are present only in the control region but not in the signal region, such that scores well below $s = 0.5$ would be a sign of overfitting and should not be observed. In the signal region, we distinguish jets from Z+jet background events and semivisible jets from new physics events. For a signal fraction $f_{\mathrm{SR}} = 1\%$ (and $f_{\mathrm{CR}} = 0$), there are enough semivisible jets such that the training succeeds in identifying at least the most anomalous jets. Most jets from new physics events are not identified, but this does not prevent the CWoLa method from working. For $f_{\mathrm{SR}} = 0.5\%$, those anomalous jets are still there, but the training procedure is not able to distinguish them in an efficient way. In particular, the range of classifier scores is significantly reduced to $s < 0.55$, compared to $s < 0.8$ for $f_{\mathrm{SR}} = 1\%$.
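The statement that the scores peak at $s = 0.5$ can be made explicit with the standard result for the minimizer of the cross entropy. The following is a sketch under the assumption that the control-region and signal-region training samples have equal size; the symbols $s^*$, $p_B$, $p_S$ and $\Lambda$ are introduced here for illustration and do not appear in the text.

```latex
% Assumption: equal-sized control-region (CR) and signal-region (SR) samples.
% p_B, p_S: jet distributions of background and new physics events.
\begin{align}
  s^*(x) &= \frac{p_{\mathrm{SR}}(x)}{p_{\mathrm{SR}}(x) + p_{\mathrm{CR}}(x)} , \\
  s^*(x) &= \frac{1}{2}
    \quad \text{for } p_{\mathrm{SR}} = p_{\mathrm{CR}} , \\
  s^*(x) &= \frac{(1 - f_{\mathrm{SR}}) + f_{\mathrm{SR}}\,\Lambda(x)}
                 {(2 - f_{\mathrm{SR}}) + f_{\mathrm{SR}}\,\Lambda(x)}
    \quad \text{for } p_{\mathrm{CR}} = p_B ,\;
    p_{\mathrm{SR}} = (1 - f_{\mathrm{SR}})\,p_B + f_{\mathrm{SR}}\,p_S ,\;
    \Lambda(x) = \frac{p_S(x)}{p_B(x)} .
\end{align}
```

Since $s^*$ increases monotonically with $\Lambda$, a cut on the score corresponds to a cut on the likelihood ratio between new physics and background jets. In this idealized picture, reaching $s^* = 0.8$ for $f_{\mathrm{SR}} = 1\%$ requires $\Lambda \approx 300$, while background-like jets with $\Lambda \approx 0$ obtain $s^* = 0.99/1.99 \approx 0.497$, i.e. marginally below 0.5.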
In Fig. 2 and Fig. 3, we show the signal score $s$ including subdominant backgrounds as discussed in Sec. 6. The signal fraction is always fixed at $f_{\mathrm{SR}} = 1\%$ (and $f_{\mathrm{CR}} = 0$). We use the nominal fractions $r^{\mathrm{SR}}_{t\bar{t}} = 3.5\%$ and $r^{\mathrm{SR}}_{VV} = 2\%$ for the subdominant backgrounds in the signal region and vary $r^{\mathrm{CR}}_{t\bar{t}}$ and $r^{\mathrm{CR}}_{VV}$ in the control region. If the mismodelling is too severe (right plot in Fig. 2), the classifier mainly tags $t\bar{t}$ and diboson events as signal-like, as expected and also shown in Tab. 5. If the mismodelling is reduced (right plot in Fig. 3), the semivisible events dominate at large scores, as they should. This is even more so the case for perfect modelling (left plot in Fig. 2). The left plot in Fig. 3 shows that slightly overestimating distinct (multi-prong) jets in the control region is less dangerous to the CWoLa method than an underestimation. In this case, these jets are more abundant in the (not perfectly modeled) control region, leading to scores $s < 0.5$.
Figure 2. [...] We have $r^{\mathrm{SR}}_{t\bar{t}} = 3.5\%$ and $r^{\mathrm{SR}}_{VV} = 2.0\%$ and use $r^{\mathrm{CR}}_{t\bar{t}} = r^{\mathrm{SR}}_{t\bar{t}}$ and $r^{\mathrm{CR}}_{VV} = r^{\mathrm{SR}}_{VV}$ (left) or $r^{\mathrm{CR}}_{t\bar{t}} = r^{\mathrm{CR}}_{VV} = 0$ (right). We also show the threshold value $t$ for $\epsilon_{\mathrm{CR}} = 0.001$ as a vertical line.
Figure 3. Same as Fig. 2 but for overestimated (left) additional backgrounds in the control region ($r^{\mathrm{CR}}_{t\bar{t}} = 5.0\%$ and $r^{\mathrm{CR}}_{VV} = 3.0\%$) and underestimated (right) additional backgrounds ($r^{\mathrm{CR}}_{t\bar{t}} = 2.8\%$ and $r^{\mathrm{CR}}_{VV} = 1.6\%$). We also show the threshold value $t$ for $\epsilon_{\mathrm{CR}} = 0.001$ as a vertical line.