1 Introduction

Despite its success in describing the elementary particles and their interactions, the Standard Model (SM) is still incomplete; for example, it does not account for neutrino masses, the baryon asymmetry or dark matter. Thus, the discovery of physics beyond the Standard Model (BSM) is one of the main goals of particle physics. In particular, it is a core component of the physics program of the two multipurpose experiments, ATLAS [1] and CMS [2], at the Large Hadron Collider (LHC) at CERN. So far hundreds of searches for BSM physics have been conducted, but no significant deviation from the SM predictions has been found. With only a few exceptions (e.g., [3,4,5,6,7,8]), most of these searches were conducted following the blind-analysis paradigm, according to which the data are only looked at in the last step of the analysis, after most of the time and effort has been invested. Moreover, since the data are looked at only at the end, these analyses were typically designed to inspect a specific region of the observables space, the space spanned by all observables of the recorded data. As a result, despite thousands of person-years invested, a large portion of the observables space has yet to be fully exploited (see also Refs. [9, 10]). The risk of missing a discovery by studying only a limited number of final states could be mitigated by prioritizing the searches and focusing the efforts on high-priority ones. Traditionally, this is mostly done based on theoretical considerations. However, by now, the searches with the strongest theoretical motivation have mostly been conducted and, to a large extent, none of the many remaining ones is, a priori, more motivated than the others. This calls for investigating additional search paradigms.

Complementary to the blind searches, we propose extending the discovery potential of the LHC with a data-directed paradigm (DDP). Similarly to [5, 6, 8], its principal objective is to efficiently scan large portions of the observables space for hints of new physics (NP), but unlike [5, 6, 8] without using any Monte-Carlo (MC) simulation. We look directly at the data, in an attempt to identify regions in the observables space that exhibit deviations from a theoretically well established property of the SM. Such regions should be considered as data-directed BSM hypotheses, as opposed to theoretically-motivated ones, and could be studied using traditional data-analysis methods. As detailed in [11], a search in the DDP can be implemented with two key ingredients: (a) a theoretically well established property of the SM and (b) an efficient algorithm to search for deviations from this property.

In this work, we show that any symmetry of the SM can be exploited in such a data-directed search. A symmetry can be used to split the data into two mutually exclusive samples which should differ only by statistical fluctuations. By comparing them, we become sensitive to any potential BSM process which breaks this symmetry. In some cases, systematic detector effects could also affect the symmetry. In principle, there are methods to account for these effects, so we do not consider them further in this proof-of-principle study. In an experimental realization of the symmetry-based DDP search, such systematic effects must be taken into account.

The concept of exploiting symmetries of the SM for data-driven BSM searches was previously proposed in [12] and [13]. It is also implemented in the ATLAS search for lepton flavor violating (LFV) decays of the Higgs (H) boson [14], and in the search for an asymmetry between \(e^{+}\mu ^{-}\) and \(e^{-}\mu ^{+}\) events [15]. In the former, the SM background is estimated from the data using the electron-muon (\(e/\mu \)) symmetry method, based on the premise that the kinematic properties of SM processes are, to a good approximation, symmetric under the exchange of prompt electrons and prompt muons. In this case, the sample of all recorded data events with one electron and one muon in the final state is split into the \(e\mu \) and \(\mu e\) samples, which differ only by the \(p_\text {T}\) ordering of the two leptons. The Higgs LFV signal is expected to contribute only to one of these two samples, while the other is used as the background estimate. In [14], it was shown that systematic effects which violate the expected SM symmetry, e.g., the different detection efficiencies of electrons and muons, can be accounted for and the symmetry can be restored. However, the implementation of the search still follows the blind-analysis paradigm, where only a specific signal is searched for, in a small theoretically-motivated subset of the observables space.

In the DDP proposed here, no specific signal is searched for. Instead, the full \(e\mu \) and \(\mu e\) samples are compared in many different sub-samples (corresponding to exclusive selections of the data), and any significant deviation observed is considered a potential sign of NP, to be further investigated. Thus, sensitivity to many more possible BSM processes and scenarios is enabled, and this \(e\mu /\mu e\) comparison becomes a general test of lepton flavor universality in the final state containing one electron and one muon. Similarly, different final states including a number of electrons, muons and other objects can be probed (ee vs \(\mu \mu \), \(e+\)jet vs \(\mu +\)jet, etc.), each potentially sensitive to different BSM manifestations. In this context, the recent hints of non-universality in the R\(_K\) measurements from LHCb [16] are in fact hints of an asymmetry between the ee and \(\mu \mu \) samples in the decay of b hadrons to a kaon and two same-flavor leptons. Likewise, the comparison of \(e^-\mu ^+\) to \(e^+\mu ^-\) in [13, 15] is a test of CP symmetry in the lepton sector. Other symmetries could be used in similar implementations, such as forward-backward or time-reversal symmetries.

Given the large number of symmetries in the SM which can be violated in BSM scenarios, the potential benefits of implementing such symmetry-based generic searches are significant. However, the results must be interpreted with care. Indeed, a data-directed search will naturally be tuned to identify regions including statistical fluctuations, or other measurement effects which could induce asymmetries. If a detected signal originates from a statistical fluctuation, it will disappear with more collected data. If it originates from a detector or other systematic effect which is correctly modeled in MC simulations, then it can be ruled out. Any residual asymmetry can be considered a data-directed BSM hypothesis, to be inspected using standard analysis techniques. In this manner, the risk of claiming a false discovery should not be higher than when implementing hundreds of searches in the blind paradigm, since the trial factor is high in both cases [17].

The aim of this paper is to draw attention to the potential for discovering BSM physics when implementing searches in the DDP, and in particular, data-directed searches based on symmetries of the SM. In this context, we lay the groundwork for a generic method to compare two data samples and quantify the level of any discrepancy between them, if present. As previously discussed, we do not address here the treatment of possible systematic effects which can degrade the expected SM symmetry between the two samples. Nonetheless, as shown in [14, 15], in analyses that were based on symmetry considerations, such effects can be accounted for.

Since the goal is to quickly scan multiple sub-regions of the observables space in a large number of final states, a fast method for identifying asymmetries is needed. We develop this method in a simplified framework using MC simulated data. Different test statistics can be used to compare the two samples (e.g., Kolmogorov–Smirnov [18], Student's t test [19]). In the implementation proposed in this paper, the samples are represented by 2D histograms of predetermined properties of the data and compared using the simple \(N_\sigma \) test statistic defined below. Since the method is fast, multiple 2D histograms of all the existing properties and their combinations can be compared efficiently. We leave the generalization of this study to a more comprehensive and optimized implementation for future work.

When working with histograms, there is no a priori way to choose the bins, which is particularly challenging in many dimensions. One solution to this challenge is to make use of machine learning. Starting from [20, 21], which build on [22], a variety of proposals have been made to perform anomaly detection with machine learning by comparing two samples [7, 20, 21, 23,24,25,26,27,28,29,30] (see Refs. [31,32,33,34,35] for recent reviews). Complementary to the binned DDP (henceforth, simply ‘the DDP’), we demonstrate that asymmetries can also be identified using weakly supervised Neural Networks (NN), similar to the approach in [22]. Nevertheless, for now such methods require training at least one NN for each event selection. This is time consuming and restricts the number of selections that can be tested, which could be limiting in the context of the DDP, depending on the available computational resources.

The sensitivity of the proposed DDP search is compared to that of two likelihood-based test statistics. While both assume exact knowledge of the signal shape, one represents an ideal search in which the distribution of the symmetric background component is also exactly known, and the other represents the expected sensitivity of a traditional blind-analysis search employing a symmetry-based background estimation. According to the Neyman–Pearson lemma [36], these are the most sensitive tests for the respective scenarios they consider.

This paper is organized as follows. Section 2 describes some of the statistical properties of the DDP symmetry search. The simulated data used for our numerical studies is presented in Sect. 3. Results for the DDP are given in Sect. 4 and a complementary approach using neural networks is discussed in Sect. 5. The paper ends with conclusions and outlook in Sect. 6.

2 Quantifying asymmetries

Given two data samples, our goal is to determine the probability that they are asymmetric, as opposed to originating from the same underlying distribution. The latter represents the null hypothesis, in which the two measurements are indeed symmetric, as expected from the SM symmetry property considered. In the context of the symmetry-based DDP proposed here, and unlike in other statistical tests commonly used in BSM searches, no signal assumptions are made. The test is intended to output the probability at which the background-only hypothesis is rejected.

In order to rapidly scan many selections and final states, the method used to quantify the asymmetry between two samples should be efficient. This can be achieved if we ensure that the results obtained are independent of the properties of the underlying symmetric background component. Indeed, one of the most time-consuming tasks in implementing a statistical test to reject a hypothesis is the determination of the test statistic's probability distribution function (PDF) under said hypothesis. If this PDF is constant and known, we avoid the time-consuming task of deriving it for each pair of samples tested.

The generic \(N_{\sigma }\) test statistic considered is given in Eq. (1). A and B are two n-dimensional matrices, representing the two tested data samples projected into histograms of n properties of the measurements. They each have M bins in total; \(A_i\) and \(B_i\) are their respective numbers of entries in bin i, and \(\sigma _{Ai}\) and \(\sigma _{Bi}\) their respective standard errors:

$$\begin{aligned} {\mathrm {N}}_{\sigma }(B,A) = \frac{1}{\sqrt{M}} \sum _{i=1}^{M} \frac{B_i - A_i}{\sqrt{\sigma _{Ai}^2+\sigma _{Bi}^2}}\,. \end{aligned}$$
(1)

In this formalism, we search for a signal in B by comparing it to the reference measurement A, but their roles are exchangeable. When A and B are two (Poisson-distributed) measurements, Eq. (1) simplifies to:

$$\begin{aligned} {\mathrm {N}}_\sigma (B,A) = \frac{1}{\sqrt{M}} \sum _{i=1}^{M} \frac{B_i - A_i}{\sqrt{A_i+B_i}}\,. \end{aligned}$$
(2)

It can be shown that in the limit of the normal approximation, applicable here provided there are enough entries in each bin of the two matrices, the symmetry-case PDF of the \(N_{\sigma }\) test is well approximated by a standard Gaussian. This satisfies the condition that the test should be independent of the underlying symmetric component, ensuring its efficiency. In what follows, we confirmed that this approximation is valid when ensuring at least 25 entries per bin. For scenarios with lower statistics, the distortion of the background-only PDF from the normal distribution should be evaluated. Nevertheless, large \(N_{\sigma }\) values would still correspond to asymmetries.
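For illustration, a minimal NumPy sketch of the \(N_\sigma \) computation of Eq. (2) could look as follows; the function name is our own choice, and the bins are assumed to be sufficiently populated for the normal approximation to hold:

```python
import numpy as np

def n_sigma(B, A):
    """N_sigma test statistic of Eq. (2) for two Poisson-distributed
    histograms A and B with identical binning (any dimensionality)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    M = A.size                            # total number of bins
    pulls = (B - A) / np.sqrt(A + B)      # per-bin deviations
    return pulls.sum() / np.sqrt(M)

# Under the symmetry (background-only) hypothesis the score is
# approximately standard-normal distributed, so it can be read
# directly as a significance Z (see Eq. 7 below).
```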

The performance of the \(N_{\sigma }\) test is compared to that of two distinct likelihood-based test statistics, which are built on the test statistic for discovery of a positive signal introduced in [37] and rely on the full knowledge of the signal shape that is being searched for:

  • \(q_{0}^{L1}\) assumes that the underlying symmetric component is perfectly known. This is equivalent to the ideal analysis case in which the signal and background distributions are perfectly known (no uncertainties).

  • \(q_{0}^{L2}\) uses no a priori knowledge of the underlying symmetric distribution, and estimates it from the two measurements as part of the fitting procedure. This represents the case where the symmetry is the only available information.

Since we aim at comparing the sensitivity to detect asymmetries of the \(N_\sigma \) test relative to the likelihood-based tests, statistical uncertainties on the signal are not included in this study. The likelihood functions for each scenario are shown below, where S is the shape of the signal considered, B is the tested sample, T is the true distribution of the symmetric background and A is a measurement of T. The parameter \(\mu \) represents the signal strength, and \(b=\{b_i\}\) are the background parameters (one per bin of the matrix):

$$\begin{aligned} L1_{\mu }(B,T,S) = {\mathrm {Poisson}}(B~|~T+\mu S) \qquad \qquad \qquad \end{aligned}$$
(3)
$$\begin{aligned} L2_{\mu }(B,A,S;b) = {\mathrm {Poisson}}(B~|~b+\mu S) \cdot {\mathrm {Poisson}}(A~|~b) \end{aligned}$$
(4)

The formalism used, which permits a comparison with the \(N_\sigma \) test, is shown in Eqs. (5) and (6), where \(L_\mu \) is the likelihood function (either \(L1_\mu \) or \(L2_\mu \)), \(\lambda _\mu \) is the profile likelihood ratio, \(\hat{\mu }\) and \(\hat{b}\) are the maximum likelihood estimators of \(\mu \) and the \(b_i\) parameters, and \(\hat{\hat{b}}\) is the maximum likelihood estimator of the \(b_i\) when \(\mu \) is fixed.

$$\begin{aligned} \lambda _\mu (B,A,S)=\frac{L_\mu (B,A,S;\hat{\hat{b}})}{L_{\hat{\mu }}(B,A,S;\hat{b})} \end{aligned}$$
(5)
$$\begin{aligned} q_0(B,A,S) = \left\{ \begin{array}{ll} -2\ln \lambda _0(B,A,S), &{} \hat{\mu } \ge 0 \\ +2\ln \lambda _0(B,A,S), &{} \hat{\mu } < 0 \end{array} \right. \end{aligned}$$
(6)

When performing a test for discovery, we compare the test's score to the background-only PDF to obtain a p value (p), which gives a measure of the level at which the background hypothesis can be rejected. We then translate this p value into an equivalent significance \(Z =\Phi ^{-1}(1-p)\), where \(\Phi ^{-1}\) is the quantile of the standard Gaussian. A significance of 5 is commonly considered an appropriate level to constitute a discovery, corresponding to \(p\approx 2.87\times 10^{-7}\). For the \(N_\sigma \) test, the background-only PDF is itself a standard Gaussian. Therefore, the score obtained is directly the significance Z, bypassing the need to compute the p value:

$$\begin{aligned} Z=N_\sigma (B,A)\,. \end{aligned}$$
(7)

Similarly, regarding the \(q_0\) test, we know from [37] that:

$$\begin{aligned} Z=\sqrt{q_0(B,A,S)}\,. \end{aligned}$$
(8)

The \(\sqrt{q_0}\) background-only PDF is thus again a standard Gaussian. Therefore, in the following, we directly compare the \(N_\sigma \) and \(\sqrt{q_0}\) significance values.
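As an illustration of how such a significance can be obtained in practice, the following sketch evaluates the signed \(q_0^{L1}\) significance of Eqs. (5), (6) and (8) with a one-parameter fit of \(\mu \), assuming the symmetric background template T is perfectly known; the function names and the use of SciPy for the maximization are our own choices, not necessarily the implementation used for the results below:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def log_likelihood(mu, B, T, S):
    """Poisson log-likelihood of Eq. (3), up to mu-independent terms."""
    lam = T + mu * S
    return np.sum(B * np.log(lam) - lam)

def z_q0_L1(B, T, S):
    """Signed significance of the q0^L1 test (Eqs. 5, 6 and 8),
    with the symmetric background template T assumed perfectly known."""
    B, T, S = (np.asarray(x, dtype=float).ravel() for x in (B, T, S))
    # keep all expected yields positive during the one-parameter fit of mu
    mu_lo = -0.99 * np.min(T[S > 0] / S[S > 0]) if np.any(S > 0) else 0.0
    fit = minimize_scalar(lambda mu: -log_likelihood(mu, B, T, S),
                          bounds=(mu_lo, 10.0 * B.sum()), method="bounded")
    mu_hat = fit.x
    q0 = 2.0 * (log_likelihood(mu_hat, B, T, S)
                - log_likelihood(0.0, B, T, S))
    # the sign convention of Eq. (6) makes the null PDF a standard Gaussian
    return np.sign(mu_hat) * np.sqrt(max(q0, 0.0))
```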

3 Data preparation

The symmetry-based DDP is demonstrated in a practical example, the search for Higgs LFV decays, \(H\rightarrow \tau \mu \), where the \(\tau \) further decays to an electron. The SM processes considered which contribute to the symmetric background include Drell–Yan, di-boson, Wt, \(t\bar{t}\) and SM Higgs (\(H\rightarrow WW / \tau \tau \)) production. For each of these processes, a sample equivalent to \(40~\mathrm {fb}^{-1}\) of pp collisions at \(\sqrt{s} = 13\) TeV was generated using MadGraph 2.6.4 [38] and Pythia 8.2 [39]. The response of the ATLAS detector was emulated using Delphes 3 [40]. The signal processes considered are the gluon–gluon fusion and vector boson fusion Higgs production mechanisms. These SM events are used to construct a symmetric \(e\mu \) template matrix T, representing the underlying SM background distribution from which symmetric samples will be drawn (see the description of this procedure further below). The Higgs LFV signal events are used to construct a normalized signal template matrix S. This is done by projecting the simulated measured events onto a \(28\times 28\) 2D histogram of two selected event properties:

  • x-axis: collinear mass (defined e.g. in [14]), 5 GeV bins from 30-170 GeV

  • y-axis: leading lepton \(p_\text {T}\), 5 GeV bins from 10-140 GeV

To demonstrate the concept, and to allow quantitative comparisons to the performance of the likelihood-based tests, we avoided bins with low statistics by adding a flat 25 entries to each bin in T. The resulting T and S templates are shown in Fig. 1.

Fig. 1

The \(e/\mu \) background template matrix T (top) and the Higgs LFV signal template matrix S (bottom). The x, y and z axes are the collinear mass, leading lepton \(p_\text {T}\) and number of entries per bin respectively (S is normalized)

The other backgrounds and signals considered are flat background templates T (with either 100 or \(10^4\) entries per bin), and rectangular and 2D Gaussian signal templates S.

Given a background template T, which represents the underlying symmetric distribution, and a signal template S, which can be injected with different levels of signal strength, the procedure to generate the samples used to qualify the different tests is as follows. From T, we Poisson-draw N pairs of background-only measurements (A, B) which are symmetric up to statistical fluctuations. The background + signal measurements \(B^s\) are obtained by injecting some signal into the B samples. We inject the signal with a signal strength \(\mu _{\mathrm {inj}}\), determined such that a \(q_0\) test for discovery (\(q_0^{L1}\) or \(q_0^{L2}\)) outputs a given significance \(Z_{\mathrm {inj}}\) when testing \({B^s=B+\mu _{\mathrm {inj}}S}\) against B:

$$\begin{aligned} \sqrt{q_0}(B+\mu _{\mathrm {inj}} S, B, S)=Z_{\mathrm {inj}}\,. \end{aligned}$$
(9)

Since S is normalized, \(\mu _{\mathrm {inj}}\) is the number of signal events added to the B sample.

Explicitly, for the \(q_0^{L1}\) and \(q_0^{L2}\) cases, it is found by solving Eqs. (10) and (11), respectively:

$$\begin{aligned} 2\left( -\mu _{\mathrm {inj1}} + \sum \limits _{i=1}^{M}\left[ (B_i+\mu _{\mathrm {inj1}}S_i)\ln \left( 1+\mu _{\mathrm {inj1}}\frac{S_i}{B_i}\right) \right] \right) = Z_{\mathrm {inj1}}^2 \end{aligned}$$
(10)
$$\begin{aligned} 2\sum \limits _{i=1}^{M}\left[ (B_i+\mu _{\mathrm {inj2}} S_i)\ln \left( 1+\mu _{\mathrm {inj2}}\frac{S_i}{2B_i+\mu _{\mathrm {inj2}} S_i}\right) -B_i\ln \left( 1+\mu _{\mathrm {inj2}}\frac{S_i}{2B_i}\right) \right] =Z_{\mathrm {inj2}}^2 \end{aligned}$$
(11)
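As an example, Eq. (10) can be solved for \(\mu _{\mathrm {inj1}}\) with a one-dimensional root finder; the sketch below assumes S is normalized and B has no empty bins, and the bracketing interval is an arbitrary choice:

```python
import numpy as np
from scipy.optimize import brentq

def mu_inj_L1(B, S, Z_inj):
    """Solve Eq. (10) for the signal strength giving an injected
    q0^L1 significance Z_inj (S normalized, B with no empty bins)."""
    B, S = (np.asarray(x, dtype=float).ravel() for x in (B, S))

    def asimov_q0(mu):
        # left-hand side of Eq. (10); note that sum(S) = 1 for normalized S
        return 2.0 * (-mu + np.sum((B + mu * S) * np.log1p(mu * S / B)))

    # q0 grows monotonically with mu, so bracket the root generously
    return brentq(lambda mu: asimov_q0(mu) - Z_inj**2, 0.0, 100.0 * B.sum())
```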

For each separate experiment considered and detailed below, the number of A, B and \({B^s=B+\mu _{\mathrm {inj}}S}\) matrices we generate is \(N=20000\). For the \(N_{\sigma }\) and \(q_{0}^{L2}\) tests, the PDFs of the symmetric case (background only) are obtained by comparing the B and A pairs, and the PDFs of the asymmetric case (signal + background) by comparing the \(B^s\) and A pairs. The same is applied for the \(q_{0}^{L1}\) test, with the A matrices replaced by the template T.
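A simplified sketch of this pseudo-experiment generation, following the definition \(B^s=B+\mu _{\mathrm {inj}}S\) above and reusing the n_sigma() example, might read:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def generate_pseudo_experiments(T, S, mu_inj, n_exp=20000):
    """Poisson-draw N pairs (A, B) from the symmetric template T and
    build the signal-injected measurements B^s = B + mu_inj * S."""
    T = np.asarray(T, dtype=float)
    S = np.asarray(S, dtype=float)
    A = rng.poisson(T, size=(n_exp,) + T.shape)
    B = rng.poisson(T, size=(n_exp,) + T.shape)
    Bs = B + mu_inj * S
    return A, B, Bs

# Background-only and signal+background PDFs of the N_sigma score,
# reusing the n_sigma() sketch above:
# A, B, Bs = generate_pseudo_experiments(T, S, mu_inj)
# z_sym  = [n_sigma(b, a)  for a, b  in zip(A, B)]
# z_asym = [n_sigma(bs, a) for a, bs in zip(A, Bs)]
```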

4 Results

Focusing on the Higgs LFV example, using the signal (S) and background (T) templates shown in Fig. 1, we apply an injected signal strength \(\mu _{\mathrm {inj}}\) which corresponds to a \(5\sigma \) significance of the ideal \(q_{0}^{L1}\) test. To give an impression, relative to T this corresponds to a signal fraction of 0.2% overall, or of 2.8% in a \(6\times 6\) bin window centered on the signal. In Fig. 2, we compare the Z PDFs obtained with the \(q_{0}^{L1}\), \(q_{0}^{L2}\) and \(N_\sigma \) tests. As expected, the symmetric-case PDFs of all tests are consistent with standard Gaussian distributions. We observe that the background + signal (asymmetric-case) PDFs are consistent with Gaussians of variance \(1\pm 0.05\) (for all examples considered), centered around the resulting average significance \(Z_{\mathrm {avg}}\) of the relevant test. The \(Z_{\mathrm {avg}}\) of each test can be directly estimated using the Asimov data [37], i.e., setting \(A=T\) and \(B^s=T+\mu _{\mathrm {inj}}S\). The resulting significance with the \(q_{0}^{L1}\) test is, as expected, \(Z_{\mathrm {avg}}=5.0\approx Z_{\mathrm {inj}}\). With \(Z_{\mathrm {avg}}=3.53\), \(q_{0}^{L2}\) is less sensitive than \(q_{0}^{L1}\), since it does not use any a priori knowledge of the background but estimates it from the two measurements as part of the fitting procedure. Since the \(N_\sigma \) test is averaged over all the bins, and most of them only include background contributions, the resulting average significance \(Z_{\mathrm {avg}}=1.48\) is significantly lower than the separation power measured with the \(q_{0}^{L2}\) test.

Fig. 2

Significance PDFs comparing results of the \(N_\sigma \), \(q_{0}^{L1}\) and \(q_{0}^{L2}\) tests for the Higgs LFV example, with injected signal strength corresponding to \(5\sigma \) of \(q_{0}^{L1}\)

In general, it can be much more effective to apply the \(N_\sigma \) test in a sub-region of the data samples. Even though the signal's shape and location are not known in a generic test, since the calculation of \(N_\sigma \) is fast, one could test multiple bin subsets, or develop an algorithm to optimize this selection. In Fig. 3, we show the \(N_\sigma \) scores obtained with the Asimov data when the test is performed on square windows of different sizes, centered around the location of the signal. The \(N_\sigma \) sensitivity increases as the window encapsulates the signal region more precisely, reaching up to \(Z_\mathrm {avg,max}=2.74\) with the \(6\times 6\) bin window. Thus, for this example, the sensitivity achieved is only slightly worse than the one achieved with the \(q_{0}^{L2}\) test, which exploits full knowledge of the signal shape. The \(N_\sigma \) results presented hereafter are for the best suited window (\(6\times 6\) bins for all examples considered).
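One simple way to realize such a test of multiple bin subsets, here with a fixed square window slid over the full 2D histogram and reusing the n_sigma() sketch above, could be the following; the window size and looping strategy are illustrative choices only:

```python
import numpy as np

def best_window_score(B, A, window=6):
    """Slide a (window x window)-bin box over the two 2D histograms and
    return the largest local N_sigma score, reusing the n_sigma() sketch."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    nx, ny = A.shape
    best = -np.inf
    for ix in range(nx - window + 1):
        for iy in range(ny - window + 1):
            box = (slice(ix, ix + window), slice(iy, iy + window))
            best = max(best, n_sigma(B[box], A[box]))
    return best
```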

Fig. 3

Significance measured from the Asimov data, with the \(N_\sigma \) test applied to increasing window sizes, and compared to the \(q_{0}^{L1}\) and \(q_{0}^{L2}\) significance. Results for the Higgs LFV example and the ideal (flat) scenario are shown, with injected signal strength corresponding to \(5\sigma \) of \(q_{0}^{L1}\). The green and yellow bands correspond to the 1\(\sigma \) and \(2\sigma \) deviations from the symmetry (no signal) assumption, respectively

In Fig. 4 we show the Receiver Operating Characteristic (ROC) curves obtained from the PDFs of the different tests. The measured Area Under the Curve (AUC) is approximately 1.0 for the \(q_{0}^{L1}\) test and 0.994 for the \(q_{0}^{L2}\) test. With an AUC of 0.973, the \(N_\sigma \) test is only 2.6% less sensitive than the \(q_{0}^{L1}\) test, and 2.0% less sensitive than the \(q_{0}^{L2}\) test. Finally, in Fig. 5, we show \(Z_{\mathrm {avg}}\) per test (estimated from the Asimov data) for increasing injected signal strength. Using the \(N_\sigma \) test statistic, the symmetric case (background only) can be separated from the asymmetric case at the level of \(2\sigma \) if the signal that would have been measured assuming an ideal analysis (\(q_0^{L1}\)) is at the level of \(3.5\sigma \). This should be compared also to the \(2.5\sigma \) separation that would have been obtained in the same case using the profile likelihood ratio test statistic that uses the two samples to estimate the symmetric background and full knowledge of the signal shape (\(q_0^{L2}\)).
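For reference, AUC values of this kind can be extracted from the two ensembles of significance scores, for example with scikit-learn; the array names are placeholders:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_from_scores(z_sym, z_asym):
    """ROC area under the curve separating the background-only (z_sym)
    from the signal+background (z_asym) significance distributions."""
    scores = np.concatenate([z_sym, z_asym])
    labels = np.concatenate([np.zeros(len(z_sym)), np.ones(len(z_asym))])
    return roc_auc_score(labels, scores)
```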

Fig. 4

ROC curves comparing results of the \(N_\sigma \), \(q_{0}^{L1}\) and \(q_{0}^{L2}\) tests for the Higgs LFV example, with injected signal strength corresponding to \(5\sigma \) of \(q_{0}^{L1}\)

For clarity, we also consider a flat background template T with \(10^4\) entries in each bin, and a flat rectangular signal template S of size \(6\times 6\) bins, located at the center of T. Since the \(q_{0}^{L1}\) and \(q_{0}^{L2}\) tests are independent of the background and signal shapes, and only depend on the injected signal strength, their symmetric- and asymmetric-case PDFs remain unchanged. Only the PDF of the \(N_\sigma \) test in the asymmetric case changes. As shown in Figs. 3 and 5, in this simplified case the \(N_\sigma \) sensitivity matches exactly that of the \(q_{0}^{L2}\) test. This hints that the loss of sensitivity of the generic \(N_\sigma \) test, compared to \(q_{0}^{L2}\), is mainly due to shape variations of the background and the signal within the optimal sub-region that is tested. But even in a realistic scenario like the Higgs LFV example, the sensitivity loss is reasonable (from \(Z_{\mathrm {avg}}=3.53\) to 2.74), and the power achieved to identify regions with asymmetry, even though the \(N_\sigma \) test is generic, is significant.

Fig. 5

Significance measured from the Asimov data for increasing injected signal, comparing results of the \(N_\sigma \), \(q_{0}^{L1}\) and \(q_{0}^{L2}\) tests. Results for the Higgs LFV example and the ideal (flat) scenario are shown. The green and yellow bands correspond to the 1\(\sigma \) and \(2\sigma \) deviations from the symmetry (no signal) assumption, respectively

In terms of the ability to identify asymmetries, similar performance was obtained for all the other shapes of signal and background considered.

5 Identifying asymmetries with neural networks

Machine learning-based anomaly detection methods constructed by comparing two samples are categorized as weakly or semi-supervised learning, because both samples are mostly background and one of them will have more signal than the other. The sample with more potential signal is given a noisy label of one and the other sample is given a label of zero. A classifier trained to distinguish the two samples can then automatically identify subtle differences between them without explicitly setting up bins. Existing proposals construct the samples from signal and sideband regions [7, 20, 21], from data versus simulation [23, 24, 30, 41], as well as in other ways [25,26,27,28,29]. We propose to extend this methodology to symmetries.

The combination of machine learning and symmetry has received significant attention. For a given symmetry, one can construct machine learning methods that are invariant or covariant (in machine learning, this is called equivariant) under the action of that symmetry. For example, recent proposals have shown how to construct Lorentz covariant neural networks [42,43,44]. Symmetries can also be used to build a learned representation of a sample [45]. There have also been proposals to use machine learning methods to automatically discover symmetries in samples [46,47,48]. In the context of BSM searches, Refs. [49, 50] recently described how to use a weakly supervised-like approach to test if a given symmetry is broken by applying the transformation to the input data. Our approach also starts by positing a symmetry, but we do not apply the symmetry transformation to each data point. Instead, we have two samples which should be statistically identical in the presence of the symmetry, but which could differ when BSM physics is present.

In the following, we demonstrate the concept of identifying asymmetries using a weakly supervised approach. Considering the \(e\mu \) symmetry example discussed above, one of the samples is the \(e\mu \) sample and the other is the \(\mu e\) sample. The same two-dimensional space as described earlier is used for illustration; extending to higher dimensions is technically straightforward. A deep neural network with three hidden layers and 50 nodes per layer is used for the classifier. Rectified Linear Units (ReLU) are used for all intermediate layers and the output is passed through a sigmoid function. The network is implemented using Keras [51] and Tensorflow [52], with Adam [53] used for optimization. We train for 20 epochs with a batch size of 200. None of these parameters were optimized. Figure 6 shows the symmetry/asymmetry separation power of the NN as a function of the signal fraction injected into the \(\mu e\) sample. The background-only band is computed via bootstrapping [54]. For each bootstrap, two samples are created by drawing from the \(e\mu \) and \(\mu e\) events with replacement. By mixing the two samples, any asymmetry is removed.
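A minimal Keras sketch consistent with the architecture described above is given below; the array names in the commented usage are placeholders, not the actual dataset interface used here:

```python
import numpy as np
from tensorflow import keras

def build_classifier(n_features=2):
    """Weakly supervised classifier comparing the e-mu and mu-e samples:
    three hidden layers of 50 ReLU nodes and a sigmoid output."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="binary_crossentropy")
    return model

# x_emu, x_mue: arrays of shape (n_events, 2) holding the two observables;
# the sample that may contain signal is given the noisy label 1.
# X = np.concatenate([x_emu, x_mue])
# y = np.concatenate([np.zeros(len(x_emu)), np.ones(len(x_mue))])
# model = build_classifier()
# model.fit(X, y, epochs=20, batch_size=200)
# test_statistic = float(model.predict(X).max())   # maximum NN score
```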

There is no unique way to quantify the NN performance. An optimal test statistic, by the Neyman–Pearson lemma [36], is monotonically related to the likelihood ratio. Refs. [23, 24, 30, 55] show how to modify the loss function so that the average loss approximates the (log) likelihood ratio. Here, we find that in practice the maximum NN score, using the standard binary cross-entropy loss function, is an effective statistic: it is 0.5 in the case of no signal and increases as more signal is injected. The background-only band in Fig. 6 is computed via bootstrapping, and the points where the blue line crosses the green/yellow bands indicate the approximate \(1\sigma \)/\(2\sigma \) exclusion. The NN is able to automatically identify the presence of BSM physics for signal fractions of a few per mil, corresponding to around a 5\(\sigma \) significance calculated with the ideal \(q_0^{L1}\) test. Future explorations of this idea will address the best way to set up the training, which statistics are most effective, and how to best extend to higher dimensions.
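One possible reading of the bootstrap procedure for the background-only band, reusing build_classifier() from the sketch above, is the following; the mixing-and-resampling step follows the description in the text, while the looping details are our own choices:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def bootstrap_band(x_emu, x_mue, n_boot=10):
    """Background-only band for the maximum-NN-score statistic: the two
    samples are pooled and re-drawn with replacement, which removes any
    asymmetry, and a classifier is retrained on each bootstrap sample."""
    pooled = np.concatenate([x_emu, x_mue])
    n = len(x_emu)
    scores = []
    for _ in range(n_boot):
        resampled = pooled[rng.integers(len(pooled), size=len(pooled))]
        xa, xb = resampled[:n], resampled[n:]
        X = np.concatenate([xa, xb])
        y = np.concatenate([np.zeros(len(xa)), np.ones(len(xb))])
        model = build_classifier()                     # sketch above
        model.fit(X, y, epochs=20, batch_size=200, verbose=0)
        scores.append(float(model.predict(X, verbose=0).max()))
    return float(np.mean(scores)), float(np.std(scores))
```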

Fig. 6

The maximum neural network score from training a classifier to distinguish the \(e\mu \) from the \(\mu e\) sample with (asym) and without (sym) a BSM contribution. The green (yellow) and blue bands represent (twice) the standard deviation over 10 bootstrap samples. The separation power is shown as a function of the injected signal fraction (bottom scale) and the corresponding significance calculated with the ideal \(q_0^{L1}\) test. Note that these results are not directly comparable to the binned DDP because it is not possible to ignore signal statistical uncertainties

6 Discussion

With limited resources at hand and no conclusive indication of BSM physics found yet, we must try novel and complementary avenues for discovery. To overcome the limitations stemming from adopting the blind-analysis strategy, we propose developing the DDP. Similarly to [3,4,5,6, 8], yet without relying on MC simulations, its principal objective is to allow scanning as many regions of the observables space as possible and to direct dedicated analyses towards the ones in which the data itself exhibits deviations from some fundamental and theoretically well-established property of the SM. Relative to regions in which the data agree well with the SM predictions, the ones that exhibit deviations are promising for further investigation of BSM physics.

We propose developing the DDP based on symmetries of the SM and demonstrate its potential sensitivity using the \(e/\mu \) symmetry as an example. Symmetries allow splitting the data into two mutually exclusive samples which, under the symmetry assumption, differ only by statistical fluctuations. Thus, any asymmetry observed between the two samples, in any observable and for any sub-selection of these samples, is potentially interesting and should be considered for further study.

While different algorithms can be developed to identify asymmetries, even the simplest one developed here, the \(N_\sigma \) test statistic, already provides good sensitivity. It is compared to the sensitivity obtained with two likelihood-based test statistics: the first, \(q_0^{L1}\), represents an ideal analysis in which both the signal and the symmetric contribution from the SM processes are perfectly known; the second, \(q_0^{L2}\), represents the expected sensitivity of a traditional blind-analysis search for a predefined signal that employs a symmetry-based background estimation [14].

Compared to the sensitivity obtained in an ideal analysis, the separation power between the symmetric case and an asymmetry at the level of 5\(\sigma \) is less than 3% lower in terms of the area under the ROC curve, and a separation at the level of \(2\sigma \) is achieved for a \(3.5\sigma \) injected signal. Compared to a traditional symmetry-based analysis, the separation power between the symmetric case and an asymmetry at the level of 3.5\(\sigma \) is less than 2% lower in terms of the area under the ROC curve, and the separation at the level of \(2\sigma \) achieved using the \(N_\sigma \) test is only slightly degraded relative to the \(2.5\sigma \) obtained with the \(q_0^{L2}\) test. The quoted results are obtained when applying the \(N_\sigma \) test in the best suited window for the examples considered. The ability to find this optimal window demonstrates the strength of the DDP. Since the test is rapid, a large number of n-dimensional histograms, and windows within them, can be tested efficiently. This could permit scanning the data systematically in search of asymmetries.

We have shown that weakly supervised NNs can also be used to identify asymmetries between two samples. This paves the way towards an NN-based DDP.

We emphasize that traditional blind analyses are expected to be the most sensitive ones for any predefined signal. Nonetheless, it is impossible to conduct a dedicated search in every possible final state and for every possible event selection. Moreover, not all potential signals can be thought of. Thus, the DDP could significantly expand our discovery reach.