1 Introduction

Despite its success in describing the elementary particles and their interactions, the Standard Model (SM) is still incomplete [1]. Many models extending beyond the Standard Model (BSM) have been developed over the years, predicting the existence of new resonances. The search for new resonances, whether theoretically predicted or model-agnostic, is therefore a core discovery strategy in experimental high energy physics (see e.g. the recent [2,3,4]).

With almost no exception, all BSM searches have been conducted following the blind analysis paradigm, in which an enormous amount of time and effort is invested before looking at the data, namely in background modeling and systematic uncertainty estimation. These resource-intensive tasks have allowed only a limited region of the space spanned by all observables (“observable-space”) to be explored to date. Indeed, searches typically focus on inclusive final states – di-lepton, di-photon, di-jet, etc. – ignoring all other observables and avoiding exclusive selections such as di-lepton + jets, di-jet + missing transverse momentum, or di-photon within a \(t\bar{t}\) topology. Moreover, within the studied final states, the event selection is usually optimized relative to predefined signal models. So far, no significant indication of BSM physics has been found.

Complementary to the blind analysis paradigm, we propose a data-directed paradigm (DDP) which begins by efficiently identifying regions of interest in the data. Similarly to [5,6,7], albeit without using Monte Carlo (MC) simulation, the strategy consists of quickly searching the observable-space for exclusive regions exhibiting a significant deviation from some fundamental SM property. Such regions should be considered as data-directed signal hypotheses and further examined using traditional analysis techniques. As in [4], no MC simulation is used, so the search is not sensitive to MC mismodelling or limited MC statistics. Given the large number of plausible signals which could manifest in an infinite number of exclusive regions, and moreover the limited time, manpower and resources at hand, searches like the proposed DDP might provide our best chance at discovering BSM physics.

2 A data-directed paradigm

A DDP search can be realized with two key ingredients:

  1. A theoretically well-established property of the SM with respect to which deviations can be searched for – here we exploit the fact that within the SM, in the absence of resonances, almost any invariant mass distribution is smoothly falling. Other properties of the SM, such as flavour symmetry [8] or forward-backward symmetry, could also be exploited once detector effects are taken into account (as implemented for instance in [9]).

  2. An efficient algorithm to scan the observable-space in search of deviations – here we train a deep neural network (NN) to map any invariant mass distribution into a distribution of statistical significance for excesses of events (“bumps”). The latter is known as a “z” distribution and is based on the profile likelihood ratio test for positive signals [10]; a sketch of the bin-wise formula is given after this list. Different algorithms should be developed when searching for deviations from other SM properties.
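For concreteness, if each bin is treated as an independent counting experiment with observed count \(n\) and known background expectation \(b\) – a simplification we adopt here for illustration – the asymptotic formula of [10] for the one-sided significance of an excess reads

$$\begin{aligned} z = \sqrt{2\left( n\ln \frac{n}{b}-\left( n-b\right) \right) }\ \ \mathrm {for}\ n>b, \qquad z = 0\ \ \mathrm {otherwise}. \end{aligned}$$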

The challenge of bump-hunting is an excellent showcase for a search in the DDP; even a simple implementation achieves good accuracy. As long as the underlying background distribution is smoothly falling, a single trained NN, as described in this letter, can quickly perform statistical inference on many selections of observed data. For example, when adapted to a narrower mass range, it predicted within seconds a maximum significance in agreement with the di-muon results presented in [3]. Crucially, it avoids the time- and effort-consuming tasks of full background and systematic uncertainty estimation currently carried out for every invariant mass distribution under consideration. In this way, a potentially unlimited number of exclusive distributions can be scanned and large regions of the observable-space can be covered. Nevertheless, event-by-event optimization for bump enhancement, as studied for instance in [11,12,13], is left to future work.

Bumps identified using the DDP are likely to be caused by statistical fluctuations; these will disappear when tested with more data. Bumps originating from detector-related systematic effects (trigger thresholds, kinematic edges, etc.) should appear in MC simulations as well and can be ruled out. Among the bumps which neither disappear with added data nor appear in simulation, the most significant ones should be considered as BSM signal hypotheses and subjected to a dedicated analysis. Inevitably, some may be due to mismodelled systematic effects.

3 A neural network implementation

The NN we employ is trained in a supervised manner, for which we generate a set of artificial training and testing data. These contain inputs which simulate realistic distributions in observed data (in contrast to individual events, as in [11,12,13]), and can be further tailored for any given search. Inputs are matched to analytically calculated z distributions as targets. When given an invariant mass distribution, the NN predicts a z distribution which shows where and how likely it is that the data contains a bump. Once the NN has been trained, we validate that its predictions are consistent and that its loss value converges. Finally, its predictions are evaluated on the test set and we discuss its performance.

We generate the inputs of the NN as 100-bin histograms of observed events, \(N=B+S\). These are representative of data with high statistics and a large dynamic range (the bin width reflects a given detector resolution). The generation process is illustrated in Fig. 1. Each input is composed of a smoothly decaying background curve, B, to which Poisson fluctuations are added, and a localized Gaussian signal, S, whose significance is defined relative to the fluctuated background. The corresponding NN target, z, is calculated bin-by-bin; it approximates the significance distribution given the unfluctuated background and assuming a Gaussian signal shape [10]. Each input and target pair is collectively referred to as a “sample”. All samples are globally scaled to the interval [0, 1] under a linear transformation before being utilized by the NN.
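A minimal sketch of this generation step is given below (Python/NumPy). The exponential background form, the bin count, the signal parameters and the bin-wise z formula follow the text and Eq. (1); the helper names, the exact signal normalisation and the scaling constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BINS = 100

def make_sample(signif=3.0):
    """Generate one (input, target) pair: a fluctuated spectrum and its z distribution."""
    x = np.arange(N_BINS) + 0.5                       # bin centres
    # Smoothly falling background through (x1, y1) and (x2, y2): exponential form of Eq. (1)
    y1, y2 = sorted(rng.uniform(100, 10_000, size=2), reverse=True)
    a = np.log(y1 / y2) / (x[-1] - x[0])
    background = y1 * np.exp(-a * (x - x[0]))
    fluctuated = rng.poisson(background).astype(float)
    # Gaussian bump: mean in bins 25-76, width fixed at 3 bins; the amplitude below sets the
    # peak-bin excess to the requested significance relative to the fluctuated background
    # (an illustrative convention -- the exact normalisation is not spelled out in the text)
    mean = rng.uniform(25, 76)
    shape = np.exp(-0.5 * ((x - mean) / 3.0) ** 2)
    signal = signif * np.sqrt(fluctuated[int(mean)]) * shape
    observed = fluctuated + signal
    # Bin-by-bin asymptotic significance w.r.t. the unfluctuated background [10]
    ratio = np.clip(observed / background, 1e-12, None)
    q0 = 2.0 * (observed * np.log(ratio) - (observed - background))
    z = np.where(observed > background, np.sqrt(np.maximum(q0, 0.0)), 0.0)
    # One plausible linear scaling to [0, 1] (the exact transformation is not specified):
    # divide the input by its maximum and the target by the 20-sigma training-range maximum
    return observed / observed.max(), z / 20.0

samples = [make_sample() for _ in range(1000)]
```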

Fig. 1 Illustration of the sample generation procedure. a A smoothly decaying background curve (orange) is generated over 100 bins and each bin is assigned a Poisson fluctuation. A signal with a given significance relative to the fluctuated background (green) is added, producing the observed data (blue). b The corresponding significance distribution, z, is calculated analytically. The left and right axes in both panels show the non-scaled and scaled distributions, respectively

A variety of smoothly falling backgrounds is modelled by randomly selecting one of the following ten functional forms for each sample:

$$\begin{aligned}&be^{-ax}, \quad ax+b, \quad \frac{1}{ax}+b, \quad \frac{1}{ax^2}+b, \quad \frac{1}{ax^3}+b,\\&\frac{1}{ax^4}+b, \quad a\left( x-x_2\right) ^2+y_2, \quad -a\ln \left( x\right) +b,\\&\left( y_1-y_2\right) \cos \left( a\left( x-b\right) \right) +y_2, \quad \cosh \left( a\left( x-x_2\right) \right) +b. \end{aligned}$$
(1)

The parameters a and b are defined such that each curve decays between two points, \(\left( x_1,y_1\right) \) and \(\left( x_2,y_2\right) \), where \(x_1 < x_2\) are the centers of the extreme bins and \(y_1 > y_2\) are drawn randomly from the interval [100, 10,000].
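For example, for the exponential form \(be^{-ax}\), requiring the curve to pass through \(\left( x_1,y_1\right) \) and \(\left( x_2,y_2\right) \) fixes the two parameters as

$$\begin{aligned} a = \frac{\ln \left( y_1/y_2\right) }{x_2-x_1}, \qquad b = y_1e^{ax_1}, \end{aligned}$$

and the analogous pair of conditions determines a and b for each of the other forms.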

Gaussian-shaped signals are generated with mean values distributed randomly between bin 25 and bin 76. The width (standard deviation) of the signals is fixed at 3 bins. To improve detection of the desired features, the NN is trained on a data set containing signals with significances in the range [1, 20]\(\sigma \). The performance of the NN is then determined on a testing data set by evaluating its ability to identify bumps with a significance of 3\(\sigma \) – the common definition of a “hint” of BSM physics.

Various NN architectures can be used. Here, we choose an architecture based on a dense layer followed by six 1-dimensional convolutional layers. The latter are intended for feature-detection, while the former is useful in suppressing position-dependent biases. A “rectified linear unit” activation function is used. The “Adam” optimizer is used to minimize the “mean squared error” loss function over 200 epochs at a learning rate of 0.0003 with a batch size of 100. We generate a total of 600,000 training samples, 20% of which are used for validation, and 150,000 testing samples.
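A minimal Keras sketch consistent with this description is shown below. The optimizer, loss, learning rate, batch size and epoch count follow the text; the number of filters, the kernel size and the linear output layer are not specified in the text and are assumptions of this sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers

N_BINS = 100

def build_model():
    """Dense layer (suppresses position-dependent biases) followed by six 1-D
    convolutional layers (feature detection), mapping a scaled 100-bin
    spectrum to a scaled 100-bin z distribution."""
    inp = tf.keras.Input(shape=(N_BINS,))
    h = layers.Dense(N_BINS, activation="relu")(inp)
    h = layers.Reshape((N_BINS, 1))(h)
    for _ in range(5):
        h = layers.Conv1D(32, kernel_size=7, padding="same", activation="relu")(h)
    h = layers.Conv1D(1, kernel_size=7, padding="same")(h)  # sixth conv layer: per-bin output
    out = layers.Flatten()(h)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4), loss="mse")
    return model

model = build_model()
# Hyper-parameters from the text; x_train / z_train are placeholders for the scaled samples:
# model.fit(x_train, z_train, epochs=200, batch_size=100, validation_split=0.2)
```

Once trained, a single predict call over a batch of scaled histograms returns all of their z distributions at once, which is what makes scanning many selections fast.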

4 Results

The accuracy of the NN prediction is illustrated in Fig. 2 in terms of the difference between the maximal predicted significance, \(z\mathrm {^{max}_{pred}}\), and the one calculated via the profile likelihood ratio test, \(z\mathrm {^{max}_{true}}\). All generated test samples are included in the figure; in over 87% of these, the predicted peak was found within 1 bin of the true one. A mean (\(\mu \)) of \(-0.02\) indicates a negligible bias in the prediction, and a standard deviation (\(\sigma \)) of 0.46 quantifies its precision. The asymmetry seen as a sharp edge in the third quadrant originates from the small number of maximal z predictions below one.

Fig. 2 The difference between \(z\mathrm {^{max}_{pred}}\) and \(z\mathrm {^{max}_{true}}\) as a function of \(z\mathrm {^{max}_{true}}\). Dense regions are shown in red (roughly corresponding to the \(1\sigma \) region), while sparse regions are shown in blue

We are interested in finding samples with bumps of 3\(\sigma \) significance while rejecting samples without bumps. Figure 3 shows \(z\mathrm {^{max}_{true}}\) in a solid line and \(z\mathrm {^{max}_{pred}}\) in a dashed line for samples with no signal added (blue) and for samples with a 3\(\sigma \) significance signal added (orange). In a traditional bump-hunting search, the signal significance is evaluated relative to an estimated background. Thus, the measured significance of a 3\(\sigma \) signal could fluctuate around this value. This is the origin of the width of the \(z\mathrm {^{max}_{true}}\) distributions: the signal is generated with a significance relative to the fluctuated background and its \(z\mathrm {^{max}_{true}}\) is evaluated relative to the smooth background.

Fig. 3 The distribution of \(z\mathrm {^{max}_{true}}\) (solid line) and \(z\mathrm {^{max}_{pred}}\) (dashed line) for samples with no signal added (blue) and for samples with a 3\(\sigma \) significance signal added (orange)

According to the Neyman–Pearson lemma (see e.g. [14]), \(z\mathrm {^{max}_{true}}\) provides the most powerful signal-to-background separation; it relies on exact knowledge of both the background and signal shapes. Yet, despite using no a priori knowledge of either, the separation achieved by \(z\mathrm {^{max}_{pred}}\) is only slightly degraded relative to \(z\mathrm {^{max}_{true}}\). This is quantified in terms of receiver operating characteristic (ROC) curves in Fig. 4, obtained from the distributions of Fig. 3. The true (blue) and predicted (orange) ROC curves show the efficiency to correctly identify a 3\(\sigma \) bump versus the false positive rate of selecting samples with no injected bump. The area under the true curve, \(A_{\mathrm {true}}\), is 0.899, while the area under the predicted curve, \(A_{\mathrm {pred}}\), is 0.865, implying a degradation in performance of less than 4%. In other words, the probability that a selection is marked as potentially interesting based on the NN output approaches the probability that a traditional method would mark it as such.

Fig. 4 True (blue) and predicted (orange) ROC curves and their associated areas under the curve, \(A_{\mathrm {true}}\) and \(A_{\mathrm {pred}}\)
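The ROC comparison of Fig. 4 can be reproduced from the \(z\mathrm {^{max}}\) values alone; below is a sketch using scikit-learn, where the input arrays (placeholder names) hold the maxima of the true or predicted z distributions for the no-signal and 3\(\sigma \)-signal test samples.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_from_zmax(z_max_nosig, z_max_sig):
    """ROC for separating 3-sigma-signal samples from no-signal samples,
    using the maximum of the z distribution as the discriminant."""
    labels = np.concatenate([np.zeros_like(z_max_nosig), np.ones_like(z_max_sig)])
    scores = np.concatenate([z_max_nosig, z_max_sig])
    fpr, tpr, _ = roc_curve(labels, scores)
    return fpr, tpr, auc(fpr, tpr)

# e.g., with per-sample z distributions of shape (n_samples, 100):
# fpr_t, tpr_t, A_true = roc_from_zmax(z_true_nosig.max(1), z_true_sig.max(1))
# fpr_p, tpr_p, A_pred = roc_from_zmax(z_pred_nosig.max(1), z_pred_sig.max(1))
```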

We also confirmed that the NN generalizes: it identifies bumps with comparable accuracy over linear combinations of the background forms (Eq. 1), and over an unseen tenth shape when trained on only nine of the background shapes. Its capacity to detect bumps is thus not restricted to specific background forms, which goes beyond the potential of traditional techniques. Similar performance was obtained in additional scenarios: when testing on distributions with lower and higher statistics (in the ranges 100–500 and 5000–10,000, respectively), when extending the allowed bump region from bins 25–76 to bins 5–96, and when training and testing on samples with wider bumps of either 4 or 5 bins.

5 Validation

We validate that the loss value converges with respect to both the number of training epochs and the size of the training data set. In terms of \(A_{\mathrm {pred}}\) from Fig. 4, the NN performance varies insignificantly (by less than 1%) when training beyond 200 epochs (for 100,000 input samples) or beyond 500,000 input samples (for 100 epochs).

Consistency was ensured by comparing the NN predictions in two scenarios (each with 100,000 training samples and 100 epochs). First, we trained four different NNs, each on an independent training data set, and compared their predictions on a common testing data set of 25,000 samples. Second, the performance of each NN was compared across four different testing data sets. In all cases, the signal-to-background separation accuracy was unaffected.

6 Discussion

We have presented a data-directed paradigm, complementary to the blind analysis paradigm, and demonstrated one of its possible implementations using the concept of bump-hunting. We have shown that a NN can be trained to efficiently identify bumps over smoothly falling backgrounds without being given any a priori information about the background or the bump's position. Relative to the most powerful test statistic (the profile likelihood ratio), which relies on exact knowledge of both the background and signal shapes, the NN performance was degraded by less than 4% in terms of the area under the ROC curve. Since for each data distribution the NN prediction is obtained within a couple of seconds (compared to a year or more when following the blind analysis paradigm), these results pave the way towards scanning the overwhelming observable-space measured in experiments in search of bumps. Examples could be searches for di-lepton, di-jet, di-photon, or jet-lepton-missing \(E_T\) resonances in events containing, in addition, any other set of objects.

In the search for BSM physics we must leave no stone unturned. Complementary to traditional theory-directed blind analysis searches, the DDP should be pursued as well. With the expected ramp-up of the Large Hadron Collider, existing data should be thoroughly explored. A first milestone could be demonstrating sensitivity to bumps in regions already investigated. If needed, dedicated NNs could be trained to account for scenarios not covered by the current implementation (e.g. different dynamic ranges, binnings or widths), and other architectures could be explored. The search for bumps over smoothly falling backgrounds is just one example of an SM property that could be considered; others, such as flavour symmetry [8] or forward-backward symmetry, could be exploited as well. Given the challenge ahead, searches like the proposed DDP might provide our best chance at discovering BSM physics.