Searching for periodic signals in kinematic distributions using continuous wavelet transforms

Many models of physics beyond the Standard Model include towers of particles whose masses follow an approximately periodic pattern with little spacing between them. These resonances might be too weak to detect individually, but could be discovered as a group by looking for periodic signals in kinematic distributions. The continuous wavelet transform, which indicates how much a given frequency is present in a signal at a given time, is an ideal tool for this. In this paper, we present a series of methods through which continuous wavelet transforms can be used to discover periodic signals in kinematic distributions. Some of these methods are based on a simple test statistic, while others make use of machine learning techniques. Some of the methods are meant to be used with a particular model in mind, while others are model-independent. We find that continuous wavelet transforms can give bounds comparable to current searches and, in some cases, be sensitive to signals that would go undetected by standard experimental strategies. ar X iv :1 90 7. 03 67 6v 1 [ he pph ] 8 J ul 2 01 9


Introduction
Experimental searches for physics beyond the Standard Model often look for peaks or dips in kinematic distributions. Although this is certainly well motivated, there exist potential signals that can take far more complicated forms and which have received very little attention.
One such possibility is periodic signals. These are a typical signature of models that include a large number of similar resonances with small mass splitting. Such models include the linear dilaton scenario [1][2][3][4], discrete and continuum clockwork models [3,4], certain limits [5][6][7] of the Randall-Sundrum model [8] and more exotic warped extra dimensions [9]. An example of the two-photon invariant mass distribution is shown in figure 1 for the clockwork/linear dilaton (CW/LD) scenario [4].
Intuitively, one may think that taking the Fourier transform of a distribution like this would be an ideal strategy to exploit its periodic nature. There are however some complications with this in practice, as in most scenarios signals do not repeat themselves perfectly and indefinitely. For example, the repetitions might only occur over a finite interval. The position of that interval, which could potentially be used to discriminate signal from background, is essentially lost when passing to frequency space. Also, the frequency of the signal may not be constant. The Fourier transform would then not convey clearly which frequency is present at which point. All in all, certain characteristics of realistic periodic signals are easier to see in the time domain, while others are more clear in the frequency domain. As such, neither the signal itself nor its Fourier transform is ideal to discover a signal whose periodicity changes with time.
Continuous Wavelet Transforms (CWT) address these issues by projecting a given signal over a basis of functions that are localized in both time and frequency space. An example of such a basis function is shown in the inset of figure 1. The output of the CWT is a scalogram, a two dimensional function which indicates how much a certain frequency is present at a given time. The CWT of the signal of figure 1, as well as a similar but weaker signal, is shown in figure 2. A signal that repeats itself with constant frequency would appear as a horizontal line, while one whose frequency changes with time as a line that moves up and/or down. A varying amplitude of the oscillations is also directly represented in the scalogram, as well as when the signal starts and ends. This makes the CWT very flexible when it comes to discovering generic periodic signals.
The goal of the current paper is to demonstrate that continuous wavelet transforms can be used to discover periodic signals in kinematic distributions, in particular in the context of new physics searches at colliders. To do this, we will present a series of methods through which CWT can be used to discover such signals. These will range from the use of a simple test statistic to more advanced machine learning techniques. We find that CWT can compete with current methods and in some cases be sensitive to signals that would otherwise go undetected by current analyses. The paper is organized as follows. We begin by defining the continuous wavelet transforms more carefully. The set of methods are then introduced. Windowed Fourier transforms are then presented to serve as a comparison. The different methods are then compared in the context of the diphoton signal of the CW/LD benchmark, details about which are provided in the appendix. Some concluding remarks complete the paper.

Overview of continuous wavelet transforms
Assume ψ(t) is a basis function localized in both time and frequency space. The continuous wavelet transform of a signal f (t) at a scale a > 0 and translational parameter b ∈ R is given by a projection over rescaled and shifted version of ψ(t): In practice, it is a measure of how much a certain frequency is present in the signal at a given time. The function ψ(t) is known as the mother wavelet and its rescaled and shifted versions as daughter wavelets. The mother wavelet is required to satisfy two conditions: where Ψ(ω) is the Fourier transform of ψ(t). The first condition is simply that the mother wavelet has a finite norm and the second is known as the admissibility condition. The latter implies Ψ(0) = 0, which means in turn that an admissible wavelet must integrate to zero and as such that the CWT of a constant function is zero. Note that for practical uses the signal might be binned, in which case the integral is replaced by a sum over bins. We will use the Morlet wavelet throughout this article. It consists of a localized wave packet and is given by: where B and C are two constants that we will take as 2 and 1 respectively. With this choice, the wavelet transform of a signal will be maximum when its wavelength corresponds approximately to the scale a. The second term ensures that the admissibility condition is satisfied, though it can safely be ignored for our choice of parameters. The Morlet wavelet is shown in the inset of figure 1. Do note that there exist different conventions on the definition of the Morlet wavelet. 1 In this article, we will be using the example of production of a set of resonances decaying to two photons. The invariant mass of the two photons m γγ will play the role of t. When we discuss the distribution of m γγ directly, we will say we are in mass space. When we deal with its wavelet transform, we will say we are in frequency space. The same ideas can be applied to dielectron, dimuon and other final states.

Search strategies with continuous wavelet transforms
We present in this section a series of methods through which continuous wavelet transforms can be used to discover periodic signals in kinematic distributions. These methods will be compared in section 4 by applying them to the CW/LD scenario [4] assuming the diphoton dataset of ref. [10] (37 fb −1 at 13 TeV). All illustrations are also taken from examples of that model.

Method 1: Model-specific search with a simple test statistic
The first strategy that we discuss is the use of a simple test statistic to detect a specific signal. A periodic signal is reflected in a scalogram as a series of ridges, each corresponding to a different harmonic. A given harmonic corresponds of course to a single scale for a given mass. Typically, the first harmonic is dominant and will have a large significance before the other harmonics are even visible. As such, we will concentrate on only the first one. Each point of a scalogram of the measured data can be assigned a local p-value by generating a large set of toy experiments with background only and determining which fraction of these have a larger norm of the wavelet coefficient at that point. Statistical fluctuations are simulated by finely binning the expected spectrum in mass space, fluctuating the number of events in each bin according to the Poisson distribution, and then applying the CWT. Of course, the bin size must be chosen to be smaller than the expected scale of the signal. A signal will appear as a valley of low p-value, i.e. an extended structure. This can be seen in figure 3. With these considerations in mind, a natural choice for a test statistic is: where the sum is over the mass bins of the scalogram, a i is the scale of the first harmonic for bin i and p i (a i ) the local p-value for bin i at scale a i . Roughly speaking, i min is where the first harmonic starts to be discernible and i max where it stops to be. In practice, it is best to perform an optimization of these parameters on a case by case basis. The division by the scale is performed to counteract the fact that fluctuations typically span a mass range of the order of their scale. This test statistic is also inspired by the Fisher method [30]. More advanced test statistics could in principle be used, but eq. (3.1) is easy to implement and will be shown to return good limits.

Method 2: Model-independent search with a simple test statistic
Arguably the greatest strength of continuous wavelet transforms is that they can reveal a periodic signal that was not predicted by any previously considered model. If a region of discrepancy is present in a scalogram, a question that would need to be answered is how significant it is. Previous work on the subject includes [20,26]. First, bins of interest can be selected by asking that their local p-value be below a certain value. They are then grouped into continuous regions. The test statistic (3.1) is then applied to each region by taking the bin with the smallest p-value of each column. The largest test statistic is kept as a hyperstatistic. The statistical significance can then be obtained via a series of toy experiments.

Method 3: Neural network and a simple test statistic
Wavelet transforms map a periodic signal present over a background to an excess over an extended region in a scalogram. When the amount of statistics available is limited, such excesses can potentially be mistaken for simple statistical fluctuations of the background. As can be seen in figure 2, the scalogram of a given signal can take a very complicated form in practice. However, it is clear that a real signal will tend to present certain features that are typically absent from background fluctuations and vice-versa. For example, scalograms of a CW/LD signal will present an excess that is mostly horizontal, while statistical fluctuations of the background can present excesses that are almost completely vertical. These features can potentially be used to increase the statistical significance of a given signal. Trying to manually classify them is at the very least an extremely daunting task, but it is an obvious application of machine learning.
One possibility is to train a neural network to identify regions compatible with the signal searched for inside a scalogram and then calculate their significance using a test statistic. Note that after the mass spectrum is sampled with sufficient resolution to capture the details of the signal, the resulting scalogram can be sampled in a cruder fashion thanks to the extended nature of the excess. This helps making the size of the neural network's input manageable. The neural network is trained on an equal mix of pure backgrounds and examples with signals from random points in the parameter space of the model. In both cases, the network's input is the norm of the CWT of the signal + background divided by the expectation value of the norm of the CWT of the background only W b . When there is a signal, the output it is trained to return is W P s / W b , where W P s is the norm of the wavelet transform of the pure (expected) signal. When there is no signal, the neural  network is trained to return zero in every bin. The details of the neural network we used are given in table 1. When applied to a pseudo-experiment, the neural network will assign bins compatible with a signal-like excess a much larger value than those associated to the background. An example of this for a sample containing a signal is shown in figure 4, which shows that the neural network returns a very signal-like shape. For background-only samples, the neural network typically fails to return such well-defined regions. In principle, one could also train the neural network to return some other function, as long as its region of maximal value corresponds to where the excess should appear in the scalogram. All neural networks were implemented via the Python deep learning library Keras [31] with the TensorFlow backend [32] using the Adam optimizer [33].
Having identified the region of interest, one can select for each mass column the scale with the largest neural network output and apply the test statistic of eq. (3.1) using the actual value of the wavelet coefficient for this bin. A minimal value of the neural network output, optimized for each point in the parameter space, is required for a bin to be counted in the test statistic. This way, the neural network's opinion on whether a signal-like excess is present at all, is taken into account. The expected sensitivity limits are obtained by pseudo-experiments.

Method 4: Classifier neural network
A more powerful way to use machine learning to discover a specific signal in a scalogram is with a classifier. First, toy experiments, with and without the signal (for a specific choice of model parameters), are generated and finely binned mass spectra are produced. The continuous wavelet transform of the mass spectrum is taken and rebinning is performed. The norm of each bin is then passed as an input to a convolutional neural network whose output is trained to be one when the signal is present and zero otherwise. The output of the neural network can then be used as a test statistic. The structure of the neural network that we used and additional parameters are provided in table 2.

Method 5: Autoencoder neural network
A more model-independent option to look for generic discrepancies in scalograms is via autoencoders, similar to their application to jet images in ref. [34]. The idea is to use an autoencoder network to compress a scalogram to a smaller set of parameters which are then used to reconstruct the original scalogram. The neural network is then trained on backgrounds only to reproduce the original scalogram as well as possible. After training, the neural network should be able to reproduce the original scalogram to good approximation if applied to a typical background sample, and fail if applied to a sample that contains a signal. One can then use the reconstruction loss function as a test statistic. The details of the neural network are presented in table 3 and are simply an adaption of the convolutional neural network of ref. [34]. The input of the neural network, which is also the output it is trained to return, is the negative log of the local p-value of each bin. Examples of inputs and outputs of the neural network are shown in figure 5. As can be seen, the autoencoder manages to reproduce approximately the major fluctuations of the background. At the same time, as desired, it fails to reproduce the signal, which results in a much larger value for the reconstruction loss function.

Fourier analysis as a reference
Before moving on to comparing the methods, we discuss the use of Fourier transforms to discover periodic signals, which is the approach that was proposed in ref. [4] in the context of the CW/LD scenario. This will serve as a reference, though it is clear that CWT are far more general. We define in general the power spectrum as: The quantity L is the parton luminosity given for CW/LD by L(m 2 ) = L gg (m 2 ) + 4 3 q L qq (m 2 ) (see eq. (A.4)). The mass spectrum is divided by this quantity to counteract the fast decrease that it causes in the signal and bring it closer to a regularly oscillating function [4]. For a general signal, the function g(m) is defined such that the mass of the nth resonance is related to its index by n = g(m n ). This is the quantity in terms of which the locations of the resonances are periodic and as such the quantity in terms of which the Fourier transform is best performed. 2 Simply doing a Fourier transform in terms of m would not lead to an optimal significance when g(m) differs too much from m. This 2 As the masses mn represent a discrete set and g(m) must be a continuous function, simply requesting g(m) to reproduce the correct masses does not fully define it and a smooth interpolation needs to be provided for the intermediate masses. In practice, there is usually an obvious definition for g(m).  is because a signal that varies over a wide range of frequencies over its duration would lead to a very wide peak in Fourier space. For CW/LD, we take g(m) = R √ m 2 − k 2 (see eq. (A.1)) [4]. Obviously, the power spectrum is expected to peak at T = 1. The value of the peak can then be used as a test statistic.

Comparison between the different methods
To compare the different methods, we apply them to the CW/LD scenario with the γγ dataset of ref. [10]. The resulting reach in the parameter space of the model is shown in figure 6, where the contours correspond to a median expected significance of 2 sigma. As described in more detail in appendix A, the parameter k is approximately the mass at which the spectrum begins, while M 5 controls the cross section, which is approximately proportional to 1/M 3 5 . The structure of the resonance masses m n , if described in terms of m n /k, is essentially independent of k and M 5 in the range of parameters considered, and the asymptotic value of the mass splittings at high mass is given by ∆m ≈ k/10.
The blue (thin solid) curve corresponds to the test statistic method of section 3.1. The lower limit of the sum in eq. (3.1) was optimized to maximize the reach. A similar procedure was done for the upper limit, but we found that the results were virtually identical to those where the upper limit was taken as very large. In practice, we simply took min(10k, 2.7 TeV), where the latter is the upper limit of the experimental data. In terms of the reach in M 5 , the sensitivity peaks at around k ≈ 600 GeV. As k moves upward, the limit on M 5 decreases as the total number of gravitons produced decreases. As k moves downward, the part of the spectrum in which the splittings are resolvable experimentally also moves to lower masses, and the higher background at the lower masses lowers the sensitivity. The purple (thick solid) curve corresponds to the model-independent search of section 3.2. Bins were considered as significant if their p-values were below 10%. This value represents a fair compromise. A much lower value would reduce the sensitivity to weak signals, while for much higher values the selected regions would have such an extent that their interpretation would become unclear. As this approach covers a vast space of possible models, the bounds are unsurprisingly weaker than with the previous approach.
The green (medium thickness, dashed) curve corresponds to the discrepancy region finder of section 3.3. The neural network was trained with signals with k varying from 200 to 2700 GeV and M 5 varying from 1 to 6.5 TeV, where the latter number was obtained by optimization. The reach of this model-independent (within the CW/LD framework) method is somewhat better even than that of the first (model-specific) method thanks to the neural network's ability to assess whether a given excess is signal-like.
The red (thin dashed) curve corresponds to the classifier method of section 3.4. It gives the strongest bounds. One should note, however, that in this case one still needs to account for the look-elsewhere effect, similar to the first method.
The orange (thick dashed) curve corresponds to the autoencoder of section 3.5. While the bounds are somewhat weaker than those of some of the other methods, one should note that the look-elsewhere effect is already taken into account in this case.
The black (thin dotted) curve corresponds to the bounds from Fourier analysis as in section 3.6. As in ref. [4], the lower limit of the integral was taken to be m min = k, i.e. where the signal starts, and the upper limit m max was optimized at each point in the parameter space to maximize the expected significance. The bounds are mostly similar to those obtained using the CWT and a test statistic without machine learning.

Discussion
The LHC experiments have by now developed very comprehensive sets of analyses that provide good coverage of essentially all the simple final states, as well as many exotic ones. They have discovered the Higgs boson, and progress is constantly being made on covering more and more of the parameter space in which physics beyond the Standard Model may be found. While no signs of new physics are seen yet, the theoretical expectation that at least the solution to the electroweak-Planck hierarchy problem is likely to be within the energy reach of the LHC calls for continuing the searches. However, as the LHC experiments have by now matured, improvements in the reach of existing techniques will mostly be gradual. It can therefore be very useful, as an alternative to just waiting, to think about new ways of looking at the data.
In this paper we proposed that wavelet transforms offer such a new way. It is a very general method that can be applied to many different final states at the LHC to search for periodic signals in kinematic distributions in a rather model-independent way. We have also pointed out examples of theoretical models, including some that address the hierarchy problem, for which such searches could be relevant.
We have designed and simulated five different approaches for processing the scalograms produced by the wavelet transforms. Part of these approaches use machine learning techniques, which is another direction into which new physics searches at the LHC can expand, as has been also proposed recently in several other contexts (e.g., refs. [34][35][36][37][38][39][40][41][42][43]).
In our first approach, one assumes a specific new physics model and tests for the presence of its signature in the corresponding region of the scalogram using a simple test statistic. In the second, model-independent approach, the whole scalogram is being searched for extended regions of excess, and the most significant excess is assessed using the test statistic. In the third approach, the scalogram is being analyzed by a neutral network, which searches for regions of excess whose shape is consistent with excesses expected in a given class of models. The fourth approach tests for a specific signal using the classifier neural network, whose single output turns out to be a much more powerful test statistic than the more pedestrian test statistic of the first three methods. Finally, the fifth approach is a model-independent analysis in which an autoencoder neural network learns the background only, and identifies potential signals as deviations from a typical background.
We have exemplified the different methods and compared their sensitivities in the context of the diphoton invariant mass spectrum and the clockwork / linear dilaton model. We have also compared them with the Fourier transform method that was proposed in the context of the same model in ref. [4].
Our reach estimates should be viewed as conservative as there are various possible optimizations, either model-dependent or general, that we have left outside the scope of the current study. For example, we have not explored the possibility of using wavelets other than the Morlet wavelet. Also, we have done only very basic optimization of the architecture and the training parameters of the neural networks, so their performance is likely suboptimal. Nevertheless, all the methods resulted in sensitivities of up to 6-8 TeV in M 5 , which is comparable to existing searches that would be sensitive to the same scenario [4,44]. In addition, we would like to emphasize that similar analyses are interesting also in the dilepton [45,46], dijet [47,48] and other invariant mass spectra (and possibly other kinematic variables) and that the relative strengths and weaknesses of the different methods can vary depending on the final state, the new physics scenario, and the integrated luminosity.
Wavelet-space searches can be of special importance if the new physics signal does not extend to sufficiently high masses where the background is low; if the new physics makes both positive and negative contributions to the mass spectrum (due to quantum interference); and in situations in which the systematic uncertainty on the normalization of the mass spectrum is a limiting factor. In these kinds of cases, signals can go undetected by existing experimental strategies (given a finite integrated luminosity), but discovered in wavelet-space searches.
In conclusion, we hope to have convinced the reader that making wavelet-space searches part of the ATLAS and CMS toolkits is a possibility worth considering. Note added: When this work was close to completion, ref. [49] appeared, which also proposed wavelet transforms as a way to search for new physics in kinematic distributions. Our approaches differ substantially due to the fact that ref. [49] uses the discrete wavelet transform based on the Haar wavelet, whereas our methods use the continuous wavelet transform.
a setup is its ability to explain the hierarchy between the electroweak and Planck scales. This scenario has first appeared in ref. [50], inspired by the seven-dimensional gravitational dual [51,52] of Little String Theory [53,54]. More recently, the same five-dimensional geometry has been rediscovered in [3] while exploring new applications for the clockwork mechanism [55][56][57], in the limit of a large number of sites. Many phenomenological aspects of this scenario have been studied in refs. [1,2,4,58].
Most important for the collider phenomenology of the model are the Kaluza-Klein (KK) gravitons, whose masses (using the notation of ref. [4]) are given by The model parameters k and R, which are related to the curvature and size of the extra dimension, are predicted to satisfy kR ≈ 10 if this scenario is indeed responsible for the hierarchy. This implies a spectrum of narrowly-spaced resonances starting from mass m 1 k, with mass splittings that grow as a function of the mode number n before reaching an asymptotic value of ∆m 1/R ≈ k/10 for n kR. Near the beginning of the spectrum, the relative splittings ∆m/m are around a few percent (almost independent of the value of k), while their decrease as 1/m at large n implies that at some point they fall below the experimental resolution. The intrinsic widths of the resonances are almost always negligible relative to the experimental resolution.
The KK graviton fields h µν n couple to the Standard Model stress-energy tensor T µν as where Λ 2 n = M 3 5 πR 1 + Here M 5 is the five-dimensional reduced Planck mass, which is the fundamental scale of the theory. These couplings allow the KK gravitons to be produced from gg and qq in pp collisions with the cross sections σ(pp → G n ) = π 48Λ 2 are the parton luminosities, for which we take the LO MSTW2008 PDFs [59]. These couplings also allow the KK gravitons to decay to pairs of Standard Model particles, including γγ. Heavy KK gravitons can also decay to pairs of lighter KK gravitons or KK scalars. We account for these decays in computing the γγ branching fraction, taking the case of rigid boundary conditions for the dilaton. However, since they start having an impact only for m k, their effect is insignificant in the range of parameters we consider in this work. For additional details, see ref. [4]. The γγ branching fraction ends up being about 4%, almost independent of the model parameters or the KK graviton mass. Since the parameters M 5 , k and R must combine to give the known value of the fourdimensional reduced Planck mass, M P ≡ 1/ √ 8πG, as M 2 P = (e 2πkR − 1)M 3 5 /k, only two of the parameters are independent, and we choose them to be M 5 and k. The parameter k determines the beginning of the KK graviton spectrum, while M 5 fixes the cross section, which is approximately proportional to 1/M 3 5 . We assume the experimental resolution in the diphoton invariant mass to be which is based on partial information from refs. [10,[60][61][62]. This resolution is used in figure 1 and in the rest of the paper. We take the background from the search for heavy resonances decaying to two photons of ref.