Analytical estimation of the signal to noise ratio efficiency in axion dark matter searches using a Savitzky-Golay filter

The signal to noise ratio efficiency ϵSNR in axion dark matter searches has been estimated using large-statistic simulation data reflecting the background information and the expected axion signal power obtained from a real experiment. This usually requires a lot of computing time even with the assistance of powerful computing resources. Employing a Savitzky-Golay filter for background subtraction, in this work, we estimated a fully analytical ϵSNR without relying on large-statistic simulation data, but only with an arbitrary axion mass and the relevant signal shape information. Hence, our work can provide ϵSNR using minimal computing time and resources prior to the acquisition of experimental data, without the detailed information that has to be obtained from real experiments. Axion haloscope searches have been observing the coincidence that the frequency independent scale factor ξ is approximately consistent with the ϵSNR. This was confirmed analytically in this work, when the window length of the Savitzky-Golay filter is reasonably wide enough, i.e., at least 5 times the signal window.


Introduction
According to cosmological measurements and the standard model of Big Bang cosmology [1], cold dark matter (CDM) is responsible for about 85% of the total matter in our Universe.CDM does not belong to the Standard Model of particle physics (SM), and as of today the nature of CDM is unknown in spite of the strong evidence of its existence [2].The axion [3] is one of the iconic CDM candidates and originally stems from a breakdown of a new symmetry proposed by Peccei and Quinn [4], as a very natural solution to the strong CP problem in the SM [5].In light of the current galaxy formation dark matter has to be massive, stable, and nonrelativistic, and the axion is predicted to meet all of these conditions.The axion haloscope search proposed by Sikivie utilizes axion-photon coupling and a microwave resonant cavity, which results in the resonant conversion of axions to photons [6].This enhances the detected axion signal power drastically.Making the axion haloscope the most promising method for axion dark matter searches in the microwave region.Recent experimental efforts have reached the Dine-Fischler-Srednicki-Zhitnitskii (DFSZ) axion sensitivity [7,8], where the DFSZ axion can be implemented in grand unified theories (GUT) [9].If DFSZ cold axion dark matter turns out to constitute 100% of the local dark matter density, that would not only explain the total matter in our Universe, but also support GUT.
Since the resonated axion signal power is only sensitive to the resonant frequency region, and information about axion mass is absent, the most significant and practical figure of merit in axion haloscope searches is the scanning rate [10].One of the experimental parameters determining the scanning rate is the signal to noise ratio (SNR) efficiency squared ϵ 2 SNR [11], where the SNR = P aγγ a /σ Pn according to the radiometer equation [12] and we define ϵ SNR later in equation (2.2).In the SNR equation, P aγγ a is the expected axion signal power for an axion-photon coupling strength [6,10] and σ Pn is the fluctuation in the noise power P n .Estimates of the ϵ SNR have relied on large-statistic simulation data that usually require a lot of computation time and storage resources [13][14][15].With hints reported in refs.[14,15], in this study we have developed an analytical method to estimate the ϵ SNR for axion dark matter searches.The ϵ SNR for axion haloscope searches is frequencyindependent or, equivalently, background-independent with the relevant scale factor ξ, as demonstrated by refs.[14,15], where ξ is necessary and affected by the combination of power excess bins that are correlated with each other due to background subtraction.In addition, one of our previous works showed that ϵ SNR is practically the signal power efficiency due to background subtraction [15].In another one of our recent works [16], however, we found a subtle effect to the background fluctuation due to background subtraction with a specific condition, which was considered as well.
Inspired by that information, we calculated the signal power efficiency ϵ P aγγ a = ϵ sig whose effects can first be seen in Fig. 1 and ξ independently using only an arbitrary axion mass and the relevant signal shape information.They showed good agreement with those obtained from large-statistic simulation data.In the end, axion haloscope searches have been observing the coincidence that the frequency independent scale factor ξ is approximately consistent with the ϵ SNR .This was confirmed analytically in this work, if the window length of our background estimator is reasonably wide enough, i.e., at least 5 times the signal window.

Obtaining the SNR efficiency analytically
The general data analysis procedure for axion haloscope searches studied in this work aims to maximize the significance of an axion signal by weighting and combining data points [14].Raw data is acquired through power spectra in the frequency domain.The digitized power spectrum data will have the background noise power stored in frequency bins that are spaced at intervals of the resolution bandwidth (RBW) ∆ν.The background will feature a baseline characterized by the system noise and its fluctuations.If an axion signal exists amidst the background, it will appear as a lineshape which is distributed to multiple frequency bins.In the case of the standard halo model, the lineshape is a boosted Maxwellian [17] This lineshape depends on the axion frequency ν a , r = v E /v rms , and β 2 = 1.5v 2 rms /c 2 , where v E is the velocity of Earth with respect to the galaxy halos, v rms is the local circular velocity of the galaxy, and c is the speed of light.In the analysis stage an axion signal window ∆ν a is used where the integral of f (ν) from ν a to ν a + ∆ν a nears unity.The axion signal power is weak; for instance in the CAPP-12TB experiment it was on the order of tens of yoctowatts [8], making it difficult to distinguish from noise fluctuations.Therefore the data is further processed in a way that increases the SNR of an expected signal.
Following the conventional methods of haloscope data analysis, an axion signal, should it exist, will appear as a sharp peak on the background as shown in Fig. 1.This is also replicable by inserting a software-injected axion signal into an experiment's background.
Then the SNR of an input signal, which will have its background perfectly removed, can be compared with the SNR of signal which has its background removed by a fit.Examples of a fit estimator are a χ 2 fit such as in refs.[8,18] or a Savitzky-Golay (SG) filter [19] as seen in refs.[14,16].In this work we will be focusing on the latter as it does not require a functional form of the background structure, can handle zero-fluctuation data like the axion signal power considered in axion dark matter searches, and its estimates are fully predictable.
Figure 1.The effect of an axion signal distorting a background estimation using an SG filter (left) and its effect in terms of normalized power excess (right).An axion signal with exaggerated signal power was added at 1.096 GHz.
Typically any estimate of the background will be affected by a signal such as the one shown in Fig. 1 and due to this the output SNR will not recover 100% of the input.The difference in the input and output SNR can be expressed through an efficiency term ϵ SNR where we define ϵ sig = δ output /δ input for input and output signal powers δ input and δ output .ϵ sig is approximately equal to SNR output /SNR input as σ Pn for input and output are practically equal [15].The frequency-independent scale factor ξ is used to offset the bin-to-bin correlations that are introduced from background subtraction [14,15].This value is set to the width of the Gaussian fit for the normalized power excess of the output.In terms of input, the normalized power excess' Gaussian width should be equal to one.We will first proceed to discuss how to obtain the value of ϵ sig .Using an SG filter as a background estimate, it is possible to construct a signal following equation (2.1) on a flat background that does not include any noise fluctuations.The input signal in units of arbitrary power excess will be distributed into n c = ∆ν a /∆ν frequency bins that have a resolution bandwidth ∆ν, assuming that the first bin's frequency is equal to ν a .The input signal power in the kth bin is where δ a is a multiplicative constant that represents P aγγ a and L k are the fractions of axion signal power stored into the kth bin.While our SG filter is designed to describe the background only, it will be applied to the set of δ k (the red line in the left panel of Fig. 2) to parametrize the spectrum containing the signal as well as the flat background.As also seen in the left panel of Fig. 2, the smoothing from the filter, shown as orange dashed lines, does not fit the background properly around the signal region when one is present.The exact response of the SG filter is with k c k = 1.δk is the SG filter's estimate of the background for the cases considered in this work.What is considered to be the signal power after background subtraction will be δ ′ k = δ k − δk (the blue line in the left panel of Fig. 2).The coefficients c k only depend on the SG filter parameters, window length 2W + 1 and polynomial order d.As k c k = 1, an estimation done with the SG filter acts like a weighted average.The same 2W + 1 values are applied to the coefficients when obtaining each separate δk with different k.A few examples of SG filter coefficients depending on the parameters can be seen in the right panel of Fig. 2. The coefficients can be obtained from its definition in the original paper [19], and it can be fit using a dth-order polynomial (for even d).It is also easy to obtain c k in SG filter packages in programming languages [20].δ k and δ ′ k are combined according to the analysis procedure [14].Following this all further combination processes use inverse-variance weighting.For any stage of combination, this means that a combined power δ combined and its error σ combined will follow for all relevant powers δ before k and their errors σ before k used for the combination.In the presence of σ before k , the weights w k = (σ before k ) −2 , except during the coadding process to obtain the grand spectrum.In this case w k = (σ before k ) −2 L k .Now we will obtain the SNR ratio through the axion signal power ratio: ϵ sig = δ output /δ input .As the SG filter response of each δ k is up to the multiplicative constant δ a , its effects cancel out in the value of ϵ sig .Therefore the value of ϵ sig will not change when signal power is added in a way that changes δ a by the same ratio for both input and output, such as vertical combination.This was confirmed to be in agreement with largestatistic simulations which do include vertical combination and will be further discussed in Sec. 4. Rebinning, or equivalently RBW reduction, however, does affect the results and will be discussed in Sec. 3. If this process is not used, the only leftover contribution to the analysis process is coadding bins.We can now use equation (2.5) with δ before For the particular bin that coincides with the axion frequency, the signal power after coadding n c bins is and resulting in where if |i − j| > W , c i−j = 0.As ϵ sig does not take any noise fluctuations into account it does not carry any error.Hence, this can be used as an exact value instead of an estimate.The value of ϵ sig depending on W is shown in the right panel of Fig. 3.The remaining factor in ϵ SNR , ξ, will be obtained next.For this calculation there will be no axion signal, and only the background is considered.This time the inputs will be multiple independent random variables x k that follow a Gaussian distribution N xk , σ 2 x k .The normalized power is making σ X k one.The estimated background using the SG filter for (2.10) The power excess is Ideally the estimate Xk is equal to Xk = xk /σ x k as X k follows an N Xk , σ 2 X k distribution.It was observed [16] and is known that the standard deviation of X ′ k depends on W and is reduced by where b = 9/8 for d = 2 and b = 225/128 for d = 4 if W is large enough [22].Therefore X ′ k must be further divided by ξ SG to make it follow an N 0, σ 2 The normalized power excess for SG filter estimated data is The use of normalized power excess to obtain ξ is valid as it is independent of the background [14,15] and so are the coefficients c k .The exception to this would be when background subtraction is done poorly or is biased, which will be explored in Sec. 4.Here we will continue with the assumption that Xk is an unbiased estimate of Xk .Similar to the discussion from before, a vertical combination combines different raw spectra, so the random variables that are combined must be independent, and therefore it does not affect the distribution of The normalized input noise at the axion frequency corresponding to k = 0 after coadding is which has a standard deviation of one as expected from n c independent random variables.
Coadding the set of X ′ k we have (2.14) X ′ k are no longer independent to each other since each can include overlapping terms that can range from X −W to X W +nc−1 according to equation (2.12).Therefore this equation is used to decouple X ′ coadded into independent random variables X i .The coefficient value in front of X i is with the constraints L i = 0 for i < 0 or i ≥ n c , c k = 0 for k < −W or k > W .The standard deviation of X ′ coadded , which corresponds to the frequency independent scale factor ξ, is (2.16) The value of ξ for an SG filter with varying W can be seen in Fig. 4. It can be observed that both ϵ sig and ξ become closer to unity as W increases.This can be understood intuitively since the SG filter is an estimate for a background in the context of this raw spectrum.As W becomes larger, the SG filter will tend to "ignore" the signal and therefore retain more of its power, without being too biased.On the other hand, for noise fluctuations when more points are taken into consideration, the bin-to-bin correlations are smaller as W increases since the sum of all c k is one, meaning that the contribution of each bin is smaller, as seen from the right panel of Fig. 2.This also tends to average out to Xk .

Considering the effects of rebinning spectra
By merging n m bins and increasing the RBW value to ∆ν merged = n m ∆ν, the power spectrum undergoes a process called rebinning.This process combines the n m bins according to equation (2.5).For the calculation of ϵ sig , the signal powers after rebinning with w k = 1 and which are the averages of the signal powers over n m bins.
A coadded bin after rebinning will coadd n c /n m bins, with the weights used for coadding Obtaining ϵ sig is straightforward in the sense that the rebinning is applied to each δ k , then the rebinned bins are coadded.
and ϵ merged sig = δ merged output /δ merged input .For both signal input and output the resulting SNR values have been shown to decrease, as seen in ref. [8].While this is a drawback of rebinning, the number of rescan candidates is also reduced, which is an advantage that could be considered as a tradeoff [8,16].Now the rebinned counterpart for noise, X merged k and X ′merged k , will yield a modified ξ merged .Using equation (2.16) and the fact X k and X ′ k are combined in a linear fashion, the only part of the equation that needs modification is L k , which must be expressed in terms of L merged k .For any n m bins merged, X merged k and σ X merged k are as follows where the total number of i is n m .The same applies for X ′merged k .All n m merged bins will now be coadded with the same L merged k value instead of the n m different L k values (e.g., L 0 , L 1 , • • • , L nm−1 ) from before.The coadded and normalized values are where the floor function ⌊x⌋ returns the largest integer smaller or equal to x. X coadded has a standard deviation of one as expected.For X ′ coadded , the standard deviation, or equivalently, the frequency independent scale factor after rebinning ξ merged is obtained as and the relevant results are shown in table 1.
As shown in equation (3.5), the rebinned set of X ′merged k in equation (3.5) is separable into X ′ k using equation (3.4) and ultimately X k via equation (2.12).By unpacking the merged variables fully into independent random variables, the calculation of ξ merged in equation (3.6) becomes as straightforward as ξ in equation (2.16).
Putting the effects of rebinning from both sides together, the SNR efficiency ϵ merged SNR = ϵ merged sig /ξ merged is smaller than ϵ SNR , which is a separate effect from the aforementioned SNR decrease reported in ref. [8].This is later shown in table 1 (ϵ SNR and ϵ merged SNR ) and Fig. 6 (blue and orange lines), although the difference is smaller for large W .While ϵ merged sig < ϵ sig and ξ merged < ξ, the difference between ϵ sig and its rebinned counterpart is larger than the change in ξ, leading to a decrease in SNR efficiency after rebinning.

Comparison with large-statistic simulations
So far the value of ϵ SNR has been obtained given ν a , ∆ν a , and ∆ν.In order to see if this agrees with previously used large-statistic simulations, an axion signal was inserted by software using realistic CAPP-12TB background data.The input CAPP-12TB background shapes use the function and its parameters that were obtained from the χ 2 fit in the original data analysis [8].Another background was constructed for comparison as well, which used the same parameters except for one that described the cavity bandwidth, which was multiplied by 20.This ensured that the structure of the cavity was not seen in this background, and instead was a "smooth" one.Both backgrounds are compared in Fig. 5.Each simulation was done 10000 times with pseudo-random-generated noise to obtain a sufficient amount of statistics, with an axion signal injected near 1.096 GHz.We will denote the results for the two types of backgrounds as "12TB" and "smooth," respectively, e.g., ϵ 12TB SNR or ξ smooth .Since the simulations follow the CAPP-12TB analysis parameters, there is rebinning.Hence the relevant values should be compared with ϵ merged SNR and ξ merged .
Compared to the time and disk space needed for large-statistic simulations, obtaining the values analytically for the same parameters is practically interactive, taking less than a minute while using only computer memory.The results can be compared in table 1 and Fig. 6.As seen from the simulation results, a smooth background matches the rebinned analytical calculations well, but the 12TB background does not.This is expected from ref. [22], i.e., as W increases the background estimation begins to show bias for the 12TB background shown in Fig. 5, similar to the SG filter's tendency to "ignore" the axion signal, as mentioned earlier.Looking at ξ 12TB , ξ smooth , and ξ merged in table 1, this bias propagates to the grand spectrum and becomes evident from 2W + 1 = 801.Furthermore, the bias even shows a positive correlation when 2W + 1 = 1001, which has also been noted in [14].In terms of ϵ SNR in Fig. 6, which combines the effects of ϵ sig and ξ, we can see the bias become evident from 2W + 1 = 601.
The bias shown in the normalized power excess in the CAPP-12TB background suggests that there is a limit to using larger W for SG filters and this can affect both ϵ SNR and ξ in an undesirable manner.Using the analytical calculations can work as a timeefficient way to obtain the SNR efficiency and a predictor for ξ during data analysis.If the resulting ξ from data analysis is noticeably different compared to its predicted value, this  could indicate that there is bias in background subtraction.As seen in Fig. 7, there is a pattern in the bias that comes from the cavity response in the CAPP-12TB background.The bias is not always obvious as even the 2W + 1 = 401 estimate shows a slight bump near the center of the normalized power excess, but only after one hundred different power spectra have been averaged.While this ultimately did not dramatically affect the results in table 1, there is a certain upper limit in W which could be used before it becomes inaccurate.This, however, does not suggest a limitation of this analytical method but rather acts as a confidence check for background subtraction to look out for instances such as the one shown in Fig. 7.The verification of proper background subtraction is necessary in the analysis procedure not only for the correct estimation of SNR but also for obtaining unbiased power excesses.
In table 2, as further validation, two other experiments that used the SG filter for background subtraction and provided the required parameters (ν a , ∆ν, W , d, n m ) have their ξ and ϵ SNR values compared with the analytical method.).HAYSTAC used an unboosted Maxwellian to obtain ξ † and ϵ † SNR in their paper, which was also considered in our calculations.The CAST-CAPP experiment initially reported W = 1000 in their published paper, but the corresponding authors have confirmed that this was a typo and the actual parameter used was 2W + 1 = 1001 (W = 500).

Summary
We report a fully analytical estimation of the signal to noise ratio efficiency ϵ SNR in axion dark matter searches employing an SG filter for a background estimate.Provided the background estimation is properly done with appropriate SG filter parameters like the blue colored data shown in Fig. 7, this approach provides us with the ϵ SNR interactively, using only an arbitrary axion mass and the relevant shape information, without relying on large-statistic simulation data, reflecting the background information and the expected axion signal power obtained from a real experiment.By comparing our ξ or ξ merged with the width of the normalized grand power spectrum from the experimental data, one can also quickly cross check the reasonableness of the error estimation as well as contamination of non-Gaussian components in the experimental data.Axion haloscope searches have been observing the coincidence that the frequency independent scale factor ξ is approximately consistent with ϵ SNR .This was confirmed analytically in this work, when the window length of the SG filter is reasonably wide enough, i.e., at least 5 times the signal window.

Figure 2 .
Figure 2. Left panel shows the power excess of an axion signal at 1.096 GHz before and after applying an SG filter (2W + 1 = 401, d = 4).The power excess is in arbitrary units (a.u.) and the peak value of the lineshape was set to one.Right panel shows the SG filter coefficients c k for different W and d.For any |k| > W , c k is not defined and set to zero in any relations using it.

Figure 3 .
Figure 3. Left panel shows the input and output coadded axion signal power at 1.096 GHz.ϵ sig is calculated when ∆ν a = 4050 Hz and ∆ν = 50 Hz at the bin coinciding with the axion frequency ν a = 1.096GHz, which is where the spectrum shows maximum power excess.Right panel shows the calculated ϵ sig value at the axion frequency.The SG filter window is also represented with a normalized value C = (2W + 1)/(∆ν a /∆ν).

Figure 4 .
Figure 4.The calculated value of ξ using an SG filter with d = 4.The L k used for coadding is the same as those used to calculate ϵ sig above.The SG filter window is also represented with a normalized value C = (2W + 1)/(∆ν a /∆ν).

Figure 5 .
Figure 5.A comparison between the backgrounds used for large-statistic simulations.

Figure 6 .
Figure 6.A comparison of the calculated ϵ SNR values and large-statistic simulation ϵ SNR values for an SG filtered axion signal at ν a = 1.096GHz with ∆ν a = 4050 Hz.The blue line represents ϵ SNR after coadding at ∆ν = 50 Hz and the orange line represents ϵ SNR after rebinning with n m = 9 bins merged (∆ν merged = 450 Hz) and then coadding.The green and red dots are results from simulations with different backgrounds and comes with 1% error (10000 averages).The SG filter window is also represented by a normalized value C = (2W + 1)/(∆ν a /∆ν).

Table 1 .
A comparison of various ϵ and ξ obtained from analytic calculation or simulations.The polynomial order for all SG filters was d = 4.As the simulation results include rebinning, ξ smooth and ξ 12TB should be compared with ξ merged , while ϵ smooth Efficiency results on the simulation side include a 1% error.The results do not consider the additional 5% loss in SNR that occurs due to rebinning, i.e., SNR input and SNR output are both at the ∆ν = 450 Hz level when calculating ϵ merged SNR , ϵ smooth SNR , and ϵ 12TB SNR .

Table 2 .
Comparisons of reported values from other experiments (ξ † and ϵ † SNR ) with their analytically calculated values from this work (ξ † † and ϵ † † SNR