1 Introduction

Persuasive astrophysical and cosmological evidence for the existence of dark matter has led to numerous direct detection efforts for weakly interacting massive particles (WIMPs) over the last 20 years [2]. Amongst these was the XENON1T [3] experiment, which collected 1 t-year of exposure from 2016 to 2018 and culminated in the most stringent limits at the time for spin-independent (SI) WIMP-nucleon interactions above 6 GeV/c\(^2\) [4]. Subsequent limits on spin-dependent WIMP interactions with neutrons and protons [5] as well as WIMP-pion couplings [6] have been published. In each of the aforementioned interactions the expected signature in the detector is a nuclear recoil (NR), induced by the single scatter of a WIMP off a xenon atom. All four searches use the same background and detector response models and the same NR search data set. Other XENON results may be applicable to some NR interactions, such as ionisation-only signatures [7] or Migdal effect searches [8]. For WIMPs below \(\sim 10~{\textrm{GeV}}/c^2\), the best XENON1T limits are provided by a dedicated low-energy NR search for solar \({}^8\)B neutrinos [9]. The SI recoil spectrum and a fixed halo model are the standard for reporting direct-detection WIMP searches [10]. Different interactions or dark matter fluxes, arising either from alternate dark matter halo models [11] or from mechanisms generating boosted dark matter [12], can yield different spectra. As the exact halo parameters are uncertain, and any candidate dark matter particle may interact through a number of different channels, a robust method to constrain arbitrary nuclear recoil spectra is required.

In the full likelihood used for the XENON1T NR searches there are two data-taking periods, each with an accompanying electronic recoil (ER) calibration set, as well as ancillary measurement terms constraining the detector response, microphysics parameters and background models, represented in 20 nuisance parameters [13]. Each science data set is modelled in three analysis dimensions, with five background components (both discussed in Sect. 2). This complexity was reflected in the computational expense, requiring \(\sim \) 30 s for a toy Monte Carlo (toy-MC) simulation of the analysis.

In this paper we present the profiled likelihood of the XENON1T NR search in bins of reconstructed energy, a description of how it may be used to calculate upper limits for a generic NR spectrum, and a data release with accompanying code [1] allowing the physics community to use this method to recast the XENON1T result. The computation is fast, taking \(\sim \) 40 ms to compute an upper limit for a recoil spectrum. We present comparisons to the full toy-MC simulation computation result for several recoil spectra. For heavy WIMPs, the limit computed with the approximate likelihood is typically conservative and within \(10\%\) of the full-likelihood computation, while lower-energy recoil spectra see a higher spread around the full-likelihood upper limit of up to \(\sim \) 30%. Finally, we extend this work by including the XENONnT 20 t-year sensitivity projection [14], with 1000 toy-MC simulation binwise likelihoods, so that the sensitivity of this projection can be evaluated for any NR signature.

In Sect. 2 we give an overview of the XENON1T NR search, highlighting the analysis dimensions used in the inference, and in Sect. 3 we discuss the response to NRs. We present our statistical model in Sect. 4 and the exact methodology in Sect. 5. Section 6 details how to use this approach for approximate limits, and provides estimates of the bias and variance of the method for a selection of NR recoil spectra.

2 XENON1T nuclear recoil search

XENON1T was designed and optimized to detect the low-energy NRs expected from WIMPs recoiling off xenon nuclei [3]. Its primary detector was a dual phase xenon time projection chamber (TPC) containing 2 t of instrumented liquid xenon, observing the scintillation light and ionization charge from interactions in the target. Prompt scintillation light is observed from the recombination of xenon ions or the de-excitation of xenon dimers, and is referred to as the S1 signal. Ionization electrons are drifted to the liquid-gas interface at the top of the detector by means of a drift field applied between a cathode electrode at the bottom of the chamber and a grounded gate electrode just below the liquid-gas interface. The electrons produce scintillation light proportional to the charge, referred to as the S2 signal, when they are extracted into the gas by a higher extraction field. Xenon scintillation was observed by 248 photomultiplier tubes (PMTs) arranged in two arrays at the top and bottom of the detector. The x–y position of the interaction was inferred from the pattern of S2 photons observed by the top PMT array, while the time separation between the S1 and the S2 signal indicated the z-depth. With access to the full 3D position information we could fiducialize the detector volume, selecting only the innermost 1.3 t xenon volume where contributions from radioactivity in the detector materials are minimized.

Fig. 1

Scatter-plot of the XENON1T NR search dataset in \(\textrm{cS1}\) and \(\mathrm {cS2_b}\). Gray lines indicate the 80 bins in reconstructed NR energy. Coloured contours indicate 1\(\sigma \) contours for background and signal models: blue and green contours show the ER and wall background models, the purple and orange contours show the \(6~{\textrm{GeV}}/{\textrm{c}}^2\) and \(50~{\textrm{GeV}}/{\textrm{c}}^2\) spin-independent WIMP signal models, and the red contour a \(30~{\textrm{keV}}\) monoenergetic NR

Both S1 and S2 signals were corrected to account for the detector’s position-dependent light collection efficiency [15], and in the case of the S2 we also corrected for electron attachment to impurities in the liquid xenon volume as the electrons are drifted upwards. These corrected S1 and S2 variables are named \(\textrm{cS1}\) and \(\textrm{cS2}\).

The relative size of the ionisation and scintillation signals, and therefore \(\textrm{cS1}\) and \(\textrm{cS2}\), depends on whether the incident particle scattered off the nucleus (NR) or an electron (ER) of a xenon atom. In Fig. 1 the predicted 1\(\sigma \) contour for interactions of a 50 (6) GeV/c\(^2\) WIMP, which is expected to interact with the xenon nucleus, producing NRs, is shown in orange (purple). We also illustrate the signal expectation from a mono-energetic 30 keV NR line in red. The 1\(\sigma \) contour of the ER background is shown in blue, demonstrating the separation between nuclear and electronic recoils in XENON1T. Shown in green is the 1\(\sigma \) contour of the “wall” background, which is discussed at the end of this section.

WIMPs are expected to scatter at most once off a target nucleus due to their small interaction cross sections, therefore XENON1T optimized its search strategy to look for single scatter NR events. The analysis space spans from 3 to 70 photoelectrons (PE) in the \(\textrm{cS1}\) space, where the lower boundary is driven by the detection efficiency, and the upper boundary is chosen to include the bulk of the expected WIMP signal. We use the light observed in the bottom PMT array to determine the magnitude of the position-corrected S2s, referred to as \(\mathrm {cS2_b}\), due to the more uniform response of this array in the x–y plane. The \(\mathrm {cS2_b}\) space is chosen to fully contain the expected background and signal models in our chosen \(\textrm{cS1}\) region and spans from 50 to 7940 PE, corresponding to approximately 1.5–250 electrons.

The full XENON1T exposure was collected in two science campaigns, SR0 and SR1, between November 2016 and February 2018, with drift fields of 120 V/cm and 81 V/cm, respectively. Continual purification of the xenon improved the electron lifetime from 380 \(\upmu \textrm{s}\) at the start of SR0 to \(\sim \) 650 \(\upmu \textrm{s}\) at the end of SR1. The final data, after the quality selections detailed in [15] and fiducialization, consisted of 739 events in a \(1~\text {t}\)-year exposure, shown in Fig. 1 as grey circles.

The response of the detector to low-energy ER and NR interactions was calibrated with \({}^{220}{\textrm{Rn}} \), whose decay products produce low-energy beta decays, and with \(^{241}\)AmBe and deuterium-deuterium fusion generator neutron sources. We used a detector response model based on a fast detector simulation to fit the calibration data and model ER and NR sources in XENON1T [13].

Background models for five sources of interactions within XENON1T were considered, detailed in [13]. The largest background is ERs induced by the \(^{214}\)Pb decay product of \(^{222}\)Rn or decays of \(^{85}\)Kr. The second largest background expectation is referred to as the “wall” background. These are events which occur close to the polytetrafluoroethylene walls of the detector, and consequently lose a portion of the ionization electrons to the wall as they drift upwards. The lower S2 signal observed close to the detector edge results in larger position reconstruction errors, and this population therefore bleeds into the fiducial volume. For this reason, we include the radius, denoted by R, as an analysis dimension along with \(\textrm{cS1}\) and \(\mathrm {cS2_b}\) for the background and signal models. The 1\(\sigma \) contours of these two dominant backgrounds are shown in Fig. 1 in blue (ERs) and green (wall). The other backgrounds considered are radiogenic neutrons from detector materials, coherent elastic neutrino-nucleus scattering (CE\(\nu \)NS) of solar \(^8\)B neutrinos, and accidental pairing of lone S1 and S2 signals.

3 Analysis variables and detector response

Previous XENON1T searches for WIMP interactions [13, 15] directly used the observed \(\textrm{cS1}\) and \(\mathrm {cS2_b}\) variables as described in Sect. 2. Since the total number of quanta produced is dependent on the original energy deposition, the number of prompt scintillation photons and ionization electrons, observed as the \(\textrm{cS1}\) and \(\mathrm {cS2_b}\), respectively, are intrinsically anti-correlated. Additionally, the fraction of quanta observed as ionization electrons or scintillation photons is energy dependent. Thus, for a given \(\textrm{cS1}\) selection, different NR energies yield different distributions in \(\mathrm {cS2_b}\) space. To reduce the dependence on the recoil energy, we transform our analysis space to explicitly feature reconstructed energy as one dimension.

3.1 Reconstructed energy

The reconstructed ER energy \({E_\textrm{rec}}_{\textrm{ER}}\) of the original interaction can be obtained from \(\textrm{cS1}\) and \(\mathrm {cS2_b}\) quantities as:

$$\begin{aligned} {E_\textrm{rec}}_{\textrm{ER}} (\textrm{cS1},\mathrm {cS2_b}) \equiv W \cdot [\textrm{cS1}/g_1 + \mathrm {cS2_b}/g_2], \end{aligned}$$

where \(W=13.7\) eV is the average amount of energy required to produce one electron or photon in xenon [16]. The detector dependent quantities \(g_1\) and \(g_2\) represent the number of photoelectrons observed in the PMT arrays per emitted scintillation photon and the number of photoelectrons observed per extracted electron respectively.
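As a minimal sketch of this reconstruction — the gain values \(g_1\) and \(g_2\) below are illustrative placeholders, not the calibrated XENON1T values:

```python
# Sketch of the ER energy reconstruction defined above. The gains g1 and g2
# are illustrative placeholders, NOT the calibrated XENON1T values.
W = 13.7e-3  # keV per produced quantum (photon or electron)

def reconstructed_er_energy(cs1, cs2b, g1=0.15, g2=11.0):
    """E_rec_ER in keV from the corrected signals cS1 and cS2_b (both in PE)."""
    return W * (cs1 / g1 + cs2b / g2)

# A 20 PE cS1 paired with a 2000 PE cS2_b:
print(reconstructed_er_energy(20.0, 2000.0))  # ~4.32 keV with these gains
```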

Since the approximate likelihood will be presented in bins of \(E_\textrm{rec}\), it is necessary that other analysis dimensions are as independent of recoil energy as possible. Therefore, we also introduce \(E_\textrm{rec}^\perp \),

$$\begin{aligned} E_\textrm{rec}^\perp (\textrm{cS1},\mathrm {cS2_b}) \equiv W \cdot [\textrm{cS1}/g_2 - \mathrm {cS2_b}/g_1], \end{aligned}$$

which is constructed so that \({E_\textrm{rec}}_{\textrm{ER}}\) and \(E_\textrm{rec}^\perp \) contours are perpendicular.

Performing the analysis in \(E_\textrm{rec}\), \(E_\textrm{rec}^\perp \) coordinates rather than in \(\textrm{cS1}\), \(\mathrm {cS2_b}\) is only a coordinate transformation, and does not affect the XENON1T inference results.
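Since the two defining equations are linear in \(\textrm{cS1}\) and \(\mathrm {cS2_b}\), the transformation is a 2×2 matrix that can be inverted exactly, which is why no information is lost. A sketch with placeholder gains (not the XENON1T calibration values):

```python
import numpy as np

W = 13.7e-3          # keV per quantum
g1, g2 = 0.15, 11.0  # illustrative placeholder gains, not the XENON1T values

# Linear map (cS1, cS2_b) -> (E_rec, E_rec_perp) from the two equations above
A = W * np.array([[1.0 / g1, 1.0 / g2],
                  [1.0 / g2, -1.0 / g1]])

x = np.array([20.0, 2000.0])    # (cS1, cS2_b) in PE
e = A @ x                       # (E_rec, E_rec_perp)
x_back = np.linalg.solve(A, e)  # inverting recovers the original signals

print(np.allclose(x, x_back))        # True: a pure coordinate transformation
print(np.isclose(A[0] @ A[1], 0.0))  # True: the two gradients are orthogonal
```

The orthogonality check makes explicit the statement that contours of \(E_\textrm{rec}\) and \(E_\textrm{rec}^\perp \) are perpendicular.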

In order to obtain the reconstructed recoil energy for NR events (\({E_\textrm{rec}}\)), one must also account for the energy dependent quenching effect, where NR energy is lost to unobserved heat. We estimate the quenching magnitude at a given energy from an empirical comparison between the true NR energy and the reconstructed ER energy using the detector response model described in [15]. The constant NR energy lines obtained from the above procedure are shown in Fig. 1 as gray shaded bands.

3.2 Migration matrix

In order to convert an arbitrary NR spectrum into the reconstructed energy spectrum expected to be observed in XENON1T, we account for detector effects. The complete detector response model, derived from fits to calibration data and accounting for detection efficiency, resolution and correction effects, is described in [15]. Using this model, we calculate the spread in reconstructed energy space of a fine grid of true NR recoil energies. The migration matrix is shown in Fig. 2, where the components of the migration matrix

$$\begin{aligned} {\mathscr {P}}_{r,t} = P(E_\textrm{rec} ~\text {in bin } r\mid E_\textrm{true} ~\text {in bin } t), \end{aligned}$$

represent the probability for an NR in some true recoil bin to be reconstructed in a given reconstructed energy bin. The transformation of the true recoil energy spectra for a 6 and a 50 GeV/c\(^2\) WIMP into bins in reconstructed energy space is shown in purple and orange, respectively. Also shown is the transformation of a mono-energetic 30 keV line, illustrating the broadening of the signal spectrum from detector effects.
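Folding a binned true-recoil spectrum through the migration matrix is a single matrix–vector product. A toy 3-bin sketch (the data release [1] provides the actual 80-bin matrix):

```python
import numpy as np

# Toy migration matrix P[r, t] = P(E_rec in bin r | E_true in bin t).
# Columns sum to at most 1; the deficit encodes detection inefficiency.
P = np.array([[0.6, 0.2, 0.0],
              [0.1, 0.5, 0.2],
              [0.0, 0.1, 0.6]])

s_true = np.array([100.0, 50.0, 10.0])  # expected events per true-energy bin
s_rec = P @ s_true                      # expected events per reconstructed bin

print(s_rec)  # [70. 37. 11.]
```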

Fig. 2

Illustration of the migration matrix included in the data release [1], as defined in Eq. 3, showing the conversion between true NR recoil energy and the reconstructed energy. The bottom panel shows the true NR spectrum of a \(30~{\textrm{keV}}\) line in red, and spin-independent (SI) WIMP recoil spectra for a \(6~{\textrm{GeV}}/{\textrm{c}}^2\) and \(50~{\textrm{GeV}}/{\textrm{c}}^2\) WIMP in purple and orange, respectively, all with arbitrary normalisation. The left panel shows the same spectra in reconstructed energy after multiplication with the migration matrix. The matrix is normalized such that selections in \(E_\textrm{rec}\) account for our overall detection efficiency

4 Statistical model

We use a profiled log-likelihood ratio test statistic and toy-MC simulations of the test statistic distribution to compute discovery significances and confidence intervals. The likelihood \({\mathscr {L}}_{\textrm{total}}\) used for NR searches with XENON1T is presented in [13]. It is a product of:

  • \({\mathscr {L}}_{\textrm{SR}}^{\textrm{sci}}(s,{\varvec{\theta }}\mid {\varvec{x}})\): unbinned, extended likelihood terms in three analysis dimensions: \(\textrm{cS1}\), \(\mathrm {cS2_b}\) and R, for the two science data-taking periods, labeled SR0 and SR1 (indexed with SR). The likelihood is a function of the signal strength parameter s, and the set of nuisance parameters \({\varvec{\theta }}\), and is evaluated for the data \({\varvec{x}}\).

  • \({\mathscr {L}}_{\textrm{SR}}^{\textrm{cal}}({\varvec{\theta }}\mid {\varvec{x}})\): unbinned, extended likelihood terms in two analysis dimensions, \(\textrm{cS1}\) and \(\mathrm {cS2_b}\), for the \(^{220}{\textrm{Rn}}\) calibration data taken for each science data-taking period. Since this calibration source is uniformly distributed in the detector, R is not included.

  • \({\mathscr {L}}^{\textrm{anc}}({\varvec{\theta }}\mid {\varvec{x}}_{\textrm{anc}})\): terms representing ancillary measurements of background rates and the signal detection efficiency, with \({\varvec{x}}_{\textrm{anc}}\) being the ancillary measurements.

The aim of this paper is to present an approximate likelihood applicable to any NR signal in an easily publishable format. To that end, we first reparameterise the signal and background models to be in \(E_\textrm{rec}\), \(E_\textrm{rec}^\perp \) and R, and write separate likelihood terms, primed to mark the reparameterisation, \({\mathscr {L}}^{{\textrm{sci}}\prime }_{{{\textrm{r}},{\textrm{SR}}}}\) for bins r in reconstructed energy. These two changes leave the likelihood unaltered (up to a constant factor)

$$\begin{aligned} {\mathscr {L}}^{{\textrm{tot}}\prime }(s,{\varvec{\theta }})&= \prod _r \prod _{\textrm{SR}}{\mathscr {L}}^{{\textrm{sci}}\prime }_{{\textrm{r}},{\textrm{SR}}} (s,{\varvec{\theta }})\times {\mathscr {L}}_{\textrm{SR}}^{{\textrm{cal}}\prime } ({\varvec{\theta }})\times {\mathscr {L}}^{\textrm{anc}}({\varvec{\theta }}). \end{aligned}$$

The per-bin science data likelihood for bin r with observed events \(N_r\) and lower and upper edges \(E_\textrm{rec} {}_{,d}\) and \(E_\textrm{rec} {}_{,u}\) is

$$\begin{aligned} {\mathscr {L}}^{{\textrm{sci}}\prime }_{{\textrm{r}},{\textrm{SR}}}(s,{\varvec{\theta }})&=Poisson(N_r\mid \mu _r^{\textrm{tot}}(s,{\varvec{\theta }}))\nonumber \\&\quad \times \prod _{i\in S_r} f^{{\textrm{tot}}\prime }(E_\textrm{rec} {}_{,i}, E_\textrm{rec}^\perp {}_{,i}, R_i \mid s,{\varvec{\theta }}) \end{aligned}$$

where \(f^{{\textrm{tot}}\prime }(E_\textrm{rec},E_\textrm{rec}^\perp ,R\mid s,{\varvec{\theta }})\) is the total probability density function (PDF) in the transformed analysis variables, and \(S_r \equiv \{i\mid E_\textrm{rec} {}_{,d}<E_\textrm{rec} {}_{,i}<E_\textrm{rec} {}_{,u}\}\) is the set of events in the science run with \(E_\textrm{rec}\) in bin r. The total expected number of events in each bin r, and the expectation from each source j in that bin are defined as

$$\begin{aligned}&\mu _r^{\textrm{tot}}(s,{\varvec{\theta }}) \equiv \sum _j \mu _{j,r}(s,{\varvec{\theta }}) \end{aligned}$$
$$\begin{aligned}&\mu _{j,r}(s,{\varvec{\theta }}) \equiv \mu _j(s,{\varvec{\theta }}) \nonumber \\&\quad \times \int _{E_\textrm{rec} {}_{,d}}^{E_\textrm{rec} {}_{,u}} \left( \int f_j^\prime (E_\textrm{rec}, E_\textrm{rec}^\perp ,R\mid s,{\varvec{\theta }}) {\textrm{d}}E_\textrm{rec}^\perp {\textrm{d}}R\right) {\textrm{d}} E_\textrm{rec}, \end{aligned}$$

where \(\mu _j(s,{\varvec{\theta }})\) and \(f_j^\prime (E_\textrm{rec}, E_\textrm{rec}^\perp ,R\mid s,{\varvec{\theta }})\) are the expected number of events and the total PDF of source j, respectively.

The first approximation we make is to replace the PDF in each bin by the averaged PDF in that bin; we denote this change with double primes,

$$\begin{aligned}&f_{j,r}^{\prime \prime }(E_\textrm{rec}^\perp ,R\mid s,{\varvec{\theta }}) \equiv \frac{\mu _j(s,{\varvec{\theta }})}{\mu _{j,r}(s,{\varvec{\theta }})}\nonumber \\&\quad \times \int _{E_\textrm{rec} {}_{,d}}^{E_\textrm{rec} {}_{,u}} f_j^\prime (E_\textrm{rec}, E_\textrm{rec}^\perp ,R\mid s,{\varvec{\theta }}) {\textrm{d}} E_\textrm{rec} \end{aligned}$$

and the science likelihood for the bin to one using this averaged PDF,

$$\begin{aligned} {\mathscr {L}}^{{\textrm{sci}}\prime \prime }_{{\textrm{r}},{\textrm{SR}}} (s,{\varvec{\theta }})&= Poisson(N_r\mid \mu _r^{\textrm{tot}}(s,{\varvec{\theta }})) \end{aligned}$$
$$\begin{aligned}&\quad \times \prod _{i\in S_r} f^{{\textrm{tot}}\prime \prime } (E_\textrm{rec}^\perp {}_{,i}, R_i \mid s,{\varvec{\theta }})\nonumber \\ {\mathscr {L}}^{{\textrm{sci}}\prime \prime }_{\textrm{r}}(s,{\varvec{\theta }})&\equiv \prod _{\textrm{SR}}{\mathscr {L}}^{{\textrm{sci}}\prime \prime }_{{\textrm{r}},{\textrm{SR}}} (s,{\varvec{\theta }}). \end{aligned}$$

The total approximate likelihood is the product of each binwise contribution times the calibration and ancillary constraint terms,

$$\begin{aligned} {\mathscr {L}}^{{\textrm{tot}}\prime \prime }(s,{\varvec{\theta }}) = \prod _r \left( {\mathscr {L}}^{{\textrm{sci}}\prime \prime }_{\textrm{r}} (s,{\varvec{\theta }})\right) \times {\mathscr {L}}^{{\textrm{cal}}\prime } ({\varvec{\theta }})\times {\mathscr {L}}^{\textrm{anc}}({\varvec{\theta }}). \end{aligned}$$

5 Binwise profiling

For the binwise-averaged likelihoods to be a good approximation to the unbinned likelihood, the bins must be small with respect to the XENON1T resolution. In Sect. 6.1, we choose the bin number n to minimise bias and maximise accuracy. To produce a likelihood for any signal shape, we wish to compute profiled likelihood ratios for each bin separately. However, the chosen binning is so narrow that many nuisance parameters in \({\varvec{\theta }}\), for instance the normalisation of the wall background, cannot be constrained in each bin separately. In practice, no nuisance parameter is strongly pulled from its best-fit value in the original XENON1T upper limit computation. Therefore, our second approximation is to first compute \(\hat{{\varvec{\theta }}}_0\), the value of the nuisance parameters that optimises \({\mathscr {L}}^{{\textrm{tot}}\prime \prime }(0,{\varvec{\theta }})\), and fix the nuisance parameters to this value. The exception is the ER mismodelling term (and therefore also the ER normalisation), which requires special attention:

Over- or under-estimating a signal-like tail of the background model would bias results towards too-strict limits or spurious discoveries, respectively. Therefore, the XENON1T WIMP search likelihood [13] includes an ER mismodelling term [17] that takes the form of a signal-like component added to the ER model,

$$\begin{aligned} f_{\textrm{ER}}(x) \rightarrow \gamma (\alpha )\times {\textrm{max}}\left[ (1-\alpha ) f_{\textrm{ER}}(x) + \alpha f_{\textrm{SIG}}(x),0\right] \end{aligned}$$

where \(f_{\textrm{ER}}(x)\), \(f_{\textrm{SIG}}(x)\) are the PDFs in x of the ER background and (WIMP) signal, respectively, \(\alpha \) is the size of the ER mismodelling term and \(\gamma (\alpha )\) is a normalisation term to ensure that the total PDF is normalized even for negative \(\alpha \). Since this term depends on the signal model considered, it cannot be determined by the background-only fit, and must be profiled per bin. The total ER distribution used in the likelihood becomes

$$\begin{aligned}&f_{{\textrm{ER}},{\textrm{r}}}(x \mid \alpha _r) \nonumber \\&\quad \equiv {\left\{ \begin{array}{ll} \gamma (\alpha _r)\times {\textrm{max}}[0,(1-\alpha _r) f_{\textrm{ER}}(x\mid \hat{{\varvec{\theta }}}_0) +\alpha _r f_{\textrm{SIG}}(x) ], &{} \text {if } E_\textrm{rec} {}_{,d}<E_\textrm{rec} \le E_\textrm{rec} {}_{,u}\\ \gamma (\alpha _r)\times f_{\textrm{ER}}(x\mid \hat{{\varvec{\theta }}}_0), &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

which is used both in the calibration and science data likelihoods. Each bin has its own mismodelling component, parameterized by \(\alpha _r\), which can therefore fit the calibration data shape more freely, resulting in an improved fit. The total calibration PDF is the sum of \(f_{\textrm{ER}}({\varvec{x}}\mid \hat{{\varvec{\theta }}}_0)\) and the accidental background component, making \({\mathscr {L}}^{{\textrm{cal}}\prime \prime }(\alpha _r)\) take the same form as Eq. 5. Since the mismodelling term only affects the shape of the background, the normalisation in the calibration term is fixed to the best-fit value.
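The role of \(\gamma (\alpha )\) can be checked numerically: for negative \(\alpha \) the clipped mixture alone integrates to more than one, and \(\gamma \) restores unit normalisation. A sketch with toy one-dimensional Gaussians standing in for the actual ER background and signal PDFs:

```python
import numpy as np

x = np.linspace(0.0, 12.0, 4001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

f_er = gauss(x, 8.0, 1.0)   # toy stand-in for the ER background PDF
f_sig = gauss(x, 4.0, 1.0)  # toy stand-in for the signal PDF

def mismodelled_er(alpha):
    """gamma(alpha) * max[(1 - alpha) f_ER + alpha f_SIG, 0], as in the text."""
    mix = np.clip((1.0 - alpha) * f_er + alpha * f_sig, 0.0, None)
    gamma = 1.0 / (mix.sum() * dx)  # normalisation term gamma(alpha)
    return gamma * mix

for alpha in (-0.2, 0.0, 0.2):
    total = mismodelled_er(alpha).sum() * dx
    print(round(total, 6))  # 1.0 for every alpha, including negative values
```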

Using the ER model of Eq. 13 and the best-fit nuisance parameters for the no-signal fit \(\hat{{\varvec{\theta }}}_0\), thereby fixing \({\mathscr {L}}^{\textrm{anc}}\), we construct the likelihood in each bin of reconstructed energy,

$$\begin{aligned} {\mathscr {L}}^{{\textrm{tot}}\prime \prime }_{\textrm{r}}(s_r,\alpha _r,\mu ^{\textrm{ER}}_r)&= {\mathscr {L}}^{{\textrm{sci}}\prime \prime }(s_r,\alpha _r,\mu ^{\textrm{ER}}_r, \hat{{\varvec{\theta }}}_0)\nonumber \\&\quad \times {\mathscr {L}}^{{\textrm{cal}}\prime \prime } (\alpha _r,\hat{{\varvec{\theta }}}_0). \end{aligned}$$

Here, \(s_r\) is the signal expectation in each reconstructed energy bin r, which relates to the expectation in each bin of true energy t via the migration matrix

$$\begin{aligned} s_r = \sum _t {\mathscr {P}}_{r,t} \cdot s_t, \end{aligned}$$

and the signal expectation in each true energy bin t in turn is given by

$$\begin{aligned} s_t = s \int _{E_\textrm{true} {}_{,d}}^{E_\textrm{true} {}_{,u}} g(E_\textrm{true}) {\textrm{d}} E_\textrm{true}, \end{aligned}$$

where g(E) is the signal PDF in true recoil energy \(E_\textrm{true} \), and s the expected number of true signal events.
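Putting the two steps above together: the signal PDF \(g(E)\) is first integrated over each true-energy bin to obtain \(s_t\), which is then folded with the migration matrix to obtain \(s_r\). A hedged sketch with a toy falling-exponential spectrum and a toy 3-bin matrix (the released files contain the real 80-bin objects):

```python
import numpy as np

tau = 10.0                          # toy spectral slope [keV]
edges = np.linspace(0.0, 60.0, 4)   # toy true-energy bin edges [keV]

def cdf(e):
    """CDF of a toy signal PDF g(E) ~ exp(-E / tau), truncated to [0, 60] keV."""
    return (1.0 - np.exp(-e / tau)) / (1.0 - np.exp(-60.0 / tau))

s = 5.0                          # expected number of true signal events
s_t = s * np.diff(cdf(edges))    # integral of s * g(E) over each true bin

P = np.array([[0.6, 0.2, 0.0],   # toy migration matrix P[r, t]
              [0.1, 0.5, 0.2],
              [0.0, 0.1, 0.6]])
s_r = P @ s_t                    # binwise signal expectations in E_rec

print(round(s_t.sum(), 6))  # 5.0: all signal accounted for before efficiency
```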

The binwise profiling follows the approach in [18, 19], where the likelihood is profiled separately in sections of the analysis variable space. The profiled likelihood in each bin is

$$\begin{aligned} \lambda _{\textrm{r}}(s) =-2\times \log \left( \frac{{\mathscr {L}}^{{\textrm{tot}}\prime \prime } (s,{\hat{\hat{\alpha }}}_r,\hat{\hat{\mu }}^{\textrm{ER}}_r)}{{\mathscr {L}}^{{\textrm{tot}}\prime \prime }({\hat{s}},{\hat{\alpha }}_r, \hat{\mu }^{\textrm{ER}}_r)}\right) \end{aligned}$$

where \({\hat{s}},{\hat{\alpha }}_r,\hat{\mu }^{\textrm{ER}}_r\) are the signal expectation, mismodelling fraction and ER rate that maximise the likelihood, and \({\hat{\hat{\alpha }}}_r,{\hat{\hat{\mu }}}^{\textrm{ER}}_r\) maximise the conditional likelihood. Figure 3 shows the profiled binwise likelihood for each bin as a function of s. The per-bin likelihoods show the fluctuations expected from a lower-statistics sample – some prefer a positive signal, others no signal. To compute a full result, they must be combined into one likelihood.

Fig. 3

Illustration of the binwise, profiled log-likelihood \(\lambda _{\textrm{r}}(s)\) for bins in reconstructed NR energy. The total approximate likelihood is obtained by summing over the entry in each reconstructed energy bin at the expected signal, as in Eq. 18. The purple, orange and red lines indicate the expectation values in each bin for a \(6~{\textrm{GeV}}/{\textrm{c}}^2\) and a \(50~{\textrm{GeV}}/{\textrm{c}}^2\) spin-independent WIMP signal and a 30 keV NR line signal, respectively, at their respective upper limits derived from the XENON1T dataset. White bins at the highest and lowest reconstructed energies reflect bins for which the migration matrix is zero

6 Inference using the binwise likelihood

Using the energy migration matrix defined in Sect. 3 to compute bin-wise signal expectations \(s_r\), together with the likelihood ratio for each bin defined in Eq. 17, we can write our approximation of the log-likelihood of Eq. 4

$$\begin{aligned} \Lambda _{\textrm{tot}}(s) = \sum _r{\lambda _{\textrm{r}}(s_{\textrm{r}})} \end{aligned}$$

and the corresponding log-likelihood ratio

$$\begin{aligned} \lambda _{\textrm{tot}}(s) = \Lambda _{\textrm{tot}}(s) - \Lambda _{\textrm{tot}}({\hat{s}}). \end{aligned}$$

Using this approximate likelihood induces only a moderate systematic and random error in confidence intervals with respect to those computed with the full, computationally much slower XENON1T likelihood. Best-fit values and upper limits are then computed using the standard asymptotic formulae [20, 21].
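The combination and limit-setting can be sketched as follows, with toy parabolic \(\lambda _r\) curves standing in for the tabulated curves in the data release and toy per-bin signal fractions in place of a real spectrum; the asymptotic one-sided 90% CL threshold on the log-likelihood ratio is \(\approx 1.64\):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 5
s_grid = np.linspace(0.0, 20.0, 201)  # per-bin signal-expectation grid

# Toy binwise profiled likelihood curves lambda_r(s_r): parabolas with
# scattered best-fit values, standing in for the released tabulated curves.
best_fits = rng.uniform(0.0, 2.0, n_bins)
curves = [0.5 * (s_grid - b) ** 2 for b in best_fits]

# Toy fraction of the total signal expectation falling in each E_rec bin
weights = np.array([0.40, 0.30, 0.20, 0.07, 0.03])

def big_lambda(s):
    """Lambda_tot(s): sum of interpolated binwise curves at s_r = w_r * s."""
    return sum(np.interp(w * s, s_grid, c) for w, c in zip(weights, curves))

s_scan = np.linspace(0.0, 40.0, 2001)
lam = np.array([big_lambda(s) for s in s_scan])
q = lam - lam.min()  # the log-likelihood ratio lambda_tot(s)

# Scan upwards from the best fit until the asymptotic threshold is crossed
s_hat = s_scan[np.argmin(lam)]
upper = s_scan[(s_scan > s_hat) & (q > 1.64)][0]
print("best fit:", round(s_hat, 2), " 90% CL upper limit:", round(upper, 2))
```

In the actual data release the non-asymptotic threshold of Sect. 6.2 can be used in place of the constant 1.64.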

6.1 Fidelity of the approximate likelihood method

The differences between the unbinned and approximate binwise results stem from the binning in \(E_\textrm{rec} \), the per-bin ER mismodelling and profiling, and the slight change in the signal distribution in \(E_\textrm{rec}^\perp \) within individual bins for different signal shapes.

Table 1 Table of bias and spread, defined as the median and 1-sigma spread of the ratio between binwise and full likelihood upper limits using 1000 toy-MC simulations
Fig. 4

Top: The 90-percentile threshold of the approximate log-likelihood ratio test statistic as a function of the true signal expectation. Thresholds estimated with toy-MC simulations for a range of monoenergetic signals are shown with black dots, and the magenta line shows the smoothed maximum. The threshold converges to the asymptotic value for around \(\sim 4\) expected signal events. Bottom: The coverage of 95, 90 and 68-percent confidence level upper limits is shown with diamonds, squares and circles, respectively, for four NR recoil spectra: flat (blue), a 3 keV monoenergetic line (red), a \(6~{\textrm{GeV}}/ {\textrm{c}}^2\) SI WIMP (purple) and a \(50~{\textrm{GeV}}/ {\textrm{c}}^2\) SI WIMP (orange)

Fig. 5

Comparison between \(90\%\) confidence level upper limits from published XENON1T NR searches [4,5,6] (black), and limits using the approximate likelihood presented in this work. Cyan lines are computed assuming an asymptotic distribution of the test statistic, while magenta lines show the upper limit using the non-asymptotic threshold described in Sect. 6.2. As in the toy-MC studies, the binwise result on data is a good approximation of the full computation for WIMPs with masses \(\gtrsim 50~{\textrm{GeV}}/c^2\), and gives a conservative result for lower-mass WIMP signals

We validated the performance of the binwise likelihood approach by computing upper limits for a range of signal spectra and different numbers of bins in \(E_\textrm{rec}\). Table 1 shows the median ratio between the limits computed with the approximate and full likelihood, with errors corresponding to the 15th and 85th percentiles of the ratio between the two. Increasing the number of bins beyond 80 between 0 and \(60~{\textrm{keV}}\) did not markedly improve either the bias or the spread of the upper limits for the binwise likelihood. Therefore, we report the result of this work using 80 bins in reconstructed energy space. For heavy WIMPs, the bias and errors are both on the order of \(10\%\). The more peaked low-mass WIMP signals or lower-energy monoenergetic lines, both concentrated in only a few bins, have a larger range of deviation from the full result, up to \(30\%\) scatter with respect to upper limits with the full likelihood. The bias and errors in Table 1 give an indication of how well the approximate likelihood should be expected to perform for different signal shapes and energy ranges.

6.2 Correcting for non-asymptoticity

The XENON1T results were computed from test statistic distributions estimated using toy-MC simulations of datasets. This was necessary due to the non-asymptotic nature of the distributions for the low signal counts considered [13].

Since generating datasets depends on the signal model, this approach must be amended if a similar correction is to be applied to the likelihood ratio of Eq. 18.

Our approach is motivated by the observation that the non-asymptotic behaviour of the XENON1T likelihood is driven by the signal-to-background discrimination that leaves the signal region almost background free. Computing the test statistic distribution for a range of monoenergetic NRs will include the best signal-to-background discrimination; other NR signals will be broader and therefore feature less extreme ER–NR discrimination than these monoenergetic signals. Therefore, we compute the 90th-percentile thresholds of the test statistic for a fine grid of monoenergetic NR signals and choose the 90th percentile of all thresholds, to avoid statistical fluctuations, before smoothing this envelope with a Gaussian filter. Figure 4 (top) shows the thresholds computed as a function of the expected signal, while Fig. 4 (bottom) shows the coverage obtained for several recoil spectra when the smoothed upper envelope of these thresholds is used together with Eq. 19 to compute upper limits. All show either the nominal coverage, or conservatively over-cover at low signal expectations. We therefore recommend using this threshold rather than the asymptotic \(\chi ^2\) threshold to compute frequentist confidence intervals. In the data release, we include 68, 90 and 95-percentile thresholds.
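The envelope construction can be sketched with toy threshold curves; a numpy-only Gaussian kernel stands in for the exact filter used in the analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
s_vals = np.linspace(0.1, 10.0, 50)  # grid of expected signal events

# Toy 90th-percentile test-statistic thresholds for 30 monoenergetic signals:
# rising towards the asymptotic value ~1.64, with statistical scatter.
thresholds = np.stack([
    1.64 * (1.0 - np.exp(-s_vals / 2.0)) + rng.normal(0.0, 0.05, s_vals.size)
    for _ in range(30)
])

# 90th percentile across signals, then Gaussian smoothing (sigma = 3 bins)
envelope = np.percentile(thresholds, 90, axis=0)
kernel = np.exp(-0.5 * (np.arange(-9, 10) / 3.0) ** 2)
kernel /= kernel.sum()
padded = np.pad(envelope, 9, mode="edge")  # avoid zero-padding edge bias
smooth = np.convolve(padded, kernel, mode="valid")

print(smooth.shape, round(float(smooth[-1]), 2))  # approaches ~1.64 at large s
```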

Upper limits computed with the binwise approximation and with the binwise approximation plus the non-asymptotic threshold are compared with all XENON1T high-mass NR searches in Fig. 5. Close agreement is seen except at low masses, where the binwise approximation yields a higher, and thus more conservative, upper limit.

7 Summary

This paper and the accompanying code and data release provide a fast and flexible method to compute approximate results of the XENON1T NR search [4] for any NR spectrum. As many spectra can be tested, care should be taken when interpreting the likelihood to compute discovery significances. On the other hand, we have validated, with toy-MC simulations and comparisons with the full XENON1T likelihood, that good agreement is found for confidence intervals. We also provide a method to ensure that these confidence intervals have, on average, correct coverage or over-coverage only. In the appendix, we also show how this method can be employed to provide recasts of sensitivity projections, in this case of the 20 t-year XENONnT projection presented in [14]. Together with the XENON1T ER spectral search [22, 23] and ionisation-only [7, 24] publications, the approximate NR likelihood provides a range of recastable legacy results of the XENON1T experiment.