Estimating the efficiency turn-on curve for a constant-threshold trigger without a calibration dataset

Many particle physics experiments use constant-threshold triggers, where the trigger condition is a fixed threshold on an online variable that can be calculated quickly by the trigger module. Offline data analysis then calculates a more precise offline variable for the same quantity, for example the event energy. The efficiency curve is a step function in the online variable, but not in the offline variable. One typically obtains the shape of the efficiency curve in the offline variable by way of a calibration dataset, where the true rate of events at each value of the offline variable is measured once and compared to the rate observed in the physics dataset. For triggers with a fixed threshold condition, it is sometimes possible to obtain the trigger efficiency curve without the use of a calibration dataset. This is useful to verify the stability of a calibration over time when calibration data cannot be taken often enough. It also makes it possible to use datasets for which no calibration is available. This paper describes the method and the conditions that must be met for it to be applicable.


Introduction
In many particle physics experiments, the data acquisition system (DAQ) monitors the signals from the detector and initiates recording of data when a pre-defined trigger condition is met. The DAQ has only a short time window in which to determine, based on the detector data, the variables on which the trigger condition is based. Hence, the decision about when to trigger must be based on an imperfect estimate of the quantity of interest (such as the event energy) that can be calculated with the required speed [1] and without calibration inputs, which are often only determined after a dataset is recorded.
Consider a situation where the DAQ bases the trigger decision on an online variable x. Data read-out is triggered whenever x ≥ θ_x, so the trigger efficiency in x is a step function, ε(x) = Θ(x − θ_x). In offline data analysis, the variable y is calculated from the same event information, but using more elaborate algorithms and calibration inputs. The y variable is therefore a more precise indicator of the trigger quantity. Of interest for analysis is the trigger efficiency as a function of y, ε(y).
As an example, the DEAP-3600 dark matter detector uses a constant-threshold trigger [2]. Data read-out is triggered whenever the signal intensity from the light detectors, which is related to the event energy, passes a fixed threshold. In offline analysis, the signal intensity is converted into the total number of photoelectrons, using calibration constants that account for gain differences between light detectors and for changes in gain over time. This offline variable measures the event energy more precisely, but near the trigger threshold, the efficiency is no longer a simple step function.
A number of methods exist to determine ε(y) by way of a calibration dataset. These rely on measuring, directly or indirectly, the true rate of events at each value of y, so that by comparison with the rate obtained after the trigger, the trigger efficiency can be calculated.
Obtaining a trigger efficiency calibration is not always possible. The calibration data could be corrupt, it could be impossible to record calibration data due to electronics or physics constraints, or the calibration could drift with time faster than calibration datasets can be taken. In such cases, the efficiency curve might still be recovered or verified provided the value of x for each event was recorded or can be obtained offline.
In the DEAP-3600 detector for example, recording a calibration dataset takes approximately 48 hours, so performing regular trigger efficiency calibrations reduces the live time of the detector for dark-matter search data. The trigger efficiency changes the shape of the spectra used to obtain the energy calibration. It also changes the shape of the distributions of certain backgrounds in a pulse-shape-discrimination parameter, which cannot be modelled correctly unless a trigger efficiency correction is applied [3]. Therefore, monitoring the trigger efficiency on an ongoing basis is crucial to some analysis efforts.

General principle and illustration of the method
A dataset contains the value of x and y for each event. For concreteness, we say that these are both variables for the event energy. We assume that the events recorded have a continuous spectrum in both x and y in the region relevant to the trigger.
Consider the histogram of x versus y for many events, I(x, y). Because x and y are variables describing the same quantity, they are correlated and the data will form a 'band' in this 2-dimensional space. The data has a spectrum in the x parameter, I(x) = ∫_{−∞}^{∞} I(x, y) dy, and a spectrum in the y parameter, I(y) = ∫_{−∞}^{∞} I(x, y) dx.
To illustrate how to obtain ε(y) from a dataset, we create data in a toy Monte Carlo (MC) simulation with a spectrum I(x) = 10x, a trigger threshold θ_x = 100, and a resolution such that the shape of the y distribution for events of the same x is a skewed Gaussian. The I(x, y) histogram for the simulated events is shown in Fig. 1a. The functional form of the relation between x and y is not typically known a priori in real data, and the method developed here does not rely on such knowledge, so we limit ourselves in this section to information that can be obtained from the data. This situation will be analysed mathematically in Sect. 3.2. Figure 1a shows what the real detector data might look like. To obtain the trigger turn-on curve in y, the following steps are taken:
1. Normalize the I(x, y) histogram such that I(x) = 1 for x ≥ θ_x. This can be achieved by dividing each bin (x, y) in the histogram by I(x). We denote the histogram normalized in x as I_x(x, y); it is shown in Fig. 1b. After this normalization, I_x(x) is equal to the efficiency curve ε(x).
2. Project I_x(x, y) onto the y axis to obtain the spectrum I_x(y).
3. Determine the plateau height c of I_x(y) in the region well above the trigger threshold.
4. Divide I_x(y) by c; the result is the estimated trigger efficiency curve ε(y).
Because in the simulation the functional dependence is known, the functional shape of the turn-on curve can be fit to the data; this is the blue solid line in Fig. 2a.
The crucial feature necessary for this method to work is the constant plateau in I_x(y); in other words, that step 1 forces the spectrum I_x(y) to be uniform. A constant plateau arises when I_x(y) does not depend on x or y until the trigger threshold is introduced. That is, Î_x(y) = c, where the hat indicates the absence of the trigger and c is a constant plateau height.
Any value in the actual I_x(y) histogram not equal to c then indicates the influence of the trigger, and the difference c − I_x(y) is proportional to the number of events missing at y due to the trigger efficiency. The conditions necessary to obtain a constant plateau will be discussed in Sect. 5.
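The procedure described above can be sketched in a short numpy toy study. The spectrum, resolution function, and binning below are hypothetical choices made for this sketch (a Gaussian resolution with μ(x) = 0.2x and σ = 2, and I(x) ∝ x on [50, 200]); they are not the paper's exact simulation, but the normalization and plateau steps are the same:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_x, a, sigma = 100.0, 0.2, 2.0   # threshold; mu(x) = a*x; resolution

# Draw x from a rising spectrum I(x) ~ x on [50, 200] (inverse transform),
# smear into the offline variable y, and apply the step trigger in x.
u = rng.uniform(size=500_000)
lo, hi = 50.0, 200.0
x = np.sqrt(lo**2 + u * (hi**2 - lo**2))
y = rng.normal(a * x, sigma)
trig = x >= theta_x

# 2D histogram I(x, y) of the triggered data, and its spectrum I(x) in x.
xbins = np.arange(100.0, 201.0, 1.0)
ybins = np.arange(10.0, 42.0, 0.2)
I_xy, _, _ = np.histogram2d(x[trig], y[trig], bins=[xbins, ybins])
I_x, _ = np.histogram(x[trig], bins=xbins)

# Step 1: divide each (x, y) bin by I(x), so that I_x(x) = 1 for x >= theta_x.
Ix_xy = I_xy / np.maximum(I_x, 1)[:, None]

# Steps 2-4: project onto y, read the plateau height c well above the
# turn-on (and away from the upper edge of the simulated band), and scale.
Ix_y = Ix_xy.sum(axis=0)
yc = 0.5 * (ybins[:-1] + ybins[1:])
c = Ix_y[(yc > 26.0) & (yc < 34.0)].mean()
eff = Ix_y / c        # estimated trigger turn-on curve in y
```

With these bin widths the plateau of the un-scaled projection sits near (Δy/Δx)/a = 1, and the resulting curve rises through ε ≈ 0.5 at y = a·θ_x = 20, as expected for a Gaussian resolution.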

Validation
In the following two sections, we analytically demonstrate the validity of this method for two common resolution functions and discuss the conditions that must be met to obtain a constant plateau region.

Gaussian example
In this section, the method of determining the trigger efficiency is discussed mathematically for Gaussian resolution functions: the offline variable y follows a Gaussian distribution for any given value of x. The shape parameters of the Gaussian distribution are functions of x, and the data has some spectrum N(x):
I(x, y) = N(x) · (1/(√(2π) σ(x))) · exp(−(y − μ(x))² / (2σ(x)²)).
Division by N(x) gives I_x(x, y), which is by construction already normalized such that I_x(x) = 1 above the trigger threshold. To obtain I_x(y), an assumption must be made about the shape parameters. In the simplest case, σ(x) = σ and μ(x) = a · x. We temporarily ignore the trigger condition to study the spectrum in y (indicated by the hat):
Î_x(y) = ∫_{−∞}^{∞} (1/(√(2π) σ)) · exp(−(y − a·x)² / (2σ²)) dx = 1/a.
We find that the spectrum in y is a constant if no trigger condition is applied. Thus the critical condition for the method to work is met: for values of y well above the trigger region, I_x(y) forms a constant plateau.
The analytic shape of the turn-on curve can be obtained by including the trigger condition in the integral:
I_x(y) = ∫_{θ_x}^{∞} (1/(√(2π) σ)) · exp(−(y − a·x)² / (2σ²)) dx = (1/a) · (1/2) · [1 + erf((y − a·θ_x)/(√2 σ))].
This integral is formally equal to a convolution of a Gaussian with a step function. The curve describes the fraction of events that pass the trigger for each value of y, relative to the plateau height 1/a that is reached for y ≫ a·θ_x. It can be turned into the trigger turn-on curve by scaling such that the plateau is at 1:
ε(y) = (1/2) · [1 + erf((y − a·θ_x)/(√2 σ))].
In a more realistic case, both the mean and the width of the y distribution at a given x vary with x: σ(x) = b · x and μ(x) = a · x, so that
I_x(y) = ∫_{θ_x}^{∞} (1/(√(2π) b·x)) · exp(−(y − a·x)² / (2(b·x)²)) dx.
This integral cannot be solved analytically. The numeric solution for a = 0.2 and b = 0.015 is shown in Fig. 4. Panel (b) shows that a constant plateau exists.
This method does not produce proper efficiency curves in all situations. Figures 5 and 6 show situations where it does not work, namely when at least one of the mean or the sigma functions is a polynomial of order greater than 1. In both figures, the trigger efficiency model from Eq. (8) is drawn to illustrate the differences.
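The Gaussian result can be verified numerically by integrating the x-normalized band over x on a grid. The parameter values below (a = 0.2, σ = 2, θ_x = 100) are illustrative choices for this check, not values taken from the figures:

```python
import numpy as np
from math import erf, sqrt

a, sigma, theta_x = 0.2, 2.0, 100.0

def gauss(y, mu, s):
    # Normal density in y with mean mu and width s.
    return np.exp(-0.5 * ((y - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

xs = np.linspace(0.0, 1000.0, 200_001)   # integration grid in x
dx = xs[1] - xs[0]

def Ix_y(y, threshold=None):
    """Projection of the x-normalized band onto y, with optional trigger."""
    mask = np.ones_like(xs) if threshold is None else (xs >= threshold)
    return float(np.sum(gauss(y, a * xs, sigma) * mask) * dx)

# Without the trigger the projection is the constant plateau 1/a = 5 ...
plateau = Ix_y(30.0)
# ... and with the trigger, scaling by the plateau gives the erf turn-on.
y0 = 21.0
eff_numeric = Ix_y(y0, threshold=theta_x) / plateau
eff_analytic = 0.5 * (1 + erf((y0 - a * theta_x) / (sqrt(2) * sigma)))
```

The numeric and analytic curves agree, confirming that the triggered projection is the Gaussian–step convolution divided by the plateau height 1/a.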

Skewed Gaussian example
To show that this method works not only for Gaussian distributions, we repeat the calculation for skewed Gaussian (also called exponentially modified Gaussian, EMG) resolution functions,
f(y; μ, σ, λ) = (λ/2) · exp[(λ/2)(2μ + λσ² − 2y)] · erfc((μ + λσ² − y)/(√2 σ)),
with shape parameters μ(x), σ(x), and λ(x). In the simplest case, σ(x) = σ, λ(x) = λ, and μ(x) = a·x, shown in Fig. 7. The y-axis projection without the trigger condition is then (see Appendix 1)
Î_x(y) = ∫_0^∞ f(y; a·x, σ, λ) dx = 1/a,
where the lower integration bound is set at 0 because of the definition of the EMG. Again, a constant plateau height is expected in the absence of the trigger.
The analytic shape of the trigger turn-on curve is again obtained by including the trigger condition,
I_x(y) = ∫_{θ_x}^{∞} f(y; a·x, σ, λ) dx,
and dividing by the plateau height. Figure 7 shows the EMG model for μ(x) = 0.2·x, σ = 3, and θ_x = 100. The turn-on curve, Eq. (15), is drawn as well; the function parameters are taken at the trigger threshold (i.e. this is not a fit). Figure 8 shows the skewed Gaussian example for μ(x) = 0.2·x, σ(x) = 0.015·x, and λ = 0.7. The efficiency curve again shows a plateau, and its shape is described by Eq. (15), but here the parameters of the turn-on curve were fit, giving μ = 19.8, σ = 2.0, and λ = 0.705, so the parameters that determine the shape differ slightly from those of the skewed Gaussian at the trigger threshold. As in the Gaussian example, the method fails if μ or σ is a polynomial of order greater than 1 in x. It also fails if λ is not constant, though if the dependence of λ on x is weak, an approximately flat plateau region is still obtained.
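The EMG plateau can be cross-checked numerically in the same way as the Gaussian case. The parameter values below (a = 0.2, σ = 2, λ = 0.7, θ_x = 100) are illustrative choices for this sketch rather than the figure values:

```python
import numpy as np
from math import erfc

a, sigma, lam, theta_x = 0.2, 2.0, 0.7, 100.0
verfc = np.vectorize(erfc)   # vectorized complementary error function

def emg(y, mu, s, l):
    """Exponentially modified Gaussian density in y with mean parameter mu."""
    pre = (l / 2.0) * np.exp((l / 2.0) * (2.0 * mu + l * s * s - 2.0 * y))
    return pre * verfc((mu + l * s * s - y) / (np.sqrt(2.0) * s))

xs = np.linspace(0.0, 500.0, 100_001)   # grid for the x integration
dx = xs[1] - xs[0]

def Ix_y(y, threshold=0.0):
    """x-normalized projection onto y; threshold = 0 reproduces no trigger."""
    mask = xs >= threshold
    return float(np.sum(emg(y, a * xs, sigma, lam) * mask) * dx)

plateau = Ix_y(40.0)                       # expect ~1/a = 5 without a trigger
def eff(y):
    return Ix_y(y, threshold=theta_x) / plateau
```

Because the EMG is a density in y − μ, integrating it over μ = a·x yields the same constant 1/a well away from the turn-on region, and the triggered projection rises smoothly from 0 to 1 after division by the plateau.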

Uncertainties
The normalization of the spectrum in x adds correlated uncertainties to the statistical uncertainties of each bin in the I_x(y) histogram. The plateau level must then be fitted, introducing an uncertainty on the 'true' number of events. The final efficiency curve or histogram therefore comes with a complicated mixture of correlated and uncorrelated, statistical and systematic uncertainties. Furthermore, because the 'true' number of events is only an estimate, the efficiency histogram can have values bigger than 1.
Some assumptions can be made to simplify the uncertainties. The uncertainty on the total number of events in each x bin will always be smaller than the statistical uncertainty on the events in any (x, y) bin. Thus, the correlated uncertainties can be neglected in the I x (y) histogram. The efficiency histogram can be fit with an analytic function where the maximum value is constrained to 1. Then, confidence regions can be obtained in the usual way by varying the fit parameters within their uncertainties.
The uncertainty on the plateau height must be accounted for as a systematic uncertainty.
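A constrained fit of this kind could be sketched as follows. The data points are hypothetical, and the erf model with a bounded amplitude A ≤ 1 is one possible choice of constrained analytic function, not a form prescribed here:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def turn_on(y, A, mu, s):
    # Gaussian turn-on model; the amplitude A is the fitted maximum efficiency.
    return A * 0.5 * (1.0 + erf((y - mu) / (np.sqrt(2.0) * s)))

# Hypothetical measured efficiency histogram: a true curve plus noise, so
# individual bins can fluctuate above 1, as discussed in the text.
rng = np.random.default_rng(0)
yc = np.linspace(12.0, 30.0, 60)
meas = turn_on(yc, 1.0, 20.0, 2.0) + rng.normal(0.0, 0.02, yc.size)

# Bounding A at 1 constrains the fitted efficiency to at most unity.
popt, pcov = curve_fit(turn_on, yc, meas, p0=[0.9, 19.0, 1.5],
                       bounds=([0.0, 10.0, 0.1], [1.0, 30.0, 10.0]))
perr = np.sqrt(np.diag(pcov))   # parameter uncertainties for confidence bands
```

Varying the fitted parameters within perr then gives the confidence regions in the usual way.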

Discussion
The method introduced here allows an estimation of the trigger efficiency turn-on curve without the use of calibration data. This method will have a larger uncertainty than a typical efficiency calibration for the same amount of data used. However, since 'physics' datasets are often much larger than calibration datasets, this method can result in a more precise estimate.
The method works if a number of assumptions are true: (1) The trigger decision is based solely on an online parameter x, and the value of x is known for each event. (2) The trigger curve in x is a step function; that is, the efficiency is known to be 0 for x < θ_x and 1 for x ≥ θ_x. (3) The existing data covers the full available parameter space in the trigger turn-on region, and far enough into the plateau region to estimate the plateau height. (4) The distribution of the offline parameter y for events with the same x, I(y; x = const), has the same functional form at all values of x (or at least for all values of x in the critical turn-on region and far enough into the plateau region that the plateau height can be obtained).
(5) The I(x, y) histogram shows a linear dependence of y on x, and the width of the I(y; x = const) distribution is a polynomial of order 1 in x. Non-Gaussian distributions will have additional requirements on the distribution shape parameters. These do not have to be explicitly determined: if this method produces a flat plateau region, the conditions are met.
Condition (1) is typically met in particle physics experiments. We note that if the trigger variable x is not recorded for each event, it can often be reconstructed by programming an offline analysis algorithm that reproduces the trigger module algorithm.
Condition (2) must be met so that the result of this method is in fact a trigger efficiency. The method determines the efficiency curve of y with regard to x, not the efficiency of y with regard to the actual trigger. But if the trigger efficiency in x is a step function with values of either 0 or 1, the curve obtained is the trigger efficiency in y. If the trigger efficiency in x is not a step function, then it must additionally be obtained another way before the trigger efficiency in y can be determined. A typical situation where this is the case would be one where the trigger is pre-scaled or otherwise known to approach an efficiency different from one.
Condition (3) means that the physics data recorded must contain a sufficient number of events with values of x near the trigger threshold. If it does not, presumably, the trigger turn-on curve is not of interest to begin with.
Condition (4) is the only one that is not trivial to verify based just on the physics data. An unchanging functional form of the I(y; x = const) distribution is a reasonable assumption in most cases, but should, if possible, be checked using a traditional efficiency calibration approach. If I(y; x = const) is known analytically, such that the shape of the turn-on curve can be obtained by convolution with a step function, then the shape of the data should be well described by this analytic turn-on curve. If the shapes do not match, this indicates that condition (4) is not met.
We showed analytically that this method works for certain forms of Gaussian and skewed Gaussian resolution functions. We expect that this method will work for many realistic distributions I (x, y), as long as I (y; x = const) tends to 0 at both tails. This can be intuitively understood. At the integration borders of y = −∞ and y = ∞, the distribution is 0. x determines where inside the integration region the distribution peaks (through μ = μ(x)), but since the integration goes from minus to plus infinity, the location of the distribution on the y-axis is not relevant.
The events used to obtain the trigger efficiency calibration do not need to be signal events. Taking the DEAP-3600 detector as an example again, events from a high-rate background, the beta decay of 39Ar, are used to obtain the trigger efficiency calibration.

Summary
We have presented a method to obtain the trigger efficiency turn-on curve for a physics dataset. The method uses only the physics data itself; that is, it does not require calibration data. It is based on several assumptions that are fulfilled for many types of experiments, but at least one of which is difficult to verify without a calibration. The method is therefore particularly well suited to tracking the efficiency turn-on curve over time: it can be verified against a calibration at any point of data recording and, once verified, be used to obtain the trigger efficiency curve over time, for example if calibration parameters drift faster than calibrations can reasonably be recorded.
It can also be used to find out where full efficiency is reached, even if the precise shape of the turn-on is not reliable because the method was not verified. This can be useful when a dataset must be analyzed for which no other calibration is available, to at least find out in which region the recorded data is reliable.

Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors' comment: This work is about a mathematical method, and uses only data generated in a simple toy Monte Carlo simulation.] Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Funded by SCOAP3.