Search for boosted diphoton resonances in the 10 to 70 GeV mass range using 138 fb$^{-1}$ of 13 TeV $pp$ collisions with the ATLAS detector

A search for diphoton resonances in the mass range between 10 and 70 GeV with the ATLAS experiment at the Large Hadron Collider (LHC) is presented. The analysis is based on $pp$ collision data corresponding to an integrated luminosity of 138 fb$^{-1}$ at a centre-of-mass energy of 13 TeV recorded from 2015 to 2018. Previous searches for diphoton resonances at the LHC have explored masses down to 65 GeV, finding no evidence of new particles. This search exploits the particular kinematics of events with pairs of closely spaced photons reconstructed in the detector, allowing examination of invariant masses down to 10 GeV. The presented strategy covers a region previously unexplored at hadron colliders because of the experimental challenges of recording low-energy photons and estimating the backgrounds. No significant excess is observed and the reported limits provide the strongest bound on promptly decaying axion-like particles coupling to gluons and photons for masses between 10 and 70 GeV.


Introduction
1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the centre of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln \tan(\theta/2)$. Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.
Signal event samples were generated for a hypothetical resonance produced in gluon-gluon fusion in association with up to two additional jets for resonance masses between 10 and 80 GeV. The decay width Γ was set to 4 MeV, negligible compared to the experimental resolution, to describe a hypothetical resonance in the narrow-width approximation (NWA). The samples were generated using the effective-field-theory approach [36] implemented in MadGraph5_aMC@NLO [37] with the NNPDF2.3lo PDF set [38], and using the A14 set [39] of tuned parameters and Pythia 8.240 [40] to simulate parton showering, hadronization and the decay of the resonance into a pair of photons.
Background events with two prompt photons and associated jets were simulated using the Sherpa 2.2.4 [41, 42] event generator. Matrix elements were calculated in perturbative QCD (pQCD) at next-to-leading order (NLO) for up to one additional parton, and at LO for two or three partons, and merged with the Sherpa parton-shower simulation using the MEPS@NLO prescription [43][44][45][46]. The NNPDF3.0nnlo PDF set was used in conjunction with a dedicated parton-shower tune in the Sherpa generator. Interference effects between the resonant signal and all background processes are expected to be small for narrow-width signals and are neglected in this analysis.
The effects of multiple interactions in the same bunch crossing as the hard scatter and in neighbouring ones (defined as pile-up) are included using simulated events generated with Pythia 8. Simulated events were weighted to reproduce the distribution of the average number of interactions per bunch crossing observed in data.
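The reweighting step amounts to histogram-ratio weights in the average number of interactions per bunch crossing, $\mu$. A generic sketch follows (hypothetical function and variable names, not the ATLAS implementation):

```python
import numpy as np

def pileup_weights(mu_events, mu_data, mu_sim, edges):
    """Per-event weights given by the data/simulation ratio of the
    normalized distributions of the average number of interactions mu."""
    h_data, _ = np.histogram(mu_data, bins=edges, density=True)
    h_sim, _ = np.histogram(mu_sim, bins=edges, density=True)
    # avoid division by zero in empty simulation bins
    ratio = np.divide(h_data, h_sim, out=np.zeros_like(h_data),
                      where=h_sim > 0)
    # look up the weight of each simulated event from its mu value
    idx = np.clip(np.digitize(mu_events, edges) - 1, 0, len(ratio) - 1)
    return ratio[idx]
```

After reweighting, the weighted $\mu$ distribution of simulated events matches the one observed in data up to binning effects.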
All simulated signal events were processed using a full simulation of the ATLAS detector [47] based on Geant4 [48]. The background events were processed using a fast simulation of the ATLAS detector [49], where the full simulation of the calorimeter is replaced with a parameterization of the calorimeter response. All simulated events were reconstructed with the same reconstruction algorithms as those used for data.

Object and event selection
Photon candidates are reconstructed from topological clusters of energy deposited in the EM calorimeter, as well as from charged-particle tracks and conversion vertices reconstructed in the inner detector, and they are calibrated as described in Ref. [30]. The event selection requires at least two photon candidates with transverse energies larger than 22 GeV and $|\eta| < 2.37$, excluding the barrel-to-endcap transition regions of the calorimeter, $1.37 < |\eta| < 1.52$. The transverse energy requirement is chosen to mitigate the effect of the trigger efficiency turn-on from the trigger thresholds discussed in Section 3. The properties of the EM clusters associated with the two highest-$E_T$ photons and additional information from the tracking systems are used to identify the diphoton production vertex [50], which is used to correct the photon direction, resulting in improved resolution.
To reduce the background from jets, photon candidates are required to satisfy tight identification criteria based on the shape of EM showers in the LAr calorimeter and energy leakage into the hadronic calorimeter [30]. Events with one or both photon candidates passing a looser identification are kept for background estimations. The tight identification is optimized in ranges of photon $E_T$ and $|\eta|$, and has an identification efficiency that increases with $E_T$ from 70% at 22 GeV to 90% above 50 GeV.
To further improve the rejection of jets misidentified as photons, the candidates are required to be isolated, using information from both the calorimeter and tracking subsystems. The calorimeter isolation transverse energy $E_T^{\text{iso,calo}}$ is required to be smaller than $0.065\,E_T$, where $E_T^{\text{iso,calo}}$ is defined as the sum of the transverse energies of positive-energy topological clusters [51] within a cone of size $\Delta R = 0.2$ around the photon candidate, excluding the photon transverse energy $E_T$ and correcting for pile-up and underlying-event contributions [52][53][54]. The track isolation transverse energy $E_T^{\text{iso,trk}}$ is required to be less than $0.05\,E_T$, where $E_T^{\text{iso,trk}}$ is defined as the scalar sum of the transverse momenta of tracks with $p_T > 1$ GeV in a $\Delta R = 0.2$ cone around the photon candidate, and which satisfy loose track-quality criteria, are not associated with a photon conversion, and originate from the diphoton production vertex. The combined isolation efficiency for pairs of photons fulfilling the identification requirement in simulated signal samples increases with $m_{\gamma\gamma}$ from 80% at 10 GeV to 90% at 90 GeV.
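Expressed as a selection predicate, the two isolation cuts read as follows. This is a minimal sketch with hypothetical variable names; the pile-up, underlying-event and conversion handling described above is assumed to be already applied to the inputs:

```python
def passes_isolation(et_photon, et_iso_calo, et_iso_trk):
    """Relative isolation cuts quoted in the text:
    calorimeter isolation < 0.065 * ET and track isolation < 0.05 * ET,
    both built in a DeltaR = 0.2 cone around the photon candidate."""
    return et_iso_calo < 0.065 * et_photon and et_iso_trk < 0.05 * et_photon
```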
The diphoton invariant mass $m_{\gamma\gamma}$ is computed using the transverse energies of the leading and subleading photon candidates and their angular separation in both azimuth $\phi$ and pseudorapidity $\eta$, determined from their positions in the calorimeter and the diphoton production vertex.
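For massless photons this computation reduces to a closed-form expression in the transverse energies and directions; a minimal sketch, with the standard massless two-body formula assumed:

```python
import math

def diphoton_mass(et1, eta1, phi1, et2, eta2, phi2):
    """Invariant mass of two massless photons from transverse energies
    and directions: m^2 = 2 * ET1 * ET2 * (cosh(d_eta) - cos(d_phi))."""
    d_eta = eta1 - eta2
    # wrap the azimuthal difference into [-pi, pi]
    d_phi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))
    return math.sqrt(2.0 * et1 * et2 * (math.cosh(d_eta) - math.cos(d_phi)))
```

Two back-to-back photons ($\Delta\phi = \pi$, equal $\eta$) with $E_T = 25$ GeV each give $m_{\gamma\gamma} = 2 E_T = 50$ GeV.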
An additional kinematic selection is placed on the transverse momentum of the diphoton system, $p_T^{\gamma\gamma}$, requiring events to have a diphoton pair with $p_T^{\gamma\gamma} > 50$ GeV. This requirement is motivated by the fact that the analysis targets diphoton pairs with low masses, down to about half the trigger energy thresholds, and such pairs are typically highly boosted with respect to the ATLAS detector rest frame. The $p_T^{\gamma\gamma}$ requirement is chosen in order to reach the best compromise between the statistical uncertainty in the lowest part of the spectrum and sculpting effects on the background shape from the trigger efficiency turn-on, the modelling of which would result in large systematic uncertainties.
In total, 1 166 636 data events with $m_{\gamma\gamma} < 80$ GeV are selected.
Following the detector-level selection, the measurement of the signal production cross-section is performed in a fiducial volume defined from the simulated samples by requiring two photons at particle level with $E_T > 22$ GeV, $|\eta| < 2.37$ and $p_T^{\gamma\gamma} > 50$ GeV. The particle isolation, defined as the scalar sum of the $p_T$ of all the stable particles (except muons and neutrinos) found within a $\Delta R = 0.2$ cone around the photon direction, is required to be less than $0.05\,E_T$. This isolation requirement is chosen to reproduce the detector-level selection.

Signal modelling
The shape of a possible signal in the diphoton invariant mass distribution is modelled by a double-sided Crystal Ball (DSCB) function, composed of a Gaussian core with power-law tails [1,55], whose parameter values evolve linearly with the mass $m_X$. The parameters of the DSCB function are extracted from fits to the MadGraph simulated signal samples. The width of the Gaussian core is entirely determined by the resolution of the detector and ranges from 0.2 to 1.2 GeV, as shown in Figure 1(a). Good agreement between the signal parameterization and the simulated signal samples is found, with differences below 1% of the fitted signal yield. An example of a simulated resonance overlaid with the signal model parameterization is shown in Figure 1.
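A common way to write the DSCB in the standardized variable $t = (m_{\gamma\gamma} - \mu)/\sigma$ is sketched below (unnormalized, with continuity enforced at the two Gaussian-to-tail transition points; the analysis' normalization and fitted parameter values are not reproduced here):

```python
import math

def dscb(t, alpha_lo, n_lo, alpha_hi, n_hi):
    """Unnormalized double-sided Crystal Ball in t = (m - mu) / sigma:
    Gaussian core with power-law tails, continuous at t = -alpha_lo
    and t = +alpha_hi."""
    if t < -alpha_lo:  # low-mass power-law tail
        a = math.exp(-0.5 * alpha_lo ** 2)
        return a / (alpha_lo / n_lo * (n_lo / alpha_lo - alpha_lo - t)) ** n_lo
    if t > alpha_hi:   # high-mass power-law tail
        a = math.exp(-0.5 * alpha_hi ** 2)
        return a / (alpha_hi / n_hi * (n_hi / alpha_hi - alpha_hi + t)) ** n_hi
    return math.exp(-0.5 * t * t)  # Gaussian core
```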

Background estimates
The dominant background components consist of continuum $\gamma\gamma$ production, and of photon-jet ($\gamma j$ and $j\gamma$) and jet-pair ($jj$) events in which one or more jets are misidentified as photons. Other backgrounds, arising from electrons faking photons in $Z$ boson decays, are found to be negligible in the mass range of this search and are not considered. The analysis makes use of a data-driven background estimate in which the continuum background shape is parameterized by an analytic function. The chosen analytic functional form is described in Section 6.1. The uncertainty arising from the choice of background model is based on signal-plus-background fits to background-only template histograms with a binning of 100 MeV, following the methodology described in Ref. [56] and further described in Section 6.1. The background modelling uncertainty is found to be dominated by the limited size of the available simulated event samples. To reduce its impact on the analysis, the background templates are smoothed using a Gaussian Process fit. This technique is described further in Section 6.2, and its impact on the analysis is detailed in Section 6.3.

Background template modelling
The background-only template has two components. The $\gamma\gamma$ component is built from the simulated samples described in Section 3, while the $\gamma j$ and $jj$ components are built from control samples obtained from data, in which one or both photons must fail the tight identification requirements while passing a looser set of identification cuts. The two components are combined according to their relative fractions. The relative contribution of each of these processes is shown in Figure 2 and is estimated using the two-dimensional sideband method described in Ref. [57]. The purity of the diphoton sample, defined as the fraction of $\gamma\gamma$ events, increases with $m_{\gamma\gamma}$ from 50% at 10 GeV to 70% at 80 GeV, with an overall uncertainty of 3% dominated by its statistical component, which arises from the limited size of the data sample collected with the prescaled triggers described in Section 4. No significant difference in the diphoton purity is observed between the various LHC data-taking periods of Run 2. The goal of this analysis is to reach the lowest invariant mass possible, including the 'turn-on' region for masses below 20 GeV. The resulting background shape therefore needs to be described by a more complex analytic form than in previous diphoton resonance searches [2,26] and is constructed as the combination of two pieces: one capable of describing the turn-on shape and a second describing the smoothly decreasing part. The two-part analytic function described below was found to adequately model the background shape across the full invariant mass range of the search.
The turn-on region (TO) is described by a two-parameter function $h_{TO}$, where $f_0$ corresponds to the value of the function at $m_{\gamma\gamma} = 0$ and $\sigma_{TO}$ is the length scale of the turn-on.
The smoothly falling region beyond 30 GeV is described by a power-law function multiplied by an 'activation' function to increase its flexibility in the high-mass region (above 50 GeV). For this activation term, an exponential function times a 'Fermi-Dirac-like' function is chosen; the resulting function $h_{\text{High}}$ has parameters $m_1$, $y_0$ and $a_0$ for the power-law term, while $a_{\text{tail}}$, $b_{\text{tail}}$, $m_{\text{thresh}}$ and $\sigma_{\text{thresh}}$ describe the activation function. The power-law part is qualitatively described by its endpoints, being $y_0$ at $m_{\gamma\gamma} = 0$ and 0 at $m_{\gamma\gamma} = m_1$, and a fixed value of $m_1 = 115$ GeV can be set with no impact on the flexibility of the complete model. The activation function only plays a role above $m_{\text{thresh}}$, below which its value is practically 1.
The complete functional form is obtained by adding the two components and has ten parameters in total: $f_{TO}$ describes the relative contribution of the turn-on component, while $\vec{\theta}_{\text{High}}$ and $\vec{\theta}_{TO}$ are the sets of parameters belonging to $h_{\text{High}}$ and $h_{TO}$. To reduce the potentially large correlations between the ten parameters, a subset of them is fixed to ensure the convergence of the fit. The choice of free parameters is based on the results of stability tests using pseudo-datasets generated from the best fit to the background template. The chosen configuration has the largest number of floating parameters: only three parameters are fixed ($m_1$, $m_{\text{thresh}}$, $\sigma_{\text{thresh}}$), while the remaining seven are free to vary.
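To make the two-component structure concrete, a toy model in this spirit could look as follows. The specific shapes (tanh turn-on, truncated power law, logistic activation) and all parameter names are illustrative stand-ins, not the analysis' Eq. (1):

```python
import math

def h_turn_on(m, f0, sigma_to):
    """Turn-on piece: value f0 at m = 0, rising over a length scale
    sigma_to (hypothetical saturating form)."""
    return f0 + (1.0 - f0) * math.tanh(m / sigma_to)

def h_high(m, y0, m1, a0, m_thresh, s_thresh):
    """Falling piece: power law with endpoints y0 at m = 0 and 0 at
    m = m1, times a Fermi-Dirac-like activation that is ~1 below
    m_thresh and suppresses the tail above it."""
    power_law = y0 * max(1.0 - m / m1, 0.0) ** a0
    activation = 1.0 / (1.0 + math.exp((m - m_thresh) / s_thresh))
    return power_law * activation

def background_model(m, f_to, pars_to, pars_high):
    """Sum of the two pieces; f_to sets the relative turn-on contribution."""
    return f_to * h_turn_on(m, *pars_to) + (1.0 - f_to) * h_high(m, *pars_high)
```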
Variations of the nominal background template are built to validate the flexibility of the chosen functional form and subsequently to estimate background modelling systematic uncertainties in Section 6.3. They are constructed by i) modifying the fractions of the background components, referred to as variations of the $\gamma\gamma$ fraction; ii) varying the identification criteria used to define the $\gamma j$ and $jj$ control regions, referred to as variations of the control region; and iii) altering the templates by varying the $p_T^{\gamma\gamma}$ cut by 10%, referred to as $p_T^{\gamma\gamma}$ variations. These variations change the steepness of the turn-on by up to 20%, and the slope in the high-mass region by ±5%, with respect to the nominal background template.

Gaussian Processes to mitigate statistical fluctuations in background templates
The bias arising from the choice of background model is evaluated from signal-plus-background fits to background-only templates: any fitted signal yield $N_{SS}$ is referred to as a 'spurious signal' and is treated as a systematic uncertainty in the modelling of the background shape. The functional form of Eq. (1) yields an acceptable maximal spurious signal, below 30% of the statistical uncertainty in the mass range between 10 and 75 GeV, and is therefore chosen as the background model.
The estimation of this systematic uncertainty requires the shape of the background template to be as close as possible to the shape of the data distribution, but with an event count large enough for the statistical fluctuations of the template to be negligible. When evaluating the uncertainty, if the background model perfectly describes the representative background sample, the number of signal events fitted by the signal-plus-background model will be zero. However, the representative background-only sample for this analysis is constructed from a limited number of simulated diphoton events, and statistical fluctuations in the sample introduce large fluctuations in the number of fitted signal events, regardless of the quality of the background model. This issue was addressed previously [26] by using simulated datasets with much larger event counts than in data, which leads to smaller statistical fluctuations in the background shape but is computationally expensive. To meet the aforementioned requirements at a lower computational cost, an alternative approach is followed instead: the statistical fluctuations affecting the background modelling uncertainty are mitigated by smoothing the templates with Gaussian Processes.
A Gaussian Process (GP) is a flexible Bayesian machine-learning technique which may be used to obtain a non-parametric fit to an input dataset [58]. The analysis uses the scikit-learn GP implementation [59] to fit a GP to the representative background sample histogram; the posterior mean of the GP fit is used as a smoothed background template. The combined signal and background model fit is then performed on the smoothed template instead of the original representative background sample. The degree of smoothing applied is controlled through the choice of kernel and its hyperparameters. A radial basis function (RBF) kernel [58] with an additional constant noise component is utilized here. The RBF kernel includes a length-scale hyperparameter that encodes the correlation between the event counts for different bins in invariant mass. The contents of bins which are less than the length scale apart in invariant mass are highly correlated, while the contents of those which are much further apart than the length scale are essentially uncorrelated. Physically, the length scale encodes a minimum feature size expected in the background shape, which in this analysis corresponds to the 1-2 GeV width of the trigger efficiency turn-on region and thus is also greater than the 100 MeV bin width of the original background histogram. The kernel hyperparameter values are determined in the fit to the representative background sample. Notably, because GPs are non-parametric in nature, the GP smoothing technique is not expected to significantly bias the shape of the resulting smoothed template towards a specific choice of analytic background model.
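A toy version of this smoothing using the scikit-learn API might look as follows; the spectrum, binning and hyperparameter choices are illustrative, not the analysis template:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

rng = np.random.default_rng(0)

# Toy background template: smoothly falling spectrum with a turn-on near
# 12 GeV and Poisson bin fluctuations; 0.2 GeV bins over an extended
# 7-80 GeV range (the analysis itself uses 0.1 GeV bins).
edges = np.arange(7.0, 80.2, 0.2)
centres = 0.5 * (edges[:-1] + edges[1:])
truth = 5000.0 * np.exp(-centres / 25.0) / (1.0 + np.exp(-(centres - 12.0) / 1.5))
counts = rng.poisson(truth).astype(float)

# RBF kernel whose length scale encodes the minimum feature size (a few
# GeV), plus a white-noise term absorbing bin-level statistical noise.
kernel = (ConstantKernel(1.0, (1e-3, 1e6))
          * RBF(length_scale=2.0, length_scale_bounds=(0.5, 20.0))
          + WhiteKernel(noise_level=1.0, noise_level_bounds=(1e-6, 1e5)))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(centres.reshape(-1, 1), counts)

# Posterior mean = smoothed template, evaluated on the narrower fit range
# (9-77 GeV) to stay away from GP edge effects.
mask = (centres >= 9.0) & (centres <= 77.0)
smoothed = gp.predict(centres[mask].reshape(-1, 1))
```

The kernel hyperparameters (length scale, noise level) are optimized during `fit`, mirroring the paper's statement that they are determined in the fit to the representative background sample.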
GPs may introduce mis-modelling at the edges of the diphoton invariant mass distribution, since edge points are only constrained by their correlation with other data points on one side. To mitigate these edge effects, the GP fit is performed using an extended invariant mass range of 7-80 GeV, while the combined signal and background model fit is performed using the nominal analysis invariant mass range of 9-77 GeV. Figure 3 shows an example of a pseudo-dataset generated from the nominal background modelling function, as well as the GP-smoothed pseudo-dataset. The smoothed pseudo-dataset is observed to reproduce the nominal background modelling function shape with relatively high accuracy, and without the bin-scale statistical fluctuations of the original pseudo-dataset. The smoothed pseudo-dataset shows some oscillatory behaviour beyond the turn-on region; these features are an artefact of the steep slope in the turn-on region pulling the fitted length scale to a smaller value than would otherwise be needed to model the remaining mass range. Similarly, some mis-modelling of the smoothed pseudo-dataset is observed in the turn-on region because of the length scale being pulled to a larger value by the higher mass region. The magnitude of the fluctuations in the smoothed pseudo-dataset is significantly smaller than that of statistical fluctuations in the unsmoothed pseudo-dataset.

Impact of smoothing on the background modelling uncertainty
In order to verify that the GP smoothing technique does not introduce any significant bias into the background histogram shape, its effect is checked on an ensemble of pseudo-datasets generated from a known background shape. Ensembles are generated using both the nominal background modelling function (provided in Eq. (1)) and a set of analytic forms capable of describing the turn-on feature in the background shape. This set is composed of functional forms built similarly to the nominal functional form in which either the turn-on component or the smoothly falling component is replaced by other analytic forms, such as different Fermi-Dirac-like functions for the turn-on or sums of exponential functions for the smoothly falling component. The parameters of the analytic forms are determined by fitting them to the original simulated background sample, and each histogram in the ensemble is generated with the same effective event count as in the original simulated background sample.
The background modelling uncertainty is then evaluated for each pseudo-dataset, with and without smoothing. The aforementioned functional forms are used to probe for potential smoothing bias in both the cases where the analytic background model did or did not properly describe the pseudo-dataset. The bias that arises from the GP smoothing technique is defined as the difference between the observed spurious signals in the smoothed template and the unsmoothed template. The uncertainty associated with the GP smoothing technique is observed to be roughly 20% of the background modelling uncertainty for masses below 20 GeV, stabilizing at 5% for larger masses. This bias is added in quadrature to the background modelling uncertainty.
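The pseudo-dataset procedure can be caricatured with a much simpler background (an exponential) and signal (a fixed-width Gaussian); all shapes and numbers below are illustrative, not the analysis models:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
centres = np.arange(10.05, 70.0, 0.1)
truth = 2000.0 * np.exp(-centres / 20.0)       # known background-only shape

def model(m, norm, slope, n_sig, m_x):
    # Gaussian signal of fixed 0.5 GeV width; the 0.1 factor converts the
    # normalized density to per-bin counts so n_sig is a yield in events.
    sig = 0.1 * np.exp(-0.5 * ((m - m_x) / 0.5) ** 2) / (0.5 * np.sqrt(2.0 * np.pi))
    return norm * np.exp(-m / slope) + n_sig * sig

spurious = []
for _ in range(20):                             # ensemble of pseudo-datasets
    counts = rng.poisson(truth).astype(float)
    for m_x in (20.0, 40.0, 60.0):              # hypothesized signal masses
        popt, _ = curve_fit(lambda m, a, b, s: model(m, a, b, s, m_x),
                            centres, counts, p0=(2000.0, 20.0, 0.0))
        spurious.append(popt[2])                # fitted spurious-signal yield
```

With a background model that matches the generating shape, the fitted yields scatter around zero; residual statistical noise in the template is exactly what the GP smoothing above suppresses.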
The final background modelling uncertainty is computed as the envelope of the maximal fitted signal yields over all the background template variations defined previously in this section, after smoothing. Figure 4(a) shows the number of spurious-signal events $N_{SS}$, taken as the background modelling uncertainty, relative to its statistical uncertainty for the unsmoothed and smoothed templates. Applying the GP smoothing procedure to the background template leads to a reduction of at least 50% in this background modelling uncertainty relative to the unsmoothed case. The uncertainty arising from the GP smoothing technique is found to be small compared to the decrease in background modelling uncertainty due to the reduction of statistical noise. The magnitude of the smoothing uncertainty, as well as the remaining background modelling uncertainty, is presented as a function of the diphoton invariant mass in Figure 4(b).

Statistical analysis
The data are interpreted by following the statistical procedure described in Ref. [60]. A binned likelihood function is built from the observed diphoton invariant mass distribution and the analytic functions discussed in Sections 5 and 6, describing the signal and background components in the 9 to 77 GeV mass range. The search is performed in the 10 to 70 GeV mass range to avoid edge effects, based on the different diphoton invariant mass resolutions at these values, as illustrated in Figure 1(a).
The parameter of interest to be extracted from the likelihood fit is the fiducial production cross-section times branching ratio $\sigma_{\text{fid}} \cdot B(X \to \gamma\gamma)$. Since the measurement is performed in a fiducial volume (defined in Section 4) to allow easier reinterpretation of the results, the fiducial cross-section includes a correction factor $C_X$ to account for the signal detection efficiency:

$$\sigma_{\text{fid}} \cdot B(X \to \gamma\gamma) = \frac{N_S}{C_X \cdot L}, \qquad C_X = \frac{N^{\text{det}}_{\text{MC}}}{N^{\text{fid}}_{\text{MC}}},$$

where $N_S$ is the number of signal events fitted in data, $L$ is the integrated luminosity, $N^{\text{det}}_{\text{MC}}$ is the number of reconstructed and selected signal events in the simulation and $N^{\text{fid}}_{\text{MC}}$ is the number of simulated signal events present within the fiducial volume. The $C_X$ values are computed from the simulated signal samples described in Section 3 and range from 0.2 to 0.5 as a function of $m_X$.
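The extraction of the fiducial cross-section from the fitted yield then reduces to simple arithmetic; a sketch with hypothetical input numbers:

```python
def fiducial_xs_times_br(n_s, lumi_fb, n_det_mc, n_fid_mc):
    """sigma_fid * B = N_S / (C_X * L), with C_X = N_MC^det / N_MC^fid the
    detection-efficiency correction factor (0.2-0.5 in this search)."""
    c_x = n_det_mc / n_fid_mc
    return n_s / (c_x * lumi_fb)

# Hypothetical inputs: 1000 fitted events, 138 fb^-1, C_X = 0.4
xs_fb = fiducial_xs_times_br(1000.0, 138.0, 4000.0, 10000.0)
```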
The theoretical uncertainties affecting the measurement of $\sigma_{\text{fid}} \cdot B(X \to \gamma\gamma)$ arise from variations of the renormalization and factorization scales affecting the signal efficiencies evaluated in simulated samples. The experimental uncertainties directly impacting the signal yield include those involved in the luminosity determination, the modelling of pile-up interactions in simulation, the trigger efficiency, and photon identification and isolation. An additional systematic uncertainty in the trigger is included to account for the capability of the trigger system to identify two closely spaced electromagnetic showers. Events containing a $Z \to ee\gamma$ decay, recorded only with electron triggers, in which the photon is close to one of the two electrons, are used to evaluate the photon trigger efficiency in data and in simulated radiative $Z$ samples [35]. The observed difference is added in quadrature to the nominal trigger systematic uncertainty. Uncertainties in the signal shape parameterization from the modelling and the determination of the photon energy resolution and scale are also accounted for, with a mild impact on the signal yield.
The systematic uncertainties are implemented in the likelihood function as nuisance parameters constrained by Gaussian penalty terms, except for the background modelling systematic uncertainty, which is implemented as an additional signal component. All sources of systematic uncertainties are summarized in Table 1.
The compatibility of the observed data and the background-only hypothesis for a given signal hypothesis is tested by estimating a local $p$-value based on a profile-likelihood-ratio test statistic, detailed in Ref. [60]. The global significance of a given event excess is computed using background-only generated pseudo-datasets to account for the look-elsewhere effect [61].
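A toy counting version of the pseudo-dataset estimate of the global significance is sketched below; the real analysis scans a profile-likelihood-ratio test statistic rather than simple window counts, and all shapes and numbers here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def max_local_z(counts, expected, window=10):
    """Largest local excess significance over a sliding mass window
    (a simple counting stand-in for the profile-likelihood scan)."""
    z_max = 0.0
    for i in range(len(counts) - window):
        obs = counts[i:i + window].sum()
        exp = expected[i:i + window].sum()
        z_max = max(z_max, (obs - exp) / np.sqrt(exp))
    return z_max

expected = 1000.0 * np.exp(-np.arange(10.0, 70.0, 0.1) / 20.0)
z_local_obs = 3.1           # local significance of the observed excess
trials = [max_local_z(rng.poisson(expected).astype(float), expected)
          for _ in range(200)]
# global p-value: fraction of background-only trials whose largest local
# excess is at least as significant as the observed one
p_global = float(np.mean(np.array(trials) >= z_local_obs))
```

Because the scan considers many masses, the maximum local significance in background-only trials routinely exceeds 2-3 standard deviations, which is why the global significance is smaller than the local one.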
In the absence of a signal, the expected and observed 95% confidence level (CL) exclusion limits on the cross-section times branching ratio are evaluated using the modified frequentist approach CL s [62,63] with the asymptotic approximation to the test-statistic distribution.

Results
The diphoton invariant mass distribution of events passing the analysis selection is shown in Figure 5, along with the background-only fit performed in the 9 to 77 GeV mass range.
The result of the $p$-value scan as a function of the hypothesized resonance mass is shown in Figure 6. The most significant deviation from the background-only hypothesis is observed for a mass of 19.4 GeV, corresponding to a local significance of 3.1$\sigma$. The global significance of such an excess is 1.5$\sigma$, computed using background-only pseudo-datasets to account for the look-elsewhere effect.

Phenomenological interpretation
In this section the observed limits on the fiducial cross-section for a hypothetical resonance are recast in the parameter space of an ALP $a$. The KSVZ-ALP model, inspired by the simplest QCD-axion model [64][65][66], is chosen as a benchmark because it allows for couplings between the ALP field and gauge bosons, including a non-zero coupling to gluons and photons. It is described by the following effective Lagrangian:

$$\mathcal{L}_{\text{eff}} = \frac{1}{2}\,\partial_\mu a\,\partial^\mu a - \frac{1}{2} m_a^2\, a^2 + \frac{a}{f_a}\left( c_3\,\frac{\alpha_3}{4\pi}\, G_{\mu\nu}\tilde{G}^{\mu\nu} + c_2\,\frac{\alpha_2}{4\pi}\, W_{\mu\nu}\tilde{W}^{\mu\nu} + c_1\,\frac{\alpha_1}{4\pi}\, B_{\mu\nu}\tilde{B}^{\mu\nu} \right), \tag{2}$$

where $a$ and $m_a$ are the ALP field and its mass, and $f_a$ corresponds to the decay constant that governs its coupling with the SM fields. The QCD and EW field strengths are denoted by $G_{\mu\nu}$, $W_{\mu\nu}$ and $B_{\mu\nu}$, with $\tilde{X}^{\mu\nu} = (1/2)\,\epsilon^{\mu\nu\rho\sigma} X_{\rho\sigma}$ for all field strengths. The coupling constants $\alpha_3$, $\alpha_2$ and $\alpha_1 = (5/3)\,\alpha_Y$ (where $\alpha_Y$ stands for the weak SM hypercharge coupling constant) set the strength of the strong and EW interactions in the SM. The coefficients $c_i$ encode the anomalies of the global symmetry non-linearly realized by the ALP with the SM gauge group. These anomalies are generated by integrating out heavy fermions which are charged under the SM gauge group at the scale $f_a$. This Lagrangian is equivalent to the one used to generate the simulated signal samples described in Section 3.
The ALP under consideration, being the pNGB of an approximate global symmetry, remains naturally light well below the scale of new physics. Considering $m_a$, the mass of the ALP, to be much smaller than $f_a$, the relevant two-body decays of $a$ are to photons and to jets, with widths which can be found, e.g., in Ref. [67]. In the 10 to 70 GeV mass range and for the choice of anomalies $c_1 = c_2 = c_3 = 10$, the branching ratio $B(a \to \gamma\gamma)$ varies from $0.6 \times 10^{-3}$ to $1.6 \times 10^{-3}$. Choosing to set the magnitude of the $c_i$ parameters to be the same is motivated by gauge coupling unification in a Grand Unified Theory scenario. While the specific value of 10 is arbitrary, the rescaling of the results to a different anomaly parameter choice would be trivial.

Figure 8 caption: The observed and expected lower bounds on the ALP decay constant $f_a$ derived from this analysis are shown as black solid and dashed lines respectively. BABAR bounds derived in Ref. [72] are shown in purple; in green, the LHC bounds on boosted dijet resonances [73]; and in blue, the LHC searches for diphoton resonances taken from Ref. [67]. The red bounds are derived from Tevatron [74] and LHC [57,75,76] diphoton cross-section measurements, following the method described in Ref. [67]. Weaker constraints covering lower invariant masses are obtained from LHCb diphoton measurements [77] and from LEP searches [78], in cyan and yellow respectively. On the right, the $y$-axis shows the ALP-photon coupling $g_{a\gamma\gamma} \equiv \alpha_{\text{em}}\, c_\gamma / f_a$ (with $c_\gamma = c_2 + \frac{5}{3} c_1$), a standard QCD axion notation.

The ALP Lagrangian in Eq.
(2) is implemented in FeynRules [68], and the production cross-section at the LHC for the process $pp \to ag$ is computed at leading order with MadGraph [69], where the gluon is explicitly required to boost the ALP. A constant $K$-factor $K = 2$ is applied to this cross-section to account for NLO corrections, which were computed for a similar signal topology in Ref. [70]. ALPs that couple to gluons decay promptly over the entire mass range of interest for this study (recent studies of displaced ALP decays can be found in Refs. [70,71]). Because of the large hierarchy between $f_a$ and $m_a$, and the loop suppression of the coupling, the ALP total width is dominated by its coupling to gluons and is always small compared to its mass. As a consequence, the narrow-width approximation always applies and finite-width effects can be safely neglected.
The recasting is done by comparing the theoretical signal yield obtained from the ALP model of Eq. (2), after applying the particle-level selection described in Section 4, with the bounds on the fiducial cross-section in Figure 7. The signal cross-section times branching ratio can be written as $1/f_a^2$ times a weakly varying function of the ALP mass. The upper limit on the cross-section then results in a lower limit on $f_a$, which is shown in Figure 8 for a specific choice of the $c_i$ coefficients. Figure 8 shows how the sensitivity of the search presented here covers a large portion of the previously unexplored ALP parameter space in which the heavy colour states generating the ALP coupling to gauge bosons are in the multi-TeV range and therefore inaccessible at the LHC. Any production mechanism other than gluon-gluon fusion suffers from a smaller production cross-section, and the decoupling of the heavy states inducing the ALP coupling to SM states would require further study.
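Because the signal rate scales as $1/f_a^2$, translating a cross-section limit into a bound on the decay constant is a one-line rescaling; a sketch with hypothetical numbers:

```python
import math

def f_a_lower_limit(sigma_limit_fb, sigma_theory_fb, f_a_ref):
    """Since sigma * B scales as 1/f_a^2, an upper limit on the
    cross-section maps to a lower limit on the decay constant:
    f_a > f_a_ref * sqrt(sigma_theory(f_a_ref) / sigma_limit)."""
    return f_a_ref * math.sqrt(sigma_theory_fb / sigma_limit_fb)

# Hypothetical numbers: 100 fb predicted at f_a = 1 TeV, 10 fb excluded,
# so the lower limit is sqrt(10) TeV.
f_a_min_tev = f_a_lower_limit(10.0, 100.0, 1.0)
```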
Constraints from $\Upsilon \to \gamma a(\to \gamma\gamma)$ decays [79], constraints from $Z$ boson width measurements [80], and ALP production in light-by-light scattering in heavy-ion collisions [81,82] are too weak to appear in the plot.

Conclusion
A search for new narrow-width boosted resonances is performed in the diphoton invariant mass spectrum ranging from 10 to 70 GeV, using 138 fb$^{-1}$ of $pp$ collision data collected at a centre-of-mass energy of 13 TeV with the ATLAS detector at the Large Hadron Collider. The data are consistent with the SM background expectation. Limits are set on the fiducial cross-section times branching ratio in a fiducial region defined to mimic the detector-level selection. The observed limits on $\sigma_{\text{fid}} \cdot B(X \to \gamma\gamma)$ range from 4 to 17 fb, with variations mainly due to statistical fluctuations of the data. The dominant uncertainties arise from the limited number of collisions collected and from the background modelling uncertainty. The impact of the latter is reduced by smoothing the simulated background-only sample with Gaussian Processes in order to reduce the statistical fluctuations in the sample. Furthermore, the observed limits are recast in the parameter space of an axion-like particle, covering a longstanding gap in diphoton resonance searches.