1 Introduction

Axion-like particles (ALPs), or more generally light (pseudo) scalars, are represented as gauge-singlets beyond the Standard Model (SM) that can couple to the 125 \(\text {GeV}\) Higgs boson [1, 2], and appear in many well-motivated extensions of the SM, e.g., the next-to-minimal supersymmetric standard model [3, 4]. These include models that address the baryon asymmetry of the universe [5, 6], offer a solution to the naturalness problem [7, 8], or provide insights into the nature of dark matter [9,10,11,12,13,14]. ALPs and light bosons produced in Higgs boson decays could also be mediators to dark sectors that do not otherwise couple to the SM [15].

A combination of ATLAS measurements of Higgs boson properties using \(139\,\text{ fb}^{-1} \) of data constrains the branching ratios into invisible and undetected states to be \({\mathcal {B}}(H\rightarrow \text {invisible})<10.7\%\) and \({\mathcal {B}}(H\rightarrow \text {undetected})<12\%\) at 95% confidence level (CL) [16, 17]. Combined measurements of Higgs boson couplings performed by the CMS Collaboration using \(138\,\text{ fb}^{-1} \) of data set upper limits of \({\mathcal {B}}(H\rightarrow \text {invisible})<16\%\) and \({\mathcal {B}}(H\rightarrow \text {undetected})<16\%\) at 95% CL [18]. These results allow potentially large branching fractions into beyond-the-standard-model (BSM) particles, \({\mathcal {B}}(H\rightarrow \text {BSM})\), such as ALPs.

In Ref. [19] it has been argued that if the ALP, denoted by a in the following, couples to at least some SM particles with couplings of order (0.01–1) TeV\(^{-1}\), its mass must be above 1 MeV. Taking into account the possibility of a long-lived ALP, large regions of so far unconstrained parameter space can be explored by searches for exotic, on-shell Higgs boson decays into two ALPs. In particular, this includes the parameter space in which ALPs can explain the observed discrepancy between the measurement [20,21,22] and the theoretical prediction [23, 24] of the anomalous magnetic moment of the muon [19]. It was suggested that subsequent ALP decays into photons provide unprecedented sensitivity to the ALP-photon couplings in the mass region above a few MeV, even if the relevant ALP-photon couplings are loop suppressed and the \(a \rightarrow \gamma \gamma \) branching ratios are significantly less than 100% [19].

This paper presents a search for decays of the 125 \(\text {GeV}\) Higgs boson into two ALPs in proton–proton (pp) collisions at the LHC [25]. The search is sensitive to events where each a-boson decays into two photons. For the first time, a dedicated search for long-lived \(a\rightarrow \gamma \gamma \) decays with a significantly displaced vertex within the tracking system of the ATLAS detector is performed, allowing a large region of the parameter space of the ALP-photon coupling \(C_{a\gamma \gamma }\) to be probed. Previous searches for \(H\rightarrow aa \rightarrow 4\gamma \) signatures were performed by the ATLAS [26] and CMS Collaborations [27, 28], but assumed promptly decaying ALPs and hence were valid only up to decay lengths of a few centimeters. The decay length scales with the ALP lifetime, \(\tau _a \propto \Lambda ^{2} / \left( m_a^3 |C_{a\gamma \gamma }|^{2} \right) \), where \(m_a\) is the ALP mass and \(\Lambda \), the new physics scale, is assumed to be in the TeV range [19]. The limits obtained here for ALP masses \(m_a>15\,\text {GeV} \) are about one order of magnitude more stringent than those of previous ATLAS analyses using \(8\,\text {TeV} \) data, and reach sensitivities similar to or slightly better than previous analyses from CMS using \(132\,\text{ fb}^{-1} \) of \(\sqrt{s} = 13\,\text {TeV} \) data.

The paper is structured as follows. A brief discussion of the ATLAS detector and an overview of the Monte Carlo (MC) samples and data sets used are given in Sects. 2 and 3. Object reconstruction and the event selection are described in Sect. 4, where special focus is given to the reconstruction of collimated photon signatures and to the categorization of the final state topologies that can be reconstructed in the detector. The background estimates are discussed in Sect. 5, followed by a description of the dominant systematic uncertainties in Sect. 6, in particular those that involve the reconstruction of photons with a displaced production vertex. The statistical interpretation and the final results are summarized in Sect. 7. The paper closes with a brief conclusion in Sect. 8.

2 The ATLAS detector

The ATLAS detector [29] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.

The inner-detector system (ID) is immersed in a \({2}\,{\textrm{T}}\) axial magnetic field and provides charged-particle tracking in the range of \(|\eta | < 2.5\). The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements (hits) per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [30, 31]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to \(|\eta | = 2.0\). The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.

The calorimeter system covers the pseudorapidity range of \(|\eta | < 4.9\). Within the region \(|\eta |< 3.2\), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \(|\eta | < 1.7\), and two endcap copper/LAr hadron calorimeters up to \(|\eta | < 3.2\). The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.

The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers that measure the deflection of muons in the magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and \({6.0}\,\textrm{Tm}\) across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region \(|\eta | < 2.7\), complemented by cathode-strip chambers in the forward region. The muon trigger system covers the range of \(|\eta | < 2.4\) with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.

Events of interest for this analysis are selected using dedicated di-photon triggers. The first level of the trigger system is implemented in custom hardware, followed by a high-level trigger [32] where further selections are made by algorithms implemented in software. The first-level trigger accepts events from the \({40}\,\textrm{MHz}\) bunch crossings at a rate below \({100}\,\textrm{kHz}\), which the high-level trigger further reduces in order to record events to disk at about \({1}\,\textrm{kHz}\).

An extensive software suite [33] is used in MC simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Data set and simulated events

This analysis uses pp collision data collected by the ATLAS experiment from 2015 to 2018 with a centre-of-mass energy of \(\sqrt{s} =13\) \(\text {TeV}\). After data quality requirements [34] the full data set corresponds to an integrated luminosity of \(140.1 \pm 1.2\) fb\(^{-1}\) [35]. The lowest-threshold unprescaled di-photon triggers are used to select events for further analysis, as discussed in Sect. 4.2.

The main background contributions stem from multijet processes, where jets might be misidentified as photons, as well as Standard Model multiphoton processes [36]. However, MC simulation does not capture the full background shape observed in data and therefore a fully data-driven background estimation has been employed. Simulated events are used to validate the data-driven background estimation techniques described in Sects. 5.1 and 5.2, and to model Higgs boson decays into photons for both signal and background processes. Background events from continuum \(3\gamma \), \(4\gamma \) and \(\gamma \gamma \) production, neglecting any interference effects with the \(H\rightarrow \gamma \gamma \) decay, are generated using Sherpa 2.2.8 and Sherpa 2.2.4 [37,38,39], respectively, with the AZNLO set of tuned parameters [40] and the NNPDF30 [41] set of parton distribution functions (PDF). Production of the Higgs boson through gluon–gluon fusion (ggF) and vector-boson fusion (VBF) processes is modelled at next-to-leading order (NLO) using Powheg-Box v2 [42,43,44] interfaced with Pythia 8.186 [45] using the AZNLO tune and NNPDF30 PDF. The change in selection efficiency between ggF, VBF and other Higgs boson production mechanisms is at the percent level, so its impact is negligible. Thus, all signal (\(H\rightarrow aa \rightarrow 4\gamma \)) samples use the ggF Higgs boson production mechanism and have their cross-sections scaled to the total Higgs boson production cross-section predicted by a next-to-next-to-next-to-leading-order QCD calculation with NLO electroweak corrections applied [46,47,48,49].

Signal samples are produced for ALP masses in the range of \(0.1\,\text {GeV} \)–\(62\,\text {GeV} \), with a spacing of \(0.1\,\text {GeV} \) in the range of \(0.1\,\text {GeV} \)–\(0.5\,\text {GeV} \), a spacing of \(0.5\,\text {GeV} \) in the range of \(0.5\,\text {GeV} \)–\(5\,\text {GeV} \) and a spacing of \(1\,\text {GeV} \) in the range of \(5\,\text {GeV} \)–\(62\,\text {GeV} \), assuming \(\Lambda = 1\,\text {TeV} \). The full set of mass points is simulated for four different values of the coupling \(C_{a\gamma \gamma }\): 1, 0.01, \(5\times 10^{-4}\) and \(10^{-5}\). Samples of simulated signal events are generated using Powheg-Box v2 at NLO for the Higgs production and Pythia 8.212 to simulate the decay into ALPs and the subsequent decay into photons employing a generic 2HDM model [50]. For hadronisation and parton showering, Pythia 8.212 is used as well, with the AZNLO tune and the CTEQ6L1 PDF set [51].

Different pile-up conditions due to additional pp interactions in the same and neighbouring bunch crossings are simulated by overlaying the hard-scattering event with inelastic pp events generated by Pythia 8.186 using the NNPDF2.3lo PDF set [52] and the A3 tune [53]. Differences between the simulated and observed distributions of the number of interactions per bunch crossing are corrected by reweighting the simulated events to match the data distribution.
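The reweighting step can be illustrated with a minimal sketch. It assumes the correction is a simple per-bin ratio of the normalised data and simulated distributions of the number of interactions per bunch crossing; the function name and the binned inputs are hypothetical and do not represent the ATLAS reweighting tool.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights mapping the normalised MC distribution onto data.

    data_hist, mc_hist: lists of event counts in identical bins of the
    number of interactions per bunch crossing (hypothetical inputs).
    """
    n_data = float(sum(data_hist))
    n_mc = float(sum(mc_hist))
    weights = []
    for d, m in zip(data_hist, mc_hist):
        # Bins that are empty in simulation cannot be reweighted; assign zero.
        weights.append((d / n_data) / (m / n_mc) if m > 0 else 0.0)
    return weights
```

Applying these weights to the simulated events reproduces the data fractions bin by bin while leaving the overall normalisation of the simulated sample unchanged.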

All MC samples use a full simulation of the ATLAS detector [54] based on Geant4 [55] to reproduce the detector response. Corrections are applied to the simulated events to match the photon selection efficiencies, energy scale and resolution to those determined in data.

4 Object reconstruction and event selection

The experimental signature of \(H\rightarrow aa \rightarrow 4\gamma \) depends significantly on the ALP mass and on its coupling to photons, \(C_{a\gamma \gamma }\). ALP masses below \(3.5\,\text {GeV} \) dominantly yield collimated photon signatures, which are reconstructed as one energy cluster in the electromagnetic calorimeter. The photons originating from the decays of axion-like particles with higher masses can typically be resolved by the ATLAS calorimeter system and the ATLAS identification algorithms. The \(C_{a\gamma \gamma }\) coupling determines the lifetime and hence the distance the ALPs travel after being produced, so the search for \(H\rightarrow aa \rightarrow 4\gamma \) is divided into two sub-searches: a search for promptly decaying ALPs corresponding to couplings \(C_{a\gamma \gamma } \ge 0.1\) and a search for long-lived ALPs that is optimised for smaller couplings. Coupling values smaller than \(C_{a\gamma \gamma }=10^{-7}\) cannot be probed within this analysis since they yield ALP lifetimes that imply a decay outside of the detector volume for all ALP masses considered.

4.1 Object reconstruction

Photons are reconstructed from topologically connected clusters [56] of energy deposits in the electromagnetic calorimeter in the region \(|\eta | < 2.5\), where the tracking system allows photons to be distinguished from electrons. Only photon candidates within \(|\eta | < 2.37\) are used in this analysis to allow the usage of a track-based isolation variable. The transition region between the barrel and endcap electromagnetic calorimeters, \(1.37< |\eta | < 1.52\), is excluded due to poor resolution of the calorimeter in this area. The analysis does not distinguish between photons converting into \(e^+e^-\) pairs and unconverted photons.

Photon candidates are defined by the lateral shower profile of the energy deposits in the first and second electromagnetic calorimeter layers and by the fractional energy leakage into the hadron calorimeter. The analysis uses different photon identification criteria for different regions of the signal parameter space. Standard ‘Tight’ and ‘Loose’ photon identification criteria [57, 58], which are tuned for converted and unconverted photons separately, are applied in final states where individual photons can be separately reconstructed. Axion-like particles with small masses, e.g., \(m_a <3.5\,\text {GeV} \), decay predominantly into a pair of highly collimated photons, which are reconstructed as a single photon object. For these merged photon objects, the standard photon identification is not efficient. Hence, a neural network (NN)-based classification approach is developed. A first classifier is trained to separate real photon signatures, single or collimated, from ‘fake photons’ that come from multijet background. A second classifier is then trained to separate single-photon signatures from collimated signatures. Both classifiers use eight shower-shape variables relevant for photon identification [58]. The training data sets are based on single- and collimated-photon signatures from simulation and fake-photon candidates from data; the fake-photon candidates are selected by inverting the requirement on several isolation observables. If a photon passes a minimum threshold of the output neurons of both classifiers, it is labelled as a ‘merged’ photon. The performance of the classifier is shown in Fig. 1. Due to the coarse pointing resolution of the calorimeter, it is not possible to reconstruct displaced secondary vertices from ALP decays.

Fig. 1 Output of the neural network classifier to distinguish (a) real from fake photons, and (b) single from merged photons

The photon energy is reconstructed using the nominal ATLAS reconstruction and calibration procedures [59]. The energy of photon candidates that are identified as collimated photon signatures is corrected by adding the measured cluster energies in the calorimeter within a cone of \(\Delta R = 0.2\) around the photon, if not yet associated with the photon.

To further improve the rejection of misidentified photons, a track-based variable \(p_{\text {T}}^{\text {iso}}\) is defined as the scalar sum of the transverse momenta of all tracks with transverse momentum (\(p_{\text {T}} \)) above 1 \(\text {GeV}\) that originate from the primary vertex and are within a cone of \(\Delta R = 0.2\) around the photon candidate with transverse energy \(E^{\gamma }_{\text {T}}\). Isolated photons must have \(p_{\text {T}}^{\text {iso}}/E^{\gamma }_{\text {T}}<0.05\). A calorimeter-based isolation requirement is not used due to its large signal rejection for collimated photon signatures.
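The track-based isolation requirement can be sketched as follows. This is an illustrative implementation under stated assumptions: photons and tracks are represented as plain dictionaries, the `from_pv` flag stands in for the primary-vertex association, and all function names are hypothetical. Only the cone size, the 1 GeV track threshold and the 0.05 ratio are taken from the text.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def track_isolation(photon, tracks, cone=0.2, min_pt=1.0):
    """Scalar sum of track pT (GeV) inside the cone around the photon.

    Only tracks above min_pt that are flagged as coming from the primary
    vertex contribute.
    """
    return sum(
        t["pt"]
        for t in tracks
        if t["from_pv"]
        and t["pt"] > min_pt
        and delta_r(photon["eta"], photon["phi"], t["eta"], t["phi"]) < cone
    )

def is_isolated(photon, tracks, max_ratio=0.05):
    """Apply the requirement pT_iso / ET < max_ratio."""
    return track_isolation(photon, tracks) / photon["et"] < max_ratio
```

For a photon candidate with a transverse energy of 50 GeV, the requirement corresponds to at most 2.5 GeV of summed track momentum inside the cone.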

4.2 Event selection and categorization

Events are selected using one of two di-photon triggers [60], each of which requires two reconstructed photon candidates. The first trigger requires a minimum transverse energy of 35 \(\text {GeV}\) for the leading photon and 25 \(\text {GeV}\) for the subleading photon where both photons must satisfy the online Loose or Medium identification criteria [60], depending on the data-taking period. The alternative trigger requires that each photon satisfy the Tight identification criteria and have a transverse energy of at least 22 \(\text {GeV}\). The terms ‘leading’ and ‘subleading’ refer to the photon candidate with the highest and second-highest transverse energy, respectively. Photon candidates in the event must have a minimum transverse energy corresponding to the trigger threshold for the leading and subleading photons and at least 15 \(\text {GeV}\) for any additional photon. For \(H\rightarrow aa\rightarrow 4\gamma \) events with \(C_{a\gamma \gamma }=1\) and at least two selected photon candidates, the average trigger efficiency is larger than 60% for all ALP masses. This still holds for \(C_{a\gamma \gamma } = 10^{-5}\) and \(m_a > 50\,\text {GeV} \), while for smaller masses the trigger efficiency decreases down to \(30\,\%\). The trigger efficiency drops for decreasing ALP masses as collimated photon signatures often fail the isolation requirement at trigger level, which is based on calorimeter information rather than tracks.

Signal events are fully reconstructed if all four photons are detected. However, some of the photons might be out of detector acceptance and some might not be reconstructed. The lifetime of the ALPs depends on their mass and coupling to photons, as described in Sect. 1. Some examples are listed here: assuming a coupling of \(C_{a\gamma \gamma } = 10^{-4}\), a mass of \(m_a = 1\,\text {GeV} \) (\(m_a = 10\,\text {GeV} \)) yields a lifetime of \(c\tau = 25\,\text {m}\) (\(c\tau = 25\,\text {mm}\)). For \(C_{a\gamma \gamma } = 10^{-5}\) the lifetime increases by a factor of one hundred. Missing photons as well as pairs of collimated photons being reconstructed as one photon result in fewer reconstructed photons. At least two reconstructed photon candidates are required for the analysis. Further classification is based on the number and types of reconstructed photons.
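The lifetime examples quoted above follow from the scaling \(\tau _a \propto \Lambda ^{2} / ( m_a^3 |C_{a\gamma \gamma }|^{2})\). A minimal sketch, assuming \(\Lambda \) is held fixed and using the quoted point (\(m_a = 1\,\text {GeV} \), \(C_{a\gamma \gamma } = 10^{-4}\), \(c\tau = 25\,\text {m}\)) as a reference; the function name and the rescaling approach are illustrative only:

```python
def ctau_metres(m_a_gev, c_agg, ref_ctau_m=25.0, ref_m_gev=1.0, ref_c=1e-4):
    """Proper decay length in metres, rescaled from a reference point.

    Uses tau_a proportional to 1 / (m_a^3 |C|^2) at fixed Lambda; the
    default reference point is the example quoted in the text.
    """
    mass_factor = (ref_m_gev / m_a_gev) ** 3
    coupling_factor = (ref_c / abs(c_agg)) ** 2
    return ref_ctau_m * mass_factor * coupling_factor
```

Increasing the mass from 1 GeV to 10 GeV shortens the decay length by a factor of \(10^{3}\) (25 m to 25 mm), while lowering the coupling from \(10^{-4}\) to \(10^{-5}\) lengthens it by a factor of \(10^{2}\).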

Each event is classified into one of five categories according to its experimental signature, in the following order: (1) Events with four reconstructed photons, where all photons satisfy the Loose identification and at least one satisfies the Tight identification, fall into the four-single (4S) category. (2) Events with three reconstructed photons, where all photons satisfy the Tight identification, fall into the three-single (3S) category. (3) Events with two merged photon candidates fall into the two-merged (2M) category. Additional Loose photon candidates are likely to originate from background processes and are ignored, with negligible impact on the rate of falsely identified signal events. (4) Events with two photon candidates, where one satisfies the merged classification and the other satisfies the Loose identification, fall into the one-merged-one-single (1M1S) category. This category accepts events where the merged classification of two photons is not efficient. (5) Events with exactly two photons that satisfy the Tight identification but without any further photon candidates that satisfy the Loose identification fall into the two-single category (2S), which is dominated by events from the \(H\rightarrow \gamma \gamma \) process.
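The ordered classification can be summarised in a short sketch. It rests on several assumptions that the text does not fix: photon candidates are represented by boolean flags, Tight candidates also satisfy Loose, "four", "three" and "two" reconstructed photons are read as exact counts, and the helper names are hypothetical.

```python
def photon(loose=False, tight=False, merged=False):
    """Hypothetical photon record; Tight is assumed to imply Loose."""
    return {"loose": loose or tight, "tight": tight, "merged": merged}

def classify_event(photons):
    """Return '4S', '3S', '2M', '1M1S', '2S' or None, testing in that order."""
    n = len(photons)
    n_loose = sum(p["loose"] for p in photons)
    n_tight = sum(p["tight"] for p in photons)
    n_merged = sum(p["merged"] for p in photons)

    if n == 4 and n_loose == 4 and n_tight >= 1:
        return "4S"
    if n == 3 and n_tight == 3:
        return "3S"
    if n_merged == 2:
        # Additional Loose candidates are ignored in this category.
        return "2M"
    if n == 2 and n_merged == 1 and any(
        p["loose"] and not p["merged"] for p in photons
    ):
        return "1M1S"
    if n_tight == 2 and n_loose == 2:
        # Exactly two Tight photons and no further Loose candidates.
        return "2S"
    return None
```

Because the tests are applied in sequence, an event that satisfies an earlier category is never considered for a later one.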

Only the most sensitive categories are used for the ALP search. For long-lived ALPs with masses \(m_a\ge 3.5\,\text {GeV} \) the 3S and 4S categories provide the most sensitivity. Long-lived ALPs with \(m_a<3.5\,\text {GeV} \) yield collimated photon signatures where the 2M and 1M1S categories provide the largest sensitivity.

The 3S and 4S categories allow for the reconstruction of the ALP mass, \(m_a\), since at least one photon pair stems from the decay of an ALP for the signal process. Separate neural networks were trained for the three- and four-photon categories to select the correct photon pairing(s). The inputs to the networks are the invariant masses of all photon pair combinations and differences in their transverse energies and directions. The training sample consists of both correct and wrong photon combinations. The combinations are based on MC signal samples for all ALP masses and couplings. In the 3S category, the invariant mass of the photon pair that is predicted to stem from the same mother particle is defined as the reconstructed ALP mass \(m^{\text {reco}}_a\), while the average of both resulting invariant masses is defined as \(m^{\text {reco}}_a\) in the 4S category.
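The definition of \(m^{\text {reco}}_a\) can be illustrated with a short sketch. It assumes massless photons described by transverse energy, pseudorapidity and azimuthal angle, and it takes the photon pairing as an input; in the analysis the pairing is chosen by the neural networks, which are not reproduced here. All names are hypothetical.

```python
import math

def diphoton_mass(g1, g2):
    """Invariant mass (GeV) of two massless photons.

    m^2 = 2 * ET1 * ET2 * (cosh(delta_eta) - cos(delta_phi))
    """
    deta = g1["eta"] - g2["eta"]
    dphi = g1["phi"] - g2["phi"]
    m2 = 2.0 * g1["et"] * g2["et"] * (math.cosh(deta) - math.cos(dphi))
    return math.sqrt(max(m2, 0.0))

def m_a_reco_3s(alp_pair):
    """3S category: mass of the pair assigned to the same mother particle."""
    return diphoton_mass(*alp_pair)

def m_a_reco_4s(pair_1, pair_2):
    """4S category: average of the two pair invariant masses."""
    return 0.5 * (diphoton_mass(*pair_1) + diphoton_mass(*pair_2))
```

The same pairwise masses, evaluated for every photon combination, are among the inputs that the text lists for the pairing networks.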

The signal region selection uses the invariant mass of all photon candidates, denoted by \(m^{\text {reco}}_{\text {inv}}\), which is expected to peak around the Higgs boson mass for signal processes. The signal regions for all categories and searches are defined to contain at least 90% of the reconstructed signal contribution in simulated data. For the two-photon categories (2M, 1M1S and 2S), the signal region is defined by the invariant mass requirement \(115\,\text {GeV}< m^{\text {reco}}_{\text {inv}} < 130\,\text {GeV} \) around the Higgs boson mass.

The 3S and 4S categories also require the reconstructed ALP mass \(m^{\text {reco}}_a\) to fall within a window around the generated ALP mass, whose size varies depending on the mass point being tested. In general, the requirements on \(m^{\text {reco}}_{\text {inv}}\) are looser in the 3S category since one photon typically escapes detection, leading to a wider \(m^{\text {reco}}_{\text {inv}}\) spectrum. The \(m^{\text {reco}}_a\) distribution also enables the definition of a control region in data by inverting the mass requirement on \(m^{\text {reco}}_a\).

In the search for promptly decaying ALPs only the 4S category is used when estimating the limits on the signal process, as it has by far the largest sensitivity. In this case the category is defined by tightening the selection cuts so that at least three out of the four photons satisfy the Tight identification requirements and is labelled \(4S_p\). Promptly decaying ALPs are only considered for \(m_a > 5\) \(\text {GeV}\), where a significant increase in sensitivity over the search for long-lived ALPs is observed. Table 1 summarizes the signal region definitions for all ALP mass hypotheses \(m_a\).

Table 1 Definition of the signal region for different event categories in the prompt and long-lived search

5 Background estimation

The main background contributions stem from multijet processes, where jets might be misidentified as photons, as well as Standard Model multiphoton processes. However, as the simulation does not capture the full background shape observed in data, a fully data-driven background estimate is used.

The signal is expected to peak in \(m^{\text {reco}}_{\text {inv}}\), the invariant mass of the selected photons, near the Higgs boson mass, \(m_H=125\,\text {GeV} \), in all categories and searches. The \(m^{\text {reco}}_{\text {inv}}\) sidebands are used for the background estimate in the signal region and for the estimation of spurious signal effects.

5.1 Two-photon final states in the search for long-lived axion-like particles

The distribution of \(m^{\text {reco}}_{\text {inv}}\) in all two-photon categories (2M, 1M1S, 2S) is fitted over a mass range from 100 to 150 \(\text {GeV}\), excluding the signal region. A suitable fitting function should describe the data in the sidebands, provide an unbiased estimate of the background in the signal region, and produce small uncertainties on the yields of spurious signals. This is ensured by defining validation regions that use similar requirements to those of the signal region but reject signal events. Simulated samples of two-photon continuum processes and a data-driven validation region are studied. The latter is defined using the nominal signal selection and classification, but inverting the isolation cut on the photon candidates, thus yielding a multijet-enhanced data sample. These validation regions are chosen to cover any scenario between a background composed purely of di-photon events and a background consisting of mis-identified multijet events.

A Landau function gives unbiased background estimates in the signal region of both validation samples and provides \(\chi ^2\) per number of degrees of freedom around unity (\(\chi ^2/{\text {ndf}} \approx 1\)) in all sideband regions for the 2S and 1M1S categories. It is expected that the background shape in the 2M category differs from the 2S and 1M1S categories, due to the different background composition after the requirements on the NN-based classifiers. A second-order polynomial provides a good description of all validation regions of the 2M category. Variations of the background estimates using different fitting functions and different fitting ranges are used to define systematic uncertainties and are discussed in Sect. 6.

The \(H\rightarrow \gamma \gamma \) process does not contribute in the sidebands but is an irreducible background in the signal region, significant in the 2S category and negligible in the 2M and 1M1S categories. Its contribution and shape are estimated from MC predictions.

Figure 2 shows the \(m^{\text {reco}}_{\text {inv}}\) spectra, including the sideband fit, for the signal selection in the 1M1S and 2M categories. The expected signal shape for an ALP with \(m_a = 0.5\,\text {GeV} \) and \(C_{a\gamma \gamma } = 0.01\) is also shown for illustration.

Fig. 2 \(m^{\text {reco}}_{\text {inv}}\) distribution for the nominal signal selection for the (a) 1M1S and (b) 2M category. The nominal sideband fitting function is shown as the blue dashed line. The background, estimated from a fit in the sideband regions, and its systematic uncertainty are shown as a blue histogram for both cases. The green dotted line shows the alternative fitting function used to estimate the spurious signal uncertainty (discussed in Sect. 6). The expected signal shape for \(m_a=0.5\,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\) is also shown with an arbitrary normalization. The signal region selection on \(m^{\text {reco}}_{\text {inv}}\) is indicated using vertical dashed lines. The contribution from \(H\rightarrow \gamma \gamma \) is negligible and not visible in the figures. The lower panels show the data divided by the estimated continuum background, where the shaded area indicates the uncertainty on the background estimation

5.2 Three- and four-photon final states in the search for long-lived axion-like particles

The background estimation in the long-lived ALPs searches also employs a sideband fit using the \(m^{\text {reco}}_{\text {inv}}\) spectrum in the 3S and 4S categories.

Polynomials of third and second order serve as the nominal background fitting functions for the 3S and 4S categories, respectively, where the fits are carried out in the range of 80–150 \(\text {GeV}\) and 105–145 \(\text {GeV}\), excluding the signal region. First, the suitability of the sideband functions for background estimation in both categories is tested on three- and four-photon continuum MC samples. Next, the sideband functions and corresponding background estimates are validated using an orthogonal set of data events in which the requirement on the reconstructed ALP mass is inverted.
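The sideband method can be sketched with a small, dependency-free example. It is a simplified illustration rather than the fit used in the analysis: it performs an unweighted least-squares polynomial fit to binned sideband counts and sums the fitted function over the excluded window. The function names, the binning, and the centring constants `x0` and `scale` (used only for numerical stability) are assumptions.

```python
def fit_polynomial(xs, ys, order):
    """Unweighted least-squares fit; returns c with y = sum_k c[k] * x**k."""
    n = order + 1
    # Normal equations A c = b.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def sideband_background(centres, counts, sr_lo, sr_hi, order=2,
                        x0=125.0, scale=25.0):
    """Fit bins outside (sr_lo, sr_hi) and predict the yield inside it."""
    def to_u(x):
        return (x - x0) / scale  # centred, scaled mass variable

    side = [(to_u(x), c) for x, c in zip(centres, counts)
            if not sr_lo < x < sr_hi]
    coeffs = fit_polynomial([u for u, _ in side], [c for _, c in side], order)

    def predict(x):
        return sum(c * to_u(x) ** k for k, c in enumerate(coeffs))

    return sum(predict(x) for x in centres if sr_lo < x < sr_hi)
```

Bins inside the excluded window never enter the fit, so any excess there does not bias the predicted background; changing `order` or the fitted range gives the kind of variation that the text uses to define systematic uncertainties.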

This inverted sample can be used as a validation region, since the shape of background events should not change with a different choice of ALP mass apart from minor kinematic changes in the \(m^{\text {reco}}_{\text {inv}}\) distribution. The multi-photon MC samples are used to correct for this kinematic effect.

The chosen fitting functions yield a \(\chi ^2/{\text {ndf}}\) close to unity in all validation regions and the estimated background using these validation regions is consistent with the observed numbers of background events from the signal region sidebands. Systematic uncertainties due to the choice of the background function and the fitting range are discussed in Sect. 6. Figure 3 depicts the \(m^{\text {reco}}_{\text {inv}}\) spectrum for various ALP mass searches, and shows the sideband fitting functions, the estimated background in the signal region, and the expected signal shape for two \(C_{a\gamma \gamma }\) coupling parameters.

Fig. 3 \(m^{\text {reco}}_{\text {inv}}\) distribution for the nominal signal selection for the 4S category. The nominal sideband fitting function is shown as the blue dashed line. The background, estimated from a fit in the sideband regions, and its systematic variation (obtained from a fit with reduced range) are shown as the blue histogram. The green dotted line shows the alternative fitting function which is used to estimate the spurious signal uncertainty (discussed in Sect. 6). The four subfigures show different ALP mass ranges: (a) \(3.5\,\text {GeV}<m_a<10\,\text {GeV} \), (b) \(10\,\text {GeV}<m_a<25\,\text {GeV} \), (c) \(25\,\text {GeV}<m_a<40\,\text {GeV} \), (d) \(40\,\text {GeV}<m_a<62\,\text {GeV} \). The signal region selection on \(m_{a}^{\text {reco}}\) is applied while the signal region selection on \(m^{\text {reco}}_{\text {inv}}\) is indicated as dashed lines. The expected signal shapes for \(m_a=5\,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\); \(m_a=5\,\text {GeV} \), \(C_{a\gamma \gamma }=5\times 10^{-4}\); \(m_a=15\,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\); \(m_a=15\,\text {GeV} \), \(C_{a\gamma \gamma }=5\times 10^{-4}\); \(m_a=35\,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\); \(m_a=35\,\text {GeV} \), \(C_{a\gamma \gamma }=10^{-5}\); \(m_a=50\,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\); and \(m_a=50\,\text {GeV} \), \(C_{a\gamma \gamma }=10^{-5}\) are shown with arbitrary normalization. The lower panels show the data divided by the estimated continuum background, where the shaded area indicates the uncertainty on the background estimation

5.3 Four-photon final states in the search for promptly decaying axion-like particles

The number of selected events in the \(4S_p\) category of the search for promptly decaying ALPs, also defined in the \(m^{\text {reco}}_{\text {inv}}\) vs. \(m^{\text {reco}}_{a}\) plane, is significantly lower than that in the analysis optimised for long-lived ALPs due to the stricter rejection of fake-photon signatures using more stringent selection criteria such as Tight photon identification. To further suppress background contributions in the signal region, a tight selection around the \(m_a\) model parameter, as discussed in Sect. 4, is imposed. The sizes of the signal regions for each ALP mass are shown in Table 1. The background in the signal region of the search for promptly decaying ALPs is estimated by counting the events around the signal region in the \(m^{\text {reco}}_{\text {inv}}\)-\(m^{\text {reco}}_{a}\) plane, extending the signal region by \(\pm \,5\,\text {GeV} \) in the \(m^{\text {reco}}_{\text {inv}}\) dimension and by 1.5 times the signal region width in the \(m^{\text {reco}}_{a}\) dimension. Due to the low statistics, the total number of background events in the shaded sideband area is scaled by the ratio of the areas of the signal region to the sideband region to estimate the number of background events in the signal region, as illustrated in Fig. 4. This is equivalent to assuming a flat background distribution in the plane.

As an alternative background estimate, the size of the control region is taken to be 2.5 times the signal region instead of the 1.5 times used in the nominal background estimate. The difference between the nominal and the alternative background estimates is used as a systematic uncertainty. Figure 4 shows distributions in the \(m^{\text {reco}}_{\text {inv}}\)-\(m^{\text {reco}}_{a}\) plane for events selected in the search for promptly decaying ALPs, with the sideband regions shown for the search parameters \(m_a=10\) GeV and \(m_a=40\) GeV. The validity of this procedure was tested using simulated multi-photon samples confirming that the associated statistical and systematic uncertainties cover potential shape differences.
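The area-scaling estimate can be written as a few lines of arithmetic. The geometry below is one possible reading of the description and is an assumption: the outer rectangle is taken to extend the signal region by `pad_inv` on each side in \(m^{\text {reco}}_{\text {inv}}\) and to have a total width of `a_factor` times the signal region width in \(m^{\text {reco}}_{a}\), with the sideband defined as the outer rectangle minus the signal region. The function and parameter names are hypothetical.

```python
def prompt_background_estimate(n_sideband, sr_width_inv, sr_width_a,
                               pad_inv=5.0, a_factor=1.5):
    """Scale the sideband count by the signal-to-sideband area ratio.

    Equivalent to assuming a flat background density in the
    (m_inv, m_a) plane. Widths are in GeV.
    """
    area_sr = sr_width_inv * sr_width_a
    area_outer = (sr_width_inv + 2.0 * pad_inv) * (a_factor * sr_width_a)
    area_sideband = area_outer - area_sr
    return n_sideband * area_sr / area_sideband
```

Under this reading, evaluating the function with `a_factor=2.5` and the corresponding sideband count gives the alternative estimate, and the difference from the nominal value is the quantity taken as the systematic uncertainty.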

Fig. 4 \(m^{\text {reco}}_{\text {inv}}\) vs. \(m^{\text {reco}}_{a}\) for the \(4S_p\) category in the search for promptly decaying ALPs, (a) for a simulated \(pp \rightarrow 4\gamma \) sample and (b) for data. The signal (sideband) regions are indicated by solid lines (shaded areas) for the searches for ALPs with masses of 10 GeV and 40 GeV. Events within the signal region are shown with filled markers, those outside with open markers

5.4 Background estimate summary

Table 2 summarizes the data and the expected background contribution in the signal region for the different categories. The acceptance times selection efficiency for a signal event in any of the categories depends strongly on the ALP mass and coupling parameters under investigation. It ranges from 9 to \(23\%\) for low ALP masses (\(m_a<5\) GeV) and large couplings (\(C_{a\gamma \gamma }=1\)) and is around 13% for large ALP masses (\(m_a \approx 60\) GeV) at all studied couplings. The largest fraction of events is already rejected by the trigger because of the relatively high trigger thresholds. A future reduction of the trigger thresholds could significantly improve the analysis acceptance.

Table 2 Overview of the number of observed events in the search for long-lived ALPs (left) and selected mass points from the search for prompt ALPs (right) in comparison to the expected number of background events. The uncertainty on the background estimate includes statistical and systematic uncertainties as described in Sect. 6

6 Systematic uncertainties

The systematic uncertainties are assessed below, and their impact on the results is discussed in Sect. 7. First, the general experimental uncertainties are discussed, with special attention given to the uncertainties arising from the displaced decay of long-lived ALPs. Then the uncertainties impacting the background estimation are detailed, followed by a discussion of the relevant theoretical uncertainties.

The experimental systematic uncertainty ranges, depending on the category and the hypothesized ALP mass and coupling, from 6.5% to 18% for most categories, with the exception of the 4S category where the uncertainty rises to 40% for masses \(m_a < 15\,\text {GeV} \) and small couplings \(C_{a\gamma \gamma }\) due to low statistics and large contributions from the photon identification uncertainties. The theoretical uncertainty is around 6% for all categories.

6.1 General experimental uncertainties

The following general experimental systematic uncertainties are applied to the signal model.

The uncertainty in the combined 2015–2018 integrated luminosity is 0.83% [35], obtained using the LUCID-2 detector [61] for the primary luminosity measurements, complemented by measurements using the inner detector and calorimeters.

To evaluate any impact on the expected signal yield due to imperfect modelling of pile-up, the average number of pile-up interactions is varied in the simulation. The corresponding uncertainty is below 1%.

The trigger efficiency used to select events is evaluated in simulation and data using a bootstrap method and radiative Z-boson decays [60]. The difference between data and simulation, which ranges from 2 to 3%, is taken as a systematic uncertainty.

The systematic uncertainties from the standard photon identification and isolation efficiencies are estimated following the prescriptions in Ref. [58]. They affect the di-photon selection efficiency and are evaluated by varying the correction factors for photon selection efficiencies in simulation by their corresponding uncertainties. The experimental uncertainties in the photon energy scale and resolution are obtained as described in Ref. [58]. These variations produce uncertainties below 3% on the expected number of events in the signal region in the search for promptly decaying ALPs.

The uncertainties of the NN-based classifiers are estimated by comparing their identification performance using \(Z\rightarrow ee\) events in simulation and data, where the electron shower-shape variables are used as the network input variables. Very good agreement of the network output between data and simulation is observed. The residual differences are fully propagated as uncertainties on the expected signal yields and produce normalisation uncertainties of up to 15% in the 2M and 1M1S categories.

6.1.1 Uncertainties due to displaced ALP decays

The uncertainties related to photon identification and energy reconstruction for photons produced with displaced vertices – i.e., those arising from long-lived ALP decays – are estimated by studying the decays of long-lived hadrons, mainly kaons with transverse momentum \(p_{\text {T}} > 10\,\text {GeV} \), which can be reconstructed as displaced tracks in the ATLAS tracking system. The daughter tracks from these decays, which originate from a displaced vertex, can be matched to reconstructed clusters in the electromagnetic calorimeter. A comparison of the shower shapes predicted by simulations to those from data for signatures with displaced vertices can then be used to estimate systematic uncertainties in photon reconstruction. The MC prediction of the shower shapes of hadronic particles also relies on the correct description of particle multiplicities and energies. To correct for any mismatch of particle composition in data and simulation, scale factors are derived from the differences between data and MC predictions for shower-shape variables of tracks close to the primary vertex (\(z_0<20\) mm and \(d_0<1\) mm, where \(z_0\) and \(d_0\) are the longitudinal distance from the IP and the impact parameter, respectively). These deviations are taken as a nominal bias and are used to correct tracks originating from a distance between \(20\,\text {mm}<z_0<500\,\text {mm}\) and \(1\,\text {mm}<d_0<80\,\text {mm}\) from the primary vertex (medium regime), and tracks with an origin further than \(z_0 > 500\) mm and \(d_0 > 80\) mm from the interaction point (far regime). These systematic uncertainties are used additionally for all photons stemming from a displaced decay. The systematic uncertainty from the modelling of the NN classifier that discriminates real photons from fakes due to displaced photon vertices is \(3\%\). 
The corresponding uncertainties from the modelling of the photon identification and the second NN, which discriminates between merged and resolved photons, range from 4 to \(23\%\), depending on the displacement.

Long-lived hadrons are also utilized to estimate systematic uncertainties in energy reconstruction. The observed differences in the momentum over energy ratio between the prompt, medium and far regimes are found to be negligible compared to the nominal energy reconstruction uncertainty.

Systematic uncertainties in identifying the correct ALP pairing are mainly caused by variations in the photon energy scale corrections within their uncertainties. The final impact on the number of reconstructed events in a particular ALP mass signal region is less than 5% and hence negligible.

6.2 Uncertainties on the background estimation

The continuum background processes are estimated from data and are subject to uncertainties related to the potential bias arising from the selected background model, as detailed in Sect. 5. The nominal background estimate in each bin is calculated using the nominal fitting functions, which are fitted in the sideband regions. The shape uncertainty of the background is estimated by employing the same nominal fitting functions, but fitting them in a sideband region whose width is varied by 5 \(\text {GeV}\), corresponding to a 25% to 100% change in the fit range, depending on the category, which allows for a large variability of the background shape. The spurious-signal bias, or background-model bias, is assessed as an additional uncertainty on the total number of signal events in each category. This bias is estimated by generating pseudo-data using a modified background model and performing the full signal-plus-background fit (see Sect. 7) on these pseudo-data [62]. The alternative background model for the spurious-signal bias estimate is based on a second-order polynomial for the 2M, 1M1S, 3S, and 4S categories, while the function \(f_{\text {sys}} = N_0 \exp (p\cdot x) + N_1 + a\cdot x^2 + b\cdot x\) is used for the 2S category. The signal strength estimated in these pseudo-data is considered as an additional systematic uncertainty in the final signal-strength estimation. The largest impact is found in the 4S category, where it shifts the branching ratio limit by \(10^{-6}\).

6.3 Impact of theory uncertainties

To estimate the effects of scale uncertainties arising from missing higher-order corrections in the theoretical calculations, the factorisation and renormalisation scales are varied up and down by a factor of two from their nominal values. The cross-section is then recalculated for each case, and the largest deviation from the nominal cross-section is taken as the uncertainty. The uncertainties on the SM Higgs ggF production cross-section due to the choice of renormalisation scheme and top-quark mass, as well as their combination with those from factorisation and renormalisation scale variations, are based on Ref. [63]. The uncertainties in the cross-sections, which include the effects of uncertainties on the PDF and the strong coupling constant \(\alpha _s\), and the uncertainties in the \(H \rightarrow \gamma \gamma \) branching fractions, are taken from Ref. [46] to be 5.7%. It is found that further uncertainties on the Higgs boson signal prediction are negligible.
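The envelope prescription for the scale variations amounts to a one-line calculation, sketched below in Python. The function name and the cross-section values are hypothetical and serve only to illustrate "largest deviation from the nominal"; they are not taken from the analysis.

```python
def scale_envelope(nominal_xsec, varied_xsecs):
    """Relative scale uncertainty: largest absolute deviation of any varied
    cross-section from the nominal one, divided by the nominal value."""
    largest_deviation = max(abs(x - nominal_xsec) for x in varied_xsecs)
    return largest_deviation / nominal_xsec


# Hypothetical cross-sections (in pb) for the nominal scales and for scales
# varied up and down by a factor of two.
relative_uncertainty = scale_envelope(48.0, [46.5, 50.1])
```

Taking the maximum rather than, for example, the mean deviation yields a symmetric and conservative uncertainty.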

The uncertainties on the Higgs boson production cross-section enter when calculating the limit on the branching ratio of the signal process relative to all Higgs boson decays. The values for the uncertainties are taken from Ref. [46].

Fig. 5 The number of data and estimated background events in the signal region of the most sensitive categories. The uncertainty in the background estimate is shown as a shaded band. The left side shows the different categories of the long-lived ALP search, while the right side displays the \(4S_p\) category of the prompt search for increasing mass hypotheses. The numbers in parentheses in the x-axis labels correspond to the probed ALP mass hypothesis in \(\text {GeV}\). The SM \(H\rightarrow \gamma \gamma \) background is only sizeable in the first three bins, corresponding to the two-photon categories

7 Results

The statistical analysis is carried out using the PyHF framework [64, 65]. In the long-lived ALP searches, the analysis results are obtained by performing a simultaneous maximum-likelihood fit to the \(m^{\text {reco}}_{\text {inv}}\) distribution over the range \(100~\text {GeV} \) to \(150~\text {GeV} \) for the two most sensitive categories for each ALP mass and coupling parameter. The 2M and 1M1S categories are most sensitive for low ALP masses (\(m_a<5\) \(\text {GeV}\)), while for larger ALP masses, the 4S category dominates over the 3S category. Including more than two categories in the fit does not significantly improve the sensitivity for any model parameter. In the prompt ALP search, only the number of events in the signal region of the \(4S_p\) category is used. The analysis sensitivity is limited by the available data statistics for the 4S and \(4S_p\) categories, while the systematic uncertainties dominate in the other categories. The likelihood function is defined as follows:

$$\begin{aligned} {\mathcal {L}} = \prod _{c}\left( \textrm{Pois}(n_c|N_c(\pmb {\theta }))\cdot \prod _{i=1}^{n_{\text {bins}}^c} f_c(n_i,m^{\text {reco}, i}_{\text {inv}},\pmb {\theta }) \right) \cdot G(\pmb {\theta }). \end{aligned}$$
(1)

Here, \(n_{\text {bins}}^c\) is the number of bins in the \(m^{\text {reco}}_{\text {inv}}\) distribution, \(n_c\) is the observed number of events, and \(N_c\) is the expected number of events for each category c. For each bin i in the \(m^{\text {reco}}_{\text {inv}}\) distribution of category c, \(f_c\) is the value of the probability density function (pdf) which is estimated from simulation, \(n_i\) is the number of observed events in bin i, \(\pmb {\theta }\) represents the nuisance parameters (NP) used to parametrize the effect of systematic uncertainties, and \(G(\pmb {\theta })\) represents constraint pdfs for the nuisance parameters. All constraints correspond to Gaussian pdfs. The expected number of events \(N_c\) is defined as the sum of the expected yields from \(H\rightarrow aa \rightarrow 4\gamma \) production processes (\(N_{H\rightarrow aa}\)), single Higgs-boson production (\(N^{\mathrm {H\rightarrow \gamma \gamma }}_{\textrm{bkg}}\)), the non-resonant background (\(N^{\textrm{nonres}}_{\textrm{bkg,c}}\)), and the spurious signal uncertainty (\(N_{\textrm{SS,c}}\)). It is defined as:

$$\begin{aligned} N_c(\pmb {\theta }) &= \mu \cdot N_{H\rightarrow aa}(\pmb {\theta }_{H\rightarrow aa}^\textrm{yield}) + N^{\mathrm {H\rightarrow \gamma \gamma }}_{\textrm{bkg}}(\pmb {\theta }_{\mathrm {H\rightarrow \gamma \gamma }}^\textrm{yield}) \\ &\quad + N^{\textrm{nonres}}_{\textrm{bkg,c}}(\pmb {\theta }_\textrm{nonres}^\textrm{yield}) + N_{\textrm{SS,c}}. \end{aligned}$$
(2)
Fig. 6 Upper limits on \({\mathcal {B}}(H\rightarrow aa\rightarrow 4\gamma )\) at 95% CL as a function of the axion mass and for different ALP-photon couplings, from (a) \(C_{a\gamma \gamma }=1\) to (d) \(C_{a\gamma \gamma }=10^{-5}\)

Fig. 7 Zoomed-in version of Fig. 6 showing upper limits on \({\mathcal {B}}(H\rightarrow aa\rightarrow 4\gamma )\) at 95% CL as a function of the signal mass hypothesis and for different ALP-photon couplings. (a) \(m_a<5.0\,\text {GeV} \), \(C_{a\gamma \gamma }=1\); (b) \(m_a<5.0\,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\); (c) \(m_a > 5.0 \,\text {GeV} \), \(C_{a\gamma \gamma }=1\); (d) \(m_a > 5.0 \,\text {GeV} \), \(C_{a\gamma \gamma }=0.01\)

Here, \(\mu \) is the signal strength, and \( \pmb {\theta }^\textrm{yield}\) represents the NPs affecting the event yield, as described in Sect. 6. Correlation of the nuisance parameters across different signal and background components, and categories, is taken into account. The normalisation parameter for the \(H\rightarrow \gamma \gamma \) production rate, \(\pmb {\theta }_{\mathrm {H\rightarrow \gamma \gamma }}^\textrm{yield}\), is set to 1 and corresponds to the SM prediction for the \(H\rightarrow \gamma \gamma \) cross-section. It is allowed to vary within its theoretical and experimental uncertainties.
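To make the roles of \(\mu \) and the nuisance parameters in Eqs. 1 and 2 more concrete, the following Python sketch evaluates the yield term and the Gaussian constraints for a single category. It is a deliberate simplification and not the PyHF model used in the analysis: the shape term \(f_c\) is omitted, each nuisance parameter is assumed to scale one yield linearly by a relative uncertainty, and all names and numbers are hypothetical.

```python
import math


def expected_yield(mu, yields, rel_unc, theta):
    """Eq. 2 for one category, with each nuisance parameter (assumption)
    shifting its yield by rel_unc * theta, where theta = 0 is nominal."""
    signal = mu * yields["sig"] * (1.0 + rel_unc["sig"] * theta["sig"])
    higgs = yields["hgg"] * (1.0 + rel_unc["hgg"] * theta["hgg"])
    nonres = yields["nonres"] * (1.0 + rel_unc["nonres"] * theta["nonres"])
    return signal + higgs + nonres + yields["ss"]


def negative_log_likelihood(mu, theta, n_obs, yields, rel_unc):
    """-log of the Poisson yield term times unit-Gaussian constraints
    (constant terms of the Gaussians are dropped)."""
    n_exp = expected_yield(mu, yields, rel_unc, theta)
    poisson = n_exp - n_obs * math.log(n_exp) + math.lgamma(n_obs + 1.0)
    constraint = 0.5 * sum(value * value for value in theta.values())
    return poisson + constraint


# Hypothetical inputs for a single category.
yields = {"sig": 5.0, "hgg": 1.0, "nonres": 3.0, "ss": 0.0}
rel_unc = {"sig": 0.10, "hgg": 0.06, "nonres": 0.20}
theta_nominal = {"sig": 0.0, "hgg": 0.0, "nonres": 0.0}

nll_signal = negative_log_likelihood(1.0, theta_nominal, 9, yields, rel_unc)
nll_background = negative_log_likelihood(0.0, theta_nominal, 9, yields, rel_unc)
```

In a full fit the nuisance parameters would be profiled, i.e. minimised over for each value of \(\mu \), rather than fixed at their nominal values as in this example.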

The signal-plus-background hypothesis for the production of a Higgs boson that decays into ALPs is tested using the profile-likelihood-ratio test statistic derived from Eq. 1, and is parameterized with the signal-strength parameter \(\mu \). This parameter is defined as the ratio of the extracted signal events to the total number of signal events in the MC simulation (see footnote 3).

Figure 5 shows the distribution of the number of estimated and observed events in the signal region of the most sensitive category for various ALP masses and coupling parameters for the prompt and long-lived ALP searches. Good agreement is observed between the estimated backgrounds and the data. No significant pulls of the NPs are observed after the fits. The NP \(\pmb {\theta }_{\mathrm {H\rightarrow \gamma \gamma }}^\textrm{yield}\) is found to be consistent with 1, corresponding to the expected SM Higgs boson production cross-section in the 2S category.

Upper limits, derived using the CL\(_{\text {s}}\) technique [66], are set on \({\mathcal {B}}(H \rightarrow aa \rightarrow 4\gamma )\). The branching ratio is obtained by dividing the fitted signal cross-section by the total Higgs-boson production cross-section of \(55.6\, \text {pb}\) [46]. The expected and observed limits as a function of \(m_a\) from the search optimized for long-lived ALPs, i.e., \(C_{a\gamma \gamma }<0.1\), are shown in Fig. 6 along with the limits for prompt decays with \(C_{a\gamma \gamma }=1\). For low masses and low couplings, the ALP lifetime becomes significantly larger and most of the ALPs decay outside the active detector area. Therefore, no limits are available for \(C_{a\gamma \gamma } \le 5\times 10^{-4}\) and \(m_a<10\) \(\text {GeV}\). A relatively uniform sensitivity is achieved for larger ALP masses, above \(10~\text {GeV} \) for \(C_{a\gamma \gamma } \ge 5\times 10^{-4}\) and above \(25\,\text {GeV} \) for \(C_{a\gamma \gamma } = 10^{-5}\). The sensitivity for low ALP masses decreases with smaller coupling values, as more ALP decays happen outside the sensitive detector volume. The largest difference between expected and observed limits, at 1.5 \(\sigma \), is found in the long-lived ALP search for masses between 10 and 25 \(\text {GeV}\). It should be noted that the background estimation is the same for all couplings in this mass region. Hence, correlated behaviour is expected for all relevant couplings. The limits for the couplings \(C_{a\gamma \gamma }=1\) and \(C_{a\gamma \gamma }=0.01\) as a function of \(m_a\) are shown in Fig. 7, separated for low ALP masses (\(m_a <5\,\text {GeV} \)) and higher ALP masses. The upper limits at 95% CL on \({\mathcal {B}}(H\rightarrow aa\rightarrow 4\gamma )\) range from \(10^{-4}\) to \(3 \times 10^{-2}\) for low ALP masses and from \(2 \times 10^{-5}\) to \(2 \times 10^{-4}\) for higher ALP masses. The observed limits are compatible with the expected limits. The loss in sensitivity around 3 \(\text {GeV}\) is due to the transition between the merged and resolved photon categories, where the former have significantly larger background contributions.
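The logic of a CL\(_{\text {s}}\) upper limit can be illustrated with a single-bin counting experiment without systematic uncertainties, sketched below in Python. This is a textbook simplification that uses the observed event count as the test statistic; it is not the profile-likelihood-ratio procedure of the analysis. The function names are hypothetical, and the conversion to a branching ratio assumes a known integrated luminosity and acceptance times efficiency in addition to the \(55.6\, \text {pb}\) cross-section quoted above.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def cls_value(signal, background, n_obs):
    """CLs = CL_{s+b} / CL_b for a counting experiment."""
    return poisson_cdf(n_obs, signal + background) / poisson_cdf(n_obs, background)


def signal_upper_limit(background, n_obs, alpha=0.05, s_max=100.0):
    """Signal yield at which CLs falls to alpha (95% CL for alpha = 0.05).

    CLs decreases monotonically with the signal yield, so bisection works.
    """
    low, high = 0.0, s_max
    for _ in range(100):
        mid = 0.5 * (low + high)
        if cls_value(mid, background, n_obs) > alpha:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def branching_ratio_limit(s_up, lumi_fb, acc_times_eff, sigma_h_pb=55.6):
    """Convert a limit on signal events into a limit on the branching ratio.

    lumi_fb and acc_times_eff are inputs the caller must supply (assumption).
    """
    n_higgs = lumi_fb * sigma_h_pb * 1000.0  # 1 pb = 1000 fb
    return s_up / (n_higgs * acc_times_eff)


# With zero background and zero observed events, CLs = exp(-s), so the 95% CL
# limit on the signal yield is -ln(0.05), about 3 events.
s_up = signal_upper_limit(background=0.0, n_obs=0)
```

Dividing by CL\(_b\) protects against excluding signals to which the experiment has little sensitivity when the data fluctuate below the background expectation.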

The limits on \({\mathcal {B}}(H \rightarrow aa \rightarrow 4\gamma )\) derived from the \(4S_p\) category assuming promptly decaying ALPs are shown in Fig. 8. The derived limit is mostly flat at \({\mathcal {B}} < 2\times 10^{-5}\) in the mass range \(10\,\text {GeV}< m_a < 62\,\text {GeV} \). For the long-lived searches it is necessary to loosen the selection criteria to allow for displaced ALP decays. Therefore the background contributions are significantly larger and consequently the searches are less sensitive than the prompt searches. The observed limits are consistent with the expected limits.

The limits for ALPs with masses \(m_a>15\) \(\text {GeV}\) are about one order of magnitude more stringent than those of previous ATLAS analyses [26] using \(8\,\text {TeV} \) data, and reach a sensitivity similar to or slightly better than that of previous analyses from CMS [28] using \(132\,\text {fb}^{-1} \) of \(\sqrt{s} = 13\,\text {TeV} \) data. These are the first limits on ALPs with masses below \(10\,\text {GeV} \) from the ATLAS experiment, and are up to 40% more stringent than previous results from CMS [27] using \(136\,\text {fb}^{-1} \) of \(\sqrt{s} = 13\,\text {TeV} \) data. The limits on long-lived ALPs in anomalous Higgs boson decays are the first obtained by any experiment.

Fig. 8 Upper limits on \({\mathcal {B}}(H\rightarrow aa\rightarrow 4\gamma )\) at 95% CL as a function of the signal mass hypothesis and for the assumption of promptly decaying ALPs

Fig. 9 Limits on the ALP mass and coupling to photons at 95% CL, assuming \({\mathcal {B}}(a\rightarrow \gamma \gamma ) = 1\), \(\Lambda = 1\,\text {TeV} \) with \(|C_{aH}^{\text {eff}}|\) = 1 (solid line) and \(|C_{aH}^{\text {eff}}|\) = 0.1 (dashed line) as predicted in Ref. [19]. The shaded blue area represents the excluded region. The nearly horizontal orange shaded area indicates the region favoured by an ALP explanation for the \((g-2)_{\mu }\) discrepancy [19]. Also shown are exclusion limits from the respective ATLAS [67] and CMS [68] Light-by-Light (LbyL) scattering analyses, and from beam dump experiments, supernova SN1987a and cosmological observations adapted from Ref. [69]

The limit on the branching ratio can be converted into a limit on the coupling of axion-like particles to photons, \(C_{a\gamma \gamma }\). The branching ratio, which depends on \(C_{a\gamma \gamma }\), can be calculated using the method described in Ref. [19]:

$$\begin{aligned} {\mathcal {B}}^{\text {theo}} = \Gamma _{Haa} f_{aa}^2(C_{a\gamma \gamma }) \frac{{\mathcal {B}}(a\rightarrow \gamma \gamma )}{\Gamma _{H} + \Gamma _{Haa}}, \end{aligned}$$
(3)

where it is assumed that all ALPs decay into photons and hence \({\mathcal {B}}(a\rightarrow \gamma \gamma ) = 1\). In the following \(m_a\) and \(m_H\) refer to the masses of the ALP and Higgs boson, respectively, v is the vacuum expectation value, and the effective coupling of the ALP to the Higgs boson is assumed to be \(C_{aH}^{\text {eff}}/\Lambda ^2 = 1\,\text {TeV} ^{-2}\). \(\Gamma _H\) is the total decay width of the Higgs boson. The Higgs boson to ALP decay width \(\Gamma _{Haa}\) is calculated as

$$\begin{aligned} \Gamma _{Haa} = \frac{v^2m_H^3}{32\pi }\frac{\left| C_{aH}^{\text {eff}}\right| ^2}{\Lambda ^4} \left( 1-\frac{2m_a^2}{m_H^2}\right) ^2 \sqrt{1-\frac{4m_a^2}{m_H^2}}. \end{aligned}$$
(4)

Using the Higgs mass of \(m_H = 125\,\text {GeV} \) and assuming \(m_a = 10\,\text {GeV} \) and \(C_{aH}^{\text {eff}}/\Lambda ^2 = 1 \, \text {TeV} ^{-2}\) yields a branching ratio of the \(H\rightarrow aa\) process of \(30\%\). The factor \(f_{aa}^2(C_{a\gamma \gamma })\) represents the fraction of ALPs detected inside the detector volume. It depends on the ALP decay length, and hence on \(C_{a\gamma \gamma }\), since the decay width is given by

$$\begin{aligned} \Gamma _{a\gamma \gamma } = \frac{4\pi \alpha ^2m_a^3}{\Lambda ^2}|C_{a\gamma \gamma }|^2. \end{aligned}$$
(5)

\(f_{aa}\) is determined from the signal simulation and interpolated between the simulated \(C_{a\gamma \gamma }\) values. The coupling is adjusted until the expected branching ratio matches the observed limit on the branching ratio, yielding the corresponding limit on \(C_{a\gamma \gamma }\).
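Equations 3 to 5 can be evaluated numerically with the short Python sketch below. The values used for the vacuum expectation value, the total Higgs boson width and the fine-structure constant are assumptions chosen for illustration and are not quoted in the text, so the result is not expected to reproduce the numbers above exactly. The factor \(f_{aa}^2\) must come from simulation and is therefore passed in as an argument; all function names are hypothetical.

```python
import math

# Assumed inputs (not taken from the text).
V_GEV = 246.22               # Higgs vacuum expectation value
M_H_GEV = 125.0              # Higgs boson mass, as used in the text
GAMMA_H_GEV = 4.1e-3         # approximate SM total Higgs boson width
ALPHA = 1.0 / 137.036        # fine-structure constant
HBAR_C_GEV_M = 1.97327e-16   # hbar * c in GeV * m


def gamma_haa(m_a, c_ah_eff=1.0, lam_gev=1000.0):
    """Eq. 4: H -> aa partial width in GeV (masses and Lambda in GeV)."""
    x = (m_a / M_H_GEV) ** 2
    prefactor = V_GEV ** 2 * M_H_GEV ** 3 / (32.0 * math.pi)
    phase_space = (1.0 - 2.0 * x) ** 2 * math.sqrt(1.0 - 4.0 * x)
    return prefactor * c_ah_eff ** 2 / lam_gev ** 4 * phase_space


def gamma_agg(m_a, c_agg, lam_gev=1000.0):
    """Eq. 5: a -> gamma gamma partial width in GeV."""
    return 4.0 * math.pi * ALPHA ** 2 * m_a ** 3 / lam_gev ** 2 * c_agg ** 2


def br_theo(m_a, f_aa_sq, c_ah_eff=1.0, br_agg=1.0):
    """Eq. 3: predicted branching ratio for a given detected fraction f_aa^2."""
    width = gamma_haa(m_a, c_ah_eff)
    return width * f_aa_sq * br_agg / (GAMMA_H_GEV + width)


def proper_decay_length_m(m_a, c_agg):
    """c * tau in metres, assuming a -> gamma gamma is the only decay mode."""
    return HBAR_C_GEV_M / gamma_agg(m_a, c_agg)


# Example: m_a = 10 GeV, all ALPs detected (f_aa^2 = 1).
prediction = br_theo(10.0, f_aa_sq=1.0)
prompt_length = proper_decay_length_m(10.0, c_agg=1.0)
displaced_length = proper_decay_length_m(10.0, c_agg=1e-5)
```

Since the width in Eq. 5 scales with \(|C_{a\gamma \gamma }|^2\), reducing the coupling by five orders of magnitude lengthens the proper decay length by ten orders of magnitude, which is why \(f_{aa}\) depends so strongly on the coupling.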

The resulting limits are shown in the two-dimensional exclusion plot of \(C_{a\gamma \gamma }\) vs. \(m_a\) presented in Fig. 9. This search significantly reduces the allowed parameter space for ALP-based models that could explain the \((g-2)_{\mu }\) discrepancy in the \(H\rightarrow aa \rightarrow 4\gamma \) decay mode, as suggested in Ref. [19].

8 Conclusion

This paper reports a search for a light pseudoscalar particle (a) produced in the decay \(H\rightarrow aa\), where H is the 125 \(\text {GeV}\) Higgs boson. The a boson, which can have a short or long lifetime, decays into two photons, resulting in a final state with four photons with an invariant mass near 125 \(\text {GeV}\). The analysis uses 140 fb\(^{-1}\) of pp collision data at a centre-of-mass energy of 13 \(\text {TeV}\) collected by the ATLAS detector between 2015 and 2018. The search aims to identify a narrow \(a\rightarrow \gamma \gamma \) resonance with a mass in the range of \(100~\text {MeV} \) to \(62~\text {GeV} \), where the resonance decay occurs within a distance of 1970 mm from the collision vertex. Dedicated search strategies for long-lived \(a\rightarrow \gamma \gamma \) decays are developed for the first time. To enable the search for low resonance masses, neural network classifiers are trained to distinguish between single and collimated photon signatures.

No significant excess over the Standard Model backgrounds is observed in the data. The largest deviation from the expected limit, \(1.5\sigma \), is observed in the range of \( 10\,\text {GeV}< m_a < 25\,\text {GeV} \). Upper limits at 95% CL are set for \({\mathcal {B}}(H \rightarrow aa \rightarrow 4\gamma )\), which range from \(2\times 10^{-5}\) to \(3\times 10^{-2}\) depending on \(m_a\) for the prompt axion-like particle search. For the search for long-lived ALPs with significant displaced decay vertices, upper limits at 95% CL are set, ranging from \(2\times 10^{-5}\) to \(6\times 10^{-5}\) for \(10~\text {GeV}<m_a<62~\text {GeV} \) and from \(10^{-4}\) to \(3\times 10^{-2}\) for \(0.1\,\text {GeV}<m_a<10\,\text {GeV} \). These are the most stringent limits to date.