1 Introduction

The Higgs boson, H, was first observed by the ATLAS [1] and CMS [2] collaborations in 2012 [3, 4] with a mass of approximately 125 \(\text {GeV}\). Detailed measurements of its properties [5, 6] have confirmed its role in the spontaneous breaking of electroweak symmetry and the mass generation of the massive vector bosons [7, 8]. In the Standard Model (SM), the mass generation for fermions is implemented through Yukawa interactions. The ATLAS and CMS collaborations have reported observations of Higgs boson decays into a pair of \(\tau \)-leptons [9, 10], a pair of bottom quarks [11, 12], and associated production of Higgs bosons with top-quark pairs [13, 14]. These measurements represent a complete observation of Higgs boson couplings to third-generation charged fermions, and are in agreement with SM expectations. Recently, evidence has been reported for the Higgs boson coupling to muons in the second generation through the decay \(H\rightarrow \mathrm {\mu ^+ \mu ^-}\) [15, 16]. Direct searches for \(H\rightarrow {c}{\bar{c}}\) have been performed by both the ATLAS and CMS collaborations [17,18,19,20], as have searches for \(H\rightarrow e^+ e^-\) decays [21, 22], but no further experimental evidence currently exists for the Higgs boson couplings to the first and second generations of fermions. Searches for potential beyond-the-SM (BSM) couplings of the Higgs boson have also been performed by the ATLAS and CMS collaborations, including searches for flavour-changing neutral currents via the t-quark decays \(t\rightarrow cH\) and \(t\rightarrow uH\) [23,24,25,26], and the lepton-flavour-violating decays \(H\rightarrow e\mu \), \(H\rightarrow e\tau \) and \(H\rightarrow \mu \tau \) [21, 27, 28]. No evidence for these couplings has been found.

A direct method for accessing the couplings of the first- and second-generation quarks is the study of inclusive \(H\rightarrow q\bar{q}\) decays. While these channels have large branching fractions, their experimental sensitivity is substantially obscured by large multi-jet backgrounds. Searches for the exclusive, radiative decays of the Higgs boson into a vector meson state and a photon offer an alternative way to probe the quark Yukawa couplings [29,30,31]. Although their branching fractions are comparatively small, these radiative decays have a distinct experimental signature, which helps suppress the large multi-jet backgrounds that affect the \(H\rightarrow q\bar{q}\) searches. Figure 1 shows Feynman diagrams depicting the \(H\rightarrow {\mathcal {Q}}\,\gamma \) process, where \({\mathcal {Q}}\) is a vector quarkonium state. There are two primary contributions to the decay amplitude: the direct amplitude \({{{\mathcal {A}}}}_\text {dir}\) occurs via the quark Yukawa coupling; the indirect amplitude \({{{\mathcal {A}}}}_\text {ind}\) occurs at the one-loop level in the SM through the \(H\rightarrow \gamma \gamma ^*\) decay, where the virtual photon subsequently fragments into a vector quarkonium state. The two processes interfere destructively, and despite being loop-induced, the indirect amplitude is typically the more dominant contribution in decays of the Higgs boson into a vector meson state and a photon.

Fig. 1
figure 1

Feynman diagrams depicting the a direct amplitude and b indirect amplitude contributing to the \(H\rightarrow {\mathcal {Q}}\,\gamma \) process, where Q is a vector quarkonium state. The hatched circle in b denotes a set of one-loop diagrams

Table 1 Recent calculations of the \(H\rightarrow {\mathcal {Q}}\,\gamma \) branching fractions expected in the Standard Model

Higgs boson decays in the charmonium sector, \(H\rightarrow {J/\psi }\,\gamma \) and \(H\rightarrow \psi (2S)\,\gamma \), offer an opportunity to access both the magnitude and the sign of the charm-quark Yukawa coupling [29, 30]; the corresponding decays in the bottomonium sector, \(H\rightarrow \Upsilon (1S,2S,3S)\,\gamma \), can provide information about the real and imaginary parts of the bottom-quark coupling to the Higgs boson [31]. Studies of these decays complement searches for the inclusive \(H\rightarrow c\bar{c}\) and \(H\rightarrow b\bar{b}\) decays. The results of recent independent calculations of the branching fractions expected for these decays in the SM are presented in Table 1, and are of the order of \(10^{-6}\) for \(H\rightarrow {J/\psi }\,\gamma \) and of order \(10^{-9}\) to \(10^{-8}\) for \(H\rightarrow \Upsilon (1\,S,2\,S,3\,S)\,\gamma \) [31,32,33,34,35,36]. The branching fraction for \(H\rightarrow \psi (2S)\,\gamma \) is expected to be \((1.03\pm 0.06)\times 10^{-6}\). This was obtained via a private communication from the authors of Ref. [34], who used an estimate of the value of the order-\(\nu ^2\) non-relativistic QCD long-distance matrix element, where \(\nu \) is the velocity of the heavy quarks in the \({\mathcal {Q}}\) rest frame. It is noted that the branching fractions for decays in the bottomonium sector are small compared to those in the charmonium sector: in this case there is an almost perfect cancellation between the direct and indirect decay amplitudes, caused by the mass of the b-quark being large compared to the masses of quarks in the first and second generations.

Deviations of the quark Yukawa couplings from SM expectations can lead to significant enhancements in the branching fractions of these radiative decays, particularly in the bottomonium sector. Such deviations can arise in BSM theories [37]. For instance, the quark masses may not originate entirely from the Higgs mechanism, but could also be induced by other, subdominant, sources of electroweak symmetry breaking [38]. Some further examples are the Froggatt–Nielsen mechanism [39], the Randall–Sundrum family of models [40], the minimal flavour violation framework [41], the Higgs-dependent Yukawa couplings model [42], and the possibility of the Higgs boson being a composite pseudo-Goldstone boson [43].

The Z boson production cross section at the LHC [44] is approximately 1000 times larger than the Higgs boson production cross section [37, 45], which allows rare Z boson decays to be probed to much smaller branching fractions than Higgs boson decays to the same final state. Similarly to the Higgs boson decays in Fig. 1, radiative decays of the Z boson into a vector quarkonium state and a photon receive analogous contributions from direct and indirect amplitudes. In \(Z\rightarrow {\mathcal {Q}}\,\gamma \) decays, the power corrections in terms of the ratio of the QCD energy scale to the vector-boson mass are small. As discussed in Ref. [46], this allows the light-cone distribution amplitudes (LCDAs) of the mesons to be probed in a theoretically clean region where power corrections are in control, which is not possible in other applications of the QCD factorisation approach. These decays have not yet been measured, but recent independent calculations of the SM branching fractions for \(Z\rightarrow {J/\psi }\,\gamma \) and \(Z\rightarrow \Upsilon (1S,2S,3S)\,\gamma \) are presented in Table 2 and are expected to be of order \(10^{-8}\) to \(10^{-7}\) [46,47,48]. No value has been calculated for \(Z\rightarrow \psi (2S)\,\gamma \).

Table 2 Overview of calculations of the Standard Model \(Z\rightarrow {\mathcal {Q}}\,\gamma \) branching fractions

Decays of the Higgs and Z bosons into \({J/\psi }\) or \(\Upsilon (1S,2S,3S)\) and a photon were searched for by the ATLAS Collaboration, initially with up to \(20.3~\text{ fb}^{-1} \) of data collected at \(\sqrt{s}=8~\text {TeV} \) [49] and subsequently with up to \(36.1~\text{ fb}^{-1} \) of data collected at \(\sqrt{s}=13~\text {TeV} \) [50]; the latter search also introduced the study of the \(\psi (2S)\) decay channels. The obtained 95% confidence level (CL) upper limits on the branching fractions were \(3.5\times 10^{-4}\) and \(2.0\times 10^{-3}\) for \(H\rightarrow {J/\psi }\,\gamma \) and \(H\rightarrow \psi (2S)\,\gamma \), respectively, and \(\left( 4.9,5.9,5.7\right) \times 10^{-4}\) for \(H\rightarrow \Upsilon (1S,2S,3S)\,\gamma \). The corresponding 95% CL upper limits for the analogous Z boson decays were \(2.3\times 10^{-6}, 4.5\times 10^{-6}\) and \(\left( 2.8,1.7,4.8\right) \times 10^{-6}\). The \(H\rightarrow J/\psi \,\gamma \) and \(Z\rightarrow J/\psi \,\gamma \) decays have also been searched for by the CMS Collaboration [51, 52], yielding similar upper limits of \(7.6\times 10^{-4}\) and \(1.4 \times 10^{-6}\) from \(35.9~\text{ fb}^{-1} \) of data collected at \(\sqrt{s}=13~\text {TeV} \). In addition, the ATLAS Collaboration has searched for the rare Higgs and Z boson decays to light vector mesons \(H(Z)\rightarrow \phi \,\gamma \) and \(H(Z)\rightarrow \rho \,\gamma \) [53, 54], while the CMS Collaboration has also searched for Higgs boson decays into a Z boson and a \(\rho \) or \(\phi \) meson [55], and Higgs and Z boson decays into pairs of \(J/\psi \) or \(\Upsilon (1S,2S,3S)\) mesons [56].

This paper describes searches for Higgs and Z boson decays into the exclusive final states \({J/\psi }\,\gamma \), \(\psi (2S)\,\gamma \), and \(\Upsilon (1S,2S,3S)\,\gamma \), and then into \(\mathrm {\mu ^+ \mu ^-}\gamma \), using 139 fb\(^{-1}\) of ATLAS proton–proton (pp) collision data collected between 2015 and 2018 at \(\sqrt{s} =13~\text {TeV} \). Hereafter, where no distinction is relevant, the \({J/\psi }\) and \(\psi (2S)\) states are collectively denoted by \(\psi (nS)\), and the \(\Upsilon (1S,2S,3S)\) states are denoted by \(\Upsilon (nS)\). The results are interpreted in the kappa framework [37, 57] in terms of the ratios \(\kappa _c/\kappa _\gamma \) and \(\kappa _b/\kappa _\gamma \), where \(\kappa _c\), \(\kappa _b\) and \(\kappa _\gamma \) are the modifiers of the coupling between the Higgs boson and the charm quark, bottom quark, and the effective coupling between the Higgs boson and the photon, respectively.

2 ATLAS detector and data sample

ATLAS [1] is a multipurpose particle detector with an approximately forward–backward symmetric cylindrical geometry and near 4\(\pi \) coverage in solid angle.Footnote 1 It consists of an inner tracking detector, electromagnetic and hadronic calorimeters, and a muon spectrometer.

The inner tracking detector (ID) covers the pseudorapidity range \(|\eta |<2.5\) and is surrounded by a thin superconducting solenoid providing a 2 T magnetic field. At small radii, a high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track. The first hit is usually in the insertable B-layer, an additional layer installed in 2015 before 13 TeV data taking began [58, 59]. It is followed by a silicon microstrip tracker, which provides up to eight measurement points per track. The silicon detectors are complemented by a gas-filled straw-tube transition radiation tracker, which enables radially extended track reconstruction up to \(|\eta | = 2.0\) with typically 35 measurements per track.

Electromagnetic (EM) calorimetry within \(|\eta |<3.2\) is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) EM calorimeters with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in upstream material; for \(|\eta | < 2.5\) the EM calorimeter is divided into three layers in depth. A steel/scintillator-tile calorimeter provides hadronic calorimetry for \(|\eta |<1.7\). LAr technology with copper as absorber is used for the hadronic calorimeters in the endcap region, \(1.5<|\eta |<3.2\). The solid-angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules in \(3.1< |\eta | < 4.9\) optimised for EM and hadronic measurements, respectively.

The muon spectrometer (MS) surrounds the calorimeters and features trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field provided by three air-core superconducting toroidal magnets. The field integral of the toroids ranges between 2.0 and \({6.0}\,\hbox {T}\,\hbox {m}\) across most of the detector. The precision chamber system covers the region \(|\eta | < 2.7\) with three layers of monitored drift tubes, complemented by cathode strip chambers in the forward regions. The muon trigger system covers the range \(|\eta | < 2.4\), featuring resistive plate chambers in the barrel and thin gap chambers in the endcap regions.

A two-level trigger and data acquisition system is used to record events for offline analysis [60]. The first-level trigger is implemented in hardware and uses a subset of detector information. This is followed by a software-based high-level trigger which outputs events for permanent storage at an average rate of 1 kHz, reduced from the maximum first-level rate of 100 kHz. An extensive software suite [61] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data samples considered in this analysis were collected during stable beam conditions with all relevant detector systems functional [62] and were recorded by a combination of triggers requiring a photon and either one or two muons in the event. Due to the increasing instantaneous luminosity, the transverse momentum thresholds and identification requirements were modified during the data-taking periods. Available throughout the data collection period was a trigger requiring a photon fulfilling the ‘medium’ identification criteria [63] and transverse momentum \(p_{\text {T}}^{\gamma }\) greater than \(25~\text {GeV} \), and at least one muon identified at the first-level trigger with \(p_{\text {T}}^{\,\mu }\) greater than \(24~\text {GeV} \). In the 2015 and 2016 data collection periods, this was complemented with a trigger requiring a photon fulfilling the ‘loose’ identification criteria [63] and \(p_{\text {T}}^{\gamma }>35~\text {GeV} \), and at least one muon identified at the high-level trigger with \(p_{\text {T}}^{\,\mu }>18~\text {GeV} \). During the 2017 and 2018 data collection, a trigger requiring an isolated photon fulfilling the ‘tight’ identification criteria [63] and \(p_{\text {T}}^{\gamma }>35~\text {GeV} \) and at least one muon identified at the high-level trigger with \(p_{\text {T}}^{\,\mu }>18~\text {GeV} \), and a trigger requiring a ‘loose’ photon with \(p_{\text {T}}^{\gamma }>35~\text {GeV} \), a muon identified at the first-level trigger with \(p_{\text {T}}^{\,\mu }>15~\text {GeV} \) and an additional muon identified at the high-level trigger with \(p_{\text {T}}^{\,\mu }>2~\text {GeV} \) were used. For candidate signal events that fulfil the analysis selection, described in Sect. 3, the trigger efficiency exceeds 97% in all cases. The uncertainty in the trigger efficiency is estimated to be 0.8% [64, 65]. After applying trigger and data-quality requirements, the integrated luminosity of the data sample used in this search corresponds to \(139.0\pm 2.4~\text{ fb}^{-1} \), obtained using the LUCID-2 detector [66] for the primary luminosity measurements and with the uncertainty in the integrated luminosity derived using the method described in Refs. [67, 68]. The average number of pp interactions per bunch crossing ranged from about 13 in 2015 to about 39 in 2018.

3 Event selection

In addition to the trigger and data-quality requirements, selection criteria based on geometric acceptance, event kinematics, isolation, and vertex quality are imposed to select candidate \(H(Z)\rightarrow {{\mathcal {Q}}}\,\gamma \rightarrow \mathrm {\mu ^+ \mu ^-}\,\gamma \) events. Reconstructed muon candidates must be either ‘segment-tagged’, where an ID track matches at least one track segment in the MS, or ‘combined’, where an ID track matches a full track in the MS [69]. Muons must also have an absolute value of pseudorapidity \(|\eta ^{\,\mu }|<2.5\) and a transverse momentum \(p_{\text {T}}^{\,\mu }>3~\text {GeV} \). Pairs of oppositely charged muons are fitted to a common vertex with the track parameter uncertainties taken into account [70]. Pairs satisfying a loose \(\chi ^2\) requirement for this fit are combined to reconstruct candidate \({{\mathcal {Q}}}\rightarrow \mathrm {\mu ^+ \mu ^-}\) decays. The higher-\(p_{\text {T}} \) muon in each pair, called the leading muon, is required to have \(p_{\text {T}}^{\,\mu }>18~\text {GeV} \). Dimuons with a mass \(m_\mathrm {\mu ^+ \mu ^-}\) between \(2.4~\text {GeV} \) and \(4.3~\text {GeV} \) are selected as \(\psi (nS)\rightarrow \mathrm {\mu ^+ \mu ^-}\) candidates, and those with a mass \(8.0~\text {GeV}<m_\mathrm {\mu ^+ \mu ^-}<12.0~\text {GeV} \) are selected as \(\Upsilon (nS)\rightarrow \mathrm {\mu ^+ \mu ^-}\) candidates. To improve the sensitivity of the \(\Upsilon (nS)\gamma \) analysis in resolving the individual \(\Upsilon (nS)\) states, events are classified into two exclusive categories based upon muon pseudorapidity to account for differences in resolution across the ATLAS detector. Events where both muons satisfy \(|\eta ^{\,\mu }|<1.05\) are placed in the higher-resolution barrel (B) category, otherwise they are placed in the endcap (EC) category.

Candidate \({{\mathcal {Q}}}\rightarrow \mathrm {\mu ^+ \mu ^-}\) decays are subjected to further isolation and vertex-quality requirements. The sum of the \(p_{\text {T}} \) of ID tracks within a cone of variable size \(\Delta R = \sqrt{(\Delta \phi )^{2} + (\Delta \eta )^{2}} = \min \{10~\text {GeV}/(p_{\text {T}}^{\,\mu }~[\text {GeV} ]), 0.3\}\) around the leading muon is required to be less than 6% of the \({\mathcal {Q}}\) candidate’s transverse momentum, \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) [71]. This sum excludes the ID track associated with the leading muon itself, as well as the ID track associated with the subleading muon if it falls inside the \(\Delta R\) cone. To mitigate the effects of multiple pp interactions in the same or neighbouring bunch crossings, only ID tracks that originate from the primary vertex are considered, which is defined as the reconstructed vertex with the highest \(\Sigma {\,p_{\text {T}}}^2\) of all associated tracks used in the formation of the vertex. To reject contributions from events involving b-hadron decays which result in displaced vertices, the vector leading from the primary vertex to the dimuon vertex is projected onto the direction of the \({{\mathcal {Q}}}\) candidate’s transverse momentum, and the signed projection \(L_{xy}\) is required to be smaller than three times its uncertainty, \(\sigma _{L_{xy}}\), such that \(|L_{xy}/\sigma _{L_{xy}}| < 3\).

Photons are reconstructed from clusters of energy in the electromagnetic calorimeter. Clusters that match ID tracks consistent with the hypothesis of a photon conversion into \(e^+ e^-\) are classified as converted photon candidates, whilst clusters that have no matching ID tracks are classified as unconverted candidates [63]. Reconstructed photon candidates are required to satisfy the ‘tight’ photon identification criteria [63], have absolute pseudorapidity \(|\eta ^{\gamma }|<2.37\), excluding the calorimeter barrel/endcap transition region \(1.37<|\eta ^{\gamma }|<1.52\), and have transverse momentum \(p_{\text {T}}^{\gamma }>35~\text {GeV} \). Further track- and calorimeter-isolation requirements are imposed to suppress contamination from jets: the sum of the \(p_{\text {T}}\) of all ID tracks originating from the primary vertex and within \(\Delta R = 0.2\) of the photon direction, excluding any associated with the reconstructed photon, is required to be less than \(5\%\) of \(p_{\text {T}}^{\gamma }\); the sum of the \(p_{\text {T}}\) of the calorimeter energy clusters within \(\Delta R = 0.4\) of the photon direction, excluding the energy cluster of the reconstructed photon, is required to be less than \(\left( 2.45~\text {GeV} + 0.022 \times p_{\text {T}}^{\gamma }~[\text {GeV} ]\right) \). The calorimeter isolation variable is corrected to account for contributions associated with other pp interactions in the same bunch crossing [63].

Combinations of a \({{\mathcal {Q}}}\rightarrow \mathrm {\mu ^+ \mu ^-}\) candidate and a photon that satisfy \(\Delta \phi ({{{\mathcal {Q}}}},\gamma )>\pi /2\) are retained for further analysis: this requirement suppresses contributions from events where the quarkonium and photon candidates have a small angular separation. If multiple combinations are possible, a situation that arises in fewer than \(2\%\) of events that pass the trigger requirements, the combination of the highest-\(p_{\text {T}}\) photon and the \({{\mathcal {Q}}}\) candidate with an invariant mass closest to that of the \(J/\psi \)  meson is retained. To maintain a common event selection for the Higgs and Z boson analyses, while ensuring near-optimal sensitivity for both, a variable \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) threshold that depends on the invariant mass of the \({{\mathcal {Q}}}\gamma \) system, \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\), is applied. For the \(\psi (nS)\rightarrow \mathrm {\mu ^+ \mu ^-}\) (\(\Upsilon (nS)\rightarrow \mathrm {\mu ^+ \mu ^-}\)) candidates, the \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) threshold is \(40~\text {GeV} \) (\(34~\text {GeV} \)) for \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\le 91~\text {GeV} \), and \(54.4~\text {GeV} \) (\(52.7~\text {GeV} \)) for \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\ge 140~\text {GeV} \). In the region \(91 \text {GeV}<m_{\mathrm {\mu ^+ \mu ^-}\gamma }<140~\text {GeV} \), the \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) threshold varies linearly between its fixed values outside this region. The \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) thresholds are chosen to optimise the significance of potential signals at the Higgs and Z boson masses.

4 Signal modelling

The expected \(H(Z)\rightarrow {{\mathcal {Q}}}\gamma \) signals were modelled using Monte Carlo (MC) simulated events to extract analytical approximations of the final observables. Several Higgs boson production modes were considered in the simulation of the \(H\rightarrow {{\mathcal {Q}}}\gamma \) events. In order of decreasing production cross section, these are gluon–gluon fusion (ggH), vector-boson fusion (VBF), and associated production of a H boson with a Z boson (ZH), a \(W^{\pm }\) boson (WH) or a \(t\bar{t}\) top-quark pair (\(t\bar{t}H\)). Explicit simulation of the \(t\bar{t}H\) Higgs boson production mechanism is an addition with respect to the analysis strategy of the \(36.1~\text{ fb}^{-1} \) study [50]. The contribution of Higgs bosons produced in association with a \(b\bar{b}\) bottom-quark pair (\(b\bar{b}H\)) was not directly simulated, but was included by scaling up the production cross section used to normalise the ggH sample. This assumes that the efficiency for \(b\bar{b}H\) events is equal to that for ggH; their inclusion increases the signal yield by less than 1%. For the \(Z\rightarrow {{\mathcal {Q}}}\gamma \) signal events, Z boson production was modelled inclusively.

Fig. 2
figure 2

Generator-level transverse momentum (\(p_{\text {T}} \)) distributions of the photon and muons for a \(H\rightarrow {J/\psi }\,\gamma \), b \(H\rightarrow \psi (2S)\,\gamma \), c \(H\rightarrow \Upsilon (nS)\,\gamma \), d \(Z\rightarrow {J/\psi }\,\gamma \), e \(Z\rightarrow \psi (2S)\,\gamma \) and f \(Z\rightarrow \Upsilon (nS)\,\gamma \) simulated events, respectively. The dashed-line distributions with a clear fill show the events at generator level which fall within the analysis geometric acceptance (both muons are required to have \(|\eta ^{\,\mu }|<2.5\), while the photon is required to have \(|\eta ^{\gamma }|<2.37\), excluding the region \(1.37<|\eta ^{\gamma }|<1.52\)), and are each normalised to unity. The solid-line distributions with a hatched fill show the fraction of these events which pass the full analysis event selection described in Sect. 3. The relative difference between the two sets of distributions corresponds to the effects of reconstruction, trigger, and event selection efficiencies. The leading muon candidate is denoted by \(p_{\text {T}} ^{\mu 1}\) (black), the subleading candidate by \(p_{\text {T}} ^{\mu 2}\) (blue), and the photon candidate is denoted by \(p_{\text {T}} ^{\gamma }\) (green)

The Powheg Box v2 MC event generator [72,73,74,75,76] was used to model the ggH and VBF Higgs boson production mechanisms, calculated up to next-to-leading order (NLO) in \(\alpha _\text {s}\). Powheg Box was interfaced with Pythia 8.212 [77, 78], which used the CTEQ6L1 parton distribution functions [79] and a set of tuned parameters called the AZNLO tune [80] to model the parton shower, hadronisation, and underlying event. The Pythia 8.212 event generator was used to model the ZH and WH production mechanisms, with NNPDF2.3lo parton distribution functions [81] and the A14 tune [82] for hadronisation and the underlying event. The MadGraph5_aMC@NLO 2.2.2 [83] event generator, interfaced with Pythia 8.212 for the parton shower, was used to model the \(t\bar{t}H\) production mechanism, using the same parton distribution functions and event tune as the ZH and WH samples. Inclusive Z boson production was modelled with the same event generators, parton distribution functions, and event tune as the ggH and VBF Higgs boson production mechanisms. The subsequent decays of the Higgs and Z bosons into \(J/\psi \,\gamma \), \(\psi (2S)\,\gamma \) and \(\Upsilon (1S,2S,3S)\,\gamma \) were simulated as a cascade of two-body decays, where the explicit simulation of \(\psi (2S)\,\gamma \) decays is an addition to the analysis strategy compared to the \(36.1~\text{ fb}^{-1} \) study [50], which modelled these events by using \(J/\psi \,\gamma \) samples. Interference effects between \(J/\psi \,\gamma \) produced in Z boson decays and non-resonant QCD gluon-mediated processes [84] are expected to be small and were neglected. The generated events were passed through the detailed \(\textsc {Geant4} \) simulation of the ATLAS detector [85, 86] and processed with the same software as used to reconstruct the data. Conditions in the ATLAS detector, such as the average number of pp interactions per bunch crossing, changed throughout Run 2. Separate samples were produced with 2015–2016 conditions, 2017 conditions, and 2018 conditions, where each sample is normalised according to the integrated luminosity of the corresponding run period. The effect of multiple interactions in the same or neighbouring bunch crossings (pile-up) was modelled by overlaying each simulated hard-scattering event with inelastic pp events generated with Pythia 8.186 using the NNPDF2.3lo parton distribution functions and the A3 tune [87].

The production rates for the SM Higgs boson with \(m_{H} = 125~\text {GeV} \), obtained from the CERN Yellow Reports [37, 45], are assumed throughout this analysis. The ggH sample is normalised such that it reproduces the total cross section predicted by a next-to-next-to-next-to-leading-order (N\(^3\)LO) QCD calculation with NLO electroweak corrections applied [88,89,90,91]. The VBF sample is normalised to an approximate next-to-next-to-leading-order (NNLO) QCD cross section with NLO electroweak corrections applied [92,93,94]. The samples for the associated production of a W or Z boson with a Higgs boson are normalised to cross sections calculated at NNLO in QCD with NLO electroweak corrections [95, 96] including the NLO QCD corrections [97] for \(gg\rightarrow ZH\). The production of \(t\bar{t}H\) is normalised to cross sections calculated at NLO in QCD with NLO electroweak corrections [37]. The production cross section used to scale the ggH sample to account for the \(b\bar{b}H\) mechanism was calculated at a mix of NNLO and NLO accuracy in QCD, with no electroweak corrections [37]. The production rate for the Z boson is normalised to the total cross section obtained from a measurement by the ATLAS Collaboration using \(81~\text{ pb}^{-1} \) of \(\sqrt{s}=13\) \(\text {TeV}\) data [44]. The branching fractions for the decays \({{\mathcal {Q}}}\rightarrow \mathrm {\mu ^+ \mu ^-}\) used in signal normalisation are taken from the Review of Particle Physics [98], as are the inclusive branching fractions for t-quark decays to hadrons or leptons in the normalisation of the \(t\bar{t}H\) samples. The effects of meson polarisation on the dimuon kinematic distributions are accounted for via a reweighting of the simulated events, which are initially simulated without polarisation. The quarkonium states in the decay of the Higgs boson are expected to be transversely polarised; the quarkonium states in the decay of the Z boson are expected to be longitudinally polarised, due to a vanishing contribution from the transversely polarised meson [47]. Accounting for meson polarisation results in a 2–3% decrease in Higgs boson signal efficiency and a 9–10% increase in Z boson signal efficiency.

Figure 2 shows the generator-level photon and muon \(p_{\text {T}}\) distributions from the simulated \(H(Z)\rightarrow {{\mathcal {Q}}}\gamma \) signal events. For the \({J/\psi }\,\gamma \rightarrow \mathrm {\mu ^+ \mu ^-}\,\gamma \) and \(\psi (2S)\,\gamma \rightarrow \mathrm {\mu ^+ \mu ^-}\,\gamma \) final states, the total signal efficiency is 19% for the Higgs boson decays and 10% for the Z boson decays. These values take into account the trigger, reconstruction, identification, and isolation efficiencies, as well as the kinematic acceptance. The corresponding values for the \(\Upsilon (1S,2S,3S) \gamma \rightarrow \mathrm {\mu ^+ \mu ^-}\,\gamma \) final states are 21% and 13%. The difference in efficiency between the Higgs and Z boson decays arises primarily from the softer photon and muon \(p_{\text {T}}\) distributions associated with \(Z\rightarrow {{\mathcal {Q}}}\,\gamma \) production, as seen by comparing Fig. 2a and d for the \({J/\psi }\,\gamma \) case, Fig. 2b and e for the \(\psi (2S)\,\gamma \) case, and Fig. 2c and f for the \(\Upsilon (nS)\,\gamma \) case.

The \(m_{\mathrm {\mu ^+ \mu ^-}\,\gamma }\) resolution is 1.6–1.8% for both the Higgs and Z boson decays. For each of the final states, a two-dimensional probability density function (PDF) is used to model the signal in \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) and \(m_{\mathrm {\mu ^+ \mu ^-}}\). The Higgs boson signals are modelled with the sum of two bivariate Gaussian distributions, which describe the approximately \(60\%\) correlations between the two mass variables in each decay channel as well as the effects of detector resolution. For Z boson decays, since the Z boson natural width is comparable to the detector resolution, the correlation between \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) and \(m_{\mathrm {\mu ^+ \mu ^-}}\) is small, of order \(10\%\), and is neglected, and the two mass distributions are treated as uncorrelated. The \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) distributions of the Z boson signals are modelled with the sum of two Voigtian PDFs corrected with a mass-dependent efficiency factor, which accounts for the changing selection efficiency along the Z lineshape that arises from the kinematic requirements described in Sect. 3. The Voigtian shape is a convolution of a Breit–Wigner distribution, which describes the natural width of the Z boson, and a Gaussian distribution, which describes detector resolution effects. The \(m_{\mathrm {\mu ^+ \mu ^-}}\) distributions of the Z boson decays are modelled with a sum of two Gaussian PDFs, where the peak value and resolution parameters of the PDFs are fixed to the values obtained in a fit to the simulated event samples.

Systematic uncertainties in the signal yield and inferred branching fraction of the H and Z boson decays are considered. Uncertainties in the Higgs boson production cross sections total 5.8% [37, 45]. The uncertainty in the measured Z boson production cross section is 2.9% [44], where the luminosity component of the uncertainty in the Z boson production cross-section measurement is treated as completely uncorrelated with the integrated luminosity uncertainty of data set used in this search. The uncertainty in the integrated luminosity is estimated to be 1.7% [67, 68], using primary luminosity measurements from the LUCID-2 detector [66]. The uncertainty in the acceptance of the Higgs boson signal due to the choice of MC generator parameter values, parton distribution functions, set of tuned parameters for the underlying event, and parton showering, is estimated by studying how different choices affect the acceptance at generator level. The total uncertainty in the Higgs boson signal acceptance is estimated to be 1.8%. For the Z boson, the respective signal acceptance uncertainty is determined to be 1.0% by comparing the Z boson kinematic distributions in simulated events with measurements in data [99]. Trigger efficiencies for photons are determined from samples enriched with \(Z\rightarrow e^+ e^-\) events in data [100]. The photon trigger efficiency is estimated to contribute a systematic uncertainty of 0.8% to the expected signal yields [64, 65]. Photon identification efficiencies are determined using the enriched \(Z\rightarrow e^+ e^-\) event samples, as well as inclusive photon events and \(Z\rightarrow \ell ^+\ell ^-\gamma \) events [101, 102]. The photon identification efficiency uncertainties, for both the converted and unconverted photons, are 1.7–1.9% for the Higgs and Z boson signals. The effect of the muon reconstruction and identification efficiency uncertainty is 2.2–2.4% [103]. The photon energy scale uncertainty, determined from \(Z\rightarrow e^+ e^-\) events and validated using \(Z\rightarrow \ell ^+\ell ^- \gamma \) events [104, 105], is propagated through the simulated samples as a function of \(\eta ^{\gamma }\) and \(p_{\text {T}}^{\gamma }\). The uncertainty associated with the photon energy scale and resolution in the simulation has a 0.1–0.2% effect on the Higgs and Z boson signal yields. Similarly, the systematic uncertainty associated with the scale of the muon momentum measurement has a 0.1–0.5% effect on the signal yields [103]. To assess any effect on the expected signal yield from imperfect modelling of pile-up, the average number of pile-up interactions is varied in the simulation; the corresponding uncertainty is 0.7–1.1%. These systematic uncertainties in the expected signal yields are summarised in Table 3. The effect of the energy and momentum scale and resolution uncertainties on the Higgs and Z boson signal shapes is negligible.

Table 3 Summary of the systematic uncertainties in the expected signal yields. In the case of the H decays, the ‘signal acceptance’ uncertainty was estimated through parton distribution function variations, scale variations and variations of the underlying-event tune. For the Z decays, the ‘signal acceptance’ uncertainty was estimated by comparing simulation to data

5 Background modelling

The background is considered to be composed of two distinct contributions which are modelled separately in this analysis. The first is an exclusive contribution originating from \(\mathrm {\mu ^+ \mu ^-}\gamma \) events produced via the Drell–Yan process, where a highly energetic photon typically arises from final-state radiation. The second, which is the dominant background, is an inclusive contribution mostly from multi-jet and \(\gamma \)+jet events involving dimuon or \({\mathcal {Q}}\) production. The exclusive background is modelled with simulation, similarly to the signal model in Sect. 4, whilst the inclusive background is modelled with a data-driven technique, discussed in detail in Ref. [106]. The background is modelled independently for the \(\psi (nS)\) dimuon mass region, and the B and EC categories of the \(\Upsilon (nS)\) dimuon mass region.

5.1 Exclusive background

The Drell–Yan production of dimuons with a highly energetic photon, typically from final-state radiation, \(q\bar{q}\rightarrow \gamma ^{*}/Z^{*}\rightarrow \mathrm {\mu ^+ \mu ^-}\gamma \), constitutes a significant background contribution, exhibiting a characteristic resonant structure in the \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) distribution. The shapes of this background in \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) and \(m_{\mathrm {\mu ^+ \mu ^-}}\) are modelled using events simulated with Sherpa 2.2.10 [107] at leading order with the NNPDF3.0 parton distribution function set in the \(\psi (nS)\) and \(\Upsilon (nS)\) \(m_{\mathrm {\mu ^+ \mu ^-}}\) regions. The normalisation of this background is determined from a fit to the data in the signal region following the selection described in Sect. 3. The resonant shape in \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) is modelled analytically as the sum of a Voigtian function and a threshold function defined as \(f(x) = \sqrt{x - x_0}\,\textrm{e}^{-A(x - x_0)}\), where A and \(x_0\) are constants, \(f(x) = 0\) for \(x < x_0\), and x is the three-body mass \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\). The Voigtian function describes the on-shell Z production, which dominates in the \(\Upsilon (nS)\) channels, whereas the threshold function describes the off-shell \(\gamma ^*/Z^*\) production, which dominates in the \(\psi (nS)\) channels. Different forms of the threshold function were considered, but the function above was found to provide the best description of the background shape. The shape in \(m_{\mathrm {\mu ^+ \mu ^-}}\) is non-resonant and is modelled with a first-order Chebyshev polynomial. The parameters of the shape functions used to model each mass distribution are obtained from a fit to the simulated samples in each category once the signal region selection described in Sect. 3 is imposed. The statistical uncertainty associated with the parameters of the \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) shape function is accounted for via nuisance parameters in the maximum-likelihood fit to data in Sect. 7, based on the fit to the simulated samples, and is found to have a negligible effect on the final result.

Fig. 3
figure 3

The data-driven sequential sampling method used to generate pseudocandidate events for the inclusive background model. The labels ‘1D’ and ‘2D’ refer to the dimensionality of the PDFs used in the model generation of the variables underneath. Vertices labelled ‘3D’ signify that the output variable (or variables), indentifed by the arrow leading out of the vertex, is sampled from a three-dimensional PDF described in bins of the input variable (or variables), indentifed by the lines leading into the vertex. If two variables share a border, they are sampled simultaneously from a joint PDF. Vertices labelled ‘Sum’ signify that the output variable is calculated directly from the sum of the input variables

5.2 Inclusive background

The dominant background arises from inclusive multi-jet or \(\gamma \)+jet events that involve the production of \({\mathcal {Q}}\) states, which subsequently decay into \(\mathrm {\mu ^+ \mu ^-}\), or the production of non-resonant dimuon pairs such as from the Drell–Yan process or from random combinations of muons; the photon candidate may be genuine or, typically, a misidentified jet. The contribution from dimuon events in the inclusive background is separate from the exclusive \(q\bar{q}\rightarrow \mathrm {\mu ^+ \mu ^-}\gamma \) background discussed in Sect. 5.1, as the latter involves a genuine photon candidate originating from the same hard-scattering process. The complicated mixture of background contributions, which involve QCD processes and misidentification of physics objects, and the highly selective phase-space region of interest makes it challenging to model this inclusive background accurately with simulation. Furthermore, the features of the background shape, which exhibits a broad kinematic peak at the location of a possible \(Z\rightarrow {\mathcal Q}\gamma \) signal, combined with the relatively low number of events in the signal region make background modelling through direct fits of parametric models to the data unsuitable. For this reason, a generative approach is pursued, where the \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) shape of the inclusive background is obtained with a non-parametric data-driven model, described in Ref. [106] and used in several previous analyses [49, 50, 53, 54], using templates to describe the kinematic distributions. The background normalisation is extracted directly from a fit to the data, and shape variations are incorporated in the background model in the final discriminating variable; these shape variations are also profiled in the fit.

The background model generation uses a sample of approximately \(1.8\times 10^{4}\) \(\psi (nS)\,\gamma \) and \(8.9\times 10^{3}\) \(\Upsilon (nS)\,\gamma \) candidate events. These events pass all the kinematic selection requirements described in Sect. 3, except that the \({\mathcal {Q}}\) and \(\gamma \) candidates are not required to satisfy the nominal isolation requirements, and a looser minimum \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) requirement of \(30~\text {GeV} \) is imposed; these events define the background-dominated generation region (GR). The exclusive background contribution is subtracted from the data to prepare for the construction of the inclusive background model. The exclusive background mass shape is obtained from a fit to the simulated events that meet the GR selection criteria. The background model is derived iteratively; in each iteration the exclusive \(q\bar{q}\rightarrow \mathrm {\mu ^+ \mu ^-}\gamma \) contribution and the background model are fitted to the data in the GR to derive the exclusive background normalisation. PDFs are constructed to describe the distributions of the relevant kinematic and isolation variables and their most important correlations. The PDFs of these kinematic and isolation variables are sampled to generate an ensemble of pseudocandidate events, each with complete \({\mathcal {Q}}\) and \(\gamma \) candidate four-vectors and their associated isolation values. The important correlations among the kinematic and isolation variables of the background events, in particular between the \({\mathcal {Q}}\) and \(\gamma \) candidate transverse momenta \(p_{\text {T}} ^{{\mathcal {Q}}}\) and \(p_{\text {T}}^{\gamma }\), are retained in the generation of the pseudocandidate events through the following sequential sampling scheme, shown diagrammatically in Fig. 3:

  1. 1.

    The \({\mathcal {Q}}\)-candidate pseudorapidity \(\eta _{{\mathcal {Q}}}\), mass \(m_{{\mathcal {Q}}}\), and azimuthal angle \(\phi _{{\mathcal {Q}}}\), are drawn independently from one-dimensional PDFs, and values for \(p_{\text {T}} ^{{\mathcal {Q}}}\) and \(p_{\text {T}} ^{\mathcal {\gamma }}\) are drawn simultaneously from a two-dimensional PDF.

  2. 2.

    The pseudorapidity difference between the \({\mathcal {Q}}\) and \(\gamma \) candidates, \(\Delta \eta ({\mathcal {Q}},\gamma )\), and the \(\gamma \)-candidate calorimeter isolation are drawn simultaneously from a three-dimensional PDF, based on the value previously drawn for \(p_{\text {T}} ^{{\mathcal {Q}}}\).

  3. 3.

    The \({\mathcal {Q}}\)-candidate track isolation and the azimuthal angular separation between the \({\mathcal {Q}}\) and \(\gamma \) candidates, \(\Delta \phi ({\mathcal {Q}},\gamma )\), are drawn separately from two three-dimensional PDFs, given the selected values of the \(\gamma \)-candidate calorimeter isolation and \(p_{\text {T}} ^{{\mathcal {Q}}}\).

  4. 4.

    The \(\gamma \)-candidate track isolation is drawn from a three-dimensional PDF using the previously sampled values of the \({\mathcal {Q}}\)-candidate track isolation and \(\gamma \)-candidate calorimeter isolation variables, and the values for \(\eta _{\gamma }\) and \(\phi _{\gamma }\) are calculated from the values drawn for \(\eta _{{\mathcal {Q}}}\), \(\phi _{{\mathcal {Q}}}\), \(\Delta \eta ({\mathcal {Q}},\gamma )\), and \(\Delta \phi ({\mathcal {Q}},\gamma )\).

The nominal selection requirements are imposed and the surviving pseudocandidates are used to construct templates for the \(m_{\mathrm {\mu ^+ \mu ^-}\,\gamma }\) distributions, which are then smoothed using a Gaussian kernel density estimation method [108]. Potential contamination of the GR sample from signal events is expected to be negligible. Signal injection tests were performed where a significant amount of signal, much larger than the signal branching fraction excluded in the presented analysis, was added to the GR to investigate the effect of such a potential signal contamination on the model; it was found that the presence of the signal is largely inconsequential to the shape of the background model and does not lead to any peaking structures in the background templates. The ability to predict data using the background model is studied in several validation regions (VRs), defined by selections looser than the nominal signal requirements and tighter than the generation region requirements. The requirements imposed in each of the selection regions used in the analysis are summarised in Table 4. The application of the \(p_{\text {T}} ^{\mathrm {\mu ^+ \mu ^-}}\) requirement (VR1), the muon isolation requirements (VR2), and the photon isolation requirement (VR3) are each checked with this method, and in all regions the background model is found to describe the data well. This is shown in the comparison of the background model with data in each VR in Fig. 4, which includes the exclusive background contribution. The exclusive background shape in each selection region is obtained from a fit to the simulated events, and the normalisation is extrapolated from the fit to data in the GR.

Table 4 Summary of the selection regions used in the analysis. The term ‘Full’ indicates the corresponding requirement applied in the SR, and discussed in Sect. 3. The relaxed photon isolation requires the sum of the \(p_{\text {T}} \) of all ID tracks originating from the primary vertex and within \(\Delta R = 0.2\) of the photon direction, excluding any associated with the reconstructed photon, to be less than \(20\%\) of \(p_{\text {T}}^{\gamma }\) and the sum of the \(p_{\text {T}} \) of the calorimeter energy clusters within \(\Delta R = 0.4\) of the photon direction, excluding the energy of the reconstructed photon, to be less than \(\left( 2.45~\text {GeV} + 0.4 \times p_{\text {T}}^{\gamma }~[\text {GeV} ]\right) \). The relaxed \({\mathcal {Q}}\) isolation requires the sum of the \(p_{\text {T}} \) of the ID tracks within \(\Delta R = \min \{10~\text {GeV}/(p_{\text {T}}^{\,\mu }~[\text {GeV} ]), 0.3\}\) of the leading muon to be less than 40% of \(p_{\text {T}} ^{{\mathcal {Q}}}\)
Fig. 4
figure 4

The distribution of \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) in data compared to the prediction of the background model for ac \(H(Z)\rightarrow \psi (nS)\,\gamma \) and df \(H(Z)\rightarrow \Upsilon (nS)\,\gamma \) in the VR1, VR2 and VR3 validation regions. The total background is normalised to the observed number of events within the region shown, where the ratio of the exclusive and inclusive background components is extrapolated from the generation region. The uncertainty band corresponds to the uncertainty envelope derived from variations in the inclusive background modelling procedure. The dashed lines in the ratio plot in each figure indicate 1.2 and 0.8 on the y-axis. It should be noted that these plots are pre-fit, where the shapes of the inclusive and exclusive background components are fixed to the nominal template. In the maximum-likelihood fit to data in the signal region, the normalisation of each background is free, and their shapes are allowed to morph according to the defined shape variations

Fig. 5
figure 5

Distribution of \(\mathrm {\mu ^+ \mu ^-}\) invariant mass for a \(\psi (nS)\,\gamma \), b \(\Upsilon (nS)\,\gamma \) barrel category (B) and c \(\Upsilon (nS)\,\gamma \) endcap category (EC). Candidates satisfy the requirements of the background generation region (GR) defined in Sect. 5.2. The error bars on the data points denote their statistical uncertainty

The shape of the background model in three-body mass, \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\), is allowed to vary around the nominal shape, and the parameters controlling these systematic variations are treated as nuisance parameters in the maximum-likelihood fit described in Sect. 6. Three such shape variations are implemented to allow the background template to adjust to the observed data. The first variation is produced via a scale variation of the \(p_{\text {T}}^{\gamma }\) distribution in the model, which allows the peak of the three-body mass distribution to shift to lower or higher masses; the second variation is produced via a linear distortion of the shape of the \(\Delta \phi ({{\mathcal {Q}}},\gamma )\) distribution, which allows the width of the three-body mass distribution to increase or decrease; the third variation is produced via a global tilt of the three-body mass distribution around a pivot point, which allows the background to adapt to slopes with respect to the data. The first two variations are straightforward alterations to the underlying kinematics of the pseudocandidates, which cause corresponding changes in the three-body mass. Their nuisance parameters are loosely constrained by a Gaussian term in the likelihood, which arbitrarily assigns the \(\pm 1\sigma \) variations to a relatively large change in the shape, and these are subsequently constrained by the data. The third variation is applied directly to the final three-body mass template and left unconstrained in the final fit.

The \(m_{\mathrm {\mu ^+ \mu ^-}}\) distribution for \(\psi (nS)\,\gamma \) candidates that meet the background model generation criteria is shown in Fig. 5a and exhibits clear peaks at the \({J/\psi }\) and \(\psi (2S)\) masses. In Fig. 5b and  c, the corresponding distributions for the selected \(\Upsilon (nS)\,\gamma \) candidates are shown respectively for the B and EC categories, where the individual \(\Upsilon (1S)\),\(\Upsilon (2S)\), and \(\Upsilon (3S)\) peaks can be observed. During the \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) background model generation, a value for \(m_{\mathrm {\mu ^+ \mu ^-}}\) is drawn from the distributions shown in Fig. 5. No significant correlations between \(m_{\mathrm {\mu ^+ \mu ^-}}\) and \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) are observed. Thus, the \(m_{\mathrm {\mu ^+ \mu ^-}}\) distribution itself is modelled independently of \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) using analytical PDFs. The \(\psi (nS)\) and \(\Upsilon (nS)\) peaks are modelled with Gaussian PDFs, while the inclusive background is modelled with a first-order Chebyshev polynomial function; these functions were found to be sufficient to describe the data, given its statistical uncertainty. The parameters of these PDFs are obtained by means of a one-dimensional fit to events selected in the background model generation region, as shown in Fig. 5, and are used to model the \(m_{\mathrm {\mu ^+ \mu ^-}}\) distribution in the maximum-likelihood fit in Sect. 6. It is noted that the slope of the inclusive background is allowed to adapt to the observed data in the fit to the signal region, and that the relative normalisations of the different contributions to \(m_{\mathrm {\mu ^+ \mu ^-}}\) are left free.

6 Statistical methods

The data selected by the signal region criteria are compared with background and signal predictions using a two-dimensional (2D) unbinned maximum-likelihood fit in \(m_{\mathrm {\mu ^+ \mu ^-}\,\gamma }\) and \(m_{\mathrm {\mu ^+ \mu ^-}}\), for events with \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }<300~\text {GeV} \). This allows the \(H(Z)\rightarrow {\mathcal {Q}}\,\gamma \) signals to be distinguished from each other, as well as the from the exclusive and inclusive background contributions. The likelihood function \({{{\mathcal {L}}}}\) for each of the \(\psi (nS)\) and \(\Upsilon (nS)\) analyses is constructed using the signal and background models described in Sects. 4 and 5. The parameters of interest \(\vec {\mu }=\{\mu _i\}\) are the signal strengths, which correspond to the signal rate normalised to the SM expectation, for each of the Higgs and Z boson signals counted by the index i. The likelihood for the \(\psi (nS)\) analysis when n events are observed in the signal region is defined asFootnote 2:

$$\begin{aligned} \begin{aligned} {{{\mathcal {L}}}}(\vec {\mu },\vec {b}, \alpha ,\theta ,\theta ')&= {{{\mathcal {P}}}}\left( n|\sum ^{4}_{i=1}\mu _{i}\cdot s_{i} (\vec {\alpha })\right. \\&\quad \left. +\sum ^{4}_{j=1} b_j\right) \times \prod _{r} \mathcal{G}(\alpha _r|0,1) \\&\quad \times \prod _{k=1}^{n}\left( \sum ^{4}_{i=1}\mathcal{F}^{s}_{i}\cdot {{{\mathcal {S}}}}_i(m^k_{\mathrm {\mu ^+ \mu ^-}\gamma },m^k_\mathrm {\mu ^+ \mu ^-}|\mu _i)\right. \\&\quad \left. + \sum ^{4}_{j=1}{{{\mathcal {F}}}}^{b}_{j}\cdot \mathcal{R}_j(m^k_{\mathrm {\mu ^+ \mu ^-}\gamma }|\vec {\theta })\mathcal{M}_j(m^k_\mathrm {\mu ^+ \mu ^-}|\theta ')\right) \\&\quad \times \prod _{l} {{{\mathcal {G}}}}(\theta _l|0,1) \\ \end{aligned} \end{aligned}$$

In the above equation, \({{{\mathcal {P}}}}\) is the Poisson distribution for n observed events given the total signal and background. The symbol \(s_i(\vec {\alpha })\) denotes the expected SM signal yield for signal i as modified by the nuisance parameters \(\vec {\alpha }\), which correspond to the signal normalisation systematic uncertainties discussed in Sect. 4. These nuisance parameters are counted by index r and are constrained with standard Gaussian PDF terms, \({{{\mathcal {G}}}}\). The normalisation parameters associated with each independent background contribution are denoted by \(\vec {b}=\{b_j\}\). These are not constrained and are determined directly from the fit to the data. In the case of the \(\psi (nS)\) analysis, the background components are exclusive \(q\bar{q}\rightarrow \mathrm {\mu ^+ \mu ^-}\gamma \) production, inclusive \(J/\psi \) decays, inclusive \(\psi (2S)\) decays, and inclusive non-resonant dimuon production.Footnote 3 The symbols \({{{\mathcal {F}}}}^s_i\) and \({{{\mathcal {F}}}}^b_j\) denote the signal i and background j as fractions of the total signal and background, respectively. The shape of signal i in \((m_{\mathrm {\mu ^+ \mu ^-}\gamma },m_\mathrm {\mu ^+ \mu ^-})\) is given by the PDF \({{{\mathcal {S}}}}_i\). Correspondingly, the shape of the background component j is given by the product of PDFs \(\mathcal{R}_j(m_{\mathrm {\mu ^+ \mu ^-}\gamma })\) and \({{{\mathcal {M}}}}_j(m_\mathrm {\mu ^+ \mu ^-})\), since no correlations between \(m_\mathrm {\mu ^+ \mu ^-}\) and \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) are observed in background shape. The nuisance parameters \(\vec {\theta }\) parameterise the systematic variations of the background shapes discussed in Sect. 5. They are counted by index l and are constrained with standard Gaussian terms. The only exception is the nuisance parameter related to the ‘global tilt’ systematic shape variation of the inclusive background, which is not constrained. The nuisance parameter \(\theta '\) corresponds to the slope in \(m_\mathrm {\mu ^+ \mu ^-}\) of the non-resonant component of the inclusive background, which is free in the fit.

Upper limits are set on the branching fractions for the Higgs and Z boson decays into \({{\mathcal {Q}}}\,\gamma \) using the CL\(_\textrm{s}\) modified frequentist formalism [109] with the profile-likelihood-ratio test statistic and the asymptotic approximations derived in Ref. [110]. When setting limits for one of the signals, the other potential signal contributions are treated as nuisance parameters and are profiled in the fit. Only the decays \({\mathcal {Q}}\rightarrow \mathrm {\mu ^+ \mu ^-}\) are considered in the limit setting, feed-down from \(Q'\rightarrow Q + X\) transitions are not included.

The ability of the fit to identify potential signal contributions was verified through signal injection tests. Signals corresponding to a branching fraction of \(5\times 10^{-4}\) and \(5\times 10^{-7}\) for the Higgs and Z boson decays, respectively, were injected into an Asimov dataset [110] constructed from a background-only fit in the signal region. The full fit accurately recovered the injected signals.

7 Results

In total, 3394 events are observed in the \(\psi (nS)\,\gamma \) signal region and 3577 events are observed in the \(\Upsilon (nS)\,\gamma \) signal region. The results of the background-only fits for the \(\psi (nS)\,\gamma \) and \(\Upsilon (nS)\,\gamma \) analyses are shown in Figs. 6 and 7 respectively, where the signal distributions shown correspond to the extracted 95% confidence-level (CL) branching fraction upper limits. The expected and observed numbers of background events within the \(m_{\mathrm {\mu ^+ \mu ^-}\,\gamma }\) ranges relevant to the Higgs and Z boson signals are given in Table 5, where the expected backgrounds are obtained from these background-only fits. The exclusive contribution to the total background in these regions of relevance ranges from approximately 10% for the \(H\rightarrow J/\psi \,\gamma \) and 22% for \(H\rightarrow \psi (2\,S)\,\gamma \) to 21% and 41% for the corresponding Z boson decay searches. For the \(\Upsilon (nS)\,\gamma \) analysis the exclusive contribution ranges from 24% to 29% for the ranges of relevance for the Higgs boson searches, while for the ranges of relevance for the Z boson searches it is between 75% and 79%. Table 5 also shows the expected number of signal events for reference branching fractions near the sensitivity of the analysis: \(10^{-3}\) for \(H\rightarrow {\mathcal {Q}}\,\gamma \) and \(10^{-6}\) for \(Z\rightarrow {\mathcal {Q}}\,\gamma \).

Fig. 6
figure 6

Projection of the \(\psi (nS)\) channel fit in \(m_{\mathrm {\mu ^+ \mu ^-}}\) for the a Z boson and b Higgs boson \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) regions, and in \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) for the c \({J/\psi }\) and d \(\psi (2S)\) \(m_{\mathrm {\mu ^+ \mu ^-}}\) regions. The dimuon/\({{{\mathcal {Q}}}}(nS)\) backgrounds in the legend refer to the inclusive background contribution which is non-resonant/resonant in \(m_\mathrm {\mu ^+ \mu ^-}\). The branching fraction of each of the signals is set to the observed \(95\%\) CL upper limit

Fig. 7
figure 7

Projection of the \(\Upsilon (nS)\) channel fit in \(m_{\mathrm {\mu ^+ \mu ^-}}\) for the a Z boson and b Higgs boson \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) regions, and in \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) for the \(\Upsilon (1S,2S,3S)\) \(m_{\mathrm {\mu ^+ \mu ^-}}\) regions in ce, respectively. The dimuon/\({{{\mathcal {Q}}}}(nS)\) backgrounds in the legend refer to the inclusive background contribution which is non-resonant/resonant in \(m_\mathrm {\mu ^+ \mu ^-}\). The branching fraction of each of the signals is set to the observed \(95\%\) CL upper limit

Table 5 Numbers of observed and expected background events for the \(m_{\mathrm {\mu ^+ \mu ^-}\gamma }\) ranges of interest. Each expected background and the corresponding uncertainty of its mean is obtained from a background-only fit to the data; the uncertainty does not take into account statistical fluctuations in each mass range. Expected Z and Higgs boson signal contributions, with their corresponding total systematic uncertainty, are shown for reference branching fractions of \(10^{-6}\) and \(10^{-3}\), respectively. The ranges in \(m_{\mathrm {\mu ^+ \mu ^-}}\) are centred around each quarkonium resonance, with a width driven by the resolution of the detector; in particular, the ranges for the \(\Upsilon (nS)\) resonances are based on the resolution in the endcaps. It is noted that the discrepancy between the observed and expected backgrounds for \(m_{\mathrm {\mu ^+ \mu ^-}} = 9.0\)\(9.8~\text {GeV} \) in the endcaps was found to have a small impact on the observed limit for \(Z\rightarrow \Upsilon (1S)\,\gamma \)
Table 6 Expected, with the corresponding \(\pm 1\sigma \) intervals, and observed 95% CL branching fraction upper limits for the Higgs and Z boson decays into a quarkonium state and a photon. Standard Model production of the Higgs boson is assumed. The corresponding upper limits on the production cross section times branching fraction \(\sigma \times {{\mathcal {B}}}\) are also shown

From the fit to the observed data, the largest observed local excess is 1.9\(\sigma \) in the search for \(Z\rightarrow {J/\psi }\,\gamma \), followed by a 0.8\(\sigma \) excess in the search for \(H\rightarrow \psi (2S)\,\gamma \). The expected and observed 95% CL upper limits on the branching fractions for Higgs and Z boson decays into a quarkonium state and a photon are presented in Table 6, along with the observed upper limits in terms of Higgs and Z boson production cross section times branching fraction to a quarkonium state and a photon. The expected sensitivity improves by a factor of approximately two relative to the previous ATLAS result presented in Ref. [50]; this is in line with what is expected from the increase in integrated luminosity for this search, as this analysis is limited primarily by statistical uncertainties. The systematic uncertainties in the signal normalisation and background shape described respectively in Sects. 4 and 5 result in a 0.8% increase of the expected 95% CL upper limit on the branching fraction of the \(H\rightarrow {J/\psi }\,\gamma \) decay. For the \(Z\rightarrow {J/\psi }\,\gamma \) decay the effect is larger, 4.2%, mostly due to the systematic uncertainty in the background shape. The increase is 0.1% for \(H\rightarrow \psi (2S)\,\gamma \), and 0.6% for \(Z\rightarrow \psi (2S)\,\gamma \). Similar behaviour is observed in the \(\Upsilon (nS)\,\gamma \) analysis, with systematic uncertainties resulting in a 0.1–0.8% deterioration in the sensitivity to the \(H\rightarrow \Upsilon (1S,2S,3S)\,\gamma \) decays and a 0.3–0.5% deterioration in the sensitivity to the \(Z\rightarrow \Upsilon (1S,2S,3S)\,\gamma \) decays.

For the interpretation of these searches in terms of constraints on the charm- and bottom-quark Yukawa couplings the approach presented in Refs. [31, 38] is employed: The ratio of signal strength (\(\mu \)) measurements for the \(H\rightarrow J/\psi \,\gamma \) and \(H\rightarrow \gamma \gamma \) channels corresponds to the ratio of measurements of their production cross section times branching fraction \(\sigma \times {{\mathcal {B}}}\), normalised to their respective SM expectations. This is, to a good approximation, equal to the ratio of the respective partial decay widths \(\Gamma \) normalised to their SM expectation \(\Gamma ^\text {SM}\), since the dependence on the production mechanism and Higgs total width cancels out. The ratio \(\kappa _c/\kappa _\gamma \) of the coupling modifiers, each of which is the ratio of the coupling to its SM value, for the charm-quark Yukawa coupling and the effective coupling of Higgs boson to photons can be estimated as:

$$\begin{aligned}{} & {} \frac{\mu _{H\rightarrow J/\psi \,\gamma }}{\mu _{H\rightarrow \gamma \gamma }}= \frac{\sigma _H{\mathcal {B}}_{H\rightarrow J/\psi \,\gamma } / \sigma ^\text {SM}_H{\mathcal {B}}^\text {SM}_{H\rightarrow J/\psi \,\gamma }}{\sigma _H{\mathcal {B}}_{H\rightarrow \gamma \gamma }/\sigma ^\text {SM}_H{\mathcal {B}} ^\text {SM}_{H\rightarrow \gamma \gamma }} \\{} & {} \quad \approx \frac{\Gamma _{H\rightarrow J/\psi \,\gamma }/\Gamma ^\text {SM}_{H\rightarrow J/\psi \,\gamma }}{\Gamma _{H\rightarrow \gamma \gamma }/\Gamma ^\text {SM}_{H\rightarrow \gamma \gamma }}=\frac{|\mathcal{A}_\text {ind}+\mathcal{A}_\text {dir}\kappa _c/\kappa _\gamma |^2}{\Gamma ^\text {SM}_{H\rightarrow J/\psi \,\gamma }} \end{aligned}$$

The indirect and direct amplitudes \({{{\mathcal {A}}}}\) for \(H\rightarrow J/\psi \,\gamma \) and \(H\rightarrow \Upsilon (nS)\,\gamma \) interfere destructively and are obtained from Ref. [36].Footnote 4 The signal strength for \(H\rightarrow \gamma \gamma \) is obtained from Ref. [111]. An observed 95% CL interval of \((-133,175)\) is obtained for \(\kappa _c/\kappa _\gamma \), with the expected interval being \((-120, 161)\). The interval is dominated by the statistical uncertainty of the \(H\rightarrow {J/\psi }\,\gamma \) search. The correlated components in the uncertainties of the two measurements were removed, but this had negligible impact. The theoretical uncertainties of the amplitudes result in a widening of the obtained interval by approximately 8%, mainly through the uncertainty in the real part of the direct amplitude. Furthermore, the magnitude of the direct amplitude, which is sensitive to the charm-quark coupling to the Higgs boson, is significantly smaller in the most recent theory calculations [34, 36] than in earlier ones [29], leading to much weaker constraints. Very large values of \(\kappa _c\) lead to tensions with other ATLAS [112] and CMS [113] measurements of Higgs boson couplings [114]. A similar relation can be written for the ratio \(\kappa _b/\kappa _\gamma \), where \(\kappa _b\) is the coupling modifier for the bottom-quark Yukawa coupling. Combining the three \(\Upsilon (nS)\,\gamma \) decays, and accounting for the \(-21\%\) correlation between \(\mu _{H\rightarrow \Upsilon (2S)\,\gamma }\) and \(\mu _{H\rightarrow \Upsilon (3S)\,\gamma }\), a 95% CL interval of \((-37,40)\) is obtained for \(\kappa _b/\kappa _\gamma \). The corresponding expected 95% CL interval is \((-37,39)\). The \(\Upsilon (1S)\,\gamma \) decay contributes most of the sensitivity to \(\kappa _b/\kappa _\gamma \) thanks to its indirect amplitude being the largest amongst the \(\Upsilon (nS)\,\gamma \) decays. Similarly to the \(\kappa _c/\kappa _\gamma \) case, the statistical uncertainty of the search for exclusive Higgs boson decays into \(\Upsilon (nS)\,\gamma \) dominates the interval. The theoretical uncertainties of the amplitudes enlarge the interval by 12%.

8 Summary

Searches for the exclusive decays of Higgs and Z bosons into a vector quarkonium state and a photon have been performed with a \(\sqrt{s} =13~\text {TeV} \) pp collision data sample collected with the ATLAS detector at the LHC and corresponding to an integrated luminosity of \(139~\text{ fb}^{-1} \). The observed data are compatible with the background expectations. The 95% CL upper limits obtained for the \({J/\psi }\,\gamma \) final state are \(\mathcal{B}\left( H\rightarrow {J/\psi }\,\gamma \right) <2.0\times 10^{-4}\) and \(\mathcal{B}\left( Z\rightarrow {J/\psi }\,\gamma \right) < 1.2\times 10^{-6}\). The corresponding upper limits for the \(\psi (2S)\,\gamma \) final state are \({{{\mathcal {B}}}}\left( H\rightarrow \psi (2S)\,\gamma \right) <10.5\times 10^{-4}\) and \({{{\mathcal {B}}}}\left( Z\rightarrow \psi (2S)\,\gamma \right) < 2.4\times 10^{-6}\). The 95% CL upper limits \(\mathcal{B}\left( H\rightarrow \Upsilon (nS)\,\gamma \right) < (2.5,4.2,3.4)\times 10^{-4}\) and \(\mathcal{B}\left( Z\rightarrow \Upsilon (nS)\,\gamma \right) < (1.1,1.3,2.4)\times 10^{-6}\) are set for the \(\Upsilon (1S,2S,3S)\,\gamma \) final states. These upper limits represent an improvement by a factor of approximately two relative to the previous results from the ATLAS Collaboration using \(36.1~\text{ fb}^{-1} \) of \(\sqrt{s}=13~\text {TeV} \) pp collision data. Further, constraints are set on the ratio \(\kappa _c/\kappa _\gamma \) and \(\kappa _b/\kappa _\gamma \) of the Higgs boson coupling modifiers. An observed 95% CL interval of \((-133,175)\) is obtained for \(\kappa _c/\kappa _\gamma \), and a 95% CL interval of \((-37,40)\) is obtained for \(\kappa _b/\kappa _\gamma \).