1 Introduction

A new boson with a mass of 125\(\,\text {GeV}\) was observed in data from the ATLAS and CMS experiments at the CERN LHC [1,2,3,4,5,6,7]. All measurements of the properties of this boson are consistent with those of the Higgs boson (\(\mathrm {H} \)) of the standard model (SM). However, the Yukawa couplings of the Higgs boson to the first- and second-generation quarks are currently only weakly constrained. Rare exclusive decays of the Higgs boson to mesons in association with a photon can be used to explore such couplings. For example, the \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) decay can probe the Higgs boson coupling to the charm quark [8]. The corresponding decay, \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \), can be used as an experimental benchmark in the search for \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) [9, 10], and in checking approaches to factorization in quantum chromodynamics (QCD) used to estimate branching fractions (\(\mathcal {B}\)) in radiative decays of electroweak bosons [11].

Both \(\mathrm {Z}\) and Higgs boson decays receive contributions from direct and indirect processes. In the direct process, \(\mathrm {Z}\) and Higgs bosons couple to charm quarks, and charm quarks then hadronize to form \({\mathrm {J}/\psi } \) mesons. In the indirect process, the \(\mathrm {Z}\) and Higgs bosons decay through quark or \(\mathrm {W}\) boson loops to \(\gamma \gamma ^{*}\), and the \(\gamma ^{*}\) then converts to a \(\mathrm {c} \overline{\mathrm {c}} \) resonant state. The lowest order Feynman diagrams for these decay modes are shown in Fig. 1. The latest SM calculations of the branching fractions of both decays, taking into account the interference between direct and indirect processes, are [12, 13]:

$$\begin{aligned} \mathcal {B}_{\text {SM}}(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma ) = (9.0^{+1.5}_{-1.4})\times 10^{-8}, \end{aligned} \tag{1}$$
$$\begin{aligned} \mathcal {B}_{\text {SM}}(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma ) = (3.0^{+0.2}_{-0.2})\times 10^{-6}. \end{aligned} \tag{2}$$
Fig. 1 Lowest order Feynman diagrams for the \(\mathrm {Z}\) (or \(\mathrm {H} \))\(\rightarrow {\mathrm {J}/\psi } \gamma \) decay. The left-most diagram shows the direct and the remaining diagrams the indirect processes

Modified \(\mathrm {H} \mathrm {c} \overline{\mathrm {c}} \) couplings can arise in certain extensions of the SM [14]. For example, within the context of effective field theory, the \(\mathrm {H} \mathrm {c} \overline{\mathrm {c}} \) coupling may be modified in the presence of a dimension-six operator, leading to an enhancement of coupling relative to the SM at the cutoff scale \(\Lambda \) that can be as small as 30\(\,\text {TeV}\). This provides no other signature of new physics at the LHC. In the two Higgs doublet model with minimal flavor violation [15, 16], the \(\mathrm {H} \mathrm {c} \overline{\mathrm {c}} \) coupling can be significantly enhanced by breaking flavor symmetry, while other couplings are not severely affected. The composite pseudo-Nambu-Goldstone boson model [17] parametrizes the coupling by the degree of compositeness and compositeness scale. The coupling can be constrained through a direct experimental search for the composite particles associated with the charm quark [18].

Deviations from SM predictions for the couplings can affect the interference terms and result in changes to the branching fractions. For example, the shift in the branching fraction for \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) can be more than 100% if the \(\mathrm {H} \mathrm {c} \overline{\mathrm {c}} \) coupling deviates from its SM value by more than a factor of 2 [8]. Since this Higgs boson decay is sensitive to the \(\mathrm {H} \mathrm {c} \overline{\mathrm {c}} \) coupling, a measurement of the branching fraction can verify whether the Higgs boson couples to second-generation quarks with the strength predicted by the SM.

The ATLAS experiment has searched for the decay \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \) in proton-proton (\(\mathrm {p}\mathrm {p}\)) collisions collected at \(\sqrt{s}=8\,\text {TeV} \) [19]. The respective observed and expected upper limits at 95% confidence level (\(\text {CL}\)) on the branching fraction were reported to be 2.6 and \(2.0^{+1.0}_{-0.6}\times 10^{-6}\), where the subscript and superscript give the 68% central-quantile range of upper limits expected under the background-only hypothesis. Searches for the \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) decay were performed by ATLAS and CMS in \(\mathrm {p}\mathrm {p}\) collisions collected at \(\sqrt{s}=8\,\text {TeV} \) [19, 20]. The respective observed and expected upper limits on the branching fraction were 1.5 and \(1.2^{+0.6}_{-0.3}\times 10^{-3}\) from ATLAS, and 1.5 and \(1.6^{+0.8}_{-0.8}\times 10^{-3}\) from CMS. The ATLAS experiment performed similar searches for both the \(\mathrm {Z}\) and Higgs boson decays in \(\mathrm {p}\mathrm {p}\) collisions collected at \(\sqrt{s}=13\,\text {TeV} \). The respective observed and expected upper limits on the branching fractions were 2.3 and \(1.1^{+0.5}_{-0.3}\times 10^{-6}\) for the \(\mathrm {Z}\) boson decay, and 3.5 and \(3.0^{+1.4}_{-0.8}\times 10^{-4}\) for the Higgs boson decay [21]. The ATLAS experiment also searched for the \(\mathrm {H} \rightarrow \mathrm {c} \overline{\mathrm {c}} \) decay in \(\mathrm {p}\mathrm {p}\rightarrow \mathrm {Z}\mathrm {H} \) production in data collected at \(\sqrt{s}=13\,\text {TeV} \) [22], and reported observed and expected upper limits on \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {Z}\mathrm {H})\times \mathcal {B}(\mathrm {H} \rightarrow \mathrm {c} \overline{\mathrm {c}} )\) of 110 and \(150^{+80}_{-40}\) times the SM prediction, respectively.

The results presented in this paper are based on \(\mathrm {p}\mathrm {p}\) collisions at \(\sqrt{s}=13\,\text {TeV} \) recorded with the CMS detector, corresponding to an integrated luminosity of 35.9\(\,\text {fb}^{-1}\).

2 The CMS detector

A detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [23]. The central feature of the CMS apparatus is a superconducting solenoid, 13 m in length and 6 m in internal diameter, providing an axial magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (\(\eta \)) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.

The silicon tracker measures charged particles within the range \(|\eta | < 2.5\). It consists of 1440 silicon pixel and 15 148 silicon strip detector modules. For non-isolated particles with transverse momentum, \(p_{\mathrm {T}}\), between 1 and 10\(\,\text {GeV}\) and \(|\eta | < 1.4\), the track resolutions are typically 1.5% in \(p_{\mathrm {T}}\) and 25–90 (45–150) \(\mu \)m in the transverse (longitudinal) direction [24].

The ECAL consists of 75 848 crystals, which provide coverage in \(|\eta | < 1.479\) in the barrel region (EB) and \(1.479< |\eta | < 3.000\) in the two endcap regions (EE). The preshower detectors, each consisting of two planes of silicon sensors interleaved with a total of \(3X_{0}\) of lead, are located in front of the EE [25, 26]. In the barrel section of the ECAL, an energy resolution of about 1% is achieved for unconverted or late-converting photons in the tens of \(\,\text {GeV}\) energy range. The remaining barrel photons have a resolution of about 1.3% up to \(|\eta | = 1\), rising to about 2.5% at \(|\eta | = 1.4\). In the endcaps, the resolution of unconverted or late-converting photons is about 2.5%, while the remaining endcap photons have a resolution between 3 and 4% [26].

Muons are measured in the range \(|\eta | < 2.4\), with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive plate chambers. Matching muons to tracks measured in the silicon tracker results in a relative \(p_{\mathrm {T}}\) resolution, for muons with \(p_{\mathrm {T}}\) up to 100\(\,\text {GeV}\), of 1% in the barrel and 3% in the endcaps. The \(p_{\mathrm {T}}\) resolution in the barrel is better than 7% for muons with \(p_{\mathrm {T}}\) up to 1\(\,\text {TeV}\)  [27].

A two-tier trigger system selects collision events of interest. The first level (L1) of the CMS trigger system [28], composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a fixed time interval of less than 4 \(\upmu \)s. The high-level trigger (HLT) processor farm further decreases the event rate from around 100 kHz to less than 1 kHz, before data storage.

3 Data and simulated samples

The L1 trigger requires the presence of a muon with \(p_{\mathrm {T}}\) greater than 5\(\,\text {GeV}\) and an isolated electromagnetic object with \(p_{\mathrm {T}}\) greater than 18\(\,\text {GeV}\). The HLT algorithm requires the presence of a muon and a photon with \(p_{\mathrm {T}}\) exceeding 17 and 30\(\,\text {GeV}\), respectively. No isolation requirement is imposed on the muons because of the small angular separation expected between the muons in signal events. No further isolation constraint is required for the photon. The trigger efficiency for events satisfying the selection used in the analysis is determined using a high-purity (\(\sim 97\%\)) \(\mathrm {Z}\rightarrow \mu \mu \gamma \) control sample; it is measured to be \(82\pm 0.7\%\) in data and \(83\pm 0.4\%\) in simulated events.

Simulated samples of the \(\mathrm {Z}\) and Higgs boson decays are used to estimate the expected signal yields and model the kinematic distributions of signal events. The \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \rightarrow \mu \mu \gamma \) sample, with \(m_{\mathrm {Z}}=91.2\,\text {GeV} \) [29], is produced with the PYTHIA 8.226 Monte Carlo (MC) event generator [30, 31], with hadronization and fragmentation using the underlying event tune CUETP8M1 [32]. The parton distribution function (PDF) set used is NNPDF3.0 [33]. The SM \(\mathrm {Z}\) boson production cross section includes the next-to-next-to-leading order (NNLO) QCD contributions, and the next-to-leading order (NLO) electroweak corrections from FEWZ 3.1 [34] calculated using the NLO PDF set NNPDF3.0. The \(\mathrm {Z}\) boson \(p_{\mathrm {T}} \) is reweighted to match the NLO calculation [35,36,37].

The \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \rightarrow \mu \mu \gamma \) sample with \(m_{\mathrm {H}}=125\,\text {GeV} \) is produced with the POWHEG v2.0 MC event generator [35, 36] and includes gluon-gluon fusion (\(\mathrm {g} \mathrm {g} \)F), vector boson fusion (VBF), associated vector boson production (V\(\mathrm {H} \)), and associated top quark pair production (\({\mathrm {t}\overline{\mathrm {t}}} \mathrm {H} \)). The generator is interfaced with PYTHIA 8.212 [30, 31] for hadronization and fragmentation with tune CUETP8M1. The PDF set used is NNPDF3.0. The SM Higgs boson cross section is taken from the LHC Higgs cross section working group recommendations [38].

In the SM, the \({\mathrm {J}/\psi } \) meson from the Higgs boson decay must be fully transversely polarized in the helicity frame (\(\lambda _\theta = +1\), as described in Ref. [39]), because the Higgs boson has spin 0, and the photon is transversely polarized. Since the polarization of the \({\mathrm {J}/\psi } \) meson is not correctly simulated in the signal samples, a reweighting factor is applied to each event to emulate the effect of polarization. The reweighting procedure results in a decrease of the signal acceptance by \(7.0\%\). For the \(\mathrm {Z}\) boson decay, the helicity of the \({\mathrm {J}/\psi } \) meson depends on that of the \(\mathrm {Z}\) boson, which can have multiple helicity states. The results from the \(\mathrm {Z}\) boson polarization measurement [40, 41] are not used to constrain the helicity of the \({\mathrm {J}/\psi } \) meson in this analysis. The nominal results are obtained using a signal acceptance calculated for the unpolarized case. Assuming that the \({\mathrm {J}/\psi } \) is produced with full transverse or longitudinal polarization (\(\lambda _\theta = +1\) or \(-1\)) changes the acceptance by \(-7.8\%\) or \(+15.6\%\), respectively.
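The polarization reweighting described above can be illustrated with a minimal sketch. It assumes that the simulated \({\mathrm {J}/\psi } \) decays are isotropic in \(\cos \theta \) (the helicity-frame polar angle of one muon) and that the target distribution is proportional to \(1+\lambda _\theta \cos ^2\theta \). The function names, the toy sample, and the \(|\cos \theta |<0.8\) acceptance cut are hypothetical stand-ins, not the procedure or selection used in the analysis, so the numbers it yields are not expected to reproduce the acceptance changes quoted in the text.

```python
def polarization_weight(cos_theta, lam):
    """Per-event weight that reshapes an isotropic cos(theta) spectrum
    into one proportional to 1 + lam * cos(theta)**2.

    The factor 3 / (3 + lam) keeps the average weight of an isotropic
    sample equal to 1, so only the shape changes, not the normalization.
    """
    return 3.0 * (1.0 + lam * cos_theta ** 2) / (3.0 + lam)


def weighted_acceptance(cos_thetas, accepted, lam):
    """Fraction of the reweighted sample that passes the selection."""
    weights = [polarization_weight(c, lam) for c in cos_thetas]
    passed = sum(w for w, ok in zip(weights, accepted) if ok)
    return passed / sum(weights)


# Toy isotropic sample: evenly spaced cos(theta) values in (-1, 1).
N = 1000
cos_thetas = [-1.0 + (2 * i + 1) / N for i in range(N)]

# Hypothetical stand-in for the detector acceptance: events with large
# |cos(theta)| (one very soft muon) are assumed to fail the selection.
accepted = [abs(c) < 0.8 for c in cos_thetas]

acc_unpolarized = weighted_acceptance(cos_thetas, accepted, 0.0)
acc_transverse = weighted_acceptance(cos_thetas, accepted, +1.0)
acc_longitudinal = weighted_acceptance(cos_thetas, accepted, -1.0)
```

In this toy setup the transverse case populates large \(|\cos \theta |\) more strongly and therefore loses acceptance, while the longitudinal case gains it, which is the same direction as the shifts reported above.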

Fig. 2 The lowest order Feynman diagrams for the Drell-Yan process in \(\mathrm {p}\mathrm {p}\rightarrow \mathrm {Z}\rightarrow \mu \mu \gamma \). The background exhibits a peak in \(m_{\mu \mu \gamma }\) at the \(\mathrm {Z}\) boson mass

The Drell-Yan process, \(\mathrm {p}\mathrm {p}\rightarrow \mathrm {Z}\rightarrow \mu \mu \gamma \), produces the same final state as the signal. This process exhibits a peak at the \(\mathrm {Z}\) boson mass, \(m_{\mathrm {Z}}\), in the three-body invariant mass, \(m_{\mu \mu \gamma }\), as do the signal events, and it is therefore referred to as a resonant background. This background is included when deriving the upper limit on the branching fraction for \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \). The lowest order Feynman diagrams for the \(\mathrm {p}\mathrm {p}\rightarrow \mathrm {Z}\rightarrow \mu \mu \gamma \) process are shown in Fig. 2. The MadGraph5_aMC@NLO 2.6.0 matrix element generator [37] is used to generate a sample of these resonant background events at leading order with the NNPDF3.0 PDF set, interfaced with PYTHIA 8.226 for parton showering and hadronization with tune CUETP8M1. The photons in these events are all produced in final-state radiation from the \(\mathrm {Z}\rightarrow \mu \mu \) decay, and therefore the \(m_{\mu \mu \gamma }\) distribution peaks at the \(\mathrm {Z}\) boson mass without a continuum contribution.

Fig. 3 The lowest order Feynman diagrams for the Higgs boson Dalitz decay of \(\mathrm {H} \rightarrow \gamma ^{*}\gamma \rightarrow \mu \mu \gamma \). The background exhibits a peak in \(m_{\mu \mu \gamma }\) at the Higgs boson mass

Similarly, the Higgs boson Dalitz decay [42], \(\mathrm {H} \rightarrow \gamma ^{*}\gamma \rightarrow \mu \mu \gamma \), is a resonant background to the \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) decay. The lowest order Feynman diagrams for the \(\mathrm {H} \rightarrow \gamma ^*\gamma \) process are shown in Fig. 3. Samples of the Higgs boson Dalitz decays, produced via the \(\mathrm {g} \mathrm {g} \)F, VBF, and V\(\mathrm {H} \) modes for \(m_{\mathrm {H}}=125\,\text {GeV} \), are simulated at NLO using the MadGraph5_aMC@NLO generator interfaced with PYTHIA 8.212 for parton showering and hadronization. The \({\mathrm {t}\overline{\mathrm {t}}} \mathrm {H} \) contribution is accounted for by scaling the VBF signal to the \({\mathrm {t}\overline{\mathrm {t}}} \mathrm {H} \) production cross section. The branching fraction for \(\mathrm {H} \rightarrow \gamma ^{*}\gamma \) is obtained from the MCFM 7.0.1 program [43]. The other source of resonant background is the decay of a Higgs boson into two muons with a photon radiated from one of the muons. After the event selection, described in Sect. 4, the contribution of this background is negligible.

There are also background processes that do not give resonant peaks in the three-body invariant mass spectrum. These are referred to as nonresonant backgrounds. These processes include: (1) inclusive quarkonium production associated with either jets or photons, where energetic jets can be misidentified as photons (\(\mathrm {p}\mathrm {p}\rightarrow {\mathrm {J}/\psi } +\text {jets}/\gamma \)), (2) the Drell-Yan process with associated jets (\(\mathrm {p}\mathrm {p}\rightarrow \mathrm {Z}/\gamma ^{*}+\text {jets}\)), and (3) photon plus jets production (\(\mathrm {p}\mathrm {p}\rightarrow \gamma +\text {jets}\)). These nonresonant backgrounds, which are discussed in Sect. 5, are modeled using fits to the \(m_{\mu \mu \gamma }\) distributions in data.

All generated events are processed through a detailed simulation of the CMS detector based on Geant4 [44]. Simultaneous \(\mathrm {p}\mathrm {p}\) interactions that overlap the event of interest (pileup) are included in the simulated samples. The distribution of the number of additional pileup interactions per event in the simulation corresponds to that observed in the 13\(\,\text {TeV}\) data collected in 2016.

4 Event reconstruction and selection

The global event reconstruction (also called particle-flow event reconstruction [45]) reconstructs and identifies each individual particle in an event with an optimized combination of all subdetector information. In this process, the identification of the particle type (photon, electron, muon, charged hadron or neutral hadron) plays an important role in the determination of the particle direction and energy. Photons (e.g., coming from \(\mathrm {\pi ^0}\) decays or from electron bremsstrahlung) are identified as ECAL energy clusters not linked to the extrapolation to the ECAL of any charged particle trajectory. Electrons are identified as a primary charged particle track with one or more ECAL energy clusters consistent with the extrapolation of this track to the ECAL or with bremsstrahlung photons emitted as the electron passes through the tracker material. Muons (e.g., from \(\mathrm {b}\)-hadron semileptonic decays) are identified as a track in the central tracker consistent with either a track or several hits in the muon system, and associated with calorimeter deposits compatible with the muon hypothesis. Charged hadrons are identified as charged particle tracks that are not identified as electrons or muons. Finally, neutral hadrons are identified as either HCAL energy clusters not linked to any charged hadron trajectory or ECAL and HCAL energy excesses with respect to any expected charged hadron energy deposit.

The high instantaneous luminosity of the LHC results in multiple \(\mathrm {p}\mathrm {p}\) interactions per bunch crossing. The reconstructed vertex with the largest value of summed physics-object \(p_{\mathrm {T}} ^2\) is the primary \(\mathrm {p}\mathrm {p}\) interaction vertex. The physics objects are the jets, clustered using the anti-\(k_{\mathrm {T}} \) jet finding algorithm [46, 47] with the tracks assigned to the vertex as inputs, and the associated missing \(p_{\mathrm {T}}\), taken as the negative vector \(p_{\mathrm {T}}\) sum of those jets.

Photon and electron candidates are reconstructed by summing and clustering the energy deposits in the ECAL crystals. Groups of these clusters, called superclusters, are combined to recover the bremsstrahlung energy of electrons and converted photons passing through the tracker. In the endcaps, preshower energy is added in the region covered by the preshower (\(1.65<|\eta | <2.60\)). The clustering algorithms result in an almost complete recovery of the energy of photons.

A multivariate discriminant is used to identify photon candidates. The inputs to the discriminant are the isolation variables, the ratio of hadronic energy in the HCAL towers behind the superclusters to the electromagnetic energy in the superclusters, and the transverse width of the electromagnetic shower. A conversion-safe electron veto [26], which requires no charged-particle track with a hit in the inner layer of the pixel detector pointing to the photon cluster in the ECAL, is applied to avoid misidentifying an electron as a converted photon. Photons are required to be reconstructed within the region \(|\eta | < 2.5\), although those in the ECAL transition region \(1.44<|\eta |<1.57\) are excluded from the analysis. The efficiency of the photon identification procedure is measured with \(\mathrm {Z}\rightarrow \mathrm {e}\mathrm {e}\) events using “tag-and-probe” techniques [48], and is 84–91 (77–94)%, depending on the transverse energy \(E_{\mathrm {T}} \), in the barrel (endcap). The electron veto efficiencies are measured with \(\mathrm {Z}\rightarrow \mu \mu \gamma \) events, where the photon is produced by final-state radiation, and found to be 98 (94)% in the barrel (endcap).

Muons are reconstructed by combining information from the silicon tracker and the muon system [49]. The matching between the inner and outer tracks proceeds either outside-in, starting from a track in the muon system, or inside-out, starting from a track in the silicon tracker. In the latter case, tracks that match track segments in only one or two planes of the muon system are also included in the analysis to ensure that very low-\(p_{\mathrm {T}}\) muons that may not have sufficient energy to penetrate the entire muon system are retained. Muons reconstructed only in the muon system are not retained for the analysis. In order to avoid reconstructing a single muon as multiple muons, whenever two muons share more than half of their segments, the one with lower reconstruction quality is removed. The compatibility with a minimum ionizing particle signature expected in the calorimeters is taken into account [50]. Muons with \(p_{\mathrm {T}} >4\,\text {GeV} \) and \(|\eta | <2.4\) are accepted.

To suppress muons originating from in-flight decays of hadrons, the impact parameter of each muon track, defined as its distance of closest approach to the primary event vertex position, is required to be less than 0.5 (1.0) cm in the transverse (longitudinal) plane. In addition, the three-dimensional impact parameter is required to be less than four times its uncertainty. A cone of size \(\varDelta R = \sqrt{\smash [b]{(\varDelta \phi )^2 + (\varDelta \eta )^2}} = 0.3\) is constructed around the momentum direction of each muon candidate, where \(\phi \) is the azimuthal angle in radians. The relative isolation variable for the muons is defined by summing the \(p_{\mathrm {T}}\) of all photons, charged hadrons, and neutral hadrons within this cone, correcting for additional underlying event activity due to pileup events [51], and then dividing by the muon \(p_{\mathrm {T}}\):

$$\begin{aligned} \mathcal {I}^{\mu } \equiv \left( \sum p_{\mathrm {T}} ^{\text {charged}} + \max \left[ 0, \sum p_{\mathrm {T}} ^{\text {neutral}} + \sum p_{\mathrm {T}} ^{\mathrm {\gamma }} - p_{\mathrm {T}} ^{\text {PU}}(\mu ) \right] \right) / p_{\mathrm {T}} ^{\mu }, \end{aligned} \tag{3}$$

where \(p_{\mathrm {T}} ^{\text {PU}}(\mu )\equiv 0.5\sum _{i} p_{\mathrm {T}} ^{\text {PU},i}\), and i runs over the momenta of the charged-hadron particle-flow candidates not originating from the primary vertex. The \(\sum p_{\mathrm {T}} ^\text {charged}\) is the scalar \(p_{\mathrm {T}}\) sum of charged hadrons originating from the primary event vertex. The \(\sum p_{\mathrm {T}} ^\text {neutral}\) and \(\sum p_{\mathrm {T}} ^{\mathrm {\gamma }}\) are the scalar \(p_{\mathrm {T}}\) sums of neutral hadrons and photons, respectively. The requirement \(\mathcal {I}^{\mu } < 0.35\) is imposed on the leading muon to reject muons from electroweak decays of hadrons within jets or any jets that punch through the calorimeters mimicking a muon signature. The angular separation \(\varDelta R\) between the two muons is small because of their low invariant mass, \(m_{\mu \mu }\), and the high \(p_{\mathrm {T}}\) of the \({\mathrm {J}/\psi } \) meson from the decay of the \(\mathrm {Z}\) or Higgs boson. Therefore, no isolation requirement is applied to the subleading muons since they are within the isolation cone of the leading muon in most events. The momentum of the subleading muon is excluded from the isolation calculation. The efficiency of identification is measured in \(\mathrm {Z}\rightarrow \mu \mu \) and \({\mathrm {J}/\psi } \rightarrow \mu \mu \) events using the tag-and-probe method, and is 94–98 (92–97)% in the barrel (endcap), depending on muon \(p_{\mathrm {T}}\) and \(\eta \). The isolation efficiency, which is \(p_{\mathrm {T}} \) dependent, is measured to be 90–100 (92–100)% in the barrel (endcap), and is consistent with the measurement from \(\mathrm {Z}\rightarrow \mu \mu \) events.
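The isolation definition in Eq. (3) can be written out as a short sketch. The dictionary layout of the muon and particle-flow candidates (`pt`, `eta`, `phi`, `kind`, `from_pv`) is a hypothetical representation chosen for illustration and is not the CMS event data format; the cone size of 0.3, the factor 0.5 in the pileup term, and the 0.35 threshold are taken from the text.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(dphi^2 + deta^2), with dphi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def relative_isolation(muon, candidates, cone=0.3):
    """Pileup-corrected relative isolation of Eq. (3).

    `candidates` is a list of dicts with keys pt, eta, phi, kind
    ('charged', 'neutral' or 'photon') and, for charged hadrons,
    from_pv (True if associated with the primary vertex).
    """
    sums = {"charged": 0.0, "neutral": 0.0, "photon": 0.0, "pileup": 0.0}
    for cand in candidates:
        if delta_r(muon["eta"], muon["phi"], cand["eta"], cand["phi"]) >= cone:
            continue  # outside the isolation cone
        kind = cand["kind"]
        if kind == "charged" and not cand["from_pv"]:
            kind = "pileup"  # charged hadrons from other vertices
        sums[kind] += cand["pt"]
    # Neutral component, corrected by half the charged pileup sum and
    # clipped at zero so the correction cannot make it negative.
    neutral_term = max(0.0, sums["neutral"] + sums["photon"] - 0.5 * sums["pileup"])
    return (sums["charged"] + neutral_term) / muon["pt"]


muon = {"pt": 20.0, "eta": 0.0, "phi": 0.0}
candidates = [
    {"pt": 2.0, "eta": 0.1, "phi": 0.0, "kind": "charged", "from_pv": True},
    {"pt": 4.0, "eta": 0.0, "phi": 0.1, "kind": "charged", "from_pv": False},
    {"pt": 3.0, "eta": 0.2, "phi": 0.0, "kind": "neutral"},
    {"pt": 1.0, "eta": 0.0, "phi": -0.1, "kind": "photon"},
    {"pt": 50.0, "eta": 1.0, "phi": 0.0, "kind": "charged", "from_pv": True},
]
iso = relative_isolation(muon, candidates)
is_isolated = iso < 0.35
```

Here the last candidate lies outside the cone and does not contribute. Muons are not among the summed particle types, so in this sketch the subleading muon is left out of the sum by construction.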

Table 1 The number of observed events in data, the expected signal yields, the expected nonresonant background with uncertainties estimated from the fit (described in Sect. 5), and the expected resonant background (see Sect. 3) contribution in the ranges \(81< m_{\mu \mu \gamma }<101\,\text {GeV}\) and \(120< m_{\mu \mu \gamma }<130\,\text {GeV}\) for the \(\mathrm {Z}\) and \(\mathrm {H} \) boson searches, respectively

Signal candidates are selected by applying additional selection criteria to events containing at least two muons and one photon. The two muons must have opposite charges and \(p_{\mathrm {T}} >20\ (4)\,\text {GeV} \) for the leading (subleading) muon. The \(p_{\mathrm {T}}\) requirement for the leading muon is driven by the trigger threshold. The requirement that the photon has \(E_{\mathrm {T}} >33\,\text {GeV} \) is also driven by the trigger threshold. The angular separation of each muon from the photon is required to satisfy \(\varDelta R>1\) in order to suppress Drell-Yan background events with final-state radiation. To ensure that the dimuon \({\mathrm {J}/\psi } \) candidate is well-separated from the photon, events are required to have \(\varDelta R(\mu \mu ,\gamma ) > 2\) and \(|\varDelta \phi (\mu \mu ,\gamma ) |>1.5\). Both the photon and dimuon momenta must satisfy \(p_{\mathrm {T}}/m_{\mu \mu \gamma }>0.38\ (0.28)\) for the \(\mathrm {Z}\) (\(\mathrm {H} \)) boson decay. This constraint helps to reject the \(\gamma ^*+\)jet and \(\gamma +\)jet backgrounds, with minimal effect on the signal efficiency and \(m_{\mu \mu \gamma }\) spectrum. Events in which the mass of the two muons is consistent with the mass of the \({\mathrm {J}/\psi } \) meson [29], \(3.0<m_{\mu \mu }<3.2\,\text {GeV} \), are retained. In addition, only events with a three-body invariant mass in the range of \(70\ (100)< m_{\mu \mu \gamma } < 120\ (150)\,\text {GeV} \) are considered in the \(\mathrm {Z}\ (\mathrm {H})\) boson search.
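The kinematic requirements listed above can be collected into a single function, sketched below. The event representation (dicts with `pt`, `eta`, `phi`, `charge`) and the helper names are hypothetical; the photon is treated as massless so that its \(E_{\mathrm {T}}\) equals its \(p_{\mathrm {T}}\), and only the requirements of this paragraph are encoded (identification, isolation, and \(\eta \) acceptance are assumed to have been applied upstream).

```python
import math

MUON_MASS = 0.10566  # GeV

# Channel-dependent thresholds quoted in the text.
CHANNELS = {
    "Z": {"pt_over_m": 0.38, "mass_window": (70.0, 120.0)},
    "H": {"pt_over_m": 0.28, "mass_window": (100.0, 150.0)},
}


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) in GeV."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + mass * mass), px, py, pz)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def inv_mass(v):
    e, px, py, pz = v
    return math.sqrt(max(0.0, e * e - px * px - py * py - pz * pz))


def pt_eta_phi(v):
    _, px, py, pz = v
    pt = math.hypot(px, py)
    return pt, math.asinh(pz / pt), math.atan2(py, px)


def delta_phi(phi1, phi2):
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def passes_selection(mu1, mu2, photon, channel):
    cfg = CHANNELS[channel]
    lead, sub = sorted((mu1, mu2), key=lambda m: m["pt"], reverse=True)
    if mu1["charge"] * mu2["charge"] >= 0:
        return False  # opposite charges required
    if lead["pt"] <= 20.0 or sub["pt"] <= 4.0 or photon["pt"] <= 33.0:
        return False  # trigger-driven thresholds
    for m in (mu1, mu2):
        if delta_r(m["eta"], m["phi"], photon["eta"], photon["phi"]) <= 1.0:
            return False  # suppress final-state radiation
    dimuon = add(four_vector(mu1["pt"], mu1["eta"], mu1["phi"], MUON_MASS),
                 four_vector(mu2["pt"], mu2["eta"], mu2["phi"], MUON_MASS))
    pt_mm, eta_mm, phi_mm = pt_eta_phi(dimuon)
    if delta_r(eta_mm, phi_mm, photon["eta"], photon["phi"]) <= 2.0:
        return False
    if abs(delta_phi(phi_mm, photon["phi"])) <= 1.5:
        return False
    m_mmg = inv_mass(add(dimuon, four_vector(photon["pt"], photon["eta"], photon["phi"])))
    if min(pt_mm, photon["pt"]) / m_mmg <= cfg["pt_over_m"]:
        return False
    if not 3.0 < inv_mass(dimuon) < 3.2:
        return False  # J/psi mass window
    lo, hi = cfg["mass_window"]
    return lo < m_mmg < hi


# Toy Z-like event: a collimated dimuon recoiling against a photon.
mu_plus = {"pt": 30.0, "eta": 0.0, "phi": 0.05, "charge": +1}
mu_minus = {"pt": 15.0, "eta": 0.0, "phi": -0.096, "charge": -1}
gamma = {"pt": 45.0, "eta": 0.0, "phi": math.pi}
```

With these toy inputs the three-body mass is close to 90 GeV, so the event is expected to pass the \(\mathrm {Z}\) boson selection and to fail the Higgs boson mass window.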

The simulated events are reconstructed using the same algorithms as the data, but the simulation does not reproduce the data perfectly. The differences in efficiencies between data and simulation for trigger, offline object reconstruction, identification, and isolation are corrected by reweighting the simulated events with data-to-simulation correction factors. The scale correction factors are observed to deviate from 1 by less than 2.5%. The energy and momentum resolutions for muons and photons in simulated events are also corrected to match those in \(\mathrm {Z}\rightarrow \mu \mu /\mathrm {e}\mathrm {e}\) events in data.

In the \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \) search, selected events are classified into mutually exclusive categories in order to enhance the sensitivity of the search. The categorization is based on the \(\eta \) and \(R_\mathrm {9}\) variables of the photon, where \(R_\mathrm {9}\) is defined as the energy sum of 3\(\times \)3 ECAL crystals centered on the most energetic crystal in the supercluster associated with the photon, divided by the energy of the supercluster [26]. Photons that do not convert to an \(\mathrm {e}^+\mathrm {e}^- \) pair in the detector tend to have high values of \(R_\mathrm {9}\) and a threshold of 0.94 is used to classify reconstructed photons with high \(R_\mathrm {9}\) (thus with a better resolution) and low \(R_\mathrm {9}\) (worse resolution). The three categories are: (1) photon in the barrel region with a high \(R_\mathrm {9}\) value (referred to as EB high \(R_\mathrm {9}\)); (2) photon in the barrel region with low \(R_\mathrm {9}\) value (referred to as EB low \(R_\mathrm {9}\)); and (3) photon in the endcap region (referred to as EE). The EE category is not divided into high/low \(R_\mathrm {9}\) because there are only a few events in this category. Events in the \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) search are not divided into categories since the sample size is limited and the sensitivity is still far from the SM prediction, and therefore event categorization does not result in a significant improvement in the expected limit.
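The categorization used in the \(\mathrm {Z}\) boson search amounts to a simple lookup, sketched here. The barrel and endcap boundaries are assumed to follow the photon acceptance given earlier (\(|\eta |<1.44\) and \(1.57<|\eta |<2.5\)), and treating \(R_\mathrm {9}\) exactly equal to 0.94 as low \(R_\mathrm {9}\) is an arbitrary choice made for this illustration; the function name is hypothetical.

```python
R9_THRESHOLD = 0.94


def photon_category(eta, r9):
    """Return the Z-search event category for a photon, or None if the
    photon falls outside the assumed acceptance."""
    abs_eta = abs(eta)
    if abs_eta < 1.44:
        # Barrel: split by R9, which separates photons that are mostly
        # unconverted (high R9) from converted ones (low R9).
        return "EB high R9" if r9 > R9_THRESHOLD else "EB low R9"
    if 1.57 < abs_eta < 2.5:
        # Endcap: kept as one category because few events populate it.
        return "EE"
    return None  # transition region or beyond the photon acceptance
```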

Table 1 shows the numbers of observed events in data, the expected yields from the \(\mathrm {Z}\ (\mathrm {H})\rightarrow {\mathrm {J}/\psi } \gamma \) signals, the expected nonresonant backgrounds with uncertainties estimated from the fits (described in Sect. 5), and the expected resonant background contributions in the range of \(81\ (120)< m_{\mu \mu \gamma } < 101\ (130)\,\text {GeV} \) for the \(\mathrm {Z}\ (\mathrm {H})\) boson search. The values for the signal yields quoted for the \(\mathrm {Z}\) boson decay assume that the \({\mathrm {J}/\psi } \) meson is unpolarized and those for the Higgs boson decay assume transverse polarization for the \({\mathrm {J}/\psi } \) meson. In the \(\mathrm {Z}\) and Higgs boson channels, the numbers of events coming from the resonant backgrounds are large compared with those expected for the signal in the SM. However, the resonant backgrounds are small compared to the nonresonant backgrounds and therefore their effect on the final result is minimal.

The overall signal efficiency, including kinematic acceptance, trigger, object reconstruction, identification, and isolation efficiencies for the \({\mathrm {J}/\psi } \gamma \rightarrow \mu \mu \gamma \) final state, is approximately 14 (22)% for the \(\mathrm {Z}\) (\(\mathrm {H} \)) boson signal, respectively. The total signal efficiency for the \(\mathrm {Z}\) boson decay is 13% if the \({\mathrm {J}/\psi } \) meson is fully transversely polarized and 16% if it is fully longitudinally polarized. The difference between the efficiency for the \(\mathrm {Z}\) boson and that for the Higgs boson arises from the differences in the \(p_{\mathrm {T}} \) spectra for the muons and the photon in the two cases. These differences are due to the difference between the \(\mathrm {Z}\) boson and Higgs boson masses.

Figures 4 and 5 show the dimuon invariant mass and photon \(E_{\mathrm {T}}\) distributions for both \(\mathrm {Z}\) and Higgs boson searches with events from all categories included. The number of events in the distributions from signal events is set to 40 (750) times the SM predicted yield for the \(\mathrm {Z}\) (\(\mathrm {H} \)) boson decay. The number of events in distributions in the resonant background samples is normalized to 5 (150) times the expected yield. The peak at the \({\mathrm {J}/\psi } \) mass in data shows that real \({\mathrm {J}/\psi } \) candidates are reconstructed and selected. These events come from inclusive quarkonium production; no simulation of this process is available for this analysis, so it cannot be included in the distributions. The background from \(\mathrm {Z}\rightarrow \mu \mu \gamma \) events, for which a proper simulation exists, is much smaller than that from inclusive quarkonium production, and it is scaled to make it visible. Figure 6 shows the distribution of the proper decay time t, defined as \((m_{\mu \mu }/p_{\mathrm {T}} ^{\mu \mu }) L_{\mathrm {xy}}\), where \(L_{\mathrm {xy}}\) is the distance between the primary event vertex and the common vertex of the muons in the transverse plane, for both \(\mathrm {Z}\) and Higgs boson decays. These distributions are normalized to the number of selected events in data. Negative values occur because \(L_{\mathrm {xy}}\) is a signed quantity: it is positive (negative) when the angle between the \(L_{\mathrm {xy}}\) vector and the vector of \(p_{\mathrm {T}} ^{{\mathrm {J}/\psi }}\) is smaller (larger) than \(\pi /2\). The distributions suggest that the \({\mathrm {J}/\psi } \) candidates reconstructed in data, like the signal events, are produced promptly at the \(\mathrm {p}\mathrm {p}\) interaction point, rather than coming from displaced heavy hadron decays.
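The definition of t and the sign convention for \(L_{\mathrm {xy}}\) can be made concrete with a short sketch. The function names and the tuple-based vertex representation are hypothetical. As defined in the text, t carries units of length; dividing by the speed of light would convert it to a time.

```python
import math


def signed_lxy(primary_vertex, dimuon_vertex, dimuon_pt_vec):
    """Transverse distance between the two vertices, signed by whether the
    displacement points along (+) or against (-) the dimuon pT direction."""
    dx = dimuon_vertex[0] - primary_vertex[0]
    dy = dimuon_vertex[1] - primary_vertex[1]
    lxy = math.hypot(dx, dy)
    # The dot product is positive when the angle between the displacement
    # and the pT vector is below pi/2, and negative when it is above.
    dot = dx * dimuon_pt_vec[0] + dy * dimuon_pt_vec[1]
    return lxy if dot >= 0.0 else -lxy


def proper_decay_time(m_mumu, primary_vertex, dimuon_vertex, dimuon_pt_vec):
    """t = (m_mumu / pT_mumu) * L_xy, in the length units of the vertices."""
    pt = math.hypot(dimuon_pt_vec[0], dimuon_pt_vec[1])
    return (m_mumu / pt) * signed_lxy(primary_vertex, dimuon_vertex, dimuon_pt_vec)


# Toy example: 0.05 cm transverse displacement, 10 GeV dimuon pT.
t_forward = proper_decay_time(3.1, (0.0, 0.0), (0.03, 0.04), (10.0, 0.0))
t_backward = proper_decay_time(3.1, (0.0, 0.0), (0.03, 0.04), (-10.0, 0.0))
```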

Fig. 4

The \(m_{\mu \mu }\) distributions in the \(\mathrm {Z}\) (upper) and Higgs (lower) boson searches. The number of events in the distributions from signal events is set to 40 and 750 times the SM predicted yields for the \(\mathrm {Z}\) and \(\mathrm {H} \) boson decays, respectively. The number of events in the distributions from the resonant background samples is normalized to 5 and 150 times the expected yields, respectively

Fig. 5

The photon \(E_{\mathrm {T}} \) distributions in the \(\mathrm {Z}\) (upper) and Higgs (lower) boson searches. The number of events in the distributions from signal events is set to 40 and 750 times the SM predicted yields for the \(\mathrm {Z}\) and \(\mathrm {H} \) boson decays, respectively. The number of events in the distributions from the resonant background samples is normalized to 5 and 150 times the expected yields, respectively

Fig. 6

The proper decay time, t, distributions in the \(\mathrm {Z}\) (upper) and Higgs (lower) boson searches. Distributions in simulated events are normalized to the number of selected events in data. The distributions suggest that the \({\mathrm {J}/\psi } \) candidates reconstructed in data, just as signal events, are produced promptly at the \(\mathrm {p}\mathrm {p}\) interaction point, and not from displaced heavy-hadron decays

5 Background and signal modeling

The subdominant, resonant backgrounds are estimated from the simulated samples, while the continuum background for each category for both the \(\mathrm {Z}\) and Higgs boson decays is estimated and modeled using data by fitting a parametric function to the \(m_{\mu \mu \gamma }\) distribution. An unbinned maximum likelihood fit is performed over the range \(70\ (100)< m_{\mu \mu \gamma } < 120\ (150)\,\text {GeV} \) for the \(\mathrm {Z}\ (\mathrm {H})\rightarrow {\mathrm {J}/\psi } \gamma \) search. The true form of the background \(m_{\mu \mu \gamma }\) distribution is unknown and mismodeling of the background by the distribution obtained from the fit in data could lead to a bias in the analysis. The procedure used to study the bias introduced by the choice of function is described below.

Four families of functions are tested as potential parametrizations of the background: Bernstein polynomials, exponentials, power laws, and Laurent form polynomials. In the first step, one of the functions among the four families is chosen to fit the \(m_{\mu \mu \gamma }\) distribution observed in data. Pseudo-events are randomly generated by using the resulting fit as a background model to simulate possible experimental outcomes. Here, the order of the background function required to describe the data for each of the families is determined by increasing the number of parameters until an additional increase does not result in a significant improvement in the quality of the fit to the observed data. The improvement is quantified by the difference in the negative log-likelihood between fits with two consecutive orders of the same family of functions, taking into account the increase in the number of free parameters between the two functions.
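One common way to carry out such an order determination is a likelihood-ratio test, sketched below. It assumes that twice the difference in negative log-likelihood between nested fits follows a \(\chi ^2\) distribution with as many degrees of freedom as added parameters, and it uses a p-value threshold of 0.05; neither the threshold nor the function names come from the text, and the input values are invented for illustration.

```python
import math


def chi2_survival(x, dof):
    """Upper-tail probability of a chi-square distribution, integer dof >= 1."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0.0:
        return 1.0
    z = 0.5 * x
    # Closed forms for dof = 2 and dof = 1, i.e. Q(1, z) and Q(1/2, z).
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    for _ in range((dof - 1) // 2):
        q += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def select_order(nll_by_order, n_params_by_order, p_threshold=0.05):
    """Index of the lowest order beyond which the fit stops improving.

    nll_by_order holds the minimized negative log-likelihoods of fits with
    consecutive orders of one family; p_threshold is an assumed value.
    """
    for i in range(len(nll_by_order) - 1):
        delta_nll = nll_by_order[i] - nll_by_order[i + 1]
        delta_dof = n_params_by_order[i + 1] - n_params_by_order[i]
        p_value = chi2_survival(2.0 * delta_nll, delta_dof)
        if p_value > p_threshold:
            return i  # the next order is not a significant improvement
    return len(nll_by_order) - 1


# Invented example: the first extra parameter helps a lot, later ones do not.
best = select_order([120.0, 112.0, 111.5, 111.4], [1, 2, 3, 4])
print("selected index:", best)
```

In this invented example the step from the first to the second function is significant, while the step to the third is not, so the second function is retained.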

Signal events with signal strength \(\mu _{\text {gen}}\) are introduced when generating the pseudo-events. The value \(\mu _{\text {gen}}=1\) corresponds to injecting the signal yield expected from the SM on top of the sum of resonant and nonresonant backgrounds. A fit is made to the distribution using one of the functions in the four families combined with a signal model, where the normalization of the signal in this step is allowed to be negative. This procedure is repeated 5000 times for each of the functions; ideally, the signal strength predicted by the fit, \(\mu _{\text {fit}}\), is on average equal to \(\mu _{\text {gen}}\). The deviation of the mean fitted signal strength \(\mu _{\text {fit}}\) from \(\mu _{\text {gen}}\) in pseudo-events is used to quantify the potential bias. The criterion for the bias to be negligible is that the deviation must be at least five times smaller than the statistical uncertainty in \(\mu _{\text {fit}}\). In other words, the distribution of the pull values, defined as \((\mu _{\text {fit}}-\mu _{\text {gen}})/\sigma _{\text {fit}}\), calculated from each pseudo-event should have a mean whose absolute value is less than 0.2. This requirement ensures that the uncertainty in the frequentist coverage, defined as the fraction of experiments where the true value is contained within the confidence interval, is negligible.
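The pull criterion can be sketched as follows. The toy generator is only a stand-in for the actual generate-and-fit loop: it draws fitted signal strengths from a Gaussian with a chosen bias instead of fitting pseudo-data, and all names and numbers are hypothetical.

```python
import random
import statistics


def mean_pull(mu_fit, sigma_fit, mu_gen):
    """Mean of (mu_fit - mu_gen) / sigma_fit over all pseudo-experiments."""
    pulls = [(m - mu_gen) / s for m, s in zip(mu_fit, sigma_fit)]
    return statistics.mean(pulls)


def bias_is_negligible(mu_fit, sigma_fit, mu_gen, max_abs_mean_pull=0.2):
    """Apply the requirement that the mean pull is below 0.2 in magnitude."""
    return abs(mean_pull(mu_fit, sigma_fit, mu_gen)) < max_abs_mean_pull


def toy_fit_results(mu_gen, sigma, bias_in_sigma, n_toys=5000, seed=1):
    """Stand-in for real fits: Gaussian-smeared mu_fit with an injected bias."""
    rng = random.Random(seed)
    mu_fit = [rng.gauss(mu_gen + bias_in_sigma * sigma, sigma) for _ in range(n_toys)]
    sigma_fit = [sigma] * n_toys
    return mu_fit, sigma_fit


# An unbiased toy model passes; one shifted by half a standard deviation fails.
unbiased = toy_fit_results(mu_gen=1.0, sigma=20.0, bias_in_sigma=0.0)
biased = toy_fit_results(mu_gen=1.0, sigma=20.0, bias_in_sigma=0.5)
print(bias_is_negligible(*unbiased, mu_gen=1.0))
print(bias_is_negligible(*biased, mu_gen=1.0))
```

With 5000 pseudo-experiments the statistical precision on the mean pull is about \(1/\sqrt{5000}\approx 0.014\), well below the 0.2 threshold, so the test is sensitive to biases of the size it is meant to exclude.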

The polynomial background function satisfies the bias requirement. An order-three polynomial function is used for each category in the \(\mathrm {Z}\) boson search, and an order-two polynomial function is used in the Higgs boson search. The \(m_{\mu \mu \gamma }\) distributions and background models for each category are shown in Fig. 7.

The signal model for each case is obtained from an unbinned maximum likelihood fit to the \(m_{\mu \mu \gamma }\) distributions of the corresponding sample of simulated events. In the \(\mathrm {Z}\) boson search, a double-sided Crystal Ball function [52] is used. A Crystal Ball function plus a Gaussian with the same mean value is used in the Higgs boson search.
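For orientation, a commonly used form of the double-sided Crystal Ball function is sketched below: a Gaussian core joined continuously to a power-law tail on each side. The parametrization, the parameter names, and the example values are assumptions for illustration and are not taken from the analysis; the function is left unnormalized.

```python
import math


def double_sided_crystal_ball(x, mean, sigma, alpha_low, n_low, alpha_high, n_high):
    """Unnormalized double-sided Crystal Ball shape.

    Gaussian core for -alpha_low <= t <= alpha_high, with t = (x - mean) / sigma,
    and power-law tails outside. Both alpha parameters are assumed positive.
    """
    t = (x - mean) / sigma
    if t < -alpha_low:
        # Low-mass tail, matched to the Gaussian at t = -alpha_low.
        a = (n_low / alpha_low) ** n_low * math.exp(-0.5 * alpha_low**2)
        b = n_low / alpha_low - alpha_low
        return a * (b - t) ** (-n_low)
    if t > alpha_high:
        # High-mass tail, matched to the Gaussian at t = +alpha_high.
        a = (n_high / alpha_high) ** n_high * math.exp(-0.5 * alpha_high**2)
        b = n_high / alpha_high - alpha_high
        return a * (b + t) ** (-n_high)
    return math.exp(-0.5 * t * t)


# Hypothetical parameter values, chosen only to exercise all three regions.
params = dict(mean=91.0, sigma=2.0, alpha_low=1.5, n_low=3.0, alpha_high=2.0, n_high=5.0)
for mass in (80.0, 91.0, 100.0):
    print(mass, double_sided_crystal_ball(mass, **params))
```

The tail coefficients are fixed by requiring the function and its first derivative to be continuous at the two transition points, so the only free shape parameters are the mean, the width, and the two pairs of tail parameters.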

Fig. 7

Fits to nonresonant background using lowest-order unbiased functions to describe the three-body invariant mass \(m_{\mu \mu \gamma }\) distributions observed in data for the \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \) channel in the EB high \(R_\mathrm {9}\) category (top left), the EB low \(R_\mathrm {9}\) category (top right), the EE category (bottom left), as well as the \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) channel (bottom right)

Table 2 Systematic uncertainties in both the searches for \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \) and \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \). In the \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \) search, the uncertainties are averaged over all categories. The numbers for uncertainties in the integrated luminosity, theoretical uncertainties, detector simulation and reconstruction correspond to the changes in the expected number of signal and resonant background events. The numbers for the uncertainties in the signal model correspond to the effect on the mean and width of the Gaussian component of the signal models resulting from the object momentum resolutions
Table 3 Limits for \(\mathrm {Z}\) and \(\mathrm {H} \) decays to \({\mathrm {J}/\psi }\rightarrow \mu \mu \) final states. Shown in the second and third columns are the observed and expected limits for cross sections and branching fractions, with the upper and lower bounds in the expected \(68\%\) \(\text {CL}\) intervals shown, respectively, as superscripts and subscripts. The third column presents the \(\mathrm {Z}\) decay branching fractions when the \({\mathrm {J}/\psi } \) is assumed to be produced with \(\lambda _\theta = +1\) or \(-1\), in the helicity frame

6 Results

The distributions in \(m_{\mu \mu \gamma }\) observed in the data are in agreement with the SM expectation of the background-only hypothesis. The results are used to derive upper limits on the branching fractions, \(\mathcal {B}(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma )\) and \(\mathcal {B}(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma )\). The exclusion limits are evaluated using the modified frequentist approach, \(\text {CL}_\text {s} \), taking the profile likelihood as a test statistic [53,54,55,56]. An unbinned evaluation of the likelihood is performed.
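The idea behind the \(\text {CL}_\text {s} \) criterion can be illustrated with a much simpler setting than the one used here. The sketch below swaps the unbinned profile-likelihood test statistic for a single-bin Poisson counting experiment with a known background and no systematic uncertainties, using the observed count as the test statistic; it is not the procedure of this analysis, and all names and inputs are hypothetical.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def cls_value(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b


def upper_limit(n_obs, background, cl=0.95, s_max=100.0, tol=1e-6):
    """Signal yield at which CLs falls to 1 - cl, found by bisection.

    Relies on CLs decreasing monotonically with the signal yield and on
    s_max being large enough to bracket the solution.
    """
    target = 1.0 - cl
    lo, hi = 0.0, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls_value(n_obs, mid, background) > target:
            lo = mid
        else:
            hi = mid
    return hi


# Illustrative inputs: zero and three observed events on one expected background event.
print(upper_limit(n_obs=0, background=1.0))
print(upper_limit(n_obs=3, background=1.0))
```

Dividing by \(\text {CL}_\text {b}\) prevents a downward background fluctuation from producing an arbitrarily strong limit: with zero observed events the limit on the signal yield is \(-\ln 0.05 \approx 3.0\) events regardless of the expected background.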

Systematic uncertainties in the expected number of signal events and in the signal model used in the fit come from the imperfect simulation of the detector and uncertainties in the theoretical prediction for the signal production. They are evaluated by varying contributing sources within their corresponding uncertainties and propagating the uncertainties to the signal yields or shapes in simulated signal samples. The sources of the uncertainties and their magnitudes are summarized in Table 2. The uncertainties are classified into two types, one affecting the predicted signal yields and the other affecting the shapes of the signal models. The first type includes the uncertainties in the luminosity measurement [57], the pileup modeling in the simulations, the corrections applied to the simulated events in order to compensate for differences in trigger, object reconstruction, and identification efficiencies, and the theoretical uncertainties. The theoretical uncertainties come from the effects of the PDF choice on the signal cross section [33, 38, 58], the lack of higher-order calculations for the cross section [59,60,61,62,63], and the prediction of the decay branching fractions [64]. The second type arises from the uncertainties in the momentum (energy) scale and resolution for muons (photons). These uncertainties are incorporated into the signal models by varying the momentum (energy) scale and resolution and introducing the effects on the mean and width of the Gaussian component of the signal models as shape nuisance parameters in the estimation of the limits.

The systematic uncertainties associated with the resonant background processes are evaluated with the methods used for the signal samples. The continuum background prediction is derived solely from data, so only statistical uncertainties are considered, which are translated into the uncertainties in each parameter of the fit function. The bias study mentioned in the previous section is performed to ensure that the bias from the choice of the background function is negligible. Hence, no additional systematic uncertainty is assigned to that background estimate.

The observed and median expected exclusion limits on the production cross sections and branching fractions at 95% confidence level (\(\text {CL}\)) for the \(\mathrm {Z}\) and Higgs boson searches are summarized in Table 3. With the assumption that the \({\mathrm {J}/\psi } \) meson is unpolarized, the observed upper limit on the branching fraction of \(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma \) is \(1.4\times 10^{-6}\), whereas the median expected upper limit is \(1.6^{+0.7}_{-0.5}\times 10^{-6}\) with the 68% \(\text {CL}\) interval indicated by the subscript and superscript. The observed and median expected limits correspond to 15 and 18 times the SM prediction, respectively. Extreme polarization scenarios give rise to variations from \(-13.6 (-13.5)\%\), for a fully longitudinally polarized \({\mathrm {J}/\psi } \) meson, to +8.6 (+8.2)%, for a fully transversely polarized \({\mathrm {J}/\psi } \) meson, in the observed (expected) branching fraction limit. The observed upper limit on the branching fraction of \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) is \(7.6\times 10^{-4}\), and the median expected upper limit is \(5.2^{+2.4}_{-1.6}\times 10^{-4}\). The observed and median expected limits correspond to 260 and 170 times the SM prediction, respectively. For the Higgs boson decay, the \({\mathrm {J}/\psi } \) meson is assumed to be fully transversely polarized. The overall impact of systematic uncertainties on the final results is negligible.

The results from our \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) analysis are combined with the results from a similar search performed by the CMS Collaboration using \(\mathrm {p}\mathrm {p}\) collision data at \(\sqrt{s}=8\,\text {TeV} \), corresponding to an integrated luminosity of 19.7\(\,\text {fb}^{-1}\) [20]. The combination results in an observed (expected) upper limit corresponding to 220 (160) times the SM prediction. The uncertainties are treated as either uncorrelated or correlated; the difference between the two results is negligible.

7 Summary

A search is performed for decays of the standard model (SM) \(\mathrm {Z}\) and Higgs bosons into a \({\mathrm {J}/\psi } \) meson and a photon, with the \({\mathrm {J}/\psi } \) meson subsequently decaying into \(\mu ^+ \mu ^- \). The data are from \(\mathrm {p}\mathrm {p}\) collisions at \(\sqrt{s}=13\,\text {TeV} \), corresponding to an integrated luminosity of 35.9\(\,\text {fb}^{-1}\). No excess is observed above the measured background. The observed and expected exclusion limits at 95% confidence level (\(\text {CL}\)) on the branching fraction of the \(\mathrm {Z}\) boson decay in the unpolarized case are \(\mathcal {B}(\mathrm {Z}\rightarrow {\mathrm {J}/\psi } \gamma ) < \) 1.4 and \(1.6^{+0.7}_{-0.5}\times 10^{-6}\), corresponding to factors of 15 and 18 greater than the SM prediction. The 68% \(\text {CL}\) interval on the expected limit is shown as the subscript and superscript. Extreme polarization possibilities give rise to changes from \(-13.6\) and \(-13.5\%\), for a longitudinally polarized \({\mathrm {J}/\psi } \) meson, to \(+8.6\) and +8.2%, for a transversely polarized \({\mathrm {J}/\psi } \) meson, in the respective observed and expected branching fraction limits. The observed and expected 95% \(\text {CL}\) limits on the branching fraction of the Higgs boson decay are \(\mathcal {B}(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma )<\) 7.6 and \(5.2^{+2.4}_{-1.6}\times 10^{-4}\), corresponding to 260 and 170 times the SM value. The results for the Higgs boson channel are combined with previous CMS data from proton-proton collisions at \(\sqrt{s}=8\,\text {TeV} \) to produce observed and expected upper limits on the branching fraction for the decay \(\mathrm {H} \rightarrow {\mathrm {J}/\psi } \gamma \) of factors of 220 and 160 larger than the SM predictions.