Electroweak ALP Searches at a Muon Collider

A high-energy muon collider with center-of-mass energy around and above 10 TeV is also a vector boson fusion (VBF) machine, due to the significant virtual electroweak (EW) gauge boson content of high-energy muon beams. This feature, together with the clean environment, makes it an ideal collider to search for TeV-scale axion-like particles (ALP) coupling to Standard Model EW gauge bosons, which current and other future colliders have limited sensitivities to. We present detailed analyses of heavy ALP searches in both the VBF and associated production channels at a muon collider with different running benchmarks. We also show projected constraints on the ALP couplings in the effective field theory, including an operator with its coefficient not determined by the mixed Peccei-Quinn anomaly. We demonstrate that a muon collider could probe new ALP parameter space and push the sensitivities of the couplings between the ALP and EW gauge bosons by one order of magnitude compared to HL-LHC. The projected limits and search strategies for ALPs could also be applied to other types of resonances coupling to EW gauge bosons.


Introduction
An axion-like particle (ALP) refers to a general periodic pseudo-scalar, a ∼ = a + 2πf a with f a the decay constant. ALPs appear ubiquitously in extensions of the Standard Model (SM), usually as pseudo Nambu-Goldstone bosons arising from spontaneous breaking of some global symmetries [1,2]. ALPs enjoy a wide range of phenomenological applications 1 and serve as one of the most motivated new physics scenarios. While most of the studies focus on very light ALPs, there has been growing interests to search for a heavy ALP with a mass m a around or above the weak scale at collider-type experiments. A heavy ALP could acquire its mass from a strongly-interacting hidden sector with a confining scale above the weak scale. 2 Experimentally, the Large Hadron Collider (LHC) could probe m a up to the scale of ∼ (2 − 3) TeV, around and above which the parameter space is still widely open. In this article, we will focus on searches of heavy ALPs with m a of TeV scale, which couple to SM electroweak (EW) gauge bosons, at a possible future muon collider.
Searches for heavy ALP signals could be challenging at LHC and future colliders. A heavy ALP with m a ∼ O(1) TeV requires highly energetic beams for direct production. In addition, its one-loop-suppressed couplings and a large f a further restrain the signal rate. Consequently, combining a high integrated luminosity and low background level is necessary for detecting it. A variety of direct heavy ALP searches are available (e.g., [13][14][15][16][17][18][19][20][21][22][23][24][25][26]), constraining visible ALPs at LEP and LHC. Future circular e + e − colliders [27,28] are not optimal for the detection of TeV-scale ALPs as their achievable energies are limited by the synchrotron radiation energy loss. Future linear e + e − colliders [29,30], although almost free from radiation energy loss, suffer from smaller luminosities as each bunch is dumped instead of stored after the first crossing. Heavy ALPs could be efficiently produced at future hadron colliders [31]. However, the signal could easily be overwhelmed by the enormous QCD backgrounds or vulnerable to large systematic uncertainties.
A future muon collider provides the potential solution to all the issues regarding energy, luminosity, and background cleanliness [32][33][34][35][36][37]. As m µ ≈ 207m e , the synchrotron radiation energy loss is under control since the energy loss rate is proportional to m −4 . The circular collider design with a high luminosity could be achievable even for multi-TeV muon beams. From the conservative projection [38,39], the expected integrated luminosity reaches O(1) ab −1 for TeV-scale muon colliders and could potentially increase further. Muon also carries a significant portion of beam energy as an elementary particle. In contrast, multiple quarks and gluons share the beam energy at a future hadron collider, with the high energy tip of parton distribution functions (PDFs) suppressed. When searching for a heavy ALP that couples to EW vector bosons, the advantage of a muon collider is even more explicit since the beam muon, unlike a beam proton, radiates a significant fraction of its energy to all EW vector bosons. For example, about 5% of the total beam energy is carried by γ, W , and Z in a TeV-scale muon beam [40], while this fraction drops to below 1% in a proton beam [41]. In addition, a muon collider can directly produce an ALP almost as heavy as the center-of-mass energy √ s from µ + µ − annihilations in association with an EW vector boson, which is unlikely at hadron colliders. Last but not least, the inclusive event rate of muon collisions is much lower than that of hadronic collisions. It is well known that the proton-proton inclusive cross section scales as the proton size (∼ Λ −2 QCD ∼ O(mb)). On the other hand, muons are elementary particles, and their inclusive cross section remains small at large √ s, dominated by vector boson fusion (VBF) processes. This feature greatly reduces the overall background level and background-induced systematic uncertainties. 2 Ref. [12] proposes to use an enlarged but unified color sector to bring the QCD axion mass up to the TeV scale.
-2 - The main difficulty of the muon collider concept comes from the short muon lifetime ∼ 2 µs. It is highly challenging to produce low emittance muon beams and store them in the collider ring before they decay. Thanks to the progress of accelerator physics, the particle physics community's interest in a high-energy muon collider grows again. Several leading approaches have been developed quickly over the past years [42,43]. The US Muon Accelerator Program (MAP) [44][45][46][47] focuses on proton-driven muon sources where secondary pions decay to muons. The program's target scenario covers a wide range of beam energy, from O(1-10) GeV neutrino factories to Higgs factories and multi-TeV colliders. Such proton-driven muon beams occupy a large phase space volume, which need to be cooled significantly before muons decay. The issue is addressed by the development of Muon Ionization Cooling Experiment (MICE) [48][49][50], which demonstrates the potential of cooling muon beams over short time scales. More recently, the novel approach of Low Emittance Muon Accelerator (LEMMA) allows high luminosity muon beams to be produced with low transverse emittance [47,51]. The design uses a 45-GeV positron beam to produce muon pairs at the threshold, resulting in a small beam transverse emittance without extra beam cooling. The technology may allow a muon collider operating at even higher energies up to √ s = O(100) TeV but with limited luminosities.
Given all the potential advantages outlined above, we implement and report in this paper a detailed analysis of TeV-scale ALPs coupling to EW gauge bosons at a future muon collider. We show that the search of electroweakly coupled ALPs provides another showcase for the great physics potential of a high-energy muon collider. The search strategies and projected sensitivities on the cross sections could be applied to other types of EW resonances.
The paper is organized as follows. In Section 2, we present the effective field theory (EFT) of an ALP that couples to SM EW gauge bosons. In Section 3, we present the common production mechanisms of heavy ALPs at a collider. The HL-LHC projections on ALP couplings are deduced from current LHC limits, as HL-LHC is the inevitable step en route to any future colliders. In Section 4, details of simulations for ALP searches at a muon collider are presented. Section 5 enumerates seven analyses on ALPs produced in the VBF channel, aiming at seven final states. Meanwhile, we show that the analysis results could apply to general ALP couplings and other new physics resonance models. The gain in sensitivity from forward-region information is also discussed. Another important ALP production mechanism, associated production with the tri-photon final state, is analyzed in Section 6. In Section 7, the projected constraints on the EFT couplings are deduced from limits derived in the previous sections. We conclude in Section 8.

The Effective Theory of ALP coupling to EW Gauge Bosons
We will focus on the EFT of ALP coupling to the EW gauge bosons in the SM. Here, the heavy axion of interest has a mass of O(TeV). Before electroweak symmetry breaking (EWSB), the Lagrangian reads: where W (B) µν are the field strength tensors of SM SU (2) W (U (1) Y ) gauge groups before EWSB, W ( B) µν ≡ ρσµν W (B) ρσ /2 are the corresponding dual field strengths, i = 1, 2, 3 are the SU (2) W indices, and g 1 (2) are the corresponding coupling constants of U (1) Y (SU (2) W ). These effective couplings could be generated by integrating out heavy fermions carrying EW charges at one loop. The coefficients are thus suppressed by the loop factor, 1/(4π) 2 . 3 Note that other conventions for the EFT Lagrangian exist. In particular, another widely used convention is to rescale the coefficients to absorb the gauge couplings and loop factors [90]. We will follow the convention in Eq. (2.1) throughout the analysis, while the other convention will also be presented in some plots for the readers' convenience. We will ignore other possible ALP couplings like the ones to SM fermions and the Higgs in our study since these couplings could lead to different production channels and detection strategies beyond the scope of this paper, e.g., s-channel muon annihilation µ + µ − → a(+γ) → µ + µ − (+γ) if an axion has a sizable coupling to muon. It is noteworthy that a non-zero C BW coupling in Eq. (2.1) is not always included in the literature. An ALP coupling is usually proportional to associated global anomaly coefficient, which is true for an ALP coupling to massless SM gauge bosons, such as photons or gluons. Yet as discussed in Refs. [91][92][93][94][95], this coupling could arise in several ALP models and shall be considered in a generic analysis. Although there is no non-vanishing U (1) global × U (1) Y × SU (2) W anomaly coefficient for C BW , the coupling term turns out to arise from dimension-7 operators such as aH † τ i HW i µνB µν /Λ 3 , where H is the SM Higgs doublet, τ i 's are SU (2) W generators, and Λ is the UV cutoff scale. After EWSB, this results in an operator of the form (a/Λ)(v EW /Λ) 2B µν W 3 µν , in which v EW is the Higgs VEV. This term could be numerically important when Λ is not far beyond the EW scale. There have been efforts to construct models in which such terms appear. In the standard DFSZ axion model [7,8], the light axion is a linear combination of angular modes of the two Higgs doublets and the PQ scalar [96,97]. The top loop generates a non-zero C BW coupling [95]. Another example of a possible UV completion is constructed in Ref. [93] via a DFSZ-type extension in which heavy EW charged chiral fermions are integrated out. In this scenario, the new fermions receive their masses mainly from EWSB and are thus within reach of current and near-future collider searches. One may also note that other operators exist beyond dimension 7; however, at and below dimension 7, the operators listed in Eq. (2.1) are equivalent to the complete set of gaugeinvariant operators that involve one axion field and EW gauge bosons. In this paper, we remain agnostic about the UV completion of the C BW term in the ALP EFT.
After EWSB, the ALP couplings to the gauge boson mass eigenstates can be written as: where F, Z, W are EW field strengths, α is the fine structure constant, and θ W is the Weinberg angle. The magnitude C W W remains the same as in Eq. (2.1). The other three effective couplings after EWSB are related to the linear combinations of the original ones as With the couplings above, the partial decay widths of axion to vector bosons are given by where m Z and m W are the masses of the Z and W gauge bosons.

ALP Collider Signals and (HL)-LHC Constraints
In this section, we will introduce two production channels of heavy ALPs based on the EFT in Eq. (2.1) and discuss projected constraints on the ALP couplings to EW gauge bosons at HL-LHC. We will also comment on the indirect probes where the heavy ALP is a mediator. Figure 1: Feynman diagram for the VBF production of ALP at a collider.

Vector Boson Fusion
VBF channels refer to both W W fusion and neutral boson fusion (ZZ, Zγ, and γγ fusion) processes, as shown in Fig. 1. They could be schematically written as where f and f are beam fermions, V and V are SM vector bosons, and X's denotes recoiling beam remnants. The representation above applies to both hadron and lepton colliders with different choices of f ( ) and corresponding X's. Due to the enhanced amplitude when X's are collinear with the beams, X's are mainly present in the forward regions with large |η|'s, which could be helpful to veto non-VBF backgrounds. However, the detector performance is usually limited in these regions. The VBF-produced ALPs will also have low p T 's on average and generate a sharp invariant mass peak ≈ m a if their final states are fully visible. Since the width Γ a m a for realistic couplings, it is justified to take the narrow-width approximation and make the ALP production and decay independent to each other.
At hadron colliders like the LHC, the initiating fermions f ( ) are quarks and the recoiling remnant X's contain two (or even more) jets in the high-|η| region. LHC has already implemented several relevant searches with the √ s =13 TeV dataset, e.g., searches for diphoton resonances [98,99], inclusive W W or ZZ resonances [100][101][102][103], and Zγ resonances [104,105]. The large SM backgrounds from hadronic interactions significantly hinder signal reconstruction and affect the discovery potential. For example, soft particles from pileup could pose serious challenges to reconstruct mother particles in the decay chains, especially for hadronic W or Z fat jets as their large jet areas contain more pileup particles [106]. Most exclusion limits at 95% C.L. on σ(pp → a + X's) × BR(a → V V ) are of O(1 − 10) fb when m a ≈ 1 TeV. These limits strengthen by about one order of magnitude when m a ≈ 2 TeV since the SM backgrounds drop rapidly as the hard scattering energy scale increases.
We adopt the four VBF rates of ALP productions from Ref. [107], using the LUXqed photon PDF [108] and matrix element (ME) method. Notice that the interference between initial state γ and transverse Z bosons is ignored as the PDF set including W and Z bosons is not yet available. In addition, such an interference is unlikely to change the ALP production -6 -rate significantly [109,110]. Hence, we leave a more precise determination of ALP production rates at hadron colliders to future work. In the left panel of Fig. 2, we show the corresponding production cross section σ(pp → a + X's) VBF as a function of axion mass m a , normalizing each C V V /f a = 1 TeV −1 after EWSB. Hierarchies between processes originate from vector boson properties and definitions in Eq. (2.2). For example, the W W fusion rate is enhanced by the large s −2 W factor in Eq. (2.2) and becomes much larger than the other three. It is followed by the γγ fusion rate as it is easy for the energetic proton beam to radiate a massless photon. For m a , f a 1 TeV, and assuming O(1) EFT couplings, the typical production rates are 10 −2 fb, which are at least two orders of magnitudes below the current LHC limits.  [90].
To estimate the HL-LHC sensitivities on couplings between ALP and EW gauge bosons, we take the current limits and scale them with the integrated luminosity. We assume that systematic uncertainties will scale as statistical ones. 4 The constraints on C W W /f a and C BB /f a are plotted in the right panel of Fig. 2 for a 1-TeV ALP, fixing C BW = 0. We take into account of a → W + W − and a → ZZ constraints from [103], a → Zγ constraint from the scalar production benchmark in Ref. [100], and a → γγ constraint from [99]. 5 For f a m a ∼ TeV, HL-LHC could only probe C V V of O(10) or higher. The situation still persists when C BW = 0. In another convention of ALP EFT [90], where the loop factors and gauge couplings in Eq. (2.1) are absorbed in the definition of couplings, the projected sensitivities will appear stronger, as shown by the top and right axes of the right panel in Fig. 2. Figure 3: Feynman diagram for ALP production associated with a SM vector boson.

Associated Production
Another important ALP production mechanism is the associated production of an ALP with an EW vector boson, sometimes also called ALP-strahlung: As the ALP further decays to two bosons, the typical collider signal is a tri-boson final state, V V V , with V V forming a narrow resonance. The powers of loop factors and f a suppressions in these processes are the same as the VBF production channels. Since we ignore the ALP couplings to SM fermions, the associated production at colliders is dominated by the s-channel diagram through an off-shell vector boson, as shown in Fig. 3. At LHC, several searches for tri-boson signals are implemented with the 13-TeV data. There are limits on cross sections of W γγ and Zγγ processes [115]. Measurements of W W W , W W Z, W ZZ and ZZZ final states are also presented in [116][117][118]. Nonetheless, for W Zγ and W W γ final states, the 13-TeV analysis is not available yet, leaving the latest constraints from the 8-TeV data [119]. Most analyses above are non-resonant measurements of tri-boson rates, which all turn out to be compatible with the SM predictions. The typical uncertainties on σ(V V V ) are of O(1 − 100) fb depending on the final states. The resonance W a(→ W W ) limit is also deduced in Ref. [118], covering m a ∈ [200, 600] GeV range. 6 Fig. 4 shows the typical pp → V a cross sections at √ s = 13 TeV, with each relevent C V V /f a = 1 TeV −1 after EWSB. The production rates drop faster than the VBF ones as m a increases, due to the power suppression of energy from the s-channel propagator.  Figure 4: Different associated production rates as a function of m a at the 13-TeV LHC, Because of the lower rates and essentially non-resonant limits, the projected HL-LHC bounds from V a associated production are even weaker than the VBF ones in the right panel of Fig. 2. For a 1-TeV ALP (with f a of the same order), typical constraints on C V V are O(10 2 ), dominated by the W γγ and Zγγ measurements with lower experimental uncertainties. Even for future resonance searches associated with an extra boson, the small V a production rates still make this channel less relevant at HL-LHC.

Non-resonant (Indirect) Signals
Inevitably, heavy ALPs coupled to the SM could also modify event rates as a mediator. In this case, there will be no narrow ALP resonances in the final states. Since the most significant correction stems from the interference terms between the SM diagrams and ALPexchanging diagrams, the modifications have the same loop factor and f a suppressions as the VBF processes. However, as discussed in the last section, non-resonant searches have large SM backgrounds and are subject to significant systematic uncertainties. The current LHC data from this channel are only relevant when m a m Z [120].
For m a m Z , the impact of virtual ALPs can be described by several dimension-8 anomalous quartic gauge coupling (aQGC) operators. They are suppressed by powers of m −1 a , in addition to f −1 a and loop factors from the EFT in Eq. (2.1). The natural scaling of aQGC terms from ALP mediation follows where O 8 SM represents dimension-8 aQGC operators. The limits can be set by both multiboson [121] and vector boson scattering (VBS) [110] measurements. The extra suppression factor ∼ α 2 /(16π 2 ) 10 −6 strongly hinders the discovery potential of this non-resonance method. According to the HL-LHC performance study [122], the coefficient of the dimension-8 operators above can be measured down to ∼ 0.5 TeV −4 in the high luminosity phase. To -9 -make the signal detectable for m a , f a 1 TeV, C V V 's need to be as large as O(10 3 ). Such bounds are much weaker than those given by the direct searches. Though corrections to the discussions above may arise from finite m a √ s, it is safe to conclude that the nonresonant/indirect approach will be unlikely to play an important role in heavy ALP searches at the HL-LHC.

Muon Collider Phenomenology: Simulation Setups
In the forthcoming sections, we will study the potential of searching a heavy ALP, in particular, through its coupling to EW gauge bosons at a future muon collider.
Throughout this work, we use MadGraph 5 [123] to generate parton-level hard processes for both ALP signals and SM backgrounds. We then use Pyhthia 8 [124] to handle hadronic and electromagnetic shower effects. The detector effects are simulated by Delphes 3 [125] with its built-in muon collider detector template.
For simplicity, we follow the detector template's default parameters for reconstructing elementary objects, e.g., photon γ and light leptons . A photon with transverse momentum p T > 0.5 GeV is considered isolated if the scalar sum of p T over all the other particles inside a cone with a radius ∆R = 0.1 around the photon is smaller than 20% of the photon's p T . For simplicity, we adopt the approximation that the detector efficiencies and resolutions of electrons and muons are the same. In practice, only processes with muonic final states are simulated, with their event rates rescaled to the inclusive ones of both electrons and muons accordingly. The p T and isolation requirements for isolated muons are the same as the photon ones. Both isolated muons and photons must satisfy |η| < 2.5. Besides, muons in VBF beam remnants encode critical information like the identities of initial-state vector bosons. The ability to detect beam muons, usually in the forward region, enables a better signal background discrimination [58]. These recoiling muons are then measured with the default forward muon spectrometer in the muon collider template, covering the 2.5 < |η| < 8 region. We use the Valencia algorithm to cluster jets for hadronic ALP decay final states, which was reported to provide better performances at high-energy lepton colliders [126]. The W and Z bosons from ALP decays are highly boosted on average. Therefore, we cluster them into fat jets with cone sizes R = 1.0 or 1.2, depending on the detection channels. All fat jets must have p T > 200 GeV, above which W/Z bosons are sufficiently boosted. We also assume the tracker's resolution on the impact parameter is much smaller than τ 's lifetime ∼ 87 µm. Charged particles from τ decays are thus long-lived enough to have significant impact parameters. Such features are distinctive from prompt ALP decays. Therefore, we safely ignore any backgrounds having Z → τ τ decays.
We follow the muon collider's beam energy and integrated luminosity benchmarks in [38,39,88]. For each beam energy, the analysis results are projected according to the conservative (L con ) or the optimistic (L opt ) scenario of the integrated luminosity. In particular, we choose is the SM VBS cross section, and σ(µ + µ − → W + W − + X's)| off-shell+SM is the cross section of the same process including t-channel and off-shell s-channel ALP diagrams. Similarly, the corresponding VBF resonance cross section is also for W + W − final state. Note that the fluctuation of σ off-shell (m a ) lies within the Monte Carlo uncertainties and is likely due to numerical artifacts.
To evaluate different ALP production mechanisms at a muon collider, their rates at √ s=14 TeV are calculated by the full ME method without any parton-level cuts. For ev- Fig. 5, the VBF mechanism (corresponding to the blue curve) is the most efficient way to produce heavy ALPs at a muon collider, with the largest rate for any m a < √ s. Meanwhile, the associated V a production rate remains small in the entire mass range considered. However, the SM tri-boson backgrounds for the V a channel are also small. Therefore, the associated production mechanism could still be a relevant search channel. For the indirect search, the difference in the VBS rate induced by non-resonant ALP diagrams, namely is the cross section of the same process including tchannel and off-shell s-channel ALP diagrams. Similar to the discussion in Section 3.3 for Table 1: Summary of cuts and cross sections for SM backgrounds simulated by MadGraph: γ * denotes a partonic photon from the muon beam, and X's denote the beam remnants, including muon, muon neutrino, and their antiparticles. The background cross section σ BKGD is understood to include branching fractions to all secondary decays indicated in the parentheses. Decay products from Z's and W 's are not subject to parton-level cuts.
the LHC, the indirect search channels are not as useful as the direct ones due to the absence of narrow ALP resonances, though their rates could be larger than those of V a channels. Although most of the beam energy is carried by the muon, a fraction of it could be transferred to collinear vector bosons and neutrinos. Such initial-state-radiation (ISR) effect introduces corrections to µ + µ − annihilations. With ISR, the effective center-of-mass energy of µ + µ − annihilation drops below √ s, leading to a mild increase of event rates. Since the built-in muon PDF for muon beam is not yet available for our simulation framework, we approximate the ISR effect by inclusive collinear photon emissions. In particular, events with at most one extra photon γ col are simulated for both signals and backgrounds, i.e., µ + µ − → aγ + (γ col ) and µ + µ − → 3γ + (γ col ). As we are interested in cases where the ISR photon is unidentified, the extra ISR photon must satisfy p T < 0.1 GeV and |η| > 2.5. For other photons, only a p T (γ) > 10 GeV cut is imposed at the parton level.

Simulating Major SM Backgrounds
SM backgrounds are more involved to simulate compared to ALP signals. At a muon collider, the irreducible backgrounds for VBF ALP channels are the SM diboson processes, especially the VBS [110] due to large EW boson PDFs and sizable SM gauge couplings. The situation is analogous to the gg → jj backgrounds for the dijet resonance search at hadron colliders. The diboson invariant mass distributions of VBS backgrounds are continuous, on top of which we search for the sharp peak of an ALP resonance. The overall background level can thus be modeled by a background fit. Moreover, vector bosons from ALP decays give characteristic final states, e.g., a hard photon, a lepton pair at the Z pole, or a fat jet with mass ≈ m W/Z , which further suppress reducible backgrounds like µ + µ − → qq.
-12 - In contrast to signal simulations, the soft and collinear singularities in VBS amplitudes require parton-level cuts, which render the Monte Carlo integrations convergent. However, stringent parton-level cuts on beam remnants could lead to background underestimations. It is necessary to choose proper cuts to achieve realistic background distributions and rates without oversampling in the divergent part of the phase space. Details of each background channel simulated are summarized in Table 1.
Depending on the beam remnants, X's, VBS backgrounds could be classified into three types: 1) X's contain only muon neutrinos, with W + W − as the corresponding initial-state vector bosons; 2) X's contain only muons, with the initial states being γγ, Zγ or ZZ; 3) X's contain one muon and one muon neutrino, with the initial states W Z or W γ. For type 1), the background rates are not sensitive to cuts on neutrino beam remnants. For type 2), the final state could only be W + W − due to the SM gauge symmetry. Equivalently, ZZ, Zγ, γγ final states could only be from type 1). The process initiated by γγ are susceptible to variations of parton-level cuts due to a collinear divergence. Its cross section is about one order of magnitude above the other processes in this category. Fortunately, at the cost of a long integration time, the ME method still provides plausible distributions of γγ-initiated processes. In practice, we use the same parton-level cuts as in type 1).
For type 3), i.e., VBS ZW and γW events, the final state dibosons have a net electric charge. They only contribute to the reducible backgrounds when the W decays hadronically and is misidentified as a Z. For µ + µ − → γW + X's background, the ME method with soft parton-level cuts can give converging results. However, the ME method leads to numerical instability when generating µ + µ − → ZW + X's samples. In this case, we use the effective vector boson approximation (EVA) based on the improved Weizsäcker-Williams formula [127,128], incorporated in MadGraph [129] to generate µ ± γ * → ZW + X's samples, where γ * denotes the partonic photon from the muon beam. The vertex of µ → W ν emission is still handled by the ME calculation in this case. For simplicity, the Z boson component in the muon beam is ignored, as its contribution to the overall VBS ZW rate is negligible. For our background simulation, we set the factorization scale Q = √ s/2. We also calculate the ZW background rates with Q = √ s and Q = √ s/4 and estimate the factorization scale induced uncertainty to be ∼ 6%.
The discussions above focus on the VBS backgrounds, the leading ones for ALP searches in the VBF channels. For the search in the associated production channel, we need to consider the SM tri-boson background. In particular, we will consider the 3γ final state, the simplest one in the associated production channel. Most VBS 3γ background can be removed by requiring the invariant mass of 3γ to be ≈ √ s. Simulation shows that the VBS 3γ background rate after applying the parton-level cut m 3γ ∈ [0.9, 1.0] √ s is four orders of magnitude below that of µ + µ − annihilations and can be ignored safely. The ISR effects are also included by the same method described in Section 4.1, giving a cross section of µ + µ − → 3γ + (γ col ) = 0.44 fb. Table 2: Detector-level cuts imposed on various analyses: -means that no value is applicable; J represents a detector-level fat jet, while j corresponds to a parton-level jet. The subscript of a particle name denotes its detector-level p T ordering. The jet radius R = 1 is used for final states with two J's (corresponding to 4j's), and R = 1.2 is used for final states with one J (2j's). The superscript * indicates that the requirement is only for fat jets that do not contain photons. The width of each resonance peak is characterized by σ m , see the main text for more details.

Muon Collider Phenomenology: VBF Channels
As discussed in Section 3 and 4, VBF production of ALPs provides a heavy diboson resonance with a narrow width. Therefore, we focus on the diboson final states which are fully visible, i.e., no W → ν or Z → νν decays. All final states containing τ 's will also be excluded from our analysis since they always contain neutrinos. Here we propose several analyses targeting all four ALP decay modes and their final states. Details of each analysis are described in Section 5.1. The general procedures of our analyses are straightforward. We first select events containing the same number of photons, leptons, and/or jets as the target ALP final state. Hard cuts on p T of O(m a ) are often applied to these objects at this stage to suppress SM VBS backgrounds. Z/W reconstruction cuts are applied if the final state involves Z/W decays. Sensitivities on σ(µ + µ − → a + X's) × BR(a → ZZ, Zγ , γγ, W W ) are deduced from the signal and background yields within a narrow invariant mass window around m a . The width of each invariant window is determined by the final state, the detector resolution, and the ALP mass. As to be discussed in Section 5.2, different initial states of VBF production only cause mild variations in signal efficiencies; hence, results provided in this section are approximately model independent. On the other hand, forward-region observables, e.g., number and energies of forward muons, are highly correlated with the initial boson states. Including forward-region information into the analysis may further enhance the sensitivity; see Section 5.3. Since the forward detector at a very-high-energy muon collider is still under development, we present limits utilizing forward-region information only as suggestive values.  Figure 6: Projected 2σ exclusion limits on the VBF production cross section times the branching fraction of the ALP to a specific diboson final state, at a 14-TeV muon collider with L = 10 ab −1 . Solid lines are exclusion limits from the analyses of a → ZZ with different secondary decays of Z's, short dashed line is the exclusion limit from the analysis of a → W W with W 's decaying hadronically, long dashed line is the exclusion limit from the γγ channel, and exclusion limits from a → γZ analyses are given by the dot-dashed lines. The W and Z decay products are indicated in the parentheses.

Event Reconstruction and Analyses
To show concrete kinematic distributions and efficiencies, in the remainder of this section we will mainly present distributions and results for the √ s =14 TeV benchmark. Analyses on seven final states are demonstrated in the following paragraphs, covering all four ALP diboson decay modes. Table 2 summarizes the key cut parameters for all the analyses. For convenience, we use J to represent detector-level fat jets, while j corresponds to parton-level jets. Also, the subscript of a particle name denotes its p T ordering at the detector level. The projected 2σ exclusion limits on σ(µ + µ − → a + X's → V V + X's) are plotted in Fig. 6, assuming Poisson statistics [130]. Since the typical signal and background yields are sufficiently larger than one, signal significance is well approximated by the Gaussian limit S/ √ S + B, in which S (B) denotes the signal (background) counts in the signal region. Depending on ALP reconstruction efficiencies, EW boson decay branching ratios, and background levels, -16 -the limits from different analyses can vary by two orders of magnitude. Combining possible final states, the a → γγ mode has the highest sensitivity, followed by a → Zγ, a → ZZ, and a → W W with the least sensitivity.
Analysis of a → γγ As the preselection criteria, only events containing exactly two isolated photons with the transverse momentum of the leading photon satisfying p T (γ 1 ) > 350 GeV are kept. Also, no more than one charged track other than forward muons with p T > 10 GeV is allowed. The signal efficiency for the m a = 1(5) TeV benchmark is ∼ 57(74)% after the preselection. The same cuts keep only 8.5% of VBS diphoton backgrounds since most of them have low p T (γ) and are vetoed by the hard p T (γ 1 ) cut. The diphoton invariant mass (m γγ ) distributions for the VBS SM background and ALP benchmarks are shown in the upper left panel of Fig. 7. The typical widths of the a → γγ signal peaks are 20 GeV due to the high resolution on the photons' momenta. The signal region for each ALP benchmark is then defined as |m γγ − m a | < 3σ mγγ . Here the characteristic width σ mγγ is the half width at half maximum (HWHM) of each signal peak fitted by the Breit-Wigner distribution. Background yields in each signal region are calculated from a background fit using a generalized gamma distribution. 7 rather than the event counts to avoid large fluctuations. 8 Thanks to the high reconstruction efficiency and low backgrounds, the 2σ exclusion limit on σ(µ + µ − → a+X's → γγ +X's) reaches a few to 10 ab for TeV-scale axion at a √ s = 14 TeV muon collider with L = 10 ab −1 .
A similar search strategy applies to other benchmark beam energies. Fig. 8 shows the 2σ limits on σ(µ + µ − → a + X's → γγ + X's) for various muon collider energy benchmarks. Limits for optimistic luminosity scenarios are lower, corresponding to higher sensitivities, due to the aggressive increase of the integrated luminosities. However, in a conservative scenario where the luminosity is fixed, the exclusion limit becomes weaker for a higher √ s. The reason is that as m a becomes much smaller than √ s, the ALP produced picks up a greater boost in the beam direction on average. Consequently, the ALP decay products have larger |η|'s and are harder to reconstruct, reducing the signal efficiency. Meanwhile, SM backgrounds also increase moderately.
Analysis of a → γZ(→ ) We first select events containing exactly one isolated photon with p T (γ) > 350 GeV, and two isolated leptons with the leading p T ( 1 ) > 100 GeV. They must have opposite charges and same flavor. In addition, the dilepton invariant mass must satisfy |m − m Z | < 5Γ Z in which Γ Z denotes the full Z width. All events containing more than one extra charged track with p T > 10 GeV are vetoed, excluding the lepton pairs discussed above and forward muons. 7 We have also fitted the backgrounds with other common distributions, e.g., exponential, log-normal, and Pareto. It was found that generalized gamma distribution provides a similar if not better background fit in general. 8 To improve the quality of the background fit, any event with mγγ < 200 GeV is dropped. Similar requirements are applied to all other VBF channel analyses.
-17 - The m γ distributions of signal and background samples are shown in the upper right panel of Fig. 7. The signal peaks are also narrow, thanks to the high lepton and photon momentum resolutions. The signal region is defined as |m γ − m a | < 3σ m γ , where σ m γ is the HWHM of the fitted signal peak. Although the signal reconstruction is straightforward and backgrounds are small, the 2σ upper limits on σ(µ + µ − → a + X's → γZ + X's) from this analysis are only of O(100) ab because of the low BR(Z → ) ≈ 7%.
Analysis of a → γZ(→ jj) As the preselection, events with a hard isolated photon with p T (γ) > 350 GeV are chosen. Note that the inclusive Valencia jet algorithm may also recognize the isolated photon as a hard fat jet. Thus, the requirement on the number of fat jets with R = 1.2 is relaxed to N J ∈ [1,2]. For those events with two J's, it is necessary to ensure that one J comes from the hard photon. More specifically, we require one of them to have m J < 1 GeV and contain at most one charged track. In addition, the leading isolated photon must be inside this J. This light J will be identified and treated as the photon instead of a hadronic J. The p T of the remaining J (the true hadronic fat jet) must be greater than 200 GeV. To suppress W γ backgrounds, its m J is further required to be ∈ [m Z − 1.5Γ Z , m Z + 2.5Γ Z ]. The m J window is asymmetric around m Z to suppress the W → jj background and provide adequate signal efficiencies. Numerically, the m J selection above suppresses the W γ backgrounds by a factor of ∼ 3% at the price of keeping ∼ 40% of the signal events. Similar W jet suppressions are also present in other analyses with hadronic Z decays.
The middle left panel of Fig. 7 shows the m γJ distributions of SM backgrounds and several signal benchmarks. As expected, the finite jet momentum resolution makes signal peaks much wider than previous cases. Since signal peaks around m a are wide and asymmetric, the -18 -optimized signal region is defined as m γJ ∈ [m a − 1.5σ mγJ , m a + 0.5σ mγJ ] for each m a . Here σ mγJ is simply the standard deviation of each signal peak, as the asymmetric peaks make the fitted HWHM less meaningful. Benefited from the large BR(Z → jj), the 2σ limit on σ(µ + µ − → a + X's → Zγ + X's) from the a → γZ(→ jj) analysis is at least a factor of two better than its a → γZ(→ ) counterpart, in the entire axion mass range we have studied.
Analysis of a → ZZ(→ 4 ) For preselection, events with exactly four isolated leptons (excluding forward muons) with net electric charge zero are selected. Only those containing two same-flavor, opposite-sign lepton pairs both satisfying |m + − − m Z | ≤ 5Γ Z are kept. To suppress the soft SM ZZ → 4 backgrounds, p T ( 1,2 ) must be greater than 300 and 200 GeV, respectively.
The resulting distributions of four lepton invariant mass, m 4 , are shown in the middle right panel of Fig. 7. The narrow signal peaks leave a simple signal region definition as |m 4 − m a | < 3σ m 4 , where σ m 4 is the HWHM of each fitted signal peak. The projected 2σ upper bounds on σ(µ + µ − → a + X's → ZZ + X's) are of O(1) fb, considerably weaker than other channels due to the small BR(Z → ) 2 ∼ 5 × 10 −3 .
Analysis of a → Z(→ )Z(→ jj) We first preselect events containing exactly two oppositesign same-flavor leptons (excluding forward muons) with p T ( 1 ) > 300 GeV. Their invariant mass must satisfy |m + − − m Z | ≤ 5Γ Z . A fat jet with R = 1.2 and m J ∈ [m Z − 1.5Γ Z , m Z + 2.5Γ Z ] is also required to suppress the reducible VBS W (→ jj)Z(→ ) background. The signal and background m J2 distributions are shown in the lower left panel in Fig. 7.
The signal regions are defined by m J2 ∈ [m a − 1.5σ mJ2 , m a + 0.5σ mJ2 ], where σ mJ2 is the standard deviation of each asymmetric signal peak. Benefited from the high BR(Z → jj), the sensitivity provided by this analysis is stronger than the corresponding a → ZZ(4 ) one.
Analysis of a → ZZ(→ 4j) Events with exactly two fat jets (R = 1.0) with both p T (J 1,2 ) > 300 GeV are selected. Both J's must satisfy m J ∈ [m Z − 2Γ Z , m Z + 5Γ Z ], where a wider m J window is to ensure sufficient signal efficiencies. In addition, events containing more than one isolated leptons or photons with p T (γ/ ) > 10 GeV are vetoed.
The resulting distributions of two jet invariant mass are shown in the lower right panel of Fig. 7. Finally, the signal region is defined by m 2J ∈ [m a − 1.5σ m2J , m a + 0.5σ m2J ] for each m a benchmark, where σ m2J is the standard deviation of the fitted signal peak. Benefited from the sizeable BR(Z → jj), the analysis provides the leading constraint on ZZ detection channel with σ(µ + µ − → a + X's → ZZ + X's) O(100) ab.
Analysis of a → W W (→ 4j) Similar to the previous analysis, events having two R = 1.0 fat jets with both p T (J 1,2 ) > 300 GeV are chosen. To veto VBS ZZ(→ 4j) and ZW (→ 4j) backgrounds, each fat jet must satisfy m J ∈ [m W − 5Γ W , m W + 3Γ W ]. Also, no more than one isolated lepton or photon with p T (γ/ ) > 10 GeV is allowed.
The m JJ distributions of signal and backgrounds of this analysis are shown in the last panel of Fig. 7. The signal region is defined by m 2J ∈ [m a − 1.5σ m2J , m a + 0.5σ m2J ] for each m a benchmark, where σ m2J is the standard deviation of the signal peak. This analysis is subject to the highest background rate, stemming from the large VBS W W cross section and BR(W → jj) ∼ 70%. Meanwhile, the reconstructed signal peaks are also the widest among all analyses due to the broad m J window. As a result, the constraint on σ(µ + µ − → a + X's → W W + X's) is only about (100 -400) ab.  The signal samples used to derive the limits summarized in Fig. 6 are simulated assuming C BB = C W W and C BW = 0. It is then necessary to check whether collider limits depend on a particular choice of ALP couplings. The concern is that with different combinations of couplings, the signal cut efficiency can be affected as the relative strengths in various production channels change. Even though multiple sets of EFT couplings allow ALPs to decay to the same final states, they can still affect the kinematics of produced ALPs as the importance of different initial states varies. Different initial states could lead to distinctive ALP momentum distributions, which may lead to different signal efficiencies in multiple analyses. For example, the ALPs produced in a γγ-philic model (i.e., the ALP coupling to γγ is significantly larger than the others after EWSB) will have a greater average longitudinal boost than the ALPs produced in a W W -philic model. Then, in the γγ-philic model, the average |η| of the ALP decay products will also be larger, causing more difficult event reconstructions and reduced signal efficiencies.

Model Independence of the Limits
To examine the generality of limits in Section 5.1, we survey twelve models parameterized by {C BB , C W W , C BW } and compare their signal efficiencies. The descriptions of tested models are provided in Table 3. The relative standard deviations of the signal efficiencies, i.e. the -20 -standard deviations of signal efficiencies divided by the mean signal efficiencies of models considered, represent the analyses' model dependence 9 and are collected in Table 4. One could see that the signal efficiencies only vary by 20% for all the analyses in a wide range of m a . Therefore, it is reasonable to take the results in Fig. 6 as model-independent limits for VBF produced pseudo-scalars at a 14-TeV muon collider. Although minor model-dependent variations remain, the exclusion limits in Fig. 6 provide a solid order-of-magnitude estimate for further studies.
We also investigate some different new physics models with similar final state topologies. Specifically, we simulate models of heavy scalars with couplings to the square of EW field strengths [131] and heavy vector boson with Chern-Simons couplings to EW gauge bosons [132,133]. It turns out that signal efficiencies of these scenarios match with the ALP models within the 1σ range shown in Table 4. Hence, it is very likely the upper limits in Fig. 6 apply to generic VBF-produced resonances decaying to EW boson pairs. m a (TeV) γγ γZ(2 ) γZ(2j) ZZ(4 ) ZZ (2 2j Table 4: Percentage variations of the signal efficiencies for different search channels, based on models listed in Table 3. The percentage variation is defined as the standard deviation of signal efficiencies divided by the mean signal efficiency of all the models we checked.

Analysis Including Forward Muons
One possible way to suppress the SM backgrounds µ + µ − → (ZZ, Zγ, γγ) + X's is to use the beam remnants' information in the forward region. As discussed in Section 4.2, these processes are always associated with W + W − initial state due to the SM gauge symmetry; therefore, X's in these processes are only muon neutrinos. They will be strongly suppressed if we require one or two energetic forward muons in the analysis. Nevertheless, additional forward-region cuts also reduce signal efficiencies. Typically, only O(20%) of signal events will remain if two forward muons are required. In addition, limits obtained with forwardregion cuts are more sensitive to the model variation since different C V V couplings in the EFT significantly affect the identity and kinematics of the beam remnants. Exclusion limits using only events with two forward muons are shown in Fig. 9, together with corresponding ones without requiring forward muons from Fig. 6. In the high-m a region For cleaner reconstruction, we consider only channels with γ or Z → + − final states.
with small SM backgrounds already, results from analyses with no forward-muon cut take the lead as their signal efficiencies are higher. For some analyses, the forward-muon cuts could help improve the sensitivities slightly in the low-m a region. For example, requiring two forward muons improves the a → γZ(→ 2 ) and a → γγ analyses when m a 3 TeV. However, due to the small SM background and decreased signal efficiency from the forwardmuon cut, the limits from forward-muon-specific analyses are always weaker in the cleanest channel, i.e., a → ZZ(→ 4 ).

Muon Collider Phenomenology: Associated Production Channels
From Fig. 5, the associated production µ + µ − → V a appears to be a subdominant production channel for ALP at a muon collider. However, this channel has several interesting features that could potentially benefit the detection and calls for further study. Firstly, as is clear from Fig. 5, σ(µµ → V a) is approximately a constant for a wide range of m a up to the threshold m a ∼ √ s, where VBF production is less efficient. Secondly, the major SM backgrounds for this channel are much smaller than the VBS backgrounds for the VBF channel [53,87,88]. Last but not least, the V a production kinematics is almost independent of the ALP model choices when √ s m Z . The limits derived are thus approximately model independent.

Analysis of a(→ γγ)γ
Inspired by the strong limit set by the VBF a → γγ analysis, we focus on the µ + µ − → γa(γγ) process as the benchmark of associated production channels. The top panel of Fig. 10 displays the three photon invariant mass (m 3γ ) distributions of the SM background and four representative m a benchmarks, simulated according to the method discussed in Section 4. Note that m 3γ can be regarded as a proxy for the energy scale of the hard process, sometimes denoted as √ŝ in ISR-related studies. Aside from the detector smearing effects, the ISRinduced low-m 3γ tails are also obvious. The m 3γ distribution of the e + e − → 3γ process in SM with ISR effects (using the electron ISR PDF provided by MadGraph5) is also shown for comparison. Its ISR-induced tail is even more significant as expected since electrons radiate more than muons. σ Exc, with ISR (ab) 31.6 37.7 48.6 80.7 Table 5: Exclusion limit at 2σ level for the tri-photon search at a 14-TeV muon collider with 10 ab −1 data.
Due to different kinematics, the analysis procedures for "light ALP" (m a √ s/2) and "heavy ALP" (m a > √ s/2) benchmarks differ. For the light ALP case, we expect the associatively produced photon to be harder than the ALP decay products. We hence identify the two softer photons, i.e., γ 2 and γ 3 , as ALP decay products. The detector-level cuts p T (γ 1 ) ≥ 4.5 TeV and ∆R(γ 2 γ 3 ) ≤ 2 are imposed to further suppress backgrounds with small m γ 2 γ 3 . To suppress the potential VBF background, we also impose a detector-level cut on the hard scattering scale m 3γ ≥ 0.9 √ s. The lower left panel of Fig. 10 shows the signal and background distributions of m γ 2 γ 3 . Only events with a resonance peak and m γ 2 γ 3 ∈ [m a − 3σ mγ 2 γ 3 , m a + 3σ mγ 2 γ 3 ] are selected, where σ mγ 2 γ 3 is the signal HWHM. The corresponding 2σ limits are shown in Table. 5. Notice that the continuous SM background level increases slowly when m γ 2 γ 3 decreases. Nevertheless, the larger background is compensated by the improved photon resolution (hence narrower m γ 2 γ 3 window) and higher signal efficiencies at low m γ 2 γ 3 , rendering stronger constraints for smaller m a . The limits obtained from samples without ISR are also shown for comparison. As expected, the ISR effect enhances the SM background rate and thus weakens the sensitivity slightly. For the last benchmark (m a > √ s/2), we use an angular-separation-ordered strategy instead. This is inspired by the fact that the associated photon is less energetic than the heavy ALP, and the heavy resonance has a small boost inside the detector. Therefore, the photon pair from the ALP decay shall have a larger angular separation, and the signal peak shall be reconstructed by the photon pair with the largest ∆R. Our simulation also suggests that imposing min{p T,γ i } ≥ 1 TeV, max i,j {∆R γ i γ j } ≤ 5, and m 3γ ≥ 0.9 √ s helps separate -23 - the signal and SM backgrounds. The m γγ range of the signal region is determined ad hoc as m γγ ∈ [9.86, 10.1] TeV since the reconstructed signal peak shape is highly non-trivial. Nonetheless, this choice of the signal region carves out most of the resonance peak around m a . Relevant m γγ distributions are shown in the lower right panel of Fig. 10, and the 2σ exclusion limit for the m a = 10 TeV benchmark is listed in Table 5.
To conclude, limits from the a(→ γγ)γ analysis are weaker than those of the VBF diphoton channel by a factor of O(1 − 10) at a 14-TeV muon collider. A more detailed study on the ALP associated production channel with different final states and collider energies will -24 -be left to future work.

Constraining the EFT
With ALP production rates, branching ratios, and corresponding collider sensitivities known, one can put constraints on the ALP EFT in Eq. (2.1). ALP production rates, computed by MadGraph5, are quadratic functions of the EFT couplings, and the decay rates in terms of the couplings are given in Eq. (2.5). For collider sensitivities, we take the model-averaged limits discussed in Section 5 and 6.
We first focus on the C W W -C BB subspace with C BW = 0, which is the most common scenario. The projected 2σ constraints for a 1-TeV ALP at a 14-TeV muon collider with L = 10 ab −1 is shown in Fig. 11. We show the parameter space covered by the four most sensitive VBF analyses and the 3γ analysis with associated production. While diphoton channel provides the strongest constraint on the signal rate, as demonstrated in Fig. 7, it is insensitive to the C BB −C W W direction in the coupling plane, as shown in Fig. 11. This could be understood from Eq. (2.4): along this direction, the ALP coupling to photon vanishes. Thus, we need other analyses to complement the diphoton resonance search. In particular, the γZ(2j) analysis provides the strongest sensitivity along the C BB −C W W direction. The combined 2σ contour takes a butterfly shape, as visible in Fig. 11. In brief, a 14-TeV muon collider with L = 10 ab −1 data could probe C BB /f a and C W W /f a down to O(1) TeV −1 , which is at least one order of magnitude more sensitive than HL-LHC! Other muon collider energy/luminosity benchmarks are considered in Fig. 12. The VBF a → γγ exclusion limits for other running benchmarks are adopted from Section 5.1 directly. For limits on other final states, they are approximated by rescaling σ(µ + µ − → a + X's → γγ + X's) limits in Fig. 8. In particular, we assume that the ratios between the limits at any ( √ s, L) and (14 TeV, 10 ab −1 ) in every search channel stay the same as those from di-photon limits. We checked this assumption by completing the analysis with full simulations in the next-to-most-contributing detection channel, γZ(jj) channel, with a variety of values of m a and √ s. The percentage variations between the rescaled exclusion limits and the exclusion limits with full simulations are 20% for √ s = 10, 30 TeV benchmarks and are at most ∼ 50% for the √ s = 50 TeV benchmark. When √ s increases, the VBF ALP production rates increase, and consequently, the projected limits improve for both conservative and optimistic luminosity scenarios. The left panel of Fig. 12 shows the limits in the conservative luminosity scenario, while the right panel shows the ones in the optimistic scenario. By comparing both panels, the significant benefit from high integrated luminosities is evident. Now we turn to the case in which C BW = 0. We treat C BW on an equal footing as C W W and C BB . To better understand the effects of a non-zero C BW , we show exclusion contours in the C W W -C BB plane with C BW /f a = {−10, −5, 0, 5, 10} TeV −1 , in the top panel of Fig. 13, with darker color indicating a higher value of C BW . Near the origin of the plane, we obtain the familiar butterfly-shaped contour at the center of Fig. 11. As C BW becomes non-zero, the constraints shift away from the origin due to the interference between operators. A larger -25 - |C BW | (so as the other couplings) also gives rise to a higher ALP production rate in general, leading to shrinking contours. Similar analyses are implemented in the C W W -C BW plane with C BB /f a = {−6, 0, 6} TeV −1 , and in the C BW -C BB plane with C W W /f a = {−4, 0, 4} TeV −1 . It is noteworthy that the overall constraints are the weakest in a direction C W W : C BB : C BW −3 : −5 : 8 that simultaneously renders a small ALP production rate and elusive ALP decays with BR(a → γZ)∼ 5% and BR(a → γγ) < 1%.

Summary and Outlook
In this article, we present a detailed analysis of TeV-scale ALP searches at a future highenergy muon collider. In particular, we focus on the searches that probe axion couplings to EW gauge bosons. The dominant ALP production channel at a muon collider is the VBF channel due to the high virtual EW gauge boson content of high-energy muon beams, -26 - analogous to the virtual gluon content of high-energy proton beams. We show that the a → γγ final state enjoys the highest sensitivity, followed by the Zγ mode. We also analyze the associated production channel with a lower rate but a smaller background, which turns out to be less sensitive compared to the VBF diphoton channel. Meanwhile, we show the projected constraints on various subspaces of ALP couplings in the EFT. A muon collider with energy 10 TeV could improve the sensitivity to ALP couplings by at least one order of magnitude compared to the HL-LHC, as well as expand the mass range of ALPs that could be searched for. We also demonstrate that the model-independent limits on ALP VBF production and decays are also applicable to generic BSM resonances coupling to EW gauge bosons, which could benefit future related studies. This study serves as another example of the great physics potentials of a high-energy muon collider.
There are several directions to expand the work. First, we base our study on the EFT of an ALP coupling to EW gauge bosons. While the EFT usually serves as a useful modelindependent theoretical framework for experimental searches, the UV completions could predict (model-dependent) degrees of freedom and signals that could also be within reach. In particular, the couplings of heavy ALPs that current and future colliders are sensitive to are pretty large. They could be induced by heavy fermions carrying EW charges, which could be searched for as well. It will be useful to survey and classify possible UV completions of the ALP EFTs, check whether there are some generic predictions for collider phenomenologies, and compare the sensitivities to the associated UV degrees of freedom and to ALPs.
-27 - Secondly, the beam remnants in the forward region in the VBF channels ecode additional information. We carry out a crude estimate in this paper and find that they could lead to minor improvements of sensitivities in some mass range of ALPs. It will be useful to examine how to employ the forward-region information in a more sophisticated way, which could be valuable inputs for the design of a future muon collider. Thirdly, despite of the weaker constraints on the EFT couplings from γa(γγ) associated production channel, a generic V a channel is robust against model variations and systematic uncertainties. In this study, we use the most straightforward observables and analysis strategy on the benchmark tri-photon channel. Dedicated designs of observables and analyses in other V a channels can potentially complement the VBF search at a future muon collider and are worth further investigations.
Note added -After our preprint draft appeared on arxiv, we became aware of another work [134] on the same subject. While both papers share some components in analyses, they differ and complement each other in several aspects.