Prospects of measuring the branching fraction of the Higgs boson decaying into muon pairs at the International Linear Collider

The prospects for measuring the branching fraction of $H \to \mu ^+ \mu ^-$ at the International Linear Collider (ILC) have been evaluated based on a full detector simulation of the International Large Detector (ILD) concept, considering centre-of-mass energies ($\sqrt{s}$) of 250 GeV and 500 GeV. For both $\sqrt{s}$ cases, the two final states $e^+ e^- \to q\overline{q}H$ and $e^+ e^- \to \nu \overline{\nu}H$ have been analyzed. For integrated luminosities of 2 ab$^{-1}$ at $\sqrt{s} = 250$ GeV and 4 ab$^{-1}$ at $\sqrt{s} = 500$ GeV, the combined precision on the branching fraction of $H \to \mu ^+ \mu ^-$ is estimated to be 17{\%}. The impact of the transverse momentum resolution for this analysis is also studied.


Introduction
: The Higgs production cross-section as a function of √ s. Taken from Ref. [25]. process changes by about 40% between the two beam polarisation configurations, while the ννH process is more significantly affected due to the WW -fusion contribution to the νν final state. At √ s = 250 GeV, the e + e − → ZH process is the dominant production process. Thus, the ZH → qqµ + µ − channel is the most important signal process at this energy due to the large branching fraction of Z → qq. Since WWfusion is the major production process at √ s = 500 GeV, ννH → νν µ + µ − with left-handed polarisation (including both the WW -fusion as well as ZH with Z → νν) is the most relevant channel at this energy. As the cross sections of the ZH and WW -fusion processes will be known to very high precision from other ILC measurements, the separation of the two production modes is not targeted in this paper and it is assumed that the uncertainties on these cross sections will have only a negligible impact on the extraction of the branching fraction. The expected numbers of H → µ + µ − signal events for each channel are summarised in Table 1, together with the integrated luminosities based on the assumed running scenario. The fourth column introduces the abbreviations to specify the combination of √ s, beam polarisation, and signal production process used throughout this paper. The processes e + e − → ℓ + ℓ − H → ℓ + ℓ − µ + µ − , where ℓ denotes a lepton (e, µ, or τ), are not considered in this paper.
The prospects for measuring the H → µ + µ − decay at linear e + e − colliders have been studied previously under various conditions [12,[26][27][28][29][30], but all studies except Ref. [30] have been performed at a centre-of-mass energy of 1 TeV or higher. The studies in Refs. [28] and [30] are based on a mass of the Higgs boson of 120 GeV. In Ref. [30] for example, the signal significance is estimated to be 1.1σ , which corresponds to a precision on BF(H → µ + µ − ) of 91%, based on 250 fb −1 of data at √ s = 250 GeV from Figure 2: The transverse momentum resolution for single muon events as a function of the momentum of particles, for tracks with different polar angles. The points show the resolution as obtained from full simulation of the ILD detector, the lines correspond to the design goal of σ 1/P t = 2 × 10 −5 ⊕ 1 × 10 −3 /(P t sin θ ) for θ = 30 • (green) and θ = 85 • (blue). Taken from Ref. [12].
the analysis of the process e + e − → ZH → qqµ + µ − , assuming a Higgs mass of 120 GeV and the Silicon Detector (SiD) concept [12,30] for the ILC. Our study comprehensively evaluates the measurement precision of H → µ + µ − channel assuming the mass of the Higgs boson of 125 GeV and the running scenario of the ILC for √ s = 250 GeV and 500 GeV. This paper is structured as follows: in Sec. 2 the ILD concept and the conditions used for producing the Monte-Carlo (MC) data samples are briefly introduced. The details of the analysis at √ s = 250 GeV and 500 GeV are explained in Sec. 3. The impact of the transverse momentum resolution σ 1/P t of the central ILD tracking system specifically for this analysis is discussed in Sec. 4 before summarising in Sec. 5.

The ILD Concept and MC Samples
ILD [12,22] is one of the proposed detector concepts for the ILC. It is a multi-purpose detector designed for particle flow analysis based on the reconstruction of hadronic jets. ILD consists of a high precision vertex detector, a time projection chamber, silicon tracking detectors, a highly granular calorimeter system and a forward detector system, all placed inside of a solenoid providing a magnetic field of 3.5 T, surrounded by an iron yoke instrumented for muon detection. Details of the ILD design, as well as about the particle flow concept can be found in Refs. [12,13,31].
Specifically for this analysis, the key performance aspect of the detector is the transverse momentum resolution σ 1/P t , since the invariant mass of the muon pair will be the final observable for distinguishing the signal from the background. The goal of the ILD design for the transverse momentum resolution is σ 1/P t ∼ 2× 10 −5 GeV −1 at high momenta in the central region of the detector [32]. This level of performance ensures that the model-independent selection of e + e − → ZH events from the recoil against leptonic Z → µ + µ − decays is dominated by beam energy spread rather than by the detector effects [12]. This goal is compared to the transverse momentum resolution obtained from the ILD full detector simulation in Fig. 2. The impact of transverse momentum resolution will be discussed in Sec. 4. The performance of electromagnetic calorimeter will be important for the recovery of final state radiation photons, which, however, is not yet considered in this study.
The MC samples have been generated in the context of the ILC Technical Design Report [12] with the matrix element generator Whizard [33] (version 1.95). Initial state radiation and the effect of beamstrahlung, as simulated with GuineaPig [34] based on the beam parameters [35], are included in the event generation. Pythia [36] (version 6.422) is used for parton shower development, hadronisation, and to decay short-lived particles, other than leptons. The decays of tau leptons are simulated by Tauola [37][38][39]. The full detector simulation based on Geant4 [40] has been performed in the Mokka framework [41] with the so-called ILD_o1_v05 detector model [12]. The pile-up from γγ → low P t hadron events has been generated based on the cross-section model described in Ref. [42]. These events have been passed through the same Geant4-based detector simulation and the resulting hits were overlaid to all MC samples before the reconstruction. Events have been reconstructed using PandoraPFA [31] in the Marlin framework [43].
To make the analysis as realistic as possible, all relevant SM processes with up to six fermions in the final state have been included. For the ILC-TDR [12], the SM background samples are grouped with the number of fermions in the final state. For example, the e + e − → 2 f process comprises the SM processes with two fermions † in the final state, i.e., two quarks or two leptons. Table 2 shows the list of MC samples used in this analysis. The total number of simulated events are of the order of 10 7 for each centre-of-mass energy. For the production of these MC samples, a significant amount of CPU time was necessary. A new MC production of similar size in the context of the recently completed ILD Interim Design Report [44] required about 320 CPU-years.
In the qqH analysis, the process e + e − → qqH with H → µ + µ − is considered as the signal, and all other processes, including other Higgs channels, are considered as background. Similarly, in the ννH analysis, e + e − → ννH → νν µ + µ − is considered as signal, while all other processes are regarded as background.

Analysis
The analysis is structured in the same way for all channels: first, a pair of well-measured, prompt, oppositely charged muons consistent with H → µ + µ − is selected by a series of sequential cuts described in Sec 3.1. These cuts are "common cuts" for all analysis channels since they only pertain to the properties of the H → µ + µ − signal-candidates. Then, the rest of the event is subjected to channel-specific event reconstruction and event selection as detailed in Sec. 3.2 to 3.4. To perform the final event selection, a multivariate analysis technique is used, as described in Sec. 3.5. Finally, BF(H → µ + µ − ) is extracted from a template fit to the invariant di-muon mass distributions in each channel. A toy MC technique is applied to estimate the final precision. This technique and its results will be discussed in Sec. 3.6 and 3.7. Table 3: Requirements for selecting H → µ + µ − candidate in the IsolatedLeptonTagging processor. The definition of variables is in the text. variables qqH250-L/R nnH250-L/R qqH500-L/R nnH500- Table 4: The "common cuts" for H → µ + µ − candidate. The definition of variables is in the text. # variables qqH250-L/R nnH250-L/R qqH500-L/R nnH500-L/R 1 First, the IsolatedLeptonTagging processor [45] is applied to select the H → µ + µ − candidates. The criteria required for isolated muon candidates are listed in Table 3. Here, E CAL is the total energy deposit in the calorimeter system (apart from the BeamCal), p track is the momentum of the track, E yoke is the energy deposit in the yoke, d 0 and z 0 are the impact parameters in transverse and longitudinal directions [46], respectively, with their uncertainties σ (d 0 ) and σ (z 0 ) as obtained for each individual track fit. A multivariate double-cone method is used to check the isolation of each particle, and a cut on the MVA output is applied. In most cases, the default values of the IsolatedLeptonTagging processor are used for isolated muon identification. In the case of the nnh250-L/R channels, the signal rate is rather low to start with, while the events hardly contain any other particles than the muons. Therefore, some of the criteria have been relaxed for these channels. The particles passed all the requirements listed in Table 3 are considered as isolated muon candidates. Only events that have exactly one µ + and one µ − are considered for further analysis. The "common cuts" applied to the H → µ + µ − candidates are summarised in Table 4. They have been chosen by maximising efficiency times purity. Here, χ 2 /Ndf(µ ± ) is the reduced χ 2 of the muon track fit, σ (M µ + µ − ) is the event-by-event mass uncertainty as obtained from error propagation from the track parameter uncertainties, M µ + µ − is the invariant mass of H → µ + µ − candidate, and θ µ + µ − is the angle between the µ + and the µ − . Cut #1 (χ 2 /Ndf) serves to select well-measured tracks, followed by two cuts (d 0 and z 0 ) which ensure prompt tracks and reject muons likely to originate from τ decays. The cut on σ (M µ + µ − ), cut #4, rejects events with too imprecise mass measurement, while cut #5 on M µ + µ − ensures that the invariant mass of the candidate is well above M Z , while not removing any di-muons with a mass close to M H . Fig. 3 shows the distribution of M µ + µ − before applying cut #5 for the example of the qqh250-L channel. The last cut (#6, cos θ µ + µ − ) requires the di-muons to have a minimum opening angle which is defined by the boost of the produced Higgs boson and thus depends significantly on the centre-of-mass energy. # Events / 10 GeV Figure 3: The distribution of M µ + µ − before applying cut #5 (see Table 4) in qqH250-L. The solid blue histogram shows the signal process.

ISR Identification
Some events include energetic initial state radiation (ISR) photons which, if within the detector acceptance, will affect the further analysis. Therefore a simple ISR identification procedure is applied after the muon identification. First, a candidate photon is selected if its energy E photon is greater than 10 GeV. All charged particle energies in a cone with half-opening angle cos θ cone = 0.95 around the photon are summed up. If this energy sum is less than 5% of the photon energy, the photon is regarded as an ISR photon. These ISR photons are not subject to jet reconstruction.

Jet Reconstruction
For the qqH channels, a jet clustering algorithm is applied to reconstruct Z → qq candidates. After the selection of H → µ + µ − and a possible ISR photon, one can expect that the remaining particles consist of Z → qq and some contribution from the overlaid γγ → low P t events. At √ s = 250 GeV, only 0.4 γγ → low P t hadron events are expected per bunch crossing on average [42]. Thus, no dedicated attempt is made to remove the overlay and the Durham clustering algorithm [47] is used to force the remaining particles into two jets. However at √ s = 500 GeV, the average number of γγ → low P t hadron events per bunch crossing increases to 1.7 [42]. To remove this background, an exclusive k T clustering algorithm [48,49] is applied to the remaining particles, requesting four jets with a generalised jet radius of 1.0. The jet radius has been tuned to optimise the reconstruction of the invariant mass spectrum of the Z → qq system. The clustering into four jets has been proven to render the overlay-removal step more robust in the presence of hard gluon emission [50]. After this process, the Durham algorithm [47] is used to force the particles contained in the four k T -jets into two final jets. The H → µ + µ − candidates and ISR photons are not included in jet clustering.

Preselection
After the general preselection described in Sec. 3.1, channel-specific cuts are applied. Table 5 summarises the cuts applied in the qqH channels. Cut #1 vetoes isolated leptons since no further isolated leptons beyond the H → µ + µ − candidate are expected in the event. For this cut, the IsolatedLeptonTagging processor [45] is applied again to the remaining particles and requires that no isolated leptons (electrons or muons) are found. The cut on the number of charged particles in each jet (#3) is applied to remove Table 5: The channel-specific cuts for the qqH processes. The definition of variables is in the text. # quantity qqH250-L/R qqH500-L/R 1 number of isolated leptons beyond µ + µ − pair 0 2 jet clustering successful yes 3 number of charged particles in each jet ≥ 2 4 M j j (GeV) 50 -130 50 -160 Table 6: The channel-specific cuts for ννH processes. The definition of variables is in the text.
| cos θ miss | < 0.99 reconstructed jets of low charged multiplicity originating mostly from 1-prong hadronic tau decays. The last cut #4 selects events with the invariant mass M j j of two jets consistent with Z → qq. Since the di-jet mass resolution is somewhat worse at √ s = 500 GeV, the allowed mass window is wider than in the √ s = 250 GeV case. The channel-specific cuts for the ννH channel are summarised in Table 6. Apart from the H → µ + µ − candidate, there should be no visible particles in the event except for some contribution from γγ → low P t hadron backgrounds which typically have low transverse momentum. Therefore, the number of charged particles in an event, excluding the H → µ + µ − candidate, which have a transverse momentum larger than 5 GeV, N P t , has to be zero (cut #1). The four-momenta of all particle flow objects in an event are summed up to the visible four-momentum, of which E vis and P t are the energy and the transverse momentum, respectively. The visible four-momentum can be subtracted from the four-momentum of the initial state, ( √ s, √ s · tan θ cross /2, 0, 0) where θ cross = 14 mrad is the crossing angle of beam collision, to obtain the missing four-momentum, and in particular its polar angle, θ miss . The cuts #2 to #4 (E vis , missing P t , and cos θ miss ) use these quantities to select events with neutrinos. The E vis requirement thereby depends on the centre-of-mass energy. Table 7 shows the number of signal and background events in each channel after the preselection. Overall, the signal selection efficiency is ∼ 85% in all channels. The "other Higgs" category includes all events with Higgs bosons with other decay modes than the signal. The category of "irreducible" backgrounds is defined as follows: for the qqH process, e + e − → 4 f → qqµ + µ − and qqτ + τ − with both tau leptons decaying into µ are defined as irreducible. For the ννH process, e + e − → 4 f → νν µ + µ − , νν µτ with τ decaying into µ, and νντ + τ − with both τ decaying into µ are defined as irreducible. These events have the same or very similar final states to the signal, thus they are difficult to remove. After the preselection, the irreducible category dominates the background in nearly all analysis channels. In case of the qqH500-L/R channels, though, the e + e − → 6 f processes (dominated by e + e − → tt) remain at the same level as the irreducible background.

Multivariate Analysis
For the further rejection of background, in particular the "irreducible" one, a multivariate analysis is performed based on the gradient boosted decision tree method (BDTG) implemented in the TMVA package in ROOT [51,52]. Typically, ∼ 10 4 MC events remain after the preselection for signal and background, each. In all channels, half of the remaining events after the channel-specific preselection are used for training and the other half for testing. The variable M µ + µ − is not used in the BDTG since it will be E sub , cos θ sub qqH500-L/R M j j , cos θ j j , P t,µ + µ − , cos θ µ + µ − , cos θ µ + − cos θ µ − , M recoil , E lead , E sub , cos θ lead , cos θ sub nnH500-L/R E vis , cos θ thrustaxis , E µ + µ − , cos θ µ + µ − , cos θ µ + − cos θ µ − , E lead , E sub , cos θ lead , cos θ sub used later in further analysis. The input variables for the BDTG are summarised in Table 8 for all the channels. Here, θ j j is the angle between two jets, E µ + µ − is the energy sum of the H → µ + µ − candidate, θ µ + (µ − ) is the polar angle of µ + (µ − ), E lead (E sub ) is the energy of the higher energy (lower energy) muon of the H → µ + µ − candidate, θ lead (θ sub ) is the polar angle of the higher energy (lower energy) muon of the H → µ + µ − candidate, M recoil is the recoil mass against the H → µ + µ − candidate (corrected for reconstructed ISR photons), P t, µ + µ − is the transverse momentum of the H → µ + µ − candidate system, and θ thrustaxis is the polar angle of the thrust axis of the visible part of the event. The variable E µ + µ − is used in nnH500-L but not in nnH500-R. For each channel, the minimum number of relevant inputs has been chosen, considering the minimisation of correlations and avoiding overtraining, in particular given the finite amount of available MC events. Fig. 4 shows the seven input variables for qqH250-L, while the corresponding distribution of the BDTG score for the signal and background events is displayed in Fig. 5. For each channel, the final cut value on the BDTG score is chosen such that it optimises the expected precision on BF(H → µ + µ − ) as described in the following section.

Extraction of the Signal Strength
After applying a cut on the BDTG score, the signal strength is extracted from a template fit to the invariant di-muon mass M µ + µ − distribution for signal and background with the signal normalisation as a  Due to the excellent mass reconstruction, the whole template fit is restricted to the range 120 GeV < M µ + µ − < 130 GeV. For the signal, a linear sum of a Crystal Ball function (CB) [53] and a Gaussian, is used as modeling function f S . This empirical function models sufficiently well the combined effect of final state radiation photons, which create a tail in the M µ + µ − distribution of the signal process, as well as effects of the finite detector resolution. It should be noted that no attempt has been made to recover final state radiation photons because the measuring accuracy of the electromagnetic calorimeter is not good enough to improve the mass resolution of events with recovered photons. As we will see in Sec. 4, an excellent invariant mass resolution is a core ingredient to the final performance of the analysis, and thus recovery of final state radiation is not considered worthwhile in this case. For the signal modeling, an unbinned fit is performed to avoid effects of the bin width which, due to finite MC statistics, cannot always be small compared to the width of the mass peak, especially when considering different P t resolutions in Sec. 4. The Higgs mass itself is assumed to be known very precisely, to about 14 MeV, from the recoil analysis [20]. Therefore, the mean value of the CB is fixed to the nominal Higgs mass of 125 GeV in this study. Figure 6(a) illustrates the modeling of the signal M µ + µ − distribution using the nnH500-L channel as example. In this example, the parameter k in Eq. (1) is 0.92. The width of the peak at half its maximum height (FWHM) is 0.23 GeV. The background is modeled by a straight line f B in all channels. An example is given in Fig. 6(b), again based on the nnH500-L channel.
The fitted f S and f B are then used as probability density functions for the generation of 2× 10 4 pseudodata sets via a toy MC technique based on RooFit [52,54]. In each pseudo-experiment, the number of pseudo-signal(-background) events is drawn from a Poisson distribution with the estimated average number of signal(background) events after all cuts as expectation value. Then, an unbinned fit of the function f ≡ Y S f S +Y B f B to the sum of pseudo-data is performed, where Y S (Y B ) is the yield of signal(background) events. Thereby, Y B is fixed to the expected average number of backgrounds after all cuts, assuming that by the time the ILC has collected its full data set, the SM background at a lepton collider can be predicted much more precisely than the statistical uncertainty for rare signal events. Thus, Y S is the only free parameter in the template fit. Fig. 7 shows an example of one pseudo-experiment in the nnH500-L channel. The final Y S distribution from 2 × 10 4 pseudo-experiments is fitted by a Gaussian to extract its mean and width as shown in Fig. 7(b). The expected relative precision on BF(H → µ + µ − ) is calculated as the width of the fitted Gaussian divided by the mean of the fitted Gaussian, which in all channels agrees with the mean number of signal events expected from the full simulation listed in Table 9.
The cut values for the BDTG score have been optimised for each analysis channel by applying the toy MC procedure described above for different values of the cut and selecting the one which gives the best measurement precision. Figs. 6 and 7 correspond to the optimal BDTG score cut case in nnH500-L. Table 9 shows the number of signal and background events in each channel after the optimisation of the BDTG score cut. The signal efficiency ranges between 45% and 72%, with an overall average of 53%. A notable exception is the nnH250-L channel, which gives an optimal result for a very hard cut on the BDTG score and as a result has a rather low efficiency of only 28%, while the background is still higher than in its sister channel nnH250-R. This is an effect of the W + W − contribution to the irreducible background, which has a much higher cross-section in the left-handed polarisation configuration, while the corresponding increase for the signal is much smaller since it is -at this energy -dominated by ZH production. At 500 GeV, the effect on the background is even more drastic, but since the now WWfusion dominated signal profits in the same way from the polarisation, there is no need to optimise for an extremely hard cut on the BDTG score. In all cases, the total background count is strongly dominated by the irreducible component. This implies that the misidentification of the final-state particles is not a limiting factor in the analysis. In the future it could be investigated, however, whether the event kinematics could be exploited in a more efficient way to suppress the irreducible component, as will be discussed in more detail below.

Results and Discussion
The expected precisions on BF(H → µ + µ − ) obtained in the eight channels are summarised in Table 10. With the √ s = 250 GeV data alone, a precision of 23% can be reached, dominated by the qqH channels.   The √ s = 500 GeV data alone reach 24%, but now the ννH channel in the left-handed data set is the most sensitive. A combination of all data sets improves the expected precision to 17%.
These numbers demonstrate a significant improvement with respect to earlier analyses. An extrapolation of the result reported by SiD [30] to a luminosity of 0.9 ab −1 , which corresponds to the size of the left-handed data set at √ s = 250 GeV in the present study, yields a precision on BF(H → µ + µ − ) of ∼ 48%. This can be compared to the qqH250-L result of 34% in the present analysis. This difference originates partially from a more sophisticated, multivariate analysis instead of the cut-based selection in the SiD study. But more importantly, for the intermediate momentum range of 40 to 100 GeV which is of relevance here, the ILD momentum resolution is considerably better than that of SiD: in the case of SiD [12], σ 1/P t ranges between 3 × 10 −5 GeV −1 for P = 100 GeV at θ = 90 • to 2 × 10 −4 GeV −1 for P = 40 GeV at θ = 30 • , while in the ILD case the relevant numbers range from 2 × 10 −5 GeV −1 for P = 100 GeV at θ = 85 • to 5 × 10 −5 GeV −1 for P = 40 GeV at θ = 30 • , as shown in Fig. 2.
The combined result of the present study is about 50% larger than the most recent projections for the HL-LHC based on the tracker upgrades for ATLAS and CMS introduced in Sec. 1. Taking into account that HL-LHC will provide O(10 4 ) H → µ + µ − events with the full expected integrated luminosity of 3 ab −1 , and thus about 100 times more signal events than ILC with 2 ab −1 at √ s = 250 GeV and 4 ab −1 at √ s = 500 GeV together, a difference of only 50% shows the highly efficient use of data possible at an e + e − collider. In addition, the results of the analysis at ILC1000 presented in Ref. [27] can be extrapolated to the full luminosity of 8 ab −1 expected at ILC1000 [23] to yield a precision of 14% on BF(H → µ + µ − ). Combining this result with our analysis at 250 GeV and 500 GeV yields a precision of 11%. Thus, from a combination of HL-LHC with the full ILC program a precision of 7% could be expected, without taking into consideration possible improvements of the analysis.
For a better understanding of analysis limitations, one can compare these results with the "theoretical limit" case, i.e. assuming 100% signal selection efficiency and no backgrounds. In this hypothetical case, the precision would reach 10.4% for ILC250, and 7.1% for the full ILC250+500 data set. The results presently achieved in full detector simulation are about a factor of 2.4 more than these theoretical limits for three reasons: the signal efficiency of 53% on average, the remaining irreducible backgrounds, and the invariant mass resolution for the di-muon system.
• If only the signal efficiencies as given in Table 9 are considered, the combined precision at ILC250 would be 13.4% and would improve to 9.4% when combined with the 500 GeV data, which is a factor of ∼ 1.7 better than the full simulation results. Table 11 shows the detailed cut flow of the single most sensitive channel qqH250-L as an example. At the "common cuts" stage, ∼ 10% of the signal events are lost. In about 4% of the events, the muons are not found, and about 2.5% of events are lost due to the d 0 and invariant mass requirements, each, c.f.; Table 11. The invariant mass requirement mostly fails due to the presence of FSR, which is not recovered, c.f. discussion in Sec. 3.6. During the rest of the preselection, only a few additional percents are lost, while a ∼ 15% reduction occurs via the BDTG score cut. In total, it seems hard to increase the overall signal efficiency drastically, but some improvement could be achieved by exploiting the variables discussed in the next item.
• The irreducible background almost entirely consists of processes with the same final state as the signal process: e + e − → qqµ + µ − for the qqH process and e + e − → νν µ + µ − for the ννH process, originating from ZZ as well as, in the case of νν µ + µ − , from W + W − production and single-Z/γ radiation off a t-channel W . Future upgrades of the analyses could attack these kinds of backgrounds by even better exploitation of all kinematic information, e.g. by testing various intermediate boson hypotheses in a kinematic fit [55] and/or by evaluating the matrix element probabilities for the common cuts (see Table 4) #1 39 (95%) 9.4 × 10 3 1.10 × 10 5 6.5 × 10 7 #2 38 (93%) 9.0 × 10 3 1.04 × 10 5 4.5 × 10 7 #3 38 (93%) 9.0 × 10 3 1.04 × 10 5 4.5 × 10 7 #4 38 (93%) 9.0 × 10 3 1.03 × 10 5 4.5 × 10 6 #5 37 (90%) 2.0 × 10 2 5.08 × 10 3 3.1 × 10 5 #6 37 (90%) 2.0 × 10 2 3.80 × 10 3 1.9 × 10 5 preselection (see Table 5 signal and various background hypotheses on an event-by-event basis [56]. There are also some background events with tau leptons such as νντ µ with τ decaying to µ, but this contribution is negligible compared to the ones with exactly the signal final state. • Last but not least, the di-muon invariant mass resolution is an important ingredient -as soon as the number of background events at the end of the selection is larger than zero: the sharper the reconstructed Higgs mass peak, the lower the "effective" amount of background under the peak. The invariant mass resolution is dominated by the precision achieved on the transverse momentum of the two muons, while the angular resolutions play a negligible role. Therefore, we will study the impact of the (inverse) transverse momentum resolution σ 1/P t in Sec. 4.
Finally, it should be noted that in the ννH process, especially at √ s = 500 GeV, two signal processes (ZH process with Z → νν and WW -fusion process) are contributing. The relative contributions of these production modes will be fixed to the percent-level or better from other Higgs decay modes like H → bb and can be used to convert the cross section times branching fraction measurement into a measurement of BF(H → µ + µ − ). With the help of the total ZH cross section determined with the recoil method, the absolute H µ µ Yukawa coupling can be extracted. Therefore, the quoted ILC precisions can directly be taken as precision on the branching fraction for H → µ + µ − , or, divided by a factor of two, as precision on the muon Yukawa coupling. This is qualitatively different from the signal strength determinations at the (HL-)LHC.

Impact of the Transverse Momentum Resolution
The di-muon mass M µ + µ − is the most important observable for this analysis. The uncertainty on M µ + µ − is directly related to the precision of the measurement of the muon momentum, and in particular the resolution on its transverse component, σ 1/P t , plays a crucial role in this analysis. The transverse momentum resolution of the ILD detector has been shown already in Fig. 2 as a function of the momentum for different polar angles.
Instead of implementing the full p and θ dependency of the resolution, a simplified approach of smearing all true muon transverse momenta with the same resolution has been taken here. This is a fully justified approach in case of √ s = 500 GeV, since the vast majority of muons have high momenta in the asymptotic regime, and, due to the isotropic decays of the Higgs boson, are mostly at large polar angles in the centre of the detector. For the case of √ s = 250 GeV, the muons have lower momenta between 40 and 100 GeV and the approximation is less precise, but still useful.
The dependence of the result on the asymptotic value of the transverse momentum resolution has been studied by adding a Gaussian-distributed error to the transverse momentum taken from the MC-truth information for all events passing the preselection described in Sec. 3.4. All other quantities in the event are taken from the full simulation as before. Transverse momentum resolutions between 1 × 10 −3 to 1 × 10 −6 (GeV −1 ) have been considered. The background is kept unchanged from the full simulation study since its invariant mass distribution after the BDTG score cut does not exhibit any sharp peaks, as can be seen, e.g.: in Fig. 6, and thus a change in momentum resolution will not affect the distribution significantly. Fig. 8 shows the obtained precision on BF(H → µ + µ − ) as a function of the transverse momentum resolution σ 1/P t at √ s = 250 GeV, together with the theoretical limit as defined in Sec. 3.7 shown by dashed lines. The red line indicates the typical value of transverse momentum resolution at √ s = 250 GeV. The "effective" resolution for which the smearing approach gives the same precision on the branching fraction as the full simulation is ∼ 4 × 10 −5 . This result is consistent with Fig. 2, because at this energy muons typically have momenta in the regime of 40 to 100 GeV which corresponds to a resolution of around ∼ 4 × 10 −5 .
The following conclusion can be drawn. First of all, with σ 1/P t = 2 × 10 −4 GeV −1 for example, typical for LHC experiments, precision would be 36% instead of 23%, i.e. bigger by a factor of 1.6. Therefore, it is very important for this analysis to reach the ILD goal for the transverse momentum resolution. In the other direction, though technologically not realistic, an improvement of σ 1/P t to a few times 10 −6 GeV −1 would allow to nearly reach the "zero-background" scenario, in the sense that although the same amount of (irreducible) background events pass the selection, the Higgs signal peak becomes so narrow that the background contribution underneath the peak doesn't have a significant effect anymore.   9 shows the equivalent result at √ s = 500 GeV, while the combined result of both centre-of-mass energies is displayed in Fig. 10. Since at √ s = 500 GeV the momenta of the muons are higher than in the √ s = 250 GeV case, the transverse momentum resolution is closer to the asymptotic performance of 2 × 10 −5 , and thus the "effective" resolution gets closer to the case of 2 × 10 −5 . Otherwise, the conclusions remain similar to the √ s = 250 GeV case, underlining again the importance to achieve the ILD design goal on the transverse momentum resolution.
A similar study has also been performed by the Compact LInear Collider (CLIC) [29], based on e + e − → ννH at √ s = 1.4 TeV (the qqh contribution is negligible at 1.4 TeV). Figure 11 in Ref. [29] shows a saturation around σ 1/P t = 1 × 10 −5 , where the precision on BF(H → µ + µ − ) reaches ∼ 25%. While the saturation is reached already for worse σ 1/P t resolutions compared to the ILD case, the CLIC study leads to the same conclusion as our analysis, namely that even a large improvement of the muon momentum resolution would result in only a moderate improvement of the statistical uncertainty of the measured product of the Higgs production cross-section and the branching fraction for the H → µ + µ − decay. On the other hand, not reaching the design goal for the momentum resolution would lead to a significant loss of sensitivity.

Summary
In this study, the prospects for measuring the branching fraction of H → µ + µ − at the ILC have been evaluated based on full simulation of the ILD detector for the √ s = 250 GeV and 500 GeV data sets as defined by the standard ILC running scenario. Eight channels have been analysed in total, √ s of 250 GeV and 500 GeV, two beam polarisation cases, and the two signal processes qqH and ννH. The combined precision on BF(H → µ + µ − ) using all results is estimated to be 17%; the 250 GeV data alone yield a precision of 23%. These results are about a factor of 2.4 bigger than the "theoretical" limit of zero background and 100% efficiency. Compared to most recent HL-LHC prospects based on the full detector upgrades, the precision is only about 50% larger, despite the fact that 100 times more H → µ + µ − events are expected to be produced at HL-LHC. In combination with other ILC measurements, the observed  signal strength can be translated into a direct measurement of the branching fraction and thus the muon Yukawa coupling. The combination of HL-LHC and ILC full program up to 1 TeV would provide an ultimate precision of ∼ 7% on BF(H → µ + µ − ). In addition to the full simulation analysis, the impact of the transverse momentum resolution was studied. This study shows the importance to achieve the ILD design goal of the transverse momentum resolution, otherwise the precision will be significantly degraded. The first evaluation of the prospects to measure BF(H → µ + µ − ) at the lower-energy stages of the ILC presented in this paper could be improved in future analyses. Interesting points to study comprise the application of beam-spot constraint in the track fit of the two muons, a full treatment of events with significant FSR and the inclusion of the Z boson decays to charged leptons.