Simulation-based inference in the search for CP violation in leptonic WH production

Sources of CP violation beyond the Standard Model (BSM) are required to explain the baryonic asymmetry of the Universe. In this work, we study BSM CP-violating components in the HWW interaction in WH production, parametrized by an effective dimension-6 CP-odd operator. We explore a machine learning simulation-based inference method that estimates a detector-level optimal observable - SALLY - comparing it with energy-dependent and angular observables, exploring different binnings for their distributions. We show that in regions of phase space where the interference between SM and the effective operator dominates, a SALLY observable leads to optimal limits. In regions where effects of the quadratic term of the effective operator start becoming dominant, such an observable still leads to optimal limits. This work aims to test current multivariate techniques and inform analysis strategies for LHC Run 3 and beyond.


Introduction
Violation of charge-parity (CP) symmetry from beyond the Standard Model (BSM) physics is required to explain the observed matter-antimatter asymmetry of the Universe.Higgs boson interactions are a natural place to search for BSM CP-violating components, given that the effect of such components on inclusive cross-sections is negligible when compared with that coming from BSM CP-conserving components.The same happens to their impact on cross-sections differential in energy-related observables, such as the ones targetted by the current Simplified Template Cross Section formalism [1].Searches for BSM CP-violating components typically use one of two approaches: either measuring differential cross-sections in bins of angular observables [2,3,4], or in bins of (statistically) optimal observables built from matrix element calculators and the complete kinematic information of the event [5,6,7,8].Both have sub-optimal points: the former measures only one or two observables simultaneously, discarding a significant fraction of the available information.The latter commonly requires neglecting or approximating parton shower, hadronization, and detector effects, often calculating matrix elements directly from detector-level observables.Refs.[9,10,11] introduce a machine-learning (ML) method which to estimate detector-level optimal observables (known as score in statistics literature), called SALLY (Score Approximates Likelihood LocallY), which removes the need for such approximations.
In this work, we studied CP-violating components in the interaction between the Higgs boson and W boson pairs (HWW interaction) in the WH production channel, with the W boson decaying to leptons and the Higgs decaying to a pair of b-quarks.This channel is challenging because the neutrino longitudinal momentum and the W boson 4-vector cannot be reconstructed unambiguously.Nonetheless, they can be estimated, typically with significant uncertainties, that propagate to angular or matrix-element-based optimal observables.
We perform an analysis to compare the sensitivity of a SALLY observable with that of energydependent and angular observables, disentangling their sensitivity to different terms in the squared matrix element.We also study different binnings for all the observables.
The structure of this paper is as follows: Sec. 2 explains the theory behind the SALLY method.Sec. 3 introduces the effective field theory formalism, used in this analysis to parametrize CP-violating components and describes the current best measurements from experiments.Sec. 4 presents the analysis details: signal and relevant backgrounds, sample generation, and selection criteria.Sec. 5 describes the different observables to be compared.Sec.6 explains the statistical methods used to extract the sensitivity of different observables.In Sec. 7, we compare the sensitivity of the different observables and derive conclusions.method then uses the joint score in the loss function of a neural network with detector-level observables as input variables.Such an estimator will converge to the detector-level optimal observable in the limit of infinite training data.The joint score is equivalent to the standard Optimal Observable when neglecting the detector response and setting θ ref = 0. SALLY is one of several simulation-based inference methods, which use information from event generators to build estimators of the likelihood, likelihood ratio, or score.
There are other ML-based methods to reconstruct detector-level optimal observables for highdimensional x without approximations [12,13,14].These build an observable from the output of classifiers trained to distinguish between events with the same (fixed) value of the Wilson coefficient but with opposite sign.They differ from SALLY since they don't estimate the score and don't use generator information in the loss function.

Effective field theory
So far, searches at the LHC have not produced any compelling hints of the existence of new particles.This fact has boosted the use of the Effective Field Theory (EFT) approach to look for new physics.In the EFT formalism, the SM Lagrangian is extended with additional operators of mass dimension d > 4. CP-violating effects come from the interference between terms in the SM Lagrangian and CP-odd higher dimension operators.These terms do not change the inclusive cross-sections, but lead to modifications in observable distributions.
This work uses the Standard Model Effective Field Theory (SMEFT) [15] formalism.In the SMEFT, the d > 4 operators are built from combinations of SM fields, constrained to be invariant under the SM SU (3) c ×SU (2) L ×U (1) Y symmetry.Electroweak symmetry breaking happens as in the SM (linearly), where the massless W and B fields absorb a Goldstone boson to gain their longitudinal polarization/mass, leaving a physical state -the SM Higgs boson.The non-redundant operator basis defined in Ref. [16] is used (the so-called Warsaw basis).The only dimension-6 CP-odd operator in the HWW vertex is ÕHW , defined in Eq. 2.

Analysis
The MadMiner package [24] wraps around the entire analysis workflow and was used to develop this work (version 0.9.3).

Signal and backgrounds
This analysis targets the associated WH production channel in the W → ℓν, H → b b final state (ℓ = e, µ), represented by the Feynman diagram in Fig. 1.The main reasons to choose this particular final state are the following: the decay of the Higgs boson to b-quarks is the one with the highest branching ratio (BR ≈ 58%), and there is a high energy (isolated) lepton from the W decay which allows for efficient triggering and removal of a large part of the QCD multijet background.The HWW interaction can also be probed in VBF production and in the H → W W decay.Despite the higher cross-section of VBF production, it is a process where it is not possible to disentangle the contributions of the HZZ and HWW interaction vertices.The WH production channel allows access to the HWW vertex independently of the HZZ vertex, which removes the need for the assumption that custodial symmetry is not broken.
The main backgrounds taken into account are t t production in the semileptonic decay channel, single top production in the s-channel, and associated production of a W boson and b-jets since these have the same or similar final state as the signal process.Their Feynman diagrams are shown in Fig. 2.

Sample generation and selection conditions
We generated signal samples at LO in QCD with Madgraph5 aMC@NLO [25] using the SMEFTsim3 [26] UFO model assuming a U (3) 5 flavor symmetry.The experimental values of m W = 80.387 GeV, m Z = 91.1876GeV and G F = 1.1663787 × 10 −5 , m t = 172.76GeV, m b = 4.18 GeV, were used as input parameters, and other fermions were considered massless.We didn't perform any truncation of the squared matrix element and fixed the decay widths of all the relevant massive particles to their LO values.We set the new physics scale to Λ = 1 TeV We generated a signal sample with 30 × 10 6 events at cHW = 0. We used LO reweighting [27] to calculate event weights for two values of cHW ̸ = 0, and a morphing technique [28] to obtain the weights for all other values.We used the functionality in the MadMiner package to determine the values of cHW ̸ = 0 which minimize the sum of the squared morphing weights, to avoid numerical instabilities from the morphing procedure.We set the maximum possible range to | cHW | ≤ 1.2 (slightly looser than the combination of experimental constraints shown in Sec.3), and the optimal points were set at cHW = 1.15 and cHW = −1.035.We used the PDF4LHC15 PDF set [29].
For each of the backgrounds, we generated samples with 10 × 10 6 events at LO in QCD with the default Madgraph5 aMC@NLO SM UFO model.We did not apply reweighting or morphing to the background samples, given that ÕHW does not affect these processes.
For parton shower and hadronization, we used Pythia with the A19 tune and MLM merging, setting the merging scale (that separates hard and parton shower emissions) to Q cut = 20 GeV.
After generating the samples, we passed the events through detector simulation, physics object identification, and selection using Delphes [30] with the default ATLAS card.We selected events where all the objects were reconstructed and the two leading jets were b-tagged, using the b-tagging efficiency map in the ATLAS Delphes card.These criteria retained ≈ 24% of the generated events.
We applied the set of generator-level cuts first introduced in Ref. [31] to mimic typical experimental analysis selection conditions.Table 1 describes the cuts and their values.

Observable Cut
Transverse momentum of lepton/light quarks (charm or lighter) p T,ℓ , p Before any cut is applied, the cross-sections are 213.6 fb for the WH signal, 6.9 × 10 4 fb for the t t background, 5.2 × 10 4 fb for the W+(b-)jets background and 665.8 fb for the single top background.Table 2 presents the cumulative efficiency (in %) of the cuts up to a specific cut.After all of the cuts, the cross-sections are 73.16fb for the WH signal, 194.24 fb for the t t background, 239.5 fb for the W+(b-)jets background and 75.74 fb for the single top background.The criteria with the largest background rejection are the cut on the minimum transverse momentum of the b-quark for W+jets and the one on the maximum value for the transverse momentum of light jets for t t, with individual (background) rejection factors of 29 and 48, respectively.Given that the NLO corrections for effective Lagrangians with CP-odd operators are still unknown, we only used LO values for the cross-sections and did not employ any signal or background k-factors.

Energy-dependent observables
The (CP-violating) SM-EFT interference component in the squared matrix element has a residual dependence on the partonic center-of-mass energy [32,33,34].Hence, it makes sense to start from observables that capture this dependence.A natural candidate is the transverse momentum of the W boson, p W T .Ref. [31] showed that the total transverse mass of the event, m ℓνb b T , captures energydependent effects complementary to p W T .We show the distribution of these observables at the reconstruction level (normalized to unit area) in Fig. 3  demonstrated by the increase in the signal-to-background ratio in the high energy region compared to the SM prediction.Nonetheless, these are not sensitive to the sign of cHW , which leads one to conclude that the modifications to these observables come mainly from the quadratic EFT term in the squared matrix element.

Angular observables
Several observables sensitive to the CP-odd interference component have been proposed in the literature [32,33].In this work, we took as a benchmark the observables defined in Ref. [32], cos δ + and cos δ − .These observables are defined in Eqs. 3 and 4, respectively.
where p W and p H are the momenta of the W and Higgs bosons, respectively, p  (normalized to unit area) of the longitudinal momentum of the neutrino and of the described angular observables, for the SM signal and backgrounds, as well as two signals with cHW ̸ = 0. We observed that the asymmetry in the angular observable distributions for cHW ̸ = 0 had the opposite sign for events with W bosons with opposite charges.This phenomenon happens because the charged lepton (and neutrino) in W + H and W − H have opposite helicities, leading to some of the angular components having an opposite sign, an idea we derived from Ref. [36].This effect leads to a dilution of the asymmetry in charge-inclusive distributions, which become more similar to those of the SM and backgrounds, reducing their sensitivity to CP-violating effects.To mitigate this effect, we propose a similar observable, obtained by multiplying the angular observable's value by the corresponding lepton charge, Q ℓ .In Fig. 5, we show the distributions of Q ℓ cos δ + and Q ℓ cos δ − (normalized to unit area) and observe that these maintain the properties of the distributions cos δ + and cos δ − but with increased asymmetry, when compared to the distributions in Fig. 4.

SALLY, data augmentation and training settings
In this work, we used the SALLY method to build a detector-level optimal observable with the SM point (θ ≡ cHW = 0) as reference.We created an unweighted training dataset with 11.5M events drawn from a combined SM signal+background sample, with a probability given by the value of the event generator weight, calculating the joint score for each.We used neural networks with a single hidden layer with 50 hidden units, scaling the input variables to have zero mean and unit variance.The loss function is minimized with the AMSGrad [37] optimizer for 50 epochs, with a learning rate that decays exponentially from 10 −3 to 10 −4 and a batch size of 128.The dataset was divided into a training and validation set, which have 75% and 25% of the events, respectively.We used early stopping based on the validation loss to avoid overfitting.Similarly to Ref. [31], we trained an ensemble of five networks to make the prediction more robust to different neural network random seeds.The network input variables are the following: In Fig. 6, we show the SALLY observable distributions (normalized to unit area).We can see that we have distributions centered at 0 for SM signal and backgrounds and non-zero mean value for non-zero values of cHW , with the sign of the asymmetry being opposite to the sign of the coefficient.

Statistical analysis
The likelihood ratio can be used in a frequentist setting to perform hypothesis testing and derive confidence intervals.One can expand the likelihood ratio around a reference point θ ref as in Eq. 5.
where p full is the likelihood.−E ∂ 2 log p full (x|θ) where E is the expected value notation) is the Fisher Information matrix, which quantifies the sensitivity of measurements around θ ref , with larger components indicating more precise measurements.The Fisher Information in histograms of observables is calculated from the variation of the differential cross-section in each bin (and will depend on the binning).If an estimator of the score (such as SALLY) is available, one can obtain the Fisher Information in the full kinematics from the estimated score for each event (regardless of binning).Neglecting the O(θ 3 ) terms leads to a likelihood ratio (and limits) linearized in θ, and is called the Local Fisher Distance formalism [38,39,31].To consistently take into account the higher order terms in the likelihood ratio, one can instead use the full likelihood ratio in the asymptotic formalism [40] or the Global Fisher Distance formalism [10,31].It is shown in Ref. [38] that such limits are equivalent.

Results
To study the sensitivity of different observables, we compare the range of the expected 95% CL limits on cHW (where a smaller the range is equivalent to a higher sensitivity), assuming the SM ( cHW = 0) as the null hypothesis, for a luminosity of 300 fb −1 , calculating the full likelihood ratio in steps of cHW = 0.024.We explore different binnings to study the sensitivity of these observables in different regions of phase space.We compare these limits with linearized limits derived using the Local Fisher Distance formalism to study the relative importance of the SM-EFT interference and quadratic EFT terms in the squared matrix element to the sensitivity of each observable.These are also compared to limits derived with the full likelihood ratio, but neglecting changes in the total rate.The results are shown in Table 3.
In the upper section of Table 3, we show the limits derived from one-dimensional (1D) and twodimensional (2D) histograms of two energy-dependent observables, p W T and m ℓνb b T .The linearized limits obtained with these observables are outside of the cHW range where the linearization of the likelihood ratio is a valid approximation (we write N.A. to make this clear).This strengthens the idea that these observables alone are sensitive to the quadratic EFT term in the likelihood ratio (and squared matrix element).Neglecting the cross-section information leads to negligible changes in the limits, showing that the change in the shape of the distributions is driving the sensitivity of such a measurement.Energy-dependent observables alone are unsuitable for probing CP-violating components since they cannot distinguish between the contribution from the quadratic component of a CP-odd operator and the contributions from other CP-even operators.
In the middle section of Table 3, we show the limits derived from 1D histograms of the angular observables Q ℓ cos δ + and Q ℓ cos δ − as well as 2D histograms of an angular observable and one of the two energy-dependent observables, p W T and m ℓνb b T .The full limits obtained with an angular observable are looser than those obtained with p W T when using a coarse binning for the angular observable histogram.We explain this observation by the fact that p W T has a large sensitivity to the quadratic EFT term, which is negligible for the angular observables.2D limits obtained with an energy-dependent  Table 3: 95% CL on cHW assuming the SM cHW = 0 as the null hypothesis for the different observables and binnings tested (showing only lower bin edges, for the last bin the upper edge is +∞) for an integrated luminosity of L = 300 fb −1 .variable and an angular observable are tighter than those obtained with an angular observable alone, both with the linearized and full likelihood ratio, an effect coming from the sensitivity of energydependent observables to the SM-EFT interference component when binning in a CP-odd angular observable, as shown in Ref. [34], as well as additional sensitivity of energy-dependent observables to the quadratic term.The size of the linearized and full limits obtained with angular observables is very similar, indicating that most of their sensitivity comes from the linear component, which reinforces the point presented in Sec.5.2 and in previous work.Finer binnings in the angular observable histograms lead only to a marginal increase in the sensitivity.The limits obtained with 2D histograms do not change when changes in the total cross-section are neglected, similar to histograms of energy-dependent observables.
On the bottom section of the table, we show the limits obtained with SALLY using the kinematic observable set described in Sec.5.3, as well as an additional models where we added p ν z , Q ℓ cos δ − and Q ℓ cos δ + as additional input variables.As mentioned in Sec.6, we derive the linearized limits for SALLY using the estimated score for each event, so the binning mentioned pertains only to the histograms used to derive the full limits.We explore two different binnings: one with 25 bins between the 0.1% and the 99.9% percentile (∈ [−0.79, 0.78]) with equal fraction of (weighted) events in each bin (default definition in MadMiner), and another with 6 equally-spaced bins between -1 and 1.The full limits are about 70% looser than the linearized ones, explained by SALLY being only optimal when quadratic EFT effects are subdominant.The limits obtained with SALLY are ≈ 40% tighter than the ones obtained from a 1D histogram of an angular observable alone, showing that a multivariate SALLY method can capture very well kinematic correlations sensitive to CP-violating components, without having to reconstruct the neutrino longitudinal momenta.We obtained the tightest full limits for a 6-bin histogram of the SALLY model trained with standard kinematic observables as input variables.

Conclusions
This work studied CP-violating components in the HWW interaction in leptonic WH production, parametrized by an effective dimension-6 CP-odd operator, ÕHW .We explored a machine-learningbased simulation-based inference method SALLY that estimates a detector-level optimal observable using the full kinematic information of the event.We compared the sensitivity of SALLY with that of 1D and 2D distributions of energy-dependent and angular observables.An analysis was put in place, including backgrounds as well as effects of parton shower, hadronisation and detector reconstruction.We showed that weighting angular observables by the charge of the lepton leads to an improvement in their sensitivity.To compare different observables, we extracted 95% CL limits with the SM ( cHW = 0) as the null hypothesis for a luminosity of 300 fb −1 , both with linearized and full likelihood ratios.This allowed us to separate the effect of the SM-EFT interference and quadratic EFT terms on the sensitivity of the different observables.
We showed that a SALLY method trained around the SM point, with only detector-level observables as input has the best sensitivity to the linear component of all the observables studied, around 30% higher than the sensitivity obtained from a 2D histogram of the transverse momentum of the W boson, p W T and the CP-odd angular observable Q ℓ cos δ + .Taking into account the quadratic components consistently, the best sensitivity is obtained by a SALLY observable trained with standard kinematic observables as input variables in a 6-bin setup, slightly better than that of the 2D histogram of p W T and Q ℓ cos δ + .In conclusion, an observable such as SALLY can extract the maximal amount of information from the full event kinematics, leading to optimal limits.Our analysis code is available online in Ref. [41].It is composed of a series of Python scripts using extensively the MadMiner Python package [9,24].More information and instructions on how to run the code can be found on the GitHub page.This code is made available not only for analysis preservation and replicability, but also so that any similar future work -such as exploring other processes or operators -can build on it simpler and faster.

Figure 1 :
Figure 1: Feynman diagram for WH associated production in the ℓνb b final state.The vertex of interest is circled in black.

Figure 3 :
Figure 3: Distributions (normalized to unit area) of p W T (left) and m ℓνb b T(right) for the SM backgrounds(red), SM signal (blue), and two signal samples with cHW = 0.5 (orange) and cHW = −0.5 (green) obtained by morphing the benchmark signal samples.
of the charged lepton in the W boson rest frame, and p (−) (ℓ/ν) are the momenta of the charged lepton/ neutrino in the rest frame of the Higgs with (p H → −p H ). To reconstruct the longitudinal momentum of the neutrino, p ν z , we identify p ν T ≡ E miss T , and solve the quadratic equation p Wµ p µ W = m 2 W , neglecting imaginary parts which can arise from resolutions effects.Following the recommendations of Refs.[32,35], we choose the solution which minimizes the difference in longitudinal boosts between the W and of the Higgs boson, |β H z − β W z |, where β z = p z / p 2 z + m 2 .We show in Fig.4the distributions

• 4 -
vector of the two b-quarks and the charged lepton • p T , η, θ and ϕ of the two b-quarks • p T , η, θ and ϕ of the Higgs boson candidate • m bb • ∆ϕ and ∆R between the two b-quarks • ∆ϕ and ∆R between the lepton and each of the b-quarks • x and y components of E miss T • |E miss T | • p T and ϕ of the W boson candidate • ∆ϕ between E miss T and each of the b-quarks • ∆ϕ between E miss T and the lepton

Figure 6 :
Figure 6: Distributions (normalized to unit area) of a SALLY observable trained on the sig-nal+background sample for the SM backgrounds (red), SM signal (blue), and two signal samples with cHW = 0.5 (orange) and cHW = −0.5 (green) obtained by morphing the benchmark signal samples.

Table 2 :
t W +jets Single top p T ℓ , p T j > 10 Cumulative efficiencies (in %) of the cuts up to the specific cut in each line.