Analysis of the CP structure of the Yukawa coupling between the Higgs boson and $\tau$ leptons in proton-proton collisions at $\sqrt{s} =$ 13 TeV

The first measurement of the CP structure of the Yukawa coupling between the Higgs boson and $\tau$ leptons is presented. The measurement is based on data collected in proton-proton collisions at $\sqrt{s} =$ 13 TeV by the CMS detector at the LHC, corresponding to an integrated luminosity of 137 fb$^{-1}$. The analysis uses the angular correlation between the decay planes of $\tau$ leptons produced in Higgs boson decays. The effective mixing angle between CP-even and CP-odd $\tau$ Yukawa couplings is found to be $-$1 $\pm$ 19$^\circ$, compared to an expected value of 0 $\pm$ 21$^\circ$ at the 68.3% confidence level. The data disfavour the pure CP-odd scenario at 3.0 standard deviations. The results are compatible with predictions for the standard model Higgs boson.


Introduction
There are strong theoretical motivations to search for CP-violating effects in couplings of the H to fermions rather than to V bosons. In couplings to V bosons, CP-odd contributions enter via nonrenormalisable higher-order operators that are suppressed by powers of $1/\Lambda^2$ [28][29][30], where $\Lambda$ is the scale of the physics beyond the SM in an effective field theory. Therefore, these are expected to yield only a small contribution to the coupling. A renormalisable CP-violating Higgs-to-fermion coupling, in contrast, may occur at tree level. The τ lepton and top quark Yukawa couplings, Hττ and Htt, respectively, are therefore the optimal couplings for CP studies in pp collisions [31], and measurements of these two couplings are complementary. Recently, both the CMS [32] and ATLAS [33] Collaborations presented first measurements of the CP structure of the H coupling to top quarks. The CMS result rejects the purely CP-odd hypothesis with a significance of 3.2 standard deviations, σ, while the ATLAS analysis rejects it with a significance of 3.9σ. The CMS and ATLAS Collaborations have also studied the CP nature of the H interaction with gluons [21,34], which was found to be consistent with the SM expectation, albeit with limited sensitivity. Such studies may also be interpreted in terms of the H coupling to top quarks, under the assumption that the interaction is mediated predominantly via top quark loops.
The CP properties of the H → ττ process are commonly described in terms of an effective mixing angle $\alpha^{H\tau\tau}$, which is virtually equal to $0^\circ$ in the SM. The measurement of a nonzero $\alpha^{H\tau\tau}$ would therefore directly contradict the SM predictions and have implications for physics models beyond the SM, such as two-Higgs-doublet models [35], including supersymmetry. For example, in the minimal supersymmetric model, CP violation in the Higgs-to-fermion couplings is expected to be small, and therefore the measurement of a sizeable mixing angle would disfavour such scenarios. In contrast, in the next-to-minimal supersymmetric model, $\alpha^{H\tau\tau}$ can be as large as $27^\circ$ [36]. Feasibility studies have indicated that the LHC experiments can measure $\alpha^{H\tau\tau}$ to a precision of about 5–10$^\circ$ with 3 ab$^{-1}$ of data [29,37].
In this paper we present the first measurement of the CP structure of the H coupling to τ leptons. This analysis uses the pp data sets collected by the CMS detector at $\sqrt{s} = 13$ TeV in 2016, 2017, and 2018, corresponding to integrated luminosities of 35.9, 41.5, and 59.7 fb$^{-1}$, respectively. This analysis targets the most sensitive $\tau_\mathrm{h}\tau_\mathrm{h}$, $\tau_\mu\tau_\mathrm{h}$, and $\tau_\mathrm{e}\tau_\mathrm{h}$ decay channels, where a τ lepton decaying to hadrons is denoted as $\tau_\mathrm{h}$, and a τ lepton decaying to a muon or an electron as $\tau_\mu$ or $\tau_\mathrm{e}$, respectively (or collectively as $\tau_\ell$). The decays into light leptons are accompanied by two neutrinos, while the hadronic modes involve one neutrino. These particles are not directly detected, but result in a transverse momentum imbalance that can be used to partially constrain the ττ system. In total, this analysis covers about 70% of all possible τ lepton pair decay modes. Table 1 summarises the τ lepton decay modes used in this analysis, their branching fractions, and the shorthand symbols that we use to denote them in the rest of this paper. The charged hadrons are denoted by the symbol $\mathrm{h}^\pm$; they consist mainly of charged pions but include a smaller contribution from charged kaons. Throughout this paper we assume that all $\mathrm{h}^\pm$ are charged pions ($\pi^\pm$), since the CMS detector is not able to distinguish between the different types of $\mathrm{h}^\pm$.

Table 1: Decay modes of τ leptons used in this analysis and their branching fractions B [38].
Where appropriate, we indicate the known intermediate resonances. The last row gives the shorthand notation for the decays used throughout this paper.
This paper is organised as follows. The parameterisation of the CP properties of the τ Yukawa coupling is discussed in Section 2. In Section 3 the experimental setup is outlined, and this is followed by a discussion of the data sets and simulated samples in Section 4. Subsequently, the event reconstruction is presented in Section 5. Thereafter, in Section 6 the CP-sensitive observables used to extract the results are outlined. In Section 7 the event selection is presented. The estimation of the backgrounds is discussed in Section 8. The techniques used to distinguish the signal from the background events are outlined in Section 9. In Section 10 various distributions that are used to extract the results are displayed and discussed. In Section 11 the systematic uncertainties are presented. The results are discussed in Section 12, and a summary of the analysis is given in Section 13.

Parametrisation of the CP properties of the τ Yukawa coupling
We parameterise the Lagrangian for the τ Yukawa coupling in terms of the coupling strength modifiers $\kappa_\tau$ and $\tilde{\kappa}_\tau$ that parameterise the CP-even and CP-odd contributions, respectively [31]:

$$\mathcal{L}_{Y} = -\frac{m_\tau}{v}\left(\kappa_\tau\,\bar{\tau}\tau + \tilde{\kappa}_\tau\,\bar{\tau}\,i\gamma_5\,\tau\right) H . \tag{1}$$

In this equation, $m_\tau$ is the mass of the τ lepton, $\tau$ denotes the Dirac spinor of the τ lepton field, and the vacuum expectation value of the Higgs field, $v$, has a value of 246 GeV. The effective mixing angle $\alpha^{H\tau\tau}$ for the Hττ coupling is defined in terms of the coupling strengths as

$$\tan\left(\alpha^{H\tau\tau}\right) = \frac{\tilde{\kappa}_\tau}{\kappa_\tau} , \tag{2}$$

while the fractional contribution of the CP-odd coupling, $f_{CP}^{H\tau\tau} = \tilde{\kappa}_\tau^2/(\kappa_\tau^2 + \tilde{\kappa}_\tau^2)$, is obtained from the mixing angle as $f_{CP}^{H\tau\tau} = \sin^2(\alpha^{H\tau\tau})$. The relation between $\alpha^{H\tau\tau}$ and $\phi_{CP}$ may be inferred from the decay of a H via τ leptons to two outgoing charged particles [39] as

$$\frac{\mathrm{d}\Gamma_{H\to\tau\tau}}{\mathrm{d}\phi_{CP}} \propto 1 - b(E^{+})\,b(E^{-})\,\frac{\pi^{2}}{16}\cos\left(\phi_{CP} - 2\alpha^{H\tau\tau}\right) . \tag{3}$$

In this equation, the outgoing charged particles have an energy $E^{\pm}$ in their respective τ rest frames. The functions $b$ are spectral functions [40] that encapsulate the correlation between the τ spin and the momentum of the outgoing charged particle. We note that the spectral functions for the leptonic and various hadronic decays are different.

Figure 1: The decay planes of two τ leptons decaying to a single charged pion. The angle $\phi_{CP}$ is the angle between the decay planes. The illustration is in the H rest frame.

Figure 2 shows the normalised distribution of $\phi_{CP}$ at the generator level, calculated in the rest frame of the H, for the scalar, pseudoscalar, and maximally mixed values of $\alpha^{H\tau\tau}$, as well as the $\phi_{CP}$ distribution from Drell-Yan processes. The simulated event samples that are used to generate these distributions are discussed in Section 4. These distributions are for the scenario where both τ leptons decay to a charged pion and a neutrino.
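As an illustration of the mapping from the coupling modifiers to the mixing angle and the CP-odd fraction, the relations can be sketched in a few lines of Python; the function names are ours and this is not part of the analysis software.

```python
import math

def mixing_angle_deg(kappa, kappa_tilde):
    """Effective mixing angle alpha_Htautau in degrees, tan(alpha) = kappa_tilde/kappa."""
    return math.degrees(math.atan2(kappa_tilde, kappa))

def f_cp(kappa, kappa_tilde):
    """Fractional CP-odd contribution; equals sin^2(alpha_Htautau)."""
    return kappa_tilde**2 / (kappa**2 + kappa_tilde**2)
```

For the SM values ($\kappa_\tau = 1$, $\tilde{\kappa}_\tau = 0$) this gives $\alpha^{H\tau\tau} = 0$ and $f_{CP} = 0$, while a maximally mixed state ($\kappa_\tau = \tilde{\kappa}_\tau$) gives $\alpha^{H\tau\tau} = 45^\circ$ and $f_{CP} = 1/2$.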
There is a phase shift between different mixing scenarios such that the difference in $\phi_{CP}$ equals two times the difference in $\alpha^{H\tau\tau}$, as given by Eq. (3). It is important to note that the $\phi_{CP}$ distribution for the Drell-Yan background is uniform; we will exploit this property to reduce statistical fluctuations in the background estimates, as explained in Section 9.
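The phase-shift property can be verified numerically with a toy version of the decay-width formula of Eq. (3), in which the product of spectral functions is replaced by a fixed coefficient (an assumption made purely for this sketch): the peak of the distribution moves by twice the change in the mixing angle.

```python
import numpy as np

def phi_cp_pdf(phi_deg, alpha_deg, coeff=0.5):
    """Toy distribution 1 - coeff*cos(phi - 2*alpha); coeff stands in for
    the product of spectral functions times pi^2/16."""
    return 1.0 - coeff * np.cos(np.radians(phi_deg - 2.0 * alpha_deg))

phi = np.linspace(0.0, 360.0, 3601)
peak_scalar = phi[np.argmax(phi_cp_pdf(phi, 0.0))]   # CP-even scenario
peak_mixed = phi[np.argmax(phi_cp_pdf(phi, 45.0))]   # maximal mixing
```

The peak moves from $180^\circ$ for the scalar case to $270^\circ$ for maximal mixing, i.e. by $2 \times 45^\circ$.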
The observable φ CP was originally introduced in the context of e + e − collisions [41,42] where the τ lepton momenta can be reconstructed and thus φ CP can be calculated in the H rest frame.
In hadronic collisions the momenta of the neutrinos cannot be well constrained, except for the configuration in which both τ leptons decay via the $a_1^{3\mathrm{pr}}$ mode to three charged pions, where the momenta of the τ leptons can be further constrained from the reconstruction of the τ lepton production and decay vertices. Therefore, the methods for estimating $\phi_{CP}$ have been extended and optimised for hadronic collisions [37], as discussed in Section 6. Throughout this document, we denote the angle between the τ decay planes as $\phi_{CP}$, irrespective of the frame in which it is calculated.

Figure 2: The normalised distribution of $\phi_{CP}$ between the τ lepton decay planes in the H rest frame at the generator level, for both τ leptons decaying to a charged pion and a neutrino. The distributions are for a decaying scalar (CP-even, solid red), a pseudoscalar (CP-odd, dashed blue), a maximal mixing angle of $45^\circ$ (CP-mix, dash-dot-dot green), and a Z vector boson (black dash-dot). The transverse momentum of the visible τ decay products $p_\mathrm{T}^\tau$ was required to be larger than 33 GeV during the event generation.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionisation chambers embedded in the steel flux-return yoke outside the solenoid. Events of interest are selected using a two-tiered trigger system. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of about 4 µs [43]. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimised for fast processing, and reduces the event rate to around 1 kHz before data storage [44]. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [45].

Simulated samples
The signal and relevant background processes are modelled with samples of Monte Carlo simulated events. The signal samples with a H produced through gluon-gluon fusion (ggH), vector boson fusion (VBF), or in association with a W or Z vector boson (denoted as WH or ZH, or VH when combined) are generated at next-to-leading order (NLO) in perturbative quantum chromodynamics (QCD) with the POWHEG 2.0 [46][47][48][49][50][51][52] event generator. The H production mechanism is configured to only produce scalar Higgs bosons, as opposed to pseudoscalar Higgs bosons or mixed couplings. The latter scenarios would also affect various properties of the production, e.g. the production rate and the topology of associated jets, such as the azimuthal angle ∆φ jj between the two leading jets, when present [53]. We note that in our analysis of α Hττ we are not sensitive to the modifications to the ggH and VBF+VH production rates as we treat them as unconstrained parameters that are allowed to float freely in the fit to data. We also do not use the ∆φ jj or similar variables to define event selection criteria or as inputs to discriminants; whereas modifications to other kinematic variables must be negligibly small in order to avoid experimental bounds from dedicated measurements (e.g. Ref. [21]). For the ggH production process, we used dedicated simulations [49,54] to confirm that modifications to the CP properties of the Yukawa couplings between the Higgs and top and bottom quarks did not significantly influence either the signal acceptance or the distributions of discriminants used to extract our results. We observed that such effects are typically at the O(1%) level or smaller and are negligible compared to theoretical uncertainties on the signal modelling. Therefore, our measurement of α Hττ is not sensitive to the assumptions made about the CP-nature of the production interactions.
The distributions of the Higgs boson transverse momentum ($p_\mathrm{T}$) and of the jet multiplicity are reweighted to match the predictions at next-to-NLO (NNLO) accuracy obtained from full phase space calculations with the POWHEG NNLOPS (version 1) generator [55,56]. The decay of the H does not depend on its production. The description of the decay of the H to τ leptons is obtained using the PYTHIA generator version 8.230 [57]. These samples are simulated without accounting for the τ spin correlations. After the samples have been generated, the TAUSPINNER package [58] is used to calculate event weights that can be applied to the simulated signal samples to model τ polarisation effects for a boson with CP-mixing angles of 0, 45, and $90^\circ$. There is no normalisation effect from the reweighting procedure, i.e. the integrated H → ττ cross section of the signal samples is invariant under rotations in $\alpha^{H\tau\tau}$. All 2016 samples are generated with the NNPDF3.0 [59] NLO parton distribution functions (PDFs), while the NNPDF3.1 [60] NNLO distributions are used for 2017–2018.
The MADGRAPH5 aMC@NLO [61] generator (version 2.6.0) is used for processes involving a Z or W boson and up to four outgoing partons generated with the matrix element, and these processes are denoted Z + jets and W + jets, respectively. Processes involving W bosons originating from top quark decays are not considered in these samples. They are simulated at leading order (LO) with the MLM jet matching and merging approach [62]. The same generator is used at NLO for diboson production, whereas POWHEG 2.0 (1.0) is used for top quark-antiquark pair production [63] and single top quark production (associated with a W boson) [64,65]. The generators are interfaced with PYTHIA to model the parton showering and fragmentation, as well as the decay of the τ leptons. The PYTHIA parameters that affect the description of the underlying event are set to the CUETP8M1 tune [66] in 2016, and CP5 tune [67] in 2017-2018.
Monte Carlo generated events are processed through a simulation of the CMS detector that is based on GEANT4 [68], and are reconstructed with the same algorithms as the ones used for data. Additional pp interactions per bunch crossing ("pileup") are included. The effect of pileup is taken into account by generating concurrent minimum bias collision events with PYTHIA. The pileup distribution in simulation is weighted to match the pileup in data.

Event reconstruction
The reconstruction algorithms for both observed and simulated events are based on the particle-flow (PF) algorithm [69], which relies on information from the different CMS subdetectors to reconstruct muons, electrons, photons, and charged and neutral hadrons. These objects are combined to form more complex ones, such as $\tau_\mathrm{h}$ candidates or the missing transverse momentum ($\vec{p}_\mathrm{T}^{\,\text{miss}}$).

Primary vertex reconstruction
The positions of all pp interactions (vertices) in the event, including the hard-scatter (primary) and soft (pileup) vertices, are reconstructed in a two-step procedure [70]. The steps consist of clustering the tracks that appear to originate from the same interaction using the deterministic annealing algorithm [71], and subsequently fitting the position of each vertex using the tracks associated to its cluster with the adaptive vertex fitter (AVF) algorithm [72]. The candidate vertex with the largest value of the sum of the $p_\mathrm{T}^2$ of all associated physics objects is considered to be the primary pp interaction vertex (PV). The physics objects included in this sum are jets, clustered using the anti-$k_\mathrm{T}$ jet finding algorithm [73] with the tracks assigned to candidate vertices as inputs, and the associated $\vec{p}_\mathrm{T}^{\,\text{miss}}$, taken as the negative vector sum of the $p_\mathrm{T}$ of those jets.
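The PV choice described above reduces to an arg-max over candidate vertices; a minimal sketch, with a made-up input format of (vertex id, list of associated-object $p_\mathrm{T}$ values), is:

```python
def select_primary_vertex(vertices):
    """Return the id of the vertex with the largest sum of pT^2 of its
    associated physics objects (jets and the derived pTmiss)."""
    return max(vertices, key=lambda v: sum(pt**2 for pt in v[1]))[0]
```

For example, `select_primary_vertex([("soft", [5.0, 3.0]), ("hard", [50.0, 30.0])])` returns `"hard"`.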

Muon reconstruction
Muons are identified and reconstructed with requirements on the quality of the track reconstruction and on the number of hits in the tracker and muon systems [74], and are selected within |η| < 2.4. In order to reject muons that originate from nonprompt interactions, or are misidentified, a relative isolation is defined as

$$I^{\mu} = \frac{\sum_{\text{charged}} p_\mathrm{T} + \max\left(0,\, \sum_{\text{neutral}} p_\mathrm{T} - \tfrac{1}{2}\sum_{\text{charged, PU}} p_\mathrm{T}\right)}{p_\mathrm{T}^{\mu}} . \tag{4}$$

In this equation, $\sum_{\text{charged}} p_\mathrm{T}$ is the scalar $p_\mathrm{T}$ sum of the charged particles originating from the PV and located in a cone of size $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} = 0.4$ (where φ is the azimuthal angle in radians) centred on the muon direction. The sum $\sum_{\text{neutral}} p_\mathrm{T}$ is the analogous quantity for neutral particles. The $\sum_{\text{charged, PU}} p_\mathrm{T}$ term sums over charged particles originating from pileup vertices in order to estimate and subtract the pileup contribution to the neutral-particle sum; it is scaled by 1/2 to account for the ratio of neutral to charged energy in pileup interactions. The $p_\mathrm{T}$ of the muon is denoted by $p_\mathrm{T}^{\mu}$. In the $\tau_\mu\tau_\mathrm{h}$ channel, it is required that $I^{\mu} < 0.15$.
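The pileup-corrected isolation sum of Eq. (4) can be sketched as follows, assuming the per-particle sums in the cone have already been computed (the function name is ours):

```python
def relative_isolation(pt_lepton, sum_charged, sum_neutral, sum_charged_pu):
    """Relative isolation with the Delta-beta pileup correction: half of the
    charged pileup sum estimates the neutral pileup contribution."""
    corrected_neutral = max(0.0, sum_neutral - 0.5 * sum_charged_pu)
    return (sum_charged + corrected_neutral) / pt_lepton
```

As an example, a muon with $p_\mathrm{T} = 40$ GeV, 2 GeV of charged and 3 GeV of neutral energy in the cone, and 4 GeV of charged pileup energy has $I^\mu = 0.075$, passing the $I^\mu < 0.15$ requirement.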

Electron reconstruction
Electrons are reconstructed using tracks from the tracking system and calorimeter deposits in the ECAL, with a veto on objects with a large HCAL to ECAL energy ratio. Electrons are identified using a multivariate analysis (MVA) discriminant combining several quantities that describe the shape of the energy deposits in the ECAL, the quality of tracks, and the compatibility of the measurements from the tracker and the ECAL [75]. The energy scale of electrons is adjusted in data and simulation using the Z mass peak, while its resolution in simulation is adjusted to data.
For the electrons, an isolation criterion $I^{e}$ is defined for a cone of size $\Delta R = 0.3$ centred on the electron direction. Its definition is analogous to Eq. (4) for the charged tracks, but the pileup contribution to the neutral-particle sum is estimated via an effective-area method:

$$I^{e} = \frac{\sum_{\text{charged}} p_\mathrm{T} + \max\left(0,\, \sum_{\text{neutral}} p_\mathrm{T} - \rho\,EA\right)}{p_\mathrm{T}^{e}} . \tag{5}$$

In this equation, the pileup contribution is estimated as $\rho\,EA$, where ρ is the event-specific average pileup energy density per unit area in the φ–η plane, and $EA$, which depends on the electron η, is the effective area specific to the neutral component of the isolation variable [75].
In the τ e τ h channel, it is required that I e < 0.15.

Jet and missing transverse momentum reconstruction
Jets are reconstructed using the anti-$k_\mathrm{T}$ algorithm [73] with a distance parameter of $R = 0.4$, as implemented in the FASTJET package [76]. The anti-$k_\mathrm{T}$ algorithm takes PF objects and groups them together based on inverse powers of the $p_\mathrm{T}$ of the objects [73,77]. The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5 to 10% of the true momentum over the whole $p_\mathrm{T}$ spectrum and detector acceptance. Pileup interactions can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, charged particles identified as originating from pileup vertices are discarded, and an offset correction is applied for the remaining contributions. Jet energy corrections are derived from simulation to bring the measured response of jets to that of particle-level jets on average. In situ measurements of the momentum balance in dijet, photon + jet, Z + jet, and multijet events are used to account for any residual differences in the jet energy scale between data and simulation [78]. The jet energy resolution amounts typically to 15–20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV [78]. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures. Data collected in the ECAL endcaps were affected by large amounts of noise during the 2017 data-taking period, which led to disagreements between simulation and data. To mitigate this issue, jets used in the analysis of the 2017 data are discarded if they have $p_\mathrm{T} < 50$ GeV and $2.65 < |\eta| < 3.10$. Hadronic jets that contain b quarks (b jets) are tagged using a deep neural network (DNN), the DEEPCSV algorithm [79].
The medium working point used for the DEEPCSV algorithm corresponds to a b-jet identification efficiency of about 70% for a misidentification rate for jets originating from light quarks and gluons of around 1%.
The pileup per particle identification algorithm [80] is applied to reduce the pileup dependence of the $\vec{p}_\mathrm{T}^{\,\text{miss}}$ observable. The $\vec{p}_\mathrm{T}^{\,\text{miss}}$ and its magnitude ($p_\mathrm{T}^{\text{miss}}$) are computed from the PF candidates weighted by their probability to originate from the PV [81]. The $\vec{p}_\mathrm{T}^{\,\text{miss}}$ is adjusted for the effect of jet energy corrections.

Tau lepton reconstruction
The $\tau_\mathrm{h}$ lepton reconstruction is performed with the Hadron-Plus-Strips (HPS) algorithm [82]. Starting from the constituents of reconstructed jets, the algorithm works by combining charged hadrons with the signature of neutral pions (one or more electron/photon candidates falling within a certain Δη×Δφ region, referred to as a "strip"). The combination of these signatures provides the four-vector of the visible decay products of the parent $\tau_\mathrm{h}$. The identification of $\tau_\mathrm{h}$ candidates makes use of isolation discriminators to reject quark and gluon jets that could be misidentified as $\tau_\mathrm{h}$. For this analysis, a DNN called DEEPTAU [83] is used on the HPS $\tau_\mathrm{h}$ candidates to provide further discrimination. In order to achieve an optimal $\tau_\mathrm{h}$ identification performance, the DNN combines high-level reconstructed $\tau_\mathrm{h}$ features with low-level information from the inner tracker, calorimeters, and muon subdetectors, using PF candidates reconstructed within the $\tau_\mathrm{h}$ isolation cone. The working point on the output discriminant is chosen to provide a $\tau_\mathrm{h}$ identification efficiency of about 60% at a jet misidentification rate of approximately $5 \times 10^{-3}$. Two other DNNs are used to reject electrons and muons misidentified as $\tau_\mathrm{h}$ candidates, using dedicated criteria based on the consistency between the measurements in the tracker, calorimeters, and muon detectors.
The mass of the ττ system, $m_{\tau\tau}$, is calculated using a simplified matrix-element algorithm, SVFIT [84], which combines the $\vec{p}_\mathrm{T}^{\,\text{miss}}$ and its uncertainty matrix with the four-vectors of both τ candidates to calculate the parent boson's mass. The resolution of $m_{\tau\tau}$ is 15–20%, depending on the ττ final state and the boost of the ττ system.

Reconstruction of CP-sensitive observables
In this section we outline the methods used to construct CP-sensitive observables, collectively referred to as φ CP angles. Various techniques can be used to define φ CP depending on the decay topology of the τ leptons. In total, four methods are employed in the analysis: the "impact parameter" [85,86], "neutral-pion" [86,87], "combined" [86], and "polarimetric vector" [88] methods. We provide a detailed description of these methods below. We then summarise for which di-τ final states each method is utilised, and outline the procedures used to optimise the resolving power of the φ CP observables.

Impact parameter method
This method exploits the finite lifetime of the τ leptons and can be applied to all events where both τ leptons decay to a single charged particle. We define the impact parameter vector $\vec{j}^{\,\pm}$ of a track (where ± refers to the charge of the track) as the vector between the PV and the point on the track at which the distance to the PV is minimal.
For each τ lepton we define a plane using the impact parameter vector and the charged-particle momentum vector. This plane, which is constructed in the laboratory frame, only represents the genuine plane of the decay into a single charged pion and neutrino when the laboratory frame coincides with the rest frame of the H. This means that this method does not reconstruct the genuine τ lepton decay plane, but rather a plane that is correlated with it. In order to approximate the rest frame of the H, we use the charged decay products of the τ leptons from the H to define a zero-momentum frame (ZMF) into which the decay planes are boosted. The ZMF is used to define $\phi_{CP}$ for all channels in this analysis, except the $a_1^{3\mathrm{pr}}a_1^{3\mathrm{pr}}$ channel, where both τ leptons decay to three charged pions and the H rest frame can be reconstructed.
We then construct four-component vectors in the laboratory frame as $\lambda^{\pm} = (0, \vec{j}^{\,\pm})$. The $\lambda^{\pm}$ four-vectors are boosted into the ZMF and denoted $\lambda^{\mathrm{ZMF}}_{\pm}$. We also boost the respective charged-pion four-vectors to the ZMF, denoted $q^{\mathrm{ZMF}}_{\pm}$. Subsequently, we take the components of $\lambda^{\mathrm{ZMF}}_{\pm}$ transverse to $q^{\mathrm{ZMF}}_{\pm}$ and normalise them to obtain the unit vectors $\hat{\lambda}^{\mathrm{ZMF}}_{\perp\pm}$. To reconstruct $\phi_{CP}$, we first define the angle $\phi^{\mathrm{ZMF}}$ and the triple product $O^{\mathrm{ZMF}}$ as

$$\phi^{\mathrm{ZMF}} = \arccos\left(\hat{\lambda}^{\mathrm{ZMF}}_{\perp +}\cdot\hat{\lambda}^{\mathrm{ZMF}}_{\perp -}\right), \qquad O^{\mathrm{ZMF}} = \hat{q}^{\mathrm{ZMF}}_{-}\cdot\left(\hat{\lambda}^{\mathrm{ZMF}}_{\perp +}\times\hat{\lambda}^{\mathrm{ZMF}}_{\perp -}\right). \tag{6}$$

From $\phi^{\mathrm{ZMF}}$ and $O^{\mathrm{ZMF}}$ we reconstruct $\phi_{CP}$ in the range $[0, 360^\circ]$ as

$$\phi_{CP} = \begin{cases} \phi^{\mathrm{ZMF}} & \text{if } O^{\mathrm{ZMF}} \geq 0, \\ 360^\circ - \phi^{\mathrm{ZMF}} & \text{if } O^{\mathrm{ZMF}} < 0. \end{cases} \tag{7}$$

The τ lepton spectral functions have opposite signs for single-pion decays and leptonic decays in the kinematic regions considered in this analysis. This causes a phase flip between the $\phi_{CP}$ distributions for single-pion decays and leptonic decays when the impact parameter method is used [40]. An illustration of the definition of the $\phi_{CP}$ observable using the impact parameter method is shown in Fig. 3 (left).

Figure 3: Illustration of the τ lepton decay planes and the angle $\phi_{CP}$ for various decay configurations. The decay planes are illustrated with the shaded regions, and either the vector $\hat{\lambda}$ or the momentum vector of the neutral pion lies in the decay plane. The illustrations are in the frame in which the sum of the momenta of the charged particles is zero. Left: the decay planes for the decays $\tau^{-} \to \pi^{-}\nu$ and $\tau^{+} \to \pi^{+}\nu$. Middle: the decay plane as reconstructed from the neutral and charged pion momenta. Right: $\phi_{CP}$ for the mixed scenario, in which one τ lepton decays to a pion while the other decays via an intermediate ρ meson.
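The steps of the impact parameter method can be sketched in Python: boost the impact parameter four-vectors and pion momenta into the ZMF, take the transverse components, and resolve the 0–360° ambiguity with the sign of the triple product. This is an illustrative reimplementation under simplified assumptions, not the analysis code.

```python
import numpy as np

def boost(p4, beta):
    """Lorentz boost of p4 = (E, px, py, pz) into a frame moving with velocity beta."""
    b2 = beta @ beta
    if b2 == 0.0:
        return p4.copy()
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p4[1:]
    return np.array([gamma * (p4[0] - bp),
                     *(p4[1:] + ((gamma - 1.0) * bp / b2 - gamma * p4[0]) * beta)])

def phi_cp_ip(q_plus, q_minus, j_plus, j_minus):
    """phi_CP (degrees) via the impact parameter method: q_+/- are the
    charged-pion four-vectors, j_+/- the impact parameter 3-vectors."""
    total = q_plus + q_minus
    beta = total[1:] / total[0]                      # velocity of the ZMF
    qhat = boost(q_plus, beta)[1:]
    qhat = qhat / np.linalg.norm(qhat)               # pi+ direction in the ZMF
    lam = []
    for j, n in ((j_plus, qhat), (j_minus, -qhat)):  # pions are back-to-back in the ZMF
        l3 = boost(np.array([0.0, *j]), beta)[1:]
        l3 = l3 - (l3 @ n) * n                       # component transverse to the pion
        lam.append(l3 / np.linalg.norm(l3))
    phi = np.degrees(np.arccos(np.clip(lam[0] @ lam[1], -1.0, 1.0)))
    o_zmf = (-qhat) @ np.cross(lam[0], lam[1])       # triple product fixes the sign
    return phi if o_zmf >= 0.0 else 360.0 - phi
```

For back-to-back pions the ZMF coincides with the laboratory frame and the result reduces to the signed angle between the two transverse impact parameter directions.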

Neutral-pion method
This method can be applied to hadronic decay channels in which both τ leptons undergo decays involving more than one outgoing hadron. We describe the method as applied to the intermediate ρ meson decay, and to the intermediate $a_1(1260)$ meson decays in the 1- and 3-prong modes.
For the ρ meson decays, the vector λ is replaced by the four-momentum vector of the π 0 , which means we use the planes spanned by the ρ decay products (e.g. the π ± and π 0 in the case of the ρ ± → π ± π 0 decay) to define the φ CP observable. The four-momentum vector of the π 0 is obtained as follows: to estimate the π 0 energy, we sum the energies of all electron/photon candidates collected by the HPS algorithm. The direction of the π 0 is then taken as the direction of the leading electron/photon candidate. In most cases the leading candidate is a photon and the direction is determined by pointing its associated ECAL clusters back towards the PV. Finally, the mass is set to the known π 0 mass.
The same method is applied to $a_1^{1\mathrm{pr}}$ decays involving two neutral pions by summing the neutral constituents in the decay, as they cannot be easily resolved experimentally. The angle $\phi_{CP}$ is then calculated in a manner analogous to the impact parameter method, except that, to avoid destructive interference between differently polarised states of the mesons, the following observables need to be defined:

$$y_{\tau^{\pm}} = \frac{E_{\pi^{\pm}} - E_{\pi^{0}}}{E_{\pi^{\pm}} + E_{\pi^{0}}} . \tag{8}$$

In this equation, $E_{\pi}$ is the energy of the pion in the laboratory frame. If the product $y_{\tau^{+}}\,y_{\tau^{-}}$ is negative, $\phi_{CP}$ is obtained via the shift $360^\circ - \phi_{CP}$. The neutral-pion method can also be adapted to the $a_1^{3\mathrm{pr}}$ decay mode. In these decays we select the oppositely charged pion pair with an invariant mass closest to that of the intermediate $\rho^{0}$; an illustration is depicted in Fig. 4. Of this pair, we treat the pion with the charge opposite to that of the $\tau_\mathrm{h}$ lepton as though it were a $\pi^{0}$, and the momentum of the pion with the same sign as the $\tau_\mathrm{h}$ is used for the calculation of the ZMF. After these assignments the neutral-pion method is applied as described for 1-prong decays. An illustration of the definition of the $\phi_{CP}$ observable using the neutral-pion method is shown in Fig. 3 (middle).
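A minimal sketch of the energy asymmetry and the associated shift; the condition on the product of the two asymmetries is our reading of the method, and the variable names are ours.

```python
def y_asymmetry(e_charged, e_neutral):
    """Energy asymmetry y_tau between the charged and neutral pion (lab frame)."""
    return (e_charged - e_neutral) / (e_charged + e_neutral)

def shift_phi_cp(phi_cp_deg, y_plus, y_minus):
    """Apply the 360 - phi_CP shift when the product of asymmetries is negative."""
    return phi_cp_deg if y_plus * y_minus >= 0.0 else 360.0 - phi_cp_deg
```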

Combined method
This method combines the impact parameter and neutral-pion methods outlined in the two previous sections, and is appropriate for events where only one of the two τ leptons decays into multiple hadrons. For the τ lepton decaying into a ρ, $a_1^{1\mathrm{pr}}$, or $a_1^{3\mathrm{pr}}$ meson, the vector λ used in the impact parameter method is replaced by the four-momentum vectors described in Section 6.2, and the angle $\phi_{CP}$ is then calculated using the same formulae.
Analogously to the neutral-pion method, we avoid destructive interference between differently polarised states of the mesons by applying the shift $360^\circ - \phi_{CP}$ for events with $y_{\tau^{\pm}} < 0$, where $y_{\tau^{\pm}}$ is computed for the τ lepton that decays to the intermediate resonance.
An illustration of the definition of the φ CP observable using the combined method is shown in Fig. 3 (right).

Polarimetric vector method
This method can, in principle, be applied to any τ lepton decay mode in which both τ lepton momenta can be well reconstructed. When τ leptons decay via the $a_1^{3\mathrm{pr}}$ mode, the τ lepton rest frames can be reconstructed using the secondary vertices (SVs), which are extracted by fitting the three tracks originating from the $a_1^{3\mathrm{pr}}$ decays. Therefore, we apply the polarimetric vector method only to the $a_1^{3\mathrm{pr}}a_1^{3\mathrm{pr}}$ decay configuration. The polarimetric vector $\vec{h}$ can be considered as an estimate of the most likely direction of the spin vector $\vec{s}$ of the τ lepton in the τ lepton rest frame [88]. We start by outlining the reconstruction of the τ lepton momenta, which are required to compute the polarimetric vectors. Subsequently, we describe how $\phi_{CP}$ is reconstructed in the H rest frame from both the τ lepton momenta and the polarimetric vectors.
To reconstruct the τ lepton momentum in the $a_1^{3\mathrm{pr}}$ channel we assume that the reconstructed τ lepton candidate has a mass $m_\tau$ and undergoes a two-body decay to a massless neutrino and an intermediate $a_1$ meson with mass $m_{a_1}$. Furthermore, we define the Gottfried-Jackson angle $\theta_{GJ}$ as the angle between the $a_1$ momentum and the τ lepton momentum [89]. The latter is reconstructed from the positions of the τ lepton production and decay vertices. The magnitude of the τ lepton momentum is then given by [89]

$$|\vec{p}_{\tau}| = \frac{\left(m_{a_1}^2 + m_\tau^2\right)|\vec{p}_{a_1}|\cos\theta_{GJ} \pm E_{a_1}\sqrt{\left(m_\tau^2 - m_{a_1}^2\right)^2 - 4\,m_\tau^2\,|\vec{p}_{a_1}|^2\sin^2\theta_{GJ}}}{2\left(m_{a_1}^2 + |\vec{p}_{a_1}|^2\sin^2\theta_{GJ}\right)} . \tag{9}$$

The maximal allowed value $\theta_{GJ}^{\mathrm{max}}$ of the Gottfried-Jackson angle is defined as

$$\theta_{GJ}^{\mathrm{max}} = \arcsin\left(\frac{m_\tau^2 - m_{a_1}^2}{2\,m_\tau\,|\vec{p}_{a_1}|}\right) . \tag{10}$$

For decays in which the reconstructed $\theta_{GJ}$ exceeds $\theta_{GJ}^{\mathrm{max}}$, due to the finite angular resolution of the charged pion and τ direction measurements, the value of $\theta_{GJ}$ is set to $\theta_{GJ}^{\mathrm{max}}$. As can be seen in Eq. (9), there can be two solutions for the τ lepton momentum. This can be understood by considering the decay in the τ lepton rest frame: in this frame, the $a_1$ meson may be emitted either along or opposite to the direction of the τ lepton momentum in the lab frame. When the $a_1$ meson is emitted orthogonally to this direction in the τ lepton rest frame, the square root in the numerator of Eq. (9) vanishes and the solution is uniquely determined. Thus, we may obtain up to four pairs of solutions for the momenta of the two τ leptons. This ambiguity is resolved by selecting the pair of solutions with the ττ invariant mass closest to that of the H. The direction of the τ lepton in the lab frame is determined by the vector SV−PV.
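The two solutions of Eq. (9) and the maximal Gottfried-Jackson angle can be made concrete in a short numerical sketch; the nominal $a_1$ mass used here is an assumption of the sketch. Both returned momenta satisfy the massless-neutrino constraint by construction.

```python
import math

M_TAU = 1.777  # GeV, tau lepton mass
M_A1 = 1.26    # GeV, nominal a1(1260) mass (assumption of this sketch)

def theta_gj_max(p_a1):
    """Maximal Gottfried-Jackson angle for a given a1 momentum in GeV."""
    return math.asin((M_TAU**2 - M_A1**2) / (2.0 * M_TAU * p_a1))

def tau_momentum_solutions(p_a1, theta_gj):
    """Two solutions for |p_tau| from the a1 momentum and the Gottfried-Jackson
    angle; theta_gj is clamped to its maximal allowed value."""
    theta = min(theta_gj, theta_gj_max(p_a1))
    e_a1 = math.hypot(p_a1, M_A1)
    s, c = math.sin(theta), math.cos(theta)
    disc = (M_TAU**2 - M_A1**2) ** 2 - 4.0 * M_TAU**2 * p_a1**2 * s**2
    root = e_a1 * math.sqrt(max(disc, 0.0))
    num = (M_TAU**2 + M_A1**2) * p_a1 * c
    denom = 2.0 * (M_A1**2 + p_a1**2 * s**2)
    return ((num - root) / denom, (num + root) / denom)
```

At the clamped angle the discriminant vanishes and the two solutions coincide, reproducing the unique-solution case discussed above.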
Once the τ lepton and $a_1^{3\mathrm{pr}}$ momenta have been determined, the polarimetric vectors $\vec{h}_{1,2}$ may be retrieved using the $a_1^{3\mathrm{pr}}$ resonance model as implemented in the TAUOLA [90][91][92] program, which uses the parameters measured by the CLEO Collaboration [93]. To reconstruct $\phi_{CP}$ from the polarimetric vectors and the τ lepton momentum vectors, we introduce vectors $\hat{k}_{1,2}$ that are defined as

$$\hat{k}_{1,2} = \frac{\hat{h}_{1,2} \times \hat{n}_{1,2}}{\left|\hat{h}_{1,2} \times \hat{n}_{1,2}\right|} . \tag{11}$$

In this definition, $\hat{n}_{1,2}$ are the two τ lepton momentum unit vectors in the H rest frame. We then reconstruct $\varphi^{*}$ and $O^{*}$ (in the H rest frame) as

$$\varphi^{*} = \arccos\left(\hat{k}_1 \cdot \hat{k}_2\right), \qquad O^{*} = \hat{n}_1 \cdot \left(\hat{k}_1 \times \hat{k}_2\right). \tag{12}$$

From $\varphi^{*}$ and $O^{*}$ we reconstruct $\phi_{CP}$ via the assignments defined in Eq. (7).
In summary, for the configuration involving two $a_1^{3\mathrm{pr}}$ decays, the secondary decay vertices are exploited to reconstruct the τ momenta in the rest frame of the H. Together with the $a_1$ resonance model, this allows for the extraction of the polarimetric vectors. Studies on simulated signal events revealed that fits to $\phi_{CP}$ measured using the polarimetric vector method have approximately twice the resolving power between the CP-even and CP-odd states as compared to applying the neutral-pion method.

Strategy for selecting the CP-sensitive observables
The τ h impact parameter is relatively small compared to the tracking resolution and therefore the precision to which it can be measured is limited despite the excellent resolution of the CMS tracker. An advantage of the neutral-pion method is that it does not rely on the reconstruction of the impact parameter; instead, the direction of the neutral pion needs to be determined. Due to the relatively large distance between the primary interaction point and the ECAL (O(1 m)), coupled with the fine ECAL granularity, the direction of neutral pions can be reconstructed with smaller relative uncertainties compared to the impact parameter direction.
Studies were performed on signal events to compare the CP sensitivity of the neutral-pion and impact parameter methods in regions of phase space where the latter is expected to perform optimally. The sensitivity per event was comparable, while the selections (explained below) needed for the impact parameter method discard a significant number of events. The requirements imposed on the impact parameter significance mean that the neutral-pion method can be applied to about twice as many events as the impact parameter method. Therefore, although the impact parameter method can in principle be applied to every τ lepton decay mode, in this analysis we only use it for the ππ, µπ, and eπ final states. For the ρρ and ρa 1 configurations, the neutral-pion method is used. In other configurations, where one τ lepton decays to a single charged hadron or lepton and the other to multiple hadrons, the combined method is used.

Optimisation of the φ CP extraction
In this section we outline the experimental techniques developed for this analysis to optimise the extraction of φ CP . A dedicated MVA discriminant is deployed to improve the identification of the τ h decay modes. To improve the estimate of the impact parameters, the reconstruction of the PV coordinates is improved, and a helical extrapolation of the track to the PV is implemented. These methods are discussed in detail below.

Multivariate discriminant for τ h decay mode identification
In order to optimally discriminate between the different decay modes, a boosted decision tree (BDT) [94] is deployed. It is trained using the XGBOOST [95] framework, and is applied on top of the τ h selection. The algorithm was trained to distinguish between the 1- and 3-prong τ lepton decays: π, ρ, a 1pr 1 , a 3pr 1 , and π ± π ∓ π ± π 0 . The π ± π ∓ π ± π 0 decay is not used in the extraction of the CP angle but must be separated from a 3pr 1 to avoid contamination. The inputs to the BDT are the kinematic features of the τ h reconstructed by the HPS method and its constituents. The BDT exploits angular correlations between the decay products, invariant mass quantities, and kinematic properties of the photons.
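As context for the class definitions, the five target classes correspond to distinct charged-hadron and neutral-pion multiplicities of the HPS-reconstructed τ h . A minimal, hypothetical labelling helper (our own names, not the analysis code):

```python
# Hypothetical mapping from (number of charged hadrons, number of pi0s)
# to the BDT training classes; illustrative only.
DECAY_MODE_CLASSES = {
    (1, 0): "pi",       # single charged pion
    (1, 1): "rho",      # pi+- pi0
    (1, 2): "a1_1pr",   # pi+- 2pi0
    (3, 0): "a1_3pr",   # three charged pions
    (3, 1): "3pr_pi0",  # pi+- pi-+ pi+- pi0, kept separate to avoid
                        # contaminating the a1_3pr class
}

def target_class(n_charged, n_pi0):
    """Return the training label for a tau_h candidate, or None if the
    multiplicity combination is not used in the training."""
    return DECAY_MODE_CLASSES.get((n_charged, n_pi0))
```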

Primary vertex refitting
The finite lifetime of the τ lepton means that tracks emanating from its decay do not originate from the PV. These tracks are removed and the PV is refitted using the remaining tracks as input to the AVF algorithm. The LHC beamspot represents a three-dimensional (3-D) profile of the luminous region, where the LHC beams collide in the CMS detector. The parameters of the beamspot are determined from an average over many events [70]. The uncertainties in the beamspot parameters are relatively small and are incorporated into the AVF algorithm to provide an additional constraint on the PV position. The inclusion of the beamspot information improves the PV resolution in the transverse plane by a factor of about 3 for signal events and about 4 for Drell-Yan events, while the z coordinate of the PV is largely unaffected. This refitted PV is used when estimating the impact parameters and the polarimetric vectors.
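The effect of the beamspot constraint can be illustrated as a Gaussian combination of the track-only vertex estimate with the beamspot prior. This is a simplified stand-in for the constraint inside the AVF fit, not the CMS implementation:

```python
import numpy as np

def beamspot_constrained_vertex(v_fit, cov_fit, v_bs, cov_bs):
    """Inverse-variance (Gaussian) combination of a track-only transverse
    vertex position with the beamspot prior; illustrative only."""
    w_fit = np.linalg.inv(cov_fit)  # information matrix of the track fit
    w_bs = np.linalg.inv(cov_bs)    # information matrix of the beamspot
    cov = np.linalg.inv(w_fit + w_bs)
    v = cov @ (w_fit @ v_fit + w_bs @ v_bs)
    return v, cov
```

The combined covariance is smaller than either input, mirroring the resolution improvement quoted above.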

Impact parameter estimate and significance
A dedicated algorithm is deployed to derive the impact parameter of the charged track from the τ lepton decay using an analytic extrapolation of the track trajectory towards the PV position. The extrapolation depends on the magnetic field and the helical parameters of the track. The distance between the extrapolated track and the PV position is then minimised numerically to determine the impact parameter.
This procedure has two advantages. Firstly, with this extrapolation, the minimisation of the impact parameter is performed in three dimensions. For tracks with large η values, the procedure leads to a better estimation of the z coordinate of the impact parameter than when the minimisation is done exclusively in the transverse plane. Secondly, the helical extrapolation allows for the propagation of both the track and PV uncertainties into an overall impact parameter significance S IP (defined as the ratio of the magnitude of the impact parameter divided by its uncertainty). In this analysis, selections are made on the impact parameter significance, as further explained in Section 9.
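The 3-D minimisation can be sketched as follows, using a simple helix parametrisation and an iterative grid refinement in place of the analysis' numerical minimiser (the parametrisation and all names are our own assumptions):

```python
import numpy as np

def helix_point(t, x0, r, phi0, dzds):
    """Point on a helix at turning angle t: a circle of radius r in the
    transverse plane (sign of r encodes the curvature), linear in z."""
    x = x0[0] + r * (np.sin(phi0 + t) - np.sin(phi0))
    y = x0[1] - r * (np.cos(phi0 + t) - np.cos(phi0))
    z = x0[2] + r * t * dzds
    return np.array([x, y, z])

def impact_parameter_3d(pv, x0, r, phi0, dzds, half_window=0.5, steps=5):
    """Minimise |helix(t) - PV| in three dimensions by coarse-to-fine
    grid refinement; returns the 3-D impact parameter vector."""
    t_best, lo, hi = 0.0, -half_window, half_window
    for _ in range(steps):
        ts = np.linspace(lo, hi, 101)
        pts = np.stack([helix_point(t, x0, r, phi0, dzds) for t in ts])
        i = int(np.argmin(np.linalg.norm(pts - pv, axis=1)))
        t_best = ts[i]
        span = (hi - lo) / 10
        lo, hi = t_best - span, t_best + span
    return helix_point(t_best, x0, r, phi0, dzds) - pv
```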

Event selection
Events are selected online by the CMS trigger system. For the τ ℓ τ h channels, events are triggered by either an ℓ + τ h cross trigger or a single-lepton trigger with a higher p T threshold for the lepton than in the cross trigger. For the τ h τ h channel, a di-τ h trigger is used.
Offline, a pair of oppositely charged τ leptons separated by ∆R > 0.5 is required. The offline-reconstructed objects must match the required trigger objects (i.e. the objects as reconstructed by the trigger system) within ∆R < 0.5. The offline-reconstructed light lepton is required to have a p T value that is at least 1 GeV higher than the online threshold. If an offline τ h candidate is matched to a τ h trigger object (including the τ h leg of the ℓ + τ h cross trigger for the semileptonic channels), the τ h must have a p T at least 5 GeV above the trigger threshold. The offline thresholds are higher than the online thresholds due to the turn-on curve in the trigger efficiencies. Table 2 summarises the online trigger and offline p T thresholds for 2016-2018. The offline requirements apply only to objects that are matched to a trigger object. If, in the τ ℓ τ h channels, the event is selected online by the single-lepton trigger, the offline τ h is required instead to have a p T of at least 20 GeV.
For the τ ℓ τ h channels, the large W + jets background is reduced by rejecting events based on the transverse mass m T of the light-lepton and p miss T system, defined as

$$ m_{\mathrm{T}} = \sqrt{2\, p_{\mathrm{T}}^{\ell}\, p_{\mathrm{T}}^{\mathrm{miss}} \left(1 - \cos\Delta\phi\right)}, $$

where ∆φ is the azimuthal angle between the direction of the light lepton and p miss T .
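The transverse mass above is the standard definition and can be computed directly (illustrative helper, not the analysis code):

```python
import numpy as np

def transverse_mass(pt_lep, pt_miss, dphi):
    """m_T of the light-lepton + missing-pT system, using the standard
    definition sqrt(2 pT_lep pT_miss (1 - cos dphi))."""
    return np.sqrt(2.0 * pt_lep * pt_miss * (1.0 - np.cos(dphi)))
```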
The longitudinal and transverse impact parameters d z and d xy of the muon and electron are required to satisfy |d z | < 0.2 cm and |d xy | < 0.045 cm. These impact parameters originate from a minimisation of the magnitude of the impact parameters in the transverse plane only, in contrast to the impact parameters used for calculating φ CP , which are derived using a 3-D minimisation. For the purpose of the event selection this factorised approach is sufficiently precise. For the leading τ h track, only the requirement |d z | < 0.2 cm is imposed to avoid loss of selection efficiency. Further, a veto on events containing loosely identified additional electrons or muons is imposed. For the τ ℓ τ h channels, a veto on jets passing b-tagging requirements is also applied. When multiple τ lepton pairs are present, the pairs are ranked based on the output scores of the DEEPTAU algorithm for the τ h candidates and the relative isolation for the τ ℓ candidates; the highest-ranked pair is selected.

Table 2: Kinematic trigger and offline requirements applied to the τ e τ h , τ µ τ h , and τ h τ h channels. The trigger p T requirement is indicated in parentheses (in GeV). The p T thresholds indicated for the τ h apply only to the object matched to the hadronic trigger or to the hadronic leg of the cross trigger. (Columns: Channel, Year, Trigger requirement, Offline p T (GeV).)

Background estimation
The processes that contribute to the background in this analysis are Z + jets, W + jets, top quark-antiquark pair production (tt), single top quark, and diboson production. Additionally, events composed entirely of jets produced through the strong interaction, referred to as QCD multijet events, form a significant background. These processes contribute genuine τ leptons, jets and leptons that are misidentified as τ h , and, in the semileptonic channels, prompt leptons and jets that are misidentified as τ ℓ . All background processes resulting in two genuine τ leptons constitute a major background, and are estimated from data using a τ-embedding technique [96]. The majority of the backgrounds due to jets misidentified as τ h candidates are estimated using the "fake factor" (F F ) method, as described in Ref. [97]. The remaining minor backgrounds are determined using simulated events. In the remainder of this section we outline the τ-embedding and F F methods, and describe the corrections applied to simulated events in order to improve their description of the data.

The τ embedding method
In order to obtain the genuine ττ background we exploit lepton universality, and replace oppositely charged muon pairs in data events with simulated oppositely charged τ lepton pairs. The dominant process for this background is Z → ττ, but there are also small contributions from tt and diboson processes.
For all data-taking periods, events containing an oppositely charged dimuon pair were collected using a dedicated di-µ trigger. The detector hits belonging to the muon tracks are removed from these events. A Z boson is simulated in an empty detector, and is forced to decay to a pair of oppositely charged τ leptons with kinematics identical to those of the removed muon pair. The τ leptons are forced to decay fully hadronically or semileptonically in order to simulate either the τ h τ h or τ ℓ τ h channels. The detector response to the τ pair is then simulated and added to the data event.
In order to model the background processes in data well, various corrections need to be applied to the embedded event samples. Muons and electrons are corrected for mismodelling of the efficiencies of their trigger, tracking, identification, and isolation requirements. The τ h candidates are corrected for mismodelling of their trigger, reconstruction, and identification efficiencies.
The "tag-and-probe" method [98] is used to derive these corrections. The τ h energy scales are corrected per decay mode to match the corresponding scales in data. The electron, muon, and pion impact parameters are corrected using a samples of Z → µµ and Z → ee events and quantile-mapping techniques.

The F F method
This method is designed to provide an estimate of the shape and normalisation of events in which at least one quark or gluon jet is misidentified as a τ h lepton based on control samples in data. We refer to such a jet as a jet → τ h .
We define a determination region that is orthogonal to the signal region and dominated by a background process resulting in jet → τ h misidentifications; the construction of these regions is outlined below. We define a nominal-ID τ h as an object that passes the nominal identification requirements outlined in Section 5, and a relaxed-ID τ h as an object that passes a looser requirement on the DNN output but fails the nominal ID. In the determination region we measure the ratio of the nominal-ID τ h rate to the relaxed-ID τ h rate; this ratio is the F F . To obtain the rate of misidentified jets in the signal region, an application region is defined by selecting events that fulfil all event selection criteria except that they contain a τ h that passes the relaxed instead of the nominal requirement (for the τ h τ h channel it must be the leading τ h ). The rate of misidentified jet events in the signal region is obtained by applying the F F values from the determination region on an event-by-event basis as weights to the events in the application region. In both the determination and application regions the contribution of other background processes not involving jet → τ h events, which amounts to about 1% (5%) in the τ h τ h (τ ℓ τ h ) channels, is subtracted using simulated events. The contamination from signal events is significantly smaller (<0.1%) and is therefore neglected.
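The core of the F F method can be sketched as a binned ratio derived in the determination region and applied as a per-event weight in the application region. This is a simplified, p T -only illustration; the analysis additionally bins in decay mode, jet multiplicity, and impact parameter significance:

```python
import numpy as np

def derive_fake_factors(pt_nominal, pt_relaxed, bins):
    """Binned F_F: in each pt bin of the determination region, the ratio of
    tau_h candidates passing the nominal ID to those passing only the
    relaxed ID."""
    n_num, _ = np.histogram(pt_nominal, bins=bins)
    n_den, _ = np.histogram(pt_relaxed, bins=bins)
    return n_num / np.maximum(n_den, 1)  # guard against empty bins

def apply_fake_factors(pt_application, ff, bins):
    """Per-event weights in the application region: look up the F_F of each
    event's pt bin; the weighted sum estimates the jet -> tau_h yield."""
    idx = np.clip(np.digitize(pt_application, bins) - 1, 0, len(ff) - 1)
    return ff[idx]
```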
The jet → τ h background in the τ h τ h channel originates almost entirely from QCD multijet events. The determination region is thus defined by inverting the opposite-sign requirement on the τ lepton pair to a same-sign requirement, which effectively selects a control region pure in QCD jets. The F F are parameterised for the leading τ h as a function of its p T , and binned in the reconstructed decay mode, jet multiplicity, and impact parameter significance. Correction factors are derived using control regions in data to correct for residual differences in the p miss T spectrum, and to account for the sign inversion used to define the determination region. The final F F value for the τ h τ h channel is obtained by applying the raw F F and the two corrections multiplicatively. This F F also accounts for other processes with a jet misidentified as the leading τ h , such as W + jets production. The events in which the subleading τ h is a misidentified jet and the leading τ h candidate is a genuine τ lepton are modelled via simulation; these events constitute only a small fraction (O(2%)) of the total misidentified jet background in the τ h τ h channel.
In the τ ℓ τ h channels, the W + jets process and, to a lesser extent, the tt process contribute to the jet → τ h background, in addition to events originating from QCD multijet production. Therefore, separate F F are derived for these processes, and the individual F F values are subsequently weighted into an overall F F to account for their different contributions in the application region. Simulated events are used to determine the expected relative contributions of W + jets and tt events, and the QCD contribution is estimated by subtracting all simulated non-QCD processes from the data in the application region. In order to account for dependencies of the weights on several kinematic variables, a multi-class BDT is trained to separate W + jets, QCD, and tt events. The inputs to the BDT include kinematic features of the reconstructed ττ system and the associated jets, as well as the τ h decay mode. The output of the BDT is a set of three scores (one per class) that sum to unity, meaning one of the outputs is redundant. The weights are thus determined in bins of two of these scores. The overall F F accounts for the jet misidentification in all background processes. The procedure for the QCD F F is similar to the method described for the τ h τ h channel, except that the light-lepton isolation parameter must be larger than 0.05 to suppress processes resulting in genuine leptons. Correction factors are derived to correct for residual differences in the lepton p T and p miss T spectra, and to account for the sign inversion and additional lepton isolation requirement used to define the determination region. A determination region sufficiently pure in W + jets is defined by selecting events with m T > 70 GeV. Correction factors are again derived to correct for residual differences in the lepton p T and p miss T spectra, and to account for the inverted m T selection used to define the determination region.
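The weighting of the process-specific F F values into an overall F F can be sketched as a fraction-weighted sum (illustrative helper; in the analysis the fractions are determined in bins of the BDT scores):

```python
def combined_fake_factor(fractions, ffs):
    """Overall F_F as the fraction-weighted sum of the process-specific
    F_F values (e.g. W+jets, QCD, ttbar). The fractions are the expected
    relative contributions in the application region and must sum to one."""
    assert abs(sum(fractions) - 1.0) < 1e-6
    return sum(w * f for w, f in zip(fractions, ffs))
```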
For the tt process, it is difficult to define a sufficiently pure region in data, and thus the F F values are estimated from a simulated tt sample. Correction factors are derived to correct for residual differences in the p miss T spectrum, and to account for differences in the F F in data and simulation. The latter is derived by comparing the F F values measured for W + jets in data and simulation.

Estimation of minor backgrounds
The τ-embedding and F F methods combined describe around 90% of the backgrounds in this analysis. All events containing a genuine τ lepton pair are taken from the embedded samples, while events in which the τ h (leading τ h ) is a misidentified hadronic jet in the τ ℓ τ h (τ h τ h ) channels are obtained from data using the F F method. All other background events are obtained from simulation.
In addition to the genuine τ and jet → τ h contributions to the selected pairs, there are additional sources of misidentification that may occur. These include prompt leptons that may be misidentified either as a τ ℓ or as a τ h , light leptons misidentified as τ h , and jets misidentified as τ ℓ candidates. In Tables 3 and 4 we summarise the different background composition configurations and their modelling for the τ h τ h and τ ℓ τ h channels, respectively. To avoid double-counting, events with a genuine τ lepton pair are subtracted from all simulated samples, as are events in which the τ h is a misidentified hadronic jet (for the τ h τ h channel this must be the leading τ h ). In order to model the background processes in data well, various corrections need to be applied to the simulated samples. All corrections to the τ lepton decay products applied to the embedded samples (described in Section 8.1) are also applied to the simulated samples. Although both embedded and simulated samples include simulated leptons, the corresponding corrections can differ slightly due to deposits from other nearby objects that may influence, for example, isolation sums and/or particle identification decisions. Therefore, dedicated correction factors are derived in each case.
Jet energy scale corrections are applied to both data and simulated events as described in Section 5. Recoil corrections to the p miss T are applied to reduce the mismodelling of the simulated total p T of the neutrinos originating from the decay of the Z, W, or H. Their average effect is a reduction of the p miss T obtained from simulation by a few GeV. The recoil corrections are measured in Z → µµ events and subsequently applied to the Drell-Yan plus jets, W + jets, and signal event samples. The ℓ → τ h misidentification rates are corrected in simulation by applying the tag-and-probe method to Z → ℓℓ events, and the ℓ energy scales are corrected in simulation to match those in data.

Table 3: The different sources of backgrounds in the τ h τ h channel are shown in the rows and columns. The entries in the table represent the possible τ lepton pair background contributions from different processes and misidentifications, and encapsulate the different experimental techniques deployed to estimate them.
The Z boson mass and p T spectra in simulation are corrected to better match the data. For this purpose, the Z boson mass and p T are measured in data and simulation in dimuon events, and the simulated events are corrected to match the spectra observed in data. A correction is also applied to the top quark p T spectrum in the tt sample, using a dedicated control region. The procedure used to derive this correction is detailed in Ref. [99].
After applying all corrections, we obtain a satisfactory description of the observables that we use to categorise events, which are described in Section 9.

Validating the modelling of the φ CP observables
To validate the modelling of the φ CP spectrum, the τ µ τ h events in data and the background estimates are divided into distributions in which the charged π is "nearly coplanar" or "nearly perpendicular" to the plane spanned by the beam axis and the τ momentum in the laboratory frame, as described in Ref. [37]. To this purpose we introduce the observable α π − , defined by

$$ \cos\alpha_{\pi^{-}} = \frac{\hat{z} \times \hat{p}}{|\hat{z} \times \hat{p}|} \cdot \frac{\hat{j} \times \hat{p}}{|\hat{j} \times \hat{p}|}. \quad (14) $$
In this equation, ẑ is the unit vector pointing along the beam axis, p̂ is the unit momentum vector of the charged π, and ĵ is the unit impact parameter vector. We define a subset of events in which the charged π is nearly perpendicular or nearly coplanar by requiring α π − > π/4 or α π − < π/4, respectively. We also perform equivalent checks for τ decays into ρ mesons, where we substitute the unit π 0 momentum vector for ĵ in Eq. (14) to define an equivalent observable, α ρ − . In Fig. 5 we display the coplanar and perpendicular distributions in the µπ and µρ channels.

Figure 5: The angle φ CP for τ µ τ h events in which the τ h decays to a charged π (upper) or a charged ρ meson (lower). The distributions are decomposed into subsets in which the charged π is "nearly coplanar" (left) or "nearly perpendicular" (right) to the production plane.
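The observable of Eq. (14) is a simple construction from cross products; a minimal sketch (our own helper, not the analysis code):

```python
import numpy as np

def cos_alpha(p_hat, j_hat, z_hat=np.array([0.0, 0.0, 1.0])):
    """cos(alpha) of Eq. (14): the cosine of the angle between the
    (beam, pi) plane and the (impact parameter, pi) plane."""
    def unit(v):
        return v / np.linalg.norm(v)
    return float(np.dot(unit(np.cross(z_hat, p_hat)),
                        unit(np.cross(j_hat, p_hat))))
```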

Event categorisation
In order to enhance the sensitivity of this analysis we apply MVA discriminants to separate signal from background events.
This event categorisation is formulated as a multi-class problem. The output is a set of scores for each event (one per class) that, by construction, sum to unity. Each score can therefore be interpreted loosely as the probability that an event belongs to a given class. We then assign each event to a category depending on the class that received the highest score. Since both the τ ℓ τ h and τ h τ h channels are dominated by backgrounds containing contributions from genuine τ and jet → τ h production, the discriminant is trained to categorise events into three classes:

• The "Higgs" category is trained to distinguish events from the ggH, VBF, and VH samples, which are reweighted by their cross sections before being merged into one sample, from background events. Events in this category are used to infer the CP quantum number of the H boson.
• The "Genuine" category includes all background processes involving two genuine τ leptons.
• The "Mis-ID" category includes all background processes in which at least one hadronic jet is misidentified as a τ h lepton. This category also contains ℓ → τ h misidentified events for all channels, and prompt light leptons in the τ ℓ τ h channels.
The three categories are mutually exclusive and, by definition, the lower bound on the highest MVA score is 1/3. Subsequently, the three training classes are normalised to account for the different numbers of events in each data set. All event classes are then chosen to contribute to the training with the same weight, i.e. with uniform prevalence. For the semileptonic channels, the backgrounds for the training are taken from simulated samples, except for QCD events, which are obtained using same-sign τ lepton pair candidates in data. For the fully hadronic channel, the embedded samples and the F F method are used in the training. For the latter, the events from the application region are used, reweighted by their F F values. The contribution of other background processes not involving jet → τ h events is not removed in this case, but the impact of these events on the performance of the MVA is negligible as they amount to only O(1%) of the total. A separate training was performed for each year to account for differences in the performance of the CMS detector in the different data-taking periods. In the τ ℓ τ h channels, the event categorisation is performed with a multi-class neural network. In the τ h τ h channel, it is performed using a multi-class BDT implemented with the XGBOOST package. The input variables used in the categorisation of the τ ℓ τ h and τ h τ h channels are displayed in Table 5. The training is performed inclusively in the τ lepton decay modes.
Events are sorted into the three categories depending on which of the three output scores is closest to unity. The maximum output score is also retained and used for the purpose of signal extraction. These maximum scores will be referred to as the "MVA scores" henceforth.
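The category assignment and the retained maximum score can be sketched as follows (illustrative helper, not the analysis code):

```python
import numpy as np

def assign_categories(scores):
    """scores: (n_events, 3) array of class scores (Higgs, Genuine, Mis-ID),
    each row summing to unity. Returns the assigned category index and the
    retained maximum score (the "MVA score" used in the signal extraction)."""
    scores = np.asarray(scores)
    category = np.argmax(scores, axis=1)
    mva_score = scores[np.arange(len(scores)), category]
    return category, mva_score
```

Because each row sums to one, the maximum of three scores is bounded below by 1/3, matching the lower bound quoted above.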
After the categorisation, a requirement of S IP > 1.5 is applied to the impact parameter significances of the electron and muon, as well as to single pions from τ lepton decays, for events classified as signal events. Events with lower S IP would dilute the sensitivity of the analysis. In the background categories, the impact parameter significance requirement is only applied to the single-pion decays.
In Fig. 6, the post-fit MVA score distributions of the Genuine and Mis-ID categories are displayed for the τ µ τ h and τ e τ h channels. The best fit signal contributions are overlaid. The fitting procedure is outlined in Section 12. The genuine di-τ and jet → τ h background contributions are displayed separately, as indicated in the legends. The remaining background contributions are collated and indicated by the "Others" label. The BDT scores for the τ h τ h channel are analogously displayed in Fig. 7.

The φ CP distributions in windows of the MVA discriminant score
The MVA score distributions described in Section 9 allow for a partial separation of signal from background events. The φ CP distributions of the events in the signal categories are then analysed in windows of increasing MVA score, corresponding to progressively higher signal-to-background ratios. The result is a set of 2-D distributions built from the MVA score and φ CP variables. These distributions are used in the fit to data to extract the results.
The statistical fluctuations in the estimates of the background contributions (denoted as background templates) in the signal and background categories are sizeable. It has been shown that backgrounds involving two genuine τ leptons are flat in φ CP at the generator level [37]. Experimental smearing effects do not modulate this flat shape for decay modes in which we apply the neutral-pion method for at least one τ lepton. Therefore, for this background process and these decay modes we flatten the background templates by merging the bins. The φ CP distribution is not flat for the jet → τ h background for all decay modes, due to the kinematic properties of the events, but the distributions are still symmetric around φ CP = 180 • , and so this background is symmetrised, meaning that the symmetry around φ CP = 180 • is enforced. For other background templates, for example the µ → τ h contribution, the distributions are found to be flat within the statistical uncertainties, and therefore these background templates are also flattened.
The backgrounds are not expected to be flat in decay modes in which the impact parameter method is used or when the polarimetric vector method is applied when both τ leptons decay via the a 3pr 1 mode. This can be understood from the fact that smearing effects in the PV are correlated for the decay planes. The smearing of the PV results in a depletion in the region φ CP = 180 • [37], such that the shape of the background distributions in the µπ, eπ, and a 3pr 1 a 3pr 1 (ππ) channels tends to resemble the CP-even (CP-odd) signal rather than the CP-odd (CP-even). However, for such channels the backgrounds are symmetric around φ CP = 180 • , and therefore the background templates are symmetrised.
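The flattening and symmetrisation operations on the binned φ CP templates can be sketched as follows (illustrative helpers, assuming uniform bins spanning (0, 360°)):

```python
import numpy as np

def symmetrise(template):
    """Enforce symmetry around phi_CP = 180 deg: average each bin with its
    mirror bin; the total yield is preserved."""
    t = np.asarray(template, dtype=float)
    return 0.5 * (t + t[::-1])

def flatten(template):
    """Merge all bins: replace the template by its average, preserving the
    total yield (used where the background is flat within uncertainties)."""
    t = np.asarray(template, dtype=float)
    return np.full_like(t, t.mean())
```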
For certain decay modes, the statistical fluctuations in the signal templates are also sizeable. Therefore, the templates for the scalar and pseudoscalar cases are symmetrised around φ CP = 180 • as well. The maximally mixed signal template, which is not displayed in the plots, is used in the fitting procedure described in Section 12. In order to symmetrise this template, we reweight the signal sample to another sample with α Hττ = −45 • . The φ CP distribution is shifted by 180 • , and the average is taken between the samples with α Hττ = 45 • and α Hττ = −45 • .
In Figs. 8-10 we display the post-fit data and background template distributions, after the bin merging and symmetrisation, with the best fit and pseudoscalar signal templates overlaid. The cross sections for the pseudoscalar signal are set to the values determined from the fit to data for the best fit signal, which are given in Section 12.1. The uncertainties have been adjusted to their values after the fit described in Section 12. The most sensitive decay modes of the analysis are displayed: the µρ and µπ modes in the τ µ τ h channel in Fig. 8, the ρρ and πρ modes in the τ h τ h channel in Fig. 9, and the eρ and eπ modes in the τ e τ h channel in Fig. 10. The distributions highlight the effectiveness of the MVA discriminant in optimising the signal-to-background ratio, as well as the CP sensitivity of the measurement that follows from the visibly different phases of the best fit signal and CP-odd signal distributions. The 180 • phase shift between the τ h τ h and τ ℓ τ h channels is manifest in the figures. The correlated effect of the PV smearing is also visible in the µπ and eπ modes via the non-flat shapes of the background distributions.
Figure 6: The post-fit MVA score distributions for the Genuine (left) and Mis-ID (right) categories in the τ µ τ h (upper) and τ e τ h (lower) channels. The distributions are inclusive in τ h decay mode. The best fit signal distributions are overlaid, where the signal cross sections are set to the values obtained from the fit to data, which are given in Section 12.1. In the lower panels, the data minus the background template divided by the uncertainty in the background template is displayed, as well as the signal distribution divided by the uncertainty in the background template. The uncertainty band accounts for all sources of systematic uncertainty in the background prediction, after the fit to data.

Figure 7: The post-fit MVA score distributions for the Genuine (left) and Mis-ID (right) categories in the τ h τ h channel. The distributions are inclusive in τ h decay mode. The best fit signal distributions are overlaid, where the signal cross sections are set to the values obtained from the fit to data, which are given in Section 12.1. In the lower panels, the data minus the background template divided by the uncertainty in the background template is displayed, as well as the signal distribution divided by the uncertainty in the background template. The uncertainty band accounts for all sources of systematic uncertainty in the background prediction, after the fit to data.

Figure 8: Distributions of φ CP in the µρ (upper) and µπ (lower) channels in windows of increasing MVA score, shown on top of each window. The best fit and pseudoscalar (PS) signal distributions are overlaid, where in both cases the signal cross sections are set to the values obtained from the fit to data, which are given in Section 12.1. The x axis represents the cyclic bins in φ CP in the range (0, 360 • ). In the lower panels, the data minus the background template divided by the uncertainty in the background template is displayed, as well as the signal distributions divided by the uncertainty in the background template. The uncertainty band accounts for all sources of systematic uncertainty in the background prediction, after the fit to data.

Figure 9: Distributions of φ CP in the ρρ (upper) and πρ (lower) channels in windows of increasing MVA score, shown on top of each window. The best fit and pseudoscalar (PS) signal distributions are overlaid, where in both cases the signal cross sections are set to the values obtained from the fit to data, which are given in Section 12.1. The x axis represents the cyclic bins in φ CP in the range (0, 360 • ). In the lower panels, the data minus the background template divided by the uncertainty in the background template is displayed, as well as the signal distributions divided by the uncertainty in the background template. The uncertainty band accounts for all sources of systematic uncertainty in the background prediction, after the fit to data.

Figure 10: Distributions of φ CP in the eρ (upper) and eπ (lower) channels in windows of increasing MVA score, shown on top of each window. The best fit and pseudoscalar (PS) signal distributions are overlaid, where in both cases the signal cross sections are set to the values obtained from the fit to data, which are given in Section 12.1. The x axis represents the cyclic bins in φ CP in the range (0, 360 • ). In the lower panels, the data minus the background template divided by the uncertainty in the background template is displayed, as well as the signal distributions divided by the uncertainty in the background template. The uncertainty band accounts for all sources of systematic uncertainty in the background prediction, after the fit to data.

Systematic uncertainties
The uncertainties considered in this analysis can be categorised into normalisation and shape uncertainties. The former modify only the normalisation of a distribution while leaving its shape unchanged, whereas the latter allow for correlated changes across bins that also alter the shape of the distribution. The uncertainties are accounted for as nuisance parameters in the fit to data. The normalisation and shape uncertainties are summarised in Table 6, along with their correlations across the three years of data-taking considered in this analysis.
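To illustrate how the two classes of nuisance parameter act on a binned template, the following Python sketch applies a log-normal normalisation nuisance and a simple linear vertical-morphing shape nuisance to hypothetical bin contents. The function names, the morphing convention, and all numbers are illustrative only, not the implementation used in the actual fit.

```python
import numpy as np

def apply_norm_nuisance(template, kappa, theta):
    """Log-normal normalisation nuisance: scales every bin by kappa**theta,
    leaving the shape of the template unchanged (illustrative convention)."""
    return template * kappa**theta

def apply_shape_nuisance(nominal, up, down, theta):
    """Linear vertical-template morphing between the nominal template and its
    +/-1 sigma shape variations (a common, simplified convention)."""
    if theta >= 0:
        return nominal + theta * (up - nominal)
    return nominal + theta * (nominal - down)

nominal = np.array([100.0, 80.0, 60.0])   # hypothetical bin contents
up      = np.array([105.0, 80.0, 55.0])   # +1 sigma shape variation
down    = np.array([ 95.0, 80.0, 65.0])   # -1 sigma shape variation

# A 5% normalisation uncertainty pulled by +1 sigma scales all bins equally.
scaled = apply_norm_nuisance(nominal, kappa=1.05, theta=1.0)
# A shape nuisance pulled by +1 sigma moves each bin towards its "up" variation.
morphed = apply_shape_nuisance(nominal, up, down, theta=1.0)
```

Pulling the shape nuisance to −1 instead recovers the "down" variation, so the shape changes are correlated across bins as described above.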
The uncertainty in the muon reconstruction efficiency including the tracking, identification, and isolation requirements is 1%, while for electrons it is 2%. The uncertainty in the muon and electron trigger efficiencies, which affect both the single-lepton and cross-triggers, is 2%. An additional normalisation uncertainty of 4% is applied to the embedded event samples, originating from the uncertainty in the measurement of the muon trigger and identification efficiencies used to scale the embedded samples.
For the τ ℓ τ h channels, which include a veto on events containing b jets, an uncertainty of 1-9% from the propagation of the b quark tagging scale factors is applied to the tt and diboson event yields (the uncertainties in the event yields for other simulated processes are found to be negligible).
The FEWZ 3.1 program [104] was used to calculate the W + jets and Z + jets cross sections. Uncertainties in the factorisation and renormalisation scales, the PDFs, and the strong coupling α S were propagated and added in quadrature. The TOP++ v2.0 program [105] was used to calculate the tt cross section and its uncertainty. The resulting uncertainties in the simulated Z + jets, W + jets, and tt cross sections amount to 2, 4, and 4%, respectively. For the diboson and single top quark production processes, a combined systematic uncertainty of 5% in the background yield is estimated using CMS measurements [106,107]. The uncertainties in the ggH, VBF, and VH signal production cross sections, as well as the uncertainty in the H → ττ branching fraction, are applied as recommended in Ref. [100].
The uncertainty in the µ → τ h misidentification rate in the τ µ τ h channel is split into four independent uncertainties depending on the MVA decay mode of the µ → τ h candidate: 20% for the π and ρ modes, 30% for the one-prong a 1 mode, and 40% for the three-prong a 1 mode. An uncertainty of 10 (2)% in the e → τ h misidentification rate is applied for 2016 (2017-2018) in the τ h τ h channel. In the τ e τ h channel, the uncertainty in the e → τ h misidentification rate is split per decay mode and is, at most, 10%.
For the τ h τ h channel, the uncertainty in the jet → τ h background normalisation due to the extrapolation of the F F from same-sign to opposite-sign regions ranges between 4 and 7%.
For the decay of the τ lepton to a muon or a single pion, an uncertainty in the correction of S IP is applied by varying the size of the correction by ±25%, while for the decay to an electron the correction is varied by ±40%. The uncertainty is converted into a normalisation uncertainty per decay mode and ranges from 1 to 5%. For the mode with a three-prong a 1 in both τ legs, the uncertainty in the SV reconstruction efficiency is 2%.
Finally, a 3% uncertainty in the efficiency of the τ h candidates to pass the DNN discrimination is applied.

Shape uncertainties
The uncertainty in the τ h reconstruction and identification efficiency is typically of the order of 3%, and is split into several uncertainties in each p T and MVA decay mode bin. These uncertainties originate from the fits to the corresponding scale factors and are statistically dominated. We also checked whether applying separate uncertainties for τ h candidates that are incorrectly classified in a different decay mode (e.g. a one-prong a 1 misclassified as a ρ) creates any variations in the shapes of the signal or background distributions. Such uncertainties resulted only in tiny modifications of the shapes of the φ CP distributions, negligible in comparison to the statistical uncertainties in the signal and background templates, and therefore common uncertainties are used for correctly and incorrectly classified τ h candidates in each MVA decay mode bin. The uncertainty in the τ h trigger efficiency depends on the p T and decay mode, and originates from the statistical uncertainty in the parameterisation of the trigger turn-on curve. The τ h energy scale uncertainty is 0.8-1.1 (0.2-0.5)% for simulated (embedded) events, and is decay mode dependent. The uncertainty in the µ momentum scale varies as a function of the η of the muon and ranges from 0.4 to 2.7%. The uncertainty in the electron energy scale is less than 1% and depends on the p T and η. The e → τ h energy scale uncertainty ranges from 0.5 to 6.5%, while the µ → τ h energy scale uncertainty is 1%.
Uncertainties in the jet energy scale originate from different sources with limited correlations. These uncertainties depend on the jet kinematics and are typically larger in the forward regions. Uncertainties in the jet energy resolution are also incorporated; they are typically smaller than the jet energy scale uncertainties. Uncertainties related to the hadronic recoil response and resolution, as derived from the Z + jets, W + jets, and signal samples, are propagated to p T miss and observables dependent on it in the simulated samples that are subject to hadronic recoil corrections. For the samples to which no hadronic recoil corrections are applied (diboson, single top quark, and tt), the jet energy scale and resolution uncertainties, as well as the uncertainty in the unclustered energy, are instead propagated to p T miss and dependent observables.
The embedded samples contain small fractions of tt and diboson events. A shape uncertainty is therefore applied by adding and subtracting 10% of the simulated tt and diboson contributions. The top quark p T spectrum and the Drell-Yan p T and mass spectra are reweighted. For the top quark samples, the size of the correction is taken as the uncertainty, while for the Drell-Yan samples the correction is varied by 10%.
The F F values are parameterised with continuous functions, and the statistical uncertainties in the fitted parameters are treated as nuisance parameters. The uncertainties are parameterised in a manner that allows for asymmetric variations above and below the p T value where the uncertainty is minimal; the procedure is similar to the method described in detail in Ref. [11]. The size of the correction in p T miss is taken as an uncertainty for all F F values. For the τ h τ h channel, the shape uncertainty in the QCD same-sign to opposite-sign region correction is determined as the difference between a correction binned in the distance ∆R between the two τ leptons and the jet multiplicity, and the unbinned correction. For the τ ℓ τ h channels, the equivalent shape uncertainty is taken as the size of the same-sign to opposite-sign correction. In addition, a systematic uncertainty due to the light-lepton p T correction is taken as the size of the correction. For the W + jets F F values, the uncertainty due to the extrapolation from the high-m T to the low-m T region is taken as the size of the residual differences observed when applying F F values derived for high-m T simulated events to low-m T simulated events. For the tt F F values, a systematic uncertainty is applied to account for potential differences between data and simulation; for this purpose, the difference between F F values derived from data and from simulated W + jets samples is applied as the uncertainty. An uncertainty in the subtraction of the background processes not involving jet → τ h events is considered by varying the contribution predicted from simulation by ±10%.
For uncertainties that are common to simulated and embedded samples we treat the lepton and τ h identification uncertainties and the lepton and τ h energy scale uncertainties as being 50% correlated. All other common uncertainties are treated as being uncorrelated.
During the 2016 and 2017 data-taking periods, a gradual shift in the timing of the inputs of the ECAL L1 trigger in the forward endcap region (2.5 < |η| < 3.0) led to a specific trigger inefficiency. Additional correction factors and corresponding uncertainties are applied to the simulation to account for this inefficiency. The magnitude of the uncertainties ranges between 0 and 4%, depending on the process, category, and channel.
For the signal samples, renormalisation and factorisation scales and parton showering uncertainties were incorporated [100] .
The limited number of events in the signal and background templates is accounted for using the "Barlow-Beeston" method [108,109], which assigns a single nuisance parameter per bin per process. For background templates that have been flattened as described in Section 9, the bin-by-bin uncertainties are fully correlated, such that there is only one independent nuisance parameter for all φ CP bins. For background templates that are symmetric about φ CP = 180°, one nuisance parameter per pair of symmetrised bins is utilised. Note that for flattened background templates multiple nuisance parameters are still needed per process, since multiple windows of increasing MVA score are used.
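A minimal sketch of such a per-bin treatment, assuming a simplified "Barlow-Beeston lite" form in which each bin carries one Gaussian-constrained nuisance parameter scaling the expectation by its template statistical uncertainty (the function name and all numbers are hypothetical, and the full method in Refs. [108,109] differs in detail):

```python
import numpy as np

def bb_lite_nll(data, expected, mc_stat_unc, thetas):
    """Poisson negative log-likelihood (constants dropped) with one
    Gaussian-constrained nuisance parameter per bin that shifts the
    expectation by its template statistical uncertainty."""
    exp_varied = expected + thetas * mc_stat_unc          # per-bin shifted yields
    nll = np.sum(exp_varied - data * np.log(exp_varied))  # Poisson terms
    nll += 0.5 * np.sum(thetas**2)                        # Gaussian constraints
    return nll
```

Pulling any bin's nuisance parameter away from zero when data and expectation already agree can only increase the negative log-likelihood, which is what lets the fit constrain these parameters.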
We also considered other systematic uncertainties that could modify the shape of the simulated φ CP distributions, including the energy scale, and energy and angular resolutions of the charged and neutral pions, impact parameters, and SV−PV directions. However, we found that such uncertainties only resulted in tiny modifications to the shapes of the φ CP distributions, which were negligible in comparison to the statistical uncertainties in the signal and background templates, and they were therefore neglected in this analysis.
The systematic uncertainty scheme is validated by fitting the φ CP distributions in a Z → ττ control region, obtained following the procedure described in Section 8.4. Goodness of fit tests have been performed to assess the validity of the statistical model. These tests indicated a good compatibility between the data and the model.

Results
In order to extract the CP-mixing angle α Hττ , a simultaneous fit to the data is performed using the likelihood function L(L, µ, α Hττ , θ) that depends on µ = (µ ggH , µ qqH ), which are the Higgs boson signal strength modifiers (defined as the cross section times H → ττ branching fraction with respect to the SM value), the CP-mixing angle α Hττ , and the nuisance parameters θ that account for the systematic uncertainties. In the fit, all H → ττ production processes involving V boson couplings, namely VBF and VH, are scaled by µ qqH , while the ggH process is scaled by µ ggH . The fit is able to differentiate between these production modes because the shapes of the MVA score distributions are different; the VBF signal tends to peak more sharply towards larger MVA scores, whereas the ggH distribution is broader.
The likelihood function is defined as a product of conditional probabilities P over binned distributions of the discriminating observables in each event category:

L(L, µ, α Hττ, θ) = ∏ i,j P(n i,j | S i,j (L, α Hττ, µ, θ) + B i,j ( θ)) ∏ m C m ( θ).

In this equation, the Poisson distributions P correspond to the observation of n i,j events in each bin of the discriminating observable given the expectation for the background B i,j ( θ) and the signal S i,j (L, α Hττ, µ, θ) = L µ A i,j ( θ, α Hττ ), in which L is the integrated luminosity and A i,j ( θ, α Hττ ) is the signal acceptance in each production bin. Constraints on the nuisance parameters corresponding to the systematic uncertainties described in Section 11 are represented by the functions C m ( θ). A more detailed discussion on the formulation of the statistical inference may be found in Refs. [109,110]. The systematic uncertainties affecting the normalisation of the signal and background templates are incorporated in the fit via nuisance parameters with a log-normal prior probability density function. The shape-altering systematic uncertainties are represented by nuisance parameters whose variations cause continuous morphing of the signal or background template shapes, and are assigned a Gaussian prior probability density function. The bin-by-bin statistical uncertainties in the background samples are also assigned a Gaussian prior probability density function.
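The structure of this likelihood can be sketched as follows. This is an illustrative Python version with Gaussian constraint terms and hypothetical bin contents; it omits the template-morphing machinery and priors of the actual fit.

```python
import numpy as np
from scipy.special import gammaln

def neg_log_likelihood(n_obs, signal, background, thetas):
    """Negative log of a binned likelihood of the form described in the text:
    a product of Poisson probabilities per bin, multiplied by constraint
    terms C_m(theta) on the nuisance parameters (taken Gaussian here)."""
    mu = signal + background                                    # S_ij + B_ij
    nll = np.sum(mu - n_obs * np.log(mu) + gammaln(n_obs + 1))  # -log Poisson product
    nll += 0.5 * np.sum(np.asarray(thetas) ** 2)                # -log constraints (up to constants)
    return nll

# Hypothetical observed counts and expectations in three bins.
n_obs = np.array([12.0, 30.0, 55.0])
sig = np.array([2.0, 10.0, 25.0])
bkg = np.array([10.0, 20.0, 30.0])
nll_matched = neg_log_likelihood(n_obs, sig, bkg, thetas=[0.0])
```

Since each Poisson term is minimised when the expectation equals the observed count, any mismatch between data and expectation, or any pull on a nuisance parameter, increases the negative log-likelihood.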
Using the negative log-likelihood ratio, defined as

−2∆ ln L = −2 ln(L/L max ),

where L max is the maximum of the likelihood, we find the 68.3, 95.5, and 99.7% confidence intervals when −2∆ ln L equals 1.00, 4.02, and 8.81, respectively. A detailed discussion may be found in Section 3.2 of Ref. [111].
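Reading an interval off such a scan can be sketched as below, using a toy parabolic −2∆ ln L whose width is chosen to reproduce the ±19° interval quoted in the abstract. The parabolic shape, the scan grid, and the helper name are assumptions for illustration; the actual scan is obtained from the full fit.

```python
import numpy as np

def interval_from_scan(param_values, two_delta_nll, threshold):
    """Read a confidence interval off a -2*delta ln L scan by collecting the
    parameter values below the given threshold (1.00, 4.02, 8.81 for the
    68.3, 95.5, and 99.7% CL quoted in the text). Assumes a fine scan with
    a single minimum."""
    inside = param_values[two_delta_nll < threshold]
    return inside.min(), inside.max()

# Toy parabolic scan: -2*delta ln L = (alpha/sigma)**2 with sigma = 19 degrees.
alpha = np.linspace(-90.0, 90.0, 18001)
scan = (alpha / 19.0) ** 2
lo, hi = interval_from_scan(alpha, scan, threshold=1.00)
```

With this toy shape the 68.3% CL threshold of 1.00 is crossed at alpha ≈ ±19°, matching the construction.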
The inputs to the likelihood fits differ for the signal and background categories. For the signal categories, the φ CP distributions in bins of the MVA score are used (a subset of these is displayed in Figs. 8-10). For the background categories, the MVA score distributions are used. This allows the background contributions and systematic uncertainties to be further constrained, and helps to improve the fit convergence.

α Hττ mixing angle results
We present the observed and expected negative log-likelihood scans for the combination of the τ e τ h , τ µ τ h , and τ h τ h channels in Fig. 11. The two rate parameters that scale the ggH and qqH production signal strengths were left to float freely in the fit. The best fit values of these parameters are µ ggH = 0.59 +0.28 −0.32 and µ qqH = 1.39 +0.56 −0.47, with a correlation coefficient ρ = −0.76. The strong anticorrelation between these parameters arises because the analysis does not directly attempt to differentiate between the production modes.
The expected sensitivities of the τ e τ h , τ µ τ h , and τ h τ h channels are 1.0, 1.5, and 1.8σ, respectively. The µρ mode yields the most sensitive expected contribution of 1.2σ, followed by the ρρ and πρ modes that contribute 1.1 and 1.0σ, respectively. All other modes have sensitivities below 1σ.
The statistical uncertainties in the background templates are the subleading source of systematic uncertainty in this analysis. As the dominant contributions to the backgrounds are determined themselves from control samples in data, the amount of data is the limiting factor in this uncertainty. The next most dominant sources of uncertainty are the hadronic trigger efficiency, theory uncertainties, the τ h energy scale, and uncertainties related to the implementation of the F F method.
It was shown in Ref. [36] that in the next-to-minimal supersymmetric model mixing angles as large as ≈27 • can be accommodated by the latest electric dipole moment and Higgs boson measurements. This measurement is thus sensitive to the larger allowed mixing angles in this model.
A fit to the data is also performed assuming µ ggH = µ qqH = µ. In this case µ is the combined signal strength modifier that scales the total H production cross section times H → ττ branching fraction relative to the SM value. In Fig. 12 we display a scan of µ versus α Hττ . We observe that there is no strong correlation between these parameters.
A two-dimensional scan of κ τ and κ̃ τ , as defined in Eq. (2), is also performed. The 68.3, 95.5, and 99.7% confidence regions are found when −2∆ ln L 2D equals 2.30, 6.20, and 11.62 [111], respectively, defined analogously to Eq. (16) with the likelihood now a function of both κ τ and κ̃ τ . All other CP-even (CP-odd) couplings affecting the production cross sections and/or the H → ττ branching fraction are fixed to their SM values, κ i = 1 (κ̃ i = 0). The observed result of the scan is shown in Fig. 13. It should be noted that the fit is only sensitive to the relative sign between κ τ and κ̃ τ , and thus the scan has two best fit points, for positive and negative values of κ̃ τ .
In Fig. 14 we display the data of the ρρ, πρ, µρ, and eρ channels together with the CP-even and CP-odd predictions. These channels are chosen because the same number of φ CP bins is used in the fit to data, and collectively they account for most of the sensitivity to α Hττ . Events are included from all MVA score bins in these signal categories. Each MVA score bin is weighted by A S/(S + B), where S and B are the signal and background rates, respectively, and A is a measure of the average asymmetry between the scalar and pseudoscalar distributions. The value of A per bin is defined as |CP even − CP odd |/(CP even + CP odd ), normalised to the total number of bins, where CP even and CP odd are the scalar and pseudoscalar contributions per bin. This distribution shows that the data favour the CP-even scenario. The background is subtracted from the data. The scalar distribution is depicted in red, while the pseudoscalar is displayed in blue. In the predictions, the rate parameters are taken from their best fit values. The grey uncertainty band indicates the uncertainty in the subtracted background component. In combining the channels, a phase shift of 180° was applied to the channels involving a light lepton, since these channels have a phase difference of 180° with respect to the two fully hadronic channels due to a sign flip in the spectral function of the light lepton.
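The per-bin asymmetry and its average can be sketched as follows, with toy scalar and pseudoscalar φ CP templates of equal yield and opposite phase. The bin contents are hypothetical, and interpreting "normalised to the total number of bins" as a simple mean over bins is an assumption.

```python
import numpy as np

def asymmetry_weight(cp_even, cp_odd):
    """Average asymmetry A between scalar and pseudoscalar templates, as
    defined in the text: the mean over phi_CP bins of
    |CP_even - CP_odd| / (CP_even + CP_odd)."""
    per_bin = np.abs(cp_even - cp_odd) / (cp_even + cp_odd)
    return per_bin.mean()

# Toy phi_CP templates in 8 cyclic bins: equal yields, 180-degree phase shift.
bins = np.linspace(0.0, 2.0 * np.pi, 9)[:-1]
cp_even = 10.0 + np.cos(bins)   # scalar: peaks at phi_CP = 0
cp_odd  = 10.0 - np.cos(bins)   # pseudoscalar: shifted by 180 degrees
A = asymmetry_weight(cp_even, cp_odd)
```

Identical templates give A = 0, so categories where the two hypotheses are indistinguishable receive no weight in the combined distribution.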

Summary
The first measurement of the effective mixing angle α Hττ between scalar and pseudoscalar Hττ couplings has been presented for a data set of proton-proton collisions at √ s = 13 TeV corresponding to an integrated luminosity of 137 fb −1 . The data were collected with the CMS experiment at the LHC in the period 2016-2018. The following τ lepton decay modes were included: e ± , µ ± , π ± , ρ ± → π ± π 0 , a 1 ± → π ± π 0 π 0 , and a 1 ± → π ± π ∓ π ± . Dedicated strategies were adopted to reconstruct the angle φ CP between the τ decay planes for the various τ decay modes. The data disfavour the pure CP-odd scenario at 3.0 standard deviations. The observed effective mixing angle is found to be −1 ± 19 • , while the expected value is 0 ± 21 • at the 68.3% confidence level (CL). The observed and expected uncertainties are found to be ±41 • and ±49 • at the 95.5% CL, respectively, and the observed sensitivity at the 99.7% CL is ±84 • . The leading uncertainty in the measurement is statistical, implying that the precision of the measurement will increase with the accumulation of more collision data. The measurement is consistent with the standard model expectation, and reduces the allowed parameter space for its extensions. Tabulated results are provided in HEPDATA [112].

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid and other centres for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the follow-