Testing CP Properties of Extra Higgs States at the HL-LHC

Extra Higgs states appear in various scenarios beyond the current Standard Model of elementary particles. If discovered at the LHC or future colliders, the question will arise whether CP is violated or conserved in the extended scalar sector. An unambiguous probe of (indirect) CP violation would be the observation that one of the extra Higgs particles is an admixture of a CP-even and a CP-odd state. We discuss the possibility to discover scalar CP violation in this way at the high-luminosity (HL) phase of the LHC. We focus on the Two-Higgs Doublet Model of type I, where we investigate its currently allowed parameter region. Considering a benchmark point that is compatible with the current constraints but within reach of the HL-LHC, we study the prospects of determining the CP property of an extra neutral Higgs state $H$ via the angular distribution of final states in the decay $H \to \tau\bar\tau$. The analysis is performed at the reconstructed level, making use of a Boosted Decision Tree for efficient signal-background separation and a shape analysis for rejecting a purely CP-even or odd nature of $H$.


Introduction
After the discovery of the scalar resonance with a mass of about 125 GeV, the combined measurements are now used to establish the particle's properties [1] and whether or not it is indeed the Higgs boson as predicted by the Standard Model (SM). An important part of this procedure is the test of its spin [2,3] and CP transformation properties [4]. The latter is of particular interest, as the violation of the CP symmetry is a fundamental ingredient in order to explain the long-standing puzzle of the matter-antimatter asymmetry of the Universe [5], in particular because the CP violation in the SM -observed first in Kaon [6] and recently also in Charmed meson decays [7] -is insufficient. This calls for physics beyond the SM (BSM) explanations with significant amount of CP violation, which can e.g. be introduced in the scalar sector.
According to present analyses, the discovered scalar resonance is compatible with the scalar Higgs boson as predicted by the SM, yet the possibility of a more complex scalar sector that includes CP violation remains. Since no additional scalar resonances have been found to date 1 the scalar sector may include additional scalar bosons with masses above 250 GeV, which mix only weakly with the SM-like scalar Higgs boson.
A minimal prototype for an extended scalar sector is the Two Higgs Doublet Model (THDM) where the scalar sector of the Standard Model (SM) is extended by an additional scalar SU (2) L doublet field [11], which allows for the possibility of spontaneous violation of the CP symmetry in the scalar sector [12], see e.g. ref. [13] for an overview over its phenomenology. In general, additional Higgs doublets are tightly constrained as they may introduce Flavour Changing Neutral Currents (FCNCs) at tree-level, and Electric Dipole Moments (EDM) for SM particles, see e.g. ref. [14]. Scalar particles as in the THDM can be discovered and studied at particle colliders, such as the Large Hadron Collider (LHC) [15,16]. Once another scalar boson is discovered, its CP properties will be studied, similarly to the Higgs boson, via correlations of the final state leptons from its decays, for instance from sequential gauge boson decays [17], polarisation of tau lepton pairs [18], or top quark associated production [19]. Recently the state-of-the-art experimental constraints on the type II THDM were combined and it was shown that observable CP-violating effects in the neutron EDM and also in tth production at the LHC were still possible [20].
In this paper we go beyond existing studies by investigating in detail the possibility to establish the presence of CP violation in the THDM type I from mixing of heavy neutral scalar particles with different CP transformation properties. To this end we define the model in section 2, discuss present experimental constraints and the allowed parameter space in section 3 and perform a collider analysis of the angular distribution of final states in the decay H → ττ and how it can be used to infer the CP property of extra Higgs states in section 4. We summarise our results and conclude in section 5. In the Appendices A and B we discuss the potential of the alternative decay channel H → ZZ → 4µ.

The Complex Two-Higgs Doublet Model
The THDM was introduced in ref. [12] to discuss the phenomenon of CP violation in the scalar sector, an effect that can potentially be large. All incarnations of the THDM tend to create treelevel flavor changing neutral currents (FCNCs) that arise from the Yukawa potential. In the THDM the FCNCs can be naturally suppressed when a Z 2 symmetry is imposed on the Lagrangian [21], as discussed below.

The scalar potential
In the THDM the scalar sector of the SM is extended by an additional field such that the theory contains two SU (2) L -doublet fields, φ 1 and φ 2 , with identical quantum numbers under the SM gauge symmetry group: Here we introduced the real neutral fields h i , i = 1, ..., 4, the charged (complex) fields η + i , i = 1, 2, and the vacuum expectation values (vevs) v i , i = 1, 2. In its most general form the THDM allows for global transformations which mix these fields and change the relative phases. The Lagrangian density for this model can be decomposed as where L SM,kin denotes the kinetic terms for SM gauge fields and fermions, L φ,kin denotes the kinetic terms for the two scalar fields φ i , i = 1, 2, V φ denotes the scalar potential, and Y φ the Yukawa terms which gives rise to the couplings between the SM fermions and the scalar fields. The most general potential for THDMs can be written as To avoid FCNCs interactions, THDMs are often defined with a global Z 2 symmetry [21], which transforms the scalar fields as In V φ , this symmetry enforces λ 6 = λ 7 = m 2 12 = 0. In addition, some of the fermion representations also transform under the symmetry to ensure that only one of the Higgs doublets is involved in each Yukawa matrix. With exact Z 2 symmetry, there is no CP violation in the scalar sector, because the only complex parameter in V φ would be λ 5 , and its effect could be absorbed into global redefinitions of the fields.
To allow for CP violation in the scalar sector of the THDMs, we will consider a softly broken Z 2 symmetry, where in addition to λ 5 also the (complex) parameter m 2 12 is present and non-zero. The scalar potential is then given by We parametrize the two a priory complex-valued parameters as m 2 12 = |m 2 12 |e iη(m 2 12 ) , λ 5 = |λ 5 |e iη(λ5) , introducing the two phases η(m 2 12 ) and η(λ 5 ). When minimizing the Higgs potential after electroweak symmetry breaking, the tadpole equations require with v 1 and v 2 denoting the two (by convention real) vacuum expectation values (vevs) of the two scalar fields φ 1 and φ 2 . The two vevs satisfy v = v 2 1 + v 2 2 , with v denoting the SM vev v ≈ 246 GeV, and we define tan β := v 2 /v 1 . Solving the first two equations one can eliminate m 2 11 and m 2 22 while from the third equation we get the condition (m 2 12 ) = 1 2 v 1 v 2 (λ 5 ). As long as η(λ 5 ) = 2η(m 2 12 ), this implies that the two complex phases cannot be removed simultaneously by global redefinitions of the fields, and CP is violated [22]. One still has the freedom to remove one of the phases, and in this paper we will use this freedom to make m 2 12 real and positive and thus by convention we have such that the phase η(λ 5 ) of λ 5 is the only source of CP violation in V φ .

The mass matrix
The tree-level mass matrix for the neutral scalars is given by: with h i (i = 1, 2, 3, 4) being the neutral components of the two Higgs doublets including the Goldstone boson to be absorbed by the Z boson after electroweak symmetry breaking. The mass matrix for the four neutral states in the Higgs basis h 1 , h 2 , h 3 , h 4 is with the diagonal elements and the off-diagonal elements Diagonalizing the mass matrix in eq. (9) leads to three massive neutral scalar bosons H 1 , H 2 and H 3 , and one massless neutral field H 0 . In this article we will evaluate the mass matrix numerically. An analytical dependence of the mass eigenstates' physical properties on the model parameters can be extracted under certain simplifying assumptions, see e.g. refs. [23,24]. In general, the mass eigenstates do not conserve the CP symmetry. One can see that with the only source of CP violation coming from (λ 5 ), for (λ 5 ) → 0 one retains the CP conserving THDM (with vanishing off-diagonal entries in the mass matrix, O 2,3,4,5 → 0). The squared neutral Higgs mass matrix can be diagonalized by a 4 × 4 matrix R as The neutral Higgs mass eigenstates H i (i = 0, 1, 2, 3) are related to the interaction fields h i via the rotation In the following we identify H 0 with the Goldstone boson that is absorbed by the Z boson and H 1 with the SM-Higgs-like scalar resonance at ∼ 125 GeV. This leaves the neutral bosons H 2 and H 3 as new scalar mass eigenstates yet to be observed. We will assume that the extra Higgs states are heavier than H 1 and, without loss of generality, require the mass ordering M H1 ≤ M H2 ≤ M H3 . The evaluation of the mass matrix and the rotation matrix R is carried out numerically using SPheno [25,26].

The Yukawa sector
The absence of FCNCs at tree-level is ensured when a basis exists in which the contributions to the mass matrices for each fermion of a given representation stem from a single source [21,27].
In the Standard Model with left-handed doublets and right-handed singlets, this implies that all right-handed quarks of a given charge must couple to a single Higgs multiplet, which can be ensured via a discrete Z 2 symmetry. This symmetry transforms the scalar fields as in eq. (4), and allows for different possible Z 2 charge assignments for the SM fermions. Here, we select the Z 2 charge assignment of the type I version of the THDM, where all quarks and charged leptons couple only to one of the scalar doublet fields, conventionally chosen to be φ 2 . 2 The Z 2 -symmetric Yukawa terms of type I THDM are given by with the Yukawa coupling matrices Y u , Y d , Y e .

CP violation
The scalar potential of eq. (5) in general mixes the interaction states with definite CP transformation properties. This is clearly visible in the mass matrix of eq. (9), which mixes the CP-even h 1 , h 2 with the CP-odd h 3 , h 4 when at least one of the off-diagonal entries O i , i = 2, 3, 4, 5 is non-zero, i.e. when λ 5 has a non-zero imaginary part. The proposed methods for testing CP violation in the Higgs sector include: • If an extra Higgs state H i is discovered, its top quark associated production cross section could be used to determine its CP property [19,28,29], because it is sensitive to the relative magnitudes of the CP-even and CP-odd coefficients of thettH i coupling. However, this effect is suppressed by the smaller cross section of a three particle final state.
• The angular momentum correlations of the final state muons in H i → ZZ → 4µ have been proposed as a method to determine the CP transformation property of H i [2,17,[30][31][32][33][34][35]. We will discuss the applicability of this method in the context of the THDM of type I in Appendix A. We find that at the HL-LHC the loop-induced decay rate of the CP-odd pseudoscalar (or of the CP-odd component of a mixed state) via ZZ into 4µ is too suppressed for successful application of the method.
• When contributions from loop-level decays of the H i can be neglected, an obvious sign for CP mixing in the THDM is the simultaneous observation of three different Higgs states with interactions that, in the CP conserving case, are only possible at tree-level for pure CP eigenstates [36]. One example is the scalar decay chain H i → ZZ → 4µ mentioned above.
Since H i → ZZ at tree-level is only possible when the H i has a CP-even component, in the CP conserving THDM only H 1 and either H 2 or H 3 can have this tree-level decay. Observing it for all three H i one can conclude that the THDM violates CP. We discuss this example in Appendix B. However, it is important to note that the observation of several scalar resonances with decays into ZZ is not an unambiguous signal of CP mixing in general, since the third resonance could stem from additional scalar fields outside the THDM.
• The CP transformation property of the H i can be inferred from its decays to tau lepton pairs. To be specific, the correlation of the tau lepton polarisation planes are directly linked to the CP properties of the parent, and they can be reconstructed via the hadronic decay modes of the two tau leptons [18,[37][38][39]. In the following we will focus on this method in the main part of the paper.

Discovering CP violation via H → ττ
We use the impact parameter method as first presented in ref. [18] to extract transverse spin correlations in the decay chains of a field S which is a mixture of a scalar and a pseudoscalar field. In particular we focus on decay chains of the form S → ττ with τ ± → π ±ν τ (ν τ ) and make use of the impact parameters of the visible decay products of the tau lepton, τ had , to extract an asymmetry in the acoplanarity angle of the two tau leptons. We remark that the method does not depend on the Higgs boson production mechanism, but translates directly into correlations among their decay products.
The Yukawa interaction of S can be written as with y Sτ being the effective Yukawa coupling of S and the tau lepton and C a , C V being the scalar and pseudoscalar components of the coupling, respectively, with C 2 v + C 2 a = 1. The effective mixing angle θ τ τ , defined as measures the mixing of CP eigenstates. For example, θ τ τ = 0 ( π 2 ) holds for pure scalar (pseudoscalar) coupling.
The ττ spin correlation can be inferred from the angle between the tau decay planes. We remark that we consider here only the tau decay mode τ ± → π ±ν τ (ν τ ), which has a branching fraction of 11%. While this limits our statistics it provides a clear signal and can thus serve as a conservative estimate for the sensitivity to distinguish CP properties.
For the tau decay τ ± → π ±ν τ (ν τ ) the angular correlation in the decay width can be written as [18]: The angle φ between the decay planes is the so-called acoplanarity angle, which is sensitive to the CP properties of the scalar parent S via the coupling parameters C v and C a . The angular correlation in eq. 16 is given for the case in which one cannot distinguish φ from 2π − φ and is obtained by the sum over both cases [18]. The acoplanarity angle (φ) can be reconstructed from the tau decay properties, namely the two impact parameter vectors where we introduced The impact parameter is defined as the shortest path between the primary vertex and the pion momentum vector extended in the direction of the tau decay point. Since it is basically impossible to reconstruct the tau lepton momentum due to the presence of tau neutrinos among the decay products, the authors in ref. [18] introduce the so-called "Zero-Momentum-Frame" (ZMF) of the tau decay products, in our case the pions. This does not affect the correlations of the decay planes, such that the exact tau direction does not matter. We find the ZMF by boosting the meson momenta such, that P * π + = − P * π − , where quantities with an asterisk ( * ) refer to the ZMF. Then a 4-vector is defined for the normalized impact parameter for each tau lepton in the ZMF as n * ± = (0, n * ± ), from which one can extract the acoplanarity angle in the boosted frame: The resulting distribution for φ * between 0 and π allows for a clear distinction of fields that are even or odd eigenstates of CP. Below, in sec. 4, we will analyse how well the CP property of an extra Higgs state can be distinguished using this method. Finally, we note that the distribution in eq. (16) remains invariant when switching from ττ in the laboratory frame to π + π − in the ZMF (cf. also ref. [18]): 3 Constraints The THDM with CP violation is constrained from various observations and measurements at collider and non-collider experiments. Below, we discuss constraints from theoretical considerations, from B-physics measurements, Higgs data (from the LHC, LEP, and the Tevatron) and from measurements of EDMs.

Theory considerations
As first condition from theory we impose that it has to be perturbative, which constrains the magnitude of the couplings |λ i | 4π. The second theory condition that each model has to satisfy is the stability of the vacuum. Therefore, the potential should be positive for large values of φ, which leads to the constraints [23,40]: The third condition is that the S-matrix has to be unitary for an elastic two-to-two boson scattering process, which limits the magnitude of λ i , cf. refs. [23,40]. The fourth condition stems from the so-called oblique parameters, which are constrained as (cf. the global fit of [41,42]): These parameters receive contributions from the THDM at the loop-level, and present an independent important constraint.

B physics data
The charged Higgs bosons from the THDM contribute to the decays of B mesons, such that the B-physics data set can be used to constrain the THDM parameters. Since the couplings of the charged Higgs bosons are not sensitive to the parameters of the neutral sector, these constraints are independent of the amount of CP violation in the model. To evaluate the flavor phenomenology in particular for the B physics processes we use the numerical tool FlavorKit [43], which evaluates many flavor-related observables for every scanned point. The most stringent constraints on our model parameters stem from the process B → X s γ, which limits in particular the charged Higgs mass: m ± H ≥ 580 GeV at tan β = 1 for the THDM of type II. For the type I THDM, the strongest constraints on the charged Higgs mass apply for tan β ≤ 2, while with increasing tan β the constraints get weaker, see refs. [44,45]. We use the experimental bounds as reported in [46]:

Higgs data
The global data set on the Higgs boson includes results from LEP, the Tevatron and the LHC experiments. The existing data is combined with the numerical tool HiggsBounds [47], which we employ to constrain the THDM parameter space. HiggsBounds first identifies the most sensitive signal channel for each boson H i separately and then computes the ratio of this theoretically predicted to the observed signal strength for heavy Higgs bosons as which we use to obtain an exclusion limit at 95% C.L for parameter space points where at least one observable exists, such that K i > 1.
In addition to the exclusion of individual parameter points, we employ the numerical tool Hig-gsSignals [48] to evaluate the statistical compatibility of the lightest SM-like Higgs boson in the model with the observed scalar resonance, as it is observed by the LHC experiments. Also, the SM-like Higgs signal rates and masses are compared with the various signal rate measurements published by the experimental collaborations for a fixed Higgs mass hypothesis. The model is tested at the mass position of the observed Higgs peak in the channels with high mass resolutions like h → ZZ * → 4 and h → γγ. The signal strength modifier for the model for one channel is calculated as with ω being the SM weight, including the experimental efficiency. A χ 2 test for the model hypothesis is performed, where a local excess in the observed data at a specified mass is matched by the model. The signal strength modifiers and the corresponding predicted Higgs masses enters the total χ 2 evaluation as where χ 2 µ is the χ-squared measure calculated from the signal strength modifier only and χ 2 m H i is the χ-squared measure calculated from Higgs bosons mass, with i running over the number of the neutral Higgs bosons in the model. The intrinsic experimental statistical and systematic uncertainties within 1σ for χ 2 µ is given by where C ij is the signal strength covariance matrix that contains the uncorrelated intrinsic experimental statistical and systematic uncertainties in its diagonal entries. The 1σ and 2σ error can be obtained from the best-fit value as 1(2)σ = ∆χ 2 best + 2.3(5.9) with ∆χ 2 best = 1.049. CMS reports the combined best fit value for the SM Higgs signal strength at center of mass energy = 13 TeV and integrated luminosity = 35.9f b −1 to be µ best = 1.

Electric Dipole Moments
The upper limit on the electric dipole moment (EDM) of the electron is |d e | < 1.1 × 10 −29 ecm [51]. The new scalars contribute to the electron EDM via Barr-Zee diagrams as discussed, e.g., in refs. [52]  and [53]. In particular, the CP violating complex phase is found to strongly affect the magnitude of the EDM, its main source being the modified couplings of the Higgs bosons. In the type II THDM the EDM is enhanced by tan β, while in type I the EDM is suppressed by 1 tan β . As stated above, the Yukawa couplings can be expressed as a sum of their CP-even and CP-odd part. In general, if a fermion couples to φ 1 (φ 2 ) both parts of the coupling are proportional to tan β ( 1 tan β ). Thus, in the type I THDM all Yukawa couplings are proportional to 1 tan β , while in the type II THDM the Yukawa couplings of down-type quarks and leptons are proportional to tan β.
In this article we consider large tan β, which leads to potentially large couplings and large contributions to the EDM. Therefore, for large tan β the type I THDM with couplings proportional to 1 tan β is less constrained, which is one reason for us to focus on this version of THDM. To analyse the EDM constraint, we employ the formulae from refs. [52,53].

Scanning the parameter space
In order to find viable parameter space points that satisfy all constraints we perform a scan over the parameter space. In this scan the full parametric dependence of the physical properties of the scalar particles, like their masses and interaction vertices, are calculated, and the above described constraints are evaluated. For the numerical scan we consider the following ranges of parameters: We obtain 5k parameter space points that satisfy all experimental constraints. We remark that the above parameter ranges are optimised to yield a good efficiency with respect to passing the list of constraints. As we mentioned above we use SPheno to evaluate the mixing matrix numerically.
In fig. 1 we show the contribution to the EDM for our parameter space points as a function of tan β and η(λ 5 ) for the type I THDM (left panel). We also show the results for the type II THDM for comparison in the right panel of the same figure. One can see that for the THDM of type II low tan β with large η has the smallest EDM values. With the used scan resolution no points below the EDM bound are found. This can be compared with the analysis in ref. [20], wherein a region with small values for tan β was identified that is not excluded by the EDM and the Higgs constraints considering the type II THDM. For the type I version of the THDM allowed parameter space points can be found for all considered values of η(λ 5 ). For the parameter space points satisfying all of the above constraints we show the projection of the three neutral scalar masses versus tan β in fig. 2. From this figure we can see that for the viable points we found in our scan, both, H 2 and H 3 , have masses between about 200 and 700 GeV, while H 2 has more parameter space points with masses around 200 to 300 GeV, and H 3 tends to be slightly heavier.

Analysis
In this section we discuss the production mechanism for the scalar bosons of the type I THDM and the currently allowed cross sections. We investigate the process pp → H i → ττ that we use to analyse the CP properties of extra Higgs states and evaluate the prospects of finding it in the presence of the considered background processes. Then we perform an analysis of the angular distribution of the final state taus.

Heavy scalar production rates
We consider the LHC in its high-luminosity phase (the HL-LHC) with an expected total integrated luminosity of 3 ab −1 and a center-of-mass energy of 14 TeV. The dominant production processes for the Higgs bosons at the HL-LHC are gluon-gluon fusion (around 90%) and vector boson fusion. We calculate the effective gluon-gluon-Higgs coupling using SPheno and include the QCD corrections from ref. [54]. The production cross sections are calculated including the effective gluon-Higgs vertex in MadGraph [55]. Since the signal for CP violation is encoded in angular correlations of the heavy scalars' decay products, it can only be assessed statistically. Therefore we are interested in how many signal events can be expected, requiring that the parameter point is allowed by the above discussed constraints. For this assessment, we use our parameter space set from the previous section, selecting for parameter points that conform with all constraints. We show the total cross section for the process pp → H i → ττ in fig. 3, wherein the blue and red points denote the cross sections for the scalar bosons H 2 and H 3 , respectively.
We notice that parameter space points exist with production cross sections larger than a few femtobarn, which would yield more than a few thousand events at the HL-LHC. While this is in principle sufficient for a statistical study of the CP violation signal, it may be difficult in practice due to large backgrounds and reconstruction uncertainties. In the next subsection, we will evaluate a specific benchmark point. Backgrounds Table 1: Dominant background processes considered in our analysis and their total cross sections. The samples have been produced with the following cuts: P T (j) ≥ 20 GeV, P T (l) ≥ 10 GeV. The efficiency of the QCD jets to be mistagged as tau jet is taken from the CMS paper [57] and we use the fake rate = 5 × 10 −3 from ref. [57].
The main irreducible SM backgrounds to this process come from Z → ττ [56] and from single top andtt, with tau jet pair produced from the W decay. Other backgrounds arise from the misidentification of light jets as tau jets, for instance W boson plus jet or multijets. The here considered backgrounds are listed, together with their cross sections, in tab. 1 We simulate signal samples including 20 million events and background samples including 30 million events for each background with MadGraph5 [55]. The parton shower, hadronisation and spin correlation of the tau lepton decay is taken care of by Pythia8 [58]. We perform a fast detector simulation with Delphes [59]. The tau jets are tagged using the Delphes analysis framework with reconstruction efficiency of 70% and misidentification rate of 5 × 10 −3 for the QCD jet, which we implement at the analysis level. For the background we adopt a reconstruction efficiency of 60% (following ref. [60]). For the event reconstruction we require two tau tagged jets with P T > 20 GeV where events with b-tagged jets are rejected.
We find that interference between the H i bosons has a very small effect for the here chosen benchmark point, namely it increases the total cross section by about 5%. In particular, the interference between H 2 and H 3 is suppressed by the small H 3 total cross section, which is about 1.5 · 10 −5 pb, compared to the total cross section of the H 2 , which is 0.3 pb. Therefore, in the next section, we will study an exclusive sample from the process pp → H 2 → ττ . We focus here on the H 2 boson, which in general is more strongly coupled to the SM fermions and thus yields a stronger signal, i.e. more events. To separate the signal from the backgrounds, we train a Boosted Decision Tree (BDT), 3 which we feed with the simulated distributions from the process pp → H 2 → ττ , neglecting the small contributions from H 1 and H 3 . As variables we include the invariant mass of the two reconstructed taus, the missing transverse energy and ∆R(τ had , τ had ).

Shape analysis for establishing CP violation
The BDT algorithm ranks the input variables according to their ability to separate between signal events and background events. The BDT classifier ranges from −1 to 1 and quantifies the separability of signal and background. Events with discriminant value near 1 are classified as signal-like events and those near −1 are considered as background-like events. The BDT response to signal and background events is shown in the left panel of fig. 4 in blue and red, respectively. The optimization of the signal significance as a function of signal and background cut efficiency is shown in the right panel of fig. 4. The maximum cut efficiency is at BDT classifier ≥ 0.193, corresponding to a signal significance 7σ with signal efficiency 0.57 and background rejection efficiency 0.0059. For the benchmark point with θ τ τ = 0.68, the BDT yields 2043 signal events versus 82212 background events.
Additionally, we simulate distributions for the same benchmark point but with different CPmixing angles θ τ τ = 0, π/8, 3/8π, π/2. We remark that we are using the simulations with different θ τ τ values only for comparison, and we do not check that all experimental constraints are satisfied for suitable corresponding parameter points.
As signal we consider the decay H 2 → ττ with subsequent decay of τ ± → ν τ π ± . As described above we study the tracks inside the tau jets, which carry information about the spin correlation between the tau lepton and π ± , and thus allow us to reconstruct the angle between the decay planes of the two τ leptons, the acoplanarity angle φ * as defined above. A P T cut on the tagged jets is applied, forcing the transverse momentum to be larger than 20 GeV. Furthermore, we improve the quality of the events with a cut on the track impact parameter: d 0 ≥ 50 µm. (This cut is taken into account during the analysis and the reported numbers after BDT cut assume this cut.) The fourvectors of the pion candidates' track are boosted to the ZMF as described in sec. 2.5 above. In the ZMF the new acoplanarity angle φ * is evaluated according to eq. (19). Now we turn to analysing the shape of the distribution, aiming to infer the CP-mixing angle θ τ τ from the simulated data. First we observe that the simulated distributions after all cuts have a very similar shape to the theory prediction for Γ(Φ) from eq. 16. We thus define the reconstructed distribution in the ZMF frame for our numerical fit to the data, introducing the fit parameters a, b, as: We find excellent agreement between our fitted values for a θττ , b θττ and the theoretical values in eq. 16, which are a θττ = 1/(2π) and b θττ = π/32 cos 2 θ τ τ − sin 2 θ τ τ . We therefore directly compare the reconstructed distributions with the theory predictions from eq. 20.
For our shape analysis we consider the distributions for the samples of 2043 events labelled "2K", corresponding to the expected event yield of the benchmark point at the HL-LHC, and the "infinite statistics" limit labelled "2M", corresponding to 2 million events. The latter have a much smaller statistical uncertainty compared to the systematic one, which stems from uncertainties related to hadronisation, detector simulation, and the reconstruction of tau leptons. We show the distributions for both, the small and large versions of the five signal samples, in fig. 5. In the figure we also show the theory prediction for 1/ΓdΓ/dφ * in eq. (20).
The distributions are given for N bins = 20 bins from which we create a χ 2 fit for different values of θ τ τ using  where θ τ τ is the mixing angle of a given benchmark point, θ fit an input of the theoretical distribution, S θττ i the signal distribution in bin i, n S = 2043 is the total number of signal events, and The number of background events after the BDT cut is N bkg = 82212 and α is the precision with which the background can be controlled experimentally. The background is completely flat with respect to the signal, which is an outcome of our simulation. In the following we consider the three exemplary values α = 5%, 1%, and 0.5%, which we assume to be conservative, realistic, and optimistic, respectively. For both, the distributions from the small and large samples, the above χ 2 fit yields a minimum for θ fit that agrees with the set value θ τ τ with high accuracy. We chose the confidence level (CL) for excluding pure CP-even or CP-odd hypotheses from the ∆χ 2 distributions at 90%. For our 20 observables (the bins) minus the one parameter (θ fit ) this corresponds to ∆χ 2 = 27.2. We find that for α = 5% and 1% no statistically meaningful statement on CP violation is possible at the 90% CL for our benchmark point at the HL-LHC. We show the resulting χ 2 distributions for the five considered CP mixing angles in fig. 6 for α = 0.5%. For our benchmark point where the set value is θ τ τ = π 4.6 , and considering the HL-LHC sample with 2043 events, our procedure allows to determine θ τ τ π 4.6 ± 0.3 at 90% CL. CP-conservation can therefore be excluded at 90% CL for this point.

Conclusions
The violation of CP symmetry is fundamental to the baryon asymmetry of the Universe. One of the few ways to introduce it is a CP-violating scalar sector, which implies the existence of additional scalar degrees of freedom (i.e. extra Higgs states) with possible observable consequences at the LHC and future colliders. Some of the signatures that indicate the violation of CP in the scalar sector include: simultaneous observation of specific processes, top-quark associated production modes and angular momentum correlations in sequential decays such as H i → ZZ → 4µ and H i → ττ .
In this article we consider the type I and II Two Higgs Doublet Models (THDMs) as examples for observable CP violation in the scalar sector. We evaluated the mass eigenbasis numerically, i.e. without assumptions on any of the parameters. We determined a viable parameter space region via a numerical scan over the parameters that are compatible with the present constraints, including theoretical considerations, B-physics measurements, Higgs data, and measurements of electric dipole moments. Our scan shows that the constraints allow for scalar bosons with masses of order a few hundreds of GeV, which can be within reach of the HL-LHC. Moreover, we find that the possible amount of CP violation is much more suppressed in the type II THDM.
In case of CP violation the decay chain H i → ZZ → 4µ can give rise to three clearly distinct Higgs peaks. This can provide a clear signal for CP violation in the considered THDM (cf. Appendix B), where exactly two of the Higgs fields can decay to ZZ at the tree-level in case of CP conservation. However, this signature is not unambiguous, since the third resonance could stem from additional scalar fields outside the THDM. Using the angular distributions in this decay chain turned out not to be feasible due to the coupling of the CP-odd component to ZZ, occurring only at loop-level, being too strongly suppressed (cf. Appendix A).
Towards finding an unambiguous signal of CP violation in the scalar sector we have analysed the process pp → H 2 → ττ in the type I THDM at the detector level for a selected benchmark point, using a Boosted Decision Tree (BDT). We included the following SM backgrounds: Z → ττ , single top and tt, and light jet misidentification. For our analysis the decays τ → ντ π were implemented. The detectability of CP non-conservation was quantified via a χ 2 fit of the theoretically predicted distributions of the reconstructed tau-decay planes to the simulated data. We find that CP conservation in the scalar sector can be excluded at the 90% CL for our selected benchmark point, i.e. when the CP-mixing angle is close to its maximal value (π/4) and the background can be controlled with a relative accuracy of 0.5%, which could be the accuracy target for future measurements. Our results are conservative, since also other τ -decays (such as τ → ν τ ρ) can be used to study CP violation.

A Angular correlations in H i → ZZ → 4µ
In this appendix we investigate the process H i → ZZ → 4µ, and the possibility to infer the CP property of H i from angular correlations in the final state muons. This possibility has been discussed previously, cf. refs. [2,17,[30][31][32][33][34][35]. Searches for such processes have been carried out by the ATLAS [15] and CMS [16] collaborations.
The starting point is the observation that CP-odd fields couple to ZZ only at loop-level, dominantly via a loop involving a top quark, cf. fig. 7. The pseudoscalar coupling to the top quark gives rise to specific correlations in the four-fermion final states. In order to determine whether or not these final state correlations can be observed, we investigate the branching ratios of CP-even (H) and CP-odd scalars (A 0 ) into ZZ.
Let us evaluate the size of the effective couplings for H and A 0 from the contributions in fig. 7. The matrix element for Higgs decays to ZZ is given by where * 1 , * 2 and P 1 , P 2 are the polarization vectors and the momenta for the outgoing gauge bosons, and E µναβ is the totally antisymmetric tensor. The form factors C 2 and C 3 measure the strength of the coupling of the CP-even and CP-odd states to ZZ that arises at one-loop level, while C 1 is the coupling of the CP-even field to ZZ from the tree-level diagram. It is the contraction of the momenta via the antisymmetric tensor E µναβ in the last term of eq. (34) that gives rise to the different correlations in the four-muon final states of the process A 0 → ZZ → 4µ. We evaluated the coefficients C i using FeynCalc [62] and Package-X [63]. The tree-level and the one-loop couplings are given by: where α is the mixing angle between the CP-even Higgs bosons, θ W is the weak mixing angle and with m φ and m a denoting the masses of the decaying Higgs bosons (H or A 0 ) and of the loop particles, respectively. It is in principle possible to test the CP transformation property of the extra Higgs state via an asymmetry in the angular distributions of the four fermion final states [31]. For CP-odd fields, this requires the measurement of the final state correlations in the final states from the process A 0 → ZZ → 4µ. To infer the CP transformation property sucessfully, a large sample of 4µ from this decay chain is needed, which in turn requires a substantial branching fraction of the process. The dominant pseudoscalar decay modes are e.g. A 0 →tt ∝ (y t cos β) 2 and A 0 →bb ∝ (y b sin β) 2 and also A 0 → H ± W ∓ ∝ (g 2 sin 2β(P A − P W )) 2 , A 0 → HZ ∝ (sin(α − β) g 2 1 + g 2 2 (P A − P Z )) 2 .
Since the dominant decay channels are unsuppressed tree-level decays, it turns out that the branching ratio for A 0 → ZZ in THDMs is quite small, maximally about 10 −3 (cf. [64]), and the branching ratio to 4µ leads to a further suppression by Br(ZZ → 4µ) 10 −3 We find that the production cross section for A 0 is at most 0.1 pb, which yields a total cross section for the process pp → A 0 → ZZ → 4µ of ∼ 10 −7 pb, and suppressing backgrounds by introducing cuts will reduce the resulting number of events that can be used for an analysis even further. With the total luminosity at the HL-LHC being 3 (ab) −1 it is clear that the loop-suppressed decay A 0 → ZZ → 4µ is too much suppressed to use it for studying the angular correlations of the four muons. Of course the same conclusion also applies to the CP-odd component of an H i that is an admixture of a CP-even and a CP-odd field.

B The Higgs spectrum from
The process pp → H i → ZZ → 4µ is a very clear channel that may contribute substantially to the discovery of the scalar H i . 4 As we discussed in the previous section, it is not feasible to use the angular distributions of the final state muons from this process at the HL-LHC for establishing the existence of CP violation in the scalar sector. However, in the context of THDMs, it can still be used to establish a signal of CP violation via the reconstructed Higgs spectrum from the invariant mass distribution of the Higgs decay products.
When the scalars are not pure eigenstates of CP, all of the H i can have sizeable branching ratios into ZZ, giving rise to three resonances in the 4µ final state, as shown in fig. 8. Measuring three peaks for the invariant masses of the four muon final states is thus a clear signal for CP violation within the complex THDM. To evaluate the observability of this process, we consider a benchmark point with m H2 = 260 GeV and m H3 = 500 GeV, based on the model parameters tan β = 4, λ 1 = 0.172, λ 2 = 0.0828, λ 3 = 5.149, λ 4 = −0.313, (λ 5 ) = −4.6431, η(λ 5 ) = 0.81 and m 2 12 = 1.091 × 10 4 GeV 2 . The parameters m 2 11 and m 2 22 are then fixed by the previous parameters due to the tadpole equations, cf. eq. (6). In fig. 9 we show the total cross section for the process pp → H i → ZZ → 4µ for number of scanned points from our scan in sec. 3.5.  Table 2: Dominant background processes considered in our analysis and their total cross sections. The samples have been produced with the following cuts: P T (j) ≥ 20 GeV, P T (l) ≥ 10 GeV.
We consider the following backgrounds: The dominant SM background that contributes to the final state with 4µ is ZZ production. Other reducible backgrounds are W W and W Z production, where one of the jets is misidentified as a muon. This set of background processes can be sufficiently reduced by the requirement of tight isolation criteria for the hard final state muons. The set of backgrounds with tt production and the associated production of top quark with a W boson can be reduced by vetoing b-jets. The last set of backgrounds with three gauge boson production is highly suppressed by the large missing energy associated to these processes and will not be included in the analysis. All the considered and included backgrounds are listed with their cross sections in tab. 2.
We constructed all possible kinematic variables for the signal and all relevant backgrounds and used the BDT to optimize the signal to background classifier as shown in fig. 10 (left). According to the BDT ranking, the invariant mass of the four final state muons is the most important variable to separate the signal from the backgrounds. The fact that all three neutral bosons can decay into a pair of Z bosons proofs that our benchmark point has Higgs states with mixed CP properties, since otherwise one of the three bosons would be a pure pseudoscalar which does not interact with ZZ at tree-level.
The signal significance as a function of signal and background cut efficiency is shown in the right panel of fig. 10. The maximum cut efficiency is at BDT ≥ 0.193, corresponding to a signal significance 11σ with signal efficiency 0.187 and background rejection efficiency 0.0004, which demonstrates an excellent discovery potential for our benchmark point in this channel alone.
We emphasize that the observation of three scalar resonances in the 4µ final state is a positive signal for CP violation only in the THDM, because there it is absent when CP is conserved. It is not an unambiguous signal of CP violation outside the THDM, since the third resonance could stem from some other CP-even scalar field.