Scalar mass dependence of angular variables in tt̄φ production

In this paper we explore CP discrimination in the associated production of top-quark pairs (tt̄) with a generic scalar boson (φ) at the LHC. We probe the CP-sensitivity of several observables for a varying scalar boson mass and CP-number, either CP-even (φ = H) or CP-odd (φ = A), using dileptonic final states of the tt̄φ system, with φ → bb̄. We show that CP-searches are virtually impossible in this channel for φ boson masses above a few hundred GeV. A full phenomenological analysis was performed, using Standard Model background and signal events generated with MadGraph5_aMC@NLO and reconstructed using a kinematic fit. The most sensitive CP-observables are used to compute Confidence Levels (CLs), as a function of luminosity, for the exclusion of different signal hypotheses with scalar and pseudoscalar boson masses ranging from m_φ = 40 GeV up to 200 GeV. We conclude by analysing the impact of a measurement (or limit) of the CP-violating angle on the parameter space of a complex two-Higgs doublet model known as the C2HDM.


Introduction
The CP-nature of the discovered Higgs boson is still an open issue. Although the ATLAS and CMS collaborations established that the discovered 125 GeV Higgs boson [1,2] cannot be a pure pseudoscalar at more than 99% confidence level (CL), a mixed state with a significant CP-odd component is still possible. The need for further sources of CP-violation was first discussed by Sakharov, as one of the three conditions for baryogenesis to occur [3]. This is an important motivation to look for physics Beyond the Standard Model (BSM) and, in particular, for models with extra sources of CP-violation. One of the simplest extensions of the Standard Model (SM) with a CP-violating scalar sector is obtained by adding an extra scalar doublet to the SM field content while keeping the same gauge symmetries. The CP-conserving version of the model is commonly referred to as the two-Higgs doublet model (2HDM), while the simplest CP-violating version is known as the complex 2HDM (C2HDM) and has been the subject of many studies [4][5][6][7][8][9][10][11][12][13][14][15][16][17]. Due to its simplicity, the C2HDM is seen as an ideal benchmark model to test the CP quantum numbers of scalars at the LHC. In the C2HDM, all three neutral scalars may have a mixture of CP-even and CP-odd components, and there is no restriction on the masses of these scalars. Although one of the scalars has to be the discovered 125 GeV Higgs boson, it can be any of the three neutral scalars predicted by the theory, from the lightest to the heaviest.
The search for BSM physics is a major goal of the LHC experiments. The measurement of the Yukawa couplings has become a primary target, since it is decisive for establishing the CP-nature of the scalars in case a new scalar is discovered. The relation between the scalar and pseudoscalar components of the Yukawa couplings can be probed directly, either in the production or in the decays of the scalars, depending on the final-state fermions.

JHEP06(2020)155
Examples of the use of asymmetries to probe the CP-nature of the Higgs boson in the top-quark Yukawa coupling were discussed in [18][19][20][21][22][23][24], while the decays of tau leptons were used to probe the τ-lepton Yukawa coupling [25][26][27][28][29]. These studies for the top quark and for the τ-lepton are now being discussed in detail by the ATLAS and CMS collaborations. In the case of the top quark, the Yukawa coupling is probed directly in the production process, while for τ-leptons the Higgs decay is used. It is important to point out that, as recently reported in [31,32], there is still room for very large pseudoscalar components in the couplings to b-quarks and τ-leptons, for any of the scalars in some Yukawa versions of the C2HDM, even when considering the recently announced ACME collaboration constraint on the electron EDM from measurements of the ThO molecule [30],
$$ |d_e| < 1.1 \times 10^{-29}\ e\,\text{cm}. \qquad (1.1) $$
In this work we examine in detail how the asymmetries and the angular distributions previously proposed change when the mass of the scalar differs from the measured Higgs mass. Many studies have examined several angular variables in tt̄φ production with m_φ = 125 GeV, including next-to-leading order (NLO) computations with resummation of soft-emission corrections to next-to-leading-logarithmic (NLL) accuracy [33][34][35][36][37][38]. Nonetheless, a detailed study for a scalar with a mass either below or above this value is still not available in the literature.
We will build on a series of papers where the issue of the CP-nature of the discovered Higgs boson was thoroughly studied in associated production with top quark pairs [20,21,39,40]. We will discuss the same set of angular variables with several goals in mind. The first one is to answer the question: if a new scalar or pseudoscalar boson exists, what is the confidence level to exclude a signal hypothesis (either CP-even or CP-odd) assuming the SM holds, as a function of the LHC luminosity and φ boson mass? The second one focuses on determining the confidence level for exclusion of a pure CP-odd signal in case a new massive scalar boson is found, as a function of the φ boson mass and the LHC luminosity. The third one relates to setting the confidence level for the exclusion of the SM (once again as a function of mass and luminosity) assuming a new CP-even scalar particle signal is found.
The outline of the paper is as follows. In section 2, we describe the φ boson mass dependence of the several angular distributions to be studied. In section 3, we present and discuss our main results. In section 4, we consider the impact of the discovery of a new Higgs boson on the parameter space of the main benchmark model for CP-violation studies, the complex version of the two-Higgs doublet model (C2HDM). Our conclusions are presented in section 5.

Theoretical limitations on asymmetry measurements
The most general Yukawa interaction of a boson (φ), with no definite value of CP, to a top quark pair can be written as
$$ \mathcal{L} = \kappa_t\, y_t\, \bar{t}\,(\cos\alpha + i\gamma_5 \sin\alpha)\, t\,\phi , \qquad (2.1) $$

where y_t is the SM top-quark Yukawa coupling, κ_t parametrises the total coupling strength relative to the SM, and the angle α parametrises the CP-phase, which is related to the parameters of the Higgs potential. We refer to φ = H for the pure CP-even scenario and to φ = A for the pure CP-odd case. The pure CP-even case is recovered by setting cos α = ±1, while the pure CP-odd case is obtained with cos α = 0. In previous works [18][19][20][21], several angular variables were proposed, not only to increase the sensitivity in discriminating signals from irreducible backgrounds at the LHC in tt̄φ final states, but also as a means to probe the CP-nature of the Yukawa coupling in tt̄φ production. The results in [20,21] showed that a minimal set of variables can be defined that achieves the best possible sensitivity for both goals. While these studies assumed a mass of 125 GeV for the φ boson, in this paper we extend their use to a wider mass range, from 40 GeV to 500 GeV, as discussed in the following sections.
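To make the role of α in equation (2.1) concrete, the following minimal sketch splits the coupling into its scalar part (κ_t cos α, multiplying t̄tφ) and its pseudoscalar part (κ_t sin α, multiplying t̄iγ5tφ), both in units of y_t. The function names and the numerical tolerance used to label the pure cases are illustrative choices, not part of the original analysis.

```python
import math


def coupling_components(kappa_t, alpha):
    """Scalar and pseudoscalar couplings of eq. (2.1), in units of y_t."""
    scalar = kappa_t * math.cos(alpha)        # coefficient of t̄ t φ
    pseudoscalar = kappa_t * math.sin(alpha)  # coefficient of t̄ iγ5 t φ
    return scalar, pseudoscalar


def cp_label(alpha, tol=1e-9):
    """Pure CP-even if cos(alpha) = ±1, pure CP-odd if cos(alpha) = 0, otherwise mixed."""
    c = math.cos(alpha)
    if abs(abs(c) - 1.0) < tol:
        return "CP-even (phi = H)"
    if abs(c) < tol:
        return "CP-odd (phi = A)"
    return "CP-mixed"


for alpha in (0.0, math.pi / 4, math.pi / 2, math.pi):
    s, p = coupling_components(1.0, alpha)
    print(f"alpha={alpha:.3f}  scalar={s:+.3f}  pseudoscalar={p:+.3f}  {cp_label(alpha)}")
```

With κ_t = 1, the samples used later in this paper correspond to α = 0 (cos α = 1) and α = π/2 (cos α = 0).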

ttH and ttA angular distributions
A first set of variables is introduced [20] using $\theta^{X}_{Y}$, defined as the angle between the direction of the 3-momentum of system Y (in the rest frame of X) and the momentum direction of system X (in the rest frame of its parent system). When reconstructing the signal angular distributions, we consider successive two-body decays of the tt̄φ system down to the final-state particles, i.e., the quarks (or jets) and the charged and neutral leptons originating from the decays of the t and t̄ quarks and of the φ boson. If the decay chain of the tt̄φ system is labelled (123), the successive decays considered include all possible combinations of the type (123) → 1 + (23), (23) → 2 + (3) and (3) → 4 + 5 (see figure 1). We then build three families of observables, $f(\theta^{123}_{1})\,g(\theta^{3}_{4})$, $f(\theta^{123}_{1})\,g(\theta^{23}_{3})$ and $f(\theta^{23}_{3})\,g(\theta^{3}_{4})$, with f, g ∈ {sin, cos}. The momentum direction of the (123) system is measured with respect to the laboratory (LAB) frame, where the net 3-momentum of the colliding protons is zero. Particles 1 to 3 are the t quark, the t̄ quark or the Higgs boson, while particle 4 can be any of the decay products of the top quarks and of the Higgs boson, including the intermediate W bosons. We use two ways of computing the Lorentz vector of particle 4 in the centre-of-mass frame of particle 3. The first uses the laboratory four-momenta of particles 3 and 4 and boosts particle 4 directly to the centre-of-mass frame of particle 3 (direct boost). The second boosts particles 3 and 4 sequentially through all intermediate centre-of-mass systems until particle 4 is evaluated in the centre-of-mass frame of particle 3 (sequential boost, or seq. boost).
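The difference between the direct and the sequential boost can be illustrated with a minimal numpy sketch. It assumes four-vectors ordered as (E, px, py, pz), computes cos θ for the angle of particle 4 in the rest frame of particle 3 relative to the direction of particle 3, and, for the direct boost, takes the LAB direction of particle 3 as the reference axis; that last choice is our assumption, since the text does not specify it. All function names are illustrative.

```python
import numpy as np


def boost_to_rest(p, P):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of four-vector P."""
    p, P = np.asarray(p, float), np.asarray(P, float)
    beta = P[1:] / P[0]
    b2 = beta @ beta
    if b2 == 0.0:
        return p.copy()  # P already at rest: nothing to do
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p[1:]
    energy = gamma * (p[0] - bp)
    vec = p[1:] + ((gamma - 1.0) * bp / b2 - gamma * p[0]) * beta
    return np.concatenate(([energy], vec))


def cos_theta(p_x, p_y):
    """cos of the angle between Y (in the X rest frame) and the X direction.

    Both four-vectors must be given in the same frame (the parent frame of X).
    """
    y_star = boost_to_rest(p_y, p_x)[1:]
    x_dir = np.asarray(p_x, float)[1:]
    return float(x_dir @ y_star / (np.linalg.norm(x_dir) * np.linalg.norm(y_star)))


def cos_theta_direct(p3_lab, p4_lab):
    """Direct boost: particle 4 goes from the LAB straight into the rest frame of 3."""
    return cos_theta(p3_lab, p4_lab)


def cos_theta_sequential(chain_lab, p3_lab, p4_lab):
    """Sequential boost through the parent systems in chain_lab (outermost first),
    e.g. [p_123, p_23] given in the LAB, before the final boost into the rest frame of 3."""
    frames = [np.asarray(f, float) for f in chain_lab]
    p3, p4 = np.asarray(p3_lab, float), np.asarray(p4_lab, float)
    for k in range(len(frames)):
        F = frames[k]
        # move particles 3, 4 and all inner systems into the rest frame of system k
        p3, p4 = boost_to_rest(p3, F), boost_to_rest(p4, F)
        frames[k + 1:] = [boost_to_rest(G, F) for G in frames[k + 1:]]
    return cos_theta(p3, p4)
```

Because consecutive non-collinear boosts do not compose into a single pure boost, the two methods generally give different angles; they coincide when the intermediate systems are at rest in the LAB.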
We also use the variables b_2 and b_4 as defined in [18,40], computed in the LAB and in the tt̄φ centre-of-mass systems (the latter denoted $b_2^{t\bar t\phi}$ and $b_4^{t\bar t\phi}$, respectively), where the z-direction corresponds to the beam line. The variables b_2 and b_4 have a natural physical interpretation. They are built from the t and t̄ 3-momenta,
$$ b_2 = \frac{(\vec p_t \times \hat z)\cdot(\vec p_{\bar t} \times \hat z)}{|\vec p_t|\,|\vec p_{\bar t}|}, \qquad b_4 = \frac{p^z_t\, p^z_{\bar t}}{|\vec p_t|\,|\vec p_{\bar t}|}, $$
so that, in terms of the t and t̄ polar angles θ_t and θ_t̄ with respect to the z-direction, b_4 = cos θ_t cos θ_t̄ and b_2 = sin θ_t sin θ_t̄ cos Δφ_tt̄, where Δφ_tt̄ is the azimuthal separation of the t and t̄. Forward-backward asymmetries associated with each of the observables under study were defined according to [20],
$$ A_{FB}(x_Y) = \frac{\sigma(x_Y > \bar x_Y) - \sigma(x_Y < \bar x_Y)}{\sigma(x_Y > \bar x_Y) + \sigma(x_Y < \bar x_Y)}, $$
where σ(x_Y > x̄_Y) and σ(x_Y < x̄_Y) correspond to the total cross section for x_Y above and below x̄_Y, the central value of the x_Y domain. The reason why these distributions allow us to probe the CP-nature of a scalar in the tt̄φ coupling lies ultimately in the behaviour of the cross section as a function of the CP-phase. In fact, as discussed in [18], the amplitude for the process pp → tt̄φ has two terms: one that does not depend on the mixing angle α, and another that is proportional to cos 2α. Hence, only the latter is sensitive to a CP-odd component of the Yukawa coupling. This term is proportional to the square of the top-quark mass, m_t², and its contribution is therefore important as long as the φ boson mass is of the same order of magnitude. One could ask whether the process pp → bb̄φ could be used to probe the Yukawa structure of the bb̄φ vertex. The answer is clearly negative, because the CP-asymmetric term is now proportional to m_b², which is at least three orders of magnitude smaller. In the left panel of figure 2, we present the b_4 distribution, at parton level, for the process pp → bb̄φ with m_φ = 125 GeV. The pure scalar case is shown in blue and the pure pseudoscalar one in red. As expected, no difference is found between the distributions. We have checked that the distributions of all other angular variables follow the same trend, again with no visible difference.
Finally, we repeated the procedure for a very light scalar, with a mass of m_φ = 10 GeV, with similar null results, as shown in the right panel of the same figure. The case of the bb̄φ final state has been previously discussed in [41,42].
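The size of the suppression quoted above is a one-line estimate. The sketch assumes that the CP-sensitive term scales with the square of the quark mass, and uses two common b-quark mass values as inputs (4.18 GeV in the MS-bar scheme and 4.78 GeV for the pole mass, our choice) together with the m_t = 173 GeV used in the event generation.

```python
def cp_term_suppression(m_q, m_t=173.0):
    """(m_q / m_t)^2: size of a term scaling as m_q^2 relative to the top-quark case."""
    return (m_q / m_t) ** 2


# Two common b-quark mass definitions (assumed input values, in GeV).
for label, m_b in (("MSbar m_b(m_b)", 4.18), ("pole m_b", 4.78)):
    print(f"{label}: (m_b/m_t)^2 = {cp_term_suppression(m_b):.1e}")
```

The two ratios come out at about 5.8 × 10⁻⁴ and 7.6 × 10⁻⁴, both below 10⁻³, consistent with a suppression of at least three orders of magnitude.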
In figure 3, we present the total cross section, at NLO, for a centre-of-mass energy of 13 TeV, at the LHC, for the process pp → ttH (blue) and pp → ttA (red) as a function of the φ boson mass. The fact that the CP-asymmetric term is much larger compared to the bbφ case means that CP-discrimination between the different CP-components of the Higgs is now possible. Figures 4 and 5 show the b 2 and b 4 distributions for ttH and ttA events with different φ boson masses, computed in the LAB and in the centre-of-mass frame of the ttφ system, respectively. They are shown at parton level without any cuts. Next-to-leading order corrections and shower effects (NLO+Shower) are also included. Clear differences are now visible between the scalar and pseudoscalar signals, and also between the distributions computed in the LAB and in the centre-of-mass frames.
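The b_2 and b_4 variables can be computed from the top and anti-top momenta as in the sketch below. It assumes the vector definitions b_2 = (p_t × ẑ)·(p_t̄ × ẑ)/(|p_t||p_t̄|) and b_4 = p_t^z p_t̄^z/(|p_t||p_t̄|) commonly used in the literature [18], with ẑ along the beam and four-vectors ordered as (E, px, py, pz). For the tt̄φ-frame versions, the momenta are first boosted into the tt̄φ rest frame with a pure boost. Function names are hypothetical.

```python
import numpy as np

Z_AXIS = np.array([0.0, 0.0, 1.0])  # beam direction


def boost_to_rest(p, P):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of four-vector P."""
    p, P = np.asarray(p, float), np.asarray(P, float)
    beta = P[1:] / P[0]
    b2 = beta @ beta
    if b2 == 0.0:
        return p.copy()
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p[1:]
    vec = p[1:] + ((gamma - 1.0) * bp / b2 - gamma * p[0]) * beta
    return np.concatenate(([gamma * (p[0] - bp)], vec))


def b2_b4(p_t, p_tbar):
    """b2 and b4 from the t and t̄ four-vectors, in whatever frame they are given."""
    t, tb = np.asarray(p_t, float)[1:], np.asarray(p_tbar, float)[1:]
    norm = np.linalg.norm(t) * np.linalg.norm(tb)
    b2 = np.cross(t, Z_AXIS) @ np.cross(tb, Z_AXIS) / norm
    b4 = (t @ Z_AXIS) * (tb @ Z_AXIS) / norm
    return b2, b4


def b2_b4_ttphi_frame(p_t, p_tbar, p_phi):
    """b2 and b4 after boosting t and t̄ into the tt̄φ centre-of-mass frame."""
    P = np.asarray(p_t, float) + np.asarray(p_tbar, float) + np.asarray(p_phi, float)
    return b2_b4(boost_to_rest(p_t, P), boost_to_rest(p_tbar, P))
```

For example, t and t̄ with equal transverse momenta pointing in opposite azimuthal directions give a negative b_2, which is why b_2 carries information beyond the polar angles alone.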
In order to study the CP-sensitivity as a function of the φ boson mass, forward-backward asymmetries of some variables were computed for each CP-component of the top-quark Yukawa coupling, i.e., CP-even and CP-odd. The variables are:
• $X = \sin\theta^{t\bar t\phi}_{t}\,\sin\theta^{\phi}_{W^{+}}$ (with sequential boost).
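A forward-backward asymmetry of the kind defined in [20] can be estimated from a (weighted) event sample as below. The sketch assumes that x̄_Y is the midpoint of the variable's domain, for example 0 for b_4 ∈ [−1, 1] and 0.5 for a product sin θ sin θ ∈ [0, 1]; events exactly at x̄_Y enter neither term. The function name is hypothetical.

```python
import numpy as np


def forward_backward_asymmetry(x, domain, weights=None):
    """A_FB = [σ(x > x̄) − σ(x < x̄)] / [σ(x > x̄) + σ(x < x̄)], with x̄ the centre of `domain`."""
    x = np.asarray(x, float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, float)
    x_bar = 0.5 * (domain[0] + domain[1])
    above = w[x > x_bar].sum()  # weighted "cross section" above the centre
    below = w[x < x_bar].sum()  # weighted "cross section" below the centre
    return float((above - below) / (above + below))
```

With event weights equal to σ/N for each generated sample, the ratio is directly the asymmetry of the cross sections.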
The full normalized distributions, at parton level, are shown in figure 6. As hinted by the behaviour of the cross sections, for large enough φ boson masses the difference between the CP-even and CP-odd distributions disappears. Although this behaviour was confirmed for all variables, the exact mass value for which the difference becomes negligible depends on the variable. The maximum φ boson mass for which a meaningful difference between the distributions exists is 400 GeV.

Generation of events
Signal events from pp → tt̄φ associated production at the LHC (with φ = H, A) were generated at NLO with the Higgs Characterisation model HC_NLO_X0 [43], using MadGraph5_aMC@NLO [44]. The pure CP-even and pure CP-odd samples were generated by setting the CP-phase to cos α = 1 or 0, respectively, following equation (2.1), with κ_t = 1. Several samples, for both scalar and pseudoscalar signals, were generated with masses m_φ between 40 and 300 GeV in steps of 20 GeV, as well as for m_φ = 350, 400, 450 and 500 GeV. While the CP-even and CP-odd bosons were only allowed to decay to a pair of b-quarks (φ → bb̄), the tt̄ system was assumed to decay to a pair of b-quarks and two intermediate W± gauge bosons (t → bW⁺, t̄ → b̄W⁻), which in turn decay to two charged leptons and two neutrinos. Following the decay of all intermediate massive particles, the signal final state is characterized, at parton level, by the presence of two oppositely charged leptons, two neutrinos and two bb̄ quark pairs. Only W boson decays to electrons (e) and muons (µ) were considered as signal. This configuration defines the dileptonic channel.
In addition to the signal samples, backgrounds from SM processes were also generated using MadGraph5_aMC@NLO. The dominant background, top-quark pair production in association with a b-quark pair (tt̄bb̄), as well as the associated production of top quarks with the SM Higgs boson (tt̄H_SM), were generated at NLO. For the latter, a SM Higgs boson mass of m_{H_SM} = 125 GeV was assumed. These two backgrounds lead to the same partonic final state as the signal.
The remaining backgrounds considered, which were all generated at tree level (LO), are:
• tt̄ + 3 jets, i.e., top-quark pair production with up to three light jets.
• ttV + jets i.e., top-quark pair production with one gauge boson (V = Z, W ± ), plus up to one light jet.
• Single top quark production through the s- and t-channels (with up to one additional jet) and Wt associated production.
• W +4 jets, i.e., W ± boson production with up to four light jets.
• Wbb̄ + 2 jets, i.e., W± boson production with two jets from the hadronization of b-quarks (b-jets), and up to two additional light jets.
• Z+4 jets i.e., Z boson production with up to four light jets.
• Zbb+2 jets i.e., Z boson production with a pair of b-jets plus up to two light jets.
• W W, W Z, ZZ+3 jets i.e., diboson production with up to three jets.
All events were generated assuming proton collisions at the LHC with a centre-of-mass energy of 13 TeV. The masses of the top quark (m_t) and of the W boson (m_W) were set to 173 GeV and 80.4 GeV, respectively, while their widths were set to the default MadGraph5_aMC@NLO values of 1.4915 GeV and 2.0476 GeV, respectively. For all samples, the NNPDF2.3 [45] parton distribution functions (PDFs) were used. The renormalization and factorisation scales were fixed to the sum of the transverse masses of all final-state particles and partons. The decay of particles was performed by MadSpin [46], for signal and background events, in order to preserve spin correlations among the decay products and with the respective heavy parent resonances. Parton shower and hadronization were performed by Pythia6 [47]. The matching between the generator and the parton shower used the MLM scheme [48] for the LO samples and the MC@NLO matching [49] for the NLO events. For a fast, parametrised detector simulation of an LHC-like experiment, we used Delphes [50] with the default ATLAS parameter card. For jet reconstruction in signal and background events, FastJet [51] is employed with the anti-k_t algorithm [52] and a distance parameter ∆R = 0.7. Transverse momentum (p_T) cuts are applied to jets such that, in any event, these objects are kept if the following condition is met:
No additional cuts were applied to the transverse momentum of leptons, nor to the pseudo-rapidity (η) of jets and leptons (at generation level).
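The analysis uses FastJet for jet clustering; the sketch below is only a naive O(n³) illustration of the anti-k_t distance measures, d_ij = min(p_Ti⁻², p_Tj⁻²) ΔR_ij²/R² and d_iB = p_Ti⁻², and is not FastJet's interface. It assumes massless inputs given as (p_T, y, φ), E-scheme recombination (four-vector sums) and R = 0.7; all helper names are hypothetical.

```python
import math


def massless(pt, y, phi):
    """Four-vector (E, px, py, pz) of a massless particle from (pT, rapidity, azimuth)."""
    return (pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y))


def kinematics(p):
    """(pT, rapidity, azimuth) of a four-vector (E, px, py, pz)."""
    E, px, py, pz = p
    return math.hypot(px, py), 0.5 * math.log((E + pz) / (E - pz)), math.atan2(py, px)


def anti_kt(particles, R=0.7):
    """Naive anti-kt clustering with E-scheme recombination; returns jets sorted by pT."""
    pseudo = list(particles)
    jets = []
    while pseudo:
        kin = [kinematics(p) for p in pseudo]
        best = None  # (distance, i, j); j is None for the beam distance d_iB
        for i, (pti, yi, phii) in enumerate(kin):
            d_iB = pti ** -2
            if best is None or d_iB < best[0]:
                best = (d_iB, i, None)
            for j in range(i + 1, len(kin)):
                ptj, yj, phij = kin[j]
                dphi = (phii - phij + math.pi) % (2.0 * math.pi) - math.pi
                d_ij = min(pti ** -2, ptj ** -2) * ((yi - yj) ** 2 + dphi ** 2) / R ** 2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(pseudo.pop(i))  # closest to the beam: promote to a final jet
        else:
            merged = tuple(a + b for a, b in zip(pseudo[i], pseudo[j]))
            pseudo.pop(j)  # j > i, so popping j first keeps index i valid
            pseudo.pop(i)
            pseudo.append(merged)
    return sorted(jets, key=lambda p: -math.hypot(p[1], p[2]))


jets = anti_kt([massless(100.0, 0.0, 0.0), massless(20.0, 0.1, 0.1), massless(50.0, 0.0, 3.0)])
```

In this example the first two inputs (ΔR ≈ 0.14) are merged into one jet and the third, far away in azimuth, remains a separate jet. Because the hardest particles are clustered first, anti-k_t jets tend to be regular and cone-like around hard cores.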

Kinematic reconstruction
After event generation, hadronization and detector simulation, we use a kinematic reconstruction to assign detector level jets to partons from the hard-scattering process and, using the detected charged leptons, reconstruct the massive intermediate particles i.e., the top quarks, the W and φ bosons. This, unavoidably, requires the reconstruction of the undetected neutrinos, which is performed on an event by event basis, using the MadAnalysis5 [53] framework.
Only events with at least two charged leptons of opposite charge and four or more jets are selected and reconstructed. Both leptons and jets are required to have p_T ≥ 20 GeV and |η| ≤ 2.5, which leads to signal selection efficiencies that vary from 9% (12%) to 18% (19%) for scalar (pseudoscalar) masses from 40 GeV to 200 GeV, respectively. The uncertainties on these numbers are smaller than 0.2%. It should be stressed that no attempt was made to optimize the selection, for instance by looking for boosted jets, which is outside the scope of this paper.
One of the main challenges of the kinematic reconstruction is the assignment of jets to the reconstructed parton-level objects, correctly matching the decay products of the top quarks and of the W and φ bosons. In order to check the performance of the kinematic reconstruction, a truth-match approach was used for the assignment, by finding the four jets with the smallest ∆R distance to the parton-level b-quarks. As we expect a one-to-one correspondence, a wrong assignment necessarily leads to combinatorial background. Since no truth-match information is available when dealing with real data, an association criterion needs to be applied to the events. This relies on a multivariate analysis method tailored for each case, CP-even and CP-odd, using TMVA [54]. In both cases, two samples, labelled signal and background, were created from simulated tt̄φ signal events and used for training and testing. While signal samples contain kinematic distributions only from the correct association, background samples contain equivalent kinematic distributions from wrong associations. The following variables were used for training: ∆R, ∆Φ and ∆θ for the pairs (b_t, ℓ⁺), (b_t̄, ℓ⁻) and (b_φ, b̄_φ), where b_t (b_t̄) represents the bottom (anti-bottom) quark from the top (anti-top) decay and b_φ (b̄_φ) represents the bottom (anti-bottom) quark from the Higgs decay. The invariant masses of the first two pairs, at parton level, and the invariant mass of the (b_φ, b̄_φ) system at detector level were also considered. These variables and their correlations are shown in figures 7 and 8 for the CP-odd case with m_A = 40 GeV. We found that, for all mass values of the CP-even and CP-odd signals, the best-performing methods are the Boosted Decision Tree (BDT) and the Gradient Boosted Decision Tree (BDTG). The latter is the method used in the kinematic reconstruction.
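A minimal sketch of the truth-match step follows. It assumes that the one-to-one assignment is the one minimising the summed ∆R over the four parton-level b-quarks; the text only states that the four jets closest in ∆R are chosen, so this global criterion is our assumption. Objects are given as (η, φ) pairs, and the function names are hypothetical.

```python
import itertools
import math


def delta_r(a, b):
    """ΔR between two (eta, phi) pairs, with the azimuthal difference wrapped to [-π, π)."""
    deta = a[0] - b[0]
    dphi = (a[1] - b[1] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)


def truth_match(partons, jets):
    """One-to-one jet assignment minimising the summed ΔR to the partons.

    Returns (jet index for each parton, summed ΔR). Brute force is fine for
    four partons and a handful of jets.
    """
    best_cost, best = math.inf, None
    for combo in itertools.permutations(range(len(jets)), len(partons)):
        cost = sum(delta_r(partons[k], jets[j]) for k, j in enumerate(combo))
        if cost < best_cost:
            best_cost, best = cost, combo
    return best, best_cost
```

The partons would be ordered as (b_t, b_t̄, b_φ, b̄_φ), so the returned tuple directly labels which jet plays each role.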
During the testing phase, the jet combination chosen is the one returning the highest value of the BDTG discriminant. The Receiver Operating Characteristic (ROC) curve and the BDT and BDTG discriminant distributions are shown in figures 9 and 10, respectively, for ttA with m A = 40 GeV.
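The combinatorial choice can be sketched as follows. The sketch keeps the six highest-p_T jets, enumerates assignments (b_t, b_t̄, b_φ, b̄_φ), applies the mass windows m_ℓ⁺b_t, m_ℓ⁻b_t̄ < 150 GeV and 20 GeV < m_bφb̄φ < 300 GeV used in this analysis, and keeps the assignment with the highest score. The `score` callable is a hypothetical stand-in for the trained TMVA BDTG response, which is not reproduced here; treating the two φ jets as an unordered pair is also our assumption. Four-vectors are (E, px, py, pz).

```python
import itertools
import math


def pt(p):
    return math.hypot(p[1], p[2])


def inv_mass(*ps):
    """Invariant mass of the sum of four-vectors (E, px, py, pz)."""
    E, px, py, pz = (sum(c) for c in zip(*ps))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))


def best_combination(jets, lep_plus, lep_minus, score, n_max=6):
    """Return (b_t, b_tbar, b_phi, bbar_phi) jet indices with the highest score, or None."""
    leading = sorted(range(len(jets)), key=lambda k: pt(jets[k]), reverse=True)[:n_max]
    best, best_score = None, -math.inf
    for b_t, b_tbar, b1, b2 in itertools.permutations(leading, 4):
        if b1 > b2:
            continue  # the two φ jets are treated as an unordered pair
        if inv_mass(lep_plus, jets[b_t]) >= 150.0 or inv_mass(lep_minus, jets[b_tbar]) >= 150.0:
            continue
        if not 20.0 < inv_mass(jets[b1], jets[b2]) < 300.0:
            continue
        s = score(b_t, b_tbar, b1, b2)  # placeholder for the BDTG discriminant
        if s > best_score:
            best, best_score = (b_t, b_tbar, b1, b2), s
    return best
```

Events for which no combination passes the mass windows return None and would not be reconstructed.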
In events with jet multiplicity above six, only the six highest-p_T jets are considered. This choice is motivated by the fact that, in about 95% of all signal events, the jets corresponding to the hadronization of parton-level b-quarks are among the six with the highest p_T. Jet combinations are also required to satisfy $m_{\ell^+ b_t}$ ($m_{\ell^- b_{\bar t}}$) < 150 GeV and 20 GeV < $m_{b_\phi \bar b_\phi}$ < 300 GeV.
Following the pairing of jets and leptons, the 3-momenta of the undetected neutrinos are reconstructed by solving the following set of equations,
$$ \begin{aligned} (p_{\ell^+} + p_{\nu})^2 &= m_{W^+}^2, & (p_{\ell^-} + p_{\bar\nu})^2 &= m_{W^-}^2, \\ (p_{W^+} + p_{b})^2 &= m_{t}^2, & (p_{W^-} + p_{\bar b})^2 &= m_{\bar t}^2, \\ p^x_{\nu} + p^x_{\bar\nu} &= \not{E}_x, & p^y_{\nu} + p^y_{\bar\nu} &= \not{E}_y. \end{aligned} $$
In the first four, relativistic mass constraints are imposed on signal and background events by assuming that the four-momenta of the W bosons (p_W±), with masses set to m_W, are reconstructed from the four-momenta of the charged leptons and neutrinos, p_ℓ± and p_ν(ν̄), respectively. The top quarks, t and t̄, with masses set to m_t, are reconstructed with the b-quark four-momenta, p_b and p_b̄, correctly paired to the respective W⁺ and W⁻. In the last two equations, the x and y components of the neutrino (anti-neutrino) 3-momenta, p^x_ν (p^x_ν̄) and p^y_ν (p^y_ν̄), fully account for the x and y components of the missing transverse energy (E̸_T). Since top quarks and W bosons have non-zero widths, their mass distributions follow Breit-Wigner probability distribution functions (p.d.f.s), with pole masses fixed to m_t and m_W, respectively. In order to reconstruct the neutrino and anti-neutrino four-momenta, we generate random top and anti-top quark masses from 1-dimensional parton-level p.d.f.s and, consistently, random W± masses following 2-dimensional mass p.d.f.s of (m_W⁺, m_t) and (m_W⁻, m_t̄). This ensures that kinematic correlations are preserved when generating the top-quark and W-boson masses. We then solve the equations for all momentum components of the neutrinos. If no solution is found, the mass generation is repeated up to a maximum of 500 trials.
If there is still no solution, the event is discarded. Additionally, as the mass equations are of quadratic form, several solutions may exist for a single event.
In order to choose the best one, a likelihood function is constructed using p.d.f.s of the transverse momenta of the neutrinos, the top quarks and the tt̄ system, respectively P(p_Tν), P(p_Tν̄), P(p_Tt), P(p_Tt̄) and P(p_Ttt̄), all obtained from parton-level distributions. Furthermore, we consider the two-dimensional mass p.d.f. of the tt̄ pair, P(m_t, m_t̄), and the mass p.d.f. of the reconstructed Higgs, P(m_φ), obtained with truth-matching. The likelihood is defined according to
$$ L_{t\bar t\phi} \propto \frac{1}{p_{T\nu}\, p_{T\bar\nu}}\, P(p_{T\nu})\, P(p_{T\bar\nu})\, P(p_{Tt})\, P(p_{T\bar t})\, P(p_{Tt\bar t})\, P(m_t, m_{\bar t})\, P(m_\phi). \qquad (3.3) $$
A normalization factor 1/(p_Tν p_Tν̄) is applied in the likelihood because energy losses due to radiation emission and effects from detector resolution tend to increase the reconstructed neutrino momenta. This factor compensates for extreme values of the neutrino p_T by giving less weight to such solutions. We have checked, after event selection and considering only truth-matched signal events, that 66% to 73% of the events are correctly reconstructed for φ masses in the range 40 GeV to 300 GeV (for both scalar and pseudoscalar tt̄φ signals). If truth-match is not applied, the reconstruction efficiency varies from 49% (51%) to 63% (62%) for scalars (pseudoscalars) in the same mass range. In this case, the reconstruction results in the same jet configuration as the one found with truth-match in 29% (31%) to 49% (55%) of the events, for the same mass range of scalar (pseudoscalar) signals. It is worth mentioning that the current kinematic reconstruction extends the one discussed in [21] to a wider mass range of scalar and pseudoscalar bosons, with very similar, if not better, performance. Figure 11 shows two-dimensional p_T distributions of the W⁺ (top-left), the top quark (top-right), the tt̄ system (bottom-left) and the Higgs boson (bottom-right) after kinematic reconstruction of tt̄H events, for m_H = 40 GeV. The correlation between the parton-level (x-axis) and reconstructed (y-axis) p_T distributions is clearly visible. The same behaviour is observed for the tt̄A signals, as well as for the other scalar boson masses considered. The main difference in these distributions as the Higgs mass increases is a higher density of points in the high-p_T regions. The 40 GeV case was chosen for illustration purposes only.
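The neutrino-solving loop and the choice among multiple solutions can be summarised in a short sketch. Two simplifications are assumed and labelled in the code: masses are drawn from independent, truncated non-relativistic Breit-Wigner (Cauchy) distributions with the widths quoted for the generation, instead of the parton-level 1D and 2D (m_W, m_t) p.d.f.s used in the analysis, and the p.d.f.s entering the likelihood of equation (3.3) are supplied as generic callables. The `solve` callback, standing in for a solver of the mass and missing-energy equations, is hypothetical.

```python
import math
import random

M_T, GAMMA_T = 173.0, 1.4915  # GeV, generation settings
M_W, GAMMA_W = 80.4, 2.0476


def sample_breit_wigner(rng, mass, width, n_widths=20.0):
    """Cauchy (non-relativistic Breit-Wigner) sample truncated to mass ± n_widths*width.

    Assumption: replaces the parton-level p.d.f.s of the analysis, and ignores
    the (m_W, m_t) correlation.
    """
    return mass + 0.5 * width * math.tan((2.0 * rng.random() - 1.0) * math.atan(2.0 * n_widths))


def solve_with_retries(event, solve, rng, max_trials=500):
    """Draw (m_t, m_tbar, m_W+, m_W-) and call `solve` until it returns solutions."""
    for trial in range(1, max_trials + 1):
        masses = (sample_breit_wigner(rng, M_T, GAMMA_T), sample_breit_wigner(rng, M_T, GAMMA_T),
                  sample_breit_wigner(rng, M_W, GAMMA_W), sample_breit_wigner(rng, M_W, GAMMA_W))
        solutions = solve(event, masses)  # hypothetical solver: list of candidate solutions
        if solutions:
            return solutions, trial
    return [], max_trials  # still no solution: the event is discarded


ONE_DIM_FACTORS = ("pt_nu", "pt_nubar", "pt_t", "pt_tbar", "pt_ttbar", "m_phi")


def likelihood(sol, pdfs):
    """Unnormalised L_ttphi of eq. (3.3) for one candidate solution (a dict of observables)."""
    value = 1.0 / (sol["pt_nu"] * sol["pt_nubar"])  # down-weights extreme neutrino pT
    for key in ONE_DIM_FACTORS:
        value *= pdfs[key](sol[key])
    return value * pdfs["m_t_mtbar"](sol["m_t"], sol["m_tbar"])


def best_solution(solutions, pdfs):
    """Keep the candidate with the largest likelihood."""
    return max(solutions, key=lambda s: likelihood(s, pdfs))
```

Since the quadratic mass constraints can yield several solutions per event, ranking them with the same likelihood keeps the choice consistent across events.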
In figure 12, we show the reconstructed neutrino p_T versus its parton-level value (left) and the distribution of the reconstructed Higgs boson masses, obtained with truth-matching, for several masses of the scalar boson (right). In spite of the wider spread of values in the neutrino p_T distribution, a clear correlation between the parton-level and reconstructed p_T is observed.
Additional selection criteria (final selection cuts) were applied to events following the kinematic reconstruction, to further increase the signal-to-background ratio. The Z + 4 jets and Zbb̄ + 2 jets backgrounds are suppressed by selecting events with a dilepton invariant mass (m_ℓ⁺ℓ⁻) outside a window around the Z boson mass (m_Z = 91 GeV), defined by |m_ℓ⁺ℓ⁻ − m_Z| > 10 GeV. Most backgrounds, notably tt̄ + 3 jets, are mitigated by requiring at least 3 b-jets. In figure 13, the expected number of events that survive the full selection criteria is shown for the different SM backgrounds at the LHC, for an integrated luminosity of 100 fb⁻¹. The distributions are compared to the CP-even and CP-odd signals, with m_φ = 40 GeV, for different observables. The Z+jets contribution includes the Z + 4 jets and Zbb̄ + 2 jets processes, and the W+jets contribution includes W + 4 jets and Wbb̄ + 2 jets. Diboson events comprise the WW + 3 jets, WZ + 3 jets and ZZ + 3 jets backgrounds; the tt̄cc̄ and tt̄ + light jets contributions come from the tt̄ + 3 jets process, and tt̄H (m_H = 125 GeV) is the tt̄H_SM process.
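The preselection and final selection described above reduce to a few boolean conditions. The sketch below assumes objects are plain dicts with `pt`, `eta` and, for leptons, `charge` keys, and leaves the choice of the lepton pair used for m_ℓ⁺ℓ⁻ and the b-tagging to the caller; these data layouts and function names are hypothetical.

```python
M_Z = 91.0  # GeV


def is_good(obj, pt_min=20.0, eta_max=2.5):
    """Object-level requirement: pT >= 20 GeV and |eta| <= 2.5."""
    return obj["pt"] >= pt_min and abs(obj["eta"]) <= eta_max


def passes_preselection(leptons, jets):
    """At least two opposite-charge good leptons and at least four good jets."""
    charges = {lep["charge"] for lep in leptons if is_good(lep)}
    n_jets = sum(1 for jet in jets if is_good(jet))
    return {+1, -1} <= charges and n_jets >= 4


def passes_final_selection(m_ll, n_bjets):
    """Z-window veto |m_ll - m_Z| > 10 GeV and at least three b-tagged jets."""
    return abs(m_ll - M_Z) > 10.0 and n_bjets >= 3
```

The Z-window veto targets the Z + jets backgrounds, while the b-jet multiplicity requirement mostly removes tt̄ + light jets.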

CLs results for different exclusion scenarios
In this section, CLs for the exclusion of scalar and pseudoscalar tt̄φ signals (φ = H, A) are computed for different scenarios, as a function of the LHC luminosity up to the High-Luminosity phase (HL-LHC). Several φ boson masses in the range 40-200 GeV are considered. For larger scalar boson masses, the sensitivity of the current analysis is lost because the total tt̄φ production cross section is too small, for both scalar and pseudoscalar bosons. For Higgs masses below 40 GeV, we observed a significant degradation of the reconstruction efficiency in the resolved tt̄φ analysis considered.
The b_2 and b_4 distributions, evaluated in both the LAB and the tt̄φ centre-of-mass systems for comparison, are used to set the CLs. The contributions of all SM backgrounds, normalized to the LHC luminosity, are taken into account, as well as the different signal hypotheses. For each scenario under study, one million pseudo-experiments are generated using bin-by-bin Poisson fluctuations around a mean value set to the number of events in each individual bin of the distributions. The probability that a null hypothesis H0 and an alternative hypothesis H1 describe the pseudo-experiment is evaluated for each of them. The likelihood ratio of the H1 and H0 probabilities is used as the test statistic to compute the CLs with which hypothesis H1 can be excluded, assuming H0 is true. The expected CLs for exclusion were calculated as a function of the integrated luminosity, from 100 to 3000 fb⁻¹, using the b_2 and b_4 observables. The calculation of the CLs follows the prescription of [55,56]. The different scenarios under consideration are:
• Scenario 1: exclusion of the SM plus a new CP-even scalar particle, assuming the SM. In this case, H0 is the SM-only hypothesis, while H1 is the SM plus a new CP-even signal;
• Scenario 2: exclusion of the SM plus a new CP-odd scalar particle, assuming the SM. In this case, H0 is the SM-only hypothesis, while H1 is the SM plus a new CP-odd signal;
• Scenario 3: exclusion of the SM plus a new CP-odd scalar particle, assuming the SM plus a new CP-even scalar particle of the same mass. In this case, H0 is the SM plus a new CP-even signal hypothesis, while H1 is the SM plus a new CP-odd signal;
• Scenario 4: SM exclusion, assuming the SM plus a new CP-even scalar particle. In this case, H0 is the SM plus a new CP-even signal hypothesis, while H1 is the SM only.
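The pseudo-experiment procedure can be illustrated with a compact numpy sketch. It assumes binned Poisson likelihoods, the test statistic t = −2 ln[L(H1)/L(H0)], the median H0 outcome as the expected "observed" value, and CLs = CL_H1/CL_H0 in the spirit of [55,56]. The inputs mu0 and mu1 would be the expected b_2 or b_4 histograms under H0 and H1 at a given luminosity. The default number of toys is smaller than the one million used in the analysis to keep memory modest, and the conversion to a one-sided Gaussian significance is a convention we assume.

```python
import math
from statistics import NormalDist

import numpy as np


def llr_statistic(n, mu0, mu1):
    """-2 ln[L(H1)/L(H0)] for binned Poisson likelihoods (the ln n! terms cancel)."""
    return -2.0 * np.sum(n * np.log(mu1 / mu0) - (mu1 - mu0), axis=-1)


def expected_cls(mu0, mu1, n_toys=100_000, seed=1):
    """Expected CLs for excluding H1 when H0 is true."""
    rng = np.random.default_rng(seed)
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    # bin-by-bin Poisson fluctuations around the expected bin contents
    t0 = llr_statistic(rng.poisson(mu0, size=(n_toys, mu0.size)), mu0, mu1)
    t1 = llr_statistic(rng.poisson(mu1, size=(n_toys, mu1.size)), mu0, mu1)
    t_obs = np.median(t0)         # median outcome if H0 is true
    cl_h1 = np.mean(t1 >= t_obs)  # H1 toys at least as H0-like as t_obs
    cl_h0 = np.mean(t0 >= t_obs)  # about 0.5 by construction
    return float(cl_h1 / cl_h0)


def cls_to_sigma(cls):
    """One-sided Gaussian significance Z = Phi^-1(1 - CLs)."""
    if cls <= 0.0:
        return math.inf
    if cls >= 1.0:
        return -math.inf
    return NormalDist().inv_cdf(1.0 - cls)
```

Scanning the histograms over luminosity and recording where the significance first crosses the target gives curves of the type shown for each scenario.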
Other scenarios of interest could also be considered like, for instance, the exclusion of the SM plus a new CP-even scalar particle assuming the SM plus a new CP-odd scalar particle of the same mass, or the SM exclusion assuming the SM plus a new CP-odd scalar particle. As these scenarios were judged similar to some of the ones considered already, they were not shown in this paper.
In figure 14, we show the luminosity required for exclusion at a given CL (2σ for the first three scenarios and 5σ for scenario 4), for each of the scenarios considered, as a function of the Higgs mass. If no points are shown for a given scalar boson mass, the exclusion is not possible for that mass, even at the end of the HL-LHC (L = 3000 fb⁻¹). In each plot, the luminosity for exclusion is shown for the b_2 and b_4 variables.
Figure 14. Luminosity needed to exclude scenarios 1 (top left), 2 (top right) and 3 (bottom left) at the 2σ level, and scenario 4 (bottom right) at the 5σ level, as a function of the φ boson mass. [Panel titles include "CP-even exclusion at 2σ CL", "CP-odd exclusion (vs CP-even) at 2σ CL" and "SM exclusion (vs CP-even) at 5σ CL"; y-axis: integrated luminosity (fb⁻¹).]
The top left panel of figure 14 (Scenario 1) shows that, with the current LHC luminosity, a CP-even scalar boson can be excluded with CLs exceeding 2σ if its mass is m_φ ≲ 80 GeV. For masses around the SM Higgs boson mass, roughly 300 fb⁻¹ is required to achieve a 2σ CLs exclusion, which will be collected during the upcoming LHC runs. CP-even scalar bosons with masses above 200 GeV cannot be excluded even at the end of the HL-LHC with the dileptonic channel alone.
The top right panel of figure 14 (Scenario 2) shows that the exclusion CLs are quite different for the CP-odd case, largely due to its reduced cross section compared with the CP-even case. To exclude pseudoscalars at 2σ with respect to the SM, with masses in the range m_φ = 80-200 GeV, a luminosity of at least ∼1500 fb⁻¹ is required at the LHC. For m_φ = 40 GeV the reconstruction efficiency is the lowest, so the CLs are worse than for the other CP-odd scalar boson masses.
If a new CP-even scalar is discovered (Scenario 3), a CP-odd exclusion is possible over the mass range considered, with CLs diminishing as the φ boson mass increases, as shown in the bottom left panel of figure 14. The exception is m_φ = 160 GeV, with much lower CLs: for this mass both hypotheses predict almost the same number of events, distributed similarly in the variables considered, which degrades the sensitivity for exclusion. Without considering the uncertainties, the best variable is b_4 in the laboratory frame (see figure 19 of appendix A).
The bottom right panel of figure 14 (Scenario 4) shows that the SM-only hypothesis will be excluded at the 5σ CL at some stage of the LHC lifetime if the newly discovered CP-even scalar has a mass below roughly $m_\phi = 120$ GeV. For $m_\phi = 160$ GeV, it can be excluded at almost 3σ by the end of the HL-LHC (see figure 20 of appendix A).
The expected confidence levels, as a function of the integrated LHC luminosity, are shown in appendix A.

Interpretation in the framework of the C2HDM
We now interpret the results in the framework of the C2HDM, briefly reviewing the aspects of the model relevant for the discussion (for a detailed description of the model see [17]). In the C2HDM the scalar potential is explicitly CP-violating and invariant under a $Z_2$ symmetry, $\Phi_1 \to \Phi_1$, $\Phi_2 \to -\Phi_2$, softly broken by the $m_{12}^2$ term, where the doublets $\Phi_i$ ($i = 1, 2$) develop real vacuum expectation values (VEVs) $v_1$ and $v_2$. All parameters are real except for $m_{12}^2$ and $\lambda_5$. We define $\tan\beta \equiv v_2/v_1$, and the rotation matrix that takes us from the gauge to the mass eigenstates is
$$R = \begin{pmatrix} c_1 c_2 & s_1 c_2 & s_2 \\ -(c_1 s_2 s_3 + s_1 c_3) & c_1 c_3 - s_1 s_2 s_3 & c_2 s_3 \\ -c_1 s_2 c_3 + s_1 s_3 & -(c_1 s_3 + s_1 s_2 c_3) & c_2 c_3 \end{pmatrix},$$
where $s_i = \sin\alpha_i$, $c_i = \cos\alpha_i$ and $-\pi/2 < \alpha_i \le \pi/2$ ($i = 1, 2, 3$). The Higgs boson masses are ordered such that $m_{H_1} \le m_{H_2} \le m_{H_3}$. There are four types of Yukawa models in the C2HDM. However, the top-quark Yukawa couplings are the same in all four types, so this discussion is valid for all of them. In all four types, the Yukawa Lagrangian for the up quarks has the form
$$\mathcal{L}_Y = -\sum_{i=1}^{3} \frac{m_f}{v}\,\bar{\psi}_f \left[\frac{R_{i2}}{s_\beta} - i\,\frac{R_{i3}}{t_\beta}\,\gamma_5\right] \psi_f\, H_i ,$$

JHEP06(2020)155
where $\psi_f$ denotes the fermion fields with mass $m_f$, $i$ is the scalar index, $v^2 = v_1^2 + v_2^2$ (fixed by the $W$ boson mass), $s_\beta = \sin\beta$ and $t_\beta = \tan\beta = v_2/v_1$. We now want to understand what can be concluded about the parameter space of this specific model once we have either a measurement or an exclusion for a given $\phi$ boson mass (and luminosity). We analyse only a simple situation, in which the 125 GeV Higgs is $H_2$ and the lightest Higgs, $H_1$, has a mass below 125 GeV. We start by mapping equation (4.5) onto equation (2.1). Because no scalar has been found below 125 GeV, $\kappa_t$ and $\alpha$ are free to vary within their allowed ranges (taking into account the available theoretical and experimental constraints). Note first that $\sin\alpha = 0$ and $\sin\alpha_2 = 0$ are equivalent, so the CP-even limit is obtained unambiguously. The $H_1VV$ coupling, where $V$ is a vector boson, is proportional to $\cos\alpha_2$, which vanishes for $\alpha_2 = \pi/2$. What will be measured or constrained in the experiment are $\alpha$ and $\kappa_t$. Moreover, the limits in this work were set for the pure scalar and pure pseudoscalar scenarios. The pure scalar case corresponds to $s_2 = 0$, while the pure pseudoscalar case can be realised either with $c_2 = 0$ or with $s_1 = 0$; in each of these scenarios $\kappa_t$ is fixed by the remaining C2HDM parameters, so a measurement of, or limit on, $\kappa_t$ sets a limit on the parameters of the model. For the particular scenario with $c_2 = 0$, we obtain a limit on $\tan\beta$. Since $\tan\beta$ is already constrained to be above one by low-energy measurements (see [17]), new information is gained only if this lower bound is raised. This is indeed the case: a limit of, for instance, $\kappa_t \le 1/10$ gives $\tan\beta \ge 10$ (for $c_2 = 0$). With the same limit on $\kappa_t$, the bounds in the remaining two scenarios are $s_1 \le 1/10$ (for $s_2 = 0$) and $s_2 \le t_\beta/10$ (for $s_1 = 0$). In figure 15 we present the luminosity needed to exclude $\kappa_t$ at the 2σ level for the pure CP-even case (scenario 1), for a CP-even scalar boson mass of 40 GeV. Note that this is the most favourable scenario for discovery (and for exclusion).
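Equation (2.1) is not reproduced in this section. Assuming it writes the top Yukawa interaction as $-\kappa_t \frac{m_t}{v}\,\bar t\,(\cos\alpha + i\gamma_5\sin\alpha)\,t\,\phi$, and assuming the up-type C2HDM couplings of $H_i$ take the form $\frac{R_{i2}}{s_\beta} - i\frac{R_{i3}}{t_\beta}\gamma_5$ with $R_{12} = s_1 c_2$ and $R_{13} = s_2$ in the standard parametrisation, the mapping for $H_1$ and the three limiting cases would read as follows. This is a sketch under those conventions (the sign of $\sin\alpha$ in particular is convention dependent); it is consistent with the bounds quoted in the preceding paragraph.

```latex
\kappa_t \cos\alpha = \frac{s_1 c_2}{s_\beta}, \qquad
\kappa_t \sin\alpha = -\frac{s_2}{t_\beta},
\\[4pt]
\begin{aligned}
s_2 = 0 \ (\text{pure scalar}):&\quad \kappa_t = \frac{|s_1|}{s_\beta}
  \;\Rightarrow\; \kappa_t \le \tfrac{1}{10} \text{ gives } |s_1| \le \tfrac{s_\beta}{10} \le \tfrac{1}{10},\\
c_2 = 0 \ (\text{pure pseudoscalar}):&\quad \kappa_t = \frac{1}{t_\beta}
  \;\Rightarrow\; \kappa_t \le \tfrac{1}{10} \text{ gives } t_\beta \ge 10,\\
s_1 = 0 \ (\text{pure pseudoscalar}):&\quad \kappa_t = \frac{|s_2|}{t_\beta}
  \;\Rightarrow\; \kappa_t \le \tfrac{1}{10} \text{ gives } |s_2| \le \tfrac{t_\beta}{10}.
\end{aligned}
```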
As can be seen, the attainable value of $\kappa_t$ is close to 0.3 by the end of the LHC run. However, since this study uses only the dileptonic final state, we can expect to reach values of $\kappa_t$ of order 1/10 in an analysis that includes all other decay channels. The next question is what constraints apply to the parameter space in scenarios that are close to either the CP-even or the CP-odd limit. In figure 16 (left), we present the allowed points in the C2HDM parameter space ($c_1$ vs. $s_2$) for a measurement of $\kappa_t$ and $\sin\alpha$ in the ranges $0.1 \le \kappa_t \le 1.2$ and $0.1 \le \sin\alpha \le 0.2$. We also impose $1 \le \tan\beta \le 10$. The top plot shows the variation with $\kappa_t$, the middle one with $\sin\alpha$ and the bottom one with $\tan\beta$. This is the case where we are close to the CP-even limit.
Figure 15. Luminosity needed to exclude $\kappa_t$ at the 2σ level for the pure CP-even case (scenario 1), for a CP-even scalar boson mass of 40 GeV.
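The expectation that adding other decay channels improves the reach can be made roughly quantitative under simplifying assumptions: for independent channels with approximately Gaussian likelihoods and no correlated systematics, expected significances add in quadrature. The sketch below implements only that approximation with hypothetical inputs; it does not reproduce the estimate of $\kappa_t$ of order 1/10 quoted in the text.

```python
import math


def combine_in_quadrature(significances):
    """Approximate combined significance of independent channels: Z = sqrt(sum Z_i^2).

    Assumes uncorrelated channels and Gaussian (asymptotic) test statistics.
    """
    return math.sqrt(sum(z * z for z in significances))


# Hypothetical per-channel significances: 3 sigma and 4 sigma combine to 5 sigma.
print(combine_in_quadrature([3.0, 4.0]))  # 5.0
```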

In figure 16 (right), we present the scenario close to the CP-odd limit, that is, $0.8 \le \sin\alpha \le 0.9$. The most striking point is that, although in each case we are close to one of the limits (CP-even or CP-odd), the allowed parameter space is still quite large, and additional measurements are clearly needed to constrain it.
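A toy version of the kind of scan shown in figure 16 can be sketched as follows. It assumes the mapping $\kappa_t\cos\alpha = s_1 c_2/s_\beta$ and $\kappa_t\sin\alpha = -s_2/t_\beta$ for $H_1$ (a convention-dependent assumption based on $R_{12} = s_1 c_2$ and $R_{13} = s_2$ in the standard parametrisation), samples $\alpha_1$, $\alpha_2$ and $\tan\beta$ uniformly, and applies only the cuts on $\kappa_t$, $\sin\alpha$ and $\tan\beta$ stated in the text. None of the theoretical or experimental constraints are applied, so the accepted region is larger than a realistic scan would allow; the sampling ranges, trial counts and seed are arbitrary choices.

```python
import math
import random


def h1_kappa_sin_alpha(a1: float, a2: float, tan_beta: float):
    """Return (kappa_t, sin(alpha)) for H_1 under the assumed mapping

    kappa_t cos(alpha) = s1 c2 / s_beta,   kappa_t sin(alpha) = -s2 / t_beta.
    alpha_3 does not appear because the first row of R does not depend on it.
    """
    s_beta = math.sin(math.atan(tan_beta))
    even = math.sin(a1) * math.cos(a2) / s_beta  # CP-even (scalar) part
    odd = -math.sin(a2) / tan_beta               # CP-odd (pseudoscalar) part
    kappa = math.hypot(even, odd)
    if kappa == 0.0:
        return 0.0, 0.0
    return kappa, odd / kappa


def scan(n_trials, sin_alpha_range, kappa_range=(0.1, 1.2), tb_range=(1.0, 10.0), seed=7):
    """Uniform random scan; keeps (c1, s2, kappa_t, sin_alpha, tan_beta) passing the cuts."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_trials):
        a1 = rng.uniform(-math.pi / 2, math.pi / 2)
        a2 = rng.uniform(-math.pi / 2, math.pi / 2)
        tb = rng.uniform(*tb_range)
        kappa, sin_a = h1_kappa_sin_alpha(a1, a2, tb)
        if (kappa_range[0] <= kappa <= kappa_range[1]
                and sin_alpha_range[0] <= sin_a <= sin_alpha_range[1]):
            kept.append((math.cos(a1), math.sin(a2), kappa, sin_a, tb))
    return kept


cp_even_like = scan(50_000, (0.1, 0.2))  # cuts of the left panels of figure 16
cp_odd_like = scan(50_000, (0.8, 0.9))   # cuts of the right panels of figure 16
print(len(cp_even_like), len(cp_odd_like))
```

Plotting the first two entries of each accepted tuple ($c_1$ against $s_2$) gives a plane analogous to figure 16, with the remaining entries available for colouring.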

Conclusions
In this paper we examine the possibility of determining the CP nature of the Yukawa interactions of the heavier quarks ($b$ and $t$) with a generic scalar boson $\phi$, in $q\bar{q}\phi$ production at the LHC. We found that strategies suggested in the literature to achieve this goal for the top quark do not work for the bottom quark, even at parton level. This was also confirmed for very light Higgs bosons, with masses of the order of 10 GeV. The underlying reason is that the term responsible for the CP asymmetries is proportional to $m_f^2$; hence, this term is only significant when the fermion mass is of the order of $m_\phi$.
Previous works established that several kinematic distributions in $t\bar{t}\phi$ production are sensitive to the CP components of the top-quark Yukawa coupling. These studies assumed $m_\phi = 125$ GeV, and an extension to other masses of scalar and pseudoscalar Higgs bosons was still missing in the literature. In this paper, we investigate the dileptonic final states of $t\bar{t}\phi$ (with $\phi = H, A$) for several masses of the CP-even or CP-odd boson $\phi$. We found that, for the masses considered, there is still a good level of discrimination between scalar and pseudoscalar Yukawa interactions at parton level. However, the differences between the two cases become smaller as the $\phi$ boson mass increases, and vanish around $m_\phi = 450$ GeV.
Figure 16. Points allowed in the plane $c_1$ vs. $s_2$ for $0.1 \le \kappa_t \le 1.2$ and $1 \le \tan\beta \le 10$. In the left panels we impose $0.1 \le \sin\alpha \le 0.2$ (CP-even like) and in the right panels $0.8 \le \sin\alpha \le 0.9$ (CP-odd like). The colour scale shows $\kappa_t$ (top), $\sin\alpha$ (middle) and $\tan\beta$ (bottom).

A full kinematic reconstruction was applied to signal and background events to reconstruct the four-momenta of the undetected neutrinos, which allows the experimental sensitivity of the CP search to be estimated. CLs are presented for the exclusion of several scenarios as a function of the luminosity, for different Higgs boson masses. In general, the luminosity required for exclusion at a given CL increases with the $\phi$ boson mass. Given the current LHC luminosity of 150 fb$^{-1}$, the SM plus a pure CP-even Higgs boson with SM couplings and a mass of 40 or 80 GeV can already be excluded, assuming the SM only. For $m_H > 200$ GeV, CP searches will require the inclusion of additional channels. We also found that excluding the SM plus a CP-odd scalar, assuming the SM only, is harder than the corresponding CP-even exclusion for CP-odd Higgs masses up to 160 GeV; for higher masses the opposite is true. If a new Higgs boson is found, the sensitivity is sufficient to exclude the possibility that it is purely CP-odd over the explored mass range, again assuming SM-like couplings. In this work, only the dileptonic final states of the $t\bar{t}\phi$ system are considered in the CLs evaluation at the LHC. A natural follow-up would be to combine several $t\bar{t}\phi$ decay channels to further improve these results.
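The kinematic fit used in the paper is not described in this section, and the dileptonic case involves two neutrinos whose transverse momenta are only constrained jointly by the missing transverse momentum. A common building block of such reconstructions is the $W$-mass constraint for a single lepton-neutrino pair; the sketch below solves it for the neutrino $p_z$ given a hypothesised neutrino transverse momentum, treating both particles as massless. It illustrates that one constraint only, not the paper's fit, and the default $W$ mass of 80.4 GeV is an assumed input.

```python
import math


def neutrino_pz(lep, nu_px: float, nu_py: float, m_w: float = 80.4):
    """Solve m_W^2 = (p_lep + p_nu)^2 for the neutrino p_z (massless lepton and neutrino).

    lep: (px, py, pz) of the charged lepton in GeV
    nu_px, nu_py: hypothesised neutrino transverse momentum components in GeV
    Returns a sorted list with 0, 1 or 2 real solutions.
    """
    lx, ly, lz = lep
    e_l = math.sqrt(lx * lx + ly * ly + lz * lz)
    pt_l2 = lx * lx + ly * ly
    pt_nu2 = nu_px * nu_px + nu_py * nu_py
    # mu = m_W^2 / 2 + pT(lep) . pT(nu); the constraint becomes
    # pt_l2 * pz^2 - 2 mu lz * pz + e_l^2 pt_nu2 - mu^2 = 0
    mu = 0.5 * m_w * m_w + lx * nu_px + ly * nu_py
    disc = mu * mu - pt_l2 * pt_nu2  # discriminant divided by 4 e_l^2
    if pt_l2 == 0 or disc < 0:
        return []  # no real solution for this hypothesis
    root = e_l * math.sqrt(disc)
    return sorted({(mu * lz - root) / pt_l2, (mu * lz + root) / pt_l2})


# Hypothetical example: lepton momentum and neutrino pT hypothesis in GeV.
print(neutrino_pz((30.0, 10.0, 20.0), -20.0, 15.0))
```

In a full dileptonic fit, a function like this would be evaluated for each trial split of the missing transverse momentum between the two neutrinos, together with top-mass constraints.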
Finally, the impact of a new discovery was discussed in the context of the C2HDM. If a new particle is found to be an exact CP eigenstate, this will impose further constraints on typical 2HDM parameters such as $\tan\beta$. If the new particle is only close to either the CP-even or the CP-odd scenario, the allowed parameter space will still be very large, and other measurements will be required to constrain it further.

Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.