Boosting the charged Higgs search prospects using jet substructure at the LHC

Charged Higgs bosons are predicted in a variety of theoretically well-motivated new physics models with extended Higgs sectors. In this study, we focus on a type-II two Higgs doublet model (2HDM-II) and consider a heavy charged Higgs with mass ranging from 500 GeV to 1 TeV, as dictated by the $b\to s\gamma$ constraints, which require $M_{H^\pm}>480$ GeV. We study the dominant production mode, $H^\pm t$ associated production, with $H^\pm \to W^\pm A$ as the dominant decay channel when the pseudoscalar $A$ is considerably lighter. For such a heavy charged Higgs, both decay products, $W^\pm$ and $A$, are relatively boosted. In such a scenario, we apply a jet substructure analysis that tags the fat pseudoscalar and $W$ jets in order to eliminate the standard model background efficiently. We perform a detailed detector simulation for the signal and background processes at the 14 TeV LHC. We introduce various kinematical cuts to determine the signal significance for a number of benchmark points with charged Higgs boson mass from 500 GeV to 1 TeV in the $W^\pm A$ decay channel. Finally, we perform a multivariate analysis utilizing a boosted decision tree algorithm to optimize these significances.


Introduction
The Large Hadron Collider (LHC) achieved a milestone when it discovered a 125 GeV scalar boson in its first run [1,2]. Even though the current LHC data point to an SM-like scalar particle, they are not yet sufficient to establish that it is exactly the SM Higgs boson. Through the production of the Higgs and its decays to SM fermions and gauge bosons, the LHC has already measured many of its couplings. However, the data still allow sizeable deviations from the SM expectations at the 2σ level. Thus it is still too early to regard the SM as the ultimate theory of particle interactions, and there is a need to explore alternative scenarios of electroweak symmetry breaking (EWSB) beyond the SM (BSM). These could be tested in the next run of the LHC or, possibly, at an $e^+e^-$ collider, which now has a reasonable hope of being constructed.
There are several theoretically well-motivated scenarios that explain EWSB beyond the minimal SM Higgs sector. One of the simplest extensions of the SM Higgs sector is the addition of a second Higgs doublet. With two Higgs doublets, the scalar sector of the two Higgs doublet model (2HDM) contains five scalar eigenstates: a light CP-even neutral Higgs h, a heavy CP-even neutral Higgs H, a CP-odd neutral Higgs A and a pair of charged Higgs bosons $H^\pm$. Depending on how the two scalar doublets couple to the SM fermions, there are four distinct 2HDMs, namely type I, type II, type Y and type X. A recent review of 2HDM phenomenology can be found in Ref. [3].
Any signal of a charged Higgs boson at the LHC would be an unambiguous discovery of new physics beyond the SM. However, the search for the charged Higgs is quite complicated.
If it is lighter than the top quark, it would be copiously produced in top-quark decays in top pair production. Such a light $H^\pm$ would decay dominantly to $\tau^\pm\nu$. A detailed study of this decay in top pair and single top production has been performed in Refs. [4,5]. In the high mass region, where $M_{H^\pm} > M_t$, the dominant production of $H^\pm$ at the LHC is in association with a single top quark via the $bg \to tH^-$ + c.c. fusion process [6]. However, searches for $H^\pm$ in the high mass region are quite difficult, owing to the large backgrounds to the dominant decay $H^\pm \to tb$. So long as $\tan\beta$, the ratio of the vacuum expectation values of the two Higgs doublets, is sufficiently small (≤ 1.5) or large (≥ 30), the charged Higgs has reasonable discovery prospects in the $tb$ decay mode [7][8][9][10][11]. A recent exhaustive analysis of the discovery prospects of the charged Higgs can be found in Ref. [12]. Searches for the charged Higgs in the type II 2HDM in associated production with a top quark, using the top polarization, have been performed in Refs. [13][14][15][16][17][18].
In this work, among all 2HDM Yukawa types, we mainly focus on the type II 2HDM. In this model, the $b \to s\gamma$ measurements [19] constrain the charged Higgs mass to be larger than 480 GeV. Within the 2HDM, when the other neutral scalars such as H and A are lighter, the $H^\pm \to W^\pm\phi$ ($\phi \equiv h, H, A$) decays are kinematically open, and bosonic decays open a new avenue in the search for the charged Higgs. Recent studies [20][21][22] have demonstrated the potential of the bosonic channels $H^\pm \to W^\pm\phi$ for $H^\pm$ searches at the LHC. In Ref. [20], the inverse alignment scenario of the type II 2HDM was considered. The authors concluded that the $W^\pm\phi$ decay mode can be utilized to detect $H^\pm$ in the early run of the 14 TeV LHC with only 100 fb$^{-1}$ of integrated luminosity.
Jet substructure methods have been used for charged Higgs searches in Ref. [23]. There, the authors studied the fully hadronic decay mode of a charged Higgs produced in association with a top quark. They used various boosted top-tagging algorithms to reconstruct the boosted hadronic top emanating from the decay of the charged Higgs. They concluded that the sensitivity of the LHC to a heavy charged Higgs boson with two b tags can reach up to 9.5σ for a 1 TeV charged Higgs. Jet substructure analysis has also been found useful in probing other Higgs particles in BSM models with extended Higgs sectors [24][25][26][27][28][29].
Spurred on by the aforementioned results, and by the fact that a heavy charged Higgs would lead to boosted decay products, we use jet substructure tools [30] to identify boosted Higgs and W bosons. We then reassess the $H^\pm$ discovery prospects in the $W^\pm\phi$ decay modes of the 2HDM at the 14 TeV LHC. We first perform a simple cut-based analysis of the signal and background events. For this, we design a set of kinematical cuts suited to a 1 TeV charged Higgs, which achieve a reasonable signal significance in the early run of the 14 TeV LHC. In pursuit of a better signal significance, we also perform a multivariate analysis (MVA) that takes into account the distribution profiles of many kinematical variables. To maximize the discrimination between signal and background, we employ a boosted decision tree (BDT) algorithm. It enhances classification performance by sequentially boosting decision trees built on the training data.
The plan of the paper is as follows. In Section 2, we discuss the production and decays of the charged Higgs in the 2HDM at the LHC. In Section 3, we discuss the signal and background processes and the benchmark points for further analysis. In Section 4, we identify the two signal regions for the analysis. We then perform a full signal-to-background analysis using jet substructure tools and construct various kinematical variables for the boosted decision tree analysis. Finally, we present our conclusions.
2 Production and decay of heavy charged Higgs boson in 2HDM

Production of $H^\pm$
We consider charged Higgs production in association with a single top quark at the LHC. At the parton level, this production occurs via $gb \to tH^-$ (plus the charge-conjugate process). Two Feynman diagrams contribute to the process: the s-channel exchange of a bottom quark and the t-channel exchange of a top quark. The production cross section is governed by the $H^\pm tb$ coupling of the 2HDM. In a general 2HDM, this coupling can be written as

$$\mathcal{L}_{H^\pm tb} = \frac{g}{\sqrt{2}\,M_W}\, H^+\, \bar t \left( m_t A_t\, P_L + m_b A_b\, P_R \right) b + \text{h.c.},$$

where g is the SU(2) gauge coupling, $P_{L,R} = (1 \mp \gamma_5)/2$, and $A_t$ and $A_b$ are given in Table 1 for type I and II 2HDMs.
| Model | $A_t$ | $A_b$ |
|---|---|---|
| type I | $1/\tan\beta$ | $1/\tan\beta$ |
| type II | $1/\tan\beta$ | $\tan\beta$ |

Table 1. The values of $A_t$ and $A_b$ for type I and II 2HDMs.
In a general 2HDM, the production cross section for the $tH^-$ process is proportional to $m_t^2 A_t^2 + m_b^2 A_b^2$. For a type I 2HDM, both $A_t$ and $A_b$ are inversely proportional to $\tan\beta$, so the production cross section decreases rapidly with increasing $\tan\beta$. For a type II 2HDM, the cross section first decreases with increasing $\tan\beta$ and reaches its minimum at $\tan\beta = \sqrt{m_t/m_b} \sim 6.4$. The position of this minimum is independent of the center-of-mass energy and of the charged Higgs mass. As $\tan\beta$ increases further, the $m_b^2\tan^2\beta$ term takes over and the production cross section rises. This behaviour is seen in Fig. 1, which displays the cross section for type I and II 2HDMs as a function of $\tan\beta$ (left panel) and of the $H^\pm$ mass (right panel). At large values, $\tan\beta \geq 30$, the production cross section in the type I 2HDM becomes negligible. In the type II case, it is comparable to that at low $\tan\beta$ values. In the figure, the red (blue) curves denote the cross section for the type I (II) 2HDM. In the right panel, for $\tan\beta = 1$, the cross sections for the type I and II models are equal, so the two corresponding curves overlap.

Figure 1. Cross section for inclusive $tH^-$ production at the 14 TeV LHC as a function of $\tan\beta$ (left) and of the charged Higgs mass $M_{H^\pm}$ (right). The red and blue curves denote the cross sections for type I and type II 2HDMs. In the right panel, the red and blue solid curves overlap each other for $\tan\beta = 1$.
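As a quick numerical check of the $\tan\beta$ dependence, the sketch below evaluates the coupling factor $m_t^2A_t^2 + m_b^2A_b^2$ from Table 1 and locates its type II minimum on a grid. It assumes $m_t = 173$ GeV and $m_b = 4.18$ GeV. With running quark masses, as used in dedicated cross-section codes, the numerical position of the minimum would shift. The kinematic part of the cross section is a common factor and is omitted.

```python
import math

M_T = 173.0   # GeV, assumed top-quark mass
M_B = 4.18    # GeV, assumed b-quark mass (running masses would move the minimum)

def coupling_factor(tan_beta, model):
    """m_t^2 A_t^2 + m_b^2 A_b^2 with A_t, A_b from Table 1 (constant prefactors dropped)."""
    a_t = 1.0 / tan_beta
    a_b = 1.0 / tan_beta if model == "I" else tan_beta
    return M_T**2 * a_t**2 + M_B**2 * a_b**2

# Scan tan(beta) from 1 to 50 in steps of 0.001 and find the type II minimum.
grid = [1.0 + 0.001 * i for i in range(49001)]
tb_min = min(grid, key=lambda tb: coupling_factor(tb, "II"))
print(f"type II minimum at tan(beta) = {tb_min:.2f}; sqrt(m_t/m_b) = {math.sqrt(M_T / M_B):.2f}")
```

Setting the derivative of $m_t^2/\tan^2\beta + m_b^2\tan^2\beta$ to zero gives $\tan^4\beta = m_t^2/m_b^2$, which the grid scan reproduces.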

Decays of $H^\pm$
The tree-level decay modes of a heavy charged Higgs relevant to our analysis are $H^\pm \to tb$, $W^\pm h$, $W^\pm H$ and $W^\pm A$. Here h, H and A are the light CP-even neutral scalar, the heavy CP-even neutral scalar and the CP-odd scalar, respectively. The partial widths of these channels follow from standard tree-level expressions. The decay width of $H^\pm \to tb$ depends on the parameters $\tan\beta$ and $M_{H^\pm}$. For a type I 2HDM, the branching ratio (BR) of the tb decay mode goes down with increasing $\tan\beta$ as $m_t^2/\tan^2\beta$. For the type II 2HDM, in contrast, it first decreases until $\tan\beta = \sqrt{m_t/m_b}$ and then rises significantly as $m_b^2\tan^2\beta$, as the $tH^-$ production cross section does. The bosonic decays of $H^\pm$ to the CP-even Higgs bosons ($H^\pm \to W^\pm h$ and $H^\pm \to W^\pm H$) depend on the mixing angle $\beta - \alpha$ between h and H. The partial width of the former is proportional to $c^2_{\beta-\alpha}$, and that of the latter to $s^2_{\beta-\alpha}$. The current LHC data prefer the alignment limit of the 2HDM, i.e., $s_{\beta-\alpha} \sim 1$ ($c_{\beta-\alpha} \sim 1$) when h (H) is the light SM-like Higgs. In this limit, the charged Higgs couples very weakly to the SM-like Higgs boson. Unfortunately, $H^\pm$ searches in bosonic decays therefore cannot exploit the invariant mass reconstruction of the SM-like scalar around 125 GeV. The $H^\pm \to W^\pm A$ decay, on the other hand, is not suppressed by any mixing angle and can dominate over the other decays if A is light enough. Moreover, in the alignment limit of the 2HDM, the BR of the $W^\pm A$ mode equals that of the $W^\pm H$ mode if A and H are degenerate in mass.
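For orientation, the bosonic widths can be sketched with the standard tree-level expression

$$\Gamma(H^\pm \to W^\pm\phi) = \frac{G_F}{8\sqrt{2}\,\pi}\,\xi_\phi^2\, M_{H^\pm}^3\,\lambda^{3/2}\!\left(\frac{M_W^2}{M_{H^\pm}^2}, \frac{M_\phi^2}{M_{H^\pm}^2}\right),$$

where $\lambda(x,y) = (1-x-y)^2-4xy$ and $\xi_\phi^2 = 1,\ c^2_{\beta-\alpha},\ s^2_{\beta-\alpha}$ for $\phi = A, h, H$. This is a minimal sketch, not the 2HDMC implementation used in the analysis, and $M_W = 80.4$ GeV is an assumed input. It shows that the $W^\pm A$ width carries no mixing suppression and grows like $M_{H^\pm}^3$.

```python
import math

G_F = 1.1663787e-5   # Fermi constant, GeV^-2
M_W = 80.4           # GeV, assumed W mass

def kallen(x, y):
    """Two-body phase-space function lambda(1, x, y) = (1 - x - y)^2 - 4xy."""
    return (1.0 - x - y) ** 2 - 4.0 * x * y

def width_w_phi(m_hpm, m_phi, mixing2=1.0):
    """Tree-level Gamma(H+ -> W+ phi) in GeV.

    mixing2 = 1 for phi = A, c_{beta-alpha}^2 for phi = h, s_{beta-alpha}^2 for phi = H.
    """
    if m_hpm <= M_W + m_phi:
        return 0.0                                   # channel kinematically closed
    lam = kallen((M_W / m_hpm) ** 2, (m_phi / m_hpm) ** 2)
    return mixing2 * G_F / (8.0 * math.sqrt(2.0) * math.pi) * m_hpm ** 3 * lam ** 1.5

for m_hpm in (500.0, 750.0, 1000.0):
    print(m_hpm, [round(width_w_phi(m_hpm, m_a), 1) for m_a in (100.0, 150.0, 200.0)])
```

For $M_{H^\pm} = 1$ TeV and $M_A = 100$ GeV, this gives about 310 GeV. A heavy charged Higgs with this channel open is therefore a fairly broad resonance.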
For illustration, Fig. 2 shows the branching ratios of the charged Higgs to the tb, $W^\pm h$, $W^\pm H$ and $W^\pm A$ channels for $s_{\beta-\alpha} = 0.9$ and $\tan\beta = 1$ in the type II 2HDM. The BRs are shown as functions of the charged Higgs mass from 100 GeV to 1 TeV. We have chosen $m_h$ = 125 GeV, $m_H$ = 200 GeV and $m_A$ = 100 GeV (200 GeV) in the left (right) panel of Fig. 2. In this work, we mainly focus on the bosonic decays of the charged Higgs to the pseudoscalar, though the same analysis can be extended to the heavy CP-even Higgs. In Fig. 3, we display the product of the charged Higgs production cross section, $\sigma(pp \to tH^-)$, and the branching ratio of $H^\pm \to W^\pm A$ for $\tan\beta = 1$. This product is shown in the plane of the charged Higgs mass and the pseudoscalar mass. All decay widths and branching ratios of the charged and pseudoscalar Higgs bosons are obtained using 2HDMC [31].

Signature and Backgrounds
We study heavy charged Higgs production in association with a top quark at the 14 TeV LHC, followed by the decay $H^\pm \to W^\pm A$. When A is light, both $W^\pm$ and A are highly boosted. The final state thus contains two W bosons, from the top and $H^\pm$ decays, and three b jets, from the A and top decays. In this analysis, we take one of the W bosons to decay leptonically and the other hadronically, leading to the signal $\ell^\pm b b b j j + E_T^{\rm miss} + X$. The largest background comes from the $WWbbj$ process, which includes top pair production plus one jet. It mimics the signal when the light jet is mistagged as a b jet. The irreducible background comes from the $WWbbb$ process, which includes top pair production in association with a b quark. Another background considered in this analysis is the $WWbjj$ process, which has a significant cross section but can be controlled using b identification.
We generate the signal and background events at leading order with MadGraph5 [32]. We then use PYTHIA8.2 [33] for parton showering and hadronization of both signal and background events. All events are then passed to DELPHES3 [34] for fast detector simulation with the default ATLAS detector card. The DELPHES3 output is then used for the jet substructure analysis with FastJet [35].

Benchmark points for the analysis
The choice of benchmark points in this analysis is dictated by the fact that jet substructure methods work best when the mass difference between the charged Higgs and the pseudoscalar boson is large. To demonstrate both the utility and the limitations of the jet substructure analysis, we therefore choose benchmark points with various mass differences between $H^\pm$ and A. We consider three values of the charged Higgs mass, $M_{H^\pm}$ = 500 GeV, 750 GeV and 1 TeV, and three values of the CP-odd Higgs mass, $M_A$ = 100 GeV, 150 GeV and 200 GeV. Thus, in total, we study 9 benchmark points: BP1: (500, 100) GeV, BP2: … We take $M_h$ = 125 GeV and $\tan\beta = 1$ in the analysis. The analysis does not depend on the CP property of the neutral scalar to which $H^\pm$ decays, so it is equally applicable to signals with $H^\pm \to W^\pm h/H$. As mentioned above, the current LHC data prefer the alignment scenario, which gives the pseudoscalar and the heavy CP-even scalar almost equal couplings to the charged Higgs. Including the contribution of H in our analysis may thus further increase the signal cross section and, in turn, improve the signal-to-background ratio.
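The preference for large mass splittings can be made quantitative with a crude estimate. The sketch below treats the charged Higgs as decaying at rest, which is an assumption, since it actually recoils against the top quark. It computes the momentum of A in the two-body decay $H^\pm \to W^\pm A$ and applies the common rule of thumb $\Delta R_{b\bar b} \approx 2M_A/p_T$ for the opening angle of $A \to b\bar b$. $M_W = 80.4$ GeV is an assumed input.

```python
import math

M_W = 80.4  # GeV, assumed W mass

def two_body_momentum(m_parent, m1, m2):
    """Momentum of either daughter in the parent rest frame for parent -> 1 + 2."""
    lam = (1 - ((m1 + m2) / m_parent) ** 2) * (1 - ((m1 - m2) / m_parent) ** 2)
    return 0.5 * m_parent * math.sqrt(max(lam, 0.0))

def delta_r_bb(m_a, p_a):
    """Rule-of-thumb opening angle of A -> b bbar for a boosted A: Delta R ~ 2 m_A / p."""
    return 2.0 * m_a / p_a

# The nine (M_H+-, M_A) combinations of the benchmark grid.
for m_hpm in (500, 750, 1000):
    for m_a in (100, 150, 200):
        p_a = two_body_momentum(m_hpm, M_W, m_a)
        print(f"M_H+-={m_hpm:5d}  M_A={m_a:3d}  p*={p_a:6.1f} GeV  DeltaR_bb~{delta_r_bb(m_a, p_a):.2f}")
```

The two extreme cases behave very differently:

- For (1000, 100) GeV, the estimate gives $\Delta R_{b\bar b} \approx 0.4$, well inside an R = 1.2 fat jet.
- For (500, 200) GeV, it gives $\Delta R_{b\bar b} \approx 2$, so the b pair is often not captured by a single fat jet.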

Framework
Our search strategy relies heavily on the strongly boosted Higgs boson, which enhances the signal-to-background ratio. We start our analysis with the preselection of objects. Jets are reconstructed from the following DELPHES3 objects:

- particle-flow charged tracks, after isolating the charged leptons;
- particle-flow neutral hadrons;
- particle-flow photons.

The fat jets are clustered using the Cambridge-Aachen (CA) algorithm with a jet radius of R = 1.2, in order to capture all the collimated decay products of the boosted bosons. We then apply the BDRS algorithm, which uses the mass-drop technique to identify the substructure inside a reconstructed fat jet. This is followed by filtering. The constituents of the fat jet are reclustered with radius $R_{\rm filt} = \min(0.35, R_{12}/2)$, where $R_{12}$ is the separation between the two subjets found in the mass-drop step. The three hardest subjets are then kept, which suppresses pileup effects¹.
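The two BDRS ingredients named above can be sketched on explicit four-vectors. The function below applies a single mass-drop test to one declustering step $j \to j_1 j_2$. The test requires two things:

- a significant mass drop, $\max(m_{j_1}, m_{j_2}) < \mu\, m_j$;
- a splitting that is not too asymmetric, $y = \min(p_{T,1}^2, p_{T,2}^2)\,\Delta R_{12}^2/m_j^2 > y_{\rm cut}$.

The values $\mu = 0.67$ and $y_{\rm cut} = 0.09$ are the defaults proposed in the original BDRS study. They are assumptions here, since the text does not quote them. A full implementation would iterate the declustering of the CA jet and drop the lighter branch on failure, as the mass-drop tagger shipped with FastJet does. This sketch only evaluates one step and the filtering radius.

```python
import math

def from_pt_eta_phi(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) from collider coordinates."""
    return (pt * math.cosh(eta), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))

def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def pt(p):
    return math.hypot(p[1], p[2])

def rapidity(p):
    e, _, _, pz = p
    return 0.5 * math.log((e + pz) / (e - pz))

def delta_r(a, b):
    dphi = math.atan2(a[2], a[1]) - math.atan2(b[2], b[1])
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi   # wrap into [-pi, pi)
    return math.hypot(rapidity(a) - rapidity(b), dphi)

def mass_drop_step(j1, j2, mu=0.67, ycut=0.09):
    """One BDRS test for j -> j1 + j2 (mu and ycut are assumed BDRS defaults)."""
    j = tuple(a + b for a, b in zip(j1, j2))
    m_j = mass(j)
    heavier = j1 if mass(j1) >= mass(j2) else j2
    significant_drop = mass(heavier) < mu * m_j
    y = min(pt(j1), pt(j2)) ** 2 / m_j ** 2 * delta_r(j1, j2) ** 2
    return significant_drop and y > ycut

def filter_radius(j1, j2):
    """R_filt = min(0.35, R_12 / 2), as quoted in the text."""
    return min(0.35, delta_r(j1, j2) / 2)

# Two hard b subjets with an invariant mass of about 99 GeV.
b1 = from_pt_eta_phi(200.0, 0.0, 0.0)
b2 = from_pt_eta_phi(200.0, 0.0, 0.5)
print(mass_drop_step(b1, b2), round(filter_radius(b1, b2), 3))
```

A very soft second subjet (for example $p_T = 5$ GeV at the same angle) fails the $y_{\rm cut}$ asymmetry requirement. The test is designed to reject such splittings, which are typical of QCD radiation.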
Higgs identification then requires tagging two b jets among the three filtered subjets. We assume a 70% b-tagging efficiency with a 1% mistagging rate for light-flavour and gluon jets [37]. To tag the b jets inside the boosted Higgs jet, both ATLAS [38] and CMS [39] use a "subjet b tagger". It first identifies two subjets within the Higgs jet and then applies the standard b-tagging algorithm to each subjet, with an efficiency similar to that for an isolated b jet. In [40], CMS proposed the "double-b tagger" method. It first identifies the displaced vertices within the Higgs jet and then combines information from these vertices with other jet quantities in a dedicated multivariate algorithm. This method shows a better tagging efficiency than the "subjet b tagger" (see Fig. 3 of Ref. [40]). In our analysis, we use the "subjet b tagger", which both CMS and ATLAS have used in their analyses of boosted Higgs bosons, as mentioned earlier.

¹ In Ref. [36] it has been found that jet grooming techniques such as filtering are quite effective in suppressing pileup and the underlying event. Although we do not include these effects in the simulation, we do perform the filtering in order to include its effect on the signal and background. Note, however, that including pileup and the underlying event in our simulation may worsen the significance attained in the analysis.
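The quoted tagging rates already imply a large suppression factor, as the following estimate shows. It assumes independent tags on the two leading filtered subjets, and treating charm as light is a simplifying assumption:

- a genuine $A \to b\bar b$ jet is double tagged with probability $0.7^2 = 0.49$;
- a jet whose two leading subjets are both light is double tagged with probability $10^{-4}$.

```python
EFF_B = 0.70    # b-tagging efficiency quoted in the text
MISTAG = 0.01   # mistag rate for light-flavour and gluon jets quoted in the text

def tag_prob(flavour):
    """Per-subjet tag probability; anything other than 'b' is treated as light (assumption)."""
    return EFF_B if flavour == "b" else MISTAG

def higgs_tag_prob(subjet_flavours):
    """Probability that the two leading filtered subjets are both b tagged (independent tags)."""
    first, second = subjet_flavours[:2]
    return tag_prob(first) * tag_prob(second)

for flavours in (("b", "b", "g"), ("b", "g", "b"), ("g", "q", "b")):
    print(flavours, higgs_tag_prob(flavours))
```

The ratio of roughly 4900 between the first and last cases illustrates why the double-b requirement is such a strong handle against light-jet backgrounds.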
After a successful Higgs tag, we remove the constituents of the corresponding fat jet from the event. We then recluster the remaining constituents using the anti-$k_T$ algorithm with jet radius R = 0.4. In what follows, we define two signal regions, aimed at two different decay modes:

1. the top decays hadronically while the charged Higgs decays semileptonically (SRI);
2. the reverse (SRII).

In SRII, the leptonic W ($W_{\rm lep}$) comes from the top and is thus likely to be relatively soft, with low $p_T$. In SRI, it comes from a heavy charged Higgs and is likely to be much harder. This can be seen in Fig. 4. In SRI, the hadronic W boson is reconstructed from two narrow light jets. In SRII, the decay products of the hadronic W boson are collimated along its direction, so it appears as a fat jet whose invariant mass peaks around $M_W$. In both signal regions, the longitudinal momentum of the neutrino ($p_{\nu L}$) from $W_{\rm lep}$ can be determined. We use the missing transverse momentum and impose the invariant mass constraint $M_{\ell\nu} = M_{W^\pm}$, which gives

$$p_{\nu L} = \frac{\mu\, p_{\ell L}}{p_{T,\ell}^2} \pm \sqrt{\frac{\mu^2\, p_{\ell L}^2}{p_{T,\ell}^4} - \frac{E_\ell^2\, (E_T^{\rm miss})^2 - \mu^2}{p_{T,\ell}^2}}, \qquad \mu = \frac{M_W^2}{2} + \vec p_{T,\ell} \cdot \vec p_T^{\,\rm miss}.$$

When there are two solutions, we adopt the one that gives $M_{\ell\nu}$ closer to the W mass. We reject events with complex solutions. The four-momentum of $W_{\rm lep}$ is the vector sum of the charged lepton and neutrino momenta. The $p_T$ of the leptonic W boson is used to separate the two signal regions kinematically. An event is assigned to SRI if $p_T(W_{\rm lep}) > 150$, 200 or 250 GeV for $M_{H^\pm}$ = 500 GeV, 750 GeV or 1 TeV, respectively, and to SRII otherwise.

Cut-based Analysis
For comparison with the multivariate analysis performed in the next section, we study the signal efficiency with respect to the background after applying a series of kinematical cuts. … Similarly, for signal region SRII, which has different kinematical characteristics, we study the following variables:

1. the $H_T$ distribution;
2. the invariant mass of the leading fat jet;
3. the invariant mass of the second leading fat jet;
4. the $p_T$ of the leading fat jet;
5. the $p_T$ of the second leading fat jet;
6. the pseudorapidity of the leading fat jet;
7. the pseudorapidity of the second leading fat jet;
8. the $\Delta R$ separation between the two leading fat jets;
9. the mass of the charged Higgs reconstructed from the two leading fat jets, one of which must be a Higgs jet.

These distributions are shown in Figs. 8, 9 and 10 for benchmark points BP3, BP5 and BP9, respectively, in signal region SRII.
After analyzing the kinematical distributions, we devise a set of cuts for each signal region. Below we list the cuts imposed in signal region SRI:

1. Trigger: the trigger includes all the detector acceptance cuts, namely $p_T^{\ell} > 20$ GeV, $|\eta^{\ell}| < 2.5$, $p_T^{j,b} > 20$ GeV and $|\eta^{j,b}| < 2.5$. In addition, we require exactly one charged lepton in each event. The trigger efficiency is around 40-45% for the signal as well as the background.

2. A cut on the transverse momentum of the leptonic W boson: to separate signal region SRI from SRII, we further require the $p_T$ of the reconstructed $W^\pm_{\rm lep}$ to be greater than 150 (250) GeV for a charged Higgs mass of 500 (1000) GeV.

3. A cut on the $H_T$ distribution: for a heavy charged Higgs, the signal $H_T$ distributions are much harder than the background distributions. A stringent $H_T$ cut therefore suppresses the backgrounds strongly and effectively enhances the signal-to-background ratio. We adopt $H_T > 500\,(700)$ GeV for $M_{H^\pm} = 500\,(1000)$ GeV.

4. One Higgs jet: the leading fat jet in an event must be tagged as a Higgs jet. This step requires b tags on the two leading filtered subjets inside the fat jet. This cut turns out to be the most effective in suppressing the background. For the signal, the Higgs-tagging efficiency is much higher for a 1 TeV $H^\pm$ (50%) than for a 500 GeV charged Higgs (only 10%)².

5. A cut on the $p_T$ of the Higgs jet: the Higgs emerges from the decay of a heavy charged Higgs, so it is expected to have a large transverse momentum. To exploit this, we further require $p_T > 200$ GeV for the Higgs jet. This cut halves the background, while the signal is reduced by only 15%.
6. Missing transverse energy: in signal region SRI, the neutrino comes from a heavy $H^\pm$ and is thus expected to carry large $E_T^{\rm miss}$, while for the background the $E_T^{\rm miss}$ is quite small. We select events with $E_T^{\rm miss} > 100$ GeV.

² For a Higgs with $p_T$ between 400 GeV and 800 GeV, ATLAS [38] found the Higgs-tagging efficiency to be around 40%-50%. For a highly boosted Higgs, the efficiency drops sharply because it becomes increasingly difficult to resolve a fat jet into subjets.

In Table 2, we present the cut flow of the efficiencies for the signal (BP3) in signal region SRI and for the different backgrounds. For BP3, the signal cross section is 35 fb, including the contribution of the conjugate process. This value includes the branching ratios of $H^\pm \to W^\pm A$ and $A \to b\bar b$ and is taken before any kinematical cuts. The corresponding total background is 170 pb, after including the BRs of the two W bosons in the process. The table shows that the trigger and the cut on the transverse momentum of the leptonic W boson reduce the background to 6.5%, while the signal is only reduced to 32%. This is expected, as the $W^\pm_{\rm lep}$ bosons in signal region SRI carry large transverse momentum. The other important cuts are the $H_T$ cut and the requirement of at least one Higgs jet in the event. Together they suppress the total background to $O(10^{-4})$ of its initial value, while 4.1% of the signal survives. Subsequent cuts on the $p_T$ of the Higgs jet, on the missing transverse energy and on a mass window around the reconstructed charged Higgs reduce the signal cross section to 0.25 fb. The final total background cross section is 1.7 fb. We thus find a signal significance $S/\sqrt{S+B}$ of 4.1 for this benchmark point with an integrated luminosity of 500 fb$^{-1}$. Even the most difficult scenario in our analysis therefore has reasonable discovery prospects with around 1 ab$^{-1}$ of data.
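The significance figures above follow from simple counting. A minimal sketch, assuming the cut efficiencies stay fixed as the luminosity grows, so that $Z = S/\sqrt{S+B}$ scales as $\sqrt{L}$:

```python
import math

def significance(sigma_s_fb, sigma_b_fb, lumi_fb):
    """S / sqrt(S + B) for cross sections in fb and integrated luminosity in fb^-1."""
    s = sigma_s_fb * lumi_fb
    b = sigma_b_fb * lumi_fb
    return s / math.sqrt(s + b)

def lumi_for_significance(z_target, sigma_s_fb, sigma_b_fb):
    """Luminosity (fb^-1) needed to reach z_target, from Z = sigma_s sqrt(L) / sqrt(sigma_s + sigma_b)."""
    return z_target ** 2 * (sigma_s_fb + sigma_b_fb) / sigma_s_fb ** 2

# BP3 in SRI after all cuts, using the cross sections quoted in the text.
print(round(significance(0.25, 1.7, 500.0), 2))        # 4.0
print(round(lumi_for_significance(5.0, 0.25, 1.7)))    # 780
```

With the rounded cross sections of 0.25 fb and 1.7 fb, this gives about 4.0 at 500 fb$^{-1}$. That agrees with the quoted 4.1 up to the rounding of the inputs, presumably. It also implies that 5σ is reached at roughly 780 fb$^{-1}$, in line with the "around 1 ab$^{-1}$" statement.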
The situation improves further for benchmark point BP9, even though a 1 TeV charged Higgs has a very small production cross section. Table 3 shows the cut flow and signal efficiencies for BP9. The initial signal cross section for this benchmark point is only 5 fb, considerably smaller than the 35 fb of BP3. However, signal and background are well separated for this $H^\pm$ mass, so the signal efficiency is 8.7%, remarkably better than the 0.7% of BP3. This leads to a much larger significance for BP9, which puts this benchmark point within the reach of early LHC data in its 14 TeV run.

Table 4. Cut flow of the efficiencies for signal and backgrounds at the 14 TeV LHC in SRII for $M_{H^\pm}$ = 500 GeV (BP3).

Below we list the cuts imposed in signal region SRII:

1. Trigger: the same detector acceptance cuts as in SRI.
In addition, we require exactly one charged lepton in each event. The trigger efficiency is around 40-45% for the signal as well as the background.

2. A cut on the transverse momentum of the leptonic W boson: to separate signal region SRII from SRI, we require the $p_T$ of the reconstructed $W^\pm_{\rm lep}$ to be smaller than 150 (250) GeV for a charged Higgs mass of 500 (1000) GeV.

3. A cut on the $H_T$ distribution: this cut is the same as in signal region SRI.

4. One Higgs jet: one of the two leading fat jets in an event must be tagged as a Higgs jet. As mentioned earlier, this requires tagging the two subjets inside the fat jet as b jets. In SRII, we find that the Higgs-tagging efficiency for the signal is 50% (10%) for a 1 TeV (500 GeV) $H^\pm$.

5. Cuts on the $p_T$ of the two leading fat jets: in the signal, the decay of a heavy charged Higgs produces two fat jets with high $p_T$, while for the background these are expected to be soft. To exploit this, we require $p_T^{J_1} > 150$ GeV and $p_T^{J_2} > 100$ GeV.

6. Charged Higgs mass window: in signal region SRII, the charged Higgs is reconstructed from the two hardest fat jets, one of which must be tagged as a Higgs jet. Let $M_{J_1J_2}$ be the invariant mass of the two hardest fat jets $J_1$ and $J_2$. As in SRI, we select events with $|M_{J_1J_2} - M_{H^\pm}| < 100\,(200)$ GeV for $M_{H^\pm}$ = 500 GeV (1 TeV).
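The SRII selection listed above can be expressed as a small cut-flow routine. The `Event` record and its field names are hypothetical. They stand in for whatever reconstructed quantities an analysis framework provides. The detector acceptance cuts of the trigger step are assumed to be applied upstream. Only the two mass points whose cut values are quoted in the text are encoded.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Hypothetical per-event summary (illustrative field names, GeV units)."""
    n_leptons: int
    pt_w_lep: float
    ht: float
    pt_j1: float          # leading fat jet
    pt_j2: float          # second leading fat jet
    higgs_tagged: bool    # one of J1, J2 double-b tagged
    m_j1j2: float         # invariant mass of J1 + J2

# SRII cut values quoted in the text for M_H+- = 500 and 1000 GeV.
CUTS = {
    500: dict(w_lep_max=150.0, ht_min=500.0, mass_window=100.0),
    1000: dict(w_lep_max=250.0, ht_min=700.0, mass_window=200.0),
}

def srii_cuts(m_hpm):
    c = CUTS[m_hpm]
    return [
        ("one lepton", lambda e: e.n_leptons == 1),
        ("pT(W_lep) below threshold", lambda e: e.pt_w_lep < c["w_lep_max"]),
        ("H_T", lambda e: e.ht > c["ht_min"]),
        ("Higgs tag", lambda e: e.higgs_tagged),
        ("fat-jet pT", lambda e: e.pt_j1 > 150.0 and e.pt_j2 > 100.0),
        ("mass window", lambda e: abs(e.m_j1j2 - m_hpm) < c["mass_window"]),
    ]

def cut_flow(events, m_hpm):
    """Fraction of events surviving after each successive cut."""
    surviving = list(events)
    n0 = len(surviving)
    flow = []
    for name, keep in srii_cuts(m_hpm):
        surviving = [e for e in surviving if keep(e)]
        flow.append((name, len(surviving) / n0))
    return flow
```

For example, `cut_flow(events, 1000)` returns a list of (cut name, surviving fraction) pairs, which is the same information as a column of Tables 4 and 5.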
We now discuss the effects of the kinematical cuts on benchmark points BP3 and BP9 in signal region SRII. The cut-flow efficiencies of the signal and the various backgrounds are presented in Tables 4 and 5. In signal region SRI, the cut on the $p_T$ of the leptonic W boson is quite stringent and suppresses the total background by a factor of ∼15. In SRII, it is much less effective. However, in SRII another cut proves very effective, namely the cut on the $p_T$ of the second leading fat jet. It suppresses the background by an order of magnitude for BP9. Moreover, the reconstructed charged Higgs invariant mass distribution is much better separated from the background for BP9 than for BP3. Together, these effects suppress the total background significantly better for BP9 than for BP3. Consequently, the search prospects in the early run of the 14 TeV LHC are far better for a 1 TeV charged Higgs (BP9) than for a 500 GeV $H^\pm$ (BP3).

Multivariate Analysis
We can improve the signal-to-background ratio by using multivariate techniques to exploit all discriminating features in the kinematical distributions of signal and backgrounds. For this purpose, we use the Toolkit for Multivariate Data Analysis with ROOT (TMVA) [41], in which various multivariate techniques are implemented in an effective and simple manner. We employ a boosted decision tree (BDT) analysis to obtain better discrimination between signal and backgrounds. BDTs have been shown to perform quickly and effectively for HEP classification problems [42]. Other algorithms within TMVA, such as the Multilayer Perceptron, were also considered but were deemed too slow for the classification accuracy they provided. One major advantage of the BDT algorithm is that it can handle a large number of input kinematical variables. In general, including more discriminating variables improves the separation between signal and background. However, too many variables, especially weakly discriminating ones, might reduce the boosting performance. It is therefore crucial to select the variables that show reasonable discriminating power, so as to maximize the boosting performance. In this regard, we include all the kinematical variables displayed in the previous section.
We first train the BDT with $5\times10^5$ signal and $10^6$ background events. We then perform the testing with the number of events normalized to the integrated luminosity. The BDT uses an ensemble ("forest") of 850 trees with a maximum tree depth of 3, and each leaf node must contain at least 2.5% of the training events. Before passing the events to the BDT, we apply preselection cuts that separate the events into the two signal regions SRI and SRII. The preselection cuts for each signal region are as follows.

Preselection for SRI: events must have one charged lepton $\ell^\pm$, one fat jet ($J_1$) tagged as the Higgs jet, and 3 narrow jets with $p_T^j > 20$ GeV and $|\eta^j| < 2.5$. In addition, we require the transverse momentum of $W_{\rm lep}$ to be greater than 150 GeV, 200 GeV and 250 GeV for charged Higgs masses of 500 GeV, 750 GeV and 1 TeV, respectively.

Preselection for SRII: events must have one charged lepton $\ell^\pm$, two fat jets ($J_1$, $J_2$), one of which is tagged as the Higgs jet, and one narrow jet. These objects are subject to $p_T$ and pseudorapidity requirements set for $M_{H^\pm}$ = 500 GeV. For a charged Higgs of 750 GeV and 1 TeV, we take $p_T^{J_1} > 200$ and 250 GeV, respectively. Similarly, we select events with $p_T^{J_2} > 150$ and 200 GeV for a 750 GeV and a 1 TeV charged Higgs, respectively. In addition, we require the transverse momentum of $W_{\rm lep}$ to be smaller than 150 GeV, 200 GeV and 250 GeV for charged Higgs masses of 500 GeV, 750 GeV and 1 TeV, respectively.

The preselection efficiencies for each signal BP and the backgrounds in SRI and SRII are presented in Table 6. These efficiencies include the b-tagging and mistagging efficiencies. For the heavier charged Higgs masses, the background can be greatly suppressed while the signal efficiencies remain large. For a low charged Higgs mass of 500 GeV, in contrast, the signal and background are less separated. The preselection cuts are therefore less stringent and lead to a large background efficiency. In signal region SRI, a $p_T$ cut on the leptonic W is introduced specifically to select hard leptonic W bosons from the charged Higgs decay. Owing to this cut, the preselection efficiencies for the background events are significantly smaller. The background in SRI is thus expected to be much smaller than in SRII. However, in SRII the kinematical distributions have far richer features owing to the presence of two hard fat jets.
As a result, the MVA discriminates better between signal and background events, giving a reasonable signal significance despite the considerably larger number of background events. The left panels of Figs. 11 and 12 show the BDT distributions for signal and background. The right panels display the signal and background efficiencies together with the signal purity and significance. The top panels correspond to benchmark point BP3, the middle panels to BP5 and the bottom panels to BP9. The signal purity is defined as $S/(S+B)$, the ratio of the signal cross section to the sum of the signal and background cross sections. The signal significance is defined as $S/\sqrt{S+B}$. The plots in the right panels of Figs. 11 and 12 are normalized to integrated luminosities of 100 and 500 fb$^{-1}$, respectively. The figures show that the background efficiency, represented by the red curve, falls more sharply for a 1 TeV $H^\pm$ than for a 500 GeV one. This is because the much larger mass splitting between the pseudoscalar and the charged Higgs is more favourable for the substructure analysis.
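To illustrate the boosting idea, the sketch below trains a discrete AdaBoost ensemble of single-cut decision stumps on a toy two-variable sample. It then scans the BDT output for the cut that maximizes $S/\sqrt{S+B}$ and also reports the purity $S/(S+B)$. It is a simplified stand-in for the TMVA configuration described above:

- stumps replace depth-3 trees, and there are 20 trees instead of 850;
- there is no minimum-node-size constraint;
- AdaBoost is assumed as the boosting scheme;
- the Gaussian toy distributions and cross sections are purely hypothetical.

Evaluating on the training sample, as done here for brevity, is optimistic. The analysis instead tests on luminosity-normalized events.

```python
import math
import random

def stump_predict(row, feature, threshold, polarity):
    """Single-cut decision tree: +1 (signal-like) or -1 (background-like)."""
    return polarity if row[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best single cut."""
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, thr, pol) != yi)
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best

def train_bdt(X, y, n_trees=20):
    """Discrete AdaBoost; labels y are +1 (signal) or -1 (background)."""
    w = [1.0 / len(X)] * len(X)
    forest = []
    for _ in range(n_trees):
        err, f, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)       # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)     # vote weight of this tree
        forest.append((alpha, f, thr, pol))
        # Boosting step: up-weight misclassified events, down-weight the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return forest

def bdt_score(forest, row):
    """Weighted vote of all trees, normalised to [-1, 1]."""
    norm = sum(alpha for alpha, *_ in forest) or 1.0
    return sum(alpha * stump_predict(row, f, thr, pol)
               for alpha, f, thr, pol in forest) / norm

def best_cut(forest, sig_rows, bkg_rows, sigma_s_fb, sigma_b_fb, lumi_fb):
    """Scan the BDT cut; return (cut, S/sqrt(S+B), S/(S+B)) at maximum significance."""
    s_scores = [bdt_score(forest, r) for r in sig_rows]
    b_scores = [bdt_score(forest, r) for r in bkg_rows]
    best = None
    for cut in sorted(set(s_scores)):
        n_s = sigma_s_fb * lumi_fb * sum(s >= cut for s in s_scores) / len(s_scores)
        n_b = sigma_b_fb * lumi_fb * sum(b >= cut for b in b_scores) / len(b_scores)
        z = n_s / math.sqrt(n_s + n_b)
        if best is None or z > best[1]:
            best = (cut, z, n_s / (n_s + n_b))
    return best

# Hypothetical toy sample: (H_T, leading fat-jet mass) in GeV.
rng = random.Random(7)
signal = [(rng.gauss(900, 150), rng.gauss(125, 15)) for _ in range(100)]
background = [(rng.gauss(500, 150), rng.gauss(80, 40)) for _ in range(100)]
X = signal + background
y = [1] * len(signal) + [-1] * len(background)

forest = train_bdt(X, y)
cut, z, purity = best_cut(forest, signal, background,
                          sigma_s_fb=5.0, sigma_b_fb=50.0, lumi_fb=100.0)
print(f"BDT cut {cut:.2f}: significance {z:.1f}, purity {purity:.2f}")
```

Each boosting round refits a cut to the events the previous trees got wrong, so the final score combines many weak cuts into one strong discriminant. Scanning that score mirrors the efficiency, purity and significance curves in the right panels of Figs. 11 and 12.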
In Table 7, we present the statistical significance of the signal benchmark points in the two signal regions SRI and SRII for three integrated luminosities: 100 fb$^{-1}$, 500 fb$^{-1}$ and 1 ab$^{-1}$. We find that a multivariate technique such as a BDT can significantly enhance the discovery prospects of a charged Higgs. Even with an integrated luminosity of 100 fb$^{-1}$, the signal significance exceeds 5 in most cases. A 1 TeV charged Higgs has a very small cross section, yet its detection prospects are comparable to those for lower $H^\pm$ masses because it is well suited to jet substructure analysis.

Conclusions
The discovery of the 125 GeV Higgs-like particle has ushered in a new era in the exploration of electroweak symmetry breaking (EWSB) at the Large Hadron Collider (LHC). Various extensions of the standard model EWSB sector introduce additional scalars into the theory. Any further discovery of scalars would be an unambiguous signal of physics beyond the standard model. In particular, the discovery of a charged Higgs would confirm an extended scalar sector of EWSB. However, even if a charged Higgs were produced copiously at the LHC, the search for it would be complicated by the large background to its dominant decay to tb.
In this work, we study bosonic decays of the charged Higgs that are dominant in certain regions of the parameter space of two Higgs doublet models (2HDMs). More specifically, we focus on the $H^\pm \to W^\pm A$ decay mode, followed by the decay of the pseudoscalar to a pair of b quarks, in associated production of the charged Higgs with a top quark. A charged Higgs lighter than 480 GeV is already ruled out by $b \to s\gamma$ constraints in the type II 2HDM. We therefore consider a heavy charged Higgs with three mass values: 500 GeV, 750 GeV and 1 TeV. We further consider three pseudoscalar masses (100 GeV, 150 GeV and 200 GeV) for each charged Higgs mass. This choice of mass spectrum leads to a highly boosted pseudoscalar Higgs in the final state, emanating from the heavy charged Higgs decay.
To enhance the discovery prospects of a charged Higgs in the heavy mass regime, we employ jet substructure techniques, which play a significant role in tagging a highly boosted Higgs boson. We perform a detailed detector simulation of the 9 signal benchmark points and the background processes. In a simple cut-based analysis, we devise a set of optimized cuts to maximize the signal-to-background ratio. In doing so, we define two signal regions that capture the distinct features of the two decay topologies. The cut-based analysis shows that a heavy charged Higgs of 1 TeV is discoverable at the 14 TeV LHC with a large significance. For a 500 GeV $H^\pm$, however, the signal significance can barely reach 5σ even with 3000 fb$^{-1}$ of integrated luminosity.
Finally, we perform a multivariate analysis incorporating various kinematical variables with large discriminating power between signal and backgrounds. We employ the boosted decision tree technique to enhance the classification performance. We conclude from the MVA that the different distribution profiles of the input variables for signal and background lead to a very high signal efficiency relative to the background. We find that the charged Higgs would be discoverable with only 100 fb$^{-1}$ of data in the heavy mass region.