Strong Double Higgs Production at the LHC

The hierarchy problem and the electroweak data, together, provide a plausible motivation for considering a light Higgs emerging as a pseudo-Goldstone boson from a strongly-coupled sector. In that scenario, the rates for Higgs production and decay differ significantly from those in the Standard Model. However, one genuine strong coupling signature is the growth with energy of the scattering amplitudes among the Goldstone bosons, the longitudinally polarized vector bosons as well as the Higgs boson itself. The rate for double Higgs production in vector boson fusion is thus enhanced with respect to its negligible rate in the SM. We study that reaction in pp collisions, where the production of two Higgs bosons at high pT is associated with the emission of two forward jets. We concentrate on the decay mode hh ->WW^(*)WW^(*) and study the semi-leptonic decay chains of the W's with 2, 3 or 4 leptons in the final states. While the 3 lepton final states are the most relevant and can lead to a 3 sigma signal significance with 300 fb^{-1} collected at a 14 TeV LHC, the two same-sign lepton final states provide complementary information. We also comment on the prospects for improving the detectability of double Higgs production at the foreseen LHC energy and luminosity upgrades.


Introduction
It is clear that, in addition to the four known fundamental forces (gravity, electromagnetism, the weak and the strong interactions), new dynamics must exist in order to account for the observed phenomenon of electroweak symmetry breaking (EWSB). Luckily the state of our knowledge is about to change as the Large Hadron Collider (LHC) is set to directly explore, for the first time in history, the nature of this dynamics. A basic question the LHC will address concerns the strength of the new dynamics: is the force behind EWSB a weak or a strong one? In most regards this question is equivalent to asking whether a light Higgs boson exists or not. This is because in the absence of new states (in particular the Higgs boson) the strength of the interaction among the longitudinally polarized vector bosons grows with energy, becoming strong at around 1 or 2 TeV. The Standard Model (SM) Higgs boson plays instead the role of 'moderator' of the strength of interactions, and allows the model to be extrapolated at weak coupling down to very short distances, possibly down to the Unification or Planck scale [1]. In order to achieve this amazing goal the couplings of the SM Higgs are extremely constrained and predicted in terms of just one new parameter, the mass of the Higgs itself. In such a situation, the SM Higgs is for all practical purposes an elementary particle. However it is also possible, and plausible in some respects, that a light and narrow Higgs-like scalar does exist, but that this particle is a bound state from some strong dynamics not much above the weak scale. In such a situation the couplings of the Higgs to fermions and vector bosons are expected to deviate in a significant way from those in the SM, thus indicating the presence of an underlying strong dynamics. Provided such deviations are discovered, the issue will be to understand the nature of the strong dynamics.
In that perspective the importance of having a well founded, but simple, theoretical picture to study the Higgs couplings at the LHC cannot be overemphasized.
The hierarchy problem and electroweak data, together, provide a plausible motivation for considering a light composite Higgs. It is well known that the absence of an elementary Higgs scalar nullifies the hierarchy problem. Until recently the idea of Higgs compositeness was basically seen as coinciding with the so-called Higgsless limit, where there exists no narrow light scalar resonance. The standard realization of this scenario is given by Technicolor models [2]. However, another possibility, which is now more seriously considered, is that the Higgs, and not just the eaten Goldstone bosons, arises as a naturally light pseudo-Goldstone boson from strong dynamics just above the weak scale [3,4,5,6]. This possibility is preferable over standard Technicolor in view of electroweak precision constraints. The reason is that the electroweak breaking scale $v$ is not fixed to coincide exactly with the strong dynamics scale $f$, as it was for Technicolor. Indeed $v$ is now determined by additional parameters (in explicit models these can be the top Yukawa and the SM gauge couplings) and it is conceivable to have a situation where there is a small separation of scales. As a matter of fact $v \lesssim 0.3 f$ is enough to largely eliminate all tension with the data. The pseudo-Goldstone Higgs is therefore a plausible scenario at the LHC. In that respect one should mention another possibility that was considered recently, where the role of the Higgs is partially played by a composite dilaton, that is the pseudo-Goldstone boson of spontaneously broken scale invariance [7]. This second possibility is less motivated than the previous one as regards electroweak data, in that, like in Technicolor, no parameter exists to adjust the size of $S$ (and $T$). However it makes definite predictions for the structure of the couplings, which are distinguished from the pseudo-Goldstone case.
The existence of the dilaton example suggests that it may be useful to keep a more ample perspective on "Higgs" physics.
The effective Lagrangian for a composite light Higgs was characterized in Ref. [6], also focussing on the pseudo-Goldstone scenario. It was shown that the Lagrangian is described at lowest order by very few parameters and that, in the pseudo-Goldstone case, only two parameters, $c_H$ and $c_y$, are relevant at the LHC. Both parameters modify in a rather restricted way the Higgs production rate and branching ratios. In particular, the parameter $c_H$, which corresponds to the leading non-linearity in the σ-model kinetic term, gives a genuine "strong coupling" signature by determining a growing amplitude for the scattering among longitudinal vector bosons. As seen in the unitary gauge, because of its modified coupling to vectors, the Higgs fails to completely unitarize the scattering amplitude. This is the same σ-model signature one has in Technicolor. The novelty is that the Higgs is also composite, belonging to the σ-model, and thus the same growth with energy is found in the amplitude for $V_L V_L \to hh$ ($V = W, Z$). One signature of this class of models at hadron colliders is therefore a significant enhancement over the (negligible) SM rate for the production of two Higgs bosons at high $p_T$ along with two forward jets associated with the two primary partons that radiated the $V_L V_L$ pair. The goal of the present paper is to study the detectability of this process at the LHC and at its foreseen energy and luminosity upgrades.

General parametrization of Higgs couplings
In this section we will introduce a general parametrization of the Higgs couplings to vectors and fermions. The goal is to describe deviations from the SM in Higgs production and decay.
We are interested in the general situation in which a light scalar $h$ exists in addition to the vectors and the eaten Goldstones associated to the breaking $SU(2) \times U(1)_Y \to U(1)_Q$. By the request of custodial symmetry, the Goldstone bosons describe the coset $SO(4)/SO(3)$ and can be fit into the $2 \times 2$ matrix
$$\Sigma = e^{i \sigma^a \pi^a / v}, \qquad v = 246\ \mathrm{GeV}. \qquad (2.1)$$
By working at sufficiently low energy with respect to any possible strong scale, we can perform a derivative expansion. The leading effects growing with energy arise at the 2-derivative level, and so we truncate our Lagrangian at this order. Moreover we assume that the gauge fields are coupled to the strong sector via weak gauging: the operators involving the field strengths $W_{\mu\nu}$ and $B_{\mu\nu}$ will appear with loop-suppressed coefficients, and we neglect them. Similarly, we assume that the elementary fermions are coupled to the strong sector only via the (proto)-Yukawa interactions, so that the leading effects will not involve derivatives (e.g. operators involving the product of a fermionic and σ-model current will be suppressed). Under these assumptions the most general Lagrangian is (footnote 1)
$$\mathcal{L} = \frac{1}{2}(\partial_\mu h)^2 - V(h) + \frac{v^2}{4}\,\mathrm{Tr}\left(D_\mu \Sigma^\dagger D^\mu \Sigma\right)\left[1 + 2a\,\frac{h}{v} + b\,\frac{h^2}{v^2} + \dots\right] - m_i\,\bar\psi_{Li}\,\Sigma\left(1 + c\,\frac{h}{v} + \dots\right)\psi_{Ri} + \mathrm{h.c.} \qquad (2.2)$$
where $V(h)$ denotes the potential for $h$,
$$V(h) = \frac{1}{2} m_h^2 h^2 + d_3\,\frac{1}{6}\left(\frac{3 m_h^2}{v}\right) h^3 + d_4\,\frac{1}{24}\left(\frac{3 m_h^2}{v^2}\right) h^4 + \dots$$
and $a, b, c, d_3, d_4$ are arbitrary numerical parameters. We have neglected terms of higher order in $h$ (denoted by the dots) as they do not affect the leading $2 \to 2$ processes. For $a = b = c = d_3 = d_4 = 1$ and vanishing higher order terms, the scalar $h$ can be embedded into a linear multiplet and one obtains the SM Higgs doublet Lagrangian. The role of $a$, $b$ and $c$ in $2 \to 2$ processes is easily seen by working in the equivalent Goldstone boson approximation [8], according to which longitudinal vector bosons can be replaced by the corresponding Goldstone bosons at high energy, $V^i_L \leftrightarrow \pi^i$. The parameter $a$ controls the strength of the $V_L V_L \to V_L V_L$ scattering ($V = W, Z$), see Fig. 1 (upper row). At the two-derivative level the Goldstone scattering amplitude is
$$A(\pi^i \pi^j \to \pi^k \pi^l) = \delta^{ij}\delta^{kl} A(s) + \delta^{ik}\delta^{jl} A(t) + \delta^{il}\delta^{jk} A(u), \qquad A(s) \simeq \frac{s}{v^2}\,(1 - a^2),$$
where subleading terms in $(M_W^2/s)$ have been omitted. Perturbative unitarity is thus satisfied for $a = 1$. The parameter $b$ instead controls the process $V_L V_L \to hh$, see Fig. 1 (lower row),
$$A(\pi^i \pi^j \to hh) \simeq \delta^{ij}\,\frac{s}{v^2}\,(b - a^2). \qquad (2.7)$$
In this case perturbative unitarity is satisfied for $b = a^2$. Notice that an additional contribution from the s-channel Higgs exchange via the trilinear coupling $d_3$ has been omitted because it is subleading at high energy. In fact, as will be shown in the following sections, in a realistic analysis of double Higgs production at the LHC such a contribution can be numerically important and lead to a significant model dependency. Finally the parameter $c$ controls the $V_L V_L \to \psi\bar\psi$ amplitude, which is weak for $ac = 1$. Hence, as is well known, only for the SM choice of parameters $a = b = c = 1$ is the theory weakly coupled at all scales.
Footnote 1: In general $c$ can be a matrix in flavor space, but in the following we will assume for simplicity that it is proportional to unity in the basis in which the mass matrix is diagonal. In this way no flavor-changing neutral current effects originate from the tree-level exchange of $h$.
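As a rough numerical illustration of the energy growth discussed above, the following minimal sketch evaluates the leading high-energy terms of the Goldstone amplitudes. It assumes the forms $A(s) \simeq (s/v^2)(1 - a^2)$ and $(s/v^2)(b - a^2)$ with the isospin structure stripped off; the function names are hypothetical and chosen only for this example.

```python
V_EW = 246.0  # electroweak scale v in GeV, as quoted in Eq. (2.1)


def amp_vv_to_vv(s, a):
    """Leading term A(s) = (s / v^2) * (1 - a^2); assumed form, O(M_W^2/s) dropped."""
    return s / V_EW**2 * (1.0 - a**2)


def amp_vv_to_hh(s, a, b):
    """Leading term (s / v^2) * (b - a^2) of Eq. (2.7), isospin delta stripped."""
    return s / V_EW**2 * (b - a**2)


if __name__ == "__main__":
    # Illustrative coupling values only; they are not taken from the paper.
    for sqrt_s in (500.0, 1000.0, 2000.0):
        s = sqrt_s**2
        print(sqrt_s, amp_vv_to_vv(s, a=0.9), amp_vv_to_hh(s, a=0.9, b=0.6))
```

Both functions vanish identically on the perturbative-unitarity conditions quoted in the text ($a = 1$ and $b = a^2$ respectively), and otherwise grow linearly with $s$.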
From the above general perspective, the study of $VV \to VV$, $VV \to hh$ and $VV \to \psi\bar\psi$ tests three different parameters. However, in specific models $a$, $b$ and $c$ can be related to each other. For instance in the pseudo-Goldstone Higgs models based on the coset $SO(5)/SO(4)$ [4,5], indicating by $f$ the decay constant of the σ-model and defining $\xi \equiv v^2/f^2$, one has
$$a = \sqrt{1 - \xi}, \qquad b = 1 - 2\xi, \qquad (2.9)$$
$$c = \sqrt{1 - \xi} \quad \text{(fermions in spinorial representations of } SO(5)\text{)}, \qquad (2.10)$$
$$c = \frac{1 - 2\xi}{\sqrt{1 - \xi}} \quad \text{(fermions in fundamental representations of } SO(5)\text{)}. \qquad (2.11)$$
By expanding the above equations at small $\xi$, the result matches the general expressions obtained by using the Strongly Interacting Light Higgs (SILH) Lagrangian in the notation of Ref. [6]
$$a = 1 - \frac{c_H}{2}\,\xi, \qquad b = 1 - 2 c_H\,\xi, \qquad c = 1 - \left(\frac{c_H}{2} + c_y\right)\xi. \qquad (2.12)$$
In particular, fermions in the spinorial (fundamental) representations of $SO(5)$ correspond to $c_y = 0$ ($c_y = 1$). Notice however that the SILH parametrization applies more generally to a light composite $SU(2)_L$ Higgs doublet, regardless of whether it has a pseudo-Goldstone boson interpretation. The prediction for $d_3$ and $d_4$ is more model dependent, as it relies on the way the Higgs potential is generated. As benchmark values for the trilinear coupling $d_3$ we consider those predicted in the $SO(5)/SO(4)$ minimal models of Ref. [4] (MCHM4) and Ref. [5] (MCHM5), respectively with spinorial and fundamental fermion representations, where the Higgs potential is entirely generated by loops of SM fields (footnote 2):
$$d_3 = \sqrt{1 - \xi} \quad \text{(MCHM4)}, \qquad (2.13)$$
$$d_3 = \frac{1 - 2\xi}{\sqrt{1 - \xi}} \quad \text{(MCHM5)}. \qquad (2.14)$$
Footnote 2: The singularity for $\xi \to 1$ in Eqs. (2.11) and (2.14) appears because this limit is approached by keeping the mass of the Higgs and of the fermions fixed.
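The relations between the $SO(5)/SO(4)$ couplings and their small-$\xi$ SILH limit can be checked numerically. The sketch below assumes the closed forms $a = \sqrt{1-\xi}$, $b = 1 - 2\xi$, with $c = d_3 = \sqrt{1-\xi}$ in the MCHM4 and $c = d_3 = (1-2\xi)/\sqrt{1-\xi}$ in the MCHM5; the function names are hypothetical.

```python
import math


def mchm_couplings(xi, model):
    """Return (a, b, c, d3) for the MCHM4 or MCHM5 benchmark at a given xi = v^2/f^2."""
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    root = math.sqrt(1.0 - xi)
    a = root
    b = 1.0 - 2.0 * xi
    if model == "MCHM4":      # spinorial fermion representations
        c = d3 = root
    elif model == "MCHM5":    # fundamental fermion representations
        if xi == 1.0:
            raise ValueError("c and d3 are singular at xi = 1 in the MCHM5")
        c = d3 = (1.0 - 2.0 * xi) / root
    else:
        raise ValueError("model must be 'MCHM4' or 'MCHM5'")
    return a, b, c, d3


def silh_couplings(xi, c_h, c_y):
    """Small-xi SILH expressions for (a, b, c), Eq. (2.12)."""
    return (1.0 - 0.5 * c_h * xi,
            1.0 - 2.0 * c_h * xi,
            1.0 - (0.5 * c_h + c_y) * xi)


if __name__ == "__main__":
    for xi in (0.01, 0.5, 0.8):
        print(xi, mchm_couplings(xi, "MCHM4"), mchm_couplings(xi, "MCHM5"))
```

In both benchmarks the combination $a^2 - b$ equals $\xi$, so the coefficient of the energy-growing $V_L V_L \to hh$ amplitude in Eq. (2.7) is directly the ratio of scales.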
Another, distinct example arises when $h$ represents the dilaton from spontaneously broken scale invariance. There one obtains a different relation among $a$, $b$ and $c$. Indeed the dilaton case corresponds to the choice $a^2 = b = c^2$ with the derivative terms in the Lagrangian exactly truncated at quadratic order in $h$. For this choice one can define the dilaton decay constant by $v/a \equiv f_D$, the dilaton field as
$$e^{\phi/f_D} = 1 + \frac{h}{f_D}, \qquad (2.15)$$
and the Lagrangian can be rewritten as [7]
$$\mathcal{L} = e^{2\phi/f_D}\left[\frac{1}{2}(\partial_\mu \phi)^2 + \frac{v^2}{4}\,\mathrm{Tr}\left(D_\mu \Sigma^\dagger D^\mu \Sigma\right)\right] - m_i\,\bar\psi_{Li}\,\Sigma\,e^{\phi/f_D}\,\psi_{Ri} + \mathrm{h.c.}$$
Notice that in the case of a SILH all the amplitudes of the three processes discussed above grow with the energy. On the other hand, in the dilaton case the relation $a^2 = b$ ensures that the amplitude for $VV \to hh$ does not feature the leading growth $\propto s$. The wildly different behaviour of the process $VV \to hh$ is what distinguishes the case of a genuine, but otherwise composite, Higgs from a light scalar, the dilaton, which is not directly linked to the breakdown of the electroweak symmetry. Another difference worth pointing out between the specific case of a pseudo-Goldstone Higgs and a dilaton or a composite non-Goldstone Higgs has to do with the range of $a, b, c$. In the case of a pseudo-Goldstone Higgs one can prove in general that $a, b < 1$ [9], while all known models also satisfy $c < 1$. Instead one easily sees that in the dilaton case, depending on whether $f_D > v$ or $f_D < v$, one respectively has $a, b, c < 1$ or $a, b, c > 1$.
In general the couplings $a, b, c$ also parametrize deviations from the SM in the Higgs branching ratios. However, for the specific case of the dilaton the relative branching ratios into vectors and fermions are not affected. Instead, for loop-induced processes like $h \to \gamma\gamma$ or $gg \to h$, deviations of order 1 with respect to the Standard Higgs occur due to the trace anomaly contribution [10]. Similarly, in the pseudo-Goldstone Higgs case with matter in the spinorial representation, the dominant branching ratios to fermions and vectors are not affected. On the other hand, in the case with matter in the fundamental representation the phenomenology can be dramatically changed when $\xi \sim O(1)$. From Eqs. (2.9) and (2.11), the tree-level partial widths into vectors and fermions are rescaled with respect to their SM values by $a^2 = 1 - \xi$ and $c^2 = (1 - 2\xi)^2/(1 - \xi)$ respectively.
Figure 2: Higgs decay branching ratios as a function of $\xi$ for SM fermions embedded into fundamental representations of $SO(5)$ for two benchmark Higgs masses: $m_h = 120$ GeV (left plot) and $m_h = 180$ GeV (right plot). For $\xi = 0.5$, the Higgs is fermiophobic, while in the Technicolor limit, $\xi \to 1$, the Higgs becomes gaugephobic.
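The fermiophobic and gaugephobic limits can be made explicit with a short sketch. It assumes that, for fermions in the fundamental representation, tree-level partial widths into vectors scale as $a^2 = 1 - \xi$ and those into fermions as $c^2 = (1-2\xi)^2/(1-\xi)$; loop-induced channels are ignored. The function names and the toy input widths are hypothetical and are not SM values.

```python
def width_scale_factors(xi):
    """Return (vector, fermion) rescaling factors a^2 and c^2 in the fundamental case."""
    if not 0.0 <= xi < 1.0:
        raise ValueError("xi must satisfy 0 <= xi < 1")
    vector = 1.0 - xi
    fermion = (1.0 - 2.0 * xi) ** 2 / (1.0 - xi)
    return vector, fermion


def rescaled_branching_ratios(sm_widths, xi):
    """Rescale tree-level widths and renormalize them to branching ratios.

    sm_widths maps a channel name to (width, kind), kind being 'vector' or 'fermion'.
    """
    vector, fermion = width_scale_factors(xi)
    scaled = {}
    for channel, (width, kind) in sm_widths.items():
        if kind not in ("vector", "fermion"):
            raise ValueError("kind must be 'vector' or 'fermion'")
        scaled[channel] = width * (vector if kind == "vector" else fermion)
    total = sum(scaled.values())
    return {channel: width / total for channel, width in scaled.items()}


if __name__ == "__main__":
    # Toy widths in arbitrary units, chosen only to exercise the function.
    toy = {"WW": (1.0, "vector"), "bb": (3.0, "fermion")}
    for xi in (0.0, 0.25, 0.5, 0.8):
        print(xi, rescaled_branching_ratios(toy, xi))
```

At $\xi = 0.5$ every fermionic channel drops out, which is the fermiophobic point mentioned in the caption of Fig. 2.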
One final remark must be made concerning the indirect constraints that exist on $a, b, c$. As stressed by the authors of Ref. [11], the parameter $a$ is constrained by the LEP precision data: modifying the Higgs coupling to the SM vectors changes the one-loop infrared contribution to the electroweak parameters $\epsilon_{1,3}$ by an amount
$$\Delta\epsilon_{1,3} = c_{1,3}\,(1 - a^2)\,\log\frac{\Lambda^2}{m_h^2}, \qquad c_1 = -\frac{3}{16\pi}\,\frac{\alpha(M_Z)}{\cos^2\theta_W}, \qquad c_3 = \frac{1}{12\pi}\,\frac{\alpha(M_Z)}{4\sin^2\theta_W},$$
and $\Lambda$ denotes the mass scale of the resonances of the strong sector. For example, assuming no additional corrections to the precision observables and setting $m_h = 120$ GeV, $\Lambda = 2.5$ TeV, one obtains $0.8 \lesssim a^2 \lesssim 1.5$ at 99% CL. However, such a constraint can become weaker (or stronger) in the presence of additional contributions to $\epsilon_{1,3}$. For that reason in our analysis of double Higgs production we will keep an open mind on the possible values of $a$. On the other hand, no indirect constraint exists on the parameters $b, c$, thus leaving open the possibility of large deviations from perturbative unitarity in the $VV \to hh$ and $VV \to \psi\bar\psi$ scatterings.

Anatomy of
The key feature of strong electroweak symmetry breaking is the occurrence of scattering amplitudes that grow with the energy above the weak scale. We thus expect them to dominate over the background at high enough energy. Indeed, with no Higgs to unitarize the amplitudes, on dimensional grounds, and by direct inspection of the relevant Feynman diagrams, one estimates [12]
$$A(V_T V_T \to V_T V_T) \sim g^2 f(t/s), \qquad A(V_L V_L \to V_L V_L) \sim \frac{s}{v^2}, \qquad (3.1)$$
with $f(t/s)$ a rational function which is $O(1)$, at least formally, in the central region $-t = O(s)$. Then, according to the above estimates, in the central region we have
$$\left.\frac{d\sigma_{LL \to LL}/dt}{d\sigma_{TT \to TT}/dt}\right|_{t \sim -s/2} = N_h\,\frac{s^2}{M_W^4}, \qquad (3.2)$$
where $N_h$ is a numerical factor expected to be of order 1. On the other hand, $f(t/s)$ has simple Coulomb poles in the forward region, due to $t$- and $u$-channel vector exchange. Then, after imposing a cut (footnote 3) $-s + Q^2_{\min} < t < -Q^2_{\min}$, with $M_W^2 \ll Q^2_{\min} \ll s$, the expectation for the integrated cross sections is
$$\frac{\sigma_{LL \to LL}(Q_{\min})}{\sigma_{TT \to TT}(Q_{\min})} = N_s\,\frac{s\,Q^2_{\min}}{M_W^4}. \qquad (3.3)$$
Here again $N_s$ is a numerical factor expected to be of order 1. By the above estimates, we expect the longitudinal cross section, both the hard one and the more inclusive one, to become larger than the transverse cross section right above the vector boson mass scale.
Figure 3: The full set of diagrams for $qq \to WWqq$ at order $g_W^4$. The blob indicates the sum of all possible $WW \to WW$ subdiagrams. It is understood that the bremsstrahlung diagrams (second and third diagrams) correspond to all possible ways to attach an outgoing $W$ to the quark lines.
In reality the situation is more complicated because, since we do not possess on-shell vector boson beams, the $V$'s first have to be radiated from the colliding protons. The physics of vector boson scattering is then reproduced more accurately the closer to on-shell the internal vector boson lines are, see Fig. 3. This is the limit in which the process factorizes into the collinear (slow) emission of virtual vector bosons à la Weizsäcker-Williams and their subsequent hard (fast) scattering [13,14]. As evident from the collision kinematics, the virtuality of the vector bosons is of the order of the $p_T$ of the outgoing quarks. Thus the interesting limit is the one where the transverse momentum of the two spectator jets is much smaller than the other relevant scales, in particular when
$$p_{T\,\mathrm{jet}} \ll p_{T\,W},$$
where $p_{T\,W}$ and $p_{T\,\mathrm{jet}}$ respectively represent the transverse momenta of the outgoing vector bosons and jets. In this kinematical region, the virtuality of the incoming vector bosons can be neglected with respect to the virtuality that characterizes the hard scattering subdiagrams. Then the cross section can be written as a convolution of vector boson distribution functions with the hard vector cross section. It turns out that the densities for transverse and longitudinal polarizations have different sizes, and this adds an extra relative factor in the comparison schematized above. In particular, the emission of transverse vectors is logarithmically distributed in $p_T$, like for the Weizsäcker-Williams photon spectrum. Thus the transverse parton splitting function is [13]
$$P_T(z) = \frac{g_A^2 + g_V^2}{4\pi^2}\,\frac{1 + (1 - z)^2}{2z}\,\log\left[\frac{\bar p_T^{\,2}}{(1 - z) M_W^2}\right],$$
where $z$ indicates the fraction of energy carried by the vector boson and $\bar p_T$ is the largest value allowed for $p_T$. On the other hand, the emission of longitudinal vectors is peaked at $p_T \sim M_W$ and shows no logarithmic enhancement when allowing large $p_T$ [13]
$$P_L(z) = \frac{g_A^2 + g_V^2}{4\pi^2}\,\frac{1 - z}{z}.$$
Hence, by choosing a cut $p_{T\,\mathrm{jet}} < \bar p_T$ with $\bar p_T \gg M_W$, the cross section for transverse vectors is enhanced due to their luminosity by a factor $(\ln \bar p_T/M_W)^2$. For reasonable cuts this is not a very important effect though. At least, it is less important than the numerical factors $N_h$ and $N_s$ that come out from the explicit computation of the hard cross section, and which we shall analyze in a moment.
Footnote 3: The offshellness of the $W$'s radiated by the quarks in fact provides a natural cut on $|t|$ and $|u|$ of the order of $p^4_{T\,\mathrm{jet}}/s$. Nevertheless, the total inclusive cross section is dominated by soft physics and does not probe the dynamics of EW symmetry breaking.
One last comment concerns the subleading corrections to the effective vector boson approximation (EWA). On general grounds, we expect the corrections to be controlled by the ratio $p^2_{T\,\mathrm{jet}}/p^2_{T\,W}$, that is the ratio between the virtuality of the incoming vector lines and the virtuality of the hard $VV \to VV$ subprocess (footnote 4). In particular, both in the fully hard region $p_{T\,\mathrm{jet}} \sim p_{T\,W} \sim \sqrt{s}$ and in the forward region $p_{T\,\mathrm{jet}}, p_{T\,W} \lesssim m_W$ we expect the approximation to break down. In these other kinematic regions, the contribution of the other diagrams in Fig. 3 is not only important but essential to obtain a physically meaningful gauge-independent result [14]. For the process $qq \to qq V_T V_T$, when the cross section is integrated over $p_{T\,\mathrm{jet}}$ up to $\bar p_T$ the subleading corrections to the EWA are only suppressed by $1/(\ln \bar p_T/M_W)$. This is the same log that appears in $P_T(z)$. The process $qq \to qq V_T V_T$ is not significantly affected by a strongly coupled Higgs sector. On the other hand, in the presence of a strongly coupled Higgs sector, for $qq \to qq V_L V_L$ the EWA is further enhanced with respect to subleading effects because of the underlying $V_L V_L \to V_L V_L$ strong subprocess. Indeed, by applying the axial gauge analysis of Ref. [15], one finds that, independent of the cut on $p_{T\,\mathrm{jet}}$, the subleading effects to the EWA are suppressed by at least $M_W^2/p^2_{T\,W}$.
Having made the above comments on vector boson scattering in hadron collisions, let us now concentrate on the partonic process. We will illustrate our point with the example of the $W^+ W^+ \to W^+ W^+$ process (similar results can be obtained for the other processes) in the case of the composite pseudo-Goldstone Higgs, where for $a \neq 1$ the longitudinal scattering is dominated at large energies by the (energy-growing) contact interaction. Let us write the amplitude at large energy as in Eq. (3.7), where the $A$'s are numerical constants which take different values for the different polarization channels (see Table 1). The coefficients $A_{t,u}$ are easily computed in the eikonal approximation and are directly related to the electric and $SU(2)_L$ charges of the $W$'s. Since $U(1)_{em}$ is unbroken, the longitudinal and transverse $W$'s have the same electric charge $e$, but their $SU(2)_L$ charges are different: the charge of the transverse $W$'s, $g c_W$, is directly obtained from the triple point interaction $W^+ W^- Z$, whereas the charge of the longitudinal $W$, $g (c_W^2 - s_W^2)/(2 c_W)$, can be deduced from the coupling of the $Z$ to the Goldstones $\pi^\pm$ of the Higgs doublet.
Footnote 4: In fact, another kinematic parameter controlling the approximation is given by the invariant mass of the $W$ + jet subsystem, $m^2_{JW} = (p_W + p_{\mathrm{jet}})^2$. In the region $m^2_{JW} \ll s$ the bremsstrahlung diagrams are enhanced by a collinear singularity. In a realistic experimental situation this region is practically eliminated by a cut on the relative angle between the jet and the (boosted) decay products of the $W$'s.
The energy-growing term in Eq. (3.7) has a non-vanishing coefficient $A_s$ only for the scattering of longitudinal modes (and $a \neq 1$), in which case it dominates the differential cross section at large $s$ and for $|t|, |u| > Q^2_{\min} \gg M_W^2$ ($s \gg Q^2_{\min}$). On the other hand, the scattering of transverse modes is dominated by the forward $t$- and $u$-poles, Eq. (3.10), and the ratio of the longitudinal to transverse cross section corresponds to a numerical factor $N_s \sim 1/500$! By using Table 1 one can directly check that this factor simply originates from a pile-up of trivial effects (factors of 2) in the amplitudes. Interestingly, this numerical enhancement occurs for the $TT \to TT$ and $LT \to LT$ scattering channels, as clearly displayed by the left plot of Fig. 4, while it is absent in $TT \to LL$ (this latter channel is not shown in Fig. 4 because its cross section is much smaller than the others).
Of course the best way to test hard vector boson scattering is to go to the central region where the 'background' from the Coulomb singularity of $Z$ and $\gamma$ exchange is absent. Figure 5 reports the ratio of the differential cross sections as a function of $t$ both for $a = 0$ (left plot) and $a = 1$ (right plot). It shows that even for exactly central $W$'s ($t = -s/2$) the ratio is still smaller than its naive estimate, the suppression factor being $N_h \sim 4 \times 10^{-4}$ for $a = 0$. The origin of this numerical (as opposed to parametric) suppression is in the value of the coefficients $A_i$ entering the various scattering channels. Indeed, for $t = -s/2$ Eq. (3.7) simplifies to Eq. (3.12), which leads to the differential cross sections of Eq. (3.13). Hence, for $a = 0$, the ratio of the longitudinal to transverse differential cross sections at $t = -s/2$ takes the form $N_h\, s^2/M_W^4$ with an amazingly small numerical factor $N_h = 1/2304$, again resulting from a pile-up of 'factors of 2'. In Eq. (3.13) we detailed the contribution of the non-vanishing polarization channels to the transverse scattering cross section (the dominant channels are $++ \to ++$ and its complex conjugate). The result of our analysis is synthesized in the plots of Fig. 4. For the hard cross section (right plot, with $-3/4 < t/s < -1/4$) the signal wins over the SM background at $\sqrt{s} \sim 600$ GeV ($a = 0$), while for the inclusive cross section (left plot, with $-s + 4M_W^2 < t < -M_W^2$) one must even go above 1 TeV. This is consistent with the different $s$ dependence displayed in Eqs. (3.3) and (3.2). These scales are both well above $M_W$ due to the big numerical factors $N_{h,s}$. Of course the interesting physical phenomenon, hard scattering of two longitudinal vector bosons, is better isolated in the hard cross section, but at the price of an overall reduction of the rate.
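The size of the numerical suppression can be turned into an energy scale with a one-line estimate. The sketch below assumes that the central-region ratio of longitudinal to transverse differential cross sections is exactly $N_h\, s^2/M_W^4$ with $N_h = 1/2304$ (the $a = 0$ value quoted above) and solves for the energy at which it equals one. The value $M_W = 80.4$ GeV and the function names are assumptions introduced for this example.

```python
M_W = 80.4          # W boson mass in GeV (assumed value, not quoted in the text)
N_H = 1.0 / 2304.0  # numerical factor quoted in the text for a = 0


def central_ratio(sqrt_s, n_h=N_H, m_w=M_W):
    """Ratio N_h * s^2 / M_W^4 at t = -s/2, under the assumed scaling."""
    return n_h * sqrt_s**4 / m_w**4


def crossover_sqrt_s(n_h=N_H, m_w=M_W):
    """Center-of-mass energy (GeV) at which the assumed ratio equals one."""
    return m_w * (1.0 / n_h) ** 0.25


if __name__ == "__main__":
    print(crossover_sqrt_s())
    print(central_ratio(1000.0))
```

With these inputs the crossover lands between 550 and 560 GeV, in the same range as the $\sqrt{s} \sim 600$ GeV read off the hard cross section in Fig. 4, which is integrated over $-3/4 < t/s < -1/4$ rather than evaluated at a single point.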
It is this numerical accident that makes the study of strong vector boson scattering difficult at the LHC. The center of mass energy $m_{WW}$ of the vector boson system must be $\gtrsim 1$ TeV in order to have a significant enhancement over the $TT \to TT$ background. But, taking into account the $\alpha_W$ price to radiate a $W$, $m_{WW} \sim 1$ TeV is precisely where the $W$ luminosity runs out of steam. This situation is depicted in Fig. 6. The case $a = 0$ corresponds to the Higgsless case already studied in Refs. [16,17,18,19]. Our result, in spite of the different cuts, basically agrees with them: the cut in energy necessary to win over the $TT$ background reduces the cross section down to $\sigma(pp \to jj W^\pm_L W^\pm_L) \sim 2.5$ fb. Remarkably, a collider with a center of mass energy increased by about a factor of 2 would do much better than the LHC. But this is an old story.
Figure 6: The differential cross section for $pp \to W^\pm W^\pm jj$ as a function of the invariant mass of the $WW$ pair, for different choices of the outgoing $W$ helicities. All curves have been obtained by using MadGraph and imposing the following cuts: $M_{jj} > 500$ GeV, $p_{Tj} < 120$ GeV, $p_{TW} > 300$ GeV. The cut on $p_{Tj}$ exploits the forward jets always present in the signal. The cut on $p_{TW}$ eliminates the forward region where the cross section is (trivially) dominated by the $Z$ and $\gamma$ $t$-channel exchange.

$VV \to hh$ scattering
As illustrated by Fig. 7, the situation is quite different for the $WW \to hh$ scattering. Here there is no equivalent of a fully transverse scattering channel, as the Higgs itself can be considered as a 'longitudinal' mode, being the fourth Goldstone from the strong dynamics. The scatterings $W_T W_T \to hh$ and $W_L W_T \to hh$ never dominate over $W_L W_L \to hh$. As previously, at large energy ($s \gg M_W^2$ with fixed $t$ and $u$) the amplitude for the various polarization channels can be decomposed as in Eq. (3.15), where the numerical constants $A$ are given in Table 2. The only scattering channel which can have in principle a Coulomb enhancement is also the one with the energy-growing interaction, i.e. the longitudinal to Higgs channel. Furthermore, after deriving the differential cross sections for $s \gg v^2$, Eq. (3.16), one finds that in this case the naive estimate works well, and the onset of strong scattering is at energies $\sqrt{s} \approx g v$. Notice that the differential cross sections in Eq. (3.16) are almost independent of $t$, except in the very forward/backward regions where the longitudinal channels can be further enhanced by the $W$ exchange.
Figure 7: [...] This choice of $Q^2_{\min}$ is compatible with the kinematical constraint close to threshold energies and coincides with the cut applied in the right plot of Fig. 4 for $s \gg m_h^2$ (as $Q^2_{\min} \to s/4$). Notice that, differently from $WW \to WW$, the ratio of longitudinal over transverse scattering is not particularly enhanced by the cut. The behavior of the amplitudes near threshold is sensitive to the cubic self-coupling $d_3$ controlling the s-channel Higgs exchange. The continuous and dotted $LL \to hh$ curves respectively correspond to the MCHM4 and MCHM5 models with $\xi = (a^2 - b)$ and $d_3$ as given in Eqs. (2.13) and (2.14).
Table 2: $W^+ W^- \to hh$ scattering: coefficients for the decomposition of the amplitude according to Eq. (3.15). By crossing and complex conjugation there are only 4 independent polarization channels, one of which has vanishing coefficients and is not shown. When computing the cross section each channel has to be weighted by the corresponding multiplicity factor reported in the third column.
A final remark concerns the behavior of the $W_L W_L \to hh$ cross section close to threshold energies. While at $s \gg v^2$ the cross section only depends on $(a^2 - b)$, as expected from the estimate performed in the previous section using the Goldstone boson approximation, at smaller energies there is a significant dependence on the value of the trilinear coupling $d_3$. This is clearly shown in Fig. 7, where the continuous and dotted curves respectively correspond to the MCHM4 and MCHM5 models with $\xi = (a^2 - b)$ and $d_3$ as given in Eqs. (2.13) and (2.14). As we will see in the next sections, such model dependency is amplified by the effect of the parton distribution functions and significantly affects the total rate of signal events at the LHC, unless specific cuts are performed to select events with a large $M_{hh}$ invariant mass.

The analysis
In this section we discuss the prospects to detect the production of a pair of Higgs bosons associated with two jets at the LHC. If the Higgs decays predominantly to $b\bar b$, we have verified that the most important signal channel, $hhjj \to b\bar b b\bar b jj$, is completely hidden by the huge QCD background. We thus concentrate on the case in which the decay mode $h \to WW^{(*)}$ is large, and consider the final state $hhjj \to WW^{(*)} WW^{(*)} jj$. As shown in Section 2, see Fig. 2, if the Higgs couplings to fermions are suppressed compared to the SM prediction, the rate to $WW^{(*)}$ can dominate over $b\bar b$ even for light Higgses. In our analysis we have set $m_h = 180$ GeV and considered as benchmark models the $SO(5)/SO(4)$ MCHM4 and MCHM5 discussed in the previous sections. All the values of the Higgs couplings are thus controlled by the ratio of the electroweak and strong scales $\xi = (v/f)^2$, see Eqs. (2.9), (2.10), (2.11), (2.13) and (2.14). As anticipated, the two different models do not simply lead to different predictions for the Higgs decay fractions, but also to different $pp \to hhjj$ production rates as a consequence of the distinct predictions for the Higgs cubic self-coupling $d_3$. For example, for $m_h = 180$ GeV the two models predict different $pp \to hhjj$ cross sections once the acceptance cuts of Eq. (4.2) have been imposed on the two jets. Values of the signal cross section for the various final state channels will be reported in the following subsections for $\xi = 1, 0.8, 0.5$ in the MCHM4 and $\xi = 0.8, 0.5$ in the MCHM5. We do not consider $\xi = 1$ in the MCHM5 because the branching ratio $h \to WW^{(*)}$ vanishes in this limit. Notice that the coupling $hWW$ formally vanishes for $\xi \to 1$ in both models, but in the MCHM4 all couplings are rescaled in the same way, so that the branching ratio $h \to WW^{(*)}$ stays fixed at its SM value. Cross sections for the SM backgrounds will be reported assuming SM values for the Higgs couplings and detailing possible (resonant) Higgs contributions as separate background processes whenever sizable.
A final prediction for the total SM background in each model will be presented at the end of the analysis in Section 4.4 by properly rescaling the Higgs contributions to account for the modified Higgs couplings. Throughout our analysis we have considered double Higgs production from vector boson fusion only, neglecting the one-loop QCD contribution from gluon fusion in association with two jets. The latter is expected to have a larger cross section than vector boson fusion [20], but it is insensitive to non-standard Higgs couplings to vector bosons. As discussed in the literature for single Higgs production with two jets [21,22], event selections involving a cut on the dijet invariant mass and η separation, such as the ones we are considering, strongly suppress the gluon fusion contribution. We expect the same argument to apply also to double Higgs production.
We concentrate on the three possible decay chains that seem to be the most promising ones to isolate the signal from the background:
$$S_4:\ pp \to hhjj \to 4W\,jj \to l^+ l^- l^+ l^-\ \not\!\!E_T\ jj,$$
$$S_3:\ pp \to hhjj \to 4W\,jj \to l^+ l^- l^\pm\ \not\!\!E_T\ 4j,$$
$$S_2:\ pp \to hhjj \to 4W\,jj \to l^\pm l^\pm\ \not\!\!E_T\ 6j, \qquad (4.1)$$
where $l^\pm = e^\pm/\mu^\pm$, $\not\!\!E_T$ denotes missing transverse energy due to the neutrinos and $j$ stands for a final-state jet. A fully realistic analysis, including showering, hadronization and detector simulation, is beyond the scope of the present paper. We will stick to the partonic level as far as possible, including showering effects only to provide a rough account of the jet-veto benefit for this search. We perform a simple Gaussian smearing on the jets as a crude way to simulate detector effects. Signal events have been generated using MADGRAPH [23], while both ALPGEN [24] and MADGRAPH have been used for the background. A summary with information about the simulation of each process, including the Monte Carlo used, the choice of factorization scale and specific cuts applied at the generation level, can be found in Appendix B.
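To illustrate what a simple Gaussian smearing of parton-level jets can look like, the sketch below draws the smeared energy from a normal distribution centred on the true energy. The resolution model $\sigma(E) = k\sqrt{E}$ with $k = 1$ (in units of $\sqrt{\mathrm{GeV}}$) is a hypothetical choice made for this example, not the resolution used in the analysis; the function name is likewise hypothetical.

```python
import math
import random


def smear_jet_energy(energy, rng, stochastic_term=1.0):
    """Return a Gaussian-smeared jet energy in GeV.

    The width sigma = stochastic_term * sqrt(energy) is an assumed,
    calorimeter-like resolution; negative outcomes are clipped to zero.
    """
    if energy <= 0.0:
        raise ValueError("energy must be positive")
    sigma = stochastic_term * math.sqrt(energy)
    return max(rng.gauss(energy, sigma), 0.0)


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so the sketch is reproducible
    for true_energy in (30.0, 100.0, 300.0):
        print(true_energy, smear_jet_energy(true_energy, rng))
```

Passing an explicit `random.Random` instance keeps the smearing reproducible and independent of any global random state, which is convenient when comparing signal and background samples.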
Our event selection will be driven by simplicity as much as possible: we design a cut-based strategy by analyzing signal and background distributions, cutting over the observable which provides the best signal significance, and reiterating the procedure until no further substantial improvement is achievable. As our starting point, we define the following set of acceptance cuts where p T j (p T l ) and η j (η l ) are respectively the jet (lepton) transverse momentum and pseudorapidity, and ∆R jj , ∆R jl , ∆R ll denote the jet-jet, jet-lepton and lepton-lepton separations.
In the next sections we will present our analysis for each of the three channels of Eq. (4.1) assuming a value m h = 180 GeV for the Higgs mass. A qualitative discussion on the dependence of our results on the Higgs mass will be given in Section 6.

Channel S 3 : three leptons plus one hadronically-decaying W
Perhaps the most promising final state channel is that with three leptons. The signal is characterized by two widely separated jets (at least one in the forward region) and up to two additional jets from the hadronically decaying W. By using the definition of "jet" given in Eq. (4.2) and working at the parton level, we find that the fractions of events with 4, 3 and 2 jets are, respectively, 40%, 56% and 4%. Considering that the background cross sections decrease by roughly a factor of three for each additional jet, we will require at least 4 jets. In the case of the signal this choice allows the reconstruction of the hadronic W, which gives an additional handle to improve the signal-to-background ratio, as discussed in the following. Signal events with fewer than 4 jets mostly arise when some of the jets from the hadronic W decay are too soft to meet the p_Tj acceptance cut, while in only ∼30% of the cases two quarks merge into a single jet. In a more detailed and realistic analysis it is certainly worthwhile to explore the possibility of relaxing the constraint on p_Tj, at least for the softer jets. Figure 8 shows the signal cross section as a function of the jets' transverse momentum.

Figure 8: Differential cross section dσ/dp_T [ab/GeV] of the signal S_3 in the MCHM4 with ξ = 1 as a function of the transverse momentum of the jets. On the left: jets from the W decay; on the right: jets from the primary interaction. Continuous line: hardest jet; dashed line: second hardest jet. Cuts as in Eq. (4.2) have been applied, except that no cut on the p_Tj of the jets from the W decay has been applied in the left plot. Jets from the primary interaction in the right plot are required to satisfy p_Tj > 30 GeV.
In the second column of Table 3 we report the cross sections after the acceptance cuts of Eq. (4.2) for the signal and for the main backgrounds that we have studied. In Table 3, for each channel, the proper branching fraction to a three-lepton final state (via W → lν, qq and τ → lνν_τ) has been included. A few comments are in order:
• The samples ttW(W) + n jets with n larger than the minimal value are enhanced for two main reasons:
1. Jets originating from the top decay can be too soft and fail to satisfy the acceptance cut. Having extra available jets thus increases the efficiency of the acceptance cuts.
2. Jets originating from the top decay are mostly central in rapidity, which makes the occurrence of a pair with a large dijet invariant mass and at least one forward jet (one of the requirements that we will impose to improve the detectability of our signal) quite rare. Additional jets from initial state radiation are instead more likely to emerge with large rapidity.
Notice that including all the samples ttW (W ) + n jets at the partonic level is redundant and in principle introduces a problem of double counting. A correct procedure would be resumming soft and collinear emissions by means of a parton shower, which effectively accounts for Sudakov form factors, and matching with the hard matrix element calculation by means of some procedure to avoid double counting of jet emissions. Here we retain all the ttW (W ) + n jets contributions, as the cuts that we will impose on extra hadronic activity make the events with additional jets almost completely negligible, solving in this way the problem of double counting.
• Events with additional jets are much less important for the W ll backgrounds, where already at leading order the jets can originate from a QCD interaction. This is clearly illustrated in Table 3 by the small cross section of W ll5j after the cuts.
• For m h = 180 GeV the bulk of the contribution to ttW W + n jets is via Higgs production and decay: tth + n jets → ttW W ( * ) + n jets. Given the complexity of the final state, for n = 2, 3 we have computed this latter simpler signal as a good approximation of ttW W + n jets.
• There is no overlap between ttW W and ttW jj, since the latter has been computed at order O(α EW ) and as such it does not include contributions from intermediate W * → jj.
• The process W l^+ l^- 4j includes the Higgs resonant contribution hWjj → ZZWjj with ZZ → l^+ l^- jj. This accounts for less than 7% of the total W l^+ l^- 4j, and has not been reported separately for simplicity.
• The process W τ + τ − 4j leads to a three-lepton final state provided both τ 's decay leptonically. It is clearly subdominant compared to W l + l − 4j, but it is at the same time much less reduced by the cut on the dilepton invariant mass m SF -OS which we impose in the following (see Eq. (4.4)). For this reason it must be included in the list of relevant backgrounds.
As clearly seen from Table 3, after the acceptance cuts the background still dominates by far over the signal. We therefore try to exploit the peculiar kinematics of the signal, which is distinctive of vector boson fusion events: two widely separated jets with at least one at large rapidity. We will refer to these two jets as "reference" jets in the following. To identify them we first select the jet with the largest absolute rapidity, and we then compute the dijet invariant mass it forms with each one of the remaining jets: the two reference jets will be those forming the largest dijet invariant mass (see footnote 8). Figure 9 shows the rapidity of the most forward jet (first reference jet), η_J1^ref, the invariant mass of the two reference jets, m_JJ^ref, and their separation, ∆η_JJ^ref. In the case of the signal, the remaining jets will reconstruct a W boson. In Fig. 10 we plot the invariant mass of all the jets other than the reference ones, m_JJ^W, for both the signal (see footnote 9) and the background. A second crucial feature of the signal is that there are two Higgs bosons in the final state: one decaying fully leptonically, the other semileptonically. The two leptons from the leptonically-decaying Higgs can be identified as those forming the opposite-charge pair with the smallest relative angle. Both lepton spin correlations and the boost of the Higgs in the laboratory frame favour this configuration. For example, for a final state e^+ µ^+ e^- X, we compute cos θ_{e+e−} and cos θ_{µ+e−} and we pick the pair with the largest cosine. Figure 11 shows the mass of this lepton pair, m_ll^h, for both the signal and the background. The other Higgs boson candidate is reconstructed as the sum of the remaining lepton plus all the jets different from the reference ones; its mass, m_JJl^h, is also shown in Fig. 11. As a first set of cuts, we use the observables discussed above and require that each individual cut reduces the signal by no more than ∼2%. We demand the cuts of Eq. (4.3). Signal and background cross sections after this set of cuts are reported as σ_2 in Table 3.

Footnote 8: In the case of the signal this procedure selects, at the partonic level, the two jets which are not produced in the W decay with an efficiency of ∼0.97 (∼0.90) for ξ ≥ 0.5 (ξ = 0). A similar result is obtained using ∆η_JJ to select the reference jets. At the partonic level m_JJ looks slightly better, although this has to be confirmed by a more detailed analysis.

Footnote 9: Obviously, the distribution for the signal has a Breit-Wigner peak with a small continuous tail due to events where jets from the decay of the W have been chosen as reference jets. The experimental resolution on the dijet mass is much larger than the W width, and this has to be properly taken into account if we wish to use this observable to improve the significance of the signal. At the rough level of our analysis, this will be taken into account by selecting an appropriate mass window around the W mass.
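The reference-jet selection described above (most forward jet first, then the partner giving the largest dijet invariant mass) can be sketched as follows. This is only an illustrative parton-level sketch, not the code used for the analysis: the function names, the representation of jets as massless four-vectors (E, px, py, pz) and the example momenta are all assumptions introduced here.

```python
import math

def four_vector(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) built from pT, pseudorapidity and azimuth."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

def rapidity(p):
    """Pseudorapidity of a four-vector (equal to the rapidity for massless partons)."""
    _, px, py, pz = p
    return math.asinh(pz / math.hypot(px, py))

def dijet_mass(p, q):
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (a + b for a, b in zip(p, q))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def select_reference_jets(jets):
    """Return the indices (most forward jet, partner with the largest dijet mass)."""
    first = max(range(len(jets)), key=lambda i: abs(rapidity(jets[i])))
    second = max((i for i in range(len(jets)) if i != first),
                 key=lambda i: dijet_mass(jets[first], jets[i]))
    return first, second

# Hypothetical event: two forward/backward jets plus two central jets.
jets = [four_vector(40.0, 3.5, 0.0), four_vector(35.0, -2.5, math.pi),
        four_vector(50.0, 0.3, 1.0), four_vector(45.0, -0.2, -2.0)]
print(select_reference_jets(jets))
```

In this toy event the jets left over after removing the two reference jets would then be used to build m_JJ^W.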
We first notice that all the backgrounds with a number of jets larger than four have been strongly reduced: this is mostly due to the cuts on m W JJ and on m h JJl , that heavily penalize events with a large available jet energy. This is the reason why we can neglect the problem of double counting introduced by including samples with arbitrary number of jets: after the cuts of Eq. (4.3) are imposed, the events with a too large number of jets are essentially rejected.
We now proceed to identify the cuts which are most effective for improving the significance of our signal. We first notice that the largest background, W l + l − 4j, has a dominant contribution from the Z resonance. In Fig. 12 we plot the invariant mass, m SF -OS , of the e + e − or µ + µ − pair found in the event. If two such pairings are possible (this is the case when the three leptons in the final state all have the same flavor), the invariant mass closer to M Z is selected. It is clear that the significance of the signal can be largely improved by excluding values of m SF -OS that are in a window around the Z pole or close to the photon pole.
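The construction of m_SF-OS described above can be illustrated with a short sketch. It is not the analysis code: the lepton representation as (flavor, charge, four-vector) tuples, the function names and the example momenta are assumptions made here, and no cut window is applied since the numerical values of the cut are not reproduced in this text.

```python
import math
from itertools import combinations

M_Z = 91.1876  # Z boson mass in GeV

def four_vector(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) built from pT, pseudorapidity and azimuth."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

def pair_mass(p, q):
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (a + b for a, b in zip(p, q))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def m_sf_os(leptons, m_z=M_Z):
    """Mass of the same-flavor opposite-sign pair closest to m_z, or None if no such pair.

    leptons: list of (flavor, charge, four_vector) tuples.
    """
    masses = [pair_mass(p1, p2)
              for (f1, q1, p1), (f2, q2, p2) in combinations(leptons, 2)
              if f1 == f2 and q1 == -q2]
    return min(masses, key=lambda m: abs(m - m_z)) if masses else None

# Hypothetical three-electron event: two e+e- pairings are possible.
leptons = [("e", +1, four_vector(45.6, 0.0, 0.0)),
           ("e", -1, four_vector(45.6, 0.0, math.pi)),
           ("e", -1, four_vector(20.0, 0.0, math.pi / 2))]
print(m_sf_os(leptons))
```

An event would then be kept or rejected by comparing the returned mass with the chosen exclusion windows around the Z pole and near the photon pole.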
We searched for the optimal set of cuts on m_SF-OS and other possible distributions (including all those mentioned above and shown in Figs. 9-12) by following an iterative procedure: at each step we cut on the observable which provides the largest enhancement of the signal significance, until no further improvement is possible. The significance has been computed by performing a goodness-of-fit test of the background-only hypothesis with Poisson statistics (see footnote 11). We assumed 300 fb^{-1} (3000 fb^{-1}) of integrated luminosity at the LHC (at the LHC luminosity upgrade). We end up with the following set of additional cuts:

(4.4)
M Z and Γ Z being respectively the Z boson mass and width. The cross sections for signal and backgrounds after these cuts are reported as σ 3 in Table 3.
As a final set of cuts, we consider a further restriction on m_JJ^W around the W pole: The cuts in Eqs. [...] Table 3. An additional veto on b-jets has a relatively small impact, since it would reduce the ttW(W) + jets backgrounds, which are however already subdominant. Assuming for example a b-jet tagging efficiency of ε_b = 0.55 for |η_b| < 2.5, the signal significances increase by approximately 10%.

Footnote 11: Given the number of signal and background events, a p-value is computed using the Poisson distribution. The significance is defined as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value. For example, a p-value of 2.87 × 10^{-7} corresponds to a 5σ significance.
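The definition of the significance given above can be made concrete with a small numerical sketch. It assumes, for illustration only, that the observed count n is an integer and that the p-value is the upper Poisson tail P(N ≥ n | b) for an expected background b; the function names are ours and the counts in the example are arbitrary, not results of the analysis.

```python
import math
from statistics import NormalDist

def poisson_p_value(n_obs, b):
    """Upper-tail probability P(N >= n_obs) for a Poisson variable of mean b > 0."""
    if b <= 0:
        raise ValueError("expected background must be positive")
    # Start from P(N = n_obs) and sum the tail upwards to avoid cancellation.
    term = math.exp(-b + n_obs * math.log(b) - math.lgamma(n_obs + 1))
    total, k = 0.0, n_obs
    while True:
        total += term
        k += 1
        term *= b / k
        # Stop once past the mode and the next term is numerically negligible.
        if k > b and term <= 1e-17 * total:
            break
    return min(total, 1.0)

def significance(p_value):
    """Number of one-sided Gaussian standard deviations giving the same p-value (0 < p < 1)."""
    return -NormalDist().inv_cdf(p_value)

print(significance(2.87e-7))                      # close to 5
print(significance(poisson_p_value(10, 2.0)))     # arbitrary example counts
```

Summing the tail directly, rather than computing one minus the cumulative distribution, keeps the result accurate for the very small p-values that correspond to large significances.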

Estimate of showering effects
There is still one feature of the signal which has not yet been exploited. A unique signature of vector boson fusion events is a very small hadronic activity in the central region (rapidities between the first and second reference jet) [25]. This is not the case for the backgrounds, especially after imposing the cuts on ∆η_JJ^ref and m_JJ^ref in Eq. (4.4), which imply a large total invariant mass √ŝ for the event and therefore a larger radiation probability (the radiation probability is proportional to log^2(ŝ/λ^2), where λ is the infrared/collinear cut-off). By vetoing this activity in the central region, one can then obtain an additional suppression of the background without much affecting the signal. For our event selection, the effect of the showering on the background is twofold: a large number of jets appears in the final state and, as a consequence, both m_JJ^W and m_JJl^h are shifted towards larger values (see footnote 12). In order to assess the relative impact of these effects, we have processed both the signal and the most relevant background, W l^+ l^- 4j, through the parton shower PYTHIA [26], and we have reconstructed the final-state jets using a cone algorithm à la UA1, as implemented in the GETJET [27] routine. To avoid mixing different and unrelated effects, we have studied only the relative efficiencies of the various cuts compared to the partonic level analysis. Figure 13 shows the distribution of the number of jets for both the signal and the W l^+ l^- 4j background after showering and imposing the acceptance cuts of Eq. [...] (4.4)).

Footnote 12: Let us denote by X the system of final-state jets other than the reference jets. If the additional radiation is from the X system, M_X will be unaffected; if instead it is from the initial state or from the reference jets, the momentum of the radiation will add to that of the X system, increasing its mass.
After showering, we find the following additional reduction on the signal and background rates compared to the partonic level: [...] A further veto on events with more than 5 jets has a negligible impact, both for the signal and the background, as the cuts on m_JJ^W and m_JJl^h effectively act like a veto on extra hadronic activity. Although a full inclusion of showering effects can only be obtained by using matched samples, we expect that our rough estimate captures the bulk of the effect.

Additional backgrounds from fake leptons
Since the number of signal events at the end of our analysis is very small, it is important to check whether there are additional potential sources of reducible backgrounds. Here we consider the possibility that a jet is occasionally identified as a lepton, in which case we speak of a "fake" lepton from a jet. As we argue below, the effect of such jet mistagging is likely to be negligible in the three-lepton case.
As shown in Table 3, the dominant background in this case is W l l 4j. After the acceptance cuts we have σ(pp → W l^+ l^- 4j) = 12 fb. A first possibility is that a fake lepton (most likely an electron) originates from the misidentification of a "light jet" (originating either from a gluon or from a light quark). In this case the most serious potential source of background is l l + 5j. Since the corresponding cross section after the acceptance cuts is σ(pp → l^+ l^- + 5j) ≃ 2.8 pb, even a modest mistagging probability of order 10^{-3} is sufficient to suppress this source of background: the resulting rate, 2.8 pb × 10^{-3} = 2.8 fb, lies well below the 12 fb of W l^+ l^- 4j. According to both the CMS and ATLAS collaborations [28,29], rejection factors as small as 10^{-5} can be achieved by making the jet reconstruction algorithm tight enough.
A second possibility is that a heavy quark (b or c) decays semileptonically and the resulting lepton is isolated. Backgrounds of this type are l + l − bb + 3j and l + l − cc + 3j, which have similar cross sections. To estimate the first process we have computed the cross section for pp → l + l − bb + 3j where one of the two b's is randomly chosen and assumed to be mistagged as a lepton. After applying the cuts of Eqs. (4.2)-(4.5) we obtain a cross section of 1.2 fb. A b mistagging probability ∼ 10 −3 is therefore sufficient to keep this background below the irreducible background. This level of rejection seems feasible at the LHC: in Ref. [30] a mistagging probability of 7×10 −3 is estimated for a lepton with p T > 10 GeV, rapidly decreasing (by a factor 10 to 30 for p T > 20 GeV) with increasing p T . A potentially more problematic contribution is tt + 3j, whose cross section after acceptance cuts is σ pp→tt+3j = 770 fb. A b mistagging probability 10 −3 makes this background at most as important as the other tt channels in Table 3, which however turn out to be subdominant at the end of the analysis.
We thus conclude that the effect of fake leptons is expected to be negligible in the three lepton case.

Channel S 2 : two same-sign leptons plus two hadronically-decaying W's
In the case of a two-lepton final state, in order to keep the background at a manageable level and avoid the otherwise overwhelming tt background, we are forced to select only events with two leptons with the same charge.
Along with the two leptons, the signal is characterized by two widely separated jets and up to four additional jets from the two hadronically-decaying W 's. Using the definition of "jet" given in Eq. (4.2) and working at the parton level, we find that in the majority of the events at least one quark from a W decay is either too soft to form a jet or it merges with another quark to form one single jet. The fractions of signal events with 6, 5, 4 and 3 jets are respectively 0.16, 0.43, 0.37 and 0.04. We choose to retain events with at least 5 jets. Including events with a lower jet multiplicity is not convenient, as the background increases by a factor ∼ 3 for each jet less, and the identification of the Higgs daughters in the signal becomes less effective.
In order to suppress the otherwise overwhelming W l^+ l^- + jets background, we forbid the presence of extra hard isolated leptons: we require exactly two leptons (with the same charge) satisfying the acceptance cuts of Eq. (4.2).
In this way the resonant contribution WZ + jets → W l^+ l^- + jets is strongly suppressed. Other backgrounds that can have 3 leptons in their final state at the partonic level are also reduced. In the second column of Table 4, we report the cross sections after the acceptance cuts of Eq. (4.2) for the signal S_2 and for the main backgrounds we have studied. In Table 4, for each channel, the proper branching fraction to a same-sign dilepton final state (via W → lν, qq and τ → lνν_τ, qqν_τ) has been included. In the case of the background W l^+ l^- 5j, the lepton with different sign is required to fail the acceptance cuts of Eq. (4.2). A few comments are in order (comments made for Table 3 also apply and will not be repeated here):
• While the cross section for WW production is obviously much larger than the cross section for WWW production, those for WWW and W^{+(−)}W^{+(−)} (same sign) are comparable, so that both these latter backgrounds must be included.
• The background WWWWj includes the resonant contribution WWhj → WWWWj. For simplicity, since WWWWj represents only a small fraction of the total background at the end of the analysis, the Higgs resonant contribution has not been reported separately. There is no overlap between WWWWj and WWWjjj, since the latter has been generated at order O(α_EW^3) and as such it does not include contributions from intermediate W^* → jj.
• The process W τ + τ − 4j leads to a dilepton final state if one τ decays leptonically and the other is mistagged as a QCD jet. 14 We have conservatively assumed that the momentum of the mistagged jet is equal to that of the parent τ , and we have included a mistagging probability at the end of our analysis.
• The process W τ + τ − 5j leads to a dilepton final state if one τ decays leptonically and the other is either not detected (independently of its decay mode), or it decays hadronically and it is mistagged as a QCD jet. We include the mistagging probability at the end of our analysis, when we impose a veto on hadronic taus in the event. The momentum of the mistagged jet has been assumed to be equal to that of the parent τ .
• Included in the cross sections of the processes ttWWj, tthjj, tthjjj and ttWjjj is the contribution of the three-lepton final state where both tops decay leptonically and the wrong-sign lepton fails the acceptance cut. The analogous contribution from ttW4j has been computed and found to be very small, and for simplicity is not reported here.
• If required for trigger issues, the cut on the hardest lepton can be increased to p T > 30 GeV at basically no cost for the signal (the efficiency relative to the acceptance cuts of Eq. (4.2) is 97%). This should be sufficient to pass the high-level trigger at CMS and ATLAS even during the high-luminosity phase of the LHC. Furthermore, the presence of a huge amount of hadronic energy in the signal might help to reduce the trigger requirements on the p T of the leptons.
As one can see from Table 4, after the acceptance cuts the background dominates by far over the signal. In order to select our first set of additional cuts we proceed in close analogy to the three-lepton case. We first identify the two "reference" jets as described in [...] Fig. 9 and are thus not reported here. Next, we reconstruct one hadronic W as follows: using all the non-reference jets, we select the pair with invariant mass m_JJ^W closest to the W mass. If |m_JJ^W − M_W| < 40 GeV we label these two jets as j_1^{W1} and j_2^{W1}, otherwise the event is rejected. All the remaining jets will be labelled as belonging to the other hadronic W, j_k^{W2}. We then proceed to identify the decay products of the two Higgs bosons. As a criterion to select the lepton and the W from the same Higgs, we use the separation ∆R between them, as they will tend to emerge collimated due to the Higgs boost. More explicitly, by defining [...] we compute ∆R_{l1 W1} and ∆R_{l2 W1}. If ∆R_{l1 W1} < ∆R_{l2 W1}, we assign l_1 and j_k^{W1} to the first Higgs and the remaining jets and lepton to the second one; otherwise we form the first Higgs boson candidate with l_2 and j_k^{W1} and the other one with the remaining jets and lepton. We denote by m_{lW1}^h and m_{lW2}^h the invariant mass of the Higgs system containing respectively the jets j_k^{W1} and j_k^{W2}. They are plotted in Fig. 15 for both the signal and the background.
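The reconstruction of the first hadronic W and the assignment of a lepton to it can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code: we assume that ∆R is computed between each lepton and the summed momentum of the two W_1 jets (the defining equation is not reproduced in this text), we take M_W ≈ 80.4 GeV, and all function names and example momenta are hypothetical.

```python
import math
from itertools import combinations

M_W = 80.4  # approximate W boson mass in GeV

def four_vector(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) built from pT, pseudorapidity and azimuth."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

def add(*vectors):
    """Component-wise sum of four-vectors."""
    return tuple(sum(c) for c in zip(*vectors))

def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def delta_r(p, q):
    """Separation in the (pseudorapidity, azimuth) plane."""
    eta_p = math.asinh(p[3] / math.hypot(p[1], p[2]))
    eta_q = math.asinh(q[3] / math.hypot(q[1], q[2]))
    dphi = math.atan2(p[2], p[1]) - math.atan2(q[2], q[1])
    dphi = (dphi + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
    return math.hypot(eta_p - eta_q, dphi)

def reconstruct_w1(jets, m_w=M_W, window=40.0):
    """Indices of the non-reference jet pair closest to m_w, or None if the event is rejected."""
    pair = min(combinations(range(len(jets)), 2),
               key=lambda ij: abs(mass(add(jets[ij[0]], jets[ij[1]])) - m_w))
    if abs(mass(add(jets[pair[0]], jets[pair[1]])) - m_w) >= window:
        return None
    return pair

def lepton_for_w1(leptons, w1):
    """Index (0 or 1) of the lepton closer in Delta R to the W1 system (our assumption)."""
    return 0 if delta_r(leptons[0], w1) < delta_r(leptons[1], w1) else 1

# Hypothetical non-reference jets and same-sign leptons.
jets = [four_vector(60.0, 0.5, 0.73), four_vector(60.0, 0.5, -0.73),
        four_vector(30.0, -1.0, 3.0)]
leptons = [four_vector(30.0, 0.6, 0.1), four_vector(30.0, -1.2, 2.8)]
pair = reconstruct_w1(jets)
w1 = add(jets[pair[0]], jets[pair[1]])
print(pair, lepton_for_w1(leptons, w1))
```

The lepton returned here would be combined with the W_1 jets into the first Higgs candidate, and the other lepton with the remaining jets into the second.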
As a first set of cuts we use the observables discussed above and require that each individual cut reduces the signal by no more than ∼2%. We demand: [...] Signal and background cross sections after these cuts are reported as σ_2 in the third column of Table 4. Notice that, similarly to the three-lepton case, all the backgrounds with a large number of jets have been strongly reduced. As done for the three-lepton channel, we search for the optimal set of cuts by following an iterative procedure: at each step we cut on the observable which leads to the largest increase in the signal significance, until no further improvement is possible. We end up with the following set of additional cuts: [...] Signal and background rates after these cuts are reported as σ_3 in Table 4. As a final cut, we require m_JJ^W to deviate from M_W by no more than twice the CMS or ATLAS dijet mass resolution: [...] Table 4. We do not impose an analogous cut on the invariant mass of the second hadronic W candidate, formed by all the remaining jets, since the previous cuts already strongly suppress the backgrounds with large jet multiplicities, so that in the majority of the events the second W system is formed by a single jet and hence has a small invariant mass.
A further reduction of the W l^+ l^- 5j background can be achieved by vetoing events which contain soft leptons (1 GeV ≤ p_Tl ≤ 20 GeV) that are isolated from any jet (∆R_jl > 0.4) and form a same-flavor opposite-sign pair with at least one of the two hard leptons. In order to estimate the efficiency of such a veto on the signal, we showered and hadronized the events with PYTHIA. We find that the majority of the additional leptons originate from the decay of the final-state hadrons, especially from the leptonic decay of charmed mesons. The fraction of signal events rejected is quite small, less than 4%, and we will neglect it. For simplicity, the effect of the veto on all the backgrounds with exactly two leptons at the parton level (see footnote 15) will also be neglected. The cross sections after this veto are reported in Table 5 as σ_5^CMS and σ_5^ATLAS, respectively after the cut of Eq. (4.9) and Eq. (4.10).

Footnote 15: These backgrounds are: W^{+(−)}W^{+(−)}5j, WWWjjj, hWjjj, WWWWj, WWWWjj, ttWj, ttWjj, ttWW and Wτ^+τ^-4j.

Estimate of showering effects
As for the three-lepton channel, background events generically have a larger hadronic activity in the central region than the signal once the showering is turned on. In this case, the main effect is that of shifting the m_{lW2}^h distribution towards larger values. This is clearly illustrated by Fig. 16, which reports the sum of the cross sections of the main backgrounds, W l^+ l^- 5j, WWWjjj, hWjjj, ttWjj and ttWWj, as a function of m_{lW2}^h. After the showering, we find the following additional reduction on the rates of the signal and of the main backgrounds: [...] As in the three-lepton case, it is worth stressing that these results should be confirmed by a full treatment of showering effects using matched samples.

Fake leptons and lepton charge misidentification
Differently from the three lepton case, we expect the effect of fake leptons from jet misidentification to be much more relevant for same-sign dilepton events. The reason is that the cross section for the production of two same-sign W 's is about two orders of magnitude smaller than that for W + W − . It might then turn out to be more convenient to produce one W and pay the misidentification probability factor for a fake lepton from an extra jet than having a second leptonically-decaying W with the same sign. Moreover, an additional source of background comes in this case from events where the charge of a primary lepton is misidentified. A precise estimate of all these effects is beyond the scope of the present paper, since it would require a full detector simulation as well as a dedicated strategy designed to minimize the effect while keeping the lepton reconstruction efficiency as high as possible. We will limit ourselves to performing a crude estimate and quoting the rejection factors required to make such backgrounds negligible. The most serious potential source of background with fake leptons from light jets is W + 6j. Table 6 reports the relative cross section after all the cuts imposed in our analysis (without including any mistagging probability factor). The quoted number is obtained by computing the cross section for pp → W + 6j, picking up randomly one jet and assuming it is mistagged as a lepton, and multiplying by a factor 6 to account for the six different possibilities to mistag a jet. A rejection factor of ∼ 10 −5 , quoted as achievable by both collaborations [28,29], is sufficient to reduce this background down to a manageable level. A dedicated experimental study is however required to establish whether this can be obtained without reducing too much the lepton identification efficiency. The largest background with fake leptons from heavy quarks is ttjj, with one b from a top decay tagged as a lepton. 
Table 6 reports the cross sections for ttjj and tt3j after all the cuts plus a b-jet veto. For simplicity, we have approximated the "fake" lepton momentum to be equal to that of its parent b quark. This is a conservative, reasonable assumption, as the requirement of having a hard, isolated lepton forces the remaining hadronic activity from the b decay to be soft enough to escape detection [30]. As Table 6 shows, our rough estimate seems to indicate that rejection factors as small as 10^{-4} are required to make this background comparable to those studied in the previous sections.
Finally, we consider the most dangerous backgrounds where the charge of a primary lepton is not correctly measured. The size of this effect strongly depends on the algorithm used to reconstruct the leptons, and it is in general larger for electrons than for muons. Table 7 reports the cross sections for tt3j, tt4j and l^+ l^- 5j after all the cuts imposed in our analysis plus a b-jet veto, assuming that the charge of one lepton has not been correctly measured. Even after applying a charge misidentification probability of 10^{-3} for electrons and a few × 10^{-4} for muons, as quoted in the ATLAS TDR [29] for leptons with p_T ∼ 100 GeV, the l^+ l^- 5j background is still sizable, while the tt+jets channels are smaller. Since however l^+ l^- 5j has no neutrinos, while the signal has two of them, the missing transverse energy E_T^miss provides an important handle to reduce this background. In Fig. 17 we plot E_T^miss for both the signal S_2 in the MCHM4 at ξ = 1 and l^+ l^- 5j. We have computed E_T^miss by including a Gaussian resolution σ(E_T^miss) = a · √(ΣE_T/GeV), where ΣE_T is the total transverse energy deposited in the calorimeters (from electrons and jets). We chose a = 0.55, which is expected to be a good fit for the ATLAS detector [29]. Assuming a charge misidentification probability equal to 10^{-3} for electrons and 3 × 10^{-4} for muons, we find that a cut E_T^miss > 25 GeV provides the best sensitivity, the efficiency on the signal being 0.9.
The corresponding cross sections after this cut (including the charge misidentification probabilities) are reported in the last column of Table 7, which shows that the background has been reduced to a manageable level although it remains non-negligible.
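The Gaussian smearing of the missing transverse energy can be sketched as follows. The sketch assumes the usual calorimetric form σ(E_T^miss) = a · √(ΣE_T/GeV), with the result read in GeV, and, as a further assumption of ours, applies the smearing independently to the two transverse components of the true missing momentum; the function names and the example numbers are hypothetical.

```python
import math
import random

def met_resolution(sum_et, a=0.55):
    """Resolution (GeV) for a total transverse energy sum_et (GeV), assuming a * sqrt(sum_et)."""
    return a * math.sqrt(sum_et)

def smeared_met(met_x, met_y, sum_et, rng, a=0.55):
    """Missing E_T after Gaussian smearing of each transverse component (assumption)."""
    sigma = met_resolution(sum_et, a)
    return math.hypot(rng.gauss(met_x, sigma), rng.gauss(met_y, sigma))

rng = random.Random(12345)  # fixed seed so the example is reproducible
# Hypothetical background-like event: no neutrinos, 400 GeV of visible transverse energy.
fake_met = smeared_met(0.0, 0.0, 400.0, rng)
print(met_resolution(400.0), fake_met, fake_met > 25.0)
```

With a resolution of about 11 GeV for ΣE_T = 400 GeV, an event without neutrinos passes a 25 GeV threshold only through a fluctuation of more than two standard deviations, which is why the cut is effective against l^+ l^- 5j.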
To summarize, our estimates show that both backgrounds with fake leptons from jets and those with misidentification of the charge of a primary lepton are expected to have an important impact on the same-sign dilepton channel. A detailed experimental study is therefore needed to determine the precise relevance of these backgrounds and fully assess the signal significance in this case.

Channel S 4 : four leptons
The last channel we have considered is the one with four leptons. In this case, the signal is characterized by two widely separated jets with no further hadronic activity in between, four hard leptons from the decay of the two Higgses and missing energy. The second column of Table 8 reports the cross sections (σ 1 ) for the signal S 4 and for the main backgrounds we have studied after the acceptance cuts of Eq. (4.2). We notice that (comments made for Tables 3 and 4 also apply and will not be repeated here): • The Higgs resonant contributions hl + l − jj → W W l + l − jj and hjj → τ + τ − l + l − jj are separately reported and are thus not included in the backgrounds τ + τ − l + l − jj and W W l + l − jj.
• The background l + l − l + l − jj includes the Higgs resonant contribution hjj → l + l − l + l − jj. The latter has not been separately reported in this case since the entire background is negligible at the end of the analysis.
• The background W W W W jj is largely dominated by its resonant subprocess hW W jj → W W W W jj. The non-resonant contribution is negligible and it has not been reported in the table.
As for the three-and two-lepton case, the two reference jets have been identified as the pair with the largest invariant mass containing the most forward jet. We identify the pair of leptons coming from the first Higgs, (l + 1 l − 1 ), and that from the second Higgs, (l + 2 l − 2 ), by using the angular separation as a criterion: there are two ways of combining the initial four leptons in two opposite-sign pairs, and we choose the combination which maximizes cos θ l + 1 l − 1 + cos θ l + 2 l − 2 , as leptons from the same Higgs tend to emerge collimated due to both the Higgs boost and the spin correlations. We will refer to (l + 1 l − 1 ) and (l + 2 l − 2 ) defined in this way as our two Higgs candidates.
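The lepton pairing described above, which chooses between the two possible opposite-sign combinations by maximizing the sum of the cosines of the opening angles, can be sketched as follows. The function names, the use of plain three-momenta and the example values are assumptions introduced for illustration.

```python
import math

def cos_angle(p, q):
    """Cosine of the opening angle between two three-momenta."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def pair_leptons(positive, negative):
    """Return [(i_plus, i_minus), (j_plus, j_minus)] maximizing the sum of cosines.

    positive, negative: lists with the three-momenta of the two l+ and the two l-.
    """
    straight = cos_angle(positive[0], negative[0]) + cos_angle(positive[1], negative[1])
    crossed = cos_angle(positive[0], negative[1]) + cos_angle(positive[1], negative[0])
    return [(0, 0), (1, 1)] if straight >= crossed else [(0, 1), (1, 0)]

# Hypothetical momenta (GeV): the first l+ is collimated with the second l-, and vice versa.
positive = [(50.0, 0.0, 10.0), (-40.0, 5.0, -20.0)]
negative = [(-45.0, 0.0, -15.0), (48.0, 3.0, 12.0)]
print(pair_leptons(positive, negative))
```

Each returned index pair defines one Higgs candidate, whose invariant mass can then be computed from the corresponding lepton four-momenta.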
As a first set of cuts, we use the invariant mass and rapidities of the two reference jets as well as the invariant masses of the two Higgs candidates. The corresponding distributions for the signal have the same shape as those in Figs. 9 and 11    previous two analyses, we require that each individual cut reduces the signal by no more than ∼ 2%. We demand:  Table 8 as σ 2 . At this level, the l + l − l + l − jj background is much larger than the signal. It can be drastically reduced, however, by exploiting the fact that the signal has four neutrinos, hence a substantial amount of missing energy, while l + l − l + l − jj has none, see Fig. 18. Here as before, the missing energy of each event has been computed by including a Gaussian resolution σ( E T ) = 0.55 · E T /GeV to account for calorimeter effects, where E T is the total transverse energy of jets and electrons. A further reduction is obtained by cutting on the invariant mass of same-flavor opposite-sign lepton pairs, m SF -OS , excluding values around M Z . This clearly suppresses l + l − l + l − jj, as well as all the processes with resonant Z contributions. We find that optimized values for these cuts, which almost completely eliminate the l + l − l + l − jj background, are as follows: The individual efficiencies on l + l − l + l − jj are of ∼ 5 × 10 −3 for the cut on E T and ∼ 0.05 for that on m SF -OS . The other feature of the signal that can be exploited to further reduce the background is its small hadronic activity in the central region. We have thus imposed a veto on any extra hard and isolated jet (in addition to the two reference jets) satisfying the acceptance cuts of Eq. (4.2). Signal and background cross sections after this veto and the cuts in Eq. (4.12) are reported in Table 8 as σ 3 . Next, as for the other channels, we have monitored the observables of Eqs. (4.11) and (4.12) in search for the optimal set of cuts. 
We find that the best improvement is obtained by strengthening the cut on the separation between the reference jets. Signal and background cross sections after this cut are reported in Table 8 as $\sigma_4$. A final reduction of the background is obtained by imposing a veto on b-jets in the central region $|\eta_b| < 2.5$. Assuming a b-tagging efficiency $\epsilon_b = 0.55$, we find the signal and background cross sections reported in Table 8 as $\sigma_5$.

Results
We collect here our final results for the three channels and the statistical significance of the signal in each case. Table 9 reports the final number of events with 300 fb$^{-1}$ of integrated luminosity based on the cross sections predicted in each channel at the end of the analysis ($\sigma_4^{ATLAS}$, $\sigma_6^{ATLAS}$ and $\sigma_5$ for the channels with respectively three, two and four leptons). Values for the background have been obtained by properly rescaling the Higgs contributions to account for its modified couplings in each model (see Eqs. (2.9)-(2.11)). The Higgs decay branching fractions that have been used in the case of the MCHM5 are those shown in the right plot of Fig. 2. Backgrounds from fake leptons and charge misidentification, for which we provided an estimate in Sections 4.1.2 and 4.2.2, have not been included. In the case of the same-sign dilepton channel their inclusion is likely to decrease the signal significance.

Table 9: Number of events with 300 fb$^{-1}$ of integrated luminosity based on the cross sections predicted in each channel at the end of the analysis ($\sigma_4^{ATLAS}$, $\sigma_6^{ATLAS}$ and $\sigma_5$ for the channels with respectively three, two and four leptons). Values for the background have been obtained by properly rescaling the Higgs contributions to account for its modified couplings in each model.

Table 10: Significance (columns: SM hypothesis, CHM hypothesis).
The signal significance is shown in Table 10 for two statistical hypotheses. In the first hypothesis (dubbed SM in the Table), we assume the Standard Model and compute the significance of the observed excess of events compared to its expectation. This means in particular that the number of background events assumed in this case is that for the SM, i.e., $\xi = 0$. In the second hypothesis (dubbed CHM in the Table), we assume that the Higgs couplings have already been measured by means of single production processes, and that the underlying model has been identified. In this case the assumed number of background events is that predicted by taking into account the modified Higgs couplings.
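As an illustration of how a counting-experiment significance of this kind can be evaluated, the following sketch converts a one-sided Poisson tail probability into an equivalent number of Gaussian standard deviations. This is an assumed, generic prescription and not necessarily the one used for Table 10; the event counts in the example are hypothetical placeholders, not values from Table 9. The two hypotheses differ only in which background expectation is passed in.

```python
import math
from statistics import NormalDist

def poisson_tail(n_obs, mean):
    """Return P(N >= n_obs) for N ~ Poisson(mean), with mean > 0."""
    total, k = 0.0, n_obs
    while True:
        # Work with the log of the Poisson probability to avoid overflow in k!.
        term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        total += term
        # Past the peak the terms decrease; stop once they are negligible.
        if k > mean and (term == 0.0 or term < 1e-16 * total):
            break
        k += 1
    return min(total, 1.0)

def significance(n_obs, n_background):
    """One-sided Gaussian-equivalent significance of observing n_obs events."""
    p = poisson_tail(n_obs, n_background)
    if p <= 0.0:
        return float("inf")
    if p >= 1.0:
        return float("-inf")
    return -NormalDist().inv_cdf(p)

# Hypothetical counts: same observed yield, two background expectations.
n_obs = 12
print(significance(n_obs, 4.0))  # background expected under hypothesis A
print(significance(n_obs, 6.0))  # background expected under hypothesis B
```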

Features of strong double Higgs production
The discussion so far has focused on the possibility of detecting the signal over a relatively large background. This was done as a counting experiment, since the very limited number of events left no other possibility open. Assuming much larger statistics, one can try to establish the distinguishing features of strong double Higgs production. These are basically two. The first and most important one is the hardness of the $W_L W_L \to hh$ subprocess in the SILH scenario, corresponding to an s-wave dominated cross section growing with the invariant mass squared $m_{hh}^2 = (p_{h_1} + p_{h_2})^2$ of the two-Higgs system: $\sigma(WW \to hh) \approx m_{hh}^2/(32\pi f^4)$. In spite of this obvious property of the signal, as we will discuss below, a harder cut on $m_{hh}$ would not help our analysis. A second feature is the presence of two energetic forward jets with a transverse momentum $p_T$ peaked at $p_T \sim m_W$, independently of the jet energy. The absence of a typical scale in the collinear momentum of the virtual $W_L$ emitted from the quark lines implies that the partonic cross section $\hat\sigma(qq \to hhjj)$ also grows with the square of the center of mass energy $\hat s$ of the $hhjj$ system. For the same reason, the quantities $m_{hh}$, $H_T$ (where $H_T$ is defined as the scalar sum of the transverse momenta of all the jets and charged leptons forming the two Higgs candidates) and $m_{JJ}^{ref}$ will all be distributed, for fixed $\hat s$, with a typical value of order $\sqrt{\hat s}$. Given that the partonic cross section of the signal grows with $\hat s$, one would naively expect the distribution of these variables to be harder for the signal than for the background. Similarly, the rapidity separation of the reference jets $\Delta\eta_{JJ}^{ref}$, which for the signal is directly correlated with $\ln m_{JJ}^{ref}$ (given that the $p_T$ of the jets is peaked at ∼ $m_W$), is expected to have a more significant tail at large values than for the background. In practice things are however more complicated.
First of all, in order to realize the above expectations it is essential to identify the Higgs decay products and to impose the optimized cuts of Eqs. (4.4) and (4.6). Before the optimized cuts, the background distributions are actually harder than the signal ones, and this is more so for $m_{hh}$ and $H_T$. Secondly, when devising optimized cuts not all variables work equally well. In particular the signal significance is better enhanced by cutting on $\Delta\eta_{JJ}$ as shown in the analysis. This is largely due to complex features of the background that are not immediately described analytically. There are however features of the signal that can be easily understood analytically. In particular the relative hardness in the distributions of $m_{JJ}$ and $m_{hh}$ is one such feature. Indeed, working in first approximation with $\sqrt{\hat s} \gg m_W, m_h$ and using $\sigma(WW \to hh) \propto m_{hh}^2$ and the quark splitting function $P_{q \to W_L q}(x) \propto (1-x)/x$, one can derive the partonic distributions of $m_{hh}$ and $m_{JJ}^{ref}$ at fixed $\hat s$. The result shows that, for the signal, $m_{hh}$ is distributed with lower values than $m_{JJ}^{ref}$. This is a consequence of the soft $1/x$ singularity in the splitting function, which favors softer $hh$ invariant masses. This property is clearly shown by Fig. 19: one can see that $m_{JJ}^{ref}$ has a significant tail up to 3.5 TeV while $m_{hh}^{vis}$ dies off already above 1 TeV (the total $m_{hh}$ dies off above 1.5 TeV). Notice that, after optimized cuts, the distribution of $m_{JJ}$ is harder than that of $m_{hh}$ for the background as well.
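The step leading to these partonic distributions can be sketched as follows. The derivation assumes strictly collinear $W_L$ emission, so that the two Higgses carry $m_{hh}^2 = x_1 x_2 \hat s$ and the two reference jets carry $m_{JJ}^2 = (1-x_1)(1-x_2)\hat s$; the variables $\tau$, $\tau'$ and $y_i = 1 - x_i$ are introduced here for illustration, and overall constants are dropped. The normalization may therefore differ from that of the original expressions.

```latex
% Assumptions: collinear emission, \sigma(WW \to hh) \propto m_{hh}^2,
% P_{q \to W_L q}(x) \propto (1-x)/x, constants dropped.
\begin{align}
\frac{d\hat\sigma}{d\tau}
  &\propto \int_0^1 \! dx_1\, dx_2\,
     \frac{(1-x_1)(1-x_2)}{x_1 x_2}\,(x_1 x_2\, \hat s)\,\delta(\tau - x_1 x_2)
   = \hat s \left[ (1+\tau)\ln\frac{1}{\tau} - 2(1-\tau) \right],
   & \tau &= \frac{m_{hh}^2}{\hat s}, \\
\frac{d\hat\sigma}{d\tau'}
  &\propto \hat s \int_0^1 \! dy_1\, dy_2\; y_1 y_2\, \delta(\tau' - y_1 y_2)
   = \hat s\, \tau' \ln\frac{1}{\tau'},
   & \tau' &= \frac{m_{JJ}^2}{\hat s}.
\end{align}
```

The first distribution is logarithmically enhanced at $\tau \to 0$, while the second vanishes there and peaks at $\tau' = 1/e$, which is the analytic counterpart of the statement that $m_{hh}$ is softer than $m_{JJ}^{ref}$.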
We already explained that at the stage of optimization cuts ∆η ref JJ is the best variable to cut on. On observing Fig. 19, one may wonder if additional cuts on any of the above observables could further enhance the signal. In practice we have checked that below the already optimistic luminosity of 3 ab −1 this is not the case, due to the loss of statistics. This is, for example, illustrated by Fig. 20, where we show the number of three-lepton events at the end of the analysis (i.e., after the optimized cuts) as a function of m vis hh and H T . Additional cuts on m vis hh or H T would always further reduce the significance. The only possible and marginal improvement would be obtained by a further cut on ∆η ref JJ in the case ξ = 0.5. Of course, if an excess were to be discovered, the study of the distributions in the above variables would provide an essential handle to attribute the excess to strong double Higgs production. It turns out that the scalar p T sum, H T , seems overall the best variable in this regard: its shape is distinguished from both the SM background and from the ξ = 0 limit of pp → hhjj, and this is a simple consequence of the signal consisting of a pure s-wave amplitude. Notice that the normalized distributions in m vis hh , m ref JJ and ∆η ref JJ ,while they significantly differ from the background, surprisingly depend very little on ξ. In particular they are basically the same as for ξ = 0, where σ(W W → hh) is dominated by the forward t−channel vector boson exchange and goes to a constant ∝ m 2 W , rather than growing, at large energy. This flattening in the ξ dependence is due to the rapidly decreasing quark PDFs that makes the cross section dominated by events close to threshold, that is with m 2 hh /ŝ fixed. On the other hand, even close to threshold the distribution in H T of the signal stands out, on both QCD background and on ξ = 0.

Higgs mass dependence
All the results presented so far were obtained by setting the Higgs mass to 180 GeV. This choice was made to enhance the decay branching fraction to two W 's. Varying the Higgs mass affects the decay branching ratios and the signal cross section, as well as the kinematics of the events. For example, Fig. 21 shows how the value of the cross section of the three-lepton channel changes after the acceptance cuts when varying the Higgs mass. In order to extract the different effects, we have set the BR(h → W W ) to one.
The overall decrease of the signal for lighter Higgs masses is the result of two competing effects. On one hand, due to the fast decrease of the quark PDFs at large energies, the cross section is dominated by events close to threshold, i.e., corresponding to quarks carrying away a fraction of the proton's momentum of order x 1 x 2 ∼ 4m 2 h /s. The cross section is thus expected to increase for lighter Higgs masses, as smaller values of x 1,2 are probed. This is indeed the case before the acceptance cuts, as shown for the MCHM4 and the MCHM5 by Table 11. On the other hand, the lighter the Higgs is, the softer its decay products, and the less effective the acceptance cuts. In fact, this second effect dominates and leads to the overall decrease of the signal cross section when the Higgs mass diminishes. We have checked that, as expected, the bulk of the reduction comes from the p T cut on the softest jet and lepton.
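The threshold estimate $x_1 x_2 \sim 4 m_h^2/s$ can be made concrete with a short numerical sketch. The Higgs mass values other than 180 GeV are illustrative choices and are not taken from Fig. 21 or Table 11.

```python
def threshold_tau(m_h_gev, sqrt_s_gev):
    """Product x1*x2 needed to produce two Higgses at threshold: 4 m_h^2 / s."""
    return 4.0 * m_h_gev ** 2 / sqrt_s_gev ** 2

SQRT_S = 14000.0  # collider energy in GeV (14 TeV)

for m_h in (120.0, 150.0, 180.0):  # illustrative Higgs masses in GeV
    tau = threshold_tau(m_h, SQRT_S)
    # For symmetric partons, x1 = x2 = sqrt(tau).
    print(f"m_h = {m_h:5.0f} GeV  x1*x2 = {tau:.2e}  x = {tau ** 0.5:.3f}")
```

Lighter Higgses probe smaller momentum fractions, where the quark PDFs are larger, which is the first of the two competing effects described above.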
As already noticed, the value of the Higgs mass also affects the final number of signal  events through the decay branching ratios. In models like the MCHM4, where the Higgs couplings are shifted by a common factor, the branching ratio to two W 's is the same as in the SM, and thus rapidly falls to zero below the W W threshold. In general, however, the branching ratios can be significantly modified compared to the SM prediction, and the branching ratio BR(h → W W ) can be still sizable even for very light Higgses. This is for example the case of the MCHM5 with ξ ∼ 0.5, as illustrated by Fig. 2.

Luminosity vs energy upgrade
The key feature of the composite Higgs scenario is the partonic cross section growing with $\hat s$. This behaviour persists until the strong coupling scale is reached, where new states are expected to come in and the growth in the cross section saturates. Of course, with a sufficiently high beam energy, it is the direct study of the new, possibly narrow, resonances that conveys most information on the compositeness dynamics. 17 Still, it is fair to ask how much better a higher beam energy would allow one to ascertain the growth in the partonic cross section below the resonance threshold. Unfortunately, since after the acceptance cuts the signal is still largely dominated by the background, it turns out that it is not possible to properly answer that question without a dedicated study, in particular without cut optimization, a task that exceeds the purpose of this paper. Here we limit ourselves to a qualitative discussion based on "standard" (at the LHC) acceptance cuts and on a few additional cuts which seem the most obvious to enhance the signal to background ratio. Since the most promising channel is the one with three leptons and the respective background is dominantly $W l^+ l^- 4j$, we restrict our discussion to this channel and we examine the behaviour of this background only. A reasonable expectation is that, as the centre of mass energy grows, the signal features become more prominent over the background. In the upper panel of Table 12, we report the cross section, with the same acceptance cuts as for 14 TeV, as a function of the collider energy $\sqrt{s}$ for both the signal and the background. It is manifest that, contrary to naive expectations, the signal to background ratio is insensitive to the rising collider energy, if not degrading with it. As a matter of fact, this result is easily understood as follows.
In general, at fixed $\hat s$ the differential cross section to some final state X can be written as the product of a partonic cross section $\hat\sigma(q_A q_B \to X)$ times a partonic luminosity factor $\rho_{AB}$, where $f_q(x, Q^2)$ denotes the PDF for the parton q, and Q is the factorization scale. An implicit sum over all possible partons $q_A$, $q_B$ is understood. The dependence on the collider energy s only enters through the luminosity factors $\rho_{AB}(\hat s/s, Q^2)$, which rapidly fall off when $\hat s/s$ increases. Take for instance a simple form $\rho(\tau, Q^2) = 1/\tau^q$, which gives a good fit of the $ud$ ($gg$) parton luminosity for $\tau \lesssim 0.01$ with $q \simeq 0.5$ ($q \simeq 1.35$), see Fig. 22. With that simple scaling, for all processes where the integral over $\hat s$ (the one defining F) is saturated at the lower end ($\hat s \sim s_0$), one has that under $s \to \alpha \cdot s$ the integrated cross sections rescale universally as $\sigma \to \alpha^q \cdot \sigma$. Even though this idealized situation is not exactly realized for our processes, we believe it largely explains the 'universal' growth in the cross sections shown in Table 12. That is a simple reflection of the growth of the PDFs at small x. This phenomenon is typical when considering rather inclusive quantities, as is the case for the total cross section after simple acceptance cuts. In the extreme case, with suitable hard and exclusive cuts, one should be able to contrast the $\propto \hat s$ growth of the partonic signal cross section with the $\propto 1/\hat s$ decay of the background.

17 In the simplest models based on 5-dimensional constructions there exists no spin-0 resonance that could provide an s-channel enhancement of $WW \to hh$. Such a resonance instead exists in a 4-dimensional example based on a linear SO(5)/SO(4) σ-model [11]. For recent studies on the detection of vector and scalar heavy resonances at the LHC see for example [31].
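The universal rescaling $\sigma \to \alpha^q \cdot \sigma$ can be checked numerically with a minimal sketch. It assumes, for illustration, the normalization $d\sigma/d\hat s = \hat\sigma(\hat s)\,\rho(\hat s/s)/\hat s$ and uses the idealized power law $\rho(\tau) = \tau^{-q}$ over the whole integration range (beyond the region where the fit is stated to be good). The function name, the toy partonic cross section and the value of $s_0$ are hypothetical.

```python
import math

def hadronic_xsec(s, s0, q, partonic, n_steps=20000):
    """Integrate partonic(s_hat) * rho(s_hat / s) / s_hat from s0 to s.

    rho(tau) = tau**(-q) is the idealized luminosity; the integral is done
    with the midpoint rule in u = ln(s_hat), so that d(s_hat)/s_hat = du.
    """
    u_min, u_max = math.log(s0), math.log(s)
    h = (u_max - u_min) / n_steps
    total = 0.0
    for i in range(n_steps):
        s_hat = math.exp(u_min + (i + 0.5) * h)
        total += partonic(s_hat) * (s_hat / s) ** (-q) * h
    return total

# Toy background-like partonic cross section falling as 1/s_hat, which makes
# the integral dominated by its lower end s_hat ~ s0 (units: TeV^2).
background = lambda s_hat: 1.0 / s_hat
s, s0, q, alpha = 14.0 ** 2, 0.5 ** 2, 0.5, 4.0  # alpha = 4: 14 TeV -> 28 TeV

ratio = hadronic_xsec(alpha * s, s0, q, background) / hadronic_xsec(s, s0, q, background)
print(ratio, alpha ** q)  # the two numbers should nearly coincide
```

A partonic cross section that grows with $\hat s$ is not saturated at the lower end of the integral, so for it the simple $\alpha^q$ rescaling is not expected to hold in this toy setup.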
The first obvious thing to do in order to put the underlying partonic dynamics in evidence is to rescale the lower cut as $s_0 = y\, s$, with fixed y. Doing so, it is easy to see that, independent of the form of ρ, for a partonic cross section scaling like $\hat\sigma \propto \hat s^p$ one finds an integrated hadronic cross section scaling in the same way: $\sigma \propto s^p$. The lower panel in Table 12 shows the signal and background cross sections as a function of $\sqrt{s}$ after imposing $\hat s > 0.01\, s$ in addition to the acceptance cuts. One notices immediately that the background cross section still grows with $\sqrt{s}$, although at a much slower rate. In fact, this is not surprising, since in the absence of more exclusive cuts the t-channel singularities of the background $W l^+ l^-$ + jets imply a constant cross section even at the partonic level, $\hat\sigma \propto 1/M_W^2$, with a possible residual logarithmic growth due to the soft and collinear singularities. Imposing more aggressive cuts can further uncover the $1/\hat s$ behavior of the background at high energies, but the efficiency on the signal would likely be too small, and assessing the effectiveness of this strategy to enhance the signal significance requires a dedicated study.
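The statement that the hadronic cross section inherits the partonic scaling can be made explicit in one line. The sketch below assumes, for illustration, the normalization $d\sigma/d\hat s = \hat\sigma(\hat s)\,\rho(\hat s/s)/\hat s$ and neglects the mild $Q^2$ dependence of ρ; the constant c is introduced here only to fix the notation.

```latex
% Assumptions: d\sigma/d\hat s = \hat\sigma(\hat s)\,\rho(\hat s/s)/\hat s,
% \hat\sigma(\hat s) = c\,\hat s^{\,p}, lower cut s_0 = y\,s, Q^2 dependence neglected.
\begin{equation}
\sigma(s) = \int_{y s}^{s} \frac{d\hat s}{\hat s}\; c\,\hat s^{\,p}\,
            \rho\!\left(\frac{\hat s}{s}\right)
          = c\, s^{p} \int_{y}^{1} \frac{d\tau}{\tau}\; \tau^{p}\, \rho(\tau),
\qquad \tau = \frac{\hat s}{s}.
\end{equation}
```

The remaining integral depends only on y and on the shape of ρ, not on s, so that $\sigma \propto s^p$ for any luminosity function.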
A more surprising result is the behavior of the signal in Table 12: after the rescaled cut, one would expect the signal cross section at $\xi = 1$ to grow like s, modulo a mild logarithmic evolution of the PDFs. We do observe such a growth between 10 and 20 TeV, but the growth saturates towards 40 TeV. On inspection, this is a simple consequence of the acceptance cuts we have imposed. A first effect comes from the constraint on the rapidity of the forward jets: $|\eta_j| < 5$. Since the $p_T$ of the forward jets is ∼ $m_W$, their rapidity will scale like $\ln(\sqrt{s}/m_W)$. Our Monte Carlo simulation shows that above 40 TeV the $\eta_j$ distribution peaks above 4.5, and thus the apparently reasonable acceptance cut eliminates a significant portion of the signal (approximately 20% at 40 TeV, which increases when selecting events at large $\hat s$, or large $m_{hh}$), see Fig. 24. We do not know how realistic it is to consider detectors with larger rapidity acceptance, but it seems that one lesson to be drawn is that forward jet tagging is a potential obstacle towards the exploitation of very high beam energies.
A second and more dramatic effect comes from our request of having highly separated jets and leptons. Quite intuitively, the more energetic the event is, the more boosted the Higgses, and the more collimated their decay products. This implies that the efficiency of the "standard" isolation cuts in Eq. (4.2) drastically decreases at large energies. Rather thanŝ, the best variable to look at in this case is m hh , which is the real indicator of the strength of the hard scattering in the signal and consequently of the boost of the Higgs decay products. At 14 TeV, in the MCHM4 at ξ = 1, the total fraction of three-lepton events where the two quarks from the decay of the hadronic W are reconstructed as a single jet, so that the event has three hard and isolated jets, is 0.17. This has to be confronted with the fraction of events with four hard isolated jets, i.e., those selected for the analysis of Section 4.1, which is equal to 0.4. If one requires m hh > 750 GeV, the fraction of events where the hadronic W is reconstructed as a single 'fat' jet grows to 0.32, while the fraction of four jet events decreases to 0.36. For m hh > 1500 GeV, these fractions become respectively 0.59 and 0.18. It is thus clear that a different cut and event selection strategy has to be searched for if one wants to study the signal at very large energies. Certainly, events with three jets will have to be included in the analysis, and jet substructure techniques [32] can prove extremely useful to beat the larger background. Ultimately, the very identification  and reconstruction of the signal events will probably have to be reconsidered, trying to better exploit the peculiar topology of the signal events at large energy, a limit in which the two Higgses and the two reference jets form four collimated and energetic clusters.
Other than to beat the background, studying the signal at large energies is crucial to disentangle its model dependency and extract $(a^2 - b)$. If the subdominant $ZZ \to hh$ contribution is neglected, the signal cross section at fixed $m_{hh}$ can be written as the product of a $WW \to hh$ hard cross section times a W luminosity factor $\rho_W$. An implicit sum over all partons $q_A$, $q_B$ and over transverse and longitudinal W polarizations $i, j = T, L$ is understood. $P^{T,L}_{A,B}(z)$ are the W splitting functions given in Eqs. (3.5) and (3.6), which depend upon the parton flavor A, B through the vectorial and axial couplings. Unless a cut on the rapidity of the final Higgses is imposed (see Section 3.2), the contribution of the longitudinal W's is by far dominating both at $\xi = 0$ and at $\xi \neq 0$. Hence, by taking the ratio of the observed number of signal events over the SM expectation at $\xi = 0$, the W luminosity factors drop out, and the quadratic growth in $m_{hh}$ can be extracted. The left plot of Fig. 25 shows such a ratio for events with no cuts imposed. After the cuts, one obtains a similar plot, although the range of accessible values of $m_{hh}$ is reduced as a consequence of the smaller efficiency at large $m_{hh}$ discussed above. The plot on the right in Fig. 25 reports, instead, the ratio of the number of signal events predicted in two different models, respectively the MCHM4 and MCHM5, with BR($h \to WW$) set to one. As expected, at large $m_{hh}$ the universal $\propto m_{hh}^2$ behavior dominates over the model-dependent threshold effects controlled by the Higgs trilinear coupling, and the ratio tends to 1. These two plots show that the strong scattering growth of the signal could be established, and $(a^2 - b)$ be extracted, if one were able to study events with $m_{hh}$ up to 1.0 − 1.5 TeV, corresponding to $m_{hh}^{vis}$ up to ∼ 0.7 − 1.0 TeV. As Fig. 20 clearly illustrates, at 14 TeV with 300 fb$^{-1}$ there are too few events at large $m_{hh}$ to perform such a study.
It is thus necessary to have either a luminosity or an energy upgrade of the LHC.
With 3 ab$^{-1}$ of integrated luminosity our analysis predicts approximately 50 three-lepton events and 150 two same-sign lepton events in the MCHM4 at $\xi = 1$, see Table 9. Although these are still small numbers, this shows that even following a standard strategy a tenfold luminosity upgrade of the LHC should be sufficient to extract the energy-growing behavior of the signal. The advantage of a higher-energy collider compared to a luminosity upgrade is that, for the same integrated luminosity, one can probe larger values of $m_{hh}$. According to Eq. (7.3), when the collider energy is increased the differential cross section gets rescaled due to the modified luminosity factor $\rho_W$. The plot of Fig. 26 shows the increase in the number of signal events at a given $m_{hh}$. This is well approximated by the ratio of luminosity factors $r(m_{hh}^2/s) = \rho_W^{LL}(m_{hh}^2/s, Q^2)/\rho_W^{LL}(m_{hh}^2/(14\,\mathrm{TeV})^2, Q^2)$ and is thus independent of the imposed cuts. One can see that at 28 TeV the increase is larger than 10 only for events with $m_{hh} \gtrsim 1.6$ TeV. This suggests that in order to study the signal up to $m_{hh} \sim 1.5$ TeV a tenfold luminosity upgrade of the LHC would be as effective as, if not better than, a 28 TeV collider. Of course a definitive conclusion on which of the two facilities is the most effective, whether a luminosity or an energy upgrade, requires a precise estimate of the background, which scales differently in the two cases, and a more precise knowledge of how the various reconstruction efficiencies are modified in the higher luminosity phase. We leave this to a future study.

Conclusions and outlook
In this paper we have considered the general scenario where a light composite CP even scalar h with couplings similar, but different, to those of the Standard Model Higgs arises from the electroweak symmetry breaking dynamics. We have simply called h 'the Higgs', although our parametrization also applies to situations where h is quite distinguished from a Higgs, like for instance the case of a light dilaton. We have noticed that besides deviations from the SM in single Higgs production and decay rates, this scenario is characterized by the growth with the energy of the amplitudes for the processes W L W L → W L W L , W L W L → hh and W L W L → tt. In particular, the reaction of double Higgs production in vector boson fusion W L W L → hh emerges, along with the well studied process of vector boson scattering W L W L → W L W L , as a potentially interesting probe of strongly coupled electroweak dynamics. Specifically, the amplitude for W L W L → hh is predicted to grow with energy at the same rate as W L W L → W L W L in models where h is a pseudo-Goldstone boson, like those based on the SO(5)/SO(4) coset of Refs. [4,5]. On the other hand, when h represents a dilaton, the amplitude for W L W L → hh does not grow at the leading linear order in the center of mass energy s.
Motivated by the above, we have performed a detailed analysis of the detectability of the process $W_L W_L \to hh$ at the LHC, more precisely $pp \to hhjj$. Our analysis focussed for concreteness on the pseudo-Goldstone Higgs scenario, but our results clearly have a broader validity. Theoretically the physics of strong $W_L W_L \to hh$ in hadron collisions resembles quite closely that of $W_L W_L \to W_L W_L$. In practice there are important differences due to the different decay channels of the final states and due to the different SM backgrounds. For instance, it is a known fact, which we reviewed in Section 3, that the cross section for the scattering of transversely polarized vector bosons $W_T W_T \to W_T W_T$ is numerically large in the SM, to the point that even in maximally coupled Higgsless models one must go to a center of mass energy around 700 GeV in order for the signal $W_L W_L \to W_L W_L$ to win over. This 'difficulty' is compensated by the availability of rather clean final states, in particular the purely leptonic gold-plated modes $W_L W_L \to \ell\ell + \not\!E_T$. The end result is that, at 14 TeV with 300 fb$^{-1}$, strong vector boson scattering should be detectable in Higgsless models and in pseudo-Goldstone Higgs models with $v^2/f^2 \gtrsim 0.5$ [6]. In the case of $W_L W_L \to hh$ the situation is somewhat reversed. In realistic composite Higgs models the rate for $W_L W_L \to hh$ is significantly bigger than the one in the SM, already close to threshold. However, the final states from the decay of the Higgs pair most of the time involve QCD jets, thus making it more difficult to distinguish the signal from the background created by other SM processes.
We performed a partonic analysis of $pp \to hhjj$ using 'standard' cuts as shown in Eq. (4.2) to define jets. With that method we found that for the final state $hh \to b\bar b b\bar b$ the pure QCD background from $pp \to b\bar b b\bar b jj$ makes the signal undetectable. We have then focussed on the case where the Higgs decays dominantly to W's, i.e., $hh \to W^+W^-W^+W^-$. While in the SM this requires $m_h \gtrsim 150$ GeV, it should be remarked that in the case of a composite Higgs the range can in principle extend to lower values of $m_h$ as the single Higgs couplings are also modified. For example, in some interesting models like those based on the SO(5)/SO(4) coset with matter transforming as the fundamental representation of SO(5), the Higgs coupling to fermions is suppressed over a significant range of the parameter space, thus enhancing the relevance of the channel $h \to WW^*$ over $h \to b\bar b$. We have made a detailed study of the detectability of the final states involving at least 2 leptons shown in Eq. (4.1). One basic feature of the signal events that plays a crucial role in our analysis is the presence of two very energetic forward jets with large rapidity separation, large relative invariant mass and $p_T \lesssim m_W$. Like in WW-scattering, these jets originate from the collinear splitting $q \to q W_L^*$, where $W_L^*$ is a longitudinally polarized W with virtuality ∼ $p_T$ ∼ $m_W$. For each of the final states we have devised the optimal cuts by proceeding with a 3-step analysis. First we have performed standard acceptance cuts (Eq. (4.2)). In our simple partonic analysis those also provide our crude definition of jets. Secondly we have identified the relevant set of kinematical variables that characterize the signal against the background. These are the rapidity separation and invariant mass of the suitably identified forward jets, $\Delta\eta_{JJ}^{ref}$ and $m_{JJ}^{ref}$, and the mass shell conditions of the reconstructed candidate h's and W's.
On those variables we then performed a set of master cuts defined in such way that cutting on each variable would not decrease the signal by more than 2%. As a third final step we searched for the optimal set of cuts on the relevant kinematical variables by following an iterative procedure: at each step we cut over the observable providing the largest enhancement of the signal significance, until no further improvement is possible. For instance, for the three lepton final state S 3 the optimized cuts are shown in Eqs. (4.4), (4.5) and (4.6), where in the latter two equations we specialize the cut on the invariant mass on the candidate hadronic W 's to the energy resolution of respectively CMS and ATLAS. In the case of two and four-lepton events we proceeded in a similar way.
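The iterative procedure of the third step can be sketched as a greedy search. The sketch below is only an illustration under simplifying assumptions: events are weighted lists of observables, candidate cuts are lower thresholds taken from a fixed list, and the figure of merit is the simple estimator $S/\sqrt{S+B}$ rather than the statistical treatment used for Table 10. All names and numbers are hypothetical.

```python
import math

def significance(signal, background):
    """Simple figure of merit S / sqrt(S + B) from weighted event lists."""
    s = sum(w for _, w in signal)
    b = sum(w for _, w in background)
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

def apply_cut(events, cut):
    """Keep events whose observable `name` exceeds `minimum`."""
    name, minimum = cut
    return [(obs, w) for obs, w in events if obs[name] > minimum]

def greedy_cuts(signal, background, candidates):
    """At each step apply the candidate cut that most improves the
    significance; stop when no remaining cut improves it."""
    chosen, remaining = [], list(candidates)
    best = significance(signal, background)
    while remaining:
        scored = [(significance(apply_cut(signal, c), apply_cut(background, c)), c)
                  for c in remaining]
        top_score, top_cut = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break
        signal = apply_cut(signal, top_cut)
        background = apply_cut(background, top_cut)
        chosen.append(top_cut)
        remaining.remove(top_cut)
        best = top_score
    return chosen, best

# Toy events: (observables, weight). Values are illustrative only.
toy_signal = [({"deta": 5.0, "mjj": 1000.0}, 1.0)] * 4
toy_background = ([({"deta": 2.0, "mjj": 1200.0}, 1.0)] * 12
                  + [({"deta": 5.0, "mjj": 300.0}, 1.0)] * 4)
toy_candidates = [("deta", 4.0), ("mjj", 500.0), ("deta", 1.0)]
print(greedy_cuts(toy_signal, toy_background, toy_candidates))
```

A greedy search of this kind is not guaranteed to find the global optimum, since the best cut at one step can depend on cuts applied later.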
The final results for the cross section of the signal and of the various backgrounds at different stages of the cut procedure are shown in Tables (3), (4) and (8), respectively for S 3 , S 2 and S 4 . Some of the background processes needed in our study were not available in the literature, and we computed them by writing new routines in ALPGEN. We believe that the results of our simple partonic analysis are robust, and should remain stable when performing a more proper treatment of initial and final state radiation. We have not done a complete analysis, but only considered showering for the signal and the leading sources of background. We found that the inclusion of showering enhances the efficiency of our cuts. This is not surprising: while the energy scale in the signal is large, colored particles have a virtuality ∼ < m W and little QCD radiation is associated with them. This is not the case for the background: extra radiation in this case increases the invariant masses of the Higgs and W candidates and makes it more difficult for the background to pass our on-shell cuts.
The outcome of our analysis is synthesized in Tables (9) and (10). With 300 fb −1 only for very low compositeness scale ξ = 1, basically the Technicolor limit, can one barely see the signal. A realistic viewpoint is therefore that the LHC luminosity upgrade of 3 ab −1 will be needed to study strong double Higgs production. The three lepton final state would then provide a rather clean signal for ξ > 0.5. The two same-sign lepton final state is not as free from background, but yields a predicted number of events a factor 3 larger. Both channels would independently give a 9σ signal in the limiting case ξ = 1. It should be emphasized that for the case of the two lepton signal a more careful estimate of the background, including correcting for detector efficiency, will be needed to reach the above mentioned significance, given that the background is more important for that channel. One should compare our results for strong double Higgs production to those of the more studied W W scattering. In that case the final numbers are significantly better. For instance, according to Ref. [16], the reaction W + W + → W + W + in the purely leptonic final state would yield approximately 40 events of signal at ξ = 1 with 300 fb −1 , with a background of about 10 events (mostly due to the scattering of transversely polarized W 's). It should however be emphasized that the hh final state gives access to additional information on the independent parameters b and d 3 . At large m hh the effect of b dominates as it controls the energy growing part of the amplitude. In our analysis, we did not impose a lower cut on m hh and we thus collected also the events close to threshold, which depend also on the Higgs cubic d 3 . This parameter has a significant impact on the total cross section. 
For instance, this can be seen in Table 9 by inspecting the two lepton channel in the two different models MCHM4 and MCHM5 for the same value ξ = 0.8, that is for coinciding a and b: the 40% mismatch in the number of events is a measure of the relevance of the cubic coupling d 3 . In principle a scan of the dependence of the signal events on m hh should allow the extraction of both b and d 3 . By putting together the information contained in Figs. 20 and 25 one can deduce that with a tenfold luminosity upgrade of the LHC it would be realistic to perform such a study, at least for models that deviate sizably from the SM (i.e., with (a 2 − b) = O(1)). We have not attempted to estimate how well we could extract b and d 3 , because in order to do so in a general model-independent way, we would also need to study in more detail how accurately a and c can be extracted from single Higgs production. This is because these two parameters affect both the signal and the background cross sections. On the other hand, if an excess in the total cross section is found, it should be possible to decide whether it was a pseudo-Goldstone Higgs or a dilaton by considering the energy distribution of the events. In the case of a dilaton the dependence on m hh would be the same as in the SM, while for the pseudo-Goldstone Higgs a characteristic growth ∝ m 2 hh as well as a harder distribution in H T would appear.
Our detailed study of the background was done assuming 14 TeV collisions. We have not attempted such an analysis at higher energies. We have however tried to assess how much an energy upgrade, as opposed to a luminosity upgrade, would improve things. We believe that this dilemma is at least partly resolved by Fig. 26, where we show how the differential signal cross section rescales with the beam energy in the relevant region of $m_{hh}$. Assuming the same luminosity as the LHC, it seems that an energy upgrade to 28 TeV would not do better than a tenfold luminosity upgrade at 14 TeV. Of course there are many other variables in such an extrapolation, like for instance the issue of pile-up, which we cannot control. Our result should thus be taken as a hint. It should also not be forgotten that an increase in beam energy would increase the sensitivity to resonances. In particular a scalar resonance in the s-channel could clearly enhance our signal.
There are a few directions along which our analysis can be extended or improved. One source of limitation in our study was the small branching ratio to leptonic final states. A possible improvement could come from considering W decays to τ . By a simple estimate, one concludes that by including events with three leptons of which one is a tau, the yield of this channel is almost doubled. A careful study of background, including consideration of the efficiency of τ tagging and τ /jet mistagging, would however be in order. Another limitation of our analysis is due to our 'conservative' choice of acceptance cuts. The parton isolation criterion corresponding to these cuts clearly disfavors the signal in the interesting energy range where the center of mass energy of the two Higgs system is large and the final decay products are boosted. It would be interesting to explore another cut strategy where the jets and leptons from each decaying Higgs are allowed to merge, and where the features of the signal are contrasted to those of the background by using jet substructure observables. On one hand this direction seems to make things worse by increasing the relevance of background events with fewer jets. On the other hand, it would allow a more efficient collection of signal events in the region of large invariant m hh where the signal cross section becomes larger. Indeed with that more aggressive strategy one could in principle consider the possible relevance of the one lepton channel, where only one W decays leptonically. One advantage of that channel is that one can reconstruct the momentum of the neutrino and close the kinematics. To the extreme one could even reconsider the 4b's final state, which could well be the dominant one if the Higgs is light.
The following tables report the factorization and renormalization scale Q chosen for each sample (where $m_h = 180$ GeV, $m_t = 171$ GeV):