Search for 4th family quarks with the ATLAS detector

The pair production of heavy fourth-generation quarks, which are predicted under the hypothesis of flavor democracy, is studied using tree-level Monte Carlo generators and fast detector simulation. Two heavy-quark mass values, 500 and 750 GeV, are considered with the assumption that the fourth family mixes primarily with the two light families. It is shown that a clear signature will be observed in the data collected by the ATLAS detector after the first year of low-luminosity running at the Large Hadron Collider.


Introduction
It is well known that the number of fundamental fermion families (generations) is not fixed by the Standard Model (SM). Precision measurements performed by the Large Electron-Positron Collider experiments at the Z pole have shown that the number of families with light neutrinos ($m_\nu < m_Z/2$) is equal to three. On the other hand, asymptotic freedom in QCD constrains this number to be less than nine. Therefore, from a purely experimental standpoint, it is meaningful to search for a possible fourth SM family at the forthcoming colliders. On the theoretical side, the fourth SM family is a direct outcome of the flavor democracy (in other words, democratic mass matrix) approach [1,2,3], which is strongly motivated by naturalness arguments (see the review [4] and the references therein). Meanwhile, there are phenomenological arguments against the existence of a fifth SM family [5]. In this paper, the additional quark and lepton pairs of the fourth family are denoted as $u_4$, $d_4$ and $e_4$, $\nu_4$.
The most recent limit on the mass of the $u_4$ quark is $m_{u_4} > 256$ GeV [6]. Partial-wave unitarity gives an upper bound of about 1 TeV on the fourth-family fermion masses [7]. According to flavor democracy, the masses of the new quarks have to be within a few GeV of each other. This is also hinted at experimentally by the value of the $\rho$ parameter, which is close to unity [8]. Therefore, if the fourth SM family exists, the Large Hadron Collider (LHC) will copiously produce its quarks [9] and the proposed linear colliders will provide the opportunity to discover its leptons [10]. As the single production of the new quarks at the LHC is suppressed compared to their pair production, due to the small value of the CKM matrix elements, only the latter is considered. The new quarks, being heavy, will decay to the known SM quarks and W bosons. The dominant decay channels are defined by the $4 \times 4$ extension of the CKM mixing matrix, with two distinct possibilities:
1) If the fourth family mixes primarily with the third one, the decay channels will be $u_4 \to W^+ b$ and $d_4 \to W^- t$. The signature of $u_4\bar{u}_4$ production will be $W^+W^-b\bar{b}$, whereas in the case of $d_4\bar{d}_4$ the final state would have an additional $W^+W^-$ pair. The former case was studied in [9,11] about 10 years ago. The latter case, while potentially feasible owing to the low predicted SM backgrounds with four W bosons in the final state, is likely to be less interesting as a discovery channel, due to the difficulties in jet association and invariant mass reconstruction.
2) If the fourth generation mixes primarily with the first two families, the dominant decay channels will be $u_4 \to W^+ d/s$ and $d_4 \to W^- u/c$. In this case, since the light-quark jets are indistinguishable, the signature will be $W^+W^-jj$ for both $u_4\bar{u}_4$ and $d_4\bar{d}_4$ pair production. Therefore, both up-type and down-type new quarks should be considered together, since distinguishing between $u_4$ and $d_4$ quarks with quasi-degenerate masses at a hadron collider appears to be a difficult task. In this sense, lepton colliders are more advantageous, especially if fourth-family quarkonia could be formed.
The most up-to-date measurements of the quark mixings, as published by the Particle Data Group [8], can be combined with the unitarity assumption for the $4 \times 4$ extension of the CKM matrix to constrain the mixings related to the fourth-family quarks. The first step is to calculate the squares of the entries in the fourth row and column, together with their errors, from the measured CKM matrix elements $V_{ij}$ and the quoted errors $\delta_{ij}$ on these measurements. If one allows $V_{i4}^2$ and $V_{4i}^2$ to deviate by one sigma, the square root of the sum gives the upper limit for the fourth-family quark mixings, where the lower (upper) limit of 0 (1) is implicitly assumed for all the new entries [13]. The remainder of this paper investigates the discovery potential of the ATLAS experiment at the LHC for the fourth-family quarks in the case where their dominant mixings are to the first and second SM families, as described in the second scenario above. The tree-level diagrams for the pair production of the new quarks and their subsequent decays are given in Fig. 1 for the $d_4$ quark decaying via $d_4 \to W q$ ($q = u, c$). The same diagrams are also valid for $u_4$ quark production and decay, provided the c and u quarks are replaced by s and d quarks. The widths of the $d_4$ and $u_4$ quarks are proportional to $|V_{d_4 u}|^2 + |V_{d_4 c}|^2$ and $|V_{u_4 d}|^2 + |V_{u_4 s}|^2$, respectively. Although the extension parameters have much higher upper limits, the common and conservative value of 0.01 is used for all four relevant mixings in the event generation and analysis sections. As the widths of the new quarks are much smaller than their masses, this choice of the new CKM elements has no impact on the pair production cross sections.
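The unitarity-based limit described above can be illustrated with a short sketch. It assumes that the three measured magnitudes in a row (or column) have uncorrelated errors and that the one-sigma deviation is added to the unitarity remainder before taking the square root; the function name and the input numbers are placeholders for illustration, not PDG values.

```python
import math


def fourth_family_upper_limit(v, dv):
    """Upper limit on |V_i4| (or |V_4i|) from unitarity of a 4x4 CKM matrix.

    v  : the three measured magnitudes |V_ij| in one row (or column)
    dv : their quoted one-sigma errors (assumed uncorrelated)
    """
    # Unitarity: the squared magnitudes in a row (column) sum to one.
    v2 = 1.0 - sum(x * x for x in v)
    # Propagate the errors on |V_ij| to the sum of squares.
    err = math.sqrt(sum((2.0 * x * dx) ** 2 for x, dx in zip(v, dv)))
    # Allow a one-sigma upward deviation and keep the square within [0, 1].
    upper_sq = min(max(v2 + err, 0.0), 1.0)
    return math.sqrt(upper_sq)


# Placeholder inputs chosen for easy arithmetic, not measured CKM elements.
limit = fourth_family_upper_limit([0.5, 0.5, 0.5], [0.1, 0.0, 0.0])
```

The clamp to [0, 1] mirrors the implicit lower and upper limits mentioned in the text.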

Event Generation
In order to study the possibility of discovery, the four-family model has been implemented in the tree-level generator CompHEP v4.4.3 [14], and the pair production of the new quarks at the LHC and their subsequent decay into SM particles have been simulated. The QCD scale is set to the mass of the new quark under study and the CTEQ6L1 set is chosen for the parton distribution functions [15]. Table 1 gives the cross section for $d_4\bar{d}_4$ production for three example values of the $d_4$ quark mass, together with the decay widths. As the cross section for $u_4\bar{u}_4$ production is within 1% of the $d_4\bar{d}_4$ one, from this point on only $d_4$ will be considered and the results will be multiplied by two to cover all signal processes involving both $u_4$ and $d_4$ quarks. For each of the considered mass values, 12 thousand signal events have been generated for the $d_4\bar{d}_4 \to W^-W^+jj$ process, where j is a jet originating from a quark or antiquark of the first two SM families. To benefit from possible combined lepton and jet triggers and to reduce the ambiguity in the invariant mass reconstruction, the hadronic decay of one W boson and the leptonic (electron or muon) decay of the other have been considered. Therefore, the signal is searched for in the $4j + \ell + \not{E}_T$ final state, where $\ell$ is an electron or a muon.

Figure 1: The tree-level Feynman diagrams for the pair production and decay of the $d_4$ quarks at the LHC.

The background events originate from all the SM processes whose final state has at least two W bosons and two non-b-tagged jets. The direct background is from SM events which yield exactly the same final-state particles as the signal events. The contributions from same-sign W bosons are insignificant. Some of the indirect backgrounds are also taken into account. The dominant contribution is from $t\bar{t}$ pair production, where the b jets from the decays of the top quarks could be mistagged as light jets. Similarly, jet-associated top-quark pair production ($t\bar{t}j \to W^-W^+b\bar{b}j$) contributes substantially to the SM background, as its production cross section is comparable to that of pair production and only one mistagged b jet is sufficient to fake the signal events. The cross section for the next-order process, namely $pp \to t\bar{t}\,2j$, has been computed to be four times smaller than that of the $t\bar{t}j$ process, and therefore this process has not been considered. It should be noted that the $t\bar{t}$ and $t\bar{t}j$ samples have been conservatively added together, in spite of the fact that the initial- and final-state parton showers simulated in Pythia for the former would account for part of the cross section for the latter. Finally, the background from SM processes with a $W^{\pm}Zqq$ ($q = u, d, s, c$) final state has been studied. Its contribution to the total background is very similar to that of the direct ($WWjj$) background. All the mentioned background processes have been generated with MadGraph v3.95 [16]. This tree-level generator was previously shown to give results in good agreement with CompHEP and to be more suitable for running on a computer farm [17]. More than 280 thousand events in total, generated at different QCD scales and with different jet selection criteria, comprise the background sample.
The events from both generators are fed into the ATLAS detector simulation and event reconstruction framework, ATHENA v11.0.41, with the CompHEP events using the interface program CPYTH v2.0.1 [18]. Parton showering, hadronization and fragmentation are simulated using the ATHENA interface of Pythia v6.23 [19], and the detector response is obtained from the fast simulation software, ATLFast [20]. This software uses a parameterized function to calculate the final particle kinematic variables rapidly, and its output is calibrated to match the results from GEANT-based full detector simulation [21]. The physics objects from ATLFast are used in the final analysis in ROOT 5.12 [22].

Figure 2 (caption fragment): Signal and background histograms have been scaled to the same luminosity, except in plot c, where the histograms have been normalized to unit area.
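Scaling generated samples to a common luminosity amounts to assigning each event a weight. The following is a minimal sketch of that bookkeeping; the function name and the cross-section value are hypothetical (the entries of Table 1 are not reproduced here), and only the 12 thousand generated signal events are taken from the text.

```python
def luminosity_weight(cross_section_fb, luminosity_fb_inv, n_generated):
    """Per-event weight that scales a Monte Carlo sample to a target luminosity.

    cross_section_fb  : production cross section in fb
    luminosity_fb_inv : target integrated luminosity in fb^-1
    n_generated       : number of generated events in the sample
    """
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    # Expected number of events divided by the number actually generated.
    return cross_section_fb * luminosity_fb_inv / n_generated


# Hypothetical cross section of 600 fb; 12000 generated events as in the text.
weight = luminosity_weight(cross_section_fb=600.0,
                           luminosity_fb_inv=1.0,
                           n_generated=12000)
```

Each histogram entry from the sample is then filled with this weight, so that signal and background are directly comparable.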

Event Selection and Reconstruction
The first step of the event selection is the requirement of a single isolated lepton with transverse momentum $p_T^{\mathrm{lept}} > 15$ GeV and at least four jets with transverse momenta $p_T^{\mathrm{jet}} > 20$ GeV. The transverse momentum of the highest-momentum isolated muon in each event is shown in Fig. 2a. The four highest-energy jets are required not to be b-tagged, as determined by ATLFASTB [20], a fast b-tagging simulation program which uses a $p_T$-dependent parameterization of the tagging efficiencies. For instance, at high momenta ($p_T^{\mathrm{jet}} > 100$ GeV) the tagging efficiencies for b, c and light jets are 50%, 7.6% and 0.6%, respectively.
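The preselection above can be expressed as a simple filter. This is only a sketch under stated assumptions: the event representation (lists of dictionaries with `pt`, `energy`, `isolated` and `btag` keys) and the function name are hypothetical, and "a single isolated lepton" is interpreted here as exactly one isolated lepton above threshold.

```python
def passes_preselection(leptons, jets, lep_pt_min=15.0, jet_pt_min=20.0):
    """Return True if the event passes the lepton, jet and b-veto requirements.

    leptons : list of dicts with 'pt' (GeV) and 'isolated' (bool)
    jets    : list of dicts with 'pt', 'energy' (GeV) and 'btag' (bool)
    """
    # Exactly one isolated lepton above the pT threshold (assumed reading).
    good_leptons = [l for l in leptons
                    if l["isolated"] and l["pt"] > lep_pt_min]
    if len(good_leptons) != 1:
        return False
    # At least four jets above the pT threshold.
    good_jets = [j for j in jets if j["pt"] > jet_pt_min]
    if len(good_jets) < 4:
        return False
    # The four highest-energy jets must not be b-tagged.
    leading = sorted(good_jets, key=lambda j: j["energy"], reverse=True)[:4]
    return not any(j["btag"] for j in leading)


# Toy event: one isolated lepton and four untagged jets.
example_leptons = [{"pt": 30.0, "isolated": True}]
example_jets = [{"pt": p, "energy": p, "btag": False}
                for p in (120.0, 110.0, 40.0, 30.0)]
accepted = passes_preselection(example_leptons, example_jets)
```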
The leptonically decaying W boson is reconstructed by attributing the total missing transverse momentum in the event, shown in Fig. 2b, to the lost neutrino, and using the nominal mass of the W as a constraint. The two-fold ambiguity in the longitudinal direction of the neutrino is resolved by choosing the solution with the lower neutrino energy. The four-momenta of the third and fourth most energetic jets in the event are combined to reconstruct the hadronically decaying W boson. Due to the high momentum of the W boson in the signal events, particularly for high values of the $q_4$ mass, the two jets are not always resolved in the detector. When this happens, one of the two jets used in the combination is a random jet, which spuriously increases the invariant mass, $m_W^{jj}$, of the reconstructed W. Such cases cause a long high-end tail in the invariant mass distribution for the signal, as shown in Fig. 2c. In order to reduce their adverse effect on the final $m_{q_4}$ distributions, events with $m_W^{jj} > 200$ GeV are rejected, even though the comparison of the distributions for the signal and the background would suggest that a looser criterion would benefit the final statistical significance.
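The W mass constraint leads to a quadratic equation for the longitudinal momentum of the neutrino. The sketch below illustrates one common way of solving it, assuming a massless charged lepton and a nominal W mass of 80.4 GeV; the function name is hypothetical, and the handling of events without a real solution (dropping the imaginary part) is an assumption, since the text does not specify it.

```python
import math

M_W = 80.4  # nominal W mass in GeV (assumed value)


def neutrino_pz(lepton, met, m_w=M_W):
    """Longitudinal neutrino momentum from the W mass constraint.

    lepton : (px, py, pz) of the charged lepton, treated as massless
    met    : (px, py) of the missing transverse momentum
    """
    lx, ly, lz = lepton
    nx, ny = met
    lt2 = lx * lx + ly * ly
    if lt2 == 0.0:
        raise ValueError("lepton must have non-zero transverse momentum")
    e_l = math.sqrt(lt2 + lz * lz)
    nt2 = nx * nx + ny * ny
    # m_W^2 = 2 (E_l E_nu - p_l . p_nu) gives a quadratic equation in pz_nu.
    a = 0.5 * m_w * m_w + lx * nx + ly * ny
    disc = a * a - lt2 * nt2
    # Assumption: if there is no real solution, keep only the real part.
    root = math.sqrt(disc) if disc > 0.0 else 0.0
    sol_plus = (a * lz + e_l * root) / lt2
    sol_minus = (a * lz - e_l * root) / lt2
    # With the transverse part fixed, lower energy means smaller |pz|.
    return min(sol_plus, sol_minus, key=abs)
```

For a lepton with momentum (30, 0, 40) GeV and a neutrino with momentum (-20, 10, 5) GeV, calling the function with the invariant mass of that pair as `m_w` returns the true value of 5 GeV, the smaller of the two roots.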
The surviving events are used to obtain the invariant mass of the new quark. Each reconstructed W is associated with one of the two hardest jets, for which the minimum transverse momentum requirement is tightened to $p_T^{\mathrm{jet}} > 100$ GeV. As observed in Fig. 2d, this tighter requirement has no significant effect on the signal, while substantially reducing the background. A tighter $p_T^{\mathrm{jet}}$ selection would start to skew the final invariant mass distributions. Therefore, the lower value of 100 GeV was chosen so that the analysis results could be safely interpreted for lower $q_4$ masses as well. The W-jet association ambiguity is resolved by selecting the combination which results in the smallest difference between the masses of the two reconstructed $q_4$ quarks in the same event. If this mass difference is more than 100 GeV for either combination, the event is rejected. The event selection cuts and their efficiencies for both signal and background events are summarized in Table 2 for a quark mass of 500 GeV. These selection criteria were not re-optimized for the $m_{q_4} = 750$ GeV case, so that the result remains safely pessimistic. The results of the reconstruction for quark masses of 500 GeV and 750 GeV are shown in Fig. 3, together with the various backgrounds, for integrated luminosities of 1 and 10 fb$^{-1}$, respectively. The bulk of the background in both cases is due to $t\bar{t}j$ events, as discussed before.
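The W-jet association can be sketched as a comparison of the two possible pairings. The four-vector convention (E, px, py, pz), the function names, and the reading that the 100 GeV requirement is applied to the selected (smallest-difference) pairing are assumptions of this illustration; the $p_T^{\mathrm{jet}} > 100$ GeV requirement on the two hardest jets is taken to be applied beforehand.

```python
import math


def inv_mass(*vectors):
    """Invariant mass of the sum of four-vectors given as (E, px, py, pz)."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def reconstruct_q4_pair(w_lep, w_had, jet1, jet2, max_diff=100.0):
    """Pick the W-jet pairing with the smallest q4 mass difference.

    Returns (mass with leptonic W, mass with hadronic W) in GeV,
    or None if the event is rejected by the mass-difference requirement.
    """
    pairings = [
        (inv_mass(w_lep, jet1), inv_mass(w_had, jet2)),
        (inv_mass(w_lep, jet2), inv_mass(w_had, jet1)),
    ]
    best = min(pairings, key=lambda m: abs(m[0] - m[1]))
    if abs(best[0] - best[1]) > max_diff:
        return None
    return best


# Toy four-vectors in GeV, chosen only to exercise the code.
w_lep = (300.0, 280.0, 0.0, 50.0)
w_had = (260.0, -240.0, 30.0, -40.0)
jet1 = (250.0, -150.0, 200.0, 0.0)
jet2 = (200.0, 120.0, -160.0, 0.0)
masses = reconstruct_q4_pair(w_lep, w_had, jet1, jet2)
```

Both reconstructed masses of an accepted event enter the invariant mass histogram, which is why each event contributes two candidates.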

Results
In order to extract the signal significance, an analytical function consisting of a Crystal Ball term [23] to represent the background and a Breit-Wigner term to represent the signal resonance is fitted to the total number of $q_4$ candidates in the invariant mass plots of Fig. 3. In both plots, the fitted function is shown in solid black, and its signal component is plotted as a dashed red line. The shape of the background curve was also verified against random fluctuations (as in Fig. 3, left side, in the 500-600 GeV region of the $WWb\bar{b}j$ curve) by parameterizing the background and then generating a large sample of pseudo-MC experiments. It was found that with large statistics the Crystal Ball function is a very accurate description of the background shape. The extracted number of total signal events is in very good agreement with the actual number of events in the signal Monte Carlo sample. The significance is estimated as $S/\sqrt{S+B}$, where S (B) is the number of signal (background) events determined from the Breit-Wigner (Crystal Ball) term of the fitted function. As each event contributes two $q_4$ candidates to the invariant mass histogram, the total number of signal (background) events is obtained by taking half of the integral of the signal (background) term within $\pm 2\Gamma$ (twice the full width at half maximum) of the peak position of the signal. For the case of $m_{d_4} = 500$ GeV (750 GeV), with 1 fb$^{-1}$ (10 fb$^{-1}$) of data, the signal significance is found to be 9.2 (7.1). The numbers of signal and background events for these two example cases are presented in Table 3.
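The counting behind the quoted significance can be written out in a few lines. The sketch below takes the integrals of the fitted signal and background terms as given inputs; the function names and the numerical values are hypothetical and are not the entries of Table 3.

```python
import math


def events_from_candidates(candidate_integral):
    """Each event yields two q4 candidates, so halve the fitted integral."""
    return 0.5 * candidate_integral


def significance(n_signal, n_background):
    """Estimate the significance as S / sqrt(S + B)."""
    total = n_signal + n_background
    if total <= 0.0:
        raise ValueError("S + B must be positive")
    return n_signal / math.sqrt(total)


# Hypothetical integrals of the fitted terms within the signal window.
s = events_from_candidates(200.0)  # Breit-Wigner (signal) term
b = events_from_candidates(600.0)  # Crystal Ball (background) term
z = significance(s, b)
```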

Conclusion
The analysis can be extrapolated to other $q_4$ quark mass values to estimate the amount of integrated luminosity necessary for a discovery. Fig. 4 contains the tree-level pair production cross section of the fourth-generation quarks ($u_4$ and $d_4$ combined), showing the contributions from gluon fusion and $q\bar{q}$ annihilation. For the selected parton distribution function, the latter becomes more important at a quark mass of around 650 GeV. The same figure, on the right-hand side, shows the estimated integrated luminosity required for a 5σ discovery as a function of the mass of the new quark. The estimates on this plot are based on the cross sections shown and the integration of the background function as obtained from the fits presented in the analysis section. In all cases, the number of signal events to be collected in order to reach 5σ significance is above 20. While this study is based on a fast simulation of the detector response which was not fully validated, and there are uncertainties associated with the QCD scale, statistical errors, etc., we believe that the conservative selection cuts and the simplicity of the reconstruction algorithms lend reliability to the conclusions. This study has shown that, if the fourth-family quarks mix primarily with the first two generations, a clear signal will be observed for the mass range of interest within the first year of low-luminosity running at the LHC. On the other hand, if the mixing matrix is such that the third SM family quarks play the dominant role, similar results can be claimed for the $u_4$ quark, while the discovery of the $d_4$ quark is likely to require more luminosity because of the complexity of the event signature arising from the top-quark decays. In either case, the first few years of LHC data will resolve the discussion on the possibility of four SM families within the context of flavor democracy.
Figure 4: On the left, the $q_4\bar{q}_4$ pair production cross section at tree level and, on the right, the integrated luminosity needed for a 5σ discovery of the signal, both as a function of the new quark mass. Only pair production and mixing with the first two families are considered.
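As a rough cross-check of the luminosity estimates, the significances quoted in the Results section can be rescaled under the assumption that both signal and background yields grow linearly with integrated luminosity, so that $S/\sqrt{S+B}$ grows as the square root of the luminosity. This simple scaling is an illustration only and is not the procedure used to produce Fig. 4, which relies on the cross sections and the fitted background function; the function name is hypothetical.

```python
def luminosity_for_significance(z_ref, lumi_ref, z_target=5.0):
    """Luminosity needed for z_target, assuming Z scales as sqrt(luminosity).

    z_ref    : significance obtained at the reference luminosity
    lumi_ref : reference integrated luminosity in fb^-1
    """
    if z_ref <= 0.0:
        raise ValueError("z_ref must be positive")
    return lumi_ref * (z_target / z_ref) ** 2


# Significances and luminosities quoted in the Results section.
lumi_500 = luminosity_for_significance(9.2, 1.0)    # m = 500 GeV
lumi_750 = luminosity_for_significance(7.1, 10.0)   # m = 750 GeV
```

Under this assumption, a 5σ observation would need roughly 0.3 fb$^{-1}$ for a mass of 500 GeV and roughly 5 fb$^{-1}$ for 750 GeV.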