Reconstruction of physics objects at the Circular Electron Positron Collider with Arbor

After the Higgs discovery, precise measurements of the Higgs properties and the electroweak observables have become vital for experimental particle physics. A powerful Higgs/Z factory, the Circular Electron Positron Collider (CEPC), has been proposed. A Particle Flow oriented detector design is proposed for the CEPC, and a Particle Flow algorithm, Arbor, has been optimized accordingly. We summarize the physics object reconstruction performance of the Particle Flow oriented detector design with the Arbor algorithm and conclude that this combination fulfills the physics requirements of the CEPC.


The Higgs discovery and the precision measurements
The discovery of the Higgs boson completes the Standard Model (SM) particle spectrum [1] [2]. As one of the most successful models ever constructed, the SM agrees with, predicts and interprets almost all the data taken from collider experiments. However, the SM is unable to explain many observed or anticipated fundamental phenomena beyond the collider experiments. For instance, the SM contains no candidate particle for dark matter, it cannot explain dark energy or inflation, and so far it does not provide enough CP violation for baryogenesis. In addition, the SM suffers from the problems of naturalness, the hierarchy, and vacuum stability, among others. All these clues point to an intriguing and highly probable possibility: the SM is a low-energy effective theory of more profound physics principles. The revelation of these principles is the key objective of experimental particle physics after the Higgs discovery, that is, in the post-Higgs era.
e-mail: Manqi.ruan@ihep.ac.cn
Interestingly, most of the clues point to the Higgs field. The huge difference between the Higgs boson mass and the Planck scale is at the heart of the naturalness problem; the couplings between the Higgs boson and the SM fermions host the CP violation phases. The Higgs boson may serve as a portal to dark matter and even dark energy. Therefore, the Higgs boson is an excellent probe of these fundamental physics principles, and a Higgs factory that can reveal the nature of the Higgs boson becomes a must for experimental particle physics.
The LHC is a powerful Higgs factory. It not only discovered the Higgs boson but also indicates that the discovered Higgs boson is highly SM-like [3]. The planned high-luminosity operation of the LHC (HL-LHC) will certainly shed more light on the nature of the Higgs boson. However, at a proton collider, the accuracy of the Higgs measurements is limited by the huge QCD background, and most of the Higgs signals can only be identified from their decay final states. As a result, only a very small fraction (roughly $10^{-3}$) of the Higgs events are identified at the proton collider. The measurement precision (i.e. of the signal strengths) is typically limited to the 10% level at the HL-LHC [4] [5].
The electron-positron collider provides crucial information on top of the HL-LHC. First of all, the electron-positron Higgs factory is free of the QCD background. Within the detector fiducial volume, the ratio between the Higgs signal cross section and that of the inclusive physics events is roughly $10^{-2}$ to $10^{-3}$, roughly eight orders of magnitude better than at the LHC. The total event rate at an electron-positron Higgs factory is so low that almost every physics event can be recorded. In addition, a significant portion of the Higgs bosons are generated together with a Z boson (the Higgsstrahlung process) at an electron-positron Higgs factory. In these events, the Higgs boson can be identified through the Z boson via the recoil mass method, leading to absolute measurements of the inclusive ZH cross section, the Higgs boson width and the couplings between the Higgs boson and its decay final states. The electron-positron collider is also extremely sensitive in searches for exotic Higgs decay modes.
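The recoil mass method can be illustrated with a minimal sketch. It assumes a 240 GeV center of mass energy with the initial state at rest, and neglects beam energy spread and initial state radiation; the function and variable names are hypothetical and do not come from any CEPC software.

```python
import math

def recoil_mass(sqrt_s, energy, px, py, pz):
    """Mass recoiling against a system with four-momentum (energy, px, py, pz).

    All quantities are in GeV. Assumes the initial state is at rest with
    total energy sqrt_s (no ISR, no beam energy spread).
    """
    m2 = (sqrt_s - energy) ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    # Resolution effects can push m2 slightly negative; clamp before the sqrt.
    return math.sqrt(max(m2, 0.0))

# Illustrative two-body kinematics: a Z (91.2 GeV) produced with a 125 GeV Higgs.
SQRT_S, M_Z, M_H = 240.0, 91.2, 125.0
e_z = (SQRT_S ** 2 + M_Z ** 2 - M_H ** 2) / (2.0 * SQRT_S)
p_z = math.sqrt(e_z ** 2 - M_Z ** 2)
m_rec = recoil_mass(SQRT_S, e_z, 0.0, 0.0, p_z)
print(f"recoil mass = {m_rec:.3f} GeV")
```

Only the Z decay products enter the calculation, which is why the Higgs boson can be tagged without any assumption on its decay.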
Because of these advantages, many electron-positron Higgs factories have been proposed [6] [7] [8] [9]. The fact that the Higgs boson has a mass of 125 GeV promotes the concept of circular Higgs factories, which are upgradable to high energy proton colliders. The Circular Electron-Positron Collider (CEPC) is one of these proposals. With a main ring circumference of 100 km, the CEPC will be operated at a center of mass energy of 240 GeV and produce 1 million Higgs bosons in 10 years of operation with two detectors. At this energy, roughly 95% of the Higgs bosons are generated via the ZH process, ensuring an excellent g(HZZ) measurement. Lowering the center of mass energy to 91 GeV, the CEPC could produce more than $10^{10}$ Z bosons per year, from which electroweak observables such as $A_{FB}^{b}$, $R_b$ and the Z line shape can be measured precisely. After the electron-positron collision phase, a super proton-proton collider (SppC) with a center of mass energy of up to 100 TeV can be installed in the same tunnel.
In terms of the Higgs measurements, the CEPC determines the absolute Higgs couplings to accuracies of 0.1% to 1%, roughly one order of magnitude superior to the model dependent measurements at the HL-LHC [4] [5]. The Higgs total width could be measured to an accuracy of 3%. Depending on the event topology, the exotic decay branching ratios can be limited to $10^{-3}$ to $10^{-5}$ [10]. Meanwhile, the CEPC produces large samples of Z and W bosons and can improve the precision of the electroweak (EW) measurements by at least one order of magnitude with respect to the current precision. A combination of the EW and the Higgs measurements could significantly enhance the physics reach [11].

The CEPC physics requirements and the Particle Flow
As a Higgs factory, the CEPC detector should be able to distinguish the Higgs signal from the SM background and to classify different Higgs generation/decay modes. In other words, the CEPC detector is required to reconstruct all the physics objects in the Higgs events with high efficiency and high purity, and to measure them with high precision. The physics requirements for the CEPC detector can be summarized as (but are not limited to) the following:
1, Be adequate to the CEPC collision environment: the detector should be fast enough to record all the physics events and robust enough against the irradiation;
2, Highly hermetic;
3, Excellent track reconstruction efficiency and a momentum resolution better than $\delta(1/P_t) = 2 \times 10^{-5}$ GeV$^{-1}$, required by the g(Hµ+µ−) measurement and the Higgs recoil mass reconstruction in the llH channels;
4, Excellent lepton identification, required by both the Higgs and the EW measurements;
5, Capability to identify charged kaons, required by flavor physics;
6, Precise reconstruction of photons, required by physics with τ final states, the jet energy reconstruction, and the Br(H → γγ) measurement;
7, Capability to identify τ leptons and their different decay modes, required by the g(Hτ+τ−) measurements and physics with τ final states;
8, Good jet/missing energy (MET) reconstruction, needed by most of the CEPC physics measurements;
9, Capability to separate b-jets, c-jets and light jets (uds and gluon jets), required by the g(Hbb), g(Hcc), and g(Hgg) measurements.
Since the W and Z bosons decay into similar physics objects as the Higgs boson, the EW measurements also benefit from these requirements. In addition, compared to the Higgs measurements, the EW measurements are much more demanding in terms of systematic control. For example, the CEPC detector is required to determine the luminosity to a relative accuracy of $10^{-3}$ for the Higgs measurements, and $10^{-4}$ for the Z pole operation.
Adequate reconstruction and detector design are fundamental to the CEPC. Following a significant trend in experimental particle physics [12] [13] [14] [15], the Particle Flow oriented detector design and reconstruction is selected as the baseline for the CEPC. Particle Flow aims at reconstructing every final state particle with the best suited sub-detector system. Ultimately, it provides a one-to-one correspondence between the reconstructed particles and the physics truth. The physics objects are then reconstructed from the final state particles. Particle Flow, with an adequate detector design, can significantly enhance the reconstruction efficiency, the purity and the measurement accuracy of the key physics objects. In addition, Particle Flow can largely improve the jet energy resolution, since the majority of the jet energy is carried by charged hadrons, whose track momenta are usually measured by the tracking system with a much better accuracy than their cluster energies measured by the calorimeter system. On the other side of the coin, the software and the reconstruction are vital and challenging for a Particle Flow oriented design. An adequate Particle Flow algorithm is needed to fully exploit the potential physics performance.
A Particle Flow algorithm, Arbor [16], has been developed for the CEPC study. Arbor has been optimized on a set of reference detector geometries for the CEPC [9] [17]. In this manuscript, we summarize the reconstruction performance for the physics objects and for the Higgs physics benchmarks, based on Geant4 [18] simulation. The detector geometry is introduced in section 2. Section 3 briefly summarizes the principle and key performance of Arbor. From section 4 to section 9, we demonstrate the reconstruction performance of different physics objects. The final section, section 10, is devoted to the conclusion and discussion.

Reference detector geometry and software
To fulfill the CEPC physics requirements, the Particle Flow oriented design is used as the baseline for the CEPC detector design. In this manuscript, most of the results are based on the detector model CEPC v_1, the benchmark geometry used in the CEPC PreCDR study [9]. CEPC v_1 is developed from the ILD detector, the baseline detector of the linear collider studies [6] [7]. To adapt to the CEPC collision environment, CEPC v_1 makes mandatory changes to the Machine Detector Interface (MDI), the forward region, and the yoke system. Compared to the ILC, the CEPC requires a much shorter distance between the final focusing magnet (QD0) and the interaction point, which is reduced from 3.5 meters to 1.5 meters. The forward region is changed, providing a solid angle coverage of |cos(θ)| < 0.995. In the original design, the ILD has a total weight of 15k tons, roughly 5 times larger than that of the LEP detectors. The main reason for the ILD to be so heavy is its extremely thick return yoke (3.2 meters in the barrel and 2.6 meters in the endcap). Such a heavy yoke is required for the push-pull operation scenario, where two detectors are housed in the same experimental hall and efficient magnetic field shielding is required. In CEPC v_1, the yoke thickness is reduced by 1 meter for both the barrel and the endcap, and the total weight is reduced by 40% with respect to the ILD.
CEPC v_1 uses a Time Projection Chamber (TPC) as the main tracker. The TPC provides good energy resolution, excellent track reconstruction efficiency and a low material budget. These properties are highly appreciated in the Particle Flow algorithm (PFA) reconstruction. The low material budget is important to limit the probability of nuclear interactions and bremsstrahlung before the particles reach the calorimeter. In addition, the TPC dE/dx measurement is essential for the charged kaon identification, see section 5. Using dedicated hardware designs, the TPC is operational at the CEPC, where the typical physics event rate is roughly 10 Hz at the Higgs operation and 1000 Hz at the Z pole operation [19].
The TPC in CEPC v_1 has a radius of 1.8 meters and a length of 4.7 meters. It is divided into 220 radial layers, each with a thickness of 6 mm. Along the φ direction, each layer is segmented into 1 mm wide cells. In total, the TPC has 10 million readout channels in each endcap. Operating in a 3.5 Tesla solenoidal magnetic field, the TPC provides a spatial resolution of 100 µm in the R-φ plane and 500 µm in the Z direction for each tracker hit. The TPC reaches a standalone momentum resolution of $\delta(1/P_t) \sim 10^{-4}$ GeV$^{-1}$.
CEPC v_1 is equipped with large-area silicon tracking devices, including the pixel vertex system, the forward tracking system, and the silicon inner/external tracking layers located at the boundary of the TPC. Combining the measurements from the silicon tracking system and the TPC, the track momentum resolution can be improved to $\delta(1/P_t) \sim 2 \times 10^{-5}$ GeV$^{-1}$. In fact, the TPC is mainly responsible for the pattern recognition and track finding, while the silicon tracking devices dominate the momentum measurement. The silicon pixel vertex system also provides a precise impact parameter resolution (∼ 5 µm), which is highly appreciated for the τ lepton reconstruction and the jet flavor tagging.
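To make these numbers concrete, the quoted resolution on $1/P_t$ can be translated into a relative transverse momentum resolution through $\delta P_t / P_t = P_t \cdot \delta(1/P_t)$. The following minimal sketch assumes a constant $\delta(1/P_t)$, i.e. it neglects the multiple scattering contribution that matters at low momentum; the function name is hypothetical.

```python
def relative_pt_resolution(pt_gev, delta_inv_pt):
    """Relative transverse momentum resolution dPt/Pt.

    pt_gev: transverse momentum in GeV.
    delta_inv_pt: resolution on 1/Pt in GeV^-1 (assumed momentum independent).
    """
    return pt_gev * delta_inv_pt

TPC_ONLY = 1e-4      # standalone TPC value quoted in the text
COMBINED = 2e-5      # TPC plus silicon value quoted in the text

for pt in (10.0, 50.0, 100.0):
    tpc = relative_pt_resolution(pt, TPC_ONLY)
    full = relative_pt_resolution(pt, COMBINED)
    print(f"Pt = {pt:5.1f} GeV: TPC only {tpc:.2e}, combined {full:.2e}")
```

Under this assumption, a 50 GeV track is measured to a relative accuracy of about $10^{-3}$ with the combined tracker, five times better than with the TPC alone.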
CEPC v_1 uses a highly granular sampling Electromagnetic Calorimeter (ECAL) and Hadronic Calorimeter (HCAL). The calorimeter is responsible for separating the showers of final state particles, measuring the neutral particle energy, and providing information for the lepton identification [21] [22] and the charged kaon identification, see section 5. The entire ECAL and HCAL are installed inside the solenoid, providing the 3-dimensional spatial position, the energy and the time information for each hit. The ECAL is composed of 30 layers of alternating silicon sensors and tungsten absorbers, with a total absorber thickness of 84 mm. Transversely, each sensor layer is segmented into 5 mm by 5 mm cells. The HCAL uses Resistive Plate Chamber sensors and iron absorbers. It has 48 longitudinal layers, each containing a 25 mm iron absorber. Transversely, it is segmented into 10 mm by 10 mm cells.
This calorimeter system provides a decent energy measurement for neutral particles (i.e. roughly $16\%/\sqrt{E/\mathrm{GeV}}$ for photons and $60\%/\sqrt{E/\mathrm{GeV}}$ for neutral hadrons). More importantly, it records a large amount of information on the spatial development of the showers, ensuring efficient separation between nearby showers and providing essential information for the lepton identification, see section 4. In addition, the silicon tungsten ECAL could provide a precise time measurement. With a required cluster level time resolution of 50 ps, the ECAL Time of Flight (ToF) measurement plays a complementary role to the TPC dE/dx measurement, leading to a decent charged kaon identification performance, see section 5.
On top of the CEPC v_1 geometry, several standalone detector geometries are used to explore the dependence of the target performance on the detector geometry. The details are given in the corresponding sections.
All the geometries are implemented via Mokka [20], the Geant4 simulation package that has been used in the linear collider studies. A set of single particle samples and Higgs physics process samples is used in this manuscript. The Higgs physics processes are generated using Whizard [23]. The simulated data files are then reconstructed via ilcsoft [24] and Arbor. ilcsoft provides the functionalities of data management [25] [26], digitization [27], tracking [28], and flavor tagging. Arbor is used as the core PFA algorithm that builds all the reconstructed particles from calorimeter hits and tracks. In the next section, we introduce Arbor.

Arbor
The Arbor [16] algorithm is inspired by the simple fact that the spatial configuration of a particle shower naturally follows a tree topology. Arbor is composed of a calorimeter clustering module and a matching module. The clustering module reads the calorimeter hits and builds the calorimeter clusters. The matching module identifies the calorimeter clusters induced by charged particles (charged clusters), combines these clusters with tracks, and builds charged reconstructed particles. The remaining clusters are reconstructed into photons, neutral hadrons, and fragments (mainly from charged clusters). The final state particles are therefore reconstructed.
The Arbor clustering module creates oriented connectors between calorimeter hits, and iterates until the configuration of the connector-hit ensemble follows a tree topology. The branches hence represent the trajectories of charged shower particles. The seeds usually correspond to the incident positions of the particles at the calorimeter. Since the separation of the seeds is straightforward, Arbor efficiently separates the particle showers, which is highly appreciated by the Particle Flow principle. Fig. 1 shows a reconstructed calorimeter shower of a 20 GeV $K^0_L$ particle in a high granularity calorimeter, where the readout density is roughly 1 channel/cm$^3$. The reconstructed tree branches are displayed in different colors. The trajectory lengths of charged shower particles can therefore be reconstructed. Fig. 2 compares the reconstructed trajectory length with the MC truth: the red distribution is the MC truth level trajectory length of charged particles generated inside 40 GeV π showers; the green one corresponds to the trajectories of the electrons and positrons generated in the showers; and the blue one is the trajectory length reconstructed by Arbor. Good agreement between the reconstruction and the MC truth is found at sufficient trajectory length.
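The tree idea can be illustrated with a deliberately simplified toy. The sketch below is not the Arbor implementation: it assumes that each hit is linked to at most one hit in the previous layer (its nearest neighbour within a hypothetical distance threshold), which guarantees a tree topology by construction, whereas Arbor creates and iteratively cleans connectors. All names and numbers are illustrative.

```python
import math
from collections import defaultdict

def build_trees(hits, max_dist=3.0):
    """Link each hit to its nearest neighbour in the previous layer.

    hits: list of (layer, x, y) tuples. Returns (parent, clusters), where
    parent[i] is the index of the hit that hit i points to (None for a seed)
    and clusters maps each seed index to the hit indices of its tree.
    """
    by_layer = defaultdict(list)
    for idx, (layer, _, _) in enumerate(hits):
        by_layer[layer].append(idx)

    parent = [None] * len(hits)
    for idx, (layer, x, y) in enumerate(hits):
        best, best_d = None, max_dist
        for cand in by_layer.get(layer - 1, []):
            _, cx, cy = hits[cand]
            d = math.hypot(x - cx, y - cy)
            if d <= best_d:
                best, best_d = cand, d
        parent[idx] = best  # at most one parent per hit, hence a tree

    def seed_of(i):
        # Follow the connectors back to the hit without a parent.
        while parent[i] is not None:
            i = parent[i]
        return i

    clusters = defaultdict(list)
    for idx in range(len(hits)):
        clusters[seed_of(idx)].append(idx)
    return parent, dict(clusters)

# Two nearby toy showers with three layers each (coordinates in cm).
hits = [(0, 0.0, 0.0), (1, 0.5, 0.0), (2, 1.0, 0.5),
        (0, 6.0, 0.0), (1, 6.5, 0.0), (2, 6.0, 0.5)]
parent, clusters = build_trees(hits)
print(clusters)
```

In this toy, the two seeds (hits without a parent) define two separate clusters, which mirrors the statement that separating the seeds separates the showers.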
Arbor can also be characterized by its energy collection performance on single neutral particles and its separation performance on bi-particle samples. Typically, Arbor reaches an energy collection efficiency higher than 99% for photons with energy higher than 5 GeV. A higher hit collection efficiency usually leads to a better energy resolution; however, it usually also increases the chance of confusion, i.e. the wrong clustering of calorimeter hits. Therefore, an optimized performance depends on the balance of these two effects.
Excellent separation performance is crucial for the jet energy reconstruction, the π0 reconstruction, and the measurements with τ final states. This performance can be characterized via the reconstruction efficiency of di-photon samples, where two photons with the same energy are shot in parallel at different positions, see Fig. 3. According to the distribution of the π0 energy in Z → τ+τ− events at the CEPC Z pole operation, we set the photon energy to 5 GeV.
The reconstruction efficiency is defined as the probability of successfully reconstructing two photons with the anticipated energy (each candidate is required to have an energy within 1/3 to 2/3 of the total induced energy). The efficiency naturally exhibits an S-curve dependence on the distance between the photon impact positions, see Fig. 4. The distance at which 50% of the events are successfully reconstructed is referred to as the critical distance, which depends on the ECAL transverse cell size. For cell sizes smaller than the Moliere radius, the critical distance is roughly 2 times the cell size, see Table 1.
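The success criterion and the critical distance can be written down as a short sketch. It assumes that the two most energetic reconstructed photon candidates are tested, and that the critical distance is obtained by linear interpolation of the efficiency curve; both choices, the function names and the scan values below are illustrative assumptions and not results from the study.

```python
def is_successful(candidate_energies, total_energy=10.0):
    """True if the two leading candidates each carry 1/3 to 2/3 of the total.

    candidate_energies: reconstructed photon energies in GeV.
    total_energy: total induced energy (two 5 GeV photons give 10 GeV).
    """
    if len(candidate_energies) < 2:
        return False
    leading = sorted(candidate_energies, reverse=True)[:2]
    return all(total_energy / 3 <= e <= 2 * total_energy / 3 for e in leading)

def critical_distance(distances, efficiencies, level=0.5):
    """Distance at which the efficiency curve first crosses `level`."""
    points = list(zip(distances, efficiencies))
    for (d0, e0), (d1, e1) in zip(points, points[1:]):
        if e0 < level <= e1:
            return d0 + (level - e0) * (d1 - d0) / (e1 - e0)
    return None  # the curve never crosses the requested level

# Made-up efficiency scan (distance in mm), for illustration only.
scan_d = [2.0, 4.0, 6.0, 8.0]
scan_eff = [0.0, 0.2, 0.8, 1.0]
d_crit = critical_distance(scan_d, scan_eff)
print(f"critical distance = {d_crit:.1f} mm")
```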
To conclude, Arbor is a geometrical algorithm that reconstructs each shower cluster into a tree topology. With a high granularity calorimeter, Arbor efficiently separates nearby particle showers and reconstructs the inner structure of the showers. It maintains a high efficiency in collecting the shower hits/energy, which benefits the energy reconstruction. The overall performance on different physics objects and physics benchmarks is discussed in detail in the following sections.

Leptons
The lepton identification is fundamental to the CEPC physics program. About 7% of the Higgs bosons at the CEPC are generated together with a pair of leptons. Those events are the golden signal for the Higgs recoil analysis, which is the anchor for the absolute Higgs measurements at the electron-positron Higgs factory. A significant fraction of the Higgs bosons decay, directly or via cascade, into final states with leptons. In addition, a significant fraction of H → bb/cc events generate leptons in their jet fragmentation cascade, so a good lepton identification performance improves the flavor tagging performance. The lepton identification is also crucial for the EW measurements. The PFA oriented detector, especially the high granularity calorimeter system, provides a wealth of information for the lepton identification. A dedicated lepton identification algorithm, LICH [22], has been developed for detectors using high granularity calorimeters. For each reconstructed charged particle, LICH extracts more than 20 observables from the associated track and calorimeter cluster. These observables include the track dE/dx measurement, the shower fractal dimension [21] that describes the global shower compactness, the shower longitudinal profiles, and the distances between the track and the calorimeter cluster. Using the Gradient Boosted Decision Tree method in the TMVA toolkit [30], LICH then calculates the electron and muon likelihoods for the charged particle. Fig. 5 shows the likelihood distributions of 40 GeV electron, muon and pion samples, where a clear separation is observed.
Fig. 6: Efficiencies of µ± (blue), e± (red) and π± (green) identification at different calorimeter granularities.
With the CEPC v_1 geometry, for isolated charged particles with energy larger than 2 GeV, LICH achieves a lepton identification efficiency better than 99.5%. The accumulated misidentification rate of hadrons as leptons is smaller than 1%. This misidentification is mainly caused by irreducible backgrounds such as pion decays and highly electromagnetic-like pion clusters (via the π0 generated in pion-nucleus interactions). The performance of LICH has been scanned over a large range of granularities for both the ECAL and the HCAL, and is stable for particles with energy larger than 2 GeV, see Fig. 6. This performance is significantly better than that of the experiments at the LHC and LEP [31] [32]. In physics events, the lepton identification performance is limited by the separation power of the detector. To evaluate this impact, we studied the efficiency of successfully identifying the two prompt leptons in l+l−H events. This analysis shows that at a 10 mm ECAL cell size, the reconstruction efficiency reaches 97-98% for e+e−H and µ+µ−H events, respectively [22]. This efficiency degrades at larger ECAL cell sizes. Taking into account the detector acceptance, we conclude that less than 0.5% of the prompt leptons in the l+l−H events will potentially be misidentified due to the limited separation power of the CEPC v_1 geometry.

Charged kaons
Successful identification of charged kaons is crucial for flavor physics and is valuable for the jet flavor and jet charge measurements [33]. A clear π-K separation is the key to the charged kaon identification. According to the Bethe-Bloch equation, the dE/dx of charged pions is larger than that of kaons by roughly 10% at the same momentum in the relativistic energy range relevant for the CEPC Z pole operation. In other words, an efficient π-K separation can be achieved if the dE/dx can be measured to a relative accuracy better than 5%.
The large TPC main tracker of CEPC v_1 provides the dE/dx measurement. At the MC truth level, the Geant4 simulation predicts a 3.9σ π-K separation and a 1.5σ K-proton separation for the inclusive Z → qq samples at 91.2 GeV center of mass energy [34] (integrated over the track momentum range of 2-20 GeV). A survey of existing experiments shows that, with respect to the MC truth, the achieved dE/dx measurements degrade by 15-50%, which is caused by the intrinsic energy resolution, the inhomogeneity, the stability of the devices, the occupancy, etc. The 50% degradation is used as a conservative estimate of the dE/dx measurement at the CEPC. Fig. 7 shows the anticipated separation performance between different charged particles at the CEPC v_1 TPC. The upper band boundaries correspond to the MC truth prediction, while the lower boundaries correspond to this conservative estimate. Integrated over the momentum interval of 2-20 GeV, a 2.6σ π-K separation is anticipated in the conservative estimate.
The dE/dx difference between pions and kaons vanishes at 1 GeV track momentum. To cover this low momentum range, a Time of Flight (ToF) measurement with an accuracy of 50 ps (at the cluster level) is proposed. According to the recent progress in high granularity calorimeters, this ToF information could be measured by the ECAL [14] [15] [36]. This ToF measurement is also crucial for the K-p separation, see Fig. 8. Using both the ToF and the dE/dx information, for the inclusive Z → qq sample at 91.2 GeV center of mass energy, the kaon identification reaches an efficiency/purity of 91%/94% in the conservative scenario with the CEPC v_1 geometry. If the dE/dx measurement achieves an objective scenario in which the degradation with respect to the MC truth is controlled to 20%, the identification performance could be improved to an efficiency/purity of 97%/97%, which is only 2% below the MC truth prediction.
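The relation between the degradation of the dE/dx resolution and the separation power can be sketched as follows. The sketch assumes that the separation power scales inversely with the dE/dx resolution, so that a fractional degradation d divides the MC truth separation by (1 + d); this assumption reproduces the 2.6σ quoted for the conservative scenario, but it is an interpretation and not a formula taken from the study. The function name is hypothetical.

```python
def degraded_separation(truth_separation, degradation):
    """Separation power after the dE/dx resolution worsens by `degradation`.

    truth_separation: separation at the MC truth level, in units of sigma.
    degradation: fractional worsening of the resolution (0.5 means 50%).
    Assumes the separation is inversely proportional to the resolution.
    """
    if degradation < 0:
        raise ValueError("degradation must be non-negative")
    return truth_separation / (1.0 + degradation)

TRUTH_PI_K = 3.9  # MC truth pi-K separation quoted in the text

conservative = degraded_separation(TRUTH_PI_K, 0.50)
objective = degraded_separation(TRUTH_PI_K, 0.20)
print(f"conservative: {conservative:.2f} sigma, objective: {objective:.2f} sigma")
```

Under the same assumption, the 20% objective scenario would correspond to a π-K separation of about 3.25σ, a value that follows from the scaling and is not quoted in the text.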
To conclude, a decent kaon identification performance could be achieved using the TPC dE/dx measurement and the ECAL ToF measurement. The TPC hardware design is encouraged to achieve a dE/dx resolution that degrades by less than 20% with respect to the MC truth prediction. Benchmarked with tracks in Z → qq events, the dE/dx should be measured with a relative resolution better than 3.6%. The ECAL ToF measurement is recommended to achieve a time resolution of 50 ps at the cluster level.

Photons
Successful photon reconstruction is crucial for the jet energy reconstruction, the Br(H → γγ) measurement, and the physics with τ leptons. In this study, we benchmark the overall photon reconstruction using the Higgs mass resolution in H → γγ events.
The photon reconstruction is sensitive to the tracker material and the calorimeter geometry defects, such as the cracks between the ECAL modules, staves, and the dead zone between the ECAL barrel and endcaps. To quantify their impact, a simplified, defect-free ECAL geometry is implemented. The benchmark Higgs invariant mass distributions are analyzed for both simplified and realistic geometry (the CEPC v_1).
This simplified geometry uses cylindrical barrel layers, and its endcaps are directly attached to the barrel, forming a closed cylinder. No tracker geometry is implemented in this simplified geometry. Fig. 9 shows the Higgs boson invariant mass reconstructed from the H → γγ signal with this simplified geometry. A relative mass resolution of 1.7% is achieved, which agrees with the intrinsic electromagnetic energy resolution measured in the CALICE Si-W ECAL prototype test beam experiments [13].
Fig. 9: The reconstructed Higgs invariant mass of H → γγ events with the simplified detector geometry (without any gaps or defects in the ECAL, and without a tracker). 10k events are reconstructed and the distribution is normalized to unit area.
Compared to the simplified geometry, the relative resolution of the Higgs mass at CEPC v_1 degrades by almost a factor of two, and the mean value of the mass peak is shifted to 121 GeV. A preliminary geometry based correction algorithm has been developed, which scales the energy of EM clusters located at the geometry cracks. After applying this correction algorithm, the Higgs boson invariant mass distribution at CEPC v_1 is shown in Fig. 10. This distribution can be fit with a core Gaussian and a wider Gaussian with a lower mean value. The core Gaussian exhibits a mass resolution of 1.9%, while the wider, low-mass Gaussian arises because the correction algorithm has only been preliminarily optimized. The average mass resolution (taking the weighted average of both Gaussians) is then 2.3%. This can be improved with a more dedicated correction algorithm.
In terms of photon reconstruction efficiency, the CEPC v_1 detector is sensitive to photons with energy larger than 10 MeV, and the efficiency saturates at 100% for photon energies larger than 1 GeV [37]. With a probability proportional to the material in front of the calorimeter, roughly 7% of the photons at CEPC v_1 convert into e+e− pairs or even start an electromagnetic shower before reaching the calorimeter. Thanks to the lepton identification performance and the large solid angle coverage, the majority of these converted photons can be identified.
To summarize, our simulation predicts that the Higgs mass resolution in the two-photon final state reaches the 1.6-2.1% level at the CEPC. This result is consistent with the CALICE prototype test beam result. The reconstruction of converted photons and the correction of the geometry defects of any realistic detector geometry are vital for the photon reconstruction.

Taus
The τ lepton is an extremely intriguing physics object. As the heaviest lepton in the SM, the τ has a large Yukawa coupling to the Higgs boson, leading to a significant Br(H → τ+τ−). The σ(HX) × Br(H → τ+τ−) is expected to be measured to better than 1% relative accuracy at the CEPC [38]. Measuring the τ polarization at the Z pole leads to a precise determination of $\sin^2\theta_W^{\mathrm{eff}}$ [31]. Also, the measurements via spectral functions of τ hadronic decays are very compelling at the CEPC [39].
The τ lepton has various different decay modes, and the successful τ lepton identification is highly non-trivial. In the CEPC studies, we classify the events with final state τ leptons into two classes and develop the identification algorithms accordingly.
The first class is the leptonic events. A successful identification of these events is based mostly on the reconstruction of photons, charged particles, and the track impact parameters.
The second class is the hadronic events with jets in their final states, for instance the qqH events with H → τ+τ−. Finding the τ candidates in the hadronic events depends on the isolation conditions, the multiplicities, the visible mass of the τ candidates, and the track impact parameters.
A full simulation analysis of the g(Hτ+τ−) measurement includes both classes and is performed in [38]. The first class is represented by the Br(H → τ+τ−) measurement in µ+µ−H events. The inclusive SM background is efficiently suppressed by requiring the proper multiplicity of photons and charged particles and by restricting the invariant/recoil mass of the µ+µ− system. Thanks to the PFA oriented design and reconstruction, the final event selection reduces the inclusive SM background by nearly six orders of magnitude, while preserving a signal efficiency of 93%. The leading remaining background is the irreducible Higgs background (i.e. H → WW*, ZZ* → τ+τ−νν). A relative accuracy of 2.7% is achieved for the signal strength measurement in the µ+µ−H channel.
The second class includes qqH, H → τ+τ− events. A double cone based τ finding algorithm has been developed. For each individual track, two cones with different sizes are formed. A τ candidate is identified once the multiplicities, the mass, etc. in each cone satisfy certain constraints. The cone parameters are optimized. In short, by requiring two τ candidates with opposite charge, a signal efficiency of 57% is obtained while the background is suppressed by three orders of magnitude.
Given the significant cτ of the τ lepton (89 µm) and the precise vertex system of CEPC v_1, the signal and background can be further separated using the track impact parameters $D_0$ and $Z_0$. For each track, we define a pull parameter as $(D_0/\mathrm{mm})^2 + (Z_0/\mathrm{mm})^2$. Fig. 11 shows the sum of the pulls of the leading track of each tau candidate for both signal and backgrounds (after the above-mentioned event selection), where the signal is clearly separated from the background for both the µ+µ−H and qqH channels. Applying a template fit to the pull parameter, relative accuracies of 2.1% and 1.0% for the signal strength measurements can be achieved for the µ+µ−H and qqH channels respectively.
To conclude, the τ reconstruction at the CEPC uses different algorithms for the leptonic and hadronic events. In both cases, the identification of τ events relies strongly on a successful reconstruction of the photons, charged hadrons, and leptons, which is secured by the separation performance of Arbor with the current CEPC baseline detector geometry. Meanwhile, a precise reconstruction of the impact parameters plays an important role in the identification of events with τ final states.
It should be noted that the requirements of τ physics are more demanding than those of the g(Hτ⁺τ⁻) measurement. The former requires a successful reconstruction of the number of π⁰ generated in the τ decay cascade, which places strong requirements on the separation power of the ECAL and on the ECAL energy and geometry acceptances.

Jet
The jet is fundamental to the CEPC physics program. About 90% of SM Higgs bosons decay into final states with jets (70% directly into di-jet final states, and roughly 20% via decay cascades through ZZ*, WW*), while 70% of W and Z bosons decay into di-jet final states. Roughly 60% of the jet energy is carried by charged particles, and the Particle Flow can significantly improve the precision of the jet energy measurement with respect to a calorimeter-based reconstruction.
In the Particle Flow reconstruction, the jet candidates are constructed from the reconstructed final state particles via jet clustering algorithms. The ambiguity from the jet clustering is significant and usually dominates the uncertainty, especially for events with more than two final state jets, such as the measurements of g(Hbb), g(Hcc), and g(Hgg) via ZH → 4 jet events.
To characterize the jet reconstruction performance, a two-stage evaluation has been applied in the CEPC studies. The first stage is the Boson Mass Resolution (BMR) analysis, designed to avoid the complexity induced by the jet clustering. The second is the individual jet response analysis, which requires jet clustering.
The Boson Mass Resolution analysis is applied to physics events with two final state jets that mostly originate from one intermediate boson, including: 1, ννqq events via the ZZ intermediate state; 2, lνqq events mostly via the WW intermediate state; 3, ννH events with H → bb, cc, or gg.
In these processes, apart from the jet final state particles, the other particles are either invisible or easily identified, so the invariant mass of all the boson decay products can be reconstructed. The BMR therefore evaluates the jet reconstruction independently of the jet clustering algorithm. Meanwhile, the BMR shows directly how well these massive bosons can be separated in their jet final states.
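The boson mass entering the BMR is simply the invariant mass of the summed four-momenta of all reconstructed particles assigned to the boson, with no jet clustering involved. A minimal sketch, assuming the particles are available as (E, px, py, pz) tuples in GeV (this data layout is an assumption for illustration):

```python
import math

def invariant_mass(particles):
    """Invariant mass of a set of particles given as (E, px, py, pz)."""
    e = px = py = pz = 0.0
    for pe, ppx, ppy, ppz in particles:
        e += pe
        px += ppx
        py += ppy
        pz += ppz
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against small negative values from rounding.
    return math.sqrt(max(m2, 0.0))

# Two back-to-back massless particles of 45.6 GeV each.
m = invariant_mass([(45.6, 0.0, 0.0, 45.6), (45.6, 0.0, 0.0, -45.6)])
```

The BMR is then the relative width of the distribution of this mass over many events.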
Using jet clustering and matching algorithms, the jet response is also analyzed for each individual jet. The overall response includes the detector resolution, the ambiguity induced by the jet clustering, and the mismatching. These effects depend on the physics process, and a complete analysis is beyond the scope of this manuscript. In this paper, the analysis is limited to the individual jet reconstruction performance in the ννqq process.
Corresponding to an integrated luminosity of 5 ab⁻¹ at the CEPC, we simulate 1.8 million ννqq, 11 million lνqq and 170 thousand ννH, H → jj events with the CEPC v_1 geometry. All these samples are reconstructed with Arbor. Fig. 12 shows the inclusive reconstructed boson mass distributions normalized to unit area. The distributions are well separated, and each exhibits a peak at the expected boson mass. They are all asymmetric, for different reasons. On the low mass side, the green distribution, corresponding to ννH, H → jj events, has a long tail. This tail stems mainly from the neutrinos generated in the fragmentation of heavy flavor jets (most of the H → jj events are H → bb events). The heavy flavor jet component is also responsible for the low mass tails in the other two distributions. Because the W boson hardly decays into b-jets, the low mass tail of the lνqq sample is much less significant. The Breit-Wigner width of the massive gauge bosons and phase space effects also contribute to the long tails of the lνqq and ννqq samples. A high mass tail induced by ISR photon(s) is observed in each distribution.
To decouple the detector response from these physics effects, a standard event selection is designed:
1, the jets are generated from light flavor quarks (u, d) or gluons;
3, there is no energetic visible final state ISR photon: the accumulated scalar transverse momentum of the ISR photons should be smaller than 1 GeV;
4, there is no energetic jet neutrino: the accumulated scalar transverse momentum of the jet neutrinos should be smaller than 1 GeV.
This event selection clearly leads to much narrower boson mass distributions and much better separation, see Fig. 13.
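The truth-level conditions on jet flavor, ISR photons and jet neutrinos can be sketched as below. This is an illustration under assumptions: the function names and input layout are hypothetical, the parton flavors are given as PDG particle codes (1 = d, 2 = u, 21 = gluon), and the second condition is not implemented because it is not spelled out in the text.

```python
import math

LIGHT_PARTON_IDS = {1, 2, 21}  # PDG codes: d quark, u quark, gluon

def scalar_pt_sum(momenta):
    """Accumulated scalar transverse momentum of (px, py, pz) tuples."""
    return sum(math.hypot(px, py) for px, py, _pz in momenta)

def passes_selection(jet_parton_ids, isr_photons, jet_neutrinos,
                     max_pt_sum=1.0):
    """Apply conditions 1, 3 and 4 of the event selection (GeV units).

    jet_parton_ids: PDG codes of the partons initiating the jets.
    isr_photons, jet_neutrinos: lists of truth-level (px, py, pz).
    """
    light = all(abs(pid) in LIGHT_PARTON_IDS for pid in jet_parton_ids)
    quiet_isr = scalar_pt_sum(isr_photons) < max_pt_sum
    quiet_nu = scalar_pt_sum(jet_neutrinos) < max_pt_sum
    return light and quiet_isr and quiet_nu
```

Since the selection relies on generator-level information, it characterizes the detector response rather than defining an analysis that could be run on data.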
After this event selection, the mass distributions are much more symmetric. The Higgs boson mass distribution can simply be fit with a Gaussian, while the other two distributions include non-negligible intrinsic widths. The efficiency of this event selection depends on the decay branching ratio (condition 1), the differential cross section (condition 2), the radiation behavior (condition 3) and the jet fragmentation (condition 4). For the ννH, H → gg sample, this event selection has an overall efficiency of 65% (75%/94%/94% for the 2nd/3rd/4th condition, respectively). The relative resolution of the Higgs mass is then 3.8%, providing a quantitative reference for the BMR.
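The quoted numbers can be cross-checked with a few lines. The sketch assumes that the per-condition efficiencies are applied sequentially, so that the overall efficiency is their product, and that the relative resolution is the fitted Gaussian width divided by the fitted mean; the width of 4.75 GeV used below is a hypothetical value chosen to reproduce 3.8% at a 125 GeV peak.

```python
def combined_efficiency(step_efficiencies):
    """Product of sequential per-condition efficiencies."""
    total = 1.0
    for eff in step_efficiencies:
        total *= eff
    return total

def relative_resolution(sigma, mean):
    """Relative mass resolution from a Gaussian width and mean."""
    return sigma / mean

# Conditions 2, 3 and 4 for the H -> gg sample (condition 1 is met
# by construction for gluon jets).
overall = combined_efficiency([0.75, 0.94, 0.94])
bmr = relative_resolution(4.75, 125.0)
```

The product is about 66%, close to the quoted overall efficiency of 65%.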
It should be remarked that both lepton identification and jet flavor tagging information are available from the current reconstruction. Combining this information enhances the power to distinguish different physics processes.
The calibration process plays an important role in measuring the jet energy. Technically, Arbor is calibrated in two steps: the single particle level calibration and the data-driven calibration. The single particle calibration determines the global ECAL/HCAL calibration constants by comparing the reconstructed neutral particle energy with the truth. The ECAL calibration constant is derived from photon samples and the HCAL calibration constant from K⁰_L samples. Due to the Particle Flow double counting, i.e. fragments of charged particle showers being misidentified as neutral particles, the single particle calibration typically leads to a 1% overestimation of the boson mass. The data-driven calibration scales all the reconstructed boson masses according to the W mass peak exhibited in the lνqq events, the leading physics process of the above three. This simple calibration simultaneously scales the three boson mass peaks to their expected positions. To fully exploit the enormous yield of massive bosons at the CEPC, sophisticated calibration methods must be developed and validated for the real experiment, e.g. control and correction of differential dependences, in-situ calibrations, detector homogeneity monitoring and control, etc.
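The data-driven step amounts to one global scale factor. A minimal sketch, assuming the W peak position has already been fitted in the lνqq sample and taking an approximate reference W mass of 80.4 GeV; the function names and the example masses (a uniform 1% overestimation) are illustrative assumptions:

```python
W_MASS_REF = 80.4  # GeV, approximate reference value (assumption)

def calibration_scale(fitted_w_peak, reference=W_MASS_REF):
    """Global scale that moves the fitted W peak onto the reference."""
    return reference / fitted_w_peak

def apply_calibration(masses, scale):
    """Scale every reconstructed boson mass by the same factor."""
    return [m * scale for m in masses]

# Peaks overestimated by 1% before the data-driven calibration.
raw_peaks = [81.204, 92.112, 126.25]
scale = calibration_scale(raw_peaks[0])
calibrated = apply_calibration(raw_peaks, scale)
```

A single factor works here only because the overestimation is assumed to be the same for all three bosons; the differential corrections mentioned above would replace it with a scale that depends on, for example, jet energy or direction.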
The reconstruction performance of individual jets is explored with the same ννqq sample. Using the ee-kt algorithm (a.k.a. the Durham algorithm [40]), all the reconstructed particles are forced into two jets (recojets). The same jet clustering algorithm is applied to the visible final state particles at the MC truth level, forming the generator level jets (genjets). Using a matching algorithm that minimizes the angular difference, the jet reconstruction performance is characterized by the differences between the 4-momenta of the initial quarks, the genjets, and the recojets. The difference between the quarks and the genjets comes mainly from the fragmentation and the jet clustering, while the difference between the genjets and the recojets is induced by the jet clustering, the matching, and the detector response. A dedicated analysis shows that, even in this simple di-jet process, the uncertainty induced by the jet clustering and matching can be as significant as that from the detector response [41].
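For a di-jet event the angular matching reduces to choosing between two possible pairings. The sketch below picks the pairing with the smaller summed opening angle; the exact figure of merit used in the study is not given in the text, so this criterion and the function names are assumptions.

```python
import math

def opening_angle(a, b):
    """Opening angle (rad) between two 3-momenta (px, py, pz)."""
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    na = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    nb = math.sqrt(b[0] ** 2 + b[1] ** 2 + b[2] ** 2)
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def match_dijets(recojets, genjets):
    """Pair two recojets with two genjets.

    Returns a list of (reco_index, gen_index) for the pairing that
    minimizes the sum of the opening angles.
    """
    direct = (opening_angle(recojets[0], genjets[0])
              + opening_angle(recojets[1], genjets[1]))
    swapped = (opening_angle(recojets[0], genjets[1])
               + opening_angle(recojets[1], genjets[0]))
    return [(0, 0), (1, 1)] if direct <= swapped else [(0, 1), (1, 0)]

reco = [(0.0, 0.0, 10.0), (0.0, 0.5, -9.0)]
gen = [(0.0, 0.4, -10.0), (0.1, 0.0, 11.0)]
pairs = match_dijets(reco, gen)
```

With more than two jets the number of pairings grows factorially, which is one reason the matching ambiguity becomes more important in multi-jet final states.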
These two reconstructed jets are classified into leading and sub-leading jets according to their energy. The relative energy difference between genjet and recojet is then fit with a double-sided Crystal Ball function. The non-Gaussian tails are mainly induced by the jet clustering algorithm, the matching performance, and the detector acceptance. The Gaussian core then describes the detector resolution; we therefore define its mean value as the Jet Energy Scale (JES) and its relative width as the Jet Energy Resolution (JER). [...] overlap part between the endcap and the barrel. The JES is also larger in the endcap than in the barrel. These patterns are correlated with the Particle Flow confusions, especially the artificial splitting of charged clusters. Not surprisingly, the leading jets have a systematically higher JES than the sub-leading ones. Without any corrections, the overall variation of the JES is controlled at the 1% level, which is significantly better than that at the LHC even after corrections [42].
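For reference, a common parametrization of the double-sided Crystal Ball shape is a Gaussian core joined continuously to a power-law tail on each side. The sketch below is an unnormalized version with peak value 1; the parameter names are assumptions, and the parametrization actually used in the fit is not specified in the text.

```python
import math

def dscb(x, mu, sigma, alpha_l, n_l, alpha_r, n_r):
    """Unnormalized double-sided Crystal Ball function.

    Gaussian core for -alpha_l <= (x - mu) / sigma <= alpha_r,
    power-law tails outside, continuous at both transition points.
    """
    t = (x - mu) / sigma
    if t < -alpha_l:
        a = (n_l / alpha_l) ** n_l * math.exp(-0.5 * alpha_l ** 2)
        b = n_l / alpha_l - alpha_l
        return a * (b - t) ** (-n_l)
    if t > alpha_r:
        a = (n_r / alpha_r) ** n_r * math.exp(-0.5 * alpha_r ** 2)
        b = n_r / alpha_r - alpha_r
        return a * (b + t) ** (-n_r)
    return math.exp(-0.5 * t * t)

# Hypothetical shape parameters for a relative energy difference.
value_at_peak = dscb(0.0, 0.0, 0.04, 1.5, 3.0, 2.0, 4.0)
```

When x is the relative energy difference between recojet and genjet, the fitted mu plays the role of the JES and the fitted sigma that of the JER.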
The jet energy resolution (JER) at different jet transverse momenta is displayed in Fig. 15. The overall JER ranges from 6% (at P_t < 20 GeV) to 3% (at P_t > 100 GeV). The leading jets usually have a slightly better JER than the sub-leading ones. Taking the performance of the CMS detector as a reference, the JER of the CEPC reference detector is 2-4 times better in the same P_t range [42].
To conclude, the jet energy response has been analyzed at the BMR level and at the individual jet level. For physics events with only two jets, the boson mass can be measured to a relative accuracy better than 4% at CEPC v_1 using a standard event selection. This resolution ensures a significant separation between the W boson, the Z boson, and the Higgs boson. For individual jets, the JES is controlled at the 1% level and the JER ranges from 3% to 6%; both are significantly better than the LHC detector performances. This superior performance is based on the clean electron-positron collision environment and on the PFA oriented detector design and reconstruction. It is highly valuable for the CEPC physics program, e.g. for the measurement of the W boson mass at the CEPC Higgs operation. It should also be emphasized that the jet clustering algorithm has a strong and even dominant impact on physics measurements with multiple jets in the final state.

Fig. 15 The jet energy resolution for leading (upper) and sub-leading (lower) jets in ZZ → ννqq events, as a function of the jet transverse momentum. The performance of CMS [42] is overlaid for comparison.

Jet Flavor Tagging
Identification of the jet flavor is essential for the measurement of the Higgs couplings (g(Hbb), g(Hcc), g(Hgg)) and the EW observables at the CEPC. During the jet fragmentation cascade, the heavy flavor quarks (b and c) mostly fragment into heavy hadrons (e.g. B⁰, B±, B_s, D⁰, D±, etc.). Those heavy hadrons have a typical cτ of a few hundred micrometers. Therefore, the reconstruction of the secondary vertex is crucial for the flavor tagging. The jet mass, the vertex mass, the number of leptons, etc. are also frequently used in flavor tagging.
Technically, the flavor tagging is performed with the LCFIPlus package [29], the default flavor tagging algorithm for the linear collider studies. In the CEPC studies, LCFIPlus takes the reconstructed final state particles from Arbor, reconstructs the secondary vertices and performs the flavor tagging. For each jet, LCFIPlus extracts more than 60 distinguishing observables and calculates the corresponding b-likeness and c-likeness using the Boosted Decision Tree method [30]. Since the b-mesons have a longer lifetime than the c-mesons, c-tagging is much more challenging than b-tagging. Thanks to the high precision vertex system, the c-jet can be distinguished from other jets at the ILD detector and the CEPC v_1 detector. Fig. 16 shows the reference ROC curve trained on a Z → qq sample at 91.2 GeV center of mass energy. The X-axis indicates the b/c-jet efficiency, while the Y-axis represents the surviving rate of the backgrounds.
Applied to the inclusive Z → qq sample, the b-tagging typically reaches an efficiency/purity of 80%/90%. Changing the working point to a reduced efficiency of 60%, the purity can be enhanced to close to 100%. For c-tagging, a typical working point has an efficiency/purity of 60%/60%.
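The trade-off between efficiency and purity at different working points can be illustrated with a simple threshold on a tagger output. The sketch assumes a per-jet b-likeness score and a truth flag for each jet; the scores, labels and thresholds are made up for the example and are not LCFIPlus outputs.

```python
def tag_performance(scores, is_signal, threshold):
    """Efficiency and purity of the cut `score > threshold`.

    scores: tagger output per jet (e.g. b-likeness).
    is_signal: truth flag per jet (True for a genuine b-jet).
    """
    n_signal = sum(1 for s in is_signal if s)
    n_tagged = sum(1 for x in scores if x > threshold)
    n_correct = sum(1 for x, s in zip(scores, is_signal)
                    if s and x > threshold)
    efficiency = n_correct / n_signal if n_signal else 0.0
    purity = n_correct / n_tagged if n_tagged else 0.0
    return efficiency, purity

# Hypothetical jets: four b-jets followed by two non-b jets.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
truth = [True, True, True, True, False, False]
loose = tag_performance(scores, truth, 0.5)
tight = tag_performance(scores, truth, 0.75)
```

Tightening the threshold lowers the efficiency and raises the purity, which is the behavior described for the 80%/90% and 60% working points; scanning the threshold traces out a curve of the kind shown in Fig. 16.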
It should be emphasized that, with the current detector geometry design and reconstruction algorithm, the c-tagging is still very difficult. As a result, the accuracy of g(Hcc) measurement is largely limited by the contamination from the H → bb events.

Conclusion
Adequate reconstruction and detector designs are crucial for the success of particle physics experiments. Targeting precise measurements of the Higgs boson properties and the EW observables, the CEPC needs detectors that can reconstruct all the physics objects generated in its Higgs/EW events. The current CEPC studies use the Arbor reconstruction and PFA oriented detector designs as the baseline. This manuscript provides a global description of the performance of the physics object reconstruction and of some benchmark analyses.
Arbor is optimized to fulfill the CEPC physics requirements. It reads all the calorimeter hits and tracks and builds reconstructed particles. The physics objects are then reconstructed from the list of reconstructed particles. Inspired by the tree topology of particle showers, Arbor can efficiently separate nearby particle showers, reconstruct the inner shower structure, and maintain a good energy collection efficiency for individual particles. Applying Arbor to the CEPC v_1 geometry, the following performance has been achieved.
1, Lepton identification: ε(e→e) > 99.5%, ε(µ→µ) > 99.5%, P(h→lepton) < 1% for isolated tracks with energy larger than 2 GeV;
2, Charged kaon identification: efficiency/purity of 91-97%/94-97% in the inclusive Z pole sample for energies of 2-20 GeV;
3, Photon reconstruction: a relative accuracy of 1.7%/2.3% is achieved for the Higgs mass reconstruction in H → γγ events using the simplified/CEPC v_1 detector geometry;
4, τ: a relative accuracy of 1% can be achieved for the signal strength measurement of H → τ⁺τ⁻ events;
5, Jet energy resolution: a relative accuracy of 3.8% on the boson mass reconstruction is achieved with a cleaned H → gg event sample. The Higgs boson, the Z boson, and the W boson can be efficiently separated from each other in their hadronic decay modes. The jet energy scale is controlled at the 1% level. For individual jets, the relative jet energy resolution varies from 3% to 6%, depending on the jet transverse momentum;
6, Jet flavor tagging: in the inclusive Z → qq sample at 91.2 GeV, the b-jets can be identified with an efficiency/purity of 80%/90%, while the c-jets can be identified with an efficiency/purity of 60%/60%.
These key physics objects at the CEPC can be successfully reconstructed. The performance at the single particle level, such as for the leptons, the kaons, and the photons with the simplified geometry, is close to the physics/hardware limits. The separation and high-efficiency reconstruction of charged particles and photons ensure a good τ lepton reconstruction. The jet energy resolution leads to a clear separation between the massive bosons in di-jet events. For individual jets, the uncertainty induced by the final state particle reconstruction is comparable to or smaller than that from the jet clustering algorithms. Meanwhile, using the final state particles reconstructed by Arbor, the LCFIPlus algorithm can distinguish b-jets, c-jets, and light jets from each other. In terms of overall performance, the Higgs couplings to its decay final states can be determined to 0.1-1% accuracy, mostly limited by statistics [9]. Therefore, the PFA oriented detector design and Arbor fulfill the CEPC physics requirements on the physics object reconstruction.
In terms of the reconstruction algorithm development and the detector design, substantial efforts are needed to bridge the gap between the proof of principle and the engineering design. Here we would like to emphasize a few key topics to be explored in the future.
1, The systematic control and in-situ monitoring method. Systematic control is fundamental to the physics measurements. Given the large integrated luminosity at the CEPC, the stability and the systematic control of the CEPC detector system are extremely important and challenging, especially for the Z pole operation.
2, A global design of the DAQ system. A global design of the DAQ system, with which the power consumption could be better estimated, is crucial for the further design and optimization of the detector geometry.
3, Detector integration studies. The detector design needs to ensure that, at the integration level, the detector is stable enough to be operated continuously for decades. Thermal simulation and mechanical studies are crucial and have not been covered yet. An on-line system that monitors the tension, the temperature, and possibly other condition data such as the B-field strength needs to be designed and validated.
4, Development and validation of sub-detector digitization algorithms. A proper modeling of the detector response is crucial for the systematic control. In principle, all the sub-detectors need to have mature test beam references. The difference between test beam data and the MC simulation needs to be quantified, properly modeled, and integrated into future simulation tools.
5, Advanced reconstruction algorithm and pattern recognition studies. The current Arbor uses only the spatial information of the hits in its topological clustering. A better usage of the hit time and energy information should significantly enhance its physics performance. Pattern recognition plays an essential role in the reconstruction and analysis. Meanwhile, artificial intelligence is developing rapidly. Experimental particle physics should also benefit from this trend, building synergies and extending its physics potential accordingly.