Calibration Strategy of the JUNO Experiment

We present the calibration strategy for the 20 kton liquid scintillator central detector of the Jiangmen Underground Neutrino Observatory (JUNO). By utilizing a comprehensive multiple-source and multiple-positional calibration program, in combination with a novel dual calorimetry technique exploiting two independent photosensors and readout systems, we demonstrate that the JUNO central detector can achieve a better than 1% energy linearity and a 3% effective energy resolution, required by the neutrino mass ordering determination.


Introduction
Neutrino oscillation experiments [1] have firmly established that there are three neutrino mass states with eigenvalues m 1 , m 2 , and m 3 , and that at least two of these values are known to be different from zero. Solar neutrino experiments have established that m 1 < m 2 . However, there has been no clear experimental evidence whether m 1 < m 2 < m 3 or m 3 < m 1 < m 2 , conventionally referred to as the normal or inverted neutrino mass ordering (MO), respectively. The Jiangmen Underground Neutrino Observatory (JUNO) is a multi-purpose experiment designed to elucidate fundamental neutrino properties, study neutrinos with astrophysical or terrestrial origins, and search for rare processes beyond the Standard Model of particle physics [2]. Its 20 kton liquid scintillator (LS) central detector (CD) is located 680 m underground in Guangdong, China. The detector is at an equal distance of 53 km from the Yangjiang and Taishan nuclear power plants. Such a medium baseline configuration is ideal for the determination of neutrino MO using the electron antineutrinos from reactors [3][4][5][6]. Due to neutrino oscillation, the survival probability of electron antineutrinos can be written as:  In this expression, θ 12 and θ 13 are the neutrino mixing angles, ∆m 2 ij ≡ m 2 i −m 2 j is the masssquare-difference between eigenstates i and j, L is the distance from neutrino production to detection, and E ν is the antineutrino energy. Numerically, ∆m 2 21 ∼ 7.5 × 10 −5 eV 2 , and |∆m 2 32 | and |∆m 2 31 | are much larger (∼ 2.5 × 10 −3 eV 2 ). The normal or inverted MO corresponds to |∆m 2 31 | > |∆m 2 32 | or |∆m 2 32 | > |∆m 2 31 |. Due to the fact that θ 12 ∼ 34°, the two oscillation frequencies driven by ∆m 2 31 and ∆m 2 32 are weighted differently in (1.1). This allows the determination of the MO if the oscillation pattern in E ν can be measured with very high precision.
The JUNO LS is contained inside a 35-m diameter acrylic sphere with 12 cm thickness, strengthened by 591 stainless steel connection bars. About 17,600 20-inch and 25,600 3inch photomultiplier tubes (PMTs) are closely packed and immersed in ultra-pure water outside the acrylic sphere oriented towards the LS. These two sets of photosensors and their corresponding readout electronics constitute the LPMT (large) and SPMT (small) systems, forming the basis for the dual calorimetry discussed below.
The electron antineutrinos are detected via the so-called inverse β-decay (IBD),ν e +p → e + +n, where the electron antineutrinos interact with protons in the LS to produce positrons and neutrons. The positron creates a prompt deposit of kinetic energy within a range from 0 to about 8 MeV in the LS, then annihilates with an electron producing gamma rays. The energy of the anti-neutrino is related to the kinetic energy of the positron as E ν ≈ E e+ + 1.8 MeV. The neutron will be captured on hydrogen (99%) or carbon (1%) within an average delay time of ∼200 µs, producing gammas of 2.22 MeV or 4.95 MeV, respectively.
Particle interactions in the LS will produce scintillating (dominant) and Cherenkov (sub-dominant, ≤10%) photons, in numbers mostly proportional to the deposited energy, which will then be converted to photoelectrons (PEs) by the PMTs and digitized by the electronics. However, there is an intrinsic non-linearity in both light emitting mechanisms. Scintillation photon yield follows a so-called Birks' law with "quenching" effect depending on the energy and type of particle [1], whilst the Cherenkov emission depends on the velocity of the charged particle. The combined effect will be referred to as the "physics non-linearity" hereafter, which can be calibrated by the combination of radioactive sources and natural radioactivity background. The PMT instrumentation and electronics may carry additional event-level "instrumental non-linearity", a nonlinear response between the created photons in the LS and the measured charge from the electronics, originated from channel-level instrumental non-linearity. For JUNO, this effect is particularly delicate, since at a given energy the charge response of single LPMT varies by more than two orders of magnitude depending on the vertex position. A novel methodology called the dual calorimetry is designed, utilizing a comparison between the LPMT and the SPMT systems to directly calibrate such non-linearity.
In addition to the energy non-linearity, the total number of PEs collected by the LPMT and SPMT systems is also position-dependent in JUNO due to PMT solid angles, optical attenuation effects, reflections at material interfaces, shadowing due to opaque materials, etc. This intrinsic position non-uniformity, mostly energy independent, has to be corrected by a multi-positional calibration to optimize the energy resolution.
The determination of MO requires that the uncertainty of the positron kinetic energy scale should be better than 1%. Moreover, the effective energy resolution has to be better than 3% [2,7,8], an unprecedented energy resolution in any of the LS-based neutrino experiment. To achieve these stringent requirements, not only the detector performance has to be excellent (e.g. high light yield and LS transparency), but a comprehensive calibration program is also a must. Some early concepts of JUNO calibration have been introduced in [9]. In this paper, however, we focus on the calibration strategy for JUNO and demonstrate how it helps satisfy the physics requirements.
The remainder of the paper is organized as follows. In section 2, we discuss the approach of the energy scale calibration to tackle the physics and instrumental non-linearity. We then develop the method to minimize the energy resolution via correcting the position nonuniformity in section 3. Based on these, we present the conceptual design of the complete calibration hardware in section 4, and the calibration program that we envision in section 5, before we conclude in section 6.

Energy scale calibration
A custom Geant4-based (version 9.4.p04) [10] software, called SNIPER [11], is used to perform calibration-related simulations. SNIPER contains among others the up-to-date JUNO detector geometry and optical parameters. The JUNO LS was tested in a decommissioned Daya Bay detector, so the LS optical parameters such as the light yield, absorption and re-emission probability are tuned to the data [12,13]. The Rayleigh scattering length is obtained from a separate bench experiment [14]. The optical parameters of the acrylic sphere, ultrapure water and other materials are taken from bench measurements. The quantum efficiency and collection efficiency (angle-dependent) of the LPMTs are initially set at the average value from quality assurance tests and can be adjusted individually in the simulation. The "low energy Livermore model", which incorporates atomic shell cross section data [15], is selected as the electromagnetic interaction model in SNIPER.

Selection of sources
The radioactive sources and processes considered in JUNO and types of emitted radiation are listed in table 1. For a large LS detector, thin-walled electron or positron sources would pose risk of leakage of radionuclides. Instead, we consider γ sources ranging from a few hundred keV to a few MeV to cover the range of the prompt energy of IBDs. Concerning the 68 Ge source, it decays in 68 Ga via electron capture, which then β + -decays to 68 Zn. The kinetic energy of the positrons will be absorbed by the enclosure, so only the annihilation gammas are released. In addition, (α,n) sources such as 241 Am-Be (AmBe) and 241 Am-13 C (AmC) can be used to provide both high energy gammas and neutrons, the latter of which also produces capture gammas on hydrogen and carbon atoms in the LS.

Model of physics non-linearity
Most IBD positrons lose energy by ionization before they stop. A stopped positron can either directly annihilate with an electron, or form para-or ortho-positronium bound state before annihilation [16][17][18]. The annihilation produces either two or three gammas, respectively, with a total energy of 1.022 MeV. We define a general term "visible energy" as the energy estimated based on the detected number of PEs where the light yield Y 0 is a constant obtained from calibration (see section 2.1.3). The visible prompt energy of an IBD can then be decomposed as: The first term E e vis describes the visible energy associated with the kinetic energy of the positron, which is approximately the same for an electron with the same energy. The second term E anni vis is the visible energy of the annihilation gammas, which can be approximately calibrated using an enclosed 68 Ge source. The physics non-linearity is defined by: in which E e is the true kinetic energy of the electron or positron. Therefore, if f nonlin can be determined via calibration, then for each IBD event, the true kinetic energy of a positron can be reconstructed as: A gamma deposits its energy into the LS via secondary electrons, allowing a robust determination of f nonlin using the gamma calibration data [19][20][21]. Without loss of generality, f nonlin is modeled as a four parameter function, as used in the first Daya Bay spectral analysis [20]: where the numerator is to include a first order non-linearity and the denominator is to ensure a damping of the non-linearity at high energy, as both Birks and Cherenkov contributions are expected to get more linear with energy. The visible energy of the gamma can be written as In this expression, P (E e ) denotes the probability density function of a given gamma source converted to secondary electrons/positrons at an energy E e via Compton scattering, photoelectric effect, or pair production, determined from the simulation. I is a normalization factor very close to unity to account for an order of 0.2% missing energy in the simulation due to secondary production thresholds.
The highest gamma energy in table 1 is 6.13 MeV, which is insufficient to cover the full range of IBD positron energies. To extend the energy range in calibration, cosmogenically produced 12 B, about 1000 events per day in JUNO [2], will be used. 12 B decays via βemissions with a Q value of 13.4 MeV and a lifetime of 29 ms, with more than 98% into the ground state of 12 C. Therefore it offers complementary constraints to f nonlin (E e ) at the high energy end [22]. 12 B events can be cleanly identified by looking for delayed high energy β event (with a few percent mixture of 12 N, a β + -emitter) after an energetic muon [23,24].

Calibration procedure
To validate our constraints to the physics non-linearity using calibration, bare gamma sources in table 1 are simulated from the center of the CD, while 12 B decays are simulated uniformly in the detector. The light yield is determined to be Y 0 =1345 PE/MeV by taking the simulated neutron captures on hydrogen at the CD center, and dividing the mean number of PEs by 2.22 MeV. The visible energy of each event is then reconstructed by (2.1). 100,000 events are simulated for each gamma source, for which the centroid is determined by Gaussian fit to a level of 0.01% statistically. The statistics of 12 B events is assumed to be equivalent to one month running period. The non-uniformity correction (3.3) is applied to these events, as well as a 3 MeV threshold to suppress accidental background. In addition, a radius cut of 15 m is chosen to avoid energy leakage via Bremsstrahlung and complicated non-uniformity correction close to the boundary, leading to a 61% acceptance.
To combine the gamma sources and 12 B data to fit for the four parameters of f nonlin , we define a χ 2 as: where M γ i and P γ i are the measured and predicted (see (2.6)) visible energy peaks of the ith gamma source, respectively, and σ i is the statistical uncertainty of M γ i . The 12 B visible energy spectrum is binned into 90 bins from 3 MeV to 12 MeV, so M B12 j and P B12 j are the number of measured and predicted events in the jth bin. In addition to f nonlin , P B12 j also takes into account spectral smearing according to the energy resolution function (3.1). The systematic uncertainty of the calibration is neglected in (2.8), but is included separately in the uncertainty band of electron non-linearity using the procedure in section 2.3.7.
The non-linearity parameters p i in (2.5) can now be determined using the χ 2 fit. For the gamma sources, the ratios of the best fit visible energy to the true are compared to those from the simulated data in figure 1(a). The best fit model for 12 B spectrum is also overlaid with simulated data in figure 1(b). Excellent agreements are observed in both figures.   For comparison, we use the same SNIPER simulation to produce the visible energy for individual mono-energetic electrons at the CD center and to extract the true inherent f nonlin . The best fit f nonlin is displayed in figure 1(c), together with the true, and they agree within 0.3% within the entire energy range from 0.5 to 8 MeV.  For a final sanity check, a simulation of mono-energetic positrons at the CD center is performed. The kinetic energy is reconstructed event-by-event using (2.4), in which E anni vis is obtained from the simulation of an enclosed 68 Ge source, and the best fit f nonlin is taken from the previous step. The residual bias in the reconstructed energy, as depicted in figure 2, is within ± 0.2%, with a 0.7% uncertainty band to be discussed in section 2.3.7. The positron energy non-linearity could be further improved by utilizing additional cosmogenic background in the detector, for example, 10 C and 11 C.

Calibration of instrumental non-linearity
As mentioned earlier, even at a given energy, the single channel response of LPMT is strongly position dependent in JUNO. Channel-level non-linearities between the actual photons and measured charge would convolve into an event-level instrumental non-linearity, which is consequently entangled with position non-uniformity.
To correct for this complication, the LPMT system is first calibrated for its channel-level non-linearity utilizing the dual calorimetry calibration technique, which implies the response comparison between the LPMT and SPMT calorimetries, with the help of a tunable light source covering the full range of the uniform IBDs (0 to 100 PE per LPMT channel). Such a UV laser system has been developed and described in [25], which can produce a uniform illumination on all channels over the desired range when flashing from the center of the CD. Within such range, the SPMT can serve as an approximate linear in-detector reference, ensured by both photon counting and charge measurement, since SPMT channels primarily operate in the single photon regime. As the laser intensity varies, the ratio of the LPMT charge to that of the total SPMTs leads to a direct determination of the LPMT channellevel non-linearity. This calibration scheme is immune from the physics non-linearity, since both LPMT and SPMT are exposed to the same energy deposition. The non-uniformity is also irrelevant here, as the laser calibration source is kept in the detector center.
To illustrate this approach, electron events from 1 to 8 MeV are simulated uniformly  in the CD. An extreme channel-level non-linearity of 50% over 100 PE for the LPMT is assumed. As shown in figure 3, the impact the channel-level instrumental non-linearity can lead to an event-level non-linearity as large as 2% at 8 MeV. For comparison, if the LPMT charge is first calibrated and corrected at the channel-level with the dual calorimetry approach, the residual event-level non-linearity is reduced to <0.3%. Residual biases could still remain after the laser calibration if the instrumental nonlinearity depends on the photon arrival time profile, which may be different between the laser and physical events. This effect can be controlled by a systematic comparison of the LPMT charge responses between the laser and radioactive sources.

Evaluation of systematic uncertainties
The non-linearity calibration methodology above is designed upon the experience in Daya Bay [22], KamLAND [26], Borexino [27] and Double Chooz [28]. Residual systematic uncertainties and their combined effects to the positron energy scale are evaluated in this section.

Shadowing effect
A realistic radioactive source is not a point source. A typical source assembly we envision is shown in figure 4. The source is enclosed in a 6 mm by 6 mm cylindrical stainless steel shell, covered by bullet-shaped highly reflective Polytetrafluoroethylene (PTFE), and attached to a stainless steel wire with 1 mm diameter. A PTFE connector 160 mm above the source allows easy exchange of the source when desired. To maintain the tension in the wire, a weight of about 100 g covered with PTFE is also attached to the wire below the source, with a separation of about 160 mm.  Despite the PTFE materials, optical photons can still be absorbed by these surfaces, leading to a small bias in visible energy. To study this bias, a 90% reflectivity is assumed for the PTFE surfaces based on the worst case in the laboratory measurement [29]. Only events with energy fully absorbed in the LS region are chosen from the simulation to decouple this effect from the energy loss in dead material (see next).
The resulting biases, in comparison to the bare sources in figure 1(a), are shown in figure 5. The effect is less than 0.15% for all sources. The bias first reduces towards higher energy up to 40 K (1.5 MeV), as these gammas deposit energy further away from the source enclosure. The n-H point has the least bias due to additional displacement of the neutron before its capture. For the two highest energy gamma points, also produced by the neutron sources, the photons are on average produced closer to the weight/connector and are being shadowed.

Energy loss effect
Some gamma energy can also be deposited in the non-scintillating material, e.g. the enclosure of the source, leading to a leakage tail in the detected PE distribution. The bias to the peak is referred to as the energy loss effect. As an example, the measured PE distribution for a realistic 60 Co source is shown in figure 6, in which contributions from the fully absorbed peak in the LS and the leakage tail are separately plotted. A fitting function combining a single-value peak and an exponential tail, convolved with a resolution function [30], is adopted here to fit the simulated gamma spectra. The fractional difference of the best fit peak to those from the fully absorbed events are shown in figure 7, where residual biases are less than 0.06% for all sources.

PMT dark rate
Assuming an average dark rate of 30 kHz per LPMT channel (based on early quality assurance tests) and a 350 ns analysis time window, the dark rate pileup in any given physical event is estimated to be ∆Q DR = 350 ns×17600 LPMT×30 kHz/LPMT = 185 (PE) summed over LPMTs. Such an offset in energy can be precisely calibrated and subtracted using random triggers.
2.3.4 6.13 MeV gamma uncertainty 241 Am-13 C source can produce a 6.13 MeV gamma, but mixed with neutron-proton recoil energy. According to the simulation, the positive bias caused by the neutron-proton recoil is at a 0.4% level. Although subtractable, this leads to an uncertainty at the high energy end.

Instrumental non-linearity
As discussed in section 2.2, the uncertainty of the instrumental non-linearity can be controlled below 0.3%, conservatively taken as a fully correlated uncertainty in energy.

Residual bias after non-uniformity correction
Since IBDs are detected throughout the detector, position-dependent energy scale variation has to be corrected for. The correction will be discussed in section 3 as a major ingredient of energy resolution optimization. The residual bias is controlled to 0.3% level, which is conservatively taken as a fully correlated uncertainty in energy.

Combined systematic uncertainty
The effects discussed above are summarized in table 2. We assume that all the biases can be corrected, but with conservatively 100% uncertainty. We further separate the uncertainties into either correlated at different energy, or single point, depending on how they would move individual energy points up and down. For example, the uncertainties due to the shadowing effects are correlated among different sources. On the other hand, the bias due to 6.13 MeV gamma is independent from the others (single point).
To evaluate the combined effects, mock calibration data are produced by randomly biasing the naked source values ( figure 1(a)) and 12 B visible energy spectra ( figure 1(b)) according to the 1σ uncertainties in table 2, either in a correlated or single point fashion. For each set of the data, a fit as in section 2.1.3 is performed, yielding an electron non-linearity curve. This is repeated many times. The individual contribution is obtained by "turningon" only one effect in table 2 at a time and by repeating the procedure. The dominating contribution comes from instrumental non-linearity and position-dependent effect, each contributing to 0.3% in the overall uncertainty band. The 1σ distribution of these fitted models are overlaid in figures 1(c) and 2. The uncertainty band is ∼0.7% across all energy, which is better than the 1% requirement. Note that such precision has been experimentally corroborated by Daya Bay [22] and Double Chooz [31].

Optimizing the energy resolution
As outlined in the introduction, another key aspect of the calibration is to optimize the energy resolution for the IBD positron signals. In general, the fractional energy resolution for a visible energy E vis in MeV can be written as an approximate formula The a term is the statistical term, which is mainly driven by the Poisson statistics of true number of PEs associated with E vis . To set the scale, for a light yield Y 0 of 1345 PE/MeV, a is about 2.7%. The b term is a constant independent of energy, dominated by the position non-uniformity. In figure 8, the average number of PEs for 2.22 MeV gammas vs. radius is shown for a few representative polar θ angles in spherical coordinates. Starting from the detector center, the gradual increase is a combined effect of variations in active photon coverage and the attenuation of optical photons. The sharp decrease close to 15.5 m is due to the mismatch of the indices of refraction between the acrylic and water -the closer the event to the edge, the more likely is the occurrence of total reflection and consequently, the loss of photons by absorption. Within the same radius, there is an additional dispersion in θ due to effects such as PMT coverage and photon shadowing on opaque materials, etc. The parameter b estimated from figure 8 is of the order 10%, but can be largely suppressed based on position-dependent calibration. The c term represents the contribution of a background noise, i.e. the dark noise from the PMTs, which is always mixed with E vis in the measurement. The charge bias induced by dark rate (∆Q DR = 185 PE, see section 2.3.3) has a Poisson noise of 13.6 PE, so c is estimated to be 1.0%. The impact of the energy resolution (3.1) is studied in details in [2]. Mock neutrino energy spectra were generated assuming different values of a, b and c in (3.1), based on which MO fits were performed. It was found that the JUNO baseline requirement to determine the MO to 3 -4 σ significance could be translated into a requirement on an effective resolutionã as: Conceptually, the facts that in (3.1) the second term is not improving with the visible energy, and that the third term declines quickly with the energy, are reflected in factors 1.6 and 1/1.6, respectively. Note also thatã is only an effective parameter and is not the detector energy resolution at 1 MeV.
From the perspective of calibration, reducing the b term is the key to optimize the energy resolution, particularly given the large apparent non-uniformity depicted in figure 8. The non-uniformity can be studied by either deploying radioactive sources to fixed locations, or by using uniformly distributed background events, e.g. spallation neutrons (∼1.8 evt/s [2]). Although we focus on the first approach in this paper, it should be emphasized that the second approach will offer a powerful in situ cross check.
The non-uniformity is characterized by g(r, θ, φ), defined as the light yield in a given position relative to that at the center. The key question is how to calibrate g sufficiently well under realistic situations. In this study, we start by assuming that we have perfect positron sources deployable to any given location in the detector. We then gradually go to more realistic calibration with gammas at finite points. The performance of the calibration is assessed by reconstructing the visible energy for uniformly distributed IBDs at each MeV from 0 to 8 MeV as: where PE tot is the total number of PEs for the LPMT and SPMT, and the dark rate pileup ∆Q DR is included as an independent Poisson-fluctuated offset. After obtaining the resolution σ/E for each energy via Gaussian fit as in figure 9, (a, b, c) can be extracted and compared to (3.2).

Central IBDs
To start, IBDs are produced from the geometrical center of the detector, so that the position non-uniformity is completely absent, i.e. g = 1. However, the average number of detected PEs is lower than the full volume ( figure 8). Interestingly, a fit of the energy resolution of these positrons using (3.1) yields a = 2.62% and a non-vanishing b = 0.73%. The parameter c = 1.38% is also sizably larger than the naive dark rate estimate (1.0%). The origins of these non-Poisson fluctuations are traced in the simulation by switching off individual  processes. The residual b is dominated by non-Poisson distribution of Cherenkov photons due to track length fluctuations, leading to additional "constant" noise term in the detected PEs. The additional contribution in c is due to the fluctuations in secondary electrons produced by the annihilation gammas, folded with their corresponding non-linearity, leading to a smearing effect independent of the positron energy.

Ideal non-uniformity correction
In the ideal case, uniformly distributed IBDs are the calibration sources themselves. The detector is divided into 20,000 equal-volume "voxels", and g(r, θ, φ) is computed in each voxel for every energy, allowing a potential energy dependence to take into account effects such as energy leakage at the edges.
The fit of the energy resolution is shown in figure 10, yielding a=2.57%, b=0.73%, c=1.25%, andã = 2.93%. In comparison to those of the center IBDs, the reductions in a and c terms are due to the increase of full-volume detected PEs ( figure 8). This scenario will serve as the most ideal reference for the calibration.

Utilizing azimuthal symmetry
The azimuthal symmetry in JUNO detector allows the first simplification that only g in a vertical plane of the detector is needed. For illustration, g(r, θ) surface in the φ = 0 plane for positrons with zero kinetic energy is shown in figure 11. The two-dimensional g(r, θ) function obtained at each positron energy is then applied to the corresponding positron events. The resultingã is 2.96%.

Single gamma source
In this step, the hypothetical positron source is replaced by a realistic AmC neutron source (2.22 MeV capture gammas), so that both the particle type and energy dependence in g(r, θ) are neglected. The parameterã is found to be 2.98%. Note that this choice would also allow direct comparisons to the spallation neutron capture signals in the experiment.

Finite calibration points
The next critical step is to select a list of "must-do" calibration points while still maintaining a good approximation for g(r, θ). Several pragmatic considerations are made: 1. As observed in figure 11, there is no apparent symmetry in g(r, θ) that allows a vast simplification to the choice of points. However, the variation of g appears faster at the edge, which mandates more sampling points there. 3. For the region in between the central axis and acrylic boundary, 200 points would be reasonable, e.g. selecting 20 points along each of the radial line in figure 8. In this study, however, a semi-random approach is adopted to choose these 200 points in order to yield the best calibration performance. Since the response varies fast at large radius, 300 regular points are first selected along the 10 radial lines between 15 and 17 m, then another 1000 random points are chosen in the vertical plane. This 1300-point grid is used as the basic template from which the optimal 200 points are decided (see later).

4.
Realistically, not all points in (r, θ) are accessible. If we consider a cable loop system with an anchor on the inner surface of the CD [9], some points too close to the vertical geometrical limit cannot be reached due to the loss of cable tension (ref. [32]). To allow a good coverage we consider two cable loops (figures. 12 and 13) in the CD. Note that physically the two loops can be at different vertical planes, and azimuthal symmetry can be applied to combine them into a single (r, θ) half-plane.
The choice of the calibration positions is correlated with the choice of the two anchor locations (θ 1 , θ 2 ) of the cable loops. The following automatic optimization procedure is performed: 1. The 27+23 points are fixed along the vertical central axis and the LS-acrylic boundary.
4. For each set of 200 (random) plus 50 (fixed) points, simulations with 2.22 MeV gammas are performed. A smooth surface of g(r, θ) is constructed using a two-dimensional spline function.
5. g(r, θ) thus obtained is applied to the uniform positron events, on whichã is extracted as a figure-of-merit. Steps 1 through 5 are repeated until a minimalã is found.
The optimal choice of the calibration points are illustrated in figure 12, in which the best anchor locations are θ 1 = 48°and θ 2 = 78°, and the resultingã is 2.98%. On the other hand, for this (θ 1 , θ 2 ), out of 100,000 random choices, 40,000 choices yieldã less than 3.0%.

Vertex smearing
So far, true production vertex of the IBDs have been used to look up g(r, θ). In reality, the event-by-event correction has to rely on the reconstructed interaction vertex, which will in turn affect the quality of the correction. An algorithm based on time and charge information of LPMTs has been developed in [33], where a resolution of 8 cm/ E(MeV) (average distance between the true and reconstructed vertex) was achieved. To study the impact to the overall energy resolution, three vertex resolution assumptions have been made here, 8, 10, and 15 cm/ E(MeV). For each simulated event, Gaussian position smearing is applied accordingly before applying g(r, θ) in section 3.5. The corresponding result forã is 3.01%, 3.05%, and 3.10%, respectively.
Another related effect is the uncertainty on the calibration source location, leading to a small smearing on g(r, θ) values. The hardware requirement is a precision of 3 cm [9]significantly better than the reconstructed spatial resolution at 1 MeV. It is verified that this contribution can be neglected.

LPMT quantum efficiency variation
For each LPMT, the quantum efficiency (QE) was measured individually at 430 nm during the quality assurance process, and can be approximated by a Gaussian with a mean of ∼30%, and a σ/mean of 7% [34]. To take this effect into account, QEs of individual LPMTs in the simulation are drawn from the Gaussian distribution. The resultingã is 3.02%.

Realistic detector
So far we have discussed the simulation of an ideal JUNO detector with nominal parameters in the simulation. A real detector can certainly differ from the simulation, so it is important to study whether an in situ calibration in a realistic detector can still achieve the required resolution. Five conservative alterations of the CD are considered: 1. The JUNO LPMTs and readout electronics are designed to yield less than 1% dead channels (∼180) in six years. We assume a CD with 1% LPMT failure in random positions.
2. Same as above but forcing an additional asymmetry among the bad channels (75:105, an unlikely asymmetric failure with only 1% p-value if individual PMTs are failing randomly) in the two semispheres separated by the vertical calibration plane, breaking the assumed azimuthal symmetry.
3. The JUNO LS has been tested in one of the decommissioned Daya Bay detector for 1.5 years, so all optical parameters in JUNO LS are validated with these data. No temporal degradation of the light yield has been observed with a maximum variation of ±0.5% among measurements [13]. To be conservative, we assume that the light yield Y 0 is reduced by 1% and 5% and study their effects.

4.
A 4% reduction of the absorption length (default 77 m at 430 nm [12]) can also produce a 1% reduction of Y 0 . However, instead of a global reduction of light yield everywhere, the reduction of absorption length alters the uniformity of the detector. We assume such scenario in the simulation. Note that changes in LS transparency (both absorption and scattering) will be monitored to a precision of 1 m with a laser system named AURORA ( figure 13).
5. The average single PE resolution from the bench measurement of LPMTs is 30% [34], which will also affect the energy resolution, especially if the integral of the waveform is used as the energy estimator. A 30% single PE charge smearing is applied to each of the simulated PE from the LPMTs.
In each case, g(r, θ) is calibrated with the 250 points in figure 12, and then gets applied to uniformly distributed positrons. The results are summarized in table 3. For the two cases whereã exceed 3.1% (5% reduction of light yield, or the inclusion of 30% single PE smearing), sizable increases of the a term are observed (as expected), which obviously could not be mitigated by the calibration. For the 30% single PE resolution, our treatment is conservative since within the IBD energy range, there are still 40% of the LPMTs working under the single photon regime, the counting of which could partially mitigate the charge smearing. A successful application of this approach is in ref. [35]. A waveform deconvolution technique [36] may also offer a better estimator for the number of PEs.

Bias in the energy scale
A residual bias in positron energy scale can still exist after the non-uniformity correction. With nominal JUNO detector and 250-point-based g(r, θ), the relative difference between E prompt vis of uniform positron events to that at the CD center shows a less than 0.05% energy scale difference. For the five realistic detector conditions above, with the in situ calibration, the bias can be controlled to below 0.3%, as shown in table 3. Therefore, an additional 0.3% systematic uncertainty has been included in the positron energy scale in section 2.3.6.

Conclusion of the energy resolution
The step-by-step downgrade of the non-uniformity calibration from the ideal to the most realistic situation is summarized in table 3. One sees that the constant term b in the energy resolution can be optimized by utilizing a single gamma source deployed to about 250 points in a vertical plane of the CD, bootstrapped to the entire CD using a smooth two-dimensional spline function in (r, θ). This leads to anã of 3.02% for nominal JUNO detector, in agreement with the requirement put forward in [2]. For detector imperfections (below the double-line in table 3), the individual impact can be estimated by taking the difference of itsã and 3.02% in quadrature. Although it is difficult to predict what may happen in a real detector, the individual imperfection can lead to a worst-caseã of 3.12%, which is still sufficient to fulfill a 3σ determination of the MO [2]. Effects of combination of multiple imperfections are also studied in the simulation -approximately consistent with combining individual contributions in quadrature.  Table 3. Energy resolution after sequential downgrade from the ideal to realistic calibration, considering all assumptions from section 3.1 to section 3.9. Values in parentheses indicate fitting uncertainties, and the uncertainty ofã has taken into account the correlations in a, b and c. Each row from "Azimuthal symmetry" to "PMT QE random variations" indicates cumulative effects down to this row. This gives anã of 3.02% for nominal JUNO situation. Each line starting from "1% PMT death (random)" represents an individual imperfection of the CD, which also includes effects up to the double-line (nominalã).  Figure 13. Overview of the calibration system (not to scale), including the Automatic Calibration Unit (ACU), two Cable Loop Systems (CLSs), the Guide Tube (GT), and the Remotely Operated Vehicle (ROV). The red points represent a source assembly described in figure 4. The AURORA is an auxiliary laser diode system to monitor the attenuation and scattering length of the LS, which is beyond the scope of this paper.

Conceptual design of the calibration system
The hardware design of the calibration system is driven by the strategy outlined in sections 2 and 3, to confront the challenge of energy scale and resolution calibration. As demonstrated in section 3, such a system should be capable to place a source along the central axis of the CD, on a circle at the LS-acrylic boundary, and in the region in between. The corresponding hardware design consists of several independent subsystems (figure 13), which will be discussed in turn.

Automatic Calibration Unit (ACU)
The ACU is developed to do calibration along the central vertical axis z of the CD. The design is very similar to the ACU in the Daya Bay experiment [37], with four independent spools mounted on a turntable. Each spool is capable to unwind and deliver the source via gravity through the central chimney of the CD, with a better than 1 cm positioning precision in z. Three sources can be deployed regularly, including a neutron source (AmC), a gamma source ( 40 K), and a pulsed UV laser source carried by an optical fiber with a diffuser ball attached to the end [25]. To be flexible, the fourth spool will carry a replaceable source, for example a radioactive source or even a temperature sensor. Due to its simplicity and robustness, we envision to use the ACU frequently during data taking to monitor the stability of the energy scale, and to partially monitor the position non-uniformity.

Guide Tube system (GT)
The GT is a tube looped outside of the acrylic sphere along a longitudinal circle similar to that used in Double Chooz [28] and CUORE [38]. Within the tube, a radioactive source with cables attached to both ends gets driven around with a positioning precision of 3 cm. The design of this system is discussed in details in [39]. Although physically outside of the CD, MeV-scale gammas can easily penetrate the 12 cm acrylic and deposit energy into the LS. The full absorption peak will be mixed with a leakage tail due to energy deposition in the acrylic that can be disentangled by fitting. Based on the simulation studies in [39], this subsystem is sufficient to calibrate the CD non-uniformity at the boundary.

Cable Loop System (CLS)
As illustrated in figure 13, two CLSs will be installed in the two opposite half-planes to deploy sources to off-axis positions. The design concept is inspired by those in SNO [40], KamLAND [26], and Borexino [27]. For each CLS, two cables are attached to the source, which also form a loop to deliver and retract the source. Different sources can be interchanged on the CLS. The central cable goes upwards towards the north pole of the CD. The side cable winds through an anchor on the inner surface of the acrylic sphere, then also towards the north pole the CD. By adjusting the lengths of the two cables, the source could be delivered ideally within an area bounded by the vertical lines through the anchor and the central axis. More realistic coverage measured by a CLS prototype is discussed in [32]. The sources on the CLS will be positioned by an independent ultrasonic system to achieve a precision of 3 cm.

Remotely Operated Vehicle (ROV)
Locations other than the CLS plane could also turn out to be important, if significant local effects or azimuthal dependence were identified during data taking. These effects can be studied by physical background events, for example, the spallation neutrons. Alternatively, we also plan to have a ROV [41] similar to that in the SNO experiment [42], capable of deploying a radioactive source in almost the entire LS volume. As the CLS, the ROV also needs to work with the ultrasonic positioning system. The mechanical design also needs to be optimized in size and surface reflection to minimize the loss of photons. We envision that the ROV serves as a supplement to the ACU, CLS, and GT, and should be deployed infrequently.

Positioning system
For all source deployment systems above, the control of source position is crucial. Table 4 summarizes the expected precisions. The ACU and GT can achieve good positioning through accurate measurements of the cable lengths. For the CLS, due to the self-weight of the cable and the friction in the loop, cables do not run in straight lines, so calculations based on simple trigonometric relations introduce significant uncertainties [32]. For the ROV, it is even more complex as the positioning feedback is needed during the navigation. For these purposes, an independent ultrasonic positioning system has been developed [43]. Eight ultrasonic receivers will be mounted inside the acrylic sphere, and the source deployed by the CLS or ROV will carry a miniature ultrasonic emitter. Based on the prototype tests, such a system is capable to provide a positioning precision to a level of 3 cm.

System
Positioning precision [cm] ACU 1 CLS 3 GT 3 ROV 3 Table 4. Summary of expected positioning precision for each subsystem.

JUNO calibration program
Based on all discussions above, we can now streamline the calibration program. The rates of the radioactive source are set to be around 100 neutron or gamma emissions per second 4 , so that the data rate during the calibration does not differ too much from that during the neutrino data taking (∼1000 Hz).
Similar to the Daya Bay calibration, we envision to separate the program into comprehensive (but infrequent), weekly and monthly calibrations. As a requirement, source deployments should not introduce noticeable radioimpurity such as radon. The nominal speed of the source movement is about 1 m/min.

Comprehensive calibration
A comprehensive calibration is likely to be carried out at start of the experiment to achieve a basic understanding of the CD performance, and then a few times throughout the JUNO life time. Multiple sources will be deployed to the CD center to study the non-linearity. The UV laser diffuser ball will be deployed to the CD center and pulsed with a repetition rate of 50 Hz under eight different intensities (equivalent energy from 0.3 MeV to 1 GeV), to allow channel-wise instrumental calibration. In addition, the AmC neutron source will be deployed to the full set of 250 points (figure 12) using the ACU, CLS and GT. ROV is envisioned not to be deployed at the beginning of the experiment, but when enough evidence of local effects or azimuthal dependence is accumulated. An estimate time cost for a comprehensive calibration run is shown in table 5, with 100,000 events per calibration point at CD center (except 40 K with 6000 events), including the travel time to move the source into designations. The total time of this campaign is expected to be about 48 hours. Table 5. A baseline plan of a comprehensive calibration. The AmC will be deployed into 250 points utilizing the ACU, CLS and GT. All other sources will rely on the ACU only. More sources at other locations are also possible.

Weekly calibration
The weekly calibration is designed to track major changes of the detector properties such as variations in the light yield of the LS, PMT gains, and electronics. As shown in table 6, the neutron source (AmC) will be deployed to five locations along the central axis, each with one minute data taking, leading to a better-than 0.1% statistical uncertainty to the gamma peak. The UV laser intensity scan will also be taken at the CD center. The total time of weekly calibration is about 2.4 hours.  Table 6. Weekly calibration using the ACU.

Monthly calibration
The monthly calibration goes through a limited number of positions in figure 12, given that a comprehensive calibration has been performed at the start of the experiment, and that the temporal variations such as the PMTs and optical properties of the LS are minimal. As shown in table 7, the ACU, CLS, and GT will all be operated during the calibration.
The UV laser will also be deployed to the same locations as the AmC along the central axis. To balance the extensiveness and time cost, we select the full sets of 27 and 23 points from the ACU and GT, respectively, and 40 representative points in CLS to monitor the non-uniformity of the CD. The total time for a monthly calibration is expected to be 11.2 hours.  Table 7. Monthly calibration with ACU, CLS and GT.

Summary
We have carried out a comprehensive study to develop a multi-faceted calibration strategy to secure JUNO's full potential in the determination of the neutrino mass ordering. This study is based on the most up-to-date JUNO simulation software, including all major features of the detector design. We demonstrate that using various gamma and neutron sources, cosmogenic 12 B, in combination with a pulsed UV laser, the nonlinear energy scale of the positrons can be determined to a sub-percent level within the entire energy range of the IBDs. The novel dual calorimetry allows the clean determination of instrumental non-linearity, leading to robust and independent control of other non-linearity and nonuniformity effects. We also have developed a multi-positional source deployment strategy to optimize the energy resolution. With a selection of 250 key positions in a vertical plane of the detector and by utilizing the azimuthal symmetry, an effective energy resolutionã of 3.02% is achieved with the nominal JUNO detector parameters. This calibration plan requires a multi-component source deployment hardware including a vertical spooling system covering the central axis of the CD, a guide tube system attached to the acrylic sphere, two cable loops to cover a large fraction of area in a vertical plane, and a supplementary Remotely Operated Vehicle. In the end, we separate the calibration tasks into different frequency categories to ensure both the timeliness and the comprehensiveness. Approximately 3% of the total live time is dedicated to calibration. We demonstrate that with such a calibration strategy, JUNO's challenging requirements on the neutrino energy spectrum measurement can be achieved.