Gpu-based optical photon simulation for the LHCb RICH 1 detector

We present the investigation of the use of Opticks, a GPU-accelerated optical photon interface with the LHCb detector simulation, to improve computation time of optical photon propagation. The hybrid workflow, combining the particle simulation package Geant4 and Opticks, offloads optical photon propagation to GPUs, thereby accelerating the overall simulation process. The consistency of the results obtained from Geant4 and Opticks simulations is verified with a simplified LHCb RICH1 detector geometry, demonstrating the feasibility of the proposed approach. In addition, the ongoing transition to the NVIDIA OptiX 7 API and re-structuring of Opticks code is discussed within the context of HEP simulation workflows, with caveats explored.


Introduction
Simulation of proton-proton collisions dominates the computing resources of the LHCb experiment [1].This dominance is illustrated in Figure 1, which presents the CPU usage at LHCb from January 2022 to present [2,3].The CPU usage is dominated by detailed simulations based on Geant4 [4,5], with a secondary contribution from fast simulation options, such as ReDecay [6] and partial detector simulation.Through explicit measurements, it is known that the majority of time in LHCb simulation is dominated by the propagation of particles through the detector within the Geant4 framework [7].
The projected use of computing resources will only grow as we pass into the High Luminosity-LHC era, greatly surpassing the resources pledged to the experiment.As a result, research and development into simulation on diverse computing architectures is necessary.Graphical Processing Units (GPUs), are under active investigation, particularly as they have been designed for high-efficiency parallel processing, allowing for considerable computing cost reduction and increase in simulation speed.The LHCb collaboration utilises Geant4 for its fully detailed simulation [4,5].Numerous research and development endeavours have been initiated within the larger High Energy Physics (HEP) community, Fig. 1: LHCb experiment CPU usage broken down by task from 2022 to present.As illustrated in blue and orange, the dominant CPU user is the simulation of collisions.These numbers are from productions of simulated event samples for LHCb utilising the DIRAC project [2] and monitored via the DIRAC web portal [3].The largest chunk, labelled MC Simulation, represents the generation and propagation of particles throughout the complete detector geometry as described in Geant4.The slice labelled MC Fast Simulation is similar to MC Simulation, but utilises the ReDecay framework [6].Finally, MC Reconstruction represents the pattern recognition and track reconstruction, particle combinations and application of hardware and software level triggers.Jobs labelled as User are individual submissions, with the final slice representing all other contributions.Similar dominance of MC simulation is seen during past data-taking periods.
targeting the execution of specific simulation components on GPUs. Figure 2 illustrates the time used in Geant4 simulations per sub-detector for the latest release of the LHCb simulation package Gauss [8], highlighting that the computational resources for simulation are dominated by the calorimeters and the two Ring Imaging Cherenkov (RICH) detectors.These two systems are associated with two distinct physical processes: electromagnetic showers and the transportation of optical photons.The Geant4 collaboration is currently investigating methods to incorporate GPUs for modelling both processes.The Celeritas [9] and ADePT projects [10] represent such development initiatives, aiming to simulate the electromagnetic physics utilising a GPU.Furthermore, the Opticks package [11] provides an interface between Geant4 and the NVIDIA OptiX ray tracing engine to simulate photon interactions.This paper reports efforts testing the Opticks package for use by the LHCb simulation framework, further discussed in Sec.The acceleration of optical photon propagation is not only a necessity for future LHCb upgrades, but also for numerous neutrino experiments, due to the significant presence of cosmic muons, which can generate critical background noise for neutrino detectors such as Daya Bay [12] and JUNO [13].For example, a muon with a typical energy of 200 GeV traversing the JUNO scintillator is projected to produce tens of millions of optical photons [11].Similarly to the LHCb use case, the propagation of optical photons in these experiments accounts for over 99% of CPU time and imposes rigorous memory limitations on the storage of each event [11].Consequently, it is beneficial to explore transfer of optical photon simulation workloads from CPUs to GPUs, utilising potential programs such as Opticks that can perform these actions, and which can therefore accelerate the simulation of any experiment relying on Geant4 optical photon propagation.

OptiX ray tracing engine
The NVIDIA OptiX [14] ray tracing engine constitutes a versatile programmable system, specifically devised for NVIDIA GPUs and other highly parallel architectures.As a domain-specific just-intime compiler, the OptiX engine operates on the fundamental premise that the majority of ray tracing algorithms can be executed through a series of programmable operations, encompassing ray generation, material shading, object intersection, and scene traversal.OptiX ray tracing pipelines are assembled from a limited set of user-provided programmes, coded utilising CUDA [15] -a parallel computing platform and programming model established by NVIDIA for general-purpose computing on GPUs.This facilitates the implementation of an extensive variety of ray tracing-based algorithms and applications, including interactive rendering, offline rendering, collision detection systems, artificial intelligence, and scientific simulations, such as sound propagation.Spatial index data structures, exemplified by the boundary volume hierarchy (BVH) [16], primarily serve to expedite intersections between photon paths and geometry structures.OptiX offers a flexible interface, appropriate for a broad spectrum of applications, to regulate its acceleration structures and facilitate intersections with any geometric form.The OptiX acceleration structure system supports adaptable instancing, which pertains to the low-overhead replication of scene geometry via referencing the same data multiple times without duplication.This enables the acceleration of constructing multiple instances of identical geometry, such as the pixel Multianode Photomultiplier Tubes (MaPMTs) within the LHCb RICH1 detector.
In August 2019, NVIDIA unveiled OptiX 7, superseding the previous OptiX 6.5 API and offering a low-level, CUDA-centric API.The OptiX 7 API gives explicit control over memory management, compilation, and launches to the application, whilst preserving the ray tracing programming model and shader types.The Opticks package is currently undergoing a transition from the OptiX 6.5 API to OptiX 7 [17].Unless indicated otherwise, the subsequent analyses are performed utilising the OptiX 6.5 API.

Hybrid Geant4 and Opticks workflow
To facilitate the offloading of optical photon propagation in Geant4 to GPUs, while maintaining the simulation of other particles on CPUs, all elements of optical photon generation and propagation are converted into a suitable format and uploaded to the GPU by Opticks.The primary components within the hybrid workflow encompass detector description including material and surface properties, as well as optical photon generation and propagation as determined by optical physics.These aspects are covered in detail in Ref [11], and are summarised here.

Geometry translation
During the initialisation phase, Geant4 geometries are initially translated into the corresponding Opticks geometry buffers.This translation encompasses the conversion of materials, surfaces, solids, volumes, and sensors, and construction of complex geometries is accomplished utilising these primitive shapes through constructive solid geometry (CSG) modelling.The user-provided Geant4 geometry is then translated into a parallel tree of Opticks nodes.Each volume boundary is allocated an index uniquely identifying the combination of four indices representing outer and inner materials and outer and inner surfaces.Outer/inner surfaces manage inward/outward-directed photons, enabling the translation of Geant4 border and skin surface functionality.Surfaces exhibiting a non-zero efficiency property are employed to identify sensor volumes.To circumvent repetition of geometry translation in each run, the Opticks geometry is cached within a directory structure for future use.
Upon invoking the visualisation or actual simulation, the Opticks geometry buffers are further translated into an OptiX geometry node graph, which is also utilised to configure the resulting acceleration structures.One such visualisation is presented in Figure 4, which exhibits a ray traced rendering of the RICH1 detector geometry.

Optical photon generation and propagation
We summarise a few key points on optical photon generation and propagation here for future reference.Optical physics processes, such as scattering, absorption, scintillator re-emission, and boundary processes, are implemented in CUDA functions based on the corresponding Geant4 implementations.Rather than generating photon secondary tracks iteratively, pertinent "genstep" parameters, including the number of photons to generate and the line segment along which they should be generated, are collected.These gensteps, in conjunction with CUDA ports of the Cherenkov and scintillation generation, facilitate the direct generation of photons on the GPU within the ray generation programme provided to OptiX.Ultimately, only a small proportion of the photons detected as hits on the sensor volumes are copied back to the CPU and incorporated into Geant4 hit collections.

Test of Opticks with LHCb RICH1 detector
The LHCb detector employs two RICH detectors for particle identification [18,19].These detectors operate on the principle of Cherenkov radiation produced by charged particles traversing the radiator medium at velocities surpassing the speed of light in the radiator.The emitted light forms a conical pattern, characterised by an angle θ c contingent upon the particle's velocity, given by where β = v/c is the velocity of the particle normalised to the speed of light in the vacuum and n is the refractive index of the medium, which can depend on the wavelength of light traversing the medium.By ascertaining the Cherenkov angle, and combining it with the particle momentum derived from tracking measurements, a mass hypothesis can be tested for the particle, facilitating its identification.For this reason, statistical comparisons between any optical photon propagation offloading and nominal Geant4 techniques are imperative.
LHCb is equipped with a pair of RICH detectors, utilising two distinct radiators: RICH1 employs a C 4 F 10 radiator, possessing a refractive index of 1.0014, to identify tracks in the momentum range of 2 to 60 GeV/c, while RICH2 is filled with CF 4 gas, with a refractive index of 1.0005, specifically for momenta as high as 100 GeV/c.Both RICH detectors incorporate dual sets of mirrors.The primary mirrors are spherical and are tilted, reflecting Cherenkov photons towards one of the secondary mirrors.These secondary mirrors are effectively planar, redirecting the photons outside the LHCb detector acceptance, where the photodetectors are situated (refer to Fig. 3).This plane aligns with the focal plane of the corresponding segment of the optical system, causing Cherenkov photons produced by the same track to focus into a ring on the photon detection plane.The photodetectors selected for RICH1 consist of MaPMTs, which are structured as matrices of 8 × 8 anodes.RICH1 is outfitted with 1inch MaPMT modules, each with a pixel size of 2.88×2.88mm 2 .The Cherenkov angle resolution for RICH1 is expected to be approximately 0.83 mrad for a single photon [20].

Simplified RICH1 geometry
The full RICH1 system includes support structures, and is at a level of detail which is unnecessary for validation of the optical photon simulation presented here.Instead, we use a simplified RICH geometry to validate the performance of Opticks and check the consistency of Geant4 simulation and hybrid workflow simulation in a convenient way.This simplified RICH contains only a singular set of spherical and planar mirrors in the upper section of RICH1, and it does not incorporate features such as the beam pipe, magnetic shielding, and exit window -the optical geometry and the radiator are identical to those utilised in the full RICH1 detector.Detector properties, such as mirror reflectivity and MaPMT quantum efficiency corrections, are identical to those employed in the full LHCb RICH1.
Furthermore, to obtain the Geant4 and Opticks simulation results simultaneously, a Rich Simplified program based on the Opticks G4Opticks class has been developed.This program allows direct comparison and validation, as well as visualisation of the simulation results by propagation of the same event within the two frameworks.

Measurement of Timing and Statistical Comparisons
The measurement of timing is performed two ways, first using built-in timing capabilities of the Rich Simplified program, separately for Geant4 and Opticks timing, and second, utilising the NVIDIA Nsight tool [21], a package of debugging and profiling tools designed for NVIDIA GPUaccelerated applications.

Rendering of Geometries
The incorporation of the RICH1 and Rich Simplified geometries are shown in Figure 4.These renders show the capability of incorporation of arbitrary geometries within the Opticks framework.However, several iterations were necessary before an adequate geometry representation was achieved.These include overlapping volumes as seen by Opticks, incompatible boolean subtractions between Geant4 based geometries, and incompatibilities between detection planes of Geant4 and Opticks were discovered and subsequently updated in the geometry itself.These issues were reported to core developers.

Timing and Profiling
For timing studies we compared the performance using an Intel(R) Xeon(R) Silver 4210R 2.4 GHz CPU and two NVIDIA Tesla T4 GPUs.The timing results of the Rich Simplified programme with Opticks and Geant4 are reported in Figure 5. Reported CPU propagation times are measured using a single core, and GPU propagation times are presented per GPU used, with two GPUs used in total.Figure 5 top shows that there is an underlying over- head of device initialisation within the Opticks code and is verified via the NVIDIA Nsight tool.This time corresponds to 35% of the total time of the program for a single muon propagated through the Simplified RICH geometry.This effect is further quantified by Figure 5 bottom, which shows the photon propagation time divided by the total kernel execution time, indicating a measure of efficiency of the offload process.We find that for the Rich Simplified program, one needs to propagate 10 5 photons per kernel execution to ensure at least 95% of the execution time is used to propagate photons.

Statistical Comparisons
The agreement between Geant4 and Opticks simulation results is checked by running the Rich Simplified program and performing a statistical comparison of the results.Test hardware is a single NVIDIA Tesla T4 GPU with Intel(R) Xeon(R) Silver 4210R 2.40 GHz, 62G CPUs. Figure 6 displays the hits collected on the MaPMT photocathodes, obtained from a muon passing through the simplified RICH1 with an energy of 10 GeV.It is worth noting that these hits are acquired with accounting for quantum efficiency corrections.Upon the inclusion of quantum efficiency corrections, both Geant4 and Opticks simulation results yield approximately 60 hits, consistent in position of the detection plane and also yielding a consistent Cherenkov ring. Figure 7 displays the distributions of hit numbers gathered from 500 events by both Geant4 and Opticks simulations, fitted using Gaussian functions.The mean values of collected hits are consistent within one standard deviation, and both simulations demonstrate a similar Gaussian distribution, suggesting accurate translations of both geometry and physical processes.The variance in the mean values stems from two individual points: first, Geant4 allows the possibility for user defined physics processes to be implemented, which is taken advantage of in the Rich Simplified program, for optimisations such as the elimination multiple bounces of trapped optical photons.The Opticks package, however, has its own set of physics processes which have been incorporated within the library.These differing physics processes account for the shift in the mean position of the peaks.This difference is validated by removing certain user-implemented physics processes, resulting in a notable alteration in the observed discrepancies.
Lastly, the efficiency improvement of Opticks simulation over pure Geant4 simulation is evaluated in two ways: by running a single event with an increased number of generated photons, achieved by multiplying the expected number of generated photons by a magnification factor and then using this number in the simulation, or by running multiple events in a single run.In this particular instance, approximately 600 photons are generated during a single event.The results of both methods are presented in Fig. 8, indicating that Opticks simulation is approximately 10 times faster in the first case and 5 times faster in the second case than pure Geant4 simulation.It should be noted that the computation times reported only pertain to the duration taken for optical photon generation and propagation and do not take into account the time required for geometry translation.

Full RICH1 geometry
Upon verifying the consistency between Geant4 and Opticks simulation results and confirming Opticks' acceleration capabilities, a simulation of a single event through the complete RICH1 geometry is carried out.Figure 9 displays the hits collected on the PMTs' photocathode, resulting from random charged particles passing through the RICH1 detector.
From the collected hits within both Opticks and Geant4, it is clear that Opticks is also well suited for the use with more than one input particle per processed event, as multiple Cherenkov rings are visible.

Physics comparison of Opticks with and Geant4
As seen in Figure 9, the overlap of rings of multiple events shows the correct conversion of the geometry to Opticks.However, the difference between the distributions, illustrated by the shift in the mean number of hits in Figure 7 and in the areas outside of the rings in Figure 9, illustrates the need for automatic porting of user defined physics processes to the Opticks framework in an automatic fashion.Regardless, the work shows that irrespective of occupancy, the core physics results are consistent.Finally, the linear scaling of results with number of events in Figure 8 shows optimisation of the use of GPU resources.

Integration of Opticks with LHCb simulation framework
Integrating Opticks with the LHCb simulation framework [22][23][24], presents a considerable challenge, primarily due to the following two aspects:  The positions of the hits on the MaPMTs' photocathode in the x-y plane, obtained from the simulation results of Geant4 (in blue) and Opticks (in red) for the RICH1 geometry.The particle types, directions and initial energies of the events are generated randomly.
First, Opticks relies on numerous external packages, including Geant4, XercesC, CLHEP, and Boost, as well as other related libraries.These programs are all required to be specific versions for use in distributed computing resources.Although Opticks offers an automated command (opticks-full) to install these external packages, various issues may arise when attempting to install Opticks on a new machine.To minimise possible conflicts, a bash script is developed to update, build, and test Opticks, based on the LHCb Nightly Build System [25].This system is a collection of tools and scripts that automate the build and test tasks for LHCb software, making the installation of Opticks significantly easier and providing the necessary foundation for integrating Opticks with the LHCb simulation framework.It can also provide key insights into compatibility with future software versions.
Second, to deploy Opticks for use in simulation workloads, an interface between the LHCb core simulation framework and Opticks needs to be developed.This necessitates major tagged releases of Opticks and the use of consistent versions of the underlying software across all software projects, highlighted in the previous paragraph, with the compatibility between all used versions to be explicitly checked.In the context of the LHCb detector, the optical photon simulations in the RICH detectors represent one where Opticks would be invoked.The challenge lies in translating the RICH detectors from the detector geometry source provided by Gauss into the Opticks geometry buffers, interfacing other Gauss features such as algorithms, tools, and services with Opticks, loading external physics lists, and simulating the detector response.These issues within the integration process require further investigation and effort.Additionally, the detector construction relies heavily on boolean volumes, leading to an extremely unbalanced binary search tree.The balancing of this tree requires either the complete re-design of the geometry or other tools that can automatically balance the tree to be introduced.Another possible solution is the export of only the specific geometry elements for consideration for optical photon workloads by first exporting only these parts to a parallel geometry within Geant4, then exported to Opticks via GDML or otherwise.

Migration of Opticks to OptiX 7.0
The code of Opticks is currently in the process of transitioning to the new NVIDIA OptiX 7 API [17], and as a result, all the GPU code is currently being restructured.The Opticks GGeo geometry structure will be replaced by the CSG geometry model, and the interface between Geant4 code and the Opticks GPU context will be managed by the G4CXOpticks class.Figure 10 presents the renderings of the simplified RICH1 and full RICH1 geometries using the OptiX 7 API.At time of study, the full suite of physics processes was unavailable for OptiX 7, but the rendering capabilities were present, allowing only the report of rendering results within this paper.While this shows the possibility of translations of geometries with the same framework but an updated interface, these geometries will still have unbalanced binary trees, meaning experiments wishing to use this will need to either optimise their geometries themselves or incur non-optimal performance.Automatic tools for the detection of such unbalanced trees may be possible, aiding users in their use of Opticks.

Conclusion
The integration of Opticks, the GPU-accelerated optical photon simulation package, with the LHCb detector simulation framework has shown promising results in terms of efficiency improvement and consistency with Geant4 simulations.The hybrid workflow, which combines Geant4 and Opticks, has demonstrated the feasibility of offloading the CPU optical photon simulation to GPUs, improving the overall performance of the simulation process.The performance characteristics of the Rich Simplified program provided valuable insights for accelerating optical photon simulations in the LHCb detector, particularly in the RICH subsystem, contributing to more efficient data processing and analysis.We also highlight difficulties encountered and enumerate future steps necessary for inclusion in full production environments such as those utilised within LHCb.

2 .OFig. 2 :
Fig. 2: Time spent in each sub-detector relative to the total time for simulating 100 events, produced by the LHCb performance and regression (PR) tests.The left part of the plot shows individual sub-detector contributions and the right part shows sums by detector type, indicating that simulation of the RICH detectors amount to about one quarter of the total time.

Fig. 3 :
Fig.3: Schematic view of RICH1 (top) and RICH2 (bottom) detectors and their optical systems.In Run 2 of the LHC, the silica aerogel in the RICH1 was removed and is not considered further.Taken from[1,18].

Fig. 4 :
Fig. 4: OpenGL rendering of the full RICH1 (top) and simplified RICH1 (bottom) geometry.Both are converted from Geant4 into an Opticks geometry.The geometry is cut in the centre to allow the visualisation of components.The colour gradient is artificial and implemented to display the surfaces.

Fig. 5 :
Fig. 5: The top figure illustrates the kernel initialisation time (represented by the red line), photon propagation time (depicted by the blue line), and total CUDA kernel execution time (shown by the yellow line) as the initial photon numbers increase.The bottom figure displays the kernel execution efficiency, defined as the CUDA kernel execution time divided by the total compute time, as a function of the initial photon numbers.

Fig. 6 :
Fig. 6: The positions of the hits on the MaPMTs' photocathode are obtained from the simulation results of Geant4 (in blue) and Opticks (in red).The top figure displays the positions of the hits in the x − y plane, while the bottom figure shows the positions in the y − z plane.

Fig. 7 :
Fig. 7: The top figure displays the distributions of hit numbers from 500 muons at 10 GeV, simulated by Opticks (in red) and Geant4 (in blue).The means and standard deviations obtained from Gaussian function fits are indicated on the plots.The bottom figure presents the distributions of hit numbers from 500 electrons at 10 GeV.

Fig. 8 :
Fig.8: The computational times required by Geant4 (in blue) and Opticks (in red) simulations are displayed in the figures.The top figure illustrates the times as the number of photons generated in a single event is increased by a magnification factor, while the bottom figure depicts the times as the number of events is increased.

Fig. 9 :
Fig.9: The positions of the hits on the MaPMTs' photocathode in the x-y plane, obtained from the simulation results of Geant4 (in blue) and Opticks (in red) for the RICH1 geometry.The particle types, directions and initial energies of the events are generated randomly.

Fig. 10 :
Fig. 10: Renders of the simplified RICH1 (top) and full RICH1 (bottom) geometries utilising NVIDIA OptiX 7. The geometry is modelled using a shared CPU/GPU geometry model, known as the CSG geometry model, which is designed to operate with the NVIDIA OptiX 7 API.The geometry is converted using the CSG GGeo class and rendered with the assistance of the CSGOptiX class.