Taking the aforementioned considerations and requirements into account, a more detailed scheme of the simulation workflow becomes necessary, as outlined in Fig. 2. Some aspects of this diagram still need to be optimized or specified precisely. Nevertheless, with the basic design given here, the modular functionality and building blocks can be developed in parallel. Rudimentary definitions of the interfaces needed for these purposes are given below.
Note that the code fragments given as examples here are not written in any specific language and do not follow any specific syntax. They are pure pseudo-code used to illustrate the basic functionality and the employed patterns, and only vaguely resemble C++.
Conventions and Coding
A programming language offering a high level of design flexibility and, at the same time, excellent compiler and optimization support is required. It is an advantage to choose a language that also has relevance outside of science and thus assures long-term support, development, and expertise. For this purpose, we decided to use C++. Initially, ngC will be based on the C++17 standard, a choice that will most probably evolve in the future.
General guidelines for contributing code will be well defined and must be strictly enforced [33]. These guidelines will be distributed via the documentation section on the gitlab server mentioned above and/or the wiki pages. The guidelines can be discussed, agreed upon, and improved in discussions between the developers and the project steering. One of the most important things in such a project is communication—and the code will be the prime means of communication between the team members [34] since, let us not forget, most of the time people spend on this project will be dedicated to reading other people's code [35]. A more exhaustive list of core guidelines for C++ can be found in Ref. [36]. The following items are also relevant in this respect:
Code must be accompanied by inline comments. Note that well-chosen names for identifiers and functions can greatly reduce the burden of documenting the code; well-written code is self-explanatory to a large extent. In addition, for systematic documentation, doxygen commands must be used where possible.
One criterion for style choices should be to minimize the probability of programming errors. For example, pointers should be used only where absolutely necessary and should never be exposed to the user.
We will favor static over dynamic polymorphism. On a low level of the code, this will lead to the abundant use of templates. However, high-level users and physicists should not be exposed to templates, unless absolutely required.
Test-driven development is encouraged. Therefore, from early on, a useful setup of unit tests should be supported by the build system. The unit testing will be an essential part of ngC. A high coverage of code by tests will be a prime criterion for acceptance.
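As an illustration of the unit testing just mentioned, a test in the style supported by catch2 (see the dependency list below) could look as follows; the tested function is only a stand-in for real ngC code:

```cpp
#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>

// Illustrative catch2 unit test; Square() is a stand-in for real ngC code.
static double Square(double x) { return x * x; }

TEST_CASE("Square returns the square of its argument", "[math]") {
    REQUIRE(Square(3.0) == Approx(9.0));
    REQUIRE(Square(-2.0) == Approx(4.0));
}
```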
Dependencies
The use of external code and libraries must be kept to an absolute minimum in order to stay conflict-free and operational over a very extended period of time. Individual exceptions might be possible, but must be well motivated and discussed before being included in the mainline code. For each functionality, we should evaluate whether a basic re-implementation is more feasible than the inclusion of an external dependency. In any case, whenever possible, appropriate wrappers in ngC should hide the implementation details of external packages in order to keep the option of replacement or re-implementation open without breaking the interface. Likely packages and options for external libraries are (excluding packages that will be distributed together with ngC):
C++17 compiler.
CMake build system.
git [for development].
doxygen [for development].
presumably \(\mathtt{boost}\) for \(\mathtt{yaml}\) and \(\mathtt{xml}\), histograms, file system access, command-line options, light-weight configuration parsers (property tree), random numbers, etc.
HDF5 and/or ROOT for data storage [at least one of the two required].
PyBind11 [37] for bindings to Python.
HepMC [38] as generic interface, also for exotics [optional].
To generate random numbers, we will use standardized interfaces and established methods. For testing purposes, it should be relatively easy to exchange the random-number engine. No homegrown generators and only well established, checked, and vetted methods for generating random numbers should be used, likely provided by \(\mathtt{boost}\) as well.
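A minimal sketch of such a standardized, exchangeable setup, shown here with the C++ standard library rather than \(\mathtt{boost}\) for brevity:

```cpp
#include <random>

// The engine is a separate, exchangeable component behind a standard
// distribution interface; swapping the engine does not affect client code.
std::mt19937_64 engine{20190101};                          // any vetted engine works here
std::uniform_real_distribution<double> uniform{0.0, 1.0};

double const u = uniform(engine);                          // draw one random number
```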
Light-weight packages like small header-only libraries can be distributed together with ngC. Likely candidates are:
Eigen3 [39] for linear algebra.
catch2 [40] for unit tests.
PhysUnits [41] for units (see below).
Configuration
The framework has to support extensive run-time (from configuration files or on command line) as well as compile-time configuration. The latter involves conditional compilation, static polymorphism, and switching between policies in policy-driven classes.
The run-time configuration will support structured yaml or xml as input, either in a single file or in multiple files located in a directory. Modules of ngC can retrieve the required configuration in a structured way via a global object. Command-line options are parsed and provided via the same mechanism. By default, the complete configuration will be saved into the output file, thus allowing, if needed, the identical reproduction of a simulation at a later time. Physics modules can access configuration data via a section name and a parameter name, for example:
```cpp
primaryEnergy = Config.Get("PrimaryParticle/Energy");
```
where \(\mathtt{PrimaryParticle}\) is the name of the configuration section and \(\mathtt{Energy}\) the parameter. The data can be obtained from files or provided via the command line, for example via \(\mathtt{{-}{-}set\ PrimaryParticle/Energy=1e18\_eV}\).
For more intricate situations where a simple configuration file might not be sufficient, or when a dynamic change of parameters during run time is needed, the simulation process can be steered more conveniently by means of a script. The library PyBind11 allows us to provide bindings to Python with minimal effort.
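As a rough illustration, exposing a hypothetical ngC entry point to Python via PyBind11 could look like this (the function name and signature are purely illustrative):

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Assumed ngC entry point; name, signature, and body are placeholders.
double RunShower(double primaryEnergyEV) {
    // ... here the actual ngC simulation would be invoked ...
    return primaryEnergyEV;
}

PYBIND11_MODULE(ngc, m) {
    m.doc() = "Python steering interface for ngC (sketch)";
    m.def("run_shower", &RunShower, py::arg("primary_energy_eV"),
          "Run a single shower with the given primary energy");
}
```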
Units
ngC will utilize the header-only library PhysUnits for handling quantities with physical dimensions (i.e., “units”). First, it allows us to conveniently attach units to numerical literals in the code (e.g., \(\mathtt{auto\ criticalEnergy = 87\_MeV;}\)), thereby avoiding other, hard-to-enforce explicit conventions and improving readability, especially in a collaborative environment.
Second, as the dimensions of quantities are encoded in their respective types, a dimensional analysis is imposed upon computations involving dimensionful quantities during the compilation. This way, an otherwise silent error of mismatched units is converted to a compile-time error, as in the following example:
```cpp
Length_t distance = 47.2_cm;
Time_t time = 35.9_ns;
Speed_t speed = distance + time;  // compiler error!
Frequency_t freq = 1 / distance;  // compiler error!
```
During compilation, all quantities are converted to a common set of base units, which the developer does not need to know and which is chosen internally to minimize numerical errors.
Because of this functionality, this approach is more restrictive than simpler implementations such as the one provided in Geant4/CLHEP [28, 42], where units are provided only as a set of self-consistent multiplicative constants. We believe, nevertheless, that the use of “strongly-typed units” will make development less error-prone.
At the same time, no run-time overhead is introduced when compiler optimizations are enabled since, after all, such a dimensionful quantity in memory is just an ordinary floating-point number.
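To illustrate the underlying mechanism (this is a strongly simplified sketch, not the actual PhysUnits implementation), dimensions can be encoded as template parameters so that the compiler tracks them through all arithmetic:

```cpp
// Simplified sketch of strongly-typed units; not the PhysUnits implementation.
// Dimensions are encoded as integer powers of length (L) and time (T).
template <int L, int T>
struct Quantity {
    double value;  // always stored in internal base units
};

using Length_t = Quantity<1, 0>;
using Time_t   = Quantity<0, 1>;
using Speed_t  = Quantity<1, -1>;

// Addition is only defined for identical dimensions; mixing them
// (e.g., distance + time) simply fails to compile.
template <int L, int T>
constexpr Quantity<L, T> operator+(Quantity<L, T> a, Quantity<L, T> b) {
    return {a.value + b.value};
}

// Division combines the dimension exponents at compile time.
template <int L1, int T1, int L2, int T2>
constexpr Quantity<L1 - L2, T1 - T2> operator/(Quantity<L1, T1> a,
                                               Quantity<L2, T2> b) {
    return {a.value / b.value};
}

// User-defined literals attach units to numerical constants.
constexpr Length_t operator""_cm(long double v) { return {static_cast<double>(v) * 1e-2}; }
constexpr Time_t   operator""_ns(long double v) { return {static_cast<double>(v) * 1e-9}; }
```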
Geometry, Coordinate Systems, and Transformations
A key ingredient for the usability of ngC is the ability to conveniently work with geometrical objects such as points, vectors, trajectories, etc., possibly defined in different coordinate systems. We will provide a geometry framework (with unit support fully integrated), to a large extent inspired by , in which geometrical objects are always defined with a reference to a specific coordinate system. In our case, the relevant coordinate systems mainly comprise the environmental reference frame and the shower frame, but additional systems can be defined as needed. When dealing with multiple objects at the same time, e.g., \(\mathtt{sphere.IsInside(point)}\), the transformation of the affected objects into a common reference frame is taken care of automatically. Therefore, when computations can be formulated in a way that does not involve any specific coordinate system, the handling of potentially necessary transformations stays completely transparent.
As transformations that define coordinate systems with respect to each other, we restrict ourselves to the elements of the special Euclidean group SE(3) (see Ref. [43]), i.e., rotations and translations. Although one might favor Poincaré transformations, as they include the Lorentz boosts that are certainly required for interfacing external interaction models, this would require adding a time-like coordinate to all geometric objects. This would add significant complexity to our setup, which is otherwise completely static; for example, the concept of a point fixed in space in the lab frame would have to be upgraded to a world line. We currently do not envisage supporting the modeling of relativistically moving objects in our environment (except for the particles, of course), as this would significantly complicate and slow down our particle-tracking algorithms. Due to the special properties of rotations and translations, inverse transformations are not computationally expensive, because costly matrix inversions can be avoided.
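The last point follows from the orthogonality of rotation matrices, \(R^{-1} = R^{T}\): inverting a transformation \(x' = Rx + t\) only requires a transpose, \(x = R^{T}(x' - t)\). A minimal sketch using Eigen (the helper class is hypothetical, not the ngC geometry interface):

```cpp
#include <Eigen/Dense>

// Hypothetical SE(3) transformation helper; not the ngC geometry interface.
struct Transformation {
    Eigen::Matrix3d R;  // rotation
    Eigen::Vector3d t;  // translation

    Eigen::Vector3d Apply(Eigen::Vector3d const& x) const { return R * x + t; }

    // No matrix inversion needed: for rotations, the inverse is the transpose.
    Eigen::Vector3d ApplyInverse(Eigen::Vector3d const& x) const {
        return R.transpose() * (x - t);
    }
};
```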
Regarding the aforementioned Lorentz boosts, special attention must be paid to ensure numerically accurate results in all relevant regimes, comprising the range from non-relativistic (\(\beta \ll 1, \gamma \simeq 1\)) to ultra-relativistic (\(\beta \simeq 1, \gamma \gg 1\)) boosts.
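For example, naively evaluating \(1-\beta\) for large \(\gamma\) loses all precision once \(\beta\) rounds to 1 in floating point; a numerically stable formulation (shown here as an illustrative sketch) uses \(1-\beta = 1/\bigl(\gamma^2(1+\beta)\bigr)\):

```cpp
#include <cmath>

// Illustrative sketch: numerically stable evaluation of (1 - beta) for a
// given Lorentz factor gamma. The naive expression 1 - sqrt(1 - 1/gamma^2)
// suffers catastrophic cancellation for gamma >> 1.
double OneMinusBeta(double gamma) {
    double const invGamma = 1.0 / gamma;
    double const beta = std::sqrt((1.0 - invGamma) * (1.0 + invGamma));
    return 1.0 / (gamma * gamma * (1.0 + beta));  // equals 1 - beta, but stable
}
```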
Particle Representation
The typical minimal set of information to describe a particle is: type, mass, energy-momentum, and space–time position. In certain use cases this can be extended, for example, with (multiple) weights, history information (unique ID, generation, grandparents, and interaction ID), or further information.
Interaction models typically do not care about the space–time part, since once the model is invoked according to the total cross section, the impact parameter is determined internally by the model in a small Monte Carlo procedure (and not from the microscopic positions of air nuclei in the atmosphere). Nevertheless, the propagation and the continuous losses will eventually need the space–time parts of the particle information.
Particle properties like mass and lifetime are extracted from the \(\mathtt{ParticleData.xml}\) file provided by PYTHIA 8 [44], together with their PDG codes [45]. To allow for efficient lookup of these properties, the ngC-internal particle code is chosen to be different from the PDG code: since the PDG codes only very sparsely cover a large integer range, they are not very useful as indices into a lookup table. ngC, therefore, uses a contiguous range of integers, which is automatically generated from the union of all particles known to the user-enabled interaction models. Rather than using these integers directly in the ngC code, however, \(\mathtt{enum}\) declarations will be provided for convenience and improved code readability. In contrast to their corresponding numerical values, the \(\mathtt{enum}\) identifiers (e.g., \(\mathtt{Code::DStarMinus}\)) are guaranteed to be stable after recompilation with different interaction modules, as well as in future ngC releases.
For this purpose, the needed code is generated by a provided script before the actual compilation of ngC. This script depends on the aforementioned file from PYTHIA. Its output is C++ code that allows one to write expressions like the following:
```cpp
// compile-time evaluated expressions:
auto constexpr mElectron = ParticleData::GetMass(Code::Electron);
auto constexpr tauPi = ParticleData::GetLifetime(Code::PiPlus);
...

// run-time evaluated expressions:
auto particleType = stack.GetNextParticle().GetType();
auto charge = ParticleData::GetCharge(particleType);
```
The internal numeric particle ID is just an index; the representation of particles in ngC code and \(\mathtt{enum}\)s is obtained from the particle names in the \(\mathtt{xml}\) file. When specific interaction models internally use different schemes for particle identification, extra code is provided in the interface part to those models, where the conversion between the external and internal codes is performed.
For binary output purposes, however, ngC-internal codes are converted to the well-known, standardized PDG codes to ensure seamless interoperability with other software packages used within the HEP community. In any text output, e.g., log files, the codes are by default converted to human-readable identifiers. For example, \(\mathtt{cout << someParticleCode << endl;}\) might, depending on the value of \(\mathtt{someParticleCode}\), print “\(\mathtt{e-}\)” or “\(\mathtt{D+}\)”, unless a numerical output (in the ngC or PDG scheme) is explicitly requested.
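A condensed sketch of what the generated code could look like (all names and values are illustrative, not the actual generator output):

```cpp
#include <cstdint>

// Illustrative sketch of the generated particle-code infrastructure; the
// actual enum and tables are generated from PYTHIA's ParticleData.xml.
enum class Code : int32_t {
    Electron = 0,
    Positron = 1,
    PiPlus   = 2,
    // ... one entry per particle known to the enabled interaction models
};

// Contiguous internal codes allow direct array indexing for property lookup.
constexpr double mass[] = {0.000511, 0.000511, 0.139570};  // GeV (illustrative)
constexpr int32_t pdg[] = {11, -11, 211};                  // standardized PDG codes

constexpr double GetMass(Code c) { return mass[static_cast<int32_t>(c)]; }
constexpr int32_t GetPDG(Code c) { return pdg[static_cast<int32_t>(c)]; }
```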
Framework
ngC consists of an inner core and associated modules, which can also be entirely external. Thus, there can be—and generally is—a distinction between code in the “core” of ngC and code “outside” of it, defining a “frontier” where conventions, units, and all kinds of reference frames have to be adapted and converted in a consistent way. This is most obviously the case for all the existing hadronic event generators and input/output facilities, but it can also occur in other components, and the frontier can thus appear in different places. The code needed for the conversions at the frontier must be provided together with the ngC framework. Special care must be taken in cases where different models, for example, use different constants for the masses of particles, which can lead to numerically unreasonable results like negative kinetic energies or invalid transformations. The details of such effects must be investigated, and a comprehensive solution has to be found at a later time.
Particle Processing and Stacks
A core concept of ngC is that particles are stored on a dedicated stack. This is needed since, in cascade processes, an enormous number of particles can accumulate, requiring careful handling of the data. The stack can automatically swap to disk when memory is exhausted. The access to and handling of particles on the stack has an important impact on the performance of the simulation process. In typical applications, it is optimal in terms of memory footprint to process the lowest-energy particles first, but there can be situations where a completely different strategy becomes relevant. The stack should be flexible enough to allow various user-specific interventions while the simulation is writing to and reading from it.
In ngC, there is no need for a dedicated persistent object describing an individual particle. Particles are always represented by a reference/proxy to the data on the stack. On a fundamental level, such a stack can be a FORTRAN common block, dynamically allocated C++ data, a swap file, or any other source/storage of particle data.
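A minimal sketch of this proxy pattern, with purely illustrative names and a naive in-memory storage (not the final ngC design):

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the stack/proxy pattern. Particle data lives in
// contiguous arrays owned by the stack; a proxy merely references one slot.
class Stack {
  public:
    std::vector<double> energy;  // one entry per particle
    std::vector<int> type;       // internal particle code
};

class ParticleProxy {
    Stack& stack_;
    std::size_t index_;  // slot of this particle on the stack
  public:
    ParticleProxy(Stack& s, std::size_t i) : stack_(s), index_(i) {}
    double GetEnergy() const { return stack_.energy[index_]; }
    void SetEnergy(double e) { stack_.energy[index_] = e; }
    int GetType() const { return stack_.type[index_]; }
};
```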
Main Loop, Simulation Steps, and Processes
A central part of ngC is the loop over all particles on the stack. These particles are transported and processed in interactions with the medium; as part of this, CE tables can also be filled. All these processes can produce new particles or modify existing particles on the stack. Furthermore, the processes can produce various output data of the simulation. CE migration matrices are either computed at program start or read from pre-calculated files. When the stack is empty (or on any other trigger), the CE are solved numerically, which can, once more, also fill the particle stack. Thus, a double loop is required to process the full particle cascade, as sketched below.
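In purely illustrative pseudo-code (all names are placeholders, not the final ngC interface):

```cpp
// Illustrative pseudo-code of the double loop over the particle cascade.
while (!stack.IsEmpty() || cascadeEquations.HavePendingFlux()) {
    // inner loop: Monte Carlo processing of individual particles
    while (!stack.IsEmpty()) {
        auto particle = stack.GetNextParticle();
        DoStep(particle);  // transport, continuous and discrete processes
    }
    // solving the cascade equations may in turn put particles back on the stack
    cascadeEquations.Solve(stack);
}
```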
The transport procedure needs to handle geometric propagation of neutral and charged particles, and thus, magnetic and electric deflections are important. The transport step length is used to distinguish two types of processes:
Continuous processes occur on scales much smaller than the transport step length (e.g., ionization), and thus an effective treatment can be used.
Discrete processes typically lead to the disappearance of a particle and to production of new particles (typically in, but not limited to, collisions or decays).
The optimal size of the simulation step is determined from the list of all processes considered. The discrete process with the highest cross section limits the maximum step size. However, a continuous process can also limit the step size, for example through the requirement that the ionization energy loss, the multiple-scattering angle, or the number of emitted Cherenkov photons must not exceed specific limits. Furthermore, particle transport itself is just a specific type of process, one which propagates particles. Since the propagation can lead a particle from one medium (e.g., the atmosphere) into another (e.g., ice), particle transport can also limit the maximum allowed step length: an individual step cannot cross from one medium into another but, for correct treatment, must terminate at the boundary between the two media. Finally, particle transport in magnetic fields leads to deflections, where the step size has to be adjusted according to the curvature of the trajectory.
Thus, the geometric particle transport must be the first process to be executed. The information about the particle trajectory is important input for the calculation of the subsequent continuous processes. Finally, the type and probability of a single discrete process is determined last for each simulated transport step; the simulated discrete process is randomly selected, typically according to its cross section or lifetime. The structure of the code executed in one simulation step is thus as sketched below.
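Again in illustrative pseudo-code, with all names being placeholders:

```cpp
// Illustrative pseudo-code for one simulation step.
auto trajectory = Transport(particle);           // 1. geometric transport, respecting
                                                 //    all step-size limits and boundaries
for (auto& process : continuousProcesses) {
    process.DoContinuous(particle, trajectory);  // 2. continuous processes along the step
}
// 3. at most one discrete process, selected according to cross section or lifetime
SelectDiscreteProcess(particle).DoDiscrete(particle, stack);
```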
The numerical solution of the CE is performed in a way that is functionally fully equivalent to normal propagation. While some of the processes can easily be formulated using migration matrices, our aim is to scientifically evaluate and exploit the concept as extensively as possible, covering the production of Cherenkov photons, radio emission, etc. The data for the CE are stored in a table (which, in general, will cover multiple dimensions) representing histograms, for example, of the number of particles of a specific type versus energy. The migration of particles to different energy bins and to different particle types is described by pre-computed migration matrices. The matrices implicitly already encode the information on the geometric length of the simulation steps. In some respects, the CE approach corresponds to the approximation in which the discrete processes are handled like continuous processes. This is reflected in the structure of the corresponding code, sketched below:
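```cpp
// Illustrative pseudo-code for one cascade-equation step; all names are
// placeholders. Discrete processes act like continuous ones here, as
// pre-computed migration matrices applied to binned particle tables.
for (auto& matrix : migrationMatrices) {
    table = matrix.Apply(table);  // migrate particles between energy bins and types
}
output.Fill(table);               // histogram-like output; the step length is
                                  // implicitly encoded in the matrices
```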

The limits of the applicability of CE to specific processes are not known precisely at this moment, and there are certainly various challenges ahead. Particularly difficult are processes that depend significantly on geometry, like Cherenkov or radio emission. Detailed studies will have to evaluate their performance and adapt the methods to potential (limited) use cases. This will be a subject of research within the project.
Radio
Radio emission calculations, which, in the original CORSIKA, are provided by the CoREAS extension [46], rely on the position and timing information of charged particles to calculate the electromagnetic radiation associated with a particle shower. With its increased flexibility, ngC will enable radio emission calculations for a much larger range of problems. In particular, simulation of the radiation associated with showers penetrating from air into a dense medium or vice versa will become possible due to the more generic configuration of the interaction media. Feedback of the radio calculation to the cascade simulation (e.g., modifying simulation step sizes or possibly thinning levels) might increase performance and/or simulation accuracy. GPU parallelization has the potential to greatly reduce computation times, which are currently the main bottleneck for simulations of signals in dense antenna arrays. Simulations in media with a sizable refractive-index gradient will require certain ray-tracing functionalities, possibly even finite-difference time-domain calculations. The modular approach of ngC will allow the implementation of different radio emission calculation formalisms and enable systematic studies of their differences.
Environment
Traditionally, the medium of transport for CORSIKA was the Earth's atmosphere. It is one of the purposes of ngC to allow for much more flexible combinations of environments, including water, ice, mountains, the moon, planets, stars, space, etc. In this case, the interfaces between different media also become a matter of significance for the simulation: showers can start in one medium and subsequently traverse into different media. The environment will be a dedicated object to be configured for every physics application. The structure of the environment will be defined before compilation, while the properties of the environment can be configured via configuration files in any way needed for the application; they can be either static or time-dependent.
The global reference frame is specified by the user and depends on the chosen environmental model. For a standard curved Earth, this is the center-of-the-Earth frame. With double-precision floating-point numbers, this yields a precision better than a nanometer over distances of more than 10,000 km.
Particles are tracked in the global reference frame. The secondary particles produced by discrete processes occurring at specific locations in the cascade are transformed and boosted back into the global coordinate frame.
For specific purposes, like tabulations and some approximations, the shower coordinate system, in which the z-axis points along the primary-particle momentum, can also be relevant.
The initial randomization of primary-particle locations and directions is performed by dedicated modules, which can be changed and configured by the users to get, on the detector level, the desired distributions. The environment object provides all of the required access to the environmental parameters, e.g., roughly in the following form:
```cpp
Environment::GetVolumeId(point)
Environment::GetVolumeBoundary(trajectory)
Environment::GetTargetParticle(point)
Environment::GetDensity(point)
Environment::GetIntegratedDensity(trajectory)
Environment::GetRefractiveIndex(point)
Environment::GetTemperature(point)
Environment::GetHumidity(point)
Environment::GetMagneticField(point)
Environment::GetElectricField(point)
```
This interface is sufficient since, for example, a concept like altitude, defined as the distance from a point to a surface along a direct line to the origin (the center of the Earth), is needed only internally within the environment object.
The environment object will use a C++ policy-based design to provide access to the underlying models. This requires re-compilation after changes in the model setup; however, individual models can still be configured at run time.
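A minimal sketch of such a policy-driven environment (all types, names, and models are illustrative, not the ngC interface):

```cpp
#include <cmath>

// Illustrative sketch of policy-based environment composition.
struct Point { double x, y, z; };

struct ExponentialAtmosphere {
    double scaleHeight = 8000.0;  // m, configurable at run time
    double Density(Point const& p) const {
        return 1.225 * std::exp(-p.z / scaleHeight);  // kg/m^3 at sea level
    }
};

struct UniformMagneticField {
    double bx = 0, by = 0, bz = 50e-6;  // T, configurable at run time
    Point Field(Point const&) const { return {bx, by, bz}; }  // Point used as vector here
};

// The policies are fixed at compile time; exchanging them requires re-compilation.
template <typename DensityModel, typename MagneticFieldModel>
class Environment {
    DensityModel density_;
    MagneticFieldModel bField_;
  public:
    double GetDensity(Point const& p) const { return density_.Density(p); }
    Point GetMagneticField(Point const& p) const { return bField_.Field(p); }
};

using AirShowerEnvironment = Environment<ExponentialAtmosphere, UniformMagneticField>;
```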
Geometric Objects
We will keep the geometry description as simple as possible and only at the level needed to achieve the physics goals. At the moment, these goals include the ability to define different (typically very large) environmental regions with distinct properties. Initially, it is sufficient to provide only the simplest shapes, e.g., sphere, cuboid, cylinder, and maybe trapezoid as well as pyramid. The geometry package must be structured in a generic way so that it can be extended, if needed, with more complex and fine-grained objects at a later time. We are not planning to support a general-purpose geometry as, for example, in Geant4 [28]. When a very complex geometry is required in a specific volume of the simulation, the best choice is probably to allow seamless integration of ngC with Geant4, where particles can be passed on from one package to the other.