Towards extracting cosmic magnetic field structures from cosmic-ray arrival directions

We present a novel method to search for structures of coherently aligned patterns in ultra-high energy cosmic-ray arrival directions simultaneously across the entire sky. This method can be used to obtain information on the Galactic magnetic field, in particular the integrated component perpendicular to the line of sight, from cosmic-ray data only. Using a likelihood-ratio approach, neighboring cosmic rays are related by rotatable, elliptically shaped density distributions and the significance of their alignment with respect to circular distributions is evaluated. In this way, a vector field tangential to the celestial sphere is fitted which approximates the local deflections in cosmic magnetic fields if significant deflection structures are detected. The sensitivity of the method is evaluated on the basis of astrophysical simulations of the ultra-high energy cosmic-ray sky, where a discriminative power between isotropic and signal-induced scenarios is found.


Introduction
It is generally assumed that ultra-high energy cosmic rays (UHECRs) of extragalactic origin are deflected in the Galactic magnetic field (GMF). This assumption for deflections is, on one hand, based on astronomical measurements of Faraday rotation and synchrotron radiation, which indicate magnetic fields of micro-Gauss strengths [1]. On the other hand, measurements of the atmospheric depth of cosmic rays can be explained by a composition of light to medium-heavy nuclei with charge numbers Z ≥ 1 [2,3,4]. Together, these measurements predict deflections of the nuclei of several tens of degrees within the galaxy compared to their original extragalactic directions [5,6,7,8,9,10,11]. In previous analyses that aimed to verify such deflections of cosmic rays, local regions of arrival were examined for energy ordering, but no scientific evidence for particle deflections was found for any region [12,13,14]. Recently, we introduced a fit method that determines the most probable extragalactic source directions by inverting the deflections that are caused by a specific GMF model and fitting a particle charge for each cosmic-ray event [15]. The several thousand free parameters are fitted using the backpropagation method developed for neural network training [16].
In this work, we present a novel approach in which all cosmic-ray arrival directions are simultaneously examined for alignment structures without relying on a certain GMF model [17]. The method is independent of energy ordering and analyzes only the arrival directions above a minimum energy threshold. In order to quantify coherent directional deflections, elliptically shaped regions are employed whose orientation is optimized by the frequency of neighboring particles (cf. Fig. 1). Coherence of adjacent ellipses is realized by means of a spherical harmonic expansion which assigns the local orientations of the set of ellipses.
The method is formulated as a likelihood ratio where for each cosmic-ray arrival direction, it is checked whether the cosmic ray is part of a deflection pattern or rather a particle of isotropic arrival directions. As a null hypothesis, circular regions are used instead of elliptical regions to distinguish the effects of coherent deflections from overdensities. The likelihood ratio is employed as the objective function for adjusting the spherical harmonic functions that specify magnetic field deflections. Thus, the test statistics of all measured particles are used to answer the question whether coherent deflections exist in the cosmic-ray arrival directions. If the answer to this question is positive, the orientations of the ellipses indicate the directional deflections caused by the GMF. This novel approach is hereafter referred to as: COherent Magnetic Pattern Alignment in a Structure Search (COMPASS).
The work is structured as follows: First, the analysis strategy is presented, covering the tangent vector field, the definition of the likelihood ratio, and its normalization. Two benchmark simulations are then introduced: one features simplified patterns of point sources to demonstrate the proof of concept and the other one is an advanced astrophysical simulation where UHECR nuclei from uniformly distributed sources are attenuated during propagation in the extragalactic universe. The ability to reconstruct the coherent directional deflections of the GMF and the advantages of using a circular reference model in the likelihood ratio are demonstrated in the following two chapters. Finally, the sensitivity of the method is investigated for both simulations, the simplified patterns, and the astrophysical universe.

Analysis strategy
The objective of the COMPASS method is to find alignment patterns in UHECR arrival directions simultaneously across the entire sky. In this approach, an adjustable vector fieldû(ϑ, ϕ) tangential to the local celestial sphere determines the orientation of elliptically shaped probability density functions (PDFs) that are centered on each cosmic-ray arrival direction. Here, ϑ and ϕ denote the polar angle and azimuthal angle, respectively, in a spherical coordinate system. The likelihood that the distribution of neighboring arrival directions is better described by an elliptical PDF than by a background hypothesis is then evaluated to optimize the orientation of the ellipses' major axes for all cosmic-ray events in one single step. In this way, the vector fieldû(ϑ, ϕ) locally aligns with elongated structures which are expected to occur from UHECR deflections in the GMF 1 . Technically, a likelihood ratio (cf. equation (14)) serves as an objective function in a minimization based on gradient descent. Additional constraints within the analysis can be accounted for by adding a corresponding penalization term to the objective function.
The basic concept of the COMPASS method is demonstrated in Fig. 1 where the initial state of the system is shown in the left panel and the fitted state in the right panel. Here, the initialization of the vector fieldû(ϑ, ϕ) is equal to the local unit vectorê ϑ . The upper panel shows the ellipsoidal PDF (green) and a corresponding Gaussian background PDF (red). One can see that the orientation of the ellipse has changed after the fit where an alignment with a prominent pattern originating from the red marked source is clearly visible. Additionally, the lower panel indicates the orientation of the adjusted vector fieldû(ϑ, ϕ) in the vicinity of the pattern. The orientation has changed considerably only for the cosmic-ray events which are part of the pattern, whereas the ellipses of most of the isotropic events have not changed substantially during the fit. This finding is also visualized by the color-coded likelihood ratio where high values are found only for events that are part of the pattern.
The method requires a high number of fit parameters, both for the parameterization of the vector fieldû(ϑ, ϕ) and the UHECR model for the likelihood ratio. For the analyzed simulated data set of UHECRs with energies above 40 EeV, the number of free fit parameters is of the order of O(1000). Our method uses the software package TensorFlow [16] to perform a gradient descent-based optimization in this high dimensional parameter space. To enable the computation of gradients within the scope of the backpropagation technique used in the field of machine learning, all operations of this analysis (cf. following subsections) -spherical harmonics expansion, parameterization of density functions, vector algebra operations, likelihood ratio -were written with the TensorFlow API.

Tangent vector field
A particular challenge is the parameterization of the tangent vector fieldû(ϑ, ϕ) which is meant to describe the orientation of deflection patterns caused by the GMF. The large-scale component of the GMF is most likely responsible for patterns of coherent deflection [18,19]. Thus, the orientation of patterns is expected to vary only slightly within a local domain of the sky.
Here, the adaptable vector field is first realized by a constant vector fieldû 0 (ϑ, ϕ) which serves as an initialization and is then modified locally by an angle Ψ (ϑ, ϕ). To preserve the local coherence of the resulting vector fieldû(ϑ, ϕ), the modification angle is parameterized by a spherical harmonics expansion of order k: where Y m (ϑ, ϕ) are the spherical harmonics functions and a m represent a set of free fit parameters to model any continuous differentiable function on the surface of the sphere.
Rapid variations on small angular scales can be suppressed by demanding the order k of the spherical harmonics expansion to have an upper limit of k = 5. Typical examples for this value k used in the analysis are k = 4 and k = 5 which yield 25 and 36 free fit parameters, respectively. The resulting modification Ψ for a certain directionr is then described by a rotation ofû 0 around the axisr: where the two arguments of the rotation matrix R are the rotation axis and angle, respectively. Note that here the polar angle ϑ is defined as being consistent with the Galactic latitude b; thus, in Cartesian coordinatesr is given by: For the initialized vector fieldû 0 (r), three approaches were investigated in this work. The hairy ball theorem states that there exists no nonvanishing continuous tangent vector field on the surface of the three-dimensional sphere [20]. Thus,û 0 always exhibits at least one region on the sphere where the vector field either radially diverges at a certain point or where it circularly curls around it. The following three initializations were used: -JF12 GMF: An intuitive approach is to initialize the fit with the best guess of the pattern orientations, e.g. the predictions of the currently most reliable GMF model, namely that developed by Jansson & Farrar [21] (JF12). Here, to obtain the local direction of deflection in the direction ofr, a magnetically highly rigid particle of 10 20 eV is backtracked, leaving the Galaxy in directionr . Then, the local tangent vector field is defined asû 0 (r) =r × (r ×r )/C where C is determined by the normalization according to ||û 0 || = 1. -Galactic meridians: Here, the local tangent vector u 0 (r) is equal to the local spherical unit vectorê ϑ in the Galactic coordinate system. The advantage of this initialization is that it is independent of a certain GMF model, while an overall symmetry with respect to the Galactic plane is still maintained. Additionally, certain models favor a general deflection preference towards the Galactic plane [22], which is approximately realized in this case. -Equatorial meridians: In analogy to the Galactic meridians, here the initialization is equal to the unit vectorê θ in the Equatorial coordinate system. This initialization has the advantage that one of the two points of divergence is located in the blind region of a ground-based Observatory. Here, it is used only as a crosscheck for the fit reliability.
A visualization of the three tangent vector field initializationsû 0 is presented in Fig. 2. Regions with curls or divergences of the vector field can be seen in all three initializations. At these locations, the analysis exhibits a decreased sensitivity to find locally aligned structures as the underlying vector fieldû 0 cannot describe them. For the initialization of JF12 GMF, two of these features For the JF12 GMF initialization -depending on the reliability of the model predictions -it may be beneficial for the sensitivity to include a penalization term for large model deviations in the objective function. This penalization can be achieved by limiting the integrated squared amplitude of Ψ over the entire celestial sphere as: As a potential improvement in scenarios where many directions exhibit alignment patterns, the tangent vector fieldû may be directly defined in the form of vector spherical harmonics (VSH) [23]. In this way, positions of curls and divergences of the vector field can be shifted dynamically over the sky during the fit. Thus, they will likely stall in sky regions without noteworthy contributions to the likelihood ratio, i.e. in regions without prominent alignment patterns.

Maximum likelihood ratio
The COMPASS method evaluates the distribution of arrival directions around each cosmic-ray event i in order to search for the existence of an elongated structure. Here, the likelihood is defined in analogy to the approach in [24]: the total UHECR sky model consists of a sum of a signal part E(r) S i (r) (with the contribution |f i |) which captures elongated patterns and a purely isotropic part E(r) which represents the geometrical exposure of the observatory [25]. This likelihood log(L S i ) is then compared to a suitable reference model by calculating the likelihood ratio. The fundamental difference with respect to the approach in [24] is that each cosmic-ray event i provides a separate density function S i (r) including a fit parameter f i which describes the contribution of the respective event to the log-likelihood as: where N tot is the total number of events in the data set andΘ j the unit vector of the arrival direction of cosmic ray j. Here, the signal and background contributions, E(r) S i (r) and E(r), are normalized over the surface A of the sphere: The set of f i represents a total number of N tot free fit parameters which are initialized with a value close to zero. Thus, the total number of free parameters of the COMPASS method is n fit = (k+1) 2 +N tot , where the (k+ 1) 2 part comes from the spherical harmonics coefficients a m .
The signal part S i (r) of equation (5) is constructed as an elliptically shaped density function which is centered at cosmic-ray directionΘ i . The major axis is aligned with the local direction of the tangent vector fieldû i ≡ u(ϑ i , ϕ i ). Since the GMF is not well known, there is no accurate mathematical description for the expected shape and size of a deflection pattern. Here, the density function S i is parameterized on the basis of a non-symmetrical Gaussian distribution where the width follows an ellipse equation as for all directionsr located in the same hemisphere as cosmic ray i, i.e.Θ i ·r ≥ 0. In the opposite hemisphere of the sky the density function is set to zero. The hyperparameters δ max and δ min denote the angular extent in the direction of the ellipses' semi-major and semi-minor axes, respectively. C denotes a normalization factor which is investigated in section 2.3.
For every cosmic ray i, the minimal distance δ ⊥ i,j between the directionΘ j of the neighboring cosmic ray j and the orthodrome ζ, as defined byΘ i andû i , is given by the relation: SinceΘ i andû i are unit vectors, the term ||Θ i ×û i || is equal to one. Thus, the numerator of the second term in the exponential function of equation (7) can be identified as the transverse displacement of cosmic-ray directionΘ j relative to the fitted orientationû i . Likewise, the greatcircle distance along the orthodrome ζ -and therefore along the pattern orientation -is given by sin(δ i,j ) = Θ j ·û i . Thus, for small angles, i.e. sin δ i,j ≈ δ i,j , equation (7) can be identified as a two-dimensional Gaussian distribution on the sphere where contour lines of equal function values S i follow an ellipse equation: For the reference model of the likelihood ratio, two approaches were explored in this work: a purely isotropic hypothesis following the geometrical exposure E(r) and a Gaussian reference model with identical signal strength f i (compared to the elliptical ones) to cancel out overdensities.
-Isotropic reference model E(r): The highest sensitivity to reject an isotropic scenario is obtained by testing explicitly against this hypothesis in the likelihood ratio. Thus, each cosmic ray i provides the same log-likelihood contribution: -Gaussian reference model G i (r): Here, the reference model is provided by equation (7) where both the major axis and minor axis radii are set to √ δ max × δ min with respect to the ellipse dimensions of the signal distribution. By choosing the geometric average of both dimensions, the effective solid angle of the symmetric For the signal and Gaussian reference parts, the cosmic-ray directionΘi is centered at Galactic coordinates ϑ = 0 • and ϕ = 0 • and the ellipse geometry is δmax = 30 • and δmin = 10 • where the major axis is aligned with theê ϑ unit vector. The geometrical exposure is given by the geometry of the Pierre Auger Observatory with a maximum zenith angle of 80 • . reference model is unchanged and equation (7) can be written as a symmetric Gaussian-like distribution: In the log-likelihood ratio, the same value for the contribution f i as in equation (5) is chosen to evaluate solely the difference between an elliptically shaped and a symmetrical pattern: During the TensorFlow fit, the gradient of f i is computed only with respect to the signal contribution in equation (5) in order to prevent active adaption of the reference model R.
Examples of the normalized probability density function of the elliptically shaped signal part S(r), the symmetric reference part G(r), and the geometrical exposure E(r) are visualized in Fig. 3.
For the objective function of the fit, each cosmic ray contributes with a separate log-likelihood ratio ts i as: For an isotropic arrival distribution, each individual ts i follows a χ 2 distribution with the degrees of freedom corresponding to the free fit parameters according to Wilks' theorem [26]. Therefore, the corresponding sum of all individual test-statistic contributions is Gaussian distributed according to the central limit theorem. The latter statement holds only if the individual contributions are independent of each other, which is not entirely the case in this application since the density functions of neighboring cosmic rays overlap. Nevertheless, it has been explicitly checked in Monte Carlo simulations of isotropic arrival distributions that the summed test statistics i ts i approximately follows a Gaussian distribution. Thus, the average test statistic ts provides a well-defined metric to evaluate the pattern alignment over the entire sphere: To maximize equation (14), the negative average test statistic is chosen as the objective function for the gradient descent. Additionally, in the case of the JF12 GMF initialization of the tangent vector field, the objective term F from equation (4) is added. Thus, the total objective function J exhibits one hyperparameter λ F which represents the confidence in the GMF model: For the gradient descent, the fit parameters f i and a m are adapted simultaneously by calculating the derivative of  the objective function J with respect to the parameters. As an optimizer for the minimization problem, RMSProp [27] is used, which supports an efficient optimization for adaptive parameters of different magnitudes. This is realized by allowing each adaptive parameter to have a separate step size which is increased (decreased) depending on a consistent (inconsistent) direction of the gradient with respect to the parameter in two consecutive update steps. For stability reasons, the optimizer was complemented by additional conditions for the learning rate adaption.

Normalization
The normalization factor C in equation (7) is generally determined by equation (6) and requires a numerical integration for a non-uniform geometrical exposure E(r).
Since the exposure E(r) = E(δ(r)) depends only on the equatorial declination δ [25], the surface integral of the term E(δ) S i (r) depends only on the center directionΘ i of the ellipse and its relative orientation |α i | towards the local spherical unit vectorê δ in equatorial coordinates. Asû i determines the orientation of the ellipse S i (r), the angle α i is defined by cos α i =û i ·ê δ . The exposure-weighted integral for an ellipse with dimensions (δ max , δ min ) = (30 • , 10 • ) is visualized as a function of the equatorial declination δ of the center direction and for different values of α i in Fig. 4. It can be seen that the inverse normalization factor 1/C approaches the probability density function value of the exposure E(δ) for intermediate equatorial declinations −60 • < δ < 20 • . In the border regions of the geometrical exposure, the integral is larger than the respective exposure values due to the extent of the elliptical distribution. For the gradient descent, the normalization factor was calculated for a grid of equatorial declinations δ and orientations α and then interpolated linearly between the grid points.

Benchmark simulations
The sensitivity of the COMPASS fit method is evaluated in two different benchmark simulations. Proof of concept is given on the basis of a simple four-source model where each source contributes an equal number N s of cosmic rays. The second benchmark extends an astrophysical simulation of an extragalactic source population [28] by deflections in the GMF and the observational exposure E(r). Therefore, given a certain source density, it provides the most reasonable estimate of the sensitivity. Both simulations mimic the current data set of the Pierre Auger Observatory for anisotropy studies (e.g. [17]) above energies of 40 EeV with a total event number of N tot = 1119 and zenith angles up to 80 • .

Benchmark 1: Distinct source scenario
The first benchmark simulation consists of UHECRs with energies E that follow the parameterized power law from [29] above an energy threshold of 40 EeV. The nuclear charges are assumed to be energy-independent and uniformly distributed between Z = 1 and Z = 8. While N s out of the N tot = 1119 UHECR events are assigned to each of four different, randomly placed point-like sources in the sky, the remaining cosmic rays are distributed isotropically, following the geometrical exposure E(r) of the experiment.
The deflections in the GMF are simulated as follows: First, the cosmic rays with magnetic rigidity R = E/Z are propagated through the large-scale component of the JF12 model using a magnetic field lens [6,30]. Next, a rigiditydependent Gaussian smearing of δ = 0.5 × Z/E [EeV] rad is applied which corresponds approximately to the median scattering angle in the JF12 random and striated fields. A visualization of the arrival directions of this step is shown for N s = 20 source events as black circles in the upper panel of Fig. 5.
If the JF12 GMF initialization is chosen in the fit method, we additionally apply a shift of arrival directions to mimic the uncertainties in the GMF model. Here, the entire pattern of source m is modified by a spherical anglẽ Ψ m performed by a rotation of the individual cosmic-ray arrival directionsΘ i from source m around its direction r m by the angleΨ m . The construction of this rotation is sketched by the dotted line in the top panel of Fig. 5. For the four sources, spherical anglesΨ m are selected in good accordance with uncertainties between existing GMF models [18], as examplesΨ 1 = −30 • ,Ψ 2 = 0 • ,Ψ 3 = +45 • andΨ 4 = +15 • in order of the ascending Galactic longitude l (from right to left in the upper panel of Fig. 5). The colored symbols indicate the resulting arrival directions after this displacement. All arrival directions, consisting of 80 source events and 1039 isotropic events, are shown in

Benchmark 2: Astrophysical simulation
The second benchmark simulation is based on results obtained in a combined fit of the UHECR observables at the Pierre Auger Observatory [31] and their anisotropy implications for given source densities following [28]. Here, source candidates are uniformly distributed in the universe following a source density ρ S which results in anisotropies due to attenuation effects during the propagation.
The deflection in the GMF is applied in the same way as for the benchmark 1 scenario by using the JF12 model and a rigidity-dependent Gaussian smearing of δ = 0.5 × Z/E [EeV] rad. The relative arrival probability for different extragalactic directions and rigidities caused by the GMF (e.g. [22]) are accounted for. The relative observation probability resulting from the geometrical exposure of the observatory is likewise accounted for. Fig. 6 shows the resulting arrival directions of the benchmark 2 simulation for a source density of ρ S = 10 −2 Mpc −3 . The circular symbols denote cosmic-ray arrival directions, the color scale corresponds to the nuclear charge Z, and the gray shaded events originate from sources which contribute at least three cosmic rays. One can see that patterns may occur in multiple regions of the sky with strongly varying event contributions attributable mainly to the source dis- Fig. 6. Example of benchmark 2 scenario consisting of an astrophysical simulation including deflection in the JF12 model for the GMF. Cosmic rays originate from uniformly distributed sources of density ρS = 10 −2 Mpc −3 and are attenuated by extragalactic photon fields. Gray shaded events denote cosmic rays which originate from a source with at least three contributing events. For this specific source distribution the strongest source contributes 30 cosmic rays, which corresponds to the median value of 1000 simulated universes for this source density. The star symbols denote source directions where the size is proportional to the cosmic-ray contribution. The skymap is shown in the Galactic coordinate system. tance. Some sources situated outside the visible sky of the observatory (e.g. source at Galactic coordinates l ≈ 110 • and b ≈ 50 • ) still contribute a substantial fraction of cosmic rays due to coherent deflection in the GMF.
Additionally, if the tangent vector field is initialized as JF12 GMF, again, an uncertainty angleΨ for the GMF is simulated. To conserve consistent deflection patterns of sources in similar directions of the sky, the uncertaintỹ Ψ (r) =Ψ ar ·d i is modeled as a dipolar function with amplitudeΨ a = 45 • and random direction of the dipole maximumd i for each simulated universe i.
This simulation of the UHECR universe exhibits only one single free parameter, the source density ρ S , which directly determines the degree of anisotropy in the arrival directions. The higher the source density, the more sources are within a horizon where attenuations do not play an important role and, therefore, the more isotropic the sky is.

Reconstruction of the Galactic magnetic field
In this section, proof of concept is provided by showing that the orientation of patterns can be correctly reconstructed based on the benchmark 1 simulation from section 3.
During the minimization process, the modification angle Ψ (ϑ, ϕ) rotates the ellipses of the signal hypothesis of equation (1) such that they align with elongated patterns in the cosmic-ray arrival direction distribution. For the JF12 GMF initialized vector fieldû 0 , the angle Ψ (ϑ, ϕ) corresponds directly to a correction of the JF12 model in Fig. 7. Visualization of the modification angles Ψ (ϑ, ϕ) (colorcoded) which is defined relative to the JF12 model (JF12 GMF initialization forû0) after a fit to benchmark 1 scenario with Ns = 20 source events. Gray circles denote cosmic rays that originate from one of the four sources and the dotted lines mark contours of 15 • -spacing in Ψ . sky regions where a significant pattern is found. Hence, for the benchmark 1 simulation, the final angle Ψ i ≡ Ψ (ϑ i , ϕ i ) of cosmic rays i which originate from one of the simulated sources m is expected to approach the simulated uncertaintyΨ m . Here, the necessary correction is modeled by the spherical anglesΨ m for the four sources wherẽ . section 3). Note that a non-linear deflection behavior in the GMF may disturb the correct values of Ψ m . This effect is particularly strong for cosmic rays with a low rigidity E i /Z i , i.e. for high absolute deflection angles with respect to their source.
Here, for the first application of the fit, the order of the spherical harmonics expansion of equation (1) is defined as k = 5, which corresponds to 36 free fit parameters. In this case, modifications of the GMF model can be performed coherently in sky regions that have angular scales above the order of 180 • /k = 36 • . The degree of modification Ψ itself is constrained by the hyperparameter in the objective function (15) where a value of λ F = 1 is chosen for this purpose. For the ellipse geometry, values of (δ max , δ min ) = (10 • , 5 • ) are chosen in equation (7). Furthermore, the Gaussian reference model from equation (12) was selected for the likelihood ratio where the effective Gaussian width is √ 10 • × 5 • ≈ 7.1 • . The fitted modification function Ψ (θ, ϕ) is visualized in Fig. 7 together with the cosmic rays that originate from the simulated source candidates. As an overall impression the color code in the vicinity of the source candidates m agrees with the simulated uncertaintiesΨ m . To quantify the method's reconstruction abilities, for each source m the fitted Ψ i for the 10 closest cosmic rays that originate from the source are averaged. The corresponding averaged values are Ψ 1 = (−32 ± 1) • , Ψ 2 = (6 ± 2) • , Ψ 3 = (+47 ± 5) • and Ψ 4 = (+15 ± 5) • , which are in good agreement with the simulated uncertaintiesΨ m . The next step is to investigate if orientations of patterns as simulated with the JF12 model can also be cap- Fig. 8. Visualization of the tangent vector fieldû(Θi) (black lines) of equation (2) using the Galactic meridians initialization ofû0(r). Black star symbols show the simulated source directions while the red circular symbols mark the Ns = 20 events per source which have been displaced by the JF12 model without simulated GMF uncertainties.
tured without including information on the explicit GMF model. For this purpose, we chose the Galactic meridians initialization ofû 0 where initial ellipse orientations are aligned with the local spherical unit vectorê ϑ of the latitude in the Galactic coordinate system. Since no information on the simulated GMF is included, the penalization factor of equation (15) is not required and is therefore set to λ F = 0. Thus, the tangent vector field can be rotated by the angle Ψ (θ, ϕ) without constraint. For this setup, the degree of the spherical harmonics expansion was decreased to k = 4 to avoid rapid changes of Ψ on small angular scales. The order k of the spherical harmonics expansion is the most challenging free parameter since the optimal choice depends on the angular scale of domains with a coherent GMF deflection. While more complex patterns can generally be fitted with an increasing order of k, these structures are more difficult to interpret.
A visualization of the fitted tangent vector fieldû(Θ i ) for each individual cosmic-ray arrival direction Θ i is presented in Fig. 8 where the source events are highlighted in red. All four deflection patterns in this simulation were successfully captured by an alignment of the tangent vector fieldû(r) along the local track of source events. There is also structure visible in sky regions without source contribution where fluctuations of isotropically distributed arrival directions were connected along their most prominent patterns. In the sky region at Galactic coordinates (l, b) ≈ (−90 • , −75 • ) the isotropic fluctuation was even strong enough to rotate the initialized tangent vector field u 0 by up to 95 • . This suggests that arbitrarily oriented alignment patterns of cosmic-ray arrival directions can be captured even when oriented orthogonally with respect to the chosen initializationû 0 .
To assess whether a fitted pattern is caused by isotropic fluctuations, the individual cosmic-ray test statistics from equation (13) have to be compared to those found in isotropic skies. These sensitivity studies are presented in section 6.

Reference model of the likelihood ratio
In this section, two different choices of the reference model (cf. section 2) for the likelihood ratio as defined in equation (13) are studied: the isotropic model E which follows the geometrical exposure of an experiment, and the Gaussian model G i with identical signal contributions f i as assigned to the elliptically shaped signal model S i . Again, the ellipse geometry is defined as (δ max , δ min ) = (10 • , 5 • ), the initialization forû 0 is Galactic meridians, the spherical harmonics order is k = 4, and the hyperparameter λ F = 0. To obtain an impression of the performance over the sky, it is useful to investigate the individual test statistics ts i defined in equation (13) as well as the anticipated signal contribution f i from equation (5).

Isotropic reference model E
The resulting test statistics ts i after the fit using the isotropic reference model (cf. equation (10) Clearly, for benchmark 1 scenario in the upper panel, the patterns produced by the four simulated sources exhibit cosmic rays with a substantially larger test statistic ts i compared to the isotropically distributed background events. Accordingly, the anticipated signal contribution f i of the source events is larger, as shown by the size of the markers. Some local clusters of events with test statistics ts i > 0 can also be found in the isotropically distributed background events, however, with considerably smaller values of both the test statistic ts i and the signal contribution f i . The average test statistic from equation (14) is ts ≈ 1.6. The highest signal fractions reach values of about f i = 1.4%, which is in good agreement with the 20 of 1119 injected signal cosmic rays per source. Note that the complete signal contribution of 20/1119 ≈ 1.8% is not necessarily reached even for the innermost cosmic ray of the pattern due to fluctuations in the isotropic background and the ellipse's limited extent of 10 • in the semi-major axis, which is mostly less than the extent of the pattern.
To assess the impact of solely overdense but not elongated structures on the test statistic, a new simulation is studied which again consists of four sources each emitting 20 cosmic rays. Instead of simulating deflections in the GMF, the source events are drawn from a Fisher distribution [32] of a width of 10 • centered on the direction of the source. Here, to enable a better comparison between both scenarios, the source directions were approximately centered within the resulting arrival patterns of the benchmark 1 simulation.
As shown in the lower panel of Fig. 9, the method also responds with high individual test statistics ts i and anticipated signal contributions f i due to the event excess of N s = 20 relative to an isotropic expectation. However, for three of the four Fisher distributions the resulting test statistics are much smaller than in the case of the benchmark 1 simulation.

Gaussian reference model G i
The right panel of Fig. 9 again shows the individual test statistic ts i (color coded) and the fitted signal contributions f i (size of circles) for the Gaussian reference hypothesis G i as defined in equation (12). For the benchmark 1 simulation in the upper panel, the anticipated signal contribution f i is approximately equal to the case where the isotropic reference model E was chosen, as can be estimated from the size of the markers. However, while in the case of the isotropic reference model both the event excess and the elongation of the structure contributed to the test statistic, for the Gaussian reference model only the latter information can be used. Thus, on the one hand, the overall scale of the individual test statistics ts i is much smaller, as reflected by the color scale. Therefore, the average test statistic of equation (14) drops to ts ≈ 0.8. On the other hand, since there is no sensitivity to overdense regions, some of the patterns that were caused by fluctuations of isotropically distributed background events are no longer visible. Thus, the purity of detected patterns is increased compared to when the isotropic reference model was used.
Again, the response to Gaussian overdensities is assessed in the bottom panel of Fig. 9 with four Fisherdistributed event clusters of 10 • Gaussian width. Since the Gaussian-shaped event structures are well described by the Gaussian reference hypothesis G i , there is a significant loss in the test statistic for the overdense sky regions compared to when the isotropic reference E is used. In the vicinity of three Gaussian event clusters, there is only barely more fitted signal contribution f i compared to the remaining sky. The individual test statistics ts i visibly deviates from natural isotropic fluctuations only for the events of one of the Gaussian clusters, namely at coordinates (l, b) = (−20 • , −20 • ).
As there are already known event excesses in UHECR data, e.g. in data of the Pierre Auger Observatory for this energy threshold (e.g. [24,33]), there is a risk of detecting these features again rather than new elongated structures when using the isotropic reference model E. Therefore, in the following we use the Gaussian-like reference model G i where the effects of overdensities are mostly canceled out by the likelihood ratio.

Sensitivity studies
In this section we investigate the sensitivity of the COM-PASS method with respect to its ability to reject isotropic distributions of cosmic-ray arrival directions. According to the findings from section 5, for the following subsections the Gaussian reference model G i (cf. equation (12)) is chosen in the likelihood ratio. In addition, following section 4 and the studies in section 6.3, the tangent vector fieldû 0 is initialized along the Galactic meridians -i.e.û 0 is equal to the local spherical unit vectorê θ . Therefore, the penalization term F in equation (15) is removed by setting λ F = 0. For a comparison of the sensitivity with a more classical analysis to search for elongated structures refer to [34].

Sensitivity for distinct source scenario
The distribution of individual test statistics ts i as obtained in the benchmark 1 simulation from section 3 is presented in Fig. 10. As already suggested by the upper right panel of Fig. 9, most of the events that exhibit high test statistics are attributed to one of the four sources. In total, more than half of the source events show test statistics larger than 3, which, in turn, is only reached for about 5% of the isotropic events. Instead, the isotropic distribution peaks close to zero, with about 50% of events exhibiting test statistics smaller than 10 −5 . One can of course find a statistical measure to reject isotropy based on the evaluation of events with a high test statistic, i.e. based on the tail of the distribution in Fig. 10. However, it was found that the average test statistic ts provides the most stable measure for various simulation setups.
In the next step, we evaluated the average test statistic ts for different numbers of source events N s = 5, 10, 15, 20 in the benchmark 1 simulation. As shown in Fig. 11, the resulting values for the average test statistics are To calculate the chance probability p val of obtaining these average test statistics from an isotropic arrival-direction distribution, the method is additionally applied to 5 × 10 4 isotropic realizations of the sky which follow the geometrical exposure of the observatory. The distribution of the average test statistics ts iso for isotropic skies is shown as a gray histogram in Fig. 11. While there is no isotropic sky yielding a higher average test statistic than the scenarios with N s = 20 source events, the isotropic chance probabilities for the simulations with smaller source events are p val = 0.40, 0.14, 0.003, in order of increasing N s . As expected from the central limit theorem (cf. section 2), the gray histogram shows that the average test statistic ts iso for an isotropic arrival sky approximately follows a Gaussian distribution. Thus, the sensitivity for the scenario shown in Fig. 5 with N s = 20 source events can be estimated by fitting a Gaussian distribution to the gray histogram. In this case, the estimated chance probability is about 3 × 10 −11 which translates to about 6.5 σ standard deviations in the normal distribution.

Sensitivity for the astrophysical universe
While the previous section provided an idea of the sensitivity for comparably clear patterns with a certain signal contribution, this section evaluates the expected implication for an astrophysical universe of uniformly distributed UHECR sources. For the benchmark 2 simulations, 300 simulated universes for each of the source densities ρ S = (10 −1 , 3 × 10 −2 , 10 −2 , 3 × 10 −3 ) Mpc −3 were investigated. The average test-statistic distribution as obtained from the fit exhibits a comparably large spread, which is consistent with the fluctuations in the degree of anisotropy. The median and 68 percentiles of the average test statistics for the four source densities are ts = 0.27 +0. 16 −0.08 , 0.38 +0.39 −0.11 , 0.54 +0.31 −0.22 , 0.86 +1.92 −0.30 , respectively, as visualized in Fig. 12. As expected, the test statistic increases with decreasing source density as the arrival scenarios become increasingly anisotropic. For the isotropic chance probability p val , the average test statistic is again compared to the fit results for the 5×10 4 isotropic realizations which are shown as a gray histogram in Fig. 12. For the source densities of 10 −1 Mpc −3 and 3·10 −2 Mpc −3 the isotropic chance probability can be directly determined by the fraction of the gray distribution that is above the corresponding test-statistic values. Here, the chance probabilities yield p val = 0.229, 0.013 for the two source densities respectively. For the smaller source densities of 10 −2 Mpc −3 and 3 × 10 −3 Mpc −3 , the isotropic chance probability can again be estimated by parameterizing the null hypothesis with a Gaussian distribution. In this case, the estimated values are 8.3 × 10 −6 and 1.4 × 10 −17 , respectively, which correspond to a deviation of 4.3 σ and 8.5 σ standard deviations in the normal distribution. Thus, in the case of a result on data that is compatible with an isotropic distribution, the density of UHECR sources for this astrophysical model can be estimated. Fig. 13 shows arrival directions in the sky region around the strongest individual test statistic ts i for the scenario that exhibits the median average test statistic ts out of the 300 simulations with a source density of 10 −2 Mpc −3 . Here, the strongest individual test statistic is ts i = 8.6 and the corresponding cosmic-ray event (yellow point in the center of the sky patch) is part of the pattern from the source at Galactic coordinates l ≈ −133 • and b ≈ 14 • . Contributing with a total of 29 events, this is the strongest source in this realization. The orientation of the tangent vector fieldû(ϑ, ϕ) (short black lines) additionally suggests that the alignment works reasonably well even for patterns that are only separated by about 20 • . Thus, the vector fieldû(ϑ, ϕ) is expected to provide an adequate coherent description of the deflection in the GMF for sufficiently strong signals.

Optimization of free parameters
In this subsection we evaluate the impact of the free parameters more profoundly based on the astrophysical benchmark 2 simulation with a source density of ρ s = 3 × 10 −2 Mpc −3 from section 3. Firstly, the initialization method of the tangent vector fieldû 0 and accordingly the free parameter λ F are addressed. Secondly, the impact of the ellipse geometry, namely the semi-major and semiminor axes, on the performance is studied.

Confidence in the JF12 GMF initialization
The initialization of the tangent vector fieldû 0 according to the predictions from the JF12 model (JF12 GMF ) is visualized in the top panel of Fig. 2. As pointed out in section 2, depending on the reliability of the GMF model it may be beneficial to constrain the allowed deviations Ψ (ϑ, ϕ) with equation (4), since this reduces high test statistics from fluctuations in isotropic arrival distributions. Therefore, the isotropic chance probability p val is investigated as a function of the free objective parameter λ F in equation (15) for two reasonable estimates of the uncertainties in GMF models.
The first estimate is obtained by simulating the deflection in the GMF with the model of Pshirkov et al. using an antisymmetric disk field (PT11-ASS) [35] instead of the JF12 model used in section 3. The second estimate is given by a modification of the JF12 model with dipolar distributed modification angles of amplitude Ψ a = 45 • as described in section 3. The isotropic chance probabilities for both estimates are presented in Fig. 14 as a function of λ F . For high values of O(λ F ) = 10, the tangent vector field is too stiff and the test statistic is therefore obtained for ellipses which are not aligned with the simulated structures. There is a minimum for both assumptions of GMF uncertainties located consistently at values about λ F ≈ 0.5. For an entirely flexible tangent vector field, i.e. for the parameter λ F = 0, the additionally found patterns from isotropic skies reduce the sensitivity slightly; however, the overall isotropic chance probability is still of the same order.
Since the uncertainties of the GMF models might be even higher than assumed here and particularly uncertain in the Galactic disk region, the advantage of a hyperparameter λ F > 0 may be even smaller. Therefore, in the following ellipse geometry investigation the penalization term F is canceled in equation (15) and the Galactic meridians initialization is utilized which features a symmetry with respect to the Galactic disk.

Ellipse geometry
Here, we assess the impact of the ellipse geometry, the semi-major axis width δ max , and the semi-minor axis width δ min , for the astrophysical benchmark 2 simulation. For the two-dimensional scan of the widths, angular bins of (3 • , 5 • , 7 • , 10 • , 15 • , 20 • ) were chosen where the condition δ max > δ min is required by design. Thus, there are 15 different scanned ellipse geometries. To calculate the Fig. 15. Scan of the ellipse geometry (δmax, δmin) for the Galactic meridians initialization and λF = 0 evaluated on the astrophysical benchmark 2 simulation with source density of ρS = 3 × 10 −2 Mpc −3 . The isotropic chance probability is calculated with a total of 10 4 isotropic skies for each of the ellipse constellations.
isotropic chance probability p val , the analysis is applied to a total of 10 4 isotropic skies for each geometry.
The resulting median chance probabilities p val of 300 sky realizations are displayed in Fig. 15 where each of the five segments indicate one of the semi-major axes δ max . Generally, larger and less elongated ellipse sizes are beneficial for the sensitivity of the COMPASS method. Consequently, the largest ellipse with values of δ max = 20 • and δ min = 15 • for the semi-major and semi-minor axes, respectively, yields the lowest isotropic chance probability of p val = 2.8 × 10 −3 . This result is significantly better than the previously considered ellipse geometry of (δ max , δ min ) = (10 • , 5 • ), which exhibits a chance probability of p val = 1.2 × 10 −2 in the same benchmark scenario. However, the specific behavior of the sensitivity for the various ellipse geometries may be characteristic for the simulation setup. Since the angular scales of existing structures are unknown for an application to data, it is suggested to scan the ellipse geometry in a reasonable range.

Conclusion
In this work we investigated a novel approach to search for structures in the arrival directions of UHECRs induced by cosmic magnetic fields. A dynamic vector field tangential to the local celestial sphere is utilized to fit the orientation of elongated patterns. Thus, elliptically shaped density functions are aligned by the vector field and evaluated in a likelihood ratio with a circular reference model. This work demonstrates that the orientation of the directional deflections of the GMF is detectable by faint signatures of simulated UHECR sources. The sensitivity of the method was investigated by means of an astrophysical simulation of uniformly distributed sources where UHECR nuclei are attenuated during propagation in the extragalactic universe. It was shown that the hypothesis of isotropically distributed arrival directions can be excluded with more than 4 σ Gaussian significance if a maximum spatial density of UHECR sources of ρ S = 10 −2 Mpc −3 is assumed.