1 Introduction

The Kalman filter [2] is a versatile algorithm with wide applications in various fields (see, for example, [3–11]). In 1987, Frühwirth [12] demonstrated its application to track fitting problems in high energy physics experiments for the first time. Since then, many experiments have adopted this tool for track fitting (for example, [13, 14]) and various authors have contributed to different aspects of the algorithm (for example, [15, 16]). The problem is to estimate the charges, momenta, directions etc. of the observed particles from the measurements performed along their tracks.

These parameters are combined to form a state vector. Usually, a Kalman filter based program (estimator) deduces the near-optimal values of the elements of the state vector iteratively, from the weighted averages of the predicted and the measured particle positions at the sensitive detector elements. In general, the prediction is based on some analytical (or numerical) solution to the equation of motion of a charged particle passing through a dense material and magnetic field (see Ch. 3 of [13], or [16], for instance). However, the prediction represents only the deterministic aspect of the particle motion. The motion of the particle is also affected by random processes like multiple Coulomb scattering [17] and energy loss fluctuations [18]. These are stochastic perturbations to the deterministic motion of the particle, the latter being controlled by the magnetic field and the average energy loss. The estimator must take the random fluctuations into account appropriately, because the precision of the filter estimate depends crucially on the proper treatment of these random processes. Clearly, when the charged particle passes through thick layers of dense materials, the effects of such fluctuations are greater.

Fig. 1 (a) ICAL detector geometry and (b) magnetic field map shown in the central module. The same field pattern exists in the side modules as well. Figures taken from [19]

This situation arises in track fitting in the Iron CALorimeter (ICAL) experiment, which is an upcoming neutrino oscillation experiment under the India-based Neutrino Observatory (INO) project [20]. It comprises a 50 kiloton magnetized iron calorimeter detector of dimension \(48\,\mathrm{m}\times 16\,\mathrm{m}\times 14.4\,\mathrm{m}\), divided into three identical modules, as seen in Fig. 1a. The sensitive detector elements are \(2\,\mathrm{m}\times 2\,\mathrm{m}\) resistive plate chambers (RPCs), placed horizontally and sandwiched between 5.6 cm thick plates of iron. The RPCs are planes of constant z coordinate and any two consecutive RPCs are separated by a vertical distance of 9.6 cm. The iron plates are magnetized with current coils which generate up to \(1.5\,\mathrm{T}\) of magnetic field (Fig. 1b). ICAL will try to resolve the neutrino mass hierarchy by observing the earth matter effect on neutrino oscillation. The experiment is best suited to measuring the properties of the muons coming from the charged current interactions of muon-neutrinos. These muons travel through the different layers of detector materials and leave electronic signals at the RPC planes. The position measurements obtained from these signals are used for track fitting. Since ICAL will observe atmospheric neutrinos of a wide energy range (\(E_\nu \in [1,15]\,\mathrm{GeV}\)) coming from all directions, it is clear that a major fraction of muon tracks will be strongly affected by multiple scattering while crossing the thick horizontal layers of iron at various angles. The thicknesses and the radiation lengths of the dense materials that the muons have to pass through within the ICAL detector are shown in the following table:

| Materials | Iron | RPC-glass | Graphite | Copper | Aluminium |
|---|---|---|---|---|---|
| Thickness (cm) | 5.6 | 0.3 | 3.00002e-03 | 9.99999e-03 | 0.0150001 |
| Rad. length (cm) | 1.75667 | 11.6285 | 19.2293 | 1.43516 | 8.87889 |

The width of the scattering angle distribution is inversely related to the particle momentum and to the radiation length of the material it is passing through [21]. So, the muons will be subjected to a significant amount of multiple scattering inside iron. The effect will clearly be more pronounced at lower energy. It is also important to note that the flux of atmospheric neutrinos is much higher at lower energy [22]. Thus, the ICAL track fitting program must account for the random effects in a proper fashion.
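As a rough numerical illustration of this dependence, the sketch below evaluates the widely used Highland parametrization of the multiple scattering width, \(\theta _0=\frac{13.6\,\mathrm{MeV}}{\beta cp}\,z\sqrt{x/X_0}\,[1+0.038\ln (x/X_0)]\). That this is the form used in [21] is an assumption; the function name, the muon mass constant and the chosen momenta are illustrative only, and normal incidence on the plate is assumed.

```python
import math

MUON_MASS_GEV = 0.1056584  # muon mass in GeV/c^2


def highland_theta0(p_gev, thickness_cm, rad_length_cm, charge=1.0):
    """RMS projected scattering angle (rad), Highland parametrization.

    Assumes normal incidence; for an inclined track the path length
    through the plate (and hence the angle) would be larger.
    """
    energy = math.hypot(p_gev, MUON_MASS_GEV)
    beta = p_gev / energy
    t = thickness_cm / rad_length_cm  # thickness in radiation lengths
    return (0.0136 / (beta * p_gev)) * abs(charge) * math.sqrt(t) * (
        1.0 + 0.038 * math.log(t)
    )


# (thickness, radiation length) in cm, as listed in the table above
materials = {"Iron": (5.6, 1.75667), "RPC-glass": (0.3, 11.6285)}

for p in (1.0, 5.0, 15.0):  # illustrative muon momenta in GeV/c
    for name, (x, x0) in materials.items():
        theta0 = highland_theta0(p, x, x0)
        print(f"p = {p:5.1f} GeV/c  {name:10s} theta0 = {1e3 * theta0:.2f} mrad")
```

The iron plate dominates because it is both the thickest layer and among those with the shortest radiation length, and the width falls roughly as \(1/p\).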

Let us consider the state vector \(\mathbf{x}=(x,y,t_x,t_y,q/p)^T\), which is used in many experiments like INO–ICAL [19, 20, 22, 23], MINOS [24–26], LHCb [27–30] with forward detector geometry. Since the Kalman prediction is performed along an approximate particle trajectory, it introduces some deterministic uncertainties (dependent on magnetic field, average energy loss etc.) to the elements of the state vector. The random processes introduce additional uncertainties to these elements. These uncertainties are accounted for in an error covariance matrix \(C=\langle ({\mathbf{{x}}}-{\bar{\mathbf{{x}}}})({\mathbf{{x}}} -{\bar{\mathbf{{x}}}})^T\rangle \), where \({\bar{\mathbf{{x}}}}\) contains the true values of the elements of the state vector. The total error matrix propagated from a point l to the next point \(l+dl\) along the track is given by:

$$\begin{aligned} C_{l+dl}=FC_lF^T+Q \end{aligned}$$
(1)

where F denotes the Kalman propagator matrix, encoding the deterministic factors between l and \(l+dl\). F propagates the errors of the track parameters, represented by the C matrix, deterministically from l to \(l+dl\). On the other hand, the matrix Q represents the error contributions from all the random processes to the total error C at \(l+dl\). However, between two measurement sites, separated by some distance, the track fitting program should be sensitive to the possible variations of track parameters (momenta, direction etc.) and also to the possible variations of ambient parameters (materials, magnetic field components etc.). Then, one must apply Eq. (1) repeatedly, in small tracking steps, while approaching the next measurement site. Thus, the effective propagator matrix becomes \(F=\prod _{j=1}^{N}F_j\) between two measurement sites [19]. Hence, the total propagated error at the next measurement site equals the sum of (a) the matrix representing deterministic error propagation \((\prod _{j=1}^{N} F_j)\ C_{l_0}\ (\prod _{j=1}^{N} F_j)^T\) and (b) the sum of the matrices of the deterministically propagated random uncertainties in all the tracking steps. It can be shown from Eq. (1) that the latter term becomes equal to (Eq. (3.16) of [13]):

$$\begin{aligned} \sum _{m_s=1}^{N}F_{m_s,k}Q_{m_s}F_{m_s,k}^T \end{aligned}$$
(2)

where \(F_{m_s,k}\) denotes the product of \(F_j\)s between \(m_s\)-th step and the final step. That is, to propagate the random uncertainties of a ‘deeper’ layer, a longer ‘chain’ of \(F_j\)s is required.
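The equivalence between repeated application of Eq. (1) and the closed form (deterministic term plus Eq. (2)) can be checked numerically. The sketch below is only an illustration: the step propagators and step noise matrices are random placeholders, not ICAL quantities, and \(F_{m_s,k}\) is taken to be the product of the \(F_j\) of all steps after the \(m_s\)-th one, which is an assumed reading of the index convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 4  # state dimension, number of tracking steps

# Hypothetical step propagators F_j (close to identity) and
# hypothetical symmetric, positive semi-definite step noise Q_j.
F = [np.eye(n) + 0.05 * rng.standard_normal((n, n)) for _ in range(N)]
Qs = []
for _ in range(N):
    B = 0.01 * rng.standard_normal((n, n))
    Qs.append(B @ B.T)
C0 = np.diag([1.0, 1.0, 0.1, 0.1, 0.01])  # placeholder initial covariance

# Repeated application of Eq. (1), one tracking step at a time.
C = C0.copy()
for Fj, Qj in zip(F, Qs):
    C = Fj @ C @ Fj.T + Qj


def chain(first, last):
    """Ordered product F[last-1] ... F[first] (identity if the range is empty)."""
    M = np.eye(n)
    for j in range(first, last):
        M = F[j] @ M
    return M


# Closed form: deterministic propagation of C0 plus the sum of Eq. (2).
F_eff = chain(0, N)
noise = sum(chain(m + 1, N) @ Qs[m] @ chain(m + 1, N).T for m in range(N))
C_closed = F_eff @ C0 @ F_eff.T + noise

print("max |difference| =", np.max(np.abs(C - C_closed)))
```

The noise injected in the earliest step is multiplied by the longest chain of \(F_j\), which is the 'deeper layer, longer chain' statement above.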

The variances of the position, angle and momentum elements of the state vector, arising from multiple scattering and energy loss fluctuation in a thin layer of dense material, have been investigated by various authors [17, 21, 31–34]. However, when the passage of a particle through a thick layer of dense material is considered, one has to use effective variances and covariances, valid in the thick scatterer limit. These terms are obtained from a thorough study of Eq. (2) (see Appendix B of [1], written by Mankel). The author takes a simple form of the Kalman propagator matrix (F) and obtains a set of 10 coupled linear ordinary differential equations. The solutions to these equations correspond to the elements of the random noise matrix in the thick scatterer limit.

However, the result of this work is not general in two respects: (1) The propagator matrix has been assumed to be constant and very simple in form (see Sect. 2.3). This results in a simple analytical form of the elements of the random noise matrix Q (Eq. (12)). However, in many experiments, the Kalman propagator matrix may evolve significantly from iteration to iteration and may have a quite non-trivial form (for example, in the ICAL track fitting program [19]). Naturally, in these cases, one needs to find a more appropriate form of the random noise matrix. (2) This work [1] concerns only the \(4\times 4\) block of the random noise matrix that corresponds to the position and the angular elements, which directly suffer from multiple scattering. The functional forms of the variance and covariance terms of \(q/p\) with the other state vector elements, which are affected by the fluctuations in energy loss, are not considered in this work.

The purpose of this paper is to derive the appropriate functional form of all the elements of the random process noise matrix for a curved track in a magnetic field in the thick scatterer limit. We shall take a non-trivial and evolving propagator matrix for this purpose and ascertain what difference it makes to the track fitting performance. Even if the modification does not yield significant improvements in the track fitting performance, this exercise serves two purposes: (a) it completes the problem from a mathematical point of view and (b) it confirms that Mankel’s approximate solutions are good enough. To the best of the authors' knowledge, no previous work has addressed these two issues.

The problem will be formulated mathematically in Sect. 2. The desired elements of the random noise matrix will be seen to be solutions of a matrix differential equation. Then, we will describe two methods of obtaining its solutions in Sect. 3. Of these methods, the first one (decoupling a set of coupled linear ODEs) is practical for implementation and will be used in the ICAL track fitting program in the presence of the magnetic field. The details of the implementation technique will be discussed in Sect. 4. The relevant details of the software support will be covered in Appendix C. The reconstruction performance will be shown in Sect. 5. We will conclude with a discussion of the merits and demerits of the approach in Sect. 6.

2 Mathematical formalism

In the deterministic propagation of the random uncertainties, the Kalman propagator matrix F transports these uncertainties from l to \(l+dl\). The total random uncertainty matrix at \(l+dl\) has another term, coming from the random uncertainties introduced to the direction and the momentum of the particle by multiple scattering and energy loss fluctuations in the material between l and \(l+dl\). We call this term \(\delta Q\). The overall process noise matrix Q at \(l+dl\) is given by:

$$\begin{aligned} Q(l+dl)=F Q(l) F^T+\delta Q \end{aligned}$$
(3)

In Eq. (3), F is the \(5\times 5\) propagator matrix for the Kalman filter. It can be written as [15, Eq. (24)]:

$$\begin{aligned} F=I+s\ {F' dl} =I+s\ { \begin{pmatrix} \cdots &{} \quad \cdots \\ \cdots &{} \quad \cdots \end{pmatrix} dl} \end{aligned}$$
(4)

where \(s=+1(-1)\) when the direction of propagation increases (decreases) the z coordinate while the tracking is carried out and I denotes the identity matrix. The elements of the \(F'\) matrix (i.e. the dots within the parentheses of the matrix in Eq. (4)) are the track length derivatives of the elements of the residual propagator matrix \((F-I)\). These quantify the additional uncertainties introduced by the presence of the magnetic field etc. to the existing uncertainties at l [15, pp. 10–12]. Concrete examples of the elements can be found in [13, 15, 19]. We shall see that the nature of

$$\begin{aligned} F'\equiv \begin{pmatrix} \cdots &{} \quad \cdots \\ \cdots &{} \quad \cdots \end{pmatrix} \end{aligned}$$

in Eq. (4) determines the functional forms of the elements of Q matrix.

2.1 Some comments on \(\delta Q\)

Since this uncertainty originates from a very small step of length dl, it may be assumed that the scattering took place in a plane of infinitesimal thickness. The elastic scattering in the Coulomb field of the nuclei of the dense detector material brings about a sudden change in the particle direction at the plane of scattering. However, the particle position does not change laterally at that plane. Also, the magnitude of the momentum of the particle hardly changes, as the energy imparted to these heavy nuclei is practically negligible [35, p. 20]. If, instead of \(q/p\), \(q/p_T\) is chosen to be a state element, where \(p_T\) denotes the transverse momentum, it will change at the plane where the particle undergoes the scattering [1, p. 9]. So, multiple scattering introduces uncertainty only in the particle direction and it is parametrized by two orthogonal angles \(\theta _1\) and \(\theta _2\), defined with respect to the particle direction. On the other hand, the fluctuation in the energy loss arises from the uncertainty in the collision rate with the atomic electrons when a high energy particle passes through a dense material. The physical mechanism of ionization hardly changes the particle direction but does change the magnitude of the momentum. The fluctuation, therefore, is independent of the multiple scattering angles, but dependent on the particle momentum p. Now, the covariance between the \(m^{\mathrm{th}}\) and \(n^{\mathrm{th}}\) elements of the state vector is given by:

$$\begin{aligned} c(\mathbf{r}_m,\mathbf{r}_n)=\sum _i{\frac{\partial \mathbf{r}_m}{\partial \xi _i}}{\frac{\partial \mathbf{r}_n}{\partial \xi _i}}\sigma ^2(\xi _i) \end{aligned}$$
(5)

In Eq. (5), \(\xi _i\) denotes any variable representing fluctuation due to the random processes (thus, \(\xi _i=\theta _1\), \(\theta _2\) or p) and \(\sigma (\xi _i)\) is the width of that fluctuation. Since \(\theta _1\), \(\theta _2\) and the particle momentum p are independent parameters, one does not need to calculate the covariance terms between \((\xi _i,\xi _j)\) for \(i\ne j\). Then, for the chosen state vector \((x,y,t_x,t_y,q/p)^T\), the corresponding covariance elements may be calculated (for point scattering). All covariances with the position coordinates (x or y) are zero, according to our assumption that there is no lateral shift of the particle position in the infinitesimal plane of scattering. The covariances \(c(t_x,q/p)\) and \(c(t_y,q/p)\) also vanish, because (taking \(c(t_x,q/p)\) as an example):

$$\begin{aligned} c(t_x,q/p)= & {} \frac{\partial t_x}{\partial \theta _1}\frac{\partial (q/p)}{\partial \theta _1}\sigma ^2(\theta _1)+\frac{\partial t_x}{\partial \theta _2}\frac{\partial (q/p)}{\partial \theta _2}\sigma ^2(\theta _2) \nonumber \\&+ \; \frac{\partial t_x}{\partial p}\frac{\partial (q/p)}{\partial p}\sigma ^2(p) \end{aligned}$$
(6)

Now, change of direction due to multiple scattering does not change p and change of momentum due to energy loss fluctuation does not change direction \(t_x\) or \(t_y\). As a result, \(\frac{\partial (q/p)}{ \partial \theta _{1}}=\frac{\partial (q/p)}{\partial \theta _{2}}=\frac{\partial (t_x)}{\partial {p}}=\frac{ \partial (t_y)}{\partial p}=0\). Thus, over a tracking step length dl, the integrated random uncertainty matrix is given by:

$$\begin{aligned} \delta Q = \begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad c(t_x,t_x) &{}\quad c(t_x,t_y) &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad c(t_y,t_x) &{}\quad c(t_y,t_y) &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad c(q/p,q/p) \end{pmatrix}dl \end{aligned}$$
(7)

The nonzero variance and covariance elements of \(t_x\) and \(t_y\) are known in terms of the rms errors of the scattering angles [21, 34]. The calculation of \(c(q/p,q/p)\) is available from [15].

2.2 Formulating the problem

In this section, we formulate the problem in the same way as indicated in Appendix B of [1]. However, we also take into account the effect of energy loss fluctuation on the \(q/p\) element of the state vector. At a track length l, the random noise matrix Q(l) is given by:

$$\begin{aligned} Q(l)= \begin{pmatrix} Q_{11}(l) &{}\quad Q_{12}(l) &{}\quad Q_{13}(l) &{} \quad Q_{14}(l) &{} \quad Q_{15}(l) \\ \cdots &{}\quad Q_{22}(l) &{}\quad Q_{23}(l) &{}\quad Q_{24}(l) &{}\quad Q_{25}(l) \\ \cdots &{} \quad \cdots &{}\quad Q_{33}(l) &{}\quad Q_{34}(l) &{}\quad Q_{35}(l) \\ \cdots &{} \quad \cdots &{}\quad \cdots &{}\quad Q_{44}(l) &{} \quad Q_{45}(l) \\ \cdots &{} \quad \cdots &{}\quad \cdots &{} \quad \cdots &{} \quad Q_{55}(l) \end{pmatrix} \end{aligned}$$
(8)

where, in Eq. (8), the symmetric counterparts of the elements of the real symmetric matrix Q have been replaced by dots. This shows that there are exactly fifteen independent elements of the process noise matrix that need to be determined. If the propagator matrix F deviates from the identity matrix I by a matrix \(F'\ s\ dl\) (see Eq. (4)), then we can say:

$$\begin{aligned} Q(l+dl)&\approx Q(l)+Q'(l)dl \nonumber \\&=\left( I+F'\ s\ dl\right) \ Q(l)\ \left( I+F'\ s\ dl\right) ^T+\delta Q\nonumber \\&\approx Q(l)+s(F'\ Q(l)+(F'\ Q(l))^T)\ dl+O(dl^{2})+\delta Q \end{aligned}$$
(9)

From Eq. (9), one can deduce the differential equation of process noise Q:

$$\begin{aligned} \frac{dQ}{dl}=s\left( F'\ Q(l)+(F'\ Q(l))^T\right) +\delta Q/dl \end{aligned}$$
(10)

We note that \(\frac{dQ}{dl}\), \(\delta {Q/dl}\) and \(\left( F'\ Q(l)+(F'\ Q(l))^T\right) \) in Eq. (10) are real symmetric matrices. This equation encodes a system of 15 coupled linear ODEs corresponding to the 15 independent elements of Q. The matrix \(\left( F'\ Q(l)+(F'\ Q(l))^T\right) \) has been calculated with the help of Mathematica [36], assuming all the elements of \(F'\) are nonzero. In fact, some elements of \(F'\) were found to be rather large (of the order of one or more), depending upon the tracking directions and momenta. The functional form of every element of Q is the solution of the set of independent equations in Eq. (10).

2.3 Mankel’s solution

In his work, Mankel [1] used a \(4\times 4\) block of random noise matrix whose elements were covariance terms of position and angular coordinates. The corresponding \(4\times 4\) block of the propagator matrix was given by:

$$\begin{aligned} F= I_{4\times 4} + \begin{pmatrix} 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 \end{pmatrix} s\ dl \end{aligned}$$
(11)

That is, except for \(F'_{13}=F'_{24}=1\), Mankel took all the other elements of the \(F'\) matrix to be zero. In that case, the random noise matrix has 10 independent elements. Thus, 10 linear coupled ODEs are obtained. The simple form of the propagator (Eq. (11)) keeps the forms of the coupled equations simple. They can be easily solved and the resulting process noise matrix becomes:

$$\begin{aligned} Q(l)= \begin{pmatrix} c(t_x,t_x)\frac{l^3}{3} &{}\quad c(t_x,t_y)\frac{l^3}{3} &{}\quad c(t_x,t_x)s\frac{l^2}{2} &{} \quad c(t_x,t_y)s\frac{l^2}{2}\\ \cdots &{}\quad c(t_y,t_y)\frac{l^3}{3} &{}\quad c(t_x,t_y)s\frac{l^2}{2} &{} \quad c(t_y,t_y)s\frac{l^2}{2} \\ \cdots &{} \quad \cdots &{} \quad c(t_x,t_x)l &{} \quad c(t_x,t_y)l \\ \cdots &{} \quad \cdots &{} \quad \cdots &{} \quad c(t_y,t_y)l \end{pmatrix} \end{aligned}$$
(12)

where the symmetric counterparts are replaced by dots.
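Equation (12) can be checked by integrating Eq. (10) numerically with the propagator of Eq. (11) and \(Q(0)=0\). The sketch below does this with a classical fourth-order Runge-Kutta scheme; the integrator, the function names and the numerical values of the scattering covariances are illustrative assumptions, not part of Mankel's derivation.

```python
import numpy as np


def mankel_Q(cxx, cxy, cyy, l, s=1):
    """Closed form of Eq. (12) for the (x, y, t_x, t_y) block."""
    c = np.array([[cxx, cxy], [cxy, cyy]])
    return np.block([[c * l**3 / 3, s * c * l**2 / 2],
                     [s * c * l**2 / 2, c * l]])


def integrate_Q(Fp, dQ_dl, l, s=1, steps=10):
    """RK4 integration of Eq. (10) from Q(0) = 0 with constant F' and dQ/dl."""
    def rhs(Q):
        M = Fp @ Q
        return s * (M + M.T) + dQ_dl

    Q = np.zeros_like(dQ_dl)
    h = l / steps
    for _ in range(steps):
        k1 = rhs(Q)
        k2 = rhs(Q + 0.5 * h * k1)
        k3 = rhs(Q + 0.5 * h * k2)
        k4 = rhs(Q + h * k3)
        Q = Q + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return Q


# F' of Eq. (11): only F'_13 = F'_24 = 1 are nonzero.
Fp = np.zeros((4, 4))
Fp[0, 2] = Fp[1, 3] = 1.0

# Hypothetical scattering covariances per unit track length.
cxx, cxy, cyy = 2.0e-4, 0.5e-4, 3.0e-4
dQ_dl = np.zeros((4, 4))
dQ_dl[2:, 2:] = [[cxx, cxy], [cxy, cyy]]

length = 5.6  # cm, thickness of one iron plate
for s in (+1, -1):
    diff = integrate_Q(Fp, dQ_dl, length, s) - mankel_Q(cxx, cxy, cyy, length, s)
    print(f"s = {s:+d}: max |numerical - Eq. (12)| = {np.max(np.abs(diff)):.2e}")
```

The chain is easy to follow by hand as well: \(Q_{33}'=c(t_x,t_x)\) gives \(Q_{33}=c(t_x,t_x)\,l\), then \(Q_{13}'=sQ_{33}\) gives the \(l^2/2\) term, and \(Q_{11}'=2sQ_{13}\) gives the \(l^3/3\) term since \(s^2=1\).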

3 Solution of the problem

The matrix solution Eq. (12) is not valid in general, when all the elements of the propagator matrix are nonzero. From Eq. (10), if the matrix connecting the fifteen independent elements of Q (i.e. \(Q_{11}\) to \(Q_{55}\)) to their derivatives is given as \(\mathbf{A}_{15\times 15}\), then we can write:

$$\begin{aligned} \begin{pmatrix} \frac{dQ_{11}}{dl}\\ \frac{dQ_{12}}{dl}\\ \cdots \\ \cdots \\ \frac{dQ_{55}}{dl} \end{pmatrix}= & {} s \begin{pmatrix} A_{11} &{} \quad A_{12} &{} \quad \cdots &{} \quad A_{1n} \\ A_{21} &{} \quad A_{22} &{} \quad \cdots &{} \quad A_{2n} \\ \cdots &{} \quad \cdots &{} \quad \cdots &{} \quad \cdots \\ \cdots &{} \quad \cdots &{} \quad \cdots &{} \quad \cdots \\ A_{n1} &{} \quad A_{n2} &{} \quad \cdots &{} \quad A_{nn} \end{pmatrix}_{15\times 15} \nonumber \\&\times \; \begin{pmatrix} Q_{11}\\ Q_{12}\\ \cdots \\ \cdots \\ Q_{55} \end{pmatrix} + \begin{pmatrix} \delta Q_{11}/dl\\ \delta Q_{12}/dl\\ \cdots \\ \cdots \\ \delta Q_{55}/dl \end{pmatrix} \end{aligned}$$
(13)

The matrix \(\mathbf{A}_{15\times 15}\) is real but not symmetric. From Appendix A, it is seen that 110 of its 225 elements are zero. Further simplification arises from the fact that only 4 of the 15 elements of the \(\delta Q/dl\) vector are nonzero. Hence, Eq. (13) can be succinctly written as:

$$\begin{aligned} \frac{d\mathbf{q}}{dl}=s\mathbf{A}{} \mathbf{q}+\delta \mathbf{q} \end{aligned}$$
(14)

where \(\mathbf{q}\) is a column vector of the fifteen independent elements of the Q matrix \((Q_{11}, Q_{12},\ldots ,Q_{55})\) and \(\delta \mathbf{q}\) denotes the vector of the corresponding elements of the \(\delta Q\) matrix (see Appendix A). Within the step of length dl, the elements of \(\mathbf{A}\) remain unchanged, as they are obtained from the propagator matrix for that step. Hence, the problem is to solve a non-homogeneous system of coupled linear differential equations with constant coefficients. Now, we shall investigate different approaches for solving this initial value problem and discuss their merits and demerits.
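The matrix \(\mathbf{A}\) need not be written out by hand: since the map \(Q\mapsto F'Q+(F'Q)^T\) is linear, its columns can be generated by applying it to basis matrices. The sketch below does this numerically, assuming a row-major ordering of the upper triangle for \(\mathbf{q}\) \((Q_{11},Q_{12},\ldots ,Q_{15},Q_{22},\ldots ,Q_{55})\). The helper names, the random \(F'\) and the numerical entries of \(\delta Q/dl\) are hypothetical placeholders and not taken from Appendix A.

```python
import numpy as np

IU = np.triu_indices(5)  # row-major upper triangle: Q11, Q12, ..., Q15, Q22, ..., Q55


def vech(Q):
    """Stack the 15 independent elements of a symmetric 5x5 matrix."""
    return Q[IU]


def unvech(q):
    """Rebuild the symmetric 5x5 matrix from its 15 independent elements."""
    Q = np.zeros((5, 5))
    Q[IU] = q
    return Q + np.triu(Q, 1).T


def build_A(Fp):
    """15x15 matrix A with vech(F'Q + (F'Q)^T) = A vech(Q), cf. Eq. (13)."""
    A = np.zeros((15, 15))
    for k in range(15):
        e = np.zeros(15)
        e[k] = 1.0
        M = Fp @ unvech(e)
        A[:, k] = vech(M + M.T)
    return A


rng = np.random.default_rng(3)
Fp = rng.standard_normal((5, 5))  # placeholder F' with all elements nonzero
A = build_A(Fp)
print("zero elements of A:", A.size - np.count_nonzero(A))

# delta Q / dl of Eq. (7) with hypothetical numerical entries.
dQ_dl = np.zeros((5, 5))
dQ_dl[2:4, 2:4] = [[2.0e-4, 0.5e-4], [0.5e-4, 3.0e-4]]
dQ_dl[4, 4] = 1.0e-6
print("nonzero elements of delta q:", np.count_nonzero(vech(dQ_dl)))
```

Building \(\mathbf{A}\) column by column avoids transcription errors in the 225 entries, at the price of hiding the explicit dependence on the \(F'\) elements that a symbolic calculation provides.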

3.1 Solution by decoupling

The most elegant method to solve Eq. (14) is to decouple the equations by diagonalizing \(\mathbf{A }\). If \(\mathbf{A}\) is diagonalizable (i.e. \(\mathbf{A}=PDP^{-1}\)) with an invertible P and a diagonal D , the system of equations can be decoupled through the substitution \(\mathbf{q}=P\mathbf{u}\). In that case, Eq. (14) reduces to:

$$\begin{aligned} P\frac{d\mathbf{u}}{dl}&=sPDP^{-1}(P\mathbf{u})+\delta \mathbf{q} \nonumber \\ \frac{d\mathbf{u}}{dl}&=sD\mathbf{u} + P^{-1}\delta \mathbf{q} \end{aligned}$$
(15)

Here P is the matrix of the eigenvectors of \(\mathbf{A}\); the corresponding eigenvalues are the diagonal elements of the matrix D. As \(\mathbf{A}\) is not necessarily real symmetric, the eigenvalues can be complex numbers as well and \(\mathbf{A}\) may not be diagonalizable at all in some cases. However, when it is diagonalizable, we can easily solve Eq. (15) for \(\mathbf{u}\) from the fact that the jth component of the equation is just a first order linear ODE:

$$\begin{aligned} \frac{du_j}{dl}=s\lambda _j u_j + (P^{-1}\delta \mathbf{q})_j \end{aligned}$$
(16)

where the set of \(\lbrace \lambda _j\rbrace \) denotes the set of eigenvalues of \(\mathbf{{A}}_{15\times 15}\). Equation (16) can be solved by using the integrating factors and the solution to Eq. (14) becomes:

$$\begin{aligned} q_{i}(l)&=\sum _{j=1}^{15}P_{ij}u_j(l) \nonumber \\&=\sum _{j=1}^{15}P_{ij}\left( e^{s\lambda _jl}u_j(0)+e^{s\lambda _jl}\int _0^l e^{-s\lambda _jl'}(P^{-1}\delta \mathbf{q})_j\ dl'\right) \end{aligned}$$
(17)

We assume that \(P^{-1}\delta \mathbf{q}\) varies very slowly over the small step of length l, so that it may be considered to remain constant while calculating the integral in Eq. (17). Thus, we get:

$$\begin{aligned} q_{i}(l)&\approx \sum _{j=1}^{15}P_{ij}\left( e^{s\lambda _jl}u_j(0)+e^{s\lambda _jl}(P^{-1}\delta \mathbf{q})_j\int _0^l e^{-s\lambda _jl'}\ dl'\right) \nonumber \\&=\sum _{j=1}^{15}P_{ij}\left[ e^{s\lambda _jl}u_j(0)+e^{s\lambda _jl}(P^{-1}\delta \mathbf{q})_j\left( \frac{1-e^{-s\lambda _jl}}{s\lambda _j}\right) \right] \nonumber \\&=\sum _{j=1}^{15}P_{ij}\left[ e^{s\lambda _jl}u_j(0)+\frac{(P^{-1}\delta \mathbf{q})_j}{s\lambda _j}(e^{s\lambda _jl}-1)\right] \end{aligned}$$
(18)

In Eq. (18), there are 15 unknown coefficients \(u_j(0)\) that must be deduced from the initial conditions. The initial condition is that at \(l=0\), all random noise errors are zero. We see that for \(l=0\), Eq. (18) reduces to:

$$\begin{aligned} q_{i}(0)&=\sum _{j=1}^{15} P_{ij}u_j(0)=0 \end{aligned}$$
(19)

Since P is invertible, Eq. (19) is possible only if all the \(u_j(0)\) are individually zero. Thus, we have:

$$\begin{aligned} q_{i}(l)=\sum _{j=1}^{15}P_{ij}\frac{(P^{-1}\delta \mathbf{q})_j}{s\lambda _j}(e^{s\lambda _jl}-1) \end{aligned}$$
(20)

When \(\mathbf{A}\) is diagonalizable, the only difficulty of implementation is the occurrence of complex numbers in the result. In this case, we simply take the real parts of \(q_i(l)\) to form the elements of the random noise matrix. The imaginary parts of \(q_i(l)\) cannot be used, as the imaginary parts of \(q_1(l),q_6(l),q_{10}(l),q_{13}(l),q_{15}(l)\) (which correspond to the diagonal elements of Q, i.e. the variance terms) are found to take negative values frequently. This inconsistency does not occur if the real parts of \(q_i(l)\) are used. As long as the matrix \(\mathbf{{A}}\) is diagonalizable, P is invertible and \(\lambda _j\ne 0\), this method is observed to work. This typically happens inside the magnetized iron plates of the ICAL detector. However, outside iron, the conditions are not satisfied (\(Det(\mathbf{A})\rightarrow 0\), one or more \(\lambda _j\) are zero etc.). As a result, Eq. (20) cannot be used there.
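A minimal numerical sketch of Eq. (20) is given below, using NumPy's general eigen-decomposition rather than the it++ routines of the actual implementation. The random matrix standing in for \(\mathbf{A}\), the entries of \(\delta \mathbf{q}\), the zero-eigenvalue tolerance and the function name are all illustrative assumptions. For a real \(\mathbf{A}\) and real \(\delta \mathbf{q}\), complex eigenvalues occur in conjugate pairs, so in exact arithmetic the sum in Eq. (20) is real and any imaginary part left in this sketch is at rounding level.

```python
import numpy as np


def q_by_decoupling(A, dq, l, s=1, tol=1e-12):
    """Evaluate Eq. (20); returns the real and imaginary parts of q(l)."""
    lam, P = np.linalg.eig(A)  # columns of P are the eigenvectors of A
    if np.any(np.abs(lam) < tol):
        raise ValueError("zero eigenvalue: Eq. (20) is not applicable")
    w = np.linalg.solve(P, dq.astype(complex))  # P^{-1} delta q
    gain = (np.exp(s * lam * l) - 1.0) / (s * lam)
    q = P @ (gain * w)
    return q.real, q.imag


rng = np.random.default_rng(7)
A = 0.3 * rng.standard_normal((15, 15))  # placeholder for the 15x15 matrix A
dq = np.zeros(15)
# Positions of Q33, Q34, Q44 and Q55 in the row-major upper-triangle ordering;
# the numerical values are hypothetical.
dq[[9, 10, 12, 14]] = [2.0e-4, 0.5e-4, 3.0e-4, 1.0e-6]

l, s = 1.0, +1
q_re, q_im = q_by_decoupling(A, dq, l, s)
print("largest |Im q_i| =", np.max(np.abs(q_im)))
print("variance terms q1, q6, q10, q13, q15:", q_re[[0, 5, 9, 12, 14]])
```

A generic random matrix does not have the structure of the true \(\mathbf{A}\), so the sign of the variance terms printed here carries no physical meaning. If P is close to singular (a nearly defective \(\mathbf{A}\)), the linear solve becomes ill-conditioned and this sketch should not be trusted.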

3.2 Reconciliation with the process noise matrix derived in [1]

The simple process noise matrix (Eq. (12)) derived in [1] is valid in a region with zero magnetic field where the simple form of the Kalman propagator matrix (Eq. (11)) is valid. On the other hand, Eq. (20) describes the form of every independent element of the process noise matrix in presence of magnetic field. It is not possible to directly reduce \(q_i(l)\) of Eq. (20) to the corresponding elements of Eq. (12) in the absence of magnetic field to check whether the generalization has been consistent, since in that scenario \(Det(\mathbf{A})\) becomes zero (or very close to zero) which prohibits the computations of P matrix and the eigenvalues \(\lambda _j\). However, it is possible to reconcile Eq. (20) with Eq. (12) inside the magnetic field, by checking if the real parts of \(q_i(l)\) are close to the corresponding elements in Eq. (12).

Although there is an exponential dependence in Eq. (20), the exponential can be replaced by its power series in the limit of small step length l. As a result, each term in the summation becomes a power series in l and can be represented as:

$$\begin{aligned} q_i(l)=&\,(P_{i1}(P^{-1}\delta \mathbf{q})_1 + P_{i2}(P^{-1}\delta \mathbf{q})_2 + \cdots ) l \nonumber \\&+(P_{i1}(P^{-1}\delta \mathbf{q})_1\frac{s\lambda _1}{2} + P_{i2}(P^{-1}\delta \mathbf{q})_2\frac{s\lambda _2}{2} + \cdots ) l^2 \nonumber \\&+(P_{i1}(P^{-1}\delta \mathbf{q})_1\frac{(s\lambda _1)^2}{6} + P_{i2}(P^{-1}\delta \mathbf{q})_2\frac{(s\lambda _2)^2}{6} + \cdots ) l^3 \nonumber \\&+ \cdots \end{aligned}$$
(21)

So, one needs to check if the real parts of the coefficients of l, \(l^2\) and \(l^3\) in Eq. (21) are close to the corresponding coefficients in Eq. (12). This exercise has been performed and the results are shown in Appendix B.

3.3 Solution method without diagonalization

In this case, one first needs to solve the homogeneous equation \( \frac{d\mathbf{q}}{dl}=\mathbf{Aq}\) (where the constant factor s is absorbed within the matrix \(\mathbf{A}\)). The solutions for the vector \(\mathbf{q}(l)\) are used to form a fundamental matrix solution M(l) [37], each column of which is independent and satisfies the homogeneous part of Eq. (14). Using the method of variation of parameters, the solution to the non-homogeneous initial value problem:

$$\begin{aligned} \frac{d\mathbf q}{dl}=\mathbf{A\ q}+\delta \mathbf{q},\mathbf{q}(l_0)=\mathbf{q}_0 \end{aligned}$$
(22)

can be given by [37]:

$$\begin{aligned} \mathbf{q}(l)&=M(l)M(l_0)^{-1}\mathbf{q}_0+M(l)\int _{l_0}^{l}M(t)^{-1}\delta \mathbf{q}(t)dt \end{aligned}$$
(23)

When it is possible to find all the eigenvalues and independent eigenvectors of \(\mathbf{A}\), the construction of M(l) is straightforward [38, Ch. 37]. However, matrices are not always diagonalizable. So, it is essential to have an alternative method of deriving M(l) when the calculation of all independent eigenvectors is not possible. This can be achieved by Putzer’s algorithm [37]. The method is elegant in the sense that it does not require all the eigenvalues to be distinct or nonzero. However, in the case of Eq. (14), it is seen that the calculation of M(l), a \(15\times 15 \) matrix, becomes impractically lengthy and, therefore, the method has not been adopted. But if it is possible to compute M(l) by this method, it may be used even outside the magnetized iron plates.
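For constant \(\mathbf{A}\) and constant \(\delta \mathbf{q}\), one purely numerical way to obtain the same quantity without eigenvectors is to take \(M(l)=e^{\mathbf{A}l}\) and evaluate Eq. (23) through an augmented matrix exponential. This is not Putzer's algorithm and is not the method adopted in this work; it is sketched here only as an assumption-laden illustration that a solution exists even when \(\mathbf{A}\) is singular. The Taylor-based exponential and all names are hypothetical helpers; the test system is the \((Q_{11},Q_{13},Q_{33})\) subsystem implied by Eq. (11), with s absorbed in \(\mathbf{A}\).

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2**k  # scaled so that the series converges quickly
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms + 1):
        term = term @ Ms / n
        E = E + term
    for _ in range(k):  # undo the scaling by repeated squaring
        E = E @ E
    return E


def q_by_augmented_expm(A, b, l):
    """Solve dq/dl = A q + b with q(0) = 0 for constant A and b.

    The augmented system z' = [[A, b], [0, 0]] z with z(0) = (0, ..., 0, 1)
    has q(l) as the first n entries of the last column of its exponential.
    """
    n = A.shape[0]
    B = np.zeros((n + 1, n + 1))
    B[:n, :n] = A
    B[:n, n] = b
    return expm_taylor(B * l)[:n, n]


c, length = 2.0e-4, 5.6  # hypothetical c(t_x,t_x) per unit length, step in cm
for s in (+1, -1):
    # Q11' = 2 s Q13, Q13' = s Q33, Q33' = c : a singular (nilpotent) matrix A.
    A = np.array([[0.0, 2.0 * s, 0.0], [0.0, 0.0, s], [0.0, 0.0, 0.0]])
    b = np.array([0.0, 0.0, c])
    print(s, q_by_augmented_expm(A, b, length))
```

In this zero-field test case all eigenvalues of \(\mathbf{A}\) vanish, so Eq. (20) is unusable, while the augmented exponential returns the \(l^3/3\), \(l^2/2\) and \(l\) terms of Eq. (12).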

4 Application to ICAL

In the track fitting program for ICAL [19], thick scatterer approximation has been used previously by implementing Mankel’s form of random noise matrix [1]. Strictly speaking, this form of matrix is valid only if the track segment is linear, since the magnetic field dependent terms (that lead to curvature of the track) are assumed to be zero in the propagator matrix F (Eq. (11)). Therefore, it is a matter of interest to see how the performance of track fitting is affected, when a more appropriate solution (Eq. (20)) is applied to construct the random process noise matrix in the presence of the magnetic field.

This has been carried out through the use of a C++ based computational library it++ [39]. Details of the coding techniques etc. are given in Appendix C. It was seen that in all the cases where all the elements of \(\mathbf{A}\) are non-trivial (which commonly happens within the magnetic field), the determinants of \(\mathbf{A}\) assume large values \((10^0-10^6)\) and the diagonalizations can be carried out quite easily. However, in the regions where the magnetic field is zero (outside the iron slabs in the ICAL detector) or its spatial derivatives are zero (inside iron), \(|Det(\mathbf{A})|=0\) (or \(|Det(\mathbf{A})|\rightarrow 0\)) and Eq. (20) cannot be applied. This can be understood in the following way: outside iron, the propagator matrix reduces to Eq. (11), as all the magnetic field integrals vanish. Even inside iron, certain elements in the first two columns of the \(F'\) matrix (e.g. \(F'_{11}\), \(F'_{12}\) etc., which depend on spatial derivatives of magnetic field components [19]) become zero occasionally. These zeros lead to additional zeros in the matrix \(\mathbf{A}\) and the determinant of the latter becomes very small (close to zero; see Footnote 1). That the determinant is zero (or close to zero) suggests that one or more eigenvalues are zero (or close to zero). Hence, Eq. (20) cannot be evaluated properly and unphysical solutions are obtained if Eq. (20) is applied. Therefore, outside the iron plates (and occasionally inside the iron plates), where \(|Det(\mathbf{A})|\) is small \((\le 1)\), we used Mankel’s form of the process noise matrix, Eq. (12). In general, inside the magnetic field, where \(Det(\mathbf{A})\) is typically \(\gg 1\), we applied Eq. (20) to construct the elements of the process noise matrix by diagonalizing \(\mathbf{A}\) through it++.

Since the solutions \(q_i(l)\)s represent the terms of a covariance matrix, we expect that \(q_1\), \(q_6 \), \(q_{10}\), \(q_{13}\), \(q_{15}\) will be positive, because they correspond to the diagonal elements of the Q matrix (\(Q_{11}, Q_{22}, Q_{33}, Q_{44},Q_{55}\) respectively). However, the real parts of the solutions \(q_i(l)\)s need not be positive. It is interesting to see that the computation automatically led to positive values of diagonal elements, as expected. No additional measure was needed to obtain these positive values. This shows that the analysis has been consistent.

5 Reconstruction performance

Since the method of computing the process noise matrix described in this paper is somewhat abstract, first we would like to show that the resulting Kalman filter works in a consistent fashion. Once that is done, we shall check if the Kalman filter, equipped with the random noise matrix developed in this paper, has better (or worse!) reconstruction performance compared to the one equipped with the random noise matrix derived by Mankel [1].

We used GEANT4 [40] to generate 5000 Monte Carlo muons (\(\mu ^-\)) inside the ICAL detector. The event vertices were smeared uniformly across a volume of \((43.2\,\mathrm{m}\times 14.4\,\mathrm{m} \times 10.0\,\mathrm{m})\) around the center of the detector (see Fig. 1a), in all \(\phi \) directions (\(\phi \in [0,2\pi ]\)). This ensures that the muon tracks from the inhomogeneous magnetic field region (Fig. 1b) are also present in the total set of simulated tracks, in the same way as would happen in reality.
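The vertex smearing can be sketched as follows. This is an illustration only and not the GEANT4 configuration used for the simulation: it assumes that the detector center sits at the origin and that the three quoted lengths are the full extents along three orthogonal axes, and the function name is hypothetical.

```python
import math
import random

# Half-lengths (in metres) of the smearing volume quoted in the text.
HALF_EXTENT_M = (43.2 / 2, 14.4 / 2, 10.0 / 2)


def sample_vertex_and_phi(rng):
    """Draw one vertex uniformly in the box and one azimuthal angle in [0, 2*pi]."""
    vertex = tuple(rng.uniform(-h, h) for h in HALF_EXTENT_M)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return vertex, phi


rng = random.Random(12345)  # fixed seed, so that the sketch is reproducible
events = [sample_vertex_and_phi(rng) for _ in range(5000)]
```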

To show that the filter is working in the expected way, we shall present the ‘goodness of fit’ plots in the following. These are the pull distributions of the fitted variables and the reduced \(\chi ^2\) distribution. The pull of a given variable \(\zeta \) is defined as:

$$\begin{aligned} Pull(\zeta )=\frac{\zeta _{Reconstructed}-\zeta _{Monte\ Carlo}}{\sqrt{C_{\zeta \zeta }}} \end{aligned}$$
(24)

where \(C_{\zeta \zeta }\) denotes the variance of the reconstructed \(\zeta \) parameter (so that \(\sqrt{C_{\zeta \zeta }}\) is its error), as estimated from the updated covariance matrix of the Kalman filter. In ICAL, we are mostly interested in the fitted parameters near the muon event vertex; hence, the pull is evaluated only there. For a good fit, the pull distributions must have a mean of zero and a standard deviation of unity. In Fig. 2, we show these plots for muons of momentum 5 GeV/c, with initial direction \(\theta =\cos ^{-1}0.95\) to the vertical.
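A minimal sketch of Eq. (24) and of the two numbers quoted for a pull distribution (mean and width) is given below. It takes \(C_{\zeta \zeta }\) as the corresponding diagonal element of the updated covariance matrix; the function names and the toy input values are hypothetical.

```python
import math
import statistics


def pull(reconstructed, true_value, variance):
    """Eq. (24): (reconstructed - Monte Carlo truth) / sqrt(C_zeta_zeta)."""
    return (reconstructed - true_value) / math.sqrt(variance)


def pull_summary(reco, truth, variances):
    """Mean and standard deviation of the pulls over a set of events."""
    pulls = [pull(r, t, v) for r, t, v in zip(reco, truth, variances)]
    return statistics.fmean(pulls), statistics.pstdev(pulls)


# Toy example: two events whose pulls are +1 and -1.
mean, width = pull_summary([1.0, -1.0], [0.0, 0.0], [1.0, 1.0])
print(mean, width)
```

In the paper the widths are obtained from fits to the distributions in Fig. 2; the plain standard deviation used here is a simpler stand-in.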

The reduced \(\chi ^2\) of the model prediction of every event is obtained by dividing the total \(\chi _p^2=\sum _k(\mathbf{r}_k^{k-1})^T(R_k^{k-1})^{-1}\mathbf{r}_k^{k-1}\) [12] by the number of free parameters. Here \(\mathbf{r}_k^{k-1}\) denotes the residual of the state prediction and \(R_k^{k-1}\) is the corresponding error covariance matrix. The total number of free parameters equals \(2n-5\), found by subtracting five constraints (through initialization of the filter) from the total degrees of freedom (two times the number of hits n along the track). The \(\chi _p^2\) for prediction is equal to the \(\chi _f^2\) [12] for the track fit. From Fig. 2, it is observed that the pull distributions of the elements of the state vector have means very close to zero and fitted widths very close to unity. The reduced \(\chi ^2\) plot (Fig. 2f) peaks close to unity as well, as expected. Similar track reconstruction performance is observed over a wide range of p and \(\cos \theta \).
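The reduced \(\chi ^2\) defined above can be sketched as follows, assuming two-dimensional residuals with \(2\times 2\) covariance matrices at each hit (two measured coordinates per hit, in line with the \(2n\) degrees of freedom). The function names and the toy numbers are hypothetical.

```python
def chi2_increment(residual, cov):
    """r^T R^{-1} r for a 2D residual r and a 2x2 covariance matrix R."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular residual covariance matrix")
    r0, r1 = residual
    # R^{-1} = (1/det) * [[d, -b], [-c, a]]
    return (r0 * (d * r0 - b * r1) + r1 * (-c * r0 + a * r1)) / det


def reduced_chi2(residuals, covariances, n_constraints=5):
    """Total chi2 of the predictions divided by 2n - 5."""
    n_hits = len(residuals)
    ndf = 2 * n_hits - n_constraints
    if ndf <= 0:
        raise ValueError("too few hits for a reduced chi2")
    total = sum(chi2_increment(r, c) for r, c in zip(residuals, covariances))
    return total / ndf


# Toy track with 5 hits: each hit contributes 1 + 4/4 = 2, and 2n - 5 = 5.
hits = [(1.0, 2.0)] * 5
covs = [((1.0, 0.0), (0.0, 4.0))] * 5
print(reduced_chi2(hits, covs))
```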

Fig. 2

Reconstructed muon of momentum 5 GeV/c at zenith angle \(\theta =\cos ^{-1}0.95\). a Pull of X, b pull of Y, c pull of \(t_x\), d pull of \(t_y\), e pull of \(\frac{q}{p}\) and f Reduced \(\chi ^2\). The error of a variable \(\zeta \) has been denoted by \(\epsilon (\zeta )\)

At very low momenta (\(p<2\) GeV/c) and very large angles (\(\theta >60^\circ \)), a gradual worsening of the reconstruction performance is observed. This degradation is intrinsic to the tracking problem, irrespective of whether or not the enhanced track scatterer treatment described in this paper is included. The gradual worsening is seen in the momentum and direction resolution plots in Fig. 3. Here, the momentum resolution has been defined as \(\frac{\sigma (p)}{p_{in}}\), where \(\sigma (p)\) denotes the rms width of the reconstructed momentum distribution and \(p_{in}\) denotes the input momentum. The direction resolution, on the other hand, has been defined as the rms width of the reconstructed \(\cos \theta \) distribution, i.e. \(\sigma (\cos \theta )\).
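The two resolution definitions can be sketched as below. The sketch interprets the "rms width" as the standard deviation of the reconstructed values about their mean, which is an assumption; the function names and the toy values are hypothetical.

```python
import statistics


def momentum_resolution(reco_momenta, p_in):
    """sigma(p) / p_in, with sigma(p) the rms width of the reconstructed momenta."""
    return statistics.pstdev(reco_momenta) / p_in


def direction_resolution(reco_cos_theta):
    """sigma(cos theta): rms width of the reconstructed cos(theta) values."""
    return statistics.pstdev(reco_cos_theta)


# Toy example: momenta of 4 and 6 GeV/c for an input momentum of 5 GeV/c.
print(momentum_resolution([4.0, 6.0], 5.0))
print(direction_resolution([0.9, 0.9]))
```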

Fig. 3

Reconstructed a momentum and b direction (\(\cos \theta \)) resolution plots of muon \(\mu ^-\) as functions of increasing input momenta at various initial angles

Fig. 4

Comparison of the track fitting performance between the Kalman filters equipped with the process noise matrix derived in [1] and that derived in this paper. a Comparison of reconstructed momentum and b direction (\(\cos \theta \)). The result is for 5000 muon tracks of true momentum 5 GeV/c in the ICAL detector

At lower momenta and/or larger zenith angles, the muon tracks are affected by multiple scattering to a greater extent. This degrades the precision of the muon momentum estimate, so the resolution becomes poor (Fig. 3a). On the other hand, at higher input momenta, the contribution of the spatial resolution to the momentum resolution is larger [41, Eq. (3.5)], which leads to a gradual worsening of the momentum resolution. The direction resolution steadily improves with increasing \(p_{in}\), but worsens as the input \(\theta \) is increased.

Let us now proceed to the comparative study of the Kalman filters equipped with two different process noise matrices: one derived in this paper and the other derived in [1]. To see if the former has any advantage or disadvantage over the latter, the two programs were used to fit two copies of the simulated muon tracks of momentum 5 GeV/c and initial direction \(\theta =\cos ^{-1} 0.95\). Thus, the two filters with the two different process noise matrices operated on identical sets of position measurements. It was observed that the quality and performance of reconstruction of the two programs are comparable. For individual events, the difference in the reconstructed values of momentum or \(\cos \theta \) usually appeared at the second or third decimal place or beyond. In fact, no significant improvement or deterioration was observed for any of the track parameters. Thus, no gross improvement was achieved by using the more appropriate form of the random process noise matrix inside the magnetized iron plates. This is shown in Fig. 4a and b.

Fig. 5

Reconstruction performance: a momentum and b direction (\(\cos \theta \)) of the events with the elements of the random process noise matrix set to zero

This observation that there is hardly any difference in the reconstruction performance may raise some doubt about the validity of the process noise treatment. One may wish to check how large the effect of the process noise treatment is in the first place. If the fitting is performed with the process noise switched off, we expect the fitting performance to deteriorate. This exercise was performed by setting all the elements of the process noise matrix to zero, while keeping the rest of the program the same as before. The result is shown in Fig. 5. It is seen that the fit of the same set of events becomes less precise when the process noise is neglected.
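Why a vanishing process noise matrix degrades the fit can be seen in a toy scalar Kalman filter, sketched below. This is not the ICAL track fit: the state is one-dimensional, the propagator is taken as unity, and the noise values, the initial variance and the function name are all assumptions made for illustration. With zero process noise the predicted variance keeps shrinking, so the gain tends to zero and later measurements are almost ignored; with nonzero process noise the gain settles at a finite value.

```python
def kalman_gains(process_noise, meas_noise=1.0, p0=1000.0, n_steps=50):
    """Gain sequence of a scalar Kalman filter with unit propagator (F = 1)."""
    p = p0  # initial (large) variance of the state estimate
    gains = []
    for _ in range(n_steps):
        p_pred = p + process_noise          # prediction: F C F^T + Q with F = 1
        k = p_pred / (p_pred + meas_noise)  # Kalman gain
        p = (1.0 - k) * p_pred              # updated variance after the measurement
        gains.append(k)
    return gains


gains_without_noise = kalman_gains(process_noise=0.0)
gains_with_noise = kalman_gains(process_noise=1.0)
print(gains_without_noise[-1], gains_with_noise[-1])
```

In this toy case the final gain is about 0.02 without process noise and about 0.62 with it, which illustrates the loss of sensitivity to the measurements when the noise is neglected.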

In fact, many of the events are reconstructed with worse momentum values, as seen from the event count in Fig. 5a (lower than that in Fig. 4a). The filter, however, converges to a more or less accurate mean value, since it takes the mean energy loss into account correctly. On the other hand, the direction estimation becomes very poor, as seen from the width of the distribution in Fig. 5b.

This consistency check also confirms that the fitting performance improves significantly with respect to “no process noise treatment”, when the process noise matrix is accounted for. Figure 4a shows that the formula of the process noise matrix developed in this paper, which was used for tracking inside magnetized iron plates, does not lead to gross improvement of track fitting performance. So, we conclude that Mankel’s simple solution for random noise matrix is indeed a good approximation.

6 Summary

In this paper, a mathematical formalism has been developed for expressing the elements of the random noise matrix while performing track fitting with a Kalman filter through a thick scatterer and nonzero magnetic field. In this case, all the elements of the propagator are nonzero, unlike in Mankel’s approach [1], and we made use of the method of diagonalization (see Sect. 3.1) to construct the desired elements under such circumstances. Through this formalism, the elements \(\mathrm{cov}(\mathbf{x}, q/p)=\mathrm{cov}(q/p, \mathbf{x})\) of the random noise matrix can also be calculated for a track deflected by a magnetic field in a thick scatterer. Evaluation of these elements was not included in Mankel’s treatment [1]. Although no precaution was taken to render the real parts of \(q_1, q_6, q_{10}, q_{13}\) and \(q_{15}\) positive (which correspond to the diagonal elements of the random noise matrix), they turned out to be positive in all the cases. However, this solution could not be used outside the magnetic field region. Also, its use inside the magnetized iron plates did not improve the track fitting performance. The treatment by Mankel [1], derived under approximations, seems good enough for reconstruction of momentum, at least to the first or second decimal place. This is also clear from Table 1 in Appendix B, which shows that the corrections introduced to the elements of the process noise matrix are small. On the other hand, the mathematical form of the elements of the process noise matrix derived in this paper is quite general and can be used in other HEP experiments employing different state vectors (for example, those containing \(q/p_T\) or the curvature \(\kappa \) of the track as one of the elements).