1 Least-Squares Fitting

In this section, three methods are described that are based on the least-squares (LS) principle for estimation of the track parameters. They are linear or linearized regression, the (extended) Kalman filter, and regression with breakpoints. In the case of a strictly linear model, they are mathematically equivalent. With a nonlinear model, there may be small differences because of different choices of the expansion point(s). In the following, the more frequent case of nonlinear models will be described, which contains the linear model as a special case.

1.1 Least-Squares Regression

Assume that track finding has produced a track candidate, i.e., a collection of n measurements m 1, …, m n in different layers of the tracking detector, along with their respective covariance matrices V 1, …, V n. The measurements may have different dimensions m i and usually have different covariance matrices, resulting in a heteroskedastic model. The initial parameters of the track to be fitted to the measurements are denoted by p. They are assumed to be tied to a reference surface (layer 0). The regression model has the following form:

$$\displaystyle \begin{gathered} {\boldsymbol{m}_{}}={\boldsymbol{f}_{}}({\boldsymbol{p}})+{\boldsymbol{\varepsilon}},\ \;{\mathsf{E}\left[{\boldsymbol{\varepsilon}}\right]}=\mathbf{0},\ \;{\mathsf{Var}\left[{\boldsymbol{\varepsilon}}\right]}={\boldsymbol{V}}, {} \end{gathered} $$
(6.1)

where m = (m 1, …, m n)T and f = (f 1, …, f n)T. The function f k maps the initial parameters p to the measurement m k in layer k. It is the composition of the track propagators up to layer k (see Sect. 4.3) and the function that maps the track state to the measurement (see Sect. 3.2.3):

$$\displaystyle \begin{gathered} {\boldsymbol{f}_{k}}={\boldsymbol{h}_{k}}\circ\boldsymbol{f}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k-1}\circ\boldsymbol{f}_{k-1{\hspace{0.5pt}|\hspace{0.5pt}} k-2}\circ\ldots\circ\boldsymbol{f}_{2{\hspace{0.5pt}|\hspace{0.5pt}} 1}\circ\boldsymbol{f}_{1{\hspace{0.5pt}|\hspace{0.5pt}} 0}. \end{gathered} $$
(6.2)

Its Jacobian F k is given by the product of the respective Jacobians:

$$\displaystyle \begin{gathered} {\boldsymbol{F}}_k={\boldsymbol{H}_{k}}\hspace{0.5pt}\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k-1}\hspace{0.5pt}\boldsymbol{F}_{k-1{\hspace{0.5pt}|\hspace{0.5pt}} k-2}\hspace{0.5pt}\ldots\hspace{0.5pt}\boldsymbol{F}_{2{\hspace{0.5pt}|\hspace{0.5pt}} 1}\hspace{0.5pt}\boldsymbol{F}_{1{\hspace{0.5pt}|\hspace{0.5pt}} 0}.{} \end{gathered} $$
(6.3)

The covariance matrix V is the sum of two parts, V  = V M + V S. The first part is the joint covariance matrix of all measurement errors. These can virtually always be assumed to be uncorrelated across different layers, so that V M is block-diagonal:

$$\displaystyle \begin{gathered} {\boldsymbol{V}_{\mathrm{M}}}={{\text{blkdiag}\hspace{0.5pt}({\boldsymbol{V}_{1}},\ldots,{\boldsymbol{V}_{ n}})}},\ \;\mathrm{with}\ \; {\boldsymbol{V}_{ i}}={\mathsf{Var}\left[{\boldsymbol{\varepsilon}_{i}}\right]},\ \; i=1,\ldots,n. \end{gathered} $$
(6.4)

The second part V S is the joint covariance matrix of the process noise caused by material effects, mainly multiple Coulomb scattering; see Sect. 4.5. As in Sect. 3.2.3, the process noise encountered during the propagation from layer k − 1 to layer k is denoted by γ k and its covariance matrix by Q k. The integrated process noise up to layer k is denoted by Γ k. Linearized error propagation along the track gives the following expression for the covariance matrix of Γ k:

$$\displaystyle \begin{gathered} {\mathsf{Var}\left[{\boldsymbol{\varGamma}_{k}}\right]}=\sum_{i=1}^{k}\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} i}\hspace{0.5pt}{\boldsymbol{Q}_{i}}\hspace{0.5pt}\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} i}{{}^{\mathsf{T}}}, \end{gathered} $$
(6.5)

with

$$\displaystyle \begin{gathered} \boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} i}=\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k-1}\hspace{0.5pt}\boldsymbol{F}_{k-1{\hspace{0.5pt}|\hspace{0.5pt}} k-2}\hspace{0.5pt}\cdots\hspace{0.5pt}\boldsymbol{F}_{i+1{\hspace{0.5pt}|\hspace{0.5pt}} i} \ \;\mathrm{for}\ \; i<k \quad \mathrm{and}\quad\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k}=\boldsymbol{I}. \end{gathered} $$
(6.6)

If i < k, Γ i and Γ k are correlated with the following cross-covariance matrix:

$$\displaystyle \begin{gathered} {\mathsf{Cov}\left[{\boldsymbol{\varGamma}_{i}},{\boldsymbol{\varGamma}_{k}}\right]}=\sum_{j=1}^{i} \boldsymbol{F}_{i{\hspace{0.5pt}|\hspace{0.5pt}} j}\hspace{0.5pt}{\boldsymbol{Q}_{j}}\hspace{0.5pt}\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} j}{{}^{\mathsf{T}}}. \end{gathered} $$
(6.7)

Error propagation from the track states to the measurements gives the final block structure of V S:

$$\displaystyle \begin{gathered} {\boldsymbol{V}_{\mathrm{S}}}= \begin{pmatrix} {\boldsymbol{C}}_{11} & {\boldsymbol{C}}_{12} & \cdots & {\boldsymbol{C}}_{1n}\\ {\boldsymbol{C}}_{21} & {\boldsymbol{C}}_{22} & \cdots & {\boldsymbol{C}}_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ {\boldsymbol{C}}_{n1} & {\boldsymbol{C}}_{n2} & \cdots & {\boldsymbol{C}}_{nn} \end{pmatrix}, \ \;\mathrm{with}\ \; {\boldsymbol{C}}_{ik}= \begin{cases} {\boldsymbol{H}_{k}}\hspace{0.5pt}{\mathsf{Var}\left[{\boldsymbol{\varGamma}_{k}}\right]}\hspace{0.5pt}{\boldsymbol{H}_{k}}{{}^{\mathsf{T}}}, \ \;\mathrm{if}\ \; i=k\\ {\boldsymbol{H}_{i}}\hspace{0.5pt}{\mathsf{Cov}\left[{\boldsymbol{\varGamma}_{i}},{\boldsymbol{\varGamma}_{k}}\right]}\hspace{0.5pt}{\boldsymbol{H}_{k}}{{}^{\mathsf{T}}}, \ \;\mathrm{if}\ \; i<k\\ {\boldsymbol{C}}_{ki}{{}^{\mathsf{T}}}, \ \;\mathrm{if}\ \; i>k\\ \end{cases} \end{gathered} $$
(6.8)

Estimation of the initial state p is usually done by the Gauss-Newton method; see Sect. 3.2.2. A first approximation \({\boldsymbol {\tilde {p}}}_0\) of p has to be delivered by the track finder to be used as the expansion point to compute the Jacobians in Eq. (6.3). The updated estimate \({\boldsymbol {\tilde {p}}}_1\) is obtained via:

$$\displaystyle \begin{gathered} {\boldsymbol{\tilde{p}}}_1={\boldsymbol{\tilde{p}}}_0+({\boldsymbol{F}}{{}^{\mathsf{T}}}\hspace{0.5pt}{\boldsymbol{V}}{{}^{-1}}\hspace{0.5pt}{\boldsymbol{F}}){{}^{-1}}\hspace{0.5pt}{\boldsymbol{F}}{{}^{\mathsf{T}}}\hspace{0.5pt}{\boldsymbol{V}}{{}^{-1}}\hspace{0.5pt}[{\boldsymbol{m}_{}}-{\boldsymbol{f}_{}}({\boldsymbol{\tilde{p}}}_0)]. \end{gathered} $$
(6.9)

The corresponding chi-square statistic is given by:

$$\displaystyle \begin{gathered} {\chi^2}_{1}=[{\boldsymbol{m}_{}}-{\boldsymbol{f}_{}}({\boldsymbol{\tilde{p}}}_1)]{{}^{\mathsf{T}}}\hspace{0.5pt}{\boldsymbol{G}}\hspace{0.5pt}[{\boldsymbol{m}_{}}-{\boldsymbol{f}_{}}({\boldsymbol{\tilde{p}}}_1)],\ \;\mathrm{with}\ \;{\boldsymbol{G}}={\boldsymbol{V}}{{}^{-1}}. \end{gathered} $$
(6.10)

It is approximately χ 2-distributed with M − m degrees of freedom, where M = dim(m) is the sum of all m i, and m is the number of estimated track parameters, usually five.

The Jacobians are recomputed at the new expansion point \({\boldsymbol {\tilde {p}}}_{1}\), and an updated estimate \({\boldsymbol {\tilde {p}}}_{2}\) is computed. This two-step procedure is iterated until the absolute difference \(\left |\hspace{0.5pt}{\chi ^2}_{k+1}-{\chi ^2}_k\hspace{0.5pt}\right |\) is below a predefined threshold or a maximal number of iterations is reached.
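As an illustration, the following minimal Python sketch implements this iterated Gauss-Newton fit. The callables f and jacobian (the track model of Eq. (6.1) and its total Jacobian, Eq. (6.3)) are user-supplied assumptions of this sketch, not part of any particular tracking package.

```python
import numpy as np

def gauss_newton_fit(m, V, f, jacobian, p0, max_iter=10, tol=1e-6):
    """Iterated Gauss-Newton LS fit of Eqs. (6.9) and (6.10)."""
    G = np.linalg.inv(V)                      # weight matrix G = V^{-1}
    p, chi2_old = np.asarray(p0, dtype=float), np.inf
    for _ in range(max_iter):
        F = jacobian(p)                       # total Jacobian, Eq. (6.3), at the expansion point
        C = np.linalg.inv(F.T @ G @ F)        # covariance matrix of the estimate
        p = p + C @ F.T @ G @ (m - f(p))      # update step, Eq. (6.9)
        r = m - f(p)
        chi2 = r @ G @ r                      # chi-square statistic, Eq. (6.10)
        if abs(chi2_old - chi2) < tol:        # convergence criterion of the text
            break
        chi2_old = chi2
    return p, C, chi2
```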

The p-value of the chi-square statistic (see Sect. 3.2.1) is the primary quality indicator of the track fit; see Sect. 6.4. A small p-value indicates a misspecification of the model or at least one outlying measurement that does not fit the track. The standardized residuals or pulls can be used to look for outliers. The residuals r of the fit are defined by:

$$\displaystyle \begin{gathered} {\boldsymbol{r}}={\boldsymbol{m}_{}}-{\boldsymbol{f}_{}}({\boldsymbol{\tilde{p}}}),{} \end{gathered} $$
(6.11)

where \({\boldsymbol {\tilde {p}}}\) is the final estimate after convergence. Their covariance matrix is obtained by linearized error propagation and is approximately equal to:

$$\displaystyle \begin{gathered} {\boldsymbol{R}}={\mathsf{Var}\left[{\boldsymbol{r}}\right]}\approx{\boldsymbol{V}}-{\boldsymbol{F}}\hspace{0.5pt}({\boldsymbol{F}}{{}^{\mathsf{T}}}\hspace{0.5pt}{\boldsymbol{G}}\hspace{0.5pt}{\boldsymbol{F}}){{}^{-1}}\hspace{0.5pt}{\boldsymbol{F}}{{}^{\mathsf{T}}}.{} \end{gathered} $$
(6.12)

Note that R has rank M − m and thus cannot be inverted. The standardized residuals s are given by:

$$\displaystyle \begin{gathered} {\boldsymbol{s}}={\boldsymbol{r}}\,./\sqrt{{{\text{diag}\hspace{0.5pt}({\boldsymbol{R}})}}},{} \end{gathered} $$
(6.13)

where ./ denotes the point-wise division of two vectors or matrices (Matlab® convention). They are approximately distributed according to a standard normal distribution. Outliers are characterized by unusually large absolute values of the corresponding components of s.

The residuals r, i.e., the differences between the measured and the fitted track, are a superposition of measurement noise and process noise (mainly multiple scattering). The vector r of residuals can be further decomposed into two parts corresponding to the two types of noise [1]:

$$\displaystyle \begin{aligned} {\boldsymbol{r}}&={{\boldsymbol{r}}_{\mathrm{M}}}+{{\boldsymbol{r}}_{\mathrm{S}}},\ \;\mathrm{with}\ \; \end{aligned} $$
(6.14)
$$\displaystyle \begin{aligned} {{\boldsymbol{r}}_{\mathrm{M}}}&={\boldsymbol{V}_{\mathrm{M}}}\hspace{0.5pt}{\boldsymbol{G}^\prime}\hspace{0.5pt}{\boldsymbol{m}_{}},\ \;{\boldsymbol{R}_{\mathrm{M}}}={\mathsf{Var}\left[{{\boldsymbol{r}}_{\mathrm{M}}}\right]}={\boldsymbol{V}_{\mathrm{M}}}\hspace{0.5pt}{\boldsymbol{G}^\prime}\hspace{0.5pt}{\boldsymbol{V}_{\mathrm{M}}}, \end{aligned} $$
(6.15)
$$\displaystyle \begin{aligned} {{\boldsymbol{r}}_{\mathrm{S}}}&={\boldsymbol{V}_{\mathrm{S}}}\hspace{0.5pt}{\boldsymbol{G}^\prime}\hspace{0.5pt}{\boldsymbol{m}_{}},\ \;{\boldsymbol{R}_{\mathrm{S}}}={\mathsf{Var}\left[{{\boldsymbol{r}}_{\mathrm{S}}}\right]}={\boldsymbol{V}_{\mathrm{S}}}\hspace{0.5pt}{\boldsymbol{G}^\prime}\hspace{0.5pt}{\boldsymbol{V}_{\mathrm{S}}}, \end{aligned} $$
(6.16)
$$\displaystyle \begin{aligned} {\boldsymbol{G}^\prime}&={\boldsymbol{G}}\hspace{0.5pt}{\boldsymbol{R}}\hspace{0.5pt}{\boldsymbol{G}}. \end{aligned} $$
(6.17)

The two noise contributions can thus be checked independently via their standardized residuals s M and s S, given by:

$$\displaystyle \begin{aligned} {{\boldsymbol{s}}_{\mathrm{M}}}&={{\boldsymbol{r}}_{\mathrm{M}}}\,./\sqrt{{{\text{diag}\hspace{0.5pt}({\boldsymbol{R}_{\mathrm{M}}})}}}{} \end{aligned} $$
(6.18)
$$\displaystyle \begin{aligned} {{\boldsymbol{s}}_{\mathrm{S}}}&={{\boldsymbol{r}}_{\mathrm{S}}}\,./\sqrt{{{\text{diag}\hspace{0.5pt}({\boldsymbol{R}_{\mathrm{S}}})}}} \end{aligned} $$
(6.19)
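The decomposition can be evaluated directly from the quantities defined above. The following sketch computes Eqs. (6.12) and (6.15)–(6.19) with numpy; the function name and interface are illustrative assumptions.

```python
import numpy as np

def standardized_residual_parts(m, F, V_M, V_S):
    """Standardized measurement and scattering residuals s_M and s_S."""
    V = V_M + V_S
    G = np.linalg.inv(V)
    R = V - F @ np.linalg.inv(F.T @ G @ F) @ F.T    # Eq. (6.12)
    Gp = G @ R @ G                                   # G' of Eq. (6.17)
    r_M, r_S = V_M @ Gp @ m, V_S @ Gp @ m            # Eqs. (6.15) and (6.16)
    s_M = r_M / np.sqrt(np.diag(V_M @ Gp @ V_M))     # Eq. (6.18)
    s_S = r_S / np.sqrt(np.diag(V_S @ Gp @ V_S))     # Eq. (6.19)
    return s_M, s_S
```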

1.2 Extended Kalman Filter

A “progressive” or recursive version of the LS regression for track fitting was first proposed in [2]. Soon, it was realized that this was the same as an (extended) Kalman filter [3] in the state space model of the track dynamics. The Kalman filter has the advantage that only small matrices have to be inverted, that the track fit follows the track as closely as possible, and that material effects such as multiple scattering and energy loss can be treated locally in each measurement or material layer. In addition, the filter can be complemented by the smoother; see Sect. 3.2.3.

In track fitting with the extended Kalman filter, it is assumed that the trajectory crosses a number of surfaces or layers with well-defined positions and orientations. A layer can be a measurement layer, a material layer, or both. At the intersection point of the trajectory with layer k, the state vector q k contains information about the local position, the local direction, and the local momentum of the track. The uncertainty of the information is specified by the associated covariance matrix C k. Different possible parameterizations of the track state are discussed in Sect. 4.2.

The Kalman filter is a sequence of alternating prediction and update steps; see Sect. 3.2.3. In the prediction step, the estimate \({\boldsymbol {\tilde {q}}}_{k-1}\) of the track state in layer k − 1 is extrapolated to layer k, along with its covariance matrix C k−1:

$$\displaystyle \begin{gathered} {\boldsymbol{\tilde{q}}}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k-1}}=\boldsymbol{f}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k-1}({\boldsymbol{\tilde{q}}}_{k-1}),\ \; {\boldsymbol{C}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k-1}}}=\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k-1}{\boldsymbol{C}_{ k-1}}\boldsymbol{F}_{k{\hspace{0.5pt}|\hspace{0.5pt}} k-1}{{}^{\mathsf{T}}}, \end{gathered} $$
(6.20)

where f k | k−1 is the track propagator from layer k − 1 to layer k, and F k | k−1 is its Jacobian matrix; see Sect. 4.3.

The update step is different in material and detector layers. In a material layer, multiple scattering is taken into account by inflating the covariance matrix elements of the track directions and, in a thick scatterer, of the track position; see Sect. 4.5. Energy loss by ionization is taken into account by decreasing the track momentum. For the treatment of electron bremsstrahlung, see Sect. 6.2.3.

The update step in a detector layer is given by Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32). The associated chi-square statistic \(\chi^2_k\), Eq. (3.35), can be used to test the compatibility of the observation m k with the predicted state \({\boldsymbol {\tilde {q}}}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k-1}}\) in the combinatorial Kalman filter; see Sect. 5.1.7. A large value of \(\chi^2_k\) or a small p-value indicates that the observation does not belong to the track.

In track fitting, the initial state q 0 is obtained from the track finder and therefore contains information from the observations. In order not to use the information twice, its covariance matrix is set to a large diagonal matrix. As a consequence, the initial chi-square statistics \(\chi^2_k\) have zero degrees of freedom until the covariance matrix of the state vector has full rank. For instance, if all measurements are 2D and the state is 5D, \(\chi^2_1\) and \(\chi^2_2\) have zero degrees of freedom, \(\chi^2_3\) has one degree of freedom, all subsequent \(\chi^2_k\) have two degrees of freedom, and the total chi-square statistic \({\chi ^2_{\mathrm {tot}}}\) has 2n − 5 degrees of freedom.
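As an illustration, a minimal sketch of a single prediction and update step, using the standard gain-matrix formulation of the update (an assumption of this sketch; the book's Eqs. (3.29)–(3.32) give the exact forms). The callables f_prop, F_prop, h and the matrices Q, V, H are supplied by the user.

```python
import numpy as np

def ekf_step(q, C, f_prop, F_prop, Q, m, V, h, H):
    """One prediction and update step of the extended Kalman filter."""
    # Prediction from layer k-1 to layer k, Eq. (6.20), plus process noise Q
    q_pred = f_prop(q)
    F = F_prop(q)
    C_pred = F @ C @ F.T + Q
    # Update with the measurement m in layer k (gain-matrix formulation)
    r = m - h(q_pred)                      # predicted residual
    S = V + H @ C_pred @ H.T               # covariance of the predicted residual
    K = C_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    q_upd = q_pred + K @ r
    C_upd = (np.eye(len(q)) - K @ H) @ C_pred
    chi2 = r @ np.linalg.inv(S) @ r        # compatibility test, cf. Eq. (3.35)
    return q_upd, C_upd, chi2
```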

The smoother can be implemented either according to Eqs. (3.36) and (3.37) or by running a second filter in the opposite direction and combining the states of the two filters by a weighted mean (Eq. (3.38)):

$$\displaystyle \begin{gathered} {\boldsymbol{\tilde{q}}}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}={\boldsymbol{C}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}}\left[{\boldsymbol{C}_{ k}}{{}^{-1}}{\boldsymbol{\tilde{q}}}_{k}+\left({\boldsymbol{C}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}}\right){{}^{-1}}{\boldsymbol{\tilde{q}}}{}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}\right], \ \; {\boldsymbol{C}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}}{{}^{-1}}={\boldsymbol{C}_{ k}}{{}^{-1}}+\left({\boldsymbol{C}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}}\right){{}^{-1}}, \end{gathered} $$
(6.21)

where \({\boldsymbol {\tilde {q}}}{ }^{\,\mathrm {b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}\) is the predicted state from the backward filter and \({\boldsymbol {C}^{\,\mathrm {b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}}\) its covariance matrix. Alternatively, the predicted state from the forward filter and the updated state from the backward filter can be combined. The associated chi-square statistic \(\chi^2_{k|n}\) (Eqs. (3.39) and (3.40)) can be used to test the compatibility of the observation m k with the smoothed state \({\boldsymbol {\tilde {q}}}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}\), using the entire track information. A large value of \(\chi^2_{k|n}\) or a small p-value indicates that the observation does not belong to the track and is an outlier; see Sect. 6.4.2. A simplified version of the chi-square statistic \(\chi^2_{k|n}\) is used in the deterministic annealing filter; see Sect. 6.2.2.
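A minimal sketch of the combination step of Eq. (6.21); the variable names are illustrative.

```python
import numpy as np

def smooth_by_combination(q_f, C_f, q_b, C_b):
    """Weighted mean of forward and backward states, Eq. (6.21)."""
    G_f, G_b = np.linalg.inv(C_f), np.linalg.inv(C_b)
    C_s = np.linalg.inv(G_f + G_b)          # smoothed covariance matrix
    q_s = C_s @ (G_f @ q_f + G_b @ q_b)     # smoothed state
    return q_s, C_s
```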

1.3 Regression with Breakpoints

Instead of absorbing the effects of multiple scattering in the covariance matrix of the process noise, the scattering angles at certain points, called breakpoints, can be explicitly incorporated into the track model as additional parameters. At these breakpoints, the scattering angles are estimated [4, 5], using their known expectation (zero) and known covariance matrix in the curvilinear parameterization; see Sect. 4.5.1. The breakpoint fit is mathematically equivalent both to LS regression in the linear approximation of the model and to the Kalman filter, unless the initial state of the latter has non-negligible information.

Let θ j = (θ j1, θ j2), j = 1, …, m denote two uncorrelated multiple scattering angles at the breakpoint j, and Q j their covariance matrix. Then the regression model in Eq. (6.1) can be modified to:

$$\displaystyle \begin{gathered} {\boldsymbol{m}_{k}}={\hat{\boldsymbol{f}}_{k}}({\boldsymbol{p}},\boldsymbol{\theta}_{1},\ldots,\boldsymbol{\theta}_{ j_k})+{\boldsymbol{\varepsilon}}_{k}, \ \;{\mathsf{E}\left[{\boldsymbol{\varepsilon}}_{k}\right]}=\mathbf{0},\ \;{\mathsf{Var}\left[{\boldsymbol{\varepsilon}}_{k}\right]}={\boldsymbol{V}_{ k}},\ \; k=1,\ldots,n, \end{gathered} $$
(6.22)

where j k is the index of the last breakpoint before measurement layer k. All measurement errors ε k are now independent, and their joint covariance matrix is block-diagonal, as is the joint covariance matrix of all θ j. The full regression model, which includes the scattering angles as additional parameters, now reads:

$$\displaystyle \begin{aligned} \begin{pmatrix}{\boldsymbol{m}_{1}}\\ \vdots\\ {\boldsymbol{m}_{n}}\\ \mathbf{0}\\ \vdots\\ \mathbf{0}\end{pmatrix}&= \begin{pmatrix} {\hat{\boldsymbol{f}}_{1}}({\boldsymbol{p}},\boldsymbol{\theta}_{1},\ldots,\boldsymbol{\theta}_{ j_1})\\ \vdots\\ {\hat{\boldsymbol{f}}_{n}}({\boldsymbol{p}},\boldsymbol{\theta}_{1},\ldots,\boldsymbol{\theta}_{ m})\\ \boldsymbol{\theta}_{1}\\ \vdots\\ \boldsymbol{\theta}_{ m}\end{pmatrix}+{\boldsymbol{\delta}},\ \;\mathrm{with}\ \; \end{aligned} $$
(6.23)
$$\displaystyle \begin{aligned} \ \;{\mathsf{E}\left[{\boldsymbol{\delta}}\right]}=\mathbf{0}, \ \;{\mathsf{Var}\left[{\boldsymbol{\delta}}\right]}&={{\text{blkdiag}\hspace{0.5pt}({\boldsymbol{V}_{1}},\ldots,{\boldsymbol{V}_{ n}},{\boldsymbol{Q}_{1}},\ldots,{\boldsymbol{Q}_{m}})}}. \end{aligned} $$
(6.24)

If the functions \({\hat {\boldsymbol {f}}_{k}}\) are Taylor-expanded to first order, a linear regression model with the following structure is obtained:

$$\displaystyle \begin{gathered} \begin{pmatrix}{\boldsymbol{m}_{1}}\\ \vdots\\ {\boldsymbol{m}_{n}}\\ \mathbf{0}\\ \vdots\\ \mathbf{0}\end{pmatrix}= \begin{pmatrix} {\boldsymbol{F}}_1 & \boldsymbol{H}_{1{\hspace{0.5pt}|\hspace{0.5pt}} 1} & \cdots & \boldsymbol{H}_{1{\hspace{0.5pt}|\hspace{0.5pt}} j_1} & \boldsymbol{O} & \cdots & \boldsymbol{O}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\ {\boldsymbol{F}}_n & \boldsymbol{H}_{n{\hspace{0.5pt}|\hspace{0.5pt}} 1} & \cdots & \cdots & \cdots & \cdots & \boldsymbol{H}_{n{\hspace{0.5pt}|\hspace{0.5pt}} m}\\ \boldsymbol{O} & \boldsymbol{I} & \boldsymbol{O} & \cdots & \cdots & \cdots & \boldsymbol{O}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\ \boldsymbol{O} & \cdots & \cdots & \cdots & \cdots & \boldsymbol{O} & \boldsymbol{I} \end{pmatrix}\cdot \begin{pmatrix}{\boldsymbol{p}}\\ \boldsymbol{\theta}_{1}\\ \vdots\\ \boldsymbol{\theta}_{ m}\end{pmatrix} +{\boldsymbol{c}}, \end{gathered} $$
(6.25)

where H k | j, j ≤ j k is the Jacobian matrix of the function that describes the dependence of m k on θ j and F k is as in Eq. (6.3).

The following two subsections describe two efficient implementations of the breakpoint concept.

1.4 General Broken Lines

The general broken lines (GBL) algorithm [6] is a fast track refit based on the breakpoint concept. It is particularly useful for track-based alignment and calibration with the Millepede-II package [7]. It is assumed that only thin scatterers are present, or that thick scatterers are divided into several thin scatterers. The algorithm needs an initial trajectory as a reference.

At each measurement plane and thin scatterer, a local orthonormal coordinate system (u, v, w) is defined. The natural choice of the w-axis is perpendicular to the sensor plane for a measurement and parallel to the track direction for a scatterer. At each thin scatterer, the offset (u, v) is a fit parameter, as is the global signed inverse momentum q∕p. The prior information on the scattering angles is the same as above, obtained from multiple scattering theory; see Sect. 4.5.1. The transformations from the curvilinear frame following the track to the local frames are given in Sect. 4.4.1.

The corrections to the track parameters in a measurement plane depend only on the (inverse) momentum and the adjacent offsets; therefore, their estimation requires the inversion of a bordered band matrix, the computing time of which is proportional to the number of breakpoints. The GBL is available in a dedicated software package [8] and is also implemented in GENFIT [9,10,11,12,13].

A comparative study in [6] shows that track fitting with the GBL is a little faster than the extended Kalman filter and up to three times faster than the Kalman filter plus smoother.

1.5 Triplet Fit

A track fit for situations in which the principal source of stochastic uncertainty is multiple scattering is described in [14]. The fit is an independent combination of fits to triplets of hits, where the middle point of a triplet is a breakpoint. As the triplet fits are fast and can be parallelized, the method is well suited for online reconstruction. The fit has been designed for low-momentum tracks with several turns in the detector, but also performs very well for tracks in a high-resolution pixel tracker with momenta up to 10 GeV. For a comparison with a full helix fit and the GBL fit (Sect. 6.1.4) see [14]. The fit is implemented in a software package called WATSON [15], which is available on request from the authors of [14].

1.6 Fast Track Fit by Affine Transformation

Online track reconstruction in the first-level trigger, see Sect. 5.2, requires an ultra-fast track fit that can be implemented in high-speed hardware such as FPGAs [16]. One possibility to achieve the required speed is to fit, on a training sample of simulated tracks, an affine model that expresses the track parameters p as an affine function of the measurements m [17]:

$$\displaystyle \begin{gathered} {\boldsymbol{\tilde{p}}}={{\boldsymbol{A}}}{\boldsymbol{m}_{}}+{\boldsymbol{c}}. \end{gathered} $$
(6.26)

The matrix A and the vector c are estimated by minimizing the objective function

$$\displaystyle \begin{gathered} {\mathcal{S}}\left({{\boldsymbol{A}}},{\boldsymbol{c}}\right)=\sum_{i=1}^N \left({{\boldsymbol{A}}}{\boldsymbol{m}_{i}}+{\boldsymbol{c}}-{\boldsymbol{p}}_i\right){{}^{\mathsf{T}}}\left({{\boldsymbol{A}}}{\boldsymbol{m}_{i}}+{\boldsymbol{c}}-{\boldsymbol{p}}_i\right), \end{gathered} $$
(6.27)

where the p i are the true parameters of the N tracks in the training sample. The solution is given by:

$$\displaystyle \begin{aligned} {{\boldsymbol{A}}}&=\left[\langle{\boldsymbol{p}}\hspace{0.5pt}{\boldsymbol{m}_{}}{{}^{\mathsf{T}}}\rangle-\left\langle{\boldsymbol{p}}\right\rangle\langle{\boldsymbol{m}_{}}{{}^{\mathsf{T}}}\rangle\right]{\boldsymbol{C}}{{}^{-1}}, \ {\boldsymbol{c}}=\left\langle{\boldsymbol{p}}\right\rangle-{{\boldsymbol{A}}}\left\langle{\boldsymbol{m}_{}}\right\rangle,\ {\boldsymbol{C}}=\langle{\boldsymbol{m}_{}}{\boldsymbol{m}_{}}{{}^{\mathsf{T}}}\rangle-\left\langle{\boldsymbol{m}_{}}\right\rangle\langle{\boldsymbol{m}_{}}{{}^{\mathsf{T}}}\rangle, \end{aligned} $$
(6.28)

where C is the sample covariance matrix of the measurement vectors, and the angle brackets denote the average over the training sample. The goodness-of-fit can be judged by the chi-square statistic

$$\displaystyle \begin{gathered} {\chi^2}=\left[{\boldsymbol{m}_{}}-\left\langle{\boldsymbol{m}_{}}\right\rangle\right]{{}^{\mathsf{T}}}\hspace{0.5pt}{\boldsymbol{C}}{{}^{-1}}\hspace{0.5pt}\left[{\boldsymbol{m}_{}}-\left\langle{\boldsymbol{m}_{}}\right\rangle\right], \end{gathered} $$
(6.29)

which is approximately χ 2-distributed with M − m degrees of freedom, where M = dim(m) and m = dim(p). If the track model is exactly linear, i.e., m = Fp with some matrix F, the matrix A is equal to the Moore-Penrose pseudoinverse of F, and the rank of C is m, so that C has m positive eigenvalues while the remaining ones are equal to 0. If the linear approximation to the track model is valid in the entire training sample, C has m large and M − m small eigenvalues. The training sample, therefore, has to be chosen such that this condition is satisfied. In practice, this means that the detector volume has to be partitioned into many small regions, each with its own training sample. The training automatically takes into account material effects, measurement errors, misalignment, and the configuration of the magnetic field.
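A sketch of the training step, Eq. (6.28), assuming a training sample stored as numpy arrays: M holds the measurement vectors of the N tracks in its rows, P the corresponding true parameters. The use of a pseudoinverse, which guards against the M − m small eigenvalues of C mentioned above, is a design choice of this sketch.

```python
import numpy as np

def train_affine_fit(M, P):
    """Estimate A and c of Eq. (6.26) from a training sample, Eq. (6.28)."""
    m_bar, p_bar = M.mean(axis=0), P.mean(axis=0)
    C = (M - m_bar).T @ (M - m_bar) / len(M)      # sample covariance of m
    Cpm = (P - p_bar).T @ (M - m_bar) / len(M)    # <p m^T> - <p><m^T>
    A = Cpm @ np.linalg.pinv(C)                   # pseudoinverse handles small eigenvalues
    c = p_bar - A @ m_bar
    return A, c, C

def affine_chi2(m, m_bar, C):
    """Goodness-of-fit statistic of Eq. (6.29), with C inverted by pseudoinverse."""
    d = m - m_bar
    return d @ np.linalg.pinv(C) @ d
```

The estimate for a new measurement vector m is then simply A @ m + c, Eq. (6.26).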

2 Robust and Adaptive Fitting

2.1 Robust Regression

The LS regression described in Sect. 6.1.1 is not robust in the sense that outliers in the track candidate lead to a significant distortion of the estimated parameters. One way to cope with this problem is to look for outliers in the standardized residuals of the regression and to remove them; see Sect. 6.4.2. A more elegant way is to make the regression robust by minimizing an objective function that is different from the sum of squared differences between the model and the measurements.

Three important approaches to robust regression are Least Median of Squares (LMS), Least Trimmed Squares (LTS), and M-estimators [18, 19]. LMS regression is difficult to compute in the context of track fitting; it is statistically less efficient than either LS or LTS, and there is no simple prescription for the computation of the covariance matrix of the estimated parameters. The computation of the LTS estimator requires the solution of a combinatorial optimization problem and is therefore rather time-consuming. The M-estimator, on the other hand, can be implemented as an iterated re-weighted LS regression and is therefore an excellent method for a robust track fit. The re-weighting can be done on single components of the measurement vectors m i or on entire measurement vectors.

An M-estimator is characterized by a function ψ(z) that determines the corresponding weight function ω(z) = ψ(z)∕z, where z is the standardized residual of a measurement. Setting ψ(z) = z yields the LS estimator. Table 6.1 and Fig. 6.1 show three examples of ψ and weight functions from the literature.

Fig. 6.1 The weight functions of the three M-estimators in Table 6.1. (a) Huber type with c = 2. (b) Tukey's biweight with c = 3. (c) Adaptive with c = 3

Table 6.1 The ψ functions and the corresponding weight functions ω of three M-estimators. c and T are constants. Further explanations can be found in the text

The constant c controls the shape of the weight function. The weight function of Huber’s M-estimator [19] is equal to one in the interval [−c, c] and slowly drops to zero outside the interval (see Fig. 6.1a). Tukey’s biweight [20] is a redescending estimator for which the weight function is equal to zero outside the interval [−c, c] (see Fig. 6.1b). For the adaptive M-estimator [21], c is the point where the weight function is equal to 0.5. This estimator has an additional control parameter T that modifies the transition from large to small weight and can be used for implementing an annealing procedure; see Sect. 6.2.2. In the limit of T → 0, the estimator is redescending, as the weight function approaches a step function that drops from 1 to 0 at z = c (see Fig. 6.1c). The computation of the M-estimator is summarized in Table 6.2. In order to be less sensitive to multiple scattering, it uses only the measurement component s M of the standardized residuals; see Eq. (6.18).

Table 6.2 Algorithm: Track fit with M-estimator
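For illustration, a minimal sketch of the iterated re-weighted LS loop in the spirit of Table 6.2, here with Huber weights. For simplicity, it standardizes the total residuals by the measurement errors rather than using s M as prescribed in the text; the interface follows the Gauss-Newton sketch of Sect. 6.1.1 and is an assumption.

```python
import numpy as np

def huber_weight(z, c=2.0):
    """Huber weight function of Table 6.1: 1 on [-c, c], c/|z| outside."""
    z = np.abs(z)
    return np.where(z <= c, 1.0, c / z)

def m_estimator_fit(m, V, f, jacobian, p0, n_iter=5, c=2.0):
    p, w = np.asarray(p0, dtype=float), np.ones(len(m))
    for _ in range(n_iter):
        # re-weighted LS step: down-weighted components get inflated errors
        Vw = V / np.outer(np.sqrt(w), np.sqrt(w))
        G = np.linalg.inv(Vw)
        F = jacobian(p)
        C = np.linalg.inv(F.T @ G @ F)
        p = p + C @ F.T @ G @ (m - f(p))
        # recompute the weights from the standardized residuals
        s = (m - f(p)) / np.sqrt(np.diag(V))
        w = huber_weight(s, c)
    return p, C, w
```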

2.2 Deterministic Annealing Filter

The deterministic annealing filter (DAF) is an adaptive version of the standard Kalman filter [22]. It can modify the influence of outlying measurements by assigning them a smaller weight. It can also deal with the case that two (or more) measurements in the same detector layer are tagged by the track finder as valid candidates for inclusion in the track. This is particularly relevant in the LHC experiments, given the high track density in the central tracker. It is then up to the track fit to decide which of the competing measurements, if any, is most compatible with the local track state. Another use case is the choice between two mirror hits in a drift chamber. The DAF is implemented in GENFIT [9,10,11,12].

The measurements in layer k are denoted by m kj, j = 1, …, M k, and their covariance matrices by V kj. The DAF is implemented as an iterated Kalman filter plus smoother with annealing; see Table 6.3 for the basic sequence and [23, 24] for implementation details. In each layer, the state vector is updated with a weighted mean of the measurements, using the covariance matrices of the current iteration.

Table 6.3 Algorithm: Deterministic annealing filter
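As an illustration, the following sketch computes annealed assignment weights for the competing measurements in one layer from their chi-square distances to the current track state. The exact functional form and the cut parameter vary between implementations; this is one common choice from the annealing-filter literature, not the definitive DAF formula.

```python
import numpy as np

def daf_weights(chi2, T, chi2_cut=9.0):
    """Annealed assignment weights of competing measurements in one layer.

    chi2: chi-square distances of the measurements to the current state;
    T: annealing temperature; chi2_cut: cut-off term competing with all hits."""
    num = np.exp(-0.5 * np.asarray(chi2) / T)
    den = num.sum() + np.exp(-0.5 * chi2_cut / T)
    return num / den
```

For large T all measurements receive similar weights; as T is lowered towards 1, incompatible measurements are progressively frozen out.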

2.3 Gaussian-Sum Filter

The Kalman filter is (near) optimal if the track model is (approximately) linear, and both system and measurement noise are (approximately) Gaussian; see Sect. 3.2.3. If the noise is long-tailed or highly asymmetric, the Gaussian-sum filter (GSF) is an alternative estimator. It can be applied to a broad class of non-Gaussian distributions by allowing all densities involved to be mixtures of normal PDFs or Gaussian sums. The GSF can be applied in the following use cases:

  1. Long-tailed measurement errors. The distribution of the measurement errors is contaminated by frequent outliers and can be modelled by a mixture of two Gaussians, the “core” and the “tails” [25].

  2. Thin scatterers. The distribution of the multiple scattering angle in thin layers is non-Gaussian because of its long tails [26], but can be approximated by a Gaussian sum with two components [27]; see Sect. 4.5.1.

  3. Inhomogeneous scatterers. Multiple scattering in an inhomogeneous material is often treated by computing an average thickness and an average radiation length (Eq. (4.79)). With the GSF, it is possible to describe the angular distribution by a Gaussian sum, with one or two components for each type of material [28].

  4. Energy loss by bremsstrahlung. The distribution of the energy loss caused by bremsstrahlung of electrons is very far from being Gaussian. While the Kalman filter is restricted to using the first two moments (mean and variance) of the distribution, an approximation by a normal mixture allows the GSF to take into account more details of the shape of the energy loss PDF [29, 30]; see Sect. 4.5.3.

In the GSF, the PDF of the state vector can be a Gaussian sum at every surface. First assume that the surface is a material surface and that the predicted state vector q at the entry of the surface has the following normal mixture PDF with J components:

$$\displaystyle \begin{gathered} p_0({\boldsymbol{q}_{}})=\sum_{j=1}^J \pi_{j}\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{q}_{}};{\boldsymbol{q}_{j}},{\boldsymbol{C}_{ j}}),\ \; \sum_{j=1}^J \pi_{j}=1, \end{gathered} $$
(6.31)

where φ (q;q j, C j) is the normal PDF with mean q j and covariance matrix C j, j = 1, …, J. The process noise γ in the surface is modeled by a normal mixture as well (see also Sect. 3.2.3.2):

$$\displaystyle \begin{gathered} g({\boldsymbol{\gamma}_{}})=\sum_{m=1}^M \omega_m\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{\gamma}_{}};{\boldsymbol{g}_{m}},{\boldsymbol{Q}_{m}}),\ \; \sum_{m=1}^M \omega_m=1. \end{gathered} $$
(6.32)

Note that both J and M may be equal to one. Then the PDF of the state vector at the exit of the surface is given by the following normal mixture with J × M components:

$$\displaystyle \begin{gathered} p_1({\boldsymbol{q}_{}})=\sum_{j=1}^J\sum_{m=1}^M \pi_{j}\hspace{0.5pt}\omega_m\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{q}_{}};{\boldsymbol{q}_{j}}+{\boldsymbol{g}_{m}},{\boldsymbol{C}_{ j}}+{\boldsymbol{Q}_{m}}). \end{gathered} $$
(6.33)

Now assume that the surface is a measurement surface and that the distribution of the measurement error of measurement m is modeled by a normal mixture:

$$\displaystyle \begin{gathered} g({\boldsymbol{\varepsilon}_{}})=\sum_{m=1}^M \omega_m\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{\varepsilon}_{}};\mathbf{0},{\boldsymbol{V}_{ m}}),\ \; \sum_{m=1}^M \omega_m=1. \end{gathered} $$
(6.34)

Again, J and M may be equal to one. The PDF of the updated state vector is given by:

$$\displaystyle \begin{gathered} p_1({\boldsymbol{q}_{}})=\sum_{j=1}^J\sum_{m=1}^M \eta_{jm}\hspace{0.5pt} \varphi\hspace{0.5pt}({\boldsymbol{q}_{}};{\boldsymbol{q}_{jm}},{\boldsymbol{C}_{ jm}}), \end{gathered} $$
(6.35)

with

$$\displaystyle \begin{gathered} \eta_{jm}\propto\pi_{j}\hspace{0.5pt}\omega_m\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{m}_{}};{\boldsymbol{h}_{}}({\boldsymbol{q}_{j}}),{\boldsymbol{V}_{ m}}+{\boldsymbol{H}}\hspace{0.5pt}{\boldsymbol{C}_{ j}}\hspace{0.5pt}{\boldsymbol{H}}{{}^{\mathsf{T}}}), \ \; \sum_{j=1}^J\sum_{m=1}^M \eta_{jm}=1. \end{gathered} $$
(6.36)

The mean q jm and the covariance matrix C jm are obtained by the Kalman filter update of component j of the predicted PDF p 0(q) with component m of the measurement error PDF g(ε), see Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32).

Finally, assume that there are M measurements m m in the measurement surface, with weights ω m, m = 1, …, M. In the absence of prior information, all weights are set to 1∕M. The observation m can then be modeled by the following normal mixture PDF:

$$\displaystyle \begin{gathered} g({\boldsymbol{m}_{}})=\sum_{m=1}^M \omega_m\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{m}_{}};{\boldsymbol{m}_{m}},{\boldsymbol{V}_{ m}}),\ \; \sum_{m=1}^M \omega_m=1. \end{gathered} $$
(6.37)

The PDF of the updated state vector is given by:

$$\displaystyle \begin{gathered} p_1({\boldsymbol{q}_{}})=\sum_{j=1}^J\sum_{m=1}^M \eta_{jm}\hspace{0.5pt} \varphi\hspace{0.5pt}({\boldsymbol{q}_{}};{\boldsymbol{q}_{jm}},{\boldsymbol{C}_{ jm}}), \end{gathered} $$
(6.38)

with

$$\displaystyle \begin{gathered} \eta_{jm}\propto\pi_{j}\hspace{0.5pt}\omega_m\hspace{0.5pt}\varphi\hspace{0.5pt}({\boldsymbol{m}_{m}};{\boldsymbol{h}_{}}({\boldsymbol{q}_{j}}),{\boldsymbol{V}_{ m}}+{\boldsymbol{H}}\hspace{0.5pt}{\boldsymbol{C}_{ j}}\hspace{0.5pt}{\boldsymbol{H}}{{}^{\mathsf{T}}}), \ \; \sum_{j=1}^J\sum_{m=1}^M \eta_{jm}=1. \end{gathered} $$
(6.39)

The mean q jm and the covariance matrix C jm are obtained by a Kalman filter update of component j of the predicted PDF p 0(q) with component m of the observation PDF g(m). The resulting GSF is basically a combinatorial Kalman filter in which each track candidate carries an additional weight that indicates how likely it is compared to the other candidates. This version of the GSF can be used for track finding; in this case, a missing hit with large errors should be added in each layer [31]. Its weight can reflect the hit efficiency of the measurement device.

In principle, the number of components rises exponentially in the course of the GSF, and it is necessary to reduce their number whenever it exceeds a threshold K set by the user. The simplest way of reducing the number of components is to keep the K components with the largest weights, drop the remaining ones, and renormalize the weights to a sum of one. A more sophisticated approach is to search for clusters among the components and to collapse the components in a cluster to a single Gaussian with the same mean and covariance matrix. Clustering can be based on the similarity between components, as measured by, e.g., the Kullback–Leibler divergence [30]. For a brief review of clustering algorithms, see Sect. 3.3. The choice of the threshold K and the clustering procedure have to be optimized by simulation studies.
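A sketch of the moment-matching step that collapses a cluster of components to a single Gaussian with the same mean and covariance matrix; the function name is illustrative.

```python
import numpy as np

def collapse_components(weights, means, covs):
    """Replace a cluster of Gaussian components by a single Gaussian
    with the same first two moments (moment matching)."""
    w = np.asarray(weights) / np.sum(weights)
    mu = sum(wi * mi for wi, mi in zip(w, means))
    C = sum(wi * (Ci + np.outer(mi - mu, mi - mu))
            for wi, mi, Ci in zip(w, means, covs))
    return mu, C
```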

Even for moderate values of K, the GSF is significantly slower than the Kalman filter; it is, therefore, used mainly for special applications such as the track fit of electrons with non-negligible bremsstrahlung [30, 32, 33].

3 Linear Approaches to Circle and Helix Fitting

3.1 Conformal Mapping Method

The conformal transformation described in Sect. 5.1.1 can be generalized to deal with circles passing close to the origin [34]. It maps such a circle to a circle with very small curvature, which in turn can be well approximated by a parabola:

$$\displaystyle \begin{gathered} v = \frac{1}{2b} - \frac{a}{b} \cdot u - \epsilon \left(\frac{R}{b}\right)^3 \cdot u^2, \end{gathered} $$
(6.40)

where \(\epsilon = R - \sqrt {a^2 + b^2}\) is the impact parameter. A standard parabola fit to the measurements in the transformed (u, v)-coordinates yields the parameters A, B and C according to

$$\displaystyle \begin{gathered} v = A +Bu+Cu^2, \end{gathered} $$
(6.41)

and the circle parameters are therefore given by

$$\displaystyle \begin{gathered} b=\frac{1}{2A}, \ \; a=-bB, \ \; \epsilon = -C \cdot \frac{b^3}{(a^2+b^2)^{3/2}}, \end{gathered} $$
(6.42)

using the approximation \(R \approx \sqrt {a^2 + b^2}\) in the expression of 𝜖 [34]. Following Gluckstern [35], it is also possible to obtain expressions of the estimated errors of the circle parameters [34].

3.2 Chernov and Ososkov’s Method

The task of fitting a circular track to a set of measurements is tantamount to minimizing the function

$$\displaystyle \begin{gathered} {\chi^2} = \sum_{i=1}^n {d_i}^2, \end{gathered} $$
(6.43)

where d i are measurement residuals orthogonal to the particle trajectory:

$$\displaystyle \begin{gathered} d_i = \pm \left[ \sqrt{(x_i - a)^2 + (y_i - b)^2} - R \right], \ i=1,\ldots,n, \end{gathered} $$
(6.44)

where a, b, and R are the coordinates of the circle centre and the radius. The approach of Chernov and Ososkov [36] is to simplify this non-linear minimization problem by introducing an approximate expression for the residuals d i,

$$\displaystyle \begin{gathered} d_i \approx \pm \left[ (x_i - a)^2 + (y_i - b)^2 - R^2 \right]/2R, \end{gathered} $$
(6.45)

which holds true with high precision as long as the residuals are small compared to the circle radius. The equations obtained by differentiating χ 2 with respect to the circle parameters and setting the derivatives to zero are quartic (polynomial equations of degree 4) and can be solved efficiently by a standard Newton iteration procedure.

3.3 Karimäki’s Method

Karimäki’s approach [37] starts from the simplified expression of the residuals d i introduced by Chernov and Ososkov [36] and considers a χ 2 with weighted residuals:

$$\displaystyle \begin{gathered} {\chi^2} = \sum_{i=1}^n w_i\hspace{0.5pt} {d_i}^2. \end{gathered} $$
(6.46)

The weights can, for instance, incorporate the measurement uncertainties if these are not the same for all measurements (x i, y i).

The χ 2 function is minimized with respect to a set of circle parameters chosen to have approximately Gaussian errors: the curvature κ (the inverse radius of curvature), the impact parameter 𝜖 (the distance from the origin to the point of closest approach of the fitted circle), and the direction ϕ of the tangent of the circle at the point of closest approach. Using this set of parameters, the simplified residuals are expressed as

$$\displaystyle \begin{gathered} d_i = \frac{1}{2}\hspace{0.5pt} \kappa\hspace{0.5pt} {r_i}^2 - (1 + \kappa \epsilon)\hspace{0.5pt} r_i \sin (\phi - \phi_i) + \frac{1}{2} \kappa \epsilon^2 + \epsilon , \end{gathered} $$
(6.47)

where r i and ϕ i are the polar coordinates of measurement i. The residuals can be written as d i = (1 + κ𝜖) η i, with

$$\displaystyle \begin{gathered} \eta_i = \gamma {r_i}^2 - r_i \sin (\phi - \phi_i) + \delta , \end{gathered} $$
(6.48)

and

$$\displaystyle \begin{gathered} \gamma = \frac{\kappa}{2(1 + \kappa \epsilon)}, \ \; \delta = \frac{1 + \kappa \epsilon/2}{1 + \kappa \epsilon} \epsilon. \end{gathered} $$
(6.49)

Using these definitions, the χ 2 can be written as

$$\displaystyle \begin{gathered} {\chi^2} = (1 + \kappa \epsilon)^2\hspace{0.5pt} \tilde{\chi}^2, \end{gathered} $$
(6.50)

where \(\tilde {\chi }^2 = \sum _i w_i \eta _i^2\). With the approximation \(1 + \kappa \epsilon \approx 1\), \(\tilde {\chi }^2\) can be minimized instead of χ 2, leading to the attractive feature of a set of equations with explicit solutions:

$$\displaystyle \begin{aligned} \phi & = \frac{1}{2} \arctan\hspace{0.5pt}(2q_1/q_2), \end{aligned} $$
(6.51)
$$\displaystyle \begin{aligned} \gamma & = \left( \sin \phi \cdot C_{x z} - \cos \phi \cdot C_{yz} \right) / C_{z z}, \end{aligned} $$
(6.52)
$$\displaystyle \begin{aligned} \delta & = - \gamma \langle z \rangle + \sin \phi \langle x \rangle - \cos \phi \langle y \rangle, \end{aligned} $$
(6.53)

where q 1 = C zz C xy − C xz C yz and \(q_2= C_{z z}\hspace{0.5pt} ( C_{xx} - C_{yy}) - C_{x z}^2 + C_{y z}^2\), and the angle brackets 〈〉 denote a weighted average, e.g., \(\langle x \rangle = \sum _i w_i x_i / \sum _i w_i\). The variances and covariances of the measurements x, y and z = x 2 + y 2 are given by

$$\displaystyle \begin{gathered} \begin{array}{lll} C_{xx} = \langle x^2 \rangle - \langle x \rangle^2, & C_{xy} = \langle xy \rangle - \langle x \rangle \langle y \rangle, & C_{yy} = \langle y^2 \rangle - \langle y \rangle^2, \\ C_{xz} = \langle xz \rangle - \langle x \rangle \langle z \rangle, & C_{yz} = \langle yz \rangle - \langle y \rangle \langle z \rangle, & C_{z z} = \langle r^4 \rangle - \langle z \rangle^2. \end{array} \end{gathered} $$
(6.54)

The curvature κ and impact parameter 𝜖 are given by

$$\displaystyle \begin{gathered} \kappa = \frac{2 \gamma}{\sqrt{1 - 4 \delta \gamma}}, \ \; \epsilon = \frac{2 \delta}{1 + \sqrt{1 - 4 \delta \gamma}}. \end{gathered} $$
(6.55)

Expressions of the uncertainties of the estimated parameters are available and can be found in [37].
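The explicit solutions make the method straightforward to implement. The following sketch follows Eqs. (6.51)–(6.55); the use of np.arctan2 to resolve the quadrant ambiguity of the arctangent in Eq. (6.51) is a choice of this sketch.

```python
import numpy as np

def karimaki_fit(x, y, w):
    """Karimaki's non-iterative circle fit, Eqs. (6.51)-(6.55)."""
    w = w / np.sum(w)
    z = x**2 + y**2
    avg = lambda a: np.sum(w * a)                       # weighted average <.>
    xm, ym, zm = avg(x), avg(y), avg(z)
    Cxx, Cyy, Cxy = avg(x*x) - xm**2, avg(y*y) - ym**2, avg(x*y) - xm*ym
    Cxz, Cyz, Czz = avg(x*z) - xm*zm, avg(y*z) - ym*zm, avg(z*z) - zm**2
    q1 = Czz * Cxy - Cxz * Cyz                          # Eq. (6.51)
    q2 = Czz * (Cxx - Cyy) - Cxz**2 + Cyz**2
    phi = 0.5 * np.arctan2(2.0 * q1, q2)                # arctan2 fixes the quadrant
    gamma = (np.sin(phi) * Cxz - np.cos(phi) * Cyz) / Czz      # Eq. (6.52)
    delta = -gamma * zm + np.sin(phi) * xm - np.cos(phi) * ym  # Eq. (6.53)
    root = np.sqrt(1.0 - 4.0 * delta * gamma)
    kappa = 2.0 * gamma / root                          # Eq. (6.55)
    eps = 2.0 * delta / (1.0 + root)
    return phi, kappa, eps
```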

3.4 Riemann Fit

The Riemann circle fit [38] is based on the fundamental theorem in complex analysis that circles and lines in the plane correspond one-to-one to circles on the Riemann sphere. Since a circle on a sphere is the intersection of a plane with the sphere, there is a one-to-one correspondence between circles and lines in the plane and planes in space. The problem of fitting a circle to a set of measurements in the plane can, therefore, be transformed into the problem of fitting the transformed measurements to a plane in space. The latter problem can be solved directly by non-iterative methods.

The mapping of a point (u i, v i) in the plane to the transformed point (x i, y i, z i) on the Riemann sphere is given by

$$\displaystyle \begin{aligned} x_i & = u_i/(1 + {u_i}^2 + {v_i}^2), \\ y_i & = v_i/(1 + {u_i}^2 + {v_i}^2), \\ z_i & = ({u_i}^2 + {v_i}^2)/(1 + {u_i}^2 + {v_i}^2). \end{aligned} $$
(6.56)

The denominator in the expressions of the transformed measurements leads to small distances between the transformed measurements and the fitted plane for large radii \(R_i = \left ({u_i}^2 + {v_i}^2\right )^{1/2}\) in the plane. In an attempt to satisfy the Gauss-Markov conditions as closely as possible, a radius-dependent scaling factor was introduced in the fitting procedure in [39]. It was realized in [40] that this scaling factor could be omitted by mapping the points in the plane to a paraboloid rather than the Riemann sphere,

$$\displaystyle \begin{gathered} x_i = u_i,\ \; y_i = v_i,\ \; z_i = {u_i}^2 + {v_i}^2, \end{gathered} $$
(6.57)

leading to the same values of the estimated parameters if the measurements are at fixed radial positions.

Fitting a plane in space to the n measurements on the paraboloid is tantamount to minimizing the objective function

$$\displaystyle \begin{gathered} {\mathcal{S}}(c,{\boldsymbol{n}}) = \sum_{i=1}^n \frac{(c + n_1 x_i + n_2 y_i + n_3 z_i)^2}{\sigma_i^2} = \sum_{i=1}^n \frac{{d_i}^2}{\sigma_i^2} \end{gathered} $$
(6.58)

with respect to c and n = (n 1, n 2, n 3)T with the constraint that n is a unit vector. This is achieved by choosing n as the unit eigenvector corresponding to the smallest eigenvalue of the sample covariance matrix A of the measurements:

$$\displaystyle \begin{gathered} {{\boldsymbol{A}}}=\sum_{i=1}^n w_i\hspace{0.5pt}({\boldsymbol{x}}_i-\bar{\boldsymbol{x}})\hspace{0.5pt}({\boldsymbol{x}}_i-\bar{\boldsymbol{x}}){{}^{\mathsf{T}}},\ \;\mathrm{with}\ \;{\boldsymbol{x}}_i=(x_i,y_i,z_i){{}^{\mathsf{T}}}. \end{gathered} $$
(6.59)

The constant c is equal to \(c=-{\boldsymbol{n}}{{}^{\mathsf{T}}}\hspace{0.5pt}\bar{\boldsymbol{x}}\), with the centre of gravity vector \(\bar{\boldsymbol{x}}=\sum _{i=1}^n w_i\hspace{0.5pt}{\boldsymbol{x}}_i\) and the weights

$$\displaystyle \begin{gathered} w_i = \frac{1/\sigma_i^2}{\sum_{j=1}^n 1/\sigma_j^2},\ i=1,\ldots,n. \end{gathered} $$
(6.60)

Given the parameters of the fitted plane, a suitable set of circle parameters can be derived. For example, the parameters chosen in Sect. 6.3.3 are given by:

$$\displaystyle \begin{aligned} \phi & = \arctan \left( \frac{n_2}{n_1} \right), \end{aligned} $$
(6.61)
$$\displaystyle \begin{aligned} \kappa & = s\cdot \frac{2 n_3}{\sqrt{1 - n_3^2 -4 c n_3}}, \end{aligned} $$
(6.62)
$$\displaystyle \begin{aligned} \epsilon & = s\cdot \frac{\sqrt{1 - n_3^2 -4 c n_3} - \sqrt{1 - n_3^2 }}{2 n_3}, \end{aligned} $$
(6.63)

up to a sign s = ±1. One possible convention of determining s is given in [41].

Expressions of the uncertainties of the estimated circle parameters are given in [41] for measurement uncertainties both in the transverse and in the radial direction. Effects of multiple Coulomb scattering can also be included in this approach, essentially by modifying Eq. (6.59) to include correlations between all measurements due to multiple scattering [40]. A robust version of the Riemann fit based on LMS regression is proposed in [42].
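A sketch of the paraboloid variant, Eqs. (6.57)–(6.63), assuming uncorrelated measurements with standard deviations σ i and a sign convention s supplied by the caller:

```python
import numpy as np

def riemann_fit(u, v, sigma, s=1.0):
    """Riemann fit in the paraboloid variant, Eqs. (6.57)-(6.63)."""
    w = 1.0 / sigma**2
    w = w / w.sum()                                    # weights, Eq. (6.60)
    X = np.column_stack([u, v, u**2 + v**2])           # mapping, Eq. (6.57)
    x0 = w @ X                                         # centre of gravity
    A = (X - x0).T @ (w[:, None] * (X - x0))           # weighted sample covariance, Eq. (6.59)
    n = np.linalg.eigh(A)[1][:, 0]                     # eigenvector of smallest eigenvalue
    c = -n @ x0                                        # plane offset
    n1, n2, n3 = n
    phi = np.arctan2(n2, n1)                           # Eq. (6.61)
    root = np.sqrt(1.0 - n3**2 - 4.0 * c * n3)
    kappa = s * 2.0 * n3 / root                        # Eq. (6.62)
    eps = s * (root - np.sqrt(1.0 - n3**2)) / (2.0 * n3)   # Eq. (6.63)
    return phi, kappa, eps
```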

3.5 Helix Fitting

Linearized helix fitting can be done by first estimating the parameters of the circle that results from the projection of the helix on the transverse (bending) plane. Any of the methods described above can be used for this. If the detector system at hand is of barrel type, so that the radial positions of the measurements are known to very high precision, the path lengths to the intersections between the fitted circle and the detector elements can be obtained from the circle parameters. For the Riemann circle fit, these path lengths can be found directly from the parameters of the fitted plane [43]. The longitudinal (non-bending) plane parameters can then be found by solving the linear regression model:

$$\displaystyle \begin{gathered} {\boldsymbol{z}} = {{\boldsymbol{A}}} {\boldsymbol{p}} + {\boldsymbol{\varepsilon}}, \ \; {{\boldsymbol{A}}} = \begin{pmatrix} 1 & s_1 \\ \vdots & \vdots \\ 1 & s_n \end{pmatrix}, \end{gathered} $$
(6.64)

where z is the vector of z measurements and the parameter vector p is given by:

$$\displaystyle \begin{gathered} {\boldsymbol{p}} = \begin{pmatrix} z \\ \tan \lambda \end{pmatrix}, \ \; {\mathsf{Var}\left[{\boldsymbol{\varepsilon}}\right]}={\boldsymbol{V}}_z, \end{gathered} $$
(6.65)

where the dip angle λ = π∕2 − θ is the complement of the polar angle θ, and V z is the covariance matrix of the measurements (containing contributions from multiple scattering, if desired). From the fitted parameters, θ and z in the innermost layer can be immediately obtained.

In a forward-type detector, the z positions are known very precisely, whereas the radial positions are the measured quantities. In this case, the regression is of s on z. A suitable regression model is then:

$$\displaystyle \begin{gathered} {\boldsymbol{s}} = {{\boldsymbol{A}}} {\boldsymbol{p}} + {\boldsymbol{\varepsilon}}, \ \; {{\boldsymbol{A}}} = \begin{pmatrix} 1 & z_1 \\ \vdots & \vdots \\ 1 & z_n \end{pmatrix}, \ \; {\boldsymbol{p}} = \begin{pmatrix} s \\ \tan \theta \end{pmatrix}, \ \; {\mathsf{Var}\left[{\boldsymbol{\varepsilon}}\right]}={\boldsymbol{V}}_R, \end{gathered} $$
(6.66)

since the covariance matrix of s is a very good approximation to the covariance matrix of R. From the fitted parameters, the polar angle θ of the track and the s-values in all layers can be immediately determined, and, if desired, the predicted radial positions of all measurements [43]. If higher precision is needed, the circle and line fits can be iterated.
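A sketch of the longitudinal fit for the barrel-type case, Eqs. (6.64) and (6.65); the forward-type case of Eq. (6.66) is obtained by swapping the roles of s and z.

```python
import numpy as np

def longitudinal_fit(s, z, V_z):
    """Line fit in the longitudinal plane, Eqs. (6.64) and (6.65)."""
    A = np.column_stack([np.ones_like(s), s])      # design matrix
    G = np.linalg.inv(V_z)
    C = np.linalg.inv(A.T @ G @ A)                 # covariance of the estimate
    z0, tan_lambda = C @ A.T @ G @ z               # LS estimate of p
    return z0, tan_lambda, C
```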

4 Track Quality

4.1 Testing the Track Hypothesis

The principal test statistic of the track hypothesis, i.e., of the hypothesis that all measurements in the track are generated by the same charged particle, is the total χ 2 of the track. It is exactly χ 2-distributed if, and only if, the following conditions are met:

  1. the track model is exactly linear;

  2. the measurement errors are normally distributed with mean zero and have the correct covariance matrix;

  3. the material effects are normally distributed and have the correct covariance matrix;

  4. the estimator is the LS estimator and thus a linear function of the measurements.

Obviously, these conditions are met very rarely, if ever, in the experiment. In most circumstances, the track model is the linear approximation of a non-linear one; the measurement errors are not strictly normal, and the calibration of their covariance matrix is not perfect; the distribution of the multiple scattering angle has tails that contradict the assumption of normality; the estimated track parameters may be distorted by outliers; and the estimator may be a robust version of the usual LS estimator. As a consequence, the best one can hope for is that the total χ 2 is at least approximately χ 2-distributed. Its distribution for a sample of tracks can be visualized by a histogram of the p-values, defined by:

$$\displaystyle \begin{gathered} p=\int_{{\chi^2}}^{\infty} g_{d}(x)\mathrm{d} x, \end{gathered} $$
(6.67)

where g d(x) is the PDF of the χ 2-distribution with d degrees of freedom. The number of degrees of freedom is the sum M of the measurement dimensions m i minus the dimension of the track parameter vector. In the ideal case, the p-values are uniformly distributed in the interval [0, 1]. In practice, one frequently observes a fairly uniform distribution with a peak at zero, where defective and fake tracks accumulate.
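The p-value of Eq. (6.67) is the survival function of the χ²-distribution and is available in standard libraries, for instance:

```python
from scipy.stats import chi2

def track_p_value(chi2_tot, n_dof):
    """p-value of Eq. (6.67): survival function of the chi-square distribution."""
    return chi2.sf(chi2_tot, n_dof)

# Example: chi2.sf(11.345, 3) is approximately 0.01,
# the 99%-quantile cut used in the example below.
```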

Besides the total chi-square statistic, the track length and the number of holes or missing measurements are indications of the track quality. As the number of degrees of freedom of the track fit is the same as the number of geometrical constraints imposed on the measurements, a long track is much less likely to be a fake track than a short track. On the other hand, an outlier has less effect on the total χ 2 in a long track than it has in a short track. This can be demonstrated by a simple example. Assume a perfect sample of tracks with four measurements of dimension two. The fit of five track parameters leaves three degrees of freedom. A χ 2-cut at the 99%-quantile q 0.99,3 = 11.345 rejects 1% of the tracks. Assume that one of the measurements is replaced by an outlier, thereby increasing the total χ 2 of every track by 3. Now the same cut rejects 4% of the tracks. Under the same assumptions, but with ten measurements and 15 degrees of freedom, the cut rejects only 2.4% of the tracks with outliers.

If the efficiency of the tracking detectors or sensors is known with good precision, a rigorous test on the allowed number of holes can be constructed. If, for the sake of simplicity, it is assumed that the efficiency 𝜖 is the same for all sensors contributing hits to a track candidate, and that the occurrence of holes is independent across sensors, the number h of holes in a track with n measurements is distributed according to a binomial distribution:

$$\displaystyle \begin{gathered} P(h)=\binom{n}{h}\,(1-\epsilon)^h \epsilon^{n-h}. \end{gathered} $$
(6.68)

If 𝜖 = 0.98 and n = 15, then P(1) = 0.23 and P(2) = 0.032, so a single hole is not suspicious at all, and two holes are unlikely, but not impossible. If n = 6, then P(1) = 0.11 and P(2) = 0.0055, so a single hole is possible, but the occurrence of two holes just by chance is very unlikely, in which case the suspicion of a fake track or a contaminated track is well-founded.

4.2 Detection of Outliers

In the context of track fitting, an outlier is defined as a measurement that does not follow the expected behaviour. This may be put into statistical terms by saying that a measurement is considered as an outlier whenever its distance from the locally estimated track position is too large under the assumption of normal measurement errors, the distance being expressed in terms of the covariance matrix attached to the measurement.

Outliers can be classified into track-correlated and track-uncorrelated ones [44]. Some sources of track-correlated outliers are:

  • Ambiguous measurements. Some tracking detectors, in particular drift chambers, give rise to ambiguous information; see also Sect. 1.2.3. The track search, being less restrictive than a rigorous track fit, is not always able to decide which of the two possible solutions is the correct one, and the decision must be deferred to the track fit. In the track fit, the wrong solution is regarded as an outlier which has to be spotted or suppressed.

  • Delta rays. Delta rays are energetic ionization electrons that leave a trail of secondary ionization in the detector and can cause a shift in the measured position.

  • Cluster merging. In a silicon sensor or in a gaseous detector, two clusters belonging to particles that are close in space may merge to a single cluster that is biased with respect to both true positions.

  • Cluster decay. Similarly, a large cluster may decay into two clusters, which are both biased with respect to the true position.

  • Non-normal measurement errors. Although the bulk of the measurements follows a normal distribution in most tracking detectors, there is nearly always a small fraction of the data that deviate from the normal law. These data show up as long tails in the error distribution and look like outliers.

  • Faulty covariance matrix. The errors attached to the measurement are too small, because of insufficient calibration, fluctuations of the signal, wrong assumptions about the track angle with respect to the sensor, dead channels, or other detector problems.

Track-uncorrelated outliers are signals that are not caused by the track, but are nevertheless picked up by the track search. They may be, for instance, signals from adjacent tracks, ghost hits in double-sided silicon sensors, see Sect. 1.3.1, or noise.

Whatever the source, an outlier can be detected by a test based on the residuals of the measurements with respect to the estimated track position. In the case of a single outlier, the test is most powerful if the estimate contains the information of all the other measurements. This is done most easily in the state space model of the track; see Sects. 3.2.3 and 6.1.2. Let m k be the measurement under scrutiny, r k | n the smoothed residual, and R k | n its covariance matrix; see Eqs. (3.39) and (3.40) and the end of Sect. 6.1.2. The compatibility of m k with the smoothed state \({\boldsymbol {\tilde {q}}}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}\) can be checked component-wise on the basis of the standardized residuals, or globally on the basis of the chi-square statistic:

$$\displaystyle \begin{gathered} {\chi^2}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}{}=\left(\boldsymbol{r}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}\right){{}^{\mathsf{T}}}\left(\boldsymbol{R}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}\right){{}^{-1}}\boldsymbol{r}_{{k{\hspace{0.5pt}|\hspace{0.5pt}}{}n}}. \end{gathered} $$
(6.69)

If there are no outliers, the standardized residuals should be compatible with a standard normal distribution, and \(\chi^2_{k|n}\) should be compatible with a χ 2-distribution with m k degrees of freedom, where m k is the dimension of m k. If m k is an outlier, this should be visible in the values of r k | n and \(\chi^2_{k|n}\). There is, however, a problem with this approach. Even a single outlier at position k introduces a bias in all of the states q i | n, i ≠ k, so that also an inlier at position i ≠ k can show abnormal values of the residuals and \(\chi^2_{i|n}\), especially if i is close to k. As a consequence, it is by no means obvious that m k can be correctly identified as the outlier. The situation is even worse if there are several outliers. In this case, a robust track fit that down-weights outliers, instead of trying to find and remove them, is a better solution; see Sect. 6.2.

4.3 Kink Finding

A charged particle decay that produces a single charged daughter particle plus some neutral ones manifests itself as a sudden change of the track direction and/or curvature, often called a kink or breakpoint. Typical examples are the muonic decays of charged π and K mesons. Another source of kinks is hard elastic scattering on the material of the detector [45]. Collinear energy loss of an electron by bremsstrahlung does not result in a kink, but only in a change of curvature.

It is characteristic for a kink that the track segments in front of and behind the kink both give a good fit to their respective track model; however, there is a significant difference between the two sets of track parameters estimated from the two track segments. In the Kalman filter framework, this difference and its covariance matrix are readily available at any layer k from the forward and the backward filter :

$$\displaystyle \begin{gathered} {\boldsymbol{\varDelta}}_k={\boldsymbol{\tilde{q}}}_{k}-{\boldsymbol{\tilde{q}}}{}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1},\ \;{\mathsf{Var}\left[{\boldsymbol{\varDelta}}_k\right]}={\boldsymbol{C}}_{\varDelta,k}={\boldsymbol{C}_{ k}}+{\boldsymbol{C}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}}. \end{gathered} $$
(6.70)

The associated chi-square statistic is given by:

$$\displaystyle \begin{gathered} {\chi^2}_{\varDelta,k}={\boldsymbol{\varDelta}}_k{{}^{\mathsf{T}}}\hspace{0.5pt}{\boldsymbol{C}}_{\varDelta,k}{{}^{-1}}\hspace{0.5pt}{\boldsymbol{\varDelta}}_k. \end{gathered} $$
(6.71)

In [44] a χ 2-test statistic X 2 for kink finding is investigated:

$$\displaystyle \begin{aligned} X^2=\max_{k\in K}{\chi^2}_{\varDelta,k}, \end{aligned} $$
(6.72)

where the range K of layers is restricted by the requirement that the respective track segments from the forward and the backward filter are well defined. Results from the simulation of π and K decays in a simplified setup are given in [44].
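A sketch of the test, Eqs. (6.70)–(6.72), assuming that the forward updated and backward predicted states and covariances are available for the admissible layers:

```python
import numpy as np

def kink_statistic(q_fwd, C_fwd, q_bwd, C_bwd):
    """X^2 of Eq. (6.72) from per-layer forward/backward differences."""
    chi2 = []
    for qf, Cf, qb, Cb in zip(q_fwd, C_fwd, q_bwd, C_bwd):
        d = qf - qb                                   # Eq. (6.70)
        chi2.append(d @ np.linalg.inv(Cf + Cb) @ d)   # Eq. (6.71)
    k = int(np.argmax(chi2))
    return chi2[k], k                                 # test statistic and candidate kink layer
```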

This simple test does not take into account the specific features of the process leading to the kink. In energy loss by bremsstrahlung, only the curvature changes; in hard elastic scattering, only the direction changes; in a decay, the direction changes and the momentum decreases, or curvature increases. In all three processes, the position of the two track segments has to be compatible. These features are taken into account by the modified track fit described in [45]. At each possible breakpoint, an extended set of track parameters α is defined that allows for sudden changes in a subset of the track parameters. Three cases are considered:

  1. Energy loss by bremsstrahlung. The curvature is allowed to change; therefore, α contains two curvature parameters instead of one, for instance, κ f and κ b.

  2. Hard elastic scattering. Only the direction is allowed to change; therefore, α contains two sets of direction parameters instead of one, for instance \(\tan {\lambda _{\mathrm {f}}},{\phi _{\mathrm {f}}}\) and \(\tan {\lambda _{\mathrm {b}}},{\phi _{\mathrm {b}}}\).

  3. Muonic decays of π and K mesons. Direction and curvature are allowed to change; therefore, α contains two sets of the corresponding parameters instead of one, for instance, \(\tan {\lambda _{\mathrm {f}}},{\phi _{\mathrm {f}}},{\kappa _{\mathrm {f}}}\) and \(\tan {\lambda _{\mathrm {b}}},{\phi _{\mathrm {b}}},{\kappa _{\mathrm {b}}}\).

At layer k, α can be estimated by a linear regression in which \({\boldsymbol {\tilde {q}}}_{k}\) and \({\boldsymbol {\tilde {q}}}{ }^{\,\mathrm {b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}\) play the role of the observations. This is equivalent to the minimization of the following objective function:

$$\displaystyle \begin{aligned} {\mathcal{S}}({\boldsymbol{\alpha}}{})&=\left({\boldsymbol{\tilde{q}}}_{k}-{{\boldsymbol{H}}_{\mathrm{f}}}\hspace{0.5pt}{\boldsymbol{\alpha}}{}\right){{}^{\mathsf{T}}}{\boldsymbol{C}_{ k}}{{}^{-1}}\left({\boldsymbol{\tilde{q}}}_{k}-{{\boldsymbol{H}}_{\mathrm{f}}}\hspace{0.5pt}{\boldsymbol{\alpha}}\right)+\\ &\quad +\left({\boldsymbol{\tilde{q}}}{}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}-{{\boldsymbol{H}}_{\mathrm{b}}}\hspace{0.5pt}{\boldsymbol{\alpha}}{}\right){{}^{\mathsf{T}}}\left({\boldsymbol{C}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}}\right){{}^{-1}}\left({\boldsymbol{\tilde{q}}}{}^{\,\mathrm{b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}-{{\boldsymbol{H}}_{\mathrm{b}}}\hspace{0.5pt}{\boldsymbol{\alpha}}\right), \end{aligned} $$
(6.73)

where H f and H b are the matrices that project α on \({\boldsymbol {\tilde {q}}}_{k}\) and \({\boldsymbol {\tilde {q}}}{ }^{\,\mathrm {b}}_{k{\hspace{0.5pt}|\hspace{0.5pt}}{}k+1}\), respectively.

From the estimated vector α and its covariance matrix, standardized forward-backward differences of the relevant parameters can be computed. In addition, an F-test can be performed to test whether the additional parameters result in a significant reduction of the total chi-square statistic. The location of the breakpoint can be determined by the location of the largest discrepancy between forward and backward parameters, as measured by the value of \({\mathcal {S}}({\boldsymbol {\alpha }}{})\) at the minimum. Results of studies of simulated pion decays in the NOMAD detector are shown in [45].

The breakpoint finder in [46] is based on the autocorrelation function of the residuals of the track fit. In an undisturbed track with many measurements, typically in a TPC, the residuals between the measured coordinates and the fitted trajectory are only weakly correlated, so that the autocorrelation function of the residuals is close to zero for arbitrary lags. A breakpoint in the track introduces correlated shifts in all subsequent position measurements, resulting in an autocorrelation function that is significantly different from zero. Assuming 1D position measurements, the average autocorrelation of lag ℓ is defined by:

$$\displaystyle \begin{gathered} \rho_\ell=\left(\sum_{i=1}^{n-\ell}r_i r_{i+\ell}\right)\, \left(\sum_{i=1}^{n-\ell} {r_i}^2 \sum_{i=1}^{n-\ell}r_{i+\ell}^2\right)^{-1/2}, \end{gathered} $$
(6.74)

where r i = δ i∕σ i is the residual δ i of measurement i divided by its standard error σ i, and n is the total number of measurements. The test statistic λ used in [46] is a weighted average of the autocorrelations up to a maximal lag L, such that small lags have larger weight:

$$\displaystyle \begin{gathered} \lambda=\sum_{\ell=1}^L w_\ell\,\rho_\ell,\ \; w_\ell=\frac{2\,(L-\ell)}{L\,(L-1)},\ \; \sum_{\ell=1}^L w_\ell=1. \end{gathered} $$
(6.75)

For the simulated data used in [46], setting L equal to the nearest integer to n∕8 gives the largest power of the test. In general, the maximal lag L and the weights w ℓ must be tuned on simulated data. The threshold of λ above which the null hypothesis (no breakpoint) is rejected is set according to the tolerated percentage of undisturbed tracks that are rejected, i.e., to the probability of an error of the first kind.
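A sketch of the test statistic, Eqs. (6.74) and (6.75), assuming 1D standardized residuals r i = δ i∕σ i in a numpy array:

```python
import numpy as np

def autocorrelation(r, lag):
    """Average autocorrelation of the standardized residuals, Eq. (6.74)."""
    a, b = r[:-lag], r[lag:]
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def breakpoint_statistic(r, L):
    """Weighted average of the autocorrelations, Eq. (6.75)."""
    lags = np.arange(1, L + 1)
    w = 2.0 * (L - lags) / (L * (L - 1))     # weights summing to one
    return sum(wl * autocorrelation(r, l) for wl, l in zip(w, lags))
```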