13.1 Track Reconstruction

13.1.1 Introduction

Track reconstruction is the task of finding and estimating the trajectory of a charged particle, usually embedded in a static magnetic field to determine its momentum and charge. It involves pattern recognition algorithms and statistical estimation methods. Depending on the physics goals, not all charged tracks have to be reconstructed. For instance, in many cases there is a physically motivated lower limit on the momentum or transverse momentum of the particles to be found. Other examples are short-range secondary particles, such as δ-electrons, that normally need not be reconstructed. It may also be useful to reconstruct electron-positron pairs from photon conversions in order to check the distribution of material in the detector. Track reconstruction frequently proceeds in several steps:

  1. Pattern recognition or track finding: Finds the detector signals (hits) that are generated by the same charged particle.

  2. Track fitting: Estimates for each track candidate the track parameters and the associated covariance matrix.

  3. Test of track hypothesis: Tests for each track candidate whether all hits do indeed belong to the track and identifies outliers.

There are many different algorithms for track finding. A selection of them is described in Sects. 13.1.2.1 and 13.1.2.2. For an extended treatment of the subject, containing many examples, see the excellent exposition in [1]. The track fit takes a track candidate and estimates the track parameters (location, direction, momentum or curvature, see Sect. 13.1.3.2) from the detector hits (Sect. 13.1.3.4), taking into account the equation of motion (Sect. 13.1.3.2) in the magnetic field (Sect. 13.1.3.1) and the effects of the detector material on the trajectory (Sect. 13.1.3.3). In the test stage (Sect. 13.1.3.5) outliers are identified, i.e., hits which apparently do not belong to the track. If outliers are expected, the estimation procedure should be robust so that the estimated track is not significantly biased by the outliers. Some robust methods are discussed in Sect. 13.1.3.5. Section 13.1.4 treats track-based alignment, and Sect. 13.1.5 contains many useful formulas for determining the approximate momentum resolution of a tracking detector without extensive simulations.

13.1.2 Pattern Recognition

Pattern recognition or track finding methods can be divided into global and local methods. In a global method, all detector hits are treated on an equal footing, and all track candidates are found in parallel; in a local method, there is a privileged subset of hits which is used to find initial track candidates, which are then completed to full track candidates.

13.1.2.1 Global Methods

Typical global methods of track finding find the tracks in parallel, for instance by identifying peaks in a one- or two-dimensional histogram, or by observing the final state of a recurrent neural network.

13.1.2.1.1 Conformal Mapping

A popular method for finding circular particle tracks is the conformal mapping method [2]. It uses the fact that the mapping

$$\displaystyle \begin{aligned} u = \frac{x}{x^2 + y^2},\quad v = \frac{y}{x^2 + y^2}, \end{aligned}$$

transforms circles going through the origin of an xy coordinate system into straight lines of the form

$$\displaystyle \begin{aligned} v = \frac{1}{2b} - u \frac{a}{b}, \end{aligned}$$

where the parameters a and b are defined by the circle equation

$$\displaystyle \begin{aligned} \left( x - a \right)^2 + \left( y - b \right)^2 = R^2 = a^2+b^2. \end{aligned}$$

The distance of the line to the origin is equal to 1∕(2R), so that for large radius R it passes very close to the origin. The lines can be found by a histogramming method. After transforming the measurements in the uv plane to polar coordinates and collecting the polar angle θ in a histogram, measurements belonging to the same particle will tend to create peaks in the histogram.

As an example, the measured points of six circular tracks are shown in the left hand panel of Fig. 13.1, while the transformed measurements are shown in the right hand panel of Fig. 13.1. The resulting histogram of the polar angle θ is shown in Fig. 13.2.
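The mapping and the histogramming step can be illustrated with a minimal sketch. The circle parameters, the sampled radii and the helper names (`conformal_map`, `points_on_circle_through_origin`) are assumptions chosen for illustration; `arctan2` is used in place of arctan(v∕u) to avoid a division by zero.

```python
import numpy as np

def conformal_map(x, y):
    """Map (x, y) to (u, v) = (x, y) / (x^2 + y^2)."""
    r2 = x**2 + y**2
    return x / r2, y / r2

def points_on_circle_through_origin(a, b, radii):
    """Points on the circle with centre (a, b) through the origin,
    at the given distances (chord lengths) from the origin."""
    R = np.hypot(a, b)
    phi0 = np.arctan2(-b, -a)  # direction of the origin as seen from the centre
    phi = phi0 + 2.0 * np.arcsin(np.asarray(radii) / (2.0 * R))
    return a + R * np.cos(phi), b + R * np.sin(phi)

# Hypothetical track: radius 500, measured between 10 and 100 from the origin.
a, b = 300.0, -400.0
x, y = points_on_circle_through_origin(a, b, np.linspace(10.0, 100.0, 10))
u, v = conformal_map(x, y)

# The mapped points lie on the straight line v = 1/(2b) - u a/b.
line_residual = v - (1.0 / (2.0 * b) - u * a / b)

# Histogram of the polar angle in the uv plane.
theta = np.arctan2(v, u)
counts, edges = np.histogram(theta, bins=16, range=(-np.pi, np.pi))
```

With these values the line passes at a distance 1∕(2R) = 0.001 from the origin, so all ten measurements fall into a single bin of the θ histogram.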

Fig. 13.1 The original measurements (left) and the transformed measurements (right) of six circular tracks

Fig. 13.2 Histogram of θ = arctan(v∕u)

13.1.2.1.2 Hough Transform

In the general case of lines not passing close to the origin, a more general approach is needed in order to find the lines. A very popular method for this purpose is the Hough transform [3]. The principle of the Hough transform can be explained by noting that a straight line in an xy coordinate system, y = cx + d, can also be regarded as a straight line in a cd coordinate system by the transformation d = −xc + y. For a fixed point (x, y), the line in cd space (also denoted parameter space) corresponds to all possible lines going through this point in xy space (also denoted image space). Measurements lying along a straight line in image space therefore transform into lines in parameter space which cross at the specific value of the parameters of the line under consideration in image space. In practice, parameter space is discretized, and each measurement (x, y) leads to an increment of a set of histogram bins. Measurements lying along straight lines tend to create peaks in the histogram, and the lines can be found by searching for peaks in this histogram. The granularity of the discretization has to be optimized for each specific application, as it depends on the amount of noise present and the actual values of measurement uncertainties. A histogram that is too fine-grained can split or destroy peaks if the measurement uncertainties are non-negligible. On the other hand, a histogram that is too coarse-grained increases the sensitivity to noise, and nearby tracks may merge into a single peak.
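A minimal sketch of the discretized transform in the (c, d) parametrization used above is given below. The function name `hough_accumulate`, the bin edges and the toy measurements are assumptions for illustration only; a real application would tune the binning as discussed in the text.

```python
import numpy as np

def hough_accumulate(points, c_edges, d_edges):
    """Fill a (c, d) histogram: each point votes along d = -x c + y."""
    c_centres = 0.5 * (c_edges[:-1] + c_edges[1:])
    d_centres = 0.5 * (d_edges[:-1] + d_edges[1:])
    acc = np.zeros((len(c_centres), len(d_centres)), dtype=int)
    rows = np.arange(len(c_centres))
    for x, y in points:
        d = y - x * c_centres                  # one d value per c bin
        cols = np.digitize(d, d_edges) - 1     # bin index of each d value
        inside = (cols >= 0) & (cols < len(d_centres))
        acc[rows[inside], cols[inside]] += 1
    return acc, c_centres, d_centres

# Toy data: eight points on y = 0.5 x + 1 plus three noise points.
line = [(float(x), 0.5 * x + 1.0) for x in range(1, 9)]
noise = [(2.0, 5.0), (6.0, -1.0), (4.0, 3.7)]
acc, c_centres, d_centres = hough_accumulate(
    line + noise, np.linspace(-2.05, 2.05, 42), np.linspace(-5.1, 5.1, 52))

# The highest bin gives the estimated line parameters.
i, j = np.unravel_index(np.argmax(acc), acc.shape)
c_hat, d_hat = c_centres[i], d_centres[j]
```

The bin edges are chosen here so that the true parameters sit at a bin centre; with an unlucky binning the votes of a single line are shared between neighbouring bins, which is the peak splitting mentioned above.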

The basic formulation of the Hough transform is an example of a divergent transform, i.e., one measurement in image space corresponds to a set of increments of histogram entries in parameter space. The Hough transform can also be made convergent by considering instead a pair of measurements in image space. A unique line passes through any such pair, and only one entry in the parameter space histogram needs to be incremented. A possible disadvantage of such an approach is that the number of pairs grows quadratically with the number of measurements in image space. In order to reduce computational complexity, one may consider only a randomly selected subset of all the pairs. This is the basic feature of probabilistic Hough transforms [4].
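The convergent variant with a random subset of pairs can be sketched as follows. This is only an illustration of the idea, not the algorithm of [4]; the function name `randomized_hough`, the bin widths and the toy data are assumptions.

```python
import random
from collections import Counter

def randomized_hough(points, n_pairs, c_width, d_width, seed=0):
    """Vote once per randomly drawn pair of points for the line y = c x + d
    through the pair; bins are identified by integer indices."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_pairs):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = c x + d
        c = (y2 - y1) / (x2 - x1)
        d = y1 - c * x1
        votes[(round(c / c_width), round(d / d_width))] += 1
    return votes

# Toy data: eight points on y = 0.5 x + 1 plus three noise points.
points = [(float(x), 0.5 * x + 1.0) for x in range(1, 9)]
points += [(2.0, 5.0), (6.0, -1.0), (4.0, 3.7)]
votes = randomized_hough(points, n_pairs=200, c_width=0.1, d_width=0.2)
(best_c_bin, best_d_bin), n_votes = votes.most_common(1)[0]
```

Each pair increments a single bin, so the cost is set by the number of sampled pairs rather than by the size of the parameter space.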

The Hough transform has turned out to be successful also for finding circles passing through the origin. With this constraint, two parameters are enough to uniquely describe the circle, and the task again amounts to finding peaks in a two-dimensional histogram. With three or more parameters, one has to search for clusters in multi-dimensional spaces, and in this case the Hough transform is in general less powerful than in the two-dimensional case.

For track finding in drift tubes, with their inherent left-right ambiguity, the drift circles can be transformed to sine curves in the (r, θ) space by applying a Legendre transform [5]. The peaks at the intersections of several sine curves represent the common tangents to a set of several circles.

13.1.2.1.3 Neural Networks

Recurrent neural networks of the Hopfield type [6] are used in finding solutions to certain kinds of combinatorial optimization problems, i.e., problems that can be formulated as finding the minimum of an energy function

$$\displaystyle \begin{aligned} E = - \frac{1}{2} \sum_i \sum_j T_{ij} S_i S_j \end{aligned}$$

with respect to the configuration of n binary-valued neurons \(S_i\), i = 1, …, n and fixed connection weights \(T_{ij}\), i, j = 1, …, n. It was realized independently in [7] and [8] that the track finding problem can be formulated as a minimization problem of this kind. The neurons are links between measurements which potentially belong to the same track. The connection weights \(T_{ij}\) have a structure which favors links sharing a measurement and pointing in a similar direction. The standard network dynamics leads to a solution corresponding to a local minimum of the energy function. A better approach is to apply a mean-field annealing technique [9], which introduces a temperature parameter and thereby allows the neurons to take all values in the interval between the two original binary values. The network is initialized at a high temperature, the mean-field equations are iteratively solved as the network is cooled down, and the low-temperature limit is taken in the end. At a significantly lower computational effort, the approximate solutions found by the mean-field technique have been shown to be very close to the exact solutions [10]. For applications of the Hopfield network in experiments see e.g. [11,12,13,14,15].
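The annealing loop can be sketched for a generic energy function of the above form. The sketch assumes neurons coded as ±1, for which the mean-field equations read \(v_i = \tanh(\sum_j T_{ij} v_j / T)\), and a symmetric weight matrix with zero diagonal. The toy weights, the cooling schedule and the function names are illustrative assumptions and do not encode a real track finding problem.

```python
import numpy as np

def mean_field_anneal(T, v0, temperatures, sweeps=20):
    """Iterate v_i = tanh(sum_j T_ij v_j / temp) while lowering temp."""
    v = np.array(v0, dtype=float)
    for temp in temperatures:
        for _ in range(sweeps):
            for i in range(len(v)):       # asynchronous (in-place) update
                v[i] = np.tanh(T[i] @ v / temp)
    return v

def energy(T, S):
    """E = -1/2 sum_ij T_ij S_i S_j."""
    return -0.5 * S @ T @ S

# Toy network: four neurons that all favor each other.
T = np.ones((4, 4)) - np.eye(4)
v = mean_field_anneal(T, [0.1, 0.05, 0.02, 0.03], np.geomspace(5.0, 0.05, 20))
S = np.sign(v)            # low-temperature limit: back to binary values
E = energy(T, S)
```

At high temperature the mean-field values stay close to zero; as the temperature drops they saturate at the binary values, here in the configuration of minimal energy.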

The energy function of the Hopfield network can be generalized in order to take into account the track model (see Sect. 13.1.3.2), i.e., the known parametric form of the tracks. The resulting algorithm is called elastic tracking or elastic arms [16,17,18,19]. A related generalization is the elastic net, originally used to tackle the traveling salesman problem [20]. Applications to track finding are described in [21] and [22, 23].

13.1.2.2 Local Methods

A local track finding method finds the tracks sequentially, starting from an initial track segment or an initial collection of measured points.

13.1.2.2.1 Track Road

The track road method starts out with a set of measurements that potentially belong to the same track, typically one close to the vertex area, one far out in the tracking detector, and one in the middle. The track model can then be used, either exactly or approximately, if speed is an important issue, to interpolate between the measurements and create a road around the hypothesized track. Measurements inside the road are then collected. The number of measurements and the quality of the subsequent track fit are used to determine whether the track candidate should be kept or discarded.

13.1.2.2.2 Track Following

A track following procedure takes a track seed as a starting point. A seed is often a short track segment, potentially including a constraint of the position of the vertex region. Seeds can be generated at the inner part of the tracking detector, where the measurements frequently are of very high precision, or at the outer part, where the track density is lower. From the seed, the track is extrapolated to the next detector unit. As for the track roads method, this can be done either with the full track model or with an approximate, simplified model. The measurement closest to the predicted track is included in the track candidate, and the track is extrapolated again.

13.1.2.2.3 Kalman Filter

The Kalman filter [24,25,26] can be regarded as a statistically optimal track following procedure. It works by alternating prediction and update steps. Starting from the seed, the track parameters and their covariance matrix are extrapolated to the next detector unit containing a measurement, using the full track model. If the measurement is compatible with the prediction, it is included in the track candidate, and the track parameters and their covariance matrix are updated with the information from the measurement. The procedure is repeated until too many detector units without compatible measurements are traversed or the end of the tracking detector is reached.
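The alternation of prediction and update can be sketched for a deliberately simple case: a straight-line track in two dimensions with state vector (y, dy∕dx), no material effects, and one measured coordinate y per layer. The function name `kalman_follow`, the χ² cut and the toy hits are assumptions; the closest compatible hit is selected in each layer.

```python
import numpy as np

def kalman_follow(q, C, x0, layers, hits, sigma, chi2_max=9.0):
    """Follow a straight line through the layers; returns the final state,
    its covariance matrix and the index of the hit used in each layer."""
    q, C = np.array(q, dtype=float), np.array(C, dtype=float)
    H = np.array([1.0, 0.0])          # only y is measured
    x_cur, used = x0, []
    for x_layer, layer_hits in zip(layers, hits):
        # Prediction: extrapolate state and covariance to the layer.
        F = np.array([[1.0, x_layer - x_cur], [0.0, 1.0]])
        q, C, x_cur = F @ q, F @ C @ F.T, x_layer
        if len(layer_hits) == 0:
            used.append(None)
            continue
        # Compatibility test of all hits in the layer.
        r = np.asarray(layer_hits, dtype=float) - H @ q
        S = H @ C @ H + sigma**2
        chi2 = r**2 / S
        best = int(np.argmin(chi2))
        if chi2[best] > chi2_max:
            used.append(None)         # no compatible measurement
            continue
        # Update with the selected hit.
        K = C @ H / S
        q = q + K * r[best]
        C = C - np.outer(K, H @ C)
        used.append(best)
    return q, C, used

# Toy track y = 1 + 0.5 x with two noise hits; seed at x = 0.
layers = [1.0, 2.0, 3.0, 4.0, 5.0]
hits = [[1.5], [3.0, 2.0], [2.5], [2.1, 3.0], [3.5]]
q, C, used = kalman_follow([1.0, 0.4], np.eye(2), 0.0, layers, hits, sigma=0.1)
```

A realistic implementation would use the full track model for the extrapolation and add the contribution of multiple scattering to the predicted covariance matrix.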

In the original formulation of the method, the measurement closest to the predicted track is included in the track candidate [27]. However, if the density of measurements is high, the closest measurement might originate from another particle or from noise in the detector electronics. Including the wrong measurement could therefore lead to a wrong subsequent prediction and ultimately to the loss of the track. The currently most popular approach, the combinatorial Kalman filter, avoids such losses by splitting the track candidate into several branches when several compatible measurements are found after the prediction [28]. In order to take into account detector inefficiencies an additional branch with a missing hit can be generated.

All branches are extrapolated to the next detector layer containing compatible measurements. A branch is split again if several measurements are compatible with the branch prediction. Branches are removed if too many detector units without compatible measurements are traversed or if the quality of the track candidate, in terms of the value of a \(\chi^2\) statistic, is too low. If there are several surviving candidates after the end of the detector has been reached, the candidate with the most measurements and the lowest value of the \(\chi^2\) statistic is kept and regarded as the final track candidate. An example is shown in Fig. 13.3.

Fig. 13.3 An example of the combinatorial Kalman filter (reprinted from R. Mankel [28], with permission from Elsevier)

A similar track finding method has been formulated in the language of cellular automata [23, 29]. The combinatorial problem can also be solved by using generalized, adaptive versions of the Kalman filter [30,31,32].

13.1.3 Estimation of Track Parameters

13.1.3.1 Magnetic Field Representation

The presence of a magnetic field in a tracking detector causes a bending of the trajectory of a charged particle, and, hence, allows a measurement of the particle momentum. A precise knowledge of the magnetic field is therefore crucial for accurate estimates of the particle momenta.

The magnetic field can be calculated by solving Maxwell’s equations, knowing the detailed configuration of the current sources and the magnetic materials in the detector volume. In the general case, a numerical solution of these equations in terms of a finite-element analysis is needed. In special cases, the field can be found by less general approaches. The simplest situation is a solenoidal magnet, providing a homogeneous field in a large volume. Also, it is known that the field inside a volume with no magnetic material can be determined by knowledge of the field on the volume boundary only [33]. Measurements of the field on the volume boundary allow an estimation of coefficients of polynomials obeying Maxwell’s equations. Field measurements inside the volume are used to evaluate the quality of the calculated field. If the measurements inside the volume are precise enough, they can be used to further refine the knowledge of the field by being included in the estimation procedure of the above-mentioned coefficients [34].

In a track reconstruction application, fast access to the value of the magnetic field at any point inside the detector volume is crucial. For this purpose, a numerical representation of the field is needed. A frequently used approach is to create a table of the magnetic field values at a grid of points and to determine the field at points between the grid nodes by linear or quadratic interpolation. An alternative approach is to divide the detector volume into several sub-volumes and to fit the coefficients of low-order polynomials to the known field values inside each sub-volume [35, 36]. If the number of sub-volumes is large, potentially many coefficients have to be determined. On the other hand, once the coefficients are determined, the field access is very fast. Also, the derivatives of the field, which are needed by some track reconstruction algorithms, can be computed as fast as the field itself.
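The table-based representation with linear interpolation can be sketched as follows. The regular grid layout, the function names `make_table` and `interpolate`, and the toy field are assumptions made for illustration; production code would typically avoid the Python-level loop.

```python
import numpy as np

def make_table(field, origin, spacing, shape):
    """Tabulate field(point) -> (Bx, By, Bz) on a regular grid."""
    idx = np.indices(shape).reshape(3, -1).T        # all grid indices, (N, 3)
    pts = origin + idx * spacing
    return np.array([field(p) for p in pts]).reshape(*shape, 3)

def interpolate(table, origin, spacing, point):
    """Trilinear interpolation of the tabulated field at an arbitrary point."""
    u = (np.asarray(point, dtype=float) - origin) / spacing
    # Index of the lower corner of the enclosing cell, clipped to the grid.
    i0 = np.clip(np.floor(u).astype(int), 0, np.array(table.shape[:3]) - 2)
    f = u - i0                                      # fractional position in cell
    b = np.zeros(3)
    for corner in np.ndindex(2, 2, 2):
        w = np.prod(np.where(corner, f, 1.0 - f))   # weight of this corner
        b += w * table[tuple(i0 + corner)]
    return b

# Toy field (in Tesla): mainly along z, with a slight position dependence.
def toy_field(p):
    return np.array([0.0, 0.0, 2.0 + 0.1 * p[2] + 0.05 * p[0]])

origin, spacing = np.zeros(3), np.array([0.5, 0.5, 0.5])
table = make_table(toy_field, origin, spacing, (4, 4, 4))
b = interpolate(table, origin, spacing, (0.7, 0.2, 1.1))
```

Because the toy field is linear in the coordinates, the interpolation reproduces it exactly; for a real field map the grid spacing determines the interpolation error.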

13.1.3.2 Track Models

Consider a charged particle with mass m and charge Q = qe, e being the elementary charge. Its trajectory x(t) in a magnetic field B(x) is determined by the equations of motion given by the Lorentz force F ∝ qv ×B, where v = dx∕dt is the velocity of the particle. In vacuum, Newton’s second law reads [37]

$$\displaystyle \begin{aligned} \frac{\mathrm{d}\boldsymbol{p}}{\mathrm{d} t} = k q {\boldsymbol{v}} ( t ) \times \boldsymbol{B} ( \boldsymbol{x} ( t ) ),{} \end{aligned} $$
(13.1)

where p = γmv is the momentum of the particle, \(\gamma = (1 - v^2/c^2)^{-1/2}\) is the Lorentz factor, and k is a unit-dependent proportionality factor. If p is in GeV/c, x is in meters, and B is in Tesla, \(k = 0.29979\ \mathrm{GeV}/c\ \mathrm{T}^{-1}\,\mathrm{m}^{-1}\). The trajectory is uniquely defined by the initial conditions, the six degrees of freedom specified for instance by the initial position and the initial velocity. If these are tied to a surface, five degrees of freedom are necessary and sufficient. Geometrical quantities other than position and velocity can also be used to specify the initial conditions. The collection q of these quantities is called the initial track parameters or the initial state vector.

Equation (13.1) can be written in terms of the path length s(t) along the trajectory instead of t, giving [37]

$$\displaystyle \begin{aligned} \frac{\mathrm{d}^2 \boldsymbol{x}}{\mathrm{d} s^2} = \frac{k q}{|\boldsymbol{p}|} \cdot \frac{\mathrm{d}\boldsymbol{x}}{\mathrm{d} s} \times \boldsymbol{B} ( \boldsymbol{x} ( s ))=F(s,\boldsymbol{x}(s),\dot{\boldsymbol{x}}(s)). \end{aligned} $$
(13.2)

In simple situations this equation has analytical solutions. In a homogeneous magnetic field the trajectory is a helix; it reduces to a straight line in the limit of a vanishing field. In the general case of an inhomogeneous field, numerical methods can be used, such as Runge–Kutta integration of the equations of motion or parametrization by polynomials or splines [37]. Among Runge–Kutta methods, the Runge-Kutta-Nyström algorithm is specially designed for second-order equations such as Eq. (13.2). In the fourth-order version a step of length h, starting at s = s n, is computed by [37]

$$\displaystyle \begin{aligned} \boldsymbol{x}_{n+1}=\boldsymbol{x}_n+h\dot{\boldsymbol{x}}_n+h^2(k_1+k_2+k_3)/6,\quad \dot{\boldsymbol{x}}_{n+1}=\dot{\boldsymbol{x}}_n+h(k_1+2k_2+2k_3+k_4)/6, \end{aligned}$$

with

$$\displaystyle \begin{aligned} k_1&=F(s_n,\boldsymbol{x}_n,\dot{\boldsymbol{x}}_n),\\ k_2&=F(s_n+h/2,\boldsymbol{x}_n+h\dot{\boldsymbol{x}}_n/2+h^2k_1/8,\dot{\boldsymbol{x}}_n+hk_1/2),\\ k_3&=F(s_n+h/2,\boldsymbol{x}_n+h\dot{\boldsymbol{x}}_n/2+h^2k_1/8,\dot{\boldsymbol{x}}_n+hk_2/2),\\ k_4&=F(s_n+h,\boldsymbol{x}_n+h\dot{\boldsymbol{x}}_n+h^2k_3/2,\dot{\boldsymbol{x}}_n+hk_3), \end{aligned} $$

where \(\boldsymbol{x}_n\) is the position of the particle at \(s = s_n\) and \(\dot {\boldsymbol {x}}_n\) is the unit tangent vector. The magnetic field needs to be looked up at three positions per step: at \(\boldsymbol{x}_n\) for \(k_1\), at the common position used for \(k_2\) and \(k_3\), and at the position used for \(k_4\). If the field at the final position \(\boldsymbol{x}_{n+1}\), which is the starting position of the next step, is approximated by the field used for \(k_4\), only two lookups are required per step. If the field is (almost) homogeneous, as for example in a solenoid, the step size h can be chosen to be constant; otherwise a variable step size is more efficient. The step size can be optimized using an adaptive version of the Runge-Kutta-Nyström algorithm [38]. Note that the error of a step of length h may be larger than \(O(h^5)\) if the magnetic field does not have smooth derivatives, as is the case if it is computed by linear interpolation. If the field is represented by low-order polynomials in sub-volumes, Runge–Kutta steps should terminate at volume boundaries.
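The step formulas translate directly into code. The sketch below assumes momentum in GeV/c, lengths in meters and field in Tesla, a constant step size, and no energy loss; the function name `rkn4_step` and the homogeneous test field are illustrative assumptions. In a homogeneous field the result can be checked against the helix, whose radius is R = p∕(k|q|B) for a track perpendicular to the field.

```python
import numpy as np

K = 0.29979  # GeV/c T^-1 m^-1

def rkn4_step(x, t, h, field, q, p):
    """One fourth-order Runge-Kutta-Nystrom step of length h for
    d2x/ds2 = (k q / |p|) dx/ds x B(x); t is the unit tangent dx/ds."""
    def F(pos, tan):
        return K * q / p * np.cross(tan, field(pos))
    k1 = F(x, t)
    x_mid = x + h * t / 2 + h**2 * k1 / 8   # shared by k2 and k3
    k2 = F(x_mid, t + h * k1 / 2)
    k3 = F(x_mid, t + h * k2 / 2)
    k4 = F(x + h * t + h**2 * k3 / 2, t + h * k3)
    x_new = x + h * t + h**2 * (k1 + k2 + k3) / 6
    t_new = t + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x_new, t_new

# Homogeneous 2 T field along z, p = 1 GeV/c, unit charge.
def field(pos):
    return np.array([0.0, 0.0, 2.0])

R = 1.0 / (K * 2.0)                   # helix radius in meters
x, t = np.zeros(3), np.array([1.0, 0.0, 0.0])
n_steps = 100
for _ in range(n_steps):              # propagate over half a turn
    x, t = rkn4_step(x, t, np.pi * R / n_steps, field, 1.0, 1.0)
```

After half a turn the particle should be at a distance 2R from the starting point and travel in the opposite direction. For clarity the sketch calls `field` four times per step; an implementation aiming at the lookup counts quoted above would evaluate the field once per distinct position and reuse it.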

Different detector geometries often lead to different choices of the parametrization. However, the parametrization of the trajectory should comply with some basic requirements: the parameters should be continuous with respect to small changes of the trajectory; the choice of track parameters should facilitate the local expansion of the track model into a linear function; and the uncertainties of the estimated values of the parameters should follow a Gaussian distribution as closely as possible. For example, curvature should be used rather than radius of curvature, and inverse (transverse) momentum rather than (transverse) momentum.

The track model, given by the solution of the equations of motion, describes how the state vector \(\boldsymbol{q}_k\) at a given surface k depends on the state vector at a different surface i:

$$\displaystyle \begin{aligned} \boldsymbol{q}_k = \boldsymbol{f}_{k|i} ( \boldsymbol{q}_i ), \end{aligned}$$

where \(\boldsymbol{f}_{k|i}\) is the track propagator from surface i to surface k. When analytical solutions of the equations of motion exist, the track propagator is also analytical. Even in a homogeneous magnetic field, the path length can be determined analytically only for propagation to cylinders with symmetry axis parallel to the field direction or to planes orthogonal to the field direction. Otherwise, a Newton iteration or a parabolic approximation has to be used to find the path length.

For track reconstruction purposes, the covariance matrix of the estimated track parameters needs to be propagated along with the track parameters themselves. The track propagator is often a non-linear function of the track parameters at the initial surface, but the covariance matrix has to be transported under the assumption of a linear track model. This procedure, called linear error propagation, is based on a Taylor expansion of the track propagator, keeping only first-order terms. These first-order terms, defining the Jacobians of the track model, are given by

$$\displaystyle \begin{aligned} \boldsymbol{F}_{k|i} = \left. \frac{\partial \boldsymbol{q}_k}{\partial \boldsymbol{q}_i} \right|{}_{\breve{\boldsymbol{q}}_{i}}, \end{aligned}$$

where \(\breve{\boldsymbol{q}}_i\) is the expansion point in surface i. For analytical track models, the Jacobian is also analytical. If, for example, the magnetic field is homogeneous, the general case of propagation to a plane of arbitrary spatial orientation uses a curvilinear coordinate frame moving along with the trajectory as a means of deriving the required Jacobians [39].

In the general case of a non-analytical track model, the Jacobians cannot be computed analytically either. The most straightforward approach is to calculate the relevant derivatives in a purely numerical way. The basis for these calculations is a reference trajectory corresponding to the expansion point. In addition, five other trajectories are created, corresponding to small variations in each of the track parameters. By propagating these five trajectories to the destination surface, numerical derivatives can be obtained. A potential disadvantage of such an approach is its computational complexity, as six trajectories have to be propagated instead of a single one. Much less computational load is introduced by transporting the Jacobian terms in parallel to the track parameters during the Runge–Kutta integration [40, 41], avoiding the need for propagating auxiliary trajectories.
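The purely numerical approach can be sketched as follows. The function name `numerical_jacobian`, the step sizes and the toy propagator (a straight line in a field-free region with state vector (x, y, t_x, t_y, q∕p)) are assumptions for illustration; in practice `propagate` would be the Runge–Kutta propagator. The covariance matrix is then transported with the standard first-order formula \(\boldsymbol{C}_k = \boldsymbol{F}_{k|i}\boldsymbol{C}_i\boldsymbol{F}_{k|i}^{\mathrm{T}}\).

```python
import numpy as np

def numerical_jacobian(propagate, q_ref, steps):
    """Forward-difference Jacobian of the propagator at q_ref:
    one reference trajectory plus one varied trajectory per parameter."""
    q_ref = np.asarray(q_ref, dtype=float)
    f_ref = propagate(q_ref)
    J = np.empty((len(f_ref), len(q_ref)))
    for j, eps in enumerate(steps):
        dq = np.zeros_like(q_ref)
        dq[j] = eps
        J[:, j] = (propagate(q_ref + dq) - f_ref) / eps
    return f_ref, J

def straight_line(q, dz=2.0):
    """Toy propagator over a distance dz along z without magnetic field."""
    x, y, tx, ty, qop = q
    return np.array([x + tx * dz, y + ty * dz, tx, ty, qop])

q_i = np.array([0.1, -0.2, 0.05, 0.02, 0.5])
C_i = np.diag([1e-4, 1e-4, 1e-6, 1e-6, 1e-4])
q_k, F = numerical_jacobian(straight_line, q_i, steps=[1e-4] * 5)
C_k = F @ C_i @ F.T       # linear error propagation
```

The six calls to `propagate` make the cost of this approach explicit. The step sizes have to be small enough for the linear approximation to hold, but large enough to keep rounding errors under control.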

The measurement model describes the functional dependence of the measured quantities on the state vector at a detector surface k:

$$\displaystyle \begin{aligned} \boldsymbol{m}_k = \boldsymbol{h}_k ( \boldsymbol{q}_k ). \end{aligned}$$

The vector of measurements \(\boldsymbol{m}_k\) usually contains the measured coordinates, but may also contain other quantities, e.g. measurements of direction or even momentum. In a pixel detector or in a double-sided silicon strip detector, \(\boldsymbol{m}_k\) is two-dimensional; in a one-sided strip detector, it is one-dimensional. In a drift chamber or a multi-wire proportional chamber with several layers, the measurement may be a track segment resulting from an internal track reconstruction. In this case the vector \(\boldsymbol{m}_k\) may be four- or five-dimensional, depending on whether the curvature can be estimated or not.

In most cases the function \(\boldsymbol{h}_k(\boldsymbol{q}_k)\) includes a transformation of the state vector \(\boldsymbol{q}_k\) into the local coordinate system of the detector. For use in track reconstruction, the Jacobian of this transformation is needed:

$$\displaystyle \begin{aligned} \boldsymbol{H}_k = \left. \frac{\partial \boldsymbol{m}_k}{\partial \boldsymbol{q}_k} \right|{}_{\breve{\boldsymbol{q}}_{k}},{} \end{aligned} $$
(13.3)

where \(\breve{\boldsymbol{q}}_k\) is the expansion point in surface k. In many cases the Jacobian contains only rotations and projections, and thus can be computed analytically.

The measurement is always smeared by a measurement error:

$$\displaystyle \begin{aligned} \boldsymbol{m}_k = \boldsymbol{h}_k ( \boldsymbol{q}_k ) + \boldsymbol{\varepsilon}_k. \end{aligned}$$

The mean value and the covariance matrix of \(\boldsymbol{\varepsilon}_k\) depend on the detector type and the detector geometry and therefore in general have to be calibrated for each detector unit independently. The measurement error is often assumed to follow a Gaussian distribution, but frequently exhibits tails which are incompatible with this assumption. In this case a Gaussian mixture is a more appropriate model.

13.1.3.3 Material Effects

A charged particle crossing a tracking detector interacts with the material of the detector. The most important types of interactions in track reconstruction are multiple Coulomb scattering, energy loss by ionization, and energy loss by bremsstrahlung. For an in-depth treatment of material effects see Chapter 2.

13.1.3.3.1 Multiple Coulomb Scattering

Elastic Coulomb scattering of particles heavier than the electron is dominated by the atomic nucleus. For small angles the differential cross-section is approximately equal to

$$\displaystyle \begin{aligned} \frac{\mathrm{d}\sigma}{\mathrm{d}\theta}=2\pi\left(\frac{2Ze^2}{pv}\right)^2\frac{1}{\theta^3}, \end{aligned}$$

where θ is the polar angle of the scattering, Z is the charge of the nucleus in units of the elementary charge e, v is the velocity of the scattered particle, and p is its momentum [42]. Because of screening effects and the finite size of the nucleus the differential cross-section is modified to [43]

$$\displaystyle \begin{aligned} \frac{\mathrm{d}\sigma}{\mathrm{d}\theta}=2\pi\left(\frac{2Ze^2}{pv}\right)^2\frac{\theta}{(\theta^2+\theta_{\mathrm{min}}^2)^2},\quad 0\leq\theta\leq\theta_{\mathrm{max}}. \end{aligned}$$

If the momentum p is given in GeV/c, the lower and upper limits are approximately equal to

$$\displaystyle \begin{aligned} \theta_{\mathrm{min}}\approx\frac{2.66\cdot 10^{-6}Z^{1/3}}{p},\quad \theta_{\mathrm{max}}\approx\frac{0.14}{A^{1/3}p}. \end{aligned}$$

The average number of scattering processes in a layer of thickness d (in cm) is given by

$$\displaystyle \begin{aligned} N(d)=d\sigma\frac{N_{\mathrm{A}}\rho}{A}, \end{aligned}$$

where σ is the integrated elastic cross section, \(N_{\mathrm{A}}\) is the Avogadro constant, ρ is the density of the material (in g/cm³), and A is the atomic mass of the nucleus. In track reconstruction it is convenient to work with the projected scattering angles in two perpendicular planes. The projected multiple scattering angle \(\theta_{\mathrm{P}}\) is equal to the sum of the projected single scattering angles, and its variance can be obtained by multiplying the variance of the projected single scattering angle by the average number of scatters, the projected single scattering angles being uncorrelated. With increasing thickness d the distribution of the projected scattering angle approaches a normal distribution, and the two projected angles become independent. For thin scatterers, however, the width of the Gaussian core is notably narrower than is indicated by the variance [42]. This is taken into account by Highland’s formula for the standard deviation of the projected scattering angle [44]:

$$\displaystyle \begin{aligned} \sigma_{\mathrm{P}}=E(\theta_{\mathrm{P}}^2)^{1/2}=\frac{0.0136}{\beta p}\sqrt{{d}/{X_0}}\left[1+0.038{\mathrm{ln}}(d/X_0)\right], \end{aligned}$$

where \(X_0\) is the radiation length of the material in cm, β = v∕c is the particle velocity in units of c, and p is the particle momentum in GeV/c. The logarithmic correction ceases to be applicable above \(d \approx X_0\).
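Highland’s formula is easily evaluated numerically. The function name `highland_sigma` and the example values are assumptions for illustration; the thickness enters only through the ratio d∕X₀.

```python
import math

def highland_sigma(p, beta, d_over_X0):
    """Standard deviation of the projected scattering angle in rad;
    p in GeV/c, beta = v/c, d_over_X0 = thickness in radiation lengths."""
    return (0.0136 / (beta * p) * math.sqrt(d_over_X0)
            * (1.0 + 0.038 * math.log(d_over_X0)))

# A 1 GeV/c particle with beta close to 1 crossing 1% of a radiation length.
sigma_p = highland_sigma(p=1.0, beta=1.0, d_over_X0=0.01)
```

For this example the standard deviation is about 1.1 mrad; the logarithmic term reduces the uncorrected value of 1.36 mrad by roughly 17%.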

If a scatterer is sufficiently thin, the transverse offset of the track due to multiple scattering can be neglected. Only the track direction is affected in this case. If the direction is represented by the polar angle θ and the azimuthal angle φ, their joint covariance matrix is given by

$$\displaystyle \begin{aligned} \mathrm{var}(\Delta\theta)=\sigma_{\mathrm{P}}^2,\quad \mathrm{var}(\Delta\varphi)= \sigma_{\mathrm{P}}^2/\sin^2\theta,\quad \mathrm{cov}(\Delta\theta,\Delta\varphi)=0. \end{aligned}$$

If the direction is represented by the direction tangents \(t_x = \mathrm{d}x/\mathrm{d}z\) and \(t_y = \mathrm{d}y/\mathrm{d}z\), the covariance matrix is [45]

$$\displaystyle \begin{aligned} \mathrm{Var}[(\Delta{}t_x,\Delta{}t_y){}^{\mathrm{T}}]=\sigma^2_P(1+{}t_x^2+{}t_y^2) \begin{pmatrix} 1+{}t_x^2 & {}t_x {}t_y \\ {}t_x {}t_y & 1+{}t_y^2\end{pmatrix}. \end{aligned}$$

If the direction is represented by the direction cosines \(c_x = \mathrm{d}x/\mathrm{d}s\) and \(c_y = \mathrm{d}y/\mathrm{d}s\), the covariance matrix is [45]

$$\displaystyle \begin{aligned} \mathrm{Var}[(\Delta{}c_x,\Delta{}c_y){}^{\mathrm{T}}]=\sigma^2_P \begin{pmatrix} 1-{}c_x^2 & -{}c_x {}c_y \\ -{}c_x {}c_y & 1-{}c_y^2\end{pmatrix}. \end{aligned}$$

In all cases the projected variance \(\sigma _{\mathrm {P}}^2\) takes into account the effective amount of material crossed by the track.

If the transverse offset cannot be neglected, its variance and its correlation with the angle have to be taken into account. Assume that the particle passes a scatterer of length d, traveling along the z-axis. Neglecting the curvature of the track, the joint covariance matrix of the offset Δx and the scattering angle \(\theta_x\) in the xz projection is

$$\displaystyle \begin{aligned} \mathrm{var}(\Delta x)=\sigma_0^2 d^3/3,\quad \mathrm{var}(\theta_x)=\sigma_0^2 d,\quad \mathrm{cov}(\Delta x,\theta_x)=\sigma_0^2 d^2/2, \end{aligned}$$

where \(\sigma _0^2\) is the variance of the projected scattering angle per unit length. If the particle enters the scatterer at z = 0 with direction \((t_x, t_y)\), the joint covariance matrix of the offsets Δx, Δy and the angles \(\Delta t_x\), \(\Delta t_y\) at z = d is

$$\displaystyle \begin{aligned} \mathrm{Var}\left[\begin{pmatrix}\Delta x \\ \Delta y \\ \Delta{}t_x \\ \Delta{}t_y \end{pmatrix}\right]= \sigma_0^2(1+{}t_x^2+{}t_y^2) \begin{pmatrix} (1+{}t_x^2)D^3/3 & {}t_x{}t_y D^3/3 & (1+{}t_x^2) D^2/2 & {}t_x{}t_y D^2/2 \\ {}t_x{}t_y D^3/3 & (1+{}t_y^2)D^3/3 & {}t_x{}t_y D^2/2 & (1+{}t_y^2) D^2/2 \\ (1+{}t_x^2) D^2/2 & {}t_x{}t_y D^2/2 & (1+{}t_x^2) D & {}t_x{}t_y D \\ {}t_x{}t_y D^2/2 & (1+{}t_y^2) D^2/2 & {}t_x{}t_y D & (1+{}t_y^2) D \end{pmatrix}, \end{aligned}$$

where \(D=(1+{}t_x^2+{}t_y^2)^{1/2}d\) is the effective thickness crossed. If the direction is represented by θ and φ, the covariance matrix can be computed via the transformation

$$\displaystyle \begin{aligned}t_x=\tan{}(\theta)\cos{}(\varphi),\quad {}t_y=\tan{}(\theta)\sin{}(\varphi), \end{aligned}$$

and linear error propagation with the Jacobian

$$\displaystyle \begin{aligned} \displaystyle\boldsymbol{T}=\frac{\partial(\theta,\varphi)}{\partial({}t_x,{}t_y)}= \begin{pmatrix} \frac{{}t_x}{\sqrt{{}t_x^2+{}t_y^2}(1+{}t_x^2+t_y^2)} & \frac{{}t_y}{\sqrt{{}t_x^2+{}t_y^2}(1+{}t_x^2+t_y^2)} \\ -\frac{{}t_y}{{}t_x^2+{}t_y^2} & \frac{{}t_x}{{}t_x^2+{}t_y^2} \end{pmatrix}= \begin{pmatrix} \frac{\cos{}(\varphi)}{1+\tan^2(\theta)} & \frac{\sin{}(\varphi)}{1+\tan^2(\theta)} \\ -\frac{\sin{}(\varphi)}{\tan{}(\theta)} & \frac{\cos{}(\varphi)}{\tan{}(\theta)} \end{pmatrix}. \end{aligned}$$

For analogous formulas in cylindrical coordinates, see [46].
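The 4 × 4 covariance matrix of the thick scatterer given above has a block structure that can be exploited in code: it is the Kronecker product of the 2 × 2 matrix of the offset and angle terms in D with the 2 × 2 matrix of the direction tangents. The sketch below builds it in this way; the function name `thick_scatterer_cov` and the numerical values are assumptions for illustration.

```python
import numpy as np

def thick_scatterer_cov(sigma0_sq, d, tx, ty):
    """Joint covariance matrix of (dx, dy, dtx, dty) after a scatterer of
    length d along z; sigma0_sq is the variance of the projected
    scattering angle per unit length."""
    n2 = 1.0 + tx**2 + ty**2
    D = np.sqrt(n2) * d                       # effective thickness crossed
    M = np.array([[1.0 + tx**2, tx * ty],
                  [tx * ty, 1.0 + ty**2]])    # direction-tangent structure
    A = np.array([[D**3 / 3.0, D**2 / 2.0],
                  [D**2 / 2.0, D]])           # offset/angle structure
    return sigma0_sq * n2 * np.kron(A, M)

C_normal = thick_scatterer_cov(1e-6, 2.0, 0.0, 0.0)    # normal incidence
C_inclined = thick_scatterer_cov(1e-6, 2.0, 0.3, -0.2)
```

For normal incidence the matrix reduces to the single-projection result, with var(Δx) = σ₀²d³∕3, var(Δt_x) = σ₀²d and cov(Δx, Δt_x) = σ₀²d²∕2.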

If the curvature of the track cannot be neglected, the simplest approach is a stepwise integration of the equation of motion, assuming the validity of a helical track model within each step and considering each such step as a thin scatterer [39, 47].

13.1.3.3.2 Energy Loss

For particles other than electrons the energy loss in material is almost exclusively due to scattering on electrons. The momentum correction Δp in a material layer of thickness d is calculated by integrating the Bethe-Bloch formula [37]:

$$\displaystyle \begin{aligned} \Delta p=\int_0^d \frac{\mathrm{d} p}{\mathrm{d}{}x}\mathrm{d}{}x=\int_0^d \frac{1}{\beta}\frac{\mathrm{d} E}{\mathrm{d}{}x}\mathrm{d}{}x= \int_0^d \frac{K}{\beta^3} \left[{\mathrm{ln}}\frac{2m_{\mathrm{e}} c^2\beta^2\gamma^2}{\langle I \rangle}-\beta^2\right]\mathrm{d}{}x,{} \end{aligned} $$
(13.4)

where K is a constant depending on the material, \(m_{\mathrm{e}}\) is the electron mass, 〈I〉 is the average ionization potential of the material, and \(\beta = v/c\) and \(\gamma = E/(mc^2)\) are the usual kinematic parameters. The ratio 〈I〉∕Z is about 20 eV for hydrogen and helium, between 12 and 16 eV for light nuclei, and around 10 eV for heavy nuclei [44]. For practical purposes, the differential energy loss dE∕dx is a function only of β. For small β, it decreases like \(1/\beta^2\). It has a minimum, the position of which drops with increasing Z from βγ ≈ 3.5 (carbon) to βγ ≈ 3 (lead). In terms of momentum, the minimum is at p = βγmc and thus depends on the mass of the particle. This dependency is used for particle identification. The energy loss at the minimum can be parameterized for Z ≥ 6 by [44]:

$$\displaystyle \begin{aligned} (\mathrm{d} E/\mathrm{d}{}x)_{\mathrm{min}}=(2.35-0.64\log_{10} Z)\ \mathrm{Me}\mathrm{V}\,\mathrm{g}^{-1}\mathrm{cm}^2. \end{aligned}$$

From this the constant K in Eq. (13.4) can be calculated. For large βγ the energy loss increases like ln(βγ); this is called the relativistic rise. For momenta in the vicinity of the minimum, dE∕dx can be considered constant, giving \(\Delta p \approx (\mathrm{d}E/\mathrm{d}x)_{\mathrm{min}}\cdot \rho\, d/\beta\), ρ being the density of the material.
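As a rough numerical illustration, the parameterization of the minimum energy loss can be combined with Eq. (13.4) for a constant dE∕dx, where the integrand \((1/\beta)\,\mathrm{d}E/\mathrm{d}x\) yields a momentum loss of about \((\mathrm{d}E/\mathrm{d}x)_{\mathrm{min}}\,\rho\,d/\beta\). The sketch below assumes a 300 µm silicon layer (Z = 14, ρ ≈ 2.33 g/cm³) and a charged pion of 0.5 GeV/c; these values and all function names are chosen for illustration only.

```python
import math

def dedx_min(Z):
    """Minimum energy loss in MeV g^-1 cm^2 (parameterization quoted for Z >= 6)."""
    if Z < 6:
        raise ValueError("parameterization is quoted for Z >= 6 only")
    return 2.35 - 0.64 * math.log10(Z)

def beta_from_momentum(p, mass):
    """beta = p / E, with p and mass in the same units (c = 1)."""
    return p / math.hypot(p, mass)

def momentum_loss_near_minimum(Z, rho, d, beta):
    """Magnitude of Delta p in MeV/c; rho in g/cm^3, d in cm."""
    return dedx_min(Z) * rho * d / beta

# Assumed example: pion (mass 0.13957 GeV/c^2) with p = 0.5 GeV/c in 0.03 cm of silicon
beta_pi = beta_from_momentum(0.5, 0.13957)
dp_si = momentum_loss_near_minimum(Z=14, rho=2.33, d=0.03, beta=beta_pi)
```

For these inputs the loss is of the order of 0.1 MeV/c, which shows why the correction matters mainly for low-momentum tracks or thick material.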

13.1.3.3.3 Bremsstrahlung

For an electron (or positron) passing through matter the most significant contribution to energy loss is bremsstrahlung, the emission of photons in the electric field of an atomic nucleus. In the Bethe–Heitler model [48] the relative energy loss is distributed independently of the energy. Let d be the path length in the material in units of radiation length, and z the fraction of energy remaining after the material is traversed. Then the distribution of z is given by the following probability density function:

$$\displaystyle \begin{aligned} f(z)=\frac{(-{\mathrm{ln}} z)^{c-1}}{\Gamma(c)},\quad 0\leq z\leq 1, \end{aligned}$$

where Γ(x) is Euler’s gamma function and \(c = d/\ln 2\). For high-energy electrons p ≈ E, so the momentum correction is Δp ≈ p(z − 1). The first two moments of Δp are

$$\displaystyle \begin{aligned} E(\Delta p)=p(2^{-c}-1),\quad \mathrm{var}(\Delta p)=p^2(3^{-c}-4^{-c}). \end{aligned}$$

The moments can be used for a Gaussian representation of bremsstrahlung as an additional process noise in the Kalman filter (see Sect. 13.1.3.4). As this is a very crude approximation, more sophisticated methods have been developed that take into account the actual shape of the distribution. One of them is the Gaussian-sum filter [49, 50], see Sect. 13.1.3.5. A computationally less intensive approach is described in [51].
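The two moments can be cross-checked by simulation. With the substitution u = −ln z, the density f(z) turns into a gamma density with shape c and unit scale, so z can be sampled as exp(−u). The sketch below uses this observation; the momentum, the thickness and the sample size are arbitrary illustrative choices.

```python
import math
import random

def bh_moments(p, d):
    """Mean and variance of Delta p = p (z - 1); d in units of radiation length."""
    c = d / math.log(2.0)
    mean = p * (2.0 ** -c - 1.0)
    var = p * p * (3.0 ** -c - 4.0 ** -c)
    return mean, var

def sample_z(d, rng):
    """Draw z = exp(-u), where u follows a gamma distribution (shape c, scale 1)."""
    c = d / math.log(2.0)
    return math.exp(-rng.gammavariate(c, 1.0))

rng = random.Random(12345)
p, d = 10.0, 0.05                       # GeV/c and radiation lengths (illustrative)
mean, var = bh_moments(p, d)
samples = [p * (sample_z(d, rng) - 1.0) for _ in range(100000)]
mc_mean = sum(samples) / len(samples)
mc_var = sum((s - mc_mean) ** 2 for s in samples) / (len(samples) - 1)
```

Since \(2^{-c} = e^{-d}\), the mean fraction of energy remaining is simply exp(−d). A histogram of `samples` is strongly skewed, with a long tail towards large losses, which illustrates why a single Gaussian is a crude model.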

13.1.3.4 Estimation Methods

The main task of the track fit is to estimate the values of a set of parameters describing the state of a particle somewhere in the detector, often at a reference surface close to the interaction vertex. The information from the measurements created by the particle while traversing the tracking detector should be processed in an optimal manner. If the track model is truly linear, i.e., if the measurements are strictly linear functions of the track parameters, and all stochastic disturbances entering the estimation procedure are Gaussian, the linear least-squares method is the optimal one [37]. Since track parameter propagation in general is a nonlinear procedure, strict linearity rarely holds in practice. The relation between the track parameter vector \(\boldsymbol{q}_0\) at a reference surface and the measurement vector \(\boldsymbol{m}_k\) at a detector layer k is a function \(\boldsymbol{d}_k\) given by

$$\displaystyle \begin{aligned} \boldsymbol{m}_k = {\boldsymbol{d}}_k (\boldsymbol{q}_0 ) + \boldsymbol{\gamma}_k , \end{aligned}$$

where \(\boldsymbol{\gamma}_k\) is a noise term containing the measurement error of \(\boldsymbol{m}_k\) and all multiple scattering in front of \(\boldsymbol{m}_k\). The function \(\boldsymbol{d}_k\) is a composition of the measurement model function \(\boldsymbol{h}_k\) and the track propagator functions \(\boldsymbol{f}_{i|i-1}\) (see Sect. 13.1.3.2):

$$\displaystyle \begin{aligned} {\boldsymbol{d}}_k = \boldsymbol{h}_k \circ \boldsymbol{f}_{k|k-1} \circ \cdots \circ \boldsymbol{f}_{2|1} \circ \boldsymbol{f}_{1|0}. \end{aligned}$$

For the linear least-squares method, \(\boldsymbol{d}_k\) has to be linearized around some expansion point, providing the Jacobian \(\boldsymbol{D}_k\) of each \(\boldsymbol{d}_k\):

$$\displaystyle \begin{aligned} \boldsymbol{D}_k = \boldsymbol{H}_k \boldsymbol{F}_{k|k-1} \cdots \boldsymbol{F}_{2|1} \boldsymbol{F}_{1|0}, \end{aligned}$$

with \(\boldsymbol{H}_k\) from Eq. (13.3). The covariance matrix of \(\boldsymbol{\gamma}_k\) is obtained by linear error propagation:

$$\displaystyle \begin{aligned} \mathrm{var} (\boldsymbol{\gamma}_k)=\boldsymbol{V}_k + \boldsymbol{H}_k(\boldsymbol{F}_{k|1} \boldsymbol{Q}_1 \boldsymbol{F}_{k|1}{}^{\mathrm{T}} + \cdots +\boldsymbol{F}_{k|k-1} \boldsymbol{Q}_{k-1} \boldsymbol{F}_{k|k-1}{}^{\mathrm{T}} + \boldsymbol{Q}_{k} )\boldsymbol{H}_k{}^{\mathrm{T}}, \end{aligned}$$

where \(\boldsymbol{V}_k\) is the covariance matrix of the measurement error \(\boldsymbol{\varepsilon}_k\) of \(\boldsymbol{m}_k\), and \(\boldsymbol{Q}_j\) is the covariance matrix of multiple scattering after layer j − 1 up to and including layer j. The part of \(\boldsymbol{Q}_j\) originating from scattering between the layers has to be transported to layer j by the appropriate Jacobian. Because of the cumulative effect of multiple scattering, \(\boldsymbol{\gamma}_i\) and \(\boldsymbol{\gamma}_k\) are correlated. If i < k, the covariance is given by

$$\displaystyle \begin{aligned} \mathrm{cov} (\boldsymbol{\gamma}_i,\boldsymbol{\gamma}_k)= \boldsymbol{H}_i(\boldsymbol{F}_{i|1} \boldsymbol{Q}_1 \boldsymbol{F}_{k|1}{}^{\mathrm{T}} + \cdots + \boldsymbol{F}_{i|i-1} \boldsymbol{Q}_{i-1}\boldsymbol{F}_{k|i-1}{}^{\mathrm{T}} + \boldsymbol{Q}_{i}\boldsymbol{F}_{k|i}{}^{\mathrm{T}})\boldsymbol{H}_k{}^{\mathrm{T}}. \end{aligned}$$

The observations \(\boldsymbol{m}_k\), the functions \(\boldsymbol{d}_k\), their Jacobians \(\boldsymbol{D}_k\), and the noise \(\boldsymbol{\gamma}_k\) are now collected in single vectors and a matrix:

$$\displaystyle \begin{aligned} \boldsymbol{m}=\begin{pmatrix}\boldsymbol{m}_1\\ \vdots \\ \boldsymbol{m}_n\end{pmatrix},\quad {\boldsymbol{d}}=\begin{pmatrix}{\boldsymbol{d}}_1\\ \vdots \\ {\boldsymbol{d}}_n\end{pmatrix},\quad \boldsymbol{D}=\begin{pmatrix}\boldsymbol{D}_1\\ \vdots \\ \boldsymbol{D}_n\end{pmatrix},\quad \boldsymbol{\gamma}=\begin{pmatrix}\boldsymbol{\gamma}_1\\ \vdots \\ \boldsymbol{\gamma}_n\end{pmatrix}, \end{aligned}$$

where n is the total number of measurement layers. This gives the following model:

$$\displaystyle \begin{aligned} \boldsymbol{m}={\boldsymbol{d}}(\boldsymbol{q}_0)+\boldsymbol{\gamma}, \end{aligned}$$

which now can be linearized into

$$\displaystyle \begin{aligned} \boldsymbol{m} = \boldsymbol{D} \boldsymbol{q}_0 + {\boldsymbol{c}} + \boldsymbol{\gamma}, \end{aligned}$$

where \(\boldsymbol{c}\) is a constant vector. The global least-squares estimate of \(\boldsymbol{q}_0\) is given by

$$\displaystyle \begin{aligned} \tilde{\boldsymbol{q}}_0=(\boldsymbol{D}{}^{\mathrm{T}} \boldsymbol{G} \boldsymbol{D})^{-1} \boldsymbol{D}{}^{\mathrm{T}} \boldsymbol{G}\,(\boldsymbol{m}-{\boldsymbol{c}}), \end{aligned}$$

where \(\boldsymbol{V} = \boldsymbol{G}^{-1}\) is the non-diagonal covariance matrix of \(\boldsymbol{\gamma}\). The quality of the initial expansion point can be checked by using the obtained estimate as a new expansion point and recalculating the state vector estimate. This procedure is repeated until convergence, defined by a suitable stopping criterion.
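To make the structure of the global fit concrete, the following sketch treats a deliberately simple case: a straight track in one projection with state (x, t) at z = 0, position measurements of equal resolution, and one thin scatterer of fixed angular variance at every layer. Under these assumptions the model is exactly linear, the Jacobian rows are (1, z_k), c vanishes, and the covariance of the noise has a closed form. All names and numbers are illustrative.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivval = M[col][col]
        M[col] = [v / pivval for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def noise_covariance(z, sigma_meas, sigma_theta):
    """cov(gamma_i, gamma_k): measurement error plus accumulated multiple scattering."""
    n = len(z)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            # every scatterer j in front of both layers contributes
            # (z_i - z_j)(z_k - z_j) sigma_theta^2
            V[i][k] = sum((z[i] - z[j]) * (z[k] - z[j]) * sigma_theta ** 2
                          for j in range(min(i, k)))
            if i == k:
                V[i][k] += sigma_meas ** 2
    return V

def global_fit(z, m, V):
    """Estimate (x, t) at z = 0 and its covariance (D^T G D)^-1, with G = V^-1."""
    n = len(z)
    D = [[1.0, zk] for zk in z]
    GD = solve(V, D)                          # G D, without forming G explicitly
    Gm = solve(V, [[mk] for mk in m])         # G m
    A = [[sum(D[k][a] * GD[k][b] for k in range(n)) for b in range(2)]
         for a in range(2)]
    b = [[sum(D[k][a] * Gm[k][0] for k in range(n))] for a in range(2)]
    q = solve(A, b)
    cov = solve(A, [[1.0, 0.0], [0.0, 1.0]])
    return [q[0][0], q[1][0]], cov

z = [1.0, 2.0, 3.0, 4.0, 5.0]                 # layer positions (illustrative)
m = [0.5 + 0.1 * zk for zk in z]              # noise-free hits on a straight line
V = noise_covariance(z, sigma_meas=0.01, sigma_theta=0.001)
q, cov = global_fit(z, m, V)
```

The n × n system that has to be solved for G is what makes this approach expensive when the number of measurements is large.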

If the track model is a circle and multiple scattering and energy loss can be neglected, the estimation can be simplified substantially. Explicit estimators are given in [52] for the center and radius of the circle, and in [53] for the curvature, the direction and the distance from a fixed point. Other algorithms are based on conformal mapping in the plane [2] or on a mapping to the Riemann sphere [54,55,56].

If there is strong multiple scattering, the estimated track can be quite far away from the real track. In order to follow the actual track more closely, two projected scattering angles can be explicitly estimated at each detector layer or at a set of virtual breakpoints inside a continuous scatterer [45, 57]. The breakpoint method, also known as General Broken Lines [58], and the global least-squares method are equivalent, as far as the estimate of the state vector q 0 is concerned [59].

If the number of measurements or the number of breakpoints is substantial, the computational cost of these methods can be high due to the necessity of inverting large matrices during the estimation procedure. The Kalman filter, a recursive formulation of the least-squares method, requires the inversion of only small matrices and exhibits the same attractive feature as the breakpoint method of following the actual track quite closely [26, 60].

As mentioned earlier, the Kalman filter proceeds by alternating prediction and update steps. The prediction step is the propagation of the track parameter vector from one detector layer containing a measurement to the next,

$$\displaystyle \begin{aligned} \boldsymbol{q}_{k|k-1} = \boldsymbol{f}_{k|k-1} ( \boldsymbol{q}_{k-1|k-1} ), \end{aligned}$$

and the associated covariance matrix,

$$\displaystyle \begin{aligned} \boldsymbol{C}_{k|k-1} = \boldsymbol{F}_{k|k-1} \boldsymbol{C}_{k-1|k-1} \boldsymbol{F}_{k|k-1} {}^{\mathrm{T}} + \boldsymbol{Q}_k. \end{aligned}$$

The update step is the correction of the predicted state vector due to the information from the measurement in layer k:

$$\displaystyle \begin{aligned} \boldsymbol{q}_{k|k} = \boldsymbol{q}_{k|k-1} + \boldsymbol{K}_k \left[ \boldsymbol{m}_k - \boldsymbol{h}_k ( \boldsymbol{q}_{k|k-1} ) \right], \end{aligned}$$

where the gain matrix K k is given by

$$\displaystyle \begin{aligned} \boldsymbol{K}_k = \boldsymbol{C}_{k|k-1} \boldsymbol{H}_k {}^{\mathrm{T}} \left( \boldsymbol{V}_k + \boldsymbol{H}_k \boldsymbol{C}_{k|k-1} \boldsymbol{H}_k {}^{\mathrm{T}} \right)^{-1}. \end{aligned}$$

The update of the covariance matrix is given by

$$\displaystyle \begin{aligned} \boldsymbol{C}_{k|k} = \left( \boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k \right) \boldsymbol{C}_{k|k-1}. \end{aligned}$$
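The prediction and update steps can be sketched for the simplest possible case: a straight track in one projection with state (x, t), a scalar position measurement with H = (1, 0), and a thin scatterer at each layer that adds a fixed variance to the slope. These modelling choices, the start values, and all names are assumptions made for illustration.

```python
def predict(q, C, dz, sigma_theta):
    """Prediction: straight-line propagation over dz plus thin-scatterer noise."""
    x, t = q
    q_pred = [x + dz * t, t]
    # F = [[1, dz], [0, 1]];  C_pred = F C F^T + Q  with  Q = diag(0, sigma_theta^2)
    c00 = C[0][0] + dz * (C[0][1] + C[1][0]) + dz * dz * C[1][1]
    c01 = C[0][1] + dz * C[1][1]
    c11 = C[1][1] + sigma_theta ** 2
    return q_pred, [[c00, c01], [c01, c11]]

def update(q, C, m, sigma_meas):
    """Update with a scalar position measurement, H = [1, 0]."""
    s = sigma_meas ** 2 + C[0][0]             # V + H C H^T (a scalar here)
    K = [C[0][0] / s, C[1][0] / s]            # gain matrix C H^T / s
    r = m - q[0]                              # residual w.r.t. the prediction
    q_upd = [q[0] + K[0] * r, q[1] + K[1] * r]
    # (I - K H) C: element (i, j) is C[i][j] - K[i] * C[0][j]
    C_upd = [[C[i][j] - K[i] * C[0][j] for j in range(2)] for i in range(2)]
    return q_upd, C_upd

def kalman_filter(z, m, sigma_meas, sigma_theta, q_start, C_start):
    """Run the filter from z = 0 through all layers; returns the last updated state."""
    q, C, z_prev = q_start, C_start, 0.0
    for zk, mk in zip(z, m):
        q, C = predict(q, C, zk - z_prev, sigma_theta)
        q, C = update(q, C, mk, sigma_meas)
        z_prev = zk
    return q, C

z = [1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.5 + 0.1 * zk for zk in z]              # noise-free hits on a straight line
q_last, C_last = kalman_filter(z, m, sigma_meas=0.01, sigma_theta=0.001,
                               q_start=[0.0, 0.0],
                               C_start=[[100.0, 0.0], [0.0, 100.0]])
```

Only the scalar s has to be inverted at each step, in contrast to the n × n covariance matrix of the global fit. The large start covariance expresses that nothing is known about the track before the first hit.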

The information filter is a mathematically equivalent, but numerically more stable formulation of the Kalman filter. In the information filter, the update of the state vector reads

$$\displaystyle \begin{aligned} \boldsymbol{q}_{k|k} = \boldsymbol{C}_{k|k} \left[ \left( \boldsymbol{C}_{k|k-1} \right)^{-1} \boldsymbol{q}_{k|k-1} + \boldsymbol{H}_k {}^{\mathrm{T}} \boldsymbol{V}_k^{-1} \boldsymbol{m}_k \right], \end{aligned}$$

whereas the update of the covariance matrix is given by

$$\displaystyle \begin{aligned} \boldsymbol{C}_{k|k} = \left[ \left( \boldsymbol{C}_{k|k-1} \right)^{-1} + \boldsymbol{H}_k {}^{\mathrm{T}} \boldsymbol{V}_k^{-1} \boldsymbol{H}_k \right]^{-1}. \end{aligned}$$
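The equivalence of the two formulations is easy to verify numerically for a linear measurement model. The sketch below again assumes a two-component state with a scalar position measurement, H = (1, 0); the predicted state, its covariance and the measurement are arbitrary illustrative numbers.

```python
def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def update_gain(q, C, m, v):
    """Gain-matrix update for H = [1, 0] and measurement variance v."""
    s = v + C[0][0]
    K = [C[0][0] / s, C[1][0] / s]
    r = m - q[0]
    q_upd = [q[0] + K[0] * r, q[1] + K[1] * r]
    C_upd = [[C[i][j] - K[i] * C[0][j] for j in range(2)] for i in range(2)]
    return q_upd, C_upd

def update_information(q, C, m, v):
    """Information-filter update for the same model."""
    W = inv2(C)                               # (C_{k|k-1})^-1
    # H^T V^-1 H only contributes 1/v to the (0, 0) element
    W_upd = [[W[0][0] + 1.0 / v, W[0][1]], [W[1][0], W[1][1]]]
    C_upd = inv2(W_upd)
    # (C_{k|k-1})^-1 q_{k|k-1} + H^T V^-1 m
    rhs = [W[0][0] * q[0] + W[0][1] * q[1] + m / v,
           W[1][0] * q[0] + W[1][1] * q[1]]
    q_upd = [C_upd[0][0] * rhs[0] + C_upd[0][1] * rhs[1],
             C_upd[1][0] * rhs[0] + C_upd[1][1] * rhs[1]]
    return q_upd, C_upd

q_pred = [1.0, 0.2]
C_pred = [[0.04, 0.01], [0.01, 0.09]]
q_a, C_a = update_gain(q_pred, C_pred, m=1.3, v=0.01)
q_b, C_b = update_information(q_pred, C_pred, m=1.3, v=0.01)
```

Both calls return the same state and covariance up to rounding. The information form inverts matrices of the dimension of the state vector, the gain form matrices of the dimension of the measurement.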

The implementation of the Kalman filter requires the computation of the Jacobians \(\boldsymbol{F}_{k|k-1}\) and \(\boldsymbol{H}_k\). A compilation of analytical formulas for two important cases (fixed-target configuration and solenoidal configuration) is given in [61].

Full information of the track parameters at the end of the track is obtained when all n measurements in the track candidate have been processed by the filter. The full information can be propagated back to all previous estimates by another iterative procedure, the Kalman smoother. A step of the smoother from layer k + 1 to layer k is for the state vector

$$\displaystyle \begin{aligned} \boldsymbol{q}_{k|n} = \boldsymbol{q}_{k|k}+ \boldsymbol{A}_k ( \boldsymbol{q}_{k+1|n} - \boldsymbol{q}_{k+1|k} ), \end{aligned}$$

where the smoother gain matrix is given by

$$\displaystyle \begin{aligned} \boldsymbol{A}_k = \boldsymbol{C}_{k|k} \boldsymbol{F}_{k+1|k} {}^{\mathrm{T}} ( \boldsymbol{C}_{k+1|k})^{-1}. \end{aligned}$$

The smoothed covariance matrix is

$$\displaystyle \begin{aligned} \boldsymbol{C}_{k|n} = \boldsymbol{C}_{k|k} - \boldsymbol{A}_k ( \boldsymbol{C}_{k+1|k} - \boldsymbol{C}_{k+1|n}) \boldsymbol{A}_k {}^{\mathrm{T}}. \end{aligned}$$

The smoother can also be realized by combining two filters running in opposite directions: a forward filter from \(\boldsymbol{m}_1\) to \(\boldsymbol{m}_n\) and a backward filter from \(\boldsymbol{m}_n\) to \(\boldsymbol{m}_1\). The smoothed states are the weighted mean of the predicted states of one filter and the updated states of the other filter. This approach is numerically more stable than the gain matrix formulation of the smoother.
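The gain-matrix form of the smoother can be sketched by storing the predicted and updated states of a forward filter and then stepping backwards. As before, the example assumes a straight track with state (x, t), scalar position measurements and a thin scatterer at each layer; it is a toy illustration, not a production implementation.

```python
def mat(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    """Transpose."""
    return [list(row) for row in zip(*A)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def filter_and_smooth(z, m, v, sigma_theta2, q0, C0):
    """Forward Kalman filter followed by the gain-matrix smoother."""
    pred, upd, Fs = [], [], []
    q, C, z_prev = q0, C0, 0.0
    for zk, mk in zip(z, m):
        F = [[1.0, zk - z_prev], [0.0, 1.0]]
        qp = [q[0] + (zk - z_prev) * q[1], q[1]]
        Cp = mat(mat(F, C), tr(F))
        Cp[1][1] += sigma_theta2                      # process noise Q_k
        s = v + Cp[0][0]
        K = [Cp[0][0] / s, Cp[1][0] / s]
        r = mk - qp[0]
        q = [qp[0] + K[0] * r, qp[1] + K[1] * r]
        C = [[Cp[i][j] - K[i] * Cp[0][j] for j in range(2)] for i in range(2)]
        pred.append((qp, Cp)); upd.append((q, C)); Fs.append(F)
        z_prev = zk
    n = len(z)
    smooth = [None] * n
    smooth[n - 1] = upd[n - 1]                        # q_{n|n} is already smoothed
    for k in range(n - 2, -1, -1):
        qk, Ck = upd[k]
        qp1, Cp1 = pred[k + 1]
        qs1, Cs1 = smooth[k + 1]
        A = mat(mat(Ck, tr(Fs[k + 1])), inv2(Cp1))    # smoother gain A_k
        dq = [qs1[0] - qp1[0], qs1[1] - qp1[1]]
        qs = [qk[0] + A[0][0] * dq[0] + A[0][1] * dq[1],
              qk[1] + A[1][0] * dq[0] + A[1][1] * dq[1]]
        dC = [[Cp1[i][j] - Cs1[i][j] for j in range(2)] for i in range(2)]
        corr = mat(mat(A, dC), tr(A))
        Cs = [[Ck[i][j] - corr[i][j] for j in range(2)] for i in range(2)]
        smooth[k] = (qs, Cs)
    return upd, smooth

z = [1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.5 + 0.1 * zk for zk in z]                      # noise-free hits on a line
upd, smooth = filter_and_smooth(z, m, v=1e-4, sigma_theta2=1e-6,
                                q0=[0.0, 0.0], C0=[[100.0, 0.0], [0.0, 100.0]])
```

At the first layer the filtered slope is still poorly known because only one hit has been seen, whereas the smoothed slope uses all five hits.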

13.1.3.5 Track Quality and Robust Estimation

Robust estimators are insensitive to outliers, i.e., measurements that are biased or do not originate from the particle creating the majority of the hits in a track candidate. Some estimators are inherently robust by construction; other estimators can be made robust by finding and discarding outliers.

In the Kalman filter, the residual of the measurement in layer k with respect to the updated state vector is

$$\displaystyle \begin{aligned} {\boldsymbol{r}}_{k|k} = \boldsymbol{m}_k - \boldsymbol{h}_k (\boldsymbol{q}_{k|k}), \end{aligned}$$

and the covariance matrix of this residual is

$$\displaystyle \begin{aligned} \boldsymbol{R}_{k|k} = \boldsymbol{V}_k - \boldsymbol{H}_k \boldsymbol{C}_{k|k} \boldsymbol{H}_k {}^{\mathrm{T}}. \end{aligned}$$

The chi-square increment in layer k is

$$\displaystyle \begin{aligned} \chi_{k,+}^2 = {\boldsymbol{r}}_{k|k} {}^{\mathrm{T}} \boldsymbol{R}_{k|k}^{-1} {\boldsymbol{r}}_{k|k}, \end{aligned}$$

and the total chi-square of the track is found by summing up the chi-square increments for all measurements in the track candidate. The total chi-square is used to evaluate the quality of the track candidate. An excessively large value of this test statistic indicates that one or more of the measurements of the track candidate do not originate from the particle creating the majority of the measurements. Such measurements are called outliers.

An outlier rejection procedure can make use of the chi-squares of the measurements with respect to the smoothed predictions, i.e., a weighted mean of the predicted states of a forward and a backward Kalman filter. The measurement with the largest value of the chi-square is removed, and the total chi-square is again calculated. This procedure is repeated until the value of the total chi-square falls below a defined threshold.
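The procedure can be sketched for a straight-line track without multiple scattering, where the smoothed prediction for a hit is simply the prediction of a line fit to all other hits. The hit values, the common resolution, the threshold `chi2_max` and the minimum number of hits are illustrative assumptions.

```python
def line_fit(z, m, sigma):
    """Least-squares line m = x0 + t z for equal errors; returns x0, t, covariance."""
    n = len(z)
    sz = sum(z)
    szz = sum(a * a for a in z)
    sm = sum(m)
    szm = sum(a * b for a, b in zip(z, m))
    det = n * szz - sz * sz
    x0 = (szz * sm - sz * szm) / det
    t = (n * szm - sz * sm) / det
    s2 = sigma ** 2
    return x0, t, [[s2 * szz / det, -s2 * sz / det], [-s2 * sz / det, s2 * n / det]]

def total_chi2(z, m, sigma):
    """Total chi-square of the fit to all hits."""
    x0, t, _ = line_fit(z, m, sigma)
    return sum(((mk - x0 - t * zk) / sigma) ** 2 for zk, mk in zip(z, m))

def smoothed_chi2(z, m, sigma, k):
    """Chi-square of hit k with respect to the prediction from all other hits."""
    x0, t, cov = line_fit(z[:k] + z[k + 1:], m[:k] + m[k + 1:], sigma)
    pred_var = cov[0][0] + 2.0 * z[k] * cov[0][1] + z[k] ** 2 * cov[1][1]
    r = m[k] - (x0 + t * z[k])
    return r * r / (sigma ** 2 + pred_var)

def reject_outliers(z, m, sigma, chi2_max, min_hits=3):
    """Remove the worst hit until the total chi-square is below chi2_max."""
    z, m, removed = list(z), list(m), []
    while len(z) > min_hits and total_chi2(z, m, sigma) > chi2_max:
        worst = max(range(len(z)), key=lambda k: smoothed_chi2(z, m, sigma, k))
        removed.append((z.pop(worst), m.pop(worst)))
    return z, m, removed

z_hits = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m_hits = [0.605, 0.696, 1.8, 0.902, 0.994, 1.103]    # the hit at z = 3 is an outlier
z_kept, m_kept, removed = reject_outliers(z_hits, m_hits, sigma=0.01, chi2_max=15.0)
```

Because the prediction for each hit is computed without that hit, a single outlier stands out clearly, although it still biases the predictions for all other hits.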

In the presence of a potentially large fraction of outliers in a track candidate, the sequential outlier rejection procedure outlined above might become unstable, because the smoothed predictions may themselves be biased by outliers. An alternative approach is the Gaussian-sum filter [62]. This algorithm is based on the assumption that the probability distribution of the measurement error can be modeled as a two-component Gaussian mixture, where a narrow component represents the hypothesis that the measurement is real and a wider component represents the hypothesis that the measurement is an outlier. It takes the form of a set of Kalman filters running in parallel, each Kalman filter representing a specific hypothesis of a subset of the measurements that should be classified as outliers. A weight attached to each Kalman filter can be interpreted as the probability of correctness of the hypothesis. In the end, the Kalman filter with the largest weight or a weighted mean of the different filters can be taken as the final estimate.

The Gaussian-sum filter can also be used to deal with a mixture model of the process noise, i.e., the stochastic disturbance of the track because of interactions with the detector material [63]. In the case of bremsstrahlung, a successful application to the reconstruction of electrons is described in [50].

For the treatment of outliers, the Gaussian-sum filter has two disadvantages. First, it may create a large number of Kalman filters running in parallel because of poor knowledge of the track parameters in the early stages of the filter, making the approach expensive in terms of computing time. Second, an explicit outlier model is required. A faster and even more robust alternative is the Deterministic Annealing Filter [64]. This filter is an iterated Kalman filter with annealing, which assigns small weights to measurements far away from the track. A temperature parameter is introduced, facilitating convergence to the globally optimal solution. The iterations start at a high temperature, continue with a gradual lowering of the temperature and converge at the nominal value of the temperature. The procedure is easily generalized to the situation of several measurements being present in the same detector layer. In this case the measurements compete for inclusion in the track. As opposed to a standard outlier rejection approach, the assignment of measurements is soft. This means that several measurements in the same detector layer might contribute to the final estimate of the track parameters, each with a weight equal to the assignment probability. A further generalization is the multi-track filter, where several tracks are allowed to compete for compatible hits in all detector layers [65]. For an experimental application, see [66].
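The annealing idea can be illustrated with a one-parameter toy problem, the estimation of a single position from several measurements of equal resolution, one of which is an outlier. The weight function used below, a Gaussian term competing with a constant cut-off term at temperature T, is one commonly used form and is an assumption of this sketch, as are the cut-off value `chi2_cut` and the temperature schedule. It is not meant to reproduce the algorithm of [64] in detail.

```python
import math

def annealed_weights(residuals, sigma, T, chi2_cut):
    """Soft assignment weights at temperature T (assumed functional form)."""
    weights = []
    for r in residuals:
        chi2 = (r / sigma) ** 2
        a = math.exp(-chi2 / (2.0 * T))
        b = math.exp(-chi2_cut / (2.0 * T))
        weights.append(a / (a + b))
    return weights

def annealing_fit(m, sigma, temperatures=(64.0, 16.0, 4.0, 1.0), chi2_cut=9.0):
    """Iterated weighted mean, with weights recomputed while the temperature drops."""
    est = sum(m) / len(m)                 # start from the unweighted mean
    w = [1.0] * len(m)
    for T in temperatures:
        w = annealed_weights([mk - est for mk in m], sigma, T, chi2_cut)
        est = sum(wk * mk for wk, mk in zip(w, m)) / sum(w)
    return est, w

measurements = [1.02, 0.97, 1.01, 0.99, 1.00, 2.5]   # the last value is an outlier
est, w = annealing_fit(measurements, sigma=0.02)
```

At high temperature all weights vary smoothly with the residual, so the estimate can move away from a poor starting value. At the nominal temperature T = 1 the weights of compatible measurements approach one and the weight of the outlier becomes negligible.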

13.1.3.6 Jet Reconstruction

Jets are bundles of collimated hadrons, reflecting hard scattering processes at the parton level. In order to carry out detailed comparisons between parton-level predictions and hadron-level observations a well-defined “jet finder” is required. In jet finding, information from both the tracking devices and the calorimeters is used.

Jet finding can be understood as finding clusters in the set of reconstructed tracks, including neutral tracks. As in the case of vertex finding (see Sect. 13.2.2), various types of clustering methods have been proposed and investigated. The performance strongly depends on the underlying physics, and usually a jet finder is optimized for specific physics requirements. For instance, the widely used \(k_\perp\) clustering algorithm comes in several versions, among them one for \(e^+e^-\) collisions [67] and one for hadron-hadron collisions [68].

Hierarchical cluster algorithms offer a large variety of jet finders, differing mainly by the definition of the measure of distance between objects (tracks and jets), but sometimes also by the order in which the objects are combined. Some examples of agglomerative clustering algorithms are described and studied in [69]. Table 13.1 gives a summary of the distance measures used. The names refer to the ones used in [69]. \(E_i\) is the energy of cluster i, \(\boldsymbol{p}_i\) is its momentum, \(\theta_{ij}\) is the opening angle between the momentum vectors of the two clusters, and \(E_{\mathrm{vis}}\) is the visible energy.

Table 13.1 Some distance measures used for agglomerative jet finding with respective references

A divisive hierarchical clustering algorithm is described in [76]. It is based on the following measure of distance between two tracks:

$$\displaystyle \begin{aligned} d_{ij}=\frac{\theta^2_{ij}}{p_i p_j}, \end{aligned}$$

but can be generalized to any other measure of distance. The method first constructs a minimum spanning tree [77] in the edge-weighted graph connecting all particles with each other and then proceeds to cut the tree along its longest edges. The procedure stops when the longest remaining edge is shorter than a fixed multiple of the median of all edge lengths.
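The divisive procedure can be sketched with Prim's algorithm for the minimum spanning tree and a simple union-find structure for the clusters that remain after cutting. The sketch takes the median over the edges of the spanning tree, which is an interpretation of "all edge lengths". The multiple `factor` and the momentum vectors are illustrative assumptions as well.

```python
import math
from statistics import median

def distance(a, b):
    """d_ij = theta_ij^2 / (p_i p_j) for two momentum 3-vectors."""
    pa = math.sqrt(sum(x * x for x in a))
    pb = math.sqrt(sum(x * x for x in b))
    cos_t = sum(x * y for x, y in zip(a, b)) / (pa * pb)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    return theta * theta / (pa * pb)

def minimum_spanning_tree(momenta):
    """Prim's algorithm on the complete graph; returns edges as (d, i, j)."""
    n = len(momenta)
    in_tree = [False] * n
    in_tree[0] = True
    best = [(distance(momenta[0], momenta[j]), 0) for j in range(n)]
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        edges.append((best[j][0], best[j][1], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = distance(momenta[j], momenta[k])
                if d < best[k][0]:
                    best[k] = (d, j)
    return edges

def mst_jets(momenta, factor=5.0):
    """Cut all tree edges longer than factor * median tree edge; return the clusters."""
    edges = minimum_spanning_tree(momenta)
    cut = factor * median(d for d, _, _ in edges)
    parent = list(range(len(momenta)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for d, i, j in edges:
        if d <= cut:                      # keep short edges, cut long ones
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(momenta)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Two roughly back-to-back bundles of three tracks each (illustrative, GeV/c)
momenta = [(10.0, 0.5, 0.0), (8.0, -0.4, 0.3), (5.0, 0.2, -0.5),
           (-9.0, 0.3, 0.2), (-7.0, -0.5, -0.1), (-4.0, 0.1, 0.4)]
jets = mst_jets(momenta)
```

Since the threshold is fixed once the tree is built, removing all edges above it at once gives the same clusters as repeatedly cutting the longest remaining edge.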

Several non-hierarchical cluster algorithms have been proposed as well. Some of them employ general unsupervised learning methods, such as deterministic annealing [78] or k-means [79]. Others are specially designed for jet finding, for instance the cone algorithm described in [72]. It is an iterative procedure which constructs jets out of seeds. In contrast to the hierarchical clustering methods, the jets may overlap and a unique assignment has to be forced at the end. A modified cone algorithm suitable for the much larger multiplicity of heavy-ion collisions is proposed in [80]. A specialized jet finder for the reconstruction of hadronic τ-decays is described in [81].

13.1.4 Detector Alignment

Alignment is the general term used in experimental high energy physics to refer to the process of obtaining and applying corrections to the nominal setup of a given experiment. These corrections are typically related to geometrical displacements of devices with a spatial resolution, in contrast to calibrations, where the corrections are usually extracted from pedestal or reference measurements to compensate for offsets in scalar measurements. Misalignment compromises tracking and vertex finding [82] and thus directly affects physics measurements such as momentum and invariant mass resolutions, or the efficiency of b-tagging algorithms. There are various possibilities for the treatment of alignment corrections, ranging from simple translations and rotations, equivalent to those of a rigid body, to more complex deformations, like sags or twists.

To this end experiments typically use several independent strategies [83]. For testing the long-term stability or the alignment of sub-detectors with respect to each other, very often so-called hardware alignment is utilized, where special reference markers are measured directly e.g. via optical systems or photogrammetry. However, these techniques reach only a limited precision in the range of several tens to hundreds of microns. If the intrinsic resolution of a tracking device is smaller, an improved resolution can only be obtained with track-based alignment, where the information from recorded particle tracks is used to obtain the alignment parameters [83, 84]. For various examples of the track-based alignment methods used in experiments since the LEP era, see [85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100].

13.1.4.1 General Overview

The basis of all track-based alignment algorithms is an extended track model \(\boldsymbol{d}\), where the measurements \(\boldsymbol{m}\) depend not only on the true track parameters \(\boldsymbol{q}_0\), but also on a set of alignment parameters \(\boldsymbol{p}_0\) that describe the effects of sufficiently small deviations from the ideal geometry:

$$\displaystyle \begin{aligned} \boldsymbol{m} = {\boldsymbol{d}} ( \boldsymbol{q}_0, \boldsymbol{p}_0 ) + \boldsymbol{\gamma}, \quad \mathrm{cov} ( \boldsymbol{\gamma} ) = \boldsymbol{V}. \end{aligned}$$

The stochastic term \(\boldsymbol{\gamma}\), which describes the intrinsic resolution of the tracking devices and the effects of multiple scattering, is dealt with via its covariance matrix \(\boldsymbol{V}\). Since typically high momentum particles are used, energy-loss effects can be assumed to be deterministic and hence directly taken care of in the track model \(\boldsymbol{d}\) itself.

With an initial guess \(\breve{\boldsymbol{q}}\) for the track parameters and \(\breve{\boldsymbol{p}}\) for the alignment parameters, this model allows residuals to be defined as functions of the unknowns \(\boldsymbol{q}\) and \(\boldsymbol{p}\):

$$\displaystyle \begin{aligned} {\boldsymbol{r}} ( \boldsymbol{q}, \boldsymbol{p} ) = \boldsymbol{m} - {\boldsymbol{d}} ( \boldsymbol{q}, \boldsymbol{p} ) \approx \boldsymbol{m} - \breve{\boldsymbol{d}}\;\negthickspace - \boldsymbol{D}_{q} \boldsymbol{\Delta}\boldsymbol{q} - \boldsymbol{D}_{p} \boldsymbol{\Delta}\boldsymbol{p} {} \end{aligned} $$
(13.5)

with

$$\displaystyle \begin{aligned} \breve{\boldsymbol{d}}\;\negthickspace = {\boldsymbol{d}} ( \breve{\boldsymbol{q}}, \breve{\boldsymbol{p}} ), \quad \boldsymbol{\Delta}\boldsymbol{q} =\boldsymbol{q} - \breve{\boldsymbol{q}}, \quad \boldsymbol{\Delta}\boldsymbol{p} = \boldsymbol{p} - \breve{\boldsymbol{p}}, \end{aligned}$$
$$\displaystyle \begin{aligned} \boldsymbol{D}_{q} = \left.\frac{\partial{\boldsymbol{d}} }{ \partial\boldsymbol{q}}\right|{}_{\breve{\boldsymbol{q}},\breve{\boldsymbol{p}}}, \quad \boldsymbol{D}_{p} = \left.\frac{\partial{\boldsymbol{d}} }{ \partial\boldsymbol{p}}\right|{}_{\breve{\boldsymbol{q}},\breve{\boldsymbol{p}}}. \end{aligned}$$

The goal of a track-based alignment algorithm is to determine \(\boldsymbol{p}\) from the residuals \(\boldsymbol{r}\) by minimizing the quadratic form \(\chi^2 = \boldsymbol{r}^{\mathrm{T}}\boldsymbol{V}^{-1}\boldsymbol{r}\), using a sufficiently large set of recorded tracks. The methods used are quite diverse, but can be grouped into two categories: biased and unbiased algorithms.

Biased algorithms initially ignore the fact that the initial guess of the track parameters \(\boldsymbol{q}_0\) is in general biased by the actual misalignment. In other words, by setting \(\boldsymbol{q} = \breve{\boldsymbol{q}}\) for every track, the residuals become a function of \(\boldsymbol{p}\) alone, i.e., \(\boldsymbol{r}(\boldsymbol{q}, \boldsymbol{p}) \to \boldsymbol{r}(\boldsymbol{p})\). In general, the influence of the biased track information has to be compensated by iterating several times over the track sample, where at each iteration step the previously determined parameters are applied to the track reconstruction.

Unbiased algorithms, on the other hand, minimize the residuals or the normalized residuals, respectively, estimating at the same time the track parameters. The problem with such an approach is the resulting huge number of parameters. In the presence of N alignment parameters and a sample of M tracks with m track parameters each, a total of N + m ⋅ M parameters have to be dealt with. While the value of N depends on the experimental setup, and m usually equals 5, the number of tracks M always has to be large to acquire reasonable statistics. In return, unbiased algorithms usually do not require iterations, with the possible exception of problems like non-linearities or rejection of outliers.

Apart from the differences between the various algorithms, it should be noted that the final result of any track-based alignment is always limited by the tracks used. Basic quality cuts, like the selection of high momentum tracks to minimize the influence of multiple scattering or cuts on the minimum number of hits, have a strong influence on the convergence. More subtle is the effect of an unbalanced mixture of tracks or the complete absence of some types of tracks, such as tracks from collisions and cosmic events or tracks taken with and without a magnetic field. This is due to the fact that any kind of track has several unconstrained degrees of freedom, usually referred to as weak modes, weakly defined modes or \(\chi^2\)-invariant modes. As an example, typical weak modes for straight tracks are shears but not bends, and vice versa for curved tracks. Combining the information of both kinds of tracks is therefore a reasonable strategy to avoid these deformations in the final result. The most obvious weak mode is a translation or rotation of the entire tracking device, which can only be fixed with some kind of reference frame, be it an external system or by definition. This, however, is less severe and sometimes not even considered at all, as it does not affect the internal alignment of the tracking device.

Once a set of alignment parameters is calculated, it should always be validated [83, chapter 11]. Apart from checking the improvement of the residuals, several physics measurements can be utilized, especially to probe for remaining weak modes. Known charge, forward-backward or φ-symmetries of distinct physics processes can be used. Distributions of the signed curvature or the signed transverse impact parameter are also sensitive observables.

13.1.4.2 Examples of Alignment Algorithms

Some modern experiments deploy large tracking devices that require a large number of alignment parameters, of the order of \(10^5\). In such a case the computation of parameters by using straightforward recipes might become unreasonably slow or cause numerical problems. The two algorithms presented in this section are examples of how to cope with such challenging circumstances.

13.1.4.2.1 The HIP Algorithm

The HIP algorithm [101] is a straightforward and easy-to-implement biased alignment algorithm. It computes the alignment parameters for each alignable object separately. Only when iterating over the track sample is a certain kind of indirect feedback between the alignable objects established, due to the track refit.

Since only individual alignable objects are considered, Eq. (13.5) can be partitioned. This is simply done by evaluating the corresponding expressions for each alignable object i together with its associated parameters \(\boldsymbol{p}_i\):

$$\displaystyle \begin{aligned} {\boldsymbol{r}}_i ( \boldsymbol{p}_i ) = \boldsymbol{m}_i - {\boldsymbol{d}}_i(\breve{\boldsymbol{q}},\boldsymbol{p}_i) \approx \boldsymbol{m}_i - \breve{\boldsymbol{d}}\;\negthickspace_{i} - \boldsymbol{D}_{p i} \boldsymbol{\Delta}\boldsymbol{p}_i \end{aligned}$$

with

$$\displaystyle \begin{aligned} \breve{\boldsymbol{d}}\;\negthickspace_{i} = {\boldsymbol{d}}_i ( \breve{\boldsymbol{q}}, \breve{\boldsymbol{p}}_{i} ), \quad \boldsymbol{\Delta}\boldsymbol{p}_i = \boldsymbol{p}_i - \breve{\boldsymbol{p}}_i, \quad \boldsymbol{D}_{p i} = \left. \frac{\partial{\boldsymbol{d}}_i}{ \partial\boldsymbol{p}_i}\right|{}_{\breve{\boldsymbol{q}},\,\breve{\boldsymbol{p}}_i}. \end{aligned}$$

The result is determined by minimizing the normalized squared residuals from a given set of tracks, again for each alignable object separately. The formal solution is given by

$$\displaystyle \begin{aligned} \boldsymbol{\Delta}\boldsymbol{p}_i = \left( \, \sum_{\mathrm{tracks}} \boldsymbol{D}_{p i}{}^{\mathrm{T}} \boldsymbol{V}_{i\,}^{-1} \boldsymbol{D}_{p i} \right)^{-1} \left( \, \sum_{\mathrm{tracks}} \boldsymbol{D}_{p i}{}^{\mathrm{T}} \boldsymbol{V}_{i\,}^{-1} {\boldsymbol{r}}_i(\breve{\boldsymbol{p}}_i) \right) \end{aligned}$$
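For an alignable object with a single alignment parameter, the formal solution reduces to a ratio of two sums over tracks. The sketch below assumes such a one-parameter object, for instance a module with an unknown offset along the measured coordinate, for which the derivative of the track model with respect to the parameter is 1. The residuals and variances are invented for illustration.

```python
def hip_update(derivs, variances, residuals):
    """Delta p_i for one alignable object with a single alignment parameter.

    derivs    : derivative of the predicted measurement w.r.t. p_i, per track
    variances : variance of each measurement
    residuals : m_i - d_i evaluated at the current track and alignment parameters
    """
    lhs = sum(D * D / v for D, v in zip(derivs, variances))
    rhs = sum(D * r / v for D, r, v in zip(derivs, residuals, variances))
    return rhs / lhs

# Four tracks crossing the same module (illustrative numbers)
residuals = [0.12, 0.09, 0.11, 0.08]
variances = [1e-4, 1e-4, 1e-4, 1e-4]
derivs = [1.0, 1.0, 1.0, 1.0]
delta_p = hip_update(derivs, variances, residuals)
```

With unit derivatives and equal variances the correction is simply the mean residual of the module. In an actual iteration the tracks would then be refitted with the updated geometry before the next pass.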

13.1.4.2.2 The Millepede Algorithm

The Millepede algorithm [102] is an unbiased algorithm that minimizes the sum of the squared residuals of all tracks at once. To this end a system of linear equations, equivalent to the formal solution of an ordinary \(\chi^2\)-fit, is solved. However, to achieve this within a reasonable amount of time, only the solution for the alignment parameters is computed, while the computation of the improved track parameters is skipped. This is possible because of the special structure of the system: Firstly, the coefficient matrix is symmetric and, mostly due to the independence of the individual tracks, relatively sparse. Secondly, only the alignment parameters are common parameters for all track measurements, while the specific track parameters are only relevant for each corresponding track. Due to the latter, the solutions for the alignment and track parameters are only coupled via coefficient matrices of the form

$$\displaystyle \begin{aligned} {\boldsymbol{G}} = \boldsymbol{D}_{p}{}^{\mathrm{T}} \, \boldsymbol{V}^{-1} \boldsymbol{D}_{q} . \end{aligned}$$

To set up the reduced system of equations, for each track the following information has to be extracted:

$$\displaystyle \begin{aligned} \boldsymbol{\Gamma} = \boldsymbol{D}_{q}{}^{\mathrm{T}} \, \boldsymbol{V}^{-1} \boldsymbol{D}_{q}, \quad \boldsymbol{\beta} = \boldsymbol{D}_{q}{}^{\mathrm{T}} \, \boldsymbol{V}^{-1} \left( \, \boldsymbol{m} - \breve{\boldsymbol{d}}\;\negthickspace - \boldsymbol{D}_{p} \, \boldsymbol{\Delta}\boldsymbol{p}^{\prime} \, \right). \end{aligned}$$

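Here \(\boldsymbol{\Delta}\boldsymbol{p}^{\prime} = \boldsymbol{p}^{\prime} - \breve{\boldsymbol{p}}\) may already include an estimate \(\boldsymbol{p}^{\prime}\) of the actual alignment. Then compute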

$$\displaystyle \begin{aligned} \boldsymbol{\Delta}\boldsymbol{C} = \boldsymbol{D}_{p}{}^{\mathrm{T}} \, \boldsymbol{V}^{-1} \boldsymbol{D}_{p} - {\boldsymbol{G}} \, \boldsymbol{\Gamma}^{-1} \boldsymbol{G}{}^{\mathrm{T}}, \quad \boldsymbol{\Delta}\boldsymbol{g} = \boldsymbol{D}_{p}{}^{\mathrm{T}} \, \boldsymbol{V}^{-1} \left( \, \boldsymbol{m} - \breve{\boldsymbol{d}}\;\negthickspace - \boldsymbol{D}_{p} \, \boldsymbol{\Delta}\boldsymbol{p}^{\prime} + \boldsymbol{D}_{q} \, \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta} \, \right). \end{aligned}$$

Note the expression \(-\boldsymbol{\Gamma}^{-1}\boldsymbol{\beta}\) instead of \(\boldsymbol{\Delta}\boldsymbol{q}\). These are all the necessary terms; they implicitly include the full information from all track parameters. The complete system of equations to determine the alignment parameters then reads

$$\displaystyle \begin{aligned} \boldsymbol{C} \, \boldsymbol{\Delta}\boldsymbol{p} = -\boldsymbol{g}, \end{aligned}$$

with

$$\displaystyle \begin{aligned} \boldsymbol{C} = \sum_{\mathrm{tracks}} \! \boldsymbol{\Delta}\boldsymbol{C}, \quad \boldsymbol{g} = \sum_{\mathrm{tracks}} \! \boldsymbol{\Delta}\boldsymbol{g}. \end{aligned}$$

The solution by matrix inversion is only feasible if the number of parameters is fairly small (\(N \le 10^3\)). The matrix \(\boldsymbol{C}\) is usually relatively sparse, so that less time-consuming and more reliable methods can be used, such as the GMRES algorithm [103].
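The elimination of the track parameters can be illustrated with a toy setup: four layers measuring one coordinate, straight tracks with two parameters, and unknown offsets of the two inner layers, while the outer layers serve as the reference. The model is then exactly linear, the initial guesses can be set to zero, and \(\boldsymbol{\Delta}\boldsymbol{p}^{\prime}\) vanishes. In this sketch the right-hand side is accumulated in a vector `b` such that the reduced system reads C Δp = b, with the sign fixed by the normal equations of Eq. (13.5). The geometry, the track sample and the offsets are invented for illustration.

```python
def mat(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    """Transpose."""
    return [list(row) for row in zip(*A)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

z = [0.0, 1.0, 2.0, 3.0]                  # layer positions
sigma2 = 1e-4                             # V = sigma2 * identity
p_true = [0.05, -0.03]                    # offsets of layers 1 and 2, to be recovered
D_q = [[1.0, zk] for zk in z]             # derivatives w.r.t. the track parameters
D_p = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # w.r.t. the offsets
tracks = [(0.1, 0.02), (-0.2, 0.05), (0.3, -0.04), (0.0, 0.01)]

C = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
for x0, t in tracks:
    offsets = [0.0, p_true[0], p_true[1], 0.0]
    m = [x0 + t * zk + off for zk, off in zip(z, offsets)]
    resid = [[mk] for mk in m]            # m - d_breve, with d_breve = 0 here
    Gamma = [[v / sigma2 for v in row] for row in mat(tr(D_q), D_q)]
    G = [[v / sigma2 for v in row] for row in mat(tr(D_p), D_q)]
    beta = [[row[0] / sigma2] for row in mat(tr(D_q), resid)]
    Gamma_inv = inv2(Gamma)
    refit = mat(D_q, mat(Gamma_inv, beta))            # D_q Gamma^-1 beta
    reduced = [[resid[k][0] - refit[k][0]] for k in range(len(z))]
    full = mat(tr(D_p), D_p)
    corr = mat(mat(G, Gamma_inv), tr(G))
    db = [row[0] / sigma2 for row in mat(tr(D_p), reduced)]
    for i in range(2):
        b[i] += db[i]
        for j in range(2):
            C[i][j] += full[i][j] / sigma2 - corr[i][j]

C_inv = inv2(C)
delta_p = [C_inv[0][0] * b[0] + C_inv[0][1] * b[1],
           C_inv[1][0] * b[0] + C_inv[1][1] * b[1]]
```

Only 2 × 2 matrices are inverted, although the joint problem has 2 + 2 · 4 unknowns. For noise-free hits `delta_p` reproduces the offsets in `p_true`. Without the two reference layers the matrix C would be singular, since a common shift and a shear of all layers could be absorbed by the track parameters.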

It is also possible to introduce constraints into the solution, which allows alignment on several hierarchical levels at once. When aligning, for instance, on module level and layer level at the same time, these constraints can remove redundant degrees of freedom by forcing the average movement of all modules within one layer to zero.

Millepede is a well-tested algorithm. To use it efficiently, some knowledge of its inner workings is advantageous. The HIP algorithm is simpler to implement, but less suitable for very large setups than Millepede. Another unbiased algorithm is the Kalman Alignment Algorithm [104]. It is a sequential method, derived from the Kalman filter (see also [105]).

13.1.5 Momentum Resolution

The momentum resolution that can be achieved by a tracking detector is determined by the magnetic field, the arrangement and precision of the tracking detectors, and the amount of material crossed by the particle. Simple approximate formulas can be obtained for two cases:

  (a) A spectrometer consisting of a central bending magnet and two arms of tracking detectors in front of and behind the magnet. This is a typical arrangement for a fixed-target experiment with small track multiplicities.

  (b) A set of cylindrical tracking detectors immersed in a homogeneous magnetic field. This is a typical arrangement for the barrel part of a collider experiment, for instance layers of silicon or a TPC.

The units are the same as in Sect. 13.1.3.2: momentum in GeV/c, length in meters, and magnetic field in Tesla.

13.1.5.1 Two-arm Spectrometer

We assume that the trajectory of the particle is parallel to the z axis and that \(B_y\) is the only significant component of the magnetic field. The angle of deflection is then given by [37]

$$\displaystyle \begin{aligned} \alpha\approx -\frac{k q}{p} \int_L B_y\, \mathrm{d} z = -\frac{k q}{p} \bar{B}_y L, \end{aligned}$$

where L is the length of the magnet, \(\bar {B}_y\) is the average value of the field along the trajectory, p is the momentum, and q, k are as in Eq. (13.2). Assuming that |q| = 1, linear error propagation gives

$$\displaystyle \begin{aligned} \frac{\sigma(p)}{p}=\frac{p\sigma(\alpha)}{k |\bar{B}_y| L}. \end{aligned}$$

Assume that each arm consists of m identical position detectors spread over a length l, and that the standard deviation of the measurement error of x is equal to δ. The best angular resolution is obtained if in each arm half of the detectors are placed at each end of the arm. Neglecting multiple scattering, it is equal to

$$\displaystyle \begin{aligned} \sigma(\alpha)=\frac{2 \delta}{l\sqrt{m/2}}. \end{aligned}$$

The relative momentum resolution due to measurement errors is therefore

$$\displaystyle \begin{aligned} \frac{\sigma_{\mathrm{me}}(p)}{p}=\frac{2 p\delta}{l\sqrt{m/2}k |\bar{B}_y| L}. \end{aligned}$$

Although this arrangement optimizes the precision in terms of geometry, it offers little redundancy for track finding and should be used only in setups with trivial pattern recognition requirements, for instance in the forward direction of fixed target experiments.

At low energies, multiple scattering can no longer be neglected. Whereas \(\sigma(p)/p\) arising from position measurement errors only is proportional to p, the term \(\sigma_{\mathrm{ms}}(p)/p\) arising from multiple scattering is proportional to \(1/(\beta |\bar {B}_y| L)\), which is large for small β and constant for high momenta (β ≈ 1). Under the same assumptions about the detector positions as above, the following formula is obtained:

$$\displaystyle \begin{aligned} \frac{\sigma_{\mathrm{ms}}(p)}{p}=\frac{0.0136}{\beta k |\bar{B}_y| L}\left( \frac{m{}d}{X_0} \right)^{1/2}, \end{aligned}$$

where \(d/X_0\) is the thickness of a single detector in units of radiation length. The total resolution is obtained by adding the corresponding variances and taking the square root,

$$\displaystyle \begin{aligned} \frac{\sigma(p)}{p}=\frac{\sigma_{\mathrm{me}}(p)}{p}\oplus\frac{\sigma_{\mathrm{ms}}(p)}{p}=ap\oplus b, \end{aligned}$$

with a and b depending on the detector and the magnetic field.
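As a rough numerical illustration, the two contributions and their quadratic sum can be evaluated with a few lines of Python. This is only a sketch: the value k = 0.29979 GeV/c T⁻¹ m⁻¹ is assumed here for the constant of Eq. (13.2), and all function and argument names are hypothetical, chosen for this example.

```python
import math

K = 0.29979  # assumed value of k from Eq. (13.2), in GeV/c per (T m)

def sigma_me_rel(p, delta, arm_length, m, b_avg, magnet_length):
    """Relative momentum error from position measurement errors.

    p in GeV/c, delta and all lengths in m, b_avg in T, m detectors per arm.
    """
    return 2.0 * p * delta / (
        arm_length * math.sqrt(m / 2.0) * K * abs(b_avg) * magnet_length)

def sigma_ms_rel(beta, m, d_over_x0, b_avg, magnet_length):
    """Relative momentum error from multiple scattering."""
    return 0.0136 * math.sqrt(m * d_over_x0) / (
        beta * K * abs(b_avg) * magnet_length)

def sigma_total_rel(p, mass, delta, arm_length, m, d_over_x0, b_avg, magnet_length):
    """Quadratic sum of both terms; beta follows from p and the mass (GeV/c^2)."""
    beta = p / math.hypot(p, mass)
    me = sigma_me_rel(p, delta, arm_length, m, b_avg, magnet_length)
    ms = sigma_ms_rel(beta, m, d_over_x0, b_avg, magnet_length)
    return math.hypot(me, ms)
```

Doubling p doubles the first term, while the second term changes only through β, which reproduces the form ap ⊕ b for β ≈ 1.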

13.1.5.2 Cylindrical Spectrometer

Assume that there are m cylindrical detectors immersed in a homogeneous magnetic field \(B_z\) parallel to the z axis. The projection of the track on the xy plane is a circle with curvature κ. For high momentum the circle can be approximated by a parabola, the detector cylinders can be approximated by planes, and multiple scattering can be neglected. For this case closed formulas for the joint covariance matrix of κ and the tangent \(t_\varphi =\tan \varphi \) of the initial track direction φ can be given [106, 107]. For equidistant detectors, uniform resolution δ and \(t_\varphi = 0\) it is given by

where L is now the track length in the xy projection. L is approximately equal to the radial distance between the innermost and the outermost detector. As \(\kappa = kB_z/p_{\mathrm{T}}\),

$$\displaystyle \begin{aligned} \frac{\sigma_{\mathrm{me}}(p_{\mathrm{T}})}{{p_{\mathrm{T}}}}=\frac{{p_{\mathrm{T}}}}{k |B_z| L}\frac{\delta}{L}\left[\frac{720\,(m-1)^3}{(m-2)m(m+1)(m+2)}\right]^{1/2}. \end{aligned}$$
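The step from the curvature to the transverse momentum can be made explicit by linear error propagation. The following is a short worked sketch, assuming \(|\kappa| = k|B_z|/p_{\mathrm{T}}\) and writing \(\sigma(\kappa)\) for the standard deviation of the curvature:

$$\displaystyle \begin{aligned} \sigma(\kappa)=\left|\frac{\partial\kappa}{\partial p_{\mathrm{T}}}\right|\sigma(p_{\mathrm{T}})=\frac{k|B_z|}{{p_{\mathrm{T}}}^2}\,\sigma(p_{\mathrm{T}}) \quad\Longrightarrow\quad \frac{\sigma(p_{\mathrm{T}})}{p_{\mathrm{T}}}=\frac{p_{\mathrm{T}}}{k|B_z|}\,\sigma(\kappa). \end{aligned}$$

Comparison with the formula above shows that it corresponds to a curvature error of

$$\displaystyle \begin{aligned} \sigma_{\mathrm{me}}(\kappa)=\frac{\delta}{L^2}\left[\frac{720\,(m-1)^3}{(m-2)m(m+1)(m+2)}\right]^{1/2}. \end{aligned}$$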

There is a high negative correlation between \(1/p_{\mathrm{T}}\) and the direction tangent \(t_\varphi\). For large m, the asymptotic values are

$$\displaystyle \begin{aligned} \frac{\sigma_{\mathrm{me}}(p_{\mathrm{T}})}{p_{\mathrm{T}}}&=\frac{p_{\mathrm{T}}}{k |B_z| L}\frac{\delta}{L}\left[\frac{720}{m+4}\right]^{1/2},\quad \sigma(t_\varphi)=\frac{\delta}{L}\left[\frac{192}{m+3.875}\right]^{1/2},\\ \mathrm{corr}(t_\varphi,1/p_{\mathrm{T}})&=-\frac{\sqrt{15}}{4}=-0.968. \end{aligned} $$

More general closed formulas for \(t_\varphi \neq 0\) are given in [107].

If one half of the detectors is placed at the center of the track and one quarter at either end, the variance of the curvature is minimal, and the covariance matrix reads

which is considerably smaller than in the equidistant case. This arrangement, however, is not particularly well suited for track finding and moreover difficult to realize.

The contribution of multiple scattering to the transverse momentum resolution can be approximated by

$$\displaystyle \begin{aligned} \frac{\sigma_{\mathrm{ms}}({p_{\mathrm{T}}})}{{p_{\mathrm{T}}}}=C_m\cdot\frac{s}{\beta k|B_z|L} \left(\frac{md}{X_0\cos\lambda} \right)^{1/2},{} \end{aligned} $$
(13.6)

where \(d/X_0\) is the thickness of a single detector in units of radiation length, λ = π∕2 − θ is the dip angle of the track, \(s = 0.0136\,(1 + 0.038\ln (d/X_0))\), k is as in Eq. (13.2), and \(C_m\) is a factor depending on m. Values of \(C_m\) for small m, obtained by the program described in [108], are given in Table 13.2. Note that the values are different from the ones given in [106]. In a time projection chamber \(m\,d/X_0\) has to be replaced by \(L/X_0\), where \(X_0\) is the radiation length of the gas. The factor \(\cos \lambda \) in the denominator accounts for the actual amount of matter traversed by a track with dip angle λ. Approximate formulas for the best possible resolution including multiple scattering can be found in [109].

Table 13.2 Values of \(C_m\) in Eq. (13.6)

The total transverse momentum resolution is calculated by quadratic addition,

$$\displaystyle \begin{aligned} \frac{\sigma(p_{\mathrm{T}})}{p_{\mathrm{T}}}=\frac{\sigma_{\mathrm{me}}(p_{\mathrm{T}})}{p_{\mathrm{T}}} \oplus \frac{\sigma_{\mathrm{ms}}(p_{\mathrm{T}})}{p_{\mathrm{T}}}, \end{aligned}$$

which can be written in the form

$$\displaystyle \begin{aligned} \frac{\sigma(p_{\mathrm{T}})}{p_{\mathrm{T}}}= \frac{a\,p_{\mathrm{T}}}{\sqrt{m+4}}\oplus\frac{b\sqrt{m}}{\sqrt{\cos\lambda}}. \end{aligned}$$

This shows that an optimal m exists for every \(p_{\mathrm{T}}\) and λ if the projected track length L is kept fixed. Overinstrumentation will deteriorate the resolution for low momenta unless additional measurements can be included without increasing the amount of matter to be traversed.
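The existence of an optimal number of layers can be illustrated by a simple scan over m. The sketch below uses the asymptotic form given above; the coefficients a and b are placeholders for detector-dependent constants, and the function names are hypothetical.

```python
import math

def total_rel_resolution(m, p_t, lam, a, b):
    """a*p_T/sqrt(m+4) (+) b*sqrt(m)/sqrt(cos(lambda)), added in quadrature."""
    me = a * p_t / math.sqrt(m + 4)
    ms = b * math.sqrt(m) / math.sqrt(math.cos(lam))
    return math.hypot(me, ms)

def best_layer_count(p_t, lam, a, b, m_max=200):
    """Integer m in [3, m_max] that minimizes the total relative resolution."""
    return min(range(3, m_max + 1),
               key=lambda m: total_rel_resolution(m, p_t, lam, a, b))

# Setting the derivative with respect to m to zero gives the continuous optimum
# m + 4 = a * p_T * sqrt(cos(lambda)) / b, which the scan should bracket.
```

For small \(p_{\mathrm{T}}\) the continuous optimum falls below the smallest allowed m, so the scan returns the lower bound: every additional layer then degrades the resolution.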

In order to calculate the error of the momentum \(p={p_{\mathrm {T}}}/\cos \lambda \) the error in λ must be taken into account:

$$\displaystyle \begin{aligned} \sigma^2(p)=\sigma^2({p_{\mathrm{T}}})/\cos^2\lambda+\sigma^2(\lambda)\,{p_{\mathrm{T}}}^2\sin^2\lambda/\cos^4\lambda, \end{aligned}$$

the correlation between \(p_{\mathrm{T}}\) and λ being negligible in practice. Dividing by \(p^2={p_{\mathrm{T}}}^2/\cos^2\lambda\), it follows that:

$$\displaystyle \begin{aligned} \frac{\sigma(p)}{p}=\frac{\sigma(p_{\mathrm{T}})}{p_{\mathrm{T}}} \oplus \sigma(\lambda)\tan\lambda. \end{aligned}$$
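The equivalence of the absolute and the relative form can be checked numerically. The following sketch evaluates both expressions; the function names are hypothetical and the correlation between \(p_{\mathrm{T}}\) and λ is neglected, as in the text.

```python
import math

def sigma_p_absolute(p_t, sigma_pt, lam, sigma_lam):
    """sigma(p) from the variance formula for p = p_T / cos(lambda)."""
    var = (sigma_pt ** 2 / math.cos(lam) ** 2
           + sigma_lam ** 2 * p_t ** 2 * math.sin(lam) ** 2 / math.cos(lam) ** 4)
    return math.sqrt(var)

def sigma_p_relative(p_t, sigma_pt, lam, sigma_lam):
    """sigma(p)/p as the quadratic sum of sigma(p_T)/p_T and sigma(lambda)*tan(lambda)."""
    return math.hypot(sigma_pt / p_t, sigma_lam * math.tan(lam))

# Example values (arbitrary): p_T = 2 GeV/c, 1% error on p_T, dip angle 0.6 rad.
p_t, sigma_pt, lam, sigma_lam = 2.0, 0.02, 0.6, 0.003
p = p_t / math.cos(lam)
ratio = sigma_p_absolute(p_t, sigma_pt, lam, sigma_lam) / p
```

Here `ratio` agrees with `sigma_p_relative(p_t, sigma_pt, lam, sigma_lam)` up to rounding.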

With the exception of very low momenta the track can be approximated by a straight line in the rz projection, where \(r=(x^2+y^2)^{1/2}\). For m equidistant detectors and uniform resolution δ, the variance of the direction tangent \(t_\lambda =\tan \lambda \) due to the measurement errors is given by [106]:

$$\displaystyle \begin{aligned} \sigma_{\mathrm{me}}^2(t_\lambda)=\frac{\delta^2}{L^2}\frac{12(m-1)}{m(m+1)}\frac{1}{\cos^4\lambda}. \end{aligned}$$

If the measurement error in z is very small, the variance of \(t_\lambda\) is dominated by multiple scattering. For equidistant layers of uniform thickness d an approximate formula can be given. Remarkably, it does not depend on the number of layers:

$$\displaystyle \begin{aligned} \sigma_{\mathrm{ms}}^2(t_\lambda)\approx \frac{s^2}{p^2}\frac{d}{X_0\cos\lambda}\frac{1}{\cos^4\lambda}= \frac{s^2}{{p_{\mathrm{T}}}^2}\frac{d}{X_0\cos\lambda}\frac{1}{\cos^2\lambda}, \end{aligned}$$

with \(s = 0.0136\,(1 + 0.038\ln (d/X_0))\).

In the design and optimization phase of the detector a precise evaluation of the resolution of all track parameters is mandatory. There are several software packages that allow a fast track simulation plus reconstruction in a general detector setup, for instance [110] (in FORTRAN), [111] (in Matlab/Octave), or [108] (in Java).

13.2 Vertex Reconstruction

13.2.1 Introduction

Vertex reconstruction is the task of finding and estimating the production point of a set of particles. The pattern recognition algorithms and statistical estimation methods involved are in many respects similar to the ones used in track reconstruction. For an overview of vertex reconstruction algorithms used in past or active experiments see for instance [112,113,114,115,116].

In practice it is useful to distinguish between several types of vertices:

  1. The primary vertex is the point of collision of two beam particles (in a collider experiment) or of a beam particle and a target particle (in a fixed-target experiment).

  2. A secondary decay vertex is the point where an unstable particle decays in the detector volume or in the beam pipe. An example is the decay \(K^0_S \rightarrow \pi^+\pi^-\).

  3. A secondary interaction vertex is the point where a particle interacts with the material of the detector. Examples are bremsstrahlung, pair production, and inelastic hadronic interactions.

Vertex reconstruction frequently proceeds in several steps:

  1. Vertex finding: Finds the tracks that belong to a common primary or secondary vertex.

  2. Vertex fitting: Estimates for each vertex candidate the location of the common vertex and computes the associated covariance matrix.

  3. Test of vertex hypothesis: Tests for each vertex candidate whether all tracks do indeed belong to the vertex and identifies outliers.

  4. Update: Uses the vertex constraint to improve the location and momentum estimate of the tracks belonging to the vertex.

  5. Kinematic fit: Kinematic constraints such as momentum and energy conservation are imposed on the mother and daughter particles of a vertex, and mass hypotheses are tested. Kinematic fits are most frequently applied to secondary decay vertices.

Vertex finding can be accomplished in many different ways. A few of them will be described in Sect. 13.2.2. The vertex fit takes a vertex candidate and estimates the vertex location from the estimated track parameters of the outgoing particles (Sect. 13.2.3). As a rule, only charged particles are used, but sometimes neutral particles also contribute to the vertex fit. In the test stage (Sect. 13.2.3.2) outliers are identified, i.e., particles that apparently do not belong to the estimated vertex. As this can lead to a different assignment of particles to vertices, it can be considered as a method of vertex finding. If outliers are expected, the estimation procedure should be robust so that the estimated vertex is not significantly biased by the outliers (Sect. 13.2.3.3). Kinematic constraints (Sect. 13.2.4) are usually imposed via Lagrange multipliers. By repeating the kinematic fit under various mass hypotheses of the mother and/or daughter particles, the most likely mass assignment can be found.

13.2.2 Vertex Finding

Vertex finding is the process of dividing the reconstructed tracks in an event into classes such that presumably all tracks in a class are produced at the same vertex. The primary vertex in an event is usually easy to find, especially if prior information about its location is available (beam profile, target position). On the other hand, secondary decay vertices of short-lived decays are hard to find, as some of the decay products may also be compatible with the primary vertex. Vertex finding methods can be roughly divided into three main types: generic clustering algorithms, topological methods, and iterated estimators. The latter can be considered as a special divisive clustering method.

13.2.2.1 Clustering Methods

As mentioned above in the context of jet finding (see Sect. 13.1.3.6), clustering methods are based on a distance matrix or a similarity matrix of the objects to be classified. A cluster is then a group with small distances (large similarities) inside the group and large distances (small similarities) to objects outside the group. The distance measure reflects only the geometry of the tracks.

Various clustering methods have been evaluated in the context of vertex finding, of both the hierarchical and the non-hierarchical type [117]. Hierarchical clustering can be agglomerative or divisive. In agglomerative clustering each track starts out as a single cluster. Clusters are merged iteratively on the basis of a distance measure. The shortest distance in space between two tracks is peculiar insofar as it does not satisfy the triangle inequality: if tracks a and b are close, and tracks b and c are close, it does not follow that tracks a and c are close as well. The distance between two clusters of tracks should therefore be defined as the maximum of the individual pairwise distances, known as complete linkage in the clustering literature. Alternatively, the distance between two clusters can be the distance between the two vertices fitted from the clusters. Divisive clustering starts out with a single cluster containing all tracks. Further division of this cluster can be based on repeated vertex estimation with outlier identification (see Sect. 13.2.2.3). Examples of non-hierarchical clustering methods used in vertex finding are vector quantization, the k-means algorithm and deterministic annealing [113].
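The effect of complete linkage can be illustrated with a small agglomerative clustering routine. This is a minimal sketch operating on a precomputed matrix of pairwise track distances; the function name and the stopping rule based on a fixed cutoff are assumptions made for the example.

```python
def complete_linkage(dist, cutoff):
    """Agglomerative clustering with complete linkage.

    dist: symmetric matrix (list of lists) of pairwise track distances.
    cutoff: merging stops once the closest pair of clusters is at or above it.
    """
    clusters = [[i] for i in range(len(dist))]  # each track starts as a cluster

    def cluster_distance(a, b):
        # Complete linkage: the maximum of the individual pairwise distances.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d >= cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Tracks 0 and 1 are close, tracks 1 and 2 are close, tracks 0 and 2 are far apart.
dist = [[0.0, 0.1, 5.0],
        [0.1, 0.0, 0.1],
        [5.0, 0.1, 0.0]]
result = complete_linkage(dist, cutoff=1.0)
```

With these distances the routine merges tracks 0 and 1 but keeps track 2 separate, because the merged cluster is 5.0 away from it under complete linkage. Taking the minimum of the pairwise distances instead would chain all three tracks together.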

13.2.2.2 Topological Methods

A very general topological vertex finder was proposed in [118]. It is related to the Radon transform, which is a continuous version of the Hough transform used for track finding (Sect. 13.1.2.1). The search for vertices is based on a function \(V(\boldsymbol{v})\) which quantifies the probability of a vertex at location \(\boldsymbol{v}\). For each track a Gaussian probability tube \(f_i(\boldsymbol{v})\) is constructed. The function \(V(\boldsymbol{v})\) is defined taking into account that the value of \(f_i(\boldsymbol{v})\) must be significant for at least two tracks:

$$\displaystyle \begin{aligned} V(\boldsymbol{v})=\sum_{i=0}^n f_i(\boldsymbol{v})-\frac{\sum_{i=0}^n f_i^2(\boldsymbol{v})}{\sum_{i=0}^n f_i(\boldsymbol{v})} \end{aligned}$$

Due to the second term on the right-hand side, \(V(\boldsymbol{v}) \approx 0\) in regions where \(f_i(\boldsymbol{v})\) is significant for only one track. The form of \(V(\boldsymbol{v})\) can be modified to fold in known physics information about probable vertex locations. For instance, \(V(\boldsymbol{v})\) can be augmented by a further function \(f_0(\boldsymbol{v})\) describing the location and spread of the interaction point. In addition, \(V(\boldsymbol{v})\) may be modified by a factor dependent on the angular location of the point \(\boldsymbol{v}\).
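The behavior of the vertex function is easy to verify numerically. The sketch below takes the values of the probability tubes at a single point as input; how these values are computed from the track parameters is left open, and the function name is hypothetical.

```python
def vertex_function(f_values):
    """V(v) = sum_i f_i - (sum_i f_i^2) / (sum_i f_i) at one point v.

    f_values: values f_i(v) of the Gaussian probability tubes at that point.
    """
    total = sum(f_values)
    if total == 0.0:
        return 0.0  # no track is significant here
    return total - sum(f * f for f in f_values) / total

single = vertex_function([0.9, 0.0, 0.0])   # only one track is significant
double = vertex_function([0.9, 0.9, 0.0])   # two tracks overlap at this point
```

For a single significant track the two terms cancel, whereas two overlapping tubes of equal height leave a contribution equal to that height.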

Vertex finding amounts to finding local maxima of the function \(V(\boldsymbol{v})\). The search starts at the calculated maxima of the products \(f_i(\boldsymbol{v})f_j(\boldsymbol{v})\) for all track pairs. For each of these points the nearest maximum of \(V(\boldsymbol{v})\) is found. These maxima are clustered together to form candidate vertex regions. The final association of the tracks to the vertex candidates can be done on the basis of the respective \(\chi^2\) contributions or by an adaptive fit (see Sect. 13.2.3.3). In [119] the topological vertex finder was augmented by a procedure based on the concept of the minimum spanning tree of a graph.

13.2.2.3 Iterated Estimators

Vertex finding can also be accomplished by iterated vertex fits (see Sect. 13.2.3). The procedure can be summarized in the following way:

  1. Fit one vertex with all tracks.

  2. Discard all incompatible tracks.

  3. Repeat step 1 with all discarded tracks.

The iteration stops when no vertex with at least two tracks can be successfully fitted. Step 2 might itself be iterative, especially if the vertex fit is not robust, so that the incompatible tracks have to be removed sequentially. Iterative vertex finders based on a least-squares fit (Sect. 13.2.3.1) and an adaptive fit (Sect. 13.2.3.3) are implemented in the RAVE toolbox [120, 121].
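The control flow of the three steps can be sketched in a one-dimensional toy setting. The following is not the RAVE implementation: each track is reduced to a position z along the beam line with an uncertainty, the vertex fit is a weighted mean, and the compatibility cut on the per-track χ² is an assumption chosen for the example.

```python
def fit_vertex(tracks):
    """Weighted mean of (z, sigma) pairs, a 1D stand-in for a vertex fit."""
    weights = [1.0 / s ** 2 for _, s in tracks]
    return sum(w * z for w, (z, _) in zip(weights, tracks)) / sum(weights)

def find_vertices(tracks, chi2_cut=9.0):
    """Iterated estimator: fit, discard incompatible tracks, refit the rest."""
    vertices = []
    remaining = list(tracks)
    while len(remaining) >= 2:
        accepted, discarded = list(remaining), []
        while len(accepted) >= 2:
            v = fit_vertex(accepted)                       # step 1
            chi2 = [((z - v) / s) ** 2 for z, s in accepted]
            worst = max(range(len(accepted)), key=chi2.__getitem__)
            if chi2[worst] <= chi2_cut:
                break
            # Step 2, done sequentially because the fit is not robust.
            discarded.append(accepted.pop(worst))
        if len(accepted) < 2:
            break                                          # no vertex with two tracks
        vertices.append((fit_vertex(accepted), accepted))
        remaining = discarded                              # step 3
    return vertices

tracks = [(0.0, 0.1), (0.1, 0.1), (-0.1, 0.1), (5.0, 0.1), (5.1, 0.1)]
found = find_vertices(tracks)
```

In this example the first pass removes the two tracks near z = 5 one at a time and fits a vertex from the remaining three; the second pass then fits a vertex from the two discarded tracks.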

13.2.3 Vertex Fitting

The input to the vertex fit is a vertex candidate, i.e., a set of estimated track parameters \(\{\tilde {\boldsymbol {q}}_1,\ldots ,\tilde {\boldsymbol {q}}_n\}\) located at one or more reference surfaces, along with their covariance matrices \(\{\boldsymbol{C}_1,\ldots,\boldsymbol{C}_n\}\). For instance, in the primary vertex fit in a collider experiment the reference surface may be the beam tube. If possible, the reference surface(s) should be chosen such that multiple scattering between the vertex and the location of the track parameters is negligible.

The parameters to be fitted are the vertex position \(\boldsymbol{v}\) and the track momenta \(\boldsymbol{p}_i\) at the vertex. The functional dependence of the track parameters on the vertex parameters requires a track model, which depends on the shape of the magnetic field in the vicinity of the vertex. If the field is homogeneous, the track model is a helix; if the field is zero, the track model is a straight line. In other cases the track model may have to be computed numerically (see Sect. 13.1.3.2).

13.2.3.1 Least-Squares Methods

The conventional approach to estimating the vertex position is the minimization of some quadratic objective function, yielding a least-squares estimate. There are two main flavors of least-squares estimation in vertex fitting, constrained and unconstrained minimization. In the first case the vertex constraint is introduced into the objective function via a Lagrange multiplier, in the second case the constraint is implicit in the track model.

As an example, consider a vertex fit with n straight tracks. The n straight tracks originating from the common vertex \(\boldsymbol{v}=(x_{\mathrm{v}},y_{\mathrm{v}},z_{\mathrm{v}}){}^{\mathrm{T}}\) can be represented by n straight lines with parameters \(\lambda_i\):

$$\displaystyle \begin{aligned} x=x_{\mathrm{v}}+\lambda_i a_i,\quad y=y_{\mathrm{v}}+\lambda_i b_i,\quad z=z_{\mathrm{v}}+\lambda_i,\quad i=1,\ldots,n, \end{aligned}$$

where \(a_i\) and \(b_i\) are the direction tangents at the vertex. At the reference surface \(z=z_{\mathrm{ref}}\) track i is specified by its parameter vector \(\boldsymbol{q}_i=(x_i,y_i,a_i,b_i){}^{\mathrm{T}}\), consisting of the intersection point \((x_i,y_i)\) and the two direction tangents \((a_i,b_i)\). The track fit delivers estimates \(\tilde {\boldsymbol {q}}_i\) and information matrices \(\boldsymbol{G}_i\) for i = 1, …, n. In the constrained problem the sum of the squared residuals

$$\displaystyle \begin{aligned} M(\boldsymbol{q}_1,\ldots,\boldsymbol{q}_n)=\sum_{i=1}^n \boldsymbol{e}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{e}_i,\quad \boldsymbol{e}_i=\tilde{\boldsymbol{q}}_i-\boldsymbol{q}_i, \end{aligned} $$
(13.7)

must be minimized under the 2n nonlinear constraints

$$\displaystyle \begin{aligned} x_{\mathrm{v}}=x_i+a_i(z_{\mathrm{v}}-z_{\mathrm{ref}}) ,\quad y_{\mathrm{v}}=y_i+b_i(z_{\mathrm{v}}-z_{\mathrm{ref}}) ,\quad i=1,\ldots,n. \end{aligned}$$

There are 4n + 3 unknowns, 4n observations and 2n constraints, giving 4n + 2n − (4n + 3) = 2n − 3 degrees of freedom. The resulting track parameters \(\bar {\boldsymbol {q}}_i\) fit best, in the least-squares sense, to the track fit estimates \(\tilde {\boldsymbol {q}}_i\) and at the same time have a common vertex. For the solution of the constrained vertex fit see Sect. 13.2.4.2.

In the example, the constraints can be rewritten as

$$\displaystyle \begin{aligned} x_i=x_{\mathrm{v}}+(z_{\mathrm{ref}}-z_{\mathrm{v}}) a_i,\quad y_i=y_{\mathrm{v}}+(z_{\mathrm{ref}}-z_{\mathrm{v}}) b_i,\quad i=1,\ldots,n. \end{aligned} $$
(13.8)

Insertion of Eq. (13.8) into Eq. (13.7) gives the objective function of the unconstrained nonlinear least-squares problem:

$$\displaystyle \begin{aligned} M(\boldsymbol{v},a_1,b_1,\ldots,a_n,b_n)=\sum_{i=1}^n \boldsymbol{e}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{e}_i,\quad \boldsymbol{e}_i=\tilde{\boldsymbol{q}}_i-\boldsymbol{q}_i. \end{aligned}$$

There are now 4n observations and 2n + 3 unknown parameters, namely the vertex position and the track directions at the vertex, giving again 4n − (2n + 3) = 2n − 3 degrees of freedom.

A generalization of this simple case to helix tracks can be found in [122,123,124]. In the general case, the unconstrained problem can be formulated in terms of the unknown vertex position \(\boldsymbol{v}\) and the unknown track momentum vectors \(\boldsymbol{p}_i\) at the vertex [26, 59]. The measurement equation reads

$$\displaystyle \begin{aligned} \boldsymbol{q}_i=\boldsymbol{h}_i(\boldsymbol{v},\boldsymbol{p}_i),\quad i=1,\ldots,n, \end{aligned} $$
(13.9)

where the function \(\boldsymbol{h}_i\) incorporates the track model in the magnetic field. The objective function is equal to

$$\displaystyle \begin{aligned} M(\boldsymbol{v},\boldsymbol{p}_1,\ldots,\boldsymbol{p}_n)=\sum_{i=1}^n \boldsymbol{e}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{e}_i,\quad \boldsymbol{e}_i=\tilde{\boldsymbol{q}}_i-\boldsymbol{q}_i. \end{aligned}$$

Minimization of the objective function can proceed in several ways. For a detailed exposition of non-linear least-squares estimation see e.g. [125].

13.2.3.1.1 Gauss-Newton Method

Assume that there are approximate values \(\breve{\boldsymbol{v}}\) and \(\breve{\boldsymbol{p}}_i\) for all i. Then Eq. (13.9) can be approximated by an affine function:

$$\displaystyle \begin{aligned} \boldsymbol{q}_i\approx\boldsymbol{h}_i(\breve{\boldsymbol{v}},\breve{\boldsymbol{p}}_i)+\boldsymbol{A}_i(\boldsymbol{v}-\breve{\boldsymbol{v}})+\boldsymbol{B}_i(\boldsymbol{p}_i-\breve{\boldsymbol{p}}_i)={\boldsymbol{c}}_i+\boldsymbol{A}_i\boldsymbol{v}+\boldsymbol{B}_i\boldsymbol{p}_i, \end{aligned}$$

with

$$\displaystyle \begin{aligned} \boldsymbol{A}_i=\left.\frac{\partial \boldsymbol{h}_i({\boldsymbol{v}},{\boldsymbol{p}}_i)}{\partial\boldsymbol{v}}\right|{}_{\breve{\boldsymbol{v}},\breve{\boldsymbol{p}}_i}, \quad \boldsymbol{B}_i=\left.\frac{\partial \boldsymbol{h}_i({\boldsymbol{v}},{\boldsymbol{p}}_i)}{\partial\boldsymbol{p}_i}\right|{}_{\breve{\boldsymbol{v}},\breve{\boldsymbol{p}}_i}, \quad {\boldsymbol{c}}_i=\boldsymbol{h}_i(\breve{\boldsymbol{v}},\breve{\boldsymbol{p}}_i)-\boldsymbol{A}_i\breve{\boldsymbol{v}}-\boldsymbol{B}_i\breve{\boldsymbol{p}}_i. \end{aligned}$$

The objective function then reads

$$\displaystyle \begin{aligned} M(\boldsymbol{v},\boldsymbol{p}_1,\ldots,\boldsymbol{p}_n)=\sum_{i=1}^n (\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i-\boldsymbol{A}_i\boldsymbol{v}-\boldsymbol{B}_i\boldsymbol{p}_i){}^{\mathrm{T}}{\boldsymbol{G}}_i (\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i-\boldsymbol{A}_i\boldsymbol{v}-\boldsymbol{B}_i\boldsymbol{p}_i). \end{aligned}$$

As M is now quadratic in the unknown parameters, the minimum can be computed explicitly. The estimated vertex position and its covariance matrix are given by

$$\displaystyle \begin{aligned} \tilde{\boldsymbol{v}}_n=\boldsymbol{C}_n\sum_{i=1}^n \boldsymbol{A}_i{}^{\mathrm{T}}{\boldsymbol{G}_i}^B (\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i), \quad \mathrm{Var}(\tilde{\boldsymbol{v}}_n)=\boldsymbol{C}_n=\left(\sum_{i=1}^n \boldsymbol{A}_i{}^{\mathrm{T}}{\boldsymbol{G}_i}^B\boldsymbol{A}_i\right) ^{-1}, \end{aligned} $$
(13.10)

with

$$\displaystyle \begin{aligned} {{\boldsymbol{G}}_i}^B={\boldsymbol{G}}_i-{\boldsymbol{G}}_i\boldsymbol{B}_i\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i,\quad \boldsymbol{W}_i=(\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{B}_i)^{-1}. \end{aligned}$$

In general, the procedure has to be iterated. The measurement equation is expanded at the new estimate, and the estimate is recomputed until convergence is obtained. The formulas required for the implementation of two important cases, fixed-target configuration and solenoidal configuration, are given in [61].

Once \(\tilde {\boldsymbol {v}}_n\) is known, the track momenta and the full covariance matrix can be computed:

$$\displaystyle \begin{aligned} &\tilde{\boldsymbol{p}}_i^n=\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i(\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i-\boldsymbol{A}_i\tilde{\boldsymbol{v}}_n),\\ &\mathrm{Var}(\tilde{\boldsymbol{p}}_i^n)=\boldsymbol{D}_i^n=\boldsymbol{W}_i+\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{A}_i\boldsymbol{C}_n\boldsymbol{A}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{B}_i\boldsymbol{W}_i,{}\\ &\mathrm{Cov}(\tilde{\boldsymbol{p}}_i^n,\tilde{\boldsymbol{v}}_n)=\boldsymbol{E}_i^n=-\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{A}_i\boldsymbol{C}_n. \end{aligned} $$
(13.11)
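Equations (13.10) and (13.11) amount to eliminating the track momenta from the normal equations of the linearized problem. The NumPy sketch below checks this on random toy input, assuming five track parameters and three momentum components per track; the matrices are random numbers rather than Jacobians of a real track model, and all function names are hypothetical.

```python
import numpy as np

def gauss_newton_step(y, A, B, G):
    """Linearized vertex fit, Eqs. (13.10) and (13.11), with y[i] = q~_i - c_i."""
    W = [np.linalg.inv(Bi.T @ Gi @ Bi) for Bi, Gi in zip(B, G)]
    GB = [Gi - Gi @ Bi @ Wi @ Bi.T @ Gi for Gi, Bi, Wi in zip(G, B, W)]
    C = np.linalg.inv(sum(Ai.T @ GBi @ Ai for Ai, GBi in zip(A, GB)))
    v = C @ sum(Ai.T @ GBi @ yi for Ai, GBi, yi in zip(A, GB, y))
    p = [Wi @ Bi.T @ Gi @ (yi - Ai @ v)
         for Wi, Bi, Gi, yi, Ai in zip(W, B, G, y, A)]
    return v, C, np.array(p)

def joint_least_squares(y, A, B, G):
    """Reference: solve the normal equations for (v, p_1, ..., p_n) at once."""
    n, nq, nv, npar = len(y), A[0].shape[0], A[0].shape[1], B[0].shape[1]
    X = np.zeros((nq * n, nv + npar * n))
    Gfull = np.zeros((nq * n, nq * n))
    for i in range(n):
        rows = slice(nq * i, nq * (i + 1))
        X[rows, :nv] = A[i]
        X[rows, nv + npar * i: nv + npar * (i + 1)] = B[i]
        Gfull[rows, rows] = G[i]
    theta = np.linalg.solve(X.T @ Gfull @ X, X.T @ Gfull @ np.concatenate(y))
    return theta[:nv], theta[nv:].reshape(n, npar)

# Toy input for n = 4 tracks (random numbers, not a real track model).
rng = np.random.default_rng(0)
n = 4
A = [rng.normal(size=(5, 3)) for _ in range(n)]
B = [rng.normal(size=(5, 3)) for _ in range(n)]
G = []
for _ in range(n):
    M = rng.normal(size=(5, 5))
    G.append(M @ M.T + 5.0 * np.eye(5))  # symmetric positive definite weights
y = [rng.normal(size=5) for _ in range(n)]

v, C, p = gauss_newton_step(y, A, B, G)
v_ref, p_ref = joint_least_squares(y, A, B, G)
```

The reduced solution only requires the inversion of 3 × 3 matrices, whereas the joint system grows with the number of tracks. Both routes should give the same vertex and momenta up to rounding.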

The estimates can also be computed recursively, resulting in an extended Kalman filter [25, 26, 59]:

$$\displaystyle \begin{aligned} &\tilde{\boldsymbol{v}}_i=\boldsymbol{C}_i[\boldsymbol{C}_{i-1}^{-1}\tilde{\boldsymbol{v}}_{i-1}+\boldsymbol{A}_i{}^{\mathrm{T}}{{\boldsymbol{G}}_i}^B (\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i)],\quad \boldsymbol{C}_i=(\boldsymbol{C}_{i-1}^{-1}+\boldsymbol{A}_i{}^{\mathrm{T}}{{\boldsymbol{G}}_i}^B\boldsymbol{A}_i)^{-1},\\ &\tilde{\boldsymbol{p}}_i=\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i(\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i-\boldsymbol{A}_i\tilde{\boldsymbol{v}}_i),\quad \boldsymbol{D}_i=\boldsymbol{W}_i+\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{A}_i\boldsymbol{C}_i\boldsymbol{A}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{B}_i\boldsymbol{W}_i,\\ &\boldsymbol{E}_i=-\boldsymbol{W}_i\boldsymbol{B}_i{}^{\mathrm{T}}{\boldsymbol{G}}_i\boldsymbol{A}_i\boldsymbol{C}_i. \end{aligned} $$

The associated smoother is tantamount to recomputing the track momenta using the last vertex estimate \(\tilde {\boldsymbol {v}}_n\), i.e., Eq. (13.11).

13.2.3.1.2 Newton–Raphson Method

This method uses a local quadratic approximation to the objective function. In order to simplify the notation we introduce \({\boldsymbol{\alpha}}=(\boldsymbol{v},\boldsymbol{p}_1,\ldots,\boldsymbol{p}_n){}^{\mathrm{T}}\), \(\tilde {\boldsymbol {q}}=(\tilde {\boldsymbol {q}}_1,\ldots ,\tilde {\boldsymbol {q}}_n){ }^{\mathrm {T}}\) and \(\boldsymbol{h}=(\boldsymbol{h}_1,\ldots,\boldsymbol{h}_n){}^{\mathrm{T}}\). Then the objective function can be written as

$$\displaystyle \begin{aligned} M({\boldsymbol{\alpha}})=[\tilde{\boldsymbol{q}}-\boldsymbol{h}({\boldsymbol{\alpha}})]{}^{\mathrm{T}}{\boldsymbol{G}}[\tilde{\boldsymbol{q}}-\boldsymbol{h}({\boldsymbol{\alpha}})],\quad {\boldsymbol{G}}=\mathrm{diag}({\boldsymbol{G}}_1,\dots,{\boldsymbol{G}}_n). \end{aligned}$$

If ᾰ is an appropriate expansion point, M(α) is approximated by

$$\displaystyle \begin{aligned} M({\boldsymbol{\alpha}})\approx M(\breve{{\boldsymbol{\alpha}}})+\boldsymbol{g}{}^{\mathrm{T}}({\boldsymbol{\alpha}}-\breve{{\boldsymbol{\alpha}}})+ \textstyle\frac{1}{2}({\boldsymbol{\alpha}}-\breve{{\boldsymbol{\alpha}}}){}^{\mathrm{T}}\boldsymbol{\Omega}({\boldsymbol{\alpha}}-{\breve{{\boldsymbol{\alpha}}}}), \end{aligned}$$

where

$$\displaystyle \begin{aligned} \boldsymbol{g}=\frac{\partial M}{\partial{\boldsymbol{\alpha}}}=-2\boldsymbol{H}{}^{\mathrm{T}}{\boldsymbol{G}}[\tilde{\boldsymbol{q}}-\boldsymbol{h}(\breve{{\boldsymbol{\alpha}}})],\quad \boldsymbol{\Omega}=\frac{\partial^2 M} {\partial{\boldsymbol{\alpha}}\partial{\boldsymbol{\alpha}}{}^{\mathrm{T}}} = 2\boldsymbol{H}{}^{\mathrm{T}}{\boldsymbol{G}}\boldsymbol{H}-2\frac{\partial\boldsymbol{H}{}^{\mathrm{T}}}{\partial{\boldsymbol{\alpha}}{}^{\mathrm{T}}}{\boldsymbol{G}}[\tilde{\boldsymbol{q}}-\boldsymbol{h}(\breve{{\boldsymbol{\alpha}}})] \end{aligned}$$

are the gradient and the Hessian of M, respectively, evaluated at ᾰ, and H is the Jacobian of the track model h(α). If Ω is positive definite, M has a minimum when its gradient is zero, leading to

$$\displaystyle \begin{aligned} \tilde{{\boldsymbol{\alpha}}}=\breve{{\boldsymbol{\alpha}}}-\boldsymbol{\Omega}^{-1}\boldsymbol{g}. \end{aligned}$$

If the second term of the Hessian is set to zero, the Gauss–Newton method is recovered. Clearly, the Newton–Raphson method is more complex, but it gives some additional information about the problem. In particular, a Hessian that is not positive definite indicates that the expansion point is too far from the true global minimum.

13.2.3.1.3 Levenberg–Marquardt Method

In this method the matrix \(\boldsymbol{H}{}^{\mathrm{T}}\boldsymbol{G}\boldsymbol{H}\) is inflated by a diagonal matrix \(k\boldsymbol{I}\). As a consequence, the direction of the parameter update is intermediate between the direction of the Gauss–Newton step (k = 0) and the direction of steepest descent (k → ∞). An example of a vertex fit with the Levenberg–Marquardt method is given in [122].
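The two limiting cases can be checked with a few lines of NumPy. This is a minimal sketch of a single damped step on random toy input; the function name and the test matrices are assumptions, and a full implementation would also adapt k from one iteration to the next.

```python
import numpy as np

def lm_step(H, G, r, k):
    """Parameter update solving (H^T G H + k I) step = H^T G r.

    H: Jacobian of the track model, G: weight matrix,
    r: residual q~ - h(alpha) at the expansion point, k >= 0: damping.
    """
    normal = H.T @ G @ H
    return np.linalg.solve(normal + k * np.eye(normal.shape[0]), H.T @ G @ r)

# Toy input (random numbers, not a real track model).
rng = np.random.default_rng(1)
H = rng.normal(size=(6, 3))
G = np.eye(6)
r = rng.normal(size=6)

gauss_newton = lm_step(H, G, r, 0.0)   # k = 0: plain Gauss-Newton step
damped = lm_step(H, G, r, 1e8)         # large k: short step along the negative gradient
descent = H.T @ G @ r                  # direction of steepest descent
cosine = damped @ descent / (np.linalg.norm(damped) * np.linalg.norm(descent))
```

For large k the step becomes parallel to the negative gradient and its length shrinks like 1/k, which is why the damping acts as a step-size control as well.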

13.2.3.1.4 Fast Vertex Fits

The estimated track parameters \(\tilde {\boldsymbol {q}}_i\) are frequently given at the innermost detector surface or at the beam tube. If the \(\tilde {\boldsymbol {q}}_i\) are propagated to the vicinity of the presumed vertex, the vertex estimation can be sped up by applying some approximations.

The “perigee” parametrization for helical tracks was introduced in [123], with a correction in [124]. The track is parameterized around the point of closest approach (the perigee point \(\boldsymbol{v}^{\mathrm{P}}\)) of the helix to the z-axis. The variation of transverse errors along the track is neglected in the vicinity of the perigee, and the track direction and curvature at the vertex are considered to be constant. The approximate objective function of the vertex fit can then be written entirely in terms of the perigee points:

$$\displaystyle \begin{aligned} M(\boldsymbol{v})=\sum_{i=1}^n (\boldsymbol{v}_i{}^{\mathrm{P}}-\boldsymbol{v}){}^{\mathrm{T}}\boldsymbol{T}_i(\boldsymbol{v}_i{}^{\mathrm{P}}-\boldsymbol{v}), \end{aligned} $$
(13.12)

where \(\boldsymbol{T}_i\) is a weight matrix of rank 2. The vertex estimate is then

$$\displaystyle \begin{aligned} \tilde{\boldsymbol{v}}=\left(\sum_{i=1}^n \boldsymbol{T}_i\right)^{-1}\left(\sum_{i=1}^n \boldsymbol{T}_i\boldsymbol{v}_i{}^{\mathrm{P}}\right). \end{aligned}$$

The Jacobians required to compute the \(\boldsymbol{T}_i\) are spelled out in [123, 124].

A further simplification was proposed in [126]. In the vicinity of the vertex the track is approximated by a straight line. The estimated track parameters are transformed to a coordinate system the x-axis of which is parallel to the track. The vertex is then estimated by minimizing the sum of the weighted transverse distances of the tracks to the vertex. The resulting objective function has the same form as in Eq. (13.12), again with weight matrices of rank 2. The estimate is exact for straight tracks.
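For straight tracks the closed-form estimate can be sketched as follows. The example assumes, purely for illustration, an isotropic transverse uncertainty per track, so that each weight matrix is a scaled projector onto the plane perpendicular to the track direction; the function name and this choice of weights are not taken from [126].

```python
import numpy as np

def fast_vertex(points, directions, sigmas):
    """Minimize the sum of weighted squared transverse distances to the tracks.

    points: one point on each straight track, directions: track directions,
    sigmas: assumed isotropic transverse uncertainty of each track.
    """
    T_sum = np.zeros((3, 3))
    Tv_sum = np.zeros(3)
    for x, d, s in zip(points, directions, sigmas):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        # Rank-2 weight matrix: no constraint along the track direction.
        T = (np.eye(3) - np.outer(d, d)) / s ** 2
        T_sum += T
        Tv_sum += T @ np.asarray(x, dtype=float)
    return np.linalg.solve(T_sum, Tv_sum)

true_vertex = np.array([0.1, -0.2, 3.0])
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
# One point on each track, at different distances from the vertex.
points = [true_vertex + t * np.array(d)
          for t, d in zip((2.0, -1.5, 0.7), directions)]
estimate = fast_vertex(points, directions, sigmas=(0.01, 0.02, 0.05))
```

Because each weight matrix annihilates displacements along its track, any point on the line can serve as the reference point, and error-free tracks through a common point reproduce that point exactly. At least two non-parallel tracks are needed for the summed weight matrix to be invertible.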

A different type of a fast vertex fitting algorithm is described in [127]. It is based on approximating the tracks by straight lines both in the xy plane and in the xz plane. In either projection, the lines representing the tracks are Hough-transformed to points in the dual plane of line parameters. The vertex coordinates are then obtained by a weighted linear least-squares fit in the dual plane.

13.2.3.1.5 Adding Prior Information

If the vertex to be fitted is the primary vertex, there may be prior information about the vertex position from the beam profile in a collider experiment or the target location in a fixed target experiment. The prior information usually comes in the form of a position \(\boldsymbol{v}_0\) plus a covariance matrix \(\boldsymbol{C}_0\). The objective function is then augmented by an additional term

$$\displaystyle \begin{aligned} (\boldsymbol{v}_0-\boldsymbol{v}){}^{\mathrm{T}}\boldsymbol{C}_0^{-1}(\boldsymbol{v}_0-\boldsymbol{v}). \end{aligned}$$

For instance, the Gauss–Newton estimate Eq. (13.10) is modified in the following way:

$$\displaystyle \begin{aligned} \tilde{\boldsymbol{v}}_n=\boldsymbol{C}_n\left[\boldsymbol{C}_0^{-1}\boldsymbol{v}_0+\sum_{i=1}^n \boldsymbol{A}_i{}^{\mathrm{T}}{{\boldsymbol{G}}_i}^B (\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i)\right], \ \mathrm{Var}(\tilde{\boldsymbol{v}}_n)=\boldsymbol{C}_n=\left(\boldsymbol{C}_0^{-1}+\sum_{i=1}^n \boldsymbol{A}_i{}^{\mathrm{T}}{{\boldsymbol{G}}_i}^B\boldsymbol{A}_i\right)^{-1}. \end{aligned}$$

Similar modifications apply to the Newton–Raphson estimate and the fast vertex fits.
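For a fast vertex fit with rank-2 weight matrices, the modification amounts to adding \(\boldsymbol{C}_0^{-1}\) to the summed weights and \(\boldsymbol{C}_0^{-1}\boldsymbol{v}_0\) to the right-hand side. The following sketch uses a hypothetical function name and toy numbers, and shows how a prior can determine a vertex coordinate that a single track leaves unconstrained.

```python
import numpy as np

def fast_vertex_fit_with_prior(points, weights, v0, C0):
    """Weighted-mean vertex fit with a Gaussian prior (v0, C0).

    Minimises (v0 - v)^T C0^{-1} (v0 - v) + sum_i (p_i - v)^T T_i (p_i - v).
    """
    G0 = np.linalg.inv(C0)
    info = G0 + np.sum(weights, axis=0)
    rhs = G0 @ v0 + np.sum([T @ p for T, p in zip(weights, points)], axis=0)
    cov = np.linalg.inv(info)
    return cov @ rhs, cov

# One track parallel to the z-axis: its weight matrix has no information on z.
sigma = 0.1
T = np.diag([1.0, 1.0, 0.0]) / sigma**2
p = np.array([1.0, 2.0, 0.0])

# Toy prior, e.g. a beam-spot-like constraint (values are illustrative only).
v0 = np.array([0.0, 0.0, 5.0])
C0 = np.diag([1.0, 1.0, 4.0])

vertex, cov = fast_vertex_fit_with_prior([p], [T], v0, C0)
print(vertex)
```

Without the prior the summed weight matrix of this single track is singular; with it, the z coordinate is taken entirely from the prior while x and y are dominated by the track.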

13.2.3.2 Vertex Quality and Outlier Removal

Some tracks used in the vertex fit may be outliers in the sense that they do not actually belong to the vertex. Also, the estimated track parameters may be distorted by outliers or distorted hits in the track fit. Both types of outliers distort the vertex estimate and need to be identified.

In the case of Gaussian errors and a linear model the contribution of each track to the minimum value of the objective function is distributed according to a \(\chi^2\)-distribution with two degrees of freedom. The contribution \(\chi ^2_i\) of track i has to be computed relative to the vertex estimated without track i. For instance, in the Gauss–Newton algorithm:

$$\displaystyle \begin{aligned} \chi^2_i={\boldsymbol{r}}^n_i{}{}^{\mathrm{T}}{\boldsymbol{G}}_i{\boldsymbol{r}}^n_i + (\tilde{\boldsymbol{v}}_n-\tilde{\boldsymbol{v}}_n^{-i}){}^{\mathrm{T}}(\boldsymbol{C}_n^{-i})^{-1}(\tilde{\boldsymbol{v}}_n-\tilde{\boldsymbol{v}}_n^{-i}), \end{aligned}$$

where \({\boldsymbol {r}}^n_i=\tilde {\boldsymbol {q}}_i-{\boldsymbol {c}}_i-\boldsymbol {A}_i\tilde {\boldsymbol {v}}_n-\boldsymbol {B}_i\tilde {\boldsymbol {p}}^n_i\) is the residual of track i and \(\tilde {\boldsymbol {v}}_n^{-i}\) is the vertex estimate with track i removed:

$$\displaystyle \begin{aligned} \tilde{\boldsymbol{v}}_n^{-i}=\boldsymbol{C}_n^{-i}\left[\boldsymbol{C}_n^{-1}\tilde{\boldsymbol{v}}_n-\boldsymbol{A}_i{}^{\mathrm{T}}{{\boldsymbol{G}}_i}^B(\tilde{\boldsymbol{q}}_i-{\boldsymbol{c}}_i)\right],\quad \boldsymbol{C}_n^{-i}=\left(\boldsymbol{C}_n^{-1}-\boldsymbol{A}_i{}^{\mathrm{T}}{{\boldsymbol{G}}_i}^B\boldsymbol{A}_i\right)^{-1}. \end{aligned}$$

Analogous but somewhat simpler formulas hold for the fast vertex fits.

The test statistic \(\chi ^2_i\) can be computed for all i, and the track with the largest \(\chi ^2_i\) is a candidate for removal. This procedure can be repeated until all \(\chi ^2_i\) are below the cut. Even if there is only a single outlier, the \(\chi ^2_i\) of all tracks are no longer \(\chi^2\)-distributed and the power of the test is impaired. This loss of power can be compensated by robust estimation of the vertex.
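For the weighted-mean form of the fast vertex fits, the leave-one-out statistic can be sketched as follows. The analogue used here, with residual \(\boldsymbol{v}_i^{\mathrm{P}}-\tilde{\boldsymbol{v}}\) and weight \(\boldsymbol{T}_i\), is an assumption obtained by carrying over the structure of the Gauss–Newton formula; the function name and the toy tracks are hypothetical.

```python
import numpy as np

def track_chi2_contributions(points, weights):
    """Leave-one-out chi2 of each track for a weighted-mean vertex fit.

    chi2_i = r_i^T T_i r_i + (v - v_wo)^T (C_wo)^{-1} (v - v_wo),
    where v_wo and C_wo are the estimate and covariance without track i.
    """
    info = np.sum(weights, axis=0)                       # inverse covariance, all tracks
    rhs = np.sum([T @ p for T, p in zip(weights, points)], axis=0)
    v = np.linalg.solve(info, rhs)
    chi2 = []
    for T, p in zip(weights, points):
        info_wo = info - T                               # information without track i
        v_wo = np.linalg.solve(info_wo, rhs - T @ p)     # vertex without track i
        r = p - v
        dv = v - v_wo
        chi2.append(r @ T @ r + dv @ info_wo @ dv)
    return v, np.array(chi2)

def transverse_weight(direction, sigma):
    """Rank-2 weight of a straight track (isotropic sigma, illustrative)."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    return (np.eye(3) - np.outer(d, d)) / sigma**2

directions = [np.array([1.0, 0.0, 0.1]), np.array([0.0, 1.0, 0.1]),
              np.array([1.0, 1.0, 0.0]), np.array([1.0, -1.0, 0.3])]
offsets = [np.zeros(3), np.zeros(3), np.zeros(3), np.array([0.0, 0.0, 0.5])]
# Tracks 0-2 pass through the origin; track 3 is displaced and acts as outlier.
points = [o + 1.5 * d for o, d in zip(offsets, directions)]
weights = [transverse_weight(d, sigma=0.01) for d in directions]

vertex, chi2 = track_chi2_contributions(points, weights)
print(int(np.argmax(chi2)))  # index of the removal candidate
```

At least three non-parallel tracks are needed so that the information matrix stays invertible after one track has been removed.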

13.2.3.3 Robust and Adaptive Estimators

Robust estimators are less influenced or not influenced at all by outlying observations. This can be achieved by downweighting outliers or by excluding them from the estimate. For example, in the case of a one-dimensional location estimate, the M-estimator [128] downweights outliers, whereas the LMS (least median of squares) estimator [129] uses only one half of the sample (the one spanning the shortest interval) and ignores the other one.

Robust estimators tend to be statistically less efficient and computationally more expensive than least-squares estimators. On the other hand, estimation and outlier detection are performed in parallel, whereas a least-squares estimator has to be recomputed after an outlier has been identified and removed.

One of the earliest proposals for a robust vertex fit is in [130]. The method is an M-estimator with Huber’s ψ-function [131]. It is implemented as a re-weighted least-squares estimator. The initial vertex estimate is a plain least-squares estimate. Then, for each track, the residuals are rotated to the eigensystem of the covariance matrix of the track, and weight factors are computed according to

$$\displaystyle \begin{aligned} w_i=\frac{\psi(r_i/\sigma_i)}{r_i/\sigma_i}= \begin{cases} 1, & |r_i|\leq c\sigma_i,\\ c\sigma_i/|r_i|, & |r_i|> c\sigma_i, \end{cases} \end{aligned}$$

where \(r_i\) is one of the residuals in the rotated frame, \(\sigma_i\) is the standard deviation in the rotated frame, and c is the robustness constant, usually chosen between 1 and 3. The weight factors are applied and the estimate is recomputed. The entire procedure is iterated until convergence.
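The re-weighting scheme can be illustrated on a one-dimensional location estimate. This is only a sketch of the iteration, not the vertex fit of [130]: the function names, the sample and the choice c = 2 are hypothetical.

```python
import numpy as np

def huber_weight(r, sigma, c=2.0):
    """Huber weight: 1 for |r| <= c*sigma, c*sigma/|r| otherwise."""
    z = np.abs(np.asarray(r, dtype=float)) / sigma
    # c / max(z, c) equals 1 inside the core and c/z in the tails.
    return c / np.maximum(z, c)

def huber_location(x, sigma, c=2.0, n_iter=100, tol=1e-10):
    """Re-weighted least-squares location estimate with Huber weights."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                      # plain least-squares starting point
    for _ in range(n_iter):
        w = huber_weight(x - mu, sigma, c)
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

data = np.array([0.1, -0.2, 0.0, 0.15, -0.05, 10.0])  # last entry is an outlier
mu_ls = data.mean()
mu_huber = huber_location(data, sigma=0.1)
print(mu_ls, mu_huber)
```

The outlier is never rejected completely; its weight only decreases like \(1/|r_i|\), so it retains a bounded influence on the estimate.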

A different kind of re-weighted least-squares estimator is proposed in [132]. The weights are computed according to Tukey’s bi-square function [128]:

$$\displaystyle \begin{aligned} w_i= \begin{cases} \left(1-\dfrac{r_i^2/\sigma^2_i}{c^2}\right)^{2}, & |r_i|\leq c\sigma_i,\\ 0, & \text{otherwise}, \end{cases} \end{aligned}$$

where \(r_i^2\) is the squared residual of track i with respect to the vertex, \(\sigma ^2_i\) is its variance, and c is again the robustness constant. The estimator is now equivalent to a redescending M-estimator, and consequently less sensitive to outliers than Huber’s M-estimator.
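The difference between the two weight functions can be made explicit by comparing the influence \(z\,w(z)\) of a standardized residual z. In the sketch below the function names and the values of c are illustrative choices.

```python
import numpy as np

def tukey_weight(r, sigma, c=3.0):
    """Tukey bi-square weight: (1 - z^2/c^2)^2 for |z| <= c, else 0."""
    z2 = (np.asarray(r, dtype=float) / sigma) ** 2
    return np.where(z2 <= c**2, (1.0 - z2 / c**2) ** 2, 0.0)

def huber_weight(r, sigma, c=2.0):
    """Huber weight, shown for comparison."""
    z = np.abs(np.asarray(r, dtype=float)) / sigma
    return c / np.maximum(z, c)

z = np.array([0.0, 1.5, 3.0, 10.0])
influence_tukey = z * tukey_weight(z, 1.0)   # returns to zero beyond c
influence_huber = z * huber_weight(z, 1.0)   # saturates at c
print(influence_tukey, influence_huber)
```

With the bi-square weights a residual beyond \(c\sigma_i\) has no influence at all, whereas with Huber's weights its influence stays constant at c.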

The combination of a redescending M-estimator with the concept of deterministic annealing [133] leads to the adaptive method of vertex fitting [113, 117, 134, 135]. The concept of the adaptive vertex fit is derived from the Deterministic Annealing Filter [64] (see Sect. 13.1.3.5). The weights are computed according to

$$\displaystyle \begin{aligned} w_i=\frac{\mathrm{exp}(-\chi^2_i/2T)}{\mathrm{exp}(-\chi_i^2/2T)+\mathrm{exp}(-\chi_{\mathrm{cut}}^2/2T)}, \end{aligned}$$

where \(\chi ^2_i\) is the \(\chi^2\)-contribution of track i, \(\chi _{\mathrm {cut}}^2\) is a cutoff value, and T is a temperature parameter. The computation of the redescending M-estimator can be interpreted as an EM (expectation–maximization) algorithm [136, 137]. Alternatively it can be viewed as the minimization of the energy function of an elastic arm algorithm [18, 19]. If annealing is employed, the iteration starts at high T. The temperature is then gradually decreased. At low T the weights approach either zero or one. The final weights can be used for classification of the tracks as inliers or outliers. A comparison of the adaptive method with other robust estimators can be found in [138]. The adaptive estimator has been extended to a multi-vertex estimator fitting several vertices simultaneously, including competition of all vertices for all tracks [139].
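The behaviour of the adaptive weights under annealing can be sketched with a one-dimensional location estimate. The annealing schedule, the cutoff \(\chi^2_{\mathrm{cut}}=9\), the number of inner iterations and the function names are assumptions made for this illustration and are not taken from the cited implementations.

```python
import numpy as np

def adaptive_weight(chi2, chi2_cut, temperature):
    """w = exp(-chi2/2T) / (exp(-chi2/2T) + exp(-chi2_cut/2T)).

    Written as 1/(1 + exp(x)) = (1 - tanh(x/2))/2 with
    x = (chi2 - chi2_cut)/(2T), which avoids overflow for large chi2.
    """
    x = (np.asarray(chi2, dtype=float) - chi2_cut) / (2.0 * temperature)
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def adaptive_location(x, sigma, chi2_cut=9.0,
                      schedule=(256.0, 64.0, 16.0, 4.0, 1.0), n_inner=10):
    """Re-weighted location estimate with a decreasing temperature."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                                  # least-squares start
    for temperature in schedule:                   # high T first, then cool down
        for _ in range(n_inner):
            chi2 = ((x - mu) / sigma) ** 2
            w = adaptive_weight(chi2, chi2_cut, temperature)
            mu = np.sum(w * x) / np.sum(w)
    chi2 = ((x - mu) / sigma) ** 2
    return mu, adaptive_weight(chi2, chi2_cut, schedule[-1])

data = np.array([0.1, -0.2, 0.0, 0.15, -0.05, 10.0])  # last entry is an outlier
mu, final_weights = adaptive_location(data, sigma=0.1)
is_inlier = final_weights > 0.5                       # classification by final weight
print(mu, is_inlier)
```

A point with \(\chi^2_i=\chi^2_{\mathrm{cut}}\) always receives the weight 1/2, independent of the temperature; lowering T only sharpens the transition around the cutoff.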

Iterated re-weighted least-squares estimators require a good starting point, in order to ensure convergence to the global minimum and to minimize the number of iterations required. In many cases a standard least-squares estimate is sufficient. If a large number of outliers is present, the starting point should also be estimated robustly, preferably by an estimator with a high breakdown point [129]. Several such initial estimators have been proposed and studied in [117].

The M-estimators and the adaptive estimator presented above do not presuppose an explicit outlier model. If it is possible to describe the outliers by a Gaussian mixture model, estimation of the vertex can be carried out by the Gaussian-sum filter [140].

Several of the estimators described here are implemented in RAVE, a detector-independent toolkit for reconstruction of interaction vertices [120, 121].

13.2.4 Kinematic Fitting

Kinematic fitting imposes physical constraints on the particles participating in an interaction and thereby improves the measured track momenta and positions. At the same time hypotheses about the interaction and the participating particles can be tested.

13.2.4.1 Lagrange Multiplier Method

The most commonly used method of imposing constraints on the measured tracks is by way of Lagrange multipliers [141]. Let \(\tilde {\boldsymbol {q}}=(\tilde {\boldsymbol {q}}_1,\ldots ,\tilde {\boldsymbol {q}}_n){ }^{\mathrm {T}}\) be the unconstrained estimated parameters of a set of n tracks, along with their joint information matrix \(\boldsymbol{G}=\mathrm{diag}(\boldsymbol{G}_1,\ldots,\boldsymbol{G}_n)=\boldsymbol{V}^{-1}\). The r functions describing the constraints can be written as \(\boldsymbol{g}(\boldsymbol{q})=\boldsymbol{0}\). Taylor expansion around a suitable point \(\breve{\boldsymbol{q}}\) yields the linearized equation

$$\displaystyle \begin{aligned} \breve{\boldsymbol{g}}+\boldsymbol{D}(\boldsymbol{q}-\breve{\boldsymbol{q}})=\boldsymbol{0}, \end{aligned}$$

where D is the Jacobian of g with respect to q, evaluated at q̆, and ğ = g(q̆). The obvious expansion point is \(\breve {\boldsymbol {q}}=\tilde {\boldsymbol {q}}\). The constrained track parameters \(\bar {\boldsymbol {q}}_i\) are obtained by minimizing the objective function

$$\displaystyle \begin{aligned} M(\boldsymbol{q},{\boldsymbol{\lambda}})=(\boldsymbol{q}-\tilde{\boldsymbol{q}}){}^{\mathrm{T}}{\boldsymbol{G}}(\boldsymbol{q}-\tilde{\boldsymbol{q}})+2{\boldsymbol{\lambda}}{}^{\mathrm{T}}\left[\breve{\boldsymbol{g}}+\boldsymbol{D}(\boldsymbol{q}-\breve{\boldsymbol{q}})\right] \end{aligned}$$

with respect to q and λ. λ is a vector of r unknowns, the Lagrange multipliers. The solution is

$$\displaystyle \begin{aligned} \bar{\boldsymbol{q}}=\tilde{\boldsymbol{q}}-\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}}\bar{\boldsymbol{\lambda}},\quad \text{with}\quad \bar{\boldsymbol{\lambda}}={\boldsymbol{G}_{D}}\left[\breve{\boldsymbol{g}}+\boldsymbol{D}(\tilde{\boldsymbol{q}}-\breve{\boldsymbol{q}})\right]\quad \text{and}\quad {\boldsymbol{G}_{D}}=(\boldsymbol{D}\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}})^{-1}. \end{aligned}$$

The covariance matrix \(\bar {\boldsymbol {V}}\) and the \(\chi^2\) statistic are given by

$$\displaystyle \begin{aligned} \bar{\boldsymbol{V}}=\boldsymbol{V}-\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}}{\boldsymbol{G}_{D}}\boldsymbol{D}\boldsymbol{V},\quad \chi^2=\bar{\boldsymbol{\lambda}}^{\mathrm{T}}\boldsymbol{G}_{D}^{-1}\bar{\boldsymbol{\lambda}}=\bar{\boldsymbol{\lambda}}{}^{\mathrm{T}}\left[{\breve{\boldsymbol{g}}}+\boldsymbol{D}(\tilde{\boldsymbol{q}}-\breve{\boldsymbol{q}})\right]. \end{aligned}$$

If required, the constraint function g can be re-expanded at the new point \(\breve {\boldsymbol {q}}=\bar {\boldsymbol {q}}\), and the constrained track parameters can be recomputed.
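The update equations translate almost line by line into matrix code. The following is a minimal sketch with hypothetical function and argument names, exercised on a toy linear constraint \(a+b-1=0\) for which the result can be checked by hand.

```python
import numpy as np

def constrained_fit(q_tilde, V, g, jac, n_iter=5):
    """Constrained fit with Lagrange multipliers.

    q_tilde : unconstrained estimate, V : its covariance matrix,
    g(q)    : constraint values (r entries), jac(q) : Jacobian D (r x dim q).
    The constraints are re-expanded at the current estimate n_iter times.
    """
    q_tilde = np.asarray(q_tilde, dtype=float)
    q_exp = q_tilde.copy()                     # first expansion point
    for _ in range(n_iter):
        D = np.atleast_2d(jac(q_exp))
        g_exp = np.atleast_1d(g(q_exp))
        G_D = np.linalg.inv(D @ V @ D.T)
        d = g_exp + D @ (q_tilde - q_exp)      # linearised constraint at q_tilde
        lam = G_D @ d                          # Lagrange multipliers
        q_bar = q_tilde - V @ D.T @ lam
        q_exp = q_bar                          # re-expand at the new point
    V_bar = V - V @ D.T @ G_D @ D @ V
    chi2 = float(lam @ d)
    return q_bar, V_bar, chi2

# Toy example: two parameters constrained to sum to one.
q_tilde = np.array([0.6, 0.5])
V = np.diag([0.01, 0.04])
g = lambda q: q[0] + q[1] - 1.0
jac = lambda q: np.array([1.0, 1.0])

q_bar, V_bar, chi2 = constrained_fit(q_tilde, V, g, jac)
print(q_bar, chi2)
```

The parameter with the larger variance absorbs the larger share of the correction, and the constrained covariance matrix becomes singular along the constraint direction.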

The Jacobian D depends both on the parametrization of the tracks and on the type of constraint to be imposed. For kinematic constraints it is often convenient to choose a parametrization that uses physically meaningful quantities. In [142] it is proposed to use the four-momentum and a point in space, i.e., \(\boldsymbol{q}=(p_x,p_y,p_z,E,x,y,z)\). With this parametrization the following constraints can be formulated in a straightforward manner (for further examples see [142]).

  1. Invariant mass constraint. The equation that constrains a track to have an invariant mass \(m_c\) is

    $$\displaystyle \begin{aligned} E^2-p_x{}^{2\,}-p_y{}^{2\,}-p_z{}^{2\,}-m_c{}^{2\,}=0. \end{aligned}$$

    Expanding at \(\breve{\boldsymbol{q}}=(\breve{p}_x,\breve{p}_y,\breve{p}_z,\breve{E},\breve{x},\breve{y},\breve{z})\) yields

    $$\displaystyle \begin{aligned} \boldsymbol{D}=\begin{pmatrix} -2\breve{p}_x & -2\breve{p}_y &-2\breve{p}_z & 2\breve{E} & 0 & 0 & 0 \end{pmatrix},\quad \breve{\boldsymbol{g}}=\breve{E}^2-\breve{p}_x{}^{2\,}-\breve{p}_y{}^{2\,}-\breve{p}_z{}^{2\,}-m_c{}^{2\,}. \end{aligned}$$

  2. Total energy constraint. The equation that constrains a track to have a total energy \(E_c\) is

    $$\displaystyle \begin{aligned} E-E_c=0. \end{aligned}$$

    It follows that

    $$\displaystyle \begin{aligned} \boldsymbol{D}=\begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix},\quad \breve{\boldsymbol{g}}=\breve{E}-E_c. \end{aligned}$$

  3. Total momentum constraint. The equation that constrains a track to have a total momentum \(p_c\) is

    $$\displaystyle \begin{aligned} \sqrt{p_x{}^{2\,}+p_y{}^{2\,}+p_z{}^{2\,}}-p_c=0. \end{aligned}$$

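    Expanding at \(\breve{\boldsymbol{q}}=(\breve{p}_x,\breve{p}_y,\breve{p}_z,\breve{E},\breve{x},\breve{y},\breve{z})\) yields, with \(\breve{p}=\sqrt{\breve{p}_x{}^{2}+\breve{p}_y{}^{2}+\breve{p}_z{}^{2}}\),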

    $$\displaystyle \begin{aligned} \boldsymbol{D}=\begin{pmatrix}\dfrac{\breve{p}_x}{\breve{p}} & \dfrac{\breve{p}_y}{\breve{p}} & \dfrac{\breve{p}_z}{\breve{p}} & 0 & 0 & 0 & 0 \end{pmatrix},\quad \breve{\boldsymbol{g}}=\sqrt{\breve{p}_x{}^{2\,}+\breve{p}_y{}^{2\,}+\breve{p}_z{}^{2\,}}-p_c. \end{aligned}$$
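The three constraint functions and their Jacobians can be written down directly and cross-checked against finite differences. The sketch below uses hypothetical function names and an arbitrary toy parameter vector in the ordering \((p_x,p_y,p_z,E,x,y,z)\).

```python
import numpy as np

def mass_constraint(q, m_c):
    """g = E^2 - |p|^2 - m_c^2 and its Jacobian with respect to q."""
    px, py, pz, E = q[:4]
    g = E**2 - px**2 - py**2 - pz**2 - m_c**2
    D = np.array([-2 * px, -2 * py, -2 * pz, 2 * E, 0.0, 0.0, 0.0])
    return g, D

def energy_constraint(q, E_c):
    """g = E - E_c and its Jacobian."""
    D = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
    return q[3] - E_c, D

def momentum_constraint(q, p_c):
    """g = |p| - p_c and its Jacobian."""
    p = np.linalg.norm(q[:3])
    D = np.concatenate([q[:3] / p, np.zeros(4)])
    return p - p_c, D

def numerical_jacobian(constraint, q, target, eps=1e-6):
    """Central finite differences, used only as a cross-check."""
    D = np.zeros_like(q)
    for k in range(q.size):
        dq = np.zeros_like(q)
        dq[k] = eps
        D[k] = (constraint(q + dq, target)[0]
                - constraint(q - dq, target)[0]) / (2 * eps)
    return D

# Toy expansion point: |p| = 1.3 and E = 1.5 in arbitrary units.
q = np.array([0.3, -0.4, 1.2, 1.5, 0.01, 0.02, 0.5])
for constraint, target in [(mass_constraint, 0.5),
                           (energy_constraint, 1.4),
                           (momentum_constraint, 1.2)]:
    g, D = constraint(q, target)
    print(constraint.__name__, g, np.allclose(D, numerical_jacobian(constraint, q, target)))
```

None of the three constraints depends on the position components, so the last three entries of every Jacobian vanish.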

13.2.4.2 Vertex Constraint

If a vertex constraint is added to the kinematic constraints, the constraint functions depend on the unknown vertex position v and are extended to g(q, v) = 0. Taylor expansion around a suitable point (q̆, v̆) yields the linearized equation

$$\displaystyle \begin{aligned} \breve{\boldsymbol{g}}+\boldsymbol{D}(\boldsymbol{q}-\breve{\boldsymbol{q}})+\boldsymbol{E}(\boldsymbol{v}- \breve{\boldsymbol{v}})=\boldsymbol{0}, \end{aligned}$$

where E is the Jacobian of g with respect to v, evaluated at v̆, and ğ = g(q̆, v̆). It is assumed that there is prior information about the vertex position, represented by the position \(\tilde {\boldsymbol {v}}\) and the covariance matrix C. The position \(\tilde {\boldsymbol {v}}\) can be used as the expansion point v̆.

The constrained track parameters \(\bar {\boldsymbol {q}}_i\) and the estimated vertex position \(\bar {\boldsymbol {v}}\) are obtained by minimizing the objective function

$$\displaystyle \begin{aligned} M(\boldsymbol{q},\boldsymbol{v},\boldsymbol{\lambda})=(\boldsymbol{q}-\tilde{\boldsymbol{q}}){}^{\mathrm{T}}\boldsymbol{G}(\boldsymbol{q}- \tilde{\boldsymbol{q}})+(\boldsymbol{v}-\tilde{\boldsymbol{v}}){}^{\mathrm{T}}\boldsymbol{C}^{-1}(\boldsymbol{v}-\tilde{\boldsymbol{v}})+ 2\boldsymbol{\lambda}{}^{\mathrm{T}}\left[\breve{\boldsymbol{g}}+\boldsymbol{D}(\boldsymbol{q}-\breve{\boldsymbol{q}})+\boldsymbol{E}(\boldsymbol{v}- \breve{\boldsymbol{v}})\right] \end{aligned}$$

with respect to q, v, and λ. The solution is

$$\displaystyle \begin{aligned} \bar{\boldsymbol{\lambda}}=\boldsymbol{W}\left[\breve{\boldsymbol{g}}+\boldsymbol{D}(\tilde{\boldsymbol{q}} - \breve{\boldsymbol{q}})+\boldsymbol{E}(\tilde{\boldsymbol{v}}-\breve{\boldsymbol{v}})\right],\quad \bar{\boldsymbol{v}}=\tilde{\boldsymbol{v}}-\boldsymbol{C}\boldsymbol{E}{}^{\mathrm{T}}\bar{\boldsymbol{\lambda}},\quad \bar{\boldsymbol{q}}=\tilde{\boldsymbol{q}}-\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}}\bar{\boldsymbol{\lambda}}, \end{aligned}$$

with \(\boldsymbol {W}=\left (\boldsymbol {D}\boldsymbol {V}\boldsymbol {D}{ }^{\mathrm {T}}+\boldsymbol {E}\boldsymbol {C}\boldsymbol {E}{ }^{\mathrm {T}}\right )^{-1}\). The covariance matrices are

$$\displaystyle \begin{aligned} \mathrm{Var}(\bar{\boldsymbol{v}})=\boldsymbol{C}-\boldsymbol{C}\boldsymbol{E}{}^{\mathrm{T}}\boldsymbol{W}\boldsymbol{E}\boldsymbol{C},\quad \mathrm{Var}(\bar{\boldsymbol{q}})=\boldsymbol{V}-\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}}\boldsymbol{W}\boldsymbol{D}\boldsymbol{V},\quad \mathrm{Cov}(\bar{\boldsymbol{q}},\bar{\boldsymbol{v}})=-\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}}\boldsymbol{W}\boldsymbol{E}\boldsymbol{C}. \end{aligned}$$

The \(\chi^2\) statistic is

$$\displaystyle \begin{aligned} \chi^2=\bar{\boldsymbol{\lambda}}{}^{\mathrm{T}}\boldsymbol{W}^{-1}\bar{\boldsymbol{\lambda}}=\bar{\boldsymbol{\lambda}}{}^{\mathrm{T}}\left[ \breve{\boldsymbol{g}}+\boldsymbol{D}(\tilde{\boldsymbol{q}}-\breve{\boldsymbol{q}})+\boldsymbol{E}(\tilde{\boldsymbol{v}}-\breve{\boldsymbol{v}})\right], \end{aligned}$$

with r degrees of freedom, where r is the number of constraint functions. If the vertex constraint is the only constraint imposed on the tracks, the \(\chi^2\) has 2n degrees of freedom. If there is no prior information about the vertex, the prior vertex position is assigned an infinitely large covariance matrix, and W is replaced by its limiting value:

$$\displaystyle \begin{aligned} \boldsymbol{W}=\lim_{\boldsymbol{C}\rightarrow\infty}\left(\boldsymbol{D}\boldsymbol{V}\boldsymbol{D}{}^{\mathrm{T}}+\boldsymbol{E} \boldsymbol{C}\boldsymbol{E}{}^{\mathrm{T}}\right)^{-1}={\boldsymbol{G}_{D}}-{\boldsymbol{G}_{D}}\boldsymbol{E}(\boldsymbol{E}{}^{\mathrm{T}}{\boldsymbol{G}_{D}}\boldsymbol{E})^{-1}\boldsymbol{E}{}^{\mathrm{T}}{\boldsymbol{G}_{D}}. \end{aligned}$$

The number of degrees of freedom is reduced to 2n − 3.
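The limiting form of W can be checked numerically by comparing it with the finite-prior expression for a very large prior covariance. The matrices below are toy values chosen only so that the algebra can be verified by hand; they do not correspond to a realistic track model, and the function names are hypothetical.

```python
import numpy as np

def W_with_prior(D, V, E, C):
    """W = (D V D^T + E C E^T)^{-1} for a finite prior covariance C."""
    return np.linalg.inv(D @ V @ D.T + E @ C @ E.T)

def W_no_prior(D, V, E):
    """Limit of W for C -> infinity."""
    G_D = np.linalg.inv(D @ V @ D.T)
    return G_D - G_D @ E @ np.linalg.inv(E.T @ G_D @ E) @ E.T @ G_D

# Toy setup: r = 4 constraints and a 3-dimensional vertex.
D = np.eye(4)
V = np.diag([1.0, 2.0, 3.0, 4.0])
E = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

W_lim = W_no_prior(D, V, E)
W_big = W_with_prior(D, V, E, 1e8 * np.eye(3))   # very weak prior

print(np.max(np.abs(W_big - W_lim)))
print(np.linalg.matrix_rank(W_lim, tol=1e-8))    # r minus 3
```

In this example the limiting W has rank r − 3, in line with the loss of three degrees of freedom when the vertex position has to be determined from the tracks alone.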

13.3 Track Reconstruction in the LHC Experiments

13.3.1 ALICE

ALICE [143] is the experiment at the LHC that is devoted to the physics of high energy ion collisions. Its main goal is to investigate the physics of strongly interacting matter and the quark-gluon plasma at extreme values of energy density and temperature in nucleus-nucleus collisions. Among the four experiments at the LHC, ALICE is equipped with the largest number of subdetectors in order to face the reconstruction complexity of ion physics events. In particular, three subdetectors focus on measuring the passage of charged particles using the bending power of the magnetic field. They are assembled in a cylindrical fashion: the Inner Tracking System (ITS) with six planes of high-resolution silicon pixel, drift, and strip detectors, the cylindrical Time-Projection Chamber (TPC) and the Transition Radiation Detector (TRD). The principal functions of the ITS are the identification and reconstruction of secondary vertices, the track reconstruction of low-\(p_{\mathrm{T}}\) particles and the improvement of the impact parameter and momentum resolution. The TPC is the most important tracking subdetector. Thanks to its time information, it provides efficient and robust tracking even in very high multiplicity environments (of the order of 10,000 charged particles). Finally, the TRD is also used for tracking in the central region and for improving the \(p_{\mathrm{T}}\) resolution at high momentum.

The first step in the track reconstruction in ALICE is the clusterization, which is performed separately for each of the three subdetectors [144]. Tracking then proceeds by determining the preliminary interaction vertex using tracklets defined as lines built with pairs of clusters in the first two layers of the ITS. The preliminary interaction vertex is thus found as a space point to which a maximum number of tracklets converge. In the next step, track finding and fitting is performed in three stages using an inward-outward-inward strategy:

  • Initially, tracks in the TPC are searched for using the Kalman filter technique, with seeds built from the outermost layers of the TPC. A preliminary particle identification is also possible at this stage based on the specific energy loss in the TPC gas. Then, the reconstructed TPC tracks are propagated to the outermost ITS layer and become the seeds for finding tracks in the ITS. Fig. 13.4 shows the ITS–TPC matching efficiency as a function of the transverse momentum for 2010–2013 data and Monte Carlo, for pp and heavy ion collisions. Finally, a last step is performed in order to recover tracks of particles with \(p_{\mathrm{T}}\) down to 80 MeV: a standalone ITS reconstruction is run on those clusters that were not used in the ITS–TPC tracks.

    Fig. 13.4 ITS–TPC matching efficiency vs. \(p_{\mathrm{T}}\) for data and Monte Carlo for pp (left) and Pb–Pb (right) collisions in the ALICE experiment [144]

  • All reconstructed tracks are then extrapolated to their point of closest approach to the preliminary interaction vertex, and are subsequently propagated outwards, from the innermost layer to the outermost one. Tracks are refitted by the Kalman filter using the clusters found at the previous stage. After the reconstruction in the TRD subdetectors, the track is matched with a possible TRD tracklet in each of the six TRD layers. In a similar way, the tracks reaching the time-of-flight (TOF) detector are matched to TOF clusters.

  • At the final stage of the track reconstruction, all tracks in both ITS and TPC subdetectors are propagated inwards and refitted one last time to determine the final estimate of the track position, direction, inverse curvature, and its associated covariance matrix.

The final interaction vertex is then re-determined using all tracks reconstructed in the TPC and ITS. The precise vertex fit is performed using track weighting to suppress the contribution of any remaining outliers. For data-taking conditions where a high pileup rate is expected, a more robust version of vertex finding inspired by the algorithm described in [132] is used. It is based on iterative vertex finding and fitting using Tukey bisquare weights to suppress outliers. The algorithm stops when no more vertices are identified in the scan along the beam direction. Once the tracks and the interaction vertex have been found, a search for photon conversions and decays of strange hadrons such as \(K^0_{S}\) and \(\Lambda^0\) concludes the central-barrel tracking procedure.

13.3.2 ATLAS

ATLAS [145] is the largest of the four LHC experiments, measuring 25 m in diameter and 44 m in length. Its magnet system is composed of a Central Solenoid Magnet with a 2 T field, a Barrel Toroid and Endcap Toroids with 4 T each. The Inner Detector (ID) is very compact and highly sensitive in order to measure accurately the decay products of each collision. It consists of three different systems of sensors immersed in the solenoid magnetic field: the Pixel Detector, the Semiconductor Tracker (SCT), and the Transition Radiation Tracker (TRT). The Pixel Detector is situated closest to the interaction point and has the highest granularity with about 80 million readout channels. The intrinsic spatial resolution of the Pixel Detector sensors is 10 μm in rϕ and 115 μm in z. The SCT is a silicon microstrip detector surrounding the Pixel Detector. It provides eight measurements per track with an overall resolution of 16 μm in rϕ and 580 μm in z. The TRT is placed in the outermost region. It is a lightweight detector composed of proportional gas counters (straws filled with 70% Xe, 27% \(\mathrm{CO}_2\) and 3% \(\mathrm{O}_2\)) embedded in a radiator material, and its operational drift radius accuracy is about 130 μm. The TRT contributes both to the track pattern recognition stage, featuring typically around 30 hits per track, and to particle identification.

The basic concepts of the ATLAS track reconstruction are described in [146, 147]. The tracking in the ID consists of two principal sequences: an initial inside-out tracking, and a subsequent outside-in tracking. Inside-out tracking starts with space point formation in the silicon part of the ID. Using the space points, track seeds are generated with or without a constraint on the longitudinal vertex position. The seeds are then followed through the SCT by a combinatorial Kalman filter/smoother. After ambiguity solving, the remaining track candidates are extended into the TRT. Outside-in tracking first finds track segments in the TRT, using a Hough transform of the straw centers. A Kalman filter/smoother using also the drift times builds the final track segments. These track segments are then extrapolated back into the SCT and the Pixel Detector. Muons are reconstructed in the ID like any other charged particles; for the standalone reconstruction of muons in the muon system and the combined reconstruction, see [148].

Based on the experience gained in Run 1, several improvements to track reconstruction were made for Run 2 [149]. For example, the tracking was adapted to the new insertable B-layer (IBL) [150], and track reconstruction in dense environments (TIDE) was optimized [151]. This included an artificial neural network based approach to identify pixel clusters created by multiple charged particles. The effect of these two developments is shown in Fig. 13.5. In Fig. 13.5a the transverse impact parameter resolution as a function of track \(p_{\mathrm{T}}\) is shown for data taken in 2015 at 13 TeV with the inclusion of the IBL information and for data in 2012 at 8 TeV without the IBL. The data in 2015 was collected with a minimum bias trigger. The data in 2012 is derived from a mixture of jet, tau and missing \(E_{\mathrm{T}}\) triggers [150]. Figure 13.5b shows the improvement of the track reconstruction efficiency in the jet core due to the TIDE optimization [151].

Fig. 13.5 (a) Upper panel: unfolded transverse impact parameter resolution measured from data in 2015 at 13 TeV with the Inner Detector including the IBL, as a function of track \(p_{\mathrm{T}}\) for values of \(0.0 < \eta < 0.2\), compared to that measured from data in 2012 at 8 TeV [150]; lower panel: ratio of the resolution in 2015 over the resolution in 2012. (b) Improvement of the track reconstruction efficiency due to the TIDE optimization, as a function of the angular distance of the particle from the jet axis. The track selection is explained in [151]

ATLAS track reconstruction efficiency as a function of pseudorapidity and transverse momentum with simulated data at a center-of-mass energy of 13 TeV is shown in Fig. 13.6 [152].

Fig. 13.6 The track reconstruction efficiency (a) as a function of pseudorapidity and (b) as a function of transverse momentum, as predicted by Pythia 8 A2 simulation. The statistical uncertainties are shown as black lines, the total uncertainties as green shaded areas [152]

13.3.3 CMS

CMS [153], together with ATLAS, is one of the two general-purpose experiments at the LHC. Its main distinguishing feature is a 3.8 T superconducting solenoid. With a length of 13 m and a diameter of 6 m, it provides a high bending power to precisely measure the momentum of charged particles. The solenoid magnetic field lines run parallel to the beam direction in the central region, where the tracking system is placed. The tracking system is designed to provide a precise and efficient measurement of particle trajectories using position-sensitive detectors. The CMS tracker is a silicon-based system [154]. It is split into two parts, the Pixel Tracker and the Strip Tracker, and covers a pseudorapidity range up to |η| = 2.5. The Pixel Tracker is the innermost CMS detector sub-system and is composed of 66 million silicon pixels with dimensions 100 × 250 × 250 μm, covering a total area of about 1 m². In the barrel layers the magnetic field induces a Lorentz angle which increases charge sharing between neighbouring pixels. Charge sharing in conjunction with analog readout makes it possible to achieve a position resolution of 10 μm for the (r, ϕ) coordinate and 15 μm in the z direction. The pixel detectors in the forward direction are tilted at an angle of 20° to induce charge sharing, which makes it possible to achieve a resolution of 15 μm and 20 μm, respectively. This resolution is necessary not only for a precise track reconstruction, but also for the determination of both the vertices produced in the primary interaction and the decay vertices of short-lived particles.

The Strip Tracker constitutes the outer part of the tracking system. Its basic building blocks are silicon strip modules. Each module is equipped with one or two silicon sensors and a so-called Front-End hybrid containing readout electronics. In total, the CMS silicon strip tracker has 9.3 million strips and covers 198 m² of active silicon area. The resolution in (r, ϕ) is ≃30 μm in all layers. The inner layers of the strip tracker are equipped with double-sided sensors, one side of which is rotated by a stereo angle of 100 mrad, achieving a resolution along the z coordinate of about 230 μm and allowing the reconstruction of the hit position in 3-D. In the outer layers the sensors are single-sided, and the z resolution can be approximated by the strip length over \(\sqrt {12}\), or about 15 mm. In order to maintain excellent tracking performance until the Long Shutdown 3 of LHC, the Pixel Tracker was replaced in the year-end technical stop of 2016/2017 with a new Pixel Tracker composed of four barrel layers and six forward disks providing four-hit pixel coverage up to |η| = 2.5. After the Long Shutdown 3, the High Luminosity phase of the LHC (HL-LHC) is scheduled where the accelerator will provide an unprecedented instantaneous luminosity of \(5\text{–}7.5\times 10^{34}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}\). In [155] the new CMS silicon tracker and its tracking and vertexing performance for different event types, pileup scenarios and detector geometries are presented.

The CMS track reconstruction algorithm is based on an iterative approach [156]. The main idea is to search for easier-to-find tracks first, to mask the hits associated to the found tracks, and to proceed to the next iteration. In this way the combinatorial problem is reduced, and the search for more difficult classes of tracks is simplified. Moreover, this approach makes it possible to develop special iterations that improve track reconstruction in high-density environments such as jets, or that use the information from other subsystems such as muon chambers and calorimeters. In each iteration, the Combinatorial Track Finder is run. It can be divided into four different steps:

  1. Seed generation: Using the information of three or four hits, the trajectory parameters and the corresponding uncertainties of the initial track candidates are computed.

  2. Track finding: Starting from the seed, the current trajectory parameters and their uncertainties are extrapolated to the next layer and compatible hits are found. Each of them is added to a clone of the track candidate. Each of these candidates is again extrapolated to the next layer and compatible hits are found. This procedure is repeated for each candidate until there is more than one missing hit or the extrapolation does not find another tracker layer.

  3. Track fitting: A Kalman filter or a Gaussian-sum filter/smoother is run to obtain the final estimate of the track parameters at the interaction point, exploiting the full trajectory information.

  4. Track selection: Tracks are grouped in classes according to different track quality criteria.

As an example, the twelve tracking iterations foreseen for 2017 data taking are listed in Table 13.3 [157]. The main difference between iterations is the configuration of the seed generation and the target tracks.

Table 13.3 List of different tracking iterations used after the Pixel Tracker upgrade with the corresponding seeding configuration used and target tracks [157]

Figure 13.7 shows the tracking efficiency, using a standard sample of \({\mathrm {t}\overline {\mathrm {t}}}\) events simulated at \(\sqrt {s}=13\,\mbox{TeV}\) with different superimposed pileup conditions. The contribution of different iterations for 2017 track reconstruction is also shown as a function of the \(p_{\mathrm{T}}\) of the simulated particle. It can be seen that iterations targeting low-\(p_{\mathrm{T}}\) tracks are more efficient in the region between 100 and 500 MeV.

Fig. 13.7 Track reconstruction efficiency as a function of simulated track pseudorapidity for 2017 tracker at different pileup conditions (left) and cumulative contributions to the overall tracking performance from the twelve iterations in 2017 track reconstruction shown as a function of the simulated track \(p_{\mathrm{T}}\) (right) [157]. The 2017 tracking reconstruction includes the Cellular Automaton-based Hit Chain-Maker (CA) seeding [158]

Figure 13.8 shows the muon tracking efficiency and the corresponding ratios between real and simulated data for 2016 collision data, using muons from the Z resonance and the tag-and-probe method. The measured track efficiency as a function of |η| is found to be between 99.5% and 100% for the collection including all tracks. It degrades, however, with increasing number of primary vertices.

Fig. 13.8 Data (black dots) and simulation (rectangles) tracking efficiency and respective ratio for muons coming from the Z decay as a function of the absolute pseudorapidity of the probe muon (left) and the number of primary vertices (right). The data are based on an integrated luminosity of 36 fb⁻¹ [155]

13.3.4 LHCb

As its name indicates, LHCb [159] focuses on physics involving bottom quarks and investigates CP violation phenomena. These studies require the measurement of rare decays of \(\mathrm {B}_{\mathrm {d}}\), \(\mathrm {B}_{\mathrm {s}}\), and D mesons, which are produced with a large cross-section at the LHC. Because b hadrons are predominantly produced in the forward or backward cone, the LHCb experiment is a single-arm spectrometer, in contrast to the other three experiments. In order to exploit this large number of b hadrons, it requires a robust and flexible trigger and a data acquisition system that allows high-bandwidth data taking and provides powerful online data processing. Furthermore, superior vertex and momentum resolution are crucial to study the rapidly oscillating \(\mathrm {B}_{\mathrm {s}} - \overline {\mathrm {B}_{\mathrm {s}}}\) meson system. LHCb is thus equipped with a highly sophisticated silicon microstrip detector close to the interaction point, the Vertex Locator (VELO). It can be moved to a distance of only 7 mm from the proton beams and measures the positions of the primary vertices and the impact parameters of the tracks with extremely high precision. A further silicon microstrip detector, the Tracker Turicensis (TT), is placed before the dipole magnet. Its task is to improve the momentum resolution of reconstructed tracks and to reject pairs of tracks that in reality belong to the same particle. The magnet is placed behind the TT. It bends the flight path of the particles in the \(x\)-\(z\) plane and therefore allows the determination of their momenta. The tracking system is completed by the T stations (T1-T2-T3), which, together with the information from the VELO, determine the momentum and flight direction of the particles. The T stations are composed of silicon microstrip sensors close to the beam pipe and of straw tubes in the outer regions.

Track reconstruction uses hits in the VELO, TT and T stations. Depending on which detectors are crossed, different track types are defined [160, 161]:

  • Long tracks traverse the full tracking system. They have hits in both the VELO and the T stations, and optionally in TT. They are the most important set of tracks for physics analyses.

  • Upstream tracks pass only through the VELO and TT stations. In general their momentum is too low to traverse the magnet and reach the T stations.

  • Downstream tracks pass only through the TT and T stations. They are important for the reconstruction of long-lived particles that decay outside the VELO acceptance.

  • VELO tracks pass only through the VELO. These tracks are particularly important in the primary vertex reconstruction.

  • T tracks pass only through the T stations. Like the downstream tracks, they are useful for particle identification in the Ring Imaging Cherenkov detectors.
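The track types listed above can be summarized as a lookup on which sub-detectors contribute hits. The following is a minimal sketch; the enum, the function name and the boolean inputs are hypothetical and are not taken from the LHCb software.

```python
from enum import Enum


class TrackType(Enum):
    LONG = "long"
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"
    VELO = "velo"
    T = "t"


def classify_track(has_velo, has_tt, has_t):
    """Return the track type implied by the sub-detectors with hits.

    Returns None if the combination matches none of the listed types
    (for example, hits in the TT only).
    """
    if has_velo and has_t:
        return TrackType.LONG        # TT hits are optional for long tracks
    if has_velo and has_tt:
        return TrackType.UPSTREAM    # VELO and TT, no T stations
    if has_tt and has_t:
        return TrackType.DOWNSTREAM  # TT and T stations, no VELO
    if has_velo:
        return TrackType.VELO        # VELO only
    if has_t:
        return TrackType.T           # T stations only
    return None


# Usage: a track with VELO and T-station hits but no TT hits is a long track.
print(classify_track(has_velo=True, has_tt=False, has_t=True))
```

The order of the tests matters: the long-track condition is checked first because TT hits are optional for long tracks.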

Reconstruction of long tracks starts in the VELO. There are two complementary algorithms to add information from the downstream tracking stations to these VELO tracks. The first one combines the VELO tracks with information from the T stations. The second one combines the VELO tracks with track segments found after the magnet in the T stations, using a standalone track finding algorithm. The candidate tracks found by each algorithm are then combined, removing duplicates, to form the final set of long tracks used for analysis. Finally, hits in the TT consistent with the extrapolated trajectories of each track are added to improve their momentum determination.

Downstream tracks are found starting with T tracks, extrapolating them through the magnetic field and searching for corresponding hits in the TT. Upstream tracks are found by extrapolating VELO tracks to the TT, where matching hits are then added in a procedure similar to that used by the downstream tracking. Both algorithms require at least three TT hits.

The tracks found are fitted using a Kalman filter, taking into account multiple scattering and energy loss due to ionisation. The \(\chi^2\) statistic of the fit is used to determine the quality of the reconstructed track. If two or more tracks have many hits in common, only the one with the most hits is kept.
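The removal of tracks sharing many hits can be sketched as a greedy selection. The sketch assumes that each track is represented by a set of hit identifiers; the function name and the threshold of 70% shared hits are hypothetical, since the text does not quantify what "many hits in common" means.

```python
def remove_clones(tracks, max_shared_fraction=0.7):
    """Keep, among tracks sharing many hits, only the one with the most hits.

    tracks              : iterable of sets of hit identifiers
    max_shared_fraction : assumed threshold on the fraction of shared hits,
                          relative to the shorter of the two tracks
    """
    kept = []
    # Visit tracks from most to fewest hits, so that the longest track
    # of a group of clones is always the one that survives.
    for hits in sorted(tracks, key=len, reverse=True):
        is_clone = any(
            len(hits & other) > max_shared_fraction * min(len(hits), len(other))
            for other in kept
        )
        if not is_clone:
            kept.append(hits)
    return kept


# Usage: the second track shares all of its hits with the first one.
candidates = [{1, 2, 3, 4, 5}, {1, 2, 3, 4}, {10, 11, 12}]
print(remove_clones(candidates))  # the 4-hit clone is dropped
```

A fuller treatment could also use the \(\chi^2\) of the fit to decide between tracks with the same number of hits; that refinement is left out here.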

The track reconstruction efficiency for the 2012 and the 2015 data as a function of the momentum can be seen in Fig. 13.9 [162]. The results of the two periods are compatible.

Fig. 13.9

LHCb track reconstruction efficiency for the 2012 and the 2015 data as a function of the momentum. The efficiency is computed using the “Long method”, described in [161]

13.4 Conclusion

An overview of current methods in track and vertex reconstruction and alignment has been presented. Many of them have been developed in response to the requirements of the current experimental program at the Large Hadron Collider. The most difficult challenges are:

  • Reliable reconstruction of signal events over a large background of non-signal events, pileup events, and low-momentum tracks;

  • Reliable reconstruction of secondary vertices at very short distances from the primary vertex;

  • Precise alignment of a large number of sensors.

Every experiment has to meet these challenges on its own terms. The outlines of the solutions found by the four major LHC experiments are described in Sect. 13.3 and, in more detail, in the references given there. The repertoire of methods discussed in this contribution certainly cannot lay claim to completeness. We have tried to select widely applicable methods, thereby neglecting by necessity many experiment-specific adaptations, improvements and innovations, for which we again refer to the references.