1 Basic Techniques

1.1 Conformal Transformation

Circles in the plane passing through the origin can be transformed into straight lines by the following mapping [1]:

$$\displaystyle \begin{gathered} u=\frac{x}{x^2+y^2},\ \; v =\frac{y}{x^2+y^2}.{} \end{gathered} $$
(5.1)

The mapping is conformal, i.e., preserves angles between curves. Assume that a circle is defined by the equation:

$$\displaystyle \begin{gathered} (x-a)^2+(y-b)^2=R^2=a^2+b^2. \end{gathered} $$
(5.2)

Expansion of the left-hand side and division by \(x^2+y^2\) gives a linear equation in u and v:

$$\displaystyle \begin{gathered} 2au+2bv=1. \end{gathered} $$
(5.3)

This is the equation of a straight line in the (u, v)-plane with distance d = 1∕(2R) from the origin. A circle with a large radius R or small curvature κ is therefore transformed into a line that passes very close to the origin (Fig. 5.1). In the limit of zero curvature, the circle becomes a line transformed into itself by the mapping in Eq. (5.1). Both circle finding and circle fitting can be simplified by this transformation from circles to straight lines.
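As a numerical check of Eqs. (5.1) to (5.3), the following Python sketch maps a few points of a circle through the origin and evaluates the residual of the line equation. The function names and the circle center (a, b) = (3, 4) are illustrative choices and do not come from the text.

```python
import math

def conformal_map(x, y):
    """Map a point (x, y) != (0, 0) to (u, v) according to Eq. (5.1)."""
    r2 = x * x + y * y
    return x / r2, y / r2

def circle_points(a, b, n=8):
    """Points on the circle with centre (a, b) through the origin, origin excluded."""
    radius = math.hypot(a, b)
    phi0 = math.atan2(-b, -a)  # direction of the origin as seen from the centre
    angles = [phi0 + 2 * math.pi * k / (n + 1) for k in range(1, n + 1)]
    return [(a + radius * math.cos(t), b + radius * math.sin(t)) for t in angles]

a, b = 3.0, 4.0  # hypothetical circle centre, so R = 5
mapped = [conformal_map(x, y) for x, y in circle_points(a, b)]
residuals = [2 * a * u + 2 * b * v - 1 for u, v in mapped]  # Eq. (5.3)
line_distance = 1 / (2 * math.hypot(a, b))                  # d = 1/(2R)
```

All residuals vanish up to rounding errors, and the distance of the line from the origin is 1∕(2R) = 0.1 for this example.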

Fig. 5.1 Conformal transformation of the circles through the origin in the (x, y)-plane (left) into lines in the (u, v)-plane (right)

1.2 Hough Transform

The Hough transform [2] is a technique that finds clusters of points that lie on or close to a parametric curve such as a straight line or a circle. The number of parameters is usually two or three. In the simplest case, there is a set of points \(\{(x_1,y_1),\ldots,(x_n,y_n)\}\) in the plane (image space) that lie on a straight line, parameterized by \(y=k_0x+d_0\), where \(k_0\) is the slope and \(d_0\) is the intercept of the line. A line passing through \((x_i,y_i)\) fulfills the equation \(d=-x_ik+y_i\), which is the equation of a line i in the parameter space (Hough space) of k and d. The point of intersection of two such lines i, j is just \((k_0,d_0)\), the parameters of the original line. Finding straight lines in the image space is therefore equivalent to finding intersection points in the Hough space.

In practice, the measured points do not lie exactly on a straight line, and the lines in the Hough space do not intersect exactly in a single point. The usual approach is to define a binning in the Hough space and count the number of lines crossing each bin. Peaks in the 2D histogram correspond to lines that are close to many points in the image space. The size of the bins depends on the distribution of the measurement errors and can be tuned on simulated tracks. Alternatively, a 2D binary search can be performed using a quadtree data structure [3, 4].

The parametrization of the lines by y = kx + d can be numerically problematic if very large values of the slope k are possible. A more robust parametrization of the line has the form \(x\cos \varphi +y\sin \varphi -c=0\). The curve in the Hough space of φ and c passing through the point \((x_i,y_i)\) is a sinusoid with the equation

$$\displaystyle \begin{gathered} c=x_i\cos\varphi+y_i\sin\varphi=r_i\sin\hspace{0.5pt}(\varphi+\varphi_i),\ \;\mathrm{with}\ \;\\ r_i=\sqrt{x_i^2+y_i^2},\ \; \varphi_i=\arctan\hspace{0.5pt}(x_i/y_i). \end{gathered} $$
(5.4)
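The binned search in the Hough space of φ and c can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name, the number of bins, and the range of c are assumptions, and the test line is deliberately placed at a bin center so that the peak is unambiguous.

```python
import math
from collections import Counter

def hough_line(points, n_phi=180, c_max=10.0, n_c=200):
    """Return (phi, c, votes) of the fullest bin of a (phi, c) histogram."""
    c_width = 2 * c_max / n_c
    votes = Counter()
    for x, y in points:
        for i in range(n_phi):
            phi = math.pi * (i + 0.5) / n_phi          # bin centre in [0, pi)
            c = x * math.cos(phi) + y * math.sin(phi)  # sinusoid of Eq. (5.4)
            j = math.floor((c + c_max) / c_width)
            if 0 <= j < n_c:
                votes[(i, j)] += 1
    (i, j), count = votes.most_common(1)[0]
    return math.pi * (i + 0.5) / n_phi, -c_max + (j + 0.5) * c_width, count

# Hypothetical test line, chosen to coincide with a bin centre.
phi0, c0 = math.pi * 60.5 / 180, 2.05
line_points = [(c0 * math.cos(phi0) - t * math.sin(phi0),
                c0 * math.sin(phi0) + t * math.cos(phi0)) for t in range(-4, 5)]
phi_best, c_best, n_votes = hough_line(line_points)
```

With measurement errors, the votes of a track spread over neighbouring bins, which is why the bin size has to be matched to the resolution as described above.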

If the curve to be found in the image space is a circle through the origin, there are two possibilities. The problem can be reduced to the straight line case by a conformal transformation, or a circle through the origin can be parameterized in the following form:

$$\displaystyle \begin{gathered} x^2-2x\hspace{0.5pt} R\cos\varphi+y^2-2y\hspace{0.5pt} R\sin\varphi=0, \end{gathered} $$
(5.5)

where R is the circle radius and φ is the azimuth of the circle center in polar coordinates. The curve in the Hough space of κ = 1∕R and φ passing through the point \((x_i,y_i)\) is a sinusoid with the equation

$$\displaystyle \begin{gathered} \kappa=\frac{2}{r_i}\sin{}(\varphi+\varphi_i), \end{gathered} $$
(5.6)

with \(r_i\) and \(\varphi_i\) as above, see Fig. 5.2.

Fig. 5.2 Top: Image space (x, y); bottom: Hough space (φ, c). The circled point in the Hough space corresponds to the straight line in the image space

If the curve to be found in the image space is a circle in general position with the equation

$$\displaystyle \begin{gathered} (x-x_{\mathrm{c}})^2+(y-y_{\mathrm{c}})^2-R^2=0, \end{gathered} $$
(5.7)

the constraint that the circle passes through the point \((x_i,y_i)\) defines a second-order surface in the 3D Hough space of \(x_{\mathrm{c}}, y_{\mathrm{c}}, z\):

$$\displaystyle \begin{gathered} z=(x_i-x_{\mathrm{c}})^2+(y_i-y_{\mathrm{c}})^2,\ \;\mathrm{with}\ \; z=R^2. \end{gathered} $$
(5.8)

It follows that finding circles requires finding intersection points of surfaces in a 3D histogram, which is computationally much more expensive than the same problem in 2D. A search in 3D can be based on octrees, the 3D analogues of quadtrees [4].

An alternative is the randomized Hough transform [5], which randomly selects triplets of points. The center and the radius of the circle passing through each triplet are stored in a 3D histogram. Peak finding can be done in 3D or in the 2D histogram of the circle centers. After finding a peak, the best circle center and radius are obtained by computing the medoid [6] of the entries in the peak bin.
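A minimal sketch of the randomized Hough transform for circles is given below. The function names, the bin size, the number of sampled triplets, and the toy data are assumptions made for illustration; the medoid is taken here as the entry of the peak bin with the smallest summed Euclidean distance to all other entries.

```python
import math
import random

def circle_through(p1, p2, p3):
    """Centre and radius of the circle through three points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(det) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    xc = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / det
    yc = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / det
    return xc, yc, math.hypot(x1 - xc, y1 - yc)

def randomized_hough(points, n_samples=2000, bin_size=0.5, seed=1):
    """Histogram (xc, yc, R) of random triplets; return the medoid of the peak bin."""
    rng = random.Random(seed)
    bins = {}
    for _ in range(n_samples):
        circle = circle_through(*rng.sample(points, 3))
        if circle is None:
            continue
        key = tuple(math.floor(p / bin_size) for p in circle)
        bins.setdefault(key, []).append(circle)
    peak = max(bins.values(), key=len)
    return min(peak, key=lambda c: sum(math.dist(c, other) for other in peak))

# Toy data: 20 exact points on a circle plus 10 random background points.
rng = random.Random(0)
hits = [(1.2 + 3.3 * math.cos(0.3 * i), -0.7 + 3.3 * math.sin(0.3 * i)) for i in range(20)]
hits += [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(10)]
xc, yc, radius = randomized_hough(hits)
```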

1.3 Artificial Retina

The concept of the “Artificial Retina” was introduced in [7]. Similar to the Hough transform, it relies on a partition of the track parameter space into cells. Figure 5.3 shows a simple example with a track in 2D, specified by slope k and intercept d.

Fig. 5.3 Left: A cell in the parameter space of (k, d) corresponds to an ideal track with these parameters. Right: The corresponding track receptors represent the intercepts of this ideal track with the tracking layers. The hits of a real track are close to the track receptors within the experimental resolution. (Adapted from [11], by permission of Elsevier)

The intensity of a cell is the sum of the responses of its associated receptors to the hits that are present in the layer. Assuming a Gaussian response, the intensity R(k, d) of the cell centered at (k, d) is given by:

$$\displaystyle \begin{gathered} R(k,d)=\sum_{i=1}^n\sum_{j=1}^m\exp\left(-s_{ij}^2/2\sigma_i^2\right), \end{gathered} $$
(5.9)

where \(s_{ij}=y_{ij}-(kx_i+d)\) is the distance of hit j in layer i from the ideal track position in layer i, and \(\sigma_i\) is a scale parameter that regulates the width of the receptive field in layer i. Other response functions are of course possible, and their shape and width can be adjusted for optimal performance. As with the Hough transform, track candidates correspond to the local maxima of intensity in parameter space.
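The intensity in Eq. (5.9) can be evaluated on a grid of cells as in the following sketch. The layer positions, hits, grid, and the common value of the scale parameters are invented for illustration, and a plain scan of all cells replaces the parallel hardware evaluation.

```python
import math

def retina_response(k, d, layer_x, layer_hits, sigmas):
    """Intensity R(k, d) of Eq. (5.9) for the cell centred at (k, d)."""
    total = 0.0
    for x_i, hits_i, sigma_i in zip(layer_x, layer_hits, sigmas):
        for y_ij in hits_i:
            s_ij = y_ij - (k * x_i + d)  # distance of hit j from the receptor in layer i
            total += math.exp(-s_ij ** 2 / (2 * sigma_i ** 2))
    return total

# Hypothetical detector with five layers; one track with k = 0.5, d = 1 and two noise hits.
layer_x = [1.0, 2.0, 3.0, 4.0, 5.0]
layer_hits = [[1.5], [2.0, 0.2], [2.5], [3.0, 4.0], [3.5]]
sigmas = [0.1] * 5
cells = [(-1 + 0.1 * i, 0.1 * j) for i in range(21) for j in range(21)]
best_k, best_d = max(
    cells, key=lambda cell: retina_response(*cell, layer_x, layer_hits, sigmas))
```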

The artificial retina is eminently suitable for high-speed track finding, as it can be highly parallelized and implemented on commercial FPGAs [8, 9]. For applications in the vertex locator of LHCb (Sect. 1.6.1.4) and in a test beam, see [10, 11].

1.4 Legendre Transform

The Legendre transform is an extension of the Hough transform, used to find common tangent lines or tangent circles through the origin to a set C of circles [12, 13]. In the context of track finding, the circles in C are drift circles in a drift chamber or a drift tube chamber [14], see Fig. 5.4. Assume a drift circle in the plane, with center \((x_{\mathrm{w}},y_{\mathrm{w}})\) and radius ρ. A line parameterized by \(x\cos \varphi +y\sin \varphi -c=0\) is tangent to the circle if, and only if, its signed distance from the circle center is equal to ±ρ:

$$\displaystyle \begin{gathered} x_{\mathrm{w}}\cos\varphi+y_{\mathrm{w}}\sin\varphi-c=\pm\rho\quad \mathrm{or}\quad c=x_{\mathrm{w}}\cos\varphi+y_{\mathrm{w}}\sin\varphi\pm\rho. \end{gathered} $$
(5.10)

The drift circle in the image space (x, y) therefore corresponds to two sinusoids in the Legendre space (φ, c), see Fig. 5.4. The further procedure is the same as with the Hough transform.
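The following sketch fills a histogram in the Legendre space with the two sinusoids of Eq. (5.10) for each drift circle and returns the fullest bin. The binning, the function name, and the toy wire positions are assumptions; the true tangent line is placed at a bin center to keep the example deterministic.

```python
import math
from collections import Counter

def legendre_line(drift_circles, n_phi=180, c_max=10.0, n_c=200):
    """drift_circles: iterable of (x_w, y_w, rho). Returns (phi, c, votes) of the peak."""
    c_width = 2 * c_max / n_c
    votes = Counter()
    for x_w, y_w, rho in drift_circles:
        for i in range(n_phi):
            phi = math.pi * (i + 0.5) / n_phi
            centre = x_w * math.cos(phi) + y_w * math.sin(phi)
            # Two sinusoids per drift circle, Eq. (5.10); the set avoids double counting.
            for j in {math.floor((centre + sign * rho + c_max) / c_width) for sign in (1, -1)}:
                if 0 <= j < n_c:
                    votes[(i, j)] += 1
    (i, j), count = votes.most_common(1)[0]
    return math.pi * (i + 0.5) / n_phi, -c_max + (j + 0.5) * c_width, count

# Toy wires: position t along the true line and signed offset s from it, so rho = |s|.
phi0, c0 = math.pi * 30.5 / 180, 1.05
offsets = [(-5, 0.3), (-3, -0.4), (-1, 0.2), (1, -0.3), (3, 0.45), (5, -0.25)]
drift_circles = [((c0 + s) * math.cos(phi0) - t * math.sin(phi0),
                  (c0 + s) * math.sin(phi0) + t * math.cos(phi0), abs(s))
                 for t, s in offsets]
phi_best, c_best, n_votes = legendre_line(drift_circles)
```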

Fig. 5.4 Top: Image space (x, y); bottom: Legendre space (φ, c). The circled point in the Legendre space corresponds to the straight line in the image space

A circle through the origin can be parameterized by its radius R and the angle Φ, where the circle center is given by \(x_{\mathrm {c}}=R\cos \varPhi ,\ y_{\mathrm {c}}=R\sin \varPhi \). Such a circle touches the drift circle with center \((x_{\mathrm{w}},y_{\mathrm{w}})\) and radius ρ if, and only if, the squared distance of the circle centers is equal either to \((R+\rho)^2\) or to \((R-\rho)^2\):

$$\displaystyle \begin{gathered} 2R\,(\pm\rho+x_{\mathrm{w}}\cos\varPhi+y_{\mathrm{w}}\sin\varPhi)=x_{\mathrm{w}}^2+y_{\mathrm{w}}^2-\rho^2. \end{gathered} $$
(5.11)

In order to avoid large radii in the limit of the circle approaching a straight line, the Legendre space is chosen as (κ, Φ), where κ = 1∕R is the curvature of the circle. The drift circle again corresponds to two sinusoids in (κ, Φ):

$$\displaystyle \begin{gathered} \kappa=\frac{2(\pm\rho+x_{\mathrm{w}}\cos\varPhi+y_{\mathrm{w}}\sin\varPhi)}{x_{\mathrm{w}}^2+y_{\mathrm{w}}^2-\rho^2} \end{gathered} $$
(5.12)
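Equation (5.12) can be checked numerically: for a given angle Φ and sign, the circle with curvature κ should have a center whose distance from the wire equals |R ± ρ|. The wire position, drift radius, and angles below are arbitrary illustrative values.

```python
import math

def tangent_curvature(x_w, y_w, rho, big_phi, sign):
    """Curvature kappa of Eq. (5.12); sign = +1 or -1 selects the sinusoid."""
    numerator = 2 * (sign * rho + x_w * math.cos(big_phi) + y_w * math.sin(big_phi))
    return numerator / (x_w ** 2 + y_w ** 2 - rho ** 2)

x_w, y_w, rho = 4.0, 3.0, 0.5  # hypothetical drift circle
mismatches = []
for big_phi in (0.2, 0.6, 1.0):
    for sign in (1, -1):
        R = 1 / tangent_curvature(x_w, y_w, rho, big_phi, sign)
        centre_distance = math.hypot(x_w - R * math.cos(big_phi),
                                     y_w - R * math.sin(big_phi))
        # Tangency condition: distance of the centres equals |R + sign * rho|.
        mismatches.append(abs(centre_distance - abs(R + sign * rho)))
```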

The task of finding a circle through the origin can be reduced to the task of finding a straight line if the Legendre transform is preceded by a conformal transformation, see Sect. 5.1.1, which transforms the circle into a straight line while transforming the drift circles into circles.

1.5 Cellular Automaton

A cellular automaton (CA) is a dynamical system where space, time, and variables are discrete [15]. It has five fundamental defining characteristics [16]:

  1. It consists of a discrete lattice of sites.

  2. It evolves in discrete time steps.

  3. Each site takes on a finite set of possible values.

  4. At each site, the value evolves according to the same deterministic rules.

  5. The rules for the evolution of a site depend on a local neighbourhood around it.

Probably the best known CA is Conway’s “Game of Life” [17]. Like many, but not all, subsequently proposed cellular automata, it assumes that the cells are located on a regular 2D rectangular lattice. However, this assumption is too restrictive for the application to track finding. Instead, the cells are represented by the nodes of a graph, and the neighbourhood of a cell \(C_i\) is the set of all cells connected by an edge to \(C_i\).

The neighbours of a cell are divided into “inner” and “outer” neighbours such that if \(C_i\) is an inner neighbour of \(C_j\), then \(C_j\) is an outer neighbour of \(C_i\). The possible states of a cell are the non-negative integers. In practice, the states are bounded by some upper limit depending on the number of detector layers. The initial states of all cells are set to zero.

The earliest applications of the CA to track finding are described in [18,19,20,21]. With the exception of [18], cells are defined as short track segments connecting hits in adjacent detector layers; segments that skip a layer are sometimes allowed as well [22, 23]. Each cell has an inner hit and an outer hit according to the arrangement of the detector layers. Two cells are neighbours if the outer hit of one cell is the inner hit of the other cell. In addition, the neighbourhood relation can be further restricted by imposing a cut on the angle spanned by the two cells (segments). It is the task of the CA to find chains of neighbouring segments that correspond to actual tracks. This is achieved by the following rule of evolution. At each time step, the state of each cell is augmented by 1 if it has the same state as its inner neighbour. The states of all cells are updated synchronously. When no state changes anymore, the evolution is stopped. At this point, the state of a cell is the length of the longest unbroken chain of segments terminating in this cell. An illustration of the CA is shown in Fig. 5.5.

Fig. 5.5 Illustration of the cellular automaton algorithm. It creates tracklets and links, numbers them as possibly situated on the same trajectory, and collects tracklets into track candidates. (From [24], by permission of Elsevier)

The actual search for track candidates starts with the cells with the highest state. If such a cell has an inner neighbour with a state that is smaller by 1, this neighbour is attached to the track candidate. The procedure is repeated from the attached cell until a cell with no inner neighbour is reached. If, at any point, several neighbours can be attached to the track candidate, either the “best” cell according to some criterion is selected, or the track candidate is split into two, and each candidate is followed independently. This inevitably results in candidates that share hits, requiring a final selection of compatible (non-overlapping) candidates. This is the topic of Sect. 5.3.
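The evolution rule and the subsequent collection of a candidate can be sketched as follows. The graph is a hypothetical toy example with abstract cell names; the sketch assumes that a cell is incremented if any of its inner neighbours has the same state, and it resolves ambiguities by simply taking the first suitable neighbour instead of applying a quality criterion.

```python
def evolve(inner):
    """inner[c] lists the inner neighbours of cell c; returns the final states."""
    state = {cell: 0 for cell in inner}
    changed = True
    while changed:
        # Synchronous update: all new states are computed from the old ones.
        new_state = {
            cell: s + 1 if any(state[n] == s for n in inner[cell]) else s
            for cell, s in state.items()
        }
        changed = new_state != state
        state = new_state
    return state

def follow(inner, state):
    """Collect one candidate, starting from a cell with the highest state."""
    cell = max(state, key=state.get)
    chain = [cell]
    while inner[cell]:
        # Attach an inner neighbour whose state is smaller by 1 (first one found).
        cell = next(n for n in inner[cell] if state[n] == state[cell] - 1)
        chain.append(cell)
    return chain[::-1]  # innermost cell first

# Toy graph: a long chain X0..X3 and a short chain Z0, Z1, both ending in cell Y.
inner = {
    "X0": [], "X1": ["X0"], "X2": ["X1"], "X3": ["X2"],
    "Z0": [], "Z1": ["Z0"],
    "Y": ["X3", "Z1"],
}
state = evolve(inner)
candidate = follow(inner, state)
```

In this example the cell Y ends in state 4, and the candidate follows the longer of the two chains.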

In tracking detectors with few layers and little redundancy [25], the CA can be complemented by prior information stored in a sector map [26, 27]. In this concept, the sensors are partitioned into sectors. In order to create a sector map, a large training sample of simulated tracks in a chosen angular and momentum region is generated. Two sectors are declared as “friends” if a sufficiently large number of tracks passes through them without hitting any other sector in between. The friendship relation defines an acyclic directed graph that is stored in the sector map.

In the track-finding phase, the hits are sorted into their sectors, and the segment finder is activated. It creates pairs of hits in friendly sectors and keeps those pairs (segments) that pass various cuts, which are also stored in the sector map. If two segments share a hit such that the outer hit of the inner segment is the same as the inner hit of the outer segment, they are passed to the neighbour finder, which applies additional cuts, also stored in the sector map. Finally, all surviving segments form the cells of the CA.

Different sector maps for different geometrical or kinematical regions can be created and applied sequentially. For instance, high-momentum, high-quality tracks can be found first by using a sector map with tighter cuts. After removing their hits, a second sector map with less selective cuts can be used to find the remaining tracks.

Another extension of the CA, termed the 4D CA, is described in [28, 29]. Pairs of segments or triplets of hits are accepted only if the time stamps of the hits are consistent with the hypothesis that they have been created by the same charged particle. For the application of the 4D CA in the CBM experiment, see Sect. 11.2.

1.6 Neural Networks

The application of neural networks to track finding was first proposed independently in [30] and [31]. In this approach, the network is of the Hopfield type [32], the neurons being track segments that connect observations in adjacent or nearby layers of the detector; see Sect. 5.1.6.1. More recently, the HEP.TrkX pilot project was established with the aim to develop deep neural networks for track finding in high-multiplicity environments typical for the LHC era [33]. Two deep networks are described in Sects. 5.1.6.2 and 5.1.6.3. A follow-up project, called Exa.TrkX, was started with a kick-off meeting in June 2019 [34]; a second workshop was held in April 2020 [35]. A reference to first published results can be found in Sect. 5.1.6.3.

1.6.1 Hopfield Network

A Hopfield network is a fully connected network with a single layer of neurons. In the simplest case, the neurons are binary with two states: \(s_i=\pm 1,\ i=1,\ldots,n\). Each pair (i, j) of neurons has a fixed connection weight \(w_{ij}\) with \(w_{ij}=w_{ji}\) and \(w_{ii}=0\). The states of the neurons evolve in discrete time steps according to the following prescription:

$$\displaystyle \begin{gathered} s_{i}(t)=\operatorname{\mathrm{sign}}\left[\sum_{j=1}^n w_{ij}\,s_{j}\hspace{0.5pt}(t-1)\right].{} \end{gathered} $$
(5.13)

The update can be synchronous (the states are recomputed in parallel) or asynchronous (the states are recomputed sequentially). The network has an associated function E(s), defined as:

$$\displaystyle \begin{gathered} E({\boldsymbol{s}})=-\frac{1}{2}\sum_{i,j=1}^n w_{ij}\,s_{i}\,s_{j},{} \end{gathered} $$
(5.14)

where \({\boldsymbol{s}}=(s_1,\ldots,s_n)\) is the state of the network. In analogy to the theory of spin glasses, E(s) is called the energy function of the network. It can be shown that E(s) is a non-increasing function of the time t and that the update rule Eq. (5.13) leads to a local minimum of E(s) [36].

In most applications, including the one discussed here, the aim is to find the global minimum rather than a local one. To this end, thermal noise is introduced in the network. At temperature T, the state s is Boltzmann distributed with the probability function

$$\displaystyle \begin{gathered} P({\boldsymbol{s}})=\frac{1}{Z}\exp\left[-E({\boldsymbol{s}})/T\right], \ \;\mathrm{with}\ \; Z=\sum_{{\boldsymbol{s}}} \exp\left[-E({\boldsymbol{s}})/T\right]. \end{gathered} $$
(5.15)

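As the number of possible states rises exponentially with the number of neurons, the partition function Z is computed in the mean-field approximation [37], and the thermal average \(v_i\) of \(s_i\) is given by:

$$\displaystyle \begin{gathered} v_{i}=\tanh\left[-\frac{1}{T}\,\frac{\partial E({\boldsymbol{v}})}{\partial v_{i}}\right]=\tanh\left[\frac{1}{T}\sum_{j=1}^n w_{ij}\,v_{j}\right],{} \end{gathered} $$
(5.16)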

where the states \({\boldsymbol{v}}=(v_1,\ldots,v_n)\) are now continuous in the interval (−1, 1). The definition of the energy function is analogous to Eq. (5.14):

$$\displaystyle \begin{gathered} E({\boldsymbol{v}})=-\frac{1}{2}\sum_{i,j=1}^n w_{ij}\,v_{i}\,v_{j},{} \end{gathered} $$
(5.17)

and the update is modified accordingly:

$$\displaystyle \begin{gathered} v_{i}(t)=\tanh\left[\frac{1}{T}\sum_{j=1}^n w_{ij}\,v_{j}\hspace{0.5pt}(t-1)\right].{} \end{gathered} $$
(5.18)

Finding the global minimum or at least a low local minimum of the energy function is facilitated by deterministic annealing [38]. First, the energy function is minimized at high temperature; the temperature is then lowered according to a predefined cooling or annealing schedule. At low temperature, the states of the network are close to either 1 (active) or −1 (inactive).
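A minimal sketch of the mean-field update of Eq. (5.18) combined with a geometric cooling schedule is shown below. The weight matrix, the schedule parameters, the random initial states, and the choice of an asynchronous update are assumptions made for illustration; the network is small enough that the result can be compared with a brute-force minimization of Eq. (5.14).

```python
import itertools
import math
import random

def energy(w, v):
    """Energy function of Eq. (5.14) and Eq. (5.17)."""
    n = len(v)
    return -0.5 * sum(w[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def anneal(w, t_start=2.0, t_stop=0.05, cooling=0.9, sweeps=20, seed=0):
    """Mean-field annealing with asynchronous updates and geometric cooling."""
    rng = random.Random(seed)
    n = len(w)
    v = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # small random initial states
    t = t_start
    while t > t_stop:
        for _ in range(sweeps):
            for i in range(n):  # Eq. (5.18), one neuron at a time
                v[i] = math.tanh(sum(w[i][j] * v[j] for j in range(n)) / t)
        t *= cooling
    return v

# Hypothetical symmetric weights with zero diagonal.
w = [[0, 1, -1, 0],
     [1, 0, 0, -1],
     [-1, 0, 0, 1],
     [0, -1, 1, 0]]
v_final = anneal(w)
s_final = [1 if v_i > 0 else -1 for v_i in v_final]
e_min = min(energy(w, s) for s in itertools.product((-1, 1), repeat=4))
```

For this toy network the annealed state reaches the lowest energy found by enumerating all 16 binary states; for realistic networks no such guarantee exists.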

For the purpose of track finding, the problem has to be mapped on the Hopfield network such that the final state of the network corresponds to the solution of the problem. Similar to the CA, the neurons are short track segments connecting space points in adjacent or nearby layers of the tracking detector. To keep the number of neurons manageable, geometric cuts ensure that only segments that can be part of an actual track in the momentum range of interest are used as neurons. The sector map introduced in Sect. 5.1.5 can be used to store the cuts. A track can be considered as an unbroken chain of segments, so only pairs of segments sharing a point qualify for having a positive weight. Consider a triple of points (k, l, m) in consecutive layers, defining two neurons \(v_{kl}\) and \(v_{lm}\). The weight \(w_{klm}\) depends on the angle between the segments and on their length. In [31], it is defined as follows:

$$\displaystyle \begin{gathered} w_{klm}=\frac{\cos^\lambda\theta_{klm}}{d_{kl}+d_{lm}}, \end{gathered} $$
(5.19)

where λ is an odd exponent and \(d_{ij}\) is the length of neuron \(v_{ij}\), either in space or in the projection to the bending plane. This definition of the weights favours combinations of short segments with small angles. Small weights can be set to zero.
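The weight of Eq. (5.19) can be computed for points in the plane as in the following sketch. The exponent λ = 5, the threshold, and the sample points are illustrative assumptions; weights that do not exceed the threshold, including negative ones, are set to zero.

```python
import math

def triple_weight(p_k, p_l, p_m, lam=5, w_min=0.0):
    """Weight w_klm of Eq. (5.19) for the segments (k, l) and (l, m) in 2D."""
    ax, ay = p_l[0] - p_k[0], p_l[1] - p_k[1]
    bx, by = p_m[0] - p_l[0], p_m[1] - p_l[1]
    d_kl, d_lm = math.hypot(ax, ay), math.hypot(bx, by)
    cos_theta = (ax * bx + ay * by) / (d_kl * d_lm)  # angle between the segments
    weight = cos_theta ** lam / (d_kl + d_lm)
    return weight if weight > w_min else 0.0

w_straight = triple_weight((0, 0), (1, 0), (2, 0))  # collinear points
w_kinked = triple_weight((0, 0), (1, 0), (2, 1))    # 45 degree kink
w_reversed = triple_weight((0, 0), (1, 0), (0, 1))  # obtuse angle, suppressed
```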

Constraints on the possible final configurations of the active neurons can be included by adding a cost or penalty term to the energy function of the network. This serves to prevent association of a point to several tracks and to get approximately the expected number of active neurons in the final state. For example, in [39] the following cost term was used:

$$\displaystyle \begin{gathered} C({\boldsymbol{v}})=\alpha\left(\sum_{k,l,n,l\neq n}\!\!\!v_{kl}v_{kn}+\sum_{k,l,m,k\neq m}\!\!\!\!v_{kl}v_{ml}\right)-b, \end{gathered} $$
(5.20)

where α is a Lagrange multiplier and b is a small constant bias term. Minimizing the first term leads to a competition between neurons starting or ending in the same point, so that in the end at most one should survive. The final update equation is given by Eq. (5.16).

For an evaluation of the performance of the Hopfield network on real data from an experiment at the LEP electron-positron collider, see [39]. For an application to tracking in the ALICE experiment at the LHC, see [40].

1.6.2 Recurrent Neural Network

The dynamics of a particle track can be modelled by a nonlinear state-space model; see Sect. 3.2.3. A track is, thus, very similar to a discrete time series, the main difference being that the observations are not progressing in time but in space. Recurrent neural networks (RNNs), in particular networks of long short-term memory (LSTM) neurons [41, 42], are routinely used for forecasting time series arising in various contexts: financial, industrial, weather, traffic, etc. It is, therefore, to be expected that RNNs can be trained to learn the dynamics of a track and follow it in a way similar to the extended Kalman filter. The crucial difference is that in the Kalman filter, the dynamics is explicitly coded in the system equation and in the distribution of the process noise, whereas in the RNN, the dynamics is implicit in the weights learned by the network on a training set of examples.

The first successful attempt to train an RNN for track finding is described in [43]. Two models are presented. The first is a sequential hit predictor that, given a sequence of past hits, predicts the position of the next hit. The second model augments the first one by predicting the covariance matrix of the hit, using a Gaussian distribution. Both predictors are implemented as an LSTM layer, followed by a fully connected (FC) layer. The scheme of the Gaussian predictor is shown in Fig. 5.6. The training data for the RNN network and the GNN network in Sect. 5.1.6.3 were generated by the ACTS package [44].

Fig. 5.6 Diagram of the Gaussian hit predictor model that takes a sequence of 3D coordinates as input and produces bivariate Gaussian probability distributions as next-step predictions. The architecture is the same as the basic hit predictor, but the model provides additional output that parameterizes the Gaussian covariance matrix. (Adapted from [43], by permission of the author)

1.6.3 Graph Neural Network

Like the CA described in Sect. 5.1.5, track finding with the graph neural network (GNN) is based on the representation of the tracking data by a graph [43]. The detector hits are the nodes (vertices) of the graph, and two nodes are connected by an edge if they are compatible according to some criterion. Such a criterion can be stored in a sector map such as the one described in Sect. 5.1.5. The graph is fed into the GNN, which consists of an input transformation layer (IL) followed by alternating units of edge networks (ENs) and node networks (NNs), each implemented as a multi-layer perceptron with two layers, see Fig. 5.7.

Fig. 5.7 Diagram of the Graph Neural Network model which begins with an input transformation layer (IL) and has a number of recurrent iterations of alternating EdgeNet (EN) and NodeNet (NN) units. In this case, the final unit is an EN, making this a segment classifier model. (Adapted from [43], by permission of the author)

An EN computes a new weight for every edge from the features of the end nodes while an NN computes new features for every node from the current features and the edge-weighted aggregated features of all connected nodes in the adjacent layers. The network can be used to classify whether nodes/hits or edges belong to a track or not. The network in Fig. 5.7 classifies edges, as the last unit is an EN. An extension of the work in [43] to track seeding and hit labelling with GNNs is described in [45].

1.7 Track Following and the Combinatorial Kalman Filter

In track finding methods such as the Hough transform or the CA, the complete set of all hits serves as the primary input. Such methods have, therefore, been dubbed “global” [46]. This is in contrast to methods that find tracks locally or sequentially, one after the other. The most prominent example of a sequential method is track following.

In track following, a track candidate starts from a “seed”, i.e., a short track segment. This seed can in principle be anywhere in the tracking detector. Generating seeds in the outer layers of the trackers has the advantage of smaller occupancy and less background from low-momentum tracks that spiral in the inner layers of the tracker. Generating seeds in the inner layers, which are frequently pixel layers, has the advantage of using 3D hits with higher resolution both in the bending plane and in the longitudinal direction. As will be described in more detail in Chap. 10, ATLAS and CMS, the two general-purpose experiments at the LHC, have opted for the second solution.

The generation of seeds is often a simple combinatorial search for compatible triplets or quadruplets of hits, potentially assisted by a CA [47], and includes information about the size and position of the beam spot; see Sect. 7.1. Some examples of seed generation algorithms will be given in Chap. 10.

Once the seeds have been found, each seed is then followed through the tracker by extrapolating it toward the outside of the tracker or toward the production region, depending on where the seed is situated. After each extrapolation step, compatible hits are searched for and attached to the track candidate.

The progressive track recognition described in [48] can be extended to a combinatorial Kalman filter (CKF), introduced in [49, 50] under the name “Concurrent Track Evolution”, see Fig. 5.8. First, each seed is fitted with one of the methods described in Chap. 6. The parameters and the covariance matrix of the seed are then extrapolated to the nearest tracker layer, taking into account interactions with the detector material; see Sects. 4.3, 4.4 and 4.5. The hits in the sensor in which the extrapolated trajectory intersects with the layer are tested for compatibility with the predicted track parameters using a chi-square statistic; see Sects. 3.2.3 and 6.1.2. If n compatible hits are found, n copies of the predicted state, i.e., its track parameters and its covariance matrix, are generated, and each one of them is updated with one of the n hits according to the Kalman filter, Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32). The original state is also kept and marked as having a missing hit, giving a total of n + 1 track candidates. This procedure is iterated on each track candidate until the last layer of the tracker is reached or the count of missing hits in a candidate exceeds a preset threshold, typically one or two.
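The branching logic of the CKF can be illustrated with a deliberately simplified toy model: the state is a single coordinate with its variance, the extrapolation between layers is the identity, and material effects are ignored. The class and function names, the chi-square cut, and all numbers are assumptions and do not correspond to any experiment.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    y: float          # estimated track position (scalar toy state)
    var: float        # variance of the estimate
    chi2: float = 0.0
    hits: tuple = ()  # attached hits, None marks a missing hit
    missing: int = 0

def ckf_step(candidates, layer_hits, meas_var, chi2_cut=9.0, max_missing=1):
    """Extend every candidate by one layer, branching on all compatible hits."""
    extended = []
    for cand in candidates:
        res_var = cand.var + meas_var  # variance of the residual
        for hit in layer_hits:
            residual = hit - cand.y
            chi2_inc = residual ** 2 / res_var
            if chi2_inc < chi2_cut:    # compatibility test
                gain = cand.var / res_var
                extended.append(Candidate(cand.y + gain * residual,
                                          (1 - gain) * cand.var,
                                          cand.chi2 + chi2_inc,
                                          cand.hits + (hit,), cand.missing))
        if cand.missing < max_missing:  # also keep the state with a missing hit
            extended.append(Candidate(cand.y, cand.var, cand.chi2,
                                      cand.hits + (None,), cand.missing + 1))
    return extended

seed = Candidate(y=0.0, var=0.04)
candidates = [seed]
for layer_hits in ([0.05, 3.0], [-0.02, 0.15, 5.0]):
    candidates = ckf_step(candidates, layer_hits, meas_var=0.01)
# Simple quality measure: fewest missing hits first, then smallest total chi-square.
best = min(candidates, key=lambda c: (c.missing, c.chi2))
```

In a realistic implementation the list of candidates would additionally be pruned after each layer according to a quality measure tuned on simulated data.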

Fig. 5.8 Schematic view of concurrent track evolution in a five-layered part of a tracking system with hexagonal drift cells, which is traversed by three particles, labelled T1, T2 and T3. The simulated drift time isochrones are indicated by circles. The propagation proceeds upstream from the right to the left and starts with a seed of hits from track T1 outside of the picture. (From [49], by permission of Elsevier)

In the course of the combinatorial Kalman filter, it may be necessary to limit the number of active candidates for reasons of memory and/or speed. In this case, the “worst” track candidates are discarded and not followed anymore. The quality of a track candidate can be measured by a combination of its total number of hits, its number of missing hits, and its total chi-square \({\chi ^2_{\mathrm {tot}}}\) (Sect. 6.4). The tuning of the combination is usually performed on simulated data, where the correct association of hits to tracks is known.

Avoiding a combinatorial explosion is an important issue in experiments with high track multiplicities. Therefore, the CKF in, for instance, ATLAS and CMS starts with seeds in the pixel detector with its very high resolution in all three spatial dimensions. As a consequence, the compatibility test in the first non-pixel layer rejects wrong hits with a high probability. As the CKF proceeds, the state becomes more precisely known, and the probability of attaching a wrong hit becomes even smaller. For more details, see Sects. 10.2 and 10.3.

1.8 Pattern Matching

Pattern or template matching is mostly used in real-time track finding for the purpose of triggering on charged tracks (see Sect. 5.2). It can be applied to detectors with a layer structure, each layer being segmented into sectors or bins [51, 52]; see also Sect. 5.1.5. A charged particle crossing the detector generates hits in certain bins of the layers, thereby creating a pattern of “on” and “off” bins, which can be coded as strings of zeros and ones. The set of physically meaningful patterns is generated by extensive simulations and stored in a pattern bank.

The number of patterns to be stored depends on the geometry and size of the detector, on the characteristics of the tracks to be found, and on the granularity of the binning. For the purpose of triggering, a lower bound on the momentum is usually imposed; therefore, only the patterns of tracks with momentum above the threshold need to be stored. The granularity of the binning determines how well two nearby tracks can be separated, and therefore depends on the occupancy of the layer. If the binning is very coarse, for instance, only one bin per module in a silicon strip tracker, fewer patterns have to be stored, but two nearby tracks cannot be resolved (see Fig. 1 in [52]). If the binning is very fine, for instance, a bin for each strip in the extreme case, nearby tracks can be resolved almost perfectly, but the number of patterns is far too large to store. Also, higher track multiplicity implies smaller track separation on average, which in turn requires finer granularity. In any case, the granularity has to be optimized by extensive simulation studies.

In an event, a particular configuration of hits is generated and translated into a pattern. This pattern is compared to all patterns in the bank, and matching patterns constitute track candidates. The sketch in Fig. 5.9 shows an example of a pattern generated by two tracks and some patterns stored in the pattern bank. To cope with inefficiencies of the tracking detector, it may be necessary to accept partial matches.
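The comparison of an event with a pattern bank, including partial matches, can be sketched in software as follows. The representation of a pattern as one bin index per layer, the four-layer bank, and the parameter max_missing are illustrative assumptions; in practice the comparison is done in parallel in dedicated hardware.

```python
def match_patterns(bank, fired, max_missing=0):
    """Return the patterns whose bins are fired in all but max_missing layers.

    bank:  list of patterns, each a tuple with one bin index per layer
    fired: list with one set of fired bin indices per layer
    """
    matches = []
    for pattern in bank:
        n_found = sum(1 for layer, bin_index in enumerate(pattern)
                      if bin_index in fired[layer])
        if n_found >= len(pattern) - max_missing:
            matches.append(pattern)
    return matches

# Hypothetical bank for a detector with four layers.
bank = [(0, 0, 1, 1), (2, 2, 2, 2), (3, 2, 1, 0), (1, 1, 1, 1)]
# Event with two tracks; the second track has no hit in the last layer.
fired = [{0, 3}, {0, 2}, {1}, {1}]
exact = match_patterns(bank, fired)
partial = match_patterns(bank, fired, max_missing=1)
```

Requiring a full match finds only the first track, whereas allowing one missing layer recovers both.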

Fig. 5.9 Top: Two tracks in a detector with four layers creating two patterns. Bottom: Four patterns in the pattern bank

As the number of patterns that have to be stored can be very large, the method is feasible only if there is sufficient memory and if the comparison can be made very fast and in a highly parallel mode. The matching therefore has to be implemented in VLSI hardware, using content-addressable or associative memories [51, 52]. The patterns can be arranged in a tree structure, starting with coarse granularity of the sectors and proceeding to finer granularity. It is also possible to store patterns with variable resolution [53].

Pattern matching was used for real-time track finding both in the vertex detector and in the drift chamber of the CDF experiment; see Sect. 5.2.1. Later applications include the FTK (Fast Tracker) for the ATLAS experiment, see Sect. 5.2.2, and a proposed track trigger for the new CMS tracker that will be installed for the HL-LHC, see Sect. 5.2.3.

2 Online Track Finding

An early proposal for online track finding by dedicated hardware is the one described in [52]. It is based on matching hit patterns in the tracking detector with a pattern bank stored in associative memory; see Sect. 5.1.8. As field programmable gate arrays (FPGAs) were still in their infancy at the time of the publication, the associative memory (AM) is an array of 400 custom VLSI chips that can hold \(\mathcal {O}(10^5)\) patterns [51]. The pattern matching is organized as a tree search through different levels of spatial resolution [52]. This was soon followed by the actual implementation in the CDF experiment at the Tevatron collider.

2.1 CDF Vertex Trigger

The Silicon Vertex Tracker (SVT, [54]) was designed to provide track impact parameter information for the level-2 trigger of the CDF experiment [55]. It was realized in custom hardware [56, 57]. Track finding is done by an AM, refining the information from the XFT track processor (see below) that finds tracks in the central drift chamber for the level-1 trigger. Tracks are fitted by a farm of digital signal processors using a linearized fit that requires only scalar products; see Sect. 6.1.6.

The upgraded SVT, now renamed Silicon Vertex Trigger, is described in [58, 59]. If an event passes the level-1 trigger, the SVT extrapolates the XFT tracks, associates hits in the silicon vertex detector, and computes the transverse impact parameter. Its average latency is 24 µs. The hit association is performed by custom AM chips, the linearized track fit by FPGAs.

The eXtremely Fast Tracker (XFT, [60, 61]) is a track processor that finds tracks with high transverse momentum in the central drift chamber of the experiment [62]. It is highly parallel and reports its results every 132 ns, in time for the level-1 trigger decision. The XFT works with hits in the four axial superlayers. Track identification is done in two stages called the Finder and the Linker. The Finder searches for high-\(p_T\) segments in each of the superlayers, and the Linker searches for high-\(p_T\) track candidates by combining segments from at least three of the four superlayers. Both stages use pattern matching to accomplish their tasks.

2.2 ATLAS Fast Tracker

The Fast Tracker (FTK) system of the ATLAS experiment is designed for global track reconstruction after each level-1 trigger [63, 64]. It enables the level-2 trigger to gain rapid access to tracking results. The system is based on the Silicon Vertex Trigger of the CDF experiment; see the preceding subsection. It uses hit data from four pixel layers and from both sides of four silicon strip layers, twelve layers in total. The tracker volume is split into 64 regions or towers, which are processed independently. The sensors are divided into “superstrips” with a coarser resolution.

Data processing starts with clustering in the pixel and in the strip sensors. The clustering algorithm is optimized for execution in an FPGA [65]. After clustering, a track is represented by a list of superstrips that corresponds to a pattern in the custom AM chip [66]. Pattern matching produces track candidates at coarse resolution; these are then refined by a high-resolution track fit in an FPGA. Missing layers are allowed in both stages. The number of patterns that have to be stored is currently of the order of a billion. This number would have to rise by a factor of the order of 10 at the HL-LHC [63], because of the larger number of channels in the tracker and of the higher track multiplicity, which requires finer granularity (see Sect. 5.1.8). For this and other reasons, the FTK will not be upgraded for operation at the HL-LHC; instead, the focus will be on the acceleration of the track reconstruction software [67, 68]; see also Sect. 10.2.

The fitted tracks are sent to the Second Stage Board, where they are extrapolated to the remaining silicon layers and fitted again. Finally, duplicate tracks are removed based on the number of common hits and the \(\chi^2\) [64].

2.3 CMS Track Trigger

Starting in 2026, the luminosity of the LHC is expected to increase by a factor of about ten above the current design value. The current CMS silicon tracker, having been in operation since 2009, cannot withstand the radiation level predicted for the HL-LHC and has no triggering capability. It will, therefore, be replaced by a newly designed tracker [69]. The new design features so-called \(p_T\) modules as the basic sensing devices. A \(p_T\) module consists of two closely stacked sensors, either a pixel and a strip sensor (PS) or two strip sensors (2S); see [70] and Fig. 5.10. A charged particle crossing the stack generates a stub that consists of two clusters. Tracks of sufficiently high transverse momentum (\(p_T > 2\) GeV) have little curvature and a small offset in the sensor stack, in contrast to tracks with smaller \(p_T\), which are bent more strongly and have a larger offset. Stubs that pass the cut on \(p_T\) are the input to the track trigger.

Fig. 5.10

A \(p_T\) module of the new CMS tracker. High-momentum tracks pass the \(p_T\) cut, low-momentum tracks fail. (From [70], reproduced under License CC-BY-3.0)
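The relation between transverse momentum and cluster offset can be made concrete with a small sketch. It assumes a barrel module at radius r, a track that originates from the beam line, and a homogeneous solenoid field; the default field of 3.8 T, the module radius, the sensor spacing, and all function names are illustrative assumptions and are not taken from [69, 70].

```python
import math

def stub_offset_mm(pt_gev, radius_m, spacing_mm, b_tesla=3.8):
    """Approximate r-phi offset between the two clusters of a stub.

    Assumes a barrel module at radius `radius_m` and a track from the
    beam line in a homogeneous field (all values are illustrative).
    """
    # Radius of curvature in metres: R = pT / (0.3 * B), pT in GeV, B in tesla.
    curvature_radius_m = pt_gev / (0.3 * b_tesla)
    # Angle between the track and the radial direction at the module.
    sin_alpha = radius_m / (2.0 * curvature_radius_m)
    if sin_alpha >= 1.0:
        raise ValueError("track does not reach this radius")
    return spacing_mm * math.tan(math.asin(sin_alpha))

def passes_pt_cut(offset_mm, radius_m, spacing_mm, pt_min_gev=2.0, b_tesla=3.8):
    """Accept a stub if its offset is at most the offset of a track at the pT threshold."""
    window = stub_offset_mm(pt_min_gev, radius_m, spacing_mm, b_tesla)
    return abs(offset_mm) <= window

# Hypothetical module: radius 0.6 m, sensor spacing 1.8 mm.
high = stub_offset_mm(10.0, 0.6, 1.8)
low = stub_offset_mm(1.0, 0.6, 1.8)
print(passes_pt_cut(high, 0.6, 1.8), passes_pt_cut(low, 0.6, 1.8))
```

Because the offset decreases monotonically with increasing transverse momentum, a cut on the offset acts as a cut on the momentum; in a real module the acceptance window would additionally depend on the module geometry and position.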

Three concepts are explored for reconstructing tracks at the level-1 trigger, two using an all-FPGA system [71], the third one using a combination of AM and FPGAs [72].

2.3.1 Time Multiplexing

The all-FPGA system is based on the principle of time multiplexing [73]. The fundamental idea is that several sources send their information from a given bunch crossing to a single destination for processing. The architecture of the system has two layers: the first extracts and preprocesses the stubs and sends them to the second layer, which contains the track finding processors.
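The principle can be sketched in a few lines. The round-robin assignment of bunch crossings to processors and all names below are assumptions made for illustration; the actual routing scheme of the system in [73] is not specified here.

```python
from collections import defaultdict

def route_time_multiplexed(stub_packets, n_processors):
    """Collect all stubs of one bunch crossing in a single processor.

    stub_packets: iterable of (bunch_crossing, source_id, stubs).
    Returns {processor: {bunch_crossing: [stubs from all sources]}}.
    The round-robin rule (bunch_crossing modulo n_processors) is an
    assumption of this sketch.
    """
    inbox = defaultdict(lambda: defaultdict(list))
    for bunch_crossing, _source_id, stubs in stub_packets:
        processor = bunch_crossing % n_processors
        inbox[processor][bunch_crossing].extend(stubs)
    return {proc: dict(by_bx) for proc, by_bx in inbox.items()}

# Three hypothetical sources, four bunch crossings, two processors.
packets = [(bx, src, [f"s{src}_{bx}"]) for bx in range(4) for src in range(3)]
inbox = route_time_multiplexed(packets, 2)
print(inbox[0][2])
```

Each processor sees the complete information of the bunch crossings assigned to it and has n_processors bunch-crossing intervals to finish before its next event arrives.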

Two track finding algorithms are investigated in the time-multiplexed track trigger, one based on tracklets and one based on the Hough transform. The tracklet algorithm has the following stages:

  1. Stub organization: The stubs are sorted into sectors in ϕ.

  2. Seeding: Tracklets are formed from stubs in adjacent layers.

  3. Projection: Tracklets are projected to other layers to search for matching stubs, both inside-out and outside-in.

  4. Fit: Track fit of stubs matched to the tracklet.

  5. Duplicate removal: Candidate selection based on \(\chi^2\).

The second algorithm [71] has the following stages:

  1. Hough transform: Stubs on the same trajectory are transformed into lines that meet in the vicinity of a single point; see Sect. 5.1.2.

  2. Fit: Combinatorial Kalman filter, see Sect. 5.1.7.

  3. Duplicate removal: Tracks are removed whose parameters do not correspond to the bin of the Hough transform where they were found.
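The first and the last stage can be illustrated with the straight-line Hough transform of Sect. 5.1.2, in which each point votes along the line d = −x k + y in the Hough space. The sketch below uses generic points in the plane and a rectangular array of cells; the stub coordinates and track parameters actually used in [71] are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def hough_accumulate(points, k_edges, d_edges):
    """Fill a (k, d) accumulator; each point votes along d = -x*k + y."""
    k_edges = np.asarray(k_edges, dtype=float)
    d_edges = np.asarray(d_edges, dtype=float)
    acc = np.zeros((len(k_edges) - 1, len(d_edges) - 1), dtype=int)
    k_centres = 0.5 * (k_edges[:-1] + k_edges[1:])
    for x, y in points:
        d = -x * k_centres + y               # one d value per k bin
        j = np.digitize(d, d_edges) - 1      # d bin index for each k bin
        inside = (j >= 0) & (j < acc.shape[1])
        acc[np.nonzero(inside)[0], j[inside]] += 1
    return acc

def best_cell(acc):
    """Return (k index, d index, votes) of the most populated cell."""
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return int(i), int(j), int(acc[i, j])

def consistent_with_cell(k_fit, d_fit, i, j, k_edges, d_edges):
    """Duplicate removal criterion: fitted parameters must lie in the cell."""
    return (k_edges[i] <= k_fit < k_edges[i + 1]
            and d_edges[j] <= d_fit < d_edges[j + 1])

# Five points on y = 0.5 x + 1 plus two unrelated points.
k_edges = np.linspace(-1.05, 1.05, 22)
d_edges = np.linspace(-2.05, 2.05, 42)
points = [(x, 0.5 * x + 1.0) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
points += [(2.0, -1.0), (4.0, 0.2)]
i, j, votes = best_cell(hough_accumulate(points, k_edges, d_edges))
print(i, j, votes)
```

A candidate found in cell (i, j) whose fitted parameters fall outside that cell is likely to be found again in the neighbouring cell, which is the motivation for the consistency check.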

2.3.2 Pattern Matching

Pattern matching is done in parallel in 48 regions called trigger towers [72]. Each tower has two boards for pattern recognition and track fitting. Pattern recognition is done with lower-resolution data, which are compared to predefined patterns by a content-addressable memory. Only patterns corresponding to tracks with \(p_T > 2\) GeV are stored. If a match is found, the corresponding high-resolution data are sent to the track fitting module. The linearized track fit (Sect. 6.1.6) runs in the FPGA and computes the helix parameters and the \(\chi^2\).
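A software analogue of this two-step procedure is sketched below. It assumes one measured coordinate per layer, superstrips of equal width, and a pattern bank derived from a sample of training tracks that have already passed the momentum cut; these choices and all names are illustrative. The loop over patterns stands in for the content-addressable memory, which compares all patterns simultaneously.

```python
def superstrip(coordinate, width):
    """Coarse-resolution bin (superstrip) that contains the coordinate."""
    return int(coordinate // width)

def build_bank(training_tracks, width):
    """Pattern bank: set of superstrip tuples, one entry per layer."""
    return {tuple(superstrip(c, width) for c in track)
            for track in training_tracks}

def match_roads(event_hits, bank, width):
    """Return matched patterns together with their full-resolution hits.

    event_hits: list (one entry per layer) of lists of hit coordinates.
    A pattern matches if its superstrip has fired in every layer;
    missing layers are not allowed in this sketch.
    """
    fired = [{superstrip(c, width) for c in layer} for layer in event_hits]
    roads = []
    for pattern in sorted(bank):
        if all(ss in fired[layer] for layer, ss in enumerate(pattern)):
            hits = [[c for c in event_hits[layer] if superstrip(c, width) == ss]
                    for layer, ss in enumerate(pattern)]
            roads.append((pattern, hits))   # input to the track fit
    return roads

bank = build_bank([(0.2, 1.4, 2.6), (5.1, 5.2, 5.3)], width=1.0)
event = [[0.7, 5.5], [1.1], [2.9, 7.0]]
print(match_roads(event, bank, width=1.0))
```

Only the hits inside a matched road are passed on, so that the subsequent fit has to deal with a small number of combinations.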

3 Candidate Selection

After track finding, track candidates may share hits. If two candidates share more hits than is deemed acceptable, for instance more than one, the track candidates are called incompatible. The incompatibility relation can be represented by an undirected graph \((V, E)\), where the \(n\) vertices \(v_i \in V\), \(i = 1, \ldots, n\), are the track candidates. Two incompatible track candidates \(v_i\) and \(v_j\) are connected by the edge \(e_{ij} = e_{ji}\), which is defined as the unordered pair \((v_i, v_j)\). The number of compatible track candidates can be maximized by finding an independent set of vertices of maximal size, i.e., a subset \(V_1 \subseteq V\) of vertices, no two of which are connected by an edge.

Alternatively, the graph can represent the compatibility relation, in which case two compatible tracks/vertices are connected by an edge. The problem is then to find a maximum clique, i.e., a fully connected subset \(V_2 \subseteq V\) of vertices of maximal size.
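For a handful of candidates, the incompatibility graph and a maximum independent set can be obtained by brute force, as in the following sketch. The representation of a candidate as a set of hit identifiers and the function names are assumptions; the exhaustive search is exponential in the number of candidates and serves only to illustrate the definitions.

```python
from itertools import combinations

def incompatibility_edges(candidates, max_shared=1):
    """Edges (i, j), i < j, between candidates sharing more than max_shared hits."""
    return {(i, j)
            for i, j in combinations(range(len(candidates)), 2)
            if len(candidates[i] & candidates[j]) > max_shared}

def maximum_independent_set(n, edges):
    """Largest vertex subset without an edge; exhaustive, small n only."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if not any(i in chosen and j in chosen for i, j in edges):
                return chosen
    return set()

# Four hypothetical candidates, each given as a set of hit identifiers.
candidates = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}, {9, 10, 11, 12}]
edges = incompatibility_edges(candidates)
print(sorted(edges), sorted(maximum_independent_set(len(candidates), edges)))
```

A maximum clique of the compatibility graph is the same vertex set, as the compatibility graph is the complement of the incompatibility graph.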

Both problems are NP-hard [74], so that finding a maximum independent set or maximum clique can be very time-consuming for large graphs. A set of C routines for finding cliques in a compatibility graph can be found in [75]. The fastest exact algorithm for finding independent sets published up to now is the one in [76]. An independent set can also be obtained by finding a vertex cover, i.e., a set \(V_3 \subseteq V\) of vertices the removal of which leaves an independent set. In [77], it was shown that there is a one-to-one correspondence between minimal vertex covers or maximal independent sets and steady states of Hopfield networks [32] with nonpositive weights. In addition, such a network converges to its steady state in at most \(2n\) steps. There may be many minimal vertex covers of different size, and the steady state reached depends on the initial state of the network, of which there are \(2^n\). Finding the smallest vertex cover by an exhaustive or random search of all initial states is, therefore, computationally infeasible for large \(n\).
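The dependence of the steady state on the initial state can be seen in a toy network with binary neurons. In the sketch below, a neuron is switched on if none of its neighbours in the incompatibility graph is active and switched off otherwise, which corresponds to weights of −1 on the edges and a positive bias. This update rule and the sequential update order are assumptions chosen for simplicity and need not coincide with the network analyzed in [77].

```python
def hopfield_independent_set(n, edges, initial_state, max_sweeps=100):
    """Sequential updates of a binary network until a steady state is reached.

    The active neurons of the steady state form a maximal independent
    set; the inactive ones form a minimal vertex cover.
    """
    neighbours = [set() for _ in range(n)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    state = list(initial_state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            # Inhibited by any active neighbour, otherwise switched on.
            new_value = 0 if any(state[j] for j in neighbours[i]) else 1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return {i for i in range(n) if state[i]}

# Path graph 0-1-2 plus the isolated vertex 3.
edges = {(0, 1), (1, 2)}
print(sorted(hopfield_independent_set(4, edges, [0, 0, 0, 0])))
print(sorted(hopfield_independent_set(4, edges, [0, 1, 0, 0])))
```

The two initial states lead to maximal independent sets of different size, so that a single run of the network gives no guarantee of having found the largest one.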

In any case, finding the largest independent set is not necessarily the best approach for finding an “optimal” set of track candidates, as the quality of the track candidates (see Sect. 6.4) should be taken into account, too. If the quality of the candidate \(v_i\) is quantified by a positive weight \(w_i\), the problem is now to find an independent set that maximizes the sum of weights (MWIS). Like its unweighted counterpart, the MWIS problem is NP-hard. For a recent approximate solution and numerous references to previous work, see [78].
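A simple way to take the weights into account is a greedy selection in order of decreasing weight, sketched below. This heuristic is shown only as an illustration; it is not the method of [78] and does not guarantee the maximal sum of weights. The function name and the example weights are hypothetical.

```python
def greedy_mwis(weights, edges):
    """Greedy heuristic for a weighted independent set.

    Candidates are visited in order of decreasing weight; a candidate
    is accepted unless it is incompatible with an accepted one.
    """
    n = len(weights)
    neighbours = [set() for _ in range(n)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    selected, blocked = set(), set()
    for i in sorted(range(n), key=lambda v: weights[v], reverse=True):
        if i not in blocked:
            selected.add(i)
            blocked |= neighbours[i]
    return selected

# Candidate 1 is incompatible with candidates 0 and 2.
edges = {(0, 1), (1, 2)}
print(sorted(greedy_mwis([0.9, 0.5, 0.8, 0.7], edges)))
print(sorted(greedy_mwis([1.0, 1.5, 1.0], edges)))
```

In the second example, the greedy choice of candidate 1 (weight 1.5) excludes candidates 0 and 2, whose combined weight of 2.0 would have been larger; this is the kind of suboptimal solution that more elaborate algorithms try to avoid.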

If the weight \(w_i\) is mapped to a quality indicator \(q_i \in [0, 1]\), the network in [77] can be generalized to a recurrent network with annealing that aims to find the set of compatible vertices with the largest sum of weights [79]; see also Sect. 11.1.